Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regen.sh is failing in Ubuntu when working with recommended version of protoc #10979

Closed
jcsherin opened this issue Jun 18, 2024 · 2 comments · Fixed by #11006
Closed

regen.sh is failing in Ubuntu when working with recommended version of protoc #10979

jcsherin opened this issue Jun 18, 2024 · 2 comments · Fixed by #11006
Labels
bug Something isn't working

Comments

@jcsherin
Copy link
Contributor

Describe the bug

Running regen.sh in datafusion-proto fails with the error:

$ ./regen.sh
Error: "protobuf compilation failed: protoc failed: datafusion/proto/proto/datafusion.proto: This file contains proto3 optional fields, but --experimental_allow_proto3_optional was not set.\n"

The error happens with libprotoc 3.12.4, which is the version recommended in the DataFusion docs. This is also the default version installed by apt in Ubuntu 22.04.

$ protoc --version
libprotoc 3.12.4

A similar bug was reported in #8602.

I needed to run regen.sh while trying to migrate string_agg, one of the built-in aggregate functions pending migration, listed in #8708.

Could you recommend an approach to fix this? I have listed 3 possible ways here.

1. Add flag

Satisfy the experimental check by passing --experimental_allow_proto3_optional to the protoc compiler invocation. I've verified locally that this will work.

// In datafusion/proto/gen/src/main.rs L31
prost_build::Config::new()
	.protoc_arg("--experimental_allow_proto3_optional")

2. Convert to oneof

Rewrite optional field as a oneof with a single field.

Currently, in datafusion.proto there are two usages of optional:

message ScalarUDFExprNode {  
  string fun_name = 1;  
  repeated LogicalExprNode args = 2;  
  optional bytes fun_definition = 3;  
}

message PhysicalScalarUdfNode {  
  string name = 1;  
  repeated PhysicalExprNode args = 2;  
  optional bytes fun_definition = 3;  
  datafusion_common.ArrowType return_type = 4;  
}

We can rewrite the optional field as a oneof with a single field like this:

message ScalarUDFExprNode {  
  string fun_name = 1;  
  repeated LogicalExprNode args = 2;  
  oneof fun_definition_opt {  
    bytes fun_definition = 3;  
  }  
}

This technique is already used in other message types in datafusion.proto like - WindowFrame, CsvScanExecNode, ProjectionNode etc.

I've tried this change in my fork and verified that it works. Please see the diff here

3. Remove optional

We could remove the optional keyword usage here, which would fix the issue. It might however have other cascading effects. To evaluate it, it might be useful to see #8706, where the message type in question was designed.

To Reproduce

You need protoc version 3.12.4.

Expected behavior

Running regen.sh should exit without errors when using protoc version 3.12.4.

Additional context

No response

@jcsherin jcsherin added the bug Something isn't working label Jun 18, 2024
@jayzhan211
Copy link
Contributor

jayzhan211 commented Jun 19, 2024

I'm not sure why there is the error, but my version is libprotoc 27.0. 3.12.4 might be an outdated version 🤔

the version recommended in the DataFusion docs

3.12.4 is not the recommended version, but the minimum version then (probably not anymore given your error)

Could you try update it to the latest to see if there is still any issue?

@jcsherin
Copy link
Contributor Author

Could you try update it to the latest to see if there is still any issue?

There is no compilation error when I tried with a different version: libprotoc 26.0.

From v3.15 there shouldn't be any compilation errors even when optional is used in the .proto source. I found this section in the protobuf field presence docs:

Presence tracking for proto3 messages is enabled by default since v3.15.0 release, formerly up until v3.12.0 the --experimental_allow_proto3_optional flag was required when using presence tracking with protoc.

How about changing the minimum required version in DataFusion docs source here from v3.12 to v3.15?

If acceptable, I can submit a doc patch for your review. We don't need to make any of the above suggested changes to code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants