Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

editions: strange behavior with open enums and unrecognized values in message literals #16757

Closed
jhump opened this issue May 6, 2024 · 1 comment · Fixed by bufbuild/protocompile#309
Assignees
Labels

Comments

@jhump
Copy link
Contributor

jhump commented May 6, 2024

The text format, which is used in message literals for options with message types in Protobuf source files, allows the use of numeric values for enum fields, instead of enum value names. Furthermore, if the enum is open, it allows the use of unrecognized numeric values.

However, some strange things happen if we try combining this capability with features (all this was done using the most recent release candidate, v27.0-rc1).

First, a base line file. This exhibits expected behavior:

// test.proto
edition = "2023";
import "google/protobuf/descriptor.proto";
message Message {
  Enum s = 1;
}
enum Enum {
  ZERO = 0;
  ONE = 1;
}
extend google.protobuf.FileOptions {
  Message msg = 1000;
}
option (msg) = {
  s: -321
};

If we examine the descriptor produced from this, it looks right:

$ protoc test.proto -o /dev/stdout | protoc test.proto --decode google.protobuf.FileDescriptorSet
file {
  name: "test2.proto"
  dependency: "google/protobuf/descriptor.proto"
  message_type {
    name: "Message"
    field {
      name: "s"
      number: 1
      label: LABEL_OPTIONAL
      type: TYPE_ENUM
      type_name: ".Enum"
      json_name: "s"
    }
  }
  enum_type {
    name: "Enum"
    value {
      name: "ZERO"
      number: 0
    }
    value {
      name: "ONE"
      number: 1
    }
  }
  extension {
    name: "msg"
    extendee: ".google.protobuf.FileOptions"
    number: 1000
    label: LABEL_OPTIONAL
    type: TYPE_MESSAGE
    type_name: ".Message"
    json_name: "msg"
  }
  options {
    [msg] {
      s: -321
    }
  }
  syntax: "editions"
  edition: EDITION_2023
}

But, if we move that enum field into a custom feature, things get a little weird:

edition = "2023";
import "google/protobuf/descriptor.proto";
message Message {
  Enum s = 1;
}
enum Enum {
  ZERO = 0;
  ONE = 1;
}
extend google.protobuf.FeatureSet {
  Message msg = 1000;
}
option features.(msg) = {
  s: -321
};
file {
  name: "test2.proto"
  dependency: "google/protobuf/descriptor.proto"
  message_type {
    name: "Message"
    field {
      name: "s"
      number: 1
      label: LABEL_OPTIONAL
      type: TYPE_ENUM
      type_name: ".Enum"
      json_name: "s"
    }
  }
  enum_type {
    name: "Enum"
    value {
      name: "ZERO"
      number: 0
    }
    value {
      name: "ONE"
      number: 1
    }
  }
  extension {
    name: "msg"
    extendee: ".google.protobuf.FeatureSet"
    number: 1000
    label: LABEL_OPTIONAL
    type: TYPE_MESSAGE
    type_name: ".Message"
    json_name: "msg"
  }
  options {
    features {
      [msg] {
        s: ONE
      }
    }
  }
  syntax: "editions"
  edition: EDITION_2023
}

The -321 value has been transformed into the value ONE (which had numeric value 1).

Looking at the actual raw output, using --decode_raw, we see that it was in fact serialized with the value 1. So this appears to be a serialization issue, not de-serialization.

Things get even weirder if we try to have both an option and a feature:

edition = "2023";
import "google/protobuf/descriptor.proto";
message Message {
  Enum s = 1;
}
enum Enum {
  ZERO = 0;
  ONE = 1;
}
extend google.protobuf.FeatureSet {
  Message msg = 1000;
}
extend google.protobuf.FileOptions {
  Message msg2 = 1000;
}
option features.(msg) = {
  s: -321
};
option (msg2) = {
  s: -321
};

This won't even compile:

$ protoc test.proto -o /dev/null
test.proto:19:17: Error while parsing option value for "msg2": Unknown enumeration value of "-321" for field "s".

Wha?? This is the same as the first example above, except that the presence of the feature seems to confound it. (The enum is not defined as closed, so this should be allowed, and was allowed in the first two examples.)

If we explicitly set the enum to be closed (via option features.enum_type = CLOSED), the error remains the same. As already described in #16756, if we do that but only have the feature present (not the custom option), there is no error (it incorrectly accepts the unrecognized numeric value), but like the second case directly above, the value is serialized as 1 instead of -321.

@jhump jhump added the untriaged auto added to all issues by default when created. label May 6, 2024
@shaod2 shaod2 added editions and removed untriaged auto added to all issues by default when created. labels May 7, 2024
@mkruskal-google
Copy link
Member

mkruskal-google commented May 7, 2024

lol this one took me a while to figure out. The issue here is that you're specifically using extension 1000 and field 1. As far as protoc is concerned, this corresponds to pb.CppFeatures.legacy_closed_enum, meaning that every enum field in the file will be treated as closed. That solved the mystery of why the option becomes an error when you add the feature. I'm not sure if this is WAI or a bug, I guess we could treat extension 1000 specially and ban it from being used?

Why the feature alone parses badly is similar. It treats this as a boolean when it moves the FeatureSet object into the generated pool, and then when it serializes it again it's just 1 (corresponding to ONE). If you change the extension number to anything else this should work as expected

This example has a number of weird edge cases that we might want to just outright ban:

  1. Reusing extension 1000, which overlap the C++ features built into protoc
  2. Defining invalid features (this would fail to compile edition defaults)
  3. Using features in the file they're defined

copybara-service bot pushed a commit that referenced this issue May 7, 2024
This will prevent users from accidentally overriding these with different types (e.g. #16757 and #16756).

PiperOrigin-RevId: 631518986
copybara-service bot pushed a commit that referenced this issue May 9, 2024
This will prevent users from accidentally overriding these with different types (e.g. #16757 and #16756).

PiperOrigin-RevId: 631518986
copybara-service bot pushed a commit that referenced this issue May 9, 2024
This will prevent users from accidentally overriding these with different types (e.g. #16757 and #16756).

PiperOrigin-RevId: 631518986
copybara-service bot pushed a commit that referenced this issue May 9, 2024
This will prevent users from accidentally overriding these with different types (e.g. #16757 and #16756).

PiperOrigin-RevId: 631518986
mkruskal-google added a commit to mkruskal-google/protobuf that referenced this issue May 13, 2024
This will prevent users from accidentally overriding these with different types (e.g. protocolbuffers#16757 and protocolbuffers#16756).

PiperOrigin-RevId: 631518986
copybara-service bot pushed a commit that referenced this issue May 15, 2024
This will prevent users from accidentally overriding these with different types (e.g. #16757 and #16756).

PiperOrigin-RevId: 633760581
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants