Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

editions: maps are incorrectly serialized if file has default message encoding of DELIMITED #1615

Closed
jhump opened this issue May 7, 2024 · 7 comments

Comments

@jhump
Copy link
Contributor

jhump commented May 7, 2024

For one, the protoreflect.FieldDescriptor.Kind() method returns protoreflect.GroupKind for the map field itself (which is defined under-the-hood as a repeated message field) and also for any values in the map entry that are also messages. This was recently confirmed by the Protobuf team to be a bug (protocolbuffers/protobuf#16549) and then fixed in the C++ runtime/protoc implementation in protocolbuffers/protobuf@0dbd99a#diff-eccabec4b932653c8b1a80b49eb3eeecdbab939a444d9518f5c240e3e55c8c29R5546-R5567.

But the issue in the Go runtime (even as recent as v1.34.1) is actually a little bit worse than the C++ behavior: instead of serializing the maps using group encoding (which is technically not correct as maps cannot be configured to have delimited encoding), it serializes them in a way that is not valid Protobuf binary data.

It actually uses the "start group" wire type (3), but then emits a length-prefix and has no "end group" tag at the end.

Here's an example:

00000000: 0b08 0a03 6162 6312 0131 0b08 0a03 6465  ....abc..1....de
00000010: 6612 0132 0b08 0a03 7879 7a12 0133 130c  f..2....xyz..3..
00000020: 087b 130b 060a 0161 1201 3114            .{.....a..1.

How this gets interpreted by a decoder:

  • 0b: Wire type "start group", tag 1
  • 08: Since the previous wire type was group this gets interpreted as a field in the group. So wire type "varint", tag 1.
  • 0a: A valid varint value of 10.
  • 03: The next tag in the group: but this one is invalid 💀. It encodes a wire type of 3 but a tag of zero which is not legal 💥

However, what's actually going on:

  • 0b: Wire type "start group", tag 1
  • 08: The varint length of 8 bytes. Oops! (Pretend the previous byte was 0a: wire type "bytes", tag 1.). The contents are a nested message entry.
    • 0a: Wire type "bytes", tag 1
    • 03: Varint length of 3 bytes
    • 616263: The string "abc"
    • 12: Wire type "bytes", tag 2
    • 01: Varint length of 1 byte
    • 31: The string "1"
  • And so on.

Interestingly, there is no trailing tag with an "end group" wire type. It uses "start group" in the initial tag but then acts as it if it otherwise encoded with length-prefix instead of delimited.

At the very end, we can see another example, but this time it's the value of a map entry that is similarly encoded incorrectly. The last 9 bytes of the above output are 0b 060a 0161 1201 3114. Like the above entries, this one also starts with a "group start" wire type, but is then followed by a length (6 bytes). The value inside that entry is the final four bytes 1201 3114. Interestingly, this has the reverse issue as the entry itself: it starts off with 12, which is wire type "bytes", tag 2. It then has the length (01) and the data (31). But, despite starting with a "bytes" wire type, it does conclude with an "end group" tag: 14 is wire type "end group", tag 2.

The above invalid output was produced using the following file:

edition = "2023";

option features.message_encoding = DELIMITED;

message Foo {
  map<string, string> string_map = 1;
  map<uint32, Foo> children = 2;
}

It was compiled using v27.0-rc1 of protoc and v1.34.1 of protoc-gen-go:

protoc test.proto --go_opt paths=source_relative --go_opt Mtest.proto=tmp/main --go_out .

And then the above was produced using this small Go program (main.go in the same folder/package as above generated code):

package main

import "fmt"
import "os"
import "google.golang.org/protobuf/proto"

func main() {
	foo := &Foo{
		StringMap: map[string]string{
			"abc": "1",
			"def": "2",
			"xyz": "3",
		},
		Children: map[uint32]*Foo{
			123: &Foo{StringMap: map[string]string{"a": "1"}},
		},
	}
	data, err := proto.Marshal(foo)
	if err != nil {
		fmt.Fprintf(os.Stderr, "failed to marshal message: %v\n", err)
		os.Exit(1)
	}
	os.Stdout.Write(data)
	err = proto.Unmarshal(data, foo)
	if err != nil {
		fmt.Fprintf(os.Stderr, "failed to unmarshal message: %v\n", err)
		os.Exit(1)
	}
}

It fails to round-trip the message with the an error:

failed to unmarshal message: proto: cannot parse invalid wire-format data

It writes the actual encoded bytes to stdout, which is the example binary data above.

@jhump
Copy link
Contributor Author

jhump commented May 24, 2024

@stapelberg, @lfolger, have either of you seen this? Now that v27.0 of protoc is released (which includes some add'l minor changes to descriptor.proto), it would be nice to get another release soon that has the latest v27 version incorporated into the descriptorpb package as well as a fix to this bug. If it would help, I can probably work on creating pull requests for these.

@stapelberg
Copy link

(Lasse is on vacation until June 10th.)

Thanks for the report. We haven’t encountered this issue elsewhere.

Yes, if you send pull requests, that would help to get a new release out. I’m out of office on Thursday and Friday, so timeline-wise, I think we should be able to cut a new release next week.

To confirm: importing the new descriptor version merely side-steps this issue, yes? For delimited encoding, it sounds like we still need a fix eventually.

@jhump
Copy link
Contributor Author

jhump commented May 27, 2024

To confirm: importing the new descriptor version merely side-steps this issue, yes? For delimited encoding, it sounds like we still need a fix eventually.

Sorry that I brought up the new descriptor in the same sentence. They are separate -- that and a fix to this bug are two independent things I'd like to get into a release.

@stapelberg
Copy link

Ah, thanks for clarifying. Will you also send a PR for the bug fix? I’m afraid we’re currently stretched very thin (with others on vacation and parental leave), so to make progress in a reasonable time frame we’re reliant on contributions. Thanks

@jhump
Copy link
Contributor Author

jhump commented May 29, 2024

Will you also send a PR for the bug fix?

Just opened one: https://go-review.googlesource.com/c/protobuf/+/588976

@stapelberg
Copy link

Apologies for the delay, I was out sick the last few days.

Should I start a new release, or would you like to wait for https://go.dev/cl/589336 to land as well?

@jhump
Copy link
Contributor Author

jhump commented Jun 6, 2024

We can wait to see the outcome of that other CL. If it is approved, it would be nice for it to be included as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants