Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replication: Support GTID tag in PreviousGTIDsEvent #952

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

dveeden
Copy link
Collaborator

@dveeden dveeden commented Nov 21, 2024

Issue: ref #845

The PreviousGTIDsEvent / PREVIOUS_GTIDS_LOG_EVENT has changed to work with tagged GTIDs.

First the uuidCount has changed, it encodes the GTID format. Here format 1 is tagged and format 0 is untagged.

Then each entry may have a tag. If there is a tag then the uuid itself isn't printed but the tag is appended to the last entry.

Example 1

896e7882-18fe-11ef-ab88-22222d34d411:1-3

regular format, compatible with both formats

Example 2

896e7882-18fe-11ef-ab88-22222d34d411:1-4:aaaa:1

tagged format.

Combination of

  • 896e7882-18fe-11ef-ab88-22222d34d411:1-4
  • 896e7882-18fe-11ef-ab88-22222d34d411:aaaa:1

Example 3

896e7882-18fe-11ef-ab88-22222d34d411:1-4:aaaa:1:abc:1-3:bbbbb:1:bbbbbb:1:x:1,896e7882-18fe-11ef-ab88-22222d34d412:1-2

Combination of:

896e7882-18fe-11ef-ab88-22222d34d411:1-4
                                    :aaaa:1
                                    :abc:1-3
                                    :bbbbb:1
                                    :bbbbbb:1
                                    :x:1,
896e7882-18fe-11ef-ab88-22222d34d412:1-2

Please also see: mysqlbinlog --read-from-remote-server --hexdump $binlogfile to see how MySQL encodes/decodes this.

See also:

Issue: ref go-mysql-org#845

The `PreviousGTIDsEvent` / `PREVIOUS_GTIDS_LOG_EVENT` has changed to
work with tagged GTIDs.

First the `uuidCount` has changed, it encodes the GTID format. Here
format 1 is tagged and format 0 is untagged.

Then each entry may have a tag. If there is a tag then the uuid itself
isn't printed but the tag is appended to the last entry.

Examples:

`896e7882-18fe-11ef-ab88-22222d34d411:1-3`

regular format, compatible with both formats

`896e7882-18fe-11ef-ab88-22222d34d411:1-4:aaaa:1`

tagged format.

Combination of

- `896e7882-18fe-11ef-ab88-22222d34d411:1-4`
- `896e7882-18fe-11ef-ab88-22222d34d411:aaaa:1`

`896e7882-18fe-11ef-ab88-22222d34d411:1-4:aaaa:1:abc:1-3:bbbbb:1:bbbbbb:1:x:1,896e7882-18fe-11ef-ab88-22222d34d412:1-2`

Combination of:
```
896e7882-18fe-11ef-ab88-22222d34d411:1-4
                                    :aaaa:1
                                    🔤1-3
                                    :bbbbb:1
                                    :bbbbbb:1
                                    ❌1,
896e7882-18fe-11ef-ab88-22222d34d412:1-2
```

Please also see: `mysqlbinlog --read-from-remote-server --hexdump $binlogfile` to see how MySQL encodes/decodes this.

See also:
- https://dev.mysql.com/doc/refman/8.4/en/replication-gtids-concepts.html
Copy link
Collaborator

@lance6716 lance6716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure Example 3 in PR description has the same format as MySQL. In https://dev.mysql.com/doc/refman/8.4/en/replication-gtids-concepts.html , there's

GTIDs originating from the same server but having different tags are treated in a manner similar to those originating from different servers, like this:

3E11FA47-71CA-11E1-9E33-C80AA9429562:Domain_1:1-3:15-21, 3E11FA47-71CA-11E1-9E33-C80AA9429562:Domain_2:8-52

rather than 3E11FA47-71CA-11E1-9E33-C80AA9429562:Domain_1:1-3:15-21:Domain_2:8-52. I guess the reason is if Domain_2 is replaced by tag 23, we will see 3E11FA47-71CA-11E1-9E33-C80AA9429562:Domain_1:1-3:15-21:23:8-52, which is ambiguous.

@dveeden
Copy link
Collaborator Author

dveeden commented Nov 26, 2024

I'm not sure Example 3 in PR description has the same format as MySQL. In https://dev.mysql.com/doc/refman/8.4/en/replication-gtids-concepts.html , there's

GTIDs originating from the same server but having different tags are treated in a manner similar to those originating from different servers, like this:

3E11FA47-71CA-11E1-9E33-C80AA9429562:Domain_1:1-3:15-21, 3E11FA47-71CA-11E1-9E33-C80AA9429562:Domain_2:8-52

rather than 3E11FA47-71CA-11E1-9E33-C80AA9429562:Domain_1:1-3:15-21:Domain_2:8-52. I guess the reason is if Domain_2 is replaced by tag 23, we will see 3E11FA47-71CA-11E1-9E33-C80AA9429562:Domain_1:1-3:15-21:23:8-52, which is ambiguous.

mysql-9.1.0> SET gtid_next='automatic:aaa';
Query OK, 0 rows affected (0.00 sec)

mysql-9.1.0> INSERT INTO t1 VALUES (1);
Query OK, 1 row affected (0.01 sec)

mysql-9.1.0> INSERT INTO t1 VALUES (2);
Query OK, 1 row affected (0.01 sec)

mysql-9.1.0> SET gtid_next='automatic:bbbb';
Query OK, 0 rows affected (0.00 sec)

mysql-9.1.0> INSERT INTO t1 VALUES (3);
Query OK, 1 row affected (0.01 sec)

mysql-9.1.0> INSERT INTO t1 VALUES (4);
Query OK, 1 row affected (0.01 sec)

mysql-9.1.0> INSERT INTO t1 VALUES (5);
Query OK, 1 row affected (0.01 sec)

mysql-9.1.0> INSERT INTO t1 VALUES (6);
Query OK, 1 row affected (0.01 sec)

mysql-9.1.0> SET gtid_next='automatic:c';
Query OK, 0 rows affected (0.00 sec)

mysql-9.1.0> INSERT INTO t1 VALUES (7);
Query OK, 1 row affected (0.01 sec)

mysql-9.1.0> SET gtid_next='automatic:d';
Query OK, 0 rows affected (0.00 sec)

mysql-9.1.0> INSERT INTO t1 VALUES (8);
Query OK, 1 row affected (0.01 sec)

mysql-9.1.0> SET gtid_next='automatic';
Query OK, 0 rows affected (0.00 sec)

mysql-9.1.0> INSERT INTO t1 VALUES (9);
Query OK, 1 row affected (0.01 sec)

mysql-9.1.0> SET gtid_next='automatic:123';
ERROR 1774 (HY000): Malformed GTID specification 'automatic:123'.
mysql-9.1.0> show global variables like 'gtid_executed';
+---------------+-----------------------------------------------------------------+
| Variable_name | Value                                                           |
+---------------+-----------------------------------------------------------------+
| gtid_executed | 30c380a5-ac28-11ef-b590-ba33c4017a43:1:aaa:1-2:bbbb:1-4:c:1:d:1 |
+---------------+-----------------------------------------------------------------+
1 row in set (0.02 sec)

mysql-9.1.0> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 9.1.0     |
+-----------+
1 row in set (0.00 sec)

It looks like my example might be correct and the MySQL docs might not be in sync with the actual behavior.

Note that 23 can't be a valid tag.

From https://dev.mysql.com/doc/dev/mysql-server/latest/classmysql_1_1gtid_1_1Tag.html

Representation of the GTID tag.

Tag format: [a-z_]{a-z0-9_}{0,31} Tag may be created from a given text. Text may contain leading and trailing 
spaces which will be omitted during Tag creation. Text may also contain uppercase characters. Acceptable 
format for text is as follows: [:space:][a-zA-Z_][a-zA-Z0-9_]{0,31}[:space:]

@dveeden
Copy link
Collaborator Author

dveeden commented Nov 26, 2024

And with the exact GTIDs from the docs:

mysql-9.1.0> reset binary logs and gtids;
Query OK, 0 rows affected (0.03 sec)

mysql-9.1.0> set global gtid_purged='3E11FA47-71CA-11E1-9E33-C80AA9429562:Domain_1:1-3:15-21, 
    '> 3E11FA47-71CA-11E1-9E33-C80AA9429562:Domain_2:8-52';
Query OK, 0 rows affected (0.02 sec)

mysql-9.1.0> show global variables like 'gtid%';
+----------------------------------+-----------------------------------------------------------------------+
| Variable_name                    | Value                                                                 |
+----------------------------------+-----------------------------------------------------------------------+
| gtid_executed                    | 3e11fa47-71ca-11e1-9e33-c80aa9429562:domain_1:1-3:15-21:domain_2:8-52 |
| gtid_executed_compression_period | 0                                                                     |
| gtid_mode                        | ON                                                                    |
| gtid_owned                       |                                                                       |
| gtid_purged                      | 3e11fa47-71ca-11e1-9e33-c80aa9429562:domain_1:1-3:15-21:domain_2:8-52 |
+----------------------------------+-----------------------------------------------------------------------+
5 rows in set (0.00 sec)

@dveeden
Copy link
Collaborator Author

dveeden commented Nov 26, 2024

I've filed https://bugs.mysql.com/bug.php?id=116789 for this

@dveeden
Copy link
Collaborator Author

dveeden commented Nov 27, 2024

I also filed https://bugs.mysql.com/bug.php?id=116747 some days ago

Copy link
Collaborator

@lance6716 lance6716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a pure code-related review. I'll check GTID encoding and decoding soon

replication/event.go Outdated Show resolved Hide resolved
replication/event.go Outdated Show resolved Hide resolved
if format == GtidFormatTagged {
tagLength := int(data[pos]) / 2
pos += 1
if tagLength > 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can the tag have zero length? please add comments

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mysql-9.1.0> set gtid_next = 'AUTOMATIC:abc';
Query OK, 0 rows affected (0.00 sec)

mysql-9.1.0> create table test.t9 (id int primary key);
Query OK, 0 rows affected (0.02 sec)

mysql-9.1.0> set gtid_next = 'AUTOMATIC:';
ERROR 1774 (HY000): Malformed GTID specification 'AUTOMATIC:'.

https://dev.mysql.com/doc/dev/mysql-server/latest/classmysql_1_1gtid_1_1Tag.html#details

Representation of the GTID tag.

Tag format: [a-z_]{a-z0-9_}{0,31} Tag may be created from a given text. Text may contain leading and trailing spaces which will be omitted during Tag creation. Text may also contain uppercase characters. Acceptable format for text is as follows: [:space:][a-zA-Z_][a-zA-Z0-9_]{0,31}[:space:]

So it is non-zero if it has a tag. Note the {0,31} in the regexp, which is only for the second character as the first can't be a number.

@@ -254,9 +298,14 @@ func (e *PreviousGTIDsEvent) Decode(data []byte) error {
}
intervals[i] = interval
}
previousGTIDSets[i] = fmt.Sprintf("%s:%s", uuid, strings.Join(intervals, ":"))
if isTag {
previousGTIDSets[currentSetnr-1] += fmt.Sprintf(":%s:%s", tag, strings.Join(intervals, ":"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because string is immutable, string += will create a new string object each time. If there are too many tags the performance will drop. Maybe store every "tag+intervals" and join them once at all

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we should try to do that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe try this

diff --git a/replication/event.go b/replication/event.go
index 20c7b0c..0786516 100644
--- a/replication/event.go
+++ b/replication/event.go
@@ -277,7 +277,9 @@ func (e *PreviousGTIDsEvent) Decode(data []byte) error {
 	pos += 8
 
 	previousGTIDSets := make([]string, uuidCount)
+
 	currentSetnr := 0
+	var buf strings.Builder
 	for range previousGTIDSets {
 		uuid := e.decodeUuid(data[pos : pos+16])
 		pos += 16
@@ -292,30 +294,38 @@ func (e *PreviousGTIDsEvent) Decode(data []byte) error {
 				pos += tagLength
 			}
 		}
+
+		if isTag {
+			buf.WriteString(":")
+			buf.WriteString(tag)
+		} else {
+			if currentSetnr != 0 {
+				buf.WriteString(",")
+			}
+			buf.WriteString(uuid)
+			currentSetnr += 1
+		}
+
 		sliceCount := binary.LittleEndian.Uint16(data[pos : pos+8])
 		pos += 8
-		intervals := make([]string, sliceCount)
-		for i := range intervals {
+		for range sliceCount {
+			buf.WriteString(":")
+
 			start := e.decodeInterval(data[pos : pos+8])
 			pos += 8
 			stop := e.decodeInterval(data[pos : pos+8])
 			pos += 8
-			interval := ""
 			if stop == start+1 {
-				interval = fmt.Sprintf("%d", start)
+				fmt.Fprintf(&buf, "%d", start)
 			} else {
-				interval = fmt.Sprintf("%d-%d", start, stop-1)
+				fmt.Fprintf(&buf, "%d-%d", start, stop-1)
 			}
-			intervals[i] = interval
 		}
-		if isTag {
-			previousGTIDSets[currentSetnr-1] += fmt.Sprintf(":%s:%s", tag, strings.Join(intervals, ":"))
-		} else {
-			previousGTIDSets[currentSetnr] = fmt.Sprintf("%s:%s", uuid, strings.Join(intervals, ":"))
+		if !isTag {
 			currentSetnr += 1
 		}
 	}
-	e.GTIDSets = strings.Join(previousGTIDSets[:currentSetnr], ",")
+	e.GTIDSets = buf.String()
 	return nil
 }
 

@dveeden
Copy link
Collaborator Author

dveeden commented Dec 3, 2024

Copy link
Collaborator

@lance6716 lance6716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest lgtm

// https://github.com/mysql/mysql-server/blob/61a3a1d8ef15512396b4c2af46e922a19bf2b174/sql/rpl_gtid_set.cc#L1363-L1378
func decodeSid(data []byte) (format GtidFormat, sidnr uint64) {
gtidinfo := make([]byte, 8)
copy(gtidinfo, data[:8])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the modification to gtidinfo happens only when it's GtidFormatTagged. So we can postpone the data copy until checked it's GtidFormatTagged ...

Comment on lines +257 to +266
sid_mask := []byte{0, 255, 255, 255, 255, 255, 255, 0}

// Apply the mask
for i, _ := range gtidinfo[:8] {
gtidinfo[i] &= sid_mask[i]
}
gtidinfo = append(gtidinfo, 0)

// sidnr
sidnr = binary.LittleEndian.Uint64(gtidinfo[1:])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and seems we only need to prepare a [8]byte and copy data[1:7] to it? Then binary.LittleEndian.Uint64 can work

isTag := false
var tag string
if format == GtidFormatTagged {
tagLength := int(data[pos]) / 2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how this ... / 2 happens in MySQL's source code. Seems it's https://github.com/mysql/mysql-server/blob/61a3a1d8ef15512396b4c2af46e922a19bf2b174/libs/mysql/gtid/tag.cpp#L84 and it further calls a complex read_varlen_bytes.

Just to double check with you, did you simplify read_varlen_bytes into int(data[pos]) / 2? If not, I think I can help to provide more test data with different tag length and check our calculation is correct

uuid := e.decodeUuid(data[pos : pos+16])
pos += 16
isTag := false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems isTag can be replaced by len(tag) > 0

@@ -254,9 +298,14 @@ func (e *PreviousGTIDsEvent) Decode(data []byte) error {
}
intervals[i] = interval
}
previousGTIDSets[i] = fmt.Sprintf("%s:%s", uuid, strings.Join(intervals, ":"))
if isTag {
previousGTIDSets[currentSetnr-1] += fmt.Sprintf(":%s:%s", tag, strings.Join(intervals, ":"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe try this

diff --git a/replication/event.go b/replication/event.go
index 20c7b0c..0786516 100644
--- a/replication/event.go
+++ b/replication/event.go
@@ -277,7 +277,9 @@ func (e *PreviousGTIDsEvent) Decode(data []byte) error {
 	pos += 8
 
 	previousGTIDSets := make([]string, uuidCount)
+
 	currentSetnr := 0
+	var buf strings.Builder
 	for range previousGTIDSets {
 		uuid := e.decodeUuid(data[pos : pos+16])
 		pos += 16
@@ -292,30 +294,38 @@ func (e *PreviousGTIDsEvent) Decode(data []byte) error {
 				pos += tagLength
 			}
 		}
+
+		if isTag {
+			buf.WriteString(":")
+			buf.WriteString(tag)
+		} else {
+			if currentSetnr != 0 {
+				buf.WriteString(",")
+			}
+			buf.WriteString(uuid)
+			currentSetnr += 1
+		}
+
 		sliceCount := binary.LittleEndian.Uint16(data[pos : pos+8])
 		pos += 8
-		intervals := make([]string, sliceCount)
-		for i := range intervals {
+		for range sliceCount {
+			buf.WriteString(":")
+
 			start := e.decodeInterval(data[pos : pos+8])
 			pos += 8
 			stop := e.decodeInterval(data[pos : pos+8])
 			pos += 8
-			interval := ""
 			if stop == start+1 {
-				interval = fmt.Sprintf("%d", start)
+				fmt.Fprintf(&buf, "%d", start)
 			} else {
-				interval = fmt.Sprintf("%d-%d", start, stop-1)
+				fmt.Fprintf(&buf, "%d-%d", start, stop-1)
 			}
-			intervals[i] = interval
 		}
-		if isTag {
-			previousGTIDSets[currentSetnr-1] += fmt.Sprintf(":%s:%s", tag, strings.Join(intervals, ":"))
-		} else {
-			previousGTIDSets[currentSetnr] = fmt.Sprintf("%s:%s", uuid, strings.Join(intervals, ":"))
+		if !isTag {
 			currentSetnr += 1
 		}
 	}
-	e.GTIDSets = strings.Join(previousGTIDSets[:currentSetnr], ",")
+	e.GTIDSets = buf.String()
 	return nil
 }
 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants