Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for mysql MIXED binary log format #897

Open
alarbada opened this issue Jul 8, 2024 · 8 comments
Open

Support for mysql MIXED binary log format #897

alarbada opened this issue Jul 8, 2024 · 8 comments

Comments

@alarbada
Copy link

alarbada commented Jul 8, 2024

The default format in mysql is ROW, but it would be nice that the library could also support the "MIXED" format.
My usecase is to develop a conduit connector for mysql.

Is there a specific reason for why only the "ROW" format is supported?

@lance6716
Copy link
Collaborator

As a user of this library I think ROW based replication is safer, however I didn't take enough time to try MIXED format. Welcome to discuss. I may response later.

https://dev.mysql.com/doc/refman/8.4/en/replication-sbr-rbr.html here is the tradeoff of ROW vs STATEMENT.

@atercattus
Copy link
Member

Hello, @alarbada.
Statement-based and mixed formats require you to analyze and execute such sql queries from the binlog. We, as a library, can provide a sql query read from the binlog, but that's it. Further work will be on you.

If this is what you need, then we can do it. If you are expecting an API, as is the case with row-based, it will not work, because the library knows nothing about the data in the database at the time of parsing the statement/mixed-based message.

@dveeden
Copy link
Collaborator

dveeden commented Jul 9, 2024

From https://dev.mysql.com/doc/refman/8.4/en/replication-options-binary-log.html#sysvar_binlog_format:

"binlog_format is deprecated, and subject to removal in a future version of MySQL. This implies that support for logging formats other than row-based is also subject to removal in a future release. Thus, only row-based logging should be employed for any new MySQL Replication setups. "

This doesn't mean I'm against supporting it, but it does mean the focus should be the row based format.

Note that for statement based you need more than just the query as many queries need context to get the same result on the target, e.g. RAND() needs rand_seed1 and rand_seed2 to be set.

@alarbada
Copy link
Author

alarbada commented Jul 9, 2024

Thanks a lot for the link @dveeden! I didn't know it was deprecated. Now I've got a good excuse not to have to support mixed or statement-based formats, at least initially.

@atercattus, I'm not sure now. After reading more about this, I feel like it yields very little value for the complexity it would add to my connector. This might be useful for people stuck with legacy mysql versions, which I might or might not have to support.

In any case, having the statement as a string in the canal.RowsEvent would be enought for me, I can do the parsing with something like this.

@dveeden
Copy link
Collaborator

dveeden commented Jul 9, 2024

This would probably be something like a QueryEvent instead of a RowsEvent.

While row based binlogs weren't the default before 8.0 it was an configuration option since MySQL 5.1 (~2006).

Note that in MariaDB the MIXED format is still the default.

@dveeden
Copy link
Collaborator

dveeden commented Nov 22, 2024

With MySQL 9.1.0

mysql-9.1.0> \W
Show warnings enabled.
mysql-9.1.0> set session binlog_format='STATEMENT';
Query OK, 0 rows affected, 1 warning (0.00 sec)

Warning (Code 1287): '@@binlog_format' is deprecated and will be removed in a future release.
mysql-9.1.0> insert into t123 values (1+2);
Query OK, 1 row affected (0.01 sec)

And code like this:

                switch e := ev.Event.(type) {
                case *replication.QueryEvent:
                        if bytes.Compare(e.Query, []byte("BEGIN")) == 0 {
                                continue
                        }
                        fmt.Printf("Query: %s\n", e.Query)
Query: insert into t123 values (1+2)

So if you are using binlog_format = 'STATEMENT' or binlog_format = 'MIXED' you would see QueryEvent with the query for DML operations. With MIXED some operations would be in RowsEvent. And with ROW all would be in RowsEvent. Note that for DDL operations you would see QueryEvent regardless of the binlog_format.

So I think we already support the MIXED format well enough in go-mysql. I would still recommend to use ROW instead.

If no further comments than I'll close this issue.

@dveeden
Copy link
Collaborator

dveeden commented Nov 22, 2024

To make things even more complex:

If you set binlog_rows_query_log_events=ON in MySQL and have binlog_format=ROW you get the RowsEvent as expected, but you also get a RowsQueryEvent that hold the original query that was linked to the RowsEvent.

=== RowsQueryEvent ===
Date: 2024-11-22 11:27:53
Log position: 458
Event size: 59
Query: insert into t123 values (1+2+3+4+5)

The linked RowsEvent has this for the Rows: [[15]]. This can be useful to for example see any comments in the query that can be used to trace back some process in the application. Here you can see that it gives a clue about how the value of 15 was created by adding up 5 numbers.

@dveeden
Copy link
Collaborator

dveeden commented Nov 22, 2024

@lance6716 maybe we should add some recommendations in the docs:

binlog_format = ROW
binlog_row_image = full

And maybe binlog_row_metadata=FULL?

Do we support binlog_transaction_compression=ON?

I assume we only support binlog_encryption = OFF?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants