nom-sql: Add parsing for EXTRACT built-in #1261

Open
wants to merge 69 commits into
base: Id1465b9c660965c898932c783f5e47064a50a4b7

Conversation

ethan-readyset

This commit adds parsing for the built-in `EXTRACT` function. This
function is present in both MySQL and PostgreSQL, but the supported
fields differ between the two databases. To keep things simple and
scoped, only support for the PostgreSQL fields has been added.
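
As a rough illustration of the grammar being parsed (a minimal nom 7 sketch with hypothetical types, not the actual nom-sql code, which has its own AST, whitespace combinators, and full field list):

```rust
use nom::branch::alt;
use nom::bytes::complete::{tag_no_case, take_while1};
use nom::character::complete::{char, multispace0, multispace1};
use nom::sequence::tuple;
use nom::IResult;

#[derive(Debug, PartialEq, Eq)]
pub struct Extract<'a> {
    pub field: &'a str,
    pub expr: &'a str,
}

/// Parses `EXTRACT(<field> FROM <expr>)`, e.g. `EXTRACT(YEAR FROM created_at)`.
pub fn extract(i: &str) -> IResult<&str, Extract<'_>> {
    // A few of the PostgreSQL fields; the real list is longer.
    let field = alt((
        tag_no_case("century"),
        tag_no_case("epoch"),
        tag_no_case("year"),
        tag_no_case("month"),
        tag_no_case("day"),
        tag_no_case("hour"),
        tag_no_case("minute"),
        tag_no_case("second"),
    ));
    let (i, (_, _, _, _, field, _, _, _, expr, _)) = tuple((
        tag_no_case("extract"),
        multispace0,
        char('('),
        multispace0,
        field,
        multispace1,
        tag_no_case("from"),
        multispace1,
        take_while1(|c: char| c != ')'),
        char(')'),
    ))(i)?;
    Ok((i, Extract { field, expr: expr.trim_end() }))
}
```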

altmannmarcelo and others added 30 commits May 8, 2024 13:02
As part of the MySQL 8.4 release, the Master/Slave terminology has been
replaced. During snapshot, we issue SHOW MASTER STATUS to gather the
current binlog position. This command has been replaced with SHOW
BINARY LOG STATUS by mysql/mysql-server@6e2c577 . Unfortunately, the
new terminology is not available in the 8.0 series, so we need to check
the server version and conditionally adjust the query we issue.

Also adjusted the checksum query to use the new source_binlog_checksum
terminology, which is compatible with both 8.0 and 8.4.
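
A minimal sketch of the version check, assuming a hypothetical helper that receives the parsed server version (the real replicator has its own version handling):

```rust
// Pick the binlog status statement based on the MySQL server version.
fn binlog_status_query(major: u16, minor: u16) -> &'static str {
    if (major, minor) >= (8, 4) {
        // New terminology, only understood by 8.4+.
        "SHOW BINARY LOG STATUS"
    } else {
        // Still required on the 8.0 series.
        "SHOW MASTER STATUS"
    }
}
```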

Ref: REA-4374
Closes: #1253

Release-Note-Core: Adjusted replicator terminology to be compatible
   with MySQL 8.4.

Change-Id: I2a57d07ef1a4a426efce3e1989d2d0e3436b6d52
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7449
Tested-by: Buildkite CI
Reviewed-by: Jason Brown <[email protected]>
Since there is no `i24/u24` native to Rust, we use `u32` and do some
manual bounds checking when converting from other types.

We also introduce a MySQL-specific logictest subdirectory, for tests
(such as the one included here) which we never expect to run against
PostgreSQL. Likely some existing tests should be moved there.
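
A minimal sketch of that bounds checking, with hypothetical helper names:

```rust
// MEDIUMINT is 24-bit; values ride in a wider Rust type and are range-checked.
const MEDIUMINT_MIN: i32 = -(1 << 23); // -8_388_608
const MEDIUMINT_MAX: i32 = (1 << 23) - 1; // 8_388_607
const MEDIUMINT_UNSIGNED_MAX: u32 = (1 << 24) - 1; // 16_777_215

fn to_mediumint(v: i64) -> Option<i32> {
    (i64::from(MEDIUMINT_MIN)..=i64::from(MEDIUMINT_MAX))
        .contains(&v)
        .then(|| v as i32)
}

fn to_mediumint_unsigned(v: u64) -> Option<u32> {
    (v <= u64::from(MEDIUMINT_UNSIGNED_MAX)).then(|| v as u32)
}
```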

Release-Note-Core: Added support for MySQL's `MEDIUMINT` column type.
Fixes: REA-4285
Change-Id: I4530093ea029957dc4c8b32ab6b56a47cce177ca
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7461
Tested-by: Buildkite CI
Reviewed-by: Jason Brown <[email protected]>
Allow clustertests to set the `--enable-experimental-post-lookup`
flag.

Change-Id: Ia6b485922f6097154a1c4134f0aae827f4b5b0eb
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7415
Tested-by: Buildkite CI
Reviewed-by: Ethan Donowitz <[email protected]>
If you have an empty directory under `benches_data/<db>`, you will
get a confusing error message like this:

```
No schema for benchmark {name}
```

It ends up being that you don't have a schema file (suffixed with
`.sql`) in one of the subdirectories. The error message fails to insert
the bench test name (from its folder name) and is thus unclear. This CL
just fixes that error message a bit.

Change-Id: I2f85bc43d5ce986894233bff855392fc1345b9d6
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7469
Tested-by: Buildkite CI
Reviewed-by: Ethan Donowitz <[email protected]>
Short circuit some post-lookup operations if the results contain only
a single key (possibly with multiple rows), or a single key with a
single row.

Change-Id: I6698188a7aeb2a1a575896a38387cc77225fe9e7
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7470
Tested-by: Buildkite CI
Reviewed-by: Ethan Donowitz <[email protected]>
Allow parameterization of a WHERE IN clause when the SELECT
contains aggregations. The aggregation across the keys from the
WHERE IN still must execute as a post-lookup operation.

Release-Note-Core: Allow parameterizing WHERE IN
  clauses when the query contains aggregations.
Change-Id: Iaf28fb394c4964e5d7e9869b3741fc2017c492d5
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7399
Tested-by: Buildkite CI
Reviewed-by: Ethan Donowitz <[email protected]>
If we are no longer replicating a table, e.g. due to the presence of
`--replication-tables-ignore`, we will register it as a non-replicated
relation on the controller. We will then later try to drop the base
table, but will instead only drop the non-replicated relation
registration, leaving the base table around. Additionally, if the table
was only partially snapshotted, we could later error when trying to
retrieve its replication offset after snapshotting has supposedly
finished (even though we haven't been replicating that table). With this
change, we drop leftover tables before registering non-replicated
relations, so that attempting to drop the base table actually does so.

Fixes: REA-3770
Change-Id: Ie98eccd888f3c250d18b72a799d0dcfb5622a872
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7472
Tested-by: Buildkite CI
Reviewed-by: Ethan Donowitz <[email protected]>
The MySQL min and max positions can be far apart. This happens on
unbalanced workloads where, for example, one of the tables does not
receive updates for a long period of time, like a config table that
might be static. This causes the MySQL min position to fall behind
the max position; in case of a restart, the replicator has to catch
up starting from the min position, which causes a lot of unnecessary
data to be re-streamed.
Currently we only update the min position when MySQL rotates the
binary log and we receive an EventType::ROTATE_EVENT.
Adjusting the position on all tables might be costly if the
installation has a lot of tables.

This commit adjusts the replicator to update table positions on a fixed
interval, hardcoded to 10 seconds.
The event we act on is either EventType::QueryEvent when we
receive a "COMMIT" query, or EventType::XidEvent. They are
virtually the same thing (a commit), but depending on the storage
engine MySQL reports either a COMMIT query or an XID event.
We also report the position once we have finished the initial catch-up
phase.
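
A minimal sketch of the interval-based reporting, with hypothetical names:

```rust
use std::time::{Duration, Instant};

// Whenever a commit is observed (a QUERY_EVENT containing "COMMIT" or an
// XID_EVENT), positions are reported only if the fixed interval has elapsed.
const POSITION_REPORT_INTERVAL: Duration = Duration::from_secs(10);

struct PositionReporter {
    last_report: Instant,
}

impl PositionReporter {
    fn on_commit(&mut self, report_positions: impl FnOnce()) {
        if self.last_report.elapsed() >= POSITION_REPORT_INTERVAL {
            report_positions();
            self.last_report = Instant::now();
        }
    }
}
```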

Ref: REA-4326
Closes: #1223

Release-Note-Core: Adjusted the MySQL replicator to report table positions
on a fixed interval (10 seconds). This keeps the distance between the min
and max positions short, which is useful when Readyset restarts, ensuring
that we do not have to re-stream a lot of binary logs to catch up.

Change-Id: I6dfaf523b8851597a6a0fd97f4d4627ca2f4ea80
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7363
Tested-by: Buildkite CI
Reviewed-by: Ethan Donowitz <[email protected]>
Since MySQL binlogs `TRUNCATE` as a statement (`QUERY_EVENT`), but it's
not DDL and doesn't have a corresponding recipe `Change`, we were just
ignoring it. Now the MySQL replicator parses it, emitting the
`TableOperation::Truncate` we had already added for Postgres.
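
A hedged sketch of the detection step (the real code feeds the statement through the SQL parser rather than matching a prefix):

```rust
// TRUNCATE arrives as the statement text of a QUERY_EVENT, so the replicator
// can recognize it and emit the existing TableOperation::Truncate.
fn is_truncate_statement(query: &str) -> bool {
    query
        .trim_start()
        .get(..8)
        .map_or(false, |kw| kw.eq_ignore_ascii_case("TRUNCATE"))
}
```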

Fixes: REA-4325
Closes: #1221
Release-Note-Core: Added support for `TRUNCATE TABLE` statements for MySQL.
Change-Id: Ia40551e40fa70598973587f5b26e8662419e9853
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7488
Tested-by: Buildkite CI
Reviewed-by: Marcelo Altmann <[email protected]>
Similar to how we split up the postgres logictests, split up the mysql
logictests into numbered subfolders so we can run them in parallel to
reduce build times.

Change-Id: I4bb088b00f2f1e6f43c7791f9517d63e27c93a22
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7477
Reviewed-by: Michael Zink <[email protected]>
Tested-by: Buildkite CI
Now that `8c9bc345a6a42f071bf1b621047f840eb9b31379` is committed, most
of the logictests that were under the `out-of-scope/ENG-629` directory
can be moved out so they execute with everything else. There are still
some failing tests, but those are almost all related to `DISTINCT`,
which isn't supported for WHERE IN combined with aggregates.

Change-Id: I181694eacf94a3cc04c46e5d8a16e479004ac361
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7478
Tested-by: Buildkite CI
Reviewed-by: Ethan Donowitz <[email protected]>
Reviewed-by: Michael Zink <[email protected]>
Lifts most of the restriction of
`ff819dc0d9cede2af458e17a30830f2a1e843821`, which prevented WHERE IN
and aggregates from being generated for the same query. After
`8c9bc345a6a42f071bf1b621047f840eb9b31379`, we can generate most
aggregations alongside `IN` clauses, but we still need to disallow
`DISTINCT`, either as a plain modifier on a column or as a modifier on
certain aggregation functions (`sum`, `count`).

Change-Id: I76ac3573d864e39a8d3b91a6155fc32914332dce
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7479
Tested-by: Buildkite CI
Reviewed-by: Ethan Donowitz <[email protected]>
Previously, the views synchronizer only checked the server for views for
queries that were in the "pending" state. This meant that if the
migration handler set a query's state to "dry run succeeded" before the
views synchronizer had a chance to check the server for a view, the query
would be stuck in the "dry run succeeded" state forever, even if a view
for the query did indeed exist already.

This commit fixes the issue by having the views synchronizer check the
server for views for queries in *either* the "pending" or "dry run
succeeded" states. In order to prevent the views synchronizer from
rechecking every query with status "dry run succeeded" over and over
again, a "cache" has been added to the views synchronizer to keep track
of which queries have already been checked.

While working on this, I also noticed that it was possible for the
following sequence of events to occur:

- Migration handler sees that a query is pending and kicks off a dry run
  migration
- Views synchronizer finds a view on the server for the same query and
  sets the status to "successful"
- Migration handler finishes the dry run migration for the query and
  overwrites the status as "dry run succeeded"

This could lead to a situation where a query that was previously
(correctly) labeled as "successful" is moved back to the "dry run
succeeded" state. To fix the issue, this commit updates the migration
handler to only write the "dry run succeeded" status if the query's
status is still "pending" after the dry run is completed.
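
A rough sketch of the new behaviour, with illustrative names rather than the actual readyset types:

```rust
use std::collections::HashSet;

// The synchronizer now considers queries in either the pending or
// dry-run-succeeded state, and remembers which ones it has already checked so
// they are not re-queried forever.
#[derive(Clone, Copy, PartialEq, Eq)]
enum QueryState {
    Pending,
    DryRunSucceeded,
    Successful,
}

struct ViewsSynchronizer {
    already_checked: HashSet<String>,
}

impl ViewsSynchronizer {
    fn needs_view_check(&self, query_id: &str, state: QueryState) -> bool {
        matches!(state, QueryState::Pending | QueryState::DryRunSucceeded)
            && !self.already_checked.contains(query_id)
    }

    fn mark_checked(&mut self, query_id: String) {
        self.already_checked.insert(query_id);
    }
}
```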

Release-Note-Core: Fixed a bug where queries that already had caches
  were sometimes stuck in the `SHOW PROXIED QUERIES` list
Change-Id: Ie5faa100158fc80c906d8ad5cb897d8a02a07be9
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7442
Tested-by: Buildkite CI
Reviewed-by: Jason Brown <[email protected]>
This reverts commit bde05974330a69526f06f1fbcafab925064cd659.

Change-Id: I56b71ed96e508ac617579ca1d0e181a1387671f1
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7396
Tested-by: Buildkite CI
Reviewed-by: Jason Brown <[email protected]>
This reverts commit 337df377be353ebd4f0fa548f1301997ba7d3e28.

Change-Id: I73174d2aa27cb077941eab13ab1b613a6e6a4a07
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7397
Tested-by: Buildkite CI
Reviewed-by: Jason Brown <[email protected]>
Native async traits were stabilized as of Rust 1.75, so we no longer need
the async_trait crate in many situations. This commit replaces the 3rd
party crate with the native version everywhere we can. The areas of the
code that still require the 3rd party crate include:

- Any trait that is used as a trait object (this is not supported
  natively by Rust yet)
- Certain traits that returned lifetime errors when attempting to remove
  the `#[async_trait]` macro (these errors included a message that said
  the error was a known limitation and would be removed in the future)
- The trait in proptest-stateful, whose interface I didn't want to change
  without further discussion, since it's a publicly-available trait
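
A minimal before/after illustration with a hypothetical trait (not one from the codebase):

```rust
// On Rust 1.75+, `async fn` can appear directly in a trait definition, so
// #[async_trait] is only needed where the trait must be used as a trait object.
trait Lookup {
    // Previously this required the async_trait crate's macro.
    async fn lookup(&self, key: &str) -> Option<String>;
}

struct InMemory(std::collections::HashMap<String, String>);

impl Lookup for InMemory {
    async fn lookup(&self, key: &str) -> Option<String> {
        self.0.get(key).cloned()
    }
}
```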

Change-Id: I5c761c075966e4fcebbb6d4955608107cf871b7c
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7375
Tested-by: Buildkite CI
Reviewed-by: Jason Brown <[email protected]>
`OnceCell` was added to the Rust standard library a while back, so it is
no longer necessary to use the 3rd party crate. This commit removes the
crate, replacing our only usage of `OnceCell` with `OnceLock`, which is
a thread-safe alternative.
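
A minimal illustration with a hypothetical call site:

```rust
use std::sync::OnceLock;

// std's OnceLock provides the same lazily-initialized, thread-safe one-time
// storage that the once_cell crate was used for.
static GREETING: OnceLock<String> = OnceLock::new();

fn greeting() -> &'static str {
    GREETING.get_or_init(|| "hello".to_owned())
}
```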

Change-Id: Ifdb622c34c24ff40836276e25d2db8c33a2694df
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7376
Tested-by: Buildkite CI
Reviewed-by: Jason Brown <[email protected]>
This commit removes some unused dependencies as reported by `cargo udeps`.

Change-Id: Iebcf5c662b392f2825232cc75a587d770105bfb0
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7377
Tested-by: Buildkite CI
Reviewed-by: Jason Brown <[email protected]>
57872b449 fixed a couple of bugs present in the views synchronizer. As
part of that work, a new "cache" was added to the views synchronizer to
keep track of which queries in the query status cache have already had
their views synchronized with the server. However, this commit
also introduced a bug: instead of looking for views for the queries that
we *haven't* yet checked, the views synchronizer was looking for views
only for the queries for which we *already synchronized views*.

This commit fixes the issue by reversing the boolean logic.

Change-Id: Ic63d4deac400cdca23b3c2f5517d7209351af625
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7516
Reviewed-by: Jason Brown <[email protected]>
Tested-by: Buildkite CI
This reverts commit 1cde153ceafb59901ad133317b85d357573cf2df.

Reason for revert: While this CL was attempting to address a
reasonable concern (under-eviction), it unfortunately goes too far in the
opposite direction and over-evicts, as well as unnecessarily burning
through additional CPU cycles.

As background, when we determine we need to evict some bytes in
`do_eviction()`, we send messages to domains requesting them to
evict some number of bytes. Those messages are sent _asynchronously_,
and any reciprocating updates to the domain sizes are received
_asynchronously_.

This CL introduced a loop around that core eviction functionality,
presumably assuming that eviction is synchronous. As there is nothing
blocking or delaying each iteration of the loop, it would hammer away
sending async eviction messages until the domain sizes fell below the
threshold, but because we sent more eviction requests than necessary,
we over-evicted.

This is compounded by the call to `MemoryTracker::allocated_bytes()`
on each loop iteration. That function must update all jemalloc stats
by updating an epoch value inside of jemalloc, which turns out to be an
expensive operation.

Change-Id: Id7cc5dec6da388d0ec7876e1e3259e2398272ca6
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7522
Tested-by: Buildkite CI
Reviewed-by: Ethan Donowitz <[email protected]>
This commit upgrades many of our dependencies to newer versions. Note
that this was not just the result of a `cargo update` invocation; I used
the `cargo-edit` tool to automate the process of upgrading crate
versions in our Cargo.toml (I also ran `cargo update` afterwards for
good measure). The code changes in this commit reflect breaking API
changes in the new package versions.

Change-Id: Ib15333b66c6bba2e3eb4a302ea85c3a03ab0acf5
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7378
Tested-by: Buildkite CI
Reviewed-by: Jason Brown <[email protected]>
The linux release packages get their version number from the readyset
version number in public/readyset/Cargo.toml.

Change-Id: I445f45581cae2da5854c475951473c9c6e344196
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7531
Tested-by: Buildkite CI
Reviewed-by: Marcelo Altmann <[email protected]>
Allow straddled joins in system-benchmark testing.

Change-Id: I4b7e846b543711cac786e9382673d972e949efc2
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7526
Tested-by: Buildkite CI
Reviewed-by: Michael Zink <[email protected]>
Reviewed-by: Ethan Donowitz <[email protected]>
This commit updates the Rust version of the public ECR library image we
use for our xtask crate and for our cargo-deny docker image builder. The
version is updated from 1.74 to 1.78.

Change-Id: Idad4f0c1727b5bc37c1f008281e450e8f63fa24d
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7447
Tested-by: Buildkite CI
Reviewed-by: Jason Brown <[email protected]>
This commit updates the cargo-deny version we use from 0.13.17 to
0.14.23.

Change-Id: I46039b68fa1f8fa02e08b5789a1151ca77314b35
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7448
Tested-by: Buildkite CI
Reviewed-by: Jason Brown <[email protected]>
This commit updates the Rust toolchain being used in both public/ and
crates/ to nightly 2024-05-02, which corresponds with the stable release
of 1.78 on the same date.

Change-Id: I56ea0995b899ce657b47bb42c6d2bef219db2516
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7439
Tested-by: Buildkite CI
Reviewed-by: Jason Brown <[email protected]>
- This ticket is for syntactic support of the CREATE DATABASE statement,
without processing it. This avoids a confusing error message when such a
statement is issued.

Fixes: REA-4244
Change-Id: I1632d395c8388963f7567e8b8d74fcda1d8886a4
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7520
Tested-by: Buildkite CI
Reviewed-by: Ethan Donowitz <[email protected]>
Update mysql_common to 0.32.3 in order to get the new collation data
dictionary support.

As part of this update, we also need to downgrade the versions of
sqlformatter and prometheus-parse, as they use itertools v0.12.1, which
is currently incompatible with criterion.rs and causes cargo check to
fail. We should be able to update them again when a release with
[1] is published.

[1]: bheisler/criterion.rs#743

Ref: REA-4382
Closes: #1258

Change-Id: I36614184b749c96c0046c88ca5e1c6a2d186eff6
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7505
Tested-by: Buildkite CI
Reviewed-by: Jason Brown <[email protected]>
When trying to coerce a CHAR/VARCHAR column we need to check the value
length in characters, not in bytes. This is because the field length
is declared in characters, not in bytes.
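
A small illustration of the distinction, using a hypothetical helper:

```rust
// The declared length of CHAR/VARCHAR is in characters, so a value must be
// measured with chars(), not with len(), which counts bytes.
fn fits_declared_length(value: &str, declared_chars: usize) -> bool {
    value.chars().count() <= declared_chars
}

fn main() {
    // "héllo" is 5 characters but 6 bytes in UTF-8.
    assert!(fits_declared_length("héllo", 5));
    assert_eq!("héllo".len(), 6);
}
```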

Ref: REA-4383 REA-4366
Change-Id: I0cce0c68370512272bd3da67ca4ce7b08b662c3f
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7509
Tested-by: Buildkite CI
Reviewed-by: Ethan Donowitz <[email protected]>
This commit adds proper collation support for CHAR and BINARY columns
in MySQL.
CHAR columns should be right-padded with spaces to the column length
when stored, and BINARY columns should be right-padded with zeros.

This commit fixes the issue at snapshot time: during snapshot we do a
logical dump of data, and MySQL removes padding spaces from CHAR columns
when retrieving them, so we need to take the column collation into
consideration when storing them. One gotcha is ENUM/SET columns:
they are retrieved as strings (MYSQL_TYPE_STRING), but we should not
pad them.
During CDC, we need to retrieve proper metadata from the TME in order
to validate whether padding is necessary.

This commit also fixes an issue when storing BINARY columns. We were
storing them as TinyText/Text if the binary representation of the
column was a valid UTF-8 string. This is not correct. We should store
them as ByteArray.

Test cases were written with a mix of characters of different byte
widths, e.g. mixing ASCII with 2- and 3-byte UTF-8 characters.

Note: MySQL uses the terms charset and collation interchangeably.
In the end everything is stored as a collation ID, which can be used to
determine both the charset and the collation.
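
A hedged sketch of the padding rules, with hypothetical helpers:

```rust
// CHAR values are right-padded with spaces to the declared column length
// (measured in characters), BINARY values with zero bytes; ENUM/SET values
// must be left alone even though they arrive as strings.
fn pad_char(value: &str, col_chars: usize) -> String {
    let missing = col_chars.saturating_sub(value.chars().count());
    let mut padded = String::with_capacity(value.len() + missing);
    padded.push_str(value);
    padded.extend(std::iter::repeat(' ').take(missing));
    padded
}

fn pad_binary(value: &[u8], col_len: usize) -> Vec<u8> {
    let mut padded = value.to_vec();
    if padded.len() < col_len {
        padded.resize(col_len, 0);
    }
    padded
}
```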

Ref: REA-4366
Ref: REA-4383
Closes: #1247 #1259

Release-Note-Core: Added collation support for storing CHAR and BINARY
   columns in MySQL using the correct padding. This fixes an issue when
   looking up CHAR/BINARY columns with values that do not match the
   column length.

Change-Id: Ibb436b99b46500f940efe79d06d86494bfc4bf30
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7510
Tested-by: Buildkite CI
Reviewed-by: Michael Zink <[email protected]>
rs-sac and others added 28 commits June 21, 2024 16:01
This warning, printed during every single build, is just noise that
tells engineers that they are engineers, which they likely already
know.  If CI wants to set the variable, that's cool, but the default
should be tailored for humans, not for machines.

Change-Id: I273b2796a9974cc874ceedc4713fba5f565337ca
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7623
Tested-by: Buildkite CI
Reviewed-by: Jason Brown <[email protected]>
We were doing extraneous padding every time we turned a `BitVec` into
bytes, resulting in incorrect predication on caches created with
parameters on `BIT` columns.

We also improve bitvec resultset type support in logictests and add a
test that previously failed on Postgres. (The equivalent MySQL test is
still failing due to REA-3381.)

Change-Id: I85fcf99449a14e9ddfc9e82020e08183cb552fd6
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7587
Tested-by: Buildkite CI
Reviewed-by: Marcelo Altmann <[email protected]>
We now handle doubled quotation marks, both single and double (MySQL
only).  We add several tests of the various combinations of these
things, including table comments, which were already working.

Release-Note-Core: Correctly handle escaped quotes in table column
  comments.
Fixes: REA-4446
Change-Id: I40d56e5b01880a182db1cc73b2e7e6fd6ff0ebfd
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7626
Tested-by: Buildkite CI
Reviewed-by: Marcelo Altmann <[email protected]>
…rsion

- Currently, for any non-parametrized binary operations (column vs. column)
we always figure out some common datatype that the operands might be coerced
to. Prior to the fix, the temporal datatypes, CHAR/VARCHAR, BOOL, and certain
combinations of the numerical types were wrongly defaulted to type DOUBLE,
which caused issues later on. The fix adds support for the missing
datatype combinations. Note, the fix still does not handle
DECIMAL vs. DECIMAL correctly.

Fixes: REA-4440
Change-Id: I1474a36536d9a70f01c2c3089095fa9848ef2437
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7584
Tested-by: Buildkite CI
Reviewed-by: Marcelo Altmann <[email protected]>
CHAR columns are fixed-width; if the value is shorter than the column
width, it should be padded with spaces. If the column value is NULL,
we should not attempt to pad.

Fixes: REA-4476
Change-Id: Ieafd250603295f07096fcf070da5bc85034bfef2
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7633
Tested-by: Buildkite CI
Reviewed-by: Jason Brown <[email protected]>
Despite appearances, these are not integers, though of course their type
depends on dialect.  In MySQL, you get varbinary, but in Postgres, you
get bit strings.  We already had X'...', and we now add x'...' and 0x
(MySQL only).

Release-Note-Core: Handle hexadecimal x'...' and 0x... literals.
Fixes: REA-4456
Change-Id: I011790ffd13c2e792b4481bd67ef696cc168f797
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7637
Tested-by: Buildkite CI
Reviewed-by: Marcelo Altmann <[email protected]>
- This CL enables using the JSON type in the select list only,
  not in the WHERE clause

Fixes: REA-4462
Change-Id: I69b07098f0f4ea07c581045be531d6a2499ba015
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7638
Tested-by: Buildkite CI
Reviewed-by: Marcelo Altmann <[email protected]>
The test would have failed on a hash collision.

Change-Id: I05c3beaebe1533def1ede42451e3b9043518e2a5
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7636
Tested-by: Buildkite CI
Reviewed-by: Marcelo Altmann <[email protected]>
At some point, the macOS security framework changed enough such that it
apparently cannot be convinced to accept a TLS cert without a password.
That meant that some tests involving TLS were failing on macs because
our test cert had no password on it.  This update creates a new cert
with password "password", and updates the tests that use it.

Furthermore, OpenSSL 3 dropped compatibility with certain encryption
ciphers by default, meaning that pkcs12 certs created with it couldn't
be verified by the macOS security framework.  The web-recommended
solution is to run `openssl pkcs12` with the `-legacy` option.
Unfortunately, while solving the problem for macOS, this produced a cert
that was too out-of-date for OpenSSL3 on linux.  More specific cipher
selection per the Magic Incantations(tm) below generates a cert that
will pass tests on both macOS *and* Linux...  but may not be safe for
any other purpose.  Apply only to affected area.  In case of hemorrhage,
seek emergency medical help immediately.

For reference, the commands below were used to create this cert on macOS
using OpenSSL 3.3.1 installed with `homebrew`:
```
# Make a new private key
openssl genrsa -out private.key 2048
# Generate a signing request.
openssl req -new -key private.key -out cert.csr
# Generate an x509 cert from the signing request (good for 10 years)
openssl x509 -req -days 3650 -in cert.csr -signkey private.key \
   -out certificate.crt
# Export the pkcs12 file with password "password"
openssl pkcs12 -export -out certificate.p12 -inkey private.key \
   -in certificate.crt -passout pass:password \
    -keypbe PBE-SHA1-3DES -certpbe PBE-SHA1-3DES -macalg sha1
```

Change-Id: Ib6d25034f29690a94b41e4ebc1ad88add27bf777
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7640
Tested-by: Buildkite CI
Reviewed-by: Sidney Cammeresi <[email protected]>
During the resolve_schemas pass, we visit all the tables and columns
in the foreign key constraints. Currently we don't do anything with
FKs, and enforcing table/schema resolution imposes a limitation: we are
not able to snapshot tables with FKs that reference tables that are not
currently replicated.

This change adds a new method to allow visiting FKs and their columns
without enforcing schema resolution. In case the target table is
not being replicated and the FK is not provided with db.table notation,
we add a placeholder schema to the Relation.

Fixes: REA-4473
Fixes: #1289
Change-Id: I32be7c134d0d669f0a0628f980c4363f6ae24ce0
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7634
Tested-by: Buildkite CI
Reviewed-by: Jason Brown <[email protected]>
Add the db.table notation to the snapshot warning message if we fail to
extend a DDL recipe.

Fixes: REA-4474
Fixes: #1291
Change-Id: I4dd8995796985d9c6be2f753239aaa2dc92d3018
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7632
Tested-by: Buildkite CI
Reviewed-by: Jason Brown <[email protected]>
Add new datatypes to MySQL DDL vertical tests (blob and datetime).
Adjust postgresql upload artifact to match regression file name.
Add DDL vertical MySQL to nightly.

Fixes: REA-4467
Change-Id: If47ea218e71c2a90198753d247977273f505404a
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7625
Tested-by: Buildkite CI
Reviewed-by: Michael Zink <[email protected]>
Reviewed-by: Ron Hough <[email protected]>
The linux release packages get their version number from the readyset
version number in public/readyset/Cargo.toml.

Change-Id: Iede6d2c95443c9fcfce0750627658e0565680870
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7647
Reviewed-by: Marcelo Altmann <[email protected]>
Tested-by: Buildkite CI
Fix fk test to also add t_child2 to replication filters.

Change-Id: I968f04d5d2aeb5044f50e615f4895da515dfeab6
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7650
Tested-by: Buildkite CI
Reviewed-by: Vassili Zarouba <[email protected]>
Ignore the warning since this is only used in a specific build.

Change-Id: I7cdc848a2b61837909405711615928b0af6d2f58
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7630
Tested-by: Buildkite CI
Reviewed-by: Jason Brown <[email protected]>
Add auxiliary functions to extract keys from a create table statement.
This will be used in a future commit to enhance snapshot.

Change-Id: I76cf8f577d3e262954e08336c081cdd8d872df6d
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7648
Tested-by: Buildkite CI
Reviewed-by: Michael Zink <[email protected]>
Add snapshot type to mysql connector. Snapshot type defines how the
snapshot will be taken. It can be either a key based (Primary Key /
Unique Key) or a full table scan snapshot.
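
A rough sketch of what such a snapshot type can look like (illustrative names, not the actual connector code):

```rust
// Per table, the connector records whether it can walk the table by a key or
// has to fall back to a full scan.
enum SnapshotType {
    /// Snapshot in batches ordered by the primary key or a unique key.
    KeyBased { key_columns: Vec<String> },
    /// No usable key on the table: read it with a full table scan.
    FullTableScan,
}
```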

Change-Id: Id6daad9746c7ed4bd3b1fe3f76d997946e0ac322
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7649
Tested-by: Buildkite CI
Reviewed-by: Michael Zink <[email protected]>
Adjust the MySQL snapshot to use the new snapshot type. The snapshot
type checks whether the table has a primary key or unique key and uses
it to define batches to query the table, making it less intrusive,
especially for large tables. If the table does not have a primary key
or unique key, the snapshot will do a full table scan.
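
A hypothetical illustration of key-based batching for a single-column key (identifiers assumed pre-validated; the real implementation builds its queries differently):

```rust
// Each batch resumes after the last key value seen, instead of pulling the
// whole table with one SELECT *.
fn next_batch_query(table: &str, key: &str, last_seen: Option<&str>, batch: usize) -> String {
    match last_seen {
        Some(v) => format!(
            "SELECT * FROM `{table}` WHERE `{key}` > '{v}' ORDER BY `{key}` LIMIT {batch}"
        ),
        None => format!("SELECT * FROM `{table}` ORDER BY `{key}` LIMIT {batch}"),
    }
}
```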

Fixes: REA-4477
Fixes: #1303

Release-Note-Core: Enhanced the MySQL snapshot to use the primary key or a
   unique key when available. This makes the snapshot less intrusive than a
   `SELECT *` (full table scan) for large tables.

Change-Id: Iafda6ea6c74888262a0eea8bc1e880a3214b068a
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7641
Tested-by: Buildkite CI
Reviewed-by: Michael Zink <[email protected]>
Somehow we ended up copying an immutable object to pass a bunch of
temporary copies around by value.  Replace copying with fancy
reference technology.

Change-Id: Iac46dc74abe931dd4e5677be0e378c1d6599ff6a
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7654
Tested-by: Buildkite CI
Reviewed-by: Jason Brown <[email protected]>
Insufficient committing and testing of the previous patch resulted in
code that didn't actually work.  The two oversights were:  failing to
check for hex before integers and an incorrect attempt to handle
odd-length literals in Postgres, which was checking the length of the
returned bytes, not the length of the literal.

(Some of this code has been written so as to help support odd-length
literals in the future, but they don't currently work.  The first
obvious problem is that the return type here returns bytes, not bits.)

Fixes: REA-4456
Change-Id: I99715c9f5b7b8cbbf9c0a4a507c1f5ff8bdb2f0f
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7652
Tested-by: Buildkite CI
Reviewed-by: Marcelo Altmann <[email protected]>
We have our own implementation of Datetime/Timestamp called
TimestampTZ. This object has a 3-byte bitmap that gives us better
control over printing date only, timestamp with tz even if the tz is
zero, and microsecond precision even if it is zero.
Previously we were converting the TimestampTZ to NaiveDateTime and then
printing it. This was causing some issues with MySQL and datetime.
NaiveDateTime has an inner object called NaiveTime to represent time.
When printing naive time, if the microsecond is zero, it is not
printed [0]. For MySQL, if the DATETIME column has the optional
microsecond precision set, we need to print the microseconds even if
they are zero, causing a mismatch between Readyset and MySQL.

This commit changes the display object for datetime/tz columns to
TimestampTZ and implements the text and binary protocol traits for it.
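
A small illustration of the chrono behaviour referenced in [0]:

```rust
use chrono::NaiveDate;

// NaiveDateTime's Display drops a zero fractional part, which is wrong for a
// MySQL DATETIME(6) column that must always print its microseconds.
fn main() {
    let dt = NaiveDate::from_ymd_opt(2024, 7, 1)
        .unwrap()
        .and_hms_micro_opt(12, 0, 0, 0)
        .unwrap();
    println!("{dt}"); // 2024-07-01 12:00:00 (fractional part omitted)
    println!("{}", dt.format("%Y-%m-%d %H:%M:%S%.6f")); // 2024-07-01 12:00:00.000000
}
```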

Ref: REA-4490
Ref: #1309

[0]: https://github.com/chronotope/chrono/blob/v0.4.38/src/naive/time/mod.rs#L1520

Change-Id: I31301b20bebdd1bb33dbf6b79b84a8f7065dee80
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7655
Tested-by: Buildkite CI
Reviewed-by: Michael Zink <[email protected]>
This commit fixes the microsecond precision of DATETIME columns in
MySQL by setting the correct fractional seconds precision in the
TimestampTZ object.

This fixes a discrepancy between the precision of the DATETIME in
Readyset and MySQL.

Fixes: REA-4469
Fixes: #1309

Release-Note-Core: Fixes the microsecond precision of DATETIME columns
  in MySQL that sometimes were not being correctly represented in
  Readyset.

Change-Id: Ifc3bb58b16a87423a0e4079dffa34ed28fafaa35
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7656
Tested-by: Buildkite CI
Reviewed-by: Michael Zink <[email protected]>
Currently, `dataflow-state::State` has a `tear_down` function. This
was added in `7b5324d3aaa41a85f8c0380782d8e8efb53be7c7` to throw away
the state of a node when removing it from the dataflow graph. This has
only been used to remove rocksdb files from base tables when we stop
replicating them.

When we added custom WAL flushing behavior, in
b34f871bb1da26434e88534a049c45c58d20815a, code was added to shut down
the WAL flushing thread. However, it was added to the
`PersistentState::tear_down()` implementation, and thus would only be
called when the node was removed from the graph. We need that shutdown
code to run on process exit as well, or else Readyset can core dump
on normal process exit.

This CL adds a `State::shut_down` function that can be called under
normal exit situations. Machinery to actually invoke it will be added
in a followup CL.
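
A hedged sketch of the split (not the actual trait definition):

```rust
// tear_down destroys a node's persisted state when the node is removed from
// the graph, while shut_down only stops background work (such as the WAL
// flushing thread) so it can also run on normal process exit.
trait State {
    /// Delete the node's on-disk state (e.g. remove its rocksdb files).
    fn tear_down(self) -> std::io::Result<()>;

    /// Stop background work without deleting any data.
    fn shut_down(&mut self);
}
```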

Change-Id: I8f9403471b5459cfbe8ca1af4dafb6165a0d973c
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7659
Tested-by: Buildkite CI
Reviewed-by: Marcelo Altmann <[email protected]>
When a worker receives a message on the `shutdown_tx` channel, we now
plumb the event through to the new `State::shut_down` function
on each node of the dataflow graph. This allows Readyset to properly
shut down the rocksdb WAL flushing thread when a domain is killed (due
to some other error) or when the process is exiting.

This CL also raises the priority of listening for events on the
`shutdown_tx` within `Worker::run()`, by adding the `biased` tag
before that block in the `select!`. We do this in several places in
Readyset, most notably in the adapter.
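
An illustration of the `biased` pattern with a hypothetical worker loop (not the actual `Worker::run()`):

```rust
// With `biased;`, select! polls branches in source order instead of randomly,
// so the shutdown channel is always checked before more work is started.
async fn run(mut shutdown_rx: tokio::sync::broadcast::Receiver<()>) {
    loop {
        tokio::select! {
            biased;
            _ = shutdown_rx.recv() => break,
            _ = handle_next_message() => {}
        }
    }
}

async fn handle_next_message() {}
```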

Release-Note-Core: Now properly shutting down the rocksdb WAL at
  process exit.
Change-Id: Ib2d1f19fb6046b5db475d017f52528f729a4fd03
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7660
Tested-by: Buildkite CI
Reviewed-by: Marcelo Altmann <[email protected]>
MySQL Timestamps are stored as UTC, but the server returns them in the
local timezone. This commit fixes the handling of MySQL timestamps to
ensure that they are correctly converted to UTC during snapshot/
replication and converted to local time when read from the database.

This commit also adds a test to ensure that MySQL timestamps are
correctly handled during snapshot/replication by comparing the
timestamps with the upstream database.

This commit also adds timestamp support to the DDL vertical tests.
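
An illustration of the round trip with hypothetical helpers:

```rust
use chrono::{DateTime, Local, Utc};

// The value is normalized to UTC when written during snapshot/replication,
// and rendered back in local time when read.
fn store_as_utc(ts: DateTime<Local>) -> DateTime<Utc> {
    ts.with_timezone(&Utc)
}

fn read_as_local(ts: DateTime<Utc>) -> DateTime<Local> {
    ts.with_timezone(&Local)
}
```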

Fixes: REA-4469
Closes: #1279

Release-Note-Core: Fixed correctness of MySQL timestamp handling.
  Now MySQL timestamps are correctly converted to UTC during snapshot/
  replication and converted to local time when read from the database.
Change-Id: I9d50fb66a52c015de7b613d0d7e614767569075d
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7661
Tested-by: Buildkite CI
Reviewed-by: Michael Zink <[email protected]>
During the DATETIME nanosecond handling fix, the code responsible for
dealing with NULL values was not implemented for all cases. This patch
fixes the issue by adding the necessary logic to handle NULL values
correctly.

Change-Id: I093e18d1bfa414e38f05ac9c9d72c68ec0c9e0f5
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7662
Tested-by: Buildkite CI
Reviewed-by: Michael Zink <[email protected]>
Currently we only capture the live sstable sizes as part of
`PersistentState::deep_size_of()`. This CL includes the memtable sizes
in order to give a more accurate value. This is especially important
if the table is small enough to have not been flushed to disk yet.
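
A hedged sketch of the accounting change, assuming the rocksdb crate's `property_int_value` and the `rocksdb.cur-size-all-mem-tables` property (not the actual `deep_size_of` implementation):

```rust
use rocksdb::DB;

// rocksdb's memtable-size property is added on top of the live sstable sizes
// so small, not-yet-flushed tables are no longer reported as nearly zero.
fn approximate_state_size(db: &DB, live_sst_bytes: u64) -> u64 {
    let memtable_bytes = db
        .property_int_value("rocksdb.cur-size-all-mem-tables")
        .ok()
        .flatten()
        .unwrap_or(0);
    live_sst_bytes + memtable_bytes
}
```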

Release-Note-Core: More accurately report the size
  of a persistent node by including the size of
  open memtables.
Change-Id: I28a41126743866a795e33b29dc24ba9c4e77feac
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/7667
Tested-by: Buildkite CI
Reviewed-by: Marcelo Altmann <[email protected]>
This commit adds parsing for the built-in `EXTRACT` function. This
function is present in both MySQL and PostgreSQL, but the supported
fields differ between the two databases. To keep things simple and
scoped, only support for the PostgreSQL fields has been added.

Change-Id: Ic73ef858478e73b6c466695a84ddb0266d881e92
@readysetbot force-pushed the Ic73ef858478e73b6c466695a84ddb0266d881e92 branch from 1ad2d55 to b6e9453 on July 12, 2024 00:15
@CLAassistant

CLAassistant commented Jul 12, 2024

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 4 committers have signed the CLA.

✅ altmannmarcelo
❌ mvzink
❌ vassili-zarouba
❌ rs-sac
You have signed the CLA already but the status is still pending? Let us recheck it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants