-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL logic error when schemas out of sync #261
Comments
That's a great question. Schema synchronization has been largely left to users. At Fly: We have a repository containing Corrosion-related configuration that we deploy independently. Usually this is a multi-step process, similar to a migration for a normal database:
If we do this properly, every node has the same schema and there are no errors. If there are no errors, that's ok because nodes will keep trying to synchronize versions they haven't been able to apply. It's not perfect and a bit tedious... if you want to distribute the schema automatically via Corrosion, that sounds doable in the way you've described it, but a little scary sometimes. Automatic schema propagation was something we decided not to do to keep more explicit control of what happens to the schema. |
Very interesting subject area. Sometimes called Schema Evolution. Reminds me of Protocol Buffers, and the pattern of never doing a delete or rename, but instead always creating a new field. Then you have a little transformation engine that watches the Schema, and maps things into the Fields correctly. This mapper has to be Runtime, not designed time based, so that the Transformation changes when the Schema changes. I use Benthos for that. CDC of Schema changes is useful, because it can at least tell you "something is going to break". You can then compensate at Design time, and anything that is dependant on that Schema. If the things that are dependent on the Schema are generated off the Schema, that a Compensation technique. If not, you can at least design to know what is "bound" to your Schema, and tell them. Sort of like how Github Dependant helps you know that a package has changed that your are importing. |
@gedw99 Can you please elaborate or provide some examples on how you use Benthos for managing transformations? |
im a bit time short, but benthos has great docs for this and its very easy. you can make custom processors in Benthos to transform anything flowing through the systems as a "stream". https://github.com/redpanda-data/benthos https://github.com/akhenakh/geo-benthos is a example. In the case of a DB, JSONSchema is nice and reflective off some JSON, which you , I think, have on you API coming out. https://docs.redpanda.com/redpanda-connect/components/processors/json_schema/ |
When DB schema is changed (e.g. a new column is added) on node A and not on node B, inserts/updates from node A using new schema fail to apply on node B:
A similar error occurs when trying to process a change for a new table that doesn't exist on node B:
This behavior seems reasonable to prevent data loss. However, it requires every node in the cluster to run the same schema version.
To comply with that requirement, what should be the process for deploying schema changes to the cluster? One approach could be having another distributed table that would store the latest schema along with a separate process that would watch the latest schema and synchronize it to
db.schema_paths
followed by executingcorrosion reload
.Is there a better way to handle this? How do you manage schema changes at Fly?
The text was updated successfully, but these errors were encountered: