feat: implement transaction identifiers - continued #2539

Merged
merged 12 commits into from
May 26, 2024

Conversation

roeap
Collaborator

@roeap roeap commented May 25, 2024

Description

This is based on @Blajda's work in #2327 and aims to revive transaction identifiers.

This PR elaborates a bit on the ReplayVisitor concept introduced by David. Specifically, we moved things "one level down" so that they are tracked on the eager snapshot. This was mainly required to make it work with the commit flow and to properly update the state after commits, without piping the visitors through every code path.

The nice thing about this is that we can isolate the mechanics of when and how we track additional actions in the EagerSnapshot, and expose an interface that looks like what we might get for kernel, i.e. some opaque iterator over the respective actions.
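
As a rough illustration of that idea, here is a minimal, std-only sketch. `EagerSnapshotSketch`, its fields, `advance`, and `transactions` are hypothetical names invented for this example and are not the actual delta-rs types or signatures; the point is only that the snapshot owns the tracking and callers see an opaque iterator.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a `txn` action (not the delta-rs type).
#[derive(Debug, Clone, PartialEq)]
pub struct Transaction {
    pub app_id: String,
    pub version: i64,
}

/// Hypothetical snapshot that tracks extra actions internally.
pub struct EagerSnapshotSketch {
    pub version: i64,
    // How the actions are stored is an implementation detail of the snapshot.
    tracked_txns: HashMap<String, Transaction>,
}

impl EagerSnapshotSketch {
    pub fn new() -> Self {
        Self { version: 0, tracked_txns: HashMap::new() }
    }

    /// Update the state after a successful commit. Actions from the newer
    /// commit replace any older entry for the same application.
    pub fn advance(&mut self, new_version: i64, actions: &[Transaction]) {
        for txn in actions {
            self.tracked_txns.insert(txn.app_id.clone(), txn.clone());
        }
        self.version = new_version;
    }

    /// Opaque iterator over the tracked transactions; callers never see
    /// the underlying map.
    pub fn transactions(&self) -> impl Iterator<Item = &Transaction> + '_ {
        self.tracked_txns.values()
    }
}
```

Because the commit path only needs to call something like `advance`, no visitor has to be threaded through the callers in this sketch.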

Related Issue(s)

closes #2130

Documentation

@github-actions github-actions bot added the binding/rust Issues for the Rust crate label May 25, 2024
@github-actions github-actions bot added the binding/python Issues for the Python package label May 26, 2024
@roeap roeap marked this pull request as ready for review May 26, 2024 15:03
@roeap roeap requested a review from Blajda May 26, 2024 15:03
Collaborator Author

@roeap roeap left a comment

After this PR, there is very little left that the DeltaTableState is actually doing. Most methods are just proxies to the methods on EagerSnapshot, which makes sense, since that also represents the state 😆.

After this, and after getting the first kernel version in (#2495), I hope that we can finally drop the DeltaTableState ...

Comment on lines -283 to +304
-    /// Convert actions to their json representation
-    pub fn log_entry_from_actions<'a>(
-        actions: impl IntoIterator<Item = &'a Action>,
-    ) -> Result<String, TransactionError> {
+    /// Obtain the byte representation of the commit.
+    pub fn get_bytes(&self) -> Result<bytes::Bytes, TransactionError> {
         let mut jsons = Vec::<String>::new();
-        for action in actions {
+        for action in &self.actions {
             let json = serde_json::to_string(action)
                 .map_err(|e| TransactionError::SerializeLogJson { json_err: e })?;
             jsons.push(json);
         }
-        Ok(jsons.join("\n"))
-    }
-
-    /// Obtain the byte representation of the commit.
-    pub fn get_bytes(&self) -> Result<bytes::Bytes, TransactionError> {
-        // Data MUST be read from the passed `CommitData`. Don't add data that is not sourced from there.
-        let actions = &self.actions;
-        Ok(bytes::Bytes::from(Self::log_entry_from_actions(actions)?))
+        Ok(bytes::Bytes::from(jsons.join("\n")))
Collaborator Author

Now that I grok the commit logic a bit, it seems this was only separated for legacy reasons, and log_entry_from_actions was only ever supposed to be used with the data in CommitData.

Comment on lines -410 to +443
-    ) -> Result<PreCommit<'a>, CommitBuilderError> {
-        let data = CommitData::new(self.actions, operation, self.app_metadata)?;
-        Ok(PreCommit {
+    ) -> PreCommit<'a> {
+        let data = CommitData::new(
+            self.actions,
+            operation,
+            self.app_metadata,
+            self.app_transaction,
+        );
+        PreCommit {
             log_store,
             table_data,
             max_retries: self.max_retries,
             data,
             post_commit_hook: self.post_commit_hook,
-        })
+        }
Collaborator Author

This did not need to be fallible. Many of the touched files are just updates that follow from this change.

@rtyler rtyler merged commit e52bd29 into delta-io:main May 26, 2024
24 of 25 checks passed
Comment on lines +78 to +85
+    if self.app_transaction_version.contains_key(app_id) {
+        continue;
+    }
+    self.app_transaction_version.insert(
+        app_id.to_owned(),
+        Transaction {
+            app_id: app_id.into(),
+            version: ex::read_primitive(version, idx)?,
Contributor

Sorry for a drive-by comment, but does this mean this will only keep the first version, since we continue if the hash map already has the key?

Contributor

Sorry, test_app_txn_visitor answered this in the affirmative.

Is that the correct behavior, though? The Delta protocol says

Delta only ensures that the latest version for a given appId is available in the table snapshot.

Collaborator Author

The replay visits commits from latest to oldest. By keeping the first one we encounter, we are in fact only showing the latest.
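
To make the ordering argument concrete, here is a minimal, std-only sketch of "first seen wins" during a newest-to-oldest replay. The `Transaction` struct, `AppTransactionVisitor`, `visit_commit`, and `latest_versions` below are simplified, hypothetical stand-ins written for this example; they do not reproduce the visitor in this PR, which reads from Arrow arrays.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a `txn` action (not the delta-rs type).
#[derive(Debug, Clone, PartialEq)]
pub struct Transaction {
    pub app_id: String,
    pub version: i64,
}

/// Hypothetical visitor that keeps one transaction per application.
#[derive(Default)]
pub struct AppTransactionVisitor {
    pub app_transaction_version: HashMap<String, Transaction>,
}

impl AppTransactionVisitor {
    /// Visit the `txn` actions of one commit. Assumes commits are visited
    /// newest first, so an app id that is already present must not be
    /// overwritten by an older entry.
    pub fn visit_commit(&mut self, txns: &[Transaction]) {
        for txn in txns {
            if self.app_transaction_version.contains_key(&txn.app_id) {
                continue;
            }
            self.app_transaction_version
                .insert(txn.app_id.clone(), txn.clone());
        }
    }
}

/// Replay commits (ordered newest to oldest) and return the version kept
/// for each application.
pub fn latest_versions(commits_newest_first: &[Vec<Transaction>]) -> HashMap<String, i64> {
    let mut visitor = AppTransactionVisitor::default();
    for commit in commits_newest_first {
        visitor.visit_commit(commit);
    }
    visitor
        .app_transaction_version
        .into_iter()
        .map(|(app_id, txn)| (app_id, txn.version))
        .collect()
}
```

If the replay order were oldest to newest instead, the same `continue` would keep the oldest version, so the correctness of this approach depends entirely on the visiting order.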

Contributor

Oh I see! Thanks!

/// actions.
pub fn app_transaction_version(&self) -> &HashMap<String, i64> {
&self.app_transaction_version
/// HashMap containing the last transaction stored for every application.

This comment was marked as resolved.

Development

Successfully merging this pull request may close these issues.

Transaction Identifiers
4 participants