Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Branching #3201

Merged
merged 17 commits into from
Nov 4, 2024
Merged

Branching #3201

merged 17 commits into from
Nov 4, 2024

Conversation

samuelstroschein
Copy link
Member

@samuelstroschein samuelstroschein commented Nov 4, 2024

Closes opral/inlang-sdk#143

Changes

  • updateBranchPointers to keep pointers moving when new changes come in
  • changeInBranch(id) filter to get changes only in a branch
  • mergeBranch({ lix, sourceBranch: Branch, targetBranch: Branch })
  • createBranch({ lix, from: Branch })
  • switchBranch
  • currentBranch

Important notes

  • de-duplicating pointers can be added later via chunking see this
  • we can add more head pointers, for example, for discussions, in the future

Notes

💡 = insight
⚠️ = API change
😍 = awesome
❓ = unclear/review wanted

  • 💡better to have a dedicated change set pointer concept to

    • build index by entity_id
    • have unique (branch_id, entity_id) check
    • distinguish between user sets and mostly automated sets of pointers that can be garbage collected
    • argument to rename change_set to change_group for user intent. both are formalized as change set
  • 💡branch is state, we could introduce a state concept to move only one pointer at a time

    • points to changes now
    • could point to discussions and more entities in the future
  • ⚠️ plugin.detectConflicts needs overhaul because targetLix, sourceLix doesn't make sense for branching

    • all changes exists in the lix
    • for merging two lix’es, it seems to make sense to create an interim branch in a transaction that has all changes and then pass the diff to plugin.detectChanges
  • 😍 plugins don't need to handle conflict detection for diverging entity changes anymore

    • lix detects diverting entity changes as conflict
    • plugin.detectConflicts is reduced to “detectEntityOverarchingConflicts
  • 😍 no more need for targetLix & sourceLix

    • merging two lix’es is creating an interim branch
    • unification of merging APIs to branch model
  • 😍 querying changes in a branch is as easy as where(changeInBranch(currentBranch.id))

  • ❓️branch_change_pointer or branch_change_head

    • same applies to discussions and other entities we might add in the future
  • ❓️any benefit of introducing a state concept?

    • branch is modeling state with pointers that move independently.

Copy link

changeset-bot bot commented Nov 4, 2024

⚠️ No Changeset found

Latest commit: 2529fcd

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR


// it is reasonable to assume that a conflict exists if the same (entity, file, type) change is updated in both branches.
// in case a plugin does not detect a conflict, the system should automatically detect it.
test("it should automatically detect a conflict if a change exists that differs updates in both branches despite having a common ancestor", async () => {
Copy link
Member Author

@samuelstroschein samuelstroschein Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a beautiful change.

plugins do not need to detect diverging entity graphs as conflict anymore. lix handles that as it's universal conflict. an analogy with git: git defines entities in text files as rows. if the same row (entity) diverges in two repos, git raises a conflict. same now applies to lix. if two entities diverge, it's a conflict. but in contrast to git, entities can be defined more fine grained e.g. "character", automatically be resolved, and semantic conflicts between entities can be raised. if row 1 and row 2 are diverging, it's not an 'entity conflict' but maybe a semantic one that a plugin can detect!

Comment on lines +155 to +166
CREATE TABLE IF NOT EXISTS branch_change_pointer (
branch_id TEXT NOT NULL,
change_id TEXT NOT NULL,
change_file_id TEXT NOT NULL,
change_entity_id TEXT NOT NULL,
change_type TEXT NOT NULL,

PRIMARY KEY(branch_id, change_file_id, change_entity_id, change_type),

FOREIGN KEY(branch_id) REFERENCES branch(id),
FOREIGN KEY(change_id, change_file_id, change_entity_id, change_type) REFERENCES change(change_id, change_file_id, change_entity_id, change_type)
) strict;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decided to not go with the change set column to

  • get foreign key validations
  • avoid "dead" pointers e.g. a change set can have no reference. a change pointer of a branch with no reference can be garbage collected
  • indexes 👀

An open question is whether branch_change_head is a better suited table name

Comment on lines +168 to +175
-- only one branch can be active at a time
-- hence, the table has only one row
CREATE TABLE IF NOT EXISTS current_branch (
id TEXT NOT NULL PRIMARY KEY,

FOREIGN KEY(id) REFERENCES branch(id)
) strict;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about implementing the current branch as runtime state, contained in a settings file, or adding branch.active` but this approach is better because:

  • it;s just a pointer that can be updated without traversing tables
  • foreign key validation 🥳


const changes = await lix.db
.selectFrom("change")
.where(changeInBranch(branch.id))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BEAUTIFUL querying of tables scoped to branches via a filter.

  • no views
  • no full-blown utility queries
  • just a filter 😍

Comment on lines +9 to +15
export async function getLowestCommonAncestorV2(args: {
lix: Pick<LixReadonly, "db">;
changeA: Pick<Change, "id">;
changeB: Pick<Change, "id">;
}): Promise<Change | undefined> {
// check for direct ancestors first before running the recursive query
const directAncestorA = await args.lix.db
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is (ofc) a monster query. Performance is likely sub-optimal. Doesn't matter for now. We can add caching or jumping of commonly/previously traversed paths in the future

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤯 looking at this for a while now :D

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A thought here: coverage wont be a good measurement for tests of lix i guess...

Copy link
Contributor

@martin-lysk martin-lysk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WIP reviewed create-branch.test.ts, create-branch.ts, merge-branch.test.ts, switch-branch.test.ts and switch-branch.ts

packages/lix-sdk/src/branch/create-branch.ts Outdated Show resolved Hide resolved
change_id TEXT NOT NULL,
change_file_id TEXT NOT NULL,
change_entity_id TEXT NOT NULL,
change_type TEXT NOT NULL,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we denormalize change in this table? change_file_id, entity_id and change_type are already defined in the change table....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • (minor) avoid joining pointers with the change table
  • (major) ensure consistency that a branch only points to one "entity change" at a time

If we find a way to ensure consistency without the de-serialized columns, more than welcome to change it.

change_entity_id TEXT NOT NULL,
change_type TEXT NOT NULL,

PRIMARY KEY(branch_id, change_file_id, change_entity_id, change_type),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this produces an index table on branch_id, change_file_id, change_entity_id and change_type so we would end up with 5 uuid columns 5 for branch_change_pointer table and 4 for the primary key index table.

Each UUID is 16 bytes.
So, 9 UUIDs per record means 9 * 16 = 144 bytes per record
For 100 records, the total storage required for each branch would be 14.4 kb.

Just to set this in scope - we discussed 250k records last week - this would be 36 mb only for a single branch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be de-duplicated in the future

packages/lix-sdk/src/branch/merge-branch.test.ts Outdated Show resolved Hide resolved
join
// join related change pointers
// related = same entity, file, and type
.onRef("source.change_entity_id", "=", "target.change_entity_id")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok - this seems to be the reason for the extra columns in branch_change_pointer

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

  • (minor) avoid joining pointers with the change table
  • (major) ensure consistency that a branch only points to one "entity change" at a time

if (plugin.detectConflictsV2) {
const conflicts = await plugin.detectConflictsV2({
lix: args.lix,
changes: diffingChanges,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we need to provide the information about the target branch as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i thought so too but it doesn't seem like it.

  • a branch points to a set of changes
  • merging two branches leads to a new set
  • a plugin only needs to detect if a change in the new set exists

A nice side effect, the API generalizes pretty well. This API can now be used to detect conflicts even within the same branch (because it's just a set) or other domains.

packages/lix-sdk/src/branch/merge-branch.ts Outdated Show resolved Hide resolved
// (ignore if the conflict already exists)
if (detectedConflicts.length > 0) {
await trx
.insertInto("conflict")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a relation between conflicts and branches - is this intended?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment yes. We need to investigate conflicts in another PR.

Gut feeling is any branch pointing to the same conflicting change will see the same conflict. If one branch resolves the conflict, it will be automatically in other branches that point to the same change (and thereby conflict).

Copy link
Contributor

@martin-lysk martin-lysk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WIP review part 2

import { openLixInMemory } from "../open/openLixInMemory.js";
import { updateBranchPointers } from "./update-branch-pointers.js";

test("the branch pointer for a change should be updated", async () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also cover the onConflict case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep. the function is intentionally updateBranchPointers to include any other pointers we want to update in the future (like maybe conflicts or discussions)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expected the tests for handling branch pointers in separate files i here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i scoped it to branching. the API will become large and choatic. i thought about scoping code to data structures e.g. branch (mergeBranch, updateBranchPointers), lix (openLix, mergeLix), plugin (lix-plugin-schema, utilityies), database (schema). but it will just get choatic i guess :D

)
// pointer for this branch and change_entity, change_file, change_type
// already exists, then update the change_id
.onConflict((oc) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure on this one but doesn't a conflict trigger also on a foreign key conflict - If so we should add a conflict target here - would also be more explicit what conflicts we are dealing with

@@ -37,6 +37,10 @@ export type LixPlugin = {
sourceLix: LixReadonly;
targetLix: LixReadonly;
}) => Promise<DetectedConflict[]>;
detectConflictsV2?: (args: {
lix: LixReadonly;
changes: Array<Change>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the plugin will need to know the source and target branch here - to be able to take the current state into account during conflict detection.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you find a use case?

  • i thougt so too but it's just a set.
  • a plugin detects changes in a given set
  • it's up to lix to pass the set to be tested set in whatever combination e.g. on merging branches, lix creates a diffing set of changes between branch A and B. this diffing set is passed to detectConflicts.

packages/lix-sdk/src/query-utilities/change-in-branch.ts Outdated Show resolved Hide resolved
blob: await newLixFile(),
});

const mockChanges = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is not sharp enough a call to "getRootChange" would be sufficient in this case as well. I would at least add some more history before the common to ensure the query stops as expected?

changeB,
});

expect(commonAncestor).toBe(undefined);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect the function to throw in such a case - how can you not have a common ancestor for a change with the same entity_id and type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, the same row has been independently created for a csv file. While both refer to the same entity, they have been created independently and, therefore, have no common ancestor.

The scenario of no common ancestor is unlikely for a file formats like inlang where the entity id is a UUID.

Comment on lines +9 to +15
export async function getLowestCommonAncestorV2(args: {
lix: Pick<LixReadonly, "db">;
changeA: Pick<Change, "id">;
changeB: Pick<Change, "id">;
}): Promise<Change | undefined> {
// check for direct ancestors first before running the recursive query
const directAncestorA = await args.lix.db
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤯 looking at this for a while now :D

Comment on lines +9 to +15
export async function getLowestCommonAncestorV2(args: {
lix: Pick<LixReadonly, "db">;
changeA: Pick<Change, "id">;
changeB: Pick<Change, "id">;
}): Promise<Change | undefined> {
// check for direct ancestors first before running the recursive query
const directAncestorA = await args.lix.db
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A thought here: coverage wont be a good measurement for tests of lix i guess...

Copy link
Contributor

@martin-lysk martin-lysk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just one more note

@samuelstroschein
Copy link
Member Author

Merging to get into apps for testing.

@samuelstroschein samuelstroschein merged commit 793055e into lix-integration Nov 4, 2024
2 checks passed
@samuelstroschein samuelstroschein deleted the branching branch November 4, 2024 15:34
@github-actions github-actions bot locked and limited conversation to collaborators Nov 4, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants