Branching #3201

samuelstroschein · 2024-11-04T03:34:52Z

Closes opral/inlang-sdk#143

Changes

updateBranchPointers to keep pointers moving when new changes come in
changeInBranch(id) filter to get changes only in a branch
mergeBranch({ lix, sourceBranch: Branch, targetBranch: Branch })
createBranch({ lix, from: Branch })
switchBranch
currentBranch

Important notes

de-duplicating pointers can be added later via chunking see this
we can add more head pointers, for example, for discussions, in the future

Notes

💡 = insight
⚠️ = API change
😍 = awesome
❓ = unclear/review wanted

💡better to have a dedicated change set pointer concept to
- build index by entity_id
- have unique (branch_id, entity_id) check
- distinguish between user sets and mostly automated sets of pointers that can be garbage collected
- argument to rename change_set to change_group for user intent. both are formalized as change set
💡branch is state, we could introduce a state concept to move only one pointer at a time
- points to changes now
- could point to discussions and more entities in the future
⚠️ plugin.detectConflicts needs overhaul because targetLix, sourceLix doesn't make sense for branching
- all changes exists in the lix
- for merging two lix’es, it seems to make sense to create an interim branch in a transaction that has all changes and then pass the diff to plugin.detectChanges
😍 plugins don't need to handle conflict detection for diverging entity changes anymore
- lix detects diverting entity changes as conflict
- plugin.detectConflicts is reduced to “detectEntityOverarchingConflicts”
😍 no more need for targetLix & sourceLix
- merging two lix’es is creating an interim branch
- unification of merging APIs to branch model
😍 querying changes in a branch is as easy as where(changeInBranch(currentBranch.id))
❓️branch_change_pointer or branch_change_head
- same applies to discussions and other entities we might add in the future
❓️any benefit of introducing a state concept?
- branch is modeling state with pointers that move independently.

changeset-bot · 2024-11-04T03:34:56Z

⚠️ No Changeset found

Latest commit: 2529fcd

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

samuelstroschein · 2024-11-04T03:41:26Z

packages/lix-sdk/src/branch/merge-branch.test.ts

+
+// it is reasonable to assume that a conflict exists if the same (entity, file, type) change is updated in both branches.
+// in case a plugin does not detect a conflict, the system should automatically detect it.
+test("it should automatically detect a conflict if a change exists that differs updates in both branches despite having a common ancestor", async () => {


this is a beautiful change.

plugins do not need to detect diverging entity graphs as conflict anymore. lix handles that as it's universal conflict. an analogy with git: git defines entities in text files as rows. if the same row (entity) diverges in two repos, git raises a conflict. same now applies to lix. if two entities diverge, it's a conflict. but in contrast to git, entities can be defined more fine grained e.g. "character", automatically be resolved, and semantic conflicts between entities can be raised. if row 1 and row 2 are diverging, it's not an 'entity conflict' but maybe a semantic one that a plugin can detect!

samuelstroschein · 2024-11-04T03:44:13Z

packages/lix-sdk/src/database/applySchema.ts

+  CREATE TABLE IF NOT EXISTS branch_change_pointer (
+    branch_id TEXT NOT NULL,
+    change_id TEXT NOT NULL,
+    change_file_id TEXT NOT NULL,
+    change_entity_id TEXT NOT NULL,
+    change_type TEXT NOT NULL,
+
+    PRIMARY KEY(branch_id, change_file_id, change_entity_id, change_type),
+
+    FOREIGN KEY(branch_id) REFERENCES branch(id),
+    FOREIGN KEY(change_id, change_file_id, change_entity_id, change_type) REFERENCES change(change_id, change_file_id, change_entity_id, change_type)
+  ) strict;


Decided to not go with the change set column to

get foreign key validations

avoid "dead" pointers e.g. a change set can have no reference. a change pointer of a branch with no reference can be garbage collected

indexes 👀

An open question is whether branch_change_head is a better suited table name

samuelstroschein · 2024-11-04T03:45:47Z

packages/lix-sdk/src/database/applySchema.ts

+  -- only one branch can be active at a time
+  -- hence, the table has only one row
+  CREATE TABLE IF NOT EXISTS current_branch (
+    id TEXT NOT NULL PRIMARY KEY,
+
+    FOREIGN KEY(id) REFERENCES branch(id)
+  ) strict;
+


I thought about implementing the current branch as runtime state, contained in a settings file, or adding branch.active` but this approach is better because:

it;s just a pointer that can be updated without traversing tables

foreign key validation 🥳

samuelstroschein · 2024-11-04T03:46:48Z

packages/lix-sdk/src/query-utilities/change-in-branch.test.ts

+
+	const changes = await lix.db
+		.selectFrom("change")
+		.where(changeInBranch(branch.id))


BEAUTIFUL querying of tables scoped to branches via a filter.

no views

no full-blown utility queries

just a filter 😍

packages/lix-sdk/src/query-utilities/change-in-branch.ts

samuelstroschein · 2024-11-04T03:48:32Z

packages/lix-sdk/src/query-utilities/get-lowest-common-ancestor-v2.ts

+export async function getLowestCommonAncestorV2(args: {
+	lix: Pick<LixReadonly, "db">;
+	changeA: Pick<Change, "id">;
+	changeB: Pick<Change, "id">;
+}): Promise<Change | undefined> {
+	// check for direct ancestors first before running the recursive query
+	const directAncestorA = await args.lix.db


This is (ofc) a monster query. Performance is likely sub-optimal. Doesn't matter for now. We can add caching or jumping of commonly/previously traversed paths in the future

🤯 looking at this for a while now :D

A thought here: coverage wont be a good measurement for tests of lix i guess...

martin-lysk

WIP reviewed create-branch.test.ts, create-branch.ts, merge-branch.test.ts, switch-branch.test.ts and switch-branch.ts

packages/lix-sdk/src/branch/create-branch.ts

martin-lysk · 2024-11-04T08:44:59Z

packages/lix-sdk/src/database/applySchema.ts

+    change_id TEXT NOT NULL,
+    change_file_id TEXT NOT NULL,
+    change_entity_id TEXT NOT NULL,
+    change_type TEXT NOT NULL,


why do we denormalize change in this table? change_file_id, entity_id and change_type are already defined in the change table....

(minor) avoid joining pointers with the change table

(major) ensure consistency that a branch only points to one "entity change" at a time

If we find a way to ensure consistency without the de-serialized columns, more than welcome to change it.

martin-lysk · 2024-11-04T08:54:27Z

packages/lix-sdk/src/database/applySchema.ts

+    change_entity_id TEXT NOT NULL,
+    change_type TEXT NOT NULL,
+
+    PRIMARY KEY(branch_id, change_file_id, change_entity_id, change_type),


this produces an index table on branch_id, change_file_id, change_entity_id and change_type so we would end up with 5 uuid columns 5 for branch_change_pointer table and 4 for the primary key index table.

Each UUID is 16 bytes.
So, 9 UUIDs per record means 9 * 16 = 144 bytes per record
For 100 records, the total storage required for each branch would be 14.4 kb.

Just to set this in scope - we discussed 250k records last week - this would be 36 mb only for a single branch.

Can be de-duplicated in the future

packages/lix-sdk/src/branch/create-branch.test.ts

packages/lix-sdk/src/branch/merge-branch.test.ts

martin-lysk · 2024-11-04T10:05:46Z

packages/lix-sdk/src/branch/merge-branch.ts

+			join
+				// join related change pointers
+				// related = same entity, file, and type
+				.onRef("source.change_entity_id", "=", "target.change_entity_id")


ok - this seems to be the reason for the extra columns in branch_change_pointer

Yes.

(minor) avoid joining pointers with the change table

(major) ensure consistency that a branch only points to one "entity change" at a time

martin-lysk · 2024-11-04T10:07:40Z

packages/lix-sdk/src/branch/merge-branch.ts

+				if (plugin.detectConflictsV2) {
+					const conflicts = await plugin.detectConflictsV2({
+						lix: args.lix,
+						changes: diffingChanges,


don't we need to provide the information about the target branch as well?

i thought so too but it doesn't seem like it.

a branch points to a set of changes

merging two branches leads to a new set

a plugin only needs to detect if a change in the new set exists

A nice side effect, the API generalizes pretty well. This API can now be used to detect conflicts even within the same branch (because it's just a set) or other domains.

packages/lix-sdk/src/branch/merge-branch.ts

martin-lysk · 2024-11-04T10:27:03Z

packages/lix-sdk/src/branch/merge-branch.ts

+		// (ignore if the conflict already exists)
+		if (detectedConflicts.length > 0) {
+			await trx
+				.insertInto("conflict")


We don't have a relation between conflicts and branches - is this intended?

At the moment yes. We need to investigate conflicts in another PR.

Gut feeling is any branch pointing to the same conflicting change will see the same conflict. If one branch resolves the conflict, it will be automatically in other branches that point to the same change (and thereby conflict).

martin-lysk

WIP review part 2

martin-lysk · 2024-11-04T10:43:53Z

packages/lix-sdk/src/branch/update-branch-pointers.test.ts

+import { openLixInMemory } from "../open/openLixInMemory.js";
+import { updateBranchPointers } from "./update-branch-pointers.js";
+
+test("the branch pointer for a change should be updated", async () => {


This should also cover the onConflict case

yep. the function is intentionally updateBranchPointers to include any other pointers we want to update in the future (like maybe conflicts or discussions)

martin-lysk · 2024-11-04T10:51:13Z

packages/lix-sdk/src/merge/merge.test.ts

I expected the tests for handling branch pointers in separate files i here?

yeah i scoped it to branching. the API will become large and choatic. i thought about scoping code to data structures e.g. branch (mergeBranch, updateBranchPointers), lix (openLix, mergeLix), plugin (lix-plugin-schema, utilityies), database (schema). but it will just get choatic i guess :D

martin-lysk · 2024-11-04T11:18:42Z

packages/lix-sdk/src/branch/update-branch-pointers.ts

+			)
+			// pointer for this branch and change_entity, change_file, change_type
+			// already exists, then update the change_id
+			.onConflict((oc) =>


Not sure on this one but doesn't a conflict trigger also on a foreign key conflict - If so we should add a conflict target here - would also be more explicit what conflicts we are dealing with

martin-lysk · 2024-11-04T11:20:44Z

packages/lix-sdk/src/plugin/lix-plugin.ts

@@ -37,6 +37,10 @@ export type LixPlugin = {
 		sourceLix: LixReadonly;
 		targetLix: LixReadonly;
 	}) => Promise<DetectedConflict[]>;
+	detectConflictsV2?: (args: {
+		lix: LixReadonly;
+		changes: Array<Change>;


the plugin will need to know the source and target branch here - to be able to take the current state into account during conflict detection.

Can you find a use case?

i thougt so too but it's just a set.

a plugin detects changes in a given set

it's up to lix to pass the set to be tested set in whatever combination e.g. on merging branches, lix creates a diffing set of changes between branch A and B. this diffing set is passed to detectConflicts.

packages/lix-sdk/src/plugin/with-transaction.ts

packages/lix-sdk/src/query-utilities/change-in-branch.ts

martin-lysk · 2024-11-04T12:12:00Z

packages/lix-sdk/src/query-utilities/get-lowest-common-ancestor-v2.test.ts

+		blob: await newLixFile(),
+	});
+
+	const mockChanges = [


This test is not sharp enough a call to "getRootChange" would be sufficient in this case as well. I would at least add some more history before the common to ensure the query stops as expected?

martin-lysk · 2024-11-04T12:16:06Z

packages/lix-sdk/src/query-utilities/get-lowest-common-ancestor-v2.test.ts

+		changeB,
+	});
+
+	expect(commonAncestor).toBe(undefined);


I would expect the function to throw in such a case - how can you not have a common ancestor for a change with the same entity_id and type?

For example, the same row has been independently created for a csv file. While both refer to the same entity, they have been created independently and, therefore, have no common ancestor.

The scenario of no common ancestor is unlikely for a file formats like inlang where the entity id is a UUID.

martin-lysk · 2024-11-04T12:25:50Z

packages/lix-sdk/src/query-utilities/get-lowest-common-ancestor-v2.ts

+export async function getLowestCommonAncestorV2(args: {
+	lix: Pick<LixReadonly, "db">;
+	changeA: Pick<Change, "id">;
+	changeB: Pick<Change, "id">;
+}): Promise<Change | undefined> {
+	// check for direct ancestors first before running the recursive query
+	const directAncestorA = await args.lix.db


🤯 looking at this for a while now :D

martin-lysk · 2024-11-04T12:27:34Z

packages/lix-sdk/src/query-utilities/get-lowest-common-ancestor-v2.ts

+export async function getLowestCommonAncestorV2(args: {
+	lix: Pick<LixReadonly, "db">;
+	changeA: Pick<Change, "id">;
+	changeB: Pick<Change, "id">;
+}): Promise<Change | undefined> {
+	// check for direct ancestors first before running the recursive query
+	const directAncestorA = await args.lix.db


A thought here: coverage wont be a good measurement for tests of lix i guess...

martin-lysk

just one more note

packages/lix-sdk/src/query-utilities/get-parent-changes-v2.test.ts

samuelstroschein · 2024-11-04T15:34:00Z

Merging to get into apps for testing.

samuelstroschein added 13 commits November 2, 2024 19:45

add branch table

6c2a6f9

add pointers

f84a9f8

add jsdoc

9bc853b

switching branches

3bc74a7

export public api

1bd9322

add change-in-branch filter

1a9edd0

wip merge branch

35e7753

fix: run in transaction

2a10d14

fix: detect diverting entity changes conflict

de928b8

expose v2 apis

7d634be

improve: readonly for lowest common ancestor

3f4c168

improve: readonly for lowest common ancestor

79adbb1

add update branch pointers in file handlers

77e6347

samuelstroschein requested a review from martin-lysk November 4, 2024 03:34

samuelstroschein commented Nov 4, 2024

View reviewed changes

packages/lix-sdk/src/query-utilities/change-in-branch.ts Outdated Show resolved Hide resolved

samuelstroschein commented Nov 4, 2024

View reviewed changes

martin-lysk reviewed Nov 4, 2024

View reviewed changes

packages/lix-sdk/src/query-utilities/get-parent-changes-v2.test.ts Show resolved Hide resolved

samuelstroschein added 4 commits November 4, 2024 07:55

update comment

7cafa13

improve: conflicts can't self reference

b23e906

add comment

0c2a7ad

add grandparents for re-assurance

2529fcd

samuelstroschein merged commit 793055e into lix-integration Nov 4, 2024
2 checks passed

samuelstroschein deleted the branching branch November 4, 2024 15:34

github-actions bot locked and limited conversation to collaborators Nov 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Branching #3201

Branching #3201

samuelstroschein commented Nov 4, 2024 •

edited

Loading

changeset-bot bot commented Nov 4, 2024 •

edited

Loading

samuelstroschein Nov 4, 2024 •

edited

Loading

samuelstroschein Nov 4, 2024

samuelstroschein Nov 4, 2024

samuelstroschein Nov 4, 2024

samuelstroschein Nov 4, 2024

martin-lysk Nov 4, 2024

martin-lysk Nov 4, 2024

martin-lysk left a comment

martin-lysk Nov 4, 2024

samuelstroschein Nov 4, 2024

martin-lysk Nov 4, 2024

samuelstroschein Nov 4, 2024

martin-lysk Nov 4, 2024

samuelstroschein Nov 4, 2024

martin-lysk Nov 4, 2024

samuelstroschein Nov 4, 2024

martin-lysk Nov 4, 2024

samuelstroschein Nov 4, 2024

martin-lysk left a comment

martin-lysk Nov 4, 2024

samuelstroschein Nov 4, 2024

martin-lysk Nov 4, 2024

samuelstroschein Nov 4, 2024

martin-lysk Nov 4, 2024

martin-lysk Nov 4, 2024

samuelstroschein Nov 4, 2024

martin-lysk Nov 4, 2024

martin-lysk Nov 4, 2024

samuelstroschein Nov 4, 2024

martin-lysk Nov 4, 2024

martin-lysk Nov 4, 2024

martin-lysk left a comment

samuelstroschein commented Nov 4, 2024

Branching #3201

Branching #3201

Conversation

samuelstroschein commented Nov 4, 2024 • edited Loading

Changes

Important notes

Notes

changeset-bot bot commented Nov 4, 2024 • edited Loading

⚠️ No Changeset found

samuelstroschein Nov 4, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

martin-lysk left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

martin-lysk left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

martin-lysk left a comment

Choose a reason for hiding this comment

samuelstroschein commented Nov 4, 2024

samuelstroschein commented Nov 4, 2024 •

edited

Loading

changeset-bot bot commented Nov 4, 2024 •

edited

Loading

samuelstroschein Nov 4, 2024 •

edited

Loading