-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Branching #3201
Branching #3201
Conversation
|
|
||
// it is reasonable to assume that a conflict exists if the same (entity, file, type) change is updated in both branches. | ||
// in case a plugin does not detect a conflict, the system should automatically detect it. | ||
test("it should automatically detect a conflict if a change exists that differs updates in both branches despite having a common ancestor", async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a beautiful change.
plugins do not need to detect diverging entity graphs as conflict anymore. lix handles that as it's universal conflict. an analogy with git: git defines entities in text files as rows. if the same row (entity) diverges in two repos, git raises a conflict. same now applies to lix. if two entities diverge, it's a conflict. but in contrast to git, entities can be defined more fine grained e.g. "character", automatically be resolved, and semantic conflicts between entities can be raised. if row 1 and row 2 are diverging, it's not an 'entity conflict' but maybe a semantic one that a plugin can detect!
CREATE TABLE IF NOT EXISTS branch_change_pointer ( | ||
branch_id TEXT NOT NULL, | ||
change_id TEXT NOT NULL, | ||
change_file_id TEXT NOT NULL, | ||
change_entity_id TEXT NOT NULL, | ||
change_type TEXT NOT NULL, | ||
|
||
PRIMARY KEY(branch_id, change_file_id, change_entity_id, change_type), | ||
|
||
FOREIGN KEY(branch_id) REFERENCES branch(id), | ||
FOREIGN KEY(change_id, change_file_id, change_entity_id, change_type) REFERENCES change(change_id, change_file_id, change_entity_id, change_type) | ||
) strict; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decided to not go with the change set column to
- get foreign key validations
- avoid "dead" pointers e.g. a change set can have no reference. a change pointer of a branch with no reference can be garbage collected
- indexes 👀
An open question is whether branch_change_head
is a better suited table name
-- only one branch can be active at a time | ||
-- hence, the table has only one row | ||
CREATE TABLE IF NOT EXISTS current_branch ( | ||
id TEXT NOT NULL PRIMARY KEY, | ||
|
||
FOREIGN KEY(id) REFERENCES branch(id) | ||
) strict; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about implementing the current branch as runtime state, contained in a settings file, or adding branch.active` but this approach is better because:
- it;s just a pointer that can be updated without traversing tables
- foreign key validation 🥳
|
||
const changes = await lix.db | ||
.selectFrom("change") | ||
.where(changeInBranch(branch.id)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BEAUTIFUL querying of tables scoped to branches via a filter.
- no views
- no full-blown utility queries
- just a filter 😍
export async function getLowestCommonAncestorV2(args: { | ||
lix: Pick<LixReadonly, "db">; | ||
changeA: Pick<Change, "id">; | ||
changeB: Pick<Change, "id">; | ||
}): Promise<Change | undefined> { | ||
// check for direct ancestors first before running the recursive query | ||
const directAncestorA = await args.lix.db |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is (ofc) a monster query. Performance is likely sub-optimal. Doesn't matter for now. We can add caching or jumping of commonly/previously traversed paths in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤯 looking at this for a while now :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A thought here: coverage wont be a good measurement for tests of lix i guess...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WIP reviewed create-branch.test.ts, create-branch.ts, merge-branch.test.ts, switch-branch.test.ts and switch-branch.ts
change_id TEXT NOT NULL, | ||
change_file_id TEXT NOT NULL, | ||
change_entity_id TEXT NOT NULL, | ||
change_type TEXT NOT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we denormalize change in this table? change_file_id, entity_id and change_type are already defined in the change table....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- (minor) avoid joining pointers with the change table
- (major) ensure consistency that a branch only points to one "entity change" at a time
If we find a way to ensure consistency without the de-serialized columns, more than welcome to change it.
change_entity_id TEXT NOT NULL, | ||
change_type TEXT NOT NULL, | ||
|
||
PRIMARY KEY(branch_id, change_file_id, change_entity_id, change_type), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this produces an index table on branch_id, change_file_id, change_entity_id and change_type so we would end up with 5 uuid columns 5 for branch_change_pointer table and 4 for the primary key index table.
Each UUID is 16 bytes.
So, 9 UUIDs per record means 9 * 16 = 144 bytes per record
For 100 records, the total storage required for each branch would be 14.4 kb.
Just to set this in scope - we discussed 250k records last week - this would be 36 mb only for a single branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be de-duplicated in the future
join | ||
// join related change pointers | ||
// related = same entity, file, and type | ||
.onRef("source.change_entity_id", "=", "target.change_entity_id") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok - this seems to be the reason for the extra columns in branch_change_pointer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
- (minor) avoid joining pointers with the change table
- (major) ensure consistency that a branch only points to one "entity change" at a time
if (plugin.detectConflictsV2) { | ||
const conflicts = await plugin.detectConflictsV2({ | ||
lix: args.lix, | ||
changes: diffingChanges, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't we need to provide the information about the target branch as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i thought so too but it doesn't seem like it.
- a branch points to a set of changes
- merging two branches leads to a new set
- a plugin only needs to detect if a change in the new set exists
A nice side effect, the API generalizes pretty well. This API can now be used to detect conflicts even within the same branch (because it's just a set) or other domains.
// (ignore if the conflict already exists) | ||
if (detectedConflicts.length > 0) { | ||
await trx | ||
.insertInto("conflict") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have a relation between conflicts and branches - is this intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment yes. We need to investigate conflicts in another PR.
Gut feeling is any branch pointing to the same conflicting change will see the same conflict. If one branch resolves the conflict, it will be automatically in other branches that point to the same change (and thereby conflict).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WIP review part 2
import { openLixInMemory } from "../open/openLixInMemory.js"; | ||
import { updateBranchPointers } from "./update-branch-pointers.js"; | ||
|
||
test("the branch pointer for a change should be updated", async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also cover the onConflict case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep. the function is intentionally updateBranchPointers
to include any other pointers we want to update in the future (like maybe conflicts or discussions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expected the tests for handling branch pointers in separate files i here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i scoped it to branching. the API will become large and choatic. i thought about scoping code to data structures e.g. branch
(mergeBranch, updateBranchPointers), lix
(openLix, mergeLix), plugin
(lix-plugin-schema, utilityies), database
(schema). but it will just get choatic i guess :D
) | ||
// pointer for this branch and change_entity, change_file, change_type | ||
// already exists, then update the change_id | ||
.onConflict((oc) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure on this one but doesn't a conflict trigger also on a foreign key conflict - If so we should add a conflict target here - would also be more explicit what conflicts we are dealing with
@@ -37,6 +37,10 @@ export type LixPlugin = { | |||
sourceLix: LixReadonly; | |||
targetLix: LixReadonly; | |||
}) => Promise<DetectedConflict[]>; | |||
detectConflictsV2?: (args: { | |||
lix: LixReadonly; | |||
changes: Array<Change>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the plugin will need to know the source and target branch here - to be able to take the current state into account during conflict detection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you find a use case?
- i thougt so too but it's just a set.
- a plugin detects changes in a given set
- it's up to lix to pass the set to be tested set in whatever combination e.g. on merging branches, lix creates a diffing set of changes between branch A and B. this diffing set is passed to detectConflicts.
blob: await newLixFile(), | ||
}); | ||
|
||
const mockChanges = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is not sharp enough a call to "getRootChange" would be sufficient in this case as well. I would at least add some more history before the common to ensure the query stops as expected?
changeB, | ||
}); | ||
|
||
expect(commonAncestor).toBe(undefined); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect the function to throw in such a case - how can you not have a common ancestor for a change with the same entity_id and type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, the same row has been independently created for a csv file. While both refer to the same entity, they have been created independently and, therefore, have no common ancestor.
The scenario of no common ancestor is unlikely for a file formats like inlang where the entity id is a UUID.
export async function getLowestCommonAncestorV2(args: { | ||
lix: Pick<LixReadonly, "db">; | ||
changeA: Pick<Change, "id">; | ||
changeB: Pick<Change, "id">; | ||
}): Promise<Change | undefined> { | ||
// check for direct ancestors first before running the recursive query | ||
const directAncestorA = await args.lix.db |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤯 looking at this for a while now :D
export async function getLowestCommonAncestorV2(args: { | ||
lix: Pick<LixReadonly, "db">; | ||
changeA: Pick<Change, "id">; | ||
changeB: Pick<Change, "id">; | ||
}): Promise<Change | undefined> { | ||
// check for direct ancestors first before running the recursive query | ||
const directAncestorA = await args.lix.db |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A thought here: coverage wont be a good measurement for tests of lix i guess...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just one more note
Merging to get into apps for testing. |
Closes opral/inlang-sdk#143
Changes
updateBranchPointers
to keep pointers moving when new changes come inchangeInBranch(id)
filter to get changes only in a branchmergeBranch({ lix, sourceBranch: Branch, targetBranch: Branch })
createBranch({ lix, from: Branch })
switchBranch
currentBranch
Important notes
Notes
💡 = insight
⚠️ = API change
😍 = awesome
❓ = unclear/review wanted
💡better to have a dedicated change set pointer concept to
💡branch is state, we could introduce a state concept to move only one pointer at a time
plugin.detectConflicts
needs overhaul because targetLix,
sourceLix
doesn't make sense for branchingplugin.detectChanges
😍 plugins don't need to handle conflict detection for diverging entity changes anymore
plugin.detectConflicts
is reduced to “detectEntityOverarchingConflicts
”😍 no more need for targetLix & sourceLix
😍 querying changes in a branch is as easy as
where(changeInBranch(currentBranch.id))
❓️branch_change_pointer or branch_change_head
❓️any benefit of introducing a state concept?