In the last update, I said that we were abandoning our efforts to apply CRDTs to the entire repository, citing lack of clarity on what we were actually trying to achieve. However, after more conversations with colleagues, we've decided to proceed with that effort after all. After a lot more thinking and writing, we finally got enough clarity on our direction to start writing code last week.
We still plan to continue developing Xray as a text editor, but we're adding a new top-level module to the repository called Memo, which is essentially a CRDT-based version control system that interoperates with Git. Xray will pull in Memo as a library and build directly on top of its primitives, but we also plan to make Memo available as a standalone executable in the future to support integration with other editors.
Our plan is for Memo to complement Git with real-time capabilities. Like Git, Memo will support branches to track parallel streams of development, but in Memo, all replicas of a given branch will be synchronized in real-time without conflicts. For example, if you and a collaborator check out the same Memo branch, you'll be able to move a file while someones else is editing that file, and the state of the file tree will cleanly converge.
Today, Git serves as a bridge between your local development environment and the cloud. When you push commits to GitHub, you're not only ensuring that your changes are safely persisted and shared with your teammates, but you're also potentially kicking off processes on one or more cloud-based services to run tests, perform analysis, or deploy to production. We want to make that feedback loop tighter, allowing you to share your changes with teammates and cloud-based services as you actively write code.
With Memo, as you're editing, a CI provider like Travis could run tests across a cluster of machines and give you feedback about your changes immediately. A source code analysis service like Code Climate could literally become an extension of your IDE, giving you feedback long before you commit.
Like Git, we also intend to persist each branch's history to a database, but your changes will be continuously persisted on every keystroke rather than only when you commit. After the fact, you'll be able to replay edits and identify specific points in a branch's evolution via a version vector. When we detect commits to the underlying Git repository, we'll automatically persist a snapshot of the current state of the Memo repository and map the commit SHA to a version vector. When a commit only contains a subset of the outstanding changes, we'll need a more complex representation than a pure version vector in order to account for the exact contents of the commit, since a version vector can only identify the state of the repository at a specific point in time.
Last week, after getting clear on our goals, we started on a new tree implementation that we'll use to index the history of changes to the file system and text files. It's based heavily on the tree that we already use within Xray to represent the buffer CRDT, but we're modifying it to support persistence of individual nodes in an external database. This will allow us to index the entire operational history of files without needing to load that entire history into memory during active editing. Once we complete the initial implementation of this B-tree, we'll use it to build out a CRDT representing the state of the file system.
While I've been focused on getting clarity in terms of version control, @as-cii has continued to make progress on Xray itself. Last week he merged a PR that adds support for horizontal scrolling the editor, which was a bit more challenging than it might sound.
To support horizontal scrolling, we need to know the width of the editor's content, which involves efficient tracking and measurement of the longest line. Previously, we maintained a vector of newline offsets as part of each chunk of inserted text to support efficient translation between 1-D and 2-D coordinates which we implemented by binary searching this vector. Antonio replaced this representation with a static binary tree, which is still stored inside a vector for efficiency. With the binary tree, we maintain the same offset information that was formerly available in the flat vector, but we also index maximal row lengths, which gives us the ability to request the longest row in an arbitrary region of the text in logarithmic time.
I'll be out next week on vacation, so Antonio plans to focus primarily on more editor features until I'm back on Monday, June 4th. He'll start with rendering a gutter and line numbers, which he already got started last week. In light of my absence, there's a good chance we could go another 2 weeks before the next update. Thanks for your patience.