
Is the compaction blog ready? #2

Open
iAziz786 opened this issue Jan 16, 2022 · 16 comments
@iAziz786

Hey @adambcomer, hope you are doing well! I have been following this blog series and it's really great.

I'm now waiting for the remaining parts. I know you might be busy, which is probably why you haven't posted them yet.

If possible, could you please discuss how to go about compaction? It's currently a blocker for finishing my database engine.

Thanks anyway :)

@adambcomer (Owner)

Hello @iAziz786,

Unfortunately, I have gotten very busy finishing university. I assure you that I haven't forgotten this project and will complete it.

Since you're asking about compaction, I'll assume you've built the Memtable, Write Ahead Log, and SSTables. Leveled Compaction, although scary-sounding, isn't very complex. It functions in three steps: joining SSTables with overlapping key ranges, removing overwritten records, and migrating the new SSTable up a level.

First, joining SSTables with common key ranges. Currently, I'm working on some graphics to easily explain this, but I'll try my best without them. To do this, iterate over each of the identified tables and take the union of the sets of records.

Second, when joining each set of records, you might encounter the same key in multiple SSTables. In my tutorial, I store the timestamp the record was written. Use this value to keep the latest record and discard the rest.

Third, the prior two steps will produce a new set of records that are written into a new SSTable. Put the new SSTable on the next level and delete the old SSTables.

Finally, repeat this process for each level.

You can see that this process will naturally move records into progressively larger files and clear out old records on each pass.
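
To make steps one and two concrete, here's a minimal sketch in Rust with illustrative names of my own (not the tutorial's actual types): it merges the records of overlapping SSTables and keeps only the newest copy of each key.

```rust
use std::collections::BTreeMap;

/// A simplified record, mirroring the fields in the tutorial's WAL format.
/// (Names here are illustrative, not the tutorial's actual types.)
#[derive(Clone, Debug)]
struct Record {
    key: Vec<u8>,
    value: Option<Vec<u8>>, // None represents a tombstone
    timestamp: u128,        // microseconds, used to pick the winner
}

/// Steps one and two: merge the records of overlapping SSTables and,
/// for each key, keep only the record with the newest timestamp. The
/// result is sorted and would be written out as the new SSTable on the
/// next level (step three).
fn compact(tables: Vec<Vec<Record>>) -> Vec<Record> {
    let mut latest: BTreeMap<Vec<u8>, Record> = BTreeMap::new();
    for table in tables {
        for record in table {
            match latest.get(&record.key) {
                // An equal-or-newer record already won; discard this one.
                Some(existing) if existing.timestamp >= record.timestamp => {}
                _ => {
                    latest.insert(record.key.clone(), record);
                }
            }
        }
    }
    // BTreeMap iterates in key order, so the output stays sorted.
    latest.into_values().collect()
}
```

One caveat worth knowing: tombstones can generally only be discarded when compacting into the deepest level, because an older copy of the key may still exist further down.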

Hope this helps,
Adam Comer

@iAziz786 (Author)

Hi @adambcomer, thanks for taking the time to explain it so well.

Basically, I'm at the point where the last tutorial ended, which means I haven't created the SSTables yet. I guess once they're ready, I'll be able to do Leveled Compaction. I'm working on making the SSTable work right now.

Will keep you updated,
Have a nice day :)

@iAziz786 (Author)

Hi @adambcomer,

Hope you are doing well :)

So basically I'm stuck implementing the sstable.rs file. I'm not able to figure out how to go about it 🤷‍♂️ Can I ask you for one big favor?

If you could let me know the methods this file might contain, with just pseudocode for each method, then I think I'll be able to complete the implementation. Just rough steps for when to write to the file, when to load into memory, etc.

I tried to comprehend RocksDB's block-based table format for SSTables but didn't understand it clearly.

I know you might be very occupied with college. Anyway, stay safe!

@iAziz786

@adambcomer (Owner) commented Jan 24, 2022

@iAziz786

A Sorted Strings Table, SSTable for short, is an immutable set of records from the Memtable. You can use any disk format you want. I plan on using the same format as the WAL since this is an explainer blog series. I've copied the WAL block structure for reference.

+---------------+---------------+-----------------+-...-+--...--+-----------------+
| Key Size (8B) | Tombstone(1B) | Value Size (8B) | Key | Value | Timestamp (16B) |
+---------------+---------------+-----------------+-...-+--...--+-----------------+
Key Size = Length of the Key data
Tombstone = Whether this record was deleted and therefore has no value
Value Size = Length of the Value data
Key = Key data
Value = Value data
Timestamp = Timestamp of the operation in microseconds
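
To make the layout concrete, here is a sketch of serializing one record in this format. The function name and the choice of little-endian integers are my assumptions; check the WAL code for the exact encoding.

```rust
use std::io::{self, Write};

/// Serialize one record using the block layout above. Field order and
/// sizes follow the diagram; little-endian integers are an assumption.
fn write_record<W: Write>(
    out: &mut W,
    key: &[u8],
    value: Option<&[u8]>, // None represents a tombstone
    timestamp: u128,
) -> io::Result<()> {
    out.write_all(&(key.len() as u64).to_le_bytes())?;   // Key Size (8B)
    out.write_all(&[value.is_none() as u8])?;            // Tombstone (1B)
    let value = value.unwrap_or(&[]);
    out.write_all(&(value.len() as u64).to_le_bytes())?; // Value Size (8B)
    out.write_all(key)?;                                 // Key
    out.write_all(value)?;                               // Value
    out.write_all(&timestamp.to_le_bytes())?;            // Timestamp (16B)
    Ok(())
}
```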

Building the SSTable requires integrating it with the Memtable, WAL, and the other SSTables, so it's not obvious in isolation what the methods on the SSTable and Database structs will be.

I'll look into getting the code on GitHub for you to preview, and publish the articles later.

adambcomer added the question (Further information is requested) label Jan 24, 2022
adambcomer self-assigned this Jan 24, 2022
@iAziz786 (Author)

@adambcomer

Just one clarification: what benefit do we get from an SSTable being sorted by keys? I know it can be useful for compaction, but how does it help when searching for a value?

Even if it's sorted by key, we'd have to do a linear search over each entry, right?

@adambcomer (Owner)

@iAziz786

Sorting the keys gives O(log n) runtime with binary search.
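
For intuition, here's a minimal sketch over records held sorted in memory (illustrative types, not the tutorial's code):

```rust
/// With records kept sorted by key, a lookup is a binary search:
/// O(log n) comparisons instead of a linear scan over every entry.
fn get<'a>(records: &'a [(Vec<u8>, Vec<u8>)], key: &[u8]) -> Option<&'a [u8]> {
    records
        .binary_search_by(|(k, _)| k.as_slice().cmp(key))
        .ok()
        .map(|i| records[i].1.as_slice())
}
```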

@iAziz786 (Author)

@adambcomer

Assuming the key is not in the Memtable, and the SSTables are structured like the WAL but sorted, how can we binary search over data that's stored in a file?

@adambcomer (Owner)

@iAziz786

The keys are indexed in memory. In the RocksDB BlockBasedTable Format, they use a block-based index. My project's SSTable is similar to the PlainTable Format without the hashing. In that doc, they explain how RocksDB builds an in-memory index using record offsets. Of course, they have many optimizations for a bunch of key-value patterns, but you can ignore those and still get an understanding of how the basic system functions.
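
As a rough sketch of that idea, with names of my own rather than RocksDB's or the tutorial's: the binary search runs over a small in-memory array of (key, offset) pairs, and the file is only touched once you know where to read.

```rust
use std::fs::File;
use std::io::{self, Seek, SeekFrom};

/// One sorted (key, byte offset) pair per record, built by scanning the
/// SSTable file once when it is opened. Only this index lives in memory;
/// the records themselves stay on disk.
struct SSTableIndex {
    entries: Vec<(Vec<u8>, u64)>, // sorted by key
}

impl SSTableIndex {
    /// Binary search the index, then seek the file to the matching
    /// record's offset so it can be decoded with the layout shown earlier.
    fn seek_to_record(&self, file: &mut File, key: &[u8]) -> io::Result<bool> {
        match self.entries.binary_search_by(|(k, _)| k.as_slice().cmp(key)) {
            Ok(i) => {
                file.seek(SeekFrom::Start(self.entries[i].1))?;
                Ok(true) // the caller decodes the record at this position
            }
            Err(_) => Ok(false), // the key is not in this table
        }
    }
}
```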

@iAziz786 (Author)

@adambcomer

Apparently RocksDB has something called FileMetaData, which stores important information like the largest and smallest keys of a file. I assume FileMetaData is an in-memory class created when the database starts, and that there is one FileMetaData per SST file in the database. When doing a Get() query, we would first find the correct file based on the FileMetaData and then do a linear search within that one file.

Does this sound good?

@adambcomer (Owner)

@iAziz786

I think you are missing some foundational knowledge about LSM databases. I highly recommend you read more about the subject to understand the theoretical underpinnings of this database. Reading the original paper should fill the holes in your understanding.

Original paper: https://www.cs.umb.edu/~poneil/lsmtree.pdf

Best of luck,
Adam Comer

@iAziz786 (Author)

@adambcomer

Yeah, that's correct. I lack some fundamental understanding of LSM databases. I'll get back to you after reading the paper.

Thank you!

@huangzixun123

Hey @adambcomer, I have been following this project and I want to know how long it will take to complete. I'm really looking forward to it!

@iAziz786 (Author)

Me too.

@huangzixun123 commented Jun 30, 2023 via email

@muqiuhan

You can check out @adambcomer's other project: WiscKey.
I read the WiscKey and original LSM-Tree papers, which can help you understand the unfinished chapters of this repo.

@huangzixun123

@muqiuhan good advice! thx
