Implement a BinExport v3 format based on SQLite #77

Open
cblichmann opened this issue Jun 11, 2021 · 0 comments
Labels
enhancement New feature or request

Comments

@cblichmann
Member

The current protobuf-based format was originally based on the PostgreSQL database schema used by the (now archived) BinNavi project. It is heavily optimized for compactness and for compressing well, as Google's internal use case is to store billions of them.
This, in turn, makes accessing the disassembly structure somewhat difficult and error-prone (e.g. see binexport.cc:GetInstructionAddress()). One has to write a lot of code to get to the most basic information. This code also has to be implemented at least in C++ (for BinDiff core), Java (for its UI) and possibly Python if one wishes to use the format from a script in one of the supported disassemblers.
Another issue with the current protobuf-based format is that Protocol Buffers messages are not self-delimiting and always have to be parsed as a whole. The (never published) BinExport v1 format used a small header with (file offset, size) pairs followed by individual CallGraph/FlowGraph proto messages. To save space, the v2 format combined everything into one big message. This design decision has led to various problems. For example, BinDiff has to reparse the full .BinExport file each time symbols and comments are imported. As another example, some binaries (such as Electron) lead to proto messages that are hundreds of megabytes in size, resulting in warnings from libprotobuf itself, as messages over 32 MiB are considered inefficient.
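
To illustrate why an (offset, size) index avoids parsing everything, here is a minimal sketch in Python. The v1 format was never published, so the layout below (a `uint32` count followed by little-endian `uint64` pairs) and the names `write_indexed` and `read_message` are purely hypothetical, and the payloads are opaque bytes standing in for serialized CallGraph/FlowGraph messages.

```python
import io
import struct

# Hypothetical layout, NOT the real v1 format: a uint32 entry count, then one
# (offset, size) pair of little-endian uint64 values per message, then the
# message payloads back to back.
COUNT = struct.Struct("<I")
ENTRY = struct.Struct("<QQ")


def write_indexed(messages):
    """Serialize opaque message payloads behind an (offset, size) index."""
    header_size = COUNT.size + ENTRY.size * len(messages)
    out = io.BytesIO()
    out.write(COUNT.pack(len(messages)))
    offset = header_size  # the first payload starts right after the header
    for payload in messages:
        out.write(ENTRY.pack(offset, len(payload)))
        offset += len(payload)
    for payload in messages:
        out.write(payload)
    return out.getvalue()


def read_message(stream, index):
    """Read only the index-th payload, seeking past everything else."""
    stream.seek(0)
    (count,) = COUNT.unpack(stream.read(COUNT.size))
    if not 0 <= index < count:
        raise IndexError(index)
    stream.seek(COUNT.size + ENTRY.size * index)
    offset, size = ENTRY.unpack(stream.read(ENTRY.size))
    stream.seek(offset)
    return stream.read(size)


# Stand-ins for one call graph message and two flow graph messages.
blob = write_indexed([b"call graph", b"flow graph 1", b"flow graph 2"])
second = read_message(io.BytesIO(blob), 1)
```

With such an index, a reader touches only the header and the one payload it needs. A single combined message, as in v2, offers no comparable entry point.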

A new database-based format would allow for a somewhat more natural query interface and SQL queries that can be shared across languages. As BinDiff already uses SQLite for its result and workspace files, it seems like an obvious choice that does not require a database server. SQLite-based formats can also be consumed partially, and it should be possible to keep them small, too.
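
As a rough sketch of what such a query interface might look like, the following Python snippet uses the standard `sqlite3` module with an in-memory database. The tables (`function`, `basic_block`, `instruction`) and their columns are invented for illustration only; no v3 schema has been defined here.

```python
import sqlite3

# Hypothetical schema for illustration; not an actual BinExport v3 design.
SCHEMA = """
CREATE TABLE function (
    address INTEGER PRIMARY KEY,
    name TEXT
);
CREATE TABLE basic_block (
    id INTEGER PRIMARY KEY,
    function_address INTEGER NOT NULL REFERENCES function(address)
);
CREATE TABLE instruction (
    address INTEGER PRIMARY KEY,
    basic_block_id INTEGER NOT NULL REFERENCES basic_block(id),
    mnemonic TEXT NOT NULL
);
"""

# Plain SQL text: the same string could be issued from C++ or Java.
INSTRUCTIONS_OF_FUNCTION = """
SELECT i.address, i.mnemonic
FROM instruction AS i
JOIN basic_block AS b ON b.id = i.basic_block_id
WHERE b.function_address = ?
ORDER BY i.address
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)

# Made-up sample data: one function with two basic blocks.
db.execute("INSERT INTO function VALUES (?, ?)", (0x1000, "main"))
db.executemany(
    "INSERT INTO basic_block VALUES (?, ?)",
    [(1, 0x1000), (2, 0x1000)],
)
db.executemany(
    "INSERT INTO instruction VALUES (?, ?, ?)",
    [(0x1000, 1, "push"), (0x1001, 1, "mov"), (0x1004, 2, "ret")],
)

# Instruction addresses are stored directly, so no client-side
# reconstruction is needed to list them for a function.
rows = db.execute(INSTRUCTIONS_OF_FUNCTION, (0x1000,)).fetchall()
```

Because the query is just a string, it can live in one place and be reused by every consumer, and SQLite only reads the pages the query actually needs.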

@cblichmann cblichmann added the enhancement New feature or request label Jun 11, 2021