
Epic: Large Data Support #10494

Closed
DLehenbauer opened this issue Jun 1, 2022 · 14 comments

@DLehenbauer
Contributor
DLehenbauer commented Jun 1, 2022

Roadmap for Large Ops and Summaries

Milestones

M0: Design and POC

While we generally understand the path for M1 (efficient use of storage and network), many Fluid customers are highly sensitive to application startup and document load times.

The outcome of this deliverable is to show that the additional code and computation required for compression yield a neutral-to-positive impact across the spectrum of Fluid customers, and to select the specific algorithms we will use in M1 and M1.5.

A further desirable outcome is that the synthetic benchmarks developed for measuring compression can also be used for other performance tuning with data that statistically resembles real-world scenarios.
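
As a rough illustration of what these benchmarks could look like, here is a minimal Node/TypeScript sketch (not the actual deliverable). It assumes the msgpackr package and Node's built-in zlib; the synthetic document generator is a simple stand-in for the real-world-like data described above.

```ts
import { randomUUID } from "node:crypto";
import { brotliCompressSync, gzipSync } from "node:zlib";
import { pack } from "msgpackr";

// Stand-in for a synthetic data generator: records with UUID ids and
// repetitive property names, loosely resembling real-world payloads.
const makeSyntheticDoc = () =>
  Array.from({ length: 1000 }, (_, i) => ({
    id: randomUUID(),
    name: `property-${i % 50}`,
    value: Math.random() * 100,
  }));

const doc = makeSyntheticDoc();
const candidates: Array<[string, Buffer]> = [
  ["json", Buffer.from(JSON.stringify(doc))],
  ["msgpackr", Buffer.from(pack(doc))],
];

for (const [name, bytes] of candidates) {
  const start = process.hrtime.bigint();
  const brotli = brotliCompressSync(bytes);
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(
    `${name}: raw=${bytes.length}B gzip=${gzipSync(bytes).length}B ` +
      `brotli=${brotli.length}B (brotli took ${ms.toFixed(1)}ms)`,
  );
}
```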

Deliverables

  • Develop benchmark to evaluate existing serializers (e.g., MsgPackR)

  • Develop benchmark to evaluate existing compression algorithms (e.g., LZ4)

  • Develop benchmark to evaluate effects of holistic UUID compression for Fluid runtime

  • Analysis of summary structure and metadata

  • Analysis of op message structure and metadata

  • Proof of concept using PropertyTree DDS as tactical solution

M1: Summaries make efficient use of network & storage (~10x)

This milestone tracks pragmatic work to improve the efficiency with which we use our current network and storage limits. Changes to the storage format require backwards compatibility with prior versions. Therefore, it is desirable to bundle changes to minimize the number of versions that the runtime must support.

Deliverables

  • Replace JSON serialization with a faster/compact binary alternative

    • Modify runtime to avoid base64 encoding with binary payloads (see the sketch after this list)

    • Apply alternative serialization and/or general compression to summaries

  • Implement ID compression
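
For the base64 item above, a minimal sketch of the intended direction, using the SummaryTreeBuilder discussed later in this thread (assuming its addBlob overload that accepts a Uint8Array; the blob key is illustrative):

```ts
import { SummaryTreeBuilder } from "@fluidframework/runtime-utils";

// Attach serialized DDS state as a raw Uint8Array rather than a base64
// string, sidestepping the ~33% base64 size penalty discussed below.
function summarizeBinary(payload: Uint8Array) {
  const builder = new SummaryTreeBuilder();
  builder.addBlob("content", payload); // binary path: no base64 step
  return builder.getSummaryTree();
}
```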

M1.5: Ops make efficient use of network & storage (~0.5x..15x)

As with M1, this milestone tracks pragmatic work to improve the efficiency with which we use our current network and storage limits; changes to the storage format must remain backward compatible, so it is desirable to bundle them to minimize the number of versions the runtime must support.

Deliverables

  • Replace JSON serialization with a faster/compact binary alternative

    • Modify runtime to support binary payloads in op messages

    • Apply alternative serialization and/or general compression to payload

    • Remove duplicate serialization of op message content

    • Process msg without deserializing or decompressing payload

  • Implement ID compression (see the sketch after this list)

  • Remove redundant metadata within batched ops
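
To make the ID compression item concrete, here is a sketch of the core idea only (not the actual Fluid IdCompressor API): UUIDs that repeat across ops and summaries are interned as small integers, with a table to translate back.

```ts
// Illustrative ID compression: repeated 36-character UUID strings are
// replaced by small integers, so each further reference costs a few bytes.
class TinyIdCompressor {
  private readonly toLocal = new Map<string, number>();
  private readonly toUuid: string[] = [];

  compress(uuid: string): number {
    let local = this.toLocal.get(uuid);
    if (local === undefined) {
      local = this.toUuid.length;
      this.toUuid.push(uuid);
      this.toLocal.set(uuid, local);
    }
    return local;
  }

  decompress(local: number): string {
    const uuid = this.toUuid[local];
    if (uuid === undefined) throw new Error(`unknown local id: ${local}`);
    return uuid;
  }
}
```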

M2: Incremental Summaries (#11572)

This milestone tracks work items that reduce the size of summary uploads by reusing portions of previous summaries for unchanged data. We plan to tackle this problem at two levels. The first is through automatic identification and de-duplication on the client using techniques like Content-Defined Chunking. The second is by exposing the original blob handles to the DDS so that a sophisticated DDS (SharedTree) can improve reuse with less cost by manually tracking and reusing chunks from previous summaries.
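
As a concrete illustration of the first level, here is a minimal Gear-style CDC sketch (illustrative only; the actual algorithm choice is the subject of the benchmark deliverable below). Because chunk boundaries are derived from the content itself rather than from fixed offsets, an edit only perturbs the chunks near it, and unchanged regions keep producing identical, deduplicatable chunks.

```ts
// Fixed pseudo-random "gear" table; it must be identical on every client
// for deduplication to work, hence no Math.random() here.
const GEAR = new Uint32Array(256);
for (let i = 0; i < 256; i++) GEAR[i] = Math.imul(i + 1, 2654435761) >>> 0;

// Declare a boundary when the low `maskBits` bits of a rolling hash are zero,
// yielding an average chunk size of roughly 2^maskBits bytes.
function cdcChunks(data: Uint8Array, maskBits = 13, minChunk = 2048): Uint8Array[] {
  const mask = (1 << maskBits) - 1;
  const chunks: Uint8Array[] = [];
  let hash = 0;
  let start = 0;
  for (let i = 0; i < data.length; i++) {
    hash = ((hash << 1) + GEAR[data[i]]) >>> 0;
    if (i - start >= minChunk && (hash & mask) === 0) {
      chunks.push(data.subarray(start, i + 1));
      start = i + 1;
      hash = 0;
    }
  }
  if (start < data.length) chunks.push(data.subarray(start));
  return chunks;
}
```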

Deliverables

  • Automatic chunk reuse:

    • Develop benchmark to evaluate Content-Defined Chunking (CDC) algorithms

    • Modify runtime to apply CDC to summaries prior to compression

    • Implement client-side blob deduplication

  • Explicit chunk reuse (SharedTree)

    • Expose summary blob handles to DDS

    • Track and reuse summary blobs

M3: Chunked Ops

This milestone tracks the work items necessary to process operations that exceed service limits (Socket.IO, Kafka, etc.). This work requires some initial analysis to find the right balance between allowing concurrent edits and overall system complexity.

  • Complexity and performance analysis of three alternatives:

    • Use of a side-channel for large ops

    • Synthetically assigning seq#s at the client

    • Applying multiple operations with the seq# of the final chunk

  • Modify runtime to apply chunking strategy to large op payloads (a sketch of the partitioning and re-assembly follows this list)

    • Detection of large ops

    • Partitioning of large batches into multiple messages

    • Re-assembly of multiple messages into a single batch

    • Orchestration of incoming/outgoing messages to preserve ordering guarantees
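
A minimal sketch of the partitioning and re-assembly mechanics (the envelope shape and size limit are illustrative, not the runtime's actual chunked-op format):

```ts
// Illustrative chunked-op envelope: a large serialized batch is split into
// bounded pieces; the receiver buffers pieces per chunkId and reassembles
// once all of them have arrived.
interface OpChunk {
  chunkId: string; // identifies the original large op
  index: number;   // position of this piece
  total: number;   // total number of pieces
  piece: string;   // slice of the serialized payload
}

function splitOp(chunkId: string, payload: string, maxLen: number): OpChunk[] {
  const total = Math.ceil(payload.length / maxLen);
  return Array.from({ length: total }, (_, index) => ({
    chunkId,
    index,
    total,
    piece: payload.slice(index * maxLen, (index + 1) * maxLen),
  }));
}

const pending = new Map<string, string[]>();

// Returns the reassembled payload once the last piece arrives, else undefined.
function receiveChunk(chunk: OpChunk): string | undefined {
  const pieces = pending.get(chunk.chunkId) ?? [];
  pieces[chunk.index] = chunk.piece;
  pending.set(chunk.chunkId, pieces);
  if (pieces.filter((p) => p !== undefined).length !== chunk.total) return undefined;
  pending.delete(chunk.chunkId);
  return pieces.join("");
}
```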

Future: Distributed Summaries (SharedTree)

  • Research and feasibility analysis of distributed summaries

Future: Cloud Summaries (SharedTree)

@ghost ghost added the triage label Jun 1, 2022
@DLehenbauer DLehenbauer self-assigned this Jun 1, 2022
@DLehenbauer DLehenbauer added area: runtime Runtime related issues area: driver Driver related issues area: dds Issues related to distributed data structures and removed triage labels Jun 7, 2022
@DLehenbauer
Contributor Author

@dstanesc, @milanro - FYI, initial draft of Large Data epic.

@milanro
Contributor

milanro commented Jun 14, 2022

Hello @DLehenbauer, should we create a special discussion/design ticket for this epic?

Base64 removal optimization:

However, due to a past limitation of the Fluid Runtime API, the resulting binary is then base64 encoded (~33% penalty).

We've since introduced the SummaryTreeBuilder, which allows Uint8Arrays to be attached to the summary without first encoding as base64. Using SummaryTreeBuilder to avoid the base64 encoding is both a size and perf win.

It looks like Property DDS already uses SummaryTreeBuilder but still encodes the blobs with base64. When this is disabled and the blobs are in binary form, it looks like the routerlicious Upload Manager converts them to base64 automatically. Is there any trick to force binary blobs in the SummaryTree (or is there another SummaryTreeBuilder available besides the one used by Property DDS)?

The WholeSummaryUploadManager is used for uploading to routerlicious; the implementation is located in

/server/routerlicious/packages/services-client/src/wholeSummaryUploadManager.ts

The following method is called in order to upload:

WholeSummaryUploadManager.writeSummaryTreeCore(...)

It generates the transfer form of the tree in

server/routerlicious/packages/services-client/src/storageUtils.ts

in the function convertSummaryTreeToWholeSummaryTree(...), which base64-encodes the binary blobs:

```ts
if (typeof summaryObject.content === "string") {
    // ...
    content: summaryObject.content,
    // ...
} else {
    // ...
    // Binary content is unconditionally re-encoded as base64 here.
    content: Uint8ArrayToString(summaryObject.content, "base64"),
    // ...
}
```
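
For scale, the penalty this branch introduces is easy to reproduce with a standalone Node snippet (unrelated to the Fluid codebase):

```ts
import { randomFillSync } from "node:crypto";

// base64 encodes every 3 bytes as 4 ASCII characters, so binary content
// grows by roughly a third when it passes through this branch.
const blob = randomFillSync(Buffer.alloc(3_000_000)); // e.g., a 3 MB blob
const encoded = blob.toString("base64");
const overhead = ((encoded.length / blob.length - 1) * 100).toFixed(0);
console.log(`binary: ${blob.length} B, base64: ${encoded.length} B (+${overhead}%)`);
```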

@DLehenbauer
Contributor Author

@milanro - Creating a new GitHub issue about base64 would be very helpful. Would you mind doing that and tagging me in it?

@DLehenbauer
Contributor Author

@milanro, @dstanesc - FYI. Some initial feedback from @vladsud that I'll factor into the plan soon:

We're thinking of starting the summary work (M1) and a subset of the ops work (M1.5) in parallel. The thought is that the two of you would drive large summaries and MSFT devs would drive ops (at least initially). The reason is that MSFT has more pain around large ops due to batching, while I believe you're most interested in large summaries and the single large op scenario.

Vlad would prefer to decouple binary encoding from eliminating base64. His reasoning is that we'll need to continue to support downgrading to base64 until the protocol update has sufficient penetration on both the client and service, and therefore he sees it as a lower priority than landing the other benefits of binary & compression, even with the 33% overhead of base64.

Vlad corrected me that incremental summaries (M3) do not depend on switching to shredded summaries (M2). Even though the "whole document" summarizer uploads a single monolithic blob, the service will still decompose that blob and assign IDs to the fragments. This means we'll have reusable blob IDs for the interior nodes and can deprioritize shredded summaries, which would then only become relevant when a document has 30+ MB of changes to upload.

@DLehenbauer
Contributor Author

I was reading through @vladsud's notes on op chunking, and one observation he made is that because chunking will amplify the number of ops a client sends, clients will be more likely to hit service rate limits (e.g., too many ops per unit of time).

I believe the current throttling algorithm tolerates short bursts, but we may need to add some form of backpressure to the large data roadmap so that clients can locally throttle transmission and/or production to avoid exceeding these limits.
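
One simple shape such client-side throttling could take is a token bucket over outgoing ops; this is a sketch of the general technique, not a proposal for the actual runtime API:

```ts
// Token-bucket backpressure: each outgoing op consumes a token; tokens refill
// at the service's allowed rate. Short bursts pass (the bucket starts full),
// but sustained overload is smoothed on the client before the service's rate
// limiter would reject ops.
class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(
    private readonly capacity: number,
    private readonly refillPerSecond: number,
  ) {
    this.tokens = capacity;
  }

  // Returns false when the caller should queue or delay the op.
  tryConsume(): boolean {
    const now = Date.now();
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((now - this.lastRefill) / 1000) * this.refillPerSecond,
    );
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```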

@dstanesc
Contributor

dstanesc commented Jul 15, 2022

@DLehenbauer @vladsud Created a first draft for material-like synthetic data generation: fake-material-data. Experimenting with template-based data generation, probably a reusable pattern for other data domains. Looking forward to feedback and contributions :).

@dstanesc
Contributor

Derived from the above, I just published a materials data compression benchmark repository. It currently evaluates the brotli, pako, and lz4js compression libraries; serialization is provided by msgpackr. Materials are generated in a typical range of 20–1000 properties.

@DLehenbauer
Contributor Author

@dstanesc - That's awesome, thank you. I'll make sure @vladsud sees this as well.

@dstanesc
Contributor

@DLehenbauer @vladsud @milanro One more synthetic data generator: fake-metrology-data. Also available via the npm registry. Pretty large payloads can be created with it.

@dstanesc
Contributor

dstanesc commented Aug 3, 2022

... and also the associated metrology data compression benchmark. It deserves noting that while lz4js is fastest, its compression ratio remains behind competitors; for instance, an average-quality Brotli compression squeezes a fairly large metrology report to almost half the size lz4js does. The difference in speed, however, is notable: >10x.

@dstanesc
Contributor

Status overview on HxGN large data contributions:

Open PRs

  • PR 11600 - Configurable Summary Compression
  • PR 12396 - Content Defined Chunking Module
  • Draft PR 12494 - Summary Compaction (Deduplication) Library Proposal, POC
  • PR 12699 - Summary Compaction (Deduplication) Library

Fruition Depends On

@microsoft-github-policy-service
Contributor

This issue has been automatically marked as stale because it has had no activity for 60 days. It will be closed if no further activity occurs within 8 days of this comment. Thank you for your contributions to Fluid Framework!

@milanro
Contributor

milanro commented Feb 2, 2023

We should probably keep this open.

@microsoft-github-policy-service
Contributor

This issue has been automatically marked as stale because it has had no activity for 180 days. It will be closed if no further activity occurs within 8 days of this comment. Thank you for your contributions to Fluid Framework!
