feat(mito): Implement SST format for mito2 #2178

evenyag · 2023-08-15T13:01:32Z

I hereby agree to the terms of the GreptimeDB CLA

What's changed and what's your intention?

This PR implements the SST format for the mito2 engine.

The new SST format encodes the primary keys in a memory-comparable format and stores them as dictionary arrays. We distinguish different time series by comparing the keys of the dictionary array while decoding the RecordBatch.
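The idea above can be sketched with the standard library alone. This is only an illustration, not the code in this PR: the escaping scheme in `encode_primary_key`, the `KeyDictionary` struct, and the helper names are all hypothetical stand-ins for the real key codec and arrow's dictionary array. It assumes string tags and rows that are already sorted by primary key.

```rust
use std::ops::Range;

/// Encodes tag values so that comparing the output bytes gives the same
/// order as comparing the tags one by one.
/// Hypothetical scheme (not the engine's codec): escape 0x00 as [0x00, 0xFF]
/// and terminate every tag with [0x00, 0x00].
fn encode_primary_key(tags: &[&str]) -> Vec<u8> {
    let mut out = Vec::new();
    for tag in tags {
        for &b in tag.as_bytes() {
            if b == 0 {
                out.extend_from_slice(&[0x00, 0xFF]);
            } else {
                out.push(b);
            }
        }
        out.extend_from_slice(&[0x00, 0x00]);
    }
    out
}

/// Toy stand-in for a dictionary array: `keys[i]` is the index into
/// `values` of the encoded primary key of row `i`.
struct KeyDictionary {
    values: Vec<Vec<u8>>,
    keys: Vec<u32>,
}

/// Builds the dictionary from per-row encoded keys.
/// Assumes the rows are sorted, so equal keys are adjacent.
fn build_dictionary(sorted_row_keys: &[Vec<u8>]) -> KeyDictionary {
    let mut values: Vec<Vec<u8>> = Vec::new();
    let mut keys = Vec::with_capacity(sorted_row_keys.len());
    for key in sorted_row_keys {
        if values.last() != Some(key) {
            values.push(key.clone());
        }
        keys.push((values.len() - 1) as u32);
    }
    KeyDictionary { values, keys }
}

/// Splits the rows into one range per time series by comparing only the
/// small integer dictionary keys, never the encoded bytes.
fn split_by_series(keys: &[u32]) -> Vec<Range<usize>> {
    let mut ranges = Vec::new();
    let mut start = 0;
    for i in 1..=keys.len() {
        if i == keys.len() || keys[i] != keys[start] {
            ranges.push(start..i);
            start = i;
        }
    }
    ranges
}
```

With rows for `host1`, `host1`, `host2`, the dictionary holds two values and `split_by_series` yields two row ranges, one per series. Comparing integer keys instead of full byte strings is what makes the boundary check cheap during decoding.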

We store three internal columns in Parquet:

__primary_key, the encoded primary key of the row (tags).
__sequence, the sequence number of the row.
__op_type, the operation type of the row.

The schema of a parquet file is:

field 0, field 1, ..., field N, time index, primary key, sequence, op type
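A minimal sketch of that column order, assuming nothing beyond the layout described here. The internal column names come from this description; `sst_column_names`, `trailing_columns`, and `TrailingColumns` are hypothetical helpers for illustration and are not the functions added by this PR.

```rust
/// Internal column names taken from the PR description.
const PRIMARY_KEY_COLUMN: &str = "__primary_key";
const SEQUENCE_COLUMN: &str = "__sequence";
const OP_TYPE_COLUMN: &str = "__op_type";

/// Positions of the trailing columns in a file with `num_fields` field columns.
#[derive(Debug, PartialEq)]
struct TrailingColumns {
    time_index: usize,
    primary_key: usize,
    sequence: usize,
    op_type: usize,
}

/// Returns the column names in storage order:
/// fields, then time index, primary key, sequence, op type.
fn sst_column_names(fields: &[&str], time_index: &str) -> Vec<String> {
    let mut names: Vec<String> = fields.iter().map(|f| f.to_string()).collect();
    names.push(time_index.to_string());
    names.push(PRIMARY_KEY_COLUMN.to_string());
    names.push(SEQUENCE_COLUMN.to_string());
    names.push(OP_TYPE_COLUMN.to_string());
    names
}

/// Because the four non-field columns always come last, their positions
/// follow directly from the number of field columns.
fn trailing_columns(num_fields: usize) -> TrailingColumns {
    TrailingColumns {
        time_index: num_fields,
        primary_key: num_fields + 1,
        sequence: num_fields + 2,
        op_type: num_fields + 3,
    }
}
```

For example, with fields `cpu` and `mem` and a time index named `ts`, the order is `cpu, mem, ts, __primary_key, __sequence, __op_type`, and `trailing_columns(2).op_type` is 5.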

Checklist

I have written the necessary rustdoc comments.
I have added the necessary unit tests and integration tests.

Refer to a related PR or issue link (optional)

Tracking issue for implementing mito as a region engine #1869

codecov · 2023-08-16T08:46:52Z

Codecov Report

Merging #2178 (241ff3e) into develop (8ea1763) will decrease coverage by 0.39%.
Report is 4 commits behind head on develop.
The diff coverage is 76.72%.

@@             Coverage Diff             @@
##           develop    #2178      +/-   ##
===========================================
- Coverage    84.68%   84.29%   -0.39%     
===========================================
  Files          698      700       +2     
  Lines       112701   113147     +446     
===========================================
- Hits         95437    95377      -60     
- Misses       17264    17770     +506

src/mito2/src/sst/parquet/reader.rs

src/mito2/src/sst/parquet/format.rs

src/mito2/src/read.rs

src/mito2/src/error.rs

waynexia

LGTM except the missing tests

v0y4g3r

LGTM

* chore: update comment
* feat: stream writer takes arrow's types
* feat: Define Batch struct
* feat: arrow_schema_to_store
* refactor: rename
* feat: write parquet in new format with tsids
* feat: reader support projection
* feat: Impl read compat
* refactor: rename SchemaCompat to CompatRecordBatch
* feat: changing sst format
* feat: make it compile
* feat: remove tsid and some structs
* feat: from_sst_record_batch wip
* chore: push array
* chore: wip
* feat: decode batches from RecordBatch
* feat: reader converts record batches
* feat: remove compat mod
* chore: remove some codes
* feat: sort fields by column id
* test: test to_sst_arrow_schema
* feat: do not sort fields
* test: more test helpers
* feat: simplify projection
* fix: projection indices is incorrect
* refactor: define write/read format
* test: test write format
* test: test projection
* test: test convert record batch
* feat: remove unused errors
* refactor: wrap get_field_batch_columns
* chore: clippy
* chore: fix clippy
* feat: build arrow schema from region meta in ReadFormat
* feat: initialize the parquet reader at `build()`
* chore: fix typo

evenyag added 19 commits August 15, 2023 10:58

chore: update comment

807ccfe

feat: stream writer takes arrow's types

c8a2ec4

feat: Define Batch struct

d5a7360

feat: arrow_schema_to_store

2a8d374

refactor: rename

1571114

feat: write parquet in new format with tsids

70e15a5

feat: reader support projection

319280a

feat: Impl read compat

725b7da

refactor: rename SchemaCompat to CompatRecordBatch

05b4ade

feat: changing sst format

8601b6c

feat: Merge branch 'develop' into feat/mito2-read

3c5e6d0

feat: make it compile

c33c46c

feat: remove tsid and some structs

a56f2f6

feat: from_sst_record_batch wip

48f5aa4

chore: push array

bbf5ce9

chore: wip

f1f4c5f

feat: decode batches from RecordBatch

ec8bf4d

feat: reader converts record batches

f5d8934

feat: remove compat mod

ccfd736

evenyag changed the title from "feat(mito): Implement SST format" to "feat(mito): Implement SST format for mito2" on Aug 15, 2023

chore: remove some codes

bbcf8ee

evenyag force-pushed the feat/mito2-sst-format branch from 8cb4479 to c619a17 on August 15, 2023 13:38

evenyag added 6 commits August 15, 2023 21:42

feat: sort fields by column id

d0591ab

test: test to_sst_arrow_schema

757ef86

feat: do not sort fields

2891b9d

test: more test helpers

a2e5f62

feat: simplify projection

557d286

fix: projection indices is incorrect

a466935

evenyag force-pushed the feat/mito2-sst-format branch from c619a17 to a466935 on August 15, 2023 15:04

refactor: define write/read format

cb6fad5

evenyag added 7 commits August 16, 2023 14:39

test: test write format

ec552f2

test: test projection

e268d5a

test: test convert record batch

3ae156a

feat: remove unused errors

71e68e3

refactor: wrap get_field_batch_columns

fb7b531

chore: clippy

7cc761e

chore: Merge branch 'develop' into feat/mito2-sst-format

8123d3a

evenyag force-pushed the feat/mito2-sst-format branch from a554ce9 to 8123d3a on August 16, 2023 08:20

evenyag marked this pull request as ready for review August 16, 2023 08:21

chore: fix clippy

a0aa8c6

evenyag requested review from v0y4g3r and waynexia August 16, 2023 09:04

v0y4g3r reviewed Aug 16, 2023

src/mito2/src/sst/parquet/reader.rs (outdated)

src/mito2/src/sst/parquet/format.rs (outdated)

src/mito2/src/read.rs (outdated)

src/mito2/src/read.rs (outdated)

waynexia reviewed Aug 17, 2023

src/mito2/src/error.rs (outdated)

waynexia enabled auto-merge August 17, 2023 02:50

evenyag added 2 commits August 17, 2023 11:12

feat: build arrow schema from region meta in ReadFormat

a7b63f3

feat: initialize the parquet reader at build()

0b493fc

evenyag requested review from v0y4g3r and waynexia August 17, 2023 03:30

chore: fix typo

241ff3e

waynexia approved these changes Aug 17, 2023

evenyag self-assigned this Aug 17, 2023

v0y4g3r approved these changes Aug 17, 2023

waynexia added this pull request to the merge queue Aug 17, 2023

Merged via the queue into GreptimeTeam:develop with commit 4ba1215 Aug 17, 2023
13 checks passed
