
Database optimize #291

Open · zly2006 wants to merge 10 commits into master
Conversation

zly2006 (Contributor) commented Sep 29, 2024

This PR optimizes database size in several ways:

  • Use a long instead of a string to store timestamps.
  • Use a separate string table to store strings, and use Java hash codes to speed up searching (see the sketch below).

I found it very hard to support both the old and the new schema at the same time. Currently only auto purge and manual purge are supported. I added a /ledger convert command to convert data from the old schema.
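For context, here is a minimal sketch of what the string-table idea could look like in SQLite. This is my own illustration, not the actual DDL from the PR, and the table and column names are assumptions:

-- Hypothetical illustration of the string-table approach (not the PR's actual schema).
-- Strings are stored once; other tables reference them by id. A Java-style hash is
-- stored alongside and indexed, so lookups only compare values within hash collisions.
create table strings
(
    id    INTEGER primary key autoincrement,
    hash  INT  not null,   -- Java String.hashCode() of value
    value TEXT not null
);

create index strings_hash on strings (hash);

-- lookup: filter by the cheap integer hash first, then confirm the exact value
select id from strings where hash = ? and value = ?;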

zly2006 closed this Sep 29, 2024
zly2006 reopened this Sep 29, 2024
zly2006 (Contributor, Author) commented Sep 29, 2024

I think we can only support searching on the old schema, and implement a conversion program for it.

zly2006 (Contributor, Author) commented Sep 29, 2024

The old schema on SQLite stores the timestamp as a string:

-- auto-generated definition
create table actions
(
    id              INTEGER
        primary key autoincrement,
    action_id       INT     not null
        constraint fk_actions_action_id__id
            references ActionIdentifiers
            on update restrict on delete restrict,
    time            TEXT    not null,
    x               INT     not null,
    y               INT     not null,
    z               INT     not null,
    world_id        INT     not null
        constraint fk_actions_world_id__id
            references worlds
            on update restrict on delete restrict,
    object_id       INT     not null
        constraint fk_actions_object_id__id
            references ObjectIdentifiers
            on update restrict on delete restrict,
    old_object_id   INT     not null
        constraint fk_actions_old_object_id__id
            references ObjectIdentifiers
            on update restrict on delete restrict,
    block_state     TEXT,
    old_block_state TEXT,
    source          INT     not null
        constraint fk_actions_source__id
            references sources
            on update restrict on delete restrict,
    player_id       INT
        constraint fk_actions_player_id__id
            references players
            on update restrict on delete restrict,
    extra_data      TEXT,
    rolled_back     BOOLEAN not null
);

create index actions_action_id
    on actions (action_id);

create index actions_by_location
    on actions (x, y, z, world_id);

create index actions_object_id
    on actions (object_id);

create index actions_old_object_id
    on actions (old_object_id);

create index actions_player_id
    on actions (player_id);

create index actions_source
    on actions (source);
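The time TEXT column above is what the new schema replaces with an integer. A rough sketch of how the conversion to epoch milliseconds could look, assuming the stored format is the one shown in the data below (e.g. 2024-07-01 11:54:21.086, treated as UTC); this is only my illustration, not the /ledger convert implementation:

-- Hypothetical conversion of the old TEXT timestamps to epoch milliseconds
-- (not the actual /ledger convert code). strftime('%s', ...) yields whole
-- seconds; the fractional part is taken from the trailing ".SSS".
select id,
       cast(strftime('%s', time) as INTEGER) * 1000
         + cast(substr(time, 21, 3) as INTEGER) as time_ms
from actions
limit 5;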

zly2006 marked this pull request as ready for review October 6, 2024 07:05
zly2006 requested a review from a team as a code owner October 6, 2024 07:05
zly2006 (Contributor, Author) commented Oct 6, 2024

Testing on my real game server:

new schema:
20140591 unique strings, 46653892 actions, timestamps 1727628523588-1728200973330 (159 hours), 26.1 GB, 601 bytes/action

old schema:
6308393 actions, time 2024-07-01 11:54:21.086 -> 2024-10-04 03:34:17.058, 12.0 GB, 2058 bytes/action

Roughly 70% of database size saved (bytes per action).

DrexHD (Contributor) commented Oct 9, 2024

Impressive! But...
Maintaining two different formats feels very janky: it introduces a lot of code duplication and maintenance burden, and it gets worse if we want to make more changes to the schema.
We have already discussed versioned SQL with automatic schema updates on Discord, which would give us the ability to improve the database format over time, but that is a lot of effort to implement properly.
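For reference, one common way to version a SQLite schema is to track a version number and apply migrations in order. This is only an illustration of the general pattern, not a design the project has agreed on:

-- Generic schema-versioning pattern (an illustration, not the project's design).
-- SQLite stores a user-settable integer in the database header.
PRAGMA user_version;                 -- 0 on a database that has never been migrated

-- example migration to "version 1": add an integer timestamp column
BEGIN;
ALTER TABLE actions ADD COLUMN time_ms INTEGER;
UPDATE actions SET time_ms = cast(strftime('%s', time) as INTEGER) * 1000;
PRAGMA user_version = 1;
COMMIT;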

zly2006 (Contributor, Author) commented Oct 10, 2024

I ran this SQL on a 100k-row sample of our real data:

select action_id, count(*) as c from actions
group by action_id, x, y, z
order by c desc;

Here are the top 10 results (action_id, c):

6,47566
6,27051
6,5015
1,4880
1,4880
6,4196
1,2442
6,713
6,523
6,237

So we can implement a smarter "auto purge" that purges actions once more than 100 have accumulated at the same location; this would remove about 97% of the data from our database. The query to estimate the savings:

select sum(c) - 100 * count(*)
from (select action_id, count(*) as c from actions
      group by action_id, x, y, z
      order by c desc)
where c > 100;
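For completeness, here is a sketch of what the purge itself could look like. This is my own illustration under the assumption that the newest 100 rows per location group are kept (the PR may do it differently):

-- Hypothetical purge (not the PR's implementation): keep the newest 100 rows
-- per (action_id, x, y, z, world_id) group and delete the rest.
-- world_id is included so locations in different dimensions are kept separate.
-- Requires SQLite 3.25+ for window functions.
delete from actions
where id in (
    select id from (
        select id,
               row_number() over (
                   partition by action_id, x, y, z, world_id
                   order by time desc
               ) as rn
        from actions
    )
    where rn > 100
);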

zly2006 (Contributor, Author) commented Oct 10, 2024

My approach will be to add some new config options: smart_purge_enabled, smart_purge_filters (default: action_id,object_id,world_id,x,y,z) and smart_purge_max (default: 100).

If a group of actions has the same values for all smart_purge_filters columns, we purge the oldest ones whenever the group's count reaches smart_purge_max. A query like the one below could find the affected groups.

We can also encourage server admins to enable it, since it almost never loses important data and saves a lot of space.
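A hedged illustration of how the smart_purge_max check could be expressed with the default filters; this is my sketch, not code from the PR:

-- Hypothetical query for the smart-purge check: groups that reach the
-- smart_purge_max threshold (100 by default) with the default filter columns.
select action_id, object_id, world_id, x, y, z, count(*) as c
from actions
group by action_id, object_id, world_id, x, y, z
having count(*) >= 100;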
