
Database optimize #291

Open · zly2006 wants to merge 10 commits into master
Conversation

zly2006 (Contributor) commented Sep 29, 2024

This PR optimizes database size in several ways:

  • Use a long instead of a string to store timestamps.
  • Use a separate string table to store strings, and use Java hash codes to speed up searching (see the sketch below).

I found it very hard to support both the old and the new schema at the same time. Currently only auto purge and manual purge are supported. I added a /ledger convert command to convert data from the old schema.
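For context, here is a minimal sketch of what the string-table idea could look like in SQLite. This is my own illustration, not the actual DDL from the PR, and the table and column names are assumptions:

-- Hypothetical illustration of the string-table approach (not the PR's actual schema).
-- Strings are stored once; other tables reference them by id. A Java-style hash is
-- stored alongside and indexed, so lookups only compare values within hash collisions.
create table strings
(
    id    INTEGER primary key autoincrement,
    hash  INT  not null,   -- Java String.hashCode() of value
    value TEXT not null
);

create index strings_hash on strings (hash);

-- lookup: filter by the cheap integer hash first, then confirm the exact value
select id from strings where hash = ? and value = ?;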

zly2006 closed this Sep 29, 2024
zly2006 reopened this Sep 29, 2024
zly2006 (Contributor, Author) commented Sep 29, 2024

I think we can only support searching on the old schema, and implement a conversion program for it.

zly2006 (Contributor, Author) commented Sep 29, 2024

The old schema on SQLite stores the timestamp as a string:

-- auto-generated definition
create table actions
(
    id              INTEGER
        primary key autoincrement,
    action_id       INT     not null
        constraint fk_actions_action_id__id
            references ActionIdentifiers
            on update restrict on delete restrict,
    time            TEXT    not null,
    x               INT     not null,
    y               INT     not null,
    z               INT     not null,
    world_id        INT     not null
        constraint fk_actions_world_id__id
            references worlds
            on update restrict on delete restrict,
    object_id       INT     not null
        constraint fk_actions_object_id__id
            references ObjectIdentifiers
            on update restrict on delete restrict,
    old_object_id   INT     not null
        constraint fk_actions_old_object_id__id
            references ObjectIdentifiers
            on update restrict on delete restrict,
    block_state     TEXT,
    old_block_state TEXT,
    source          INT     not null
        constraint fk_actions_source__id
            references sources
            on update restrict on delete restrict,
    player_id       INT
        constraint fk_actions_player_id__id
            references players
            on update restrict on delete restrict,
    extra_data      TEXT,
    rolled_back     BOOLEAN not null
);

create index actions_action_id
    on actions (action_id);

create index actions_by_location
    on actions (x, y, z, world_id);

create index actions_object_id
    on actions (object_id);

create index actions_old_object_id
    on actions (old_object_id);

create index actions_player_id
    on actions (player_id);

create index actions_source
    on actions (source);
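The time TEXT column above is what the new schema replaces with an integer. A rough sketch of how the conversion to epoch milliseconds could look, assuming the stored format is the one shown in the data below (e.g. 2024-07-01 11:54:21.086, treated as UTC); this is only my illustration, not the /ledger convert implementation:

-- Hypothetical conversion of the old TEXT timestamps to epoch milliseconds
-- (not the actual /ledger convert code). strftime('%s', ...) yields whole
-- seconds; the fractional part is taken from the trailing ".SSS".
select id,
       cast(strftime('%s', time) as INTEGER) * 1000
         + cast(substr(time, 21, 3) as INTEGER) as time_ms
from actions
limit 5;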

zly2006 marked this pull request as ready for review October 6, 2024 07:05
zly2006 requested a review from a team as a code owner October 6, 2024 07:05
zly2006 (Contributor, Author) commented Oct 6, 2024

Testing on my real game server:

new schema:
20140591 unique strings, 46653892 actions, timestamps 1727628523588-1728200973330 (159 hours), 26.1 GB, 601 bytes/action

old schema:
6308393 actions, time 2024-07-01 11:54:21.086 -> 2024-10-04 03:34:17.058, 12.0 GB, 2058 bytes/action

Roughly 70% of database size saved (bytes per action).

DrexHD (Contributor) commented Oct 9, 2024

Impressive! But...
Maintaining two different formats feels very janky: it introduces a lot of code duplication and maintenance burden, and it gets worse if we want to make more changes to the schema.
We have already discussed versioned SQL with automatic schema updates on Discord, which would give us the ability to improve the database format over time, but that is a lot of effort to implement properly.
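For reference, one common way to version a SQLite schema is to track a version number and apply migrations in order. This is only an illustration of the general pattern, not a design the project has agreed on:

-- Generic schema-versioning pattern (an illustration, not the project's design).
-- SQLite stores a user-settable integer in the database header.
PRAGMA user_version;                 -- 0 on a database that has never been migrated

-- example migration to "version 1": add an integer timestamp column
BEGIN;
ALTER TABLE actions ADD COLUMN time_ms INTEGER;
UPDATE actions SET time_ms = cast(strftime('%s', time) as INTEGER) * 1000;
PRAGMA user_version = 1;
COMMIT;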

zly2006 (Contributor, Author) commented Oct 10, 2024

I ran this SQL on a 100k-row sample of our real data:

select action_id, count(*) as c from actions
group by action_id, x, y, z
order by c desc;

Here are the top 10 results (action_id, c):

6,47566
6,27051
6,5015
1,4880
1,4880
6,4196
1,2442
6,713
6,523
6,237

So we can implement a smarter "auto purge" that purges actions once more than 100 have accumulated at the same location; this would remove about 97% of the data from our database. The query to estimate the savings:

select sum(c) - 100 * count(*)
from (select action_id, count(*) as c from actions
      group by action_id, x, y, z
      order by c desc)
where c > 100;
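For completeness, here is a sketch of what the purge itself could look like. This is my own illustration under the assumption that the newest 100 rows per location group are kept (the PR may do it differently):

-- Hypothetical purge (not the PR's implementation): keep the newest 100 rows
-- per (action_id, x, y, z, world_id) group and delete the rest.
-- world_id is included so locations in different dimensions are kept separate.
-- Requires SQLite 3.25+ for window functions.
delete from actions
where id in (
    select id from (
        select id,
               row_number() over (
                   partition by action_id, x, y, z, world_id
                   order by time desc
               ) as rn
        from actions
    )
    where rn > 100
);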

zly2006 (Contributor, Author) commented Oct 10, 2024

My approach will be to add some new config options: smart_purge_enabled, smart_purge_filters (default: action_id,object_id,world_id,x,y,z) and smart_purge_max (default: 100).

If a group of actions has the same values for all smart_purge_filters columns, we purge the oldest ones whenever the group's count reaches smart_purge_max. A query like the one below could find the affected groups.

We can also encourage server admins to enable it, since it almost never loses important data and saves a lot of space.
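A hedged illustration of how the smart_purge_max check could be expressed with the default filters; this is my sketch, not code from the PR:

-- Hypothetical query for the smart-purge check: groups that reach the
-- smart_purge_max threshold (100 by default) with the default filter columns.
select action_id, object_id, world_id, x, y, z, count(*) as c
from actions
group by action_id, object_id, world_id, x, y, z
having count(*) >= 100;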
