Make the sync process more efficient when reading large documents on MongoDB #42133

Closed
paoliniluis opened this issue May 1, 2024 · 0 comments · Fixed by #42140
Labels
Administration/Metadata & Sync, Database/Mongo, .Escalation, Priority:P1, .Team/BackendComponents, Type:Bug

Comments

@paoliniluis (Contributor)

Describe the bug

Metabase OOMs easily when syncing MongoDB collections that contain big documents, even on relatively small databases.

To Reproduce

  1. Spin up MongoDB 4.4.5 (mentioned only because the customer who hit this runs that exact version; any version reproduces it). It must listen on port 27017 with metabase as the username and metasample123 as the password; a Docker one-liner matching these settings is sketched right after this list.
  2. Use the script below to load a few thousand documents (around 2,000 is enough). Run it with the Bun runtime for extra speed, e.g. bun load.js if the script is saved as load.js.
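A minimal Docker command for step 1 (the container name and image tag are assumptions about the setup, not the customer's exact deployment; MONGO_INITDB_ROOT_USERNAME / MONGO_INITDB_ROOT_PASSWORD are the official image's auth variables):

docker run -d --name mongo-repro -p 27017:27017 \
  -e MONGO_INITDB_ROOT_USERNAME=metabase \
  -e MONGO_INITDB_ROOT_PASSWORD=metasample123 \
  mongo:4.4.5

The load script for step 2: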
const { MongoClient, ServerApiVersion } = require("mongodb");

const uri = "mongodb://metabase:metasample123@localhost:27017/admin";

const client = new MongoClient(uri);

// function to generate a random string of length n
function randomString(n) {
  const chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
  let str = "";
  for (let i = 0; i < n; i++) {
    str += chars.charAt(Math.floor(Math.random() * chars.length));
  }
  return str;
}

// function to generate a very big and complex BSON structure: fifteen
// keys, each holding a 64 KiB random string (~960 KiB in total)
function generateComplexBSON() {
  const doc = {};
  for (let i = 1; i <= 15; i++) {
    doc["key" + i] = randomString(65536);
  }
  return doc;
}

async function run() {
  try {
    // Connect the client to the server (optional starting in v4.7)
    await client.connect();
    // Loop creating documents in the sample.people collection. The bound
    // is a million, but ~2,000 documents are enough to reproduce, so the
    // script can be stopped early.
    for (let i = 0; i < 1000000; i++) {
      await client.db("sample").collection("people").insertOne({
        "id": i + 1,
        // random-looking address/email/password/name/phone/city values
        "address": "address" + Math.floor(Math.random() * 1000000),
        "email": "email" + Math.floor(Math.random() * 1000000),
        "password": "password" + Math.floor(Math.random() * 1000000),
        "name": "name" + Math.floor(Math.random() * 1000000),
        "phone": "phone" + Math.floor(Math.random() * 1000000),
        "city": "city" + Math.floor(Math.random() * 1000000),
        // random integers and strings standing in for coordinates,
        // state, source, and zip
        "longitude": Math.floor(Math.random() * 1000000),
        "state": "state" + Math.floor(Math.random() * 1000000),
        "latitude": Math.floor(Math.random() * 1000000),
        "source": "source" + Math.floor(Math.random() * 1000000),
        // current timestamps standing in for birth_date and created_at
        "birth_date": new Date(),
        "zip": Math.floor(Math.random() * 1000000),
        "created_at": new Date(),
        // one 64 KiB string
        "description": randomString(65536),
        // "facts": {
        //   "key1": generateComplexBSON(),
        //   "key2": randomString(65536),
        //   "key3": randomString(65536),
        //   "key4": randomString(65536),
        //   "key5": randomString(65536),
        //   "key6": randomString(65536),
        //   "key7": randomString(65536),
        //   "key8": randomString(65536),
        //   "key9": randomString(65536),
        //   "key10": randomString(65536),
        //   "key11": randomString(65536),
        //   "key12": randomString(65536),
        //   "key13": randomString(65536),
        //   "key14": randomString(65536),
        //   "key15": randomString(65536),
        // },
        "tags": [
          randomString(65536),
          randomString(65536),
          randomString(65536),
          randomString(65536),
          randomString(65536),
          randomString(65536),
          randomString(65536),
          randomString(65536),
          randomString(65536),
          randomString(65536),
          randomString(65536),
        ],
        "complex": generateComplexBSON(),
            });
      console.log("Inserted " + i + " documents into the 'people' collection.");
    }
  } finally {
    // Ensures that the client will close when you finish/error
    await client.close();
  }
}
run().catch(console.dir);

(credits to ChatGPT)
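Each document works out to roughly 1.7 MB of BSON (27 separate 64 KiB strings). A quick sanity check in mongosh, using the $bsonSize aggregation operator (available from MongoDB 4.4, so it works on the 4.4.5 server above):

// Fetch one document's total BSON size in bytes; expect ~1.7 MB.
db.getSiblingDB("sample").people.aggregate([
  { $limit: 1 },
  { $project: { bytes: { $bsonSize: "$$ROOT" } } },
]);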

  3. Spin up Metabase and run it with -Xms512m -Xmx1530m (more memory is also fine; it will still blow up). One way to run it is sketched right after this list.
  4. Add the MongoDB database in Metabase. Sync will crash with an OOM during the sync-fields step (see the logs below).
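For step 3, one way to run Metabase with those heap flags (the metabase/metabase image reads JVM options from the JAVA_OPTS environment variable; the tag matching the v49.7 version below is an assumption):

docker run -d --name metabase-repro -p 3000:3000 \
  -e JAVA_OPTS="-Xms512m -Xmx1530m" \
  metabase/metabase:v0.49.7

If MongoDB also runs in Docker, point the Metabase connection at the host (e.g. host.docker.internal) rather than localhost.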

Expected behavior

Metabase should read documents more efficiently during sync. The sync-fields step only needs field names and types, so it shouldn't have to hold entire multi-megabyte documents in heap. I really don't know how, but there should be a way.
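One possible direction (a sketch of the idea only; not necessarily what the fix in #42140 does): truncate large values server-side so full documents never reach the driver. In mongosh syntax, with field names taken from the repro script above, using stock aggregation operators ($sample, $addFields, $map, $substrCP):

// Sample a handful of documents and cut every huge string down to a
// bounded prefix before it leaves the server; type inference only needs
// to see that a field is a string, not all 64 KiB of it.
db.getSiblingDB("sample").people.aggregate([
  { $sample: { size: 100 } },
  { $addFields: {
      description: { $substrCP: ["$description", 0, 256] },
      tags: { $map: { input: "$tags", as: "t",
                      in: { $substrCP: ["$$t", 0, 256] } } },
  } },
]);

Nested objects such as the complex subdocument would need the same treatment recursively, which is the awkward part, but the principle holds: the client should only ever see a bounded slice of each value.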

Logs

2024-05-01 23:26:10,813 DEBUG middleware.log :: POST /api/database/4/sync_schema 200 29.1 ms (7 DB calls) App DB connections: 1/10 Jetty threads: 4/50 (6 idle, 0 queued) (70 total active threads) Queries in flight: 0 (0 queued)
2024-05-01 23:26:10,813 INFO sync.util :: STARTING: Sync metadata for mongo Database 4 ''mg''
2024-05-01 23:26:10,822 INFO sync.util :: STARTING: step ''sync-dbms-version'' for mongo Database 4 ''mg''
2024-05-01 23:26:10,823 INFO sync.util :: FINISHED: step ''sync-dbms-version'' for mongo Database 4 ''mg'' (1.1 ms)
2024-05-01 23:26:10,824 INFO sync.util :: STARTING: step ''sync-timezone'' for mongo Database 4 ''mg''
2024-05-01 23:26:10,824 INFO sync-metadata.sync-timezone :: :mongo database 4 default timezone is nil
2024-05-01 23:26:10,824 INFO sync.util :: FINISHED: step ''sync-timezone'' for mongo Database 4 ''mg'' (190.7 µs)
2024-05-01 23:26:10,824 INFO sync.util :: STARTING: step ''sync-tables'' for mongo Database 4 ''mg''
2024-05-01 23:26:10,829 INFO sync-metadata.tables :: Updating table metadata for Table 38 ''people''
2024-05-01 23:26:10,829 INFO sync.util :: FINISHED: step ''sync-tables'' for mongo Database 4 ''mg'' (5.0 ms)
2024-05-01 23:26:10,829 INFO sync.util :: STARTING: step ''sync-fields'' for mongo Database 4 ''mg''
Aborting due to java.lang.OutOfMemoryError: Java heap space
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (debug.cpp:339), pid=1, tid=268
#  fatal error: OutOfMemory encountered: Java heap space
#
# JRE version: OpenJDK Runtime Environment Temurin-11.0.22+7 (11.0.22+7) (build 11.0.22+7)
# Java VM: OpenJDK 64-Bit Server VM Temurin-11.0.22+7 (11.0.22+7, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to //core.1)
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid1.log

Information about your Metabase installation

v49.7

Severity

P1

Additional context

NA

paoliniluis added the Type:Bug, Priority:P1, Database/Mongo, Administration/Metadata & Sync, and .Escalation labels on May 1, 2024
darksciencebase added the .Team/BackendComponents label on May 2, 2024
qnkhuat added this to the 0.49.9 milestone on May 3, 2024
sloansparger modified the milestones: 0.49.9, 0.49.8 on May 8, 2024