Memory usage explained #2711

ancoron · 2024-02-17T16:45:26Z

ancoron
Feb 17, 2024

Hi, I am evaluating OpenObserve as a replacement for mainly Loki + Grafana and was doing a relatively simple experiment using the following in a local Docker Compose setup:

scaphandre exposes prometheus metrics
prometheus configured to:
- scrape scaphandre metrics them every 10 seconds
- remote_write to openobserve every minute

Using the default configuration I noticed that OpenObserve uses just slightly below 1 GiB of memory for this very simple setup and relatively low amount of data.

Then I tried to reconfigure OpenObserve to reduce the amount of memory it needs ased on information from these:

Resulting service configuration:

  openobserve:
    image: public.ecr.aws/zinclabs/openobserve:latest
    environment:
      RUST_LOG: "error"
      RUST_BACKTRACE: "full"
      ZO_DATA_DIR: "/data"
      ZO_ROOT_USER_EMAIL: "[email protected]"
      ZO_ROOT_USER_PASSWORD: "abc"
      ZO_META_STORE: "postgres"
      ZO_META_POSTGRES_DSN: "postgres://x:y@o2-database:5432/openobserve"
      ZO_MEMORY_CACHE_ENABLED: "false"
      ZO_MEMORY_CACHE_MAX_SIZE: "128"
      ZO_MEMORY_CACHE_DATAFUSION_MAX_SIZE: "128"
      ZO_DISK_CACHE_ENABLED: "false"
      ZO_DISK_CACHE_MAX_SIZE: "8192"
      ZO_MAX_FILE_SIZE_ON_DISK: "16"
      ZO_MAX_FILE_SIZE_IN_MEMORY: "32"
      ZO_MEM_TABLE_MAX_SIZE: "128"
      TZ: "Zulu"
    ports:
      - "5080:5080"
    networks:
      - monitoring-network
    volumes:
      - openobserve-data:/data
    deploy:
      resources:
        limits:
          memory: 512M
        reservations:
          memory: 256M

...which was working OK'ish for some minutes but was OOM-killed eventually:

2024-02-17T12:30:44,221405+00:00 [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
2024-02-17T12:30:44,221408+00:00 [ 175029]     0 175029  3746258   132252  4665344   130714             0 openobserve
2024-02-17T12:30:44,221411+00:00 oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=docker-0aa3c62645ab21d1cb05a1b43a5d3f6bded9466999bda17254b7ffc25a4d8bb0.scope,mems_allowed=0,oom_memcg=/system.slice/docker-0aa3c62645ab21d1cb05a1b43a5d3f6bded9466999bda17254b7ffc25a4d8bb0.scope,task_memcg=/system.slice/docker-0aa3c62645ab21d1cb05a1b43a5d3f6bded9466999bda17254b7ffc25a4d8bb0.scope,task=openobserve,pid=175029,uid=0
2024-02-17T12:30:44,221570+00:00 Memory cgroup out of memory: Killed process 175029 (openobserve) total-vm:14985032kB, anon-rss:470000kB, file-rss:59008kB, shmem-rss:0kB, UID:0 pgtables:4556kB oom_score_adj:0

After noticing - a few hours later - I reconfigured the service to allow it up to 1 GiB of memory, which immediately failed:

docker-compose-openobserve-1  | 2024-02-17T15:50:35.321546270+00:00 ERROR datafusion_physical_plan::sorts::sort: Failure while reading spill file: RefCountedTempFile { parent_temp_dir: TempDir { path: "/tmp/.tmpCnQhtV" }, tempfile: NamedTempFile("/tmp/.tmpCnQhtV/.tmpmmQ4QG") }. Error: Execution error: channel closed    
docker-compose-openobserve-1  | 2024-02-17T15:50:35.355252177+00:00 ERROR openobserve::service::compact::merge: [COMPACT] merge files failed: Resources exhausted: Failed to allocate additional 2058560 bytes for SortPreservingMergeExec[0] with 12328576 bytes already allocated - maximum available is 1304672    
docker-compose-openobserve-1  | 2024-02-17T15:50:35.360581371+00:00 ERROR openobserve::service::compact::merge: [COMPACT] merge files failed: Resources exhausted: Failed to allocate additional 2083136 bytes for SortPreservingMergeExec[0] with 12476352 bytes already allocated - maximum available is 894752    
docker-compose-openobserve-1  | 2024-02-17T15:50:35.473821569+00:00 ERROR datafusion_physical_plan::sorts::sort: Failure while reading spill file: RefCountedTempFile { parent_temp_dir: TempDir { path: "/tmp/.tmpCnQhtV" }, tempfile: NamedTempFile("/tmp/.tmpCnQhtV/.tmpbYET07") }. Error: Execution error: channel closed    
...

After allowing up to 2 GiB of memory, it still is not able to work stable, although it varies between 0.5 - 1 GiB normally but during compact, it seems to require much more and logs a lot of the "Failed to allocate additional XYZ bytes" error. I guess it will just be forever in this loop unless I kill the container and give it even more memory.

That is unexpected to me and is at least 10x of what e.g. Grafana requires in the same setup. And of course I know I am comparing apples with bananas to some extend but I haven't even started to ingest logs or traces.

It would be really helpful if there was some kind of math that someone could apply to predict node resource requirement and the effect of configuration parameters more reliably.

In addition, many activities don't seem to have any controls, making capacity planning impossible at the moment.

Apart from that, documentation about what activities inside of OpenObserve (e.g. the above "compact") are CPU, memory and/or I/O intensive and by which parameters that is influenced.

Answered by hengfeiyang

Feb 18, 2024

@ancoron thank your for the test, actually the memory usage is a very complex question. let me try to explain it.

OpenObserve is a big binary includes some components:

Ingester
Querier
Compactor
Router
UI

It is an all-in-one server when you run it as a single instance. All-in-one means it will run everything inside an instance.

Okay, What will be in memory?

Runtime: process HTTP / gRPC request, especially ingest data.
Ingester cache: store recently data in memory table
Querier cache: store latest query data in memory cache.
Query engine: we are using a memory data model to analyze and compute the query.
Compacter: merge small files into big files.

Let me make more details about this.

R…

View full answer

ancoron · 2024-02-17T17:03:58Z

ancoron
Feb 17, 2024
Author

Was running into the "Failed to allocate additional XYZ bytes" error even with 4 GiB of memory although I saw that it never went above 3 GiB.

Since there is no documentation about it, with the following changes, the compaction seems to be happy now:

--- docker-compose.yaml.bak     2024-02-17 18:01:37.970823116 +0100
+++ docker-compose.yaml         2024-02-17 17:58:02.220944571 +0100
@@ -30,14 +30,14 @@
       ZO_ROOT_USER_PASSWORD: "SuperSecret"
       ZO_META_STORE: "postgres"
       ZO_META_POSTGRES_DSN: "postgres://o2-user:OO123456@o2-database:5432/openobserve"
-      ZO_MEMORY_CACHE_ENABLED: "false"
-      ZO_MEMORY_CACHE_MAX_SIZE: "128"
-      ZO_MEMORY_CACHE_DATAFUSION_MAX_SIZE: "128"
+      ZO_MEMORY_CACHE_ENABLED: "true"
+      ZO_MEMORY_CACHE_MAX_SIZE: "256"
+      ZO_MEMORY_CACHE_DATAFUSION_MAX_SIZE: "256"
       ZO_DISK_CACHE_ENABLED: "false"
       ZO_DISK_CACHE_MAX_SIZE: "8192"
       ZO_MAX_FILE_SIZE_ON_DISK: "16"
-      ZO_MAX_FILE_SIZE_IN_MEMORY: "32"
-      ZO_MEM_TABLE_MAX_SIZE: "128"
+      ZO_MAX_FILE_SIZE_IN_MEMORY: "64"
+      ZO_MEM_TABLE_MAX_SIZE: "512"
       TZ: "Zulu"
     ports:
       - "5080:5080"

0 replies

prabhatsharma · 2024-02-17T18:19:57Z

prabhatsharma
Feb 17, 2024
Maintainer

You are right that comparing o2 with grafana is apples to oranges comparison. The right comparison would be

Grafana + Prometheus + Loki/es + Tempo/Jaeger+es = O2

In your case it is Grafana + Prometheus = O2.

How much CPU + memory is Prometheus using? With O2 you should run Prometheus in agent mode which will reduce the requirements for Prometheus a lot and then you will have the right metrics for comparison.

0 replies

ancoron · 2024-02-17T20:17:04Z

ancoron
Feb 17, 2024
Author

Prometheus is running just fine with 256 MiB memory max and pushing all the metrics to O2.

Before going into the details of the others, I'd like to get a better understanding of why O2 requires so much memory (independent of any comparison) in the first place.

I am sure there are good reasons for that, just that I can't find any documentation about it.

2 replies

prabhatsharma Feb 17, 2024
Maintainer

The default behavior is to do aggressive caching, which requires a lot of RAM. You pulled the right env variables to reduce memory usage. e.g., ZO_MEM_TABLE_MAX_SIZE is to cache the incoming data in memory so recent data can be queried directly from memory instead of disk since recent is queried a lot. It is effectively a trade-off in memory usage vs performance.

ancoron Feb 18, 2024
Author

OK, so essentially there is no configuration to make O2 require less than 4 GiB?

Btw. is it possible to run the components in separate containers? I'd want to experiment with something like this:

Web UI + querier
ingester
compactor
alert manager

hengfeiyang · 2024-02-18T14:30:32Z

hengfeiyang
Feb 18, 2024
Maintainer

@ancoron thank your for the test, actually the memory usage is a very complex question. let me try to explain it.

OpenObserve is a big binary includes some components:

Ingester
Querier
Compactor
Router
UI

It is an all-in-one server when you run it as a single instance. All-in-one means it will run everything inside an instance.

Okay, What will be in memory?

Runtime: process HTTP / gRPC request, especially ingest data.
Ingester cache: store recently data in memory table
Querier cache: store latest query data in memory cache.
Query engine: we are using a memory data model to analyze and compute the query.
Compacter: merge small files into big files.

Let me make more details about this.

Runtime

It is easy to understand, we have some in memory data like: users, stream metadata, alerts, dashboards and others. Also for every HTTP / gRPC request the data will be in memory first, like your ingest data speed is 100MB/s, it need at least 200MB memory to process data. it's based on the ingestion speed. but this part should be not too much.

Ingester

We store recently data in memory table for better performance since v0.8.0. and we create MemTable for each organization + stream_type. For example you ingest logs and metrics data into a single organization default then we will create two MemTable default/logs and default/metrics. The default size limit for MemTable is ZO_MAX_FILE_SIZE_IN_MEMORY=256, it means will use 256MB for each MemTable at maximum. two MemTable maybe need 512MB memory.

When we trigger the limit will convert MemTable to an Immutable and create a new MemTable for writing data. and then will dump Immutable to disk as parquet file. The problem is the Immutable is existing in memory for a few seconds and the memory will not released until the dump is done. need to know that dumping data to disk need additional memory for convert files. but it won't over the data size.

Also we have a backend job to move the local files to remote storage (S3) or another directory local mode & local storage. this job will merge multiple small files into a big file to reduce to workload of compactor, event there is no separate compactor the logic is same.

These 3 parts are all based on the MemTable size, default is 256MB, but you can reduce it to fit your hardware resource, like limit to 128MB: ZO_MAX_FILE_SIZE_IN_MEMORY=128.

Querier

Default we enabled Memory cache for accelerate search performance ZO_MEMORY_CACHE_ENABLED=true, and give it a limit by ZO_MEMORY_CACHE_MAX_SIZE=0, default 0 means will use 50% of total memory for memory query cache. you can set it to a small size, like 128MB: ZO_MEMORY_CACHE_MAX_SIZE=128 or disable memory cache: ZO_MEMORY_CACHE_ENABLED=false.

Another big memory usage is for search engine. we are using a memory model search engine: Datafusion + Arrow. it process data in memory. we give a maximum limit for it by ZO_MEMORY_CACHE_DATAFUSION_MAX_SIZE=0, 0 also means will use 50% for query engine. you can limit it by ZO_MEMORY_CACHE_DATAFUSION_MAX_SIZE=256 to 256MB. but it shouldn't be too small, when you query a large data scale when give it a very small size will cause query error.

Compactor

Compactor used to merge small files into big files to reduce file num then improve search performance. the process is read small files into memory and write the data into a big file, but we have a limit ZO_COMPACT_MAX_FILE_SIZE=256, it means for each file will use 256MB at maximum. it will concurrent process by stream, the threads num limit by ZO_FILE_MOVE_THREAD_NUM=0, 0 means default use CPU_NUM * 2. for example, you limit the pod CPU to 2 cores. then it will create 2 * 2 = 4 threads to concurrent merge files. per file max size is 256MB, so at most will use 256 * 4 = 1GB memory.

Router

It is a simple reverse proxy to dispatch traffic to ingester or querier, but for single instance, there is no router.

UI

Some static files in memory, it is 26MB for now.

Conclusion

The default parameters are optimized for cluster mode and should separate deploy each component, it should works fine. but for local mode / single instance, Yep, we need to optimize the parameters, we will keep improve it.

for now, if you want to run it in a single instance for limit 1GB memory. you can try the parameters like this:

Limit MemTable size for ingester:

ZO_MAX_FILE_SIZE_IN_MEMORY=128

Limit the threads num for compactor:

ZO_FILE_MOVE_THREAD_NUM=1

Limit query cache for querier:

ZO_MEMORY_CACHE_MAX_SIZE=256

or disable it:

ZO_MEMORY_CACHE_ENABLED=false

Limit query engine memory:

ZO_MEMORY_CACHE_DATAFUSION_MAX_SIZE=256

Please let me know how is it going.

Thanks.

8 replies

hengfeiyang Apr 24, 2024
Maintainer

Compactor is a separate pod, right? as i know the compactor will use a lot of memory only one case the stream have over 1000 fields. do you have one stream have over 1000 fields?

TessaIO Apr 24, 2024

Yes, it's in a single pod. Stream has more than 600 fields.

TessaIO Apr 24, 2024

and also ZO_COMPACT_INTERVAL is set to 60

hengfeiyang Apr 24, 2024
Maintainer

Yep, i think it related to there is more than 600 fields. Can you create an issue for this, i will try to improve the memory usage for big stream on compactor

TessaIO Apr 24, 2024

Sure, thanks!

hengfeiyang · 2024-02-18T15:32:50Z

hengfeiyang
Feb 18, 2024
Maintainer

@ancoron you also need to configure this for improve Prometheus remote write:

queue_config:
  max_samples_per_send: 10000

https://openobserve.ai/docs/ingestion/metrics/prometheus/

0 replies

kurtisane · 2024-02-24T20:56:05Z

kurtisane
Feb 24, 2024

I need to join this conversation. We are currently in the process of investigating a alternative to SEQ.

We stumbled across OpenObserve and I'm quite pleased since it is also a replacement of our Jaeger instance for Telemetry.

But running SEQ and Jaeger is requiring really little amount of memory and we are trying to justify the memory difference to OpenObserve.

What environment variables do have a effect on memory usage ?

1 reply

hengfeiyang Feb 25, 2024
Maintainer

as me explained, we will try to use all of memory that you have for better performance. you can limit the memory usage by set some params. and we will optimize the default parameters for single instance later.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Memory usage explained #2711

{{title}}

Replies: 6 comments 11 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Memory usage explained #2711

ancoron Feb 17, 2024

R…

Replies: 6 comments · 11 replies

ancoron Feb 17, 2024 Author

prabhatsharma Feb 17, 2024 Maintainer

ancoron Feb 17, 2024 Author

prabhatsharma Feb 17, 2024 Maintainer

ancoron Feb 18, 2024 Author

hengfeiyang Feb 18, 2024 Maintainer

Runtime

Ingester

Querier

Compactor

Router

UI

Conclusion

hengfeiyang Apr 24, 2024 Maintainer

TessaIO Apr 24, 2024

TessaIO Apr 24, 2024

hengfeiyang Apr 24, 2024 Maintainer

TessaIO Apr 24, 2024

hengfeiyang Feb 18, 2024 Maintainer

kurtisane Feb 24, 2024

hengfeiyang Feb 25, 2024 Maintainer

ancoron
Feb 17, 2024

Replies: 6 comments 11 replies

ancoron
Feb 17, 2024
Author

prabhatsharma
Feb 17, 2024
Maintainer

ancoron
Feb 17, 2024
Author

prabhatsharma Feb 17, 2024
Maintainer

ancoron Feb 18, 2024
Author

hengfeiyang
Feb 18, 2024
Maintainer

hengfeiyang Apr 24, 2024
Maintainer

hengfeiyang Apr 24, 2024
Maintainer

hengfeiyang
Feb 18, 2024
Maintainer

kurtisane
Feb 24, 2024

hengfeiyang Feb 25, 2024
Maintainer