Memory usage explained #2711
-
Hi, I am evaluating OpenObserve as a replacement for mainly Loki + Grafana and was doing a relatively simple experiment using the following in a local Docker Compose setup:
Using the default configuration I noticed that OpenObserve uses just slightly below 1 GiB of memory for this very simple setup and a relatively small amount of data. I then tried to reconfigure OpenObserve to reduce the amount of memory it needs, based on information from these: Resulting service configuration:
openobserve:
image: public.ecr.aws/zinclabs/openobserve:latest
environment:
RUST_LOG: "error"
RUST_BACKTRACE: "full"
ZO_DATA_DIR: "/data"
ZO_ROOT_USER_EMAIL: "[email protected]"
ZO_ROOT_USER_PASSWORD: "abc"
ZO_META_STORE: "postgres"
ZO_META_POSTGRES_DSN: "postgres://x:y@o2-database:5432/openobserve"
ZO_MEMORY_CACHE_ENABLED: "false"
ZO_MEMORY_CACHE_MAX_SIZE: "128"
ZO_MEMORY_CACHE_DATAFUSION_MAX_SIZE: "128"
ZO_DISK_CACHE_ENABLED: "false"
ZO_DISK_CACHE_MAX_SIZE: "8192"
ZO_MAX_FILE_SIZE_ON_DISK: "16"
ZO_MAX_FILE_SIZE_IN_MEMORY: "32"
ZO_MEM_TABLE_MAX_SIZE: "128"
TZ: "Zulu"
ports:
- "5080:5080"
networks:
- monitoring-network
volumes:
- openobserve-data:/data
deploy:
resources:
limits:
memory: 512M
reservations:
memory: 256M
...which was working OK'ish for a few minutes but was eventually OOM-killed:
After noticing this a few hours later, I reconfigured the service to allow it up to 1 GiB of memory, which failed immediately:
After allowing up to 2 GiB of memory, it still is not able to run stably: it normally varies between 0.5 and 1 GiB, but during compaction it seems to require much more and logs many "Failed to allocate additional XYZ bytes" errors. I guess it will just stay in this loop forever unless I kill the container and give it even more memory. That is unexpected to me and at least 10x what e.g. Grafana requires in the same setup. Of course I know I am comparing apples with bananas to some extent, but I haven't even started to ingest logs or traces. It would be really helpful if there were some kind of math one could apply to predict node resource requirements and the effect of configuration parameters more reliably. In addition, many activities don't seem to have any controls, making capacity planning impossible at the moment. Beyond that, documentation about which activities inside OpenObserve (e.g. the "compact" above) are CPU-, memory- and/or I/O-intensive, and which parameters influence them, would help.
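To illustrate the kind of back-of-envelope math I am asking for, here is a purely hypothetical budget built from the knob values in the compose file above. The second Immutable copy and the runtime-overhead figure are my own assumptions, not documented numbers:

```python
# Hypothetical memory budget for a single-node OpenObserve instance,
# using the knob values (in MiB) from the compose file above.
# The "immutable_memtable" and "runtime_overhead" entries are guesses.
knobs_mb = {
    "ZO_MEM_TABLE_MAX_SIZE": 128,                 # active MemTable
    "immutable_memtable": 128,                    # frozen copy awaiting dump (assumption)
    "ZO_MEMORY_CACHE_MAX_SIZE": 0,                # cache disabled in this config
    "ZO_MEMORY_CACHE_DATAFUSION_MAX_SIZE": 128,   # query engine scratch space
    "runtime_overhead": 150,                      # binary, metadata, HTTP buffers (assumption)
}

total = sum(knobs_mb.values())
print(f"estimated peak: ~{total} MiB")
```

If a model like this were documented, picking a container memory limit would stop being trial and error.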
-
I was running into the "Failed to allocate additional XYZ bytes" error even with 4 GiB of memory, although usage never went above 3 GiB. Since there is no documentation about it, I experimented; with the following changes, compaction seems to be happy now:
--- docker-compose.yaml.bak 2024-02-17 18:01:37.970823116 +0100
+++ docker-compose.yaml 2024-02-17 17:58:02.220944571 +0100
@@ -30,14 +30,14 @@
ZO_ROOT_USER_PASSWORD: "SuperSecret"
ZO_META_STORE: "postgres"
ZO_META_POSTGRES_DSN: "postgres://o2-user:OO123456@o2-database:5432/openobserve"
- ZO_MEMORY_CACHE_ENABLED: "false"
- ZO_MEMORY_CACHE_MAX_SIZE: "128"
- ZO_MEMORY_CACHE_DATAFUSION_MAX_SIZE: "128"
+ ZO_MEMORY_CACHE_ENABLED: "true"
+ ZO_MEMORY_CACHE_MAX_SIZE: "256"
+ ZO_MEMORY_CACHE_DATAFUSION_MAX_SIZE: "256"
ZO_DISK_CACHE_ENABLED: "false"
ZO_DISK_CACHE_MAX_SIZE: "8192"
ZO_MAX_FILE_SIZE_ON_DISK: "16"
- ZO_MAX_FILE_SIZE_IN_MEMORY: "32"
- ZO_MEM_TABLE_MAX_SIZE: "128"
+ ZO_MAX_FILE_SIZE_IN_MEMORY: "64"
+ ZO_MEM_TABLE_MAX_SIZE: "512"
TZ: "Zulu"
ports:
- "5080:5080"
-
You are right that comparing O2 with Grafana is an apples-to-oranges comparison. The right comparison would be: Grafana + Prometheus + Loki/ES + Tempo/Jaeger + ES = O2. In your case it is Grafana + Prometheus = O2. How much CPU and memory is Prometheus using? With O2 you should run Prometheus in agent mode, which greatly reduces Prometheus's requirements; then you will have the right metrics for comparison.
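A minimal sketch of what agent-mode forwarding could look like. The remote-write URL path, org name, and credentials are assumptions on my part; check the OpenObserve ingestion page/docs for the exact endpoint of your org:

```yaml
# prometheus.yml -- hypothetical sketch, not taken from this thread.
# Start Prometheus with: prometheus --enable-feature=agent --config.file=prometheus.yml
scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]

remote_write:
  - url: "http://localhost:5080/api/default/prometheus/api/v1/write"  # assumed path/org
    basic_auth:
      username: "root@example.com"   # your ZO_ROOT_USER_EMAIL
      password: "abc"                # your ZO_ROOT_USER_PASSWORD
```

In agent mode Prometheus keeps no local TSDB blocks for querying, which is where most of its memory normally goes.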
-
Prometheus is running just fine with at most 256 MiB of memory while pushing all its metrics to O2. Before going into the details of the others, I'd like to get a better understanding of why O2 requires so much memory (independent of any comparison) in the first place. I am sure there are good reasons for it; I just can't find any documentation about them.
-
@ancoron thank you for the test. Memory usage is actually a complex question; let me try to explain it.

OpenObserve is one big binary that includes several components. When you run it as a single instance it is an all-in-one server, meaning everything runs inside that one instance. So, what lives in memory? Let me go through the components in detail.

Runtime: This part is easy to understand. We keep some data in memory, such as users, stream metadata, alerts, and dashboards. Also, the payload of every HTTP/gRPC request is buffered in memory first: if your ingest speed is 100 MB/s, at least 200 MB of memory is needed to process that data. This scales with ingestion speed, but it should not be a large share.

Ingester: We store recent data in an in-memory table (MemTable) for better performance. When the size limit is reached, the MemTable is converted to an Immutable and a new MemTable is created for incoming writes; the Immutable is then dumped to disk as a Parquet file. The problem is that the Immutable stays in memory for a few seconds, and its memory is not released until the dump completes. Note that dumping to disk needs additional memory for converting the files, though never more than the data size itself. There is also a background job that moves the local files to remote storage (S3) or another directory. These three parts are all bounded by the MemTable size.

Querier: By default the memory cache is enabled to accelerate search performance. Another big memory consumer is the search engine, which uses an in-memory execution model.

Compactor: The compactor merges small files into big files to reduce the file count and improve search performance. The process reads the small files into memory and writes the data into one big file, with a limit on how much it uses at once.

Router: A simple reverse proxy that dispatches traffic to the ingester or querier; for a single instance there is no router.

UI: Some static files held in memory, currently 26 MB.

Conclusion: The default parameters are optimized for cluster mode, where each component is deployed separately; there they work fine. For local mode / a single instance, yes, we need to optimize the parameters, and we will keep improving them. For now, if you want to run a single instance within a 1 GiB memory limit, you can try parameters like these:
- Limit the MemTable size for the ingester
- Limit the number of threads for the compactor
- Limit the query cache for the querier, or disable it
- Limit the query engine memory

Please let me know how it goes. Thanks.
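The limits above map onto variables that already appear in this thread, except the compactor thread count, whose variable name was lost from the original post. A hypothetical environment block for a ~1 GiB single instance follows; the values are guesses to tune, not recommendations:

```yaml
environment:
  # Ingester: a smaller MemTable means smaller Immutable copies and dump overhead
  ZO_MEM_TABLE_MAX_SIZE: "64"
  # Querier: cap (or disable) the memory cache used to accelerate search
  ZO_MEMORY_CACHE_ENABLED: "true"
  ZO_MEMORY_CACHE_MAX_SIZE: "128"
  # Query engine (DataFusion) memory pool
  ZO_MEMORY_CACHE_DATAFUSION_MAX_SIZE: "128"
```

Setting ZO_MEMORY_CACHE_ENABLED to "false" trades search latency for a lower floor, as the earlier posts in this thread show.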
-
@ancoron you also need to configure this to improve Prometheus remote write:
-
I need to join this conversation. We are currently investigating an alternative to SEQ. We stumbled across OpenObserve and I'm quite pleased, since it could also replace our Jaeger instance for telemetry. But running SEQ and Jaeger requires a really small amount of memory, and we are trying to justify the memory difference to OpenObserve. Which environment variables have an effect on memory usage?
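Collecting the memory-related variables that appear in this thread's configs in one place may help; the values are the thread's own examples, and the comments are my reading of the explanation above rather than official documentation:

```yaml
environment:
  ZO_MEM_TABLE_MAX_SIZE: "128"                 # ingester MemTable cap; also bounds Immutable + dump overhead
  ZO_MAX_FILE_SIZE_IN_MEMORY: "32"             # per-file in-memory size cap
  ZO_MEMORY_CACHE_ENABLED: "true"              # querier search cache on/off
  ZO_MEMORY_CACHE_MAX_SIZE: "256"              # querier search cache cap
  ZO_MEMORY_CACHE_DATAFUSION_MAX_SIZE: "256"   # query engine memory pool
  ZO_DISK_CACHE_ENABLED: "false"               # disk cache (affects disk, not RAM)
```

Consult the OpenObserve environment-variable reference for defaults and units before relying on any of these.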