Performance question #36

Open
oderwat opened this issue May 17, 2021 · 3 comments

Comments


oderwat commented May 17, 2021

I let the data-miner demo run for some time (roughly 2 hours) and got ever-increasing response times (4-5 seconds now) when reloading the frontend. As this is just a simple query with a last:50 clause, I wonder how the database will perform when used for something "real".

Author

oderwat commented May 18, 2021

This may be related to Docker on OSX. I need to investigate further (for example, by running it natively on OSX and/or Linux) to get a comparison. I still wonder how much data you can plug into EliasDB. Is it realistic to have millions of nodes and edges?

Contributor

mladkau commented May 19, 2021

Hey there, you raise a very valid point. I did some soak testing yesterday and got the following timings (this ran on a Linux VM on a server of mine):

| Time     | API response time (measured in a browser) |
|----------|-------------------------------------------|
| 09:11:00 | 2200 ms                                   |
| 11:48:00 | 6631 ms                                   |
| 12:27:00 | 7500 ms                                   |
| 15:48:00 | 12000 ms                                  |

So the response time definitely goes up as time passes. Now let me add some of my thoughts:

  1. The time increases in a roughly linear fashion, which is a good sign. This means the code seems to be behaving as expected and doesn't add any crazy delays.

  2. The main purpose of the example is to show the cluster functionality. However, doing this on a single machine where all containers share the same I/O interface is really not a good idea for long-term use, as the cluster functionality will only increase the required computational overhead without providing any real benefit.

  3. The example is using the database in a way which is not the main intention for a graph database. Graph databases are primarily used to store highly connected data. In the example no actual graph is constructed - the data is stored in isolated nodes and thus the query has to iterate over them in a sequential fashion.
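The linear-growth observation in point 1 can be checked directly against the timings above. The following is a minimal sketch that computes the increase in response time per minute between consecutive measurements; the type and function names (`sample`, `mustSample`, `growthRates`) are made up for this illustration and are not part of EliasDB.

```go
package main

import (
	"fmt"
	"time"
)

// sample is one soak-test measurement: wall-clock time and API response time.
// (Hypothetical helper type, used only for this illustration.)
type sample struct {
	at time.Time
	ms float64
}

// mustSample parses an "HH:MM:SS" clock time and pairs it with a response time.
func mustSample(clock string, ms float64) sample {
	t, err := time.Parse("15:04:05", clock)
	if err != nil {
		panic(err)
	}
	return sample{at: t, ms: ms}
}

// growthRates returns the increase in response time (ms) per minute
// between each pair of consecutive samples.
func growthRates(samples []sample) []float64 {
	rates := make([]float64, 0, len(samples))
	for i := 1; i < len(samples); i++ {
		minutes := samples[i].at.Sub(samples[i-1].at).Minutes()
		rates = append(rates, (samples[i].ms-samples[i-1].ms)/minutes)
	}
	return rates
}

func main() {
	// Timings taken from the soak test reported in this comment.
	samples := []sample{
		mustSample("09:11:00", 2200),
		mustSample("11:48:00", 6631),
		mustSample("12:27:00", 7500),
		mustSample("15:48:00", 12000),
	}
	for i, r := range growthRates(samples) {
		fmt.Printf("interval %d: %.1f ms per minute\n", i+1, r)
	}
}
```

The three intervals come out at roughly 22 to 28 ms of added response time per minute, which is consistent with approximately linear growth over the run.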

At the moment the underlying datastore is more or less a giant hash map. It uses a so-called HTree (https://en.wikipedia.org/wiki/HTree); see here: https://github.com/krotik/eliasdb/blob/master/hash/htree.go
Using this data structure means that lookup via keys is very fast, which comes in handy when traversing edges. However, sequential reading is quite slow, as no optimization is possible using just the HTree. A possible way to speed up queries that require sequential reading would be to implement something akin to a linked list on the level of the HTree...
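To illustrate the idea of combining a hash map with a linked list, here is a minimal in-memory sketch. It is not how EliasDB's HTree works and not a proposed patch: `orderedStore` and its methods are hypothetical names, and a plain Go map stands in for the HTree. Each entry remembers the entry inserted before it, so "the last n" can be read by following links instead of scanning every key.

```go
package main

import "fmt"

// entry is a stored value plus a link to the entry inserted before it.
type entry struct {
	key, value string
	prev       *entry
}

// orderedStore pairs a hash map (fast lookup by key) with a backwards-linked
// list of entries (cheap newest-first traversal). Hypothetical illustration;
// the map stands in for the HTree.
type orderedStore struct {
	byKey map[string]*entry
	tail  *entry // most recently inserted entry
}

func newOrderedStore() *orderedStore {
	return &orderedStore{byKey: make(map[string]*entry)}
}

// Put stores a value. A new key is linked in as the newest entry;
// an existing key is updated in place and keeps its position.
func (s *orderedStore) Put(key, value string) {
	if e, ok := s.byKey[key]; ok {
		e.value = value
		return
	}
	e := &entry{key: key, value: value, prev: s.tail}
	s.byKey[key] = e
	s.tail = e
}

// Get looks a value up by key via the hash map.
func (s *orderedStore) Get(key string) (string, bool) {
	e, ok := s.byKey[key]
	if !ok {
		return "", false
	}
	return e.value, true
}

// Last returns up to n keys, newest first, touching only n entries.
func (s *orderedStore) Last(n int) []string {
	if n <= 0 {
		return nil
	}
	keys := make([]string, 0, n)
	for e := s.tail; e != nil && len(keys) < n; e = e.prev {
		keys = append(keys, e.key)
	}
	return keys
}

func main() {
	s := newOrderedStore()
	for i := 1; i <= 5; i++ {
		s.Put(fmt.Sprintf("node%d", i), "payload")
	}
	fmt.Println(s.Last(3)) // prints "[node5 node4 node3]"
}
```

The cost of `Last(n)` depends on `n` rather than on the total number of stored entries, which is the property a `last:50` style query would need. Deletions are left out here; supporting them would require a doubly linked list.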

Author

oderwat commented May 19, 2021

I also tested some more. The major speed degradation I saw was because I had "somehow" created persistent volumes for the docker-compose setup. After changing some code in the collector (URL), I wondered why I still saw old URLs in the results. I removed all the containers and even rebuilt all images, but the data was still there. Then I found the unnamed volumes. After deleting them, everything was fine and much faster. This is expected because the Docker implementation on OSX "sucks" when using volumes: it makes everything file-system related up to 60 times slower. There are some workarounds. One of them is running Docker inside a virtual machine on the same Mac, which is fast. Yeah... Computers.

Getting "the last" entries of something (indexed) seems like a pretty common requirement, probably also for a graph database. Sadly, nothing we do will have fewer than thousands, if not millions, of nodes, and there will be many searches for date ranges or last entries.

I don't know who is using EliasDB in anything real and how that works out. I plan to use it for a small project which connects multiple SPAs to a PHP server, where the SPAs and the server use GraphQL WebSockets to implement a notification system between groups of running SPAs. For example, with SPAs A, B, C, and D, where A and B form one group and C and D form another: A sends new data to the server, and the server informs B to update; the same applies to the C and D pair. (Currently, A, B, C, and D connect to a Go-based WebSocket server, and the Go server polls the DB for changes.)

But I would love to create bigger systems with much more complex business logic. EliasDB seems to have the right feature set for this. But I guess this would really need an extension for sequential traversal of the DB.

Not sure if one could keep something like a "last20" edge between the last elements and update it accordingly. Just something that came to mind a second ago.
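As a rough illustration of that "last20" idea, the sketch below keeps a fixed-size ring buffer of the most recently inserted node keys and updates it on every insert. This is only an in-memory model under my own assumptions: `lastN`, `Add`, and `Newest` are hypothetical names, and in EliasDB the equivalent would presumably have to be stored as a node with edges (or similar) and updated alongside each write.

```go
package main

import "fmt"

// lastN remembers the keys of the n most recently added nodes
// in a fixed-size ring buffer. Hypothetical illustration only.
type lastN struct {
	keys  []string // ring buffer storage
	next  int      // slot that the next Add will overwrite
	count int      // number of slots currently filled
}

// newLastN creates a tracker for the n newest keys; n must be positive.
func newLastN(n int) *lastN {
	if n <= 0 {
		panic("lastN: size must be positive")
	}
	return &lastN{keys: make([]string, n)}
}

// Add records a newly inserted key, overwriting the oldest one when full.
func (l *lastN) Add(key string) {
	l.keys[l.next] = key
	l.next = (l.next + 1) % len(l.keys)
	if l.count < len(l.keys) {
		l.count++
	}
}

// Newest returns the remembered keys, newest first.
func (l *lastN) Newest() []string {
	out := make([]string, 0, l.count)
	for i := 1; i <= l.count; i++ {
		idx := (l.next - i + len(l.keys)) % len(l.keys)
		out = append(out, l.keys[idx])
	}
	return out
}

func main() {
	recent := newLastN(3)
	for _, k := range []string{"a", "b", "c", "d", "e"} {
		recent.Add(k)
	}
	fmt.Println(recent.Newest()) // prints "[e d c]"
}
```

Each insert does a constant amount of extra work and the tracker never grows beyond n entries, but it only answers "the last n" for one fixed n; date-range searches would still need a real ordered index.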
