MongoShake Performance Document
This article is a partial performance test report for MongoShake, giving test data for different cases.
The source MongoDB, the destination MongoDB, and MongoShake are all located on the same host. The following shows the configuration:
- CPU: 64 cores, Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz
- Memory: 32 * 16 GB
- Network card: 20 Gbps
- OS: Linux x86_64
- MongoDB version: 3.4.15
- Hard disk: SSD
- Go version: 1.10.3
The figure above shows the topology of our experiment deployment, where the tunnel type is "direct" so that MongoShake writes data into the target MongoDB.
The source MongoDB is either a replica set or a sharded cluster, while the target MongoDB is a replica set. In our experiment, every replica set contains only one mongod instance. For convenience, the oplog contains only insert operations.
In our pre-test, the write QPS limit of the target MongoDB with 8 threads is about 70,000 when oplogs are inserted one by one, and about 540,000 when multiple insertions are batched together. With 64 threads, the QPS is about 80,000 for single inserts and about 610,000 for batched inserts.
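The pre-test can be reproduced with a small benchmark along the following lines. This is only a minimal single-threaded sketch using the official Go driver; the URI, database, and collection names are placeholders, and the 8/64-thread fan-out is omitted. It simply contrasts one-document-per-request inserts with batched inserts.

```go
// Minimal single-threaded throughput probe against the target MongoDB,
// comparing one-document-per-request inserts with batched inserts.
// The URI, database, and collection names are placeholders.
package main

import (
	"context"
	"fmt"
	"time"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx := context.Background()
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://127.0.0.1:27017"))
	if err != nil {
		panic(err)
	}
	defer client.Disconnect(ctx)
	coll := client.Database("bench").Collection("insert_test")

	const total = 100000
	doc := func(i int) bson.M { return bson.M{"i": i, "pad": "xxxxxxxxxxxxxxxx"} }

	// One document per round trip, similar to replaying single inserts.
	start := time.Now()
	for i := 0; i < total; i++ {
		if _, err := coll.InsertOne(ctx, doc(i)); err != nil {
			panic(err)
		}
	}
	fmt.Printf("single-insert QPS: %.0f\n", float64(total)/time.Since(start).Seconds())

	// Batched writes: 128 documents per InsertMany call.
	start = time.Now()
	batch := make([]interface{}, 0, 128)
	for i := 0; i < total; i++ {
		batch = append(batch, doc(i))
		if len(batch) == cap(batch) {
			if _, err := coll.InsertMany(ctx, batch); err != nil {
				panic(err)
			}
			batch = batch[:0]
		}
	}
	if len(batch) > 0 { // flush the remainder
		if _, err := coll.InsertMany(ctx, batch); err != nil {
			panic(err)
		}
	}
	fmt.Printf("batched-insert QPS: %.0f\n", float64(total)/time.Since(start).Seconds())
}
```

The gap between these two numbers is the reason the "merge before writing" behavior discussed in case 3 matters so much for the direct tunnel.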
The test data covers the following dimensions: latency, QPS, CPU usage, and memory usage. All values are averaged over 60 seconds.
- Latency is the time difference between the moment a document is inserted into the source database and the moment the same document is written into the destination database. The source timestamp is taken from the "ts" key of the source oplog, and the target timestamp from the "ts" key of the oplog generated on the target side. Since it is hard to compare oplog timestamps produced on two hosts whose clocks differ, we choose a topology in which the source MongoDB and the destination MongoDB are located on the same host.
- QPS is obtained from MongoShake's RESTful API, which counts the number of oplogs written every second.
- Workload marks the data distribution.
- CPU and memory usage are also reported.
The test variables include the source MongoDB type, the tunnel type, the workload of the data in the source database, the worker concurrency, and the executor concurrency.
In case 1, the tunnel type is "mock", which throws away all the data, so we can measure the QPS of reading and handling in the pipeline. The source database contains 1 db with 5 collections; the data distribution is fairly even, about 24 million documents per collection. For convenience, all oplog operations are insertions. Each document includes 5 columns, and the total size of each oplog document is about 220 bytes.
Variable | Value |
---|---|
Source MongoDB type | replica set |
Unique index/Shard key | No; Hash by id. |
Workload | A total of 5 collections in one db, each document includes 5 columns and the total size of each oplog document is about 220 bytes. |
Tunnel type | mock |
Worker concurrency | 8 |
Here are the measurement results:
Measurement | Value |
---|---|
QPS | 424,251 |
CPU usage | 1175% |
Memory usage | About 0%, 130MB |
The QPS is about 420 thousand while the CPU usage is 1175%. The memory usage is about 0% because the data is thrown away in the tunnel, so the queue usage is very low.
In case 2, we change the workload. There is still 1 db with 5 collections, but each document has only 1 column and the size of each oplog document is about 180 bytes.
Variable | Value |
---|---|
Source MongoDB type | replica set |
Unique index/Shard key | No; Hash by id. |
Workload | A total of 5 collections in one db, each document includes only 1 column and the total size of each oplog document is about 180 bytes. |
Tunnel type | mock |
Worker concurrency | 8 |
Here are the measurement results:
Measurement | Value |
---|---|
QPS | 545,117 |
CPU usage | 694% |
Memory usage | About 0%, 43MB |
Compared to case 1, the QPS in case 2 is higher while the CPU and memory usage are lower, because the oplog is smaller and the deserialization cost is therefore lower.
In case 3, compared to case 1, we change the tunnel type to "direct", which writes data into the target MongoDB directly. It should be emphasized that, before writing, MongoShake merges consecutive oplogs that have the same namespace and the same operation into one batch. In this case, however, the operations are spread evenly across the collections, which is not conducive to merging before writing.
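As a rough illustration of this merging behavior (not MongoShake's actual implementation; the Oplog type, field names, and mergeBatches function below are made up for the sketch), consecutive oplogs with the same namespace and operation can be grouped so that each group becomes one bulk write:

```go
// Rough illustration of merging before writing: consecutive oplogs that
// share the same namespace and operation are grouped, and each group can
// then be written with one bulk command instead of one command per oplog.
// The Oplog type and field names are simplified placeholders, not
// MongoShake's real types.
package main

import "fmt"

type Oplog struct {
	Namespace string                 // e.g. "db0.coll1"
	Operation string                 // "i" (insert), "u" (update), "d" (delete)
	Object    map[string]interface{} // document body (omitted in the demo)
}

// mergeBatches groups consecutive oplogs with identical namespace and
// operation so that a direct writer could turn each group into one bulk write.
func mergeBatches(oplogs []Oplog) [][]Oplog {
	var groups [][]Oplog
	for _, log := range oplogs {
		n := len(groups)
		if n > 0 &&
			groups[n-1][0].Namespace == log.Namespace &&
			groups[n-1][0].Operation == log.Operation {
			groups[n-1] = append(groups[n-1], log)
			continue
		}
		groups = append(groups, []Oplog{log})
	}
	return groups
}

func main() {
	// Writes spread evenly over several collections produce tiny groups,
	// which is why case 3 gains little from merging.
	input := []Oplog{
		{Namespace: "db0.c1", Operation: "i"},
		{Namespace: "db0.c2", Operation: "i"},
		{Namespace: "db0.c1", Operation: "i"},
		{Namespace: "db0.c1", Operation: "i"},
	}
	for _, g := range mergeBatches(input) {
		fmt.Printf("%s/%s x%d\n", g[0].Namespace, g[0].Operation, len(g))
	}
}
```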
Variable | Value |
---|---|
Source MongoDB type | replica set |
Unique index/Shard key | No; Hash by id. |
Workload | A total of 5 collections in one db, each document includes 5 columns and the total size of each oplog document is about 220 bytes. This data is not conducive to merge before writing. |
Tunnel type | direct |
Worker concurrency | 8 |
Here are the measurement results:
Measurement | Value |
---|---|
QPS | 55,705 |
CPU usage | 497% |
Memory usage | About 0.4%, 1.9 GB |
In this case, the QPS is about one eighth of that in case 1. The bottleneck is the write speed of the target database.
In case 4, compared to case 3, we increase the worker concurrency to 64, so that there are 64 worker threads in the pipeline.
Variable | Value |
---|---|
Source MongoDB type | replica set |
Unique index/Shard key | No; Hash by id. |
Workload | A total of 5 collections in one db, each document includes 5 columns and the total size of each oplog document is about 220 bytes. This data is not conducive to merge before writing. |
Tunnel type | direct |
Worker concurrency | 64 |
Here are the measurement results:
Measurement | Value |
---|---|
QPS | 186,755 |
CPU usage | 2256% |
Memory usage | About 0.4%, 1.9 GB |
In this case, increasing the worker concurrency also increases the number of replayers writing to the target. As a result, both the QPS and the CPU usage increase, while the memory usage stays about the same.
In case 5, we adjust the workload to only 1 db and 1 collection with only 1 field, keeping the "direct" tunnel. As expected, these oplogs are conducive to merging before writing, so the QPS is higher than the value in case 3.
Variable | Value |
---|---|
Source MongoDB type | replica set |
Unique index/Shard key | No; Hash by id. |
Workload | 1 db, 1 collection with only 1 field. |
Tunnel type | direct |
Worker concurrency | 8 |
Here are the measurement results:
Measurement | Value |
---|---|
QPS | 455,567 |
CPU usage | 1049% |
Memory usage | About 0.28%, 1.5GB |
The QPS is about 455 thousand, which is higher than in case 3 because the data is more likely to be merged before being inserted into MongoDB, achieving a higher throughput.
In case 6, compared to case 5, we increase the worker concurrency just as case 4 does.
Variable | Value |
---|---|
Source MongoDB type | replica set |
Unique index/Shard key | No; Hash by id. |
Workload | 1 db, 1 collection with only 1 field. |
Tunnel type | direct |
Worker concurrency | 64 |
Here are the measurement results:
Measurement | Value |
---|---|
QPS | 456,990 |
CPU usage | 1811% |
Memory usage | About 0.18%, 0.74 GB |
The QPS is almost equal to the result shown in case 5. The result is limited by the write speed and write conflicts on the target MongoDB.
In case 7, compared to case 3, we increase the collection number from 5 to 128 to reduce conflicts when inserting into the target MongoDB. Each document still has 5 fields.
Variable | Value |
---|---|
Source MongoDB type | replica set |
Unique index/Shard key | No; Hash by id. |
Workload | 1 db, 128 collections with 5 fields each. |
Tunnel type | direct |
Worker concurrency | 8 |
Here are the measurement results:
Measurement | Value |
---|---|
QPS | 63,206 |
CPU usage | 508% |
Memory usage | About 0.4%, 1.8GB |
The QPS is higher than in case 3 because we ease the write conflicts. In MongoDB, there is an optimistic lock on a collection when writing to the same collection concurrently.
In case 8, compared to case 7, we increase the worker concurrency from 8 to 64. The result should be bigger than in case 3.
Variable | Value |
---|---|
Source MongoDB type | replica set |
Unique index/Shard key | No; Hash by id. |
Workload | 1 db, 128 collections with 5 fields each. |
Tunnel type | direct |
Worker concurrency | 64 |
Here are the measurement results:
Measurement | Value |
---|---|
QPS | 189,140 |
CPU usage | 1348% |
Memory usage | About 0.3%, 1.3GB |
The value is bigger than in case 3 and a little bigger than in case 4, which meets our expectation.
In case 9, we adjust the shard_key from "auto" to "collection" and set the worker concurrency to 128.
Variable | Value |
---|---|
Source MongoDB type | replica set |
Unique index/Shard key | No; Hash by collection. |
Workload | 1 db, 128 collections with 5 fields each. |
Tunnel type | direct |
Worker concurrency | 128 |
Here are the measurement results:
Measurement | Value |
---|---|
QPS | 279,633 |
CPU usage | 2096% |
Memory usage | About 0.3%, 1.6GB |
Setting the worker concurrency to the number of collections achieves good performance when the collection count is large and the data is distributed evenly.
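The routing idea behind shard_key can be sketched as follows. The workerIndex function and the hash choice are illustrative assumptions, not MongoShake's real code: hashing by collection sends every oplog of one namespace to the same worker, which is why matching the worker concurrency to a large, evenly distributed collection count works well.

```go
// Sketch of routing oplogs to workers: hash either the document id or the
// namespace (collection) and take it modulo the worker count.
// workerIndex is an illustrative helper, not a MongoShake API.
package main

import (
	"fmt"
	"hash/fnv"
)

func workerIndex(key string, workers int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32()) % workers
}

func main() {
	const workers = 128

	// shard_key = id: route by the document id, spreading one collection
	// across many workers (and possibly causing write conflicts on the target).
	fmt.Println("by id:", workerIndex("5b2f1c9e8a1d4c0001a3b2c1", workers))

	// shard_key = collection: route by namespace, so all oplogs of
	// "db0.coll42" always land on the same worker.
	fmt.Println("by collection:", workerIndex("db0.coll42", workers))
}
```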
In case 10, we change the source MongoDB from a replica set to a sharded cluster with 8 shards. There are still a total of 5 collections in one db and 5 fields in each document, the same as case 3. This data is also not conducive to merging before writing.
Variable | Value |
---|---|
Source MongoDB type | sharding |
Unique index/Shard key | No; Hash by id. |
Workload | A total of 5 collections in one db, each document includes 5 columns and the total size of each oplog document is about 220 bytes. This data is not conducive to merge before writing. |
Tunnel type | direct |
Worker concurrency | 8 |
Here are the measurement results:
Measurement | Value |
---|---|
QPS | 61,184 |
CPU usage | 518% |
Memory usage | About 2.7%, 13.8GB |
Compared to case 3, the QPS is very close, which means that whether the source database is a replica set or a sharded cluster, we get similar results.
We also wanted to test the performance when unique keys exist. However, the QPS depends on how many unique keys there are, how evenly the collections are distributed, and how many conflicts happen between these unique keys. In addition, the "uk" field is not supported in the current open-source v1.0.0, so we do not give this result.
In general, the QPS would be slightly smaller than the values above, because the hash method switches from "id" to "collection" and in most cases the data is distributed unevenly.
So far, we have not given the latency values for the above experiments, because these experiments all start fetching from the beginning of the oplog generated at the source database. If we want to calculate the latency of newly generated data, we have to wait for all the previous data to be transmitted first, and under these circumstances the calculated value is inaccurate.
So we insert data after all the previous data has finished transmitting, and then take the difference between the timestamps in the oplogs of the source database and the target database as the latency. However, the clocks of the source database host and the target database host may differ, so we only calculate the latency when the source database and the target database are located on the same host, which eliminates this difference.
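A minimal sketch of this latency calculation is shown below, assuming the official Go driver; the URIs, namespace, and marker id are placeholders. It looks up the oplog entry of the same marker document on the source and the target, then subtracts the seconds part of the two "ts" fields.

```go
// Hedged sketch of the latency calculation described above: after the
// backlog has been replayed, find the oplog entry for the same marker
// document on the source and the target and subtract their "ts" fields.
// URIs, namespace, and the marker id are placeholders.
package main

import (
	"context"
	"fmt"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/bson/primitive"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// markerTS returns the "ts" of the newest oplog entry that inserted the
// marker document into the given namespace.
func markerTS(ctx context.Context, uri, ns string, markerID interface{}) (primitive.Timestamp, error) {
	client, err := mongo.Connect(ctx, options.Client().ApplyURI(uri))
	if err != nil {
		return primitive.Timestamp{}, err
	}
	defer client.Disconnect(ctx)

	oplog := client.Database("local").Collection("oplog.rs")
	filter := bson.M{"ns": ns, "op": "i", "o._id": markerID}
	opts := options.FindOne().SetSort(bson.M{"$natural": -1})

	var entry struct {
		Ts primitive.Timestamp `bson:"ts"`
	}
	err = oplog.FindOne(ctx, filter, opts).Decode(&entry)
	return entry.Ts, err
}

func main() {
	ctx := context.Background()
	markerID := "latency-probe-0001" // inserted into the source after the backlog drained

	src, err := markerTS(ctx, "mongodb://127.0.0.1:27017", "test.probe", markerID)
	if err != nil {
		panic(err)
	}
	dst, err := markerTS(ctx, "mongodb://127.0.0.1:27018", "test.probe", markerID)
	if err != nil {
		panic(err)
	}

	// ts.T is seconds since the epoch; both mongods run on the same host,
	// so the wall clocks are comparable.
	fmt.Printf("replication latency ~ %d seconds\n", int64(dst.T)-int64(src.T))
}
```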
Latency: less than 1 second.