
feat: clickhouse optimize S3 #4825

Open · wants to merge 9 commits into develop

Conversation

@makeavish (Member) commented Apr 6, 2024

Summary

The ClickHouse S3 optimizer runs OPTIMIZE TABLE queries to reduce excessive PUT calls to S3.
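
For illustration, here is a minimal sketch of the kind of statement the optimizer issues per table, written against the clickhouse-go v2 driver. The connection address, table list, and wrapper code are placeholders for this example, not the PR's actual implementation (see optimize.go in the diff for that).

// Illustrative sketch only; not the code in this PR.
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/ClickHouse/clickhouse-go/v2"
)

func main() {
	conn, err := clickhouse.Open(&clickhouse.Options{
		Addr: []string{"localhost:9000"}, // placeholder address
	})
	if err != nil {
		log.Fatal(err)
	}

	// Two example tables; the PR iterates over a longer list
	// (shown later in the review thread).
	tables := []string{
		"signoz_traces.signoz_index_v2",
		"signoz_logs.logs",
	}

	ctx := context.Background()
	for _, table := range tables {
		// OPTIMIZE TABLE ... FINAL asks ClickHouse to merge the table's data
		// parts; per the PR description, fewer and larger parts mean fewer
		// PUT requests when parts are written to S3.
		query := fmt.Sprintf("OPTIMIZE TABLE %s FINAL", table)
		if err := conn.Exec(ctx, query); err != nil {
			log.Printf("optimize %s failed: %v", table, err)
		}
	}
}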



Co-authored-by: Prashant Shahi <[email protected]>
github-actions bot added the enhancement (New feature or request) label on Apr 6, 2024
github-actions bot commented Apr 6, 2024

Build Error! No Linked Issue found. Please link an issue or mention it in the body using #<issue_id>

@prashant-shahi marked this pull request as ready for review on April 10, 2024 at 09:34
github-actions bot commented:

Build Error! No Linked Issue found. Please link an issue or mention it in the body using #<issue_id>

2 similar comments
@srikanthccv (Member) left a comment

My main concern with this is that it's going to put the system under a lot of pressure whenever it is triggered (see https://clickhouse.com/docs/en/optimize/avoidoptimizefinal). In a customer environment, the pod running this can adversely affect everyone. How big was the data we tested this on? Could this process be controlled with an upper limit on resources?

pkg/query-service/app/clickhouseOptimizeS3/optimize.go (review thread outdated; resolved)
Comment on lines 98 to 113
tables := []string{
	"signoz_logs.logs",
	"signoz_logs.tag_attributes",
	"signoz_metrics.samples_v2",
	"signoz_metrics.time_series_v4",
	"signoz_metrics.time_series_v3",
	"signoz_metrics.time_series_v2",
	"signoz_traces.usage_explorer",
	"signoz_traces.span_attributes",
	"signoz_traces.dependency_graph_minutes",
	"signoz_traces.dependency_graph_minutes_v2",
	"signoz_traces.signoz_error_index_v2",
	"signoz_traces.signoz_index_v2",
	"signoz_traces.signoz_spans",
	"signoz_traces.durationSort",
}
Reviewer (Member):

How is this list prepared? It doesn't seem to be in full sync with

@makeavish (Member, Author) replied:

I see we are not allowing users to change the TTL or move data to S3 for signoz_metrics.time_series_v4 or the other time_series tables. Any reason for that?

pkg/query-service/app/clickhouseOptimizeS3/optimize.go (review thread outdated; resolved)
@makeavish (Member, Author) commented:

> My main concern with this is that it's going to put the system under a lot of pressure whenever it is triggered (see https://clickhouse.com/docs/en/optimize/avoidoptimizefinal). In a customer environment, the pod running this can adversely affect everyone. How big was the data we tested this on? Could this process be controlled with an upper limit on resources?

We use optimize_skip_merged_partitions=1, which ensures that already merged partitions are not merged again. Putting an upper limit on resources can be done, but the right limit depends on load: limiting resources too much can slow down merges, which eventually slows down queries. I would suggest we release this and then manually adjust resource limits based on load. After some feedback we can also automate the resource limits.
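
As a reference for reviewers, a hedged sketch of how optimize_skip_merged_partitions could be attached as a query-level setting with the clickhouse-go v2 driver. The package and function names and the wiring are illustrative assumptions, not necessarily how this PR passes the setting.

// Sketch only. optimize_skip_merged_partitions=1 makes OPTIMIZE ... FINAL
// skip partitions that already consist of a single merged part, so repeated
// runs do not rewrite (and re-upload) already merged data.
package optimizer

import (
	"context"
	"fmt"

	"github.com/ClickHouse/clickhouse-go/v2"
	"github.com/ClickHouse/clickhouse-go/v2/lib/driver"
)

func optimizeTable(ctx context.Context, conn driver.Conn, table string) error {
	ctx = clickhouse.Context(ctx, clickhouse.WithSettings(clickhouse.Settings{
		"optimize_skip_merged_partitions": 1,
	}))
	return conn.Exec(ctx, fmt.Sprintf("OPTIMIZE TABLE %s FINAL", table))
}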

@srikanthccv (Member) commented Apr 16, 2024

IMO this is not one of those features where we merge the initial working piece and think about it more later. It has the potential to bring down the entire system across ingestion, alerting, and product usage. We know that mutations are the worst thing in ClickHouse. Even on a table with no more than a few tens of millions of rows, CPU usage went close to 100%, which delayed ingestion, led to false alerts, and made the product unusable the whole time (https://signoz-team.slack.com/archives/C06C5U3TUDP/p1706604434441239). Once it got into a bad state there was no way to stop it immediately (the kill wouldn't work); we had to wait until it completed on its own.

Did we test this on a system with a non-trivial amount of data? What change in system resource usage was observed? How did it affect the rest of the product? My main point about resource limits is that there should be sensible limits whenever we initiate an unscheduled merge; I didn't say we should limit resources too much.
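
To make the "sensible limits" idea concrete, here is one possible shape of a guard, sketched purely as an illustration and not something this PR implements: before issuing the next unscheduled OPTIMIZE, poll system.merges and wait until the number of in-flight merges drops below a threshold, so forced merges are not stacked on top of an already busy background pool.

// Sketch only; the threshold and poll interval are illustrative, not tuned values.
package optimizer

import (
	"context"
	"time"

	"github.com/ClickHouse/clickhouse-go/v2/lib/driver"
)

// waitForMergeHeadroom blocks until fewer than maxActiveMerges merges are
// running (as reported by system.merges), or until the context is done.
func waitForMergeHeadroom(ctx context.Context, conn driver.Conn, maxActiveMerges uint64) error {
	for {
		var active uint64
		if err := conn.QueryRow(ctx, "SELECT count() FROM system.merges").Scan(&active); err != nil {
			return err
		}
		if active < maxActiveMerges {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(30 * time.Second):
		}
	}
}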

// General
const (
	CH_OPTIMIZE_INTERVAL_IN_HOURS = 24
	CH_TIMEOUT_WAIT_IN_MINUTES    = 30
)
Collaborator:

This timeout is not respected in case of a slowdown.
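
For illustration, a sketch of how CH_TIMEOUT_WAIT_IN_MINUTES could be enforced on the client side with a context deadline, so a slow server does not make the wait unbounded. The helper below is an assumption for the example, not the PR's code, and cancelling the client context does not by itself stop a merge already running on the server (consistent with the earlier observation that killing the query did not help).

// Sketch only: bound how long we wait on a single OPTIMIZE call.
package optimizer

import (
	"context"
	"fmt"
	"time"

	"github.com/ClickHouse/clickhouse-go/v2/lib/driver"
)

// Mirrors the constant from the diff excerpt above.
const CH_TIMEOUT_WAIT_IN_MINUTES = 30

func optimizeWithTimeout(parent context.Context, conn driver.Conn, table string) error {
	ctx, cancel := context.WithTimeout(parent, CH_TIMEOUT_WAIT_IN_MINUTES*time.Minute)
	defer cancel()
	// Returns a deadline error if ClickHouse does not answer in time; the
	// server-side merge triggered by OPTIMIZE may still continue regardless.
	return conn.Exec(ctx, fmt.Sprintf("OPTIMIZE TABLE %s FINAL", table))
}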
