Getting NotEnoughValidWindowsException when I click on KafkaClusterLoad or any similar tabs #2085

Open
aagrrawal opened this issue Dec 4, 2023 · 21 comments
Labels
* question: A code or meta question about the project.
* robustness: Makes the project tolerate or handle perturbations.
* usability: Improves the ease of use or learnability of the system.

Comments

@aagrrawal

I get the exception below when I click on KafkaClusterLoad or any similar tab. I am using Kafka v3.3.1 and Cruise Control v2.5.132. I went through the discussion in #310, but it didn't help. I manually created the __CruiseControlMetrics topic in the Kafka cluster. Please help.

Caused by: com.linkedin.cruisecontrol.exception.NotEnoughValidWindowsException: There is no window available in range [-1, 1701684649907] (index [1, -1]). Window index (current: 0, oldest: 0).
	at com.linkedin.cruisecontrol.monitor.sampling.aggregator.MetricSampleAggregator.aggregate(MetricSampleAggregator.java:202) ~[cruise-control-core-2.5.132-SNAPSHOT.jar:?]
	at com.linkedin.kafka.cruisecontrol.monitor.sampling.aggregator.KafkaPartitionMetricSampleAggregator.aggregate(KafkaPartitionMetricSampleAggregator.java:151) ~[cruise-control-2.5.132-SNAPSHOT.jar:?]
	at com.linkedin.kafka.cruisecontrol.monitor.LoadMonitor.clusterModel(LoadMonitor.java:503) ~[cruise-control-2.5.132-SNAPSHOT.jar:?]
	at com.linkedin.kafka.cruisecontrol.KafkaCruiseControl.clusterModel(KafkaCruiseControl.java:370) ~[cruise-control-2.5.132-SNAPSHOT.jar:?]
	at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.LoadRunnable.clusterModel(LoadRunnable.java:111) ~[cruise-control-2.5.132-SNAPSHOT.jar:?]
	at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.LoadRunnable.clusterModelFromEarliest(LoadRunnable.java:93) ~[cruise-control-2.5.132-SNAPSHOT.jar:?]
	at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.LoadRunnable.getResult(LoadRunnable.java:76) ~[cruise-control-2.5.132-SNAPSHOT.jar:?]
	at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.LoadRunnable.getResult(LoadRunnable.java:26) ~[cruise-control-2.5.132-SNAPSHOT.jar:?]
	at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.OperationRunnable.run(OperationRunnable.java:45) ~[cruise-control-2.5.132-SNAPSHOT.jar:?]
	at com.linkedin.kafka.cruisecontrol.servlet.handler.async.runnable.LoadRunnable.run(LoadRunnable.java:26) ~[cruise-control-2.5.132-SNAPSHOT.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
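
For anyone trying to read the numbers in that message, here is a minimal Python sketch that pulls the fields out of the exception text. The regular expression is inferred only from the single message quoted above, and the function names (`parse_window_error`, `index_range_is_empty`) are hypothetical helpers, not part of Cruise Control. Reading `index [1, -1]` as an inclusive from/to pair is also an assumption.

```python
import re

# Pattern inferred from the one message quoted in this issue; the wording
# is not a documented contract and may differ between versions.
_WINDOW_ERROR = re.compile(
    r"range \[(?P<start>-?\d+), (?P<end>-?\d+)\] "
    r"\(index \[(?P<lo>-?\d+), (?P<hi>-?\d+)\]\)\. "
    r"Window index \(current: (?P<current>-?\d+), oldest: (?P<oldest>-?\d+)\)"
)


def parse_window_error(message):
    """Return the numeric fields of the message as ints, or None if it does not match."""
    match = _WINDOW_ERROR.search(message)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def index_range_is_empty(fields):
    """Assumption: [lo, hi] is an inclusive index range, so lo > hi means it holds no window."""
    return fields["lo"] > fields["hi"]
```

On the message above this yields `lo=1`, `hi=-1`, `current=0` and `oldest=0`, so under that reading the requested index range is empty.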
@aagrrawal
Author

@efeg could you please have a look?

@aagrrawal
Author

Hi @CCisGG @rosakng, could you help here?

@aagrrawal
Author

It seems to me that Kafka is probably not able to write to the __CruiseControlMetrics topic, because the topic is empty when I consume from it:
% kafka-console-consumer.sh --topic __CruiseControlMetrics --from-beginning --bootstrap-server localhost:9092

Are Kafka v3.3.1 and Cruise Control v2.5.132 compatible?

@mhratson
Contributor

mhratson commented Dec 4, 2023

Have you checked CC logs for errors after __CruiseControlMetrics was created?

@aagrrawal
Author

Hi @mhratson, I checked the CC logs and didn't find any errors after __CruiseControlMetrics was created. I got the NotEnoughValidWindowsException when I clicked the KafkaClusterLoad tab in the Cruise Control UI.

@aagrrawal
Author

How are metrics written to the __CruiseControlMetrics topic? I saw this in the CC log during CC startup: "Collected 0 partition metric samples for 0 partitions." Is that normal?

[2023-12-04 21:55:56,568] INFO [Consumer clientId=CruiseControlMetricsReporterSampler-consumer--6371283249548768098, groupId=null] Resetting the last seen epoch of partition __CruiseControlMetrics-0 to 3 since the associated topicId changed from null to gq4rghsKQ7ivpr82jkpJeg (org.apache.kafka.clients.Metadata)
[2023-12-04 21:55:56,574] INFO [Consumer clientId=CruiseControlMetricsReporterSampler-consumer--6371283249548768098, groupId=null] Seeking to offset 0 for partition __CruiseControlMetrics-0 (org.apache.kafka.clients.consumer.KafkaConsumer)
[2023-12-04 21:56:01,578] INFO Finished sampling from topic __CruiseControlMetrics for partitions [0] in time range [1701706916549,1701707156554]. Collected 0 metrics. (com.linkedin.kafka.cruisecontrol.monitor.sampling.CruiseControlMetricsReporterSampler)
[2023-12-04 21:56:01,579] INFO Collected 0 partition metric samples for 0 partitions. Total partition assigned: 116. (com.linkedin.kafka.cruisecontrol.monitor.sampling.SamplingFetcher)
[2023-12-04 21:56:01,579] INFO Collected 0 broker metric samples for 0 brokers. (com.linkedin.kafka.cruisecontrol.monitor.sampling.SamplingFetcher)
[2023-12-04 21:56:01,583] INFO Finished sampling in 5026 ms. (com.linkedin.kafka.cruisecontrol.monitor.sampling.MetricFetcherManager)
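
To spot this condition across a longer log, the sampler lines can be summarized with a few lines of Python. This is only a sketch: the regular expression is derived from the "Finished sampling from topic ..." line quoted above, and `summarize_sampling` is a hypothetical helper name, not something Cruise Control ships.

```python
import re

# Derived from the sampler log line quoted in this thread; the format is
# assumed from that sample and may change between Cruise Control versions.
_SAMPLED = re.compile(
    r"Finished sampling from topic (?P<topic>\S+) for partitions \[(?P<parts>[^\]]*)\] "
    r"in time range \[(?P<start>\d+),(?P<end>\d+)\]\. Collected (?P<count>\d+) metrics\."
)


def summarize_sampling(lines):
    """Return one summary dict per sampler line found in the given log lines."""
    summaries = []
    for line in lines:
        match = _SAMPLED.search(line)
        if match is None:
            continue
        summaries.append({
            "topic": match["topic"],
            # Partitions of the metrics topic that were read in this round.
            "partitions": [int(p) for p in match["parts"].split(",") if p.strip()],
            # Length of the sampled time range in milliseconds.
            "range_ms": int(match["end"]) - int(match["start"]),
            "metrics": int(match["count"]),
        })
    return summaries
```

For the line above, the sampled range is 240005 ms (about four minutes) and `metrics` is 0, which means nothing was read from the topic in that interval.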

@mhratson
Contributor

mhratson commented Dec 4, 2023

@aagrrawal how did you configure Kafka for CC besides creating the topic?

@aagrrawal
Author

aagrrawal commented Dec 4, 2023

@mhratson I added metric.reporters=com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter to server.properties. Also, since I am starting without ZooKeeper, I commented out the ZooKeeper config and added the kafka.broker.failure.detection.enable setting:

kafka.broker.failure.detection.enable=true
#zookeeper.connect=localhost:2181

# Timeout in ms for connecting to zookeeper
#zookeeper.connection.timeout.ms=18000

I also copied cruise-control-metrics-reporter-2.5.132-SNAPSHOT.jar to the libs folder in the Kafka directory.
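
A quick way to double-check the broker side of this setup is to parse server.properties and confirm that the reporter class is actually listed under metric.reporters. The sketch below uses a deliberately simplified properties parser (no line continuations, `=` as the only separator), and the helper names are hypothetical.

```python
REPORTER = "com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter"


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and '#' or '!' comments (simplified)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def reporter_configured(text, key="metric.reporters"):
    """True if the Cruise Control reporter class appears in the comma-separated reporter list."""
    reporters = [name.strip() for name in parse_properties(text).get(key, "").split(",")]
    return REPORTER in reporters
```

A commented-out line such as `#metric.reporters=...` counts as not configured, which is usually the case worth catching.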

@aagrrawal
Author

I am starting Cruise Control like this: ./kafka-cruise-control-start.sh config/cruisecontrol.properties

@mhratson
Contributor

mhratson commented Dec 4, 2023

@aagrrawal we need to ensure metrics are produced into __CruiseControlMetrics, so it's worth checking the Kafka logs for any related errors.

@aagrrawal
Author

@mhratson there are no related errors in the Kafka log. But we noticed one interesting thing: the log line "Kafka metrics for Cruise Control metrics during initialization" is present with v3.1.0 when we start Kafka with the metrics reporter, but is absent with v3.3.1. Any idea what might cause this?

https://github.com/linkedin/cruise-control/blob/migrate_to_kafka_2_5/cruise-control-metrics-reporter/src/main/java/com/linkedin/kafka/cruisecontrol/metricsreporter/CruiseControlMetricsReporter.java#L98

@marcelloromani

marcelloromani commented Dec 15, 2023

I am facing the same issue.
I left Cruise Control running overnight, so lack of metrics shouldn't be a problem.

I have read somewhere that the error might be caused by a mismatch in the message format version.

I found this line in the readme file:

* `message.format.version` `0.10.0` and above is needed.

I believe in my case this could be the reason:
I initially installed Cruise Control 2.5.36
Later I replaced my installation with Cruise Control 2.5.99

My hypothesis is this: I created the __CruiseControlMetrics topic for the first CC install, 2.5.36, which started populating it with messages in a message format older than 0.10.0.
I then upgraded to CC 2.5.99, which now finds messages in an old format.

Update

Used a brand new topic and the error persists, so there must be something else at play.

I have:

[2023-12-15 14:13:13,975] INFO Finished sampling from topic CruiseControlMetrics for partitions [0, 1] in time range [1702649468778,1702649588779]. Collected 0 metrics. (com.linkedin.kafka.cruisecontrol.monitor.sampling.CruiseControlMetricsReporterSampler)
[2023-12-15 14:13:13,975] INFO Collected 0 partition metric samples for 0 partitions. Total partition assigned: 126. (com.linkedin.kafka.cruisecontrol.monitor.sampling.SamplingFetcher)

I wonder if the fact that there are 126 partitions but CC is only collecting from partitions 0, 1 means I have misconfigured CC?

@tedherring-smarsh

tedherring-smarsh commented Jan 4, 2024

Any updates here? I'm seeing the same issue. I see my __CruiseControlMetrics appears empty, but I can write to that topic via console-producer. I can also write to my own my-topic

@CCisGG
Contributor

CCisGG commented Jan 4, 2024

> Any updates here? I'm seeing the same issue. I see my __CruiseControlMetrics appears empty, but I can write to that topic via console-producer. I can also write to my own my-topic

@tedherring-smarsh
In that case I think your CC + Kafka may not be set up correctly. To make CC produce metrics to that topic, you need to inject the CruiseControlMetricsReporter into your Kafka broker. Follow the README: Quick Start, section 1.

@yangmuye-c
Contributor

> I get the exception below when I click on KafkaClusterLoad or any similar tab. I am using Kafka v3.3.1 and Cruise Control v2.5.132. I went through the discussion in #310, but it didn't help. I manually created the __CruiseControlMetrics topic in the Kafka cluster. Please help.
>
> Caused by: com.linkedin.cruisecontrol.exception.NotEnoughValidWindowsException: There is no window available in range [-1, 1701684649907] (index [1, -1]). Window index (current: 0, oldest: 0).
> (stack trace identical to the one in the original report)

I had the same problem. Did you solve it?

@marcelloromani

> To make CC produce metrics to that topic, you need to inject the CruiseControlMetricsReporter into your Kafka broker.

How can that be done when using managed Kafka services like Amazon MSK?

@CCisGG
Contributor

CCisGG commented Jan 15, 2024

> I had the same problem. Did you solve it?

@1299618103 did you follow the README to set up CruiseControlMetricsReporter in your Kafka cluster?

@CCisGG
Contributor

CCisGG commented Jan 15, 2024

> How can that be done when using managed Kafka services like Amazon MSK?

@marcelloromani Please feel free to reach out to Amazon MSK support. I guess there could be some environment setup you can use in the MSK portal, but I haven't used it myself, so I'm not sure exactly how they pass those configs through.

@CCisGG
Contributor

CCisGG commented Jan 15, 2024

@mhratson This is a popular issue so I'd like to invite you to the discussion.

@yangmuye-c
Contributor

> > I had the same problem. Did you solve it?
>
> @1299618103 did you follow the README to set up CruiseControlMetricsReporter in your Kafka cluster?

I did set it up. My other clusters are normal; this cluster became abnormal after expansion. The details are in #2107.

@CCisGG added the question, usability, and robustness labels on Feb 21, 2024

@marcelloromani

> > How can that be done when using managed Kafka services like Amazon MSK?
>
> @marcelloromani Please feel free to reach out to Amazon MSK support. I guess there could be some environment setup you can use in the MSK portal, but I haven't used it myself, so I'm not sure exactly how they pass those configs through.

https://docs.aws.amazon.com/msk/latest/developerguide/cruise-control.html

As far as I could see, for MSK we need to provision a Prometheus instance that scrapes the MSK open monitoring endpoints, and point Cruise Control to this Prometheus instance so it can read the MSK metrics it needs.
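
As a rough illustration of the "point Cruise Control to Prometheus" step, the Python sketch below renders the two cruisecontrol.properties overrides I believe are involved. The property names and the sampler class are written from memory of the Cruise Control configuration docs and the AWS page linked above, so treat them as assumptions to verify against your version. The endpoint value and helper names are hypothetical.

```python
def prometheus_sampler_overrides(endpoint):
    """Return cruisecontrol.properties overrides for sampling from Prometheus.

    Assumption: the property names and sampler class below match your
    Cruise Control version; check them against its configuration docs.
    """
    return {
        "metric.sampler.class":
            "com.linkedin.kafka.cruisecontrol.monitor.sampling.prometheus.PrometheusMetricSampler",
        "prometheus.server.endpoint": endpoint,
    }


def render_properties(props):
    """Render a dict as key=value lines in insertion order."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


# Hypothetical endpoint; replace it with the host:port of your Prometheus server.
print(render_properties(prometheus_sampler_overrides("prometheus.example.internal:9090")))
```

With this approach the broker-side CruiseControlMetricsReporter is not used, so an empty __CruiseControlMetrics topic would be expected in such a setup.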


6 participants