Add metrics for RPC #2104

Closed
xemul opened this issue Oct 31, 2023 · 17 comments · Fixed by #2343


xemul commented Oct 31, 2023

RPC metrics are added in 5.4 with the scylladb/scylladb@0c69a31 seastar update and are enhanced (see below) with the scylladb/scylladb#15785 merge. In Enterprise, it's going to be 2024.1.

The metrics include

  • scylla_rpc_client_count -- gauge showing total number of connections
  • scylla_rpc_client_sent_messages -- counter with total number of messages sent
  • scylla_rpc_client_replied -- counter with total number of responses received. This is less than or equal to the above, because some sent messages can be one-way calls that don't ask for a response, or can result in an exception or timeout (there are metrics for those too)
  • scylla_rpc_client_exception_received -- counter with total number of exceptional replies
  • scylla_rpc_client_timeout -- counter with total number of request timeouts
  • scylla_rpc_client_pending -- gauge with the number of requests queued for sending, but not yet sent
  • scylla_rpc_client_wait_reply -- gauge with the number of requests waiting for the reply

The metrics are labeled with "domain" and "shard" values. Each domain should have its own set of plots on the dashboard. Domains are dynamic, so the set of domains is not known in advance.
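
As a rough illustration of the per-domain grouping described above, here is a minimal Python sketch that sums two of the counters across shards for each domain and reports how many sent messages have not been answered. The scrape excerpt is invented: the domain names ("default", "streaming") and all values are assumptions, and the parser is deliberately naive (it assumes no timestamps and no commas or spaces inside label values).

```python
from collections import defaultdict

# Hypothetical scrape excerpt: domain names and numbers are invented for illustration.
SAMPLE = '''\
# TYPE scylla_rpc_client_sent_messages counter
scylla_rpc_client_sent_messages{domain="default",shard="0"} 120
scylla_rpc_client_sent_messages{domain="default",shard="1"} 80
scylla_rpc_client_replied{domain="default",shard="0"} 100
scylla_rpc_client_replied{domain="default",shard="1"} 75
scylla_rpc_client_sent_messages{domain="streaming",shard="0"} 10
scylla_rpc_client_replied{domain="streaming",shard="0"} 10
'''


def parse_line(line):
    """Split 'name{k="v",...} value' into (name, labels dict, float value)."""
    head, value = line.rsplit(" ", 1)
    name, _, rest = head.partition("{")
    labels = {}
    for pair in rest.rstrip("}").split(","):
        if pair:
            key, _, val = pair.partition("=")
            labels[key] = val.strip('"')
    return name, labels, float(value)


def sum_by_domain(text, prefix="scylla_rpc_client_"):
    """Sum every matching metric over all shards, keyed by domain then metric name."""
    totals = defaultdict(lambda: defaultdict(float))
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments (# HELP / # TYPE) and unrelated metrics.
        if not line or line.startswith("#") or not line.startswith(prefix):
            continue
        name, labels, value = parse_line(line)
        totals[labels.get("domain", "")][name] += value
    return totals


totals = sum_by_domain(SAMPLE)
for domain, metrics in sorted(totals.items()):
    sent = metrics["scylla_rpc_client_sent_messages"]
    replied = metrics["scylla_rpc_client_replied"]
    # The gap covers one-way calls, exceptions, timeouts and in-flight requests.
    print(f"{domain}: sent={sent:.0f} replied={replied:.0f} unanswered={sent - replied:.0f}")
```

Because domains are dynamic, the sketch discovers them from the labels rather than hard-coding a list, which is also what a dashboard would have to do.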

@xemul xemul added the enhancement New feature or request label Oct 31, 2023
@amnonh amnonh added this to the Monitoring 4.6 milestone Nov 2, 2023
Collaborator

amnonh commented Nov 6, 2023

@xemul I tried to test it with 5.4-rc1 but didn't see those metrics

Author

xemul commented Nov 6, 2023

If you're testing it with a single scylla instance, there will be no RPC connections and no such metrics
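
A quick way to tell "metrics missing" apart from "no RPC connections" is to check a metrics dump for the names listed at the top of this issue. The following Python sketch does that on text you have already fetched from a node. The sample dump and the `some_other_metric` name are hypothetical; only the seven `scylla_rpc_client_*` names come from this issue.

```python
# Metric names as listed in the issue description.
EXPECTED = (
    "scylla_rpc_client_count",
    "scylla_rpc_client_sent_messages",
    "scylla_rpc_client_replied",
    "scylla_rpc_client_exception_received",
    "scylla_rpc_client_timeout",
    "scylla_rpc_client_pending",
    "scylla_rpc_client_wait_reply",
)


def missing_rpc_metrics(metrics_text):
    """Return the expected RPC metric names that do not appear in the dump."""
    seen = set()
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first '{' (labels) or space (value).
        seen.add(line.split("{", 1)[0].split(" ", 1)[0])
    return [name for name in EXPECTED if name not in seen]


# Hypothetical dump from a node with no RPC connections: no RPC metrics at all.
SINGLE_NODE_SAMPLE = 'some_other_metric{shard="0"} 1\n'

print(missing_rpc_metrics(SINGLE_NODE_SAMPLE))
```

An empty result means all seven metrics are exported. A full list means either the node has no RPC connections or the build does not include the seastar update.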

Collaborator

amnonh commented Nov 6, 2023

@xemul I've tested it with a 3-node cluster

Author

xemul commented Nov 6, 2023

Hm... I tested with two scylla processes launched by hand. Can you give me the IPs of those nodes so I could check?

Author

xemul commented Nov 6, 2023

Ah, wait. scylla-5.4.0-rc1 doesn't have this seastar update (yet?)

Author

xemul commented Nov 6, 2023

I guess 5.4's seastar is not going to be just merged from master, so it's going to be 5.5/6.0 then

@amnonh amnonh removed this from the Monitoring 4.6 milestone Nov 7, 2023
Collaborator

amnonh commented Nov 7, 2023

Postponing until we have a version with those metrics

Collaborator

amnonh commented Feb 8, 2024

@xemul is this part of 2024.1?

Author

xemul commented Feb 9, 2024

Nope :(

Collaborator

amnonh commented May 23, 2024

@xemul, did it make it to 6.0?

Author

xemul commented May 23, 2024

Yes

@amnonh amnonh added this to the Monitoring 4.8 milestone May 23, 2024
Collaborator

amnonh commented Jul 3, 2024

@xemul, @mykaul who is going to use those metrics and how?
Those are 7 additional panels. I collected data from the compression test and can use that to add the panels, but the question is, should we?

@amnonh amnonh assigned amnonh and xemul and unassigned amnonh Jul 3, 2024
Collaborator

amnonh commented Jul 3, 2024

@xemul I'm assigning it to you to get your input, you don't need to add any code

Author

xemul commented Jul 3, 2024

From my perspective it's for the Advanced dashboard, the one that includes the IO-queue and CPU-scheduler metrics

Collaborator

amnonh commented Jul 3, 2024

I'm worried that if we just keep adding more and more metrics, the dashboard will get slower and slower and, in the end, it won't be useful

amnonh added a commit to amnonh/scylla-grafana-monitoring that referenced this issue Jul 4, 2024
Contributor

mykaul commented Jul 9, 2024

Is the content of scylladb/seastar#2293 included here, or do we need a separate issue for it? (rpc DELAY metrics) ?

Collaborator

amnonh commented Jul 9, 2024

Is the content of scylladb/seastar#2293 included here, or do we need a separate issue for it? (rpc DELAY metrics) ?

It's not. It will require a new issue stating which version it's part of, plus some additional information and samples
