-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
prometheus: sanitize label value for text protocol #2400
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just some nits, lgtm.
tests/unit/prometheus_http_test.cc
Outdated
output.flush().get(); | ||
auto resp = input.read().get(); | ||
auto resp_str = std::string(resp.get(), resp.size()); | ||
BOOST_REQUIRE_NE(resp_str.find("200 OK"), std::string::npos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could also use
using namespace std::literals;
// ...
BOOST_REQUIRE(std::ranges::search(resp, "200 OK"sv));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, done
|
||
future<> client = seastar::async([&lsi] { | ||
connected_socket c_socket = lsi.connect(socket_address(ipv4_addr()), socket_address(ipv4_addr())).get(); | ||
input_stream<char> input(c_socket.input()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, could use
auto close_input = deferred_close(input);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
|
||
co_await seastar::async([] { | ||
loopback_connection_factory lcf(1); | ||
http_server server("test"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, could use
auto stop_server = deferred_stop(server);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't use this one because the stop()
method of http_server
is not noexcept
so it doesn't satisfy the stoppable
concept required for deferred_stop
.
@amnonh hey Amnon, would you please take a look? |
@pgellert I like the idea and the optimization. It will be better to perform the label clean up on creation, so it can be done once. I would even convider throwing an exception in that case |
Adds a test to verify that the metrics returned with the prometheus text format are formatted as expected.
The prometheus text protocol requires that the label value to be escaped. Note that this is only needed for the text protocol and not for the protobuf-based prometheus protocol. Without this sanitization, a single label with one of these special characters makes the whole metrics output unparseable by prometheus. > label_value can be any sequence of UTF-8 characters, but the backslash > (\), double-quote ("), and line feed (\n) characters have to be > escaped as \\, \", and \n, respectively. From https://github.com/prometheus/docs/blob/main/content/docs/instrumenting/exposition_formats.md
d156991
to
6bf5958
Compare
Force-push: improve test code based on code review feedback |
@tchaikov @amnonh thank you for the reviews.
This one is trickier to implement than it first seems. The main complication is that the escaping must apply to the label values only when serialized with the text protocol, but not when serialized with protobuf. I wanted to add a new field |
I disagree, protobuf and text should be the same, I would seriously
concider throwing on non standard labels
…On Tuesday, August 27, 2024, Gellért Peresztegi-Nagy < ***@***.***> wrote:
@tchaikov <https://github.com/tchaikov> @amnonh
<https://github.com/amnonh> thank you for the reviews.
It will be better to perform the label clean up on creation, so it can be
done once. I would even convider throwing an exception in that case
This one is trickier to implement than it first seems. The main
complication is that the escaping must apply to the label values only when
serialized with the text protocol, but not when serialized with protobuf. I
wanted to add a new field _escaped_value to label_instance but it's not
that simple, the label instances associated with a metric are converted
into a labels_type and it's difficult to trace which label
representations are used at what points on the text-based serialization
path and the protobuf-based serialization path. I am also not sure the
tests would catch if I made a mistake during the refactoring necessary to
make that change, so I am tempted to keep the change simpler and hold off
with this optimization. Let me know what you think.
—
Reply to this email directly, view it on GitHub
<#2400 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAQFDP5MOIAH6XF62GCF6RDZTSVU5AVCNFSM6AAAAABM72GT22VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMJTGA3TOMJQGA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
I think we should do it for both - we do not want to be in the situation where one decides to switch from one to another and then be surprised by the result. |
switch (v.type()) { | ||
case seastar::metrics::impl::data_type::GAUGE: | ||
case seastar::metrics::impl::data_type::REAL_COUNTER: | ||
return std::to_string(v.d()); | ||
fmt::print(os, "{:.6f}", v.d()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think .6f is a good format. For small values, it loses precision. For large values, (beyond 10^(std::numeric_limit::digits10-6)), it has precision without accuracy.
I expect '{g}' will work and provide maximum accuracy, but haven't tested it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should changing the value's precision be part of a patch that's supposedly about the labels?
I had in the past some bad experience (scylladb/scylladb#19081 (comment)) with how Seastar's metrics convert double-precision to 6 digits of precision in obscure places, causing surprising effects. We need to be very careful about this. Note that using "to_string()" is even worse, because nobody knows what it really does... If I remember correctly (but I can't find a reference now), we even had a case where in the past some to-string function used to print 15 digits of precision but one day when we changed the way we use fmt
suddenly it because 6 digits.
Ideally, we'd want to have regression tests for these precision issues, so we won't be surprised if one day the precision of some metric format regresses.
P.S. @avikivity , actually the default precision of printf's %g
is 6 digits, similar to %f
- the "g" is about scientific notation for large exponents - not about precision. But I don't know if fmt inherited printf's idea of what "g" is or invented its own - it's worth checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This patch is already changing the precision, by changing to_string() to fmt::print("{:.6f").
If plain {g} doesn't work, then {:19g} (or whatever) can be substituted. Or even start with {} to maintain the status quo and improve it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@avikivity this is exactly my question - why does a patch for "sanitize label value for text protocol" need to change the formatting of numbers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This patch is already changing the precision
My intention with this PR is to keep the existing formatting of floating point numbers, keeping this PR purely a refactoring when it comes to the formatting of the metric values. The goal is to avoid the coming changes to std::to_string
. std::to_string
effectively used {:.6f}
for the formatting of double
s and this is what I intend to preserve with this change.
From https://en.cppreference.com/w/cpp/string/basic_string/to_string:
7,8) Converts a floating point value to a string as if by std::sprintf(buf, "%f", value). (until C++26)
Then from https://en.cppreference.com/w/cpp/io/c/fprintf:
f/F: Converts floating-point number to the decimal notation in the style [-]ddd.ddd.
Precision specifies the exact number of digits to appear after the decimal point character. The default precision is 6. In the alternative implementation decimal point character is written even if no digits follow it. For infinity and not-a-number conversion style see notes.
I'm happy to move this commit to another PR if you prefer.
I think keeping the metric formatting as the original std::to_string
-like format ({:.6f}
) would be best. An important edge case for which {:.6f}
works but {g}
or {}
would be problematic is gauges that have large integer values (eg. a gauge with a unix timestamp as its value). Changing the formatting to scientific notation could make these integer values less accurate which would be confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This patch is already changing the precision
My intention with this PR is to keep the existing formatting of floating point numbers, keeping this PR purely a refactoring when it comes to the formatting of the metric values. The goal is to avoid the coming changes to
std::to_string
.std::to_string
effectively used{:.6f}
for the formatting ofdouble
s and this is what I intend to preserve with this change.
Absolutely the correct approach.
From https://en.cppreference.com/w/cpp/string/basic_string/to_string:
7,8) Converts a floating point value to a string as if by std::sprintf(buf, "%f", value). (until C++26)
Then from https://en.cppreference.com/w/cpp/io/c/fprintf:
f/F: Converts floating-point number to the decimal notation in the style [-]ddd.ddd.
Precision specifies the exact number of digits to appear after the decimal point character. The default precision is 6. In the alternative implementation decimal point character is written even if no digits follow it. For infinity and not-a-number conversion style see notes.I'm happy to move this commit to another PR if you prefer.
I think keeping the metric formatting as the original
std::to_string
-like format ({:.6f}
) would be best. An important edge case for which{:.6f}
works but{g}
or{}
would be problematic is gauges that have large integer values (eg. a gauge with a unix timestamp as its value). Changing the formatting to scientific notation could make these integer values less accurate which would be confusing.
Let's plan a follow-up to improve floating-point formatting. .6f is pretty bad. It's okay for a gauge, but as soon as you take a derivative of a small slowly-changing metric, you get poor results.
src/core/prometheus.cc
Outdated
case seastar::metrics::impl::data_type::HISTOGRAM: | ||
case seastar::metrics::impl::data_type::SUMMARY: | ||
break; | ||
} | ||
return ""; // we should never get here but it makes the compiler happy | ||
return os; // we should never get here but it makes the compiler happy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is now incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Could you please elaborate in what sense they should be the same? Just to clarify, if a user of the library intends to create a label value with the string Prometheus supports label values with special characters so I don't see why seastar should restrict the range of label values that it supports. |
This avoids a string copy when writing the metric value and the metric type to the prometheus text-based response. Note that now `fmt::print` is used to format the values instead of the originally used `std::to_string`. This is because `std::to_string` is going to change its formatting behaviour in C++26, so we prefer to use a more deterministic formatter here. The formatting of the numbers matters because if the floating point numbers are converted to scientific notation, they can lose precision. Ref: https://en.cppreference.com/w/cpp/string/basic_string/to_string
6bf5958
to
6e2f629
Compare
Force-push: remove outdated comment |
Since users are free to switch between the protocols, we do not want the end user experience to be different when changing protocol. Unless I misunderstand the end result - would the user see the same label value eventually (say, in a query in Grafana) when switching from native to text? I'll also add that I think protobuf is regretfully still not GA (I think?). |
Yes, they would see the same label value with this PR when they switch between native and text. Without this PR, they would see different results or be unable to switch from protobuf to text if they had label values with special characters (backslash |
Label and their values are static on creation, now, we'll need to call escape each time we send them, it's a shame. Was there an actual demand for special characters? |
I've changed the label escaping now to only happen once, on creation. Let me know if the changes look good, and if the changes in the header files are within the source compatibility requirements of seastar.
Yes, in redpanda we have labels that could contain these special characters (eg. consumer group names). |
include/seastar/core/metrics_api.hh
Outdated
@@ -41,7 +41,7 @@ namespace seastar { | |||
namespace metrics { | |||
namespace impl { | |||
|
|||
using labels_type = std::map<sstring, sstring>; | |||
using labels_type = std::map<sstring, label_value>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that's exposed as an API and will break scylladb unit tests that verify metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I attempted to avoid changing this in place now: #2400 (comment)
0561dac
to
3ed894f
Compare
I've pushed some more changes to try to keep the type alias Note that there are still API-incompatible changes in the last commit, namely:
@amnonh @avikivity could you please re-review and advise on whether the optimization in the last commit is worth doing or if I should drop it since the blast radius of the change is quite large and there are still API-breaking changes in it? I would like to get this PR in if possible. |
@pgellert let's split the last patch off (I haven't reviewed it yet). The fixes are more important and can be queued immediately, we can consider the optimization after that. |
3ed894f
to
6e2f629
Compare
Force-push: dropped the last commit based on @avikivity's feedback above |
This PR fixes a bug in the prometheus text-based endpoint where a label value with a special character (backslash (), double-quote (") or line feed (\n)) makes the whole metrics output unparseable by prometheus.
The prometheus text protocol requires that the label value to be escaped. Note that this is only needed for the text protocol and not for the protobuf-based prometheus protocol.
From https://github.com/prometheus/docs/blob/main/content/docs/instrumenting/exposition_formats.md
This also adds a new test (
prometheus_http_test.cc
) to validate the output of the text-based prometheus response. I've introduced this because it was simpler to make explicit checks on the text output than modifyingprometheus_test.py
andmetrics_tester.cc
to do that. I am open to give a go at working this intoprometheus_test.py
if you prefer that.Finally, we switch to using
fmt::print
to format the metric value and the metric type instead of the originally usedstd::to_string
.std::to_string
is going to change its formatting behaviour in C++26 and so we prefer a more deterministic formatter here. The formatting matters because if the floating point numbers are converted to scientific notation, they can lose precision.As an added benefit, using
fmt::print
writes these values to the output stream directly which means that we avoid a string copy when writing these values to the prometheus text-based response.