prometheus: sanitize label value for text protocol #2400

pgellert · 2024-08-23T09:35:52Z

This PR fixes a bug in the prometheus text-based endpoint where a label value with a special character (backslash (), double-quote (") or line feed (\n)) makes the whole metrics output unparseable by prometheus.

The prometheus text protocol requires that the label value to be escaped. Note that this is only needed for the text protocol and not for the protobuf-based prometheus protocol.

label_value can be any sequence of UTF-8 characters, but the backslash (), double-quote ("), and line feed (\n) characters have to be escaped as \, ", and \n, respectively.

From https://github.com/prometheus/docs/blob/main/content/docs/instrumenting/exposition_formats.md

This also adds a new test (prometheus_http_test.cc) to validate the output of the text-based prometheus response. I've introduced this because it was simpler to make explicit checks on the text output than modifying prometheus_test.py and metrics_tester.cc to do that. I am open to give a go at working this into prometheus_test.py if you prefer that.

Finally, we switch to using fmt::print to format the metric value and the metric type instead of the originally used std::to_string. std::to_string is going to change its formatting behaviour in C++26 and so we prefer a more deterministic formatter here. The formatting matters because if the floating point numbers are converted to scientific notation, they can lose precision.

As an added benefit, using fmt::print writes these values to the output stream directly which means that we avoid a string copy when writing these values to the prometheus text-based response.

tchaikov

just some nits, lgtm.

tchaikov · 2024-08-24T01:44:46Z

tests/unit/prometheus_http_test.cc

+ output.flush().get();
+ auto resp = input.read().get();
+ auto resp_str = std::string(resp.get(), resp.size());
+ BOOST_REQUIRE_NE(resp_str.find("200 OK"), std::string::npos);


could also use

using namespace std::literals; // ... BOOST_REQUIRE(std::ranges::search(resp, "200 OK"sv));

tchaikov · 2024-08-24T01:59:40Z

tests/unit/prometheus_http_test.cc

+
+ future<> client = seastar::async([&lsi] {
+ connected_socket c_socket = lsi.connect(socket_address(ipv4_addr()), socket_address(ipv4_addr())).get();
+ input_stream<char> input(c_socket.input());


nit, could use

auto close_input = deferred_close(input);

tchaikov · 2024-08-24T02:06:32Z

tests/unit/prometheus_http_test.cc

+
+ co_await seastar::async([] {
+ loopback_connection_factory lcf(1);
+ http_server server("test");


nit, could use

auto stop_server = deferred_stop(server);

I couldn't use this one because the stop() method of http_server is not noexcept so it doesn't satisfy the stoppable concept required for deferred_stop.

tchaikov · 2024-08-24T02:13:17Z

@amnonh hey Amnon, would you please take a look?

amnonh · 2024-08-24T15:21:03Z

@pgellert I like the idea and the optimization. It will be better to perform the label clean up on creation, so it can be done once. I would even convider throwing an exception in that case

Adds a test to verify that the metrics returned with the prometheus text format are formatted as expected.

The prometheus text protocol requires that the label value to be escaped. Note that this is only needed for the text protocol and not for the protobuf-based prometheus protocol. Without this sanitization, a single label with one of these special characters makes the whole metrics output unparseable by prometheus. > label_value can be any sequence of UTF-8 characters, but the backslash > (\), double-quote ("), and line feed (\n) characters have to be > escaped as \\, \", and \n, respectively. From https://github.com/prometheus/docs/blob/main/content/docs/instrumenting/exposition_formats.md

pgellert · 2024-08-27T16:55:00Z

Force-push: improve test code based on code review feedback

pgellert · 2024-08-27T16:56:23Z

@tchaikov @amnonh thank you for the reviews.

It will be better to perform the label clean up on creation, so it can be done once. I would even convider throwing an exception in that case

This one is trickier to implement than it first seems. The main complication is that the escaping must apply to the label values only when serialized with the text protocol, but not when serialized with protobuf. I wanted to add a new field _escaped_value to label_instance but it's not that simple, the label instances associated with a metric are converted into a labels_type and it's difficult to trace which label representations are used at what points on the text-based serialization path and the protobuf-based serialization path. I am also not sure the tests would catch if I made a mistake during the refactoring necessary to make that change, so I am tempted to keep the change simpler and hold off with this optimization. Let me know what you think.

amnonh · 2024-08-30T17:55:55Z

I disagree, protobuf and text should be the same, I would seriously concider throwing on non standard labels

…

On Tuesday, August 27, 2024, Gellért Peresztegi-Nagy < ***@***.***> wrote: @tchaikov <https://github.com/tchaikov> @amnonh <https://github.com/amnonh> thank you for the reviews. It will be better to perform the label clean up on creation, so it can be done once. I would even convider throwing an exception in that case This one is trickier to implement than it first seems. The main complication is that the escaping must apply to the label values only when serialized with the text protocol, but not when serialized with protobuf. I wanted to add a new field _escaped_value to label_instance but it's not that simple, the label instances associated with a metric are converted into a labels_type and it's difficult to trace which label representations are used at what points on the text-based serialization path and the protobuf-based serialization path. I am also not sure the tests would catch if I made a mistake during the refactoring necessary to make that change, so I am tempted to keep the change simpler and hold off with this optimization. Let me know what you think. — Reply to this email directly, view it on GitHub <#2400 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAQFDP5MOIAH6XF62GCF6RDZTSVU5AVCNFSM6AAAAABM72GT22VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMJTGA3TOMJQGA> . You are receiving this because you were mentioned.Message ID: ***@***.***>

mykaul · 2024-09-01T10:12:16Z

I think we should do it for both - we do not want to be in the situation where one decides to switch from one to another and then be surprised by the result.

avikivity · 2024-09-01T19:00:16Z

src/core/prometheus.cc

 switch (v.type()) {
 case seastar::metrics::impl::data_type::GAUGE:
 case seastar::metrics::impl::data_type::REAL_COUNTER:
- return std::to_string(v.d());
+ fmt::print(os, "{:.6f}", v.d());


I don't think .6f is a good format. For small values, it loses precision. For large values, (beyond 10^(std::numeric_limit::digits10-6)), it has precision without accuracy.

I expect '{g}' will work and provide maximum accuracy, but haven't tested it.

Should changing the value's precision be part of a patch that's supposedly about the labels?

I had in the past some bad experience (scylladb/scylladb#19081 (comment)) with how Seastar's metrics convert double-precision to 6 digits of precision in obscure places, causing surprising effects. We need to be very careful about this. Note that using "to_string()" is even worse, because nobody knows what it really does... If I remember correctly (but I can't find a reference now), we even had a case where in the past some to-string function used to print 15 digits of precision but one day when we changed the way we use fmt suddenly it because 6 digits.

Ideally, we'd want to have regression tests for these precision issues, so we won't be surprised if one day the precision of some metric format regresses.

P.S. @avikivity , actually the default precision of printf's %g is 6 digits, similar to %f - the "g" is about scientific notation for large exponents - not about precision. But I don't know if fmt inherited printf's idea of what "g" is or invented its own - it's worth checking.

This patch is already changing the precision, by changing to_string() to fmt::print("{:.6f").

If plain {g} doesn't work, then {:19g} (or whatever) can be substituted. Or even start with {} to maintain the status quo and improve it later.

@avikivity this is exactly my question - why does a patch for "sanitize label value for text protocol" need to change the formatting of numbers.

This patch is already changing the precision

My intention with this PR is to keep the existing formatting of floating point numbers, keeping this PR purely a refactoring when it comes to the formatting of the metric values. The goal is to avoid the coming changes to std::to_string. std::to_string effectively used {:.6f} for the formatting of doubles and this is what I intend to preserve with this change.

From https://en.cppreference.com/w/cpp/string/basic_string/to_string:

7,8) Converts a floating point value to a string as if by std::sprintf(buf, "%f", value). (until C++26)

Then from https://en.cppreference.com/w/cpp/io/c/fprintf:

f/F: Converts floating-point number to the decimal notation in the style [-]ddd.ddd.
Precision specifies the exact number of digits to appear after the decimal point character. The default precision is 6. In the alternative implementation decimal point character is written even if no digits follow it. For infinity and not-a-number conversion style see notes.

I'm happy to move this commit to another PR if you prefer.

I think keeping the metric formatting as the original std::to_string-like format ({:.6f}) would be best. An important edge case for which {:.6f} works but {g} or {} would be problematic is gauges that have large integer values (eg. a gauge with a unix timestamp as its value). Changing the formatting to scientific notation could make these integer values less accurate which would be confusing.

This patch is already changing the precision

My intention with this PR is to keep the existing formatting of floating point numbers, keeping this PR purely a refactoring when it comes to the formatting of the metric values. The goal is to avoid the coming changes to std::to_string. std::to_string effectively used {:.6f} for the formatting of doubles and this is what I intend to preserve with this change.

Absolutely the correct approach.

From https://en.cppreference.com/w/cpp/string/basic_string/to_string:

7,8) Converts a floating point value to a string as if by std::sprintf(buf, "%f", value). (until C++26)

Then from https://en.cppreference.com/w/cpp/io/c/fprintf:

f/F: Converts floating-point number to the decimal notation in the style [-]ddd.ddd.
Precision specifies the exact number of digits to appear after the decimal point character. The default precision is 6. In the alternative implementation decimal point character is written even if no digits follow it. For infinity and not-a-number conversion style see notes.

I'm happy to move this commit to another PR if you prefer.

I think keeping the metric formatting as the original std::to_string-like format ({:.6f}) would be best. An important edge case for which {:.6f} works but {g} or {} would be problematic is gauges that have large integer values (eg. a gauge with a unix timestamp as its value). Changing the formatting to scientific notation could make these integer values less accurate which would be confusing.

Let's plan a follow-up to improve floating-point formatting. .6f is pretty bad. It's okay for a gauge, but as soon as you take a derivative of a small slowly-changing metric, you get poor results.

avikivity · 2024-09-01T19:00:51Z

src/core/prometheus.cc

 case seastar::metrics::impl::data_type::HISTOGRAM:
 case seastar::metrics::impl::data_type::SUMMARY:
 break;
 }
- return ""; // we should never get here but it makes the compiler happy
+ return os; // we should never get here but it makes the compiler happy


Comment is now incorrect.

pgellert · 2024-09-02T18:57:54Z

@amnonh @mykaul:

I disagree, protobuf and text should be the same, I would seriously concider throwing on non standard labels
Protobuf and text should be the same

Could you please elaborate in what sense they should be the same? Just to clarify, if a user of the library intends to create a label value with the string X, prometheus expects this label value to be serialised as escaped(X) on the text protocol whereas it expects it to be serialized as the string field set to X for protobuf. These are the rules of the two independent serialisation protocols. If we set the protobuf field to escaped(X), prometheus would think that the user set this field to escaped(X) and not X. So to show the user-intended label value on the prometheus UI and other metrics frontends the label value needs to be correctly serialized according to the protocol, escaped on the text output but not escaped on the protobuf output.

Prometheus supports label values with special characters so I don't see why seastar should restrict the range of label values that it supports.

This avoids a string copy when writing the metric value and the metric type to the prometheus text-based response. Note that now `fmt::print` is used to format the values instead of the originally used `std::to_string`. This is because `std::to_string` is going to change its formatting behaviour in C++26, so we prefer to use a more deterministic formatter here. The formatting of the numbers matters because if the floating point numbers are converted to scientific notation, they can lose precision. Ref: https://en.cppreference.com/w/cpp/string/basic_string/to_string

pgellert · 2024-09-02T19:01:06Z

Force-push: remove outdated comment

mykaul · 2024-09-03T07:58:04Z

@amnonh @mykaul:

I disagree, protobuf and text should be the same, I would seriously concider throwing on non standard labels
Protobuf and text should be the same

Could you please elaborate in what sense they should be the same? Just to clarify, if a user of the library intends to create a label value with the string X, prometheus expects this label value to be serialised as escaped(X) on the text protocol whereas it expects it to be serialized as the string field set to X for protobuf. These are the rules of the two independent serialisation protocols. If we set the protobuf field to escaped(X), prometheus would think that the user set this field to escaped(X) and not X. So to show the user-intended label value on the prometheus UI and other metrics frontends the label value needs to be correctly serialized according to the protocol, escaped on the text output but not escaped on the protobuf output.

Prometheus supports label values with special characters so I don't see why seastar should restrict the range of label values that it supports.

Since users are free to switch between the protocols, we do not want the end user experience to be different when changing protocol. Unless I misunderstand the end result - would the user see the same label value eventually (say, in a query in Grafana) when switching from native to text?

I'll also add that I think protobuf is regretfully still not GA (I think?).

pgellert · 2024-09-03T08:27:37Z

would the user see the same label value eventually (say, in a query in Grafana) when switching from native to text?

Yes, they would see the same label value with this PR when they switch between native and text. Without this PR, they would see different results or be unable to switch from protobuf to text if they had label values with special characters (backslash \, double-quote ", or line feed \n).

amnonh · 2024-09-04T05:19:12Z

@amnonh @mykaul:

Prometheus supports label values with special characters so I don't see why seastar should restrict the range of label values that it supports.

Label and their values are static on creation, now, we'll need to call escape each time we send them, it's a shame. Was there an actual demand for special characters?

pgellert · 2024-09-05T16:53:41Z

I've changed the label escaping now to only happen once, on creation. Let me know if the changes look good, and if the changes in the header files are within the source compatibility requirements of seastar.

Was there an actual demand for special characters?

Yes, in redpanda we have labels that could contain these special characters (eg. consumer group names).

avikivity · 2024-09-05T17:27:58Z

include/seastar/core/metrics_api.hh

@@ -41,7 +41,7 @@ namespace seastar {
 namespace metrics {
 namespace impl {

-using labels_type = std::map<sstring, sstring>;
+using labels_type = std::map<sstring, label_value>;


I believe that's exposed as an API and will break scylladb unit tests that verify metrics.

I attempted to avoid changing this in place now: #2400 (comment)

pgellert · 2024-09-10T11:48:07Z

I've pushed some more changes to try to keep the type alias labels_type around in its original form.

Note that there are still API-incompatible changes in the last commit, namely:

The public field metric_definition_impl::labels changes its type. I intentionally didn't duplicate this field because it is public and mutable. I would rather have users fail to compile their code than silently break their code if the user only modifies metric_definition_impl::labels but doesn't modify a new field metric_definition_impl::full_labels.
The same reasoning applies for metric_id::labels() which exposes metric_id::_labels and allows its mutation externally.

@amnonh @avikivity could you please re-review and advise on whether the optimization in the last commit is worth doing or if I should drop it since the blast radius of the change is quite large and there are still API-breaking changes in it? I would like to get this PR in if possible.

avikivity · 2024-09-11T10:42:04Z

@pgellert let's split the last patch off (I haven't reviewed it yet). The fixes are more important and can be queued immediately, we can consider the optimization after that.

pgellert · 2024-09-11T10:50:24Z

Force-push: dropped the last commit based on @avikivity's feedback above

tchaikov approved these changes Aug 24, 2024

View reviewed changes

pgellert added 2 commits August 27, 2024 14:31

tests: unit test prometheus wire format

d308aba

Adds a test to verify that the metrics returned with the prometheus text format are formatted as expected.

pgellert force-pushed the prometheus-label-value branch from d156991 to 6bf5958 Compare August 27, 2024 16:48

pgellert requested a review from tchaikov August 27, 2024 16:59

xemul requested a review from amnonh August 30, 2024 12:20

avikivity reviewed Sep 1, 2024

View reviewed changes

pgellert force-pushed the prometheus-label-value branch from 6bf5958 to 6e2f629 Compare September 2, 2024 19:00

pgellert requested review from avikivity and nyh September 5, 2024 16:53

avikivity reviewed Sep 5, 2024

View reviewed changes

pgellert force-pushed the prometheus-label-value branch from 0561dac to 3ed894f Compare September 10, 2024 11:47

pgellert requested a review from avikivity September 10, 2024 11:50

pgellert force-pushed the prometheus-label-value branch from 3ed894f to 6e2f629 Compare September 11, 2024 10:48

avikivity closed this in 20cf440 Sep 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

prometheus: sanitize label value for text protocol #2400

prometheus: sanitize label value for text protocol #2400

pgellert commented Aug 23, 2024

tchaikov left a comment

tchaikov Aug 24, 2024

pgellert Aug 27, 2024

tchaikov Aug 24, 2024

pgellert Aug 27, 2024

tchaikov Aug 24, 2024

pgellert Aug 27, 2024

tchaikov commented Aug 24, 2024

amnonh commented Aug 24, 2024

pgellert commented Aug 27, 2024

pgellert commented Aug 27, 2024

amnonh commented Aug 30, 2024 via email

mykaul commented Sep 1, 2024

avikivity Sep 1, 2024

nyh Sep 1, 2024

avikivity Sep 1, 2024

nyh Sep 2, 2024

pgellert Sep 2, 2024

avikivity Sep 5, 2024

avikivity Sep 1, 2024

pgellert Sep 2, 2024

pgellert commented Sep 2, 2024 •

edited

Loading

pgellert commented Sep 2, 2024

mykaul commented Sep 3, 2024 •

edited

Loading

pgellert commented Sep 3, 2024

amnonh commented Sep 4, 2024

pgellert commented Sep 5, 2024

avikivity Sep 5, 2024

pgellert Sep 10, 2024

pgellert commented Sep 10, 2024 •

edited

Loading

avikivity commented Sep 11, 2024

pgellert commented Sep 11, 2024

prometheus: sanitize label value for text protocol #2400

prometheus: sanitize label value for text protocol #2400

Conversation

pgellert commented Aug 23, 2024

tchaikov left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tchaikov commented Aug 24, 2024

amnonh commented Aug 24, 2024

pgellert commented Aug 27, 2024

pgellert commented Aug 27, 2024

amnonh commented Aug 30, 2024 via email

mykaul commented Sep 1, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pgellert commented Sep 2, 2024 • edited Loading

pgellert commented Sep 2, 2024

mykaul commented Sep 3, 2024 • edited Loading

pgellert commented Sep 3, 2024

amnonh commented Sep 4, 2024

pgellert commented Sep 5, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pgellert commented Sep 10, 2024 • edited Loading

avikivity commented Sep 11, 2024

pgellert commented Sep 11, 2024

pgellert commented Sep 2, 2024 •

edited

Loading

mykaul commented Sep 3, 2024 •

edited

Loading

pgellert commented Sep 10, 2024 •

edited

Loading