Skip to content

Commit

Permalink
fix: Handle escaped separators in Presto url_extract_parameter functi…
Browse files Browse the repository at this point in the history
…on (#11540)

Summary:
Pull Request resolved: #11540

Presto Java supports the case where a separator ('=' or '&') is percent escaped in the query portion
of the URL in url_extract_parameter.

Velox currently only unescapes the value of the parameter once it's been found, so when a separator
is escaped it breaks parsing the parameters in the query.

This change fixes it so Velox's url_extract_parameter unescapes the query portion of the URL before searching for the parameter in it.

Reviewed By: xiaoxmeng

Differential Revision: D65967602

fbshipit-source-id: bbbdfbd40d5598a71e0adde5a3bb8d8a725956fe
  • Loading branch information
Kevin Wilfong authored and facebook-github-bot committed Nov 22, 2024
1 parent d06bdfa commit bfaa9a3
Show file tree
Hide file tree
Showing 2 changed files with 29 additions and 6 deletions.
17 changes: 11 additions & 6 deletions velox/functions/prestosql/URLFunctions.h
Original file line number Diff line number Diff line change
Expand Up @@ -146,7 +146,7 @@ FOLLY_ALWAYS_INLINE void urlUnescape(
TOutString& output,
const TInString& input) {
auto inputSize = input.size();
output.reserve(inputSize);
output.resize(inputSize);

auto outputBuffer = output.data();
const char* p = input.data();
Expand Down Expand Up @@ -361,23 +361,28 @@ struct UrlExtractParameterFunction {
static const boost::regex kQueryParamRegex(
"(^|&)" // start of query or start of parameter "&"
"([^=&]*)=?" // parameter name and "=" if value is expected
"([^=&]*)" // parameter value
"([^&]*)" // parameter value (allows "=" to appear)
"(?=(&|$))" // forward reference, next should be end of query or
// start of next parameter
);

StringView query = uri.query;
std::string unescapedQuery;
if (uri.queryHasEncoded) {
detail::urlUnescape(unescapedQuery, uri.query);
query = StringView(unescapedQuery);
}

const boost::cregex_iterator begin(
uri.query.data(),
uri.query.data() + uri.query.size(),
kQueryParamRegex);
query.data(), query.data() + query.size(), kQueryParamRegex);
boost::cregex_iterator end;

for (auto it = begin; it != end; ++it) {
if (it->length(2) != 0 && (*it)[2].matched) { // key shouldnt be empty.
auto key = detail::submatch((*it), 2);
if (param.compare(key) == 0) {
auto value = detail::submatch((*it), 3);
detail::urlUnescape(result, value);
result.copy_from(value);
return true;
}
}
Expand Down
18 changes: 18 additions & 0 deletions velox/functions/prestosql/tests/URLFunctionsTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -531,6 +531,24 @@ TEST_F(URLFunctionsTest, extractParameter) {
"http://example.com/path1/p.php?k1=v1&k2=v2&k3&k4#Ref1", "k6"),
std::nullopt);
EXPECT_EQ(extractParam("foo", ""), std::nullopt);
EXPECT_EQ(
"",
extractParam(
"http://example.com/path1/p.php?k1=v1&k2=v2&k3%26k4=v4", "k3"));
EXPECT_EQ(
"v3",
extractParam(
"http://example.com/path1/p.php?k1=v1&k2=v2&k3=v3%26k4=v4", "k3"));
EXPECT_EQ(
"v3",
extractParam(
"http://example.com/path1/p.php?k1=v1&k2=v2&k3%3Dv3%26k4=v4", "k3"));
// Test "=" inside a parameter value.
EXPECT_EQ(
"v3.1=v3.2",
extractParam(
"http://example.com/path1/p.php?k1=v1&k2=v2&k3%3Dv3.1%3Dv3.2%26k4=v4",
"k3"));
}

TEST_F(URLFunctionsTest, urlEncode) {
Expand Down

0 comments on commit bfaa9a3

Please sign in to comment.