
Support all patterns for Spark CAST(varchar as timestamp) #11114

Open
wants to merge 1 commit into base: main
Conversation

@liujiayi771 commented Sep 27, 2024

Velox supports casting varchar to timestamp, but Spark supports more varchar to timestamp conversion patterns. This PR completes the conversion patterns needed by Spark.

Allows all patterns supported by Spark:
[+-]yyyy*
[+-]yyyy*-[m]m
[+-]yyyy*-[m]m-[d]d
"[+-]yyyy*-[m]m-[d]d " (with a trailing space)
[+-]yyyy*-[m]m-[d]d [h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]
[+-]yyyy*-[m]m-[d]dT[h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]
[h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]
T[h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]
where zone_id should have one of the forms:
- Z - Zulu time zone UTC+0
- +|-[h]h:[m]m
- an id with one of the prefixes UTC+, UTC-, GMT+, GMT-, UT+ or UT-, and a suffix in the formats h[h], hh[:]mm, hh:mm:ss or hhmmss
- a region-based zone id, such as America/Los_Angeles

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 27, 2024
@liujiayi771 (Contributor, Author):

Previous PR: #9388.


netlify bot commented Sep 27, 2024

Deploy Preview for meta-velox canceled.
- Latest commit: 4b60b3a
- Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/672772ad7238cd0008ffd5ba

@liujiayi771 (Contributor, Author):

@pedroerp @rui-mo Could you help review?

@liujiayi771 (Contributor, Author):

Resolves #11028.

@pedroerp pedroerp requested a review from rui-mo October 1, 2024 03:16

namespace facebook::velox::functions::sparksql {

SparkCastHooks::SparkCastHooks(const core::QueryConfig& config) : CastHooks() {
const auto sessionTzName = config.sessionTimezone();
if (config.adjustTimestampToTimezone() && !sessionTzName.empty()) {
Contributor:

Heads up that adjustTimestampToTimezone() was added to be compatible with Presto's legacy_timestamp mode, which basically means that if a timestamp has no associated time zone, it is assumed to be in the user's local time zone.

I'm not sure if this is the same behavior in Spark.

Contributor Author:

We always set adjustTimestampToTimezone = true in Gluten, which is consistent with Spark's behavior.

Contributor:

Makes sense, thanks for clarifying.

Contributor:

@liujiayi771, maybe we can remove the check for config.adjustTimestampToTimezone(), since this implementation is only for Spark, without any compatibility consideration for Presto.

velox/functions/sparksql/specialforms/SparkCastHooks.cpp (outdated review thread, resolved)
auto sign = timeZoneName[start];
auto offset = timeZoneName.substr(start + 1);
std::string hm, h, m, s;
RE2 hhPattern(R"((\d?\d))");
Contributor:

How hard would it be to implement the parsing logic without using re2, like we do for other timestamp conversions/parsing? I'm slightly worried about performance, but also about adding re2 as a dependency to basically everything else in the library.

Contributor Author:

The primary purpose here is to recognize these four suffix shapes:

  • h[h]
  • hh[:]mm
  • hh:mm:ss
  • hhmmss

Using re2 makes the code more concise, but for performance reasons I will try to implement the logic here without it.
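The four suffix shapes above can be parsed with plain string inspection instead of re2. A minimal sketch (`parseOffsetSuffix` is a hypothetical helper for illustration, not the PR's actual code; the sign and the UTC/GMT/UT prefix are assumed to be stripped by the caller):

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <string_view>

// Parse the numeric suffix of a Spark timezone id such as "UTC+8",
// "GMT+08:30", or "UT+083000" after prefix and sign are removed.
// Accepted shapes: h[h], hh[:]mm, hh:mm:ss, hhmmss.
// Returns the offset in seconds, or std::nullopt if unrecognized.
std::optional<int> parseOffsetSuffix(std::string_view s) {
  auto digits = [](std::string_view v) {
    if (v.empty()) return false;
    for (char c : v) {
      if (!std::isdigit(static_cast<unsigned char>(c))) return false;
    }
    return true;
  };
  auto toInt = [](std::string_view v) { return std::stoi(std::string(v)); };
  int h = 0, m = 0, sec = 0;
  switch (s.size()) {
    case 1: // h
    case 2: // hh
      if (!digits(s)) return std::nullopt;
      h = toInt(s);
      break;
    case 4: // hhmm
      if (!digits(s)) return std::nullopt;
      h = toInt(s.substr(0, 2));
      m = toInt(s.substr(2, 2));
      break;
    case 5: // hh:mm
      if (s[2] != ':' || !digits(s.substr(0, 2)) || !digits(s.substr(3))) {
        return std::nullopt;
      }
      h = toInt(s.substr(0, 2));
      m = toInt(s.substr(3, 2));
      break;
    case 6: // hhmmss
      if (!digits(s)) return std::nullopt;
      h = toInt(s.substr(0, 2));
      m = toInt(s.substr(2, 2));
      sec = toInt(s.substr(4, 2));
      break;
    case 8: // hh:mm:ss
      if (s[2] != ':' || s[5] != ':' || !digits(s.substr(0, 2)) ||
          !digits(s.substr(3, 2)) || !digits(s.substr(6, 2))) {
        return std::nullopt;
      }
      h = toInt(s.substr(0, 2));
      m = toInt(s.substr(3, 2));
      sec = toInt(s.substr(6, 2));
      break;
    default:
      return std::nullopt;
  }
  // The 18-hour cap mirrors the usual java.time ZoneOffset bound (assumption).
  if (h > 18 || m > 59 || sec > 59) return std::nullopt;
  return h * 3600 + m * 60 + sec;
}
```

A length-based switch like this avoids both regex cost and the re2 dependency while covering the same inputs as the patterns above.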


bool normalizeSparkTimezone(std::string& timeZoneName, Timestamp& ts) {
// Spark supports an id with one of the prefixes UTC+, UTC-, GMT+, GMT-, UT+
// or UT-, and a suffix in the formats: h[h], hh[:]mm, hh:mm:ss, hhmmss.
Contributor:

Does Spark accept timezone offsets containing seconds? AFAIK, IANA's definition is only (+/-)00:00 (hours and minutes). There are no use cases for timezone conversions with second granularity.

Contributor Author (@liujiayi771, Oct 1, 2024):

Yes, Spark's timezone supports a special format that includes a seconds offset, such as UTC+hh:mm:ss. Therefore, I need to remove the seconds offset from the timezone, and add it to the Timestamp result.

Contributor:

That is odd. What does Spark do if you have seconds on that definition? Does it add the seconds to the timezone conversion process?

Contributor Author:

Yes, Spark will add the seconds to the timezone conversion process. For example:

spark-sql (default)> select cast('2015-03-07 19:33:33 UTC-00:00:01' as timestamp);
2015-03-08 03:33:34
Time taken: 0.044 seconds, Fetched 1 row(s)
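The arithmetic behind this example can be sketched as plain epoch-second math (illustrative only, not Velox code; the session timezone is assumed to be UTC+8, which matches the output shown):

```cpp
#include <cassert>
#include <cstdint>

// A wall-clock value interpreted in a zone with offset `zoneOffsetSec`
// corresponds to the UTC instant wallEpochInZone - zoneOffsetSec; rendering
// it in the session zone then adds the session offset.
int64_t toSessionWallClock(
    int64_t wallEpochInZone,
    int64_t zoneOffsetSec,
    int64_t sessionOffsetSec) {
  int64_t utc = wallEpochInZone - zoneOffsetSec; // normalize to UTC
  return utc + sessionOffsetSec;                 // render in session zone
}
```

For the example, 19:33:33 is 70413 seconds into the day; with a zone offset of -1 second and a +8h session zone, the result is 70413 - (-1) + 28800 = 99214 seconds, i.e. 03:33:34 on the next day, matching Spark's output.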

return false;
}
}
RE2 singleHourTz(R"((\+|\-)(\d):)");
Contributor:

We have similar logic to parse timezone offsets (fully or partially defined). Could we reuse it here instead of relying on re2?

Contributor Author:

In this implementation I followed the handling in Spark's source code, where Spark uses regex matching to pad the hours and minutes to two digits; that is why I used re2. I will try to reuse the existing code, and I will also attempt to modify the other parts that use re2.

Contributor Author:

@pedroerp I checked the TimeZoneMap::normalizeTimeZoneOffset method, and its normalization rules are inconsistent with Spark's in this context. Spark corrects timezone offsets like +2:3, +02:3, or +2:03 to +02:03. I will port Spark's normalization rule so that re2 is not needed.
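The padding rule described here can be sketched without re2 (`padOffset` is a hypothetical helper for illustration, not the PR's actual code):

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>

// Normalize a [+-]h[h]:m[m] offset to [+-]hh:mm by zero-padding each field,
// e.g. "+2:3" -> "+02:03". Returns "" on malformed input.
std::string padOffset(std::string_view in) {
  if (in.size() < 4 || (in[0] != '+' && in[0] != '-')) return "";
  auto colon = in.find(':');
  if (colon == std::string_view::npos) return "";
  std::string_view hh = in.substr(1, colon - 1);
  std::string_view mm = in.substr(colon + 1);
  auto isNum = [](std::string_view v) {
    if (v.empty() || v.size() > 2) return false;
    for (char c : v) {
      if (!std::isdigit(static_cast<unsigned char>(c))) return false;
    }
    return true;
  };
  if (!isNum(hh) || !isNum(mm)) return "";
  std::string out(1, in[0]);
  if (hh.size() == 1) out += '0'; // pad single-digit hour
  out.append(hh);
  out += ':';
  if (mm.size() == 1) out += '0'; // pad single-digit minute
  out.append(mm);
  return out;
}
```

The same shape-then-pad approach extends naturally to the hh:mm:ss variants discussed above.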

Collaborator @rui-mo left a comment:

Thanks!

Expected<Timestamp> SparkCastHooks::castStringToTimestamp(
const StringView& view) const {
return util::fromTimestampString(
// Allows all patterns supported by Spark:
// `[+-]yyyy*` `[+-]yyyy*-[m]m` `[+-]yyyy*-[m]m-[d]d` `[+-]yyyy*-[m]m-[d]d `
Collaborator:

Nit: perhaps one pattern per line for readability.

Is there an extra white-space after the last pattern `[+-]yyyy*-[m]m-[d]d `?

Contributor Author (@liujiayi771, Oct 14, 2024):

Is there an extra white-space after the last pattern [+-]yyyy*-[m]m-[d]d ?

Yes, this is one of the patterns.

// `T[h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]`
//
// where `zone_id` should have one of the forms:
// - Z - Zulu time zone UTC+0
Collaborator:

Perhaps use 1) 2) ... 5) for the forms to distinguish them from those on L47-L50.

velox/functions/sparksql/specialforms/SparkCastHooks.cpp (outdated review thread, resolved)
velox/type/TimestampConversion.cpp (2 outdated review threads, resolved)
// Additionally, IANA does not support seconds in the offset, so this func will
// add the seconds offset to the input Timestamp.
bool parseSparkTzWithPrefix(
std::string& timeZoneName,
Collaborator:

Can we use 'std::string_view' as parameter type to avoid creating intermediate strings?

Contributor Author:

This function returns bool to indicate whether an error has occurred. The parameter is std::string& because we also need to modify the content of timeZoneName.

size_t start) {
VELOX_DCHECK_GE(timeZoneName.length(), start + 2);
auto sign = timeZoneName[start];
auto offset = timeZoneName.substr(start + 1);
Collaborator:

Ditto. Can we use 'std::string_view'?

Contributor Author:

The code below concatenates offset with other strings, which always requires a copy.

Collaborator:

I find the 'concat' only happens on L846. Can we handle it specially there and use string_view elsewhere? I assume the 'substr' could then be avoided.

Contributor Author:

@rui-mo I added an appendZeroMinutes variable to handle this special case in the if branch.

@@ -865,9 +995,15 @@ fromTimestampWithTimezoneString(
timezonePos++;
}

-  std::string_view timeZoneName(str + pos, timezonePos - pos);
+  std::string timeZoneName(str + pos, timezonePos - pos);
+  std::string normalizeTzName = timeZoneName;
Collaborator:

Ditto. Can we use std::string_view?

Contributor Author:

We can use std::string_view for timeZoneName, but normalizeTzName needs to be std::string, because we change its content in normalizeSparkTimezone.

velox/type/TimestampConversion.cpp (5 outdated review threads, resolved)

-  if ((timeZone = tz::locateZone(timeZoneName, false)) == nullptr) {
+  if ((timeZone = tz::locateZone(normalizeTzName, false)) == nullptr) {
return folly::makeUnexpected(
Status::UserError("Unknown timezone value: \"{}\"", timeZoneName));
Collaborator:

nit: use 'normalizeTzName' in the error msg.

Contributor Author:

I deliberately used timeZoneName in the msg because timeZoneName is the original timezone string, while normalizeTzName has been modified. I want the user to see in the error msg that the original timezone is problematic.

@liujiayi771 liujiayi771 force-pushed the spark-varchar-cast-ts branch from 2042f61 to 8b3a6ac Compare October 24, 2024 00:58
@@ -117,6 +117,20 @@ inline bool characterIsDigit(char c) {
return c >= '0' && c <= '9';
}

inline bool
allCharactersIsDigit(std::string_view str, size_t start, size_t end) {
Collaborator:

Perhaps add a bound check for start and end.

  if (start > end || end >= str.size()) {
    return false;
  }

Contributor:

A minor suggestion: allCharactersIsDigit(std::string_view str) seems sufficient; we can just pass str.substr(...) at the caller side.
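The suggested simplification could look like this (a sketch of the reviewer's proposal; treating an empty view as non-digit is an assumption):

```cpp
#include <cassert>
#include <string_view>

// Returns true if the view is non-empty and every character is an ASCII digit.
// Callers pass str.substr(...) instead of start/end bounds.
inline bool allCharactersIsDigit(std::string_view str) {
  if (str.empty()) return false;
  for (char c : str) {
    if (!(c >= '0' && c <= '9')) return false;
  }
  return true;
}
```

This moves the bound bookkeeping to the caller, where substr already clamps the range against the view's size.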

velox/type/TimestampConversion.cpp (2 outdated review threads, resolved)
velox/type/tz/TimeZoneMap.cpp (outdated review thread, resolved)
TJavaShortIdIndex lower;
lower.reserve(javaShortIdMap.size());
for (auto& it : javaShortIdMap) {
lower[boost::algorithm::to_lower_copy(it.first)] =
Collaborator:

There is a 'sanitizeName' method to convert a name to lower case.

Contributor Author:

Including FunctionSignature.h here seems a bit odd; I think it's better to stick with Boost to maintain consistency with the handling logic of TimeZoneIndex.

@rui-mo rui-mo changed the title Support all cast patterns from varchar to timestamp for spark Support all patterns for Spark CAST(varchar as timestamp) Oct 28, 2024
Contributor @PHILO-HE left a comment:

I think it may be better to move the Spark timezone normalization impl. somewhere dedicated to handling timezones, e.g. TimeZoneMap.cpp, like https://github.com/facebookincubator/velox/blob/main/velox/type/tz/TimeZoneMap.cpp#L180.

Not sure whether most of the code can be reused by Presto. Anyway, we can call these util functions only for Spark if they are not suitable for Presto.



// Spark supports a timezone id with one of the prefixes UTC+, UTC-, GMT+,
// GMT-, UT+ or UT-, and a suffix in the following formats: h[h], hh[:]mm,
// hh:mm:ss, hhmmss.
if (startsWith(timeZoneName, "UTC") || startsWith(timeZoneName, "GMT")) {
Contributor:

So "Etc/GMT" will not be handled here and will be located directly by the external date lib, right?

Contributor Author (@liujiayi771, Oct 30, 2024):

"Etc/GMT" will be handled in TimeZoneMap::normalizeTimeZone. This code only handles some tz ids like "GMT+8", "UTC-07:06". As you mentioned earlier, Spark normalization can be added to TimeZoneMap, but these normalizations are specific to Spark. It might be necessary to differentiate within TimeZoneMap whether it's for Spark.

@liujiayi771 (Contributor, Author) commented Oct 30, 2024:

I think it may be better to move the Spark timezone normalization impl. into some other places dedicated for handling timezone.

@PHILO-HE TimestampConversion is used for Timestamp conversion and normalization; it can be used by both Presto and Spark. However, some common normalizations are already handled in TimeZoneMap. Perhaps we can add Spark-specific normalizations in TimeZoneMap, but that would require passing TimestampParseMode information.

@liujiayi771 liujiayi771 force-pushed the spark-varchar-cast-ts branch from 8b3a6ac to 80e58d1 Compare October 31, 2024 06:59
Contributor @PHILO-HE left a comment:

@liujiayi771, this PR only considers normalization for a timezone embedded in the input timestamp string, right? I remember Spark prioritizes the embedded timezone if it is specified in the input string; if it is not, the timezone passed from the config is used. Does the configured timezone need to be normalized similarly in the conversion path for strings without an embedded timezone?


velox/type/TimestampConversion.cpp (outdated review thread, resolved)
if (ts.getSeconds() - sec < Timestamp::kMinSeconds) {
return false;
}
ts = Timestamp(ts.getSeconds() - sec, ts.getNanos());
Contributor:

There is a uniform adjustment by ts.toGMT(timezone) based on the returned timestamp and timezone from fromTimestampWithTimezoneString.

Can we somehow let ts.toGMT(timezone) do your proposed adjustment based on sec?

Contributor Author:

@PHILO-HE The goal here is to obtain a timezone id string that can be handled by tz::locateZone, and this method does not support seconds. The offset with seconds is added to the input Timestamp in advance so that subsequent processing can ignore the seconds. Ultimately, in SparkCastHooks::castStringToTimestamp, it is converted to the correct timezone using result.first.toGMT(*result.second). The toGMT method cannot be used here because it cannot handle seconds in this way.

Contributor:

@liujiayi771, I get your point. I mean, can we extract the adjustment based on sec and somehow make ts.toGMT include it? Then here we would only do timezone normalization. This decoupling may be helpful for supporting normalization of the timezone from the config.

Another question: do we really need to support timezones with a second-level offset? Such timezones are very rarely used in the real world; almost all timezones have an hour, half-hour, or quarter-hour offset. If there is no real request, maybe we can just ignore such timezones in the implementation, even though Spark supports them.

Contributor Author (@liujiayi771, Nov 1, 2024):

@PHILO-HE I see, I can extract this logic into a method in Timestamp.

do we really need to support timezone with second-level offset?

I'm not very clear on this myself; I just adapted it according to what Spark supports. The code for supporting seconds is not extensive, but it might be somewhat hard to understand.

Contributor:

@liujiayi771, then I recommend ignoring timezones with a second-level offset. You can ask for others' feedback in the community. Thanks!

Contributor Author:

Hi @zhouyuan @rui-mo. What are your suggestions on whether to support seconds?

Collaborator:

+|-h[h]:mm:ss
+|-hhmmss

@liujiayi771 I notice you mentioned the above two kinds of suffixes for the timezone id, and I want to know where they come from. Does Spark mention such a requirement? Thanks.

Contributor Author:

@rui-mo They come from the comments in Spark's code:
https://github.com/apache/spark/blob/d3d84e045cc484cf7b70d36410a554238d7aef0e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L288-L289
+|-h:mm:ss is not present in the comments, but I have tested that this pattern works.

@liujiayi771 (Contributor, Author):

@liujiayi771, this pr only considers the normalization for timezone embedded in the input timestamp string, right? I remember Spark prioritize the using of this embedded timezone if it is specified in the input string. If it is not, the timezone passed from config is used. Does the configured timezone need to be normalized similarly in the conversion path for string without timezone embedded?

Yes, you are right, the timezone id in the config may also need to be normalized. For example, a user may set the "GMT+8" timezone in the config, which is supported by Velox. I think this scenario might occur in other Timestamp-related functions and casts as well. The normalization of the timezone in the config should be placed in a more centralized location so that it applies to all these scenarios. Perhaps the conversion can be done when handling the timezone config in the Gluten cpp code? This could be managed within the Gluten code by utilizing the utility method added in this PR. What do you think? I can continue working on this modification.

@PHILO-HE (Contributor) commented Nov 1, 2024:


@liujiayi771, I feel it may be better to put the normalization on the Velox side. Can we just extend the existing normalizeTimeZone in TimeZoneMap.cpp with your normalization logic? That is, add more if-checks for your considered timezone patterns in this function, then do normalization accordingly. Though this function is also called for Presto, I guess the extended handling may not have any bad impact on Presto.

@liujiayi771 (Contributor, Author) commented Nov 1, 2024:

Can we just extend the existing normalizeTimeZone in TimeZoneMap.cpp with your normalization logic?

I think we still need to differentiate between Presto and Spark. For example, GMT+8 may throw an exception or return NULL in Presto. However, if we add Spark's normalization logic, we would get the correct timezone, which would lead to inconsistency with Presto's results.

It is currently unclear whether applying Spark's normalization to Presto will affect Presto. It seems a bit strange to place Spark-specific normalization into the TimeZoneMap, as the tz::locateZone method would need to accept a TimestampParseMode. If these normalizations are also applicable to Presto, then placing them in the TimeZoneMap as shared functionality between Presto and Spark would be acceptable.

@liujiayi771 liujiayi771 force-pushed the spark-varchar-cast-ts branch from 7bf695d to 4b60b3a Compare November 3, 2024 12:55
@@ -71,6 +71,19 @@ void Timestamp::toGMT(const tz::TimeZone& zone) {
seconds_ = sysSeconds.count();
}

void Timestamp::toGMT(const tz::TimeZone& zone, int32_t secondsOffset) {
Contributor:

The idea is that if you want to do a timezone offset conversion, you should use this TimeZone constructor and let the TimeZone object handle the conversion internally:

https://github.com/facebookincubator/velox/blob/main/velox/type/tz/TimeZoneMap.h#L98

Looks like to support this case we should first change the TimeZone class to support std::chrono::second instead of std::chrono::minute, then do the appropriate plumbing. If you change it there, you probably won't need to make any changes to the Timestamp class.
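As an illustration only (this is not Velox's actual TimeZone class, whose offset constructor is linked above), widening the offset representation from std::chrono::minutes to std::chrono::seconds might look like:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical offset-based zone that stores std::chrono::seconds instead of
// std::chrono::minutes, so second-granularity ids like "UTC-00:00:01" can be
// represented without touching the Timestamp class.
class OffsetZone {
 public:
  explicit OffsetZone(std::chrono::seconds offset) : offset_(offset) {}

  // Convert a wall-clock epoch value in this zone to a UTC epoch value.
  int64_t toGMT(int64_t wallEpochSeconds) const {
    return wallEpochSeconds - offset_.count();
  }

 private:
  std::chrono::seconds offset_;
};
```

With the offset encapsulated here, callers would not need a separate secondsOffset parameter on Timestamp::toGMT.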

Contributor (@PHILO-HE, Nov 6, 2024):

@pedroerp, yes, if we truly need to support seconds offset, it's better to change TimeZone class to use seconds for offset time representation. My question is, do we really need to support that kind of timezone? As far as I know, there is no timezone with seconds offset in the real world.

Contributor Author (@liujiayi771, Nov 6, 2024):

Hi @pedroerp. Thank you for your suggestion. I also @-mentioned you in #11411 to ask for your thoughts on making the timezone formats supported by Velox the union of those of Presto and Spark. What do you think? Velox supporting more timezone formats shouldn't affect users, and I can move Spark's normalization code into the TimeZoneMap.

Contributor:

My question is, do we really need to support that kind of timezone? As far as I know, there is no timezone with seconds offset in the real world.

@PHILO-HE I asked the same question at some point, but it was pointed out that this is something (somehow) supported by Spark. So if we absolutely need to support it, it would probably be better to do so within the TimeZone class, so at least it is a bit more encapsulated.

Contributor:

@liujiayi771 let me get hold of @kevinwilfong, who has recently changed that code and has more context on how it is organized now.

Labels: CLA Signed
5 participants