Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Demonstrate issue with FiniteDateTimeRangeField #169

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

tdcdev
Copy link

@tdcdev tdcdev commented Aug 13, 2024

This commit shows the existence of an edge case introduced by this change: 7558f71

If we store the last hour of the DST period we won't be able to convert it to a proper FiniteDatetimeRange. Because we set clocks back by one hour then start == end.

This commit shows the existence of an edge case introduced by this
change: 7558f71

If we store the last hour of the DST period we won't be able to convert
it to a proper `FiniteDatetimeRange`. Because we set clocks back by one
hour then start == end.
@tdcdev tdcdev requested a review from jarshwah August 13, 2024 13:15
@tdcdev tdcdev marked this pull request as ready for review August 13, 2024 13:16
@jarshwah
Copy link
Contributor

I'll start by saying that perhaps doing automatic localtime conversion in the returned data was a mistake - and we should have built timezone conversion directly into ranges for the datetime types. Alas.

But I'm not sure there's a real bug here. You're starting in UTC and attempting to go to an ambiguous localtime. If you had been working in localtime in the first place, you wouldn't have been able to construct the range with an ambiguous round trip.

In [78]: dt = datetime.datetime(2021, 10, 31, 1, tzinfo=localtime.LONDON)

In [79]: print(dt)
    ...: for tz in [localtime.UTC, localtime.LONDON]:
    ...:     dt = localtime.as_localtime(dt, tz=tz)
    ...:     print(dt)
    ...: 
2021-10-31 01:00:00+01:00
2021-10-31 00:00:00+00:00
2021-10-31 01:00:00+01:00

The problem stems from starting in UTC and expecting it to resolve to an unambiguous localtime. As you can see here, attempting to convert UTC to LONDON results in the fold=1 attribute being set:

In [81]: localtime.as_localtime(datetime.datetime(2021, 10, 31, 1, 0, tzinfo=datetime.timezone.utc), tz=localtime.LONDON)
Out[81]: datetime.datetime(2021, 10, 31, 1, 0, fold=1, tzinfo=zoneinfo.ZoneInfo(key='Europe/London'))
In [82]: print(localtime.as_localtime(datetime.datetime(2021, 10, 31, 1, 0, tzinfo=datetime.timezone.utc), tz=localtime.LONDON))
2021-10-31 01:00:00+00:00

Because the timezone logic sees that it's ambiguous (+1 or +0) it chooses one and sets the fold attribute. But, when you specifically start in localtime it's already at +1 offset.

That said, the raised error is far away from the creation of the error. We only see if fail when querying the data when the error occurred at write time.

Perhaps the answer here is to proactively do the localtime conversion on the range to catch any errors prior to the data being written, since we know it'll fail at query time. Or.. unwind the previous commit which does localtime conversion at query time .. which will break lots of expectations downstream.

@patiences
Copy link

@jarshwah thanks for your response! Help me understand a few things here... 🙏

Perhaps the answer here is to proactively do the localtime conversion on the range to catch any errors prior to the data being written, since we know it'll fail at query time.

The data (in this case, it's price data from price indexes) comes to us in UTC format and if we convert to localtime before storage, that will fail. I don't think this is the right solution then because there are actually two hours of data (prices), which Kraken should be able to store.

Or.. unwind the previous commit which does localtime conversion at query time .. which will break lots of expectations downstream.

😬

Are there other backwards-compatible options? Could we default to localtime but also enable the caller to specify their own timezone?

@jarshwah
Copy link
Contributor

Yeah a timezone argument would work, accepting None to do no conversion.

This issue has been bugging me, because I’m not sure it should be ambiguous - and what else are we doing wrong (with localtime) if it is? The only reason we’re seeing an error here is because the range type doesn’t like an empty range. Need to dig in more.

But in the mean time agree an argument to the field would be suitable.

@CostantiniMatteo
Copy link

I'm confused here. Is it really ambiguous? Let's take this year change from DST to standard time:

# No confusion for sure
> dt = datetime.datetime(2024, 10, 26, 23, 0, tzinfo=zoneinfo.ZoneInfo(key='UTC'))
> dt.astimezone(zoneinfo.ZoneInfo(key='Europe/Amsterdam'))
datetime.datetime(2024, 10, 27, 1, 0, tzinfo=zoneinfo.ZoneInfo(key='Europe/Amsterdam'))

# First 2am local time
> dt = datetime.datetime(2024, 10, 27, 0, 0, tzinfo=zoneinfo.ZoneInfo(key='UTC'))
> dt.astimezone(zoneinfo.ZoneInfo(key='Europe/Amsterdam'))
datetime.datetime(2024, 10, 27, 2, 0, tzinfo=zoneinfo.ZoneInfo(key='Europe/Amsterdam'))

# Second 2am localtime
> dt = datetime.datetime(2024, 10, 27, 1, 0, tzinfo=zoneinfo.ZoneInfo(key='UTC'))
> dt.astimezone(zoneinfo.ZoneInfo(key='Europe/Amsterdam'))
datetime.datetime(2024, 10, 27, 2, 0, fold=1, tzinfo=zoneinfo.ZoneInfo(key='Europe/Amsterdam'))

# No confusion again
> dt = datetime.datetime(2024, 10, 27, 2, 0, tzinfo=zoneinfo.ZoneInfo(key='UTC'))
> dt.astimezone(zoneinfo.ZoneInfo(key='Europe/Amsterdam'))
datetime.datetime(2024, 10, 27, 3, 0, tzinfo=zoneinfo.ZoneInfo(key='Europe/Amsterdam'))

The mapping between UTC and local should still be doable, doesn't it?

@jarshwah
Copy link
Contributor

I'm confused here. Is it really ambiguous?

Me too! I dug into this a bit more..

The problem is purely within the range type, specifically the check_op function that compares start and end boundaries (and, python in general).

l1 = datetime.datetime(2021, 10, 31, 1, 0, tzinfo=zoneinfo.ZoneInfo(key='Europe/London'))
l2 = datetime.datetime(2021, 10, 31, 1, 0, fold=1, tzinfo=zoneinfo.ZoneInfo(key='Europe/London'))

In [40]: l1 == l2
Out[40]: True

In [41]: l1 < l2
Out[41]: False

In [42]: l1 > l2
Out[42]: False

Python isn't using the fold attribute to compare the times, despite the offset being different:

In [44]: print(l1, '!=', l2)
2021-10-31 01:00:00+01:00 != 2021-10-31 01:00:00+00:00

From https://peps.python.org/pep-0495/#temporal-arithmetic-and-comparison-operators

The value of the fold attribute will be ignored in all operations with naive datetime instances. As a consequence, naive datetime.datetime or datetime.time instances that differ only by the value of fold will compare as equal. Applications that need to differentiate between such instances should check the value of fold explicitly or convert those instances to a timezone that does not have ambiguous times (such as UTC).

Naive and intra-zone comparisons will ignore the value of fold and return the same results as they do now. (This is the only way to preserve backward compatibility. If you need an aware intra-zone comparison that uses the fold, convert both sides to UTC first.)

This can mean that any localised datetimes within our range type can be incorrect if the fold attribute is set.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants