Check other ISO8601 datetimes for WARC/1.1 compliance. Re:#283
machawk1 committed Jun 18, 2020
1 parent 73f136f commit 9cd23ba
Showing 1 changed file with 12 additions and 3 deletions.
15 changes: 12 additions & 3 deletions ipwb/util.py
@@ -159,10 +159,19 @@ def rfc1123ToDigits14(rfc1123DateString):
     return d.strftime('%Y%m%d%H%M%S')
 
 
-def iso8601ToDigits14(iso8601DateString):
+def iso8601ToDigits14(warcDatetimeString):
     setLocale()
-    d = datetime.datetime.strptime(iso8601DateString,
-                                   "%Y-%m-%dT%H:%M:%SZ")
+
+    iso8601_datestrings = ["%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%dT%H:%MZ", '%Y-%m-%dT%HZ', '%Y-%m-%d', '%Y-%m', '%Y', '%Y-%m-%dT%H:%M:%S.%fZ']
+
+    for format in iso8601_datestrings:
+        try:
+            d = datetime.datetime.strptime(warcDatetimeString, format)
+            break
+
+        except ValueError as ve:
+            print(f'ValueError matching {format} for value {warcDatetimeString}')
+
 
     # TODO: Account for conversion if TZ other than GMT not specified


3 comments on commit 9cd23ba

@ibnesayeed
Member

@machawk1 this will change the exception behavior. Previously, if the supplied string did not fit the one expected format, strptime raised an exception that propagated to the caller. Now all strptime-originated exceptions are suppressed, and if the value matches none of the formats, d is never initialized. As a result, the return line will throw a different type of exception when it tries to call strftime on an undefined name.

@machawk1
Member Author

@ibnesayeed That behavior is somewhat by design but not yet handled. It shows up, for example, when variableSizedDates.warc is indexed. The exception thrown is UnboundLocalError: local variable 'd' referenced before assignment. Catching it in the present try/except will not work, because the exception is thrown in the transformation step, which is currently inline with the return statement.
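
For reference, the failure can be reproduced outside ipwb with a few lines. This is only a sketch: convert_without_fallback and FORMATS are hypothetical names, the format list is a two-item subset of the one in the commit, and the setLocale() call is left out.

```python
import datetime

# Hypothetical subset of the format list added in this commit.
FORMATS = ["%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d"]


def convert_without_fallback(value):
    """Mimics the loop in the commit: every ValueError is swallowed."""
    for fmt in FORMATS:
        try:
            d = datetime.datetime.strptime(value, fmt)
            break
        except ValueError:
            pass  # the commit prints a message here instead
    # If no format matched, d was never assigned, so this line raises.
    return d.strftime('%Y%m%d%H%M%S')


try:
    convert_without_fallback("not-a-date")
    caught = None
except UnboundLocalError as err:
    # The caller sees an error about a local variable it cannot influence.
    caught = type(err).__name__
```

The exception comes from the return line, after the loop has finished, which is why the except ValueError clause inside the loop never sees it.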

How do you recommend we revise this? It is not yet complete (hence, not merged).

@ibnesayeed
Member

The problem with the UnboundLocalError exception is that it tells the caller about a variable local to this function, over which the caller has no control. Instead, we should raise an exception that lets the caller know their string does not comply with any of the supported formats, which is essentially a ValueError.
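
One possible shape for that, as a rough sketch rather than a finished patch: return as soon as a format matches, and raise a ValueError after the loop if none did. The names iso8601_to_digits14 and ISO8601_FORMATS are placeholders I made up, the format list is copied from the commit, and the setLocale() call is omitted so the snippet runs on its own.

```python
import datetime

# Formats copied from the commit; the constant name is a placeholder.
ISO8601_FORMATS = (
    "%Y-%m-%dT%H:%M:%SZ",
    "%Y-%m-%dT%H:%MZ",
    "%Y-%m-%dT%HZ",
    "%Y-%m-%d",
    "%Y-%m",
    "%Y",
    "%Y-%m-%dT%H:%M:%S.%fZ",
)


def iso8601_to_digits14(warc_datetime_string):
    """Return a 14-digit timestamp, or raise ValueError if nothing matches."""
    for fmt in ISO8601_FORMATS:
        try:
            d = datetime.datetime.strptime(warc_datetime_string, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
        return d.strftime("%Y%m%d%H%M%S")

    # Reached only when every format failed, so the caller gets an error
    # about its own input rather than about a local variable.
    raise ValueError(
        f"{warc_datetime_string!r} does not match any supported "
        "WARC/1.1 datetime format"
    )
```

With this shape, a caller that already handled the ValueError from the old single-format strptime call would not need to change.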
