missing timezone #174

Open
TheCutestCat opened this issue Nov 28, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

TheCutestCat commented Nov 28, 2024

The relevant HTML file: htmldate_debug_no_timezone.html.zip

Thanks for all your hard work! htmldate is very useful for me.
However, when I use htmldate, there is no timezone in the result.
I tried the following code:

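from pathlib import Path

from htmldate import find_date
from lxml import html

# Assumed name of the unzipped attachment; adjust to your local path
input_path = "htmldate_debug_no_timezone.html"
content = Path(input_path).read_text(encoding='utf-8')
mytree = html.fromstring(content)

# Request a timestamp that includes the UTC offset (%z)
publish_time = find_date(mytree, outputformat="%Y-%m-%d %H:%M:%S%z")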

The HTML contains:

"datePublished": "2024-11-06T08:37:00+05:30",

Expected result:

2024-11-06T08:37:00+05:30

Actual result from htmldate:

2024-11-06 00:00:00

The timezone is missing. Could you please look into this?
I would be glad to help fix it.
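
For reference, the observed output is what Python's standard `strftime` produces when the underlying value is a date without time or offset. The sketch below uses only the standard library and assumes (it is not confirmed by htmldate's source here) that the extracted value is a naive, date-only `datetime`. For a naive value, `%z` expands to an empty string.

```python
from datetime import datetime

OUTPUT_FORMAT = "%Y-%m-%d %H:%M:%S%z"

# Assumption: a date-only match becomes a naive datetime at midnight.
naive = datetime(2024, 11, 6)
# The full value from the page, parsed with the standard library.
aware = datetime.fromisoformat("2024-11-06T08:37:00+05:30")

naive_text = naive.strftime(OUTPUT_FORMAT)  # %z is empty for naive values
aware_text = aware.strftime(OUTPUT_FORMAT)  # %z renders the offset as +0530

print(naive_text)  # 2024-11-06 00:00:00
print(aware_text)  # 2024-11-06 08:37:00+0530
```

Note that `%z` renders the offset without a colon, so even with full timezone support this format string would yield `+0530` rather than `+05:30`.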

TheCutestCat changed the title from "timezone" to "missing timezone" on Nov 28, 2024
adbar added the enhancement label on Nov 28, 2024
adbar (Owner) commented Nov 29, 2024

Hi @TheCutestCat, when dates are found using HTML markup, you get the time zone. When they are extracted from free text, regular expressions are applied, and these don't include time zones for now. Feel free to have a look and draft a pull request. Your case is here (with similar patterns above and below it):

rf'"datePublished": ?"({YEAR_RE}-{MONTH_RE}-{DAY_RE})', re.I
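
As a starting point for such a pull request, here is a minimal sketch of how the pattern could be widened to capture an optional time and UTC offset. It is not htmldate's actual code: the definitions of `YEAR_RE`, `MONTH_RE` and `DAY_RE` are stand-ins, and `TIME_RE`, `TZ_RE`, `DATE_PUBLISHED_TZ` and `extract_date_published` are hypothetical names introduced for illustration.

```python
import re
from datetime import datetime

# Stand-ins for the library's patterns (assumed, not copied from htmldate).
YEAR_RE = r"[0-9]{4}"
MONTH_RE = r"(?:0[1-9]|1[0-2])"
DAY_RE = r"(?:0[1-9]|[12][0-9]|3[01])"
# Hypothetical additions: time of day and UTC offset ("Z" or +HH:MM).
TIME_RE = r"[0-9]{2}:[0-9]{2}(?::[0-9]{2})?"
TZ_RE = r"(?:Z|[+-][0-9]{2}:[0-9]{2})"

# Same prefix as the existing pattern; time and offset are optional,
# so date-only values still match.
DATE_PUBLISHED_TZ = re.compile(
    rf'"datePublished": ?"({YEAR_RE}-{MONTH_RE}-{DAY_RE}'
    rf'(?:T{TIME_RE}(?:{TZ_RE})?)?)',
    re.I,
)


def extract_date_published(text):
    """Return a datetime (aware if an offset was found), or None."""
    match = DATE_PUBLISHED_TZ.search(text)
    if match is None:
        return None
    value = match.group(1).upper()  # re.I may have matched "t" or "z"
    if value.endswith("Z"):
        # Older fromisoformat versions reject "Z"; use an explicit offset.
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


found = extract_date_published('"datePublished": "2024-11-06T08:37:00+05:30",')
print(found.isoformat())  # 2024-11-06T08:37:00+05:30
```

Keeping the time and offset groups optional means existing date-only matches are unaffected, which should make the change easier to test against the current behavior.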
