Add conversion for HTML to markdown #932

hrodz · 2023-08-20T21:58:09Z

Description:

Related issue (if applicable): #527

I use Mailrise to get email notifications for various services. The HTML to Text conversion works great for regular notifications, but it would be nice to be able to receive and click on links. This is my attempt to implement an HTML to Markdown converter so that notifications can include links.

This code is heavily based on the current HTML to Text converter. I'm not a developer, but I'm willing to learn and work on the code to make this feature available. Let me know if this is something we can work on. I have been testing with Gotify notifications and it seems to work fine.

The only thing I have not been able to get working is line breaks. I have tried different characters (double space, \,  ) without success on Gotify. Thanks.
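The link handling described above can be sketched roughly as follows. This is only an illustration built on the standard library's `html.parser`, not the code in this PR; the class `LinkToMarkdown` and the function `html_links_to_markdown` are hypothetical names, and mapping `<br>` to a plain newline is an assumption that may not render as a line break in every Markdown viewer.

```python
from html.parser import HTMLParser


class LinkToMarkdown(HTMLParser):
    """Hypothetical minimal converter: keeps text and rewrites
    <a href="..."> elements as [text](url)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._out = []        # converted fragments, in document order
        self._href = None     # href of the <a> currently open, if any
        self._link_text = []  # text collected inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._link_text = []
        elif tag == "br":
            # Assumption: a bare newline; renderers may need something else
            self._out.append("\n")

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Fall back to the URL itself when the link has no visible text
            text = "".join(self._link_text).strip() or self._href
            self._out.append(f"[{text}]({self._href})")
            self._href = None

    def handle_data(self, data):
        if self._href is not None:
            self._link_text.append(data)
        else:
            self._out.append(data)

    def result(self):
        return "".join(self._out).strip()


def html_links_to_markdown(html):
    parser = LinkToMarkdown()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.result()
```

Text inside an `<a>` that is never closed is dropped by this sketch; a real converter would need to decide how to recover it.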

Checklist

The code change is tested and works locally.
There is no commented out code in this PR.
No lint errors (use flake8)
100% test coverage

caronc · 2023-08-21T06:01:04Z

Thank you for the PR, I'll try to review it soon

codecov-commenter · 2023-08-21T06:02:17Z

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.


Files	Coverage Δ
apprise/conversion.py	`99.30% <98.71%> (-0.70%)`	⬇️

... and 11 files with indirect coverage changes


caronc · 2023-08-22T00:30:22Z

@hrodz: this is a pretty massive undertaking, and there is a lot of missing code coverage. But as you get your code more stable and to your liking, I can try to pitch in and help here.

On a second note:
Parsing HTML is fine when it is well structured, but I'll be interested in how you handle things like...

<html
<!--- no closing tag above >
<body>test
<b>test</b>
</body>

Or tags that don't align:

<p>A story about something that has no closing paragraph tag.

Another scenario might be deep nesting with missing tags, etc.

Just to be clear, I'm not saying your solution has to be perfect 🙂 . But there are so many what-ifs when it comes to HTML. I'm actually really excited to see how you choose to solve these types of things. I'll be looking forward to your next commit and to see how this pull request evolves. 🚀
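One possible way to cope with the unclosed and misaligned tags mentioned above is to track open tags on a stack and never assume they balance. The sketch below assumes Python's standard `html.parser`, which does not raise on input such as an unclosed `<p>`; `TolerantTracker`, `parse_leniently`, and the `VOID_TAGS` set are hypothetical names for illustration, not part of Apprise.

```python
from html.parser import HTMLParser

# Assumption: a small subset of tags that never take a closing tag
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TolerantTracker(HTMLParser):
    """Hypothetical helper: collects text and tracks which tags are
    still open, without requiring the markup to be balanced."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.open_tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Pop back to the most recent matching tag, implicitly
            # closing anything that was left open inside it
            while self.open_tags.pop() != tag:
                pass
        # Otherwise it is a stray closing tag and is ignored

    def handle_data(self, data):
        self.text.append(data)


def parse_leniently(html):
    """Return (text, tags_left_open) for possibly malformed HTML."""
    tracker = TolerantTracker()
    tracker.feed(html)
    tracker.close()
    return "".join(tracker.text), tracker.open_tags
```

For the paragraph example above, `parse_leniently("<p>A story about something")` returns the text together with `["p"]`, so a converter can decide at the end how to close whatever is still open.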

hrodz · 2023-08-23T00:27:43Z

@caronc Thanks for taking the time to look at this and provide feedback. I will work on the code coverage. Just wanted to know if you have any thoughts about this before committing more time to it.

Regarding invalid HTML, the two examples provided seem to be already covered by the current html_to_text conversion, and this code will produce similar results. I understand that I may have to do more testing on how invalid tags are parsed and converted to Markdown, but if possible I would like more insight on the extent of this validation. Would it be acceptable to handle invalid HTML the same way the html_to_text conversion does? Or should we expect it to handle more cases? Thanks.

caronc · 2023-10-06T22:14:23Z

I reviewed your code; I had to make some changes, but I think I got the test coverage done.

The only thing is that what you're doing is very tricky... it's hard to track indentation, so some formatting will get lost (for example, a code block that should have been indented because we're currently inside a list or <li> element).

But overall it seems to work. I updated the test cases a bit too... the previous code was escaping characters that shouldn't have been escaped (periods, for example, ended up in the Markdown as \\.).

Have a look at what I've done and let me know your thoughts.
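To illustrate the escaping point above: a converter can restrict escaping to a short list of characters that carry inline meaning in Markdown, so that ordinary punctuation such as periods passes through untouched. The character set and the name `escape_markdown_text` below are assumptions made for this sketch, not the set used in this PR, and the right set depends on context and on the renderer.

```python
import re

# Assumption: escape only characters that commonly start inline Markdown
# constructs (backslash, backtick, asterisk, underscore, square brackets)
_MD_SPECIAL = re.compile(r"([\\`*_\[\]])")


def escape_markdown_text(text):
    """Prefix each Markdown-significant character with a backslash,
    leaving periods and other ordinary punctuation alone."""
    return _MD_SPECIAL.sub(r"\\\1", text)
```

With this sketch, a string such as `v1.2 is *done*.` keeps its periods and only the asterisks gain a leading backslash.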

Add conversion for HTML to markdown

8d543a5

caronc added 2 commits October 6, 2023 18:08

code & test improvements, added more coverage

f725b3a

complete coverage of what is there

4c1b5ca
