Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include custom HTML attributes #170

Open
wants to merge 35 commits into
base: master
Choose a base branch
from

Conversation

legenderrys
Copy link

@legenderrys legenderrys commented Dec 9, 2022

This is similar to this PR#134

I would like to propose an alternative solution that will not alter existing render methods.

Html attribute block Proposed Spec

Line containing the following string
${html_attr_name:value, another_html_attr:value}
will describe the html attributes for the element proceeding it.

Contents within the ${...} string will be a comma separated list of key/value pairs.
The > character will separate parent attributes from child attributes.

Example Html attribute block
INPUT

${id:my-value, class:some-class}
# Mistletoe is Awesome

${id:my-list, class:foo >class:bar-items}
- Item One
- Item Two
- Item Three\

OUTPUT

<h1 id="my-value" class="some-class">Mistletoe is Awesome</h1>
<ul id="my-list" class="foo">
    <li class="bar-items">Item One</li>
    <li class="bar-items">Item Two</li>
    <li class="bar-items">Item Three</li>
</ul>

@legenderrys legenderrys changed the title Include any custom HTML attributes on any Token Include custom HTML attributes Dec 9, 2022
@legenderrys legenderrys marked this pull request as draft December 9, 2022 16:39
@legenderrys legenderrys marked this pull request as ready for review December 10, 2022 02:13
@anderskaplan
Copy link
Contributor

Hi, a few comments from someone who's been messing a bit with this code base recently.

I think the approach with a new block token is a good one, and it fits nicely in the markdown syntax.
It definitely has an advantage over the other proposed method to add attributes, in that you don't need to
write a custom renderer to use it.

Here are a few suggestions:

  1. The way this is added into block_tokenizer.make_tokens() feels a bit hacky. Why not treat it like
    any other block token and handle it in HTMLRenderer.render_html_attributes() instead?
    That's more in line with the extensible design of mistletoe.
  2. Since this is optional and non-standard functionality, I'd suggest to not add it to HTMLrenderer but rather to create a new
    renderer class for it in the contrib folder, derived from HTMLRenderer of course. The new
    block token class could go there as well.
  3. The format for the ${...} strings should be defined more clearly. What about spaces, newlines, escaping, ...?
    Not saying it needs to be more complicated than necessary -- just that it should be well defined.
  4. Markdown can get complicated and I'm not sure if the proposed design will work well with lists nested inside lists,
    for example. It would be nice with some test cases to demonstrate how it handles more difficult cases. For example,
    if you set the id attribute on a list that has a nested list inside it, you don't want the same id on the inner list
    because that would be invalid html.

Hope this can be helpful to make it a good addition to mistletoe!

@legenderrys
Copy link
Author

Hi, a few comments from someone who's been messing a bit with this code base recently.

Thank you for the response and suggestions.

  1. The format for the ${...} strings should be defined more clearly. What about spaces, newlines, escaping, ...?
    Not saying it needs to be more complicated than necessary -- just that it should be well defined.

Using the {...} characters was causing conflicts with other renderers namely the test_XWiki20Renderer.py macros test failing tests. Although your suggestion to create a new HTMLrenderer may help fix those issues here.

Per the block syntax. Lets get inline working first before multi-line. I'd imagine we could repurpose the code block functionality

  1. Markdown can get complicated and I'm not sure if the proposed design will work well with lists nested inside lists,
    for example. It would be nice with some test cases to demonstrate how it handles more difficult cases. For example,
    if you set the id attribute on a list that has a nested list inside it, you don't want the same id on the inner list
    because that would be invalid html.

Great point, id attributes should always be unique to pass accessibility standards. We could add a numeric suffix onto the nest element id each time props are passed down from parents. I could also experiment with making the apply_props recursive for child tokens.

@legenderrys
Copy link
Author

I've updated the branch to include adding unique id attribute values for deeply nested lists.

Below is an example currently in place
INPUT

${id:todos, tabindex:100 > class:list-item}
- Item One
- Item Two
    - item two.one
    - item two.two
        - apples
        - oranges
- Item Three

OUTPUT

<ul id="todos" tabindex="100">
    <li class="list-item" tabindex="1">Item One</li>
    <li class="list-item" tabindex="1">Item Two
        <ul id="todos-2" tabindex="1">
            <li class="list-item" tabindex="1">item two.one</li>
            <li class="list-item" tabindex="1">item two.two
                <ul id="todos-2-3" tabindex="1">
                    <li class="list-item" tabindex="1">apples</li>
                    <li class="list-item" tabindex="1">oranges</li>
                </ul>
            </li>
        </ul>
    </li>
    <li class="list-item" tabindex="1">Item Three</li>
</ul>

@anderskaplan
Copy link
Contributor

You could also consider to let the attributes apply only to the immediately following block token. Then it would be possible to add another set to the inner list, like so:

${id:todos}
- Item One
- Item Two
    - item two.one
    - item two.two
        ${id:shopping-list}
        - apples
        - oranges
- Item Three

@legenderrys
Copy link
Author

You could also consider to let the attributes apply only to the immediately following block token. Then it would be possible to add another set to the inner list, like so:

Hmm, I had that idea in the beginning, however the current tokenization processing was a struggle and we need a system to pass parent props to children.

Or perhaps an Emmet style syntax at the root level may function better.

🐊 ${id:todos > id:shopping-list > class:shopping-list-items}

@legenderrys
Copy link
Author

@anderskaplan There is still a few things I need to update in this PR and I'd like to know what is the criteria for getting code merged into master? Thanks for your help and feedback.

things todo:

  1. Implement attributes for all other html elements
  2. Add test cases for each html element
  3. Update Readme? ( not sure If the readme should include this feature )

@anderskaplan
Copy link
Contributor

Ok, that's a question for @pbodnar. I can only help with some guidance on how to make the code fit in with the overall design of the codebase.

Copy link
Collaborator

@pbodnar pbodnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anderskaplan There is still a few things I need to update in this PR and I'd like to know what is the criteria for getting code merged into master? Thanks for your help and feedback.

@hyperking, thank you for your contribution (and thanks to @anderskaplan for giving some useful feedback :)). I have gone through it quickly and gave you some feedback. There are no exactly described criteria. Generally, the code should be production-ready (or "fully baked" ;)) before being merged into master.

It also helps if commits and their messages are already as close to the final state as possible. So rebasing (squashing etc.) and force-pushing is usually OK to achieve that, even though this may lead to losing links from conversations to the code...


Some further quick thoughts (as I don't have much time for maintaining mistletoe these days and some open PRs are still waiting there for me ;)):

  1. PR description

This is similar to this #134

I would like to propose an alternative solution that will not alter existing render methods.

OK, it would be also good to highlight that unlike #134, this PR lets regular users specify the additional attributes, so it is not just an API extension.

  1. Selecting the syntax for attributes

Using the {...} characters was causing conflicts with other renderers namely the test_XWiki20Renderer.py macros test failing tests. Although your suggestion to create a new HTMLrenderer may help fix those issues here.

Provided we would want to use just {...} instead of ${...} and provided a test is failing, just fixing that test could be a solution?

Anyway, I wonder if we couldn't inspire at some alternative Markdown parsers or elsewhere, regarding the syntax (e.g. XWiki uses (% ... %))? Also from many tools and languages, I am quite used to read ${...} as something that should be evaluated, that's why I would like to consider some other variants.

  1. HTMLAttributes, or ExtraAttributes?

Any thoughts on making this mechanism, or token, general? I.e. the attributes maybe don't have to be necessarily just for HTML?

Also, HTMLAttributes are being added into block_token.__all__ by this PR now, so they would be already available to all the other renderers, right?

And a related question: shouldn't we do this feature / parsing optional (thinking about performance and backwards compatibility here)?

README.md Outdated Show resolved Hide resolved
mistletoe/block_token.py Outdated Show resolved Hide resolved
mistletoe/block_token.py Outdated Show resolved Hide resolved
mistletoe/block_token.py Outdated Show resolved Hide resolved
mistletoe/block_token.py Outdated Show resolved Hide resolved
mistletoe/block_token.py Outdated Show resolved Hide resolved
mistletoe/html_attributes_renderer.py Outdated Show resolved Hide resolved
title = ' title="{}"'.format(html.escape(token.title))
else:
title = ''
attr = '' if not hasattr(token,'html_props') else token.html_props
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would extract this repeated line to a dedicated helper method, as in #134, so that the logic is kept and can be changed at one place.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attributes are serialized ahead of time already. See here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, yet I meant it literally though. :) Something like:

Suggested change
attr = '' if not hasattr(token,'html_props') else token.html_props
attr = self._render_attributes(token)
  • you can even inline this attr variable on the next line then (i.e. remove it).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pbodnar i may be missing something here.. but is this what you mean?

    def render_html_attributes(self, token: block_token) -> str:
        return '' if not hasattr(token,'html_props') else token.html_props

    def render_image(self, token: span_token.Image) -> str:
        template = '<img src="{}" alt="{}"{}{attrs} />'
        title = ' title="{}"'.format(html.escape(token.title)) if token.title else ''
        attrs = self.render_html_attributes(token)
        return template.format(token.src, self.render_to_plain(token), title, attrs=attrs)

Copy link
Collaborator

@pbodnar pbodnar Oct 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, although I would change the newly introduced method to (i.e. keep it "private"; it seems to be usable by inline tokens as well; proper formatting; also move it ):

    def _render_html_attributes(self, token: token.Token) -> str:
        return '' if not hasattr(token, 'html_props') else token.html_props


class TestHTMLAttributes(TestCase):

def test_document(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically OK, but It will be better to test every aspect (variant) of this new feature in a dedicated method (and move all the common code like imports to the top).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😅 I knew this moment would come. Sure thing!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice with some unit tests for the HTMLAttribute options as well.

test/test_html_attributes_renderer.py Outdated Show resolved Hide resolved
@legenderrys
Copy link
Author

legenderrys commented Dec 18, 2022

@hyperking, thank you for your contribution (and thanks to @anderskaplan for giving some useful feedback :)). I have gone through it quickly and gave you some feedback. There are no exactly described criteria. Generally, the code should be production-ready (or "fully baked" ;)) before being merged into master.

🤓 Awesome! With the holidays underway I may have some in between time to address any loose ends with this PR.

It also helps if commits and their messages are already as close to the final state as possible. So rebasing (squashing etc.) and force-pushing is usually OK to achieve that, even though this may lead to losing links from conversations to the code...

🤓 Yes, I'll make an effort to clean up the PR as close to upstream/master as possible without any conflicts. I've made a test run locally and git didn't seem to complain.

  1. Selecting the syntax for attributes
    Provided we would want to use just {...} instead of ${...} and provided a test is failing, just fixing that test could be a solution?

🤓 I agree, the original { ... } syntax would be ideal. However it does conflict with other aforementioned renders. I did make this renderer configurable so that the end user could specify an alternative character literals.

See configure statement here

  1. HTMLAttributes, or ExtraAttributes?
    Any thoughts on making this mechanism, or token, general? I.e. the attributes maybe don't have to be necessarily just for HTML?

🤓 HTMLAttributes are the common term used in web development for describing attributes on a given DOM node. per this feature, It's currently a general token. The token currently only supports basic markdown that renders HTML DOM nodes and Users are required to specify the HTMLAttributesRenderer vs the HTMLRenderer

Also, HTMLAttributes are being added into block_token.__all__ by this PR now, so they would be already available to all the other renderers, right?

🤓 Yes, It's also dependent on if the qualifying token string is present in the markdown ( outside of any code blocks )

And a related question: shouldn't we do this feature / parsing optional (thinking about performance and backwards compatibility here)?

I'd love to see how this performs, but generally this feature would come at a cost, but Its optional though 🤷‍♂️

@@ -0,0 +1,86 @@
HTMLAttributesRenderer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice docs!

I'd suggest to change the name of the file to something along the lines of "Extension to attach custom html attributes" to make it more clear what it's about. "Features" is so very generic.

You could also describe the feature in a paragraph in the README.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we rename the file name from features.md into extensions.md
and within extensions.md we link to any available docs for new extensions or just add extension details to a single file?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we really should think this through, I think there is no docs like this in mistletoe yet... Maybe keeping most of the description right at the class itself (in pydoc) could also be the way (although I have suggested creating a dedicated md myself firstly, I know)?

features.md Outdated
HTMLAttributesRenderer
----------------------

This feature allows you to write Markdown that will render 508 compliant html attributes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice with an explanation (maybe a link) of what "508 compliant" means. It's just a googling away but still.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See (48c8134)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, me too had a look at what the "508 compliance" means - if I understand it correctly, this standard is all about writing "accessible documents", which happens to include using concrete HTML attributes. So while this is somewhat related to this PR, I would stay generic here, focusing just on the ability to add custom HTML attributes to the output, like this:

This renderer allows you to specify extra attributes within your Markdown text, outputting them as HTML attributes of corresponding HTML tags.

I think the docs are basically fine, but I, as a quite demanding reader myself, would like to make it a little bit more "appealing". I'm just not sure if I will have enough time to do a rewrite myself... :P

recon_tokens.append(token_type)
doc_token.children = recon_tokens

def render_strong(self, token: span_token.Strong) -> str:
Copy link
Contributor

@anderskaplan anderskaplan Dec 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks identical to the base class implementation (HTMLRenderer). Maybe I'm missing something, but if that's the case then I'd suggest to remove it from here. Less code is better.

(The same comment applies to other methods too: render_emphasis, render_inline_code, etc)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See (a7828be)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, yet I can still see some identical methods in both classes - @hyperking, is it that you are about to add support for HTML attributes to those methods later on in this PR?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pbodnar Yes, the base class methods for HTML nodes that typically do not have attributes were removed in the above commit.

Support for the other HTML node methods has not been implemented yet and are in progress on my machine.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Leaving this thread opened, so that we don't forget there are still some copied methods to be altered.

BTW when thinking about it, this methods' duplication is not ideal. Because when we change something in the HtmlRenderer, we should most probably also apply the same to the new HtmlAttributesRenderer, right? So putting this new feature introduced by this PR into the HtmlRenderer directly should be our final goal, shouldn't it?

Copy link
Collaborator

@pbodnar pbodnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi guys, I'm sorry, thanks for your efforts, but I don't really have much time for mistletoe nowadays.

So I have tried to give at least some new feedback today. I haven't gone through all the code in detail yet though...

mistletoe/html_attributes_renderer.py Outdated Show resolved Hide resolved
recon_tokens.append(token_type)
doc_token.children = recon_tokens

def render_strong(self, token: span_token.Strong) -> str:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, yet I can still see some identical methods in both classes - @hyperking, is it that you are about to add support for HTML attributes to those methods later on in this PR?

mistletoe/block_token.py Outdated Show resolved Hide resolved
mistletoe/block_token.py Outdated Show resolved Hide resolved
mistletoe/base_renderer.py Outdated Show resolved Hide resolved
test/test_html_attributes_renderer.py Outdated Show resolved Hide resolved
mistletoe/html_attributes_renderer.py Outdated Show resolved Hide resolved
mistletoe/block_token.py Outdated Show resolved Hide resolved
title = ' title="{}"'.format(html.escape(token.title))
else:
title = ''
attr = '' if not hasattr(token,'html_props') else token.html_props
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, yet I meant it literally though. :) Something like:

Suggested change
attr = '' if not hasattr(token,'html_props') else token.html_props
attr = self._render_attributes(token)
  • you can even inline this attr variable on the next line then (i.e. remove it).

mistletoe/base_renderer.py Outdated Show resolved Hide resolved
@@ -1020,6 +1020,111 @@ def read(lines):
return [next(lines)]


class HTMLAttributes(BlockToken):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the meantime, we switched to "strict/simple camel case convention" globally, so it needs to be like this now:

Suggested change
class HTMLAttributes(BlockToken):
class HtmlAttributes(BlockToken):

The same applies to the new HTMLAttributesRenderer.

@@ -202,4 +202,4 @@ def render_thematic_break(self, token: block_token.ThematicBreak) -> str:
return self.render_inner(token)

def render_document(self, token: block_token.Document) -> str:
return self.render_inner(token)
return self.render_inner(token)
Copy link
Collaborator

@pbodnar pbodnar Oct 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beware of leaving EOL at the EOF everywhere. This will also decrease number of changed files - like this one.

mistletoe/block_token.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@pbodnar pbodnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still quite a lot of work on this.😅

Also, while merging changes from the master branch is welcome, I would ask you to do Git rebase instead. I think the commit history would deserve doing some cleanup anyway, so that the current 35 commits get transformed to a few atomic commits at maximum...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants