Make MarkdownFields translatable #102

jeriox · 2022-07-05T21:00:01Z

Currently, when using wagtail-localize, a MarkdownField cannot be translated in an easy way, as the whole content of the field is put into one translation segment. For a long page with a markdown body, this is not feasible. I'd like to have the MarkdownField split up in several translation segments (like with StreamFields), so I can translate them separately.
I wrote a hacky solution for that some time ago, but it breaks with the current version. I'd be happy if we could find a way to support that properly.

My old code for reference:

import html2text
from django.db.models import TextField
from wagtail_localize.segments import (
    OverridableSegmentValue,
    StringSegmentValue,
    TemplateSegmentValue,
)
from wagtail_localize.segments.extract import quote_path_component
from wagtail_localize.segments.ingest import organise_template_segments
from wagtail_localize.strings import extract_strings, restore_strings

from wagtailmarkdown.utils import render_markdown
from wagtailmarkdown.widgets import MarkdownTextarea


class MarkdownField(TextField):
    def formfield(self, **kwargs):
        defaults = {"widget": MarkdownTextarea}
        defaults.update(kwargs)
        return super().formfield(**defaults)

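    def get_translatable_segments(self, value):
        # Render the Markdown to HTML so that extract_strings() can split it
        # into a template plus individual translatable strings.
        template, strings = extract_strings(render_markdown(value))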

        # Find all unique href values
        hrefs = set()
        for string, attrs in strings:
            for tag_attrs in attrs.values():
                if "href" in tag_attrs:
                    hrefs.add(tag_attrs["href"])

        return (
            [TemplateSegmentValue("", "html", template, len(strings))]
            + [StringSegmentValue("", string, attrs=attrs) for string, attrs in strings]
            + [OverridableSegmentValue(quote_path_component(href), href) for href in sorted(hrefs)]
        )

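    def restore_translated_segments(self, value, field_segments):
        # `_format` avoids shadowing the built-in `format`; it is unused here.
        _format, template, strings = organise_template_segments(field_segments)
        # Rebuild the translated HTML, then convert it back to Markdown.
        return html2text.html2text(restore_strings(template, strings))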
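
To make the href-collection step in get_translatable_segments() easier to follow without a Wagtail project, here is a minimal standalone sketch. It assumes `strings` is a list of `(string, attrs)` pairs where `attrs` maps an element id to that element's attributes, which is the shape the loop above implies. The helper name `collect_hrefs` and the sample data are made up for illustration, and plain `str` values stand in for whatever objects extract_strings() actually returns.

```python
def collect_hrefs(strings):
    """Return the sorted, de-duplicated href values found in (string, attrs) pairs."""
    hrefs = set()
    for _string, attrs in strings:
        # attrs is assumed to look like {"a1": {"href": "..."}, ...}
        for tag_attrs in attrs.values():
            if "href" in tag_attrs:
                hrefs.add(tag_attrs["href"])
    return sorted(hrefs)


# Hypothetical stand-in data; not real output of extract_strings().
sample = [
    (
        'Read the <a id="a1">docs</a> and the <a id="a2">FAQ</a>.',
        {
            "a1": {"href": "https://example.com/docs"},
            "a2": {"href": "https://example.com/faq"},
        },
    ),
    ("Plain paragraph.", {}),
    ('See the <a id="a1">docs</a> again.', {"a1": {"href": "https://example.com/docs"}}),
]

unique_hrefs = collect_hrefs(sample)
```

Each unique link then becomes one overridable segment, so a URL that appears in several paragraphs only has to be overridden once per locale.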

zerolab · 2022-07-06T08:27:52Z

Hey @jeriox,

thank you for sharing this. Had a few requests for making this localize-compatible, so the code snippet is very handy!

jeriox · 2022-08-29T12:58:51Z

I got it working again with the code above, so we will use that for now. It still feels a bit hacky to me, so we'd be happy if there was a better alternative built in :)

zerolab · 2022-08-29T13:29:36Z

This would need a bit of thinking, e.g.

> I'd like to have the MarkdownField split up in several translation segments (like with StreamFields), so I can translate them separately.

Where do you draw the line and split things? Is it at every link? Every paragraph? Every heading? Given that we can allow raw HTML in there too, how should we handle that?

jeriox · 2022-08-29T13:43:54Z

> This would need a bit of thinking, e.g.
>
> > I'd like to have the MarkdownField split up in several translation segments (like with StreamFields), so I can translate them separately.
>
> Where do you draw the line and split things? Is it at every link? Every paragraph? Every heading? Given that we can allow raw HTML in there too, how should we handle that?

Currently, my approach works as follows: since a lot of thought has already gone into how to split up StreamFields, I tried to reuse that as much as possible. Therefore, I render the Markdown to HTML and use the existing extract_strings() function. This also ensures that links are treated appropriately. For the other direction, using html2text works quite well. I didn't test with raw HTML, though. I think that every paragraph and every heading is a good split, as it ensures that one doesn't need to re-translate a segment that didn't change.
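
To illustrate why a per-paragraph, per-heading split avoids re-translation, here is a rough standard-library sketch. It is not how wagtail-localize or the snippet above works (those operate on the rendered HTML); it splits the Markdown source directly on blank lines and ATX headings and keys each block by a hash of its text. All function names are hypothetical, and fenced code, lists, and raw HTML are deliberately not handled.

```python
import hashlib
import re


def split_markdown_blocks(text):
    """Split Markdown source into blocks at blank lines and ATX headings.

    Simplified on purpose: fenced code, lists and raw HTML get no special handling.
    """
    blocks = []
    current = []
    for line in text.splitlines():
        is_heading = re.match(r"#{1,6}\s", line) is not None
        if not line.strip() or is_heading:
            # A blank line or a heading closes the paragraph collected so far.
            if current:
                blocks.append("\n".join(current))
                current = []
            if is_heading:
                blocks.append(line.strip())
        else:
            current.append(line.rstrip())
    if current:
        blocks.append("\n".join(current))
    return blocks


def segment_key(block):
    """Stable key for a block, so an unchanged block can keep its translation."""
    return hashlib.sha256(block.encode("utf-8")).hexdigest()


def blocks_needing_translation(old_text, new_text):
    """Return the blocks of new_text whose exact content is absent from old_text."""
    known = {segment_key(b) for b in split_markdown_blocks(old_text)}
    return [b for b in split_markdown_blocks(new_text) if segment_key(b) not in known]


old = "# Title\n\nFirst paragraph.\n\nSecond paragraph."
new = "# Title\n\nFirst paragraph, edited.\n\nSecond paragraph."
changed = blocks_needing_translation(old, new)
```

Here only the edited paragraph ends up in `changed`; the heading and the second paragraph keep their existing translations. With a single segment for the whole field, any edit would invalidate the whole translation.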

jeriox mentioned this issue Jul 5, 2022

Use wagtail-markdown instead of custom code fsr-de/myHPI#116 (merged)
