Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a documented way to customize slugs generated by SlugInput / urlify #11916

Open
drimacus182 opened this issue May 1, 2024 · 13 comments

Comments

@drimacus182
Copy link
Contributor

I've just tried new 6.1 and as I can see window.URLify is deprecated and is not used by SlugInput internally.

The problem is that current SlugInput urlify function is although universal but not entirely correct for every language. For instance, for Ukrainian it produces incorrect results according to transliteration rules:

sluginput_ukrainian

Prior to 6.1 I was solving this using a tailored URLify function injected into admin. Now it lives inside a stimulus controller on a client js side, which is not straightforward to customize.

It would be great to have a documentation describing best practices on how this could be achieved.

@gasman gasman added this to the 6.1.1 milestone May 1, 2024
@lb-
Copy link
Member

lb- commented May 2, 2024

Thanks for the report @drimacus182 - while the previous approach to override the global function window.URLify was never documented I can see how having some capability to override the behaviour here would be helpful.

I think we should try to consider a few ways to resolve this.

1. Potentially incorrect transliteration

You will have to help us out here, could you please advise the exact characters , in text (not just a screenshot), that you think should be updated to be transliterated differently?

All of this transliteration is based on the Django urlify.js code.

We have now pulled this into a more manageable config file urlify.config.json as part of our refactor.

Maybe we should just try to fix this in our code (plus we could suggest a similar fix in Django's code), so that the characters are handled better.

# current characters configured
"UKRAINIAN_MAP": [
  ["Є", "Ye"],
  ["І", "I"],
  ["Ї", "Yi"],
  ["Ґ", "G"],
  ["є", "ye"],
  ["і", "i"],
  ["ї", "yi"],
  ["ґ", "g"]
]

2. Another unofficial work around

It should be possible to add an event listener to the slug input against the event 'w-sync:apply' and then stop propagation and re-dispatch that same event with the desired transliteration.

This is not as clean as just mutating the global but should be enough to get you started with specific transliteration requirements.

This would still be an unofficial work around though, and could break in any future release.

# .../static/js/admin.js

document.addEventListener(
  'w-sync:apply',
  (event) => {
    // return if the event is not from the slug field
    if (event.target.name !== 'slug') return;
    // return if the event has already been processed
    if (event.detail.processed) return;

    console.log('w-sync:apply', event);

    // stop the current event from propagating
    event.preventDefault();
    event.stopPropagation();

    // get the value to apply and mutate to suit our needs, dispatch a new event
    const value = event.detail.value || ''; // will be the original title's value
    event.target.dispatchEvent(
      new CustomEvent('w-sync:apply', {
        detail: {
          // change any values from the title field before it gets to the slugify controller
          value: value.replaceAll('ґ', 'g'),
          processed: true,
        },
      }),
      { bubbles: true, cancelable: true },
    );
  },
  { capture: true }, // ensure this listener captures the event first
);
# ... /wagtail_hooks.py
from django.utils.safestring import mark_safe
from django.templatetags.static import static

from wagtail import hooks

@hooks.register('insert_global_admin_js')
def global_admin_js():
    src = static('js/admin.js')
    return mark_safe(f'<script src="{src}"></script>')

3. Add ability to intercept the urlify/slugify more easily with events

We should probably dispatch a change event when the slugify/urlify triggers, this could be intercepted easily and any further mutations of the field's value can be made.

Additionally we could add custom events for when urlify/slugify are about to happen and allow this value to be changed similar to how we do for the image/document title field. https://docs.wagtail.org/en/latest/advanced_topics/images/title_generation_on_upload.html#images-title-generation-on-upload


I would rather us try to get the transliteration right first though, but I am not sure what wider input is needed here from the community.

Generally transliteration is a error-prone process. We currently run into a similar problem with this for the form builder usage of cautious_slugify, where some users want this to work like transliteration but some cases are not handled.

@lb-
Copy link
Member

lb- commented May 2, 2024

I have also posted in the #translators chat to get some other thoughts on all this.

For now @drimacus182 - can you test the work around above and see if that gets you to a point where you can use 6.1 in your production code. Any other suggestions welcome.

@drimacus182
Copy link
Contributor Author

drimacus182 commented May 3, 2024

Hi @lb-, thanks for looking into this. I'll try to add my thoughts:

Approach 1

I understand the desire to make slugs work out of the box, but I see a problem with this universal approach. The existing Django's urlify.js and wagtail's urlify.config.json you've mentioned are just mapping letters one by one, without any knowledge of what language the original string is in. E.g. for a Cyrillic character, it starts with RUSSIAN_MAP and proceeds to UKRAINIAN_MAP only if the character is not from the Russian alphabet. As a result, we get correct transliteration for Russian, but partially incorrect for Ukrainian. I mean, there are letters that are common to Russian and Ukrainian but have different phonetics in those languages and hence different transliterations. I suppose the same is true for other Cyrillic languages like Belarusian, Bulgarian, Serbian etc.

Example from urlify.config.json:

"RUSSIAN_MAP": [
  //...
  ["г", "g"], // should be "h" for Ukrainian
  ["и", "i"], // should be "y" for Ukrainian
  // ...

Apart from phonetics, there are different transliteration standards possible for different contexts and languages. So there is no simple "one-size-fits-all" solution for every use case. Such a solution would require knowledge of what language the original string is in and will introduce lots of complexity.

Just in case, leaving a link for the official transliteration standard for Ukrainian

Approach 2

I'm generally fine with the 2nd solution, and thank you for the complete code sample. I'd try to use this with my deployment, but can't promise to do this quickly. It feels a bit hacky for me though, especially considering it is unofficial and can break in future releases.

Approach 3

we could add custom events for when urlify/slugify are about to happen and allow this value to be changed

It sounds most satisfying to me since my initial need was to have an ability to modify the internals of SlugController.urlify method in a documented and reliable way. As I understand, it won't be the "modification" of the internal slugify method, but rather an expected placeholder to inject a custom slugify logic. I'm new to Stimulus, and can't fully understand everything that's happening inside controllers, but I suppose the value from the title input is passed to the slug input as is, and the event handler you propose will be able to get an original unmodified string and set the slugified value.

To summarise, I think it's straightforward for a developer to write a transliteration function that will fit all their needs. On the contrary, it's almost impossible to create a single transliteration logic that will make everyone happy. So the solution should be any of those that allow a developer to define their custom logic.

@lb-
Copy link
Member

lb- commented May 5, 2024

@drimacus182 here's a proposed client-side API for overriding this behaviour.

document.addEventListener('w-clean:urlify', event => {
  const {
    apply /* A function you can use, once, to apply a custom value */,
    currentValue /* Current value (e.g. 'рецепт чізкейку') */,
    newValue /* Default transliteration (e.g. 'hecept czizkejku' ) */
  } = event.detail;

  // If you want to stop the new value from being used and make your own...
  event.preventDefault();
  apply(myCustomFunction(currentValue) /* e.g. 'recept czizkejku' */);
});

A similar event would be available for w-clean:slugify. These give you two levels of control, first - just stopping the change all together, second - applying a custom value.

Additionally, we would add change events to the slug field, so you can listen for a change after it's been applied and do something else then.

This approach leverages the DOM events API, avoids globals and gives us room to enhance later with additional data being provided to the event. You'll also have access to the input with event.target if you want to do more complex things.

However, this would not easily allow any Async usage, as the preventDefault call needs to be used. It would be possible to still use this method to do a fetch call and update the field after the response comes back though, just may be a bit more involved.

Finally, this involves us updating the SlugController/w-slug to CleanController/w-clean which I think is a better name for it and it lets us add new behaviour to this controller later.

@ismayil-ismayilov
Copy link

ismayil-ismayilov commented May 14, 2024

Hey, @lb- . If possible, could you please also consider Azerbaijani letters? This would also cover many Turkic based languages at once (including Turkish). Azerbaijani has more letters than others in average.
Here is the map:

"AZERBAIJANI": [
  ["ə", "e"],
  ["Ə", "E"]
  ["ı", "i"],
  ["I", "I"],
  ["ö", "o"],
  ["Ö", "O"],
  ["ü", "u"],
  ["Ü", "U"],
  ["ğ", "g"],
  ["Ğ", "G"],
  ["ç", "c"],
  ["Ç", "C"],
  ["ş", "s"],
  ["Ş", "S"],
]

@laymonage
Copy link
Member

laymonage commented May 14, 2024

Perhaps instead of urlify.config.json, we should inline it as a Python dictionary and allow developers to extend it via a hook/setting? From there, we use json_script to render it in the template, to then be parsed by the URLify function. I feel like this would be a common enough use case that would make sense to be configurable rather than having to reach out to custom JS code.

@lb-
Copy link
Member

lb- commented May 14, 2024

I like that idea @laymonage - we could still allow the JS override in the future but I think having a non-JS way to do simpler customisations would be a better approach initially.

Could you flag this for the core team discussion, see if there are any other ideas/inputs.

@lb- lb- changed the title Need a documented way to customize slugs generated by SlugInput in 6.1 Provide a documented way to customize slugs generated by SlugInput / urlify May 14, 2024
@allcaps
Copy link
Member

allcaps commented May 15, 2024

I've made a suggestion on Slack, and cross-post it here to keep the ref:

Would it be an idea to have Wagtail provide a slugify endpoint? This way, the frontend could be 'dumb', and the backend could have a new Wagtail hook to override the slugify behaviour.
This allows a "global" slugify function that both frontend and backend can use. Keeping slugify results the same between front and back.
It would also be straightforward for integrators to customise, and easy for Wagtail to support.

See Slack for the full discussion: https://wagtailcms.slack.com/archives/CTKN6UXKN/p1714721188163579?thread_ts=1714686970.068729&cid=CTKN6UXKN

@Stormheg
Copy link
Member

The use case seems common enough to me to warrant a documented approach indeed. If we go the custom code route, I would prefer a custom urlify config file over custom JS in the admin for simplicity.

Not convinced that a universal approach is not feasible. If the mapping of characters to the incorrect language raised by drimacus182 is the only issue, forgive my naïveté, but don't we already have information about the original locale available? Example: if the editor of a site configured for Ukrainian created a page, the locale of that page would be Ukrainian. We can use this information to consult the Ukrainian character map first. If it's a Russian site we would use the Russian character map first. Wouldn't that solve the issue? No need for customizing, it would work out of the box. Or are there other issues I'm not aware of? Would love to hear your input @drimacus182.

I don't think making the slugify feature a little smarter would introduce that much extra complexity to support. We'd restructure the urlify config to support looking up maps by language code, pull the current language code from the page and use that as 'preferred locale' when transliterating. All easy things to say from the comfort of my chair, I'm not familiar with SlugInput or its inner workings.


We could decide we don't want the added complexity and leave it up to developers to implement their own slugify mechanism, but that seems a little unfair to me. I believe it should work correctly out of the box if at all feasible.

@laymonage
Copy link
Member

@Stormheg I agree that changing the mapping to use the locale's language code as the key is a better approach. It may result in a bigger character map especially if there are multiple languages that could utilise the same mapping, though, so if we care about the dictionary size then perhaps we could also add support for aliases to let multiple languages use the same mapping.

However, I think another issue is that Wagtail's character map may be incomplete. It's true that anyone can submit a pull request to add support for more characters/languages, but that means they'd have to wait until the next release to be able to use it. If there was a way to customise the mapping, developers can do it in the mean time until they upgrade for the next release.

@Stormheg
Copy link
Member

We discussed this during the core team meeting

Summarizing;

  • Using the wrong character map to transliterate is a bug. We should fix that. Preferably using a library that already has this solved and contains additional transliteration maps that may currently be missing from our urlify.config.json.
  • We are a little divided about whether or not to officially support customising the slugify feature. I'm in the nay-camp, my personal take on it is that it wouldn't be used much because I believe people would only customize it when they encounter a bug that should be fixed in Wagtail. People can resort to unoffical JS workarounds to bridge the gap until a patched release.

@gasman
Copy link
Collaborator

gasman commented May 15, 2024

I agree with @Stormheg. Delivering a solution that "just works" for Ukrainian transliteration solves the immediate problem - if any other situations arise in future where a custom function is necessary, we can cross that bridge when we come to it. In the meantime, the workaround that @lb- has provided should ensure that this is not a blocker for anyone upgrading to 6.1, so I'll bump the milestone to 6.2.

@gasman gasman modified the milestones: 6.1.1, 6.2 May 15, 2024
@drimacus182
Copy link
Contributor Author

@Stormheg

We'd restructure the urlify config to support looking up maps by language code, pull the current language code from the page and use that as 'preferred locale' when transliterating

I'm generally fine with this solution and will use it in my deployment if Wagtail team is comfortable creating and maintaining character maps for different languages.

What I'm a bit concerned with is the lack of fine control for a developer to make customisations. I can imagine scenarios when you may want slightly different transliteration logic for Pages and Snippets (i.e. blog posts and authors). And some other urlify customisations like removing stop words.

These are a bit hypothetical though, and can be addressed using the workaround @lb- has provided above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants