Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plugins/url pattern #551

Open
wants to merge 3 commits into
base: rewrite-with-urlplugin
Choose a base branch
from

Conversation

vivien
Copy link
Member

@vivien vivien commented Feb 8, 2015

This PR mainly factorizes regexps (case insensitivity and protocol prefix) and removes redundant matching code from js/plugins.js. This makes it more convenient to write URL Plugins. Parts have been discussed in #543.

There's no need to check for every GET parameters (using the line ending
'$' character), we can just add a word boundary (the '\b' character)
instead. Thus, revert commit b939bc7.
Thanks to UrlPlugin, there (often) is no need to change for the protocol
prefix (usually specified as \^https?:\/\/ with eventual sub domain).

Thus, replace any occurrence of protocol + sub domain check with a word
boundary ('\b') character.
Extend UrlPlugin to take an optional regexp pattern, and pass the
matches array to the plugin callback.

This allows to factorize the case insensitivity ('i' flag), and get rid
of redundant match check in plugins:

    var regexp = /.../;
    var match = url.match(regexp);
    if (match) {
        ...
    }

Refs glowing-bear#543 (part 3)
@lorenzhs
Copy link
Member

This is quite cool. I'll have to take a closer look later!

@lorenzhs
Copy link
Member

As mentioned on IRC, these regex escaping issues are super-annoying, and to prevent mistakes we should add tests that ensure certain things are not recognized by the plugins, like the youtube1com/foo example or foo.bar/youtube.com/baz

@lorenzhs
Copy link
Member

cherry-picked 06e3a85 onto master

@lorenzhs
Copy link
Member

I tested the following and it works in Chrome and Firefox:

diff --git a/js/plugins.js b/js/plugins.js
index 423360a..6b11bb5 100644
--- a/js/plugins.js
+++ b/js/plugins.js
@@ -34,7 +34,12 @@ var urlRegexp = /(?:ftp|https?):\/\/\S*[^\s.;,(){}<>]/g;
 var UrlPlugin = function(name, urlPattern, urlCallback) {
     return {
         contentForMessage: function(message) {
-            var regexp = new RegExp(urlPattern, 'i');
+            var regexp;
+            if (urlPattern instanceof RegExp) {
+                regexp = new RegExp(urlPattern.source, 'i');
+            } else {
+                regexp = new RegExp(urlPattern, 'i');
+            }
             var urls = message.match(urlRegexp);
             var content = [];

@cormier
Copy link
Member

cormier commented Feb 18, 2015

@vivien: Since rewrite-with-urlplugin has been merged in master, I think that this should be rebased on master as well. Other than this, I find it most excellent and would gladly merge it!

@vivien
Copy link
Member Author

vivien commented Feb 18, 2015

@lorenzhs had good points on the review. We could indeed merge this work and continue on top of it if we need parts of it, otherwise I can propose a v2. Unfortunately I'm quite busy these days, but in the big picture, here's what I have in mind:

  1. Get rid of regexp in URL plugins. Hash an URL and give it to the plugin. This makes much clearer and less cryptic URL handling. See: https://gist.github.com/vivien/50f329e36d404dae00ab
  2. Do not parse URLs from a message for every plugin. This means having a single UrlPlugin with a list of known URL handlers. See: https://gist.github.com/vivien/500e8abbf8dbfa1efc1e

The only concern with 2 is that I don't know how to return the plugin name.
What do you guys think?

@lorenzhs
Copy link
Member

lorenzhs commented Mar 1, 2015

I've tried to come up with something easier than the if blocks, but the potential conditions are too complex to formulate them as a simple object. The youtube condition would look something like this monstrosity and be even less readable than the if blocks:

[
    {
        domain: ["com", "youtube"],
        tokens: [
            {conditions: [{path: 0, value: "watch"}], param: "v"},
            {conditions: [{path: 0, value: "embed"}], path: 1},
        ]
    },
    {
        domain: ["be", "youtube"],
        tokens: [{path: 0}]
    }
]

So maybe we should consider multiple (domain, path) pairs of regexes? That way, we could capture parameters a lot better and still not end up with ten lines of URL foobar.

var regexes = [
    [/\byoutube\.com/, /^watch?([a-zA-Z0-9]+=[a-zA-Z0-9]+)(&[a-zA-Z0-9]+=[a-zA-Z0-9]+)*/],
    [/\byoutube\.com/, /^embed/[a-zA-Z0-9]+/],
    [/\byoutu\.be/, /^[a-zA-Z0-9^]/]];

-- but then it becomes a pain to figure out which of the parameters was v (and also the above doesn't account for #t=10 etc)

I don't have a good solution :(

@lorenzhs
Copy link
Member

Was there a reason why we haven't merged this yet? If we added the regex passing as noted in #551 (comment) I think this would be really nice.

@vivien
Copy link
Member Author

vivien commented Mar 31, 2015

That can be a first shot indeed. I'll try to check that asap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants