feature request(http-redirected): Allowed redirects #789

Open
Kristinita opened this issue Jan 19, 2024 · 1 comment

Kristinita commented Jan 19, 2024

1. Summary

It would be nice if linkchecker users could specify allowed redirects, for which linkchecker would not emit an http-redirected warning.

2. Example of desired behavior

For example, Reddit appends a query string matching \?rdt=\d{5} to every link whenever linkchecker requests a page on the Reddit site. I get a warning like this:

linkchecker --recursion-level 0 https://www.reddit.com/r/webdev/comments/48z7jz/do_you_take_into_account_those_who_disable/d0nxftd/

URL        `https://www.reddit.com/r/webdev/comments/48z7jz/do_you_take_into_account_those_who_disable/d0nxftd/'
Real URL   https://www.reddit.com/r/webdev/comments/48z7jz/do_you_take_into_account_those_who_disable/d0nxftd/?rdt=37228
Check time 1.003 seconds
Warning    [http-redirected] Redirected to
           `https://www.reddit.com/r/webdev/comments/48z7jz/do_you_take_into_account_those_who_disable/d0nxftd/?rdt=37228'
           status: 302 Found.
Result     Valid: 200 OK

The Reddit URL can be represented as a regular expression:

^(https:\/\/www\.reddit\.com.+)$

The regular expression for the Reddit REAL URL:

^$1\?rdt=\d{5}$

where $1 is the text captured by the first group of the URL regular expression.

linkcheckerrc could get a new option, for example allowed-redirects:

allowed-redirects=
	# "URL" "REAL URL"
	"^(https:\/\/www\.reddit\.com.+)$" "^$1\?rdt=\d{5}$"

If the redirected REAL URL matches the expression ^$1\?rdt=\d{5}$, linkchecker will not return a warning, as in this case:

URL        `https://www.reddit.com/r/webdev/comments/48z7jz/do_you_take_into_account_those_who_disable/d0nxftd/'
Real URL   https://www.reddit.com/r/webdev/comments/48z7jz/do_you_take_into_account_those_who_disable/d0nxftd/?rdt=12345

But if the redirected REAL URL doesn’t match the regular expression ^$1\?rdt=\d{5}$, as in the next case, linkchecker will return a warning:

URL        `https://www.reddit.com/r/webdev/comments/48z7jz/do_you_take_into_account_those_who_disable/d0nxftd/'
Real URL   https://www.spam.site/redirect-to-spam-site
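A minimal sketch of the matching I have in mind, written in Python. Everything here is an assumption for illustration, not existing linkchecker code: the function name is_allowed_redirect, the rule format (a pair of pattern strings), and the choice to replace each $N in the second pattern with the regex-escaped text of group N from the first.

```python
import re

# Hypothetical placeholder syntax: "$1", "$2", ... in the REAL URL pattern
# refer to groups captured by the URL pattern.
PLACEHOLDER = re.compile(r"\$(\d+)")


def is_allowed_redirect(url, real_url, rules):
    """Return True if the redirect url -> real_url matches any rule.

    rules is a list of (url_pattern, real_url_template) string pairs.
    """
    for url_pattern, real_url_template in rules:
        source = re.match(url_pattern, url)
        if source is None:
            continue  # this rule is about other URLs

        def substitute(placeholder, source=source):
            # Escape the captured text so that "." or "?" inside the
            # original URL is matched literally, not as regex syntax.
            return re.escape(source.group(int(placeholder.group(1))))

        real_url_pattern = PLACEHOLDER.sub(substitute, real_url_template)
        if re.match(real_url_pattern, real_url):
            return True
    return False


rules = [(r"^(https:\/\/www\.reddit\.com.+)$", r"^$1\?rdt=\d{5}$")]
url = ("https://www.reddit.com/r/webdev/comments/48z7jz/"
       "do_you_take_into_account_those_who_disable/d0nxftd/")

print(is_allowed_redirect(url, url + "?rdt=12345", rules))  # allowed redirect
print(is_allowed_redirect(
    url, "https://www.spam.site/redirect-to-spam-site", rules))  # still a warning
```

When the function returns False, the [http-redirected] warning would be reported as it is today.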

3. More examples of desired behavior

3.1. VK

allowed-redirects=
	"^(https:\/\/)(vk\.com.+)$" "^$1m\.$2$"

No warning:

URL        `https://vk.com/psychologist_kira_k'
Real URL   https://m.vk.com/psychologist_kira_k

Warning:

URL        `https://vk.com/psychologist_kira_k'
Real URL   https://anotherspamsite.com/redirect-to-spam-site

3.2. Stack Overflow

allowed-redirects=
	"^(https:\/\/ru\.stackoverflow\.com\/).+$" "^$1questions\/.+$"

No warning:

URL        `https://ru.stackoverflow.com/a/544861/199934'
Real URL   https://ru.stackoverflow.com/questions/537362/%d0%98%d0%ba%d0%be%d0%bd%d0%ba%d0%b8-%d1%84%d0%b0%d0%b9%d0%bb%d0%be%d0%b2-%d0%b2-%d1%81%d0%b0%d0%b9%d0%b4%d0%b1%d0%b0%d1%80%d0%b5-sublime-text-3/544861#544861

Warning:

URL        `https://ru.stackoverflow.com/a/544861/199934'
Real URL   https://ru.stackoverflow.com/this-answer-is-deleted

3.3. TOML

allowed-redirects=
	"^(https:\/\/toml\.io\/en\/)latest$" "^$1v\d\.\d\.\d$"

No warning:

URL        `https://toml.io/en/latest'
Real URL   https://toml.io/en/v1.0.0

Warning:

URL        `https://toml.io/en/latest'
Real URL   https://tomlnewsite.com
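All of the allowed-redirects values above have the same shape: one rule per line, two double-quoted patterns per rule, and optional # comment lines. A rough sketch of how such a value could be split into rules follows. The function name parse_allowed_redirects is hypothetical, and the sketch assumes that the patterns themselves never contain a double quote.

```python
import re

# Matches one double-quoted field; assumes patterns contain no '"' themselves.
QUOTED = re.compile(r'"([^"]*)"')


def parse_allowed_redirects(value):
    """Split a multi-line option value into (url_pattern, real_url_pattern) pairs."""
    rules = []
    for raw_line in value.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = QUOTED.findall(line)
        if len(fields) != 2:
            raise ValueError(f"expected two quoted patterns, got: {line!r}")
        rules.append((fields[0], fields[1]))
    return rules


value = r'''
	# "URL" "REAL URL"
	"^(https:\/\/)(vk\.com.+)$" "^$1m\.$2$"
	"^(https:\/\/toml\.io\/en\/)latest$" "^$1v\d\.\d\.\d$"
'''
for url_pattern, real_url_pattern in parse_allowed_redirects(value):
    print(url_pattern, "->", real_url_pattern)
```

Quoting both patterns keeps the format unambiguous even if a pattern contains spaces.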

4. Reasons why the feature is needed

In some cases, linkchecker users may find that it’s better not to use the redirected links. In my cases:

  1. Reddit case: linkchecker users would have to spend time appending endings like ?rdt=37228 to Reddit links. I just want to link to Reddit from my site without having to add something to every link.
  2. VK case: VK redirects the linkchecker request to the mobile version of the site, https://m.vk.com. I think it’s better to link to the desktop version by default. VK automatically redirects users of mobile devices from the desktop version to the mobile version, but it does not automatically redirect users of desktop devices from the mobile version to the desktop version.
  3. Stack Overflow case: I prefer links like https://ru.stackoverflow.com/a/544861/199934 because in this format they stay unchanged when the title of a Stack Overflow question changes. Stack Overflow editors may change question titles quite often, and the redirected links may then become invalid.
  4. TOML case: I wanted to link specifically to the latest version of the TOML specification, https://toml.io/en/latest. When the next version of TOML (2.0.0, 1.2.0 or 1.0.1) is released, a link to version 1.0.0 will no longer be a link to the latest version.

5. Alternatives that are not the best ideas

5.1. Just use “ignorewarnings=http-redirected”

I think this is a very bad idea. If an external site no longer works and its links lead to a spam site, or to content that differs from what was there when the linkchecker user inserted the link, http-redirected gives linkchecker users important information. In my case, http-redirected warnings helped me fix about a hundred outdated links on my site.

5.2. Just add your URL to “ignore=”

I don’t think that ignoring is the best idea. If I add the URL to the ignore list, I will no longer receive any errors and warnings about that URL. I won’t be aware of any problems with this URL.

Thanks.

Contributor

cjmayo commented Jan 19, 2024

Is this a duplicate/expansion of #782? Please have a look at PR #788 - better still try it! - and see if it meets your needs.
