feature request(option): return exit(1) if links from $n domains have errors #787

Open
Kristinita opened this issue Jan 17, 2024 · 0 comments


1. Summary

It would be nice if linkchecker had an option that lets the user define how many errors from different domains are required before linkchecker returns exit code 1.

2. Example of desired behavior

  1. linkcheckerrc:

    # [INFO] Return exit code 1 if there are errors from 5 or more different domains.
    # With 0-4 such errors, return exit code 0.
    mindomainerrorsforexit1=5
  2. CLI command:

    # [INFO] Run it on UNIX shells or on Cygwin on Windows
    #
    # [INFO] Get error 403
    $ linkchecker https://www.tandfonline.com/doi/abs/10.1080/17539153.2013.855385
    
    Start checking at 2024-01-17 05:11:11+001
    
    URL        `https://www.tandfonline.com/doi/abs/10.1080/17539153.2013.855385'
    Real URL   https://www.tandfonline.com/doi/abs/10.1080/17539153.2013.855385
    Check time 0.979 seconds
    Result     Error: 403 Forbidden
    
    Statistics:
    Downloaded: 0B.
    Content types: 0 image, 1 text, 0 video, 0 audio, 0 application, 0 mail and 1 other.
    URL lengths: min=47, max=64, avg=55.
    
    That's it. 2 links in 2 URLs checked. 0 warnings found. 1 error found.
    Stopped checking at 2024-01-17 05:11:13+001 (1 seconds)
    
    
    # [INFO] Get exit code of the previously executed command:
    # https://www.cyberciti.biz/faq/bash-get-exit-code-of-command/
    #
    # [INFO] Return exit code 0, because the option “mindomainerrorsforexit1”
    # in the “linkcheckerrc” file specifies that exit(1) requires 5 errors.
    # In this example “1 error found”. 1 < 5.
    #
    # [INFO] Currently, linkchecker returns exit status 1 in this case
    $ echo $?
    0
    
    
    # [INFO] Run linkchecker for a non-existent internal link
    $ linkchecker http://localhost:3014/NonExistentPage
    
    Start checking at 2024-01-17 05:48:13+001
    
    URL        `http://localhost:3014/NonExistentPage`
    Real URL   http://localhost:3014/NonExistentPage
    Check time 0.006 seconds
    Size       154B
    Result     Error: 404 Not Found
    
    Statistics:
    Downloaded: 0B.
    Content types: 0 image, 1 text, 0 video, 0 audio, 0 application, 0 mail and 0 other.
    URL lengths: min=37, max=37, avg=37.
    
    That’s it. 1 link in 1 URL checked. 0 warnings found. 1 error found.
    Stopped checking at 2024-01-17 05:48:14+001 (1 seconds)
    
    
    # [INFO] Return exit code 1 despite the option “mindomainerrorsforexit1”.
    # All internal links still must be valid in any case.
    # Unlike external links, the site owner can and should control them.
    $ echo $?
    1

3. Reason why the feature is needed

Without this feature, it is practically impossible to use linkchecker with the --check-extern option in continuous integration (for example, on Jenkins, Travis or AppVeyor) for real projects with thousands of external links. In the real world, it never happens that 100% of thousands of external links open without errors, and the website owner has no control over temporary problems on other sites. Because linkchecker currently returns exit code 1 as soon as a single error is found, such a build would fail on almost every run.

4. Response to possible counterarguments

4.1. Use “--ignore-url”

I use the --ignore-url option for URLs that permanently produce linkchecker errors, such as Cloudflare-protected URLs. However, external websites can also have temporary errors or be temporarily unavailable. For example, when I tested the external links of my site the day before yesterday, I got 3 errors. When I tested my site yesterday, those 3 errors were gone, but 2 new errors appeared. When a user has thousands of external links on a site, the likelihood that all of them open without errors is small. That’s why an option like mindomainerrorsforexit1 is needed.

5. Errors from the same domain

If behavior like this is ever implemented, it would be nice if linkchecker counted all errors from the same domain as 1 error.

For example, I have such links on my website:

<!-- [INFO] Part of the “KiraExample.html” -->

<a href="https://linkchecker.github.io/linkchecker/">linkchecker documentation</a>
<a href="https://linkchecker.github.io/linkchecker/install.html">linkchecker install</a>
<a href="https://linkchecker.github.io/linkchecker/faq.html">linkchecker FAQ</a>
<a href="https://linkchecker.github.io/linkchecker/contributing.html">linkchecker Contribution Guide</a>

If I run linkchecker http://localhost:3014/KiraExample.html while the site https://linkchecker.github.io is temporarily unavailable, I will get 4 errors. In such cases it would be nice to get 1 error instead of 4.

Otherwise, if a user has multiple external links to a specific site and that site is temporarily down, the user can quickly reach the error limit defined in the mindomainerrorsforexit1 option. That’s why I wrote in my issue about “links from $n domains”.
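To make the counting rule concrete, here is a minimal Python sketch of the behavior I have in mind. It is not linkchecker code: the function names `failing_domains` and `exit_code`, the `internal_hosts` parameter and the way failed URLs are passed in are all my assumptions for illustration. It only shows how errors could be collapsed to one per domain, and how internal links could still force exit code 1.

```python
from urllib.parse import urlsplit


def failing_domains(failed_urls):
    """Return the set of distinct hostnames among the failed URLs."""
    hosts = set()
    for url in failed_urls:
        host = urlsplit(url).hostname
        if host:  # skip URLs without a host, e.g. mailto: links
            hosts.add(host)
    return hosts


def exit_code(failed_urls, internal_hosts, min_domain_errors):
    """Hypothetical exit-code rule sketched from this feature request.

    - Any failed link on an internal host returns 1.
    - Otherwise return 1 only if the number of distinct failing
      external domains reaches min_domain_errors.
    """
    internal = set(internal_hosts)
    domains = failing_domains(failed_urls)
    if domains & internal:
        return 1
    external = domains - internal
    return 1 if len(external) >= min_domain_errors else 0


# The four links from "KiraExample.html" all point to one domain,
# so they count as a single failing domain.
docs_links = [
    "https://linkchecker.github.io/linkchecker/",
    "https://linkchecker.github.io/linkchecker/install.html",
    "https://linkchecker.github.io/linkchecker/faq.html",
    "https://linkchecker.github.io/linkchecker/contributing.html",
]
print(len(failing_domains(docs_links)))
print(exit_code(docs_links, {"localhost"}, 5))
```

With mindomainerrorsforexit1=5, the four failed links above would count as 1 failing domain and the run would still return exit code 0, while a single failed link on localhost would return 1.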

6. Not helped

If linkchecker users can already get the behavior described in my feature request, please add to the linkchecker documentation what users must do to get it. In the linkchecker documentation about exit codes, I found only the paragraph “RETURN VALUE”. I also couldn’t find a way to get the desired behavior by searching:

  1. This issue tracker
  2. Google

Thanks.
