Broken Link Crawler

This was built for my tutorial on building a dead link checker, so its scope has been kept quite small.

Let's say I have a website and I want to find any dead links and images on it.

$ python deadseeker.py 'https://healeycodes.com/'
> 404 - https://docs.python.org/3/library/missing.html
> 404 - https://github.com/microsoft/solitare2

The website is crawled, and a request is sent to every URL found in an href or src attribute. Errors are reported. This bot doesn't observe robots.txt, but you should.
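
To make the crawl step concrete, here is a minimal sketch of how href and src attributes could be pulled out of a page using only the Python standard library. It is an illustration and not necessarily how deadseeker.py does it: the names LinkCollector, collect_links and allowed_by_robots are hypothetical, and example.com is a placeholder. The last helper shows one way to respect robots.txt, which this bot does not do.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every href and src attribute on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative paths against the page's own URL.
                self.found.append(urljoin(self.base_url, value))


def collect_links(base_url, html):
    """Hypothetical helper: return all link targets found in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.found


def allowed_by_robots(robots_txt, url, agent="*"):
    """Hypothetical helper: check a URL against the text of a robots.txt file."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(agent, url)
```

Each URL returned by collect_links would then be requested, and any that fail or return an error status such as 404 would be reported. Checking allowed_by_robots before each request is one way to be a better-behaved bot.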

It is not a clever bot. But it is a good bot.

Accepting (small) PRs and issues!

