SCP Foundation Web Scraper

This is a web scraper that pulls SCP entity pages from the SCP Wikidot website and returns them as JSON (see the schema below). It is meant to be used in tandem with my personal scp-api, but feel free to use it however you like.

TODO Next:

Scrape the authors' names and credits as well
Add a procedure to collect all addenda and documents and store them in a separate JSON file

Requirements

Jupyter Notebook (plain Python should work too)
The requests and bs4 packages (bs4 is installed as beautifulsoup4), plus the standard-library json and re modules
A will to live

Get Started

Run the notebook cells that define the helper functions, then call the scrape-scp function, which takes the ID of the entity you want returned. It can be called inside a for loop for mass scraping.
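A mass-scraping loop could look roughly like the sketch below. It assumes the notebook's scrape function can be passed around as a Python callable that takes one ID and returns a dict. The names `scrape_many`, `scrape_fn`, and `delay_seconds` are hypothetical and exist only for illustration; they are not part of the notebook.

```python
import time
from typing import Any, Callable, Dict, Iterable, List


def scrape_many(
    ids: Iterable[Any],
    scrape_fn: Callable[[Any], Dict[str, Any]],
    delay_seconds: float = 1.0,
) -> List[Dict[str, Any]]:
    """Call scrape_fn once per ID and collect the returned records.

    scrape_fn stands in for the notebook's scrape function (assumption:
    it takes one ID and returns a dict shaped like the scrape schema).
    """
    records = []
    for index, entity_id in enumerate(ids):
        if index > 0:
            time.sleep(delay_seconds)  # pause between requests to go easy on the wiki
        try:
            records.append(scrape_fn(entity_id))
        except Exception as error:  # one bad page should not stop the whole run
            print(f"Skipping {entity_id}: {error}")
    return records
```

Catching every exception keeps a long run alive when a single page fails to parse, at the cost of hiding the traceback. If you need to debug a specific entity, call the scrape function on that ID directly.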

Scrape Schema

Currently scrapes and returns as follows:

{
    "id": "str",
    "class": "str",
    "containment": "str",
    "description": "str",
    "more_info": "json"
}

e.g.:

{
    "id": "SCP-343",
    "class": "Safe",
    "containment": "SCP-343 resides in a 6.1m by 6.1m (20 ft by...",
    "description": "SCP-343 is a male, seemingly race-less, humanoid in...",
    "more_info": {
        "Addendum #343-1": "SCP-343, colloquially nicknamed...."
    }
}
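
If you consume the scraped records elsewhere, a quick shape check can catch pages that did not parse cleanly. The sketch below tests a record against the five keys listed in the schema. `EXPECTED_FIELDS` and `find_schema_problems` are hypothetical helper names, not part of the notebook, and treating "more_info" as a JSON object (a Python dict) is an assumption based on the example record.

```python
import json
from typing import Any, Dict, List

# Expected top-level keys and their Python types, taken from the scrape schema.
# Assumption: "more_info" is a JSON object, so it loads as a dict.
EXPECTED_FIELDS = {
    "id": str,
    "class": str,
    "containment": str,
    "description": str,
    "more_info": dict,
}


def find_schema_problems(record: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the record fits."""
    problems = []
    for key, expected_type in EXPECTED_FIELDS.items():
        if key not in record:
            problems.append(f"missing key: {key}")
        elif not isinstance(record[key], expected_type):
            problems.append(f"{key} should be {expected_type.__name__}")
    return problems


sample = json.loads("""
{
    "id": "SCP-343",
    "class": "Safe",
    "containment": "SCP-343 resides in a 6.1m by 6.1m (20 ft by...",
    "description": "SCP-343 is a male, seemingly race-less, humanoid in...",
    "more_info": {"Addendum #343-1": "SCP-343, colloquially nicknamed...."}
}
""")
print(find_schema_problems(sample))  # prints []
```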

rakhadjo/scp-scraper
