
Extracts information about all games published on Steam through its Web API and stores it in JSON format. It also collects extra data from SteamSpy.

I used this code to generate this dataset: 'Steam Games Dataset'.

Requisites 🔧

Python 3.8
requests and argparse.

pip3 install requests argparse

Usage 🚀

Start generating data simply with:

python SteamGamesScraper.py

The first time the script runs, the file 'appplist.json' is created with all the IDs that Steam provides (>140K). On later runs, that file is used instead of requesting all the data again. If you want to get new IDs, simply delete the file 'appplist.json'.
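The caching behaviour described above can be sketched roughly as follows. This is only an illustration, not the scraper's actual code: the function name `load_or_fetch` and the `fetch` callable (standing in for whatever request retrieves the ID list) are hypothetical.

```python
import json
import os


def load_or_fetch(path, fetch):
    """Return cached data from `path`, or call fetch() and cache its result.

    `fetch` is a hypothetical callable that downloads the ID list.
    """
    # Reuse the file from a previous run if it exists.
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as fin:
            return json.load(fin)
    # Otherwise request the data once and store it for next time.
    data = fetch()
    with open(path, "w", encoding="utf-8") as fout:
        json.dump(data, fout)
    return data
```

Deleting the cache file makes the next call fall through to `fetch()` again, which matches the "delete the file to get new IDs" workflow.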

Only game data is saved. DLCs, music, tools, etc. are ignored and added to the file 'discarted.json' so that they are not requested again in future runs. You can delete that file to request those IDs again.

Finally, all games are stored in the file 'games.json' if:

The game has already been released.
The 'developers' field is not empty.
A price is included if the game is not free.
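The three conditions above can be read as a single predicate. The sketch below is an illustration under assumptions, not the scraper's actual code: the function name `should_store` is hypothetical, and the field names are taken from the 'games.json' output format.

```python
def should_store(game: dict) -> bool:
    """Return True if a game entry meets the three storage conditions."""
    # Condition 1: the game has already been released.
    released = not game.get("release_date", {}).get("coming_soon", True)
    # Condition 2: the 'developers' field is not empty.
    has_developers = bool(game.get("developers"))
    # Condition 3: a game that is not free must include a price.
    has_price = game.get("is_free", False) or game.get("price") is not None
    return released and has_developers and has_price
```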

The format is this:

{
    "906850": {
        "name": "...",
        "release_date": {
            "coming_soon": false,
            "date": "..."
        },
        "required_age": 0,
        "is_free": false,
        "price": 0.99,
        "detailed_description": "...",
        "supported_languages": "...",
        "reviews": "...",
        "header_image": "...",
        "website": "...",
        "support_url": "...",
        "support_email": "...",
        "windows": true,
        "mac": false,
        "linux": false,
        "metacritic_score": 0,
        "metacritic_url": "...",
        "achievements": 0,
        "recommendations": 0,
        "notes": "",
        "packages": [
            {
                "title": "...",
                "description": "...",
                "subs": [
                    {
                        "text": "...",
                        "description": "...",
                        "price": 0.99
                    }
                ]
            }
        ],
        "developers": [
            "..."
        ],
        "publishers": [
            "..."
        ],
        "categories": [
            "..."
        ],
        "genres": [
            "..."
        ],
        "screenshots": [
            "..."
        ],
        "movies": [
            "..."
        ],
        "user_score": 0,
        "score_rank": "",
        "negative": 0,
        "positive": 1,
        "estimated_owners": "0 - 20000",
        "average_playtime_forever": 0,
        "average_playtime_2weeks": 0,
        "median_playtime_forever": 0,
        "median_playtime_2weeks": 0,
        "peak_ccu": 0,
        "tags": {
            "...": 22,
            ...
        }
    },
    ...
}

In the file 'ParseExample.py' you can see a simple example of how to parse the information.
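As a rough idea of what working with the output can look like, here is a minimal sketch. It is not the contents of 'ParseExample.py'; the helper names `load_games` and `paid_games` are hypothetical, and the field names are those shown in the format above.

```python
import json


def load_games(path: str) -> dict:
    """Load the scraper output: a dict that maps AppID strings to game data."""
    with open(path, "r", encoding="utf-8") as fin:
        return json.load(fin)


def paid_games(dataset: dict) -> list:
    """Return (app_id, name, price) for every non-free game, cheapest first."""
    rows = [
        (app_id, game.get("name", ""), game.get("price", 0.0))
        for app_id, game in dataset.items()
        if not game.get("is_free", False)
    ]
    return sorted(rows, key=lambda row: row[2])
```

For example, `paid_games(load_games("games.json"))[:10]` would list the ten cheapest paid games in the file.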

⚙️ Parameters

To change the input file, use the parameter '-i' / '-infile':

python SteamGamesScraper.py -i games.json

To change the output file, use the parameter '-o' / '-outfile':

python SteamGamesScraper.py -o output.json

There is a general API rate limit of 200 requests per five minutes for each unique IP address, which works out to one request every 1.5 seconds. That is why the script waits 1.5 seconds between requests by default. You can change this with the parameter '-s' / '-sleep':

python SteamGamesScraper.py -s 2.0

It is not recommended to set the wait time below 1.5 seconds.
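The 1.5 second figure follows from the limit itself: five minutes is 300 seconds, and 300 / 200 = 1.5. The sketch below shows that arithmetic and one simple way to space out requests. The names `min_delay` and `throttled` are hypothetical, and this is not necessarily how the scraper implements its wait.

```python
import time

RATE_LIMIT_REQUESTS = 200   # requests allowed per window
RATE_LIMIT_WINDOW = 5 * 60  # window length in seconds


def min_delay(requests_allowed=RATE_LIMIT_REQUESTS, window=RATE_LIMIT_WINDOW):
    """Smallest delay between requests that stays within the rate limit."""
    return window / requests_allowed


def throttled(items, delay, sleep=time.sleep):
    """Yield items one by one, sleeping `delay` seconds between them."""
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay)  # no wait is needed before the first item
        yield item
```

The `sleep` argument is injectable only so the loop can be exercised without actually waiting.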

You can disable the extra data collected from SteamSpy using '-p' / '-steamspy':

python SteamGamesScraper.py -p False

When this option is disabled, some fields will be empty.

When Steam denies a request, the script retries up to four times by default. You can change the number of retries with '-r' / '-retries':

python SteamGamesScraper.py -r 10

Although it is not recommended, you can make the script retry indefinitely by setting the value to 0.
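A retry loop with "0 means retry forever" semantics could look like the sketch below. It rests on assumptions: `request_with_retries` and `do_request` are hypothetical names, a failed request is modelled as returning `None`, and `retries` is treated as the maximum number of attempts, which may differ from how the scraper counts them.

```python
import time


def request_with_retries(do_request, retries=4, delay=1.5, sleep=time.sleep):
    """Call do_request() until it returns something other than None.

    retries: maximum number of attempts; 0 means keep trying forever.
    Returns the result, or None if every attempt failed.
    """
    attempt = 0
    while retries == 0 or attempt < retries:
        attempt += 1
        result = do_request()
        if result is not None:
            return result
        sleep(delay)  # wait before the next attempt
    return None
```

With `retries=0` the loop only ends on success, which is why that setting is risky when a request can never succeed.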

By default, prices are requested in US dollars. You can change the currency with the parameter '-c' / '--currency' and the country or region code:

python SteamGamesScraper.py -c es

By default, the language is set to English. You can change the language with the parameter '-l' / '--language' and the country or region code:

python SteamGamesScraper.py -l en

Games that have not yet been released are added to the file 'notreleased.json' and are not checked again. If you want to ignore this list, you can set the parameter '-d' / '-released' to False, or delete the file.

All data is saved at the end of the scan or when you press Ctrl + C. You can also enable auto-save every X new entries with '-a' / '-autosave':

python SteamGamesScraper.py -a 100

A backup file will also be generated with the previous data.
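Saving every X new entries while keeping a backup of the previous data can be sketched like this. It is an illustration only: the function names are hypothetical, and the '.bak' suffix for the backup file is an assumption, not the name the scraper uses.

```python
import json
import os
import shutil


def save_with_backup(dataset: dict, path: str) -> None:
    """Write the dataset to `path`, copying the previous file aside first."""
    if os.path.exists(path):
        shutil.copyfile(path, path + ".bak")  # '.bak' suffix is an assumption
    with open(path, "w", encoding="utf-8") as fout:
        json.dump(dataset, fout, indent=4, ensure_ascii=False)


def maybe_autosave(dataset: dict, path: str, new_entries: int, autosave: int) -> bool:
    """Save when auto-save is enabled and new_entries is a multiple of it."""
    if autosave > 0 and new_entries > 0 and new_entries % autosave == 0:
        save_with_backup(dataset, path)
        return True
    return False
```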

Do you want to add new games from a file? Use the parameter '-u' / '-update' with the name of a CSV file. The AppID must be in the first column.

python SteamGamesScraper.py -u update.csv
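Reading AppIDs from the first column of such a file takes only a few lines. In this sketch the function name `read_app_ids` is hypothetical, and skipping rows whose first cell is not numeric (a header row, for instance) is an assumption about the input rather than documented behaviour.

```python
import csv


def read_app_ids(path: str) -> list:
    """Return the AppIDs found in the first column of a CSV file."""
    app_ids = []
    with open(path, newline="", encoding="utf-8") as fin:
        for row in csv.reader(fin):
            # Skip empty rows and rows whose first cell is not a number.
            if row and row[0].strip().isdigit():
                app_ids.append(row[0].strip())
    return app_ids
```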

Contributors ✨

License 📜

Code released under MIT License.
