ArtScraper is a tool to download images and metadata for artworks available on WikiArt (www.wikiart.org/) and Google Arts & Culture (artsandculture.google.com/).
Functionality:
- WikiArt and Google Arts & Culture: Download images and metadata from a list of artworks' URLs
- Google Arts & Culture: Download all images and metadata in the site, or from specific artists
The ArtScraper package can be installed with pip, which automatically installs the Python dependencies:
pip install artscraper
To download data from WikiArt it is necessary to obtain API keys. After obtaining them, you can put them in a file called .wiki_api in the working directory of your script. The format is: the API access key, a new line, the API secret key, and a new line, e.g.:
7e57a60844
3defc62d8f
Alternatively, when ArtScraper doesn't detect the file .wiki_api, it will ask for the API keys.
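The key file format described above can also be written and parsed with a few lines of standard-library Python. This is only a minimal sketch: the helper names `write_wiki_api_keys` and `read_wiki_api_keys` are hypothetical and are not part of ArtScraper, and how ArtScraper itself reads the file may differ.

```python
from pathlib import Path


def write_wiki_api_keys(access_key, secret_key, path=".wiki_api"):
    """Write the access key and the secret key, each followed by a newline."""
    Path(path).write_text(f"{access_key}\n{secret_key}\n", encoding="utf-8")


def read_wiki_api_keys(path=".wiki_api"):
    """Return (access_key, secret_key) from a two-line key file."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    if len(lines) < 2:
        raise ValueError(f"{path} must contain two lines: access key, secret key")
    return lines[0].strip(), lines[1].strip()
```

Calling `write_wiki_api_keys` with your own two keys in the script's working directory should produce a file in the layout shown above.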
An example of fetching data is shown below and in the notebook.
from artscraper import WikiArtScraper

art_url = "https://www.wikiart.org/en/edvard-munch/anxiety-1894"

with WikiArtScraper(output_dir="data") as scraper:
    scraper.load_link(art_url)
    scraper.save_metadata()
    scraper.save_image()
This will store the image and the metadata in separate folders. When used this way, ArtScraper skips images and metadata that are already present. Remove the directory to force a redownload.
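Removing a directory to trigger a fresh download can be scripted. The helper below is a hypothetical sketch (`remove_download_dir` is not an ArtScraper function), and the path you pass depends on the folder names ArtScraper created under your output directory, which are not specified here.

```python
import shutil
from pathlib import Path


def remove_download_dir(directory):
    """Delete a previously downloaded directory; return True if it existed."""
    target = Path(directory)
    if not target.is_dir():
        return False
    # Removes the directory and everything in it, so the next run starts clean
    shutil.rmtree(target)
    return True
```

Running the scraper again afterwards should then download the image and metadata anew.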
To download data from Google Arts & Culture it is necessary to install Firefox.
ArtScraper will open a new Firefox window, navigate to the image, zoom in on it and take a screenshot of it. This takes a few seconds. Do not minimize that browser window, and do not let the screensaver start.
An example of fetching data is shown below and in the notebook.
from artscraper import GoogleArtScraper

art_url = "https://artsandculture.google.com/asset/anxiety-edvard-munch/JgE_nwHHS7wTPw"

with GoogleArtScraper() as scraper:
    scraper.load_link(art_url)
    metadata = scraper.get_metadata()  # or scraper.save_metadata()
    scraper.save_image("data/anxiety_munch.jpg")

print(metadata)
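If you use `get_metadata()` rather than `save_metadata()`, you may want to store the returned value next to the image yourself. The sketch below assumes the metadata is a JSON-serializable dictionary, which is not confirmed here. The helper name `write_metadata_json` is hypothetical.

```python
import json
from pathlib import Path


def write_metadata_json(metadata, path):
    """Write a metadata dictionary to a UTF-8 JSON file, creating parent folders."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps accented names and titles readable in the file
        json.dump(metadata, handle, ensure_ascii=False, indent=2)
```

For instance, `write_metadata_json(metadata, "data/anxiety_munch.json")` would place the file alongside the image saved in the example above.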
See example notebook.
The final structure of the results will be
- data
    - artist_links.txt (All artists, with one URL per line)
    - Artist_1
        - description.txt (Description of artist, from Wikidata)
        - metadata.json (Metadata of artist, from Wikidata)
        - works.txt (All artworks, with one URL per line)
        - works
            - work1
                - artwork.png (Artwork image)
                - metadata.json (Metadata of artwork, from Google Arts & Culture)
            - work2
                - ...
    - Artist_2
        - ...
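Once a download has finished, the structure above can be traversed with the standard library to see what was collected. This is a rough sketch that assumes the layout is exactly as listed (one folder per artist, each with a `works` subfolder). The function name `list_downloaded_works` is hypothetical.

```python
from pathlib import Path


def list_downloaded_works(data_dir):
    """Map each artist folder name to the sorted names of its work folders."""
    result = {}
    for artist_dir in sorted(Path(data_dir).iterdir()):
        if not artist_dir.is_dir():
            continue  # skip plain files such as artist_links.txt
        works_dir = artist_dir / "works"
        if works_dir.is_dir():
            works = sorted(p.name for p in works_dir.iterdir() if p.is_dir())
        else:
            works = []  # artist folder exists but no works were scraped yet
        result[artist_dir.name] = works
    return result
```

An artist that maps to an empty list has a folder but no scraped works, which can help spot interrupted runs.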
A full example (but please check the example notebook to add retries):
# FindArtworks is assumed to be exported at the package top level, like GoogleArtScraper
from artscraper import FindArtworks, GoogleArtScraper
from artscraper.find_artists import get_artist_links

output_dir = "data"
# Minimum wait time between requests
min_wait_time = 1

# Get links for all artists, as a list
artist_urls = get_artist_links(min_wait_time=min_wait_time,
                               output_file=f'{output_dir}/artist_links.txt')

# Find artworks for each artist
for artist_url in artist_urls:
    with FindArtworks(artist_link=artist_url, output_dir=output_dir,
                      min_wait_time=min_wait_time) as scraper:
        # Save list of artworks, the description, and metadata for an artist
        scraper.save_artist_information()
        # Find artist directory
        artist_dir = output_dir + '/' + scraper.get_artist_name()

    # Scrape artworks
    with GoogleArtScraper(artist_dir + '/' + 'works', min_wait=min_wait_time) as subscraper:
        # Get list of links to this artist's works
        with open(artist_dir + '/' + 'works.txt', 'r') as file:
            artwork_links = [line.rstrip() for line in file]
        # Download all artwork links (slow)
        for url in artwork_links:
            print(f'artwork URL: {url}')
            # Pass the URL so that each artwork in the list is downloaded
            subscraper.save_artwork_information(url)
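Because scraping many pages can fail intermittently, the note about retries is worth acting on. The example notebook's approach may differ. What follows is only a generic, standard-library sketch, and the helper name `call_with_retries` and its parameters are hypothetical rather than part of ArtScraper.

```python
import time


def call_with_retries(func, *args, max_attempts=3, wait_seconds=1.0, **kwargs):
    """Call func(*args, **kwargs), retrying on any exception up to max_attempts times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception as error:  # broad on purpose: scraping errors vary widely
            last_error = error
            if attempt < max_attempts:
                time.sleep(wait_seconds)  # pause before the next attempt
    # All attempts failed: surface the last error to the caller
    raise last_error
```

Wrapping the per-artwork download call from the loop above in such a helper lets a single failing page be retried without restarting the whole run.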
Contributions are what make the open source community an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
Please refer to the CONTRIBUTING file for more information on issues and pull requests.
The package artscraper is published under an MIT license. When using artscraper for academic work, please cite:
Schram, Raoul, Mitra, Modhurita, Garcia-Bernardo, Javier, van Kesteren, Erik-Jan, de Bruin, Jonathan, & Stamkou, Eftychia. (2022).
ArtScraper: A Python library to scrape online artworks (0.1.1). Zenodo. https://doi.org/10.5281/zenodo.7129975
This project is developed and maintained by the ODISSEI Social Data Science (SoDa) team.
Do you have questions, suggestions, or remarks? File an issue in the issue tracker or feel free to contact the team via https://odissei-data.nl/en/using-soda/.