While researching GitHub Actions for a talk, I asked myself: "How many repositories pin their GitHub Actions by hash?" Since I couldn't find a tool that answers this question, I decided to build one myself.
The results are published at: http://pin-gh-actions.kammel.dev/
$ go run . --help
Usage of GH Pinned Actions:
-download-dir string
path to folder where repositories will be downloaded (default "/tmp/pinned")
-max-pages int
maximum number of pages to download (default 1)
-per-page int
number of repositories to download per page (default 100)
To replicate the results for 10,000 repositories, run:
go run . -max-pages 100
Note
The default download directory is /tmp/pinned. You can change it with the -download-dir flag.
Warning
Downloading 10,000 repositories will take a long time (depending on your internet connection) and consume about 1.5 TB of disk space.
Notes about the chosen libraries and APIs
We use the public GitHub repository search API to request the most popular repositories by stars. Although the search API supports pagination, it returns at most 100 results per page and at most 1,000 results per search query.
To work around the 1,000-result cap, we modify the search query after each request and only use the first page of results.
Although go-git was the initial choice for cloning the repositories, it was later replaced by running git through os/exec, due to performance limitations of the library. See linux-fetcher.
stacklok/frizbee already provides all the tools needed to parse GitHub Actions, so we use this library to parse the actions referenced in the repositories.