We need to enable regular updates for GitHub metadata (Issues/PRs and their comments) for the Bitcoin Core repository in the coredev index. Once this is implemented, we can extend support to other repositories, including:
bitcoin/bips
bitcoin-core/gui
bitcoin-core/secp256k1
repositories for other Bitcoin open-source software as needed.
Current Status
The coredev index powers the CoreDev bot and focuses on Bitcoin Core-related sources. Current sources include:
However, the index was created from a one-time scrape and has not been updated. While the onboarding guide is static and doesn’t require updates, other sources need regular indexing to remain relevant.
Proposed Approach
To achieve regular updates, we can use 0xB10C's github-metadata-backup. There are two options:
Use the existing backup from bitcoin-data, which updates every hour.
Our scraper already supports GitHub repositories, so Option 2 should be straightforward.
Considerations
Start with smaller datasets:
Begin with a subset of the bitcoin/bitcoin repository or smaller repositories like bitcoin/bips or bitcoin-core/secp256k1 to simplify testing and validation.
Data storage format:
Ensure replies to issues are linked to the main issue/PR using the issue field (as in the PR Review Club scraper).
Alternatively, the thread_url field could be used, but the issue field is preferred for consistency.
Follow the terminology outlined in Improving Terminology Consistency Across Data Infrastructure #89
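To illustrate the "Data storage format" consideration, here is a minimal sketch of how an issue and its replies could be turned into index documents that all share the same issue value. It assumes the input dictionaries follow the GitHub REST API shape (number, html_url, body, id); the on-disk layout of github-metadata-backup is not assumed here. The IndexDocument class, the build_documents helper, and every field name other than issue are hypothetical and only for illustration, not the scraper's actual schema.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class IndexDocument:
    """Hypothetical index document; only the `issue` field comes from the proposal."""
    id: str
    url: str
    body: str
    type: str   # "issue" or "comment" (illustrative values)
    issue: int  # number of the parent issue/PR, shared by the thread


def build_documents(
    repo: str,
    issue: Dict[str, Any],
    comments: List[Dict[str, Any]],
) -> List[IndexDocument]:
    """Build one document for the issue/PR and one per reply.

    Every reply carries the parent's number in `issue`, so a whole
    thread can be fetched by filtering on that single field.
    """
    number = issue["number"]
    docs = [
        IndexDocument(
            id=f"{repo}#{number}",
            url=issue["html_url"],
            body=issue.get("body") or "",  # GitHub returns null for empty bodies
            type="issue",
            issue=number,
        )
    ]
    for comment in comments:
        docs.append(
            IndexDocument(
                id=f"{repo}#{number}-comment-{comment['id']}",
                url=comment["html_url"],
                body=comment.get("body") or "",
                type="comment",
                issue=number,
            )
        )
    return docs


if __name__ == "__main__":
    # Placeholder data, not real issues or comments.
    example_issue = {"number": 7, "html_url": "https://example.invalid/7", "body": "Top post"}
    example_comments = [
        {"id": 11, "html_url": "https://example.invalid/7#c11", "body": "A reply"},
    ]
    for doc in build_documents("example/repo", example_issue, example_comments):
        print(doc.type, doc.issue, doc.id)
```

Keying replies on a numeric issue field rather than on thread_url keeps the link stable even if URLs are normalized differently between scraper runs, which is one reason the issue field may be the simpler choice.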
https://github.com/bitcoin/bitcoin/