This project summarizes the bitcoin-dev and lightning-dev mailing lists and the Delving Bitcoin website.
Utilizing data collected by the scraper and stored in an Elasticsearch index, it uses several nightly cron jobs to automate the generation of easily accessible summarization feeds for public consumption.
- Daily XML Generation (source)
- Queries Elasticsearch for documents from the last 30 days across each source. For each source, it retrieves the existing XML files (summaries) and generates summaries for new posts (those lacking an XML file). For each thread, it then compiles the inputs for a combined thread summary: the summaries of previous posts plus the actual content of newer posts added since the last run. All generated summaries, both individual and combined, are formatted into XML files and committed to GitHub to be used by Bitcoin TLDR.
- Daily Push Summary From XML Files to ES INDEX (source)
- Queries Elasticsearch for documents lacking summaries, extracts summaries from corresponding XML files, and then updates these documents with their summaries in the Elasticsearch index.
- Daily Push Combined Summary From XML Files to ES INDEX (source)
- Processes each combined thread summary XML file: transforms it into a document, checks whether that document already exists in Elasticsearch, and updates or inserts it as needed.
- Daily Python Homepage Update Script (source)
- Queries the last 7 days of data from Elasticsearch for each source to compile lists of active threads, recent threads, and historical threads for 'Today in History'. It summarizes recent threads if any are available; otherwise, it summarizes active threads. The resulting homepage.json is then committed to GitHub to be used by Bitcoin TLDR.
- Weekly Python Newsletter Generation Script (source)
- Generates a newsletter by compiling lists of new and active threads from the past week's data for each source. It summarizes new threads if any are available; otherwise, it summarizes active threads. The resulting newsletter.json is then committed to GitHub to be used by Bitcoin TLDR.
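The thread-level step of the daily XML job (summaries of earlier posts combined with the full text of newer posts) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the project's actual code: the Post type, its field names, and both function names are hypothetical, and the query assumes a date field called created_at, which may not match the real index mapping.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Post:
    """Hypothetical stand-in for one mailing-list or forum post."""
    thread_id: str
    created_at: datetime
    body: str
    summary: Optional[str] = None  # None means no XML summary exists yet


def recent_posts_query(days: int = 30, date_field: str = "created_at") -> dict:
    """Build an Elasticsearch range-query body covering the last `days` days.

    The field name is an assumption; adjust it to the real index mapping.
    """
    return {"query": {"range": {date_field: {"gte": f"now-{days}d/d"}}}}


def build_thread_input(posts: list[Post]) -> str:
    """Combine one thread's posts into the input for a combined summary.

    Posts that already have a summary contribute that summary; posts
    without one (new since the last run) contribute their full body.
    """
    ordered = sorted(posts, key=lambda p: p.created_at)
    parts = [p.summary if p.summary is not None else p.body for p in ordered]
    return "\n\n".join(parts)
```

Reusing stored summaries for older posts keeps the input to the summarization model short as a thread grows, since only unseen posts are passed in full.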
- Install all the dependencies from the requirements.txt file:
pip install -r requirements.txt
- Set up environment variables: create a .env file in the root folder and add the following keys:
OPENAI_API_KEY="<your_api_key>"
ES_CLOUD_ID="<your_es_cloud_id>"
ES_USERNAME="<your_es_username>"
ES_PASSWORD="<your_es_password>"
ES_INDEX="<your_es_index>"
- In the src/config.py file, set CHATGPT=True if you want to generate results using the ChatGPT model; otherwise, set it to False and assign the model's name to the COMPLETION_MODEL variable.
- Run the app with the command:
python app.py
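If the app fails at startup, a quick check that the keys from the .env file are actually visible to the process can help. The sketch below is illustrative only: load_settings and REQUIRED_KEYS are hypothetical names, not part of this repository, and it assumes the variables have already been loaded into the environment (for example by a dotenv loader or by exporting them in the shell).

```python
import os
from typing import Mapping

# Keys listed in the setup step above.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "ES_CLOUD_ID",
    "ES_USERNAME",
    "ES_PASSWORD",
    "ES_INDEX",
)


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Return the required settings, raising if any key is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[key] for key in REQUIRED_KEYS}
```

Passing the mapping as a parameter makes the check easy to exercise with a plain dict instead of the real environment.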
- Directories:
  - postman_collection: APIs
  - output: results generated on API calls
  - notebook: Jupyter notebook with all the scripts