Download and archive all tweets #20

Open
digitalarchivo opened this issue May 4, 2024 · 5 comments

Comments

@digitalarchivo

I'd like to be able to archive a Twitter user's tweets, specifically all of those captured on the Wayback Machine.

Take the user degenspartan as an example: he has 55,000+ tweets on the Wayback Machine.

Ideally, I could enter his username, extract the URL and image of each tweet, and do that for all 55k+ tweets.

I have two goals. First, I want an archive of a Twitter user's tweets. Second, I'd like to present them on my GitHub Pages .md site.

Please let me know what would be possible.

Thank you so much.

@claromes
Owner

claromes commented May 5, 2024

  • A CSV file containing

    • Wayback Machine URL
    • original tweet URL
    • tweet text(1)
    • images(2), if the tweet contains any
    • screenshot(3)
    • MIME type
    • date it was saved on Wayback
  • And an HTML file (web interface) listing these tweets. This HTML can be included as a page on GitHub Pages later on.

(1) When the MIME type is JSON or when the tweet hasn't been deleted (but is archived on Wayback).
(2) In the CSV, there would be the image file name, and the image would also be downloaded.
(3) This is the most complicated part, as the application on the Streamlit cloud doesn't have many resources. This option would only be available for local execution.
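The CSV columns listed above could be filled from the Wayback Machine CDX API. What follows is only a minimal sketch, not the code Wayback Tweets uses: the function names (`cdx_query_url`, `rows_to_records`, `write_csv`) and the CSV column names are hypothetical, and it assumes the CDX JSON output, where the first row is a header and each later row is one capture. Tweet text, images, and screenshots are left out.

```python
import csv
from datetime import datetime
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(username):
    """Build a CDX query for every archived status URL of one user."""
    params = {
        "url": f"twitter.com/{username}/status/*",
        "output": "json",
        # Ask only for the fields the CSV needs.
        "fl": "timestamp,original,mimetype",
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def rows_to_records(cdx_rows):
    """Turn CDX JSON rows (header row first) into CSV-ready dicts."""
    if not cdx_rows:
        return []
    header, *data = cdx_rows
    records = []
    for row in data:
        item = dict(zip(header, row))
        # CDX timestamps use the form YYYYMMDDhhmmss.
        saved = datetime.strptime(item["timestamp"], "%Y%m%d%H%M%S")
        records.append({
            "wayback_url": "https://web.archive.org/web/"
                           f"{item['timestamp']}/{item['original']}",
            "original_url": item["original"],
            "mimetype": item["mimetype"],
            "saved_at": saved.isoformat(),
        })
    return records


def write_csv(records, fileobj):
    """Write the records to an open text file object as CSV."""
    fields = ["wayback_url", "original_url", "mimetype", "saved_at"]
    writer = csv.DictWriter(fileobj, fieldnames=fields)
    writer.writeheader()
    writer.writerows(records)
```

To use it, fetch `cdx_query_url("degenspartan")` with any HTTP client, decode the JSON response, and pass the resulting list to `rows_to_records`. For an account with tens of thousands of captures, the request would probably need to be paged.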

I would have to adjust the interface for Wayback Tweets, write the download code, build the web interface for viewing the listing, and write documentation. Given the time I have, it would take me 1 to 2 months to finish everything.

I think this would be a great upgrade for the tool.

What do you think?

@digitalarchivo
Author

I'm new to this, but here are some thoughts I came up with after reading your reply, which sounded great:
-Retrieve all URLs of a Twitter user's tweets.
-Upload all tweet URLs to the Wayback Machine for archiving.
-For each tweet:
  -Retrieve the Wayback Machine URL.
  -Extract tweet text.
  -Download any images associated with the tweet.
  -Take a screenshot of the tweet.
  -Generate a JSON file listing the tweet details.
-Sort all tweets by date.
-For each tweet URL:
  -Extract the tweet ID.
-Utilize the GitHub project "DocNow/tweet-viewer":
  -Use the tweet IDs to fetch the tweets from Twitter's API.
  -Display all tweets in chronological order.
  -Present them on a single page with the appearance of Twitter.
  -Example: https://tweet-viewer.vercel.app/
-Take a user's tweets and upload them to the Wayback Machine.
-Retrieve all URLs for that user on the Wayback Machine, including deleted tweets.
-Create an offline archive with the information retrieved from the Wayback Machine.
-Create an online archive version for each particular user.
-Ensure the online archive version has a similar appearance to that of Twitter.
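The "extract the tweet ID" and "sort all tweets by date" steps above can be sketched without any API call. This is a hedged illustration rather than part of Wayback Tweets or tweet-viewer: the helper names are hypothetical, and it assumes the IDs are Snowflake IDs (in use since late 2010), whose upper bits encode a millisecond timestamp relative to Twitter's epoch. Under that assumption, sorting by ID is the same as sorting by posting time.

```python
import re
from datetime import datetime, timezone

# Matches the numeric ID in .../status/<id> and the older .../statuses/<id>.
TWEET_ID_RE = re.compile(r"/status(?:es)?/(\d+)")

# Twitter's Snowflake epoch in milliseconds since the Unix epoch.
TWITTER_EPOCH_MS = 1288834974657


def extract_tweet_id(url):
    """Return the tweet ID found in a tweet or Wayback URL, or None."""
    match = TWEET_ID_RE.search(url)
    return int(match.group(1)) if match else None


def snowflake_to_datetime(tweet_id):
    """Approximate the posting time (UTC) encoded in a Snowflake ID."""
    millis = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


def unique_ids_by_date(urls):
    """Deduplicate the tweet IDs in a list of URLs, oldest first."""
    ids = {extract_tweet_id(url) for url in urls}
    ids.discard(None)  # drop URLs that are not individual tweets
    return sorted(ids)
```

Deduplicating matters here because the Wayback Machine often holds several captures of the same tweet, including variants with query strings.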

What do you think?

-https://github.com/DocNow/tweet-viewer

-https://github.com/yusuf-yldrm/Tweet-Viewer
-https://tweet-viewer.vercel.app/

@claromes changed the title from "From reddit" to "Download and archive all tweets" on May 18, 2024
@claromes
Owner

After the private conversation, I wrote on my blog about the upcoming updates: https://claromes.com/blog/wayback-tweets-is-moving-to-the-command-line.html

@digitalarchivo
Author

> After the private conversation, I wrote on my blog about the upcoming updates: https://claromes.com/blog/wayback-tweets-is-moving-to-the-command-line.html

That sounds like a very exciting blog post, to be honest. I'll repost it on my Twitter and on Reddit.

@digitalarchivo
Author

https://x.com/jtig37/status/1794048221569786350
