
Commit

fix: rate limit of GraphQL search api
fix: reset poll count to 0 in online api
update: archive script
update: readme about archive script
BANKA2017 committed May 25, 2023
1 parent 7ad0d56 commit 05764f3
Showing 6 changed files with 376 additions and 276 deletions.
12 changes: 6 additions & 6 deletions README.MD
@@ -76,10 +76,10 @@ This repository included `core/crawler/api/scripts`, frontend repository is [her

### Archiver

* archive userinfo, nearly all tweets (not including **replies**) and nearly **ALL MEDIA** (including avatar and banner) via the search API
* TODO: Spaces, broadcasts, mixed media (e.g. an image and a video in the same tweet)
* archive userinfo, most tweets (including **replies**) and nearly **ALL MEDIA** (including avatar and banner) via the search API, as well as `Following` and `Followers`
* TODO: Spaces, broadcasts (ffmpeg command)
* **PLEASE CHECK IN ADVANCE WHETHER THE ACCOUNT HAS BEEN SEARCH-BANNED**
* **DO NOT EXECUTE `init.sh` UNTIL YOU HAVE BACKED UP ALL ARCHIVED DATA IN THE FOLDER './twitter_archiver/'**; it will delete all archived data
* Read more in [archiver/README.md](https://github.com/BANKA2017/twitter-monitor/tree/node/apps/archiver)

### CloudFlare Workers

@@ -215,10 +215,10 @@ Those are Chinese articles

### Archiver

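* Archive the account's user info, nearly all tweets (not including replies) and nearly all media files (including the current avatar and banner) via the **search API**
* TODO: archive Spaces from the last 30 days, broadcasts (not yet sure how to do this) and mixed media (roughly: two or more videos in the same tweet, or images and videos together)
* Archive the account's user info, most tweets (including replies) and media files (including the current avatar and banner), plus `Following` and `Followers`, via the **search API**
* TODO: archive Spaces and broadcasts (generate ffmpeg commands)
* **Before use, check whether the account to be archived has been search-banned**
* **Do not run `init.sh` before backing up the contents of the './twitter_archiver/' folder**; it will clear the contents of that folder
* For usage, read [archiver/README.md](https://github.com/BANKA2017/twitter-monitor/tree/node/apps/archiver)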

### CloudFlare Workers

72 changes: 72 additions & 0 deletions apps/archiver/README.md
@@ -0,0 +1,72 @@
Archiver
---

## ⚠ WARNING

- We cannot guarantee that these features will keep working.
- When archiving someone else's Twitter account, please obtain their permission first.
- The structure of the generated data is still being adjusted, so the current output may not be usable in the viewer.


## Known issues
- Unable to crawl most retweets.
- Unable to crawl tweets marked as sensitive content (TODO: logging in could solve this).
- Unable to crawl copyrighted media files in some regions.
- Some videos are damaged, which is expected. Downloading the corresponding m3u8 instead yields a lower-quality version.
- Unable to crawl tweets from protected/banned/deleted users.
- After logging in, the rate limit status will be tied to the account rather than the guest token (TODO: login is not implemented yet).
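Whichever identity a rate limit is counted against (guest token or account), a crawler can decide how long to pause from the response headers. A minimal sketch, assuming Twitter-style `x-rate-limit-remaining` and `x-rate-limit-reset` (Unix time in seconds) headers; `msUntilReset` is a hypothetical helper, not the archiver's code.

```javascript
// Hypothetical helper (not the archiver's code). Assumes Twitter-style headers:
// x-rate-limit-remaining = requests left in the current window,
// x-rate-limit-reset = Unix time (seconds) when the window resets.
function msUntilReset(headers, nowMs = Date.now()) {
  const remaining = Number(headers['x-rate-limit-remaining'])
  const resetSec = Number(headers['x-rate-limit-reset'])
  // Headers missing or not numeric: nothing to wait for.
  if (!Number.isFinite(remaining) || !Number.isFinite(resetSec)) return 0
  // Quota left in this window: no need to pause.
  if (remaining > 0) return 0
  // Quota exhausted: wait until the reset time, never a negative delay.
  return Math.max(0, resetSec * 1000 - nowMs)
}

// Example: quota exhausted, window resets 60 s after "now"; prints 60000.
console.log(msUntilReset({ 'x-rate-limit-remaining': '0', 'x-rate-limit-reset': '1700000060' }, 1700000000000))
```

With a `fetch` Response, `Object.fromEntries(response.headers)` produces a suitable object, since `Headers` iterates over lowercased name/value pairs.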

## Features

- User info (not including the authors of quoted tweets).
- Tweets and replies, searched anonymously (most retweets are not included).
- Polls
- Avatar, banner, photos and videos.
- Following and followers lists (optional)
- Raw data is kept for future use.

## TODO

- Spaces and broadcasts with ffmpeg
- Login by **COOKIE**
- Incremental updates of tweets and following/followers lists

## Init

- Run the init script:

```shell
# bash
bash init.sh <screen_name> # e.g. 'twitter'
# or PowerShell
.\init.ps1 <screen_name>
```

A folder named after `<screen_name>` will be created. If that folder already exists, you will be prompted to delete or rename it.

## Run

### Crawler

```shell
node archive.mjs [OPTION]
```
|Parameter|Required|Description|
|:--|:--|:--|
|--all|Optional|Get all data (user info, tweets, following, followers)|
|--followers|Optional|Get followers|
|--following|Optional|Get following|
|--media|Optional|Get media|
|--skip_\<key\>|Optional|Skip the corresponding job. `<key>` is a key of `argvList`: `user_info_and_tweets`, `followers`, `following` or `media`.|
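The table does not spell out how the flags combine. As a rough mental model only, the sketch below assumes that user info and tweets run by default, that `--all` adds followers and following (as listed in the table), that `--media` is opt-in, and that every `--skip_<key>` is applied after the enabling flags. `resolveJobs` and `JOB_KEYS` are hypothetical names; this is not the logic in `archive.mjs`.

```javascript
// Hypothetical model of how archive.mjs flags might combine into jobs.
// ASSUMPTIONS (not from archive.mjs): user info/tweets run by default,
// --all adds followers and following, --media is opt-in, and every
// --skip_<key> is applied after the enabling flags.
const JOB_KEYS = ['user_info_and_tweets', 'followers', 'following', 'media']

function resolveJobs(argv) {
  const jobs = new Set(['user_info_and_tweets'])
  for (const arg of argv) {
    if (arg === '--all') {
      jobs.add('followers')
      jobs.add('following')
    } else if (arg === '--followers') {
      jobs.add('followers')
    } else if (arg === '--following') {
      jobs.add('following')
    } else if (arg === '--media') {
      jobs.add('media')
    }
  }
  // Skip flags win over enabling flags, wherever they appear.
  for (const arg of argv) {
    if (arg.startsWith('--skip_')) jobs.delete(arg.slice('--skip_'.length))
  }
  // Return the jobs in a fixed order so runs are predictable.
  return JOB_KEYS.filter((key) => jobs.has(key))
}

console.log(resolveJobs(process.argv.slice(2)))
```

Saved as a hypothetical `jobs-sketch.mjs`, `node jobs-sketch.mjs --all --skip_followers` prints `[ 'user_info_and_tweets', 'following' ]` under these assumptions.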

### Retry media

```shell
node retryMedia.mjs
```

Attempts to re-download the images that failed during crawling. (Currently not very useful.)

## View

The front-end project is currently under development; once it is ready, it may be available at <https://github.com/BANKA2017/twitter-archive-viewer>.
