
Support: singlefile & readability fail to work #1386

Open
ahermitforhire opened this issue Mar 25, 2024 · 3 comments

Comments

@ahermitforhire

ahermitforhire commented Mar 25, 2024

For every snapshot I try, singlefile and readability fail. I assume readability may fail because singlefile.html is missing.

Error for singlefile:

SingleFile was not able to archive the page

If I run the same command in a terminal, I get:

bash: syntax error near unexpected token `('

The raw command I copied from the log section and ran to get the error above:

/mnt/media/ArchiveBox/node_modules/single-file-cli/single-file --browser-executable-path=chromium --browser-args=[\"--headless=new\", \"--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36 ArchiveBox/0.7.2 (+https://github.com/ArchiveBox/ArchiveBox/)\", \"--window-size=1440,2000\"] https://web.archive.org/web/20240301170542/https://www.roadandtrack.com/car-culture/a46975496/behind-f1-velvet-curtain/ singlefile.html

This error occurs for every article I try to archive.
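A minimal sketch of why bash rejects the copied command: the logged `--browser-args` value is a JSON array containing spaces and parentheses, and it is printed without shell quoting, so bash treats the `(` in the user agent as syntax. `bash -n` only parses a command and never executes it, so it can compare the logged form against a single-quoted form without running single-file. The shortened user agent `Mozilla/5.0 (X11)` and the `https://example.com` URL are illustrative placeholders, not values from ArchiveBox.

```shell
# Shortened, illustrative copy of the logged command (user agent trimmed).
# Inside single quotes the backslashes stay literal, just as in the log.
raw='single-file --browser-args=[\"--user-agent=Mozilla/5.0 (X11)\"] https://example.com singlefile.html'

# Same command with the whole JSON array wrapped in single quotes.
quoted="single-file --browser-args='[\"--user-agent=Mozilla/5.0 (X11)\"]' https://example.com singlefile.html"

# bash -n parses without executing, so single-file is never invoked.
if bash -n -c "$raw" 2>/dev/null; then echo "raw: ok"; else echo "raw: syntax error"; fi
if bash -n -c "$quoted" 2>/dev/null; then echo "quoted: ok"; else echo "quoted: syntax error"; fi
```

The first check reports a syntax error and the second parses cleanly, which matches the `syntax error near unexpected token `('` seen when pasting the logged command.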

Readability error is:

Readability was not able to archive the page (invalid JSON)

But again, since I see a reference to singlefile.html in the command, I expect that solving the above will solve this too.

The output of archivebox version is:

0.7.2
ArchiveBox v0.7.2 BUILD_TIME=2024-02-28 11:27:51 1709137671
IN_DOCKER=False IN_QEMU=False ARCH=x86_64 OS=Linux PLATFORM=Linux-5.15.0-101-generic-x86_64-with-glibc2.35 PYTHON=Cpython
FS_ATOMIC=True FS_REMOTE=False FS_USER=1000:1000 FS_PERMS=755
DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND=ripgrep LDAP=False

[i] Dependency versions:
 √  PYTHON_BINARY         v3.10.12        valid     /usr/bin/python3.10                                                         
 √  SQLITE_BINARY         v2.6.0          valid     /usr/lib/python3.10/sqlite3/dbapi2.py                                       
 √  DJANGO_BINARY         v3.1.14         valid     /home/ahermitforhire/.local/lib/python3.10/site-packages/django/__init__.py 
 √  ARCHIVEBOX_BINARY     v0.7.2          valid     /home/ahermitforhire/.local/bin/archivebox                                  

 √  CURL_BINARY           v7.81.0         valid     /usr/bin/curl                                                               
 √  WGET_BINARY           v1.21.2         valid     /usr/bin/wget                                                               
 √  NODE_BINARY           v12.22.9        valid     /usr/bin/node                                                               
 √  SINGLEFILE_BINARY     v1.1.54         valid     ./node_modules/single-file-cli/single-file                                  
 √  READABILITY_BINARY    v0.0.11         valid     ./node_modules/readability-extractor/readability-extractor                  
 √  MERCURY_BINARY        v1.0.0          valid     ./node_modules/@postlight/parser/cli.js                                     
 √  GIT_BINARY            v2.34.1         valid     /usr/bin/git                                                                
 √  YOUTUBEDL_BINARY      v2023.12.30     valid     /home/ahermitforhire/.local/bin/yt-dlp                                      
 √  CHROME_BINARY         v122.0.6261.94  valid     /usr/bin/chromium                                                           
 √  RIPGREP_BINARY        v13.0.0         valid     /usr/bin/rg                                                                 

[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /home/ahermitforhire/.local/lib/python3.10/site-packages/archivebox         
 √  TEMPLATES_DIR         3 files         valid     /home/ahermitforhire/.local/lib/python3.10/site-packages/archivebox/templates
 -  CUSTOM_TEMPLATES_DIR  -               disabled  None                                                                        

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled  None                                                                        
 -  COOKIES_FILE          -               disabled  None                                                                        

[i] Data locations:
 √  OUTPUT_DIR            8 files         valid     /mnt/media/ArchiveBox                                                       
 √  SOURCES_DIR           11 files        valid     ./sources                                                                   
 √  LOGS_DIR              1 files         valid     ./logs                                                                      
 √  ARCHIVE_DIR           5 files         valid     ./archive                                                                   
 √  CONFIG_FILE           238.0 Bytes     valid     ./ArchiveBox.conf                                                           
 √  SQL_INDEX             328.0 KB        valid     ./index.sqlite3  
@pirate
Member

pirate commented Mar 25, 2024

Try running this:

/mnt/media/ArchiveBox/node_modules/single-file-cli/single-file --browser-executable-path=chromium 'https://web.archive.org/web/20240301170542/https://www.roadandtrack.com/car-culture/a46975496/behind-f1-velvet-curtain/' singlefile.html 

But also, you are archiving a URL that's already on the Internet Archive? You can try it, but we don't really support that very well. You may want to follow this issue if you do that a lot: #160

@ahermitforhire
Author

If I run that in the terminal, I get:

Unexpected token '?'
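One possible explanation, offered as an assumption that the thread does not confirm: `Unexpected token '?'` is the SyntaxError Node.js 12 raises when it meets the nullish coalescing operator (`??`), which Node.js only supports without flags from version 14. The `archivebox version` output above reports `NODE_BINARY v12.22.9`. A small sketch for checking the installed Node major version follows; the helper names `node_major` and `node_at_least` are hypothetical, and the minimum of 14 comes from the `??` operator, not from any documented single-file-cli requirement.

```shell
# Print the major version from a Node.js version string such as "v12.22.9".
node_major() {
  local v="${1#v}"   # strip the leading "v"
  echo "${v%%.*}"    # keep everything before the first "."
}

# Succeed (exit status 0) if version string $1 has a major version >= $2.
node_at_least() {
  [ "$(node_major "$1")" -ge "$2" ]
}

# Check the node on PATH against the assumed minimum of 14.
if command -v node >/dev/null 2>&1; then
  installed="$(node --version)"
  if node_at_least "$installed" 14; then
    echo "node $installed should parse '??'"
  else
    echo "node $installed is too old to parse '??'"
  fi
else
  echo "node not found on PATH"
fi
```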

Note: as mentioned in my initial post, the error I described happens on ANY URL I try to add, not just archive.org links. For example:

/mnt/media/ArchiveBox/node_modules/single-file-cli/single-file --browser-executable-path=chromium --browser-args=[\"--headless=new\", \"--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36 ArchiveBox/0.7.2 (+https://github.com/ArchiveBox/ArchiveBox/)\", \"--window-size=1440,2000\"] https://www.theguardian.com/us-news/2016/aug/30/us-national-parks-fire-lookout-forest-wildfire singlefile.html

Gets:

bash: syntax error near unexpected token `('

(I noticed that the code block in my initial post had removed the backtick before the parenthesis, and I have edited it to fix that.)

Also, I don't plan to use the terminal instead of the web interface to add new snapshots. The only reason I ran the command in the terminal was to get more details about the error, so I'd like to know what can be done to fix this so I can use the web UI. Thanks!

@pirate
Member

pirate commented Mar 25, 2024

Can you screenshot the terminal running the command and showing this error: Unexpected token '?'

(Manually remove the user-agent args when running that copy-pasted command; the quote escaping is what's causing a bunch of the errors you're seeing, such as syntax error near unexpected token `('.)
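Following that suggestion, here is a sketch of the same command with the user-agent argument dropped and the remaining `--browser-args` JSON kept as one single-quoted argument. Building the arguments in a bash array means each one is quoted exactly once. The paths and URL come from the report above. The `RUN` variable is a hypothetical convenience, not an ArchiveBox or single-file setting: by default the sketch only prints the arguments, and it calls single-file only when `RUN=1` is set.

```shell
# Paths and URL taken from the report above.
single_file=/mnt/media/ArchiveBox/node_modules/single-file-cli/single-file
url='https://www.theguardian.com/us-news/2016/aug/30/us-national-parks-fire-lookout-forest-wildfire'

# One array element per argument; the single quotes keep the JSON array
# (spaces, brackets, double quotes) together as a single argument.
args=(
  --browser-executable-path=chromium
  '--browser-args=["--headless=new", "--window-size=1440,2000"]'
  "$url"
  singlefile.html
)

if [ "${RUN:-0}" = 1 ]; then
  "$single_file" "${args[@]}"
else
  # Dry run: show exactly what single-file would receive, one argument per line.
  printf '%s\n' "${args[@]}"
fi
```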
