Bug: SAVE_ARCHIVE_DOT_ORG=False is not respected #1295

Open
gerroon opened this issue Dec 18, 2023 · 7 comments
Labels
expected: next release
size: easy
status: backlog (Work is planned someday but is not the highest priority at the moment)
touches: configuration
type: bug report
why: functionality (Intended to improve ArchiveBox functionality or features)
why: security (Intended to improve ArchiveBox security or data integrity)

gerroon commented Dec 18, 2023

Describe the bug

SAVE_ARCHIVE_DOT_ORG=False is set in multiple places, but the effective config still reports True, and snapshots are still sent to archive.org.

Steps to reproduce

Set SAVE_ARCHIVE_DOT_ORG=False in docker-compose.yml and use:

[image]

Also set it from the CLI:
docker compose run archivebox config --set SAVE_ARCHIVE_DOT_ORG=False

Then restart the containers.

Screenshots or log output

Here I ran the config command after a restart, and the option is still set to True:

docker compose run archivebox config


SAVE_WARC=True
SAVE_GIT=True
SAVE_MEDIA=True
SAVE_ARCHIVE_DOT_ORG=True
RESOLUTION=1440,2000
GIT_DOMAINS=github.com,bitbucket.org,gitlab.com,gist.github.com

The snapshot page also still has an archive.org URL in its JSON:
[image]
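
One way to check a snapshot's JSON for leftover archive.org links without relying on its exact layout is to walk every value and collect matching strings. This is only a sketch: the sample document and its field names below are hypothetical, and a real ArchiveBox snapshot index will be structured differently.

```python
import json
from typing import Any, Iterator


def find_archive_org_urls(node: Any) -> Iterator[str]:
    """Yield every string value in a parsed JSON tree that mentions archive.org."""
    if isinstance(node, dict):
        for value in node.values():
            yield from find_archive_org_urls(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_archive_org_urls(item)
    elif isinstance(node, str) and "archive.org" in node:
        yield node


# Hypothetical sample; field names are made up for illustration only.
sample = json.loads("""
{
  "url": "https://example.com",
  "outputs": [
    {"name": "wget", "output": "example.com/index.html"},
    {"name": "hypothetical_entry", "output": "https://web.archive.org/web/example.com"}
  ]
}
""")

hits = list(find_archive_org_urls(sample))
print(hits)
```

To try it on a real snapshot, load the file with `json.load(open(path))` and pass the result to `find_archive_org_urls`.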

ArchiveBox version

 docker compose run archivebox version
0.7.1
ArchiveBox v0.7.1+editable BUILD_TIME=2023-12-18 06:57:51 1702882671
IN_DOCKER=True IN_QEMU=False ARCH=x86_64 OS=Linux PLATFORM=Linux-5.14.0-4-amd64-x86_64-with-glibc2.36 PYTHON=Cpython
FS_ATOMIC=True FS_REMOTE=True FS_USER=911:911 FS_PERMS=644
DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND=ripgrep LDAP=False

[i] Dependency versions:
 √  PYTHON_BINARY         v3.11.7         valid     /usr/local/bin/python3.11
 √  SQLITE_BINARY         v2.6.0          valid     /usr/local/lib/python3.11/sqlite3/dbapi2.py
 √  DJANGO_BINARY         v3.1.14         valid     /usr/local/lib/python3.11/site-packages/django/__init__.py
 √  ARCHIVEBOX_BINARY     v0.7.1          valid     /usr/local/bin/archivebox

 √  CURL_BINARY           v8.4.0          valid     /usr/bin/curl
 √  WGET_BINARY           v1.21.3         valid     /usr/bin/wget
 √  NODE_BINARY           v21.4.0         valid     /usr/bin/node
 √  SINGLEFILE_BINARY     v1.1.18         valid     /app/node_modules/single-file-cli/single-file
 √  READABILITY_BINARY    v0.0.9          valid     /app/node_modules/readability-extractor/readability-extractor
 √  MERCURY_BINARY        v1.0.0          valid     /app/node_modules/@postlight/parser/cli.js
 √  GIT_BINARY            v2.39.2         valid     /usr/bin/git
 √  YOUTUBEDL_BINARY      v2023.11.16     valid     /usr/local/bin/yt-dlp
 √  CHROME_BINARY         v120.0.6099.28  valid     /usr/bin/chromium-browser
 √  RIPGREP_BINARY        v13.0.0         valid     /usr/bin/rg

[i] Source-code locations:
 √  PACKAGE_DIR           24 files        valid     /app/archivebox
 √  TEMPLATES_DIR         4 files         valid     /app/archivebox/templates
 -  CUSTOM_TEMPLATES_DIR  -               disabled  None

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled  None
 -  COOKIES_FILE          -               disabled  None

[i] Data locations:
 √  OUTPUT_DIR            8 files @       valid     /data
 √  SOURCES_DIR           6 files         valid     ./sources
 √  LOGS_DIR              1 files         valid     ./logs
 √  ARCHIVE_DIR           5 files         valid     ./archive
 √  CONFIG_FILE           159.0 Bytes     valid     ./ArchiveBox.conf
 √  SQL_INDEX             248.0 KB        valid     ./index.sqlite3

gerroon changed the title from "Bug: ... SAVE_ARCHIVE_DOT_ORG=False is not respected" to "Bug: SAVE_ARCHIVE_DOT_ORG=False is not respected" on Dec 18, 2023

mamema commented Dec 18, 2023

I don't see this; on my side it works correctly. Does your config file look like this?

[SERVER_CONFIG]
SECRET_KEY = redacted
PUBLIC_ADD_VIEW = True
PUBLIC_INDEX = false
PUBLIC_SNAPSHOTS = false

[ARCHIVE_METHOD_TOGGLES]
SAVE_ARCHIVE_DOT_ORG = false
SAVE_PDF = false
SAVE_SCREENSHOT = false
SAVE_DOM = false
SAVE_GIT = false
SAVE_FAVICON = false
SAVE_WARC = false
SAVE_SINGLEFILE = false
SAVE_READABILITY = false
SAVE_MERCURY = false
SAVE_MEDIA = false
SAVE_WGET_REQUISITES = false

[GENERAL_CONFIG]
OUTPUT_DIR = /data
ONLY_NEW = true
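
For a quick check that a file shaped like the one above really carries the toggle under the expected section, a minimal sketch with Python's standard `configparser` might look like the following. It only shows how the INI text parses with the standard library; whether ArchiveBox reads the file the same way is an assumption, and `read_toggle` is a hypothetical helper name.

```python
import configparser
from typing import Optional

CONF_TEXT = """
[SERVER_CONFIG]
PUBLIC_ADD_VIEW = True

[ARCHIVE_METHOD_TOGGLES]
SAVE_ARCHIVE_DOT_ORG = false
"""


def read_toggle(conf_text: str, key: str,
                section: str = "ARCHIVE_METHOD_TOGGLES") -> Optional[bool]:
    """Return the boolean value of `key` in `section`, or None if it is absent."""
    # interpolation=None so that values containing '%' (e.g. a secret key) parse safely
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_option(section, key):
        return None
    # getboolean accepts true/false, yes/no, on/off, 1/0 in any letter case
    return parser.getboolean(section, key)


print(read_toggle(CONF_TEXT, "SAVE_ARCHIVE_DOT_ORG"))
```

A `None` result means the key or its section header is missing, which is the case the `[ARCHIVE_METHOD_TOGGLES]` question is trying to rule out.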

Author

gerroon commented Dec 18, 2023

See my first post: I already listed the config output from Docker, and the option is still on (after setting it both in docker-compose.yml and from the CLI).

[image]


mamema commented Dec 18, 2023

You should have a conf file in your /data directory. Does it look similar to the one posted above?
Does it have the [ARCHIVE_METHOD_TOGGLES] header?

Author

gerroon commented Dec 18, 2023

It is there, but ArchiveBox does not seem to respect the config file, since the option still shows as enabled at runtime.

cat /media/archivebox/ArchiveBox.conf

[SERVER_CONFIG]
SECRET_KEY = xxxxx
PUBLIC_ADD_VIEW = True

[ARCHIVE_METHOD_TOGGLES]
SAVE_ARCHIVE_DOT_ORG = False
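
When a file says False but the runtime reports True, it can help to reason about which source wins. The sketch below models a layered lookup where an environment variable beats the config file, which beats the built-in default. That ordering is an assumption for illustration, not something confirmed in this thread, and `effective_value` and `parse_bool` are hypothetical helpers rather than ArchiveBox functions.

```python
from typing import Mapping, Tuple


def parse_bool(raw: str) -> bool:
    """Interpret common textual spellings of a boolean."""
    value = raw.strip().lower()
    if value in {"true", "yes", "1"}:
        return True
    if value in {"false", "no", "0"}:
        return False
    raise ValueError(f"not a boolean: {raw!r}")


def effective_value(key: str,
                    env: Mapping[str, str],
                    file_values: Mapping[str, str],
                    default: bool) -> Tuple[bool, str]:
    """Resolve `key` under the ASSUMED order: environment > config file > default."""
    if key in env:
        return parse_bool(env[key]), "environment"
    if key in file_values:
        return parse_bool(file_values[key]), "config file"
    return default, "default"


# Hypothetical scenario: a stale environment value shadows the file.
value, source = effective_value(
    "SAVE_ARCHIVE_DOT_ORG",
    env={"SAVE_ARCHIVE_DOT_ORG": "True"},
    file_values={"SAVE_ARCHIVE_DOT_ORG": "False"},
    default=True,
)
print(value, source)
```

Under that assumed ordering, reporting the winning source alongside the value makes it obvious when the file is being shadowed rather than ignored.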

Member

pirate commented Dec 19, 2023

Can you post your full unredacted docker-compose.yml file, including the volumes: section and the rest of the environment: options you have?

Also, please post the full output of docker compose run archivebox cat /data/ArchiveBox.conf and of cat ./ArchiveBox.conf (outside Docker) to confirm that the same config file is being seen both inside and outside the container.
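
Comparing the two `cat` outputs by eye is error-prone, so here is a rough sketch that parses both texts and reports only the keys that differ. It uses the standard `configparser`; `load_flat` and `diff_configs` are hypothetical names, and the sample texts are made up rather than taken from anyone's real files.

```python
import configparser
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (section, option)


def load_flat(conf_text: str) -> Dict[Key, str]:
    """Parse INI text into a flat {(section, option): value} mapping."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names in their original case
    parser.read_string(conf_text)
    return {
        (section, option): value
        for section in parser.sections()
        for option, value in parser.items(section)
    }


def diff_configs(inside: str, outside: str) -> Dict[Key, Tuple[Optional[str], Optional[str]]]:
    """Return {(section, option): (inside_value, outside_value)} for every mismatch."""
    a, b = load_flat(inside), load_flat(outside)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Made-up inputs standing in for the two `cat` outputs.
inside_text = "[ARCHIVE_METHOD_TOGGLES]\nSAVE_ARCHIVE_DOT_ORG = False\n"
outside_text = "[ARCHIVE_METHOD_TOGGLES]\nSAVE_ARCHIVE_DOT_ORG = True\n"
print(diff_configs(inside_text, outside_text))
```

An empty result means both locations hold the same settings, so the mounted file is the one the container sees.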

Author

gerroon commented Dec 19, 2023

I will provide the compose file a bit later, since I need to clean it up first.

 docker compose run archivebox cat /data/ArchiveBox.conf

[SERVER_CONFIG]
SECRET_KEY = xxxx
PUBLIC_ADD_VIEW = True

[ARCHIVE_METHOD_TOGGLES]
SAVE_ARCHIVE_DOT_ORG = False

cat /media/archivebox/ArchiveBox.conf

[SERVER_CONFIG]
SECRET_KEY = xxxxx
PUBLIC_ADD_VIEW = True

[ARCHIVE_METHOD_TOGGLES]
SAVE_ARCHIVE_DOT_ORG = False


As you can see from the timestamp, the file has been modified recently:

ls -alh /media/archivebox/ArchiveBox.conf
-rwxrwxrwx 1 911 911 159 Dec 18 11:44 /media/archivebox/ArchiveBox.conf

      volumes:
            - /media/archivebox:/data
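
To double-check that `/data/ArchiveBox.conf` inside the container and `/media/archivebox/ArchiveBox.conf` on the host are the same file, the bind mount can be resolved by hand. A small sketch, assuming the short `host:container[:mode]` volume syntax shown above; `host_path_for` is a hypothetical helper, not part of Docker or ArchiveBox.

```python
from pathlib import PurePosixPath
from typing import Optional


def host_path_for(container_path: str, volume_spec: str) -> Optional[str]:
    """Translate a container path to its host path for a 'host:container[:mode]' bind mount.

    Returns None when the spec is malformed or the path is outside the mount.
    """
    parts = volume_spec.split(":")
    if len(parts) < 2:
        return None
    host_root = PurePosixPath(parts[0])
    container_root = PurePosixPath(parts[1])
    try:
        relative = PurePosixPath(container_path).relative_to(container_root)
    except ValueError:
        return None  # path is not under this mount point
    return str(host_root / relative)


print(host_path_for("/data/ArchiveBox.conf", "/media/archivebox:/data"))
```

If the result does not match the file being edited on the host, the container is reading a different config than the one shown.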

pirate added the labels type: bug report, touches: configuration, why: functionality, size: easy, why: security, status: backlog, expected: next release on Dec 28, 2023
pirate added this to the v0.8 milestone on Dec 28, 2023
Member

pirate commented Jan 5, 2024

I can't seem to reproduce this on the latest 0.7.2 image. Can you try again with the latest build?

docker pull archivebox/archivebox:dev

docker compose down --remove-orphans
docker compose down

docker compose run archivebox config --set SAVE_ARCHIVE_DOT_ORG=False
docker compose run archivebox config --get SAVE_ARCHIVE_DOT_ORG

[i] [2024-01-05 00:47:57] ArchiveBox v0.7.2: archivebox config --get SAVE_ARCHIVE_DOT_ORG
    > /data

SAVE_ARCHIVE_DOT_ORG=False
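
For anyone scripting this check, the output above can be reduced to its `KEY=value` lines so the result is easy to assert on. A minimal sketch, assuming the output keeps the shape pasted in this thread (log header lines followed by uppercase `KEY=value` pairs); `parse_config_output` is a hypothetical helper.

```python
import re
from typing import Dict

# Uppercase option name, '=', then the raw value up to end of line
_KEY_VALUE = re.compile(r"^([A-Z][A-Z0-9_]*)=(.*)$")


def parse_config_output(output: str) -> Dict[str, str]:
    """Extract KEY=value pairs from command output, ignoring log and header lines."""
    values: Dict[str, str] = {}
    for line in output.splitlines():
        match = _KEY_VALUE.match(line.strip())
        if match:
            values[match.group(1)] = match.group(2)
    return values


# Output copied from the comment above.
sample_output = """\
[i] [2024-01-05 00:47:57] ArchiveBox v0.7.2: archivebox config --get SAVE_ARCHIVE_DOT_ORG
    > /data

SAVE_ARCHIVE_DOT_ORG=False
"""

parsed = parse_config_output(sample_output)
print(parsed)
```

The same function works on the longer `archivebox config` listing from the original report, since it skips anything that is not an uppercase `KEY=value` line.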
