Fails to record when in http proxy mode #847

Open
alexkreidler opened this issue Jun 5, 2023 · 1 comment

@alexkreidler

Describe the bug

Pywb can't serve a request in HTTP proxy mode and record the WARC file. Hopefully I'm missing something simple!

Steps to reproduce the bug

I'm trying to set up the HTTP proxy to record WARC files. When I set my config.yaml to this:

proxy:
  coll: test2
  recording: true

I get no error on pywb startup, but an HTTP 400 error <p>Collection not found: <b>test2</b></p> when I make the request via ALL_PROXY=http://localhost:8080 curl example.com.

So then I run wb-manager init proxied and change the coll to proxied. When I run the same request, I get an HTML page that says Pywb Error and shows this error:


{'args': {'coll': 'proxied', 'type': 'record', 'metadata': {}, 'cache': 'default'}, 'error': '{"error": "HTTPError(\'404 Client Error: No Resource Found for url: http://localhost:40369/live/resource/postreq?param.recorder.coll=proxied&url=http%3A%2F%2Fexample.com%2F&closest=now&matchType=exact\')"}'}

and the logs are:

$ pywb
2023-06-05 04:19:04,234: [INFO]: Proxy recording into collection "proxied"
2023-06-05 04:19:04,356: [INFO]: Starting Gevent Server on 8080
127.0.0.1 - - [2023-06-05 04:19:09] "POST /live/resource/postreq?param.recorder.coll=proxied&url=http%3A%2F%2Fexample.com%2F&closest=now&matchType=exact HTTP/1.1" 404 215 0.000875
127.0.0.1 - - [2023-06-05 04:19:09] "POST /live/resource/postreq?param.recorder.coll=proxied&url=http%3A%2F%2Fexample.com%2F&closest=now&matchType=exact HTTP/1.1" 400 336 0.004111
127.0.0.1 - - [2023-06-05 04:19:09] "GET http://example.com/ HTTP/1.1" 400 1779 0.076372
$ lsof -iTCP
pywb      1700596            root    6u  IPv4 455674019      0t0  TCP localhost:40369 (LISTEN)
pywb      1700596            root    7u  IPv4 455674024      0t0  TCP localhost:45495 (LISTEN)
pywb      1700596            root    8u  IPv4 455674027      0t0  TCP *:http-alt (LISTEN)

Expected behavior

The system should return the correct HTML response and record the WARC file, which I should then see on disk.

Environment

  • OS: Ubuntu 22.04.1 LTS, Linux 5.15.0-25-generic
  • Python env:
conda list
List of packages in environment: "/root/micromamba/envs/proxy"

  Name              Version     Build           Channel    
─────────────────────────────────────────────────────────────
  _libgcc_mutex     0.1         conda_forge     conda-forge
  _openmp_mutex     4.5         2_gnu           conda-forge
  c-ares            1.19.0      h5eee18b_0                 
  ca-certificates   2023.01.10  h06a4308_0                 
  certifi           2022.12.7   py37h06a4308_0             
  cffi              1.15.1      py37h5eee18b_3             
  gevent            21.12.0     py37haa10bde_2  conda-forge
  greenlet          1.1.3       py37h6a678d5_0             
  ld_impl_linux-64  2.38        h1181459_1                 
  libev             4.33        h7f8727e_1                 
  libffi            3.4.4       h6a678d5_0                 
  libgcc-ng         13.1.0      he5830b7_0      conda-forge
  libgomp           13.1.0      he5830b7_0      conda-forge
  libstdcxx-ng      11.2.0      h1234567_1                 
  libuv             1.44.2      h5eee18b_0                 
  ncurses           6.4         h6a678d5_0                 
  openssl           1.1.1t      h7f8727e_0                 
  pip               22.3.1      py37h06a4308_0             
  pycparser         2.21        pyhd3eb1b0_0               
  python            3.7.16      h7a1cb2a_0                 
  python_abi        3.7         2_cp37m         conda-forge
  readline          8.2         h5eee18b_0                 
  setuptools        65.6.3      py37h06a4308_0             
  sqlite            3.41.2      h5eee18b_0                 
  tk                8.6.12      h1ccaba5_0                 
  wheel             0.38.4      py37h06a4308_0             
  xz                5.4.2       h5eee18b_0                 
  zlib              1.2.13      h5eee18b_0                 
  zope              1.0         py37_1                     
  zope.event        4.5.0       py37_0                     
  zope.interface    5.4.0       py37h7f8727e_0    
@malicious

I had a similar error; I needed to add the $live collection to config.yaml.
(It's mentioned in the error message that it's trying to hit http://localhost:40369/live/)
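
For reference, here is a minimal config.yaml sketch of that suggestion. It assumes the `collections: live: $live` form that the pywb configuration documentation describes for defining a live-web collection, combined with the proxy settings from the original report. The collection name `proxied` comes from the report. Whether this resolves the 404 on /live/resource/postreq in this exact environment is untested.

```yaml
# config.yaml (sketch; untested against the reporter's setup)
collections:
  # Assumption: this defines the live-web collection that the recorder
  # posts to at /live/resource/postreq
  live: $live

proxy:
  coll: proxied      # collection created with `wb-manager init proxied`
  recording: true
```

After restarting pywb, repeating ALL_PROXY=http://localhost:8080 curl example.com should show whether the POST to /live/resource/postreq still returns 404.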
