
How to access meta values? #81

Open

llermaly opened this issue Oct 26, 2018 · 3 comments
Labels
bug (project maintainers identified this issue as a potential bug in the project)

Comments

@llermaly

llermaly commented Oct 26, 2018

Hi,

I'm doing this request:
curl -XPOST -d '{ "spider_name": "quotes", "start_requests": true, "request": { "meta": { "test": "1" } } }' "http://138.219.228.215:9080/crawl.json"

Then I try to access it from my spider with print(response.meta), and this is what it shows:

{'depth': 0, 'download_latency': 0.03323054313659668, 'download_slot': 'URL', 'download_timeout': 180.0}

Of course, response.meta["test"] then raises an error.

I need to use this "test" parameter to fill the form request.

EDIT: spider : https://pastebin.com/EFz818qL

thanks!
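Until the bug is resolved, one defensive pattern (a sketch, not part of the original spider at the pastebin link; the helper name is hypothetical) is to read meta keys with a default, so a missing key does not crash the callback:

```python
def get_meta_value(meta, key, default=None):
    """Return meta[key] if present, otherwise fall back to a default."""
    return meta.get(key, default)

# The meta dict actually received by the spider, as reported above:
received_meta = {
    "depth": 0,
    "download_latency": 0.033,
    "download_slot": "URL",
    "download_timeout": 180.0,
}

print(get_meta_value(received_meta, "test", "MISSING"))  # prints MISSING
```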

@changtingQ

Have you been able to solve this issue? Can you tell me how?

@pawelmhm added the bug label Sep 12, 2019
@pawelmhm
Member

pawelmhm commented Sep 12, 2019

Indeed, that doesn't work; I'm going to fix it.

Edit: this only happens when you set start_requests: true and provide a request object with meta. In that case, the meta is not passed to the spider.
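A hypothetical sketch of the behavior a fix would need (not the actual scrapyrt code, and the function name is an assumption): the meta supplied through the API should be merged into each start request's meta instead of being dropped:

```python
def merge_api_meta(request_meta, api_meta):
    """Merge API-supplied meta into a request's meta, without
    overriding keys Scrapy has already set (depth, download_*, ...)."""
    merged = dict(api_meta or {})
    merged.update(request_meta or {})
    return merged

scrapy_meta = {"depth": 0, "download_timeout": 180.0}
api_meta = {"test": "1"}
print(merge_api_meta(scrapy_meta, api_meta))
```

With this merge, the spider would see both the Scrapy-managed keys and the user-supplied "test" key in response.meta.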

@rythm-of-the-red-man

rythm-of-the-red-man commented Oct 10, 2019

I guess this workaround might help you for now.
You can patch scrapyrt to take parameters directly from the JSON body, i.e. as top-level keys rather than under the meta key. This requires modifying resources.py. The tl;dr version: paste

        # Treat any remaining top-level API parameters as spider kwargs.
        crawler_params = api_params.copy()
        for api_param in ['max_requests', 'start_requests', 'spider_name', 'url']:
            crawler_params.pop(api_param, None)
        kwargs.update(crawler_params)

below the first try/except block in the prepare_crawl method (the original comment attached a screenshot showing the placement).
After that, if you set attributes in the spider constructor like this:

    def __init__(self, name=name, **kwargs):
        super().__init__(name=name, **kwargs)
        ### Getting get/post args ###
        for k, v in kwargs.items():
            setattr(self, k, v)

the test parameter should be available as self.test.
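The kwargs-to-attributes idiom in that constructor can be demonstrated without Scrapy. A plain-Python stand-in (SpiderLike is a hypothetical name, not a scrapyrt class):

```python
class SpiderLike:
    """Minimal stand-in for a spider: copies keyword arguments onto
    the instance, mirroring the __init__ shown above."""

    def __init__(self, name="quotes", **kwargs):
        self.name = name
        for k, v in kwargs.items():
            setattr(self, k, v)

spider = SpiderLike(test="1")
print(spider.test)  # prints 1
```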

I'm not sure, but I think I found this solution in an open PR. @pawelmhm, maybe it would be a good idea to merge it? I've been using this patch for around half a year and it seems fine.
