Add session cookies to crawling context #710
Could you elaborate please? For plain HTTP crawlers, you can use

I missed that, for HTTP crawlers we have

It is true that it would be hard to reach the headers via

It can be useful for Playwright to have access to the cookies of the session from which the request was made, but not directly to the headers.

Feel free to rephrase the issue title and description then 🙂
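Until session cookies are exposed on the crawling context directly, one workaround is to convert a session's name-to-value cookie mapping into the list format that Playwright's `BrowserContext.add_cookies` expects. A minimal sketch — the helper name and the sample cookie dict are my own, not part of crawlee:

```python
def session_cookies_to_playwright(cookies: dict[str, str], url: str) -> list[dict]:
    """Convert a name -> value cookie mapping into Playwright's cookie format.

    Playwright's BrowserContext.add_cookies expects a list of dicts, each
    with at least 'name', 'value', and a 'url' (or 'domain' and 'path').
    """
    return [{'name': name, 'value': value, 'url': url} for name, value in cookies.items()]


# Cookies as they might be stored on a crawlee Session (hypothetical values).
session_cookies = {'sessionid': 'abc123', 'csrftoken': 'xyz'}
playwright_cookies = session_cookies_to_playwright(session_cookies, 'https://httpbin.org')
print(playwright_cookies)
```

Inside a Playwright request handler, the resulting list could then be passed to the page's browser context via `add_cookies` before navigation.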
When using PlaywrightCrawler in Crawlee for web scraping, how can I add cookies? Could you provide an example? |
Hey @oldsiks, thank you for your interest in crawlee. Here is an example of setting a cookie at the request-header level:

```python
import asyncio

from crawlee import Request
from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext


async def main() -> None:
    crawler = PlaywrightCrawler()

    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        print(await context.page.content())

    await crawler.run([
        Request.from_url(url='https://httpbin.org/get', headers={'cookie': 'my_cookies'})
    ])


asyncio.run(main())
```

Also, after release 0.5 it will be possible to set cookies via `browser_new_context_options`. You can try this using the pre-release version 0.5.0b30. Example for 0.5:

```python
import asyncio

from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext
from crawlee.browsers import BrowserPool, PlaywrightBrowserPlugin


async def main() -> None:
    user_plugin = PlaywrightBrowserPlugin(
        browser_new_context_options={'extra_http_headers': {'cookie': 'my_cookies'}}
    )
    browser_pool = BrowserPool(plugins=[user_plugin])
    crawler = PlaywrightCrawler(browser_pool=browser_pool)

    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        print(await context.page.content())

    await crawler.run(['https://httpbin.org/get'])


asyncio.run(main())
```
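When passing cookies at the request-header level as in the examples above, multiple cookies must be joined into a single `Cookie` header value. A small helper for that — the function name is my own, not part of crawlee:

```python
def build_cookie_header(cookies: dict[str, str]) -> str:
    """Serialize a name -> value mapping into a single Cookie header value,
    e.g. {'a': '1', 'b': '2'} -> 'a=1; b=2' (RFC 6265 pair syntax)."""
    return '; '.join(f'{name}={value}' for name, value in cookies.items())


# The result can be passed as headers={'cookie': ...} on a crawlee Request.
print(build_cookie_header({'sessionid': 'abc123', 'theme': 'dark'}))
```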
Add the cookies of the session from which the request was made to the crawling context, for both HTTP crawlers and Playwright.