Decrease memory footprint of scrapy #1

Open
rimbi opened this issue Apr 18, 2011 · 2 comments

@rimbi
Member

rimbi commented Apr 18, 2011

This can be done by holding Request objects in a database instead of in memory.

@ghost ghost assigned rimbi Apr 18, 2011
@sardok
Member

sardok commented Apr 20, 2011

My investigation shows that Scrapy is not holding on to a Request object after it gets the response for it; instead, the Request objects accumulate in the scheduler queue. The scheduler is, of course, not as fast as the downloading, so its queue just keeps growing.

@rimbi
Member Author

rimbi commented Apr 20, 2011

I've found the place where Request objects are stored in deque structures. I tried to insert a hook there, but it's not an easy change. I thought I could use the pickle module to store Request objects as strings in a database, but I'm getting errors.
Also, this requires changes in the Scrapy framework itself, so we need the Scrapy repo as well.
By the way, during this development effort I came to appreciate the advantages of statically typed languages over dynamically typed ones when debugging errors. :)
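
A minimal sketch of the database-backed queue idea, not the actual Scrapy change. `SimpleRequest` and `SqliteRequestQueue` are hypothetical names, and the sketch assumes a request can be reduced to plain fields (URL, method, and the callback's name rather than the function object). The pickle errors are not shown above; one possible cause, stated here only as an assumption, is that a request holds a callable that pickle cannot serialize, which is why the sketch stores a name instead.

```python
import pickle
import sqlite3
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SimpleRequest:
    """Hypothetical stand-in for a request; not the real Scrapy class."""
    url: str
    method: str = "GET"
    # Name of the handler to call later, stored as text (assumption:
    # the callable itself is looked up by name when the request is popped).
    callback_name: Optional[str] = None


class SqliteRequestQueue:
    """FIFO queue that keeps pickled requests in SQLite instead of a deque."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS queue "
            "(id INTEGER PRIMARY KEY AUTOINCREMENT, data BLOB NOT NULL)"
        )

    def push(self, request: SimpleRequest) -> None:
        # Pickle a plain dict of fields so only simple types are serialized.
        blob = pickle.dumps(asdict(request), protocol=pickle.HIGHEST_PROTOCOL)
        with self._db:  # commits on success
            self._db.execute("INSERT INTO queue (data) VALUES (?)", (blob,))

    def pop(self) -> Optional[SimpleRequest]:
        # Take the oldest row, delete it, and rebuild the request object.
        with self._db:
            row = self._db.execute(
                "SELECT id, data FROM queue ORDER BY id LIMIT 1"
            ).fetchone()
            if row is None:
                return None
            self._db.execute("DELETE FROM queue WHERE id = ?", (row[0],))
        return SimpleRequest(**pickle.loads(row[1]))

    def __len__(self) -> int:
        return self._db.execute("SELECT COUNT(*) FROM queue").fetchone()[0]
```

With a file path instead of `":memory:"`, pending requests live on disk, so memory use no longer grows with the queue length; the cost is one database round trip per push and pop.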
