Retrieving objects for a set or list of URLs in parallel #22
The loop could be one iteration... in fact, the example you're looking at just loops once.
@wumpus - thanks for the response :D I read your explanation on another issue in this repo (#8). I wanted to ask whether the retrieval time depends on how many requests are sent to Common Crawl at a given time? It would also be helpful if you could suggest any changes that could speed up retrieval. Thanks.
Turn up the verbose level and you'll see what's going on -- if you are not limiting your time span, the cdx code has to talk to every Common Crawl index individually, whereas for the Internet Archive there's just one query.
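To illustrate the point above: Common Crawl publishes a separate index per crawl (with names like `CC-MAIN-2020-34`), and an unbounded query has to consult every one of them. The sketch below shows why narrowing the time span shrinks the work; the index list and the year-based filtering are illustrative assumptions, not cdx_toolkit's actual index-selection logic.

```python
# Toy model of per-crawl index selection. Real Common Crawl index names
# follow the pattern 'CC-MAIN-<year>-<week>'; cdx_toolkit's internals differ,
# but the principle is the same: fewer indexes in range, fewer queries made.

def indexes_in_range(all_indexes, from_year, to_year):
    """Keep only index names whose year falls inside [from_year, to_year]."""
    selected = []
    for name in all_indexes:
        year = int(name.split('-')[2])  # 'CC-MAIN-2020-34' -> 2020
        if from_year <= year <= to_year:
            selected.append(name)
    return selected

all_indexes = [
    'CC-MAIN-2018-05',
    'CC-MAIN-2019-35',
    'CC-MAIN-2020-34',
    'CC-MAIN-2021-10',
]
# Narrowing the span from "all years" to 2019-2020 halves the queries here.
print(indexes_in_range(all_indexes, 2019, 2020))
```

In practice this corresponds to bounding the query time span, as in the `from_ts=` and `to=` arguments shown in the programming example linked in the README below.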
Hi,
Thanks for sharing the programming example: https://github.com/cocrawler/cdx_toolkit#programming-example
I wanted to ask if there is a way to feed in a list of URLs and retrieve their objects. In the example above, URLs are fed in one by one, and looping over a few thousand (or even a few hundred) of them is quite time-consuming.
Thanks.
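Since each per-URL query is network-bound, one way to cut the wall-clock time for a list of URLs is to issue the queries concurrently. Below is a minimal sketch using Python's `concurrent.futures`; `fetch_one` is a hypothetical placeholder standing in for a per-URL cdx_toolkit query (e.g. collecting results from `cdx.iter(url)`), not an actual batch API of the library.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_one(url):
    # Placeholder: in real use this would run one cdx_toolkit query, e.g.
    #   cdx = cdx_toolkit.CDXFetcher(source='cc')
    #   return list(cdx.iter(url, limit=10))
    # Here it just echoes the URL so the sketch is self-contained.
    return (url, 'fetched')

def fetch_all(urls, max_workers=8):
    # Threads (not processes) fit here because the work is network-bound,
    # so workers spend most of their time waiting on I/O.
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_one, u): u for u in urls}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results

urls = ['example.com/*', 'commoncrawl.org/*']
results = fetch_all(urls)
```

Note that all the parallel requests still land on the same index servers, so keep `max_workers` modest and be polite to the service's rate limits.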