read_html_live sessions do not close and accumulate causing memory to crash #422
Comments
Do you see the problem with this simpler reprex?

library(rvest)
for (i in 1:100) {
  page <- read_html_live("https://hadley.nz")
}

Does adding an explicit |
I bet this is going to be a Windows-specific problem 😞 |
@alireza5969 In the latest screenshot, I see it says "Google Chrome (86)". I don't have a Windows machine handy -- does that mean there are 86 tabs/windows open? It may be counting your regular (visible) tabs in that number. Also, I think 1.7GB is not actually a lot of memory for Chrome to consume when you have multiple tabs open. If that 86 does represent the number of open tabs and you do not have that many visible open tabs, then it may be the case that rvest is opening many tabs and not closing them right away. Can you check if that number is the number of open tabs -- does it increase when you open a new tab? And also check how many visible tabs you have, as opposed to the invisible headless ones created by rvest/chromote. |
@alireza5969 could you please try installing |
To be honest, I’m not entirely sure what this indicates! After a fresh session following a restart, when I visit this page with Chrome, I see varying counts (like 14 or 22). When I open a new tab (for instance, google.com), the number jumps to between 19 and 27. So, I suspect it’s not accurately reflecting active or visible tabs.
I agree, it’s not. However, it’s currently using 73% of my memory (and I actually have decent RAM!). But for my tasks, I sometimes need to scrape over 5K webpages! That’s when it really becomes a concern.
Yes, I believe that’s the case.
Yes, it goes up with the number of open tabs (or potentially with the workload). All the examples above were with Chrome, without using

I'm sorry, @hadley! I can’t install
But, I was able to install it with this one:

This is what it looks like when I run:

for (i in 1:100) {
  print(i)
  page <- read_html_live("https://hadley.nz")
}

I think you did it @hadley 👏🏻😌 |
Oops, sorry for the wrong org name, and thanks for verifying that the fix works! |
Dear {{tidyverse}}/{{rvest}} community,

I'm not sure if this is a bug or a problem that I cannot find the solution for.

I am trying to read about 1000 pages with read_html_live() in a for loop. Naturally, I expect each page / session (I'm sorry if I'm not using the correct technical term) to be closed when a new one is opened. However, after the machine has read 50-100 pages, it runs out of memory and crashes. When I look at Task Manager, I see that Chrome is consuming most of the memory (see image below).
FYI, this is the code that I'm using:
Currently, my workaround is this code, which I run after every 100 iterations, but it makes the script very slow.
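A possible alternative to cleaning up in batches is to close each live session as soon as its page has been processed. The following is only a minimal sketch, not a confirmed fix for this issue. It assumes the object returned by `read_html_live()` exposes its underlying chromote session as `page$session` with a `$close()` method. The names `scrape_pages()`, `extract`, and `read_page` are hypothetical and used only for illustration.

```r
# Sketch: read each URL, extract what is needed, then close the session.
# `scrape_pages`, `extract`, and `read_page` are hypothetical names.
scrape_pages <- function(urls, extract, read_page = rvest::read_html_live) {
  lapply(urls, function(url) {
    page <- read_page(url)
    # Assumption: `page$session` is the chromote session behind the live page.
    # on.exit() runs even if extract() fails, so the session is always closed.
    on.exit(page$session$close(), add = TRUE)
    extract(page)
  })
}
```

For example, `scrape_pages(urls, function(p) rvest::html_elements(p, "h1"))` would keep at most one session open at a time, at the cost of starting a new session for every page.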