
selenium.py throws an error when running #2

Open
soen0905 opened this issue Jul 19, 2022 · 2 comments

@soen0905

I get an error when I try to run the script. My guess is that something in the generator is going wrong, but it might also be a version issue (Windows 10, Python 3.8).

def parse_index():
    elements = browser.find_elements_by_css_selector('#index .item .name')
    for element in elements:
        href = element.get_attribute('href')
        yield urljoin(INDEX_URL, href)


The error output looks like this:

2022-07-19 20:52:44,940 - INFO:scraping https://spa2.scrape.center/page/1
2022-07-19 20:52:48,127 - INFO:detail url https://spa2.scrape.center/detail/ZWYzNCN0ZXVxMGJ0dWEjKC01N3cxcTVvNS0takA5OHh5Z2ltbHlmeHMqLSFpLTAtbWIx
2022-07-19 20:52:48,127 - INFO:scraping https://spa2.scrape.center/detail/ZWYzNCN0ZXVxMGJ0dWEjKC01N3cxcTVvNS0takA5OHh5Z2ltbHlmeHMqLSFpLTAtbWIx
abcd
2022-07-19 20:52:49,963 - INFO:detail data {'url': 'https://spa2.scrape.center/detail/ZWYzNCN0ZXVxMGJ0dWEjKC01N3cxcTVvNS0takA5OHh5Z2ltbHlmeHMqLSFpLTAtbWIx', 'name': '霸王别姬 - Farewell My Concubine', 'categories': ['剧情', '爱情'], 'cover': 'https://p0.meituan.net/movie/ce4da3e03e655b5b88ed31b5cd7896cf62472.jpg@464w_644h_1e_1c', 'score': '9.5', 'drama': '影片借一出《霸王别姬》的京戏,牵扯出三个人之间一段随时代风云变幻的爱恨情仇。段小楼(张丰毅 饰)与程蝶衣(张国荣 饰)是一对打小一起长大的师兄弟,两人一个演生,一个饰旦,一向配合天衣无缝,尤其一出《霸王别姬》,更是誉满京城,为此,两人约定合演一辈子《霸王别姬》。但两人对戏剧与人生关系的理解有本质不同,段小楼深知戏非人生,程蝶衣则是人戏不分。段小楼在认为该成家立业之时迎娶了名妓菊仙(巩俐 饰),致使程蝶衣认定菊仙是可耻的第三者,使段小楼做了叛徒,自此,三人围绕一出《霸王别姬》生出的爱恨情仇战开始随着时代风云的变迁不断升级,终酿成悲剧。'}
Traceback (most recent call last):
  File "D:/kinds_work/python_work/spider/第七章/selenium_spider/scrape_Spa2.py", line 93, in <module>
    main()
  File "D:/kinds_work/python_work/spider/第七章/selenium_spider/scrape_Spa2.py", line 81, in main
    for detail_url in detail_urls:
  File "D:/kinds_work/python_work/spider/第七章/selenium_spider/scrape_Spa2.py", line 45, in parse_index
    href = element.get_attribute('href')
  File "E:\anaconda\envs\spider\lib\site-packages\selenium\webdriver\remote\webelement.py", line 139, in get_attribute
    attributeValue = self.parent.execute_script(
  File "E:\anaconda\envs\spider\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 634, in execute_script
    return self.execute(command, {
  File "E:\anaconda\envs\spider\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "E:\anaconda\envs\spider\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: chrome=103.0.5060.114)

When I change the original code to

def parse_index():
    temp = []
    elements = browser.find_elements_by_css_selector('#index .item .name')
    for element in elements:
        href = element.get_attribute('href')
        temp.append(href)
    return temp
        # yield urljoin(INDEX_URL, href)

the program runs normally, and I really cannot understand why this happens.

I also tried debugging this code: in the second iteration of the for loop, the element object passed into element.get_attribute('href') looked fine.

I hope someone can find the time to answer my question.
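
For context, the StaleElementReferenceException here is most likely caused by the generator being consumed lazily: parse_index() yields one URL, main() then navigates the browser to that detail page, and when the generator resumes, the WebElements collected on the index page are no longer attached to the current DOM. Below is a minimal sketch of a fix that keeps the generator but reads every href before any navigation can happen. It reuses browser, INDEX_URL and the selector from the snippet above; everything else about the script is an assumption based on the traceback (navigate to the index, then loop over detail_urls and visit each detail page).

from urllib.parse import urljoin  # already used by the original script

def parse_index():
    elements = browser.find_elements_by_css_selector('#index .item .name')
    # Read every href on the first iteration, while the index page is still
    # the current page; calling get_attribute later, after the browser has
    # navigated to a detail page, is what raises StaleElementReferenceException.
    hrefs = [element.get_attribute('href') for element in elements]
    for href in hrefs:
        yield urljoin(INDEX_URL, href)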

@hefeng61

hefeng61 commented Mar 3, 2023

[screenshot] This part converts the generator into a list, but I don't understand why; the earlier examples didn't do anything like that.

@soen0905
Author

I haven't looked at this in ages. The answer Google gave me is roughly: once it is wrapped in list(), all of detail_urls is loaded into memory up front, so the program no longer gets stuck at this point? Or maybe it is just easier to debug? It feels like pulling values out of the generator with `for ... in` is where the problem might occur?

Whatever... just a guess. XD
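
That guess is essentially right. A small self-contained demo (the names below are illustrative, not taken from the original script) shows the timing difference between iterating a generator directly and wrapping it in list():

def parse_index_like():
    # Stand-in for parse_index(): the body only runs when iteration starts.
    for i in range(3):
        print(f'reading element {i}')   # like element.get_attribute('href')
        yield f'url-{i}'

gen = parse_index_like()   # nothing printed yet: the generator body is lazy
urls = list(gen)           # all three "reading element" lines print now

for url in urls:
    print('visiting', url)  # with list(), every read happened before any visit

Without list(), each "reading element" step runs interleaved with the "visiting" steps; in the Selenium script that means get_attribute() is called after the browser has already left the index page, which is exactly when the element reference goes stale.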
