No automatic Grab on macOS #6

Open
RockNHawk opened this issue Nov 21, 2018 · 5 comments

RockNHawk commented Nov 21, 2018

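Hello!

After some compatibility changes to the code (pull request already submitted), macOS can now fetch data successfully by clicking the Grab Now button under Actions.

However, automatic Grab does not seem to run: GrabResult is empty, Status is ON, and the Log is empty too. Where should I start investigating this?

I am running it standalone.

Any pointers would be appreciated. Thanks!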

RockNHawk (Author) commented Nov 21, 2018

The log is showing up now:

2018-11-21 20:49:47,809 [1] INFO - 127.0.0.1:36000 feed scheduler starting
2018-11-21 20:49:47,819 [1] INFO - 127.0.0.1:36000 feed scheduler started
2018-11-21 20:49:47,821 [1] INFO - Start WebApiServer At http://127.0.0.1:36000 with STANDALONE node
2018-11-21 20:49:48,983 [4] INFO - 127.0.0.1:36000 add job with feed id 5
2018-11-21 20:49:48,996 [4] INFO - 127.0.0.1:36000 add job with feed id 3
2018-11-21 20:49:49,006 [4] INFO - 127.0.0.1:36000 add job with feed id 11
2018-11-21 20:49:49,013 [4] INFO - 127.0.0.1:36000 add job with feed id 1
2018-11-21 20:49:49,023 [4] INFO - 127.0.0.1:36000 add job with feed id 2
2018-11-21 20:49:49,032 [4] INFO - 127.0.0.1:36000 add job with feed id 4
2018-11-21 20:49:49,040 [4] INFO - 127.0.0.1:36000 add job with feed id 12
2018-11-21 20:49:49,040 [4] INFO - 127.0.0.1:36000 sync feed and add feed jobs:7
2018-11-21 20:49:49,042 [4] INFO - 127.0.0.1:36000 add extract job
2018-11-21 20:50:00,069 [14] INFO - feed job feed127.0.0.1:36000.3 add to feed crawl queue
2018-11-21 20:50:00,069 [13] INFO - feed job feed127.0.0.1:36000.12 add to feed crawl queue
2018-11-21 20:50:00,069 [15] INFO - feed job feed127.0.0.1:36000.2 add to feed crawl queue
2018-11-21 20:50:00,069 [5] INFO - feed job feed127.0.0.1:36000.11 add to feed crawl queue
2018-11-21 20:50:00,069 [9] INFO - feed job feed127.0.0.1:36000.1 add to feed crawl queue
2018-11-21 20:50:00,069 [4] INFO - feed job feed127.0.0.1:36000.5 add to feed crawl queue
2018-11-21 20:50:00,096 [4] INFO - feed job http://www.jiuxian.com/goods-55611.html?source=92 starting
2018-11-21 20:50:00,096 [15] INFO - feed job https://www.kuaidaili.com/free/inha/1/ starting
2018-11-21 20:50:00,096 [9] INFO - feed job https://www.oschina.net/blog starting
2018-11-21 20:50:00,096 [5] INFO - feed job http://www.ruijihg.com/爬虫 starting
2018-11-21 20:50:00,098 [4] INFO - do task -> request address http://www.jiuxian.com/goods-55611.html?source=92
2018-11-21 20:50:00,098 [15] INFO - do task -> request address https://www.kuaidaili.com/free/inha/1/
2018-11-21 20:50:00,098 [9] INFO - do task -> request address https://www.oschina.net/blog
2018-11-21 20:50:00,098 [5] INFO - do task -> request address http://www.ruijihg.com/爬虫
2018-11-21 20:50:00,098 [13] INFO - begin move delay feed
2018-11-21 20:50:00,102 [13] INFO - get snapshot feed count:0
2018-11-21 20:50:00,104 [13] INFO - feed job http://press.gapp.gov.cn:8088/press_search/pages/query/queryAction!findmediaPaging.action starting
2018-11-21 20:50:00,104 [13] INFO - do task -> request address http://press.gapp.gov.cn:8088/press_search/pages/query/queryAction!findmediaPaging.action
2018-11-21 20:50:00,169 [4] INFO - request http://www.jiuxian.com/goods-55611.html?source=92 response code is BadRequest
2018-11-21 20:50:01,052 [16] INFO - feed job http://app.cannews.com.cn/roll.php?do=query&callback=jsonp1475197217819&_=1542804600157&date=2018-11-21&size=20&page=1 starting
2018-11-21 20:50:01,053 [16] INFO - do task -> request address http://app.cannews.com.cn/roll.php?do=query&callback=jsonp1475197217819&_=1542804600157&date=2018-11-21&size=20&page=1
2018-11-21 20:50:04,157 [5] INFO - request http://www.ruijihg.com/爬虫 response code is OK
2018-11-21 20:50:04,170 [5] INFO - http://www.ruijihg.com/爬虫 response save to /Users/user1/git/RuiJi.Net/RuiJi.Net.Cmd/bin/Debug/netcoreapp2.1/snapshot/1_636784302041708630.json
2018-11-21 20:50:04,262 [19] INFO - feed job http://app.cannews.com.cn/roll.php?do=query&callback=jsonp1475197217819&_=1542804600157&date=2018-11-21&size=20&page=2 starting
2018-11-21 20:50:04,263 [19] INFO - do task -> request address http://app.cannews.com.cn/roll.php?do=query&callback=jsonp1475197217819&_=1542804600157&date=2018-11-21&size=20&page=2
2018-11-21 20:50:04,461 [9] INFO - request https://www.oschina.net/blog response code is OK
2018-11-21 20:50:04,463 [9] INFO - https://www.oschina.net/blog response save to /Users/user/git/RuiJi.Net/RuiJi.Net.Cmd/bin/Debug/netcoreapp2.1/snapshot/5_636784302044636140.json
2018-11-21 20:50:04,832 [15] INFO - request https://www.kuaidaili.com/free/inha/1/ response code is OK
2018-11-21 20:50:04,833 [15] INFO - https://www.kuaidaili.com/free/inha/1/ response save to /Users/user1/git/RuiJi.Net/RuiJi.Net.Cmd/bin/Debug/netcoreapp2.1/snapshot/2_636784302048338500.json
2018-11-21 20:50:04,864 [16] INFO - request http://app.cannews.com.cn/roll.php?do=query&callback=jsonp1475197217819&_=1542804600157&date=2018-11-21&size=20&page=1 response code is OK
2018-11-21 20:50:04,865 [16] INFO - http://app.cannews.com.cn/roll.php?do=query&callback=jsonp1475197217819&_=1542804600157&date=2018-11-21&size=20&page=2 response save to /Users/user1/git/RuiJi.Net/RuiJi.Net.Cmd/bin/Debug/netcoreapp2.1/snapshot/3_636784302048652700.json
2018-11-21 20:50:04,946 [19] INFO - request http://app.cannews.com.cn/roll.php?do=query&callback=jsonp1475197217819&_=1542804600157&date=2018-11-21&size=20&page=2 response code is OK
2018-11-21 20:50:04,947 [19] INFO - http://app.cannews.com.cn/roll.php?do=query&callback=jsonp1475197217819&_=1542804600157&date=2018-11-21&size=20&page=2 response save to /Users/user1/git/RuiJi.Net/RuiJi.Net.Cmd/bin/Debug/netcoreapp2.1/snapshot/3_636784302049472390.json
2018-11-21 20:50:06,134 [13] INFO - request http://press.gapp.gov.cn:8088/press_search/pages/query/queryAction!findmediaPaging.action response code is OK
2018-11-21 20:50:06,143 [13] INFO - http://press.gapp.gov.cn:8088/press_search/pages/query/queryAction!findmediaPaging.action response save to /Users/user1/git/RuiJi.Net/RuiJi.Net.Cmd/bin/Debug/netcoreapp2.1/snapshot/11_636784302061430920.json

2018-11-21 20:51:07,090 [13] INFO - extract job http://www.cannews.com.cn/2018/1121/185471.shtml save result False
2018-11-21 20:51:07,090 [16] INFO - extract job http://www.cannews.com.cn/2018/1121/185469.shtml save result False

....

2018-11-21 20:52:00,005 [13] INFO - feed extract job execute
2018-11-21 20:52:00,006 [13] INFO - extract job started
2018-11-21 20:52:00,006 [13] INFO - begin move delay feed
2018-11-21 20:52:00,007 [9] INFO - get snapshot feed count:0
2018-11-21 20:53:00,004 [27] INFO - feed extract job execute
2018-11-21 20:53:00,005 [27] INFO - extract job started
2018-11-21 20:53:00,005 [27] INFO - begin move delay feed
2018-11-21 20:53:00,005 [13] INFO - get snapshot feed count:0

RockNHawk (Author) commented

This is the error log. It contains no stack traces, so where should I start investigating?

2018-11-21 19:45:00,037 [20] ERROR - https://www.oschina.net/blog response error is Specified value has invalid Control characters.
Parameter name: value
2018-11-21 19:45:00,037 [17] ERROR - http://www.ruijihg.com/爬虫 response error is Specified value has invalid Control characters.
Parameter name: value
2018-11-21 20:22:16,762 [39] ERROR - http://www.cannews.com.cn/2018/1121/185448.shtml response error is A task may only be disposed if it is in a completion state (RanToCompletion, Faulted or Canceled).
2018-11-21 20:32:15,484 [43] ERROR - http://www.cannews.com.cn/2018/1121/185460.shtml response error is A task may only be disposed if it is in a completion state (RanToCompletion, Faulted or Canceled).
2018-11-21 20:37:33,384 [15] ERROR - http://www.jiuxian.com/goods-55611.html?source=92 response error is One or more errors occurred. (Failed to launch chrome! path to executable does not exist)
2018-11-21 20:45:00,070 [22] ERROR - http://www.jiuxian.com/goods-55611.html?source=92 response error is One or more errors occurred. (Failed to launch chrome! path to executable does not exist)
2018-11-21 20:50:00,166 [4] ERROR - http://www.jiuxian.com/goods-55611.html?source=92 response error is One or more errors occurred. (Failed to launch chrome! path to executable does not exist)
2018-11-21 20:55:00,020 [28] ERROR - http://www.jiuxian.com/goods-55611.html?source=92 response error is One or more errors occurred. (Failed to launch chrome! path to executable does not exist)
2018-11-21 21:00:00,215 [24] ERROR - http://www.jiuxian.com/goods-55611.html?source=92 response error is One or more errors occurred. (Failed to launch chrome! path to executable does not exist)
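Because the error log has no stack traces, one way to start is to group the entries by message and see how many distinct failure types there are. The following is a minimal sketch, assuming the line layout shown in the excerpt above (timestamp, thread id in brackets, `ERROR - <url> response error is <message>`). The function name `summarize_errors` is hypothetical and not part of RuiJi.Net.

```python
import re
from collections import Counter

# Matches the ERROR lines in the excerpt above:
# "<timestamp> [<thread>] ERROR - <url> response error is <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \[(?P<thread>\d+)\] "
    r"ERROR - (?P<url>\S+) response error is (?P<msg>.*)$"
)


def summarize_errors(lines):
    """Count log entries per error message; continuation lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("msg")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2018-11-21 19:45:00,037 [20] ERROR - https://www.oschina.net/blog "
        "response error is Specified value has invalid Control characters.",
        "Parameter name: value",
        "2018-11-21 20:37:33,384 [15] ERROR - http://www.jiuxian.com/goods-55611.html?source=92 "
        "response error is One or more errors occurred. "
        "(Failed to launch chrome! path to executable does not exist)",
    ]
    for message, count in summarize_errors(sample).most_common():
        print(count, message)
```

Applied to the excerpt above, this would separate three kinds of failure: the invalid control characters error, the task disposal error, and the Chrome launch error. Each can then be investigated on its own.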

githublixiang (Collaborator) commented

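Hello. We have no macOS device, so we cannot test on macOS. The error "Failed to launch chrome! path to executable does not exist" means the rule uses RunJs but the headless browser has not been set up.

If you need to run the JavaScript on the page, you need to install the Chromium headless browser.
Download link: https://pan.baidu.com/s/1rsyCNnXxbobCBLZuPTiJHQ (access code: cr3e)
Download the Chromium zip package that matches the operating system RuiJi.Net is deployed on.
Extract the executable files into the chromium folder under the RuiJi.Net runtime root directory, and RunJs will then work.

For how to configure Chromium on macOS specifically, please consult the relevant documentation.
The following is the fix for Linux.
On Linux, install the Chromium libraries:
yum install chromium-libs.x86_64
Then grant full permissions on the chromium folder:
chmod -R 777 chromium

After these two steps, headless Chromium runs normally on Linux. This is offered for reference.
Chinese documentation for reference: https://gitee.com/zhupingqi/RuiJi.Net/wikis/%E5%85%B6%E4%BB%96?sort_id=580719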
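Before retrying a RunJs rule, it may help to confirm that the chromium folder contains something executable. The sketch below only lists the files under a given folder that the current user is allowed to execute. It does not assume how RuiJi.Net itself locates the browser, and the function name `find_executables` and the relative path `chromium` in the usage line are illustrative assumptions.

```python
import os
from pathlib import Path


def find_executables(chromium_dir):
    """Return the files under chromium_dir that the current user may execute."""
    root = Path(chromium_dir)
    if not root.is_dir():
        # A missing folder is consistent with "path to executable does not exist".
        return []
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and os.access(path, os.X_OK)
    )


if __name__ == "__main__":
    # Assumed to be run from the RuiJi.Net runtime root directory.
    found = find_executables("chromium")
    if found:
        for path in found:
            print(path)
    else:
        print("no executable files found under ./chromium")
```

An empty result suggests either that the zip package was extracted somewhere else or that the execute permission is missing, which is the problem the chmod step above addresses on Linux.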

RockNHawk (Author) commented

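Thanks for the reply! I tested with the sample data bundled with the project. It also includes feeds that do not need RunJs, and those have no GrabResult either.

So, for errors like this one:

2018-11-21 20:22:16,762 [39] ERROR - http://www.cannews.com.cn/2018/1121/185448.shtml response error is A task may only be disposed if it is in a completion state (RanToCompletion, Faulted or

where should I start investigating?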

githublixiang (Collaborator) commented

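Hello. This log entry indicates a response error. When I opened the link, I found that it is no longer valid.
Please check that the Feed and Rule you want to extract are configured correctly.
For reference, see the OSChina blog example with FeedId 5 on the test server.
http://118.31.61.230:36000/#feed/feeds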
