[Feature Request] Add support for JMESPath #25
Comments
That's an interesting idea. I'd be okay with adding a
|
Alright I'll prototype this idea and see how it goes. |
@eliasdorneles: do you recall XPathHtmlSelector, XPathXMLSelector, CSSHtmlSelector...? I am not fond of using a different class for JMESPath; we already ditched that approach in favour of a single class with different methods per selection type. Off the top of my head, the main reason I recall is simpler nesting of selection methods: |
@dangra I see. Do we have a use case for |
There are useful use cases for chaining (e.g. processing data- attributes), but I don't think they are worth the extra complexity we may introduce to support them.
|
@kmike curious how you're thinking about the implementation. |
This is exactly what Parsel provides, it moves the implementation complexity out from users.
Both examples for chaining JSON and HTML are valid and making chaining easy is part of Parsel philosophy. |
Users who try to subclass the selectors will end up facing these complexities. |
@Digenis I don't think there is such complexity for users extending the Selector class. I can understand there was a bit when the CSS selection method was added, because behind the scenes it translates to XPath and reuses it. But for JMESPath this is going to be a completely new method; it doesn't interfere with existing methods at all. I think we have two options:
|
Ok! I must admit Option 2 is complex because we are parsing the DOM in Selector constructor but option 1 is still compelling, isn't it? :) |
I agree that option (1) looks easy enough to implement, but has anyone had a real use case for it? If I understood @voith properly, he wanted to parse JSON using Parsel (no XML/HTML involved at all), not to query some JSON data extracted from XML/HTML element attributes. |
Yes, I opened this issue with the intention of being able to parse JSON with Parsel, although it'd be great to have chained parsing. But the implementation of having |
We can delay the parsing of the DOM until the first selection method is called. That will trigger json, xml, html parsing on demand. |
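A minimal sketch of what such on-demand parsing could look like, using only the Python standard library. The class `LazySelector` and its methods are hypothetical names for illustration, not parsel's API; the point is only that the constructor stores the raw text and each format is parsed at most once, on first use.

```python
import json
import xml.etree.ElementTree as ET


class LazySelector:
    """Hypothetical selector that parses its input only when first queried."""

    def __init__(self, text):
        self.text = text
        self._parsed = {}     # cache: format name -> parsed document
        self.parse_calls = 0  # counts real parses, for demonstration only

    def _document(self, fmt, parser):
        # Parse on first request for this format, then reuse the cached result.
        if fmt not in self._parsed:
            self._parsed[fmt] = parser(self.text)
            self.parse_calls += 1
        return self._parsed[fmt]

    def json_key(self, key):
        """Look up a top-level key; triggers JSON parsing on demand."""
        return self._document("json", json.loads)[key]

    def findall_text(self, path):
        """Text of elements matching an ElementTree path; triggers XML parsing on demand."""
        root = self._document("xml", ET.fromstring)
        return [el.text for el in root.findall(path)]


sel = LazySelector('{"total": 4, "status": "ok"}')
calls_before = sel.parse_calls      # nothing has been parsed yet
total = sel.json_key("total")       # first query parses the JSON
status = sel.json_key("status")     # second query reuses the cached document
```

With this shape, a selector built from JSON never pays for an HTML parse and vice versa.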
Offering But I don't see a compelling use case for chaining
Internally, in the current parsel implementation, once the input is parsed, chaining navigates inside the same parsed document tree; it does not re-parse to build a new document. Take, say, some HTML document containing comments which themselves contain HTML code,
parsel does not support something like
You still have to reinject into another selector to work on the embedded HTML:
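As a rough illustration of this two-pass idea, here is a sketch that uses only the standard library `html.parser` instead of parsel. The classes `CommentCollector` and `LinkTextCollector` and the sample document are made up for the example: the first pass only sees the comment as opaque text, and a second parser has to be fed that text to reach the markup inside it.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """First pass: collect the raw contents of HTML comments."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


class LinkTextCollector(HTMLParser):
    """Second pass: collect the text found inside <a> elements."""

    def __init__(self):
        super().__init__()
        self.in_link = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.texts.append(data)


html = "<div><p>visible</p><!-- <a href='/x'>hidden link</a> --></div>"

first = CommentCollector()
first.feed(html)
first.close()

second = LinkTextCollector()
for comment in first.comments:
    second.feed(comment)  # re-inject the comment body as a new document
second.close()

link_texts = second.texts
```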
|
Finding a mix of JSON and XML/HTML is not rare when we crawl.
Here are some examples.
When there is JSON in HTML:
<div>
<h1>Information</h1>
<content>
{
"user": [
{ "name": "A", "age": 18},
{"name": "B","age": 32},
{"name": "C","age": 22},
{"name": "D","age": 25}
],
"total": 4,
"status": "ok"
}
</content>
</div>
>>> sel.xpath('//div/content').jpath('user[*].name').getall()
['A', 'B', 'C', 'D']
When there is HTML in JSON:
{
"content": [
{ "name": "A", "value": "a" },
{"name": {"age": 18}, "value": "b"},
{"name": "C", "value": "c"},
{"name": "<a>D</a>", "value": "<div>d</div>"}
],
"html": [
"<div><a>AAA<br>Test</a>aaa</div><div><a>BBB</a>bbb<b>BbB</b><div/>"
]
}
>>> sel.jpath('html').xpath('//div/a/text()').getall()
['AAA', 'Test', 'BBB']
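For comparison, the same two results can be reached without chaining by parsing in two explicit steps. This is only a sketch with the Python standard library, not parsel or jmespath; the helper class `AnchorText` is hypothetical and the sample data is abbreviated from the examples above.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class AnchorText(HTMLParser):
    """Hypothetical helper: collect text found inside <a> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many <a> elements are currently open
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts.append(data)


# Case 1: JSON inside HTML. Select the element first, then parse its text as JSON.
page = """<div>
  <h1>Information</h1>
  <content>
    {"user": [{"name": "A", "age": 18}, {"name": "B", "age": 32}],
     "total": 2, "status": "ok"}
  </content>
</div>"""
payload = json.loads(ET.fromstring(page).find("content").text)
names = [user["name"] for user in payload["user"]]

# Case 2: HTML inside JSON. Parse the JSON first, then feed each fragment to an HTML parser.
doc = json.loads('{"html": ["<div><a>AAA<br>Test</a>aaa</div><div><a>BBB</a>bbb<b>BbB</b><div/>"]}')
collector = AnchorText()
for fragment in doc["html"]:
    collector.feed(fragment)
collector.close()
link_texts = collector.texts
```

Each step hands a plain string to the next parser, which is the work a chained call would hide from the user.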
|
Hey guys! I think we need to discuss which name is better, jsonpath? Jpath? Jmespath?
|
JMESPath and JSONPath are different JSON query languages. If we use Moreover, just as we support 2 different HTML/XML query languages (CSS and XPath), at some point we may support multiple JSON query languages (e.g. JMESPath, JSONPath and jq); so I really believe that Yesterday I found out that Parsel used to have a |
You convinced me, I agree with you now, I decided to adopt jmespath, thank you very much for your help.^0^ |
How is this going? |
I’m not entirely against it, but given that we use |
Not to derail this but I'd argue that implementing Ideally it would be great to have both! More and more of the web is using JSON, and it would be great to have one good parser for both HTML and JSON.
1 - https://github.com/h2non/jsonpath-ng JSONPath implementation in Python |
I’ve added JMESPath support to a real-life project, and I must say @Granitosaurus you are completely right. The lack of the concept of parent nodes in JMESPath can be quite limiting, just as in CSS. It feels like JMESPath is to JSONPath what CSS is to XPath. So, once this is fixed, I agree we should aim to extend support to JSONPath. Hopefully it won’t be too hard at that point. |
I think JMESPath should be supported first, because it has been actively maintained over the years and has plenty of resources and documentation, so many developers can find a way to get started. Then we can wait for a better and more robust JSON parser to appear. This doesn't conflict, just like CSS doesn't conflict with XPath: both are supported by parsel at the same time. |
Building a Selector based on JMESPath in parsel will help ease parsing JSON. This will also help scrapy to add methods like `add_json` and `get_json` to the `ItemLoader`. I got this idea from scrapy/scrapy#1005.

From what I understand, the `Selector` in parsel has been built using lxml; how about using jmespath for building a `JsonSelector`?

I am not sure if this is the feature to have in this library, as `Parsel` describes itself as a parser for XML/HTML. But adding this feature will add great value to this project.

PS: If the maintainers would like to have this feature in, then I'd like to contribute to it myself.