How to Choose a Suitable CSS Selector for a Website #53

Open
kongjining opened this issue Nov 22, 2023 · 1 comment

Comments

@kongjining

Inspecting Web Page Structure:

Open the target website (e.g., https://www.google.com.hk/webhp?hl=zh-CN&sourceid=cnhp/) in your browser.
Right-click the page element you want to crawl (such as a specific block of text or a content area) and select "Inspect" to open the browser's developer tools with that element highlighted.
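
Analyzing the Element:

In the developer tools, examine the HTML of the highlighted element and of the elements that contain it.
Look for attributes that uniquely identify the element or its container, such as id, class, or other attributes. An id is meant to be unique within a page, so prefer it when the element has one.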
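
Building a CSS Selector:

Create a CSS selector based on the attributes you observed.
For example, if an element has class="content", the selector is .content; if it has id="main", the selector is #main.
If the element has multiple classes, chain them with no space in between, like .class1.class2. With a space, .class1 .class2 would instead match a .class2 element nested inside a .class1 element.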
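
The rules above can be sketched as a small function. This is only an illustration: buildSelector is a hypothetical helper, not part of any crawler API, and it assumes the tag, id, and class names are plain identifiers that need no escaping.

```javascript
// Hypothetical helper (not part of any crawler API): builds a CSS selector
// from the attributes you read off an element in the developer tools.
// Assumes all names are simple identifiers that need no escaping.
function buildSelector({ tag = "", id = "", classes = [] } = {}) {
  // An id is written as #id.
  const idPart = id ? `#${id}` : "";
  // Each class is appended as .name with no space in between, so every
  // listed class must be present on the same element.
  const classPart = classes.map((name) => `.${name}`).join("");
  return `${tag}${idPart}${classPart}`;
}

console.log(buildSelector({ classes: ["content"] }));          // .content
console.log(buildSelector({ classes: ["class1", "class2"] })); // .class1.class2
console.log(buildSelector({ tag: "div", id: "main" }));        // div#main
```

In practice you would usually just type the selector by hand; the function only makes the combination rules explicit.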
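
Testing the Selector:

In the "Console" tab of the developer tools, run document.querySelector('YOUR_SELECTOR') to check that the selector picks out the target element. It returns the first matching element, or null if nothing matches. To see every match, run document.querySelectorAll('YOUR_SELECTOR') instead.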
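
For a slightly more informative check than a single querySelector call, something like the following could be pasted into the console. It is a minimal sketch: checkSelector is a hypothetical helper name, and root is assumed to be any object with a querySelectorAll method (in the browser console, pass document).

```javascript
// Hypothetical helper: reports whether a selector is valid and how many
// elements it matches under `root`. In the browser console, pass `document`.
function checkSelector(root, selector) {
  let matches;
  try {
    matches = root.querySelectorAll(selector);
  } catch (err) {
    // querySelectorAll throws a SyntaxError exception for an invalid selector.
    return { selector, valid: false, count: 0, unique: false };
  }
  return {
    selector,
    valid: true,
    count: matches.length,
    // A selector that matches exactly one element targets it unambiguously.
    unique: matches.length === 1,
  };
}

// In the developer tools console:
// checkSelector(document, ".content");
```

A count of 0 means the selector matches nothing on this page. A count above 1 means it matches several elements, which may or may not be what you want to extract.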

Applying the Selector:

Once you have found a suitable selector, enter it in the selector field of your crawler configuration.

Make sure the selector matches exactly the content you want to extract from the page. With an incorrect selector, the crawler may be unable to retrieve the desired data.

@bigshirtjonny

bigshirtjonny commented Dec 2, 2023

Something I've seen is that if the selector doesn't exist on one page of the crawl (or on the first page), the crawl ends with an error. How can we configure the crawl so that, if the selector doesn't exist on one page, the crawler moves on and tries the next page?
