adding China National People's Congress #861
Let's call this something like cn_wikidata_npc so that if we get an official source one day, we can use this name or similar for it.
I'm actually using the data from NPC Observer, because it was already in a Google Sheet. Do you want me to extract it from Wikipedia instead, or is this OK?
Using type.gender in the lookup
Co-authored-by: JD Bothma
Oh right, I pasted one table from Wikipedia into my Google Sheet and it seems to work fine: https://docs.google.com/spreadsheets/d/1WfvKmydpsyIQe-5V28meRx2miVNCxMAtbtM-taJDFNw/edit#gid=0. I think it's worth using the Wikipedia data, since it contains dates of birth, which help a lot to disambiguate when two people have the same name.
I added the data from Wikipedia to the Google Sheet. I used the following code to extract the data:
Using Google Sheets to fetch data
Co-authored-by: JD Bothma
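Since the data now lives in a Google Sheet, the crawler can read a tab of it as CSV. Below is a minimal sketch, assuming the sheet is shared so that anyone with the link can view it and that Google's CSV export URL pattern (`/export?format=csv&gid=...`) is used. The function names are hypothetical, and this is not necessarily how the crawler in this PR does the fetch.

```python
import csv
import io
import urllib.request


def sheet_csv_url(sheet_id: str, gid: int = 0) -> str:
    # Google Sheets can export a single tab (identified by gid) as CSV;
    # this only works if the sheet is publicly viewable.
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv&gid={gid}"


def parse_rows(csv_text: str) -> list[dict[str, str]]:
    # Turn each CSV row into a dict keyed by the header row, stripping
    # stray whitespace that often comes along with pasted tables.
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        rows.append({k.strip(): (v or "").strip() for k, v in row.items() if k is not None})
    return rows


def fetch_sheet(sheet_id: str, gid: int = 0) -> list[dict[str, str]]:
    # Download the CSV export and parse it (network access required).
    with urllib.request.urlopen(sheet_csv_url(sheet_id, gid)) as resp:
        return parse_rows(resp.read().decode("utf-8"))
```

For the sheet linked above, the call would be `fetch_sheet("1WfvKmydpsyIQe-5V28meRx2miVNCxMAtbtM-taJDFNw", gid=0)`, assuming its sharing settings allow anonymous export.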
I'm having trouble with the assertion because in some cases we don't have enough information to create a unique ID. For example, there are two deputies named Zhang Qiang who are both male and of Han ethnicity. Since we don't have any other distinguishing information about them, such as a birth date, we generate the same ID for both. What should we do in this case?
Well spotted! It looks like one represents Jiangsu Province and the other represents Jiangxi Province. Let's include the province they represent in the ID.
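To illustrate the suggestion, here is a minimal sketch of an ID built from hashed parts that include the represented province. `make_id`, `deputy_id`, the `cn-npc-` prefix and the column names (`name`, `gender`, `ethnicity`, `birth_date`, `province`) are all hypothetical stand-ins, not the crawler framework's actual helper or the sheet's real headers.

```python
import hashlib


def make_id(*parts):
    # Hypothetical ID helper: drop empty parts, normalise case and whitespace,
    # and hash the rest so the same inputs always give the same ID.
    cleaned = [p.strip().lower() for p in parts if p and p.strip()]
    digest = hashlib.sha1("|".join(cleaned).encode("utf-8")).hexdigest()
    return f"cn-npc-{digest[:16]}"


def deputy_id(row):
    # Name, gender and ethnicity alone collide for the two Zhang Qiang deputies;
    # the represented province (plus birth date, when known) separates them.
    return make_id(
        row["name"],
        row["gender"],
        row["ethnicity"],
        row.get("birth_date"),
        row["province"],
    )
```

Missing birth dates are simply skipped, so rows that do have one still get the extra disambiguation without breaking rows that don't.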
Could you include your script for scraping the content in the crawler directory? Then it's easy to update. |
Fixes opensanctions/crawler-planning#170
It's currently throwing an assertion error because it's not creating an entity for each row, but I couldn't figure out why.
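One possible cause, consistent with the Zhang Qiang case above, is that two rows produce the same ID and are merged into a single entity. A minimal diagnostic sketch, assuming the rows are available as dicts and the ID function can be passed in; `find_id_collisions` is a hypothetical helper, not part of the crawler.

```python
from collections import defaultdict


def find_id_collisions(rows, id_func):
    # Group row indices by the ID each row would receive. Any ID shared by
    # more than one row means those rows collapse into one entity, so the
    # entity count ends up lower than the row count.
    groups = defaultdict(list)
    for idx, row in enumerate(rows):
        groups[id_func(row)].append(idx)
    return {entity_id: idxs for entity_id, idxs in groups.items() if len(idxs) > 1}


if __name__ == "__main__":
    rows = [
        {"name": "Zhang Qiang", "gender": "male", "ethnicity": "Han", "province": "Jiangsu"},
        {"name": "Zhang Qiang", "gender": "male", "ethnicity": "Han", "province": "Jiangxi"},
    ]
    # Without the province, both rows share one key; with it, they don't.
    print(find_id_collisions(rows, lambda r: "|".join((r["name"], r["gender"], r["ethnicity"]))))
    print(find_id_collisions(rows, lambda r: "|".join((r["name"], r["province"]))))
```

Printing the colliding row indices points straight at the rows that need an extra disambiguating field.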