Make Wikidata primary source for Pakistan #119834
Comments
For now it's OK to make a morph scraper to get the membership/P39 information out of Wikidata — we won't know the correct abstractions for doing something more general until we've done quite a few more of these. |
I've made an initial prompt which compares what's in the Wikipedia scraper with Wikidata items that have a current (no end date) Member of the 14th National Assembly of Pakistan (Q33512801) P39 entry. https://www.wikidata.org/wiki/User:Chris_Mytton/sandbox/prompts/Pakistan_National_Assembly |
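The comparison described above can be sketched roughly as follows. This is only an illustration, not the SPARQL behind the prompt itself: the function names `current_members_query` and `compare_ids` are hypothetical, and the query simply assumes that "current" means a P39 statement for Q33512801 with no end date (P582) qualifier.

```python
def current_members_query(term_item: str = "Q33512801") -> str:
    """Build a SPARQL query for items holding `term_item` via P39 with no end date.

    Assumption: the Wikidata Query Service prefixes (p:, ps:, pq:, wd:) are available.
    """
    return f"""
SELECT ?item ?itemLabel WHERE {{
  ?item p:P39 ?statement .
  ?statement ps:P39 wd:{term_item} .
  FILTER NOT EXISTS {{ ?statement pq:P582 ?end . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
""".strip()


def compare_ids(scraped_ids, wikidata_ids):
    """Report which Wikidata IDs appear on only one side of the comparison."""
    scraped, wikidata = set(scraped_ids), set(wikidata_ids)
    return {
        "only_in_scraper": sorted(scraped - wikidata),
        "only_in_wikidata": sorted(wikidata - scraped),
    }
```

Keeping the query builder separate from the comparison means the same diff can be reused for other terms by passing a different item ID.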
Outstanding:
|
I've also created a prompt for the official site. That prompt uses a manually generated CSV that takes the output from the scraper and combines it with EveryPolitician reconciliation information using something similar to the following command, run from
|
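As an illustration of that kind of join (this is not the command used in this issue), a minimal sketch in Python might look like the following. The column names `id` and `uuid`, and the function name `merge_reconciliation`, are assumptions made for the example.

```python
import csv
import io


def merge_reconciliation(scraper_csv: str, reconciliation_csv: str) -> list:
    """Attach a reconciled UUID to each scraper row, matching on the `id` column.

    Assumption: the scraper output has an `id` column and the reconciliation
    data has `id` and `uuid` columns. Rows without a match get an empty uuid.
    """
    recon = {
        row["id"]: row["uuid"]
        for row in csv.DictReader(io.StringIO(reconciliation_csv))
    }
    merged = []
    for row in csv.DictReader(io.StringIO(scraper_csv)):
        merged.append({**row, "uuid": recon.get(row["id"], "")})
    return merged
```

Unmatched rows are kept rather than dropped, so people who still need reconciling remain visible in the output.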
Once #53037 has been merged, that should fix a couple of issues with the official site prompt. I've also been seeing a strange error with the official site prompt: sometimes the SPARQL returns 330 results, but clicking through and running the SPARQL manually returns 339 results, as expected. I'm not sure whether anything can be done or whether it's just transient, but it's worth watching out for. |
I've updated the Wikipedia scraper to pick up historic members from the "Membership changes" table in everypolitician-scrapers/pakistan-national-assembly-wikipedia@62d4ac5. |
Prompt for the historic members of the 14th term is here: https://www.wikidata.org/wiki/User:Chris_Mytton/sandbox/prompts/Pakistan_National_Assembly_historic |
I've generated Quickstatements (docs) for the missing historic term 14 members here: https://gist.github.com/chrismytton/aa224963a46b92dc273569af7355a512 |
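For reference, statements of this kind could be produced with something like the sketch below. It is not the script behind the gist: the function name `quickstatement` is hypothetical, and it assumes the tab-separated QuickStatements V1 syntax, with optional start (P580) and end (P582) qualifiers written at day precision.

```python
def quickstatement(person, position="Q33512801", start=None, end=None):
    """Build one tab-separated QuickStatements V1 line adding a P39 statement.

    `start` and `end` are optional ISO dates (YYYY-MM-DD); when given they are
    added as P580/P582 qualifiers at day precision (the "/11" suffix).
    """
    parts = [person, "P39", position]
    if start:
        parts += ["P580", f"+{start}T00:00:00Z/11"]
    if end:
        parts += ["P582", f"+{end}T00:00:00Z/11"]
    return "\t".join(parts)


# Q4115189 is used here purely as a placeholder person ID.
print(quickstatement("Q4115189"))
```

Leaving the dates optional matches the case where membership is known but the start and end dates still have to be added by hand.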
I've now run that batch of Quickstatements, so the people on the historic term 14 prompt should now all have a "Member of the 14th National Assembly of Pakistan" P39 statement. |
The members that I've just added a term 14 P39 for are missing start and end dates, because they weren't simple to scrape from the Wikipedia page. @lucychambers has kindly volunteered to go through and manually add them for the 22 members on the prompt, thanks Lucy! |
Before we can switch EveryPolitician over to using Wikidata as the primary membership source, we need to create a scraper. The tonga-assembly-wikidata scraper is probably the best example we have to work from: it was created during a previous attempt to switch a country to using Wikidata, so it should in theory have all the fields we need. |
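As a rough idea of what the core of such a scraper has to do, the sketch below flattens a SPARQL JSON response into membership rows. It is not taken from the tonga-assembly-wikidata scraper: the function name `bindings_to_rows`, the variable names `item`, `itemLabel`, `start` and `end`, and the output field names are all assumptions for illustration.

```python
def bindings_to_rows(response):
    """Convert a SPARQL JSON results document into flat membership rows.

    Assumption: the query selected ?item, ?itemLabel, ?start and ?end, where
    the last three may be missing from any given binding.
    """
    rows = []
    for binding in response["results"]["bindings"]:
        rows.append({
            # The entity URI ends in the Q-number, e.g. .../entity/Q123
            "id": binding["item"]["value"].rsplit("/", 1)[-1],
            "name": binding.get("itemLabel", {}).get("value", ""),
            # Keep only the YYYY-MM-DD part of the timestamp
            "start_date": binding.get("start", {}).get("value", "")[:10],
            "end_date": binding.get("end", {}).get("value", "")[:10],
        })
    return rows
```

Missing qualifiers become empty strings, so rows with no start or end date can still be saved and checked later.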
I've created the scraper for getting membership information from Wikidata: |
Prompt created at User:Chris_Mytton/sandbox/prompts/Pakistan_National_Assembly_EveryPolitician which compares the |
Limit initially to:
This will involve creating multiple prompts to make sure that Wikidata is sufficient:
Acceptance Criteria
morph/official.csv and archive/official-term-14.csv) are replaced by information from Wikidata.
person
source.