Wishlist bot creating duplicate works & badly formatted authors #24

Open
tfmorris opened this issue Sep 15, 2018 · 1 comment

My concern about large-scale data imports has always been that we be careful not to make our data quality issues worse for the sake of playing the "numbers game" to bulk up.

Perhaps I just got unlucky, but the very first wishlist bot addition that I looked at (linked from the OpenLibrary blog post) had three duplicated works and two duplicated authors, both with badly formatted names.

https://openlibrary.org/works/OL17890901W/Eagle's_Trees_and_shrubs_of_New_Zealand.
https://openlibrary.org/works/OL17900501W/Eagle's_Trees_and_shrubs_of_New_Zealand.
https://openlibrary.org/works/OL17900497W/Eagle's_Trees_and_shrubs_of_New_Zealand.

Eagle, Audrey Lily - https://openlibrary.org/authors/OL7416671A/Eagle_Audrey_Lily
Audrey LilyEagle - https://openlibrary.org/authors/OL7417982A/Audrey_LilyEagle

Although adding "1000 books" sounds like a relatively small sample, if this single book is representative, we now have many thousands of records to clean up.
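As a rough illustration of the kind of check that would flag the two author records above as likely duplicates, here is a minimal sketch. It is not the wishlist bot's code or any existing Open Library function; `author_key` is a hypothetical helper, and the normalization rules (split at a lowercase-to-uppercase boundary, ignore commas and word order) are assumptions chosen only to cover the two names in this report.

```python
import re


def author_key(name):
    """Return an order-insensitive key for a personal name (hypothetical helper)."""
    # Assumption: a lowercase letter directly followed by an uppercase letter
    # marks a missing space, e.g. "LilyEagle" -> "Lily Eagle".
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)
    # Keep only runs of letters, which drops commas and extra whitespace.
    tokens = re.findall(r"[^\W\d_]+", spaced)
    # Lowercase and sort so "Last, First" and "First Last" give the same key.
    return tuple(sorted(t.lower() for t in tokens))


names = ["Eagle, Audrey Lily", "Audrey LilyEagle"]
keys = {author_key(n) for n in names}
print(len(keys) == 1)  # one shared key means the records are duplicate candidates
```

This is deliberately crude: it would also split names like "McDonald" and would treat any two names with the same set of words as a match, so it could only nominate candidates for review, not merge records automatically.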
