A more robust multistep approach #4

Open
Deleetdk opened this issue Nov 3, 2016 · 5 comments

Owner

Deleetdk commented Nov 3, 2016

A better idea is to use a multi-step approach:

  1. Find all matches.
  2. Remove duplicates.
  3. Sort by length, shortest first.
  4. Try each match against DOI lookup, note status.
  5. If only one gets OK status, use that.
  6. If multiple get OK status, ask user which to use.

This should cover pretty much everything, but could be slow depending on how long DOI lookups take. Unless a page really has multiple valid DOIs (e.g. because it has a reference list!), this should find the valid one fairly quickly.
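
A minimal Python sketch of the six steps, to make the flow concrete. The regex, the function names, and the two callbacks (`doi_exists` for the lookup, `ask_user` for the prompt) are all assumptions made for illustration and are not taken from the project's code; the lookup is injected so the sketch does not depend on any particular DOI service.

```python
import re
from typing import Callable, List, Optional

# Loose DOI pattern (an assumption, not the project's actual regex).
DOI_RE = re.compile(r'10\.\d{4,9}/[^\s"\'<>]+')


def candidate_dois(page_source: str) -> List[str]:
    """Steps 1-3: find all matches, drop duplicates, sort shortest first."""
    # Strip punctuation that commonly trails a DOI at the end of a sentence.
    matches = [m.rstrip(".,;)") for m in DOI_RE.findall(page_source)]
    unique = list(dict.fromkeys(matches))  # keeps first-seen order
    return sorted(unique, key=len)         # stable sort, shortest first


def resolve(
    page_source: str,
    doi_exists: Callable[[str], bool],     # hypothetical lookup callback
    ask_user: Callable[[List[str]], str],  # hypothetical prompt callback
) -> Optional[str]:
    """Steps 4-6: check each candidate, then pick or ask."""
    valid = [d for d in candidate_dois(page_source) if doi_exists(d)]
    if not valid:
        return None
    if len(valid) == 1:
        return valid[0]
    return ask_user(valid)
```

Since the lookup is the slow part, a real version would probably want to cache results or stop early once the candidate list is exhausted.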

Deleetdk self-assigned this Nov 3, 2016

Owner Author

Deleetdk commented Nov 3, 2016

http://link.springer.com/article/10.1385/MO:23:4:443

This currently fails because there are multiple valid DOIs present and the shortest one is not the right one.

Is there some way other than shortness to grade plausibility?

Contributor

onbjerg commented Nov 3, 2016

@Deleetdk Maybe we should check if the DOI is visible. The other DOIs are in the URLs of the cited papers.
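
One rough way to approximate "visible" is to only search text nodes, which skips attribute values such as `href` entirely. The sketch below uses Python's standard `html.parser`; the class name, the set of skipped tags, and the regex are assumptions for illustration, and it does not account for elements hidden through CSS.

```python
import re
from html.parser import HTMLParser
from typing import List

# Loose DOI pattern (an assumption, not the project's actual regex).
DOI_RE = re.compile(r'10\.\d{4,9}/[^\s"\'<>]+')


class VisibleText(HTMLParser):
    """Collects text nodes, ignoring attributes and non-rendered elements."""

    # Elements whose text content is not shown on the page (assumed list).
    HIDDEN = {"script", "style", "title", "noscript", "template"}

    def __init__(self) -> None:
        super().__init__()
        self._hidden_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        # Attribute values never arrive here, so DOIs inside href are skipped.
        if self._hidden_depth == 0:
            self.chunks.append(data)


def visible_dois(html: str) -> List[str]:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    found = (m.rstrip(".,;)") for m in DOI_RE.findall(text))
    return list(dict.fromkeys(found))  # de-duplicate, keep page order
```

On a page like the Springer one, this should drop the DOIs that only appear in the reference links, as long as the cited DOIs are not also printed in the link text.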

Owner Author

Deleetdk commented Nov 3, 2016

Can check if it's in the visible part of the code. Like not in href or

.

Contributor

onbjerg commented Nov 3, 2016

Actually, maybe we should check the meta tags as well. The site you linked to has a meta tag for the DOI.
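
A small sketch of the meta tag check, again with Python's standard `html.parser`. The set of meta names is an assumption: `citation_doi` is commonly used on publisher pages, but the others are guesses, and each site may use its own naming.

```python
from html.parser import HTMLParser
from typing import List

# Meta names assumed to carry a DOI; extend per site as needed.
DOI_META_NAMES = {"citation_doi", "dc.identifier", "prism.doi"}


class MetaDoiParser(HTMLParser):
    """Collects DOI-looking values from <meta name=... content=...> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.dois: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").strip()
        if name not in DOI_META_NAMES:
            return
        # Some sites prefix the value with "doi:"; strip it if present.
        if content.lower().startswith("doi:"):
            content = content[4:].strip()
        # Generic identifier tags may hold non-DOI values, so keep only "10." ones.
        if content.startswith("10."):
            self.dois.append(content)


def meta_dois(html: str) -> List[str]:
    parser = MetaDoiParser()
    parser.feed(html)
    parser.close()
    return list(dict.fromkeys(parser.dois))  # de-duplicate, keep order
```

If this returns exactly one DOI, it is probably a stronger signal than anything matched in the page body, so it could be tried before the other steps.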

Contributor

onbjerg commented Nov 3, 2016

Also, the DOI handbook states that one should strive to display it as doi:<doi> and another major registration agency recommends using https://doi.org/<doi> (ref).

As such, we should probably rank them in order of "correct" format:

  1. doi:<doi>
  2. https://[www.]doi.org/<doi>
  3. <doi>
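
The ranking above could be sketched like this in Python. The regexes and the `rank_dois` name are assumptions for illustration; in particular, allowing optional whitespace after `doi:` and breaking ties by length are choices of this sketch, not something the handbook prescribes.

```python
import re
from typing import List, Tuple

# Loose DOI core pattern (an assumption, not the project's actual regex).
DOI_CORE = r'10\.\d{4,9}/[^\s"\'<>]+'

# (rank, pattern) pairs, listed best format first.
FORMATS = [
    (1, re.compile(r'doi:\s*(' + DOI_CORE + r')', re.IGNORECASE)),
    (2, re.compile(r'https://(?:www\.)?doi\.org/(' + DOI_CORE + r')', re.IGNORECASE)),
    (3, re.compile(r'(' + DOI_CORE + r')')),
]


def rank_dois(text: str) -> List[Tuple[str, int]]:
    """Return (doi, rank) pairs, best-ranked first, shortest first within a rank."""
    best = {}
    for rank, pattern in FORMATS:
        for match in pattern.finditer(text):
            doi = match.group(1).rstrip(".,;)")
            # Formats are visited best-first, so the first rank seen is kept.
            best.setdefault(doi, rank)
    return sorted(best.items(), key=lambda item: (item[1], len(item[0])))
```

A DOI that shows up both as `doi:<doi>` and bare keeps the better rank, so the bare fallback only matters for DOIs that never appear in a recommended format.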
