Use Abacus Data Network's new server URL to harvest the installation's metadata #251

Closed
jggautier opened this issue Mar 12, 2024 · 13 comments

@jggautier
Collaborator

Clicking on the titles of the records harvested from Abacus Data Network into Harvard Dataverse (https://dataverse.harvard.edu/dataverse/ubc_harvested) no longer takes users to the datasets.

Abacus Data Network upgraded its Dataverse software to v5.6 and changed its server URL to https://abacus.library.ubc.ca. The old URL, https://dvn.library.ubc.ca, redirects to https://abacus.library.ubc.ca.

We'll need to harvest using the new URL, https://abacus.library.ubc.ca/oai.

I'm not sure whether we should:

  • Edit the existing harvesting client in Harvard Dataverse so that it uses the new server URL (https://abacus.library.ubc.ca/oai), then try to harvest
  • Or remove the existing client in Harvard Dataverse, then create a new one with the new server URL
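
Before choosing either option, it can help to confirm that the new endpoint answers basic OAI-PMH requests. The sketch below only builds the request URLs; `Identify`, `ListMetadataFormats`, and `ListRecords` are standard OAI-PMH verbs, while the helper name `oai_request_url` is made up for illustration and nothing here is Dataverse's own code.

```python
from urllib.parse import urlencode

# Endpoint taken from the issue text above.
OAI_ENDPOINT = "https://abacus.library.ubc.ca/oai"


def oai_request_url(base_url: str, verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL for the given verb and arguments.

    Hypothetical helper for illustration only.
    """
    query = {"verb": verb, **arguments}
    return f"{base_url}?{urlencode(query)}"


# Identify confirms that the endpoint responds as an OAI-PMH repository;
# ListMetadataFormats lists the metadataPrefix values it offers.
identify_url = oai_request_url(OAI_ENDPOINT, "Identify")
formats_url = oai_request_url(OAI_ENDPOINT, "ListMetadataFormats")

print(identify_url)
print(formats_url)
```

Opening either URL in a browser, or fetching it with any HTTP client, should return an XML response if the endpoint is working.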
@jggautier jggautier changed the title Using Abacus Data Network's new server URL to harvest the installation's metadata Use Abacus Data Network's new server URL to harvest the installation's metadata Mar 12, 2024
@jggautier
Collaborator Author

To see if harvesting from Abacus Data Network would work, I just told Demo Dataverse to harvest records from https://abacus.library.ubc.ca/oai into the collection at https://demo.dataverse.org/dataverse/ubc_abacus_harvested, using the dataverse_json format.

I'll check later this week to see what happens.

@cmbz
Collaborator

cmbz commented Mar 12, 2024

2024/03/123

@jggautier
Collaborator Author

jggautier commented Mar 13, 2024

Just an update that Demo Dataverse couldn't harvest any of the dataset metadata from Abacus using the dataverse_json metadata format.

@landreev
Collaborator

Just want to put it on record that changing the server type (from "DVN" to "Dataverse") in the harvesting clients panel did NOT fix the redirects for the existing harvested records either.
[Screenshot: Screen Shot 2024-03-15 at 1 00 28 PM]

This appears to be because they have changed all their handle identifiers: the ones we harvested look like hdl:11272/NNNNN, while the ones they use now look like hdl:11272.1/AB2/XXXXX.
The actual old handles still redirect properly if you click on them:
[Screenshot: Screen Shot 2024-03-15 at 1 08 32 PM]
So they are still registered on the Handle.Net side, but Abacus's new Dataverse no longer recognizes them when we try to redirect there directly.

... one way or another, these records are hopelessly stale. We do need to delete this client and re-harvest. I agree it's prudent to first work out a working configuration on demo. If dataverse_json isn't working, we should follow the normal downgrade route - to oai_ddi, and then to oai_dc, if that's not working either.
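
The downgrade route described above can be sketched as a simple ordered fallback. This is only an illustration: `try_harvest` is a hypothetical stand-in for "run a test harvest on demo with this format and see whether records import", and `fake_harvest` just mimics a remote whose dataverse_json export is rejected.

```python
from typing import Callable, Optional, Sequence

# Order stated in the thread: richest format first, then the fallbacks.
FORMAT_PREFERENCE = ("dataverse_json", "oai_ddi", "oai_dc")


def first_working_format(
    try_harvest: Callable[[str], bool],
    formats: Sequence[str] = FORMAT_PREFERENCE,
) -> Optional[str]:
    """Return the first format for which try_harvest reports success.

    try_harvest is a hypothetical callback, not a Dataverse API.
    """
    for metadata_format in formats:
        if try_harvest(metadata_format):
            return metadata_format
    return None


def fake_harvest(metadata_format: str) -> bool:
    """Stand-in that rejects dataverse_json and accepts everything else."""
    return metadata_format != "dataverse_json"


print(first_working_format(fake_harvest))  # prints oai_ddi
```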

@landreev
Collaborator

Hah, I was able to fix the redirects for the old harvested records in prod, by changing to "generic" and using the handle resolver as the archive url:
[Screenshot: Screen Shot 2024-03-15 at 1 23 46 PM]

Haven't tried all of them, but the ones I tried worked.
This does NOT change the fact that we want to be able to re-harvest from their new server and to restart regular harvesting from them.
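
To illustrate why pointing at a handle resolver works for the old records, here is a small sketch that tells the two identifier styles apart and turns a handle into a resolver link. The regular expressions are inferred only from the two examples in this thread, the sample identifiers in the usage lines are made up, and using the global Handle.Net proxy at https://hdl.handle.net/ as the resolver is an assumption; the thread does not say which resolver URL was entered.

```python
import re
from typing import Optional

# Patterns inferred from the two examples in the thread; real identifiers
# may vary more than this.
OLD_STYLE = re.compile(r"^hdl:11272/[^/]+$")
NEW_STYLE = re.compile(r"^hdl:11272\.1/AB2/[^/]+$")

# Assumption: the global Handle.Net proxy is the resolver in question.
RESOLVER = "https://hdl.handle.net/"


def classify_handle(pid: str) -> Optional[str]:
    """Return "old", "new", or None for an unrecognized identifier."""
    if OLD_STYLE.match(pid):
        return "old"
    if NEW_STYLE.match(pid):
        return "new"
    return None


def resolver_url(pid: str) -> str:
    """Turn hdl:PREFIX/SUFFIX into a link through the resolver."""
    if not pid.startswith("hdl:"):
        raise ValueError(f"not a handle identifier: {pid!r}")
    return RESOLVER + pid[len("hdl:"):]


# Made-up identifiers, used only to exercise the helpers.
print(classify_handle("hdl:11272/12345"), resolver_url("hdl:11272/12345"))
print(classify_handle("hdl:11272.1/AB2/ABCDE"))
```

Because the old handles are still registered, a link through the resolver lets the handle system perform the redirect instead of relying on the new server to recognize the old identifier.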

@landreev
Collaborator

On demo every record failed with the same exception:
Failed to import harvested dataset: class edu.harvard.iq.dataverse.util.json.JsonParseException (Invalid license: ...)

I'll need to refresh my memory on what this means.

@jggautier
Collaborator Author

jggautier commented Mar 18, 2024

> If dataverse_json isn't working, we should follow the normal downgrade route - to oai_ddi, and then to oai_dc, if that's not working either.

Is it okay if I try oai_ddi now and then oai_dc if that doesn't work? Or should I wait until you can look into what's going on with that "Invalid license" exception?

In case "license" there refers to a dataset's license metadata: Abacus is running v5.6, so its dataverse_json exports differ from those of installations running v5.10+, which include the multiple-license update.

@landreev
Collaborator

Please go ahead and try the other formats, no need to wait.
(You will need to either delete and recreate the client or purge the clientharvestrun entry from the database; otherwise it will attempt to harvest incrementally, starting from the date/time of the last so-called "success". Sorry if I'm explaining the obvious.)

You are most likely correct, about the json format. That was my guess, that it's completely incompatible between pre- and post-5.10, because of the license change. Just wanted to confirm this w/ others in dv-tech.
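
The point about incremental harvesting can be illustrated at the protocol level. In OAI-PMH, a `ListRecords` request may carry a `from` argument that limits the response to records changed since a given date, so a client that remembers a previous "success" would not ask for older records again. The sketch below shows only that idea; the function name and the example date are made up, and this is not a description of how Dataverse stores or uses its harvest-run history.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str,
    last_success: Optional[datetime] = None,
) -> str:
    """Build a ListRecords URL; add `from` only when a previous run is known.

    Hypothetical helper for illustration. last_success must be
    timezone-aware so that it can be converted to UTC.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if last_success is not None:
        # OAI-PMH datestamps are in UTC; day granularity is always supported.
        params["from"] = last_success.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"{base_url}?{urlencode(params)}"


endpoint = "https://abacus.library.ubc.ca/oai"

# No recorded run: a full harvest.
print(list_records_url(endpoint, "oai_ddi"))

# Recorded run (illustrative date): only records changed since then.
print(list_records_url(endpoint, "oai_ddi", datetime(2024, 3, 13, tzinfo=timezone.utc)))
```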

@jggautier
Collaborator Author

Using oai_ddi mostly worked.

It looks like 2,435 records were harvested into https://demo.dataverse.org/dataverse/ubc_abacus_harvested. The harvesting client page says that 3 failed.

And it looks like there are 2,449 datasets in the repository (https://abacus.library.ubc.ca), although maybe a few of those are missing in their oai feed because they were published very recently.

Want me to delete the client in Harvard Dataverse and re-create a new one using oai_ddi instead?
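
For reference, the counts reported above work out as follows. This quick calculation assumes that the 3 failures are counted separately from the 2,435 harvested records, which the harvesting client page does not state explicitly.

```python
# Figures from the demo harvest described in this comment.
harvested = 2435   # records harvested into demo
failed = 3         # failures reported on the harvesting client page
published = 2449   # datasets shown on https://abacus.library.ubc.ca

# Assumption: failures are not included in the harvested count.
attempted = harvested + failed
unaccounted = published - attempted
success_rate = harvested / attempted

print(f"attempted={attempted} unaccounted={unaccounted} success={success_rate:.2%}")
# prints attempted=2438 unaccounted=11 success=99.88%
```

Under that assumption, 11 datasets in the repository did not appear in the harvest at all, which fits the guess that a few recently published datasets were missing from the OAI feed.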

@landreev
Collaborator

Yes, that's a very good success-to-fail ratio, let's use oai_ddi in prod.
I would only suggest waiting until the weekend to run the actual harvest.

@jggautier
Collaborator Author

jggautier commented Mar 21, 2024

Just leaving an update that the old client was deleted and I created a new client using the new server URL https://abacus.library.ubc.ca/oai. It's scheduled to run Saturdays at 4am and harvest oai_ddi metadata into the collection at https://dataverse.harvard.edu/dataverse/ubc_harvested.

Next Monday I'll check to see how it went 🤞

@jggautier
Collaborator Author

jggautier commented Mar 25, 2024

The harvesting client page for HDV says that the scheduled harvest ran on Saturday at 4 am and 2,438 records were harvested into https://dataverse.harvard.edu/dataverse/ubc_harvested. It failed to harvest 3 records.

I'm going to close this issue.

@landreev
Collaborator

I saw that last night. Going to consider this a smashing success, by our standards.
