Allow librarians to import MARC data from other libraries #8360

onnotasler · 2023-10-03T17:19:35Z

When entering new books or editing existing books, I often have to manually copy data from libraries that offer a MARC record for download. It would be great if I could directly import this data instead of having to type it.

As an example, take Das Postwesen im Postamtbezirk Buxtehude.
This book exists as a really low-quality import on Open Library at OL26425107W.
The Deutsche Nationalbibliothek offers most of the lacking information on their website. It offers downloads as MARC21-XML and RDF (Turtle).

The DNB is not the only national library offering this, even though the formats differ between libraries. The Bibliothèque nationale de France offers Intermarc and Unimarc instead, for instance. LIBRIS (National Library of Sweden) offers MARC21.

It would save me time and prevent spelling errors if I could import those datasets.
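To illustrate what such an import would have to read, here is a minimal sketch that pulls a few fields out of a MARC21-XML record using only the Python standard library. It assumes the record uses the standard MARC21 "slim" namespace; the sample record is hand-written (not actual DNB output), and the helper names `subfields` and `extract_fields` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the MARC21 "slim" XML schema.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hand-written sample record for illustration only, not DNB data.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Das Postwesen im Postamtbezirk Buxtehude</subfield>
  </datafield>
</record>"""


def subfields(record, tag, code):
    """Return the text of every subfield `code` inside datafields with `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [el.text.strip() for el in record.findall(path, MARC_NS) if el.text]


def extract_fields(xml_text):
    """Pull a few common fields out of the first record in a MARC21-XML string."""
    root = ET.fromstring(xml_text)
    # A download may be a bare <record> or a <collection> wrapping records.
    if root.tag == "{http://www.loc.gov/MARC21/slim}record":
        record = root
    else:
        record = root.find(".//marc:record", MARC_NS)
    if record is None:
        raise ValueError("no MARC21-XML record found")
    titles = subfields(record, "245", "a")  # 245 $a: title
    return {
        "title": titles[0] if titles else None,
        "isbn": subfields(record, "020", "a"),  # 020 $a: ISBN
        # 264 $b and 260 $b both carry the publisher name
        "publisher": subfields(record, "264", "b") + subfields(record, "260", "b"),
    }


print(extract_fields(SAMPLE))
```

Fields missing from the record come back as `None` or an empty list, so a librarian could still fill them in by hand.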

Describe the problem that you'd like solved

A way to import MARC records from National Libraries, to at least improve existing records, but ideally also to create new books.

Proposal & Constraints

As far as I understood, Open Library already imports MARC records from some libraries. At least I often read "imported by MARC record from library of ..." at the bottom of editions.

The import should not be more annoying than typing the stuff in manually. Also, there seem to be a lot of technical differences between different MARC versions. I probably won't be able to get up to speed on all of them, so this would have to be handled automatically.
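As a rough idea of what "handled automatically" could mean at the very first step, the sketch below guesses the container format of an uploaded file. It assumes only two containers: MARC21-XML (recognized by its namespace) and binary ISO 2709, whose first five bytes give the record length and whose last byte is the record terminator `0x1D`. The function name `guess_marc_format` is hypothetical, and this says nothing about the MARC flavor: MARC21 and UNIMARC both use the ISO 2709 container but assign different meanings to tags.

```python
ISO2709_LENGTH_DIGITS = 5          # leader positions 0-4 hold the record length
RECORD_TERMINATOR = b"\x1d"        # last byte of every ISO 2709 record
MARCXML_NAMESPACE = b"http://www.loc.gov/MARC21/slim"


def guess_marc_format(data: bytes) -> str:
    """Return 'marcxml', 'marc-binary', or 'unknown' for one uploaded record."""
    # Ignore a UTF-8 byte order mark and leading whitespace before XML.
    text = data.lstrip(b"\xef\xbb\xbf \t\r\n")
    if text.startswith(b"<"):
        return "marcxml" if MARCXML_NAMESPACE in text[:2048] else "unknown"

    length = data[:ISO2709_LENGTH_DIGITS]
    if (
        len(length) == ISO2709_LENGTH_DIGITS
        and length.isdigit()
        and int(length) == len(data)
        and data.endswith(RECORD_TERMINATOR)
    ):
        return "marc-binary"
    return "unknown"
```

Anything reported as `unknown` would have to be rejected with a clear message rather than guessed at.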

Additional context

Stakeholders

@hornc

LeadSongDog · 2023-10-03T23:44:08Z

Really, this should have been addressed long ago. Once a unique external ID such as ISBN or OCLCn has been furnished, the ImportBot ought not settle for just one repository’s record, but either select the most complete one available from a reliable library, or even better, fuse them together to fill in any blank fields. Certainly not a good plan to be stuck indefinitely with whatever little bit AMZ or BWB furnished.
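The two strategies suggested here (pick the most complete record, or fuse several records to fill blank fields) can be sketched in a few lines. This is an illustration only, assuming each source record has already been converted to a plain dictionary; the function names, field names, and sample values are hypothetical and do not reflect how ImportBot works.

```python
def is_blank(value):
    """Treat None, empty strings, and empty containers as missing."""
    return value is None or value == "" or value == [] or value == {}


def most_complete(records):
    """Pick the record with the largest number of non-blank fields."""
    return max(records, key=lambda r: sum(not is_blank(v) for v in r.values()))


def fuse_records(records):
    """Merge records ordered from most to least trusted.

    A field is taken from the first record that has a non-blank value for it;
    fields that are blank in every record are left out of the result.
    """
    fused = {}
    for record in records:
        for key, value in record.items():
            if is_blank(fused.get(key)) and not is_blank(value):
                fused[key] = value
    return fused


# Placeholder values for illustration only.
library = {"title": "Example Title", "publishers": ["Example Verlag"], "publish_date": ""}
vendor = {"title": "EXAMPLE TITLE", "publishers": [], "publish_date": "1900"}

print(fuse_records([library, vendor]))
```

Because earlier records win, the order of the list encodes which source is considered more reliable.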

Koenisegg484 · 2023-11-04T02:40:08Z

Hi @hornc
I would like to work on this issue.
Could I get some pointers on how to start, as this is my first contribution?

mekarpeles · 2023-11-06T20:15:46Z

It seems like the ask is:
Ability to upload/submit a MARC record to Open Library

We have a pipeline for importing MARCs to Open Library, backed by Archive.org items, which is described here:
https://github.com/internetarchive/openlibrary/wiki/Developer's-Guide-to-Data-Importing#MARC-Records

Also, there is a MARC option in the openlibrary.org/api/import path...

This doesn't seem like a fantastic match for a first project by a community contributor. If someone did want to work through this, the solution would likely be...

To create a librarian-only UI where a contributor in the librarian permission group can upload a MARC record, which gets submitted to our import process using the MARC format of parse_data:

openlibrary/openlibrary/plugins/importapi/code.py

Lines 117 to 133 in c792a2f

    
    class importapi:
        """/api/import endpoint for general data formats."""

        def error(self, error_code, error='Invalid item', **kwargs):
            content = {'success': False, 'error_code': error_code, 'error': error}
            content.update(kwargs)
            raise web.HTTPError('400 Bad Request', data=json.dumps(content))

        def POST(self):
            web.header('Content-Type', 'application/json')
            if not can_write():
                raise web.HTTPError('403 Forbidden')

            data = web.data()

            try:
                edition, format = parse_data(data)
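Based only on the excerpt above (the endpoint reads the raw request body, returns 403 when the caller cannot write, and returns a 400 with a JSON body on errors), a client-side sketch might look like the following. The helper names are hypothetical, and how the caller authenticates is deliberately left open: `auth_headers` is an assumption standing in for whatever credentials a logged-in librarian session provides.

```python
import json
import urllib.request

IMPORT_URL = "https://openlibrary.org/api/import"


def build_import_request(record: bytes, auth_headers: dict, url: str = IMPORT_URL):
    """Prepare (but do not send) a POST carrying one raw record as the body."""
    return urllib.request.Request(
        url, data=record, headers=dict(auth_headers), method="POST"
    )


def parse_error_body(body: bytes) -> str:
    """Summarize the JSON error payload produced by importapi.error()."""
    content = json.loads(body)
    return f"{content.get('error_code')}: {content.get('error')}"
```

Sending the request with `urllib.request.urlopen(request)` raises `urllib.error.HTTPError` for the 400 and 403 cases; the body of a 400 response can be read from that exception and passed to `parse_error_body`.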

hornc · 2023-11-06T22:37:02Z

I agree with @mekarpeles that this is probably a bit tricky for a first time contributor.

I had been meaning to respond with a summary of the two options mentioned above where we do have MARC imports already.

The bulk import process could be used to import a single record, but that's a bit fiddly and involves creating a new archive.org item. Depending on the source though, if MARC records are available publicly, there might be a way to import an entire collection rather than a few books one by one. Is that a possibility here?

The API should work to import a single record in one go, but I have not looked at this in a while. I don't think the single import API will store the MARC record anywhere, which is less useful than it could be. Open Library does not store MARC records; they are all on archive.org, either as single records stored on a scanned item or as part of a larger bulk-data MARC collection. Single MARC records without corresponding scans are not handled well / at all (if I remember correctly).

The workaround has been to only import bulk collections, which gives many new books, and records the source.

Three options:

Use the existing bulk import API because we can get more records from this source (I don't know if that's possible or better than the original request)
Figure out how to use the existing API in a way that satisfies the request. The API is there, but is mostly unused.
Implement a new librarian UI interface to the existing APIs, if the first two options aren't sufficient as is.

onnotasler · 2023-11-06T23:28:03Z

> Depending on the source though, if MARC records are available publicly, there might be a way to import an entire collection rather than a few books one by one. Is that a possibility here?

The free MARC records I found were all limited to a single edition of a single work. With the tools and knowledge I have, I can only download and process one edition at a time. If it is possible to import the whole catalogue at once, that would definitely be better.

At least the Deutsche Nationalbibliothek has a Bezugswege und Exportformate (roughly: access channels and export formats) entry on their homepage, and they seem to offer their whole catalogue in several different file formats:

MARC21
There are several files; one needs to download all files that end with mrc.gz to get the whole snapshot.
RDF
There are many, many files in there; one needs the Stabiler Link auf den aktuellen Gesamtabzug (roughly: Permalink to catalogue snapshot) in one of the formats offered.

They also offer a long list of formats and APIs, but I lack the technical expertise to comment on them.
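For the MARC21 snapshot files mentioned above, a minimal sketch of walking through one downloaded `.mrc.gz` file could look like this. It assumes each file is a gzip-compressed sequence of binary ISO 2709 records laid end to end, where the first five bytes of every record give its total length; that layout has not been checked against the actual DNB files, and the function names are hypothetical.

```python
import gzip

LEADER_LENGTH = 24   # every ISO 2709 record starts with a 24-byte leader
LENGTH_DIGITS = 5    # leader positions 0-4: total record length in bytes


def iter_marc_records(stream):
    """Yield each raw binary MARC record from a file-like object."""
    while True:
        head = stream.read(LENGTH_DIGITS)
        if not head:
            return  # clean end of file
        if len(head) < LENGTH_DIGITS or not head.isdigit():
            raise ValueError("expected a 5-digit ISO 2709 record length")
        length = int(head)
        if length < LEADER_LENGTH:
            raise ValueError("record length is shorter than a leader")
        body = stream.read(length - LENGTH_DIGITS)
        if len(body) != length - LENGTH_DIGITS:
            raise ValueError("file ends in the middle of a record")
        yield head + body


def count_records(path):
    """Count the records in one gzip-compressed MARC file."""
    with gzip.open(path, "rb") as stream:
        return sum(1 for _ in iter_marc_records(stream))
```

Reading record by record keeps memory use flat, which matters for a full catalogue snapshot split across several large files.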

hornc · 2023-11-12T21:11:33Z

@onnotasler There's an issue for DNB data here: internetarchive/openlibrary-bots#29. I have prepared the data and made a start on importing. I stopped because of the various discussions about import data quality, and have not yet resumed importing. This is something I can turn back on again if there is demand.

onnotasler · 2023-12-04T19:39:23Z

I do not insist on a MARC importer if I can instead get the books imported in bulk, but in that case we should implement a way to suggest sources for bulk data instead.

onnotasler added the Needs: Lead, Needs: Triage, and Type: Feature Request labels on Oct 3, 2023.
