Question: 'NullPointerException' with Children's Books dataset #10

Open
sdevalk opened this issue Feb 25, 2020 · 4 comments

Comments

@sdevalk
Contributor

sdevalk commented Feb 25, 2020

@nfreire When I run the crawler using the command below...

docker-compose run --rm crawler /bin/bash ./crawler.sh \
    -dataset_uri http://data.bibliotheken.nl/id/dataset/rise-childrensbooks \
    -output_file /opt/europeana_cc_lod_share/crawled/centsprenten.nt \
    -log_file run.log

...I get a 'java.lang.NullPointerException':

Exception in thread "main" java.lang.NullPointerException
	at java.base/java.net.URI$Parser.parse(Unknown Source)
	at java.base/java.net.URI.<init>(Unknown Source)
	at eu.europeana.commonculture.lod.crawler.LinkedDataCrawler.crawl(LinkedDataCrawler.java:69)
	at eu.europeana.commonculture.lod.crawler.CommandLineInterface.main(CommandLineInterface.java:59)

Do you know what the cause of this exception could be?

@nfreire
Collaborator

nfreire commented Feb 25, 2020

The children's books collection was the first LOD dataset delivered to Europeana by the KB. It was our first experiment with an LOD dataset description. After working on it, we refined the dataset descriptions for the other two KB collections.
The children's books dataset's RDF description is outdated and no longer conforms to the Europeana guidelines.
Could the KB update it?

Regardless of whether the KB updates the data, I'll change the crawler so that it does not break in such situations.

@nfreire
Collaborator

nfreire commented Feb 25, 2020

Should be fixed now, and the crawler no longer throws the exception on an invalid dataset description.
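
The thread doesn't show what the fix looks like. The stack trace points at the `java.net.URI` constructor, which is documented to throw `NullPointerException` when passed `null`, so a value the crawler expected in the dataset description was probably missing. As a minimal sketch, assuming the crawler reads a URI string from the description that may be absent or malformed, a guard along these lines reports an invalid description instead of crashing. `DatasetUriGuard` and `resolveUri` are hypothetical names, not the crawler's actual code:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

public class DatasetUriGuard {

    /**
     * Hypothetical helper: converts a value read from a dataset description
     * into a URI. Returns empty instead of throwing when the value is missing,
     * blank, or not a valid URI. Note that new URI(null) throws
     * NullPointerException, which matches the stack trace in this issue.
     */
    static Optional<URI> resolveUri(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        try {
            return Optional.of(new URI(value.trim()));
        } catch (URISyntaxException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Simulates a dataset description that lacks the expected property.
        Optional<URI> uri = resolveUri(null);
        if (!uri.isPresent()) {
            System.err.println("Invalid dataset description: no usable URI found");
        }
    }
}
```

Returning `Optional` lets the caller decide how to report the problem (log it, skip the dataset, or exit) rather than letting a low-level exception escape from `main`.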

@sdevalk
Contributor Author

sdevalk commented Feb 25, 2020

Thank you!

@sdevalk
Contributor Author

sdevalk commented Feb 25, 2020

Question: when the crawler reports an error on an invalid dataset description, it doesn't seem to exit. My console keeps printing the following until I shut down the crawler myself:

com.ontologycentral.ldspider.http.internal.CloseIdleConnectionThread run
INFO: Closing expired and idle connections

Would it be possible to exit the crawler automatically?
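
The thread doesn't say what caused the hang or how it was addressed. One plausible explanation, not confirmed here, is that a background non-daemon thread (the log points at LDSpider's `CloseIdleConnectionThread`) keeps the JVM alive after `main` has finished, because the JVM only exits on its own once all non-daemon threads have ended. A minimal sketch of the usual remedy, assuming the crawler can stop its background threads on every exit path and then call `System.exit` with a non-zero status. The monitor thread below is a hypothetical stand-in, not LDSpider's class:

```java
import java.util.concurrent.TimeUnit;

public class ExitOnErrorSketch {

    /** Hypothetical stand-in for a background idle-connection monitor. */
    static Thread startMonitor() {
        Thread monitor = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                System.err.println("INFO: Closing expired and idle connections");
                try {
                    TimeUnit.SECONDS.sleep(5);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag so the loop condition ends the thread.
                    Thread.currentThread().interrupt();
                }
            }
        }, "idle-connection-monitor");
        // Threads are non-daemon by default: the JVM will not exit while this one runs.
        monitor.start();
        return monitor;
    }

    /** Returns the process exit status: 0 on success, 1 on an invalid description. */
    static int run(boolean descriptionValid) {
        Thread monitor = startMonitor();
        try {
            if (!descriptionValid) {
                System.err.println("Invalid dataset description; aborting crawl");
                return 1;
            }
            // ... crawling would happen here ...
            return 0;
        } finally {
            // Stop the background thread on every path so it cannot keep the JVM alive.
            monitor.interrupt();
            try {
                monitor.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        // System.exit also guarantees termination if some other library thread lingers,
        // and the non-zero status lets crawler.sh or docker-compose detect the failure.
        System.exit(run(false));
    }
}
```

Shutting the thread down cleanly is preferable when the crawler owns it; the explicit `System.exit` acts as a backstop for threads started inside third-party libraries.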

nfreire pushed a commit that referenced this issue Feb 26, 2020
nfreire pushed a commit that referenced this issue Feb 26, 2020