Ping the Semantic Web

csv2rdf4lod-automation is licensed under the [Apache License, Version 2.0](https://github.com/timrdf/csv2rdf4lod-automation/wiki/License)

What is first

What we will cover

This page describes http://pingthesemanticweb.com and http://sindice.com/developers/pingApi and how to use them. Both services accept pointers to Linked Data URIs or files and index them, making it easier for others to find your data. Sindice appears to be the leader in this capability.

Let's get to it!

There are two platforms that accept announcements of semantic web data: Ping the Semantic Web and Sindice.

Platform 1 of 2: Ping the Semantic Web

$CSV2RDF4LOD_HOME/bin/util/ptsw.sh returns the URLs required to notify Ping the Semantic Web about the RDF documents listed.

Pings must be sent from a machine whose IP address is registered at http://pingthesemanticweb.com
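
A minimal sketch of combining ptsw.sh with curl. It assumes ptsw.sh accepts RDF document URLs as arguments and prints one ping URL per line; both are assumptions, so check the script itself for its actual interface:

# Sketch only: assumes ptsw.sh takes RDF document URLs as arguments
# and prints one Ping the Semantic Web ping URL per line.
$CSV2RDF4LOD_HOME/bin/util/ptsw.sh http://example.org/data/mydata.ttl | while read -r ping_url; do
	curl "$ping_url"
done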

Success 1

http://homepages.rpi.edu/~lebot/lod-links/state-fips-dbpedia.ttl

curl "http://pingthesemanticweb.com/rest/?url=http%3A%2F%2Fhomepages.rpi.edu%2F~lebot%2Flod-links%2Fstate-fips-dbpedia.ttl"
<response>
	<message>Thanks for pinging Ping the Semantic Web.</message>
	<flerror>0</flerror>
</response>

Success 2

http://homepages.rpi.edu/~lebot/lebot.foaf

curl "http://pingthesemanticweb.com/rest/?url=http%3A%2F%2Fhomepages.rpi.edu%2F~lebot%2Flebot.foaf"
<response>
	<message>Thanks for pinging Ping the Semantic Web.</message>
	<flerror>0</flerror>
</response>
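
If you would rather not percent-encode the document URL by hand, curl can build the query string for you using its standard -G and --data-urlencode options:

# -G sends the data as a GET query string; --data-urlencode percent-encodes the value.
curl -G --data-urlencode "url=http://homepages.rpi.edu/~lebot/lebot.foaf" http://pingthesemanticweb.com/rest/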

Failures

Pinging is not accepted for every URL; a rejected ping returns:

<response>
	<message>Ping the Semantic Web is not allowed to index this URL.</message>
	<flerror>1</flerror>
</response>

Platform 2 of 2: Sindice

http://sindice.com/developers/pingApi

curl -H "Accept: text/plain" --data-binary http://healthdata.tw.rpi.edu/source/healthdata-tw-rpi-edu/dataset/cr-linksets/version/2013-Jan-08 http://api.sindice.com/v2/ping

1 pings submitted, 1 accepted
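
The "1 pings submitted" wording suggests that batches are supported. A hedged sketch that POSTs a file of dataset URIs, assuming the API accepts one URL per line (urls.txt is a hypothetical file name):

# urls.txt: one dataset URI per line (hypothetical file).
curl -H "Accept: text/plain" --data-binary @urls.txt http://api.sindice.com/v2/ping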

To see what they have indexed in the last week, visit:

http://sindice.com/developers/publishing

cr-pingback

cr-pingback.sh can be run from a csv2rdf4lod [data root](csv2rdf4lod automation data root) or from a specific source organization directory (e.g. data/source or data/source/us, respectively); it is also run by cr-cron.sh. It dereferences the "/void" path of the data domain (e.g. http://opendap.tw.rpi.edu/void) and POSTs it to update datahub.io's dataset record. The [environment variable](CSV2RDF4LOD environment variables) CSV2RDF4LOD_PUBLISH_DATAHUB_METADATA_OUR_BUBBLE_ID determines which dataset on datahub.io is updated; e.g. the value twc-ieeevis causes the dataset http://datahub.io/dataset/twc-ieeevis to be updated. The DataFAQs script add-metadata.py processes the VoID description into the JSON structure that CKAN requires, and satisfies the additional requirements for listing in the lodcloud group. add-metadata.py requires the environment variable X_CKAN_API_Key to contain your datahub.io API key, which can be found on your datahub.io user page.
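
A minimal setup sketch before invoking cr-pingback.sh, using only the variable names mentioned above. The key value and directory are placeholders, and whether cr-pingback.sh is on your PATH depends on your csv2rdf4lod installation:

export CSV2RDF4LOD_PUBLISH_DATAHUB_METADATA_OUR_BUBBLE_ID="twc-ieeevis"  # datahub.io dataset to update
export X_CKAN_API_Key="00000000-0000-0000-0000-000000000000"             # placeholder; copy from your datahub.io user page
cd /path/to/data/source                                                  # a data root or source organization directory
cr-pingback.sh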

cr-pingback.sh collects the VoID and POSTs it to datahub.io at most once per week.

NOTE: cr-pingback.sh avoids overwriting any CKAN metadata, as a precaution. However, this prevents updates to the "links:" attribute fields that express connections to other lodcloud datasets. This restriction could be relaxed to ensure those links are updated regularly.

Although cr-pingback.sh used to create the datahub.io dataset if it did not exist, it appears that this option has been disabled. So, you must manually create the listing on datahub.io before cr-pingback.sh can fill in the metadata. This isn't all bad, because some of the human-level listing information is still useful to provide on datahub.io, and cr-pingback.sh doesn't have that information anyway.

What is next
