Ping the Semantic Web
- Conversion process phase: publish. It is natural to want to announce any new datasets that you publish.
This page describes http://pingthesemanticweb.com and http://sindice.com/developers/pingApi, and how to use them. These two services accept pointers to Linked Data URIs or files. They index them, so it's easier for others to find your data. Sindice seems to be the leader in this capability.
There are two platforms that accept announcements of semantic web data: Ping the Semantic Web and Sindice.
$CSV2RDF4LOD_HOME/bin/util/ptsw.sh returns the URLs required to notify Ping the Semantic Web about the RDF documents listed.
The ping must be sent from a machine whose IP address is registered at http://pingthesemanticweb.com
curl http://pingthesemanticweb.com/rest/?url=http%3A%2F%2Fhomepages.rpi.edu%2F~lebot%2Flod-links%2Fstate-fips-dbpedia.ttl
<message>Thanks for pinging Ping the Semantic Web.</message>
curl http://pingthesemanticweb.com/rest/?url=http%3A%2F%2Fhomepages.rpi.edu%2F~lebot%2Flebot.foaf
<message>Thanks for pinging Ping the Semantic Web.</message>
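The ping URLs in the curl calls above are the REST endpoint followed by the percent-encoded document URL. A minimal sketch of that construction follows; the function name `ptsw_ping_url` is hypothetical, and this is not a reimplementation of ptsw.sh.

```python
from urllib.parse import quote

# REST endpoint as it appears in the curl examples above.
PTSW_REST_ENDPOINT = "http://pingthesemanticweb.com/rest/"


def ptsw_ping_url(document_url: str) -> str:
    """Return the URL to request in order to announce document_url (hypothetical helper)."""
    # Encode ':' and '/' as %3A and %2F; '~' is left literal, as in the examples above.
    return PTSW_REST_ENDPOINT + "?url=" + quote(document_url, safe="~")
```

Calling `ptsw_ping_url("http://homepages.rpi.edu/~lebot/lebot.foaf")` yields the same URL as the second curl example; the result can then be fetched with curl from a registered machine.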
The ping was rejected for the following URLs:
- https://raw.github.com/timrdf/csv2rdf4lod-automation/master/doc/instances/person/lebot.foaf (scheme? domain? it's the same file as above.)
- http://logd.tw.rpi.edu/source/twc-rpi-edu/file/iogds/version/2011-Nov-15/conversion/twc-rpi-edu-iogds-2011-Nov-15.void.ttl
<message>Ping the Semantic Web is not allowed to index this URL.</message>
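The two `<message>` responses shown above are enough to tell an accepted ping from a rejected one. The sketch below assumes the response body contains a single `<message>` element (the rest of the response format is not shown on this page); the function names are hypothetical.

```python
import re

# Acceptance text copied from the successful responses shown above.
PTSW_ACCEPTED_MESSAGE = "Thanks for pinging Ping the Semantic Web."


def ptsw_message(response_body: str):
    """Return the text inside the first <message> element, or None if absent."""
    match = re.search(r"<message>(.*?)</message>", response_body, re.DOTALL)
    return match.group(1).strip() if match else None


def ptsw_accepted(response_body: str) -> bool:
    """True only when the response carries the known acceptance message."""
    return ptsw_message(response_body) == PTSW_ACCEPTED_MESSAGE
```

Any message other than the acceptance text, including the "not allowed to index this URL" reply, is treated as a rejection.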
curl -H "Accept: text/plain" --data-binary http://healthdata.tw.rpi.edu/source/healthdata-tw-rpi-edu/dataset/cr-linksets/version/2013-Jan-08 http://api.sindice.com/v2/ping
1 pings submitted, 1 accepted
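The Sindice curl call above POSTs the dataset URL as the raw request body and asks for a plain-text reply. A rough Python equivalent is sketched below, assuming one URL per ping as in the example; the function names are hypothetical, and the request is only built, not sent.

```python
import re
import urllib.request

# Endpoint as it appears in the curl example above.
SINDICE_PING_ENDPOINT = "http://api.sindice.com/v2/ping"


def sindice_ping_request(dataset_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that pings Sindice."""
    return urllib.request.Request(
        SINDICE_PING_ENDPOINT,
        data=dataset_url.encode("utf-8"),  # supplying a body makes this a POST
        headers={"Accept": "text/plain"},
    )


def parse_sindice_reply(reply: str):
    """Return (submitted, accepted) from a reply such as the one shown above, else None."""
    match = re.search(r"(\d+) pings submitted, (\d+) accepted", reply)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Passing the request to `urllib.request.urlopen` would send it; comparing the two counts from `parse_sindice_reply` shows whether every submitted ping was accepted.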
To see what Sindice indexed in the last week:
- http://sindice.com/search?q=date:last_week+domain:purl.org/twc/health
- http://sindice.com/search?q=date:last_week+domain:healthdata.tw.rpi.edu
- http://sindice.com/search?q=domain%3Aieeevis.tw.rpi.edu&nq=&fq=date%3Alast_week
cr-pingback.sh can be run from a csv2rdf4lod [data root](csv2rdf4lod automation data root) or a specific source organization directory (e.g. data/source or data/source/us, respectively); it is also run by cr-cron.sh. It dereferences the "/void" path of the data domain (e.g. http://opendap.tw.rpi.edu/void) and POSTs it to update datahub.io's dataset record. The [environment variable](CSV2RDF4LOD environment variables) CSV2RDF4LOD_PUBLISH_DATAHUB_METADATA_OUR_BUBBLE_ID determines which dataset on datahub.io will be updated; e.g. the value twc-ieeevis will cause the dataset http://datahub.io/dataset/twc-ieeevis to be updated. The DataFAQs script add-metadata.py is used to process the VoID description into the JSON structure that CKAN requires, and to satisfy the additional requirements for being listed in the lodcloud group. The add-metadata.py script requires the environment variable X_CKAN_API_Key to contain the datahub.io API key, which can be found on your datahub.io user page.
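The relationships described above (data domain to VoID URL, bubble ID to datahub.io dataset URL, and the two required environment variables) can be sketched as follows. The helper names are hypothetical and only illustrate the mapping; they are not taken from cr-pingback.sh or add-metadata.py.

```python
import os

# Environment variable names as given in the text above.
BUBBLE_ID_VAR = "CSV2RDF4LOD_PUBLISH_DATAHUB_METADATA_OUR_BUBBLE_ID"
API_KEY_VAR = "X_CKAN_API_Key"


def void_url(data_domain: str) -> str:
    """URL of the VoID description, e.g. http://opendap.tw.rpi.edu/void."""
    return data_domain.rstrip("/") + "/void"


def datahub_dataset_url(bubble_id: str) -> str:
    """datahub.io dataset page that a given bubble ID refers to."""
    return "http://datahub.io/dataset/" + bubble_id


def pingback_settings(environ=os.environ) -> dict:
    """Collect both settings, failing early if either variable is unset or empty."""
    missing = [name for name in (BUBBLE_ID_VAR, API_KEY_VAR) if not environ.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return {
        "dataset_url": datahub_dataset_url(environ[BUBBLE_ID_VAR]),
        "api_key": environ[API_KEY_VAR],
    }
```

Checking both variables before doing any network work gives a clear error message instead of a failed POST.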
cr-pingback.sh will only collect the VoID and POST it to datahub.io once per week.
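One way to express a once-per-week limit like the one above is to compare the time of the last successful run against the current time. This is only an illustration of the idea; how cr-pingback.sh actually tracks its last run is not described here, and `pingback_is_due` is a hypothetical name.

```python
from datetime import datetime, timedelta


def pingback_is_due(last_run, now, interval=timedelta(days=7)) -> bool:
    """True if there was no previous run or at least `interval` has elapsed since it."""
    return last_run is None or now - last_run >= interval
```

For example, a run recorded on 2013-01-08 would not be due again on 2013-01-10, but would be due on 2013-01-15.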
NOTE: cr-pingback.sh avoids overwriting any CKAN metadata, to be cautious. However, this prevents updates to the "links:" attribute fields to express connections to other lodcloud datasets. This could be relaxed to ensure regular updates of those links.
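The relaxation suggested in the note above could look like the following merge: existing fields are preserved, except those whose key starts with "links:", which are always refreshed. This is a sketch of the idea, not the current behavior of cr-pingback.sh; the function name and the flat dictionary shape are assumptions, and the keys in the example are illustrative.

```python
def merge_ckan_extras(existing: dict, proposed: dict, refresh_prefixes=("links:",)) -> dict:
    """Keep existing values, add new keys, and overwrite only keys with a refresh prefix."""
    merged = dict(existing)  # copy, so the caller's metadata is not mutated
    for key, value in proposed.items():
        if key not in merged or key.startswith(refresh_prefixes):
            merged[key] = value
    return merged
```

With existing metadata `{"title": "Curated title", "links:dbpedia": "10"}` and proposed metadata `{"title": "Generated title", "links:dbpedia": "25", "triples": "1000"}`, the curated title is kept, the link count is updated, and the new key is added.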
Although cr-pingback.sh used to create the datahub.io dataset if it did not exist, datahub.io appears to have disabled this option. So, you must manually create the listing on datahub.io before cr-pingback.sh can fill in the metadata. This is not all bad, because some of the human-level listing information is still useful to provide on datahub.io, and cr-pingback.sh does not have that information anyway.
- cr-pingback.sh is listed among the Secondary Derivative Datasets.
- csv2rdf4lod-automation does not announce new datasets by default; announcing must be enabled by setting the CSV2RDF4LOD environment variables.