Robot code for accessioning and delivery of GIS resources.
These robots require several dependencies to perform the GIS workflow steps. The robots often shell out to them via system calls.
- GDAL: needed for geospatial tasks. For local development, it can be installed with brew.
- xsltproc and xmllint: used for transforming XML files.
- rsync: also used as part of the robot process, so it must be installed.
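Before running the robots locally, you can confirm these tools are installed by looking each one up on PATH. The sketch below is a minimal Python example and is not part of gis-robot-suite. Checking for the gdalinfo executable as a stand-in for a working GDAL install is an assumption.

```python
import shutil
from typing import Iterable, List

# Executables the robots shell out to, per this README. Checking "gdalinfo"
# as a proxy for GDAL is an assumption; GDAL installs several CLI tools.
REQUIRED_TOOLS = ("gdalinfo", "xsltproc", "xmllint", "rsync")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling missing_tools() returns an empty list when all four executables are on PATH. Otherwise it names the ones to install.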
GIS data has its own set of names, standards, and conventions that can be difficult for newcomers. To better understand some of these, see the Geo4LibCamp Glossary as well as the following article, which describes the initial goals for supporting GIS in SDR:
Kim Durante & Darren Hardy (2015) Discovery, Management, and Preservation of Geospatial Data Using Hydra, Journal of Map & Geography Libraries, 11:2, 123-154, DOI: 10.1080/15420353.2015.1041630.
gis-robot-suite services two workflows: gisAssemblyWF and gisDeliveryWF.
gisAssemblyWF:
- extract-iso19139-metadata :: Transform ISO 19139 metadata from ArcCatalog metadata
- extract-iso19110-metadata :: Transform ISO 19110 metadata from ArcCatalog metadata
- extract-fgdc-metadata :: Transform FGDC metadata from ArcCatalog metadata
- generate-tag :: Apply Geo tag to object
- generate-descriptive :: Convert ISO 19139 into Cocina descriptive metadata
- assign-placenames :: Insert linked data into MODS record from gazetteer
- extract-boundingbox :: Extract bounding box from data for Cocina descriptive metadata
- generate-structural :: Generate structural metadata and update the Cocina data store accordingly
- finish-gis-assembly-workflow :: Finalize assembly workflow to prepare for assembly/delivery/discovery (validity check)
- start-delivery-workflow :: Kickstart the GIS delivery workflow (gisDeliveryWF)
gisDeliveryWF:
- load-vector :: Load vector data into the PostGIS database
- load-raster :: Load raster data into the GeoTIFF data store
- load-geoserver :: Load layers into GeoServer
- reset-geowebcache :: Reset GeoWebCache for the layer
- finish-gis-delivery-workflow :: Connect to the public and restricted GeoServers to verify the layer
- metadata-cleanup :: Remove the staging druid tree for the working druid
- start-accession-workflow :: Close the object version to initiate the accessioning workflow
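The step order and the hand-off between the two workflows can be modeled as plain data. The following Python sketch is illustrative only. It treats each workflow as a strictly linear sequence in the order listed here, which is a simplification. The names WORKFLOWS, HANDOFFS, and next_step are hypothetical and do not come from the robot code or the workflow service.

```python
from typing import Dict, List, Optional, Tuple

# Step order as listed in this README (illustrative; the real workflow
# definitions may include dependencies that are not strictly linear).
WORKFLOWS: Dict[str, List[str]] = {
    "gisAssemblyWF": [
        "extract-iso19139-metadata",
        "extract-iso19110-metadata",
        "extract-fgdc-metadata",
        "generate-tag",
        "generate-descriptive",
        "assign-placenames",
        "extract-boundingbox",
        "generate-structural",
        "finish-gis-assembly-workflow",
        "start-delivery-workflow",
    ],
    "gisDeliveryWF": [
        "load-vector",
        "load-raster",
        "load-geoserver",
        "reset-geowebcache",
        "finish-gis-delivery-workflow",
        "metadata-cleanup",
        "start-accession-workflow",
    ],
}

# start-delivery-workflow hands off to gisDeliveryWF. start-accession-workflow
# hands off to accessioning, which is outside this suite, so it has no entry.
HANDOFFS: Dict[str, str] = {"gisAssemblyWF": "gisDeliveryWF"}


def next_step(workflow: str, step: str) -> Optional[Tuple[str, str]]:
    """Return the (workflow, step) after `step`, or None at the end of the suite."""
    steps = WORKFLOWS[workflow]
    index = steps.index(step)  # raises ValueError for an unknown step
    if index + 1 < len(steps):
        return workflow, steps[index + 1]
    target = HANDOFFS.get(workflow)
    if target is None:
        return None
    return target, WORKFLOWS[target][0]
```

For example, next_step("gisAssemblyWF", "start-delivery-workflow") moves on to ("gisDeliveryWF", "load-vector"). The last delivery step returns None because accessioning is handled outside gis-robot-suite.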
Initially, the file system structure looks like the following (see the Consul page for a description). This is the pre-stage layout used when preparing the data for upload to Globus. The source input files for the shapefiles are all hard links, to reduce space requirements:
zv925hd6723/
  OGWELLS.dbf
  OGWELLS.prj
  OGWELLS.shp
  OGWELLS.shp.xml
  OGWELLS.shx
  preview.jpg
  index_map.json

Note that index_map.json is optional.
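Before upload, you can sanity-check a pre-stage directory like the one above with a short script. This Python sketch is hypothetical. The expected file list is inferred from the OGWELLS example: one shapefile layer plus preview.jpg, with index_map.json optional. The sketch uses the link count reported by os.stat to flag shapefile components that are not hard links.

```python
import os
from pathlib import Path
from typing import List

# Hypothetical expectations inferred from the OGWELLS example above.
SHAPEFILE_EXTENSIONS = (".shp", ".shx", ".dbf", ".prj", ".shp.xml")
REQUIRED_EXTRAS = ("preview.jpg",)  # index_map.json is optional, so not listed


def missing_files(druid_dir: str, layer: str) -> List[str]:
    """Return expected pre-stage files that are absent from druid_dir."""
    root = Path(druid_dir)
    expected = [layer + ext for ext in SHAPEFILE_EXTENSIONS] + list(REQUIRED_EXTRAS)
    return [name for name in expected if not (root / name).is_file()]


def not_hard_linked(druid_dir: str, layer: str) -> List[str]:
    """Return shapefile components whose link count is 1 (i.e. not hard links)."""
    root = Path(druid_dir)
    result = []
    for ext in SHAPEFILE_EXTENSIONS:
        path = root / (layer + ext)
        if path.is_file() and os.stat(path).st_nlink < 2:
            result.append(path.name)
    return result
```

For the example above, missing_files("zv925hd6723", "OGWELLS") should return an empty list.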
Then, at the end of GIS assembly processing (see the gisAssemblyWF steps above) and prior to accessioning, the workspace looks like this:
zv/
  925/
    hd/
      6723/
        zv925hd6723/
          content/
            preview.jpg
            index_map.json
            layer-iso19110.xml
            layer-iso19139.xml
            layer-fgdc.xml
            layer.shp.xml

Note that content/index_map.json is optional.
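The nested zv/925/hd/6723/ prefix comes from the druid itself: two letters, three digits, two letters, and four digits, each becoming one directory level. The following minimal Python sketch shows that mapping. It assumes lowercase druids with an optional druid: prefix, and its validation pattern is looser than SDR's real druid rules.

```python
import re
from pathlib import PurePosixPath

# Looser than SDR's real druid rules: any two lowercase letters are accepted.
DRUID_PATTERN = re.compile(r"(?:druid:)?([a-z]{2})(\d{3})([a-z]{2})(\d{4})")


def druid_tree_path(druid: str, base: str = "") -> PurePosixPath:
    """Map e.g. 'druid:zv925hd6723' to zv/925/hd/6723/zv925hd6723 under base."""
    match = DRUID_PATTERN.fullmatch(druid)
    if match is None:
        raise ValueError(f"not a valid druid: {druid!r}")
    segments = match.groups()          # ('zv', '925', 'hd', '6723')
    bare_druid = "".join(segments)     # 'zv925hd6723'
    parts = [*segments, bare_druid]
    return PurePosixPath(base, *parts) if base else PurePosixPath(*parts)
```

For instance, druid_tree_path("druid:zv925hd6723", "/var/geomdtk/current/workspace") gives /var/geomdtk/current/workspace/zv/925/hd/6723/zv925hd6723.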
Special requirements for an SDR reset (QA/stage): none 🙂
- gis-robot-suite's only data store is the shared robots Redis. Nothing needs to be done with this, since all robots will be quieted and the queues cleared as part of the larger reset process.
- Nothing special needs to be kept in terms of APOs, other than what the integration tests use (saving and reseeding that is already tracked elsewhere in the overall SDR reset process). Same for agreements and collections.
- Earthworks: we expect/hope that the unpublish step of the overall SDR reset plan will take care of removing old Earthworks data, but we are not sure whether Earthworks responds to unpublish, so that is yet to be tested on our first QA/stage SDR reset attempt (planned for Sept 2023).
- We have checked with the main user of gis-robot-suite, and have confirmed that there is no test data that needs to be kept in stage or QA across resets.
- While gis-robot-suite connects to a GeoServer database, that database is maintained as part of the Access portfolio, and resetting it is outside the scope of an Infrastructure portfolio SDR reset.
- Delete all content under the directories pointed to by the following shared_configs settings for the given env (note: double-check the actual settings values; the examples are valid for stage and QA as of Aug 2023):
  - Settings.geohydra.stage (e.g. '/var/geomdtk/current/stage')
  - Settings.geohydra.workspace (e.g. '/var/geomdtk/current/workspace')
  - Settings.geohydra.tmpdir (e.g. '/var/geomdtk/current/tmp')
  - Settings.geohydra.geotiff.dir (e.g. '/var/geoserver/local/raster/geotiff')
  - Settings.geohydra.opengeometadata.dir (e.g. '/var/geomdtk/current/export/opengeometadata/edu.stanford.purl')
Done.
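The deletion step can be scripted. The Python sketch below empties each directory while keeping the directory itself. By default it does a dry run that only lists what would be removed. The paths in EXAMPLE_RESET_DIRS are the example values listed above, not values read from shared_configs. As noted, check the real settings for the target environment before running it with dry_run=False.

```python
import shutil
from pathlib import Path
from typing import List

# Example values from this README (valid for stage/QA as of Aug 2023).
# Real values must come from shared_configs for the target environment.
EXAMPLE_RESET_DIRS = (
    "/var/geomdtk/current/stage",
    "/var/geomdtk/current/workspace",
    "/var/geomdtk/current/tmp",
    "/var/geoserver/local/raster/geotiff",
    "/var/geomdtk/current/export/opengeometadata/edu.stanford.purl",
)


def clear_directory(directory: str, dry_run: bool = True) -> List[str]:
    """Delete everything inside `directory` but keep the directory itself.

    Returns the paths removed, or, when dry_run is True, the paths that
    would be removed (nothing is touched in that case).
    """
    removed = []
    for child in sorted(Path(directory).iterdir()):
        removed.append(str(child))
        if dry_run:
            continue
        if child.is_dir() and not child.is_symlink():
            shutil.rmtree(child)  # remove a subdirectory tree
        else:
            child.unlink()  # remove a file or a symlink (not its target)
    return removed
```

Run it once per directory as a dry run first. Switch to dry_run=False only after the listed paths look right.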