zoophy-services

RESTful services for ZooPhy. This consists of:

  1. Retrieving GenBankRecords from the ZooPhy SQL Database

  2. Searching GenBankRecords in the ZooPhy Lucene Index

  3. Starting/Stopping ZooPhy Pipeline jobs

Dependencies:

Setup:

  1. Import the project into an IDE as "Existing Maven Project"

  2. Create an application.properties file in the config folder with your SQL and Lucene details. Refer to application.properties.template.

  3. Run the build.sh script

  4. The build should complete successfully and generate a runnable jar in the target folder. The jar can be run from a terminal, or the project can be started in Spring Tool Suite via Run As "Spring Boot App".

Using Services

The current services may be used via HTTPS requests. They return data in JSON format:

Get GenBankRecord:

Get Location of GenBankRecord

Lucene query for Index records

Lucene query for Index records count

Lucene query for specific list of Accessions

["GQ258462","CY055940","CY055932","CY055788","CY055780","CY055740","CY055661","HQ712184","HM624085"]
  • Note: This service is a workaround for searching long lists of specific accessions without resorting to very long GET request URLs. The current limit is 1000 accessions. This will be refactored in a future PR.
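
Lists longer than the 1000-accession limit have to be split on the client side before they are sent. A minimal sketch of that batching follows; the helper name chunk_accessions is hypothetical and not part of ZooPhy, and the accession values in the demo are placeholders rather than real records.

```python
from typing import Iterator, List

MAX_ACCESSIONS = 1000  # limit stated in the note above


def chunk_accessions(accessions: List[str],
                     size: int = MAX_ACCESSIONS) -> Iterator[List[str]]:
    """Yield batches of at most `size` unique accessions, keeping input order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # dict.fromkeys drops duplicates while preserving first-seen order
    unique = list(dict.fromkeys(accessions))
    for start in range(0, len(unique), size):
        yield unique[start:start + size]


# Placeholder accessions, used only to show the batch sizes.
batches = list(chunk_accessions([f"ACC{i:05d}" for i in range(2500)]))
print([len(batch) for batch in batches])  # [1000, 1000, 500]
```

Each batch can then be serialized with json.dumps and sent as the POST body of a separate request.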

Start ZooPhy Job

  • Type: POST
  • Path: /run
  • Required POST Body Data: JobParameters JSON Object containing:
  • replyEmail - String
  • jobName - String (optional)
  • records - List of JobRecord (Limit 1000)
    • Note: In records, the fields collectionDate, geonameID and rawSequence are required only for FASTA records.
  • useGLM - Boolean (default is false)
  • predictors - Map of <String, List of Predictors> (optional)
    • Note: Supply this only if custom GLM predictors are needed. Otherwise, if useGLM is set to true, default predictors will be used, and these can only be applied to US states. If locations outside the US, or more precise locations, are needed, then custom predictors must contain at least lat, long, and SampleSize. All predictor values must be positive (> 0) numbers, except for lat/long. Predictor year is not needed and will not be used for custom predictors. The predictor states must also exactly match the accession states as processed in our pipeline; for this reason it is critical to use the Template Generator service to generate locations, coordinates, and sample sizes. This feature is currently experimental.
  • Example Request URL: https://zodo.asu.edu/zoophy/api/run
  • Example POST Body:
{
        "records":  [
            {
                "id":"EPI_ISL_150187","collectionDate":"02-Sep-2004","geonameID":"8900568",
                "rawSequence":"AGCAAAAG.........CAATCTGT",
                "resourceSource":"2"
            },
            {
                "id":"EPI_ISL_190187","collectionDate":"30-Jan-2009","geonameID":"8041603",
                "rawSequence":"AGCAAAAG......AAATAGTGC",
                "resourceSource":"2"
            },
            {
                "id":"KM654884","collectionDate":null,"geonameID":null,
                "rawSequence":null,
                "resourceSource":"1"
            },
            {
                "id":"KM654883","collectionDate":null,"geonameID":null,
                "rawSequence":null,
                "resourceSource":"1"
            },
            {
                "id":"KM654882","collectionDate":null,"geonameID":null,
                "rawSequence":null,
                "resourceSource":"1"
            }
        ],
        "replyEmail":"[email protected]",
        "jobName":"Sample run",
        "useGLM":true,
        "predictors":{
            "merrylands" : [
                            {"state": "merrylands", "name": "lat", "value": -33.833328, "year": null},
                            {"state": "merrylands", "name": "long", "value": 150.98334, "year": null},
                            {"state": "merrylands", "name": "SampleSize", "value": 2, "year": null}
                        ],
            "perth": [
                        {"state": "perth", "name": "lat", "value": -31.95224, "year": null},
                        {"state": "perth", "name": "long", "value": 115.8614, "year": null},
                        {"state": "perth", "name": "SampleSize", "value": 1, "year": null}
                    ],
            "castle-hill" : [
                                {"state": "castle-hill", "name": "lat", "value": -33.73333, "year": null},
                                {"state": "castle-hill", "name": "long", "value": 151.0, "year": null},
                                {"state": "castle-hill", "name": "SampleSize", "value": 4, "year": null}
                            ],
            "brisbane": [
                        {"state": "brisbane", "name": "lat", "value": -27.467939, "year": null},
                        {"state": "brisbane", "name": "long", "value": 153.02809, "year": null},
                        {"state": "brisbane", "name": "SampleSize", "value": 1, "year": null}
                        ]
        },
        "xmlOptions":
            {
                "substitutionModel":"HKY",
                "gamma":false,
                "invariantSites":false,
                "clockModel":"Strict",
                "treePrior":"Constant",
                "chainLength":10000000,
                "subSampleRate":1000
            }
}
  • Note:
      1. The ZooPhy Pipeline ties together several complex software packages and may fail for numerous reasons. A common reason is having too few or too many unique disjoint Geoname locations (there must be between 2 and 50). Jobs may also take a very long time to run; time estimates are provided in update emails.
      2. The example contains both FASTA and GenBank records. The API can be used to run a job with only FASTA records, only GenBank records, or both.
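
As an illustration of the request described above, the following sketch assembles a JobParameters body for GenBank-only records and wraps it in a POST request using the Python standard library. The helper names (genbank_record, build_run_request) and the reply address are hypothetical. The Content-Type header and the use of "1" as the resourceSource for GenBank records are assumptions taken from the example body, not confirmed behavior of the service.

```python
import json
import urllib.request

RUN_URL = "https://zodo.asu.edu/zoophy/api/run"  # example URL from this README
MAX_RECORDS = 1000  # record limit stated above


def genbank_record(accession):
    """GenBank records only need an id; the FASTA-only fields stay null."""
    return {"id": accession, "collectionDate": None, "geonameID": None,
            "rawSequence": None,
            "resourceSource": "1"}  # assumption: "1" marks GenBank, as in the example


def build_run_request(reply_email, records, job_name=None,
                      use_glm=False, predictors=None, url=RUN_URL):
    """Build (but do not send) a POST request carrying the JobParameters JSON."""
    if len(records) > MAX_RECORDS:
        raise ValueError(f"at most {MAX_RECORDS} records are allowed")
    payload = {"replyEmail": reply_email, "records": records, "useGLM": use_glm}
    if job_name is not None:
        payload["jobName"] = job_name  # optional field
    if predictors is not None:
        payload["predictors"] = predictors  # optional, custom GLM predictors only
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed to be required
        method="POST",
    )


request = build_run_request(
    "user@example.org",  # hypothetical reply address
    [genbank_record(a) for a in ("KM654884", "KM654883", "KM654882")],
    job_name="Sample run",
)
print(request.get_method(), request.full_url)
```

Sending the job is then a matter of passing the request to urllib.request.urlopen. That step is left out here so the sketch does not start a real job.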

Validate ZooPhy Job

  • Type: POST
  • Path: /validate
  • Required POST Body Data: Exact same as the Run service
  • Note: This service is intended to check ZooPhy jobs for common errors before starting them. It returns null if no errors are found; otherwise it returns an error message describing the reason(s) that the job will not succeed. Passing validation does NOT guarantee that the job will succeed.
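
Some of the rules listed for the Run service can also be checked locally before calling /validate. The sketch below is a hypothetical client-side pre-check, not the service's own validation logic. It covers only the rules stated in this README (record limit, FASTA-only required fields, required custom predictors, positive predictor values) and assumes that a resourceSource of "2" marks a FASTA record, as the example body suggests. The job in the demo is made up for illustration.

```python
REQUIRED_PREDICTORS = {"lat", "long", "SampleSize"}
FASTA_FIELDS = ("collectionDate", "geonameID", "rawSequence")
FASTA_SOURCE = "2"  # assumption inferred from the example body
MAX_RECORDS = 1000


def precheck_job(job):
    """Return a list of problems found in a JobParameters dict (empty if none)."""
    errors = []
    records = job.get("records") or []
    if not job.get("replyEmail"):
        errors.append("replyEmail is required")
    if len(records) > MAX_RECORDS:
        errors.append(f"too many records: {len(records)} > {MAX_RECORDS}")
    for record in records:
        if str(record.get("resourceSource")) == FASTA_SOURCE:
            missing = [field for field in FASTA_FIELDS if not record.get(field)]
            if missing:
                errors.append(
                    f"FASTA record {record.get('id')} is missing {', '.join(missing)}")
    for state, entries in (job.get("predictors") or {}).items():
        absent = REQUIRED_PREDICTORS - {entry.get("name") for entry in entries}
        if absent:
            errors.append(f"predictors for {state} lack {', '.join(sorted(absent))}")
        for entry in entries:
            value = entry.get("value")
            # lat/long may be negative; every other predictor must be > 0
            if entry.get("name") not in ("lat", "long") and (
                    not isinstance(value, (int, float)) or value <= 0):
                errors.append(
                    f"predictor {entry.get('name')} for {state} must be a positive number")
    return errors


# Made-up job with three deliberate mistakes.
job = {
    "replyEmail": "user@example.org",
    "records": [{"id": "SAMPLE_1", "collectionDate": "02-Sep-2004",
                 "geonameID": None, "rawSequence": "AGCA", "resourceSource": "2"}],
    "predictors": {"perth": [
        {"state": "perth", "name": "lat", "value": -31.95224, "year": None},
        {"state": "perth", "name": "SampleSize", "value": 0, "year": None},
    ]},
}
for problem in precheck_job(job):
    print(problem)
```

A job that passes this pre-check should still be sent to /validate, since the service can detect problems (such as location counts) that the client cannot.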

Stop ZooPhy Job

Generate GenBankRecord data download

{
    "accessions": [
        {
            "id": "CY214007", "collectionDate": null, "geonameID": null,
            "rawSequence": null, "resourceSource": 1
        },
        {
            "id": "CY060544", "collectionDate": null, "geonameID": null,
            "rawSequence": null, "resourceSource": 1
        },
        {
            "id": "CY060688", "collectionDate": null, "geonameID": null,
            "rawSequence": null, "resourceSource": 1
        },
        {
            "id": "JN632581", "collectionDate": null, "geonameID": null,
            "rawSequence": null, "resourceSource": 1
        }
    ],
    "columns": ["Genes","Date","Country"]
}
  • Note: This service will not return an actual File, just a JSON String ready to be written into a file.
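
Since the response is text rather than a file, the client has to write it to disk itself. The sketch below shows one way to do that, assuming the response body is a JSON-encoded string whose decoded value is the file content; if the body does not decode to a string, it is written unchanged. The function name save_download, the file name, and the sample body are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_download(response_body: str, destination: Path) -> Path:
    """Write the text carried by a download response to `destination`."""
    try:
        decoded = json.loads(response_body)
    except json.JSONDecodeError:
        decoded = response_body  # not JSON at all: keep the raw text
    # Only unwrap when the JSON value is a plain string.
    text = decoded if isinstance(decoded, str) else response_body
    destination.write_text(text, encoding="utf-8")
    return destination


# Made-up response body: a JSON string holding a single header line.
sample_body = json.dumps("Genes,Date,Country\n")
with tempfile.TemporaryDirectory() as folder:
    saved = save_download(sample_body, Path(folder) / "records.txt")
    written = saved.read_text(encoding="utf-8")
print(repr(written))
```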

Generate GLM Predictor template download

["GQ258462","CY055940","CY055932","CY055788","CY055780","CY055740","CY055661","HQ712184","HM624085"]
  • Note: This service will not return an actual File, just a JSON String ready to be written into a file.