This repository contains a simple API to explore EuropePMC articles.
git clone https://github.com/RNAcentral/rnacentral-references.git
cd rnacentral-references
- create an .env file with the following environment variables (keep ENVIRONMENT=DOCKER and change LITSCAN_USER, LITSCAN_PASSWORD and LITSCAN_DB as desired):
ENVIRONMENT=DOCKER
LITSCAN_USER=docker
LITSCAN_PASSWORD=example
LITSCAN_DB=reference
docker-compose up --build
A Docker volume is used to persist the data generated and used by the containers.
If you want to recreate the database, run docker-compose down --volumes before running docker-compose up --build.
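As a quick sanity check before starting the containers, the .env file described above could be validated with a few lines of Python. This is only an illustrative sketch: the helper names `parse_env` and `missing_keys` are hypothetical and not part of this repository, and how the project itself reads these variables is not shown here.

```python
# Hypothetical helpers (not part of the repository) to check a .env file.
REQUIRED_KEYS = ("ENVIRONMENT", "LITSCAN_USER", "LITSCAN_PASSWORD", "LITSCAN_DB")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


example = """ENVIRONMENT=DOCKER
LITSCAN_USER=docker
LITSCAN_PASSWORD=example
LITSCAN_DB=reference
"""

if __name__ == "__main__":
    print(missing_keys(parse_env(example)))
```

An empty list means all four variables listed above are present.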
git clone https://github.com/RNAcentral/rnacentral-references.git
cd rnacentral-references
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
docker build -t local-postgres database/local
- this will create an image with postgres databases.
docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=postgres -t local-postgres
- this will create and start an instance of postgres on your local machine's 5432 port.
python3 -m database
- creates necessary database tables
python3 -m producer
- starts producer server on port 8080
python3 -m consumer
- starts consumer server on port 8081
Submit a single job using
curl -H "Content-Type:application/json" -d "{\"id\": \"RF00001\"}" localhost:8080/api/submit-job
This job will use the following query on EuropePMC:
query=("RF00001" AND ("rna" OR "mrna" OR "ncrna" OR "lncrna" OR "rrna" OR "sncrna") AND IN_EPMC:Y AND OPEN_ACCESS:Y AND NOT SRC:PPR)
Where:
- "RF00001" is the string used in the search
- ("rna" OR "mrna" OR "ncrna" OR "lncrna" OR "rrna" OR "sncrna") is used to filter out possible false positives
- IN_EPMC:Y means that the full text of the article is available in Europe PMC
- OPEN_ACCESS:Y means that it must be an Open Access article, to allow access to the full content
- NOT SRC:PPR means that it cannot be a preprint, as preprints are not peer-reviewed
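The structure of this query can be sketched in Python as follows. This is an illustration of how the pieces fit together, not the code the service actually runs: `build_query`, `DEFAULT_FILTER` and `FIXED_CLAUSES` are hypothetical names, and dropping the filter clause when it is empty is an assumption.

```python
# Hypothetical sketch of how the Europe PMC query string is assembled.
DEFAULT_FILTER = '("rna" OR "mrna" OR "ncrna" OR "lncrna" OR "rrna" OR "sncrna")'
FIXED_CLAUSES = ("IN_EPMC:Y", "OPEN_ACCESS:Y", "NOT SRC:PPR")


def build_query(job_id, filter_query=DEFAULT_FILTER):
    """Combine the quoted id, the optional filter and the fixed clauses with AND."""
    parts = ['"{}"'.format(job_id)]
    if filter_query:  # assumption: an empty filter is simply left out
        parts.append(filter_query)
    parts.extend(FIXED_CLAUSES)
    return "(" + " AND ".join(parts) + ")"


if __name__ == "__main__":
    print("query=" + build_query("RF00001"))
```

With the default filter, `build_query("RF00001")` reproduces the query string shown above.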
It is possible to change the query used to filter out possible false positives.
To do so, use the query parameter when submitting a job:
curl -H "Content-Type:application/json" -d "{\"id\": \"RF00001\", \"query\": \"('foo' AND 'bar')\"}" localhost:8080/api/submit-job
Or, if you don't want to use any filter query, pass an empty string:
curl -H "Content-Type:application/json" -d "{\"id\": \"RF00001\", \"query\": \"\"}" localhost:8080/api/submit-job
Use the search_limit parameter if you want to set a maximum number of articles to search when testing this tool:
curl -H "Content-Type:application/json" -d "{\"id\": \"RF00001\", \"search_limit\": 10}" localhost:8080/api/submit-job
To rescan an id, use the rescan parameter:
curl -H "Content-Type:application/json" -d "{\"id\": \"RF00001\", \"rescan\": true}" localhost:8080/api/submit-job
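The same single-job requests can be sent from Python using only the standard library. The sketch below mirrors the curl examples above. `build_payload` and `submit_job` are hypothetical helper names, and combining several optional parameters in one request is an assumption, since the examples only show them one at a time. The format of the response is not documented here, so it is returned as raw text.

```python
import json
from urllib import request

SUBMIT_URL = "http://localhost:8080/api/submit-job"


def build_payload(job_id, query=None, search_limit=None, rescan=None):
    """Build the JSON body; optional parameters are included only when set."""
    payload = {"id": job_id}
    if query is not None:  # an empty string is kept, as in the curl example
        payload["query"] = query
    if search_limit is not None:
        payload["search_limit"] = search_limit
    if rescan is not None:
        payload["rescan"] = rescan
    return payload


def submit_job(payload, url=SUBMIT_URL):
    """POST the payload as JSON and return (status code, raw response text)."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.status, resp.read().decode("utf-8")


if __name__ == "__main__":
    # Requires the producer to be running on port 8080.
    print(submit_job(build_payload("RF00001", search_limit=10)))
```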
To submit multiple ids, run:
curl -H "Content-Type:application/json" -d "{\"job_id\": [\"5S rRNA\", \"5S ribosomal RNA\"], \"database\": \"rfam\", \"primary_id\": \"RF00001\", \"search_limit\": 10}" localhost:8080/api/multiple-jobs
The example above is useful for submitting a collection of identifiers (Ids) related to a single gene/accession.
Note that the URL used in this case is localhost:8080/api/multiple-jobs.
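For scripted submissions, the body of a multiple-jobs request could be assembled as in the sketch below. The field names come from the curl example above; the function name `build_multiple_jobs_payload`, the removal of duplicate ids and the empty-list check are assumptions added for illustration.

```python
import json


def build_multiple_jobs_payload(job_ids, database, primary_id, search_limit=None):
    """Build the JSON body for localhost:8080/api/multiple-jobs."""
    # Drop duplicate ids while keeping the original order (an assumption,
    # not a documented requirement of the API).
    unique_ids = list(dict.fromkeys(job_ids))
    if not unique_ids:
        raise ValueError("at least one id is required")
    payload = {
        "job_id": unique_ids,
        "database": database,
        "primary_id": primary_id,
    }
    if search_limit is not None:
        payload["search_limit"] = search_limit
    return payload


if __name__ == "__main__":
    body = build_multiple_jobs_payload(
        ["5S rRNA", "5S ribosomal RNA"], "rfam", "RF00001", search_limit=10
    )
    print(json.dumps(body))
```

The printed JSON can be passed to curl with -d, or posted with any HTTP client using the Content-Type:application/json header.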
You can check the results by accessing the URL
http://localhost:8080/api/results/rf00001
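A results URL can also be built and fetched from a script. In the sketch below, lowercasing the id follows the example URL above (rf00001), but whether the service requires it, and whether ids containing spaces must be percent-encoded, are assumptions. `results_url` and `fetch_results` are hypothetical names, and the response is returned as raw text because its format is not described here.

```python
from urllib import request
from urllib.parse import quote


def results_url(job_id, base="http://localhost:8080"):
    """Build the results URL, lowercasing and percent-encoding the id."""
    return "{}/api/results/{}".format(base, quote(job_id.lower(), safe=""))


def fetch_results(job_id, base="http://localhost:8080"):
    """GET the results for an id and return the raw response text."""
    with request.urlopen(results_url(job_id, base)) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Requires the producer to be running on port 8080.
    print(fetch_results("RF00001"))
```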
The current rate limit is 10 requests per second or 500 per minute. This tool makes at least 200 requests per minute when it is in use. If a consumer is searching for an ID that has many articles, this VM can make up to 100 requests per minute. The vast majority of IDs searched do not have articles, so the rate limit is unlikely to be exceeded.
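If a client-side guard against these limits is ever needed, one option is a sliding-window counter that checks both limits before each request. The sketch below is a generic illustration of that technique and not something this repository is known to implement; `SlidingWindowLimiter` is a hypothetical class, and the caller supplies the current time in seconds so the logic can be tested without waiting.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow a request only if every (max_requests, window_seconds) limit holds."""

    def __init__(self, limits=((10, 1.0), (500, 60.0))):
        self.limits = limits
        self.sent = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now):
        """Return True and record the request if it fits within all limits."""
        # Forget timestamps older than the longest window.
        horizon = max(window for _, window in self.limits)
        while self.sent and now - self.sent[0] >= horizon:
            self.sent.popleft()
        # Check each limit against the requests still inside its window.
        for max_requests, window in self.limits:
            recent = sum(1 for t in self.sent if now - t < window)
            if recent >= max_requests:
                return False
        self.sent.append(now)
        return True


if __name__ == "__main__":
    import time

    limiter = SlidingWindowLimiter()
    accepted = sum(limiter.allow(time.monotonic()) for _ in range(25))
    print(accepted, "of 25 immediate requests allowed")
```

A caller would check `allow(time.monotonic())` before each request and sleep briefly when it returns False.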