Scripts for processing and predicting CRISPR/Cas9-generated mutations
To predict and view mutational profiles for individual gRNAs, please visit the FORECasT website at:
https://partslab.sanger.ac.uk/FORECasT
Precomputed profiles for all gRNAs in human and mouse CCDS regions are available here:
https://fa9.cog.sanger.ac.uk/index.html
Entries are grouped by CCDS id, with all gRNAs for a given CCDS id collected together. Within each file ending in _predicted_mapped_indel_summary.txt, the entries for each gRNA are separated by a line of the form
@@@id guide_seq predicted_in_frame
where the id contains the CCDS id, the chromosome coordinates and the strand. The next line is '- - 1000' and can be ignored (it is there for visualization only). The following lines list the predicted indels and their predicted counts (assuming 1000 total reads, and ignoring indels with fewer than 1 read). For the read sequences, see the corresponding entries in the _predicted_rep_reads.txt files.
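As a rough illustration of the layout described above, the following sketch groups the indel lines of a summary file by gRNA. It is not part of the SelfTarget code base: the function name `parse_mapped_indel_summary` is hypothetical, and it assumes that the fields on each line are separated by whitespace (tabs or spaces) and that the count is the last field on each indel line.

```python
def parse_mapped_indel_summary(lines):
    """Group predicted indels by gRNA from an iterable of text lines.

    Assumption: fields are whitespace-separated, and the header line has
    the form '@@@id guide_seq predicted_in_frame'.
    """
    guides = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("@@@"):
            # Split from the right so that the id itself may contain separators.
            guide_id, guide_seq, in_frame = line[3:].rsplit(None, 2)
            current = {
                "guide_seq": guide_seq,
                "predicted_in_frame": in_frame,  # kept as text, not interpreted
                "indels": [],
            }
            guides[guide_id] = current
            continue
        if current is None:
            continue  # ignore anything before the first '@@@' header
        fields = line.split()
        if fields[0] == "-":
            continue  # the '- - 1000' line is for visualization only
        current["indels"].append((fields[0], float(fields[-1])))
    return guides
```

A file could then be read with `with open(path) as handle: guides = parse_mapped_indel_summary(handle)`, where `path` points to one of the downloaded summary files.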
- Follow the installation instructions below.
- After installation, from a command line:
cd indel_prediction
cd predictor
- Run single or batch prediction as described next.
For a single prediction:
python FORECasT.py <target DNA sequence> <PAM index (0 based)> <output_file_prefix>
e.g.
python FORECasT.py ATGCTAGCTAGGGCATGAGGCATGCTAGTGACTGCATGGTAC 17 test_output
Output will be in
<output_file_prefix>_predictedindelsummary.txt
A list of predicted mutations, one per line, listed in order of decreasing predicted counts. Each line contains an identifier string for the indel followed by a - (ignore this), and then a predicted read count (tab-delimited).
e.g.
- - 1000 (always 1000 reads - it is the original template sequence - here for viewer use).
D2_L-3R0 - 550
I1_L-2C1R0 - 200
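The summary lines above can be read back with a few lines of Python. The sketch below is not part of FORECasT; the function name `parse_indel_summary` is hypothetical. It also assumes that an identifier starting with `D<n>_` denotes a deletion of n bases and `I<n>_` an insertion of n bases, which is consistent with the example reads in this README but is not stated explicitly here.

```python
import re

# Assumed identifier layout: 'D' or 'I', then the indel size, then '_'.
INDEL_ID = re.compile(r"^([DI])(\d+)_")
KINDS = {"D": "deletion", "I": "insertion"}


def parse_indel_summary(lines):
    """Return (identifier, kind, size, count) tuples from summary lines."""
    records = []
    for raw in lines:
        fields = raw.split()
        # Skip short lines, the '- - 1000' template line and '@@@' headers.
        if len(fields) < 3 or fields[0] == "-" or fields[0].startswith("@@@"):
            continue
        indel, count = fields[0], float(fields[2])
        match = INDEL_ID.match(indel)
        kind = KINDS[match.group(1)] if match else None
        size = int(match.group(2)) if match else None
        records.append((indel, kind, size, count))
    return records
```

For the example above, the two indel lines would be returned as a 2-base deletion with count 550 and a 1-base insertion with count 200.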
<output_file_prefix>_predictedreads.txt
A list of read sequences corresponding to each predicted mutation in the previous file. The format is read_id (ignore this), read sequence, mutation identifier (tab delimited), followed by a - (ignore this).
e.g.
ATGCTAGCTAGGGCATGAGGCATGCTAGTGACTGCATGGTAC - -
ATGCTAGCTAGGGCAAGGCATGCTAGTGACTGCATGGTAC D2_L-3R0 -
ATGCTAGCTAGGGCATGGAGGCATGCTAGTGACTGCATGGTAC I1_L-2C1R0 -
For batch prediction:
python FORECasT.py <batch_filename> <output_file_prefix>
e.g.
python FORECasT.py example_batch.txt test_batch_output
where batch_filename is a tab-delimited file with columns: ID, Target, PAM Index e.g.
ID Target PAM Index
Guide_1 ATGCTAGCTAGGGCATGAGGCATGCTAGTGACTGCATGGTAC 17
Guide_2 ATCGATGACTGATCGTAGCTAGCTGGGATGCTAGCTAGTTGCATGCTAGGAGTCAGCTAG 23
Guide_3 GATAGTCGTAGGCTAGCTAGCTAGCTGGCAAGTGTGGAAAAGGGGATGCATGTA 26
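A batch file in this layout can be generated programmatically. The following is a minimal sketch, not part of the repository: the function name `write_batch_file` is hypothetical, and it assumes the header row shown in the example above belongs in the file. The only check performed is that the 0-based PAM index falls inside the target sequence.

```python
import csv


def write_batch_file(path, guides):
    """Write (id, target, pam_index) tuples as a tab-delimited batch file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        # Header row as in the README example (assumed to be expected).
        writer.writerow(["ID", "Target", "PAM Index"])
        for guide_id, target, pam_index in guides:
            # The PAM index is 0-based, so it must be a valid string index.
            if not 0 <= pam_index < len(target):
                raise ValueError(
                    f"PAM index {pam_index} is outside target for {guide_id}"
                )
            writer.writerow([guide_id, target, pam_index])
```

For example, `write_batch_file("my_batch.txt", [("Guide_1", "ATGCTAGCTAGGGCATGAGGCATGCTAGTGACTGCATGGTAC", 17)])` writes a header line and one guide line; the file name is arbitrary.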
Output will be in <output_file_prefix>_predictedindelsummary.txt and <output_file_prefix>_predictedreads.txt
which are formatted as for single mode, but separate guides are prefaced by a line with
@@@<ID> <predicted_in_frame>
where ID is the identifier provided for the guide in the batch file, and predicted_in_frame is the predicted percentage of in-frame mutations (i.e. all insertions or deletions whose size is a multiple of 3: 3, 6, 9, etc.).
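To make the in-frame definition concrete, the sketch below recomputes such a percentage from a list of (identifier, count) pairs. It illustrates the definition only and is not necessarily how FORECasT computes the value. The function name `in_frame_percentage` is hypothetical, the indel size is assumed to be the number following the leading `D` or `I` in the identifier, and the denominator is assumed to be the summed counts of all listed indels (the '- - 1000' template line is excluded).

```python
import re

# Assumed identifier layout: 'D' or 'I', then the indel size, then '_'.
INDEL_SIZE = re.compile(r"^[DI](\d+)_")


def in_frame_percentage(indel_counts):
    """Percentage of indel counts whose indel size is a multiple of 3."""
    total = 0.0
    in_frame = 0.0
    for indel, count in indel_counts:
        match = INDEL_SIZE.match(indel)
        if match is None:
            continue  # e.g. the '-' template entry
        total += count
        if int(match.group(1)) % 3 == 0:
            in_frame += count
    return 100.0 * in_frame / total if total else 0.0
```

With only the two indels from the single-mode example (sizes 2 and 1), the result is 0.0, since neither size is a multiple of 3.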
Create a Python 3 virtual environment and activate it
# install Python dependencies
pip install -r requirements.txt
cd selftarget_pyutils
pip install -e .
cd ../indel_prediction
pip install -e .
# compile predictor
cd indel_analysis/indelmap
cmake . -DINDELMAP_OUTPUT_DIR=/usr/local/bin
make && make install
export INDELGENTARGET_EXE=/usr/local/bin/indelgentarget
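After these steps it can be useful to confirm that the compiled binary is visible to Python. The check below is a small sketch and not part of the repository; the function name `indelgentarget_path` is hypothetical, and it assumes that the Python code locates the binary through the INDELGENTARGET_EXE environment variable exported above.

```python
import os


def indelgentarget_path():
    """Return the path in INDELGENTARGET_EXE if it is an executable file."""
    path = os.environ.get("INDELGENTARGET_EXE")
    if path and os.path.isfile(path) and os.access(path, os.X_OK):
        return path
    return None  # variable unset, or not pointing at an executable file
```

If `indelgentarget_path()` returns `None`, re-check the `make install` step and the `export` line in the current shell.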
Alternatively, you can start a Docker container and exec into it:
docker pull quay.io/felicityallen/selftarget
docker run -d --name selftarget quay.io/felicityallen/selftarget
docker exec -it selftarget bash
The predictor can be run as a web service. It can be accessed through a separate front-end application, FORECasT (source on GitHub). The SelfTarget repository contains a Flask server with two API endpoints that FORECasT uses to access the predictor.
To run the predictor as a server, follow the local installation steps above, go to the repository root directory, and launch
python server/server.py --port=5001
or simply run a Docker container
docker run -d --name selftarget -p 5001:8006 quay.io/felicityallen/selftarget
All changes to the server must be reflected in swagger.yaml, since it is used to automatically generate clients for other services. The tests use it as well, so any change not reflected there should cause some of the tests to fail. It is handy to validate the Swagger specification with
swagger validate swagger.yml