Skip to content
forked from hoelzer/pocp

Calculation of the Percentage of Conserved Proteins following Qin, Xie et al. 2014

License

Notifications You must be signed in to change notification settings

hoelzer-lab/pocp

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Calculation of the Percentage Of Conserved Proteins

Twitter Follow

Please note that I simply provide this code for POCP calculation and it is not polished in any way. Thus, the user experience might be not great, but the code does the job. Please feel free to report any issues!

As input use one amino acid sequence FASTA file per genome such as provided by Prokka. Simply annotate all your genomes with Prokka first and subsequently organize all your FASTA (*.faa) in one folder (see example data folder in this repository). The script will then calculate all pairwise alignments between all FASTA files in the provided folder and use this information for POCP calculation following Qin, Xie et al. 2014. You can also compare all your *.faa input files only against one other *.faa file to reduce runtime (see below).

You need ruby, awk, grep, and blastp as installed dependencies (ruby, awk and grep should be already installed on any Linux system). The easiest way to install blastp is via conda:

conda create -n pocp -c bioconda blast && conda activate pocp 

If you have blastp installed and in your PATH, simply skip the above step and clone this repository and execute the following test comand:

git clone https://github.com/hoelzer/pocp.git
cd pocp
chmod +x pocp.rb

./pocp.rb example/ example/results/ 2

Important: The first given parameter must be the input dir holding the *.faa FASTA files, the second parameter must be the output path and the third parameter must be the number of threads used for blastp searches.

The final output (results.csv) should look like this:

Example output

If needed, the following parameters used for filtering the blast results can be adjusted directly in the ruby script:

EVALUE = 1e-5
SEQ_IDENTITY = 0.4
ALN_LENGTH = 0.5

Per default all pairwise comparisons of all .faa FASTA files located in the input folder are performed. Please define a single FASTA filename (not path, only the filename), if comparisons should be only performed against the protein sequences in this file:

./pocp.rb example/ example/results/ 2 Cav_10DC88.faa

About

Calculation of the Percentage of Conserved Proteins following Qin, Xie et al. 2014

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Ruby 100.0%