LOCALIZER: subcellular localization prediction of plant and effector proteins in the plant cell

License: GPL-3.0 (COPYING.txt); an additional license file of unrecognized type (LICENCE.txt) is also present.
JanaSperschneider/LOCALIZER

What is LOCALIZER?

LOCALIZER is a machine learning method for predicting the subcellular localization of both plant proteins and pathogen effectors in the plant cell. It can currently predict localization to chloroplasts and mitochondria using transit peptide prediction and to nuclei using a collection of nuclear localization signals (NLSs).

You can submit your proteins of interest to the webserver at http://localizer.csiro.au/ or install it locally. All training and evaluation data can be found here.

Installing LOCALIZER

LOCALIZER is written in Python and uses pepstats from the EMBOSS software and the WEKA 3.6 software. It also requires Perl and BioPython. From version 1.0.5 onwards, LOCALIZER uses Python 3.

To get LOCALIZER to work on your local machine, you need to install the EMBOSS and WEKA software from source. Both are provided in the LOCALIZER distribution to ensure that compatible versions are used.

  1. Download the latest release from this GitHub repo (alternatively, clone the GitHub repo, in which case there is no archive to unpack in the next step).

  2. Unpack LOCALIZER in your desired location and make sure it has permission to execute:

tar xvf LOCALIZER-1.0.5.tar.gz
chmod -R 755 LOCALIZER-1.0.5/
cd LOCALIZER-1.0.5
  3. For the EMBOSS installation, switch to the Scripts directory, then unpack, configure and make. Alternatively, if you are on a computer cluster where EMBOSS is already installed, you can set the variable PEPSTATS_PATH in the LOCALIZER.py script to the EMBOSS directory that contains pepstats on that machine.
cd Scripts
tar xvf emboss-latest.tar.gz
cd EMBOSS-6.5.7/
./configure
make
cd ../
  4. For WEKA, simply unzip the file weka-3-6-12.zip:
unzip weka-3-6-12.zip

If you are having trouble installing EMBOSS, please see here for help. If you are having trouble installing WEKA, please see here for help.

  5. Test if LOCALIZER is working:
python LOCALIZER.py -e -i Effector_Testing.fasta
  6. Problems?

If you get an error message like 'ImportError: No module named Bio', you need to install BioPython on your computer. See here for help. For example, you can try running:

pip install biopython

Note also that Perl must be installed on your computer to run NLStradamus.
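
The prerequisites listed in this section can be checked up front with a short script. The following is only a sketch and is not part of LOCALIZER: the function name `check_prerequisites` and its `pepstats_dir` parameter are made up for illustration, and it assumes that the executables are called `perl`, `java` (WEKA is a Java program) and `pepstats`. If you built EMBOSS from the bundled archive, pepstats is unlikely to be on your PATH, so pass the directory that contains it.

```python
import importlib.util
import shutil


def check_prerequisites(pepstats_dir=None):
    """Report which external dependencies can be found (illustrative helper).

    pepstats_dir: optional directory to search for pepstats instead of PATH.
    """
    return {
        # Perl is needed for NLStradamus
        "perl": shutil.which("perl") is not None,
        # Assumption: WEKA is run through a `java` executable on PATH
        "java": shutil.which("java") is not None,
        # With path=None, shutil.which falls back to the PATH variable
        "pepstats": shutil.which("pepstats", path=pepstats_dir) is not None,
        # BioPython is imported as the `Bio` package
        "biopython": importlib.util.find_spec("Bio") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name:10s} {'found' if found else 'MISSING'}")
```

A dependency reported as MISSING here only means it was not found in the searched locations; LOCALIZER itself locates pepstats through PEPSTATS_PATH as described above.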

Running LOCALIZER on plant data

For plant protein localization prediction, submit full-length sequences and run LOCALIZER in 'plant mode' (option -p). Do not submit short sequence fragments, because LOCALIZER expects full protein sequences.

python LOCALIZER.py -p -i Plant_Testing.fasta

LOCALIZER will then search for transit peptides in the N-terminus and for nuclear localization signals in the sequence.

Running LOCALIZER on effector data

For effector protein localization prediction, submit full-length sequences and run LOCALIZER in 'effector mode' (option -e). Do not submit short sequence fragments, because LOCALIZER expects full protein sequences.

It is recommended to first use a tool such as SignalP or Phobius to predict whether a protein is likely to be secreted and to obtain the mature sequence without the signal peptide. Alternatively, provide full sequences and let LOCALIZER delete the first 20 amino acids as the putative signal peptide region.

python LOCALIZER.py -e -i Effector_Testing.fasta

You can set how LOCALIZER treats the signal peptide region with these options:

    -M      : in effector mode, do not remove the signal peptide. Use this if you are providing mature effector sequences.
    -S <x>  : in effector mode, remove the signal peptide by deleting the first x amino acids (default: 20).
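
To make the effect of these two options concrete, the sketch below shows what the described trimming amounts to on a plain sequence string. It is an illustration of the behaviour stated above, not LOCALIZER's actual code; the function and parameter names are hypothetical.

```python
def mature_sequence(sequence, keep_signal_peptide=False, signal_peptide_length=20):
    """Return the sequence as it would be analyzed in effector mode.

    keep_signal_peptide mirrors -M (sequence is already mature, leave it alone).
    signal_peptide_length mirrors -S <x> (delete the first x residues).
    """
    if keep_signal_peptide:
        return sequence
    # Slicing past the end of a short sequence simply yields an empty string
    return sequence[signal_peptide_length:]
```

For example, `mature_sequence("M" * 20 + "ACDE")` returns `"ACDE"`, while passing `keep_signal_peptide=True` returns the input unchanged.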

LOCALIZER output format

Run this to get a feel for the output format:

python LOCALIZER.py -e -i Effector_Testing.fasta

# -----------------
# LOCALIZER 1.0.5 Predictions (-e mode)
# -----------------
Identifier      Chloroplast             Mitochondria            Nucleus
CRN15           -                       -                       Y (KRKR)
Ecp2            -                       -                       -
AVR-Pii         -                       -                       -
ToxA            Y (0.877 | 62-130)      -                       -
--------------------------------------
--------------------------------------
# Proteins analyzed: 4 from file: Effector_Testing.fasta

# Number of proteins with cTP: 1 (25.0%)
# Number of proteins with cTP & possible mTP: 0 (0.0%)
# Number of proteins with cTP & NLS: 0 (0.0%)
# Number of proteins with cTP & possible mTP & NLS: 0 (0.0%)
# Number of proteins with mTP: 0 (0.0%)
# Number of proteins with mTP & possible cTP: 0 (0.0%)
# Number of proteins with mTP & NLS: 0 (0.0%)
# Number of proteins with mTP & possible cTP & NLS: 0 (0.0%)
# Number of proteins with NLS and no transit peptides: 1 (25.0%)
--------------------------------------
--------------------------------------
# Summary statistics

# Number of proteins with chloroplast localization (cTP, cTP & possible mTP, cTP & NLS, cTP & possible mTP & NLS): 1 (25.0%)
# Number of proteins with mitochondrial localization (mTP, mTP & possible cTP, mTP & NLS, mTP & possible cTP & NLS): 0 (0.0%)
# Number of proteins with nuclear localization and no transit peptides: 1 (25.0%)
# Number of proteins with nuclear localization and with transit peptides: 0 (0.0%)
--------------------------------------
--------------------------------------

LOCALIZER returns its output as shown in the example above. First comes a summary table with the predictions (chloroplast, mitochondria or nucleus) for each submitted protein. If a transit peptide is predicted, its start and end positions in the submitted sequence are shown alongside the probability. In this example, ToxA has a predicted chloroplast transit peptide with probability 0.877 at position 62-130 in its sequence. LOCALIZER does not return a probability for nuclear localization, because it uses a simple NLS search. In this example, LOCALIZER found an NLS in CRN15, namely the sequence KRKR.
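
If you want to post-process the summary table, a row can be split into its four columns and the transit peptide cell into probability, start and end. The sketch below is based only on the example output shown above: it assumes columns are separated by a tab or by at least two spaces, that a transit peptide cell looks like `Y (0.877 | 62-130)`, and that an NLS cell looks like `Y (KRKR)`. The function names are hypothetical, and other cell formats that LOCALIZER may print are not handled.

```python
import re

# Cell formats taken from the example table; both are assumptions
TRANSIT_RE = re.compile(r"Y \(([0-9.]+) \| (\d+)-(\d+)\)")
NLS_RE = re.compile(r"Y \(([^)]*)\)")


def parse_transit(cell):
    """Return (probability, start, end) or None for a cell such as '-'."""
    match = TRANSIT_RE.fullmatch(cell.strip())
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2)), int(match.group(3))


def parse_nls(cell):
    """Return the text inside 'Y (...)' or None if no NLS was reported."""
    match = NLS_RE.fullmatch(cell.strip())
    return match.group(1) if match else None


def parse_row(line):
    """Split one table row into identifier and the three prediction columns."""
    fields = [f.strip() for f in re.split(r"\s{2,}|\t", line.strip())]
    identifier, chloroplast, mitochondria, nucleus = fields
    return {
        "identifier": identifier,
        "chloroplast": parse_transit(chloroplast),
        "mitochondria": parse_transit(mitochondria),
        "nucleus": parse_nls(nucleus),
    }
```

Applied to the ToxA row above, `parse_row` gives a chloroplast entry of `(0.877, 62, 130)` and `None` for the other two columns. Header, separator and `#` comment lines need to be skipped before calling it.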

In the summary statistics, we count LOCALIZER predictions that are 'chloroplast', 'chloroplast and possible mitochondrial', 'chloroplast and nucleus' and 'chloroplast & possible mitochondrial and nucleus' as chloroplast predictions (the same strategy applies to mitochondrial predictions). A protein that carries a predicted transit peptide and an additional predicted NLS might have experimental evidence for only one of those locations, due to the technical hurdles of recognizing dual targeting, and should thus not necessarily be counted as a false positive prediction. However, in the LOCALIZER paper, a protein was counted as a nucleus prediction only if it has the category 'nucleus', to avoid assigning a protein to multiple predictions in the evaluation. This is what we recommend.
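
The counting rule described here can be written down as a small function. This is a sketch of the rule as stated in the paragraph above, not code from LOCALIZER; the function name, the `"cTP"`/`"mTP"` labels for the main transit peptide and the returned class names are chosen for illustration. A 'possible' second transit peptide does not change the outcome, so it is not an input.

```python
def evaluation_class(transit_peptide, has_nls):
    """Assign one class per protein for evaluation.

    transit_peptide: "cTP", "mTP" or None (the main predicted transit peptide).
    has_nls: True if an NLS was predicted.
    """
    if transit_peptide == "cTP":
        return "chloroplast"   # counted as chloroplast with or without an NLS
    if transit_peptide == "mTP":
        return "mitochondria"  # same strategy as for chloroplast
    if has_nls:
        return "nucleus"       # nucleus only when no transit peptide is predicted
    return None                # no localization predicted
```

With this rule, a protein with a cTP and an NLS is counted once, as chloroplast, which matches the recommendation to avoid assigning a protein to multiple predictions.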

Citation for LOCALIZER:

Sperschneider, J., Catanzariti, A., DeBoer, K. et al. LOCALIZER: subcellular localization prediction of both plant and effector proteins in the plant cell. Sci Rep 7, 44598 (2017) doi:10.1038/srep44598