This is the result of our participation in Copenhacks 2018, created on 7-8 April 2018. The team members are:
This script makes it easier to plot large datasets of gene sequences and adds features such as a consensus sequence.
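To illustrate what a consensus sequence is, here is a minimal sketch that takes the most common non-gap character in each column of an alignment. It is not taken from src/align.py: the function name `consensus`, the choice to ignore gaps, and the tie-breaking rule (the character seen first wins) are all assumptions made for this example.

```python
from collections import Counter


def consensus(aligned, gap="-"):
    """Return the most common non-gap character of each alignment column.

    Hypothetical helper for illustration; the script's own rule may differ.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")

    result = []
    for column in zip(*aligned):
        counts = Counter(ch for ch in column if ch != gap)
        # A column that contains only gaps stays a gap in the consensus.
        result.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(result)


# Three toy aligned sequences; the fourth column has a single non-gap character.
print(consensus(["ATG-CA", "ATGACA", "TTG-CA"]))
```

For the three toy sequences the call returns "ATGACA": the first column is decided by majority, and the fourth column keeps the only non-gap character.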
Download and install the MAFFT alignment tool from source.
Navigate to the project folder:
cd ~/Microbesoft
Install the Python 3 modules. With conda:
conda install --file etc/requirements.txt -y
or with pip:
pip install -r etc/requirements.txt
python src/align.py -in input_file -c color_scheme -w char_width
The script takes the following arguments:
-in --infile
: The input FASTA file.
-out --outfile [optional]
: Output file name. Default is the infile name with "align" added.
-p --plotfile [optional]
: Plot output file name. Default is the infile name with the ending .png.
-c --colors [optional]
: Color scheme. Options are "all", "dna", "gbmr4", "sdm12", "hsdm17", "hp2", "murphy10", "alex6", "aromatic2", "hp_vs_aromatic", "cinema". Default is "cinema".
-w --width [optional]
: Width of the sequence in characters. Default is 200.
-same --same_length [optional]
: Flag indicating that, when the plot spans multiple lines, the script tries to give every line the same length.
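The argument list above could be declared with argparse roughly as follows. This is a sketch under assumptions, not the contents of src/align.py: the names `build_parser` and `parse_args` are hypothetical, the default for the plot file is assumed to replace the input file's extension with .png, and the default for the outfile is left unset because the README does not say exactly where "align" is added.

```python
import argparse
from pathlib import Path

# Color scheme names as listed in the argument description.
COLOR_SCHEMES = [
    "all", "dna", "gbmr4", "sdm12", "hsdm17", "hp2",
    "murphy10", "alex6", "aromatic2", "hp_vs_aromatic", "cinema",
]


def build_parser():
    """Build a parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Align sequences and plot the result.")
    parser.add_argument("-in", "--infile", required=True, help="input FASTA file")
    parser.add_argument("-out", "--outfile", default=None, help="alignment output file")
    parser.add_argument("-p", "--plotfile", default=None, help="plot output file")
    parser.add_argument("-c", "--colors", choices=COLOR_SCHEMES, default="cinema")
    parser.add_argument("-w", "--width", type=int, default=200)
    parser.add_argument("-same", "--same_length", action="store_true")
    return parser


def parse_args(argv):
    """Parse argv and fill in the plot file default."""
    args = build_parser().parse_args(argv)
    if args.plotfile is None:
        # Assumption: "ending .png" means swapping the input file's extension.
        args.plotfile = str(Path(args.infile).with_suffix(".png"))
    return args


args = parse_args(["-in", "data/ebola_virus_reduced.fasta", "-c", "cinema", "-w", "100"])
print(args.infile, args.colors, args.width, args.same_length, args.plotfile)
```

Passing `choices` lets argparse reject an unknown color scheme with a clear error message before any alignment work starts.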
Run the script
python src/align.py -in data/ebola_virus_reduced.fasta -c cinema -w 100
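The -w and -same options control how a long alignment is broken into rows of the plot. The sketch below shows one possible reading of them, not the script's actual code: `split_rows` is a hypothetical helper, and the balancing rule (keep the same number of rows, but spread the characters as evenly as possible) is an assumption about what "the same length in each line" means.

```python
def split_rows(sequence, width, same_length=False):
    """Split a sequence into rows of at most `width` characters.

    Hypothetical helper. With same_length=True the number of rows is
    unchanged, but row lengths differ by at most one character.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if not sequence:
        return []
    if not same_length:
        # Plain wrapping: every row is full except possibly the last one.
        return [sequence[i:i + width] for i in range(0, len(sequence), width)]

    n_rows = -(-len(sequence) // width)  # ceiling division
    base, extra = divmod(len(sequence), n_rows)
    rows, start = [], 0
    for row in range(n_rows):
        # The first `extra` rows take one additional character.
        size = base + (1 if row < extra else 0)
        rows.append(sequence[start:start + size])
        start += size
    return rows


sequence = "A" * 250
print([len(row) for row in split_rows(sequence, 100)])
print([len(row) for row in split_rows(sequence, 100, same_length=True)])
```

For a 250-character sequence and a width of 100, plain wrapping gives rows of 100, 100 and 50 characters, while the balanced variant gives 84, 83 and 83.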