Markonv: a novel convolutional layer for identifying inter-positional correlations effectively and efficiently
This is the repository for reproducing figures and tables in the paper.
conda env create -f env_markonv.yaml
conda activate markonv
pip3 install git+https://github.com/rrwick/Porechop.git
cd scripts/bonito
python setup.py develop
The R scripts additionally require these R packages:
- ggpubr
- data.table
- magrittr
- foreach
- Generate simulation datasets and plot Markov transition matrix (Reproduce Appendix C)
cd external/simulation
python3 GeneRateMarcov.py
The plotted Markov transition matrix is saved in external/simulation/motif/.
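To make the idea of a Markov transition matrix concrete, the following is a minimal sketch of drawing one sequence from a first-order Markov chain over A, C, G, T. It is not the code in GeneRateMarcov.py: the function name, the single position-independent matrix, and the example probabilities are all assumptions made for illustration.

```python
import random

ALPHABET = "ACGT"


def sample_markov_sequence(initial, transition, length, rng):
    """Draw one sequence from a first-order Markov chain (hypothetical helper).

    initial:    4 probabilities for the first base, in ACGT order
    transition: 4x4 nested list, transition[i][j] = P(next = j | current = i)
    """
    state = rng.choices(range(4), weights=initial, k=1)[0]
    bases = [ALPHABET[state]]
    for _ in range(length - 1):
        # The next base depends only on the current base.
        state = rng.choices(range(4), weights=transition[state], k=1)[0]
        bases.append(ALPHABET[state])
    return "".join(bases)


# Illustrative matrix (made up): each row sums to 1 and favours A->C->G->T->A.
initial = [0.25, 0.25, 0.25, 0.25]
transition = [
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
    [0.7, 0.1, 0.1, 0.1],
]

rng = random.Random(0)
print(sample_markov_sequence(initial, transition, 8, rng))
```

A fixed seed is used so that repeated runs give the same sequence; the repository's script may parameterise its matrices differently.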
- Training and evaluation
cd ../../scripts/simulation
python3 torch_trainMarkonv.py
- Plot AUROC (Reproduce Figure 2)
python3 Compare.py
Rscript generate_fig_2.R
The figure generated is saved in result/simulation.auroc.png.
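For readers unfamiliar with the metric being plotted, AUROC can be read as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The sketch below computes it directly from that definition. It is an illustration only and is not taken from Compare.py; the function name and the toy labels and scores are assumptions.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Runs in O(P * N) time, which is fine for small illustrative inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up): 1 = sequence contains the motif, 0 = it does not.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.6, 0.1]
print(auroc(labels, scores))
```

For large test sets a rank-based implementation from an established library is the more practical choice; the pairwise loop is used here only because it mirrors the definition.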
- Recover motifs (Reproduce Figure 3 and Appendix G)
python3 geneRatemotifs.py
python3 kernel2motif.py
cd ../../
The recovered Markov transition matrices are saved as result/simulation/Motifs/*.png.
You can then compare each recovered motif with the real motif. The recovered motif may be shifted relative to the real one; this offset is saved in result/simulation/Motifs/kernel_offset.txt.
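One way to think about the offset is as the shift that best aligns the recovered motif with the real one. The sketch below searches over shifts and keeps the one with the smallest mean squared difference on the overlapping positions. This is a hypothetical illustration: how kernel2motif.py actually computes the offset, and the format of kernel_offset.txt, are not described here, and the function name, the per-position column representation, and the `min_overlap` parameter are assumptions.

```python
def best_offset(real, recovered, max_shift, min_overlap=2):
    """Return (shift, score) aligning `recovered` to `real` (hypothetical helper).

    Both motifs are lists of per-position columns (lists of 4 floats).
    A shift s places recovered[i] on top of real[i + s].
    """
    best = None
    for shift in range(-max_shift, max_shift + 1):
        total, count = 0.0, 0
        for i, column in enumerate(recovered):
            j = i + shift
            if 0 <= j < len(real):
                total += sum((a - b) ** 2 for a, b in zip(real[j], column))
                count += 1
        if count < min_overlap:
            continue  # ignore alignments that barely overlap
        score = total / count
        if best is None or score < best[1]:
            best = (shift, score)
    return best


# Toy motifs (made up), written as one-hot columns in ACGT order.
A, C, G, T = [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]
real = [A, C, G, T, A]
recovered = [G, T, A]  # the last three positions of the real motif

print(best_offset(real, recovered, max_shift=3))
```

The `min_overlap` guard matters: without it, an alignment that overlaps in a single matching position could tie with the true alignment.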
Download the training and test datasets from https://github.com/NWPU-903PR/HOCNNLB/blob/master/lncRBPdata.zip and put them into ./external/HOCNNLB
cd external/HOCNNLB
unzip lncRBPdata.zip
mv RBPdata1201 fasta
cd ../code/
python generateHDF5.py
cd ../../../scripts/HOCNNLB
python torch_trainMarkonv.py
python Compare.py
cd ../../
- Download training and test sets
Get training set
cd scripts/bonito
bonito download --training
Get test set
mkdir -p ../../external/bonito/Klebsiella_pneumoniae_INF032_fast5s
cd ../../external/bonito/Klebsiella_pneumoniae_INF032_fast5s
wget https://bridges.monash.edu/ndownloader/files/15188573
mv 15188573 Klebsiella_pneumoniae_INF032_fast5s.tar.gz
tar -xzf Klebsiella_pneumoniae_INF032_fast5s.tar.gz
cd ../../../scripts/bonito/test
wget https://bridges.monash.edu/ndownloader/files/14260223
mv 14260223 Klebsiella_pneumoniae_INF032_reference.fasta.gz
gzip -d Klebsiella_pneumoniae_INF032_reference.fasta.gz
mv Klebsiella_pneumoniae_INF032_reference.fasta reference.fasta
cd ../
- Training and basecalling
Convolution-based bonito network
bash run_train_basecall_conv.sh
Markonv-based bonito network
bash run_train_basecall_markonv.sh
The number of parameters for each network (in Table 2) will be printed on the screen.
- Evaluating basecalled reads
Convolution-based bonito network
cd test
bash run_analysis_conv.sh
Markonv-based bonito network
bash run_analysis_markonv.sh
The read accuracy for each read is saved in result/bonito/*_reads.tsv, and the consensus accuracy for each read is saved in result/bonito/*_assembly.tsv.
The median read accuracy and median consensus accuracy for each network (in Table 2) will be printed on the screen.
- Testing for multiple random seeds
cd ../
sbatch sbatch_multiple_seed.sh
cd ../../
The result file is saved in result/bonito/bonito_multiple_seeds.tsv. You can use result/bonito/stat_multiple_seeds.py to calculate the mean and standard deviation (Table 2).
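As a rough picture of the summary step, the sketch below groups rows of a tab-separated file by network and reports the mean and sample standard deviation per group. It is not the contents of stat_multiple_seeds.py. The column names `network` and `accuracy`, the function name, and the numbers in the example are all made up for illustration, so check the header of the real TSV before adapting it.

```python
import csv
import io
import statistics


def summarize(tsv_file, group_col, value_col):
    """Return {group: (mean, sample standard deviation)} for one numeric column."""
    groups = {}
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        groups.setdefault(row[group_col], []).append(float(row[value_col]))
    # statistics.stdev uses the n - 1 denominator and needs >= 2 values per group.
    return {
        name: (statistics.mean(values), statistics.stdev(values))
        for name, values in groups.items()
    }


# In-memory stand-in for a results file; values are illustrative, not real results.
example = io.StringIO(
    "network\tseed\taccuracy\n"
    "conv\t1\t90.0\n"
    "conv\t2\t92.0\n"
    "markonv\t1\t91.0\n"
    "markonv\t2\t93.0\n"
    "markonv\t3\t95.0\n"
)

for name, (mean, std) in summarize(example, "network", "accuracy").items():
    print(f"{name}: mean={mean:.2f} std={std:.2f}")
```

To read a real file, pass an open file handle, for example `open(path, newline="")`, in place of the `io.StringIO` object.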
cd scripts/SpeedCompare
python torch_trainMarkonv.py
python DrawTheHistory.py
The figure is saved in result/SpeedCompare/figure/.
- Generate simulation datasets for training and testing
cd external/JasparSimu
python3 PWMmotifSimu.py
The training and testing datasets are saved in external/JasparSimu/HDF5/1/.
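A PWM-based simulation differs from the Markov one in that every motif position is sampled independently of its neighbours. The sketch below draws a motif instance from a PWM and implants it at a random position in a uniform random background sequence. It is a generic illustration and is not the logic of PWMmotifSimu.py: the function names, the uniform background, and the example PWM are assumptions.

```python
import random

ALPHABET = "ACGT"


def sample_from_pwm(pwm, rng):
    """Draw one motif instance; each column gives ACGT probabilities for a position."""
    return "".join(rng.choices(ALPHABET, weights=column, k=1)[0] for column in pwm)


def implant(motif, seq_length, rng):
    """Embed `motif` at a random start in a uniform random background.

    Returns the full sequence and the start index of the motif.
    """
    if len(motif) > seq_length:
        raise ValueError("motif is longer than the requested sequence")
    background = rng.choices(ALPHABET, k=seq_length)
    start = rng.randrange(seq_length - len(motif) + 1)
    background[start:start + len(motif)] = motif  # same-length slice replacement
    return "".join(background), start


# Illustrative PWM (made up), one column per motif position.
pwm = [
    [0.8, 0.1, 0.05, 0.05],
    [0.1, 0.1, 0.7, 0.1],
    [0.05, 0.05, 0.1, 0.8],
]

rng = random.Random(0)
motif = sample_from_pwm(pwm, rng)
sequence, start = implant(motif, 20, rng)
print(motif, start, sequence)
```

Negative examples in such a setup are typically background sequences without an implanted motif, though how the repository constructs them is not stated here.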
- Training and evaluation
cd ../../scripts/JasparPWMtest
python3 torch_trainMarkonv.py
- Plot AUROC (Reproduce additional figure 1.B)
python3 Compare.py
The figure generated is saved in result/JasparSimu/picture/1auc.png.
- Recover motifs (Reproduce additional figure 1.B and Appendix G)
python3 geneRatemotifs.py
python3 kernel2motif.py
cd ../../
Additional figure 1.C is saved in result/JasperSimu/Motifs/1/MarkonvV, and additional figure 1.B is saved in result/JasperSimu/PWMMotifs2.
- Generate simulation datasets for training and testing
cd external/simulation2merShuffle
python3 generateData.py
- Training and evaluation
cd ../../scripts/simulation2merShuffle
python3 torch_trainMarkonv.py
- Plot AUROC of all networks (Reproduce additional figure 4)
python3 Compare.py
The figure generated is saved in ~/result/simulation2merShuffle/picture/.
- Plot AUROC between Markonv-based network and convolution-based network (Reproduce additional figure 2)
Rscript generate_fig_2.R
The figure generated is saved in ~/result/simulation/simulation.auroc.png.
- Recover motifs (additional figure 3.A and additional figure 3.C)
python3 geneRatemotifs.py
python3 kernel2motif.py
cd
The Markov transition matrix (additional figure 3.A) is saved in result/simulation2merShuffle/Motifs/291/MarkonvV/*.png.
The Markov transition matrix (additional figure 3.C) is saved in result/simulation2merShuffle/Motifs/291*.png.
Download the training and test datasets from https://github.com/NWPU-903PR/HOCNNLB/blob/master/lncRBPdata.zip and put them into ./external/HOCNNLB
cd external/HOCNNLB
unzip lncRBPdata.zip
mv RBPdata1201 fasta
cd ../code/
python generateHDF5.py
cd ../../../scripts/HOCNNLB
python torch_trainMarkonv.py
python Compare.py
python robust.py
cd ../../
cd scripts/SpeedCompare
python torch_trainMarkonv.py
python DrawTheHistory.py
The figure is saved in result/SpeedCompare/figure/.