Merge pull request #8 from tjiangHIT/master
Version update
tjiangHIT authored May 7, 2019
2 parents 6496c0f + 5ebc05f commit 3f36e23
Showing 6 changed files with 39 additions and 37 deletions.
16 changes: 8 additions & 8 deletions README.md
@@ -11,26 +11,25 @@ rMETL - realignment-based Mobile Element insertion detection Tool for Long read
|_| |_| \__/ |_| |______| |_| |______|

$ git clone https://github.com/hitbc/rMETL.git (git clone https://github.com/tjiangHIT/rMETL.git)
$ git clone https://github.com/tjiangHIT/rMETL.git (git clone https://github.com/hitbc/rMETL.git)
$ cd rMETL/
$ bash INSTALL.sh
$ ./rMETL.py

---
### Introduction
Mobile element insertion (MEI) is a major category of structure variations (SVs). The rapid development of long read sequencing provides the opportunity to sensitively discover MEIs. However, the signals of MEIs implied by noisy long reads are highly complex, due to the repetitiveness of mobile elements as well as the serious sequencing errors. Herein, we propose Realignment-based Mobile Element insertion detection Tool for Long read (rMETL). rMETL takes advantage of its novel chimeric read re-alignment approach to well handle complex MEI signals. Benchmarking results on simulated and real datasets demonstrated that rMETL has the ability to more sensitivity discover MEIs as well as prevent false positives. It is suited to produce high quality MEI callsets in many genomics studies.

Mobile element insertion (MEI) is a major category of structural variations (SVs). The rapid development of long read sequencing technologies provides the opportunity to detect MEIs sensitively. However, the signals of MEIs implied by noisy long reads are highly complex due to the repetitiveness of mobile elements as well as the high sequencing error rates. Herein, we propose the Realignment-based Mobile Element insertion detection Tool for Long read (rMETL). Benchmarking results on simulated and real datasets demonstrate that rMETL can discover MEIs sensitively while preventing false positives. It is suited to producing high-quality MEI callsets in many genomics studies.

---
### Simulated datasets

The simulated datasets use for benchmarking are available at: https://drive.google.com/open?id=1ujV2C8e1PNAVhSkh9vKtjWLdG_OHcH-k
The simulated datasets used for benchmarking are available at: [Google Drive](https://drive.google.com/open?id=1ujV2C8e1PNAVhSkh9vKtjWLdG_OHcH-k)

---
### Memory usage

The memory usage of rMETL can fit the configurations of most modern servers and workstations.
Its peak memory footprint is about 12.18 Gigabytes (default setting), on a server with Intel Xeon CPU at 2.00 GHz, 1 Terabytes RAM running Linux Ubuntu 14.04. These reads were aligned to human reference genome hs37d5.
Its peak memory footprint is about 7.05 Gigabytes (default setting), on a server with Intel Xeon CPU at 2.00 GHz, 1 Terabytes RAM running Linux Ubuntu 14.04. These reads were aligned to human reference genome hs37d5.

---
### Dependences
@@ -46,7 +45,7 @@ Its peak memory footprint is about 12.18 Gigabytes (default setting), on a serve
---
### Installation

Current version of rMETL needs to be run on Linux operating system.
The current version of rMETL has been tested on 64-bit Linux operating systems.
The source code is written in Python and can be downloaded directly from: https://github.com/hitbc/rMETL
A mirror is also available at: https://github.com/tjiangHIT/rMETL
INSTALL.sh is included. Run it with bash to generate the executable file.
@@ -102,8 +101,9 @@ Strongly recommend making output directory manually at first.:blush:

---
### Citation
Tao Jiang, Bo Liu, Junyi Li, Yadong Wang; rMETL: sensitive mobile element insertion detection with long read realignment, Bioinformatics, , btz106, https://doi.org/10.1093/bioinformatics/btz106
If you use rMETL, please cite:
> Tao Jiang *et al.*, rMETL: sensitive mobile element insertion detection with long read realignment, *Bioinformatics*, btz106, https://doi.org/10.1093/bioinformatics/btz106
---
### Contact
For advising, bug reporting and requiring help, please contact [email protected] or [email protected]
For advice, bug reports, or help, please post on [GitHub Issues](https://github.com/tjiangHIT/rMETL/issues) or contact [email protected].
3 changes: 2 additions & 1 deletion src/rMETL.py
@@ -33,9 +33,10 @@
STAGE is one of
detection Inference of putative MEI loci.
realignment Realignment of chimeric read parts.
calling Mobile Element Insertion calling.
calling Mobile Element Insertion/Deletion calling.
See README.md for documentation or --help for details
Strongly recommend making output directory manually at first.
rMETL V%s
Author: %s
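
The usage text above recommends creating the output directory before running a stage. A minimal sketch of doing that from Python follows. `ensure_output_dir` is a hypothetical helper, not part of rMETL, and it assumes Python 3.2 or later for `exist_ok`.

```python
import os
import tempfile


def ensure_output_dir(path):
    """Create the output directory if it is missing and return its absolute path."""
    abs_path = os.path.abspath(path)
    # exist_ok=True makes the call a no-op when the directory already exists.
    os.makedirs(abs_path, exist_ok=True)
    return abs_path


if __name__ == "__main__":
    # Demonstrate on a throwaway location rather than a real analysis directory.
    base = tempfile.mkdtemp()
    out_dir = ensure_output_dir(os.path.join(base, "rmetl_out"))
    print(os.path.isdir(out_dir))
```

Calling the helper twice on the same path is harmless, so it can sit at the top of a wrapper script that launches each stage.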
17 changes: 9 additions & 8 deletions src/rMETL_MEIcalling.py
@@ -32,9 +32,10 @@
rMETL - realignment-based Mobile Element insertion detection Tool for Long read
rMETL MEI calling.
Optional output format: .bed or .vcf
Generate final MEI/MED callset in bed or vcf file.
The output file, called 'calling.bed' or 'calling.vcf',
is stored in the output directory.
rMETL V%s
Author: %s
@@ -313,12 +314,12 @@ def call_vcf(args):
def parseArgs(argv):
parser = argparse.ArgumentParser(prog="rMETL.py calling", description=USAGE, \
formatter_class=argparse.RawDescriptionHelpFormatter)
parser.add_argument("input", metavar="SAM", type=str, help="Input cluster.sam.")
parser.add_argument("input", metavar="SAM", type=str, help="Input cluster.sam on STAGE realignment.")
parser.add_argument("Reference", metavar="REFERENCE", type=str, \
help="The reference genome(fasta format).")
help="The reference genome in fasta format.")
parser.add_argument("format", metavar="[BED,VCF]", type=str, \
help="The format of the output file. [%(default)s]", default = "bed")
parser.add_argument('output', type=str, help = "Prefix of final call set.")
parser.add_argument('output', type=str, help = "Directory to output final callset.")
parser.add_argument('-hom', '--homozygous', \
help = "The minimum score for a genotype to be reported as homozygous.[%(default)s]", \
default = 0.8, type = float)
@@ -330,7 +331,7 @@ def parseArgs(argv):
parser.add_argument('-c', '--clipping_threshold', \
help = "Minimum threshold of realignment clipping.[%(default)s]", \
default = 0.5, type = float)
parser.add_argument('--sample', help = "The name of the sample that is noted.", \
parser.add_argument('--sample', help = "Sample description", \
default = "None", type = str)
parser.add_argument('--MEI', help = "Enables rMETL to display MEI/MED only.[%(default)s]", \
default = "True", type = str)
@@ -346,7 +347,7 @@ def run(argv):
elif args.format == "vcf":
call_vcf(args)
else:
logging.error("The format is available.")
logging.error("Invalid format.")
exit(1)
logging.info("Finished in %0.2f seconds."%(time.time() - starttime))
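
One observation on the `format` argument in the hunk above: argparse treats a positional without `nargs='?'` as required, so the declared `default = "bed"` is never applied, and an unsupported value is only caught later in `run`. A possible alternative, sketched under assumptions, is to let argparse validate the value with `choices`. `build_parser` and `str2bool` are hypothetical names, the metavar `FORMAT` is a simplification, and only a subset of the real options is reproduced.

```python
import argparse


def str2bool(value):
    """Interpret common textual spellings of a boolean flag value."""
    lowered = value.strip().lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


def build_parser():
    """Build a reduced parser for the calling stage (illustrative only)."""
    parser = argparse.ArgumentParser(prog="rMETL.py calling")
    parser.add_argument("input", metavar="SAM", type=str)
    parser.add_argument("Reference", metavar="REFERENCE", type=str)
    # type=str.lower runs before the choices check, so "VCF" and "vcf" both pass.
    parser.add_argument("format", metavar="FORMAT", type=str.lower,
                        choices=["bed", "vcf"])
    parser.add_argument("output", type=str)
    # Parse --MEI into a real bool instead of carrying the string "True".
    parser.add_argument("--MEI", type=str2bool, default=True)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["cluster.sam", "ref.fa", "VCF", "out"])
    print(args.format, args.MEI)
```

With this shape an invalid format is rejected at parse time with a usage message, so the trailing `else` branch in `run` would become a safety net rather than the primary check.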

30 changes: 15 additions & 15 deletions src/rMETL_extraction.py
@@ -37,19 +37,19 @@
rMETL - realignment-based Mobile Element insertion detection Tool for Long read
Map reads using NGMLR and Samtools to produce .bam file.
Support reads aligned with Ngmlr and sorted with Samtools
If input is a .fastq, or .fasta, we do the initial mapping
for you all at once.
If input is a fastq or fasta format file, rMETL generates
alignments with Ngmlr at first;
If input is a .sam, we convert and sort it to be a bam,
and then make an index for it.
If input is a sam format file, rMETL converts and sorts it
to be a bam format file;
If your input is a .bam, we extract the ME signatures and
collect the sub-sequence of them.
If your input is a bam format file with index, rMETL extracts
the ME signatures and collects the sub-sequence of them.
The output is a .fasta file contains potentials non-reference
ME clusters.
The output is a fasta format file called 'potential.fa'
that contains potential non-reference ME clusters.
rMETL V%s
Author: %s
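
The usage text above describes three entry points depending on the input type. The dispatch it implies could be sketched as follows. This is an illustration only: `plan_steps`, the step labels, and the exact extension lists are assumptions, and the actual logic in `rMETL_extraction.py` is not shown in this diff.

```python
import os

# Extension groups are an assumption; the docstring only names the formats.
READ_EXTS = (".fastq", ".fq", ".fasta", ".fa")
SAM_EXTS = (".sam",)
BAM_EXTS = (".bam",)


def plan_steps(input_path):
    """Return the ordered processing steps implied by the input file type."""
    ext = os.path.splitext(input_path)[1].lower()
    if ext in READ_EXTS:
        # Raw reads: align first, then convert/sort, then extract signatures.
        return ["align", "convert_and_sort", "extract_signatures"]
    if ext in SAM_EXTS:
        # Existing alignments in SAM: convert and sort to BAM, then extract.
        return ["convert_and_sort", "extract_signatures"]
    if ext in BAM_EXTS:
        # Indexed BAM: go straight to signature extraction.
        return ["extract_signatures"]
    raise ValueError("unsupported input format: %s" % input_path)


if __name__ == "__main__":
    for name in ("reads.fastq", "aln.sam", "aln.bam"):
        print(name, plan_steps(name))
```

Each later entry point is a suffix of the earlier one, which is why a pre-aligned, indexed BAM is the cheapest input to start from.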
@@ -428,7 +428,7 @@ def single_pipe(out_path, chr, bam_path, low_bandary, evidence_read, SV_size):
'''
samfile = pysam.AlignmentFile(bam_path)
CLIP_note = dict()
logging.info("Resolving the chromsome %s."%(chr))
logging.info("Resolving chromosome %s."%(chr))
if chr not in CLIP_note:
CLIP_note[chr] = dict()
cluster_pos_INS = list()
@@ -455,7 +455,7 @@ def single_pipe(out_path, chr, bam_path, low_bandary, evidence_read, SV_size):
SV_size, low_bandary)
del cluster_pos_DEL
gc.collect()
logging.info("%d MEI signal locuses in the chromsome %s."%(len(Cluster_INS)+\
logging.info("%d MEI/MED signal loci in chromosome %s."%(len(Cluster_INS)+\
len(Cluster_DEL), chr))
combine_result(add_genotype(Cluster_INS, samfile, low_bandary), \
add_genotype(Cluster_DEL, samfile, low_bandary), out_path, chr)
@@ -528,12 +528,12 @@ def parseArgs(argv):
parser = argparse.ArgumentParser(prog="rMETL.py detection", \
description=USAGE, formatter_class=argparse.RawDescriptionHelpFormatter)
parser.add_argument("input", metavar="[SAM,BAM,FASTA,FASTQ]", type=str, \
help="Input [Mapped/Unmapped] reads.")
help="Input reads with/without alignment.")
parser.add_argument("Reference", metavar="REFERENCE", type=str, \
help="The reference genome (fasta format).")
help="The reference genome in fasta format.")
parser.add_argument('temp_dir', type=str, \
help = "Temporary directory to use for distributed jobs.")
parser.add_argument('output', type=str, \
parser.add_argument('output_dir', type=str, \
help = "Directory to output potential ME loci.")
parser.add_argument('-s', '--min_support',\
help = "Minimum number of reads that support a ME.[%(default)s]", \
@@ -542,7 +542,7 @@ def parseArgs(argv):
help = "Minimum length of ME to be reported.[%(default)s]", \
default = 50, type = int)
parser.add_argument('-d', '--min_distance', \
help = "Mininum distance of two ME clusters to be intergrated.[%(default)s]", \
help = "Minimum distance of two ME signatures to be integrated.[%(default)s]", \
default = 20, type = int)
parser.add_argument('-t', '--threads', \
help = "Number of threads to use.[%(default)s]", default = 8, \
8 changes: 4 additions & 4 deletions src/rMETL_realign.py
@@ -34,7 +34,7 @@
TE refs: Alu consensus
L1 consensus
SVA consensus
The output of this script is a .sam file.
The output is a sam format file called 'cluster.sam'.
rMETL V%s
Author: %s
@@ -69,9 +69,9 @@ def call_ngmlr(inFile, ref, presets, nproc, outFile, SUBREAD_LENGTH, SUBREAD_COR
def parseArgs(argv):
parser = argparse.ArgumentParser(prog="rMETL.py realignment", description=USAGE, \
formatter_class=argparse.RawDescriptionHelpFormatter)
parser.add_argument("input", metavar="FASTA", type=str, help="Input potential_ME.fa.")
parser.add_argument("ME_Ref", type=str, help="The reference genome(fasta format).")
parser.add_argument('output', type=str, help = "Prefix of potential ME classification.")
parser.add_argument("input", metavar="FASTA", type=str, help="Input potential_ME.fa on STAGE detection.")
parser.add_argument("ME_Ref", type=str, help="The transposable element consensus in fasta format.")
parser.add_argument('output', type=str, help = "Directory to output realignments.")
parser.add_argument('-t', '--threads', help = "Number of threads to use.[%(default)s]", \
default = 8, type = int)
parser.add_argument('-x', '--presets', \
2 changes: 1 addition & 1 deletion src/rMETL_version.py
@@ -1,5 +1,5 @@
# * @author: Jiang Tao ([email protected])

__version__ = '1.0.2'
__version__ = '1.0.3'
__author__ = 'Jiang Tao'
__contact__ = '[email protected]'
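
Taken together, the help strings in this commit suggest how the three stages chain: detection writes a fasta of potential loci, realignment turns it into 'cluster.sam', and calling produces the final callset. The sketch below only assembles the command lines from the positional arguments shown in the diff. `build_commands` is a hypothetical helper, the input file names are placeholders, and it assumes each stage writes its named file directly into the given output directory. Note that the diff refers to the detection output both as 'potential.fa' and as 'potential_ME.fa', so the name is left as a parameter.

```python
import os
import shlex


def build_commands(reads, reference, me_ref, temp_dir, out_dir, fmt="bed",
                   potential_name="potential.fa"):
    """Return the three rMETL stage commands as argument lists."""
    # Assumed locations of the intermediate files inside out_dir.
    potential = os.path.join(out_dir, potential_name)
    cluster = os.path.join(out_dir, "cluster.sam")
    return [
        # detection: input, Reference, temp_dir, output_dir
        ["rMETL.py", "detection", reads, reference, temp_dir, out_dir],
        # realignment: input, ME_Ref, output
        ["rMETL.py", "realignment", potential, me_ref, out_dir],
        # calling: input, Reference, format, output
        ["rMETL.py", "calling", cluster, reference, fmt, out_dir],
    ]


if __name__ == "__main__":
    commands = build_commands("reads.fastq", "hs37d5.fa", "ME_consensus.fa",
                              "tmp", "out")
    for cmd in commands:
        # Quote each part so the printed line is safe to paste into a shell.
        print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the commands as lists makes them usable with `subprocess.run` without shell quoting concerns. Printing them first is a cheap way to check the paths before launching a long job.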
