-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
parsing from sRNAbench data #53
Comments
Hi, thanks for reporting this. This may be fixed in the devel version. Can you provide with these two files:
You can create them to contain only these two sequences. I can test it here, and see what is going on. As well, you can try to use the devel version yourself. Thanks a lot for spotting this. Cheers |
Here are the specific entries for the example in the two sRNAbench output files: microRNAannotation.txt lines: reads.annotation.txt lines: |
sorry, another question:
another question, is this mirbase 21 or 22?
…On Wed, Feb 20, 2019 at 8:18 AM lsumoy ***@***.***> wrote:
Here are the specific entries for the example in the two sRNAbench output
files:
microRNAannotation.txt lines:
*TCGAGGAGCTCACAGTCTAGT hsa-miR-151a-5p hsa-mir-151a exact - 1551
597.3388956586441 351.1619134905494 TCGAGGAGCTCACAGTCT
hsa-miR-151a-5p$hsa-miR-151b hsa-mir-151a$hsa-mir-151b
exact$lv3p|lv3pT|lv3p#-3 - 169 65.08721687060661 38.263290380337104*
reads.annotation.txt lines:
*TCGAGGAGCTCACAGTCTAGT 1551 351.1619134905494 mature#sense
mature#hsa-miR-151a-5p#sense#hsa-mir-151a,11,31 1 TCGAGGAGCTCACAGTCT 169
38.263290380337104 mature#sense
mature#hsa-miR-151a-5p#sense#hsa-mir-151a,11,28$mature#hsa-miR-151b#sense#hsa-mir-151b,60,77
2*
reads.annotation.txt
<https://github.com/miRTop/mirtop/files/2884720/reads.annotation.txt>
microRNAannotation.txt
<https://github.com/miRTop/mirtop/files/2884709/microRNAannotation.txt>
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#53 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABi_HECb-0Aai86aWJa4dAh-pQeJ91tNks5vPUsIgaJpZM4bDglo>
.
|
Hi, now that I am looking better to the sRNAbench output, I see that there is where the two reads are labeled as exact for hsa-miR-151a-5p. I assume the order is relevant in the columns indicating the miRNA and the variants. So, for instance, read:
and read:
Cheers |
Hi Lorena, It is version 22 for mirBase in this case.
Thank you very much |
This was done with mirBase 22. Looking it up it seems the shorter form is the exact for miR-151b and the longer one is the exact for 151a. The fact that the shorter could be at the same time a add-3p:-3 of miR-151a leads to the confusion. Beyond issues with files formats and data parsing to me this means that it is tricky to perform analysis grouping by isomiR class and that it makes all the sense to do it at the individual sequence (license plate) level. If not it represents a nightmare in the few cases were there is redundancy in terms of annotation. I suppose similar issue happen with alternative transcript isoform and splicing analysis in mRNA. |
Hi all, thanks for detecting this. it really seems that the labels are incorrectly switched. i will check and fix asap. |
Hi @lsumoy, Just to follow up on your comment: Yes, it is difficult to group by anything for these cases, that is the reason the format should give you enough information to do what it may best, like work with sequence levels, or even filter these cases. It would be a good thing if you have somebody there to contribute in the code to create matrix with unique Thanks! |
Hi all, |
Thank you for the fix.
All of these ideas sound good to me, we are trying to test some of them
with simulated data. Using other more robust databases will help as well,
as miRGeneDB for sure.
Cheers
|
Hi,
We recently detected that gff output form mirtop stats gives unexpected results. We got two entries for canonical (variant=NA for many miRNAs instead of a single one (one of the two in fact is a n add-3p=-1 isoform but the miRTop generated gff tags it as NA (canonical).
In many instances wo results come out annotated as 'NA? (i.e. canonical or mature reference in miRBase) in our parsed output:
example: in the case of miR-151a-5p
http://www.mirbase.org/cgi-bin/mature.pl?mature_acc=MIMAT0004697 the canonical reference lecenseplate ID is PhkUD0w
9589 isomiR HC_BST_RA_L001_10 hsa-mir-151a hsa-miR-151a-5p PhkUD0 TCGAGGAGCTCACAGTCT
9593 isomiR HC_BST_RA_L001_10 hsa-mir-151a hsa-miR-151a-5p PhkUD0w TCGAGGAGCTCACAGTCTAGT
expression
9589 169
9593 1551
Looking at gff files generated by mirtop (please see highlighted (**) entries that have NA as variant):
grep -w 'PhkUD0' HC_BST_RA_L001_10.gff
hsa-mir-151a miRBasev22 isomiR 10 29 . + . Read=TCGAGGAGCTCACAGTCTA; UID=PhkUD0@2; Name=hsa-miR-151a-5p; Parent=hsa-mir-151a; Variant=iso_3p:+1; Cigar=19M; Expression=14; Filter=Pass; Hits=2
hsa-mir-151a miRBasev22 ref_miRNA 10 28 . + . Read=TCGAGGAGCTCACAGTCT; UID=PhkUD0; Name=hsa-miR-151a-5p; Parent=hsa-mir-151a; Variant=NA; Cigar=18M; Expression=169; Filter=Pass; Hits=2
hsa-mir-151b miRBasev22 isomiR 59 78 . + . Read=TCGAGGAGCTCACAGTCTA; UID=PhkUD0@2; Name=hsa-miR-151b; Parent=hsa-mir-151b; Variant=iso_3p:-2; Cigar=19M; Expression=14; Filter=Pass; Hits=2
hsa-mir-151b miRBasev22 isomiR 59 77 . + . Read=TCGAGGAGCTCACAGTCT; UID=PhkUD0; Name=hsa-miR-151b; Parent=hsa-mir-151b; Variant=iso_3p:-3; Cigar=18M; Expression=169; Filter=Pass; Hits=2
hsa-mir-151b miRBasev22 isomiR 59 80 . + . Read=TCGAGGAGCTCACAGTCTAAA; UID=PhkUD0@; Name=hsa-miR-151b; Parent=hsa-mir-151b; Variant=iso_add:+2; Cigar=19MAM; Expression=1; Filter=Pass; Hits=1
grep -w 'PhkUD0w' HC_BST_RA_L001_10.gff
hsa-mir-151a miRBasev22 isomiR 10 32 . + . Read=TCGAGGAGCTCACAGTCTAGTA; UID=PhkUD0w@2; Name=hsa-miR-151a-5p; Parent=hsa-mir-151a; Variant=iso_3p:+1; Cigar=22M; Expression=657; Filter=Pass; Hits=1
hsa-mir-151a miRBasev22 isomiR 10 33 . + . Read=TCGAGGAGCTCACAGTCTAGTAA; UID=PhkUD0w@1; Name=hsa-miR-151a-5p; Parent=hsa-mir-151a; Variant=iso_add:+1; Cigar=22MA; Expression=56; Filter=Pass; Hits=1
hsa-mir-151a miRBasev22 isomiR 10 34 . + . Read=TCGAGGAGCTCACAGTCTAGTAAA; UID=PhkUD0w@; Name=hsa-miR-151a-5p; Parent=hsa-mir-151a; Variant=iso_add:+2; Cigar=22MAA; Expression=2; Filter=Pass; Hits=1
hsa-mir-151a miRBasev22 ref_miRNA 10 31 . + . Read=TCGAGGAGCTCACAGTCTAGT; UID=PhkUD0w; Name=hsa-miR-151a-5p; Parent=hsa-mir-151a; Variant=NA; Cigar=21M; Expression=1551; Filter=Pass; Hits=1
Steps to reproduce the problem.
/imppc/labs/lslab/jsanchez/python_environment/mirtop_jsanchez/bin/mirtop gff --sps hsa --hairpin /imppc/labs/lslab/aluna/opt/sRNAtoolboxDB/libs/hairpin.fa --gtf /imppc/labs/lslab/aluna/opt/hsa.gff3 --format srnabench -o /imppc/labs/lslab/share/data/proc_data/20190208_LungCancer_Serum_small_RNAseq/2.Analysis_20190208/LungCancer_Serum_PE/6.2.isomiR_miRTop/HC_BST_RA_L001_10 /imppc/labs/lslab/share/data/proc_data/20190208_LungCancer_Serum_small_RNAseq/2.Analysis_20190208/LungCancer_Serum_PE/6.1.isomiR_sRNAbenchtoolbox/HC_BST_RA_L001_10 2> /imppc/labs/lslab/share/data/proc_data/20190208_LungCancer_Serum_small_RNAseq/2.Analysis_20190208/LungCancer_Serum_PE/6.2.isomiR_miRTop/HC_BST_RA_L001_10_logfile.txt
/imppc/labs/lslab/jsanchez/python_environment/mirtop_jsanchez/bin/mirtop stats -o /imppc/labs/lslab/share/data/proc_data/20190208_LungCancer_Serum_small_RNAseq/2.Analysis_20190208/LungCancer_Serum_PE/6.2.isomiR_miRTop/HC_BST_RA_L001_10/stats /imppc/labs/lslab/share/data/proc_data/20190208_LungCancer_Serum_small_RNAseq/2.Analysis_20190208/LungCancer_Serum_PE/6.2.isomiR_miRTop/HC_BST_RA_L001_10/HC_BST_RA_L001_10.gff 2>> /imppc/labs/lslab/share/data/proc_data/20190208_LungCancer_Serum_small_RNAseq/2.Analysis_20190208/LungCancer_Serum_PE/6.2.isomiR_miRTop/HC_BST_RA_L001_10_logfile.txt
Specifications like the version of the project, operating system, or hardware.
We are running this on:
mirtop (0.3.17)
python2.7
debian8.10
linux
The text was updated successfully, but these errors were encountered: