Hi,
I am a recent graduate student analyzing raw DNA sequence data from the Red Sunflower Seed Weevil.
In this research, I am trying to combine dnaPipeTE and DeepTE (a CNN-based classification tool), hoping for better TE classification results.
The approach I take is as follows:
1. Run dnaPipeTE on the raw DNA sequence data.
2. Run DeepTE on the Trinity.fasta file from the dnaPipeTE output.
3. Join the dnaPipeTE reads_per_component_and_annotation results with the DeepTE output, using the Trinity contig name as the common key.
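A minimal sketch of the join in the last step, written as a left join from Trinity.fasta so that contigs without a quantification row are kept and flagged rather than silently dropped. The column layouts are assumptions, not confirmed by this thread: the contig name is taken to be the third whitespace-separated column of reads_per_component_and_annotation and the first tab-separated column of the DeepTE output. The contig names and values in the toy data are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def fasta_ids(lines: Iterable[str]) -> List[str]:
    """Return the first whitespace-delimited token of every FASTA header."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.append(tokens[0])
    return ids


def column_map(lines: Iterable[str], key_col: int,
               sep: Optional[str] = None) -> Dict[str, List[str]]:
    """Index rows by the value found in column key_col (0-based)."""
    table = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split(sep)
        if len(fields) > key_col:
            table[fields[key_col]] = fields
    return table


def left_join(contig_ids: Iterable[str],
              annotation: Dict[str, List[str]],
              deepte: Dict[str, List[str]]) -> List[dict]:
    """Keep every Trinity contig; missing sides are None."""
    rows = []
    for cid in contig_ids:
        rows.append({
            "contig": cid,
            "dnapipete": annotation.get(cid),
            "deepte": deepte.get(cid),
            "quantified": cid in annotation,
        })
    return rows


# Toy in-memory data standing in for the three files (hypothetical values).
trinity_lines = [">comp1_c0_seq1 len=300", "ACGT", ">comp2_c0_seq1 len=250", "GGCC"]
annotation_lines = ["120 11000 comp1_c0_seq1 300 Gypsy-1 LTR/Gypsy 0.9"]
deepte_lines = ["comp1_c0_seq1\tClassI_LTR_Gypsy", "comp2_c0_seq1\tClassII_DNA_hAT"]

joined = left_join(
    fasta_ids(trinity_lines),
    column_map(annotation_lines, key_col=2),         # assumed: contig name in column 3
    column_map(deepte_lines, key_col=0, sep="\t"),   # assumed: contig name in column 1
)
for row in joined:
    print(row["contig"], "quantified" if row["quantified"] else "no quantification row")
```

With real files, the same functions can be fed open file handles instead of lists. Filtering the result on `quantified` being false gives the contigs that are present in Trinity.fasta but absent from the annotation table.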
With this approach, I noticed a discrepancy between the number of entries (rows) in Trinity.fasta and in the reads_per_component_and_annotation output: the latter lacks about 4k entries.
I would like to understand why this discrepancy exists. My guess is that, within the dnaPipeTE pipeline, after Trinity finishes assembling the sequences, the data is handed over to RepeatMasker for annotation and quantification, and RepeatMasker applies some filtering that removes reads that don't meet a certain threshold. I am not sure whether this is true, though, since the final output file (reads_per_component_and_annotation) also includes unknown elements.
Could you confirm or provide any insight into the discrepancy between the Trinity.fasta output file and the reads_per_component_and_annotation file, or suggest an alternative output file that I could use for this approach?
Thank you for reading this, and thank you for developing and sharing such a wonderful tool.
with best regards,
It is most likely that the 4k contigs that disappeared between Trinity.fasta and reads_per_component_and_annotation come from low-copy repeats (possibly even non-TE sequences). This happens because the read sample used for quantification is drawn independently of the sample(s) used to assemble the repeats with Trinity.
By default Trinity does 2 iterations with 2 independent samples, while a third independent sample is used for the quantification.
So unless there is something else you noticed, I think this is all normal based on the sampling strategy of dnaPipeTE.
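The sampling effect described above can be illustrated with a back-of-the-envelope calculation. This is a simplified model, not dnaPipeTE's actual quantification code: it assumes each read of the quantification sample hits a given contig independently, with probability equal to the fraction of the genome that the sequence occupies. The sample size and genome fractions below are hypothetical.

```python
def prob_no_reads(genome_fraction: float, n_reads: int) -> float:
    """Probability that none of n_reads independent reads comes from a
    sequence occupying genome_fraction of the genome."""
    if not 0.0 <= genome_fraction <= 1.0:
        raise ValueError("genome_fraction must be between 0 and 1")
    if n_reads < 0:
        raise ValueError("n_reads must be non-negative")
    return (1.0 - genome_fraction) ** n_reads


n_reads = 100_000  # hypothetical size of the quantification sample
for fraction in (1e-6, 1e-5, 1e-4, 1e-3):
    p = prob_no_reads(fraction, n_reads)
    print(f"genome fraction {fraction:g}: P(no reads in sample) = {p:.3f}")
```

Under this model, a contig assembled from one sample can easily receive no reads from an independent sample when the underlying sequence is rare in the genome, which is consistent with such contigs having no row in the quantification output.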