Which RNA type should I consider as long non-coding RNA? #562

Open
beginner984 opened this issue Sep 29, 2021 · 1 comment

Comments

beginner984 commented Sep 29, 2021

This is not really an issue with RNAcentral; rather, I would like some help building intuition, please.

We have exosome sequencing data (from plasma). In the raw read counts file, I see 72650 gene names.

This is how my read count file looks:

Screenshot 2021-09-29 at 22 09 00

I have created a percentage bar chart of the RNA categories annotated in this exosome-seq data:

Picture 2

Which category (RNA type) should I consider as long non-coding RNA (lncRNA)?

Can I consider the observed 24% long intergenic non-coding RNA (lincRNA) (sense + antisense) as long non-coding RNA (lncRNA)?

However, I have read that, generally speaking, we don’t expect much lncRNA/mRNA in plasma, and that much of it will be heavily fragmented, which makes it very difficult to sequence. So why do I see 24% lincRNAs?

If this were your data, which types of RNA here would you consider as long non-coding RNA (lncRNA)?

In RNAcentral, I see this

Screenshot 2021-09-29 at 22 22 03

Screenshot 2021-09-29 at 22 23 17

In the Rfam part, I could not find any lncRNAs.

Am I searching correctly?

Thanks for any intuition

@AntonPetrov
Member

@beginner984 Thank you for your question!

Searching for lncRNAs in RNAcentral is indeed not straightforward. My colleague @blakesweeney might be able to provide more specific advice, but in general I would treat lncRNA and lincRNA as the same class to be on the safe side, as some lncRNAs could be incorrectly classified as lincRNAs and vice versa. I would also suggest not using Rfam sequences if you are interested in lncRNAs, as Rfam does not focus on lncRNAs.
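
As a rough illustration of treating lncRNA and lincRNA as one class, the Python sketch below merges the two labels before computing category percentages. The label spellings, the function names, and the example list are assumptions made for illustration; they are not taken from RNAcentral or from the read count file in this issue.

```python
from collections import Counter

# Assumption: these label spellings are illustrative; adjust them to match
# the labels used in your own annotation.
LNCRNA_LABELS = {"lncrna", "lincrna"}


def merge_lncrna_labels(rna_type):
    """Map lncRNA and lincRNA labels to a single 'lncRNA' class."""
    if rna_type.strip().lower() in LNCRNA_LABELS:
        return "lncRNA"
    return rna_type


def category_percentages(rna_types):
    """Return the percentage of entries in each category after merging."""
    counts = Counter(merge_lncrna_labels(t) for t in rna_types)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical per-gene labels, one per gene in the count table.
example = ["lincRNA", "lncRNA", "miRNA", "tRNA"]
print(category_percentages(example))
# {'lncRNA': 50.0, 'miRNA': 25.0, 'tRNA': 25.0}
```

Whether other labels (for example antisense transcripts) belong in the merged class depends on the annotation source, so the set above is deliberately minimal.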

With respect to your question about why you observe such a high percentage of lncRNAs in your sample, that's difficult to answer without more information, and the RNAcentral team cannot provide input on specific research projects. I would suggest spot-checking some of these lncRNA entries and seeing if you notice any pattern. It could be a misannotation, meaning those sequences are not actually lncRNAs, or it could be that your short sequences happen to overlap these lncRNAs by chance.
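
One possible way to choose entries for spot-checking is to draw a small random sample of the genes labelled lncRNA or lincRNA and inspect each one by hand. This is a minimal sketch, assuming a hypothetical mapping from gene name to RNA type; the function name and the gene names below are placeholders, not real data.

```python
import random

# Assumption: label spellings are illustrative; match them to your annotation.
LNCRNA_LABELS = {"lncrna", "lincrna"}


def pick_spot_checks(gene_types, n=10, seed=0):
    """Return up to n randomly chosen genes annotated as lncRNA or lincRNA.

    gene_types: hypothetical dict mapping gene name -> RNA type label.
    """
    # Sort the candidates so the sample does not depend on dict ordering.
    candidates = sorted(
        gene
        for gene, rna_type in gene_types.items()
        if rna_type.strip().lower() in LNCRNA_LABELS
    )
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    return rng.sample(candidates, min(n, len(candidates)))


# Placeholder names, not real genes.
gene_types = {
    "geneA": "lincRNA",
    "geneB": "miRNA",
    "geneC": "lncRNA",
    "geneD": "tRNA",
}
print(pick_spot_checks(gene_types, n=2))
```

The selected names could then be looked up individually to see whether the annotation looks plausible and whether the reads cover only a short stretch of the transcript.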

I hope this helps!
