
Can the k-mer value be changed based on read length (150nt)? #9

Open
unique379r opened this issue Sep 30, 2019 · 3 comments

Comments

@unique379r

Hi
I have tested your tool (FuSeq) on 100nt reads (the glioma dataset), as described in your paper and in the GitHub tutorial, and compared the results with SOAPfuse and FusionCatcher. FuSeq seems to do well in terms of precision, and recall was about the same across the tools.
Taking this comparison further, I have in-house sequencing data with 150nt reads and a truth set of 14 validated gene fusions. With your default parameters, FuSeq predicted only 2 of the 14 fusions, compared to FusionCatcher, which predicted 10.
Any thoughts on this? I was thinking of changing the k-mer value with -k (since my reads are 150nt), but I am not able to: "Error: k must not be larger than 31, you chose 51".

Any insight would be appreciated.
Thanks
Rupesh.

@nghiavtr
Owner

nghiavtr commented Oct 1, 2019

Dear Rupesh,

Thank you for using FuSeq in your research.
I have not tested FuSeq on RNA-seq data with 150nt reads, so I am not sure of the correct answer. However, I do not think the k-mer length is the issue; 31 should be fine. Other parameters may be having an effect. If you can send the FuSeq output for your data and the list of the 14 validated fusions, I will investigate what is happening. If the files are large, you can upload them somewhere and email me the download link ([email protected]).
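One plausible reason for the cap of 31 (an assumption; nothing in this thread confirms how FuSeq stores k-mers) is that many k-mer indexers pack each base into 2 bits of a single 64-bit word and prefer odd k. Under that assumption, the arithmetic can be sketched as follows; `bits_needed` and `fits_in_word` are hypothetical helper names used only for illustration.

```r
# Assumption: each base (A, C, G, T) is packed into 2 bits and a whole
# k-mer must fit in one machine word. This is a common design, not a
# confirmed detail of FuSeq.
bits_needed <- function(k) 2 * k

fits_in_word <- function(k, word_bits = 64) {
  bits_needed(k) <= word_bits
}

fits_in_word(31)  # TRUE:  62 bits fit in a 64-bit word
fits_in_word(51)  # FALSE: 102 bits do not
```

If that is the design, a larger k would need a different k-mer representation rather than a different command-line value.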

Best,
Nghia

@unique379r
Author

unique379r commented Oct 3, 2019

Thank you for your kind reply.
Please check your email. I sent you the files as an Aspera package.

@nghiavtr
Owner

nghiavtr commented Oct 7, 2019

Dear Rupesh,

Thank you for your files!

So you are running FuSeq on RNA-seq data with 150nt reads and using the Homo_sapiens.GRCh38.94 annotation. It should be noted that we have never carefully tested FuSeq on either hg38 or RNA-seq data with reads this long.

I investigated the FuSeq output and found that most of the missing fusions are very lowly expressed. If I change the minScore parameter from 3 (the default) to 1, I obtain 8 of the 14 true fusions:

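    # Lower the minimum score (default: 3) so lowly expressed fusions are kept
    FuSeq.params$minScore = 1
    # Post-processed candidates from the mapped-read (MR) and split-read (SR) pipelines
    myFusionFinal.MR = FuSeq.MR.postPro$myFusionFinal
    myFusionFinal.SR = FuSeq.SR.postPro$myFusionFinal

    # Re-run the integration step with the relaxed threshold
    fragmentInfo = FuSeq.MR$fragmentInfo
    FuSeq.integration = integrateFusion(myFusionFinal.MR, myFusionFinal.SR, FuSeq.params, fragmentInfo=fragmentInfo, paralog.fc.thres=2.0)
    myFusionFinal = FuSeq.integration$myFusionFinal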
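To check how a score cutoff changes recall against a validated list, a comparison along these lines may help. This is only a sketch on toy data: the `calls` table, its `fusion` and `score` columns, the gene names, and the `recovered_at` helper are all hypothetical, and treating the cutoff as "keep if score >= cutoff" is an assumption, since the actual columns of myFusionFinal and the exact use of minScore are not shown in this thread.

```r
# Toy data: every name and value here is made up for illustration.
calls <- data.frame(
  fusion = c("GENEA-GENEB", "GENEC-GENED", "GENEE-GENEF", "GENEG-GENEH"),
  score  = c(5, 2, 1, 4),
  stringsAsFactors = FALSE
)
truth <- c("GENEA-GENEB", "GENEC-GENED", "GENEE-GENEF")

# Count validated fusions that survive a minimum-score cutoff.
recovered_at <- function(calls, truth, min_score) {
  kept <- calls$fusion[calls$score >= min_score]
  sum(truth %in% kept)
}

# Compare a stricter and a more relaxed cutoff.
for (cutoff in c(3, 1)) {
  cat(sprintf("cutoff=%d: %d of %d validated fusions recovered\n",
              cutoff, recovered_at(calls, truth, cutoff), length(truth)))
}
```

Lowering the cutoff also lets through more unvalidated candidates (GENEG-GENEH in the toy table passes both cutoffs), so precision is worth re-checking alongside recall.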

Moreover, the genes of one fusion (TACC3-FGFR3) are a very short distance apart, 48117, which is less than the default setting of FuSeq.params$minGeneDist=1e5. So if you want to recover this fusion, you could reduce the minimum distance to 10000 (FuSeq.params$minGeneDist=1e4).

I hope this helps.
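The effect of the distance threshold on this fusion can be checked with simple arithmetic. The helper `passes_gene_dist` below is hypothetical, and whether FuSeq compares with ">=" or ">" is an assumption; only the distance 48117 and the two threshold values come from this thread.

```r
# Hypothetical helper: does a gene pair pass a minimum gene-distance filter?
# How FuSeq measures and compares the distance internally is assumed here.
passes_gene_dist <- function(gene_dist, min_gene_dist) {
  gene_dist >= min_gene_dist
}

tacc3_fgfr3_dist <- 48117  # distance quoted above

passes_gene_dist(tacc3_fgfr3_dist, 1e5)  # FALSE: filtered out at the default
passes_gene_dist(tacc3_fgfr3_dist, 1e4)  # TRUE:  kept at the relaxed setting
```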

Best,
Nghia
