-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BSmooth error #104
Comments
Please share the output of |
You tend to get this kind of error when you have very short sequences in
the genome, I think. I would look a bit at the length of especially smaller
contigs. What organism are you using?
…On Wed, Sep 29, 2021 at 6:36 PM Peter Hickey ***@***.***> wrote:
Please share the output of BiocManager::valid()
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#104 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABF2DHZZ25DHALMC2WVRAH3UEOIFDANCNFSM5FA3E3PA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
--
Best,
Kasper
|
1. Please print the BSseq22 object.
2. If this is human data, why do you have 65M CpGs?
…On Wed, Sep 29, 2021 at 10:20 PM Priyatama Pandey ***@***.***> wrote:
Hi Pete and Hansen,
Thanks so much for your reply. Output of BiocManager::valid() showed TRUE.
I am working on human WGBS samples. I ran for chr22 only and it worked
with warning (attached).
[image: Screen Shot 2021-09-29 at 7 18 42 PM]
<https://user-images.githubusercontent.com/17990280/135374935-4c0174ef-bba5-418b-983f-9f5cd679cd2a.png>
Please suggest.
Thank you,
Priya
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#104 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABF2DH5QS6ZMZYRGNE4FRRTUEPCM7ANCNFSM5FA3E3PA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
--
Best,
Kasper
|
Sorry, it only returns that info that if it returns |
Thanks, @priyatamapandey. Everything looks up to date, which makes debugging easier :) Regarding the 65M CpGs, are your CpGs stranded or have they been aggregated across strands?
If you share the output of running We typically aggregate across strands before smoothing CpG methylation, and Unfortunately, we've also experienced sporadic failures when smoothing large BSseq objects that can lead to these obscure error message, particularly when using parallelisation on a shared computing environment (e.g., a university cluster). |
Hi Pete, Thank you for all the suggestions. So, do you think I should not use multicore option even for the chromosome wise smoothing. It have split up the object with using 3 cores/parallel and it failed with the following error. Then, I have tried chromosome2 without using multiple core.
It took few hours to complete and at the end it retuned error. Although for chr1 the above code worked after taking long hours. I am running on my iMAC for now which has 32gb memory and 8core. I do have 100 samples to process once I get successful, then I would probably use institute cluster to perform. Here is the error for chr2 smoothing. I have also ran rowData() on the enitre BSseq object and on chr1 and chr2 BSseq object. Since it did not return any column information. I am attaching the BSseq object jist here. It might help to understand the stand information. This is the large BSseq object Thank you so much for your help, |
In your code example you def. specify a strand-specific analysis
(strandCollapse = FALSE) which is pretty unusual and certainly not what I
would recommend unless you have a very specific question in mind.
Even account for that, I think you have too many Cpgs.
…On Fri, Oct 1, 2021 at 2:49 PM Priyatama Pandey ***@***.***> wrote:
Hi Pete,
Thank you for all the suggestions. So, do you think I should not use
multicore option even for the chromosome wise smoothing. It have split up
the object with using 3 cores/parallel and it failed with the following
error.
[image: Screen Shot 2021-10-01 at 11 37 49 AM]
<https://user-images.githubusercontent.com/17990280/135670539-876bf620-686c-4b33-bd9d-5b398a4adeee.png>
Then, I have tried chromosome2 without using multiple core.
Here is my command.
bismarkBSseqOrdered2 <- orderBSseq(bismarkBSseq, seqOrder = "chr2" )
bismarkBSseq.chr2 <- BSmooth(
BSseq = bismarkBSseqOrdered2,
BPPARAM = SerialParam(progressbar = TRUE),
verbose = TRUE)
It took few hours to complete and at the end it retuned error. Although
for chr1 the above code worked after taking long hours. I am running on my
iMAC for now which has 32gb memory and 8core. I do have 100 samples to
process once I get successful, then I would probably use institute cluster
to perform.
Here is the error for chr2 smoothing.
[image: Screen Shot 2021-10-01 at 11 04 05 AM]
<https://user-images.githubusercontent.com/17990280/135669212-26e37f3a-3e91-45f2-8337-9fadeef5d55b.png>
I have also ran rowData() on the enitre BSseq object and on chr1 and chr2
BSseq object. Since it did not return any column information. I am
attaching the BSseq object jist here. It might help to understand the stand
information.
[image: Screen Shot 2021-10-01 at 11 12 04 AM]
<https://user-images.githubusercontent.com/17990280/135669686-857e19f5-c3fa-43ee-a00b-5f2320c6f6b1.png>
This is the large BSseq object
[image: Screen Shot 2021-10-01 at 11 14 40 AM]
<https://user-images.githubusercontent.com/17990280/135669600-f7aa89f8-9e45-491a-b8a3-8952050f3538.png>
Thank you so much for your help,
Priya
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#104 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABF2DH62ZLCNPHUXP4J37MLUEX7E7ANCNFSM5FA3E3PA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
--
Best,
Kasper
|
Hi Kasper, Thank you, |
All of those weird looking contigs (chromosomes) is likely the cause of the
error. What happens is that some of these contigs contains too few CpGs for
the smoothing to work.
I strongly recommend the following.
1) recreate your objects with strandCollapse=TRUE to ensure you don't do a
strand specific analysis
2) get rid of all the contigs (at least to start with); the easiest one is
to only _keep_ the standard autosomes.
That should resolve your error with 98% probability. I don't think you will
have problems with parallelization although - as Pete has said - sometimes
we see problems.
Best,
Kasper
…On Fri, Oct 1, 2021 at 6:20 PM Priyatama Pandey ***@***.***> wrote:
Hi Kasper,
Ok. I will change that. I found that example in read.bismark R
documentation so thought to start with.
Also, I checked my coverage file and found that I do have many more
chromosome than Chr1 to chr22 and X and Y such as chr19_KI270883v1_alt,
chr19_KI270890v1_alt, chr19_KI270919v1_alt, chrUn_KI270334v1,
chrUn_KI270373v1, chrUn_KI270376v1.
I think that is causing many addition CpGs. I want to consider only chr1
to chr22 and chrX and chrY while reading files in the read.bismark but do
not understand how to set that condition? Is there *Loci* parameter
contain that option in read.bismark function?
Thank you,
Priya
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#104 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABF2DH5JKVFJSL7XUKGEUYLUEYX2DANCNFSM5FA3E3PA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
--
Best,
Kasper
|
There's a section 'Specifying |
Happy New Year Dr. Hansen and Pete, I have run bsmooth on the default setting and also want to customize the setting to see, whether different settings are improving our results but most of the setting getting failed at some point. I have divided my data chromosome wise and some of the setting is getting fail for chromosome 1, some for chromosome 2 etc. Below is the setting list which I have already tried Here is the error returned Please suggest how to fix this issue. Any idea why it is throwing error? Thank you so much for your help, |
It's very hard to know and I can only refer you to our previous comments. |
Thank you for your suggestion, I am working on the DMRs now and want to understand the cutoff options. I don't understand how to choose cutoff vector? I calculated t-statistics and generated the following figure. I don't see much difference in corrected and uncorrected in figure. Please suggest me how to select cutoff while checking the t-statistics? |
HI Kirito, Best, |
We've used the default settings for many different human (and possibly mouse) sample types, not just cancer data. |
It's very hard without a reproducible example and further details. |
Species is human, and context is GpC.I running it in my PC, and with 3.6GHz 6core processor. |
Is it really GpC (e.g., a NOMe-Seq-like protocol) and not CpG methylation (e.g., a standard bisulfite-seq-like protocol)? bsseq was initially designed for analysing CpG methylation data from bisulfite-sequencing protocols. As far as I'm aware, neither @kasperdanielhansen or I have used it to analyse GpC methylation from NOMe-Seq-like data. > suppressPackageStartupMessages(library(bsseq))
> suppressPackageStartupMessages(library(BSgenome.Hsapiens.UCSC.hg38))
> # Identify all GpC and CpG dinucleotides on forward strand of chr1 in GRCh38
> gpc_chr1 <- findLoci("GC", BSgenome.Hsapiens.UCSC.hg38, "chr1", strand = "+")
> cpg_chr1 <- findLoci("CG", BSgenome.Hsapiens.UCSC.hg38, "chr1", strand = "+")
> # How many are there?
> length(gpc_chr1)
[1] 10145272
> length(cpg_chr1)
[1] 2375159
> # What is the distance between adjacent loci?
> par(mfrow = c(1, 2))
> plot(density(log10(start(gpc_chr1)[-1] - start(gpc_chr1)[-length(gpc_chr1)])), "GpC")
> plot(density(log10(start(cpg_chr1)[-1] - start(cpg_chr1)[-length(cpg_chr1)])), "CpG") As bsseq is not designed for analysing GpC data and we have not tried this ourselves, we cannot really offer more substantial support. |
The smoothing code has problems when you have a lot of sites (CpGs / GpCs)
and a small bandwidth relative to the number of sites. The potential way
out of this is to set maxGap which will divide a chromosome into more than
1 "chunk" if there is a stretch of DNA without sites. The default values of
maxGap is 10^8 which essentially mean "don't do this" and this works for
the default setting and human chromosomes.
Because there are more GpCs you will probably need to set maxGap to
something.
Also, before you run it genome wide I suggest to look at a couple of loci
in great detail to get a feel for smoothing parameters.
Best,
Kasper
…On Wed, Feb 9, 2022 at 7:47 PM Peter Hickey ***@***.***> wrote:
Is it really GpC (e.g., a NOMe-Seq
<https://www.illumina.com/science/sequencing-method-explorer/kits-and-arrays/nome-seq.html>-like
protocol) and not CpG methylation (e.g., a standard bisulfite-seq-like
protocol)?
*bsseq* was initially designed for analysing CpG methylation data from
bisulfite-sequencing protocols.
This is what the default parameters to BSmooth() are aimed at (in
particular, for human+mouse-like genomes).
More recently, we have used *bsseq* to analyse CpH (e.g., CpA)
methylation data from bisulfite-sequencing protocols in human samples, but
with different smoothing parameters; importantly, getting satisfactory
smoothing results was a lot of trial and error.
As far as I'm aware, neither @kasperdanielhansen
<https://github.com/kasperdanielhansen> or I have used it to analyse GpC
methylation from NOMe-Seq-like data.
And I very much doubt the default parameters would work out of the box
since the distribution of GpCs along the genome is very different to the
distribution of CpGs, as illustrated below:
> suppressPackageStartupMessages(library(bsseq))> suppressPackageStartupMessages(library(BSgenome.Hsapiens.UCSC.hg38))> # Identify all GpC and CpG dinucleotides on forward strand of chr1 in GRCh38> gpc_chr1 <- findLoci("GC", BSgenome.Hsapiens.UCSC.hg38, "chr1", strand = "+")> cpg_chr1 <- findLoci("CG", BSgenome.Hsapiens.UCSC.hg38, "chr1", strand = "+")> # How many are there?> length(gpc_chr1)
[1] 10145272> length(cpg_chr1)
[1] 2375159> # What is the distance between adjacent loci?> par(mfrow = c(1, 2))> plot(density(log10(start(gpc_chr1)[-1] - start(gpc_chr1)[-length(gpc_chr1)])), "GpC")> plot(density(log10(start(cpg_chr1)[-1] - start(cpg_chr1)[-length(cpg_chr1)])), "CpG")
[image: image]
<https://user-images.githubusercontent.com/1049741/153315106-b651ad04-5704-4c38-80e5-527c8edd862f.png>
As *bsseq* is not designed for analysing GpC data and we have not tried
this ourselves, we cannot really offer more substantial support.
*bsseq* *may* be suitable, but my guess it's going to take some
substantial research effort to identify appropriate parameters.
—
Reply to this email directly, view it on GitHub
<#104 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABF2DHYFMSNQI7BN4RJKJUTU2MDJPANCNFSM5FA3E3PA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
Best,
Kasper
|
Hi,
I am trying to use BSmooth function on the 16 WGBS samples and it showed the following error.
Please suggest me what I need to correct?
I used the following code for reading my 16 coverage files and then performing the BSmooth.
Thank you so much for your help.
Priya
The text was updated successfully, but these errors were encountered: