read.bismark errors and warnings for large number of files #93
Comments
Hi Annie,
It looks like you are using the (default) in-memory backend (equivalent to
It looks like you are also using the (default) parallelisation strategy (equivalent to
Let me know if either of these help.
Cheers,
PS Could you also please include the output of
Thank you for your reply. Sorry it took me so long to respond, but I had to work on another project.
I modified my code to include the HDF5 array option. It did seem to be "dumping data" into the specified temp folder, but I still got an error. I honestly am very confused by the documentation on BPPARAM and how to change it, so I don't really have any idea what I should try. I am attaching the BiocManager output you requested.
My new code with the HDF5 array option is:
meth = bsseq::read.bismark(
files = files,
colData = data.frame(row.names = c("PL1142","PL1973","PL232","PL722","PL837","PL1103","PL171","PL2102","PL274","PL1523","PL1746","PL230","PL891","PL1487","PL1616","PL1814","PL449","PL865","PL875","PL1342","PL1500","PL2043","PL2241","PL2256","PL443","PL946","PL965","PL1177","PL1553","PL373","PL874","PL899","PL1274","PL1457","PL1540","PL2137","PL1138","PL1299","PL1354","PL1451","PL1462","PL1641","PL1774","PL642","PL1027","PL1085","PL721","PL816","PL862","PL1303","PL1545","PL1549","PL1674","PL2091","PL796B","PL921")),
rmZeroCov = FALSE,
strandCollapse = FALSE,
verbose=2,
loci=lociTemp,
BACKEND="HDF5Array",
dir="/mnt/DATA/Cores/hiseq2000/annie/misc_methylation_analysis/PE_july20/temp"
)
The error message that I am still getting is:
Loading required package: rhdf5
[read.bismark] Using 'loci' as candidate loci.
[read.bismark] Parsing files and constructing 'M' and 'Cov' matrices ...
Error in result[[njob]] <- value :
attempt to select less than one element in OneIndex
In addition: Warning message:
In parallel::mccollect(wait = FALSE, timeout = 1) :
1 parallel job did not deliver a result
I am not sure if I did this right, but I think my value for BPPARAM is as follows:
> BPPARAM
class: MulticoreParam
bpisup: FALSE; bpnworkers: 22; bptasks: 0; bpjobname: BPJOB
bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
bpRNGseed: ; bptimeout: 2592000; bpprogressbar: FALSE
bpexportglobals: TRUE
bplogdir: NA
bpresultdir: NA
cluster type: FORK
Thank you for your assistance.
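Since the question above is how BPPARAM gets changed, here is a minimal sketch of one way to pass it explicitly. It assumes that BiocParallel and bsseq are installed and that read.bismark accepts a BPPARAM argument alongside BACKEND and dir. The wrapper name read_bismark_with_workers, its argument names, and the default of 2 workers are hypothetical choices for illustration, not something recommended in this thread. Whether fewer workers (or a serial run) avoids the error reported here is untested.

```r
# Hypothetical wrapper (not part of bsseq): the same call as in this thread,
# but with the parallel back-end built explicitly and passed via BPPARAM.
read_bismark_with_workers <- function(files, loci, sample_names, hdf5_dir,
                                      workers = 2L) {
  # Assumption: a small worker count keeps fewer forked processes alive at once
  # than the 22 workers shown in the BPPARAM printout above.
  param <- if (workers > 1L) {
    BiocParallel::MulticoreParam(workers = workers)
  } else {
    # workers = 1 falls back to a plain serial run with no forking at all.
    BiocParallel::SerialParam()
  }

  bsseq::read.bismark(
    files = files,
    colData = data.frame(row.names = sample_names),
    rmZeroCov = FALSE,
    strandCollapse = FALSE,
    verbose = 2,
    loci = loci,
    BPPARAM = param,        # explicit parallelisation strategy
    BACKEND = "HDF5Array",  # on-disk backend, as in the call above
    dir = hdf5_dir          # directory for the HDF5 files
  )
}
```

It could then be called as, for example, read_bismark_with_workers(files, lociTemp, sample_names, hdf5_dir, workers = 1L), where sample_names is the character vector of PL... identifiers and hdf5_dir is the temp path used above.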
I am having issues with the read.bismark command. I have a large number of files that need to be analyzed. Right now I am working with 24, but I'd like to be able to run 30 or more at any given time. I realize the problem likely lies with R and the amount of memory it can allocate. I am working on a Linux cluster.
I have been trying to adhere to the "best practices" for this command as outlined in the vignette. I am using cytosine reports, so each of the 24 files is about 57M lines. I understand that is a lot of data. I wrote a loop so that it only deals with 1 chromosome at a time, but I am still running into errors.
It is very strange: R will work for a chromosome or 2 and then fail. I then have to close out R, close out my terminal, and restart. If I just try to restart my script in R without shutting everything down, it generally keeps failing. I haven't tried any of the multicore settings outlined in the vignette because I don't really understand what to do.
The following is the read.bismark command I am running.
meth = bsseq::read.bismark(
files = files,
colData = data.frame(row.names=c("G1138","G1774","G642","G1641","G443","G965","G2043","G1342","G1354","G1451","G1299","G1462","G2241","G946","G1500","G2256","G1927","G1533","G2024","G2092","G335","G1787","G709","G1631")),
rmZeroCov = FALSE,
strandCollapse = FALSE,
verbose=2,
loci=lociTemp
)
Please note that lociTemp contains only the loci from 1 chromosome, while the data files contain all chromosomes. I really don't want to split those up if I don't have to. In the rest of my script I've done other things to preserve memory, such as removing files once they were finished. This seems to have gotten rid of the "forking errors".
Even when it works, I get the following warning/error: "Error in mcexit(0L) : ignoring SIGPIPE signal". This is usually repeated about 20 times, but everything seems to work, so I have just been ignoring it.
When it stops, I see the warning and then some verbiage about allocating index less than value. I wish I could provide the exact error, but when I wanted it to fail, it didn't. If I see it again, I will definitely post it.
Is there any other guidance you can provide to me to deal with a large number of files?
Annie
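For capturing the exact error text the next time the loop fails, a minimal base-R sketch is shown below. The helper name run_logged and the default log file name are hypothetical. It only records the message of an error raised in the main R session, so it will not record warnings such as the SIGPIPE lines.

```r
# Hypothetical helper: evaluate one step and, if it fails, append the error
# message to a log file instead of losing it when the session is restarted.
run_logged <- function(label, expr, log_file = "read_bismark_errors.log") {
  tryCatch(
    expr,  # evaluated lazily here, so errors raised inside it are caught
    error = function(e) {
      msg <- sprintf("[%s] %s: %s",
                     format(Sys.time()), label, conditionMessage(e))
      cat(msg, "\n", file = log_file, append = TRUE, sep = "")
      NULL  # signal failure to the caller without stopping the loop
    }
  )
}
```

Inside the per-chromosome loop, the read.bismark call above could be wrapped as meth <- run_logged(chrom, bsseq::read.bismark(...)), where chrom is whatever label the loop uses. A NULL result then marks a chromosome that failed, with the message saved in the log file.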