size of subsetted BSseq object is larger than the original object #83
Hi,

Briefly, it's because the subsetting is stored as a 'delayed operation' using the DelayedArray package. You might try using the HDF5Array backend for such a large object; it should reduce the memory footprint to roughly 1 GB or less.
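As a rough illustration of that suggestion (not code from this thread), an existing in-memory BSseq object can be moved onto the HDF5Array backend with `HDF5Array::saveHDF5SummarizedExperiment()`; the object name `bs` and the output directory are hypothetical:

```r
## Minimal sketch, assuming `bs` is an existing in-memory BSseq object.
library(bsseq)
library(HDF5Array)

## Write the assays (M and Cov) to an HDF5 file on disk and get back an
## equivalent BSseq object whose assays are HDF5-backed DelayedMatrix objects.
bs_h5 <- saveHDF5SummarizedExperiment(bs, dir = "bs_h5", replace = TRUE)

## The in-memory footprint should now be small; the data live on disk.
print(object.size(bs_h5), units = "MB")
```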
I'm travelling for the next few days, but will be happy to give some suggestions for processing a large dataset like this, as I've done quite a bit of this sort of thing.
Hi,

Thanks for the clarification. I actually have no experience dealing with data of a similar size. I have spent the last two days following some of the tutorials available online (here and here) about the DelayedArray format, and I understand what 'delayed operations' mean. Nevertheless, I don't know how to implement this architecture to reduce the memory footprint. As a result, I would really appreciate it if you could guide me through an efficient way to process the data.

Best regards,
If starting from Bismark files, you could try … Alternatively, if you do want to keep your data in-memory, you can do … I'm actually giving an updated tutorial on DelayedArray next week at BioC2019.
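The exact snippets referred to in the comment above are not shown here; a sketch of what the two options could look like, assuming hypothetical Bismark coverage files, is:

```r
## Assumed example; file names are hypothetical.
library(bsseq)

files <- c("sample1.cov.gz", "sample2.cov.gz")  # Bismark coverage files

## HDF5-backed: the M and Cov matrices are written to disk under `dir`,
## keeping the in-memory footprint small.
bs_h5 <- read.bismark(files, BACKEND = "HDF5Array", dir = "bs_h5")

## In-memory alternative: BACKEND = NULL (the default) keeps ordinary
## matrices, wrapped in DelayedMatrix objects.
bs_mem <- read.bismark(files, BACKEND = NULL)
```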
Hi,

Thank you for your valuable input. I am also looking forward to learning more from the updated tutorial to be given at BioC2019. Actually, I start with the raw coverage and methylation matrices and then create a BSseq object from scratch. I find that using …

I think that another method might be to filter the raw coverage and methylation matrices in advance, before creating the BSseq objects. I don't know if that would be a good alternative or not, but it is worth trying. One last question: is it recommended to create a BSseq object from DelayedMatrix objects?

Best regards,
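A minimal sketch of the filter-first idea, assuming hypothetical raw matrices `M` and `Cov` plus locus coordinates `chr` and `pos` (none of these names come from the thread):

```r
## Assumed example of filtering raw matrices before constructing a BSseq object.
library(bsseq)

## Keep only loci with non-zero coverage in every sample, to shrink the
## object up front (any filtering rule could be substituted here).
keep <- rowSums(Cov > 0) == ncol(Cov)

bs <- BSseq(chr = chr[keep],
            pos = pos[keep],
            M   = M[keep, , drop = FALSE],
            Cov = Cov[keep, , drop = FALSE],
            sampleNames = colnames(Cov))  # assumes sample names are the column names of Cov
```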
I think that should be fine. The matrix will be wrapped in a DelayedMatrix, but that's cheap and shouldn't copy the data.
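A quick illustrative check (not from the thread) that wrapping an ordinary matrix in a DelayedMatrix adds essentially no memory overhead:

```r
library(DelayedArray)

m  <- matrix(rpois(1e6, lambda = 10), ncol = 10)
dm <- DelayedArray(m)   # wraps m as a DelayedMatrix; the data are not copied

print(object.size(m),  units = "MB")
print(object.size(dm), units = "MB")  # only marginally larger than m itself
```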
Broadly, we (well, Pete) have processed extremely large datasets using this backend, but it is probably somewhat finicky; i.e., you can do stuff that will make it explode and other stuff that will work fine. And it is pretty clear that what works and what doesn't is not well explained (and perhaps not well understood).
--
Best,
Kasper
Thank you @kasperdanielhansen for pointing this out. It's good to keep that in mind, because I have no experience at this level of analysis. I would be glad if you could share a "conservative" approach to follow in similar situations.

Best,
Hi developers,
I am having a weird problem: the size of a subsetted BSseq object ends up larger than that of the original object:
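The original report contained a code block that is not reproduced here. An assumed reconstruction of the symptom, with `bs` standing in for a hypothetical BSseq object built from in-memory matrices:

```r
library(bsseq)

bs_sub <- bs[1:1000, ]   # subsetting is recorded as a delayed operation

## The subset can report a *larger* size than the original, because its
## DelayedMatrix assays carry the full seed data plus the pending
## delayed subsetting operations.
print(object.size(bs),     units = "MB")
print(object.size(bs_sub), units = "MB")
```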
I am using the GitHub version of bsseq.
Best,
Mohamed Shoeb