randomstats not cleaning up #67
A while ago I did a major overhaul on the randomization stuff, implementing a new method. Looks like I never made this method the default for `randomstats`. To use the new method, you can specify it explicitly; so for your example, this should do the trick:
(Side note: if you take a look at the leftover temp files, I think they should all be genome files.)
That does the trick. Can `genome_fn` be a required argument to avoid this?
Yeah, that's probably best. I still need to do a little more cleaning up and "officially" deprecate the old `randomstats` method; when that happens, `genome_fn` will be required.
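The change being discussed can be sketched with a keyword-only parameter. This is a generic Python illustration; the function body and names are hypothetical stand-ins, not pybedtools' actual signature:

```python
def randomstats(other, iterations, *, genome_fn, **kwargs):
    """Hypothetical sketch: making genome_fn keyword-only and required
    means callers get an immediate TypeError instead of silently falling
    back to the old code path (which left temp genome files behind)."""
    return {"other": other, "iterations": iterations, "genome": genome_fn}

# Calling without genome_fn now fails fast:
try:
    randomstats("b.bed", 1000)
except TypeError:
    print("genome_fn is required")  # → genome_fn is required
```

The `*` in the signature forces `genome_fn` to be passed by name, so the error message points directly at the missing argument rather than surfacing later as leftover temp files.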
Got it. Would you consider adding an `_orig_pool` kwarg to `random_op`? It'd be nice to be able to keep re-using a pool if I'm running this across multiple pairs of BED files.
Sure. Implementation-wise, would you rather create your own pool and use it for various parallel calls, like:

```python
mypool = multiprocessing.Pool(25)
bt.randomstats(_orig_pool=mypool, *args, **kwargs)
bt.random_op(_orig_pool=mypool, *args, **kwargs)
bt.random_jaccard(_orig_pool=mypool, *args, **kwargs)
```

or have a `BedTool._pool` instance variable that, if None, will initialize with n processes, but subsequent calls (when `_orig_pool=True`) re-use that auto-created one?

```python
# initializes a pool, BedTool._pool = multiprocessing.Pool(25)
bt.randomstats(_orig_pool=True, processes=25, *args, **kwargs)

# subsequent calls re-use BedTool._pool
bt.randomstats(_orig_pool=True, processes=25, *args, **kwargs)

# set to None to re-initialize w/ different nprocs
bt._pool = None
bt.randomstats(_orig_pool=True, processes=500, *args, **kwargs)
```
I much prefer the former.
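The former option (a caller-owned pool that gets passed in and re-used) can be sketched like this. `run_stats`, `square`, and the `_orig_pool` handling here are illustrative stand-ins, not pybedtools code:

```python
import multiprocessing

def square(x):
    return x * x

def run_stats(data, _orig_pool=None, processes=4):
    """Sketch of the '_orig_pool' idea: if the caller supplies an existing
    multiprocessing.Pool, re-use it and leave its lifecycle to the caller;
    otherwise create a private pool and tear it down before returning."""
    pool = _orig_pool or multiprocessing.Pool(processes)
    try:
        return pool.map(square, data)
    finally:
        if _orig_pool is None:
            pool.close()
            pool.join()

if __name__ == "__main__":
    shared = multiprocessing.Pool(2)
    r1 = run_stats([1, 2, 3], _orig_pool=shared)
    r2 = run_stats([4, 5], _orig_pool=shared)  # same pool re-used
    shared.close()
    shared.join()
    print(r1, r2)  # → [1, 4, 9] [16, 25]
```

Keeping pool ownership with the caller avoids the fork/teardown cost on every call, which is exactly the win when iterating over many pairs of BED files.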
Sorry for putting this in this thread, but it's another open-file error. If I stream, it must be leaving the process open?

```python
from pybedtools import BedTool

a = BedTool('chr1 1 2', from_string=True)
b = BedTool('chr1 1 2', from_string=True)
for i in range(10000):
    print(i)
    c = a.intersect(b, stream=True)
```

Is that expected to leak?
In this case, I think the answer is yes: the way streaming BedTools are closed is by hitting a StopIteration (see cbedtools.IntervalIterator). Since the result is never iterated over, StopIteration is never reached and the stream stays open. But it would be nice if the garbage collector saw that the streaming BedTool from iteration i-1 no longer has any references, and cleaned it up (would a `__del__` method help here?).
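As a stand-alone illustration of that closing behavior (not pybedtools code), here is a minimal streaming wrapper around a subprocess that closes its pipe either when iteration is exhausted or via an explicit `close()`, with `__del__` as a garbage-collection fallback:

```python
import subprocess

class StreamingResult:
    """Illustrative sketch: wrap a streaming subprocess so its pipe is
    closed when the iterator hits EOF (StopIteration), when close() is
    called explicitly, or when the object is garbage-collected."""

    def __init__(self, cmd):
        self.p = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)

    def __iter__(self):
        return self

    def __next__(self):
        line = self.p.stdout.readline()
        if not line:               # EOF: clean up, then stop iteration
            self.close()
            raise StopIteration
        return line.rstrip("\n")

    def close(self):
        if self.p.stdout and not self.p.stdout.closed:
            self.p.stdout.close()
        self.p.wait()              # reap the child process

    __del__ = close                # GC fallback for unconsumed streams

# Never-iterated results are closed explicitly instead of leaking:
for _ in range(100):
    s = StreamingResult(["echo", "chr1\t1\t2"])
    s.close()
```

Without the `close()`/`__del__` path, each unconsumed stream in the loop above would hold an open pipe and an unreaped child, which is the leak pattern described in the comment above.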
I tried a number of things, including...
I don't know why this is happening, because a look at the code shows that it is calling close_or_delete, but randomstats is leaving a ton of pybedtools.tmp* files in my tmp dir, and calling cleanup() does not remove them.
Perhaps what's getting sent to close_or_delete is a filehandle?
I've tried calling randomstats with an object and with object.fn, and it never cleans up the files.
My call looks like this: