BedTool.sequence hangs on a BedTool made with generator? #77
Comments
Calling len() will cause the iterator to exhaust itself, so it will be empty by the time it gets to the .sequence() call. If you really want to iterate over it multiple times, then use:
Maybe if you used "raise StopIteration", the error would be more helpful.
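The exhaustion described in this comment can be reproduced without pybedtools. The sketch below is only an illustration: CountingStream is a hypothetical stand-in for a generator-backed BedTool, assuming (as the issue text suggests for __len__() and self.count()) that the length is computed by walking the underlying iterator.

```python
class CountingStream:
    """Hypothetical stand-in for a generator-backed BedTool (not pybedtools code)."""

    def __init__(self, iterator):
        self._iterator = iterator

    def __iter__(self):
        # Hand back the one underlying iterator; it cannot be rewound.
        return self._iterator

    def __len__(self):
        # Counting means walking the iterator, which consumes it.
        return sum(1 for _ in self._iterator)


def intervals(n=5, seqsize=3):
    """Yield simple (chrom, start, stop) tuples, loosely like the thread's generator."""
    for i in range(n):
        yield ('chr1', seqsize * i, seqsize * i + seqsize)


# Iterating a fresh stream sees every interval.
fresh = [iv for iv in CountingStream(intervals())]

# Calling len() first leaves nothing for a later consumer.
stream = CountingStream(intervals())
count = len(stream)
remaining = [iv for iv in stream]

print(len(fresh), count, remaining)
```

A generator normally signals its end simply by returning, so whether an explicit StopIteration would change anything here is not something this sketch can settle.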
Sure about len; I didn't mean to imply that it fixes the issue, but I found it surprising that len can exhaust the iterator (thus ending in an empty sequence output, as you write) whereas sequence cannot. I definitely don't want to load it into memory, so I avoided list. There's no need for me to iterate through it multiple times; it's only made into a BedTool from a generator so that it can be written to a file line by line. I tried using
I think this boils down to a question about whether

    import pybedtools

    def generator():
        seqsize = 3
        for i in range(100):
            # Alternate strands: even-numbered regions go on the minus strand.
            strand = '+'
            if i % 2 == 0:
                strand = '-'
            yield pybedtools.create_interval_from_list([
                'chr1',
                str(seqsize * i),
                str(seqsize * i + seqsize),
                'region_%s' % i,
                '.',
                strand
            ])
        return

    # Build a streaming BedTool from the generator and extract sequences.
    x = pybedtools.BedTool(generator())
    x.sequence(
        fi=pybedtools.example_filename('test.fa'),
        fo='example',
        name=True,
        s=True)
    print(open('example').read())

This works fine. Are you able to modify this (perhaps by using your
It's possible that if
I am following Ryan's template for creating a BedTool on the fly to avoid temporary files (as outlined here #55). I'm parsing through a BedTool corresponding to a BED file a. As I iterate through the file, I'm creating a new BedTool b using create_interval_from_list. The new BedTool b represents some shuffles of each BED interval in a. Once the new BedTool b is created, I call b.sequence() to generate a FASTA from it.
I found that the sequence() call hangs after outputting a large portion of the FASTA, probably because of some issue with the way I'm making the iterator. Strangely, if I call len(b) before b.sequence(...), then the problem goes away. I've looked at bedtool.py and noticed that __len__() calls self.count(), which is apparently what resolves the iterator. Any idea what might be causing this halting condition? Do I need to raise a StopIteration when feeding a generator to BedTool?
Here is my code for making the generator, it's extremely simple:
Now I create a BedTool using this function and call sequence():
Any thoughts on what might cause sequence to hang? The BedTool seems to be made fine from the generator, it's the sequence line that actually makes use of the generator that hangs. Thanks!
Update: further confirmation that it's a problem somewhere in the sequence() call or downstream of it - if I take the BedTool that I made using the generator (sampled_clusters_bed) and write it to a .bed first:
And then load test.bed as a BedTool instance, and call that object's .sequence() method, then it all works. So it must be something about how sequence() parses the generator it seems.
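The workaround in the update (write the generator's output to a .bed file, then work from the file) can be sketched in plain Python. This is an illustration only: bed_lines and spool_to_bed are hypothetical helpers, not pybedtools functions, and the intervals mirror the example generator posted in the comments rather than the reporter's own code. In pybedtools itself, BedTool.saveas is the usual way to write a BedTool to a named file.

```python
import os
import tempfile


def bed_lines(n=4, seqsize=3):
    """Yield tab-separated BED6 lines, mirroring the generator in the thread."""
    for i in range(n):
        strand = '-' if i % 2 == 0 else '+'
        fields = ['chr1', str(seqsize * i), str(seqsize * i + seqsize),
                  'region_%s' % i, '.', strand]
        yield '\t'.join(fields)


def spool_to_bed(lines, path):
    """Write lines one at a time so the full set never sits in memory."""
    written = 0
    with open(path, 'w') as handle:
        for line in lines:
            handle.write(line + '\n')
            written += 1
    return written


tmp_dir = tempfile.mkdtemp()
bed_path = os.path.join(tmp_dir, 'test.bed')
n_written = spool_to_bed(bed_lines(), bed_path)

# Unlike the generator, the file can be read as many times as needed.
with open(bed_path) as handle:
    first_pass = handle.read().splitlines()
with open(bed_path) as handle:
    second_pass = handle.read().splitlines()

print(n_written, first_pass[0])
```

This trades the goal of avoiding temporary files for a file-backed input that any downstream step can reread, while still keeping memory use to one line at a time.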