-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make file_type a settable attribute #170
Comments
Why not take a biopython SeqIO approach (http://biopython.org/wiki/SeqIO) Where the constructor for the bedtool has an optional file_type arg, and Something like: pybedtools.Bedtool(fn, file_type="bed") Might take slightly more engineering for a very rare edge case though. Gabriel Pratt On Tue, Apr 26, 2016 at 5:57 PM, Ryan Dale [email protected] wrote:
|
I like that idea, and should probably add it. For it to work for this example, the results from each BedTool method would inherit the file type of But the case where you have that problematic file already on disk and then want to make a BedTool out of it, that's where the constructor would come in handy. |
Hmm... I was about to agree with you chasing down all the places were you'll need to manually define changes in type is going to be a huge pain. I actually just ran into a bug that would break your proposed solution. I'm using an intersect bed command -wo, which long story short outputs the bedline which pybedtools thinks is a sam file.
Both the parent bed files are fine, but the child breaks on implicit construction. Can you define the type of the object after creation, but before running though the create_interval_from_list function? |
Yep, that's what the original proposal was: you could set the |
Imagine we have two bed files, and we do this:
The resulting file could look like this, is incorrectly guessed to be SAM format:
When we try to iterate over it, we get an
OverflowError
. Currently the fix is to make the name field a non-integer before doing the intersection:While we could get more fancy with detecting SAM, I don't want to go the route of checking against a regex for every field in every line of a file for pathological cases like this. Instead, it would be useful to set the filetype on the
BedTool
object and use that to short-circuit thecreate_interval_from_fields
heuristics. So then you could do this:The text was updated successfully, but these errors were encountered: