try cases where .hap file does not follow format exactly #51

Closed
4 tasks done
aryarm opened this issue Jun 17, 2022 · 5 comments

Comments

@aryarm
Member

aryarm commented Jun 17, 2022

The .hap file format is described in our documentation here

The data.haplotypes.Haplotypes class is responsible for reading and writing .hap files. We have test cases to ensure the class works properly.

But what does our code do in situations where the .hap file isn't formatted properly and doesn't follow the specification? Can we give helpful error messages?

Here are some things to check:

  • what happens when there's an extra field that isn't declared in the header?
    • (related) what happens when there are more extra fields than declared in the header?
  • what happens when there are strings in fields where there are supposed to be ints or floats?
  • what happens when there's an unsupported line type?

Once we know what happens in each case, we should either

  1. ignore the issue in our code but add an error message telling the user to run the validate subcommand
  2. raise a ValueError exception that kills the current process
  3. just ignore the issue altogether

We should only do item 2 if the issue is especially egregious and will prevent us from continuing. And we should do item 3 if it would be too cumbersome to report the error. But in either case 1 or 2, we could create test cases to ensure these error messages appear.
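As a rough illustration of the difference between options 1 and 2, here is a minimal sketch. The helper `parse_int_field`, its `strict` flag, and the logger name are hypothetical and are not part of haptools; they only show how the same malformed value could either be reported with a pointer to the validate subcommand or raise a ValueError.

```python
import logging

# Hypothetical logger name, used only for this sketch
logger = logging.getLogger("hap_sketch")


def parse_int_field(value, field_name, strict=False):
    """Convert a text field to an int, reporting a malformed value.

    Hypothetical helper (not a haptools function):
    strict=True mirrors option 2 (raise a ValueError),
    strict=False mirrors option 1 (log a message and carry on).
    """
    try:
        return int(value)
    except ValueError:
        msg = (
            f"Field '{field_name}' should be an integer but got {value!r}. "
            "Please run the validate subcommand."
        )
        if strict:
            # Option 2: stop the current process with a clear message
            raise ValueError(msg) from None
        # Option 1: report the problem but keep going
        logger.error(msg)
        return None
```

A test for option 2 could then assert that the ValueError is raised, and a test for option 1 could capture the log output and check for the message.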

@aryarm changed the title from "test cases where .hap file does not follow format exactly" to "try cases where .hap file does not follow format exactly" on Jul 28, 2022
@aryarm
Member Author

aryarm commented Jul 28, 2022

@ciarareeve , if you choose to do this issue, can you write a small/informal report detailing your findings here within a comment to this issue?

@ciarareeve
Collaborator

ciarareeve commented Aug 8, 2022

If the file contains an additional field that haptools does not anticipate for a given subcommand, haptools raises a KeyError. If a field contains a string where a float or int is required, haptools raises a ValueError. Lastly, the behavior for an unsupported line type (a line beginning with anything other than H, V, or #) depends on where the line appears. If it is the first row following the header, an IndexError is raised. If it is any later row, the line is silently ignored, and the output is identical to what would be produced if the user had supplied an expected line type.

@aryarm
Member Author

aryarm commented Aug 9, 2022

looks great, @ciarareeve! thanks for doing all of this

Probably all of these are ok. The user will encounter these errors and then probably just run the validate command and get more informative error messages. If anything, the unsupported line type being ignored is probably the only thing we'll need to fix.
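One possible way to catch the silently ignored case is to scan for unsupported line types before parsing. The sketch below is only an illustration: `find_unsupported_lines` is a hypothetical helper, not a haptools function, and it assumes that fields are tab-delimited and that the only supported line types are the ones named in the report above (H, V, and lines starting with #).

```python
# Line types named in the report above; any line starting with '#' is a header
SUPPORTED_LINE_TYPES = {"H", "V"}


def find_unsupported_lines(lines):
    """Return (line_number, text) pairs for lines with an unsupported type.

    Hypothetical helper. Assumes tab-delimited fields, with the line type
    as the first field.
    """
    bad = []
    for num, line in enumerate(lines, start=1):
        stripped = line.rstrip("\n")
        if not stripped:
            continue  # assumption: blank lines are skipped, not reported
        line_type = stripped.split("\t", 1)[0]
        if line_type.startswith("#"):
            continue  # header or comment line
        if line_type not in SUPPORTED_LINE_TYPES:
            bad.append((num, stripped))
    return bad
```

Because the check reports line numbers, the caller can point the user at the exact offending row regardless of whether it is the first row after the header or a later one.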

If it's easy to do (and only if it is!), it would be helpful if you could attach stack-traces for each of these Exceptions. Basically, it would be useful to know

  1. What is the error message for each exception?
  2. Which line of code does the error originate from? (Can you link to it?)

That way, I can potentially wrap those exceptions with a try-except block and then error out with a message that's more useful, like "Your .hap file is malformed. Please run the validate subcommand."
Apologies for not asking for these beforehand. I didn't really realize it would be useful to have them until now.
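The try-except wrapping described above could look something like the following sketch. The function `read_with_friendly_errors` and its `read_func` parameter are hypothetical stand-ins for whatever code actually reads the .hap file; the exception types are the three reported earlier in this thread.

```python
def read_with_friendly_errors(read_func, path):
    """Call read_func(path) and replace low-level errors with a clearer one.

    Hypothetical wrapper: read_func stands in for whatever function reads
    the .hap file. It is not a haptools API.
    """
    try:
        return read_func(path)
    except (KeyError, ValueError, IndexError) as err:
        # Chain the original exception so its stack trace is still available
        raise ValueError(
            "Your .hap file is malformed. Please run the validate subcommand."
        ) from err
```

Chaining with `from err` keeps the original exception attached as `__cause__`, so the underlying error message and the line it originated from remain visible when debugging.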

@ciarareeve
Collaborator

ciarareeve commented Aug 10, 2022

Here is a screenshot for the case of an additional field. It shows a ValueError rather than a KeyError, either because you have made changes since then or because I had unsaved changes when I ran the test on my end. Sorry about that:

Image

This is for a syntax mismatch:

Image

Unsupported line type starting with first row:

Image

And an unsupported line type in any following row (no error):

Image

If you would like this copy&pasted/typed out as more of a report please let me know.

@aryarm
Member Author

aryarm commented Aug 10, 2022

> If you would like this copy&pasted/typed out as more of a report please let me know.

No, this is perfect - it's exactly what I was looking for!

I'll go ahead and use this info to update the TODO for #47

@aryarm aryarm closed this as completed Aug 10, 2022