Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Topological correctness #54

Open
wants to merge 16 commits into
base: main
Choose a base branch
from
Open

Topological correctness #54

wants to merge 16 commits into from

Conversation

dkhofer
Copy link
Collaborator

@dkhofer dkhofer commented Sep 25, 2024

No description provided.

fixtures/aa.gfa Outdated
S 1 A SN:Z:123 SO:i:0 SR:i:0
S 2 A SN:Z:123 SO:i:0 SR:i:0
L 1 + 2 + *
P 124 1+,2+ 4M
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure what the parser would do, but this should be 1M (1 match)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Shouldn't it be 0M for overlap of length 0 if it's representing the sequence AA?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it should be. I knew 4m was way too much :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -57,13 +64,16 @@ pub fn import_fasta(
.sequence(&sequence)
.save(conn)
};
let node = Node::create(conn, &seq.hash);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if i import the same file 2x, we'll have duplicate nodes + edges made?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. It's not great, but I don't see it as blocking. I think properly detecting the duplicates in general would take some work, worth doing long term.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it that much of a problem on the fasta import side? When would people import the same fasta file and expect them to collapse onto the same nodes and edges? I can understand on the vcf import you'd eventually want a heuristic that says something like "for a pair of edges (p, q), if there is already a pair of edges (a, b) with a shared node id (from the reference path) and identical coordinates on all side, then check to see if the new node has the same hash as the old node."

);
let _ = conn.execute(&insert_statement, ());
Node {
id: conn.last_insert_rowid() as i32,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason to do this over returning id?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I genuinely thought I was using the full node somewhere, but I guess not. Changed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants