Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new implementation of overlap management #14

Open
PascalCrepey opened this issue Oct 17, 2019 · 5 comments
Open

new implementation of overlap management #14

PascalCrepey opened this issue Oct 17, 2019 · 5 comments
Assignees
Labels
enhancement New feature or request help wanted Extra attention is needed
Milestone

Comments

@PascalCrepey
Copy link
Owner

As @ClementMassonnaud said, we need to look at the new version of the adjust_overlapping_stays() function proposed with commit #50200db

"[...] the two versions give the same results on all the tests I wrote. Considering the current version though, I'm not sure to understand the purpose of the iterator and how it works. Because I created a fake database with all the types of overlapping stays I could think of, with overlapping stays nested multiple times, etc. And it seems that the function adjusts for those stays correctly with only one pass in the while loop... Maybe you can take a look at the tests and see if I'm missing something.

Anyway, the new function works quite similarly to the current one, but with a slight change that allows to adjust partial overlaps with either admission or discharge leading. Also, it's using a bit more of the useful data.table syntax which makes the function shorter.

If the new function is not missing anything important that I might have missed, I would propose to use it as a default, mainly for the two advantages I mentioned."

@PascalCrepey PascalCrepey added enhancement New feature or request help wanted Extra attention is needed labels Oct 17, 2019
@PascalCrepey PascalCrepey added this to the v0.9.2 milestone Oct 17, 2019
@tjibbed
Copy link
Collaborator

tjibbed commented Oct 21, 2019

Hi @ClementMassonnaud ,

I'm currently checking your new implementation of the overlapping admission algorithm. (I am using a subset of a real dataset, so no simulated data at this moment). I've run into a couple of issues that might need our attention:

  • It's quite slow. The original method takes:

user system elapsed
1.413 0.000 0.751

whereas the new implementation times at:

user system elapsed
84.214 0.039 83.084

I think this is mainly die to the "per subject" calculation of overlaps, although I haven't checked this. However, I used to use the by=sID method before switching to the full list method. @PascalCrepey taught me this, and I guess you know it too. It's quite fast... basically you calculate all differences between record N and N+1 and exclude any differences where sID in N is not the sID in N+1. Isn't that possible here as well?

over = base[, list("left" = .SD[, Adate][-1] - .SD[, Ddate][-.N], "right" = .SD[, Ddate][-1] - .SD[, Ddate][-.N], "I" = .I[-1]), by = sID]

  1. The checked database isn't final. It can still contain overlapping admissions, and re-running the function on the corrected database sometimes delivers a new corrected database that is slightly different.
    This can be caused by staggered hospital stays or sequential overlaps, e.g.
    Facility A: |--------------------------------------------|
    Facility B: |----| |----| |----|

Because overlap detection is based on successive records, sorted on patient, admission, and discharge, the algorithm will only detect and correct the first instance, leaving two overlaps:
Facility A: |--------| |------------------------------|
Facility B: |----| |----| |----|

The next run, the following overlap is detected and corrected:
Facility A: |--------| |-------| |-------------------|
Facility B: |----| |----| |----|

Etc.

This is the original method uses the iterations, and I think this could be done in the new method as well.

  1. Some of the newly create records have negative lengths of stay. This not a big issue, as they can easily be filtered out.

  2. If I run the algorithm until it gives the final answer, the answer is not the same as the original method. The new method gives far more records. I am currently going through the database to see what the correct answer should be. As it is real data, this does take some time, but I think the extra "dirt" in the real data may help us find the potential issues here.

We might need to think of a way to also use a real dataset (or subset thereof) to validate the methods.

  1. Minor issue: it currently doesn't carry over auxiliary data. These might be needed further on, but this is relatively easy to implement.

@tjibbed
Copy link
Collaborator

tjibbed commented Oct 21, 2019

My "beautiful" schematic of overlapping admission didn't align there... But I think you get the idea.

@ClementMassonnaud
Copy link
Collaborator

Hi,
Thank you for your feedback

It can still contain overlapping admissions, and re-running the function on the corrected database sometimes delivers a new corrected database that is slightly different. This can be caused by staggered hospital stays or sequential overlaps, e.g.

I tried to think about the different types of possible overlaps, but I missed this type. I now see what is the issue... I usually try to avoid 'while' loops as much as possible, but here it seems we don't have much choice indeed

We might need to think of a way to also use a real dataset (or subset thereof) to validate the methods.

Yes, I think it is a key point, to make sure the function deals correctly with every possible scenario. I wrote tests in tests/testthat/test-adjust_overlaps.R. I manually created a fake database with all the specific overlaps I could think of, listing them, and anticipating what the correct result should be. I think that it would be a good idea to use the other types of overlaps from a real database, as you suggested, and then 'merge' them with the current test database to complete the tests.

Minor issue: it currently doesn't carry over auxiliary data. These might be needed further on, but this is relatively easy to implement.

I'm not sure I understand what you mean by auxiliary data. If by that you mean additional columns in the database, I don't understand where is the issue

basically you calculate all differences between record N and N+1 and exclude any differences where sID in N is not the sID in N+1. Isn't that possible here as well?

Yes this is interesting, I missed that. I don't see why it would not be possible here as well. I am currently on the road, I will look into it ASAP.

@PascalCrepey
Copy link
Owner Author

It seems to me we need to remain on @tjibbed original implementation, with the loop. Am I right ? does it need further optimization ? Can we merge @ClementMassonnaud data.table optimization into @tjibbed implementation ? or should we simply close this issue ?

@ClementMassonnaud
Copy link
Collaborator

Yes we do need the loop I think
But I think we should also implement the optimizations I proposed.
I'll merge the two versions and close this issue ASAP

@PascalCrepey PascalCrepey modified the milestones: v0.9.2, v0.9.3 Nov 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants