new implementation of overlap management #14

PascalCrepey · 2019-10-17T08:44:11Z

As @ClementMassonnaud said, we need to look at the new version of the adjust_overlapping_stays() function proposed with commit #50200db

"[...] the two versions give the same results on all the tests I wrote. Considering the current version though, I'm not sure to understand the purpose of the iterator and how it works. Because I created a fake database with all the types of overlapping stays I could think of, with overlapping stays nested multiple times, etc. And it seems that the function adjusts for those stays correctly with only one pass in the while loop... Maybe you can take a look at the tests and see if I'm missing something.

Anyway, the new function works quite similarly to the current one, but with a slight change that allows to adjust partial overlaps with either admission or discharge leading. Also, it's using a bit more of the useful data.table syntax which makes the function shorter.

If the new function is not missing anything important that I might have missed, I would propose to use it as a default, mainly for the two advantages I mentioned."

tjibbed · 2019-10-21T10:14:38Z

Hi @ClementMassonnaud ,

I'm currently checking your new implementation of the overlapping admission algorithm. (I am using a subset of a real dataset, so no simulated data at this moment). I've run into a couple of issues that might need our attention:

It's quite slow. The original method takes:

user system elapsed
1.413 0.000 0.751

whereas the new implementation times at:

user system elapsed
84.214 0.039 83.084

I think this is mainly die to the "per subject" calculation of overlaps, although I haven't checked this. However, I used to use the by=sID method before switching to the full list method. @PascalCrepey taught me this, and I guess you know it too. It's quite fast... basically you calculate all differences between record N and N+1 and exclude any differences where sID in N is not the sID in N+1. Isn't that possible here as well?

over = base[, list("left" = .SD[, Adate][-1] - .SD[, Ddate][-.N], "right" = .SD[, Ddate][-1] - .SD[, Ddate][-.N], "I" = .I[-1]), by = sID]

The checked database isn't final. It can still contain overlapping admissions, and re-running the function on the corrected database sometimes delivers a new corrected database that is slightly different.
This can be caused by staggered hospital stays or sequential overlaps, e.g.
Facility A: |--------------------------------------------|
Facility B: |----| |----| |----|

Because overlap detection is based on successive records, sorted on patient, admission, and discharge, the algorithm will only detect and correct the first instance, leaving two overlaps:
Facility A: |--------| |------------------------------|
Facility B: |----| |----| |----|

The next run, the following overlap is detected and corrected:
Facility A: |--------| |-------| |-------------------|
Facility B: |----| |----| |----|

Etc.

This is the original method uses the iterations, and I think this could be done in the new method as well.

Some of the newly create records have negative lengths of stay. This not a big issue, as they can easily be filtered out.
If I run the algorithm until it gives the final answer, the answer is not the same as the original method. The new method gives far more records. I am currently going through the database to see what the correct answer should be. As it is real data, this does take some time, but I think the extra "dirt" in the real data may help us find the potential issues here.

We might need to think of a way to also use a real dataset (or subset thereof) to validate the methods.

Minor issue: it currently doesn't carry over auxiliary data. These might be needed further on, but this is relatively easy to implement.

tjibbed · 2019-10-21T10:18:41Z

My "beautiful" schematic of overlapping admission didn't align there... But I think you get the idea.

ClementMassonnaud · 2019-10-21T11:08:30Z

Hi,
Thank you for your feedback

It can still contain overlapping admissions, and re-running the function on the corrected database sometimes delivers a new corrected database that is slightly different. This can be caused by staggered hospital stays or sequential overlaps, e.g.

I tried to think about the different types of possible overlaps, but I missed this type. I now see what is the issue... I usually try to avoid 'while' loops as much as possible, but here it seems we don't have much choice indeed

We might need to think of a way to also use a real dataset (or subset thereof) to validate the methods.

Yes, I think it is a key point, to make sure the function deals correctly with every possible scenario. I wrote tests in tests/testthat/test-adjust_overlaps.R. I manually created a fake database with all the specific overlaps I could think of, listing them, and anticipating what the correct result should be. I think that it would be a good idea to use the other types of overlaps from a real database, as you suggested, and then 'merge' them with the current test database to complete the tests.

Minor issue: it currently doesn't carry over auxiliary data. These might be needed further on, but this is relatively easy to implement.

I'm not sure I understand what you mean by auxiliary data. If by that you mean additional columns in the database, I don't understand where is the issue

basically you calculate all differences between record N and N+1 and exclude any differences where sID in N is not the sID in N+1. Isn't that possible here as well?

Yes this is interesting, I missed that. I don't see why it would not be possible here as well. I am currently on the road, I will look into it ASAP.

PascalCrepey · 2019-11-15T09:01:40Z

It seems to me we need to remain on @tjibbed original implementation, with the loop. Am I right ? does it need further optimization ? Can we merge @ClementMassonnaud data.table optimization into @tjibbed implementation ? or should we simply close this issue ?

ClementMassonnaud · 2019-11-15T09:33:35Z

Yes we do need the loop I think
But I think we should also implement the optimizations I proposed.
I'll merge the two versions and close this issue ASAP

PascalCrepey added enhancement New feature or request help wanted Extra attention is needed labels Oct 17, 2019

PascalCrepey added this to the v0.9.2 milestone Oct 17, 2019

PascalCrepey assigned ClementMassonnaud and tjibbed Oct 17, 2019

PascalCrepey modified the milestones: v0.9.2, v0.9.3 Nov 22, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

new implementation of overlap management #14

new implementation of overlap management #14

PascalCrepey commented Oct 17, 2019

tjibbed commented Oct 21, 2019 •

edited

Loading

tjibbed commented Oct 21, 2019

ClementMassonnaud commented Oct 21, 2019

PascalCrepey commented Nov 15, 2019

ClementMassonnaud commented Nov 15, 2019

new implementation of overlap management #14

new implementation of overlap management #14

Comments

PascalCrepey commented Oct 17, 2019

tjibbed commented Oct 21, 2019 • edited Loading

tjibbed commented Oct 21, 2019

ClementMassonnaud commented Oct 21, 2019

PascalCrepey commented Nov 15, 2019

ClementMassonnaud commented Nov 15, 2019

tjibbed commented Oct 21, 2019 •

edited

Loading