Missing social history makes automated medical coding challenging #1663
Comments
Thanks for raising this. I wasn't aware of it personally, and it's an unfortunate side effect. It's open for discussion, but I think removing the social history section remains useful for de-identification given the expansion of the dataset to ED patients. The approach could be improved by applying NER to the raw PHI note and allowing through only the medically relevant segments of the social history (alcohol use, smoking, drug use). It's unlikely we'll get to that any time soon though, sorry.
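To make the idea of passing through only the medically relevant segments concrete, here is a rough sketch. It is not the MIMIC de-identification pipeline, and it swaps the suggested NER model for a plain keyword allowlist to stay self-contained. The pattern and the function name `filter_social_history` are hypothetical. Kept sentences could still contain PHI (names, places), so they would need the normal de-identification pass as well.

```python
import re

# Hypothetical allowlist of substance-use terms. A real system would use a
# trained NER model, as suggested above, rather than keyword matching.
SUBSTANCE_PATTERN = re.compile(
    r"\b(smok\w*|tobacco|cigarette\w*|nicotine|alcohol\w*|etoh|drink\w*"
    r"|drug\w*|ivdu|cocaine|heroin|marijuana)\b",
    re.IGNORECASE,
)


def filter_social_history(section_text: str) -> str:
    """Keep only sentences that mention substance use; drop everything else."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", section_text.strip())
    kept = [s for s in sentences if SUBSTANCE_PATTERN.search(s)]
    return " ".join(kept)


example = (
    "Lives with wife in Boston. Works as a teacher. "
    "Former smoker, quit 10 years ago. Drinks alcohol socially."
)
print(filter_social_history(example))
```

In this example the sentences about residence and occupation are dropped, while the smoking and alcohol sentences are kept.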
Description
Automated medical coding (also called medical code prediction) is a growing machine learning task that aims to predict medical codes from a discharge summary, and MIMIC-IV has become a popular dataset for training and evaluating such models. However, because the de-identification algorithm removed the social history section, some annotated medical codes are impossible to predict. For instance, codes describing the patient's smoking status (e.g., F17.210 and Z87.891) are often annotated in MIMIC-IV without smoking being mentioned anywhere in the discharge summary, because that information was in the removed social history.
As a result, models are trained on labels that cannot be predicted from the text, and they are penalized unfairly at evaluation whenever the necessary information would have been in the social history. This makes MIMIC-IV a noisier dataset for automated medical coding than MIMIC-III, which retains the social history.
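One way to put a number on this is to count how many notes carry a smoking code but contain no smoking-related term. The sketch below assumes a hypothetical input layout of `(note_text, codes)` pairs, with codes already in the dotted form used above; adjust both to match how your copy of the data stores them. The keyword list is illustrative, not exhaustive.

```python
import re

SMOKING_CODES = {"F17.210", "Z87.891"}  # the two codes named in this issue
# Illustrative keyword list; extend as needed.
SMOKING_TERMS = re.compile(
    r"\b(smok\w*|tobacco|cigarette\w*|nicotine)\b", re.IGNORECASE
)


def unsupported_smoking_rate(records):
    """Fraction of smoking-coded notes whose text never mentions smoking.

    `records` is an iterable of (note_text, set_of_codes) pairs
    (a hypothetical layout, not a MIMIC table schema).
    """
    coded_texts = [text for text, codes in records if codes & SMOKING_CODES]
    if not coded_texts:
        return 0.0
    missing = sum(1 for text in coded_texts if not SMOKING_TERMS.search(text))
    return missing / len(coded_texts)


toy_records = [
    ("Pt with COPD exacerbation. No acute distress.", {"F17.210", "J44.1"}),
    ("Former smoker with CAD, admitted for chest pain.", {"Z87.891"}),
    ("Hip fracture after mechanical fall.", {"S72.001A"}),
]
print(unsupported_smoking_rate(toy_records))
```

On the toy records, two notes carry a smoking code and one of them has no supporting text, so the function returns 0.5. The toy notes are made up for illustration and are not MIMIC data.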
Is there a way to de-identify the discharge summaries without removing the social histories?