Electronic Health Records (EHRs) have been heavily used to predict downstream clinical outcomes such as readmission or mortality. One of the modalities in EHRs, clinical notes, has not been fully explored for these tasks because of its unstructured and hard-to-interpret nature. Although recent advances in deep learning (DL) enable models to extract interpretable features from unstructured data, these models often require a large amount of training data. However, many tasks in medical domains inherently involve small samples with lengthy documents; in kidney transplantation, for example, data from only a few thousand patients are available, and each patient’s document consists of a couple of million words in major hospitals. Thus, complex DL methods are difficult to apply to these kinds of domains. In this paper, we present a comprehensive ensemble model using vector space modeling and topic modeling. Our proposed model is evaluated on the readmission task for kidney transplant patients and improves the c-statistic by 0.0211 over the previous state-of-the-art approach using structured data, while typical DL methods fail to beat this approach. The proposed architecture provides an interpretable score for each feature from both modalities, structured and unstructured data, which is shown to be meaningful through a physician’s evaluation.
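For readers unfamiliar with the metric, the c-statistic is the probability that a randomly chosen readmitted patient receives a higher predicted score than a randomly chosen non-readmitted patient; for a binary outcome it equals the area under the ROC curve. The following is a minimal sketch of that definition only. The function name `c_statistic` and the toy labels and scores are illustrative assumptions and are not taken from the paper or its data.

```python
from itertools import product


def c_statistic(labels, scores):
    """Concordance statistic for a binary outcome.

    labels: iterable of 0/1 outcomes (1 = readmitted, by assumption here).
    scores: iterable of predicted risk scores, one per patient.
    Returns the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied scores count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    concordant = 0.0
    for p, n in product(pos, neg):
        if p > n:
            concordant += 1.0
        elif p == n:
            concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Toy data (hypothetical): five patients, two of whom were readmitted.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.3, 0.2, 0.6]
toy_c = c_statistic(toy_labels, toy_scores)
```

On this toy data, four of the six positive/negative pairs are ranked correctly, so `toy_c` is 4/6. An improvement of 0.0211 therefore means that roughly two more pairs in every hundred are ordered correctly.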
Proceedings of the IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI) / 2019