With the vast amount of biomedical literature available online, doctors have the benefit of consulting the literature before making clinical decisions, but they face the daunting task of finding needles in haystacks. In this situation, it would be of great use to doctors if an effective clinical decision support system were available to generate accurate queries and return a manageable set of highly useful articles. Existing studies have shown the usefulness of patients’ diagnosis information in supporting effective retrieval of relevant literature, but such diagnosis information is missing in most cases. Furthermore, existing diagnosis prediction systems mainly focus on predicting a small range of diseases from well-formatted features, and it remains a great challenge to perform large-scale automatic diagnosis prediction based on noisy patient medical records. In this paper, we propose automatic diagnosis prediction methods for enhancing retrieval in a clinical decision support system, where the prediction is based on evidence automatically collected from publicly accessible online knowledge bases such as Wikipedia and the Semantic MEDLINE Database (SemMedDB). The assumption is that relevant diseases and their corresponding symptoms co-occur more frequently in these knowledge bases. Our methods use a Markov Random Field (MRF) model to identify diagnosis candidates in the knowledge bases, and their performance was evaluated using test collections from the Clinical Decision Support (CDS) track in TREC 2014, 2015, and 2016. The results show that our methods can automatically predict diagnoses with about 75% accuracy, and that such predictions can significantly improve the retrieval of related biomedical literature. Our methods generate retrieval results comparable to those of state-of-the-art methods, which rely on much more complicated techniques and some manually crafted medical knowledge.
One possible direction for future work is to apply these methods in collaboration with practicing doctors.
Note: A portion of this work was published as a poster at iConference 2017, where it won the best poster award. This paper greatly expands the research scope of that poster.