A Semantic Multi-Field Clinical Search for Patient Medical Records

E Umamaheswari Vasanthakumar 1  and Francis Bond 1
  • 1 Linguistics and Multilingual Studies, School of Humanities, Nanyang Technological University, Singapore


A semantic search engine for clinical data would substantially help hospitals support clinical practitioners. Because patients' electronic medical records contain a wide variety of information, meaningful patterns need to be extracted from the Patient Medical Records (PMR). The proposed work links patients to relevant clinical practice guidelines (CPGs) by matching their medical records against the CPGs. However, in both PMRs and CPGs, the information on symptoms, diseases, diagnostic procedures and medicines is unstructured, so it must be pre-processed and indexed in a meaningful way. To reduce the manual effort of matching records to guidelines, this work automatically extracts the clinical guidelines from PDF documents using a set of regular expression rules and indexes them in a multi-field Lucene index. We experimented with both a multi-field Lucene search and an ontology-based advanced search, in which the PMR is mapped to the SNOMED core subset to identify the important concepts. We found that the ontology-based search engine gave more meaningful results for specific queries than the term-based search.
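The pipeline summarized in the abstract (regular-expression extraction of guideline sections followed by a multi-field index) can be illustrated with a minimal sketch. This is not the authors' implementation and does not use Lucene: the section headings, the field weights, the `MultiFieldIndex` class and the sample guideline texts are all hypothetical, chosen only to show how field-aware matching might rank guidelines for a query drawn from a patient record.

```python
import re
from collections import defaultdict

# Hypothetical heading rule; the paper's actual regular expressions are not given.
SECTION_RE = re.compile(
    r"^(Symptoms|Diagnosis|Treatment):[ \t]*(.*)$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_fields(text):
    """Split guideline text into {field: content} using the heading rule."""
    return {m.group(1).lower(): m.group(2).strip() for m in SECTION_RE.finditer(text)}


def tokenize(s):
    """Lowercase and split into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", s.lower())


class MultiFieldIndex:
    """Toy in-memory stand-in for a multi-field index (assumption, not Lucene)."""

    def __init__(self):
        # (field, term) -> set of document ids containing the term in that field
        self.postings = defaultdict(set)

    def add(self, doc_id, fields):
        for field, content in fields.items():
            for term in tokenize(content):
                self.postings[(field, term)].add(doc_id)

    def search(self, query, field_weights):
        """Score each document by summing the weight of every field hit."""
        scores = defaultdict(float)
        for term in tokenize(query):
            for field, weight in field_weights.items():
                for doc_id in self.postings.get((field, term), ()):
                    scores[doc_id] += weight
        # Highest score first; ties broken by document id
        return sorted(scores, key=lambda d: (-scores[d], d))


# Made-up guideline snippets, for illustration only
cpg_a = "Symptoms: chest pain, shortness of breath\nDiagnosis: ECG\nTreatment: aspirin"
cpg_b = "Symptoms: cough, fever\nDiagnosis: chest X-ray\nTreatment: antibiotics"

index = MultiFieldIndex()
index.add("cpg_a", extract_fields(cpg_a))
index.add("cpg_b", extract_fields(cpg_b))

# Weight symptom matches above diagnosis matches (arbitrary illustrative weights)
ranked = index.search("chest pain", {"symptoms": 2.0, "diagnosis": 1.0})
print(ranked)
```

In this sketch a term matching in the symptoms field counts for more than the same term in the diagnosis field, which is the kind of distinction a single flat text index cannot make. An ontology-based variant would first map query terms to concept identifiers before lookup; that step is omitted here.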



