An Approach for Ontology Based Information Extraction

Open access

Abstract

This paper presents an approach to Ontology-Based Information Extraction (OBIE) from unstructured text in Bulgarian. The proposed method and algorithm extract data automatically from text documents by exploiting ontologies. To this end, a dictionary-based lemmatizer for Bulgarian has been developed and integrated alongside the standard language-processing tools of a free, open-source software package. The lemmatizer is distributed as free software and is publicly available to download and use under the GPL v3 license. Given the specifics of inflection in Bulgarian, the lemmatization tools are expected to improve the results of the POS tagger. The approach makes it possible to build a dynamically created gazetteer that, combined with a few other generic GATE resources, can produce ontology-based annotations over the given content with respect to the given ontology. The algorithm can also be used in content creation and in the management of information and knowledge.
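The idea of combining a dictionary-based lemmatizer with a gazetteer derived from an ontology can be illustrated with a minimal sketch. It is not the authors' implementation and does not use GATE. The toy lemma dictionary, the ontology labels, and the function names `lemmatize`, `build_gazetteer` and `annotate` are all hypothetical, introduced only for illustration.

```python
import re

# Hypothetical toy dictionary mapping inflected Bulgarian forms to lemmas.
# A real dictionary-based lemmatizer would load a much larger lexicon.
LEMMA_DICT = {
    "книгата": "книга",
    "книги": "книга",
    "градове": "град",
    "града": "град",
}

# Hypothetical ontology fragment: class name -> labels of that class.
ONTOLOGY_LABELS = {
    "Book": ["книга"],
    "City": ["град"],
}


def lemmatize(token):
    """Return the dictionary lemma of a token, or the lowercased token if unknown."""
    lowered = token.lower()
    return LEMMA_DICT.get(lowered, lowered)


def build_gazetteer(ontology_labels):
    """Build a lemma -> ontology class lookup from the ontology labels."""
    gazetteer = {}
    for cls, labels in ontology_labels.items():
        for label in labels:
            gazetteer[lemmatize(label)] = cls
    return gazetteer


def annotate(text, gazetteer):
    """Return (start, end, surface form, class) for each token whose lemma is in the gazetteer."""
    annotations = []
    for match in re.finditer(r"\w+", text):
        cls = gazetteer.get(lemmatize(match.group()))
        if cls is not None:
            annotations.append((match.start(), match.end(), match.group(), cls))
    return annotations


if __name__ == "__main__":
    gazetteer = build_gazetteer(ONTOLOGY_LABELS)
    for annotation in annotate("Книгата е за градове.", gazetteer):
        print(annotation)
```

Because lookup is performed on lemmas and not on surface forms, inflected forms such as "Книгата" and "градове" still match the ontology labels "книга" and "град". This is the reason lemmatization matters for a highly inflected language.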

Information Technologies and Control

The Journal of Institute of Information and Communication Technologies of Bulgarian Academy of Sciences
