Analysing the Methods of Dzongkha Word Segmentation

Parshu Ram Dhungyel; Jānis Grundspeņķis

Open Access

Published Online: Jun 13, 2017

Applied Computer Systems

Volume 21 (2017): Issue 1 (May 2017)

Page range: 61 - 65

DOI: https://doi.org/10.1515/acss-2017-0008

Keywords
Dzongkha word segmentation, maximal matching, n-gram, natural language processing

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.

In both Chinese and Dzongkha, the greatest challenge is identifying word boundaries, because neither language marks them with delimiters as English and other Western languages do. Preprocessing and word segmentation are therefore the first steps in Dzongkha language processing tasks such as translation, spell-checking, and information retrieval. Chinese word segmentation has been studied for a long time and is relatively mature, whereas Dzongkha word segmentation has received far less attention from researchers. In this paper, we investigate this major problem in Dzongkha language processing using a probabilistic approach that selects valid segments, with the probabilities computed from a corpus.
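The abstract does not spell out the model, so the following is only a minimal sketch of what corpus-based probabilistic segment selection can look like, not the authors' method. It assumes a unigram model with add-one smoothing, input that has already been split into syllables, and a toy lexicon with placeholder Latin syllables; the names `candidate_segmentations`, `best_segmentation`, and `toy_counts` are hypothetical.

```python
from collections import Counter
from math import log


def candidate_segmentations(syllables, lexicon, max_len=4):
    """Yield every split of `syllables` into lexicon words.

    A single syllable is always accepted as a fallback word, so at
    least one segmentation exists for any input.
    """
    if not syllables:
        yield []
        return
    for n in range(1, min(max_len, len(syllables)) + 1):
        word = tuple(syllables[:n])
        if n == 1 or word in lexicon:
            for rest in candidate_segmentations(syllables[n:], lexicon, max_len):
                yield [word] + rest


def log_prob(segmentation, counts, total, vocab_size):
    """Unigram log-probability with add-one smoothing (an assumption)."""
    return sum(log((counts[w] + 1) / (total + vocab_size)) for w in segmentation)


def best_segmentation(syllables, counts):
    """Return the candidate segmentation with the highest probability."""
    lexicon = set(counts)
    total = sum(counts.values())
    vocab_size = len(lexicon) + 1  # one extra slot for unseen words
    return max(
        candidate_segmentations(syllables, lexicon),
        key=lambda seg: log_prob(seg, counts, total, vocab_size),
    )


# Toy corpus counts; keys are words written as tuples of syllables.
toy_counts = Counter({("a", "b"): 5, ("c",): 3, ("b", "c"): 1, ("a",): 1})
result = best_segmentation(["a", "b", "c"], toy_counts)
print(result)  # [('a', 'b'), ('c',)]
```

In this toy example the three candidate splits are scored against the corpus counts, and the split containing the frequent two-syllable word wins. A real system would likely replace the exhaustive enumeration with dynamic programming and could use higher-order n-grams, as the keywords suggest.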

eISSN: 2255-8691
Language: English
Publication timeframe: 2 times per year
Journal Subjects: Computer Sciences, Artificial Intelligence, Information Technology, Project Management, Software Development

© Riga Technical University
