Masked Sentence Model Based on BERT for Move Recognition in Medical Scientific Abstracts

Gaihong Yu; Zhixiong Zhang; Huan Liu; Liangping Ding

Open Access


Journal of Data and Information Science

Volume 4 (2019): Issue 4 (December 2019)


Published Online: Dec 27, 2019

Page range: 42 - 55

Received: Sep 27, 2019

Accepted: Nov 05, 2019

DOI: https://doi.org/10.2478/jdis-2019-0020

Keywords
Move recognition, BERT, Masked sentence model, Scientific abstracts

© 2019 Gaihong Yu, Zhixiong Zhang, Huan Liu, Liangping Ding, published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.

Purpose

Move recognition in scientific abstracts is an NLP task that classifies the sentences of an abstract into different types of language units. To improve the performance of move recognition in scientific abstracts, we propose a novel move recognition model that outperforms the BERT-based method.

Design/methodology/approach

Prevalent BERT-based models for sentence classification often classify each sentence without considering its context. In this paper, inspired by the BERT masked language model (MLM), we propose a novel model called the masked sentence model that integrates the content and contextual information of the sentences in move recognition. Experiments are conducted on the benchmark dataset PubMed 20K RCT in three steps. We then compare our model with the HSLN-RNN, BERT-based, and SciBERT models on the same dataset.
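The idea of combining a sentence's content with its context can be illustrated with a minimal sketch. The abstract does not specify the exact input format, so everything here is an assumption for illustration: the function name `build_masked_sentence_inputs` is hypothetical, and the choice to replace the target sentence with a single `[MASK]` placeholder inside the abstract is one plausible reading of "masked sentence", not necessarily the authors' construction.

```python
from typing import List, Tuple

# Assumption: reuse BERT's mask token as a placeholder for the whole target sentence.
MASK_TOKEN = "[MASK]"


def build_masked_sentence_inputs(sentences: List[str]) -> List[Tuple[str, str]]:
    """Pair each sentence (content) with the abstract in which it is masked (context)."""
    pairs = []
    for i, target in enumerate(sentences):
        # Rebuild the abstract, substituting the placeholder for the target sentence.
        context = " ".join(
            MASK_TOKEN if j == i else sentence
            for j, sentence in enumerate(sentences)
        )
        pairs.append((target, context))
    return pairs


# Toy abstract, invented for illustration only.
abstract = [
    "We study move recognition.",
    "We fine-tune a BERT model.",
    "The model improves F1.",
]
pairs = build_masked_sentence_inputs(abstract)
for content, context in pairs:
    print(content, "|", context)
```

Each (content, context) pair could then be passed to a BERT-style tokenizer as a two-segment input, which changes only the input layer and leaves the network itself untouched.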

Findings

The F1 score of our model exceeds those of the BERT-based and SciBERT models by 4.96% and 4.34%, respectively, which shows the feasibility and effectiveness of the novel model. Its result also comes closest to the current state-of-the-art result of HSLN-RNN.

Research limitations

The sequential features of move labels are not considered, which might be one of the reasons why HSLN-RNN has better performance. Our model is restricted to dealing with biomedical English literature because we use a dataset from PubMed, which is a typical biomedical database, to fine-tune our model.

Practical implications

The proposed model is a better-performing and simpler way to identify move structures in scientific abstracts, and is worth trying in text classification experiments that need to capture the contextual features of sentences.

Originality/value

The study proposes a masked sentence model based on BERT that considers the contextual features of the sentences in abstracts in a new way. The performance of this classification model is significantly improved by rebuilding the input layer without changing the structure of the neural network.

eISSN: 2543-683X
Language: English

Publication timeframe: 4 times per year
Journal Subjects: Computer Sciences, Information Technology, Project Management, Databases and Data Mining


Article Category: Research Paper
