Clustering of Authors’ Texts of English Fiction in the Vector Space of Semantic Fields

Bohdan Pavlyshenko

Open Access

Published Online: Nov 05, 2014

Cybernetics and Information Technologies

Volume 14 (2014): Issue 3 (September 2014)

Page range: 25 - 36

DOI: https://doi.org/10.2478/cait-2014-0030

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.

This paper examines whether an author's idiolect can be differentiated in the space of semantic fields. It also analyzes the clustering of text documents in the vector space of semantic fields and in a semantic space with an orthogonal basis. The analysis shows that a vector space model based on semantic fields is effective for cluster analysis of authors' texts in English fiction. The distribution of authors' texts across the cluster structure reveals areas of the semantic space that represent the idiolects of individual authors; these areas correspond to clusters in which a single author dominates. Clusters in which the texts of several authors dominate can be regarded as areas of semantic similarity between authors' styles. SVD factorization of the semantic fields matrix makes it possible to reduce significantly the dimension of the semantic space used in the cluster analysis of authors' texts. Clustering in the vector space of semantic fields can therefore be useful in a comparative analysis of authors' styles and idiolects. The clusters of some authors' idiolects are semantically invariant: they do not depend on changes in the basis of the semantic space or in the clustering method.
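The pipeline summarized in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's actual setup: the three toy semantic fields and their word lists, the whitespace tokenizer, the names `field_vector`, `reduce_svd`, and `kmeans`, and the choice of k-means with a farthest-first initialization as the clustering method. Each text becomes a vector of relative semantic-field frequencies, SVD projects these vectors onto an orthogonal basis of lower dimension, and the reduced vectors are clustered.

```python
import numpy as np

# Hypothetical semantic fields (toy word lists, for illustration only).
SEMANTIC_FIELDS = {
    "motion": {"run", "walk", "go", "come"},
    "emotion": {"love", "fear", "hate", "joy"},
    "perception": {"see", "hear", "look", "feel"},
}

def field_vector(text, fields=SEMANTIC_FIELDS):
    """Relative frequency of each semantic field in a whitespace-tokenized text."""
    tokens = text.lower().split()
    counts = np.array(
        [sum(tok in words for tok in tokens) for words in fields.values()],
        dtype=float,
    )
    return counts / max(len(tokens), 1)

def reduce_svd(matrix, k):
    """Coordinates of each document (row) in the top-k orthogonal SVD basis."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]

def kmeans(points, k, iters=50):
    """Plain k-means with a deterministic farthest-first initialization."""
    chosen = [0]
    while len(chosen) < k:
        # Distance of every point to its nearest already-chosen center.
        d = np.linalg.norm(points[:, None, :] - points[chosen][None, :, :], axis=2)
        chosen.append(int(d.min(axis=1).argmax()))
    centers = points[chosen].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

# Toy "texts": the first two lean on motion words, the last two on emotion words.
texts = [
    "run walk go come run see",
    "walk go run come go walk love",
    "love fear hate joy love hear",
    "fear joy hate love fear joy run",
]
matrix = np.vstack([field_vector(t) for t in texts])  # documents x semantic fields
reduced = reduce_svd(matrix, k=2)                     # lower-dimensional space
labels = kmeans(reduced, k=2)
print(labels)
```

In this toy setting, a cluster that contains texts of only one author would correspond to the single-author areas described above, while a mixed cluster would indicate semantically similar styles.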

eISSN: 1314-4081
Language: English
Publication timeframe: 4 times per year
Journal Subjects: Computer Sciences, Information Technology

© by Bohdan Pavlyshenko
