Open Access

Automatic Classification of Swedish Metadata Using Dewey Decimal Classification: A Comparison of Approaches


The different datasets generated from the raw LIBRIS data.

| Dataset | ID | Records | Classes |
|---|---|---|---|
| Titles | T | 143,838 | 816 |
| Titles and keywords | T_KW | 121,505 | 802 |
| Keywords only | KW | 121,505 | 802 |
| Titles, major classes | T_MC | 72,937 | 29 |
| Titles and keywords, major classes | T_KW_MC | 60,641 | 29 |
| Keywords only, major classes | KW_MC | 60,641 | 29 |

Accuracy of the Support Vector Machine classifier when using two digits.

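Support Vector Machine

| Dataset | Accuracy, unigrams (training set) | Accuracy, unigrams (test set) | Accuracy, unigrams + 2-grams (training set) | Accuracy, unigrams + 2-grams (test set) |
|---|---|---|---|---|
| T_KW_stm_2d | 90.60% | 72.68% | 96.23% | 73.32% |
| T_KW_2d | 91.21% | 72.14% | 95.48% | 73.24% |
| KW_2d | 81.75% | 71.86% | 86.18% | 71.96% |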
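
The column headings in these tables distinguish features built from unigrams alone from features built from unigrams plus 2-grams. As a minimal sketch of what that distinction means, the snippet below extracts word n-grams from a single title. The function name `ngram_features`, the whitespace tokenization, the lowercasing, and the example title are all assumptions made for illustration; they are not taken from the study's actual pipeline.

```python
def ngram_features(text, max_n=2):
    """Return all word n-grams of length 1..max_n found in `text`.

    Tokenization is a plain lowercase whitespace split, which is an
    illustrative simplification rather than the study's pre-processing.
    """
    tokens = text.lower().split()
    features = []
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the token list.
        for i in range(len(tokens) - n + 1):
            features.append(" ".join(tokens[i:i + n]))
    return features


# Hypothetical title, used only to illustrate the two feature sets.
title = "Svensk historia under medeltiden"

unigrams_only = ngram_features(title, max_n=1)
unigrams_and_bigrams = ngram_features(title, max_n=2)

print(unigrams_only)
print(unigrams_and_bigrams)
```

With `max_n=2`, every pair of adjacent words becomes an extra feature, so a title of four words yields four unigrams and three 2-grams. The larger feature space may help explain why training-set accuracy rises more than test-set accuracy in the unigrams + 2-grams columns.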

Accuracy of the Support Vector Machine classifier on the different datasets.

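| Dataset | Accuracy, unigrams (training set) | Accuracy, unigrams (test set) | Accuracy, unigrams + 2-grams (training set) | Accuracy, unigrams + 2-grams (test set) |
|---|---|---|---|---|
| T | 93.74% | 40.91% | 99.59% | 40.45% |
| T_KW | 97.50% | 65.25% | 99.90% | 66.13% |
| KW | 83.09% | 64.02% | 92.38% | 64.09% |
| T_MC | 93.95% | 57.99% | 99.62% | 57.80% |
| T_KW_MC | 97.89% | 80.75% | 99.93% | 81.37% |
| KW_MC | 90.58% | 79.56% | 96.30% | 80.38% |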
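
One way to read the table above is to look at how far test-set accuracy falls below training-set accuracy for each dataset. The sketch below does that arithmetic for the unigram columns. The values are copied from the table, while the dictionary layout and the helper name `accuracy_gap` are assumptions introduced for illustration only.

```python
# (training accuracy, test accuracy) in percent, unigram columns of the
# Support Vector Machine table for the different datasets.
svm_unigrams = {
    "T": (93.74, 40.91),
    "T_KW": (97.50, 65.25),
    "KW": (83.09, 64.02),
    "T_MC": (93.95, 57.99),
    "T_KW_MC": (97.89, 80.75),
    "KW_MC": (90.58, 79.56),
}


def accuracy_gap(train, test):
    """Difference between training and test accuracy, in percentage points."""
    return round(train - test, 2)


gaps = {name: accuracy_gap(train, test)
        for name, (train, test) in svm_unigrams.items()}

# List datasets from the largest gap to the smallest.
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {gap:.2f} percentage points")
```

On these figures the titles-only dataset (T) shows the widest gap and the keywords-only major-classes dataset (KW_MC) the narrowest.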

Accuracy of the Support Vector Machine classifier using different pre-processing.

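Support Vector Machine

| Dataset | Accuracy, unigrams (training set) | Accuracy, unigrams (test set) | Accuracy, unigrams + 2-grams (training set) | Accuracy, unigrams + 2-grams (test set) |
|---|---|---|---|---|
| T_KW_MC | 97.89% | 80.75% | 99.93% | 81.37% |
| T_KW_MC_rem | 92.51% | 80.94% | 95.02% | 81.83% |
| T_KW_MC_stm | 97.21% | 81.07% | 99.91% | 81.80% |
| T_KW_MC_stm_rem | 92.18% | 81.34% | 94.89% | 82.20% |
| T_KW_MC_sw | 95.44% | 80.98% | 98.48% | 81.24% |
| T_KW_MC_sw_rem | 92.46% | 81.04% | 94.30% | 82.13% |
| T_KW_MC_sw_stm | 94.87% | 81.40% | 98.72% | 81.24% |
| T_KW_MC_sw_stm_rem | 92.17% | 81.54% | 94.16% | 81.90% |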

Accuracy of Linear and RNN classifiers using word embeddings.

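| Dataset | Linear (training set) | Linear (test set) | RNN (training set) | RNN (test set) |
|---|---|---|---|---|
| T_KW_MC | 97.17% | 79.99% | 92.76% | 78.70% |
| KW_MC | 91.30% | 78.41% | 88.03% | 78.74% |
| T_KW_MC_stm | 96.90% | 80.81% | 92.38% | 79.16% |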

Accuracy of the Naïve Bayes classifier when using two digits.

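Naïve Bayes

| Dataset | Accuracy, unigrams (training set) | Accuracy, unigrams (test set) | Accuracy, unigrams + 2-grams (training set) | Accuracy, unigrams + 2-grams (test set) |
|---|---|---|---|---|
| T_KW_stm_2d | 87.40% | 65.64% | 93.18% | 67.79% |
| T_KW_2d | 88.26% | 64.78% | 93.55% | 66.92% |
| KW_2d | 78.36% | 68.12% | 82.53% | 67.94% |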

Accuracy of NN and CNN classifiers using word embeddings.

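| Dataset | NN (training set) | NN (test set) | CNN (training set) | CNN (test set) |
|---|---|---|---|---|
| T_KW_MC | 96.19% | 79.40% | 95.33% | 79.92% |
| KW_MC | 90.54% | 78.23% | 90.39% | 79.15% |
| T_KW_MC_stm | 95.92% | 79.57% | 94.60% | 80.38% |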

Accuracy of the Naïve Bayes classifier using different pre-processing.

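Naïve Bayes

| Dataset | Accuracy, unigrams (training set) | Accuracy, unigrams (test set) | Accuracy, unigrams + 2-grams (training set) | Accuracy, unigrams + 2-grams (test set) |
|---|---|---|---|---|
| T_KW_MC | 95.42% | 76.52% | 99.66% | 75.96% |
| T_KW_MC_rem | 90.17% | 76.79% | 93.25% | 78.21% |
| T_KW_MC_stm | 94.32% | 76.36% | 99.59% | 76.36% |
| T_KW_MC_stm_rem | 89.62% | 76.26% | 92.95% | 78.27% |
| T_KW_MC_sw | 95.50% | 76.46% | 99.64% | 76.62% |
| T_KW_MC_sw_rem | 90.28% | 77.09% | 92.33% | 78.60% |
| T_KW_MC_sw_stm | 94.49% | 76.59% | 99.53% | 76.95% |
| T_KW_MC_sw_stm_rem | 89.79% | 76.36% | 91.96% | 78.90% |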

Accuracy of the Multinomial Naïve Bayes classifier on the different datasets.

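| Dataset | Accuracy, unigrams (training set) | Accuracy, unigrams (test set) | Accuracy, unigrams + 2-grams (training set) | Accuracy, unigrams + 2-grams (test set) |
|---|---|---|---|---|
| T | 83.54% | 34.89% | 95.82% | 34.15% |
| T_KW | 90.01% | 55.33% | 98.14% | 55.45% |
| KW | 75.28% | 59.15% | 84.95% | 58.11% |
| T_MC | 90.83% | 54.21% | 98.63% | 50.51% |
| T_KW_MC | 95.42% | 76.52% | 99.66% | 75.96% |
| KW_MC | 86.94% | 77.25% | 94.24% | 77.09% |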

eISSN: 2543-683X
Language: English
Publication frequency: 4 times per year
Journal subjects: Computer Sciences, Information Technology, Project Management, Databases and Data Mining