Open Access

A Novel Method for Resolving and Completing Authors’ Country Affiliation Data in Bibliographic Records



Figure 1

Abstract view of the different data retrieval and processing steps in our approach.
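
The processing steps can be illustrated with a minimal sketch, assuming a two-stage resolution that mirrors the result codes in the country identification table (C1, C2.1, C2.2, C3): placeholder values are set aside, then a country name is sought in the affiliation string, and only then is an external lookup consulted. The names `COUNTRY_NAMES`, `MISSING`, `resolve_country`, and the `lookup` callable are hypothetical and not taken from the paper; in the paper's setting the role of `lookup` would be played by a Wikidata query, which is replaced here by any offline function so the sketch stays self-contained.

```python
from typing import Callable, Optional, Tuple

# Hypothetical, deliberately tiny name list; a real run would need a full
# list of country names and their common variants.
COUNTRY_NAMES = {
    "germany": "Germany",
    "japan": "Japan",
    "united states": "United States",
    "usa": "United States",
}

# Placeholder strings treated as missing data (assumed to correspond to C1).
MISSING = {"", "na", "n/a", "none", "null"}


def resolve_country(
    affiliation: Optional[str],
    lookup: Callable[[str], Optional[str]],
) -> Tuple[str, Optional[str]]:
    """Return (result_code, country) for a single affiliation string."""
    text = (affiliation or "").strip()
    if text.lower() in MISSING:
        return "C1", None

    # Stage 1: exact match of a comma-separated part against known country
    # names, scanning from the end because the country is often listed last.
    parts = [part.strip().lower() for part in text.split(",")]
    for part in reversed(parts):
        if part in COUNTRY_NAMES:
            return "C2.1", COUNTRY_NAMES[part]

    # Stage 2: fall back to the external lookup (a stand-in for a
    # Wikidata query) for each part of the affiliation.
    for part in parts:
        country = lookup(part)
        if country:
            return "C2.2", country

    # Neither stage produced a country.
    return "C3", None
```

A dictionary's `get` method is enough to try the sketch offline, for example `resolve_country("ETH Zurich", {"eth zurich": "Switzerland"}.get)`. Keeping the lookup as a parameter separates the matching logic from the data source, so a cached or live query could be substituted without changing the function.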

Figure 2

Number of affiliations left unidentified by string matching compared with the number identified using the Wikidata query, by year.

Figure 3

Average improvement (and outliers) of affiliations identified per country using the ACM Digital Library (left) and Microsoft Academic Graph (right) data sets.

Figure 4

The number of affiliations identified for each country before and after using the Wikidata query, using ACM Digital Library and Microsoft Academic Graph data sets.

Summary of data sets used.

| Features | ACM DL | MAG |
|---|---|---|
| Total works | 182,791 | 212,689,976 |
| Unique, co-authored, computer science works | 121,672 | 557,730 |

Results of country identification.

| Code | Results | ACM DL | MAG |
|---|---|---|---|
| C | Affiliations in co-authored works | 384,672 | 853,482 |
| C1 | “NA”, “None”, etc. values | 52,454 (13.64%) | 66,924 (7.84%) |
| C2 | Identified | 273,245 (71.02%) | 643,678 (75.42%) |
| C2.1 | Identified by string matching | 236,100 (61.38%) | 594,911 (69.70%) |
| C2.2 | Identified by Wikidata | 37,106 (9.65%) | 48,767 (5.71%) |
| C3 | Not identified (Other values) | 59,012 (15.34%) | 142,888 (16.74%) |
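
The percentages in this table appear to be shares of row C, the total number of affiliations in co-authored works. The short sketch below recomputes them for the MAG column under that assumption; the names `total`, `counts`, and `share` are illustrative only.

```python
# Counts copied from the MAG column of the table above.
total = 853_482  # C: affiliations in co-authored works
counts = {
    "C1": 66_924,     # "NA", "None", etc. values
    "C2": 643_678,    # identified
    "C2.1": 594_911,  # identified by string matching
    "C2.2": 48_767,   # identified by Wikidata
    "C3": 142_888,    # not identified
}


def share(count: int, total: int) -> float:
    """Percentage of `total`, rounded to two decimals as in the table."""
    return round(100 * count / total, 2)


# Assumption: every percentage is taken relative to row C.
shares = {code: share(n, total) for code, n in counts.items()}

for code, pct in shares.items():
    print(f"{code}: {pct:.2f}%")
```

Read this way, the C2.2 share is the additional coverage attributable to the Wikidata step, over and above what string matching alone resolves.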

Summary statistics of the method's results.

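| Statistic | ACM DL | MAG |
|---|---|---|
| Mean | 5.70% | 5.42% |
| Standard error | 0.69% | 1.29% |
| Median | 1.20% | 0.91% |
| Mode | 0% | 0% |
| Standard deviation | 8.10% | 17.88% |
| Interquartile range | 6.30% | 10.18% |
| Count | 137 | 192 |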
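
The reported standard errors are consistent with the usual standard error of the mean, that is, the standard deviation divided by the square root of the count. The check below is a small sketch under that assumption; the function name `standard_error` is illustrative, and the inputs are the standard deviation and count values from the table.

```python
import math


def standard_error(std_dev: float, count: int) -> float:
    """Standard error of the mean: standard deviation / sqrt(count)."""
    return std_dev / math.sqrt(count)


# Standard deviation (in percentage points) and count, taken from the table.
acm_dl = round(standard_error(8.10, 137), 2)
mag = round(standard_error(17.88, 192), 2)

print(f"ACM DL: {acm_dl:.2f}%  MAG: {mag:.2f}%")
```

The larger standard error for MAG follows from its much larger standard deviation, which the higher count only partly offsets.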

The accuracy of the method using Wikidata query.

| Measure | ACM DL | MAG |
|---|---|---|
| False match rate (FMR) | 0% | 0% |
| False non-match rate (FNMR) | 73% | 75% |
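
The page does not state how the two rates are computed, so the following is only a sketch under assumed definitions: FMR is taken as the fraction of returned countries that are wrong, and FNMR as the fraction of affiliations with a known country for which nothing was returned. The function `match_error_rates` and the four sample records are hypothetical and are not the paper's evaluation data.

```python
from typing import Iterable, Optional, Tuple


def match_error_rates(
    pairs: Iterable[Tuple[Optional[str], str]],
) -> Tuple[float, float]:
    """Compute (FMR, FNMR) from (predicted country or None, true country).

    Assumed definitions, not confirmed by the source:
      FMR  = wrong countries returned / all countries returned
      FNMR = records with no country returned / all records
    """
    records = list(pairs)
    returned = [(pred, true) for pred, true in records if pred is not None]
    false_matches = sum(1 for pred, true in returned if pred != true)
    false_non_matches = sum(1 for pred, _ in records if pred is None)
    fmr = false_matches / len(returned) if returned else 0.0
    fnmr = false_non_matches / len(records) if records else 0.0
    return fmr, fnmr


# Hypothetical toy sample: two correct matches and two misses.
sample = [
    ("Germany", "Germany"),
    (None, "Japan"),
    (None, "Brazil"),
    ("India", "India"),
]
fmr, fnmr = match_error_rates(sample)
print(f"FMR = {fmr:.0%}, FNMR = {fnmr:.0%}")
```

Under these assumed definitions, a 0% FMR combined with a high FNMR would describe a conservative method: it rarely or never returns a wrong country, but it leaves many resolvable affiliations unmatched.
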

eISSN: 2543-683X
Language: English
Publication timeframe: 4 times per year
Journal Subjects: Computer Sciences, Information Technology, Project Management, Databases and Data Mining