Open access

A Criteria-based Assessment of the Coverage of Scopus and Web of Science


Introduction

Although the providers of Scopus (Elsevier) and Web of Science (Clarivate Analytics) claim to cover the world’s scientific and scholarly literature increasingly comprehensively, both products are selective in practice as well as in principle. To succeed on the market, these products depend not only on coverage, but also on the quality and relevance of their contents, as well as on their production costs. In addition, the provider of Web of Science, Clarivate Analytics, inherits a tradition in which Eugene Garfield (1979) demonstrated that information retrieval theory (Bradford’s law of scattering) and citation analysis support the idea of indexing mainly the “core journals”. For many decades, an in-house editorial team has been evaluating possible new source items for Web of Science according to a set of publicly available criteria and with the help of citation analysis.

Elsevier instead publicly states on the webpages of the product that “content included in Scopus is carefully curated and ultimately selected by the independent Scopus Content Selection and Advisory Board (CSAB), an international group of scientists, researchers and librarians who represent the major scientific disciplines.” Although the coverage of Scopus is somewhat broader than that of Web of Science, all comparisons, including our own in this study, demonstrate a large overlap and indicate the same pattern of deficiencies when it comes to the social sciences and humanities, and the coverage of literature in languages other than English. The business model and the criteria seem to be the same. Scopus is also selective in principle and practice.

The two products serve several purposes. Among them are information retrieval, science studies and research evaluation and funding. Here, we limit the perspective to research evaluation and funding as we ask two questions that normally must be answered all the time in this context: How should research quality be assessed? And who should decide on the criteria? With the use of Scopus and Web of Science for research evaluation and funding, the answers are already given above: The commercial providers decide how to select the information provided for the evaluation and who will be using the selection criteria. Even the “independent” advisory board for Scopus is appointed by Elsevier. These procedures ensure the quality of the highly valued products that we use for information retrieval and science studies. Hence, it is easy to forget that the same procedures are less legitimate in research evaluation and funding.

In research evaluation, the procedures and criteria are normally developed and decided in the public domain and anchored in representative bodies of the research communities. In public funding of research, the procedures and criteria are normally decided by democratically responsible authorities and policies and made public to society.

We see a need for the international community of experts in bibliometrics and research evaluation to start discussing the use of Scopus and Web of Science from the perspective of properly organized research evaluation and funding. The two questions need to be renewed in this context: How should research quality be assessed? And who should decide on the criteria?

To initiate the discussion, we apply a criteria-based assessment of the coverage of Scopus and Web of Science in this study. The criteria have been developed by the Norwegian Association of Higher Education Institutions (Universities Norway) with the assistance of its underlying national disciplinary committees and in collaboration with the Norwegian Ministry of Higher Education and Science to support the latter’s institutional funding model. The criteria are also in practice applied by the Research Council of Norway when collecting information for funding applications and national field evaluations. The criteria are very similar to those applied for institutional funding purposes in three other countries: Belgium (Flanders), Denmark and Finland.

The inclusion criteria used in the “Norwegian Model” will be further described in the Data and Methods section below, but essentially, peer-reviewed scientific and scholarly publications are defined and delimited in a way that is comparable to selecting only original research publications (articles) and reviews in Scopus and Web of Science. Source items are similarly selected one by one on the basis of a set of minimal criteria that are intended to promote proper peer review and research quality. In practice, these minimal criteria provide a wider selection of source items than in Scopus and Web of Science. We are thereby able to describe the differences between what the academic communities of a country consider should be included as original research publications for evaluation and funding and what the commercial providers of Scopus and Web of Science are able to provide within a similar limitation to publication type. The patterns of differences will be described with regard to publication type (books, articles in books, articles in series and journals), field of research and language.

During recent years, several valuable studies have addressed how Web of Science, and more recently Scopus, cover the research literature of various fields and countries. Nevertheless, a criteria-based approach representing research evaluation standards has been absent. With a few examples given in each category, these are the main types of approaches in earlier studies:

• The products have been compared to each other with no external reference data, usually confirming that both are suitable tools for evaluation (e.g. Archambault et al., 2009; Barnett & Lascar, 2012; Gavel & Iselid, 2008; Lopez-Illescas, de Moya-Anegon, & Moed, 2008).

• What is not covered has been determined by using citations to non-indexed items in the same products as data (e.g. Moed, 2005; Nederhof, 2006; Van Leeuwen, 2012).

• The coverage of the products has been compared to Google Scholar in several studies with different conclusions regarding the usability of the latter (Franceschet, 2010; Harzing & Alakangas, 2016; Kousha & Thelwall, 2008; Lopez-Cozar, Robinson-Garcia, & Torres-Salinas, 2014; Meho & Yang, 2007). None of the studies assert that Google Scholar represents inclusion criteria according to research evaluation standards.

• Ulrich's Periodicals Directory has also been used as an external reference, again with no assertion that it represents academic standards for evaluation (Mongeon & Paul-Hus, 2016).

• Closer to our approach are studies that base the comparison on a wider dataset defined as the published research output of a discipline in a non-English speaking country (Osca-Lluch et al., 2013), an area of research (Ossenblok, Engels, & Sivertsen, 2012) or a geographical region (Santa & Herrero-Solana, 2010). Particularly interesting among these is Chavarro (2017), with a critical discussion of the principles and practices of selectivity in the products, demonstrating how their alleged ‘universalism’ does not represent global research in practice.

Our study differs from such earlier studies by applying an explicit set of general criteria developed by academic communities with which we can observe what is included and excluded in the two products.

Data and methods

The so-called “Norwegian Model” (Sivertsen, 2016), which so far has been adopted at the national level by Belgium (Flanders), Denmark, Finland, Norway, and Poland, as well as at the local level by several Swedish universities and by University College Dublin, has three components:

A. A complete representation in a national database of structured, verifiable and validated bibliographical records of the peer-reviewed scholarly literature in all areas of research;

B. A publication indicator with a system of weights that makes field-specific publishing traditions comparable across fields in the measurement of ‘Publication points’ at the level of institutions;

C. A performance-based funding model which reallocates a small proportion of the annual direct institutional funding according to the institutions’ shares in the total of Publication points.

The experience is that even with only marginal influence on the total funding, component C will support the need for completeness and validation of the bibliographic data in component A. The data in component A are delimited by a definition, according to which a scholarly publication must:

present new insight

in a scholarly format that allows the research findings to be verified and/or used in new research activity

in a publication channel (journal, series, book publisher) which represents authors from several institutions and organizes independent peer review of manuscripts before publication.

While the first two requirements of the definition demand originality and scholarly format in the publication itself (this is checked locally by each institution), the third and fourth requirements are supported centrally by a dynamic register of approved scholarly publication channels.

Component A in our study is the Norwegian Science Index, a bibliographic database in Cristin (Current Research Information System in Norway), which covers the scientific publication output at almost all Norwegian higher education institutions, research institutes and hospitals. Only publications which have officially qualified as scientific or scholarly according to the specific criteria given above are included in the study. We use simple counts of unique publications, leaving aside the publication indicator in component B. A total of 45,972 scientific or scholarly publications are included from the years 2015 and 2016.

While Scopus is organized as one database, Web of Science consists of several individual databases. The core databases, which together make up the Web of Science Core Collection, are:

Science Citation Index Expanded (SCIE)

Social Sciences Citation Index (SSCI)

Arts & Humanities Citation Index (AHCI)

Conference Proceedings Citation Index (CPCI)

Book Citation Index (BKCI)

Emerging Sources Citation Index (ESCI)

Although these are the core databases of Web of Science, many bibliometric analyses and indicators are limited to the classical (“flagship”) citation indexes, the SCIE, SSCI, and AHCI, which cover journal publishing only. For example, this holds for the Leiden ranking (http://www.leidenranking.com/information/data). The CPCI and BKCI databases cover conference series and book publications, respectively. The ESCI database was launched in 2015 and contains journals with regional importance and journals under evaluation for being a part of SCIE/SSCI/AHCI (http://wokinfo.com/products_tools/multidisciplinary/esci/). In this study, we have analysed the various databases individually and provide figures for the entire Web of Science Core Collection and for the three classical journal indexes, SCIE/SSCI/AHCI. In some of the analyses, figures are also shown for individual databases.

The comparative analysis consists of several steps. For the journal articles indexed in Cristin, the analyses are based on the lists of source journals for Scopus and Web of Science. For Scopus, the October 2016 source list was used, which was the most recent available when the study was carried out. For Web of Science, the 2017 journal source list has been applied. In order to map the journal records of Cristin indexed in Scopus and Web of Science (SCIE, SSCI, AHCI and ESCI), the journal name, ISSN and e-ISSN were used as identifiers. Because both database producers apply cover-to-cover indexing of the journal literature and fully index all issues, such a method is justified.
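The journal matching step can be sketched roughly as follows. This is a minimal illustration and not the procedure actually used in the study: the record fields (`title`, `issn`, `eissn`), the helper names and all sample entries are assumptions made up for the example.

```python
import re

def norm_issn(value):
    """Normalize an ISSN/e-ISSN to 8 uppercase characters without a hyphen."""
    if not value:
        return None
    cleaned = re.sub(r"[^0-9Xx]", "", value).upper()
    return cleaned if len(cleaned) == 8 else None

def norm_title(value):
    """Lowercase a journal title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", (value or "").lower()).strip()

def build_index(source_list):
    """Collect all identifiers found in a (hypothetical) source list."""
    issns, titles = set(), set()
    for entry in source_list:
        for key in ("issn", "eissn"):
            issn = norm_issn(entry.get(key))
            if issn:
                issns.add(issn)
        title = norm_title(entry.get("title"))
        if title:
            titles.add(title)
    return issns, titles

def is_indexed(record, index):
    """A record counts as indexed if its ISSN, e-ISSN or title is in the index."""
    issns, titles = index
    for key in ("issn", "eissn"):
        if norm_issn(record.get(key)) in issns:
            return True
    return norm_title(record.get("title")) in titles

# Made-up source list and national records, for illustration only.
source_list = [
    {"title": "Journal of Examples", "issn": "1234-5679", "eissn": None},
    {"title": "Sample Studies", "issn": None, "eissn": "2049-3630"},
]
records = [
    {"title": "Unknown spelling", "issn": "12345679"},           # ISSN match
    {"title": "SAMPLE STUDIES.", "issn": None, "eissn": None},   # title match
    {"title": "Local Yearbook", "issn": "0000-0000"},            # no match
]

index = build_index(source_list)
matched = [is_indexed(r, index) for r in records]
coverage = sum(matched) / len(matched)
print(matched, f"{coverage:.0%}")
```

Matching on identifiers first and falling back to a normalized title is one plausible design; title-only matches are more error-prone and would in practice call for manual checking.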

The analysis of book publications is more complicated. Here, information on the titles of monographs, edited books, book series, conferences and conference series, as well as ISBN numbers, was used in various ways as identifiers. The source lists of Scopus and Web of Science for book publications and proceedings were used as the basis for comparison.

Although considerable efforts have been made to match the records as exactly as possible, there will inevitably be cases where items have mistakenly been identified as being indexed or not. This is due to issues such as errors in core data and changes in the names of journals or in the ISSN or ISBN numbers. Nevertheless, we believe that these sources of error are of minor importance for the overall findings of the study.
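One common way to catch some identifier errors before matching is to validate the ISSN check digit. The study does not state that such a check was used; the sketch below only illustrates the standard ISSN check-digit rule (weights 8 down to 2, modulo 11, with "X" standing for 10), and the function names are assumptions for the example.

```python
def issn_check_digit(first_seven):
    """Compute the ISSN check character from the first seven digits."""
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    remainder = (11 - total % 11) % 11
    return "X" if remainder == 10 else str(remainder)

def is_valid_issn(issn):
    """Return True if the string is a well-formed ISSN with a correct check digit."""
    cleaned = issn.replace("-", "").strip().upper()
    if len(cleaned) != 8 or not cleaned[:7].isdigit():
        return False
    return issn_check_digit(cleaned[:7]) == cleaned[7]

print(is_valid_issn("2543-683X"))  # a well-formed ISSN
print(is_valid_issn("2534-683X"))  # two digits transposed
```

A check of this kind flags mistyped or transposed digits, but it cannot detect a valid ISSN that belongs to a different journal or a journal that has changed its name.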

Results

Figure 1 shows overall results for the 2015 and 2016 publications. Scopus covers 72 percent of the total publication output, while the corresponding figure for Web of Science Core Collection is 69 percent. Thus, the large majority of the Norwegian scientific and scholarly publication output is indexed in the two databases. Although Scopus has the highest coverage, the difference is not large. The three classical databases, SCIE, SSCI, and AHCI, cover 56 percent of the publication output, while the figures for the CPCI, ESCI and BKCI, are 7 percent, 5 percent, and 1 percent respectively.

Figure 1

Coverage of 2015 and 2016 publications, total all fields and publication types, Scopus and Web of Science.

The publications have been classified in four domains: humanities, social sciences, health sciences, natural sciences and engineering (note that law is included under the social sciences, while psychology is classified in health sciences, not in the social sciences). For both databases, there are large variations in coverage across different domains. This is shown in Figure 2. In medicine and health, the coverage is not far from complete, with proportions of 89 percent for Scopus and 87 percent for Web of Science Core Collection. The three journal indexes of Web of Science, SCIE, SSCI, and AHCI, capture 82 percent of the production. The coverage is also very high for the natural sciences and technology, although for SCIE, SSCI, and AHCI the coverage is reduced (due to the importance of proceedings papers in technology).

Figure 2

Coverage of 2015 and 2016 publications by domain, total all publication types, Scopus and Web of Science.

For the social sciences the coverage is significantly lower. Here, 48 percent of the publications are indexed in Scopus and 40 percent in Web of Science Core Collection, while 27 percent appear in the SCIE, SSCI, and AHCI subset. Only a minor part of the publication output in the humanities is indexed. Here the proportions are 27 percent and 23 percent for Scopus and Web of Science Core Collection, respectively.

Further details on the coverage by domains are provided in Table 1.

Table 1

Coverage of 2015 and 2016 publications by domain, total all publication types, Scopus and Web of Science.

                              WoS Core Collection
Domain               Scopus   SCIE/SSCI/AHCI   CPCI   BKCI   ESCI   Total   N (total number of publications)
Humanities           27%      15%              1%     2%     5%     23%     5,067
Medicine & health    89%      82%              0%     0%     5%     87%     12,879
Natural sci & tech   85%      66%              15%    0%     2%     84%     18,223
Social sciences      48%      27%              3%     2%     9%     40%     9,803
Total                72%      56%              7%     1%     5%     69%     45,972
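As a consistency check on Table 1, the total coverage can be approximated as the publication-weighted average of the domain percentages. The sketch below uses only the figures reported in the table; because the domain percentages are already rounded, the result can only approximate the reported totals, and the function name is an assumption for the example.

```python
# (domain, N, Scopus %, WoS Core Collection total %) as reported in Table 1
rows = [
    ("Humanities", 5067, 27, 23),
    ("Medicine & health", 12879, 89, 87),
    ("Natural sci & tech", 18223, 85, 84),
    ("Social sciences", 9803, 48, 40),
]

def weighted_coverage(rows, column):
    """Average the percentages in `column`, weighted by publication counts."""
    total_n = sum(r[1] for r in rows)
    return sum(r[1] * r[column] for r in rows) / total_n

total_n = sum(r[1] for r in rows)
scopus_total = weighted_coverage(rows, 2)
wos_total = weighted_coverage(rows, 3)

print(total_n)                                    # 45972
print(round(scopus_total), round(wos_total))      # 72 69
```

The weighted averages agree with the reported totals of 72 percent and 69 percent, and the domain counts sum to the stated total number of publications.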

Appendix 1 contains a complete overview with details for individual disciplines. In the humanities there are large differences in coverage across different disciplines. For example, the Scopus coverage ranges from 11 percent in Scandinavian studies to 54 percent in architecture and design. These differences are likely to reflect the patterns of publication types and publication language applied. Disciplines within the social sciences also show large variations, with law at the bottom in terms of coverage. In medicine and health a few disciplines achieve 100 percent coverage in both Scopus and Web of Science Core Collection. Disciplines within health, such as nursing and psychology, tend to be less well covered; in nursing, approximately 50 percent of the publications are indexed in Scopus and Web of Science Core Collection. Disciplines within the natural sciences tend to be very well covered, with chemistry and physics at the top. The proportions for the engineering fields are generally lower than for the natural sciences. Here publishing in proceedings plays a more important role, and this publication type is less well covered than journal publications.

Data in the Norwegian Science Index are classified into three publication types: monographs, book chapters (articles/chapters in anthologies) and articles in journals/series. The latter category accounts for the large majority of the publications (81 percent), while 17 percent appear as book chapters and 1 percent as monographs.

Figures 3a and 3b show how the coverage of publications varies according to publication type. In total, 84 percent of the journal articles are indexed in Scopus, 80 percent in Web of Science Core Collection, while 68 percent appear in the SCIE, SSCI, and AHCI subset. The coverage of the book chapters is much lower, 12–13 percent for both Scopus and Web of Science Core Collection.

Figure 3a

Coverage of 2015 and 2016 publications by publication types, Scopus and Web of Science, number of publications.

Figure 3b

Coverage of 2015 and 2016 publications by publication types, Scopus and Web of Science, proportions.

All publications in the Norwegian Science Index are classified according to publication language. Overall, 87 percent of the Norwegian publications are written in English (2015–2016). Most of the remaining publications are written in Norwegian, and a small minority in other languages. However, Norwegian accounts for a much higher share of the publications in the humanities and social sciences than in the other domains.

Figure 4 shows that both databases have poor coverage of the Norwegian-language literature. This is an important reason why the databases cover the humanities and social sciences less well than the other domains. However, the English-language publications of these domains are also less well covered. For the humanities, Scopus covers 43 percent of this literature, while the corresponding figure for Web of Science Core Collection is 36 percent. The English-language publications of the social sciences are better covered, with 67 percent and 57 percent indexed in Scopus and Web of Science Core Collection, respectively.

Note: Cf. the Scopus web page: “Scopus coverage is global by design to best serve researchers’ needs and ensure that relevant scientific information is not omitted from the database. Titles from all geographical regions are covered, including non-English titles as long as English abstracts can be provided with the articles. In fact, approximately 22% of titles on Scopus are published in languages other than English, adding up to 40 local languages (or published in both English and another language).”

Figure 4

Coverage of 2015 and 2016 publications by publication language and domain, Scopus and Web of Science.

Appendix 2 provides a more detailed picture at the level of disciplines with regard to publications in journals. The left column (% in journals/series) shows the proportion of the total publication output appearing in journals and series. Here the proportions range from 42 percent in history to 100 percent in a few disciplines within medicine. The next column shows the percentage of the journal publications having English as publication language. This proportion is between 90 and 100 percent in most disciplines within medicine and health and natural sciences and technology, while it varies from 17 percent (Scandinavian studies) to 100 percent (English studies) in the humanities. Across the social science disciplines there are also large variations in the prevalence of English as publication language. Finally, the table in the Appendix shows the proportion of the journal publications that are indexed in the two databases.

At the level of individual institutions, there are quite large differences in how well Scopus and Web of Science cover the publication output. For the largest hospital in Norway, Oslo University Hospital, almost all publications are indexed in Scopus and Web of Science Core Collection (96 percent and 95 percent), cf. Figure 5. On the other hand, Oslo and Akershus University College of Applied Sciences (now OsloMet) has less than half of its publications indexed. These differences reflect the field and publication profile of the institutions.

Figure 5

Coverage of 2015 and 2016 publications for selected institutions, Scopus and Web of Science.

Discussions

As described in the introduction, the coverage of Scopus and the Web of Science has been analysed in many previous studies. It is clear that the results of these studies will depend on various issues related to how they are designed and their object of study:

a) Type of yardstick used to assess the coverage

b) Publication measure applied (e.g. journals or total number of publications)

c) Publication types included (e.g. journal articles or the entire research output)

d) Time period and database product analyzed

e) Fields and countries selected for analysis

Our analysis differs from earlier studies by applying an explicit set of general criteria developed by academic communities with which we can observe what is included and excluded in the two products (point a). Some main findings are discussed below.

In agreement with several previous studies (e.g. Archambault et al., 2009; Mongeon & Paul-Hus, 2016), our results show that Scopus has a broader coverage than Web of Science. There has been an expansion in the number of sources covered in Scopus and Web of Science in recent years (Collazo-Reyes, 2014). The number of books indexed by Scopus is increasing by 20,000 each year (Elsevier, 2017) and the Web of Science has been supplemented with the Book Citation Index (BKCI) and the Emerging Sources Citation Index (ESCI). We note, however, that compared with the entire Web of Science Core Collection, Scopus’ coverage of the Norwegian publication output is not much higher (72 percent and 69 percent). This finding should be contrasted with the fact that currently more than 20,000 journals and 90,000 books are indexed in Web of Science Core Collection (Clarivate Analytics, 2018), compared with 21,200 journals and 150,000 books in Scopus (Elsevier, 2017). Previous studies such as Archambault et al. (2009) have shown that there is an extremely strong correlation in the number of papers per country in Scopus and WoS, with Scopus providing the largest numbers. This reflects that the two products have similar profiles and biases. These have been characterised as favouring large and frequently cited journals, articles written in English and published in English-language journals (Côté, Roberge, & Archambault, 2016).

For both databases our study shows large differences in the coverage across different domains. The health sciences, the natural sciences and technology are very well covered, while this does not hold for the social sciences and the humanities in particular. This corresponds well with previously identified patterns, despite efforts to increase the coverage of both databases. Already in 1996, Bourke & Butler (1996) identified a large difference in the ISI (Web of Science) coverage of the Australian publication output between the sciences and the social sciences and humanities. In the latter domains less than 20 percent were covered. A similar pattern was also found by Moed (2005), using the reference patterns of the indexed publications as data source.

In addition, two more recent Norwegian studies using a similar approach showed analogous results. Sivertsen and Larsen (2012) found a coverage of ISI (SCIE, SSCI, and AHCI subset of the Web of Science) ranging from 11 percent in the humanities to 80 percent in the natural sciences. Generally, the figures reported in this study were 5–10 percentage points lower than the ones identified here, which can be explained by the increased coverage of the database during the time period. The study by Sivertsen (2014) included Scopus in the analyses, where the proportions typically were 5–10 percentage points lower than what has been reported here.

As another example, Mongeon & Paul-Hus (2016) used Ulrich’s periodicals database to assess the coverage. They found that in the natural sciences and engineering, Scopus covered 38 percent of the journals, while the corresponding figure for Web of Science (SCIE, SSCI, and AHCI) was 33 percent. These proportions are much lower than the figures identified in this study, where Scopus and Web of Science (SCIE, SSCI, and AHCI) covered 94 and 73 percent, respectively, of the journal publications. However, the studies have important differences in research design: while our study analyses the population of individual publications, Mongeon & Paul-Hus (2016) investigate the coverage by publication channels based on journal lists (cf. point b above). Ulrich’s database indexes a large number of regional or national periodicals, most of them being irrelevant for researchers within a particular country. A journal-based approach would typically result in lower coverage, as the publication output is skewed at the level of periodicals: within a country or field, a limited number of publication channels account for a large proportion of the publication output. This is a variant of Bradford’s Law of Scattering, which provided a basis for Eugene Garfield’s creation of the Science Citation Index (Garfield, 1979).
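The difference between the two measures can be illustrated with a small toy example. All journal names and counts below are invented for the illustration and do not come from the study; the point is only that a skewed distribution of publications across journals makes publication-level coverage much higher than journal-level coverage.

```python
# Hypothetical journals: (name, number of publications, indexed in the database?)
journals = (
    [
        ("Core A", 400, True),
        ("Core B", 250, True),
        ("Core C", 150, True),
        ("Specialist D", 50, True),
    ]
    + [(f"Regional {i}", 25, False) for i in range(1, 7)]
)

# Journal-level coverage: share of publication channels that are indexed.
journal_coverage = sum(indexed for _, _, indexed in journals) / len(journals)

# Publication-level coverage: share of individual publications that are indexed.
total_pubs = sum(n for _, n, _ in journals)
indexed_pubs = sum(n for _, n, indexed in journals if indexed)
publication_coverage = indexed_pubs / total_pubs

print(f"journals indexed:     {journal_coverage:.0%}")      # 40%
print(f"publications indexed: {publication_coverage:.0%}")  # 85%
```

In this invented case only four of ten journals are indexed, yet they carry 850 of the 1,000 publications, so the two yardsticks give very different coverage figures for the same database.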

Our study shows that both databases have the same problems in terms of coverage of publications in non-English languages. Moreover, although the number of indexed books has been increasing in both databases, the coverage of book publications is still very limited. This publication type accounts for a small share of the indexed publications of both Scopus and Web of Science (Clarivate Analytics, 2018; Elsevier, 2017), although the number of books indexed in Scopus is higher than in the Web of Science Core Collection. Nevertheless, our study shows no differences in their coverage of the book publications. While the coverage of the English-language journal publications is almost complete, this does not hold for the corresponding book publications. Apparently, many important publishers of scholarly books, particularly in the social sciences and humanities, are not covered by the databases (Sivertsen, 2014). The limited coverage of non-English literature on the one hand and of book publications on the other explains why the coverage of the social sciences and humanities is still inadequate.

Our study performs a test based on criteria developed by the academic communities in Norway. The criteria might well be changed or improved when discussed with academic communities in other countries. Although the study is based on data from one country, we argue that the findings have general relevance because it is rooted in publication patterns that are similar across all countries (Sivertsen, 2014). Nevertheless, to some extent the biases identified might affect individual countries differently. There are variations across countries in the tendency towards publishing in international English-language journals; in some countries, publishing in non-English journals plays a more important role than in others (Alperin, 2014). Variations in the coverage of different fields will also affect the overall coverage of each country differently, reflecting their individual specialisation profile (Aksnes et al., 2017).

The aim of our study transcends the one of analysing how Norwegian research is represented in Scopus and Web of Science. It is a test of what the two products look like in the perspective of properly organized research evaluation. This may be exemplified through the large differences that exist in how well Scopus and Web of Science cover the publication output of individual institutions (Figure 5). Research evaluation of institutions based on publication data from Scopus and Web of Science only will therefore rest on foundations lacking adequate justification. After decades of letting commercial providers act as the ‘neutral guarantors of quality’, we wish to empower the academic communities to take back responsibility for criteria and procedures also in the domain of bibliometrics for research evaluation and funding.

Conclusions

This study, based on a whole country, shows that there are only minor differences between the coverage of the scientific literature in Scopus and in the Web of Science Core Collection. The patterns of coverage are very similar, with only partial representation of some fields of research. Both databases have the same problems in terms of coverage of the social sciences and humanities literature and of publications in non-English languages. Moreover, although the number of indexed books has been increasing in both databases, the coverage of book publications is still very limited. Special efforts would be required to increase the coverage of the databases and reduce their bias towards particular fields and publication types.

eISSN:
2543-683X
Language:
English
Publication frequency:
4 times a year
Journal subjects:
Computer Sciences, Information Technology, Project Management, Databases and Data Mining