Visualization of Disciplinary Profiles: Enhanced Science Overlay Maps

This paper advances science overlay mapping processes. The intent is to provide the research communities using scientometrics with an improved methodology to generate overlay maps (Rafols, Porter, & Leydesdorff, 2010). An overlay map is a global map of science over which a subset of publications is projected, thus allowing the visualization of the disciplinary scope of the scientific production of a given organization, individual, territory, etc. Such maps can help analysts and readers grasp the mix of disciplines engaging a given topic or the portfolio of research interests reflected in the publication (sub)set of an organization (see Wallace and Rafols (2015) for a discussion of research portfolios).

The paper briefly overviews the heritage of the use of Web of Science subject categories (WCs) and of science overlay mapping. It then presents enhanced methodology to generate the maps, followed by examples to illustrate novel application opportunities. The paper updates the visualization process and provides an advanced 2015 basemap.

1.1 Efforts to Classify Research

In order to understand the multidisciplinary profile of publication sets, disciplinary or sub-disciplinary categories can be assigned to the publications. These categories can then be used to represent the position of a publication set in the overall structure of science—i.e. to overlay a specific research activity onto the map of science (Rafols, Porter, & Leydesdorff, 2010).

One method to assign publications to a disciplinary category is to rely on the journal of the publication as an estimate of the scientific field. However, disciplines and fields of science develop above the level of individual journals. Scientometricians proposed the normalization of citations in terms of journal categories (ISI Subject Categories, now known as Web of Science Categories)—as proxies of scientific fields defined above the level of individual journals—in a series of publications during the 1980s (e.g. Schubert, Glänzel, & Braun, 1986; Schubert, Glänzel, & Braun, 1989; Vinkler, 1986).

Using these categories, Moed, de Bruin, & van Leeuwen (1995) further developed the “crown indicator” at the Center for Science and Technology Studies (CWTS) in Leiden that was later improved as the “Mean Normalized Citation Score” (MNCS). This indicator remains based on the same subject categories, and it is currently the most widely used method to provide normalized comparisons across scientific areas.

The WCs tagged to the 11,000+ journals covered by the Science Citation Index (SCI) and the Social Sciences Citation Index (SSCI) are assigned by indexers on the basis of a number of criteria, including field experts’ judgment of relevance to a given field, the journal’s title, and its citation patterns (Bensman & Leydesdorff, 2009). As of 2015, there are 227 WCs covering SCI and SSCI. Pudovkin and Garfield (2002) described the methods used by the ISI (then provided by Thomson Reuters, and now Clarivate Analytics), and concluded that in many fields these categories are “sufficient;” but “in many areas of research these “classifications” are crude and do not permit the user to quickly learn which journals are most closely related” (p. 1113). Boyack, Börner, and Klavans (2007) estimated that the assignment of WCs is correct in approximately 50% of cases across the file. That said, the “correct” assignment based on detailed article content would usually be proximate.

On the basis of a comparison of this classification with algorithmically generated ones, Rafols and Leydesdorff (2009) (p. 1830) concluded that the WCs can be used for aggregate statistical purposes (i.e. above 100 or so publications, depending on the desired granularity), but are not well-suited for detailed analyses (e.g. to assess an individual’s research). The WCs sometimes cover similar sets of journals; for example, in the domain of biomedicine. In other cases, the categories added by an indexer cover areas that could be considered as separate sub-disciplines or subfields (Leydesdorff & Bornmann, 2016; van Eck et al., 2013). In the case of interdisciplinary publications, problems of imprecise or potentially erroneous classifications can be expected (Rafols & Meyer, 2010).

In scientometric evaluations, journals are sometimes attributed percentages proportional to the categories under which they are subsumed. These multiple categories have also been considered indicators of the interdisciplinarity of journals (Bordons, Bravo, & Barrigon, 2004; Katz & Hicks, 1995; Morillo, Bordons, & Gomez, 2001).

Klavans and Boyack (in press) recommended using classification schemes based on fine-grained publication-level clustering; but these classifications, which we would recommend where possible, are not publicly available yet, one exception being that provided by Waltman and van Eck (2012).

Notwithstanding these issues, WCs are a main basis for scientometric analyses. The use of these journal categories has become conventional among scientometricians (e.g. Rehn et al., 2014), including use to assess research portfolios. For example, InCites, a customized, Web-based research evaluation tool developed by Thomson Reuters, routinely provides normalizations of citation impact using WCs for the delineation of reference sets (e.g. Costas, van Leeuwen, & Bordons, 2010; Leydesdorff, Hammarfelt, & Salah, 2011). The Flemish ECOOM unit for evaluation in Leuven (SOOI) has developed a new classification system for journals (Glänzel & Schubert, 2003). Other authors have refined the journal lists within specific WCs to enable a more precise evaluation of a given discipline (van Leeuwen & Calero Medina, 2012). Another journal classification system in terms of fields and subfields has been made available by Elsevier’s Scopus in the meantime, but Wang and Waltman (2016) found it to be more problematic than WCs, in particular due to the high rate of multiple category assignments of a journal.

The field/subfield classification of Scopus is available in the journal list from http://www.elsevier.com/online-tools/scopus/content-overview. WCs are available (under subscription) at http://images.webofknowledge.com/WOKRS56B5/help/WOS/hp_subject_category_terms_tasca.html.

1.2 Description of WCs as Fields of Science

WCs can also be considered “macro-journals” representing fields and subfields of science. Their sub-disciplinary level of detail fits well with a US National Academies recommendation for study of interdisciplinarity (2005). The current (2015 WoS data) matrix of 227 WCs citing one another can be decomposed using multivariate (e.g. clustering) analysis. It can also be analyzed as a network using, for example, community-finding algorithms. Initially (refer to Leydesdorff & Rafols, 2009; Rafols, Porter, & Leydesdorff, 2010), we used 2007 data to develop a global map of science. At that time, drawing a map using the approximately 10,000 journals in the database was technically not feasible due, among other things, to the cluttering of the labels on the screen. This problem was elegantly solved by VOSviewer (which became available in 2009 and can be obtained at http://www.vosviewer.com), by allowing interactive zoom in/out functionality in the visualization (Klavans & Boyack, 2009; van Eck & Waltman, 2010).

Earlier maps were developed into an overlay toolkit (http://www.leydesdorff.net/overlaytoolkit) that enabled users to visualize portfolios as overlays using Pajek, a network analysis and visualization program freely available for non-commercial usage at http://mrvar.fdv.uni-lj.si/pajek (e.g. Leydesdorff, Carley, & Rafols, 2013; Rahman et al., 2015; Riopelle, Leydesdorff, & Li, 2014; Soós and Kampis, 2011). At that time, however, further integration between community-finding algorithms (Blondel et al., 2008), network analysis (e.g. Pajek (de Nooy, Mrvar, & Batagelj, 2011)), and visualization programs such as VOSviewer and Gephi was still emerging (Waltman, van Eck, & Noyons, 2010). VOSviewer, for example, was fully integrated into Pajek in July 2012, following incorporation of the Blondel (“Louvain”) algorithm for community-finding in January of that year (Blondel et al., 2008). This algorithm is appealing because it provides an improved location of the WCs as nodes in a suitable visual rendition. The overlay process then superimposes colored and sized nodes on top of that base to convey concentrations of activity. The enhanced science overlay mapping process provides an option to generate networking links among those nodes based on co-occurrence intensities. These can be rendered to augment the maps, with particular appeal to show network evolution over time for a given local domain of research activity.

We now make some choices differently from the ones we made some ten years ago. The wide use by a variety of stakeholders (including not only some researchers, but also scientometric students and practitioners) and requests for a current database, together with technical improvements in visualization during recent years, lead us to revise the overlay basemaps and toolkit based on the most recent version of the Journal Citation Reports (JCR), i.e. 2015.

Data and Methods

2.1 The Mapping

We use the combined set of the JCRs 2015 for the Science Citation Index (SCI) (n of journals = 8,778) and the Social Sciences Citation Index (SSCI) (n = 3,212) leading to a total number of 11,365 journals; 625 journals are covered by both databases (Table 1). A JCR for the Arts & Humanities Citation Index is not available, but, in any event, the behavior of those journals’ citation practices differs considerably from that of SCI and SSCI journals (Leydesdorff, Hammarfelt, & Salah, 2011). We also note that Web of Science has expanded its coverage of other research resources, especially conference proceedings and books. Those are not included in the maps presented here.

Table 1. Numbers of journals and Web of Science categories in SCI and SSCI.

           Journals    WCs
SCI        8,778       177
SSCI       3,212       57
Sum        11,990      234
Total      11,365      227
Overlap    625         6

Note: The journal Language and Cognitive Processes is additionally assigned with “OY,” one of the categories of the Arts & Humanities Citation Index.

The set of WCs covering SCI and SSCI has expanded from 224 in 2010 to 227 in 2015. The three newly added WCs are: “Audiology & speech-language pathology,” “Green & sustainable science & technology,” and “Logic.” The former WC—“Biology, miscellaneous”—was no longer in use in 2010 and, therefore, not included in the analysis; it is also absent from the 2015 data and the current maps.

Using dedicated software, the matrix of 227 × 227 cells was generated on the basis of whole-number citation counting. As previously, we normalize this matrix using the cosine function. However, the default VOSviewer setting normalizes using Zitt, Bassecoulard, and Okubo’s (2000) so-called “probabilistic activity index” (PAI). PAI is equal to the ratio between observed and expected values in a contingency table based on a probability calculus (Equations (1) and (2)):

$$ PAI = p_{ij} \,/\, (p_i \times p_j) $$ (1)

$$ = n_{ij} \times \Sigma_i \Sigma_j n_{ij} \,/\, (\Sigma_i n_{ij} \times \Sigma_j n_{ij}). $$ (2)

In the context of VOSviewer, this measure is renamed as the “association strength” (van Eck & Waltman, 2009).
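As an illustration of Equations (1) and (2), the following minimal Python sketch computes the observed/expected ratio for every cell of a small citation matrix. It is not the software used for the maps; the function name `pai` and the toy counts are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def pai(counts):
    """Observed/expected ratio n_ij * N / (row_i * col_j) for every cell."""
    n = np.asarray(counts, dtype=float)
    total = n.sum()                        # N = sum over all cells
    row = n.sum(axis=1, keepdims=True)     # row totals (sum over j)
    col = n.sum(axis=0, keepdims=True)     # column totals (sum over i)
    expected = row * col / total           # expected count under independence
    # Cells with zero expectation are left at 0 instead of dividing by zero.
    return np.divide(n, expected, out=np.zeros_like(n), where=expected > 0)

# Hypothetical 3 x 3 citation counts among three categories.
toy = np.array([[10, 2, 0],
                [3, 8, 1],
                [0, 1, 5]])
print(np.round(pai(toy), 3))
```

Values above 1 indicate that two categories are linked more often than their marginal totals would suggest; values below 1 indicate the opposite.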

Unlike the cosine, which is symmetrical, PAI can be used to normalize asymmetrically the vertical and horizontal dimension of a matrix. However, this possible advantage is not exploited in VOSviewer because the matrix is first made symmetric using the sums of lower and upper triangle values (cell_ij + cell_ji) in a new matrix. The cosine-normalized matrix remains worth investigating, because one is able to show the difference between the citation as the current activity (citing) versus the cited structures as archival representations (Wouters, 1998).
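The distinction between the citing and the cited side can be made concrete with a short sketch. It assumes, purely for illustration, that rows of the matrix hold the citing categories and columns the cited ones; the function name and the toy counts are hypothetical.

```python
import numpy as np

def cosine_similarity(vectors):
    """Cosine similarity between all pairs of row vectors."""
    v = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    norms[norms == 0] = 1.0        # an all-zero profile gets similarity 0 throughout
    unit = v / norms
    return unit @ unit.T

# Assumed orientation: row i cites column j (hypothetical counts).
citations = np.array([[10, 2, 0],
                      [3, 8, 1],
                      [0, 1, 5]])

citing_side = cosine_similarity(citations)     # compare citing profiles (rows)
cited_side = cosine_similarity(citations.T)    # compare cited profiles (columns)
print(np.round(citing_side, 3))
print(np.round(cited_side, 3))
```

Each result is symmetric, but the two differ from one another, which is why the choice of side matters for the basemap.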

Taking these issues into consideration, we first develop the citing-side, cosine-normalized map using 2015 data and VOSviewer visualization with default parameter values. This map is a “descendant” of our previous maps; a strong relationship can be seen in comparing Figure 1 (our 2015 basemap from VOSviewer) with Figure A-1 in Appendix A (the 2010 basemap from Pajek). A routine for making overlays on the basis of the map (“wc15.exe”) is provided at http://www.leydesdorff.net/wc15/index.htm and described in Appendix B. If the file cosine.dbf is additionally downloaded from the website, the routine writes a value for the Rao-Stirling measure of diversity, which is a proxy of the disciplinary breadth of the publication subset (Stirling, 2007; Zhang, Rousseau, & Glänzel, 2016), to the screen, based on using (1 − cosine) as the distance measure (p. 986 of Jaffe (1986)).

Footnote 7. Rao-Stirling diversity is a measure that takes into account the variety, the balance, and the disparity of categories in a distribution. In the case of publication or patent portfolios, the categories can be WCs or IPC classes, respectively. The indicator is defined in Equation (3) (Rao, 1982; Stirling, 2007):

$$ \Delta = \Sigma_{ij} \, p_i \, p_j \, d_{ij}, $$ (3)

where d_ij is a disparity measure between two categories i and j, and p_i is the proportion of elements assigned to category i. As the disparity measure, we use (1 − cosine).

Footnote 8. Zhang, Rousseau, and Glänzel (2016) and Garner et al. (2013) argue that ²D^S provides a true diversity measure that outperforms Rao-Stirling diversity (Δ) because ²D^S = 2.0 is twice as diverse as ²D^S = 1.0. In Equation (4), these authors formulate:

$$ ^2D^S = 1 / (1 - \Delta), $$ (4)

where Δ is the Rao-Stirling diversity. This improved measure varies from 1 to ∞ when Δ varies from 0 to 1. The transformation is monotonic, and the value of ²D^S follows directly from that of the Rao-Stirling diversity using Equation (4).
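The Rao-Stirling diversity and its transformation into ²D^S can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the wc15.exe routine: the function names are hypothetical, and the similarity matrix stands in for the cosine matrix among WCs.

```python
import numpy as np

def rao_stirling(proportions, similarity):
    """Delta = sum over i, j of p_i * p_j * d_ij, with d_ij = 1 - similarity_ij."""
    p = np.asarray(proportions, dtype=float)
    p = p / p.sum()                                  # make sure the shares sum to 1
    d = 1.0 - np.asarray(similarity, dtype=float)    # disparity as (1 - cosine)
    return float(p @ d @ p)

def true_diversity(delta):
    """2D^S = 1 / (1 - Delta); undefined when Delta equals 1."""
    return 1.0 / (1.0 - delta)

# Two categories with no similarity to each other, used in equal shares.
similarity = np.array([[1.0, 0.0],
                       [0.0, 1.0]])
delta = rao_stirling([0.5, 0.5], similarity)
print(delta, true_diversity(delta))
```

In this toy case two equally used and maximally distant categories give Δ = 0.5 and ²D^S = 2, whereas a portfolio concentrated in a single category gives Δ = 0 and ²D^S = 1.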

Figure 1. Five-cluster basemap for 2015 (based on VOSviewer). This map can be Web-started at http://www.vosviewer.com/vosviewer.php?map=http://www.leydesdorff.net/wc15/cos015m.txt&network=http://www.leydesdorff.net/wc15/cos015n.txt&label_size_variation=0.5&scale=1&colored_lines&curved_lines&n_lines=10000.

As an additional resource, one can feed the citation matrix of 227 WCs (citing versus cited, but without prior normalization) into VOSviewer and develop a similar map (Appendix A, Figure A-2). A routine analogous to wc15.exe, mtrx15.exe, produces a file “mtrx15.csv” as an input file for the mapping of a portfolio in VOSviewer using the non-normalized citation data (Figure A-2). In the case of a non-normalized matrix, a distance measure is not provided: the number of possible similarity criteria is large (see Klavans & Boyack (in press) and US National Academies (2005)), and the choice can be left to the user (using, for example, SPSS).

The routines also provide cluster and vector files for cos15.paj and matrix.paj made available on the website for Pajek, respectively (as previously). Pajek and Gephi contain a suite of tools for network analysis and visualization such as various decompositions, layouts, and visualization options. Using Pajek or Gephi, for example, one can also obtain the results of the Louvain algorithm (Blondel et al., 2008) for the decomposition in a format that can again be visualized in programs such as VOSviewer or Gephi.

Using VOSviewer, the user can change the number of clusters by changing the resolution parameter and running the clustering algorithm again. Using default values, both maps (i.e. cosine-normalized or not) show five clusters, but chi-square statistics reject the null hypothesis that the two classifications are similar (Cramér’s V = 0.707; p < 0.01). The corresponding five colors (blue, red, green, yellow, and pink) will also be used for the overlays, but the user can change this. Changing the granularity requires one to import the file with network data. More detailed instructions can be found in Appendix B and at http://www.leydesdorff.net/wc15/index.htm.
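For readers who wish to compare two cluster assignments of the same set of WCs themselves, the following sketch shows how Cramér’s V can be derived from the chi-square statistic of the cross-tabulation. It is an illustration only: the function name and the toy labels are hypothetical, and the sketch assumes that each classification contains at least two clusters.

```python
import numpy as np

def cramers_v(labels_a, labels_b):
    """Cramér's V for two cluster assignments of the same items."""
    a_vals, a_idx = np.unique(labels_a, return_inverse=True)
    b_vals, b_idx = np.unique(labels_b, return_inverse=True)
    table = np.zeros((len(a_vals), len(b_vals)))
    np.add.at(table, (a_idx, b_idx), 1)              # cross-tabulate the two labelings
    n = table.sum()
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1                         # assumes at least two clusters each
    return float(np.sqrt(chi2 / (n * k)))

# Hypothetical cluster numbers for six categories in two maps.
map_a = [0, 0, 1, 1, 2, 2]
map_b = [0, 0, 1, 1, 2, 2]
print(cramers_v(map_a, map_b))
```

A value of 1 indicates that one classification fully determines the other, and a value of 0 indicates no association between them.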

2.2 Measures of Disciplinary Diversity

As mentioned in the previous section, the cosine-similarity matrix for the WCs provides both the basis for locating the WCs as nodes in science maps (Figure 1), and the basis to calculate measures of diversity. Footnotes 7 and 8 remind the users of Stirling’s measure and how it can be calculated using the 227-by-227 WC cosine-similarity matrix (see Rafols, Porter, & Leydesdorff (2010) for details).

Porter and colleagues introduced measures of interdisciplinarity and multidisciplinarity called “Integration scores” and “Specialization scores,” extended by Carley and Porter to “Diffusion scores” as well (Carley & Porter, 2012; Porter et al., 2007; Porter & Rafols, 2009). For a given set of publications from WoS, Specialization scores indicate the disciplinary diversity of the set based on the distribution of their WCs. Integration scores reflect the diversity of those publications’ cited references, again using the cited WCs. Downloading the “cited references” of a given WoS search set allows one to pursue this metric. Conversely, Diffusion scores reflect the diversity of the disciplines citing a given set of papers, based on the citing journals’ WCs. This requires a citation search and data downloading from WoS.

These scores are different instances of the Rao-Stirling diversity measures (Footnote 7) (Stirling, 2007). As introduced earlier in this section, one can obtain the Specialization score (Rao-Stirling diversity for the WCs represented in the WoS search set) along with a science overlay map if desired, directly from the script provided at http://www.leydesdorff.net/wc15.
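The relation among the three scores, as described here, can be sketched as one diversity function applied to three different lists of WCs. The sketch is illustrative only and does not reproduce the VantagePoint scripts; the category names, similarity values, and WC lists below are made up for the example.

```python
from collections import Counter
import numpy as np

def rao_stirling_from_labels(labels, categories, similarity):
    """Rao-Stirling diversity of a list of WC labels (one entry per assignment)."""
    counts = Counter(labels)
    p = np.array([counts.get(c, 0) for c in categories], dtype=float)
    p /= p.sum()
    disparity = 1.0 - np.asarray(similarity, dtype=float)
    return float(p @ disparity @ p)

# Hypothetical three-category universe with made-up cosine similarities.
categories = ["Economics", "Statistics & Probability", "Oncology"]
similarity = np.array([[1.0, 0.6, 0.1],
                       [0.6, 1.0, 0.2],
                       [0.1, 0.2, 1.0]])

own_wcs = ["Economics", "Economics", "Statistics & Probability"]   # WCs of the set itself
cited_wcs = ["Economics", "Statistics & Probability", "Oncology"]  # WCs of its cited references
citing_wcs = ["Economics", "Oncology", "Oncology"]                 # WCs of the papers citing it

specialization = rao_stirling_from_labels(own_wcs, categories, similarity)
integration = rao_stirling_from_labels(cited_wcs, categories, similarity)
diffusion = rao_stirling_from_labels(citing_wcs, categories, similarity)
print(round(specialization, 3), round(integration, 3), round(diffusion, 3))
```

Only the input list changes between the three calls, which is the sense in which the scores are instances of the same measure.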

Integration or Diffusion scores need more detailed computation. Scripts have been prepared to run in VantagePoint software (scripts available at http://www.vpinstitute.com/).

2.3 Mapping Options

As introduced earlier in this paper, and enabled at the website (http://www.leydesdorff.net/wc15), one can perform a topical search at WoS and take the output as an “analyze.txt” file to enter directly at the site to generate the corresponding science overlay map in VOSviewer. As noted, one can vary the resulting overlay maps in several ways in VOSviewer to accentuate points of interest.
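To inspect such an export before uploading it, one could read the category counts with a few lines of Python. The layout assumed below (a header row followed by tab-separated lines with the category name in the first column and the record count in the second) is an assumption made for this sketch; the actual WoS export may differ and should be checked. The function names and the sample text are hypothetical.

```python
import csv
import io

def read_wc_counts(text):
    """Return {category: record count} from tab-separated text (assumed layout)."""
    counts = {}
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader, None)                       # skip the assumed header row
    for row in reader:
        if len(row) < 2:
            continue                         # ignore blank or footer lines
        name = row[0].strip()
        raw = row[1].strip().replace(",", "")
        if name and raw.isdigit():
            counts[name] = counts.get(name, 0) + int(raw)
    return counts

def to_proportions(counts):
    """Share of category assignments per WC (a paper may carry several WCs)."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

# Made-up sample in the assumed layout.
sample = "Web of Science Categories\trecords\nEconomics\t120\nStatistics & Probability\t30\n"
wc_counts = read_wc_counts(sample)
print(to_proportions(wc_counts))
```

The resulting proportions are the p_i values that an overlay sizes its nodes by and that a diversity measure takes as input.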

Another way to compute the maps is to use VantagePoint (http://www.thevantagepoint.com) to process a search set downloaded from WoS. If one mainly wants a science overlay map of the full search set as is, it is easier to output the “analyze.txt” file from WoS for entry into http://www.leydesdorff.net/wc15. However, if one has cause to process the search set data further, VantagePoint provides helpful tools to facilitate data cleaning (e.g. to remove inappropriate items from the search set) or to analyze sub-data sets (e.g. to compare what selected organizations have published on, say, nanotechnology).

The website provides the option to generate either five-cluster science overlay maps or finer-scaled (color-differentiated) 18-cluster overlay maps. Both cluster solutions were generated in VOSviewer, using its algorithm. (Our previous clustering solutions were generated using factor analyses in SPSS, resulting in 4 “metadisciplines” (see Appendix Figure A-1) and 19 “macro-disciplines” for 2010 base data.) Appendix map A-3 shows the 18-cluster basemap. Appendix map A-4 shows an overlay for the London School of Economics as an example.

Case Examples

Our intent here is to present a range of maps to illustrate differences that the new science overlay mapping can convey. We hope that these promote thinking of additional uses of science overlay mapping, potentially augmented by enabling calculation of diversity measures (e.g. Specialization and Integration scores) with the same tool suite.

Figures 2 and 3 compare two multinational companies’ research publications in WoS for 2010–2015. Both show biomedical and physical science strengths. Unlike Unilever, Pfizer also has a pronounced portfolio in “economics” and “statistics and probability” as fields of science. These visualizations facilitate exploration of shared and complementary research interests, potentially of use in considering collaboration (as well as tracking competition) among organizations or nations.

Figures 4, 5, and 6 present three contrasting university profiles. Patterns stand out quite boldly among the engineering-oriented Georgia Tech, the social science emphases of the London School of Economics, and the full spectrum University of Amsterdam research. In contrast to Figure 5, Figure A-4 (Appendix) presents the same data using an 18-cluster map that facilitates finer comparisons.

Figure 5. Science overlay map for the London School of Economics.

Figure 6. Science overlay map for the University of Amsterdam.

Usually one would want to focus more tightly—e.g. on a particular research unit or even on an individual researcher’s work (say to ascertain complementarity with another research group or emphases of a funding program). As one step in that direction, contrast the emphases seen in Figure 4 to its subset for one department of Georgia Tech, the School of Public Policy, shown in Figure 7.

Figure 7. Science overlay map for the School of Public Policy, Georgia Tech.

Conversely, one can observe even broader research profiles; Figure A-5 does so for a country, South Africa. Not surprisingly, one sees a very broad spectrum of research activity at this level. One could pursue this via further analyses, e.g. to identify researchers active in a particular sub-domain spotted on a map. We envision various uses for such technical intelligence, ranging from identifying others pursuing one’s area of interest to identifying complementary strengths for research center development.

Figures 2 to 7 map the research outputs of a given organization. One can map other WoS search sets as well. For instance, in a study of the outputs and impacts of an NSF research program on Human & Social Dynamics (HSD), science overlay mapping was useful for those assessing the merits of that program to see the diversity of the publications generated by HSD support. However, it was even more interesting to see the spread of papers citing those publications across the disciplines. Those showed that this funding from the Social, Behavioral & Economic Sciences Directorate was actively cited beyond those social sciences by natural sciences and engineering (Garner et al., 2013).

Another appealing opportunity arises in mapping topical searches. Figure 8 illustrates this for an emerging energy technology, dye-sensitized solar cells (DSSCs), dominated by materials science and related research. “Big Data” (using a first approach) (Figure 9) shows a strong concentration in Computer Science and related fields, but note the remarkable breadth of publication, as virtually all fields consider how Big Data and Analytics can enhance their R&D. Such research profiling could support funding agencies’ confirmation of interdisciplinary research programs.

Figure 8. Science overlay map for dye-sensitized solar cells.

Discussion

This article bolsters science overlay mapping as a tool for researchers and analysts to help understand the disciplinary profiles of organizations, funding programs, topics, or other types of publication sets. Visualization of the disciplinary profile, operationalized at the sub-discipline level of 227 Web of Science Categories (WCs), can now offer an adjustable, “bird’s-eye” view of the fields involved. By choosing the 18-cluster option (Figure A-3) or the five-cluster option (Figure 1), one can present the analysis at a finer or broader level of disciplinary description.

We use a cosine-normalized basemap in this paper’s examples, but note the option of a non-normalized matrix that can default to VOSviewer’s internal normalization scheme for a different presentation (e.g. Figure A-2). We favor the cosine normalization as 1) yielding more intuitive results, 2) consistent with our prior overlay maps (see Figure A-1), 3) shown to be consistent with consensus science mapping (e.g. various renditions by Klavans & Boyack (in press) and others (Klavans & Boyack, 2009)), and 4) conducive to use as the basis of the disparity measure in calculating diversity indexes (Rao-Stirling). Comparing with Figure A-1 also shows the general continuity between the previous Pajek visualization and the current VOSviewer one. It also shows some differences, both in the visual rendition and in node localizations. We now favor VOSviewer for its ease of use and the accessible richness of its visualization options.

Figure A-1. Basemap 2010, using Pajek with a four-factor analysis decomposition.

Figure A-2. Basemap 2015, using VOSviewer with a five-cluster decomposition. Settings: Attraction 2, Repulsion 0. This file can be web-started at http://www.vosviewer.com/vosviewer.php?map=http://www.leydesdorff.net/wc15/mtrxmap.txt&network=http://www.leydesdorff.net/wc15/mtrxnet.txt&label_size_variation=0.3&=1&colored_lines&curved_lines&n_lines=10000.

Figure A-3. 2015 basemap with an 18-cluster decomposition.

Figure A-4. 18-cluster science overlay map for the London School of Economics (LSE).

Figure A-5. Science overlay map for South Africa (based on 72,937 records in a simple search on cu = South Africa for 2010–2015 in SCI+SSCI).

As illustrated in the case examples, these science overlay maps can provide a quick and intuitive perspective on the disciplinary profiles of organizations. As explained in Rafols, Porter, and Leydesdorff (2010) (see also Leydesdorff & Bornmann, 2016; Rafols & Leydesdorff, 2009; Rafols & Meyer, 2010; van Eck et al., 2013), the main downside of this visualization tool is the lack of accuracy in the WCs, which nevertheless remain the most widely used and easily available classification system. As shown in a previous study (Rafols, Porter, & Leydesdorff, 2010), the lack of accuracy of WCs is less problematic at a relatively high level of aggregation. Most errors in locating specific research are nearby in the mapping. For fine-grained descriptions, article-based clustering is preferred (Waltman & van Eck, 2012). However, article-based clustering does not match the WC-based mapping for communicating which fields are engaged, and to what degree.

We believe these new science overlay maps open opportunities for future research. For one, exploration of the differences between the global science maps over time (e.g. between the 2010 and 2015 basemaps) shows promise to elucidate real shifts in global research emphases. For instance, is medical science becoming more closely related to biological sciences and less linked to chemistry? The basemaps appear to evolve slowly, as shown by the fact that the underlying 2010 and 2015 citation matrices among WCs are very similar (QAP correlation r = 0.937; p < 0.001) in spite of considerable changes in WoS journal inclusion over that period. This justifies their use for overlays over a certain temporal range.
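A QAP correlation of the kind reported here can be sketched as follows. This is a generic illustration of the technique, not the software behind the reported value: it assumes two square matrices that have already been restricted to the same categories in the same order (the 2010 and 2015 sets differ in size, so such alignment would be needed), and the function name, the random toy matrices, and the number of permutations are hypothetical choices.

```python
import numpy as np

def qap_correlation(a, b, n_permutations=999, seed=0):
    """Pearson r between off-diagonal cells, with a node-permutation p-value."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    off_diag = ~np.eye(a.shape[0], dtype=bool)
    observed = np.corrcoef(a[off_diag], b[off_diag])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_permutations):
        perm = rng.permutation(a.shape[0])
        permuted = a[np.ix_(perm, perm)]     # relabel rows and columns together
        r = np.corrcoef(permuted[off_diag], b[off_diag])[0, 1]
        if abs(r) >= abs(observed):
            hits += 1
    p_value = (hits + 1) / (n_permutations + 1)
    return observed, p_value

# Toy matrices: the second is the first plus a small perturbation.
rng = np.random.default_rng(1)
m_first = rng.random((6, 6))
m_second = m_first + 0.05 * rng.random((6, 6))
r, p = qap_correlation(m_first, m_second)
print(round(r, 3), p)
```

Permuting rows and columns jointly keeps the network structure of the first matrix intact while breaking its correspondence with the second, which is what distinguishes QAP from an ordinary significance test on the cell values.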

In stepping through the case analyses, we have pointed to a variety of appealing applications for the science overlay mapping. We believe the enhanced clustering of the WCs, improved visualization, and simplified processing will enable various scientometric applications. We do not repeat those here, but note a synergistic capability offered by the integrated data processing hereby enabled. Namely, analysts can now treat multiple aspects of cross-disciplinary engagement in tandem—science overlay mapping, social network analyses (e.g. by comparing connection strengths among WC nodes over time), and diversity (e.g. through calculation of Specialization, Integration, and/or Diffusion scores).

eISSN: 2543-683X
Language: English

Publication timeframe: 4 times per year
Journal Subjects: Computer Sciences, Information Technology, Project Management, Databases and Data Mining


Article Category: Research Paper

Published Online: Aug 22, 2017

Page range: 68 - 111

Received: May 20, 2017

Accepted: Jul 21, 2017

DOI: https://doi.org/10.1515/jdis-2017-0015

Keywords: Science overlay maps, Science visualization, Scientometrics, Bibliometrics, Interdisciplinary research, Multidisciplinarity, Research policy, Research management

© 2017 Stephen Carley, Alan L. Porter, Ismael Rafols & Loet Leydesdorff

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
