Open Access

Library and Information Science Papers Discussed on Twitter: A new Network-based Approach for Measuring Public Attention



Introduction

In recent years, we have witnessed a general trend in research evaluation towards measuring the impact research has on society (beyond science) or the attention research receives from other parts of society. Whereas the UK Research Excellence Framework (REF) used a case-study approach for societal impact measurements, altmetrics has been proposed to measure impact or attention quantitatively (Bornmann, Haunschild, & Adams, 2019). Since the introduction of altmetrics, most quantitative studies have focussed on Mendeley or Twitter data (i.e. saves of publications in this online reference manager and short messages with links to publications, respectively). Whereas Mendeley data might be useful in research evaluation to measure the early impact of publications (which can scarcely be measured by citations) (Thelwall, 2018), the usefulness of Twitter counts has frequently been questioned (e.g. Bornmann, 2015; Robinson-Garcia et al., 2017).

Hellsten and Leydesdorff (2020) analyzed Twitter data and mapped the co-occurrences of hashtags (as representations of topics) and usernames (as addressed actors). The resulting networks can show the relationships between three different types of nodes: authors, actors, and topics. The maps demonstrate how actors and topics are co-addressed in science-related communications. Wouters, Zahedi, and Costas (2019) discussed such an approach as a new and valid procedure for using social media data in research evaluation. Recently, Haunschild et al. (2019) explored a network-oriented approach for using Twitter data in research evaluation. Such a methodology can be used to measure the public discussion around a field or topic; for example, Haunschild et al. (2019) based their study on papers about climate change.

This approach can be used to study how the public discussion of a certain topic differs from the discussion of that topic within the research community. In this study, we use all papers published during the period 2010–2017 in journals covered by the subject category “Information Science & Library Science” in the Web of Science (WoS, Clarivate Analytics). The objective is to compare the publicly discussed topics with the topics of research as discussed within the journals classified as library and information science (LIS) by Clarivate Analytics.

Methodology
Datasets

We used the WoS data of the in-house database of the Max Planck Society (MPG) derived from the Science Citation Index Expanded (SCI-E), Social Sciences Citation Index (SSCI), and Arts and Humanities Citation Index (AHCI) licensed from Clarivate Analytics (Philadelphia, USA). In this database, 86,657 papers were assigned to the WoS subject category “Information Science & Library Science” and published between 2010 and 2017. Of these papers, 31,348 (36.2%) have a DOI in the database. Following previous studies (Bornmann, Haunschild, & Marx, 2016), we used the Perl module Bib::CrossRef to search for additional DOIs; only 2,478 additional DOIs were obtained by this procedure. The combined set of WoS and CrossRef DOIs was searched for DOIs occurring multiple times, and such duplicates were removed. Finally, a set of 33,312 papers (38.4%) with a DOI was obtained.
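In outline, this DOI consolidation corresponds to the following sketch (illustrative Python with made-up DOIs; the actual processing used Perl and the in-house database, and details such as the case normalization and the function name are our assumptions):

```python
# Illustrative sketch of merging WoS and CrossRef DOIs and removing
# duplicates; not the authors' actual Perl routine.
def merge_dois(wos_dois, crossref_dois):
    seen = set()
    merged = []
    for doi in list(wos_dois) + list(crossref_dois):
        key = doi.strip().lower()  # DOIs are case-insensitive
        if key not in seen:        # keep each DOI only once
            seen.add(key)
            merged.append(key)
    return merged

# hypothetical DOIs for illustration
dois = merge_dois(["10.1000/X", "10.1000/y"], ["10.1000/x", "10.1000/z"])
```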

The company Altmetric.com (see https://www.altmetric.com) tracks mentions of scientific papers in various altmetrics sources (e.g. Twitter, Facebook, news outlets, and Wikipedia). Altmetric.com monitors Twitter for tweets that reference scientific papers; tweets may refer to the content of papers, and Twitter users often use hashtags to index their tweets. The company also monitors news outlets for online news items which reference scientific papers (via direct links, text mining, or unique identifiers in, e.g. the Washington Post). Altmetric.com provides free access to the resulting datasets for research purposes via its API or snapshots.

We received the most recent snapshot from Altmetric.com on October 30, 2019. This snapshot was imported and processed in our locally maintained PostgreSQL database at the Max Planck Institute for Solid State Research. We matched the combined set of 33,312 papers via their DOIs with our locally maintained database of altmetrics data. In Haunschild, Leydesdorff, and Bornmann (2019), an earlier Altmetric.com snapshot from 10th June 2018 was used. Recently, we found two data problems with that snapshot: (i) Altmetric.com offered a partial dataset, the limitations of which were not made clear at the time of delivery, and (ii) due to an error in our routine, we inadvertently did not import all data provided by Altmetric.com at that time into our local database. Therefore, we used the newer data snapshot for this study (see also Haunschild et al., 2020).
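The matching step can be sketched as a join on DOI; the snippet below uses Python's built-in sqlite3 as a stand-in for the PostgreSQL setup, with illustrative table names and toy data:

```python
import sqlite3

# Stand-in for the locally maintained altmetrics database (toy data;
# the actual setup is a PostgreSQL database with different schema names).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE altmetrics (doi TEXT PRIMARY KEY, "
             "tweet_count INTEGER, news_count INTEGER)")
conn.executemany("INSERT INTO altmetrics VALUES (?, ?, ?)",
                 [("10.1000/a", 5, 1), ("10.1000/b", 2, 0)])

# The combined set of LIS paper DOIs (hypothetical values).
conn.execute("CREATE TABLE lis_papers (doi TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO lis_papers VALUES (?)",
                 [("10.1000/a",), ("10.1000/c",)])

# Papers are matched with altmetrics data via their DOIs.
rows = conn.execute(
    "SELECT l.doi, a.tweet_count, a.news_count "
    "FROM lis_papers l JOIN altmetrics a ON l.doi = a.doi"
).fetchall()
```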

The following information was appended to the DOIs: (1) links to the tweets which mentioned the respective paper, (2) the number of tweets in which the respective paper was mentioned, and (3) the number of mentions of the same paper in news outlets. Among the LIS papers with a DOI, 11,421 papers (13.2% of all LIS papers) were mentioned in 91,914 tweets; 7,513 papers (8.7%) were mentioned by at least two Twitter accounts in 87,529 tweets. Only 469 papers (0.5%) were also mentioned in news outlets. The additional consideration of news outlets is intended to identify topics in Twitter discussions which are also reflected in the news sector.

Data

In the most recent Altmetric.com data dump, no tweet URLs were available, but only the IDs of tweets. We used these tweet IDs to download the 87,529 tweets with all additionally available information from the Twitter API using R (R Core Team, 2019) between 5th and 6th November 2019. We are interested in all author keywords and hashtags, including name variants. Since hashtags start with the # sign, no stop-word list is needed. The most frequently occurring author keywords and hashtags were selected for further analysis (see below). We used a cosine-normalized term co-occurrence matrix generated with a dedicated routine written in Visual Basic (see https://www.leydesdorff.net/software/twitter).
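The cosine normalization of the term co-occurrence matrix can be illustrated in pure Python (toy hashtag data; we assume Salton's cosine, i.e. the co-occurrence count divided by the geometric mean of the individual occurrence counts — the dedicated Visual Basic routine may differ in detail):

```python
import math
from collections import defaultdict
from itertools import combinations

# Toy data: each tweet is represented by its set of hashtags.
tweets = [
    {"#altmetrics", "#bibliometrics"},
    {"#altmetrics", "#openaccess"},
    {"#altmetrics", "#bibliometrics", "#openaccess"},
]

occ = defaultdict(int)   # number of tweets containing each term
cooc = defaultdict(int)  # number of tweets containing both terms of a pair
for tags in tweets:
    for tag in tags:
        occ[tag] += 1
    for a, b in combinations(sorted(tags), 2):
        cooc[(a, b)] += 1

# Salton's cosine: normalize each co-occurrence count by the geometric
# mean of the two individual occurrence counts.
cosine = {pair: n / math.sqrt(occ[pair[0]] * occ[pair[1]])
          for pair, n in cooc.items()}
# e.g. cosine[("#altmetrics", "#bibliometrics")] = 2 / sqrt(3 * 2) ≈ 0.816
```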

We exported four different sets of author keywords: (1) author keywords of all LIS papers, (2) author keywords of not-tweeted papers, (3) author keywords of papers tweeted at least twice, and (4) author keywords of papers tweeted at least twice and mentioned in news outlets at least once. In total, 1,366 different author keywords occurred in LIS papers tweeted by at least two accounts and mentioned in news outlets at least once; 211 of these author keywords occurred at least twice, and 65 of them occurred at least three times. We used the top-65 author keywords of each set in order to compare networks of the same, displayable size.
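The selection of the top keywords, including the handling of ties at the cut-off, can be sketched as follows (illustrative Python; the helper name and the exact tie-breaking behaviour are our assumptions, chosen to be consistent with the top-64 and top-63 sets reported in the Results section):

```python
from collections import Counter

def top_keywords(counts: Counter, n: int):
    """Return the top-n keywords by frequency; if keywords are tied at
    the cut-off, the whole tied group at the boundary is dropped, so
    fewer than n keywords may be returned (assumed behaviour)."""
    ranked = counts.most_common()
    if len(ranked) <= n:
        return [kw for kw, _ in ranked]
    cutoff = ranked[n - 1][1]
    if ranked[n][1] == cutoff:  # tie straddles the rank-n boundary
        ranked = [item for item in ranked if item[1] > cutoff]
    else:
        ranked = ranked[:n]
    return [kw for kw, _ in ranked]

# Toy example: ranks 3-5 are tied, so a top-3 request yields only two keywords.
counts = Counter({"altmetrics": 5, "bibliometrics": 4,
                  "libraries": 3, "open access": 3, "twitter": 3})
```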

When we refer below to “tweeted papers”, only papers tweeted at least twice are meant; “not-tweeted papers” refers to papers that were not tweeted at all. Papers tweeted exactly once (n=3,908) are excluded from the analysis in order to reduce noise: many papers are tweeted only a single time by the publisher or by the authors themselves for self-promotion, and we consider these single occurrences as noise.

Visualization

The resulting files (containing cosine-normalized distributions of terms in the Pajek format, see http://mrvar.fdv.uni-lj.si/pajek) were laid out using the algorithm of Kamada and Kawai (1989) in Pajek and then exported to VOSviewer v.1.6.12 for visualization. The community-searching algorithm in VOSviewer was employed with a resolution parameter of 1.0, a minimum cluster size of 1, 10 random starts, 10 iterations, a random seed of 0, and the option “merge small clusters” enabled. The size of a node indicates the frequency of co-occurrence of the corresponding term with all other terms on the map. A line between two nodes indicates that the two terms co-occur; its thickness indicates their co-occurrence frequency.

Results
Author keywords

Figure 1 shows the semantic map of the top-65 author keywords of LIS publications. This map visualizes the author keywords used within scholarly communication. Five clusters are marked by different colours; they reveal the broad spectrum of LIS research. The green cluster represents the core of scientometrics, including bibliometrics and most of the altmetrics-related keywords. The yellow cluster is centred on text mining, data mining, and related topics, such as semantics and machine learning. The red cluster contains author keywords related to social media and social networks. The blue cluster deals mainly with libraries and higher-education issues. The purple cluster contains the author keywords “Social network analysis” and “Network analysis”; these methods are used in papers across many of the other clusters. Both nodes of the purple cluster have many strong links to the red and green clusters, which also shows their topical relations to scientometrics and social media.

Figure 1

Top-65 author keywords of LIS papers published between 2011 and 2017. An interactive version of this network can be viewed at https://tinyurl.com/qwvtoeq. Note that the colour scheme may be different in the interactive version.

Figure 2 shows the semantic map of the top-64 author keywords of not-tweeted LIS publications. The author keywords on ranks 65–67 are tied in this case; therefore, we decided to display the top-64 author keywords. The author keywords are grouped in six different clusters. Overall, the grouping is similar to the clustering in Figure 1. The semantic maps in Figure 1 and Figure 2 have an overlap of 55 author keywords (85.9%).

Figure 2

Top-64 author keywords of not-tweeted LIS papers published between 2011 and 2017. An interactive version of this network can be viewed at: https://tinyurl.com/u3569lc. Note that the colour scheme may be different in the interactive version.

Figure 3 shows the semantic map of the top-63 author keywords of tweeted LIS publications. The author keywords on ranks 64–69 are tied in this case; therefore, we decided to display the top-63 author keywords. Six different clusters are found: the green, red, yellow, and blue clusters roughly correspond to their counterparts in Figure 1. The purple cluster comprises author keywords about qualitative research and health care, while some author keywords related to electronic health records are grouped in the yellow (semantics and text mining) cluster. Overall, the semantic maps in Figure 1 and Figure 2 share 47 (74.6%) and 37 (58.7%) keywords, respectively, with the semantic map in Figure 3. Although the quantitative agreement between the semantic maps in Figure 1, Figure 2, and Figure 3 decreases considerably, the qualitative agreement is still large for most of the top 63–65 author keywords of LIS papers: the core author keywords of scientometrics, bibliometrics, altmetrics, text mining, data mining, and social networks still appear in all maps and are grouped in the same clusters, independently of the specific paper set.

Figure 3

Top-63 author keywords of LIS papers tweeted and published between 2011 and 2017. An interactive version of this network can be viewed at: https://tinyurl.com/rfmy4vz. Note that the colour scheme may be different in the interactive version.

Figure 4 shows the semantic map of the top-65 author keywords of LIS publications which were tweeted and mentioned in news outlets. This network is less dense. Nine different clusters are shown in Figure 4. The rose cluster contains only a single author keyword: “Certification” (the rose dot left of “Scientometrics” and “Citation_analysis”). The red cluster represents the core of scientometrics, bibliometrics, altmetrics, and scholarly publishing. The author keywords related to social media are split up into two different clusters: light-blue and orange. The purple cluster contains author keywords related to electronic health issues. The yellow cluster contains various information-related author keywords. The green cluster contains author keywords related to journalism and big data. Health-related author keywords are also mixed into the green and yellow clusters. The blue cluster contains author keywords related to qualitative sociology research. The brown cluster is mainly related to privacy issues on the internet. Rather few author keywords in the semantic map of Figure 4 also appeared in the previous figures: 24 (36.9%) in the case of Figure 1, 19 (29.7%) in the case of Figure 2, and 28 (44.4%) in the case of Figure 3.

Figure 4

Top-65 author keywords of LIS papers tweeted, mentioned in news outlets at least once, and published between 2011 and 2017. An interactive version of this network can be viewed at: https://tinyurl.com/twssvt5. Note that the colour scheme may be different in the interactive version.

Table 1 shows the overlap between the top author keywords of all LIS publications (“All”), not-tweeted LIS publications (“Not tweeted”), LIS publications tweeted at least twice (“Tweeted”), and LIS publications tweeted at least twice and mentioned in news outlets at least once (“Tweeted and mentioned in the news”). The lower triangle shows the absolute number of overlapping author keywords, and the upper triangle shows the proportion of overlapping keywords. The top-65 author keywords of publications which were tweeted and mentioned in news outlets show an overlap of about one third with the sets of top author keywords of all and of not-tweeted publications. The overlap with the author keywords of tweeted publications is higher. This might be partly due to the fact that the publications which were tweeted and mentioned in news outlets are a subset of the tweeted publications. However, this fact cannot explain all of the differences among the overlaps.

The selection of top author keywords varies only slightly from all and not-tweeted publications to publications tweeted at least twice, but it differs considerably for publications tweeted at least twice and also mentioned in news outlets. These results suggest that Twitter activity is rather high in library and information science in comparison with other subject categories (Bornmann & Haunschild, 2016). Most of the topics seem to be used both on Twitter and in the scholarly literature. Most of the author keywords of LIS papers which were also mentioned in news outlets have a strong thematic relation to health care.

Overlap between top author keywords. The lower triangle shows the absolute number of overlapping keywords and the upper triangle shows the proportion of overlapping keywords.

                                     All   Not tweeted   Tweeted   Tweeted and mentioned in the news
All                                   65         85.9%     74.6%       36.9%
Not tweeted                           55            64     58.7%       29.7%
Tweeted                               47            37        63       44.4%
Tweeted and mentioned in the news     24            19        28          65
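The entries of such an overlap table can be reproduced from the keyword sets themselves; the sketch below (with toy sets) assumes that the proportions are taken relative to the smaller of the two sets, which is consistent with the reported values (e.g. 55/64 ≈ 85.9% and 28/63 ≈ 44.4%):

```python
def overlap(a: set, b: set):
    """Absolute overlap and overlap as a share of the smaller set
    (assumed denominator, consistent with the reported percentages)."""
    inter = len(a & b)
    return inter, inter / min(len(a), len(b))

# Toy keyword sets standing in for the top-keyword sets of Table 1.
all_kw = {"altmetrics", "bibliometrics", "libraries", "twitter"}
tweeted_kw = {"altmetrics", "bibliometrics", "open access"}

n_overlap, share = overlap(all_kw, tweeted_kw)
# n_overlap = 2, share = 2/3
```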
Hashtags

Figure 5 shows the semantic map of the top-65 hashtags of tweets mentioning LIS publications. The hashtags are grouped in eight different clusters. The red cluster mainly contains hashtags related to libraries, scientometrics, bibliometrics, and altmetrics. The hashtags in the green cluster are related to digital and electronic health care. The yellow cluster contains hashtags about big data and open data related to health care issues and financial technology. The blue cluster is mainly related to the World Development Report 2016, entitled “Digital Dividends”, and related topics. The purple cluster is focussed on open access and open science. The remaining three clusters are very small: three health-related hashtags are gathered in the light-blue cluster, which could well have been part of the yellow or green cluster if other parameters had been used in the clustering algorithm. Two hashtags (“#PAYWALLED” and “#RICKYPO”) form the orange cluster. The brown cluster contains only the hashtag “#WIKILEAKS”.

Figure 5

Top-65 hashtags from tweets which mentioned a LIS paper published between 2011 and 2017. An interactive version of this network can be viewed at: https://tinyurl.com/sv8gpax. Note that the colour scheme may be different in the interactive version.

The semantic map in Figure 5 shows many hashtags which are mainly related to the author keywords of the semantic map in Figure 4, but also hashtags which seem to be unrelated to all other semantic maps, e.g. most of the hashtags in the green, light-blue, blue, and orange clusters. Compared with the author keywords, many other hashtags focus more strongly on specific events and buzzwords, e.g. “#WDR2016”, “#ICT4D”, “#PAYWALLED”, and “#WIKILEAKS”.

Discussion and conclusions

Many scientometric studies have used Twitter counts for measuring societal impact, but the meaningfulness of these data for such measurements in research evaluations (or measurements of attention) has been questioned (Haunschild et al., 2019). We followed our recent proposal (Haunschild et al., 2019) to analyze hashtags in tweets and author keywords in scientific papers in separate sets, in order to differentiate public discussions of certain topics from how these topics are addressed in research. We analyzed four datasets: (1) author keywords of all LIS papers, (2) author keywords of not-tweeted papers, (3) author keywords of papers tweeted at least twice, and (4) author keywords of papers tweeted at least twice and mentioned in news outlets at least once.

Our study is based on the papers in the WoS subject category LIS which have a DOI and were published between 2011 and 2017. Unfortunately, fewer than half of the LIS papers have a DOI, which is a major limitation of this study. We used Twitter data to reveal topics of public interest and compared them to research-focused topics. Such an analysis can provide insights into a subject category by revealing which topics enter public discussions and which do not. Furthermore, the connections between the different topics become visible through the network-oriented approach.

Our results show that topics in LIS papers seem to be represented rather well on Twitter. Similar topics appear in the networks of author keywords of all LIS papers, not-tweeted LIS papers, and tweeted LIS papers. The networks of the author keywords of all LIS papers and of not-tweeted LIS papers are most similar to each other in terms of author keyword overlap. Larger differences were found between these first three networks of scholarly communication on the one hand and, on the other, the networks of hashtags and of author keywords of LIS papers which were tweeted by at least two accounts and mentioned in news outlets at least once, as representations of public discourse. Both the latter scholarly discourse and the tweets are oriented more towards digital and electronic health care than tweeted LIS papers, not-tweeted LIS papers, or all LIS papers. Our results confirm that only specific aspects of research outcomes intersect directly with the attention of the general public. Moving from the author keywords of all LIS papers to those of tweeted papers and then to those of papers additionally mentioned in the news, the focus shifts from theoretical applications and methodologies to health applications, social media, privacy issues, and sociological studies.

Although we used a different data dump from Altmetric.com in this study than in our ISSI 2019 conference contribution (Haunschild, Leydesdorff, & Bornmann, 2019), the conclusions and interpretations in that conference paper were confirmed. In a similar paper on discussions about climate change, Haunschild et al. (2019) came to the following conclusion: “publications using scientific jargon are less likely to be tweeted than publications using more general keywords” (p. 18). A similar tendency was not visible in the current study using LIS papers and tweets as data. A possible reason for this difference is that the scientific jargon in LIS is less technical than in climate-change research.

eISSN: 2543-683X
Language: English
Publication timeframe: 4 times per year
Journal Subjects: Computer Sciences, Information Technology, Project Management, Databases and Data Mining