This research is motivated by the lack of studies focusing on the acknowledgments sections of published papers. A further motivation is the absence of any study examining the countries and organizations mentioned in the acknowledgments section and their influence, something that cannot be analyzed using citation or co-authorship relationships. Qualitative analysis of acknowledgments has been limited because acknowledgment sections do not follow a uniform pattern. Our research aims to identify useful information hidden within the acknowledgment sections of the articles stored in the PubMed Central database and to map influence via a country-acknowledgment network. To this end, we use topic modeling to analyze the topics of acknowledgments and conduct a basic network analysis to find the differences between the co-country network and the acknowledgment network. A word-embedding model is used to compare the semantic similarity between the authors and countries extracted from our original dataset. The results of topic modeling suggest that funding has become a critical topic in acknowledgments. The results of network analysis indicate that some large countries work as hubs both implicitly and explicitly, while some countries such as China do not frequently work with other countries. The word-embedding model built from acknowledgments suggests that the authors frequently referenced in acknowledgments are also likely to be referred to in similar contexts. It also implies that the publishing country of a paper has little effect on whether it receives an acknowledgment from any other specific country. From these results, we conclude that the content of acknowledgments extracted from the papers can be divided into two categories: funding and appreciation.
We also find that there is no clear relationship between the publication country and the countries mentioned in the acknowledgment section.
Publishing a scientific paper is a pivotal process in which a scholar builds an invisible college where other scholars are involved in the process of publication and communicate via the bibliometric artifacts (Zuccala, 2006). The most representative feature that shows this formal communication is the “citation.” Since Garfield (1955) brought up the usefulness of the citation to trace the scholarly impact of a scientific paper in a domain, the citation has been widely used as a quantitative factor to estimate scientific performance. Meanwhile, the direct interaction between scholars can be measured by using a co-authorship relation. This relationship is a robust tool that shows collaboration and interdependence trends in a scientific domain (Cronin, 2001a). Another important feature is the acknowledgment, which is a critical feature mirroring communication among scholars. Within these sections, the literature and links have significant value in terms of bibliometric study (Borgman & Furner, 2002). As Cronin (1995) pointed out, acknowledgment operationalizes influence and reflects social exchanges of intellectual debts in scholarly communication.
In general, the acknowledgment section comprises statements of various kinds, such as appreciation to colleagues for providing content, to funding agencies for providing support, and messages of gratitude and dedication to members of the family (Cronin, Shaw, & La Barre, 2003). This feature is distinct from others due to its informal approach and diversity; it is rich in information regarding the development of an invisible college. Cronin & Weaver (1995) argued that acknowledgment is one of the components of the “Reward Triangle,” which consists of authorship, citedness, and acknowledgments, and that it is one of the most important bibliometric features presented in a paper. While the other two features have been treated as important measurements of scholarly communication and scholarly impact respectively, the difficulty of aggregating and mapping data on the acknowledgment section of each scientific paper makes it hard to analyze acknowledgments (Cronin, 2001a). However, due to the development of computational techniques, studies have begun to analyze the statements found in the acknowledgment section (Cronin, Shaw, & La Barre, 2003). Nevertheless, even though countries and organizations presented in the acknowledgment section have an important influence on the work in which they are mentioned, they have been disregarded (Cronin, 1991).
To tackle this issue, the present study adopts state-of-the-art text-mining techniques such as machine learning-based Named Entity Recognition (NER), topic modeling, and word embedding. The goal of the study is to identify distinct characteristics and their roles in impacting intellectual influences laid out in acknowledgments. For the data, we downloaded 1,565,733 full-text records in XML form from PubMed Central and extracted the acknowledgment sections. The rest of the paper is organized as follows: in Section 2, we discuss related works and some of their limitations; in Section 3, we discuss our three methods of analysis: DMR, network analysis, and word2vec; in Section 4, we discuss the results of the three methods described in Section 3; and we conclude with a discussion and conclusion that detail and connect the results from Section 4 and state some of the limitations of our analysis.
2 Related works
The first work that considered the acknowledgment section as an important feature was conducted in 1973 by a sociologist (Patel, 1973). He analyzed the collaboration of authors within the sociology domain to identify the growth of American sociology and defined three types of collaboration: authorship collaboration, cross-institutional collaboration, and sub-authorship collaboration. To build the sub-authorship collaboration, the acknowledgment sections of papers were manually extracted and then measured, successfully showing the increase of multipurpose collaboration in American sociology. Although the significance of acknowledgment as a bibliometric feature was confirmed by Patel (1973), there was no work concentrating on the role of acknowledgment in bibliometrics until Cronin (1991) published a paper on quantitative and qualitative analysis of acknowledgment. He collected 938 articles that had been published within a 20-year time span (1970–1990) and analyzed the acknowledgment sections of each article. The analysis revealed that almost half of the articles contained some statements of acknowledgment, and most of them were compound acknowledgments including personal, moral, financial, technical, and conceptual support from institutions, agencies, coworkers, and mentors.
McCain (1991) also conducted quantitative and qualitative analyses of the acknowledgment section to identify the exchange patterns of research-related information in the journal Genetics. In biology, physical research products, such as experimental materials and innovative instruments, locally produced software, and datasets, are important for researchers to verify their scientific findings. McCain noticed that the acknowledgment section plays a vital role as a communication channel for these kinds of information. The common feature of those very early works is that the unit of analysis was only one journal. Cronin, McKenzie, & Stiffler (1992) and Cronin et al. (1993) expanded the dataset to several journals, as opposed to one, while using the same analysis method. The former study was followed by Cronin (2001), which revealed that over a period of about 10 years (1991–1999), both the proportion of articles containing acknowledgments and the number of articles published in the five main journals in the information science domain had increased.
Along with the empirical analysis of acknowledgment, Cronin & Weaver (1995) claimed that acknowledgments “define a variety of cognitive and social relationships between researchers and across discipline,” and set acknowledgment as a component of the “Reward Triangle,” which consists of authorship, citation, and acknowledgment. They emphasized that acknowledgment had not been utilized as much as the other components of the “Reward Triangle” notwithstanding its ability to map networks of influence. Meanwhile, in the late 1990s, most studies on acknowledgment placed stress on the funding information mentioned in the acknowledgment section, which is helpful when analyzing the effects of funding. Lewison (1994) collected acknowledgments of funding from the European Community’s Biotechnology Action Programme (BAP), which aimed to support high-quality research and to foster the construction of a European scientific community. He evaluated the fulfillment of these two purposes by counting the number of citations and nations belonging to the papers that contain funding acknowledgments about BAP. In a similar way, Lewison (1998) compared the impact of papers funded by some organizations to those funded by no one. The result showed that the impact of papers acknowledging different organizations varied considerably, and that papers without acknowledgments had less impact than those with acknowledgments. This tendency was supported by Lewison & Dawson (1998), who stated that “research supported by several funding bodies is likely to be of superior quality to that supported by only a single body or by none” in the biomedical domain (p.18).
However, Cronin & Shaw (1999) found that in the information science field, highly cited works were not likely to be funded and that many funded works had not been cited. Additionally, papers published in some countries (the UK, the US, and Canada) tended to receive more citations and contain more acknowledgments than papers published in other countries. The significance of the acknowledgment was reintroduced in 2001 when the notion that hidden social practices and large-scale collaborations in the biomedical domain could ruin the practice arose. Evaluating someone’s scholarly impact using co-authorship relationship was on the rise (Cronin, 2001b). Cronin et al. (2003) adopted the acknowledgment as a supplement to co-authorship to identify the pattern of collaboration in the psychology and philosophy domains. The result showed that the distribution of acknowledgment differed by domain, while the occurrence of acknowledgment had increased. They asserted that the result was evidence of the growth and firmness witnessed in the scholarly domains. In 2008 (the year when Web of Science started to collect funding acknowledgments), more in-depth studies on acknowledgment began to appear, thanks to growing interest in the relation between funding and acknowledgment, text-mining technologies in bibliometrics, and the overall increase in data.
Wang & Shapira (2011) identified more than 91,500 articles published in nanotechnology within a one-year time span (2008–2009) and found that about 67% of the articles contained information on funding in their acknowledgment sections. In addition, most nanotechnology funding was provided by the same countries where the works had been published. Their work was the first to identify the specific exchange of influence between countries and organizations using large-scale data on acknowledgment. Costas & Leeuwen (2012) utilized about 1,670,000 publications to depict how countries and disciplines are distributed in terms of funding acknowledgment. Along with the descriptive statistics, they explored the relationship between the presence of acknowledgment and impact, the relationship between the length of acknowledgment and impact, and the relationship between acknowledgment and co-authorship. The limitation of these two studies is that they concentrated only on funding acknowledgments; other kinds of acknowledgments were excluded because of the limitations of the dataset provided by Web of Science.
3 The Proposed Approach
Figure 1 shows the overall research design of the present study. We begin by extracting the required author and acknowledgment information from the PubMed Central database records. We then use the country data extracted from author affiliations to perform a co-country network analysis. From acknowledgments, entities such as person, organization, and country are extracted. We describe this in more detail in Section 3.1. In Section 3.3, we compare how the status of each country in traditional co-country analysis differs from its status in the acknowledgment network. We also use the acknowledgment data to generate a DMR model and a word2vec model, which are covered in Sections 3.2 and 3.4, respectively.
3.1 Data collecting and XML Parsing
There are 1,565,733 records in the PubMed Central database, 595,336 of which contain an acknowledgments section. We begin by parsing every file containing an acknowledgment section and extracting bibliometric information such as PubMed ID, date of publication, and the name(s), affiliation(s), and nationality of the author(s). The bibliometric data are used to examine descriptive statistics of the data and to build co-authorship networks in the following analysis. Additionally, the Named Entity Recognition (NER) function of the Stanford NLP toolkit (Manning et al., 2014) is utilized to find the organizations, individuals’ names, and country names referred to in an acknowledgment section. In the NER phase, due to the lack of uniformity of acknowledgment statements, only the entity with the highest accuracy is extracted from each acknowledgment section. Some errors in capturing organizations and countries are filtered with the help of a dictionary of compound words that indicate organizations.
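The parsing step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the records follow the JATS layout used by PubMed Central, in which the acknowledgment section is an `<ack>` element and the PubMed ID is an `<article-id pub-id-type="pmid">` element. The sample record and the function name `extract_acknowledgment` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened record in a JATS-like layout (assumption).
SAMPLE = """<article>
  <front>
    <article-meta>
      <article-id pub-id-type="pmid">12345678</article-id>
    </article-meta>
  </front>
  <body><p>Main text.</p></body>
  <back>
    <ack>
      <title>Acknowledgments</title>
      <p>We thank Dr. A. Smith for advice. This work was supported by the
      National Natural Science Foundation of China.</p>
    </ack>
  </back>
</article>"""


def extract_acknowledgment(xml_text):
    """Return (pmid, acknowledgment text); the text is None if no <ack> exists."""
    root = ET.fromstring(xml_text)
    pmid_node = root.find(".//article-id[@pub-id-type='pmid']")
    pmid = pmid_node.text.strip() if pmid_node is not None and pmid_node.text else None
    ack = root.find(".//ack")
    if ack is None:
        return pmid, None
    # Collect the text of every paragraph (the section title is skipped)
    # and collapse runs of whitespace left over from the XML layout.
    paragraphs = [" ".join("".join(p.itertext()).split()) for p in ack.iter("p")]
    return pmid, " ".join(paragraphs)


pmid, ack_text = extract_acknowledgment(SAMPLE)
print(pmid)
print(ack_text)
```

The extracted text would then be passed to the NER step; records for which the function returns `None` correspond to articles without an acknowledgment section.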
Usually, the names of national institutions contain the country name as well, for example, “National Natural Science Foundation of China.” However, national institutions in the United States often do not include the country name as part of the institution name. To test whether the NER system we used was appropriate for this method, we randomly selected 1,000 records. There were 247 papers that mentioned the names of national institutions in their acknowledgment sections. Of these 247 papers, 185 had country names in their acknowledgments. If there was evidence indicating the country in another part of the acknowledgment, we assumed that the national institution belongs to the country recognized by the NER system. This means that only about 6% of records (62 of 1,000) mention a national institution in their acknowledgments without a country name being recognized by the NER system.
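The share quoted above follows directly from the three counts in the paragraph. A short check, using only those reported numbers:

```python
sampled = 1000            # randomly selected records
with_institution = 247    # records mentioning a national institution
with_country = 185        # of those, records that also contain a country name

# Records with a national institution but no recognized country name.
unresolved = with_institution - with_country
share = unresolved / sampled

print(f"{unresolved} of {sampled} records = {share:.1%}")
```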
To measure the performance of our approach for country-name extraction, we randomly selected 600 records and compared the country names extracted by our system with manually collected data. In this process, we found that our country-name extraction system achieved an accuracy of 73.86%.
3.2 Topic Modeling
Topic modeling is a tool that has been widely adopted to simplify the organization and collection of topics in very large volumes of literature. In addition, by adding entities that are critical to scholarly impact, such as authors and organizations, the leading countries or authors in each topic can be identified through topic modeling (Song, et al., 2014). The specific model used in this study is the Dirichlet-multinomial Regression (DMR) model. In the DMR model, each topic has a distribution over words, as well as over metadata values such as author, country, and publication date (Mimno & McCallum, 2012). Additionally, because the model conditions on the observed metadata rather than estimating it, the type of metadata that can be used is not limited. The prior distribution over topics, α, is specific to each document and is a function of that document’s observed metadata. The model uses three fixed parameters, T (number of topics), σ² (variance of the prior on parameter values), and β (Dirichlet prior on the topic-word distribution), to find θ (topic distribution) and φ (word distribution) in dataset D. In this study, we define the acknowledgment sections of the PubMed Central articles as our literature, and use the articles’ publication dates to evaluate the change of a topic’s weight over time. Within the extracted acknowledgments sections, we target the countries in order to determine the correlation between a country and a specific topic.
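To make the role of metadata concrete, the sketch below computes a document-specific topic prior in the way the DMR model is usually formulated: α for topic t is the exponential of the dot product between the document's feature vector and that topic's regression weights (the weights themselves have a Gaussian prior with variance σ²). The two-feature setup, the weight values, and the function names are hypothetical illustrations, not values estimated in this study.

```python
import math


def dmr_alpha(features, lambdas):
    """Document-specific Dirichlet prior: alpha[t] = exp(features . lambdas[t])."""
    return [math.exp(sum(x * w for x, w in zip(features, lam))) for lam in lambdas]


def expected_theta(alpha):
    """Mean of Dirichlet(alpha), i.e. the prior expectation of the topic mixture."""
    total = sum(alpha)
    return [a / total for a in alpha]


# Hypothetical setup: features = [bias, published_2008_or_later], two topics.
lambdas = [
    [0.0, 1.0],   # topic 0 (say, a funding topic) gains weight in later papers
    [0.0, -1.0],  # topic 1 (say, an appreciation topic) loses weight
]

early = dmr_alpha([1.0, 0.0], lambdas)  # paper published before 2008
late = dmr_alpha([1.0, 1.0], lambdas)   # paper published in 2008 or later

print(expected_theta(early))
print(expected_theta(late))
```

With these toy weights the prior is uniform for the early paper and favors the first topic for the late one, which is the mechanism that lets publication dates shift topic weights over time.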
Before determining the parameters, we must screen the quality of the extracted topics according to the purpose of the application (Leydesdorff & Nerghes, 2017). The present study uses the ratio of topics that include countries, that is, the number of topics containing words that indicate countries divided by the total number of topics, as the criterion for deciding the number of topics. According to this criterion (Table 1), 20 topics were selected. The model was run for 2,000 iterations.
Table 1. Ratio of Topics Including Countries.
|The number of topics||10||15||20||25||30||35||40|
|The ratio of topics that contains countries||0.6||0.667||0.7||0.56||0.533||0.571||0.575|
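The selection criterion can be sketched in a few lines. The ratios are the values reported in Table 1; the helper `country_topic_ratio`, the toy topics, and the country vocabulary are hypothetical and only show how such a ratio could be computed from the top words of each topic.

```python
def country_topic_ratio(topics, country_words):
    """Share of topics whose word list contains at least one country word."""
    hits = sum(1 for words in topics if any(w in country_words for w in words))
    return hits / len(topics)


# Toy illustration of the ratio itself (hypothetical topics and vocabulary).
toy_topics = [["grant", "china", "foundation"], ["thank", "advice", "comment"]]
toy_ratio = country_topic_ratio(toy_topics, {"china", "german", "canada"})

# Ratios reported in Table 1, keyed by the number of topics.
ratios = {10: 0.6, 15: 0.667, 20: 0.7, 25: 0.56, 30: 0.533, 35: 0.571, 40: 0.575}
best_k = max(ratios, key=ratios.get)

print(toy_ratio, best_k)
```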
3.3 Network Analysis
Co-authorship is “the most visible indicator of collaboration” and there have been many studies on co-authorship networks, focusing on the effects of collaboration on organizational and institutional aspects (Milojević, 2000). However, even though acknowledgment delivers information about the intellectual influence between researchers, which is different from that of collaboration (Costas & Leeuwen, 2012), it is hard to find works on networks built from the acknowledgments of papers. Accordingly, we aim to identify the intellectual influences exchanged among authors, organizations, and countries using acknowledgment networks (Figure 2). The acknowledgment network includes the relationship between the author(s) or other units of a paper and the units that the author(s) acknowledge in the paper, while the co-authorship network only considers the authors, organizations, or countries that have co-authorship relationships.
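The difference between the two networks can be illustrated at the country level with a small sketch. The record layout (`author_countries`, `ack_countries`), the toy papers, and the choice to keep self-acknowledgment edges are assumptions made for illustration; the source does not specify these details.

```python
from collections import Counter
from itertools import combinations

# Hypothetical paper records: countries from author affiliations and
# countries recognized in the acknowledgment section.
papers = [
    {"author_countries": {"USA", "China"}, "ack_countries": {"Germany"}},
    {"author_countries": {"USA"}, "ack_countries": {"USA", "Korea"}},
    {"author_countries": {"China", "USA", "UK"}, "ack_countries": set()},
]


def build_networks(papers):
    """Return edge counts for the co-country and acknowledgment networks."""
    co_country = Counter()      # undirected: sorted pairs of author countries
    acknowledgment = Counter()  # directed: (author country, acknowledged country)
    for paper in papers:
        authors = sorted(paper["author_countries"])
        for a, b in combinations(authors, 2):
            co_country[(a, b)] += 1
        for src in authors:
            for dst in sorted(paper["ack_countries"]):
                # Assumption: a country acknowledging itself is kept as a self-loop.
                acknowledgment[(src, dst)] += 1
    return co_country, acknowledgment


co_country, acknowledgment = build_networks(papers)
print(co_country)
print(acknowledgment)
```

In this toy data Germany and Korea never appear in the co-country network, yet both receive edges in the acknowledgment network, which is the kind of implicit contribution the comparison is meant to surface.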
3.4 Word Embedding
The word2vec model allows us to learn high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships (Mikolov et al., 2014). Every word is mapped to a unique vector represented by a column in a matrix. The column is indexed by the position of the word in the vocabulary. The concatenation or sum of the vectors is then used as features for prediction of the next word in a sentence (Le & Mikolov, 2014). Additionally, words with similar contexts will have similar vectors (Goldberg & Levy, 2014). To determine the similarity of the authors and countries in our dataset, we begin by parsing our data, extracting the acknowledgment section, and creating a word2vec model using Deeplearning4j (Deeplearning4j Development Team, n.d.). Word2vec is a two-layer neural net that takes a text corpus as input and produces as output a set of feature vectors for the words in the input corpus. First, the input data are loaded and processed, converting all words to lowercase. Next, we tokenize the data using the default setting, separating the data by white space and preparing the output with one word per line. We then build the model using the settings shown in Table 2. The model is run a second time, restarting the training and serializing the output so that the model can be updated without being built from scratch. We determine the similarity between two words by calculating the cosine similarity between their vectors. The result of the calculation is a scalar from –1 to 1, where 1 is an exact match. We then extract every country and author mentioned in our original dataset, resulting in 161 countries and 65,500 authors. Using the Deeplearning4j library, we measure the cosine similarity between each pair of countries and place the results in a matrix, which we will refer to as a similarity-matrix. We also generate a matrix containing only the countries with more than 5,000 acknowledgments.
Unfortunately, due to the overwhelming number of unique authors, a similarity-matrix representing the similarity between every pair of authors cannot be created; however, similarity-matrices representing authors with more than 50, more than 100, and between 25 and 50 acknowledgments are generated with 164, 44, and 223 entries, respectively. The matrices are then plotted and visualized via heat maps using R.
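The similarity-matrix can be sketched independently of any embedding library. The measure described above, a scalar from –1 to 1 where 1 is an exact match, is the cosine of the angle between two vectors. The two-dimensional vectors below are toy values chosen so the result is easy to verify by hand; they are not vectors from the trained model, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1 = same direction, -1 = opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Return (sorted names, square matrix of pairwise cosine similarities)."""
    names = sorted(vectors)
    matrix = [[cosine_similarity(vectors[a], vectors[b]) for b in names] for a in names]
    return names, matrix


# Toy word vectors (hypothetical; real vectors would have 100 dimensions).
vectors = {"china": [1.0, 0.0], "germany": [0.0, 1.0], "usa": [-1.0, 0.0]}
names, matrix = similarity_matrix(vectors)

for name, row in zip(names, matrix):
    print(name, row)
```

The matrix is symmetric with ones on the diagonal, so a heat map of it only needs the upper or lower triangle to convey all pairwise similarities.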
Table 2. Settings Used for Building a Word2vec Model.
|Batch size||Defines the mini-batch size||1000|
|Minimum word frequency||Defines the minimum number of times a word must appear in the corpus||3|
|Epochs||Iterations over whole training corpus||1|
|Layer size||Number of dimensions for output vectors||100|
|Iterations||Number of iterations for each mini-batch during training||3|
|Min learning rate||Defines the minimal learning rate value for training||1E-10|
|Seed||Seed for random number generator||43|
4 Results
In this section, we present the results of our analysis and provide an in-depth examination.
4.1 Descriptive statistics
From the 595,336 acknowledgment sections, we collected 65,452 co-authors, with 16% having at least ten acknowledgments and 18% having at least 50 acknowledgments. There were also only two co-authors with more than 100 acknowledgments; the maximum number of acknowledgments for a co-authorship was 163, while the minimum was six. The dataset also has 65,500 authors with 66.91% having at least one acknowledgment, 98.02% having at least ten acknowledgments, and only 44 authors with more than 100 acknowledgments. The maximum number of acknowledgments a single author received within our author dataset was 675, while the minimum was one. We also extracted the co-country pairs and countries from the dataset. There were 1,993 country pairs referred to, with 14% receiving at least ten acknowledgments and 97.19% receiving less than 1,000 acknowledgments. There are only three co-countries with more than 5,000 acknowledgments; the maximum is 6,721 and the minimum is six. We also extracted 161 countries with 76.40% of the countries receiving at least 5,000 acknowledgments. The maximum number of acknowledgments a single country received was 453,556 and the minimum was one. Figure 3 shows the distribution of countries that appear in acknowledgments. The average number of papers whose acknowledgments include a given country was 738.97. As shown in the figure, the distribution of countries follows a power law; that is, a few countries implicitly contribute to many papers while many countries implicitly contribute to few papers. The top five countries implicitly contributed to 51% of the papers and the top 28 countries accounted for 91% of the papers.
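The concentration figures at the end of the paragraph (the share of papers covered by the top five and top 28 countries) are cumulative shares of a ranked count list. A minimal sketch with made-up counts, since the per-country counts are not given in the text:

```python
def top_k_share(counts, k):
    """Fraction of the total accounted for by the k largest counts."""
    ordered = sorted(counts, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


# Hypothetical per-country acknowledgment counts with a heavy-tailed shape.
counts = [500, 200, 100, 50, 50, 40, 30, 20, 10]

print(top_k_share(counts, 2))
print(top_k_share(counts, 5))
```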
At the continent level, with the exception of Africa, the continents with more countries unsurprisingly tended to receive more acknowledgments (Figure 4). As Figure 5 shows, it is also safe to assume that countries with more published papers will most likely receive more acknowledgments. Additionally, publishing papers requires researchers to write the papers as well as grants to fund the research. It should be no surprise that the United States, China, and the United Kingdom dominate the results. Collectively, the United States, China, and the United Kingdom receive over 25% of all of the extracted acknowledgments. These countries also dominate the world in terms of gross domestic product (GDP). According to the International Monetary Fund (2017), China is ranked number one, the United States is ranked number two, and the United Kingdom is ranked number nine in terms of GDP.
Table 3 shows the number of acknowledgments in which the corresponding organization appeared. The invisible support of research foundations and government institutions that do not appear directly in authors’ affiliations can be revealed by analyzing acknowledgments. Although support from research institutions is significant in the scientific community, traditional organization analysis based on bibliographic data does not reveal this implicit support. With acknowledgment analysis, however, this implicit contribution of research foundations is captured. Seven of the 18 organizations that appear most often in acknowledgments are revealed by acknowledgment analysis.
Table 3. Number of Organizations Referred to in Acknowledgments.
|German Research Foundation*||1728|
|Ministry of Health*||809|
|US National Science Foundation*||515|
|Mashhad University of Medical Sciences||420|
|National Institute of Child Health and Human Development*||277|
|University of California Davis||262|
|University of Helsinki||178|
|German Science Foundation*||157|
|University of Arizona||142|
|Priority Academic Program Development of Jiangsu Higher Education Institutions*||141|
|Pusan National University||122|
|University of Geneva||121|
|University of Bologna||120|
|Iowa State University||113|
|Weizmann Institute of Science||109|
According to Table 3, the German Research Foundation is the organization most commonly extracted from acknowledgments. The Ministry of Health and the US National Science Foundation follow. Some institutions extracted from acknowledgments remain ambiguous in nationality because their names contain no indication of a country. For example, Ministry of Health is ambiguous because it does not point to a specific country. Many of the organizations in Table 3 are national government institutes or research foundations. However, there is also the European Community, which is an institution of the European Union. We can assume that the authors of a paper whose acknowledgment mentions the European Community are more likely to have worked with authors from a country in Europe. Another interesting finding is the prominence of less well-known universities such as the University of Arizona, Pusan National University, and Iowa State University in statements of acknowledgment. The result tells us that the most productive universities in terms of publication are not always the same as the universities that support the research of their faculty.
4.2 Content Analysis using DMR
Table 4 shows the 20 topics extracted through topic modeling. Each topic is labeled after reviewing the words included in the topic. In general, there are two types of topics, those that express appreciation and those that express funding information, both of which illustrate the content of acknowledgments in the biomedical domain. The distribution of topics follows the traditional categorization of acknowledgment proposed by Cronin (1991). He categorized the content of acknowledgments in scientific publications into six types according to purpose, as follows: (1) moral support (access to facilities, use of equipment, familial support), (2) assistance of research (editorial and presentational guidance, assistance with analysis), (3) technical support (programming advice, access to technical knowhow), (4) prime mover (project director, dissertation adviser), (5) communication with companies (feedback, new insights from peers), and (6) funding (grants, scholarships, fellowships). The 20 topics match this categorization well. The exception is the topic labeled “Thanks to participants,” which does not match the existing categories because it is specific to the biomedical domain. Additionally, the results show that funding information accounts for 45% of the topics (9 of 20), each relating to a specific country or organization.
Table 4. The 20 Topics Extracted through Topic Modeling.
|Thanks to National Support||Thanks to advice||Thanks to anonymous review||Funding from various organizations||Thanks to organizational support||Funding from Canadian institutions||Thanks to technical assistant||Funding from national organizations||Thanks to financial support||Thanks to scholars|
|Thanks to assistance on research||Funding from Japanese institutions||Funding from French institutions||Thanks to feedback||Funding from European institutions||Funding from Netherland institutions||Thanks to experimental assistance||Thanks to participants||Funding from American institutions||Funding from German institutions|
Figure 6 is the topic map based on the result of topic modeling. The nodes are the words and phrases extracted via DMR, and the edges denote the co-occurrence relationship among the words and phrases assigned to the same topic. Visualizing the result of a topic model enables us to understand large sets of data intuitively (Iwata et al., 2008). When the collection of acknowledgments is represented as 20 topics, there are 16 countries/regions (India, Australia, the UK, China, Canada, Korea, Germany, Taiwan, Sweden, Japan, France, Brazil, Spain, Austria, the Netherlands, and the US) represented as a noun or an adjective among the words that form the 20 topics. The words are shaded in Table 4. Gephi (Bastian, et al., 2009) is used to visualize the map, which displays the words used by the authors stating the acknowledgments in the biomedical domain as well as the overall structure of content in terms of topics. In the map, the words “grant,” “research,” “study,” “support,” “provide,” and “author” are emphasized due to their high occurrence in the results produced by DMR. In addition, the map is color coded according to specific words and their relation to countries, which allows us to see each topic and its relation to a particular country. For example, “USA” forms a topic with “thank_dr,” “core_facility,” “mouse,” and “medicine,” as their relation is determined by distance. Meanwhile, in the topic map, the United States, Sweden, and Canada tend to form their own clusters, which suggests that these countries occupy distinct topics in the acknowledgments dataset.
With the overall content of the text data used as input, DMR enables us to track the dynamic changes in the weights of topics in a literature set. The weight of a topic increases when the topic becomes important in the literature set, and decreases when the topic becomes less important than other topics in the literature set at that time. According to Figure 7, which illustrates the weight trends of the 11 topics related to appreciation, some topics show rapid changes. The topics “Thanks to advice,” “Thanks to experimental assistance,” “Thanks to feedback,” and “Thanks to scholars” have become less important since 2008. On the other hand, the topics “Thanks to assistance on research,” “Thanks to financial support,” and “Thanks to organizational support” have become more important since 2010. In particular, the trend line of the topic “Thanks to organizational support” jumps in 2007, which indicates growth in references to Korean and Chinese organizational support. We can assume that there was active support from these nations’ organizations at that time. The importance of the topic was slightly reduced during 2008–2010 but has bounced back since 2010. More recently, it has been less important than it was before 2015. This trend mirrors the growing importance of financial support in the biomedical area. The topic related to the appreciation of participants also becomes more important as time passes. The topics “Thanks to National Support” and “Thanks to anonymous review” maintain low weights throughout the time series.
Figure 8 shows the changes in the weights of topics that contain funding information. The topic on funding information from German institutions suddenly hit a peak in 2004. In this topic, the word “German” appears together with words such as support, grant, European, research, fund, deutsche_forschungsgemeinschaft, and so on. Accordingly, it can be inferred that the German organization Deutsche Forschungsgemeinschaft played an important role in terms of support, grants, and research in the biomedical area at that time. In the same vein, the topics on funding from American institutions, various organizations, and Netherland institutions each have a peak at the point when they play important roles in the acknowledgments of the biomedical field. The overall pattern of topics on funding information indicates that the topics became less important over five years (2008–2013) before regaining importance in 2014.
4.3 Network Analysis
We examined the implicit and explicit contribution of each country by comparing the number of papers that mention the country in the acknowledgment section (implicit contribution) with the number of papers written by authors from that country (explicit contribution). Since traditional co-country network analysis captures only explicit contribution, it is meaningful to find out how a country’s implicit contribution differs from its explicit contribution.
In Figure 9, the x axis shows the explicit contribution of each country, that is, the number of papers written by researchers in the country, and the y axis shows the implicit contribution, that is, the number of papers in which the country appears in the acknowledgments. To visualize the relationship between the two contributions effectively, only the countries that have published more than 8,000 papers are labeled in Figure 9. An edge between two nodes is drawn only if the two countries co-worked on the same paper more than 51,807 times, which corresponds to the top 10% of the co-occurrence counts between two countries. Generally, implicit contribution and explicit contribution are proportional to each other. The correlation coefficient between the number of papers to which a country contributed and the number of occurrences of the country is 0.898. The figure reveals a tendency for productive countries to receive more acknowledgments. However, some countries appear often in the acknowledgment section but do not often co-work with other countries. Countries such as India, Korea, and Japan appear in a large number of acknowledgments relative to the number of papers their authors wrote. Even though these countries contributed to a number of papers, they are not considered major agents in traditional co-country analysis.
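The relationship between explicit and implicit contribution described above can be illustrated with a short computation. The following is a minimal sketch, assuming that the reported coefficient is a Pearson correlation (the text does not name the measure); the country labels and counts are hypothetical and are not taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (explicit, implicit) paper counts per country.
# These numbers are illustrative assumptions, not the paper's data.
contributions = {
    "country_a": (12000, 9000),
    "country_b": (8000, 7000),
    "country_c": (3000, 1500),
    "country_d": (500, 900),
}

explicit = [e for e, _ in contributions.values()]  # papers written
implicit = [i for _, i in contributions.values()]  # papers acknowledging
r = pearson(explicit, implicit)
print(f"correlation between explicit and implicit contribution: {r:.3f}")
```

A value close to 1 would correspond to the proportional relationship described in the text, while countries far from the trend line are the ones whose implicit contribution is not captured by co-authorship alone.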
Table 5 shows the closeness centrality, betweenness centrality, and degree centrality in the co-country network and the acknowledgment network. The top five values for each centrality are highlighted. The co-country network is an undirected multigraph with a total of 149 nodes, each of which has at least one paper co-worked with other countries. Hub countries such as the United States and the United Kingdom appear in the network, and they have high closeness, betweenness, and degree centrality. Some countries show high closeness and betweenness centrality relative to their degree centrality. Researchers in these countries do not frequently co-work with researchers from other countries, but the countries that they do co-work with tend to co-work a lot with other countries. In brief, these countries co-work mainly with hub countries, which places them in the middle of the network. France, Switzerland, Belgium, and India are in this category. The Netherlands, Italy, and China, on the other hand, have a relatively high degree centrality compared to closeness and betweenness, which indicates that these countries co-work with less significant countries. Even though China ranks high in both implicit and explicit contribution, as seen in Figure 9, its centrality indices are relatively low compared to the number of papers it contributed. This shows that China does not work frequently with other countries.
Table 5. Centralities of the co-country network (based on co-authorship) and the acknowledgment network.
The inter-country network based on acknowledgments is an undirected multigraph with a total of 157 nodes, each of which appears at least once in the acknowledgment section. On the whole, the degree centralities are lower than those of the co-author network because of the sparsity of countries referred to in acknowledgments. The hub countries are similar to those of the co-country network, with the exception of Germany. This means that German researchers do not often refer to papers written in their own country. Recalling the result of topic modeling (see Table 4), it can be inferred that they write the name of the institution instead of the word “Germany,” because the topic that includes the German institution assumes major importance in acknowledgments. Countries that have high closeness and betweenness centrality relative to degree centrality, such as China, tend to be referred to frequently by the hub countries, the United States and the United Kingdom.
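To make the contrast between degree and closeness centrality concrete, the following is a minimal sketch on a toy network. It is not the tooling used in the study: it assumes a simple, connected, undirected graph (parallel edges of the multigraph are collapsed), the node labels are hypothetical, and betweenness centrality is omitted for brevity.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map; parallel edges are collapsed."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def closeness_centrality(adj):
    """Reachable node count divided by the sum of shortest-path lengths.

    For a connected graph this equals (n - 1) / sum of distances.
    """
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source
            current = queue.popleft()
            for neighbor in adj[current]:
                if neighbor not in dist:
                    dist[neighbor] = dist[current] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


# Hypothetical toy network: "hub" is linked to every other node,
# while "peripheral" is linked only to the hub.
edges = [
    ("hub", "a"),
    ("hub", "b"),
    ("hub", "peripheral"),
    ("a", "b"),
]
graph = build_graph(edges)
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)

for node in sorted(graph):
    print(f"{node}: degree={degree[node]:.2f}, closeness={closeness[node]:.2f}")
```

In this toy network the peripheral node has the lowest degree centrality, yet its closeness stays comparatively high because its only partner is the hub. This is the same pattern the text describes for countries that co-work mainly with hub countries.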
In the heat maps shown in Figure 10, red cells indicate low similarity, yellow cells indicate a perfect match, which occurs only when the values on both axes are the same, and white cells are empty, meaning that there was no connection between the two values. As shown in Figure 10a, comparing each of the 161 countries tells us that, with the exception of Anguilla, every country has at least one connection with another country. Additionally, the variation in color tells us that the references between the countries are distributed relatively evenly. When we limited the entries of the matrix to countries with over 5,000 acknowledgments, we observed a very similar distribution of similarities. This implies that the publishing country of a paper has little effect on whether it receives an acknowledgment from any other specific country.
Upon analyzing the authors’ heatmaps generated from their respective similarity matrices, we noticed that there was very little similarity between authors with more than 50 acknowledgments (Figure 10b), as well as between authors with between 25 and 50 acknowledgments (Figure 10d). However, similarity was found among the top 44 authors, who each had more than 100 acknowledgments (Figure 10c). Additionally, analyzing smaller numbers above 25 acknowledgments produced results similar to those of Figure 10c, whereas viewing authors with fewer than 25 acknowledgments produced results similar to those of Figures 10b and 10d. Our dataset contained 65,500 authors, and 99.41% had fewer than 25 acknowledgments. This suggests that there is a direct relation between the number of acknowledgments an author receives and that author’s similarity with other authors. We can assume that authors with many acknowledgments have published significantly more papers. In the published papers of those authors, there are references and acknowledgments to other authors. The acknowledged authors begin to receive more acknowledgments because of the exposure they gain through being referenced in another paper. As more papers are published, these authors are referenced more and more, which helps their popularity grow rapidly.
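A similarity matrix of the kind visualized in the heat maps can be sketched as follows. This is an illustration only: it assumes cosine similarity between embedding vectors (the text does not state the similarity measure), and the two-dimensional vectors and author labels are hypothetical rather than output of the trained word-embedding model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Return sorted entity names and their pairwise similarity matrix."""
    names = sorted(vectors)
    matrix = [
        [cosine_similarity(vectors[a], vectors[b]) for b in names]
        for a in names
    ]
    return names, matrix


# Hypothetical embedding vectors; real embeddings would have far more
# dimensions and would come from the trained model.
embeddings = {
    "author_x": [1.0, 0.0],
    "author_y": [0.0, 1.0],
    "author_z": [1.0, 1.0],
}

names, matrix = similarity_matrix(embeddings)
for name, row in zip(names, matrix):
    print(name, [round(value, 3) for value in row])
```

Each diagonal entry compares an entity with itself and therefore equals 1, which corresponds to the perfect-match cells of the heat maps; off-diagonal entries near 0 correspond to the low-similarity cells.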
When looking at countries with high numbers of acknowledgments and publications, we find that those countries tend to work mainly with hub countries. The observation that those countries tend to work within specific groups is backed by the finding that there is a direct relation between the number of acknowledgments an author receives and how likely it is that the author will work with another particular author. For example, our findings show that China does not work frequently with other countries and that the majority of the country’s acknowledgments come from within the country itself. We also find that “funding” accounts for 45% of the topics extracted from the acknowledgments section, implying that funding plays a major role in whether an acknowledgment is received. A country would most likely fund a paper within its own country before funding a paper from another country within a particular group. However, although the larger countries tend to work within certain groups, our findings also suggest that those countries work with less significant countries. This is reinforced by our finding that the publishing country of a paper has little effect on whether it receives an acknowledgment from any other specific country.
The above findings demonstrate the importance of the acknowledgment as a unit of bibliometric analysis by highlighting the fact that the acknowledgments of papers enable us to find the implicit contributions of authors, organizations, and countries. The implicit relationships between authors, organizations, and countries cannot be identified with the well-known explicit measures of citation and authorship. Moreover, it has been shown that there is a correlation between the impact of a paper and its number of acknowledgments, an important factor of scholarly communication (Costas & Leeuwen, 2012). Along with this practical implication, the number of papers that have acknowledgments in the biomedical field has been steadily increasing (Figure 11). Nevertheless, few works have analyzed specific content via the acknowledgment section. Although our findings show the scholarly influence within the acknowledgments section of a paper, continued study, along with the use of well-developed text-mining techniques, will maximize the usefulness of acknowledgments alongside citation and authorship.
Despite its importance, the acknowledgment section in articles is sparse and lacks formality, which makes in-depth analysis of acknowledgments much more difficult than the analysis of citation and authorship. As stated in the Related Work section, a number of works have noted the necessity of understanding the quantitative aspects of the acknowledgment section; however, until now it has been neglected because of the technical challenges of mining a large amount of unstructured acknowledgment data. To overcome this limitation, the present study aims to identify the content of the acknowledgment section and the map of influence in a network in a data-driven way. We show that the content of acknowledgments in the biomedical field can be divided into two main categories: funding and appreciation. The importance of specific topics changes dynamically over the time series, indicating that the content of acknowledgments differs over time. With this understanding of the content, the tendency to form coauthorship at the country level is identified through the network analysis. Although a few countries have high publication and occurrence frequencies, those countries tend to show low frequencies of collaboration. Lastly, the analysis using word embeddings shows a correlation between the content of the acknowledgments in which an author appears and the number of times that the author occurs in acknowledgments. Meanwhile, there is no clear relationship between publication countries and the number of times a country occurs within the acknowledgments section.
Although our study succeeds in highlighting the important aspects of the acknowledgment section, there are some areas in which it can be improved. One area is the extraction of named entities such as persons and organizations, which is difficult because of their atypical patterns. For example, some authors refer to the nation of an organization, while many authors refer only to the name of an organization without naming a country. This issue brought to light the incompleteness of the nation-centered analysis, and we had to merge the extracted results manually. In addition, the disambiguation of authors’ names should be considered to produce more precise results. In future work, these enhancements to the preprocessing of named entities will have to be integrated with the present study, and the degree of influence shown in the acknowledgments section will have to be measured. This will ultimately help shed light on the undiscovered “invisible college.”
This work was supported by the Bio-Synergy Research Project (NRF-2013M3A9C4078138) of the Ministry of Science, ICT and Future Planning through the National Research Foundation.
Bastian, M., Heymann, S., & Jacomy, M. (2009). Gephi: an open source software for exploring and manipulating networks. ICWSM, 8, 361–362.
Borgman, C. L., & Furner, J. (2002). Scholarly communication and bibliometrics.
Costas, R., & Leeuwen, T. N. (2012). Approaching the “reward triangle”: General analysis of the presence of funding acknowledgments and “peer interactive communication” in scientific publications. Journal of the American Society for Information Science and Technology, 63(8), 1647–1661.
Cronin, B. (1995). The scholar’s courtesy: The role of acknowledgement in the primary communication process. Taylor Graham.
Cronin, B. (2001b). Hyperauthorship: A postmodern perversion or evidence of a structural shift in scholarly communication practices? Journal of the Association for Information Science and Technology, 52(7), 558–569.
Cronin, B., Shaw, D., & La Barre, K. (2003). A cast of thousands: Coauthorship and subauthorship collaboration in the 20th century as manifested in the scholarly journal literature of psychology and philosophy. Journal of the American Society for Information Science and Technology, 54(9), 855–871.
Deeplearning4j Development Team. (n.d.). Deeplearning4j: Open-source distributed deep learning for the JVM, Apache Software Foundation License 2.0. Retrieved from http://deeplearning4j.org
Goldberg, Y., & Levy, O. (2014). word2vec explained: Deriving Mikolov et al.’s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722.
International Monetary Fund. (2017). World Economic Outlook Database. Retrieved from http://www.imf.org/external/pubs/ft/weo/2016/02/weodata/index.aspx
Iwata, T., Yamada, T., & Ueda, N. (2008, August). Probabilistic latent semantic visualization: topic model for visualizing documents. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 363–371). ACM.
Le, Q., & Mikolov, T. (2014). Distributed representations of sentences and documents. arXiv preprint arXiv:1405.4053.
Manning, C. D., Surdeanu, M., Bauer, J., Finkel, J. R., Bethard, S., & McClosky, D. (2014). The Stanford CoreNLP natural language processing toolkit. In ACL (System Demonstrations) (pp. 55–60).
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Milojevic, S. (2010). Modes of collaboration in modern science: Beyond power laws and preferential attachment. Journal of the Association for Information Science and Technology, 61(7), 1410–1423.
Mimno, D., & McCallum, A. (2012). Topic models conditioned on arbitrary features with Dirichlet-multinomial regression. arXiv preprint arXiv:1206.3278.
Patel, N. (1973). Collaboration in the professional growth of American sociology. Information (International Social Science Council), 12(6), 77–92.
Zuccala, A. (2006). Modeling the invisible college. Journal of the Association for Information Science and Technology, 57(2), 152–168.