Around the world, research data management (RDM) has become an increasingly important service provided by a variety of information centers and libraries. According to Whyte and Tedds (2011), “Research data management concerns the organization of data, from its entry to the research cycle through to the dissemination and archiving of valuable results” (p. 1). Two years later, in the ARL (Association of Research Libraries) SPEC (Systems and Procedures Exchange Center) Kit 334 entitled “Research Data Management Services,” Fearon et al. (2013) defined RDM services as “providing information, consulting, training or active involvement in data management planning, data management guidance during research (e.g., advice on data storage or file security), research documentation and metadata, research data sharing and curation (selection, preservation, archiving, citation) of completed projects and published data” (p. 12). Within the US, a variety of federal funding agencies and foundations, including the National Institutes of Health (NIH), National Science Foundation (NSF), Department of Energy (DoE), Department of Education (DoED), Environmental Protection Agency (EPA), and the Alfred P. Sloan Foundation (Sloan), have started requiring the sharing of research outputs (National Institutes of Health, 2005; National Institutes of Health, 2008) and mandating data management plans (National Science Foundation, 2010a, 2010b). Funding agencies in a variety of other countries and regions have issued similar mandates, including the UK (UKRI, n.d.), Canada (Government of Canada, 2016), Australia (ARC, 2017), the European Union (Shearer, 2015), Japan (JST, 2013), and India (Department of Science & Technology, Government of India, n.d.). In the UK, the first data management plan (DMP henceforth) requirement was put in place by the Medical Research Council in 2006, soon followed by the Wellcome Trust in 2007 (Smale, Unsworth, Denyer, & Barr, 2018).
The NSF in 2011 was the first funder in the US to implement a DMP requirement (National Science Foundation, 2010a). In 2013, the Office of Science and Technology Policy released a memo to the heads of executive departments and agencies requiring that each agency develop a plan to promote public access to research. The public access plan included a requirement that “all extramural researchers receiving Federal grants and contracts for scientific research and intramural researchers develop data management plans, as appropriate” (OSTP, 2013, p. 5). In August 2017, Smale et al. (2018) surveyed 16 US national funding bodies and found that 10 required DMPs as did six of seven research councils in the UK (p. 10).
This work is licensed under the Creative Commons Attribution-
Given these mandates, universities or research libraries from different countries have redefined or extended their role in RDM services. Their efforts have made librarians not only an important contributor to the research process but also an essential partner in the entire research data ecosystem. As pointed out by Choudhury (2008), the new role of librarians in “supporting new forms of data-intensive scholarship” (p. 215) has transformed librarians into “data scientists” or “data humanists.” In such a role, “they act as the human interface between the library and the eScience projects. In a fundamental sense, they may represent the future of subject librarianship and help craft a new relationship between the library and scientists” (p. 217). As an example, in health science libraries, the demand for RDM services has been very strong. As indicated by Martin (2013), “As biomedical science becomes more data-intensive, researchers are faced with a range of data management challenges, problems, and needs. Health sciences librarians are ideal partners for offering scientists at their institutions a range of data management services” (p. 1). Data services provided by health science libraries may vary from creating “a comprehensive data dictionary” and “a standardized data request form” (Gore, 2013, p. 22) to assigning metadata to incoming data, naming and labeling research variables, and finding a data repository (Hasman, Berryman, & Mcintosh, 2013). As reported by Hanson et al. (2013), librarians are now providing “the deepest and most detailed data management support” to scientific research teams “to date” (p. 25).
It should be noted that data services are also provided in social science and humanities disciplines. In the “Special Report: Digital Humanities in Libraries” by Varner and Hswe (2016), the authors argue that with the rapid evolution of digital humanities and the uncertainty of its future trajectory, “it may be more productive—and more honest—to position the library as a research partner that can explore new solutions with researchers rather than as a service provider that either has what a researcher is looking for or doesn’t” (para 9).
However, even though there is a growing momentum in RDM, the level of success in implementing and providing RDM services at various data-intensive organizations has not been consistent. According to Perrier, Blondal, and MacDonald (2018), “although libraries play a role in RDM at academic institutions, they have experienced varying degrees of success with the development of RDM support and services given this expanded responsibility” (p. 173). Furthermore, as noted by Faniel and Connaway (2018), “research into RDM still is in the early stages, and few studies focus on the library community’s RDM experiences” (p. 100). In this study, with a focus on the RDM practice in terms of the librarians’ role, their levels of preparedness in providing RDM services, and the current RDM services and tools provided, we conducted an international survey involving RDM service librarians around the world who worked in a variety of organizational types. We also probed into the knowledge and skills that respondents deem crucial for RDM services, as well as their vision for RDM roles in the future. Our primary research questions (RQs) were as follows:
RQ1. What is the state of current practice of RDM services in libraries?
RQ2. What current role do librarians play in providing RDM services?
RQ3. What specific knowledge and skills do librarians believe are needed for RDM training?
RQ4. What do participants see as the future evolution of the librarians’ role in providing RDM services?
2 Literature Review
2.1 Components and Models of RDM Services
Empirical research studies on RDM services in libraries tend to use a similar set of definitions for RDM. A frequently cited definition is by Cox and Pinfield (2014), which states that RDM “consists of a number of different activities and processes associated with the data lifecycle, involving the design and creation of data, storage, security, preservation, retrieval, sharing, and reuse, all taking into account technical capabilities, ethical considerations, legal issues and governance frameworks” (p. 300). Both this definition and the one by Fearon et al. (2013) as cited earlier define RDM through a series of activities associated with the data lifecycle. Meanwhile, the conceptualization of RDM is enriched through the development of frameworks or process models of RDM services. For example, in their working level guide for the Digital Curation Centre, Jones, Pryor, and Whyte (2013) created a model outlining the components of an RDM service (see Figure 1). As pointed out by Jones et al. (2013), “In order to support effective data management and sharing, an institution needs a coherent strategy and suite of services” (p. 5). In the RDM components diagram, the overarching activities include “RDM policy and strategies” and “business plan and sustainability.” Under the RDM service establishment, various levels of guidance, training, and support are required. The core RDM process features the service components of data management planning, managing active data, data selection and handover, data repositories, and data catalogs (p. 5).
Building upon Jones et al.’s RDM components model, Whyte (2014) presented a process pathway model to illustrate, among other things, the steps required in developing an RDM service. In Whyte’s model, higher level factors, which are positioned together, include context, principles, inputs, outputs, and outcomes. Six steps are included within “outputs”: envision, initiate, discover, design, implement, and evaluate (p. 60). In the same “output” section, actors involved in RDM are listed as researchers, research users, data managers, research support, IT support, librarians, archivists, compliance officers, senior managers, funders, and others. Even though Whyte’s pathway model is relevant and comprehensive in listing diverse research communities, it does not, as pointed out by Pinfield et al. (2014), address “any constraints on the service delivery side” (p. 4). Based on their qualitative investigation, Pinfield et al. (2014) developed “a library-oriented model of institutional RDM.” The model features factors including drivers (why), components (what), influencing factors (how), and stakeholders (who). Those within the stakeholder section include institutional units such as the library, IT services, academic departments, senior university managers, research support services, and other support services. Researchers from various disciplines are also listed as stakeholders. Examples of drivers include storage, security, preservation, and compliance, whereas influencing factors include demand, roles, resources, acceptance, and communications, among others. The authors indicated that this model aligns well with previous models such as those by Jones et al. (2013) and Whyte (2014). Its uniqueness is that it “deals more with complexity of the underlying drivers and influencing factors that could shape which types of specific services are developed and what they might look like” (p. 24).
Another notable model is an RDM maturity model by Cox et al. (2017). In this model, four levels of RDM maturity are specified, with Level 0 as “none,” Level 1 as “basic,” Level 2 as “developing,” and Level 3 as “extensive.” These levels are categorized according to the existence or absence of services and support, compliance, skills, roles and structures, embedded practices, and cultural acceptance. All the models reviewed in this section provided relevant ideas for our current study. However, even though several of these models hold valuable practical implications, there has been rather limited empirical research investigating and confirming the constructs associated with RDM, such as drivers, components, and influencing factors. Even with the RDM maturity model outlined by Cox et al. (2017), substantial empirical evidence is needed to test, verify, or enhance the model.
2.2 RDM Services in Academic Libraries
Although information professionals served a role in data management before the proliferation of DMP requirements, the adoption of such requirements provided a “catalyst” for academic libraries and librarians to expand their data management services (Dietrich, Adams, Miner, & Steinhart, 2012). According to Cox et al. (2017), from 2013 to 2016, the number of academic libraries with an RDM policy in Australia nearly doubled from 29% to 56%, while in the same time period, the number in the UK more than doubled from 17% to 42%. Furthermore, the number of academic libraries providing data literacy training from 2013 to 2016 also increased dramatically, from 26% to 74% in Australia and from 14% to 65% in the UK. Similar comparisons for the US were conducted but with different results. According to Tenopir et al. (2015) who conducted the same survey in 2011 and 2015, there was no significant growth in the number of RDM services provided by academic libraries in the US.
A number of international surveys shed interesting light on the growth of RDM services and tools. In 2013, Fearon et al. (2013) conducted a SPEC survey among ARL member institutions to understand current RDM activities in ARL libraries. Out of the 125 member libraries, 73 responded to the 67-question survey covering topics of RDM services, data archiving services, RDM services, staffing, education and skills, training needs, and funding RDM services and partnerships. The results showed that RDM services and support for data archiving were the emerging services at the member libraries. Around 54 ARL libraries already provided RDM services, with varying capacities, and 17 libraries reported planning for RDM service provision in the near future. Of the RDM activities conducted, the focuses were on DMP for grant applications, data management consultation, and research data archiving. Another important finding from the survey was the need for collaborations between libraries and other units within the institution.
Meanwhile, in a survey of 500 science librarians at ARL-affiliated libraries, Antell et al. (2014) found that 94.9% of librarians were aware of the 2011 NSF requirement that researchers needed to submit a DMP. Ninety percent of the respondents reported that their institutions had an institutional repository (IR). Only 23.5% indicated that they had a data repository (DR henceforth) while another 27.1% indicated that a DR was being planned. Of those surveyed, 60.1% provided some kind of data management services to researchers. Such services were being planned in 17.8% of cases. It was also found that 39.5% of those surveyed had duties related to the IR, DR, or data management. The most common duties were “liaise, consult, and refer” (39%); “just starting” (16%); and “promote, publicize, or advocate” (14%).
In a survey of six European countries and Canada totaling 170 responses, Cox et al. (2017) found that the most common service across regions was the provision of web resources or guides for RDM. RDM training or data literacy training was a growing service that was considered as either basic or well developed depending on the institution. Some services were offered by only a few institutions, including promoting awareness of reusable data archives, data publication advisory services, data storage advisory services, and providing access to tools to assist with RDM. The top strategic priorities for services indicated by respondents included giving advice on copyright/intellectual property rights (IPR) for data and on running a DR.
In a study of services at 128 academic institutions in the US, Tenopir et al. (2015) found that most institutions did not offer or plan to offer a wide range of RDM services. Consultative services such as providing reference and support for citing data (29.5%) and publishing web guides for data management (21.5%) were offered more frequently than technical services such as providing technical support for RDS systems (approximately 15%) and creating or transforming metadata (less than 10%).
Research results concerning institutions in the US differ strongly from findings from a similar study based in Europe. Tenopir et al. (2017) surveyed the prevalence of research data services among members of the Association of European Research Libraries (LIBER). The findings revealed that 45% of libraries surveyed provided services in the form of consultations with researchers and students regarding DMP, 43.5% provided consultations on metadata standards for RDM, and 36.6% provided reference support. The authors concluded that, while disappointing, this gap between the provision of RDM services in the US and EU might be due to the earlier adoption of DMP requirements in the EU. It is also possible that there were regional differences in libraries’ abilities to overcome the challenges of implementing research data services.
Overall, it is clear that since the emergence of RDM services in libraries, there have been research studies surveying the kinds of RDM services that libraries in varying geographical locations provide. The comparative investigations into RDM services in the US and Europe seemed to show that the EU was more advanced than the US. A broader comparative study examining the differences or gaps among RDM practices in the US and in other non-US world regions would be valuable. Furthermore, within the US, there has been no study exploring the differences or gaps in RDM services among the US regions. A specific focus on individual regions and the types of RDM services they provide would produce helpful information to identify both the gaps in services and the RDM niche associated with each geographical region within the US.
2.3 Challenges in Providing RDM Services
Many empirical studies on RDM services investigated the challenges that libraries or librarians face. For example, in the survey by Fearon et al. (2013), it was reported that establishing and maintaining effective partnerships and fostering campus-wide collaboration were mentioned by many respondents (38%) as the top challenge, followed by securing financial resources and funding (35%), increasing the engagement with faculty and researchers (31%), and providing technology infrastructure (27%). Limited staffing (25%), marketing services (25%), and staff training (23%) were also among the top six challenges. The need to provide training for library staff was also recognized in the article by Tenopir et al. (2015), as the authors pointed out that “in order to fully offer technical RDS, libraries need to have technologically skilled staff or greatly increase opportunities for technology training for their existing staff, which might not be feasible due to resource constraints” (p. 17).
Meanwhile, Cox et al. (2017) indicated that one of the major concerns related to the development of RDM services across various countries was that technical-oriented services such as providing a data catalog and curating active data were rather limited. Data curation skills and capabilities were not consistently in place. Other challenges identified by Cox et al. (2017) “include resourcing, working with other support services, and achieving ‘buy-in’ from researchers and senior managers” (p. 2182). Similar findings were reported in the study by Faniel and Connaway (2018). In conducting individual or focus group interviews that involved 36 library professionals in 2012 and 2013, Faniel and Connaway (2018) found that there were five factors contributing to librarians’ RDM support: human resources; communication, coordination, and collaboration; technical resources; leadership support; and researchers’ perceptions of the library. Specifically related to technical resources, 36% of librarians indicated that the solutions related to data storage and preservation presented serious challenges, particularly with regard to the financial commitment required to sustain a long-term solution. In terms of human resources, the constraints noted included “demands on time” and “limited number of expert staff.” For 25% of the participants, researchers’ misperception about the library and librarians’ expertise was a major constraint for RDM services. These findings echo those of other studies such as Cox et al. (2017) and Fearon et al. (2013).
Despite a great number of studies on RDM services in libraries and the perceptions of faculty or librarians in their endeavor of working collaboratively on RDM, there have been limited studies investigating the differences in locations, organizational types, level of preparedness, and the degree of development associated with the types of services and kinds of training that librarians and researchers should provide and receive. With the evidence that there have been global-level efforts in RDM services (e.g., Cox et al., 2017; Tenopir et al., 2017), it is necessary to examine the extent to which RDM practices vary by location. Revealing the location differences will help to identify the impact of cultural and service infrastructure on location-based practices. The strengths and niche that are identified to be associated with location-based services would provide guidance for institutions lacking such strengths to improve and create their unique RDM service structures. In the same vein, examining the differences among organizational types such as academia versus non-academic health sectors will help various types of institutions to learn from one another and to develop their own effective strategies for providing RDM services. Meanwhile, probing into measures of preparedness and degrees of development would enable us to operationalize RDM maturity levels as per Cox et al. (2017), on which basis we would be able to further examine the variations in kinds of services and tools provided by librarians working in environments with varying levels of RDM maturity. These empirical constructs are important and worthwhile to pursue, not only because they have not been investigated much in the existing RDM literature but also due to the valuable insights and enriched understanding that they afford.
The present study intends to address this gap by developing an in-depth understanding of the RDM services around the world and examining various differences and challenges surrounding the provision of RDM services.
3.1 Research Variables
The current study included a number of quantitative variables. As reviewed earlier, even though there were research studies examining RDM practices and policies through a global lens, seldom has any research compared country or location differences in RDM services. In this study, because we are interested in investigating the location and organizational differences in RDM services and tools, we have location groupings and organizational types as our independent variables. In terms of locations, in addition to grouping participants by world regions, we also categorized people based on whether they were from the US or non-US countries. For people who were located in the US, we were also interested in exploring the differences among the five regions (National Geographic Society, 2009). For organizational types, we had four categories including “university/college,” “government agency,” “non-academic health sector,” and “others.” For this particular variable, we were interested in comparing participants from academic settings with those who were outside academia. The remaining two independent variables were levels of preparedness in providing RDM services and the degree of development at which participants perceived their institutional RDM services to be. These two variables are related to RDM maturity (Cox et al., 2017) and are good indicators of the differences in RDM services. The dependent variables of this study included various forms of RDM services and tools provided in a given institution and respondents’ view of the kinds of challenges that they face in delivering RDM services. All the research variables that were included in our inferential statistics analysis are listed in Table 1.
Variables Used in This Study
|Independent Variables||Dependent Variables|
|Location||Data management services|
|● World regions||● RDM planning|
|● USA vs non-USA||● Data discovery and access|
|● US regions||● Data organization and curation|
|Organization type||● Metadata|
|● Academic versus non-academic||● Protocol documentation|
|Level of preparedness||● Data sharing and dissemination|
|● Prepared versus unprepared||● Data preservation|
|Role development||● Data visualization|
|● Five-point scale development level from not developed to very developed||RDM tools|
| ||● Electronic lab notebooks|
| ||● Data citation manager|
| ||● Data search engine|
| ||● Data processing software (R, Python, etc.)|
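The dichotomous groupings in Table 1 can be illustrated with a minimal sketch. The function names and the string labels for the groups are hypothetical, and treating "government agency," "non-academic health sector," and "others" together as non-academic is an assumption based on the description above, not a confirmed coding scheme from the study.

```python
from typing import Optional

# The four organization types are quoted from the text; grouping the last
# three as "non-academic" is an assumption made for illustration only.
ACADEMIC_TYPES = {"university/college"}
NON_ACADEMIC_TYPES = {"government agency", "non-academic health sector", "others"}


def sector(organization_type: Optional[str]) -> Optional[str]:
    """Collapse an organization type into academic vs non-academic."""
    if organization_type is None:
        return None  # affiliation not available; left out of the comparison
    if organization_type in ACADEMIC_TYPES:
        return "academic"
    if organization_type in NON_ACADEMIC_TYPES:
        return "non-academic"
    raise ValueError(f"unknown organization type: {organization_type!r}")


def location_group(country: str) -> str:
    """Collapse a country label into USA vs non-USA (hypothetical 'US' code)."""
    return "USA" if country == "US" else "non-USA"


print(sector("university/college"), location_group("Canada"))
```

Keeping missing affiliations as `None` rather than assigning them to a group mirrors the fact that some affiliations were not available.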
3.2 Research Questions
In this study, we pursue the answers to the following RQs, which include four primary RQs and several sub-RQs under each primary RQ. Table 2 lists all the RQs.
RQs and Sub-RQs Investigated in the Study
|RQ1||What is the state of current practice of RDM services in libraries?||1.1 What specific RDM services and tools are currently provided?|
| || ||1.2 What aspects of RDM services and tools provided differ among various locations and organizational types?|
| || ||1.3 What challenges do libraries face in providing RDM services?|
|RQ2||What current role do librarians play in providing RDM services?||2.1 Do participants consider that they are personally prepared in providing RDM services?|
| || ||2.1.1 What aspects of RDM services and tools provided differ among participants who indicated that they are prepared or unprepared to provide RDM services?|
| || ||2.1.2 Are there statistically significant differences between participants who considered themselves “prepared” and those who did not in the kinds of challenges they perceived in providing RDM services?|
| || ||2.2 How developed do participants perceive their current role in RDM services, and do they believe that they should take on a more formal role?|
| || ||2.2.1 What aspects of RDM services and tools provided differ among participants who reported their RDM role at different degrees of development?|
|RQ3||What specific knowledge and skills do librarians believe are needed for RDM training?||3.1 What sources do librarians use to identify RDM training opportunities?|
| || ||3.2 What are the most important and useful knowledge and skills in RDM training?|
| || ||3.3 What content areas and specific skill sets are useful for RDM training targeted to librarians?|
| || ||3.4 What content areas and specific skill sets are useful for RDM training targeted to researchers?|
|RQ4||What do participants see as the future evolution of the librarians’ role in providing RDM services?||4.1 What are the most common future RDM roles that respondents identified?|
From late May 2018 to early September 2018, we conducted an international survey featuring an online questionnaire containing 19 questions that inquired into the current practice of RDM in libraries, how prepared librarians were in RDM, knowledge and skills needed for RDM, and more. The questionnaire was hosted on SurveyMonkey. On May 23, 2018, a call for participation was sent to various librarian communities and international conferences’ attendees through email, blog posts, listservs, and word of mouth. Library Connect, an Elsevier outreach program for librarians, was used to distribute the online survey. A total of 241 responses were received. Answers to individual questions were not mandatory, so the number of responses to each question varied. The two authors processed survey data, coded responses, and analyzed the data. Results are outlined in the next section.
The 241 respondents came from five continents and 29 countries. As shown in Figure 2, 198 (82.2%) responses were from North America, 21 (8.7%) were from Europe, 13 (5.4%) were from Asia/Pacific/Middle East, five (2.1%) were from Africa, and four (1.7%) were from South America. Figure 3 shows the distributions of the respondents by country in the format of a pie chart. Seventy-eight percent of the participants were from the US.
The 187 US respondents came from 40 different states and the District of Columbia (DC) as well as Puerto Rico (n=1) (see Figure 4 for distribution). The US respondents were from all five US regions. The Northeast (n=70, 37.6%) had the highest number of participants and the Southwest (n=15, 8.1%) had the lowest. Table 3 lists the number of participants from various US regions.
Distribution of Survey Participants by US Regions (n=186, plus Puerto Rico)
|US Regions||Counts (Percentage)|
|Northeast (DC included)||70 (37.63)|
4.1.2 Organization Types
The 241 respondents were from four types of institutions, including universities/colleges, government agencies, non-academic health organizations, and other types of organizations (see Table 4 for distribution). The affiliations of 19 respondents were not available. Of the 222 respondents who reported an affiliation, 152 were from an academic institution (63.1% of all 241 respondents) and seven (2.9%) were from a non-academic health sector. In our inferential statistics analysis, we compared responses from academic participants with those from non-academic participants. Furthermore, for individual questions, there were responses from participants of the same institutions. We compared responses from participants of the same institutions with regard to factual information such as the kinds of RDM services and tools that a given institution provided; there were variations in these respondents’ answers. We believed that the varied responses arose either because the respondents were from different libraries within an institution or because the librarians had different levels of awareness of the services and tools that a given library provided. Because our study is based on librarians’ perceptions and self-reported data, we decided to treat responses from the same institution as if they were from different institutions.
Distribution of Survey Participants by Organization Types (n=241)
|Organizational Types||Counts (Percentage)|
|Universities/Colleges||152 (63.07)|
|Government Agencies||22 (9.13)|
|Non-Academic Health Sectors||7 (2.90)|
|Other Types of Organizations||41 (17.01)|
|Not Available||19 (7.88)|
4.2 RDM Services and Tools Provided in Institutions
4.2.1 RDM Services
In answering the question “What RDM services does your institution offer?”, 63 respondents reported the services that their institutions offered; the most frequently offered were RDM planning (n=51, 81.0%) and data sharing and dissemination (n=49, 77.8%). Table 5 lists the range of RDM services that participants reported that their institutions provided.
RDM Services Provided in Participants’ Institutions (n=63)
|RDM Services Provided||Counts (Percentage)|
|RDM planning||51 (80.95)|
|Data sharing and dissemination||49 (77.78)|
|Data preservation||42 (66.67)|
|Data discovery and access||41 (65.08)|
|Data visualization||37 (58.73)|
|Data organization and curation||37 (58.73)|
|Protocol documentation||20 (31.75)|
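The percentages in Table 5 are each count divided by the 63 respondents who answered the question. A minimal sketch of that arithmetic follows; the variable names are illustrative. Because the counts sum to more than 63, respondents evidently could select several services, so the shares are not expected to sum to 100.

```python
# Counts as listed in Table 5; n_respondents is the number who answered.
counts = {
    "RDM planning": 51,
    "Data sharing and dissemination": 49,
    "Data preservation": 42,
    "Data discovery and access": 41,
    "Data visualization": 37,
    "Data organization and curation": 37,
    "Protocol documentation": 20,
}
n_respondents = 63

# Share of respondents reporting each service, rounded to two decimals.
shares = {name: round(100 * c / n_respondents, 2) for name, c in counts.items()}

for name, share in shares.items():
    print(f"{name}: {counts[name]} ({share:.2f})")
```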
Through a series of Chi-square tests, statistically significant differences by location and organization type were found in the kinds of RDM services that librarians provided through their institutions. In the following sections, we outline these differences.
4.2.1.1 Location Differences – World Regions
Significant differences among North America, Europe, and Asia-Pacific regions were found in terms of the provision of data discovery and access services (χ2(2, n=60)=6.085, p=0.048, V=0.32) and data visualization services (χ2(2, n=61)=17.253, p<0.001, V=0.53). Significantly higher proportions of institutions in North America (75.0%) provided data discovery and access services than those in Asia-Pacific (66.7%) and Europe (33.3%). In addition, significantly higher proportions of institutions in North America (71.4%) provided data visualization services than those in Europe (11.1%) and Asia-Pacific (0.0%).
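The reported p and V values can be checked against the reported χ2, degrees of freedom, and n. The sketch below assumes that V is Cramér's V, computed as the square root of χ2 / (n × (k − 1)) with k the smaller table dimension; the text does not state the formula, so this is an assumption, and the function names are illustrative. The upper-tail probability uses only the Python standard library.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)                # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))     # closed form for df = 1
    # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while 2 * a < df:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


def cramers_v(chi2: float, n: int, rows: int, cols: int) -> float:
    """Cramér's V for a rows x cols contingency table (assumed effect size)."""
    return math.sqrt(chi2 / (n * (min(rows, cols) - 1)))


# Three regions by (service provided / not provided) gives a 3 x 2 table, df = 2.
for label, chi2, n in [("data discovery and access", 6.085, 60),
                       ("data visualization", 17.253, 61)]:
    p = chi2_sf(chi2, df=2)
    v = cramers_v(chi2, n, rows=3, cols=2)
    print(f"{label}: p={p:.4f}, V={v:.2f}")
```

The same two functions apply to the 2 × 2 comparisons (df = 1) and the five-region comparisons (df = 4) reported in the following subsections.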
4.2.1.2 Location Differences – USA versus Non-USA
Significant differences between USA and non-USA countries were found in terms of the provision of data organization and curation services (χ2(1, n=61)=4.736, p=0.030, V=0.28), metadata services (χ2(1, n=61)=4.891, p=0.027, V=0.28), and data visualization services (χ2(1, n=61)=11.716, p=0.001, V=0.44). Significantly higher proportions of institutions in the USA (66.0%) provided data organization/curation services than those in non-USA (35.7%) locations. Significantly higher proportions of institutions in the USA (72.3%) provided metadata services than those in the non-USA countries (42.9%). Finally, significantly higher proportions of institutions in the USA (70.2%) provided data visualization services than those in the non-USA countries (21.4%).
4.2.1.3 Location Differences – US Regions
Significant differences among five US regions were found in terms of the provision of protocol documentation services (χ2(4, n=47)=16.599, p=0.002, V=0.59) and data preservation services (χ2(4, n=47)=9.598, p=0.048, V=0.45). Significantly higher proportions of institutions in the Midwest (66.7%) provided protocol documentation services than those in the West (58.3%), Southeast (50.0%), Northeast (8.7%), and Southwest (0.0%). Meanwhile, significantly higher proportions of institutions in the West (100.0%) provided data preservation services than those in the Midwest (83.3%), Northeast (52.2%), Southeast (50.0%), and Southwest (50.0%).
4.2.1.4 Organizational Differences
Significant differences were found between academic institutions and non-academic institutions in terms of service provision of RDM planning (χ2(1, n=58)=7.344, p=0.007, V=0.36) and data preservation (χ2(1, n=58)=3.888, p=0.049, V=0.26). Significantly higher proportions of respondents working in universities (90.7%) reported that their institutions provided RDM planning services than those working in non-academic organizations (60.0%). Additionally, significantly higher proportions of respondents working in universities (74.4%) reported the provision of data preservation services than those working in non-academic organizations (46.7%).
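The effect sizes (V) reported alongside each test can be checked against the reported χ2 and n. The sketch below assumes the standard definition of Cramér's V, V = sqrt(χ2 / (n × min(r − 1, c − 1))), and assumes each test cross-tabulates groups against a two-level outcome (service provided or not); neither the formula nor the table shapes are stated in the text, and the function name `cramers_v` is illustrative.

```python
import math


def cramers_v(chi2: float, n: int, n_rows: int, n_cols: int) -> float:
    """Cramér's V for an n_rows x n_cols contingency table."""
    k = min(n_rows - 1, n_cols - 1)
    if n <= 0 or k <= 0:
        raise ValueError("need n > 0 and at least a 2 x 2 table")
    return math.sqrt(chi2 / (n * k))


# (chi2, n, rows, cols): chi2 and n are taken from the text above;
# the table shapes are assumptions (groups x provided/not provided).
reported = {
    "data visualization, USA vs. non-USA": (11.716, 61, 2, 2),
    "protocol documentation, five US regions": (16.599, 47, 5, 2),
    "RDM planning, academic vs. non-academic": (7.344, 58, 2, 2),
}

for label, args in reported.items():
    print(f"{label}: V = {cramers_v(*args):.2f}")
```

Under these assumptions the three values round to the reported V of 0.44, 0.59, and 0.36, respectively.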
4.2.2 RDM Tools
With regard to the RDM tools offered by their institutions, 44 of the 57 participants who answered the question (77.2%) indicated that their institutions provided data repositories. Other frequently mentioned tools were data processing software such as R and Python (n=38, 66.7%), data citation managers (n=23, 40.4%), and electronic lab notebooks (n=21, 36.8%). Table 6 lists the RDM tools selected by participants as being provided through their institutions.
Table 6. RDM Tools Available in Participants’ Institutions (n=57)
| RDM Tools Provided | Counts (Percentage) |
|---|---|
| Data repositories | 44 (77.19) |
| Data processing software | 38 (66.67) |
| Data citation manager | 23 (40.35) |
| Electronic laboratory notebooks | 21 (36.84) |
| Data search engine | 16 (28.07) |
Through a series of Chi-square tests, statistically significant differences were found in terms of location, organization type, and the kinds of RDM tools that institutions provided. In the following sections, we outline these differences.
4.2.2.1 Location Differences – World Regions
Significant differences among North America, Europe, and Asia-Pacific regions were found in terms of the provision of data processing software (χ2(2, n=55)=8.054, p=0.018, V=0.38). Significantly higher proportions of respondents from institutions in North America (75.6%) reported that their institutions provided data processing software than those from Europe (57.1%) and Asia-Pacific (0.0%).
4.2.2.2 Location Differences – USA vs. Non-USA
Statistically significant differences between US institutions and non-US institutions were found in terms of providing data processing software (χ2(1, n=55)=5.406, p=0.020, V=0.31). Significantly higher proportions of respondents working in US institutions (86.8%) reported that their institutions provided data processing software than those working in non-US organizations (13.2%).
4.2.2.3 Organizational Differences
Universities and non-academic institutions differed significantly in terms of the provision of data processing software (χ2(1, n=53)=5.487, p=0.019, V=0.32). Significantly higher proportions of respondents working in universities (76.7%) reported that their institutions provided data processing software than those working in non-academic organizations (41.7%).
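For readers who wish to verify the p-values reported for the RDM tool comparisons, the upper-tail probability of a chi-square statistic has a closed form for one and two degrees of freedom. The following is a minimal sketch using only the Python standard library; the function name `chi2_p_value` is illustrative, and higher degrees of freedom (such as the five-region tests) are deliberately left out.

```python
import math


def chi2_p_value(chi2: float, df: int) -> float:
    """Upper-tail p-value of a chi-square statistic for df = 1 or df = 2."""
    if chi2 < 0:
        raise ValueError("chi-square statistic must be non-negative")
    if df == 1:
        # A chi-square variable with 1 df is a squared standard normal,
        # so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
        return math.erfc(math.sqrt(chi2 / 2.0))
    if df == 2:
        # With 2 df the distribution is exponential with mean 2.
        return math.exp(-chi2 / 2.0)
    raise ValueError("this sketch only handles df = 1 and df = 2")


# Statistics reported above for data processing software.
for label, chi2, df in [
    ("world regions", 8.054, 2),
    ("USA vs. non-USA", 5.406, 1),
    ("university vs. non-academic", 5.487, 1),
]:
    print(f"{label}: p = {chi2_p_value(chi2, df):.3f}")
```

The computed values agree with the reported p-values (0.018, 0.020, and 0.019) to within rounding.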
In summary, a number of significant location or organizational differences were found in terms of the provision of the data services or RDM tools. Figure 5 presents an aggregated view of various significant differences.
4.3 Challenges in Providing RDM Services
In responding to the question “What challenges does your institution face in offering those services?”, among the 54 participants who provided their answers, the most frequent theme was “capacity/bandwidth; limited staffing” (n=28, 51.9%), followed by “marketing and outreach of RDM services” (n=16, 29.6%) and “collaborative understanding among campus departments” (n=16, 29.6%). Other common responses included “upskilling staff,” “providing consistent service in terms of quality and options,” and “handling researcher’s and faculty’s misconception of RDM and library services.” Figure 6 shows the categories of the common responses.
4.4 The Role of Librarians in RDM Services
4.4.1 Level of Preparedness
A total of 87 respondents answered the question “Do you personally feel prepared to offer RDM services?” Among them, 61 (70.1%) said “yes,” whereas 26 (29.9%) indicated “no.” For those who said no, in their explanation of the reasons for feeling unprepared, they listed “lack of training” (n=8, 30.8%), “lack of knowledge and skills” (n=7, 26.9%), “only comfortable with providing basic service” (n=7, 26.9%), and “not prepared for specific/ advanced RDM areas” (n=5, 19.2%). Other reasons listed included “lack of experience,” “job title was not Data Librarian,” “lack of time and resources,” and “lack of cooperation of researchers.”
Through a series of Chi-square tests, statistically significant differences were found in terms of the type of RDM services provided and the challenges that the respondents identified. In the following sections, we outline these differences.
4.4.1.1 RDM Services
Significant differences between respondents who indicated that they were prepared and those who believed that they were unprepared were found in terms of the provision of protocol documentation services (χ2(1, n=61)=7.864, p=0.005, V=0.36) and data preservation services (χ2(1, n=61)=4.051, p=0.044, V=0.26). A significantly higher proportion of the respondents who felt they were prepared provided protocol documentation services (42.5%) than those in the “unprepared” category (9.5%). A significantly higher proportion of the respondents who felt they were prepared provided data preservation services (75.0%) than those in the “unprepared” category (52.4%).
4.4.1.2 Challenges in providing RDM services
Statistically significant differences were found between the level of preparedness and the challenges that respondents identified in providing RDM services, specifically in “bandwidth and capacity” (χ2(1, n=54)=7.269, p=0.007, V=0.37), “upskilling staff” (χ2(1, n=54)=12.399, p=0.000, V=0.48), and “marketing, outreach and awareness” (χ2(1, n=54)=4.441, p=0.035, V=0.29). A significantly higher proportion of the respondents who felt they were “unprepared” identified “bandwidth and capacity” as a challenge (77.8%) than those in the “prepared” category (38.9%). A significantly higher proportion of the respondents who felt they were “unprepared” also identified “upskilling staff” as a challenge for them (38.9%) than those in the “prepared” category (2.8%). However, a significantly higher proportion of the respondents who felt they were “prepared” identified “marketing, outreach and awareness” as a challenge (38.9%) than those in the “unprepared” category (11.1%).
4.4.2 Degree of Development in RDM Role
A total of 239 respondents answered the question “how developed is your role with your institution’s RDM?” using a 5-point scale with 1 being “not developed” and 5 being “very developed.” Among these respondents, 89 (37.2%) indicated that their role was “not developed” and 19 (8.0%) stated their role as “very developed.” Figure 7 displays the distribution of the response categories. Consistent with the fact that only about 8% of respondents felt that their role was “very developed,” when responding to the question “Do you wish you had a more formal role with RDM?”, 121 (88.3%) said “yes” and only 16 (11.7%) said “no.” The contrast between the responses on preparedness and these two questions suggests that even though most librarians did not feel that their RDM role was fully developed and wished to have a more formal role, a majority (70.1%) personally felt prepared to provide such services.
4.4.2.1 Organizational differences in role perception
A Chi-square test indicated that there is a significant difference between university/academic institutions and non-academic organizations in respondents’ perception on whether they wished to assume a more formal role in RDM (χ2(1, n=126)=5.449, p=0.020, V=0.21). Significantly higher proportions of respondents working in non-academic organizations (97.6%) wished to assume a more formal role in RDM than those working in universities/ academic institutions (83.3%).
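To make the test above concrete, the sketch below computes a Pearson chi-square statistic (without continuity correction) from a contingency table. The cell counts are not given in the text; they are inferred here from the reported percentages and n (41 of 42 non-academic respondents and 70 of 84 academic respondents answering “yes”) and should be read as an assumption. The function name `pearson_chi2` is illustrative.

```python
def pearson_chi2(table):
    """Return (chi2, df, n) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, n


# Inferred (assumed) counts: rows = organization type, columns = yes / no.
wish_formal_role = [
    [41, 1],   # non-academic: 41 of 42 = 97.6% "yes"
    [70, 14],  # academic: 70 of 84 = 83.3% "yes"
]

chi2, df, n = pearson_chi2(wish_formal_role)
print(f"chi2({df}, n={n}) = {chi2:.3f}")
```

With these inferred counts the statistic matches the reported χ2(1, n=126)=5.449, which suggests, but does not confirm, that the test was run on a table of this shape.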
4.4.2.2 Level of preparedness and degree of development in RDM role
Significant differences were found in the level of preparedness with regard to respondents’ perception of the degree of development in RDM services (χ2(2, n=126)=9.870, p=0.007, V=0.34). A significantly higher proportion of the respondents in the “developing” category indicated that they were “unprepared” (43.5%) than of those in the “developed” category (21.7%) or the “very developed” category (5.6%).
4.4.2.3 RDM services and degree of development in RDM role
Significant differences among respondents of different degrees of development in their RDM role were found in terms of the provision of data organization and curation services (χ2(2, n=61)=10.963, p=0.004, V=0.42), metadata services (χ2(2, n=61)=9.666, p=0.008, V=0.40), and protocol documentation services (χ2(2, n=61)=6.318, p=0.042, V=0.32). A significantly higher proportion of the respondents in the “very developed” category indicated that they provided data organization and curation services (81.8%) than those in the “developed” category (80.0%) and those in the “developing” category (42.9%). Meanwhile, a significantly higher proportion of the respondents in the “developed” category indicated that they provided metadata services (86.7%) than those in the “very developed” category (81.8%) and those in the “developing” category (51.4%). A significantly higher proportion of the respondents in the “very developed” category indicated that they provided protocol documentation services (63.6%) than those in the “developed” category (26.7%) and those in the “developing” category (22.9%).
4.5 Essential RDM Skills, Knowledge, and RDM Training
A number of questions in the survey probed the issue of RDM training opportunities, the essential knowledge and skills needed, and what RDM training was needed for librarians and for researchers.
4.5.1 RDM training opportunities
A total of 68 respondents answered the question concerning where they learned about RDM training opportunities. More than 55% of the respondents (n=40, 58.8%) indicated that they identified training opportunities through “email listservs.” Other sources included announcements from associations (n=16, 23.5%), messages from organizations (n=15, 22.1%), exchanges with experts or colleagues (n=12, 17.7%), online searches for workshops (n=12, 17.7%), social media feeds (n=11, 16.2%), word of mouth (n=8, 11.8%), and conferences (n=8, 11.8%). When asked “how likely would you participate in online training on RDM?,” 111 (92.5%) out of 120 selected “very likely” or “somewhat likely,” three (2.5%) selected “somewhat unlikely” or “very unlikely,” and six (5.0%) selected “neutral.” As to whether receiving academic credit would further motivate them to pursue RDM training, responses were split nearly evenly, with 52.5% answering “yes” and 47.5% answering “no.”
4.5.2 Essential RDM knowledge and skills
Sixty-three respondents answered the question “What three things have you learned about RDM that was absolutely mandatory?” The most frequent responses were “data/file documentation” (n=17, 26.9%), “metadata” (n=13, 20.6%), and “DMPs” (n=13, 20.6%). Figure 8 includes the common responses.
As to what further RDM training was needed, the most frequent answers were “advanced data management skills (e.g., data analysis, preservation, acquisition and de-identification)” (n=15, 22.7%), “learning about DM or Open Source tools” (n=14, 21.2%), “hands-on projects” (n=8, 12.1%), and “RDM related policies and regulations” (n=7, 10.6%). Figure 9 includes all the response categories concerning this question.
4.5.3 Training for Librarians and Researchers
Sixty-one participants answered the question about RDM training for information specialists, and 68 answered the question about RDM training for researchers. As seen in Table 7, participants’ top six responses about RDM training needed for librarians differed from the top responses about RDM training needed for researchers. While librarians were mostly interested in learning about data service skills, the other frequently mentioned categories were also service-oriented or outreach activities. On the other hand, respondents believed that researchers should get rather specific RDM training such as management, process, storage, preservation, DMP, data sharing and dissemination, and more.
Table 7. RDM Training for Librarians and for Researchers
| RDM Training for Librarians (n=61) | RDM Training for Researchers (n=68) |
|---|---|
| Data services/science skills (13, 22.95%) | Data management/data processing (12, 17.65%) |
| Skills regarding data reference interview (11, 15.3%) | Data storage/data preservation (11, 16.18%) |
| Basic training (8, 13.11%) | DMP (9, 13.24%) |
| Connecting with the researcher and faculty (8, 13.11%) | Data sharing/data dissemination (9, 13.24%) |
| Depending on the knowledge that the librarians have and their institutional needs (6, 9.84%) | Data file/documentation (8, 11.76%) |
| All levels of trainings (5, 8.20%) | All/everything/a lot (7, 10.29%) |
| | Education/training (7, 10.29%) |
| | Software/systems/tools (7, 10.29%) |
| | Metadata (7, 10.29%) |
4.6 RDM Evolving Role
Fifty-three participants shared their thoughts on the “evolving role of librarians in the context of RDM.” Sixteen (30.2%) stated that one aspect of librarians’ continuing role is to “support researchers with RDM,” while 14 (26.4%) acknowledged the “importance of the role” librarians play in RDM services. Twelve (22.6%) participants suggested that librarians need to “connect with others in the institution” and form “partnership,” and 11 (20.8%) believed librarians should “embed in research/data lifecycle.” Other common responses included “teaching RDM” (n=7, 13.2%), providing “consultation” (n=6, 11.3%), and “collaborating with IT departments” (n=5, 9.4%).
Several respondents stressed the value and contribution of librarians to RDM. As stated by a participant (P240), “I am the only librarian by degree in a data services group of six people; I think data viz, GIS, DH, etc. are attracting a wide variety of people, but RDM is a place where librarians excel because it speaks to our strengths for making information documented and discoverable. As collections are seen as data and data as collections and as researchers now want to use natural language processing and machine learning on subscription library resources – we are key parts of the ecosystem.” Many participants saw “supporting researchers” for their RDM needs as a continuing role for the future. As commented by P201, “It depends on the institution, but I see librarians becoming more hands-on in helping researchers organize their data, and learning/teaching software and tools.” Another respondent (P170) described an entire suite of researchers’ activities that RDM librarians may support: “I see librarians in a position of advisor and reviewers. As a librarian, we can give researchers all the keys to start their study with a good architecture, support them in the DMP writing, answer questions during the study, help them to identify sensitive data, give information about legislation, curate the data, review an article before publication (see if the data given are enough for reproducibility), give advice on metadata, archive the data, share the data.” Meanwhile, several respondents believed that it is very important to connect with other divisions on campus to provide a full range of RDM services.
As pointed out by P137, “partnering with others on campus for a full-fledged/ all services approach – need to work more with ITS and others providing more data analysis help, and storage capacity.” The value of extended collaboration across campus was also advocated by P199 who stated that “Working in partnership with other offices, directives, and departments on campus; focus on training and keeping up with new tech developments such as ELNs.”
Another future development seen by several respondents was to embed RDM librarians and the services they provide into the research data lifecycle. As noted by P189, “Eventually, we will be more embedded in the process as consultants and trainers.” P130 also indicated that “I’d like to see librarians as partners in research groups, involved throughout the research data lifecycle.”
P215 expressed a similar sentiment: “I hope that our best and brightest don’t leave the profession to become ‘data scientists’ and that data librarians become integral to their researchers’ research lifecycle.” Parallel to P215’s concern about losing the RDM librarian workforce, P191 articulated the challenge posed by researchers’ misconceptions about the librarian role: “Honestly I’m worried about the future of RDM as a librarian role, mainly because in my experience it’s such a challenge to get researchers to think of libraries and librarians that way.”
Multiple respondents also expressed their vision of incorporating data access and discovery into the existing library search and discovery systems. While P151 discussed their concerns about the lack of efficient data discovery, data description, and data deposit mechanisms by stating “I’m guessing librarians are going to have to be as facile with data as they are with other products of scholarship. Data discovery will continue to be a huge challenge, as data are not well described and are deposited all over creation without standardized metadata,” P029 claimed that “It’s [data is] a new content type and we must set up our systems to enable search and discovery like we do for other research output.” Several participants pointed out that the RDM role will become more technologically driven in the future, as commented by P088, “Library roles become increasingly technical and technology-focused.” Interestingly, a few participants suggested paying attention to the advancement of AI and its impact on RDM services. P037 indicated that “I think AI will have a huge impact. Positioning ourselves to take advantage of that space is key.” P169 echoed, “Impact of AI will be huge and determining significant roles will be important.”
Overall, participants’ qualitative comments on the evolving role of librarians in providing RDM services were rooted in actual practice and corresponded well with their responses to other questions, such as those on the kinds of services and tools they provided, the challenges they faced, their level of preparedness, and the essential skills and knowledge needed for RDM.
5 Discussion and Conclusion
In this study, we extensively investigated current RDM practices at institutions around the world. We identified a number of location and organizational differences in the RDM services and tools provided as well as the impact of the level of preparedness and degree of development in RDM roles on the types of RDM services provided. Figure 10 lists all the statistically significant differences found in this study. As shown, not only did location and organizational type have an impact on the RDM services and tools offered, but participants’ perceptions of their levels of preparedness and degrees of development were also associated with the kinds of services they provided and the kinds of challenges they perceived.
We also examined respondents’ perceptions of both the current challenges and future roles of RDM services. Although our results answered our RQs, our study has several limitations. One limitation is related to the composition of our study sample. The majority of our sample was from North America (82.2%) and academic institutions (63.1%). A more balanced sample would have been desirable for comparing locations and organizations. In addition, even though we were able to perform comparative analyses among the five US regions, participation ranged from the Northeast (37.6%) to the Southwest (8.1%); a more even distribution of participants across the five regions would have been helpful. Our study revealed interesting results with regard to the RDM service strength associated with each US region. A potential follow-up study focusing exclusively on comparing regional strength in RDM service provision will need to perform systematic or cluster-based multistage random sampling in order to achieve a balanced and comparable sample. Another limitation of the study design is that one of the main distribution methods that we used was an Elsevier Library Connect blog post. A number of participants were reluctant to provide full responses to questions because of their concern about the involvement of a publisher in the RDM realm. Nevertheless, we were able to obtain a reasonably representative sample both in terms of locations and in terms of organizational types. Furthermore, the findings of this research study are confined to the survey responses; in order to obtain a more in-depth contextual understanding of these responses, follow-up interviews could be conducted. In further studies, a multi-phased and multimethod design may be used to develop a comprehensive understanding of the challenges and practices of RDM services and tools.
As one of the first studies investigating the location and organizational differences in RDM practice around the world, the present research revealed gaps in the global effort to develop RDM services. As argued earlier, understanding location and organizational differences would help us learn more about the strengths and weaknesses associated with each location or organizational type. It would also help to identify the unique characteristics, or the niche, associated with a given location or institutional type. Moreover, our investigation into the level of preparedness, the degree of development, and the obstacles that RDM librarians face helps to paint a realistic picture of the problems that librarians encounter in their collaborative endeavor of positioning themselves in a larger RDM lifecycle. This study contributes to the empirical understanding of RDM by introducing and employing new comparative perspectives such as location and organizational type as well as exploring the impact of varying levels of RDM maturity. With a majority of the respondents hoping to receive more training while expressing concerns about a lack of bandwidth or capacity, it is clear that, in order to grow RDM services, institutional commitment to resources and training opportunities is crucial. As an emergent profession, data librarians need to be nurtured, mentored, and further trained. The study makes a case for developing a global community of practice where data librarians work together, exchange information, help one another grow, and strive to advance RDM practice around the world.
This research study was part of a needs assessment project, funded by Elsevier, to establish the Research Data Management Librarian Academy (RDMLA). The authors wish to thank Jean Shipman and Elaine Martin for developing questions for the online survey and Alyson Gamble for performing preliminary data processing and analysis of the survey dataset. The authors also wish to thank all the respondents who answered our survey.
Antell, K., Foote, J. B., Turner, J., & Shults, B. (2014). Dealing with data: Science librarians’ participation in data management at Association of Research Libraries institutions. College & Research Libraries 75(4), 557-574. doi:10.5860/crl.75.4.557
Australian Research Council (ARC). (2017). ARC open access policy. Retrieved from http://www.arc.gov.au/arc-open-access-policy
Choudhury, G. S. (2008). Case study in data curation at Johns Hopkins University. Library Trends 57(2), 211-220. Retrieved from http://hdl.handle.net/2142/10669
Cox, A. M., Kennan, M. A., Lyon, L., & Pinfield, S. (2017). Developments in research data management in academic libraries: Towards an understanding of research data service maturity. Journal of the Association for Information Science and Technology 68(9), 2182-2200. doi:10.1002/asi.23781
Cox, A. M., & Pinfield, S. (2014). Research data management and libraries: Current activities and future priorities. Journal of Librarianship and Information Science 46(4), 299–316. doi:10.1177/0961000613492542
Department of Science & Technology, Government of India. (n.d.). National data sharing and accessibility policy. Retrieved from http://www.dst.gov.in/national-data-sharing-and-accessibility-policy-0
Dietrich, D., Adams, T., Miner, A., & Steinhart, G. (2012). De-mystifying the data management requirements of funders. Issues in Science and Technology Librarianship 70(1), 1-16. doi:10.5062/F44M92G2
Faniel, I. M., & Connaway, L. S. (2018). Librarians’ perspectives on the factors influencing research data management programs. College & Research Libraries 79(1), 100-119. doi:10.5860/crl.79.1.100
Fearon, D., Gunia, B., Pralle, B. E., Lake, S., & Sallans, A. L. (2013). Research data management services. SPEC Kit 334. Washington, DC: Association of Research Libraries. doi:10.29242/spec.334
Federer, L. (2018). Defining data librarianship: a survey of competencies, skills, and training. Journal of the Medical Library Association 106(3), 294-303. doi:10.5195/jmla.2018.306
Gore, S. A. (2013). A librarian by any other name: The role of the informationist on a clinical research team. Journal of eScience Librarianship 2(1), 20-24.
Government of Canada. (2016, December 21). Tri-Agency Statement of Principles on Digital Data Management. Retrieved from http://www.science.gc.ca/eic/site/063.nsf/eng/h_83F7624E.html?OpenDocument
Hanson, K. L., Bakker, T. A., Svirsky, M. A., Neuman, A. C., & Rambo, N. (2013). Informationist role: clinical data management in auditory research. Journal of eScience Librarianship 2(1), 25-29.
Hasman, L., Berryman, D., & Mcintosh, S. (2013). NLM Informationist Grant – web assisted tobacco intervention for community college students. Journal of eScience Librarianship 2(1), 30-34.
Japan Science and Technology Agency (JST). (2013). JST Policy on Open Access and Research Publications and Research Data Management. Retrieved from https://www.jst.go.jp/EN/about/openscience/index.html
Jones, S., Pryor, G., & Whyte, A. (2013). How to Develop Research Data Management Services - a guide for HEIs. DCC How-to Guides. Edinburgh: Digital Curation Centre.
Martin, E. R. (2013). Highlighting the informationist as a data librarian embedded in a research team. Journal of eScience Librarianship 2(1), 1-2.
National Geographic Society. (2009). United States Regions. Retrieved June 19, 2019, from https://www.nationalgeographic.org/maps/united-states-regions/
National Institutes of Health. (2005). Policy on enhancing public access to archived publications resulting from NIH-funded research [Internet]. Retrieved from http://grants.nih.gov/grants/guide/notice-files/NOT-OD-05-022.html
National Institutes of Health. (2008). Revised policy on enhancing public access to archived publications resulting from NIH-funded research [Internet]. Retrieved from http://grants.nih.gov/grants/guide/notice-files/NOT-OD-08-033.html
National Science Foundation. (2010a). Dissemination and sharing of research results [Internet]. Retrieved from http://www.nsf.gov/bfa/dias/policy/dmp.jsp
National Science Foundation. (2010b). Scientists seeking NSF funding will soon be required to submit data management plans [Internet]. Retrieved from http://www.nsf.gov/news/news_summ.jsp?cntn_id=116928&org=NSF&from=news
National Science Foundation. (2011). Grant proposal guide: Chapter II - proposal preparation instructions [Internet]. Retrieved from http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/gpg_2.jsp#IIC2j
Office of Science and Technology Policy (OSTP). (2013, February 22). Memorandum for the heads of executive departments and agencies. Retrieved from https://rosap.ntl.bts.gov/view/dot/34953
Perrier, L., Blondal, E., & MacDonald, H. (2018). Exploring the experiences of academic libraries with research data management: A meta-ethnographic analysis of qualitative studies. Library & Information Science Research 40(3-4), 173-183.
Pinfield, S., Cox, A. M., & Smith, J. (2014). Research data management and libraries: Relationships, activities, drivers and influences. PLoS ONE 9(12), e114734. doi:10.1371/journal.pone.0114734
Shearer, K. (2015). Comprehensive Brief on Research Data Management Policies. Retrieved from https://portagenetwork.ca/wp-content/uploads/2016/03/Comprehensive-Brief-on-Research-Data-Management-Policies-2015.pdf
Smale, N., Unsworth, K. J., Denyer, G., & Barr, D. P. (2018). The History, Advocacy and Efficacy of Data Management Plans. bioRxiv 443499. doi:10.1101/443499
Tenopir, C., Hughes, D., Allard, S., Frame, M., Birch, B., Baird, L., ... & Lundeen, A. (2015). Research data services in academic libraries: Data intensive roles for the future? Journal of eScience Librarianship 4(2), 1-21. doi:10.7191/jeslib.2015.1085
Tenopir, C., Talja, S., Horstmann, W., Late, E., Hughes, D., Pollock, D., ... & Allard, S. (2017). Research data services in European academic research libraries. Liber Quarterly 27(1), 23-44. doi:10.18352/lq.10180
UK Research and Innovation. (n.d.). Data Policy. Retrieved from https://www.ukri.org/funding/information-for-award-holders/data-policy/
Varner, S., & Hswe, P. (2016). Special report: Digital humanities in libraries. Retrieved from American Libraries website: https://americanlibrariesmagazine.org/2016/01/04/special-report-digital-humanities-libraries/
Whyte, A. (2014). A pathway to sustainable research data services: From scoping to sustainability. In Pryor, Jones, & Whyte (Eds.), Delivering research data management services (pp. 59-88). London: Facet.
Whyte, A., & Tedds, J. (2011). Making the case for research data management. Retrieved from Digital Curation Centre website: http://www.dcc.ac.uk/resources/briefing-papers