EDITOR’S SUMMARY

Prompted by federal mandates regarding data management planning, researchers sought to understand the needs and expectations of faculty at the University of Minnesota Twin Cities for managing their data. An initial survey revealed weak identification with the word data. Researchers then customized surveys for four colleges, using terminology that would resonate best for each. Survey results showed a distinction between those who identify the primary output of their work as data and those who identify it as research materials. Among respondents in the College of Food, Agriculture and Natural Resource Sciences, 91% said they generate data, along with 68% of College of Science and Engineering respondents and 54% of those from the College of Liberal Arts; in the Academic Health Center, 65% identified as researchers and 35% as clinicians. Those whose focus was data wanted support with long-term data preservation and with preparing materials for sharing. Respondents who stated they produce research materials rather than data indicated different needs, though help with long-term preservation and better access to resources to support data management were common themes.

KEYWORDS

data curation
materials preservation
research data sets
needs assessment
faculty


RDAP Review

When Data Is a Dirty Word: A Survey to Understand Data Management Needs Across Diverse Research Disciplines

by Alicia Hofelich Mohr, Josh Bishoff, Carolyn Bishoff, Steven Braun, Christine Storino and Lisa R. Johnston

In recent years, academic research libraries have been actively surveying faculty to understand their research data management needs in light of new requirements and expectations around data management planning from federal funding agencies [1, 2, 3]. Evidence from cross-disciplinary surveys of faculty suggests that needs vary by department [4, 5, 6, 7]. Additionally, a survey of researchers in the humanities, natural sciences and social sciences at the University of Kansas found that influences on practices and needs vary according to the discipline and also according to the research methodology (for example, qualitative or quantitative) of the individual [5]. While many of these surveys start by defining data as a wide net covering many file types (for instance, [4]), not all researchers see the materials they work with as “data,” and, more importantly, not all agree with the notion that those materials should even be considered data [8].

To better understand disciplinary differences in the data management needs of local researchers, we set out to create a survey that would be sensitive to the language researchers used to describe their own work practices. As data management services benefit a wide spectrum of digital scholarly activity, it is important to engage researchers who describe the products of their scholarly or creative work as something other than “data.” To engage these researchers, we first introduced data management as a broad term covering a range of activities that include managing, documenting, sharing and preserving data or research materials. The survey instrument was then designed to allow researchers to self-select whether they collect, create or use “data” or “research materials,” and the rest of the questions were presented based on that choice. Our results from four colleges at the University of Minnesota Twin Cities show that a substantial portion of survey respondents (from 9% in one college to 46% in another) chose not to describe their work as “data,” suggesting that an intentionally flexible approach to the language of research can broaden the reach of this kind of needs assessment.
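The self-selection step described above can be illustrated with a short sketch. This is only a minimal illustration in Python, not the logic of the survey platform we used: the function name, the list of terms and the exact item wording are assumptions made for this example, with the support areas paraphrased from the results reported in this article. The actual instruments are available at [9].

```python
# Hypothetical sketch: word follow-up items with the term a respondent chose.
# Names and item wording are assumptions; the real instruments are in [9].

TERMS = ("data", "research materials")

# Support areas paraphrased from the results discussed in this article.
SUPPORT_AREAS = (
    "preserving my {term} in the long term",
    "preparing my {term} for sharing",
    "storing my {term} in the short term",
)


def build_support_items(term):
    """Return the "I would like more support in ..." items using the chosen term."""
    if term not in TERMS:
        raise ValueError(f"unknown term: {term!r}")
    return [
        "I would like more support in " + area.format(term=term)
        for area in SUPPORT_AREAS
    ]


if __name__ == "__main__":
    # A respondent who selected "research materials" sees that wording throughout.
    for item in build_support_items("research materials"):
        print(item)
```

Keeping the item templates in one place and substituting only the chosen term is one way to keep questions comparable across respondents while still using each respondent's own language.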

Survey Design Strategy

The Twin Cities campus at the University of Minnesota has over 3,000 faculty in 11 colleges and centers, representing a diverse range of disciplines. The first version of the survey was developed for one of the most heterogeneous of the colleges, the College of Liberal Arts (CLA). The survey was designed to reach researchers in departments ranging from art and music to economics and psychology and was developed in consultation with the associate dean for research and with library staff in the humanities.

The CLA survey successfully reached a broad range of researchers (30% response rate, with 29 of 32 departments represented). After the CLA survey was complete, CLA and University Libraries staff modified the instrument to run in other major colleges on campus: the Academic Health Center (AHC), which includes the Medical School and other academic health colleges; the College of Science and Engineering (CSE); the College of Food, Agriculture and Natural Resource Sciences (CFANS); and the College of Biological Sciences (CBS). Disciplinary input was key. For each college, we consulted with library subject liaisons, college support offices and associate deans for research, garnering feedback about language that would resonate best with their faculty.

Each college was given a different, customized version of the roughly 30-question survey based on this feedback [9]. Similar questions were used where possible, as we also wanted to compare results across colleges. The surveys were run in the four colleges between September 2013 and February 2015 and yielded a total of 726 responses (CLA n=172; AHC n=329 from 6 colleges, response rate unknown due to listserv distribution method; CSE n=79 from 12 departments, 18% response rate; CFANS n=146 from 14 departments, response rate unknown due to listserv distribution method). A survey for CBS is planned for fall 2015.

Results

The breakdown by college demonstrated that many researchers, even in colleges that are traditionally thought of as data-heavy, chose not to identify data as the primary output of their work:

  • CLA: 54% data vs. 46% research materials
  • CSE: 68% data vs. 32% research materials
  • CFANS: 91% data vs. 9% research materials
  • In the Academic Health Center (AHC), researchers were asked whether they identified as a “researcher” (65%) or “clinician” (35%); based on feedback, all answered questions about data rather than research materials.

Figure 1. University of Minnesota faculty responses to “I would like more support in …” displayed by college and by research “type” classification

When asked to indicate their desire for support around various facets of the research data lifecycle (Figure 1), faculty who worked with “data” most often wanted more support in preserving data/research materials in the long term (after the research project is completed/published), followed by assistance with preparing their data/research materials for sharing (navigating privacy and copyright issues). Overall, far fewer faculty indicated a need for support with storing data in the short term.

Researchers who identified “research materials” as the primary product of their research indicated different areas where they wanted more support, with respondents from CSE and CLA wanting less support overall than their “data” colleagues. Respondents who worked with “research materials” in CFANS reported wanting more support than their counterparts in other colleges. Overall, long-term preservation was a consistently high need across colleges and types of data/materials. The general responses to this question reflected a lack of access to resources (time, people and funds) to support better data management practices.

Conclusions

This method of surveying faculty was very useful for understanding differences in user needs across and within academic colleges on our campus and has informed our service development, outreach techniques and education. It has been successful in reaching researchers who may not be comfortable describing their scholarly and academic work as “data,” including those who may see the term as a dirty word rather than the basis of their research. We invite others to test our survey tool on their own campuses. The comparison of each survey question by college is available at http://hdl.handle.net/11299/174051.

Resources Mentioned in the Article

[1] National Institutes of Health (NIH). (February 26, 2003). Final NIH statement on sharing research data. Notice: NOT-OD-03-032. Retrieved from http://grants.nih.gov/grants/guide/notice-files/NOT-OD-03-032.html

[2] National Science Foundation (NSF). (November 30, 2010). Dissemination and sharing of research results. Retrieved from www.nsf.gov/bfa/dias/policy/dmp.jsp.

[3] Holdren, J. P. (August 2013). Increasing access to the results of federally funded scientific research: Memorandum for the heads of executive departments and agencies. Office of Science and Technology Policy, Executive Office of the President, Washington, DC. Retrieved from www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf

[4] Akers, K. G., & Doty, J. (2013). Disciplinary differences in faculty research data management practices and perspectives. International Journal of Digital Curation, 8(2), 5-26. doi:10.2218/ijdc.v8i2.263.

[5] Weller, T., & Monroe-Gulick, A. (2014). Understanding methodological and disciplinary differences in the data practices of academic researchers. Library Hi Tech, 32(3), 467-482. doi:10.1108/LHT-02-2014-0021.

[6] Diekema, A. R., Wesolek, A., & Walters, C. D. (2014). The NSF/NIH effect: Surveying the effect of data management requirements on faculty, sponsored programs, and institutional repositories. Journal of Academic Librarianship, 40(3–4), 322-331. doi:10.1016/j.acalib.2014.04.010.

[7] Parham, S. W., Bodnar, J., & Fuchs, S. (2012). Supporting tomorrow’s research: Assessing faculty data curation needs at Georgia Tech. College & Research Libraries News, 73(1), 10-13. http://hdl.handle.net/1853/48706.

[8] Marche, S. (October 28, 2012). Literature is not data: Against digital humanities. LA Review of Books. Retrieved from https://lareviewofbooks.org/essay/literature-is-not-data-against-digital-humanities

[9] Hofelich Mohr, A., Bishoff, J., Johnston, L., Braun, S., Storino, C., & Bishoff, C. (2015). Data management needs assessment – Surveys in CLA, AHC, CSE, and CFANS. Retrieved from the University of Minnesota Digital Conservancy, http://hdl.handle.net/11299/174051.


Alicia Hofelich Mohr, the corresponding author, is the data management specialist in the College of Liberal Arts, University of Minnesota, and she can be reached at hofelich<at>umn.edu.

Josh Bishoff, Steven Braun and Lisa R. Johnston are on the staff of the University Libraries, University of Minnesota.

Carolyn Bishoff and Christine Storino are graduate students in the data science program, University of Minnesota.