RDAP Summary: Renee Walsh

This year we were able to use the proceeds from our annual conference to help three professionals attend the Research Data Access & Preservation (RDAP) Summit. Held in conjunction with the Information Architecture (IA) Summit, RDAP explores themes such as open data, data infrastructure, metadata, and data preservation. The RDAP community brings together a variety of individuals, including data managers and curators, librarians, archivists, researchers, educators, students, technologists, and data scientists from academic institutions, data centers, funding agencies, and industry who represent a wide range of STEM disciplines, social sciences, and humanities.

The attendees wrote up their experiences to share with our readers. This account is by Renee Walsh of the University of Connecticut:

Attending the RDAP Summit in Chicago was a great experience for me. I appreciated the diversity of speakers and viewpoints. As a new data management outreach librarian, I found it valuable to speak with fellow librarians who hold similar positions at other institutions. Having previously interned with the City of Boston’s Department of Innovation and Technology, where I worked on the redesign and communication of its open data website, I was very interested to hear from Tom Schenk, Chief Data Officer of the City of Chicago. His talk was engaging, and he told many interesting data stories that grew out of Chicago’s vibrant and engaged civic technology community.

One goal of collecting large amounts of municipal data is to use data analytics to address problems that stem from the city’s infrastructure and to improve the lives and health outcomes of Chicagoans. Much of the analysis aims to predict future problems more quickly and accurately, and ideally to prevent problems from occurring in the first place. For example, Schenk said that underground city infrastructure is hit on average every 60 minutes; a 3D model of that infrastructure helps reduce and prevent damage to underground pipes and wiring. The city has also created a heatmap of rodent complaints. Using analytics based on 31 factors that correlate with rodent complaints over a seven-day period, the city can predict where the next increase in complaints will occur. Similarly, the city uses data analytics to identify the food establishments at highest risk of causing food poisoning. This approach lets the city predict food violations 7 days sooner, which helps prevent food poisoning among customers. Schenk also mentioned that the code for this model is open source and available on GitHub.

Other projects tackled by the city’s data analytics include predicting where West Nile virus may occur, predicting where E. coli may occur at city beaches, and the Lead Safe project, which aims to reduce children’s exposure to residential lead paint. The Clean Water project was created thanks to about 1,000 hours of time volunteered by members of Chicago’s civic tech community. According to Schenk, the project followed open science practices, is fully reproducible, and is available on bioRxiv.

I also enjoyed many of the talks by university data management librarians. Andrew Johnson from the University of Colorado talked about defining the library’s role in an institution’s research data management. He referenced the Association of Research Libraries (ARL) SPEC Kit on data curation and asked, “Are we doing things because we can or because we have a good reason to be doing them?” He cautioned against preservation for preservation’s sake. Finally, Johnson argued that the library plays a unique role in the university because it is the only place that understands the big picture of scholarly communication.

There were also many talks about FAIR data, an acronym for Findable, Accessible, Interoperable, and Re-usable. In a talk on big data, Ayoung Yoon described the three Vs that characterize big data sets: volume, variety, and velocity. Wendy Kozlowski from Cornell University’s ITS talked about the development of a usable, interactive data storage finder. I found the resulting website impressive and well thought out.

On my last day at RDAP, I particularly enjoyed the workshop titled Building with the Carpentries, an overview of how to get involved with the Carpentries at your local institution. I also had the opportunity to meet and talk with Tess Grynoch and Julie Goldman about the New England Library Carpentry community. In conclusion, I really enjoyed my trip to the RDAP Summit in Chicago, especially the conversations with fellow research data librarians from other universities. It was interesting to observe, and ask about, how these roles vary at each institution depending on its needs, priorities, and organizational structure.