Kouper – Question 2
2. In your role, are the main pressures on and needs of data curation the same across the sciences, social sciences, and humanities?
CLIR/DLF Data Curation Postdoctoral Fellow, Data to Insight Center – Indiana University
From a data curation perspective, the distinctions between the sciences, social sciences, and humanities are becoming increasingly blurred. Many of our projects at D2I aim to facilitate cross-disciplinary, problem-based research. Someone working on problems of food insecurity or flood management would benefit from heterogeneous data gathered from satellites, sensors, interviews, surveys, and published documents. They could also benefit from a digital environment in which they can combine methods such as statistical analysis, data mining, and visualization.
Obviously, data curation can be domain specific; for example, descriptions of geoscience data differ from descriptions of sociological surveys. At the same time, it seems more useful to talk about the pressures on and needs of curating various types of data, such as images; spatial, temporal, and textual data; audio and video; databases; heterogeneous datasets; and so on. Identifying differences and similarities in the curation of various types of data would allow us to create more integrated environments for sharing and analyzing data across disciplines, fields, and institutions.
In addition to data types, pressures and needs relating to data curation cluster around the size of datasets, data sensitivity, and researchers’ awareness of the need for data stewardship.
Size of data. There are many discussions around so-called big data versus small or long-tail data. In the business world, big data are often defined along the dimensions of three Vs: volume, velocity, and variety.[1] Big data are data so large, fast-accumulating, and complex that they require different approaches to storage, analysis, and curation, whether one is dealing with a massive number of texts in a digital humanities database or high-resolution images in astronomy. Long-tail data have their own challenges. They may be easy to store on one computer, but because they are often collected for individual studies, they may be very heterogeneous in the variables and observation points they cover. For data curation, it is hard to figure out how to integrate such data into a single environment. As Katherine Akers pointed out in her recent essay, small data are messy, but they are important; once they are meaningfully preserved and integrated, they can contribute as much to our knowledge as big data, if not more.[2]
Data sensitivity. This is related to the idea of open access to the products of scholarly activities. Proponents argue that open access is beneficial for the advancement of knowledge as well as for the existence of an informed and educated citizenry. Should the ideals and policies of open access be applied to data? Access to data might be more complicated than access to published materials, because knowledge about human subjects or certain areas of the planet can be harmful when widely shared. When thinking about data curation, it is important to develop a framework that addresses multiple levels of data sensitivity and promotes research and education without creating more problems than it solves.
Researchers’ awareness. Research communities vary in their awareness and acceptance of the need to share and curate data. For example, in some disciplines, such as climatology or astronomy, the frameworks for data exchange are much more developed. Climate models rely on a three-dimensional grid that represents and calculates atmospheric, oceanic, and chemical information from around the world.[3] No one, be it a person, a team, or an institution, can collect such data alone. That is why climate-modeling efforts are coordinated at the local, national, and international levels, and why climate data are stored and documented quite well.
In other disciplines, data efforts are still much more localized. At D2I, we have been working with social-ecological data, particularly with one database that contains information from around the world and spans almost two decades. It is a single database, and researchers add information using a standardized instrument. Nevertheless, the variation within this database is immense because researchers collect data for their individual projects and adapt the instrument accordingly. When other researchers consult this database to retrieve data for their own projects, they cannot easily tell which of the available data meet their requirements. And we, as data curators and tool developers, do not necessarily know how to facilitate their queries and make the database easier to navigate and share. Hopefully, as more and more researchers become aware of the importance of sharing their data, we will be able to engage them and create integrated solutions for various communities.
1. Doug Laney, “3D Data Management: Controlling Data Volume, Velocity, and Variety,” Gartner Blog Network, February 6, 2001, http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf.
2. Katherine Akers, “Looking Out for the Little Guy: Small Data Curation,” Bulletin of the American Society for Information Science and Technology, February/March 2013, http://www.asis.org/Bulletin/Feb-13/FebMar13_RDAP_Akers.html.
3. NOAA’s Office of Oceanic and Atmospheric Research, “Modeling Climate,” last modified March 7, 2011, http://www.research.noaa.gov/climate/t_modeling.html.