- Jan 2014
-
www.dataone.org www.dataone.org
-
An effective data management program would enable a user 20 years or longer in the future to discover , access , understand, and use particular data [ 3 ]. This primer summarizes the elements of a data management program that would satisfy this 20-year rule and are necessary to prevent data entropy .
Who cares most about the 20-year rule? This is an ideal that appeals to some, but in practice even the most zealous adherents can't picture what this looks like in some concrete way-- except in the most traditional ways: physical paper journals in libraries are tangible examples of the 20-year rule.
Until we have a digital equivalent for data I don't blame people looking for tenure or jobs for not caring about this ideal if we can't provide a clear picture of how to achieve this widely at an institutional level. For digital materials I think the picture people have in their minds is of tape backup. Maybe this is generational? New generations not exposed widely to cassette tapes, DVDs, and other physical media that "old people" remember, only then will it be possible to have a new ideal that people can see in their minds-eye.
-
A key component of data management is the comprehensive description of the data and contextual information that future researchers need to understand and use the data. This description is particularly important because the natural tendency is for the information content of a data set or database to undergo entropy over time (i.e. data entropy ), ultimately becoming meaningless to scientists and others [ 2 ].
I agree with the key component mentioned here, but I feel the term data entropy is an unhelpful crutch.
-
data entropy Normal degradation in information content associated with data and metadata over time (paraphrased from [ 2 ]).
I'm not sure what this really means and I don't think data entropy is a helpful term. Poor practices certainly lead to disorganized collections of data, but I think this notion comes from a time when people were very concerned about degradation of physical media on which data is stored. That is, of course, still a concern, but I think the term data entropy really lends itself as an excuse for people who don't use good practices to manage data and is a cover for the real problem which is a kind of data illiteracy in much the same way we also face computational illiteracy widely in the sciences. Managing data really is hard, but let's not mask it with fanciful notions like data entropy.
-