115 Matching Annotations
  1. Nov 2024
    1. TRSP Desirable Characteristics It is recognised that a repository may offer different levels of curation to different digital objects.

    1. TRSP Desirable Characteristics Review and annotation of the data performed by the repository (e.g. via a data submission tool that enforces some curation, or by its curation team): is there a set of minimum curation steps that the repository performs on the submitted data? Is there a webpage or document that describes the type of curation done? This criterion is also related to the Data and Metadata Standards criteria.

    1. the notes you make about it as you curate it (which tells the AI exactly what you find useful about it)

      My notes may give some indication about what I find useful about a thing, but certainly not exactly or even all of what I find useful.

  2. Oct 2024
    1. https://web.archive.org/web/20241007071434/https://www.dbreunig.com/2024/10/03/we-need-help-with-discovery-more-than-generation.html

      The author says generation isn't a problem for AI to solve, since there's enough 'content' as it is, and posits discovery as the bigger problem to solve. The issue there is that discovery is far more personal, and so less suited to VC-funded efforts to create a generic tool that can be scaled from the center. Discovery is not a thing, it's an individual act. It requires local stuff, tuned to my interests, networks, etc. Curation is a personal thing, providing intent to discovery. It's the same reason why [[Algemene event discovery is moeilijk 20150926120836]], as [[Event discovery is sociale onderhandeling 20150926120120]]. Still, it's doable, but more agent-like than a central tool.

  3. Jul 2024
  4. Feb 2024
    1. Our goal is to have the best answers to every question, so if you see questions or answers that can be improved, you can edit them.
    2. In fact, I think this self-answered Q&A of yours was already quite good by the standards of the site, and very useful - I've used it to close other duplicates several times. As someone who wears a "curator" hat around here, I want to make questions like this even better - as good as they can be - and make it clear to others that this is the right duplicate target to use when someone else asks the same question.
  5. Dec 2023
      • for: climate crisis - debate - community action, climate crisis - discussion - community action, indyweb - curation example

      • discussion: effectiveness of community action to address climate crisis

        • This is a good discussion on the effectiveness of community action to address the climate crisis.
        • It offers a diverse range of perspectives that can all be mapped using SRG trailmark protocol and then data visualized within Indyweb via cytoscape
  6. Nov 2023
    1. To this end, a group of experts would be called upon to edit this content

      Does this mean that curation is done by experts, while editorialization can be produced by a group effort or a community?

      Doesn't the cultural dimension come into play as soon as the data are handled by one or more individuals? A flu campaign would not be the same from one country to another, so the choice of information would necessarily be influenced by cultural factors, among others.

    2. content curation is one of the elements of the editorialization process, whereas the latter refers to the process in its entirety, taking into account all aspects of the production of a piece of content and of the meaning that this content acquires within a culture.
    3. the concept of editorialization implies a cultural dimension that is not present in the idea of curation
  7. Jan 2023
    1. Every day, thousands of strangers upload little slices of their consciousness directly into my mind. My concern is that I'm prone to mistake their thoughts for my own — that some part of me believes I'm only hearing myself think.

      Letting others think for us -- groupthink -- though recognition that as social animals it is important to us to know how others think; problem of any type of feed, even RSS, is telling us what to be thinking about -- thoughtful curation of sources vital

  8. Nov 2022
    1. idealised utopia

      Possibly, besides web monetization, there can be a donation basket like ko-fi beside curation, as well as clearly linking back to the original, which can have a donation basket as well. The options are complementary.

  9. Aug 2022
    1. https://news.ycombinator.com/item?id=32341607

      Didn't read it all, but the total number of notes, many likely repetitive or repetitive of things elsewhere makes me think that there is a huge diversity of thought within this space and different things work for different people in terms of work and even attention.

      The missing piece is that all of this sits here instead of being better curated and researched to help some forms of quicker consensus. I'm sure there are hundreds of other posts just like this on HN with all the same thoughts over and over again with very little movement forward.

      How can we help to aggregate and refine this sort of knowledge to push the borders for everyone broadly rather than a few here and there?

  10. Jul 2022
    1. The most common way is to log the number of upvotes (or likes/downvotes/angry-faces/retweets/poop-emojis/etc) and algorithmically determine the quality of a post by consensus.

      When thinking about algorithmic feeds, one probably ought to not include simple likes/favorites/bookmarks as they're such low hanging fruit. Better indicators are interactions which take time, effort, work to post.

      Using various forms of webmention as indicators could be interesting as one can parse responses and make an actual comment worth more than a dozen "likes", for example.

      Curating people (who respond) as well as curating the responses themselves could be useful.

      Time windowing curation of people and curators could be a useful metric.

      Attempting to be "democratic" in these processes may often lead to the Harry and Mary Beercan effect and gaming issues seen in spaces like Digg or Twitter and have dramatic consequences for the broader readership and community. Democracy in these spaces is more likely to get you cat videos and vitriol with a soupçon of listicles and clickbait.
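      The weighting idea in this note can be sketched roughly as follows. This is only an illustration: the interaction kinds, the weights (chosen so that one reply outweighs a dozen likes), the 30-day window, and the doubling for curated responders are all assumptions, not values taken from the annotated post or from any webmention tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical effort weights: a single reply counts for more than a dozen likes.
WEIGHTS = {"like": 1.0, "bookmark": 1.0, "repost": 3.0, "reply": 13.0}


@dataclass
class Interaction:
    kind: str        # one of the WEIGHTS keys
    author: str
    when: datetime


def score(interactions, now, window=timedelta(days=30), trusted=frozenset()):
    """Sum effort weights for interactions that fall inside the time window.

    Interactions from curated (trusted) people count double; the factor of 2
    is an illustrative assumption.
    """
    total = 0.0
    for item in interactions:
        if now - item.when > window:
            continue  # too old: outside the time window
        weight = WEIGHTS.get(item.kind, 0.0)
        if item.author in trusted:
            weight *= 2.0
        total += weight
    return total


now = datetime(2022, 7, 15)
likes = [Interaction("like", f"user{i}", now - timedelta(days=1)) for i in range(12)]
reply = [Interaction("reply", "alice", now - timedelta(days=2))]
stale = [Interaction("reply", "bob", now - timedelta(days=90))]
```

      With these made-up numbers, a dozen likes score 12.0, one recent reply scores 13.0, and a reply from 90 days ago scores nothing. Curating the responders themselves is modelled by passing a `trusted` set.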

  11. Apr 2022
    1. Since most of our feeds rely on either machine algorithms or human curation, there is very little control over what we actually want to see.

      While algorithmic feeds and "artificial intelligences" might control large swaths of what we see in our passive acquisition modes, we can and certainly should spend more of our time in active search modes which don't employ these tools or methods.

      How might we better blend our passive and active modes of search and discovery while still having and maintaining the value of serendipity in our workflows?

      Consider the loss of library stacks in our research workflows? We've lost some of the serendipity of seeing the book titles on the shelf that are adjacent to the one we're looking for. What about the books just above and below it? How do we replicate that sort of serendipity into our digital world?

      How do we help prevent the shiny object syndrome? How can we stay on task rather than move on to the next pretty thing or topic presented to us by an algorithmic feed, so that we can accomplish the task we set out to do? Certainly bookmarking a thing or a topic for later follow-up can be useful so we don't go too far afield, but what other methods might we use? How can we optimize our random walks through life and a sea of information to tie disparate parts of everything together? Do we need to only rely on doing it as a broader species? Can smaller subgroups accomplish this if carefully planned, or is exploring the problem space only possible at mass scale? And even then we may be undershooting the goal by an order of magnitude (or ten)?

    2. …and they are typically sorted:
      • chronologically: newest items are displayed first
      • through data: most popular, trending, votes
      • algorithmically: the system determines what you see through your consumption patterns and what it wants you to see
      • by curation: humans determine what you see
      • by taxonomy: content is displayed within buckets of categories, like Wikipedia

      Most media entities employ a combination of the above.

      For reading richer, denser texts what is the best way of ordering and sorting it?

      Algorithmically sorting with a pseudo-chronological sort is the best method for social media content, but what is the most efficient method for journal articles? for books?
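      As a rough illustration of how these orderings differ, the sketch below sorts the same three items chronologically, by votes, by category bucket, and by a "pseudo-chronological" rank. The `Item` fields, the sample data, and the vote-decay formula with a 24-hour half-life are assumptions made for the example and do not describe how any real platform ranks content. Human curation is left out, since it is not an algorithm.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Item:
    title: str
    published: datetime
    votes: int
    category: str


def chronological(items):
    """Newest items first."""
    return sorted(items, key=lambda i: i.published, reverse=True)


def by_popularity(items):
    """Most-voted items first."""
    return sorted(items, key=lambda i: i.votes, reverse=True)


def by_taxonomy(items):
    """Group titles into buckets by category."""
    buckets = defaultdict(list)
    for item in items:
        buckets[item.category].append(item.title)
    return dict(buckets)


def pseudo_chronological(items, now, half_life_hours=24.0):
    """Rank by votes that decay with age (hypothetical formula)."""
    def rank(item):
        age_hours = (now - item.published).total_seconds() / 3600
        return item.votes * 0.5 ** (age_hours / half_life_hours)
    return sorted(items, key=rank, reverse=True)


items = [
    Item("Old classic", datetime(2022, 4, 1), 100, "books"),
    Item("Fresh post", datetime(2022, 4, 10), 10, "articles"),
    Item("Recent hit", datetime(2022, 4, 9), 40, "articles"),
]
now = datetime(2022, 4, 10, 12)
```

      For denser texts such as journal articles or books, the decay term is the part most open to question: a short half-life suits social media but would bury older work that is still relevant.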

    1. Aaron Tay, a librarian at Singapore Management University who studies academic search tools, gets literature recommendations from both Twitter and Google Scholar, and finds that the latter often highlights the same articles as his human colleagues, albeit a few days later. Google Scholar “is almost always on target”, he says.

      Anecdotal evidence indicates that manual human curation as evinced by Twitter front runs Google Scholar by a few days.

  12. Mar 2022
    1. adopt the mindset of a curator – objective, opinionated, and reflective.

      This is a fascinating idea of what it means to curate. I'm not entirely clear what "objective" means but I like "opinionated and reflective". In all the "become a curator" advice I've heard, those 2 words feel like they've been missing.

  13. Feb 2022
    1. On iOS, not even dedicated file managers like DEVONthink are capable (or willing might be a more accurate term) of handling the diversity of data Telegram will happily pass on for you, especially through the Share Sheet.

      In tandem with @OlegWock’s Raindrop Telegram Bot, Telegram’s speedy share sheet makes for the fastest means of sending content to one’s Raindrop collection to date.

  14. Jan 2022
    1. The mere scribe and the mere compiler have disappeared (almost completely), and the mere commentator has become very rare. Each exists only insofar as any author in creating his own work cannot do without some copying, some compiling (or research), and some commenting.

      The digital era has made copying (scriptor) completely redundant. The click of a button allows the infinite copying of content.

      Real compilators are few and far between, but exist in niches. Within social media many are compiling and tagging content within their accounts.

      Commentators are a dime a dozen and have been made ubiquitous courtesy of social media.

      Content creators or auctors still exist, but are rarer in the broader field of writing or other contexts.

  15. Dec 2021
    1. the five Cs

      • Category and Context indicate the relevance
      • Correctness indicates the validity
      • Contributions indicates the value
      • Clarity indicates the quality

    2. In the second pass, read the paper with greater care, but ignore details such as proofs.

      The objective is to get an overall grasp of the content, sufficient to allow you to summarize effectively with supporting evidence.

      You should gain enough familiarity to determine where additional effort to understand the details is warranted.

      You are also looking for evidence of quality (or lack thereof) that might color your impressions of the overall value.

    3. The first pass is a quick scan to get a bird’s-eye view of the paper.

      Initial screening of the document, based on front-matter, end-matter, introduction, and conclusions.

      You are evaluating the relevance, validity, quality, and ultimate value of the document to your knowledge base. If your evaluation is favorable, you can be relatively confident that spending more time to go through the document in more detail (second pass) will be worthwhile.

  16. Nov 2021
    1. public collections

      Update: Raindrop has since implemented per-account profile pages. (A collection of one’s public collections, you might say.)

      https://raindrop.io/davidblue/embed/me
  17. Sep 2021
    1. I will be looking for your conscious choice in your entry selections, dedicated organizational patterns and curation techniques, self-reflection and thoughtful responses in your short writing exercises, and as a whole, your engagement with and understanding of our various texts.

      A focus on conscious choices, organization, and curation is a piece missing from the modern digital note-taking space when people talk about digital gardens and zettelkasten.

  18. Aug 2021
    1. Taking turns at hosting shared the administrative load and the benefits that accrued. It was considered good practice to read all the submissions and craft your own post that would link to them, possibly exercising some selection, in a way that might entice readers to see for themselves. In that respect, because they were curated, blog carnivals to me are distinct from planets that merely accrete stuff, admittedly on a topic, without curation.

      This almost sounds like the creation of a wiki page, but in blog format.

    1. If we cannot afford real, diverse, and independent assessment, we will not realize the promise of middleware.
    2. Building on platforms' stores of user-generated content, competing middleware services could offer feeds curated according to alternate ranking, labeling, or content-moderation rules.

      Already I can see too many companies relying on artificial intelligence to sort and filter this material and it has the ability to cause even worse nth degree level problems.

      Allowing the end user to easily control the content curation and filtering will be absolutely necessary, and even then, customer desire to do this will likely lose out to the automaticity of AI. Customer laziness will likely win the day on this, so the design around it must be robust.

    3. Francis Fukuyama has called "middleware": content-curation services that could give users more control over the material they see on internet platforms such as Facebook or Twitter.
  19. May 2021
    1. kickscondor in Web Curios Returns! (05/01/2021 10:37:56)

  20. Mar 2021
  21. Feb 2021
    1. sharing it on a cloud-based platform.

      Interesting that tools would come up at this stage. Chances are, someone's curation toolkit will cover all the steps and there are some tools which integrate several of these. Refworks, Zotero, and Mendeley might be interesting examples in that they allow for cloud sharing yet focus on the information management.

  22. Jan 2021
    1. But in my apt/deb world, where I use official repositories from my distro, where is the threat from 3rd parties? They are eventually "curated" in the partner repository, or in universe
  23. Oct 2020
    1. A more active stance by librarians, journalists, educators, and others who convey truth-seeking habits is essential.

      In some sense these people can also be viewed as aggregators and curators of sorts. How can their work be aggregated and be used to compete with the poor algorithms of social media?

    1. You find them in a place that you curate yourself, not one “curated” for you by a massive corporate social network intent on forcing you to be every part of yourself to everyone, all at once. You should control how, when, and where to interact with your people.
    1. It isn't rocket science, but as Jon indicates, it's incredibly powerful.

      I use my personal website with several levels of taxonomy for tagging and categorizing a variety of things for later search and research.

      Much like the example of the Public Radio International producer, I've created what I call a "faux-cast": I tag everything I listen to online and save it to my website, including the appropriate <audio> link to the .mp3 file, so that anyone who wants to follow the feed of my listens can have a playlist of all the podcast and internet-related audio I'm listening to.

      A visual version of my "listened to" tags can be found at https://boffosocko.com/kind/listen/ with the RSS feed at https://boffosocko.com/kind/listen/feed/
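      A "faux-cast" of this kind works because podcast clients treat any RSS item carrying an audio enclosure as an episode. The sketch below shows the general shape of such a feed using only the Python standard library. It is not how the site above generates its feed, and the titles, example.com URLs, and byte length are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET


def listen_item(title, page_url, audio_url, length_bytes):
    """Build one RSS <item> whose <enclosure> points at the audio file."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = page_url
    ET.SubElement(
        item,
        "enclosure",
        url=audio_url,
        length=str(length_bytes),  # size of the audio file in bytes
        type="audio/mpeg",
    )
    return item


def build_feed(feed_title, feed_url, description, items):
    """Wrap the items in a minimal RSS 2.0 channel and return it as a string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_url
    ET.SubElement(channel, "description").text = description
    channel.extend(items)
    return ET.tostring(rss, encoding="unicode")


# Placeholder data, not real posts.
feed_xml = build_feed(
    "Listens",
    "https://example.com/kind/listen/",
    "Audio I have listened to",
    [
        listen_item(
            "Episode 1",
            "https://example.com/listen/1",
            "https://example.com/audio/ep1.mp3",
            12345678,
        )
    ],
)
```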

    1. But on the Web, stories of all kinds can show up anywhere and information and news are all mixed together. Light features rotate through prominent spots on the "page" with the same weight as breaking news, sports coverage, and investigative pieces, even on mainstream news sites. Advertorial "features" and opinion pieces are not always clearly identified in digital spaces.

      This difference is one of the things I miss about reading a particular newspaper and experiencing the outlet's particular curation of their own stories. Perhaps I should spend more time looking at the "front page" of various news sites.

    1. why encourage posting before you’ve even read the thing? Because, at least my hope is, it’ll prevent posting a link from becoming an endorsement for the content at the other end of that link. There’s a natural tendency to curate what we associate with our online profiles and I think that’s, in large part, because we’ve spent a lot of time equating a user’s profile page with a user’s identity and, consequently, their beliefs. But I consume a wealth of content that I don’t necessarily agree with, and that helps to inform me, to shape my opinions, as much as the content that I agree with wholeheartedly.
  24. Aug 2020
  25. Jul 2020
  26. Jun 2020
  27. May 2020
  28. Apr 2020
    1. LitCovid: Curated literature hub for tracking up-to-date scientific information about the 2019 novel Coronavirus (methods and tools) https://www.ncbi.nlm.nih.gov/research/coronavirus/ LitCovid is a curated literature hub for tracking up-to-date scientific information about the 2019 novel Coronavirus. It is the most comprehensive resource on the subject, providing a central access to 1724 (and growing) relevant articles in PubMed. The articles are updated daily and are further categorized by different research topics and geographic locations for improved access. You can read more at Chen et al. Nature (2020) and download our data here.
    1. LitCovid is a curated literature hub for tracking up-to-date scientific information about the 2019 novel Coronavirus. It is the most comprehensive resource on the subject, providing a central access to 1737 (and growing) relevant articles in PubMed. The articles are updated daily and are further categorized by different research topics and geographic locations for improved access. You can read more at Chen et al. Nature (2020) and download our data here.
  29. Feb 2020
    1. Reverse engineering a bronze cannon from the La Belle shipwreck

      The benefit to archaeology, museum curation, and other areas presented by computer modeling and 3D printing cannot be overstated. These technologies allow us to explore artifacts, sites, and more, in ways that we never could before.

  30. Mar 2019
  31. Feb 2019
    1. Creating a work of art will depend more and more on the ability of the artist to select, organize and present the bits of raw data we have at our disposal

      The Artist-Researcher paradigm

  32. Oct 2017
  33. Jul 2017
    1. A pedagogy of abundance, acknowledging that content often is freely available and abundant, may eventually prove relevant in this regard.

      This may be a great way to develop evaluation skills in "students". There is so much good on the web but you need to have the ability to distinguish it from the other.

  34. Jun 2017
    1. Although it cannot be taken for granted that the publication lists on RID are error-free, these lists will probably be more reliable than the automatically generated lists (by Elsevier).

      Seems like a list which is automatically populated and then edited by a researcher would be better than one manually created. I don't think there's any factual basis for this claim.

  35. Apr 2017
  36. Jan 2017
    1. That’s not to say that social media curbs our self-awareness, or that our internet selves aren’t highly artificial and curated. Nor that people living in oppressive regimes, or as minorities in societies where they know they will be targeted, aren’t justifiably anxious about what they say online. But the point remains that digital media have radically transformed our conceptions of intimacy and shame, and they’ve done so in ways that are unpredictable and paradoxical.
  37. Dec 2016
    1. Daniel does a lot of great work

      CURATION BY CALLOUT

    2. try to find connecting points

      CURATION BY THREAD

      Trying to pull out common threads. Also, trying to pull out substantive differences.

    3. In the interest

      CURATION WITH OTHER MODES (image, text in image, gif, video, sound files)

    4. am removed from the anchor text

      CURATION WITHIN THE ANNOTATION FRAME

      CURATION OUTSIDE OF THE ANNOTATION FRAME

    5. rough take

      CURATING BY SUMMING UP

    6. to each other

      CURATION WITH EACH OTHER

    7. reacting to ideas

      CURATION BY REACTION

    8. It’s also invisible, to some degree.

      CURATING BY SHARING (PUBLIC, UNLISTED, and PRIVATE GROUPS)

      Private annotations for oneself and for private groups are also possible. Nice. Fine grained. Along with tagging, there are many possibilities to send annotations in multiple channels within Hypothes.is (private groups, public notes, tags, social sharing).

    9. These annotations are for me an experiment in meta-curation, curating about curation. I hope that we can draw some lessons in how to curate from Kevin's post. It would be great to do this elsewhere and then draw our discoveries together on Hypothes.is.

      In lieu of starting that project I have begun a private group where we can gather together curatorial strategies. Here is a link if you want to join Curation Strategies: https://hypothes.is/groups/j3eoYn2b/curation-strategies

      It might also be helpful to come up with a set of specific (curation strategies) and generic tags (curation) that we can search for on Hypothes.is here: https://hypothes.is/stream?q=tag:curation
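      For a shared set of generic and specific tags, the search links can be built mechanically. The helper below is only a sketch that reproduces the URL pattern of the link above for single-word tags. The function name is made up, and how the stream search treats multi-word tags is not covered here.

```python
from urllib.parse import urlencode

# Base of the stream search link quoted in the note above.
STREAM_URL = "https://hypothes.is/stream"


def tag_stream_url(tag):
    """Return the stream search URL for one single-word tag.

    The colon is left unescaped so the result matches the link in the note.
    """
    query = urlencode({"q": f"tag:{tag}"}, safe=":")
    return f"{STREAM_URL}?{query}"


curation_url = tag_stream_url("curation")
```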

    10. Sifting

      CURATING BY SIFTING

      Technique #1: thinking of the marginalia as a vein to be mined for nuggets and gems. Begs the question: what criteria do we apply to the vein so that we can sift stuff?

      • Notable quotes (why notable?)
      • Main ideas
      • Worthy figurative language (metaphors, symbols, analogies)
      • New wine
      • Old wine in new bottles

  38. Sep 2016
    1. curate

      The term may still sound somewhat misleading to those who work in, say, museums (where “curator” is a very specific job title). But the notion behind it is quite important, especially when it comes to Open Education. A big part of the job is to find resources and bring them together for further reuse, remix, and reappropriation. In French, we often talk about «veille technologique», which is basically about watching/monitoring relevant resources, especially online.

  39. Aug 2016
    1. the concept of the printed edition

      Yes, more accurate than the distinction with "curation". Curation can be a form of editorialization: open and dynamic. It is a specific kind of editorialization.

    1. the curation of my profile,

      I would say that curation is carried out on a profile, through a profile (curation of external content), but one does not curate a profile.

  40. Jul 2016
    1. collaborate on joint, shared, or cooperative programs that address common educational and training needs.

      are they asking for grant proposals on training for data curation for cultural heritage?

    2. support the re-use of data over time and across generations of technology (digital curation)

      is this how they're defining digital curation?

  41. Apr 2016
    1. Centralizing content and distributing labor: a community model for curating the very long tail of microbial genomes.

      This is going to be a good talk. Get your coffee, open your eyes, and open your mind! A pattern that could actually scale up - worth a try! Disagree? reply here.

  42. Feb 2016
    1. Educators

      Just got to think about our roles, in view of annotation. Using “curation” as a term for collecting URLs sounds like usurping the title of “curator”. But there’s something to be said about the role involved. From the whole “guide on the side” angle to the issue with finding appropriate resources based on a wealth of expertise.

  43. Dec 2015
    1. distributed curation

      While “web curation” is well-established as a practice, there’s still a lot of work to do on what it represents, conceptually and epistemologically.

  44. Nov 2015
    1. some kind of curated library

      Which is where OER catalogues (tied to the Semantic Web) may shine. Sure, they can require a lot of work. But this is precisely why they matter.

  45. Oct 2015
    1. long time curating these tomes

      Part of the argument for OER might come from more efficient ways to curate this type of material. Creating textbooks is some people’s main goal, but there’s a whole lot to be said about Open Coursepacks in Linked Open Data.

  46. Aug 2015
  47. Jan 2014
    1. Reasons for not making data electronically available. Regarding their attitudes towards data sharing, most of the respondents (85%) are interested in using other researchers' datasets, if those datasets are easily accessible. Of course, since only half of the respondents report that they make some of their data available to others and only about a third of them (36%) report their data is easily accessible, there is a major gap evident between desire and current possibility. Seventy-eight percent of the respondents said they are willing to place at least some of their data into a central data repository with no restrictions. Data repositories need to make accommodations for varying levels of security or access restrictions. When asked whether they were willing to place all of their data into a central data repository with no restrictions, 41% of the respondents were not willing to place all of their data. Nearly two thirds of the respondents (65%) reported that they would be more likely to make their data available if they could place conditions on access. Less than half (45%) of the respondents are satisfied with their ability to integrate data from disparate sources to address research questions, yet 81% of them are willing to share data across a broad group of researchers who use data in different ways. Along with the ability to place some restrictions on sharing for some of their data, the most important condition for sharing their data is to receive proper citation credit when others use their data. For 92% of the respondents, it is important that their data are cited when used by other researchers. Eighty-six percent of survey respondents also noted that it is appropriate to create new datasets from shared data. Most likely, this response relates directly to the overwhelming response for citing other researchers' data. The breakdown of this section is presented in Table 13.

      Categories of data sharing considered:

      • I would use other researchers' datasets if their datasets were easily accessible.
      • I would be willing to place at least some of my data into a central data repository with no restrictions.
      • I would be willing to place all of my data into a central data repository with no restrictions.
      • I would be more likely to make my data available if I could place conditions on access.
      • I am satisfied with my ability to integrate data from disparate sources to address research questions.
      • I would be willing to share data across a broad group of researchers who use data in different ways.
      • It is important that my data are cited when used by other researchers.
      • It is appropriate to create new datasets from shared data.
    2. Data sharing practices. Only about a third (36%) of the respondents agree that others can access their data easily, although three-quarters share their data with others (see Table 11). This shows there is a willingness to share data, but it is difficult to achieve or is done only on request.

      There is a willingness, but not a way!

    3. Nearly one third of the respondents chose not to answer whether they make their data available to others. Of those who did respond, 46% reported they do not make their data electronically available to others. Almost as many reported that at least some of their data are available somehow, either on their organization's website, their own website, a national network, a global network, a personal website, or other (see Table 10). The high percentage of non-respondents to this question most likely indicates that data sharing is even lower than the numbers indicate. Furthermore, the less than 6% of scientists who are making “All” of their data available via some mechanism tends to reinforce the lack of data sharing within the communities surveyed.
    4. Adding descriptive metadata to datasets helps make the dataset more accessible by others and into the future. Respondents were asked to indicate all metadata standards they currently use to describe their data. More than half of the respondents (56%) reported that they did not use any metadata standard and about 22% of respondents indicated they used their own lab metadata standard. This could be interpreted that over 78% of survey respondents either use no metadata or a local home grown metadata approach.

      Not surprising that roughly 80% use no or ad hoc metadata.

    5. Data reuse. Respondents were asked to indicate whether they have the sole responsibility for approving access to their data. Of those who answered this question, 43% (n=545) have the sole responsibility for all their datasets, 37% (n=466) have for some of their datasets, and 21% (n=266) do not have the sole responsibility.
    6. Policies and procedures sometimes serve as an active rather than passive barrier to data sharing. Campbell et al. (2003) reported that government agencies often have strict policies about secrecy for some publicly funded research. In a survey of 79 technology transfer officers in American universities, 93% reported that their institution had a formal policy that required researchers to file an invention disclosure before seeking to commercialize research results. About one-half of the participants reported institutional policies that prohibited the dissemination of biomaterials without a material transfer agreement, which have become so complex and demanding that they inhibit sharing [15].

      Policies and procedures are barriers, but there are many more barriers beyond that which get in the way first.

    7. data practices of researchers – data accessibility, discovery, re-use, preservation and, particularly, data sharing
      • data accessibility
      • discovery
      • re-use
      • preservation
      • data sharing
    1. The Data Life Cycle: An Overview The data life cycle has eight components:
      • Plan: description of the data that will be compiled, and how the data will be managed and made accessible throughout its lifetime
      • Collect: observations are made either by hand or with sensors or other instruments and the data are placed into a digital form
      • Assure: the quality of the data is assured through checks and inspections
      • Describe: data are accurately and thoroughly described using the appropriate metadata standards
      • Preserve: data are submitted to an appropriate long-term archive (i.e. data center)
      • Discover: potentially useful data are located and obtained, along with the relevant information about the data (metadata)
      • Integrate: data from disparate sources are combined to form one homogeneous set of data that can be readily analyzed
      • Analyze: data are analyzed

      The lifecycle according to whom? This 8-component description reflects only the point of view of the people who obsessively think about this "problem".

      Ask a researcher and I think you'll hear that lifecycle means something like:

      collect -> analyze -> publish
      

      or a more complex data management plan might be:

      ask someone -> receive data in email -> analyze -> cite -> publish -> tenure
      

      To most people lifecycle means "while I am using the data" and archiving means "my storage guy makes backups occasionally".

      Asking people to be aware of the whole cycle outlined here is a non-starter, but I think there is another approach to achieve what we want... dramatic pause [to be continued]

      What parts of this cycle should the individual be responsible for vs which parts are places where help is needed from the institution?

    2. Data represent important products of the scientific enterprise that are, in many cases, of equivalent or greater value than the publications that are originally derived from the research process. For example, addressing many of the grand challenge scientific questions increasingly requires collaborative research and the reuse , integration, and synthesis of data.

      Who else might care about this other than Grand Challenge Question researchers?

    3. Journals and sponsors want you to share your data

      What is the sharing standard? What are the consequences of not sharing? What is the enforcement mechanism?

      There are three primary sharing mechanisms I can think of today: email, USB stick, and Dropbox (née FTP).

      Dropbox is supplanting FTP, which comes from another era but still fills an important niche for larger data sets and for higher-volume or anonymous traffic.

      Dropbox, email and USB are all easily accessible parts of the day-to-day consumer workflow; they are all trivial to set up without institutional support or, importantly, permission.

      An email account is already provisioned by default for everyone or, if the institutional email offerings are not sufficient, a person may easily set up a 3rd-party email account with no permission or hassle.

      Data management alternatives to these three options will have slow or no adoption until the barriers to access and use are as low as email; the cost of entry needs to be no more than "a web browser, an email address, and no special permission required".

    4. An effective data management program would enable a user 20 years or longer in the future to discover , access , understand, and use particular data [ 3 ]. This primer summarizes the elements of a data management program that would satisfy this 20-year rule and are necessary to prevent data entropy .

      Who cares most about the 20-year rule? This is an ideal that appeals to some, but in practice even the most zealous adherents can't picture what it looks like concretely, except in the most traditional ways: physical paper journals in libraries are tangible examples of the 20-year rule.

      Until we have a digital equivalent for data, I don't blame people looking for tenure or jobs for not caring about this ideal if we can't provide a clear picture of how to achieve it widely at an institutional level. For digital materials, I think the picture people have in their minds is of tape backup. Maybe this is generational? Perhaps only when new generations have not been widely exposed to cassette tapes, DVDs, and other physical media that "old people" remember will it be possible to have a new ideal that people can see in their mind's eye.

    5. A key component of data management is the comprehensive description of the data and contextual information that future researchers need to understand and use the data. This description is particularly important because the natural tendency is for the information content of a data set or database to undergo entropy over time (i.e. data entropy ), ultimately becoming meaningless to scientists and others [ 2 ].

      I agree with the key component mentioned here, but I feel the term data entropy is an unhelpful crutch.

    6. data entropy Normal degradation in information content associated with data and metadata over time (paraphrased from [ 2 ]).

      I'm not sure what this really means, and I don't think data entropy is a helpful term. Poor practices certainly lead to disorganized collections of data, but I think this notion comes from a time when people were very concerned about degradation of the physical media on which data is stored. That is, of course, still a concern. But the term lends itself as an excuse for people who don't use good data management practices, and it covers for the real problem: a kind of data illiteracy, much like the computational illiteracy we face widely in the sciences. Managing data really is hard, but let's not mask it with fanciful notions like data entropy.

    7. Although data management plans may differ in format and content, several basic elements are central to managing data effectively.

      What are the "several basic elements?"

    8. By documenting your data and recommending appropriate ways to cite your data, you can be sure to get credit for your data products and their use

      Citation is an incentive. An answer to the question "What's in it for me?"

    9. This primer describes a few fundamental data management practices that will enable you to develop a data management plan, as well as how to effectively create, organize, manage, describe, preserve and share data

      Data management practices:

      • create
      • organize
      • manage
      • describe
      • preserve
      • share
    10. The goal of data management is to produce self-describing data sets. If you give your data to a scientist or colleague who has not been involved with your project, will they be able to make sense of it? Will they be able to use it effectively and properly?
    1. One respondent noted that NSF doesn't have an enforcement policy. This is presumably true of other mandate sources as well, and brings up the related and perhaps more significant problem that mandates are not always (if they are ever) accompanied by the funding required to satisfy them. Another respondent wrote that funding agencies expect universities to contribute to long-term data storage.
    2. Data management activities, grouped. The data management activities mentioned by the survey can be grouped into five broader categories: "storage" (comprising backup or archival data storage, identifying appropriate data repositories, day-to-day data storage, and interacting with data repositories); "more information" (comprising obtaining more information about curation best practices and identifying appropriate data registries and search portals); "metadata" (comprising assigning permanent identifiers to data, creating and publishing descriptions of data, and capturing computational provenance); "funding" (identifying funding sources for curation support); and "planning" (creating data management plans at proposal time). When the survey results are thus categorized, the dominance of storage is clear, with over 80% of respondents requesting some type of storage-related help. (This number may also reflect a general equating of curation with storage on the part of respondents.) Slightly fewer than 50% of respondents requested help related to metadata, a result explored in more detail below.

      Categories of data management activities:

      • storage
        • backup/archival data storage
        • identifying appropriate data repositories
        • day-to-day data storage
        • interacting with data repositories
      • more information
        • obtaining more information about curation best practices
        • identifying appropriate data registries and search portals
      • metadata
        • assigning permanent identifiers to data
        • creating/publishing descriptions of data
        • capturing computational provenance
      • funding
        • identifying funding sources for curation support
      • planning
        • creating data management plans at proposal time
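The grouping above can be sketched as a lookup table plus a tally. This is only an illustration: the activity labels are taken from the quoted passage, but the function name `category_shares` and the sample responses are made up here and are not the survey's data or method.

```python
# Activity -> category mapping, following the grouping in the quoted passage.
CATEGORY_OF = {
    "backup or archival data storage": "storage",
    "identifying appropriate data repositories": "storage",
    "day-to-day data storage": "storage",
    "interacting with data repositories": "storage",
    "obtaining more information about curation best practices": "more information",
    "identifying appropriate data registries and search portals": "more information",
    "assigning permanent identifiers to data": "metadata",
    "creating and publishing descriptions of data": "metadata",
    "capturing computational provenance": "metadata",
    "identifying funding sources for curation support": "funding",
    "creating data management plans at proposal time": "planning",
}


def category_shares(responses):
    """Return, per category, the fraction of respondents who asked for help
    with at least one activity in that category."""
    categories = sorted(set(CATEGORY_OF.values()))
    if not responses:
        return {category: 0.0 for category in categories}
    shares = {}
    for category in categories:
        count = sum(
            1
            for activities in responses
            if any(CATEGORY_OF[activity] == category for activity in activities)
        )
        shares[category] = count / len(responses)
    return shares


# Hypothetical responses (one set of requested activities per respondent).
sample = [
    {"day-to-day data storage", "capturing computational provenance"},
    {"backup or archival data storage"},
    {"creating data management plans at proposal time"},
    {"interacting with data repositories", "assigning permanent identifiers to data"},
]
shares = category_shares(sample)
```

Counting a respondent once per category, however many activities in it they selected, is what makes a figure like "over 80% of respondents requesting some type of storage-related help" comparable across categories of different sizes.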
    3. Data management activities, grouped. The data management activities mentioned by the survey can be grouped into five broader categories: "storage" (comprising backup or archival data storage, identifying appropriate data repositories, day-to-day data storage, and interacting with data repositories); "more information" (comprising obtaining more information about curation best practices and identifying appropriate data registries and search portals); "metadata" (comprising assigning permanent identifiers to data, creating and publishing descriptions of data, and capturing computational provenance); "funding" (identifying funding sources for curation support); and "planning" (creating data management plans at proposal time). When the survey results are thus categorized, the dominance of storage is clear, with over 80% of respondents requesting some type of storage-related help. (This number may also reflect a general equating of curation with storage on the part of respondents.) Slightly fewer than 50% of respondents requested help related to metadata, a result explored in more detail below.

      Storage is a broad topic and is a very frequently mentioned topic in all of the University-run surveys.

      http://www.alexandria.ucsb.edu/~gjanee/dc@ucsb/survey/plots/q4.2.png

      Highlight by Chris during today's discussion.

    4. Distribution of departments with respect to responsibility spheres. Ignoring the "Myself" choice, consider clustering the parties potentially responsible for curation mentioned in the survey into three "responsibility spheres": "local" (comprising lab manager, lab research staff, and department); "campus" (comprising campus library and campus IT); and "external" (comprising external data repository, external research partner, funding agency, and the UC Curation Center). Departments can then be positioned on a tri-plot of these responsibility spheres, according to the average of their respondents' answers. For example, all responses from FeministStds (Feminist Studies) were in the campus sphere, and thus it is positioned directly at that vertex. If a vertex represents a 100% share of responsibility, then the dashed line opposite a vertex represents a reduction of that share to 20%. For example, only 20% of ECE's (Electrical and Computer Engineering's) responses were in the campus sphere, while the remaining 80% of responses were evenly split between the local and external spheres, and thus it is positioned at the 20% line opposite the campus sphere and midway between the local and external spheres. Such a plot reveals that departments exhibit different characteristics with respect to curatorial responsibility, and look to different types of curation solutions.

      This section contains an interesting diagram showing the distribution of departments with respect to responsibility spheres:

      http://www.alexandria.ucsb.edu/~gjanee/dc@ucsb/survey/plots/q2.5.png
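The positioning described in the quoted passage amounts to treating each department's shares as barycentric coordinates on a triangle. A minimal sketch follows, assuming an equilateral triangle with the vertex layout chosen below (the text does not specify the plot's orientation) and using response counts per sphere as a stand-in for "the average of their respondents' answers". The function name and the counts are hypothetical; only the resulting shares for the two departments come from the passage.

```python
import math

# Assumed vertex layout; the orientation of the original plot is not given.
VERTICES = {
    "local": (0.0, 0.0),
    "external": (1.0, 0.0),
    "campus": (0.5, math.sqrt(3) / 2),
}


def triplot_position(counts):
    """Map per-sphere response counts to an (x, y) point inside the triangle.

    Each sphere's share of the responses weights the corresponding vertex.
    """
    total = sum(counts.get(sphere, 0) for sphere in VERTICES)
    if total == 0:
        raise ValueError("no responses in any responsibility sphere")
    x = sum(counts.get(s, 0) / total * VERTICES[s][0] for s in VERTICES)
    y = sum(counts.get(s, 0) / total * VERTICES[s][1] for s in VERTICES)
    return x, y


# Hypothetical counts chosen to reproduce the shares quoted in the text:
# ECE at 20% campus, 40% local, 40% external; FeministStds at 100% campus.
ece = triplot_position({"campus": 2, "local": 4, "external": 4})
feminist_studies = triplot_position({"campus": 3})
```

With this layout the ECE point sits midway between the local and external vertices horizontally and at one fifth of the triangle's height, which matches the "20% line opposite the campus sphere" in the description, while Feminist Studies lands exactly on the campus vertex.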

    5. In the course of your research or teaching, do you produce digital data that merits curation? 225 of 292 (77%) of respondents answered "yes" to this first question, which corresponds to 25% of the estimated population of 900 faculty and researchers who received the survey.

      For those who do not feel they have data that merits curation, I would at least like to hear a description of the kinds of data they have and why they feel it does not need to be curated.

      Some people may already be using well-curated data sets. Others may feel their data is not useful to anyone outside their own research group, so there is no need to curate it for anyone else. Even so, under some definitions of "curation" there may be important unmet curation needs for internal use, visible only to the grad students or researchers who work with the data hands-on daily.

      UPDATE: My question is essentially answered here: https://hypothes.is/a/xBpqzIGTRaGCSmc_GaCsrw
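The two percentages in the quoted passage use different denominators, which is easy to miss. A quick check of the arithmetic, using only the three numbers given in the quote (the variable names are mine):

```python
yes_answers = 225   # respondents who answered "yes"
respondents = 292   # people who answered the question
population = 900    # estimated faculty and researchers who received the survey

share_of_respondents = yes_answers / respondents  # basis for the 77% figure
share_of_population = yes_answers / population    # basis for the 25% figure
response_rate = respondents / population          # implied, not stated in the quote

print(f"{share_of_respondents:.0%} of respondents, "
      f"{share_of_population:.0%} of the population, "
      f"{response_rate:.0%} response rate")
```

The implied response rate of roughly a third is worth keeping in mind when reading the 77%: nothing is known about the people who did not respond.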

    6. Responsibility, myself versus others. It may appear that responses to the question of responsibility are bifurcated between "Myself" and all other parties combined. However, respondents who identified themselves as being responsible were more likely than not to identify additional parties that share that responsibility. Thus, curatorial responsibility is seen as a collaborative effort. (The "Nobody" category is a slight misnomer here as it also includes non-responses to this question.)

      This answers my previous question about this survey item:

      https://hypothes.is/a/QrDAnmV8Tm-EkDuHuknS2A

    7. Awareness of data and commitment to its preservation are two key preconditions for successful data curation.

      Great observation!

    8. Which parties do you believe have primary responsibility for the curation of your data? Almost all respondents identified themselves as being personally responsible.

      For those that identify themselves as personally responsible would they identify themselves (or their group) as the only ones responsible for the data? Or is there a belief that the institution should also be responsible in some way in addition to themselves?

    9. Availability of the raw survey data is subject to the approval of the UCSB Human Subjects Committee.
    10. Survey design The survey was intended to capture as broad and complete a view of data production activities and curation concerns on campus as possible, at the expense of gaining more in-depth knowledge.

      Summary of the survey design

    11. Researchers may be underestimating the need for help using archival storage systems and dealing with attendant metadata issues.

      In my mind this is a key challenge. Even if people can describe what they need for themselves (in itself a very hard problem), it is not obvious what to do from the infrastructure standpoint to implement services that aid the individual researcher and also aid collaboration across individuals in the same domain, as well as across domains and institutions, in a long-term sustainable way.

      In essence... how do we translate needs that we don't yet fully understand into infrastructure with low barrier to adoption, use, and collaboration?

    12. Researchers view curation as a collaborative activity and collective responsibility.
    13. To summarize the survey's findings:
      • Curation of digital data is a concern for a significant proportion of UCSB faculty and researchers.
      • Curation of digital data is a concern for almost every department and unit on campus.
      • Researchers almost universally view themselves as personally responsible for the curation of their data.
      • Researchers view curation as a collaborative activity and collective responsibility.
      • Departments have different curation requirements, and therefore may require different amounts and types of campus support.
      • Researchers desire help with all data management activities related to curation, predominantly storage.
      • Researchers may be underestimating the need for help using archival storage systems and dealing with attendant metadata issues.
      • There are many sources of curation mandates, and researchers are increasingly under mandate to curate their data.
      • Researchers under curation mandate are more likely to collaborate with other parties in curating their data, including with their local labs and departments.
      • Researchers under curation mandate request more help with all curation-related activities; put another way, curation mandates are an effective means of raising curation awareness.
      • The survey reflects the concerns of a broad cross-section of campus.

      Summary of survey findings.

    14. In 2012 the Data Curation @ UCSB Project surveyed UCSB campus faculty and researchers on the subject of data curation, with the goals of 1) better understanding the scope of the digital curation problem and the curation services that are needed, and 2) characterizing the role that the UCSB Library might play in supporting curation of campus research outputs.

      1) better understanding the scope of the digital curation problem and the curation services that are needed

      2) characterizing the role that the UCSB Library might play in supporting curation of campus research outputs.