61 Matching Annotations
  1. Feb 2025
    1. modularize functionality such that core research data and methods are decoupled from any one interface

      Yes, this is part of the low-friction transitions / data portability argument. Mention functionality modularisation.

    2. Designing with longevity in mind often means choosing simpler, well-supported technologies over cutting-edge but ephemeral ones.

      Again, perhaps we can test the 'simplicity' argument using the dataset we're building.

    3. indicators of health (akin to how ecologists track species populations). For example, an indicator might be “active installations” of a software – if that number drops to zero, the software is effectively dead.

      Perhaps we can find some metric here that could be expressed in / derived from the dataset to show (a) maximum use of the tool, (b) when the tool died.
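
      A possible sketch of such a metric. The `(tool, year, activity)` schema and the `dead_after` quiet window are my assumptions, not the real shape of our dataset:

      ```python
      # Hypothetical: derive (a) peak use and (b) an estimated "death" year
      # for each tool from yearly activity counts (citations, commits,
      # mentions - whatever signal the dataset ends up recording).
      from collections import defaultdict

      def tool_lifecycle(records, dead_after=3):
          """records: iterable of (tool, year, activity) tuples.
          A tool is treated as dead `dead_after` years after its
          last year with any recorded activity."""
          by_tool = defaultdict(dict)
          for tool, year, activity in records:
              by_tool[tool][year] = by_tool[tool].get(year, 0) + activity

          summary = {}
          for tool, years in by_tool.items():
              peak_year = max(years, key=years.get)
              last_active = max(y for y, a in years.items() if a > 0)
              summary[tool] = {
                  "peak_year": peak_year,
                  "peak_activity": years[peak_year],
                  # crude death estimate: last active year + quiet window
                  "estimated_death": last_active + dead_after,
              }
          return summary
      ```

      The point is only that both numbers should be derivable at order-of-magnitude scale from whatever activity signal we collect.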

    4. Similarly, we have scant information on how often historical researchers reuse software developed by others. If reuse is low, perhaps because tools are hard to find or hard to learn, that could dampen incentives to sustain them. Some evidence suggests reuse is limited in humanities: Barats et al. (2020) hint that unlike sciences where shared tools (R, Python libraries) are common, humanities projects often start from scratch or custom-build

      Reinventing the wheel is a problem; in our dataset we should look for evidence of use, at least at an order-of-magnitude scale.

    5. lost research or reanalysis needed

      Avoid at all costs.

      Anyway - highlight this lack in the 'future directions' section of the paper, and mention it in the section arguing for low-friction transitions.

    6. We lack understanding of how end-users of historical software (e.g. historians using a text mining tool) deal with software obsolescence and what they need for continuity. Do they find workarounds? Do they abandon methods when software breaks?

      This sort of research would be useful; until then, focusing on reducing the friction of change from one tool to the next is the best we can do.

    7. There is a need for lightweight assessment tools that account for the realities of these projects. For example, a “Sustainability Scorecard for Digital History Projects” could focus on a few key predictors (open source, multiple collaborators, archived in repository, etc.) and be easier to use than a full maturity model.

      Perhaps we could suggest something here based on what our dataset of software looks like.
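
      A minimal sketch of what such a scorecard could look like once calibrated against our dataset. The predictors, weights, and thresholds below are illustrative assumptions, not validated values:

      ```python
      # Hypothetical "Sustainability Scorecard" along the lines the quoted
      # passage imagines: a few key predictors, weighted, with a coarse rating.
      PREDICTORS = {
          "open_source": 2,             # open licence, source publicly available
          "multiple_collaborators": 2,
          "archived_in_repository": 2,  # e.g. Zenodo / Software Heritage
          "standard_data_formats": 1,
          "documentation": 1,
      }

      def sustainability_score(project):
          """project: dict mapping predictor name -> bool."""
          return sum(w for name, w in PREDICTORS.items() if project.get(name))

      def rating(project):
          score = sustainability_score(project)
          maximum = sum(PREDICTORS.values())
          if score >= 0.75 * maximum:
              return "likely sustainable"
          if score >= 0.5 * maximum:
              return "at risk"
          return "endangered"
      ```

      The appeal is that every predictor is a yes/no question a project team can answer in minutes, unlike a full maturity model.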

    8. providing migration tools or services to move users to a new product

      Maybe some truncated version of this, where there are built-in archiving / export tools.

    9. if a tool gains a sufficient user base (even if small but dedicated), it can leverage community contributions for maintenance. However, many humanities tools never reach that critical mass

      Yes, actual use is key

    10. Backward compatibility is another hallmark of commercial practice: e.g., Adobe Photoshop today can still open files created 20+ years ago, thanks to consistent format support. In research software, backward compatibility often translates to data portability: making sure that data formats remain readable even if the software isn’t the same. The Endings Principles stress this by requiring data in standard formats​dh-tech.github.io – if a project’s custom software dies, another tool can potentially read the data.

      OK, this was going to be the main point of the paper; need to talk to Peter Sefton about how to go beyond the results of the Endings Project.

      Require standard data formats and ensure data portability: if a project's custom software dies, another tool can potentially read the data.

      Perhaps we can extend this by focusing on low-friction transitions to other tools, such as the use of bog-standard formats and data bundled with metadata (like RO-Crate).
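
      For illustration, a minimal sketch of the "data bundled with metadata" idea: an RO-Crate is just the data files plus an `ro-crate-metadata.json` description sitting alongside them. The file names and descriptions here are hypothetical:

      ```python
      # Build a minimal RO-Crate 1.1 metadata document: a metadata-file
      # descriptor, a root dataset, and one plain-CSV data file.
      import json

      crate = {
          "@context": "https://w3id.org/ro/crate/1.1/context",
          "@graph": [
              {
                  "@id": "ro-crate-metadata.json",
                  "@type": "CreativeWork",
                  "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                  "about": {"@id": "./"},
              },
              {
                  "@id": "./",
                  "@type": "Dataset",
                  "name": "Example historical research dataset",
                  "hasPart": [{"@id": "records.csv"}],
              },
              {
                  "@id": "records.csv",
                  "@type": "File",
                  "name": "Transcribed records (plain CSV)",
                  "encodingFormat": "text/csv",
              },
          ],
      }

      print(json.dumps(crate, indent=2))
      ```

      Because the payload is a bog-standard format (CSV) and the metadata is plain JSON-LD, any future tool can read the bundle even after the original software dies.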

    11. QGIS in archaeology

      Voyant, QGIS: large open-source tools are the only ones really succeeding in archaeology / DH. Pareto distribution, I'm sure...

    12. infrastructure and community level

      In big data / big science fields, sustainability is approached at the infrastructure and community level.

    13. limitations of empirical work

      More longitudinal studies needed. At present the literature offers only strong hints, isolated metrics, and analogies to data; there is no comprehensive empirical model for software longevity.

    14. Methodological Reflections

      Methods include surveys, repository mining, case studies and post-mortems, and studies of dependency networks. Unsurprisingly, software with more users and uses survives longer (strengthening the argument for FAIR software, see above). A broad user base, as with Voyant, is crucial.

    15. integrated archival practice

      Static site, source code with DOI (Zenodo + GitHub), descriptive metadata. Again, however, most of the examples are from collections, not tools.

    16. Repositories and Archival Sources

      Another GitHub study: Duckles et al. (2020) - poorly documented. Allen et al. (2019) looks at Zenodo and shows an increase in its use after 2016. The Internet Archive can be used to retrospectively study project websites.

    17. Quantitative Studies on Software Lifespan

      Relatively few: Vines et al. (2014) looks at data (half-life of 6-7 years). Nielsen et al. (2017) looked at biology papers and found that many could not be obtained a few years later. The Endings Project has some data. Katz and Niemeyer (2019) look at GitHub repos and found that most have a short, bursty commit history (see also Howison and Herbsleb). Few are active beyond five years or attract multiple external contributors. Overall, lifespans are short.

    18. King’s Digital Lab (KDL)

      KDL's tiered archiving framework classifies each project based on importance and feasibility; some are kept running, others are archived.

    19. Computational History project Stadt.Geschichte.Basel

      Applied the Endings Principles: all research data in standard formats and the website archived to static HTML.

    20. Methodologies and Tools for Sustainability

      Methodologies and tools:

      * Endings Principles for Digital Longevity: data, documentation, processing, publication, and release management.
      * 'Data' under the Endings Principles is the closest to what we are arguing, recommending that all project data should be stored in open, non-proprietary formats - so no closed or obsolescent formats. (Follow up: it isn't clear exactly what 'data' they are talking about - I assume it's data produced by the software, as we are discussing, but need to confirm.)
      * Producing a static website end product is not really relevant; it shows how much of the literature is about collections rather than tools.
      * Goddard (2023): dark archiving, web harvesting, emulation, continuous migration.

    21. National and International Initiatives

      National and international initiatives - lots of good advice here:

      * Early planning for preservation
      * Choosing appropriate OS licenses
      * Engaging users to encourage co-development
      * Better software practices (e.g., FAIR4RS)
      * Software archiving
      * Support for reproducibility (what does this mean?)
      * Capacity and expertise building
      * Larger infrastructure organisations (data centres, libraries, eResearch institutes, etc.) should be more active, rather than individual project teams. (IMPORTANT: comes from the NL report, essentially trying to apply big data / big science solutions to a small-science domain.)
      * The NL report explains how FAIR makes software more sustainable: easier to find and reuse, less likely to be lost or reinvented. I would add: more likely to build a larger user base and get more people involved to provide or find resourcing.

    22. Existing Frameworks and Methodologies

      Potential additional research queries: (1) national (and other) reports on software sustainability, like the one from NL; (2) the relationship of FAIR for RS and sustainability; (3) projects on deliberate end-of-life planning, like the Endings Project; (4) frameworks for assessing sustainability, like RSMM; (5) methods and tools for sustainability, like the Endings Principles.

    23. Netherlands’ national report on research software sustainability

      Note the existence of national reports on research software sustainability. Are there others?

    24. adopted by a wider community,

      'Intended for adoption by a wider community' might be part of our definition of a software tool, to eliminate one-offs by researchers. The criteria we are using to select software (e.g., a publication about it or citing it) seem to select for this.

    25. FAIR Principles

      FAIR has, I think, limited applicability to longevity / sustainability. It helps make software more discoverable and reusable, which in turn might increase uptake and make it more likely that the original creators or someone else will do the work to maintain the software. Otherwise, the key here is improved interoperability, which allows the software to be substituted with something else with as little friction as possible.

    26. shares common fundamental issues with other domains (e.g. technology obsolescence, need for maintenance effort) but often without the same level of structural support; addressing these issues in a humanities context requires tailoring strategies to smaller projects and advocacy for institutional change.

      Small data / small science problem again

    27. scientific software may be sustained as part of ongoing experimental operations or through agencies like NSF and DOE that mandate data/software management plans, whereas humanities software is frequently tied to one-time grants with less stringent post-project requirements

      Cross-domain differences in funding models

    28. scale of projects often differs as well: fields like astronomy or genomics may create large, collaboratively developed software (with dozens of contributors and multi-year roadmaps), whereas a digital history project might be a small team effort

      Again, big vs. small; this is particularly challenging for small disciplines, since the work necessary for maintenance is concentrated, there are fewer standards / less standardisation, there is less shared infrastructure (more bespoke work / reinventing the wheel), and there is less funding.

    29. making software and data FAIR (Findable, Accessible, Interoperable, Reusable) has become a “shared ambition” backed by concrete action​digitalhumanities.org. By contrast, in humanities, such principles until recently remained more of a theoretical discussion than common practice

      'big science' vs 'small science' / big data vs small data

    30. Once a grant ends, there may be no dedicated budget to update the software or migrate it to newer platforms. This cyclic funding model – “forever or five years” as one commentary wryly put it (evoking Rothenberg’s famous quote that digital content lasts forever or five years)

      Grant funding is brief, but upkeep costs continue

    31. scholars hope their libraries will “adopt [the] project wholesale” and keep all components running indefinitely, which is typically not feasible

      tools need continual upkeep

    32. Institutional challenges

      Institutional / socio-technical challenges. I'd probably separate 'technological challenges' and 'socio-technical challenges' in the lit review.

    33. inevitable obsolescence of software dependencies and environments

      This is key: 'inevitable obsolescence of software dependencies and environments'

    34. historical research outputs (digitized archives, databases, analytical tools) often need to endure far longer than the rapidly shifting technologies that support them

      I wouldn't quite word it this way, but the idea that research software tools need to endure longer than the technologies they are built from is an important one, and it summarises why tools need continual investment and resourcing.

    35. tension between long-term preservation needs and the short life cycles of software, data formats, and platforms

      Keywords: preservation, life-cycle