Future-Proofing and Actionable Insights
(1) and (4) are the emphasis of our paper so far, but check against the dataset.
tacit knowledge
Make the implicit explicit.
document for the future stranger
Yes - also part of the low-friction transition equation.
loose coupling and clear APIs
Kansa - should bring up the 'loose coupling' argument.
modularize functionality such that core research data and methods are decoupled from any one interface
Yes, this is part of the low-friction transitions / data portability. Mention functionality modularisation.
Designing with longevity in mind often means choosing simpler, well-supported technologies over cutting-edge but ephemeral ones.
Again, perhaps we can test the 'simplicity' argument using the dataset we're building.
Software Management Plans (SMPs)
Not a bad idea, haven't seen any in the wild
indicators of health (akin to how ecologists track species populations). For example, an indicator might be “active installations” of a software – if that number drops to zero, the software is effectively dead.
Perhaps we can find some metric here that could be expressed in / derived from the dataset to show (a) maximum use of the tool, (b) when the tool died.
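A minimal sketch of how the two indicators might be derived. The data schema (yearly usage counts per tool — citations, mentions, or installs) is an assumption about what our dataset will contain, not something from the literature:

```python
# Hypothetical sketch: derive (a) peak use and (b) year of "death" for a
# tool from yearly usage counts (citations, mentions, installs, etc.).
# The {year: count} schema is an assumed dataset structure.

def health_indicators(yearly_counts):
    """yearly_counts: dict mapping year -> usage count.
    Returns (peak_year, death_year); death_year is None if still alive."""
    if not yearly_counts:
        return None, None
    peak_year = max(yearly_counts, key=yearly_counts.get)
    # "Death" = first post-peak year from which counts stay at zero,
    # akin to "active installations" dropping to zero.
    death_year = None
    for year in sorted(yearly_counts):
        if year > peak_year and yearly_counts[year] == 0:
            if all(yearly_counts[y] == 0 for y in yearly_counts if y >= year):
                death_year = year
                break
    return peak_year, death_year

peak, died = health_indicators({2010: 2, 2011: 9, 2012: 5,
                                2013: 1, 2014: 0, 2015: 0})
print(peak, died)  # 2011 2014
```

Even at this order-of-magnitude level, the pair (peak use, death year) would let us plot tool lifespans across the dataset.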
increasing reuse would naturally improve sustainability (through network effects and shared maintenance).
This is key
Similarly, we have scant information on how often historical researchers reuse software developed by others. If reuse is low, perhaps because tools are hard to find or hard to learn, that could dampen incentives to sustain them. Some evidence suggests reuse is limited in humanities: Barats et al. (2020) hint that unlike sciences where shared tools (R, Python libraries) are common, humanities projects often start from scratch or custom-build
Reinventing the wheel is a problem; in our dataset we should look for evidence of use, at least at an order-of-magnitude scale.
lost research or reanalysis needed
Avoid at all costs.
Anyway - highlight this lack in the 'future directions' section of the paper, and mention it in the section arguing for low-friction transitions.
We lack understanding of how end-users of historical software (e.g. historians using a text mining tool) deal with software obsolescence and what they need for continuity. Do they find workarounds? Do they abandon methods when software breaks?
This sort of research would be useful, until then, focusing on reducing the friction of change from one tool to the next is the best we can do.
There is a need for lightweight assessment tools that account for the realities of these projects. For example, a “Sustainability Scorecard for Digital History Projects” could focus on a few key predictors (open source, multiple collaborators, archived in repository, etc.) and be easier to use than a full maturity model.
Perhaps we could suggest something here based on what our dataset of software looks like.
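A scorecard like this could be a trivially simple function over binary predictors. A sketch — the predictor list is illustrative, drawn from the examples in the excerpt plus our own emphases, not a validated instrument:

```python
# Hypothetical "Sustainability Scorecard for Digital History Projects":
# one point per predictor present. Predictors and equal weighting are
# illustrative assumptions, not from the literature.

PREDICTORS = [
    "open_source",
    "multiple_collaborators",
    "archived_in_repository",
    "standard_data_formats",
    "documentation_present",
]

def scorecard(project):
    """project: dict of predictor -> bool. Returns (score, max_score)."""
    score = sum(1 for p in PREDICTORS if project.get(p, False))
    return score, len(PREDICTORS)

demo = {"open_source": True, "archived_in_repository": True}
print(scorecard(demo))  # (2, 5)
```

The point of keeping it this crude is the contrast with a full maturity model: a handful of checkable yes/no predictors that a small project team could self-assess in minutes.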
data preservation studies
Perhaps use the 'fading away' article as a model.
empirical data on long-term outcomes
Yay, this is our paper!
providing migration tools or services to move users to a new product
Maybe some truncated version of this, where there are built-in archiving / export tools.
if a tool gains a sufficient user base (even if small but dedicated), it can leverage community contributions for maintenance. However, many humanities tools never reach that critical mass
Yes, actual use is key
paying for a support contract means someone is on the hook to fix issues
Yeah, this is what university IT departments want...
Backward compatibility is another hallmark of commercial practice: e.g., Adobe Photoshop today can still open files created 20+ years ago, thanks to consistent format support. In research software, backward compatibility often translates to data portability: making sure that data formats remain readable even if the software isn’t the same. The Endings Principles stress this by requiring data in standard formats (dh-tech.github.io) – if a project’s custom software dies, another tool can potentially read the data.
OK, this was going to be the main point of the paper; need to talk to Peter Sefton about how to go beyond the results of the Endings Project.
Require standard data formats and ensure data portability: if a project's custom software dies, another tool can potentially read the data.
Perhaps we can extend this by focusing on low-friction transitions to other tools, such as the use of bog-standard formats and data bundled with metadata (like RO-Crate).
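A minimal sketch of the kind of bundle this implies: data in a bog-standard format plus an RO-Crate-style metadata file. The file name and top-level entities follow the RO-Crate 1.1 convention; the dataset name and file are placeholders:

```python
import json

# Sketch: bundle exported data with RO-Crate-style metadata so another
# tool can interpret it after the original software dies. Structure
# follows RO-Crate 1.1; "Example export" / "records.csv" are placeholders.

crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {   # the metadata file describes the crate root
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # the crate root: the exported dataset as a whole
            "@id": "./",
            "@type": "Dataset",
            "name": "Example export",          # placeholder
            "hasPart": [{"@id": "records.csv"}],
        },
        {   # the actual data, in a bog-standard format
            "@id": "records.csv",
            "@type": "File",
            "encodingFormat": "text/csv",
        },
    ],
}

with open("ro-crate-metadata.json", "w") as f:
    json.dump(crate, f, indent=2)
```

Because the payload is plain CSV and the metadata is plain JSON-LD, nothing here depends on the originating tool surviving — which is exactly the low-friction transition argument.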
build throwaway prototypes, then rebuild for longevity
This seems like a good idea given practical constraints
digital humanities labs
Perhaps getting better...
small-scale artisanal development
See also 'crapple' license...generally poor RSE practice in DH
single-team developed tool
Small scale, poorly resourced, no standards...
QGIS in archaeology
Voyant, QGIS - large open-source tools the only ones that are really succeeding in archaeology / DH - Pareto distribution, I'm sure...
Best practices from software engineering
Good RSE practices more common in STEM
infrastructure and community level
In big data / big science fields, sustainability is approached at the infrastructure and community level.
project-centric
Small data / small science problems, scale of DH software development rarely exceeds the individual project.
limitations of empirical work
More longitudinal studies needed. Only strong hints and isolated metrics plus analogies to data in the literature at present. No comprehensive empirical model for software longevity.
Methodological Reflections
Methods include surveys, repository mining, case studies and post-mortems, studies of dependency networks. Unsurprisingly, software with more users and uses survives longer (strengthening argument for FAIR software, see above). Broad user base, as with Voyant, is crucial.
integrated archival practice
Static site, source code with DOI (Zenodo + GitHub), descriptive metadata. Again, however, most of the examples are from collections, not tools.
Repositories and Archival Sources
Another GitHub study: Duckles et al. (2020). Poorly documented. Allen et al. (2019) looks at Zenodo and shows an increase in its use after 2016. The Internet Archive can be used to retrospectively study project websites.
Qualitative Studies and Project Post-mortems
Project Bamboo taken as a case study in the unsustainability of DH.
Quantitative Studies on Software Lifespan
Relatively few: Vines et al. (2014) looks at data (half-life of 6-7 years). Nielsen et al. (2017) looked at biology papers and found that many could not be obtained a few years later. The Endings Project has some data. Katz and Niemeyer (2019) look at GitHub repos and found that most have a short, bursty commit history (see also Howison and Herbsleb). Few are active beyond five years or attract multiple external contributors. Overall, short lifespans.
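A Vines-style half-life could be computed from our dataset as the median tool lifespan (last active year minus release year). A sketch; the lifespans below are hypothetical:

```python
from statistics import median

# Sketch: "half-life" for a cohort of tools, taken as the median lifespan,
# i.e. the number of years by which half the cohort has become inactive.
# The example lifespans are hypothetical.

def half_life(lifespans_in_years):
    """lifespans_in_years: iterable of (last_active - released) values."""
    return median(lifespans_in_years)

print(half_life([2, 3, 5, 7, 9, 14]))  # 6.0
```

Combined with the death-year indicator above, this would let us state our own half-life figure alongside Vines et al.'s 6-7 years for data.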
Bamboo
Failed due to lack of governance, iterative development, and community buy-in - the last is probably the most common problem...
centralized institutional support
Importance of shared infrastructure
King’s Digital Lab (KDL)
KDL: tiered archiving framework; classifies each project based on importance and feasibility; some are kept running, others are archived.
Computational History project Stadt.Geschichte.Basel
Applied the Endings Principles: all research data in standard formats and website archived to static HTML.
ARIADNE
Need to get DR to summarise the ARIADNE project
Methodologies and Tools for Sustainability
Methodologies and tools:
* Endings Principles for Digital Longevity: data, documentation, processing, publication, and release management.
* 'Data' under the Endings Principles is the closest to what we are arguing, recommending that all project data should be stored in open, non-proprietary formats, so no closed or obsolescent formats (follow up: it isn't clear exactly what 'data' they are talking about - I assume it's data produced by the software, as we are discussing, but need to confirm).
* Producing a static website end product is not really relevant; shows how much of the literature is about collections rather than tools.
* Goddard (2023): dark archiving, web harvesting, emulation, continuous migration.
National and International Initiatives
National and international initiatives - lots of good advice here:
* Early planning for preservation
* Choosing appropriate OS licenses
* Engaging users to encourage co-development
* Better software practices (e.g., FAIR4RS)
* Software archiving
* Support for reproducibility (what does this mean?)
* Capacity and expertise building
* Larger infrastructure organisations (data centers, libraries, eresearch institutes, etc.) should be more active rather than individual project teams (IMPORTANT: comes from NL report, essentially trying to apply big data / big science solutions to a small-science domain).
* NL report explains how FAIR makes software more sustainable: easier to find and reuse, less likely to be lost or reinvented - I would add: more likely to build a larger user base and get more people involved to provide or find resourcing.
Existing Frameworks and Methodologies
Potential additional research queries: (1) national (and other) reports on software sustainability like the one from NL; (2) relationship of FAIR for RS and sustainability; (3) projects on deliberate end-of-life planning, like the Endings Project; (4) frameworks for assessing sustainability, like RSMM; (5) methods and tools for sustainability, like the Endings Principles.
Netherlands’ national report on research software sustainability
Note the existence of national reports on research software sustainability. Are there others?
adopted by a wider community,
'Intended for adoption by a wider community' might be part of our definition of a software tool, to eliminate one-offs by researchers. The criteria we are using to select software (e.g., a publication about it or citations of it) seem to select for this.
Endings Project
Endings project is crucial, as it's the only one I've found so far that actually promotes end-of-life planning.
Research Software Sustainability Maturity Model (RSMM)
RSMM is important, but too new to assess whether it's had any impact.
FAIR Principles
FAIR has, I think, limited applicability to longevity / sustainability. It helps make software more discoverable and reusable, which in turn might increase uptake and make it more likely that the original creators or someone else will do the work to maintain the software. Otherwise, the key here is improved interoperability, which allows the software to be substituted with something else with as little friction as possible.
shares common fundamental issues with other domains (e.g. technology obsolescence, need for maintenance effort) but often without the same level of structural support; addressing these issues in a humanities context requires tailoring strategies to smaller projects and advocacy for institutional change.
Small data / small science problem again
abandonment
Keyword: abandonment
scientific software may be sustained as part of ongoing experimental operations or through agencies like NSF and DOE that mandate data/software management plans, whereas humanities software is frequently tied to one-time grants with less stringent post-project requirements
Cross-domain differences in funding models
scale of projects often differs as well: fields like astronomy or genomics may create large, collaboratively developed software (with dozens of contributors and multi-year roadmaps), whereas a digital history project might be a small team effort
Again, big vs. small; particularly challenging for small disciplines since the work necessary for maintenance is concentrated, there are fewer standards / less standardisation, less shared infrastructure (more bespoke work / reinventing the wheel), and there is less funding.
making software and data FAIR (Findable, Accessible, Interoperable, Reusable) has become a “shared ambition” backed by concrete action (digitalhumanities.org). By contrast, in humanities, such principles until recently remained more of a theoretical discussion than common practice
'big science' vs 'small science' / big data vs small data
changes in culture
culture change necessary around organisational and social context of research tools
knowledge custodian
Brittle nature of the staffing around digital tools
Once a grant ends, there may be no dedicated budget to update the software or migrate it to newer platforms. This cyclic funding model – “forever or five years” as one commentary wryly put it (evoking Rothenberg’s famous quote that digital content lasts forever or five years)
Grant funding is brief, but upkeep costs continue
persistence is a function of organizations, not a function of technology”
Keyword: persistence
scholars hope their libraries will “adopt [the] project wholesale” and keep all components running indefinitely, which is typically not feasible
tools need continual upkeep
Institutional challenges
Institutional / socio-technical challenges. I'd probably separate 'technological challenges' and 'socio-technical challenges' in the lit review.
inevitable obsolescence of software dependencies and environments
This is key: 'inevitable obsolescence of software dependencies and environments'
historical research outputs (digitized archives, databases, analytical tools) often need to endure far longer than the rapidly shifting technologies that support them
I wouldn't quite word it this way, but the idea that research software tools need to endure longer than the technologies they are built from is an important one, and summarises why tools need continual investment and resourcing.
tension between long-term preservation needs and the short life cycles of software, data formats, and platforms
Keywords: preservation, life-cycle
sustainability and long-term longevity
Keywords for conventional lit search: sustainability, longevity