- Sep 2023
-
popjournal.ca popjournal.ca
-
Machine-Readable
{Machine-Readable}
-
Globally Unique Names
{Globally Unique}
-
-
datascience.codata.org datascience.codata.org
-
Systems such as DOI can thus support resolution mechanisms that are likely to be able to maintain the resolution of identifiers regardless of changes in technology or to one particular system.
{Protocol Independence}
-
-
zenodo.org zenodo.org
-
Brown, Josh, Jones, Phill, Meadows, Alice, & Murphy, Fiona. (2022). Incentives to invest in identifiers: A cost-benefit analysis of persistent identifiers in Australian research systems. Zenodo. https://doi.org/10.5281/zenodo.7100578
P1: Benefits of PIDs
-
-
datacite.org datacite.org
-
PIDs for research data; PIDs for instruments; PIDs for academic events; PIDs for cultural objects and their contexts; PIDs for organizations and projects; PIDs for researchers and contributors; PIDs for physical objects; PIDs for open-access publishing services and current research information systems (CRIS); PIDs for software; PIDs for text publications
PID Use Case Elements, entities
-
-
www.researchgate.net www.researchgate.net
-
Although the DOIs assigned to relatively large aggregations of datasets are well suited for citation and acknowledgment purposes, they are not issued at fine enough granularity to meet the scientific imperative that published results should be traceable and verifiable
Reproducibility
-
Persistent identifiers for acknowledgment and citation
PID Use Cases
-
One key element is to generate a dataset-centric rather than system-centric focus, with an aim to making the infrastructure less prone to systemic failure.
PID Motivations
-
scientific reproducibility and accountability
PID Motivations
-
-
www.surf.nl www.surf.nl
-
To reuse and/or reproduce research it is desirable that research output be available with sufficient context and details for both humans and machines to be able to interpret the data as described in the FAIR principles
Reusability and reproducibility of research output
-
Registration of research output is necessary to report to funders like NWO, ZonMW, SIA, etc. for monitoring and evaluation of research (e.g. according to SEP or BKO protocols). Persistent identifiers can be applied to ease the administrative burden. This results in better reporting, better information management and in the end better research information.
Registering and reporting research
-
1. Registration and reporting research
2. Reusability and reproducibility of research
3. Evaluation and recognition of research
4. Grant application
5. Researcher profiling
6. Journal rankings
PID Use Cases
-
-
www.rd-alliance.org www.rd-alliance.org
-
STAGE Description PIDs Benefits
PID Use Cases
-
-
www.rd-alliance.org www.rd-alliance.org
-
Deduplication of researchers; Linkage with awards; Authoritative attribution of affiliation and works | ORCID iD | Recommended
Identification of datasets, software and other types of research outputs | DataCite DOI | Recommended
Identification of organisations | GRID/ROR | Recommended
Identification of organisations in NZRIS | NZBN | Required for data providers
PID Use Cases
-
-
www.rd-alliance.org www.rd-alliance.org
-
Function PID type (Examples:) Recommended or required?
PID Use Cases
-
The progress and impact of the project will be measured and monitored through the collection of quantitative indicators. The different systems of the project partners as well as ORCID Inc. and ROR will be queried. If possible, indicators for all 10 PID use cases should be measured. These include for example the following indicators:
● Number of registered DataCite DOIs by scientific institutions in Germany.
● Number of registered DataCite-DOIs that have a link to further resources via a related-IDentifier relationship.
● Number of ROR implementations at scientific institutions in Germany.
● Number of GND records that have an ORCID iD or a ROR ID.
● Etc.
PID Use Cases
-
-
www.rd-alliance.org www.rd-alliance.org
-
Key features
● KISTI’s mission is to curate, collect, consolidate, and provide scientific information to Korean researchers and institutions. It includes but is not limited to:
■ Curating Korean R&D outputs. Curate them to a higher state of identification for better curation, tracking research impact, analysing research outcomes.
■ DOI RA management. Issuing DOIs to Korean research outputs, intellectual properties, research data.
■ Support Korean societies to stimulate better visibilities of their journal articles around the world.
■ Collaborate for better curation (identification and interlinking) with domestic and global scientific information management institutions, publishers and identifier managing agencies
PID Use Cases
-
-
www.rd-alliance.org www.rd-alliance.org
-
Name of infrastructure | Key purpose | List of integrated PIDs
Fairdata.fi | Research data publication, metadata hub and preservation service | DOI, URN, ORCID (upda
Research.fi | National research data hub. | Current draft: ADSbibcode - Astrophysics Data System - Bibliographic Reference Code (en); ARK - Archival Resource Key (en); arXiv - arXiv identifier scheme (en); BusinessID - Y-tunnus (fi) (en); Crossref_funders - Crossref Funder Registry (en); DOI - Digital Object Identifier (en)
Case Study: FINLAND
PID Use Cases
-
-
www.rd-alliance.org www.rd-alliance.org
-
Name of infrastructure | Key purpose | List of integrated PIDs
e-infra | This large infrastructure will build the National Repository Platform in the upcoming years. That should greatly facilitate adoption of PIDs. | TBD
National CRIS - IS VaVaI (R&D Information System) | National research information system. We plan on working with Research, Development and Innovation Council (in charge of IS VaVaI) on integrating global PIDs into their submission processes as required. Nowadays it uses mostly local identifiers. | TBD
Institutional CRIS systems | Various institutional CRIS systems at Czech RPOs. OBD (Personal Bibliographic Database) application is an outstanding case of an institutional CRIS system in the Czech Republic developed locally by a Czech company DERS. An ORCID integration for OBD is currently in development. | TBD, OBD ORCID in process
Institutional or subject repositories | There are several repositories in the Czech republic collecting different objects, some are already using PIDs but there is still enough room to improve and really integrate those PIDs, not only allow their evidence. | Handle, DOI, maybe other
Major research funders | Grant application processes | TBD
Local publishers | Content submission processes | TBD
PID Use Cases
-
TARGET INSTITUTIONS:
● Public research performing organisations (RPOs): Higher Education Institutions and Research organizations
● Research funding organizations (RFOs): Ministry of Education, Youth and Sports, Czech Science Foundation, Technology Agency of the Czech Republic etc.
● Policymakers: Ministry of Education, Youth and Sports; Research, Development and Innovation Council (R&D&I Council)
● Libraries: National library, National Library of Technology, academic libraries
● Publishers based in Czechia
● Service providers, research infrastructures
TARGET GROUPS:
● Researchers
● Librarians
● Open Science/Open Access managers/coordinators
● CRIS system managers
● Repository managers
● Other research support positions, e.g. data stewards, data curators
PID Stakeholders and Target Groups
-
Function PID type Recommended or required?
PID Use Cases
-
-
www.rd-alliance.org www.rd-alliance.org
-
Research Vocabularies Australia
PID - Vocabularies for Concepts and Things
-
-
www.rd-alliance.org www.rd-alliance.org
-
Function PID type Recommended or required?
PID Use Cases in the Netherlands
-
1. Registration and reporting research
2. Reusability and reproducibility of research
3. Evaluation and recognition of research
4. Grant application
5. Researcher profiling
6. Journal rankings
PID Use Cases in the Netherlands
-
-
www.rd-alliance.org www.rd-alliance.org
-
PIDs comparison table
Case study | Function | PID type
Finland | Researchers, persons | ORCID; ISNI
Finland | Organisations | VAT-number (not resolvable yet); RoR; ISNI
Pathways to National PID Strategies: Guide and Checklist to facilitate uptake and alignment
PID usage by country
-
- Aug 2023
-
www.rd-alliance.org www.rd-alliance.org
-
Generalist repositories
-
-
-
10 Things for Curating Reproducible and FAIR Research
Test for multiple annotations
-
-
zenodo.org zenodo.org
-
research data life cycle
Annotated with RDA Tags: Working groups
-
- Jul 2023
-
wiki.surfnet.nl wiki.surfnet.nl
-
Supported Schemes
[Schemes] [Feature Support]
-
-
gdpr-info.eu gdpr-info.eu
-
processing relates to personal data which are manifestly made public by the data subject;
[GDPR and Metadata]
-
- May 2023
-
-
Deep Learning (DL): A Technique for Implementing Machine Learning
Subfield of ML that uses specialized techniques involving multi-layer (2+) artificial neural networks
Layering allows cascaded learning and abstraction levels (e.g. line -> shape -> object -> scene)
Computationally intensive, enabled by clouds, GPUs, and specialized HW such as FPGAs, TPUs, etc.
[29] AI - Deep Learning
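To make the "multi-layer" idea in the quote above concrete, here is a minimal sketch of a forward pass through stacked layers, where each layer consumes the previous layer's output. It covers inference only, not training. The function names and all weights are hypothetical, chosen purely for illustration.

```python
def relu(values):
    """Rectified linear unit: negative activations become zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per output unit plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed the inputs through each (weights, biases) layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = relu(dense(activations, weights, biases))
    return activations


# Hypothetical two-layer network: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [-1.0]),
]
result = forward([2.0, 1.0], layers)
```

Each pass through the loop corresponds to one level of the cascade the quote describes. Real systems learn the weights from data and run these sums on the specialized hardware mentioned above.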
-
-
en.wikiquote.org en.wikiquote.org
-
The object of the present volume is to point out the effects and the advantages which arise from the use of tools and machines ;—to endeavour to classify their modes of action ;—and to trace both the causes and the consequences of applying machinery to supersede the skill and power of the human arm.
[28] AI - precedents...
-
-
openai.com openai.com GPT-4
-
Safety & alignment
[25] AI - Alignment
-
-
ourworldindata.org ourworldindata.org Books
-
A book is defined as a published title with more than 49 pages.
[24] AI - Bias in Training Materials
-
-
www.notepage.net www.notepage.net
-
Epidemiologist Michael Abramson, who led the research, found that the participants who texted more often tended to work faster but score lower on the tests.
[21] AI - Skills Erosion
-
-
www.technologyreview.com www.technologyreview.com
-
An AI model taught to view racist language as normal is obviously bad. The researchers, though, point out a couple of more subtle problems. One is that shifts in language play an important role in social change; the MeToo and Black Lives Matter movements, for example, have tried to establish a new anti-sexist and anti-racist vocabulary. An AI model trained on vast swaths of the internet won’t be attuned to the nuances of this vocabulary and won’t produce or interpret language in line with these new cultural norms. It will also fail to capture the language and the norms of countries and peoples that have less access to the internet and thus a smaller linguistic footprint online. The result is that AI-generated language will be homogenized, reflecting the practices of the richest countries and communities.
[21] AI Nuances
-
-
serokell.io serokell.io
-
According to him, there are several goals connected to AI alignment that need to be addressed:
[20] AI - Alignment Goals
-
-
cointelegraph.com cointelegraph.com
-
The AI developers came under intense scrutiny in Europe recently, with Italy being the first Western nation to temporarily ban ChatGPT
[19] AI - Legal Response
-
-
www.visualcapitalist.com www.visualcapitalist.com
-
The following table lists the results that we visualized in the graphic.
[18] AI - Increased sophistication
-
-
artificialintelligenceact.eu artificialintelligenceact.eu The Act
-
A summary presentation on the Act by the European Commission can be downloaded here.
[3] AI - Risk Classification
-
-
www.visualcapitalist.com www.visualcapitalist.com
-
Images: Generative AI can create new images based on existing ones, such as creating a new portrait based on a person’s face or a new landscape based on existing scenery
[17] AI Features - Image Synthesis
-
-
blog.google blog.google
-
To evaluate the information for yourself, you can also expand your view to see how the response is corroborated, and click to go deeper.
[14] AI Features - Provenance
-
You’ll see an AI-powered snapshot of key information to consider, with links to dig deeper.
[14] AI - Summary
-
-
www.visualcapitalist.com www.visualcapitalist.com
-
Netherlands | 33% | $57,768
[15] AI - Attitudes
-
-
www.visualcapitalist.com www.visualcapitalist.com
-
17 | openai.com | 1.8 | Technology - Other
[15] AI - Growth of Users
-
-
openai.com openai.com
-
Actors: Language models could drive down the cost of running influence operations, placing them within reach of new actors and actor types. Likewise, propagandists-for-hire that automate production of text may gain new competitive advantages.
Behavior: Influence operations with language models will become easier to scale, and tactics that are currently expensive (e.g., generating personalized content) may become cheaper. Language models may also enable new tactics to emerge—like real-time content generation in chatbots.
Content: Text creation tools powered by language models may generate more impactful or persuasive messaging compared to propagandists, especially those who lack requisite linguistic or cultural knowledge of their target. They may also make influence operations less discoverable, since they repeatedly create new content without needing to resort to copy-pasting and other noticeable time-saving behaviors.
[10] AI - Influencing Concerns
-
-
blogs.microsoft.com blogs.microsoft.com
-
The new Bing also cites all its sources, so you’re able to see links to the web content it references.
[13] AI Features - Provenance
-
empowers you to refine your search until you get the complete answer you are looking for by asking for more details, clarity and ideas – with links available so you can immediately act on your decisions.
[13] AI Features - Refinement
-
reviews results from across the web to find and summarize the answer you’re looking for.
[13] AI Features - Summaries
-
-
datascience.codata.org datascience.codata.org
-
Editable metadata – identifiers’ metadata must be able to be edited in order to allow their owners to update details of the thing they are referring to, such as its location, as they will inevitably change.
{Dynamic}
-
Ownership – identifiers created must be able to have their management restricted to particular agent;
{Single Agent}
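The editable-metadata and ownership requirements quoted above can be illustrated with a small in-memory registry. The identifier string stays fixed while its metadata (here, a location) can be updated, and only the owning agent may make changes. This is a sketch under stated assumptions: `PidRecord`, `PidRegistry`, and the example identifiers and URLs are all hypothetical and do not reflect any particular PID system's API.

```python
from dataclasses import dataclass, field


@dataclass
class PidRecord:
    identifier: str   # fixed once minted
    owner: str        # the agent allowed to manage this record
    metadata: dict = field(default_factory=dict)


class PidRegistry:
    """Hypothetical registry: unique identifiers, owner-restricted edits."""

    def __init__(self) -> None:
        self._records: dict[str, PidRecord] = {}

    def mint(self, identifier: str, owner: str, location: str) -> PidRecord:
        # Uniqueness within this registry's scope: refuse a second allocation.
        if identifier in self._records:
            raise ValueError(f"{identifier} is already allocated")
        record = PidRecord(identifier, owner, {"location": location})
        self._records[identifier] = record
        return record

    def update(self, identifier: str, agent: str, **changes: str) -> None:
        record = self._records[identifier]
        # Ownership: only the managing agent may edit the metadata.
        if agent != record.owner:
            raise PermissionError(f"{agent} does not manage {identifier}")
        record.metadata.update(changes)

    def resolve(self, identifier: str) -> str:
        return self._records[identifier].metadata["location"]


registry = PidRegistry()
registry.mint("example:001", "library-a", "https://old.example.org/item")
registry.update("example:001", "library-a", location="https://new.example.org/item")
```

After the update, `registry.resolve("example:001")` returns the new location even though the identifier itself never changed, which is the point of keeping metadata editable.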
-
articulates requirements for readability stating that identifiers must be: Any printable characters from the Universal Character Set of ISO/IEC 10646 (ISO 2012): UTF-8 encoding is required; Case insensitive: Only ASCII case folding is allowed.
{UTF-8} {ASCII Case Folding}
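As a rough illustration of the readability requirement quoted above, the sketch below compares identifiers case-insensitively while folding only the ASCII letters A-Z, and encodes identifiers as UTF-8. The function names are hypothetical and are not taken from the cited source.

```python
def ascii_casefold(identifier: str) -> str:
    """Lower-case only ASCII letters A-Z; every other character is left as-is."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in identifier
    )


def same_identifier(a: str, b: str) -> bool:
    """Case-insensitive comparison under ASCII-only folding."""
    return ascii_casefold(a) == ascii_casefold(b)


def to_wire(identifier: str) -> bytes:
    """Encode an identifier as UTF-8 bytes."""
    return identifier.encode("utf-8")
```

Under this rule `same_identifier("10.1000/ABC", "10.1000/abc")` is true, while identifiers differing only in the case of a non-ASCII letter (for example "Ä" versus "ä") remain distinct. Python's built-in `str.lower()` would not be a substitute here, since it also folds non-ASCII letters.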
-
The reason for broadening them is that identifier resolution systems may be forced to change protocols over time and what is acceptable for one protocol may not be for another.
{Protocol Independence}
-
Creating identifiers that are independent of any particular technology or organisation and are able to be unambiguously understood is a well-known requirement for PID systems.
{Independence}
-
Uniqueness – within some scope, not necessarily globally, to avoid clashes;
{Unique}
-
-
www.doi.org www.doi.org
-
In addition, data model policy requires that RAs maintain a record of the date of allocation of a DOI name, and the identity of the registrant on whose behalf the DOI name was allocated.
{Record-keeping}
-
The policy provides a simple test of an RA’s competence: the ability to make a DOI Kernel Declaration, which requires that the RA has an internal system which can support the unambiguous allocation of a DOI name, and is fundamentally sound enough to support interoperability within the network.
{Competence} {Unambiguous Allocation}
-
The second aim of DOI data model policy is “to ensure minimum standards of quality of administration of DOI names by Registration Agencies, and facilitate the administration of the DOI system as a whole”.
{Administrative Capacity}
-
-
www.doi.org www.doi.org
-
Designing and implementing specific operational processes for e.g. quality control of input data and output data; Integrating the community into other DOI related activities and services.
{Quality Assurance}
-
Providing applications, services, marketing, outreach, business cases etc. to introduce the DOI system to the community; Designing and implementing specific operational processes for e.g. quality control of input data and output data;
{Services}
-
Providing information and advice to the community
{Community Advice}
-
Registration Agencies must comply with the policies and technical standards established by the IDF, but are free to develop their own business model for running their businesses. There is no appropriate “one size fits all” model; RAs may be for-profit or not-for-profit organisations. The costs of providing DOI registration may be included in the services offered by an RA provision and not separately distinguished from these. Examples of possible business models may involve explicit charging based on the number of prefixes allocated or the number of DOI names allocated; volume discounts, usage discounts, stepped charges, or any mix of these; indirect charging through inclusion of the basic registration functions in related value added services; and cross-subsidy from other sources.
{Fee-for-Service}
-
Integrating the community into other DOI related activities and services
{Community}
-
-
www.doi.org www.doi.org
-
More sophisticated functionality available, e.g., multiple resolution, data typing
{Data Typing} {Multiple Resolution}
-
-
en.wikipedia.org en.wikipedia.org
-
URNs are globally unique persistent identifiers assigned within defined namespaces so they will be available for a long period of time, even after the resource which they identify ceases to exist or becomes unavailable.
{Global}
-
Approximately sixty formal URN namespace identifiers have been registered.
{Unambiguous Allocation}
-
In order to ensure the global uniqueness of URN namespaces, their identifiers (NIDs) are required to be registered with the IANA. Registered namespaces may be "formal" or "informal".
{Unique}
-
A Uniform Resource Name (URN) is a Uniform Resource Identifier (URI) that uses the urn scheme. URNs are globally unique persistent identifiers assigned within defined namespaces so they will be available for a long period of time, even after the resource which they identify ceases to exist or becomes unavailable
{Persistence}
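The namespace structure described in the quotes above follows the general shape `urn:<NID>:<NSS>`, where the NID is the registered namespace identifier and the NSS is the namespace-specific string. The following is a simplified sketch of splitting a URN into those two parts. It assumes that basic shape only, ignores the optional components defined in the URN specification, and does not check whether the NID is actually registered with IANA. `parse_urn` is a hypothetical helper name.

```python
import re

# Simplified pattern: the "urn" scheme, a 2-32 character NID made of
# letters, digits and hyphens (no leading or trailing hyphen), then the NSS.
_URN = re.compile(
    r"^urn:([A-Za-z0-9][A-Za-z0-9-]{0,30}[A-Za-z0-9]):(.+)$",
    re.IGNORECASE,
)


def parse_urn(urn: str) -> tuple[str, str]:
    """Split a URN into (namespace identifier, namespace-specific string)."""
    match = _URN.match(urn)
    if match is None:
        raise ValueError(f"not a URN: {urn!r}")
    nid, nss = match.groups()
    # The NID is case-insensitive, so normalise it to lower case.
    return nid.lower(), nss
```

For example, `parse_urn("urn:isbn:0451450523")` yields the NID `isbn` and the NSS `0451450523`. Whether two NSS values are equivalent is decided by each namespace's own rules, which this sketch leaves out.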
-
-
www.iso.org www.iso.org
-
existence, and ability to be used in services outside the direct control of the issuing assigner, without a stated time limit
{Persistence}
-
specification by a DOI name (3.2) of one and only one referent (3.16)
{Unique}
-
process of submitting a DOI name (3.2) to a network service and receiving in return one or more pieces of current information related to the identified object such as metadata or a location of the object or of metadata
{Resolvable}
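Resolution as defined in the quote above (submit a DOI name to a network service, get current information back) can be sketched with the public proxy at https://doi.org/. The helper names below are hypothetical, and the network call is shown only as an assumption about typical use. It is not exercised here.

```python
from urllib.parse import quote
from urllib.request import urlopen

DOI_PROXY = "https://doi.org/"


def resolution_url(doi_name: str) -> str:
    """Build the URL that asks the proxy to resolve a DOI name."""
    # Percent-encode characters that are unsafe in a URL path; keep "/" literal.
    return DOI_PROXY + quote(doi_name, safe="/")


def resolve(doi_name: str) -> str:
    """Follow the proxy's redirects and return the final URL.

    Requires network access; the result depends on the current
    registration, so no fixed output is claimed here.
    """
    with urlopen(resolution_url(doi_name)) as response:
        return response.geturl()


# DOI name taken from the Zenodo citation earlier in this feed.
url = resolution_url("10.5281/zenodo.7100578")
```

Because the registrant can update the location held in the DOI record, the URL returned by `resolve` may change over time while the DOI name stays the same.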
-
— dynamic updating of metadata, applications and services.
{Dynamic}
-
— single management of data for multiple output formats (platform independence),
{Platform Independence}
-
— interoperability with other data from other sources,
{Interoperable}
-
— persistence, if material is moved, rearranged, or bookmarked,
{Persistence}
-
— extensibility by adding new features and services through management of groups of DOI names,
{Extensible}
-
-
openscholarlyinfrastructure.org openscholarlyinfrastructure.org
-
Patent non-assertion – The organisation should commit to a patent non-assertion covenant. The organisation may obtain patents to protect its own operations, but not use them to prevent the community from replicating the infrastructure.
{No Patents}
-
Open source – All software required to run the infrastructure should be available under an open source license. This does not include other software that may be involved with running the organisation.
{Open Source}
-
Open data (within constraints of privacy laws) – For an infrastructure to be forked it will be necessary to replicate all relevant data. The CC0 waiver is best practice in making data legally available. Privacy and data protection laws will limit the extent to which this is possible
{Open Data}
-
Available data (within constraints of privacy laws) – It is not enough that the data be made “open” if there is not a practical way to actually obtain it. Underlying data should be made easily available via periodic data dumps.
{Accessible}
-
Revenue based on services, not data – data related to the running of the research enterprise should be a community property. Appropriate revenue sources might include value-added services, consulting, API Service Level Agreements or membership fees.
{Sustainable Operational Revenue}
-
Mission-consistent revenue generation – potential revenue sources should be considered for consistency with the organisational mission and not run counter to the aims of the organisation
{Mission-Consistent}
-
Goal to create contingency fund to support operations for 12 months – a high priority should be generating a contingency fund that can support a complete, orderly wind down (12 months in most cases). This fund should be separate from those allocated to covering operating risk and investment in development.
{Contingency}
-
Goal to generate surplus – organisations which define sustainability based merely on recovering costs are brittle and stagnant. It is not enough to merely survive, it has to be able to adapt and change. To weather economic, social and technological volatility, they need financial resources beyond immediate operating costs.
{Surplus}
-
Time-limited funds are used only for time-limited activities – day to day operations should be supported by day to day sustainable revenue sources. Grant dependency for funding operations makes them fragile and more easily distracted from building core infrastructure.
{Time-Limited}
-
Formal incentives to fulfil mission & wind-down – infrastructures exist for a specific purpose and that purpose can be radically simplified or even rendered unnecessary by technological or social change. If it is possible the organisation (and staff) should have direct incentives to deliver on the mission and wind down.
{Formal Incentives}
-
Living will – a powerful way to create trust is to publicly describe a plan addressing the condition under which an organisation would be wound down, how this would happen, and how any ongoing assets could be archived and preserved when passed to a successor organisation. Any such organisation would need to honour this same set of principles.
{Living Will}
-
Cannot lobby – the community, not infrastructure organisations, should collectively drive regulatory change. An infrastructure organisation’s role is to provide a base for others to work on and should depend on its community to support the creation of a legislative environment that affects it.
{Cannot Lobby}
-
Transparent operations – achieving trust in the selection of representatives to governance groups will be best achieved through transparent processes and operations in general (within the constraints of privacy laws).
{Transparent}
-
Non-discriminatory membership – we see the best option as an “opt-in” approach with a principle of non-discrimination where any stakeholder group may express an interest and should be welcome. The process of representation in day to day governance must also be inclusive with governance that reflects the demographics of the membership.
{Membership}
-
Stakeholder Governed – a board-governed organisation drawn from the stakeholder community builds more confidence that the organisation will take decisions driven by community consensus and consideration of different interests.
{Stakeholder Governed}
-
Coverage across the research enterprise – it is increasingly clear that research transcends disciplines, geography, institutions and stakeholders. The infrastructure that supports it needs to do the same.
{Coverage}
-
-
www.rfc-editor.org www.rfc-editor.org
-
this specification permits several other cases of URN resolution as well as URNs for resources that do not involve information retrieval systems. This is true either individually for particular URNs or (as defined below) collectively for entire URN namespaces.
{Resolvable}
-