388 Matching Annotations
  1. Aug 2023
    1. The presence of duplicate information also cannot be counted when a data stakeholder qualifies as ideal; thus, Algorithm 3 checks for identical datasets in an ODC, i.e., datasets with the same identifier. It uses a set to keep track of the identifiers of each dataset and prints a message with the percentage of duplicate datasets

      I don't understand the first sentence (up to the semicolon). What is meant by "presence cannot be counted", "a data stakeholder", and "a data stakeholder qualifies as ideal"?

      Are two datasets with the same identifier "identical"? Or are they the same dataset?

      For an RDF processor, two resources with the same identifier are the same resource. And DCAT is okay with that.
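
      As a sketch, the check described above amounts to something like the following Python (the names are mine, not the paper's):

      ```python
      def report_duplicates(datasets):
          """Report the share of datasets whose identifier was already seen."""
          seen = set()
          duplicates = 0
          for dataset in datasets:
              identifier = dataset["identifier"]
              if identifier in seen:
                  duplicates += 1
              else:
                  seen.add(identifier)
          if datasets:
              print(f"{100 * duplicates / len(datasets):.1f}% duplicate datasets")
      ```

      Note that this finds re-used identifiers, not "identical datasets": it never compares the datasets' contents.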

    2. References

      It feels like there are too many self-references for a single-author paper.

      DOI links should refer to URLs starting with https://doi.org/ instead of http://dx.doi.org/, and underscores should not have a \ in front. References to arXiv are inconsistent: some have a DOI and others don't.
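
      Rewriting the resolver prefix is mechanical; a minimal sketch (the regex is mine):

      ```python
      import re

      def fix_doi_url(url):
          """Rewrite legacy DOI resolver URLs to the preferred https://doi.org/ form."""
          return re.sub(r"^https?://dx\.doi\.org/", "https://doi.org/", url)

      assert fix_doi_url("http://dx.doi.org/10.1000/182") == "https://doi.org/10.1000/182"
      ```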

    3. While some common practices exist, individual approaches to ensuring data quality have historically demonstrated superior performance.

      Please refer to work that supports this vague claim.

    4. focus on the automatic quality assessment of ODCs,

      Looking at the references, I don't believe this work is the first to focus on automatic QA of ODCs.

    5. (pp. 29–126)

      I didn't know that W3C Recommendations had page numbers?

  2. Oct 2022
    1. Having used both Marva and Sinopia, I think that Marva supports “BIBFRAME cataloging without thinking about BIBFRAME” especially well.

      I still see a few Linked Data-y elements, like the templates and profiles being identified with "bf2:something".

  3. May 2022
    1. Currently often these repositories are giving some data representation enhanced with data base systems that provide a local layer of data and metadata indexing.

      This sounds speculative.

  4. Apr 2022
    1. development and expansion of a linked open data cloud of cultural heritage

      In general it would be good to keep in mind the rationale for such a LOD cloud and express this rationale.

    2. LIFT uses another Python library, lxml, in conjunction with RDFLib to parse TEI documents

      I think RDFLib is only used to create and output the RDF, not for parsing the TEI documents.

    3. so all elements and attributes must be mapped to classes and properties from arbitrary ontologies

      This is better for reuse, not really a drawback :) I do understand that there may not always be a perfect mapping.

    4. Python

      Note that the documentation mentions Python 2.7 very explicitly. That version of Python is no longer supported; it would be great if the code were updated to support Python 3.

    5. open-source application

      The GitHub repository does not contain a license, so the code does not appear to be open source.

    6. there is a lack of user-friendly tools for working with digital scholarly editions and LOD

      Why do we need such tools?

  5. Jan 2022
  6. Nov 2021
    1. For any of our DH work to be sustainable, it needs to be produced in full dialogue with the community of information professionals.

      That sounds about right.

    2. They tired of care-taking, even though this involved little more than continuing to host the project files on a server.

      I'm puzzled that the author uses these words, when the point of the article appears to be that maintenance is work.

      This quote was dissected by Andromeda Yelton on Twitter.

  7. Jul 2021
    1. Valid use of attributes in XML

      The quotes are smart quotes, which are not accepted as quotes.
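
      For comparison, a minimal example of both forms (my snippet): only straight single or double quotes are valid attribute delimiters in XML.

      ```xml
      <!-- well-formed: straight quotes -->
      <person name="Ada" role='author'/>
      <!-- not well-formed: smart quotes -->
      <person name=“Ada”/>
      ```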

    2. Element names cannot start with the string xml, XML, Xml, etc

      This is new to me. But it's true:

      This specification does not constrain the application semantics, use, or (beyond syntax) names of the element types and attributes, except that names beginning with a match to (('X'|'x')('M'|'m')('L'|'l')) are reserved for standardization in this or future versions of this specification.

    1. W3Schools identifies

      W3Schools is not affiliated with W3C, who manages the standards. Maybe it isn't the best source for rules?

    1. it is a requirement for every TEI project to provide a detailed ODD model.

      I assume it is not required for every project to create a new ODD model? Reusing ODDs improves interoperability.

    1. For instance, the following statement describes a black circle with a radius r of 50:

      Note that this is not a complete SVG document. You cannot save this snippet as a file and view it without adding elements around it.
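
      For instance, a minimal complete SVG document around such a circle (the dimensions are my assumption) would be:

      ```xml
      <?xml version="1.0" encoding="UTF-8"?>
      <svg xmlns="http://www.w3.org/2000/svg" width="120" height="120">
        <!-- a black circle with radius 50, centred in the canvas -->
        <circle cx="60" cy="60" r="50" fill="black"/>
      </svg>
      ```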

    1. DTD syntax as it is easier to learn and use than schema languages, while the general principles of modelling are similar. 

      I don't quite agree that the DTD syntax is easier than XML's syntax.
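
      To make the comparison concrete, here is a tiny DTD fragment (my example) declaring a paragraph element with one optional attribute; the declaration syntax is quite different from the XML it constrains:

      ```xml
      <!ELEMENT p (#PCDATA)>
      <!ATTLIST p rend CDATA #IMPLIED>
      ```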

    2. what attributes they can contain

      Attributes were not mentioned yet.

    1. discipline of text encoding

      Text encoding is a discipline?

    2. philology

      Philology is the study of language in oral and written historical sources; it is the intersection of textual criticism, literary criticism, history, and linguistics (with especially strong ties to etymology). Philology is more commonly defined as the study of literary texts as well as oral and written records, the establishment of their authenticity and their original form, and the determination of their meaning. A person who pursues this kind of study is known as a philologist.

      Wikipedia on Philology

    1. Federally, Immigration, Refugees and Citizenship Canada told CBC News it can only issue documents printed "in the Roman alphabet with some French characters" because of standards set by the International Civil Aviation Organization. 

      Hmm. I don't think you can put all the blame on commercial air travel.

  8. Nov 2020
  9. Apr 2020
    1. Looking at the number of self-citations, this article is from a niche field. I have seen a presentation from this author in 2017 and subsequently had some discussion about how to apply OntoUML, but at the time his ideas were too philosophical for me.

      This article may still be too philosophical for many users of RDFS and OWL, but it provides a good argument for looking at OntoUML.

  10. Mar 2020
    1. nanopublications now help us to publish this model and its instantiations in a FAIR manner.

      Nanopublications may be the most FAIR way to publish Linked Data.
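
      For reference, the shape of a nanopublication in TriG (my sketch; the IRIs are hypothetical):

      ```trig
      @prefix np:   <http://www.nanopub.org/nschema#> .
      @prefix prov: <http://www.w3.org/ns/prov#> .
      @prefix ex:   <http://example.org/np1/> .

      ex:head {
        ex:pub a np:Nanopublication ;
          np:hasAssertion ex:assertion ;
          np:hasProvenance ex:provenance ;
          np:hasPublicationInfo ex:pubinfo .
      }
      ex:assertion  { ex:model ex:describes ex:phenomenon . }
      ex:provenance { ex:assertion prov:wasDerivedFrom ex:experiment1 . }
      ex:pubinfo    { ex:pub prov:generatedAtTime "2020-03-01T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> . }
      ```

      The assertion, its provenance, and the publication metadata each live in their own named graph, which is what makes the package citable as a unit.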

    2. Registered in FAIRsharing as https://fairsharing.org/bsg-s001394

      Doesn't FAIRsharing support DOIs for entries?

  11. Jun 2019
    1. Will we continue to use a microservices architecture for our digital repository in the future? We aren’t certain.

      In software engineering in general, it looks like people are taking a more nuanced view of microservices. As with many new ideas, it looked to be The Answer to some problems, but turned out to have its own problems.

    2. Usability testing also revealed biases in our design process that had gone unnoticed. A particularly salient example of this is the name ‘BC Digitized Collections’. As archivists and librarians, we took for granted that our users would be able to decipher this and understand what they would be able to find within the system. In fact, our test results suggested the opposite. A reevaluation of how we are naming and describing the system is forthcoming, with a greater sensitivity to our end users’ level of familiarity with library jargon and information architecture conventions.

      Interesting!

    3. Technical

      Perhaps you meant 'Image [API]'?

    4. Of the eight potential repository solutions that we investigated before undertaking this project, five were open-source and three were vended solutions.

      Open-source solutions can be provided (even hosted) by vendors, so you probably meant proprietary to contrast with open source. Is that correct?

    5. While Alma offers partial IIIF support with its implementation of Universal Viewer,

      Wait, what? It does?

    6. Instead, we worked with ITS to isolate the server from our campus network to minimize the impact of any potential attacks, then rapidly accelerated our migration timeline.

      It almost sounds like the security issues came in handy :)

    1. The most relevant to spot

      I keep thinking of Metabase as an open-source alternative that allows creating and sharing charts. Grafana is another dashboard application, geared towards timeseries data.

  12. Mar 2019
    1. contract adjustment mechanisms and alternative dispute resolution procedures

      Beautiful words...

    2. When rightholders do not provide the providers of online content-sharing services with the relevant and necessary information about their specific works or other subject matter

      Who is not a rightholder? How is everyone in the whole world going to provide all online content-sharing service providers with information about their works?

    3. It is therefore important to promote the development of the licensing market between rightholders and providers of online content-sharing services

      Do we really want licences for everything everyone does online?

    4. A table of contents would have come in very handy in this document...

    1. Open Archival Information System (OAIS) compliant preservation system

      I'm not convinced there is such a thing as an OAIS compliant preservation system, as I still see OAIS as a model to describe systems.

    2. OAIS is the ISO-standard for digital preservation

      I'm taking this a bit out of context, but I was never able to look at OAIS as a standard for digital preservation.

  13. Feb 2019
    1. We see it with a drive to static sites, conflating speed with the lack of a database, and ending up recreating the database in the filesystem or relying on a raft of third party services to plug the holes that would have been filled by a more traditional CMS.

      Guilty… :$

    1. Competing scholarly platforms, many of which are proprietary, appear to be growing in popularity yet demonstrate poor support for open standards or prevalent open science technical protocols, as well as low levels of integration with open scholarly infrastructure.

      It would be great to see numbers for this statement, although I wouldn't be surprised to see it confirmed.

    1. “back office”

      This should say "front office".

    2. Archivists need to verify metadata, documentation, and data integrity to ensure that data sets meet minimum standards for ingest.

      And this is where automation could help.

    3. Contributors who submit a data set once every year or two, or maybe once in a career, need assistance in structuring and documenting their files for submission.

      Everyone should be taught to structure and document their files even if they will not submit them for archiving.

    4. Automation is facilitating more archival procedures, such as batch ingest of files and APIs for submission and retrieval, but much of the labor associated with contributing data to archives remains craft work.

      This remains an issue for growth of data archiving.

    5. three‐ring binder

      At least two interviewees use three-ring binders, which I think are rare in the Netherlands. Interesting!

    6. Weblogs

      i.e. logs of web traffic

    7. DATAVERSE.nl

      Also known as DataverseNL

    1. Because retraining a neural network requires large annotated data sets and extensive computational power, we looked for different ways to use existing neural networks to identify visual similarity. We turned to the penultimate layer of the CNN to identify similar visual trends in advertisements.

      Excellent!
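
      The general technique looks roughly like this with torchvision (my sketch, not the authors' code; the model choice is an assumption):

      ```python
      import torch
      import torchvision.models as models
      from torchvision.models import ResNet50_Weights

      # Load a pretrained CNN and replace its classification head with the
      # identity, so a forward pass returns penultimate-layer features.
      model = models.resnet50(weights=ResNet50_Weights.DEFAULT)
      model.fc = torch.nn.Identity()
      model.eval()

      with torch.no_grad():
          features = model(torch.randn(2, 3, 224, 224))  # two dummy images

      # Cosine similarity between feature vectors as a visual-similarity score
      similarity = torch.nn.functional.cosine_similarity(features[0], features[1], dim=0)
      ```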

    2. F1 score—a harmonic mean of the precision and recall—of 0.9 over the entire period, meaning that it accurately predicted the type of 90% of the images.

      I think this is the definition of 'accuracy'. Precision and especially recall are important when the model also has to find instances, rather than classify all instances into one of two categories.
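
      A small worked example (the numbers are mine) of how the two metrics come apart:

      ```python
      # Binary classifier: 90 true positives, 0 false positives,
      # 10 false negatives, 0 true negatives.
      tp, fp, fn, tn = 90, 0, 10, 0

      accuracy = (tp + tn) / (tp + fp + fn + tn)          # 0.90
      precision = tp / (tp + fp)                          # 1.00
      recall = tp / (tp + fn)                             # 0.90
      f1 = 2 * precision * recall / (precision + recall)  # ~0.947

      print(accuracy, f1)  # an F1 of 0.9 need not mean 90% correct predictions
      ```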

    1. As with the universality of browsers, there is a major role for the W3C in creating the standards that will allow decentralized data pods and apps to interoperate.

      This point of standardisation has not had enough attention in the Solid presentations I have seen until now. The ecosystem of Solid pods and apps requires agreements on APIs – not so much for reading and writing data documents as for the structure and contents of those documents.

  14. Jan 2019
    1. In my experience, Linked Data excites developers who have never seen it before, because they suddenly have access to a whole Web of data instead of just one back-end. It opens up huge opportunities, since they no longer depend on harvesting data to build something nice.

      Hear, hear!

    2. "object": "https://you.example/likes/2018/12#rubens-post",

      The object of the Like activity should be the IRI of the thing liked.
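
      A corrected sketch following ActivityStreams 2.0 (the IRIs are hypothetical):

      ```json
      {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Like",
        "actor": "https://you.example/profile#me",
        "object": "https://rubens.example/posts/2018/12/better-web"
      }
      ```

      Here "object" is the IRI of the post being liked, not an entry in the liker's own feed.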

    1. Recommendations 9 and 10 are not directly mapped to any requirements as requirements were inferred from use cases from human users.

      This phrasing makes it sound like requirements from human users are very different, whereas in the end all use cases come from human users, don't they?

    2. new description format

      Everywhere this format is called a user story.

  15. Dec 2018
    1. A political cartoon can be better understood if we know the offices of state that the individuals held at the time of the cartoon, and here is where it helps to combine artwork data and political data on the same platform.
  16. Nov 2018
    1. Have your own research question in DH projects. If we want to be partners in research, we can also do research. Find the research aspect in the projects that are relevant for you and feed the answers you find during the projects back into your practice as a library. Make research projects mutually beneficial and become the partner you want to be.

      I was reminded of this recently and agree.

    2. Work more openly. Publish documents that you might think are only of interest to your direct colleagues, but can actually be very valuable to your wider network.

      Good reminder!

    1. As machine learning techniques like Optical Character Recognition (OCR) push new boundaries, IIIF Image provides a uniquely efficient way to build new training sets that can easily be shared as lists of urls.
    1. Composite annotations are a more complex case as they do not represent a single entity mentioned in the text. As such they do not contain a named entity reference but rather point to one or more existing annotations that are used to mark up the information that describes the thing represented by the composite annotation.

      You are creating entity references, but they are unnamed by default.

    2. using annotation classes for distinguishing between mentions of different types of objects such as Persons or Locations

      My intuition would be to distinguish mentions of different types of objects by their entity class, not annotation classes.

    3. Annotations have technical metadata. It is important to identify, who and when created the annotation, what is the visibility of the annotation, etc.

      This is usually called descriptive metadata (who created the annotation when) and administrative metadata (who should be able to see the annotation).

  17. Oct 2018
    1. It looks to me like "building bridges" is what other people would call "annotating the Web". Using the W3C Web Annotation standards one can do precisely what "building a bridge" is supposed to do: link pages, or specific parts of pages, to (parts of) other pages/commentary/... – except that getting virtual money for annotating is not part of the standards (or the thinking, as far as I know).
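
      For illustration, a minimal Web Annotation (my example; the URLs are hypothetical) that "bridges" a fragment of one page to another page:

      ```json
      {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "target": {
          "source": "https://example.org/page.html",
          "selector": {
            "type": "TextQuoteSelector",
            "exact": "building bridges"
          }
        },
        "body": "https://example.com/commentary.html"
      }
      ```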

  18. Sep 2018
    1. there are no other costs related to the creation and attribution of the identifiers.

      No costs for end users, I presume?

    1. a policy must be established for the three collection levels “objects”, “metadata” and “metadata records” based on a pilot.

      It is still unclear why the authors chose these three 'collection levels' and why there must be a policy for them.

    2. The pilot will reveal

      A pilot project should reveal obstacles – whether it does depends on the setup and execution.

    3. a kind of scorecard containing all FAIR principles on which one can mark the score for each principle per collection

      How to measure FAIRness of data is still a topic of research. What would you suggest?

    4. ORCID (orcid.org),

      ORCID is only for researchers, and only living people can get an ORCID.

    5. “Persistent” entails the permanent availability of the identifier

      I think people in the field of persistent identifiers don't dare to claim PIDs must or can be permanent. Yes, permanent would be ideal.

    6. Digital objects have a date-timestamp

      A timestamp for creation, publication, copyright?

    7. The DEN DE BASIS set of best practices, although it is very broad and complete, is not compact enough for quick adoption.

      Do best practices have to be 'compact for quick adoption'?

    8. It seems that citation standards can be derived from provided standard metadata formats as facilitated by tools like Refworks, Mendeley and Zotero.

      Having a clear suggestion for a citation is also a visual reminder to actually cite the item. Plus, it can signal the most important metadata to use in the citation, including (potentially) rights holders.

    9. The list raises several questions, for instance how to link annotations to specific fragments of a text, what a retrieval protocol for annotations should look like, etc.

      I should read the article, but have they heard of Web Annotations?

    10. The phrasing of the FAIR principles is somewhat confusing, very probably because of the wish to be concise. Some principles (not all) refer to both “data” and “metadata”, which is formulated as “(meta)data”, for example in “F1. (meta)data are assigned a globally unique and persistent identifier”. One principle is self-referencing (“I2. (meta)data use vocabularies that follow FAIR principles“). Also the enumeration of the four sections (F, A, I, R), each containing 3 or 4 principles, is rather uncommon, for example “(F)air” has four individual principles F1, F2, F3, F4, but “(R)eusable” has one main principle R1 and three sub-principles R1.1, R1.2, R1.3. The logic of this subdivision is unclear, because all four Reusable principles are guidelines on the same level in their own right. In fact, the authors say so themselves: “The elements of the FAIR Principles are related, but independent and separable.” (Wilkinson et al. 2016, p.4).

      Interesting observations. I can see that they are confusing, but is it relevant in this context?

    1. It's the forgetting that will allow progress.

      But are blockchains a result of progress?

  19. Aug 2018
    1. The hardest part of looking back on this project is seeing how, in the last year of the project, we managed to do most of the heavy lifting, while prior to this year the project felt untethered.

      I understand this feeling. It can be a similar feeling for shorter projects too – "I've spent weeks on X before I found out that Y was an easier/quicker/better solution."

    2. It is challenging, if not impossible, to usefully quantify the return on investment of involvement in a community like Samvera, but the time spent by other community members providing technical support (solicited and unsolicited), conceptual support, and emotional support have provided us with benefits beyond what one could expect from contracts or subscriptions

      Well said.

    3. We hope our migration story will be helpful to developers and repository managers as a map of development hurdles and an aspiration of success.

      It is definitely laudable that you shared your story.

    1. Reproducibility advocates are converging around a tool set to minimize these problems. The list includes version control, scripting, computational notebooks and containerization — tools that allow researchers to document their data, the steps they follow to manipulate it, and the computing environment in which they work (see ‘Getting reproducible’).

      It would be great to have references to existing articles about 'getting reproducible'. I find the lack of references to papers with similar lists of practices amazing.

    1. Limitations

      It would be interesting to see a comparison of curation on various platforms. It is clear that GitHub is used a lot for curation of lists, but so is Wikipedia (supported by Wikidata). There used to be directories on the Web, to help people navigate, before search engines became the main means to navigate the Web.

    2. A curated list that provides centralized peer-reviewed resources about a specific topic provides a starting point where developers know that they can find high-quality resources and begin learning the subject.

      There are different platforms for this purpose as well, like https://learnxinyminutes.com. When you don't know about the existence of such lists, using online search would help beginners.

    3. Curation is a common practice in Archeology.

      Why only mention archaeology as a field that practices curation?

    1. Concepts for the future of scholarly publishing extend beyond collaborative writing [45,46]. Bookdown [47] and Pandoc Scholar [48] both extend traditional Markdown to better support publishing. Examples of continuous integration to automate manuscript generation include gh-publisher and Continuous Publishing [49], which was used to produce the book Opening Science [50]. Distill journal articles [51], Idyll [52], and Stencila [53] support manuscripts with interactive graphics and close integration with the underlying code. As an open source project, Manubot can be extended to adopt best practices from these other emerging platforms.

      Great list of interesting and useful tools.

    1. This resource is protected by copyright

      "Protected by copyright" is not saying what you can or cannot do; usually the phrase "All rights reserved" is used to indicate the possibilities for (re)use.

    2. To analyze the status and potential risks of copyright infringement for our digital collections we made use of the licenses granted by both Creative Commons and RightsStatements, established by Europeana and DPLA.

      How do the licences help you analyse the copyright status or risk of copyright infringement?

    3. We decided that all data are made as openly available as possible, in as many places as possible.

      If everything should be as open as possible, then why not allow downloads of TIFF files?

    4. Are they allowed to download and reuse our metadata?

      Are they?

    1. World Wide Web (w3c)

      C is for Consortium :)

    2. [I]nstitutional databases do not communicate with outside data stores such as publishers, vendors, non-profit organizations, and open source platforms.

      This is a very broad claim, which I think would be stronger if 'communication with outside data stores' is better defined and if you can support it with sources.

    3. The proposed solution is a shared information pipeline where all stakeholders/agents would be able to share and exchange data about entities in real time. Three W3C recommended protocols are considered as potential solutions: the Linked Data Notifications protocol, the ActivityPub protocol, and the WebSub protocol.

      It looks like you are equating the 'pipeline' with protocols. Is that what you are doing?

    1. introductory session on the Textual Encoding Initiative (TEI) offered by Huw Jones, head of the Digital Library Unit

      CUL provides introductions to TEI

  20. Jul 2018
    1. “Your primary collaborator is yourself six months from now, and your past self doesn’t answer e-mails,”

      Very true!

  21. Jun 2018
    1. The other cost barrier is the opportunity cost of requiring scholars to spend significant time learning how to install and deploy web applications before they even get the chance to see how those applications can support their digital scholarship goals.

      Yes!

    2. It was Brian Henebry, the Director for Architecture, Service and Operations for Miami's IT Service

      Was this the only collaboration with University IT? What kind of collaboration was there when you used a completely separate infrastructure?

    3. these are made more complicated with the addition of a third party (Amazon)

      How?

    4. plans for longer term sustainability; and succession planning when a client moves on and wants to take their work with them.

      Have you considered making the web apps and/or data in the VMs static to preserve them in a different way, like in a web archive or data repository? Sometimes all a researcher wants is to share their data online within the context of a website with context and documentation – you don't need a separate VM for eternity to do that.

    5. we recognize that users of the Scholars Dashboard may at some point want to move to a less experimental platform and move their work to a “production” environment.

      This means sometimes the 'work' has to be rebuilt from scratch (e.g. if an essential WordPress plugin is found to be a security vulnerability and needs to be replaced) and settings may need to be updated for different URLs etc. Who would be responsible for managing the production environment?

    6. Other services routinely require customization of their configuration to match the characteristics of the running instance

      Does this mean that even though scholars and students can create their own VMs, the CDS or IT have to configure it? Or is all configuration left to the end users? If the latter, how do they feel about that?

    7. That process is mostly manual and would benefit from more automation and integration with common billing mechanisms.

      Automation would be great, but I'm already impressed by the built-in usage tracking that allows splitting the bill in this way!

    8. (http://scholardashboard.miamioh.edu)

      I get a connection time-out when connecting from The Netherlands. Also, this should be secured with TLS, i.e. be HTTPS-only.

    9. As indicated in a 2014 EDUCAUSE article “Libraries have always been in the business of knowledge creation and transfer, and the digital scholarship incubator within the library can serve as a natural extension of this essential function” (Sinclair, 2014 Sinclair, B. (2014). The university library as incubator for digital scholarship. EDUCAUSE Review. Retrieved from http://er.educause.edu/articles/2014/6/the-university-library-as-incubator-for-digital-scholarship [Google Scholar]). Such incubators can create innovative virtual shared spaces that can support learning and discovery at different scales.

      This is sometimes called a 'digital scholarship laboratory', especially when there is a physical space.

    1. The goal is to create a toolkit for measuring the reuse of heritage data (i.e., all use that happens outside of viewing and downloading in the repository).

      This reminds me of https://peerj.com/preprints/26505, which is intended for measuring the use of research data.

    1. Mark Lizar [position statement] presents the idea of ‘consent receipts’. A working group of the Kantara Initiative developed a format to formally describe the purpose of data collection, the identity of the data controller, and more. Mark is working on making such receipts, and the policies they refer to, easier to understand.

      This is interesting and something I have been thinking about in 'personal contract management' terms.

  22. May 2018
    1. creating the architecture and scaffolding of the World Wide Web

      Of course the World Wide Web is not the same as the Internet – the Web was invented by Tim Berners-Lee.

  23. Apr 2018
    1. It would be interesting to see the impact in other search engines, but I understand this was not the focus of the article.

      Another result may be better interpretation of metadata in Zotero and other reference managers.

    1. As a profession, we reward productivity in the form of papers and grants, and sitting down to deeply read journal articles can feel like wasted time. Yet, if we aren’t regularly reading the literature, we risk that the work we are doing is out-of-date, duplicative, or derivative.

      Yes!

  24. Mar 2018
    1. The following five different NER systems have been used in our tests: Stanford NER, NER-Tagger, the Edinburgh Geoparser, spaCy, and Polyglot-NER.

      Short overview of NER software used in the research.
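
      For illustration, the spaCy API for this is pleasantly small (my snippet, not the paper's setup):

      ```python
      import spacy

      # Requires: python -m spacy download en_core_web_sm
      nlp = spacy.load("en_core_web_sm")
      doc = nlp("Mary Shelley was born in London in 1797.")
      for ent in doc.ents:
          print(ent.text, ent.label_)  # e.g. "Mary Shelley PERSON", "London GPE"
      ```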

    1. By assigning DOIs, these resources also become citable.

      You don't need a DOI to make something citable – any identifier will do.

    1. References

      Why are there no direct links to the articles in this same journal? Why don't you follow your own instructions for citing articles, and why did you leave out the DOIs?

    2. doi: 10.1038/sdata.2017.27 (2018).

      Is this the actual DOI? On the right of the page, under the Info tab the DOI contains '2018' instead of '2017'.

  25. Feb 2018
    1. For large-scale metadata harvesting the MetaStore is OAI complaint

      What do you mean by "X is OAI [compliant]"?

    2. content

      descriptive metadata, I presume?

    3. they either have a focus on generic institutional repositories without community specific adaptations (DSpace, Fedora)

      Fedora is not at all focused on being used in institutional repositories.

    4. KIT DM is more flexible

      Fedora and iRODS are designed to be very flexible, so this is a bold statement.

  26. Jan 2018
    1. Is DL-Learner related work?

    2. If we can incorporate such methods into end-to-end models, it becomes possible to let these models learn the most appropriate level of inference themselves.

      You could also make all the implicit knowledge explicit by using a reasoner before using ML.
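
      A sketch of that approach with RDFLib and the owlrl reasoner (the input file is hypothetical):

      ```python
      import owlrl
      from rdflib import Graph

      g = Graph()
      g.parse("data.ttl")  # hypothetical input

      # Materialise the OWL-RL entailments so that downstream ML sees
      # explicit triples instead of having to learn the inferences.
      owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(g)
      ```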

    3. 3.4. The default data model?

      This section considers the various ways of modeling information (knowledge graph, tree (XML) and table (relational model)) for use in machine learning.

    1. When designing such a system, scalability issues need also to be considered. These questions are not easily solved and will be investigated as part of our future research. We aim at addressing issues such as what is the best way to transfer all the data, how can large data sets be processed, and how can processing power be distributed.

      Many algorithms can be parallelised and distributed. Services that are not easily parallelised may be replicated on multiple machines, using a load balancing proxy for access via a central access point.
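
      A generic sketch of the first option (the work function is a placeholder):

      ```python
      from concurrent.futures import ProcessPoolExecutor

      def process_chunk(chunk):
          # Placeholder for the actual per-chunk computation
          return sum(chunk)

      if __name__ == "__main__":
          chunks = [range(i, i + 1000) for i in range(0, 10_000, 1000)]
          with ProcessPoolExecutor() as pool:
              results = list(pool.map(process_chunk, chunks))
      ```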

    2. With DivaServices all complicated installation and configuration steps to use a method are removed by providing a simple-to-use RESTful web service.

      Unfortunately the authors provide no information about service levels to be expected.

    1. RDFa does not have a standardized option to place data in named graphs

      I assumed that by default, a file containing RDF serves as the graph. Reading this, I probably shouldn't.

  27. Dec 2017
    1. Moreover, bad research can cause a domino effect: if another study does not replicate that effect, it has a smaller chance of being published. The first article creates a bias: if it is flawed, all good studies have a harder time.

      An important argument against rejecting articles in which an effect is not demonstrated.

    1. I have not seen this crucial second paradigm shift articulated explicitly elsewhere

      dokieli is (AFAIK) the best example of an app being a view – or rather providing a view. And feed readers. You're probably right that it hasn't been discussed a lot.

    2. We can essentially replace LinkedIn by an address book, where somebody is a connection if they also have you in their contact list.

      I've wondered why we've started exposing our contact lists in the first place – but that's a different discussion.

    3. Consider this social media post, where an author states his professional opinion on an online news article.

      Great visualisation

    4. Most Web applications today follow the adage “your data for my services”. They motivate this deal from both a technical perspective (how could we provide services without your data?) and a business perspective (how could we earn money without your data?).

      Great observation! The business perspective also includes 'why would anyone pay us for this service?', as a counter-counter question to the counter question 'why don't you charge money instead of data?'

  28. Nov 2017
    1. A practice is included in our list if large numbers of researchers use it and large numbers of people are still using it months after first trying it out. We include the second criterion because there is no point in recommending something that people won't actually adopt.

      I like the criteria for inclusion (and the whole article), but one could wonder if they really make something a "good enough" practice. How many is "large numbers of researchers"?

    2. Many of our recommendations are for the benefit of the collaborator every researcher cares about most: their future self (as the joke goes, yourself from 3 months ago doesn't answer email…).

      :)

  29. Oct 2017
    1. Desktop editors for Markdown writing may meet content editing needs in these cases but were not explored in depth for this article.

      Using a tool like pandoc may allow someone to write in Word, LibreOffice or (La)TeX and convert the file to Markdown or HTML, but I understand it always makes things more complex.

    1. Forming an individual relationship may be time-consuming but it can make a big difference in the quantity and quality of researcher contributions.

      I do wonder if larger group introductions can help too.

    2. Throughout the outreach process, it has become increasingly apparent that focused and personalized attention, demonstrated through individualized emails and one-on-one meetings, helps increase researcher participation.

      Hopefully they won't need to address every individual researcher as researchers get comfortable with the catalog and start helping each other get started with the system…

    3. People just assume that the work I do in my day job now is much harder than it actually is, so if I can lower that barrier we can have more people learning to do it and more people can be more efficient in their jobs.

      This is one of the reasons for the DH Clinics to exist.

    1. Very interesting summary and comparison between 'DP' and 'DH'!

    2. Do we not see likewise in dh that conferences focus on techniques, and papers focus more on describing technology than its application? 

      Yes, we do see this! Amen! ;)

    3. automatisation

      automation?

    1. The Harry Ransom Center at the University of Texas Austin, which is well-funded and ambitious, is said to have particularly driven up the price for the most sought-after collections

      What a coincidence that the name of the Center includes "Ransom"!

  30. Sep 2017
    1. The framework of the EM research is mainly BS added with the concept of spirituality, as described by religious sociologists

      Citations needed

    2. BS is the most important framework of the research.

      Why?

    3. Here, however, the discussion starts at the level of the organisation because this was logically seen the best starting point.

      What logic did you use?

    4. blood groups

      Did you mean blood types? Why did you choose blood groups as an analogy for model explanation or interpretation?

    5. The EM is an open model. This implies that no restrictions exist in publishing.

      Is the model in the public domain?

    6. Research to the EM provided relevant information that is profitable for every (potential) user

      What research provided what information and how is that information profitable? (Did you mean useful?)

    7. [T]he EM is far and foremost a coaching model.

      Please define coaching model.

    8. BS studies made clear that BS has positive effects for the organisation as a whole, and the welness of the employee.

      citations needed

    1. Because there still does not exist a proper Knowledge Management at the GEI as would be provided by using a suitable KOS these research results can not yet be linked to other research contexts and therefore are in danger of not being used afterwards. Their production within a highly specialized research community with complex but separate contexts and systems prevents them from being found easily, consequently followed by “death of data”, double work and waste of resources. Another researcher with related information interests has no knowledge about the already existing GEI data and is not able to satisfy their need of information easily.

      Making research results findable can be done using non-KOS, like a full-text search engine. I think 'not having a KOS will keep research results from view' is not a strong argument.

    1. scientific protocols

      i.e. biomedical / experimental protocols :)

    2. This post has been written from a life sciences point of view. Life sciences also seems to be the main target discipline of protocols.io, even though the principles of open science are applicable in all disciplines.

    1. Before approving a study, ethics committees should ask researchers to declare in writing their willingness to work with their institutional resources, such as librarians, to ensure they do not submit to any journals without reviewing evidence-based criteria for avoiding these titles.

      Interesting idea. Librarians will probably have insights on the trustworthiness of journals.

  31. Aug 2017
    1. SPRQL queries

      I believe the standard is called SPARQL. I can't believe the authors consistently use the wrong name.

    1. try and click on a very common sentence, e.g. “the experiments were performed as previously described”. In essentially every single case today, nothing happens, while in the demo in 1968, it would have taken the reader to a document describing the experiments.

      That is a great example. I guess one reason that this doesn't work is that the software people use for writing makes it unintuitive to (cross)reference. But of course the authors should be encouraged to want to do this in the first place and they aren't.

    2. nobody seems to care about the software we write to transform the bits and bytes of the raw data into the flat, pixel-based images.

      Some of us do…

    1. For example, the US Fair Credit Reporting Act requires that agencies disclose “all of the key factors that adversely affected the credit score of the consumer in the model used, the total number of which shall not exceed 4.” It’s difficult to satisfy this regulation if your credit model is a deep neural network.

      :/

    1. in the end all scientists should have an interest in improving scientific practice

      yes!

    2. Having recently had the displeasure of experiencing firsthand in my own life how the news media operate

      What is the author referring to here?

  32. Jul 2017
    1. I considered a number of annotation tools. After trying many and consulting with colleagues and the distance learning team at FSU, I chose Hypothes.is. It offered stability, good reviews, open source, creative commons, the option of private annotation, a clean and inviting interface, and flexible possibilities for the course and beyond.

      There are tools that can be installed on university servers and that should work in much the same way as Hypothes.is. I wonder if these were considered?

    2. With a large class, some responsibility must fall on the student to follow directions and ask for help.

      I'm not a teacher, but yes, I agree that students (they're usually adults) have responsibility to follow directions – also to be critical of them, and ask for help when the directions are unclear.

    3. Some websites do not integrate smoothly with the Hypothes.is shell.

      That is indeed problematic.

    4. I worked to establish trust and community so that we could respectfully disagree upon difficult matters.

      One would hope university students learn to debate respectfully and using arguments that go into the contents, not ad hominem 'arguments'.

    5. many students used their own names

      On the wider web, this does not appear to be a reason for refraining from making offensive remarks.

    6. The staff at Hypothes.is was extremely helpful in deleting these false pointers once students had re-posted their annotations in the correct location.

      I understand the students were having trouble already, but they should be able to edit their annotations and put them in the correct channel themselves, shouldn't they?

    7. Given my students’ lack of familiarity with annotation software, each of these roles was distinctly necessary. I offered instructions and advice via assignment guidelines, announcements, FAQs with screenshots, personal emails, video conferencing, and meeting with students personally in office hours. The staff at Hypothes.is emailed with students, and Jeremy Dean provided a Student Resource Guide and video tutorial.

      That is quite a bit of work!

    8. Using Hypothes.is on Victorian texts lends itself to focused annotation along a number of axes: (1) Historical: highlight a literary, medical, judicial, or historical reference and discuss its history; (2) Linguistic: highlight one word and discuss its history, usage, and connotations; (3) Literary: highlight a phrase or sentence to discuss using particular literary or analytic concepts discussed in class (metaphor, free indirect discourse, irony, etc); (4) Ethical: highlight a sentence that represents an ethical choice in this text and discuss the costs of that choice; (5) Multimedia: highlight a phrase or sentence and link to a visual, audio, or video clip with a brief explanation of the connection you see.

      Good – a guide for what aspects of texts to annotate.

    1. The authors provide a clear idea on how data that is not accessible in RDF per se can be made interoperable using RML and TPF. However, their suggestion that the presented solution uses the LDP specification beyond the shared use of the term "Container" seems inadequate.

    2. Within the LDP specification is the concept of an LDP Container. A basic implementation of LDP containers involves two “kinds” of resources, as diagrammed in Fig. 1. The first type of resource represents the container—a metadata document that describes the shared features of a collection of resources, and (optionally) the membership of that collection. This is analogous to, for example, a metadata document describing a data repository, where the repository itself has features (ownership, curation policy, etc.) that are independent from the individual data records within that repository (i.e., the members of the collection). The second type of resource describes a member of the contained collection and (optionally) provides ways to access the record itself.

      It is a bit confusing that the authors project their resource types (Container and MetaRecord) onto the LDP specification, which does not specify the MetaRecord type. It would have been clearer if the authors had just mentioned LDP as inspiration.

    3. if there is an algorithm capable of extracting it and exposing it via the TPF interface.

      I wonder how much new users should know about the open world assumption and handling conflicting statements. If this takes off, we might see many triples that 'do not compute'. Just a thought, not criticism towards the paper.

    4. without the need to define an API

      …a new API

    5. The FAIR Projector, in this case, is a script that dynamically transforms data from a query of UniProt into the appropriately formatted triples; however, this is opaque to the client. The Projector’s TPF interface, from the perspective of the client, would be identical if the Projector was serving pre-transformed data from a static document, or even generating novel data from an analytical service.

      Given the premise that TPF endpoints are more scalable than SPARQL endpoints, using (dynamic) TPF endpoints makes sense.

    6. Calling HTTP GET on the URL of the FAIR Projector produces RDF triples from the data source that match the format defined by that Projector’s Triple Descriptor. The originating data source behind a Projector may be a database, a data transformation script, an analytical web service, another FAIR Projector, or any other static or dynamic data-source.

      Cool idea!

    7. We propose, therefore, to combine three elements—data transformed into RDF, which is described by Triple Descriptors, and served via TPF-compliant URLs. We call this combination of technologies a ”FAIR Projector”.
    8. An RML map describes the triple structure (subject, predicate, object, abbreviated as [S,P,O]), the semantic types of the subject and object, and their constituent URI structures, that would result from a transformation of non-RDF data (of any kind) into RDF data.

      So this could be used instead of the JSON schema definition for tabular data that was recommended in the Tabular data on the Web efforts.
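
      A minimal RML mapping looks like this (my sketch; the file and column names are hypothetical):

      ```turtle
      @prefix rr:  <http://www.w3.org/ns/r2rml#> .
      @prefix rml: <http://semweb.mmlab.be/ns/rml#> .
      @prefix ql:  <http://semweb.mmlab.be/ns/ql#> .
      @prefix ex:  <http://example.org/> .

      <#ProteinMap>
        rml:logicalSource [
          rml:source "proteins.csv" ;
          rml:referenceFormulation ql:CSV
        ] ;
        rr:subjectMap [ rr:template "http://example.org/protein/{id}" ] ;
        rr:predicateObjectMap [
          rr:predicate ex:name ;
          rr:objectMap [ rml:reference "name" ]
        ] .
      ```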

    9. we only require read functionality

      You need to create the structure somehow, don't you? I think you're saying that creating FAIR Accessors is up to the reader?

    10. the LDP’s use of the Data Catalogue Vocabulary

      The LDP specification does not mention DCAT, so I'm not sure what is meant here. Do the authors imply there is a connection between the standards or merely that it is possible to combine the two?

    1. What would it be like if that was all there was—structures meant to bring people and students together for as long as a methodology remains useful or a question remains interesting? Such entities would be born like centers—born with all the excitement and possibility of not knowing what you’re doing—of having to learn from each other what the methodologies and questions are really about.

      I like this idea.

    2. we’d have to change the names of the degrees to something vague, like “Bachelor of Arts” or “Doctor of Philosophy.”

      :D

    3. The goal of any new theory of libraries must of course accommodate the increasing needs in research and scholarship for large quantities of information, but should not preface quantity of information over all else. As important as the information itself is, providing and supporting an environment that allows for the transformation of that information into new knowledge is essential.

      Yes!

    4. The Wrong Business for Libraries

      This essay got me thinking, first that being scholar-centered is indeed something to strive for. But later I wondered if the library should be the place for a scholar to do everything – if so, why do we have the other parts of universities (faculties with labs, offices)?

    1. Most popular U.S. mobile apps, by unique monthly visitors

      Is WhatsApp not popular in the U.S.? In the Netherlands, 'whatsapp' is a verb.

    2. There are no professional standards on such disclosures in the research papers, which are mostly published in law journals at the universities.

      That surprises me. Many, if not most, scholarly publications disclose funding, because it increases transparency and (ideally) trust in the outcomes. Not disclosing corporate/private funding feels like a hidden agenda; often when it is later revealed, researchers face scrutiny for having hidden the information – which is the point of this article.

    3. Google has paid professors whose papers, for instance, declared that the collection of consumer data was a fair exchange for its free services;

      [citation needed]

    1. best practices for assisting digital humanists defined

      Is it possible to define best practices based on a single case study?

    1. We encourage style guides to update their recommendations for DOIs to use the full URL form.

      Agreed, unlinked identifiers are soooo… well, hard to use.

    2. The short form of the DOI for https://doi.org/10.5285/1D4D70AD-DC38-4E5F-BC39-066BABCA2FB2 is https://doi:10/bcc7.

      The second link does not resolve, as "Firefox can’t find the server at doi." Are you sure this is correct? The intended short DOI is probably doi:10/bcc7, which would resolve as https://doi.org/bcc7.

    1. Called “Sneak Peek,” it’s not exactly a preprint server (because scientists can only post if Cell Press has accepted their manuscript), and it’s not exactly open access (because you need to register for free to see them). But it does allow scientists to share work ahead of peer review and publication and brag about their high-profile placement at the same time.

      How does Cell accept manuscripts that haven't been peer reviewed yet?

    2. The Proceedings of the National Academy of Sciences won’t take papers that appear as preprints if they have a Creative Commons License

      Even though CC-licences (without the Non-Commercial clause) explicitly permit redistribution! Crazy! :)

    3. You may be wondering why scientists would even bother to publish in journals after they’ve posted a preprint—a system intentionally built to subvert the bottleneck of peer-reviewed publication. But the system of academic publishing and all the rewards built into it haven’t disappeared. Which means for now at least, biology careers aren’t made on bioRxiv. Traditional journals still hold the key to postdoc positions, tenure lines, and lab funding.

      This has been a key discussion point for quite a while. A related interesting history of journal publishing is The Guardian's longread on the business of scientific publishing.