- Aug 2017
-
essepuntato.github.io
-
PeerJ review #1:
Basic reporting
The paper has clear language, significantly improved since the first review. The dataset is augmented with extra material, and referenced properly from Figshare with https://doi.org/10.6084/m9.figshare.3980463.v5
Argumentation is well-structured and founded, although a couple of citations or examples are missing, e.g. for the claim that HTML allows ambiguous structures, or the novel (and unnecessary) use of RDF/XML in a script tag.
See https://via.hypothes.is/https://essepuntato.github.io/papers/rash-peerj2016/2017-07-06.html#rash-eval for my detailed review per section of this version.
Experimental design
The paper describes well the motivation and design of the RASH framework, while also giving an extensive and up to date review of comparative technologies and approaches. The paper explains also challenges and peculiarities encountered in its implementation.
RASH is a well-designed subset of HTML that emphasizes document structure and semantic annotations. I think it could also be argued that unlike "any HTML5" this design also improves longevity for articles published in RASH HTML.
My only slight concern is the extension of WAI-ARIA roles (e.g. "doc-endnotes"), for which I could not find any citations on whether they are allowed (or not) within HTML5; as well as the novel use of RDF/XML in an HTML script tag.
Validity of the findings
The survey part provides valuable insight into the uptake potential of RASH-like technology - although this should be taken with a grain of salt as the relatively low number of survey participants means the data is (as the authors point out) NOT statistically significant. The paper does however provide a good qualitative analysis of the findings, which warrants their inclusion.
The authors provide well-reasoned conclusions. While my previous review identified some speculative language, this has now been improved.
Comments for the author
I am Stian Soiland-Reyes http://orcid.org/0000-0001-9842-9718 and believe in open reviews.
This review is licensed under a Creative Commons Attribution 4.0 International License http://creativecommons.org/licenses/by/4.0/
The authors present the RASH framework, a subset of XHTML for academic publishing, along with software tools for its validation and conversion. The paper reviews the state of the art in academic HTML publishing, motivates and details the design of the framework, and evaluates its uptake and future challenges.
While personally I would have welcomed a more visionary/revolutionary approach for changing academic publishing for the Web, the authors take a more conservative approach with emphasis on pragmatic tooling to support existing authoring workflows (e.g. support for LaTeX and MS Word). From this, RASH can provide a valuable stepping stone for more structured and accessible Web publication workflows for academic publishing.
I think this is a solid article that presents an important contribution to the further development of web-based scholarly communication.
I recommend this article as "Accept" - although I have left some annotations in https://via.hypothes.is/https://essepuntato.github.io/papers/rash-peerj2016/2017-07-06.html#rash-eval that I hope the authors will consider (along with this review) if a revision nevertheless takes place.
-
For instance, several of the papers written by Semantic Web experts do not include any RDF statements other than those enforced by RASH.
"Eat your own dog food" is still not common practice in the SW community.
-
, tend to have difficulties with HTML editors
Why?
-
SAVE-SD 2015 Survey
Good qualitative summary of the surveys. We can't conclude too much from the quantitative side (Table 3) as the low number of participants (6+7 authors, 4+3 reviewers) doesn't provide much statistical significance alone.
-
available at [28]
Great! Complete datasets provided, figshare DOI.
-
in developing RASH, we discussed which of the following two possible approaches fo
Great details on how this was approached!
-
rmat, i.e. interactiveness, accessibility and easiness to be processed by machine. In addition, RAJE uses the GitHub API so as to allow authors to store their articles online, to keep track of changes by means of the GitHub services, and to share the articles with others.
Very cool!
-
. https://rawgit.com/gspinaci/raje-doceng2017/master/raje-doceng2017.html
BTW - in the cited section 4.3 images are missing and some TODOs are scattered.
-
HTML5 document6 [22], by specifying the generic HTML DOCTYPE followed by the document element html with the usual namespace ( http://www.w3.org/1999/xhtml)
Is it HTML5 or XHTML5 or either?
-
with the attribute type set to application/rdf+xml for adding plain RDF/XML content [19].
Would RDF/XML actually be valid within XHTML's script tag? Do you mean within a CDATA block? This custom script block sounds dangerous to me without specifying clearly exactly how it should be used and escaped, as (unlike JSON-LD and Turtle) the RDF/XML specifications do not define how to use it inside <script> within HTML/XHTML (perhaps deliberately).
IMHO no-one should encourage RDF/XML anymore, so I would prefer if RASH never even recommended that :)
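The escaping concern can be illustrated with a minimal sketch in Python, assuming the document is processed by an XML parser (as XHTML would be). The Turtle payload, the helper names wrap and is_well_formed, and the surrounding page skeleton are illustrative assumptions, not taken from RASH.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Illustrative Turtle payload; angle brackets around IRIs are what cause trouble.
TURTLE = "@prefix pro: <http://purl.org/spar/pro/> ."


def wrap(payload: str) -> str:
    """Embed a payload in a <script> element of a tiny XHTML skeleton."""
    return (
        f'<html xmlns="{XHTML_NS}"><head>'
        f'<script type="text/turtle">{payload}</script>'
        "</head><body/></html>"
    )


def is_well_formed(document: str) -> bool:
    """Return True if an XML parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


raw = wrap(TURTLE)                          # payload pasted in unescaped
cdata = wrap(f"<![CDATA[{TURTLE}]]>")       # payload protected by a CDATA block

print(is_well_formed(raw))
print(is_well_formed(cdata))
```

Under these assumptions the unescaped variant is rejected by the XML parser, while the CDATA variant parses and yields the original payload as the text of the script element. The same reasoning would apply to any embedded serialisation that contains `<` or `&`.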
-
<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
Note that this MathJax CDN has been retired, you should instead recommend something like
https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=....
-
doc-bibliography
While I like this reasoning, it is not clear to me from WAI-ARIA and HTML5's definition of the role attribute that such custom roles are allowed. Have they been suggested to the W3C ARIA working group? (ARIA 1.1 is still at Candidate Recommendation stage, semi-open for comments)
-
doc-endnote
These are not official WAI-ARIA roles. Why not just the official note inside and class="doc-endnotes" outside?
-
able at http://cs.unibo.it/save-sd/rash,
Is this an implied permalink, or could one with w3id.org be made?
-
quite ambiguous structures
Quotation or example for ambiguous? HTML5 semantics have improved this a bit, but HTML5 still allows, for instance, implicit paragraphs, which complicate post-processing.
-
We applied these guidelines for the definition of RASH. We restricted HTML, which does not use the aforementioned patterns in a systematic way, allowing the creation of arbitrary and, sometimes, quite ambiguous structures by selecting a good subset of elements expressive enough to capture the typical components of a scholarly article while being also well-designed, easy to reuse and robust.
--> HTML does not use the aforementioned patterns in a systematic way, as it allows the creation of arbitrary and, sometimes, quite ambiguous structures.
To apply the structural pattern guidelines for RASH, we restricted HTML by selecting a good subset of elements expressive enough to capture the typical components of a scholarly article while being also well-designed, easy to reuse and robust.
-
rendered by browsers or other software readers
Longevity-wise there's also an issue in that future browsers might not render any "advanced" HTML usage in the same way. Secondly, having the full range of HTML available could encourage the use of active content, which, although enriching the reading experience today, would run a higher risk of the external content and JavaScript technologies no longer working just a few years later.
While we don't know what future browsers would expect, a restricted HTML subset would presumably run a lower risk of such degradation - particularly if it largely overlaps with elements that have survived earlier HTML versions.
-
As of August 10, 2016, the online documentation is mainly a fork of the Scholarly HTML specification proposed by science.ai discussed above.
How compatible is RASH with this new Scholarly HTML approach?
-
Of course, publishers, conferences, and workshop organisers, would like to manage new formats in the same way as for the formats they already support
I don't think this should be a given - we must also push publishers and workshops along into the desired "future".
-
is expected to be produced from MS Word, ODT and LaTeX sources
So RASH fits into the existing paper production workflow as an output format -- but should it not also be the target for authoring Web-first papers in HTML?
-
- Nov 2016
-
essepuntato.github.io
-
The language of the article is OK, but I'm afraid it needs some work in several places to improve clarity, e.g. by rephrasing or simplification.
Some of the data (the CSV files) has been shared on Figshare and cited as such, but I am missing the raw data of the extracted RDF annotations as well as the scripts used for extraction.
The HTML file of the article in RASH format has for some reason not been submitted as an additional file (only cited by URL) -- perhaps this was not supported by PeerJ's submission system?
The RASH framework and associated software are referenced by GitHub URLs, but without using versioning. For archival purposes and future availability I would appreciate a Zenodo or Figshare archive of a tagged version of the software, cited using a DOI.
-
This article presents RASH, an HTML-based format for authoring, exchanging and publishing academic articles, arguing that this allows a "Web-first" approach to authoring with a focus on content; but with facilities for semantic annotations. The associated RASH software allows conversion to more traditional article styles (for PDF), as well as conversion from a traditional word processor to RASH HTML.
While the authors argue that we should aim for a "Web first" publishing model with no conversion to traditional PDFs, as championed by the perhaps more visionary linkedresearch.org movement, here it is proposed that RASH gives a pragmatic approach that requires smaller adaptation to existing co-authoring and publishing workflows.
The authors have performed a kind of usability survey for RASH users at two consecutive workshops, which gives validity to the claims of its purpose, but also (as recognized by the authors) highlights the current gap in tooling and skills to produce the underlying HTML and RDF annotations.
I think RASH can be seen as an important element of modernizing academic publishing; and while it can be argued that a restricted HTML template like RASH can limit academic authors from publishing articles augmented with state of the art web technology (for instance for interactive data rendering), this model is also a stepping-stone with a stronger focus on portability and longevity that lowers the barriers to get existing publishers on board.
-
The survey data is robust, but the sample size is perhaps a bit small to be statistically sound. This is however helped by the fact that the survey was run over two consecutive years.
The paper makes several unfounded claims using words like "guarantees" - I believe this is more of an English language/grammar issue than actual claims. See detailed comments in-line.
-
guarantee
You can't guarantee this. "In order to provide visualization"
-
1751 annotations
Are these extracted annotations available as a dataset somewhere?
-
i.e,
Remove i.e. - there is only one XHTML namespace.
-
It allows authors to use RDFa annotations [31] within any element of the language5. In addition to RDFa, RASH makes available another way to add RDF statements to the document, i.e., the use of an element script (with the attribute type set to application/rdf+xml, text/turtle or to application/ld+json) within the element head for adding plain RDF/XML [13], Turtle [28] or JSON-LD content [32].
I would remove these RDF details as this is too early/scary here - they are also explained more later. Perhaps just "Allows authors to use embedded RDF annotations"
-
.
Remove "."
-
Springer's Lecture Note in Computer Science
Is this based on Capadisli's Springer CSS as used in Dokieli?
-
Note that accepting HTML as format for submissions in conferences/workshops is a totally different issue, since this choice is normally taken by the organisers.
I've found EasyChair had problems accepting RASH HTML files and I instead had to make my own PDF from the browser, where it was difficult to also ensure that hyperlinks worked.
-
english
English (capital E)
-
Angree
Strongly Agree
-
editions of the SAVE-SD workshop
two SAVE-SD workshops (it's not an edition)
-
edition
workshops
-
edition
workshop
-
second edition
second year
-
analisys
analysis
-
simple tools for helping authors in this phase.
For instance, the OpenLink Structured Data Editor
-
can find hard
can find it hard
-
who
who currently cannot (...)
-
in addition to
other than
-
For instance, several of the papers written by Semantic Web experts do not include any RDF statement in addition to those annotations that are enforced by RASH.
Perhaps the authors were not too concerned with adding RDF if the conference call did not make any "carrot" promises of what any RDF statements would be used for, e.g. added to a SPARQL-queryable triple store across all submitted papers or recognized/rendered by the publisher's system.
-
, which are enforced by default in RASH,
"(used by default by RASH)" in parentheses
-
spontaneously decided
"decided themselves" - it might not have happened spontaneously!
-
several CSV files
Data citation
-
a Python script we developed
Code citation
-
e W3C RDFa 1.1 Distiller service on e
Note that of the three <script> options mentioned above, this Distiller only supports RDFa and embedded Turtle, not embedded JSON-LD or the custom embedded RDF/XML.
-
Some models are already available under the terms of the Apache Licence at http://opennlp.sourceforge.net/models-1.5/. [back]
Remove footnote, just make the sourceforge URL be the citation.
-
reviews
reviewers
-
Since, the program committee, the reviews and the editors will also have access to a LaTeX or a PDF version of the paper, the RASH file is an addition that does not preclude any current workflows
Rephrase: "As the program committee, reviewers and editors also have access ..."
-
very easy
fairly easy
-
chanche
chance
-
Java application
Code citation?
-
that can be downloaded
From where?
-
web-based service
Where?
-
in
to
-
XSLT 2.0 document
XSLT document -> XSLT script.
Code citation?
-
automatically
How is it done automatically? Is there an OpenOffice extension for this?
Do you mean "by executing the converter script"?
ODT documents are ZIP files, how can an XSLT script access the XML inside it without first unpacking? Presumably this must be coordinated from a second script. Code citation?
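A minimal sketch of the unpacking step in Python, assuming the transformation needs the content.xml member of the ODT package. This is not the converter used by RASH; the function names are hypothetical and the demo package holds stand-in XML rather than real OpenDocument markup.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_odt_content(odt_file):
    """Return the parsed root element of content.xml from an ODT package."""
    with zipfile.ZipFile(odt_file) as package:
        with package.open("content.xml") as member:
            return ET.parse(member).getroot()


def make_demo_odt():
    """Build an in-memory ZIP that mimics the layout of an ODT file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        # Stand-in XML; a real content.xml uses the OpenDocument namespaces.
        package.writestr("content.xml", "<doc><p>Hello RASH</p></doc>")
    buffer.seek(0)
    return buffer


root = read_odt_content(make_demo_odt())
print(root.find("p").text)
```

Only after a step like this could an XSLT processor be handed the extracted XML, which is why a coordinating script (or an extension inside the word processor) seems necessary.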
-
related
saved
-
OpenOffice
-
This is, actually, one of the crucial step to guarantee the use of RASH
"We believe this is essential to help uptake of RASH usage"
-
RASH, of any
"RASH or any"
-
as required
currently requested
-
developed will take care about deciding
the JavaScript decides what..
-
scripts
Javascript script -> Javascript
-
just an empty link
This functionality is sweet - but it seems to be partially broken in the PeerJ PDF for the section references (But not tables!)
-
developed a script
Which script? Add code citation.
-
Among other things above just using the RASH grammar only, this script adds relatively sophisticated checking of the datatype microsyntaxes of attribute values.
Simplify sentence: This script also checks datatype microsyntaxes.
-
This ensures that RASH users get alerted to more potential mistakes in their documents so that they can easily fix them.
Rephrase, something like: This will hopefully help RASH users to discover and fix any mistakes in their documents early.
-
full set of HTML checks
.. also against the HTML specification using the W3C Nu HTML Checker ...
-
does for all HTML documents (by checking all requirements given in the HTML specification)
Unnecessary - remove this part of sentence
-
clear
Remove "clear"
-
We are actually using some external libraries
RASH uses external libraries like Bootstrap and JQuery to guarantee (..)
-
The whole visualisation of this document (as any other RASH document)
As this paper, if accepted, is transferred to other forms (e.g. PDF), this sentence would no longer be true. Rephrase to "The visualizations of RASH documents are rendered by the browser .."
-
The full project is available at https://github.com/essepuntato/rash/. Please use the hashtag #rashfwk for referring to any of the items defined in the RASH Framework via Twitter or other social platforms.
This footnote is appropriate for a website, but not for a paper. I would remove it and just use a normal citation for https://github.com/essepuntato/rash/
I would prefer also a software citation using a Zenodo archived version of the github repository.
-
<http://purl.org/spar/pro/>
Won't < and > need to be escaped for this to be valid XHTML? Alternatively use a CDATA block.
-
application/rdf+xml for adding plain RDF/XML content
This is not standardized for RDF/XML -- it should be made clear that this is a non-standard extension.
-
Turtle content [28];
The section https://www.w3.org/TR/turtle/#in-html shows Turtle <script>, and the equivalent https://www.w3.org/TR/json-ld/#embedding-json-ld-in-html-documents for JSON-LD.
Is it possible in RASH to have multiple #anchor hyperlinks to the same [28] bibliographic reference? (e.g. to avoid forcing another reference)
-
its RASH version
Should be permalink https://w3id.org/people/essepuntato/papers/rash-peerj2016.html, not https://rawgit.com/essepuntato/rash/master/papers/rash-peerj2016.html (which gave 404)
-
It is worth noticing that, excepting three properties from schema.org for defining author's metadata (see Section 2 of the RASH documentation for additional details), RASH does not constrain any particular vocabulary for introducing RDF statements
Rephrase: It is worth noticing that RASH does not constrain any particular vocabulary for introducing RDF statements, except for three properties from schema.org for defining author's metadata (see Section 2 of the RASH documentation for additional details).
-
statements
statement (it's just one triple)
-
https://cdn.mathjax.org
If RASH papers are using such external scripts, which, like here, refer to /latest/ and can change (or disappear) over time - is there not a potentially large preservability problem in that RASH documents can't be archived directly?
I think you can argue that RASH is mainly a "live" format for interchange between co-authors, reviewers and publishers, and that once a paper is to be published, the document is "rendered" from RASH to PDF and/or publication-specific HTML; where such HTML would be modified to use snapshots/archived external resources for embedded images and scripts. Perhaps this could be a new small paragraph or part of Discussion?
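As a rough illustration of how such dependencies could be spotted before archiving, here is a minimal Python sketch that lists external resources whose URL contains a floating /latest/ segment. The "/latest/" heuristic, the class and function names, and the local paths in the sample (js/rash.js, img/figure1.png) are assumptions made for this example only.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExternalResourceFinder(HTMLParser):
    """Collect URLs of scripts, images and linked files hosted on other servers."""

    def __init__(self):
        super().__init__()
        self.external = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        url = None
        if tag in ("script", "img"):
            url = attr_map.get("src")
        elif tag == "link":
            url = attr_map.get("href")
        # A non-empty network location means the resource is not local.
        if url and urlparse(url).netloc:
            self.external.append(url)


def find_floating_urls(html):
    """Return external URLs that point at an unversioned '/latest/' path."""
    finder = ExternalResourceFinder()
    finder.feed(html)
    return [url for url in finder.external if "/latest/" in url]


SAMPLE = """<html><head>
<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> </script>
<script src="js/rash.js"></script>
</head><body><img src="img/figure1.png"/></body></html>"""

for url in find_floating_urls(SAMPLE):
    print(url)
```

A publisher-side "rendering" step could use a report like this to decide which resources to snapshot or pin to a specific version before the article is archived.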
-
it
it -> MathJax
-
all browsers
in most modern browsers
-
the
"Both options"
-
other two ways
"two other ways"
-
πr2
This came out as just "r²" in the PeerJ PDF (pi symbol was missing).
-
guarantee a
to provide
-
possible (and more generic)
"their more generic alternatives"
-
back-compatible
backwards compatible
-
, introduced as follows:
".. by extending the HTML5 section element with @role set to doc-footnotes and doc-footnote"
-
The first, is a container-based behaviour, also suggested by JATS [21] by means of the element fn-group, that allows one to specify footnotes (through the element ft) by using a tag that is totally separated from the main text from which it is referenced (usually through XML attributes), as shown in the following excerpt:
It's worth pointing out that fn-group and ft are JATS-specific elements that are not part of HTML5
-
The first,
Remove comma
-
A different discourse can be done for the pattern popup, which is used for any structure that, while still not allowing text content inside itself, is nonetheless found in elements with a mixed content context [+t+s], and it is meant to represent complex substructures that interrupt but do not break the main flow of the text,
I'm afraid I didn't understand these sentences well - could they be rephrased or shortened?
-
acceptable/accepted
"acceptable for"
-
environment
such an environment
-
many scientific papers
not in the papers themselves - but in the software producing those papers.
-
readability
clarity
-
Formulas have been taken in particular consideration, since different ways are possible so as to implement them
Rephrase to avoid "possible so as" and use active language. "We have given particular consideration to formulas, since there are different ways to implement them"
-
allows
allows the (..)
-
HTLM5
HTML5
-
ormal grammar of RASH
This link goes to a live version on raw.githubusercontent.com which is likely to change. Could you use a versioned permalink, e.g. using w3id.org or rawgit.com?
-
contain additional ones
additional elements
-
this is a quite odd situation
rephrase to professional English
-
As already mentioned
You can remove these repetitive "as already..." phrases
-
must/should
Confusing.. must or should? I guess you mean that some metadata is MUST and other is SHOULD? Rephrase.
-
(X)HTML5 document
XHTML or HTML? And which specification version?
-
(which is currently a working draft)
No need to mention "working draft" here as long as the citation clarifies that.
-
which is not pattern-based at all
This claim is unfounded - HTML, and in particular XHTML, has a detailed specification of which elements are allowed where -- however, these are not fully the structural patterns you mention above.
-
orthogonality
Use italics on the new concepts orthogonality and assemblability
-
a bunch of
several
-
development robust
development of robust
-
, we developed
"which we have developed over the past few years"
-
imprecise on the full HTML.
rephrase "on the HTML"
-
However, leaving the user (i.e., the author) the freedom of using, potentially, the whole HTML specification may affect, in some way, the whole writing and publishing process of articles.
Full HTML would also need serious consideration for archival and future accessibility purposes. RASH has some of the same problems, which are not addressed here - for instance, how do you make sure the CSS and images of the article are carried along with the HTML file?
Perhaps the article needs to be archived in a Research Object or archived using archive.org.
-
as primary language
as the primary format
-
in order
"aiming to" -- this paper itself shows it's not too easy to learn.
-
In services made available by the company would enable
I don't understand this sentence - rephrase?
-
oritented
oriented
-
HTML-oriented WYSIWYG editors
Authorea should be included in this section, as it now supports WYSIWYG HTML, Markdown and LaTeX in a collaborative UI.
-
The rest of the paper is structured as follows. In Section 2 we introduce some of the most relevant related works in the area, providing a functional comparison of the various works. In Section 3 we introduce the rationale for the creation of a new Web-first format for scholarly publication, discussing the importance of minimality. In Section 4 and Section 5 we introduce the theoretical background of RASH, and then provide an introduction of the language and the main tools included in its Framework. In Section 6, as a case study, we discuss the use of RASH as one of the formats for submitting papers to the SAVE-SD 2015 and SAVE-SD 2016 workshops. Finally, in Section 7 we conclude the paper sketching out some future developments.
"Section 2" etc. here and elsewhere don't show up correctly in the PDF submitted to PeerJ - the numbers are missing.
-
guarantee the possibility of developing tools
guarantee a possibility? Rephrase to something like "enable tools to convert and render"
-
The publishers will then take care of creating the final formatting to best render the content in the style of their publications.
What about self-publishing, as promoted by Linked Research?
-
is meant to be produced from MS Word, ODT and LaTeX sources
Yet the surveys in this paper describe authors mainly writing RASH by hand, not in Word/ODT/LaTeX.
Rephrase to something like "the goal of RASH is to be produced from .." ?
-
i.e.,
delete "i.e."
-
i.e.
delete "i.e." (which means "that is")
-
trasferred
transferred
-