19 Matching Annotations
  1. Mar 2019
    1. Data journalism produced by two of the nation’s most prestigious news organizations — The New York Times and The Washington Post — has lacked transparency, often failing to explain the methods journalists or others used to collect or analyze the data on which the articles were based, a new study finds. In addition, the news outlets usually did not provide the public with access to that data

      While this is a worthwhile topic, I would like to see more exploration of data journalism in the 99.99999 percent of news organizations that are NOT the New York Times or the Washington Post and don't have the resources to publish so many data stories despite the desperate need for them across the nation. Also, why no digital news outlets included?

    2. Worse yet, it wouldn’t surprise me if we saw more unethical people publish data as a strategic communication tool, because they know people tend to believe numbers more than personal stories. That’s why it’s so important to have that training on information literacy and methodology.”

      Like the way unethical people use statistics in general? This should be a concern, especially as government data, long considered the gold standard of data, undergoes attacks that would skew the data toward political ends. (see the census 2020)

  2. Oct 2018
    1. research publications are not research data

      they could be, if used as part of a text mining corpus, for example

  3. May 2018
    1. Negative values included when assessing air quality In computing average pollutant concentrations, EPA includes recorded values that are below zero. EPA advised that this is consistent with NEPM AAQ procedures. Logically, however, the lowest possible value for air pollutant concentrations is zero. Either it is present, even if in very small amounts, or it is not. Negative values are an artefact of the measurement and recording process. Leaving negative values in the data introduces a negative bias, which potentially under represents actual concentrations of pollutants. We noted a considerable number of negative values recorded. For example, in 2016, negative values comprised 5.3 per cent of recorded hourly PM2.5 values, and 1.3 per cent of hourly PM10 values. When we excluded negative values from the calculation of one‐day averages, there were five more exceedance days for PM2.5 and one more for PM10 during 2016.
  4. Sep 2017
    1. We’re delighted to announce that the California Digital Library has been awarded a 2-year NSF EAGER grant to support active, machine-actionable data management plans (DMPs).
  5. Sep 2016
    1. the risk of re-identification increases by virtue of having more data points on students from multiple contexts

      Very important to keep in mind. Not only do we realise that re-identification is a risk, but this risk is exacerbated by the increase in “triangulation”. Hence some discussions about Differential Privacy.

    2. the automatic collection of students’ data through interactions with educational technologies as a part of their established and expected learning experiences raises new questions about the timing and content of student consent that were not relevant when such data collection required special procedures that extended beyond students’ regular educational experiences of students

      Useful reminder. Sounds a bit like “now that we have easier access to data, we have to be particularly careful”. Probably not the first reflex of most researchers before they start sending forms to their IRBs. Important for this to be explicitly designated as a concern, in IRBs.

    3. Responsible Use

      Again, this is probably a more felicitous wording than “privacy protection”. Sure, it takes as a given that some use of data is desirable. And the preceding section makes it sound like Learning Analytics advocates mostly need ammun… arguments to push their agenda. Still, the notion that we want to advocate for responsible use is more likely to find common ground than this notion that there’s a “data faucet” that should be switched on or off depending on certain stakeholders’ needs. After all, there exists a set of data use practices which are either uncontroversial or, at least, accepted as “par for the course” (no pun intended). For instance, we probably all assume that a registrar should receive the grade data needed to grant degrees and we understand that such data would come from other sources (say, a learning management system or a student information system).

    4. Research: Student data are used to conduct empirical studies designed primarily to advance knowledge in the field, though with the potential to influence institutional practices and interventions. Application: Student data are used to inform changes in institutional practices, programs, or policies, in order to improve student learning and support. Representation: Student data are used to report on the educational experiences and achievements of students to internal and external audiences, in ways that are more extensive and nuanced than the traditional transcript.

      Ha! The Chronicle’s summary framed these categories somewhat differently. Interesting. To me, the “application” part is really about student retention. But maybe that’s a bit of a cynical reading, based on an over-emphasis in the Learning Analytics sphere towards teleological, linear, and insular models of learning. Then, the “representation” part sounds closer to UDL than to learner-driven microcredentials. Both approaches are really interesting and chances are that the report brings them together. Finally, the Chronicle made it sound as though the research implied here were less directed. The mention that it has “the potential to influence institutional practices and interventions” may be strategic, as applied research meant to influence “decision-makers” is more likely to sway them than the type of exploratory research we so badly need.

    1. the use of data in scholarly research about student learning; the use of data in systems like the admissions process or predictive-analytics programs that colleges use to spot students who should be referred to an academic counselor; and the ways colleges should treat nontraditional transcript data, alternative credentials, and other forms of documentation about students’ activities, such as badges, that recognize them for nonacademic skills.

      Useful breakdown. Research, predictive models, and recognition are quite distinct from one another and the approaches to data that they imply are quite different. In a way, the “personalized learning” model at the core of the second topic is close to the Big Data attitude (collect all the things and sense will come through eventually) with corresponding ethical problems. Through projects vary greatly, research has a much more solid base in both ethics and epistemology than the kind of Big Data approach used by technocentric outlets. The part about recognition, though, opens the most interesting door. Microcredentials and badges are a part of a broader picture. The data shared in those cases need not be so comprehensive and learners have a lot of agency in the matter. In fact, when then-Ashoka Charles Tsai interviewed Mozilla executive director Mark Surman about badges, the message was quite clear: badges are a way to rethink education as a learner-driven “create your own path” adventure. The contrast between the three models reveals a lot. From the abstract world of research, to the top-down models of Minority Report-style predictive educating, all the way to a form of heutagogy. Lots to chew on.

  6. Jun 2016
  7. Feb 2016
  8. Jan 2016
    1. The explosion of data-intensive research is challenging publishers to create new solutions to link publications to research data (and vice versa), to facilitate data mining and to manage the dataset as a potential unit of publication. Change continues to be rapid, with new leadership and coordination from the Research Data Alliance (launched 2013): most research funders have introduced or tightened policies requiring deposit and sharing of data; data repositories have grown in number and type (including repositories for “orphan” data); and DataCite was launched to help make research data cited, visible and accessible. Meanwhile publishers have responded by working closely with many of the community-led projects; by developing data deposit and sharing policies for journals, and introducing data citation policies; by linking or incorporating data; by launching some pioneering data journals and services; by the development of data discovery services such as Thomson Reuters’ Data Citation Index (page 138).
  9. Nov 2015
  10. Oct 2015
    1. The Coming of OERRelated to the enthusiasm for digital instructional resources,four-fifths (81percent) of the survey participants agreethat “Open Source textbooks/Open Education Resource(OER) content “will be an important source for instructional resources in five yea
  11. Aug 2015
    1. data deposition is limited to researchers working at the same institution,

      Not necessarily. For many institutions, as long as one of the researchers is affiliated, the data can be deposited

  12. Jan 2014
    1. We regularly provide scholars with access to content for this purpose. Our Data for Research site (http://dfr.jstor.org)

      The access to this is exceedingly slow. Note that it is still in beta.