Hypothesis

35 Matching Annotations

Apr 2026
ai.meta.com

https://ai.meta.com/blog/introducing-muse-spark-msl/

1
1. fxp007 16 Apr 2026
  
  in Public
  
  we collaborated with over 1,000 physicians to curate training data that enables more factual and comprehensive responses.
  
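  Surprisingly, to improve Muse Spark's reasoning in the health domain, Meta collaborated with more than 1,000 physicians to curate training data. Expert participation on this scale is extremely rare in AI model development. It shows how much weight Meta places on accuracy in medicine and health, and it reflects a new trend toward specialized training of AI models.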
  
  surprising health-ai data-curation

Tags

health-ai

surprising

data-curation

Annotators

fxp007

URL

ai.meta.com/blog/introducing-muse-spark-msl/
Jul 2020
www.youtube.com

YouTube

1
1. ErikStuchly 13 Jul 2020
  
  in BehSci
  
  Supporting Open Science Data Curation, Preservation, and Access by Libraries. (2020, June 25). https://www.youtube.com/watch?v=SbmGWHpzAHs&feature=youtu.be
  
  is:youtube lang:en webinar open science data curation reproducibility knowledge management accessibility network transparency data storage data sharing video

Tags

reproducibility

data curation

accessibility

open science

lang:en

video

knowledge management

is:youtube

network

data storage

data sharing

transparency

webinar

Annotators

ErikStuchly

URL

youtube.com/
Jun 2020
zoom.us

Welcome! You are invited to join a webinar: Supporting Open Science Data Curation, Preservation, and Access by Libraries. After registering, you will receive a confirmation email about joining the webinar.

1
1. ErikStuchly 28 Jun 2020
  
  in BehSci
  
  Welcome! You are invited to join a webinar: Supporting Open Science Data Curation, Preservation, and Access by Libraries. After registering, you will receive a confirmation email about joining the webinar. (n.d.). Zoom Video. Retrieved June 28, 2020, from https://zoom.us/webinar/register/2615905946283/WN_W6dYUXQFTqGQjGAZPRB74w
  
  is:webpage lang:en webinar open science data curation accessibility reproducibility infrastructure network data sharing

Tags

accessibility

data curation

reproducibility

infrastructure

open science

lang:en

network

data sharing

webinar

is:webpage

Annotators

ErikStuchly

URL

zoom.us/webinar/register/2615905946283/WN_W6dYUXQFTqGQjGAZPRB74w
May 2020
arxiv.org

How the world's collective attention is being paid to a pandemic: COVID-19 related 1-gram time series for 24 languages on Twitter

1
1. edampf 29 May 2020
  
  in BehSci
  
  Alshaabi, T., et al. (2020 March 27). How the world's collective attention is being paid to a pandemic: COVID-19 related 1-gram time series for 24 languages on Twitter. Cornell University. arXiv:2003.12614
  
  is:preprint lang:en COVID-19 collective attention Twitter response data media social media curation divergence time series

Tags

social media

curation

COVID-19

data

divergence

response

lang:en

collective attention

is:preprint

time series

media

Twitter

Annotators

edampf

URL

arxiv.org/abs/2003.12614
Jan 2014
www.ncbi.nlm.nih.gov

Untitled document

7
1. aculich 31 Jan 2014
  
  in Public
  
  Reasons for not making data electronically available. Regarding their attitudes towards data sharing, most of the respondents (85%) are interested in using other researchers' datasets, if those datasets are easily accessible. Of course, since only half of the respondents report that they make some of their data available to others and only about a third of them (36%) report their data is easily accessible, there is a major gap evident between desire and current possibility. Seventy-eight percent of the respondents said they are willing to place at least some of their data into a central data repository with no restrictions. Data repositories need to make accommodations for varying levels of security or access restrictions. When asked whether they were willing to place all of their data into a central data repository with no restrictions, 41% of the respondents were not willing to place all of their data. Nearly two thirds of the respondents (65%) reported that they would be more likely to make their data available if they could place conditions on access. Less than half (45%) of the respondents are satisfied with their ability to integrate data from disparate sources to address research questions, yet 81% of them are willing to share data across a broad group of researchers who use data in different ways. Along with the ability to place some restrictions on sharing for some of their data, the most important condition for sharing their data is to receive proper citation credit when others use their data. For 92% of the respondents, it is important that their data are cited when used by other researchers. Eighty-six percent of survey respondents also noted that it is appropriate to create new datasets from shared data. Most likely, this response relates directly to the overwhelming response for citing other researchers' data. The breakdown of this section is presented in Table 13.
  
  Categories of data sharing considered:
  
  I would use other researchers' datasets if their datasets were easily accessible.
  
  I would be willing to place at least some of my data into a central data repository with no restrictions.
  
  I would be willing to place all of my data into a central data repository with no restrictions.
  
  I would be more likely to make my data available if I could place conditions on access.
  
  I am satisfied with my ability to integrate data from disparate sources to address research questions.
  
  I would be willing to share data across a broad group of researchers who use data in different ways.
  
  It is important that my data are cited when used by other researchers.
  
  It is appropriate to create new datasets from shared data.
  
  RIT data curation data sharing categories survey results
2. aculich 31 Jan 2014
  
  in Public
  
  Data sharing practices. Only about a third (36%) of the respondents agree that others can access their data easily, although three-quarters share their data with others (see Table 11). This shows there is a willingness to share data, but it is difficult to achieve or is done only on request.
  
  There is a willingness, but not a way!
  
  RIT data curation data sharing
3. aculich 31 Jan 2014
  
  in Public
  
  Nearly one third of the respondents chose not to answer whether they make their data available to others. Of those who did respond, 46% reported they do not make their data electronically available to others. Almost as many reported that at least some of their data are available somehow, either on their organization's website, their own website, a national network, a global network, a personal website, or other (see Table 10). The high percentage of non-respondents to this question most likely indicates that data sharing is even lower than the numbers indicate. Furthermore, the fact that fewer than 6% of scientists are making “All” of their data available via some mechanism tends to reinforce the lack of data sharing within the communities surveyed.
  
  RIT data curation data sharing
4. aculich 31 Jan 2014
  
  in Public
  
  Adding descriptive metadata to datasets helps make the dataset more accessible to others and into the future. Respondents were asked to indicate all metadata standards they currently use to describe their data. More than half of the respondents (56%) reported that they did not use any metadata standard and about 22% of respondents indicated they used their own lab metadata standard. This could be interpreted to mean that over 78% of survey respondents use either no metadata or a local home-grown metadata approach.
  
  Not surprising that roughly 80% use no or ad hoc metadata.
  
  RIT data curation metadata ad hoc
5. aculich 31 Jan 2014
  
  in Public
  
  Data reuse. Respondents were asked to indicate whether they have the sole responsibility for approving access to their data. Of those who answered this question, 43% (n=545) have the sole responsibility for all their datasets, 37% (n=466) have for some of their datasets, and 21% (n=266) do not have the sole responsibility.
  
  RIT data curation reuse responsibility
6. aculich 31 Jan 2014
  
  in Public
  
  Policies and procedures sometimes serve as an active rather than passive barrier to data sharing. Campbell et al. (2003) reported that government agencies often have strict policies about secrecy for some publicly funded research. In a survey of 79 technology transfer officers in American universities, 93% reported that their institution had a formal policy that required researchers to file an invention disclosure before seeking to commercialize research results. About one-half of the participants reported institutional policies that prohibited the dissemination of biomaterials without a material transfer agreement, which have become so complex and demanding that they inhibit sharing [15].
  
  Policies and procedures are barriers, but there are many more barriers beyond that which get in the way first.
  
  RIT data curation policy barriers technology transfer
7. aculich 31 Jan 2014
  
  in Public
  
  data practices of researchers – data accessibility, discovery, re-use, preservation and, particularly, data sharing
  
  data accessibility
  
  discovery
  
  re-use
  
  preservation
  
  data sharing
  
  RIT data curation practices

Tags

responsibility

data curation

ad hoc

technology transfer

practices

metadata

categories

data sharing

policy

survey results

RIT

barriers

reuse

Annotators

aculich

URL

ncbi.nlm.nih.gov/pmc/articles/PMC3126798/
www.dataone.org

Untitled document

10
1. aculich 31 Jan 2014
  
  in Public
  
  The Data Life Cycle: An Overview. The data life cycle has eight components:
  
  Plan: description of the data that will be compiled, and how the data will be managed and made accessible throughout its lifetime
  
  Collect: observations are made either by hand or with sensors or other instruments and the data are placed into digital form
  
  Assure: the quality of the data is assured through checks and inspections
  
  Describe: data are accurately and thoroughly described using the appropriate metadata standards
  
  Preserve: data are submitted to an appropriate long-term archive (i.e. data center)
  
  Discover: potentially useful data are located and obtained, along with the relevant information about the data (metadata)
  
  Integrate: data from disparate sources are combined to form one homogeneous set of data that can be readily analyzed
  
  Analyze: data are analyzed
  
  The lifecycle according to who? This 8-component description is from the point of view of only the people who obsessively think about this "problem".
  
  Ask a researcher and I think you'll hear that lifecycle means something like:
  
  collect -> analyze -> publish
  
  or a more complex data management plan might be:
  
  ask someone -> receive data in email -> analyze -> cite -> publish -> tenure
  
  To most people lifecycle means "while I am using the data" and archiving means "my storage guy makes backups occasionally".
  
  Asking people to be aware of the whole cycle outlined here is a non-starter, but I think there is another approach to achieve what we want... dramatic pause [to be continued]
  
  What parts of this cycle should the individual be responsible for vs which parts are places where help is needed from the institution?
  
  RIT data curation jargon lifecycle critique opinion responsibility
2. aculich 31 Jan 2014
  
  in Public
  
  Data represent important products of the scientific enterprise that are, in many cases, of equivalent or greater value than the publications that are originally derived from the research process. For example, addressing many of the grand challenge scientific questions increasingly requires collaborative research and the reuse, integration, and synthesis of data.
  
  Who else might care about this other than Grand Challenge Question researchers?
  
  data curation grand challenge questions RIT
3. aculich 31 Jan 2014
  
  in Public
  
  Journals and sponsors want you to share your data
  
  What is the sharing standard? What are the consequences of not sharing? What is the enforcement mechanism?
  
  There are three primary sharing mechanisms I can think of today: email, USB stick, and Dropbox (née FTP).
  
  The Dropbox option is supplanting FTP, which comes from another era but still satisfies an important niche for larger data sets and/or higher-volume or anonymous traffic.
  
  Dropbox, email, and USB are all easily accessible parts of the day-to-day consumer workflow; they are all trivial to set up without institutional support or, importantly, permission.
  
  An email account is already provisioned by default for everyone or, if the institutional email offerings are not sufficient, a person may easily set up a 3rd-party email account with no permission or hassle.
  
  Data management alternatives to these three options will have slow or no adoption until the barriers to access and use are as low as email; the cost of entry needs to be no more than "a web browser, an email address, and no special permission required".
  
  RIT data curation data management data sharing sharing standards barriers adoption
4. aculich 31 Jan 2014
  
  in Public
  
  An effective data management program would enable a user 20 years or longer in the future to discover, access, understand, and use particular data [3]. This primer summarizes the elements of a data management program that would satisfy this 20-year rule and are necessary to prevent data entropy.
  
  Who cares most about the 20-year rule? This is an ideal that appeals to some, but in practice even the most zealous adherents can't picture concretely what it looks like, except in the most traditional ways: physical paper journals in libraries are tangible examples of the 20-year rule.
  
  Until we have a digital equivalent for data, I don't blame people looking for tenure or jobs for not caring about this ideal if we can't provide a clear picture of how to achieve it widely at an institutional level. For digital materials I think the picture people have in their minds is of tape backup. Maybe this is generational? Perhaps only when new generations arrive who were not widely exposed to cassette tapes, DVDs, and the other physical media that "old people" remember will it be possible to have a new ideal that people can see in their mind's eye.
  
  RIT data curation data management 20-year rule critique opinion ideals vision tangible jargon data entropy tape backup
5. aculich 31 Jan 2014
  
  in Public
  
  A key component of data management is the comprehensive description of the data and contextual information that future researchers need to understand and use the data. This description is particularly important because the natural tendency is for the information content of a data set or database to undergo entropy over time (i.e. data entropy), ultimately becoming meaningless to scientists and others [2].
  
  I agree with the key component mentioned here, but I feel the term data entropy is an unhelpful crutch.
  
  RIT data curation data management key component jargon data entropy
6. aculich 31 Jan 2014
  
  in Public
  
  data entropy: Normal degradation in information content associated with data and metadata over time (paraphrased from [2]).
  
  I'm not sure what this really means and I don't think data entropy is a helpful term. Poor practices certainly lead to disorganized collections of data, but I think this notion comes from a time when people were very concerned about degradation of physical media on which data is stored. That is, of course, still a concern, but I think the term data entropy really lends itself as an excuse for people who don't use good practices to manage data and is a cover for the real problem which is a kind of data illiteracy in much the same way we also face computational illiteracy widely in the sciences. Managing data really is hard, but let's not mask it with fanciful notions like data entropy.
  
  RIT data curation jargon data entropy entropy illiteracy critique opinion
7. aculich 31 Jan 2014
  
  in Public
  
  Although data management plans may differ in format and content, several basic elements are central to managing data effectively.
  
  What are the "several basic elements?"
  
  RIT data curation question
8. aculich 31 Jan 2014
  
  in Public
  
  By documenting your data and recommending appropriate ways to cite your data, you can be sure to get credit for your data products and their use
  
  Citation is an incentive. An answer to the question "What's in it for me?"
  
  RIT data curation citation incentive
9. aculich 31 Jan 2014
  
  in Public
  
  This primer describes a few fundamental data management practices that will enable you to develop a data management plan, as well as how to effectively create, organize, manage, describe, preserve and share data
  
  Data management practices:
  
  create
  
  organize
  
  manage
  
  describe
  
  preserve
  
  share
  
  RIT data curation data management practices
10. aculich 31 Jan 2014
  
  in Public
  
  The goal of data management is to produce self-describing data sets. If you give your data to a scientist or colleague who has not been involved with your project, will they be able to make sense of it? Will they be able to use it effectively and properly?
  
  RIT data curation question

Tags

responsibility

data curation

citation

20-year rule

vision

opinion

practices

grand challenge questions

key component

jargon

ideals

barriers

lifecycle

illiteracy

entropy

question

sharing standards

tape backup

data entropy

critique

incentive

data management

tangible

adoption

data sharing

RIT

Annotators

aculich

URL

dataone.org/sites/all/documents/DataONE_BP_Primer_020212.pdf
www.alexandria.ucsb.edu

Untitled document

14
1. aculich 31 Jan 2014
  
  in Public
  
  One respondent noted that NSF doesn't have an enforcement policy. This is presumably true of other mandate sources as well, and brings up the related and perhaps more significant problem that mandates are not always (if they are ever) accompanied by the funding required to satisfy them. Another respondent wrote that funding agencies expect universities to contribute to long-term data storage.
  
  RIT data curation mandate enforcement
2. aculich 31 Jan 2014
  
  in Public
  
  Data management activities, grouped. The data management activities mentioned by the survey can be grouped into five broader categories: "storage" (comprising backup or archival data storage, identifying appropriate data repositories, day-to-day data storage, and interacting with data repositories); "more information" (comprising obtaining more information about curation best practices and identifying appropriate data registries and search portals); "metadata" (comprising assigning permanent identifiers to data, creating and publishing descriptions of data, and capturing computational provenance); "funding" (identifying funding sources for curation support); and "planning" (creating data management plans at proposal time). When the survey results are thus categorized, the dominance of storage is clear, with over 80% of respondents requesting some type of storage-related help. (This number may also reflect a general equating of curation with storage on the part of respondents.) Slightly fewer than 50% of respondents requested help related to metadata, a result explored in more detail below.
  
  Categories of data management activities:
  
  storage
  
  backup/archival data storage
  
  identifying appropriate data repositories
  
  day-to-day data storage
  
  interacting with data repositories
  
  more information
  
  obtaining more information about curation best practices
  
  identifying appropriate data registries
  
  search portals
  
  metadata
  
  assigning permanent identifiers to data
  
  creating/publishing descriptions of data
  
  capturing computational provenance
  
  funding
  
  identifying funding sources for curation support
  
  planning
  
  creating data management plans at proposal time
  
  RIT data curation data management categories
3. aculich 30 Jan 2014
  
  in Public
  
  Data management activities, grouped. The data management activities mentioned by the survey can be grouped into five broader categories: "storage" (comprising backup or archival data storage, identifying appropriate data repositories, day-to-day data storage, and interacting with data repositories); "more information" (comprising obtaining more information about curation best practices and identifying appropriate data registries and search portals); "metadata" (comprising assigning permanent identifiers to data, creating and publishing descriptions of data, and capturing computational provenance); "funding" (identifying funding sources for curation support); and "planning" (creating data management plans at proposal time). When the survey results are thus categorized, the dominance of storage is clear, with over 80% of respondents requesting some type of storage-related help. (This number may also reflect a general equating of curation with storage on the part of respondents.) Slightly fewer than 50% of respondents requested help related to metadata, a result explored in more detail below.
  
  Storage is a broad topic and is a very frequently mentioned topic in all of the University-run surveys.
  
  http://www.alexandria.ucsb.edu/~gjanee/dc@ucsb/survey/plots/q4.2.png
  
  Highlight by Chris during today's discussion.
  
  RIT data curation diagram
4. aculich 30 Jan 2014
  
  in Public
  
  Distribution of departments with respect to responsibility spheres. Ignoring the "Myself" choice, consider clustering the parties potentially responsible for curation mentioned in the survey into three "responsibility spheres": "local" (comprising lab manager, lab research staff, and department); "campus" (comprising campus library and campus IT); and "external" (comprising external data repository, external research partner, funding agency, and the UC Curation Center). Departments can then be positioned on a tri-plot of these responsibility spheres, according to the average of their respondents' answers. For example, all responses from FeministStds (Feminist Studies) were in the campus sphere, and thus it is positioned directly at that vertex. If a vertex represents a 100% share of responsibility, then the dashed line opposite a vertex represents a reduction of that share to 20%. For example, only 20% of ECE's (Electrical and Computer Engineering's) responses were in the campus sphere, while the remaining 80% of responses were evenly split between the local and external spheres, and thus it is positioned at the 20% line opposite the campus sphere and midway between the local and external spheres. Such a plot reveals that departments exhibit different characteristics with respect to curatorial responsibility, and look to different types of curation solutions.
  
  This section contains an interesting diagram showing the distribution of departments with respect to responsibility spheres:
  
  http://www.alexandria.ucsb.edu/~gjanee/dc@ucsb/survey/plots/q2.5.png
  
  RIT data curation responsibility departments diagram distribution
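
The tri-plot positioning described in the quoted passage above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vertex layout (local bottom-left, external bottom-right, campus on top), the function name `triplot_position`, and the choice of a unit-side equilateral triangle are hypothetical and are not taken from the survey's own plotting code. Each department's three shares are treated as weights on the triangle's vertices.

```python
import math

# Hypothetical vertex layout (an assumption, not taken from the survey plot):
# local at bottom-left, external at bottom-right, campus at the top.
VERTICES = {
    "local": (0.0, 0.0),
    "external": (1.0, 0.0),
    "campus": (0.5, math.sqrt(3) / 2),
}


def triplot_position(local, campus, external):
    """Map three responsibility shares to an (x, y) point inside the triangle."""
    total = local + campus + external
    if total <= 0:
        raise ValueError("at least one share must be positive")
    # Normalize so the shares sum to 1, then take the weighted average of the vertices.
    weights = {
        "local": local / total,
        "campus": campus / total,
        "external": external / total,
    }
    x = sum(weights[name] * VERTICES[name][0] for name in VERTICES)
    y = sum(weights[name] * VERTICES[name][1] for name in VERTICES)
    return x, y


# The two departments used as examples in the quoted passage.
feminist_studies = triplot_position(local=0.0, campus=1.0, external=0.0)
ece = triplot_position(local=0.4, campus=0.2, external=0.4)
print("FeministStds:", feminist_studies)
print("ECE:", ece)
```

With this layout, a department whose responses are all in the campus sphere lands exactly on the campus vertex, and a department with a 20% campus share split evenly between the other two spheres lands one fifth of the way up the triangle, midway between the local and external vertices, which matches the description of ECE.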
5. aculich 30 Jan 2014
  
  in Public
  
  In the course of your research or teaching, do you produce digital data that merits curation? 225 of 292 (77%) of respondents answered "yes" to this first question, which corresponds to 25% of the estimated population of 900 faculty and researchers who received the survey.
  
  For those who do not feel they have data that merits curation, I would at least like to hear a description of the kinds of data they have and why they feel it does not need to be curated.
  
  Some people may already be using well-curated data sets. Others may feel their data is not useful to anyone outside their own research group, so they see no need to curate it for anyone else. Even so, under some definition of "curation" there may be important unmet curation needs for internal use only, visible only to the grad students or researchers who work with the data hands-on daily.
  
  UPDATE: My question is essentially answered here: https://hypothes.is/a/xBpqzIGTRaGCSmc_GaCsrw
  
  RIT data curation survey question question merit
6. aculich 30 Jan 2014
  
  in Public
  
  Responsibility, myself versus others. It may appear that responses to the question of responsibility are bifurcated between "Myself" and all other parties combined. However, respondents who identified themselves as being responsible were more likely than not to identify additional parties that share that responsibility. Thus, curatorial responsibility is seen as a collaborative effort. (The "Nobody" category is a slight misnomer here as it also includes non-responses to this question.)
  
  This answers my previous question about this survey item:
  
  https://hypothes.is/a/QrDAnmV8Tm-EkDuHuknS2A
  
  RIT data curation survey question responsibility collaborative effort answer
7. aculich 30 Jan 2014
  
  in Public
  
  Awareness of data and commitment to its preservation are two key preconditions for successful data curation.
  
  Great observation!
  
  RIT data curation survey question preconditions awareness commitment
8. aculich 30 Jan 2014
  
  in Public
  
  Which parties do you believe have primary responsibility for the curation of your data? Almost all respondents identified themselves as being personally responsible.
  
  Would those who identify themselves as personally responsible identify themselves (or their group) as the only ones responsible for the data? Or is there a belief that the institution should also be responsible in some way in addition to themselves?
  
  RIT data curation survey question question responsibility
9. aculich 30 Jan 2014
  
  in Public
  
  Availability of the raw survey data is subject to the approval of the UCSB Human Subjects Committee.
  
  http://www.research.ucsb.edu/compliance/human-subjects/
  
  RIT data curation survey data restrictions human subjects
10. aculich 30 Jan 2014
  
  in Public
  
  Survey design. The survey was intended to capture as broad and complete a view of data production activities and curation concerns on campus as possible, at the expense of gaining more in-depth knowledge.
  
  Summary of the survey design
  
  RIT data curation summary survey survey design
11. aculich 30 Jan 2014
  
  in Public
  
  Researchers may be underestimating the need for help using archival storage systems and dealing with attendant metadata issues.
  
  In my mind this is a key challenge. Even if people can describe what they need for themselves (in itself a very hard problem), it is not obvious how to implement infrastructure services that aid the individual researcher and also aid collaboration across individuals in the same domain, as well as across domains and institutions, in a long-term sustainable way.
  
  In essence... how do we translate needs that we don't yet fully understand into infrastructure with low barrier to adoption, use, and collaboration?
  
  RIT data curation key challenge question
12. aculich 30 Jan 2014
  
  in Public
  
  Researchers view curation as a collaborative activity and collective responsibility.
  
  RIT data curation collaboration responsibility
13. aculich 30 Jan 2014
  
  in Public
  
  To summarize the survey's findings:
  
  Curation of digital data is a concern for a significant proportion of UCSB faculty and researchers.
  
  Curation of digital data is a concern for almost every department and unit on campus.
  
  Researchers almost universally view themselves as personally responsible for the curation of their data.
  
  Researchers view curation as a collaborative activity and collective responsibility.
  
  Departments have different curation requirements, and therefore may require different amounts and types of campus support.
  
  Researchers desire help with all data management activities related to curation, predominantly storage.
  
  Researchers may be underestimating the need for help using archival storage systems and dealing with attendant metadata issues.
  
  There are many sources of curation mandates, and researchers are increasingly under mandate to curate their data.
  
  Researchers under curation mandate are more likely to collaborate with other parties in curating their data, including with their local labs and departments.
  
  Researchers under curation mandate request more help with all curation-related activities; put another way, curation mandates are an effective means of raising curation awareness.
  
  The survey reflects the concerns of a broad cross-section of campus.
  
  Summary of survey findings.
  
  RIT data curation survey findings summary
14. aculich 30 Jan 2014
  
  in Public
  
  In 2012 the Data Curation @ UCSB Project surveyed UCSB campus faculty and researchers on the subject of data curation, with the goals of 1) better understanding the scope of the digital curation problem and the curation services that are needed, and 2) characterizing the role that the UCSB Library might play in supporting curation of campus research outputs.
  
  1) better understanding the scope of the digital curation problem and the curation services that are needed
  
  2) characterizing the role that the UCSB Library might play in supporting curation of campus research outputs.
  
  RIT data curation goals

Tags

responsibility

data curation

collaborative effort

commitment

summary

distribution

survey data

survey question

collaboration

key challenge

survey

categories

goals

diagram

mandate

survey design

answer

question

enforcement

awareness

preconditions

data management

merit

human subjects

findings

RIT

departments

restrictions

Annotators

aculich

URL

alexandria.ucsb.edu/~gjanee/dc@ucsb/survey/
