10 Matching Annotations
  1. Sep 2023
    1. But to understand how the system works and why it does what it does, we need also to understand the data used to train the model. And as the open data community has seen, barriers to sharing data are different than sharing source code. In addition to copyright and liability disclaimers, data have additional considerations like privacy and other third-party rights that might be in the training set. In order to truly get the benefits of open ML we need some additional information about the data used to build the overall ML.

      I assume this is subject to discussion, and will ultimately be edited to match what is said under "data and dataset"?

    2. Any additional terms accompanying any form of the artifacts (such as terms of use, or patents held by the licensor) must not contradict the terms of the license. Must not have any contractual restrictions on modifications that can be made.

      So you're saying that it must be a copyleft license? Permissive licenses can have proprietary terms imposed

    3. In

      The biggest barriers that I learned about data is that what might be acceptable constraints in software, like attribution or describing changes, are overly burdensome in data. I think we need to consider whether even some standard FOSS constraints need to be precluded for open data

    4. Model architecture and its weights and training parameters

      I wonder if these should be separated? Architecture is conceptually different from weights, although is architecture the same as parameters?

    5. without fear of repercussion

      This strikes me as overly alarmist. I think it's enough to say "the ability to share and improve the software source code, the preferred form of modifying the work" and leave it at that.