autonomy, transparency, and collaborative improvement, which in the context of ML translate collectively into increased trust from a freedom-to-use, ownership, and attribution perspective.
But to understand how the system works and why it does what it does, we also need to understand the data used to train the model. And as the open data community has seen, the barriers to sharing data differ from those to sharing source code. In addition to copyright and liability disclaimers, data raise further considerations, such as privacy and other third-party rights in material that might be in the training set. To truly get the benefits of open ML, we need some additional information about the data used to build the overall ML system.
Might it be helpful to consider data provenance standards here, such as those we discussed in the context of the Data & Trust Alliance?
and hyperparameters (settings that are not learned from the data but are set prior to training)
We need to be able to share ML and our improvements in it so that anyone can collaborate to use, study, and improve ML for everyone.
I believe there is an opportunity to take this statement a step further: it is through understanding ML that we can unearth the risks and opportunities and define the boundaries of what is appropriate and what is harmful.