9 Matching Annotations
  1. Oct 2024
    1. Inferred Data: Sometimes information that doesn’t directly exist can be inferred through data mining (as we saw last chapter), and the creation of that new information could be a privacy violation. This includes the creation of Shadow Profiles, which are information about the user that the user didn’t provide or consent to

      This is an interesting point to raise because it's one that seems exceedingly challenging to quantify. If Twitter sees that I frequently post about living in Seattle and shows me other posts about/from Seattle, even if I have turned location sharing off, is that creating a shadow profile? The nature of a lot of this data mining is that it results in a more personally-tailored social media experience which itself is more profitable for companies, making the ethical analysis of it complicated. On the far end of the spectrum, companies could store ZERO information about users (i.e. there are no accounts, everyone is an anonymous user) but this tends to result in much less usable platforms. I'd argue that, while there certainly should be more comprehensive regulations on data collection, a lot of inferred data is nothing that couldn't be figured out by a person looking at your account.
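
      The quoted point about inferred data can be made concrete with a toy sketch. The example below (my own illustration, not any platform's actual method, with a made-up city list and a hypothetical `infer_home_city` helper) shows how a likely home city could be inferred purely from post text, even with location sharing turned off:

      ```python
      from collections import Counter

      # Hypothetical, tiny lookup list purely for illustration.
      CITIES = ["seattle", "portland", "chicago", "austin"]

      def infer_home_city(posts):
          """Guess a user's home city from how often each city is mentioned."""
          mentions = Counter()
          for post in posts:
              text = post.lower()
              for city in CITIES:
                  if city in text:
                      mentions[city] += 1
          # The top city becomes an inferred attribute the user never provided.
          return mentions.most_common(1)[0][0] if mentions else None

      posts = [
          "Another rainy day in Seattle",
          "Best coffee shops in Seattle?",
          "Flying to Chicago next week",
      ]
      print(infer_home_city(posts))  # -> "seattle"
      ```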

    1. But while that is the proper security for storing passwords, companies don’t always follow it. For example, Facebook stored millions of Instagram passwords in plain text, meaning the passwords weren’t encrypted and anyone with access to the database could simply read everyone’s passwords. And Adobe encrypted their passwords improperly, and then hackers leaked their password database of 153 million users.

      This is a hard problem to solve, because companies can benefit greatly from scraping all of our private messages. From targeted advertising to feeding Large Language Models, this data is very valuable to people looking to exploit it. However, it is likely also in the public interest for the data to be at least accessible to someone, seeing as many acts of violence and right-wing terrorism are preceded by the sharing of a manifesto or outright discussion of the plan through chats and messaging servers. A solution could involve the FCC mandating encryption schemes for which it holds all the keys (a form of key escrow), although this introduces many new problems and would never happen.
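
      For context on what the quoted passage calls the "proper security for storing passwords," here is a minimal sketch using Python's standard hashlib (the parameter choices are illustrative assumptions, not a production recommendation): the site stores only a salted hash, never the password itself.

      ```python
      import hashlib
      import hmac
      import os

      def hash_password(password: str) -> tuple[bytes, bytes]:
          """Return (salt, digest); store these instead of the plain-text password."""
          salt = os.urandom(16)  # a fresh random salt per user
          digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
          return salt, digest

      def check_password(password: str, salt: bytes, digest: bytes) -> bool:
          """Re-hash the attempted password and compare in constant time."""
          candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
          return hmac.compare_digest(candidate, digest)
      ```

      A breach of such a database leaks only salts and digests, which, unlike Facebook's plain-text Instagram passwords, cannot simply be read off.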

    1. Data can be poisoned intentionally as well. For example, in 2021, workers at Kellogg’s were upset at their working conditions, so they agreed to go on strike, and not work until Kellogg’s agreed to improve their work conditions. Kellogg’s announced that they would hire new workers to replace the striking workers:

      Another example of this sort of motivated data poisoning is the design of Nightshade and other anti-AI "washes," tools that add invisible overlays, aberrations, and altered metadata which degrade a generative model's ability to accurately learn from the affected image. These were largely introduced to help artists protect their work and style from being scraped to train image-generation models.
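
      As a rough illustration of the idea of invisible aberrations (a toy example of bounded random noise, emphatically not how Nightshade actually computes its targeted perturbations):

      ```python
      import numpy as np

      def add_imperceptible_noise(image: np.ndarray, epsilon: int = 2) -> np.ndarray:
          """Perturb each pixel by at most `epsilon` out of 255, invisible to a viewer."""
          noise = np.random.randint(-epsilon, epsilon + 1, size=image.shape)
          return np.clip(image.astype(int) + noise, 0, 255).astype(np.uint8)

      # Example with a fake 64x64 RGB image.
      img = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
      poisoned = add_imperceptible_noise(img)
      ```

      Real poisoning tools optimize the perturbation against a model's feature space so the damage is targeted rather than random.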

    2. Additionally, spam and output from Large Language Models like ChatGPT can flood information spaces (e.g., email, Wikipedia) with nonsense, useless, or false content, making them hard to use or useless.

      In recent years, this has become an increasingly large concern for the stability and longevity of the internet. Generative AI output has flooded nearly every corner of it, degrading search, making human creators harder to discover, and, ironically, hindering the growth of AI itself: because of this torrent of generated content, LLMs are now training on considerably lower-quality datasets. To keep the models useful, training datasets must be rigorously curated; to keep the internet usable, LLM output must be pruned quickly.
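
      As a sketch of what "rigorously curated" might mean in practice (my own toy heuristics and function name, not any production pipeline), a corpus filter could drop duplicates and obviously low-quality documents before training:

      ```python
      def filter_corpus(documents, min_words=50, min_unique_ratio=0.7):
          """Keep documents that are long enough, not too repetitive, and not duplicates."""
          seen = set()
          kept = []
          for doc in documents:
              words = doc.split()
              if len(words) < min_words:
                  continue  # too short to be useful training text
              if len(set(words)) / len(words) < min_unique_ratio:
                  continue  # highly repetitive, likely spam or generated filler
              if doc in seen:
                  continue  # exact duplicate
              seen.add(doc)
              kept.append(doc)
          return kept
      ```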

    1. And they also talked about the advertising motives behind supporting social causes (even if some employees do indeed support them), and the advertising motivation behind tweeting openly about how they are basing their decisions on advertising.

      This can be a deceptive strategy which obfuscates authenticity. Brand accounts tend to align themselves with trends and social causes they believe are popular and uncontroversial. This can serve as an authentic metric of societal trends while usually being an inauthentic representation of the company and its corporate "morals." As with many things on the internet, authentic patterns may emerge from a collection of independent, potentially inauthentic behaviors.

    1. Does anonymity discourage authenticity and encourage inauthentic behavior?

      This is an interesting question that should be answered in two parts. In most cases, anonymity seems to inspire authenticity among users. Anonymity gives users the freedom to express views and opinions that they might not feel able to share in the real world, which, in my anecdotal experience, is a far more common use of anonymity than pretending to hold beliefs one doesn't (a type of trolling). However, if the metric of authenticity is truth, then anonymity may do harm. When there is no person to blame for the spread of hateful rhetoric and/or conspiratorial ideation, it quickly becomes challenging to combat the spread of misinformation.

    1. Before this centralization of media in the 1900s, newspapers and pamphlets were full of rumors and conspiracy theories. And now as the internet and social media have taken off in the early 2000s, we are again in a world full of rumors and conspiracy theories.

      While it's reasonable to claim that a decentralized media system results in the proliferation of conspiracy theories, I think it's worth noting that conspiratorial press is also very common within centralized media systems. News networks and daytime TV alike have a tendency to report shocking and dubiously truthful (if not outright false and dangerous) news, often manufacturing outrage just the same.

    1. Friction is anything that gets in the way of a user performing an action. For example, if you have to open and navigate through several menus to find the privacy settings, that is significant friction. Or if one of the buttons has a bug and doesn’t work when you press it, so you have to find another way of performing that action, which is significant friction.

      It's very interesting to think about the idea of friction as an intentional choice, which seems counterintuitive. On further thought, though, it seems like MORE friction should be added to social media. Services like Meta's Reels and TikTok have very low friction (even ads can be swiftly swiped past), which causes many people, myself included, to get sucked into an endless scroll for dopamine.
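
      As a toy sketch of what deliberately added friction could look like (my own illustration, not any platform's design), a feed might pause every so often and ask the user to confirm before continuing:

      ```python
      def scroll_feed(videos, friction_every=20):
          """Play videos in order, but interrupt the scroll at a fixed interval."""
          for count, video in enumerate(videos, start=1):
              print(f"Now playing: {video}")
              if count % friction_every == 0:
                  answer = input(f"You've watched {count} videos. Keep scrolling? (y/n) ")
                  if answer.strip().lower() != "y":
                      break
      ```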

    1. Metadata is information about some data. So we often think about a dataset as consisting of the main pieces of data (whatever those are in a specific situation), and whatever other information we have about that data (metadata).

      While metadata is information about data, I feel it is worth considering that the metadata itself is often interpreted by the audience as an equal part of the data. Alt text on a picture, the status of the author, and the popularity of the content are not only information about the content, but a part of the experience of it. This is often important for understanding social trends, as well as developing accessibility features.
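
      A small sketch of this framing (with hypothetical field names) shows how metadata like alt text, author status, and popularity travels alongside the "main" data and shapes how the post is experienced:

      ```python
      from dataclasses import dataclass, field

      @dataclass
      class Post:
          text: str                      # the main piece of data
          image_alt_text: str = ""       # metadata: accessibility description
          author_verified: bool = False  # metadata: status of the author
          like_count: int = 0            # metadata: popularity of the content
          hashtags: list[str] = field(default_factory=list)

      post = Post(
          text="Beautiful sunset over Puget Sound tonight!",
          image_alt_text="An orange sky over calm water with mountains behind.",
          author_verified=True,
          like_count=1523,
          hashtags=["#seattle", "#sunset"],
      )
      ```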