189 Matching Annotations
  1. Sep 2023
  2. Feb 2023
    1. And it constitutes an important but overlooked signpost in the 20th-century history of information, as ‘facts’ fell out of fashion but big data became big business.

      Of course, the hardest problem in big data has turned out to be cleaning up messy and misleading data!

  3. Nov 2022
  4. Jul 2022
    1. AI text generator, a boon for bloggers? A test report

      While I wanted to investigate AI text generators further, I ended up writing a test report. I was quite stunned because the AI text generator turned out to be able to create a fully cohesive and to-the-point article in minutes. Here is the test report.

  5. Feb 2022
    1. Large enterprises and organizations use a vast variety of different information systems, databases, portals, wikis and Knowledge Bases [KBs] combining hundreds and thousands of data and information sources.

      Problem statement: Big Data

  6. Jan 2022
    1. Note that the cost of strong security is incredibly high. If you want to provide data on which you can perform computations in a fully homomorphic way, the size will be extremely large compared with the original data. At present, this is not usable for big-data-type computations.

      Do not do big data with personal data, and use homomorphic encryption on personal data.

  7. Oct 2021
    1. billing data created by electric utilities

      That's really true. For example, we can use big data on electricity consumption to track the rate at which factories resumed operations after the pandemic.

  8. Jun 2021
  9. Apr 2021
    1. The privacy policy — unlocking the door to your profile information, geodata, camera, and in some cases emails — is so disturbing that it has set off alarms even in the tech world.

      This Intercept article covers some of the specific privacy policy concerns Barron hints at here. The discussion of one of the core patents underlying the game, which is described as a "System and Method for Transporting Virtual Objects in a Parallel Reality Game", is particularly interesting. Essentially, this system generates revenue for the company (in this case Niantic and Google) through the gamified collection of data on the real world. That selfie you took with Squirtle is starting to feel a little less innocent in retrospect...

  10. Mar 2021
  11. Feb 2021
  12. Jan 2021
  13. Oct 2020
  14. Sep 2020
  15. Aug 2020
    1. Lozano, R., Fullman, N., Mumford, J. E., Knight, M., Barthelemy, C. M., Abbafati, C., Abbastabar, H., Abd-Allah, F., Abdollahi, M., Abedi, A., Abolhassani, H., Abosetugn, A. E., Abreu, L. G., Abrigo, M. R. M., Haimed, A. K. A., Abushouk, A. I., Adabi, M., Adebayo, O. M., Adekanmbi, V., … Murray, C. J. L. (2020). Measuring universal health coverage based on an index of effective coverage of health services in 204 countries and territories, 1990–2019: A systematic analysis for the Global Burden of Disease Study 2019. The Lancet, 0(0). https://doi.org/10.1016/S0140-6736(20)30750-9

  16. Jul 2020
    1. Fontanet, A., Tondeur, L., Madec, Y., Grant, R., Besombes, C., Jolly, N., Pellerin, S. F., Ungeheuer, M.-N., Cailleau, I., Kuhmel, L., Temmam, S., Huon, C., Chen, K.-Y., Crescenzo, B., Munier, S., Demeret, C., Grzelak, L., Staropoli, I., Bruel, T., … Hoen, B. (2020). Cluster of COVID-19 in northern France: A retrospective closed cohort study. MedRxiv, 2020.04.18.20071134. https://doi.org/10.1101/2020.04.18.20071134

    1. Sapoval, N., Mahmoud, M., Jochum, M. D., Liu, Y., Elworth, R. A. L., Wang, Q., Albin, D., Ogilvie, H., Lee, M. D., Villapol, S., Hernandez, K., Berry, I. M., Foox, J., Beheshti, A., Ternus, K., Aagaard, K. M., Posada, D., Mason, C., Sedlazeck, F. J., & Treangen, T. J. (2020). Hidden genomic diversity of SARS-CoV-2: Implications for qRT-PCR diagnostics and transmission. BioRxiv, 2020.07.02.184481. https://doi.org/10.1101/2020.07.02.184481

    1. Lavezzo, E., Franchin, E., Ciavarella, C., Cuomo-Dannenburg, G., Barzon, L., Del Vecchio, C., Rossi, L., Manganelli, R., Loregian, A., Navarin, N., Abate, D., Sciro, M., Merigliano, S., De Canale, E., Vanuzzo, M. C., Besutti, V., Saluzzo, F., Onelia, F., Pacenti, M., … Crisanti, A. (2020). Suppression of a SARS-CoV-2 outbreak in the Italian municipality of Vo’. Nature, 1–1. https://doi.org/10.1038/s41586-020-2488-1

  17. Jun 2020
    1. Rosenberg, E. S., Tesoriero, J. M., Rosenthal, E. M., Chung, R., Barranco, M. A., Styer, L. M., Parker, M. M., John Leung, S.-Y., Morne, J. E., Greene, D., Holtgrave, D. R., Hoefer, D., Kumar, J., Udo, T., Hutton, B., & Zucker, H. A. (2020). Cumulative incidence and diagnosis of SARS-CoV-2 infection in New York. Annals of Epidemiology. https://doi.org/10.1016/j.annepidem.2020.06.004

    1. Chu, D. K., Akl, E. A., Duda, S., Solo, K., Yaacoub, S., Schünemann, H. J., Chu, D. K., Akl, E. A., El-harakeh, A., Bognanni, A., Lotfi, T., Loeb, M., Hajizadeh, A., Bak, A., Izcovich, A., Cuello-Garcia, C. A., Chen, C., Harris, D. J., Borowiack, E., … Schünemann, H. J. (2020). Physical distancing, face masks, and eye protection to prevent person-to-person transmission of SARS-CoV-2 and COVID-19: A systematic review and meta-analysis. The Lancet, 0(0). https://doi.org/10.1016/S0140-6736(20)31142-9

    1. Hsiang, S., Allen, D., Annan-Phan, S., Bell, K., Bolliger, I., Chong, T., Druckenmiller, H., Huang, L. Y., Hultgren, A., Krasovich, E., Lau, P., Lee, J., Rolf, E., Tseng, J., & Wu, T. (2020). The effect of large-scale anti-contagion policies on the COVID-19 pandemic. Nature, 1–9. https://doi.org/10.1038/s41586-020-2404-8

    1. Oliver, N., Lepri, B., Sterly, H., Lambiotte, R., Deletaille, S., Nadai, M. D., Letouzé, E., Salah, A. A., Benjamins, R., Cattuto, C., Colizza, V., Cordes, N. de, Fraiberger, S. P., Koebe, T., Lehmann, S., Murillo, J., Pentland, A., Pham, P. N., Pivetta, F., … Vinck, P. (2020). Mobile phone data for informing public health actions across the COVID-19 pandemic life cycle. Science Advances, 6(23), eabc0764. https://doi.org/10.1126/sciadv.abc0764

    1. Kempfert, K., Martinez, K., Siraj, A., Conrad, J., Fairchild, G., Ziemann, A., Parikh, N., Osthus, D., Generous, N., Del Valle, S., & Manore, C. (2020). Time Series Methods and Ensemble Models to Nowcast Dengue at the State Level in Brazil. ArXiv:2006.02483 [q-Bio, Stat]. http://arxiv.org/abs/2006.02483

  18. May 2020
    1. Drew, D. A., Nguyen, L. H., Steves, C. J., Menni, C., Freydin, M., Varsavsky, T., Sudre, C. H., Cardoso, M. J., Ourselin, S., Wolf, J., Spector, T. D., Chan, A. T., & Consortium, C. (2020). Rapid implementation of mobile technology for real-time epidemiology of COVID-19. Science. https://doi.org/10.1126/science.abc0473

  19. Apr 2020
  20. Feb 2020
    1. One important aspect of critical social media research is the study of not just ideologies of the Internet but also ideologies on the Internet. Critical discourse analysis and ideology critique as research method have only been applied in a limited manner to social media data. Majid KhosraviNik (2013) argues in this context that ‘critical discourse analysis appears to have shied away from new media research in the bulk of its research’ (p. 292). Critical social media discourse analysis is a critical digital method for the study of how ideologies are expressed on social media in light of society’s power structures and contradictions that form the texts’ contexts.
    2. It has, for example, been common to study contemporary revolutions and protests (such as the 2011 Arab Spring) by collecting large amounts of tweets and analysing them. Such analyses can, however, tell us nothing about the degree to which activists use social and other media in protest communication, what their motivations are to use or not use social media, what their experiences have been, what problems they encounter in such uses and so on. If we only analyse big data, then the one-sided conclusion that contemporary rebellions are Facebook and Twitter revolutions is often the logical consequence (see Aouragh, 2016; Gerbaudo, 2012). Digital methods do not outdate but require traditional methods in order to avoid the pitfall of digital positivism. Traditional sociological methods, such as semi-structured interviews, participant observation, surveys, content and critical discourse analysis, focus groups, experiments, creative methods, participatory action research, statistical analysis of secondary data and so on, have not lost importance. We do not just have to understand what people do on the Internet but also why they do it, what the broader implications are, and how power structures frame and shape online activities.
    3. Challenging big data analytics as the mainstream of digital media studies requires us to think about theoretical (ontological), methodological (epistemological) and ethical dimensions of an alternative paradigm

      Making the case for the need for digitally native research methodologies.

    4. Who communicates what to whom on social media with what effects? It forgets users’ subjectivity, experiences, norms, values and interpretations, as well as the embeddedness of the media into society’s power structures and social struggles. We need a paradigm shift from administrative digital positivist big data analytics towards critical social media research. Critical social media research combines critical social media theory, critical digital methods and critical-realist social media research ethics.
    5. de-emphasis of philosophy, theory, critique and qualitative analysis advances what Paul Lazarsfeld (2004 [1941]) termed administrative research, research that is predominantly concerned with how to make technologies and administration more efficient and effective.
    6. Big data analytics’ trouble is that it often does not connect statistical and computational research results to a broader analysis of human meanings, interpretations, experiences, attitudes, moral values, ethical dilemmas, uses, contradictions and macro-sociological implications of social media.
    7. Such funding initiatives privilege quantitative, computational approaches over qualitative, interpretative ones.
    8. There is a tendency in Internet Studies to engage with theory only on the micro- and middle-range levels that theorize single online phenomena but neglect the larger picture of society as a totality (Rice and Fuller, 2013). Such theories tend to be atomized. They just focus on single phenomena and miss society’s big picture
  21. Jan 2020
    1. A final word: when we do not understand something, it does not look like there is anything to be understood at all - it just looks like random noise. Just because it looks like noise does not mean there is no hidden structure.

      Excellent statement! Could this be the guiding principle of the current big data boom in biology?

  22. Jan 2019
    1. Big Data is a buzzword which works on the platform of large data volume and aggregated data sets. The data sets can be structured or unstructured. The data that is kept and stored at a global level keeps on growing so this big data has a big potential. Your Big Data is generated from every little thing around us all the time. It has changed its way as the people are changing in the organization. New skills are being offered to prepare the new generated power of Big Data. Nowadays the organizations are focusing on new roles, new challenges and creating a new business.
  23. Nov 2018
    1. The Chinese place a higher value on community good versus individual rights, so most feel that, if social credit will bring a safer, more secure, more stable society, then bring it on
  24. Aug 2018
    1. Thus it becomes possible to see how questions around data use need to shift from asking what is in the data, to include discussions of how the data is structured, and how this structure codifies value systems and social practices, subject positions and forms of visibility and invisibility (and thus forms of surveillance), along with the very ideas of crisis, risk governance and preparedness. Practices around big data produce and perpetuate specific forms of social engagement as well as understandings of the areas affected and the people being served.

      How data structure influences value systems and social practices is a much-needed topic of inquiry.

    2. Big data is not just about knowing more. It could be – and should be – about knowing better or about changing what knowing means. It is an ethico-episteme-ontological-political matter. The ‘needle in the haystack’ metaphor conceals the fact that there is no such thing as one reality that can be revealed. But multiple, lived [realities] are made through mediations and human and technological assemblages. Refugees’ realities of intersecting intelligences are shaped by the ethico-episteme-ontological politics of big data.

      Big, sweeping statement that helps frame how big data could be better conceptualized as a complex, socially contextualized, temporal artifact.

    3. Burns (2015) builds on this to investigate how within digital humanitarianism discourses, big data produce and perform subjects ‘in need’ (individuals or communities affected by crises) and a humanitarian ‘saviour’ community that, in turn, seeks answers through big data

      I don't understand what Burns is arguing here. Who is he referring to that claims DHN is a "savior" or "the solution" to crisis response?

      "Big data should therefore be conceptualized as a framing of what can be known about a humanitarian crisis, and how one is able to grasp that knowledge; in short, it is an epistemology. This epistemology privileges knowledges and knowledge-based practices originating in remote geographies and de-emphasizes the connections between multiple knowledges.... Put another way, this configuration obscures the funding, resource, and skills constraints causing imperfect humanitarian response, instead positing volunteered labor as ‘the solution.’ This subjectivity formation carves a space in which digital humanitarians are necessary for effective humanitarian activities." (Burns 2015: 9–10)

    4. Crises are often not a crisis of information. It is often not a lack of data or capacity to analyse it that prevents ‘us’ from preventing disasters or responding effectively. Risk management fails because there is a lack of a relational sense of responsibility. But this does not have to be the case. Technologies that are designed to support collaboration, such as what Jasanoff (2007) terms ‘technologies of humility’, can be better explored to find ways of framing data and correlations that elicit a greater sense of relational responsibility and commitment.

      Is it "a lack of relational sense of responsibility" in crisis response (state vs private sector vs public) or is it the wicked problem of power, class, social hierarchies, etc.?

      "... ways of framing data and correlations that elicit a greater sense of responsibility and commitment."

      That could have a temporal component to it to position urgency, timescape, horizon, etc.

    5. In some ways this constitutes the production of ‘liquid resilience’ – a deflection of risk to the individuals and communities affected which moves us from the idea of an all-powerful and knowing state to that of a ‘plethora of partial projects and initiatives that are seeking to harness ICTs in the service of better knowing and governing individuals and populations’ (Ruppert 2012: 118)

      This critique addresses surveillance state concerns about gluing datasets together to form a broader understanding of aggregate social behavior without the necessary constraints/warnings about social contexts and discontinuity between data.

      Skimmed the Ruppert paper, sadly doesn't engage with time and topologies.

    6. Indeed, as Chandler (2015: 9) also argues, crowdsourcing of big data does not equate to a democratisation of risk assessment or risk governance:

      Beyond this quote, Chandler (in engaging crisis/disaster scenarios) argues that Big Data may be more appropriately framed as community reflexive knowledge than causal knowledge. That's an interesting idea.

      *"Thus, it would be more useful to see Big Data as reflexive knowledge rather than as causal knowledge. Big Data cannot help explain global warming but it can enable individuals and households to measure their own energy consumption through the datafication of household objects and complex production and supply chains. Big Data thereby datafies or materialises an individual or community’s being in the world. This reflexive approach works to construct a pluralised and multiple world of self-organising and adaptive processes. The imaginary of Big Data is that the producers and consumers of knowledge and of governance would be indistinguishable; where both knowing and governing exist without external mediation, constituting a perfect harmonious and self-adapting system: often called ‘community resilience’. In this discourse, increasingly articulated by governments and policy-makers, knowledge of causal connections is no longer relevant as communities adapt to the real-time appearances of the world, without necessarily understanding them."

      "Rather than engaging in external understandings of causality in the world, Big Data works on changing social behaviour by enabling greater adaptive reflexivity. If, through Big Data, we could detect and manage our own biorhythms and know the effects of poor eating or a lack of exercise, we could monitor our own health and not need costly medical interventions. Equally, if vulnerable and marginal communities could ‘datafy’ their own modes of being and relationships to their environments they would be able to augment their coping capacities and resilience without disasters or crises occurring. In essence, the imaginary of Big Data resolves the essential problem of modernity and modernist epistemologies, the problem of unintended consequences or side-effects caused by unknown causation, through work on the datafication of the self in its relational-embeddedness. This is why disasters in current forms of resilience thinking are understood to be ‘transformative’: revealing the unintended consequences of social planning which prevented proper awareness and responsiveness. Disasters themselves become a form of ‘datafication’, revealing the existence of poor modes of self-governance."*

      Downloaded Chandler paper. Cites Meier quite a bit.

    7. However, with these big data collections, the focus becomes not the individual’s behaviour but social and economic insecurities, vulnerabilities and resilience in relation to the movement of such people. The shift acknowledges that what is surveilled is more complex than an individual person’s movements, communications and actions over time.

      The shift from INGO emergency response/logistics to state-sponsored, individualized resilience via the private sector seems profound here.

      There's also a subtle temporal element here of surveilling need and collecting data over time.

      Again, raises serious questions about the use of predictive analytics, data quality/classification, and PII ethics.

    8. Andrejevic and Gates (2014: 190) suggest that ‘the target becomes the hidden patterns in the data, rather than particular individuals or events’. National and local authorities are not seeking to monitor individuals and discipline their behaviour but to see how many people will reach the country and when, so that they can accommodate them, secure borders, and identify long-term social outlooks such as education, civil services, and impacts upon the host community (Pham et al. 2015).

      This seems like a terribly naive conclusion about mass data collection by the state.

      Also:

      "Yet even if capacities to analyse the haystack for needles more adequately were available, there would be questions about the quality of the haystack, and the meaning of analysis. For ‘Big Data is not self-explanatory’ (Bollier 2010: 13, in boyd and Crawford 2012). Neither is big data necessarily good data in terms of quality or relevance (Lesk 2013: 87) or complete data (boyd and Crawford 2012)."

    9. as boyd and Crawford argue, ‘without taking into account the sample of a data set, the size of the data set is meaningless’ (2012: 669). Furthermore, many techniques used by the state and corporations in big data analysis are based on probabilistic prediction which, some experts argue, is alien to, and even incomprehensible for, human reasoning (Heaven 2013). As Mayer-Schönberger stresses, we should be ‘less worried about privacy and more worried about the abuse of probabilistic prediction’ as these processes confront us with ‘profound ethical dilemmas’ (in Heaven 2013: 35).

      Primary problems to resolve regarding the use of "big data" in humanitarian contexts: dataset size versus sample, probabilistic prediction that is alien to human reasoning, and ethical abuses of PII.
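
      The sample-versus-size point can be illustrated with a small simulation. This is only a minimal sketch in Python under made-up assumptions: the population, the "online" flag, and every rate below are hypothetical and are not taken from boyd and Crawford or from any real dataset. It compares a large dataset that covers only part of the population with a much smaller one drawn at random from everyone.

```python
import random

def estimate_share(sample):
    """Return the fraction of (online, has_trait) records with has_trait set."""
    return sum(1 for _, has_trait in sample if has_trait) / len(sample)

# Hypothetical population of 100,000 people (all numbers are assumptions).
# 25,000 are online, and 60% of them have the trait of interest;
# 75,000 are offline, and 20% of them have it.
population = (
    [(True, True)] * 15_000
    + [(True, False)] * 10_000
    + [(False, True)] * 15_000
    + [(False, False)] * 60_000
)
true_share = estimate_share(population)  # 30,000 / 100,000

# "Big" dataset: every online person and nobody else (a biased sample).
big_biased = [person for person in population if person[0]]

# Small dataset: 500 people drawn uniformly at random from everyone.
rng = random.Random(0)  # fixed seed so the run is repeatable
small_random = rng.sample(population, 500)

print(f"true share: {true_share:.2f}")
print(f"big biased sample (n={len(big_biased)}): {estimate_share(big_biased):.2f}")
print(f"small random sample (n={len(small_random)}): {estimate_share(small_random):.2f}")
```

      Under these assumptions the 25,000-record dataset reports a share of 0.60 no matter how large it is, while the true share is 0.30; the 500-record random sample should land much closer to the truth, because its error comes from chance rather than from who was left out.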

  25. May 2018
    1. We show how the rise of large datasets, in conjunction with a rising interest in data as scholarly output, contributes to the advent of data sharing platforms in a field traditionally organized by infrastructures.

      What does this paper mean by infrastructures? Perhaps this is a reference to the traditional scholarly journals and monographs.

    1. Human cognition loses its personal character. Individuals turn into data, and data become regnant

      Reminds me of The End of Theory. But if we lose the theory, the human understanding, what will be the consequences?

    1. Get the best explanation of the Talend training and tutorial course, with real-time experience and exercises on real-time projects for better hands-on practice, from scratch to advanced level

      So check this link and learn: https://www.youtube.com/watch?v=lhTPrpBvakw

  26. Jan 2018
    1. reliability and accessibility of big data will help facilitate increased reliance upon outcomes-based contracting and alternative payment models.


  27. Oct 2017
    1. We also downloaded Twitter user profiles, such as the size of followers, along with their profile description.

      I wonder how many profiles there are in the 3,389 tweets? Did they automate the review and capture of the details? Or did they review each profile by hand?

    1. These digital traces are often referred to as big data and are popularly discussed as a resource, a raw material with qualities to be mined and capitalized, the new oil to be tapped to spur economies. Through a variety of practices of valuation, corporations not only exploit the digital traces of their customers to maximize their operations but also sell those traces to others. For that reason, citizen subjects who use platforms such as Google are sometimes referred to not as its customers but as its product.

    1. what are our best, shared hopes for DH? What tasks and projects might we take up, or tie in? What are our functions—or, if you prefer, our vocations, now
      1. The Digital Recovery of Texts. Due to computer-assisted approaches to paleography (noun: the study of ancient writing systems and the deciphering and dating of historical manuscripts) and the steady advances in the field of digital preservation. "Resurrection can be grisly work, I think we come to understand extinction better in our struggles."

      2. DH has a public and transformative role to play: Big Data and the Longue Durée

      Unable to look at the article by Armitage, D. & Guldi, J. on "The Return of the Longue Durée"; hit by the paywall each time.

      Look at the great article on wired.com:

      https://www.wired.com/2014/01/return-of-history-long-timescales/

      "The return of the longue durée is intimately connected to changing questions of scale. In a moment of ever-growing inequality, amid crises of global governance, and under the impact of anthropogenic climate change, even a minimal understanding of the conditions shaping our lives demands a scaling-up of our inquiries. "

      What does she mean by big data? Read Samuel Arbesman's article in The Washington Post for an easy explanation: https://www.washingtonpost.com/opinions/five-myths-about-big-data/2013/08/15/64a0dd0a-e044-11e2-963a-72d740e88c12_story.html?utm_term=.54ff7fdf82fe

  28. Sep 2017
    1. A data lake management service with an Apache licence. I am particularly interested in how well the monitoring features of this platform work.

  29. Aug 2017
    1. In fact, academics now regularly tap into the reservoir of digitized material that Google helped create, using it as a dataset they can query, even if they can’t consume full texts.

      It's good to understand that exploring a corpus for "brainstorming" or discovering heretofore unseen connections is different from a discovery query that is meant to give access to an entire text.

  30. Jul 2017
  31. Jun 2017
    1. literature became data

      Doesn't this obfuscate the process? Literature became digital. Digital enables a wide range of further activity to take place on top of literature, including, perhaps, its datafication.

  32. May 2017
    1. volume, velocity, and variety

      Volume: the actual size of the traffic.

      Velocity: how fast the traffic shows up.

      Variety: data that can be unstructured, semi-structured, or multi-structured.

  33. Apr 2017
    1. By producing services that are free (or very accessible), high-performing, and of high added value for the data they produce, these companies capture a gigantic share of users' digital activities. They thereby become the main service providers with which governments must deal if they want to enforce the law, particularly in the context of population surveillance and security operations.

      That is why the GAFAM are as powerful as (or even more powerful than) states.

  34. Mar 2017
    1. Corporate thought leaders have now realized that it is a much greater challenge to actually apply that data. The big takeaways in this topic are that data has to be seen to be acknowledged, tangible to be appreciated, and relevantly presented to have an impact. Connecting data on the macro level across an organization and then bringing it down to the individual stakeholder on the micro level seems to be the key in getting past the fact that right now big data is one thing to have and quite another to unlock.

      Simply possessing pools of data is of limited utility. It's like having a space ship but your only access point to it is through a pin hole in the garage wall that lets you see one small, random glint of ship; you (think you) know there's something awesome inside but that sense is really all you've got. Margaret points out that it has to be seen (data visualization), it has to be tangible (relevant to audience) and connected at micro and macro levels (storytelling). For all of the machine learning and AI that helps us access the spaceship, these key points are (for now) human-driven.

    1. Either we own political technologies, or they will own us. The great potential of big data, big analysis and online forums will be used by us or against us. We must move fast to beat the billionaires.
  35. Feb 2017
    1. Not in the right major. Not in the right class. Not in the right school. Not in the right country.

      There's a bit of a slippery slope here, no? Maybe it's Audrey on that slope, maybe it's data-happy schools/companies. In either case, I wonder if it might be productive to lay claim to some space on that slope, short of the dangers below, aware of them, and working to responsibly leverage machine intelligence alongside human understanding.

    2. Ed-Tech in a Time of Trump
    1. in order to facilitate advisors holding more productive conversations about potential academic directions with their advisees.

      Conversations!

    2. Each morning, all alerts triggered over the previous day are automatically sent to the advisor assigned to the impacted students, with a goal of advisor outreach to the student within 24 hours.

      Key that there's still a human and human relationships in the equation here.

    3. A single screen for each student offers all of the information that advisors reported was most essential to their work,

      Did students have access to the same data?

    4. and Georgia State's IT and legal offices readily accepted the security protocols put in place by EAB to protect the student data.

      So it's not as if this was done willy-nilly.

  36. Oct 2016
  37. Sep 2016
    1. The importance of models may need to be underscored in this age of “big data” and “data mining”. Data, no matter how big, can only tell you what happened in the past. Unless you’re a historian, you actually care about the future — what will happen, what could happen, what would happen if you did this or that. Exploring these questions will always require models. Let’s get over “big data” — it’s time for “big modeling”.
  38. Jul 2016
    1. Page 14

      Rockwell and Sinclair note that corporations are mining text including our email; as they say here:

      more and more of our private textual correspondence is available for large-scale analysis and interpretation. We need to learn more about these methods to be able to think through the ethical, social, and political consequences. The humanities have traditions of engaging with issues of literacy, and big data should be not an exception. How to analyze, interpret, and exploit big data are big problems for the humanities.

    1. big data

      Algorithms need supposedly neutral data. That rather plays along with the discourse accompanying these algorithms and recommendation services, which treats their data as "natural", without intrinsic value. (See Bonenfant 2015)

    1. p. 100

      Data are not useful in and of themselves. They only have utility if meaning and value can be extracted from them. In other words, it is what is done with data that is important, not simply that they are generated. The whole of science is based on realising meaning and value from data. Making sense of scaled small data and big data poses new challenges. In the case of scaled small data, the challenge is linking together varied datasets to gain new insights and opening up the data to new analytical approaches being used in big data. With respect to big data, the challenge is coping with its abundance and exhaustivity (including sizeable amounts of data with low utility and value), timeliness and dynamism, messiness and uncertainty, high relationality, semi-structured or unstructured nature, and the fact that much of big data is generated with no specific question in mind or is a by-product of another activity. Indeed, until recently, data analysis techniques have primarily been designed to extract insights from scarce, static, clean and poorly relational datasets, scientifically sampled and adhering to strict assumptions (such as independence, stationarity, and normality), and generated and analysed with a specific question in mind.

      Good discussion of the different approaches allowed/required by small v. big data.

  39. Apr 2016
    1. We should have control of the algorithms and data that guide our experiences online, and increasingly offline. Under our guidance, they can be powerful personal assistants.

      Big business has been very militant about protecting their "intellectual property". Yet they regard every detail of our personal lives as theirs to collect and sell at whim. What a bunch of little darlings they are.

  40. Mar 2016
  41. Feb 2016
  42. Jan 2016
    1. 50 Years of Data Science, David Donoho, 2015, 41 pages

      This paper reviews some ingredients of the current "Data Science moment", including recent commentary about data science in the popular media, and about how/whether Data Science is really different from Statistics.

      The now-contemplated field of Data Science amounts to a superset of the fields of statistics and machine learning which adds some technology for 'scaling up' to 'big data'.

  43. Dec 2015
    1. The idea was to pinpoint the doctors prescribing the most pain medication and target them for the company’s marketing onslaught. That the databases couldn’t distinguish between doctors who were prescribing more pain meds because they were seeing more patients with chronic pain or were simply looser with their signatures didn’t matter to Purdue.
  44. Aug 2015
    1. Shared information

      The “social”, with an embedded emphasis on the data part of knowledge building and a nod to solidarity. Cloud computing does go well with collaboration and spelling out the difference can help lift some confusion.

  45. Feb 2015