1,199 Matching Annotations
  1. Last 7 days
  2. Mar 2024
    1. Cyber-attacks are deeply upsetting for staff whose data is compromised and whose work is disrupted, and for users whose services are interrupted.

      This is important -- I imagine it must be EXTREMELY stressful to have been working through this.

    2. extremely hard to restore.

      I can only imagine how hard this must be. This is why Continuous Integration is so important.

    3. Moving to the cloud does not remove our cyber-risks, it simply transforms them to a new set of risks that should be easier to manage given the necessary resources and capacity.

      This is important. Moving some things to the cloud could make them more vulnerable if it's not done well.

    4. The Library’s balance between cloud and on-site technologies will shift considerably over the next 18 months. There are risks and issues associated with this shift that will need to be assessed and managed, including varying levels of staff familiarity with at-scale cloud technology solutions and a whole new set of cyber-security risks

      It sounds like moving to the cloud is a necessity because they don't have the physical infrastructure anymore? And this involves a significant investment in reskilling. It won't make their security problems go away in a puff of smoke though.

    5. The Technology department was overstretched before the incident and had some staff shortages which were beginning to be successfully addressed. Faced with the challenge of rebuilding our entire IT infrastructure, there is a risk that the capacity and capability within the Technology department will not be sufficient to meet the needs of the Rebuild & Renew Programme. The need to grow cyber-security capacity and cloud engineering capabilities will be particularly acute and will be difficult to remediate without reconsideration of how the Library remunerates high-demand IT skills.

      Lack of maintenance might've been partly the result of lack of sufficient staff or expertise.

    6. investment in culture change across different parts of the Library

      This could be very painful.

    7. The main vehicle for this will be the Modern Library Services Programme which will centralise and replace our current Library Services Platform, legacy catalogues, online reader registration system, digital preservation system, and enquiries management system.

      That's basically everything.

    8. least privilege

      No super-users.

    9. a best practice network design, implementing proper segmentation with a defence in depth approach

      It sounds like once you got in as the right user, you could get anywhere you wanted on the network?

    10. a robust and resilient backup service, providing immutable and air-gapped copies, offsite copies, and hot copies of data with multiple restoration points on a 4/3/2/1 model

      This suggests that there weren't recent off-site backup copies.

    11. keeping infrastructure and applications current

      maintenance is important!

    12. extract, transform and load (ETL) processes to pass data from one system to another

      Moving data from one system to another for processing.
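
      A minimal sketch of what an ETL step looks like, to make the term concrete. Everything here is hypothetical (the CSV export, the field names, the dict standing in for the target system); it is not how the Library's own pipelines worked.

```python
import csv
import io

# Hypothetical export from a source system; field names are invented for illustration.
SOURCE_CSV = "reader_id,name,joined\n1, Ada Lovelace ,2019-03-01\n2,Alan Turing,2021-11-15\n"

def extract(text):
    """Extract: read rows out of the source system's CSV export."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: reshape each row into what the target system expects."""
    return [
        {
            "id": int(row["reader_id"]),
            "name": row["name"].strip(),
            "year": int(row["joined"][:4]),
        }
        for row in rows
    ]

def load(records, target):
    """Load: write the records into the target store, keyed by id."""
    for record in records:
        target[record["id"]] = record
    return target

# A plain dict stands in for the target system.
target_store = load(transform(extract(SOURCE_CSV)), {})
```

      Note that the export file and the loaded records are both copies of the same personal data, which is part of why these hand-offs multiply what is sitting on a network.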

    13. our historically complex network topology (ie. the ‘shape’ of our network and how its components connect to each other) allowed the attackers wider access to our network than would have been possible in a more modern network design, allowing them to compromise more systems and services

      It sounds like there were too many things connected to other things. A natural result of organic growth over time.

    14. The complexity of the Library’s technology estate was increased significantly by the implementation in 2013 of the Non Print Legal Deposit Regulations, in partnership with the other five Legal Deposit libraries of the UK and Ireland. This required investment from core Library funds in a significant suite of new statutory activities including web archiving, digital preservation systems and viewing applications. One implication of this was that some legacy applications for core Library operations were retained longer than originally intended

      It sounds like part of the legacy software problem was related to web archives? I'm noticing that https://www.webarchive.org.uk/ukwa/ is no longer online.

    15. the programme aims to not only restore disrupted services but to strategically modernise and enhance Library operations, ensuring they remain at the forefront of delivering valuable knowledge services to our users.

      hopefully it will be kept up to date once it is modernized -- it's an ongoing practice

    16. Accreditation to Cyber Essentials Plus was successfully achieved in 2019, but changes to the standard in 2022 meant that we ceased to be compliant pending replacement of some of our older core systems. Work to address this, including a major programme approved by the British Library Board in 2022 to procure and implement a new library services platform

      With 20/20 hindsight, it sounds like this should have been a warning sign. I don't think that the web archiving program has a public face?

    17. Digital Portfolio Committee,

      A new committee is always the answer :-)

    18. unedited Electoral Roll database

      Lists of people who are able to vote?

      https://en.wikipedia.org/wiki/Electoral_roll

    19. Gold and Silver committee

      Is this some sort of management term for classes of employees?

    20. Endangered Archives Programme (EAP) Hubs and EAP grants programme have continued well but access to previously digitised content remains currently unavailable.

      Kind of ironic & sad :-(

    21. viable sources of backups had been identified that were unaffected by the cyber-attack and from which the Library’s digital and digitised collections, collection metadata and other corporate data could be recovered.

      I noticed in the diagram above that onsite backups were compromised. I wonder where these backups were and how current they were.

    22. library management system,

      What is this, the ILS? I thought that is already back online? Oh perhaps it has been brought back in a "new" form?

    23. technical obsolescence, lack of vendor support, or the inability of the system to operate in a modern secure environment.

      These are the reasons why the restore has been hard.

    24. the lack of viable infrastructure on which to restore it.

      They need new computers -- hello Mr Bezos.

    25. The attack methodology of Rhysida and its affiliates involves several different elements, including defence evasion and anti-forensics (e.g. they ‘clean up after themselves’ and delete log files etc., in order to make it hard to trace their activities), exfiltration of data for ransom, encryption for impact, and destruction of servers to inhibit system recovery (and as a further anti-forensic measure)

      So how can they say things like The Electoral Rolls were not compromised?

    26. validated

      I wonder what this process will look like.

    27. we believe that secure copies exist both of our born-digital and digitised content, and of the metadata which describes it.

      It sounds like they're not entirely sure...

    28. Thirdly, the attackers hijacked native utilities (e.g. IT tools used to administer the network) and used them to forcibly create backup copies of 22 of our databases, which were then subsequently exfiltrated from our network.

      Database backup tools were used to export data from running systems.

    29. Secondly, a keyword attack scanned our network for any file or folder that used certain sensitive keywords in its naming convention, such as ‘passport’ or ‘confidential’, and copied files not just from our corporate networks but also from drives used by staff for personal purposes as permitted under the Library’s Acceptable Use of IT Policy.

      Collecting data that could be used to infiltrate other systems.
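
      It is worth seeing how little effort this kind of name matching takes, since the same check can be run as an audit of your own shares. A rough sketch, assuming a plain list of paths; only the two keywords come from the report, and the sample paths and function name are made up.

```python
from pathlib import PurePosixPath

# The two example keywords quoted in the report; a real audit list would be longer.
SENSITIVE_KEYWORDS = ("passport", "confidential")

def flag_sensitive_paths(paths, keywords=SENSITIVE_KEYWORDS):
    """Return paths where any file or folder name contains a keyword (case-insensitive)."""
    flagged = []
    for path in paths:
        parts = [part.lower() for part in PurePosixPath(path).parts]
        if any(keyword in part for part in parts for keyword in keywords):
            flagged.append(path)
    return flagged

# Invented sample paths for illustration.
sample_paths = [
    "/shares/people/Passport_scan.pdf",
    "/shares/finance/CONFIDENTIAL/budget.xlsx",
    "/shares/tech/readme.txt",
]
flagged = flag_sensitive_paths(sample_paths)
```

      Anything this turns up is, by definition, just as easy for an intruder to find.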

    30. Firstly, a targeted attack copied records belonging to our Finance, Technology, and People teams on a ‘wholesale’ basis, resulting in the copying of entire sections of our network drives

      Network drives sounds like Windows shares?

    31. The challenge of rebuilding our technology infrastructure in full also brings risks of capacity and capability within our Technology department, which will need to be actively addressed.

      Being able to rebuild applications easily from backups is critical.

    32. for reasons of practicality, cost and impact on ongoing Library programmes, it was decided at this time that connectivity to the British Library domain (including machine log-on access and access to on-premise servers) would be out of scope for MFA implementation, pending further renewal of the Library’s infrastructure.

      I wonder if there were legacy systems that couldn't be adapted to work with MFA?

    33. access was not subject to Multi-Factor Authentication (MFA)

      This seems significant.

    34. A review of security provisions relating to the management of third parties was planned for 2024; and the tightening of access provisions that would be enabled by improvements to underlying computer and storage infrastructure and the migration of storage to the cloud, which is currently being implemented.

      This is kind of heart-breaking, that they knew of the need for tightening, and were about to do it.

    35. the increasing complexity of managing their access was flagged as a risk.

      Difficult to keep track of who has access to what and how.

    36. The most likely source of the attack is therefore the compromise of privileged account credentials, possibly via a phishing or spear-phishing attack or a brute force attack where passwords are repeatedly tried against a user’s account.

      "Most likely" -- they still don't know? If there was a test of access two days earlier, and the account had its password reset, how did they still use it? I guess multiple people could have been compromised.

    37. However, the first detected unauthorised access to our network was identified at the Terminal Services server. This terminal server had been installed in February 2020 to facilitate efficient access for trusted external partners and internal IT administrators, as a replacement for the previous remote access system, which had been assessed as being insufficiently secure.

      Remote work during the pandemic may have required increased remote access?

    38. high volume of data traffic (440GB)

      This should have set off alarm bells?
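
      The sort of alarm bell I have in mind could be as simple as totalling outbound bytes per host and flagging anything over a limit. A toy sketch only: the host names, the 100 GB threshold and the way the 440GB is split across transfers are all my assumptions, not details from the report.

```python
from collections import defaultdict

GB = 10**9

def hosts_over_threshold(transfers, threshold_bytes):
    """Sum outbound bytes per host and return the hosts whose total exceeds the threshold."""
    totals = defaultdict(int)
    for host, nbytes in transfers:
        totals[host] += nbytes
    return {host: total for host, total in totals.items() if total > threshold_bytes}

# Invented (host, bytes_sent) records for one monitoring window.
transfers = [
    ("server-a", 200 * GB),
    ("server-a", 240 * GB),
    ("desktop-b", 2 * GB),
]
alerts = hosts_over_threshold(transfers, threshold_bytes=100 * GB)
```

      The hard part in practice is picking a threshold that fits normal traffic, which this sketch does not attempt.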

    39. unblocking the account later that day

      Why was the account unblocked?

    40. undertook a vulnerability scan (which came back with no results)

      It seems that there was a false sense of security from the malware scanning software.

    41. Later that night, at 01:15 on 26 October 2023, the Library’s IT Security Manager was alerted to possible malicious activity on the Library network.

      So they knew two days beforehand that something was not right?

    42. independent cyber-security advisors

      I wonder who did this?

    43. WhatsApp video call

      It must've been tricky to negotiate new communication channels in the crisis. Or were they established already?

    44. Gold meeting

      What is a Gold Meeting? I guess it means the most expensive security help you can get?

    45. We expect the balance between cloud-based and onsite technologies to shift substantially towards the former in the next 18 months, which will come with its own risks that need to be actively managed, even as we substantially reduce security and other risks by making this change

      It's probably easier to bring up machines in the cloud than it is to provision them onsite. But that does bring another set of vulnerabilities, and doesn't negate them.

    46. The Library’s unusually diverse and complex technology estate, including many legacy systems, has roots in its origins as the merger of many different collections, organisational cultures and functions. We believe that the nature of this legacy infrastructure contributed to the severity of the impact of the attack. The historically complex shape of the network allowed the attackers wider access than would have been possible in a more modern network design, and the reliance of older applications on manual processes to pass data from one system to another increased the volume of staff and customer data held in multiple copies on the network.

      Legacy systems were a point of vulnerability.

    47. more recent software versions

      Software not being kept up to date.

    48. they will not function on the new secure infrastructure

      It would be interesting to know why the software no longer will work on the new "secure infrastructure".

    49. we have been hampered by the lack of viable infrastructure on which to restore it.

      It can take time to purchase and provision new machines on site. To say nothing about the time to bring back the data and put it online again.

  3. Jan 2024
    1. Once again, like the original standards-makers, the organizational efforts follow an idea that is already working in practice.

      de jure vs de facto

    2. The momentum of ISO was buoyed by the emergence of global industries like air travel and freight shipping, as well as a convergence of political interests.

      In short, Capitalism.

  4. Nov 2023
    1. Instead of campaigning for implementation of preservation solutions, we could therefore be strategic and campaign for implementation of principles inspired by learning organizations in the hope of being granted more discretion in handling preservation issues

      I like the idea of Principles over Rules. Too often rules become automatic, and the motivating principle for creating the rule in the first place is lost.

    2. This leaves the preservationist no other option than communicating and campaigning to the point of exhaustion

      Yes, this is frustrating!

    3. risk is socially constructed

      But doesn't risk also have real, material outcomes?

    4. The same dual focus can be seen within the field of preservation since the OAIS-model is describing functions for exploration of new technological developments and changed user demand

      It is interesting to think of how in terms of OAIS that DIP or access objects might be changing more often.

    5. This is especially relevant within the field of digital preservation that was conceived to a large extent with the goal of countering obsolescence by staying up to date.

      Where does this precept come from?

    6. However, stability is also the opposite of flexibility. Organizations in modern times need to be flexible to be able to keep up with technological change.

      Do we want to be able to keep up with all tech change?

  5. Jul 2023
    1. Prior to its destruction, the library had reached new levels of growth with laptops, a Wi-Fi hub, and a tent donated by author and rock legend Patti Smith and dubbed “Fort Patti.”

      Fort Patti

  6. Mar 2023
    1. This tension persists through the broader history of the web.

      this is so true -- I think it's very evident in the httpRange-14 debate, which (shameless plug) I tried to write about in: https://arxiv.org/abs/1302.4591

      Also, https://www.ibiblio.org/hhalpin/homepage/publications/hayes-halpin-final-copyedited6.pdf is a good read.

    2. Vulgar Linked Data

      Love this idea of Vulgar Linked Data for how it draws on the history of Vulgar Latin. I wonder how https://linkeddatafragments.org/ might fit into this picture by de-centering the knowledge graph?

    3. The merger of AI shit and knowledge graph shit

      Yes, maybe it's another paper, but it is interesting to consider GPT-n and the massive web scale data extraction that took place to create these LLM models that are gate kept behind APIs, which people are binding their services to with abandon.

      Maybe there's a connection to be made here to the debate around whether these models just represent surface level statistics (Markov chains) or if there is some sort of underlying alien representational model: https://thegradient.pub/othello/

    4. As with search, we should be particularly wary of information infrastructures that are technically open13 but embed design logics that preserve the hegemony of the organizations that have the resources to make use of them.

      well said!

    5. google

      Caps

    6. “Open” standards are yet another fraught domain of openness. For an example within academia, the seemingly-open Digital Object Identifier (DOI) system was concocted as a means for publishers to retain control of indexing research, avoiding the impact of the proposed free repository PubMedCentral and the high overhead of linking documents between publishers11 (see sec. 3.1.1 in [73]). The nonprofit standards body NISO’s standards for indicating journal article versions [74] and licensing [75] are used by publishers to enforce their intellectual property monopolies and programmatically scour the web to prevent free access to publicly funded information

      The Handle System that underlies DOI is particularly opaque, and thus has led to the single point of failure that is https://dx.doi.org/

    7. After shuttering Freebase, Google has donated a substantial amount of money to kickstart its successor [69] Wikidata,

      Wikidata's inception actually predates Freebase's donation by about two years...

      Denny Vrandečić helped start Wikidata then went to Google to help start their Knowledge Graph and then went back to Wikimedia Foundation: https://en.wikipedia.org/wiki/Denny_Vrande%C4%8Di%C4%87

    8. They are coproductive with the corporate and technical structure of surveillance capitalism, facilitating conglomerates that gobble up as many platforms and data sources as possible to stitch them into an expanding, heterogeneous graph of data.

      The knowledge graph that was meant to live on the web, is extracted from the web and kept almost completely private, with small rich snippets showing up in search results like the tip of an iceberg, with the assertions and structured data hidden below the surface.

    9. Palantir

      Glad you made this connection. In case it's of interest, there are some details about how entities are linked together in https://logicmag.io/commons/enter-the-dragnet/

    10. That’s all within biomedical sciences, but RELX’s risk division also provides “comprehensive data, analytics, and decision tools for […] life insurance carriers” [35], so while we will never have the kind of external visibility into its infrastructure to say for certain, it’s not difficult to imagine combining its diverse biomedical knowledge graph with personal medical information in order to sell risk-assessment services to health and life insurance companies.

      This section is reminding me of how biomedical use cases were some of the first "real world" implementations of semweb technology.

    11. asymmetry

      This is such an important keyword that sums up how APIs disciplined the web as an information space during the "Web2" period.

    12. We were recast from our role as people creating a digital world to consumers of subscriptions and services.

      Yes, Google recast knowledge graphs as an SEO problem -- "publish your linked data this way, and it will show up in our search results like this".

    13. The mutation from “Linked Open Data” [21] to “Knowledge Graphs” is a shift in meaning from a public and densely linked web of information from many sources to a proprietary information store used to power derivative platforms and services.

      I think this is right. The outlier here being Wikidata I guess?

    14. driven more by an empirical approach of trying to realize these systems on the wilds of the web, creating some of the first public “Linked Open Data” systems like DBPedia and Freebase.

      Also driven by these uncamp-style conferences, vocamps: http://vocamp.org/wiki/Main_Page

      Although arguably these were meant to be more oriented around liberating people to create their own vocabularies, instead of aligning everyone to using the same ones. So perhaps mentioning vocamps here doesn't make sense. They almost represent a deferred future, or road not taken...

    15. [13]

      love this quote!

    16. “people were frightened of getting lost in it. You could follow links forever.”
    17. triplet links

      maybe "triples" will be a more familiar term?
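
      For readers who haven't met the term: a triple is just a (subject, predicate, object) statement, and a graph is a collection of them. A tiny sketch in plain Python, with made-up example.org identifiers and a helper name of my own; no RDF library is assumed.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """One statement: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str

# Invented identifiers under example.org, purely for illustration.
KNOWS = "http://example.org/knows"
graph = [
    Triple("http://example.org/alice", KNOWS, "http://example.org/bob"),
    Triple("http://example.org/bob", KNOWS, "http://example.org/carol"),
]

def objects_of(graph, subject, predicate):
    """Follow one kind of link outward from a subject."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]

alice_knows = objects_of(graph, "http://example.org/alice", KNOWS)
```

      Because every object can be the subject of further triples, you really can follow links forever.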

    18. dissolve the Silos

      This reminded me of timbl's original purpose of the WWW: http://cds.cern.ch/record/369245/files/dd-89-001.pdf

  7. Aug 2021
    1. The tool relies on a new algorithm designed to recognize known child sexual abuse images, even if they have been slightly altered. Apple says this algorithm is extremely unlikely to accidentally flag legitimate content, and it has added some safeguards, including having Apple employees review images before forwarding them to the National Center for Missing and Exploited Children. But Apple has allowed few if any independent computer scientists to test its algorithm.

      Even with the white paper they published a lot of questions remain about how this self-supervised ConvNet model is generated and used.

    1. We are privacy and cybersecurity researchers whose careers are built on protecting users. That’s why we’ve been so careful to make sure that our Ad Observer tool collects only limited and anonymous information from the users who agreed to participate in our research. And it is also why we made the tool’s source code public so that Facebook and others can verify that it does what we say it does.

      This is absolutely key.

    2. our work shows that the archive of political ads that Facebook makes available to researchers is missing more than 100,000 ads.

      Need to follow up and look at the ads that are missing from the archive. It would be super interesting to see if they are different from the public ones in some way.

    1. Collecting data via scraping is an industry-wide problem that jeopardizes people’s privacy, and we’ve been clear about our public position on this as recently as April.

      How ironic that FB would fall victim to scraping which is the very thing that they did to bootstrap their startup collecting women's faces from university websites.

    2. The extension also collected data about Facebook users who did not install it or consent to the collection.

      What information?

    3. usernames, ads, links to user profiles and “Why am I seeing this ad?” information, some of which is not publicly-viewable on Facebook.

      Why is the "Why am I seeing this ad?" information not public? Why are ads not public? Maybe this is the problem, and not the fact that they had to scrape them?

  8. Mar 2021
    1. "The Facebooks and the Googles are taking over, and they want to make money," Bailey said. The more people act on the internet behind a password and the more the web becomes corporate, the more the open internet ethos fades away from the public consciousness, easing the way toward that splintering that Kahle fears.

      And IA doesn't want to make money?

    2. force

      This is the right word to use here.

    3. Imagine if each of us could look back on our great-grandparents and know what they said or thought at age 15, and then 25, and 50. The Archive would allow that.

      What does this type of memory do to our culture? Is it really a win-win?

    4. Social media companies want us to focus on tomorrow, not on the posts we made a year ago. Publishers do, too. HarperCollins is suing the archive to try to prevent it from sharing out-of-print books in its digital library, arguing that publicly sharing out-of-print books is a massive violation of copyright laws. While at first it might seem odd that publishers would care about books that aren't in print anymore, for companies whose business depends on people buying new things, archiving so that people can focus on the past is not in their financial interest.

      IA is on the wrong side of this one. People are tired of Silicon Valley disruptions. Book publishers might like to dip into their back catalog and republish things. How often does that happen? Why is book circulation practice ripe for disruption by this one organization that stands to profit from it?

    5. The people building open-source translation tools at Mozilla have also found the internet archive's collection of websites in multiple languages useful for training their translation tools

      Interesting, I wonder what that project was and how it worked.

    6. "That's always the dilemma of the librarian."

      Yes, it is an old problem, and not just an idle one of the anxious as Jefferson seems to imply. Or maybe he was taken out of context.

    7. the most important fraction

      Oh, it's that easy? Just find the most important stuff!

    8. There's no use being anxious over what's outside your control," he said.

      What a brush off.

    9. "I'd look like an idiot," he said — because no one really can guess the size or scale of the internet. (Don't get there in your head, if you can avoid it. How would you even measure: by data size? Number of objects? Number of distinct URLs?)

      Companies that have a larger crawl breadth than IA are one way to measure. e.g. Google.

    10. Web pages have an average lifespan of about 90 days before they change or disappear, and so the Archive needs to capture those pages at a minimum of every 90 days to preserve a full picture of the web over time.

      A citation would be good here.

    11. Section 230 — which protects website owners from legal liability for content created and posted by its users — would destroy the delicate legal framework that protects the Internet Archive's work (as well as Wikipedia and user-contributed projects),

      So it appears the IA and the platforms are allied in not updating 230 which allows these corporations to profit from the distribution of disinformation and lies of the powerful. Not that the baby should be thrown out with the bathwater, but clearly some adjustments need to be made to make these companies legally accountable for what happens on them?

    12. At the end of the day, we're just a library.

      Increasingly just the library. This is the problem.

    13. Facebook is the hardest, because the company is archiving-unfriendly in general

      There are some really good reasons why social media should be hard to archive. See https://www.jstor.org/stable/j.ctt7t09g

    14. That could soon change, however. "Are we at risk of locking down? Yes, absolutely," he said. The Internet Archive is currently blocked in China, and occasionally as well in Russia, India and Turkey, and that's just at the whim of nation-state governments that have the tools to make that work. According to Kahle and Bailey, corporations are just as capable of fracturing the web in ways that make it harder to access and archive; even "user lock-in" to a specific browser and products could one day create internet bubbles, and then walls, based on the products people pay for.

      This has always been the case with the web, as soon as the Cookie and site logins were created.

    15. a professor's ID

      Minsky or Hillis? My bet would be Hillis.

    16. until the end of time

      The end of time you say? Now I will have to keep reading. But I guess that was the point.

    17. the Internet Archive will keep doing what it's been doing since 1996: preserving every fragment of text you or I are ever likely to read

      Hyperbole much? Who is likely to read what? The Web is much bigger than the Internet Archive's view of the Web. Little statements like this don't help us understand what is being preserved from the web.

  9. Dec 2020
    1. noarchive

      This implies that the cached content is there, but a link to the content is not provided publicly.
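
      Assuming the highlight refers to the `noarchive` value of the robots meta tag, here is a small sketch of how a crawler might check a page for it using only the standard library. The class and function names are mine, and it ignores the equivalent HTTP header.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives listed in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )

def has_noarchive(html):
    """Return True if the page asks not to be shown from a cached copy."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives

# Invented sample page.
page = '<html><head><meta name="robots" content="noindex, NOARCHIVE"></head><body></body></html>'
page_is_noarchive = has_noarchive(page)
```

      Whether the crawler still keeps a private copy is a policy choice the directive itself does not settle.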

  10. Aug 2020
    1. Epilogue: The Uncertain Climb

      Thinking about Luther, the Catholic Church, and the printing press provides a lens on thinking about the complicated ways that the Internet is shaping and being shaped by our social and political lives.

  11. Nov 2019
    1. Do any creative projects — including personal poetry, expository writing, etc. — in a journal or on your personal computer, using your personal Google Drive, Microsoft 365 account, or a native text document tools like Notes or TextEdit.

      It's sad to me that things have come to this point where teenagers are asked to not do creative work on the web.

  12. Oct 2019
    1. The new gold rush in the context of artificial intelligence is to enclose different fields of human knowing, feeling, and action, in order to capture and privatize those fields. When in November 2015 DeepMind Technologies Ltd. got access to the health records of 1.6 million identifiable patients of Royal Free hospital, we witnessed a particular form of privatization: the extraction of knowledge value. 53 A dataset may still be publicly owned, but the meta-value of the data – the model created by it – is privately owned.

      AI is part of an old colonial project.

    2. Increasingly, the process of quantification is reaching into the human affective, cognitive, and physical worlds. Training sets exist for emotion detection, for family resemblance, for tracking an individual as they age, and for human actions like sitting down, waving, raising a glass, or crying. Every form of biodata – including forensic, biometric, sociometric, and psychometric – are being captured and logged into databases for AI training.

      The dark side of all the sensors we place around us.

    3. dysprosium
    4. Terbium
    5. With Amazon Mechanical Turk, it may seem to users that an application is using advanced artificial intelligence to accomplish tasks. But it is closer to a form of ‘artificial artificial intelligence’, driven by a remote, dispersed and poorly paid clickworker workforce that helps a client achieve their business objectives.

      Artificial Artificial Intelligence

    6. While ‘off the shelf’ machine learning tools, like TensorFlow, are becoming more accessible from the point of view of setting up your own system, the underlying logics of those systems, and the datasets for training them are accessible to and controlled by very few entities.

      The data is integral, it's not just about having the tools. That's why the tools are made open source.

    7. satellite picture
    8. In 2009, China produced 95% of the world's supply of these elements, and it has been estimated that the single mine known as Bayan Obo contains 70% of the world's reserves.

      This was 10 years ago. I wonder what it looks like today.

    9. There are 17 rare earth elements, which are embedded in laptops and smartphones, making them smaller and lighter. They play a role in color displays, loudspeakers, camera lenses, GPS systems, rechargeable batteries, hard drives and many other components. They are key elements in communication systems from fiber optic cables, signal amplification in mobile communication towers to satellites and GPS technology. But the precise configuration and use of these minerals is hard to ascertain. In the same way that medieval alchemists hid their research behind cyphers and cryptic symbolism, contemporary processes for using minerals in devices are protected behind NDAs and trade secrets.

      Difficult to understand how these rare earth metals are being deployed.

    1. Data science can indeed play a role in addressing deep inequities. Progressive critics of algorithmic decision making suggest focusing on transparency, accountability and human-centered design to push big data toward social justice.

      open source and transparency

    2. Contemporary proponents of poverty analytics believe that public services will improve if we use these data to create “actionable intelligence” about fraud and waste. Daniels, for example, promised that Indiana would save $500 million in administrative costs and another $500 million by identifying fraud and ineligibility over the 10 years of the contract.

      Cost savings is the promise.

    1. The Latin alphabet evolved from the visually similar Etruscan alphabet, which evolved from the Cumaean Greek version of the Greek alphabet, which was itself descended from the Phoenician alphabet, which in turn derived from Egyptian hieroglyphics.[1] The Etruscans, who ruled early Rome, adopted the Cumaean Greek alphabet, which was modified over time to become the Etruscan alphabet, which was in turn adopted and further modified by the Romans to produce the Latin alphabet.

      But did the Latin alphabet really come from Greek?

  13. May 2019
  14. Apr 2019
    1. Organization Guests

      I think this will allow us to have contractors on particular projects?

  15. Mar 2019
  16. archivaria.ca archivaria.ca
    1. The documentary heritage should be formed according to an archival conception, historically assessed, which reflects the consciousness of the particular period for which the archives is responsible and from which the source material to be appraised is taken

      This is the heart of appraisal, but how does one measure group consciousness?

  17. Feb 2019
    1. At any one moment of time there are X amount of tweets in the public firehose. You're allowed to be served up to 1% of whatever X is per a "streaming second." If you're streaming from the sample hose at https://stream.twitter.com/1/statuses/sample.json, you'll receive a steady stream of tweets, never exceeding 1% X tweets in the public firehose per "streaming second." If you're using the filter feature of the Streaming API, you'll be streamed Y tweets per "streaming second" that match your criteria, where Y tweets can never exceed 1% of X public tweets in the firehose during that same "streaming second." If there are more tweets that would match your criteria, you'll be streamed a rate limit message indicating how many tweets fell outside of 1%.

      I'm not sure I've seen this documented elsewhere in the current Twitter documentation. But I believe something like this is still in operation when retrieving data from the filter stream.
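The cap described in the quoted passage comes down to simple arithmetic, sketched below. This only illustrates the rule as quoted and is not Twitter's implementation: the function name `filter_stream_delivery`, the sample counts, and rounding the 1% cap down are all assumptions.

```python
from typing import Tuple


def filter_stream_delivery(firehose_count: int, matching_count: int) -> Tuple[int, int]:
    """Return (delivered, withheld) for one "streaming second".

    firehose_count is X, the number of public tweets in that second.
    matching_count is the number of tweets that match the filter.
    """
    # The cap is 1% of X. Rounding down is an assumption made for this sketch.
    cap = firehose_count // 100
    delivered = min(matching_count, cap)
    # Whatever falls outside the cap would be reported in a rate limit message.
    withheld = matching_count - delivered
    return delivered, withheld


# Hypothetical numbers: 6,000 public tweets in the second.
print(filter_stream_delivery(6000, 45))   # under the cap, nothing withheld
print(filter_stream_delivery(6000, 250))  # over the cap, some tweets withheld
```

With these made-up numbers the cap is 60 tweets, so a filter matching 250 tweets would receive 60 and be told that 190 were withheld.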

  18. Nov 2018
    1. 1,484,166 total views Share Facebook Twitter LinkedIn

      That's a lot of views, and now they are gone?

    1. donations from people like you

      taxes from citizens like you

  19. Sep 2018
    1. With the release of GPGMail 3.0 stable, we will start charging a small fee for GPGMail

      Notification of business model change.

  20. Jul 2018
    1. #### Parker Higgins ##### About Parker Higgins is an activist at the Electronic Frontier Foundation, working on issues of copyright, free speech, and electronic privacy. He also co-authors the weekly IP newsletter \[[Five Useful Articles](http://five.usefularticl.es/)\]. Follow him on Twitter at [@xor](https://twitter.com/xor).

      Hi Markdown, fancy seeing you here.

  21. May 2018
    1. Assange’s previously active Twitter account has had no activity since then

      Both @wikileaks and @julianassange accounts seem active right now (2018-05-16). But I guess the #ReconnectJulian campaign has taken them over?

      https://twitter.com/JulianAssange/status/988305592539340800

    1. statistically

      We talked about dropping the word "statistically" since some signals may just be assertions "A was written by B" which in itself isn't statistical.

  22. Mar 2018
    1. Media professionals and everyday users found common ground in noting that at the very least, a news outlet should contact users to let them know that their tweets may be used in a story. As with Asian-American Twitter, both journalists and regular users in Black Twitter said a simple DM could open a line of communication between a reporter and a potential source. Jesse Holland, an Associated Press reporter, said he contacts users to verify tweets, which are essentially quotes. “Verification is probably the first and foremost thing,” Holland said. “Doing that means that you’re actually having a conversation, either by email or in person. It’s very rare that I would just take someone’s tweet and say, ‘This person said that.’” Initiating conversation with Twitter users equips reporters to provide accurate context by going beyond the metrics of what is being retweeted, and why. Simply searching for high retweets and “favorites” can link false narratives to Black Twitter via popular hashtags. For instance, the far right-wing account @prisonplanet had three out of four of the highest retweet counts in our data set, amassing just over 21,000 and 18,000 retweets for two tweets using the #notmypresident hashtag, which Black Twitter used to signal disdain for President-elect Donald Trump. The account gained an additional 13,000 tweets by linking to a video that the user claimed “would be devastating for #blacklivesmatter.” Verification of the identity and intention of users like this, preferably through conversation, is key to understanding the message that is being communicated through hashtags that gain traction on Twitter. Simply relying on Twitter trends to tell the story will not suffice.

      I think this applies for archivists too, because the content is being collected for long term preservation.

    1. Any favorite examples?

      This is a great question! I'll look forward to reading the article :-)

  23. Feb 2018
    1. By “document” here, I mean to capture a comprehensive record, or at least a good approximation, of the present reality that can be consulted today and brought forward into the future.

      I find this idea of documenting "reality" a bit troubling. Archives serve purposes, and if we aren't explicit about these purposes, and instead simply talk about how effectively they document reality we're not doing our job.

      Lynch claims to be offering up "pragmatic" approaches several times in this piece. But the key measure of pragmatism (in the philosophical sense) is the degree to which something is useful. Documenting reality is not a use. What purposes does the documentation need to serve right now?

      I'm as guilty as anyone for pointing to the future and imagining some user who will want to know that something happened. But it's just not a satisfying story to tell.

  24. Dec 2017
    1. Over two-thirds of users were unsure whether Twitter gives public Tweets to the Library of Congress for archiving (and another 11.5% were incorrect). This raises questions about what Twitter users think happens to Tweets in the long term. It also raises questions about whether they are truly giving informed consent for this archiving. Finally, only a slim majority of users accurately indicated that Tweets are set to be public by default. Given the common refrain that Twitter is a “public” platform, having 33% of respondents indicate they are uncertain whether or not Twitter is public by default suggests some users may not actively perceive it this way. This raises many questions about the kinds of literacy work that needs to be done to improve user understanding of what it means for a platform to be “public.” Together, these individual findings suggest that the problems of inaccurate knowledge of information flow highlighted by these three anecdotes may be more common across a wider swath of users.

      This is a really important finding for people who are actively archiving social media, and Twitter in particular. It shows why archivists shouldn't throw consent out the window as they shift to archiving "public" content on the web.

  25. Nov 2017
    1. Label bots as automated accounts. This is technically achievable and would increase transparency in online political conversation.

      It's interesting to think about who would do the labeling. On the one hand, Twitter could try to identify the bots themselves and label them as such. On the other, bot creators could identify an account as a bot. Maybe there could be two labels?

  26. Oct 2017
    1. If the fool would persist in his folly he would become wise.

      I sure hope this one is true!

    1. 6.4.4 for 'dns' scheme

      It would be interesting to look at what DNS records a tool like Heritrix puts into a WARC file.

    1. However, relatively little attention in the literature has been paid to articulating specifically how Web-based materials fit into this larger body of cultural heritage materials.

      I wonder if Rogue Archives: Digital Cultural Memory and Media Fandom by Kosnik could help answer this?

    2. Duncan and Blumenthal claim that a collaborative approach has been critical to the success of NYARC’s Web archiving efforts, allowing curatorial and appraisal effort to be spread across member institutions, and helping to meet a variety of Web archiving challenges, including technical difficulties and resource deficiencies. Rollason-Cass and Reed also cite the importance of collaborating across institutions to create and grow the #blacklivesmatter Web Archives. Duncan and Blumenthal suggest that similar trans-institutional collaboration could be encouraged through national organizations like the NDSA.

      Collaboration, or participatory archives could provide a useful framework for future work.

    3. earce-Moses and Kaczmarek

      This looks like an interesting article.

    4. For other kinds of collecting efforts, archivists often require donor agreements from previous owners of archival materials, expressly handing over control to the archival institution; however, it is often not feasible to gain the consent of copyright holders for Web-based materials due to the sheer scale of collecting and due to the fact that it may not be possible to locate the copyright holder in many cases.

      A real challenge for consent.

    5. Determining the scope, scale, intensity, and frequency of collecting, all constitute important appraisal decisions that shape the resulting Web archives.

      There is a decidedly academic archive/library feel to this paper--but it's not really clearly scoped that way.

    6. Rollason-Cass and Reed describe this approach as a Spontaneous Events model, or a Living Archives model, as these collecting programs respond and adapt to ongoing developments, offering the example of the #blacklivesmatter Web Archives at the IA.

      Does the concept of a Living Archives come from somewhere else?

    7. While it is difficult for archivists to measure the success of their appraisal activity,

      Indeed, what does success even look like for an archive?

    8. If everything from the past is saved, it becomes close to impossible to actually find significant materials.

      There is also the significance of wanting to forget material. Thinking about Mayer-Schönberger's Delete.


    1. In the below text, we extend these metrics to encompass dynamic graphs, as well as define some new metrics that are unique to dynamic graphs.

      Extending metrics work of Dunne & Shneiderman.

    1. One could, for example, imagine an honest business model – in which people paid an annual subscription for a service that did not rely on targeting people on the basis of the 98 data-points that the company holds on every user.

      This is why I've started paying Medium. At least they are trying. Twitter, are you listening?

    1. function get_resource_info(url) { ajax("HEAD", url, function(response) { if(response.status==200) { $wmloading.style.display='none'; var dt=response.getResponseHeader('Memento-Datetime'); var dt_span=document.createElement('span'); var dt_result = datetime_diff(dt); var style = dt_result.highlight ? "color:red;" : ""; dt_span.innerHTML=" " + dt_result.text; dt_span.title=dt; dt_span.setAttribute('style', style); var ct=response.getResponseHeader('Content-Type'); var url=response.responseURL.replace(window.location.origin, ""); var link=document.createElement('a'); // remove /web/timestamp/ from appearance link.innerHTML=url.split("/").splice(3).join("/"); link.href=url; link.title=ct; link.onmouseover=highlight_on; link.onmouseout=highlight_off; link.setAttribute('style', style); var el=document.createElement('div'); el.setAttribute('data-delta', dt_result.delta); el.appendChild(link); el.append(dt_span); $capresources.appendChild(el); if(dt_result.highlight === true && show_warning_icon === false) { display_warning_icon(); } // sort elements by delta in a descending order and update container var items = Array.prototype.slice.call($capresources.childNodes, 0); items.sort(function(a, b) { return b.getAttribute('data-delta') - a.getAttribute('data-delta'); }); $capresources.innerHTML = ""; for(var i=0, len=items.length; i<len; i++) { $capresources.appendChild(items[i]); } } }); }

      A little function that uses a HEAD request to the Wayback Machine to determine the time gap between a web page and its constituent parts.
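The time-gap idea can be sketched offline in Python. This is not the Wayback Machine's own code: it assumes the two `Memento-Datetime` header values have already been fetched, the function name `memento_gap` is hypothetical, and the sample header values are made up.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def memento_gap(page_header: str, resource_header: str) -> timedelta:
    """Return the absolute time between two Memento-Datetime header values.

    Both arguments are HTTP-date strings such as
    "Tue, 10 Oct 2017 12:00:00 GMT".
    """
    page_time = parsedate_to_datetime(page_header)
    resource_time = parsedate_to_datetime(resource_header)
    return abs(resource_time - page_time)


# Made-up capture times for a page and one of its embedded resources.
gap = memento_gap(
    "Tue, 10 Oct 2017 12:00:00 GMT",
    "Thu, 12 Oct 2017 18:30:00 GMT",
)
print(gap)
```

A large gap suggests that the archived page is being assembled from captures made at noticeably different times.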

    1. The Twitter account, @Blacktivists, provided several clues that in hindsight indicate it was not what it purported to be. In several tweets, it employed awkward phrasing that a native English speaker would be unlikely to use. It also consistently posted using an apostrophe facing the wrong way, i.e. "it`s" instead of "it's."

      Well that`s interesting.

  27. Sep 2017
    1. a pre-study we conducted to select the pair of least diverging topic

      This was an important part of the study. To make sure that the story didn't bias the results that were supposed to be about the graphics.

    2. Amazon Mechanical Turk (AMT).

      Do they have any idea who these people are?

    3. Although we never intended to test all of them—our goal was to assess whether anthropographics generally have an effect on empathy and donating behavior, not to test for most effective designs—and although we do not claim they are exhaustive, we wanted to get a sense of the creative possibilities.

      A more interesting experiment might have been to compare all these variations?

    4. Second, because we believed using only proportional data—instead of using proportions and absolute numbers—would help avoid possible confounds linked to the proportion dominance bias

      Couldn't there be other biases built into the stories?

    5. standard chart (baseline)

      What is a standard chart?

    6. t is sometimes unclear whether they share the same definition of empathy

      This is important, and is directly related to how empathy is measured, since this is an empirical study.

    7. his complements Bateman et al.’s call to learn more about the effects of different types of visual embellishment in charts [6], and opens new perspectives for exploring the benefits of anthropographics.

      Isn't anthropographics a bit of a redundant term? All graphics are meant for humans aren't they? What other assumptions is this anthropological approach cooking into it?


  28. Aug 2017
    1. Ed Summers, a software developer at the Maryland Institute for Technology in the Humanities, graciously offered to grab some basic information about the more than 11,500 suspected new bot followers that were still following my account earlier this morning. An analysis of that data indicates that more than 75 percent of the accounts (8,836) were created before 2013 — with the largest group of accounts (3,366) created six years ago.

      It's nice to get a shout out from Mr Krebs.

    1. Google Scholar supports Highwire Press tags (e.g., citation_title), Eprints tags (e.g., eprints.title), BE Press tags (e.g., bepress_citation_title), and PRISM tags (e.g., prism.title). Use Dublin Core tags (e.g., DC.title) as a last resort - they work poorly for journal papers because Dublin Core doesn't have unambiguous fields for journal title, volume, issue, and page numbers.

      It looks like Google Scholar looks for a variety of metadata.
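One way to see which of these tag families a page carries is to pull the `<meta>` names out of its HTML. The sketch below uses only the Python standard library. The class name `CitationMetaParser`, the prefix list (derived from the examples in the quoted passage), and the sample HTML are assumptions for illustration, not a description of how Google Scholar parses pages.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collect <meta> tags whose name starts with a known citation prefix."""

    # Prefixes taken from the examples in the quoted passage, lowercased.
    PREFIXES = ("citation_", "eprints.", "bepress_citation_", "prism.", "dc.")

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.lower().startswith(self.PREFIXES):
            self.tags.append((name, attributes.get("content")))


# Made-up page head with two citation tags and one unrelated tag.
SAMPLE = """<html><head>
<meta name="citation_title" content="An Example Paper">
<meta name="DC.title" content="An Example Paper">
<meta name="viewport" content="width=device-width">
</head></html>"""

parser = CitationMetaParser()
parser.feed(SAMPLE)
print(parser.tags)
```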

  29. Jul 2017
    1. While innovation — the social process of introducing new things — is important, most technologies around us are old, and for the smooth functioning of daily life, maintenance is more important.

      This seems so obvious, but it's so overlooked. The addiction to constant growth and newness seems so closely tied to our ideas of how markets operate. Consumption and waste trump conservation and repair.

    1. One strategy that UTL employed in collaboration with project partners to address challenges of agency, differential access to resources, and the most direct application of benefit was very deliberate transactional use of project funding. Rather than assume transfer of documentation to UTL—either through donation or purchase—as required under the custodial paradigm, UTL instead helped to arrange and purchased negotiated access to documentation that remained in the custody or control of the partner organization. Project funds were put toward the arrangement, description, preservation, and digitization of documentation, just as they would have been if the archival materials were at UTL. But the investments were made not in Texas, but locally with the partner organizations. In this way, the partner organizations and in some cases communities were able to build infrastructure and skills in digitization, metadata, software development, and preservation appropriate to the context of their organizational goals and uses of the documentation. And in two cases at least, the human rights organization developed significant local expertise that served them well beyond their partnership with UTL. Additionally, rather than acquire the original records themselves—as called for under the custodial paradigm—UTL sometimes purchased digitized copies of documentation or gained non-exclusive access to documentation as they and partners made it available online. Though somewhat unusual for a custodial archival repository, this system was very familiar and comfortable for UTL as an academic library that annually spent hundreds of thousands of dollars for access to databases.

      I love this logic for post-custodial investment in record keeping infrastructures which draws the comparison to the way we pay for access to information that we do not own, but which instead only lines the coffers of corporations.


  30. Jun 2017
    1. Designers often manipulate the circle visualization that purports to track app-download progress, front-loading it so that it moves slowly at first but then speeds up at the end. This allows the download to please us by seeming to beat our expectations, which were established by the contrived slowness.

      It would be interesting to learn the source of this.

    1. Many public libraries have active local history collections of print materials documenting their region. These materials, however, are now increasingly published exclusively online. But technical hurdles, the absence of training resources on web archiving for local history collection development, and the lack of an active network of peer practitioners have hindered the capacity of public libraries to expand into community-focused web archiving.

      It's not just a technical problem. How do librarians and archivists decide what to collect from the web? Will people come and ask to donate their content? Should they focus on public domain government material? How can they adapt existing collection development policies to meet the web?

    1. Hans Ulrich Obrist and the artists Philippe Parreno and Olafur Eliasson all used the same word to describe his oeuvre: it’s a “toolbox”, they said, from which they can pluck useful ideas.

      Huh, a pragmatist perhaps?

    2. Morton means not only that irreversible global warming is under way, but also something more wide-reaching. “We Mesopotamians” – as he calls the past 400 or so generations of humans living in agricultural and industrial societies – thought that we were simply manipulating other entities (by farming and engineering, and so on) in a vacuum, as if we were lab technicians and they were in some kind of giant petri dish called “nature” or “the environment”. In the Anthropocene, Morton says, we must wake up to the fact that we never stood apart from or controlled the non-human things on the planet, but have always been thoroughly bound up with them.

      This idea that we're all Mesopotamians strangely reminds me of Philip K. Dick's idea that the empire never ended, but in this case the one from Iraq, not Rome.

    3. He says that we’re already ruled by a primitive artificial intelligence: industrial capitalism.

      This is the kind of idea that you can't unthink once you've thought it.

    1. \\

      In MARCBreaker format a backslash is used instead of a space.

  31. May 2017
    1. “Tastes, Ties and Time” dataset comprising four years’ worth of complete Facebook profile data harvested from the accounts of cohort of 1,700 college students.

      It's significant that the data is called a dataset which is a collection of data pulled from a website. In this aggregated form it affords different types of analysis. Sharing this dataset on the web takes the 4 years of work and makes it instantly available to anyone else in the world.

    1. Second - Researchers affiliated with an academic institution accredited by a member of the Council for Higher Education Accreditation remain able to share an unlimited number of Tweet IDs for non-commercial research purposes, subject to all of the other provisions and rules of the Developer Policy and Agreement.

      This is super news!

    1. So I’m going to appropriate the term to put a label on a few ideas.
    2. Firstly, there’s more that could be done to build better ways to deep link into pages, e.g. to allow sharing of individual page elements. But people have been trying to do that on and off for years without much visible success. It’s a hard problem, particularly if you want to allow someone to link to a piece of text. It could be time for a standards body to have another crack at it. Or I might have missed some exciting process, so please tell me if I have! But I think something like this would need some serious push behind. You need support from not just web frameworks and the major CMS platforms, but also (probably) browser vendors.

      What does success look like? You can for example link to this paragraph using Hypothesis' annotation.

      https://hyp.is/ZVCqDj7dEee45nfjauA71A/blog.ldodds.com/2017/05/18/enabling-data-forensics/

      It would be nice if annotation was built into browsers somehow so annotation wasn't so dependent on a particular service. Maybe we'll get there someday. Or maybe we just need to use the tools that already exist more?

    1. Firstly, there’s more that could be done to build better ways to deep link into pages, e.g. to allow sharing of individual page elements. But people have been trying to do that on and off for years without much visible success. It’s a hard problem, particularly if you want to allow someone to link to a piece of text. It could be time for a standards body to have another crack at it. Or I might have missed some exciting process, so please tell me if I have! But I think something like this would need some serious push behind. You need support from not just web frameworks and the major CMS platforms, but also (probably) browser vendors.

      Hypothesis annotation lets you do this, try this URL https://hyp.is/A9qIOj7dEeef5tPxZEUiYw/blog.ldodds.com/

      It would be nice if annotation was built into browsers somehow so it wasn't so dependent on a particular service. Maybe we'll get there someday.

    1. Facebook activity, if concentrated strategically, could be influential. Was the activity mostly in swing states? Did it occur in the months of the Republican primaries and originate with accounts seeded from Russia? Or did fake-news and fake- account activity peak in the three days before the election?

      Answering these questions is of great public interest. The question about whether activity clustered in swing states is of particular interest because it would suggest a clear motive to influence the election and not simply generate clicks.

      I'm guessing it is possible to see if a Facebook account was created by someone with a Russian IP address, and if a post came from a Russian IP address. But I think it's important to remember that this doesn't necessarily mean the fake news campaigns are an act of the Russian government. The posts could very well be the product of a business enterprise that is making money by generating fake news for clients. The question then would be, who are the clients? Follow the money.

    1. For him, like most of us, e-mail is a “habitat” rather than an application (Ducheneaut and Bellotti 2001),

      This could be an interesting lead to think about in terms of appraising social media content.


    1. Whereas public archives should be appraised and preserved for both evidential value and informational value, private manuscripts do not possess evidential value and are preserved only for their informational or research value, or their potential for use in research.

      Doesn't this presume that public records are not appraised? The reality is that not everything is kept, and some are selected -- just as is the case with personal archives.

    2. Most acquiring archives engage in some selection and arrangement that would threaten the “archive character” of the fonds.

      The same is true of non-personal records too isn't it?

    3. Jenkinson accordingly was suspicious of the practice of acquisition, where an archives acquired a fonds created by another individual or organization, of which he observed: “Turning to the other kind of Archives, that of documents written originally by one person or body and preserved by another, we have not of course the same guarantee against forgery or tampering, because there are now two sides involved and either may have a motive for deceiving the other.”

      ... such devotion to the institution, as if individuals working in government didn't alter the record themselves ...

    4. His Creed, the Sanctity of Evidence; his Task, the conservation of every scrap of Evidence attaching to the Documents committed to his charge; his Aim to provide, without prejudice or thought, for all who wish to know the Means of Knowledge.

      reminds me so much of Open Data mantras


  32. Apr 2017
    1. They already police their networks for pornography, and quite well.

      I'm no expert, but is this really the case?

    1. their ia_archiver web crawler consults a publisher’s robots.txt to determine what parts of a website to archive and how often

      I've since heard from several people that the Internet Archive does not respect robots.txt when crawling at all, and that the robots.txt is only consulted when deciding what archived content to make available in the Wayback Machine.

      I've never actually looked at my logs closely enough to confirm this, I guess because I've never actually told ia_archiver to go away either ...

    1. In identifying the need to shift from post-modern theory to archival performance, Schwartz brings diplomatic and photographic theory together to demonstrate the archival values of one specific form—of photographs

      Archival performance -- I like the sound of that, and how it could relate to performativity more generally.

    2. With the myths of the simplicity of the format-neutral world beginning to unravel, archivists can now grasp the complexities of debates about form and practice that new formats, in particular electronic records, are stimulating. Ultimately, the advent of the special format of electronic records has the potential finally to eradicate the pestilence of the traditional format-neutral stance from archival thinking and practice

      Strong words! So can we even talk about an "archival science" after this?

    3. the critical characteristic is that a record has to be linked to doing something – it is inherently transactional in its nature

      Love this focus on "doing something". Does that come from Jenkinson, or somewhere else?

    4. The movement from an archive to a collection is characterised by a change in the unity and coherence which is derived from the collector who ‘‘constructs a narrative of luck which replaces the narrative of production’’ (Stewart 1984, p. 165).

      It's interesting to see the archive put in conversation with collections like this. It reminds me of some discussions about web archives and whether they in fact had more to do with collections than archives.

    5. However, many historians, including the late Raphael Samuel who considered archivists and librarians as the ‘‘Poor Bloody Infantry of the profession’’ (Samuel 1994, pp. 18–19), have yet to understand the active role that archivists and librarians play in ‘pre-cooking’ the raw materials of history (Elkner 2003, p. 55).

      Trouillot too.

    6. urportedly format-neutral approach to archival research and practices

      reminds me of ricky's position on needing to specialize in different formats -- the ideas need not necessarily translate across formats?


    1. A return to the older notion of “multiplying the copies” may make more sense: such copies will not, by definition, be unique, but will that make any difference?

      LOCKSS

    2. Unlike manuscripts, printed documents, photographs, and other traditional forms of records, electronic records have no material existence—at least none that can be perceived without the intervention of both hardware and software

      Makes me think of Kirschenbaum's Mechanisms.

    3. With tongues deep in their cheeks, archivists might try to assert that this represented a unique assemblage of information, but as an image constructed deliberately to lie, to misinform (“disinform,” perhaps), does it have value? The assumption that uniqueness is a positive quality in records—keep the information that is unique and disregard that which is not—is thus under serious attack.

      Reminds me of fake news.

    4. One medieval historian, for example, has estimated that a book of laws, compiled from other sources in ninth-century Italy, cost the equivalent of ninety-six two-pound loaves of bread, a staggering sum for the time.20

      Haha, how random. A book of laws sounds kinda voluminous though.

    5. Such diversity of opinion in specifying what the idea of uniqueness really means may indicate that, like many other archival ideas, this one is clearest if one has in mind a very narrow range of archival materials.

      This is really interesting, so the nature of the materials being considered shapes the type of uniqueness that is considered?

    6. Writers who approach uniqueness in this way have taken a step back from both the documents themselves and the information in them, emphasizing instead the processes that generate both.

      This makes me wonder about the approach we are taking in DocNow to appraisal. What are the ways that we can provide insight into the processes rather than the documents themselves? Perhaps the DocNow application itself is an embodiment of a macro-appraisal process?

    7. What have we meant by them in the past, and how have those meanings, which we encounter as fixed absolutes, evolved through time?

      Tracing this evolution over time does seem like a valuable thing to do. To unpack these assumptions that we kind of take for granted.


    1. This level of institutional mediation in providing access to cultural heritage information supports what Bourdieu has termed the “hierarchy of genres.” Within the fields that facilitate the production of culture, the symbolic production of art and literature is defined by their institutional treatment. This status creates a hierarchy of genres within each field that has been debated from Plato to the nineteenth-century Salons of Paris. The present-day translation of Bourdieu’s hierarchy as it applies to the field of art manifests in the digital environment, where the most important creators and artistic genres are reproduced online at an extremely high frequency (e.g., images of the Mona Lisa, paintings by Picasso, etc.), while works lower in the hierarchy may require more specific search terms. Within the digital environment the hierarchy is expressed through metadata.

      This seems like an interesting connection: the hierarchy of genres and metadata being considered together. The primacy of metadata in the digital preservation field in part speaks to the role that text plays in archival systems. Perhaps this may be supplanted by other forms of image reading, e.g. by machine-learning, etc.

    2. In physical form, there exist “complex problems with the relationship of physical structure, intellectual integrity, and the representation of spatial hierarchy,” which are eliminated or left out in digital form.

      Don't these relationships exist in the digital as well? They are just different, and reflect the digitization practices.
