    1. Making a Memento

      To create an archived version of the page that could be played back properly, I used the Internet Archive’s “Save” feature by going to this URL in my web browser:


      …which created this snapshot:


      From here, we can use wget to look at what gets played back:

      $ wget --server-response http://web.archive.org/web/20150709104019/http://iipc.github.io/warc-specifications/primers/web-archive-formats/hello-world.txt


        HTTP/1.0 200 OK
        Server: Tengine/2.1.0
        Date: Thu, 09 Jul 2015 10:41:38 GMT
        Content-Type: text/plain;charset=utf-8
        Content-Length: 13
        Set-Cookie: wayback_server=19; Domain=archive.org; Path=/; Expires=Sat, 08-Aug-15 10:41:38 GMT;
        Memento-Datetime: Thu, 09 Jul 2015 10:40:19 GMT
        Link: <http://iipc.github.io/warc-specifications/primers/web-archive-formats/hello-world.txt>; rel="original", <http://web.archive.org/web/timemap/link/http://iipc.github.io/warc-specifications/primers/web-archive-formats/hello-world.txt>; rel="timemap"; type="application/link-format", <http://web.archive.org/web/http://iipc.github.io/warc-specifications/primers/web-archive-formats/hello-world.txt>; rel="timegate", <http://web.archive.org/web/20150709104019/http://iipc.github.io/warc-specifications/primers/web-archive-formats/hello-world.txt>; rel="first last memento"; datetime="Thu, 09 Jul 2015 10:40:19 GMT"
        X-Archive-Orig-x-cache-hits: 0
        X-Archive-Orig-x-served-by: cache-sjc3122-SJC
        X-Archive-Orig-cache-control: max-age=600
        X-Archive-Orig-content-type: text/plain; charset=utf-8
        X-Archive-Orig-server: GitHub.com
        X-Archive-Orig-age: 0
        X-Archive-Orig-x-timer: S1436438419.302921,VS0,VE141
        X-Archive-Orig-access-control-allow-origin: *
        X-Archive-Orig-last-modified: Wed, 08 Jul 2015 22:33:03 GMT
        X-Archive-Orig-expires: Thu, 09 Jul 2015 10:50:19 GMT
        X-Archive-Orig-accept-ranges: bytes
        X-Archive-Orig-vary: Accept-Encoding
        X-Archive-Orig-connection: close
        X-Archive-Orig-date: Thu, 09 Jul 2015 10:40:19 GMT
        X-Archive-Orig-via: 1.1 varnish
        X-Archive-Orig-content-length: 13
        X-Archive-Orig-x-cache: MISS
        X-Archive-Wayback-Perf: {"IndexLoad":359,"IndexQueryTotal":359,"RobotsFetchTotal":1,"RobotsRedis":1,"RobotsTotal":1,"Total":371,"WArcResource":10}
        X-Archive-Playback: 1
        X-Page-Cache: MISS
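
      The Memento-Datetime and Link headers in the response above carry the relationships between the original URL and its archived copy. As a rough sketch only (the helper name parse_link_header and its regular expression are illustrative assumptions, and it handles only quoted parameters like those shown above), the headers could be unpacked like this:

```python
import re
from email.utils import parsedate_to_datetime


def parse_link_header(value):
    """Return {rel: {"uri": ..., other params}} for every rel token in a Link header."""
    links = {}
    # One link-value looks like: <URI>; name="value"; name="value"
    pattern = r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)'
    for uri, raw_params in re.findall(pattern, value):
        params = dict(re.findall(r'([\w-]+)="([^"]*)"', raw_params))
        # rel may hold several space-separated tokens, e.g. "first last memento"
        for rel in params.pop("rel", "").split():
            links[rel] = {"uri": uri, **params}
    return links


# Header values copied from the wget output above.
ORIGINAL = "http://iipc.github.io/warc-specifications/primers/web-archive-formats/hello-world.txt"
link_header = (
    f'<{ORIGINAL}>; rel="original", '
    f'<http://web.archive.org/web/timemap/link/{ORIGINAL}>; rel="timemap"; type="application/link-format", '
    f'<http://web.archive.org/web/{ORIGINAL}>; rel="timegate", '
    f'<http://web.archive.org/web/20150709104019/{ORIGINAL}>; rel="first last memento"; '
    f'datetime="Thu, 09 Jul 2015 10:40:19 GMT"'
)

links = parse_link_header(link_header)
memento_time = parsedate_to_datetime("Thu, 09 Jul 2015 10:40:19 GMT")

print(links["original"]["uri"])
print(links["memento"]["uri"])
# Formatting Memento-Datetime as 14 digits gives the timestamp used in the snapshot URL.
print(memento_time.strftime("%Y%m%d%H%M%S"))
```

      Here the formatted Memento-Datetime matches the 20150709104019 segment of the snapshot URL, which is how the playback URL and the capture time line up.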

    2. Extracting a WARC record

      Once we’ve identified the offset and length of a particular record (in this case, an offset of 1260 bytes and a length of 1085 bytes), we can snip out that record like this. Because tail -c +N counts bytes starting from 1, the zero-based offset of 1260 becomes +1261:

      $ tail -c +1261 hello-world.warc | head -c 1085
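
      The same slice can be sketched in Python, where file offsets are zero-based and no +1 adjustment is needed. The function name extract_record is an illustrative assumption, not part of any WARC tool:

```python
def extract_record(path, offset, length):
    """Return `length` bytes of the file at `path`, starting at zero-based `offset`."""
    with open(path, "rb") as warc:
        warc.seek(offset)          # skip the first `offset` bytes
        return warc.read(length)   # read exactly one record's worth of bytes


# Usage, assuming hello-world.warc is in the current directory:
#   record = extract_record("hello-world.warc", 1260, 1085)
#   print(record.decode("utf-8", errors="replace"))
```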

    3. Making the CDX

      To generate a content index (CDX) file, we have at least two options. There’s JWATTools:

      $ jwattools cdx hello-world.warc

      …(which created cdx.unsorted.out), or the cdx-indexer from OpenWayback:

      $ cdx-indexer hello-world.warc > hello-world.warc.cdx

      …(which created hello-world.warc.cdx).
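
      A CDX file is what lets us look up a record's offset and length. The sketch below assumes the file starts with a legend line such as " CDX N b a m s k r M S V g", where V is taken to be the record offset, S the record size, and g the file name. Whether either tool above emits exactly this legend is an assumption, and the sample row is hypothetical rather than real tool output:

```python
def parse_cdx(text):
    """Parse space-delimited CDX text into a list of {field letter: value} dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    legend = lines[0].split()
    if not legend or legend[0] != "CDX":
        raise ValueError("expected a legend line starting with ' CDX'")
    letters = legend[1:]  # e.g. ['N', 'b', 'a', ...]
    return [dict(zip(letters, line.split())) for line in lines[1:]]


# Hypothetical sample, NOT real output from jwattools or cdx-indexer.
sample = (
    " CDX N b a m s k r M S V g\n"
    "com,example)/hello.txt 20150709104019 http://example.com/hello.txt "
    "text/plain 200 - - - 1085 1260 hello-world.warc\n"
)

row = parse_cdx(sample)[0]
offset, length = int(row["V"]), int(row["S"])
print(row["g"], offset, length)
```

      Reading the legend first, rather than hard-coding column positions, keeps the sketch usable if a tool writes its fields in a different order.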

    4. Making the WARC

      To create a WARC, we used wget:

      $ wget --warc-file hello-world http://iipc.github.io/warc-specifications/primers/web-archive-formats/hello-world.txt

      …which created the compressed hello-world.warc.gz file. These special block-compressed files are often used directly, but in this primer, we uncompress the file so we can see what’s going on:

      $ gunzip hello-world.warc.gz

      …leaving us with hello-world.warc.
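
      “Block-compressed” here typically means that each record is gzipped on its own and the resulting gzip members are concatenated, so a single record can be decompressed without reading the whole file. The sketch below illustrates that idea with made-up record bodies. The helper name iter_gzip_members is an assumption, and the code works on any concatenated gzip data rather than on WARC specifically:

```python
import gzip
import zlib


def iter_gzip_members(data):
    """Yield the decompressed payload of each gzip member in `data`, one at a time."""
    while data:
        member = zlib.decompressobj(wbits=31)  # 31 selects the gzip container format
        yield member.decompress(data) + member.flush()
        data = member.unused_data  # bytes that follow the member just decoded


# Made-up stand-ins for two WARC records, compressed separately and concatenated.
records = [b"first record\r\n\r\n", b"second record\r\n\r\n"]
compressed = b"".join(gzip.compress(record) for record in records)

# gunzip-style decompression sees one continuous stream...
print(gzip.decompress(compressed) == b"".join(records))
# ...while member-by-member decoding recovers the individual records.
print(list(iter_gzip_members(compressed)) == records)
```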

    1. Wayback Machine being broken in Firefox

      Are you getting "Fail with status: 498 No Reason Phrase"? You might have your Referer header disabled.

      If that's the case, you can fix it by going to about:config and setting network.http.sendRefererHeader to 2 (or pressing the reset button to the right).

    1. Removing the navigational toolbar


      For example, here is an archived post discussing the <var>id_</var> identity flag. This is a normal link to the Wayback Machine, which renders with the navigational toolbar:

      Here is the same archived page, with the <var>id_</var> identity flag added to the link. This does not include the toolbar, but now the page renders poorly because of the broken references:

      Finally, here is the same archived page, with the <var>if_</var> iframe flag instead. This renders perfectly, without the toolbar:

      Since this is the most faithful reproduction of the original web page, please use the <var>if_</var> iframe flag for links to specific archive copies!
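
      In a Wayback Machine link, the flag is appended directly to the 14-digit timestamp. A minimal sketch of building the three link variants follows. The function name wayback_url is an assumption, example.com stands in for a real page, and the check is limited to the flags discussed here:

```python
def wayback_url(original_url, timestamp, flag=""):
    """Build a Wayback Machine link; `flag` is "", "id_" or "if_"."""
    if flag not in ("", "id_", "if_"):
        raise ValueError(f"unsupported flag: {flag!r}")
    # The flag sits between the timestamp and the slash that precedes the original URL.
    return f"https://web.archive.org/web/{timestamp}{flag}/{original_url}"


page = "http://example.com/"    # placeholder original URL
stamp = "20150709104019"        # 14-digit capture timestamp (YYYYMMDDhhmmss)

print(wayback_url(page, stamp))          # normal link, rendered with the toolbar
print(wayback_url(page, stamp, "id_"))   # identity flag
print(wayback_url(page, stamp, "if_"))   # iframe flag
```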

    2. Editors are encouraged to add an archive link as a part of each citation, or at least submit the referenced URL for archiving, at the same time that each citation is created or updated. New URLs added to Wikipedia articles (but not other pages) are usually automatically archived by a bot.


      In short, this is the code that needs to be added to an existing {{cite web}} or similar template:

      <ref>{{cite ... <!--EXISTING REFERENCE--> |archive-url=https://web.archive.org/web/<date>/http://www.originalurl.com |archive-date=<date> |url-status=dead}}</ref>
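
      The two <date> placeholders are not formatted the same way: the one inside the archive URL is the 14-digit Wayback timestamp, while archive-date is a human-readable date. As a sketch only (the function name archive_params is an assumption, and YYYY-MM-DD is just one date style that could be used), the parameters could be derived from a timestamp like this:

```python
from datetime import datetime


def archive_params(original_url, timestamp):
    """Return the archive-related template parameters for a 14-digit Wayback timestamp."""
    captured = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return (
        f"|archive-url=https://web.archive.org/web/{timestamp}/{original_url} "
        f"|archive-date={captured:%Y-%m-%d} "
        f"|url-status=dead"
    )


print(archive_params("http://www.originalurl.com", "20150709104019"))
```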
