Take apart webarchive files with Nu

Ever wonder what’s inside the .webarchive files that Safari saves? It turns out that they are just property lists, the format Apple commonly uses to serialize structured data. Here’s a short Nu script that opens a saved webarchive file and lists the resources it contains.

#!/usr/local/bin/nush

(import Cocoa)

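;; Read the archive file into memory.
(set webarchive (NSData dataWithContentsOfFile:"ProgrammingNu.webarchive"))
;; format: is an output parameter that reports the detected format,
;; so pass nil here instead of a format constant.
(set propertylist (NSPropertyListSerialization
  propertyListFromData:webarchive
  mutabilityOption:0
  format:nil
  errorDescription:nil))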

(set WebMainResource (propertylist "WebMainResource"))
(puts "--- Main Resource ---")
(set line (+ "0. " (WebMainResource "WebResourceURL")
             " ("  (WebMainResource "WebResourceMIMEType")
             ")"))
(if (set encoding (WebMainResource "WebResourceTextEncodingName"))
    (set line (+ line " (" encoding ")")))
(puts line)

;; Resource keys include the following:
;;   WebResourceURL
;;   WebResourceTextEncodingName
;;   WebResourceMIMEType
;;   WebResourceData
;;   WebResourceFrameName

(set WebSubresources (propertylist "WebSubresources"))
(puts "--- Subresources (#{(WebSubresources count)}) ---")
(WebSubresources eachWithIndex:
     (do (resource index)
         (set line (+ "#{(+ 1 index)}. " (resource "WebResourceURL")
                      " ("  (resource "WebResourceMIMEType") ")"))
         (if (set encoding (resource "WebResourceTextEncodingName"))
             (set line (+ line " (" encoding ")")))
         (puts line)))

Here’s what I get when I run it on the Programming Nu website:

--- Main Resource ---
0. http://programming.nu/ (text/html) (UTF-8)
--- Subresources (5) ---
1. http://programming.nu/stylesheets/nu.css (text/css)
2. http://programming.nu/files/recycle-s.png (image/png)
3. http://programming.nu/files/nupp.png (image/png)
4. http://programming.nu/files/masyu-solved.png (image/png)
5. http://programming.nu/files/ohloh_profile.png (image/png)

The raw data for each resource (an NSData object) is available under its WebResourceData key.
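For readers without Nu at hand, here is a rough Python sketch of the same idea using the standard plistlib module. It is not the script above, just a cross-check under stated assumptions: the sample archive is invented for illustration (it only mimics the keys listed earlier), and the helper names all_resources, list_resources, and resource_data are hypothetical.

```python
import plistlib


def all_resources(archive):
    """Return the main resource followed by any subresources."""
    return [archive["WebMainResource"]] + list(archive.get("WebSubresources", []))


def list_resources(archive):
    """Return (url, mime_type, encoding_or_None) for every resource."""
    return [
        (r["WebResourceURL"],
         r["WebResourceMIMEType"],
         r.get("WebResourceTextEncodingName"))
        for r in all_resources(archive)
    ]


def resource_data(archive, url):
    """Return the bytes stored under WebResourceData for the given URL."""
    for r in all_resources(archive):
        if r["WebResourceURL"] == url:
            return r["WebResourceData"]
    raise KeyError(url)


# Invented stand-in for a real archive; only the key names come from the post.
sample = {
    "WebMainResource": {
        "WebResourceURL": "http://example.com/",
        "WebResourceMIMEType": "text/html",
        "WebResourceTextEncodingName": "UTF-8",
        "WebResourceData": b"<html><body>hello</body></html>",
    },
    "WebSubresources": [
        {
            "WebResourceURL": "http://example.com/style.css",
            "WebResourceMIMEType": "text/css",
            "WebResourceData": b"body { margin: 0 }",
        },
    ],
}

# Round-trip through the binary plist format to imitate reading a saved file.
archive = plistlib.loads(plistlib.dumps(sample, fmt=plistlib.FMT_BINARY))

# Print one line per resource, numbering the main resource as 0.
for index, (url, mime, encoding) in enumerate(list_resources(archive)):
    line = f"{index}. {url} ({mime})"
    if encoding:
        line += f" ({encoding})"
    print(line)

# Pull the bytes of a single resource out of the archive.
css = resource_data(archive, "http://example.com/style.css")
print(css.decode("utf-8"))
```

To try this on a real file, open it in binary mode and pass the file object to plistlib.load in place of the round-trip line; the bytes returned by resource_data can then be written straight to disk.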
