How does pageVault work?
The filter component of pageVault runs inside the web server’s address space. It inspects each HTTP request, and if the pageVault configuration specifies that the request is of a type that should be considered for archiving, it also inspects the response. The byte stream making up the response is characterised by computing a checksum. pageVault can be configured to treat certain parts of particular responses as “non-material”: these parts are excluded from the checksum calculation (but are included in the archive if the response is archived). The HTTP response header is also excluded from the checksum calculation. If the checksum of the response differs from the checksum calculated for the previous response to a request for the same resource, pageVault archives the response. For more details, see Web site archiving – an approach to recording every materially different response produced by a website, a refereed paper presented at AusWeb03.
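The checksum comparison described above can be illustrated with a minimal Python sketch. This is not pageVault's actual code: the class name `ChangeDetector`, the regular-expression approach to marking "non-material" parts, the choice of SHA-256 as the checksum, and the in-memory archive are all assumptions made for illustration only.

```python
import hashlib
import re


class ChangeDetector:
    """Hypothetical sketch of checksum-based archiving; not pageVault's real API."""

    def __init__(self, non_material_patterns=()):
        # Byte-level regexes marking parts of a response body to ignore
        # when computing the checksum (an assumed configuration format).
        self.non_material = [re.compile(p) for p in non_material_patterns]
        self.last_checksum = {}  # resource URL -> checksum of last archived body
        self.archive = []        # list of (url, full response body)

    def checksum(self, body: bytes) -> str:
        # Only the body is passed in, so HTTP response headers never
        # contribute to the checksum.
        material = body
        for pattern in self.non_material:
            material = pattern.sub(b"", material)
        # SHA-256 is an assumption; the source does not name the algorithm.
        return hashlib.sha256(material).hexdigest()

    def handle(self, url: str, body: bytes) -> bool:
        """Archive the response if it is materially different; return True if archived."""
        digest = self.checksum(body)
        if self.last_checksum.get(url) == digest:
            return False
        self.last_checksum[url] = digest
        # The complete body is archived, including the non-material parts.
        self.archive.append((url, body))
        return True
```

For example, with a detector built as `ChangeDetector([rb"<!--ts-->.*?<!--/ts-->"])`, two responses for the same URL that differ only inside the `<!--ts-->` markers (a hypothetical timestamp region) produce the same checksum, so only the first is archived. A change anywhere else in the body triggers a new archive entry.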