How does pageVault work?

April 26, 2017 · pageVault

1 Answer

The filter component of pageVault sits inside the web server’s address space. It inspects each HTTP request and, if the pageVault configuration specifies that the request is of a type that should be considered for archiving, it then inspects the response. The byte stream making up the response is characterised by computing a checksum over it. pageVault can be configured to treat certain parts of particular responses as “non-material”: these parts are excluded from the checksum calculation (but are still included in the archive if the response is archived). The HTTP response header is also excluded from the checksum calculation. If the checksum of the response differs from the checksum calculated for the previous response to a request for the same resource, pageVault archives the response. For more details, see Web site archiving – an approach to recording every materially different response produced by a website, a refereed paper presented at AusWeb03.
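
The decision described above can be illustrated with a small sketch. This is not pageVault's code: the names `material_checksum`, `Archiver` and `NON_MATERIAL` are hypothetical, SHA-256 is an assumed choice of checksum (the answer does not say which one pageVault uses), and the regular expression stands in for whatever mechanism pageVault's configuration provides for marking parts of a response as non-material.

```python
import hashlib
import re

# Hypothetical "non-material" pattern: a generation timestamp in an HTML comment.
# pageVault's real configuration syntax is not described in the answer above.
NON_MATERIAL = [re.compile(rb"<!--\s*generated at .*?-->", re.DOTALL)]


def material_checksum(body: bytes, non_material=NON_MATERIAL) -> str:
    """Checksum the response body with non-material parts stripped out.

    SHA-256 is an assumption made for this sketch.
    """
    for pattern in non_material:
        body = pattern.sub(b"", body)
    return hashlib.sha256(body).hexdigest()


class Archiver:
    """Archives a response only when its material checksum has changed."""

    def __init__(self):
        self.last_checksum = {}  # resource URL -> checksum of its previous response
        self.archive = []        # list of (url, full response body)

    def handle(self, url: str, headers: dict, body: bytes) -> bool:
        # The headers are deliberately left out of the checksum, as in the answer.
        checksum = material_checksum(body)
        if self.last_checksum.get(url) == checksum:
            return False  # materially identical to the previous response
        self.last_checksum[url] = checksum
        # The full body, including non-material parts, is what gets archived.
        self.archive.append((url, body))
        return True


archiver = Archiver()
first = archiver.handle("/x", {"Date": "1"}, b"<p>hi</p><!-- generated at 10:00 -->")
second = archiver.handle("/x", {"Date": "2"}, b"<p>hi</p><!-- generated at 10:05 -->")
third = archiver.handle("/x", {"Date": "3"}, b"<p>bye</p><!-- generated at 10:06 -->")
```

In this sketch the second response differs from the first only in its timestamp comment and headers, so it is not archived; the third has different material content and is archived in full.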

How can I minimise the resources used by pageVault?
Does pageVault only archive HTML?
Can I trial pageVault?