
Doesn't adding the decompressor length to the length of the archive encourage developing unreadable short decompressors, rather than smart compressors based on understanding the text corpus enwik8?


The typical size of current decompressors is less than 100KB, so obscuring one to make it shorter gains you at most 100KB/100MB = 0.1%, which is not enough to be eligible for the prize. On the other hand, fairness requires that the size of the decompressor be counted. Take a compressor like PAQ8H, which contains about 800KB of dictionary tokens: clearly it can achieve better compression than one built from scratch, so that 800KB must count against its score. If you're not convinced by this argument, consider the extreme case: a 100MB "decompressor" that simply outputs enwik8 byte by byte from a zero-byte archive, thereby "achieving" a compressed size of 0.
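To make that extreme case concrete, here is a minimal sketch in Python of such a degenerate "decompressor". It is purely illustrative: the corpus constant is a placeholder standing in for the full 10^8 bytes of enwik8, and the file paths are hypothetical.

```python
import sys

# Placeholder: in the actual degenerate program this constant would
# literally be all ~100MB of enwik8 baked into the executable.
ENWIK8_BYTES = b"..."

def decompress(archive_path: str, output_path: str) -> None:
    # The archive carries no information at all; every bit of the
    # corpus lives inside the decompressor itself.
    with open(archive_path, "rb") as f:
        assert f.read() == b""  # zero-byte archive
    with open(output_path, "wb") as out:
        out.write(ENWIK8_BYTES)

if __name__ == "__main__":
    decompress(sys.argv[1], sys.argv[2])
```

Scored on archive size alone, this program would claim a perfect compressed size of 0 bytes; scored on archive size plus decompressor size, it totals roughly 100MB, i.e. no compression at all. That is exactly why the contest metric includes the decompressor.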
