This compression contest is motivated by the fact that the ability to compress well is closely related to acting intelligently. To compress data, one has to find regularities in it, which is intrinsically difficult (many researchers make a living from analyzing data and finding compact models). So compressors that beat the current “dumb” compressors need to be smart(er). Since the prize aims to stimulate the development of “universally” smart compressors, we need a “universal” corpus of data. Arguably the online encyclopedia Wikipedia is a good snapshot of the Human World Knowledge. So the ultimate compressor of it should “understand” all human knowledge, i.e. be really smart. enwik8 is a hopefully representative 100MB extract from Wikipedia.
This test is much more meaningful than the Turing Test: it is quantitative and amenable to incremental advances. It further emphasizes the relationship between general intelligence and the ability to compress data. The Hutter Prize is a more concrete version of Jim Bowery’s proposed C-Prize. A more detailed rationale is available on the site.
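The claim that compressing data requires finding regularities in it can be illustrated with a small sketch. It uses Python's standard `zlib` module as a stand-in for a conventional general-purpose compressor; it is not a contest entry and says nothing about how actual submissions work. The helper name `compressed_ratio` and the sample inputs are assumptions made for illustration only.

```python
import os
import zlib


def compressed_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data, 9)) / len(data)


# Highly regular data: one short phrase repeated many times (illustrative input).
regular = b"the quick brown fox " * 5000

# Data with no exploitable regularity: bytes from the operating system's random source.
random_bytes = os.urandom(len(regular))

print(f"regular input ratio: {compressed_ratio(regular):.4f}")
print(f"random input ratio:  {compressed_ratio(random_bytes):.4f}")
```

The repeated phrase shrinks to a small fraction of its size because the compressor can exploit the repetition, while the random bytes barely shrink at all. Natural-language text such as enwik8 sits between these extremes: its regularities are real but far subtler than literal repetition, which is why finding them takes a smarter model.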