Accelerating Future: Transhumanism, AI, nanotech, the Singularity, and extinction risk.


Hutter Prize for Compressing Human Knowledge

The €50'000 Prize for Compressing Human Knowledge was just announced via The motivation is as follows:

This compression contest is motivated by the fact that being able to compress well is closely related to acting intelligently. In order to compress data, one has to find regularities in them, which is intrinsically difficult (many researchers live from analyzing data and finding compact models). So compressors beating the current "dumb" compressors need to be smart(er). Since the prize wants to stimulate developing "universally" smart compressors, we need a "universal" corpus of data. Arguably the online lexicon Wikipedia is a good snapshot of the Human World Knowledge. So the ultimate compressor of it should "understand" all human knowledge, i.e. be really smart. enwik8 is a hopefully representative 100MB extract from Wikipedia.
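The quoted point, that compressing data means finding regularities in it, is easy to see with a generic compressor. Below is a minimal sketch. It uses Python's standard-library zlib as a stand-in for the "dumb" compressors the quote mentions, and the sample data is made up for illustration. Text with an obvious pattern shrinks dramatically. Pseudo-random bytes of the same length do not shrink at all, because there is no regularity to find.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more regularity was found)."""
    return len(zlib.compress(data, 9)) / len(data)


# Illustrative text with an obvious regularity: one sentence repeated.
structured = b"The cat sat on the mat. " * 400

# Bytes with no exploitable pattern (pseudo-random, seeded so runs are repeatable).
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(structured)))

print(f"structured: {compression_ratio(structured):.3f}")
print(f"random:     {compression_ratio(noise):.3f}")
```

zlib only exploits literal repetition. The prize is aimed at compressors that find far deeper regularities than that.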

This test is so much more meaningful than the Turing Test. It is quantitative, and amenable to incremental advances. It further emphasizes the relationship between general intelligence and ability to compress data. The Hutter Prize is a more concrete version of Jim Bowery's proposed C-Prize. A more detailed rationale is on the site here.
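Part of what makes the contest quantitative is that the score charges for the program as well as its output. The prize counts the size of the decompressor together with the compressed data, so hard-coding the data into the program gains nothing. The sketch below shows that accounting. `total_size` and the two decoder strings are hypothetical illustrations, not the contest's actual tooling.

```python
import zlib


def total_size(payload: bytes, decompressor_source: str) -> int:
    """Hypothetical score: bytes of compressed data plus bytes of the program that restores it."""
    return len(payload) + len(decompressor_source.encode("utf-8"))


corpus = b"The cat sat on the mat. " * 400  # illustrative stand-in for enwik8

# Honest entry: a tiny generic decompressor plus zlib-compressed data.
honest_decoder = "import sys, zlib; sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))"
honest = total_size(zlib.compress(corpus, 9), honest_decoder)

# "Memorizing" entry: the decoder embeds the whole corpus, so the payload is empty,
# but the program itself is now larger than the original data.
cheating_decoder = f"import sys; sys.stdout.buffer.write({corpus!r})"
cheat = total_size(b"", cheating_decoder)

print(honest, cheat, len(corpus))
```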

Filed under: AI
Comments
  1. I think the second requirement, i.e., to restore the original data exactly, is somewhat counterproductive. Natural language is just one way of representing knowledge and it surely isn’t the most efficient one. Hence, what one would want is a good natural language processing algorithm, which extracts the “knowledge” out of the wiki pages (or whatever source you want to use). Representing that knowledge using, say, ontologies paired with first-order predicate logic gets rid of the redundancies of natural language, so it can be compressed a lot more efficiently. On the “downside”, a complete restoration of the original pages becomes impossible, just as with lossy image compression, except that the actual quality of the information doesn’t suffer.

  2. I have to agree with Herwig Moser. Good teachers aren’t interested in their students repeating their text book, word for word, only in them learning from it and being able to represent the data differently.

    However, the prize’s founder may have another approach in mind.

  3. Luke: The human-repeating-a-textbook analogy doesn’t work. Making a machine that can regurgitate information is a solved problem: the text you’re reading at this instant was prepared by machines that memorize and repeat my keyboard inputs with perfect accuracy. If a student can repeat a chapter from an economics textbook verbatim from memory, we have no way of knowing how much space it occupies in her brain, or what methods she’s using to remember it. If the student can reproduce (or at least approximate) information that cannot be memorized, then they have found a way to compress it. One can’t memorize a list of optimal moves in chess for every possible board position, which is why it’s so remarkable when people play chess well. Regurgitating text books only sounds unintelligent because we assume that the regurgitator is applying the obvious method: just recording the information without compressing it.

    Suppose I were to recite to you the first 100 digits of the decimal expansion of pi. This isn’t all that impressive. It only tells you I have enough free time to memorize a string of 100 digits. But what if I can recite 10,000 digits without error? How about a million? At some point, memorization ceases to be an option. I must be generating the digits on-the-fly by some sort of algorithm. Finding those algorithms is the part that requires intelligence.

    In Hutter’s contest, we know that the program is not simply “memorizing” like a student that can repeat a textbook, because, unlike a student, we know exactly how much space the generating program requires.

  4. IMHO there are basically 3 components to a compressed corpus:

    1) The presentation-invariant knowledge.
    2) The calculus of presentation (vocabulary, syntax, grammar, visual markup, etc.).
    3) The noise.

    Taken together, 1 and 2 are the “model” part and 3 is the “noise” part of what I believe is called algorithmic statistics or minimum description length of the string. However, dividing the model into the presentation calculus (I’m making these terms up as I go along — there are probably legitimate academic terms) and presentation-invariant knowledge may make it clearer what people are actually talking about.

    Clearly the presentation calculus has its own value in rendering for human interface. The presentation-invariant knowledge is what people frequently think of as “human knowledge” but of course human knowledge encompasses more than that.

    I expect that the enwik8 level of the Hutter Prize will go some distance toward discovering things about the presentation calculus of an English Wiki, but that presentation-invariant knowledge may need a larger corpus to push compression to higher levels.

  5. Gutzeit, I was agreeing with:

    “Natural language is just one way of representing knowledge and it surely isn’t the most efficient one. Hence, what one would want is a good natural language processing algorithm, which extracts the “knowledge” out of the wiki pages (or whatever source you want to use). ”

    When I said:

    “Good teachers aren’t interested in their students repeating their text book, word for word, only in them learning from it and being able to represent the data differently.”

    It is unreasonable to assume students have time to memorise even key parts of a text book word-for-word. But if they can show that they have the knowledge, that will satisfy most good teachers. What I think the Hutter Prize should encourage is this latter form of information retrieval (as opposed to ‘data’ retrieval), where, as Herwig Moser says, the words of the text book aren’t spat out word-for-word, but the information from it is still retained and exportable (or, to continue the human analogy, explainable). Reconstructing the noisy knowledge as an additional layer will take some time to code, and it isn’t necessary for achieving something closer to a human’s mental compression.

    Perhaps another prize that awards advances in mimicking this kind of ontology system would be best.

    Another similarity that this could approximate: when people see something happen with their eyes, they don’t remember even most of the information originally present; they break it down, compressing it in several different ways before storing it in long-term memory. As optical illusions readily demonstrate, even our short-term recall of imagery is incredibly lossy; try finding your blind spot. If our eyes used pixels, we could safely say that our recall of visual information uses a lossy compression system.

    Another similarity that this could approximate: when humans read, if a word is incorrectly spelt, the very fact that we know what the word is *meant* to be shows understanding. If an advanced compression algorithm removed these spelling mistakes, it wouldn’t be a problem in most practical terms. For most purposes, I wouldn’t mind using a database or file system that just happens to automatically correct spelling errors, although if it didn’t recognise new words, such as Skype, Google (as opposed to Googol) or Flickr (as opposed to flicker), or somebody’s password, there would be problems.

    A system aimed at emulating human intelligence that shares the flaws human intelligence sometimes has is probably on the right lines, as a precursor to something better.

  6. This will not qualify for the competition, but it points to an answer to the problem.
    Hope it helps the dialogue.

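Comment 1 proposes dropping the requirement to restore the original data exactly. That requirement amounts to a byte-for-byte round-trip check, and a short sketch makes the contrast concrete. Here `normalize()` is a hypothetical stand-in for any meaning-preserving "knowledge extraction" step. The mild transformation it applies keeps every fact in the sample text, yet it still fails the check.

```python
import re
import zlib


def normalize(text: str) -> str:
    """Hypothetical lossy 'extraction' step: collapse whitespace and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


def round_trips(original: bytes, restored: bytes) -> bool:
    """The lossless criterion: restored data must equal the original exactly."""
    return original == restored


article = "Paris is the capital of  France.\nIt lies on the Seine."
raw = article.encode("utf-8")

# Lossless path: plain zlib restores every byte.
lossless = zlib.decompress(zlib.compress(raw))

# Lossy path: normalize first, then compress; the facts survive but the bytes do not.
lossy = zlib.decompress(zlib.compress(normalize(article).encode("utf-8")))

print(round_trips(raw, lossless))  # True
print(round_trips(raw, lossy))     # False
```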
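The pi argument in comment 3 can also be made concrete. A few lines of code can generate as many digits of pi as you care to ask for, so the program is a far shorter description of a million digits than the digits themselves. The sketch below uses Jeremy Gibbons' unbounded spigot algorithm, chosen here purely as an illustration. Nothing in the thread specifies a method.

```python
from itertools import islice


def pi_digits():
    """Yield the decimal digits of pi one at a time (Gibbons' unbounded spigot)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit n is now certain: emit it and rescale the state.
            yield n
            q, r, t, k, n, l = (10 * q, 10 * (r - n * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * n, l)
        else:
            # Not enough precision yet: fold in another term of the series.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)


first_20 = "".join(str(d) for d in islice(pi_digits(), 20))
print(first_20)  # 31415926535897932384
```

The generator's source stays the same size no matter how many digits you take from it. That is exactly the gap between memorizing and generating that the comment describes.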
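The split in comment 4 can be written in the standard two-part minimum description length form. A string x is described by a model M plus the data given that model. The further split of the model term into presentation-invariant knowledge K and presentation calculus P uses the comment's own improvised labels and is not standard notation.

```latex
L(x) = \min_{M} \bigl[\, L(M) + L(x \mid M) \,\bigr],
\qquad
L(M) = L(K) + L(P)
```

Here L(M) corresponds to components 1 and 2 of the comment, and L(x | M) to component 3, the noise.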
