1. Aug 15th, 2006

    The Web is broken … or just our concept of what works

    Ian Hickson on the W3C TAG mailing list (emphasis is mine):

    I did a short study recently checking only for _syntax_ errors in HTML documents, and the results were that of the 667416 files tested, 626575 had syntax errors. Over 93%. That’s only syntax errors in the HTML, not checking the CSS, the content types, the semantic errors (e.g. duplicate IDs — 86461 of those files had duplicated IDs), or any other errors. If you included those kinds of errors, you’d probably find that almost all pages had errors that would trigger this warning. Thus any sort of visible UI would be basically always saying “this page is broken”. That would not be good UI for the majority of users, who don’t care.
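
    For scale, the quoted counts can be turned into rates with a few lines of Python. This uses only the three figures in the quote; the variable names are mine.

```python
# Figures taken from the quoted message; nothing else is assumed.
files_tested = 667416
with_syntax_errors = 626575
with_duplicate_ids = 86461

# Share of tested files with at least one HTML syntax error.
syntax_error_rate = with_syntax_errors / files_tested

# Share of tested files with duplicated IDs.
duplicate_id_rate = with_duplicate_ids / files_tested

print(f"syntax errors: {syntax_error_rate:.2%}")
print(f"duplicate IDs: {duplicate_id_rate:.2%}")
```

    That is a little under 94% of files with syntax errors, and roughly 13% with duplicated IDs.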

    So what can we learn from this? First, that the Web is mostly broken: wrong element names, unbalanced tags, incorrect content types, duplicate IDs (the spec clearly says “thou shalt not …”). Every page is a sinner.

    But you’re reading this, so the Web evidently still functions in spite of all that. So that can’t be the lesson.

    That people are so bad with technology, they can’t even get HTML straight? (Yours truly included.) We already know that. We code as well as we drive. And that’s not a compliment.

    That technologies that are simple, fuzzy and resilient to human error work better in the long term?

    The lesson I learned (and I have similar numbers from my scraping logs) is that if you want anything of Internet scale, you need to assume a 100% failure rate.

    To demand anything that’s strict and compliant is to set up barriers against the outside world.
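
    To make the contrast concrete, here is a minimal sketch of a reader that assumes every page is broken and carries on anyway. It relies on Python's standard html.parser module, which does not raise on malformed markup. The TolerantAuditor class, the audit function and the sample markup are hypothetical names of mine for illustration, not the code behind any of the numbers above.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are never "unbalanced".
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class TolerantAuditor(HTMLParser):
    """Hypothetical helper: reads broken HTML without raising, and takes notes."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()     # how many times each id value appears
        self.open_tags = []      # stack of elements opened but not yet closed
        self.stray_closes = []   # closing tags with no matching open tag
        self.text = []           # text recovered regardless of markup errors

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids[value] += 1
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return  # e.g. <br/> reports an end tag; nothing to balance
        if tag in self.open_tags:
            # Pop back to the nearest matching open tag, tolerating bad nesting.
            while self.open_tags.pop() != tag:
                pass
        else:
            self.stray_closes.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def audit(html):
    """Parse whatever we are given and report problems instead of failing."""
    parser = TolerantAuditor()
    parser.feed(html)
    parser.close()
    return {
        "text": parser.text,
        "duplicate_ids": sorted(i for i, n in parser.ids.items() if n > 1),
        "unclosed": list(parser.open_tags),
        "stray_closes": parser.stray_closes,
    }


# Sample markup with a duplicated id, an unclosed <div> and a stray </i>.
report = audit('<div id="a"><p id="a">Hello <b>world</p></i>')
print(report)
```

    With this sample the report still recovers the text, and it lists the duplicated id, the unclosed div and the stray closing tag instead of rejecting the page.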

    Via Bill de hÓra, photo by e-magic

    Update: Thanks to Bill de hÓra for correcting me: the quote was actually by Ian Hickson, in response to Paul Prescod. And the irony of erring is not lost on me.

    1. Aug 16th, 2006

      The Internet is good for nothing, or it is simply a concept that works ›› La Cara Oscura del Desarrollo de Software

      [...] Reading Labnotes: The Web is broken … or just our concept of what works (from which I obviously took the title of this article) points out that in a sample of some six hundred thousand pages taken from the Internet, 93% of them had markup errors, and that is without digging much into more complicated things. [...]

    2. Aug 16th, 2006

      Eran

      Err… Postel’s Law?

    3. Aug 16th, 2006

      Labnotes » The Web is broken … follow up

      [...] My previous post, The Web is broken, has a lot more going on behind the scenes. [...]
