1. Aug 16th, 2006

    The Web is broken … follow up

    My previous post, The Web is broken, has a lot more going on behind the scenes.

    The point of reference is this post by Ian Hickson:

    However, it seems that these real world concerns are not a factor in the TAG’s findings, since the day after I posted the aforementioned blog post, they published a document describing how browsers must always follow Content-Type headers, how specifications must never require browsers to ignore such headers, and how authors must all go and correct their mis-configured servers.

    There are two ways to write specifications. You can architect something that’s elegant and correct. Or you can design something that works in the real world.

    A content type tells you what the server is returning, and video is not text/plain. Unfortunately, in the real world the content type is often wrong, and any browser that listens to it will just not work. It will be “spec compliant” but useless to anyone.
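
    To make that concrete, here is a rough sketch of what a lenient client might do: take a specific declared type at face value, but second-guess a generic one when the bytes plainly say otherwise. The function name `effective_type` and the small signature table are made up for illustration; real browsers apply far more elaborate rules than this.

    ```python
    # Hypothetical sketch of lenient content-type handling.
    # SIGNATURES and effective_type are illustrative names, not a real API.
    SIGNATURES = [
        (b"\x89PNG\r\n\x1a\n", "image/png"),
        (b"GIF87a", "image/gif"),
        (b"GIF89a", "image/gif"),
        (b"%PDF-", "application/pdf"),
        (b"OggS", "application/ogg"),
    ]

    GENERIC_TYPES = ("", "text/plain", "application/octet-stream")


    def effective_type(declared: str, body: bytes) -> str:
        """Return the type a lenient client would act on."""
        # Drop parameters such as "; charset=utf-8" and normalize case.
        base = declared.split(";", 1)[0].strip().lower()
        if base not in GENERIC_TYPES:
            return base  # a specific declaration is trusted as-is
        # Generic or missing declaration: check for a known binary signature.
        for magic, sniffed in SIGNATURES:
            if body.startswith(magic):
                return sniffed
        return base or "application/octet-stream"
    ```

    With this, an Ogg file served as text/plain is treated as application/ogg, while anything declared as text/html is left alone. A strictly compliant client would skip the loop entirely and show the user a page of binary garbage.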

    There are a lot of reasons why content types are wrong. Not reading the manual: I myself set up several sites before I realized I wasn’t serving the right content type. Then there’s being lazy: would you rather add a cool new feature, or go and fix the content type on all your pages?

    And then there’s being practical. For example, the correct content type for RSS feeds is application/rss+xml. In co.mments, I serve RSS feeds as content type application/xml. I do that because clicking on the Subscribe link has one of two results.

    If it’s the right content type, you get a confusing Open/Save dialog which annoys users. It’s also hard to develop and test that way. If it’s the wrong content type, the browser translates the XML into HTML, so the user gets a Web page they can deal with. And I can test it with my browser.
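
    The choice boils down to a single header. Here is a minimal sketch as a bare WSGI app; `FEED`, `feed_content_type` and `feed_app` are hypothetical names for illustration, not the actual co.mments code.

    ```python
    # Hypothetical WSGI sketch: the only interesting decision is the header.
    FEED = (
        b'<?xml version="1.0"?>'
        b'<rss version="2.0"><channel><title>Example</title></channel></rss>'
    )


    def feed_content_type(browser_friendly: bool = True) -> str:
        # application/rss+xml is the conventional feed type;
        # application/xml is the generic fallback that avoids the
        # Open/Save dialog described above.
        return "application/xml" if browser_friendly else "application/rss+xml"


    def feed_app(environ, start_response):
        headers = [
            ("Content-Type", feed_content_type()),
            ("Content-Length", str(len(FEED))),
        ]
        start_response("200 OK", headers)
        return [FEED]
    ```

    Flipping `browser_friendly` to `False` is all it takes to serve the conventional type instead, which is exactly why the decision ends up driven by what users see rather than by the spec.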

    In other words, the spec has one view of right and wrong, but the real world has the opposite view.

    Remember when Google Accelerator broke some sites (thanks to Bill again for reminding me)? The standard makes a clear distinction between GET and POST. GETs don’t have side effects.

    But the Web is about people, and people make their own distinction between links and buttons, and have their own perception of side effects. A login link on the top of the page has no side effects. For people, neither does logout, and there’s no reason for it to look any different.
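
    A toy model shows how the two views collide. Everything here is hypothetical (`ToySite`, `prefetch`, the paths); it only illustrates a prefetcher that assumes GET is safe meeting a logout link that is not.

    ```python
    # Toy model: neither the site nor the prefetcher is a real product.
    class ToySite:
        def __init__(self):
            self.logged_in = True

        def handle(self, method: str, path: str) -> str:
            if method == "GET" and path == "/logout":
                self.logged_in = False  # side effect hiding behind a plain link
                return "bye"
            return "ok"


    def prefetch(site: ToySite, links) -> None:
        # A prefetcher trusts the spec: GET is safe, so follow every link.
        for path in links:
            site.handle("GET", path)


    site = ToySite()
    prefetch(site, ["/inbox", "/settings", "/logout"])
    print(site.logged_in)  # False: the prefetcher logged the user out
    ```

    To the user, logout was just another link. To the prefetcher, it was just another safe GET. Both were following their own rules.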

    Do you build Web sites for people or machines?

    As a Web developer I’m often caught in that crossfire, because GETs are incredibly useful when they do have side effects. You can bookmark them, grab them from a feed reader, and use them as links. The more of the action they capture, the better they work.

    Setting up barriers to usability has the opposite side effect on standards.

    Again I’m passing the mic to Ian:

    The biggest thing I’m worried about here, to be honest, is that every time we (the standards community) require the browser vendors to do something that they consider makes their user’s life harder, they get one step closer to ignoring _all_ standards. That is, it’s not just that they pick and choose the standards they want to comply to — there comes a point where engineers decide that they’ve had enough of "stupid specs" in general, and ignore all of them. At that point, we (the standards community) might as well go home, because we are utterly powerless without the engineers listening to us and doing what we tell them to.

    It’s a tough call, but the standards community needs to realize it can’t work in a vacuum, away from the people who want to use the technology.

    Update: Also read this post by Mike Champion.

    1. Aug 20th, 2006

      len

      It isn’t an either/or situation. The pattern will repeat in smaller or larger numbers. Some specifications are ignored, parts are ignored, people are ignored and so on. The pattern weakens the utility of the web as a whole, but the reality is it weakens the perceptions of particular sites to particular people. As the evolution toward interactive sites (say web n.n) grows, it becomes necessary to apply client-side testing of server resources in more strict fashion.

      The kumbaya days of the web have long been over, if they ever were really that at all (say Blink). It is a caveat emptor market and technology. The focus has to turn to providing more client-side detection.

    2. Aug 23rd, 2006

      Assaf

      len,

      Anything that’s easy to test and has immediate impact is respected more often, e.g. the title element in HTML. Anything that’s hidden, hard to test, or a problem, ends up broken and we route around it.

      Which is fair. The Content-Type header was a good idea, and I still think it was a good solution. But it ended up not working, for a lot of different reasons.

      My point is, we have to look forward and we have to build on what’s out there in the field. If spec authors decide that purity is above real world, they risk alienating users and … what’s the point of a spec which no one uses?

      It’s like deciding that search engines should only index well-formed XHTML, and browsers should show error pages if the CSS is incorrect.

    3. Nov 16th, 2006

      Rob

      I don’t exactly know where to post this, this discussion is playing out in the blogosphere all at once, and I’m a few weeks behind. I’ll probably post a comment here, and then repost it on my blog, for whoever happens to wander by.

      I agree the W3C has made some enormous errors and needs to reform. They need to work with application developers, their own volunteers, the web standards people, the microformat people and most especially web designers. If they don’t, they’ll become an academic body whose badly implemented (or unimplemented) standards have about as much bearing on the internet as those W3DC people. It would be a shame, because they’ve done some great work in the past, and until recently they had a spiffy shiny-happy reputation which everyone loved to love.

      But let’s look at some of those standards. W3C was small when it created HTML3 and 3.2, but it was already a growing power on the web when it came out with HTML4. It had to be, or it never would have been able to rein in the competing markup languages put out by MS and Netscape. HTML4 was a compromise to be sure, but it streamlined the language to the point where the language seemed to take on a coherent vision, providing the features web designers needed but removing the excess bloat. W3C’s second major success was CSS, and after that came XML. Surely these are the W3C’s most enduring creations. I don’t know how much of W3C’s chemistry has changed since then, but certainly the W3C wasn’t flying under the radar; I clearly remember they got a LOT of press, especially about XML, and just about all of it was good.

      While each of the above standards *was* based on previous work, they were more than that. They were willing to push against prevailing implementations for the sake of elegance and power. They believed (and I believe) that standards should be elegant. Really there’s no reason elegant standards can’t work in the real world. Yes, implementations (both in documents and renderers) won’t be perfect right off the bat. But draconian standards don’t have to be adopted all at once. You can adopt them in pieces, but over time renderers should be getting stricter and start to enforce standards more rigorously. The desire for simple, predictable and strict implementation was, after all, part of the design process behind XML, arguably W3C’s greatest success.

      This can be at least partially a voluntary process: if browsers just provided purists with a prominently placed “strict” mode option optimized for and limited to rendering valid websites, a small percentage of users would turn it on. Web designers would be the first ones, testing their web sites in the stricter modes to keep from losing the purist web surfers. It would also be of a lot of help if the W3C came up with degrees of compatibility (TBL has even advocated this). You don’t have to look any farther than the Acid2 test to see how useful such a test could be, and how browser makers compete for the best results (although, unfortunately, the 2 most popular browsers still don’t make the cut).

      Whatever you think about mostly unimplemented standards like XForms, there’s no sensible reason for sites as huge as Amazon.com to continue to serve such sloppy, malformed HTML except for the fact that browser makers bend over backwards to allow them to do it. We would all be better off if websites served valid documents and browsers all rendered them correctly.

      The alternative to gradually requiring strict conformance is to let browsers stagnate with almost-implemented standards indefinitely. Such an approach would result in a stack of web technologies full of workarounds and violations, with no hope of ever being fixed, and would greatly complicate any future implementations and even make certain future extensions impossible.

      Now to quickly address some more specific points… I read the W3C document and browsers can correctly implement content-type headers and still allow the user to decide how to display the document. It would be an extra dialog box, but big deal. It’s better than IE’s choice to base the interpretation on the file extension, isn’t it? And as you’ve mostly admitted, it’s your own fault if you serve files with the wrong header. If content-headers were properly implemented tomorrow, web designers would very quickly learn to serve content with the correct headers. And the logic behind the W3C’s advice is IMHO sound.

      “For example, the correct content type for RSS feeds is application/rss+xml. In co.mments, I serve RSS feeds as content type application/xml.”

      Are you sure about this? While rss+xml sounds logical I don’t see it mentioned as an official mime type at http://www.iana.org/assignments/media-types/application/ . Or are mime types and content-headers different (I’m certainly no expert)? But more to the point, isn’t your choice of content-header precisely what the W3C recommends? O_o If you want the file to be displayed as XML styled with CSS, serve it as application/xml (which it certainly is). If, on the other hand, you wanted to serve the file as an RSS file specifically, for example to be opened in a dedicated feed reader, you supply it with the content-header for the specific flavor of XML. That would be my interpretation. In which case, you’re doing precisely what the W3C recommends for the reasons it recommends.

      Disclaimer: I am not a professional web designer, and most of my experience w/ W3C standards has been as a spectator and casual user. Although I have been frustrated with uncooperative web layouts, most of my angst has been toward poor browser implementations.

    4. Nov 16th, 2006

      Assaf

      Rob,

      Great comment!

      I also believe serving correct content is the best way to go. That way, we can spend less time fixing broken content, patching malformed results and dealing with tag soups. We can spend more time building features that matter.

      But, browsers that refuse broken content are, as far as end users are concerned, broken. And changing the existing content is not an option.

      The Web has a lot of useful information contained in unmaintained and understaffed sites that will not be fixed. You can either show that content or ignore it.

      Your typical “Web designer” doesn’t work for Amazon, know HTML 4.01 back to front, and monitor what comes out of the W3C. Your typical Web designer knows the very minimum HTML to get by.

      Most Web sites are designed using a simple process: write, test in browser, repeat until done.

      I have to admit, I like to follow the HTML/CSS/JS specifications. But when I find that something is broken in ten different ways, and no two browsers are broken the same way, I stop caring about it being correct; all I want is something that just works.

      It would be great, though, if the big guys did something about it and led by example. Sites that have correct content and browsers that don’t break over it.

      As for content type, technically application/rss+xml is not registered, but that’s the convention for identifying the content type before (or instead of) serving the content.

      AFAIK the Atom working group did register application/atom+xml, and I wouldn’t use that one either for exactly the same reason.

      I do fall back on application/xml, which is almost as good as far as the spec is concerned, but I’m willing to do that only because it works for users.

      It’s not the W3C’s fault that we’re in this situation. Talking specifically about content type, that’s actually a result of unintended consequences. Nobody planned to handle these content types correctly because for the longest time nobody cared.

      Then RSS happened, and all of a sudden we get to use them left and right, and we just go for the first thing that works. It’s a hack.

      My point is, the W3C didn’t get us into this problem, but they have to realize what happened in real life and work with that. They can’t turn a blind eye and decide it’s pure or bust.
