Aug 7th, 2005

    Searching the blogsphere

    Mary Hodder over at Napsterization is tossing around an idea for a new kind of PageRank for the blogsphere. My words, not hers. If I understand it correctly, I don’t get the _why_. I think we need something much better.

    First there was search. And search was good. But search doesn’t scale on the Web: when you search for a common keyword, you get millions of hits. Philosophical debates about rating and ranking aside, I want something that weeds through these millions of results and puts the better ones at the top. Google’s PageRank algorithm does a good job of exactly that.

    We can’t do without some sort of ranking, so let’s take that for granted. But better results are not going to come from tweaking the algorithm.

    With or without ranks, search as we know it was designed for a world that has content providers, content consumers, and a Chinese wall in between. Searchers, the content consumers, have no context.

    Not so on the blogsphere. A blog is a content provider, but a blog is also a context. Use it wisely and it establishes a sphere of relevance. All of a sudden, searchers can start from a context.

    Start with this blog, use it as context, and search for the keyword ‘blog’. First observation: there are a lot of links coming out of this blog. Most are links to sources I find interesting, relevant, authoritative. Others may disagree, but in this particular context, my outbound links rule. Anything these sources have to say about ‘blog’ should be ranked highest.

    Second observation: those blogs link to other blogs, which they find interesting, relevant, authoritative, etc. That’s a second hop that widens the sphere of relevance. Repeat enough times and you’ll spider the entire Web (something to do with six degrees of separation). But now we’re just duplicating Google.

    Third observation: limit the number of hops to a small number (say six), and decrease relevance in proportion to distance. A blog four degrees of separation away ranks lower than a blog two degrees away. Interesting patterns start to emerge.
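    The three observations can be sketched as a hop-limited breadth-first walk over outbound links. This is a minimal sketch under loudly stated assumptions, not a description of any existing search engine: the link graph is a hypothetical dict mapping each blog to the blogs it links to, posts are hypothetical (blog, title, text) tuples, relevance decays as 1/hops (one possible reading of "decrease relevance in proportion to distance"), and keyword matching is a naive substring count. The names `context_weights` and `contextual_search` are made up for illustration.

```python
from collections import deque


def context_weights(links, start, max_hops=6):
    """Breadth-first walk over outbound links, starting from the context blog.

    Assumption: relevance decays as 1 / hops. Returns {blog: weight} for every
    blog reachable within max_hops; the context blog itself is excluded.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        blog = queue.popleft()
        if hops[blog] == max_hops:
            continue  # hop limit reached: don't follow this blog's links
        for target in links.get(blog, []):
            if target not in hops:  # BFS: the first visit is the shortest distance
                hops[target] = hops[blog] + 1
                queue.append(target)
    return {blog: 1.0 / d for blog, d in hops.items() if d > 0}


def contextual_search(links, posts, start, keyword, max_hops=6):
    """Rank hypothetical (blog, title, text) posts that mention `keyword`.

    Score = context weight of the post's blog * naive keyword count.
    Posts from blogs outside the sphere of relevance are dropped entirely.
    """
    weights = context_weights(links, start, max_hops)
    hits = []
    for blog, title, text in posts:
        weight = weights.get(blog, 0.0)
        count = text.lower().count(keyword.lower())
        if weight > 0 and count > 0:
            hits.append((weight * count, title))
    hits.sort(key=lambda hit: hit[0], reverse=True)  # stable: ties keep input order
    return [title for _, title in hits]
```

    With a small made-up graph where `my-blog` links to `future-of-blogging`, which in turn links to `blogging-101`, and `max_hops=2`, the first-hop blog gets weight 1.0 and the second-hop blog 0.5, while anything three hops out never shows up. A real system would need a better decay function and real text relevance; the point is only that the starting blog defines the sphere.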

    When searching for ‘blog’ with this one as context, the posts that rise to the top talk about the future of blogging, the social aspects, new technologies. Just the sort of stuff I want to read. The posts that end up at the bottom, if they show up at all, talk about Blogging 101, how to start a new blog, spam. The context has an interesting way of determining relevance.

    The “top 100” that emerges is the one I want to know about. Not the Technorati or Feedster top 100; their lists are drawn with the broadest brush you can find. With contexts, you’ll find the top 100 blogs about scripting languages, the top 100 about parenting, and yes, even the top 100 about scripting and parenting.

    What we get at the end of the day is not one blogsphere with its A-list and B-list, but a million blogspheres, each created by a blog. By participating, you’re actually defining what your blogsphere looks like.

    I think that’s much more interesting than pageranking RSS feeds.

    tags: search blogsphere ranking

    1. Aug 9th, 2005

      Napsterization

      Lotta Linkin Going On.. Or Not

      I wanted to summarize some of the very interesting things people have been saying about my post Saturday, Link Love Lost. Elisa Camahort at Worker Bees Blog: … all of this talk and tempest around some relatively new companies and…

    2. Aug 9th, 2005

      Dave Scotese

      We can participate when we read blogs too. While it is difficult for a machine to determine whether or not you liked a post that you viewed and then left, if you were to indicate somehow that it was better or worse than other posts, the machine could use that information to build a much better ranking system than any I’ve seen.

      That is the effect of the plugin I wrote and deployed at http://we-rank.com. If you visit a category or view the comments on a post, you will see that it lets you rank the posts (or the comments), indicating your opinion about what is best, second best, third, etc. That information is then used to produce an overall quality-based ranking of the posts in a category, or of the responses to a post.

    3. Aug 10th, 2005

      Assaf

      You might want to check this:
      http://trac.labnotes.org/cgi-bin/trac.cgi/wiki/RateBack

      There’s also a WP plugin in the works.
