From Robin's Wiki

RobinsStuff: RSSFilter

Introduction

If you subscribe to the RSS feeds of a number of sites, it doesn't take long before the amount of information coming in becomes overwhelming. An analogy is email, where a lot of incoming spam is mixed in with the regular, desired mail. So perhaps we can apply the same kind of filtering to sort 'desirable' feed entries from 'undesirable' ones.

Web aggregators already exist: you give one a list of feeds, and it pulls them down and presents the entries to you in a nice fashion, all in one place. We can add to this idea by allowing the user to give each entry a rating. The rating indicates how much they'd like to see this kind of thing in the future. The system then uses these ratings to score incoming entries and present them to the user ordered by relevance. It can also simply not display those of the least relevance, and put those of the highest in a box of their own at the top of the page.
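As a rough sketch of the presentation side, assuming every incoming entry already carries a relevance score between 0 and 1: the names ScoredItem and arrange and the two cut-off values are made up for illustration and are not part of any existing design.

```python
from dataclasses import dataclass


@dataclass
class ScoredItem:
    title: str
    score: float  # assumed relevance in [0, 1]; higher means more relevant


def arrange(items, hide_below=0.2, highlight_above=0.8):
    """Order items by relevance, drop the least relevant, and split off the best.

    Returns (highlighted, main): the items for the box at the top of the
    page, and the items for the ordinary list underneath it.
    """
    ranked = sorted(items, key=lambda item: item.score, reverse=True)
    highlighted = [i for i in ranked if i.score >= highlight_above]
    main = [i for i in ranked if hide_below <= i.score < highlight_above]
    return highlighted, main
```

The thresholds are arbitrary here. In practice they would probably be per-user settings, or be tuned once we see how the scores are actually distributed.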

The Plan

This is a project that can quite effectively be done in stages. The first stage is to build a web-based RSS aggregator. A simple one isn't a hard problem: it just needs to poll RSS feeds, add new entries to a database, and pull those entries back out to present to the user. It also needs the standard housekeeping, such as marking entries that have been read so they don't show to that user again.
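The storage half of that first stage might look something like the sketch below. It assumes SQLite as the database and plain RSS 2.0 feeds. The table layout and the function names (open_db, parse_rss, store_entries, unread_entries, mark_read) are invented for this example. Fetching is left out: xml_text is assumed to be the feed body, already downloaded by whatever does the polling.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema: one table of entries, one of per-user read marks.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    id INTEGER PRIMARY KEY,
    feed_url TEXT NOT NULL,
    guid TEXT NOT NULL,
    title TEXT,
    link TEXT,
    UNIQUE (feed_url, guid)
);
CREATE TABLE IF NOT EXISTS read_marks (
    username TEXT NOT NULL,
    entry_id INTEGER NOT NULL REFERENCES entries(id),
    PRIMARY KEY (username, entry_id)
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def parse_rss(xml_text):
    """Return a (guid, title, link) tuple for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        # <guid> is optional in RSS 2.0, so fall back to the link or the title.
        guid = item.findtext("guid") or link or title
        items.append((guid, title, link))
    return items


def store_entries(conn, feed_url, xml_text):
    """Add the feed's entries; ones already stored (same feed and guid) are skipped."""
    rows = [(feed_url, guid, title, link) for guid, title, link in parse_rss(xml_text)]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO entries (feed_url, guid, title, link) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )


def unread_entries(conn, username):
    """Return (id, title, link) for every entry this user has not marked as read."""
    return conn.execute(
        "SELECT e.id, e.title, e.link FROM entries e "
        "WHERE NOT EXISTS (SELECT 1 FROM read_marks r "
        "                  WHERE r.username = ? AND r.entry_id = e.id) "
        "ORDER BY e.id",
        (username,),
    ).fetchall()


def mark_read(conn, username, entry_id):
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO read_marks (username, entry_id) VALUES (?, ?)",
            (username, entry_id),
        )
```

Because of the UNIQUE constraint, polling the same feed repeatedly is harmless: only entries that haven't been seen before get added. Read marks are stored per user, so one user reading an entry doesn't hide it from anyone else.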

Once the aggregator side is done, we need to add the key feature: rating and learning. This means giving the user a way to say how much they liked a given entry. When ratings are set, the entries, with their attached ratings, are passed to a learning function that uses the information to build a profile of the user's tastes.

Learning

The most obvious method is a simple Bayesian filtering system, like those used for spam, that looks at key words and phrases and learns which ones mark an entry as interesting or not interesting. This gives us a score for a block of text that measures its 'interestingness'. It will hopefully also give us a confidence score, so that when the filter is uncertain about an entry, the user can choose to see it anyway (if they're not too busy) and rate it too.
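To make the idea concrete, here is a minimal sketch of one such filter: a two-class naive Bayes classifier over single words with add-one smoothing. It is only one possible reading of "simple Bayesian filtering". The class name InterestFilter, the tokenizer, the collapsing of ratings into a yes/no label, and the definition of confidence as distance from 0.5 are all assumptions made for the example.

```python
import math
import re
from collections import Counter


class InterestFilter:
    """Two-class naive Bayes over words, with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, interesting):
        """Record one rated entry; `interesting` is True or False."""
        self.word_counts[interesting].update(self.tokenize(text))
        self.doc_counts[interesting] += 1

    def _log_score(self, words, label, vocab_size):
        counts = self.word_counts[label]
        total = sum(counts.values())
        # Smoothed prior, so a class with no examples yet is still possible.
        prior = (self.doc_counts[label] + 1) / (sum(self.doc_counts.values()) + 2)
        score = math.log(prior)
        for word in words:
            score += math.log((counts[word] + 1) / (total + vocab_size))
        return score

    def score(self, text):
        """Return (p_interesting, confidence), both in the range 0 to 1."""
        words = self.tokenize(text)
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        vocab_size = max(len(vocab), 1)
        log_odds = (self._log_score(words, True, vocab_size)
                    - self._log_score(words, False, vocab_size))
        # Clamp before exponentiating so long texts cannot overflow math.exp.
        log_odds = max(-50.0, min(50.0, log_odds))
        p = 1.0 / (1.0 + math.exp(-log_odds))
        confidence = abs(p - 0.5) * 2  # 0 = no idea, 1 = very sure
        return p, confidence
```

Usage would be along the lines of calling train() each time the user rates an entry, then score() on each incoming one. Entries with low confidence are the ones worth offering to the user for rating. One caveat: naive Bayes probabilities tend to drift towards 0 or 1 on longer texts, so the confidence value is better treated as a rough ordering than as a calibrated probability.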

More to come

Retrieved from http://www.kallisti.net.nz/RobinsStuff/RSSFilter
Page last modified on March 15, 2006, at 02:37 AM