RobinsStuff.RSSFilter History
(:toc:)
!Introduction
If you subscribe to the RSS feeds of a number of sites, it doesn't take long before the amount of incoming information becomes overwhelming. An analogy is email, where a lot of incoming spam is mixed in with regular, desired mail. So perhaps we can apply the same approach used for spam filtering to sort 'desirable' feed entries from 'undesirable' ones.
Web aggregators already exist: you give them a list of feeds, and they pull them down and present the entries to you in a nice fashion, all in one place. We can add to this idea by allowing the user to give each entry a rating. This rating indicates how much they'd like to see this kind of thing in the future. The system then uses these ratings to filter incoming items and present them ordered by relevance. It can also simply not display those of the least relevance, and put those of the highest relevance in a box of their own at the top of the page.
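As a rough illustration of the presentation step, here is a minimal sketch that assumes each entry already has a relevance score between 0 and 1. The function name @@arrange@@ and the two thresholds are hypothetical, chosen only for the example; nothing above fixes the actual cut-off values.

```python
def arrange(scored_entries, hide_below=0.2, highlight_above=0.8):
    """Split (entry, score) pairs into highlighted, normal and hidden lists.

    The thresholds are illustrative assumptions, not values from the plan.
    Each returned list is ordered from most to least relevant.
    """
    ranked = sorted(scored_entries, key=lambda pair: pair[1], reverse=True)
    highlighted = [entry for entry, score in ranked if score >= highlight_above]
    normal = [entry for entry, score in ranked
              if hide_below <= score < highlight_above]
    hidden = [entry for entry, score in ranked if score < hide_below]
    return highlighted, normal, hidden
```

The highlighted list would feed the box at the top of the page, and the hidden list would simply not be displayed.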
!The Plan
This project can be done quite effectively in stages. The first stage is to build a web-based RSS aggregator. A simple one isn't a hard problem: it just needs to poll RSS feeds, add entries to a database, and pull those entries back to present to the user, along with the standard necessary things such as marking entries that have been read so they don't show to that user again.
Once the aggregator side is done, we need to add a feature to the system: rating and learning. This means a way for the user to say how much they liked a given entry. When ratings are set, the entries, with their attached ratings, are passed to a learning function that uses the information to build a profile of the user's tastes.
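The storage side of the first stage could be sketched as follows. This is only one possible shape, assuming SQLite as the database and plain RSS 2.0 feeds; the table names, column names and function names are all made up for the example, and fetching the feed over HTTP is left out so that the sketch stays self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema: one table of entries, one of per-user read marks.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    id INTEGER PRIMARY KEY,
    feed_url TEXT NOT NULL,
    guid TEXT NOT NULL,
    title TEXT,
    summary TEXT,
    UNIQUE (feed_url, guid)
);
CREATE TABLE IF NOT EXISTS read_marks (
    user_name TEXT NOT NULL,
    entry_id INTEGER NOT NULL REFERENCES entries(id),
    PRIMARY KEY (user_name, entry_id)
);
"""

def open_db(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db

def parse_rss(xml_text):
    """Extract (guid, title, summary) tuples from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        # Fall back to the link or title when the feed gives no guid.
        guid = item.findtext("guid", default=link or title)
        summary = item.findtext("description", default="")
        items.append((guid, title, summary))
    return items

def store_entries(db, feed_url, xml_text):
    """Add the feed's entries; ones already stored are skipped."""
    rows = [(feed_url, guid, title, summary)
            for guid, title, summary in parse_rss(xml_text)]
    db.executemany(
        "INSERT OR IGNORE INTO entries (feed_url, guid, title, summary) "
        "VALUES (?, ?, ?, ?)", rows)
    db.commit()

def unread_entries(db, user_name):
    """Return (id, title, summary) rows this user has not marked as read."""
    return db.execute(
        "SELECT e.id, e.title, e.summary FROM entries e "
        "WHERE NOT EXISTS (SELECT 1 FROM read_marks r "
        "WHERE r.user_name = ? AND r.entry_id = e.id) "
        "ORDER BY e.id", (user_name,)).fetchall()

def mark_read(db, user_name, entry_id):
    """Record that the user has seen an entry, so it is not shown again."""
    db.execute(
        "INSERT OR IGNORE INTO read_marks (user_name, entry_id) VALUES (?, ?)",
        (user_name, entry_id))
    db.commit()
```

A polling job would call @@store_entries@@ with each feed's downloaded XML, and the web page would call @@unread_entries@@ and @@mark_read@@. Keeping read marks in their own table means several users can share one copy of each entry.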
!Learning
The most obvious method is to use a simple Bayesian filtering system that looks for key words and phrases and tags them as interesting or not interesting. This then gives us a score for a block of text that measures its 'interestingness'. It will hopefully also give us a confidence score, so that if the filter is uncertain about some entries, the user can choose to see them (if they're not too busy) and give them ratings too.
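To make the idea concrete, here is a minimal sketch of a naive Bayes word filter. It is an assumption about how such a filter might look, not a settled design: the class name @@InterestFilter@@, the two labels, single-word tokens (no phrases) and the definition of confidence as distance from 0.5 are all choices made for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

class InterestFilter:
    """Naive Bayes over single words with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"interesting": Counter(), "boring": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        """Learn from one rated entry; label is 'interesting' or 'boring'."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def _log_score(self, words, label, vocab_size):
        total_docs = sum(self.doc_counts.values())
        # Smoothed log prior, then smoothed log likelihood of each word.
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(self.word_counts[label].values())
        for word in words:
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def score(self, text):
        """Return (probability of 'interesting', confidence), both in 0..1."""
        words = tokenize(text)
        vocab = set(self.word_counts["interesting"]) | set(self.word_counts["boring"])
        vocab_size = len(vocab) + 1  # one extra slot for unseen words
        log_int = self._log_score(words, "interesting", vocab_size)
        log_bor = self._log_score(words, "boring", vocab_size)
        # Clamp the difference so math.exp cannot overflow on long texts.
        diff = max(min(log_bor - log_int, 700.0), -700.0)
        p_interesting = 1.0 / (1.0 + math.exp(diff))
        confidence = abs(p_interesting - 0.5) * 2
        return p_interesting, confidence
```

Entries whose confidence comes out low are the ones that could be offered to the user for rating, and each new rating would go back in through @@train@@.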
''More to come''
We also need for the aggregator to now pass each entry to a user-specific filtering widget so tha
Page last modified on March 15, 2006, at 02:37 AM