RSS aggregation using Rails
I’ve been playing around with aggregating RSS feeds, and thought I’d share a few observations:
- Handling all the different RSS variants is still not simple (many feeds aren’t even valid). After harvesting feeds directly turned out to be challenging, I switched to using Google Base to authenticate with Google and then retrieving feeds from the undocumented Reader API (still waiting for an official Google Data API), e.g. http://www.google.com/reader/atom/feed/http://www.yourfeedaddresshere.com. This way at least Google has done the work for you in terms of standardizing the feed to a common format, Atom, and ensuring validity.
- Content encoded with HTML entities is easily decoded using the htmlentities gem.
- You can skip ruby-feedparser and simple-rss and move straight to hpricot, especially given the clean results from Google Reader.
- A rake task is perfect for scheduling and queuing updates (make sure you throttle and cache appropriately).
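
To make the "throttle and cache" advice in the last point a bit more concrete, here is a minimal sketch in plain Ruby. It is only one way to do it, not what I actually run: the class name `ThrottledFetcher` and its `min_interval`, `fetcher` and `clock` parameters are made up for illustration, and the cache lives in memory, so a real rake task would need to persist it somewhere (a database table or a file) between runs.

```ruby
# Hypothetical helper: these names are illustrative and do not come from
# Rails, rake, or any gem mentioned in the post.
class ThrottledFetcher
  # min_interval: seconds that must pass before the same feed is fetched again.
  # fetcher:      any callable that takes a URL and returns the feed body.
  # clock:        callable returning the current time (injectable for testing).
  def initialize(min_interval:, fetcher:, clock: -> { Time.now })
    @min_interval = min_interval
    @fetcher = fetcher
    @clock = clock
    @cache = {} # url => { body: ..., fetched_at: ... }
  end

  # Returns the cached body if the feed was fetched recently enough,
  # otherwise calls the fetcher and stores the fresh result.
  def fetch(url)
    now = @clock.call
    entry = @cache[url]
    return entry[:body] if entry && (now - entry[:fetched_at]) < @min_interval

    body = @fetcher.call(url)
    @cache[url] = { body: body, fetched_at: now }
    body
  end
end
```

A rake task could build one of these with a `fetcher` lambda that does the actual HTTP request and then loop over the list of feed URLs, so a feed that was refreshed a few minutes ago is served from the cache instead of being requested again.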

Hi Paul,
You can actually do a direct call to the Google feed proxy. Technically it’s the Google Ajax Search API’s direct endpoint for feed manipulation.
More info:
NiallKennedy.com: Google Feed API
Example:
http://www.google.com/uds/Gfeeds?v=1.0&callback=&context=&output=json&q=http%3A%2F%2Fwww.paulcowles.com%2Ffeed%2F
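
For anyone building that URL from Ruby, a small sketch using only the standard library might look like the following. The endpoint and the parameter names (`v`, `callback`, `context`, `output`, `q`) are copied from the example URL above; the helper name `gfeeds_url` is made up, and whether the endpoint accepts other parameters is not something this sketch assumes.

```ruby
require "uri"

# Endpoint copied from the example URL above.
GFEEDS_ENDPOINT = "http://www.google.com/uds/Gfeeds"

# Hypothetical helper: builds the proxy URL for a given feed address.
# URI.encode_www_form percent-encodes the feed URL so it can travel
# safely inside the q parameter.
def gfeeds_url(feed_url)
  query = URI.encode_www_form(
    v: "1.0",
    callback: "",
    context: "",
    output: "json",
    q: feed_url
  )
  "#{GFEEDS_ENDPOINT}?#{query}"
end
```

Calling `gfeeds_url("http://www.paulcowles.com/feed/")` should reproduce the example URL above.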
Thanks Niall.
The new RESTful interface for Flash and other non-JavaScript environments looks perfect for server-side aggregation.
As an example, TechCrunch as JSON.