RSS aggregation using Rails

I’ve been playing around with aggregating RSS feeds, and thought I’d share a few observations:

  • Handling all the different RSS variants is still not simple (many feeds aren’t even valid). After harvesting feeds directly turned out to be challenging, I switched to using Google Base to authenticate with Google and then retrieving feeds from the undocumented Reader API (still waiting for an official Google Data API), e.g. http://www.google.com/reader/atom/feed/http://www.yourfeedaddresshere.com. This way, at least Google has done the work of standardizing the feed to a common format, Atom, and ensuring validity.
  • Content encoded with HTML entities is easily decoded using the htmlentities gem.
  • You can skip ruby-feedparser and simple-rss and move straight to Hpricot, especially given the clean results from Google Reader.
  • A rake task is perfect for scheduling and queuing updates (make sure you throttle and cache appropriately).
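
To make the "throttle and cache" advice concrete, here is a minimal sketch in plain Ruby. Everything named in it (READER_PREFIX, reader_url, FeedCache, the interval in seconds) is hypothetical and chosen for illustration; only the URL pattern comes from the example above. The block passed to fetch stands in for whatever actually retrieves the feed, so the sketch makes no assumptions about the Reader API beyond that URL.

```ruby
# Hypothetical sketch: none of these names come from a real library.

# URL prefix taken from the example address in the list above.
READER_PREFIX = "http://www.google.com/reader/atom/feed/"

# Builds the Reader address for a feed by appending the feed's own URL.
def reader_url(feed_url)
  READER_PREFIX + feed_url
end

# Remembers the last body retrieved for each feed and refuses to
# re-fetch a feed more often than once per min_interval seconds.
class FeedCache
  Entry = Struct.new(:body, :fetched_at)

  def initialize(min_interval)
    @min_interval = min_interval # seconds between fetches of one feed
    @entries = {}                # feed URL => Entry
  end

  # Returns the cached body while it is still fresh. Otherwise calls the
  # block to retrieve a new copy, stores it, and returns it.
  def fetch(feed_url, now = Time.now)
    entry = @entries[feed_url]
    return entry.body if entry && (now - entry.fetched_at) < @min_interval

    body = yield(feed_url)
    @entries[feed_url] = Entry.new(body, now)
    body
  end
end
```

A rake task could loop over its feed list and call fetch with a block that performs the HTTP request, so a feed requested twice within the interval is served from memory. Because this cache lives in memory only, a task that runs as a fresh process each time would need to persist the entries somewhere (a database table, for instance) to get the same effect across runs.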

What’s better than a little REST?

A couple of years ago, Mark Baker and I had a quick conversation about Web services. I told Mark we were planning on offering SOAP and XML-RPC APIs for eventSherpa and Sherpafind, our first products at Semaview. Mark quickly and emphatically pointed out some of the advantages of offering REST-style Web services. In the end, I went ahead with our SOAP and XML-RPC APIs, only to find much the same result that Stewart Butterfield has had with Flickr: little to no actual development taking place using these APIs. We had a few developers inquire about using the SOAP APIs, but other than a few less-than-stellar prototypes, nothing came of it. If we had offered REST-style services, I have no doubt that we would have seen more interest from early adopters. My mistake. I’ve seen the light.

REST-style Web services are simple and intuitive. The services that are exploding on the Web are the ones doing so with the help of thousands of developers extending and refining the product. If you can get a community involved in building extensions to your foundation, you get free innovation. I have yet to see a service using SOAP or XML-RPC APIs gain momentum on par with Flickr. Even the Web’s premier retailer sees the majority of its developers use REST rather than SOAP.

Lesson learned. Keep it simple, stupid.