April 08, 2004

Feed Analysis: How Big is Your Feed?

I am picking up the feed analysis again from where I left off last week. To gather more baseline data I took the 28,774 feeds that have active subscriptions (see this posting on the source of the data) and analyzed their size, age, etc.

Of these feeds, 1,232 no longer existed, 1,421 were unreachable, and 2,440 could not be parsed by my not very liberal parser. The data is thus based on the remaining 23,681 live feeds.
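The exclusion arithmetic can be checked with a few lines of Python. This is only a sketch that re-adds the counts quoted above; the category labels and the helper `live_feed_count` are illustrative, not part of the original analysis tooling.

```python
# Feed counts as reported in the text above.
TOTAL_SUBSCRIBED = 28_774
EXCLUDED = {
    "non-existent": 1_232,
    "unreachable": 1_421,
    "unparsable": 2_440,
}


def live_feed_count(total: int, excluded: dict[str, int]) -> int:
    """Return the number of feeds left after removing every excluded category."""
    return total - sum(excluded.values())


live = live_feed_count(TOTAL_SUBSCRIBED, EXCLUDED)
print(live)  # 23681
print(f"{live / TOTAL_SUBSCRIBED:.1%} of subscribed feeds were usable")
```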

The first issue to look at is the size of the feeds. The sizes follow a distribution similar to a negative exponential, with a mean of 22 KB and a median of 10 KB. They range from 210 bytes to 890 KB, and almost 3% of the feeds are larger than 100 KB.
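A minimal sketch of how such summary numbers might be computed, assuming the feed sizes are available as a list of byte counts and that 1 KB means 1,024 bytes (the post does not say which convention was used). The `summarize_sizes` helper and the sample values are hypothetical; the sample is not the survey data. With right-skewed data like this, the mean lands well above the median, the same pattern as the 22 KB versus 10 KB figures.

```python
import statistics


def summarize_sizes(sizes_bytes: list[int], threshold_kb: float = 100.0) -> dict[str, float]:
    """Mean, median, range, and share above a threshold for sizes given in bytes."""
    # Assumption: 1 KB = 1024 bytes.
    kb = [s / 1024 for s in sizes_bytes]
    over = sum(1 for k in kb if k > threshold_kb)
    return {
        "mean_kb": statistics.mean(kb),
        "median_kb": statistics.median(kb),
        "min_bytes": min(sizes_bytes),
        "max_bytes": max(sizes_bytes),
        "share_over_threshold": over / len(kb),
    }


# Illustrative sizes only, NOT the survey data.
sample = [210, 4_096, 8_192, 10_240, 12_288, 20_480, 51_200, 153_600]
print(summarize_sizes(sample))
```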

The first graph presents a histogram of the feed sizes for all feeds in the sample and for only the 500 most-subscribed feeds. In the top-500 group the median shifts slightly upward, to 14 KB.
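The histogram comparison could be sketched as follows, assuming each feed is represented as a hypothetical `(subscriber_count, size_kb)` pair and using an arbitrary 5 KB bin width. The helper names and the toy data are illustrative assumptions, not the original tooling.

```python
import statistics
from collections import Counter


def size_histogram(sizes_kb: list[float], bin_kb: int = 5) -> dict[int, int]:
    """Count feeds per fixed-width bin; each key is the bin's lower edge in KB."""
    counts = Counter(int(s // bin_kb) * bin_kb for s in sizes_kb)
    return dict(sorted(counts.items()))


def top_n_sizes(feeds: list[tuple[int, float]], n: int = 500) -> list[float]:
    """Sizes of the n most-subscribed feeds, given (subscriber_count, size_kb) pairs."""
    ranked = sorted(feeds, key=lambda feed: feed[0], reverse=True)
    return [size for _, size in ranked[:n]]


# Toy data (not from the survey); n=2 stands in for the top 500.
feeds = [(900, 14.0), (500, 12.0), (40, 9.0), (10, 22.0), (5, 1.5)]
all_sizes = [size for _, size in feeds]
top_sizes = top_n_sizes(feeds, n=2)

print(size_histogram(all_sizes))  # {0: 1, 5: 1, 10: 2, 20: 1}
print(size_histogram(top_sizes))  # {10: 2}
print(statistics.median(all_sizes), statistics.median(top_sizes))  # 12.0 13.0
```

Fixed-width bins keep the two groups directly comparable; with a long tail like the one described here, logarithmic bins would be a reasonable alternative.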

If we compare the sizes for the various feed versions, we see different distributions. RSS 2.0 feeds are two to three times as large as the RSS 0.91 feeds in the sample. The 'other feeds' (RDF & Atom) have a denser distribution that falls between the two RSS versions in size. Almost all of the larger feeds (over 80 KB) are RSS 2.0 feeds. Of all the feeds, 18% were RSS 0.91, 46% were RSS 2.0, and 35% were RDF & Atom.
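Classifying feeds by version could be done by inspecting the document's root element. The sketch below assumes Python's strict `xml.etree.ElementTree` parser, which, much like a not very liberal parser, raises `ParseError` on malformed XML rather than trying to recover. The function names and labels are hypothetical; Atom is detected by a root `feed` element in any namespace.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespaced root tag that ElementTree reports for an RDF (RSS 1.0) document.
RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"


def feed_version(document: str) -> str:
    """Label a feed by its root element; malformed XML raises ET.ParseError."""
    root = ET.fromstring(document)
    if root.tag == "rss":
        # RSS 0.9x and 2.0 carry their version as an attribute on <rss>.
        return "RSS " + root.get("version", "unknown")
    if root.tag == RDF_ROOT:
        return "RDF"
    if root.tag == "feed" or root.tag.endswith("}feed"):
        return "Atom"
    return "other"


def version_shares(versions: list[str]) -> dict[str, float]:
    """Fraction of feeds carrying each version label."""
    counts = Counter(versions)
    return {label: n / len(versions) for label, n in counts.items()}


docs = [
    '<rss version="2.0"><channel/></rss>',
    '<rss version="0.91"><channel/></rss>',
    '<feed xmlns="http://purl.org/atom/ns#" version="0.3"/>',
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>',
]
print(version_shares([feed_version(d) for d in docs]))
```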

In the coming days I will follow up with more analysis of the number of items in the feeds, the item sizes, and the feed age.

Posted by Werner Vogels at April 8, 2004 01:08 PM