April 21, 2004

Feed Analysis: Item Size

To complete the set of postings on the feed analysis, here is the data on items and item size. After this there is one more posting before we move on to examining some usage scenarios using this data. Bob Wyman has graciously offered me access to their data collection on feeds, items and timelines, so you can expect some updates on the basic analysis once I have had a chance to work with their much better selection of data.

The data for this analysis still comes from these 23,000 feeds. Earlier we looked at the size of those feeds, the distribution of the different versions and their sizes. Now we will drill down one level and look at the items in a feed and what we can say about their sizes.

The number of items in a feed appears to be driven by the CMS that generates the feed: some fix the number of items, while others let the user specify either the number of items or a number of days to appear in the feed, independent of how many items that produces. Some impose an upper bound on this. 42% of the feeds have fewer than 15 items, 41% have exactly 15 items, and 98% have 50 items or fewer. A bit more than 1% of the feeds have 100 or more items, while almost 2% of the feeds in this distribution have 0 items.
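
For readers who want to reproduce this kind of breakdown on their own collection, here is a minimal sketch in Python. It is not the tooling used for this analysis. It assumes each feed is available as an XML string, that items are `item` (RSS) or `entry` (Atom) elements, and the names `count_items` and `summarize` are made up for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://purl.org/rss/1.0/}item'."""
    return tag.rsplit("}", 1)[-1]


def count_items(feed_xml):
    """Count <item> (RSS) and <entry> (Atom) elements in one feed document."""
    root = ET.fromstring(feed_xml)
    return sum(1 for el in root.iter() if local_name(el.tag) in ("item", "entry"))


def summarize(counts):
    """Percentage of feeds in each of the buckets quoted in the text.

    `counts` holds one item count per feed (hypothetical input).
    """
    if not counts:
        raise ValueError("need at least one feed")
    total = len(counts)

    def pct(pred):
        return 100.0 * sum(1 for c in counts if pred(c)) / total

    return {
        "fewer_than_15": pct(lambda c: c < 15),
        "exactly_15": pct(lambda c: c == 15),
        "50_or_fewer": pct(lambda c: c <= 50),
        "100_or_more": pct(lambda c: c >= 100),
        "empty": pct(lambda c: c == 0),
    }
```

Running `count_items` over every feed and passing the resulting list to `summarize` gives percentages in the same shape as the ones above. Note that the buckets overlap (an empty feed also counts as having fewer than 15 items), so they are not expected to add up to 100%.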

If we examine individual item sizes, we see that 50% of the items are 1 KByte or smaller. The distribution of item size differs between feed versions, which is partially explained by the overhead of the extra information that, for example, RSS 2.0 and Atom carry. But there is also an increased content component: 3% of the items in RSS 2.0 feeds are larger than 10 KBytes.
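
One way to measure item size is sketched below in Python, again as an illustration rather than the method behind these numbers. It assumes the size of an item is the byte length of its re-serialized XML element, which can differ slightly from the bytes in the original document, and that 1 KByte means 1024 bytes. The names `item_sizes` and `size_summary` are hypothetical.

```python
import statistics
import xml.etree.ElementTree as ET

KBYTE = 1024  # assumption: 1 KByte = 1024 bytes


def item_sizes(feed_xml):
    """Byte length of each <item> (RSS) or <entry> (Atom) element in a feed.

    The element is re-serialized as UTF-8, so this approximates the size
    of the item as it appeared in the original document.
    """
    root = ET.fromstring(feed_xml)
    return [
        len(ET.tostring(el, encoding="utf-8"))
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] in ("item", "entry")
    ]


def size_summary(sizes):
    """Median item size and the percentage of items larger than 10 KBytes."""
    if not sizes:
        raise ValueError("need at least one item")
    over = sum(1 for s in sizes if s > 10 * KBYTE)
    return {
        "median_bytes": statistics.median(sizes),
        "pct_over_10_kbytes": 100.0 * over / len(sizes),
    }
```

Collecting `item_sizes` across all feeds of one version and passing the combined list to `size_summary` gives a per-version median and large-item share that can be compared between, say, RSS 0.91 and RSS 2.0.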

You can expect one more posting on the basic analysis, on the distribution of the age of items in a feed.

