March 18, 2004

Number of Feed Pulls per Day

In an earlier posting I showed distribution and cumulative distribution graphs of the inter-arrival times of feed requests coming from the same user. Those graphs are mainly useful if we want to look at the number of requests that arrive at the system, and less so at the individual user: users who have configured their aggregators to poll every 5 minutes contribute far more requests than those who poll every hour or less often. To get a better look at the behavior of the individual user, we can look at the distribution of the number of poll requests per user per day.
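As a rough illustration of how such a per-user, per-day distribution can be computed, here is a minimal Python sketch. It assumes the request log has already been reduced to (user id, timestamp) pairs; that format, the function names, and the sample data are hypothetical and are not the actual analysis code behind the graphs.

```python
from collections import Counter
from datetime import datetime


def polls_per_user_day(requests):
    """Count requests per (user, calendar day).

    `requests` is an iterable of (user_id, datetime) pairs; this log
    format is an assumption made for the example.
    """
    return Counter((user, ts.date()) for user, ts in requests)


def distribution(per_user_day):
    """Map 'polls per day' to the number of user-days with that count."""
    return Counter(per_user_day.values())


# Hypothetical sample: user "a" polls hourly around the clock,
# user "b" polls three times in the morning.
sample = [("a", datetime(2004, 3, 17, h, 0)) for h in range(24)]
sample += [("b", datetime(2004, 3, 17, h, 30)) for h in (9, 10, 11)]

hist = distribution(polls_per_user_day(sample))
print(sorted(hist.items()))  # [(3, 1), (24, 1)]
```

Counting per user-day rather than per request gives every user equal weight, so frequent pollers no longer dominate the picture.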

The graphs show the expected exponential behavior, except for the area around 24 polls per day. This outlier suggests that a significant number of users have their aggregators set to poll once per hour and leave them running 24 hours per day. Investigating this subset of users, we indeed see that more than 90% of the group has an inter-arrival time of one hour.

Only a few points in these graphs exhibit this strong correlation between the number of pulls per day and the inter-arrival time. For example, if a user pulls 10 times per day, this does not mean the inter-arrival time is around 2.4 hours (144 minutes, which is 24 hours divided by 10). It is far more likely that the user ran her aggregator during the day, set to 1-hour pull intervals, and shut it down when going home.
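The gap between the two views can be made concrete with a small Python sketch. Spreading 10 polls evenly over 24 hours would give one poll every 144 minutes, while a user who polls hourly during a working day shows a 60-minute inter-arrival time. The timestamps and the helper name below are hypothetical, chosen only to illustrate the point.

```python
from datetime import datetime, timedelta
from statistics import median


def inter_arrival_minutes(timestamps):
    """Return the gaps, in minutes, between consecutive requests."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() / 60 for a, b in zip(ordered, ordered[1:])]


# Hypothetical user: 10 hourly polls from 09:00 to 18:00, then the
# aggregator is shut down for the night.
start = datetime(2004, 3, 17, 9, 0)
polls = [start + timedelta(hours=i) for i in range(10)]

gaps = inter_arrival_minutes(polls)
naive = 24 * 60 / len(polls)  # interval implied by spreading polls over 24 hours
observed = median(gaps)       # interval the user actually polls at

print(naive, observed)  # 144.0 60.0
```

The median is used here so that a single long overnight gap, if the log spans several days, would not distort the typical interval.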

