Circular buffers

Let's take a closer look at how the temperature samples were collected in our calibrated thermistor project.

In many scientific projects, keeping good statistics helps us tell when data are real (or significant) and when they are just the result of noise (random variations in the measurement). We created the class Circular to collect samples and do simple statistics on them, so we could find out how accurate our thermometer readings were. Here is that class once again:

class Circular
{
  double samples[ 200 ];
  long int count;
  double mean_value;
  double standard_deviation;
  double variance;
  enum { COUNT = (sizeof samples / sizeof *samples) };  // number of slots in samples[] (200)
  
public:
  Circular( void ) { count = 0; }
  // Store into slot count % 200, then bump count; wraps back to slot 0 after slot 199.
  void store( double value ) { samples[ count++ % (sizeof samples / sizeof *samples) ] = value; }
  
  void
  calculate_statistics( void )
  {
    double sum = 0;
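    int cnt = min( count, COUNT );  // number of slots actually filled (at most COUNT)
    if( cnt < 2 )                   // the ( cnt - 1 ) divisor below needs at least two samples
      return;                       // not enough data yet; keep the previous results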
    for( int i = 0; i < cnt; i++ )
      sum += samples[ i ];

    mean_value = sum / cnt;

    variance = 0;

    for( int i = 0; i < cnt; i++ )
    {
      double deviation = samples[ i ] - mean_value;
      variance += deviation * deviation;
    }

    variance /= ( cnt - 1 );
    standard_deviation = sqrt( variance );
  }
  
  double mean( void ) { return mean_value; }
  double std_dev( void ) { return standard_deviation; }
  double var( void ) { return variance; }
  int num_samples( void ) { return min( count, COUNT ); }
};

The class gets its name from a simple concept. We want to keep at most 200 samples, but we always want to analyze the most recent ones. We put each sample into its own slot in an array that can hold 200 samples. When the 201st sample arrives, we store it in the first slot, overwriting the oldest sample. The next sample goes into the second slot, and so on.
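To make the wrap-around concrete, here is a minimal sketch of the same idea as a generic class. It is hypothetical (the name RingBuffer and its methods are not part of the original project), and it keeps a separate wrapped index instead of a running count, but the slots fill and get overwritten in the same order. With only 3 slots, the 4th value lands back in the first slot.

```cpp
#include <cstddef>

// Hypothetical minimal ring buffer illustrating the wrap-around idea;
// it is not the Circular class itself.
template <typename T, std::size_t N>
class RingBuffer {
public:
  void push(const T& value) {
    slots_[next_] = value;      // once full, this overwrites the oldest value
    next_ = (next_ + 1) % N;    // step to the next slot, wrapping to 0 after N-1
    if (size_ < N) ++size_;     // the number of stored values stops growing at N
  }
  std::size_t size() const { return size_; }
  const T& slot(std::size_t i) const { return slots_[i]; }  // raw slot access
private:
  T slots_[N] = {};
  std::size_t next_ = 0;
  std::size_t size_ = 0;
};
```

Pushing 10, 20, 30 and then 40 into a RingBuffer<int, 3> leaves the slots holding 40, 20, 30: the newest value has replaced the oldest one.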

Let's look at how that happens. The array of samples is declared by the line

double samples[ 200 ];

The line

enum { COUNT = (sizeof samples / sizeof *samples) };

is just a clever way to set COUNT equal to 200. It says to take the number of bytes in the entire array, and divide that by the number of bytes in the first slot. We do this instead of using the value 200 so that we can easily change the 200 to some other number if we need more samples, or we need fewer samples (to leave room for other data later). Now we just change the one number, and all the other parts of the code that depend on the size of the array will simply work without change.
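Here is a small sketch of the sizeof trick on its own. The array names are made up for illustration; the point is that the element count follows the declared size automatically, whatever the element type.

```cpp
#include <cstddef>

double samples[200];     // the only place the size is written
float small_buffer[50];  // a hypothetical second array with a different element type

// Bytes in the whole array divided by bytes in one element = number of elements.
constexpr std::size_t kSamples = sizeof samples / sizeof *samples;
constexpr std::size_t kSmall = sizeof small_buffer / sizeof *small_buffer;

// Checked at compile time: change the declared size and these values follow.
static_assert(kSamples == 200, "200 doubles");
static_assert(kSmall == 50, "50 floats");
```

One caution: the trick only works where the compiler can see the real array. If the array is passed to a function as a parameter, it decays to a pointer, and the same expression divides the size of a pointer by the size of one element, which gives the wrong answer.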

The line

void store( double value ) { samples[ count++ % (sizeof samples / sizeof *samples) ] = value; }

might be easier to read if we break it up into several lines:

void
store( double value )
{
    samples[ count++ % (sizeof samples / sizeof *samples) ] = value;
}

There are a few tricky things going on here. First, the variable count keeps track of how many samples we have collected. The two plus signs after it mean that it gets incremented (we add one to it) after its current value is used. The percent sign (the modulo operator) takes that value of count (before it is incremented), divides it by 200 (our sizeof trick), and uses the remainder as our index into the array. Then value is stored in the samples array at that index.
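The difference between count++ and ++count is easy to check in isolation. In this sketch (the function names are just for illustration), both forms leave the variable one larger, but only post-increment hands back the old value. If store() used ++count instead, the very first sample would land in slot 1, and slot 0 would not be filled until the 200th sample.

```cpp
// Post-increment (count++) yields the old value; pre-increment (++count)
// yields the new one. Both leave count one larger afterwards.
int post_increment_result() {
  int count = 5;
  int used = count++;  // used is 5, count is now 6
  return used;
}

int pre_increment_result() {
  int count = 5;
  int used = ++count;  // used is 6, count is now 6
  return used;
}
```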

At first, count is equal to zero (we'll see how that happens in a moment).

Zero divided by 200 is 0, with a remainder of 0, so the first value goes into samples[0].

Now count is incremented, so its value is 1.

One divided by 200 is 0, with a remainder of 1. So the next value goes into slot 1.

Now let's jump ahead to when we have 200 samples. The count variable is 200, and 200 divided by 200 is 1, with a remainder of 0.

So the next sample goes into slot 0.

It's as if we had a circle of pots to put things in, and we just keep going around the circle, putting things in the pots.
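The walk-through above can be checked with a tiny helper. slot_for_sample() is a hypothetical name; it counts samples from 1 (first, second, ..., 201st), so count inside store() is n - 1 when the n-th sample arrives.

```cpp
#include <cstddef>

// Slot used by the n-th sample (n counted from 1) in a buffer of `capacity`
// slots. Matches store(), because count == n - 1 when the n-th sample arrives.
std::size_t slot_for_sample(long n, std::size_t capacity) {
  return static_cast<std::size_t>(n - 1) % capacity;
}
```

With a capacity of 200, samples 1 and 2 go into slots 0 and 1, sample 200 goes into slot 199, and samples 201 and 202 wrap around to slots 0 and 1.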

Right after the keyword public is a funny declaration:

Circular( void ) { count = 0; }

It uses the same name as the class itself, but it looks like a method declaration without a return type.

This is called a constructor. Whenever we create an instance of our class, the constructor is called automatically, exactly once for that instance.

It sets up the object's member variables. In this case, it sets our count variable to zero.
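As a sketch, here are two stripped-down, hypothetical classes that set up a member in their constructors. The first assigns in the constructor body, as Circular does; the second uses a member initializer list, a common C++ alternative that initializes the member directly. Either way, every new instance starts with count at zero.

```cpp
// Hypothetical classes showing two ways a constructor can set up a member.
class CounterAssign {
public:
  CounterAssign() { count = 0; }  // assignment in the body, as in Circular
  long samples_seen() const { return count; }
private:
  long count;
};

class CounterInit {
public:
  CounterInit() : count(0) {}     // member initializer list
  long samples_seen() const { return count; }
private:
  long count;
};
```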

Now let's look at the method calculate_statistics().

Sometimes our count variable will be less than 200, and sometimes it will be much greater. We want to know how many samples are actually stored in the array. That number is the smaller of count and COUNT (200), which is what min() gives us.

int cnt = min( count, COUNT );
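On many Arduino cores, min() is a macro, so mixing a long (count) with an enum constant (COUNT) just works. In standard C++, std::min is a template that wants both arguments to have the same type, so a sketch outside Arduino has to name the type explicitly. stored_samples() is a hypothetical helper for illustration.

```cpp
#include <algorithm>

enum { COUNT = 200 };  // stands in for the class's COUNT

// Number of slots that hold real data: all of them once the buffer has
// wrapped around, otherwise only the ones filled so far.
long stored_samples(long count) {
  // std::min<long> converts the enum constant to long so both arguments match.
  return std::min<long>(count, COUNT);
}
```

So stored_samples(57) is 57, while stored_samples(5000) is 200.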

Now we want to visit all the samples we have collected and add them up. We use the for() statement to do that.

    for( int i = 0; i < cnt; i++ )
      sum += samples[ i ];

It says to make a new variable called i and set it to zero. Then as long as i is less than cnt, do the next line, and then increment i.

The next statement says to add each sample into the variable sum.
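One way to see what each part of the for() header does is to write the same loop as a while() loop. This sketch (with made-up function names) adds up the first cnt values both ways and gets the same total.

```cpp
// Sum the first cnt values with a for() loop.
double sum_with_for(const double* samples, int cnt) {
  double sum = 0;
  for (int i = 0; i < cnt; i++)  // start; keep-going test; step
    sum += samples[i];
  return sum;
}

// The equivalent while() loop, with the three parts spelled out.
double sum_with_while(const double* samples, int cnt) {
  double sum = 0;
  int i = 0;                     // start
  while (i < cnt) {              // keep-going test
    sum += samples[i];
    i++;                         // step, after the body
  }
  return sum;
}
```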

Dividing that sum by the number of samples gives us the mean (the arithmetic average of all the samples).
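For reference, here is what calculate_statistics() computes, written as formulas. Here n is cnt, x_i is samples[ i ], \bar{x} is mean_value, s^2 is variance, and s is standard_deviation; the indices run from 0 to match the code.

$$
\bar{x} = \frac{1}{n}\sum_{i=0}^{n-1} x_i,\qquad
s^2 = \frac{1}{n-1}\sum_{i=0}^{n-1}\left(x_i - \bar{x}\right)^2,\qquad
s = \sqrt{s^2}
$$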

The next for() statement calculates the sum of the squared differences from the mean.

    for( int i = 0; i < cnt; i++ )
    {
      double deviation = samples[ i ] - mean_value;
      variance += deviation * deviation;
    }

That total (the sum of the squared deviations) is not yet the variance, even though the code keeps it in the variable variance. Note that this for() loop has curly braces around the two statements below it. The curly braces group everything between them into a single compound statement, so the body of this loop has two statements, and both are executed each time through the loop.

To get the variance of the sample set, we divide that total by one less than the number of samples. (Dividing by cnt - 1 rather than cnt gives the usual sample variance, which corrects for the fact that the mean was estimated from the same samples.) The standard deviation is the square root of the variance. I am being very brief here; if you need a refresher on simple statistics, Google for "variance" and "standard deviation".
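Here is a worked example of the same two-pass calculation, pulled out into a standalone function. The Stats struct and sample_stats() are hypothetical names, and the data set {2, 4, 4, 4, 5, 5, 7, 9} is made up for illustration. By hand: the sum is 40, so the mean is 5; the squared deviations add up to 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32; dividing by 8 - 1 = 7 gives a variance of about 4.571, and its square root, the standard deviation, is about 2.138.

```cpp
#include <cmath>
#include <cstddef>

struct Stats {
  double mean;
  double variance;  // sample variance: divide by n - 1
  double std_dev;   // square root of the variance
};

// Same steps as calculate_statistics(). Assumes n >= 2, since the
// n - 1 divisor would be zero otherwise.
Stats sample_stats(const double* x, std::size_t n) {
  double sum = 0;
  for (std::size_t i = 0; i < n; i++)
    sum += x[i];
  double mean = sum / n;

  double sum_sq = 0;  // sum of squared deviations from the mean
  for (std::size_t i = 0; i < n; i++) {
    double d = x[i] - mean;
    sum_sq += d * d;
  }
  double variance = sum_sq / (n - 1);
  return { mean, variance, std::sqrt(variance) };
}
```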

Our circular buffer class is now a convenient package that can be copied into other programs that need to collect samples of data and do simple statistics on them.
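If you want to reuse the class outside the Arduino environment, a few small changes are needed. The sketch below assumes a desktop C++11 compiler: it uses std::min and std::sqrt (there is no Arduino min() macro), initializes all members in the constructor, uses COUNT directly in store(), marks the accessors const, and skips the calculation when there are fewer than two samples. Otherwise it follows the original logic.

```cpp
#include <algorithm>
#include <cmath>

// Sketch of Circular adapted for a standard C++ compiler (not the original code).
class Circular
{
  double samples[ 200 ];
  long int count;
  double mean_value;
  double standard_deviation;
  double variance;
  enum { COUNT = (sizeof samples / sizeof *samples) };

public:
  Circular( void ) : count( 0 ), mean_value( 0 ), standard_deviation( 0 ), variance( 0 ) {}

  // Store into slot count % COUNT, then bump count (wraps after the last slot).
  void store( double value ) { samples[ count++ % COUNT ] = value; }

  void
  calculate_statistics( void )
  {
    int cnt = num_samples();
    if( cnt < 2 )  // the ( cnt - 1 ) divisor needs at least two samples
      return;

    double sum = 0;
    for( int i = 0; i < cnt; i++ )
      sum += samples[ i ];
    mean_value = sum / cnt;

    variance = 0;  // first holds the sum of squared deviations
    for( int i = 0; i < cnt; i++ )
    {
      double deviation = samples[ i ] - mean_value;
      variance += deviation * deviation;
    }
    variance /= ( cnt - 1 );  // now it is the sample variance
    standard_deviation = std::sqrt( variance );
  }

  double mean( void ) const { return mean_value; }
  double std_dev( void ) const { return standard_deviation; }
  double var( void ) const { return variance; }
  int num_samples( void ) const { return static_cast<int>( std::min<long>( count, COUNT ) ); }
};
```

As a usage example, storing the numbers 1 through 250 leaves the buffer holding the last 200 of them (51 through 250): num_samples() returns 200, the mean is 150.5, and the sample variance is 3350.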