Elementary Statistics.

An example on Standard Deviation:

Given a set of data, we know how to locate the maximum value, the minimum value and the arithmetic mean. We have seen several examples, and the methods can be summarized as follows:

For the maximum, we designate the first element of the data as the temporary maximum and then sweep through the remaining entries one by one, comparing each entry to the temporary maximum. If the current entry is greater than the temporary maximum, we replace the temporary maximum by this entry; otherwise we pass on to the next entry. When the sweep is finished, the temporary maximum is the actual maximum.

For the minimum we follow exactly the same procedure, except that we replace the temporary minimum with the current entry only if the entry is less than the temporary minimum.

For the arithmetic mean we start from sum = 0, accumulate the sum of all the data with the update sum = sum + entry, and at the end divide the sum by the number of entries.

Another elementary statistical characteristic of a given set of data is the Standard Deviation. Look at the two sets of data given below:

 First set:   67 65 66 68 69 64 62
 Second set:  90 71 60 50 40 60 90

Both sets have arithmetic mean 65.86, but they are different in character. In the first set all the numbers lie close to 65 or 66, but in the second set the maximum is 90 and the minimum is 40. How can we quantify the difference between the two sets? We can calculate the standard deviation using the formula:

$$ \sigma = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( x_i - \bar{x} \right)^2 } $$

Here $\bar{x}$ (x with a bar) denotes the arithmetic mean (65.86 for the present examples), $x_i$ is the i-th data entry, and N is the number of data entries (7 in these examples). Calculating the STD with our calculator gives 2.23 for the first set but 17.65 for the second set.
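To see where 2.23 and 17.65 come from, the calculation can be worked out by hand, dividing by N = 7 as described above. Since both sets sum to 461, every deviation from the mean is a whole number of sevenths, so the squares can be added exactly before the final division and square root.

```latex
\begin{aligned}
&\text{First set: } \bar{x} = \frac{67+65+66+68+69+64+62}{7} = \frac{461}{7} \approx 65.86 \\
&x_i - \bar{x} = \tfrac{8}{7},\ -\tfrac{6}{7},\ \tfrac{1}{7},\ \tfrac{15}{7},\ \tfrac{22}{7},\ -\tfrac{13}{7},\ -\tfrac{27}{7} \\
&\sum_{i=1}^{7} (x_i - \bar{x})^2 = \frac{64+36+1+225+484+169+729}{49} = \frac{1708}{49} \approx 34.86 \\
&\sigma_1 = \sqrt{\frac{1708}{49 \cdot 7}} = \sqrt{\frac{1708}{343}} \approx \sqrt{4.980} \approx 2.23 \\[6pt]
&\text{Second set: } \bar{x} = \frac{90+71+60+50+40+60+90}{7} = \frac{461}{7} \approx 65.86 \\
&x_i - \bar{x} = \tfrac{169}{7},\ \tfrac{36}{7},\ -\tfrac{41}{7},\ -\tfrac{111}{7},\ -\tfrac{181}{7},\ -\tfrac{41}{7},\ \tfrac{169}{7} \\
&\sum_{i=1}^{7} (x_i - \bar{x})^2 = \frac{28561+1296+1681+12321+32761+1681+28561}{49} = \frac{106862}{49} \approx 2180.86 \\
&\sigma_2 = \sqrt{\frac{106862}{343}} \approx \sqrt{311.55} \approx 17.65
\end{aligned}
```

The large deviations of 40 and 50 in the second set dominate its sum of squares, which is why its STD is almost eight times larger even though the two means agree.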

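Using arrays makes the calculation of the STD very simple: because the scores are kept in an array, we can sweep through them once to obtain the mean and a second time to add up the squared deviations from that mean. Here we give an example program which performs such a calculation: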

/* An example on the use of arrays: calculating the standard deviation of a
   list of scores in a data file.

   file: 6ex4.cpp
   FALL 1998
   ___________________________________
   Jacob Y. Kazakia   jyk0
   October 13, 1998
   Example 4 of week 6
   Recitation Instructor: J.Y.Kazakia
   Recitation Section  01
   ___________________________________

   Purpose: This program reads a column of 20 float numbers from a file
   named 6ex4data.txt into the array a[20].
   It outputs these numbers to a file named 6ex4rep.txt.
   It then outputs to the same file the average score and the
   standard deviation.

   Algorithm: The average score avg is obtained by calculating the sum of
   the twenty scores and then dividing by 20.
   For the standard deviation we first calculate the sum of the
   squares of (score - average score), i.e.

       sumsq = sum ( from m=0 to m=19 ) of { ( a[m] - avg ) ^ 2 },

   and then we calculate the standard deviation std by:

       std = sqrt ( sumsq / 20 )
*/

#include <iostream>   // cout, cin
#include <iomanip>    // setw, setprecision, setiosflags
#include <fstream>    // ifstream, ofstream
#include <cmath>      // sqrt

using namespace std;

int main()   // standard C++ requires main to return int
{
    // declare the variables of the main function
    float a[20];         // This is an array of twenty entries named as:
                         //   a[0], a[1], a[2], ......., a[18], a[19].
    int m;
    float sum = 0.0;
    float avg = 0.0;     // the average score
    float sumsq = 0.0;   // sum of the squared deviations from avg
    float std = 0.0;     // the standard deviation

    ifstream FinalExam ( "6ex4data.txt" , ios::in );
    ofstream report    ( "6ex4rep.txt" , ios::out );
    if ( !FinalExam || !report )   // stop if either file could not be opened
    {
        cout << "\n\n    Cannot open 6ex4data.txt or 6ex4rep.txt \n\n";
        return 1;
    }

    report << setiosflags( ios::fixed ) << setprecision(1);

    // read the 20 scores, echo each one to the report and add it to the sum
    for ( m = 0 ; m <= 19 ; m++ )
    {
        if ( !( FinalExam >> a[m] ) )   // stop if the file holds fewer than 20 numbers
        {
            cout << "\n\n    6ex4data.txt must contain 20 numbers \n\n";
            return 1;
        }
        report << "    a( " << setw(2) << m << " ) = " << setw(5) << a[m] << endl;
        sum = sum + a[m] ;
    }

    // calculation of the average
    avg = sum / 20 ;
    report << setprecision(4);
    report << "\n\n    the average score is: " << avg << endl << endl;

    // calculation of the standard deviation
    for ( m = 0 ; m <= 19 ; m++ )
        sumsq = sumsq + ( a[m] - avg ) * ( a[m] - avg );
    std = sqrt ( sumsq / 20 );
    report << " \n\n  The standard deviation  is: " << std << endl << endl;

    cout << "    \n\n    DONE ! The output is in the file 6ex4rep.txt  \n\n";
    cout << " \n\n enter e (exit) to terminate the program....";
    char hold;
    cin >> hold;
    return 0;
}

/*
THIS IS THE FILE 6ex4rep.txt:

    a(  0 ) =  78.5
    a(  1 ) =  67.3
    a(  2 ) =  90.9
    a(  3 ) =  89.6
    a(  4 ) =  23.4
    a(  5 ) =   0.0
    a(  6 ) =  67.5
    a(  7 ) =  89.6
    a(  8 ) =  79.5
    a(  9 ) =  77.3
    a( 10 ) =  94.9
    a( 11 ) =  89.6
    a( 12 ) =  45.4
    a( 13 ) =  10.0
    a( 14 ) =  67.5
    a( 15 ) =  89.6
    a( 16 ) =  78.5
    a( 17 ) =  67.3
    a( 18 ) =  90.9
    a( 19 ) =  89.6

    the average score is: 69.3450

  The standard deviation  is: 27.3840
*/