Elementary Statistics.

An example on Standard Deviation:

 

Given a set of data we know how to locate the maximum value, the minimum value and the arithmetic mean. We have seen several examples and the methods can be summarized as follows:

For the maximum, we designate the first element of the data as the temporary maximum and then sweep through the rest one by one, comparing each entry to the temporary maximum. If the current entry is greater than the temporary maximum, we replace the temporary maximum with this entry; otherwise we move on. At the end of the sweep the temporary maximum is the actual maximum.

For the minimum we follow the same procedure, except that we replace the temporary minimum with the current entry only if the entry is less than the temporary minimum.
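
The two sweeps described above can be sketched as a pair of small functions. This is only an illustration: the names find_max and find_min are ours (hypothetical), and we assume the array holds at least one entry, since the first element serves as the starting temporary value.

```cpp
#include <cstddef>

// Hypothetical helpers illustrating the sweep; both assume n >= 1.

float find_max(const float data[], std::size_t n)
{
    float temp_max = data[0];               // temporary maximum: the first element
    for (std::size_t i = 1; i < n; i++)     // sweep through the rest one by one
    {
        if (data[i] > temp_max)
            temp_max = data[i];             // a greater entry replaces the temporary maximum
    }
    return temp_max;                        // after the sweep this is the actual maximum
}

float find_min(const float data[], std::size_t n)
{
    float temp_min = data[0];               // temporary minimum: the first element
    for (std::size_t i = 1; i < n; i++)
    {
        if (data[i] < temp_min)
            temp_min = data[i];             // a smaller entry replaces the temporary minimum
    }
    return temp_min;
}
```

For example, with `float d[] = {3.0f, 9.5f, 1.2f};` the call `find_max(d, 3)` returns 9.5 and `find_min(d, 3)` returns 1.2.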

For the arithmetic mean we start with sum = 0, accumulate all the data using sum = sum + entry, and at the end divide the sum by the number of entries.

Another elementary statistical characteristic of a given set of data is the standard deviation. Consider the two sets of data given below:

First set:    67   65   66   68   69   64   62

Second set:   90   71   60   50   40   60   90


Both sets have arithmetic mean 65.86, but they differ in character. In the first set all the numbers lie close to the mean, between 62 and 69. In the second set the maximum is 90 and the minimum is 40. How can we quantify the difference between the two sets? We can calculate the standard deviation (STD) using the formula:

$$\mathrm{STD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}$$

Here $\bar{x}$ denotes the arithmetic mean (65.86 for the present examples), $x_i$ is the i-th data entry, and N is the number of data entries (7 in these examples). Calculating the STD with our calculator gives 2.23 for the first set but 17.65 for the second set.
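
As a check on the numbers quoted above, here is a minimal sketch that applies the formula to the two sets. The names standard_deviation, first_set and second_set are hypothetical. The function divides by N (not N - 1), which is the convention that reproduces 2.23 and 17.65.

```cpp
#include <cmath>
#include <cstddef>

// The two data sets from the text (array names are our own).
const float first_set[7]  = {67.0f, 65.0f, 66.0f, 68.0f, 69.0f, 64.0f, 62.0f};
const float second_set[7] = {90.0f, 71.0f, 60.0f, 50.0f, 40.0f, 60.0f, 90.0f};

// Hypothetical helper: standard deviation of n entries, dividing by N.
// Assumes n >= 1.
float standard_deviation(const float data[], std::size_t n)
{
    // first pass: the arithmetic mean
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; i++)
        sum = sum + data[i];
    float avg = sum / static_cast<float>(n);

    // second pass: the sum of the squared differences from the mean
    float sumsq = 0.0f;
    for (std::size_t i = 0; i < n; i++)
        sumsq = sumsq + (data[i] - avg) * (data[i] - avg);

    return std::sqrt(sumsq / static_cast<float>(n));
}
```

Calling `standard_deviation(first_set, 7)` gives approximately 2.23, and `standard_deviation(second_set, 7)` gives approximately 17.65.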

Using arrays makes the calculation of the STD very simple. Here is an example program that performs such a calculation:

 

/* An example on the use of arrays : Calculating the standard deviation in a

                                     list of scores in a data file.

 

                                        file: 6ex4.cpp

   FALL 1998

   ___________________________________

   Jacob Y. Kazakia   jyk0

   October 13, 1998

   Example 4 of week 6

   Recitation Instructor: J.Y.Kazakia

   Recitation Section  01

   ___________________________________

 

Purpose: This program reads a column of 20 float numbers from a file

         named 6ex4data.txt into the array  a[20] .

         It outputs these numbers to a file named 6ex4rep.txt .

         It then outputs to the same file the average score and the

         standard deviation.

 

Algorithm: The average score avg is obtained by calculating the sum of

           the twenty scores and then dividing by 20.

 

           For the standard deviation we first calculate the sum of the

           squares of (score - average score), i.e.

 

           sumsq = sum ( from m=0 to m=19) of { (a[m] - avg ) ^ 2  },

 

           and then we calculate the standard deviation std by:

 

           std = sqrt ( sumsq / 20 )

 

 

                */

 

#include <iostream>

#include <iomanip>

#include <fstream>

#include <cmath>

 

using namespace std;   // the standard headers place cout, ifstream, sqrt etc. in namespace std

 

int main()

{

// declare the variables of the main function

 

float a[20];   //  This is an array of twenty entries named as:

               //   a[0], a[1], a[2], ......., a[18], a[19].

int m;

float sum = 0.0;

float avg = 0.0; // the average score

float sumsq = 0.0 ;

float std = 0.0;

 

ifstream FinalExam  ( "6ex4data.txt" , ios:: in);

ofstream report     ( "6ex4rep.txt" , ios:: out);

 

for ( m = 0 ; m <= 19 ; m++ )

{

 FinalExam >> a[m] ;

 report << setiosflags( ios :: fixed) << setprecision(1);

 report << "    a( " << setw(2) << m <<" ) = "<< setw(5) << a[m];

 sum = sum + a[m] ;

 report << endl;

}

 

// calculation of the average

 

avg = sum / 20 ;

report << setprecision(4);

report <<"\n\n    the average score is: " << avg << endl <<endl;

 

// calculation of the standard deviation

 

 

for ( m = 0 ; m <= 19 ; m++ )

 

 sumsq = sumsq + ( a[m] - avg )* ( a[m] - avg );

 

std = sqrt ( sumsq / 20 );

 

report << " \n\n  The standard deviation  is: " << std << endl << endl ;

 

 

cout<< "    \n\n    DONE ! The output is in the file 6ex4rep.txt  \n\n";

 

 

cout<<" \n\n enter e (exit) to terminate the program....";

char hold;

cin>>hold;

}

 

/*

 

   THIS IS THE FILE  6ex4rep.txt :

 

 

    a(  0 ) =  78.5

    a(  1 ) =  67.3

    a(  2 ) =  90.9

    a(  3 ) =  89.6

    a(  4 ) =  23.4

    a(  5 ) =   0.0

    a(  6 ) =  67.5

    a(  7 ) =  89.6

    a(  8 ) =  79.5

    a(  9 ) =  77.3

    a( 10 ) =  94.9

    a( 11 ) =  89.6

    a( 12 ) =  45.4

    a( 13 ) =  10.0

    a( 14 ) =  67.5

    a( 15 ) =  89.6

    a( 16 ) =  78.5

    a( 17 ) =  67.3

    a( 18 ) =  90.9

    a( 19 ) =  89.6

 

 

    the average score is: 69.3450

 

 

 

  The standard deviation  is: 27.3840

 

*/

 

 


 

 

©   2001  J. Y. Kazakia   All rights reserved