Experimental uncertainties: the normal distribution
We consider the statistical analysis of random experimental uncertainties in this worksheet. This will allow us to understand how to analyze repeated measurements of the same quantity, assuming that the uncertainties are caused by random measurement errors. These can include fluctuations of currents in a circuit (caused by other users' loads on the power grid), variable reaction time when a stopwatch is used, or parallax problems when reading a length. Some of these can also give rise to systematic errors, whereby all data values taken are consistently under- (or over-)estimated by the experimenter. These are much harder to track. The most typical systematic error that I observed in our laboratory over the years was the uncalibrated, out-of-tune timebase of a chart recorder (used in a Cavendish experiment to track the motion of two lead balls via a photodiode pick-up of a laser beam). This error was easily spotted with a stopwatch, but no student was able to pick up on it (they simply believed the reading on the knob). A 20% systematic error in the timebase resulted in a wrong value of Newton's constant. Most students had a more accurate value from the manual experiment (where they tracked the motion by markings on graph paper pasted onto a wall), and were surprised that the automated experiment gave inferior results (and often inflated their uncertainties to avoid conflict with the literature value).
A detailed analysis of experimental uncertainties is crucial in order to assess whether systematic errors are present in an experiment or not. We will not enter here into the question of why experimental uncertainties can (usually) be assumed to follow a normal (Gaussian) distribution. The prerequisite for this behaviour is that the total error arises from many independent sources of small errors.
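As a quick illustration of this prerequisite we can already peek at the random-number tools from the stats package (a minimal sketch; CLTdata is just an illustrative name, and the restart below clears these temporary variables again). If each simulated "measurement" is the sum of twelve small, uniformly distributed error contributions, a histogram of many such sums already looks approximately Gaussian:
> with(stats): with(stats[statplots]):
> CLTdata:=[seq(add(stats[random,uniform[0,1]](1),j=1..12)-6.0,m=1..300)]: # each entry: 12 small errors added up, centred at zero
> histogram(CLTdata,area=1);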
We are interested in a single quantity that is measured repeatedly. For the measurement of linear (or other) relationships an extension of the arguments presented here leads to the linear least squares (or quadratic) fit. In this worksheet we will simulate random errors by using normally distributed random numbers to test the theoretical ideas which are proposed for the analysis of experimental uncertainties.
Every experimenter will intuitively form averages from repeated measurements in order to arrive at an improved experimental result for the quantity to be measured. Understanding experimental uncertainties will not only lead to a proper justification, but also to a measure of the expected accuracy of the averaged result. For this purpose we have to adopt a model which tells us how the repeated measurements are distributed.
> restart;
To illustrate the concept of average and standard deviation we follow the textbook approach (John R. Taylor: An Introduction to Error Analysis , chapters 4 and 5, University Science Books 1982, 2nd ed. 1998). Suppose that we have carried out a measurement for the spring constant of a typical spring (out of a box containing a set of identically manufactured ones) repeatedly by timing the period of oscillation while a well-known mass was attached. The series of 10 measurements is given as (in Newton/meter) the following list of values with 2-digit accuracy:
> kL:=[86,85,84,89,86,88,88,85,83,85];
The choice of 2-digit accuracy was reasonable. Our timing device gave more digits of readout, but due to the substantial fluctuations in the second digit we decided to round the numbers to two digits. (Common mistakes made by students are: (a) carrying all 10 digits that their calculator produced while dividing two numbers that were measured only to 2 or 3 figures; (b) trusting digital devices that display more digits than the stated accuracy in the manual (digital voltmeters and ammeters); (c) ignoring the fact that timing motion with digital stopwatches does not eliminate human error and reaction time; (d) overlooking that measurements with calipers can be tricky due to bending or improper holding of the instrument.)
Suppose that all numbers in the set of measurements had been equal. Then there would be no deviation from the average. In that case we might ask ourselves whether more digits can be read from the instrument, and whether it is worthwhile to attempt more precision. At this point we could probably proceed and carry out an uncertainty estimate with a data set of 3-digit accuracy.
The average value is presumably our best estimate:
> N:=nops(kL);
> k_a:=evalf(add(kL[i],i=1..N)/N,3);
Before we proceed to calculate the deviation we illustrate the data by forming a histogram (a frequency distribution or binning of the data).
We choose a bin size and count how many measurements fall into each bin. We could pick 10 bins around the calculated average value, or simply cover the range between 80 and 90 with 20 bins.
> dk:=0.5;
> k_i:=i->80+(i-0.5)*dk;
One way to bin the data is to subtract 80 (plus half a bin width) from the value, divide the remainder by the bin width dk (i.e., multiply by 2), and round the result to an integer (by adding 0.5 and truncating):
> i_k:=k->trunc(2*(k-80-0.5*dk)+0.5);
> j:=1; i_k(kL[j]),k_i(i_k(kL[j])),kL[j],k_i(i_k(kL[j])+1);
The defined functions are consistent, i.e., the binning index assigns each data value to the bin whose left border k_i lies just below (or at) the value.
> for i from 1 to 20 do: C[i]:=0: od:
> for n from 1 to N do: ict:=i_k(kL[n]); if ict>0 and ict<=20 then C[ict]:=C[ict]+1; fi; od:
> with(plots): P1:=plot([k_a,t,t=0..2.5],color=red):
> P2:=plot([seq([k_i(ict)+0.5*dk,t*C[ict],t=0..1],ict=1..20)],color=blue): display(P1,P2);
Is this data set following a Gaussian distribution? At first we might think that it does not (there is a suspicious gap to the right of the average). On second thought (and supported by testing done on the binomial and Poisson distributions in a separate worksheet using a chi-squared test of a hypothesis) we remain open-minded: the data set is much too small to rule out that a much larger data set obtained with the same measurement procedure will approach a Gaussian as its limiting distribution (the limit of infinitely many measurements!).
The standard deviation of the data is a quantity that looks at the distances of the individual measurements from the calculated average. The distances are added quadratically in order not to cancel positive against negative errors. We average the squared deviations, and then take the square root to obtain a width (the deviation). When averaging the sum of the squared deviations we make a small adjustment to account for the fact that one of the N degrees of freedom available from the data set is lost, as the average was obtained from the data set itself. Thus:
> sigma:=evalf(sqrt(add((kL[n]-k_a)^2,n=1..N)/(N-1)),3);
This number represents the typical deviation of individual data points from the average value, i.e., the uncertainty associated with a single measurement. If we make single measurements on other springs from the box of supposedly identical springs, and find that each value lies within sigma of our calculated average (based on N measurements on the first spring), then we need not be concerned about whether the springs are manufactured identically to a sufficient degree: they appear identical to within our measuring accuracy.
We promised more, however: to assess from the distribution the accuracy of the determined average value, which is presumably the best value we have (after N measurements). The uncertainty or standard deviation of the mean - provided the data do have a Gaussian as their limiting distribution - can be estimated as:
> sig_avg:='sigma/sqrt(N)';
> evalf(sig_avg);
This is smaller than the uncertainty of the individual data points by a factor of sqrt(N). This is a good reason to take as many data points as possible, although the reduction of this uncertainty with increasing N is slow.
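The following small table (a quick numerical aside; nn is just a dummy index) makes the point concrete: quadrupling the number of measurements only halves the uncertainty of the mean.
> seq([nn,evalf(1/sqrt(nn),3)],nn=[10,40,160,640]); # reduction factor 1/sqrt(N) for increasing sample sizes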
Thus, the mean (average) together with its deviation can be listed for our example as
(85.9 ± 0.6) [N/m].
Normal distribution
To compare the histogram to a normal distribution it has to be properly normalized. We have just determined the two parameters that control the shape of the limiting Gaussian distribution, namely the position of the average (a Gaussian is symmetric), and the width (the deviation). The Gaussian (or normal) distribution is normalized by an integral to unit area. The frequency distribution at present is normalized to the number of measurements:
> add(C[ict],ict=1..20);
It is expedient to normalize it to unity:
> for ict from 1 to 20 do: C[ict]:=evalf(C[ict]/N,3); od:
The average value of the spring constant can now be obtained by a probabilistic sample average:
> evalf(add((k_i(ict)+0.5*dk)*C[ict],ict=1..20),3);
Other averages can be obtained similarly, e.g., the deviation squared of the individual data points:
> evalf(add((k_i(ict)+0.5*dk-k_a)^2*C[ict],ict=1..20),3);
> evalf(sqrt(%),3);
This is slightly less than the value obtained before, as it does not replace the 1/ N factor in the formation of the average squared deviation by 1/( N -1).
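As a quick consistency check (a small sketch), rescaling the histogram-based deviation by sqrt(N/(N-1)) recovers the value of sigma computed earlier directly from the data:
> evalf(sqrt(N/(N-1))*sqrt(add((k_i(ict)+0.5*dk-k_a)^2*C[ict],ict=1..20)),3); # restores the 1/(N-1) convention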
To make the comparison with the limiting distribution we plot:
> P1:=plot([k_a,t,t=0..0.25],color=red): P4:=plot([[k_a-sig_avg,t,t=0..0.25],[k_a+sig_avg,t,t=0..0.25]],color=magenta): P5:=plot([[k_a-sigma,t,t=0..0.15],[k_a+sigma,t,t=0..0.15]],color=violet):
> P3:=plot(exp(-(k-k_a)^2/(2*sigma^2))/(sqrt(2*Pi)*sigma),k=80..90,color=green):
> P2:=plot([seq([k_i(ict)+0.5*dk,t*C[ict],t=0..1],ict=1..20)],color=blue): display(P1,P2,P3,P4,P5);
We have added a few vertical lines to the comparison: the magenta-coloured lines indicate the standard deviation of the mean; the violet-coloured lines indicate the standard deviation of the Gaussian distribution associated with the individual measurements.
It is the obligation of the experimenter to demonstrate that the errors in the data are indeed normally distributed, i.e., that the agreement with a Gaussian distribution improves for larger data sets. Once this is demonstrated, the probability theory based on the normal distribution can be used to further assess the confidence in our result.
First we demonstrate that the continuous Gaussian distribution results in the expected values for the total probability (area under the curve), the average value, and the deviation squared.
> int(exp(-(k-k_a)^2/(2*sigma^2))/(sqrt(2*Pi)*sigma),k=-infinity..infinity);
> int(k*exp(-(k-k_a)^2/(2*sigma^2))/(sqrt(2*Pi)*sigma),k=-infinity..infinity);
> int((k-k_a)^2*exp(-(k-k_a)^2/(2*sigma^2))/(sqrt(2*Pi)*sigma),k=-infinity..infinity);
One can ask how much probability is contained within the standard deviation interval:
> int(exp(-(k-k_a)^2/(2*sigma^2))/(sqrt(2*Pi)*sigma),k=k_a-sigma..k_a+sigma);
We say that we have 68% confidence that a measurement will fall within this interval. Why do we care about this statement (given that we have determined an accurate 'best' value by forming the average)? The answer is that what we learn about individual measurements can be transferred later to assess the accuracy of the 'best' value.
So let us continue considering the individual measurements. A 68% confidence level does not sound terribly high. We can carry out area calculations, i.e., confidence estimates, for intervals larger than one sigma. For the two-sigma and three-sigma intervals we have:
> int(exp(-(k-k_a)^2/(2*sigma^2))/(sqrt(2*Pi)*sigma),k=k_a-2*sigma..k_a+2*sigma);
> int(exp(-(k-k_a)^2/(2*sigma^2))/(sqrt(2*Pi)*sigma),k=k_a-3*sigma..k_a+3*sigma);
Therefore, once we have determined sigma for our distribution, we can say with a definite confidence that subsequent measurements (by ourselves, or by someone else, provided our systematics are identical) will fall into (t*sigma)-intervals around the average, for t = 1, 2, ....
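For a general multiple t of sigma the confidence level can be written in closed form using the error function, erf(t/sqrt(2)) (a small aside that reproduces the integrals above; conf_level is just an illustrative name):
> conf_level:=t->evalf(erf(t/sqrt(2)),4); # probability contained in the t-sigma interval of a Gaussian
> seq([t,conf_level(t)],t=1..3);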
What relevance does this have for assessing our best value? One could perform a statistical analysis of the average value itself, i.e., take 10 data sets of 10 measurements each, compute the averages, and then the average of the averages with a known uncertainty. The previous assessment of the standard deviation of the mean is a shortcut (an estimate as opposed to a measurement). It asserts that such measured averages ('best' values) would be distributed randomly according to a Gaussian with a width estimated to be sig_avg. Therefore, we can consider the interval surrounding the best estimate (red vertical line with magenta-coloured sidebars) to be the 68% confidence interval for the true value. We can draw the interval with 2*sig_avg and be 95% confident that the true value lies within this interval. Taylor proves in section 5.7 that the best values follow a Gaussian distribution if the individual data measurements do.
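Here is a minimal sketch of this 'average of averages' idea. It peeks ahead at the normal random-number generator used in the Simulations section below; the mean 85.9 and width 1.9 are simply taken from our spring-constant result, and sets/avgs are illustrative names. Ten simulated data sets of ten measurements each give ten 'best' values whose scatter should be comparable to sig_avg:
> sets:=[seq([stats[random,normald[85.9,1.9]](10)],m=1..10)]:   # ten data sets of ten points each
> avgs:=[seq(evalf(add(sets[m][i],i=1..10)/10,4),m=1..10)];     # the ten 'best' values
> evalf(sqrt(add((avgs[m]-85.9)^2,m=1..10)/10),3);              # their rms scatter around the true mean
The scatter of these ten averages should indeed come out comparable to sig_avg = 0.6 found above (with run-to-run fluctuations).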
The lessons to be learned here are:
a) The one-sigma interval is not a holy grail: it represents only a 68% confidence assessment! One-sigma 'new' physics (propagated with big hoopla in Physical Review Letters or Physics Letters, and suspected by critical minds) often goes away! For three-sigma events the chances are much smaller that they are the result of hype rather than solid experimentation. By t-sigma 'new' physics we mean unexpected results that lie outside the t-sigma confidence interval of conventional physics.
b) The reader may wish to understand why the average represents the best estimate for the measured value. The argument involves turning things around, i.e., assuming a limiting Gaussian distribution with as yet undetermined parameters and calculating the probability to arrive at the measured data set (a product of N probabilities). The so-called principle of maximum likelihood is invoked, which places the peak of the Gaussian at the position given by the average of the measured data values (the arguments are given in Taylor's book, and elaborated on in applied probability theory courses; see also the sketch following this list).
c) There are also lessons to be learned for error propagation. Given that we are arriving at an understanding of uncertainties in one variable, we can ask how normally distributed errors in two (or more) variables conspire to form the error in some final result (which may involve a mathematical operation between the two basic variables). Taylor explains in section 5.6 how the addition of errors in quadrature follows from the normal distribution.
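Here is a minimal sketch of the maximum-likelihood argument mentioned in point b) (SSQ and X are names introduced only for this illustration): for Gaussian errors the likelihood of the observed data set is largest when the sum of squared deviations from the assumed peak position X is smallest, and solving for that position recovers the average of the data.
> SSQ:=add((kL[i]-X)^2,i=1..N):        # sum of squared deviations from an unknown peak position X
> solve(diff(SSQ,X)=0,X); evalf(%,3);  # maximizing the likelihood reproduces the average 85.9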
Simulations
We can verify our claims by generating virtual measurements in Maple, i.e., computer simulations of normally distributed measurements. Then we apply the formulae and check whether our findings are consistent with the input.
The first task is to find a distribution of random numbers which follows a Gaussian. Normally, pseudo-random number generators produce random numbers that fill the interval [0,1] uniformly. Normally distributed random numbers fill the range -infinity..infinity in such a way that a histogram approaches a Gaussian of zero mean and unit width.
> with(stats):
> RNd:=[stats[random, normald](150)]:
> with(stats[statplots]):
> histogram(RNd,area=1);
Without the option area=1 the histogram would be arranged in such a way that each bar contains the same area.
We can use the numbers produced with a deviation of 1 to generate errors by simply scaling them with a constant factor. Their average comes out to be zero, and thus we can add the scaled numbers to some hypothetical measurement value. Suppose we have measured an angle of val=38 degrees with some uncertainty.
> val:=38;
> N:=nops(RNd);
> MData:=[seq(val+0.5*RNd[i],i=1..N)]:
> histogram(MData,area=1);
Let us apply our expressions to recover the true value. In the example below it is important to carry more than 3 digits when computing the average!
> v_a:=evalf(add(MData[i],i=1..N)/N,7);
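For comparison (a quick illustration of the warning above), an evaluation with only 3 significant digits cannot resolve changes at the level of the standard deviation of the mean computed below:
> evalf(add(MData[i],i=1..N)/N,3); # only 0.1-degree resolution, too coarse compared to sig_avg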
> sigma:=evalf(sqrt(add((MData[n]-v_a)^2,n=1..N)/(N-1)),5);
> sig_avg:='sigma/sqrt(N)';
> sig_avg:=evalf(sig_avg);
We obtain a 68% confidence interval, which may well exclude the true value:
> [evalf(v_a-sig_avg,4),evalf(v_a+sig_avg,4)];
For the 95% CI we almost certainly include the correct value for N >100:
> [evalf(v_a-2*sig_avg,4),evalf(v_a+2*sig_avg,4)];
If the correct value is not included even in the two-sigma interval, the discrepancy stems from the fact that the average of our finite sample drawn from the normal distribution deviates from zero:
> evalf(add(RNd[i],i=1..N)/N);
Exercise 1:
Explore the question whether the true value lies within the 1-sigma and within the 2-sigma confidence intervals around the 'best' answer by:
a) changing the sample size (increase and decrease);
b) changing the true value of the measured quantity (increase and decrease), while the error size is kept the same
c) changing the error size, while keeping the original true value of the measured quantity.
Keep in mind that the simulation in Maple provides us with the luxury of many repeated measurements. In the undergraduate laboratory setting we often forego this luxury (unless experiments are automated), and pay the price of reduced precision.
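As a starting point for this exercise, the analysis steps can be wrapped into a small procedure (a sketch only; simulate is a hypothetical helper name): its arguments are the true value, the error size, and the number of data points, and it returns the sample average, the deviation of the individual points, and the standard deviation of the mean.
> simulate:=proc(truth,err,Npts) local data,avg,sig,i;
>   data:=[seq(truth+err*stats[random,normald](1),i=1..Npts)];        # simulated measurements
>   avg:=evalf(add(data[i],i=1..Npts)/Npts,7);                        # sample average
>   sig:=evalf(sqrt(add((data[i]-avg)^2,i=1..Npts)/(Npts-1)),5);      # deviation of individual points
>   [avg,sig,evalf(sig/sqrt(Npts),4)];                                # also return the standard deviation of the mean
> end:
> simulate(38,0.5,25);
Varying the three arguments covers parts a) to c) of the exercise.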
We conclude the worksheet with an example of error propagation. Let us assume that the quantity measured above represents an angle of refraction in a simple experiment with a light beam at an air/glass interface. Let us assume that we know the index of refraction (n=1.51 for glass) to a higher degree of precision than the angle measurement. What is the deviation in the incident angle? We first look at the transformed data sample:
> fun:=x->evalf(180/Pi*arcsin(sin(Pi/180*x)*1.51));
> IAngle:=map(fun,MData):
> IAngle[1];
> histogram(IAngle,area=1);
> IA_a:=evalf(add(IAngle[i],i=1..N)/N,7);
> sigmaIA:=evalf(sqrt(add((IAngle[n]-IA_a)^2,n=1..N)/(N-1)),5);
> sig_avgIA:='sigmaIA/sqrt(N)';
> sig_avgIA:=evalf(sig_avgIA);
Let us compare the true value with the confidence interval:
> IA_true:=evalf(fun(val),5);
> [evalf(IA_a-sig_avgIA,4),evalf(IA_a+sig_avgIA,4)];
We notice that the deviation tripled. How can this be explained?
We made the claim that the uncertainty in the index of refraction is negligible. Thus we use Snell's law sin(IA)=n*sin(RA) to relate the uncertainty in the refracted angle RA to the uncertainty in the incident angle IA. Considering the differentials results in the statement:
LHS: d[sin(IA)]=cos(IA)*d[IA]
RHS: n*d[sin(RA)]=n*cos(RA)*d[RA]
We can solve the equation for d[IA] :
d[IA] = n*(cos(RA)/cos(IA))*d[RA]
> evalf(1.51*cos(val*Pi/180)/cos(Pi/180*IA_a)*sigma);
This calculated value for sigmaIA is very close indeed to the one measured above. Thus, it is important to consider error propagation carefully!
We can illustrate the error propagation for this nonlinear function by a graph. This also serves to show why a linearized treatment of the function in the neighbourhood of the mean argument value is justified (the linearization is implied by the use of first-order differentials).
> f_ex:=180/Pi*arcsin(1.51*sin(Pi/180*x));
> f_lin:=convert(taylor(f_ex,x=38,2),polynom);
> sigma;
> P1:=plot([f_ex,f_lin],x=37..39,color=[red,blue]):
> P2:=plot([[val-sigma,65],[val-sigma,subs(x=val-sigma,f_lin)],[0,subs(x=val-sigma,f_lin)]],color=green): P3:=plot([[val,65],[val,subs(x=val,f_lin)],[0,subs(x=val,f_lin)]],color=yellow):
> P4:=plot([[val+sigma,65],[val+sigma,subs(x=val+sigma,f_lin)],[0,subs(x=val+sigma,f_lin)]],color=brown):
> display(P1,P2,P3,P4,scaling=constrained,title="incident vs refracted angle");
Exercise 2:
Determine the relationship between sigma and sigma_IA for a different mean refracted angle x both from the calculation, and from a corresponding graph.
We can also look at the relative deviations in the two angles:
> sigmaIA/IA_a;
> sigma/val;
The relative deviation roughly doubled when going from the refracted to the incident angle.
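This factor can be checked directly (a small consistency check): the ratio of the relative deviations is n*cos(RA)/cos(IA) multiplied by RA/IA, which for our angles evaluates to roughly 1.8:
> evalf(1.51*cos(Pi/180*val)/cos(Pi/180*IA_a)*val/IA_a,3); # ratio of relative deviations predicted by the linearized propagation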
Exercise 3:
Change the value of the exactly known refracted angle and repeat the simulation. Compare the prediction of the simulation for the relationship between d[IA] and d[RA] with the analytical result based on error propagation analysis. What happens in the above example when the glass is replaced by water?
The index of refraction for water equals n =1.33.
Exercise 4:
Use some other physics law that involves a nonlinearity to relate the uncertainties in two variables (a measured variable with known uncertainty is connected to a deduced variable). Perform a simulation, and confirm your results by an analytical derivation (as done above for Snell's law).
Systematic errors
So far we have assumed that the only sources of error were random, and that no systematic deviations were present, i.e., that the experimental apparatus was optimized at the level of desired precision. One possibility to check whether the measuring apparatus is free of systematic errors is by comparison (e.g., identically built ammeters are compared in their readings of the same current, known objects are measured with different calipers or micrometer screws, etc.). In this way one can establish the presence of systematic errors in the equipment (e.g., one ammeter consistently reads high values, etc.). Often we do not have the luxury of comparing equipment in this way and rely on the tolerance specifications of the manufacturer, who was able and obliged to establish the systematic tolerances for the equipment as it left the factory.
An important question that arises in this context is how to deal with this systematic error. Suppose we have established a standard deviation interval for a measurement sequence. How should we add the systematic measurement error to the statistical uncertainty? John R. Taylor points out (in chapter 4) that two situations can arise. Either the error for the equipment is known never to exceed a certain tolerance (e.g., 1% of the reading, or 2% of the maximum reading on the scale); in this case one would simply add the systematic error to the statistical one. On the other hand, it might be that, based on a sequence of comparisons, the systematic error is known to follow a normal distribution, i.e., roughly 70% of the instruments are better than 1%, but some of them fall outside this interval. In this latter case the systematic error has to be added in quadrature to the statistical uncertainty:
D[x]_tot = sqrt(D[x]_systematic^2 + D[x]_random^2)
A justification for adding independent errors (and uncertainties) in quadrature is given below. Usually it is the second method of adding systematic and random errors that we wish to use in a teaching laboratory. Even if equipment was certified to be within a certain tolerance range, we cannot be sure that after years of use (and possibly abuse) it still performs to norm. We have more confidence in the statement that with some likelihood it still performs to specifications.
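As a small numerical illustration (the 1% tolerance is an assumed figure, chosen only for the sake of the example; Dk_sys and Dk_ran are illustrative names), combine a 1% systematic tolerance on a spring-constant reading of 86 N/m with the random uncertainty of 0.6 N/m found earlier:
> Dk_sys:=0.01*86.0; Dk_ran:=0.6;
> 'simple_sum'=Dk_sys+Dk_ran, 'quadrature'=evalf(sqrt(Dk_sys^2+Dk_ran^2),3);
The quadrature sum is noticeably smaller than the simple sum, reflecting the fact that two independent error sources are unlikely to attain their full size simultaneously.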
Uncertainty in a function of several variables
We demonstrated in the previous section how an error propagates through a non-linear function of one variable. Effectively, the function was linearized in the range around the mean value x_a, and the range x_a-sigma .. x_a+sigma was mapped into a corresponding range surrounding f(x_a) using the first derivative f'(x). Depending on the magnitude of the derivative, the deviation in f (i.e., sigma_f) can increase or decrease as compared to sigma. Now we demonstrate for a function of two variables that the uncertainties in x and y have to be added in quadrature to obtain the uncertainty in f(x,y).
First we generate a second sequence of normally distributed random numbers.
> RNd2:=[stats[random, normald](150)]:
We pick Snell's law again, and consider the following problem: suppose we measure corresponding incident and refracted angles in order to determine the index of refraction of the optically dense medium (we assume that for the purposes of the experiment air has an index of refraction of n =1, i.e., the vacuum value).
We measure the incident angle with the same accuracy as the refracted angle. The data set for the refracted angle is kept from the previous section, and we produce a data set for the incident angles (consistent with n =1.51).
> MDataIA:=[seq(IA_true+0.5*RNd2[i],i=1..N)]:
For the purposes of our simulation we simply associate with each datapoint from the set of refracted angles a corresponding datapoint from the set of incident angles to obtain a set of values for the index of refraction:
> MDatan:=[seq(evalf(sin(Pi/180*MDataIA[i])/sin(Pi/180*MData[i])),i=1..N)]:
> histogram(MDatan,area=1);
We calculate the 'best' value for n by forming the average, the deviation of the data, and the standard deviation of the mean:
> n_a:=evalf(add(MDatan[i],i=1..N)/N,7);
> sigma_n:=evalf(sqrt(add((MDatan[n]-n_a)^2,n=1..N)/(N-1)),5);
> sig_avg_n:='sigma_n/sqrt(N)';
> sig_avg_n:=evalf(sig_avg_n);
> [n_a-sig_avg_n,n_a+sig_avg_n];
The true value of n=1.51 falls barely outside the 68% confidence interval.
How can we predict the uncertainty sigma_n from the known uncertainties in the refracted and incident angles?
First we should verify that the uncertainty in the incident angle is of the same magnitude as that in the refracted angle (note that it is given independently in this section, and not determined via Snell's law as in the previous section!):
> IA_a:=evalf(add(MDataIA[i],i=1..N)/N,7);
> sigma_IA:=evalf(sqrt(add((MDataIA[n]-IA_a)^2,n=1..N)/(N-1)),5);
We need to apply the addition in quadrature rule given for a function q(x1, x2, ..., xn) as:
D[q] = sqrt((diff(q, x1)*D[x1])^2 + (diff(q, x2)*D[x2])^2 + ... + (diff(q, xn)*D[xn] )^2)
where the derivatives are evaluated at the average value of [x1, x2, ..., xn] .
Our function is given as:
> RF_ind:=sin(x1)/sin(x2);
> Dn:=sqrt((diff(RF_ind,x1)*Dx1)^2+(diff(RF_ind,x2)*Dx2)^2);
Here x1 and x2 are the incident and refracted angles respectively.
> evalf(subs(x1=Pi/180*IA_a,x2=Pi/180*v_a,Dx1=Pi/180*sigma_IA,Dx2=Pi/180*sigma,Dn));
It is important in the above expression to provide the conversion factors for the angles from degrees to radians not only in the arguments to the sine functions, but also for the uncertainties themselves!
The agreement of the predicted uncertainty with the one observed in the simulation (which represents a statistical measurement!) is very good.
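To see the effect of omitting the conversion (an intentional misuse, shown for illustration only), substituting the uncertainties in degrees while keeping the angles in radians inflates the predicted value by a factor of 180/Pi, i.e., by roughly 57:
> evalf(subs(x1=Pi/180*IA_a,x2=Pi/180*v_a,Dx1=sigma_IA,Dx2=sigma,Dn)); # wrong: uncertainties left in degrees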
Exercise 5:
Repeat the above simulation with a larger deviation in the incident and refracted angles. Pick different incident and refracted angles.
Exercise 6:
Repeat the simulation with a very much reduced statistical sample as corresponds to teaching laboratory measurements (or as happens sometimes in life science research when circumstances do not permit large sample sizes), e.g., N=5. Are the conclusions still valid?