Friday, August 5, 2022

Standard Deviation & Variance

Standard Deviation

The Standard Deviation is a measure of how spread out numbers are.

Its symbol is σ (the Greek letter sigma).

The formula is easy: it is the square root of the Variance. So now you ask, "What is the Variance?"

 

 


Formulas

Here are the two formulas, explained at Standard Deviation Formulas if you want to know more:


The "Population Standard Deviation":

$$ \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2} $$

The "Sample Standard Deviation":

$$ s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2} $$

The formulas look complicated, but the important change is to divide by N-1 (instead of N) when calculating a Sample Variance.

When you have "N" data values that are:

  • The Population: divide by N when calculating Variance (as in the worked example below)
  • A Sample: divide by N-1 when calculating Variance
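To make the N versus N-1 distinction concrete, here is a minimal Python sketch using the standard library's statistics module. The data values are made up purely for illustration and are not taken from this post; pvariance and pstdev divide by N, while variance and stdev divide by N-1.

```python
import statistics

# Hypothetical data, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Treat the data as the whole population: divide by N.
population_variance = statistics.pvariance(data)
population_sd = statistics.pstdev(data)

# Treat the data as a sample from a larger population: divide by N-1.
sample_variance = statistics.variance(data)
sample_sd = statistics.stdev(data)

print("population:", population_variance, population_sd)
print("sample:    ", sample_variance, sample_sd)
```

The sample figures come out slightly larger than the population figures, because the same sum of squared differences is divided by a smaller number.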

Variance

The Variance is defined as:

The average of the squared differences from the Mean.

To calculate the variance follow these steps:

  • Work out the Mean (the simple average of the numbers)
  • Then for each number: subtract the Mean and square the result (the squared difference).
  • Then work out the average of those squared differences. (Why Square?)
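The three steps above can be sketched in a few lines of Python. This is only an illustration: the function name variance_by_steps and the input list are hypothetical, and the function divides by N (the population form of the variance).

```python
def variance_by_steps(values):
    """Return (mean, squared_differences, variance), dividing by N."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")

    # Step 1: work out the mean.
    mean = sum(values) / n

    # Step 2: for each number, subtract the mean and square the result.
    squared_differences = [(x - mean) ** 2 for x in values]

    # Step 3: work out the average of those squared differences.
    variance = sum(squared_differences) / n
    return mean, squared_differences, variance


# Hypothetical input, used only to show the call.
mean, squared_differences, variance = variance_by_steps([1, 2, 3, 4, 5])
print(mean, squared_differences, variance)
```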

 

Example of Standard Deviation vs. Variance

 

To demonstrate how both measures work, let's look at an example of standard deviation and variance.

Suppose you have a series of numbers and you want to figure out the standard deviation for the group. The numbers are 4, 34, 18, 12, 2, and 26. First, we need to determine the mean, or average, of the numbers. We do this by adding the numbers up and dividing the sum by the total count in the group:

(4 + 34 + 18 + 12 + 2 + 26) ÷ 6 = 16


So the mean is 16. Now subtract the mean from each number then square the result:

  • (4 - 16)^2 = 144
  • (34 - 16)^2 = 324
  • (18 - 16)^2 = 4
  • (12 - 16)^2 = 16
  • (2 - 16)^2 = 196
  • (26 - 16)^2 = 100

Now we have to figure out the average, or mean, of these squared values to get the variance. This is done by adding up the squared results from above, then dividing the sum by the total count in the group:

(144 + 324 + 4 + 16 + 196 + 100) ÷ 6 = 784 ÷ 6 ≈ 130.67

This means we end up with a variance of about 130.67. To figure out the standard deviation, we take the square root of the variance, which is about 11.43. Because we divided by N, this is the population standard deviation.
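If you want to check the arithmetic yourself, here is a short Python sketch that redoes the example with the six numbers above. The sample figures at the end are not part of the worked example; they are included only to show how dividing by N-1 changes the result for the same data.

```python
import math

values = [4, 34, 18, 12, 2, 26]
n = len(values)

mean = sum(values) / n                                   # 96 / 6
squared_differences = [(x - mean) ** 2 for x in values]
total = sum(squared_differences)                         # 784

# Population form, as in the worked example: divide by N.
population_variance = total / n
population_sd = math.sqrt(population_variance)

# For comparison only: the same data treated as a sample (divide by N-1).
sample_variance = total / (n - 1)
sample_sd = math.sqrt(sample_variance)

print(round(population_variance, 2), round(population_sd, 2))  # 130.67 11.43
print(round(sample_variance, 2), round(sample_sd, 2))          # 156.8 12.52
```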

 
