# Testimonials

You guys are amazing at math, and thanks again for doing my online math class for me! Without you I wouldn't be able to work on Sunday!
Shannon ~ California

I needed help with math homework and statistics, so I looked for the book Statistics for Dummies and found this great website. Their team helped me get an A in my math class. Thank you!
Vanessa ~ Florida

# Understanding Statistics - Frequently Asked Questions

1. What is ANOVA, and what is it used for?

Analysis of Variance (ANOVA) is a statistical method that uses the ratio between two estimates of the population variance obtained from different samples: one based on the variation between the sample means and one based on the variation within the samples. While there may be slight differences in sample statistics, if the samples are essentially the same, the two variance estimates should be similar. Very roughly, the closer the ratio is to 1, the more plausible it is that the differences between the samples are due to chance. More technically, a given ratio (with the appropriate degrees of freedom) corresponds to a particular probability value in the F-distribution, which can be found using tables of that distribution. ANOVA is used to ascertain whether there are any real differences between given samples. For example, given two sample groups of patients from the same main population, one of which receives treatment and the other a placebo, ANOVA answers the question “does the treatment have any significant effect?” To gauge the effect size, should any effect be present, further analysis is required.
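As a rough illustration of the variance ratio described above, the following Python sketch computes the F ratio for a one-way ANOVA by hand. The function name `one_way_anova_f` and the treatment and placebo values are made up for illustration and are not taken from any real study. The resulting F value would still have to be looked up in an F-distribution table with the reported degrees of freedom.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical measurements, invented for this sketch.
treatment = [5.1, 4.8, 5.6, 5.3, 4.9]
placebo = [4.2, 4.6, 4.1, 4.7, 4.4]

f_ratio, df_b, df_w = one_way_anova_f([treatment, placebo])
print(f"F = {f_ratio:.3f} with ({df_b}, {df_w}) degrees of freedom")
```

An F ratio near 1 suggests the group means differ no more than chance would explain, while a large ratio points toward a real difference.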

2. What is the difference between standard error and standard deviation?

The standard deviation measures the dispersion or “spread” of values about the mean within a given population; thus it refers to the variability of values in a population with respect to the mean. Standard error is the standard deviation of a sampling distribution. For example, if a sample is drawn from a given population, the sample mean can be used as an estimate of the population mean. However, if many different samples of the same size were taken, they would all yield slightly different estimates of the population mean, forming a sampling distribution of means. The standard error of the mean therefore refers to the standard deviation of the sampling distribution of means. It is referred to as such because, when unbiased sampling is used, the standard deviation of the resulting population mean estimates is equal to the standard deviation of the error (the variability of the estimates about the actual population mean).
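The idea of a sampling distribution of means can be sketched with a small simulation. The population parameters, sample size, and number of samples below are arbitrary assumptions chosen for illustration. The comparison value uses the standard result that the standard error of the mean equals the population standard deviation divided by the square root of the sample size, which is not stated in the text above.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: normally distributed values.
POP_MEAN, POP_SD = 50.0, 10.0
SAMPLE_SIZE = 25
NUM_SAMPLES = 5000

# Draw many samples of the same size and record each sample mean.
sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    sample_means.append(mean(sample))

# Standard deviation of the sampling distribution of means.
empirical_se = pstdev(sample_means)
# Textbook value: population SD divided by sqrt(sample size).
theoretical_se = POP_SD / sqrt(SAMPLE_SIZE)

print(f"empirical standard error:   {empirical_se:.3f}")
print(f"theoretical standard error: {theoretical_se:.3f}")
```

The two printed values should be close, while both are much smaller than the population standard deviation of 10.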

3. How do you calculate Pearson’s correlation coefficient, r?

Pearson’s r measures the strength and direction of the linear relationship between two variables, typically an independent variable and its associated dependent variable. This can be assessed graphically by plotting the values in a scatter plot, adding a line of best fit through the values, and ascertaining visually how “tightly” the values are clustered around that line. Data that seem randomly dispersed imply no correlation, while data points that all lie on the line imply perfect correlation. A positively sloping line is associated with positive correlation, and a negatively sloping line with negative correlation. The value of the correlation coefficient, r, can be calculated as shown below, where x and y are the paired values and x̄ and ȳ are their means:

$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

Example:

A homeowner tabulated the data from his electricity bill as shown below. The electricity usage reflects hot water, cooking, lights, and appliances. The data show costs for each two-month period over more than two years, compared to the average daily temperature for the same time period. Construct an X-Y scatter plot of the data, and calculate Pearson’s correlation coefficient, r.
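A minimal Python sketch of the calculation follows. The homeowner's table is not reproduced here, so the temperature and cost values are placeholders invented for illustration, and the function name `pearson_r` is likewise an assumption. The function uses the usual definition of r: the sum of products of deviations divided by the square root of the product of the summed squared deviations.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the two means.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)


# Placeholder data, NOT the homeowner's actual figures.
avg_daily_temp = [30, 38, 52, 66, 75, 58]   # degrees, per billing period
electricity_cost = [92, 88, 71, 60, 55, 66]  # cost, per billing period

r = pearson_r(avg_daily_temp, electricity_cost)
print(f"r = {r:.3f}")
```

A value of r close to -1 or +1 indicates points lying close to a straight line, and a value near 0 indicates little linear relationship.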

4. When and how would you do a Student’s t-test?

Student’s t-test compares the difference between two given means to the overall variability in the data (variability within each sample, and variability between samples). The t-test is useful in situations where you wish to compare the mean values for two different samples that differ in some essential way, e.g. one exposed to a treatment and one not. The samples are often, but need not be, of the same size. The t-test provides the answer to the question “is the difference observed between the two means, if any, significant or is it simply due to chance?”

Example:

A local brewery conducted an experiment to determine the maximal alcohol concentration two strains of yeast, A and B, could tolerate in the presence of a special flavoring agent. The brewers grew five cultures of each yeast strain and measured the alcohol concentration at which the yeast populations declined due to the alcohol’s toxic effects. The data are shown in the table below. Was there any significant difference between the two yeast strains?

| Strain A max alcohol (% v/v) | Strain B max alcohol (% v/v) |
|---|---|
| 12.2 | 15.2 |
| 13 | 11.1 |
| 14.4 | 12.23 |
| 13.22 | 14.7 |
| 13.01 | 14.9 |

Since the formula for the t-statistic is:

To find the probability P above, we must consult a table of the t-distribution and look up the value that corresponds to a t value of 0.574 with 8 (nx + ny - 2) degrees of freedom and two tails. According to the data above, the probability of seeing a difference at least this large between the two mean alcohol concentrations purely by chance is 0.581, far above the conventional significance threshold of 0.05. Hence, there is no significant difference between the two yeast strains with respect to their alcohol tolerance in the presence of the flavoring agent.
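The yeast calculation can be checked with a short Python sketch. Because the t-statistic formula itself is not reproduced above, the code assumes the common form: the difference between the two means divided by the square root of (variance of x / nx + variance of y / ny). With equal sample sizes, as here, this coincides with the pooled-variance form. The helper names are illustrative. The quoted t value of about 0.57 appears to correspond to variances computed with a divisor of n; using the sample variance (divisor n - 1) gives a somewhat smaller t of roughly 0.51. Either way the conclusion is the same. The p-value is obtained by numerically integrating the t-distribution density instead of consulting a table.

```python
from math import gamma, pi, sqrt
from statistics import mean

strain_a = [12.2, 13, 14.4, 13.22, 13.01]
strain_b = [15.2, 11.1, 12.23, 14.7, 14.9]


def variance(values, ddof):
    """Variance with divisor (n - ddof): ddof=0 divides by n, ddof=1 by n - 1."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - ddof)


def t_statistic(x, y, ddof):
    """Absolute difference of means divided by its standard error."""
    se = sqrt(variance(x, ddof) / len(x) + variance(y, ddof) / len(y))
    return abs(mean(x) - mean(y)) / se


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=1000):
    """Two-tailed p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    h = abs(t) / steps
    total = t_pdf(0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 1 - 2 * area


df = len(strain_a) + len(strain_b) - 2
t_n_divisor = t_statistic(strain_a, strain_b, ddof=0)  # assumed to match the text
t_sample = t_statistic(strain_a, strain_b, ddof=1)     # sample-variance version
p_n_divisor = two_tailed_p(t_n_divisor, df)
p_sample = two_tailed_p(t_sample, df)

print(f"divisor n:     t = {t_n_divisor:.3f}, p = {p_n_divisor:.3f}")
print(f"divisor n - 1: t = {t_sample:.3f}, p = {p_sample:.3f}")
```

Both p-values are far above 0.05, in line with the conclusion that the two strains do not differ significantly.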

5. What is a type I error, and how do you avoid/minimize the risk of committing it?

Type I errors, a.k.a. “false positives”, occur when a statistical test rejects the null hypothesis (H0) even though it is in fact true. More simply, a type I error means concluding that some “effect” exists when in fact no such effect exists. The type I error rate is normally denoted by the Greek letter α, which is also the significance level of the test: the probability threshold below which results are considered too unlikely to have arisen by chance alone. Type I errors cannot be eliminated altogether, but the risk can be reduced through experimental design decisions, for example by choosing a smaller α (such as 0.01 instead of 0.05) before collecting data, or by correcting for multiple comparisons.
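The link between α and the type I error rate can be illustrated with a simulation in which the null hypothesis is true by construction: both groups are drawn from the same population, so every rejection is a false positive. For simplicity this sketch uses a two-sample z-test with a known population standard deviation, not the t-test discussed earlier. All parameter values and names are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the sketch is repeatable

ALPHA = 0.05
POP_MEAN, POP_SD = 100.0, 15.0  # hypothetical population
SAMPLE_SIZE = 20
NUM_TRIALS = 2000


def z_test_p_value(x, y, sd):
    """Two-tailed p-value for a difference in means with known SD."""
    se = sd * sqrt(1 / len(x) + 1 / len(y))
    z = (mean(x) - mean(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


false_positives = 0
for _ in range(NUM_TRIALS):
    # Both groups come from the SAME population, so H0 is true.
    group_a = [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    group_b = [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    if z_test_p_value(group_a, group_b, POP_SD) < ALPHA:
        false_positives += 1

type_i_rate = false_positives / NUM_TRIALS
print(f"observed type I error rate: {type_i_rate:.3f} (alpha = {ALPHA})")
```

The observed rate should land near α, which is why lowering α lowers the risk of a false positive.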

6. How do you calculate standard deviation, and more importantly, what does it mean?

The standard deviation measures the “spread” of a given data set. Roughly speaking, it is the typical distance between each data point and the mean (more precisely, the square root of the average squared distance). As such, it is included with descriptive statistics because of the information it provides about the variability within the given population. It is particularly useful when comparing populations: two 7th grade classes may have the same mean score on a standardized math exam, but unless the standard deviations are likewise similar, it would not be appropriate to conclude that students in the two classes are doing similarly well or poorly overall.
The standard deviation, σ, may be calculated using the formula below, where μ is the population mean, n is the population size, and the x values are the individual data points.

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2}$$

Example:
At a certain bakery, each loaf of bread is wrapped for sale with a nominal weight of 1.5 lb. The weights of 15 loaves selected at random from one day’s baking were found to be 1.52645, 1.502434, 1.4985, 1.5012, 1.5422, 1.5556, 1.5411, 1.4532, 1.53, 1.52229, 1.4988, 1.573, 1.456, 1.4832, 1.4443, 1.5223. Calculate the standard deviation of the sample.
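One way to work the bakery example is sketched below in Python, using the weights exactly as listed (the list as printed contains 16 values). The function `population_sd` follows the population form described above, which divides by n. Because the loaves are a sample, the sketch also reports the sample standard deviation, which divides by n - 1. Which of the two the exercise intends is an assumption left to the reader.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

# Loaf weights in lb, copied from the example as printed.
weights = [
    1.52645, 1.502434, 1.4985, 1.5012, 1.5422, 1.5556, 1.5411, 1.4532,
    1.53, 1.52229, 1.4988, 1.573, 1.456, 1.4832, 1.4443, 1.5223,
]


def population_sd(values):
    """Square root of the mean squared deviation from the mean (divisor n)."""
    mu = mean(values)
    return sqrt(sum((x - mu) ** 2 for x in values) / len(values))


pop_sd = population_sd(weights)
sample_sd = stdev(weights)  # divisor n - 1

print(f"mean weight:                      {mean(weights):.4f} lb")
print(f"standard deviation (divisor n):   {pop_sd:.4f} lb")
print(f"standard deviation (divisor n-1): {sample_sd:.4f} lb")
```

The hand-written function agrees with `statistics.pstdev`, so the library call can be used directly once the formula is understood.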