Sunday 5 August 2018

Analysis of Variance (ANOVA) and F-test in R Language

Analysis of Variance (ANOVA)

A statistical method for making simultaneous comparisons between two or more group means; it yields values that can be tested to determine whether a significant relationship exists between the variables.

Examples:

A car company wishes to compare the average petrol consumption of THREE similar models of car and has available six vehicles of each model.

A teacher is interested in a comparison of average percentage marks attained in the examinations of FIVE different subjects and has available the marks of eight students who all completed each examination.

  ANOVA looks at how groups vary internally versus how they differ from one another. Taking the examples above:
  1. ANOVA calculates the mean for each group.
  2. It calculates the mean for all the groups combined - the Overall Mean.
  3. Then it calculates, within each group, the total deviation of each individual's score from the Group Mean - the Within Group Variation.
  4. Next, it calculates the deviation of each Group Mean from the Overall Mean - the Between Group Variation.
  5. Finally, ANOVA produces the F statistic, which is the ratio of the Between Group Variation to the Within Group Variation.
  6. If the Between Group Variation is significantly greater than the Within Group Variation, it is likely that there is a statistically significant difference between the groups.
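The steps above can be sketched directly in R. The alertness scores and dosage groups below are made up for illustration, and the variable names are my own:

```r
# Illustrative scores for three hypothetical dosage groups
scores <- list(low  = c(30, 38, 35, 41),
               mid  = c(44, 49, 47, 52),
               high = c(58, 61, 55, 64))

group.means  <- sapply(scores, mean)       # Group Means
overall.mean <- mean(unlist(scores))       # Overall Mean
n            <- sapply(scores, length)     # group sizes

# Within Group Variation: squared deviations of scores from their Group Mean
ss.within  <- sum(sapply(scores, function(g) sum((g - mean(g))^2)))
# Between Group Variation: weighted squared deviations of Group Means
ss.between <- sum(n * (group.means - overall.mean)^2)

# F statistic: ratio of the two variations, each divided by its
# degrees of freedom (K - 1 between groups, N - K within groups)
k <- length(scores); N <- sum(n)
f.stat <- (ss.between / (k - 1)) / (ss.within / (N - k))
f.stat
```

The same F value is what `aov()` reports in its summary table for these data.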

One way Analysis of Variance

setwd("C:\\PERSONAL\\Irawen_Business_Analytics_With_R\\WIP\\Class-8 data-1")
# Read the comma-separated data file; header = TRUE keeps the column names
data.ex1 <- read.table("Class-8 data.txt", sep = ",", header = TRUE)
# Fit a one-way ANOVA of Alertness on the factor Dosage
aov.ex1 <- aov(Alertness ~ Dosage, data = data.ex1)
summary(aov.ex1)


Two way Analysis of Variance



Data are from an experiment in which the alertness level of male and female subjects was measured after they had been given one of two possible dosages of a drug. This is therefore a 2x2 design, with Gender and Dosage as the factors.
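A sketch of this 2x2 analysis follows; since the original data file is not reproduced in the post, the data frame below is simulated for illustration. The `*` operator fits both main effects and the Gender:Dosage interaction:

```r
# Simulated stand-in for the 2x2 alertness data (illustrative only)
set.seed(42)
data.ex2 <- data.frame(
  Gender    = rep(c("M", "F"), each = 8),
  Dosage    = rep(c("a", "b"), times = 8),
  Alertness = round(rnorm(16, mean = 25, sd = 5))
)

# Two-way ANOVA: main effects of Gender and Dosage plus their interaction
aov.ex2 <- aov(Alertness ~ Gender * Dosage, data = data.ex2)
summary(aov.ex2)
```

The summary table shows one row each for Gender, Dosage, the Gender:Dosage interaction, and the residuals.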



Uses of Analysis of Variance

ANOVA is a particular form of statistical hypothesis testing heavily used in the analysis of experimental data.

→ A statistical hypothesis test is a method of making decisions using data. A test result (calculated from the null hypothesis and the sample) is called statistically significant if it is deemed unlikely to have occurred by chance, assuming the truth of the null hypothesis. A statistically significant result, when a probability (p-value) is less than a threshold (significance level), justifies the rejection of the null hypothesis, but only if the a priori probability of the null hypothesis is not high.

→ In the typical application of ANOVA, the null hypothesis is that all groups are simply random samples of the same population. This implies that all treatments have the same effect. Rejecting the null hypothesis implies that different treatments result in different effects.


More on ANOVA

ANOVA is the synthesis of several ideas and it is used for multiple purposes. As a consequence, it is difficult to define concisely or precisely.

Classical ANOVA for balanced data does three things at once:
- As exploratory data analysis, an ANOVA is an organization of an additive data decomposition, and its sums of squares indicate the variance of each component of the decomposition (or, equivalently, each set of terms of a linear model).
- Comparisons of mean squares, along with F-tests ... allow testing of a nested sequence of models.
- Closely related to the ANOVA is a linear model fit with coefficient estimates and standard errors. In short, ANOVA is a statistical tool used in several ways to develop and confirm an explanation for the observed data.

Additionally:

- It is computationally elegant and relatively robust against violations of its assumptions.
- ANOVA provides industrial strength (multiple sample comparison) statistical analysis.
- It has been adapted to the analysis of a variety of experimental designs.

F-test

An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis.

It is most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled.

Exact F-tests mainly arise when the models have been fitted to the data using least squares.
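As a small illustration (with simulated data and names of my own choosing), `anova()` compares two nested least-squares fits with an exact F-test:

```r
# Simulated data: y depends on x1 but not on x2
set.seed(1)
x1 <- rnorm(30)
x2 <- rnorm(30)
y  <- 2 + 3 * x1 + rnorm(30)

small <- lm(y ~ x1)        # reduced model
big   <- lm(y ~ x1 + x2)   # full model, nests the reduced one

# F-test of whether adding x2 significantly improves the fit
anova(small, big)
```

The reported F statistic compares the drop in residual sum of squares against the residual mean square of the full model.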

The name was coined by George W. Snedecor, in honour of Sir Ronald A. Fisher.

Fisher initially developed the statistic as the variance ratio in the 1920s.

The formula for the one-way ANOVA F-test statistic is:

  F = between-group variability / within-group variability

    = [ Σᵢ nᵢ (Ȳᵢ − Ȳ)² / (K − 1) ] / [ Σᵢ Σⱼ (Yᵢⱼ − Ȳᵢ)² / (N − K) ]

where Ȳᵢ is the sample mean of the i-th group, nᵢ is its size, Ȳ is the overall mean, Yᵢⱼ is the j-th observation in the i-th group, K is the number of groups, and N is the overall sample size.
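Given the F value and its two degrees of freedom, the p-value is the upper tail of the F distribution; the numbers below are hypothetical:

```r
# Hypothetical one-way ANOVA result: 3 groups, 18 observations in total
K <- 3
N <- 18
F.value <- 5.12   # illustrative F statistic

# p-value: upper-tail probability of F(K - 1, N - K)
p.value <- pf(F.value, df1 = K - 1, df2 = N - K, lower.tail = FALSE)
p.value
```

A p-value below the chosen significance level (commonly 0.05) would lead to rejecting the null hypothesis that all group means are equal.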
