Wednesday, April 5, 2017

Assignment 4: Hypothesis Testing

Introduction to Hypothesis Testing

T-Test or Z-Test
Elements of the Test

Z and T-tests are used to test whether a sample is statistically different from a whole population or from a hypothesized value. A T-test is used when the sample size is less than 30 and a Z-test when it is greater than 30. The equation used for Z and T-testing is above. The process these tests are a part of is called hypothesis testing.

Hypothesis testing has 5 steps listed below:

–1. State the null hypothesis, Ho
      There is no significant difference between the sample mean and the mean for the entire population
–2. State the alternative hypothesis, Ha
      There is a significant difference between the sample mean and the mean for the entire population
–3. Choose a statistical test
      Use Z if n (size of sample) is greater than 30, or T if less than 30.
–4. Choose α or the level of significance
      α = 0.05 or α = 0.01 (usually), corresponding to a critical Z score that marks the 95% or 99% point of the distribution for a one-tailed test, or the 97.5% or 99.5% point for a two-tailed test. A T-test uses a different critical value, found on the T-table using the degrees of freedom (n - 1) and α.

–5. Calculate test statistic
      Note that this differs depending on whether it is a T-test or a Z-test, as discussed in step 4 above.

After calculating the test statistic, the null hypothesis is either rejected in favor of the alternative hypothesis or fails to be rejected. The null hypothesis is assumed true until evidence to the contrary (a T or Z-test result pointing to Ha) is found.
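Steps 3 and 5 can be sketched in a few lines of Python. This is only an illustration, assuming the standard one-sample form of the statistic (difference between the sample mean and the hypothesized mean, divided by the standard error). The function names and the input numbers are hypothetical and are not part of the assignment.

```python
import math

def choose_test(n):
    """Step 3: Z-test for samples larger than 30, T-test otherwise."""
    return "Z" if n > 30 else "T"

def compute_statistic(sample_mean, hypothesized_mean, std_dev, n):
    """Step 5: difference in means divided by the standard error."""
    standard_error = std_dev / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
n = 16
statistic = compute_statistic(sample_mean=10.5, hypothesized_mean=10.0,
                              std_dev=2.0, n=n)
print(choose_test(n), statistic)  # T 1.0
```

The resulting value is then compared with the critical value chosen in step 4 to decide whether Ho is rejected.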

Part 1


1. Trial Calculations:

Bolded fields were provided. α was found by subtracting the confidence level from 100, dividing the result by 100, and then dividing by 2 if the interval type was 2 tailed. Next, each entry was designated a Z or T-test depending on whether n was greater or less than 30. Finally, the critical value was found either from the Z table, by locating the probability 1 - α in the body of the table and reading the Z value from the row and column headings, or from the T table, using n - 1 as the degrees of freedom together with α.


Trial   Interval Type   Confidence Level   n     α      z or t?   z or t value
A       2 Tailed        90                 45    .05    Z         ±1.64
B       2 Tailed        95                 12    .025   T         ±2.20099
C       1 Tailed        95                 36    .05    Z         1.64
D       2 Tailed        99                 180   .005   Z         ±2.57
E       1 Tailed        80                 60    .2     Z         0.84
F       1 Tailed        99                 23    .01    T         2.50832
G       2 Tailed        99                 15    .005   T         ±2.97684
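The Z rows of the table can be cross-checked with Python's standard library. The sketch below follows the α rule described above (per-tail α, halved for a 2 tailed interval) and uses `statistics.NormalDist` in place of the printed Z table, so the last digit may differ slightly from a table look-up. The helper names are my own, and the T rows are left out because the standard library has no T distribution.

```python
from statistics import NormalDist

def per_tail_alpha(confidence_level, tails):
    """Convert a confidence level in percent to the α used for the look-up."""
    alpha = (100 - confidence_level) / 100
    return alpha / 2 if tails == 2 else alpha

def z_critical(confidence_level, tails):
    """Z value whose cumulative probability is 1 - α (per tail)."""
    return NormalDist().inv_cdf(1 - per_tail_alpha(confidence_level, tails))

# (confidence level, number of tails) for the rows where n > 30
z_rows = {"A": (90, 2), "C": (95, 1), "D": (99, 2), "E": (80, 1)}
for label, (level, tails) in z_rows.items():
    print(label, round(per_tail_alpha(level, tails), 3),
          round(z_critical(level, tails), 3))
```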

2.
A Department of Agriculture and Live Stock Development organization in Kenya estimates that yields in a certain district should approach the following amounts in metric tons per hectare (averages based on data from the whole country): groundnuts, 0.57; cassava, 3.7; and beans, 0.29. A survey of 23 farmers had the following results:
Data for 2


a. Test the hypothesis for each of these products. Assume that each is 2 tailed with a Confidence Level of 95%. *Use the appropriate test
b. Be sure to present the null and alternative hypotheses for each as well as conclusions
c. What are the probability values for each crop?
d. What are the similarities and differences in the results?

For each crop I went through the five steps of hypothesis testing, analyzed the result at step five, and then compared the three crops once every test had been performed.

Ground Nuts: 
1. Ho: There is no difference between the sample mean and the district estimate of ground nut yield. 
2. Ha: There is a difference between the sample mean and the district estimate of ground nut yield.
3. A T-test should be used because the sample size is less than 30.
4. The confidence level is 95% and the test is 2 tailed, so α = 0.05 is split into 0.025 per tail. With 22 degrees of freedom (n - 1), this corresponds to a critical value of ±2.07387.
5. Running the T-test gives a value of -0.7993, corresponding to a probability value of 0.2148. This lies within the interval from -2.07387 to 2.07387, so for the ground nuts the null hypothesis fails to be rejected.

Cassava:
1. Ho: There is no difference between the sample mean and the district estimate of cassava yield. 
2. Ha: There is a difference between the sample mean and the district estimate of cassava yield.
3. A T-test should be used because the sample size is less than 30.
4. The confidence level is 95% and the test is 2 tailed, so α = 0.05 is split into 0.025 per tail. With 22 degrees of freedom (n - 1), this corresponds to a critical value of ±2.07387.
5. Running the T-test gives a value of -2.5578, corresponding to a probability value of 0.0054. This test statistic falls outside the interval set by the critical values, so the null hypothesis is rejected. At the 95% confidence level, the sample mean differs significantly from the district estimate.

Beans:
1. Ho: There is no difference between the sample mean and the district estimate of bean yield.
2. Ha: There is a difference between the sample mean and the district estimate of bean yield.
3. A T-test should be used because the sample size is less than 30.
4. The confidence level is 95% and the test is 2 tailed, so α = 0.05 is split into 0.025 per tail. With 22 degrees of freedom (n - 1), this corresponds to a critical value of ±2.07387.
5. Running the T-test gives a value of 1.9983, corresponding to a probability value of 0.9767. This value falls within the interval set by the critical values, so the null hypothesis fails to be rejected.

Conclusions: At the 95% confidence level, the ground nut and bean sample average yields are not significantly different from the estimates, while the cassava sample average yield is significantly different from, and lower than, the estimate.
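The three two-tailed decisions can be laid side by side with a short script. It is a sketch only: the T values and the critical value are copied from the steps above rather than recomputed from the survey data, and the function name is hypothetical.

```python
# Two-tailed critical value for a 95% confidence level with 22 degrees of freedom
critical_value = 2.07387

# T values reported in step 5 for each crop
t_values = {"ground nuts": -0.7993, "cassava": -2.5578, "beans": 1.9983}

def reject_two_tailed(t, critical):
    """Ho is rejected when t falls outside the interval [-critical, +critical]."""
    return abs(t) > critical

decisions = {crop: reject_two_tailed(t, critical_value)
             for crop, t in t_values.items()}

for crop, rejected in decisions.items():
    print(crop, "reject Ho" if rejected else "fail to reject Ho")
```

Only cassava falls outside the interval, which matches the conclusions above.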
3.
A researcher suspects that the level of a particular stream’s pollutant is higher than the allowable limit of 4.2 mg/l. A sample of n = 17 reveals a mean pollutant level of 6.4 mg/l, with a standard deviation of 4.4. What are your conclusions? (one tailed test, 95% Significance Level) Please follow the hypothesis testing steps. What is the corresponding probability value of your calculated answer?
1. Ho: There is no difference between the sample level and the allowable level of pollutant in the stream.
2. Ha: The sample pollutant level is higher than the allowable level in the stream.
3. A T-test should be used because the sample size is less than 30.
4. The confidence level is 95%, which gives α = 0.05 for this one-tailed test. With 16 degrees of freedom (n - 1), the critical value is 1.745884.
5. Running the T-test on the sample data gives a value of 2.061553, corresponding to a probability of 0.9803.

Conclusions: The T-test gives a value that is greater than the critical value, so the null hypothesis is rejected. At the 95% confidence level, the sample indicates that the pollutant level in the stream is higher than the allowable limit.
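The arithmetic behind step 5 can be reproduced directly from the numbers in the question. This is a minimal sketch assuming the usual one-sample form of the T statistic. The critical value is taken from step 4 rather than computed, and the variable names are my own.

```python
import math

allowable_limit = 4.2   # mg/l, hypothesized mean
sample_mean = 6.4       # mg/l
sample_sd = 4.4
n = 17

standard_error = sample_sd / math.sqrt(n)
t_statistic = (sample_mean - allowable_limit) / standard_error
degrees_of_freedom = n - 1

critical_value = 1.745884  # one-tailed, α = 0.05, from step 4

# One-tailed (upper) test: reject Ho when t exceeds the critical value
reject_null = t_statistic > critical_value
print(f"t = {t_statistic:.6f}, df = {degrees_of_freedom}, reject Ho: {reject_null}")
# t = 2.061553, df = 16, reject Ho: True
```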


Part 2

Introduction: 

Part 2 adds a spatial aspect to hypothesis testing. Using a shapefile of US Census block groups in the City of Eau Claire, another of block groups in Eau Claire County, processes in ArcMap, hypothesis testing, and mapping, it answers the question of whether the average value of homes in the city is significantly different from that of the county as a whole.

Methods:

First, statistics were calculated for the two groups. In an ArcMap document containing both shapefiles, they were found by right clicking the 2016 home value column in each attribute table and clicking Statistics. These were then processed in the same way as the earlier hypothesis tests, treating the City of Eau Claire block groups as a sample of the whole county.
Eau Claire City Home Value Statistics

Eau Claire County Home Value Statistics
Now, the five steps were followed to get the correct data.

1. Ho: There is no significant difference between the average home values of the City of Eau Claire block groups compared to the averages of the block groups for the whole county.
2. Ha: The average home value for the City of Eau Claire block groups is lower than the averages for the block groups of the entire county.
3. A Z-test should be performed because the n of the sample (the block groups of the City of Eau Claire) is 53, which is greater than 30.
4. A one-tailed test at the 95% confidence level, corresponding to an α of 0.05, should be used. The critical value from the Z-table is -1.64.
5. Using the statistics found earlier in ArcMap to calculate the Z-test, the value obtained is -2.572.

Conclusions and Discussion:

Because the Z-test value is smaller than the critical value, the null hypothesis is rejected. At the 95% confidence level, average home values are lower in the City of Eau Claire than in the county as a whole.
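The lower-tail decision can be checked with Python's standard library. The sketch below starts from the Z value reported in step 5, since the underlying ArcMap statistics are not reproduced in the text. It uses `statistics.NormalDist` instead of the printed Z table, so the critical value comes out near -1.645 rather than the rounded -1.64. Variable names are my own.

```python
from statistics import NormalDist

z_statistic = -2.572   # value reported in step 5
alpha = 0.05           # one-tailed test at the 95% confidence level

standard_normal = NormalDist()

# Lower-tail critical value: the Z score with cumulative probability α
z_critical = standard_normal.inv_cdf(alpha)

# Probability of a Z score at least this low if Ho were true
p_value = standard_normal.cdf(z_statistic)

# Ha says the city mean is lower, so reject Ho when Z falls below the critical value
reject_null = z_statistic < z_critical
print(round(z_critical, 3), round(p_value, 4), reject_null)
```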

The mean home values in the city and the county are both much lower than the average price of homes in the country as a whole, which ranged from $340,600 to $385,700 over 2016.

Mapping:

The data folder provided by the instructor was copied to a personal folder. ArcMap was then opened and the new document was saved in the same personal folder. The two block group shapefiles were dragged in from the catalog on the right side of the screen so that setup of the data view could begin. A dissolve tool was first applied to the City of Eau Claire block group shapefile to remove internal boundaries, creating a feature class that could later be displayed as a transparent outline around the block groups located inside the city. To do this, the symbol in the TOC (Table of Contents) was clicked, and in the resulting "Symbol Selector" window the fill color was set to "no color," the outline color to black, and the outline width to 2. The City of Eau Claire block groups shapefile was then turned off by unchecking its box in the TOC. In the properties of the Eau Claire County block groups shapefile, the symbology was set to quantities, graduated symbols, with the value set to "2016 Average Home Value" and the classification set to Natural Breaks with 5 classes. The class values were then changed to be more appropriate for display, the projected coordinate system of the data frame was set to NAD 1983 StatePlane Wisconsin Central, other additions were made to the map, and it was exported. It can be seen below.

Eau Claire County and City Home 2016 Values by Block Group

