Monday, February 20, 2017

Assignment 2


Introduction:

    This assignment practices calculating a variety of descriptive statistics, both by hand and in Microsoft Excel. It also practices computing descriptive spatial statistics in ArcMap.

Part 1:

    To begin, the descriptive statistics used in this assignment need to be defined. The first is range, which is the difference between the greatest and the least of the observations. Next is the mean, otherwise known as the average. This is the sum of all observations divided by the number of observations. After this we have the median. This is simply the middle observation of the sorted data, and if there is no single middle observation, it is the average of the two middle observations. Then there is the mode, which is the observation that occurs most often. We then have the kurtosis of the dataset. This is the "pointiness" of the dataset's distribution curve. Next we have the skewness. This is the degree to which the distribution curve is pulled to one side or the other. A dataset with positive skewness has a tail leading to the right (pulled by outliers on the right side), and a dataset with negative skewness is pulled by outliers on the left, giving a tail on that side of the frequency distribution curve. Finally, we have the standard deviation. This statistic measures how spread out the data are. Its formulas are shown in Figure 1 (population standard deviation) and Figure 2 (sample standard deviation). A larger standard deviation means that the data are more widely spread.
Figure 1
(Taken from class lecture)
Figure 2
(Taken from class lecture)
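    As a rough check on hand or Excel calculations, the statistics defined above can be sketched in Python. This is a minimal sketch, not the method used in the assignment: the function name `describe` and the sample values are hypothetical, and skewness and kurtosis are computed in their plain moment-based (population) form, which may differ slightly from Excel's SKEW and KURT because those apply small-sample corrections.

```python
import statistics

def describe(values):
    """Return the descriptive statistics defined above for a list of numbers."""
    n = len(values)
    mean = statistics.mean(values)
    # Central moments used for the moment-based skewness and kurtosis.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "range": max(values) - min(values),
        "mean": mean,
        "median": statistics.median(values),
        # multimode returns a list because several values can tie for most frequent.
        "mode": statistics.multimode(values),
        # Population standard deviation divides by N; sample divides by n - 1.
        "population_sd": statistics.pstdev(values),
        "sample_sd": statistics.stdev(values),
        "skewness": m3 / m2 ** 1.5,
        # Excess kurtosis: 0 for a normal distribution.
        "kurtosis": m4 / m2 ** 2 - 3,
    }

# Hypothetical values for illustration only (not the assignment's data).
times = [2, 4, 4, 4, 5, 5, 7, 9]
stats = describe(times)
for name, value in stats.items():
    print(name, value)
```

    For these sample values the skewness comes out positive, which matches the longer tail toward the larger values.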
    To practice these statistical methods, I analyzed two bicycle race teams' scores, which are shown in Figure 3. I first calculated the standard deviations for the two teams by hand; these are shown in Figure 4 and Figure 5. The overall team statistics are in Figure 6.
Figure 3
Figure 4
Figure 5
Figure 6
    Based on these statistics, I would absolutely choose to invest in team Tobler over team Astana. Despite Astana having the winning racer in this race, team Tobler's racers are consistently better. There is less variation in skill level within the team (smaller range and smaller standard deviation), and both the median and mean race times are faster than team Astana's. If the winning team earns $400,000 in various ways, with 35% of this going to the owner, while the racer who wins the race earns $300,000, with only 25% going to the owner, I would much rather invest in the team that will win than in the team with the racer who will win. In this case that team is team Tobler, whose members are consistently better.
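    The owner's share in each scenario can be worked out with a few lines of Python. This is only a sketch of the arithmetic, assuming the payouts are read exactly as stated above; the function name `owner_share` is hypothetical.

```python
def owner_share(prize, percent_to_owner):
    """Owner's cut of a prize; integer percent avoids float rounding."""
    return prize * percent_to_owner // 100

team_win = owner_share(400_000, 35)   # owner's share when the team wins
racer_win = owner_share(300_000, 25)  # owner's share when an individual racer wins
print(team_win, racer_win)
```

    Under that reading, the owner's share of the team prize is nearly twice the share of the individual prize.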


Part 2:

    The statistical methods used in this part must again be defined. This section uses the mean center and the weighted mean center. The mean center is the center of an area as defined by the centers of all of the subareas that make it up. For example, in this assignment the mean center is calculated from the centers of all of the counties of Wisconsin. All x values for these centers are averaged, and then all y values are averaged. The mean center is the (mean x, mean y) coordinate. For the weighted mean center, the coordinates of each subarea are weighted by a value for that subarea. In this assignment those values are the populations of each county in 2000 and in 2015.
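    The two definitions above can be sketched in plain Python. This is an illustration under stated assumptions, not ArcMap's implementation: the function name `mean_center`, the centroid coordinates, and the populations are all hypothetical.

```python
def mean_center(points, weights=None):
    """Return the (x, y) mean center; weighted if weights are given."""
    if weights is None:
        # Equal weights reduce the weighted form to the plain mean center.
        weights = [1] * len(points)
    total = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    return x, y

# Hypothetical subarea centroids and populations (not Wisconsin data).
centroids = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
populations = [100, 100, 600, 200]

unweighted = mean_center(centroids)
weighted = mean_center(centroids, populations)
print(unweighted, weighted)
```

    In this toy example the weighted center moves away from the geometric middle toward the most populous centroid, which is the same effect large cities have on a state's weighted mean center.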

    To create the map below (Figure 7), my instructor supplied a data table already normalized for use in ArcMap. This table included a GEO_ID column for joining, a name column, and populations for each county in both 2000 and 2015. I imported this table into a new file geodatabase in a folder for this assignment, and then right-clicked on my shapefile of all Wisconsin counties, which I got from the US Census website. I selected join, then joined the instructor-supplied table using the GEO_ID column to match records. I then ran the mean center tool three times: once unweighted, and twice more using each of the two population columns as the weight. The weighted mean center equation is shown below in Figure 8.
Figure 7
Figure 8
(Taken from class lecture)
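    For reference, the weighted mean center is commonly written as follows. The notation is an assumption and may differ from the lecture's: x_i and y_i are the coordinates of subarea i's center, w_i is its weight (here, county population), and n is the number of subareas.

```latex
\bar{X}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i},
\qquad
\bar{Y}_w = \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i}
```

    Setting every w_i to 1 gives the unweighted mean center.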

    The weighted mean centers clearly show the weight of the two largest cities in Wisconsin: Milwaukee and Madison. Since 2000, it seems that the populations of Madison and other cities on the western side of the state have increased disproportionately compared with the increase that Milwaukee has seen. Looking back at the data, this seems very plausible: Dane County's population increased by 19.62% over the 15-year gap, from 426,526 to 510,198, while Milwaukee County's population increased by only 1.68%, from 940,164 to 955,939.
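    The two percentages can be checked from the county populations quoted above. This is a small sketch of the arithmetic; the function name `percent_change` is hypothetical.

```python
def percent_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100

# County populations for 2000 and 2015 as quoted in the text.
dane = percent_change(426_526, 510_198)
milwaukee = percent_change(940_164, 955_939)
print(f"Dane: {dane:.2f}%  Milwaukee: {milwaukee:.2f}%")
```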





