Monday, February 20, 2017

Assignment 2


Introduction:

    This assignment practices calculating a variety of descriptive statistics, both on paper and in Microsoft Excel. It also practices calculating descriptive spatial statistics in ArcMap.

Part 1:

    To begin, the descriptive statistics that will be used need to be defined. The first is range. Range is the difference between the greatest and the least of the observations. Next is the mean, otherwise known as the average. This is the sum of all of the observations divided by the number of observations. After this we have the median. This is simply the middle observation once the observations are sorted, and if there is not a single middle observation then it is the average of the two middle observations. Then there is the mode. This is the observation that occurs most often. We then have the kurtosis of the dataset. This is the "pointiness" of the dataset's frequency curve. Next we have the skewness. This is the degree to which the frequency curve is pulled to one side or the other. A positive skewness describes a dataset whose curve has a tail leading right (pulled by outliers on the right side), and a negative skewness describes one that is pulled by outliers on the left, leading to a tail on that side of the frequency distribution curve. Finally, we have the standard deviation of the dataset. This statistic is a measure of how spread out the data is. Its formulas are shown in Figure 1 (population standard deviation) and Figure 2 (sample standard deviation). A larger standard deviation means that the data is more widely spread.
Figure 1
(Taken from class lecture)
Figure 2
(Taken from class lecture)
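The definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `describe` and the sample values are hypothetical and are not the race data from this assignment. Skewness and kurtosis here use the simple moment-based (population) forms, which is an assumption; Excel's SKEW and KURT functions apply sample corrections, so their results will generally differ somewhat.

```python
import math
from collections import Counter


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    n = len(values)
    ordered = sorted(values)
    mean = sum(values) / n

    # Median: middle value, or the average of the two middle values.
    mid = n // 2
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

    # Mode: every value that shares the highest frequency.
    counts = Counter(values)
    top = max(counts.values())
    modes = sorted(v for v, c in counts.items() if c == top)

    # Sum of squared deviations from the mean, shared by both formulas.
    ss = sum((x - mean) ** 2 for x in values)
    pop_sd = math.sqrt(ss / n)           # population: divide by n
    sample_sd = math.sqrt(ss / (n - 1))  # sample: divide by n - 1

    # Moment-based skewness and excess kurtosis (assumed forms, see above).
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / pop_sd ** 3
    kurtosis = m4 / pop_sd ** 4 - 3

    return {
        "range": ordered[-1] - ordered[0],
        "mean": mean,
        "median": median,
        "modes": modes,
        "pop_sd": pop_sd,
        "sample_sd": sample_sd,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


# Hypothetical sample values, for illustration only.
stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in stats.items():
    print(name, value)
```

The only difference between the two standard deviation formulas is the divisor, n for a population and n - 1 for a sample, so the sample value is always slightly larger.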
    To practice these statistical methods, I analyzed two bicycle race teams' scores. The scores for the two teams are shown in Figure 3. I first calculated the standard deviations for the two teams by hand. These are shown in Figure 4 and Figure 5. The overall team stats are in Figure 6.
Figure 3
Figure 4
Figure 5
Figure 6
    Based on these statistics, I would choose to invest in team Tobler over team Astana. Although Astana had the winning racer in this race, team Tobler's racers are consistently better. There is less variation in skill level within the team (smaller range and smaller standard deviation), and both the median and mean race times are faster than team Astana's. If the winning team gains $400,000 in various ways, 35% of which goes to the owner, while the winning racer gets $300,000, of which only 25% goes to the owner, then I would much rather invest in the team that will win than in the team whose racer will win the race. In this case that team is team Tobler, whose members are consistently better.
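The payout comparison behind this choice can be checked with a short calculation. The sketch below uses only the dollar amounts and percentages stated above; the function name `owner_share` is hypothetical.

```python
def owner_share(prize_dollars, owner_percent):
    """Owner's cut of a prize, using integer math to avoid rounding noise."""
    return prize_dollars * owner_percent // 100


team_win = owner_share(400_000, 35)   # owner's cut when the team wins
racer_win = owner_share(300_000, 25)  # owner's cut when a racer wins

print(team_win, racer_win)  # 140000 75000
```

Under these numbers the owner of the winning team receives $140,000, compared with $75,000 from an individual race win.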


Part 2:

    The statistical methods used must again be defined. In this section the mean center and the weighted mean center are used. The mean center is the center of an area as defined by the centers of all of the subareas that make it up. For example, in this assignment the mean center is calculated from the centers of all of the counties of Wisconsin. All x values for these centers are averaged, and then all y values are averaged. The mean center is the (mean x, mean y) coordinate. For the weighted mean center, the coordinates of each subarea are weighted by a value attached to that subarea. In this assignment those values are the populations of each county in 2000 and in 2015.
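As a rough sketch of the two definitions above, the mean center and the weighted mean center can be computed as follows. The function names, coordinates, and weights are hypothetical placeholders, not the Wisconsin county data, and this is not how ArcMap implements its tool internally.

```python
def mean_center(points):
    """Average the x values and the y values of a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return (mean_x, mean_y)


def weighted_mean_center(points, weights):
    """Average the coordinates, with each point counted by its weight."""
    total = sum(weights)
    mean_x = sum(w * x for (x, _), w in zip(points, weights)) / total
    mean_y = sum(w * y for (_, y), w in zip(points, weights)) / total
    return (mean_x, mean_y)


# Hypothetical subarea centers and populations, for illustration only.
centers = [(0, 0), (4, 0), (4, 4), (0, 4)]
populations = [1, 1, 1, 5]

print(mean_center(centers))                        # (2.0, 2.0)
print(weighted_mean_center(centers, populations))  # (1.0, 3.0)
```

In this toy example the heavily populated subarea at (0, 4) pulls the weighted mean center toward itself, which is the same effect large cities have on a population-weighted mean center.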

    To create the map below (Figure 7), my instructor supplied a data table already normalized for use in ArcMap. This table included a GEO_ID column for joining, a name column, and populations for each county in both 2000 and 2015. I imported this table into a new file geodatabase in a folder for this assignment, and then right-clicked on my shapefile of all Wisconsin counties, which I got from the US Census website. I selected join, then joined the instructor-supplied table using the GEO_ID column to match records. I then ran the mean center tool three times, the second and third times selecting the two different population columns for weighting. The weighted mean center equation is shown below in Figure 8.
Figure 7
Figure 8
(Taken from class lecture)

    The weighted mean centers clearly show the weight of the two largest cities in Wisconsin: Milwaukee and Madison. Since 2000, it seems that the populations of Madison and other cities on the western side of the state have grown disproportionately compared to the growth that Milwaukee has seen. Looking back at the data, this seems very plausible: Dane County's population increased by 19.62% over the 15-year gap, from 426,526 to 510,198, while Milwaukee County's population increased by only 1.68%, from 940,164 to 955,939.
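The growth percentages quoted above can be reproduced from the county populations given in the text. The helper name `percent_change` is hypothetical.

```python
def percent_change(old, new):
    """Percentage change from an old value to a new value."""
    return (new - old) / old * 100


# County populations in 2000 and 2015, as quoted in the text.
dane = percent_change(426_526, 510_198)
milwaukee = percent_change(940_164, 955_939)

print(f"Dane County: {dane:.2f}%")
print(f"Milwaukee County: {milwaukee:.2f}%")
```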






Thursday, February 2, 2017

Assignment 1

Introduction: This assignment practices and demonstrates my ability to differentiate between levels of measurement, differentiate between classification methods, retrieve data from the U.S. Census, join data, and build cartographically pleasing maps.

Part 1: There are four main types of data relevant to mapping. These are nominal, ordinal, interval, and ratio data. Nominal data is data classified by name. Examples of nominal data are gender, land cover, dominant species, etc. Each of these examples has two or more categories, each with a different name. Figure 1 shows two categories: more women, and more men. This map shows nominal data. Ordinal data shows the rank or order that each unit of data falls into. Figure 2 shows ordinal data because it shows the rankings of the 10 busiest air travel routes. Interval data associates a numerical value with each unit of data, but 0 is not a true zero in this type of data. The best example of this is a temperature map, as 0 does not represent zero energy (absolute zero) in the most commonly used temperature scales. In other words, the temperature value is not an absolute magnitude, so ratios between values are not meaningful. Figure 3 shows interval data, and is a good example because negative values are displayed. Finally, ratio data is numerical data with a 0 that means something. The Richter scale is a good example of this, with a 0 meaning no earthquake; therefore Figure 4 is a good example of a map showing ratio data. When looking at this map, keep in mind that it is a bivariate map and that the size of the bubble is what shows the magnitude, the ratio data.
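The difference between interval and ratio data can be illustrated with temperature. The sketch below is only an illustration with made-up readings of 10 and 20 degrees Celsius; the function name `celsius_to_kelvin` is hypothetical. It shows that a ratio of Celsius values does not describe a real ratio of temperatures, because 0 degrees Celsius is not a true zero.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15


# Made-up readings, for illustration only.
cool, warm = 10.0, 20.0

# On the interval scale the ratio looks like "twice as hot".
naive_ratio = warm / cool

# On the ratio scale, where 0 means no thermal energy, it is only about 1.035.
true_ratio = celsius_to_kelvin(warm) / celsius_to_kelvin(cool)

print(naive_ratio, round(true_ratio, 3))
```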

Figure 1
https://images.washingtonpost.com/?url=https://img.washingtonpost.com/blogs/wonkblog/files/2015/07/Gender.gif&op=noop
Figure 2
https://twistedsifter.files.wordpress.com/2013/08/top-10-busiest-air-travel-routes-of-2012.jpg?w=800&h=410
Figure 3
https://upload.wikimedia.org/wikipedia/commons/thumb/3/3c/JulArcticSfcT.svg/350px-JulArcticSfcT.svg.png
Figure 4
http://d3svfn6as6o5bl.cloudfront.net/mpt/howto/case_eq_world_eck4.png

Part 2: In this hypothetical situation, I am working for an agriculture consulting and marketing company that is trying to find customers and persuade them that there need to be more female farm operators. I made three maps using three different classification techniques, shown below (Figures 5-7). It is evident from these maps that efforts should be concentrated on the farms in the northern and central areas of the state.

Figure 5 shows the equal interval classification method map. In this scheme, the interval corresponding to each class has the same breadth. Figure 6 shows the quantile classification method map. This method places an equal number of data points in each class. Figure 7 shows the natural breaks method. This method finds the largest gaps in the data and designates class extents based on these. After making these three maps, it is clear that the quantile method produces the map that would best persuade business owners to promote women into higher ranks. This is because, when one focuses on the northernmost areas of Wisconsin, the quantile map shows the largest differences between these areas and the rest of the state, making them appear abnormal.
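The three classification methods described above can be sketched as functions that return class break values. The function names and sample values are hypothetical, not the farm operator data. The third function is a simplification that cuts at the largest gaps, as described in the text; it is not the Jenks optimization that GIS software typically uses for natural breaks.

```python
def equal_interval_breaks(values, k):
    """Breaks that split the data range into k classes of equal width."""
    low, high = min(values), max(values)
    width = (high - low) / k
    return [low + width * i for i in range(1, k)]


def quantile_breaks(values, k):
    """Upper bounds that put about the same number of values in each class."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // k - 1] for i in range(1, k)]


def largest_gap_breaks(values, k):
    """Simplified natural breaks: cut at the k - 1 largest gaps in the data."""
    ordered = sorted(values)
    gaps = sorted(
        ((ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)),
        reverse=True,
    )
    cut_positions = sorted(i for _, i in gaps[: k - 1])
    # Place each break halfway across its gap.
    return [(ordered[i] + ordered[i + 1]) / 2 for i in cut_positions]


# Hypothetical values, for illustration only.
data = [1, 2, 3, 4, 10, 11, 12, 30, 31]

print(equal_interval_breaks(data, 3))  # [11.0, 21.0]
print(quantile_breaks(data, 3))        # [3, 11]
print(largest_gap_breaks(data, 3))     # [7.0, 21.0]
```

The same nine values produce three different sets of class breaks, which is why the choice of classification method changes how a map reads.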

Figure 5

Figure 6
Figure 7