#install.packages("readr")
library(readr)
library(mosaic)
GroupA_EducationAttainment <- read.csv("~/Desktop/FALL2016/SDS291/Project/GroupA_EducationAttainment.csv")
GroupA_PoliceKillings <- read.csv("~/Desktop/FALL2016/SDS291/Project/GroupA_PoliceKillings.csv")
GroupA_PovertyRate <- read.csv("~/Desktop/FALL2016/SDS291/Project/GroupA_PovertyRate.csv")
GroupA_StatePopulation <- read.csv("~/Desktop/FALL2016/SDS291/Project/GroupA_StatePopulation.csv")
GroupA_BigDataSet <- read.csv("~/Desktop/FALL2016/SDS291/Project/GroupA_BigDataSet.csv")
str(GroupA_EducationAttainment)
## 'data.frame': 51 obs. of 7 variables:
## $ State : Factor w/ 51 levels "AK","AL","AR",..: 2 1 4 3 5 6 7 9 8 10 ...
## $ LessthanHS2014 : num 15.4 8.1 13.8 14.4 17.9 9.5 9.9 10.9 9.9 12.8 ...
## $ HSorHigher2014 : num 84.6 91.9 86.2 85.6 82.1 90.5 90.1 89.1 90.1 87.2 ...
## $ HSOnly2014 : num 31.7 25.8 24.4 35.1 20.9 21.9 27.5 31.6 17.8 29.5 ...
## $ BachelororHigher2014: num 23 29.1 27.4 21.4 31.7 38 37.9 30.3 54.8 27.4 ...
## $ BachelorOnly2014 : num 14.5 18.7 17.3 14.1 19.9 24.1 21.3 17.9 24.5 17.5 ...
## $ GraduateDegree2014 : num 8.5 10.4 10.1 7.3 11.8 13.9 16.7 12.4 30.2 9.9 ...
In GroupA_EducationAttainment, there are 7 variables and 51 observations. The variables are:
* State is a factor variable that contains the abbreviation of each state, as well as DC.
* LessthanHS2014 is a numeric variable that contains the percentage of people in 2014 who had not graduated from high school.
* HSorHigher2014 is a numeric variable that contains the percentage of people in 2014 who were high school graduates or higher.
* HSOnly2014 is a numeric variable that contains the percentage of people in 2014 whose highest attainment was a high school diploma.
* BachelororHigher2014 is a numeric variable that contains the percentage of people in 2014 who had a bachelor's degree or higher.
* BachelorOnly2014 is a numeric variable that contains the percentage of people in 2014 who had only a bachelor's degree.
* GraduateDegree2014 is a numeric variable that contains the percentage of people in 2014 who had a graduate degree.
favstats(~LessthanHS2014, data=GroupA_EducationAttainment)
## min Q1 median Q3 max mean sd n missing
## 7.1 9.05 10.7 14.2 17.9 11.4549 3.029212 51 0
Minimum is 7.1%. Maximum is 17.9%. There is no missing data. This makes sense.
favstats(~HSorHigher2014, data=GroupA_EducationAttainment)
## min Q1 median Q3 max mean sd n missing
## 82.1 85.8 89.3 90.95 92.9 88.5451 3.029212 51 0
Minimum is 82.1%. Maximum is 92.9%. There is no missing data. This makes sense.
favstats(~HSOnly2014, data=GroupA_EducationAttainment)
## min Q1 median Q3 max mean sd n missing
## 17.8 26.1 28.3 31.55 41.3 28.67647 4.247191 51 0
Minimum is 17.8%. Maximum is 41.3%. There is no missing data. This makes sense.
favstats(~BachelororHigher2014, data=GroupA_EducationAttainment)
## min Q1 median Q3 max mean sd n missing
## 19.3 26.05 28.7 32.35 54.8 29.7098 6.124941 51 0
Minimum is 19.3%. Maximum is 54.8%. There is no missing data. This makes sense.
favstats(~BachelorOnly2014, data=GroupA_EducationAttainment)
## min Q1 median Q3 max mean sd n missing
## 11.7 16.55 18.5 20.65 24.5 18.4902 2.894841 51 0
Minimum is 11.7%. Maximum is 24.5%. There is no missing data. This makes sense.
favstats(~GraduateDegree2014, data=GroupA_EducationAttainment)
## min Q1 median Q3 max mean sd n missing
## 6.7 9.25 10.4 12.1 30.2 11.22353 3.78442 51 0
Minimum is 6.7%. Maximum is 30.2%. There is no missing data. This makes sense.
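Some of these columns are arithmetically linked, so the summaries can be cross-checked: LessthanHS2014 and HSorHigher2014 should add to 100, and BachelorOnly2014 plus GraduateDegree2014 should equal BachelororHigher2014. The sketch below is only an illustration: the edu data frame is typed by hand from the first five rows of the str() output, not read from the file, and the tolerance is an assumed allowance for rounding to one decimal place.

```r
# First five rows (AL, AK, AZ, AR, CA) copied by hand from the str() output;
# a stand-in for GroupA_EducationAttainment, not the full data.
edu <- data.frame(
  State                = c("AL", "AK", "AZ", "AR", "CA"),
  LessthanHS2014       = c(15.4, 8.1, 13.8, 14.4, 17.9),
  HSorHigher2014       = c(84.6, 91.9, 86.2, 85.6, 82.1),
  BachelororHigher2014 = c(23.0, 29.1, 27.4, 21.4, 31.7),
  BachelorOnly2014     = c(14.5, 18.7, 17.3, 14.1, 19.9),
  GraduateDegree2014   = c(8.5, 10.4, 10.1, 7.3, 11.8),
  stringsAsFactors     = FALSE
)

tol <- 0.15  # assumed allowance for rounding in the published percentages

# Complement check: below high school + high school or higher = 100
hs_ok <- abs(edu$LessthanHS2014 + edu$HSorHigher2014 - 100) < tol

# Sum check: bachelor's only + graduate degree = bachelor's or higher
ba_ok <- abs(edu$BachelorOnly2014 + edu$GraduateDegree2014 -
               edu$BachelororHigher2014) < tol

print(data.frame(State = edu$State, hs_ok, ba_ok))
```

The favstats() output is consistent with the first check: the two means (11.4549 and 88.5451) add to 100 and the two standard deviations are identical (3.029212).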
str(GroupA_PoliceKillings)
## 'data.frame': 467 obs. of 34 variables:
## $ name : Factor w/ 465 levels "A'donte Washington",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ age : Factor w/ 61 levels "16","17","18",..: 1 12 11 10 14 14 7 20 29 16 ...
## $ gender : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
## $ raceethnicity : Factor w/ 6 levels "Asian/Pacific Islander",..: 2 6 6 3 6 6 3 3 6 6 ...
## $ month : Factor w/ 6 levels "April","February",..: 2 1 5 5 5 5 5 5 3 2 ...
## $ day : int 23 2 14 11 19 7 27 26 28 7 ...
## $ year : int 2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
## $ streetaddress : Factor w/ 459 levels "1 Moreland Dr",..: 341 172 137 177 208 106 219 75 421 114 ...
## $ city : Factor w/ 364 levels "Albany","Albuquerque",..: 203 246 165 309 213 245 17 360 324 363 ...
## $ state : Factor w/ 47 levels "AK","AL","AR",..: 2 19 45 5 35 4 5 5 41 23 ...
## $ latitude : num 32.5 31.3 42.6 33.9 41.1 ...
## $ longitude : num -86.4 -92.4 -87.8 -118.2 -81.4 ...
## $ state_fp : int 1 22 55 6 39 4 6 6 48 26 ...
## $ county_fp : int 51 79 59 37 153 13 29 37 41 81 ...
## $ tract_ce : int 30902 11700 1200 535607 530800 111602 700 294200 603 14200 ...
## $ geo_id : num 1.05e+09 2.21e+10 5.51e+10 6.04e+09 3.92e+10 ...
## $ county_id : int 1051 22079 55059 6037 39153 4013 6029 6037 48041 26081 ...
## $ namelsad : Factor w/ 389 levels "Census Tract 1",..: 172 48 49 267 264 39 304 161 286 62 ...
## $ lawenforcementagency: Factor w/ 377 levels "Albuquerque Police Department",..: 203 270 155 309 156 255 10 181 372 158 ...
## $ cause : Factor w/ 5 levels "Death in custody",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ armed : Factor w/ 8 levels "Disputed","Firearm",..: 4 4 4 2 4 4 2 5 2 6 ...
## $ pop : int 3779 2769 4079 4343 6809 4682 5027 5238 4832 3795 ...
## $ share_white : Factor w/ 363 levels "-","0","0.1",..: 193 170 255 12 338 235 159 284 29 208 ...
## $ share_black : Factor w/ 246 levels "-","0","0.1",..: 121 134 199 8 16 199 5 4 57 199 ...
## $ share_hispanic : Factor w/ 293 levels "-","0","0.1",..: 196 7 59 293 18 262 179 274 234 113 ...
## $ p_income : Factor w/ 452 levels "-","10219","10987",..: 331 37 282 83 388 55 289 273 78 208 ...
## $ h_income : int 51367 27972 45365 48295 68785 20833 58068 66543 30391 44553 ...
## $ county_income : int 54766 40930 54930 55909 49669 53596 48552 55909 38310 51667 ...
## $ comp_income : num 0.938 0.683 0.826 0.864 1.385 ...
## $ county_bucket : int 3 2 2 3 5 1 4 4 2 3 ...
## $ nat_bucket : int 3 1 3 3 4 1 4 4 1 2 ...
## $ pov : Factor w/ 281 levels "-","1.1","1.3",..: 40 141 44 21 6 244 67 25 189 76 ...
## $ urate : num 0.0977 0.0657 0.1663 0.1248 0.0635 ...
## $ college : num 0.1685 0.1114 0.1473 0.0501 0.404 ...
There are 467 observations and 34 variables.
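In the str() output, share_white, share_black, share_hispanic, p_income, and pov are read as factors whose first level is "-", which suggests that "-" is acting as a missing-value placeholder in the file. As a hedged sketch, the na.strings argument of read.csv() can read such columns as numbers. The two-row CSV text here is made up for illustration, and treating "-" as missing is an assumption about the file.

```r
# Made-up two-row CSV; the second row uses "-" where a value is unavailable.
csv_text <- "pop,share_white\n3779,60.5\n2769,-\n"

# na.strings tells read.csv() which strings to convert to NA,
# so the column is parsed as numeric instead of as a factor.
demo <- read.csv(text = csv_text, na.strings = "-")

str(demo)
n_missing <- sum(is.na(demo$share_white))
print(n_missing)
```

This only matters if those columns are used later; the grouping by state and race does not depend on them.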
We are only interested in states and race, so we will group by state and race to see how many people were killed in each state by race and eliminate all of the other variables.
state_race_killings <- GroupA_PoliceKillings %>%
  group_by(state, raceethnicity) %>%
  summarize(n = n())
str(state_race_killings)
## Classes 'grouped_df', 'tbl_df', 'tbl' and 'data.frame': 108 obs. of 3 variables:
## $ state : Factor w/ 47 levels "AK","AL","AR",..: 1 1 2 2 3 3 3 4 4 4 ...
## $ raceethnicity: Factor w/ 6 levels "Asian/Pacific Islander",..: 3 4 2 6 2 5 6 2 3 4 ...
## $ n : int 1 1 4 4 1 1 2 1 6 2 ...
## - attr(*, "vars")=List of 1
## ..$ : symbol state
## - attr(*, "drop")= logi TRUE
Now, there are 108 observations and 3 variables. The variables are:
* state is a factor variable with 47 levels (this will need to be edited to include all of the states).
* raceethnicity is a factor with 6 levels, one for each race/ethnicity category.
* n is an integer that contains the number of people killed in each state and race/ethnicity group.
These variables all seem to make sense.
favstats(~state, data=state_race_killings)
## Warning in FUN(eval(formula[[2]], data, .envir), ...): Auto-converting
## factor to numeric.
## min Q1 median Q3 max mean sd n missing
## 1 10.75 24 36.25 47 23.58333 14.30452 108 0
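Because state is a factor, favstats() summarizes its integer level codes rather than a real measurement; the maximum code of 47 shows that 47 states are included. We will need to add the 3 missing states.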
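One way to find and fill in the missing states is to compare against a full list of abbreviations. The sketch below uses base R's built-in state.abb vector plus "DC". The counts data frame is a hypothetical stand-in for the real per-state counts, built so that the three states named in this report are absent.

```r
# All 50 state abbreviations ship with base R as state.abb; DC is added by hand.
all_states <- c(state.abb, "DC")

# Hypothetical stand-in: every state except the three named as missing,
# each with a made-up count of 1.
present <- setdiff(all_states, c("ND", "RI", "SD"))
counts <- data.frame(state = present, n = 1L, stringsAsFactors = FALSE)

# States in the full list that never appear in the counts
missing_states <- setdiff(all_states, counts$state)
print(missing_states)

# Left-join onto the full list, then turn the resulting NAs into zero counts.
padded <- merge(data.frame(state = all_states, stringsAsFactors = FALSE),
                counts, by = "state", all.x = TRUE)
padded$n[is.na(padded$n)] <- 0L
print(nrow(padded))
```

Running the same setdiff() on the real state column would confirm exactly which abbreviations are absent.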
tally(~raceethnicity, data=state_race_killings)
## raceethnicity
## Asian/Pacific Islander Black Hispanic/Latino
## 6 30 16
## Native American Unknown White
## 3 8 45
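These counts are the number of state-by-race rows in which each race/ethnicity group appears (they sum to 108, the number of rows), not the number of people killed. The number of people killed in each group is the sum of n within that group.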
favstats(~n, data=state_race_killings)
## min Q1 median Q3 max mean sd n missing
## 1 1 3 5 27 4.324074 4.56895 108 0
Minimum is 1, maximum is 27, and there is no missing data. Since n counts the people killed within one state and race/ethnicity group, this makes sense.
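The difference between counting rows and counting people in a grouped table can be seen on a small example. The sketch below uses base R (aggregate(), table(), tapply()) rather than the dplyr pipeline used above, and the killings data frame is invented for illustration.

```r
# Invented person-level data: one row per person killed.
killings <- data.frame(
  state = c("AL", "AL", "AL", "AK", "AK", "CA", "CA", "CA", "CA"),
  raceethnicity = c("Black", "Black", "White", "White", "Native American",
                    "White", "White", "Hispanic/Latino", "Black"),
  stringsAsFactors = FALSE
)
killings$n <- 1L

# One row per state and race/ethnicity group, with n = people killed
by_state_race <- aggregate(n ~ state + raceethnicity, data = killings, FUN = sum)

# Counting ROWS per group: how many states each group appears in
rows_per_race <- table(by_state_race$raceethnicity)

# Summing n per group: how many PEOPLE were killed in each group
killed_per_race <- tapply(by_state_race$n, by_state_race$raceethnicity, sum)

print(rows_per_race)
print(killed_per_race)
```

Here "White" appears in 3 state rows but accounts for 4 people, which is the distinction that matters when reading a tally of the grouped table.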
str(GroupA_PovertyRate)
## 'data.frame': 51 obs. of 11 variables:
## $ State : Factor w/ 51 levels "AK","AL","AR",..: 2 1 4 3 5 6 7 9 8 10 ...
## $ X.BPL.2014 : num 19.3 11.2 18.2 18.9 16.4 12 10.8 12.5 17.7 16.5 ...
## $ X.BPL.2015 : num 18.5 10.3 17.4 19.1 15.3 11.5 10.5 12.4 17.3 15.7 ...
## $ HIncome.2014 : int 42895 71671 50094 41302 61990 61351 70112 59746 71659 47496 ...
## $ HIncome.2015 : int 44765 73355 51492 41995 64500 63909 71346 61255 75628 49426 ...
## $ X.Unemp2014 : num 8.6 7.6 7.9 6.8 8.5 5.5 7.9 6.7 8.9 8 ...
## $ X.Unemp2015 : num 7.2 7.9 6.9 5.8 7.3 5.2 6.9 5.8 7.3 7 ...
## $ X.Below10000.2014: num 10.1 3.9 7.8 8.7 5.9 6 6 6.7 11.4 7.9 ...
## $ X.Below10000.2015: num 9.5 3.9 7.6 9.1 5.5 5.3 5.7 5.4 8.6 7.4 ...
## $ X.FoodStamp2014 : num 15.8 10.3 13.1 14.4 9.5 8.9 12.8 12.9 14.1 14.8 ...
## $ X.FoodStamp2015 : num 15.5 10.8 13 13.7 9.7 8.4 12.6 13 15.3 14.9 ...
There are 11 variables in this dataset, and 51 observations. They are:
* State contains the abbreviations of each state, as well as DC (factor variable).
* X.BPL.2014 contains the percentage of people below the poverty line in a state in 2014 (numeric variable).
* X.BPL.2015 contains the percentage of people below the poverty line in a state in 2015 (numeric variable).
* HIncome.2014 is the estimated median household income in 2014 per state, measured in dollars (integer variable).
* HIncome.2015 is the estimated median household income in 2015 per state, measured in dollars (integer variable).
* X.Unemp2014 is the estimated unemployment per state, measured as a percent of the population in that state in 2014 (numeric variable).
* X.Unemp2015 is the estimated unemployment per state, measured as a percent of the population in that state in 2015 (numeric variable).
* X.Below10000.2014 is the percentage of households in a state with an income below $10,000 in 2014 (numeric variable).
* X.Below10000.2015 is the percentage of households in a state with an income below $10,000 in 2015 (numeric variable).
* X.FoodStamp2014 is the percentage of households in a state that received Food Stamp/SNAP benefits in 2014 (numeric variable).
* X.FoodStamp2015 is the percentage of households in a state that received Food Stamp/SNAP benefits in 2015 (numeric variable).
tally(~State, data=GroupA_PovertyRate)
## State
## AK AL AR AZ CA CO CT DC DE FL GA HI IA ID IL IN KS KY LA MA MD ME MI MN MO
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## MS MT NC ND NE NH NJ NM NV NY OH OK OR PA RI SC SD TN TX UT VA VT WA WI WV
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## WY
## 1
There are 50 states plus DC, so the tally of one of each makes sense.
favstats(~X.BPL.2014, data=GroupA_PovertyRate)
## min Q1 median Q3 max mean sd n missing
## 9.2 12.1 14.8 17.2 21.5 14.86667 3.085882 51 0
Minimum Percent Below Poverty Line in 2014 is 9.2%, max is 21.5%. There is no data missing, this makes sense.
favstats(~X.BPL.2015, data=GroupA_PovertyRate)
## min Q1 median Q3 max mean sd n missing
## 8.2 11.5 14.5 16.25 22 14.23333 3.100108 51 0
Minimum Percent Below Poverty Line in 2015 is 8.2%, max is 22%. There is no data missing, this makes sense.
favstats(~HIncome.2014, data=GroupA_PovertyRate)
## min Q1 median Q3 max mean sd n missing
## 39702 47732.5 52707 61163.5 74070 54499.65 9234.893 51 0
Minimum Median Household Income in 2014 is $39,702, max is $74,070. There is no data missing, so this makes sense.
favstats(~HIncome.2015, data=GroupA_PovertyRate)
## min Q1 median Q3 max mean sd n missing
## 40593 49467.5 54736 63200 75847 56406.76 9517.995 51 0
Minimum Median Household Income in 2015 is $40,593, max is $75,847. There is no data missing, so this makes sense.
favstats(~X.Unemp2014, data=GroupA_PovertyRate)
## min Q1 median Q3 max mean sd n missing
## 3 5.5 7 7.9 9.8 6.741176 1.536643 51 0
Minimum Unemployment Rate in 2014 is 3%, max is 9.8%. There is no data missing, this makes sense.
favstats(~X.Unemp2015, data=GroupA_PovertyRate)
## min Q1 median Q3 max mean sd n missing
## 2.6 5.05 6 6.95 8.9 5.919608 1.340451 51 0
Minimum Unemployment Rate in 2015 is 2.6%, max is 8.9%. There is no data missing, this makes sense.
favstats(~X.Below10000.2014, data=GroupA_PovertyRate)
## min Q1 median Q3 max mean sd n missing
## 3.8 6 7.1 8.2 11.4 7.268627 1.763121 51 0
Minimum Percentage of Households with income below $10,000 in 2014 is 3.8%, max is 11.4%. There is no data missing, this makes sense.
favstats(~X.Below10000.2015, data=GroupA_PovertyRate)
## min Q1 median Q3 max mean sd n missing
## 3.9 5.6 6.7 7.75 11.5 6.831373 1.683626 51 0
Minimum Percentage of Households with income below $10,000 in 2015 is 3.9%, max is 11.5%. There is no data missing, this makes sense.
favstats(~X.FoodStamp2014, data=GroupA_PovertyRate)
## min Q1 median Q3 max mean sd n missing
## 6.7 11 13.1 15.2 18.9 13.03725 2.967555 51 0
Minimum Percentage of Households on Food Stamp/SNAP benefits in 2014 is 6.7%, max is 18.9%. There is no data missing, this makes sense.
favstats(~X.FoodStamp2015, data=GroupA_PovertyRate)
## min Q1 median Q3 max mean sd n missing
## 4.7 10.7 13 14.9 18.6 12.56078 3.102327 51 0
Minimum Percentage of Households on Food Stamp/SNAP benefits in 2015 is 4.7%, max is 18.6%. There is no data missing, this makes sense.
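The same minimum, maximum, and missing-value check is repeated for every column, so it could also be run over all numeric columns in one pass. The sketch below uses base R's sapply() instead of favstats(), and the pov data frame is a hand-typed stand-in containing only three columns and the first five rows from the str() output.

```r
# First five rows (AL, AK, AZ, AR, CA) of three columns, copied by hand
# from the str() output; a stand-in for GroupA_PovertyRate.
pov <- data.frame(
  State        = c("AL", "AK", "AZ", "AR", "CA"),
  X.BPL.2014   = c(19.3, 11.2, 18.2, 18.9, 16.4),
  X.BPL.2015   = c(18.5, 10.3, 17.4, 19.1, 15.3),
  HIncome.2014 = c(42895L, 71671L, 50094L, 41302L, 61990L),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns, then compute the three checks per column.
numeric_cols <- pov[sapply(pov, is.numeric)]
checks <- sapply(numeric_cols, function(x) {
  c(min = min(x, na.rm = TRUE),
    max = max(x, na.rm = TRUE),
    missing = sum(is.na(x)))
})
print(checks)

# Percentage columns should stay within 0 to 100.
pct_cols <- c("X.BPL.2014", "X.BPL.2015")
pct_in_range <- all(checks["min", pct_cols] >= 0 & checks["max", pct_cols] <= 100)
print(pct_in_range)
```

The result is a small matrix with one column per variable, which makes an out-of-range value or a nonzero missing count easy to spot.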
str(GroupA_StatePopulation)
## 'data.frame': 51 obs. of 3 variables:
## $ State : Factor w/ 51 levels "AK","AL","AR",..: 2 1 4 3 5 6 7 9 8 10 ...
## $ Pop.2014: int 4846411 737046 6728783 2966835 38792291 5355588 3594762 935968 659836 19905569 ...
## $ Pop.2015: int 4858979 738432 6828065 2978204 39144818 5456574 3590886 945934 672228 20271272 ...
There are 3 variables in the data, and 51 observations. The variables are:
* State is a factor with 51 levels.
* Pop.2014 is an integer that contains the size of the population in 2014 per state.
* Pop.2015 is an integer that contains the size of the population in 2015 per state.
The variable types all make sense. Their names (State, Pop.2014, and Pop.2015) are self-explanatory.
favstats(~State, data=GroupA_StatePopulation)
## min Q1 median Q3 max mean sd n missing
## 1 13.5 26 38.5 51 26 14.86607 51 0
State is a factor, so favstats() summarizes its integer level codes. They run from 1 to 51 with no missing data, which matches the 51 levels. This makes sense.
favstats(~Pop.2014, data=GroupA_StatePopulation)
## min Q1 median Q3 max mean sd n missing
## 584304 1741778 4412617 6909145 38792291 6253086 7125130 51 0
Minimum is 584304 and maximum is 38792291, no missing data. This makes sense.
favstats(~Pop.2015, data=GroupA_StatePopulation)
## min Q1 median Q3 max mean sd n missing
## 586107 1749529 4425092 6999208 39144818 6302330 7201100 51 0
Minimum is 586107 and maximum is 39144818, no missing data. This makes sense.
tally(~State, data=GroupA_StatePopulation)
The output lists each state abbreviation with a count of 1, so every state appears exactly once.
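For a factor such as State, numeric summaries describe the level codes rather than the data, so the checks that matter are the number of levels, missing values, and duplicates. A minimal base R sketch, using the first five state abbreviations from the str() output as a stand-in for the full column:

```r
# Stand-in for the State column: first five values from the str() output.
state <- factor(c("AL", "AK", "AZ", "AR", "CA"))

# The integer codes are positions in the sorted level list (AK, AL, AR, AZ, CA),
# which is what a numeric summary of a factor actually describes.
codes <- as.integer(state)   # 2 1 4 3 5

n_levels    <- nlevels(state)          # number of distinct states
any_missing <- anyNA(state)            # TRUE if any value is NA
each_once   <- all(table(state) == 1)  # TRUE if no state is duplicated

print(codes)
print(c(n_levels = n_levels, any_missing = any_missing, each_once = each_once))
```

The codes 2 1 4 3 5 match the ones printed for State in the str() output.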
str(GroupA_BigDataSet)
## 'data.frame': 52 obs. of 132 variables:
## $ State : Factor w/ 52 levels "AK","AL","AR",..: 2 1 4 3 5 6 7 9 8 10 ...
## $ TotalPop : int 4817678 728300 6561516 2947036 38066920 5197580 3592053 917060 633736 19361792 ...
## $ AgeUnder5yrs : int 299571 54498 440616 193697 2521299 337435 194338 55949 38546 1076836 ...
## $ Age5.9yrs : int 304412 51185 453952 200107 2531195 352695 217491 57547 29421 1100919 ...
## $ Age10.14yrs : int 321104 51427 455871 197225 2552173 344661 234666 56338 24069 1135272 ...
## $ Age15.17yrs : int 192831 30980 270053 118528 1607621 199906 148590 34682 15953 707950 ...
## $ Age18.24yrs : int 481858 79891 656248 287516 3988766 509480 341391 91658 83026 1779219 ...
## $ Age25.34yrs : int 618482 111054 874746 381975 5513196 764384 433145 117308 140234 2408242 ...
## $ Age35.44yrs : int 610792 92321 829680 367155 5175688 710650 459130 112435 87033 2419436 ...
## $ Age45.54yrs : int 675347 103682 840781 395446 5248476 724166 563772 131856 76294 2746426 ...
## $ Age55.64yrs : int 614020 91021 765082 363212 4310599 639858 468451 118205 67437 2468932 ...
## $ Age65.74yrs : int 401417 40949 559095 251769 2553063 359148 280541 81415 39650 1896734 ...
## $ Age75. : int 217634 16077 301840 137722 1417512 179860 162971 42367 21744 1139305 ...
## $ Age85. : int 80210 5215 113552 52684 647332 75337 87567 17300 10329 482521 ...
## $ White : int 3327891 484195 5174082 2306073 23650913 4364911 2789105 639354 254955 14747196 ...
## $ Black : int 1269808 25148 274380 458136 2262323 209062 365871 198028 314138 3114841 ...
## $ AmericanIndian.AlaskaNative : int 25181 102743 290780 18337 287360 49917 8036 3134 2072 59121 ...
## $ Asian : int 58322 40976 191071 38409 5130536 146561 145842 31745 22785 490833 ...
## $ NativeHawaiianPacificIslander : int 1430 8335 12638 6278 147286 6641 1105 494 195 12128 ...
## $ OtherRace : int 58618 8382 418033 60565 4890329 240282 183297 20369 23999 484274 ...
## $ X2.Races : int 76428 58521 200532 59238 1698173 180206 98797 23936 15592 453399 ...
## $ AverageHouseholdSize : num 2.55 2.79 2.69 2.53 2.95 2.54 2.56 2.63 2.22 2.62 ...
## $ Pop25. : int 3217902 460319 4284776 1949963 24865866 3453403 2455577 620886 442721 13561596 ...
## $ LessthanHS25. : int 524368 37700 604392 306199 4602986 332246 257011 74267 49099 1837056 ...
## $ HSGrad25. : int 999761 126611 1050079 682451 5153257 759335 677887 195806 82531 4024052 ...
## $ SomeCollege25. : int 951960 168570 1469229 558722 7400714 1068138 612128 168049 74717 4071061 ...
## $ Bachelor25. : int 465268 82261 733845 263299 4870524 819675 506662 108647 103179 2324792 ...
## $ Master25. : int 196935 32681 306410 98442 1889640 344986 290414 52733 79033 889974 ...
## $ ProfessionalSchool25. : int 47795 7370 71569 24187 575093 73307 71488 11146 37053 267663 ...
## $ DR25. : int 31815 5126 49252 16663 373652 55716 39987 10238 17109 146998 ...
## $ LessthanHS25.H : int 524368 37700 604392 306199 4602986 332246 257011 74267 49099 1837056 ...
## $ HSGrad25.H : int 2693534 422619 3680384 1643764 20262880 3121157 2198566 546619 393622 11724540 ...
## $ SomeCollege25.H : int 1693773 296008 2630305 961313 15109623 2361822 1520679 350813 311091 7700488 ...
## $ Bachelor25.H : int 741813 127438 1161076 402591 7708909 1293684 908551 182764 236374 3629427 ...
## $ Master25.H : int 276545 45177 427231 139292 2838385 474009 401889 74117 133195 1304635 ...
## $ ProfessionalSchool25.H : int 79610 12496 120821 40850 948745 129023 111475 21384 54162 414661 ...
## $ DR25.H : int 31815 5126 49252 16663 373652 55716 39987 10238 17109 146998 ...
## $ Pop16.19yrs : int 263787 40395 365547 159995 2181759 273868 205866 49118 34000 961151 ...
## $ Hsdropout : int 15454 2100 21813 8602 82842 13636 6390 2338 1727 49955 ...
## $ Hsgrad.enrolled : int 248333 38295 343734 151393 2098917 260232 199476 46780 32273 911196 ...
## $ Pop16.19yrsMale : int 133674 21350 187908 81748 1123855 141770 104983 24585 15959 494988 ...
## $ HsdropoutMale : int 9234 1207 12706 4746 52301 8519 3930 1275 1059 30441 ...
## $ Hsgrad.enrolledMale : int 124440 20143 175202 77002 1071554 133251 101053 23310 14900 464547 ...
## $ Pop16.19yrsFemale : int 130113 19045 177639 78247 1057904 132098 100883 24533 18041 466163 ...
## $ HsdropoutFemale : int 6220 893 9107 3856 30541 5117 2460 1063 668 19514 ...
## $ Hsgrad.enrolledFemale : int 123893 18152 168532 74391 1027363 126981 98423 23470 17373 446649 ...
## $ White16. : int 1580060 269224 2442779 1087524 11915855 2378569 1540529 331698 177119 7157392 ...
## $ White16.Emp : int 1451372 252348 2225697 1010475 10682330 2206883 1414643 307222 170417 6465380 ...
## $ White16.Unemp : int 128688 16876 217082 77049 1233525 171686 125886 24476 6702 692012 ...
## $ Black16. : int 564892 13319 128755 200028 1052918 103252 194555 97383 148740 1478933 ...
## $ Black16.Emp : int 474577 11698 109956 168800 865204 88139 161034 85743 119680 1224549 ...
## $ Black16.Unemp : int 90315 1621 18799 31228 187714 15113 33521 11640 29060 254384 ...
## $ AmericanIndian.AlaskaNative16. : int 11904 42899 113028 8467 134309 24386 3931 1629 1250 28006 ...
## $ AmericanIndian.AlaskaNative16.Emp : int 10045 33846 88328 7610 111608 20629 3374 1251 1033 23573 ...
## $ AmericanIndian.AlaskaNative16.Unemp : int 1859 9053 24700 857 22701 3757 557 378 217 4433 ...
## $ Asian16. : int 30138 23451 99914 19710 2697742 78649 79422 16687 15479 260871 ...
## $ Asian16.Emp : int 28435 22476 93624 18549 2476089 73633 73428 15661 15105 241932 ...
## $ Asian16.Unemp : int 1703 975 6290 1161 221653 5016 5994 1026 374 18939 ...
## $ NativeHwn.OPacif16CivilLabor : int 758 3853 5302 2725 76846 3808 708 298 142 6518 ...
## $ EmpNativeHwn.OPacif16CivilLabor : int 742 3348 4542 2268 64887 3329 630 291 142 5586 ...
## $ UnempNativeHwn.OPacif16CivilLabor : int 16 505 760 457 11959 479 78 7 0 932 ...
## $ OtherRace16CivilLabor : int 27618 4439 199852 28822 2434117 119817 97130 10501 13757 257060 ...
## $ EmpOtherRace16CivilLabor : int 25388 4118 173311 26748 2123641 105696 82303 9309 12251 226290 ...
## $ UnempOtherRace16CivilLabor : int 2230 321 26541 2074 310476 14121 14827 1192 1506 30770 ...
## $ X2plusRaces16CivilLabor : int 23799 22895 69419 19531 663219 71463 37246 7726 7847 171148 ...
## $ Emp2plusRaces16CivilLabor : int 19894 20149 59524 17267 566683 62394 31522 6626 7210 147713 ...
## $ Unemp2plusRaces16CivilLabor : int 3905 2746 9895 2264 96536 9069 5724 1100 637 23435 ...
## $ HispOrLat16CivilLabor : int 86731 20712 861520 86739 6867225 507707 255351 37128 37572 2288418 ...
## $ EmpHispOrLat16CivilLabor : int 78874 19280 760160 80566 5992745 453419 219134 33545 34116 2036076 ...
## $ UnempHispOrLat16CivilLabor : int 7857 1432 101360 6173 874480 54288 36217 3583 3456 252342 ...
## $ WhiteNotHispLat16CivilLabor : int 1525786 256826 1823074 1032356 7840832 2026214 1400538 308392 157523 5234808 ...
## $ EmpWhiteNotHispLat16CivilLabor : int 1401573 240738 1673968 958951 7113527 1889590 1292676 285801 152239 4748064 ...
## $ UnempWhiteNotHispLat16CivilLabor : int 124213 16088 149106 73405 727305 136624 107862 22591 5284 486744 ...
## $ HouseholdIncomes : int 1842174 251678 2387246 1132488 12617280 1998314 1356206 339046 267415 7217508 ...
## $ Lessthan10k : int 182834 9365 183011 103380 732367 122079 77857 19599 27619 566058 ...
## $ X10kto14.9k : int 128313 8930 124156 85278 645041 84229 52951 13582 11413 409607 ...
## $ X15kto19.9k : int 121400 8966 131701 80416 600113 89229 58103 15125 9962 436488 ...
## $ X20kto24.9k : int 116666 10043 135418 76304 602334 95537 55854 15062 10365 440156 ...
## $ X25kto29.9k : int 109845 8876 132640 72385 568726 93153 52701 16346 8557 425629 ...
## $ X30kto34.9k : int 100941 9764 134217 70316 569982 96814 52975 16242 9414 419178 ...
## $ X35kto39.9k : int 94146 9557 121824 62205 532421 88918 49170 14596 8439 386724 ...
## $ X40kto44.9k : int 93611 11225 125030 59707 525934 93185 52402 15723 10117 375445 ...
## $ X45kto49.9k : int 78376 8768 107092 51184 472926 83295 46636 14403 7701 325496 ...
## $ X50kto59.9k : int 145196 18474 200101 93361 921192 160263 94563 28215 16358 599079 ...
## $ X60kto74.9k : int 171976 27600 241557 104569 1190009 204968 126014 35933 21908 708470 ...
## $ X75kto99.9k : int 199359 36982 282294 113844 1544981 263329 176629 46083 29330 800834 ...
## $ X100kto124.9k : int 122245 28223 178925 67341 1134125 180627 132302 31513 23828 497669 ...
## $ Inc125000_149999 : int 67156 19129 106094 35210 747275 112904 92388 19977 16427 275777 ...
## $ Inc150000_199999 : int 61354 20893 96069 31696 870522 119480 107942 19899 21655 268710 ...
## $ Inc200000. : int 48756 14883 87117 25292 959332 110304 127719 16748 34322 282188 ...
## $ Median_House_Inc : int 43511 71829 49928 41264 61489 59448 69899 60231 69235 47212 ...
## $ PerCapitaIncome : int 23936 33129 25537 22595 29906 31674 38480 30191 46502 26499 ...
## $ PovertyTotal : int 4699510 711235 6411354 2862662 37323127 5079529 3481115 891493 599620 18946215 ...
## $ PovertyUnder.5 : int 385730 32151 548788 227865 2642882 297645 168574 51190 59892 1399140 ...
## $ Poverty.5_.74 : int 240297 16958 299890 144773 1515176 167393 89957 26488 26558 803874 ...
## $ Poverty.75_.99 : int 263683 22757 320631 176665 1957186 198823 106209 29423 22928 956245 ...
## $ Poverty1_1.49 : int 527004 57013 701163 350029 3894812 431605 223789 72212 45510 2069085 ...
## $ Poverty1.5_1.99 : int 471251 57412 639692 328269 3566199 436886 234516 74731 36500 1983342 ...
## $ Poverty2. : int 2811545 524944 3901190 1635061 23746872 3547177 2658070 637449 408232 11734529 ...
## [list output truncated]
There are 52 observations of 132 variables.
* Except for State and AverageHouseholdSize, all of the variables are integers.
* State is a factor variable.
* AverageHouseholdSize is a number. This makes sense.
* We will need to delete PR from the dataset because none of the other datasets we will be using include it.
There are no missing data.
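Deleting PR takes two steps when State is a factor: filtering out the row and then dropping the now-unused level, since a filtered factor otherwise still reports the old level count. A minimal base R sketch, where big is a hand-typed stand-in for GroupA_BigDataSet and the PR population is left as NA because it is not shown in the output above:

```r
# Stand-in for GroupA_BigDataSet: three TotalPop values from the str() output
# plus a PR row whose value is not shown in the source, so it is left as NA.
big <- data.frame(
  State    = factor(c("AL", "AK", "CA", "PR")),
  TotalPop = c(4817678L, 728300L, 38066920L, NA_integer_)
)

# Step 1: remove the PR row.
big_no_pr <- big[big$State != "PR", ]
levels_before_drop <- nlevels(big_no_pr$State)  # still counts PR

# Step 2: drop the unused factor level so the level count matches the data.
big_no_pr <- droplevels(big_no_pr)
levels_after_drop <- nlevels(big_no_pr$State)

print(c(before = levels_before_drop, after = levels_after_drop))
```

Applied to the real data, the same two steps should leave 51 rows and 51 levels, matching the other datasets.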
favstats(~TotalPop, data=GroupA_BigDataSet)
## min Q1 median Q3 max mean sd n missing
## 575251 1790277 4141808 6717749 38066920 6110501 6921369 52 0
min=575251, max=38066920, n=52, none missing. This makes sense.
favstats(~Hsdropout, data=GroupA_BigDataSet)
## min Q1 median Q3 max mean sd n missing
## 730 3968.5 10945 16646.5 82842 15529.1 17464.94 52 0
min=730, max=82842, n=52, none missing. This makes sense.
favstats(~HouseholdIncomes, data=GroupA_BigDataSet)
## min Q1 median Q3 max mean sd n missing
## 225514 694825 1612612 2565213 12617280 2258703 2406258 52 0
min=225514, max=12617280, n=52, none missing. This makes sense.
favstats(~PovertyTotal,data=GroupA_BigDataSet)
## min Q1 median Q3 max mean sd n missing
## 561187 1741653 4036054 6509816 37323127 5958289 6770423 52 0
min=561187, max=37323127, n=52, none missing. This makes sense.
favstats(~AverageHouseholdSize, data=GroupA_BigDataSet)
## min Q1 median Q3 max mean sd n missing
## 2.22 2.4775 2.545 2.645 3.14 2.576923 0.1713446 52 0
min=2.22, max=3.14, n=52, none missing. This makes sense.
GroupA_BigDataSet.
Add the 3 states missing from GroupA_PoliceKillings, which are North Dakota, Rhode Island, and South Dakota.
Delete PR from GroupA_BigDataSet.