Loading data

#install.packages("readr")
library(readr)
library(mosaic)
GroupA_EducationAttainment <- read.csv("~/Desktop/FALL2016/SDS291/Project/GroupA_EducationAttainment.csv")

GroupA_PoliceKillings <- read.csv("~/Desktop/FALL2016/SDS291/Project/GroupA_PoliceKillings.csv")

GroupA_PovertyRate <- read.csv("~/Desktop/FALL2016/SDS291/Project/GroupA_PovertyRate.csv")

GroupA_StatePopulation <- read.csv("~/Desktop/FALL2016/SDS291/Project/GroupA_StatePopulation.csv")

GroupA_BigDataSet <- read.csv("~/Desktop/FALL2016/SDS291/Project/GroupA_BigDataSet.csv")

Structure and names (GroupA_EducationAttainment)

str(GroupA_EducationAttainment)
## 'data.frame':    51 obs. of  7 variables:
##  $ State               : Factor w/ 51 levels "AK","AL","AR",..: 2 1 4 3 5 6 7 9 8 10 ...
##  $ LessthanHS2014      : num  15.4 8.1 13.8 14.4 17.9 9.5 9.9 10.9 9.9 12.8 ...
##  $ HSorHigher2014      : num  84.6 91.9 86.2 85.6 82.1 90.5 90.1 89.1 90.1 87.2 ...
##  $ HSOnly2014          : num  31.7 25.8 24.4 35.1 20.9 21.9 27.5 31.6 17.8 29.5 ...
##  $ BachelororHigher2014: num  23 29.1 27.4 21.4 31.7 38 37.9 30.3 54.8 27.4 ...
##  $ BachelorOnly2014    : num  14.5 18.7 17.3 14.1 19.9 24.1 21.3 17.9 24.5 17.5 ...
##  $ GraduateDegree2014  : num  8.5 10.4 10.1 7.3 11.8 13.9 16.7 12.4 30.2 9.9 ...

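GroupA_EducationAttainment has 51 observations of 7 variables: State (a factor) and six numeric percentages for 2014 (LessthanHS2014, HSorHigher2014, HSOnly2014, BachelororHigher2014, BachelorOnly2014, and GraduateDegree2014).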


Variable Analysis (GroupA_EducationAttainment)

favstats(~LessthanHS2014, data=GroupA_EducationAttainment)
##  min   Q1 median   Q3  max    mean       sd  n missing
##  7.1 9.05   10.7 14.2 17.9 11.4549 3.029212 51       0

Minimum is 7.1%. Maximum is 17.9%. There is no missing data. This makes sense.

favstats(~HSorHigher2014, data=GroupA_EducationAttainment)
##   min   Q1 median    Q3  max    mean       sd  n missing
##  82.1 85.8   89.3 90.95 92.9 88.5451 3.029212 51       0

Minimum is 82.1%. Maximum is 92.9%. There is no missing data. This makes sense.

favstats(~HSOnly2014, data=GroupA_EducationAttainment)
##   min   Q1 median    Q3  max     mean       sd  n missing
##  17.8 26.1   28.3 31.55 41.3 28.67647 4.247191 51       0

Minimum is 17.8%. Maximum is 41.3%. There is no missing data. This makes sense.

favstats(~BachelororHigher2014, data=GroupA_EducationAttainment)
##   min    Q1 median    Q3  max    mean       sd  n missing
##  19.3 26.05   28.7 32.35 54.8 29.7098 6.124941 51       0

Minimum is 19.3%. Maximum is 54.8%. There is no missing data. This makes sense.

favstats(~BachelorOnly2014, data=GroupA_EducationAttainment)
##   min    Q1 median    Q3  max    mean       sd  n missing
##  11.7 16.55   18.5 20.65 24.5 18.4902 2.894841 51       0

Minimum is 11.7%. Maximum is 24.5%. There is no missing data. This makes sense.

favstats(~GraduateDegree2014, data=GroupA_EducationAttainment)
##  min   Q1 median   Q3  max     mean      sd  n missing
##  6.7 9.25   10.4 12.1 30.2 11.22353 3.78442 51       0

Minimum is 6.7%. Maximum is 30.2%. There is no missing data. This makes sense.
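
One way to make "this makes sense" more concrete is to check that complementary percentages add up. The sketch below uses only the first five values printed by str() above, re-typed by hand into a small data frame called edu_head (a hypothetical name, not part of the project). On the full data, the same comparisons could be run against GroupA_EducationAttainment. The tolerance tol is an assumption that allows for rounding to one decimal place.

```r
# First five rows as printed by str() above (re-typed by hand, not read from the CSV)
edu_head <- data.frame(
  LessthanHS2014       = c(15.4, 8.1, 13.8, 14.4, 17.9),
  HSorHigher2014       = c(84.6, 91.9, 86.2, 85.6, 82.1),
  BachelororHigher2014 = c(23.0, 29.1, 27.4, 21.4, 31.7),
  BachelorOnly2014     = c(14.5, 18.7, 17.3, 14.1, 19.9),
  GraduateDegree2014   = c(8.5, 10.4, 10.1, 7.3, 11.8)
)

tol <- 0.15  # assumed tolerance for values rounded to one decimal place

# "Less than HS" plus "HS or higher" should cover everyone (100%)
hs_ok <- abs(edu_head$LessthanHS2014 + edu_head$HSorHigher2014 - 100) < tol

# "Bachelor only" plus "graduate degree" should equal "bachelor or higher"
ba_ok <- abs(edu_head$BachelorOnly2014 + edu_head$GraduateDegree2014 -
             edu_head$BachelororHigher2014) < tol

all(hs_ok)
all(ba_ok)
```

Both checks pass on these five rows. This is also consistent with LessthanHS2014 and HSorHigher2014 having the same standard deviation in the favstats output.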


Structure and names (GroupA_PoliceKillings)

str(GroupA_PoliceKillings)
## 'data.frame':    467 obs. of  34 variables:
##  $ name                : Factor w/ 465 levels "A'donte Washington",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ age                 : Factor w/ 61 levels "16","17","18",..: 1 12 11 10 14 14 7 20 29 16 ...
##  $ gender              : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
##  $ raceethnicity       : Factor w/ 6 levels "Asian/Pacific Islander",..: 2 6 6 3 6 6 3 3 6 6 ...
##  $ month               : Factor w/ 6 levels "April","February",..: 2 1 5 5 5 5 5 5 3 2 ...
##  $ day                 : int  23 2 14 11 19 7 27 26 28 7 ...
##  $ year                : int  2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
##  $ streetaddress       : Factor w/ 459 levels "1 Moreland Dr",..: 341 172 137 177 208 106 219 75 421 114 ...
##  $ city                : Factor w/ 364 levels "Albany","Albuquerque",..: 203 246 165 309 213 245 17 360 324 363 ...
##  $ state               : Factor w/ 47 levels "AK","AL","AR",..: 2 19 45 5 35 4 5 5 41 23 ...
##  $ latitude            : num  32.5 31.3 42.6 33.9 41.1 ...
##  $ longitude           : num  -86.4 -92.4 -87.8 -118.2 -81.4 ...
##  $ state_fp            : int  1 22 55 6 39 4 6 6 48 26 ...
##  $ county_fp           : int  51 79 59 37 153 13 29 37 41 81 ...
##  $ tract_ce            : int  30902 11700 1200 535607 530800 111602 700 294200 603 14200 ...
##  $ geo_id              : num  1.05e+09 2.21e+10 5.51e+10 6.04e+09 3.92e+10 ...
##  $ county_id           : int  1051 22079 55059 6037 39153 4013 6029 6037 48041 26081 ...
##  $ namelsad            : Factor w/ 389 levels "Census Tract 1",..: 172 48 49 267 264 39 304 161 286 62 ...
##  $ lawenforcementagency: Factor w/ 377 levels "Albuquerque Police Department",..: 203 270 155 309 156 255 10 181 372 158 ...
##  $ cause               : Factor w/ 5 levels "Death in custody",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ armed               : Factor w/ 8 levels "Disputed","Firearm",..: 4 4 4 2 4 4 2 5 2 6 ...
##  $ pop                 : int  3779 2769 4079 4343 6809 4682 5027 5238 4832 3795 ...
##  $ share_white         : Factor w/ 363 levels "-","0","0.1",..: 193 170 255 12 338 235 159 284 29 208 ...
##  $ share_black         : Factor w/ 246 levels "-","0","0.1",..: 121 134 199 8 16 199 5 4 57 199 ...
##  $ share_hispanic      : Factor w/ 293 levels "-","0","0.1",..: 196 7 59 293 18 262 179 274 234 113 ...
##  $ p_income            : Factor w/ 452 levels "-","10219","10987",..: 331 37 282 83 388 55 289 273 78 208 ...
##  $ h_income            : int  51367 27972 45365 48295 68785 20833 58068 66543 30391 44553 ...
##  $ county_income       : int  54766 40930 54930 55909 49669 53596 48552 55909 38310 51667 ...
##  $ comp_income         : num  0.938 0.683 0.826 0.864 1.385 ...
##  $ county_bucket       : int  3 2 2 3 5 1 4 4 2 3 ...
##  $ nat_bucket          : int  3 1 3 3 4 1 4 4 1 2 ...
##  $ pov                 : Factor w/ 281 levels "-","1.1","1.3",..: 40 141 44 21 6 244 67 25 189 76 ...
##  $ urate               : num  0.0977 0.0657 0.1663 0.1248 0.0635 ...
##  $ college             : num  0.1685 0.1114 0.1473 0.0501 0.404 ...

There are 467 observations and 34 variables.

We are only interested in state and race, so we group by both to count how many people were killed in each state by race. This drops all of the other variables.

state_race_killings <- GroupA_PoliceKillings %>%
  group_by(state,raceethnicity) %>%
  summarize(n=n())
str(state_race_killings)
## Classes 'grouped_df', 'tbl_df', 'tbl' and 'data.frame':  108 obs. of  3 variables:
##  $ state        : Factor w/ 47 levels "AK","AL","AR",..: 1 1 2 2 3 3 3 4 4 4 ...
##  $ raceethnicity: Factor w/ 6 levels "Asian/Pacific Islander",..: 3 4 2 6 2 5 6 2 3 4 ...
##  $ n            : int  1 1 4 4 1 1 2 1 6 2 ...
##  - attr(*, "vars")=List of 1
##   ..$ : symbol state
##  - attr(*, "drop")= logi TRUE

Now there are 108 observations of 3 variables: state, raceethnicity, and n (the number of people killed in that state and race group).

These variables all seem to make sense.


Variable Analysis (GroupA_PoliceKillings)

favstats(~state, data=state_race_killings)
## Warning in FUN(eval(formula[[2]], data, .envir), ...): Auto-converting
## factor to numeric.
##  min    Q1 median    Q3 max     mean       sd   n missing
##    1 10.75     24 36.25  47 23.58333 14.30452 108       0

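The maximum factor code is 47, so only 47 of the 51 State values (50 states plus DC) found in the other datasets appear here, and we will need to add the 4 that are missing. The remaining statistics summarize arbitrary factor codes and are not meaningful.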
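
A possible way to find which states are missing, and to make them show up with a count of zero, is sketched below in base R. The vectors reference_states and killings_states are toy stand-ins with made-up values. In the project they would presumably correspond to something like GroupA_PovertyRate$State and GroupA_PoliceKillings$state.

```r
# Toy stand-ins (hypothetical values, not the project data)
reference_states <- c("AK", "AL", "AR", "AZ")          # full list of states
killings_states  <- factor(c("AK", "AK", "AL", "AZ"))  # states that have rows

# States in the reference list with no rows in the killings data
missing_states <- setdiff(reference_states, levels(killings_states))
missing_states

# Re-declare the factor with the full set of levels so that the
# missing states appear with a count of 0 instead of being absent
killings_full <- factor(as.character(killings_states), levels = reference_states)
state_counts <- table(killings_full)
state_counts
```

Listing the missing states by name avoids having to infer them from factor codes.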

tally(~raceethnicity, data=state_race_killings)
## raceethnicity
## Asian/Pacific Islander                  Black        Hispanic/Latino 
##                      6                     30                     16 
##        Native American                Unknown                  White 
##                      3                      8                     45

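Because state_race_killings has one row per state and race pair, these counts are the number of states with at least one killing for each race/ethnicity (they sum to the 108 rows), not the number of people killed. This makes sense.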
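
Since the grouped table has one row per state and race pair, a tally of its rows and a total of its n column answer different questions. The sketch below shows the difference on a toy data frame (toy, with made-up values). xtabs() is base R, and on the real table the same formula would presumably be applied to state_race_killings.

```r
# Toy version of a state-by-race count table (hypothetical values)
toy <- data.frame(
  state         = c("AK", "AK", "AL", "AL"),
  raceethnicity = c("Native American", "White", "Black", "White"),
  n             = c(1, 1, 4, 4)
)

# Rows per race: how many states have at least one killing for that group
rows_per_race <- table(toy$raceethnicity)

# People per race: sum the n column within each group
people_per_race <- xtabs(n ~ raceethnicity, data = toy)

rows_per_race
people_per_race
```

In the toy data, White appears in 2 rows but accounts for 5 people.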

favstats(~n, data=state_race_killings)
##  min Q1 median Q3 max     mean      sd   n missing
##    1  1      3  5  27 4.324074 4.56895 108       0

Minimum is 1, maximum is 27, no missing data. This makes sense.


Structure and names (GroupA_PovertyRate)

str(GroupA_PovertyRate)
## 'data.frame':    51 obs. of  11 variables:
##  $ State            : Factor w/ 51 levels "AK","AL","AR",..: 2 1 4 3 5 6 7 9 8 10 ...
##  $ X.BPL.2014       : num  19.3 11.2 18.2 18.9 16.4 12 10.8 12.5 17.7 16.5 ...
##  $ X.BPL.2015       : num  18.5 10.3 17.4 19.1 15.3 11.5 10.5 12.4 17.3 15.7 ...
##  $ HIncome.2014     : int  42895 71671 50094 41302 61990 61351 70112 59746 71659 47496 ...
##  $ HIncome.2015     : int  44765 73355 51492 41995 64500 63909 71346 61255 75628 49426 ...
##  $ X.Unemp2014      : num  8.6 7.6 7.9 6.8 8.5 5.5 7.9 6.7 8.9 8 ...
##  $ X.Unemp2015      : num  7.2 7.9 6.9 5.8 7.3 5.2 6.9 5.8 7.3 7 ...
##  $ X.Below10000.2014: num  10.1 3.9 7.8 8.7 5.9 6 6 6.7 11.4 7.9 ...
##  $ X.Below10000.2015: num  9.5 3.9 7.6 9.1 5.5 5.3 5.7 5.4 8.6 7.4 ...
##  $ X.FoodStamp2014  : num  15.8 10.3 13.1 14.4 9.5 8.9 12.8 12.9 14.1 14.8 ...
##  $ X.FoodStamp2015  : num  15.5 10.8 13 13.7 9.7 8.4 12.6 13 15.3 14.9 ...

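This dataset has 51 observations of 11 variables: State, plus 2014 and 2015 values for the percent below the poverty line (X.BPL), median household income (HIncome), the unemployment rate (X.Unemp), the percentage of households with income below $10,000 (X.Below10000), and the percentage of households on Food Stamp/SNAP benefits (X.FoodStamp).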


Variable Analysis (GroupA_PovertyRate)

tally(~State, data=GroupA_PovertyRate)
## State
## AK AL AR AZ CA CO CT DC DE FL GA HI IA ID IL IN KS KY LA MA MD ME MI MN MO 
##  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 
## MS MT NC ND NE NH NJ NM NV NY OH OK OR PA RI SC SD TN TX UT VA VT WA WI WV 
##  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 
## WY 
##  1

There are 50 states plus DC, so the tally of one of each makes sense.
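
Reading a long tally by eye is error-prone, so the same check can be expressed as a single TRUE/FALSE. This is a minimal sketch on a toy vector named states (made-up stand-in for a State column). Both functions are base R.

```r
states <- c("AK", "AL", "AR", "DC")  # toy stand-in for a State column

# anyDuplicated() returns 0 when no value is repeated
no_repeats <- anyDuplicated(states) == 0

# Equivalent check through the tally itself: every count must be 1
all_ones <- all(table(states) == 1)

no_repeats
all_ones
```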

favstats(~X.BPL.2014, data=GroupA_PovertyRate)
##  min   Q1 median   Q3  max     mean       sd  n missing
##  9.2 12.1   14.8 17.2 21.5 14.86667 3.085882 51       0

Minimum Percent Below Poverty Line in 2014 is 9.2%, max is 21.5%. There is no data missing. This makes sense.

favstats(~X.BPL.2015, data=GroupA_PovertyRate)
##  min   Q1 median    Q3 max     mean       sd  n missing
##  8.2 11.5   14.5 16.25  22 14.23333 3.100108 51       0

Minimum Percent Below Poverty Line in 2015 is 8.2%, max is 22%. There is no data missing. This makes sense.

favstats(~HIncome.2014, data=GroupA_PovertyRate)
##    min      Q1 median      Q3   max     mean       sd  n missing
##  39702 47732.5  52707 61163.5 74070 54499.65 9234.893 51       0

Minimum Median Household Income in 2014 is $39,702, max is $74,070. There is no data missing. This makes sense.

favstats(~HIncome.2015, data=GroupA_PovertyRate)
##    min      Q1 median    Q3   max     mean       sd  n missing
##  40593 49467.5  54736 63200 75847 56406.76 9517.995 51       0

Minimum Median Household Income in 2015 is $40,593, max is $75,847. There is no data missing. This makes sense.

favstats(~X.Unemp2014, data=GroupA_PovertyRate)
##  min  Q1 median  Q3 max     mean       sd  n missing
##    3 5.5      7 7.9 9.8 6.741176 1.536643 51       0

Minimum Unemployment Rate in 2014 is 3%, max is 9.8%. There is no data missing. This makes sense.

favstats(~X.Unemp2015, data=GroupA_PovertyRate)
##  min   Q1 median   Q3 max     mean       sd  n missing
##  2.6 5.05      6 6.95 8.9 5.919608 1.340451 51       0

Minimum Unemployment Rate in 2015 is 2.6%, max is 8.9%. There is no data missing. This makes sense.

favstats(~X.Below10000.2014, data=GroupA_PovertyRate)
##  min Q1 median  Q3  max     mean       sd  n missing
##  3.8  6    7.1 8.2 11.4 7.268627 1.763121 51       0

Minimum Percentage of Households with income below $10,000 in 2014 is 3.8%, max is 11.4%. There is no data missing. This makes sense.

favstats(~X.Below10000.2015, data=GroupA_PovertyRate)
##  min  Q1 median   Q3  max     mean       sd  n missing
##  3.9 5.6    6.7 7.75 11.5 6.831373 1.683626 51       0

Minimum Percentage of Households with income below $10,000 in 2015 is 3.9%, max is 11.5%. There is no data missing. This makes sense.

favstats(~X.FoodStamp2014, data=GroupA_PovertyRate)
##  min Q1 median   Q3  max     mean       sd  n missing
##  6.7 11   13.1 15.2 18.9 13.03725 2.967555 51       0

Minimum Percentage of Households on Food Stamp/SNAP benefits in 2014 is 6.7%, max is 18.9%. There is no data missing. This makes sense.

favstats(~X.FoodStamp2015, data=GroupA_PovertyRate)
##  min   Q1 median   Q3  max     mean       sd  n missing
##  4.7 10.7     13 14.9 18.6 12.56078 3.102327 51       0

Minimum Percentage of Households on Food Stamp/SNAP benefits in 2015 is 4.7%, max is 18.6%. There is no data missing. This makes sense.
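
The per-variable checks above repeat the same three questions (minimum, maximum, missing values). As a rough sketch, they could be collected into one table with base R. The toy data frame below holds only the first three values of X.BPL.2014 and X.BPL.2015 as printed by str() above, re-typed by hand. The names toy, numeric_cols and summary_table are hypothetical.

```r
# First three rows as printed by str() above (re-typed by hand)
toy <- data.frame(
  State      = c("AL", "AK", "AZ"),
  X.BPL.2014 = c(19.3, 11.2, 18.2),
  X.BPL.2015 = c(18.5, 10.3, 17.4)
)

# Keep only the numeric columns (drops State)
numeric_cols <- toy[sapply(toy, is.numeric)]

# One row per variable: min, max, and number of missing values
summary_table <- t(sapply(numeric_cols, function(x) {
  c(min     = min(x, na.rm = TRUE),
    max     = max(x, na.rm = TRUE),
    missing = sum(is.na(x)))
}))
summary_table
```

This does not replace favstats(), which also reports quartiles, mean, and standard deviation. It only gathers the values quoted in the commentary into one place.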


Structure and names (GroupA_StatePopulation)

str(GroupA_StatePopulation)
## 'data.frame':    51 obs. of  3 variables:
##  $ State   : Factor w/ 51 levels "AK","AL","AR",..: 2 1 4 3 5 6 7 9 8 10 ...
##  $ Pop.2014: int  4846411 737046 6728783 2966835 38792291 5355588 3594762 935968 659836 19905569 ...
##  $ Pop.2015: int  4858979 738432 6828065 2978204 39144818 5456574 3590886 945934 672228 20271272 ...

There are 3 variables in the data, and 51 observations. The variables are State, Pop.2014, and Pop.2015.

The variable types all make sense to me. Their names (State, Pop.2014, and Pop.2015) are self-explanatory.


Variable Analysis (GroupA_StatePopulation)

favstats(~State, data=GroupA_StatePopulation)
##  min   Q1 median   Q3 max mean       sd  n missing
##    1 13.5     26 38.5  51   26 14.86607 51       0

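State is a factor, so these statistics describe its integer codes: a minimum of 1 and a maximum of 51 simply confirm 51 distinct values. There is no missing data. This makes sense.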

favstats(~Pop.2014, data=GroupA_StatePopulation)
##     min      Q1  median      Q3      max    mean      sd  n missing
##  584304 1741778 4412617 6909145 38792291 6253086 7125130 51       0

Minimum is 584304 and maximum is 38792291, no missing data. This makes sense.

favstats(~Pop.2015, data=GroupA_StatePopulation)
##     min      Q1  median      Q3      max    mean      sd  n missing
##  586107 1749529 4425092 6999208 39144818 6302330 7201100 51       0

Minimum is 586107 and maximum is 39144818, no missing data. This makes sense.

tally(~State, data=GroupA_StatePopulation)

This returns a count of 1 for each state name, so every state appears exactly once.


Structure and names (GroupA_BigDataSet)

str(GroupA_BigDataSet)
## 'data.frame':    52 obs. of  132 variables:
##  $ State                                                    : Factor w/ 52 levels "AK","AL","AR",..: 2 1 4 3 5 6 7 9 8 10 ...
##  $ TotalPop                                                 : int  4817678 728300 6561516 2947036 38066920 5197580 3592053 917060 633736 19361792 ...
##  $ AgeUnder5yrs                                             : int  299571 54498 440616 193697 2521299 337435 194338 55949 38546 1076836 ...
##  $ Age5.9yrs                                                : int  304412 51185 453952 200107 2531195 352695 217491 57547 29421 1100919 ...
##  $ Age10.14yrs                                              : int  321104 51427 455871 197225 2552173 344661 234666 56338 24069 1135272 ...
##  $ Age15.17yrs                                              : int  192831 30980 270053 118528 1607621 199906 148590 34682 15953 707950 ...
##  $ Age18.24yrs                                              : int  481858 79891 656248 287516 3988766 509480 341391 91658 83026 1779219 ...
##  $ Age25.34yrs                                              : int  618482 111054 874746 381975 5513196 764384 433145 117308 140234 2408242 ...
##  $ Age35.44yrs                                              : int  610792 92321 829680 367155 5175688 710650 459130 112435 87033 2419436 ...
##  $ Age45.54yrs                                              : int  675347 103682 840781 395446 5248476 724166 563772 131856 76294 2746426 ...
##  $ Age55.64yrs                                              : int  614020 91021 765082 363212 4310599 639858 468451 118205 67437 2468932 ...
##  $ Age65.74yrs                                              : int  401417 40949 559095 251769 2553063 359148 280541 81415 39650 1896734 ...
##  $ Age75.                                                   : int  217634 16077 301840 137722 1417512 179860 162971 42367 21744 1139305 ...
##  $ Age85.                                                   : int  80210 5215 113552 52684 647332 75337 87567 17300 10329 482521 ...
##  $ White                                                    : int  3327891 484195 5174082 2306073 23650913 4364911 2789105 639354 254955 14747196 ...
##  $ Black                                                    : int  1269808 25148 274380 458136 2262323 209062 365871 198028 314138 3114841 ...
##  $ AmericanIndian.AlaskaNative                              : int  25181 102743 290780 18337 287360 49917 8036 3134 2072 59121 ...
##  $ Asian                                                    : int  58322 40976 191071 38409 5130536 146561 145842 31745 22785 490833 ...
##  $ NativeHawaiianPacificIslander                            : int  1430 8335 12638 6278 147286 6641 1105 494 195 12128 ...
##  $ OtherRace                                                : int  58618 8382 418033 60565 4890329 240282 183297 20369 23999 484274 ...
##  $ X2.Races                                                 : int  76428 58521 200532 59238 1698173 180206 98797 23936 15592 453399 ...
##  $ AverageHouseholdSize                                     : num  2.55 2.79 2.69 2.53 2.95 2.54 2.56 2.63 2.22 2.62 ...
##  $ Pop25.                                                   : int  3217902 460319 4284776 1949963 24865866 3453403 2455577 620886 442721 13561596 ...
##  $ LessthanHS25.                                            : int  524368 37700 604392 306199 4602986 332246 257011 74267 49099 1837056 ...
##  $ HSGrad25.                                                : int  999761 126611 1050079 682451 5153257 759335 677887 195806 82531 4024052 ...
##  $ SomeCollege25.                                           : int  951960 168570 1469229 558722 7400714 1068138 612128 168049 74717 4071061 ...
##  $ Bachelor25.                                              : int  465268 82261 733845 263299 4870524 819675 506662 108647 103179 2324792 ...
##  $ Master25.                                                : int  196935 32681 306410 98442 1889640 344986 290414 52733 79033 889974 ...
##  $ ProfessionalSchool25.                                    : int  47795 7370 71569 24187 575093 73307 71488 11146 37053 267663 ...
##  $ DR25.                                                    : int  31815 5126 49252 16663 373652 55716 39987 10238 17109 146998 ...
##  $ LessthanHS25.H                                           : int  524368 37700 604392 306199 4602986 332246 257011 74267 49099 1837056 ...
##  $ HSGrad25.H                                               : int  2693534 422619 3680384 1643764 20262880 3121157 2198566 546619 393622 11724540 ...
##  $ SomeCollege25.H                                          : int  1693773 296008 2630305 961313 15109623 2361822 1520679 350813 311091 7700488 ...
##  $ Bachelor25.H                                             : int  741813 127438 1161076 402591 7708909 1293684 908551 182764 236374 3629427 ...
##  $ Master25.H                                               : int  276545 45177 427231 139292 2838385 474009 401889 74117 133195 1304635 ...
##  $ ProfessionalSchool25.H                                   : int  79610 12496 120821 40850 948745 129023 111475 21384 54162 414661 ...
##  $ DR25.H                                                   : int  31815 5126 49252 16663 373652 55716 39987 10238 17109 146998 ...
##  $ Pop16.19yrs                                              : int  263787 40395 365547 159995 2181759 273868 205866 49118 34000 961151 ...
##  $ Hsdropout                                                : int  15454 2100 21813 8602 82842 13636 6390 2338 1727 49955 ...
##  $ Hsgrad.enrolled                                          : int  248333 38295 343734 151393 2098917 260232 199476 46780 32273 911196 ...
##  $ Pop16.19yrsMale                                          : int  133674 21350 187908 81748 1123855 141770 104983 24585 15959 494988 ...
##  $ HsdropoutMale                                            : int  9234 1207 12706 4746 52301 8519 3930 1275 1059 30441 ...
##  $ Hsgrad.enrolledMale                                      : int  124440 20143 175202 77002 1071554 133251 101053 23310 14900 464547 ...
##  $ Pop16.19yrsFemale                                        : int  130113 19045 177639 78247 1057904 132098 100883 24533 18041 466163 ...
##  $ HsdropoutFemale                                          : int  6220 893 9107 3856 30541 5117 2460 1063 668 19514 ...
##  $ Hsgrad.enrolledFemale                                    : int  123893 18152 168532 74391 1027363 126981 98423 23470 17373 446649 ...
##  $ White16.                                                 : int  1580060 269224 2442779 1087524 11915855 2378569 1540529 331698 177119 7157392 ...
##  $ White16.Emp                                              : int  1451372 252348 2225697 1010475 10682330 2206883 1414643 307222 170417 6465380 ...
##  $ White16.Unemp                                            : int  128688 16876 217082 77049 1233525 171686 125886 24476 6702 692012 ...
##  $ Black16.                                                 : int  564892 13319 128755 200028 1052918 103252 194555 97383 148740 1478933 ...
##  $ Black16.Emp                                              : int  474577 11698 109956 168800 865204 88139 161034 85743 119680 1224549 ...
##  $ Black16.Unemp                                            : int  90315 1621 18799 31228 187714 15113 33521 11640 29060 254384 ...
##  $ AmericanIndian.AlaskaNative16.                           : int  11904 42899 113028 8467 134309 24386 3931 1629 1250 28006 ...
##  $ AmericanIndian.AlaskaNative16.Emp                        : int  10045 33846 88328 7610 111608 20629 3374 1251 1033 23573 ...
##  $ AmericanIndian.AlaskaNative16.Unemp                      : int  1859 9053 24700 857 22701 3757 557 378 217 4433 ...
##  $ Asian16.                                                 : int  30138 23451 99914 19710 2697742 78649 79422 16687 15479 260871 ...
##  $ Asian16.Emp                                              : int  28435 22476 93624 18549 2476089 73633 73428 15661 15105 241932 ...
##  $ Asian16.Unemp                                            : int  1703 975 6290 1161 221653 5016 5994 1026 374 18939 ...
##  $ NativeHwn.OPacif16CivilLabor                             : int  758 3853 5302 2725 76846 3808 708 298 142 6518 ...
##  $ EmpNativeHwn.OPacif16CivilLabor                          : int  742 3348 4542 2268 64887 3329 630 291 142 5586 ...
##  $ UnempNativeHwn.OPacif16CivilLabor                        : int  16 505 760 457 11959 479 78 7 0 932 ...
##  $ OtherRace16CivilLabor                                    : int  27618 4439 199852 28822 2434117 119817 97130 10501 13757 257060 ...
##  $ EmpOtherRace16CivilLabor                                 : int  25388 4118 173311 26748 2123641 105696 82303 9309 12251 226290 ...
##  $ UnempOtherRace16CivilLabor                               : int  2230 321 26541 2074 310476 14121 14827 1192 1506 30770 ...
##  $ X2plusRaces16CivilLabor                                  : int  23799 22895 69419 19531 663219 71463 37246 7726 7847 171148 ...
##  $ Emp2plusRaces16CivilLabor                                : int  19894 20149 59524 17267 566683 62394 31522 6626 7210 147713 ...
##  $ Unemp2plusRaces16CivilLabor                              : int  3905 2746 9895 2264 96536 9069 5724 1100 637 23435 ...
##  $ HispOrLat16CivilLabor                                    : int  86731 20712 861520 86739 6867225 507707 255351 37128 37572 2288418 ...
##  $ EmpHispOrLat16CivilLabor                                 : int  78874 19280 760160 80566 5992745 453419 219134 33545 34116 2036076 ...
##  $ UnempHispOrLat16CivilLabor                               : int  7857 1432 101360 6173 874480 54288 36217 3583 3456 252342 ...
##  $ WhiteNotHispLat16CivilLabor                              : int  1525786 256826 1823074 1032356 7840832 2026214 1400538 308392 157523 5234808 ...
##  $ EmpWhiteNotHispLat16CivilLabor                           : int  1401573 240738 1673968 958951 7113527 1889590 1292676 285801 152239 4748064 ...
##  $ UnempWhiteNotHispLat16CivilLabor                         : int  124213 16088 149106 73405 727305 136624 107862 22591 5284 486744 ...
##  $ HouseholdIncomes                                         : int  1842174 251678 2387246 1132488 12617280 1998314 1356206 339046 267415 7217508 ...
##  $ Lessthan10k                                              : int  182834 9365 183011 103380 732367 122079 77857 19599 27619 566058 ...
##  $ X10kto14.9k                                              : int  128313 8930 124156 85278 645041 84229 52951 13582 11413 409607 ...
##  $ X15kto19.9k                                              : int  121400 8966 131701 80416 600113 89229 58103 15125 9962 436488 ...
##  $ X20kto24.9k                                              : int  116666 10043 135418 76304 602334 95537 55854 15062 10365 440156 ...
##  $ X25kto29.9k                                              : int  109845 8876 132640 72385 568726 93153 52701 16346 8557 425629 ...
##  $ X30kto34.9k                                              : int  100941 9764 134217 70316 569982 96814 52975 16242 9414 419178 ...
##  $ X35kto39.9k                                              : int  94146 9557 121824 62205 532421 88918 49170 14596 8439 386724 ...
##  $ X40kto44.9k                                              : int  93611 11225 125030 59707 525934 93185 52402 15723 10117 375445 ...
##  $ X45kto49.9k                                              : int  78376 8768 107092 51184 472926 83295 46636 14403 7701 325496 ...
##  $ X50kto59.9k                                              : int  145196 18474 200101 93361 921192 160263 94563 28215 16358 599079 ...
##  $ X60kto74.9k                                              : int  171976 27600 241557 104569 1190009 204968 126014 35933 21908 708470 ...
##  $ X75kto99.9k                                              : int  199359 36982 282294 113844 1544981 263329 176629 46083 29330 800834 ...
##  $ X100kto124.9k                                            : int  122245 28223 178925 67341 1134125 180627 132302 31513 23828 497669 ...
##  $ Inc125000_149999                                         : int  67156 19129 106094 35210 747275 112904 92388 19977 16427 275777 ...
##  $ Inc150000_199999                                         : int  61354 20893 96069 31696 870522 119480 107942 19899 21655 268710 ...
##  $ Inc200000.                                               : int  48756 14883 87117 25292 959332 110304 127719 16748 34322 282188 ...
##  $ Median_House_Inc                                         : int  43511 71829 49928 41264 61489 59448 69899 60231 69235 47212 ...
##  $ PerCapitaIncome                                          : int  23936 33129 25537 22595 29906 31674 38480 30191 46502 26499 ...
##  $ PovertyTotal                                             : int  4699510 711235 6411354 2862662 37323127 5079529 3481115 891493 599620 18946215 ...
##  $ PovertyUnder.5                                           : int  385730 32151 548788 227865 2642882 297645 168574 51190 59892 1399140 ...
##  $ Poverty.5_.74                                            : int  240297 16958 299890 144773 1515176 167393 89957 26488 26558 803874 ...
##  $ Poverty.75_.99                                           : int  263683 22757 320631 176665 1957186 198823 106209 29423 22928 956245 ...
##  $ Poverty1_1.49                                            : int  527004 57013 701163 350029 3894812 431605 223789 72212 45510 2069085 ...
##  $ Poverty1.5_1.99                                          : int  471251 57412 639692 328269 3566199 436886 234516 74731 36500 1983342 ...
##  $ Poverty2.                                                : int  2811545 524944 3901190 1635061 23746872 3547177 2658070 637449 408232 11734529 ...
##   [list output truncated]

There are 52 observations of 132 variables.

* Except for State and AverageHouseholdSize, all of the variables are integers.
* State is a factor variable.
* AverageHouseholdSize is numeric. This makes sense.
* We will need to delete PR from the dataset because none of the other datasets we will be using include it.
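
Removing PR could look roughly like the sketch below, shown on a toy data frame with made-up population numbers (toy and no_pr are hypothetical names). subset() and droplevels() are base R. On the real data the same call would presumably be applied to GroupA_BigDataSet.

```r
# Toy stand-in with a factor State column (made-up numbers)
toy <- data.frame(
  State    = factor(c("AL", "AK", "PR")),
  TotalPop = c(100, 200, 300)
)

# Keep every row except PR, then drop the now-unused factor level
no_pr <- droplevels(subset(toy, State != "PR"))

nrow(no_pr)
levels(no_pr$State)
```

Calling droplevels() matters because a factor keeps "PR" as a level even after its rows are gone, which would otherwise show up as a zero count in later tallies.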


Variable Analysis (GroupA_BigDataSet)

There are no missing data.
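
With 132 variables, checking for missing values one favstats() call at a time is slow. A quick base R alternative is sketched below on a toy data frame (toy, made-up values) that deliberately contains one NA. On the real data the same calls would presumably take GroupA_BigDataSet.

```r
# Toy data frame with one deliberately missing value
toy <- data.frame(a = c(1, 2, NA), b = c("x", "y", "z"))

# Number of NA values in each column
na_per_column <- colSums(is.na(toy))
na_per_column

# Single TRUE/FALSE answer for the whole data frame
has_missing <- anyNA(toy)
has_missing
```

Note that this only counts true NA values. A placeholder such as the "-" level visible in the str() output of GroupA_PoliceKillings would not be counted.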

favstats(~TotalPop, data=GroupA_BigDataSet)
##     min      Q1  median      Q3      max    mean      sd  n missing
##  575251 1790277 4141808 6717749 38066920 6110501 6921369 52       0

min=575251, max=38066920, n=52, none missing. This makes sense.

favstats(~Hsdropout, data=GroupA_BigDataSet)
##  min     Q1 median      Q3   max    mean       sd  n missing
##  730 3968.5  10945 16646.5 82842 15529.1 17464.94 52       0

min=730, max=82842, n=52, none missing. This makes sense.

favstats(~HouseholdIncomes, data=GroupA_BigDataSet)
##     min     Q1  median      Q3      max    mean      sd  n missing
##  225514 694825 1612612 2565213 12617280 2258703 2406258 52       0

min=225514, max=12617280, n=52, none missing. This makes sense.

favstats(~PovertyTotal,data=GroupA_BigDataSet)
##     min      Q1  median      Q3      max    mean      sd  n missing
##  561187 1741653 4036054 6509816 37323127 5958289 6770423 52       0

min=561187, max=37323127, n=52, none missing. This makes sense.

favstats(~AverageHouseholdSize, data=GroupA_BigDataSet)
##   min     Q1 median    Q3  max     mean        sd  n missing
##  2.22 2.4775  2.545 2.645 3.14 2.576923 0.1713446 52       0

min=2.22, max=3.14, n=52, none missing. This makes sense.


Most pressing data cleaning issues