\documentclass[10pt]{article}
\usepackage{amsmath,amssymb,amsthm}
\usepackage{fancyhdr,url,hyperref}
\usepackage{graphicx,xspace}
\oddsidemargin 0in %0.5in
\topmargin 0in
\leftmargin 0in
\rightmargin 0in
\textheight 9in
\textwidth 6in %6in
%\headheight 0in
%\headsep 0in
%\footskip 0.5in
\newtheorem{thm}{Theorem}
\newtheorem{cor}[thm]{Corollary}
\newtheorem{obs}{Observation}
\newtheorem{lemma}{Lemma}
\newtheorem{claim}{Claim}
\newtheorem{definition}{Definition}
\newtheorem{question}{Question}
\newtheorem{answer}{Answer}
\newtheorem{problem}{Problem}
\newtheorem{solution}{Solution}
\newtheorem{conjecture}{Conjecture}
\pagestyle{fancy}
\lhead{\textsc{Prof. McNamara}}
\chead{\textsc{SDS/MTH 220: Lecture notes}}
\lfoot{}
\cfoot{}
%\cfoot{\thepage}
\rfoot{}
\renewcommand{\headrulewidth}{0.2pt}
\renewcommand{\footrulewidth}{0.0pt}
\newcommand{\ans}{\vspace{0.25in}}
\newcommand{\R}{{\sf R}\xspace}
\newcommand{\cmd}[1]{\texttt{#1}}
\rhead{\textsc{September 11, 2017}}
\begin{document}
\paragraph{Agenda}
\begin{enumerate}
\itemsep0em
\item Data and Sampling
\item HW \#1 due Wednesday
\end{enumerate}
\paragraph{Activity: Data Collection}
\begin{enumerate}
\item Find two people with notes whose colors are different from yours (and from each other's). That is, form a tri-chromatic group!
\item Take turns answering the following questions:
\begin{itemize}
\itemsep0em
\item What is your name?
\item What is your email address?
\item What color sheet do you have?
\item What year are you?
\item In which house do you live?
\item What is your hometown?
\item How many siblings do you have?
\item What is the farthest (in miles) you were from Northampton over the break?
\end{itemize}
While one person is answering, the other two groupmates will write down the answers.
\item Open this \href{https://docs.google.com/a/smith.edu/spreadsheets/d/18nigJczAwKBZUSH867ZkORYktAWVDJeXYz49kO6SVdo/edit?usp=sharing}{Google Spreadsheet}, and start entering data. (You can all work on one computer.)
\item What are the \emph{cases} in this data set?
\ans
\item For each of the variables in the spreadsheet, describe the type of variable that it is (e.g. categorical/numerical, discrete/continuous, ordinal, etc.)
\begin{itemize}
\itemsep0.4in
\item Name
\item Sheet Color
\item Class Year
\item House
\item Hometown
\item \# of Siblings
\item Distance over break
\end{itemize}
\end{enumerate}
\paragraph{Sampling}
It is important to keep in mind the distinction between the \emph{population} and the \emph{sample}. We collect a sample of data, analyze it, and try to use that information to make inferences about the population.
Three common sampling schemes are \href{http://en.wikipedia.org/wiki/Simple_random_sampling}{simple random sampling}, \href{http://en.wikipedia.org/wiki/Stratified_sampling}{stratified sampling}, and \href{http://en.wikipedia.org/wiki/Cluster_sampling}{cluster sampling} (see Figure 1.14 on page 15).
\begin{enumerate}
% Wikipedia
\item Suppose that in a company there are the following 180 staff members: 90 women who work full-time, 18 women who work part-time, 9 men who work full-time, and 63 men who work part-time. We are asked to take a sample of 40 staff, stratified according to the above categories. Devise a sampling scheme to do this.
\vspace{1.2in}
% Mine
\item A city council has requested that a household survey be conducted in a suburban area of their city. The area is broken into many distinct neighborhoods, some including large homes, some with only apartments, and others a diverse mixture of housing structures. Briefly assess the strengths and weaknesses of each sampling approach listed below. Which approach would likely be the \emph{least} effective? Why?
\begin{itemize}
\itemsep0.25in
\item Simple random sampling
\item Cluster sampling
\item Stratified sampling
%\item Blocked sampling
\item Anecdotal sampling
\end{itemize}
\item A school district is considering whether it will no longer allow high school students to park at school after two recent accidents where students were severely injured. As a first step, they survey parents by mail, asking them whether or not the parents would object to this policy change. Of 6,000 surveys that go out, 1,200 are returned. Of these 1,200 surveys that were completed, 960 agreed with the policy change and 240 disagreed. Which of the following statements are true? Why?
\begin{enumerate}
\item Some of the mailings may have never reached the parents.
\item The school district has strong support from parents to move forward with the policy change.
\item It is possible that the majority of the parents of high school students disagree with the policy change.
\item The survey results are unlikely to be biased because all parents were mailed a survey.
\end{enumerate}
\end{enumerate}
% \newpage
%
% \subsection{Instructor's Notes}
%
% \paragraph{Activity: Data Collection}
% \begin{enumerate}
% \item Before class, write out a series of questions on the chalkboard (or prepare a slide). If on the chalkboard, I cover the questions with the screen as they come in. Possible questions \ldots
%
% \begin{enumerate}
% \item What is your name?
% \item What is your email address?
% \item What year are you?
% \item In which house do you live?
% \item What is your hometown?
% \item How many siblings do you have?
% \item What is the farthest (in miles) you were from Northampton over the winter break?
% \end{enumerate}
%
% The objectives are to have questions that will help students contact one another, break the ice a bit and get them talking, and provide examples of different classes of data (categorical, numerical, count, etc.).
%
% \item As students come in, give them a colored index card. I've used three colors in the past.
%
% \item At the beginning of class, tell the students that they are to make groups of three by finding two other students with different colored cards. That should be just enough structure to not make them feel too awkward about asking to be in each other's groups. You can also wait to give out the cards till they're seated and pass them out in single color chunks. If people came in with a friend, they'd likely sit by them, so this is a way to get them talking to other students.
%
% \item Each group member then takes several minutes to take turns introducing themselves to their groupmates, answering each question as they go. The listeners will jot down the answers on their cards, with one person on each side.
%
% \item Once most people have both sides of the card filled out, bring the class together and introduce yourself by answering the same questions. This can then segue into a discussion of data types. I usually diagram stuff out on the board. You can also talk about ways to summarize the data of the whole class with summary statistics (mean and max distance, mode of year) as well as ways to visualize it.
%
% \item The goals are for students to have met two of their fellow students and gotten their contact info, hopefully to facilitate future collaboration. They will also start thinking about the different forms that data can take. The taxonomy of data is useful because it can help guide how we summarize, visualize, and later model the data.
%
% \end{enumerate}
% \paragraph{Answers}
%
% The main difference between cluster sampling and stratified sampling is that in cluster sampling the cluster is treated as the sampling unit so analysis is done on a population of clusters (at least in the first stage). In stratified sampling, the analysis is done on elements within strata. In stratified sampling, a random sample is drawn from each of the strata, whereas in cluster sampling only the selected clusters are studied. The main objective of cluster sampling is to reduce costs by increasing sampling efficiency. This contrasts with stratified sampling where the main objective is to increase precision.
%
% The first step is to find the total number of staff (180) and calculate the percentage in each group.
% \% women, full-time = 90 ÷ 180 = 50\%
% \% women, part-time = 18 ÷ 180 = 10\%
% \% men, full-time = 9 ÷ 180 = 5\%
% \% men, part-time = 63 ÷ 180 = 35\%
%
% This tells us that of our sample of 40,
% 50\% should be women who work full-time.
% 10\% should be women who work part-time.
% 5\% should be men who work full-time.
% 35\% should be men who work part-time.
% 50\% of 40 is 20.
% 10\% of 40 is 4.
% 5\% of 40 is 2.
% 35\% of 40 is 14.
%
% Another easy way, without having to calculate the percentages, is to multiply each group size by the sample size and divide by the total population size (the size of the entire staff):
% women, full-time = 90 × (40 ÷ 180) = 20
% women, part-time = 18 × (40 ÷ 180) = 4
% men, full-time = 9 × (40 ÷ 180) = 2
% men, part-time = 63 × (40 ÷ 180) = 14
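%
% The shortcut above can be written as a proportional allocation rule. The notation is introduced here for illustration and is not taken from the textbook: $N_h$ is the size of stratum $h$, $N = 180$ is the total number of staff, $n = 40$ is the requested sample size, and $n_h$ is the number sampled from stratum $h$. This is a sketch of one reasonable allocation; other allocations are possible.
% \begin{align*}
%   n_h &= n \cdot \frac{N_h}{N}, \\
%   n_1 &= 40 \cdot \frac{90}{180} = 20, &
%   n_2 &= 40 \cdot \frac{18}{180} = 4, \\
%   n_3 &= 40 \cdot \frac{9}{180} = 2, &
%   n_4 &= 40 \cdot \frac{63}{180} = 14.
% \end{align*}
% As a check, the stratum sample sizes add up to the requested total: $20 + 4 + 2 + 14 = 40$. Within each stratum, the $n_h$ staff members would then be drawn at random.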
%
% 1: (b)
%
% 2: (a) and (c)
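%
% When discussing the mail-survey question, it may help to work through the arithmetic, using only the counts given in the problem:
% \begin{align*}
%   \text{response rate} &= \frac{1200}{6000} = 20\%, \\
%   \text{agreement among respondents} &= \frac{960}{1200} = 80\%, \\
%   \text{known agreement among all surveys mailed} &= \frac{960}{6000} = 16\%.
% \end{align*}
% The views behind the remaining $6000 - 1200 = 4800$ surveys are unknown. If most of those parents disagree, a majority of all parents could disagree with the policy change even though 80\% of respondents agreed.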
\end{document}