\documentclass[10pt]{article}
\usepackage{amsmath,amssymb,amsthm}
\usepackage{fancyhdr,url,hyperref}
\usepackage{graphicx,xspace}
\usepackage{subfigure}
\usepackage{tikz}
\usetikzlibrary{arrows,decorations.pathmorphing,backgrounds,positioning,fit,through}
\oddsidemargin 0in %0.5in
\topmargin 0in
\leftmargin 0in
\rightmargin 0in
\textheight 9in
\textwidth 6in %6in
%\headheight 0in
%\headsep 0in
%\footskip 0.5in
\newtheorem{thm}{Theorem}
\newtheorem{cor}[thm]{Corollary}
\newtheorem{obs}{Observation}
\newtheorem{lemma}{Lemma}
\newtheorem{claim}{Claim}
\newtheorem{definition}{Definition}
\newtheorem{question}{Question}
\newtheorem{answer}{Answer}
\newtheorem{problem}{Problem}
\newtheorem{solution}{Solution}
\newtheorem{conjecture}{Conjecture}
\pagestyle{fancy}
\lhead{\textsc{Prof. McNamara}}
\chead{\textsc{SDS/MTH 220: Lecture notes}}
\lfoot{}
\cfoot{}
%\cfoot{\thepage}
\rfoot{}
\renewcommand{\headrulewidth}{0.2pt}
\renewcommand{\footrulewidth}{0.0pt}
\newcommand{\ans}{\vspace{0.25in}}
\newcommand{\R}{{\sf R}\xspace}
\newcommand{\cmd}[1]{\texttt{#1}}
\rhead{\textsc{October 18, 2017}}
\begin{document}
\paragraph{Agenda}
\begin{enumerate}
\itemsep0em
\item Confidence intervals
\item Central limit theorem
\item Sampling Distributions from Multimodal Populations
\end{enumerate}
\paragraph{Reminder}
Confidence intervals for test statistics that are normally distributed are of the form:
$$
\text{point estimate} \pm z_{\alpha/2}^* \cdot SE
$$
Computing the point estimate is usually easy. Once you've chosen a confidence level, finding $z_{\alpha/2}^*$ is trivial (use \cmd{qnorm()} or the table in the back of the book). The difficult part is usually computing the $SE$, since that depends on the sampling distribution of the test statistic!
\paragraph{Warmup}
\begin{enumerate}
\itemsep0in
% MMC, 7e, 6.21
\item A recent study estimated the mean U.S. per capita consumption of sugar-sweetened beverages among adults 20 to 44 years of age to be 289 kcal/day with a standard error of 7 kcal/day.
\begin{enumerate}
\itemsep0.3in
\item The 68-95-99.7 rule says that the probability is about 0.95 that $\bar{x}$ is within $y$ kcal/day of the population mean $\mu$. What is $y$?
\item About 99\% of all samples will capture the true mean of kcals consumed per day in the interval $\bar{x}$ plus or minus $7$ kcal/day times what? Draw a labeled picture and indicate where the missing quantity is. Estimate it. What does the computer need to known in order to compute it?
<>=
qnorm(c(0.005, 0.995))
@
\vspace{0.2in}
\end{enumerate}
% MMC, 7e, 6.13
\item Suppose 400 randomly selected alumni of the University of Okoboji were asked to rate the university's counseling services on a 1 to 10 scale. The sample mean was found to be 8.6. Assume that the standard error was computed to be 0.4.
\begin{enumerate}
\itemsep0.4in
\item A researcher computes the 99\% confidence interval for the average satisfaction score as $8.6 \pm 1.96 \cdot 0.4$. What is her mistake?
\item After correcting her mistake in part (a), she states: ``I am 95\% confident that the sample mean falls between 7.82 and 9.38." What is wrong with this statement?
\item She quickly realizes her mistake in part (b) and instead states: ``The probability the true mean is between 7.82 and 9.38 is 0.95." What misinterpretation is she making now?
\item Finally in her defense for using the Normal distribution to determine the confidence interval she says ``Because the sample size is quite large, the population of alumni ratings will be approximately Normal." Explain to Ima her misunderstanding and correct this statement.
\end{enumerate}
\vspace{0.2in}
% MMC, 7e, 6.89
\item Explain whether a test of signficance can answer each of the following questions.
\begin{enumerate}
\item Is the sample or experiment properly designed?
\item Is the observed effect compatible with the null hypothesis?
\item Is the observed effect important?
\end{enumerate}
% MMC, 7e, 6.92
\item Justify whether or not you agree with each of the following statements.
\begin{enumerate}
\item If the p-value is larger than 0.05, the null hypothesis is true.
\item Practical significance is not the same as statistical significance.
\item We can perform a statistical analysis using any set of data.
\item If you find an interesting pattern in a set of data, it is appropriate to then use a significance test to determine its significance.
\end{enumerate}
\end{enumerate}
\paragraph{Central limit theorem}
See \url{http://45.55.32.181/shiny/SamplingDist/}
\begin{enumerate}
\itemsep1in
\item What happens to the sampling distribution as the size of samples increases?
\item What happens to the sampling distribution as the number of samples increases?
\end{enumerate}
\vspace{1in}
\paragraph{Sampling Distributions from Multimodal Populations}
Consider the following multimodal probability distribution.
<>=
require(mosaic)
n <- 1000
ds <- data.frame(a = rnorm(n, mean = 53), b = rnorm(n, mean = 57, sd = 0.8),
c = rnorm(n, mean = 64), d = rnorm(n, mean = 68, sd = 0.8), p = runif(n)) %>%
mutate(x = ifelse(p < 0.25, a, ifelse(p < 0.4, b, ifelse(p < 0.65, c, d))))
pop_plot <- qplot(data = ds, x = x, geom = "density", adjust = 0.5) +
geom_vline(aes(xintercept = mean(ds$x)), linetype = 2)
pop_plot
@
\begin{enumerate}
\itemsep1in
\item Sketch what you think the sampling distribution of the mean looks like for samples of size 1.
\item Sketch what you think the sampling distribution of the mean looks like for samples of size 2.
\item Sketch what you think the sampling distribution of the mean looks like for samples of size 10,000.
\item Sketch what you think the sampling distribution of the mean looks like for samples of size 4.
\end{enumerate}
\vspace{1in}
<>=
sim <- do(1000) * mean(~x, data = sample_n(ds, 4))
pop_plot + geom_density(data = sim, aes(x = mean), adjust = 0.5, color = "red")
@
\end{document}