The field of Statistics aims to interpret large data sets that contain random variation. Baseball is a simple game that contains a high degree of randomness, and because professional baseball has been played since the 19th century, a large amount of data has been collected about players’ performance. In this class we examine key concepts in Statistics and Data Science using baseball as a motivating example. We will also discuss how newer statistics, created by sabermetric researchers, have led to additional insights, and we will learn how to use the R programming language to analyze data. Assignments will consist of weekly problem sets and a short final project. By taking this class, students will develop an understanding of key statistical concepts that will be useful for interpreting data from many fields.

### Resources

**Class resources:** syllabus, final project guidelines, class Piazza site

**Textbooks:** Teaching Statistics Using Baseball, Big Data Baseball, Analyzing Baseball Data with R (optional)

**R resources:** R tutorial, R Markdown cheat sheet, article on using R to analyze baseball data, Learning R videos: Intro, common functions, vectors, descriptive statistics, Visualizing Univariate Data, scatter plots

**Baseball resources:** Basic and more detailed rules of baseball, NYT article: What Umpires Get Wrong, NYT article: Baseball’s borders

**Shiny Apps:** Regression app, Big League Baseball app, Single proportion app

**R Markdown worksheets:** Worksheet 1, Worksheet 2, Worksheet 3, Worksheet 4, Worksheet 5, Worksheet 6, Worksheet 7, Worksheet 8, Worksheet 9, Worksheet 10, Worksheet 11

### Schedule

**Class 1:** Introduction

**Class 2:** Baseball statistics and an introduction to R

**Class 3:** Summary statistics and plots for a single batch of data

Worksheet 1

**Class 4:** Exploring categorical and quantitative data

**Class 5:** Quantifying variability

Worksheet 2

**Class 6:** More descriptive statistics: Percentiles, boxplots, and z-scores

**Class 7:** Relationships between variables

Worksheet 3

**Class 8:** Simple linear regression

**Class 9:** Linear regression continued

Worksheet 4

**Class 10:** Multiple linear regression

Regression Shiny App

**Class 11:** Data manipulation (with dplyr)

Worksheet 5

**Class 12:** Understanding probability using games

Big League Baseball App

**Class 13:** Understanding probability using games continued

Worksheet 6

NYT article: digital strat-o-matic

**Class 14:** Tree diagrams and the binomial distribution

**Class 15:** Binomial and normal distributions

Worksheet 7

**Class 16:** Introduction to statistical inference

**Class 17:** Hypothesis tests on a single proportion

Single proportion Shiny app

**Class 18:** Hypothesis tests for two proportions

Worksheet 8

**Class 19:** Hypothesis tests for two proportions and two means

**Class 20:** Randomization tests for two or more means

Worksheet 9

**Class 21:** Parametric tests for two or more means

**Class 22:** Hypothesis tests for two or more means and confidence intervals

Worksheet 10

**Class 23:** Confidence intervals

**Class 24:** Final project presentations

**Class 25:** Class presentations and review

Worksheet 11