CS-0149: Introduction to Statistics Through Baseball

The field of Statistics aims to interpret large data sets that contain random variation. Baseball is a simple game that contains a high degree of randomness, and because professional baseball has been played since the 19th century, a large amount of data has been collected about players’ performance. In this class we examine key concepts in Statistics and Data Science using baseball as a motivating example. We will also discuss how newer statistics, created by sabermetric researchers, have led to additional insights, and will learn how to use the R programming language to analyze data. Assignments will consist of weekly problem sets and a short final project. By taking this class, students will develop an understanding of key statistical concepts that will be useful for interpreting data from many fields.


Class resources: syllabus, final project guidelines, class Piazza site

Textbooks: Teaching Statistics Using Baseball, Big Data Baseball, Analyzing Baseball Data with R (optional)

R resources: R tutorial, R Markdown cheat sheet, article on using R to analyze baseball data, Learning R videos: Intro, common functions, vectors, descriptive statistics, visualizing univariate data, scatter plots

Baseball resources: Basic and more detailed rules of baseball, NYT article: What Umpires Get Wrong, NYT article: Baseball’s borders

Shiny Apps: Regression app, Big League Baseball app, Single proportion app

R Markdown worksheets: Worksheet 1, Worksheet 2, Worksheet 3, Worksheet 4, Worksheet 5, Worksheet 6, Worksheet 7, Worksheet 8, Worksheet 9, Worksheet 10, Worksheet 11


Class 1: Introduction

Class 2: Baseball statistics and an introduction to R

Class 3: Summary statistics and plots for a single batch of data
    Worksheet 1

Class 4: Exploring categorical and quantitative data

Class 5: Quantifying variability
    Worksheet 2

Class 6: More descriptive statistics: Percentiles, boxplots, and z-scores

Class 7: Relationships between variables
    Worksheet 3

Class 8: Simple linear regression

Class 9: Linear regression continued
    Worksheet 4

Class 10: Multiple linear regression
    Regression Shiny App

Class 11: Data manipulation (with dplyr)
    Worksheet 5

Class 12: Understanding probability using games
    Big League Baseball App

Class 13: Understanding probability using games continued
    Worksheet 6
    NYT article: digital Strat-O-Matic

Class 14: Tree diagrams and the binomial distribution

Class 15: Binomial and normal distributions
    Worksheet 7

Class 16: Introduction to statistical inference

Class 17: Hypothesis tests on a single proportion
    Single proportion Shiny app

Class 18: Hypothesis tests for two proportions
    Worksheet 8

Class 19: Hypothesis tests for two proportions and two means

Class 20: Randomization tests for two or more means
    Worksheet 9

Class 21: Parametric tests for two or more means

Class 22: Hypothesis tests for two or more means and confidence intervals
    Worksheet 10

Class 23: Confidence intervals

Class 24: Final project presentations

Class 25: Class presentations and review
    Worksheet 11