Introduction to Statistics & Probability Through Sports
Author(s): Reza Noubary, Dong Zhang
Edition: 1
Copyright: 2020
Pages: 562
Introduction to Statistics & Probability Through Sports links the tools students need for data analysis and modeling to the universal love of sports. By using sports as the focus of data analysis, it increases students’ motivation to study statistics and builds on the understanding of sports they already possess. Sports also present sets of data that are accurate, reliable, and recorded with great precision under controlled conditions.
Through the use of sports statistics, Introduction to Statistics & Probability Through Sports discusses:
- Descriptive Statistics
- Probability
- Random Variables
- Classical, Limiting, and Sampling Distributions
- Estimation
- Testing
- Analysis of Variance
- Regression
- Time Series Analysis
- Use of R to perform data analysis
Preface
1. ELEMENTS OF STATISTICS
1.1 NUMERICAL SUMMARIES
1.2 FREQUENCY TABLE AND HISTOGRAM
1.3 POPULATION AND SAMPLING
1.4 INTERPRETING THE STANDARD DEVIATION
1.5 MEASURES OF RELATIVE STANDING
1.5.1 Percentile Ranking
1.5.2 z-score
1.5.3 Interpretation of z-Score for Bell-Shaped Distribution
1.6 CASE STUDY: COMPARISON OF PROFESSIONAL PLAYERS AND TEAMS
1.6.1 Performance Measures
1.6.2 How Good Was Michael Jordan?
1.7 PAIRED DATA AND LINEAR CORRELATION COEFFICIENT
1.7.1 Correlation Coefficient
1.7.2 Using Correlation for Prediction
1.7.3 Interpretation of Correlation Coefficient
1.7.4 Regression-to-the-Mean
1.7.5 Inference and Data Mining
1.8 USING R
2. ELEMENTS OF PROBABILITY
2.1 CHANCE EXPERIMENTS
2.2 EVENTS AND THEIR ALGEBRA
2.3 PROBABILITY
2.3.1 Introduction and Definitions
2.3.2 Permutations and Combinations
2.3.3 Misleading Use of Probability and Statistics
2.4 CONDITIONAL PROBABILITY AND INDEPENDENCE
2.4.1 Conditional Probability, Addition, and Multiplication Rules
2.4.2 Independent Events
2.4.3 More on The Multiplication Rule
2.4.4 The Law of Total Probability
2.4.5 Bayes’ Theorem
2.4.6 Independence of More Than Two Events
2.5 ANALYSIS OF A TENNIS MATCH
2.5.1 Table Tennis
2.6 RANDOM VARIABLES AND THEIR DISTRIBUTION
2.7 EXPECTATION AND VARIANCE
2.7.1 Lifetimes and Life Expectancies
2.8 SIMPSON’S PARADOX AND HOT HAND IN SPORT
2.8.1 Introduction and Examples
2.8.2 Weighted Averages
2.8.3 Application to “Hot Hand” in Sport
2.9 COINCIDENCES
2.9.1 Streaks
2.10 USING R
2.10.1 Probability of random variables
3. SOME SPECIAL DISTRIBUTIONS
3.1 BINOMIAL DISTRIBUTION (DISCRETE)
3.1.1 Binomial Distribution
3.1.2 Bernoulli Distribution and System Reliability
3.2 GEOMETRIC AND NEGATIVE BINOMIAL DISTRIBUTIONS (DISCRETE)
3.2.1 Geometric Distribution
3.2.2 Negative Binomial (Pascal) Distribution
3.3 HYPERGEOMETRIC DISTRIBUTION (DISCRETE)
3.3.1 Hypergeometric Distribution
3.4 POISSON DISTRIBUTION (DISCRETE)
3.4.1 Binomial-Poisson Hierarchy
3.5 UNIFORM DISTRIBUTION (CONTINUOUS)
3.6 NORMAL DISTRIBUTION (CONTINUOUS)
3.7 RELATIVES OF THE NORMAL DISTRIBUTION: CHI-SQUARE (χ²), t AND F-DISTRIBUTIONS (ALL CONTINUOUS)
3.7.1 Chi-Square Distribution
3.7.2 t-Distribution
3.7.3 F-Distribution
3.8 EXPONENTIAL, GAMMA, BETA, LOGNORMAL, GUMBEL, WEIBULL AND FRÉCHET DISTRIBUTIONS (ALL CONTINUOUS)
3.8.1 Exponential Distribution
3.8.2 Gamma Distribution
3.8.3 Beta Distribution
3.8.4 Lognormal Distribution (Another Relative of the Normal Distribution)
3.8.5 Gumbel Distribution
3.8.6 Weibull Distribution
3.8.7 Fréchet Distribution
3.8.8 Maxwell Distribution
3.8.9 Pareto Distribution
3.8.10 Relationships Between Some Distributions
3.9 THE POISSON PROCESS
3.9.1 Facts and Applications
3.10 USING R
4. LIMIT PROPERTIES AND MODELS
4.1 THE MODEL OF SUMS: NORMAL DISTRIBUTION
4.1.1 The Central Limit Theorem
4.1.2 The Normal Approximation to the Binomial Distribution
4.2 POISSON LIMIT THEOREM
4.3 NORMAL APPROXIMATION: A SUMMARY
4.4 THE MODEL OF PRODUCTS: THE LOGNORMAL DISTRIBUTION
4.5 THE MODEL OF EXTREMES: THE EXTREME VALUE DISTRIBUTIONS
4.6 EXCEEDANCES
4.6.1 Return Periods
4.6.2 Exceedances and English Premier League
4.6.3 Characteristic Values
4.7 USING R
5. ESTIMATION
5.1 THE PROBLEM DESCRIPTION
5.2 SAMPLING DISTRIBUTION
5.3 POINT ESTIMATOR
5.4 MAXIMUM LIKELIHOOD PRINCIPLE
5.4.1 Method of Moments
5.5 INTERVAL ESTIMATION AND CONFIDENCE INTERVALS
5.6 BOOTSTRAP CONFIDENCE INTERVAL
5.6.1 An Example of Bootstrapping
5.7 USING R
5.7.1 Using Known Distributions
6. STATISTICAL TESTING
6.1 STATISTICAL HYPOTHESIS
6.2 TESTS
6.3 Z-TEST
6.3.1 Testing Using Confidence Interval
6.4 TWO-SAMPLE Z-TEST
6.5 OBSERVED SIGNIFICANCE LEVEL, p-VALUES
6.6 t-TEST
6.6.1 Test for a Population Mean
6.6.2 Test for a Population Correlation Coefficient
6.7 LARGE-SAMPLE TEST OF HYPOTHESIS FOR POPULATION PROPORTION
6.7.1 Determination of Sample Size
6.8 χ² TEST FOR VARIANCE (STANDARD DEVIATION)
6.9 F-TEST
6.10 OTHER ALTERNATIVE HYPOTHESIS
6.11 MORE ON INFERENCE ABOUT TWO POPULATIONS
6.11.1 Large-Sample Inference About the Difference Between Two Population Means
6.11.2 Small-Sample Inference About the Difference Between Two Population Means (Normal Populations)
6.11.3 Inference About the Difference Between Two Population Means (Unequal Variances)
6.11.4 Inference About the Difference Between Two Population Means: Paired Difference Experiment
6.11.5 Fitness Test
6.11.6 Inference About the Difference Between Two Population Proportions
6.11.7 Case Study: 2018-2019 NBA MVP
6.12 DISTRIBUTION-FREE χ² TEST
6.13 TEST OF INDEPENDENCE, CONTINGENCY TABLES
6.13.1 Further Discussions on “Hot Hand” in Sports
6.13.2 Case Study: Comparison of Men and Women Professional Basketball Players
6.14 KOLMOGOROV-SMIRNOV TEST
6.15 SIGN TEST
6.16 FINAL WORDS
6.17 USING R
6.17.1 Hypothesis test of mean values
6.17.2 Hypothesis test of variances
6.17.3 χ² Test
6.17.4 Kolmogorov-Smirnov test
7. ANALYSIS OF VARIANCE AND EXPERIMENTAL DESIGN
7.1 COMPONENTS OF VARIANCE
7.2 ONE-WAY CLASSIFICATION
7.3 TWO-WAY CLASSIFICATIONS
7.4 THE EXPERIMENTAL DESIGN
7.4.1 The Completely Randomized Design
7.4.2 The Randomized Block Design
7.4.3 Factorial Experiments
7.5 FINAL WORDS
7.6 USING R
8. REGRESSION ANALYSIS
8.1 INTRODUCTION
8.2 SIMPLE LINEAR REGRESSION
8.3 STATISTICAL INFERENCE FOR LEAST SQUARES ESTIMATORS
8.4 ANOVA FOR SIMPLE LINEAR REGRESSION
8.5 THE COEFFICIENT OF DETERMINATION: A MEASURE OF THE USEFULNESS OF THE MODEL
8.6 APPLICATION OF SIMPLE LINEAR REGRESSION TO TRACK AND FIELD
8.6.1 Modeling and Prediction
8.6.2 More on Ultimate Records
8.6.3 Some Examples
8.6.4 Olympic Trends
8.6.5 An Alternative Regression Model
8.6.6 Estimation of Ultimate Record, An Example
8.6.7 Least Squares Using Matrices
8.7 MULTIPLE LINEAR REGRESSION
8.8 BEST SUBSET SELECTION AND STEPWISE REGRESSION
8.9 APPLICATION
8.9.1 Why NBA Teams Win
8.9.2 Prediction of Medal Totals for Olympic Games
8.10 DIFFICULTIES OF USING MULTIPLE REGRESSION
8.10.1 Exclusion of a Relevant Variable
8.10.2 Inclusion of an Irrelevant Variable
8.10.3 Incorrect Functional Form
8.10.4 Stepwise Regression
8.10.5 Proxy Variables and Measurement Error
8.10.6 Selection Bias
8.10.7 Multicollinearity and Singularity
8.10.8 Autocorrelation
8.10.9 Heteroskedasticity
8.10.10 Outliers
8.10.11 Influential Observations
8.10.12 Misconception
8.10.13 Ethical Issues
8.10.14 Cross Validation
8.10.15 Some Remedies
8.10.16 Final Word
8.11 THE LOGISTIC MODEL
8.11.1 Effect of the Star Player
8.12 USING R
9. TIME SERIES ANALYSIS
9.1 STOCHASTIC PROCESSES
9.2 STATIONARY PROCESS
9.3 AUTOREGRESSIVE PROCESSES
9.4 FIRST-ORDER AR PROCESS
9.5 GENERAL-ORDER AR PROCESS
9.6 FORECASTING USING TIME SERIES
9.7 COMPONENTS OF TIME SERIES
9.8 SMOOTHING TECHNIQUES
9.9 TREND ANALYSIS (TREND PROJECTION)
9.10 ANALYSIS OF DATA FOR 100, 400, AND 800-METER RUNS
9.10.1 Advanced Analysis (400 m and 800 m)
9.10.2 Elementary Analysis Using Minitab (400 m)
9.10.3 Time Series Analysis of the Men’s 100-m Run
9.11 USING R
10. NONPARAMETRIC STATISTICS
10.1 ORDER STATISTICS, RANKING THE BEST
10.2 DISTRIBUTION OF THE i-TH ORDER STATISTICS
10.3 JOINT DISTRIBUTION OF THE FIRST r-ORDER STATISTICS
10.4 THE PROBABILITY INTEGRAL TRANSFORMATION AND UNIFORM ORDER STATISTICS
10.5 DISTRIBUTION-FREE CONFIDENCE INTERVALS AND TESTS
10.5.1 Single Sample Sign Test
10.5.2 Run Test
10.5.3 Wilcoxon (Mann-Whitney) Rank Sum Test
10.5.4 Wilcoxon Matched-Pairs Signed Rank Test
10.5.5 Paired-Sample Sign Test
10.5.6 Kruskal-Wallis and Friedman Tests
10.5.7 Spearman Rank Correlation
10.6 EXAMPLES OF APPLICATIONS
10.6.1 Wilcoxon Rank Sum Test for Comparing Two Populations, Independent Samples
10.6.2 Wilcoxon Signed Rank Test for Comparing Two Populations, Paired Difference Experiment
10.6.3 Kruskal–Wallis H Test for a Completely Randomized Design
10.6.4 The Friedman Fr Test for a Randomized Block Design
10.7 USING R
10.7.1 Sign Test
10.7.2 Wilcoxon Test
10.7.3 Friedman Test
10.7.4 Spearman’s correlation