Statistics
1. What is Statistics?
Statistics is the study of collecting, organizing, analyzing, and interpreting data.
2. Types of Statistics
Descriptive Statistics → summarizes data (mean, median, variance)
Inferential Statistics → uses sample data to draw conclusions about a population
3. Population vs Sample
Population = the entire group of interest (size N)
Sample = a subset of the population (size n)
4. Measures of Central Tendency
Mean = sum of values / n
Median = middle value after sorting (average of the two middle values when n is even)
Mode = most frequent value
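As a quick check of the three definitions above, here is a minimal sketch using Python's standard `statistics` module. The data set is the one from the Quick Example section; the variable names are only illustrative.

```python
from statistics import mean, median, multimode

data = [1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 7, 8, 8, 9]

data_mean = mean(data)        # sum of values / n
data_median = median(data)    # averages the two middle values when n is even
data_modes = multimode(data)  # every value tied for the highest frequency

print(data_mean, data_median, data_modes)
```

This data set has several values tied for most frequent, which is why `multimode` is used here instead of `mode`.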
5. Measures of Dispersion
Population Variance = sum of (xi − mean)^2 / N
Sample Variance = sum of (xi − sample_mean)^2 / (n − 1)
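Reason for (n − 1): deviations are measured from the sample mean, which sits closer to the sample than the true mean does, so dividing by n underestimates the spread; dividing by (n − 1) gives an unbiased estimate of the population variance (Bessel’s correction)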
Standard Deviation = square root of variance
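The formulas above can be sketched directly in Python. The values below are made up for illustration; they are chosen so that the population variance comes out to a round number.

```python
from math import sqrt

values = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data, mean = 5
n = len(values)
m = sum(values) / n

# Sum of squared deviations from the mean, shared by both formulas.
squared_deviations = sum((x - m) ** 2 for x in values)

population_variance = squared_deviations / n    # divide by N
sample_variance = squared_deviations / (n - 1)  # Bessel's correction

population_sd = sqrt(population_variance)
sample_sd = sqrt(sample_variance)

print(population_variance, sample_variance)  # 4.0 and roughly 4.57
```

The sample variance is always slightly larger than the population formula applied to the same numbers, and the gap shrinks as n grows.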
6. Types of Variables
Quantitative (numeric)
  Discrete → countable (1, 2, 3)
  Continuous → measurable (height, weight)
Qualitative (categorical)
  Non-numeric (gender, color, type)
7. Histogram
A graph showing frequency distribution of data
X-axis → value ranges (bins)
Y-axis → frequency
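To make the bin idea concrete, here is a small text-only sketch that groups values into equal-width bins and counts them. The bin width of 2 is an arbitrary assumption; a plotting library would normally handle this step.

```python
from collections import Counter

data = [1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 7, 8, 8, 9]
bin_width = 2  # assumed width, chosen only for illustration
low = min(data)

# Map each value to the index of the bin it falls in, then count per bin.
counts = Counter((x - low) // bin_width for x in data)

for index in sorted(counts):
    start = low + index * bin_width
    end = start + bin_width
    # Each bin covers [start, end); the bar length is the frequency.
    print(f"[{start}, {end}): {'#' * counts[index]}")
```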
8. Percentiles
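The P-th percentile is a value below which P% of the data lies
Position = (P/100) × (n + 1) in the sorted data (one common convention; interpolate between neighbors when the position is not a whole number)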
Example: 50th percentile = median
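A minimal sketch of the position rule above, assuming 1-based positions in the sorted data and linear interpolation between neighbors. The function name `percentile` is hypothetical, and other tools may use a different convention and give slightly different answers.

```python
def percentile(sorted_values, p):
    """P-th percentile via position = (p / 100) * (n + 1), 1-based."""
    n = len(sorted_values)
    position = (p / 100) * (n + 1)
    # Keep the position inside the data for very small or large p.
    position = min(max(position, 1), n)
    lower_index = int(position)        # whole part of the position
    fraction = position - lower_index  # how far toward the next value
    if lower_index >= n:
        return float(sorted_values[-1])
    lower = sorted_values[lower_index - 1]
    upper = sorted_values[lower_index]
    return lower + fraction * (upper - lower)

data = sorted([1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 7, 8, 8, 9])
print(percentile(data, 25))  # 2.75
print(percentile(data, 50))  # 5.0, the median
print(percentile(data, 75))  # 7.25
```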
9. Quartiles
Q1 → 25th percentile
Q2 → 50th percentile (median)
Q3 → 75th percentile
10. Interquartile Range (IQR)
IQR = Q3 − Q1
Represents the spread of the middle 50% of the data
11. Outliers
Lower Fence = Q1 − 1.5 × IQR
Upper Fence = Q3 + 1.5 × IQR
Values below the lower fence or above the upper fence = outliers
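The fence rule can be sketched with `statistics.quantiles` from the standard library (Python 3.8+). Its default "exclusive" method follows the (n + 1) position convention. The helper name `iqr_fences` is hypothetical, and the value 27 is appended to the sample data only to have something to flag.

```python
from statistics import quantiles

def iqr_fences(values):
    """Return (lower_fence, upper_fence) using the 1.5 x IQR rule."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, Q2, Q3
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 7, 8, 8, 9, 27]
lower_fence, upper_fence = iqr_fences(data)
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(lower_fence, upper_fence, outliers)  # -4.5 15.5 [27]
```

Adding 27 shifts the quartiles a little, so these fences differ from the ones computed on the data without it.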
12. Five Number Summary
Minimum, Q1, Median, Q3, Maximum
13. Covariance
Cov(X, Y) = sum((xi − x_mean)(yi − y_mean)) / (n − 1)
Interpretation:
Positive → X and Y tend to move together
Negative → X and Y tend to move in opposite directions
Zero → no linear relation
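A minimal sketch of the sample covariance formula above. The function name and both data sets are made up for illustration.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations / (n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    products = ((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

xs = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 5]   # tends to increase with xs
falling = [5, 4, 5, 4, 2]  # tends to decrease as xs increases

print(sample_covariance(xs, rising))   # 1.5
print(sample_covariance(xs, falling))  # -1.5
```

The sign shows the direction, but the size depends on the units of X and Y, which is why correlation is used to compare strength.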
14. Correlation
Correlation = Cov(X, Y) / (std_dev_X × std_dev_Y)
Range: -1 to 1
1 → perfect positive
-1 → perfect negative
0 → no linear relation
Limitation: only captures linear relationships
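The linear-only limitation is easy to see with a small sketch. The `pearson` function below is a hand-written illustration of the formula above, not a library call, and the data is invented: y = x² depends completely on x, yet the correlation comes out as zero.

```python
from math import sqrt

def pearson(xs, ys):
    """Correlation = Cov(X, Y) / (std_dev_X * std_dev_Y), sample versions."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / (n - 1)
    std_x = sqrt(sum((x - x_mean) ** 2 for x in xs) / (n - 1))
    std_y = sqrt(sum((y - y_mean) ** 2 for y in ys) / (n - 1))
    return cov / (std_x * std_y)

xs = [-2, -1, 0, 1, 2]
linear = [2 * x + 1 for x in xs]  # exact straight line
squared = [x ** 2 for x in xs]    # exact curve, symmetric around 0

print(pearson(xs, linear))   # approximately 1
print(pearson(xs, squared))  # 0.0 even though y is fully determined by x
```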
15. Types of Correlation
Pearson → linear relationship
Spearman → rank-based, captures monotonic relationships (including non-linear ones)
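One way to see the difference: Spearman can be computed as Pearson applied to the ranks of the data. The sketch below assumes average ranks for ties; the helper names and the cubic data are illustrative only.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation from sums of deviations."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    return pearson(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]
cubes = [x ** 3 for x in xs]  # monotonic but not linear

print(pearson(xs, cubes))   # below 1, about 0.94
print(spearman(xs, cubes))  # approximately 1
```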
16. Key Insight
Sample statistics estimate the corresponding population values
Sample variance divides by (n − 1) so that it does not systematically underestimate the population variance
17. Quick Example
Data: 1,2,2,3,3,4,5,5,6,6,7,8,8,9
Median = 5
Q1 ≈ 3
Q3 ≈ 7
IQR = 4
Lower Fence = -3
Upper Fence = 13
Any value outside this range → outlier (none in this data; a value such as 27 would be one)
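The numbers in this example can be reproduced with a short sketch. It assumes the "median of each half" way of finding quartiles, which gives exactly Q1 = 3 and Q3 = 7 here; the (n + 1) position rule gives 2.75 and 7.25, which round to the same values.

```python
from statistics import median

data = sorted([1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 7, 8, 8, 9])
half = len(data) // 2

# Split into lower and upper halves; skip the middle value if n is odd.
lower_half = data[:half]
upper_half = data[half:] if len(data) % 2 == 0 else data[half + 1:]

q1 = median(lower_half)
q2 = median(data)
q3 = median(upper_half)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

def is_outlier(value):
    return value < lower_fence or value > upper_fence

print(q1, q2, q3, iqr, lower_fence, upper_fence)  # 3 5.0 7 4 -3.0 13.0
print(is_outlier(27), any(is_outlier(x) for x in data))  # True False
```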