| Preface | 1 |
|---|
| Variability, Information, and Prediction | 16 |
|---|
| The Curse of Dimensionality | 18 |
| The Two Extremes | 19 |
| Perspectives on the Curse | 20 |
| Sparsity | 21 |
| Exploding Numbers of Models | 23 |
| Multicollinearity and Concurvity | 24 |
| The Effect of Noise | 25 |
| Coping with the Curse | 26 |
| Selecting Design Points | 26 |
| Local Dimension | 27 |
| Parsimony | 32 |
| Two Techniques | 33 |
| The Bootstrap | 33 |
| Cross-Validation | 42 |
| Optimization and Search | 47 |
| Univariate Search | 47 |
| Multivariate Search | 48 |
| General Searches | 49 |
| Constraint Satisfaction and Combinatorial Search | 50 |
| Notes | 53 |
| Hammersley Points | 53 |
| Edgeworth Expansions for the Mean | 54 |
| Bootstrap Asymptotics for the Studentized Mean | 56 |
| Exercises | 58 |
| Local Smoothers | 68 |
|---|
| Early Smoothers | 70 |
| Transition to Classical Smoothers | 74 |
| Global Versus Local Approximations | 75 |
| LOESS | 79 |
| Kernel Smoothers | 82 |
| Statistical Function Approximation | 83 |
| The Concept of Kernel Methods and the Discrete Case | 88 |
| Kernels and Stochastic Designs: Density Estimation | 93 |
| Stochastic Designs: Asymptotics for Kernel Smoothers | 96 |
| Convergence Theorems and Rates for Kernel Smoothers | 101 |
| Kernel and Bandwidth Selection | 105 |
| Linear Smoothers | 110 |
| Nearest Neighbors | 111 |
| Applications of Kernel Regression | 115 |
| A Simulated Example | 115 |
| Ethanol Data | 117 |
| Exercises | 122 |
| Spline Smoothing | 132 |
|---|
| Interpolating Splines | 132 |
| Natural Cubic Splines | 138 |
| Smoothing Splines for Regression | 141 |
| Model Selection for Spline Smoothing | 144 |
| Spline Smoothing Meets Kernel Smoothing | 145 |
| Asymptotic Bias, Variance, and MISE for Spline Smoothers | 146 |
| Ethanol Data Example -- Continued | 148 |
| Splines Redux: Hilbert Space Formulation | 151 |
| Reproducing Kernels | 153 |
| Constructing an RKHS | 156 |
| Direct Sum Construction for Splines | 161 |
| Explicit Forms | 164 |
| Nonparametrics in Data Mining and Machine Learning | 167 |
| Simulated Comparisons | 169 |
| What Happens with Dependent Noise Models? | 172 |
| Higher Dimensions and the Curse of Dimensionality | 174 |
| Notes | 178 |
| Sobolev Spaces: Definition | 178 |
| Exercises | 179 |
| New Wave Nonparametrics | 186 |
|---|
| Additive Models | 187 |
| The Backfitting Algorithm | 188 |
| Concurvity and Inference | 192 |
| Nonparametric Optimality | 195 |
| Generalized Additive Models | 196 |
| Projection Pursuit Regression | 199 |
| Neural Networks | 204 |
| Backpropagation and Inference | 207 |
| Barron's Result and the Curse | 212 |
| Approximation Properties | 213 |
| Barron's Theorem: Formal Statement | 215 |
| Recursive Partitioning Regression | 217 |
| Growing Trees | 219 |
| Pruning and Selection | 222 |
| Regression | 223 |
| Bayesian Additive Regression Trees: BART | 225 |
| MARS | 225 |
| Sliced Inverse Regression | 230 |
| ACE and AVAS | 233 |
| Notes | 235 |
| Proof of Barron's Theorem | 235 |
| Exercises | 239 |
| Supervised Learning: Partition Methods | 246 |
|---|
| Multiclass Learning | 248 |
| Discriminant Analysis | 250 |
| Distance-Based Discriminant Analysis | 251 |
| Bayes Rules | 256 |
| Probability-Based Discriminant Analysis | 260 |
| Tree-Based Classifiers | 264 |
| Splitting Rules | 264 |
| Logic Trees | 268 |
| Random Forests | 269 |
| Support Vector Machines | 277 |
| Margins and Distances | 277 |
| Binary Classification and Risk | 280 |
| Prediction Bounds for Function Classes | 283 |
| Constructing SVM Classifiers | 286 |
| SVM Classification for Nonlinearly Separable Populations | 294 |
| SVMs in the General Nonlinear Case | 297 |
| Some Kernels Used in SVM Classification | 303 |
| Kernel Choice, SVMs and Model Selection | 304 |
| Support Vector Regression | 305 |
| Multiclass Support Vector Machines | 308 |
| Neural Networks | 309 |
| Notes | 311 |
| Hoeffding's Inequality | 311 |
| VC Dimension | 312 |
| Exercises | 315 |
| Alternative Nonparametrics | 322 |
|---|
| Ensemble Methods | 323 |
| Bayes Model Averaging | 325 |
| Bagging | 327 |
| Stacking | 331 |
| Boosting | 333 |
| Other Averaging Methods | 341 |
| Oracle Inequalities | 343 |
| Bayes Nonparametrics | 349 |
| Dirichlet Process Priors | 349 |
| Polya Tree Priors | 351 |
| Gaussian Process Priors | 353 |
| The Relevance Vector Machine | 359 |
| RVM Regression: Formal Description | 360 |
| RVM Classification | 364 |
| Hidden Markov Models -- Sequential Classification | 367 |
| Notes | 369 |
| Proof of Yang's Oracle Inequality | 369 |
| Proof
|