| Section | Page |
|---|---|
| Foreword | 6 |
| Preface | 9 |
| Acknowledgments | 11 |
| Contents | 12 |
| Part I Introduction | 20 |
| 1 Introduction to AI and ML | 21 |
| 1.1 Introduction | 21 |
| 1.2 What Is AI | 22 |
| 1.3 What Is ML | 22 |
| 1.4 Organization of the Book | 23 |
| 1.4.1 Introduction | 23 |
| 1.4.2 Machine Learning | 23 |
| 1.4.3 Building End to End Pipelines | 24 |
| 1.4.4 Artificial Intelligence | 24 |
| 1.4.5 Implementations | 24 |
| 1.4.6 Conclusion | 25 |
| 2 Essential Concepts in Artificial Intelligence and Machine Learning | 26 |
| 2.1 Introduction | 26 |
| 2.2 Big Data and Not-So-Big Data | 26 |
| 2.2.1 What Is Big Data | 26 |
| 2.2.2 Why Should We Treat Big Data Differently? | 27 |
| 2.3 Types of Learning | 27 |
| 2.3.1 Supervised Learning | 27 |
| 2.3.2 Unsupervised Learning | 28 |
| 2.3.3 Reinforcement Learning | 28 |
| 2.4 Machine Learning Methods Based on Time | 28 |
| 2.4.1 Static Learning | 28 |
| 2.4.2 Dynamic Learning | 29 |
| 2.5 Dimensionality | 29 |
| 2.5.1 Curse of Dimensionality | 30 |
| 2.6 Linearity and Nonlinearity | 30 |
| 2.7 Occam's Razor | 35 |
| 2.8 No Free Lunch Theorem | 35 |
| 2.9 Law of Diminishing Returns | 36 |
| 2.10 Early Trends in Machine Learning | 36 |
| 2.10.1 Expert Systems | 36 |
| 2.11 Conclusion | 37 |
| 3 Data Understanding, Representation, and Visualization | 38 |
| 3.1 Introduction | 38 |
| 3.2 Understanding the Data | 38 |
| 3.2.1 Understanding Entities | 39 |
| 3.2.2 Understanding Attributes | 39 |
| 3.2.3 Understanding Data Types | 41 |
| 3.3 Representation and Visualization of the Data | 41 |
| 3.3.1 Principal Component Analysis | 41 |
| 3.3.2 Linear Discriminant Analysis | 44 |
| 3.4 Conclusion | 46 |
| Part II Machine Learning | 47 |
| 4 Linear Methods | 48 |
| 4.1 Introduction | 48 |
| 4.2 Linear and Generalized Linear Models | 49 |
| 4.3 Linear Regression | 49 |
| 4.3.1 Defining the Problem | 49 |
| 4.3.2 Solving the Problem | 50 |
| 4.4 Regularized Linear Regression | 51 |
| 4.4.1 Regularization | 51 |
| 4.4.2 Ridge Regression | 51 |
| 4.4.3 Lasso Regression | 52 |
| 4.5 Generalized Linear Models (GLM) | 52 |
| 4.5.1 Logistic Regression | 52 |
| 4.6 k-Nearest Neighbor (KNN) Algorithm | 53 |
| 4.6.1 Definition of KNN | 53 |
| 4.6.2 Classification and Regression | 55 |
| 4.6.3 Other Variations of KNN | 55 |
| 4.7 Conclusion | 56 |
| 5 Perceptron and Neural Networks | 57 |
| 5.1 Introduction | 57 |
| 5.2 Perceptron | 57 |
| 5.3 Multilayered Perceptron or Artificial Neural Network | 58 |
| 5.3.1 Feedforward Operation | 58 |
| 5.3.2 Nonlinear MLP or Nonlinear ANN | 59 |
| 5.3.2.1 Activation Functions | 59 |
| 5.3.3 Training MLP | 59 |
| 5.3.3.1 Online or Stochastic Learning | 61 |
| 5.3.3.2 Batch Learning | 61 |
| 5.3.4 Hidden Layers | 62 |
| 5.4 Radial Basis Function Networks | 62 |
| 5.4.1 Interpretation of RBF Networks | 63 |
| 5.5 Overfitting and Regularization | 64 |
| 5.5.1 L1 and L2 Regularization | 64 |
| 5.5.2 Dropout Regularization | 65 |
| 5.6 Conclusion | 65 |
| 6 Decision Trees | 66 |
| 6.1 Introduction | 66 |
| 6.2 Why Decision Trees? | 67 |
| 6.2.1 Types of Decision Trees | 67 |
| 6.3 Algorithms for Building Decision Trees | 67 |
| 6.4 Regression Tree | 68 |
| 6.5 Classification Tree | 70 |
| 6.6 Decision Metrics | 70 |
| 6.6.1 Misclassification Error | 70 |
| 6.6.2 Gini Index | 70 |
| 6.6.3 Cross-Entropy or Deviance | 71 |
| 6.7 CHAID | 71 |
| 6.7.1 CHAID Algorithm | 72 |