| Contents at a Glance | 5 |
|---|---|
| Contents | 6 |
| About the Authors | 17 |
| About the Technical Reviewer | 19 |
| Acknowledgments | 20 |
| Chapter 1: Introduction to Machine Learning and R | 21 |
| 1.1 Understanding the Evolution | 22 |
| 1.1.1 Statistical Learning | 22 |
| 1.1.2 Machine Learning (ML) | 23 |
| 1.1.3 Artificial Intelligence (AI) | 23 |
| 1.1.4 Data Mining | 24 |
| 1.1.5 Data Science | 25 |
| 1.2 Probability and Statistics | 26 |
| 1.2.1 Counting and Probability Definition | 27 |
| 1.2.2 Events and Relationships | 29 |
| 1.2.2.1 Independent Events | 29 |
| 1.2.2.2 Conditional Independence | 30 |
| 1.2.2.3 Bayes Theorem | 30 |
| 1.2.3 Randomness, Probability, and Distributions | 32 |
| 1.2.4 Confidence Interval and Hypothesis Testing | 33 |
| 1.2.4.1 Confidence Interval | 34 |
| 1.2.4.2 Hypothesis Testing | 35 |
| 1.3 Getting Started with R | 38 |
| 1.3.1 Basic Building Blocks | 38 |
| 1.3.1.1 Calculations | 38 |
| 1.3.1.2 Statistics with R | 39 |
| 1.3.1.3 Packages | 39 |
| 1.3.2 Data Structures in R | 39 |
| 1.3.2.1 Vectors | 40 |
| 1.3.2.2 List | 40 |
| 1.3.2.3 Matrix | 40 |
| 1.3.2.4 Data Frame | 41 |
| 1.3.3 Subsetting | 41 |
| 1.3.3.1 Vectors | 41 |
| 1.3.3.2 Lists | 42 |
| 1.3.3.3 Matrixes | 42 |
| 1.3.3.4 Data Frames | 43 |
| 1.3.4 Functions and Apply Family | 43 |
| 1.4 Machine Learning Process Flow | 46 |
| 1.4.1 Plan | 46 |
| 1.4.2 Explore | 46 |
| 1.4.3 Build | 47 |
| 1.4.4 Evaluate | 47 |
| 1.5 Other Technologies | 48 |
| 1.6 Summary | 48 |
| 1.7 References | 48 |
| Chapter 2: Data Preparation and Exploration | 50 |
| 2.1 Planning the Gathering of Data | 51 |
| 2.1.1 Variable Types | 51 |
| 2.1.1.1 Categorical Variables | 51 |
| 2.1.1.2 Continuous Variables | 52 |
| 2.1.2 Data Formats | 52 |
| 2.1.2.1 Comma-Separated Values | 53 |
| 2.1.2.2 Microsoft Excel | 53 |
| 2.1.2.3 Extensible Markup Language: XML | 53 |
| 2.1.2.4 Hypertext Markup Language: HTML | 55 |
| 2.1.2.5 JSON | 57 |
| 2.1.2.6 Other Formats | 59 |
| 2.1.3 Data Sources | 59 |
| 2.1.3.1 Structured | 59 |
| 2.1.3.2 Semi-Structured | 59 |
| 2.1.3.3 Unstructured | 59 |
| 2.2 Initial Data Analysis (IDA) | 60 |
| 2.2.1 Discerning a First Look | 60 |
| 2.2.1.1 Function str() | 60 |
| 2.2.1.2 Naming Convention: make.names() | 61 |
| 2.2.1.3 Table(): Pattern or Trend | 62 |
| 2.2.2 Organizing Multiple Sources of Data into One | 62 |
| 2.2.2.1 Merge and dplyr Joins | 62 |
| 2.2.2.1.1 Using merge | 63 |
| 2.2.2.1.2 dplyr | 64 |
| 2.2.3 Cleaning the Data | 65 |
| 2.2.3.1 Correcting Factor Variables | 65 |
| 2.2.3.2 Dealing with NAs | 66 |
| 2.2.3.3 Dealing with Dates and Times | 67 |
| 2.2.3.3.1 Time Zone | 68 |
| 2.2.3.3.2 Daylight Savings Time | 68 |
| 2.2.4 Supplementing with More Information | 68 |