: Karthik Ramasubramanian, Abhishek Singh
: Machine Learning Using R
: Apress
: 9781484223345
: 1
: CHF 38.00
:
: Computer Science
: English
: 580
: Watermark/DRM
: PC/MAC/eReader/Tablet
: PDF

This book is inspired by the Machine Learning Model Building Process Flow, which gives the reader the ability to understand an ML algorithm and apply the entire process of building an ML model from raw data.

This new paradigm of teaching Machine Learning will bring about a radical change in perception for many of those who think this subject is difficult to learn. Though the theory sometimes looks difficult, especially when heavy mathematics is involved, the seamless flow from theoretical aspects to example-driven learning provided in this book makes it easy to connect the dots.

For every Machine Learning algorithm covered in this book, a 3-D approach of theory, case study, and practice is given. Where appropriate, the mathematics is explained through visualization in R.

All practical demonstrations will be explored in R, a powerful programming language and software environment for statistical computing and graphics. The various packages and methods available in R will be used to explain the topics. In the end, readers will learn some of the latest technological advancements in building a scalable machine learning model with Big Data.


Who This Book is For:


What you will learn: 

1. ML model building process flow
2. Theoretical aspects of Machine Learning
3. Industry-based case study
4. Example-based understanding of ML algorithms using R
5. Building ML models using Apache Hadoop and Spark



Karthik Ramasubramanian works for Hike Messenger, one of the largest and fastest-growing technology unicorns in India. He brings the best of business analytics and data science experience to his role at Hike Messenger. In his 7 years of research and industry experience, he has worked on cross-industry data science problems in retail, e-commerce, and technology, developing and prototyping data-driven solutions. In his previous role at Snapdeal, one of the largest e-commerce retailers in India, he led core statistical modelling initiatives for customer growth and pricing analytics. Prior to Snapdeal, he was part of the central database team, managing the data warehouses for global business applications of Reckitt Benckiser (RB). He has rich experience working with scalable machine learning solutions for industry, including sophisticated graph networks and self-learning neural networks. He has a Master's in Theoretical Computer Science from PSG College of Technology, Anna University, and is a certified big data professional. He is passionate about teaching and mentoring future data scientists through different online and public forums. He enjoys writing poems in his leisure time and is an avid traveler.

Abhishek Singh is a Data Scientist in the Advanced Data Science team of Prudential Financial Inc., the second-largest life insurance provider in the US, and is based in Ireland. He has 5 years of professional and academic experience in data science, spanning consulting, teaching, and financial services. At Deloitte Advisory, he led Risk Analytics initiatives for top US banks in their regulatory risk, credit risk, and balance sheet modelling requirements. In his current role, he is working on scalable machine learning algorithms for the Individual Life Insurance business of Prudential. He has working experience in time series models and has worked with cross-functional teams to implement data science solutions in enterprise infrastructure. He has been an active trainer at Deloitte Professional University and has led training and development initiatives for professionals in the areas of statistics, economics, financial risk, and data science tools (SAS and R). He holds a B.Tech. in Mathematics and Computing from the Indian Institute of Technology, Guwahati, and an MBA from the Indian Institute of Management, Bangalore. He speaks at public events on data science and works with leading universities towards bringing data science skills to graduates. He has a keen interest in law and holds a Post Graduate Diploma in Cyber Law from NALSAR University. He enjoys cooking and photography in his free time.

Contents at a Glance
Contents
About the Authors
About the Technical Reviewer
Acknowledgments
Chapter 1: Introduction to Machine Learning and R
1.1 Understanding the Evolution
1.1.1 Statistical Learning
1.1.2 Machine Learning (ML)
1.1.3 Artificial Intelligence (AI)
1.1.4 Data Mining
1.1.5 Data Science
1.2 Probability and Statistics
1.2.1 Counting and Probability Definition
1.2.2 Events and Relationships
1.2.2.1 Independent Events
1.2.2.2 Conditional Independence
1.2.2.3 Bayes Theorem
1.2.3 Randomness, Probability, and Distributions
1.2.4 Confidence Interval and Hypothesis Testing
1.2.4.1 Confidence Interval
1.2.4.2 Hypothesis Testing
1.3 Getting Started with R
1.3.1 Basic Building Blocks
1.3.1.1 Calculations
1.3.1.2 Statistics with R
1.3.1.3 Packages
1.3.2 Data Structures in R
1.3.2.1 Vectors
1.3.2.2 List
1.3.2.3 Matrix
1.3.2.4 Data Frame
1.3.3 Subsetting
1.3.3.1 Vectors
1.3.3.2 Lists
1.3.3.3 Matrixes
1.3.3.4 Data Frames
1.3.4 Functions and Apply Family
1.4 Machine Learning Process Flow
1.4.1 Plan
1.4.2 Explore
1.4.3 Build
1.4.4 Evaluate
1.5 Other Technologies
1.6 Summary
1.7 References
Chapter 2: Data Preparation and Exploration
2.1 Planning the Gathering of Data
2.1.1 Variables Types
2.1.1.1 Categorical Variables
2.1.1.2 Continuous Variables
2.1.2 Data Formats
2.1.2.1 Comma-Separated Values
2.1.2.2 Microsoft Excel
2.1.2.3 Extensible Markup Language: XML
2.1.2.4 Hypertext Markup Language: HTML
2.1.2.5 JSON
2.1.2.6 Other Formats
2.1.3 Data Sources
2.1.3.1 Structured
2.1.3.2 Semi-Structured
2.1.3.3 Unstructured
2.2 Initial Data Analysis (IDA)
2.2.1 Discerning a First Look
2.2.1.1 Function str()
2.2.1.2 Naming Convention: make.names()
2.2.1.3 Table(): Pattern or Trend
2.2.2 Organizing Multiple Sources of Data into One
2.2.2.1 Merge and dplyr Joins
2.2.2.1.1 Using merge
2.2.2.1.2 dplyr
2.2.3 Cleaning the Data
2.2.3.1 Correcting Factor Variables
2.2.3.2 Dealing with NAs
2.2.3.3 Dealing with Dates and Times
2.2.3.3.1 Time Zone
2.2.3.3.2 Daylight Savings Time
2.2.4 Supplementing with More Information