: Andreas Krause, Melvin Olson
: The Basics of S-PLUS
: Springer-Verlag
: 9780387283906
: 4
: CHF 90.50
: Sonstiges
: English
: 444
: Wasserzeichen/DRM
: PC/MAC/eReader/Tablet
: PDF

Proven bestseller: almost 6000 copies sold in the U.S. in two editions

New edition updated to cover S-PLUS 6.0

Can be used as an introduction to R, as well as S-PLUS

New exercises have been added; includes a comparison of S-PLUS and R

Well-suited for self-study

7 Exploring Data  (p. 193)

In the preceding chapters, we have laid the foundation for understanding the concepts and ideas of the S-Plus system. We explored basic ideas and how to use S-Plus for performing calculations, and we have seen how data can be generated, stored, and accessed. Furthermore, we also looked at how data can be displayed graphically. All this will be useful as we explore real data sets in this chapter. We will explore data sets that come with S-Plus, specifically the Barley and Geyser data sets.

Rather than presenting a list of available statistical functions, we will go through a typical data analysis as a way of introducing the more useful and common commands and the kind of output we'll encounter. We chose to use S-Plus data sets so you can follow along with the analysis we present and complete the exercises at the end of this chapter. We divide the data analysis into two categories: "descriptive" and "graphical" exploration. Further sections cover distributions and related functions, confirmatory statistics and hypothesis testing, and missing and infinite values.

7.1 Descriptive Data Exploration

We will now explore the different variables contained in the Barley data set. We will first analyze the variables in one dimension, or, in other words, we will take a univariate approach. The analysis of the dependence between the variables and the exploration of higher-dimensional structure follows later.


The Barley Data Set

The Barley data are measurements of yield in bushels per acre at different sites. The analysis comprises 6 sites planting 10 different varieties of barley in 2 successive years, 1931 and 1932. The data set therefore contains 120 measurements of barley yield. Our main goal will be to investigate differences in barley yields given by the different variable constellations, such as the 1931 harvest of the fifth variety on site 4 and the 1932 harvest of the seventh variety at the same site.
Just enter
             > barley
to see the data.
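
The count of 120 measurements follows from the design: 6 sites times 10 varieties times 2 years. As a minimal sketch, the check below confirms this layout. It assumes you are working in R, where the same data are distributed with the lattice package; in S-Plus the data set is available directly and the first line can be dropped. The object names n.sites, n.varieties, n.years, and cell.counts are chosen here for illustration only.

```r
# Assumption: R with the lattice package installed (it ships with standard R).
# In S-Plus, skip this line; barley is already available.
data(barley, package = "lattice")

# Number of levels of each design factor
n.sites     <- length(levels(barley$site))
n.varieties <- length(levels(barley$variety))
n.years     <- length(levels(barley$year))

# Total number of site/variety/year combinations in the design
n.sites * n.varieties * n.years

# Cross-tabulate the three factors: a complete, unreplicated design
# has exactly one measurement in every cell
cell.counts <- table(barley$site, barley$variety, barley$year)
all(cell.counts == 1)
```

If the last expression returns TRUE, every variety was measured once at every site in each of the two years.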

Exploratory data analysis (EDA) is an approach to investigating data that stresses the need to know more about the structure and information inherent in the data. The methods used with this approach are referred to as descriptive, as opposed to confirmatory. Descriptive simply means that simple summaries are used to describe the data: their shapes, sizes, relationships, and the like. Examples of descriptive statistics are means, medians, standard deviations, ranges, and so on.
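
The descriptive statistics just named can be computed for the barley yields in a few lines. This is a sketch only, assuming R with the lattice package as the source of the data (in S-Plus the data set is built in) and assuming the yield column is called yield, as it is in the lattice version. The standard deviation is written as sqrt(var(...)) because the dedicated function is, to our knowledge, named differently in the two systems (stdev in S-Plus, sd in R).

```r
# Assumption: R with the lattice package; in S-Plus, skip this line.
data(barley, package = "lattice")

# Hypothetical helper name for the response variable
yield <- barley$yield

mean(yield)        # arithmetic mean
median(yield)      # middle value of the sorted yields
sqrt(var(yield))   # standard deviation, portable across S-Plus and R
range(yield)       # smallest and largest yield

# Several of these summaries at once
summary(yield)
```

Comparing the mean with the median gives a first hint about the shape of the distribution: a mean well above the median suggests a right-skewed distribution.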

Given the basic information about the Barley data, the following analysis is intended to gain more information and structural knowledge about the numbers we have.

A typical place to begin is, of course, looking at the data. If the data set is small, we can easily look at it simply by printing it out. We check the data size by entering
              > dim(barley)
              [1] 120   4
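
With 120 rows, printing the whole data set is still feasible but no longer convenient. A common alternative, sketched here under the assumption that the data come from the lattice package in R (in S-Plus they are built in), is to look at the dimensions, the variable names, and only the first few rows.

```r
# Assumption: R with the lattice package; in S-Plus, skip this line.
data(barley, package = "lattice")

dim(barley)      # number of rows and columns
names(barley)    # variable (column) names
barley[1:5, ]    # first five rows, all columns
```

Row and column subscripting with square brackets works the same way for any data frame, so the row range can be adjusted freely.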

Preface (p. 6)
Contents (p. 9)
Figures (p. 17)
Tables (p. 21)
1 Introduction (p. 23)
1.1 The History of S and S-Plus (p. 24)
1.2 S-Plus on Different Operating Systems (p. 26)
1.3 Notational Conventions (p. 28)
2 Graphical User Interface (p. 31)
2.1 Introduction (p. 31)
2.2 System Overview (p. 32)
2.2.1 Using a Mouse (p. 33)
2.2.2 Object Explorer (p. 33)
2.2.3 Commands Window (p. 33)
2.2.4 Toolbars (p. 34)
2.2.5 Graph Sheets (p. 34)
2.2.6 Script Window (p. 34)
2.3 Getting Started with the Interface (p. 35)
2.3.1 Importing Data (p. 35)
2.3.2 Graphs (p. 35)
2.3.3 Data and Statistics (p. 37)
2.3.4 Customizing the Toolbars (p. 37)
2.3.5 Chapters (p. 38)
2.4 Detailed Use of the GUI Interface (p. 40)
2.5 Object Explorer (p. 40)
2.6 Help (p. 41)
2.7 Data Export (p. 43)
2.8 Working Directory (p. 45)
2.9 Data Import (p. 46)
2.10 Data Summaries (p. 49)
2.11 Graphs (p. 51)
2.12 Trellis Graphs (p. 58)
2.13 Linear Regression (p. 60)
2.14 PowerPoint (Windows Only) (p. 64)
2.15 Excel (Windows Only) (p. 66)
2.16 Script Window (p. 67)
2.17 UNIX/Linux GUI (p. 69)
2.18 Summary (p. 78)
2.19 Exercises (p. 79)
2.20 Solutions (p. 80)
3 A First Session (p. 95)
3.1 General Information (p. 95)
3.1.1 Starting and Quitting (p. 96)
3.1.2 The Help System (p. 97)
3.1.3 Before Beginning (p. 97)
3.2 Simple Structures (p. 98)
3.2.1 Arithmetic Operators (p. 98)
3.2.2 Assignments (p. 99)
3.2.3 The Concatenate Command: c (p. 101)
3.2.4 The Sequence Command: seq (p. 102)
3.2.5 The Replicate Command: rep (p. 103)
3.3 Mathematical Operations (p. 104)
3.4 Use of Brackets (p. 106)
3.5 Logical Values (p. 107)
3.6 Review (p. 110)
3.7 Exercises (p. 113)
3.8 Solutions (p. 114)
4 A Second Session (p. 117)
4.1 Constructing and Manipulating Data (p. 117)
4.1.1 Matrices (p. 118)
4.1.2 Arrays (p. 123)
4.1.3 Data Frames (p. 126)
4.1.4 Lists (p. 129)
4.2 Introduction to Functions (p. 130)
4.3 Introduction to Missing Values (p. 131)
4.4 Merging Data (p. 132)
4.5 Putting It All Together (p. 133)
4.6 Exercises (p. 136)
4.7 Solutions (p. 138)
5 Graphics (p. 147)
5.1 Basic Graphics Commands (p. 147)
5.2 Graphics Devices (p. 148)
5.2.1 Working with Multiple Graphics Devices (p. 150)
5.3 Plotting Data (p. 150)
5.3.1 The plot Command (p. 151)
5.3.2 Modifying the Data Display (p. 152)
5.3.3 Modifying Figure Elements (p. 153)
5.4 Adding Elements to Existing Plots (p. 155)
5.4.