E-book download from MyEbooks.ch

Handbook of Floating-Point Arithmetic

Authors:	Jean-Michel Muller, Nicolas Brisebarre, Florent de Dinechin, Claude-Pierre Jeannerod, Vincent Lefèvre
Title:	Handbook of Floating-Point Arithmetic
Publisher:	Birkhäuser Basel
ISBN:	9780817647056
Edition:	1
Price:	CHF 142.40

Category:	Probability Theory, Stochastics, Mathematical Statistics
Language:	English

Pages:	579
Copy protection:	Watermark/DRM
Devices:	PC/Mac/eReader/Tablet
Format:	PDF

Floating-point arithmetic is the most widely used way of implementing real-number arithmetic on modern computers. However, making such an arithmetic reliable and portable, yet fast, is a very difficult task. As a result, floating-point arithmetic is far from being exploited to its full potential. This handbook aims to provide a complete overview of modern floating-point arithmetic. So that the techniques presented can be put directly into practice in actual coding or design, they are illustrated, whenever possible, by a corresponding program.

The handbook is designed for programmers of numerical applications, compiler designers, programmers of floating-point algorithms, designers of arithmetic operators, and more generally, students and researchers in numerical analysis who wish to better understand a tool used in their daily work and research.

	Preface	14
	List of Figures	16
	List of Tables	19
	I Introduction, Basic Definitions, and Standards	22
	1 Introduction	23
	1.1 Some History	23
	1.2 Desirable Properties	26
	1.3 Some Strange Behaviors	27
	1.3.1 Some famous bugs	27
	1.3.2 Difficult problems	28
	2 Definitions and Basic Notions	33
	2.1 Floating-Point Numbers	33
	2.2 Rounding	40
	2.2.1 Rounding modes	40
	2.2.2 Useful properties	42
	2.2.3 Relative error due to rounding	43
	2.3 Exceptions	45
	2.4 Lost or Preserved Properties of the Arithmetic on the Real Numbers	47
	2.5 Note on the Choice of the Radix	49
	2.5.1 Representation errors	49
	2.5.2 A case for radix 10	50
	2.6 Tools for Manipulating Floating-Point Errors	52
	2.6.1 The ulp function	52
	2.6.2 Errors in ulps and relative errors	57
	2.6.3 An example: iterated products	57
	2.6.4 Unit roundoff	59
	2.7 Note on Radix Conversion	60
	2.7.1 Conditions on the formats	60
	2.7.2 Conversion algorithms	63
	2.8 The Fused Multiply-Add (FMA) Instruction	71
	2.9 Interval Arithmetic	71
	2.9.1 Intervals with floating-point bounds	72
	2.9.2 Optimized rounding	72
	3 Floating-Point Formats and Environment	74
	3.1 The IEEE 754-1985 Standard	75
	3.1.1 Formats specified by IEEE 754-1985	75
	3.1.2 Little-endian, big-endian	79
	3.1.3 Rounding modes specified by IEEE 754-1985	80
	3.1.4 Operations specified by IEEE 754-1985	81
	3.1.5 Exceptions specified by IEEE 754-1985	85
	3.1.6 Special values	88
	3.2 The IEEE 854-1987 Standard	89
	3.2.1 Constraints internal to a format	89
	3.2.2 Various formats and the constraints between them	90
	3.2.3 Conversions between floating-point numbers and decimal strings	91
	3.2.4 Rounding	92
	3.2.5 Operations	92
	3.2.6 Comparisons	93
	3.2.7 Exceptions	93
	3.3 The Need for a Revision	93
	3.3.1 A typical problem: "double rounding"	94
	3.3.2 Various ambiguities	96
	3.4 The New IEEE 754-2008 Standard	98
	3.4.1 Formats specified by the revised standard	99
	3.4.2 Binary interchange format encodings	100
	3.4.3 Decimal interchange format encodings	101
	3.4.4 Larger formats	111
	3.4.5 Extended and extendable precisions	111
	3.4.6 Attributes	112
	3.4.7 Operations specified by the standard	116
	3.4.8 Comparisons	118
	3.4.9 Conversions	118
	3.4.10 Default exception handling	119
	3.4.11 Recommended transcendental functions	122
	3.5 Floating-Point Hardware in Current Processors	123
	3.5.1 The common hardware denominator	123
	3.5.2 Fused multiply-add	123
	3.5.3 Extended precision	123
	3.5.4 Rounding and precision control	124
	3.5.5 SIMD instructions	125
	3.5.6 Floating-point on x86 processors: SSE2 versus x87	125
	3.5.7 Decimal arithmetic	126
	3.6 Floating-Point Hardware in Recent Graphics Processing Units	127
	3.7 Relations with Programming Languages	128
	3.7.1 The Language Independent Arithmetic (LIA) standard	128
	3.7.2 Programming languages	129
	3.8 Checking the Environment	129
	3.8.1 MACHAR	130
	3.8.2 Paranoia	130
	3.8.3 UCBTest	134
	3.8.4 TestFloat	135
	3.8.5 IeeeCC754	135
	3.8.6 Miscellaneous	135
	II Cleverly Using Floating-Point Arithmetic	136
	4 Basic Properties and Algorithms	137
	4.1 Testing the Computational Environment	137
	4.1.1 Computing the radix	137
	4.1.2 Computing the precision	139
	4.2 Exact Operations	140
	4.2.1 Exact addition	140
	4.2.2 Exact multiplications and divisions	142
	4.3 Accurate Computations of Sums of Two Numbers	143
	4.3.1 The Fast2Sum algorithm	144
	4.3.2 The 2Sum algorithm	147
	4.3.3 If we do not use rounding to nearest	149
	4.4 Computation of Products	150
	4.4.1 Veltkamp splitting	150
	4.4.2 Dekker's multiplication algorithm	153
	4.5 Complex Numbers	157
	4.5.1 Various error bounds	158
	4.5.2 Error bound for complex multiplication	159
	4.5.3 Complex division	162
	4.5.4 Complex square root	167
	5 The Fused Multiply-Add Instruction	169
	5.1 The 2MultFMA Algorithm	170
	5.2 Computation of Residuals of Division and Square Root	171
	5.3 Newton–Raphson-Based Division with an FMA	173
	5.3.1 Variants of the Newton–Raphson iteration	173
	5.3.2 Using the Newton–Raphson iteration for correctly rounded division	178
	5.4 Newton–Raphson-Based Square Root with an FMA	185
	5.4.1 The basic iterations	185
	5.4.2 Using the Newton–Raphson iteration for correctly rounded square roots	186
	5.5 Multiplication by an Arbitrary-Precision Constant	189
	5.5.1 Checking for a given constant C if Algorithm 5.2 will always work	190
	5.6 Evaluation of the Error of an FMA	193
	5.7 Evaluation of Integer Powers	195
	6 Enhanced Floating-Point Sums, Dot Products, and Polynomial Values	198
	6.1 Preliminaries	199
	6.1.1 Floating-point arithmetic models	200
	6.1.2 Notation for error analysis and classical error estimates	201
	6.1.3 Properties for deriving validated running error bounds	204
	6.2 Computing Validated Running Error Bounds	205
	6.3 Computing Sums More Accurately	207
	6.3.1 Reordering the operands, and a bit more	207
	6.3.2 Compensated sums	209
	6.3.3 Implementing a "long accumulator"	216
	6.3.4 On the sum of three floating-point numbers	216
	6.4 Compensated Dot Products	218
	6.5 Compensated Polynomial Evaluation	220
	7 Languages and Compilers	222
	7.1 A Play with Many Actors	222
	7.1.1 Floating-point evaluation in programming languages	223
	7.1.2 Processors, compilers, and operating systems	225
	7.1.3 In the hands of the programmer	226
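Sections 4.3 and 4.4 in the contents above cover the classical error-free transformations (Fast2Sum, 2Sum, Veltkamp splitting, Dekker's multiplication). As a rough illustration only, and not code taken from the book, the following Python sketch implements these textbook algorithms. It assumes that Python floats are IEEE 754 binary64 numbers (precision 53) with round-to-nearest arithmetic, as on mainstream CPython platforms, and that no overflow or underflow occurs. The function names are hypothetical.

```python
from fractions import Fraction

# Assumptions: binary64 floats (p = 53), round-to-nearest, no overflow/underflow.
# Function names are illustrative, not taken from the book.

def two_sum(a: float, b: float) -> tuple[float, float]:
    """2Sum (Knuth/Møller): s = RN(a + b) and e such that s + e == a + b exactly."""
    s = a + b
    a_approx = s - b
    b_approx = s - a_approx
    delta_a = a - a_approx
    delta_b = b - b_approx
    e = delta_a + delta_b
    return s, e

def fast_two_sum(a: float, b: float) -> tuple[float, float]:
    """Fast2Sum (Dekker): same result as two_sum, but requires |a| >= |b|."""
    s = a + b
    z = s - a
    e = b - z
    return s, e

def veltkamp_split(x: float) -> tuple[float, float]:
    """Veltkamp splitting: x == hi + lo, each part fitting in about half the significand."""
    c = 2.0**27 + 1.0  # constant 2^s + 1 with s = ceil(53 / 2) = 27
    gamma = c * x
    delta = x - gamma
    hi = gamma + delta
    lo = x - hi
    return hi, lo

def dekker_two_prod(a: float, b: float) -> tuple[float, float]:
    """Dekker's product: p = RN(a * b) and e such that p + e == a * b exactly."""
    p = a * b
    a_hi, a_lo = veltkamp_split(a)
    b_hi, b_lo = veltkamp_split(b)
    # The partial products of the half-width parts are exact; summing them
    # in this order recovers the rounding error of a * b.
    e = ((a_hi * b_hi - p) + a_hi * b_lo + a_lo * b_hi) + a_lo * b_lo
    return p, e

if __name__ == "__main__":
    a, b = 0.1, 0.2
    s, e = two_sum(a, b)
    # Exact rational arithmetic confirms that no information was lost.
    assert Fraction(s) + Fraction(e) == Fraction(a) + Fraction(b)
    print(s, e)
```

Fast2Sum saves three operations compared with 2Sum, at the price of the precondition |a| >= |b|; 2Sum works for any ordering of its operands. Checking the results with `Fraction` avoids relying on floating-point comparisons to validate floating-point code.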