Author: Jean-Michel Muller, Nicolas Brisebarre, Florent de Dinechin, Claude-Pierre Jeannerod, Vincent Lefèvr
Title: Handbook of Floating-Point Arithmetic
Publisher: Birkhäuser Basel
ISBN: 9780817647056
Edition: 1
Price: CHF 142.40
Category: Probability Theory, Stochastics, Mathematical Statistics
Language: English
Pages: 579
Copy protection: Watermark/DRM
Devices: PC/MAC/eReader/Tablet
Format: PDF

Floating-point arithmetic is the most widely used way of implementing real-number arithmetic on modern computers. However, making such an arithmetic reliable and portable, yet fast, is a very difficult task. As a result, floating-point arithmetic is far from being exploited to its full potential. This handbook aims to provide a complete overview of modern floating-point arithmetic. So that the techniques presented can be put directly into practice in actual coding or design, they are illustrated, whenever possible, by a corresponding program.

The handbook is designed for programmers of numerical applications, compiler designers, programmers of floating-point algorithms, designers of arithmetic operators, and more generally, students and researchers in numerical analysis who wish to better understand a tool used in their daily work and research.

Preface
List of Figures
List of Tables
I Introduction, Basic Definitions, and Standards
1 Introduction
1.1 Some History
1.2 Desirable Properties
1.3 Some Strange Behaviors
1.3.1 Some famous bugs
1.3.2 Difficult problems
2 Definitions and Basic Notions
2.1 Floating-Point Numbers
2.2 Rounding
2.2.1 Rounding modes
2.2.2 Useful properties
2.2.3 Relative error due to rounding
2.3 Exceptions
2.4 Lost or Preserved Properties of the Arithmetic on the Real Numbers
2.5 Note on the Choice of the Radix
2.5.1 Representation errors
2.5.2 A case for radix 10
2.6 Tools for Manipulating Floating-Point Errors
2.6.1 The ulp function
2.6.2 Errors in ulps and relative errors
2.6.3 An example: iterated products
2.6.4 Unit roundoff
2.7 Note on Radix Conversion
2.7.1 Conditions on the formats
2.7.2 Conversion algorithms
2.8 The Fused Multiply-Add (FMA) Instruction
2.9 Interval Arithmetic
2.9.1 Intervals with floating-point bounds
2.9.2 Optimized rounding
3 Floating-Point Formats and Environment
3.1 The IEEE 754-1985 Standard
3.1.1 Formats specified by IEEE 754-1985
3.1.2 Little-endian, big-endian
3.1.3 Rounding modes specified by IEEE 754-1985
3.1.4 Operations specified by IEEE 754-1985
3.1.5 Exceptions specified by IEEE 754-1985
3.1.6 Special values
3.2 The IEEE 854-1987 Standard
3.2.1 Constraints internal to a format
3.2.2 Various formats and the constraints between them
3.2.3 Conversions between floating-point numbers and decimal strings
3.2.4 Rounding
3.2.5 Operations
3.2.6 Comparisons
3.2.7 Exceptions
3.3 The Need for a Revision
3.3.1 A typical problem: "double rounding"
3.3.2 Various ambiguities
3.4 The New IEEE 754-2008 Standard
3.4.1 Formats specified by the revised standard
3.4.2 Binary interchange format encodings
3.4.3 Decimal interchange format encodings
3.4.4 Larger formats
3.4.5 Extended and extendable precisions
3.4.6 Attributes
3.4.7 Operations specified by the standard
3.4.8 Comparisons
3.4.9 Conversions
3.4.10 Default exception handling
3.4.11 Recommended transcendental functions
3.5 Floating-Point Hardware in Current Processors
3.5.1 The common hardware denominator
3.5.2 Fused multiply-add
3.5.3 Extended precision
3.5.4 Rounding and precision control
3.5.5 SIMD instructions
3.5.6 Floating-point on x86 processors: SSE2 versus x87
3.5.7 Decimal arithmetic
3.6 Floating-Point Hardware in Recent Graphics Processing Units
3.7 Relations with Programming Languages
3.7.1 The Language Independent Arithmetic (LIA) standard
3.7.2 Programming languages
3.8 Checking the Environment
3.8.1 MACHAR
3.8.2 Paranoia
3.8.3 UCBTest
3.8.4 TestFloat
3.8.5 IeeeCC754
3.8.6 Miscellaneous
II Cleverly Using Floating-Point Arithmetic
4 Basic Properties and Algorithms
4.1 Testing the Computational Environment
4.1.1 Computing the radix
4.1.2 Computing the precision
4.2 Exact Operations
4.2.1 Exact addition
4.2.2 Exact multiplications and divisions
4.3 Accurate Computations of Sums of Two Numbers
4.3.1 The Fast2Sum algorithm
4.3.2 The 2Sum algorithm
4.3.3 If we do not use rounding to nearest
4.4 Computation of Products
4.4.1 Veltkamp splitting
4.4.2 Dekker's multiplication algorithm
4.5 Complex numbers
4.5.1 Various error bounds
4.5.2 Error bound for complex multiplication
4.5.3 Complex division
4.5.4 Complex square root
5 The Fused Multiply-Add Instruction
5.1 The 2MultFMA Algorithm
5.2 Computation of Residuals of Division and Square Root
5.3 Newton–Raphson-Based Division with an FMA
5.3.1 Variants of the Newton–Raphson iteration
5.3.2 Using the Newton–Raphson iteration for correctly rounded division
5.4 Newton–Raphson-Based Square Root with an FMA
5.4.1 The basic iterations
5.4.2 Using the Newton–Raphson iteration for correctly rounded square roots
5.5 Multiplication by an Arbitrary-Precision Constant
5.5.1 Checking for a given constant C if Algorithm 5.2 will always work
5.6 Evaluation of the Error of an FMA
5.7 Evaluation of Integer Powers
6 Enhanced Floating-Point Sums, Dot Products, and Polynomial Values
6.1 Preliminaries
6.1.1 Floating-point arithmetic models
6.1.2 Notation for error analysis and classical error estimates
6.1.3 Properties for deriving validated running error bounds
6.2 Computing Validated Running Error Bounds
6.3 Computing Sums More Accurately
6.3.1 Reordering the operands, and a bit more
6.3.2 Compensated sums
6.3.3 Implementing a "long accumulator"
6.3.4 On the sum of three floating-point numbers
6.4 Compensated Dot Products
6.5 Compensated Polynomial Evaluation
7 Languages and Compilers
7.1 A Play with Many Actors
7.1.1 Floating-point evaluation in programming languages
7.1.2 Processors, compilers, and operating systems
7.1.3 In the hands of the programmer