| Preface | 5 |
|---|
| Contents | 9 |
|---|
| 1 Background to Research | 15 |
|---|
| 1.1 Introduction | 15 |
| 1.2 The DFT and Its Efficient Computation | 16 |
| 1.3 Twentieth Century Developments of the FFT | 18 |
| 1.4 The DHT and Its Relation to the DFT | 20 |
| 1.5 Attractions of Computing the Real-Data DFT via the FHT | 21 |
| 1.6 Modern Hardware-Based Parallel Computing Technologies | 22 |
| 1.7 Hardware-Based Arithmetic Units | 23 |
| 1.8 Performance Metrics | 24 |
| 1.9 Basic Definitions | 25 |
| 1.10 Organization of the Monograph | 26 |
| References | 27 |
| 2 Fast Solutions to Real-Data Discrete Fourier Transform | 29 |
|---|
| 2.1 Introduction | 29 |
| 2.2 Real-Data FFT Algorithms | 30 |
| 2.2.1 The Bergland Algorithm | 30 |
| 2.2.2 The Brunn Algorithm | 32 |
| 2.3 Real-From-Complex Strategies | 33 |
| 2.3.1 Computing One Real-Data DFT via One Full-Length Complex-Data FFT | 34 |
| 2.3.2 Computing Two Real-Data DFTs via One Full-Length Complex-Data FFT | 34 |
| 2.3.3 Computing One Real-Data DFT via One Half-Length Complex-Data FFT | 36 |
| 2.4 Data Re-ordering | 37 |
| 2.5 Discussion | 38 |
| References | 39 |
| 3 The Discrete Hartley Transform | 40 |
|---|
| 3.1 Introduction | 40 |
| 3.2 Normalization of DHT Outputs | 41 |
| 3.3 Decomposition into Even and Odd Components | 42 |
| 3.4 Connecting Relations Between DFT and DHT | 42 |
| 3.4.1 Real-Data DFT | 43 |
| 3.4.2 Complex-Data DFT | 43 |
| 3.5 Fundamental Theorems for DFT and DHT | 44 |
| 3.5.1 Reversal Theorem | 45 |
| 3.5.2 Addition Theorem | 46 |
| 3.5.3 Shift Theorem | 47 |
| 3.5.4 Convolution Theorem | 47 |
| 3.5.5 Product Theorem | 48 |
| 3.5.6 Autocorrelation Theorem | 48 |
| 3.5.7 First Derivative Theorem | 48 |
| 3.5.8 Second Derivative Theorem | 49 |
| 3.5.9 Summary of Theorems | 49 |
| 3.6 Fast Solutions to DHT | 50 |
| 3.7 Accuracy Considerations | 52 |
| 3.8 Discussion | 52 |
| References | 53 |
| 4 Derivation of the Regularized Fast Hartley Transform | 54 |
|---|
| 4.1 Introduction | 54 |
| 4.2 Derivation of the Conventional Radix-4 Butterfly Equations | 55 |
| 4.3 Single-to-Double Conversion of the Radix-4 Butterfly Equations | 58 |
| 4.4 Radix-4 Factorization of the FHT | 59 |
| 4.5 Closed-Form Expression for Generic Radix-4 Double Butterfly | 61 |
| 4.5.1 Twelve-Multiplier Version of Generic Double Butterfly | 67 |
| 4.5.2 Nine-Multiplier Version of Generic Double Butterfly | 67 |
| 4.6 Trigonometric Coefficient Storage, Accession and Generation | 69 |
| 4.6.1 Minimum-Arithmetic Addressing Scheme | 70 |
| 4.6.2 Minimum-Memory Addressing Scheme | 70 |
| 4.6.3 Trigonometric Coefficient Generation via Trigonometric Identities | 71 |
| 4.7 Comparative Complexity Analysis with Existing FFT Designs | 72 |
| 4.8 Scaling Considerations for Fixed-Point Implementation | 74 |
| 4.9 Discussion | 75 |
| References | 76 |
| 5 Algorithm Design for Hardware-Based Computing Technologies | 77 |
|---|
| 5.1 Introduction | 77 |
| 5.2 The Fundamental Properties of FPGA and ASIC Devices | 78 |
| 5.3 Low-Power Design Techniques | 79 |
| 5.3.1 Clock Frequency | 80 |
| 5.3.2 Silicon Area | 80 |
| 5.3.3 Switching Frequency | 82 |
| 5.4 Proposed Hardware Design Strategy | 82 |
| 5.4.1 Scalability of Design | 83 |
| 5.4.2 Partitioned-Memory Processing | 83 |
| 5.4.3 Flexibility of Design | 84 |
| 5.5 Constraints on Available Resources | 85 |
| 5.6 Assessing the Resource Requirements | 85 |
| 5.7 Discussion | 86 |
| References | 87 |
| 6 Derivation of Area-Efficient and Scalable Parallel Architecture | 88 |
|---|
| 6.1 Introduction | 88 |
| 6.2 Single-PE Versus Multi-PE Architectures | 89 |
| 6.3 Conflict-Free Parallel Memory Addressing Schemes | 91 |
| 6.3.1 Data Storage and Accession | 91 |
| 6.3.2 Trigonometric Coefficient Storage, Accession and Generation | 95 |
| 6.3.2.1 Minimum-Arithmetic Addressing Scheme | 96 |
| 6.3.2.2 Minimum-Memory Addressing Scheme | 96 |
| 6.3.2.3 Summary of Addressing Schemes | 97 |
| 6.4 Design of Pipelined PE for Single-PE Architecture | 100 |
| 6.4.1 Internal Pipelining of Generic Double Butterfly | 101 |
| 6.4.2 Space Complexity Considerations | 102 |
| 6.4.3 Time Complexity Considerations | 103 |
| 6.5 Performance and Requirements Analysis of FPGAImplementation | 104 |
| 6.6 Constraining Latency Versus Minimizing Update-Time | 106 |
| 6.7 Discussion | 108 |
| References | 109 |
| 7 Design of Arithmetic Unit for Resource-Constrained Solution | 111 |
|---|
| 7.1 Introduction | 111 |
| 7.2 Accuracy Considerations | 112 |
| 7.3 Fast Multiplier Approach | 113 |
| 7.4 CORDIC Approach | 114 |
| 7.4.1 CORDIC Formulation of Complex Multiplier | 114 |
| 7.4.2 Parallel Formulation of CORDIC-Based PE | 115 |
| 7.4.3 Discussion of CORDIC-Based Solution | 116 |
| 7.4.4 Logic Requirement of CORDIC-Based PE | 119 |
| 7.5 Comparative Analysis of PE Designs | 120 |
| 7.6 Discussion | 122 |
| References | 125 |
| 8 Computation of 2n-Point Real-Data Discrete Fourier Transform | 126 |
|---|
| 8.1 Introduction | 126 |
| 8.2 Computing One DFT via Two Half-Length Regularized FHTs | 127 |
| 8.2.1 Derivation of 2n-Point Real-Data FFT Algorithm
|