| Preface | 6 |
|---|---|
| Contents | 8 |
| Chapter 1: An Introduction to Multi-Core System on Chip - Trends and Challenges | 10 |
| 1.1 From SoC to MPSoC | 10 |
| 1.2 General Structure of MPSoC | 11 |
| 1.2.1 Processing Elements | 11 |
| 1.2.2 Interconnection | 12 |
| 1.2.3 Power Management | 12 |
| 1.3 Power Efficiency and Adaptability | 13 |
| 1.4 Complexity and Scalability | 15 |
| 1.5 Heterogeneous and Homogeneous Approaches | 16 |
| 1.5.1 Heterogeneous MPSoC | 16 |
| 1.5.2 Homogeneous MPSoC | 17 |
| 1.6 Multivariable Optimization | 20 |
| 1.6.1 Static Optimization | 20 |
| 1.6.2 Dynamic Optimization | 20 |
| 1.6.2.1 Centralized Approaches | 21 |
| 1.6.2.2 Distributed Approaches | 24 |
| 1.7 Static vs Dynamic Centralized and Distributed Approaches | 25 |
| 1.8 Conclusion | 27 |
| References | 28 |
| Part I: "Application Mapping and Communication Infrastructure" | 31 |
| Chapter 2: Composability and Predictability for Independent Application Development, Verification, and Execution | 32 |
| 2.1 Introduction | 32 |
| 2.2 Composability and Predictability | 34 |
| 2.2.1 Terminology | 34 |
| 2.2.2 Composable Resources | 38 |
| 2.2.3 Predictable resources | 41 |
| 2.2.4 Composable and predictable resources | 42 |
| 2.3 Processor tile | 45 |
| 2.3.1 Composability | 45 |
| 2.3.1.1 Constant task slots | 46 |
| 2.3.1.2 Constant OS slot | 47 |
| 2.3.1.3 Two-level application and task scheduling | 48 |
| 2.3.2 Predictability | 48 |
| 2.4 Interconnect | 49 |
| 2.4.1 Composability | 50 |
| 2.4.2 Predictability | 51 |
| 2.5 Memory tile | 51 |
| 2.5.1 Predictability | 52 |
| 2.5.1.1 Predictable SDRAM back-end | 52 |
| 2.5.1.2 Predictable arbitration | 55 |
| 2.5.2 Composability | 56 |
| 2.6 Experiments | 57 |
| 2.7 Conclusions | 59 |
| References | 61 |
| Chapter 3: Hardware Support for Efficient Resource Utilization in Manycore Processor Systems | 64 |
| 3.1 Introduction | 65 |
| 3.2 Learning from Network Processing Applications | 67 |
| 3.2.1 Commercial Network Processors | 68 |
| 3.2.2 Example Networking Applications | 69 |
| 3.2.3 The FlexPath NP Approach | 70 |
| 3.2.4 What Can Other Manycore Domains Learn from Network Processing? | 75 |
| 3.3 Learning from HPC and Scientific Computing | 77 |
| 3.3.1 Hierarchical Multi-Topology Networks-on-Chip | 77 |
| 3.3.2 Task Management | 81 |
| 3.3.3 Synchronization Subsystem | 82 |
| 3.3.4 What Can Other Manycore Domains Learn from Supercomputing? | 83 |
| 3.4 Learning from Bio-Inspired, Self-Organizing Systems in Nature | 84 |
| 3.4.1 Collective Behavior of Entities in Natural and Technical Systems | 84 |
| 3.4.2 Technical Realization of Self-Adaptive IP Cores | 86 |
| 3.4.3 What Can Manycore Domains Learn from Nature? | 90 |
| 3.5 Summary and Conclusions | 92 |
| References | 93 |
| Chapter 4: PALLAS: Mapping Applications onto Manycore | 95 |
| 4.1 PALLAS | 96 |
| 4.2 Driving Applications | 97 |
| 4.2.1 Content-Based Image Retrieval | 97 |
| 4.2.2 Optical Flow and Tracking | 99 |
| 4.2.3 Stationary Video Background Subtraction | 101 |
| 4.2.4 Automatic Speech Recognition | 102 |
| 4.2.5 Compressed Sensing MRI | 103 |
| 4.2.6 Market Value-at-Risk Estimation in Computational Finance | 105 |
| 4.2.7 Games | 106 |
| 4.2.8 Machine Translation | 107 |
| 4.2.9 Summary | 108 |
| 4.3 Perspectives on Parallel Performance | 109 |
| 4.3.1 Linear Scaling Not Required | 109 |
| 4.3.2 Measure Real Problems on Real Hardware | 110 |
| 4.3.3 Consider the Algorithms | 110 |
| 4.3.4 Summing Up | 111 |
| 4.4 Patterns to Frameworks | 111 |
| 4.4.1 Application Frameworks | 112 |
| 4.4.2 Programming Frameworks | 114 |
| 4.4.2.1 Efficiency and Portability Through Programming Frameworks | 114 |
| 4.4.2.2 Copperhead | 115 |
| 4.5 Conclusions | 116 |
| 4.6 Appendices | 117 |
| 4.6.1 Structural Patterns | 117 |
| 4.6.2 Computational Patterns | 117 |
| 4.6.3 Parallel Algorithm Strategy Patterns | 118 |
| References | 118 |
| Chapter 5: The Case for Message Passing on Many-Core Chips | 120 |
| 5.1 Metrics for Comparing Parallel Programming Models | 121 |
| 5.2 Comparison Framework | 122 |
| 5.3 Comparing Message Passing and Shared Memory | 123 |
| 5.3.1 Agenda Parallelism | 123 |
| 5.3.2 Result Parallelism | 124 |
| 5.3.3 Specialist Parallelism | 125 |
| 5.4 Architectural Implications | 126 |
| 5.5 Discussion and Conclusion | 127 |
| References | 128 |
| Part II: "Reconfigurable Hardware in Multiprocessor Systems" | 129 |
| Chapter 6: Adaptive Multiprocessor System-on-Chip Architecture: New Degrees of Freedom in System Design and Runtime Support | 130 |
| 6.1 Introduction | 131 |
| 6.2 Background: Introduction to Reconfigurable Hardware | 133 |
| 6.2.1 Basic Concept of Runtime Reconfiguration | 133 |
| 6.2.2 Basic Concept of Runtime Reconfiguration and Classification of Configurable Granularity | 135 |
| 6.3 Related Work | 138 |
| 6.4 The RAMPSoC Approach | 139 |
| 6.5 Hardware Architecture of RAMPSoC | 142 |
| 6.6 Design Methodology of RAMPSoC | 144 |
| 6.7 CAP-OS: Configuration Access Port-Operating System for RAMPSoC | 148 |
| 6.8 Conclusions and Outlook | 152 |
| References | 152 |
| Part III: "Physical Design of Multiprocessor Systems" | 155 |
| Chapter 7: Design Tools and Methods for Chip Physical Design | 156 |
| 7.1 Introduction | 157 |
| 7.2 Use of MOS Complex Gates | 158 |
| 7.3 Wirelength Reduction | 159 |
| 7.4 Power Reduction | 159 |
| 7.5 Layout Strategies | 159 |
| 7.6 Layout as a Network of Transistors | 161 |
| 7.7 Using ASTRAN to Help in the Synthesis of Analog Modules | 163 |
| 7.8 Conclusions | 166 |
| References | 166 |
| Chapter 8: Power-Aware Multicore SoC and NoC Design | 168 |
| 8.1 Introduction | 168 |
| 8.2 Power Estimation Models: From Spreadsheets to Power State Machines | 172 |
| 8.2.1 Power Models of Processors | 174 |
| 8.2.2 Power Models of Memory | 175 |
| 8.2.3 Power Models of On-Chip Interconnects | 176 |
| 8.2.4 Power Models for Embedded Software | 178 |
| 8.2.5 Power Estimation, Analysis, and Optimization Tools | 180 |
| 8.2.6 Standardization and Power Formats | 182 |
| 8.3 Power Management | 183 |