| Preface | 5 |
|---|---|
| Contents | 7 |
| List of Contributors | 11 |

| I Integrated Development Environments | 12 |
|---|---|
| Sun HPC ClusterTools 7+: A Binary Distribution of Open MPI | 13 |
| Introduction | 13 |
| History | 14 |
| Sun-Driven features | 15 |
| Sun Product Activity | 23 |
| Pros and Cons | 25 |
| Future work and conclusions | 26 |
| References | 27 |
| An Integrated Environment For the Development of Parallel Applications | 29 |
| Introduction | 29 |
| Challenges | 31 |
| Architecture | 33 |
| A Simple Case Study | 38 |
| Future Directions | 41 |
| Conclusion | 43 |
| References | 44 |
| Debugging MPI Programs on the Grid using g-Eclipse | 45 |
| Introduction | 45 |
| Related Work | 46 |
| Overview of g-Eclipse Approach | 47 |
| Remote Builder | 48 |
| Grid Application Launchers | 49 |
| Trace Viewer | 49 |
| Conclusions and Future Work | 54 |
| References | 54 |

| II Parallel Communication and Debugging | 56 |
|---|---|
| Enhanced Memory debugging of MPI-parallel Applications in Open MPI | 57 |
| Introduction | 57 |
| Overview of Memcheck | 58 |
| Design and Implementation | 59 |
| Performance Implications | 61 |
| Detectable error classes and findings in actual applications | 65 |
| Conclusion and future work | 67 |
| References | 68 |
| MPI Correctness Checking with Marmot | 69 |
| Introduction | 70 |
| Related Work | 70 |
| Design of Marmot | 71 |
| Collaboration with other tools | 78 |
| Experiences with real Applications | 80 |
| How to install and use Marmot | 83 |
| Conclusion and Future Work | 84 |
| References | 84 |
| Memory Debugging in Parallel and Distributed Applications | 87 |
| Introduction | 87 |
| The Challenges of Memory Debugging in Parallel Development | 88 |
| Classifying Memory Errors | 88 |
| Detecting Memory Leaks | 90 |
| The MemoryScape Debugger | 90 |
| MemoryScape Architecture | 91 |
| MemoryScape Features | 92 |
| MemoryScape Usage Tips | 95 |
| MemoryScape User Case Study: SIMULIA Uses MemoryScape to Find and Fix Bugs Quickly | 96 |
| Future MemoryScape Product Plans | 98 |
| Conclusion | 98 |

| III Performance Analysis Tools | 99 |
|---|---|
| Sequential Performance Analysis with Callgrind and KCachegrind | 100 |
| Introduction | 100 |
| Callgrind: a Call-Graph building Online Cache Simulator | 104 |
| KCachegrind: Profile Visualization | 112 |
| Usage Example | 117 |
| Future Development | 118 |
| References | 120 |
| Improving Cache Utilization Using Acumem VPE | 121 |
| Introduction | 122 |
| Throughput Study of SPEC CPU 2006 | 124 |
| First Generation Performance Tools Based on Hardware Counters | 126 |
| Enter: The New Performance Tool | 128 |
| Utilization Study of the Worst SPEC CPU 2006 Applications | 132 |
| Tuning Example: 179.art | 134 |
| Tuning Example: Revisiting the Throughput Applications | 138 |
| Conclusion | 140 |
| References | 141 |
| Parallel Performance Analysis Tools | 141 |
| The Vampir Performance Analysis Tool-Set | 143 |
| Introduction | 143 |
| Performance Analysis via Profiling or Tracing | 144 |
| Instrumentation with VampirTrace | 145 |
| Run-Time Measurement and Event Recording | 148 |
| Trace Visualization with Vampir and VampirServer | 152 |
| Related Work | 158 |
| Conclusions and Future Work | 158 |
| References | 159 |
| Usage of the SCALASCA toolset for scalable performance analysis of large-scale parallel applications | 160 |
| Introduction | 160 |
| Overview | 161 |
| Instrumentation and Measurement | 162 |
| Trace Analysis | 165 |
| Understanding Performance Behavior | 167 |
| Outlook | 169 |
| References | 170 |
| Evolution of a Parallel Performance System | 171 |
| Introduction | 171 |
| TAU Performance System Design and Architecture | 172 |
| TAU Instrumentation | 174 |
| TAU Measurement | 180 |
| TAU Analysis | 185 |
| Conclusion and Future Work | 188 |
| References | 190 |
| Cray Performance Analysis Tools | 193 |
| Introduction | 193 |
| The Cray Performance Analysis Tools | 194 |
| Conclusions and Future Work | 200 |
| References | 201 |
| Index | 202 |