List of Figures .............................................. xiii
List of Tables ................................................ xxi
Series in Computational Physics ............................. xxiii
Foreword ...................................................... xxv
Preface ..................................................... xxvii
About the Editors .......................................... xxxiii
Contributors ................................................. xxxv
1 The Charm++ Programming Model ................................ 1
Laxmikant V. Kale and Gengbin Zheng
1.1 Design Philosophy ....................................... 2
1.2 Object-Based Programming Model .......................... 3
1.3 Capabilities of the Adaptive Runtime System ............. 8
1.4 Extensions to the Basic Model .......................... 10
1.5 Charm++ Ecosystem ...................................... 13
1.6 Other Languages in the Charm++ Family .................. 14
1.7 Historical Notes ....................................... 15
1.8 Conclusion ............................................. 16
2 Designing Charm++ Programs .................................. 17
Laxmikant V. Kale
2.1 Simple Stencil: Using Over-Decomposition and Selecting
Grainsize .............................................. 17
2.1.1 Grainsize Decisions ............................. 18
2.1.2 Multicore Nodes ................................. 21
2.1.3 Migrating Chares, Load Balancing and Fault
Tolerance ....................................... 21
2.2 Multi-Physics Modules Using Multiple Chare Arrays ...... 23
2.2.1 LeanMD .......................................... 24
2.3 SAMR: Chare Arrays with Dynamic Insertion and
Flexible Indices ....................................... 27
2.4 Combinatorial Search: Task Parallelism ................. 29
2.5 Other Features and Design Considerations ............... 30
2.6 Utility of Charm++ for Future Applications ............. 31
2.7 Summary ................................................ 32
3 Tools for Debugging and Performance Analysis ................ 35
Filippo Gioachin, Chee Wai Lee, Jonathan Lifflander,
Yanhua Sun and Laxmikant V. Kale
3.1 Introduction ........................................... 36
3.2 Scalable Debugging with CharmDebug ..................... 36
3.2.1 Accessing User Information ...................... 37
3.2.2 Debugging Problems at Large Scale ............... 40
3.2.3 Summary ......................................... 46
3.3 Performance Visualization and Analysis via
Projections ............................................ 47
3.3.1 A Simple Projections Primer ..................... 49
3.3.2 Features of Projections via Use Cases ........... 51
3.3.3 Advanced Features for Scalable Performance
Analysis ........................................ 58
3.3.4 Summary ......................................... 59
3.4 Conclusions ............................................ 60
4 Scalable Molecular Dynamics with NAMD ....................... 61
James C. Phillips, Klaus Schulten, Abhinav Bhatele, Chao
Mei, Yanhua Sun, Eric J. Bohm and Laxmikant V. Kale
4.1 Introduction ........................................... 61
4.2 Need for Biomolecular Simulations ...................... 62
4.3 Parallel Molecular Dynamics ............................ 63
4.4 NAMD's Parallel Design ................................. 64
4.4.1 Force Calculations .............................. 65
4.4.2 Load Balancing .................................. 66
4.5 Enabling Large Simulations ............................. 67
4.5.1 Hierarchical Load Balancing ..................... 67
4.5.2 SMP Optimizations ............................... 68
4.5.3 Optimizing Fine-Grained Communication in NAMD ... 70
4.5.4 Parallel Input/Output ........................... 71
4.6 Scaling Performance .................................... 72
4.7 Simulations Enabled by NAMD ............................ 75
4.8 Summary ................................................ 76
5 OpenAtom: Ab initio Molecular Dynamics for Petascale
Platforms ................................................... 79
Glenn J. Martyna, Eric J. Bohm, Ramprasad Venkataraman,
Laxmikant V. Kale and Abhinav Bhatele
5.1 Introduction ........................................... 80
5.2 Car-Parrinello Molecular Dynamics ...................... 81
5.2.1 Density Functional Theory, KS Density Functional
Theory and the Local Density Approximation ...... 82
5.2.2 DFT Computations within Basis Sets .............. 84
5.2.3 Molecular Dynamics .............................. 84
5.2.4 Ab initio Molecular Dynamics and CPAIMD ......... 84
5.2.5 Path Integrals .................................. 85
5.2.6 Parallel Tempering .............................. 86
5.3 Parallel Application Design ............................ 87
5.3.1 Modular Design and Benefits ..................... 87
5.3.2 Parallel Driver ................................. 89
5.3.3 Topology Aware Mapping .......................... 93
5.4 Charm++ Feature Development ............................ 95
5.5 Performance ............................................ 97
5.6 Impact on Science and Technology ....................... 98
5.6.1 Carbon Based Materials for Photovoltaic
Applications .................................... 98
5.6.2 Metal Insulator Transitions for Novel Devices .. 102
Future Work .................................... 103
6 N-body Simulations with ChaNGa ............................. 105
Thomas R. Quinn, Pritish Jetley, Laxmikant V. Kale and
Filippo Gioachin
6.1 Introduction .......................................... 106
6.2 Code Design ........................................... 107
6.2.1 Domain Decomposition and Load Balancing ........ 108
6.2.2 Tree Building .................................. 110
6.2.3 Tree Walking ................................... 112
6.2.4 Force Softening ................................ 113
6.2.5 Periodic Boundary Conditions ................... 114
6.2.6 Neighbor Finding ............................... 114
6.2.7 Multi-Stepping ................................. 115
6.3 Accuracy Tests ........................................ 115
6.3.1 Force Errors ................................... 115
6.3.2 Cosmology Tests ................................ 116
6.4 Performance ........................................... 121
6.4.1 Domain Decomposition and Tree Build
Performance .................................... 121
6.4.2 Single-Stepping Performance .................... 123
6.4.3 Multi-Stepping Performance ..................... 125
6.4.4 ChaNGa on GPUs ................................. 125
6.5 Conclusions and Future Work ........................... 133
7 Remote Visualization of Cosmological Data Using Salsa ...... 137
Orion Sky Lawlor and Thomas R. Quinn
7.1 Introduction .......................................... 137
7.2 Salsa Client/Server Rendering Architecture ............ 138
7.2.1 Client Server Communication Styles ............. 139
7.2.2 Image Compression in Salsa ..................... 141
7.2.3 GPU Particle Rendering on the Server ........... 142
7.3 Remote Visualization User Interface ................... 144
7.4 Example Use: Galaxy Clusters .......................... 145
8 Improving Scalability of BRAMS: a Regional Weather
Forecast Model ............................................. 149
Eduardo R. Rodrigues, Celso L. Mendes and Jairo Panetta
8.1 Introduction .......................................... 150
8.2 Load Balancing Strategies for Weather Models .......... 151
8.3 The BRAMS Weather Model ............................... 153
8.4 Load Balancing Approach ............................... 154
8.4.1 Adaptations to AMPI ............................ 155
8.4.2 Balancing Algorithms Employed .................. 157
8.5 New Load Balancer ..................................... 158
8.6 Fully Distributed Strategies .......................... 162
8.6.1 Hilbert Curve-Based Load Balancer .............. 162
8.6.2 Diffusion-Based Load Balancer .................. 164
8.7 Experimental Results .................................. 166
8.7.1 First Set of Experiments: Privatization
Strategy ....................................... 166
8.7.2 Second Set of Experiments: Virtualization
Effects ........................................ 167
8.7.3 Third Set of Experiments: Centralized Load
Balancers ...................................... 171
8.7.4 Fourth Set of Experiments: Distributed Load
Balancers ...................................... 174
8.8 Final Remarks ......................................... 182
9 Crack Propagation Analysis with Automatic Load Balancing ... 187
Orion Sky Lawlor, M. Scot Breitenfeld, Philippe H.
Geubelle and Gengbin Zheng
9.1 Introduction .......................................... 187
9.1.1 ParFUM Framework ............................... 188
9.1.2 Implementation of the ParFUM Framework ......... 190
9.2 Load Balancing Finite Element Codes in Charm++ ........ 193
9.2.1 Runtime Support for Thread Migration ........... 193
9.2.2 Comparison to Prior Work ....................... 194
9.2.3 Automatic Load Balancing for FEM ............... 195
9.2.4 Load Balancing Strategies ...................... 196
9.2.5 Agile Load Balancing ........................... 197
9.3 Cohesive and Elasto-plastic Finite Element Model of
Fracture .............................................. 199
9.3.1 Case Study 1: Elastic-Plastic Wave Propagation . 202
9.3.2 Case Study 2: Dynamic Fracture ................. 206
9.4 Conclusions ........................................... 210
10 Contagion Diffusion with EpiSimdemics ...................... 211
Keith R. Bisset, Ashwin M. Aji, Tariq Kamal, Jae-Seung
Yeom, Madhav V. Marathe, Eric J. Bohm and Abhishek Gupta
10.1 Introduction .......................................... 212
10.2 Problem Description ................................... 215
10.2.1 Formalization .................................. 215
10.2.2 Application to Computational Epidemiology ...... 217
10.3 EpiSimdemics Design ................................... 218
10.3.1 The Disease Model .............................. 220
10.3.2 Modeling Behavior of Individual Agents ......... 220
10.3.3 Intervention and Behavior Modification ......... 221
10.3.4 Social Network Representation .................. 223
10.4 EpiSimdemics Algorithm ................................ 224
10.5 Charm++ Implementation ................................ 227
10.5.1 Designing the Chares ........................... 228
10.5.2 The EpiSimdemics Algorithm ..................... 230
10.5.3 Charm++ Features ............................... 232
10.6 Performance of EpiSimdemics ........................... 232
10.6.1 Experimental Setup ............................ 233
10.6.2 Performance Characteristics ................... 233
10.6.3 Effects of Synchronization .................... 234
10.6.4 Effects of Strong Scaling ..................... 235
10.6.5 Effects of Weak Scaling ....................... 236
10.6.6 Effect of Load Balancing ...................... 237
10.7 Representative Study .................................. 241
Bibliography .................................................. 247
Index ......................................................... 271