List of Figures .............................................. xiii
List of Tables ................................................ xxi
Foreword .................................................... xxiii
Preface ....................................................... xxv
1 Multi-Core Architectures for Embedded Systems ................ 1
C.P. Ravikumar
1.1 Introduction ............................................ 2
1.1.1 What Makes Multiprocessor Solutions
Attractive? ...................................... 3
1.2 Architectural Considerations ............................ 9
1.3 Interconnection Networks ............................... 11
1.4 Software Optimizations ................................. 13
1.5 Case Studies ........................................... 14
1.5.1 HiBRID-SoC for Multimedia Signal Processing ..... 14
1.5.2 VIPER Multiprocessor SoC ........................ 16
1.5.3 Defect-Tolerant and Reconfigurable MPSoC ........ 17
1.5.4 Homogeneous Multiprocessor for Embedded
Printer Application ............................. 18
1.5.5 General Purpose Multiprocessor DSP .............. 20
1.5.6 Multiprocessor DSP for Mobile Applications ...... 21
1.5.7 Multi-Core DSP Platforms ........................ 23
1.6 Conclusions ............................................ 25
Review Questions ............................................ 25
Bibliography ................................................ 27
2 Application-Specific Customizable Embedded Systems .......... 31
Georgios Kornaros
2.1 Introduction ........................................... 32
2.2 Challenges and Opportunities ........................... 34
2.2.1 Objectives ...................................... 35
2.3 Categorization ......................................... 37
2.3.1 Customized Application-Specific Processor
Techniques ...................................... 37
2.3.2 Customized Application-Specific On-Chip
Interconnect Techniques ......................... 40
2.4 Configurable Processors and Instruction Set
Synthesis .............................................. 41
2.4.1 Design Methodology for Processor
Customization ................................... 43
2.4.2 Instruction Set Extension Techniques ............ 44
2.4.3 Application-Specific Memory-Aware
Customization ................................... 48
2.4.4 Customizing On-Chip Communication
Interconnect .................................... 48
2.4.5 Customization of MPSoCs ......................... 49
2.5 Reconfigurable Instruction Set Processors .............. 52
2.5.1 Warp Processing ................................. 53
2.6 Hardware/Software Codesign ............................. 54
2.7 Hardware Architecture Description Languages ............ 55
2.7.1 LISATek Design Platform ......................... 57
2.8 Myths and Realities .................................... 58
2.9 Case Study: Realizing Customizable Multi-Core
Designs ................................................ 60
2.10 The Future: System Design with Customizable
Architectures, Software, and Tools ..................... 62
Review Questions ............................................ 63
Bibliography ................................................ 63
3 Power Optimization in Multi-Core System-on-Chip ............. 71
Massimo Conti, Simone Orcioni, Giovanni Vece and Stefano
Gigli
3.1 Introduction ........................................... 72
3.2 Low Power Design ....................................... 74
3.2.1 Power Models .................................... 75
3.2.2 Power Analysis Tools ............................ 80
3.3 PKtool ................................................. 82
3.3.1 Basic Features .................................. 82
3.3.2 Power Models .................................... 83
3.3.3 Augmented Signals ............................... 84
3.3.4 Power States .................................... 85
3.3.5 Application Examples ............................ 86
3.4 On-Chip Communication Architectures .................... 87
3.5 NOCEXplore ............................................. 90
3.5.1 Analysis ........................................ 91
3.6 DPM and DVS in Multi-Core Systems ...................... 95
3.7 Conclusions ........................................... 100
Review Questions ........................................... 101
Bibliography ............................................... 102
4 Routing Algorithms for Irregular Mesh-Based Network-on-
Chip ....................................................... 111
Shu-Yen Lin and An-Yeu (Andy) Wu
4.1 Introduction .......................................... 112
4.2 An Overview of Irregular Mesh Topology ................ 113
4.2.1 2D Mesh Topology ............................... 113
4.2.2 Irregular Mesh Topology ........................ 113
4.3 Fault-Tolerant Routing Algorithms for 2D Meshes ....... 115
4.3.1 Fault-Tolerant Routing Using Virtual
Channels ....................................... 116
4.3.2 Fault-Tolerant Routing with Turn Model ......... 117
4.4 Routing Algorithms for Irregular Mesh Topology ........ 126
4.4.1 Traffic-Balanced OAPR Routing Algorithm ........ 127
4.4.2 Application-Specific Routing Algorithm ......... 132
4.5 Placement for Irregular Mesh Topology ................. 136
4.5.1 OIP Placements Based on Chen and Chiu's
Algorithm ...................................... 137
4.5.2 OIP Placements Based on OAPR ................... 140
4.6 Hardware Efficient Routing Algorithms ................. 143
4.6.1 Turns-Table Routing (TT) ....................... 146
4.6.2 XY-Deviation Table Routing (XYDT) .............. 147
4.6.3 Source Routing for Deviation Points (SRDP) ..... 147
4.6.4 Degree Priority Routing Algorithm .............. 148
4.7 Conclusions ........................................... 151
Review Questions ........................................... 151
Bibliography ............................................... 151
5 Debugging Multi-Core Systems-on-Chip ....................... 155
Bart Vermeulen and Kees Goossens
5.1 Introduction .......................................... 156
5.2 Why Debugging Is Difficult ............................ 158
5.2.1 Limited Internal Observability ................. 158
5.2.2 Asynchronicity and Consistent Global States .... 159
5.2.3 Non-Determinism and Multiple Traces ............ 161
5.3 Debugging an SoC ...................................... 163
5.3.1 Errors ......................................... 164
5.3.2 Example Erroneous System ....................... 165
5.3.3 Debug Process .................................. 166
5.4 Debug Methods ......................................... 169
5.4.1 Properties ..................................... 169
5.4.2 Comparing Existing Debug Methods ............... 171
5.5 CSAR Debug Approach ................................... 174
5.5.1 Communication-Centric Debug .................... 175
5.5.2 Scan-Based Debug ............................... 175
5.5.3 Run/Stop-Based Debug ........................... 176
5.5.4 Abstraction-Based Debug ........................ 176
5.6 On-Chip Debug Infrastructure .......................... 178
5.6.1 Overview ....................................... 178
5.6.2 Monitors ....................................... 178
5.6.3 Computation-Specific Instrument ................ 180
5.6.4 Protocol-Specific Instrument ................... 181
5.6.5 Event Distribution Interconnect ................ 182
5.6.6 Debug Control Interconnect ..................... 183
5.6.7 Debug Data Interconnect ........................ 183
5.7 Off-Chip Debug Infrastructure ......................... 184
5.7.1 Overview ....................................... 184
5.7.2 Abstractions Used by Debugger Software ......... 184
5.8 Debug Example ......................................... 190
5.9 Conclusions ........................................... 193
Review Questions ........................................... 194
Bibliography ............................................... 194
6 System-Level Tools for NoC-Based Multi-Core Design ......... 201
Luciano Bononi, Nicola Concer, and Miltos Grammatikakis
6.1 Introduction .......................................... 202
6.1.1 Related Work ................................... 204
6.2 Synthetic Traffic Models .............................. 206
6.3 Graph Theoretical Analysis ............................ 207
6.3.1 Generating Synthetic Graphs Using TGFF ......... 209
6.4 Task Mapping for SoC Applications ..................... 210
6.4.1 Application Task Embedding and Quality
Metrics ........................................ 210
6.4.2 SCOTCH Partitioning Tool ....................... 214
6.5 OMNeT++ Simulation Framework .......................... 216
6.6 A Case Study .......................................... 217
6.6.1 Application Task Graphs ........................ 217
6.6.2 Prospective NoC Topology Models ................ 218
6.6.3 Spidergon Network on Chip ...................... 219
6.6.4 Task Graph Embedding and Analysis .............. 221
6.6.5 Simulation Models for Proposed NoC
Topologies ..................................... 223
6.6.6 Mpeg4: A Realistic Scenario .................... 227
6.7 Conclusions and Extensions ............................ 231
Review Questions ........................................... 234
Bibliography ............................................... 235
7 Compiler Techniques for Application Level Memory
Optimization for MPSoC ..................................... 243
Bruno Girodias, Youcef Bouchebaba, Pierre Paulin, Bruno
Lavigueur, Gabriela Nicolescu, and El Mostapha Aboulhamid
7.1 Introduction .......................................... 244
7.2 Loop Transformation for Single and Multiprocessors .... 245
7.3 Program Transformation Concepts ....................... 246
7.4 Memory Optimization Techniques ........................ 248
7.4.1 Loop Fusion .................................... 249
7.4.2 Tiling ......................................... 249
7.4.3 Buffer Allocation .............................. 249
7.5 MPSoC Memory Optimization Techniques .................. 250
7.5.1 Loop Fusion .................................... 251
7.5.2 Comparison of Lexicographically Positive and
Positive Dependency ............................ 252
7.5.3 Tiling ......................................... 253
7.5.4 Buffer Allocation .............................. 254
7.6 Technique Impacts ..................................... 255
7.6.1 Computation Time ............................... 255
7.6.2 Code Size Increase ............................. 256
7.7 Improvement in Optimization Techniques ................ 256
7.7.1 Parallel Processing Area and Partitioning ...... 256
7.7.2 Modulo Operator Elimination .................... 259
7.7.3 Unimodular Transformation ...................... 260
7.8 Case Study ............................................ 261
7.8.1 Cache Ratio and Memory Space ................... 262
7.8.2 Processing Time and Code Size .................. 263
7.9 Discussion ............................................ 263
7.10 Conclusions ........................................... 264
Review Questions ........................................... 265
Bibliography ............................................... 266
8 Programming Models for Multi-Core Embedded Software ........ 269
Bijoy A. Jose, Bin Xue, Sandeep K. Shukla and Jean-Pierre
Talpin
8.1 Introduction .......................................... 270
8.2 Thread Libraries for Multi-Threaded Programming ....... 272
8.3 Protections for Data Integrity in a Multi-Threaded
Environment ........................................... 276
8.3.1 Mutual Exclusion Primitives for Deterministic
Output ......................................... 276
8.3.2 Transactional Memory ........................... 278
8.4 Programming Models for Shared Memory and Distributed
Memory ................................................ 279
8.4.1 OpenMP ......................................... 279
8.4.2 Thread Building Blocks ......................... 280
8.4.3 Message Passing Interface ...................... 281
8.5 Parallel Programming on Multiprocessors ............... 282
8.6 Parallel Programming Using Graphic Processors ......... 283
8.7 Model-Driven Code Generation for Multi-Core Systems ... 284
8.7.1 StreamIt ....................................... 285
8.8 Synchronous Programming Languages ..................... 286
8.9 Imperative Synchronous Language: Esterel .............. 288
8.9.1 Basic Concepts ................................. 288
8.9.2 Multi-Core Implementations and Their
Compilation Schemes ............................ 289
8.10 Declarative Synchronous Language: LUSTRE .............. 290
8.10.1 Basic Concepts ................................. 291
8.10.2 Multi-Core Implementations from LUSTRE
Specifications ................................. 291
8.11 Multi-Rate Synchronous Language: SIGNAL ............... 292
8.11.1 Basic Concepts ................................. 292
8.11.2 Characterization and Compilation of SIGNAL ..... 293
8.11.3 SIGNAL Implementations on Distributed
Systems ........................................ 294
8.11.4 Multi-Threaded Programming Models for SIGNAL ... 296
8.12 Programming Models for Real-Time Software ............. 299
8.12.1 Real-Time Extensions to Synchronous
Languages ...................................... 300
8.13 Future Directions for Multi-Core Programming .......... 301
Review Questions ........................................... 302
Bibliography ............................................... 305
9 Operating System Support for Multi-Core Systems-on-Chips ... 309
Xavier Guérin and Frédéric Pétrot
9.1 Introduction .......................................... 310
9.2 Ideal Software Organization ........................... 311
9.3 Programming Challenges ................................ 313
9.4 General Approach ...................................... 314
9.4.1 Board Support Package .......................... 314
9.4.2 General Purpose Operating System ............... 317
9.5 Real-Time and Component-Based Operating System
Models ................................................ 322
9.5.1 Automated Application Code Generation and
RTOS Modeling .................................. 322
9.5.2 Component-Based Operating System ............... 326
9.6 Pros and Cons ......................................... 329
9.7 Conclusions ........................................... 330
Review Questions ........................................... 332
Bibliography ............................................... 333
10 Autonomous Power Management in Embedded Multi-Cores ........ 337
Arindam Mukherjee, Arun Ravindran, Bharat Kumar Joshi,
Kushal Datta and Yue Liu
10.1 Introduction .......................................... 338
10.1.1 Why Is Autonomous Power Management
Necessary? ..................................... 339
10.2 Survey of Autonomous Power Management Techniques ...... 342
10.2.1 Clock Gating ................................... 342
10.2.2 Power Gating ................................... 343
10.2.3 Dynamic Voltage and Frequency Scaling .......... 343
10.2.4 Smart Caching .................................. 344
10.2.5 Scheduling ..................................... 345
10.2.6 Commercial Power Management Tools .............. 346
10.3 Power Management and RTOS ............................. 347
10.4 Power-Smart RTOS and Processor Simulators ............. 349
10.4.1 Chip Multi-Threading (CMT) Architecture
Simulator ..................................... 350
10.5 Autonomous Power Saving in Multi-Core Processors ...... 351
10.5.1 Opportunities to Save Power ................... 353
10.5.2 Strategies to Save Power ....................... 354
10.5.3 Case Study: Power Saving in Intel Centrino ..... 356
10.6 Power Saving Algorithms ............................... 358
10.6.1 Local PMU Algorithm ............................ 358
10.6.2 Global PMU Algorithm ........................... 358
10.7 Conclusions ........................................... 360
Review Questions ........................................... 362
Bibliography ............................................... 363
11 Multi-Core System-on-Chip in Real World Products ........... 369
Gajinder Panesar, Andrew Duller, Alan H. Gray and Daniel
Towner
11.1 Introduction .......................................... 370
11.2 Overview of picoArray Architecture .................... 371
11.2.1 Basic Processor Architecture ................... 371
11.2.2 Communications Interconnect .................... 373
11.2.3 Peripherals and Hardware Functional
Accelerators ................................... 373
11.3 Tool Flow ............................................. 375
11.3.1 picoVhdl Parser (Analyzer, Elaborator,
Assembler) ..................................... 376
11.3.2 C Compiler ..................................... 376
11.3.3 Design Simulation .............................. 378
11.3.4 Design Partitioning for Multiple Devices ....... 381
11.3.5 Place and Switch ............................... 381
11.3.6 Debugging ...................................... 381
11.4 picoArray Debug and Analysis .......................... 381
11.4.1 Language Features .............................. 382
11.4.2 Static Analysis ................................ 383
11.4.3 Design Browser ................................. 383
11.4.4 Scripting ...................................... 385
11.4.5 Probes ......................................... 387
11.4.6 FileIO ......................................... 387
11.5 Hardening Process in Practice ......................... 388
11.5.1 Viterbi Decoder Hardening ...................... 389
11.6 Design Example ........................................ 392
11.7 Conclusions ........................................... 396
Review Questions ........................................... 396
Bibliography ............................................... 397
12 Embedded Multi-Core Processing for Networking .............. 399
Theofanis Orphanoudakis and Stylianos Perissakis
12.1 Introduction .......................................... 400
12.2 Overview of Proposed NPU Architectures ................ 403
12.2.1 Multi-Core Embedded Systems for Multi-Service
Broadband Access and Multimedia Home
Networks ....................................... 403
12.2.2 SoC Integration of Network Components and
Examples of Commercial Access NPUs ............. 405
12.2.3 NPU Architectures for Core Network Nodes and
High-Speed Networking and Switching ............ 407
12.3 Programmable Packet Processing Engines ................ 412
12.3.1 Parallelism .................................... 413
12.3.2 Multi-Threading Support ........................ 418
12.3.3 Specialized Instruction Set Architectures ...... 421
12.4 Address Lookup and Packet Classification Engines ...... 422
12.4.1 Classification Techniques ...................... 424
12.4.2 Case Studies ................................... 426
12.5 Packet Buffering and Queue Management Engines ......... 431
12.5.1 Performance Issues ............................. 433
12.5.2 Design of Specialized Core for Implementation
of Queue Management in Hardware ................ 435
12.6 Scheduling Engines .................................... 442
12.6.1 Data Structures in Scheduling Architectures .... 443
12.6.2 Task Scheduling ................................ 444
12.6.3 Traffic Scheduling ............................. 450
12.7 Conclusions ........................................... 453
Review Questions ........................................... 455
Bibliography ............................................... 459
Index ......................................................... 465