

IBM Systems & Technology Group Cell/Quasar Ecosystem & Solutions Enablement

# Hands-on – The Hello World! Program

Cell Programming Workshop Cell/Quasar Ecosystem Solutions Enablement

3/2/2008

### **Class Objectives**

- You will learn how to write, build, and run "Hello World!" on the Cell System Simulator
- Navigate the basic build process and makefiles
- Become familiar with the gcc and xlc compilers
- Become familiar with the system simulator
- This session uses three different versions of "Hello World!", the third in two variants:
  - PPE only,
  - SPE only, and
  - Cell BE, i.e. using both PPE and SPE
    - Synchronous
    - Asynchronous

**Trademarks** - Cell Broadband Engine <sup>™</sup> is a trademark of Sony Computer Entertainment, Inc.

# How to build, compile and execute the "Hello World!" program

#### Pre-requisites

- Toolchain
- Compiler

#### Build Process

#### Source Code

- Makefiles
- Source PPE
- Source SPE

#### Simulator

- Getting the binary into the simulator
- Running the binary



# The build process

# **Build Process**





# SDK 3.0 Makefile

# Compiling within the SDK

- Top of build environment is /opt/cell/sdk/
- Includes the build environment files
  - README\_build\_env.txt
    - Provides details on the build environment features, including files, structure and variables.
  - make.footer
    - Specifies all of the build rules needed to properly build CBEA binaries
    - Must be included in all SDK Makefiles (referenced relatively if \$CELL\_TOP is not defined)
    - Includes make.header
  - make.header
    - Specifies definitions needed to process the Makefiles
    - Includes make.env
  - make.env
    - Specifies the default compilers and tools to be used by make
- make.footer and make.header should not be modified

# **Common Makefile variables**

### DIRS

- List of subdirectories to build first.

### PROGRAM\_ppu PROGRAMS\_ppu

- 32-bit PPU program (or list of programs) to build.

### PROGRAM\_ppu64 PROGRAMS\_ppu64

- 64-bit PPU program (or list of programs) to build.

### PROGRAM\_spu PROGRAMS\_spu

- SPU program (or list of programs) to build.
- If written as a standalone binary, can run without being embedded in a PPU program.

### LIBRARY\_embed LIBRARY\_embed64

- Creates a linked library from an SPU program to be embedded into a 32-bit or 64-bit PPU program.

### OBJS OBJS\_\<program\>

- List of objects for the programs (or one specific program). By default, all objects in the current directory are linked into the binary.

### IMPORTS IMPORTS\_\<program\>

- List of libraries to link into the programs (or one specific program). Also used by the PPU programs to embed the SPU linked library.



# **Directory Layout and Examples of Makefile**



# **Building The Code**

### Environment setup

- Set the CELL\_TOP environment variable so that the makefile system can be found:
  - export CELL\_TOP=/opt/cell/sdk/
  - make.footer contains the build rules for the makefile system
- Ensure compilers or cross-compilers are in the executable search path

### Separate SPE code and PPE code into different directories

- Each set of code has its own makefile and toolchain to use
- Suggestion: create a subdirectory called 'spu' in the directory where the PPU code is found

### Makefile template for PPE code:

```
DIRS = spu
PROGRAM_ppu = <PPU_executable_name>
IMPORTS = <spu_executable-embed.a> -lspe2
include $(CELL_TOP)/buildutils/make.footer
```

### Makefile template for SPE code:

```
PROGRAM_spu = <SPU_executable_name>
LIBRARY_embed = <spu_executable-embed.a>
include $(CELL_TOP)/buildutils/make.footer
```



# The "Hello World!" program



# Four Different Versions of "Hello World!"

- PPE only
- SPE only
- Synergistic PPE and SPE: synchronous
  - One SPE is used.
  - Main thread blocks and waits for the SPE code to run to completion
- Synergistic PPE and SPE: asynchronous
  - Eight SPEs are used
  - Main thread uses pthreads to get concurrent/asynchronous execution

### "Hello World!" - PPE Only

#### PPU program

- just like any "Hello World!" program one would write

```
#include <stdio.h>
int main(void)
{
    printf("Hello world!\n");
    return 0;
}
```

#### Makefile

- make.footer included to set up compiler and compiler flags
- PROGRAM\_ppu tells make to use PPC cross-compiler



# "Hello World!" - SPE Only

#### SPU program

```
#include <stdio.h>
int main()
{
    printf("Hello world!\n");
    return 0;
}
```





# Synergistic PPE and SPE (SPE Embedded)

- Applications use software constructs called SPE contexts to manage and control SPEs.
- Linux schedules SPE contexts from all running applications onto the physical SPE resources in the system for execution according to the scheduling priorities and policies associated with the runnable SPE contexts.
- libspe provides the means for communication and data transfer between PPE threads and SPEs.

# How does a PPE program start an SPE thread?

- The PPE program must perform four basic steps:
  - Create an SPE context.
  - Load an SPE executable object into the SPE context local store.
  - Run the SPE context. This transfers control to the operating system, which requests the actual scheduling of the context onto a physical SPE in the system.
  - Destroy the SPE context.

### SPE context creation

#### spe\_context\_create - Create and initialize a new SPE context data structure.

```
#include <libspe2.h>
spe_context_ptr_t spe_context_create(unsigned int flags,
                                     spe_gang_context_ptr_t gang)
```

- *flags* A bit-wise OR of modifiers that are applied when the SPE context is created.
- *gang* Associate the new SPE context with this gang context. If NULL is specified, the new SPE context is not associated with any gang.
- On success, a pointer to the newly created SPE context is returned.

### spe\_program\_load

#### spe\_program\_load - Load an SPE main program.

```
#include <libspe2.h>
int spe_program_load(spe_context_ptr_t spe,
                     spe_program_handle_t *program)
```

- *spe* A valid pointer to the SPE context for which an SPE program should be loaded.
- *program* A valid address of a mapped SPE program.

### spe\_context\_run

#### spe\_context\_run - Request execution of an SPE context.

`#include <libspe2.h>`

`int spe_context_run(spe_context_ptr_t spe, unsigned int *entry, unsigned int runflags, void *argp, void *envp, spe_stop_info_t *stopinfo)`

- *spe* A pointer to the SPE context that should be run.
- *entry* Input: The entry point, that is, the initial value of the SPU instruction pointer, at which the SPE program should start executing. If the value of entry is SPE\_DEFAULT\_ENTRY, the entry point for the SPU main program is obtained from the loaded SPE image. This is usually the local store address of the initialization function crt0.
- *runflags* A bit mask that can be used to request certain specific behavior for the execution of the SPE context. 0 indicates default behavior.
- *argp* An optional pointer to application-specific data, passed as the second parameter to the SPE program.
- *envp* An optional pointer to environment-specific data, passed as the third parameter to the SPE program.
- *stopinfo* An optional pointer to a structure of type spe\_stop\_info\_t.

### spe\_context\_destroy

#### spe\_context\_destroy - Destroy the specified SPE context.

```
#include <libspe2.h>
int spe_context_destroy(spe_context_ptr_t spe)
```

- *spe* Specifies the SPE context to be destroyed
- On success, **0** (zero) is returned, else -1 is returned



# "Hello World!" – PPE and SPE Combined Structure

### SPU code

- Compiled with SPU specific toolchain
- Object is repackaged as PPC ELF object
- From this point forward normal PPU tools are used.

### PPU code

- Compiled with normal PPU toolchain
- Objects are linked to form a combined executable.
- At runtime, kernel extensions and SDK libraries are used to move the SPU code to an SPU and start the SPU thread.



# "Hello World!" – Synergistic PPE and SPE (SPE Embedded)

#### SPU program

Same as for SPE only

#### SPU Makefile

PROGRAM\_spu := hello\_spu
LIBRARY\_embed := hello\_spu.a
include \$(CELL\_TOP)/buildutils/make.footer

# "Hello World!" – PPU program

```
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <libspe2.h>

extern spe_program_handle_t hello_spu;

int main(void)
{
    spe_context_ptr_t speid;
    unsigned int flags = 0;
    unsigned int entry = SPE_DEFAULT_ENTRY;
    void *argp = NULL;
    void *envp = NULL;
    spe_stop_info_t stop_info;
    int rc;

    // Create an SPE context
    speid = spe_context_create(flags, NULL);
    if (speid == NULL) {
        perror("spe_context_create");
        return -2;
    }

    // Load an SPE executable object into the SPE context local store
    if (spe_program_load(speid, &hello_spu)) {
        perror("spe_program_load");
        return -3;
    }

    // Run the SPE context
    rc = spe_context_run(speid, &entry, 0, argp, envp, &stop_info);
    if (rc < 0)
        perror("spe_context_run");

    // Destroy the SPE context
    spe_context_destroy(speid);
    return 0;
}
```

#### PPU Makefile

```
DIRS = spu
PROGRAM_ppu = hello_be1
IMPORTS = spu/hello_spu.a -lspe2 -lpthread
include $(CELL_TOP)/buildutils/make.footer
```



# The IBM Full System Simulator – An Overview

### **Simulator Overview**






# SystemSim Runtime Environment




# SystemSim User Interface

#### Graphical interface

- Provides a visual display of the state of the simulated system, including the PPE and the eight (or 16) SPEs
- Includes dialogs to view the contents of the registers, memory, and channels, and other architectural structures
- Based on Tcl/Tk
- Layered on top of the command line interface

#### Command line

- Uses Tcl (Tool Command Language) as the base command interpreter
- All the standard Tcl commands are available
- SystemSim commands to configure and create simulated machines
- Commands (e.g. mysim) to control a specific simulated machine

### **Operating-System Modes**

#### Linux Mode

- Simulator boots a full Linux operating system on the simulated system
- Applications are launched from the Linux console window and run
- The simulated operating system handles all the system calls

#### Standalone Mode

- The application is loaded directly into the simulated machine without an operating system
- The simulator traps all system calls made by the application and performs these functions in place of the operating system
- Some restrictions apply, such as
  - The application must be statically linked with any libraries it needs
  - No virtual memory support is provided
  - Only a subset of system calls is supported

# Simulator Structure and Windows



# Interacting with the Simulator

### Issuing commands to the simulator

- Issue these commands in the simulator command window, or use the equivalent actions in the graphical user interface (GUI).
- They control the simulator itself, for example to collect and display performance statistics on particular SPEs or to set breakpoints in code.

### Issuing commands to the simulated system

- Issue these commands in the console window, which is a Linux shell of the simulated Linux operating system.
- The simulated system is the Linux environment on top of the simulated Cell system, where you run and debug programs.

# Starting the Simulator in GUI Mode

- The simulator is invoked with the systemsim command: `systemsim -g`
  - Note: add /opt/ibm/systemsim-cell/bin to your path
- Specify the initial run script using -f if configuration is needed
  - The file should be in the current directory or be path-qualified
  - This configures the simulated machine and prepares it for execution
  - The default is .systemsim.tcl
  - Samples are provided in the simulator run directory
    - Linux mode:
      - /opt/ibm/systemsim-cell/run/cell/linux/systemsim.tcl

#### Other systemsim options

- -n : do not open a console window
- -q : suppress periodic run statistics messages
- -g : enable the graphical interface
- Starting the simulator in GUI mode with two Cell BE processors (SMP configuration)
  - `systemsim -g -f config_smp.tcl`
- Another way to start the simulator
  - `cd /opt/ibm/systemsim-cell/run/cell/linux`
  - `../run_gui`

# SystemSim Cell GUI main panel

[Figure: systemsim-cell main window for an SMP configuration. The left pane shows the mysim tree with BE\_0 and BE\_1, each containing PPE0:0, PPE0:1, SPE0 to SPE7, Load-Elf-App, Load-Elf-Kernel, Memory Map, and SystemMemory. The right pane shows the cycle counter and the buttons Advance Cycle, Go, Stop, Service GDB, Triggers/Breakpoints, Update GUI, Debug Controls, Options, Emitters, Fast Mode, SPU Modes, SPE Visualization, Process-Tree, Process-Tree-Stats, Track All PCs, Event Log, and Exit.]

### **Basic Simulator operations**

[Figure: systemsim-cell main window with File, Window, and Help menus. The left pane shows the mysim tree (PPE0:1, SPE0 to SPE7, Load-Elf-App, Load-Elf-Kernel, Memory Map). The right pane shows the Advance Cycle Amount field, the cycle counter, and the buttons Advance Cycle, Go, Stop, Service GDB, Triggers/Breakpoints, Update GUI, Debug Controls, Options, Emitters, Fast Mode, SPU Modes, SPE Visualization, Process-Tree, Process-Tree-Stats, Track All PCs, Event Log, and Exit.]

# The PPE




[Figure: mysim/PPE0:0 Core window, opened from the PPCCore entry of the mysim tree. It lists the PPE general-purpose registers (GPR0 to GPR31), floating-point registers (FPR0 to FPR31), VMX registers (VMXR0 to VMXR31), and special-purpose registers such as BPVR, PURR, TDABR, TIABR, cr, ctr, and dabr, each with its current value.]

# The SPU

[Figure: SPE0 windows. The tree under SPE0 offers SPUTrack, SPUCore, MFC, MFC\_XLate, LS Stats, SPUMemory, SPUStats, Model:instruction, and Load-Exec. The mysim/SPE0: PC Tracker window shows disassembled SPU instructions (il, stqd, a, rotqbyi, ai). The mysim/SPE0: Channels window lists the SPU channels from 0 (Read Event Status) to 30 (Write Outbound Interrupt Mailbox), each with a breakpoint (BP) checkbox, value, and count; a breakpoint is shown triggered at PC=0000010C on `wrch $mfc_cmd_queue,$2`.]

# Simulator Modes – fast, simple, and cycle

- The default simulation mode when the simulator starts is "simple", or functional-only, simulation
  - In this mode, the time / cycles to execute an application is NOT a meaningful indicator of execution time on real hardware

### To get meaningful performance results:

- Select "Cycle" mode in the GUI, or
- Enter "mysim mode cycle" in the command window

### This will make the simulator run slower

- Depending on the workload, simulation time could increase by 10x to 100x
- But you can switch between modes as needed, so you can limit this overhead to just the relevant portions of the simulation

# How to Exchange Files between Host and Simulator

#### callthru

- A command issued from a simulated console window (that is, from inside the simulator)
- "backdoor" communication mechanism for the simulated environment to communicate with the host environment
- Useful for bringing in files to the simulated environment without shutting down and restarting the simulator
- Example: (binary host  $\rightarrow$  simulator)
  - callthru source /opt/cell\_class/Hands-on-30/hello/hello\_ppu/hello\_ppu > hello\_ppu
  - chmod 755 hello\_ppu
  - ./hello\_ppu
- Example (result file simulator  $\rightarrow$  host)
  - callthru sink /home/systemsim-cell/results/result\_file < result\_file
  - Exports result files out of the simulated environment for later inspection



# **Execute Binary**

- From the simulated console window, bring the executable into the simulator by using the callthru utility, e.g.,
  - callthru source /opt/cell\_class/Hands-on-30/hello/hello\_ppu/hello\_ppu > hello\_ppu
- Execute binary
  - chmod 755 hello\_ppu
  - ./hello\_ppu

Tip!

Copy the binary to /tmp/<exe> on the host to shorten the path given to callthru



# Building three types of the hello world! program

# **Directory Structure**

/opt/cell\_class/Hands-on-30/hello

- hello\_ppu
- hello\_spu
- hello\_be1 synchronous spu thread (hello\_be1-sync)
  - spu
- hello\_be1 asynchronous spu thread (hello\_be1-async)
  - spu

### Hands-on Exercise

1. Create a directory hello\_ppu, write a hello world ppu program and create a Makefile, then compile and execute it as a standalone ppu program.
2. Create a directory hello\_spu, write a hello world spu program and create a Makefile, then compile and execute it as a standalone spu program.
3. Create a directory hello\_be1, and write a ppu program that calls an spu program to write hello world in a synchronous manner. Create all ppu and spu Makefiles. Compile and execute those programs to demonstrate the basic structure of a simple PPE-SPE software synergy model (PPE-single SPE model).
4. Same as in 3, but with an asynchronous thread.
5. Produce a simple multi-threaded hello world program.
   - See the instructions on the next page

# Need to compile (use make) and run the executables on the simulator

### Hands-on – multi-threaded hello world

To produce a simple program for the CBE, you should follow the steps listed below (this example is included in the SDK in /opt/cell/sdk/src/tutorial/simple).

The project is called simple. For this example, the PPE code will be built in the project directory, instead of a ppu subdirectory.

This program creates SPE threads that output "Hello Cell (#)\n" to the systemsim output window, where # is the spe\_id of the SPE thread that issued the print.

#### 1. Create a directory named simple.

#### 2. In directory simple, create a file named Makefile using the following code:

```
# Subdirectories
DIRS := spu

# Target
PROGRAM_ppu := simple

# Local Defines
IMPORTS := spu/lib_simple_spu.a -lspe2 -lpthread

# imports the embedded simple_spu library
# allows consolidation of spu program into ppe binary

# make.footer
# make.footer is in the top of the SDK
ifdef CELL_TOP
include $(CELL_TOP)/buildutils/make.footer
else
include ../../../buildutils/make.footer
endif
```

#### 3. In directory simple, create a file simple.c using the following code:

```
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <libspe2.h>
#include <pthread.h>

/* Handle to the SPE program embedded through spu/lib_simple_spu.a */
extern spe_program_handle_t simple_spu;

#define MAX_SPU_THREADS 16

/* Each pthread runs one SPE context; spe_context_run blocks until the SPE program exits */
void *ppu_pthread_function(void *arg) {
    spe_context_ptr_t ctx;
    unsigned int entry = SPE_DEFAULT_ENTRY;

    ctx = *((spe_context_ptr_t *)arg);
    if (spe_context_run(ctx, &entry, 0, NULL, NULL, NULL) < 0) {
        perror("Failed running context");
        exit(1);
    }
    pthread_exit(NULL);
}

int main()
{
    int i, spu_threads;
    spe_context_ptr_t ctxs[MAX_SPU_THREADS];
    pthread_t threads[MAX_SPU_THREADS];

    /* Determine the number of SPE threads to create */
    spu_threads = spe_cpu_info_get(SPE_COUNT_USABLE_SPES, -1);
    if (spu_threads > MAX_SPU_THREADS) spu_threads = MAX_SPU_THREADS;

    /* Create several SPE-threads to execute 'simple_spu' */
    for (i = 0; i < spu_threads; i++) {
        /* Create context */
        if ((ctxs[i] = spe_context_create(0, NULL)) == NULL) {
            perror("Failed creating context");
            exit(1);
        }
        /* Load program into context */
        if (spe_program_load(ctxs[i], &simple_spu)) {
            perror("Failed loading program");
            exit(1);
        }
        /* Create thread for each SPE context */
        if (pthread_create(&threads[i], NULL, &ppu_pthread_function, &ctxs[i])) {
            perror("Failed creating thread");
            exit(1);
        }
    }

    /* Wait for SPU-thread to complete execution. */
    for (i = 0; i < spu_threads; i++) {
        if (pthread_join(threads[i], NULL)) {
            perror("Failed pthread_join");
            exit(1);
        }
    }

    printf("\nThe program has successfully executed.\n");
    return (0);
}
```

#### 4. Create a directory named spu.

#### 5. In the directory spu, create a file named Makefile using the following code:

```
# Target
PROGRAMS_spu := simple_spu

# created embedded library
LIBRARY_embed := lib_simple_spu.a

# make.footer
# make.footer is in the top of the SDK
ifdef CELL_TOP
include $(CELL_TOP)/buildutils/make.footer
else
include ../../../buildutils/make.footer
endif
```

#### 6. In the same directory, create a file simple\_spu.c using the following code:

```
#include <stdio.h>

int main(unsigned long long id)
{
    /* The first parameter of an spu program will always be the spe_id of the spe
     * thread that issued it.
     */
    printf("Hello Cell (0x%llx)\n", id);

    return 0;
}
```

#### 7. Compile the program by entering the following command at the command line while in the simple directory:

make

# Summary

- Compile and execute different types of Cell programs on the simulator
  - Understand the basic differences between a ppu, spu, and BE program
  - Understand the embedded concept of a Cell BE program
  - Understand the build process
  - Understand the contents of the different Makefiles
  - Understand the basic operations of the simulator

### **Special Notices -- Trademarks**

This document was developed for IBM offerings in the United States as of the date of publication. IBM may not make these offerings available in other countries, and the information is subject to change without notice. Consult your local IBM business contact for information on the IBM offerings available in your area. In no event will IBM be liable for damages arising directly or indirectly from any use of the information contained in this document.

Information in this document concerning non-IBM products was obtained from the suppliers of these products or other public sources. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to these patents. Send license inquiries, in writing, to IBM Director of Licensing, IBM Corporation, New Castle Drive, Armonk, NY 10504-1785 USA.

All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

The information contained in this document has not been submitted to any formal IBM test and is provided "AS IS" with no warranties or guarantees either expressed or implied.

All examples cited or described in this document are presented as illustrations of the manner in which some IBM products can be used and the results that may be achieved. Actual environmental costs and performance characteristics will vary depending on individual client configurations and conditions.

IBM Global Financing offerings are provided through IBM Credit Corporation in the United States and other IBM subsidiaries and divisions worldwide to qualified commercial and government clients. Rates are based on a client's credit rating, financing terms, offering type, equipment type and options, and may vary by country. Other restrictions may apply. Rates and offerings are subject to change, extension or withdrawal without notice.

IBM is not responsible for printing errors in this document that result in pricing or information inaccuracies.

All prices shown are IBM's United States suggested list prices and are subject to change without notice; reseller prices may vary.

IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.

Many of the features described in this document are operating system dependent and may not be available on Linux. For more information, please check: http://www.ibm.com/systems/p/software/whitepapers/linux\_overview.html

Any performance data contained in this document was determined in a controlled environment. Actual results may vary significantly and are dependent on many factors including system hardware configuration and software design and configuration. Some measurements quoted in this document may have been made on development-level systems. There is no guarantee these measurements will be the same on generally-available systems. Some measurements quoted in this document may have been estimated through extrapolation. Users of this document should verify the applicable data for their specific environment.

Revised January 19, 2006



### Special Notices (Cont.) -- Trademarks

The following terms are trademarks of International Business Machines Corporation in the United States and/or other countries: alphaWorks, BladeCenter, Blue Gene, ClusterProven, developerWorks, e business(logo), e(logo)business, e(logo)server, IBM, IBM(logo), ibm.com, IBM Business Partner (logo), IntelliStation, MediaStreamer, Micro Channel, NUMA-Q, PartnerWorld, PowerPC, PowerPC(logo), pSeries, TotalStorage, xSeries; Advanced Micro-Partitioning, eServer, Micro-Partitioning, NUMACenter, On Demand Business logo, OpenPower, POWER, Power Architecture, Power Everywhere, Power Family, Power PC, PowerPC Architecture, POWER5, POWER5+, POWER6, POWER6+, Redbooks, System p, System p5, System Storage, VideoCharger, Virtualization Engine.

A full list of U.S. trademarks owned by IBM may be found at: http://www.ibm.com/legal/copytrade.shtml.

Cell Broadband Engine and Cell Broadband Engine Architecture are trademarks of Sony Computer Entertainment, Inc. in the United States, other countries, or both.

Rambus is a registered trademark of Rambus, Inc.

XDR and FlexIO are trademarks of Rambus, Inc.

UNIX is a registered trademark of The Open Group in the United States, other countries or both.

Linux is a trademark of Linus Torvalds in the United States, other countries or both.

Fedora is a trademark of Red Hat, Inc.

Microsoft, Windows, Windows NT and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries or both.

Intel, Intel Xeon, Itanium and Pentium are trademarks or registered trademarks of Intel Corporation in the United States and/or other countries.

AMD Opteron is a trademark of Advanced Micro Devices, Inc.

Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States and/or other countries.

TPC-C and TPC-H are trademarks of the Transaction Processing Performance Council (TPC).

SPECint, SPECfp, SPECjbb, SPECweb, SPECjAppServer, SPEC OMP, SPECviewperf, SPECapc, SPEChpc, SPECjvm, SPECmail, SPECimap and SPECsfs are trademarks of the Standard Performance Evaluation Corp (SPEC).

AltiVec is a trademark of Freescale Semiconductor, Inc.

PCI-X and PCI Express are registered trademarks of PCI SIG.

InfiniBand<sup>™</sup> is a trademark of the InfiniBand<sup>®</sup> Trade Association.

Other company, product and service names may be trademarks or service marks of others.

Revised July 23, 2006



### **Special Notices - Copyrights**

(c) Copyright International Business Machines Corporation 2005. All Rights Reserved. Printed in the United States September 2005.

The following are trademarks of International Business Machines Corporation in the United States, or other countries, or both: IBM, IBM Logo, Power Architecture.

Other company, product and service names may be trademarks or service marks of others.

All information contained in this document is subject to change without notice. The products described in this document are NOT intended for use in applications such as implantation, life support, or other hazardous uses where malfunction could result in death, bodily injury, or catastrophic property damage. The information contained in this document does not affect or change IBM product specifications or warranties. Nothing in this document shall operate as an express or implied license or indemnity under the intellectual property rights of IBM or third parties. All information contained in this document was obtained in specific environments, and is presented as an illustration. The results obtained in other operating environments may vary.

While the information contained herein is believed to be accurate, such information is preliminary, and should not be relied upon for accuracy or completeness, and no representations or warranties of accuracy or completeness are made.

THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN "AS IS" BASIS. In no event will IBM be liable for damages arising directly or indirectly from any use of the information contained in this document.

IBM Microelectronics Division, 1580 Route 52, Bldg. 504, Hopewell Junction, NY 12533-6351

The IBM home page is http://www.ibm.com

The IBM Microelectronics Division home page is http://www.chips.ibm.com