ST: CDA 6938
Multi-core/many-core architectures and programming
Assignments
Homework #0 (No need to turn it in)
a. Write a multithreaded program using the Brook+ streaming programming model and the emulator that prints “Hello world!” multiple times.
b. Write a multithreaded program using the CUDA programming model and the emulator that prints “Hello world!” multiple times.
Tips:
Brook+:
Download link: http://ati.amd.com/technology/streamcomputing/sdkdwnld.html
You need to download the ATI Stream SDK to install Brook+ and CAL. After that, set the environment variable BRT_RUNTIME for your project: use BRT_RUNTIME = cpu to run in emulation mode, or BRT_RUNTIME = cal if you have an ATI card. You may use Visual Studio to open a project in C:\Program Files\AMD\AMD Brook+ 1.2.1_beta\samples\tests\ to test it.
Note that you may need additional DLLs for
CUDA:
Download link: http://www.nvidia.com/object/cuda_get.html
You need to download (1) the CUDA driver (not necessary for emulation mode), (2) the CUDA toolkit (required), and (3) the CUDA SDK (required).
After installation, you may use Visual Studio to open a project in "C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\projects\deviceQuery\". Select the emurelease configuration, then build and run the project.
The latest versions of CAL and Brook+ as well as CUDA have been set up on the lab machines. If you choose to use them, the username is CDA6938 and the password is hec242.
On the desktop, there is a folder "ATI_BROOK". In this folder, you may copy the "template" folder to your own folder to start your work. There is another folder, "NVIDIA_CUDA", which contains a template for your projects using CUDA.
Homework #1 2-D Convolution in Brook+
Send your source code along with a brief explanation and the performance results in a text file to zhou@eecs.ucf.edu
Brief explanation of 2D convolution (or image convolution): given a matrix a[M, N] and a matrix h[J, K], the convolution is defined as follows:
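The standard definition of discrete 2D convolution is assumed in the formula below. The output name c, the zero-based index ranges, and an output of the same size as a are assumptions made here for illustration:

```latex
c[m, n] = \sum_{j=0}^{J-1} \sum_{k=0}^{K-1} h[j, k] \, a[m - j, \, n - k],
\qquad 0 \le m < M, \quad 0 \le n < N
```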
For simplicity, you may assume out-of-bound elements of a (e.g., a[-1,-1]) are zero.
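A plain CPU version can be handy for checking Brook+ or CUDA output. The sketch below assumes the standard definition, c[m, n] = sum over j and k of h[j, k] * a[m - j, n - k], with row-major storage and an output the same size as a. The function name conv2d_ref and its parameter order are hypothetical, not part of the course materials.

```c
#include <assert.h>
#include <stddef.h>

/* Reference 2D convolution on the CPU (hypothetical helper).
 * a: M x N input, h: J x K kernel, c: M x N output, all row-major.
 * Assumption: elements of a outside [0, M) x [0, N) are treated as zero,
 * and the output has the same size as a. */
void conv2d_ref(const float *a, int M, int N,
                const float *h, int J, int K, float *c)
{
    assert(a != NULL && h != NULL && c != NULL);
    for (int m = 0; m < M; ++m) {
        for (int n = 0; n < N; ++n) {
            float sum = 0.0f;
            for (int j = 0; j < J; ++j) {
                for (int k = 0; k < K; ++k) {
                    int r = m - j;  /* row index into a */
                    int s = n - k;  /* column index into a */
                    /* Skip out-of-bound elements of a (they count as zero). */
                    if (r >= 0 && r < M && s >= 0 && s < N)
                        sum += h[j * K + k] * a[r * N + s];
                }
            }
            c[m * N + n] = sum;
        }
    }
}
```

For example, with a 3 x 3 input holding 1 through 9 in row-major order and a 2 x 2 kernel h = {1, 0, 0, 1}, each output element is a[m, n] + a[m-1, n-1], so the result rows are 1 2 3, 4 6 8, and 7 12 14.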
Homework #2 2-D Convolution in CUDA
a. Write a 2D convolution function using CUDA. Debug your program using the CPU-based emulator and test it on lab machines.
b. Write an optimized version of single-precision floating-point 2D convolution for the NVIDIA GeForce 8800 GTX GPU (using CUDA). Use random numbers to initialize both matrices. Report the following results: (1) The number of lines of code in the kernel function(s). (2) The execution time (including the data transfer time from CPU to GPU and from GPU back to CPU) for matrix sizes 256 x 256, 512 x 512, 1024 x 1024, 2048 x 2048, and 4096 x 4096, with a 7 x 7 convolution kernel.
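One possible host-side harness for the measurements above is sketched here in plain C. It fills both matrices with random numbers and times one call that is assumed to cover the copy to the GPU, the kernel, and the copy back. All names (fill_random, now_seconds, conv_fn, placeholder_conv, time_convolution) are hypothetical, and placeholder_conv only stands in for your own CUDA wrapper so that the sketch compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Matrix widths and kernel width listed in the assignment. */
const int kSizes[] = {256, 512, 1024, 2048, 4096};
enum { kNumSizes = 5, kKernelDim = 7 };

/* Fill x with pseudo-random floats in [0, 1]. */
void fill_random(float *x, size_t count)
{
    assert(x != NULL);
    for (size_t i = 0; i < count; ++i)
        x[i] = (float)rand() / (float)RAND_MAX;
}

/* Wall-clock time in seconds (assumes a C11 library with timespec_get). */
double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical signature for the function under test. It should return
 * only after the results are back in c, so that the timed region includes
 * both data transfers as well as the kernel. */
typedef void (*conv_fn)(const float *a, int size,
                        const float *h, int ksize, float *c);

/* Stand-in that just copies the input; replace with your own function. */
void placeholder_conv(const float *a, int size,
                      const float *h, int ksize, float *c)
{
    (void)h;
    (void)ksize;
    memcpy(c, a, (size_t)size * (size_t)size * sizeof *c);
}

/* Time one run on a size x size matrix; returns seconds, or -1.0 if
 * memory allocation fails. */
double time_convolution(conv_fn run, int size)
{
    size_t count = (size_t)size * (size_t)size;
    float *a = malloc(count * sizeof *a);
    float *c = malloc(count * sizeof *c);
    float h[kKernelDim * kKernelDim];
    if (a == NULL || c == NULL) {
        free(a);
        free(c);
        return -1.0;
    }
    fill_random(a, count);
    fill_random(h, (size_t)kKernelDim * kKernelDim);

    double t0 = now_seconds();
    run(a, size, h, kKernelDim, c);
    double t1 = now_seconds();

    free(a);
    free(c);
    return t1 - t0;
}
```

A driver would loop over kSizes, call time_convolution with your own function in place of placeholder_conv, and record each returned time in the report. Averaging several runs per size is a reasonable refinement, since a single measurement can be noisy.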
Sample solutions
Homework #3 Cell Programming
Tips on installing the Cell SDK 3.0 on an x86 machine
Sample code on using Mailbox and DMA