Course Outline
Introduction
- What is OpenCL?
- OpenCL vs CUDA vs SYCL
- Overview of OpenCL features and architecture.
- Setting up the development environment.
Getting Started
- Creating a new OpenCL project using Visual Studio Code.
- Exploring the project structure and files.
- Compiling and running the program.
- Displaying output using printf and fprintf.
OpenCL API
- Understanding the role of the OpenCL API in the host program.
- Using the OpenCL API to query device information and capabilities.
- Using the OpenCL API to create contexts, command queues, buffers, kernels, and events.
- Using the OpenCL API to enqueue commands such as read, write, copy, map, unmap, execute, and wait.
- Using the OpenCL API to handle errors through return codes (and exceptions in the C++ bindings).
OpenCL C
- Understanding the role of OpenCL C in the device program.
- Using OpenCL C to write kernels that execute on the device and manipulate data.
- Using OpenCL C data types, qualifiers, operators, and expressions.
- Using OpenCL C built-in functions such as math, geometric, and relational operations.
- Using OpenCL C extensions and optional features such as atomics, images, and cl_khr_fp16.
OpenCL Memory Model
- Understanding the difference between host and device memory models.
- Using OpenCL memory spaces such as global, local, constant, and private.
- Using OpenCL memory objects such as buffers, images, and pipes.
- Using OpenCL memory access flags such as read-only, write-only, and read-write.
- Using OpenCL memory consistency models and synchronization mechanisms.
OpenCL Execution Model
- Understanding the difference between host and device execution models.
- Using OpenCL work-items, work-groups, and ND-ranges to define parallelism.
- Using OpenCL work-item functions such as get_global_id, get_local_id, get_group_id, etc.
- Using OpenCL work-group functions such as barrier and the work_group_reduce_<op> and work_group_scan_inclusive_<op> / work_group_scan_exclusive_<op> families.
- Using OpenCL work-item query functions such as get_num_groups, get_global_size, get_local_size, etc.
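As a rough illustration of how the work-item functions in this module relate to one another, the following host-side C sketch models a one-dimensional ND-range with a zero global offset. The helper names (global_id_1d, num_groups_1d, round_up_global_size) are hypothetical and are not part of the OpenCL API; they only mirror the arithmetic behind get_global_id, get_num_groups, and a common way of choosing the global size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the value get_global_id(0) would return for a
   work-item, assuming a 1-D ND-range and a global offset of zero. */
size_t global_id_1d(size_t group_id, size_t local_size, size_t local_id)
{
    assert(local_id < local_size);
    return group_id * local_size + local_id;
}

/* Hypothetical helper: the value get_num_groups(0) would return,
   assuming global_size is a multiple of local_size. */
size_t num_groups_1d(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

/* Hypothetical helper: round a problem size n up to the next multiple of
   the work-group size, so every element is covered by some work-item. */
size_t round_up_global_size(size_t n, size_t local_size)
{
    assert(local_size > 0);
    return (n + local_size - 1) / local_size * local_size;
}
```

For example, work-item 5 of work-group 2 with a work-group size of 64 has global id 2 * 64 + 5 = 133, and a problem of 1000 elements with the same work-group size would be launched with a global size of 1024. When the global size is rounded up in this way, the kernel typically guards against the surplus work-items with a bounds check.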
Debugging
- Understanding common errors and bugs in OpenCL programs.
- Using the Visual Studio Code debugger to inspect variables, breakpoints, call stack, etc.
- Using CodeXL to debug and analyze OpenCL programs on AMD devices.
- Using Intel VTune Profiler to profile and analyze OpenCL programs on Intel devices.
- Using NVIDIA Nsight to debug and analyze OpenCL programs on NVIDIA devices.
Optimization
- Understanding factors that affect the performance of OpenCL programs.
- Using OpenCL vector data types and vectorization techniques to improve arithmetic throughput.
- Using OpenCL loop unrolling and loop tiling techniques to reduce control overhead and increase locality.
- Using OpenCL local memory and local memory functions to optimize memory accesses and bandwidth.
- Using OpenCL profiling and profiling tools to measure and improve execution time and resource utilization.
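The loop tiling idea from this module can be sketched in plain host-side C before it is applied to a kernel. The example below is a minimal sketch, not course material: the function names and the TILE size of 4 are assumptions chosen for illustration. It transposes a row-major matrix once with plain nested loops and once in TILE x TILE blocks, so that each block of reads and writes stays within a small region of memory.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4  /* illustrative tile edge; real values are tuned per device */

/* Reference version: walk the whole matrix row by row. */
void transpose_naive(const float *in, float *out, size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            out[c * rows + r] = in[r * cols + c];
}

/* Tiled version: process one TILE x TILE block at a time.
   The *_end bounds clamp partial tiles at the matrix edges. */
void transpose_tiled(const float *in, float *out, size_t rows, size_t cols)
{
    assert(in != out);  /* out-of-place transpose only */
    for (size_t r0 = 0; r0 < rows; r0 += TILE) {
        for (size_t c0 = 0; c0 < cols; c0 += TILE) {
            size_t r_end = r0 + TILE < rows ? r0 + TILE : rows;
            size_t c_end = c0 + TILE < cols ? c0 + TILE : cols;
            for (size_t r = r0; r < r_end; ++r)
                for (size_t c = c0; c < c_end; ++c)
                    out[c * rows + r] = in[r * cols + c];
        }
    }
}
```

Both functions produce the same result; only the order of memory accesses differs. In an OpenCL kernel the same blocking is usually expressed by mapping one tile to one work-group and staging the tile in local memory.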
Summary and Next Steps
Requirements
- Understanding of the C/C++ language and parallel programming concepts.
- Basic knowledge of computer architecture and memory hierarchy.
- Experience with command-line tools and code editors.
Audience
- Developers who wish to learn how to use OpenCL to program heterogeneous devices and exploit their parallelism.
- Developers who wish to write portable and scalable code that can run on different platforms and devices.
- Programmers who wish to explore the low-level aspects of heterogeneous programming and optimize their code performance.
28 Hours