Course Outline

Introduction

  • What is OpenCL?
  • OpenCL vs. CUDA vs. SYCL.
  • Overview of OpenCL features and architecture.
  • Setting up the development environment.

Getting Started

  • Creating a new OpenCL project using Visual Studio Code.
  • Exploring the project structure and files.
  • Compiling and running the program.
  • Displaying output using printf and fprintf.

OpenCL API

  • Understanding the role of the OpenCL API in the host program.
  • Using the OpenCL API to query device information and capabilities.
  • Using the OpenCL API to create contexts, command queues, buffers, kernels, and events.
  • Using the OpenCL API to enqueue commands such as read, write, copy, map, unmap, execute, and wait.
  • Using the OpenCL API to handle errors through return codes (and exceptions in the C++ bindings).

OpenCL C

  • Understanding the role of OpenCL C in the device program.
  • Using OpenCL C to write kernels that execute on the device and manipulate data.
  • Using OpenCL C data types, qualifiers, operators, and expressions.
  • Using OpenCL C built-in functions such as math, geometric, and relational operations.
  • Using OpenCL C extensions and optional features such as atomics, images, cl_khr_fp16, etc.

OpenCL Memory Model

  • Understanding the difference between host and device memory models.
  • Using OpenCL memory spaces such as global, local, constant, and private.
  • Using OpenCL memory objects such as buffers, images, and pipes.
  • Using OpenCL memory access modes such as read-only, write-only, read-write, etc.
  • Using OpenCL memory consistency models and synchronization mechanisms.
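
The interplay of local memory and synchronization listed above can be modelled on the host. The following is a minimal sketch in plain C, not OpenCL C: the array stands in for a __local buffer, each inner-loop iteration stands in for one work-item, and the end of each inner loop stands in for barrier(CLK_LOCAL_MEM_FENCE). The function name model_work_group_sum is hypothetical, and the work-group size is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side model of a tree reduction inside one work-group.
   scratch plays the role of __local memory; wg_size is the number of
   work-items and is assumed to be a power of two. */
float model_work_group_sum(float *scratch, size_t wg_size) {
    assert(wg_size > 0 && (wg_size & (wg_size - 1)) == 0);
    for (size_t stride = wg_size / 2; stride > 0; stride /= 2) {
        /* Each lid models one active work-item in this phase. It reads
           an element at or above 'stride', which no work-item writes
           during the same phase. */
        for (size_t lid = 0; lid < stride; ++lid) {
            scratch[lid] += scratch[lid + stride];
        }
        /* Leaving the inner loop models barrier(CLK_LOCAL_MEM_FENCE):
           every write of this phase is visible before the next one. */
    }
    return scratch[0];
}
```

On a real device the barrier is what makes each phase safe. Without it, a work-item could read a partial sum that its neighbour has not yet written.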

OpenCL Execution Model

  • Understanding the difference between host and device execution models.
  • Using OpenCL work-items, work-groups, and ND-ranges to define parallelism.
  • Using OpenCL work-item functions such as get_global_id, get_local_id, get_group_id, etc.
  • Using OpenCL work-group functions such as barrier, work_group_reduce_<op>, work_group_scan_inclusive_<op>, etc.
  • Using OpenCL work-item size queries such as get_num_groups, get_global_size, get_local_size, etc.
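
The index and size functions named above are related by simple arithmetic. The following is a minimal sketch in plain C of a one-dimensional ND-range, assuming a zero global offset and a global size that is a multiple of the local size. The struct ndrange1d and the model_* functions are hypothetical names that mirror what get_num_groups(0), get_global_id(0), get_group_id(0), and get_local_id(0) return on the device.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side model of a 1-D ND-range with zero global offset. */
typedef struct {
    size_t global_size; /* total work-items, as from get_global_size(0) */
    size_t local_size;  /* work-items per group, as from get_local_size(0) */
} ndrange1d;

/* Number of work-groups, as from get_num_groups(0). */
size_t model_num_groups(ndrange1d r) {
    assert(r.local_size > 0 && r.global_size % r.local_size == 0);
    return r.global_size / r.local_size;
}

/* Global index of a work-item from its group and local indices. */
size_t model_global_id(ndrange1d r, size_t group_id, size_t local_id) {
    assert(local_id < r.local_size);
    return group_id * r.local_size + local_id;
}

/* Inverse mapping: which group a global index belongs to. */
size_t model_group_id(ndrange1d r, size_t global_id) {
    return global_id / r.local_size;
}

/* Inverse mapping: position of a global index inside its group. */
size_t model_local_id(ndrange1d r, size_t global_id) {
    return global_id % r.local_size;
}
```

For example, with 1024 work-items in groups of 64, the work-item with local index 5 in group 3 has global index 3 * 64 + 5 = 197.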

Debugging

  • Understanding common errors and bugs in OpenCL programs.
  • Using the Visual Studio Code debugger to set breakpoints and inspect variables, the call stack, etc.
  • Using CodeXL to debug and analyze OpenCL programs on AMD devices.
  • Using Intel VTune to debug and analyze OpenCL programs on Intel devices.
  • Using NVIDIA Nsight to debug and analyze OpenCL programs on NVIDIA devices.

Optimization

  • Understanding factors that affect the performance of OpenCL programs.
  • Using OpenCL vector data types and vectorization techniques to improve arithmetic throughput.
  • Using OpenCL loop unrolling and loop tiling techniques to reduce control overhead and increase locality.
  • Using OpenCL local memory and local memory functions to optimize memory accesses and bandwidth.
  • Using OpenCL profiling and profiling tools to measure and improve execution time and resource utilization.
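
Loop tiling, mentioned above, can be illustrated without any device code. The following is a minimal sketch in plain C, assuming square row-major matrices. The names matmul_naive, matmul_tiled, and the TILE value are hypothetical choices for illustration. In an OpenCL kernel the same idea usually appears as each work-group staging one tile in local memory.

```c
#include <stddef.h>

#define TILE 4 /* hypothetical tile edge; a real value is tuned per device */

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Reference: c = a * b for n x n row-major matrices. */
void matmul_naive(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

/* Tiled version: the three outer loops walk TILE x TILE blocks so that
   the block of b being read stays hot while it is reused. The min_size
   bounds handle n that is not a multiple of TILE. */
void matmul_tiled(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n * n; ++i) {
        c[i] = 0.0f;
    }
    for (size_t ii = 0; ii < n; ii += TILE) {
        for (size_t kk = 0; kk < n; kk += TILE) {
            for (size_t jj = 0; jj < n; jj += TILE) {
                for (size_t i = ii; i < min_size(ii + TILE, n); ++i) {
                    for (size_t k = kk; k < min_size(kk + TILE, n); ++k) {
                        const float aik = a[i * n + k];
                        for (size_t j = jj; j < min_size(jj + TILE, n); ++j) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

Both functions accumulate each output element in the same order of k, so they produce the same result. Only the memory access pattern differs.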

Summary and Next Steps

Requirements

  • Understanding of the C/C++ language and parallel programming concepts.
  • Basic knowledge of computer architecture and memory hierarchy.
  • Experience with command-line tools and code editors.

Audience

  • Developers who wish to learn how to use OpenCL to program heterogeneous devices and exploit their parallelism.
  • Developers who wish to write portable and scalable code that can run on different platforms and devices.
  • Programmers who wish to explore the low-level aspects of heterogeneous programming and optimize their code performance.

Duration: 28 hours
