Getting started with OpenCL and GPU Computing
Published: 2019-06-21


Reposted from:

 

OpenCL (Open Computing Language) is a new framework for writing programs that execute in parallel on different compute devices (such as CPUs and GPUs) from different vendors (AMD, Intel, ATI, Nvidia etc.). The framework defines a language to write “kernels” in. These kernels are the functions which are to run on the different compute devices. In this post I explain how to get started with OpenCL and how to make a small OpenCL program that will compute the sum of two lists in parallel.

 

Installing and setting up OpenCL on your computer

First of all you need to download the newest drivers for your graphics card. This is important because OpenCL will not work if you don’t have drivers that support OpenCL.

To install OpenCL you need to download an implementation of OpenCL. The major graphics vendors Nvidia and AMD/ATI have both released implementations of OpenCL for their GPUs. These implementations come in so-called software development kits (SDKs) and often include useful tools such as a visual profiler. The next step is to download and install the SDK for the GPU you have in your computer. Note that not all graphics cards are supported. A list of supported graphics cards can be found on the vendors’ websites.

For AMD/ATI GPUs download the

For Nvidia GPUs download the

The installation steps differ for each SDK and the OS you are running. Follow the installation manual of the SDK carefully. Personally I use Ubuntu Linux and have an AMD 7970 graphics card. Below are some installation steps for this specific setup.

Installing OpenCL on Ubuntu Linux with AMD graphics card

To install the latest AMD drivers on Ubuntu 12.04, open Additional Drivers and install/activate the one called “ATI/AMD proprietary FGLRX graphic driver (post-release updates)”.

After that is done, restart and download and extract the .

AMD APP SDK 2.8 includes an installer. Run this with the command:

sudo sh Install-AMD-APP.sh

Next, install the OpenCL header files:

sudo apt-get install opencl-headers

And you’re done! Note that the AMD APP SDK and its samples are located at /opt/AMDAPP.

Installing OpenCL on Ubuntu Linux with NVIDIA graphics card

Download the CUDA toolkit for Ubuntu from . Open a terminal and run the installation file with the command:

sudo sh cudatoolkit_3.1_linux_64_ubuntu9.10.run

Download the Developer Drivers for Linux from the same website and install them by first stopping X, running the file, and then starting X again. To stop X use:

sudo /etc/init.d/gdm stop

Then bring up a terminal by pressing CTRL+ALT+F5, log in, navigate to where you downloaded the devdriver and type:

sudo sh devdriver_3.1_linux_64_256.40.run

After the driver has been installed, start X again by typing:

startx

Before compiling an OpenCL application you need to add the path to the lib folder of CUDA to LD_LIBRARY_PATH like so:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}

Your first OpenCL program – Vector addition

To demonstrate OpenCL I explain how to perform the simple task of vector addition. Suppose we have two lists of numbers, A and B, of equal size. The task of vector addition is to add each element of A to the corresponding element of B and put the result in the corresponding element of a new list, C, of the same size. The figure below explains the operation.

Two lists A and B and the result list C of vector addition on A and B

The naive way of performing this operation is to simply loop through the list and perform the operation on one element at a time like the C++ code below:

for(int i = 0; i < LIST_SIZE; i++) {
    C[i] = A[i] + B[i];
}

This algorithm is simple but has linear time complexity, O(n), where n is the size of the list. However, since each iteration of this loop is independent of the other iterations, the operation is data parallel: all iterations can be computed simultaneously. So with n cores on a processor, the operation can in principle be performed in constant time, O(1).
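To make the independence of the iterations concrete, here is a minimal sketch in plain C (no OpenCL involved). It splits the index range into contiguous chunks, one per hypothetical worker, and deliberately processes the chunks last-to-first to show that the result does not depend on execution order. The function names and the num_workers parameter are illustrative assumptions, not part of any API.

```c
#include <assert.h>
#include <stddef.h>

/* Add one contiguous slice [begin, end) of the lists. Each slice touches
 * only its own elements, so slices can run in any order, or all at once. */
static void vector_add_chunk(const int *A, const int *B, int *C,
                             size_t begin, size_t end) {
    for (size_t i = begin; i < end; i++) {
        C[i] = A[i] + B[i];
    }
}

/* Split n elements into num_workers chunks (a hypothetical worker count)
 * and process them last-to-first. A real parallel runtime would hand each
 * chunk to a separate core instead of looping over them. */
static void vector_add_partitioned(const int *A, const int *B, int *C,
                                   size_t n, size_t num_workers) {
    assert(num_workers > 0);
    size_t chunk = (n + num_workers - 1) / num_workers; /* ceil(n / workers) */
    for (size_t w = num_workers; w-- > 0; ) {
        size_t begin = w * chunk;
        size_t end = begin + chunk;
        if (begin > n) begin = n; /* clamp the trailing chunks */
        if (end > n) end = n;
        vector_add_chunk(A, B, C, begin, end);
    }
}
```

Calling vector_add_partitioned(A, B, C, LIST_SIZE, 4) should fill C with the same values as the sequential loop, whichever order the four chunks are visited in.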

To make OpenCL perform this operation in parallel we need to make the kernel. The kernel is the function which will run on the compute device.

The kernel

The kernel is written in OpenCL C, a language based on C99 with some restrictions (for example, no recursion and no function pointers) and with many built-in math and vector functions. The kernel that performs the vector addition is defined below.

__kernel void vector_add(__global const int *A, __global const int *B, __global int *C) {

    // Get the index of the current element to be processed
    int i = get_global_id(0);

    // Do the operation
    C[i] = A[i] + B[i];
}
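One way to picture what get_global_id(0) returns is to emulate the launch on the host. The sketch below is plain C, not OpenCL: vector_add_item stands in for the kernel body, and run_1d_range (a hypothetical helper) calls it once per work-item, computing the global id as group id × local size + local id, which is how a one-dimensional range with no global offset is indexed. On a real device the work-items run concurrently rather than in a loop.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel body: one call handles exactly one element.
 * gid plays the role of get_global_id(0). */
static void vector_add_item(size_t gid, const int *A, const int *B, int *C) {
    C[gid] = A[gid] + B[gid];
}

/* Emulate a 1-D launch of global_size work-items split into work-groups
 * of local_size. The divisibility check mirrors the OpenCL 1.x rule that
 * the global size must be a multiple of the work-group size. */
static void run_1d_range(size_t global_size, size_t local_size,
                         const int *A, const int *B, int *C) {
    assert(local_size > 0 && global_size % local_size == 0);
    size_t num_groups = global_size / local_size;
    for (size_t group = 0; group < num_groups; group++) {
        for (size_t local = 0; local < local_size; local++) {
            size_t gid = group * local_size + local;
            vector_add_item(gid, A, B, C);
        }
    }
}
```

With a global size of 1024 and a work-group size of 64, this emulation visits 16 groups of 64 items each, and every index from 0 to 1023 is processed exactly once.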

The host program

The host program controls the execution of kernels on the compute devices. The host program is written in C, but bindings for other languages such as C++ and Python exist. The OpenCL API is defined in the cl.h header file (opencl.h on Apple platforms). Below is the code for the host program that executes the kernel above on a compute device. I will not go into detail on each step, as this is meant to be an introductory article, although I can recommend the book if you want to dive into the details. The main steps of a host program are as follows:

  • Get information about the platform and the devices available on the computer (line 42)
  • Select devices to use in execution (line 43)
  • Create an OpenCL context (line 47)
  • Create a command queue (line 50)
  • Create memory buffer objects (lines 53-58)
  • Transfer data (lists A and B) to memory buffers on the device (lines 61-64)
  • Create program object (line 67)
  • Load the kernel source code (lines 24-35) and compile it (line 71) (online execution) or load the precompiled binary OpenCL program (offline execution)
  • Create kernel object (line 74)
  • Set kernel arguments (lines 77-79)
  • Execute the kernel (line 84)
  • Read memory objects. In this case we read the list C from the compute device (lines 88-90)
  1  #include <stdio.h>
  2  #include <stdlib.h>
  3
  4  #ifdef __APPLE__
  5  #include <OpenCL/opencl.h>
  6  #else
  7  #include <CL/cl.h>
  8  #endif
  9
 10  #define MAX_SOURCE_SIZE (0x100000)
 11
 12  int main(void) {
 13      // Create the two input vectors
 14      int i;
 15      const int LIST_SIZE = 1024;
 16      int *A = (int*)malloc(sizeof(int)*LIST_SIZE);
 17      int *B = (int*)malloc(sizeof(int)*LIST_SIZE);
 18      for(i = 0; i < LIST_SIZE; i++) {
 19          A[i] = i;
 20          B[i] = LIST_SIZE - i;
 21      }
 22
 23      // Load the kernel source code into the array source_str
 24      FILE *fp;
 25      char *source_str;
 26      size_t source_size;
 27
 28      fp = fopen("vector_add_kernel.cl", "r");
 29      if (!fp) {
 30          fprintf(stderr, "Failed to load kernel.\n");
 31          exit(1);
 32      }
 33      source_str = (char*)malloc(MAX_SOURCE_SIZE);
 34      source_size = fread( source_str, 1, MAX_SOURCE_SIZE, fp);
 35      fclose( fp );
 36
 37      // Get platform and device information
 38      cl_platform_id platform_id = NULL;
 39      cl_device_id device_id = NULL;
 40      cl_uint ret_num_devices;
 41      cl_uint ret_num_platforms;
 42      cl_int ret = clGetPlatformIDs(1, &platform_id, &ret_num_platforms);
 43      ret = clGetDeviceIDs( platform_id, CL_DEVICE_TYPE_DEFAULT, 1,
 44              &device_id, &ret_num_devices);
 45
 46      // Create an OpenCL context
 47      cl_context context = clCreateContext( NULL, 1, &device_id, NULL, NULL, &ret);
 48
 49      // Create a command queue
 50      cl_command_queue command_queue = clCreateCommandQueue(context, device_id, 0, &ret);
 51
 52      // Create memory buffers on the device for each vector
 53      cl_mem a_mem_obj = clCreateBuffer(context, CL_MEM_READ_ONLY,
 54              LIST_SIZE * sizeof(int), NULL, &ret);
 55      cl_mem b_mem_obj = clCreateBuffer(context, CL_MEM_READ_ONLY,
 56              LIST_SIZE * sizeof(int), NULL, &ret);
 57      cl_mem c_mem_obj = clCreateBuffer(context, CL_MEM_WRITE_ONLY,
 58              LIST_SIZE * sizeof(int), NULL, &ret);
 59
 60      // Copy the lists A and B to their respective memory buffers
 61      ret = clEnqueueWriteBuffer(command_queue, a_mem_obj, CL_TRUE, 0,
 62              LIST_SIZE * sizeof(int), A, 0, NULL, NULL);
 63      ret = clEnqueueWriteBuffer(command_queue, b_mem_obj, CL_TRUE, 0,
 64              LIST_SIZE * sizeof(int), B, 0, NULL, NULL);
 65
 66      // Create a program from the kernel source
 67      cl_program program = clCreateProgramWithSource(context, 1,
 68              (const char **)&source_str, (const size_t *)&source_size, &ret);
 69
 70      // Build the program
 71      ret = clBuildProgram(program, 1, &device_id, NULL, NULL, NULL);
 72
 73      // Create the OpenCL kernel
 74      cl_kernel kernel = clCreateKernel(program, "vector_add", &ret);
 75
 76      // Set the arguments of the kernel
 77      ret = clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&a_mem_obj);
 78      ret = clSetKernelArg(kernel, 1, sizeof(cl_mem), (void *)&b_mem_obj);
 79      ret = clSetKernelArg(kernel, 2, sizeof(cl_mem), (void *)&c_mem_obj);
 80
 81      // Execute the OpenCL kernel on the list
 82      size_t global_item_size = LIST_SIZE; // Process the entire lists
 83      size_t local_item_size = 64; // Divide work items into groups of 64
 84      ret = clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL,
 85              &global_item_size, &local_item_size, 0, NULL, NULL);
 86
 87      // Read the memory buffer C on the device to the local variable C
 88      int *C = (int*)malloc(sizeof(int)*LIST_SIZE);
 89      ret = clEnqueueReadBuffer(command_queue, c_mem_obj, CL_TRUE, 0,
 90              LIST_SIZE * sizeof(int), C, 0, NULL, NULL);
 91
 92      // Display the result to the screen
 93      for(i = 0; i < LIST_SIZE; i++)
 94          printf("%d + %d = %d\n", A[i], B[i], C[i]);
 95
 96      // Clean up
 97      ret = clFlush(command_queue);
 98      ret = clFinish(command_queue);
 99      ret = clReleaseKernel(kernel);
100      ret = clReleaseProgram(program);
101      ret = clReleaseMemObject(a_mem_obj);
102      ret = clReleaseMemObject(b_mem_obj);
103      ret = clReleaseMemObject(c_mem_obj);
104      ret = clReleaseCommandQueue(command_queue);
105      ret = clReleaseContext(context);
106      free(A);
107      free(B);
108      free(C);
109      return 0;
110  }
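The listing above stores every return code in ret but never inspects it, so a failure (for example a kernel that does not compile) goes unnoticed until the output looks wrong. A minimal sketch of a helper that turns a few common codes into readable names follows. To keep it compilable without the OpenCL headers it takes a plain int and hard-codes the numeric values that cl.h assigns to these constants; in a real host program you would compare against the CL_* macros directly. The names cl_error_name and cl_check are hypothetical, and only a subset of codes is covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Map a handful of OpenCL status codes to their cl.h names. The numeric
 * values mirror the Khronos cl.h header; this is not the full list. */
static const char *cl_error_name(int code) {
    switch (code) {
    case 0:   return "CL_SUCCESS";
    case -1:  return "CL_DEVICE_NOT_FOUND";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -30: return "CL_INVALID_VALUE";
    case -46: return "CL_INVALID_KERNEL_NAME";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    default:  return "unlisted OpenCL error";
    }
}

/* Report a failed call on stderr. Returns 1 on success and 0 on failure,
 * so the caller decides whether to clean up and exit. */
static int cl_check(int code, const char *what) {
    assert(what != NULL);
    if (code == 0) return 1;
    fprintf(stderr, "%s failed: %s (%d)\n", what, cl_error_name(code), code);
    return 0;
}
```

For instance, adding if (!cl_check(ret, "clBuildProgram")) return 1; right after line 71 would make a kernel compilation error visible immediately instead of silently producing garbage in C.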

To make OpenCL run the kernel on the GPU you can change the constant CL_DEVICE_TYPE_DEFAULT to CL_DEVICE_TYPE_GPU in line 43. To run on the CPU you can set it to CL_DEVICE_TYPE_CPU. This shows how easy OpenCL makes it to run the same program on different compute devices.
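Rather than editing the constant and recompiling, the device type could also be chosen at run time, for instance from a command-line argument. The sketch below is an assumption about how one might do that, not part of the original program. It is plain C and hard-codes the bit values that cl.h gives to the CL_DEVICE_TYPE_* constants so that it compiles without the OpenCL headers; device_type_from_name and the DEVICE_TYPE_* macros are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bit values as defined for cl_device_type in cl.h. */
#define DEVICE_TYPE_DEFAULT (1UL << 0) /* CL_DEVICE_TYPE_DEFAULT */
#define DEVICE_TYPE_CPU     (1UL << 1) /* CL_DEVICE_TYPE_CPU */
#define DEVICE_TYPE_GPU     (1UL << 2) /* CL_DEVICE_TYPE_GPU */

/* Translate a user-supplied name into a device type bit. Anything other
 * than "gpu" or "cpu" falls back to the default device type. */
static unsigned long device_type_from_name(const char *name) {
    assert(name != NULL);
    if (strcmp(name, "gpu") == 0) return DEVICE_TYPE_GPU;
    if (strcmp(name, "cpu") == 0) return DEVICE_TYPE_CPU;
    return DEVICE_TYPE_DEFAULT;
}
```

The returned value would take the place of CL_DEVICE_TYPE_DEFAULT as the second argument of clGetDeviceIDs in line 43; main would then need to accept argc and argv so that the name can be read from the command line.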

Compiling an OpenCL program

If the OpenCL header and library files are located in their proper folders (/usr/include and /usr/lib) the following command will compile the vectorAddition program.

gcc main.c -o vectorAddition -l OpenCL

How to learn more

To learn more about OpenCL I recommend the book from Fixstars called . Below are some links to useful sites with information on OpenCL:
