Tuesday, August 16, 2011
OpenGL provides two main methods of combining vertices into primitives (lines, triangles, etc.):
- glDrawArrays/glMultiDrawArrays - Vertices are consumed in order, so every primitive's vertices must be listed explicitly. If Vertex A is shared by four primitives, Vertex A must be stored in the array, and sent to the GPU, four times.
- glDrawElements/glMultiDrawElements - Vertices are sent in one large block, and an index list determines which vertices belong to which primitive. If Vertex A is shared by four primitives, Vertex A is only sent to the GPU once.
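To make the difference concrete, here is a small Python sketch that builds the same n x n grid of quads in both layouts. Plain lists stand in for vertex buffers and no OpenGL calls are made; the GL calls named in the docstrings are only the ones each layout would feed. The helper names (`corner`, `build_arrays`, `build_elements`, `layout_bytes`) are hypothetical, and the byte counts assume 32-bit float positions and `GL_UNSIGNED_INT` indices.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]
FLOAT_BYTES = 4  # assumption: GL_FLOAT positions
INDEX_BYTES = 4  # assumption: GL_UNSIGNED_INT indices


def corner(i: int, j: int) -> Vertex:
    """Position of grid corner (i, j) in the z = 0 plane."""
    return (float(i), float(j), 0.0)


def build_arrays(n: int) -> List[Vertex]:
    """Non-indexed triangle list for glDrawArrays(GL_TRIANGLES, 0, len(result)).
    Each triangle lists its own three vertices, so shared corners repeat."""
    verts: List[Vertex] = []
    for j in range(n):
        for i in range(n):
            a, b = corner(i, j), corner(i + 1, j)
            c, d = corner(i, j + 1), corner(i + 1, j + 1)
            verts += [a, b, c, b, d, c]  # two triangles per quad
    return verts


def build_elements(n: int) -> Tuple[List[Vertex], List[int]]:
    """Indexed layout for glDrawElements(GL_TRIANGLES, len(indices),
    GL_UNSIGNED_INT, 0). Each corner is stored once; triangles refer to it."""
    row = n + 1
    verts = [corner(i, j) for j in range(row) for i in range(row)]
    indices: List[int] = []
    for j in range(n):
        for i in range(n):
            a = j * row + i          # corner (i, j)
            b, c = a + 1, a + row    # corners (i+1, j) and (i, j+1)
            d = c + 1                # corner (i+1, j+1)
            indices += [a, b, c, b, d, c]
    return verts, indices


def layout_bytes(num_vertices: int, num_indices: int = 0) -> int:
    """Bytes uploaded for 3-float positions plus an optional index list."""
    return num_vertices * 3 * FLOAT_BYTES + num_indices * INDEX_BYTES


if __name__ == "__main__":
    n = 10
    flat = build_arrays(n)
    verts, indices = build_elements(n)
    print(len(flat), layout_bytes(len(flat)))  # 600 7200
    print(len(verts), len(indices), layout_bytes(len(verts), len(indices)))  # 121 600 3852
```

With this triangulation an interior corner belongs to six triangles, so it appears six times in the glDrawArrays layout but only once in the glDrawElements layout.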
But I want to process vertices on the GPU using OpenCL work-items. In this case, it may be easier to have each work-item access a separate group of vertices than to have multiple work-items access the same input data through an index list. As long as each work-item processes different vertices, I don't need to synchronize them with barriers. So I'm currently using glMultiDrawArrays instead of glMultiDrawElements.
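The partitioning argument can be simulated on the CPU. In this hypothetical Python sketch, each "work-item" is just the list of vertex slots it would touch; in a real kernel, work-item k would locate its slice from `get_global_id(0)`. The function names and the one-triangle-per-work-item mapping are assumptions for illustration. With the glMultiDrawArrays layout the slices are disjoint, while with an index list, work-items handling neighboring triangles reach the same vertices.

```python
from collections import Counter
from typing import Dict, List, Sequence


def groups_for_arrays(num_vertices: int, per_item: int) -> List[List[int]]:
    """Work-item k owns the contiguous slice [k*per_item, (k+1)*per_item)
    of a non-indexed vertex array."""
    return [list(range(start, min(start + per_item, num_vertices)))
            for start in range(0, num_vertices, per_item)]


def groups_for_elements(indices: Sequence[int], per_item: int) -> List[List[int]]:
    """Work-item k touches the vertices named by its slice of the index list."""
    return [list(indices[start:start + per_item])
            for start in range(0, len(indices), per_item)]


def shared_vertices(groups: List[List[int]]) -> Dict[int, int]:
    """Map each vertex touched by more than one work-item to the number of
    work-items that touch it. An empty result means no two work-items ever
    process the same vertex."""
    counts = Counter(v for group in groups for v in set(group))
    return {v: n for v, n in counts.items() if n > 1}


if __name__ == "__main__":
    # One quad drawn as two triangles, one triangle per work-item.
    print(shared_vertices(groups_for_arrays(6, 3)))  # {}
    print(shared_vertices(groups_for_elements([0, 1, 2, 1, 3, 2], 3)))
```

In the indexed case, vertices 1 and 2 are each reached by two work-items. If work-items write results back into the vertices they touch, those are the slots where their writes would collide.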
However, some operations are beyond OpenCL, and dynamic memory allocation is an important one. For example, if a model contains one thousand 3-D objects that need to be processed mathematically, OpenCL is great. But if the user deletes Objects 5 and 612, an OpenCL kernel can't deallocate or shrink the memory holding their vertices. Instead, the CPU needs to free the memory and re-transfer the remaining vertices to the GPU.
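The CPU-side step amounts to compacting the vertex array and rebuilding the `first`/`count` arrays that glMultiDrawArrays takes. The sketch below is a hypothetical Python version, not OpenGL code: it assumes each object's vertices are stored contiguously and that object numbers are zero-based positions in `first`/`count`. The compacted list would then be re-uploaded (for example with glBufferData) before the next glMultiDrawArrays call.

```python
from typing import AbstractSet, List, Sequence, Tuple, TypeVar

V = TypeVar("V")


def delete_objects(vertices: Sequence[V], first: Sequence[int],
                   count: Sequence[int], doomed: AbstractSet[int]
                   ) -> Tuple[List[V], List[int], List[int]]:
    """Drop the objects whose positions are in `doomed` and pack the
    survivors' vertices together. Returns the compacted vertex list and
    new first/count arrays suitable for glMultiDrawArrays."""
    new_vertices: List[V] = []
    new_first: List[int] = []
    new_count: List[int] = []
    for obj, (start, n) in enumerate(zip(first, count)):
        if obj in doomed:
            continue  # this object's vertices are simply not copied
        new_first.append(len(new_vertices))  # survivor's new offset
        new_count.append(n)
        new_vertices.extend(vertices[start:start + n])
    return new_vertices, new_first, new_count


if __name__ == "__main__":
    verts = list(range(12))  # stand-ins for real vertex records
    first, count = [0, 3, 9], [3, 6, 3]
    print(delete_objects(verts, first, count, {1}))
    # ([0, 1, 2, 9, 10, 11], [0, 3], [3, 3])
```

Every object stored after a deleted one shifts to a new offset, so its vertices have to be copied again.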
So the question is whether the speed-up provided by OpenCL kernels is sufficient to offset the delay imposed by using glMultiDrawArrays instead of glMultiDrawElements. In the end, the only way to know is to profile both methods and go with the one that provides better performance.