Why use SIMD if we have GPGPU? [closed]

submitted by
Style Pass
2024-02-11 16:00:07

Now that we have GPGPUs with languages like CUDA and OpenCL, do the multimedia SIMD extensions (SSE/AVX/NEON) still serve a purpose?

I read an article recently about how SSE instructions could be used to accelerate sorting networks. I thought this was pretty neat, but when I told my comp arch professor, he laughed and said that running similar code on a GPU would destroy the SIMD version. I don't doubt this, because SSE is very simple and GPUs are large, highly complex accelerators with a lot more parallelism, but it got me thinking: are there many scenarios where the multimedia SIMD extensions are more useful than using a GPU?

If GPGPUs make SIMD redundant, why would Intel be increasing its SIMD support? SSE was 128 bits; now it's 256 bits with AVX, and next year it will be 512 bits. If GPGPUs are better at processing data-parallel code, why is Intel pushing these SIMD extensions? It could put the equivalent resources (research effort and die area) into a larger cache and a better branch predictor, improving serial performance instead.

First, SIMD can more easily interoperate with scalar code, because it can read and write the same memory directly, while GPUs require the data to be uploaded to GPU memory before it can be accessed. For example, it's straightforward to vectorize a function like memcmp() via SIMD, but it would be absurd to implement memcmp() by uploading the data to the GPU and running it there. The latency would be crushing.
