Blender Cycles benchmark on GPUBox Artist


In the previous post we shared some information about scalability and performance in Octane Render. Now it is time to take a closer look at Blender and see how it compares. We recorded another video showing the rendering of the widely used BMW 1M benchmark scene by MikePan, but instead of the default settings we used a resolution of 8000×4200 and 1000 samples.

When it comes to using multiple GPUs, Blender is more of a long-distance runner than a sprinter: the heavier the scene and the longer the render, the better the scalability. Adding more and more GPUs is not very efficient when a simple, low-quality scene is being rendered, because the time spent preparing the scene and distributing it to the GPUs is not always offset by the speedup during the rendering itself. The magic happens when multiple virtualized GPUs are used to render complex scenes at high settings.
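The trade-off described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not a measurement: it assumes a fixed setup cost per render (scene preparation and distribution) that does not shrink with more GPUs, and render work that divides perfectly across GPUs. The 60-second setup cost and both workloads are hypothetical numbers chosen for illustration.

```python
def render_time(setup_s, work_s, n_gpus):
    """Wall-clock time in a toy model: fixed setup plus evenly divided render work."""
    return setup_s + work_s / n_gpus


def efficiency(setup_s, work_s, n_gpus):
    """Parallel efficiency: time on 1 GPU divided by (n_gpus * time on n_gpus)."""
    t1 = render_time(setup_s, work_s, 1)
    tn = render_time(setup_s, work_s, n_gpus)
    return t1 / (n_gpus * tn)


SETUP_S = 60.0               # hypothetical fixed cost of preparing/distributing the scene
LIGHT_WORK_S = 120.0         # hypothetical simple scene: 2 minutes of pure render work
HEAVY_WORK_S = 3 * 3600.0    # hypothetical heavy scene: 3 hours of pure render work

light = efficiency(SETUP_S, LIGHT_WORK_S, 16)
heavy = efficiency(SETUP_S, HEAVY_WORK_S, 16)

print(f"light scene on 16 GPUs: {light:.1%} efficiency")
print(f"heavy scene on 16 GPUs: {heavy:.1%} efficiency")
```

In this model the same fixed overhead dominates the short render and nearly disappears in the long one, which matches the qualitative behaviour described above. Real renders have other overheads as well, such as CPU-GPU communication during rendering, so actual numbers will differ.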

The table below presents the other results that we gathered for this test. Rendering times are given as hh:mm:ss. Every render used the same settings: 8000×4200 resolution and 1000 samples.

System specification:

  • CPU: Intel i7-3820
  • GPUs: GeForce GTX 690
  • OS: CentOS 6.5
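
| Rendering device | Usage  | Network | Time     | Additional info                 |
|------------------|--------|---------|----------|---------------------------------|
| CPU              | Native | -       | 10:34:18 | -                               |
| 1 x GPU          | Native | -       | 03:05:20 | -                               |
| 4 x GPU          | Native | -       | 00:46:28 | -                               |
| 4 x GPU          | GPUBox | 20 Gb/s | 00:48:45 | -                               |
| 8 x GPU          | GPUBox | 20 Gb/s | 00:25:36 | -                               |
| 16 x GPU         | GPUBox | 20 Gb/s | 00:18:50 | -                               |
| 16 x GPU         | GPUBox | 20 Gb/s | 00:16:24 | No recording                    |
| 16 x GPU         | GPUBox | 20 Gb/s | 00:15:56 | No recording, only remote GPUs  |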

Disabling the CPU-intensive recording application reduced the rendering time on 16 GPUs from 00:18:50 to 00:16:24. The best time (00:15:56) was achieved when none of the virtualized GPUs was installed in the PC we were working on, so instead of the configuration from the video…


…we used another PC only as a GPUBox Client:


Blender exhibits solid scalability for longer renders. In this test, scalability with 8 GPUs was very satisfactory, at around 90% relative to rendering on 1 native GPU. With 16 GPUs it was ~65%.
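For readers who want to check these figures against the table, here is a minimal sketch. It assumes that "scalability" means parallel efficiency, that is, the time on 1 native GPU divided by (number of GPUs × time on that many GPUs); the post does not define the term explicitly, so this definition is an assumption. The times are copied from the results table.

```python
def to_seconds(hms):
    """Convert an hh:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Baseline: 1 x GPU, native (from the results table)
BASELINE_S = to_seconds("03:05:20")

# (label, number of GPUs, time) taken from the results table
RUNS = [
    ("4 x GPU native", 4, "00:46:28"),
    ("4 x GPU GPUBox", 4, "00:48:45"),
    ("8 x GPU GPUBox", 8, "00:25:36"),
    ("16 x GPU GPUBox", 16, "00:18:50"),
    ("16 x GPU GPUBox, no recording", 16, "00:16:24"),
    ("16 x GPU GPUBox, no recording, remote only", 16, "00:15:56"),
]


def scaling(n_gpus, hms):
    """Return (speedup over 1 GPU, parallel efficiency) for one run."""
    speedup = BASELINE_S / to_seconds(hms)
    return speedup, speedup / n_gpus


for label, n_gpus, hms in RUNS:
    speedup, eff = scaling(n_gpus, hms)
    print(f"{label:<45} {speedup:5.2f}x  {eff:6.1%}")
```

Under this definition the 8-GPU run comes out at roughly 90%, and the 16-GPU runs fall between roughly 62% and 73% depending on which row of the table is used.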

However, scalability and multi-GPU performance in Cycles could be significantly improved by limiting the communication between the CPU and the GPUs and by using long-running CUDA kernels.

GPU rendering performance in Blender can also be increased by overlapping rendering, but that is a topic for another post… Stay tuned!

You can also download the rendered scene here: bmw-8k-rendered.jpg
