A few months ago, we received a press release from an Israeli company called LucidLogix, which claimed to have developed an innovative new approach to multi-GPU scaling called HYDRA. We weren’t completely sure what to make of it at that point, but it had apparent technical merit, so we decided to wait until the technology had matured somewhat and yielded its first public demonstrations before bringing the news to you.
We caught up with the company, now simply called “Lucid”, at this year’s fall IDF in San Francisco, and arranged for a private demonstration of the technology, which the company believes is more than ready for prime time.
In a room at the W Hotel near downtown San Francisco, just a brief walk away from the Moscone Center, we were introduced to the company’s president, Offir Remez, who conducted the demonstration and answered our questions along the way. A demonstration machine with dual NVIDIA GeForce 9800GTX video cards sat at the ready, feeding a pair of 1920×1080 LCD monitors.
The demonstration setup was somewhat imperfect: in order to validate the performance of its hardware without building an entire motherboard around it, Lucid had to build a breakout enclosure with an independent power supply and an ATX form-factor evaluation board for its HYDRA 100 SoC (which we’ll discuss in detail). What we saw worked quite well, however – so well, in fact, that it’s caused quite a buzz among our tech journalism colleagues. In this article, we’ll explain why.
Lucid’s HYDRA Engine is fundamentally different from either of the extant solutions for GPU parallelism – SLI from NVIDIA, or Crossfire from AMD. Both SLI and Crossfire rely on split-frame or alternate-frame methods for dividing graphics tasks. While split-frame rendering can only divide up the actual task of pixel shading, requiring each of the arrayed GPUs to compute the complete set of geometry for the scene, alternate-frame rendering splits up the graphics tasks frame-by-frame.
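To make the difference between the two methods concrete, here is a toy model in Python. It is our own illustration and not vendor code: the function names and the millisecond figures are assumptions, and the alternate-frame numbers are an idealized upper bound that ignores synchronization and frame-to-frame dependencies.

```python
def sfr_frame_ms(geometry_ms, pixel_ms, num_gpus):
    """Split-frame rendering: every GPU repeats the full geometry work,
    and only the pixel-shading work is divided among the GPUs."""
    return geometry_ms + pixel_ms / num_gpus


def afr_gpu_for_frame(frame_index, num_gpus):
    """Alternate-frame rendering: whole frames are handed out round-robin."""
    return frame_index % num_gpus


def afr_ideal_frame_interval_ms(geometry_ms, pixel_ms, num_gpus):
    """Best-case time between finished frames when each GPU renders
    complete frames independently (no overhead modeled)."""
    return (geometry_ms + pixel_ms) / num_gpus


# Hypothetical workload: 4 ms of geometry and 12 ms of pixel shading per frame.
print(sfr_frame_ms(4.0, 12.0, 2))                 # geometry is not divided
print(afr_ideal_frame_interval_ms(4.0, 12.0, 2))  # whole frame is divided
print([afr_gpu_for_frame(i, 2) for i in range(4)])
```

In this model the undivided geometry term is what holds split-frame rendering back as more GPUs are added, while alternate-frame rendering divides the whole frame cost.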
AMD’s Crossfire also allows for asymmetrical frame distribution among video cards. Alternate frame rendering is the preferred method for optimizing performance, though it still doesn’t scale perfectly with the number of GPUs installed – there’s a marked decrease in the amount of performance gained from each subsequent GPU added to the array.
The Lucid HYDRA Engine uses neither of the above approaches – instead, its algorithm evaluates the complexity of rendering tasks, and divides them among video cards to achieve the best possible balance of loading between GPUs, allowing almost perfect scaling between any number of installed cards. That’s right – any number. While CrossfireX and SLI are both inherently limited to four GPUs at this point, HYDRA Engine technology has no such limitation.
How does Lucid’s HYDRA technology achieve performance scalability results that SLI and Crossfire can’t touch? It accomplishes this noteworthy feat by first deconstructing the elements of the image to be rendered (a process called ‘decompositing’), dispatching those elements to the individual GPUs installed in the system, and then ‘recompositing’ the final rendered elements together into a complete rendered frame. The HYDRA Engine’s algorithm handles load-balancing between the cards by determining the complexity of each rendering task, and then making sure each card works only on that set of tasks that will allow the cards to finish their rendering tasks simultaneously.
The HYDRA Engine algorithm keeps track of the time it takes each video card in the system to render an element of a given complexity, which allows it to decide which combination of tasks to send to each individual card to achieve the shortest rendering time. For this reason, the cards in a HYDRA configuration need not be perfectly matched to one another, and dissimilar cards should not cause an undue deterioration in the visual experience.
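Lucid has not published the details of its algorithm, so the following Python sketch is only our illustration of the general idea described above: keep a running per-card timing estimate, then hand each task to whichever card would finish it soonest. The `Gpu` class, the `balance` function, the greedy largest-first strategy and the smoothing factor are all our own assumptions, not Lucid’s implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Gpu:
    name: str
    ms_per_unit: float              # running estimate: render time per unit of complexity
    queue: list = field(default_factory=list)
    busy_ms: float = 0.0            # estimated time already committed this frame

    def observe(self, complexity, measured_ms, alpha=0.2):
        """Blend a new timing measurement into the running estimate
        (exponential moving average; alpha is an assumed smoothing factor)."""
        sample = measured_ms / complexity
        self.ms_per_unit = (1 - alpha) * self.ms_per_unit + alpha * sample


def balance(tasks, gpus):
    """Greedy sketch: take tasks (name, complexity) from largest to smallest and
    give each one to the GPU with the earliest estimated finish time.
    Returns the estimated frame time, i.e. the busiest GPU's total."""
    for gpu in gpus:
        gpu.queue.clear()
        gpu.busy_ms = 0.0
    for name, complexity in sorted(tasks, key=lambda t: t[1], reverse=True):
        best = min(gpus, key=lambda g: g.busy_ms + complexity * g.ms_per_unit)
        best.queue.append(name)
        best.busy_ms += complexity * best.ms_per_unit
    return max(g.busy_ms for g in gpus)


# Hypothetical mismatched pair: the "slow" card takes twice as long per unit.
fast = Gpu("fast", ms_per_unit=1.0)
slow = Gpu("slow", ms_per_unit=2.0)
frame_ms = balance([("a", 6), ("b", 4), ("c", 3), ("d", 2)], [fast, slow])
print(fast.queue, slow.queue, frame_ms)
```

Because the assignment depends only on each card’s measured speed, a slower card simply receives less work, which is why mismatched cards fit naturally into this kind of scheme.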
In Lucid’s current implementation, all HYDRA processing is handled by a separate system-on-a-chip (SoC), which incorporates a processor and all of its memory and interface logic, and places no additional load on the host machine’s CPU. This approach preserves complete 3D graphics acceleration within the HYDRA-equipped machine. The hardware that Lucid demonstrated to us was its HYDRA 100 SoC, which integrates a 225MHz RISC microprocessor core and a graphics algorithm library contained in on-chip ROM, as well as a PCI-Express bus switch with an upstream x16 port and two downstream x16 ports that can each be divided into two x8 ports, for up to four connected GPUs.
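The downstream port arrangement is easier to see when enumerated. The short Python sketch below simply lists the link widths implied by the description above; the function name and its two flags are hypothetical, and it does not reflect any actual Lucid configuration interface.

```python
def downstream_links(split_port_a=False, split_port_b=False):
    """Return the lane width of each downstream link on the switch.
    Each of the two x16 ports either stays whole or is divided into two x8 links."""
    links = []
    for split in (split_port_a, split_port_b):
        links.extend([8, 8] if split else [16])
    return links


print(downstream_links())             # two GPUs, one per x16 port
print(downstream_links(True, False))  # three GPUs
print(downstream_links(True, True))   # four GPUs, all on x8 links
```

However the ports are divided, the switch exposes 32 downstream lanes in total behind the single x16 upstream port.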
Next, let’s go deeper into the details of the HYDRA Engine, as well as the HYDRA 100 SoC architecture, and see exactly how the two work in tandem to optimize multi-GPU graphics performance.