
Lucid HYDRA Engine Multi-GPU Technology

Date: August 25, 2008
Author(s): Rory Buszka

One of the more exciting third-party demonstrations we saw at Intel’s 2008 Developer Forum was by a little-known company called Lucid, which promises highly efficient multi-GPU performance scaling via its unique “Hydra Engine” technology. We take a look at Hydra Engine, and what it means for ATI’s Crossfire and NVIDIA’s SLI.



Introduction, HYDRA Technology

A few months ago, we received a press release from an Israeli company called LucidLogix, which claimed to have developed an innovative new approach to multi-GPU scaling called HYDRA. We weren’t completely sure what to make of it at that point, but it had apparent technical merit, so we decided to wait until the technology had matured somewhat and yielded its first public demonstrations before bringing the news to you.

We caught up with the company, now simply called “Lucid”, at this year’s fall IDF in San Francisco, and arranged for a private demonstration of the technology, which the company believes is more than ready for prime-time.

In a room at the W Hotel near downtown San Francisco, just a brief walk away from the Moscone Center, we were introduced to the company’s president, Offir Remez, who conducted the demonstration and answered our questions along the way. A demonstration machine with dual NVIDIA GeForce 9800GTX video cards sat at the ready, feeding a pair of 1920×1080 LCD monitors.

The demonstration setup was somewhat imperfect, because in order to validate the performance of Lucid’s hardware without building an entire motherboard around it, Lucid had to build a breakout enclosure with an independent power supply, and an ATX form-factor evaluation board for its Hydra 100 SoC (which we’ll discuss in detail). What we saw worked quite well, however – so well, in fact, that it’s caused quite a buzz among our tech journalism colleagues. In this article, we’ll explain why.

HYDRA Technology

Lucid’s HYDRA Engine is fundamentally different from either of the extant solutions for GPU parallelism – SLI from NVIDIA, or Crossfire from AMD. Both SLI and Crossfire rely on split-frame or alternate-frame methods for dividing graphics tasks. While split-frame rendering can only divide up the actual task of pixel shading, requiring each of the arrayed GPUs to compute the complete set of geometry for the scene, alternate-frame rendering splits up the graphics tasks frame-by-frame.

AMD’s Crossfire also allows for asymmetrical frame distribution among video cards. Alternate frame rendering is the preferred method for optimizing performance, though it still doesn’t scale perfectly with the number of GPUs installed – there’s a marked decrease in the amount of performance gained from each subsequent GPU added to the array.

The Lucid Hydra Engine uses neither of the above approaches – instead, its algorithm evaluates the complexity of rendering tasks, and divides them evenly among video cards to achieve the best possible balance of loading between GPUs, allowing almost perfect scaling between any number of installed cards. That’s right – any number. While CrossfireX and SLI are both inherently limited to four GPUs at this point, HYDRA Engine technology has no such limitation.

How does Lucid’s HYDRA technology achieve performance scalability results that SLI and Crossfire can’t touch? It accomplishes this noteworthy feat by first deconstructing the elements of the image to be rendered (a process called ‘decompositing’), dispatching those elements to the individual GPUs installed in the system, and then ‘recompositing’ the final rendered elements together into a complete rendered frame. The HYDRA Engine’s algorithm handles load-balancing between the cards by determining the complexity of each rendering task, and then making sure each card works only on that set of tasks that will allow the cards to finish their rendering tasks simultaneously.

The HYDRA Engine algorithm keeps track of the time it takes each video card in the system to render an element with a certain complexity, which allows it to decide which combination of tasks to send to each individual card to achieve the shortest rendering time. For this reason, the cards in a HYDRA configuration need not necessarily be perfectly matched to one another, and dissimilar cards should not cause an undue deterioration in the visual experience.
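To make the “finish simultaneously” idea concrete, here is a minimal sketch of our own; it is not Lucid’s algorithm, which the company has not published. It assumes rendering work can be cut into equal-sized units and that we already know how long each card takes per unit. The function names `split_work` and `finish_times` are hypothetical.

```python
def split_work(total_units, ms_per_unit):
    """Divide total_units of work so all GPUs finish at about the same time.

    ms_per_unit[i] is the (assumed, previously measured) time GPU i needs
    for one unit of work. Returns per-GPU unit counts summing to total_units.
    """
    # A GPU's speed is the inverse of its time per unit.
    speeds = [1.0 / t for t in ms_per_unit]
    total_speed = sum(speeds)
    # The ideal (fractional) share of each GPU is proportional to its speed.
    ideal = [total_units * s / total_speed for s in speeds]
    shares = [int(x) for x in ideal]
    # Hand any leftover units to the GPUs with the largest fractional remainder.
    leftover = total_units - sum(shares)
    order = sorted(range(len(ideal)),
                   key=lambda i: ideal[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


def finish_times(shares, ms_per_unit):
    """Time each GPU spends on its share, in milliseconds."""
    return [n * t for n, t in zip(shares, ms_per_unit)]


# A card twice as fast as its partner receives twice the work.
shares = split_work(300, [1.0, 2.0])
print(shares, finish_times(shares, [1.0, 2.0]))
```

In this toy model a mismatched pair still finishes together, which is the property that would let dissimilar cards cooperate without the slower one holding up every frame.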

In Lucid’s current implementation, all HYDRA processing is handled by a separate system-on-a-chip (SoC), which incorporates a processor and all of its memory and interface logic, and places no additional load on the host machine’s CPU. This approach preserves complete 3D graphics acceleration within the HYDRA-equipped machine. The hardware that Lucid demonstrated to us was their HYDRA 100 SoC, which integrates a 225MHz RISC microprocessor core and a graphics algorithm library contained in on-chip ROM, as well as a PCI-Express bus switch with an upstream x16 port and two downstream x16 ports that can each be divided into two x8 ports, for up to four connected GPUs.

Next, let’s go deeper into the details of the HYDRA Engine, as well as the HYDRA 100 SoC architecture, and see exactly how the two work in tandem to optimize multi-GPU graphics performance.

HYDRA Architecture Details

What makes Lucid’s current HYDRA implementation different from the other multi-GPU technologies on the market is that it places another stage of hardware processing between the CPU and the GPU. This has raised questions of additional latency, but Lucid assures us that any latency added is inconsequential – the HYDRA 100 SoC handles the decomposition and recomposition of individual frames so quickly that your eye won’t even notice. And indeed, in the demonstration that we witnessed, the system was snappy and responsive, without any perceptible delay.

Let’s take a look at how the HYDRA Engine algorithm and HYDRA 100 system-on-a-chip work together to achieve seamless multi-GPU integration.

The HYDRA 100 ASIC

At present, Lucid’s HYDRA Engine algorithm is tied to the Lucid-designed HYDRA 100 ASIC, whose system-on-a-chip architecture handles the calculations required by the HYDRA Engine completely independently of the host PC. The company currently has no plans to carry out HYDRA Engine calculations in software on the host machine’s CPU, but it’s not entirely inconceivable that the CPU could be used for this processing in the future. We’ll revisit this notion in a bit, but let’s continue on with our analysis of the current hardware-based implementation.

As we mentioned before, the Lucid-designed HYDRA 100 SoC incorporates an embedded RISC processor running at 225MHz, and a graphics algorithm library. It also contains 32 kilobytes (yes, kilobytes) of memory and a PCI Express switch with one 16-lane upstream port and two 16-lane downstream ports, each of which can be reconfigured as a pair of 8-lane ports. That means that from a single HYDRA 100, quad-card configurations are possible, which is about as many GPUs as you’d want to run in an ordinary desktop PC. However, Lucid informed us that the HYDRA ASIC architecture is highly scalable, which means you could see any combination of PCI-E input and output lanes.
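The downstream port arrangement is easier to picture as a small lookup. The sketch below is our own illustration of the lane counts described above, not anything supplied by Lucid. The function name `downstream_layout` is hypothetical, and the three-card case assumes that only one of the two x16 ports needs to be split, which Lucid did not confirm.

```python
def downstream_layout(gpu_count):
    """Return the lane width of each downstream link for gpu_count GPUs.

    Models two x16 downstream ports, each optionally split into two x8 ports.
    """
    if not 1 <= gpu_count <= 4:
        raise ValueError("this sketch covers one to four GPUs")
    ports = [[16], [16]]                  # two x16 downstream ports
    splits_needed = max(0, gpu_count - 2)  # split a port only when required
    for p in range(splits_needed):
        ports[p] = [8, 8]                 # one x16 port becomes two x8 ports
    links = [width for port in ports for width in port]
    return links[:gpu_count]


for n in range(1, 5):
    print(n, "GPU(s):", downstream_layout(n))
```

With one or two cards each GPU keeps a full x16 link; only the third and fourth cards force a port to be halved.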

Lucid suggests two possible scenarios for an implementation of HYDRA silicon – first as an additional chip on a gaming or workstation PC motherboard, between the northbridge and the PCI Express x16 slots, and alternatively as a central chip on a multi-GPU add-in board. On current models of multi-GPU add-in boards like AMD’s ATI Radeon HD 4870 X2, a PCI Express bridge (sourced from a company like PLX) is used to split the sixteen incoming lanes of traffic into a pair of 8-lane ports – with Crossfire functionality continuing to be handled in software. But swap out that PCI Express bridge for a Lucid HYDRA 100, and suddenly you’ve got more efficient performance scaling, and support for up to four GPUs on a single card.

So, how do the multiple GPUs connected to a HYDRA ASIC relate to the host system? The Lucid HYDRA Engine is capable of bringing together dissimilar cards to work on rendering tasks – however, there is one limitation: All GPUs must be able to share the same software driver. That means you can forget mixing and matching cards with ATI and NVIDIA GPUs in the same HYDRA configuration. Cards from both makers can theoretically be present in the same system, however – you could have a HYDRA array of ATI Radeon HD 4870s handling the video rendering, with a low-end NVIDIA GeForce 8600GT handling PhysX acceleration, for example.

So as best we can tell, the Lucid HYDRA 100 ASIC appears to the host system as a single graphics processor of whatever GPU family is arrayed together in the HYDRA configuration. The benefit of this is that you won’t need ATI or NVIDIA to pony up special drivers for their video cards – you won’t even need to run in SLI or CrossfireX mode.

Next, let’s look at the nuts and bolts of the Lucid HYDRA Engine algorithm.

The HYDRA Engine Algorithm

We already touched on what the HYDRA Engine algorithm basically ‘does’ earlier in this article. The HYDRA Engine balances the load between multiple GPUs by ‘decompositing’ the GPU rendering tasks – the digital bits and pieces that tell a video card what to render – and then ‘recompositing’ the results into a complete rendered image.

While we can’t give specific performance numbers, since the private demonstration we were given didn’t include a benchmarking session, we did observe a consistent 58-60 fps in Crysis on a pair of NVIDIA GeForce 9800GTX cards, with all the game’s graphics settings pegged at ‘high’ under DirectX 9. So the HYDRA Engine works, delivering nearly double the single-card performance figures – and we’ve seen it with our own eyes.

The HYDRA Engine load-distribution algorithm is by far the most complex that we’ve seen, which begins to give us some idea of why Lucid seems so dead-set on keeping the algorithm’s implementation based in hardware – their hardware, to be precise. It brings together a variety of historical data, and dynamically balances the processing load based on individual tasks, not complete frametimes.

First, the algorithm collects data on the time required by each subordinate GPU to accomplish a task with a certain complexity level, and stores it in a small repository. Then the algorithm looks at the types of tasks being requested by the 3D application, and decides which of the connected graphics cards would be best-suited to each task.

From there, the HYDRA Engine algorithm dispatches the individual tasks to each of the connected cards, then recomposites the image using its own onboard composition engine and returns that image to one of the video cards, to be displayed on the output monitor. The HYDRA Engine also handles other optimizations, such as occlusion-culling, in its own hardware, freeing the video card from any extraneous processing.
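The collect, decide and dispatch steps can be sketched in a few lines. What follows is a guess at the general shape of such a scheduler, not Lucid’s implementation: the `TimingHistory` class, the `dispatch` function, the smoothing factor and the greedy largest-task-first rule are all our own assumptions, and the recompositing step is left out entirely.

```python
class TimingHistory:
    """Small repository of observed cost per complexity unit, per GPU and task kind."""

    def __init__(self, smoothing=0.5):
        self.smoothing = smoothing  # weight given to the newest sample (assumed)
        self.cost = {}              # (gpu, kind) -> smoothed seconds per unit

    def record(self, gpu, kind, complexity, seconds):
        sample = seconds / complexity
        old = self.cost.get((gpu, kind))
        if old is None:
            self.cost[(gpu, kind)] = sample
        else:
            # Exponential moving average keeps the estimate current.
            self.cost[(gpu, kind)] = (self.smoothing * sample
                                      + (1 - self.smoothing) * old)

    def estimate(self, gpu, kind, complexity, default=1.0):
        # Unknown (gpu, kind) pairs fall back to an assumed default cost.
        return self.cost.get((gpu, kind), default) * complexity


def dispatch(tasks, gpus, history):
    """Greedy plan: largest tasks first, each to the GPU that would finish it soonest.

    tasks is a list of (name, kind, complexity) tuples.
    """
    busy_until = {g: 0.0 for g in gpus}
    plan = {g: [] for g in gpus}
    for name, kind, complexity in sorted(tasks, key=lambda t: t[2], reverse=True):
        best = min(gpus, key=lambda g: busy_until[g]
                   + history.estimate(g, kind, complexity))
        busy_until[best] += history.estimate(best, kind, complexity)
        plan[best].append(name)
    return plan, busy_until


history = TimingHistory()
history.record("fast", "geometry", complexity=10, seconds=10.0)
history.record("slow", "geometry", complexity=10, seconds=20.0)
tasks = [("a", "geometry", 4), ("b", "geometry", 2),
         ("c", "geometry", 2), ("d", "geometry", 1)]
plan, busy_until = dispatch(tasks, ["fast", "slow"], history)
print(plan, busy_until)
```

Because the decision is made per task rather than per frame, a slower card simply ends up with fewer or lighter tasks, and both cards reach the end of the frame at roughly the same moment.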

Final Thoughts

It takes quite a bit to excite us these days as PC hardware enthusiasts, with Intel’s newest CPU microarchitecture right around the corner, and multi-GPU video cards already blowing our minds with the performance that they manage to put up on current- and next-generation games. Yet Offir and the others at Lucid have managed to accomplish just that, simply by the ingenious solution they’ve come up with to solve the present problems with multi-GPU performance scaling.

But being enthusiastic about the technology is one thing – the company’s actual ability to make the technology a success is quite another. As far as pricing is concerned, in an era of $400 enthusiast motherboards (ASUS Rampage Extreme, anyone?), Lucid assures us that their hardware is extremely cost-effective, and won’t cause much of an increase in the bill of materials – which means that aside from any ‘exclusivity tax’ that HYDRA-equipped motherboards incur upon their manufacturers’ whim, Lucid’s technology is well within the realm of affordability for enthusiasts and gamers.

There are a few questions we wish we’d asked during the demonstration of HYDRA technology. The first concerns antialiasing: does the HYDRA Engine send the entire frame to one of the installed GPUs to handle the full-frame antialiasing task? Does it optimize the handling of the antialiasing task by dispatching it to the GPU with the least load? What about other post-processing effects like HDR? Do these incur a performance penalty when enabled? Does Lucid presently have any manufacturing partners lined up?

We heard a couple of major names dropped, but as far as we know, nothing has yet been set in stone. Also, while Lucid doesn’t have any current plans to take the HYDRA Engine completely into software, there’s no telling what the future may hold, or what steps Lucid may need to take to stay competitive.

There are a few more things we do know, however, besides what we’ve already covered. First of all, though the HYDRA 100 ASIC only incorporates a single x16 upstream PCI Express port, Lucid says today’s graphics cards don’t even come close to exhausting the available bandwidth across that solitary 16-lane interface, even in quad-card arrangements.
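As a rough sanity check on that bandwidth claim, the arithmetic below compares the nominal capacity of an x16 link with the cost of moving finished frames. Every input is our assumption rather than a figure from Lucid: we assume first-generation PCI Express signalling (2.5 GT/s per lane with 8b/10b encoding), and, purely for illustration, that each of four GPUs ships a full 32-bit 1920×1080 image across the link 60 times per second. How HYDRA actually moves data was not disclosed to us.

```python
def pcie_bytes_per_second(lanes, gigatransfers_per_s):
    """Nominal one-direction payload bandwidth for 8b/10b-encoded PCI Express."""
    bits_on_wire = lanes * gigatransfers_per_s * 1e9
    payload_bits = bits_on_wire * 8 / 10  # 8b/10b: 10 wire bits carry 8 data bits
    return payload_bits / 8               # bits -> bytes


def frame_traffic_bytes_per_second(width, height, bytes_per_pixel, fps, gpus):
    """Traffic if every GPU sends one full frame per displayed frame (assumed)."""
    return width * height * bytes_per_pixel * fps * gpus


link = pcie_bytes_per_second(lanes=16, gigatransfers_per_s=2.5)
frames = frame_traffic_bytes_per_second(1920, 1080, 4, 60, gpus=4)
print(f"x16 link: {link / 1e9:.2f} GB/s")
print(f"frame traffic: {frames / 1e9:.2f} GB/s ({frames / link:.0%} of the link)")
```

Under these deliberately pessimistic assumptions, frame traffic uses about half of the link. Command and texture traffic is not counted here, so this is an order-of-magnitude check only.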

Secondly, GPGPU processing won’t get a boost from HYDRA, which only handles machine-language calls from the graphics driver for 3D rendering tasks. Also, while Lucid’s current HYDRA hardware is DirectX 10.1 compliant, the software implementation, for the moment, is not. But we’re assured that’s coming ‘very soon’. Lastly, multi-monitor arrangements aren’t outside the realm of possibility.

In the end, there remains much about Lucid’s HYDRA Engine that we won’t know until we’ve got HYDRA-equipped hardware in-hand, but we do know this much: SLI and Crossfire have found a formidable new competitor. At the very least, Lucid’s HYDRA Engine technology should be enough to slap the major players around a little and make them look silly. That will last until they find some way to resolve the performance scaling issues that plague both Crossfire and SLI, without infringing upon Lucid’s patents.

If they do manage to do this, we’re not sure where this would leave Lucid, but we’re certain that our Israeli friends will have plenty of time to shake up the 3D graphics marketplace until the big guys can get their rears in gear.


Copyright © 2005-2021 Techgage Networks Inc. - All Rights Reserved.