At last week’s CES, AMD announced that it would become the first company to support PCIe 4.0, on both its enterprise-bound EPYC and desktop-oriented Ryzen platforms. But what does PCIe 4.0 actually mean, and how will it impact you? The truth is, we already have so much PCIe bandwidth available on current platforms that only true power users – and servers – stand to benefit in a meaningful way.
It’s hard to gauge how much PCIe bandwidth current graphics cards actually need, but they seem likely to be among the last devices to benefit from the move to PCIe 4.0 and beyond. If more GPUs launch this year with “old” 3.0 support, it’s not worth getting upset over, since we already have more than enough bandwidth for the overwhelming majority of cases. That’s not to say other architectural upgrades won’t improve things in other areas, but bandwidth remains the headline feature.
Calculating PCIe bandwidth can be challenging, especially when uncommon terms like “gigatransfers” are tossed around. Gigatransfers per second, or GT/s, is the preferred performance metric of PCIe’s development consortium, PCI-SIG, as it denotes what the interconnect is capable of regardless of the encoding method or bitstream (word length). It’s also why we sometimes see the transfer rate expressed as a frequency: the number of transfers per second is effectively a sample rate, which is expressed in Hertz (GHz in this case). That’s why GT/s and GHz are sometimes used interchangeably when talking about PCIe bandwidth.
To work out the total bandwidth available over the various PCIe standards in units we’re more familiar with, we also need to know the encoding method each standard uses; that’s what gives us the ultimate “top speed” value in bits and bytes.
Both PCIe 1.x and 2.x use 8b/10b encoding, which converts every 8 bits of data into a 10-bit character on the wire, a design that results in a significant 20% overhead. In real numbers, that means PCIe 1.0’s raw per-lane rate of 2.5Gbit/s (312.5MB/s) is cut down to 250MB/s of usable bandwidth in the real world (before even more potential overhead, at least). For PCIe 3.0+, more efficient 128b/130b encoding is used, whittling the overhead down to a modest ~1.5%.
For PCIe 1.x and 2.x, the per-lane bandwidth calculation is GT/s * (1/5) / 2 (equivalent to applying the 8b/10b efficiency of 0.8, then dividing by 8 bits per byte), and for PCIe 3.0+, it’s GT/s * (1 – (2/130)) / 8. Both calculations give us GB/s results for a single lane. Because PCIe offers bi-directional bandwidth, peak per-slot bandwidth is double the x16 figure (eg: PCIe 1.x x16 is 8GB/s bi-directional). Here’s the math for each unidirectional x16 spec:
PCIe 1.0 is 2.5 GT/s with 8b/10b
2.5 * (1/5) / 2 = 0.250 GB/s per lane (4GB/s x16).
PCIe 2.0 is 5 GT/s with 8b/10b
5 * (1/5) / 2 = 0.500 GB/s per lane (8GB/s x16).
PCIe 3.0 is 8 GT/s with 128b/130b
8 * (1 – (2/130)) / 8 = 0.985 GB/s per lane (15.75GB/s x16).
PCIe 4.0 is 16 GT/s with 128b/130b
16 * (1 – (2/130)) / 8 = 1.969 GB/s per lane (31.51GB/s x16).
PCIe 5.0 is 32 GT/s with 128b/130b
32 * (1 – (2/130)) / 8 = 3.938 GB/s per lane (63.04GB/s x16).
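If you’d rather let a computer crunch those numbers, here’s a minimal Python sketch of the same arithmetic. The function and variable names are our own, the spec values simply restate the transfer rates and encodings above, and any differences in the last decimal place versus the table below come down to rounding.

```python
# Usable PCIe bandwidth per lane, from the raw transfer rate and line encoding.
# Illustrative sketch only; spec values restate the figures discussed above.

def lane_bandwidth_gbs(gt_per_s, payload_bits, total_bits):
    """GB/s for one lane: GT/s scaled by encoding efficiency, divided by 8 bits per byte."""
    return gt_per_s * (payload_bits / total_bits) / 8

# (generation, GT/s, encoding payload bits, encoded bits on the wire)
PCIE_SPECS = [
    ("PCIe 1.0", 2.5,  8,   10),
    ("PCIe 2.0", 5.0,  8,   10),
    ("PCIe 3.0", 8.0,  128, 130),
    ("PCIe 4.0", 16.0, 128, 130),
    ("PCIe 5.0", 32.0, 128, 130),
]

for name, gts, payload, total in PCIE_SPECS:
    x1 = lane_bandwidth_gbs(gts, payload, total)
    print(f"{name}: {x1:.3f} GB/s per lane, {x1 * 16:.2f} GB/s at x16 (one direction)")
```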
When the math is done for all speeds, we have this:
PCI Express | 1.0/1.1 | 2.0/2.1 | 3.0/3.1 | 4.0 | 5.0
Encoding | 8b/10b | 8b/10b | 128b/130b | 128b/130b | 128b/130b
Gigatransfers | 2.5 GT/s | 5 GT/s | 8 GT/s | 16 GT/s | 32 GT/s
x1 Speeds | 250 MB/s | 500 MB/s | 985 MB/s | 1.969 GB/s | 3.938 GB/s
x4 Speeds | 1 GB/s | 2 GB/s | 3.94 GB/s | 7.88 GB/s | 15.76 GB/s
x8 Speeds | 2 GB/s | 4 GB/s | 7.88 GB/s | 15.76 GB/s | 31.52 GB/s
x16 Speeds | 4 GB/s | 8 GB/s | 15.75 GB/s | 31.51 GB/s | 63.04 GB/s
Even with the current PCIe 3.0 generation, a single lane provides all of the bandwidth we’d need to handle a home 1Gbps Ethernet connection. In fact, an x1 card could almost support eight 1Gbps connections at the same time. At a time when 10GbE network connections are still lusted over by home users, a PCIe 5.0 x16 slot offers 5x the peak bandwidth of a 100GbE connection. At ~63GB/s, PCIe 5.0 could technically transfer more than an entire Blu-ray disc’s worth of data each second.
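To put those comparisons in concrete terms, here’s a rough sketch (again with our own naming) that converts the per-direction PCIe figures above into Gbit/s and stacks them against common Ethernet speeds; real links lose a little more to protocol overhead on both sides.

```python
# Rough comparison of per-direction PCIe throughput against Ethernet link speeds.
# Illustrative only; protocol overhead on both sides is ignored.

pcie_links_gbs = {
    "PCIe 3.0 x1":  0.985,
    "PCIe 4.0 x4":  7.88,
    "PCIe 5.0 x16": 63.0,
}

ethernet_gbit = {"1GbE": 1, "10GbE": 10, "100GbE": 100}

for link, gbs in pcie_links_gbs.items():
    gbit = gbs * 8  # 1 GB/s of payload is roughly 8 Gbit/s
    ratios = ", ".join(f"{gbit / speed:.1f}x {name}" for name, speed in ethernet_gbit.items())
    print(f"{link}: ~{gbit:.0f} Gbit/s ({ratios})")
```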
If it’s not clear by this point, home users need fairly special use cases to take advantage of this kind of extreme bandwidth. With PCIe 3.0, x4 M.2 NVMe SSDs can peak at 3.94GB/s, a figure that will double to 7.88GB/s with PCIe 4.0. Can you think of a scenario where you regularly need even 4GB/s of I/O, from either native or network storage?
One area where we might see growth on the storage side is 3D XPoint memory, eg: Intel’s Optane and Micron’s QuantX. With more bandwidth being made available, a class of always-on computers could start to take over. We’ve already seen the beginnings of this with some of the ARM-based Windows 10 machines, and with the seemingly imminent launch of Optane DIMMs, but PCIe 4.0 and above could bring storage bandwidth very close to parity with memory bandwidth.
Another scenario that will likely catch on, even with consumer gear, is I/O aggregation: fitting more USB, SATA, M.2 and U.2 ports onto fewer lanes. In the case of USB 3.1, and even Thunderbolt to some extent, motherboard manufacturers could route more of those ports through fewer lanes, making more ports available without encroaching on the PCIe lanes dedicated to graphics or M.2 storage. That would mean no more digging through UEFI options or fiddling with jumpers to enable and disable SATA ports just to get a second M.2 slot working.
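As a back-of-the-envelope illustration (our own arithmetic, not a real chipset design, which shares bandwidth dynamically rather than dedicating it per port), here’s roughly how many full-speed 10Gbps USB 3.1 Gen 2 ports a single x4 uplink could feed on each generation:

```python
# Back-of-the-envelope: full-speed 10 Gbit/s ports a single x4 uplink could feed.
# Illustrative only; real chipsets share this bandwidth dynamically across ports.

USB31_GEN2_GBIT = 10  # nominal USB 3.1 Gen 2 signalling rate

x4_links_gbs = {"PCIe 3.0 x4": 3.94, "PCIe 4.0 x4": 7.88, "PCIe 5.0 x4": 15.76}

for link, gbs in x4_links_gbs.items():
    ports = int(gbs * 8 // USB31_GEN2_GBIT)
    print(f"{link}: ~{gbs * 8:.0f} Gbit/s, enough for about {ports} x 10Gbps ports flat-out")
```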
As our graphics cards continue to become more bandwidth-hungry, we’re surely going to grow into PCIe 4.0 before long, and there’s nothing to hate about technology that may well be above our “pay grade” for some time to come. For the enterprise, though, PCIe 4.0 and beyond sounds pretty tasty.