As artificial intelligence (AI) data centers demand ever more bandwidth to shuttle massive datasets between GPUs and memory, the limitations of traditional copper interconnects are becoming increasingly apparent. Startups are stepping in to address this bottleneck, developing optical interconnects that can be integrated directly onto standard GPU and memory chiplets and promising a significant leap in data-transfer speed and energy efficiency.
The current infrastructure in data centers relies on pluggable optical transceivers to carry data between racks, converting electrical signals to optical and back. The industry’s “holy grail” has been co-packaged optics (CPO), integrating optical components within the chip package to improve energy efficiency and reduce latency. While tech giants like Nvidia have recently announced progress in CPO for network switches, startups are pushing the boundaries further by aiming to replace even the short, meter-long copper links within a rack with direct chip-to-chip optical interconnects.
Several startups are focusing on chiplet-based solutions built on silicon photonic waveguides and microring resonators. At the transmitter, each resonator modulates a data lane onto a different wavelength of light supplied by an external laser; at the receiver, a matching resonator filters out that wavelength. The technology is similar to that in Nvidia's CPO switches, but these startups are scaling it up, employing banks of resonators to achieve massive parallel data transfer.
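To make the wavelength-selection mechanism concrete, here is a toy Python model of a microring's resonance condition: the ring couples light in or out only at wavelengths whose integer multiple matches its optical path length, so rings of slightly different sizes each pick off a different channel. The ring dimensions and effective index below are illustrative assumptions, not any vendor's device parameters.

```python
# Toy model of a microring resonator as a wavelength filter. Resonances
# satisfy m * lambda = n_eff * L for integer order m, where L is the ring
# circumference and n_eff the waveguide's effective refractive index.
# All values are illustrative assumptions, not measured device parameters.

def resonant_wavelengths_nm(circumference_um, n_eff, band_nm=(1290.0, 1330.0)):
    """Return the resonant wavelengths (in nm) that fall inside band_nm."""
    path_nm = n_eff * circumference_um * 1e3   # optical path length, in nm
    lo, hi = band_nm
    m_lo = int(path_nm // hi)                  # smallest candidate order
    m_hi = int(path_nm // lo) + 1              # largest candidate order
    return [path_nm / m for m in range(m_lo, m_hi + 1)
            if lo <= path_nm / m <= hi]

# A ring ~31.4 um around (5 um radius) with an assumed effective index of 2.4
# resonates at roughly 1322 nm and 1299 nm within this band:
print(resonant_wavelengths_nm(circumference_um=31.4, n_eff=2.4))
```

Because each ring passes only its own resonance, many rings tuned to different wavelengths can share one waveguide, which is what makes the parallel lanes possible.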
Ayar Labs recently demonstrated an optical interconnect between GPUs that uses the standard Universal Chiplet Interconnect Express (UCIe) electrical interface. Its optical chiplet, connected to the GPU via UCIe, transmits digital signals over single-mode optical fibers across distances up to 2 km. A single fiber can carry 16 wavelengths, and with multiple input/output ports the system achieves an impressive 8 Tbps of bandwidth between GPUs, all while the GPU remains unaware that the signal ever leaves its package.
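As a back-of-the-envelope check on how those lanes compose into the total, the sketch below multiplies wavelengths, per-wavelength rate, and port count. Only the 16-wavelength figure and the ~8 Tbps total come from the article; the per-wavelength line rate and port count are hypothetical values chosen so the arithmetic lands on that total.

```python
# Back-of-the-envelope composition of an ~8 Tbps aggregate optical link.
# The 16-wavelength figure is from the article; the lane rate and port
# count are hypothetical assumptions for illustration only.

WAVELENGTHS_PER_FIBER = 16   # stated: 16 wavelengths per single-mode fiber
GBPS_PER_WAVELENGTH = 32     # assumed line rate per wavelength lane
FIBER_PORTS = 16             # assumed number of input/output ports

aggregate_gbps = WAVELENGTHS_PER_FIBER * GBPS_PER_WAVELENGTH * FIBER_PORTS
print(f"{aggregate_gbps} Gbps (~{aggregate_gbps / 1000:.0f} Tbps)")
# -> 8192 Gbps (~8 Tbps)
```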
Lightmatter has taken a different approach, stacking optical chiplets directly on top of GPU or memory chiplets using chip-on-wafer techniques. Its Passage L200 offers a modular route to optical integration, while its Passage M1000, an optical interposer, pursues a more aggressive 3D-integration strategy: optical interconnects sit directly beneath the processing units, keeping electrical connections ultra-short before data is routed optically off the interposer.
Xscape Photonics integrates frequency-comb lasers directly onto the chip, eliminating the need for external light sources altogether. Its ChromX platform aims to maximize the "escape bandwidth" of the chip by co-packaging the laser with the optical links.
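For intuition about what a frequency comb supplies, the sketch below generates a set of evenly spaced optical carriers, each of which could carry one data lane. The center wavelength, line spacing, and line count are assumptions for illustration, not ChromX specifications.

```python
# Minimal sketch of a frequency comb: evenly spaced carriers
# f_n = f0 + n * spacing, each usable as one WDM data lane.
# All numbers are illustrative assumptions.

C = 299_792_458.0  # speed of light, m/s

def comb_wavelengths_nm(center_nm=1310.0, spacing_ghz=100.0, lines=8):
    f0 = C / (center_nm * 1e-9)                       # center frequency, Hz
    offsets = range(-(lines // 2), lines - lines // 2)
    return [1e9 * C / (f0 + n * spacing_ghz * 1e9) for n in offsets]

for wl in comb_wavelengths_nm():
    print(f"{wl:.3f} nm")   # eight carriers spaced ~0.57 nm apart near 1310 nm
```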
While the potential of these direct optical interconnects is immense, challenges remain. The industry is actively debating the granularity of switchable data lanes in large GPU clusters; the cost, energy efficiency, and reliability of multi-wavelength lasers; and the overall path to market.
Another promising approach, championed by Avicena, sidesteps lasers entirely: arrays of blue microLEDs move data through imaging fibers. As demonstrated in its LightBundle platform, the laser-free design offers fine lane granularity, fewer reliability risks, lower cost, and a significant improvement in energy efficiency.
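The granularity contrast with WDM links can be seen in a quick comparison: the same aggregate bandwidth can come from a few fast wavelength lanes or from many slower spatial lanes. The lane counts and rates below are hypothetical, chosen purely for illustration.

```python
# Two routes to the same aggregate bandwidth: a few fast wavelength lanes
# (WDM) versus many slower spatial lanes (microLEDs imaged through a fiber
# bundle). Lane counts and rates are hypothetical assumptions.

def aggregate_gbps(lanes, gbps_per_lane):
    return lanes * gbps_per_lane

wdm_link = aggregate_gbps(lanes=16, gbps_per_lane=64)       # 1024 Gbps
microled_link = aggregate_gbps(lanes=256, gbps_per_lane=4)  # 1024 Gbps

# Same throughput, but the microLED link can be allocated or switched in
# fine 4 Gbps steps -- the lane-granularity concern raised above -- and it
# needs no laser, trading per-lane speed for massive spatial parallelism.
print(wdm_link, microled_link)
```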
Copper interconnects still work, and the debate is far from settled, but the consensus is that solving the bandwidth limitations of AI data centers with cheap, low-power, reliable interconnects is a critical challenge. The startups pioneering direct optical interconnects to GPUs are at the forefront of that effort, and the winner in this space could well define the future of high-performance computing.