Facebook says the trio of platforms — which it’s donating to the Open Compute Project, an organization that shares designs of data center products among its members — will dramatically accelerate AI training and inference. “AI is used across a range of services to help people in their daily interactions and provide them with unique, personalized experiences,” Facebook engineers Kevin Lee, Vijay Rao, and William Christie Arnold wrote in a blog post. “AI workloads are used throughout Facebook’s infrastructure to make our services more relevant and improve the experience of people using our services.”
Zion, which is tailored to handle a “spectrum” of neural network architectures including CNNs, LSTMs, and SparseNNs, comprises three parts: a server with eight NUMA CPU sockets, an eight-accelerator chipset, and Facebook’s vendor-agnostic OCP accelerator module (OAM). It boasts high memory capacity and bandwidth thanks to two high-speed fabrics (a coherent fabric that connects all CPUs, and a fabric that connects all accelerators), as well as a flexible architecture that can scale to multiple servers within a single rack using a top-of-rack (TOR) network switch.
“Since accelerators have high memory bandwidth, but low memory capacity, we want to effectively use the available aggregate memory capacity by partitioning the model in such a way that the data that is accessed more frequently resides on the accelerators, while data accessed less frequently resides on DDR memory with the CPUs,” Lee, Rao, and Arnold explain. “The computation and communication across all CPUs and accelerators are balanced and occurs efficiently through both high and low speed interconnects.”
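The placement idea the engineers describe, frequently accessed data on the accelerators and the rest in CPU-attached DDR, can be illustrated with a small sketch. This is not Facebook's partitioner: the `Tensor` class, the `partition` function, the greedy heuristic, and all sizes and access counts are hypothetical and chosen only to show the principle.

```python
from dataclasses import dataclass


@dataclass
class Tensor:
    """A piece of model state (hypothetical; all values are illustrative)."""
    name: str
    size_gb: float
    accesses_per_step: float


def partition(tensors, accelerator_capacity_gb):
    """Greedy placement: the most frequently accessed data per GB goes to
    accelerator memory until it is full; everything else stays in CPU DDR."""
    placement = {}
    free_gb = accelerator_capacity_gb
    hottest_first = sorted(
        tensors,
        key=lambda t: t.accesses_per_step / t.size_gb,
        reverse=True,
    )
    for t in hottest_first:
        if t.size_gb <= free_gb:
            placement[t.name] = "accelerator"
            free_gb -= t.size_gb
        else:
            placement[t.name] = "cpu_ddr"
    return placement


# Made-up model: small hot dense layers, a hot embedding table, a large cold one.
model = [
    Tensor("dense_layers", size_gb=4, accesses_per_step=1000),
    Tensor("hot_embeddings", size_gb=8, accesses_per_step=500),
    Tensor("cold_embeddings", size_gb=64, accesses_per_step=5),
]
placement = partition(model, accelerator_capacity_gb=16)
print(placement)
```

With these made-up numbers, the dense layers and the hot embedding table fit in the assumed 16GB of accelerator memory, while the large, rarely touched table falls back to DDR.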
As for Kings Canyon, which was designed for inferencing tasks, it’s split into four components: Kings Canyon inference M.2 modules, a Twin Lakes single-socket server, a Glacier Point v2 carrier card, and Facebook’s Yosemite v2 chassis. Facebook says it’s collaborating with Esperanto, Habana, Intel, Marvell, and Qualcomm to develop ASIC chips that support both INT8 and high-precision FP16 workloads.
Each server in Kings Canyon combines M.2 Kings Canyon accelerators and a Glacier Point v2 carrier card, which connect to a Twin Lakes server; two of these are installed into a Yosemite v2 sled (which has more PCIe lanes than the first-gen Yosemite) and linked to a TOR switch via a NIC. Kings Canyon modules include an ASIC, memory, and other supporting components, and the CPU host communicates with the accelerator modules via PCIe lanes. Glacier Point v2, meanwhile, packs an integrated PCIe switch that allows the server to access all the modules at once.
“With the proper model partitioning, we can run very large deep learning models. With SparseNN models, for example, if the memory capacity of a single node is not enough for a given model, we can further shard the model among two nodes, boosting the amount of memory available to the model,” Lee, Rao, and Arnold said. “Those two nodes are connected via multi-host NICs, allowing for high-speed transactions.”
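The sharding described in the quote can be sketched roughly as follows, assuming a SparseNN model is dominated by a set of embedding tables. The function `shard_tables`, the table names, the sizes, and the 64GB per-node capacity are all hypothetical, and the largest-first greedy assignment is a simple heuristic rather than Facebook's actual scheme; it can miss a valid packing that a more careful search would find.

```python
def shard_tables(table_sizes_gb, node_capacity_gb, max_nodes=2):
    """Place tables on as few nodes as possible (up to max_nodes).

    Tables are assigned largest-first to the currently least-loaded node.
    Returns a dict mapping table name to node index, or raises ValueError
    if this heuristic finds no fit within max_nodes.
    """
    largest_first = sorted(
        table_sizes_gb.items(), key=lambda item: item[1], reverse=True
    )
    for node_count in range(1, max_nodes + 1):
        loads = [0.0] * node_count
        assignment = {}
        for name, size_gb in largest_first:
            node = min(range(node_count), key=lambda i: loads[i])
            if loads[node] + size_gb > node_capacity_gb:
                break  # does not fit with this many nodes; try one more
            loads[node] += size_gb
            assignment[name] = node
        else:
            return assignment
    raise ValueError("model does not fit within max_nodes")


# Made-up tables totalling 100GB, more than one assumed 64GB node can hold.
tables = {"user": 40, "item": 30, "ads": 20, "page": 10}
assignment = shard_tables(tables, node_capacity_gb=64)
print(assignment)
```

Here a single node is tried first and fails, so the tables are spread over two nodes with 50GB each. In the real system, lookups that cross nodes would travel over the multi-host NICs mentioned in the quote.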
So what about Mount Shasta? It’s an ASIC developed in partnership with Broadcom and VeriSilicon that’s built for video transcoding. Within Facebook’s data centers, it’ll be installed on M.2 modules with integrated heat sinks, in a Glacier Point v2 (GPv2) carrier card that can house multiple M.2 modules.
The company says that on average, it expects the chips will be “many times” more efficient than its current servers. It’s targeting the encoding of at least two 4K input streams at 60fps within a 10W power envelope.
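For a rough sense of scale, the stated target can be converted into raw pixel throughput. The calculation below assumes “4K” means 3840×2160 (UHD), which the company does not specify, and it treats the two-stream, 10W figures as exact rather than as a lower bound and a ceiling.

```python
# Assumption: "4K" is taken to be UHD, 3840 x 2160 pixels.
WIDTH, HEIGHT = 3840, 2160
FPS = 60
STREAMS = 2      # "at least two" input streams
POWER_W = 10     # stated power envelope

pixels_per_second = WIDTH * HEIGHT * FPS * STREAMS
pixels_per_joule = pixels_per_second / POWER_W

print(f"{pixels_per_second:,} pixels/s")           # 995,328,000 pixels/s
print(f"{pixels_per_joule:,.0f} pixels per joule")  # 99,532,800 pixels per joule
```

Under those assumptions, the target works out to roughly a billion input pixels per second, or about 100 million pixels per joule.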
“We expect that our Zion, Kings Canyon, and Mount Shasta designs will address our growing workloads in AI training, AI inference, and video transcoding respectively,” Lee, Rao, and Arnold wrote. “We will continue to improve on our designs through hardware and software co-design efforts, but we cannot do this alone. We welcome others to join us in the process of accelerating this kind of infrastructure.”