Tesla expands its GPU-powered AI supercomputer – is Dojo next?

Using Selene’s Top500 result as a proxy, we estimate that Tesla’s 7,360-GPU system will be capable of about 100 petaflops of double-precision Linpack performance, although we expect Tesla to primarily run single- and lower-precision workloads (FP32, FP16, bfloat16, etc.).
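The proxy estimate above can be sketched as a simple linear scaling from Selene's public Top500 figures (roughly 63.5 double-precision Linpack petaflops from 4,480 A100 GPUs); the exact Rmax value used here is an approximation from the published list, not from the article:

```python
# Back-of-the-envelope scaling of Tesla's cluster from Selene's Top500 result.
# Selene's figures (~63.5 FP64 Linpack petaflops, 4,480 A100 GPUs) are
# approximate values from public Top500 listings.

SELENE_RMAX_PFLOPS = 63.5   # Selene's Linpack Rmax, petaflops (FP64)
SELENE_GPUS = 4480          # A100 GPUs in Selene
TESLA_GPUS = 7360           # A100 GPUs in Tesla's expanded cluster

# Assume Linpack throughput scales roughly linearly with GPU count,
# since both machines use the same A100 GPU.
estimate_pflops = SELENE_RMAX_PFLOPS * TESLA_GPUS / SELENE_GPUS
print(f"Estimated Linpack: ~{estimate_pflops:.0f} petaflops")  # → ~104 petaflops
```

The linear-scaling assumption ignores interconnect and efficiency differences, which is why the article rounds the figure to "about 100 petaflops."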

A larger AI supercomputer – from Meta/Facebook – was detailed earlier this year. The AI Research SuperCluster (RSC) will use 16,000 A100 GPUs, delivering more than 200 petaflops of double-precision performance once completed this summer.

The Tesla GPU system was unveiled last June by Andrej Karpathy, Tesla’s senior director of artificial intelligence, at the 2021 Conference on Computer Vision and Pattern Recognition (CVPR 2021). “I wanted to briefly give a plug to this crazy supercomputer we are building and using right now,” Karpathy said. At the time, the system spanned 720 nodes, each powered by eight Nvidia A100 GPUs (the 80 GB model), for a total of 5,760 A100s. At eight GPUs per node, injecting another 1,600 GPUs adds 200 nodes to the installation, for a total of 920 nodes.
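The node and GPU counts reported above are internally consistent, as a quick sanity check shows:

```python
# Sanity-check the node/GPU arithmetic reported for Tesla's cluster upgrade.
GPUS_PER_NODE = 8

original_nodes = 720
original_gpus = original_nodes * GPUS_PER_NODE   # 5,760 A100s at unveiling
added_gpus = 1600
added_nodes = added_gpus // GPUS_PER_NODE        # 200 new nodes in the upgrade

total_nodes = original_nodes + added_nodes
total_gpus = original_gpus + added_gpus
print(total_nodes, total_gpus)  # → 920 7360
```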

News of the upgrade came via a tweet from Tim Zaman, engineering director at Tesla, as part of a promotion for the upcoming MLSys conference. Tesla is sponsoring the conference, which runs from August 29 through September 1, 2022. The company is also holding its second AI Day event on September 30, 2022.

Tesla’s GPU clusters are a precursor to the company’s upcoming in-house Dojo supercomputer, which has been in development since at least August 2020, when Tesla CEO Elon Musk tweeted: “Tesla is developing a [neural network] training computer called Dojo to process truly vast amounts of video data. It’s a beast! … A truly useful exaflop at de facto FP32.”

The D1 chip from Tesla. Image courtesy of Tesla.

Dojo’s design was revealed at Tesla’s inaugural AI Day event last August, when details of the system and its component D1 chip surfaced. Tesla may be ready to spill some more Dojo tea next week at Hot Chips. The (fully virtual) event kicks off on Sunday, August 21, and runs through Tuesday, August 23, 2022. Tesla has three slots in the program, all on Tuesday. In the morning, Tesla hardware engineer Emil Talpes is scheduled to give a presentation titled “Dojo: The Microarchitecture of Tesla’s Exa-Scale Computer,” followed by Tesla principal system engineer Bill Chang with his talk, “Dojo: Super-Compute System Scaling for ML Training.”

Later in the day, Ganesh Venkataramanan, senior director of Autopilot hardware at Tesla, will deliver a keynote titled “Beyond Compute – Enabling AI through System Integration.” This is the second of two keynotes at Hot Chips 2022; the first (“Semiconductors Run the World”) will be presented by Intel CEO Pat Gelsinger on Monday, August 22.

Several technologies are competing to power the world’s fastest AI supercomputers. In addition to market-leading Nvidia GPUs, AMD GPUs now power the world’s fastest (publicly ranked) supercomputer, Frontier. Intel is ramping up its Ponte Vecchio GPU, the core engine of the forthcoming Aurora supercomputer. Dedicated AI chips are also emerging: Google is on its fourth-generation TPU; Microsoft has invested in FPGAs to power AI workloads; and Amazon has launched its Trainium and Inferentia AI chips.
