At 11 a.m. on May 24, NVIDIA will deliver a keynote at Computex Taipei. Although Jensen Huang will not be there, the senior vice president of the GeForce business is on the speaker list, so news about the RTX 40 series graphics cards could still be announced, perhaps even ahead of schedule. Ahead of the event, kopite7kimi shared a block diagram of the AD102 GPU.
AD102 is the second-highest-tier die in the Ada Lovelace family and the core of the flagship RTX 40 series gaming cards. It will probably power the RTX 4090 Ti and RTX 4090.
According to the analysis, AD102 has 12 GPCs (graphics processing clusters), about 70% more than the 7 in the previous-generation GA102. Each GPC contains 6 TPCs (2 SMs each), and each SM contains 4 sub-cores, the same as Ampere. The difference is that each SM sub-core contains 128 FP32 units, which together with the INT32 integer units brings the count to 192.
The complete AD102 includes 24 SM groups, for a total of 12288 FP32 units plus 6144 INT32 units. Put more simply, that is 18432 CUDA cores.
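The totals above are simple multiplication. Here is a minimal sketch of that arithmetic, using only the figures reported in this article. The variable names are illustrative, and the INT32 count per sub-core is inferred as 192 minus 128 rather than stated directly.

```python
# Figures as reported in the article (rumored, not confirmed by NVIDIA).
SM_GROUPS = 24              # SM groups in the complete AD102
SUB_CORES_PER_SM = 4        # sub-cores per SM, same as Ampere
FP32_PER_SUB_CORE = 128     # FP32 units per sub-core
UNITS_PER_SUB_CORE = 192    # FP32 + INT32 units per sub-core

# Assumption: INT32 units are whatever remains after the FP32 units.
int32_per_sub_core = UNITS_PER_SUB_CORE - FP32_PER_SUB_CORE

fp32_total = SM_GROUPS * SUB_CORES_PER_SM * FP32_PER_SUB_CORE
int32_total = SM_GROUPS * SUB_CORES_PER_SM * int32_per_sub_core
cuda_total = fp32_total + int32_total

print(fp32_total, int32_total, cuda_total)  # 12288 6144 18432
```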
As for cache, each SM group in AD102 gets 192 KB of L1, 50% more than Ampere, for a total of 4.5 MB. L2 grows to 96 MB, 16 times that of Ampere.
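The cache figures can be checked in a few lines. This sketch uses the 24 SM groups quoted earlier. The Ampere baseline is an assumption not stated in this article: GA102's published 128 KB of L1 per SM and 6 MB of L2.

```python
# AD102 figures as reported in the article (rumored).
L1_PER_SM_GROUP_KB = 192
SM_GROUPS = 24
L2_MB = 96

# Assumed Ampere (GA102) baseline, not given in the article.
GA102_L1_PER_SM_KB = 128
GA102_L2_MB = 6

l1_total_mb = L1_PER_SM_GROUP_KB * SM_GROUPS / 1024
l1_increase_pct = (L1_PER_SM_GROUP_KB / GA102_L1_PER_SM_KB - 1) * 100
l2_ratio = L2_MB / GA102_L2_MB

print(l1_total_mb, l1_increase_pct, l2_ratio)  # 4.5 50.0 16.0
```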
Accordingly, the number of ROPs and RT (ray tracing) units increases as well. AD102 has up to 384 ROPs, compared with 112 on the RTX 3090 Ti. In addition, the RT cores move to the third generation and the Tensor cores to the fourth generation.
Given all this, a doubling of performance for the RTX 4090 does not seem out of reach. FP32 single-precision throughput is expected to reach 90 TFLOPS, versus only 40 TFLOPS for the RTX 3090 Ti, at the cost of power consumption exceeding 600 W.
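For a rough sense of where such FP32 figures come from, peak throughput is commonly estimated as 2 operations (one fused multiply-add) per unit per clock. The sketch below is a back-of-the-envelope check under stated assumptions. The RTX 3090 Ti core count (10752) and boost clock (1.86 GHz) are NVIDIA's published specifications rather than figures from this article. Treating all 18432 reported units as FP32-capable is a simplification.

```python
def peak_fp32_tflops(fp32_units: int, clock_ghz: float) -> float:
    """Peak FP32 TFLOPS, counting one FMA (2 FLOPs) per unit per clock."""
    return 2 * fp32_units * clock_ghz / 1000


def clock_needed_ghz(target_tflops: float, fp32_units: int) -> float:
    """Clock in GHz needed to reach a target peak FP32 TFLOPS."""
    return target_tflops * 1000 / (2 * fp32_units)


# Assumed RTX 3090 Ti specs: 10752 CUDA cores at a 1.86 GHz boost clock.
rtx_3090_ti = peak_fp32_tflops(10752, 1.86)

# Assumption: all 18432 reported units contribute to FP32 throughput.
ad102_clock = clock_needed_ghz(90, 18432)

print(f"RTX 3090 Ti: {rtx_3090_ti:.1f} TFLOPS")
print(f"Clock for 90 TFLOPS on 18432 units: {ad102_clock:.2f} GHz")
```

Under these assumptions the RTX 3090 Ti comes out at about 40 TFLOPS, in line with the figure above, and 90 TFLOPS would require a clock of roughly 2.44 GHz.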