Apple M3 Family Benefits from New GPU Features
Applications and games that use the Metal API can take advantage of specific Apple Silicon GPU features, which are made even better by significant parallelism improvements in the M3 and A17 Pro. Here's how it works.
Apple has published a developer video on these new Apple Silicon GPU features, detailing exactly what is happening to achieve the improved results. The video goes into a lot of technical detail, but it provides enough information to explain the changes in basic terms.
Developers building applications with the Metal API do not need to make any changes to see performance improvements on the M3 and A17 Pro. These chips use dynamic caching, hardware-accelerated ray tracing, and hardware-accelerated mesh shading to make the GPU more performant than ever.
Dynamic shader core memory
Dynamic caching is made possible by the next-generation shader core in the A17 Pro and M3. With these new GPU cores, shaders can run in parallel much more efficiently than before, significantly improving overall performance.
Dotted lines represent wasted register memory
Traditionally, a GPU allocates register memory for the worst case, meaning the part of a task that needs the most registers, and it holds that allocation for the task's entire duration. So if one part of a task needs far more register memory than the rest, the whole task occupies that larger amount even when most of it goes unused.
Dynamic caching lets the GPU allocate exactly the right amount of register memory for each task as it runs. Register memory that would previously have sat reserved is freed up, allowing many more shader tasks to run in parallel.
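As a rough illustration, here is a minimal Metal Shading Language kernel (the function and buffer names are our own, not from Apple's video) in which only a rarely taken branch keeps many values live at once. A worst-case allocator would reserve registers for that branch across the whole kernel, while dynamic caching lets the common path occupy far less.

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical kernel: only the rare branch needs many live values (registers).
kernel void shadeExample(device const float *input  [[buffer(0)]],
                         device float       *output [[buffer(1)]],
                         uint                gid    [[thread_position_in_grid]])
{
    float x = input[gid];

    if (x > 0.99f) {
        // Register-heavy path: many intermediate values stay live at once.
        float acc[16];
        for (int i = 0; i < 16; ++i) {
            acc[i] = sin(x * float(i)) * cos(x + float(i));
        }
        float sum = 0.0f;
        for (int i = 0; i < 16; ++i) {
            sum += acc[i];
        }
        output[gid] = sum;
    } else {
        // Common path: only a couple of live values.
        output[gid] = x * 2.0f;
    }
}
```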
Flexible on-chip memory
Previously, on-chip memory was split into fixed allocations for registers, threadgroup memory, tile memory, and the buffer cache. That meant a significant amount of memory sat unused whenever a task leaned heavily on one type of memory and barely touched another.
All on-chip memory can be used as a cache
With flexible on-chip memory, all of the on-chip memory acts as a cache and can be used for any of these memory types. So a task that relies heavily on threadgroup memory can use the full range of on-chip memory, and can even spill over into main memory.
The shader core dynamically adjusts on-chip memory occupancy to maximize performance. This means developers can spend less time optimizing occupancy.
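For example, a kernel that makes heavy use of threadgroup memory, like the reduction sketched below in Metal Shading Language (kernel and buffer names are our own), no longer has to fit inside a dedicated threadgroup-memory partition; that space now comes from the same on-chip cache shared with registers, tile memory, and buffer data.

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical reduction kernel that leans heavily on threadgroup memory.
kernel void reduceSum(device const float *input   [[buffer(0)]],
                      device float       *partial [[buffer(1)]],
                      uint tid    [[thread_position_in_threadgroup]],
                      uint gid    [[thread_position_in_grid]],
                      uint tgid   [[threadgroup_position_in_grid]],
                      uint tgSize [[threads_per_threadgroup]])
{
    // Threadgroup-memory workspace, one slot per thread (256-thread groups assumed).
    threadgroup float scratch[256];

    scratch[tid] = input[gid];
    threadgroup_barrier(mem_flags::mem_threadgroup);

    // Tree reduction inside the threadgroup.
    for (uint stride = tgSize / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            scratch[tid] += scratch[tid + stride];
        }
        threadgroup_barrier(mem_flags::mem_threadgroup);
    }

    // One partial sum per threadgroup; the CPU or another pass adds them up.
    if (tid == 0) {
        partial[tgid] = scratch[0];
    }
}
```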
High-performance shader core ALU pipelines
Apple encourages developers to use FP16 math in their shaders where possible, but the high-performance ALU pipelines can execute combinations of integer, FP32, and FP16 instructions in parallel. Instructions from different tasks running at the same time are overlapped, which means ALU utilization improves under heavier loads.
Increasing the number of parallel operations using high-performance ALU pipelines
Essentially, when different tasks contain FP32, FP16, or integer instructions, their execution can overlap, with the different pipelines working at the same time to increase parallelism.
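Apple's FP16 advice is easy to follow in shader code. Here is an illustrative Metal Shading Language fragment function (names are our own) that keeps its math in half precision where the extra range of FP32 isn't needed, leaving the FP32 and integer pipelines free for other in-flight work.

```metal
#include <metal_stdlib>
using namespace metal;

// Illustrative fragment function that does its shading math in FP16 (half).
fragment half4 tintFragment(float4 position      [[position]],
                            constant half4 &tint [[buffer(0)]])
{
    // Cheap procedural shade computed entirely in half precision.
    half2 uv   = half2(fract(position.xy / 256.0f));
    half  glow = saturate(1.0h - length(uv - half2(0.5h)));
    return half4(tint.rgb * glow, 1.0h);
}
```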
Hardware-accelerated graphics pipelines
Hardware-accelerated ray tracing speeds up the process significantly by taking the all-important intersection calculations out of the GPU shader function. Since some of the computation is done by dedicated hardware, more operations can run in parallel, which is what makes ray tracing faster.
Hardware acceleration takes over work previously done in software
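In Metal, shaders hand rays to that hardware through the intersector API in the Metal Shading Language. The compute kernel below is a simplified sketch (kernel, buffer, and variable names are our own) that checks whether each ray hits anything in an acceleration structure.

```metal
#include <metal_stdlib>
using namespace metal;
using namespace raytracing;

// Simplified shadow-ray kernel: the intersection query runs on the dedicated
// ray tracing hardware in A17 Pro and M3 family GPUs.
kernel void traceShadowRays(primitive_acceleration_structure scene [[buffer(0)]],
                            device const float3 *origins           [[buffer(1)]],
                            device float        *occlusion         [[buffer(2)]],
                            uint gid [[thread_position_in_grid]])
{
    // Build a ray from each origin toward a fixed light direction.
    ray r;
    r.origin       = origins[gid];
    r.direction    = normalize(float3(0.3f, 1.0f, 0.2f));
    r.min_distance = 0.001f;
    r.max_distance = INFINITY;

    // Ask the hardware intersector whether the ray hits any triangle.
    intersector<triangle_data> isect;
    intersection_result<triangle_data> hit = isect.intersect(r, scene);

    occlusion[gid] = (hit.type == intersection_type::triangle) ? 0.0f : 1.0f;
}
```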
Hardware-accelerated mesh shading takes a similar approach. It takes the middle of the geometry-processing pipeline and passes it to a dedicated hardware block, allowing more operations to run in parallel.
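On the shader side, mesh shading shows up as new function stages in the Metal Shading Language. The code below is a bare-bones sketch (names are our own) of a mesh-stage function that emits a single triangle directly on the GPU instead of going through the traditional vertex-fetch path.

```metal
#include <metal_stdlib>
using namespace metal;

struct MeshVertex {
    float4 position [[position]];
};

// Mesh output type: at most 3 vertices and 1 triangle per threadgroup.
using SingleTriangleMesh = metal::mesh<MeshVertex, void, 3, 1, topology::triangle>;

// Bare-bones mesh-stage function that emits one triangle.
[[mesh]] void emitTriangle(SingleTriangleMesh output,
                           uint tid [[thread_index_in_threadgroup]])
{
    if (tid == 0) {
        output.set_primitive_count(1);

        MeshVertex v0, v1, v2;
        v0.position = float4(-0.5, -0.5, 0.0, 1.0);
        v1.position = float4( 0.5, -0.5, 0.0, 1.0);
        v2.position = float4( 0.0,  0.5, 0.0, 1.0);

        output.set_vertex(0, v0);
        output.set_vertex(1, v1);
        output.set_vertex(2, v2);

        // Indices into the vertex list for the single triangle.
        output.set_index(0, 0);
        output.set_index(1, 1);
        output.set_index(2, 2);
    }
}
```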
These are complex systems that can't be fully broken down in a few paragraphs. We recommend watching the video to get all the details, keeping one thing in mind: the A17 Pro and M3 focus on parallelism to speed up tasks.
The M3 is available in the MacBook Pro and 24-inch iMac, while the A17 Pro is available in the iPhone 15 Pro.