GPU access during build steps (shell_commands / Dockerfile)

Models that use torch.compile, Triton JIT, or CUDA graph capture need a warmup pass on a GPU to pre-compile their kernels. Builds currently run on CPU-only machines, so this compilation has to happen at runtime on every cold start, adding 20-60s of latency before the first request can be served.

With GPU-enabled builds, users could run a warmup inference during shell_commands and bake the compiled artifacts (Inductor cache, Triton kernels, CUDA graphs) into the container image. Cold starts would then skip compilation entirely.
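
To make the request concrete, here is a minimal sketch of the kind of warmup script a GPU-enabled build step could run. It is an illustration under stated assumptions, not a description of how the platform works: the cache root /opt/compile-cache, the function names, and the tiny Linear model are all hypothetical placeholders. It relies on the TORCHINDUCTOR_CACHE_DIR and TRITON_CACHE_DIR environment variables to redirect the compile caches into a directory that ends up in the image, and it assumes the build GPU matches the GPU used at runtime, since compiled kernels are architecture-specific. It covers only the Inductor and Triton caches; it does not attempt to persist CUDA graphs.

```python
import os
from pathlib import Path


def configure_cache_dirs(root: str) -> dict:
    """Point the Inductor and Triton caches at directories under `root`."""
    paths = {
        "TORCHINDUCTOR_CACHE_DIR": str(Path(root) / "inductor"),
        "TRITON_CACHE_DIR": str(Path(root) / "triton"),
    }
    for var, path in paths.items():
        Path(path).mkdir(parents=True, exist_ok=True)
        os.environ[var] = path
    return paths


def warmup(root: str = "/opt/compile-cache") -> None:
    """Run one compiled forward pass so the kernels land in the cache dirs.

    `root` is a hypothetical location; any path that is kept in the
    final image would do.
    """
    # Set the cache locations before torch compiles anything.
    configure_cache_dirs(root)

    # Imported lazily so the environment variables above are already set.
    import torch

    # Placeholder model; a real build would load the model being served.
    model = torch.nn.Linear(16, 16).cuda()
    compiled = torch.compile(model)

    # The first call triggers compilation and populates the caches.
    with torch.no_grad():
        compiled(torch.randn(8, 16, device="cuda"))
    torch.cuda.synchronize()
```

A build step would call warmup() once, and the runtime container would need the same two environment variables set to the same paths so that torch.compile finds the cached artifacts instead of recompiling.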