mirror of
https://github.com/pytorch/pytorch.git
synced 2025-10-20 21:14:14 +08:00
Finishes the work started in https://github.com/pytorch/pytorch/pull/118697. Thanks @MarouaneMaatouk for the attempt, but due to inactivity I have opened this PR for Adamax. Note that the new capturable implementation is much simpler and I've modified the foreach capturable impl--it now calls fewer kernels and is more easily comparable to forloop. Next steps: * This PR discovered two bugs: #121178 and #121238. * Move the now hefty graph optim tests in test_cuda to use OptimInfo. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121183 Approved by: https://github.com/albanD