pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
cyy	d44daebdbc	[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051 Approved by: https://github.com/cpuhrsch, https://github.com/malfet	2024-05-31 01:20:45 +00:00
feifan	da9fb670d2	Nadam support the flag for "maximize" (#127214 ) Fixes https://github.com/pytorch/pytorch/issues/126642 Pull Request resolved: https://github.com/pytorch/pytorch/pull/127214 Approved by: https://github.com/janeyx99	2024-05-31 01:11:16 +00:00
SandishKumarHN	da39461d61	[optim] Move test_grad_scaling_autocast_fused_optimizers to test_cuda.py (#126418 ) This PR addresses the comments in PR #124904 - Move test_grad_scaling_autocast_fused_optimizers to test_cuda.py - Combine _grad_scaling_autocast_fused_optimizers into test_grad_scaling_autocast_fused_optimizers - Move to OptimizerInfo framework. - For failing tests test_grad_scaling_autocast_fused_optimizers AdamW_cuda_float32, Adam_cuda_float32 - Added toleranceOverride in this PR - created an issue #127000 ``` > (c2env) [sandish@devgpu166.ash6 ~/pytorch (refactoroptimizers)]$ python test/test_cuda.py -k test_grad_scaling_autocast_fused_optimizers -v /home/sandish/pytorch/torch/backends/cudnn/__init__.py:106: UserWarning: PyTorch was compiled without cuDNN/MIOpen support. To use cuDNN/MIOpen, rebuild PyTorch making sure the library is visible to the build system. warnings.warn( /home/sandish/pytorch/torch/backends/cudnn/__init__.py:106: UserWarning: PyTorch was compiled without cuDNN/MIOpen support. To use cuDNN/MIOpen, rebuild PyTorch making sure the library is visible to the build system. warnings.warn( test_grad_scaling_autocast_fused_optimizers_Adagrad_cpu_float32 (__main__.TestCudaOptimsCPU) ... {'fused': True} {'fused': True} {'weight_decay': 0.1, 'fused': True} {'weight_decay': 0.1, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} {'lr': 0.1, 'fused': True} {'lr': 0.1, 'fused': True} {'initial_accumulator_value': 0.1, 'weight_decay': 0.1, 'fused': True} {'initial_accumulator_value': 0.1, 'weight_decay': 0.1, 'fused': True} {'lr': 0.1, 'lr_decay': 0.5, 'weight_decay': 0.1, 'fused': True} {'lr': 0.1, 'lr_decay': 0.5, 'weight_decay': 0.1, 'fused': True} {'lr': tensor(0.0010), 'fused': True} {'lr': tensor(0.0010), 'fused': True} ok test_grad_scaling_autocast_fused_optimizers_AdamW_cpu_float32 (__main__.TestCudaOptimsCPU) ... 
{'fused': True} {'fused': True} {'lr': 0.01, 'fused': True} {'lr': 0.01, 'fused': True} {'weight_decay': 0.1, 'fused': True} {'weight_decay': 0.1, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} {'weight_decay': 0.1, 'amsgrad': True, 'fused': True} {'weight_decay': 0.1, 'amsgrad': True, 'fused': True} ok test_grad_scaling_autocast_fused_optimizers_Adam_cpu_float32 (__main__.TestCudaOptimsCPU) ... {'fused': True} {'fused': True} {'lr': 0.01, 'fused': True} {'lr': 0.01, 'fused': True} {'weight_decay': 0.1, 'fused': True} {'weight_decay': 0.1, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} {'weight_decay': 0.1, 'amsgrad': True, 'fused': True} {'weight_decay': 0.1, 'amsgrad': True, 'fused': True} ok test_grad_scaling_autocast_fused_optimizers_SGD_cpu_float32 (__main__.TestCudaOptimsCPU) ... {'fused': True} {'fused': True} {'lr': 0.01, 'fused': True} {'lr': 0.01, 'fused': True} {'lr': tensor(0.0010), 'fused': True} {'lr': tensor(0.0010), 'fused': True} {'momentum': 0.9, 'fused': True} {'momentum': 0.9, 'fused': True} {'momentum': 0.9, 'dampening': 0.5, 'fused': True} {'momentum': 0.9, 'dampening': 0.5, 'fused': True} {'momentum': 0.9, 'weight_decay': 0.1, 'fused': True} {'momentum': 0.9, 'weight_decay': 0.1, 'fused': True} {'momentum': 0.9, 'nesterov': True, 'weight_decay': 0.1, 'fused': True} {'momentum': 0.9, 'nesterov': True, 'weight_decay': 0.1, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} ok test_grad_scaling_autocast_fused_optimizers_Adagrad_cuda_float32 (__main__.TestCudaOptimsCUDA) ... skipped 'cuda is not supported for fused on Adagrad' test_grad_scaling_autocast_fused_optimizers_AdamW_cuda_float32 (__main__.TestCudaOptimsCUDA) ... 
{'fused': True} {'fused': True} {'lr': 0.01, 'fused': True} {'lr': 0.01, 'fused': True} {'weight_decay': 0.1, 'fused': True} {'weight_decay': 0.1, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} {'weight_decay': 0.1, 'amsgrad': True, 'fused': True} {'weight_decay': 0.1, 'amsgrad': True, 'fused': True} {'capturable': True, 'fused': True} {'capturable': True, 'fused': True} {'weight_decay': 0.1, 'amsgrad': True, 'capturable': True, 'fused': True} {'weight_decay': 0.1, 'amsgrad': True, 'capturable': True, 'fused': True} {'lr': tensor(0.0010), 'amsgrad': True, 'capturable': True, 'fused': True} {'lr': tensor(0.0010), 'amsgrad': True, 'capturable': True, 'fused': True} ok test_grad_scaling_autocast_fused_optimizers_Adam_cuda_float32 (__main__.TestCudaOptimsCUDA) ... {'fused': True} {'fused': True} {'lr': 0.01, 'fused': True} {'lr': 0.01, 'fused': True} {'weight_decay': 0.1, 'fused': True} {'weight_decay': 0.1, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} {'weight_decay': 0.1, 'amsgrad': True, 'fused': True} {'weight_decay': 0.1, 'amsgrad': True, 'fused': True} {'capturable': True, 'fused': True} {'capturable': True, 'fused': True} {'weight_decay': 0.1, 'amsgrad': True, 'capturable': True, 'fused': True} {'weight_decay': 0.1, 'amsgrad': True, 'capturable': True, 'fused': True} {'lr': tensor(0.0010), 'amsgrad': True, 'capturable': True, 'fused': True} {'lr': tensor(0.0010), 'amsgrad': True, 'capturable': True, 'fused': True} ok test_grad_scaling_autocast_fused_optimizers_SGD_cuda_float32 (__main__.TestCudaOptimsCUDA) ... 
{'fused': True} {'fused': True} {'lr': 0.01, 'fused': True} {'lr': 0.01, 'fused': True} {'lr': tensor(0.0010), 'fused': True} {'lr': tensor(0.0010), 'fused': True} {'momentum': 0.9, 'fused': True} {'momentum': 0.9, 'fused': True} {'momentum': 0.9, 'dampening': 0.5, 'fused': True} {'momentum': 0.9, 'dampening': 0.5, 'fused': True} {'momentum': 0.9, 'weight_decay': 0.1, 'fused': True} {'momentum': 0.9, 'weight_decay': 0.1, 'fused': True} {'momentum': 0.9, 'nesterov': True, 'weight_decay': 0.1, 'fused': True} {'momentum': 0.9, 'nesterov': True, 'weight_decay': 0.1, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} ok ---------------------------------------------------------------------- Ran 8 tests in 16.117s OK (skipped=1) > lintrunner test/test_cuda.py ---------------------------------------------------------------------- ok No lint issues. > lintrunner torch/testing/_internal/common_optimizers.py ---------------------------------------------------------------------- ok No lint issues. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/126418 Approved by: https://github.com/janeyx99	2024-05-30 01:47:41 +00:00
PyTorch MergeBot	67739d8c6f	Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 )" This reverts commit 699db7988d84d163ebb6919f78885e4630182a7a. Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2138496995))	2024-05-30 01:16:57 +00:00
cyy	699db7988d	[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051 Approved by: https://github.com/cpuhrsch, https://github.com/malfet	2024-05-29 11:58:03 +00:00
PyTorch MergeBot	cdbb2c9acc	Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 )" This reverts commit 4fdbaa794f9d5af2f171f772a51cb710c51c925f. Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2136428735))	2024-05-29 03:02:35 +00:00
feifan	22712ba5c5	Radam support the flag for "maximize" (#126765 ) Fixes #[126642](https://github.com/pytorch/pytorch/issues/126642) I referenced the `maximize` implementation in `Adam` and added a `maximize` flag to `RAdam`. If this PR is OK, I will add another PR for `NAdam`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126765 Approved by: https://github.com/janeyx99	2024-05-27 06:34:50 +00:00
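A minimal sketch of what a `maximize` flag does, assuming the convention used by PyTorch's for-loop optimizer implementations: the gradient is negated before the usual descent update is applied. The helper `plain_sgd_step` is hypothetical and uses plain SGD for brevity rather than RAdam's or NAdam's actual update:

```python
def plain_sgd_step(params, grads, lr, maximize=False):
    """Hypothetical plain-SGD step over lists of floats.

    With maximize=True the gradient is negated first, so the same
    "param - lr * grad" rule climbs the objective instead of descending it.
    """
    new_params = []
    for p, g in zip(params, grads):
        g = -g if maximize else g
        new_params.append(p - lr * g)
    return new_params


# Maximize f(x) = -(x - 3)**2, whose gradient is -2 * (x - 3); the peak is x = 3.
x = 0.0
for _ in range(100):
    x = plain_sgd_step([x], [-2.0 * (x - 3.0)], lr=0.1, maximize=True)[0]
print(round(x, 6))  # → 3.0
```

Negating the gradient once at the top keeps the rest of an optimizer's update (moments, bias correction, rectification) unchanged, which is why the flag can be added without touching the core math.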
cyy	4fdbaa794f	[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051 Approved by: https://github.com/cpuhrsch, https://github.com/malfet	2024-05-27 03:54:03 +00:00
Jane Xu	665637714f	Remove SparseAdam weird allowance of raw Tensor input (#127081 ) This continues the full deprecation after https://github.com/pytorch/pytorch/pull/114425. It's been 6 months! And I'm fairly certain no one is going to yell at me as this patch is not really used. ------ # BC Breaking note As of this PR, SparseAdam will become consistent with the rest of our optimizers in that it will only accept containers of Tensors/Parameters/param groups and fully complete deprecation of this path. Hitherto, the SparseAdam constructor had allowed raw tensors as the params argument to the constructor. Now, if you write the following code, there will be an error similar to every other optim: "params argument given to the optimizer should be an iterable of Tensors or dicts" ``` import torch param = torch.rand(16, 32) optimizer = torch.optim.SparseAdam(param) ``` Instead you should replace the last line with ``` optimizer = torch.optim.SparseAdam([param]) ``` to no longer error. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127081 Approved by: https://github.com/soulitzer	2024-05-25 02:58:24 +00:00
eqy	ebbd431d9e	[CPU] Bump `test_complex_2d` thresholds for LBFGS on `complex64` (#126358 ) Is this supposed to be bitwise identical? Wasn't sure how to interpret the comment but it seems to be giving mismatches like: ``` Mismatched elements: 1 / 2 (50.0%) Greatest absolute difference: 4.6372413635253906e-05 at index (1,) (up to 1e-05 allowed) Greatest relative difference: 3.4600801882334054e-05 at index (1,) (up to 1.3e-06 allowed) To execute this test, run the following from the base repo dir: python test/test_optim.py -k test_complex_2d_LBFGS_cpu_complex64 ``` on Neoverse-N2 SBSA ARM CPUs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126358 Approved by: https://github.com/lezcano, https://github.com/janeyx99	2024-05-23 00:16:45 +00:00
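The mismatch report above compares each element against an absolute plus a relative tolerance. A minimal sketch of that criterion, assuming the `|actual - expected| <= atol + rtol * |expected|` rule documented for `torch.testing.assert_close` (whose float32 defaults are the `1e-05` and `1.3e-06` quoted in the log); the helper name `within_tolerance` is hypothetical:

```python
def within_tolerance(actual, expected, atol, rtol):
    """Return True when |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# float32 defaults quoted in the log: atol=1e-05, rtol=1.3e-06.
print(within_tolerance(1.0 + 5e-6, 1.0, atol=1e-5, rtol=1.3e-6))  # True
print(within_tolerance(1.0 + 5e-5, 1.0, atol=1e-5, rtol=1.3e-6))  # False
```

Bumping a test's thresholds amounts to passing larger `atol`/`rtol` values, so a difference of the size reported (about 4.6e-05 absolute) is accepted on the affected platform.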
PyTorch MergeBot	cb69c51b6f	Revert " Updated test_graph_optims and test_graph_scaling_fused_optimizers to use new OptimizerInfo infrastructure (#125127 )" This reverts commit cf35a591b95220aa1bfcc04ff8a943efd1d6d6eb. Reverted https://github.com/pytorch/pytorch/pull/125127 on behalf of https://github.com/DanilBaibak due to Broken trunk ([comment](https://github.com/pytorch/pytorch/pull/125127#issuecomment-2120337584))	2024-05-20 12:14:22 +00:00
jayanth domalapalli	cf35a591b9	Updated test_graph_optims and test_graph_scaling_fused_optimizers to use new OptimizerInfo infrastructure (#125127 ) This PR is meant to address issue #123451, more specifically, the ```test_graph_optims``` and ```test_graph_scaling_fused_optimizers``` functions in ```test_cuda.py``` have been updated so that they now use the new OptimizerInfo infrastructure. Lintrunner passed: ``` $ lintrunner test/test_cuda.py ok No lint issues. ``` Tests passed: ``` >python test_cuda.py -k test_graph_optims Ran 19 tests in 7.463s OK (skipped=9) >python test_cuda.py -k test_graph_scaling_fused_optimizers Ran 6 tests in 2.800s OK (skipped=3) ``` Both the functions have been moved to the newly created TestCase class ```TestCudaOptims```. The test is mostly the same except the ```@optims``` decorator is used at the top of the function to implicitly call the function using each of the optimizers mentioned in the decorator instead of explicitly using a for loop to iterate through each of the optimizers. 
I was unable to use the ```_get_optim_inputs_including_global_cliquey_kwargs``` to get all kwargs for each of the optimizers since some of the kwargs that are used in the original ```test_graph_optims``` function are not being returned by the new OptimizerInfo infrastructure, more specifically, for the ```torch.optim.rmsprop.RMSprop``` optimizer, the following kwargs are not returned whenever ```_get_optim_inputs_including_global_cliquey_kwargs``` is called: ``` {'foreach': False, 'maximize': True, 'weight_decay': 0} { 'foreach': True, 'maximize': True, 'weight_decay': 0} ``` I ran into the same issue for ```test_graph_scaling_fused_optimizers```, for the ```torch.optim.adamw.AdamW``` optimizer, whenever ```optim_info.optim_inputs_func(device=device)``` was called, the following kwarg was not returned: ``` {'amsgrad': True} ``` Due to this issue, I resorted to using a dictionary to store the kwargs for each of the optimizers, I am aware that this is less than ideal. I was wondering whether I should use the OptimizerInfo infrastructure to get all the kwargs regardless of the fact that it lacks some kwargs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125127 Approved by: https://github.com/janeyx99	2024-05-20 06:20:45 +00:00
David Chiu	7e166e8057	[optim] Fix: wrong ASGD implementation (#126375 ) This PR is based on #125440, additionally merging the latest main branch and fixing the lint failures from #126361. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126375 Approved by: https://github.com/janeyx99	2024-05-17 15:46:39 +00:00
PyTorch MergeBot	e3c5d1b7d7	Revert "[optim] Fix: wrong ASGD implementation (#125440 )" This reverts commit 2c5ad9a3d7ea79ca897aec153a401f4b9175a717. Reverted https://github.com/pytorch/pytorch/pull/125440 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it looks like there is a linter failure coming from this change ([comment](https://github.com/pytorch/pytorch/pull/125440#issuecomment-2113833108))	2024-05-16 02:12:29 +00:00
haozhe.zhu	f9d107af66	[optim] add fused_adagrad support for CPU device (#124905 ) Add fused_adagrad support for CPU. ## Bench result: 32 core/sockets ICX Test Scripts: https://gist.github.com/zhuhaozhe/79e842e0a6e25d6d7fa1e4598807272c https://gist.github.com/zhuhaozhe/b4c6998a509dcea1796dd05b3005c969 ``` Tensor Size: 262144, Num Tensor 4, Num Threads: 1 _single_tensor_adagrad time: 0.2500 seconds _fused_adagrad time: 0.0933 seconds Tensor Size: 4194304, Num Tensor 32, Num Threads: 32 _single_tensor_adagrad time: 2.8819 seconds _fused_adagrad time: 1.7591 seconds ``` ## Test Plan: ``` python test_optim.py -k test_fused_matches_forloop python test_optim.py -k test_fused_large_tensor python test_optim.py -k test_can_load_older_state_dict python test_optim.py -k test_grad_scaling_autocast_fused_optimizers python test_torch.py -k test_grad_scaling_autocast_fused python test_torch.py -k test_params_invalidated_with_grads_invalidated_between_unscale_and_step ``` Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/124905 Approved by: https://github.com/jgong5, https://github.com/janeyx99	2024-05-16 01:11:51 +00:00
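For reference, a plain-Python sketch of the per-element Adagrad update that a fused kernel evaluates in one pass, following the update rule in the `torch.optim.Adagrad` documentation (learning-rate decay, weight decay, accumulated squared gradients). The helper name and the list-of-floats representation are assumptions for illustration, not the kernel's actual code:

```python
import math


def adagrad_step(params, grads, state_sums, step, lr=0.01, lr_decay=0.0,
                 weight_decay=0.0, eps=1e-10, maximize=False):
    """Hypothetical Adagrad step over lists of floats.

    `step` is the 1-based step count. `state_sums` (the running sum of
    squared gradients) is updated in place; new parameters are returned.
    """
    clr = lr / (1 + (step - 1) * lr_decay)  # decayed learning rate
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        g = -g if maximize else g
        if weight_decay != 0:
            g = g + weight_decay * p
        state_sums[i] += g * g  # accumulate squared gradient
        new_params.append(p - clr * g / (math.sqrt(state_sums[i]) + eps))
    return new_params


sums = [0.0]
print(adagrad_step([1.0], [2.0], sums, step=1, lr=0.1))  # roughly [0.9]
```

Reading the benchmark in this commit: 0.2500 s / 0.0933 s is about a 2.7x speedup for the single-thread case, and 2.8819 s / 1.7591 s is about 1.6x with 32 threads.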
David Chiu	2c5ad9a3d7	[optim] Fix: wrong ASGD implementation (#125440 ) > previous: Originally, the variables `new_eta` and `new_mu` would be constructed `len(grouped_mus)` times, but each of their values is the same and won't be changed. Therefore, it can be simplified using Python list multiplication, which constructs only one tensor. - [x] Incorrect assumption that every param will have the same step. - [x] Different implementations between `foreach=True` and `foreach=False`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125440 Approved by: https://github.com/janeyx99	2024-05-15 22:52:15 +00:00
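The core of this fix is computing `eta` and `mu` from each parameter's own step count instead of assuming all parameters share one. A plain-Python sketch of one ASGD step for a single scalar parameter, following the schedule used by PyTorch's ASGD (`eta = lr / (1 + lambd * lr * step) ** alpha`, `mu = 1 / max(1, step - t0)`); the dict-based state layout and the helpers `asgd_step` and `new_asgd_state` are hypothetical:

```python
def new_asgd_state(lr=0.01):
    """Hypothetical per-parameter state: its own step, eta, mu and average."""
    return {"step": 0, "eta": lr, "mu": 1.0, "ax": 0.0}


def asgd_step(param, grad, state, lr=0.01, lambd=1e-4, alpha=0.75,
              t0=1e6, weight_decay=0.0):
    """One ASGD step for a scalar parameter; mutates `state` in place."""
    state["step"] += 1
    step = state["step"]
    if weight_decay != 0:
        grad = grad + weight_decay * param
    eta, mu = state["eta"], state["mu"]
    param = param * (1 - lambd * eta)  # decay term
    param = param - eta * grad         # SGD update
    # Averaging: copy while mu == 1, blend toward param afterwards.
    if mu != 1:
        state["ax"] += (param - state["ax"]) * mu
    else:
        state["ax"] = param
    # eta and mu come from *this* parameter's step, not a shared one.
    state["eta"] = lr / ((1 + lambd * lr * step) ** alpha)
    state["mu"] = 1 / max(1, step - t0)
    return param


s1, s2 = new_asgd_state(), new_asgd_state()
s2["step"] = 1000  # e.g. a parameter that has already been trained longer
p1 = asgd_step(1.0, 0.5, s1)
p2 = asgd_step(1.0, 0.5, s2)
print(s1["eta"] > s2["eta"])  # True: the longer-trained param decays faster
```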
PyTorch MergeBot	bd3cbdba2f	Revert "[optim] add fused_adagrad support for CPU device (#124905 )" This reverts commit 1c3fe8403365db3cc9b75524ae742e3027b745e2. Reverted https://github.com/pytorch/pytorch/pull/124905 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing distributed multigpu test in trunk `1c3fe84033` ([comment](https://github.com/pytorch/pytorch/pull/124905#issuecomment-2108777063))	2024-05-13 20:53:22 +00:00
haozhe.zhu	1c3fe84033	[optim] add fused_adagrad support for CPU device (#124905 ) Add fused_adagrad support for CPU. ## Bench result: 32 core/sockets ICX Test Scripts: https://gist.github.com/zhuhaozhe/79e842e0a6e25d6d7fa1e4598807272c https://gist.github.com/zhuhaozhe/b4c6998a509dcea1796dd05b3005c969 ``` Tensor Size: 262144, Num Tensor 4, Num Threads: 1 _single_tensor_adagrad time: 0.2500 seconds _fused_adagrad time: 0.0933 seconds Tensor Size: 4194304, Num Tensor 32, Num Threads: 32 _single_tensor_adagrad time: 2.8819 seconds _fused_adagrad time: 1.7591 seconds ``` ## Test Plan: ``` python test_optim.py -k test_fused_matches_forloop python test_optim.py -k test_fused_large_tensor python test_optim.py -k test_can_load_older_state_dict python test_optim.py -k test_grad_scaling_autocast_fused_optimizers python test_torch.py -k test_grad_scaling_autocast_fused python test_torch.py -k test_params_invalidated_with_grads_invalidated_between_unscale_and_step ``` Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/124905 Approved by: https://github.com/jgong5, https://github.com/janeyx99	2024-05-13 01:16:20 +00:00
Michael Lazos	b24ad7eab5	Enable dynamo traced test_param_group_with_lrscheduler_goes_right_direction (#124544 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124544 Approved by: https://github.com/janeyx99 ghstack dependencies: #125825, #125826	2024-05-11 06:29:59 +00:00
Michael Lazos	e3d5afc60a	Enable dynamo'd test for 116499 (#123469 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123469 Approved by: https://github.com/janeyx99 ghstack dependencies: #123619	2024-05-07 22:17:01 +00:00
Michael Lazos	f0c6d6100b	Enable dynamo-traced optimizer peak memory tests (#124543 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124543 Approved by: https://github.com/yf225, https://github.com/janeyx99	2024-05-07 08:21:50 +00:00
Michael Lazos	787afc5180	Add LR as tensor tests (#123750 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123750 Approved by: https://github.com/janeyx99	2024-05-01 04:46:49 +00:00
haozhe.zhu	3c964ad1ca	add fused_sgd_kernel support for CPU device (#123629 ) Add fused_sgd_kernel support for CPU. ## Bench result: 32 core/sockets ICX Test Scripts: https://gist.github.com/zhuhaozhe/688763e17e93e4c5e12f25f676ec90d9 https://gist.github.com/zhuhaozhe/ad9938694bc7fae8b66d376f4dffc6c9 ``` Tensor Size: 262144, Num Tensor 4, Num Threads: 1 _single_tensor_sgd time: 0.2301 seconds _fused_sgd time: 0.0925 seconds Tensor Size: 4194304, Num Tensor 32, Num Threads: 32 _single_tensor_sgd time: 2.6195 seconds _fused_sgd time: 1.7543 seconds ``` ## Test Plan: ``` python test_optim.py -k test_fused_matches_forloop python test_optim.py -k test_fused_large_tensor python test_optim.py -k test_can_load_older_state_dict python test_optim.py -k test_grad_scaling_autocast_fused_optimizers python test_torch.py -k test_grad_scaling_autocast_fused python test_torch.py -k test_params_invalidated_with_grads_invalidated_between_unscale_and_step ``` It looks like there are already some PRs under issue https://github.com/pytorch/pytorch/issues/123451 to unify the UTs, so I did not modify the UTs in this PR. Co-authored-by: Jane Xu <janeyx@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/123629 Approved by: https://github.com/jgong5, https://github.com/janeyx99	2024-04-23 08:28:19 +00:00
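A fused SGD kernel still has to apply the same update rule as the for-loop path (weight decay, momentum with dampening, optional Nesterov, `maximize`), just in a single pass per element. A plain-Python sketch of that rule, modeled on the for-loop `torch.optim.SGD` implementation; the function name and the `None`-initialized momentum buffers are assumptions for illustration, not the CPU kernel:

```python
def sgd_step(params, grads, bufs, lr, momentum=0.0, dampening=0.0,
             weight_decay=0.0, nesterov=False, maximize=False):
    """Hypothetical SGD step over lists of floats.

    `bufs` holds one momentum buffer per parameter (None before the first
    step) and is updated in place; new parameter values are returned.
    """
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        g = -g if maximize else g
        if weight_decay != 0:
            g = g + weight_decay * p
        if momentum != 0:
            if bufs[i] is None:
                bufs[i] = g  # first step: buffer starts as the gradient
            else:
                bufs[i] = momentum * bufs[i] + (1 - dampening) * g
            g = g + momentum * bufs[i] if nesterov else bufs[i]
        new_params.append(p - lr * g)
    return new_params


bufs = [None]
p = sgd_step([1.0], [1.0], bufs, lr=0.1, momentum=0.9)  # buffer = 1.0
p = sgd_step(p, [1.0], bufs, lr=0.1, momentum=0.9)      # buffer = 1.9
print(p)  # roughly [0.71]
```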
Michael Lazos	0d0b5b2655	Enable dynamo rosenbrock sparse tests (#124542 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124542 Approved by: https://github.com/yf225 ghstack dependencies: #124540, #124541	2024-04-20 05:54:41 +00:00
Michael Lazos	184f16016e	Enable dynamo-traced deepcopy test for RMSprop (#124541 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124541 Approved by: https://github.com/yf225 ghstack dependencies: #124540	2024-04-20 05:54:41 +00:00
Michael Lazos	6a730698e2	Enable dynamo-traced Adamax tests (#124540 ) Enabling tests related to https://github.com/pytorch/pytorch/issues/121178 Pull Request resolved: https://github.com/pytorch/pytorch/pull/124540 Approved by: https://github.com/yf225	2024-04-20 05:54:41 +00:00
Michael Lazos	68a027f144	Fixes for 123400 (#123406 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123406 Approved by: https://github.com/janeyx99 ghstack dependencies: #123324, #123404, #123405, #124309	2024-04-19 17:20:57 +00:00
Michael Lazos	1531a29fb9	Enable tests related to 116061 (#123405 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123405 Approved by: https://github.com/janeyx99 ghstack dependencies: #123324, #123404	2024-04-19 17:20:54 +00:00
Michael Lazos	406d99e46c	Fix for 117147 (#123404 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123404 Approved by: https://github.com/Skylion007, https://github.com/janeyx99 ghstack dependencies: #123324	2024-04-19 17:20:50 +00:00
Michael Lazos	203d111c54	Enable dynamo test_forloop_goes_right_direction_multi_gpu (#123324 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123324 Approved by: https://github.com/janeyx99	2024-04-19 17:20:41 +00:00
Jane Xu	b412b75b42	[optim] add fused_adam/adamw_kernel support for CPU device (#123074 ) On par with the `CUDA` implementation. For `autocast` logic, same as `CUDA` + `Fused Adam`: - check for inf in `GradScaler.step` - In the fused kernel, if there is an `inf`, do nothing. If not, unscale the grad (and write it back) and update the param. TestPlan: ``` # extend CUDA only test for CPU fused adagrad python test_optim.py -k test_fused_matches_forloop python test_optim.py -k test_fused_large_tensor python test_torch.py -k test_grad_scaling_autocast_fused # extend fused test python test_torch.py -k test_params_invalidated_with_grads_invalidated_between_unscale_and_step python test_optim.py -k test_can_load_older_state_dict # newly added test (follow `6b1f13ea2f/test/test_cuda.py (L1108)`) python test_optim.py -k test_grad_scaling_autocast_fused_optimizers ``` Benchmark: 5.1x on 56 core SPR Parameter-size=1M Nparams=10 [test script](https://gist.github.com/zhuhaozhe/ef9a290ad3f8f4067b3373a3bdaa33e7) ``` numactl -C 0-55 -m 0 python bench_adam.py non-fused 6.0174267292022705 s fused 1.1787631511688232 s ``` Note: Fused kernel accuracy. The accuracy failure in CI shows a difference slightly higher than the default tolerance ``` 2024-04-02T06:09:16.2213887Z Mismatched elements: 21 / 64 (32.8%) 2024-04-02T06:09:16.2214339Z Greatest absolute difference: 1.5735626220703125e-05 at index (6, 6) (up to 1e-05 allowed) 2024-04-02T06:09:16.2214813Z Greatest relative difference: 1.0073336852656212e-05 at index (4, 1) (up to 1.3e-06 allowed) ``` I debugged it step by step, and unfortunately we may not be able to make the `fused kernel` exactly match the `non fused` one due to compiler optimizations. 
For example, in the non-fused impl ``` exp_avg_sq.mul_(beta2).addcmul_(grad, grad.conj(), value=1 - beta2) ``` and in the fused impl ``` exp_avg_sq_ptr[d] = scalar_t(beta2) * exp_avg_sq_ptr[d]; // std::cout << "exp_avg_sq " << exp_avg_sq_ptr[d] << std::endl; exp_avg_sq_ptr[d] = exp_avg_sq_ptr[d] + scalar_t(exp_avg_sq_grad_coefficient) * grad_val * grad_val; ``` If I keep the `std::cout`, I get exactly the same results in the UT ``` ===============param 0.6796758770942688 0.6796758770942688 ``` But when I comment it out, there is a difference ``` ===============param 0.6796758770942688 0.6796759366989136 ``` So I will make the tolerance a little higher than the default one. Co-authored-by: Jane Xu <janeyx@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/123074 Approved by: https://github.com/jgong5, https://github.com/janeyx99	2024-04-19 11:14:04 +00:00
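The autocast handling described in this commit (the scaler checks for `inf`; inside the fused kernel, the step is skipped if an `inf` was found, otherwise the gradient is unscaled, written back, and used to update the parameter) can be sketched in plain Python. This is a hedged illustration only: it uses a plain SGD update for brevity instead of Adam, and `fused_step_with_scaler`, `has_inf_or_nan`, and their arguments are hypothetical stand-ins, not the kernel's signature:

```python
import math


def has_inf_or_nan(grads):
    """Hypothetical stand-in for the scaler's inf/NaN check on scaled grads."""
    return any(not math.isfinite(g) for g in grads)


def fused_step_with_scaler(params, grads, lr, grad_scale, found_inf):
    """Sketch of a fused update that folds in gradient unscaling.

    If `found_inf` is set, nothing is modified. Otherwise each gradient is
    divided by `grad_scale`, written back into `grads`, and used to update
    the matching parameter in place.
    """
    if found_inf:
        return  # skip the whole step
    for i in range(len(params)):
        grads[i] = grads[i] / grad_scale  # unscale and write back
        params[i] -= lr * grads[i]        # update the param


params, grads = [1.0], [1024.0]  # gradient scaled by 1024
fused_step_with_scaler(params, grads, lr=0.1, grad_scale=1024.0,
                       found_inf=has_inf_or_nan(grads))
print(grads, params)  # grads unscaled to [1.0], params roughly [0.9]
```

Folding the unscale into the same pass as the update avoids a separate read/write sweep over every gradient, which is part of why the fused path pays off on CPU as well.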
Michael Lazos	102a223216	Enable dynamo test_state_dict_deterministic (#123323 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123323 Approved by: https://github.com/janeyx99 ghstack dependencies: #123498, #123322	2024-04-18 01:06:28 +00:00
Michael Lazos	d88fcb86d8	Enable dynamo traced test_forloop_goes_right_direction (#123322 ) Removed a bunch of skips. I also updated test_forloop_goes_right_direction to not use the closure when dynamo is tracing, because testing the disabled optimizer doesn't actually test anything. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123322 Approved by: https://github.com/janeyx99 ghstack dependencies: #123498	2024-04-18 00:50:10 +00:00
Michael Lazos	565e8c0645	[Reland] Enable dynamo'd tests disabled for #115679 (#123552 ) Relanding https://github.com/pytorch/pytorch/pull/123315 Pull Request resolved: https://github.com/pytorch/pytorch/pull/123552 Approved by: https://github.com/anijain2305 ghstack dependencies: #123496, #123497, #123551	2024-04-09 02:14:32 +00:00
Michael Lazos	6951626735	[Reland] Enable tests disabled for #115607 (#123551 ) Relanding https://github.com/pytorch/pytorch/pull/123314 Pull Request resolved: https://github.com/pytorch/pytorch/pull/123551 Approved by: https://github.com/anijain2305 ghstack dependencies: #123496, #123497	2024-04-08 21:29:28 +00:00
PyTorch MergeBot	e94b81b254	Revert "Enable tests disabled for #115607 (#123314 )" This reverts commit 9564e204c1616ce78434abfdea0f3fd428b675f3. Reverted https://github.com/pytorch/pytorch/pull/123314 on behalf of https://github.com/atalman due to break TestOptimRenewedCPU::test_foreach_matches_forloop_Adamax_cpu_float64 ([comment](https://github.com/pytorch/pytorch/pull/123314#issuecomment-2040854499))	2024-04-06 01:59:22 +00:00
PyTorch MergeBot	954d750516	Revert "Enable dynamo'd tests disabled for #115679 (#123315 )" This reverts commit d472ebf94a3f3a3dec31e9d8b2038127b2309727. Reverted https://github.com/pytorch/pytorch/pull/123315 on behalf of https://github.com/atalman due to break TestOptimRenewedCPU::test_foreach_matches_forloop_Adamax_cpu_float64 ([comment](https://github.com/pytorch/pytorch/pull/123315#issuecomment-2040835229))	2024-04-06 00:57:42 +00:00
Michael Lazos	d472ebf94a	Enable dynamo'd tests disabled for #115679 (#123315 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123315 Approved by: https://github.com/janeyx99 ghstack dependencies: #123313, #123314	2024-04-05 23:21:53 +00:00
Michael Lazos	9564e204c1	Enable tests disabled for #115607 (#123314 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123314 Approved by: https://github.com/janeyx99 ghstack dependencies: #123313	2024-04-05 23:21:53 +00:00
Jane Xu	d7fe0603a1	Move sparse tests to TestOptimRenewed (#123146 ) This is the last of the old TestOptim! With this change, everything will be migrated to use OptimizerInfo. Our sparse support is...well, sparse, and the tests try to best encapsulate which configs actually work. Note that support_sparse actually just means support for sparse grads...we don't test sparse params. 1. This PR fixes a bug in Adagrad multi_tensor with maximize by passing the correct value of maximize (vs. False every time) when sparse values are present. 2. This PR does improve coverage. There used to be only 2 configs each, and now we have the following configs for: Adagrad: ``` python test/test_optim.py -k test_rosenbrock_sparse_with_lrsched_False_Adagrad /home/janeyx/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead. _torch_pytree._register_pytree_node( {'maximize': True, 'lr': 0.1} {'initial_accumulator_value': 0.1, 'lr': 0.1} <--- this and above are CPU .{'foreach': False, 'lr': 0.1} {'foreach': True, 'lr': 0.1} {'maximize': True, 'foreach': False, 'lr': 0.1} {'maximize': True, 'foreach': True, 'lr': 0.1} {'initial_accumulator_value': 0.1, 'foreach': False, 'lr': 0.1} {'initial_accumulator_value': 0.1, 'foreach': True, 'lr': 0.1} . ---------------------------------------------------------------------- Ran 2 tests in 227.744s OK ``` SGD ``` (pytorch-3.10) [janeyx@devgpu023.odn1 /data/users/janeyx/pytorch (bff23193)]$ python test/test_optim.py -k test_rosenbrock_sparse_with_lrsched_False_SGD /home/janeyx/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead. 
_torch_pytree._register_pytree_node( {'dampening': 0.5, 'lr': 0.0048} .{'foreach': False, 'lr': 0.0048} {'foreach': True, 'lr': 0.0048} {'dampening': 0.5, 'foreach': False, 'lr': 0.0048} {'dampening': 0.5, 'foreach': True, 'lr': 0.0048} . ---------------------------------------------------------------------- Ran 2 tests in 112.801s OK ``` SparseAdam ``` (pytorch-3.10) [janeyx@devgpu023.odn1 /data/users/janeyx/pytorch (bff23193)]$ python test/test_optim.py -k test_rosenbrock_sparse_with_lrsched_False_Sparse /home/janeyx/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead. _torch_pytree._register_pytree_node( {'maximize': True, 'lr': 0.04} .{'maximize': True, 'lr': 0.04} . ---------------------------------------------------------------------- Ran 2 tests in 35.113s OK ``` Fixes #103322. A side quest in this migration was to re-enable and track dynamo issues as they trigger on the optim tests, which will be complete from this PR. New tests may add more things to track in dynamo, but there is now an established system for doing so, and dynamo is either enabled or a bug is tracked for every migrated test in TestOptimRenewed. Next steps: Remove the hyperparameter constraints in common_optimizer.py defined by metadata_for_sparse (other than LR, which seems handpicked for the tests to actually pass). Doing this requires adding more sparse functionality. Add more tests! Maybe add more optimizers! Pull Request resolved: https://github.com/pytorch/pytorch/pull/123146 Approved by: https://github.com/albanD ghstack dependencies: #123134, #123139	2024-04-02 22:51:02 +00:00
Jane Xu	f2838c99a0	Add a tensor lr test for optimizers (#123139 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123139 Approved by: https://github.com/albanD ghstack dependencies: #123134	2024-04-02 22:51:02 +00:00
Jane Xu	cb8fc30e4a	Move LRScheduler integration tests to OptimizerInfo (#123134 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123134 Approved by: https://github.com/albanD	2024-04-02 22:51:02 +00:00
Jane Xu	9d9d2af786	[BE] Move tests using functional API to OptimizerInfo (#122822 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122822 Approved by: https://github.com/albanD	2024-04-02 01:35:59 +00:00
Michael Lazos	16771747c2	Add tensor step and capturable support to rprop (#122261 ) Towards fixing https://github.com/pytorch/pytorch/issues/115679 Fixes Rprop step update while compiling Also adds capturable support + testing Pull Request resolved: https://github.com/pytorch/pytorch/pull/122261 Approved by: https://github.com/janeyx99	2024-03-28 23:31:18 +00:00
Michael Lazos	caa57e4fcd	Add tensor step and capturable support to rmsprop (#122264 ) Towards fixing https://github.com/pytorch/pytorch/issues/115679 Fixes RMSprop step update while compiling Adds capturable support to RMSprop Pull Request resolved: https://github.com/pytorch/pytorch/pull/122264 Approved by: https://github.com/janeyx99	2024-03-28 03:39:28 +00:00
Michael Lazos	365e89a591	Add tensor step to adadelta (#122252 ) Towards fixing https://github.com/pytorch/pytorch/issues/115679 Fixes Adadelta step update while compiling Pull Request resolved: https://github.com/pytorch/pytorch/pull/122252 Approved by: https://github.com/janeyx99	2024-03-21 07:28:47 +00:00
Jane Xu	fb1d7935bb	[optim][BE] move complex_2d (last of complex tests) to OptimInfo (#120618 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/120618 Approved by: https://github.com/albanD	2024-03-12 02:33:21 +00:00
Jane Xu	f76e541ec7	[BE] NO MORE discrepancy between forloop foreach capturable YAY (#121269 ) and I will not let it happen again Pull Request resolved: https://github.com/pytorch/pytorch/pull/121269 Approved by: https://github.com/albanD ghstack dependencies: #121260, #121264	2024-03-08 00:00:30 +00:00
Jane Xu	9d6c5be781	Add ASGD capturable API for forloop (#121264 ) @tfsingh I got to it first--wanted to land this stack and close the gap ASAP. This PR also fixes a discrepancy between `_init_group` and `__set_state__` because we have the constants live on params' device always. There are some next steps though: - ASGD can be made faster by making etas, mus, steps be on CPU when NOT capturable. (I had mistakenly thought foreachifying was faster and so we landed https://github.com/pytorch/pytorch/pull/107857, but it is slower). No one has complained yet though. ¯\_(ツ)_/¯ Pull Request resolved: https://github.com/pytorch/pytorch/pull/121264 Approved by: https://github.com/albanD ghstack dependencies: #121260	2024-03-08 00:00:30 +00:00
Jane Xu	24821fec26	Add RAdam capturable API for forloop (#121260 ) Implementation thanks to @MarouaneMaatouk in https://github.com/pytorch/pytorch/pull/118697, though I've since cleaned it up a lot to save perf on the rect < 5 eager case. It also just looks better now :) Added tests and the cudagraph health check. Pull Request resolved: https://github.com/pytorch/pytorch/pull/121260 Approved by: https://github.com/mlazos	2024-03-08 00:00:30 +00:00
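The "rect < 5" case mentioned here refers to RAdam's variance-rectification threshold. A minimal sketch of that term, following the RAdam algorithm as described in the `torch.optim.RAdam` documentation (rectify only when `rho_t > 5`, otherwise fall back to the un-adapted momentum update); the helper name `radam_rectification` is hypothetical:

```python
import math


def radam_rectification(step, beta2=0.999):
    """Return RAdam's rectification term r_t for a 1-based step, or None
    when rho_t <= 5 and the plain (un-adapted) momentum update is used."""
    rho_inf = 2 / (1 - beta2) - 1
    beta2_t = beta2 ** step
    rho_t = rho_inf - 2 * step * beta2_t / (1 - beta2_t)
    if rho_t <= 5:
        return None
    return math.sqrt(
        (rho_t - 4) * (rho_t - 2) * rho_inf
        / ((rho_inf - 4) * (rho_inf - 2) * rho_t)
    )


print(radam_rectification(1))     # None: early steps skip the adaptive term
print(radam_rectification(1000))  # a value between 0 and 1
```

Because `r_t` only depends on the step count and `beta2`, the early "rect <= 5" steps can skip the square-root and second-moment work entirely in eager mode, which is the kind of saving the commit message alludes to.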


82 Commits