pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
haozhe.zhu@intel.com	0ae952db76	enable mkldnn bf32 matmul (#116015 ) ### Testing FP32 matmul vs. mkldnn BF32 matmul on SPR single core: Input \| BF32 / ms \| FP32 / ms \| Speed up -- \| -- \| -- \| -- M: 128, N: 128, K: 128, trans_a: False, trans_b: False \| 32.842 \| 38.279 \| 1.165 M: 128, N: 256, K: 128, trans_a: False, trans_b: False \| 38.590 \| 73.967 \| 1.917 M: 8192, N: 768, K: 768, trans_a: False, trans_b: False \| 18456.267 \| 74588.002 \| 4.041 56 cores: Input \| BF32 / ms \| FP32 / ms \| Speed up -- \| -- \| -- \| -- M: 8192, N: 768, K: 768, trans_a: False, trans_b: False \| 1199.400 \| 1715.548 \| 1.430 M: 8192, N: 768, K: 768, trans_a: False, trans_b: True \|1129.204 \| 1708.912 \| 1.513 M: 8192, N: 768, K: 3072, trans_a: False, trans_b: False \| 3655.915 \| 7992.877 \| 2.186 M: 8192, N: 768, K: 3072, trans_a: False, trans_b: True \| 3707.993 \| 8026.191 \| 2.165 Batch: 768, M: 128, N: 64, K: 128 \| 1296.419 \| 1308.411 \| 1.009 Pull Request resolved: https://github.com/pytorch/pytorch/pull/116015 Approved by: https://github.com/jgong5, https://github.com/ezyang	2024-01-20 09:30:23 +00:00
CaoE	c9528a11dd	Add Half support for masked_softmax on CPU (#117028 ) Add Half support for `masked_softmax` on CPU. Pull Request resolved: https://github.com/pytorch/pytorch/pull/117028 Approved by: https://github.com/jgong5, https://github.com/cpuhrsch	2024-01-18 08:59:20 +00:00
vfdev-5	1a57c18760	Fixed cuda grads for interpolate::trilinear on non-contig grad output (#117373 ) Fixes #113642 Pull Request resolved: https://github.com/pytorch/pytorch/pull/117373 Approved by: https://github.com/lezcano	2024-01-15 18:05:47 +00:00
vmoens	6f0f4f12ca	[BugFix] Prevent LSTM to run with wrong input shape (#115542 ) Fixes #114874 Pull Request resolved: https://github.com/pytorch/pytorch/pull/115542 Approved by: https://github.com/mikaylagawarecki	2024-01-11 02:57:09 +00:00
Xinya Zhang	e3ca7346ce	Re-add initial Flash Attention support on ROCM (#115981 ) Note about the Updates: This PR: 1. skips more flash attention related UTs on MI200 2. Fix additional ATen compiling errors after hipification 3. Fix the author "root" of a specific commit 4. Includes the patch from Nikita in favor of block level static initialization. CAVEAT: This revised PR has a commit that modifies the CI to force its running on MI200 nodes. That specific commit must be reverted before merge. Original PR (https://github.com/pytorch/pytorch/pull/114309) Note: This pull requests add initial Flash Attention support for AMD/ROCM platform. It added a specialized Triton repository/branch as a compile-time dependency for Flash Attention math library on AMD/ROCM. This triton submodule is not used at runtime and will not be shipped to the final pytorch package. We have the plan to release this specialized Triton as a separate project. Know limitations: - Only supports MI200 series GPU (i.e., `gcnArchName == gfx90a:sramecc+:xnack-`. - Only supports power of two sequence lengths. - No support for varlen APIs. - Only support head dimension 16,32,64,128. - Performance is still being optimized. Fixes #112997 Pull Request resolved: https://github.com/pytorch/pytorch/pull/115981 Approved by: https://github.com/malfet	2024-01-04 22:21:31 +00:00
Aaron Gokaslan	3fe437b24b	[BE]: Update flake8 to v6.1.0 and fix lints (#116591 ) Updates flake8 to v6.1.0 and fixes a few lints using sed and some ruff tooling. - Replace `assert(0)` with `raise AssertionError()` - Remove extraneous parenthesis i.e. - `assert(a == b)` -> `assert a == b` - `if(x > y or y < z):`->`if x > y or y < z:` - And `return('...')` -> `return '...'` Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/116591 Approved by: https://github.com/albanD, https://github.com/malfet	2024-01-03 06:04:44 +00:00
Sun, Jiayi	c173a9d9b3	add Half support for layer_norm on CPU (#99590 ) ### Testing Single socket (icx, 32cores): \| shape \| fp32 forward (ms) \| fp16 forward (ms) \| mixed fp32 fp16 forward (ms) \| fp32 backward (ms) \| fp16 backward (ms) \| mixed fp32 fp16 backward (ms) \| \| -- \| -- \| -- \| -- \| -- \| -- \| -- \| \| (1, 8, 16) \| 0.012 \| 0.011 \| 0.011 \| 0.051 \| 0.051 \| 0.050 \| \| (8 ,8, 16) \| 0.013 \| 0.013 \| 0.013 \| 0.054 \| 0.053 \| 0.051 \| \| (32, 8, 16) \| 0.015 \| 0.014 \| 0.014 \| 0.059 \| 0.054 \| 0.052 \| \| (64, 128, 56, 56) \| 1.875 \| 0.790 \| 1.016 \| 12.845 \| 7.151 \| 6.985 \| \| (64, 128, 256, 256) \| 50.226 \| 25.462 \| 35.736 \| 328.957 \| 179.615 \| 175.618 \| Single core (icx): \| shape \| fp32 forward (ms) \| fp16 forward (ms) \| mixed fp32 fp16 forward (ms) \| fp32 backward (ms) \| fp16 backward (ms) \| mixed fp32 fp16 backward (ms) \| \| -- \| -- \| -- \| -- \| -- \| -- \| -- \| \| (1, 8, 16) \| 0.012 \| 0.011 \| 0.011 \| 0.040 \| 0.041 \| 0.041 \| \| (8 ,8, 16) \| 0.012 \| 0.012 \| 0.012 \| 0.042 \| 0.042 \| 0.042 \| \| (32, 8, 16) \| 0.027 \| 0.014 \| 0.014 \| 0.048 \| 0.048 \| 0.046 \| \| (64, 128, 56, 56) \| 58.054 \| 11.034 \| 17.928 \| 108.603 \| 48.816 \| 50.244 \| \| (64, 128, 256, 256) \| 1327.758 \| 352.394 \| 496.994 \| 2846.182 \| 1224.247 \| 1218.422 \| Pull Request resolved: https://github.com/pytorch/pytorch/pull/99590 Approved by: https://github.com/mingfeima, https://github.com/jgong5, https://github.com/cpuhrsch	2023-12-20 01:11:15 +00:00
eqy	d55365dc05	[CUDA] Workaround shmem limit for certain input sizes in `AdaptiveAvgPool1D` (#115231 ) Reference issue #68248 CC @ptrblck @malfet @xwang233 Pull Request resolved: https://github.com/pytorch/pytorch/pull/115231 Approved by: https://github.com/mikaylagawarecki	2023-12-19 22:40:10 +00:00
PyTorch MergeBot	c006c8b50e	Revert "markDynamoStrictTest some more (#115885 )" This reverts commit 55ce4693ff2c0b6e50b8af323f36ecc7ff929638. Reverted https://github.com/pytorch/pytorch/pull/115885 on behalf of https://github.com/atalman due to OSSCI oncall, broke inductor ([comment](https://github.com/pytorch/pytorch/pull/115885#issuecomment-1858409669))	2023-12-15 19:51:24 +00:00
rzou	55ce4693ff	markDynamoStrictTest some more (#115885 ) Featuring test_native_mha.py test_nn.py test_prims.py test_schema_check.py test_serialization.py test_show_pickle.py test_sort_and_select.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/115885 Approved by: https://github.com/voznesenskym ghstack dependencies: #115845, #115855, #115856, #115857, #115858, #115870, #115871, #115879	2023-12-15 13:19:52 +00:00
eqy	9056903b09	[CUDA] 64-bit indexing for avg_pool_backward (#114193 ) Fixes #113833 Pull Request resolved: https://github.com/pytorch/pytorch/pull/114193 Approved by: https://github.com/malfet	2023-12-15 03:58:46 +00:00
Mikayla Gawarecki	f5919335db	Fix _load_from_state_dict for num_batches_tracked in batchnorm (#115285 ) I approved https://github.com/pytorch/pytorch/pull/110850 which did the following Previously: `num_batches_tracked` not in state_dict when doing `m.load_state_dict(state_dict)` --> always overwrite module's `num_batches_tracked` in `load_from_state_dict` with a 0 cpu tensor Now: `num_batches_tracked` not in state_dict loaded when doing `m.load_state_dict(state_dict)` --> only overwrite module's `num_batches_tracked` in `load_from_state_dict` with a 0 cpu tensor if module does not have `num_batches_tracked` This causes the following issue: ``` with torch.device('meta'): m = BatchNorm(...) m.load_state_dict(state_dict, assign=True) ``` If `num_batches_tracked` is not in `state_dict`, since `modules's` `num_batches_tracked` is present on meta device, it is not overwritten with a 0 cpu tensor. When compiling, this error is raised ``` AssertionError: Does not support mixing cuda+meta ``` I am not sure whether the explicit check for meta device makes sense as a fix, will add testing if this fix is ok Pull Request resolved: https://github.com/pytorch/pytorch/pull/115285 Approved by: https://github.com/albanD	2023-12-07 22:48:26 +00:00
Jeff Daily	4c04ae2451	[ROCm] fix test_softmax_forward_64bit_indexing_cuda OOM (#113093 ) TestNNDeviceTypeCUDA.test_softmax_forward_64bit_indexing_cuda started failing for ROCm after #112096 with the message torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 13.35 GiB. GPU 0 has a total capacity of 31.98 GiB of which 3.89 GiB is free. Of the allocated memory 26.69 GiB is allocated by PyTorch, and 18.91 MiB is reserved by PyTorch but unallocated. This amounts to approximately 41GB. The test is currently decorated with `largeTensorTest("30GB", "cuda")` but this is not sufficient for ROCm. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113093 Approved by: https://github.com/malfet	2023-11-07 03:00:37 +00:00
Eddie Yan	e39668770a	[CUDA] 64-bit indexing fixes for cross-entropy kernels (#112096 ) For #108345, #111484 Addresses the forward kernels implicated in the issues, but will take another look at the backward kernels (in follow-up PRs if necessary). The spatial softmax kernel is changed to use signed integer indexing rather than unsigned as `ScalarType` only has signed integer types declared for now, but this should be a minor change. CC @ptrblck @crcrpar (who landed a few related PRs recently). Pull Request resolved: https://github.com/pytorch/pytorch/pull/112096 Approved by: https://github.com/mikaylagawarecki	2023-11-06 17:37:08 +00:00
Tobias Ringwald	29716e865c	Enforce both input tensor shapes of CosineEmbeddingLoss to be equal. (#112782 ) …Added a test to prevent regressions. Fixes #112732. Pull Request resolved: https://github.com/pytorch/pytorch/pull/112782 Approved by: https://github.com/lezcano	2023-11-03 15:15:06 +00:00
Tristan Rice	013f622dd2	grid_sample: support bfloat16 (#112331 ) This adds bfloat16 support to `torch.nn.functional.grid_sample` this is particularly important when doing feature sampling such as for rendering techniques used in PyTorch3d or for camera projections to voxel grids such as in SimpleBEV. Related to #57707 Test plan: ``` pytest test/test_nn.py -k grid_sample pytest test/test_ops.py -k grid_sample ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/112331 Approved by: https://github.com/zou3519	2023-10-30 19:31:41 +00:00
Cao E	1c89ea7f72	Add Half support for softmax and log_softmax on CPU (#103315 ) Add Half support for softmax and log_softmax on CPU. Note: This introduces a correctness issue with MPS https://github.com/pytorch/pytorch/issues/111416 and https://github.com/pytorch/pytorch/issues/111479. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103315 Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki, https://github.com/malfet	2023-10-26 08:38:54 +00:00
pbialecki	17b732eb04	increase CPU memory requirement for test_nll_loss_large (#110963 ) Running `python test_nn.py -v -k test_nll_loss_large_tensor` on a machine with a small host RAM availability (e.g. ~50GB) fails with a `SIGKILL` even though the currently specified memory requirements for CPU (and GPU) are set to 48GB and are thus met. Profiling the peak memory usage via: ``` \time -v python test_nn.py -v -k test_nll_loss_large_tensor ``` and adding `print(torch.cuda.memory_summaryu())` at the end of the test shows a higher host RAM usage of >100GB and a device memory usage of ~32GB. ``` Command being timed: "python test_nn.py -v -k test_nll_loss_large_tensor" User time (seconds): 81.66 System time (seconds): 229.02 Percent of CPU this job got: 671% Elapsed (wall clock) time (h:mm:ss or m:ss): 0:46.30 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 118150096 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 90280839 Voluntary context switches: 1669 Involuntary context switches: 1214548 Swaps: 0 File system inputs: 0 File system outputs: 0 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0 ``` ``` \| PyTorch CUDA memory summary, device ID 0 \| \|---------------------------------------------------------------------------\| \| CUDA OOMs: 0 \| cudaMalloc retries: 0 \| \|===========================================================================\| \| Metric \| Cur Usage \| Peak Usage \| Tot Alloc \| Tot Freed \| \|---------------------------------------------------------------------------\| \| Allocated memory \| 32769 MiB \| 32769 MiB \| 81923 MiB \| 49154 MiB \| \| from large pool \| 32768 MiB \| 32768 MiB \| 81921 MiB \| 49152 MiB \| \| from small pool \| 0 MiB \| 0 MiB \| 1 MiB \| 1 MiB \| \|---------------------------------------------------------------------------\| \| Active memory \| 32769 MiB \| 32769 MiB \| 81923 MiB \| 49154 MiB \| \| from large pool \| 32768 MiB \| 32768 MiB \| 81921 MiB \| 49152 MiB \| \| from small pool \| 0 MiB \| 0 MiB \| 1 MiB \| 1 MiB \| \|---------------------------------------------------------------------------\| \| Requested memory \| 32769 MiB \| 32769 MiB \| 81923 MiB \| 49154 MiB \| \| from large pool \| 32768 MiB \| 32768 MiB \| 81921 MiB \| 49152 MiB \| \| from small pool \| 0 MiB \| 0 MiB \| 1 MiB \| 1 MiB \| \|---------------------------------------------------------------------------\| \| GPU reserved memory \| 32774 MiB \| 32774 MiB \| 81938 MiB \| 49164 MiB \| \| from large pool \| 32772 MiB \| 32772 MiB \| 81930 MiB \| 49158 MiB \| \| from small pool \| 2 MiB \| 2 MiB \| 8 MiB \| 6 MiB \| \|---------------------------------------------------------------------------\| ... ``` We haven't seen this issue before as the majority of our runners have sufficient host RAM and I just ran into it by chance. CC @atalman @malfet @crcrpar Pull Request resolved: https://github.com/pytorch/pytorch/pull/110963 Approved by: https://github.com/mikaylagawarecki, https://github.com/eqy, https://github.com/malfet	2023-10-25 23:45:47 +00:00
PyTorch MergeBot	5ce8002d24	Revert "Remove deprecated fbgemm operators (#104535 )" This reverts commit 57c7aa12dbf71617bd21fe7e076df8e823b5b7bb. Reverted https://github.com/pytorch/pytorch/pull/104535 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/104535#issuecomment-1779650412))	2023-10-25 16:34:16 +00:00
Oleg Bulatov	192477b5ba	Enable flake8-bugbear B020 lint (#110823 ) Fixes part of https://github.com/pytorch/pytorch/issues/106571 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110823 Approved by: https://github.com/Skylion007	2023-10-24 22:43:47 +00:00
FFFrog	0e0f6a248d	Fix num_batches_tracked of BatchNorm when load_state_dict (#110850 ) Fixes #110361 as the title shown Pull Request resolved: https://github.com/pytorch/pytorch/pull/110850 Approved by: https://github.com/mikaylagawarecki	2023-10-24 04:20:38 +00:00
Peter Bell	57c7aa12db	Remove deprecated fbgemm operators (#104535 ) These operators are not used and have been deprecated since #72690 (Feb 2022). Additionally, the `torch.jit.quantized` interface has been deprecated since #40102 (June 2020). Pull Request resolved: https://github.com/pytorch/pytorch/pull/104535 Approved by: https://github.com/ezyang	2023-10-22 06:10:09 +00:00
CaoE	54c28c564f	add Half support for BatchNorm on CPU (#102070 ) Fixes #106543 ### Testing Single core: shape \| fp32 forward / ms \| fp16 forward / ms \| bf16 forward / ms \| fp32 backward / ms \| fp16 backward / ms \| bf16 backward / ms -- \| -- \| -- \| -- \| -- \| -- \| -- (1, 4, 256, 256) \| 0.7116 \| 0.1427 \| 0.1744 \| 0.2638 \| 0.2002 \| 0.2556 (1, 32, 100, 100) \| 0.8579 \| 0.1725 \| 0.2077 \| 0.3023 \| 0.2399 \| 0.2995 (32, 16, 200, 200) \| 57.3466 \| 12.2179 \| 13.1320 \| 45.9524 \| 24.1526 \| 24.9882 28 cores: shape \| fp32 forward / ms \| fp16 forward / ms \| bf16 forward / ms \| fp32 backward / ms \| fp16 backward / ms \| bf16 backward / ms -- \| -- \| -- \| -- \| -- \| -- \| -- (1, 4, 256, 256) \| 0.2571 \| 0.0713 \| 0.0846 \| 0.1140 \| 0.0883 \| 0.1043 (1, 32, 100, 100) \| 0.1077 \| 0.0510 \| 0.0548 \| 0.0700 \| 0.0645 \| 0.0713 (32, 16, 200, 200) \| 5.5060 \| 1.4195 \| 1.4663 \| 6.773 \| 3.0886 \| 3.1343 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102070 Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki, https://github.com/mingfeima	2023-09-19 10:43:33 +00:00
lezcano	653c1564bf	Fix broadcasting cosine_similarity (#109363 ) Fixes https://github.com/pytorch/pytorch/issues/109333 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109363 Approved by: https://github.com/peterbell10	2023-09-15 17:12:35 +00:00
PyTorch MergeBot	b226373d16	Revert "add Half support for BatchNorm on CPU (#102070 )" This reverts commit b6a1d3fb97ca8eeccf15a4c495fdd1af4b197f88. Reverted https://github.com/pytorch/pytorch/pull/102070 on behalf of https://github.com/clee2000 due to I'm very sorry but it looks like #106543 was not fixed, I still see it failing on main `b6a1d3fb97` https://github.com/pytorch/pytorch/actions/runs/6185704949/job/16793975677 ([comment](https://github.com/pytorch/pytorch/pull/102070#issuecomment-1719747065))	2023-09-14 16:13:34 +00:00
CaoE	b6a1d3fb97	add Half support for BatchNorm on CPU (#102070 ) Fixes #106543 ### Testing Single core: shape \| fp32 forward / ms \| fp16 forward / ms \| bf16 forward / ms \| fp32 backward / ms \| fp16 backward / ms \| bf16 backward / ms -- \| -- \| -- \| -- \| -- \| -- \| -- (1, 4, 256, 256) \| 0.7116 \| 0.1427 \| 0.1744 \| 0.2638 \| 0.2002 \| 0.2556 (1, 32, 100, 100) \| 0.8579 \| 0.1725 \| 0.2077 \| 0.3023 \| 0.2399 \| 0.2995 (32, 16, 200, 200) \| 57.3466 \| 12.2179 \| 13.1320 \| 45.9524 \| 24.1526 \| 24.9882 28 cores: shape \| fp32 forward / ms \| fp16 forward / ms \| bf16 forward / ms \| fp32 backward / ms \| fp16 backward / ms \| bf16 backward / ms -- \| -- \| -- \| -- \| -- \| -- \| -- (1, 4, 256, 256) \| 0.2571 \| 0.0713 \| 0.0846 \| 0.1140 \| 0.0883 \| 0.1043 (1, 32, 100, 100) \| 0.1077 \| 0.0510 \| 0.0548 \| 0.0700 \| 0.0645 \| 0.0713 (32, 16, 200, 200) \| 5.5060 \| 1.4195 \| 1.4663 \| 6.773 \| 3.0886 \| 3.1343 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102070 Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki	2023-09-14 12:23:59 +00:00
PyTorch MergeBot	04a765f95d	Revert "add Half support for BatchNorm on CPU (#102070 )" This reverts commit 6065e7a97cfad4c2ae2b8722969648a53265fa13. Reverted https://github.com/pytorch/pytorch/pull/102070 on behalf of https://github.com/clee2000 due to sorry it looks like this is causing an unexpected success for `test_jit_fuser_te.py::TestNNCOpInfoCPU::test_nnc_correctness_nn_functional_batch_norm_cpu_float16` `6065e7a97c` https://github.com/pytorch/pytorch/actions/runs/6178069462/job/16770849782 ([comment](https://github.com/pytorch/pytorch/pull/102070#issuecomment-1718402208))	2023-09-13 22:38:42 +00:00
CaoE	6065e7a97c	add Half support for BatchNorm on CPU (#102070 ) Fixes #106543 ### Testing Single core: shape \| fp32 forward / ms \| fp16 forward / ms \| bf16 forward / ms \| fp32 backward / ms \| fp16 backward / ms \| bf16 backward / ms -- \| -- \| -- \| -- \| -- \| -- \| -- (1, 4, 256, 256) \| 0.7116 \| 0.1427 \| 0.1744 \| 0.2638 \| 0.2002 \| 0.2556 (1, 32, 100, 100) \| 0.8579 \| 0.1725 \| 0.2077 \| 0.3023 \| 0.2399 \| 0.2995 (32, 16, 200, 200) \| 57.3466 \| 12.2179 \| 13.1320 \| 45.9524 \| 24.1526 \| 24.9882 28 cores: shape \| fp32 forward / ms \| fp16 forward / ms \| bf16 forward / ms \| fp32 backward / ms \| fp16 backward / ms \| bf16 backward / ms -- \| -- \| -- \| -- \| -- \| -- \| -- (1, 4, 256, 256) \| 0.2571 \| 0.0713 \| 0.0846 \| 0.1140 \| 0.0883 \| 0.1043 (1, 32, 100, 100) \| 0.1077 \| 0.0510 \| 0.0548 \| 0.0700 \| 0.0645 \| 0.0713 (32, 16, 200, 200) \| 5.5060 \| 1.4195 \| 1.4663 \| 6.773 \| 3.0886 \| 3.1343 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102070 Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki	2023-09-13 17:30:16 +00:00
Kurt Mohler	3f88e3105f	Reland: Remove remaining global `set_default_dtype` calls from tests (#108088 ) Fixes #68972 Relands #107246 To avoid causing Meta-internal CI failures, this PR avoids always asserting that the default dtype is float in the `TestCase.setUp/tearDown` methods. Instead, the assert is only done if `TestCase._default_dtype_check_enabled == True`. `_default_dtype_check_enabled` is set to True in the `if __name__ == "__main__":` blocks of all the relevant test files that have required changes for this issue Pull Request resolved: https://github.com/pytorch/pytorch/pull/108088 Approved by: https://github.com/ezyang	2023-09-07 03:04:34 +00:00
CaoE	8f02884569	add Half support for GroupNorm on CPU (#100234 ) ### Testing Single socket (28cores): * Contiguous: shape \| forward / s\| forward / s\| backward / s\| backward / s -- \| -- \| -- \| -- \| -- \| fp32 \| mixed fp32 fp16 \| fp32 \| mixed fp32 fp16 [10, 128, 10, 10] \| 2.45E-05 \| 3.26E-05 \| 6.87E-05 \| 7.40E-05 [10, 128, 80, 80] \| 0.000726 \| 0.000606 \| 0.002183 \| 0.001112 * Channels Last: shape \| forward / s\| forward / s\| backward / s\| backward / s -- \| -- \| -- \| -- \| -- \| fp32 \| mixed fp32 fp16 \| fp32 \| mixed fp32 fp16 [10, 128, 10, 10] \| 2.88E-05 \| 2.72E-05 \| 6.56E-05 \| 6.63E-05 [10, 128, 80, 80] \| 0.00076 \| 0.000256 \| 0.002385 \| 0.000735 Single core: * Contiguous: shape \| forward / s\| forward / s\| backward / s\| backward / s -- \| -- \| -- \| -- \| -- \| fp32 \| mixed fp32 fp16 \| fp32 \| mixed fp32 fp16 [10, 128, 10, 10] \| 9.47E-05 \| 1.90E-04 \| 2.03E-04 \| 3.10E-04 [10, 128, 80, 80] \| 6.25E-03 \| 8.98E-03 \| 0.016485 \| 0.01369 * Channels Last: shape \| forward / s\| forward / s\| backward / s\| backward / s -- \| -- \| -- \| -- \| -- \| fp32 \| mixed fp32 fp16 \| fp32 \| mixed fp32 fp16 [10, 128, 10, 10] \| 8.66E-05 \| 7.89E-05 \| 1.95E-04 \| 1.43E-04 [10, 128, 80, 80] \| 5.97E-03 \| 3.13E-03 \| 0.01626 \| 8.70E-03 Pull Request resolved: https://github.com/pytorch/pytorch/pull/100234 Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki	2023-09-01 21:25:24 +00:00
Mikayla Gawarecki	3817de5d84	Fix layernorm cpu precision issues (#108089 ) #108072 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108089 Approved by: https://github.com/mingfeima, https://github.com/albanD	2023-08-30 23:55:10 +00:00
Xia, Weiwen	97a291f6bd	[ONEDNN][BC-breaking] update onednn from v2.7.3 to v3.1.1 (#97957 ) Summary Update onednn from v2.7.3 to v3.1.1. It is bc-breaking as some APIs are changed on oneDNN side. Changes include: - PyTorch code where oneDNN is directly called - Submodule `third_party/ideep` to adapt to oneDNN's new API. - CMAKE files to fix build issues. Test plan Building issues and correctness are covered by CI checks. For performance, we have run TorchBench models to ensure there is no regression. Below is the comparison before and after oneDNN update. ![image](https://github.com/pytorch/pytorch/assets/12522207/415a4ff0-7566-40c6-aed0-24997a475b0e) Note: - Base commit of PyTorch: da322ea - CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz (Ice Lake) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97957 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2023-08-25 12:13:18 +00:00
Aaron Gokaslan	660e8060ad	[BE]: Update ruff to 0.285 (#107519 ) This updates ruff to 0.285 which is faster, better, and have fixes a bunch of false negatives with regards to fstrings. I also enabled RUF017 which looks for accidental quadratic list summation. Luckily, seems like there are no instances of it in our codebase, so enabling it so that it stays like that. :) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519 Approved by: https://github.com/ezyang	2023-08-22 23:16:38 +00:00
PyTorch MergeBot	d59a6864fb	Revert "[BE]: Update ruff to 0.285 (#107519 )" This reverts commit 88ab3e43228b7440a33bf534cde493446a31538c. Reverted https://github.com/pytorch/pytorch/pull/107519 on behalf of https://github.com/ZainRizvi due to Sorry, but this PR breaks internal tests. @ezyang, can you please hep them get unblocked? It seems like one of the strings was prob accidentally modified ([comment](https://github.com/pytorch/pytorch/pull/107519#issuecomment-1688833480))	2023-08-22 19:53:32 +00:00
Aaron Gokaslan	88ab3e4322	[BE]: Update ruff to 0.285 (#107519 ) This updates ruff to 0.285 which is faster, better, and have fixes a bunch of false negatives with regards to fstrings. I also enabled RUF017 which looks for accidental quadratic list summation. Luckily, seems like there are no instances of it in our codebase, so enabling it so that it stays like that. :) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519 Approved by: https://github.com/ezyang	2023-08-20 01:36:18 +00:00
lcskrishna	bc662ffff9	[ROCm] Update ROCm skip decorators (#106138 ) This PR adds a msg argument for skipIfRocm and skipCUDAIfRocm. Pull Request resolved: https://github.com/pytorch/pytorch/pull/106138 Approved by: https://github.com/jataylo, https://github.com/jeffdaily, https://github.com/pruthvistony, https://github.com/albanD	2023-08-18 22:02:06 +00:00
Kurt Mohler	6af6b8f728	Reland: Remove `set_default_dtype` from nn tests (#107069 ) Part of #68972 Relands #105775 Pull Request resolved: https://github.com/pytorch/pytorch/pull/107069 Approved by: https://github.com/ezyang	2023-08-14 17:01:57 +00:00
PyTorch MergeBot	ec0f3fda7d	Revert "Remove `set_default_dtype` from nn tests (#105775 )" This reverts commit 4d6a891baf2224cfa81bfe7632cf08be50812216. Reverted https://github.com/pytorch/pytorch/pull/105775 on behalf of https://github.com/huydhn due to Sorry for reverting you change, it is failing one of the slow test in trunk ([comment](https://github.com/pytorch/pytorch/pull/105775#issuecomment-1675460195))	2023-08-11 22:14:17 +00:00
Kurt Mohler	4d6a891baf	Remove `set_default_dtype` from nn tests (#105775 ) Part of #68972 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105775 Approved by: https://github.com/ezyang	2023-08-10 14:56:13 +00:00
Jason Lu	bc88028e8e	Back out "Reland "Make adding buffers more like adding parameters (#104069 )" (#106224 )" (#106743 ) Summary: Original commit changeset: 81319beb97f3 Original Phabricator Diff: D47961182 Test Plan: revert to maintain backward compat with legacy ads_dper3 production package. Read details in: S357822 Reviewed By: atuljangra Differential Revision: D48131623 @diff-train-skip-merge (D48131623 landed internally) Pull Request resolved: https://github.com/pytorch/pytorch/pull/106743 Approved by: https://github.com/malfet	2023-08-08 15:27:34 +00:00
Michael Gschwind	63d45275f4	is causal hints for transformer (#106143 ) Summary: make is_causal hint flags available for the top level transformer module. It's debatable whether this is useful -- at present we autodetect causal masks for src and tgt masks in transformer encoder and decoder, respectively. is_causal flags available woul enable users to short-cut this check by asserting whether they mask is causal, or not. I am putting this diff up for discussion, not as a solution. Not doing anything may be the right solution, unless there is strong (data-driven) user demand. -- it appears the consensus is to move ahead with this, as per discussions below. @cpuhrsch @mikaylagawarecki @jbschlosser @janEbert Test Plan: sandcastle Differential Revision: D47373260 Pull Request resolved: https://github.com/pytorch/pytorch/pull/106143 Approved by: https://github.com/mikaylagawarecki	2023-08-04 14:16:48 +00:00
CaoE	f82e6ff29e	add channel last 3d support for batch_norm on CPU (#97774 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97774 Approved by: https://github.com/mingfeima, https://github.com/jgong5, https://github.com/mikaylagawarecki	2023-08-03 01:16:05 +00:00
Mikayla Gawarecki	c9be60cd0e	Add error inputs to ModuleInfo (mirroring OpInfo) (#106325 ) Add infra for error inputs to ModuleInfos, migrate first few error inputs tests from test_nn.py (more to come!) Pull Request resolved: https://github.com/pytorch/pytorch/pull/106325 Approved by: https://github.com/albanD	2023-08-01 12:49:56 +00:00
Mikayla Gawarecki	d8e5f2aa6d	Reland "Make adding buffers more like adding parameters (#104069 )" (#106224 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/106224 Approved by: https://github.com/atalman, https://github.com/albanD	2023-07-31 17:18:56 +00:00
Mikayla Gawarecki	ca7ece9b50	[easy] improve hint on error message in nn.Module.load_state_dict (#106042 ) Fix #105963 Pull Request resolved: https://github.com/pytorch/pytorch/pull/106042 Approved by: https://github.com/albanD	2023-07-27 19:56:02 +00:00
Nikita Karetnikov	eac9e1b35f	[OpInfo] add reference and error inputs for `multilabel_margin_loss` (#105523 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105523 Approved by: https://github.com/ezyang	2023-07-23 02:16:29 +00:00
Aaron Gokaslan	6d43c89f37	[BE]: Update Ruff to 0.0.280 (#105724 ) Removes unusued loop values in python dictionary iteration. Automated fix from Ruff master Pull Request resolved: https://github.com/pytorch/pytorch/pull/105724 Approved by: https://github.com/ezyang, https://github.com/janeyx99	2023-07-22 23:03:34 +00:00
Andrey Talman	c6653b65d8	Back out "Make adding buffers more like adding parameters (#104069 )" (#105581 ) Summary: D47537831 is breaking pyper tests: https://fb.workplace.com/groups/802176577445480/posts/1018902842439518/ with `TypeError: register_buffer() takes 3 positional arguments but 4 were given` Original commit changeset: d4b4069fbd38 Original Phabricator Diff: D47537831 Test Plan: ``` buck2 run //caffe2/torch/fb/training_toolkit/integration_tests/training_lifecycle/cogwheel_tests/pyper_release_v2:cogwheel_smallworld_inline_cvr_infer_pyper_pyper__canary_offline_training-launcher -- --run-harness-in-tupperware --build-fbpkg ads_dper3 --build-fbpkg training_platform ``` Reviewed By: atalman Differential Revision: D47600140 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105581 Approved by: https://github.com/mikaylagawarecki	2023-07-20 03:39:53 +00:00
Justin Chu	73e1455327	[BE] Enable ruff's UP rules and autoformat test/ (#105434 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105434 Approved by: https://github.com/albanD	2023-07-19 20:36:06 +00:00
Michael Gschwind	11b753af01	Refactor causal mask generation and detection for nn.transformer (#105265 ) Summary: * Create a private global-scope function _generate_subsequent because static class attribute member functions not supported by TorchScript resulting in torchscripting errors. * Make TransformerEncoder and TransformerDecoder consistent w.r.t. is_causal handling by calling _detect_casual_mask * Clarify documentation that is_causal is a hint * Move causal mask detection into a method _detect_causal_mask * only accept input-size compatible causal mask as causal mask * update _generate_subsequent_causal_mask to include factory kwargs for dtype and device: avoid extra copies & conversions by passing directly to torch.full. Test Plan: sandcastle & github CICD Continuation of #101487 (due to a tooling issue) which is a continuation-in-part of https://github.com/pytorch/pytorch/pull/98327 by @janEbert Differential Revision: D47427117 Pull Request resolved: https://github.com/pytorch/pytorch/pull/105265 Approved by: https://github.com/mikaylagawarecki	2023-07-19 01:26:50 +00:00

... 2 3 4 5 6 ...

1592 Commits