pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 12:54:11 +08:00

Author	SHA1	Message	Date
Catherine Lee	fc61aae70f	Remove color in CI (#133517 ) Remove color by default to make CI logs easier to read Example of color <img width="569" alt="image" src="https://github.com/user-attachments/assets/0da13544-98b1-47be-8383-64a5b3fd8951"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/133517 Approved by: https://github.com/ZainRizvi	2024-08-26 16:58:06 +00:00
PyTorch MergeBot	42955e04f1	Revert "[dynamo] Cache _dynamo.disable results (#134272 )" This reverts commit a699bd11551e9755bb9238c6b82c369880789397. Reverted https://github.com/pytorch/pytorch/pull/134272 on behalf of https://github.com/ZainRizvi due to Fails internal tests ([comment](https://github.com/pytorch/pytorch/pull/134272#issuecomment-2310649115))	2024-08-26 16:57:53 +00:00
PyTorch MergeBot	e94bdc7876	Revert "[dynamo][guards] De-dupe DUPLICATE_INPUT guard (#134354 )" This reverts commit cdb9df5efe78142b7a612ae9c938ddf8a8850d10. Reverted https://github.com/pytorch/pytorch/pull/134354 on behalf of https://github.com/ZainRizvi due to Fails internal tests ([comment](https://github.com/pytorch/pytorch/pull/134272#issuecomment-2310649115))	2024-08-26 16:57:53 +00:00
atalman	a6fac0e969	Use ephemeral runners for windows nightly builds (#134463 ) This is definition of windows.4xlarge: ``` windows.4xlarge: disk_size: 256 instance_type: c5d.4xlarge is_ephemeral: true max_available: 420 os: windows ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/134463 Approved by: https://github.com/jeanschmidt	2024-08-26 16:33:19 +00:00
Wang, Chuanqi	b417e32da2	[CD] fix xpu nightly wheel test env (#134395 ) (#134464 ) Since https://github.com/pytorch/builder/pull/1972 landed, the nightly wheel test sources the xpu env twice. Works for https://github.com/pytorch/pytorch/issues/114850 Reland of #134395 to be landed with pytorchmergebot Pull Request resolved: https://github.com/pytorch/pytorch/pull/134464 Approved by: https://github.com/jeanschmidt Co-authored-by: Wang, Chuanqi <chuanqi.wang@intel.com>	2024-08-26 15:35:48 +00:00
atalman	c507f402f1	Add linux arm64 ephemeral runners (#134469 ) Should be landed with: https://github.com/pytorch/test-infra/pull/5593 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134469 Approved by: https://github.com/jeanschmidt, https://github.com/clee2000	2024-08-26 15:32:45 +00:00
PyTorch MergeBot	17e8a51ff2	Revert "[inductor]Let output or input_as_strided match exact strides (#130956 )" This reverts commit a63efee5cd422db0aabe5d02d2fe35fef9be7978. Reverted https://github.com/pytorch/pytorch/pull/130956 on behalf of https://github.com/ZainRizvi due to sorry but this seems to cause internal tests to fail. Please see D61771533 for details ([comment](https://github.com/pytorch/pytorch/pull/130956#issuecomment-2310490049))	2024-08-26 15:31:23 +00:00
PyTorch MergeBot	1c4780e69a	Revert "c10d/logging: add C10D_LOCK_GUARD (#134131 )" This reverts commit 4c28a0eb0ba437c1b7db559f63f8bec17bd48f69. Reverted https://github.com/pytorch/pytorch/pull/134131 on behalf of https://github.com/ZainRizvi due to Sorry but this causes formatting errors internally which make it fail to build. See D61759282 ([comment](https://github.com/pytorch/pytorch/pull/134131#issuecomment-2310455878))	2024-08-26 15:19:27 +00:00
PyTorch MergeBot	50e90d7203	Revert "[dynamo] simplify implementation for `functools.reduce` (#133778 )" This reverts commit 6c0b15e3828b8e2a0bd726a3e5d4e98c8ced5efe. Reverted https://github.com/pytorch/pytorch/pull/133778 on behalf of https://github.com/ZainRizvi due to Sorry, but this breaks internal tests because of using functools ([comment](https://github.com/pytorch/pytorch/pull/133778#issuecomment-2310445169))	2024-08-26 15:16:17 +00:00
PyTorch MergeBot	472c7cf962	Revert "[dynamo] simplify implementation for `builtins.sum` (#133779 )" This reverts commit 8d90392fb02ce5e6854e6b4dbcdc4a7bbd55f8e2. Reverted https://github.com/pytorch/pytorch/pull/133779 on behalf of https://github.com/ZainRizvi due to Sorry, but this breaks internal tests because of using functools ([comment](https://github.com/pytorch/pytorch/pull/133778#issuecomment-2310445169))	2024-08-26 15:16:17 +00:00
PyTorch MergeBot	3d7f3f6a55	Revert "[dynamo][itertools] support `itertools.tee` (#133771 )" This reverts commit 0e49b2f18e78386c8ed9ce540a8017411c7ab0cd. Reverted https://github.com/pytorch/pytorch/pull/133771 on behalf of https://github.com/ZainRizvi due to Sorry, but this breaks internal tests because of using functools ([comment](https://github.com/pytorch/pytorch/pull/133778#issuecomment-2310445169))	2024-08-26 15:16:17 +00:00
PyTorch MergeBot	e1fc4362fb	Revert "[dynamo] simplify implementation for `os.fspath` (#133801 )" This reverts commit c5f6b72041144c00e240bcfdc783a5597c3d8928. Reverted https://github.com/pytorch/pytorch/pull/133801 on behalf of https://github.com/ZainRizvi due to Sorry, but this breaks internal tests because of using functools ([comment](https://github.com/pytorch/pytorch/pull/133778#issuecomment-2310445169))	2024-08-26 15:16:17 +00:00
Thanh Ha	bb67ff2ba7	Migrate Windows bin jobs to runner determinator (#134231 ) Update Windows binary workflows to use the runner determinator script. Closes: pytorch/ci-infra#262 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134231 Approved by: https://github.com/ZainRizvi	2024-08-26 14:56:00 +00:00
Benjamin Glass	27d97b9649	Remove unnecessary test skip (#134250 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/134250 Approved by: https://github.com/amjames, https://github.com/janeyx99	2024-08-26 14:34:53 +00:00
Andrey Talman	be96ccf77c	Revert "[CD] fix xpu nightly wheel test env (#134395 )" (#134461 ) This reverts commit 96738c9d756fbd64e6f2eba67f711d3e18f1630c. Merged without pytorchmergebot command by mistake Pull Request resolved: https://github.com/pytorch/pytorch/pull/134461 Approved by: https://github.com/jeanschmidt	2024-08-26 13:40:17 +00:00
Wang, Chuanqi	96738c9d75	[CD] fix xpu nightly wheel test env (#134395 )	2024-08-26 08:53:15 -04:00
haozhe.zhu	1ff226d88c	[inductor] support vec for atomic add (#131314 ) Depends on https://github.com/pytorch/pytorch/pull/130827 to have correct `index_expr` dtype Support vec for atomic add by scalar implementation. TestPlan: ``` python test/inductor/test_cpu_repro.py -k test_scatter_using_atomic_add_vec ``` Generated code for `test_scatter_using_atomic_add_vec` ``` cpp_fused_scatter_0 = async_compile.cpp_pybinding(['const float', 'const int64_t', 'const float', 'float'], ''' #include "/tmp/torchinductor_root/nn/cnnpkaxivwaa5rzng6qsyc4ao42vschogi3yk33ukwv3emlvxeqq.h" extern "C" void kernel(const float* in_ptr0, const int64_t* in_ptr1, const float* in_ptr2, float* out_ptr0) { { for(long x0=static_cast<long>(0L); x0<static_cast<long>(16L); x0+=static_cast<long>(16L)) { auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + static_cast<long>(x0), 16); tmp0.store(out_ptr0 + static_cast<long>(x0)); } #pragma omp simd simdlen(8) for(long x0=static_cast<long>(16L); x0<static_cast<long>(25L); x0+=static_cast<long>(1L)) { auto tmp0 = in_ptr0[static_cast<long>(x0)]; out_ptr0[static_cast<long>(x0)] = tmp0; } } { for(long x0=static_cast<long>(0L); x0<static_cast<long>(16L); x0+=static_cast<long>(16L)) { auto tmp0 = at::vec::VectorizedN<int64_t,2>::loadu(in_ptr1 + static_cast<long>(x0), 16); auto tmp12 = at::vec::Vectorized<float>::loadu(in_ptr2 + static_cast<long>(x0), 16); auto tmp1 = 25L; auto tmp2 = c10::convert<int64_t>(tmp1); auto tmp3 = at::vec::VectorizedN<int64_t,2>(tmp2); auto tmp4 = tmp0 + tmp3; auto tmp5 = static_cast<int64_t>(0); auto tmp6 = at::vec::VectorizedN<int64_t,2>(tmp5); auto tmp7 = at::vec::VecMask<int64_t,2>(tmp0 < tmp6); auto tmp8 = decltype(tmp4)::blendv(tmp0, tmp4, tmp7.template cast<int64_t,2>()); auto tmp9 = [&] { __at_align__ std::array<int64_t, 16> tmpbuf; tmp8.store(tmpbuf.data()); return tmpbuf; } () ; auto tmp10 = [&] { __at_align__ std::array<int64_t, 16> tmpbuf; #pragma GCC unroll 16 for (long x0_inner = 0; x0_inner < 16; x0_inner++) { 
tmpbuf[x0_inner] = static_cast<long>(tmp9[x0_inner]); } return at::vec::VectorizedN<int64_t,2>::loadu(tmpbuf.data(), 16); } () ; TORCH_CHECK((at::vec::VecMask<int64_t,2>((at::vec::VectorizedN<int64_t,2>(0) <= tmp10) & (tmp10 < at::vec::VectorizedN<int64_t,2>(25L)))).all_masked(), "index out of bounds: 0 <= tmp10 < 25L"); atomic_add_vec(out_ptr0, tmp8, tmp12); } #pragma omp simd simdlen(8) for(long x0=static_cast<long>(16L); x0<static_cast<long>(20L); x0+=static_cast<long>(1L)) { auto tmp0 = in_ptr1[static_cast<long>(x0)]; auto tmp9 = in_ptr2[static_cast<long>(x0)]; auto tmp1 = 25L; auto tmp2 = c10::convert<int64_t>(tmp1); auto tmp3 = decltype(tmp0)(tmp0 + tmp2); auto tmp4 = tmp0 < 0; auto tmp5 = tmp4 ? tmp3 : tmp0; auto tmp6 = tmp5; auto tmp7 = c10::convert<int64_t>(tmp6); TORCH_CHECK((0 <= tmp7) & (tmp7 < 25L), "index out of bounds: 0 <= tmp7 < 25L"); atomic_add(&out_ptr0[static_cast<long>(tmp5)], static_cast<float>(tmp9)); } } } ''') ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/131314 Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel	2024-08-26 10:36:51 +00:00
fduwjj	bf5c7bf06d	[FR] Fix the bug in FR script (e.g., checking all ranks dump check) (#134383 ) The ranks were somehow being converted to strings, which made the ranks check fail. This fix converts them all to int. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134383 Approved by: https://github.com/c-p-i-o	2024-08-26 08:21:14 +00:00
Avik Chaudhuri	92c4771853	fix stuck floordiv (#134150 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/134133 Test Plan: Tested on the small repro in the linked issue with different lengths N (replacing 100), recording N vs. time taken in nanoseconds: 10 127268319 20 220839662 30 325463125 40 429259441 50 553136055 60 670799769 70 999170514 80 899014103 90 997168902 100 1168202035 110 1388556619 120 1457488235 130 1609816470 140 2177889877 150 1917560313 160 2121096113 170 2428502334 180 4117450755 190 4003068224 So N ~ 200 takes ~5s. Previously even smaller N would go for >1 min. Didn't add a perf test because ezyang is planning to build a benchmark. Also tested on https://www.internalfb.com/diff/D61560171, which now gets past the stuck point. Differential Revision: D61619660 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134150 Approved by: https://github.com/ezyang	2024-08-26 07:27:59 +00:00
Xuehai Pan	c5f6b72041	[dynamo] simplify implementation for `os.fspath` (#133801 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133801 Approved by: https://github.com/anijain2305 ghstack dependencies: #133769, #133778, #133779, #133771	2024-08-26 07:12:15 +00:00
Amadeusz Skrzypczak	38f97ec8e3	[pt2] Add meta for poisson (#134103 ) Because aten.poisson doesn't have a meta function registered, there is one additional eager execution of this op during the compilation phase of torch.compile. There are more ops without meta registration. Is there any reason for it? Pull Request resolved: https://github.com/pytorch/pytorch/pull/134103 Approved by: https://github.com/ezyang	2024-08-26 06:14:38 +00:00
Aaron Orenstein	ed86ac2f25	[BE] typing for decorators - fx/_compatibility (#134054 ) Summary: See #131429 Test Plan: unit tests pass Differential Revision: D61493706 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134054 Approved by: https://github.com/oulgen	2024-08-26 04:00:27 +00:00
Laith Sakka	7b6b10417d	Remove ansi escape chars in assertExpectedInline and add options to skip comments and to skip empty lines (#134248 ) I had a nightmare rewriting tests in test_misc.py, specifically: 1. Graphs can have comments that refer to my files ("/lsakka/.."); we really don't care about comments, so add an option to ignore comments. 2. Empty lines added when EXPECTTEST_ACCEPT=1 are changed by the linter, causing tests or the linter to fail! Add a flag to ignore empty lines. 3. EXPECTTEST_ACCEPT fails when the text has some unreadable characters. Those should not affect string comparison, and they also cause weird diff comments when tests fail. I removed ansi_escape chars https://github.com/pytorch/pytorch/pull/133045 this is used in Pull Request resolved: https://github.com/pytorch/pytorch/pull/134248 Approved by: https://github.com/aorenste ghstack dependencies: #133639, #134364	2024-08-26 02:03:44 +00:00
Xu Han	2ec149cd3e	[inductor] fix test_functional_call_sequential_params_and_buffers expectation on Windows (#134394 ) The actual output of this UT differs between Windows and Linux by only one empty line (between `linear` and `add`); the content is otherwise correct. Reproduce UTs: ```cmd pytest test\dynamo\test_higher_order_ops.py -v -k test_functional_call_sequential_params_and_buffers ``` We can add `empty_line_normalizer` to fix it. ```cmd ______________________________________________________________________________________________ FuncTorchHigherOrderOpTests.test_functional_call_sequential_params_and_buffers _______________________________________________________________________________________________ Traceback (most recent call last): File "D:\xu_git\dnnl_cb\pytorch\test\dynamo\test_higher_order_ops.py", line 3676, in test_functional_call_sequential_params_and_buffers self.assertExpectedInline( File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\testing\_internal\common_utils.py", line 2871, in assertExpectedInline return super().assertExpectedInline(actual if isinstance(actual, str) else str(actual), expect, skip + 1) File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\expecttest\__init__.py", line 271, in assertExpectedInline self.assertMultiLineEqualMaybeCppStack(expect, actual, msg=help_text) File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\expecttest\__init__.py", line 292, in assertMultiLineEqualMaybeCppStack self.assertMultiLineEqual(expect, actual, *args, **kwargs) File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\unittest\case.py", line 1226, in assertMultiLineEqual self.fail(self._formatMessage(msg, standardMsg)) File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\unittest\case.py", line 675, in fail raise self.failureException(msg) AssertionError: 'clas[509 chars]one\n add: "f32[1, 1]" = linear + l_buf[69 chars],)\n' != 'clas[509 chars]one\n\n add: "f32[1, 1]" = linear + l_b[71 chars],)\n' class 
GraphModule(torch.nn.Module): def forward(self, L_params_l1_weight_: "f32[1, 1]", L_params_l1_bias_: "f32[1]", L_buffers_buffer_: "f32[1]", L_inputs_: "f32[1, 1]"): l_params_l1_weight_ = L_params_l1_weight_ l_params_l1_bias_ = L_params_l1_bias_ l_buffers_buffer_ = L_buffers_buffer_ l_inputs_ = L_inputs_ linear: "f32[1, 1]" = torch._C._nn.linear(l_inputs_, l_params_l1_weight_, l_params_l1_bias_); l_inputs_ = l_params_l1_weight_ = l_params_l1_bias_ = None + <<<< (difference is here ) add: "f32[1, 1]" = linear + l_buffers_buffer_; linear = l_buffers_buffer_ = None return (add,) : To accept the new output, re-run test with envvar EXPECTTEST_ACCEPT=1 (we recommend staging/committing your changes before doing this) To execute this test, run the following from the base repo dir: python test\dynamo\test_higher_order_ops.py FuncTorchHigherOrderOpTests.test_functional_call_sequential_params_and_buffers This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 ========================================================================================================================== short test summary info ========================================================================================================================== FAILED [0.4275s] test/dynamo/test_higher_order_ops.py::FuncTorchHigherOrderOpTests::test_functional_call_sequential_params_and_buffers - AssertionError: 'clas[509 chars]one\n add: "f32[1, 1]" = linear + l_buf[69 chars],)\n' != 'clas[509 chars]one\n\n add: "f32[1, 1]" = linear + l_b[71 chars],)\n' ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/134394 Approved by: https://github.com/jansel Co-authored-by: Jason Ansel <jansel@jansel.net>	2024-08-26 01:41:20 +00:00
Tianyi Tao	7af38eb98b	Fix unexpected inference_mode interaction with torch.autograd.functional.jacobian (#130307 ) Fixes #128264 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130307 Approved by: https://github.com/soulitzer	2024-08-25 22:14:02 +00:00
Xu Han	dc1959e6a7	[inductor] calibration inductor windows uts (7/N) (#134420 ) Disable UTs on Windows: `test/dynamo/test_misc.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/134420 Approved by: https://github.com/jansel	2024-08-25 20:39:54 +00:00
Xu Han	97fd087cdb	[inductor] calibration inductor windows uts (6/N) (#134419 ) Disable UTs for Windows: `test/dynamo/test_aot_autograd_cache.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/134419 Approved by: https://github.com/jansel	2024-08-25 20:39:34 +00:00
Richard Barnes	b5dd60fa75	Fix namespace issues with qnnpack (#134336 ) After this I think all `using namespace` will have been eliminated from PyTorch header files. Internally, `-Wheader-hygiene` will prevent more from being added. Test Plan: Sandcastle Differential Revision: D61679037 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134336 Approved by: https://github.com/Skylion007	2024-08-25 19:50:01 +00:00
Igor Sugak	7940f2428f	[torch/package_importer] add compatibility name mapping (#134376 ) Summary: This enables patching extern modules to provide compatibility with serialized code depending on different versions of those extern modules. The main motivation is to enable the Numpy upgrade. In a recent release many aliases of builtin types were deprecated and removed [1]. This breaks loading pickled modules that reference the removed aliases. While the proper solution is to re-generate pickled modules, it's not always feasible. This proposes a way to define a mapping with a new type for a module member. It is only set if it's not present in the loaded module, which removes the need to check for exact versions. https://numpy.org/doc/stable/release/1.20.0-notes.html#using-the-aliases-of-builtin-types-like-np-int-is-deprecated Differential Revision: D61556888 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134376 Approved by: https://github.com/SherlockNoMad	2024-08-25 19:34:46 +00:00
Shivam Raikundalia	816061843a	[Distributed/Profiler] Fix input/output dimension overflow (#134360 ) Summary: When using ParamCommsDebugInfo, the input elements and output elements are stored in `int` instead of `int64_t` Test Plan: Run HTA with new outputted values and make sure overflow does not occur Reviewed By: fengxizhou Differential Revision: D61728747 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134360 Approved by: https://github.com/fengxizhou, https://github.com/jeanschmidt	2024-08-25 16:25:56 +00:00
eqy	e93ca12c88	[CUDNN][SDPA] Fix unsupported trivial stride-1 transpose case (#134031 ) Fixes #134001 Incorrect assumption that two same-shape tensors being contiguous meant that they would have the same stride Pull Request resolved: https://github.com/pytorch/pytorch/pull/134031 Approved by: https://github.com/drisspg, https://github.com/Skylion007 Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>	2024-08-25 14:31:30 +00:00
Chirag Pandya	08d111250a	[ez][c10d] change ERROR to WARNING (#134349 ) Summary: Change error to warning because TCPStore can be torn down during a normal shutdown. It's OK if we're unable to access TCPStore. Should not be an error. Test Plan: Ran locally Pull Request resolved: https://github.com/pytorch/pytorch/pull/134349 Approved by: https://github.com/fduwjj, https://github.com/wconstab	2024-08-25 14:22:55 +00:00
PyTorch MergeBot	4648848696	Revert "[ROCm] remove triton-rocm commit pin and merge pins with triton.txt (#133438 )" This reverts commit f71c3d265ab52589f983dd252d61461db4e7dbbd. Reverted https://github.com/pytorch/pytorch/pull/133438 on behalf of https://github.com/jeanschmidt due to seems to have introduced breakages in linux binary builds ([comment](https://github.com/pytorch/pytorch/pull/133438#issuecomment-2308787310))	2024-08-25 11:20:30 +00:00
PyTorch MergeBot	e5563f7ad7	Revert "[dtensor][MTPG] make sharding prop lru cache not shared among threads (#134294 )" This reverts commit eb15b1a016c6facaf8605dde2c20b5de1586542d. Reverted https://github.com/pytorch/pytorch/pull/134294 on behalf of https://github.com/jeanschmidt due to seems to have introduced https://github.com/pytorch/pytorch/actions/runs/10537099590/job/29201744658 ([comment](https://github.com/pytorch/pytorch/pull/134294#issuecomment-2308785949))	2024-08-25 11:16:04 +00:00
wz337	268092db83	[DeviceMesh] Allow _flatten() to take in an optional mesh_dim_name (#134048 ) If a mesh_dim_name is given, we will use the given mesh_dim_name to name the new flattened dim. Otherwise, the default is a string concatenating the mesh_dim_names of the given submesh with each mesh_dim_name separated by "_". For example, if we have a 3D mesh DeviceMesh([[[0, 1], [2, 3]], [[4, 5], [6, 7]]], mesh_dim_names=("dp", "cp", "tp")), calling mesh_3d["dp", "cp"]._flatten() will create a 1D submesh DeviceMesh([0, 1, 2, 3], mesh_dim_names=("dp_cp",)) on rank 0, 1, 2, 3 and a 1D submesh DeviceMesh([4, 5, 6, 7], mesh_dim_names=("dp_cp",)) on rank 4, 5, 6, 7. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134048 Approved by: https://github.com/fegin ghstack dependencies: #133838, #133839	2024-08-25 10:36:01 +00:00
Edward Z. Yang	326db8af4c	Replace sympy Min/Max with reimplementations (#133319 ) Sympy's implementation of Min/Max displays asymptotically bad behavior on `TORCH_COMPILE_CPROFILE=1 python torchrec/distributed/tests/test_pt2_multiprocess.py TestPt2Train.test_compile_multiprocess`. Evidence profile: ![image](https://github.com/user-attachments/assets/142301e9-3a18-4370-b9db-19b32ece7ee8) On this test case, we spend 42% of all time compiling the network on ShapeEnv.replace, which in turn spends all of its time in xreplace. The problem appears to be the find_localzeros call. By vendoring the implementations of Min/Max, we can potentially reduce the cost of this operation. The implementation is copy-pasted from sympy/functions/elementary/miscellaneous.py but with some adjustments: * I deleted logic related to differentiation, evalf and heaviside, as it's not relevant to PyTorch reasoning * There's some massaging to appease PyTorch's linters, including a lot of noqa and type: ignore (which I could potentially refactor away with substantive changes, but that's better as its own change) * I deleted the second loop iteration for is_connected, as an attempt at initial optimization (this also simplifies the port, since I can omit some code). I'll comment at that point what the exact difference is. Before this change, the test in question takes 100s with 40 features; after this change, it takes only 69s. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/133319 Approved by: https://github.com/Skylion007	2024-08-25 05:05:59 +00:00
Avik Chaudhuri	8db8ac700d	line by line logging (#134298 ) Summary: Today there is no good mechanism to detect progress of non-strict export line-by-line in user code. This caused some pain recently in trying to find the exact line of user code that was triggering a bug where the process appeared stuck because deep down something was calling some symbolic shapes code that was suffering some exponential blowup. This PR adds an environment variable for extended debugging that will log the line of user code corresponding to every torch function call. It only works in non-strict export for now. Prefix the command with this environment variable set and with `TORCH_LOGS` enabled for `export` logs at `DEBUG` level (i.e., with a `+` prefix), e.g.: ``` TORCHEXPORT_EXTENDED_DEBUG_CURRENT_LOC=1 TORCH_LOGS="+export" ... ``` This will show logs with something like: ``` ... prim::device called at .../example.py:4284 in foo TensorBase.item called at .../example.py:4277 in bar ... ``` We already have an existing place to intercept torch functions where we process data-dependent errors in non-strict, so parking the logging there. An alternative place we could be doing this is where we add `stack_trace` metadata when generating code, but unfortunately at least the example that motivated this gets stuck before generating code, so that would be too late. Test Plan: ran it on some sample commands Differential Revision: D61692156 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134298 Approved by: https://github.com/angelayi	2024-08-25 02:57:11 +00:00
Xu Han	907c32faac	[inductor] calibration inductor windows uts (4/N) (#134401 ) skip failed UTs of `test/dynamo/test_unspec.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/134401 Approved by: https://github.com/ezyang	2024-08-25 00:32:29 +00:00
Xu Han	74ef74be36	[inductor] calibration inductor windows uts (3/N) (#134400 ) skip Windows UT of `test/dynamo/test_trace_rules.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/134400 Approved by: https://github.com/ezyang	2024-08-24 23:48:50 +00:00
Shivam Raikundalia	d33d68e326	[Profiler] Add test to make sure FunctionEvents are processed lazily (#134359 ) Summary: Create simple test that checks that FunctionEvent build tree happens lazily by checking that the metrics for it changes before and after call. Test Plan: Make sure test passes in CI Reviewed By: briancoutinho Differential Revision: D61685429 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134359 Approved by: https://github.com/briancoutinho	2024-08-24 23:03:19 +00:00
Xu Han	af4c87953e	[inductor] calibration inductor windows uts (5/N) (#134402 ) skip UTs of `test/dynamo/test_repros.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/134402 Approved by: https://github.com/ezyang	2024-08-24 23:00:11 +00:00
Bob Ren	94f92fbd88	Use integer division in arange length calculation when start/end/step are integral (#134296 ) Fixes #133338 Test Plan: ``` TORCH_LOGS=dynamic python import torch torch._dynamo.config.capture_scalar_outputs = True @torch.compile() def f(x): y = x.item() torch._check_is_size(y) r = torch.arange(y, dtype=torch.float32) torch._check(r.size(0) == y) return r f(torch.tensor([300])) ``` Before and after diff. Verify the following line ``` I0813 11:05:44.890000 652898 torch/fx/experimental/symbolic_shapes.py:5198] [0/0] runtime_assert Eq(CeilToInt(IntTrueDiv(u0, 1)), u0) [guard added] at aa.py:10 in f (_dynamo/utils.py:2092 in run_node), for more info run with TORCHDYNAMO_EXTENDED_DEBUG_GUARD_ADDED="Eq(CeilToInt(IntTrueDiv(u0, 1)), u0)" ``` no longer shows in the logs. Also verify CI passes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134296 Approved by: https://github.com/aorenste	2024-08-24 21:09:28 +00:00
Aart Bik	1a0d00f1f4	[traced-graph][sparse] enable to_dense() for compressed (#133371 ) Fixes https://github.com/pytorch/pytorch/issues/133174 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133371 Approved by: https://github.com/ezyang	2024-08-24 20:33:23 +00:00
Aart Bik	050aa67e41	[traced-graph][sparse] fix restrictive assert for sparse add (#134037 ) Exporting sparse addition can be CPU/Meta. This fixes the overly restrictive assert and adds an exporting test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134037 Approved by: https://github.com/ezyang	2024-08-24 20:26:47 +00:00
Xu Han	90fb83749e	[inductor] fix test torch package working with trace on windows (#134397 ) The temporary directory path was hard-coded. Fixed by getting the temporary directory path via an API. Reproduce UTs: ```cmd python test/dynamo/test_dynamic_shapes.py -v -k test_torch_package_working_with_trace_dynamic_shapes ``` Error message: ```cmd ________________________________________________________________________________________________ DynamicShapesMiscTests.test_torch_package_working_with_trace_dynamic_shapes ________________________________________________________________________________________________ Traceback (most recent call last): File "D:\xu_git\dnnl_cb\pytorch\test\dynamo\test_misc.py", line 7199, in test_torch_package_working_with_trace with package.PackageExporter(path) as exp: File "C:\Users\Xuhan\.conda\envs\win_mkl_static\lib\site-packages\torch\package\package_exporter.py", line 237, in __init__ self.zip_file = torch._C.PyTorchFileWriter(f) RuntimeError: Parent directory /tmp does not exist. To execute this test, run the following from the base repo dir: python test\dynamo\test_dynamic_shapes.py DynamicShapesMiscTests.test_torch_package_working_with_trace_dynamic_shapes This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 ========================================================================================================================== short test summary info ========================================================================================================================== FAILED [0.0080s] test/dynamo/test_dynamic_shapes.py::DynamicShapesMiscTests::test_torch_package_working_with_trace_dynamic_shapes - RuntimeError: Parent directory /tmp does not exist. 
==================================================================================================================== 1 failed, 1665 deselected in 4.00s ===================================================================================================================== ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/134397 Approved by: https://github.com/ezyang	2024-08-24 20:25:44 +00:00
Jonathan Deakin	9cd53b3212	Add Arm copyright line to LICENSE (#133982 ) Some historical commits from arm: - 2021 664126bab5f3f2a275e82b7bde127132cff7f34e - 2023 2630144786e906b40abbe017294d404bcfe3c6ae - 2024 ce6130014156fa9555ce3d16c5f9a84cbdadf8f4 See https://github.com/pytorch/pytorch/pull/126687 for initial discussion. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133982 Approved by: https://github.com/malfet	2024-08-24 18:41:06 +00:00
Jonathan Deakin	50d5aa8c10	Enable optimized dynamic quantization on aarch64 (#126687 ) oneDNN+ACL has optimized kernels for s8s8 matmul, so input is signed. This change leaves behaviour on all other platforms the same. This change requires https://github.com/intel/ideep/pull/313 to go in, and oneDNN 3.5 for the optimized kernels. This change speeds up dynamic quantized linear by ~10x. Also, do you have a policy on copyright headers? Arm's usual policy when contributing to open source projects is to include a copyright header on any file which is modified. Would this be acceptable? If not, is there somewhere else suitable to note copyright? Pull Request resolved: https://github.com/pytorch/pytorch/pull/126687 Approved by: https://github.com/jgong5, https://github.com/malfet, https://github.com/snadampal Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2024-08-24 18:40:12 +00:00
Jack Taylor	f71c3d265a	[ROCm] remove triton-rocm commit pin and merge pins with triton.txt (#133438 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133438 Approved by: https://github.com/jithunnair-amd, https://github.com/malfet	2024-08-24 18:26:49 +00:00
chuanqiw	6245d5b87b	[CI] Update XPU ci test python version to 3.9 (#134214 ) Works for https://github.com/pytorch/pytorch/issues/114850 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134214 Approved by: https://github.com/EikanWang, https://github.com/malfet	2024-08-24 18:11:36 +00:00
Yueming Hao	a63efee5cd	[inductor]Let output or input_as_strided match exact strides (#130956 ) Fixes #130394 TorchInductor doesn't respect the original strides of outputs. This opens up optimization opportunities like changing the memory layout. But in some cases, such as the one in https://github.com/pytorch/pytorch/issues/130394, we do need the output to match the exact strides required. Correctness is the first priority. So, this PR adds a new API `ir.ExternKernel.require_exact_strides(x, exact_strides, allow_padding=False)` to fix the issue. This PR makes non-dense outputs' strides follow the strides required by semantics. The comparison between the original and the fixed generated code for the test is below. ```python @triton.jit def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): xnumel = 128 xoffset = tl.program_id(0) * XBLOCK xindex = xoffset + tl.arange(0, XBLOCK)[:] xmask = xindex < xnumel x0 = xindex % 8 x1 = (xindex // 8) - x2 = xindex tmp0 = tl.load(in_ptr0 + (x0 + (16*x1)), xmask) tmp1 = tmp0 + tmp0 - tl.store(out_ptr0 + (x2), tmp1, xmask) + tl.store(out_ptr0 + (x0 + (16*x1)), tmp1, xmask) def call(args): arg0_1, = args args.clear() assert_size_stride(arg0_1, (16, 8), (16, 1)) with torch.cuda._DeviceGuard(0): torch.cuda.set_device(0) - buf1 = empty_strided_cuda((16, 8), (8, 1), torch.float32) + buf1 = empty_strided_cuda((16, 8), (16, 1), torch.float32) stream0 = get_raw_stream(0) triton_poi_fused_add_copy_0.run(arg0_1, buf1, 128, grid=grid(128), stream=stream0) del arg0_1 return (buf1, ) ``` The buf1 is created with the exact stride required by users, and its values are written with the same stride as the input. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130956 Approved by: https://github.com/eellison, https://github.com/blaine-rister	2024-08-24 17:04:05 +00:00
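
Note on 94f92fbd88 (integer division in the arange length calculation): the idea can be illustrated in plain Python without PyTorch. This is only a sketch under stated assumptions, not the code from that PR; the helper names `arange_length_float` and `arange_length_int` are hypothetical. The first mirrors the ceil-of-true-division form that shows up in the log as `CeilToInt(IntTrueDiv(u0, 1))`; the second computes the same length with integer arithmetic only.

```python
import math


def arange_length_float(start: int, end: int, step: int) -> int:
    """Length via true (float) division followed by ceil."""
    return max(0, math.ceil((end - start) / step))


def arange_length_int(start: int, end: int, step: int) -> int:
    """Length via integer-only ceiling division (hypothetical helper)."""
    if step == 0:
        raise ValueError("step must be nonzero")
    span = end - start
    # For any nonzero integer step, ceil(span / step) == -((-span) // step),
    # because Python's // floors toward negative infinity.
    return max(0, -((-span) // step))


if __name__ == "__main__":
    # Cross-check against Python's own range(), which uses the same
    # half-open [start, end) convention for integer arguments.
    for start, end, step in [(0, 300, 1), (0, 10, 3), (10, 0, -3), (0, 10, -3)]:
        assert arange_length_int(start, end, step) == len(range(start, end, step))
        assert arange_length_float(start, end, step) == len(range(start, end, step))
    print(arange_length_int(0, 300, 1))
```

With integer inputs the integer form never goes through a float, so for a step of 1 the length is simply `end - start`, which is consistent with the test plan's expectation that the `Eq(CeilToInt(IntTrueDiv(u0, 1)), u0)` runtime assert no longer appears.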


77620 Commits