pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
Eli Uriegas	b2eb0e8c6a	docker: Use miniforge, install from pip (#134274 ) Switch installation of the pytorch package to be installed from our download.pytorch.org sources which are better maintained. As well, switching over the miniconda installation to a miniforge installation in order to ensure backwards compat for users expecting to have the conda package manager installed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134274 Approved by: https://github.com/malfet, https://github.com/atalman Co-authored-by: atalman <atalman@fb.com>	2024-08-22 23:20:22 +00:00
Stonepia	30d7e7a1cd	[XPU] Fix patch for old llvm package error for triton xpu (#134204 ) Fixes #134199 The PR #133694 works around the issue by replacing the string `"https://tritonlang.blob.core.windows.net/llvm-builds/"` with `"https://oaitriton.blob.core.windows.net/public/llvm-builds/"` in `triton/python/setup.py`. However, in a [newer version of Triton](`06e6799f4e`), the string has already been changed to `"https://oaitriton.blob.core....` and does not need to be replaced. Previously, this case threw a runtime error. This PR makes the `check_and_replace` logic not fail in such a scenario, so both the old link and the newer link work. Also note that `.ci/docker/common/install_triton.sh` does not need the fix, because its `sed` command has no effect if the pattern is absent. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134204 Approved by: https://github.com/chuanqi129, https://github.com/EikanWang, https://github.com/atalman	2024-08-22 23:18:44 +00:00
drisspg	629bd6f718	Update FlexAttention with masking semantic (#133373 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133373 Approved by: https://github.com/yanboliang	2024-08-22 22:50:33 +00:00
fduwjj	e7929809f3	[c10d][ez] Add comments to CudaEventCache class (#134172 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/134172 Approved by: https://github.com/d4l3k, https://github.com/kwen2501	2024-08-22 22:44:12 +00:00
Justin Chu	b319fa3fd9	[ONNX] Opt into ruff fmt (#134120 ) Add ONNX directory to use ruff format. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134120 Approved by: https://github.com/XuehaiPan, https://github.com/Skylion007	2024-08-22 22:44:03 +00:00
Dan Johnson	25499de814	Remove ncclIdToCommMap_. (#133961 ) There is no purpose for this map structure, and it is incorrect in some cases. For example, when the uniqueID is not broadcasted to the other processes. @exported-using-ghexport Differential Revision: [D60966882](https://our.internmc.facebook.com/intern/diff/D60966882/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133961 Approved by: https://github.com/shuqiangzhang ghstack dependencies: #133960	2024-08-22 22:06:25 +00:00
Shangdi Yu	b0cf287b46	[export][training ir migration] Fix getitem not exist (#134259 ) Summary: Make quantization tests compatible with the new training IR. With the new batch norm node `torch.ops.aten.batch_norm.default`, we don't need an additional getitem node after the bn node, so tests need to be fixed to not check for the getitem node. We added a capture_pre_autograd_graph_using_training_ir() function, which returns True when we are using the training ir, and False otherwise. This way, the code supports both training ir and the old ir. For now, we are just rolling out the training ir for fbcode internal tests. Test Plan: ``` buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_preserve_source_fn_stack buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_update_shared_qspec buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_conv2d buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_relu_fusion buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_fusion buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_conv_bn_fusion_literal_args ``` Reviewed By: andrewor14, tugsbayasgalan Differential Revision: D61292102 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134259 Approved by: https://github.com/tugsbayasgalan	2024-08-22 22:00:14 +00:00
Bin Bao	f0ba309d78	[CI][dashboard] Add jemalloc back for aarch64 (#134189 ) Forward fix based on https://github.com/pytorch/pytorch/pull/133997#discussion_r1726004220 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134189 Approved by: https://github.com/malfet, https://github.com/huydhn	2024-08-22 21:08:39 +00:00
Dan Johnson	1b6bbaa016	Remove PMI dependencies in PyTorch (#133960 ) This patch makes two changes: 1. Whenever ncclCommSplit accepts groupRanks in its config, we should populate it. This is independent of using PMI or not. For example, non-PMI NCCL can also use this information, if it chooses to. 2. Provide a user flag to decide when to do a uniqueId broadcast and when to skip it. This is a performance optimization, and not a correctness requirement. If the user forgets to set this, we will do the uniqueId broadcast, which is wasteful (because it will be ignored by NCCL), but not incorrect. @exported-using-ghexport Differential Revision: [D60966774](https://our.internmc.facebook.com/intern/diff/D60966774/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133960 Approved by: https://github.com/shuqiangzhang	2024-08-22 20:34:43 +00:00
Yanbo Liang	ff61f55387	[Dynamo][autograd.Function] Supports ctx.set_materialize_grads (#133978 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133978 Approved by: https://github.com/zou3519	2024-08-22 20:06:17 +00:00
Zain Rizvi	5633773188	Convert various jobs to be Linux Foundation fleet compatible (#134128 ) Migrates a batch of workflows over to LF Pull Request resolved: https://github.com/pytorch/pytorch/pull/134128 Approved by: https://github.com/zxiiro, https://github.com/jeanschmidt	2024-08-22 19:23:07 +00:00
Jeff Daily	0eb9c870fd	[reland][ROCm] TunableOp for gemm_and_bias (#128919 ) Reland of #128143 but added `alpha` and `bias` initialization to `launchTunableGemmAndBias` Thus far TunableOp was implemented for gemm, bgemm, and scaled_mm. gemm_and_bias was notably missing. This PR closes that gap. Pull Request resolved: https://github.com/pytorch/pytorch/pull/128919 Approved by: https://github.com/malfet	2024-08-22 18:27:50 +00:00
Shangdi Yu	978c5a80a0	[export][training ir migration] fix batch norm pattern match in quantization (#134157 ) Summary: In the new training ir, we produce `torch.ops.aten.batch_norm.default` instead of `torch.ops.aten._native_batch_norm_legit.default` or `torch.ops.aten._native_batch_norm_legit_no_training.default`. So we need to change the pattern match to accommodate the new op. - Add `torch.ops.aten.batch_norm.default` to pattern matcher list so it's identified as a batch norm node - `torch.ops.aten.batch_norm.default` doesn't have a getitem user anymore, so when removing the bn node, we need to do `bn_node.replace_all_uses_with(conv_node)` instead of `getitem_node.replace_all_uses_with(conv_node)` The behavior of capture_pre_autograd_graph is consistent for each run. If the run is a fbcode test, then capture_pre_autograd_graph uses training IR. This means both _get_aten_graph_module_for_pattern and replace_pattern_with_filters see the same training IR. If the run is not a fbcode test, then both would see the old IR.
Test Plan: ``` buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_conv2d_binary2 buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_conv2d_unary buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_linear_unary buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_dynamic_quant_linear buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_qat_dynamic_quant_linear buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_flatten_recipe buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_linear_unary ``` Reviewed By: andrewor14, tugsbayasgalan Differential Revision: D61291077 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134157 Approved by: https://github.com/tugsbayasgalan	2024-08-22 18:25:45 +00:00
Animesh Jain	fee677eeb6	[fbode-testing][dynamo][reland][inline-inbuilt-nn-modules] Mark attri… (#134136 ) Shuai wants to test this internally before https://github.com/pytorch/pytorch/pull/133713 can go in. Creating a separate PR for ghmport. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134136 Approved by: https://github.com/yanboliang	2024-08-22 17:54:58 +00:00
Thanh Ha	8f7d66f0c3	Enable dynamic rollout for Linux binary workflows (#131472 ) Enables dynamic migration of jobs to the LF AWS account for binary workflows. The new runners are only given to people specified in this issue: pytorch/test-infra#5132 This closes pytorch/ci-infra#251. Depends-On: pytorch/pytorch#132870 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131472 Approved by: https://github.com/ZainRizvi	2024-08-22 17:12:50 +00:00
Aaron Orenstein	d95aedf5fd	[BE] typing for decorators - fx/_compatibility (part 1) (#134202 ) Part of #134054. This corresponds to the pytorch mypy changes from D61493706. Updating takes so long and touches so many files that it's impossible to land as a whole without conflicting with some other intermediate change. So landing these 'type: ignore' for pytorch in advance of them actually being needed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134202 Approved by: https://github.com/Skylion007	2024-08-22 17:07:33 +00:00
yuqingj	44fa9f991c	[NJT] add aten.to.dtype support (#134164 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/134164 Approved by: https://github.com/davidberard98	2024-08-22 16:59:38 +00:00
Xuehai Pan	b6abac68ec	[BE][dynamo] reorganize polyfill module hierarchy (#133977 ) Changes: 1. Move `polyfill.py` -> `polyfills/__init__.py`. It can be used as `polyfill.xxx` -> `polyfills.xxx`. 2. Move submodule loading from `polyfills/__init__.py` to `polyfills/loader.py`. Merge `polyfill.py` and `polyfills/` packages. Each polyfill module has its own namespace for better code organization. The ultimate goal is to make `polyfills/__init__.py` empty and to move all polyfill functions into their own namespaces. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133977 Approved by: https://github.com/jansel	2024-08-22 16:42:29 +00:00
Xuehai Pan	c95ddd4bf2	[dynamo] ensure polyfill function has the same signature as the original function in `substitute_in_graph` (#133813 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133813 Approved by: https://github.com/jansel	2024-08-22 16:38:06 +00:00
Shangdi Yu	240467adfe	[fx] Implement deepcopy for Proxy (#133706 ) Summary: When deep-copying a Proxy, we first try the default deepcopy behavior. Test Plan: buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:fx -- -r proxy_deepcopy Differential Revision: D61398418 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133706 Approved by: https://github.com/angelayi	2024-08-22 16:37:30 +00:00
PyTorch MergeBot	b0171c3920	Revert "[ONNX] Opt into ruff fmt (#134120 )" This reverts commit 0870398fa8c3e097640f31cb8a8e2e2d3e522d33. Reverted https://github.com/pytorch/pytorch/pull/134120 on behalf of https://github.com/albanD due to Breaks main branch lint ([comment](https://github.com/pytorch/pytorch/pull/134120#issuecomment-2305089756))	2024-08-22 15:48:14 +00:00
Simon Mahns	828ab84e19	Improve error msg on _lazy_init() error (#134159 ) Reviewed By: hanzlfs Differential Revision: D61627609 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134159 Approved by: https://github.com/hanzlfs	2024-08-22 15:10:50 +00:00
James Wu	3c5485fb7f	[Retry] Log chromium events to scuba (#134118 ) Summary: This diff implements a bunch of views for internal scuba viewing. TODOS that I might punt to another diff: - Saving cache stats via counter is definitely sus here, but there's not really a good way to track "fx graph cache hit for this compile phase" right now. Will think about this more. - We should definitely log frame id, compile id, etc - We should definitely be logging configs. That way, we can A/B test based on whether a config is turned on. - idk what I'm doing with compile_uuid yet, but it's useful when you want to look at samples for a single run. I think if we had mast job info this field is not needed, but it's nice to be able to drill down to a single run and get its chrome trace view or icicle view, so idk Test Plan: All of the above views are run with nanogpt benchmark: ``` buck run mode/opt caffe2/benchmarks/dynamo:torchbench -- --training --backend=inductor --only nanogpt --performance ``` Differential Revision: D61603243 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134118 Approved by: https://github.com/oulgen	2024-08-22 14:59:45 +00:00
Isuru Fernando	1b10a5c652	Allow SymInts and SymFloats as other in div_softmax_pattern (#133989 ) Fixes https://github.com/pytorch/pytorch/issues/133759 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133989 Approved by: https://github.com/ezyang	2024-08-22 14:36:01 +00:00
Vladimir Monakhov	afc2615d33	Add proper casting to fuse_linear_bn_weights (#134105 ) As per title, this PR adds proper casting to fuse_linear_bn_weights in the same style as the conv case above. This previously caused numerical issues on my end, so that is why I am fixing it. Also cleans up the docstring. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134105 Approved by: https://github.com/mikaylagawarecki	2024-08-22 14:26:12 +00:00
yuqingj	b459ca78eb	[NJT]Add unit tests that cover the internal use cases using new NJT API (#133513 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133513 Approved by: https://github.com/davidberard98, https://github.com/soulitzer	2024-08-22 13:54:40 +00:00
PyTorch MergeBot	1a7e8e5780	Revert "Update FlexAttention with masking semantic (#133373 )" This reverts commit 5a7b544e5c3e37bea62c6a231f6230c004a33d38. Reverted https://github.com/pytorch/pytorch/pull/133373 on behalf of https://github.com/jeanschmidt due to Broke internal test/inductor signals, see D61611729 ([comment](https://github.com/pytorch/pytorch/pull/133373#issuecomment-2304714503))	2024-08-22 13:47:26 +00:00
PyTorch MergeBot	88c973005d	Revert "[FlexAttention] Enable different qk and v head-dims (#134043 )" This reverts commit e847b6bb9ba281b0db83fcdd79c328252403e9e8. Reverted https://github.com/pytorch/pytorch/pull/134043 on behalf of https://github.com/jeanschmidt due to Need to revert, in order to be able to revert https://github.com/pytorch/pytorch/pull/133373, feel free to reland this after solving conflicts ([comment](https://github.com/pytorch/pytorch/pull/134043#issuecomment-2304708996))	2024-08-22 13:44:17 +00:00
Aaron Gokaslan	83b5d449a3	Add full float16/bfloat16 support to MaxUnPool (#133774 ) It already supported half so might as well add bfloat16 support for parity Pull Request resolved: https://github.com/pytorch/pytorch/pull/133774 Approved by: https://github.com/eqy, https://github.com/ezyang	2024-08-22 13:34:43 +00:00
Aaron Gokaslan	c9c84ae3ee	[BE][Ez]: Update CUDNN_frontend submodule to 1.6.1 (#134007 ) Update cudnn_frontend submodule to 1.6.1 to patch some minor bugfixes and compiler fixes. # Bug fix * Fixed an issue where custom dropout mask was not correctly applied. * Added -fvisibility=hidden for the pip wheels generated to avoid symbol conflicts with other modules that use cudnn frontend. * Fixed an issue in sdpa operation which when deserialized will lead to numerical mismatches. * Fixed an issue in sdpa fp8 fprop operation (in inference mode). # Samples * Added a new sample to showcase how a custom dropout mask can be applied to a sdpa operation. * Added a sample to showcase convolutions on large (c * d * h * w > 2 **31) tensors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134007 Approved by: https://github.com/eqy	2024-08-22 13:34:17 +00:00
Howard Huang	108a75b454	[PP] Add ZeroBubble schedule (#133467 ) Zero bubble can be expressed through `ScheduleFlexibleInterleaved1F1B` by setting `enable_zero_bubble=True`. But instead of having to include this flag in schedule initialization, we should create a separate ZeroBubbleSchedule and also transition `Interleaved1F1B` to derive from `ScheduleFlexibleInterleaved1F1B`. Then we don't need to expose `ScheduleFlexibleInterleaved1F1B`, since the naming is not obvious. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133467 Approved by: https://github.com/wconstab ghstack dependencies: #132691	2024-08-22 13:32:15 +00:00
PyTorch MergeBot	cedfac20c7	Revert "[SymmetricMemory] introduce multicast support, multimem_all_reduce_ and multimem_one_shot_all_reduce (#133424 )" This reverts commit 66d3eb783c3b3d7087988dd29bfb619b7f4306b7. Reverted https://github.com/pytorch/pytorch/pull/133424 on behalf of https://github.com/jeanschmidt due to Broke internal ADS builds, see D61611517 ([comment](https://github.com/pytorch/pytorch/pull/133424#issuecomment-2304676328))	2024-08-22 13:29:27 +00:00
Andrew Gu	592a172910	[FSDP2] Resolved strided sharding todo in clipping tests (#134152 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/134152 Approved by: https://github.com/XilunWu, https://github.com/weifengpy, https://github.com/wz337	2024-08-22 12:45:13 +00:00
Jez Ng	4c645c04d8	Fix type of get_raw_stream (#134187 ) Just something I noticed while implementing a new DeviceInterface. I had to add `# type: ignore[assignment]` because mypy thinks DeviceInterface.get_raw_stream is a `Callable` and therefore incompatible with a `staticmethod`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134187 Approved by: https://github.com/jansel	2024-08-22 12:00:08 +00:00
Xu Han	5fb8754434	[inductor] write cpp code with encoding utf-8 (#134027 ) Windows differs from Linux: each Windows version with a different language pack has a different code page. Inductor on Windows writes the generated cpp code using its code page, which can fail on characters that cannot be decoded. For this situation, Microsoft suggests using Unicode instead of a specific code page. Ref: https://learn.microsoft.com/en-us/windows/win32/intl/code-page-identifiers Changes: 1. Use `utf-8` as the encoding for cpp code. 2. Only the encoding for cpp code changes, not for the binary type; the binary type is for AoT binary context. It works on https://github.com/pytorch/pytorch/issues/122094#issuecomment-2299592942. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134027 Approved by: https://github.com/desertfire, https://github.com/jgong5, https://github.com/jansel	2024-08-22 11:54:32 +00:00
Luca Wehrstedt	aea1148d56	[fp8 rowwise] Clarify dtypes (#134114 ) Disambiguate some of the dtypes (e.g., for the scales), move the "constant" ones out of the function, and use safe casting functions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134114 Approved by: https://github.com/drisspg ghstack dependencies: #134110, #134111, #134112, #134113	2024-08-22 11:07:39 +00:00
Luca Wehrstedt	72586ccd14	[fp8 rowwise] Don't build separate kernel for no bias (#134113 ) CUTLASS automatically skips a stage in the epilogue if we provide a nullptr. Thus, instead of building a special kernel for bias=None, we can reuse one of the other ones. This also considerably simplifies the code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134113 Approved by: https://github.com/drisspg ghstack dependencies: #134110, #134111, #134112	2024-08-22 11:07:39 +00:00
Luca Wehrstedt	d64fa11095	[fp8 rowwise] Fix bias calculation being done in low precision (#134112 ) The compute dtype for the bias addition was set to ElementBias. Thus, for a bf16 bias, we would cast the fp32 accum to bf16 and _then_ add the bias. It is however (slightly?) more accurate to first add the bias in fp32 and only cast at the end. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134112 Approved by: https://github.com/drisspg ghstack dependencies: #134110, #134111	2024-08-22 11:07:34 +00:00
Luca Wehrstedt	15faed60ca	[fp8 rowwise] Make schedule selection more readable (#134111 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/134111 Approved by: https://github.com/drisspg ghstack dependencies: #134110	2024-08-22 11:07:30 +00:00
Luca Wehrstedt	b8ea5b01c9	[fp8 rowwise] Allocate workspace as a PyTorch Tensor (#134110 ) This makes us pass through the CUDA caching allocator which is safer e.g. in case of CUDA graphs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134110 Approved by: https://github.com/drisspg	2024-08-22 11:07:26 +00:00
cyy	4c8193b8f0	[14/N] Fix clang-tidy warnings in aten/src/ATen (#132733 ) Follows #133807 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132733 Approved by: https://github.com/ezyang	2024-08-22 10:09:15 +00:00
Zitong Zhan	90c821814e	SparseCsrCUDA: cuDSS backend for linalg.solve (#129856 ) This PR switches to cuDSS library and has the same purpose of #127692, which is to add Sparse CSR tensor support to linalg.solve. Fixes #69538 Minimum example of usage: ``` import torch if __name__ == '__main__': spd = torch.rand(4, 3) A = spd.T @ spd b = torch.rand(3).to(torch.float64).cuda() A = A.to_sparse_csr().to(torch.float64).cuda() x = torch.linalg.solve(A, b) print((A @ x - b).norm()) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129856 Approved by: https://github.com/amjames, https://github.com/lezcano, https://github.com/huydhn Co-authored-by: Zihang Fang <zhfang1108@gmail.com> Co-authored-by: Huy Do <huydhn@gmail.com>	2024-08-22 07:57:30 +00:00
Pearu Peterson	64cfcbd8a3	Tune _int_bsr_dense_addmm for int8 inputs on A100 (#134035 ) As in the title. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134035 Approved by: https://github.com/cpuhrsch ghstack dependencies: #133855	2024-08-22 06:43:11 +00:00
Feng Yuan	b7baa062fc	Update torch-xpu-ops pin (ATen XPU implementation) (#133850 ) Bug fixes for PyTorch 2.5: 1. Use the SYCL group algorithm API instead of the old style for sub-group shift utilities. 2. Add preprocessing in the reduction kernel for cases requiring a data type cast. 3. Make group norm memory format compatible. 4. ZeroTensor: a. Remove unnecessary aten operator registrations; otherwise ZeroTensor processing is bypassed. b. Align preprocessing with the in-tree implementation in aten::copy_. 5. Rebase checkIndexTensorTypes usage. 6. Align with the latest semantics of PyTorch foreach operators. Return multiple tensors with offset=0. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133850 Approved by: https://github.com/EikanWang	2024-08-22 06:27:03 +00:00
Yuanhao Ji	cdb9c7d228	Add support for using privateuse1 backend name in `instantiate_device_type_tests()` (#133082 ) As you can see, 'privateuse1' appears many times in out-of-tree extension codebases. I think that everything about the device type should be the same as for other in-tree backends after registering the privateuse1 backend. For example, after registering a privateuse1 backend named "foo", you should be able to pass "foo" as a valid device type. ```diff - instantiate_device_type_tests(TestIndexing, globals(), only_for='privateuse1') - instantiate_device_type_tests(NumpyTests, globals(), only_for='privateuse1') + instantiate_device_type_tests(TestIndexing, globals(), only_for='foo') + instantiate_device_type_tests(NumpyTests, globals(), only_for='foo') ``` > https://github.com/Ascend/pytorch/blob/master/test/test_indexing.py#L1654-L1655 The change is to map privateuse1 backend name to 'privateuse1' when calling `filter_desired_device_types()`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133082 Approved by: https://github.com/albanD	2024-08-22 06:17:21 +00:00
Chong Gu	24c2dd2002	Migrate fuse_chunk_reshape_concat_pass to PT2 (#134026 ) Summary: This is part of the work of dper pass migration https://fburl.com/gdoc/wxwykxns This pass has ~2.4% perf impact for adfinder_reels_ctr_model Test Plan: Still in test Differential Revision: D60789747 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134026 Approved by: https://github.com/huxintong	2024-08-22 06:13:52 +00:00
chilli	938f37b745	Added batching rule for sdpa_math, sdpa_efficient_attention forward, cudnn, and flash attention (#133964 ) Fixes https://github.com/pytorch/pytorch/issues/117016, https://github.com/pytorch/pytorch/issues/102457, https://github.com/pytorch/pytorch/issues/110525, https://github.com/pytorch/pytorch/issues/108065, Pull Request resolved: https://github.com/pytorch/pytorch/pull/133964 Approved by: https://github.com/Skylion007	2024-08-22 05:29:49 +00:00
Xu Han	e2ff094008	[inductor] calibration inductor windows uts (1/N) (#134033 ) Changes: 1. Re-open fixed UTs. 2. Mark skip reasons for failing UTs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134033 Approved by: https://github.com/jansel	2024-08-22 05:21:28 +00:00
Avik Chaudhuri	0d7ac1966a	kill sharing of constraints (#134045 ) Summary: Previously, reuse of the same `Dim` was encoded by "sharing" internal constraints among constraint targets. This kind of sharing, implemented using `shared` fields between `_Constraint`s, was originally motivated by `dynamic_dim`, specifically to support `==` between `dynamic_dim`s, but we no longer need to maintain this overcomplicated structure: we can simply use names of `Dims` to directly encode sharing information. Thus this PR vastly simplifies the structure of `_Constraint` by removing `shared` fields. As a result, both `_Constraint` and its moral subclass, `_DerivedConstraint`, are 1-1 with `Dim` and its moral subclass, `DerivedDim`. Note that this will break `==` over `dynamic_dim`, so an immediate follow-up will be to remove `dynamic_dim` entirely from our public API. (It's been more than 6 months since the deprecation warning anyway.) I just didn't want to deal with that process in the same PR. Test Plan: existing Differential Revision: D61559413 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134045 Approved by: https://github.com/pianpwk	2024-08-22 04:40:47 +00:00
Wil Kong	de06345e9b	Avoid Host & Device Sync In LR Scheduler (#133663 ) Fixes #133662. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133663 Approved by: https://github.com/janeyx99, https://github.com/eqy Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>	2024-08-22 03:52:43 +00:00
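
The [fp8 rowwise] entry for #134112 in the log above argues that adding the bias in fp32 and casting once at the end is (slightly) more accurate than casting the fp32 accumulator to bf16 first and then adding. The following is only a minimal pure-Python sketch of that rounding argument, not the CUTLASS kernel: `to_f32`, `to_bf16`, and the sample values are hypothetical helpers chosen for illustration, and bf16 is emulated by rounding the fp32 bit pattern to its top 16 bits.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Emulate a cast to bfloat16 (round-to-nearest-even) for finite inputs.

    bf16 keeps the top 16 bits of the fp32 bit pattern, so we round the
    fp32 bits and clear the low 16. NaN/inf are not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


# Illustrative values (assumptions): an fp32 accumulator and a bf16 bias.
acc = to_f32(3.14159)
bias = to_bf16(0.001)

# Reference: the sum computed in fp32.
ref = to_f32(acc + bias)

# Old order: cast the accumulator to bf16, then add the bias in bf16.
cast_then_add = to_bf16(to_bf16(acc) + bias)

# New order: add in fp32, cast to bf16 only once at the end.
add_then_cast = to_bf16(ref)

err_old = abs(cast_then_add - ref)
err_new = abs(add_then_cast - ref)
print(f"cast-then-add error: {err_old:.3e}")
print(f"add-then-cast error: {err_new:.3e}")
```

Because `add_then_cast` is by construction the bf16 value nearest to the fp32 sum, its error relative to that sum can never exceed the error of the cast-then-add result, which is also a bf16 value; how large the gap is depends on the inputs.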

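
The [XPU] entry for #134204 in the log above describes making a URL replacement tolerant of sources that already contain the new URL. A rough sketch of that idea follows; `tolerant_replace` is a hypothetical name and signature for illustration and is not the actual `check_and_replace` helper in the PyTorch tree, whose details the log does not show. Only the two URL strings are taken from the commit message.

```python
OLD_URL = "https://tritonlang.blob.core.windows.net/llvm-builds/"
NEW_URL = "https://oaitriton.blob.core.windows.net/public/llvm-builds/"


def tolerant_replace(text: str, old: str = OLD_URL, new: str = NEW_URL) -> str:
    """Replace `old` with `new`, accepting text that is already up to date.

    Hypothetical helper: raises only when neither URL is present, which
    would suggest the file layout changed in some unexpected way.
    """
    if old in text:
        return text.replace(old, new)
    if new in text:
        # Newer sources already use the new URL; nothing to do.
        return text
    raise RuntimeError("neither the old nor the new URL was found")


# Both an old-style and a new-style line end up pointing at the new URL.
old_style = f'url = "{OLD_URL}llvm.tar.gz"'
new_style = f'url = "{NEW_URL}llvm.tar.gz"'
print(tolerant_replace(old_style))
print(tolerant_replace(new_style))
```

Raising when neither string is found is a design choice of this sketch: it keeps silent no-ops limited to the one case the commit message calls out as harmless.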

77484 Commits