# Motivation
According to [[RFC] Intel GPU Upstreaming](https://github.com/pytorch/pytorch/issues/114723), we would like to upstream the AMP autocast policy to support the functionality and accuracy of `torch.compile` on end-to-end benchmarks.
# Solution
The first PR makes the `KERNEL` macro generic so that it accepts two forms of arguments: `(DISPATCH, OP, POLICY)` and `(DISPATCH, OP, OVERLOAD, POLICY)`.
The second PR refactors CUDA's autocast policy so that it can be shared with the `XPU` backend.
The final PR adds the XPU autocast policy, which shares the same recipe as the `CUDA` backend.
# Additional Context
Another motivation is that we would like to unify the autocast API and provide generic APIs such as:
- `torch.get_autocast_dtype(device_type)`
- `torch.set_autocast_dtype(device_type)`
- `torch.is_autocast_enabled(device_type)`
- `torch.set_autocast_enabled(device_type)`
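A minimal usage sketch of these proposed entry points; the exact signatures (in particular the value arguments of the setters) are assumptions based on the list above, not a finalized API:
```python
import torch

# Hedged sketch: device-generic autocast state accessors. The setter
# signatures (device_type, value) are assumed, not taken from the source.
device_type = "xpu"  # or "cuda"

torch.set_autocast_enabled(device_type, True)
torch.set_autocast_dtype(device_type, torch.bfloat16)

print(torch.is_autocast_enabled(device_type))   # -> True
print(torch.get_autocast_dtype(device_type))    # -> torch.bfloat16
```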
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124050
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD
Removed a bunch of skips. I also updated `test_forloop_goes_right_direction` to *not* use the closure when dynamo is tracing; the reason for this is that testing the disabled optimizer doesn't actually test anything.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123322
Approved by: https://github.com/janeyx99
ghstack dependencies: #123498
Fix part of https://github.com/pytorch/pytorch/issues/123603.
Example traceback on branch https://github.com/pytorch/vision/compare/main...wwen/custom_ops_test:
```
running my_custom_op!
Traceback (most recent call last):
File "/data/users/williamwen/torchvision/playground.py", line 13, in <module>
print(opt_fn1(torch.randn(3, 3)))
File "/data/users/williamwen/pytorch2/torch/_dynamo/eval_frame.py", line 387, in _fn
return fn(*args, **kwargs)
File "/data/users/williamwen/pytorch2/torch/_dynamo/convert_frame.py", line 977, in catch_errors
return callback(frame, cache_entry, hooks, frame_state, skip=1)
File "/data/users/williamwen/pytorch2/torch/_dynamo/convert_frame.py", line 818, in _convert_frame
result = inner_convert(
File "/data/users/williamwen/pytorch2/torch/_dynamo/convert_frame.py", line 411, in _convert_frame_assert
return _compile(
File "/data/users/williamwen/pytorch2/torch/_utils_internal.py", line 70, in wrapper_function
return function(*args, **kwargs)
File "/data/users/williamwen/py310-env/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/data/users/williamwen/pytorch2/torch/_dynamo/convert_frame.py", line 700, in _compile
guarded_code = compile_inner(code, one_graph, hooks, transform)
File "/data/users/williamwen/pytorch2/torch/_dynamo/utils.py", line 266, in time_wrapper
r = func(*args, **kwargs)
File "/data/users/williamwen/pytorch2/torch/_dynamo/convert_frame.py", line 568, in compile_inner
out_code = transform_code_object(code, transform)
File "/data/users/williamwen/pytorch2/torch/_dynamo/bytecode_transformation.py", line 1116, in transform_code_object
transformations(instructions, code_options)
File "/data/users/williamwen/pytorch2/torch/_dynamo/convert_frame.py", line 173, in _fn
return fn(*args, **kwargs)
File "/data/users/williamwen/pytorch2/torch/_dynamo/convert_frame.py", line 515, in transform
tracer.run()
File "/data/users/williamwen/pytorch2/torch/_dynamo/symbolic_convert.py", line 2237, in run
super().run()
File "/data/users/williamwen/pytorch2/torch/_dynamo/symbolic_convert.py", line 875, in run
while self.step():
File "/data/users/williamwen/pytorch2/torch/_dynamo/symbolic_convert.py", line 790, in step
self.dispatch_table[inst.opcode](self, inst)
File "/data/users/williamwen/pytorch2/torch/_dynamo/symbolic_convert.py", line 492, in wrapper
return inner_fn(self, inst)
File "/data/users/williamwen/pytorch2/torch/_dynamo/symbolic_convert.py", line 1260, in CALL_FUNCTION
self.call_function(fn, args, {})
File "/data/users/williamwen/pytorch2/torch/_dynamo/symbolic_convert.py", line 730, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/data/users/williamwen/pytorch2/torch/_dynamo/variables/torch.py", line 747, in call_function
tensor_variable = wrap_fx_proxy(
File "/data/users/williamwen/pytorch2/torch/_dynamo/variables/builder.py", line 1425, in wrap_fx_proxy
return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs)
File "/data/users/williamwen/pytorch2/torch/_dynamo/variables/builder.py", line 1510, in wrap_fx_proxy_cls
example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True)
File "/data/users/williamwen/pytorch2/torch/_dynamo/utils.py", line 1804, in get_fake_value
raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None
File "/data/users/williamwen/pytorch2/torch/_dynamo/utils.py", line 1736, in get_fake_value
ret_val = wrap_fake_exception(
File "/data/users/williamwen/pytorch2/torch/_dynamo/utils.py", line 1251, in wrap_fake_exception
return fn()
File "/data/users/williamwen/pytorch2/torch/_dynamo/utils.py", line 1737, in <lambda>
lambda: run_node(tx.output, node, args, kwargs, nnmodule)
File "/data/users/williamwen/pytorch2/torch/_dynamo/utils.py", line 1872, in run_node
raise RuntimeError(make_error_message(e)).with_traceback(
File "/data/users/williamwen/pytorch2/torch/_dynamo/utils.py", line 1854, in run_node
return node.target(*args, **kwargs)
File "/data/users/williamwen/pytorch2/torch/_ops.py", line 870, in __call__
return self_._op(*args, **(kwargs or {}))
torch._dynamo.exc.TorchRuntimeError: Failed running call_function torchvision.my_custom_op1(*(FakeTensor(..., size=(3, 3)),), **{}):
The tensor has a non-zero number of elements, but its data is not allocated yet. Caffe2 uses a lazy allocation, so you will need to call mutable_data() or raw_mutable_data() to actually allocate memory.
If you're using torch.compile/export/fx, it is likely that we are erroneously tracing into a custom kernel. To fix this, please wrap the custom kernel into an opaque custom op. Please see the following for details: https://docs.google.com/document/d/1W--T6wz8IY8fOI0Vm8BF44PdBgs283QvpelJZWieQWQ
from user code:
File "/data/users/williamwen/torchvision/playground.py", line 5, in fn1
return torch.ops.torchvision.my_custom_op1(x)
```
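As a hedged illustration of the fix the error message suggests, here is a sketch of wrapping such a kernel as an opaque custom op so dynamo does not trace into it. The namespace, op name, and implementations are hypothetical, and it assumes the `torch.library.custom_op` decorator from newer PyTorch releases:
```python
import torch
from torch import Tensor

# Hypothetical stand-in for a custom kernel; registering it via
# torch.library.custom_op makes it opaque to torch.compile tracing.
@torch.library.custom_op("mylib::my_custom_op1", mutates_args=())
def my_custom_op1(x: Tensor) -> Tensor:
    # A real implementation would call into the C++/CUDA kernel here.
    return x.clone()

# A fake (meta) implementation so the compiler can reason about shapes
# without running the real kernel.
@my_custom_op1.register_fake
def _(x: Tensor) -> Tensor:
    return torch.empty_like(x)

@torch.compile
def fn1(x):
    return torch.ops.mylib.my_custom_op1(x)

print(fn1(torch.randn(3, 3)))
```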
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124240
Approved by: https://github.com/zou3519
Motivations:
- This makes things more consistent: using a Library object, you should
be able to use all of the registration APIs that tie registrations to
the lifetime of the Library (see the sketch below).
- I need this for the next PR up in the stack, where we will have
torch.library.register_fake support both CustomOpDef (from the new
custom ops API) and other custom ops.
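A hedged sketch of what this enables; the namespace `mylib`, op `foo`, and the `lib=` plumbing are illustrative assumptions based on the current `torch.library` surface:
```python
import torch
from torch import Tensor

# Tie an op definition, a CPU impl, and a fake impl to one Library object,
# so all registrations go away when the Library does.
lib = torch.library.Library("mylib", "FRAGMENT")
lib.define("foo(Tensor x) -> Tensor")

def foo_cpu(x: Tensor) -> Tensor:
    return x + 1

lib.impl("foo", foo_cpu, "CPU")

def foo_fake(x: Tensor) -> Tensor:
    return torch.empty_like(x)

# Passing lib= scopes the fake-impl registration to the Library's lifetime.
torch.library.register_fake("mylib::foo", foo_fake, lib=lib)

print(torch.ops.mylib.foo(torch.randn(3)))
```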
Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124065
Approved by: https://github.com/albanD
ghstack dependencies: #123937, #124064
Previously, if someone used `register_fake` to add a fake impl for an
operator defined in C++, we would require them to add a
`m.set_python_module(<module>)` call to C++. This was to avoid
situations where a user imported the C++ operator without importing the
fake impl.
This "breaks" open registration: there's no way to add a fake impl
outside of a repository that defines an operator, so we want to turn
this behavior off by default in open source.
Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124064
Approved by: https://github.com/albanD
ghstack dependencies: #123937
# Motivation
This PR is a part of RFC #114848 and is a successor to #116249 and #116019. It depends on the oneDNN compilation in #116249, and some runtime support from #116019 is also needed.
ATen operators like `addmm` and `baddbmm` are defined in `Blas.cpp` under `aten/src/ATen/native/mkldnn/xpu/`.
Alongside these files that provide the core functionality, `BlasImpl.h`, `Utils.h`, and other files provide basic utilities for them. For instance, `Utils.h` provides common memory-descriptor query utilities for `Matmul.h`, and these utility functions will also be used by other primitives, such as `convolution`. `BlasImpl.h` is a header that provides helpers for shape-info processing in matmul-related operators. It helps not only basic GEMM operators like `addmm` and `baddbmm`, but also fusion operators used in `torch.compile`, such as `linear_pointwise` in #117824.
In the next stage, we will continue to complete the oneDNN support by enabling the `matmul fusion` and `convolution` related code.
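For reference, a minimal usage sketch of the GEMM-family operators this PR routes through oneDNN, assuming a PyTorch build with XPU support:
```python
import torch

device = "xpu"  # requires an XPU-enabled build and device

# addmm: out = M + a @ b
M = torch.randn(4, 6, device=device)
a = torch.randn(4, 5, device=device)
b = torch.randn(5, 6, device=device)
out = torch.addmm(M, a, b)

# baddbmm: batched version, out[i] = batch_M[i] + batch_a[i] @ batch_b[i]
batch_M = torch.randn(8, 4, 6, device=device)
batch_a = torch.randn(8, 4, 5, device=device)
batch_b = torch.randn(8, 5, 6, device=device)
batch_out = torch.baddbmm(batch_M, batch_a, batch_b)
```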
Co-authored-by: xiaolil1 <xiaoli.liu@intel.com>
Co-authored-by: lei,zhenyuan <zhenyuan.lei@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117202
Approved by: https://github.com/EikanWang, https://github.com/jgong5, https://github.com/malfet
ghstack dependencies: #117098, #117112
# Motivation
As proposed in https://github.com/pytorch/pytorch/issues/114848 and https://github.com/pytorch/pytorch/issues/114723, the oneDNN library is an important component of the Intel GPU software ecosystem.
This PR is based on #117098, which makes the oneDNN library ready for Intel GPU. It adds the integration code from ATen to oneDNN; the GEMM integration code is the core part. Alongside GEMM, more basic support such as runtime (device, stream) and primitive attributes is also included.
We put the oneDNN integration code in the directory `aten/src/ATen/native/mkldnn/xpu/detail` and add the namespace `at::native::xpu::onednn` for the oneDNN integration.
The code in this PR will be used in subsequent PRs, where ATen operators will call the functions in this integration code. We separate the PRs because the oneDNN integration is logically separable from the ATen operator implementation; this also eases the review burden by avoiding too much code in a single PR.
Co-authored-by: xiaolil1 <xiaoli.liu@intel.com>
Co-authored-by: lei,zhenyuan <zhenyuan.lei@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117112
Approved by: https://github.com/EikanWang, https://github.com/jgong5, https://github.com/albanD
TF32 causes issues with the tolerances here; we might also consider migrating some of the `with_tf32_off` tests in this file to `tf32_on_and_off` in case it would be useful to get signal for TF32.
CC @malfet @atalman
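If it helps, a minimal sketch of the global switches the TF32 test decorators toggle (standard PyTorch backend flags; disabling them makes matmuls and convolutions run in true fp32 so tight tolerances hold):
```python
import torch

# Disable TF32 so results match fp32 tolerances; the with_tf32_off /
# tf32_on_and_off test decorators flip these per-test.
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False
```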
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124104
Approved by: https://github.com/zou3519
The speedup comes from unrolling the middle loop by 16 elements and using NEON to decode packed int4 to float32.
Unrolling the entire `n` loop actually makes it a tad slower, probably because ARM has a smaller register file than x86.
Before/after performance running stories110M on an M2 Pro:

| eager (before) | eager (after) | compile (before) | compile (after) |
| --- | --- | --- | --- |
| 28 | 57 | 31 | 104 |
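A scalar Python reference of the decode step (illustration only; the actual kernel does this 16 elements at a time with NEON intrinsics, and the (scale, zero) dequantization scheme here is an assumption rather than the kernel's exact layout):
```python
import torch

def unpack_int4_to_float32(packed: torch.Tensor, scale: float, zero: float) -> torch.Tensor:
    # Each uint8 holds two 4-bit values: low nibble first, then high nibble.
    low = packed & 0x0F
    high = (packed >> 4) & 0x0F
    nibbles = torch.stack([low, high], dim=-1).flatten()
    return (nibbles.to(torch.float32) - zero) * scale

packed = torch.randint(0, 256, (8,), dtype=torch.uint8)
print(unpack_int4_to_float32(packed, scale=0.1, zero=8.0))
```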
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124257
Approved by: https://github.com/mikekgfb
A unit test within test_cutlass_backend.py can fail with CUDA illegal memory accesses because some CUTLASS kernels contain bugs.
With autotuning in subprocesses, the CUDA illegal memory access simply leads to the buggy CUTLASS kernels being filtered out, instead of bringing down the entire process.
Test Plan:
This is a change to a unit test. It's recommended to use `autotune_in_subproc` when using the CUTLASS backend anyway.
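A hedged sketch of what that looks like from user code; the flag names reflect the current `torch._inductor.config` surface and may change:
```python
import torch
import torch._inductor.config as inductor_config

# Autotune GEMM candidates (including CUTLASS ones) in subprocesses so a
# crashing kernel is filtered out instead of killing the main process.
inductor_config.max_autotune_gemm_backends = "CUTLASS,ATen"
inductor_config.autotune_in_subproc = True

@torch.compile(mode="max-autotune")
def mm(a, b):
    return a @ b
```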
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124106
Approved by: https://github.com/eellison
This PR:
- adds a new torch.library.register_fake and deprecates
torch.library.impl_abstract. The motivation is that there is a lot of
confusion around the naming, so we are aligning the naming with
the actual subsystem (FakeTensor).
- renames `m.impl_abstract_pystub("fbgemm_gpu.sparse_ops")` to
`m.has_python_registration("fbgemm_gpu.sparse_ops")`. No deprecation
here yet; I need to test how this works with static initialization.
- Renames a bunch of internals to match (e.g. abstractimplpystub ->
pystub)
I'm scared to rename the Python-side internal APIs (e.g.
torch._library.abstract_impl) because of torch.package concerns. I'll do
that in its own isolated PR next just in case it causes problems.
DEPRECATION NOTE: torch.library.impl_abstract was renamed to
torch.library.register_fake. Please use register_fake; we'll delete
impl_abstract in a future version of PyTorch.
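A short sketch of the new spelling (the operator `mylib::bar` is made up and assumed to be defined elsewhere, e.g. in C++ or via `torch.library.define`):
```python
import torch
from torch import Tensor

# New name: register_fake. Previously this was
# @torch.library.impl_abstract("mylib::bar").
@torch.library.register_fake("mylib::bar")
def _(x: Tensor) -> Tensor:
    return torch.empty_like(x)
```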
Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123937
Approved by: https://github.com/albanD
We can't get information about `ami-id`, `instance-id`, `instance-type` for the ARC runners:
```
2024-04-16T11:10:17.0098276Z curl: (22) The requested URL returned error: 401
2024-04-16T11:10:17.0110775Z ami-id:
2024-04-16T11:10:17.0159131Z curl: (22) The requested URL returned error: 401
2024-04-16T11:10:17.0167378Z instance-id:
2024-04-16T11:10:17.0219464Z curl: (22) The requested URL returned error: 401
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124171
Approved by: https://github.com/malfet, https://github.com/ZainRizvi, https://github.com/zxiiro