Citing @malfet's [comment](https://github.com/pytorch/pytorch/pull/136343#pullrequestreview-2318792396) in https://github.com/pytorch/pytorch/pull/136343
> It would be great, if users do not have to modify their programs for every new backend, but rather use with torch.device('xpu'): and keep rest of the code unchanged.
This PR makes the backend specification ("nccl", "gloo") optional when the user provides a `device_id` to `init_process_group` (passing `device_id` has previously been supported for the purpose of eager init).
New user experience:
```python
device = torch.device(device_type, rank % device_count)
dist.init_process_group(device_id=device)
```
The `device = torch.device(...)` line is needed anyway, since the user would use it for tensor creation etc.
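For contrast, a hedged before/after sketch (assuming the usual rendezvous environment variables `MASTER_ADDR`, `MASTER_PORT`, `RANK`, and `WORLD_SIZE` are already set):
```python
import os

import torch
import torch.distributed as dist

rank = int(os.environ["RANK"])
device = torch.device("cuda", rank % torch.cuda.device_count())

# Before: the backend string had to be spelled out per device type.
# dist.init_process_group("nccl", device_id=device)

# After this PR: the backend is derived from the device.
dist.init_process_group(device_id=device)
```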
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140963
Approved by: https://github.com/wconstab
Per discussion with @malfet, only allow the weights_only unpickler to load NJT if `torch.nested` and `torch._dynamo` are imported
(this is slightly weird, as technically `torch.nested` is actually imported by default and `torch._dynamo.decorators._DimRange` is what actually needs to be imported)
We can't import this from `torch.nested`, as doing so would
- undo dynamo's lazy import
- cause a circular import
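A minimal sketch of the gating described above (not the PR's exact code, and the helper name is illustrative); it checks `sys.modules` instead of importing, to avoid the two problems listed:
```python
import sys

def _njt_loading_allowed() -> bool:
    # NJT globals are allowlisted only when the user has already imported the
    # relevant modules; we never import them from the unpickler itself.
    return "torch.nested" in sys.modules and "torch._dynamo" in sys.modules
```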
===========================
Redo of https://github.com/pytorch/pytorch/pull/140304 caused issues as `torch.nested._internal.foo` needs to be imported, which causes issues like
```python
torch/_weights_only_unpickler.py", line 339, in load
if full_path in _get_allowed_globals():
torch/_weights_only_unpickler.py", line 188, in _get_allowed_globals
torch.nested._internal.nested_tensor.NestedTensor
AttributeError: module 'torch.nested' has no attribute '_internal'
```
**This likely wasn't caught in our CI because imports are global during unit tests(?), so we use a subprocess to properly test this time.**
Differential Revision: [D65961691](https://our.internmc.facebook.com/intern/diff/D65961691)
@jbschlosser
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140739
Approved by: https://github.com/malfet
Summary:
Fixes a bug in the quantizer where Conv + ReLU was fused even when the preceding conv has more than one user. Conv and ReLU cannot be fused in this case because the result of Conv must also be used elsewhere.
XNNPACK Delegate naturally handles this by inserting a clamp node for ReLU.
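An illustrative sketch of the fusion guard (not the quantizer's actual code), assuming an FX graph where the conv node's users are inspected before annotating the Conv + ReLU pattern:
```python
import torch.fx

def can_fuse_conv_relu(conv_node: torch.fx.Node, relu_node: torch.fx.Node) -> bool:
    # Fuse only if ReLU is the sole consumer of the conv output; otherwise the
    # unfused conv result is still needed elsewhere in the graph.
    return len(conv_node.users) == 1 and relu_node in conv_node.users
```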
Test Plan: CI
Reviewed By: digantdesai
Differential Revision: D65989599
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140846
Approved by: https://github.com/digantdesai
Doc updates:
* This adds documentation for the object-oriented ProcessGroup APIs that are being used in torchft as well as https://github.com/pytorch/rfcs/pull/71.
* It also does some general cleanups to simplify distributed.rst by using `:methods`.
* It adds `__init__` definitions for the Stores.
* I've reordered things so the collective APIs come before the Store/PG APIs.
Test plan:
```
lintrunner -a
cd docs && sphinx-autobuild source build/ -j auto -WT --keep-going
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140853
Approved by: https://github.com/kwen2501
# Overview
Currently monitor.py produces error-only results. This PR introduces a disable-monitor option to all *-test.yml files. We would also like to explore how the monitor code affects benchmark results.
# Next steps
- fix the monitor.py
- enable non-benchmark tests with monitor
- investigate benchmark test behavior with monitor background job
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140857
Approved by: https://github.com/huydhn
Tracking issue: #138399
This PR fixes a number of reference implementations (which are also used as meta
functions), making them more consistent with the CPU device. More specifically, it fixes those
operations that use the `_make_elementwise_unary_reference` decorator and don't error on a
mismatching `out` argument dtype, even though they do error when run on concrete devices (e.g. CPU); see the repro sketch after the list of fixed operations.
The fixed operations are:
- `abs`
- `ceil`
- `floor`
- `frac`
- `isneginf`
- `isposinf`
- `sgn`
- `sign`
- `signbit`
- `trunc`
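A hypothetical repro sketch of the inconsistency being fixed, using `abs` as one example:
```python
import torch

x = torch.randn(4)
out = torch.empty(4, dtype=torch.int64)
try:
    torch.abs(x, out=out)  # CPU errors: float result can't be cast to int64 out
except RuntimeError as e:
    print("cpu:", e)

x_meta = torch.randn(4, device="meta")
out_meta = torch.empty(4, dtype=torch.int64, device="meta")
try:
    torch.abs(x_meta, out=out_meta)  # previously accepted silently; now raises like CPU
except RuntimeError as e:
    print("meta:", e)
```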
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140288
Approved by: https://github.com/ezyang
ghstack dependencies: #140186, #140286
Fixes #139320
### Summary:
#### (1) Add `_rename_dynamic_shapes_with_model_inputs` so that dynamic_shapes plays along with input_names
* Use the model forward signature to rename dynamic_shapes when dynamic_shapes is not nested and directly uses the customized names. This solves the issue that torch.export.export expects dynamic_shapes to use only the model input names.
* If dynamic_shapes is nested, we do nothing.
#### (2) Add `_from_dynamic_shapes_to_dynamic_axes` for fallback
* We flatten dynamic_shapes with the leaves defined by `_pytree.tree_leaves()`
~~* If a dynamic_shapes is not nested, and defined in dict. We can use the key as the input_names, since it should be renamed by `_rename_dynamic_shapes_with_model_inputs` already.~~
* If dynamic_shapes is provided, input_names is required to assign the names, because dynamic_axes needs them.
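A hypothetical end-to-end sketch of the user-facing scenario this enables (assuming the dynamo-based exporter path of `torch.onnx.export`, which accepts `dynamic_shapes` together with `input_names`; names below are illustrative):
```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x * 2.0

batch = torch.export.Dim("batch")

# dynamic_shapes is keyed by the customized input name "data" rather than the
# forward-signature name "x"; with this PR it is renamed for torch.export, and
# converted to dynamic_axes if the export falls back.
torch.onnx.export(
    M(),
    (torch.randn(2, 3),),
    "model.onnx",
    input_names=["data"],
    dynamic_shapes={"data": {0: batch}},
    dynamo=True,
)
```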
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139532
Approved by: https://github.com/justinchuby
I.e. replace `at::detail::getMPSHooks().isOnMacOSorNewer` with `is_macos_13_or_newer`, which is a direct function call rather than going through a virtual method call.
Hooks are only needed to provide a feature-agnostic interface for querying something even on platforms that might not support the feature, while functions implemented in `ATen/native/xxx` should be able to call those platform-specific methods directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140950
Approved by: https://github.com/Skylion007
ghstack dependencies: #140896
Summary: When AOT_PARTITIONER_DEBUG is set to 1 and debug logging is turned on, we can now log the full input and output for each knapsack problem.
Differential Revision: D65633086
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140757
Approved by: https://github.com/jansel
Which is a variadic template that automates the tedious (and error-prone) process of passing arguments via a series of
```cpp
mtl_setBuffer(encoder, b1, 0);
mtl_setBuffer(encoder, b2, 1);
mtl_setBytes(encoder, param, 2);
```
into a compact
```cpp
mtl_setArgs(encoder, b1, b2, param);
```
Introduce a few more specializations of `mps_setArg`, such as:
- Call `setBuffer` for `id<MTLBuffer>`
- Copy double as float (as MPS does not support double-precision types)
- Accept `std::optional<at::Tensor>`, which will not call `setBuffer` if the optional is empty
Also, rework the metaprogramming of `mtl_setBytes` to make it usable with any trivially copyable struct, but keep a separate implementation for containers: uploading a `c10::SmallVector`, which is trivially copyable, would overwrite the next arguments, which luckily resulted in test failures of `test_cross_entropy_label_smoothing_weight_ignore_indices_mps`.
Introduce `has_size_type_v`, which can be used to differentiate between the trivially copyable `std::array` and `c10::ArrayRef` versus other trivially copyable structs.
```cpp
template <typename T>
class has_size_type {
template <typename U>
static constexpr std::true_type check(typename U::size_type*);
template <typename>
static constexpr std::false_type check(...);
public:
static constexpr bool value = decltype(check<T>(nullptr))::value;
};
template <typename T>
constexpr bool has_size_type_v = has_size_type<T>::value;
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140896
Approved by: https://github.com/Skylion007
Fixes #140598
Allows ragged structures for query and key+value sequence lengths to differ (i.e. supports cross attention for Flex + NJT).
Technically, this is BC-breaking thanks to arg renaming and positional arg reordering in `create_nested_block_mask()`, but Flex + NJT support isn't in a major release yet so I'm hoping we can just do it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140723
Approved by: https://github.com/drisspg
# Motivation
This PR is an extension of #131758. As described there, these changes aim to make distributed UTs more accessible to users of all device types.
It demonstrates a few of the changes discussed by @kwen2501 and @jgong5 in the discussion on #131758 (https://github.com/pytorch/pytorch/pull/131758#discussion_r1762422784).
This PR contains two types of changes. The first is to the common distributed folder, where we have added a new class derived from MultiProcessTestCase that abstracts out process group creation/deletion and other functionality for a given device.
New generalized content can be added by deriving from this base class.
It also includes other misc changes for Gaudi support.
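A hedged, illustrative sketch of such a base class (the class name and helper methods are hypothetical, not the PR's actual API):
```python
import torch.distributed as dist
from torch.testing._internal.common_distributed import MultiProcessTestCase

class DeviceAgnosticDistTestBase(MultiProcessTestCase):
    device_type = "cpu"   # overridden per platform, e.g. "cuda" or "hpu"
    backend = "gloo"      # e.g. "nccl" for CUDA, "hccl" for HPU

    def setUp(self):
        super().setUp()
        self._spawn_processes()

    def _init_pg(self):
        # Abstracts process group creation for the configured device/backend.
        dist.init_process_group(
            backend=self.backend,
            rank=self.rank,
            world_size=self.world_size,
            init_method=f"file://{self.file_name}",
        )

    def _destroy_pg(self):
        if dist.is_initialized():
            dist.destroy_process_group()
```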
The second changed file is test_functional_api.py, a test file in common distributed. This file is a POC for how we can use this new class to write more device-agnostic distributed test cases.
The following changes have been made to test_functional_api.py:
- Functionality has been added to test non-CUDA devices, using Intel HPU as an example
- Multiple setup steps previously required by MultiProcessTestCase have been abstracted out
- Misc adaptations to allow for general calls to accelerators, adding test skips instead of explicitly skipping for multiple GPUs
- Skip-if-HPU flags have been added to skip a few multithreaded test cases that are not yet supported on HPUs
NOTE: Within test_functional_api, there are tests which require some multithreading functions that are not yet supported on HPUs. These have been skipped for HPU using the skipHPU decorator.
I will be raising a separate PR to improve the usability of said decorators in a device-agnostic setting, in the manner suggested by @kwen2501 in a comment on this PR.
This PR is a cleaned-up version of a previous PR (#136988), which I closed due to human error. I have addressed some of the comments made by @kwen2501 in this PR as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138216
Approved by: https://github.com/kwen2501, https://github.com/guangyey
Here are some explanations of this PR.
1. Changes in `aten/src/ATen/core/Tensor.cpp` and `c10/core/DispatchKey.cpp`: support the toString method for the `QuantizedPrivateUse1` backend, making PyTorch print out the correct backend string for it.
2. Add the header `DispatchStub.h` in `aten/src/ATen/native/quantized/IndexKernel.h`: if this header is not included, we can't utilize `masked_fill_kernel_quantized_stub` even if we include the `IndexKernel.h` header; it would throw an error during compilation.
3. Add multiple `TORCH_API`s in `aten/src/ATen/native/quantized/AffineQuantizer.h`: these functions are useful for other PrivateUse1 backends supporting quantization functions; if these `TORCH_API`s are missing, it would throw an undefined-symbol error at runtime.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139860
Approved by: https://github.com/bdhirsh
Faced with an annoying string of warnings like this when running tests,
<img width="1644" alt="Screenshot 2024-11-15 at 11 23 21 AM" src="https://github.com/user-attachments/assets/91ff4e1d-3c29-4510-9a61-46e7df68a212">
My choices seem to be (1) call destroy_process_group() at the end of
each test fn, (2) do this in some wrapper, (3) do it in the base test
class.
Since tests in MultiProcessTestCase are responsible for calling
init_process_group themselves, they should also be responsible for
calling destroy; put differently, method (3) alone would be asymmetric
and may result in a double destroy.
But it doesn't feel worth it to go add a destroy call manually to each
test, and a try/except around a possible second destroy call seems like
a happy middle ground.
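A minimal sketch of that middle ground (not the exact diff), with the destroy call tolerating tests that already cleaned up after themselves:
```python
import unittest

import torch.distributed as dist

class ExampleTestCaseSketch(unittest.TestCase):
    def tearDown(self):
        # Destroy any process group the test left behind; tolerate the case
        # where the test already destroyed it (or never created one).
        try:
            if dist.is_initialized():
                dist.destroy_process_group()
        except (AssertionError, ValueError, RuntimeError):
            pass
        super().tearDown()
```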
Note: tests that want to ensure that destroy runs cleanly can and should
still call destroy _inside_ the test, and this change does not affect
that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140820
Approved by: https://github.com/fegin