Commit Graph

81133 Commits

a10ce22577 [BE] Update bazelisk and bazel versions (#140992)
bazelisk from 1.16 to 1.23
bazel from 6.1.1 to 6.5.0
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140992
Approved by: https://github.com/izaitsevfb, https://github.com/huydhn
2024-11-19 03:40:53 +00:00
0fcd024f59 [hop] refactor only_consist_of with find_mismatched_vars (#140105)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140105
Approved by: https://github.com/zou3519
2024-11-19 03:21:16 +00:00
70a0906f24 [c10d] Support optional backend if device_id provided (#140963)
Citing @malfet's [comment](https://github.com/pytorch/pytorch/pull/136343#pullrequestreview-2318792396) in https://github.com/pytorch/pytorch/pull/136343
> It would be great, if users do not have to modify their programs for every new backend, but rather use with torch.device('xpu'): and keep rest of the code unchanged.

This PR makes the backend specification ("nccl", "gloo") optional when the user provides a `device_id` to `init_process_group` (passing `device_id` was previously supported for the purpose of eager init).

New user experience:
```python
# device_type (e.g. "cuda"), rank and device_count come from the launch environment
device = torch.device(device_type, rank % device_count)
dist.init_process_group(device_id=device)  # backend is inferred from the device
```

The `device = torch.device(...)` line is needed anyway, since the user will use it for tensor creation etc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140963
Approved by: https://github.com/wconstab
2024-11-19 03:17:29 +00:00
37959c554d Add small test case for #140230 (#140850)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140850
Approved by: https://github.com/malfet
ghstack dependencies: #140739, #140740
2024-11-19 02:44:54 +00:00
f3f305ef3e Fix condition for weights_only unpickler for DTensor (#140740)
Same as #140739, but for DTensor: move the safe globals for DTensor to `torch.distributed.tensor.__init__` and update the error message to let the user know that `torch.distributed.tensor` must be imported to load DTensor.

Differential Revision: [D65961690](https://our.internmc.facebook.com/intern/diff/D65961690)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140740
Approved by: https://github.com/malfet
ghstack dependencies: #140739
2024-11-19 02:44:53 +00:00
b63a84804c Allow NJT by default for weights_only torch.load (take 2) (#140739)
Per discussion with @malfet, only allow the weights_only unpickler to load NJT if `torch.nested` and `torch._dynamo` are imported.

(This is slightly weird, as technically `torch.nested` is imported by default and `torch._dynamo.decorators._DimRange` is what actually needs to be imported.)

We can't import this from `torch.nested`, as that would
- undo dynamo's lazy import
- cause a circular import
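
A minimal sketch of the gating (hypothetical helper name, not the actual unpickler code), checking `sys.modules` instead of importing:

```python
import sys

def _njt_load_allowed() -> bool:
    # Checking sys.modules avoids triggering imports, so this neither undoes
    # dynamo's lazy import nor creates a circular import.
    return "torch.nested" in sys.modules and "torch._dynamo" in sys.modules
```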

===========================
This is a redo of https://github.com/pytorch/pytorch/pull/140304, which caused issues because `torch.nested._internal.foo` needed to be imported, leading to errors like

```python
torch/_weights_only_unpickler.py", line 339, in load
    if full_path in _get_allowed_globals():
torch/_weights_only_unpickler.py", line 188, in _get_allowed_globals
    torch.nested._internal.nested_tensor.NestedTensor
AttributeError: module 'torch.nested' has no attribute '_internal'
```

**This likely wasn't caught in our CI because imports are global during unit tests(?), so this time we use a subprocess to test it properly.**

Differential Revision: [D65961691](https://our.internmc.facebook.com/intern/diff/D65961691)

@jbschlosser
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140739
Approved by: https://github.com/malfet
2024-11-19 02:44:53 +00:00
1e234e63b3 [pytorch][dynamo_compile] Log inductor config to dynamo_compile (#140790)
Summary:
Scrubbed inductor config is now logged to dynamo_compile as a JSON string.

Scrub regex: `r'((^TYPE_CHECKING$)|(.*_progress$)|(.*TESTING.*)|(.*(rocm|halide).*)|(^trace\..*)|(^_))'` to save some space.
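
As a rough illustration (hypothetical config keys, not the real inductor config), the scrub keeps only keys that don't match the regex:

```python
import re

SCRUB = re.compile(
    r'((^TYPE_CHECKING$)|(.*_progress$)|(.*TESTING.*)|(.*(rocm|halide).*)|(^trace\..*)|(^_))'
)
config = {"max_autotune": True, "trace.enabled": False, "_debug_level": 1}
scrubbed = {k: v for k, v in config.items() if not SCRUB.match(k)}
print(scrubbed)  # {'max_autotune': True}
```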

Test Plan:
Staging logger: https://fburl.com/data/ltkt08zm

P1679697917

Differential Revision: D65806399

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140790
Approved by: https://github.com/masnesral
2024-11-19 02:39:33 +00:00
9ae19ffbed fix layer_norm decomp precision for cpu (#140557)
xref: https://fb.workplace.com/groups/1075192433118967/posts/1540519826586223/?comment_id=1543752356262970&reply_comment_id=1544425069529032

The issue is that our decomp needs to branch on device (it only upcasts for CPU), but the device shows up as "meta" because the decomp is registered as a meta tensor rule.
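
A minimal reference sketch of the issue (not the actual decomp), assuming elementwise affine over the last dimension; registered as a meta rule, `x.device.type` would be "meta" and the CPU branch would never fire:

```python
import torch

def layer_norm_ref(x, weight, bias, eps=1e-5):
    # CPU upcasts half/bfloat16 for accuracy; other devices do not.
    upcast = x.device.type == "cpu" and x.dtype in (torch.half, torch.bfloat16)
    xc = x.float() if upcast else x
    mean = xc.mean(dim=-1, keepdim=True)
    var = xc.var(dim=-1, correction=0, keepdim=True)
    return ((xc - mean) / torch.sqrt(var + eps) * weight + bias).to(x.dtype)
```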

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140557
Approved by: https://github.com/ezyang
2024-11-19 02:31:31 +00:00
240aa77ad0 [Quantizer][XNNPACK] Fix ReLU fusion when conv/linear has > 1 user (#140846)
Summary:
Fixes a bug in the quantizer where Conv + ReLU was fused even when the preceding conv has more than one user. Conv and ReLU cannot be fused in this case because the result of the conv must also be used elsewhere.

XNNPACK Delegate naturally handles this by inserting a clamp node for ReLU.
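
A hedged sketch of the core check (hypothetical helper over FX-style nodes):

```python
def can_fuse_conv_relu(conv_node, relu_node) -> bool:
    # If the conv has any user besides the relu, its raw output is needed
    # elsewhere and fusing would change observable results.
    return list(conv_node.users) == [relu_node]
```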

Test Plan: CI

Reviewed By: digantdesai

Differential Revision: D65989599

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140846
Approved by: https://github.com/digantdesai
2024-11-19 02:29:45 +00:00
2673a440d0 [distributed] add PG APIs and general doc cleanups (#140853)
Doc updates:

* This adds documentation for the object oriented ProcessGroup APIs that are being used in torchft as well as https://github.com/pytorch/rfcs/pull/71 .
* It also does some general cleanups to simplify the distributed.rst by using `:methods`.
* It adds `__init__` definitions for the Stores
* I've reordered things so the collective APIs come before the Store/PG APIs

Test plan:

```
lintrunner -a
cd docs && sphinx-autobuild source build/ -j auto -WT --keep-going
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140853
Approved by: https://github.com/kwen2501
2024-11-19 02:06:32 +00:00
5b326d6b61 Add gdb print methods support same as pytorch-lldb (#140935)
`pytorch-lldb` supports pretty printing the size and key_set of a tensor via #97101.

This adds the same pretty printing for gdb debugging.

**Test Result**

```bash
$ gdb python
(gdb) break at::native::negative
(gdb) r
>>> import torch
>>> t = torch.tensor([1, 2, 3, 4], dtype=torch.float64)
>>> t.negative()
Thread 1 "python" hit Breakpoint 1, at::native::negative (self=...) at /home/zong/code/pytorch/aten/src/ATen/native/UnaryOps.cpp:854
854	Tensor negative(const Tensor& self) { return self.neg(); }
```

**Before**
```bash
(gdb) p self.key_set()
$2 = {repr_ = 1271310352385}

(gdb) p self.sizes()
$3 = {Data = 0x9cb488, Length = 1}

```

**After**
```bash
(gdb) torch-int-array-ref-repr self.sizes()
[4]
(gdb) torch-dispatch-keyset-repr self.key_set()
DispatchKeySet(CPU, ADInplaceOrView, AutogradCPU, AutocastCPU)
```

```bash
$ lintrunner
```
![image](https://github.com/user-attachments/assets/b720e284-13b1-4581-ae3a-963f6482fdb2)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140935
Approved by: https://github.com/drisspg
2024-11-19 01:28:30 +00:00
98e6e69b1b [C10D] Support group_dst/group_src in c10d send/recv object_list (#140847)
Also add mypy annotations

Partially addresses RFC 0042 (https://github.com/pytorch/rfcs/pull/71)
See more details/motivation in https://github.com/pytorch/pytorch/pull/140460

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140847
Approved by: https://github.com/H-Huang
ghstack dependencies: #140843
2024-11-19 01:23:08 +00:00
c82c46ccc7 [C10D] support group_src/dst in broadcast/reduce ops (#140843)
Also add mypy annotations

Partially addresses RFC 0042 (pytorch/rfcs#71)
See more details/motivation in #140460
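
A usage sketch, assuming `group_src` names the source by its rank within the subgroup (per this PR) and that `init_process_group` has already been called:

```python
import torch
import torch.distributed as dist

group = dist.new_group(ranks=[2, 3])  # subgroup of global ranks 2 and 3
t = torch.zeros(4)
# Broadcast from the first rank of the group (global rank 2) without
# translating to a global rank by hand.
dist.broadcast(t, group=group, group_src=0)
```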
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140843
Approved by: https://github.com/kwen2501
2024-11-19 01:23:08 +00:00
efe8482c0d Add prepare_obs_or_fq_callback to quantizer (#140863)
Test Plan: CI.

Differential Revision: D65982003

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140863
Approved by: https://github.com/jerryzh168
2024-11-19 01:13:38 +00:00
c79e78b503 [inductor] Refactor MutableBox to make IRNode typing easier (#140895)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140895
Approved by: https://github.com/ezyang, https://github.com/Skylion007
2024-11-19 00:24:35 +00:00
98e441f00b [dynamo] Simplify ConstantVariable.create and ConstantVariable.__init__ (#140745)
This patch removes some redundant code paths in
`ConstantVariable.create` and `ConstantVariable.__init__`.

Closes #110871.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140745
Approved by: https://github.com/jansel
2024-11-19 00:22:50 +00:00
2da98d9757 [dynamo] Support is comparison for symnodes (#140754)
Fixes #109504.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140754
Approved by: https://github.com/williamwen42
2024-11-19 00:19:33 +00:00
175ba9fed6 [Utilization Monitor] input to disable utilization monitor (#140857)
# Overview
Currently monitor.py produces error-only results, so this PR introduces a disable-monitor option in all *-test.yml workflows. We would also like to explore how the monitor code affects benchmark results.

# Next steps
- fix the monitor.py
- enable non-benchmark tests with monitor
- investigate benchmark test behavior with monitor background job

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140857
Approved by: https://github.com/huydhn
2024-11-18 23:26:03 +00:00
48a276c5a0 log_softmax: fix meta function output argument dtype check. (#140289)
Tracking issue: #138399
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140289
Approved by: https://github.com/ezyang
ghstack dependencies: #140186, #140286, #140288
2024-11-18 23:05:29 +00:00
435286e985 Fix unary references' out dtype check. (#140288)
Tracking issue: #138399

This PR fixes a number of reference implementations (which are also used as meta
functions), making them more consistent with the CPU device. More specifically, it fixes
the operations that use the `_make_elementwise_unary_reference` decorator and don't error
on a mismatching out argument dtype, even though they do error on concrete devices
(e.g. CPU). A repro sketch follows the list below.

The fixed operations are:

- `abs`
- `ceil`
- `floor`
- `frac`
- `isneginf`
- `isposinf`
- `sgn`
- `sign`
- `signbit`
- `trunc`
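
A minimal repro sketch of the now-consistent behavior (concrete devices already raised; the refs/meta functions now match):

```python
import torch

x = torch.tensor([1.5, -2.5])
out = torch.empty(2, dtype=torch.int64)
torch.floor(x, out=out)  # raises: result type can't be cast to the output dtype
```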
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140288
Approved by: https://github.com/ezyang
ghstack dependencies: #140186, #140286
2024-11-18 23:05:29 +00:00
727f1a6da9 Revert "FlopCounterMode: Decompose ops for inference mode (#138508)"
This reverts commit f915409c26c0ba38b286c7b617880af61a6b08ba.

Reverted https://github.com/pytorch/pytorch/pull/138508 on behalf of https://github.com/jamesjwu due to Failing internal jobs ([comment](https://github.com/pytorch/pytorch/pull/138508#issuecomment-2484310587))
2024-11-18 22:59:36 +00:00
8d5b3eeaa6 Remove __start__ stack, log backward compile to empty stack (#140431)
Summary:
This diff removes "__start__" from all stacks in Pt2 Compile Events, as it's unnecessary.

It also starts logging events for backward compile, because otherwise we have no top-level event representing full backward compilation. This gives us a top-level event outside of the inductor compile.

Test Plan:
New chromium events:

https://interncache-all.fbcdn.net/manifold/perfetto-artifacts/tree/ui/index.html?url=https%3A%2F%2Finterncache-all.fbcdn.net%2Fmanifold%2Ftlparse_reports%2Ftree%2Flogs%2Fjjwu%2Fcustom%2Fstuff4%2Fchromium_events.json#!/viewer?url=https%3A%2F%2Finterncache-all.fbcdn.net%2Fmanifold%2Ftlparse_reports%2Ftree%2Flogs%2Fjjwu%2Fcustom%2Fstuff4%2Fchromium_events.json&local_cache_key

New tlparse:
https://interncache-all.fbcdn.net/manifold/tlparse_reports/tree/logs/jjwu/custom/stuff4/index.html

New scuba icicle view, still good: https://fburl.com/scuba/pt2_compile_events/z6gr3z53

Differential Revision: D65832045

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140431
Approved by: https://github.com/masnesral
2024-11-18 22:48:31 +00:00
8e439021c1 [ONNX] Support from dynamic_shapes to dynamic_axes when torch.onnx.export(fallback=True) is triggered (#139532)
Fixes #139320

### Summary:
#### (1) Add  `_rename_dynamic_shapes_with_model_inputs` for dynamic_shapes to play along with input_names

* Use the model forward signature to rename dynamic_shapes when dynamic_shapes is not nested and directly uses customized names. This solves the issue that torch.export.export expects dynamic_shapes to use only the model input names.
* If the dynamic_shapes is nested, we do nothing.

#### (2) Add `_from_dynamic_shapes_to_dynamic_axes` for fallback

* We flatten dynamic_shapes into leaves with `_pytree.tree_leaves()`
~~* If a dynamic_shapes is not nested, and defined in dict. We can use the key as the input_names, since it should be renamed by `_rename_dynamic_shapes_with_model_inputs` already.~~
* If dynamic_shapes is provided, input_names is required to assign the names, because dynamic_axes needs them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/139532
Approved by: https://github.com/justinchuby
2024-11-18 22:35:21 +00:00
72943ba823 [3.13] deal with exec() semantic change in test_cond_no_dynamo_cache_limit (#140401)
https://peps.python.org/pep-0667/ changed the semantics of `eval/exec` in 3.13 so that changes to locals no longer propagate (but globals do). This is to make the behavior predictable since in the past, the locals may or may not update based on various mysterious conditions. Other test sites may need updating too.
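
A hedged sketch of the 3.13 semantics: writes from exec() to function locals land in an independent snapshot, while writes to an explicit globals mapping still propagate.

```python
def f():
    x = 1
    exec("x = 2")  # under PEP 667, this mutates a snapshot, not the real local
    return x

print(f())  # 1 on 3.13; previously the outcome was implementation-defined

ns = {}
exec("y = 2", ns)  # writes to an explicit namespace still propagate
print(ns["y"])     # 2
```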

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140401
Approved by: https://github.com/ydwu4, https://github.com/zou3519
2024-11-18 22:06:47 +00:00
e445239bb4 [ONNX] Fix 2GB exporting crash during onnx shape type inference (#140962)
Fixes https://github.com/pytorch/pytorch/issues/132205

The regression happened after https://github.com/pytorch/pytorch/pull/128675: an ONNX shape type inference error would stop the exporting process during shape type inference. ONNX shape type inference during export only does its best to fill in the information, and should not crash the export.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140962
Approved by: https://github.com/justinchuby
2024-11-18 21:50:23 +00:00
8cd7ad8b48 [Reland][Environment Variable][5/N] Use thread-safe getenv functions (#140594)
Reland of #139762 with no bug found.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140594
Approved by: https://github.com/ezyang
2024-11-18 21:45:35 +00:00
c62da98c1a Upload all run attempts when in upload_test_stats_intermediate (#140459)
Upload all run attempts, since it can be hard to determine from HUD which run attempt to pick; HUD shows everything together.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140459
Approved by: https://github.com/huydhn
2024-11-18 21:40:10 +00:00
17bb78a3d3 Port X86_F16 from executorch half to PyTorch half (#140720)
This was added in https://github.com/pytorch/executorch/pull/1789 . I'm working on sharing Half.h with ExecuTorch, and this is a missing feature.

Differential Revision: [D65949409](https://our.internmc.facebook.com/intern/diff/D65949409/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140720
Approved by: https://github.com/malfet
ghstack dependencies: #140564, #140565, #140566, #140567
2024-11-18 21:32:44 +00:00
43de32d948 Revert "create a new torch.cuda.device_memory_used api (#140870)"
This reverts commit 478204cad68651960a979ca109e2bd4a219b0f1a.

Reverted https://github.com/pytorch/pytorch/pull/140870 on behalf of https://github.com/yuguo68 due to the test is still flaky on ROCm, test_cuda.py::TestCudaMallocAsync is not skipped with the unittest.skipIf(TEST_CUDAMALLOCASYNC ([comment](https://github.com/pytorch/pytorch/pull/140870#issuecomment-2484161914))
2024-11-18 21:26:25 +00:00
4bb1bf0573 [Docs] Remove duplicate declaration of double_tensor (#140927)
Fixes #140920

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140927
Approved by: https://github.com/malfet
2024-11-18 21:22:30 +00:00
e46af7de0c [MPS] [BE] Use direct call vs virtual (#140950)
I.e. replace `at::detail::getMPSHooks().isOnMacOSorNewer` with `is_macos_13_or_newer`, which is a direct function call instead of going through a virtual method call.
Hooks are only needed to provide a feature-agnostic interface for querying capabilities even on platforms that might not support the feature, while functions implemented in `ATen/native/xxx` should be able to call platform-specific methods directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140950
Approved by: https://github.com/Skylion007
ghstack dependencies: #140896
2024-11-18 21:01:52 +00:00
4eed438a42 Implement deterministic scan (#140887)
Fixes #89492
Uses block-wise cub primitives
On large inputs, this implementation is approximately 25% slower than the device-wide cub implementation, so it's turned on only in the cases where cub would have been used (floating-point inputs, cumsum that is effectively 1D).
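
A usage sketch, assuming a CUDA build and that the new path is gated behind the usual determinism switch:

```python
import torch

torch.use_deterministic_algorithms(True)
x = torch.rand(1_000_000, device="cuda")  # floating-point, effectively 1D
y = x.cumsum(0)  # takes the deterministic block-wise scan path
```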

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140887
Approved by: https://github.com/ezyang, https://github.com/kurtamohler
2024-11-18 20:56:14 +00:00
00c829876c Log Full Knapsack Problem Information (#140757)
Summary: When AOT_PARTITIONER_DEBUG is set to 1 and debug logging is turned on, we can now log the full input and output for each knapsack problem.

Differential Revision: D65633086

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140757
Approved by: https://github.com/jansel
2024-11-18 20:36:32 +00:00
408ad45014 [MPS][BE] Introduce mtl_setArgs (#140896)
`mtl_setArgs` is a variadic template that automates the tedious (and error-prone) process of passing arguments via a series of
```cpp
  mtl_setBuffer(encoder, b1, 0);
  mtl_setBuffer(encoder, b2, 1);
  mtl_setBytes(encoder, param, 2);
```
into a compact
```
  mtl_setArgs(encoder, b1, b2, param);
```

Introduce a few more specializations of `mtl_setArg`, such as:
 - Call `setBuffer` for `id<MTLBuffer>`
 - Copy double as float (as MPS does not support double-precision types)
 - Accept `std::optional<at::Tensor>` that will not call `setBuffer` if the optional is empty

Also, re-metaprogram `mtl_setBytes` to make it usable with any trivially copyable struct, but keep a separate implementation for containers, as uploading a `c10::SmallVector` (which is trivially copyable) would overwrite the next arguments; luckily this resulted in test failures of `test_cross_entropy_label_smoothing_weight_ignore_indices_mps`.

Introduce `has_size_type_v`, which can be used to differentiate trivially copyable `std::array` and `c10::ArrayRef` from other trivially copyable structs.
```cpp
template <typename T>
class has_size_type {
  template <typename U>
  static constexpr std::true_type check(typename U::size_type*);
  template <typename>
  static constexpr std::false_type check(...);

 public:
  static constexpr bool value = decltype(check<T>(nullptr))::value;
};

template <typename T>
constexpr bool has_size_type_v = has_size_type<T>::value;
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140896
Approved by: https://github.com/Skylion007
2024-11-18 20:35:01 +00:00
e80b1b2870 Flex + NJT: cross attention support (#140723)
Fixes #140598

Allows ragged structures for query and key+value sequence lengths to differ (i.e. supports cross attention for Flex + NJT).

Technically, this is BC-breaking thanks to arg renaming and positional arg reordering in `create_nested_block_mask()`, but Flex + NJT support isn't in a major release yet so I'm hoping we can just do it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140723
Approved by: https://github.com/drisspg
2024-11-18 19:49:45 +00:00
478204cad6 create a new torch.cuda.device_memory_used api (#140870)
Summary:
The current torch.cuda.memory_usage returns the memory utilization; more specifically, the percent of time over the past sample period during which global memory was being read or written, for Nvidia.
See more details in https://github.com/pytorch/pytorch/issues/140638
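
A hedged sketch contrasting the two APIs (names per this PR; note the change was later reverted, see above in this log):

```python
import torch

# Existing API: utilization, i.e. % of time global memory was read/written
# over the past sample period.
print(torch.cuda.memory_usage())

# New API from this PR: device memory actually in use.
print(torch.cuda.device_memory_used())
```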

Test Plan: added a new unittest

Differential Revision: D65960134

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140870
Approved by: https://github.com/ngimel
2024-11-18 19:13:43 +00:00
081c1687c8 Remove UB type punning from c10/util/floating_point_utils.h (#140567)
Accessing the inactive member of a union is undefined behavior. Fortunately, we have c10::bit_cast.

Differential Revision: [D65888680](https://our.internmc.facebook.com/intern/diff/D65888680/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140567
Approved by: https://github.com/Skylion007, https://github.com/malfet
ghstack dependencies: #140564, #140565, #140566
2024-11-18 18:41:34 +00:00
f59ec98ceb Add C10_EMBEDDED to gate ostream usage in Half/BFloat16 (#140566)
We want to use Half/BFloat16 in ExecuTorch to support shared kernel code. They will need to be used in ExecuTorch core, so they can't have streams. This diff introduces a macro to gate the stream code off.

Differential Revision: [D65888035](https://our.internmc.facebook.com/intern/diff/D65888035/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140566
Approved by: https://github.com/ezyang, https://github.com/malfet
ghstack dependencies: #140564, #140565
2024-11-18 18:41:34 +00:00
0f1a88cfba Make Context to be Device-agnostic Step by Step (2/N) (#136526)
----

- add new methods (getDefaultGenerator, getNewGenerator) to AcceleratorHooksInterface
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136526
Approved by: https://github.com/ezyang, https://github.com/EikanWang
2024-11-18 18:21:17 +00:00
cca34be584 Update XNNPACK Version (#139913)
Updating XNNPACK Version to 4ea82e595b36106653175dcb04b2aa532660d0d8

submodule update
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139913
Approved by: https://github.com/digantdesai, https://github.com/huydhn
2024-11-18 18:16:31 +00:00
e429a3b72e Move complex<Half> from Half.h to complex.h (#140565)
Executing on old TODO on the way to sharing Half.h with ExecuTorch.

Differential Revision: [D65888037](https://our.internmc.facebook.com/intern/diff/D65888037/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140565
Approved by: https://github.com/ezyang, https://github.com/malfet
ghstack dependencies: #140564
2024-11-18 15:56:21 +00:00
f630799587 move c10::overflows to its own header (#140564)
Working on moving `complex<Half>` to complex.h instead of Half.h; this depends on complex and isn't used particularly widely.

Differential Revision: [D65888038](https://our.internmc.facebook.com/intern/diff/D65888038/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140564
Approved by: https://github.com/ezyang, https://github.com/Skylion007, https://github.com/malfet
2024-11-18 15:56:21 +00:00
b379a28a95 Generalization of distributed test cases for non-CUDA devices (#138216)
# Motivation
This PR is an extension of #131758. As described in #131758, these changes aim to make distributed UTs more accessible to users of all device types.

It is a demonstration of a few changes discussed by @kwen2501 and @jgong5 in the discussion for #131758 (https://github.com/pytorch/pytorch/pull/131758#discussion_r1762422784).

This PR contains two types of changes. The first is to the common distributed folder, where we have added a new class derived from MultiProcessTestCase which abstracts out the process-group creation/deletion and other functionality for a given device.

New generalized content can be added by deriving from this base class.
It also includes other misc changes for Gaudi support.

The second changed file is test_functional_api.py, a test file in common distributed. This file is a POC for how we can use this new class to write more device-agnostic distributed test cases.

The following changes have been made to test_functional_api.py:
- Functionality has been added to test non-CUDA devices, using Intel HPU as an example
- Multiple setup steps previously required by MultiProcessTestCase have been abstracted out
- Misc adaptations to allow general calls to accelerators, adding test skips instead of explicitly skipping for multiple GPUs
- Skipifhpu flags have been added to skip a few multithreaded test cases which are not yet supported on HPUs

NOTE: Within test_functional_api, there are tests which require multithreading functions that are not yet supported on HPUs. These have been skipped for HPU using the skipHPU decorator.

I will be raising a separate PR to improve the usability of said decorators in a device-agnostic setting, in the manner suggested by @kwen2501 in a comment on this PR.

This PR is a cleaned-up version of a previous PR (#136988), which I closed due to human error. I have addressed some of the comments made by @kwen2501 there as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138216
Approved by: https://github.com/kwen2501, https://github.com/guangyey
2024-11-18 09:38:00 +00:00
06dde8c157 [1/N] Remove inclusion of ATen/core/Array.h (#122064)
The functionality of Array.h largely overlaps with std::array, and it should be safe to use std::array instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122064
Approved by: https://github.com/ezyang
2024-11-18 08:50:28 +00:00
6c6f745fa7 Revert "[1/N] Remove inclusion of ATen/core/Array.h (#122064)"
This reverts commit 486b9aaa67a02807aea06f33c009b5311caab337.

Reverted https://github.com/pytorch/pytorch/pull/122064 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but lots of compilation errors show up after this lands ([comment](https://github.com/pytorch/pytorch/pull/122064#issuecomment-2482263396))
2024-11-18 08:31:38 +00:00
43edb94f8a [Quantization][PrivateUse1] Adding more support QuantizedPrivateuse1 backends (#139860)
Here are some explanations of this PR.

1. Changes in `aten/src/ATen/core/Tensor.cpp` and `c10/core/DispatchKey.cpp`: support the toString method for the `QuantizedPrivateUse1` backend, making PyTorch print the correct backend string for it.
2. Add the header `DispatchStub.h` in `aten/src/ATen/native/quantized/IndexKernel.h`: if this header is not included, we can't utilize `masked_fill_kernel_quantized_stub` even when we include the `IndexKernel.h` header; it would throw an error during compilation.
3. Add multiple `TORCH_API`s in `aten/src/ATen/native/quantized/AffineQuantizer.h`: these functions are useful for other privateuse1 backends supporting quantization functions; if these `TORCH_API`s are missing, it throws an error at runtime (undefined symbol).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139860
Approved by: https://github.com/bdhirsh
2024-11-18 05:09:59 +00:00
1d5a8ee8fb [C10D] call destroy_process_group after MultiProcess tests (#140820)
Faced with an annoying string of warnings like this when running tests,
<img width="1644" alt="Screenshot 2024-11-15 at 11 23 21 AM" src="https://github.com/user-attachments/assets/91ff4e1d-3c29-4510-9a61-46e7df68a212">

My choices seem to be (1) call destroy_process_group() at the end of
each test fn, (2) do this in some wrapper, (3) do it in the base test
class.

Since tests in MultiProcessTestCase are responsible for calling
init_process_group themselves, they should also be responsible for
calling destroy (or at least method (3) would be asymmetric and may
result in double-destroy).

But it doesn't feel worth it to go add a destroy call manually to each
test, and try/except for a possible second destroy call seems like a
happy middle ground.
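
A hedged sketch of that middle ground (not the exact patch):

```python
import torch.distributed as dist
from torch.testing._internal.common_distributed import MultiProcessTestCase

class MyMultiProcessTest(MultiProcessTestCase):
    def tearDown(self):
        super().tearDown()
        try:
            dist.destroy_process_group()
        except Exception:
            # Group was already destroyed inside the test, or never
            # initialized; the exact exception type varies by version.
            pass
```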

Note: tests that want to ensure that destroy runs cleanly can and should
still call destroy _inside_ the test, and this change does not affect
that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140820
Approved by: https://github.com/fegin
2024-11-18 04:26:21 +00:00
a1327fac45 [Dynamo] Replace torch._dynamo.optimize() with torch.compile() [5/N] (#140663)
related commits:

- #139706
- #140238
- #140247
- #140253
- #140663
- #140688

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140663
Approved by: https://github.com/williamwen42
2024-11-18 04:11:56 +00:00
16bc82a015 [Dynamo] Replace torch._dynamo.optimize() with torch.compile() [6/N] (#140688)
related commits:

- #139706
- #140238
- #140247
- #140253
- #140663
- #140688

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140688
Approved by: https://github.com/williamwen42
2024-11-18 04:09:09 +00:00
62d2c5b667 Revert "Enable XPUEvent elapsed_time function (#134666)" (#140872)
# Motivation
The reverted PR (#134666) raises an internal UT failure on XPU.
This reverts commit 4bbd6da33101a8d709f1d2921ad8ae6f9b0dc166.
# Additional Context
refer to https://github.com/pytorch/pytorch/issues/140814

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140872
Approved by: https://github.com/EikanWang
2024-11-18 02:58:05 +00:00