Instead of `setup-miniconda`
- Remove `CONDA_RUN` macro...
- Hack the search path in `macos-test.sh` to put both python and python3 aliases first in the path (not sure what other action are messing with path environment variable)
- Skip `TestMultiprocessing.test_fs_sharing` as even though it completes, it hangs on the shutdown both in CI and in all local setups I have
- Skip `TestCppExtensionOpenRgistration.test_base_device_registration` as it hangs on the shutdown as well
Pull Request resolved: https://github.com/pytorch/pytorch/pull/155698
Approved by: https://github.com/atalman
ghstack dependencies: #155476, #155493, #155601, #155515, #155697
# Motivation
According to [[RFC] Intel GPU Runtime Upstreaming](https://github.com/pytorch/pytorch/issues/114842), the 5th runtime component we would like to upstream is `Guard`. We will cover device guard and stream guard in this PR.
# Design
Device guard is used mainly for op dispatcher in PyTorch. Currently, PyTorch already has a device guard abstraction `c10::impl::DeviceGuardImplInterface`. In our design, we will introduce an `XPUGuardImpl` class inherits from `c10::impl::DeviceGuardImplInterface`. Register `XPUGuardImpl` to PyTorch after we implement the device switch management mechanism in `XPUGuardImpl`. Besides, we will introduce `XPUGuard`, `OptionalXPUGuard`, `XPUStreamGuard`, and `OptionalXPUStreamGuard`. They are all following the design of CUDA's counterpart. The corresponding C++ file should be placed in c10/xpu/ folder.
# Additional Context
It is unnecessary to add `Guard` code to PyTorch frontend.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118523
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/jgong5, https://github.com/malfet
ghstack dependencies: #120315
Common advice we give for handling memory fragmentation issues is to
allocate a big block upfront to reserve memory which will get split up later.
For programs with changing tensor sizes this can be especially helpful to
avoid OOMs that happen the first time we see a new largest input and would
otherwise have to allocate new segments.
However the issue with allocating a block upfront is that is nearly impossible
to correctly estimate the size of that block. If too small, space in the block
will run out and the allocator will allocate separate blocks anyway. Too large,
and other non-PyTorch libraries might stop working because they cannot allocate
any memory.
This patch provides the same benefits as using a pre-allocating block but
without having to choose its size upfront. Using the cuMemMap-style APIs,
it adds the ability to expand the last block in a segment when more memory is
needed.
Compared to universally using cudaMallocAsync to avoid fragmentation,
this patch can fix this common fragmentation issue while preserving most
of the existing allocator behavior. This behavior can be enabled and disabled dynamically.
This should allow users to, for instance, allocate long-lived parameters and state in individual buffers,
and put temporary state into the large expandable blocks, further reducing
fragmentation.
See inline comments for information about the implementation and its limitations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96995
Approved by: https://github.com/eellison
See https://github.com/pytorch/pytorch/issues/94024. I disabled this test on ASAN a while ago for this exact issue. The issue, unfortunately, was hard to reproduce and flaky bot closed it 3 weeks ago. ASAN job has been hanging flakily since then, i.e. 8313becefa.
I don't want to reopen the issue and forget about it after 2 weeks, so let's disable the test for ASAN and be at peace (for now). Interesting, there are other tests here also hanging on ASAN, i.e. `test_leaf_variable_sharing`:
```
# See https://github.com/pytorch/pytorch/issues/14997
@unittest.skipIf(TEST_WITH_ASAN,
"non-deterministically hangs with ASAN")
def test_leaf_variable_sharing(self):
```
I suspect that they have the same root cause.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97742
Approved by: https://github.com/clee2000
This should help address https://github.com/pytorch/pytorch/issues/67002. At the end of these tests, any temp file `/dev/shm/torch_*` are cleaned up, but somehow it might take longer than 0.5s to finish causing the test to fail. So, the PR tries to increase this max waiting time to 5s while polling for the result every 0.5s as before
### Testing
`pytest test_multiprocessing.py -k test_fs --verbose --flake-finder` to run `test_fs`, `test_fs_is_shared`, `test_fs_pool`, `test_fs_preserve_sharing`, and `test_fs_sharing` 50 times on a dynamo shard. All passes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91459
Approved by: https://github.com/kit1980, https://github.com/ZainRizvi, https://github.com/atalman
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72689
Fix https://github.com/pytorch/pytorch/issues/69839
Should we add a private python binding to check if the bad fork guard has been set and add test in CI to make sure that it is never set on our CPU-only CI build? Not sure how flaky that will be out of CI for people that run CPU build on a machine that cuda installed...
EDIT: turns out, we already had such tests in test_multiprocessing. So should be tested and enforced now!
Test Plan: Imported from OSS
Reviewed By: soulitzer
Differential Revision: D34180243
Pulled By: albanD
fbshipit-source-id: 3284db52dcf4568362244b60e3c5657153e64fa4
(cherry picked from commit 6e23f7a33a065c2ab6a267b2c7f0ca97c24532ea)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62030
Remove dtype tracking from Python Storage interface, remove all the different `<type>Storage` classes except for `ByteStorage`, and update serialization accordingly, while maintaining as much FC/BC as possible
Fixes https://github.com/pytorch/pytorch/issues/47442
* **THE SERIALIZATION FORMAT IS FULLY FC/BC.** We worked very hard to make sure this is the case. We will probably want to break FC at some point to make the serialization structure of tensors make more sense, but not today.
* There is now only a single torch.ByteStorage class. Methods like `Tensor.set_` no longer check that the dtype of storage is appropriate.
* As we no longer know what dtype of a storage is, we've **removed** the size method from Storage, replacing it with nbytes. This is to help catch otherwise silent errors where you confuse number of elements with number of bytes.
* `Storage._new_shared` takes a `nbytes` kwarg and will reject previous positional only calls. `Storage._new_with_file` and `_set_from_file` require explicit element size arguments.
* It's no longer possible to convert storages to different types using the float/double/etc methods. Instead, do the conversion using a tensor.
* It's no longer possible to allocate a typed storage directly using FloatStorage/DoubleStorage/etc constructors. Instead, construct a tensor and extract its storage. The classes still exist but they are used purely for unpickling.
* The preexisting serialization format stores dtype with storage, and in fact this dtype is used to determine the dtype of the tensor overall.
To accommodate this case, we introduce a new TypedStorage concept that exists only during unpickling time which is used to temporarily store the dtype so we can construct a tensor. **If you overrode the handling of pickling/unpickling, you MUST add handling for TypedStorage** or your serialization code will degrade to standard file-based serialization.
Original pull request: https://github.com/pytorch/pytorch/pull/59671
Reviewed By: soulitzer, ngimel
Differential Revision: D29466819
Pulled By: ezyang
fbshipit-source-id: 4a14e5d3c2b08e06e558683d97f7378a3180b00e
Summary:
BC-breaking NOTE:
In PyTorch 1.6 bool and integral fill values given to torch.full must set the dtype our out keyword arguments. In prior versions of PyTorch these fill values would return float tensors by default, but in PyTorch 1.7 they will return a bool or long tensor, respectively. The documentation for torch.full has been updated to reflect this.
PR NOTE:
This PR causes torch.full to throw a runtime error when it would have inferred a float dtype by being given a boolean or integer value. A versioned symbol for torch.full is added to preserve the behavior of already serialized Torchscript programs. Existing tests for this behavior being deprecated have been updated to reflect it now being unsupported, and a couple new tests have been added to validate the versioned symbol behavior. The documentation of torch.full has also been updated to reflect this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40364
Differential Revision: D22176640
Pulled By: mruberry
fbshipit-source-id: b20158ebbcb4f6bf269d05a688bcf4f6c853a965