Compare commits

..

50 Commits

Author SHA1 Message Date
c263bd43e8 [inductor] use triu ref instead of lowering (#96040) (#96462)
Fixes #95958
Generated code is functionally identical with ref and lowering, only minor differences

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96040
Approved by: https://github.com/jansel

Co-authored-by: Natalia Gimelshein <ngimel@fb.com>
2023-03-09 17:42:00 -05:00
c9913cf66f Add jinja2 as mandatory dependency (#95691) (#96450)
Should fix #95671  for nightly wheels issue. v2.0.0 RC does not need this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95691
Approved by: https://github.com/malfet

Co-authored-by: Wei Wang <weiwangmeta@meta.com>
2023-03-09 17:31:12 -05:00
2f7d8bbf17 Fix expired deprecation of comparison dtype for NumPy 1.24+ (#91517) (#96452)
> The `dtype=` argument to comparison ufuncs is now applied correctly. That
> means that only `bool` and `object` are valid values and `dtype=object` is
> enforced.

Source: https://numpy.org/doc/stable/release/1.24.0-notes.html#expired-deprecations

Fixes #91516

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91517
Approved by: https://github.com/zou3519, https://github.com/huydhn

Co-authored-by: Johnson <j3.soon@msa.hinet.net>
2023-03-09 14:30:00 -08:00
ca0cdf52ca dl_open_guard should restore flag even after exception (#96231) (#96457)
I.e. follow pattern outlined in https://docs.python.org/3.8/library/contextlib.html#contextlib.contextmanager

Also, return early on non-unix platforms (when `sys.getdlopenflags` is not defined)

Fixes https://github.com/pytorch/pytorch/issues/96159

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96231
Approved by: https://github.com/atalman

(cherry picked from commit 941ff109d32d51d6e93a2c2f4a028ff3826ece31)
2023-03-09 14:29:17 -08:00
9cfa076da8 [Release/2.0] Use Triton from PYPI (#96010)
* [Release/2.0] Use Triton from PYPI

Remove `[dynamo]` extras from setup.py

Build torchtriton conda wheels as 2.0.0

* Also, upload triton conda packages to test channel
2023-03-03 20:15:48 -05:00
8e05e41dbc [Release/2.0] Use builder release branch for tests 2023-03-03 16:22:04 -08:00
d8ffc60bc1 Remove mention of dynamo.optimize() in docs (#95802) (#96007)
This should be self containable to merge but other stuff that's been bugging me is
* Instructions on debugging IMA issues
* Dynamic shape instructions
* Explaining config options better

Will look at adding a config options doc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95802
Approved by: https://github.com/svekars
2023-03-03 17:43:31 -05:00
1483723037 [MPS] Disallow reshape in slice (#95905) (#95978)
Disallow reshapes for arrayViews.
Current code allows a base shape of `[2, 4, 256]` to be sliced into `[4, 1, 256]` (view's shape) - which is not possible. Slicing a smaller dimension into a bigger one will always error out.

Fixes https://github.com/pytorch/pytorch/issues/95883
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95905
Approved by: https://github.com/razarmehr, https://github.com/kulinseth

Co-authored-by: Denis Vieriu <dvieriu@apple.com>
2023-03-03 10:15:10 -08:00
c4572aa1b7 [MPS] Add fixes for div with floor (#95869)
* [MPS] Add fixes for div with floor and raise error for div_trunc (#95769)

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95769
Approved by: https://github.com/DenisVieriu97

* Add back the unittest skip for MacOS 12.
2023-03-02 12:36:02 -08:00
82b078ba64 [MPS] Fix views with 3 or more sliced dimensions (#95762) (#95871)
Fixes https://github.com/pytorch/pytorch/issues/95482
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95762
Approved by: https://github.com/razarmehr

Co-authored-by: Denis Vieriu <dvieriu@apple.com>
2023-03-02 12:27:46 -08:00
77f7bc5f9d Remove torch._inductor.config.triton.convolution (#95840) 2023-03-02 13:49:20 -05:00
0865964576 [optim] _actually_ default to foreach (#95862)
* [optim] include nn.Parameter as foreach supported (#95811)

This PR is a result of a realization that models are NOT subscribed to the foreach defaulting as have been claimed on our documentation for months now. BIG OOPS.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95811
Approved by: https://github.com/albanD

* [optim] Widen the cases for defaulting to foreach (#95820)

Big OOP correction continued. Also added a test this time to verify the defaulting was as expected.

The key here is realizing that the grouping for foreach already assumes that the non-param tensorlists follow suit in dtype and device, so it is too narrow to check that _all_ tensors were on CUDA. The main leeway this allowed was state_steps, which are sometimes cpu tensors. Since foreach _can_ handle cpu tensors, this should not introduce breakage.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95820
Approved by: https://github.com/albanD
2023-03-02 13:33:57 -05:00
f18ac1b386 Release version of fixed nll decomp (#95853)
* fix nll loss decomposition to properly ignore ignore_index

* remove branch
2023-03-02 13:26:45 -05:00
c04134cdb1 [ROCM] Restrict pytorch rocm to only use triton 2.0.x (#95793) (#95834)
To align with upstream, we are requiring triton dependency to be between 2.0.0 and 2.1.  This will allow PyTorch 2.0 on ROCM to stay flexible enough to pick up any performance/stability improvements from Triton, without needing to cut a separate PyTorch version.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95793
Approved by: https://github.com/huydhn
2023-03-01 19:10:00 -05:00
72d0863ab2 [BE] Fix TORCH_WARN_ONCE (#95559) (#95822)
It does not take a condition as first argument, unlike `TORCH_CHECK`
Test plan, run: ` python3 -c "import torch;print(torch.arange(1., 10.,device='mps').view(3, 3).trace())"` and observe no warning

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95559
Approved by: https://github.com/Skylion007

(cherry picked from commit 9bca9df42b5898e45e2a80e03a4a4ba9a6fe654a)
2023-03-01 19:03:41 -05:00
1bd334dc25 Update copyright (#95652) (#95700)
Updating the copyright to reflect on the website.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95652
Approved by: https://github.com/atalman

Co-authored-by: Svetlana Karslioglu <svekars@fb.com>
2023-02-28 11:00:18 -05:00
93e13cd429 [MPS] Remove FFT from the fallback as its causing crashes in test_ops and TestConsistency tests. (#95625) 2023-02-27 17:26:32 -08:00
4e4d4b0afe [MPS] Add TORCH_CHECK for Convolution (#95495)
* Raise errors for Conv and remove FFTs from Fallback list.

* Move the FFT to a separate commit.
2023-02-27 17:25:14 -08:00
Wei
c4fa850827 Reserve the tensorrt backend name for torch-tensorrt (#95627) 2023-02-27 17:17:47 -08:00
36ead09873 Add float to list of allowed ops (#94910) (#95661)
By adding `BINFLOAT` op support

Fixes https://github.com/pytorch/pytorch/issues/94670
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94910
Approved by: https://github.com/albanD

Co-authored-by: Nikita Shulga <nshulga@meta.com>
2023-02-27 15:09:00 -08:00
66d23dbad7 fix spurious aot autograd warning (#95521) (#95614)
The _make_boxed logic probably needs a cleanup, but this fixes a spurious warning that we should get in before the release.

Confirmed that this used to emit a warning and no longer does:
```
import torch

lin = torch.nn.Linear(100, 10)
def f(x):
    return lin(x)

opt_f = torch.compile(f)
opt_f(torch.randn(10, 100, requires_grad=False))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95521
Approved by: https://github.com/ngimel
2023-02-27 14:31:02 -05:00
e2fff58844 [CUDA][CUBLAS] Explicitly link against cuBLASLt (#95094) (#95615)
An issue surfaced recently that revealed that we were never explicitly linking against `cuBLASLt`, this fixes it by linking explicitly rather than depending on linker magic.

CC @ptrblck @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95094
Approved by: https://github.com/malfet, https://github.com/ngimel, https://github.com/atalman

Co-authored-by: eqy <eddiey@nvidia.com>
2023-02-27 14:27:06 -05:00
735333a7ff Update triton hash (#95540) (#95577)
Fixes #95523

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95540
Approved by: https://github.com/ngimel
2023-02-27 09:03:50 -08:00
6017488801 [MPS] LSTM fixes (#95388)
* [MPS] Fix LSTM backward and forward pass (#95137)

Fixes #91694
Fixes #92615

Several transpositions were missing for backward graph in case of `batch_first=True`. The #91694 is not reproduced with `batch_first=False`.

After fixing transpose issue, I finally thought that now I can use LSTM freely in my project. And then I got horrific results on train. Seems related to #92615.

After that I decided to fix LSTM's backward step completely. I collected all my findings in this thread — seems like I succeeded

Funny enough, backward tests were completely disabled before and were not passing:
```python
    @unittest.skipIf(True, "Backward of lstm returns wrong result")
    def test_lstm_2(self, device="mps", dtype=torch.float32):
```

UPD: forward pass of multi-layer version also was wrong due to the incorrect `initState, initCell` slices. Tests were passing because states were inited with zeros. *Accidentally* fixed this too

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95137
Approved by: https://github.com/jhavukainen, https://github.com/kulinseth, https://github.com/soulitzer

* Update the allowlist for lstm_mps_backward

* More update to the BC allowlist

---------

Co-authored-by: alexdremov <dremov.me@gmail.com>
Co-authored-by: albanD <desmaison.alban@gmail.com>
2023-02-25 14:04:15 -05:00
e51e5e721c [optim] Add general documentation on our algorithm defaults (#95391) (#95516)
I added a section + table under Algorithms
https://docs-preview.pytorch.org/95391/optim.html?highlight=optim#module-torch.optim
<img width="725" alt="image" src="https://user-images.githubusercontent.com/31798555/221246256-99325a27-9016-407b-a9fe-404d61e41a82.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95391
Approved by: https://github.com/albanD
2023-02-25 14:02:57 -05:00
91739a0279 hotfix for memory leak in aot autograd induced by saving tensors for backward (#95101) (#95477)
Workaround fix in AOTAutograd for https://github.com/pytorch/pytorch/issues/94990 (see the comments for more details / discussion)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95101
Approved by: https://github.com/albanD
2023-02-24 16:40:30 -05:00
531f097b6f inductor: fix complier error when trying to vectorize logit_and and logit_or (#95361) (#95439)
Currently, `operator&& `  and `operator|| ` don't have vectorization implementation, disable them now for a quick fix for 2.0 release.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95361
Approved by: https://github.com/ngimel, https://github.com/EikanWang
2023-02-24 09:23:29 -05:00
00eb7b0d78 [optim] Set defaults to foreach, NOT fused (#95241) (#95415)
Rolling back the default change for Adam and rectifying the docs to reflect that AdamW never defaulted to fused.

Since our fused implementations are relatively newer, let's give them a longer bake-in time before flipping the switch for every user.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95241
Approved by: https://github.com/ngimel
2023-02-24 09:19:40 -05:00
2180f342c4 [SDPA] Fix bug in parsing scaled_dot_product_attention arguments (#95311) (#95397)
Fixes #95266

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95311
Approved by: https://github.com/cpuhrsch
2023-02-24 09:18:19 -05:00
a90b4f09ac use 4 warps for small block config in mm (#95383)
* use 4 warps for small block config in mm

* Update test/inductor/test_select_algorithm.py

* Update test/inductor/test_select_algorithm.py
2023-02-24 09:12:36 -05:00
1211ceeaa4 [MPS] Fix issues with max_pool2d (#95325)
* [MPS] Fix upsample for NHWC output  (#94963)

Fixes https://github.com/huggingface/diffusers/issues/941

**Before**:
<img width="1144" alt="Screenshot 2023-02-15 at 8 11 53 PM" src="https://user-images.githubusercontent.com/104024078/219266709-6a77636a-2fc0-4802-b130-85069b95953f.png">

**After**:
<img width="1144" alt="Screenshot 2023-02-15 at 8 12 02 PM" src="https://user-images.githubusercontent.com/104024078/219266694-ea743c02-fb55-44f1-b7d6-5946106527c3.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94963
Approved by: https://github.com/razarmehr

* [MPS] Move max_pool2d to mps dispatch key (#90772)

Related issue: #77394

This PR also modifies some assertions in the codegen, an explanatory comment for it has been added.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90772
Approved by: https://github.com/albanD

* [MPS] Convert output back to ChannelsLast for MaxPool2D (#94877)

Since we re-stride the indices and output in MPS pooling from ChannelsLast to Contiguous, we need to convert the results back to ChannelsLast.
This will fix the failure with test_memory_format with MaxPool2D in test_modules.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94877
Approved by: https://github.com/kulinseth, https://github.com/DenisVieriu97

---------

Co-authored-by: Denis Vieriu <104024078+DenisVieriu97@users.noreply.github.com>
Co-authored-by: Li-Huai (Allan) Lin <qqaatw@gmail.com>
Co-authored-by: Ramin Azarmehr <razarmehr@apple.com>
2023-02-24 09:10:49 -05:00
beaa5c5908 [MPS] View fixes (#95323)
* [MPS] Fix the uint8 type issue with View ops kernels (#95145)

This should fix the problem in Resnet model with image artifacts due to saturation on int8 type and also the incorrect class recognition reported in #86954.

Fixes #86954

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95145
Approved by: https://github.com/kulinseth, https://github.com/DenisVieriu97

* [MPS] Fix tensor with non-zero storage offset graph gathering (#91071)

Previously, the "can slice" flag in Placeholder constructor in `OperationUtils.mm` is conditioned on whether the numbers of dimensions of base shape and view shape are the same. This doesn't consider the situation that a view tensor could be the base tensor's sliced and then unsqueezed version, resulting in different num of dims.

For example, if we want to stack `y_mps` and `x_mps` on the last dim:
```
t_mps = torch.tensor([1, 2, 3, 4], device="mps")
x_mps = t_mps[2:]  # [3, 4]
y_mps = t_mps[:2]  # [1, 2]

res_mps = torch.stack((y_mps, x_mps), dim=-1)
```

the kernel will unsqueeze both of them on the last dim and then concatenate them, which is equivalent to:

```
res_mps = torch.cat((y_mps.unsqueeze(-1), x_mps.unsqueeze(-1)), dim=-1)
```

`x_mps.unsqueeze(-1)` is an unsqueezed and contiguous tensor with a storage offset, this kind of tensors should be sliceable without cloning its storage.

Fixes #87856
Fixes #91065

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91071
Approved by: https://github.com/kulinseth

* [MPS] Fix fill_ where input tensor has a storage offset (#95113)

Fixes #94390

Apart from fixing the issue above, this PR also fixes a bug that when an input tensor can be sliced, a sliced array view is created. This array view seems to be not writable or have a different storage from the original tensor, causing incorrect results with the in-place `fill`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95113
Approved by: https://github.com/kulinseth

* [MPS] Fix view op slicing for 2nd dim in case of 0 offset (#95381)

* Fix view op slicing for 2nd dim in case of 0 offset

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95381
Approved by: https://github.com/razarmehr

---------

Co-authored-by: Ramin Azarmehr <razarmehr@apple.com>
Co-authored-by: Li-Huai (Allan) Lin <qqaatw@gmail.com>
Co-authored-by: Denis Vieriu <104024078+DenisVieriu97@users.noreply.github.com>
2023-02-24 09:09:49 -05:00
4bd5c1e4f4 Fix warning if backend registers timer (#91702) (#95363)
currently logger timer is registered default for
cpu/cuda. for other backends, it may or may not
registers this timer. It reports warning for other backends and return which is not expected.
The above may fail, if the backends has have registered this timer. For example, HPU(habana) backend registers this timer. so, in this case it reports a warning and return which is incorrect.

Other case is where lazy backend timer is never registered. so, this returns a warning, and this is the reason the check was added, but it fails for other cases.

Add a generic check if the timer is registered, then don’t report warning.

Signed-off-by: Jeeja <jeejakp@habana.ai>

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91702
Approved by: https://github.com/kit1980
2023-02-23 18:57:09 -05:00
f3c97a4e43 Raise error on 3.11 dynamo export (#95088) (#95396)
For https://github.com/pytorch/pytorch/issues/94914. Realized that `dynamo.export` doesn't immediately raise an error when dynamo is trying to run on 3.11/windows.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95088
Approved by: https://github.com/weiwangmeta
2023-02-23 18:55:32 -05:00
30cf0e70f7 [MPS] Copy fixes for MPS backend (#95321)
* [MPS] Handle broadcasting by expanding src tensor in Copy.mm (#95272)

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95272
Approved by: https://github.com/DenisVieriu97

* [MPS] Fix copy_cast_mps() on tensors with storage offset (#95093)

- The copy_cast path requires storage_offset to be applied before casting
- This should fix some correctness issues in transformer models

Fixes #94980

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95093
Approved by: https://github.com/kulinseth

---------

Co-authored-by: Ramin Azarmehr <razarmehr@apple.com>
2023-02-23 18:17:20 -05:00
96f627dcde [MPS] Fixes in backward functions of the MPS ops (#95327)
* [MPS] Fix bilinear backward pass (#94892)

Fixes backward pass for bilinear.

Summary of changes:
- bilinear op is able to produce **contiguous, non-view** tensors with a storage offset, such as: shape=`[1, 1, 1, 1]`, `storage_offset=12`. This seems a weird case, but it is valid, and for these type of tensors we wouldn't be able to gather/scatter since we look at the view flag (which is not set here). This change looks into `storage_offset` only rather than the is_view flag which is not being set
- **reduction sum** must return a zeroed out output if passing an input with 0 elements (e.g a shape of (0, 5)).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94892
Approved by: https://github.com/kulinseth

* [MPS] Fix the crash in elu_backward() (#94923)

Fixes a crash where the inputTensor could go null and cause a crash.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94923
Approved by: https://github.com/DenisVieriu97, https://github.com/kulinseth

* [MPS] Fix prelu backward pass (#94933)

Allocate the correct shape for the weights gradient
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94933
Approved by: https://github.com/razarmehr

* [MPS] Fix embedding_backward() issue with Float16 (#94950)

- Casting the float16 input tensor to float32 and cast back the output tensor

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94950
Approved by: https://github.com/DenisVieriu97

---------

Co-authored-by: Denis Vieriu <dvieriu@apple.com>
Co-authored-by: Ramin Azarmehr <razarmehr@apple.com>
2023-02-23 16:28:33 -05:00
6f11e6d6a1 [MPS] Convolution fixes (#95318)
* [MPS] Convolution cleanup; remove unnecessary contiguous calls (#95078)

- Fixes convolution crashes in backward with weights
- Removes unnecessary contiguous calls
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95078
Approved by: https://github.com/kulinseth

* [MPS] Fix nn.functional.conv_transpose2d grad (#94871)

- add _mps_convolution_impl that takes optional shape
- for conv_tranpose2d grad, use the shape from forward pass directly
- for conv, calculate the shape from input
- remove nn.functional.conv_transpose2d grad from blocklist

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94871
Approved by: https://github.com/kulinseth

---------

Co-authored-by: Denis Vieriu <104024078+DenisVieriu97@users.noreply.github.com>
Co-authored-by: Denis Vieriu <dvieriu@apple.com>
2023-02-23 12:31:30 -05:00
fcec27f7d5 [MPS] Numerical stability and reduction fixes (#95317)
* [MPS] Fixes for LSTM. (#94889)

- Backward pass has to give explicit bias tensor of zeros if none is passed to the op or the bias gradient will not be calculated.
- Fixed bias tensor mistakenly getting overwritten to zeros
- Fixes crash when lstm op called with has_biases set to false. Change takes into account the changed shape of the input params TensorList depending on the bias flag.

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94889
Approved by: https://github.com/DenisVieriu97

* [MPS] LogSoftmax numerical stability (#95091)

Fixes #94043

Calculations are now consistent with numericaly stable formula and CPU:

$LogSoftmax(X, \dim) = X - \max(X, \dim) - \log(sum(X - \max(X, \dim), \dim))$

@malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95091
Approved by: https://github.com/malfet, https://github.com/kulinseth

* [MPS] Cast int64 to int32 for reduction ops (#95231)

- give warnings of converting int64 for reduction ops
- use cast tensor for reduction sum on trace
- unblock trace from running
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95231
Approved by: https://github.com/razarmehr

* [MPS] Fix Float16 issue with Reduction ops for macOS 12 (#94952)

This would fix the issue with `__rdiv__` with float16
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94952
Approved by: https://github.com/kulinseth

---------

Co-authored-by: alexdremov <dremov.me@gmail.com>
Co-authored-by: Denis Vieriu <dvieriu@apple.com>
Co-authored-by: Ramin Azarmehr <razarmehr@apple.com>
2023-02-23 12:27:40 -05:00
cddcb1e526 Raise error if torch.compile is called from windows or py 3.11 (#94940) (#95329)
For https://github.com/pytorch/pytorch/issues/94914

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94940
Approved by: https://github.com/albanD

Co-authored-by: William Wen <williamwen@fb.com>
2023-02-23 07:56:44 -05:00
0553b46df1 [profiler] update docs with repeat=1 (#95085) (#95242)
Specifying number of times to repeat is now required when defining the schedule.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95085
Approved by: https://github.com/aaronenyeshi
2023-02-22 16:15:37 -08:00
b45d7697a5 fix numpy1.24 deprecations in unittests (#93997) (#95150)
Fixes https://github.com/pytorch/pytorch/issues/91329

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93997
Approved by: https://github.com/ngimel, https://github.com/jerryzh168
2023-02-22 17:48:57 -05:00
7ebb309457 Revert "[CI] Use prebuilt triton from nightly repo (#94732)" (#95310)
This reverts commit 18d93cdc5dba50633a72363625601f9cf7253162.

Reverted https://github.com/pytorch/pytorch/pull/94732 on behalf of https://github.com/kit1980 due to Reverting per offline discussion to try to fix dynamo test failures after triton update

Co-authored-by: PyTorch MergeBot <pytorchmergebot@users.noreply.github.com>
2023-02-22 15:52:19 -05:00
cedfcdab46 Upgrade setuptools before building wheels (#95288)
* [BE] Cleanup triton builds (#95026)

Remove Python-3.7 clause
Do not install llvm-11, as llvm-14 is installed by triton/python/setup.py script

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95026
Approved by: https://github.com/osalpekar, https://github.com/weiwangmeta

* Upgrade setuptools before building wheels (#95265)

Should fix https://github.com/pytorch/builder/issues/1318

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95265
Approved by: https://github.com/ngimel

---------

Co-authored-by: Nikita Shulga <nshulga@fb.com>
Co-authored-by: Wei Wang <weiwangmeta@meta.com>
2023-02-22 12:23:21 -08:00
0b21e62406 Update triton hash (#95247) (#95285)
Should fix #95082
This hash commit is supposed to fix sm_89 issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95247
Approved by: https://github.com/ngimel, https://github.com/seemethere

Co-authored-by: Wei Wang <weiwangmeta@meta.com>
2023-02-22 12:19:13 -08:00
91994c999f Deprecate Caffe2 ONNX exporter (#95071) 2023-02-21 11:01:34 -08:00
0b82f58866 inductor(cpu): fix C++ compile error when sigmoid's post ops is a reduction op (#94890) (#95054)
For timm **nfnet_l0** model. CPU path has the following error: `torch._dynamo.exc.BackendCompilerFailed: inductor raised CppCompileError: C++ compile error`.

There has a simple test case:

```
def fn(x):
    x = torch.ops.aten.sigmoid.default(x)
    return torch.ops.aten.mean.dim(x, [-1, -2], True)

x = torch.randn((1, 8, 8, 8))
opt_fn = torch._dynamo.optimize("inductor")(fn)
opt_fn(x)

real_out = fn(x)
compiled_out = opt_fn(x)
tol = 0.0001
print(torch.allclose(real_out, compiled_out, atol=tol, rtol=tol))

```

before:

```
extern "C" void kernel(float* __restrict__ in_out_ptr0,
                       const float* __restrict__ in_ptr0)
{
    auto out_ptr0 = in_out_ptr0;
    {
        #pragma GCC ivdep
        for(long i0=0; i0<8; i0+=1)
        {
            {
                #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}})
                float tmp2 = 0;
                auto tmp2_vec = at::vec::Vectorized<float>(tmp2);
                for(long i1=0; i1<4; i1+=1)
                {
                    auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (16*i1) + (64*i0));
                    auto tmp1 = decltype(tmp0)(1)/(decltype(tmp0)(1) + tmp0.neg().exp());
                    tmp2_vec += tmp1;
                }
                #pragma omp simd simdlen(8)  reduction(+:tmp3)
                for(long i1=64; i1<64; i1+=1)
                {
                    auto tmp0 = in_ptr0[i1 + (64*i0)];
                    auto tmp1 = std::exp(-tmp0);
                    auto tmp2 = 1 / (1 + tmp1);
                    tmp3 += tmp2;
                }
                tmp2 += at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp2_vec);
                out_ptr0[i0] = tmp3;
            }
        }
    }
    {
        for(long i0=0; i0<0; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr0 + 16*i0);
            auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(64));
            auto tmp2 = tmp0 / tmp1;
            tmp2.store(in_out_ptr0 + 16*i0);
        }
        #pragma omp simd simdlen(8)
        for(long i0=0; i0<8; i0+=1)
        {
            auto tmp0 = out_ptr0[i0];
            auto tmp1 = static_cast<float>(64);
            auto tmp2 = tmp0 / tmp1;
            in_out_ptr0[i0] = tmp2;
        }
    }
}
```

after:
```
extern "C" void kernel(float* __restrict__ in_out_ptr0,
                       const float* __restrict__ in_ptr0)
{
    auto out_ptr0 = in_out_ptr0;
    #pragma omp parallel num_threads(40)
    {
        {
            #pragma omp for
            for(long i0=0; i0<8; i0+=1)
            {
                {
                    #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}})
                    float tmp2 = 0;
                    auto tmp2_vec = at::vec::Vectorized<float>(tmp2);
                    for(long i1=0; i1<4; i1+=1)
                    {
                        auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (16*i1) + (64*i0));
                        auto tmp1 = decltype(tmp0)(1)/(decltype(tmp0)(1) + tmp0.neg().exp());
                        tmp2_vec += tmp1;
                    }
                    #pragma omp simd simdlen(8)  reduction(+:tmp2)
                    for(long i1=64; i1<64; i1+=1)
                    {
                        auto tmp0 = in_ptr0[i1 + (64*i0)];
                        auto tmp1 = decltype(tmp0)(1) / (decltype(tmp0)(1) + std::exp(-tmp0));
                        tmp2 += tmp1;
                    }
                    tmp2 += at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp2_vec);
                    out_ptr0[i0] = tmp2;
                }
            }
        }
        #pragma omp single
        {
            {
                for(long i0=0; i0<0; i0+=1)
                {
                    auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr0 + 16*i0);
                    auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(64));
                    auto tmp2 = tmp0 / tmp1;
                    tmp2.store(in_out_ptr0 + 16*i0);
                }
                #pragma omp simd simdlen(8)
                for(long i0=0; i0<8; i0+=1)
                {
                    auto tmp0 = out_ptr0[i0];
                    auto tmp1 = static_cast<float>(64);
                    auto tmp2 = tmp0 / tmp1;
                    in_out_ptr0[i0] = tmp2;
                }
            }
        }
    }
}
''')
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94890
Approved by: https://github.com/EikanWang, https://github.com/jgong5, https://github.com/lezcano
2023-02-20 14:35:40 -05:00
1f7ab1c823 fix performance issue in torch.sparse.mm reduce mode (#94969) (#95018)
Fix performance bug for `torch.sparse.mm()` with reduce flag.

Found this bug within internal benchmarking.
Made a mistake when updating previous patch which causes load imbalance between threads:

Test on ogbn-products datasets on Xeon CLX with 24 cores:

#### before
```
sparse.mm: mean: 1156.148 ms
sparse.mm: sum: 1163.754 ms
sparse.mm: (using mkl): 703.227 ms
```

#### after
```
sparse.mm: mean: 662.578 ms
sparse.mm: sum: 662.301 ms
sparse.mm: (using mkl): 700.178 ms
```

The result also indicates that the current spmm kernel is no worse than MKL's sparse_mm .

Also update results on `pyg benchmark` with:
```
python gnn.py --use_sage --epochs=3 --runs=1 --inference
```

* Out of box: `13.32s`
* Without the fix in this PR: `5.87s`
* With the fix in this PR: `3.19s`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94969
Approved by: https://github.com/jgong5
2023-02-20 14:30:52 -05:00
52a27dd0ee [release only change] Add change to template - fix lint (#94981) 2023-02-16 09:53:48 -05:00
e0c728c545 Changes for release 2.0 only (#94934)
* Changes for release 2.0 only

* Delete the refs during pytorch checkout

* Bug fix and add xla r2.0 hash
2023-02-15 18:08:38 -05:00
dbcd11f3a7 try to fix OSS CI error (#94785) (#94936)
Differential Revision: D43259005

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94785
Approved by: https://github.com/weiwangmeta, https://github.com/digantdesai

Co-authored-by: Cuiqing Li <cuiqingli123@meta.com>
2023-02-15 17:47:26 -05:00
5017 changed files with 189541 additions and 582011 deletions

View File

@ -1,3 +0,0 @@
# We do not use this library in our Bazel build. It contains an
# infinitely recursing symlink that makes Bazel very unhappy.
third_party/ittapi/

View File

@ -69,6 +69,10 @@ build --per_file_copt='^//.*\.(cpp|cc)$'@-Werror=all
# The following warnings come from -Wall. We downgrade them from error
# to warnings here.
#
# sign-compare has a tremendous amount of violations in the
# codebase. It will be a lot of work to fix them, just disable it for
# now.
build --per_file_copt='^//.*\.(cpp|cc)$'@-Wno-sign-compare
# We intentionally use #pragma unroll, which is compiler specific.
build --per_file_copt='^//.*\.(cpp|cc)$'@-Wno-error=unknown-pragmas
@ -96,9 +100,6 @@ build --per_file_copt='^//.*\.(cpp|cc)$'@-Wno-unused-parameter
# likely want to have this disabled for the most part.
build --per_file_copt='^//.*\.(cpp|cc)$'@-Wno-missing-field-initializers
build --per_file_copt='^//.*\.(cpp|cc)$'@-Wno-unused-function
build --per_file_copt='^//.*\.(cpp|cc)$'@-Wno-unused-variable
build --per_file_copt='//:aten/src/ATen/RegisterCompositeExplicitAutograd\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:aten/src/ATen/RegisterCompositeImplicitAutograd\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:aten/src/ATen/RegisterMkldnnCPU\.cpp$'@-Wno-error=unused-function

View File

@ -1 +1 @@
6.1.1
4.2.1

View File

@ -14,7 +14,6 @@
[cxx]
cxxflags = -std=c++17
ldflags = -Wl,--no-undefined
should_remap_host_platform = true
cpp = /usr/bin/clang
cc = /usr/bin/clang

View File

@ -1,7 +1,7 @@
# Docker images for GitHub CI
# Docker images for Jenkins
This directory contains everything needed to build the Docker images
that are used in our CI.
that are used in our CI
The Dockerfiles located in subdirectories are parameterized to
conditionally run build stages depending on build arguments passed to
@ -12,13 +12,13 @@ each image as the `BUILD_ENVIRONMENT` environment variable.
See `build.sh` for valid build environments (it's the giant switch).
Docker builds are now defined with `.circleci/cimodel/data/simple/docker_definitions.py`
## Contents
* `build.sh` -- dispatch script to launch all builds
* `common` -- scripts used to execute individual Docker build stages
* `ubuntu` -- Dockerfile for Ubuntu image for CPU build and test jobs
* `ubuntu-cuda` -- Dockerfile for Ubuntu image with CUDA support for nvidia-docker
* `ubuntu-rocm` -- Dockerfile for Ubuntu image with ROCm support
## Usage

View File

@ -53,7 +53,7 @@ dependencies {
implementation 'androidx.appcompat:appcompat:1.0.0'
implementation 'com.facebook.fbjni:fbjni-java-only:0.2.2'
implementation 'com.google.code.findbugs:jsr305:3.0.1'
implementation 'com.facebook.soloader:nativeloader:0.10.5'
implementation 'com.facebook.soloader:nativeloader:0.10.4'
implementation 'junit:junit:' + rootProject.junitVersion
implementation 'androidx.test:core:' + rootProject.coreVersion

View File

@ -46,7 +46,9 @@ if [[ "$image" == *xla* ]]; then
exit 0
fi
if [[ "$image" == *-focal* ]]; then
if [[ "$image" == *-bionic* ]]; then
UBUNTU_VERSION=18.04
elif [[ "$image" == *-focal* ]]; then
UBUNTU_VERSION=20.04
elif [[ "$image" == *-jammy* ]]; then
UBUNTU_VERSION=22.04
@ -79,18 +81,18 @@ fi
# CMake 3.18 is needed to support CUDA17 language variant
CMAKE_VERSION=3.18.5
_UCX_COMMIT=00bcc6bb18fc282eb160623b4c0d300147f579af
_UCC_COMMIT=7cb07a76ccedad7e56ceb136b865eb9319c258ea
_UCX_COMMIT=31e74cac7bee0ef66bef2af72e7d86d9c282e5ab
_UCC_COMMIT=1c7a7127186e7836f73aafbd7697bbc274a77eee
# It's annoying to rename jobs every time you want to rewrite a
# configuration, so we hardcode everything here rather than do it
# from scratch
case "$image" in
pytorch-linux-focal-cuda12.1-cudnn8-py3-gcc9)
CUDA_VERSION=12.1.1
pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7)
CUDA_VERSION=11.6.2
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
@ -98,13 +100,12 @@ case "$image" in
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-cuda12.1-cudnn8-py3-gcc9-inductor-benchmarks)
CUDA_VERSION=12.1.1
pytorch-linux-bionic-cuda11.7-cudnn8-py3-gcc7)
CUDA_VERSION=11.7.0
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
@ -112,24 +113,8 @@ case "$image" in
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-focal-cuda11.8-cudnn8-py3-gcc9)
CUDA_VERSION=11.8.0
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-cuda11.8-cudnn8-py3-gcc7)
pytorch-linux-bionic-cuda11.8-cudnn8-py3-gcc7)
CUDA_VERSION=11.8.0
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.10
@ -141,36 +126,14 @@ case "$image" in
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-cuda11.8-cudnn8-py3-gcc7-inductor-benchmarks)
CUDA_VERSION=11.8.0
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=7
pytorch-linux-focal-py3-clang7-asan)
ANACONDA_PYTHON_VERSION=3.9
CLANG_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-focal-cuda12.1-cudnn8-py3-gcc9)
CUDA_VERSION=12.1.1
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-py3-clang10-onnx)
ANACONDA_PYTHON_VERSION=3.8
@ -179,10 +142,9 @@ case "$image" in
DB=yes
VISION=yes
CONDA_CMAKE=yes
ONNX=yes
;;
pytorch-linux-focal-py3-clang7-android-ndk-r19c)
ANACONDA_PYTHON_VERSION=3.8
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=7
LLVMDEV=yes
PROTOBUF=yes
@ -191,38 +153,45 @@ case "$image" in
GRADLE_VERSION=6.8.3
NINJA_VERSION=1.9.0
;;
pytorch-linux-focal-py3.8-clang10)
pytorch-linux-bionic-py3.8-clang9)
ANACONDA_PYTHON_VERSION=3.8
CLANG_VERSION=10
CLANG_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
VULKAN_SDK_VERSION=1.2.162.1
SWIFTSHADER=yes
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-py3.11-clang10)
pytorch-linux-bionic-py3.11-clang9)
ANACONDA_PYTHON_VERSION=3.11
CLANG_VERSION=10
CLANG_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
VULKAN_SDK_VERSION=1.2.162.1
SWIFTSHADER=yes
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-py3.8-gcc9)
pytorch-linux-bionic-py3.8-gcc9)
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-rocm-n-1-py3)
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=5.3
NINJA_VERSION=1.9.0
CONDA_CMAKE=yes
;;
pytorch-linux-focal-rocm-n-py3)
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=9
PROTOBUF=yes
@ -231,18 +200,6 @@ case "$image" in
ROCM_VERSION=5.4.2
NINJA_VERSION=1.9.0
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-rocm-n-py3)
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=5.6
NINJA_VERSION=1.9.0
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-py3.8-gcc7)
ANACONDA_PYTHON_VERSION=3.8
@ -252,20 +209,24 @@ case "$image" in
VISION=yes
KATEX=yes
CONDA_CMAKE=yes
TRITON=yes
DOCS=yes
;;
pytorch-linux-jammy-py3.8-gcc11-inductor-benchmarks)
pytorch-linux-jammy-cuda11.6-cudnn8-py3.8-clang12)
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=11
CUDA_VERSION=11.6
CUDNN_VERSION=8
CLANG_VERSION=12
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-jammy-cuda11.7-cudnn8-py3.8-clang12)
ANACONDA_PYTHON_VERSION=3.8
CUDA_VERSION=11.7
CUDNN_VERSION=8
CLANG_VERSION=12
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
CONDA_CMAKE=yes
TRITON=yes
DOCS=yes
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-jammy-cuda11.8-cudnn8-py3.8-clang12)
ANACONDA_PYTHON_VERSION=3.8
@ -275,27 +236,6 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
TRITON=yes
;;
pytorch-linux-jammy-py3-clang12-asan)
ANACONDA_PYTHON_VERSION=3.9
CLANG_VERSION=12
PROTOBUF=yes
DB=yes
VISION=yes
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-jammy-py3.8-gcc11)
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=11
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
CONDA_CMAKE=yes
TRITON=yes
DOCS=yes
;;
pytorch-linux-focal-linter)
# TODO: Use 3.9 here because of this issue https://github.com/python/mypy/issues/13627.
@ -320,7 +260,6 @@ case "$image" in
if [[ "$image" == *rocm* ]]; then
extract_version_from_image_name rocm ROCM_VERSION
NINJA_VERSION=1.9.0
TRITON=yes
fi
if [[ "$image" == *centos7* ]]; then
NINJA_VERSION=1.10.2
@ -384,15 +323,11 @@ docker build \
--build-arg "NINJA_VERSION=${NINJA_VERSION:-}" \
--build-arg "KATEX=${KATEX:-}" \
--build-arg "ROCM_VERSION=${ROCM_VERSION:-}" \
--build-arg "PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH:-gfx906;gfx90a}" \
--build-arg "PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH:-gfx906}" \
--build-arg "IMAGE_NAME=${IMAGE_NAME}" \
--build-arg "UCX_COMMIT=${UCX_COMMIT}" \
--build-arg "UCC_COMMIT=${UCC_COMMIT}" \
--build-arg "CONDA_CMAKE=${CONDA_CMAKE}" \
--build-arg "TRITON=${TRITON}" \
--build-arg "ONNX=${ONNX}" \
--build-arg "DOCS=${DOCS}" \
--build-arg "INDUCTOR_BENCHMARKS=${INDUCTOR_BENCHMARKS}" \
-f $(dirname ${DOCKERFILE})/Dockerfile \
-t "$tmp_tag" \
"$@" \

60
.ci/docker/build_docker.sh Executable file
View File

@ -0,0 +1,60 @@
#!/bin/bash
set -ex
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*)
}
# If UPSTREAM_BUILD_ID is set (see trigger job), then we can
# use it to tag this build with the same ID used to tag all other
# base image builds. Also, we can try and pull the previous
# image first, to avoid rebuilding layers that haven't changed.
#until we find a way to reliably reuse previous build, this last_tag is not in use
# last_tag="$(( CIRCLE_BUILD_NUM - 1 ))"
tag="${DOCKER_TAG}"
registry="308535385114.dkr.ecr.us-east-1.amazonaws.com"
image="${registry}/pytorch/${IMAGE_NAME}"
login() {
aws ecr get-authorization-token --region us-east-1 --output text --query 'authorizationData[].authorizationToken' |
base64 -d |
cut -d: -f2 |
docker login -u AWS --password-stdin "$1"
}
# Only run these steps if not on github actions
if [[ -z "${GITHUB_ACTIONS}" ]]; then
# Retry on timeouts (can happen on job stampede).
retry login "${registry}"
# Logout on exit
trap "docker logout ${registry}" EXIT
fi
# Try to pull the previous image (perhaps we can reuse some layers)
# if [ -n "${last_tag}" ]; then
# docker pull "${image}:${last_tag}" || true
# fi
# Build new image
./build.sh ${IMAGE_NAME} -t "${image}:${tag}"
# Only push if `DOCKER_SKIP_PUSH` = false
if [ "${DOCKER_SKIP_PUSH:-true}" = "false" ]; then
# Only push if docker image doesn't exist already.
# ECR image tags are immutable so this will avoid pushing if only just testing if the docker jobs work
# NOTE: The only workflow that should push these images should be the docker-builds.yml workflow
if ! docker manifest inspect "${image}:${tag}" >/dev/null 2>/dev/null; then
docker push "${image}:${tag}"
fi
fi
if [ -z "${DOCKER_SKIP_S3_UPLOAD:-}" ]; then
trap "rm -rf ${IMAGE_NAME}:${tag}.tar" EXIT
docker save -o "${IMAGE_NAME}:${tag}.tar" "${image}:${tag}"
aws s3 cp "${IMAGE_NAME}:${tag}.tar" "s3://ossci-linux-build/pytorch/base/${IMAGE_NAME}:${tag}.tar" --acl public-read
fi

View File

@ -64,9 +64,9 @@ ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV and ffmpeg
ARG VISION
COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
COPY ./common/install_vision.sh install_vision.sh
RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh cache_vision_models.sh common_utils.sh
RUN rm install_vision.sh
ENV INSTALLED_VISION ${VISION}
# Install rocm

View File

@ -1 +0,0 @@
4.27.4

View File

@ -1 +0,0 @@
b9d43c7dcac1fe05e851dd7be7187b108af593d2

View File

@ -1 +0,0 @@
34f8189eae57a23cc15b4b4f032fe25757e0db8e

View File

@ -1 +0,0 @@
e6216047b8b0aef1fe8da6ca8667a3ad0a016411

View File

@ -1,18 +0,0 @@
#!/bin/bash
set -ex
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
# Cache the test models at ~/.cache/torch/hub/
IMPORT_SCRIPT_FILENAME="/tmp/torchvision_import_script.py"
as_jenkins echo 'import torchvision; torchvision.models.mobilenet_v2(pretrained=True); torchvision.models.mobilenet_v3_large(pretrained=True);' > "${IMPORT_SCRIPT_FILENAME}"
pip_install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cpu
# Very weird quoting behavior here https://github.com/conda/conda/issues/10972,
# so echo the command to a file and run the file instead
conda_run python "${IMPORT_SCRIPT_FILENAME}"
# Cleaning up
conda_run pip uninstall -y torch torchvision
rm "${IMPORT_SCRIPT_FILENAME}" || true

View File

@ -13,7 +13,7 @@ as_jenkins() {
# NB: Pass on PATH and LD_LIBRARY_PATH to sudo invocation
# NB: This must be run from a directory that jenkins has access to,
# works around https://github.com/conda/conda-package-handling/pull/34
$SUDO -E -H -u jenkins env -u SUDO_UID -u SUDO_GID -u SUDO_COMMAND -u SUDO_USER env "PATH=$PATH" "LD_LIBRARY_PATH=$LD_LIBRARY_PATH" $*
$SUDO -H -u jenkins env -u SUDO_UID -u SUDO_GID -u SUDO_COMMAND -u SUDO_USER env "PATH=$PATH" "LD_LIBRARY_PATH=$LD_LIBRARY_PATH" $*
}
conda_install() {
@ -30,7 +30,3 @@ conda_run() {
pip_install() {
as_jenkins conda run -n py_$ANACONDA_PYTHON_VERSION pip install --progress-bar off $*
}
get_pinned_commit() {
cat "${1}".txt
}

View File

@ -107,6 +107,3 @@ chgrp -R jenkins /var/lib/jenkins/.gradle
popd
rm -rf /var/lib/jenkins/.gradle/daemon
# Cache vision models used by the test
source "$(dirname "${BASH_SOURCE[0]}")/cache_vision_models.sh"

View File

@ -31,13 +31,10 @@ install_ubuntu() {
maybe_libomp_dev=""
fi
# HACK: UCC testing relies on libnccl library from NVIDIA repo, and version 2.16 crashes
# See https://github.com/pytorch/pytorch/pull/105260#issuecomment-1673399729
if [[ "$UBUNTU_VERSION" == "20.04"* && "$CUDA_VERSION" == "11.8"* ]]; then
maybe_libnccl_dev="libnccl2=2.15.5-1+cuda11.8 libnccl-dev=2.15.5-1+cuda11.8 --allow-downgrades --allow-change-held-packages"
else
maybe_libnccl_dev=""
fi
# TODO: Remove this once nvidia package repos are back online
# Comment out nvidia repositories to prevent them from getting apt-get updated, see https://github.com/pytorch/pytorch/issues/74968
# shellcheck disable=SC2046
sed -i 's/.*nvidia.*/# &/' $(find /etc/apt/ -type f -name "*.list")
# Install common dependencies
apt-get update
@ -66,7 +63,6 @@ install_ubuntu() {
libasound2-dev \
libsndfile-dev \
${maybe_libomp_dev} \
${maybe_libnccl_dev} \
software-properties-common \
wget \
sudo \
@ -81,6 +77,20 @@ install_ubuntu() {
# see: https://github.com/pytorch/pytorch/issues/65931
apt-get install -y libgnutls30
# cuda-toolkit does not work with gcc-11.2.0 which is default in Ubunutu 22.04
# see: https://github.com/NVlabs/instant-ngp/issues/119
if [[ "$UBUNTU_VERSION" == "22.04"* ]]; then
apt-get install -y g++-10
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-10 30
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-10 30
update-alternatives --install /usr/bin/gcov gcov /usr/bin/gcov-10 30
# https://www.spinics.net/lists/libreoffice/msg07549.html
sudo rm -rf /usr/lib/gcc/x86_64-linux-gnu/11
wget https://github.com/gcc-mirror/gcc/commit/2b2d97fc545635a0f6aa9c9ee3b017394bc494bf.patch -O noexecpt.patch
sudo patch /usr/include/c++/10/bits/range_access.h noexecpt.patch
fi
# Cleanup package manager
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

View File

@ -36,11 +36,14 @@ if [ -n "$ROCM_VERSION" ]; then
curl --retry 3 http://repo.radeon.com/misc/.sccache_amd/sccache -o /opt/cache/bin/sccache
else
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
# TODO: Install the pre-built binary from S3 as building from source
# https://github.com/pytorch/sccache has started failing mysteriously
# in which sccache server couldn't start with the following error:
# sccache: error: Invalid argument (os error 22)
install_binary
case "$ID" in
ubuntu)
install_ubuntu
;;
*)
install_binary
;;
esac
fi
chmod a+x /opt/cache/bin/sccache

View File

@ -4,7 +4,10 @@ set -ex
if [ -n "$CLANG_VERSION" ]; then
if [[ $CLANG_VERSION == 9 && $UBUNTU_VERSION == 18.04 ]]; then
if [[ $CLANG_VERSION == 7 && $UBUNTU_VERSION == 16.04 ]]; then
wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
sudo apt-add-repository "deb http://apt.llvm.org/xenial/ llvm-toolchain-xenial-7 main"
elif [[ $CLANG_VERSION == 9 && $UBUNTU_VERSION == 18.04 ]]; then
sudo apt-get update
# gpg-agent is not available by default on 18.04
sudo apt-get install -y --no-install-recommends gpg-agent
@ -25,11 +28,11 @@ if [ -n "$CLANG_VERSION" ]; then
fi
# Use update-alternatives to make this version the default
# TODO: Decide if overriding gcc as well is a good idea
# update-alternatives --install /usr/bin/gcc gcc /usr/bin/clang-"$CLANG_VERSION" 50
# update-alternatives --install /usr/bin/g++ g++ /usr/bin/clang++-"$CLANG_VERSION" 50
update-alternatives --install /usr/bin/clang clang /usr/bin/clang-"$CLANG_VERSION" 50
update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-"$CLANG_VERSION" 50
# Override cc/c++ to clang as well
update-alternatives --install /usr/bin/cc cc /usr/bin/clang 50
update-alternatives --install /usr/bin/c++ c++ /usr/bin/clang++ 50
# clang's packaging is a little messed up (the runtime libs aren't
# added into the linker path), so give it a little help

View File

@ -7,7 +7,6 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
BASE_URL="https://repo.anaconda.com/miniconda"
MAJOR_PYTHON_VERSION=$(echo "$ANACONDA_PYTHON_VERSION" | cut -d . -f 1)
MINOR_PYTHON_VERSION=$(echo "$ANACONDA_PYTHON_VERSION" | cut -d . -f 2)
case "$MAJOR_PYTHON_VERSION" in
2)
@ -53,23 +52,21 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
# Install PyTorch conda deps, as per https://github.com/pytorch/pytorch README
CONDA_COMMON_DEPS="astunparse pyyaml mkl=2021.4.0 mkl-include=2021.4.0 setuptools"
if [ "$ANACONDA_PYTHON_VERSION" = "3.11" ]; then
conda_install numpy=1.23.5 ${CONDA_COMMON_DEPS}
# Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
# TODO: Stop using `-c malfet`
conda_install numpy=1.23.5 ${CONDA_COMMON_DEPS} llvmdev=8.0.0 -c malfet
elif [ "$ANACONDA_PYTHON_VERSION" = "3.10" ]; then
conda_install numpy=1.21.2 ${CONDA_COMMON_DEPS}
# Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
conda_install numpy=1.21.2 ${CONDA_COMMON_DEPS} llvmdev=8.0.0
elif [ "$ANACONDA_PYTHON_VERSION" = "3.9" ]; then
conda_install numpy=1.21.2 ${CONDA_COMMON_DEPS}
# Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
conda_install numpy=1.19.2 ${CONDA_COMMON_DEPS} llvmdev=8.0.0
elif [ "$ANACONDA_PYTHON_VERSION" = "3.8" ]; then
conda_install numpy=1.21.2 ${CONDA_COMMON_DEPS}
# Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
conda_install numpy=1.18.5 ${CONDA_COMMON_DEPS} llvmdev=8.0.0
else
# Install `typing-extensions` for 3.7
conda_install numpy=1.21.2 ${CONDA_COMMON_DEPS} typing-extensions
fi
# This is only supported in 3.8 upward
if [ "$MINOR_PYTHON_VERSION" -gt "7" ]; then
# Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
# and libpython-static for torch deploy
conda_install llvmdev=8.0.0 "libpython-static=${ANACONDA_PYTHON_VERSION}"
conda_install numpy=1.18.5 ${CONDA_COMMON_DEPS} typing-extensions
fi
# Use conda cmake in some cases. Conda cmake will be newer than our supported
@ -97,22 +94,5 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
pip_install scikit-learn==0.20.3
fi
if [ -n "$DOCS" ]; then
apt-get update
apt-get -y install expect-dev
# We are currently building docs with python 3.8 (min support version)
pip_install -r /opt/conda/requirements-docs.txt
fi
# HACK HACK HACK
# gcc-9 for ubuntu-18.04 from http://ppa.launchpad.net/ubuntu-toolchain-r/test/ubuntu
# Pulls llibstdc++6 13.1.0-8ubuntu1~18.04 which is too new for conda
# So remove libstdc++6.so.3.29 installed by https://anaconda.org/anaconda/libstdcxx-ng/files?version=11.2.0
# Same is true for gcc-12 from Ubuntu-22.04
if grep -e [12][82].04.[623] /etc/issue >/dev/null; then
rm /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/lib/libstdc++.so.6
fi
popd
fi

View File

@ -4,9 +4,9 @@ if [[ ${CUDNN_VERSION} == 8 ]]; then
# cuDNN license: https://developer.nvidia.com/cudnn/license_agreement
mkdir tmp_cudnn && cd tmp_cudnn
CUDNN_NAME="cudnn-linux-x86_64-8.3.2.44_cuda11.5-archive"
if [[ ${CUDA_VERSION:0:4} == "12.1" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-8.9.2.26_cuda12-archive"
curl --retry 3 -OLs https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/${CUDNN_NAME}.tar.xz
if [[ ${CUDA_VERSION:0:4} == "11.7" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-8.5.0.96_cuda11-archive"
curl --retry 3 -OLs https://ossci-linux.s3.amazonaws.com/${CUDNN_NAME}.tar.xz
elif [[ ${CUDA_VERSION:0:4} == "11.8" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-8.7.0.84_cuda11-archive"
curl --retry 3 -OLs https://developer.download.nvidia.com/compute/redist/cudnn/v8.7.0/local_installers/11.8/${CUDNN_NAME}.tar.xz

View File

@ -7,7 +7,7 @@ if [ -n "$KATEX" ]; then
# Ignore error if gpg-agent doesn't exist (for Ubuntu 16.04)
apt-get install -y gpg-agent || :
curl --retry 3 -sL https://deb.nodesource.com/setup_16.x | sudo -E bash -
curl --retry 3 -sL https://deb.nodesource.com/setup_12.x | sudo -E bash -
sudo apt-get install -y nodejs
curl --retry 3 -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add -

View File

@ -7,10 +7,17 @@ if [ -n "$GCC_VERSION" ]; then
# Need the official toolchain repo to get alternate packages
add-apt-repository ppa:ubuntu-toolchain-r/test
apt-get update
apt-get install -y g++-$GCC_VERSION
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-"$GCC_VERSION" 50
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-"$GCC_VERSION" 50
update-alternatives --install /usr/bin/gcov gcov /usr/bin/gcov-"$GCC_VERSION" 50
if [[ "$UBUNTU_VERSION" == "16.04" && "${GCC_VERSION:0:1}" == "5" ]]; then
apt-get install -y g++-5=5.4.0-6ubuntu1~16.04.12
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 50
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 50
update-alternatives --install /usr/bin/gcov gcov /usr/bin/gcov-5 50
else
apt-get install -y g++-$GCC_VERSION
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-"$GCC_VERSION" 50
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-"$GCC_VERSION" 50
update-alternatives --install /usr/bin/gcov gcov /usr/bin/gcov-"$GCC_VERSION" 50
fi
# Cleanup package manager

View File

@ -1,24 +0,0 @@
#!/bin/bash
set -ex
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
function install_huggingface() {
local version
version=$(get_pinned_commit huggingface)
pip_install pandas==2.0.3
pip_install "transformers==${version}"
}
function install_timm() {
local commit
commit=$(get_pinned_commit timm)
pip_install pandas==2.0.3
pip_install "git+https://github.com/rwightman/pytorch-image-models@${commit}"
}
# Pango is needed for weasyprint which is needed for doctr
conda_install pango
install_huggingface
# install_timm

View File

@ -1,50 +0,0 @@
#!/bin/bash
set -ex
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
retry () {
"$@" || (sleep 10 && "$@") || (sleep 20 && "$@") || (sleep 40 && "$@")
}
# A bunch of custom pip dependencies for ONNX
pip_install \
beartype==0.10.4 \
filelock==3.9.0 \
flatbuffers==2.0 \
mock==5.0.1 \
ninja==1.10.2 \
networkx==2.0 \
numpy==1.22.4
# ONNXRuntime should be installed before installing
# onnx-weekly. Otherwise, onnx-weekly could be
# overwritten by onnx.
pip_install \
parameterized==0.8.1 \
pytest-cov==4.0.0 \
pytest-subtests==0.10.0 \
tabulate==0.9.0 \
transformers==4.31.0
pip_install coloredlogs packaging
retry pip_install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ --no-cache-dir --no-input ort-nightly==1.16.0.dev20230908001
pip_install onnx==1.14.1
pip_install onnxscript-preview==0.1.0.dev20230828 --no-deps
# Cache the transformers model to be used later by ONNX tests. We need to run the transformers
# package to download the model. By default, the model is cached at ~/.cache/huggingface/hub/
IMPORT_SCRIPT_FILENAME="/tmp/onnx_import_script.py"
as_jenkins echo 'import transformers; transformers.AutoModel.from_pretrained("sshleifer/tiny-gpt2"); transformers.AutoTokenizer.from_pretrained("sshleifer/tiny-gpt2");' > "${IMPORT_SCRIPT_FILENAME}"
# Need a PyTorch version for transformers to work
pip_install --pre torch --index-url https://download.pytorch.org/whl/nightly/cpu
# Very weird quoting behavior here https://github.com/conda/conda/issues/10972,
# so echo the command to a file and run the file instead
conda_run python "${IMPORT_SCRIPT_FILENAME}"
# Cleaning up
conda_run pip uninstall -y torch
rm "${IMPORT_SCRIPT_FILENAME}" || true

View File

@ -61,23 +61,13 @@ install_ubuntu() {
rocprofiler-dev \
roctracer-dev
# precompiled miopen kernels added in ROCm 3.5, renamed in ROCm 5.5
# search for all unversioned packages
# precompiled miopen kernels added in ROCm 3.5; search for all unversioned packages
# if search fails it will abort this script; use true to avoid case where search fails
if [[ $(ver $ROCM_VERSION) -ge $(ver 5.5) ]]; then
MIOPENHIPGFX=$(apt-cache search --names-only miopen-hip-gfx | awk '{print $1}' | grep -F -v . || true)
if [[ "x${MIOPENHIPGFX}" = x ]]; then
echo "miopen-hip-gfx package not available" && exit 1
else
DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated ${MIOPENHIPGFX}
fi
MIOPENKERNELS=$(apt-cache search --names-only miopenkernels | awk '{print $1}' | grep -F -v . || true)
if [[ "x${MIOPENKERNELS}" = x ]]; then
echo "miopenkernels package not available"
else
MIOPENKERNELS=$(apt-cache search --names-only miopenkernels | awk '{print $1}' | grep -F -v . || true)
if [[ "x${MIOPENKERNELS}" = x ]]; then
echo "miopenkernels package not available" && exit 1
else
DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated ${MIOPENKERNELS}
fi
DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated ${MIOPENKERNELS}
fi
# Cleanup
@ -133,24 +123,6 @@ install_centos() {
rocprofiler-dev \
roctracer-dev
# precompiled miopen kernels; search for all unversioned packages
# if search fails it will abort this script; use true to avoid case where search fails
if [[ $(ver $ROCM_VERSION) -ge $(ver 5.5) ]]; then
MIOPENHIPGFX=$(yum -q search miopen-hip-gfx | grep miopen-hip-gfx | awk '{print $1}'| grep -F kdb. || true)
if [[ "x${MIOPENHIPGFX}" = x ]]; then
echo "miopen-hip-gfx package not available" && exit 1
else
yum install -y ${MIOPENHIPGFX}
fi
else
MIOPENKERNELS=$(yum -q search miopenkernels | grep miopenkernels- | awk '{print $1}'| grep -F kdb. || true)
if [[ "x${MIOPENKERNELS}" = x ]]; then
echo "miopenkernels package not available" && exit 1
else
yum install -y ${MIOPENKERNELS}
fi
fi
# Cleanup
yum clean all
rm -rf /var/cache/yum

View File

@ -6,7 +6,7 @@ set -ex
git clone https://bitbucket.org/icl/magma.git
pushd magma
# Fixes memory leaks of magma found while executing linalg UTs
git checkout 28592a7170e4b3707ed92644bf4a689ed600c27f
git checkout 5959b8783e45f1809812ed96ae762f38ee701972
cp make.inc-examples/make.inc.hip-gcc-mkl make.inc
echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc
echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib' >> make.inc
@ -18,7 +18,7 @@ else
amdgpu_targets=`rocm_agent_enumerator | grep -v gfx000 | sort -u | xargs`
fi
for arch in $amdgpu_targets; do
echo "DEVCCFLAGS += --offload-arch=$arch" >> make.inc
echo "DEVCCFLAGS += --amdgpu-target=$arch" >> make.inc
done
# hipcc with openmp flag may cause isnan() on __device__ not to be found; depending on context, compiler may attempt to match with host definition
sed -i 's/^FOPENMP/#FOPENMP/g' make.inc

View File

@ -1,66 +0,0 @@
#!/bin/bash
set -ex
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
get_conda_version() {
as_jenkins conda list -n py_$ANACONDA_PYTHON_VERSION | grep -w $* | head -n 1 | awk '{print $2}'
}
conda_reinstall() {
as_jenkins conda install -q -n py_$ANACONDA_PYTHON_VERSION -y --force-reinstall $*
}
if [ -n "${ROCM_VERSION}" ]; then
TRITON_REPO="https://github.com/ROCmSoftwarePlatform/triton"
TRITON_TEXT_FILE="triton-rocm"
else
TRITON_REPO="https://github.com/openai/triton"
TRITON_TEXT_FILE="triton"
fi
# The logic here is copied from .ci/pytorch/common_utils.sh
TRITON_PINNED_COMMIT=$(get_pinned_commit ${TRITON_TEXT_FILE})
apt update
apt-get install -y gpg-agent
if [ -n "${CONDA_CMAKE}" ]; then
# Keep the current cmake and numpy version here, so we can reinstall them later
CMAKE_VERSION=$(get_conda_version cmake)
NUMPY_VERSION=$(get_conda_version numpy)
fi
if [ -z "${MAX_JOBS}" ]; then
export MAX_JOBS=$(nproc)
fi
if [ -n "${GCC_VERSION}" ] && [[ "${GCC_VERSION}" == "7" ]]; then
# Triton needs at least gcc-9 to build
apt-get install -y g++-9
CXX=g++-9 pip_install "git+${TRITON_REPO}@${TRITON_PINNED_COMMIT}#subdirectory=python"
elif [ -n "${CLANG_VERSION}" ]; then
# Triton needs <filesystem> which surprisingly is not available with clang-9 toolchain
add-apt-repository -y ppa:ubuntu-toolchain-r/test
apt-get install -y g++-9
CXX=g++-9 pip_install "git+${TRITON_REPO}@${TRITON_PINNED_COMMIT}#subdirectory=python"
else
pip_install "git+${TRITON_REPO}@${TRITON_PINNED_COMMIT}#subdirectory=python"
fi
if [ -n "${CONDA_CMAKE}" ]; then
# TODO: This is to make sure that the same cmake and numpy version from install conda
# script is used. Without this step, the newer cmake version (3.25.2) downloaded by
# triton build step via pip will fail to detect conda MKL. Once that issue is fixed,
# this can be removed.
#
# The correct numpy version also needs to be set here because conda claims that it
# causes inconsistent environment. Without this, conda will attempt to install the
# latest numpy version, which fails ASAN tests with the following import error: Numba
# needs NumPy 1.20 or less.
conda_reinstall cmake="${CMAKE_VERSION}"
conda_reinstall numpy="${NUMPY_VERSION}"
fi

View File

@ -43,6 +43,3 @@ case "$ID" in
exit 1
;;
esac
# Cache vision models used by the test
source "$(dirname "${BASH_SOURCE[0]}")/cache_vision_models.sh"

View File

@ -25,10 +25,10 @@ coremltools==5.0b5
#Pinned versions:
#test that import:
expecttest==0.1.6
expecttest==0.1.3
#Description: method for writing tests where test framework auto populates
# the expected output based on previous runs
#Pinned versions: 0.1.6
#Pinned versions: 0.1.3
#test that import:
flatbuffers==2.0
@ -62,7 +62,7 @@ librosa>=0.6.2 ; python_version < "3.11"
#mkl-devel
# see mkl
#mock
#mock # breaks ci/circleci: docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c
#Description: A testing library that allows you to replace parts of your
#system under test with mock objects
#Pinned versions:
@ -75,16 +75,16 @@ librosa>=0.6.2 ; python_version < "3.11"
#Pinned versions:
#test that import:
mypy==1.4.1
mypy==0.960
# Pin MyPy version because new errors are likely to appear with each release
#Description: linter
#Pinned versions: 1.4.1
#Pinned versions: 0.960
#test that import: test_typing.py, test_type_hints.py
networkx==2.8.8
networkx==2.6.3
#Description: creation, manipulation, and study of
#the structure, dynamics, and functions of complex networks
#Pinned versions: 2.8.8
#Pinned versions: 2.6.3 (latest version that works with Python 3.7+)
#test that import: functorch
#ninja
@ -124,8 +124,7 @@ opt-einsum==3.3
#Pinned versions: 3.3
#test that import: test_linalg.py
pillow==9.3.0 ; python_version <= "3.8"
pillow==9.5.0 ; python_version > "3.8"
#pillow
#Description: Python Imaging Library fork
#Pinned versions:
#test that import:
@ -140,17 +139,17 @@ psutil
#Pinned versions:
#test that import: test_profiler.py, test_openmp.py, test_dataloader.py
pytest==7.3.2
pytest
#Description: testing framework
#Pinned versions:
#test that import: test_typing.py, test_cpp_extensions_aot.py, run_test.py
pytest-xdist==3.3.1
pytest-xdist
#Description: plugin for running pytest in parallel
#Pinned versions:
#test that import:
pytest-shard==0.1.2
pytest-shard
#Description: plugin spliting up tests in pytest
#Pinned versions:
#test that import:
@ -160,7 +159,7 @@ pytest-flakefinder==1.1.0
#Pinned versions: 1.1.0
#test that import:
pytest-rerunfailures>=10.3
pytest-rerunfailures
#Description: plugin for rerunning failure tests in pytest
#Pinned versions:
#test that import:
@ -180,7 +179,7 @@ xdoctest==1.1.0
#Pinned versions: 1.1.0
#test that import:
pygments==2.15.0
pygments==2.12.0
#Description: support doctest highlighting
#Pinned versions: 2.12.0
#test that import: the doctests
@ -200,8 +199,7 @@ pygments==2.15.0
#Pinned versions: 10.9.0
#test that import:
scikit-image==0.19.3 ; python_version < "3.10"
scikit-image==0.20.0 ; python_version >= "3.10"
scikit-image
#Description: image processing routines
#Pinned versions:
#test that import: test_nn.py
@ -213,7 +211,7 @@ scikit-image==0.20.0 ; python_version >= "3.10"
scipy==1.6.3 ; python_version < "3.10"
scipy==1.8.1 ; python_version == "3.10"
scipy==1.10.1 ; python_version == "3.11"
scipy==1.9.3 ; python_version == "3.11"
# Pin SciPy because of failing distribution tests (see #60347)
#Description: scientific python
#Pinned versions: 1.6.3
@ -226,7 +224,7 @@ scipy==1.10.1 ; python_version == "3.11"
#Pinned versions:
#test that import:
tb-nightly==2.13.0a20230426
tb-nightly
#Description: TensorBoard
#Pinned versions:
#test that import:
@ -246,9 +244,9 @@ unittest-xml-reporting<=3.2.0,>=2.0.0
#Pinned versions:
#test that import:
lintrunner==0.10.7
#Description: all about linters!
#Pinned versions: 0.10.7
lintrunner==0.9.2
#Description: all about linters
#Pinned versions: 0.9.2
#test that import:
rockset==1.0.3
@ -260,23 +258,3 @@ ghstack==0.7.1
#Description: ghstack tool
#Pinned versions: 0.7.1
#test that import:
jinja2==3.1.2
#Description: jinja2 template engine
#Pinned versions: 3.1.2
#test that import:
pytest-cpp==2.3.0
#Description: This is used by pytest to invoke C++ tests
#Pinned versions: 2.3.0
#test that import:
z3-solver==4.12.2.0
#Description: The Z3 Theorem Prover Project
#Pinned versions:
#test that import:
tensorboard==2.13.0
#Description: Also included in .ci/docker/requirements-docs.txt
#Pinned versions:
#test that import: test_tensorboard

View File

@ -1,49 +0,0 @@
sphinx==5.3.0
#Description: This is used to generate PyTorch docs
#Pinned versions: 5.3.0
-e git+https://github.com/pytorch/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme
# TODO: sphinxcontrib.katex 0.9.0 adds a local KaTeX server to speed up pre-rendering
# but it doesn't seem to work and hangs around idly. The initial thought is probably
# something related to Docker setup. We can investigate this later
sphinxcontrib.katex==0.8.6
#Description: This is used to generate PyTorch docs
#Pinned versions: 0.8.6
matplotlib==3.5.3
#Description: This is used to generate PyTorch docs
#Pinned versions: 3.5.3
tensorboard==2.13.0
#Description: This is used to generate PyTorch docs
#Pinned versions: 2.13.0
breathe==4.34.0
#Description: This is used to generate PyTorch C++ docs
#Pinned versions: 4.34.0
exhale==0.2.3
#Description: This is used to generate PyTorch C++ docs
#Pinned versions: 0.2.3
docutils==0.16
#Description: This is used to generate PyTorch C++ docs
#Pinned versions: 0.16
bs4==0.0.1
#Description: This is used to generate PyTorch C++ docs
#Pinned versions: 0.0.1
IPython==8.12.0
#Description: This is used to generate PyTorch functorch docs
#Pinned versions: 8.12.0
myst-nb==0.17.2
#Description: This is used to generate PyTorch functorch docs
#Pinned versions: 0.13.2
# The following are required to build torch.distributed.elastic.rendezvous.etcd* docs
python-etcd==0.4.5
sphinx-copybutton==0.5.0
sphinx-panels==0.4.1
myst-parser==0.18.1

View File

@ -1 +0,0 @@
2.1.0

View File

@ -58,9 +58,9 @@ ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV and ffmpeg
ARG VISION
COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
COPY ./common/install_vision.sh install_vision.sh
RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh cache_vision_models.sh common_utils.sh
RUN rm install_vision.sh
ENV INSTALLED_VISION ${VISION}
# (optional) Install UCC
@ -85,24 +85,6 @@ COPY ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
ARG INDUCTOR_BENCHMARKS
COPY ./common/install_inductor_benchmark_deps.sh install_inductor_benchmark_deps.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/huggingface.txt huggingface.txt
COPY ci_commit_pins/timm.txt timm.txt
RUN if [ -n "${INDUCTOR_BENCHMARKS}" ]; then bash ./install_inductor_benchmark_deps.sh; fi
RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface.txt
ARG TRITON
# Install triton, this needs to be done before sccache because the latter will
# try to reach out to S3, which docker build runners don't have access
COPY ./common/install_triton.sh install_triton.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/triton.txt triton.txt
COPY triton_version.txt triton_version.txt
RUN if [ -n "${TRITON}" ]; then bash ./install_triton.sh; fi
RUN rm install_triton.sh common_utils.sh triton.txt triton_version.txt
# Install ccache/sccache (do this last, so we get priority in PATH)
COPY ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
@ -145,7 +127,6 @@ RUN rm install_cudnn.sh
# Delete /usr/local/cuda-11.X/cuda-11.X symlinks
RUN if [ -h /usr/local/cuda-11.6/cuda-11.6 ]; then rm /usr/local/cuda-11.6/cuda-11.6; fi
RUN if [ -h /usr/local/cuda-11.7/cuda-11.7 ]; then rm /usr/local/cuda-11.7/cuda-11.7; fi
RUN if [ -h /usr/local/cuda-12.1/cuda-12.1 ]; then rm /usr/local/cuda-12.1/cuda-12.1; fi
USER jenkins
CMD ["bash"]

View File

@ -55,9 +55,9 @@ ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV and ffmpeg
ARG VISION
COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
COPY ./common/install_vision.sh install_vision.sh
RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh cache_vision_models.sh common_utils.sh
RUN rm install_vision.sh
ENV INSTALLED_VISION ${VISION}
# Install rocm
@ -68,7 +68,6 @@ RUN rm install_rocm.sh
COPY ./common/install_rocm_magma.sh install_rocm_magma.sh
RUN bash ./install_rocm_magma.sh
RUN rm install_rocm_magma.sh
ENV ROCM_PATH /opt/rocm
ENV PATH /opt/rocm/bin:$PATH
ENV PATH /opt/rocm/hcc/bin:$PATH
ENV PATH /opt/rocm/hip/bin:$PATH
@ -90,16 +89,6 @@ COPY ./common/install_ninja.sh install_ninja.sh
RUN if [ -n "${NINJA_VERSION}" ]; then bash ./install_ninja.sh; fi
RUN rm install_ninja.sh
ARG TRITON
# Install triton, this needs to be done before sccache because the latter will
# try to reach out to S3, which docker build runners don't have access
COPY ./common/install_triton.sh install_triton.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/triton-rocm.txt triton-rocm.txt
COPY triton_version.txt triton_version.txt
RUN if [ -n "${TRITON}" ]; then bash ./install_triton.sh; fi
RUN rm install_triton.sh common_utils.sh triton-rocm.txt triton_version.txt
# Install ccache/sccache (do this last, so we get priority in PATH)
COPY ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH

View File

@ -36,14 +36,12 @@ RUN bash ./install_docs_reqs.sh && rm install_docs_reqs.sh
# Install conda and other packages (e.g., numpy, pytest)
ARG ANACONDA_PYTHON_VERSION
ARG CONDA_CMAKE
ARG DOCS
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
ENV DOCS=$DOCS
COPY requirements-ci.txt requirements-docs.txt /opt/conda/
COPY requirements-ci.txt /opt/conda/requirements-ci.txt
COPY ./common/install_conda.sh install_conda.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_conda.sh && rm install_conda.sh common_utils.sh /opt/conda/requirements-ci.txt /opt/conda/requirements-docs.txt
RUN bash ./install_conda.sh && rm install_conda.sh common_utils.sh /opt/conda/requirements-ci.txt
# Install gcc
ARG GCC_VERSION
@ -88,20 +86,20 @@ ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV and ffmpeg
ARG VISION
COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
COPY ./common/install_vision.sh install_vision.sh
RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh cache_vision_models.sh common_utils.sh
RUN rm install_vision.sh
ENV INSTALLED_VISION ${VISION}
# (optional) Install Android NDK
ARG ANDROID
ARG ANDROID_NDK
ARG GRADLE_VERSION
COPY ./common/install_android.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
COPY ./common/install_android.sh install_android.sh
COPY ./android/AndroidManifest.xml AndroidManifest.xml
COPY ./android/build.gradle build.gradle
RUN if [ -n "${ANDROID}" ]; then bash ./install_android.sh; fi
RUN rm install_android.sh cache_vision_models.sh common_utils.sh
RUN rm install_android.sh
RUN rm AndroidManifest.xml
RUN rm build.gradle
ENV INSTALLED_ANDROID ${ANDROID}
@ -136,29 +134,6 @@ ENV OPENSSL_ROOT_DIR /opt/openssl
ENV OPENSSL_DIR /opt/openssl
RUN rm install_openssl.sh
ARG INDUCTOR_BENCHMARKS
COPY ./common/install_inductor_benchmark_deps.sh install_inductor_benchmark_deps.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/huggingface.txt huggingface.txt
COPY ci_commit_pins/timm.txt timm.txt
RUN if [ -n "${INDUCTOR_BENCHMARKS}" ]; then bash ./install_inductor_benchmark_deps.sh; fi
RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface.txt
ARG TRITON
# Install triton, this needs to be done before sccache because the latter will
# try to reach out to S3, which docker build runners don't have access
COPY ./common/install_triton.sh install_triton.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/triton.txt triton.txt
RUN if [ -n "${TRITON}" ]; then bash ./install_triton.sh; fi
RUN rm install_triton.sh common_utils.sh triton.txt
ARG ONNX
# Install ONNX dependencies
COPY ./common/install_onnx.sh ./common/common_utils.sh ./
RUN if [ -n "${ONNX}" ]; then bash ./install_onnx.sh; fi
RUN rm install_onnx.sh common_utils.sh
# Install ccache/sccache (do this last, so we get priority in PATH)
COPY ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH

View File

@ -3,18 +3,72 @@
# shellcheck source=./common.sh
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# Use to retry ONNX test, only retry it twice
retry () {
"$@" || (sleep 60 && "$@")
}
if [[ ${BUILD_ENVIRONMENT} == *onnx* ]]; then
pip install click mock tabulate networkx==2.0
pip -q install --user "file:///var/lib/jenkins/workspace/third_party/onnx#egg=onnx"
fi
# Skip tests in environments where they are not built/applicable
if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
echo 'Skipping tests'
exit 0
fi
if [[ "${BUILD_ENVIRONMENT}" == *-rocm* ]]; then
# temporary to locate some kernel issues on the CI nodes
export HSAKMT_DEBUG_LEVEL=4
fi
# These additional packages are needed for circleci ROCm builds.
if [[ $BUILD_ENVIRONMENT == *rocm* ]]; then
# Need networkx 2.0 because bellmand_ford was moved in 2.1 . Scikit-image by
# defaults installs the most recent networkx version, so we install this lower
# version explicitly before scikit-image pulls it in as a dependency
pip install networkx==2.0
# click - onnx
pip install --progress-bar off click protobuf tabulate virtualenv mock typing-extensions
fi
################################################################################
# Python tests #
################################################################################
if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then
exit 0
fi
# If pip is installed as root, we must use sudo.
# CircleCI docker images could install conda as jenkins user, or use the OS's python package.
PIP=$(which pip)
PIP_USER=$(stat --format '%U' $PIP)
CURRENT_USER=$(id -u -n)
if [[ "$PIP_USER" = root && "$CURRENT_USER" != root ]]; then
MAYBE_SUDO=sudo
fi
# Uninstall pre-installed hypothesis and coverage to use an older version as newer
# versions remove the timeout parameter from settings which ideep/conv_transpose_test.py uses
$MAYBE_SUDO pip -q uninstall -y hypothesis
$MAYBE_SUDO pip -q uninstall -y coverage
# "pip install hypothesis==3.44.6" from official server is unreliable on
# CircleCI, so we host a copy on S3 instead
$MAYBE_SUDO pip -q install attrs==18.1.0 -f https://s3.amazonaws.com/ossci-linux/wheels/attrs-18.1.0-py2.py3-none-any.whl
$MAYBE_SUDO pip -q install coverage==4.5.1 -f https://s3.amazonaws.com/ossci-linux/wheels/coverage-4.5.1-cp36-cp36m-macosx_10_12_x86_64.whl
$MAYBE_SUDO pip -q install hypothesis==4.57.1
##############
# ONNX tests #
##############
if [[ "$BUILD_ENVIRONMENT" == *onnx* ]]; then
# TODO: This can be removed later once vision is also part of the Docker image
pip install -q --user --no-use-pep517 "git+https://github.com/pytorch/vision.git@$(cat .github/ci_commit_pins/vision.txt)"
pip install -q --user transformers==4.25.1
pip install -q --user ninja flatbuffers==2.0 numpy==1.22.4 onnxruntime==1.14.0 beartype==0.10.4
# TODO: change this when onnx 1.13.1 is released.
pip install --no-use-pep517 'onnx @ git+https://github.com/onnx/onnx@e192ba01e438d22ca2dedd7956e28e3551626c91'
# TODO: change this when onnx-script is on testPypi
pip install 'onnx-script @ git+https://github.com/microsoft/onnx-script@a71e35bcd72537bf7572536ee57250a0c0488bf6'
# numba requires numpy <= 1.20, onnxruntime requires numpy >= 1.21.
# We don't actually need it for our tests, but it's imported if it's present, so uninstall.
pip uninstall -q --yes numba
# JIT C++ extensions require ninja, so put it into PATH.
export PATH="/var/lib/jenkins/.local/bin:$PATH"
# NB: ONNX test is fast (~15m) so it's ok to retry it few more times to avoid any flaky issue, we
# need to bring this to the standard PyTorch run_test eventually. The issue will be tracked in
# https://github.com/pytorch/pytorch/issues/98626
retry "$ROOT_DIR/scripts/onnx/test.sh"
"$ROOT_DIR/scripts/onnx/test.sh"
fi

42
.ci/pytorch/build-asan.sh Executable file
View File

@ -0,0 +1,42 @@
#!/bin/bash
# Required environment variable: $BUILD_ENVIRONMENT
# (This is set by default in the Docker images we build, so you don't
# need to set it yourself.
# shellcheck source=./common.sh
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# shellcheck source=./common-build.sh
source "$(dirname "${BASH_SOURCE[0]}")/common-build.sh"
echo "Clang version:"
clang --version
python tools/stats/export_test_times.py
# detect_leaks=0: Python is very leaky, so we need suppress it
# symbolize=1: Gives us much better errors when things go wrong
export ASAN_OPTIONS=detect_leaks=0:detect_stack_use_after_return=1:symbolize=1:detect_odr_violation=0
if [ -n "$(which conda)" ]; then
export CMAKE_PREFIX_PATH=/opt/conda
fi
# TODO: Make the ASAN flags a centralized env var and unify with USE_ASAN option
CC="clang" CXX="clang++" LDSHARED="clang --shared" \
CFLAGS="-fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -fsanitize-address-use-after-scope -shared-libasan" \
USE_ASAN=1 USE_CUDA=0 USE_MKLDNN=0 \
python setup.py bdist_wheel
pip_install_whl "$(echo dist/*.whl)"
# Test building via the sdist source tarball
python setup.py sdist
mkdir -p /tmp/tmp
pushd /tmp/tmp
tar zxf "$(dirname "${BASH_SOURCE[0]}")/../../dist/"*.tar.gz
cd torch-*
python setup.py build --cmake-only
popd
print_sccache_stats
assert_git_not_dirty

29
.ci/pytorch/build-tsan.sh Executable file
View File

@ -0,0 +1,29 @@
#!/bin/bash
# Required environment variable: $BUILD_ENVIRONMENT
# (This is set by default in the Docker images we build, so you don't
# need to set it yourself.
# shellcheck source=./common.sh
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# shellcheck source=./common-build.sh
source "$(dirname "${BASH_SOURCE[0]}")/common-build.sh"
echo "Clang version:"
clang --version
python tools/stats/export_test_times.py
if [ -n "$(which conda)" ]; then
export CMAKE_PREFIX_PATH=/opt/conda
fi
CC="clang" CXX="clang++" LDSHARED="clang --shared" \
CFLAGS="-fsanitize=thread" \
USE_TSAN=1 USE_CUDA=0 USE_MKLDNN=0 \
python setup.py bdist_wheel
pip_install_whl "$(echo dist/*.whl)"
print_sccache_stats
assert_git_not_dirty

View File

@ -11,6 +11,14 @@ source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# shellcheck source=./common-build.sh
source "$(dirname "${BASH_SOURCE[0]}")/common-build.sh"
if [[ "$BUILD_ENVIRONMENT" == *-clang7-asan* ]]; then
exec "$(dirname "${BASH_SOURCE[0]}")/build-asan.sh" "$@"
fi
if [[ "$BUILD_ENVIRONMENT" == *-clang7-tsan* ]]; then
exec "$(dirname "${BASH_SOURCE[0]}")/build-tsan.sh" "$@"
fi
if [[ "$BUILD_ENVIRONMENT" == *-mobile-*build* ]]; then
exec "$(dirname "${BASH_SOURCE[0]}")/build-mobile.sh" "$@"
fi
@ -36,7 +44,6 @@ if [[ "$BUILD_ENVIRONMENT" == *cuda11* ]]; then
if [[ "$BUILD_ENVIRONMENT" != *cuda11.3* && "$BUILD_ENVIRONMENT" != *clang* ]]; then
# TODO: there is a linking issue when building with UCC using clang,
# disable it for now and to be fix later.
# TODO: disable UCC temporarily to enable CUDA 12.1 in CI
export USE_UCC=1
export USE_SYSTEM_UCC=1
fi
@ -164,15 +171,6 @@ if [[ "${BUILD_ENVIRONMENT}" == *clang* ]]; then
export CXX=clang++
fi
if [[ "$BUILD_ENVIRONMENT" == *-clang*-asan* ]]; then
export LDSHARED="clang --shared"
export USE_CUDA=0
export USE_ASAN=1
export USE_MKLDNN=0
export UBSAN_FLAGS="-fno-sanitize-recover=all;-fno-sanitize=float-divide-by-zero;-fno-sanitize=float-cast-overflow"
unset USE_LLVM
fi
if [[ "${BUILD_ENVIRONMENT}" == *no-ops* ]]; then
export USE_PER_OPERATOR_HEADERS=0
fi
@ -193,19 +191,16 @@ if [[ "$BUILD_ENVIRONMENT" == *-bazel-* ]]; then
set -e
get_bazel
install_sccache_nvcc_for_bazel
# Leave 1 CPU free and use only up to 80% of memory to reduce the change of crashing
# the runner
BAZEL_MEM_LIMIT="--local_ram_resources=HOST_RAM*.8"
BAZEL_CPU_LIMIT="--local_cpu_resources=HOST_CPUS-1"
if [[ "$CUDA_VERSION" == "cpu" ]]; then
# Build torch, the Python module, and tests for CPU-only
tools/bazel build --config=no-tty "${BAZEL_MEM_LIMIT}" "${BAZEL_CPU_LIMIT}" --config=cpu-only :torch :torch/_C.so :all_tests
else
tools/bazel build --config=no-tty "${BAZEL_MEM_LIMIT}" "${BAZEL_CPU_LIMIT}" //...
fi
tools/bazel build --config=no-tty "${BAZEL_MEM_LIMIT}" "${BAZEL_CPU_LIMIT}" //...
# Build torch, the Python module, and tests for CPU-only
tools/bazel build --config=no-tty "${BAZEL_MEM_LIMIT}" "${BAZEL_CPU_LIMIT}" --config=cpu-only :torch :_C.so :all_tests
else
# check that setup.py would fail with bad arguments
echo "The next three invocations are expected to fail with invalid command error messages."

View File

@ -31,7 +31,7 @@ if [[ "$BUILD_ENVIRONMENT" != *win-* ]]; then
# as though sccache still gets used even when the sscache server isn't started
# explicitly
echo "Skipping sccache server initialization, setting environment variables"
export SCCACHE_IDLE_TIMEOUT=0
export SCCACHE_IDLE_TIMEOUT=1200
export SCCACHE_ERROR_LOG=~/sccache_error.log
export RUST_LOG=sccache::server=error
elif [[ "${BUILD_ENVIRONMENT}" == *rocm* ]]; then
@ -39,12 +39,11 @@ if [[ "$BUILD_ENVIRONMENT" != *win-* ]]; then
else
# increasing SCCACHE_IDLE_TIMEOUT so that extension_backend_test.cpp can build after this PR:
# https://github.com/pytorch/pytorch/pull/16645
SCCACHE_ERROR_LOG=~/sccache_error.log SCCACHE_IDLE_TIMEOUT=0 RUST_LOG=sccache::server=error sccache --start-server
SCCACHE_ERROR_LOG=~/sccache_error.log SCCACHE_IDLE_TIMEOUT=1200 RUST_LOG=sccache::server=error sccache --start-server
fi
# Report sccache stats for easier debugging. It's ok if this commands
# timeouts and fails on MacOS
sccache --zero-stats || true
# Report sccache stats for easier debugging
sccache --zero-stats
fi
if which ccache > /dev/null; then

View File

@ -22,3 +22,7 @@ fi
# TODO: Renable libtorch testing for MacOS, see https://github.com/pytorch/pytorch/issues/62598
# shellcheck disable=SC2034
BUILD_TEST_LIBTORCH=0
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}

View File

@ -80,34 +80,19 @@ function get_exit_code() {
}
function get_bazel() {
# Download and use the cross-platform, dependency-free Python
# version of Bazelisk to fetch the platform specific version of
# Bazel to use from .bazelversion.
retry curl --location --output tools/bazel \
https://raw.githubusercontent.com/bazelbuild/bazelisk/v1.16.0/bazelisk.py
shasum --algorithm=1 --check \
<(echo 'd4369c3d293814d3188019c9f7527a948972d9f8 tools/bazel')
chmod u+x tools/bazel
}
if [[ $(uname) == "Darwin" ]]; then
# download bazel version
retry curl https://github.com/bazelbuild/bazel/releases/download/4.2.1/bazel-4.2.1-darwin-x86_64 -Lo tools/bazel
# verify content
echo '74d93848f0c9d592e341e48341c53c87e3cb304a54a2a1ee9cff3df422f0b23c tools/bazel' | shasum -a 256 -c >/dev/null
else
# download bazel version
retry curl https://ossci-linux.s3.amazonaws.com/bazel-4.2.1-linux-x86_64 -o tools/bazel
# verify content
echo '1a4f3a3ce292307bceeb44f459883859c793436d564b95319aacb8af1f20557c tools/bazel' | shasum -a 256 -c >/dev/null
fi
# This function is bazel specific because of the bug
# in the bazel that requires some special paths massaging
# as a workaround. See
# https://github.com/bazelbuild/bazel/issues/10167
function install_sccache_nvcc_for_bazel() {
sudo mv /usr/local/cuda/bin/nvcc /usr/local/cuda/bin/nvcc-real
# Write the `/usr/local/cuda/bin/nvcc`
cat << EOF | sudo tee /usr/local/cuda/bin/nvcc
#!/bin/sh
if [ \$(env -u LD_PRELOAD ps -p \$PPID -o comm=) != sccache ]; then
exec sccache /usr/local/cuda/bin/nvcc "\$@"
else
exec external/local_cuda/cuda/bin/nvcc-real "\$@"
fi
EOF
sudo chmod +x /usr/local/cuda/bin/nvcc
chmod +x tools/bazel
}
function install_monkeytype {
@ -120,67 +105,21 @@ function get_pinned_commit() {
cat .github/ci_commit_pins/"${1}".txt
}
function install_torchaudio() {
local commit
commit=$(get_pinned_commit audio)
if [[ "$1" == "cuda" ]]; then
# TODO: This is better to be passed as a parameter from _linux-test workflow
# so that it can be consistent with what is set in build
TORCH_CUDA_ARCH_LIST="8.0;8.6" pip_install --no-use-pep517 --user "git+https://github.com/pytorch/audio.git@${commit}"
else
pip_install --no-use-pep517 --user "git+https://github.com/pytorch/audio.git@${commit}"
fi
}
function install_torchtext() {
local data_commit
local text_commit
data_commit=$(get_pinned_commit data)
text_commit=$(get_pinned_commit text)
pip_install --no-use-pep517 --user "git+https://github.com/pytorch/data.git@${data_commit}"
pip_install --no-use-pep517 --user "git+https://github.com/pytorch/text.git@${text_commit}"
local commit
commit=$(get_pinned_commit text)
pip_install --no-use-pep517 --user "git+https://github.com/pytorch/text.git@${commit}"
}
function install_torchvision() {
local orig_preload
local commit
commit=$(get_pinned_commit vision)
orig_preload=${LD_PRELOAD}
if [ -n "${LD_PRELOAD}" ]; then
# Silence dlerror to work-around glibc ASAN bug, see https://sourceware.org/bugzilla/show_bug.cgi?id=27653#c9
echo 'char* dlerror(void) { return "";}'|gcc -fpic -shared -o "${HOME}/dlerror.so" -x c -
LD_PRELOAD=${orig_preload}:${HOME}/dlerror.so
fi
pip_install --no-use-pep517 --user "git+https://github.com/pytorch/vision.git@${commit}"
if [ -n "${LD_PRELOAD}" ]; then
LD_PRELOAD=${orig_preload}
fi
}
function install_torchrec_and_fbgemm() {
local torchrec_commit
torchrec_commit=$(get_pinned_commit torchrec)
local fbgemm_commit
fbgemm_commit=$(get_pinned_commit fbgemm)
pip_uninstall torchrec-nightly
pip_uninstall fbgemm-gpu-nightly
pip_install setuptools-git-versioning scikit-build pyre-extensions
# See https://github.com/pytorch/pytorch/issues/106971
CUDA_PATH=/usr/local/cuda-12.1 pip_install --no-use-pep517 --user "git+https://github.com/pytorch/FBGEMM.git@${fbgemm_commit}#egg=fbgemm-gpu&subdirectory=fbgemm_gpu"
pip_install --no-use-pep517 --user "git+https://github.com/pytorch/torchrec.git@${torchrec_commit}"
}
function install_numpy_pytorch_interop() {
local commit
commit=$(get_pinned_commit numpy_pytorch_interop)
# TODO: --no-use-pep517 will result in failure.
pip_install --user "git+https://github.com/Quansight-Labs/numpy_pytorch_interop.git@${commit}"
}
function clone_pytorch_xla() {
if [[ ! -d ./xla ]]; then
git clone --recursive -b r2.1 https://github.com/pytorch/xla.git
git clone --recursive -b r2.0 --quiet https://github.com/pytorch/xla.git
pushd xla
# pin the xla hash so that we don't get broken by changes to xla
git checkout "$(cat ../.github/ci_commit_pins/xla.txt)"
@ -190,15 +129,53 @@ function clone_pytorch_xla() {
fi
}
function install_filelock() {
pip_install filelock
}
function install_triton() {
local commit
if [[ "${TEST_CONFIG}" == *rocm* ]]; then
echo "skipping triton due to rocm"
else
commit=$(get_pinned_commit triton)
if [[ "${BUILD_ENVIRONMENT}" == *gcc7* ]]; then
# Trition needs gcc-9 to build
sudo apt-get install -y g++-9
CXX=g++-9 pip_install --user "git+https://github.com/openai/triton@${commit}#subdirectory=python"
elif [[ "${BUILD_ENVIRONMENT}" == *clang* ]]; then
# Trition needs <filesystem> which surprisingly is not available with clang-9 toolchain
sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test
sudo apt-get install -y g++-9
CXX=g++-9 pip_install --user "git+https://github.com/openai/triton@${commit}#subdirectory=python"
else
pip_install --user "git+https://github.com/openai/triton@${commit}#subdirectory=python"
fi
pip_install --user jinja2
fi
}
function setup_torchdeploy_deps(){
conda install -y -n "py_${ANACONDA_PYTHON_VERSION}" "libpython-static=${ANACONDA_PYTHON_VERSION}"
local CC
local CXX
CC="$(which gcc)"
CXX="$(which g++)"
export CC
export CXX
pip install --upgrade pip
}
function checkout_install_torchdeploy() {
local commit
commit=$(get_pinned_commit multipy)
setup_torchdeploy_deps
pushd ..
git clone --recurse-submodules https://github.com/pytorch/multipy.git
pushd multipy
git checkout "${commit}"
python multipy/runtime/example/generate_examples.py
BUILD_CUDA_TESTS=1 pip install -e .
pip install -e . --install-option="--cudatests"
popd
popd
}
@ -212,21 +189,26 @@ function test_torch_deploy(){
popd
}
function install_huggingface() {
local commit
commit=$(get_pinned_commit huggingface)
pip_install pandas
pip_install scipy
pip_install "git+https://github.com/huggingface/transformers.git@${commit}#egg=transformers"
}
function install_timm() {
local commit
commit=$(get_pinned_commit timm)
pip_install pandas
pip_install scipy
pip_install z3-solver
pip_install "git+https://github.com/rwightman/pytorch-image-models@${commit}"
}
function checkout_install_torchbench() {
local commit
commit=$(get_pinned_commit torchbench)
git clone https://github.com/pytorch/benchmark torchbench
pushd torchbench
git checkout "$commit"
git checkout no_torchaudio
if [ "$1" ]; then
python install.py --continue_on_fail models "$@"
@ -238,6 +220,10 @@ function checkout_install_torchbench() {
popd
}
function test_functorch() {
python test/run_test.py --functorch --verbose
}
function print_sccache_stats() {
echo 'PyTorch Build Statistics'
sccache --show-stats

View File

@ -1,10 +1,10 @@
from datetime import datetime, timedelta
from tempfile import mkdtemp
from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes
temp_dir = mkdtemp()
print(temp_dir)
@ -16,43 +16,37 @@ def genrsa(path):
key_size=2048,
)
with open(path, "wb") as f:
f.write(
key.private_bytes(
encoding=serialization.Encoding.PEM,
format=serialization.PrivateFormat.TraditionalOpenSSL,
encryption_algorithm=serialization.NoEncryption(),
)
)
f.write(key.private_bytes(
encoding=serialization.Encoding.PEM,
format=serialization.PrivateFormat.TraditionalOpenSSL,
encryption_algorithm=serialization.NoEncryption(),
))
return key
def create_cert(path, C, ST, L, O, key):
subject = issuer = x509.Name(
[
x509.NameAttribute(NameOID.COUNTRY_NAME, C),
x509.NameAttribute(NameOID.STATE_OR_PROVINCE_NAME, ST),
x509.NameAttribute(NameOID.LOCALITY_NAME, L),
x509.NameAttribute(NameOID.ORGANIZATION_NAME, O),
]
)
cert = (
x509.CertificateBuilder()
.subject_name(subject)
.issuer_name(issuer)
.public_key(key.public_key())
.serial_number(x509.random_serial_number())
.not_valid_before(datetime.utcnow())
.not_valid_after(
# Our certificate will be valid for 10 days
datetime.utcnow()
+ timedelta(days=10)
)
.add_extension(
x509.BasicConstraints(ca=True, path_length=None),
critical=True,
)
.sign(key, hashes.SHA256())
)
subject = issuer = x509.Name([
x509.NameAttribute(NameOID.COUNTRY_NAME, C),
x509.NameAttribute(NameOID.STATE_OR_PROVINCE_NAME, ST),
x509.NameAttribute(NameOID.LOCALITY_NAME, L),
x509.NameAttribute(NameOID.ORGANIZATION_NAME, O),
])
cert = x509.CertificateBuilder().subject_name(
subject
).issuer_name(
issuer
).public_key(
key.public_key()
).serial_number(
x509.random_serial_number()
).not_valid_before(
datetime.utcnow()
).not_valid_after(
# Our certificate will be valid for 10 days
datetime.utcnow() + timedelta(days=10)
).add_extension(
x509.BasicConstraints(ca=True, path_length=None), critical=True,
).sign(key, hashes.SHA256())
# Write our certificate out to disk.
with open(path, "wb") as f:
f.write(cert.public_bytes(serialization.Encoding.PEM))
@ -60,65 +54,43 @@ def create_cert(path, C, ST, L, O, key):
def create_req(path, C, ST, L, O, key):
csr = (
x509.CertificateSigningRequestBuilder()
.subject_name(
x509.Name(
[
# Provide various details about who we are.
x509.NameAttribute(NameOID.COUNTRY_NAME, C),
x509.NameAttribute(NameOID.STATE_OR_PROVINCE_NAME, ST),
x509.NameAttribute(NameOID.LOCALITY_NAME, L),
x509.NameAttribute(NameOID.ORGANIZATION_NAME, O),
]
)
)
.sign(key, hashes.SHA256())
)
csr = x509.CertificateSigningRequestBuilder().subject_name(x509.Name([
# Provide various details about who we are.
x509.NameAttribute(NameOID.COUNTRY_NAME, C),
x509.NameAttribute(NameOID.STATE_OR_PROVINCE_NAME, ST),
x509.NameAttribute(NameOID.LOCALITY_NAME, L),
x509.NameAttribute(NameOID.ORGANIZATION_NAME, O),
])).sign(key, hashes.SHA256())
with open(path, "wb") as f:
f.write(csr.public_bytes(serialization.Encoding.PEM))
return csr
def sign_certificate_request(path, csr_cert, ca_cert, private_ca_key):
cert = (
x509.CertificateBuilder()
.subject_name(csr_cert.subject)
.issuer_name(ca_cert.subject)
.public_key(csr_cert.public_key())
.serial_number(x509.random_serial_number())
.not_valid_before(datetime.utcnow())
.not_valid_after(
# Our certificate will be valid for 10 days
datetime.utcnow()
+ timedelta(days=10)
# Sign our certificate with our private key
)
.sign(private_ca_key, hashes.SHA256())
)
cert = x509.CertificateBuilder().subject_name(
csr_cert.subject
).issuer_name(
ca_cert.subject
).public_key(
csr_cert.public_key()
).serial_number(
x509.random_serial_number()
).not_valid_before(
datetime.utcnow()
).not_valid_after(
# Our certificate will be valid for 10 days
datetime.utcnow() + timedelta(days=10)
# Sign our certificate with our private key
).sign(private_ca_key, hashes.SHA256())
with open(path, "wb") as f:
f.write(cert.public_bytes(serialization.Encoding.PEM))
return cert
ca_key = genrsa(temp_dir + "/ca.key")
ca_cert = create_cert(
temp_dir + "/ca.pem",
"US",
"New York",
"New York",
"Gloo Certificate Authority",
ca_key,
)
ca_cert = create_cert(temp_dir + "/ca.pem", u"US", u"New York", u"New York", u"Gloo Certificate Authority", ca_key)
pkey = genrsa(temp_dir + "/pkey.key")
csr = create_req(
temp_dir + "/csr.csr",
"US",
"California",
"San Francisco",
"Gloo Testing Company",
pkey,
)
csr = create_req(temp_dir + "/csr.csr", u"US", u"California", u"San Francisco", u"Gloo Testing Company", pkey)
cert = sign_certificate_request(temp_dir + "/cert.pem", csr, ca_cert, ca_key)

View File

@ -6,4 +6,5 @@ source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Testing pytorch docs"
cd docs
pip_install -r requirements.txt
make doctest

View File

@ -1,40 +0,0 @@
#!/bin/bash
# This is where the local pytorch install in the docker image is located
pt_checkout="/var/lib/jenkins/workspace"
source "$pt_checkout/.ci/pytorch/common_utils.sh"
echo "functorch_doc_push_script.sh: Invoked with $*"
set -ex
version=${DOCS_VERSION:-nightly}
echo "version: $version"
# Build functorch docs
pushd $pt_checkout/functorch/docs
make html
popd
git clone https://github.com/pytorch/functorch -b gh-pages --depth 1 functorch_ghpages
pushd functorch_ghpages
if [ "$version" == "main" ]; then
version=nightly
fi
git rm -rf "$version" || true
mv "$pt_checkout/functorch/docs/build/html" "$version"
git add "$version" || true
git status
git config user.email "soumith+bot@pytorch.org"
git config user.name "pytorchbot"
# If there aren't changes, don't make a commit; push is no-op
git commit -m "Generate Python docs from pytorch/pytorch@${GITHUB_SHA}" || true
git status
if [[ "${WITH_PUSH:-}" == true ]]; then
git push -u origin gh-pages
fi
popd

View File

@ -40,14 +40,8 @@ cross_compile_arm64() {
USE_DISTRIBUTED=0 CMAKE_OSX_ARCHITECTURES=arm64 MACOSX_DEPLOYMENT_TARGET=11.0 USE_MKLDNN=OFF USE_QNNPACK=OFF WERROR=1 BUILD_TEST=OFF USE_PYTORCH_METAL=1 python setup.py bdist_wheel
}
compile_arm64() {
# Compilation for arm64
# TODO: Compile with OpenMP support (but this causes CI regressions as cross-compilation were done with OpenMP disabled)
USE_DISTRIBUTED=0 USE_OPENMP=0 MACOSX_DEPLOYMENT_TARGET=11.0 WERROR=1 BUILD_TEST=OFF USE_PYTORCH_METAL=1 python setup.py bdist_wheel
}
compile_x86_64() {
USE_DISTRIBUTED=0 WERROR=1 python setup.py bdist_wheel --plat-name=macosx_10_9_x86_64
USE_DISTRIBUTED=0 WERROR=1 python setup.py bdist_wheel
}
build_lite_interpreter() {
@ -68,14 +62,8 @@ build_lite_interpreter() {
"${CPP_BUILD}/caffe2/build/bin/test_lite_interpreter_runtime"
}
print_cmake_info
if [[ ${BUILD_ENVIRONMENT} = *arm64* ]]; then
if [[ $(uname -m) == "arm64" ]]; then
compile_arm64
else
cross_compile_arm64
fi
cross_compile_arm64
elif [[ ${BUILD_ENVIRONMENT} = *lite-interpreter* ]]; then
export BUILD_LITE_INTERPRETER=1
build_lite_interpreter

View File

@ -9,25 +9,6 @@ sysctl -a | grep machdep.cpu
# These are required for both the build job and the test job.
# In the latter to test cpp extensions.
export MACOSX_DEPLOYMENT_TARGET=11.0
export MACOSX_DEPLOYMENT_TARGET=10.9
export CXX=clang++
export CC=clang
print_cmake_info() {
CMAKE_EXEC=$(which cmake)
echo "$CMAKE_EXEC"
CONDA_INSTALLATION_DIR=$(dirname "$CMAKE_EXEC")
# Print all libraries under cmake rpath for debugging
ls -la "$CONDA_INSTALLATION_DIR/../lib"
export CMAKE_EXEC
# Explicitly add conda env lib folder to cmake rpath to address the flaky issue
# where cmake dependencies couldn't be found. This seems to point to how conda
# links $CMAKE_EXEC to its package cache when cloning a new environment
install_name_tool -add_rpath @executable_path/../lib "${CMAKE_EXEC}" || true
# Adding the rpath will invalidate cmake signature, so signing it again here
# to trust the executable. EXC_BAD_ACCESS (SIGKILL (Code Signature Invalid))
# with an exit code 137 otherwise
codesign -f -s - "${CMAKE_EXEC}" || true
}

View File

@ -25,7 +25,6 @@ setup_test_python() {
# using the address associated with the loopback interface.
export GLOO_SOCKET_IFNAME=lo0
echo "Ninja version: $(ninja --version)"
echo "Python version: $(which python) ($(python --version))"
# Increase default limit on open file handles from 256 to 1024
ulimit -n 1024
@ -71,19 +70,37 @@ test_libtorch() {
VERBOSE=1 DEBUG=1 python "$BUILD_LIBTORCH_PY"
popd
MNIST_DIR="${PWD}/test/cpp/api/mnist"
python tools/download_mnist.py --quiet -d "${MNIST_DIR}"
python tools/download_mnist.py --quiet -d test/cpp/api/mnist
# Unfortunately it seems like the test can't load from miniconda3
# without these paths being set
export DYLD_LIBRARY_PATH="$DYLD_LIBRARY_PATH:$PWD/miniconda3/lib"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$PWD/miniconda3/lib"
TORCH_CPP_TEST_MNIST_PATH="${MNIST_DIR}" CPP_TESTS_DIR="${CPP_BUILD}/caffe2/bin" python test/run_test.py --cpp --verbose -i cpp/test_api
TORCH_CPP_TEST_MNIST_PATH="test/cpp/api/mnist" "$CPP_BUILD"/caffe2/bin/test_api
assert_git_not_dirty
fi
}
print_cmake_info() {
CMAKE_EXEC=$(which cmake)
echo "$CMAKE_EXEC"
CONDA_INSTALLATION_DIR=$(dirname "$CMAKE_EXEC")
# Print all libraries under cmake rpath for debugging
ls -la "$CONDA_INSTALLATION_DIR/../lib"
export CMAKE_EXEC
# Explicitly add conda env lib folder to cmake rpath to address the flaky issue
# where cmake dependencies couldn't be found. This seems to point to how conda
# links $CMAKE_EXEC to its package cache when cloning a new environment
install_name_tool -add_rpath @executable_path/../lib "${CMAKE_EXEC}" || true
# Adding the rpath will invalidate cmake signature, so signing it again here
# to trust the executable. EXC_BAD_ACCESS (SIGKILL (Code Signature Invalid))
# with an exit code 137 otherwise
codesign -f -s - "${CMAKE_EXEC}" || true
}
test_custom_backend() {
print_cmake_info
@ -149,7 +166,9 @@ test_jit_hooks() {
assert_git_not_dirty
}
if [[ $NUM_TEST_SHARDS -gt 1 ]]; then
if [[ "${TEST_CONFIG}" == *functorch* ]]; then
test_functorch
elif [[ $NUM_TEST_SHARDS -gt 1 ]]; then
test_python_shard "${SHARD_NUMBER}"
if [[ "${SHARD_NUMBER}" == 1 ]]; then
test_libtorch

View File

@ -8,7 +8,6 @@
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Testing pytorch"
time python test/run_test.py --include test_cuda_multigpu test_cuda_primary_ctx --verbose
# Disabling tests to see if they solve timeout issues; see https://github.com/pytorch/pytorch/issues/70015
# python tools/download_mnist.py --quiet -d test/cpp/api/mnist
@ -28,25 +27,23 @@ time python test/run_test.py --verbose -i distributed/checkpoint/test_checkpoint
time python test/run_test.py --verbose -i distributed/checkpoint/test_file_system_checkpoint
time python test/run_test.py --verbose -i distributed/_shard/sharding_spec/test_sharding_spec
time python test/run_test.py --verbose -i distributed/_shard/sharding_plan/test_sharding_plan
time python test/run_test.py --verbose -i distributed/_shard/sharded_tensor/test_megatron_prototype
time python test/run_test.py --verbose -i distributed/_shard/sharded_tensor/test_sharded_tensor
time python test/run_test.py --verbose -i distributed/_shard/sharded_tensor/test_sharded_tensor_reshard
# functional collective tests
time python test/run_test.py --verbose -i distributed/test_functional_api
# DTensor tests
time python test/run_test.py --verbose -i distributed/_tensor/test_device_mesh
time python test/run_test.py --verbose -i distributed/_tensor/test_random_ops
time python test/run_test.py --verbose -i distributed/_tensor/test_dtensor_compile
# DTensor/TP tests
time python test/run_test.py --verbose -i distributed/tensor/parallel/test_ddp_2d_parallel
time python test/run_test.py --verbose -i distributed/tensor/parallel/test_fsdp_2d_parallel
time python test/run_test.py --verbose -i distributed/tensor/parallel/test_tp_examples
time python test/run_test.py --verbose -i distributed/_shard/sharded_tensor/ops/test_chunk
time python test/run_test.py --verbose -i distributed/_shard/sharded_tensor/ops/test_elementwise_ops
time python test/run_test.py --verbose -i distributed/_shard/sharded_tensor/ops/test_embedding
time python test/run_test.py --verbose -i distributed/_shard/sharded_tensor/ops/test_embedding_bag
time python test/run_test.py --verbose -i distributed/_shard/sharded_tensor/ops/test_binary_cmp
time python test/run_test.py --verbose -i distributed/_shard/sharded_tensor/ops/test_init
time python test/run_test.py --verbose -i distributed/_shard/sharded_tensor/ops/test_linear
time python test/run_test.py --verbose -i distributed/_shard/sharded_tensor/ops/test_math_ops
time python test/run_test.py --verbose -i distributed/_shard/sharded_tensor/ops/test_matrix_ops
time python test/run_test.py --verbose -i distributed/_shard/sharded_tensor/ops/test_softmax
time python test/run_test.py --verbose -i distributed/_shard/sharded_optim/test_sharded_optim
time python test/run_test.py --verbose -i distributed/_shard/test_partial_tensor
time python test/run_test.py --verbose -i distributed/_shard/test_replicated_tensor
# Other tests
time python test/run_test.py --verbose -i test_cuda_primary_ctx
time python test/run_test.py --verbose -i test_optim -- -k optimizers_with_varying_tensors
time python test/run_test.py --verbose -i test_foreach -- -k test_tensors_grouping
assert_git_not_dirty

View File

@ -1,41 +1,32 @@
import argparse
import sys
import json
import math
import sys
import argparse
parser = argparse.ArgumentParser()
parser.add_argument(
"--test-name", dest="test_name", action="store", required=True, help="test name"
)
parser.add_argument(
"--sample-stats",
dest="sample_stats",
action="store",
required=True,
help="stats from sample",
)
parser.add_argument(
"--update",
action="store_true",
help="whether to update baseline using stats from sample",
)
parser.add_argument('--test-name', dest='test_name', action='store',
required=True, help='test name')
parser.add_argument('--sample-stats', dest='sample_stats', action='store',
required=True, help='stats from sample')
parser.add_argument('--update', action='store_true',
help='whether to update baseline using stats from sample')
args = parser.parse_args()
test_name = args.test_name
if "cpu" in test_name:
backend = "cpu"
elif "gpu" in test_name:
backend = "gpu"
if 'cpu' in test_name:
backend = 'cpu'
elif 'gpu' in test_name:
backend = 'gpu'
data_file_path = f"../{backend}_runtime.json"
data_file_path = '../{}_runtime.json'.format(backend)
with open(data_file_path) as data_file:
data = json.load(data_file)
if test_name in data:
mean = float(data[test_name]["mean"])
sigma = float(data[test_name]["sigma"])
mean = float(data[test_name]['mean'])
sigma = float(data[test_name]['sigma'])
else:
# Let the test pass if baseline number doesn't exist
mean = sys.maxsize
@ -52,39 +43,37 @@ if math.isnan(mean) or math.isnan(sigma):
sample_stats_data = json.loads(args.sample_stats)
sample_mean = float(sample_stats_data["mean"])
sample_sigma = float(sample_stats_data["sigma"])
sample_mean = float(sample_stats_data['mean'])
sample_sigma = float(sample_stats_data['sigma'])
print("sample mean: ", sample_mean)
print("sample sigma: ", sample_sigma)
if math.isnan(sample_mean):
raise Exception("""Error: sample mean is NaN""")
raise Exception('''Error: sample mean is NaN''')
elif math.isnan(sample_sigma):
raise Exception("""Error: sample sigma is NaN""")
raise Exception('''Error: sample sigma is NaN''')
z_value = (sample_mean - mean) / sigma
print("z-value: ", z_value)
if z_value >= 3:
raise Exception(
f"""\n
raise Exception('''\n
z-value >= 3, there is high chance of perf regression.\n
To reproduce this regression, run
`cd .ci/pytorch/perf_test/ && bash {test_name}.sh` on your local machine
`cd .ci/pytorch/perf_test/ && bash {}.sh` on your local machine
and compare the runtime before/after your code change.
"""
)
'''.format(test_name))
else:
print("z-value < 3, no perf regression detected.")
if args.update:
print("We will use these numbers as new baseline.")
new_data_file_path = f"../new_{backend}_runtime.json"
new_data_file_path = '../new_{}_runtime.json'.format(backend)
with open(new_data_file_path) as new_data_file:
new_data = json.load(new_data_file)
new_data[test_name] = {}
new_data[test_name]["mean"] = sample_mean
new_data[test_name]["sigma"] = max(sample_sigma, sample_mean * 0.1)
with open(new_data_file_path, "w") as new_data_file:
new_data[test_name]['mean'] = sample_mean
new_data[test_name]['sigma'] = max(sample_sigma, sample_mean * 0.1)
with open(new_data_file_path, 'w') as new_data_file:
json.dump(new_data, new_data_file, indent=4)

View File

@ -1,6 +1,5 @@
import json
import sys
import json
import numpy
sample_data_list = sys.argv[1:]
@ -10,8 +9,8 @@ sample_mean = numpy.mean(sample_data_list)
sample_sigma = numpy.std(sample_data_list)
data = {
"mean": sample_mean,
"sigma": sample_sigma,
'mean': sample_mean,
'sigma': sample_sigma,
}
print(json.dumps(data))

View File

@ -1,5 +1,5 @@
import json
import sys
import json
data_file_path = sys.argv[1]
commit_hash = sys.argv[2]
@ -7,7 +7,7 @@ commit_hash = sys.argv[2]
with open(data_file_path) as data_file:
data = json.load(data_file)
data["commit"] = commit_hash
data['commit'] = commit_hash
with open(data_file_path, "w") as data_file:
with open(data_file_path, 'w') as data_file:
json.dump(data, data_file)

View File

@ -9,9 +9,9 @@ for line in lines:
# Ignore errors from CPU instruction set, symbol existing testing,
# or compilation error formatting
ignored_keywords = [
"src.c",
"CheckSymbolExists.c",
"test_compilation_error_formatting",
'src.c',
'CheckSymbolExists.c',
'test_compilation_error_formatting',
]
if all(keyword not in line for keyword in ignored_keywords):
if all([keyword not in line for keyword in ignored_keywords]):
print(line)

File diff suppressed because it is too large Load Diff

View File

@ -15,6 +15,13 @@ source "$SCRIPT_PARENT_DIR/common.sh"
# shellcheck source=./common-build.sh
source "$SCRIPT_PARENT_DIR/common-build.sh"
IMAGE_COMMIT_ID=$(git rev-parse HEAD)
export IMAGE_COMMIT_ID
export IMAGE_COMMIT_TAG=${BUILD_ENVIRONMENT}-${IMAGE_COMMIT_ID}
if [[ ${JOB_NAME} == *"develop"* ]]; then
export IMAGE_COMMIT_TAG=develop-${IMAGE_COMMIT_TAG}
fi
export TMP_DIR="${PWD}/build/win_tmp"
TMP_DIR_WIN=$(cygpath -w "${TMP_DIR}")
export TMP_DIR_WIN
@ -23,6 +30,14 @@ if [[ -n "$PYTORCH_FINAL_PACKAGE_DIR" ]]; then
mkdir -p "$PYTORCH_FINAL_PACKAGE_DIR" || true
fi
# This directory is used only to hold "pytorch_env_restore.bat", called via "setup_pytorch_env.bat"
CI_SCRIPTS_DIR=$TMP_DIR/ci_scripts
mkdir -p "$CI_SCRIPTS_DIR"
if [ -n "$(ls "$CI_SCRIPTS_DIR"/*)" ]; then
rm "$CI_SCRIPTS_DIR"/*
fi
export SCRIPT_HELPERS_DIR=$SCRIPT_PARENT_DIR/win-test-helpers
set +ex
@ -44,4 +59,7 @@ set -ex
assert_git_not_dirty
if [ ! -f "${TMP_DIR}"/"${IMAGE_COMMIT_TAG}".7z ] && [ ! "${BUILD_ENVIRONMENT}" == "" ]; then
exit 1
fi
echo "BUILD PASSED"

View File

@ -111,8 +111,23 @@ if "%USE_CUDA%"=="1" (
set CMAKE_CUDA_COMPILER_LAUNCHER=%TMP_DIR%/bin/randomtemp.exe;%TMP_DIR%\bin\sccache.exe
)
:: Print all existing environment variable for debugging
set
@echo off
echo @echo off >> %TMP_DIR_WIN%\ci_scripts\pytorch_env_restore.bat
for /f "usebackq tokens=*" %%i in (`set`) do echo set "%%i" >> %TMP_DIR_WIN%\ci_scripts\pytorch_env_restore.bat
@echo on
if "%REBUILD%" == "" (
if NOT "%BUILD_ENVIRONMENT%" == "" (
:: Create a shortcut to restore pytorch environment
echo @echo off >> %TMP_DIR_WIN%/ci_scripts/pytorch_env_restore_helper.bat
echo call "%TMP_DIR_WIN%/ci_scripts/pytorch_env_restore.bat" >> %TMP_DIR_WIN%/ci_scripts/pytorch_env_restore_helper.bat
echo cd /D "%CD%" >> %TMP_DIR_WIN%/ci_scripts/pytorch_env_restore_helper.bat
aws s3 cp "s3://ossci-windows/Restore PyTorch Environment.lnk" "C:\Users\circleci\Desktop\Restore PyTorch Environment.lnk"
if errorlevel 1 exit /b
if not errorlevel 0 exit /b
)
)
python setup.py bdist_wheel
if errorlevel 1 exit /b
@ -123,12 +138,18 @@ python -c "import os, glob; os.system('python -mpip install --no-index --no-deps
if "%BUILD_ENVIRONMENT%"=="" (
echo NOTE: To run `import torch`, please make sure to activate the conda environment by running `call %CONDA_PARENT_DIR%\Miniconda3\Scripts\activate.bat %CONDA_PARENT_DIR%\Miniconda3` in Command Prompt before running Git Bash.
) else (
copy /Y "dist\*.whl" "%PYTORCH_FINAL_PACKAGE_DIR%"
if "%USE_CUDA%"=="1" (
7z a %TMP_DIR_WIN%\%IMAGE_COMMIT_TAG%.7z %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\torch %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\torchgen %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\functorch %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\nvfuser && copy /Y "%TMP_DIR_WIN%\%IMAGE_COMMIT_TAG%.7z" "%PYTORCH_FINAL_PACKAGE_DIR%\"
) else (
7z a %TMP_DIR_WIN%\%IMAGE_COMMIT_TAG%.7z %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\torch %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\torchgen %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\functorch && copy /Y "%TMP_DIR_WIN%\%IMAGE_COMMIT_TAG%.7z" "%PYTORCH_FINAL_PACKAGE_DIR%\"
)
if errorlevel 1 exit /b
if not errorlevel 0 exit /b
:: export test times so that potential sharded tests that'll branch off this build will use consistent data
python tools/stats/export_test_times.py
copy /Y ".pytorch-test-times.json" "%PYTORCH_FINAL_PACKAGE_DIR%"
copy /Y ".pytorch-test-file-ratings.json" "%PYTORCH_FINAL_PACKAGE_DIR%"
:: Also save build/.ninja_log as an artifact
copy /Y "build\.ninja_log" "%PYTORCH_FINAL_PACKAGE_DIR%\"

View File

@ -0,0 +1,19 @@
call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
:: exit the batch once there's an error
if not errorlevel 0 (
echo "setup pytorch env failed"
echo %errorlevel%
exit /b
)
echo "Test functorch"
pushd test
python run_test.py --functorch --shard "%SHARD_NUMBER%" "%NUM_TEST_SHARDS%" --verbose
popd
if ERRORLEVEL 1 goto fail
:eof
exit /b 0
:fail
exit /b 1

View File

@ -1,7 +1,7 @@
#!/usr/bin/env python3
import os
import subprocess
import os
COMMON_TESTS = [
(
@ -31,7 +31,8 @@ GPU_TESTS = [
if __name__ == "__main__":
if "USE_CUDA" in os.environ and os.environ["USE_CUDA"] == "1":
if 'USE_CUDA' in os.environ and os.environ['USE_CUDA'] == '1':
TESTS = COMMON_TESTS + GPU_TESTS
else:
TESTS = COMMON_TESTS
@ -43,10 +44,8 @@ if __name__ == "__main__":
try:
subprocess.check_call(command_args)
except subprocess.CalledProcessError as e:
sdk_root = os.environ.get(
"WindowsSdkDir", "C:\\Program Files (x86)\\Windows Kits\\10"
)
debugger = os.path.join(sdk_root, "Debuggers", "x64", "cdb.exe")
sdk_root = os.environ.get('WindowsSdkDir', 'C:\\Program Files (x86)\\Windows Kits\\10')
debugger = os.path.join(sdk_root, 'Debuggers', 'x64', 'cdb.exe')
if os.path.exists(debugger):
command_args = [debugger, "-o", "-c", "~*g; q"] + command_args
command_string = " ".join(command_args)

View File

@ -1,3 +1,8 @@
if exist "%TMP_DIR%/ci_scripts/pytorch_env_restore.bat" (
call %TMP_DIR%/ci_scripts/pytorch_env_restore.bat
exit /b 0
)
set PATH=C:\Program Files\CMake\bin;C:\Program Files\7-Zip;C:\ProgramData\chocolatey\bin;C:\Program Files\Git\cmd;C:\Program Files\Amazon\AWSCLI;C:\Program Files\Amazon\AWSCLI\bin;%PATH%
:: Install Miniconda3
@ -9,13 +14,6 @@ call %INSTALLER_DIR%\activate_miniconda3.bat
if errorlevel 1 exit /b
if not errorlevel 0 exit /b
:: PyTorch is now installed using the standard wheel on Windows into the conda environment.
:: However, the test scripts are still frequently referring to the workspace temp directory
:: build\torch. Rather than changing all these references, making a copy of torch folder
:: from conda to the current workspace is easier. The workspace will be cleaned up after
:: the job anyway
xcopy /s %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\torch %TMP_DIR_WIN%\build\torch\
pushd .
if "%VC_VERSION%" == "" (
call "C:\Program Files (x86)\Microsoft Visual Studio\%VC_YEAR%\%VC_PRODUCT%\VC\Auxiliary\Build\vcvarsall.bat" x64
@ -50,5 +48,26 @@ set NUMBAPRO_NVVM=%CUDA_PATH%\nvvm\bin\nvvm64_32_0.dll
set PYTHONPATH=%TMP_DIR_WIN%\build;%PYTHONPATH%
:: Print all existing environment variable for debugging
set
if NOT "%BUILD_ENVIRONMENT%"=="" (
pushd %TMP_DIR_WIN%\build
copy /Y %PYTORCH_FINAL_PACKAGE_DIR_WIN%\%IMAGE_COMMIT_TAG%.7z %TMP_DIR_WIN%\
:: 7z: -aos skips if exists because this .bat can be called multiple times
7z x %TMP_DIR_WIN%\%IMAGE_COMMIT_TAG%.7z -aos
popd
) else (
xcopy /s %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\torch %TMP_DIR_WIN%\build\torch\
)
@echo off
echo @echo off >> %TMP_DIR_WIN%/ci_scripts/pytorch_env_restore.bat
for /f "usebackq tokens=*" %%i in (`set`) do echo set "%%i" >> %TMP_DIR_WIN%/ci_scripts/pytorch_env_restore.bat
@echo on
if NOT "%BUILD_ENVIRONMENT%" == "" (
:: Create a shortcut to restore pytorch environment
echo @echo off >> %TMP_DIR_WIN%/ci_scripts/pytorch_env_restore_helper.bat
echo call "%TMP_DIR_WIN%/ci_scripts/pytorch_env_restore.bat" >> %TMP_DIR_WIN%/ci_scripts/pytorch_env_restore_helper.bat
echo cd /D "%CD%" >> %TMP_DIR_WIN%/ci_scripts/pytorch_env_restore_helper.bat
aws s3 cp "s3://ossci-windows/Restore PyTorch Environment.lnk" "C:\Users\circleci\Desktop\Restore PyTorch Environment.lnk"
)

View File

@ -5,16 +5,14 @@ if "%USE_CUDA%" == "0" IF NOT "%CUDA_VERSION%" == "cpu" exit /b 0
call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
if errorlevel 1 exit /b 1
:: Save the current working directory so that we can go back there
set CWD=%cd%
set CPP_TESTS_DIR=%TMP_DIR_WIN%\build\torch\bin
cd %TMP_DIR_WIN%\build\torch\bin
set TEST_OUT_DIR=%~dp0\..\..\..\test\test-reports\cpp-unittest
md %TEST_OUT_DIR%
set PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt\bin\x64;%TMP_DIR_WIN%\build\torch\lib;%PATH%
set TORCH_CPP_TEST_MNIST_PATH=%CWD%\test\cpp\api\mnist
python tools\download_mnist.py --quiet -d %TORCH_CPP_TEST_MNIST_PATH%
python test\run_test.py --cpp --verbose -i cpp/test_api
set TEST_API_OUT_DIR=%TEST_OUT_DIR%\test_api
md %TEST_API_OUT_DIR%
test_api.exe --gtest_filter="-IntegrationTest.MNIST*" --gtest_output=xml:%TEST_API_OUT_DIR%\test_api.xml
if errorlevel 1 exit /b 1
if not errorlevel 0 exit /b 1
@ -27,10 +25,6 @@ for /r "." %%a in (*.exe) do (
goto :eof
:libtorch_check
cd %CWD%
set CPP_TESTS_DIR=%TMP_DIR_WIN%\build\torch\test
:: Skip verify_api_visibility as it a compile level test
if "%~1" == "verify_api_visibility" goto :eof
@ -48,12 +42,12 @@ if "%~1" == "utility_ops_gpu_test" goto :eof
echo Running "%~2"
if "%~1" == "c10_intrusive_ptr_benchmark" (
:: NB: This is not a gtest executable file, thus couldn't be handled by pytest-cpp
call "%~2"
goto :eof
)
python test\run_test.py --cpp --verbose -i "cpp/%~1"
:: Differentiating the test report directories is crucial for test time reporting.
md %TEST_OUT_DIR%\%~n2
call "%~2" --gtest_output=xml:%TEST_OUT_DIR%\%~n2\%~1.xml
if errorlevel 1 (
echo %1 failed with exit code %errorlevel%
exit /b 1

View File

@ -2,7 +2,6 @@ call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
echo Copying over test times file
copy /Y "%PYTORCH_FINAL_PACKAGE_DIR_WIN%\.pytorch-test-times.json" "%PROJECT_DIR_WIN%"
copy /Y "%PYTORCH_FINAL_PACKAGE_DIR_WIN%\.pytorch-test-file-ratings.json" "%PROJECT_DIR_WIN%"
pushd test

View File

@ -23,7 +23,6 @@ if "%SHARD_NUMBER%" == "1" (
echo Copying over test times file
copy /Y "%PYTORCH_FINAL_PACKAGE_DIR_WIN%\.pytorch-test-times.json" "%PROJECT_DIR_WIN%"
copy /Y "%PYTORCH_FINAL_PACKAGE_DIR_WIN%\.pytorch-test-file-ratings.json" "%PROJECT_DIR_WIN%"
echo Run nn tests
python run_test.py --exclude-jit-executor --exclude-distributed-tests --shard "%SHARD_NUMBER%" "%NUM_TEST_SHARDS%" --verbose

View File

@ -5,6 +5,13 @@ SCRIPT_PARENT_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
# shellcheck source=./common.sh
source "$SCRIPT_PARENT_DIR/common.sh"
IMAGE_COMMIT_ID=$(git rev-parse HEAD)
export IMAGE_COMMIT_ID
export IMAGE_COMMIT_TAG=${BUILD_ENVIRONMENT}-${IMAGE_COMMIT_ID}
if [[ ${JOB_NAME} == *"develop"* ]]; then
export IMAGE_COMMIT_TAG=develop-${IMAGE_COMMIT_TAG}
fi
export TMP_DIR="${PWD}/build/win_tmp"
TMP_DIR_WIN=$(cygpath -w "${TMP_DIR}")
export TMP_DIR_WIN
@ -14,12 +21,22 @@ export PROJECT_DIR_WIN
export TEST_DIR="${PWD}/test"
TEST_DIR_WIN=$(cygpath -w "${TEST_DIR}")
export TEST_DIR_WIN
export PYTORCH_FINAL_PACKAGE_DIR="${PYTORCH_FINAL_PACKAGE_DIR:-/c/w/build-results}"
export PYTORCH_FINAL_PACKAGE_DIR="${PYTORCH_FINAL_PACKAGE_DIR:-/c/users/circleci/workspace/build-results}"
PYTORCH_FINAL_PACKAGE_DIR_WIN=$(cygpath -w "${PYTORCH_FINAL_PACKAGE_DIR}")
export PYTORCH_FINAL_PACKAGE_DIR_WIN
mkdir -p "$TMP_DIR"/build/torch
# This directory is used only to hold "pytorch_env_restore.bat", called via "setup_pytorch_env.bat"
CI_SCRIPTS_DIR=$TMP_DIR/ci_scripts
mkdir -p "$CI_SCRIPTS_DIR"
if [ -n "$(ls "$CI_SCRIPTS_DIR"/*)" ]; then
rm "$CI_SCRIPTS_DIR"/*
fi
export SCRIPT_HELPERS_DIR=$SCRIPT_PARENT_DIR/win-test-helpers
if [[ "$TEST_CONFIG" = "force_on_cpu" ]]; then
@ -34,12 +51,6 @@ if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then
export PYTORCH_TESTING_DEVICE_ONLY_FOR="cuda"
fi
# TODO: Move both of them to Windows AMI
python -m pip install pytest-rerunfailures==10.3 pytest-cpp==2.3.0 tensorboard==2.13.0
# Install Z3 optional dependency for Windows builds.
python -m pip install z3-solver
run_tests() {
# Run nvidia-smi if available
for path in '/c/Program Files/NVIDIA Corporation/NVSMI/nvidia-smi.exe' /c/Windows/System32/nvidia-smi.exe; do
@ -49,7 +60,9 @@ run_tests() {
fi
done
if [[ $NUM_TEST_SHARDS -eq 1 ]]; then
if [[ "${TEST_CONFIG}" == *functorch* ]]; then
"$SCRIPT_HELPERS_DIR"/install_test_functorch.bat
elif [[ $NUM_TEST_SHARDS -eq 1 ]]; then
"$SCRIPT_HELPERS_DIR"/test_python_shard.bat
"$SCRIPT_HELPERS_DIR"/test_custom_script_ops.bat
"$SCRIPT_HELPERS_DIR"/test_custom_backend.bat

View File

@ -106,7 +106,7 @@ All binaries are built in CircleCI workflows except Windows. There are checked-i
Some quick vocab:
* A \**workflow** is a CircleCI concept; it is a DAG of '**jobs**'. ctrl-f 'workflows' on https://github.com/pytorch/pytorch/blob/main/.circleci/config.yml to see the workflows.
* A \**workflow** is a CircleCI concept; it is a DAG of '**jobs**'. ctrl-f 'workflows' on https://github.com/pytorch/pytorch/blob/master/.circleci/config.yml to see the workflows.
* **jobs** are a sequence of '**steps**'
* **steps** are usually just a bash script or a builtin CircleCI command. *All steps run in new environments, environment variables declared in one script DO NOT persist to following steps*
* CircleCI has a **workspace**, which is essentially a cache between steps of the *same job* in which you can store artifacts between steps.
@ -117,8 +117,8 @@ The nightly binaries have 3 workflows. We have one job (actually 3 jobs: build,
1. binary_builds
1. every day midnight EST
2. linux: https://github.com/pytorch/pytorch/blob/main/.circleci/verbatim-sources/linux-binary-build-defaults.yml
3. macos: https://github.com/pytorch/pytorch/blob/main/.circleci/verbatim-sources/macos-binary-build-defaults.yml
2. linux: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/linux-binary-build-defaults.yml
3. macos: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/macos-binary-build-defaults.yml
4. For each binary configuration, e.g. linux_conda_3.7_cpu there is a
1. binary_linux_conda_3.7_cpu_build
1. Builds the build. On linux jobs this uses the 'docker executor'.
@ -133,12 +133,12 @@ The nightly binaries have 3 workflows. We have one job (actually 3 jobs: build,
2. Uploads the package
2. update_s3_htmls
1. every day 5am EST
2. https://github.com/pytorch/pytorch/blob/main/.circleci/verbatim-sources/binary_update_htmls.yml
2. https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/binary_update_htmls.yml
3. See below for what these are for and why they're needed
4. Three jobs that each examine the current contents of aws and the conda repo and update some html files in s3
3. binarysmoketests
1. every day
2. https://github.com/pytorch/pytorch/blob/main/.circleci/verbatim-sources/nightly-build-smoke-tests-defaults.yml
2. https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-build-smoke-tests-defaults.yml
3. For each binary configuration, e.g. linux_conda_3.7_cpu there is a
1. smoke_linux_conda_3.7_cpu
1. Downloads the package from the cloud, e.g. using the official pip or conda instructions
@ -146,26 +146,26 @@ The nightly binaries have 3 workflows. We have one job (actually 3 jobs: build,
## How are the jobs structured?
The jobs are in https://github.com/pytorch/pytorch/tree/main/.circleci/verbatim-sources. Jobs are made of multiple steps. There are some shared steps used by all the binaries/smokes. Steps of these jobs are all delegated to scripts in https://github.com/pytorch/pytorch/tree/main/.circleci/scripts .
The jobs are in https://github.com/pytorch/pytorch/tree/master/.circleci/verbatim-sources. Jobs are made of multiple steps. There are some shared steps used by all the binaries/smokes. Steps of these jobs are all delegated to scripts in https://github.com/pytorch/pytorch/tree/master/.circleci/scripts .
* Linux jobs: https://github.com/pytorch/pytorch/blob/main/.circleci/verbatim-sources/linux-binary-build-defaults.yml
* Linux jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/linux-binary-build-defaults.yml
* binary_linux_build.sh
* binary_linux_test.sh
* binary_linux_upload.sh
* MacOS jobs: https://github.com/pytorch/pytorch/blob/main/.circleci/verbatim-sources/macos-binary-build-defaults.yml
* MacOS jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/macos-binary-build-defaults.yml
* binary_macos_build.sh
* binary_macos_test.sh
* binary_macos_upload.sh
* Update html jobs: https://github.com/pytorch/pytorch/blob/main/.circleci/verbatim-sources/binary_update_htmls.yml
* Update html jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/binary_update_htmls.yml
* These delegate from the pytorch/builder repo
* https://github.com/pytorch/builder/blob/main/cron/update_s3_htmls.sh
* https://github.com/pytorch/builder/blob/main/cron/upload_binary_sizes.sh
* Smoke jobs (both linux and macos): https://github.com/pytorch/pytorch/blob/main/.circleci/verbatim-sources/nightly-build-smoke-tests-defaults.yml
* https://github.com/pytorch/builder/blob/master/cron/update_s3_htmls.sh
* https://github.com/pytorch/builder/blob/master/cron/upload_binary_sizes.sh
* Smoke jobs (both linux and macos): https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-build-smoke-tests-defaults.yml
* These delegate from the pytorch/builder repo
* https://github.com/pytorch/builder/blob/main/run_tests.sh
* https://github.com/pytorch/builder/blob/main/smoke_test.sh
* https://github.com/pytorch/builder/blob/main/check_binary.sh
* Common shared code (shared across linux and macos): https://github.com/pytorch/pytorch/blob/main/.circleci/verbatim-sources/nightly-binary-build-defaults.yml
* https://github.com/pytorch/builder/blob/master/run_tests.sh
* https://github.com/pytorch/builder/blob/master/smoke_test.sh
* https://github.com/pytorch/builder/blob/master/check_binary.sh
* Common shared code (shared across linux and macos): https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-binary-build-defaults.yml
* binary_checkout.sh - checks out pytorch/builder repo. Right now this also checks out pytorch/pytorch, but it shouldn't. pytorch/pytorch should just be shared through the workspace. This can handle being run before binary_populate_env.sh
* binary_populate_env.sh - parses BUILD_ENVIRONMENT into the separate env variables that make up a binary configuration. Also sets lots of default values, the date, the version strings, the location of folders in s3, all sorts of things. This generally has to be run before other steps.
* binary_install_miniconda.sh - Installs miniconda, cross platform. Also hacks this for the update_binary_sizes job that doesn't have the right env variables
@ -308,7 +308,7 @@ Note that the Windows Python wheels are still built in conda environments. Some
* These should all be consolidated
* These must run on all OS types: MacOS, Linux, and Windows
* These all run smoke tests at the moment. They inspect the packages some, maybe run a few import statements. They DO NOT run the python tests nor the cpp tests. The idea is that python tests on main and PR merges will catch all breakages. All these tests have to do is make sure the special binary machinery didnt mess anything up.
* These all run smoke tests at the moment. They inspect the packages some, maybe run a few import statements. They DO NOT run the python tests nor the cpp tests. The idea is that python tests on master and PR merges will catch all breakages. All these tests have to do is make sure the special binary machinery didnt mess anything up.
* There are separate run_tests.sh and smoke_test.sh because one used to be called by the smoke jobs and one used to be called by the binary test jobs (see circleci structure section above). This is still true actually, but these could be united into a single script that runs these checks, given an installed pytorch package.
### Note on libtorch
@ -340,7 +340,7 @@ The Dockerfiles are available in pytorch/builder, but there is no circleci job o
tl;dr make a PR that looks like https://github.com/pytorch/pytorch/pull/21159
Sometimes we want to push a change to mainand then rebuild all of today's binaries after that change. As of May 30, 2019 there isn't a way to manually run a workflow in the UI. You can manually re-run a workflow, but it will use the exact same git commits as the first run and will not include any changes. So we have to make a PR and then force circleci to run the binary workflow instead of the normal tests. The above PR is an example of how to do this; essentially you copy-paste the binarybuilds workflow steps into the default workflow steps. If you need to point the builder repo to a different commit then you'd need to change https://github.com/pytorch/pytorch/blob/main/.circleci/scripts/binary_checkout.sh#L42-L45 to checkout what you want.
Sometimes we want to push a change to master and then rebuild all of today's binaries after that change. As of May 30, 2019 there isn't a way to manually run a workflow in the UI. You can manually re-run a workflow, but it will use the exact same git commits as the first run and will not include any changes. So we have to make a PR and then force circleci to run the binary workflow instead of the normal tests. The above PR is an example of how to do this; essentially you copy-paste the binarybuilds workflow steps into the default workflow steps. If you need to point the builder repo to a different commit then you'd need to change https://github.com/pytorch/pytorch/blob/master/.circleci/scripts/binary_checkout.sh#L42-L45 to checkout what you want.
## How to test changes to the binaries via .circleci

View File

@ -9,9 +9,8 @@ should be "pruned".
from collections import OrderedDict
import cimodel.data.dimensions as dimensions
from cimodel.lib.conf_tree import ConfigNode
import cimodel.data.dimensions as dimensions
LINKING_DIMENSIONS = [
@ -27,18 +26,12 @@ DEPS_INCLUSION_DIMENSIONS = [
def get_processor_arch_name(gpu_version):
return (
"cpu"
if not gpu_version
else (
"cu" + gpu_version.strip("cuda")
if gpu_version.startswith("cuda")
else gpu_version
)
return "cpu" if not gpu_version else (
"cu" + gpu_version.strip("cuda") if gpu_version.startswith("cuda") else gpu_version
)
CONFIG_TREE_DATA = OrderedDict()
CONFIG_TREE_DATA = OrderedDict(
)
# GCC config variants:
#
@ -48,8 +41,8 @@ CONFIG_TREE_DATA = OrderedDict()
#
# Libtorch with new gcc ABI is built with gcc 5.4 on Ubuntu 16.04.
LINUX_GCC_CONFIG_VARIANTS = OrderedDict(
manywheel=["devtoolset7"],
conda=["devtoolset7"],
manywheel=['devtoolset7'],
conda=['devtoolset7'],
libtorch=[
"devtoolset7",
"gcc5.4_cxx11-abi",
@ -70,9 +63,7 @@ class TopLevelNode(ConfigNode):
self.props["smoke"] = smoke
def get_children(self):
return [
OSConfigNode(self, x, c, p) for (x, (c, p)) in self.config_tree_data.items()
]
return [OSConfigNode(self, x, c, p) for (x, (c, p)) in self.config_tree_data.items()]
class OSConfigNode(ConfigNode):
@ -94,20 +85,12 @@ class PackageFormatConfigNode(ConfigNode):
self.props["python_versions"] = python_versions
self.props["package_format"] = package_format
def get_children(self):
if self.find_prop("os_name") == "linux":
return [
LinuxGccConfigNode(self, v)
for v in LINUX_GCC_CONFIG_VARIANTS[self.find_prop("package_format")]
]
elif (
self.find_prop("os_name") == "windows"
and self.find_prop("package_format") == "libtorch"
):
return [
WindowsLibtorchConfigNode(self, v)
for v in WINDOWS_LIBTORCH_CONFIG_VARIANTS
]
return [LinuxGccConfigNode(self, v) for v in LINUX_GCC_CONFIG_VARIANTS[self.find_prop("package_format")]]
elif self.find_prop("os_name") == "windows" and self.find_prop("package_format") == "libtorch":
return [WindowsLibtorchConfigNode(self, v) for v in WINDOWS_LIBTORCH_CONFIG_VARIANTS]
else:
return [ArchConfigNode(self, v) for v in self.find_prop("gpu_versions")]
@ -123,29 +106,23 @@ class LinuxGccConfigNode(ConfigNode):
# XXX devtoolset7 on CUDA 9.0 is temporarily disabled
# see https://github.com/pytorch/pytorch/issues/20066
if self.find_prop("gcc_config_variant") == "devtoolset7":
if self.find_prop("gcc_config_variant") == 'devtoolset7':
gpu_versions = filter(lambda x: x != "cuda_90", gpu_versions)
# XXX disabling conda rocm build since docker images are not there
if self.find_prop("package_format") == "conda":
gpu_versions = filter(
lambda x: x not in dimensions.ROCM_VERSION_LABELS, gpu_versions
)
if self.find_prop("package_format") == 'conda':
gpu_versions = filter(lambda x: x not in dimensions.ROCM_VERSION_LABELS, gpu_versions)
# XXX libtorch rocm build is temporarily disabled
if self.find_prop("package_format") == "libtorch":
gpu_versions = filter(
lambda x: x not in dimensions.ROCM_VERSION_LABELS, gpu_versions
)
if self.find_prop("package_format") == 'libtorch':
gpu_versions = filter(lambda x: x not in dimensions.ROCM_VERSION_LABELS, gpu_versions)
return [ArchConfigNode(self, v) for v in gpu_versions]
class WindowsLibtorchConfigNode(ConfigNode):
def __init__(self, parent, libtorch_config_variant):
super().__init__(
parent, "LIBTORCH_CONFIG_VARIANT=" + str(libtorch_config_variant)
)
super().__init__(parent, "LIBTORCH_CONFIG_VARIANT=" + str(libtorch_config_variant))
self.props["libtorch_config_variant"] = libtorch_config_variant
@ -184,15 +161,11 @@ class LinkingVariantConfigNode(ConfigNode):
super().__init__(parent, linking_variant)
def get_children(self):
return [
DependencyInclusionConfigNode(self, v) for v in DEPS_INCLUSION_DIMENSIONS
]
return [DependencyInclusionConfigNode(self, v) for v in DEPS_INCLUSION_DIMENSIONS]
class DependencyInclusionConfigNode(ConfigNode):
def __init__(self, parent, deps_variant):
super().__init__(parent, deps_variant)
self.props["libtorch_variant"] = "-".join(
[self.parent.get_label(), self.get_label()]
)
self.props["libtorch_variant"] = "-".join([self.parent.get_label(), self.get_label()])

View File

@ -1,24 +1,13 @@
from collections import OrderedDict
import cimodel.data.binary_build_data as binary_build_data
import cimodel.data.simple.util.branch_filters as branch_filters
import cimodel.data.binary_build_data as binary_build_data
import cimodel.lib.conf_tree as conf_tree
import cimodel.lib.miniutils as miniutils
class Conf(object):
def __init__(self, os, gpu_version, pydistro, parms, smoke, libtorch_variant, gcc_config_variant, libtorch_config_variant):
class Conf:
def __init__(
self,
os,
gpu_version,
pydistro,
parms,
smoke,
libtorch_variant,
gcc_config_variant,
libtorch_config_variant,
):
self.os = os
self.gpu_version = gpu_version
self.pydistro = pydistro
@ -29,11 +18,7 @@ class Conf:
self.libtorch_config_variant = libtorch_config_variant
def gen_build_env_parms(self):
elems = (
[self.pydistro]
+ self.parms
+ [binary_build_data.get_processor_arch_name(self.gpu_version)]
)
elems = [self.pydistro] + self.parms + [binary_build_data.get_processor_arch_name(self.gpu_version)]
if self.gcc_config_variant is not None:
elems.append(str(self.gcc_config_variant))
if self.libtorch_config_variant is not None:
@ -41,7 +26,7 @@ class Conf:
return elems
def gen_docker_image(self):
if self.gcc_config_variant == "gcc5.4_cxx11-abi":
if self.gcc_config_variant == 'gcc5.4_cxx11-abi':
if self.gpu_version is None:
return miniutils.quote("pytorch/libtorch-cxx11-builder:cpu")
else:
@ -52,41 +37,30 @@ class Conf:
if self.gpu_version is None:
return miniutils.quote("pytorch/conda-builder:cpu")
else:
return miniutils.quote(f"pytorch/conda-builder:{self.gpu_version}")
return miniutils.quote(
f"pytorch/conda-builder:{self.gpu_version}"
)
docker_word_substitution = {
"manywheel": "manylinux",
"libtorch": "manylinux",
}
docker_distro_prefix = miniutils.override(
self.pydistro, docker_word_substitution
)
docker_distro_prefix = miniutils.override(self.pydistro, docker_word_substitution)
# The cpu nightlies are built on the pytorch/manylinux-cuda102 docker image
# TODO cuda images should consolidate into tag-base images similar to rocm
alt_docker_suffix = (
"cuda102"
if not self.gpu_version
else (
"rocm:" + self.gpu_version.strip("rocm")
if self.gpu_version.startswith("rocm")
else self.gpu_version
)
)
docker_distro_suffix = (
alt_docker_suffix
if self.pydistro != "conda"
else ("cuda" if alt_docker_suffix.startswith("cuda") else "rocm")
)
return miniutils.quote(
"pytorch/" + docker_distro_prefix + "-" + docker_distro_suffix
)
alt_docker_suffix = "cuda102" if not self.gpu_version else (
"rocm:" + self.gpu_version.strip("rocm") if self.gpu_version.startswith("rocm") else self.gpu_version)
docker_distro_suffix = alt_docker_suffix if self.pydistro != "conda" else (
"cuda" if alt_docker_suffix.startswith("cuda") else "rocm")
return miniutils.quote("pytorch/" + docker_distro_prefix + "-" + docker_distro_suffix)
def get_name_prefix(self):
return "smoke" if self.smoke else "binary"
def gen_build_name(self, build_or_test, nightly):
parts = [self.get_name_prefix(), self.os] + self.gen_build_env_parms()
if nightly:
@ -104,9 +78,7 @@ class Conf:
def gen_workflow_job(self, phase, upload_phase_dependency=None, nightly=False):
job_def = OrderedDict()
job_def["name"] = self.gen_build_name(phase, nightly)
job_def["build_environment"] = miniutils.quote(
" ".join(self.gen_build_env_parms())
)
job_def["build_environment"] = miniutils.quote(" ".join(self.gen_build_env_parms()))
if self.smoke:
job_def["requires"] = [
"update_s3_htmls",
@ -144,48 +116,47 @@ class Conf:
os_name = miniutils.override(self.os, {"macos": "mac"})
job_name = "_".join([self.get_name_prefix(), os_name, phase])
return {job_name: job_def}
return {job_name : job_def}
def gen_upload_job(self, phase, requires_dependency):
"""Generate binary_upload job for configuration
Output looks similar to:
Output looks similar to:
- binary_upload:
name: binary_linux_manywheel_3_7m_cu113_devtoolset7_nightly_upload
context: org-member
requires: binary_linux_manywheel_3_7m_cu113_devtoolset7_nightly_test
filters:
branches:
only:
- nightly
tags:
only: /v[0-9]+(\\.[0-9]+)*-rc[0-9]+/
package_type: manywheel
upload_subfolder: cu113
- binary_upload:
name: binary_linux_manywheel_3_7m_cu113_devtoolset7_nightly_upload
context: org-member
requires: binary_linux_manywheel_3_7m_cu113_devtoolset7_nightly_test
filters:
branches:
only:
- nightly
tags:
only: /v[0-9]+(\\.[0-9]+)*-rc[0-9]+/
package_type: manywheel
upload_subfolder: cu113
"""
return {
"binary_upload": OrderedDict(
{
"name": self.gen_build_name(phase, nightly=True),
"context": "org-member",
"requires": [
self.gen_build_name(requires_dependency, nightly=True)
],
"filters": branch_filters.gen_filter_dict(
branches_list=["nightly"],
tags_list=[branch_filters.RC_PATTERN],
),
"package_type": self.pydistro,
"upload_subfolder": binary_build_data.get_processor_arch_name(
self.gpu_version,
),
}
)
"binary_upload": OrderedDict({
"name": self.gen_build_name(phase, nightly=True),
"context": "org-member",
"requires": [self.gen_build_name(
requires_dependency,
nightly=True
)],
"filters": branch_filters.gen_filter_dict(
branches_list=["nightly"],
tags_list=[branch_filters.RC_PATTERN],
),
"package_type": self.pydistro,
"upload_subfolder": binary_build_data.get_processor_arch_name(
self.gpu_version,
),
})
}
def get_root(smoke, name):
return binary_build_data.TopLevelNode(
name,
binary_build_data.CONFIG_TREE_DATA,
@ -194,6 +165,7 @@ def get_root(smoke, name):
def gen_build_env_list(smoke):
root = get_root(smoke, "N/A")
config_list = conf_tree.dfs(root)
@ -204,8 +176,7 @@ def gen_build_env_list(smoke):
c.find_prop("gpu"),
c.find_prop("package_format"),
[c.find_prop("pyver")],
c.find_prop("smoke")
and not (c.find_prop("os_name") == "macos_arm64"), # don't test arm64
c.find_prop("smoke") and not (c.find_prop("os_name") == "macos_arm64"), # don't test arm64
c.find_prop("libtorch_variant"),
c.find_prop("gcc_config_variant"),
c.find_prop("libtorch_config_variant"),
@ -214,11 +185,9 @@ def gen_build_env_list(smoke):
return newlist
def predicate_exclude_macos(config):
return config.os == "linux" or config.os == "windows"
def get_nightly_uploads():
configs = gen_build_env_list(False)
mylist = []
@ -228,7 +197,6 @@ def get_nightly_uploads():
return mylist
def get_post_upload_jobs():
return [
{
@ -242,8 +210,8 @@ def get_post_upload_jobs():
},
]
def get_nightly_tests():
configs = gen_build_env_list(False)
filtered_configs = filter(predicate_exclude_macos, configs)

View File

@ -16,4 +16,9 @@ ROCM_VERSION_LABELS = ["rocm" + v for v in ROCM_VERSIONS]
GPU_VERSIONS = [None] + ["cuda" + v for v in CUDA_VERSIONS] + ROCM_VERSION_LABELS
STANDARD_PYTHON_VERSIONS = ["3.7", "3.8", "3.9", "3.10"]
STANDARD_PYTHON_VERSIONS = [
"3.7",
"3.8",
"3.9",
"3.10"
]

View File

@ -1,7 +1,8 @@
from cimodel.lib.conf_tree import ConfigNode
CONFIG_TREE_DATA = []
CONFIG_TREE_DATA = [
]
def get_major_pyver(dotted_version):
@ -95,7 +96,6 @@ class SlowGradcheckConfigNode(TreeConfigNode):
def child_constructor(self):
return ExperimentalFeatureConfigNode
class PureTorchConfigNode(TreeConfigNode):
def modify_label(self, label):
return "PURE_TORCH=" + str(label)
@ -117,7 +117,6 @@ class XlaConfigNode(TreeConfigNode):
def child_constructor(self):
return ImportantConfigNode
class MPSConfigNode(TreeConfigNode):
def modify_label(self, label):
return "MPS=" + str(label)
@ -255,11 +254,8 @@ class XenialCompilerConfigNode(TreeConfigNode):
# noinspection PyMethodMayBeStatic
def child_constructor(self):
return (
XenialCompilerVersionConfigNode
if self.props["compiler_name"]
else PyVerConfigNode
)
return XenialCompilerVersionConfigNode if self.props["compiler_name"] else PyVerConfigNode
class BionicCompilerConfigNode(TreeConfigNode):
@ -271,11 +267,8 @@ class BionicCompilerConfigNode(TreeConfigNode):
# noinspection PyMethodMayBeStatic
def child_constructor(self):
return (
BionicCompilerVersionConfigNode
if self.props["compiler_name"]
else PyVerConfigNode
)
return BionicCompilerVersionConfigNode if self.props["compiler_name"] else PyVerConfigNode
class XenialCompilerVersionConfigNode(TreeConfigNode):

View File

@ -111,10 +111,10 @@ class Conf:
parameters["resource_class"] = resource_class
if phase == "build" and self.rocm_version is not None:
parameters["resource_class"] = "xlarge"
if hasattr(self, "filters"):
parameters["filters"] = self.filters
if hasattr(self, 'filters'):
parameters['filters'] = self.filters
if self.build_only:
parameters["build_only"] = miniutils.quote(str(int(True)))
parameters['build_only'] = miniutils.quote(str(int(True)))
return parameters
def gen_workflow_job(self, phase):
@ -122,6 +122,7 @@ class Conf:
job_def["name"] = self.gen_build_name(phase)
if Conf.is_test_phase(phase):
# TODO When merging the caffe2 and pytorch jobs, it might be convenient for a while to make a
# caffe2 test job dependent on a pytorch build job. This way we could quickly dedup the repeated
# build of pytorch in the caffe2 build job, and just run the caffe2 tests off of a completed
@ -142,7 +143,7 @@ class Conf:
# TODO This is a hack to special case some configs just for the workflow list
class HiddenConf:
class HiddenConf(object):
def __init__(self, name, parent_build=None, filters=None):
self.name = name
self.parent_build = parent_build
@ -159,8 +160,7 @@ class HiddenConf:
def gen_build_name(self, _):
return self.name
class DocPushConf:
class DocPushConf(object):
def __init__(self, name, parent_build=None, branch="master"):
self.name = name
self.parent_build = parent_build
@ -173,13 +173,11 @@ class DocPushConf:
"branch": self.branch,
"requires": [self.parent_build],
"context": "org-member",
"filters": gen_filter_dict(
branches_list=["nightly"], tags_list=RC_PATTERN
),
"filters": gen_filter_dict(branches_list=["nightly"],
tags_list=RC_PATTERN)
}
}
def gen_docs_configs(xenial_parent_config):
configs = []
@ -187,9 +185,8 @@ def gen_docs_configs(xenial_parent_config):
HiddenConf(
"pytorch_python_doc_build",
parent_build=xenial_parent_config,
filters=gen_filter_dict(
branches_list=["master", "main", "nightly"], tags_list=RC_PATTERN
),
filters=gen_filter_dict(branches_list=["master", "main", "nightly"],
tags_list=RC_PATTERN),
)
)
configs.append(
@ -204,9 +201,8 @@ def gen_docs_configs(xenial_parent_config):
HiddenConf(
"pytorch_cpp_doc_build",
parent_build=xenial_parent_config,
filters=gen_filter_dict(
branches_list=["master", "main", "nightly"], tags_list=RC_PATTERN
),
filters=gen_filter_dict(branches_list=["master", "main", "nightly"],
tags_list=RC_PATTERN),
)
)
configs.append(
@ -230,11 +226,13 @@ def gen_tree():
def instantiate_configs(only_slow_gradcheck):
config_list = []
root = get_root()
found_configs = conf_tree.dfs(root)
for fc in found_configs:
restrict_phases = None
distro_name = fc.find_prop("distro_name")
compiler_name = fc.find_prop("compiler_name")
@ -353,7 +351,8 @@ def instantiate_configs(only_slow_gradcheck):
and compiler_name == "gcc"
and fc.find_prop("compiler_version") == "5.4"
):
c.filters = gen_filter_dict(branches_list=r"/.*/", tags_list=RC_PATTERN)
c.filters = gen_filter_dict(branches_list=r"/.*/",
tags_list=RC_PATTERN)
c.dependent_tests = gen_docs_configs(c)
config_list.append(c)
@ -362,13 +361,16 @@ def instantiate_configs(only_slow_gradcheck):
def get_workflow_jobs(only_slow_gradcheck=False):
config_list = instantiate_configs(only_slow_gradcheck)
x = []
for conf_options in config_list:
phases = conf_options.restrict_phases or dimensions.PHASES
for phase in phases:
# TODO why does this not have a test?
if Conf.is_test_phase(phase) and conf_options.cuda_version == "10":
continue

View File

@ -1,39 +1,39 @@
from collections import OrderedDict
from cimodel.data.simple.util.branch_filters import gen_filter_dict, RC_PATTERN
from cimodel.lib.miniutils import quote
from cimodel.data.simple.util.branch_filters import gen_filter_dict, RC_PATTERN
# NOTE: All hardcoded docker image builds have been migrated to GHA
IMAGE_NAMES = []
IMAGE_NAMES = [
]
# This entry should be an element from the list above
# This should contain the image matching the "slow_gradcheck" entry in
# pytorch_build_data.py
SLOW_GRADCHECK_IMAGE_NAME = "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
def get_workflow_jobs(images=IMAGE_NAMES, only_slow_gradcheck=False):
"""Generates a list of docker image build definitions"""
ret = []
for image_name in images:
if image_name.startswith("docker-"):
image_name = image_name.lstrip("docker-")
if image_name.startswith('docker-'):
image_name = image_name.lstrip('docker-')
if only_slow_gradcheck and image_name is not SLOW_GRADCHECK_IMAGE_NAME:
continue
parameters = OrderedDict(
{
"name": quote(f"docker-{image_name}"),
"image_name": quote(image_name),
}
)
parameters = OrderedDict({
"name": quote(f"docker-{image_name}"),
"image_name": quote(image_name),
})
if image_name == "pytorch-linux-xenial-py3.7-gcc5.4":
# pushing documentation on tags requires CircleCI to also
# build all the dependencies on tags, including this docker image
parameters["filters"] = gen_filter_dict(
branches_list=r"/.*/", tags_list=RC_PATTERN
)
ret.append(OrderedDict({"docker_build_job": parameters}))
parameters['filters'] = gen_filter_dict(branches_list=r"/.*/",
tags_list=RC_PATTERN)
ret.append(OrderedDict(
{
"docker_build_job": parameters
}
))
return ret

View File

@ -1,6 +1,6 @@
import cimodel.lib.miniutils as miniutils
from cimodel.data.simple.util.branch_filters import gen_filter_dict_exclude
from cimodel.data.simple.util.versions import MultiPartVersion
from cimodel.data.simple.util.branch_filters import gen_filter_dict_exclude
import cimodel.lib.miniutils as miniutils
XCODE_VERSION = MultiPartVersion([12, 5, 1])
@ -11,9 +11,7 @@ class ArchVariant:
self.custom_build_name = custom_build_name
def render(self):
extra_parts = (
[self.custom_build_name] if len(self.custom_build_name) > 0 else []
)
extra_parts = [self.custom_build_name] if len(self.custom_build_name) > 0 else []
return "-".join([self.name] + extra_parts).replace("_", "-")
@ -22,9 +20,7 @@ def get_platform(arch_variant_name):
class IOSJob:
def __init__(
self, xcode_version, arch_variant, is_org_member_context=True, extra_props=None
):
def __init__(self, xcode_version, arch_variant, is_org_member_context=True, extra_props=None):
self.xcode_version = xcode_version
self.arch_variant = arch_variant
self.is_org_member_context = is_org_member_context
@ -33,15 +29,11 @@ class IOSJob:
def gen_name_parts(self):
version_parts = self.xcode_version.render_dots_or_parts("-")
build_variant_suffix = self.arch_variant.render()
return (
[
"ios",
]
+ version_parts
+ [
build_variant_suffix,
]
)
return [
"ios",
] + version_parts + [
build_variant_suffix,
]
def gen_job_name(self):
return "-".join(self.gen_name_parts())
@ -67,12 +59,8 @@ class IOSJob:
WORKFLOW_DATA = [
IOSJob(
XCODE_VERSION,
ArchVariant("x86_64"),
is_org_member_context=False,
extra_props={"lite_interpreter": miniutils.quote(str(int(True)))},
),
IOSJob(XCODE_VERSION, ArchVariant("x86_64"), is_org_member_context=False, extra_props={
"lite_interpreter": miniutils.quote(str(int(True)))}),
# IOSJob(XCODE_VERSION, ArchVariant("arm64"), extra_props={
# "lite_interpreter": miniutils.quote(str(int(True)))}),
# IOSJob(XCODE_VERSION, ArchVariant("arm64", "metal"), extra_props={
@ -81,15 +69,9 @@ WORKFLOW_DATA = [
# IOSJob(XCODE_VERSION, ArchVariant("arm64", "custom-ops"), extra_props={
# "op_list": "mobilenetv2.yaml",
# "lite_interpreter": miniutils.quote(str(int(True)))}),
IOSJob(
XCODE_VERSION,
ArchVariant("x86_64", "coreml"),
is_org_member_context=False,
extra_props={
"use_coreml": miniutils.quote(str(int(True))),
"lite_interpreter": miniutils.quote(str(int(True))),
},
),
IOSJob(XCODE_VERSION, ArchVariant("x86_64", "coreml"), is_org_member_context=False, extra_props={
"use_coreml": miniutils.quote(str(int(True))),
"lite_interpreter": miniutils.quote(str(int(True)))}),
# IOSJob(XCODE_VERSION, ArchVariant("arm64", "coreml"), extra_props={
# "use_coreml": miniutils.quote(str(int(True))),
# "lite_interpreter": miniutils.quote(str(int(True)))}),

View File

@ -2,14 +2,17 @@
PyTorch Mobile PR builds (use linux host toolchain + mobile build options)
"""
import cimodel.data.simple.util.branch_filters
import cimodel.lib.miniutils as miniutils
import cimodel.data.simple.util.branch_filters
class MobileJob:
def __init__(
self, docker_image, docker_requires, variant_parts, is_master_only=False
):
self,
docker_image,
docker_requires,
variant_parts,
is_master_only=False):
self.docker_image = docker_image
self.docker_requires = docker_requires
self.variant_parts = variant_parts
@ -37,14 +40,13 @@ class MobileJob:
}
if self.is_master_only:
props_dict[
"filters"
] = cimodel.data.simple.util.branch_filters.gen_filter_dict()
props_dict["filters"] = cimodel.data.simple.util.branch_filters.gen_filter_dict()
return [{"pytorch_linux_build": props_dict}]
WORKFLOW_DATA = []
WORKFLOW_DATA = [
]
def get_workflow_jobs():

View File

@ -3,7 +3,11 @@ import cimodel.lib.miniutils as miniutils
class IOSNightlyJob:
def __init__(self, variant, is_full_jit=False, is_upload=False):
def __init__(self,
variant,
is_full_jit=False,
is_upload=False):
self.variant = variant
self.is_full_jit = is_full_jit
self.is_upload = is_upload
@ -12,24 +16,19 @@ class IOSNightlyJob:
return "upload" if self.is_upload else "build"
def get_common_name_pieces(self, sep):
extra_name_suffix = [self.get_phase_name()] if self.is_upload else []
extra_name = ["full_jit"] if self.is_full_jit else []
common_name_pieces = (
[
"ios",
]
+ extra_name
+ []
+ ios_definitions.XCODE_VERSION.render_dots_or_parts(sep)
+ [
"nightly",
self.variant,
"build",
]
+ extra_name_suffix
)
common_name_pieces = [
"ios",
] + extra_name + [
] + ios_definitions.XCODE_VERSION.render_dots_or_parts(sep) + [
"nightly",
self.variant,
"build",
] + extra_name_suffix
return common_name_pieces
@ -38,14 +37,10 @@ class IOSNightlyJob:
def gen_tree(self):
build_configs = BUILD_CONFIGS_FULL_JIT if self.is_full_jit else BUILD_CONFIGS
extra_requires = (
[x.gen_job_name() for x in build_configs] if self.is_upload else []
)
extra_requires = [x.gen_job_name() for x in build_configs] if self.is_upload else []
props_dict = {
"build_environment": "-".join(
["libtorch"] + self.get_common_name_pieces(".")
),
"build_environment": "-".join(["libtorch"] + self.get_common_name_pieces(".")),
"requires": extra_requires,
"context": "org-member",
"filters": {"branches": {"only": "nightly"}},
@ -61,13 +56,11 @@ class IOSNightlyJob:
if self.is_full_jit:
props_dict["lite_interpreter"] = miniutils.quote(str(int(False)))
template_name = "_".join(
[
"binary",
"ios",
self.get_phase_name(),
]
)
template_name = "_".join([
"binary",
"ios",
self.get_phase_name(),
])
return [{template_name: props_dict}]
@ -82,14 +75,10 @@ BUILD_CONFIGS_FULL_JIT = [
IOSNightlyJob("arm64", is_full_jit=True),
]
WORKFLOW_DATA = (
BUILD_CONFIGS
+ BUILD_CONFIGS_FULL_JIT
+ [
IOSNightlyJob("binary", is_full_jit=False, is_upload=True),
IOSNightlyJob("binary", is_full_jit=True, is_upload=True),
]
)
WORKFLOW_DATA = BUILD_CONFIGS + BUILD_CONFIGS_FULL_JIT + [
IOSNightlyJob("binary", is_full_jit=False, is_upload=True),
IOSNightlyJob("binary", is_full_jit=True, is_upload=True),
]
def get_workflow_jobs():

View File

@ -15,7 +15,10 @@ RC_PATTERN = r"/v[0-9]+(\.[0-9]+)*-rc[0-9]+/"
MAC_IOS_EXCLUSION_LIST = ["nightly", "postnightly"]
def gen_filter_dict(branches_list=NON_PR_BRANCH_LIST, tags_list=None):
def gen_filter_dict(
branches_list=NON_PR_BRANCH_LIST,
tags_list=None
):
"""Generates a filter dictionary for use with CircleCI's job filter"""
filter_dict = {
"branches": {

View File

@ -1,13 +1,11 @@
AWS_DOCKER_HOST = "308535385114.dkr.ecr.us-east-1.amazonaws.com"
def gen_docker_image(container_type):
return (
"/".join([AWS_DOCKER_HOST, "pytorch", container_type]),
f"docker-{container_type}",
)
def gen_docker_image_requires(image_name):
return [f"docker-{image_name}"]

View File

@ -12,9 +12,7 @@ class MultiPartVersion:
with the prefix string.
"""
if self.parts:
return [self.prefix + str(self.parts[0])] + [
str(part) for part in self.parts[1:]
]
return [self.prefix + str(self.parts[0])] + [str(part) for part in self.parts[1:]]
else:
return [self.prefix]

View File

@ -1,5 +1,5 @@
from dataclasses import dataclass, field
from typing import Dict, Optional
from typing import Optional, Dict
def X(val):
@ -19,7 +19,6 @@ class Ver:
"""
Represents a product with a version number
"""
name: str
version: str = ""
@ -29,7 +28,7 @@ class Ver:
@dataclass
class ConfigNode:
parent: Optional["ConfigNode"]
parent: Optional['ConfigNode']
node_name: str
props: Dict[str, str] = field(default_factory=dict)
@ -41,11 +40,7 @@ class ConfigNode:
return []
def get_parents(self):
return (
(self.parent.get_parents() + [self.parent.get_label()])
if self.parent
else []
)
return (self.parent.get_parents() + [self.parent.get_label()]) if self.parent else []
def get_depth(self):
return len(self.get_parents())
@ -74,13 +69,13 @@ class ConfigNode:
def dfs_recurse(
node,
leaf_callback=lambda x: None,
discovery_callback=lambda x, y, z: None,
child_callback=lambda x, y: None,
sibling_index=0,
sibling_count=1,
):
node,
leaf_callback=lambda x: None,
discovery_callback=lambda x, y, z: None,
child_callback=lambda x, y: None,
sibling_index=0,
sibling_count=1):
discovery_callback(node, sibling_index, sibling_count)
node_children = node.get_children()
@ -101,6 +96,7 @@ def dfs_recurse(
def dfs(toplevel_config_node):
config_list = []
def leaf_callback(node):

View File

@ -25,6 +25,7 @@ def render(fh, data, depth, is_list_member=False):
indentation = " " * INDENTATION_WIDTH * depth
if is_dict(data):
tuples = list(data.items())
if type(data) is not OrderedDict:
tuples.sort()

View File

@ -2,11 +2,10 @@
import os
import sys
import yaml
# Need to import modules that lie on an upward-relative path
sys.path.append(os.path.join(sys.path[0], ".."))
sys.path.append(os.path.join(sys.path[0], '..'))
import cimodel.lib.miniyaml as miniyaml

2
.circleci/config.yml generated
View File

@ -652,7 +652,7 @@ jobs:
- run:
name: Archive artifacts into zip
command: |
zip -1 -r artifacts.zip dist/ build/.ninja_log build/compile_commands.json .pytorch-test-times.json .pytorch-test-file-ratings.json
zip -1 -r artifacts.zip dist/ build/.ninja_log build/compile_commands.json .pytorch-test-times.json
cp artifacts.zip /Users/distiller/workspace
- persist_to_workspace:

View File

@ -21,6 +21,7 @@ Please re-run the "%s" script in the "%s" directory and commit the result. See "
def check_consistency():
_, temp_filename = tempfile.mkstemp("-generated-config.yml")
with open(temp_filename, "w") as fh:
@ -29,10 +30,7 @@ def check_consistency():
try:
subprocess.check_call(["cmp", temp_filename, CHECKED_IN_FILE])
except subprocess.CalledProcessError:
sys.exit(
ERROR_MESSAGE_TEMPLATE
% (CHECKED_IN_FILE, REGENERATION_SCRIPT, PARENT_DIR, README_PATH)
)
sys.exit(ERROR_MESSAGE_TEMPLATE % (CHECKED_IN_FILE, REGENERATION_SCRIPT, PARENT_DIR, README_PATH))
finally:
os.remove(temp_filename)

View File

@ -10,16 +10,15 @@ import shutil
import sys
from collections import namedtuple
import cimodel.data.simple.anaconda_prune_defintions
import cimodel.data.simple.docker_definitions
import cimodel.data.simple.mobile_definitions
import cimodel.data.simple.nightly_ios
import cimodel.data.simple.anaconda_prune_defintions
import cimodel.lib.miniutils as miniutils
import cimodel.lib.miniyaml as miniyaml
class File:
class File(object):
"""
Verbatim copy the contents of a file into config.yml
"""
@ -58,7 +57,7 @@ def horizontal_rule():
return "".join("#" * 78)
class Header:
class Header(object):
def __init__(self, title, summary=None):
self.title = title
self.summary_lines = summary or []
@ -83,19 +82,15 @@ def _for_all_items(items, functor) -> None:
def filter_master_only_jobs(items):
def _is_main_or_master_item(item):
filters = item.get("filters", None)
branches = filters.get("branches", None) if filters is not None else None
branches_only = branches.get("only", None) if branches is not None else None
return (
("main" in branches_only or "master" in branches_only)
if branches_only is not None
else False
)
filters = item.get('filters', None)
branches = filters.get('branches', None) if filters is not None else None
branches_only = branches.get('only', None) if branches is not None else None
return ('main' in branches_only or 'master' in branches_only) if branches_only is not None else False
master_deps = set()
def _save_requires_if_master(item_type, item):
requires = item.get("requires", None)
requires = item.get('requires', None)
item_name = item.get("name", None)
if not isinstance(requires, list):
return
@ -112,9 +107,9 @@ def filter_master_only_jobs(items):
item_name = item_name.strip('"') if item_name is not None else None
if not _is_main_or_master_item(item) and item_name not in master_deps:
return None
if "filters" in item:
if 'filters' in item:
item = item.copy()
item.pop("filters")
item.pop('filters')
return {item_type: item}
# Scan of dependencies twice to pick up nested required jobs
@ -128,12 +123,12 @@ def generate_required_docker_images(items):
required_docker_images = set()
def _requires_docker_image(item_type, item):
requires = item.get("requires", None)
requires = item.get('requires', None)
if not isinstance(requires, list):
return
for requirement in requires:
requirement = requirement.replace('"', "")
if requirement.startswith("docker-"):
requirement = requirement.replace('"', '')
if requirement.startswith('docker-'):
required_docker_images.add(requirement)
_for_all_items(items, _requires_docker_image)
@ -196,4 +191,5 @@ def stitch_sources(output_filehandle):
if __name__ == "__main__":
stitch_sources(sys.stdout)

View File

@ -48,7 +48,7 @@ if [[ -n "${CIRCLE_PR_NUMBER:-}" ]]; then
git checkout -q -B "$CIRCLE_BRANCH"
git reset --hard "$CIRCLE_SHA1"
elif [[ -n "${CIRCLE_SHA1:-}" ]]; then
# Scheduled workflows & "smoke" binary build on trunk on PR merges
# Scheduled workflows & "smoke" binary build on master on PR merges
DEFAULT_BRANCH="$(git remote show $CIRCLE_REPOSITORY_URL | awk '/HEAD branch/ {print $NF}')"
git reset --hard "$CIRCLE_SHA1"
git checkout -q -B $DEFAULT_BRANCH
@ -61,8 +61,8 @@ echo "Using Pytorch from "
git --no-pager log --max-count 1
popd
# Clone the Builder main repo
retry git clone -q https://github.com/pytorch/builder.git -b release/2.1 "$BUILDER_ROOT"
# Clone the Builder master repo
retry git clone -q https://github.com/pytorch/builder.git -b release/2.0 "$BUILDER_ROOT"
pushd "$BUILDER_ROOT"
echo "Using builder from "
git --no-pager log --max-count 1

View File

@ -33,7 +33,7 @@ fi
cp ${PROJ_ROOT}/LICENSE ${ZIP_DIR}/
# zip the library
export DATE="$(date -u +%Y%m%d)"
export IOS_NIGHTLY_BUILD_VERSION="2.1.0.${DATE}"
export IOS_NIGHTLY_BUILD_VERSION="2.0.0.${DATE}"
if [ "${BUILD_LITE_INTERPRETER}" == "1" ]; then
# libtorch_lite_ios_nightly_1.11.0.20210810.zip
ZIPFILE="libtorch_lite_ios_nightly_${IOS_NIGHTLY_BUILD_VERSION}.zip"

View File

@ -11,7 +11,7 @@ NUM_CPUS=$(( $(nproc) - 2 ))
# Defaults here for **binary** linux builds so they can be changed in one place
export MAX_JOBS=${MAX_JOBS:-$(( ${NUM_CPUS} > ${MEMORY_LIMIT_MAX_JOBS} ? ${MEMORY_LIMIT_MAX_JOBS} : ${NUM_CPUS} ))}
if [[ "${DESIRED_CUDA}" =~ cu1[1-2][0-9] ]]; then
if [[ "${DESIRED_CUDA}" =~ cu11[0-9] ]]; then
export BUILD_SPLIT_CUDA="ON"
fi

View File

@ -38,6 +38,10 @@ fi
EXTRA_CONDA_FLAGS=""
NUMPY_PIN=""
PROTOBUF_PACKAGE="defaults::protobuf"
if [[ "\$python_nodot" = *311* ]]; then
# Numpy is yet not avaiable on default conda channel
EXTRA_CONDA_FLAGS="-c=malfet"
fi
if [[ "\$python_nodot" = *310* ]]; then
# There's an issue with conda channel priority where it'll randomly pick 1.19 over 1.20
@ -77,7 +81,6 @@ if [[ "$PACKAGE_TYPE" == conda ]]; then
"numpy\${NUMPY_PIN}" \
mkl>=2018 \
ninja \
sympy \
typing-extensions \
${PROTOBUF_PACKAGE}
if [[ "$DESIRED_CUDA" == 'cpu' ]]; then
@ -90,18 +93,12 @@ if [[ "$PACKAGE_TYPE" == conda ]]; then
if [[ "\${TORCH_CONDA_BUILD_FOLDER}" == "pytorch-nightly" ]]; then
PYTORCH_CHANNEL="pytorch-nightly"
fi
retry conda install \${EXTRA_CONDA_FLAGS} -yq -c nvidia -c pytorch-test "pytorch-cuda=\${cu_ver}"
retry conda install \${EXTRA_CONDA_FLAGS} -yq -c nvidia -c "\${PYTORCH_CHANNEL}" "pytorch-cuda=\${cu_ver}"
fi
conda install \${EXTRA_CONDA_FLAGS} -y "\$pkg" --offline
)
elif [[ "$PACKAGE_TYPE" != libtorch ]]; then
if [[ "$(uname -m)" == aarch64 ]]; then
# Using "extra-index-url" until all needed aarch64 dependencies are
# added to "https://download.pytorch.org/whl/nightly/"
pip install "\$pkg" --extra-index-url "https://download.pytorch.org/whl/test/${DESIRED_CUDA}"
else
pip install "\$pkg" --index-url "https://download.pytorch.org/whl/test/${DESIRED_CUDA}"
fi
pip install "\$pkg" --extra-index-url "https://download.pytorch.org/whl/nightly/${DESIRED_CUDA}"
retry pip install -q numpy protobuf typing-extensions
fi
if [[ "$PACKAGE_TYPE" == libtorch ]]; then

View File

@ -59,7 +59,7 @@ PIP_UPLOAD_FOLDER='nightly/'
# We put this here so that OVERRIDE_PACKAGE_VERSION below can read from it
export DATE="$(date -u +%Y%m%d)"
#TODO: We should be pulling semver version from the base version.txt
BASE_BUILD_VERSION="2.1.0.dev$DATE"
BASE_BUILD_VERSION="2.0.0.dev$DATE"
# Change BASE_BUILD_VERSION to git tag when on a git tag
# Use 'git -C' to make doubly sure we're in the correct directory for checking
# the git tag
@ -77,9 +77,7 @@ else
export PYTORCH_BUILD_VERSION="${BASE_BUILD_VERSION}+$DESIRED_CUDA"
fi
# The build with with-pypi-cudnn suffix is only applicabe to
# pypi small wheel Linux x86 build
if [[ -n "${PYTORCH_EXTRA_INSTALL_REQUIREMENTS:-}" ]] && [[ "$(uname)" == 'Linux' && "$(uname -m)" == "x86_64" ]]; then
if [[ -n "${PYTORCH_EXTRA_INSTALL_REQUIREMENTS:-}" ]]; then
export PYTORCH_BUILD_VERSION="${PYTORCH_BUILD_VERSION}-with-pypi-cudnn"
fi

View File

@ -11,7 +11,7 @@ PKG_DIR=${PKG_DIR:-/tmp/workspace/final_pkgs}
# currently set within `designate_upload_channel`
UPLOAD_CHANNEL=${UPLOAD_CHANNEL:-nightly}
# Designates what subfolder to put packages into
UPLOAD_SUBFOLDER=${UPLOAD_SUBFOLDER:-}
UPLOAD_SUBFOLDER=${UPLOAD_SUBFOLDER:-cpu}
UPLOAD_BUCKET="s3://pytorch"
BACKUP_BUCKET="s3://pytorch-backup"
BUILD_NAME=${BUILD_NAME:-}
@ -64,17 +64,12 @@ s3_upload() {
local pkg_type
extension="$1"
pkg_type="$2"
s3_root_dir="${UPLOAD_BUCKET}/${pkg_type}/${UPLOAD_CHANNEL}"
if [[ -z ${UPLOAD_SUBFOLDER:-} ]]; then
s3_upload_dir="${s3_root_dir}/"
else
s3_upload_dir="${s3_root_dir}/${UPLOAD_SUBFOLDER}/"
fi
s3_dir="${UPLOAD_BUCKET}/${pkg_type}/${UPLOAD_CHANNEL}/${UPLOAD_SUBFOLDER}/"
(
for pkg in ${PKG_DIR}/*.${extension}; do
(
set -x
${AWS_S3_CP} --no-progress --acl public-read "${pkg}" "${s3_upload_dir}"
${AWS_S3_CP} --no-progress --acl public-read "${pkg}" "${s3_dir}"
)
done
)
@ -87,17 +82,15 @@ pip install -q awscli
case "${PACKAGE_TYPE}" in
conda)
conda_upload
for conda_archive in ${PKG_DIR}/*.tar.bz2; do
# Fetch platform (eg. win-64, linux-64, etc.) from index file because
# there's no actual conda command to read this
subdir=$(\
tar -xOf "${conda_archive}" info/index.json \
| grep subdir \
| cut -d ':' -f2 \
| sed -e 's/[[:space:]]//' -e 's/"//g' -e 's/,//' \
)
BACKUP_DIR="conda/${subdir}"
done
# Fetch platform (eg. win-64, linux-64, etc.) from index file
# Because there's no actual conda command to read this
subdir=$(\
tar -xOf ${PKG_DIR}/*.bz2 info/index.json \
| grep subdir \
| cut -d ':' -f2 \
| sed -e 's/[[:space:]]//' -e 's/"//g' -e 's/,//' \
)
BACKUP_DIR="conda/${subdir}"
;;
libtorch)
s3_upload "zip" "libtorch"

View File

@ -8,7 +8,62 @@ export CUDA_VERSION="${DESIRED_CUDA/cu/}"
export USE_SCCACHE=1
export SCCACHE_BUCKET=ossci-compiler-cache
export SCCACHE_IGNORE_SERVER_IO_ERROR=1
export VC_YEAR=2019
export VC_YEAR=2022
if [[ "${DESIRED_CUDA}" == *"cu11"* ]]; then
export BUILD_SPLIT_CUDA=ON
fi
echo "Free Space for CUDA DEBUG BUILD"
if [[ "${CIRCLECI:-}" == 'true' ]]; then
export NIGHTLIES_PYTORCH_ROOT="$PYTORCH_ROOT"
if [[ -d "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community" ]]; then
rm -rf "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community"
fi
if [[ -d "C:\\Program Files (x86)\\Microsoft Visual Studio 14.0" ]]; then
rm -rf "C:\\Program Files (x86)\\Microsoft Visual Studio 14.0"
fi
if [[ -d "C:\\Program Files (x86)\\Microsoft.NET" ]]; then
rm -rf "C:\\Program Files (x86)\\Microsoft.NET"
fi
if [[ -d "C:\\Program Files\\dotnet" ]]; then
rm -rf "C:\\Program Files\\dotnet"
fi
if [[ -d "C:\\Program Files (x86)\\dotnet" ]]; then
rm -rf "C:\\Program Files (x86)\\dotnet"
fi
if [[ -d "C:\\Program Files (x86)\\Microsoft SQL Server" ]]; then
rm -rf "C:\\Program Files (x86)\\Microsoft SQL Server"
fi
if [[ -d "C:\\Program Files (x86)\\Xamarin" ]]; then
rm -rf "C:\\Program Files (x86)\\Xamarin"
fi
if [[ -d "C:\\Program Files (x86)\\Google" ]]; then
rm -rf "C:\\Program Files (x86)\\Google"
fi
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}
set -x
if [[ -d "C:\\ProgramData\\Microsoft\\VisualStudio\\Packages\\_Instances" ]]; then
mv "C:\\ProgramData\\Microsoft\\VisualStudio\\Packages\\_Instances" .
rm -rf "C:\\ProgramData\\Microsoft\\VisualStudio\\Packages"
mkdir -p "C:\\ProgramData\\Microsoft\\VisualStudio\\Packages"
mv _Instances "C:\\ProgramData\\Microsoft\\VisualStudio\\Packages"
fi
if [[ -d "C:\\Microsoft" ]]; then
# don't use quotes here
rm -rf /c/Microsoft/AndroidNDK*
fi
fi
echo "Free space on filesystem before build:"
df -h

View File

@ -4,7 +4,7 @@ set -eux -o pipefail
source "${BINARY_ENV_FILE:-/c/w/env}"
export CUDA_VERSION="${DESIRED_CUDA/cu/}"
export VC_YEAR=2019
export VC_YEAR=2022
pushd "$BUILDER_ROOT"

View File

@ -1,4 +1,8 @@
#!/bin/bash
# =================== The following code **should** be executed inside Docker container ===================
# Install dependencies
sudo apt-get -y update
sudo apt-get -y install expect-dev
# This is where the local pytorch install in the docker image is located
pt_checkout="/var/lib/jenkins/workspace"
@ -16,7 +20,7 @@ echo "cpp_doc_push_script.sh: Invoked with $*"
# but since DOCS_INSTALL_PATH can be derived from DOCS_VERSION it's probably better to
# try and gather it first, just so we don't potentially break people who rely on this script
# Argument 2: What version of the Python API docs we are building.
version="${2:-${DOCS_VERSION:-main}}"
version="${2:-${DOCS_VERSION:-master}}"
if [ -z "$version" ]; then
echo "error: cpp_doc_push_script.sh: version (arg2) not specified"
exit 1
@ -30,6 +34,11 @@ echo "error: cpp_doc_push_script.sh: install_path (arg1) not specified"
exit 1
fi
is_main_doc=false
if [ "$version" == "master" ]; then
is_main_doc=true
fi
echo "install_path: $install_path version: $version"
# ======================== Building PyTorch C++ API Docs ========================
@ -44,6 +53,7 @@ set -ex
# Generate ATen files
pushd "${pt_checkout}"
pip install -r requirements.txt
time python -m torchgen.gen \
-s aten/src/ATen \
-d build/aten/src/ATen
@ -58,6 +68,7 @@ time python tools/setup_helpers/generate_code.py \
# Build the docs
pushd docs/cpp
pip install -r requirements.txt
time make VERBOSE=1 html -j
popd
@ -68,7 +79,7 @@ pushd cppdocs
# Purge everything with some exceptions
mkdir /tmp/cppdocs-sync
mv _config.yml README.md /tmp/cppdocs-sync/
rm -rf ./*
rm -rf *
# Copy over all the newly generated HTML
cp -r "${pt_checkout}"/docs/cpp/build/html/* .
@ -91,3 +102,4 @@ if [[ "${WITH_PUSH:-}" == true ]]; then
fi
popd
# =================== The above code **should** be executed inside Docker container ===================

View File

@ -24,7 +24,7 @@ popd
git clone https://github.com/pytorch/functorch -b gh-pages --depth 1 functorch_ghpages
pushd functorch_ghpages
if [ $version == "main" ]; then
if [ $version == "master" ]; then
version=nightly
fi

View File

@ -1,4 +1,8 @@
#!/bin/bash
# =================== The following code **should** be executed inside Docker container ===================
# Install dependencies
sudo apt-get -y update
sudo apt-get -y install expect-dev
# This is where the local pytorch install in the docker image is located
pt_checkout="/var/lib/jenkins/workspace"
@ -19,7 +23,7 @@ set -ex
# but since DOCS_INSTALL_PATH can be derived from DOCS_VERSION it's probably better to
# try and gather it first, just so we don't potentially break people who rely on this script
# Argument 2: What version of the docs we are building.
version="${2:-${DOCS_VERSION:-main}}"
version="${2:-${DOCS_VERSION:-master}}"
if [ -z "$version" ]; then
echo "error: python_doc_push_script.sh: version (arg2) not specified"
exit 1
@ -34,7 +38,7 @@ echo "error: python_doc_push_script.sh: install_path (arg1) not specified"
fi
is_main_doc=false
if [ "$version" == "main" ]; then
if [ "$version" == "master" ]; then
is_main_doc=true
fi
@ -51,7 +55,7 @@ echo "install_path: $install_path version: $version"
build_docs () {
set +e
set -o pipefail
make "$1" 2>&1 | tee /tmp/docs_build.txt
make $1 2>&1 | tee /tmp/docs_build.txt
code=$?
if [ $code -ne 0 ]; then
set +x
@ -68,12 +72,12 @@ build_docs () {
}
git clone https://github.com/pytorch/pytorch.github.io -b "$branch" --depth 1
git clone https://github.com/pytorch/pytorch.github.io -b $branch --depth 1
pushd pytorch.github.io
export LC_ALL=C
export PATH=/opt/conda/bin:$PATH
if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
if [ -n $ANACONDA_PYTHON_VERSION ]; then
export PATH=/opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:$PATH
fi
@ -84,9 +88,10 @@ pushd "$pt_checkout"
pushd docs
# Build the docs
pip -q install -r requirements.txt
if [ "$is_main_doc" = true ]; then
build_docs html || exit $?
build_docs html
[ $? -eq 0 ] || exit $?
make coverage
# Now we have the coverage report, we need to make sure it is empty.
# Count the number of lines in the file and turn that number into a variable
@ -97,19 +102,19 @@ if [ "$is_main_doc" = true ]; then
# Also: see docs/source/conf.py for "coverage_ignore*" items, which should
# be documented then removed from there.
lines=$(wc -l build/coverage/python.txt 2>/dev/null |cut -f1 -d' ')
undocumented=$((lines - 2))
undocumented=$(($lines - 2))
if [ $undocumented -lt 0 ]; then
echo coverage output not found
exit 1
elif [ $undocumented -gt 0 ]; then
echo undocumented objects found:
cat build/coverage/python.txt
echo "Make sure you've updated relevant .rsts in docs/source!"
exit 1
fi
else
# skip coverage, format for stable or tags
build_docs html-stable || exit $?
build_docs html-stable
[ $? -eq 0 ] || exit $?
fi
# Move them into the docs repo
@ -141,3 +146,4 @@ if [[ "${WITH_PUSH:-}" == true ]]; then
fi
popd
# =================== The above code **should** be executed inside Docker container ===================

View File

@ -1,12 +1,11 @@
# Documentation: https://docs.microsoft.com/en-us/rest/api/azure/devops/build/?view=azure-devops-rest-6.0
import re
import json
import os
import re
import sys
import time
import requests
import time
AZURE_PIPELINE_BASE_URL = "https://aiinfra.visualstudio.com/PyTorch/"
AZURE_DEVOPS_PAT_BASE64 = os.environ.get("AZURE_DEVOPS_PAT_BASE64_SECRET", "")
@ -20,68 +19,54 @@ build_base_url = AZURE_PIPELINE_BASE_URL + "_apis/build/builds?api-version=6.0"
s = requests.Session()
s.headers.update({"Authorization": "Basic " + AZURE_DEVOPS_PAT_BASE64})
def submit_build(pipeline_id, project_id, source_branch, source_version):
print("Submitting build for branch: " + source_branch)
print("Commit SHA1: ", source_version)
run_build_raw = s.post(
build_base_url,
json={
"definition": {"id": pipeline_id},
"project": {"id": project_id},
"sourceBranch": source_branch,
"sourceVersion": source_version,
},
)
run_build_raw = s.post(build_base_url, json={
"definition": {"id": pipeline_id},
"project": {"id": project_id},
"sourceBranch": source_branch,
"sourceVersion": source_version
})
try:
run_build_json = run_build_raw.json()
except json.decoder.JSONDecodeError as e:
print(e)
print(
"Failed to parse the response. Check if the Azure DevOps PAT is incorrect or expired."
)
print("Failed to parse the response. Check if the Azure DevOps PAT is incorrect or expired.")
sys.exit(-1)
build_id = run_build_json["id"]
build_id = run_build_json['id']
print("Submitted bulid: " + str(build_id))
print("Bulid URL: " + run_build_json["url"])
print("Bulid URL: " + run_build_json['url'])
return build_id
def get_build(_id):
get_build_url = (
AZURE_PIPELINE_BASE_URL + f"/_apis/build/builds/{_id}?api-version=6.0"
)
get_build_url = AZURE_PIPELINE_BASE_URL + f"/_apis/build/builds/{_id}?api-version=6.0"
get_build_raw = s.get(get_build_url)
return get_build_raw.json()
def get_build_logs(_id):
get_build_logs_url = (
AZURE_PIPELINE_BASE_URL + f"/_apis/build/builds/{_id}/logs?api-version=6.0"
)
get_build_logs_url = AZURE_PIPELINE_BASE_URL + f"/_apis/build/builds/{_id}/logs?api-version=6.0"
get_build_logs_raw = s.get(get_build_logs_url)
return get_build_logs_raw.json()
def get_log_content(url):
resp = s.get(url)
return resp.text
def wait_for_build(_id):
build_detail = get_build(_id)
build_status = build_detail["status"]
build_status = build_detail['status']
while build_status == "notStarted":
print("Waiting for run to start: " + str(_id))
while build_status == 'notStarted':
print('Waiting for run to start: ' + str(_id))
sys.stdout.flush()
try:
build_detail = get_build(_id)
build_status = build_detail["status"]
build_status = build_detail['status']
except Exception as e:
print("Error getting build")
print(e)
@ -91,7 +76,7 @@ def wait_for_build(_id):
print("Bulid started: ", str(_id))
handled_logs = set()
while build_status == "inProgress":
while build_status == 'inProgress':
try:
print("Waiting for log: " + str(_id))
logs = get_build_logs(_id)
@ -101,39 +86,38 @@ def wait_for_build(_id):
time.sleep(30)
continue
for log in logs["value"]:
log_id = log["id"]
for log in logs['value']:
log_id = log['id']
if log_id in handled_logs:
continue
handled_logs.add(log_id)
print("Fetching log: \n" + log["url"])
print('Fetching log: \n' + log['url'])
try:
log_content = get_log_content(log["url"])
log_content = get_log_content(log['url'])
print(log_content)
except Exception as e:
print("Error getting log content")
print(e)
sys.stdout.flush()
build_detail = get_build(_id)
build_status = build_detail["status"]
build_status = build_detail['status']
time.sleep(30)
build_result = build_detail["result"]
build_result = build_detail['result']
print("Bulid status: " + build_status)
print("Bulid result: " + build_result)
return build_status, build_result
if __name__ == "__main__":
if __name__ == '__main__':
# Convert the branch name for Azure DevOps
match = re.search(r"pull/(\d+)", TARGET_BRANCH)
match = re.search(r'pull/(\d+)', TARGET_BRANCH)
if match is not None:
pr_num = match.group(1)
SOURCE_BRANCH = f"refs/pull/{pr_num}/head"
SOURCE_BRANCH = f'refs/pull/{pr_num}/head'
else:
SOURCE_BRANCH = f"refs/heads/{TARGET_BRANCH}"
SOURCE_BRANCH = f'refs/heads/{TARGET_BRANCH}'
MAX_RETRY = 2
retry = MAX_RETRY
@ -142,7 +126,7 @@ if __name__ == "__main__":
build_id = submit_build(PIPELINE_ID, PROJECT_ID, SOURCE_BRANCH, TARGET_COMMIT)
build_status, build_result = wait_for_build(build_id)
if build_result != "succeeded":
if build_result != 'succeeded':
retry = retry - 1
if retry > 0:
print("Retrying... remaining attempt: " + str(retry))

View File

@ -177,7 +177,7 @@
- run:
name: Archive artifacts into zip
command: |
zip -1 -r artifacts.zip dist/ build/.ninja_log build/compile_commands.json .pytorch-test-times.json .pytorch-test-file-ratings.json
zip -1 -r artifacts.zip dist/ build/.ninja_log build/compile_commands.json .pytorch-test-times.json
cp artifacts.zip /Users/distiller/workspace
- persist_to_workspace:

View File

@ -60,6 +60,9 @@ MacroBlockBegin: ''
MacroBlockEnd: ''
MaxEmptyLinesToKeep: 1
NamespaceIndentation: None
ObjCBlockIndentWidth: 2
ObjCSpaceAfterProperty: false
ObjCSpaceBeforeProtocolList: false
PenaltyBreakBeforeFirstCallParameter: 1
PenaltyBreakComment: 300
PenaltyBreakFirstLessLess: 120
@ -82,11 +85,4 @@ SpacesInSquareBrackets: false
Standard: Cpp11
TabWidth: 8
UseTab: Never
---
Language: ObjC
ColumnLimit: 120
AlignAfterOpenBracket: Align
ObjCBlockIndentWidth: 2
ObjCSpaceAfterProperty: false
ObjCSpaceBeforeProtocolList: false
...

View File

@ -9,7 +9,6 @@ bugprone-*,
-bugprone-lambda-function-name,
-bugprone-reserved-identifier,
-bugprone-swapped-arguments,
clang-diagnostic-missing-prototypes,
cppcoreguidelines-*,
-cppcoreguidelines-avoid-do-while,
-cppcoreguidelines-avoid-magic-numbers,
@ -42,6 +41,8 @@ modernize-*,
-modernize-use-trailing-return-type,
-modernize-use-nodiscard,
performance-*,
-performance-noexcept-move-constructor,
-performance-unnecessary-value-param,
readability-container-size-empty,
'
HeaderFilterRegex: '^(c10/(?!test)|torch/csrc/(?!deploy/interpreter/cpython)).*$'

View File

@ -1,34 +0,0 @@
FROM mcr.microsoft.com/vscode/devcontainers/miniconda:0-3
# I am suprised this is needed
RUN conda init
# Copy environment.yml (if found) to a temp location so we update the environment. Also
# copy "noop.txt" so the COPY instruction does not fail if no environment.yml exists.
COPY .devcontainer/cuda/environment.yml .devcontainer/noop.txt /tmp/conda-tmp/
RUN if [ -f "/tmp/conda-tmp/environment.yml" ]; then umask 0002 && /opt/conda/bin/conda env update -n base -f /tmp/conda-tmp/environment.yml; fi \
&& sudo rm -rf /tmp/conda-tmp
# Tools needed for llvm
RUN sudo apt-get -y update
RUN sudo apt install -y lsb-release wget software-properties-common gnupg
# Install CLANG if version is specified
ARG CLANG_VERSION
RUN if [ -n "$CLANG_VERSION" ]; then \
sudo wget https://apt.llvm.org/llvm.sh; \
chmod +x llvm.sh; \
sudo ./llvm.sh "${CLANG_VERSION}"; \
echo 'export CC=clang' >> ~/.bashrc; \
echo 'export CXX=clang++' >> ~/.bashrc; \
sudo apt update; \
sudo apt install -y clang; \
sudo apt install -y libomp-dev; \
fi
# Install cuda if version is specified
ARG CUDA_VERSION
RUN if [ -n "$CUDA_VERSION" ]; then \
conda install cuda -c "nvidia/label/cuda-${CUDA_VERSION}"; \
fi

View File

@ -1,37 +0,0 @@
// For format details, see https://aka.ms/devcontainer.json. For config options, see the
// README at: https://github.com/devcontainers/templates/tree/main/src/anaconda
{
"name": "PyTorch - CPU",
"build": {
"context": "../..",
"dockerfile": "../Dockerfile",
"args": {
"USERNAME": "vscode",
"BUILDKIT_INLINE_CACHE": "0",
"CLANG_VERSION": ""
}
},
// Features to add to the dev container. More info: https://containers.dev/features.
"features": {
// This is needed for lintrunner
"ghcr.io/devcontainers/features/rust:1" : {}
},
// Use 'forwardPorts' to make a list of ports inside the container available locally.
// "forwardPorts": [],
// Use 'postCreateCommand' to run commands after the container is created.
"postCreateCommand": "bash .devcontainer/scripts/install-dev-tools.sh",
// Configure tool-specific properties.
// "customizations": {},
"customizations": {
"vscode": {
"extensions": ["streetsidesoftware.code-spell-checker"]
}
}
// Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root.
// "remoteUser": "root"
}

View File

@ -1,6 +0,0 @@
# This environment is specific to Debian
name: PyTorch
dependencies:
- cmake
- ninja
- libopenblas

Some files were not shown because too many files have changed in this diff Show More