Compare commits

108 Commits

Author SHA1 Message Date
66f6e793f7 Fix deserialization of TransformerEncoderLayer (#81832) (#81832) (#82094)
Summary:
When `activation` is a module, it is not saved directly in the state dictionary but in `_modules`. When deserializing, the old version of this code would therefore think that activation was missing and set it to ReLU. This version first reconstructs the module and only sets activation to ReLU if it is neither a module nor a function.
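
A torch-free sketch of the corrected `__setstate__` ordering described above; `ActivationModule` and `EncoderLayer` are illustrative stand-ins, not the actual PyTorch classes:

```python
class ActivationModule:
    """Stand-in for an nn.Module activation such as nn.ReLU()."""
    def __call__(self, x):
        return max(x, 0.0)

def relu(x):
    return max(x, 0.0)

class EncoderLayer:
    def __init__(self, activation):
        self._modules = {}
        if isinstance(activation, ActivationModule):
            # module activations are stored in _modules, not in __dict__
            self._modules["activation"] = activation
        else:
            self.activation = activation

    def __setstate__(self, state):
        self.__dict__.update(state)      # reconstruct _modules first
        has_module = "activation" in self._modules
        has_callable = callable(self.__dict__.get("activation"))
        if not has_module and not has_callable:
            # only now fall back to ReLU; the old code took this branch
            # whenever the attribute was absent, losing module activations
            self.activation = relu
```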

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81832
Approved by: https://github.com/kit1980, https://github.com/zrphercule

Test Plan:
contbuild & OSS CI, see e68583b4d1

Test plan from GitHub:
pytorch oss tests

Reviewed By: jeanschmidt, zrphercule

Differential Revision: D38014872

Pulled By: zdevito

fbshipit-source-id: 938079d768f7981ca55eed3c8828b29a92e06f41

Co-authored-by: Zachary DeVito (Meta Employee) <zdevito@fb.com>
2022-07-25 10:36:39 +01:00
35eb488428 [CI] Disable ios-12-5-1-x86-64 (#81612) (#81612) (#82096)
Summary:
Currently broken, but for a while it was not testing trunk, but rather some old released build, see https://github.com/pytorch/pytorch/runs/7369514831?check_suite_focus=true#step:9:147

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81612
Approved by: https://github.com/kit1980

Test Plan: contbuild & OSS CI, see 446833d11f

Reviewed By: DanilBaibak

Differential Revision: D37919692

Pulled By: malfet

fbshipit-source-id: c4fb3e32ffd2ca4d9004a4ab14d651cface00c26

Co-authored-by: Nikita Shulga (Meta Employee) <nshulga@fb.com>
2022-07-25 10:35:59 +01:00
e65e4ac1f1 1.12.1/bt fix (#81952)
* Add test for torchscripting nn.TransformerEncoder, including fast path (#79796) (#79796)

Summary:
Add test just to check if TransformerEncoder will crash when enumerating over params [with_no_grad, use_torchscript, training].

The motivation was that the TransformerEncoder fast path (so with_no_grad=True) combined with use_torchscript=True would crash with the error that NestedTensor doesn't have a size. This was caused by the fast path automatically generating a NestedTensor as a perf optimization, while torchscript attempts to find intermediate tensor sizes as it optimizes. But NestedTensor has not implemented a size method, so things fail.

This test goes together with this fix https://github.com/pytorch/pytorch/pull/79480

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79796
Approved by: https://github.com/zrphercule

Test Plan:
contbuild & OSS CI, see 06274d7a48

Test plan from GitHub:
```
buck build --show-output mode/opt -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=a100 mode/inplace  //caffe2/test:transformers

./fbcode/buck-out/gen/caffe2/test/transformers#binary.par
```
Test runs and passes together with the changes from the PR above (I made another diff on top of this with those changes). Does not pass without the fix.

Reviewed By: mikekgfb

Differential Revision: D37222923

Pulled By: erichan1

fbshipit-source-id: 5a16e7d240cb51c0a613d16a79931d41122aba8b

* disable src mask for transformer and multiheadattention fastpath (#81277) (#81277)

Summary:
Disable fastpath if src_mask passed to TransformerEncoderLayer and MultiheadAttention.
- Refactored test_transformerencoder from test_nn.py to test_transformers.py. Added a src_mask test there.
- Added a specific src_mask test in test_transformers.py
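
A hypothetical predicate (not the real dispatch code) capturing the rule this change enforces: a supplied src_mask always disables the fast path.

```python
def use_fastpath(src_mask=None, training=False, grad_enabled=False):
    # Sketch only: the real conditions involve many more checks
    # (dtype, device, nested-ness, etc.).
    if src_mask is not None:
        return False                 # mask supplied -> slow path
    return not training and not grad_enabled
```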

Fixes https://github.com/pytorch/pytorch/issues/81129

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81277
Approved by: https://github.com/zrphercule

Test Plan: contbuild & OSS CI, see 23088fcfdf

Reviewed By: DanilBaibak

Differential Revision: D37919513

Pulled By: erichan1

fbshipit-source-id: 0697d789634775136897fdb6a310356a6a45030d

* remove decoder tests for feature not in 1.12

* remove unnecessary changes from #77903 to make changes more minimal
2022-07-25 08:54:24 +01:00
e8534b92c9 MPS cherry picks for 1.12.1 (#81976)
* MPS: Fixes (#78930)

Cast integer to float in UnaryOps
Add tensor dtype in key generation
Enable FP16 scalars and use placeholder for alpha tensor in add/sum ops

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78930
Approved by: https://github.com/albanD

* MPS: Binary cast fix by proper type promotion and remove spurious copy warning (#79185)

Fixes #78019, #78020
Fixes https://github.com/pytorch/pytorch/pull/79185
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79185
Approved by: https://github.com/albanD, https://github.com/razarmehr

* MPS: add exponential op (#79188)

Add exponential distribution

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79188
Approved by: https://github.com/razarmehr, https://github.com/albanD

* [MPS] Delete unused vars from OperationUtils.mm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79514

Approved by: https://github.com/kulinseth, https://github.com/albanD

* [MPS] Fix getDefaultGenerator and copy_kernel_mps

Returning a reference to stack memory is really bad

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79515

Approved by: https://github.com/albanD

* [MPS][BE]Do not use `new/delete[]` in `chainViewOperation`

`std::array` will do just fine

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79516

Approved by: https://github.com/albanD

* [MPS] Support stride of stride

Fixes https://github.com/pytorch/pytorch/issues/79181

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79521

Approved by: https://github.com/kulinseth

* MPS: TopK raise an error if K>16 (#79677)

* Error out in TopK when k>16.
* Add a test case too.
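
A sketch of the new guard (the error text below is illustrative, not the exact message): MPS TopK is limited to k <= 16, so larger k now raises instead of silently misbehaving.

```python
def check_topk_k(k, limit=16):
    # Error out up front rather than hitting an unsupported kernel path.
    if k > limit:
        raise ValueError(f"topk on MPS currently supports k <= {limit}, got {k}")
```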

Fixes #78915

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79677
Approved by: https://github.com/albanD

* [MPS]: Add fix for squeezed input axes handling in BCE loss (#79676)

Fixes #79527

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79676
Approved by: https://github.com/razarmehr, https://github.com/albanD

* MPS: Add amax and amin Ops with tests  (#79682)

* Add amax and amin with tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79682
Approved by: https://github.com/albanD

* [MPS] Fix torch.uint8 support (#80049)

`ScalarType.Byte` should be cast to `MPSDataTypeUInt8`
And support for `torch.int8` as well as test those conversions in `TestMPS.test_to`

Fixes #80006

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80049
Approved by: https://github.com/albanD

* [MPS] Fix binary ops between int32 tensor with int64 scalar (#80220)

For some reason, tensor *op* scalar does not follow the normal binary promotion rules, so the output tensor is cast to the expected type if needed.
It seems one should have cast the input tensors to the expected output tensor type instead, but that does not really work for boolean binary ops, so...
Also adds the output tensor type/shape to the cached graph key.
Extends `TestMPS.test_add_scalars` to test for this regression.

Fixes #79835

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80220
Approved by: https://github.com/albanD

* [MPS] Add equal operator (#80195)

This is, in essence, a composite of `eq` -> `all` -> `item`.
`native/mps/operators/Equal.cpp` is an almost verbatim copy of `native/cuda/Equal.cpp`
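
A plain-Python sketch of that composite, with lists standing in for tensors (the real operator also compares dtype and device):

```python
def equal(a, b):
    if len(a) != len(b):                     # shape check
        return False
    eq = [x == y for x, y in zip(a, b)]      # elementwise eq
    return all(eq)                           # all(...), then .item()
```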

Fix codegen by generating MPSFunctions headers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80195
Approved by: https://github.com/albanD

* [MPS] add `aten::normal.Tensor_float` `aten::normal.float_Tensor` `aten::normal.Tensor_Tensor` (#80297)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80297
Approved by: https://github.com/albanD, https://github.com/kulinseth

* [MPS] Add flip (#80214)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80214
Approved by: https://github.com/DenisVieriu97, https://github.com/albanD

* [MPS] Add logical ops (#80216)

This PR adds `logical_not`, `logical_and`, `logical_or`, `logical_xor`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80216
Approved by: https://github.com/albanD, https://github.com/kulinseth

* [MPS] Add glu (#79866)

Adds mps op for `aten::glu.out`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79866
Approved by: https://github.com/kulinseth, https://github.com/albanD

* [MPS] Fix std/var cache issue (#80502)

Use `getTensorsStringKey`, which includes the tensor shape as part of the key, to prevent cache lookup issues when the shape of the input tensor changes.
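
A minimal sketch of the caching fix: folding the shape into the cache key means graphs built for different input shapes can never collide.

```python
_graph_cache = {}

def cached_graph(op_name, shape):
    # shape is part of the key, mirroring what getTensorsStringKey adds
    key = (op_name, tuple(shape))
    if key not in _graph_cache:
        _graph_cache[key] = object()   # stand-in for building an MPSGraph
    return _graph_cache[key]
```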

Fixes #80499

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80502
Approved by: https://github.com/malfet, https://github.com/kulinseth

* Add scatter support for view operations (#79939)

* Add scatter support for view operations; #78074, #78886, #79672
* Update test_slicing_replace_column to properly test different sizes
* Handle in-place changes for binary ops; add new testcase
* Add new view ops testing scatter; add MPSDebugConfig.h config file for debugging purposes
* Merge gatherViewTensor and scatterViewTensor into a generic function
* Add scatter on demand in scatterViewOperation instead of caching it into a generic graph
* Create separate graphs for scatter and gather;
* Create scatter graph at scatter time

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79939
Approved by: https://github.com/razarmehr

* MPS: Fix handling of 1D tensors in linear backward (#80759)

Fixes https://github.com/pytorch/pytorch/issues/79784

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80759
Approved by: https://github.com/ezyang

* [MPS] Move the View ops to a separate file and reduce the number of graphs created (#80491)

This is dependent on the PR to go in first: https://github.com/pytorch/pytorch/pull/79939

Remove the data_ptr from the View Graph key which reduces the number of
graphs created significantly.

Don't wait when copying from MPS to MPS tensors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80491
Approved by: https://github.com/malfet

* [MPS] Add softplus backward (#79873)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79873
Approved by: https://github.com/malfet

* [MPS] Add argmin (#80828)

This PR

1. adds argmin
2. refactors `reduction_type` in `ReduceOps.mm` with enum.

Co-authored-by: Kulin Seth <kulinseth@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80828
Approved by: https://github.com/malfet

* [MPS] Fix LSTM batch_first output transposed (#80597)

The output of LSTM with `batch_first` should be transposed back to batch first format.
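
A sketch of the fix using nested lists as stand-ins for tensors: with `batch_first=True` the (seq, batch, ...) output is transposed back to (batch, seq, ...).

```python
def finalize_output(out_seq_major, batch_first):
    # out_seq_major[t][b] is the output at time step t for batch element b
    if batch_first:
        # transpose dims 0 and 1, the step the bug omitted
        return [list(step) for step in zip(*out_seq_major)]
    return out_seq_major
```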

Fixes #80306

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80597
Approved by: https://github.com/kulinseth

* [MPS][BE] Introduce MPSUnaryCachedGraph (#81033)

I.e. a CachedGraph that has input and output tensors.
Also, add an `MPSGraphCache::LookUpAs` template, which combines LookUp with a
static_cast to the target type.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81033
Approved by: https://github.com/kulinseth

* [MPS] Add test consistency from OpInfo based tests from PR 78504 (#79532)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79532
Approved by: https://github.com/albanD, https://github.com/malfet

* [MPS] Add huber loss (#80163)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80163
Approved by: https://github.com/kulinseth, https://github.com/malfet

* Remove two tests dependent on the MPS serialization checkin.

* Fix lint error (FLAKE8) F401

* Remove the serialization test from test_mps as its support is not there in 1.12.1.

Co-authored-by: Kulin Seth <kulinseth@gmail.com>
Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
Co-authored-by: Kulin Seth <kulin_seth@apple.com>
Co-authored-by: Abhishek Pathak <abhipathak97@gmail.com>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
Co-authored-by: qqaatw <qqaatw@gmail.com>
Co-authored-by: Ramin Azarmehr <razarmehr@apple.com>
2022-07-25 08:52:34 +01:00
03b82bdd99 Disable XLA builds (#80099) (#80099) (#81977)
Summary:
As they are constantly failing to download llvm release from https://storage.googleapis.com:
```
Error in download_and_extract: java.io.IOException: Error downloading [9c6a2f2966.tar.gz, 9c6a2f2966.tar.gz] to /home/jenkins/.cache/bazel/_bazel_jenkins/b463291cb8b07b4bfde1e3a43733cd1a/external/llvm-raw/temp10926951092717297163/9c6a2f29660b886044a267bb4de662cd801079bc.tar.gz: Read timed out
Loading: 0 packages loaded
```

GitHub CC:
JackCaoG

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80099
Approved by: https://github.com/janeyx99

Test Plan: contbuild & OSS CI, see afdd83efcb

Reviewed By: atalman

Differential Revision: D37381940

Pulled By: malfet

fbshipit-source-id: 90e5e1a1dfed8dc19a6dddcbf5a4b2097755a25f

Co-authored-by: Nikita Shulga (Meta Employee) <nshulga@fb.com>
2022-07-22 13:50:34 +01:00
48947f738c Add check for cuda lazy init (#80912) (#80912) (#81970)
Summary:
Validate that no CUDA calls are made during `import torch` call, by
importing torch and limited visible devices to non-existing device

Should prevent regressions like ones reported in https://github.com/pytorch/pytorch/issues/80876
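
A sketch of that validation approach (`json` stands in for `torch` so the sketch stays dependency-free): import the module in a subprocess with CUDA_VISIBLE_DEVICES pointing at a non-existent device, so any eager CUDA call during import would fail loudly.

```python
import os
import subprocess
import sys

def imports_without_cuda_calls(module="json"):
    # Non-existent device index: any CUDA initialization at import
    # time would error out and make the subprocess exit non-zero.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES="32")
    proc = subprocess.run([sys.executable, "-c", f"import {module}"], env=env)
    return proc.returncode == 0
```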

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80912
Approved by: https://github.com/ngimel, https://github.com/atalman

Test Plan: contbuild & OSS CI, see 1ad7ef3f21

Reviewed By: mehtanirav

Differential Revision: D37648899

Pulled By: malfet

fbshipit-source-id: a2947960d3d0d0e7e4775c37590b2e9fee38c4e9

Co-authored-by: Nikita Shulga (Meta Employee) <nshulga@fb.com>
2022-07-22 11:25:33 +01:00
787b469b19 Raise proper timeout when sharing the distributed shared seed (#81666) (#81666) (#81892)
Summary:
Fixes https://github.com/pytorch/data/issues/659

- This fixes the problem that a slow DataLoader on rank 0 would cause a TimeoutError, as the `wait` operation on the other ranks has been removed.
- This PR also adds a [default timeout](f6a45f7984/torch/csrc/distributed/c10d/ProcessGroup.hpp (L26-L27)) of 30 * 60 seconds (taking reference from the distributed team's implementation). When the distributed seed exchange is stuck on any rank, a proper timeout with a detailed message will be raised.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81666
Approved by: https://github.com/NivekT

Test Plan: contbuild & OSS CI, see aa1466d542

Reviewed By: jeanschmidt

Differential Revision: D37990752

Pulled By: ejguan

fbshipit-source-id: 41639341aa737ab64de1992db5ed43cbb110ec91

Co-authored-by: erjia (Meta Employee) <erjia@fb.com>
2022-07-22 08:52:28 +01:00
37b49cf958 Cudnn conv cache key patch (#81418) (#81418) (#81888)
Summary:
Fixes #81106

Patches the cudnn algo cache to consider the right memory_format used in descriptors, instead of blindly copying the memory_format of the inputs.
Note that, to be on the safe side, we could actually cache on all tensor strides instead. But given how we short-cut and align memory_format from the pytorch tensor to the cudnn descriptor, it suffices to have a single field in the cache.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81418
Approved by: https://github.com/ngimel

Test Plan: contbuild & OSS CI, see ce2ce3ae96

Reviewed By: DanilBaibak

Differential Revision: D37847747

Pulled By: DanilBaibak

fbshipit-source-id: 1e5583e29f911d0987b6ff959886697a4fc853c7

Co-authored-by: jjsjann123 <jiej@nvidia.com>
2022-07-21 17:24:08 +01:00
9160508852 [DataLoader] Locking lower ranks seed recepients (#81071) (#81071) (#81886)
Summary:
Exit the seed-receiving section only when all ranks have received the seed; otherwise we risk that the current rank
reaches the same section of the code again while rank zero is still in the previous iteration.

Fixes: #80845

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81071
Approved by: https://github.com/msaroufim, https://github.com/ejguan

Test Plan:
contbuild & OSS CI, see e9b3bc2ead

Original Phabricator Test Plan:
Imported from OSS

Reviewed By: mehtanirav, ejguan

Differential Revision: D37702557

Pulled By: VitalyFedyunin

fbshipit-source-id: 51dd950e1bfc2c984a4ddbe6481e225023b0a202

Co-authored-by: Vitaly Fedyunin (Meta Employee) <vitalyf@fb.com>
2022-07-21 16:50:46 +01:00
60f9724e9a Change cudnn incompatibility message wording (#80877) (#80877) (#81881)
Summary:
Change cudnn incompatibility message wording
Please refer to: #80637

Test:
```
 File "/home/atalman/torch/backends/cudnn/__init__.py", line 67, in version
    if not _init():
  File "/home/atalman/torch/backends/cudnn/__init__.py", line 50, in _init
    raise RuntimeError(
RuntimeError: cuDNN version incompatibility: PyTorch was compiled  against (8, 3, 2) but found runtime version (8, 0, 3). PyTorch already comes bundled with cuDNN. One option to resolving this error is to ensure PyTorch can find the bundled cuDNN.Looks like your LD_LIBRARY_PATH contains incompatible version of cudnnPlease either remove it from the path or install cudnn (8, 3, 2)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80877
Approved by: https://github.com/zou3519

Test Plan: contbuild & OSS CI, see a2ee1a92d6

Reviewed By: mehtanirav

Differential Revision: D37717040

Pulled By: atalman

fbshipit-source-id: 7cfc9e51999ccb9899e9ad78afdbd46f017a76bf
2022-07-21 16:41:28 +01:00
23ec48ce27 Make nn.stateless correctly reset parameters if the forward pass fails (#81262) (#81262) (#81880)
Summary:
This bug came up as I was adding new tests for ExpandedWeights

If the forward pass errors while the `_reparametrize_module` context manager is still active, the values from reparameterization will remain on the module outside of the context manager, where the original values should be. This fixes that by putting a try/finally block around the forward call and the call that resets the parameters.
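
A dict-based sketch (not the real `nn.utils.stateless` internals) of the fix: the original parameters are restored in a finally block, so a failing forward cannot leak the reparametrized values.

```python
def functional_call(params, new_params, forward):
    original = dict(params)
    params.update(new_params)       # roughly what _reparametrize_module does
    try:
        return forward(params)
    finally:
        params.clear()
        params.update(original)     # always runs, even if forward raises
```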

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81262
Approved by: https://github.com/zou3519

Test Plan: contbuild & OSS CI, see 56d1c75518

Reviewed By: DanilBaibak

Differential Revision: D37813203

Pulled By: samdow

fbshipit-source-id: 9c32485c074b10b985b35d2d575c35f16337af5f

Co-authored-by: samdow (Meta Employee) <samdow@fb.com>
2022-07-21 16:41:03 +01:00
1d8ea8366d Add 3.10 stdlib to torch.package (#81261) (#81261) (#81879)
Summary:
Copy-n-paste the list from https://github.com/PyCQA/isort/blob/main/isort/stdlibs/py310.py

Tested locally and in https://github.com/pytorch/pytorch/pull/81233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81261
Approved by: https://github.com/suo

Test Plan: contbuild & OSS CI, see 9ed76c8c89

Reviewed By: DanilBaibak

Differential Revision: D37781957

Pulled By: malfet

fbshipit-source-id: e39d94335950022fbdbe7b053674136694b89fad

Co-authored-by: Nikita Shulga (Meta Employee) <nshulga@fb.com>
2022-07-21 16:40:32 +01:00
5525230fda [forward ad] Fix codegen to ignore undefined outputs (#81114) (#81114) (#81878)
Summary:
I don't think there's a way to avoid functions returning undefined tensors as outputs, so codegen will have to detect them before calling _set_fw_grad. Alternatively, we can just make calling _set_fw_grad with undefined self a no-op, but I'm biasing toward keeping _set_fw_grad more strict in case it is called in other areas.
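
A sketch of the generated guard, with `None` standing in for an undefined tensor: `_set_fw_grad` is simply skipped for those outputs.

```python
def set_forward_grads(outputs, tangents, set_fw_grad):
    for out, tangent in zip(outputs, tangents):
        if out is None:            # undefined output from the op
            continue               # codegen now detects and skips it
        set_fw_grad(out, tangent)
```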

Fixes https://github.com/pytorch/pytorch/issues/81111

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81114
Approved by: https://github.com/albanD

Test Plan: contbuild & OSS CI, see f69768fed4

Reviewed By: mehtanirav

Differential Revision: D37754419

Pulled By: soulitzer

fbshipit-source-id: ca5f2e703a838fa5cbc161604c5b98460456cdc0

Co-authored-by: soulitzer (Meta Employee) <soulitzer@gmail.com>
2022-07-21 16:39:45 +01:00
12954c729d Don't error if _warned_capturable_if_run_uncaptured not set (#80345) (#80345) (#81877)
Summary:
This can happen if an optimizer was pickled.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80345
Approved by: https://github.com/malfet, https://github.com/albanD

Test Plan: contbuild & OSS CI, see 57f001f35a

Reviewed By: b0noI

Differential Revision: D37523001

Pulled By: ezyang

fbshipit-source-id: 750884421d3f398695c24c351d8d6b26a501045a

Co-authored-by: Edward Z. Yang (Meta Employee) <ezyang@fb.com>
2022-07-21 16:37:50 +01:00
a93c901447 fix weight norm backward bug on CPU when OMP_NUM_THREADS <= 2 (#80930) (#80930) (#81872)
Summary:
fix https://github.com/pytorch/pytorch/issues/80569
root cause: `weight_norm_backward_last_dim_kernel` creates a temp buffer of size
[num_threads, N] (N is the size of the last dimension of v) to do the vertical reduction.

To save an additional memory allocation, the original kernel reused the buffer after
the vertical sum:
  1st row stores the final result of the sum
  2nd row stores coefficient a
  3rd row stores coefficient b

When OMP_NUM_THREADS <= 2, this causes an illegal memory access, since the buffer size
is only 1*N or 2*N.

The fix is to use a separate buffer (`a_b`) to store the coefficients a and b.
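
A sketch of the fixed allocation (lists stand in for the C++ buffers): the reduction buffer is still [num_threads, N], but coefficients a and b get their own 2 x N buffer instead of reusing rows that do not exist when num_threads <= 2.

```python
def allocate_buffers(num_threads, n):
    reduce_buf = [[0.0] * n for _ in range(num_threads)]  # vertical reduction
    a_b = [[0.0] * n for _ in range(2)]                   # coefficients a and b
    return reduce_buf, a_b
```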

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80930
Approved by: https://github.com/frank-wei, https://github.com/malfet

Test Plan: contbuild & OSS CI, see 6ee54a8780

Reviewed By: mehtanirav

Differential Revision: D37687546

Pulled By: mehtanirav

fbshipit-source-id: 5df39c9584f310722ae6901044f35c44d2c7c091

Co-authored-by: mingfeima <mingfei.ma@intel.com>
2022-07-21 16:37:21 +01:00
9d9bba4ce8 [Prims] Unbreak CUDA lazy init (#80899) (#80899) (#81870)
Summary:
CUDA calls should not be made in the default codepath

Fixes https://github.com/pytorch/pytorch/issues/80876

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80899
Approved by: https://github.com/ngimel

Test Plan: contbuild & OSS CI, see b62209f047

Reviewed By: mehtanirav

Differential Revision: D37648864

Pulled By: malfet

fbshipit-source-id: 9648e91cdcca96d9f76d873930e4ea2601bfb57d

Co-authored-by: Nikita Shulga (Meta Employee) <nshulga@fb.com>
2022-07-21 16:03:27 +01:00
0e43325ae9 Use fabi-version=11 to ensure compatibility between gcc7 and gcc9 binaries (#81058) (#81058) (#81884)
Summary:
Fixes: #80489

Test using cuda 11.3 manywheel binary:
```
import torch
print(torch.__version__)
print(torch._C._PYBIND11_BUILD_ABI)
```

Output
```
1.13.0.dev20220707+cu113
_cxxabi1011
```

Functorch test (torch 1.13.0.dev20220707+cu113, functorch built with cu102):
```
import torch
print(torch.__version__)
print(torch._C._PYBIND11_BUILD_ABI)
from functorch import vmap
x = torch.randn(2, 3, 5)
vmap(lambda x: x, out_dims=3)(x)
```

Output
```
1.13.0.dev20220707+cu113
_cxxabi1011
/home/atalman/temp/testc1.py:5: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:73.)
  x = torch.randn(2, 3, 5)
Traceback (most recent call last):
  File "/home/atalman/temp/testc1.py", line 6, in <module>
    vmap(lambda x: x, out_dims=3)(x)
  File "/home/atalman/conda/lib/python3.9/site-packages/functorch/_src/vmap.py", line 361, in wrapped
    return _flat_vmap(
  File "/home/atalman/conda/lib/python3.9/site-packages/functorch/_src/vmap.py", line 488, in _flat_vmap
    return _unwrap_batched(batched_outputs, out_dims, vmap_level, batch_size, func)
  File "/home/atalman/conda/lib/python3.9/site-packages/functorch/_src/vmap.py", line 165, in _unwrap_batched
    flat_outputs = [
  File "/home/atalman/conda/lib/python3.9/site-packages/functorch/_src/vmap.py", line 166, in <listcomp>
    _remove_batch_dim(batched_output, vmap_level, batch_size, out_dim)
IndexError: Dimension out of range (expected to be in range of [-3, 2], but got 3)
```

Related Builder  PR: https://github.com/pytorch/builder/pull/1083

Test PR: https://github.com/pytorch/pytorch/pull/81232

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81058
Approved by: https://github.com/zou3519, https://github.com/malfet

Test Plan: contbuild & OSS CI, see d552ba3b4f

Reviewed By: DanilBaibak

Differential Revision: D37813240

Pulled By: atalman

fbshipit-source-id: 94d94e777b0e9d5da106173c06117b3019ba71c4
2022-07-21 15:08:20 +01:00
b556fb30cb Allow register float16 weight_norm on cpu and speed up test (#80600) (#80600) (#81866)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/80599

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80600
Approved by: https://github.com/malfet

Test Plan: contbuild & OSS CI, see c8d64ba5ec

Reviewed By: seemethere

Differential Revision: D37559049

Pulled By: albanD

fbshipit-source-id: 6a44fa9c8b898e2065cdb6b160b7279466f0dc7e

Co-authored-by: albanD (Meta Employee) <desmaison.alban@gmail.com>
2022-07-21 15:02:06 +01:00
1680cd0e46 Fix Module.share_memory error (#80843) (#80843) (#81867)
Summary:
Fixes #80733

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80843
Approved by: https://github.com/malfet

Test Plan: contbuild & OSS CI, see 4c279994fd

Reviewed By: mehtanirav

Differential Revision: D37619124

Pulled By: mehtanirav

fbshipit-source-id: 2b0d71d5a420d4aab286eea0b3cccdc96d15afeb

Co-authored-by: Kurt Mohler <kmohler@quansight.com>
2022-07-21 14:46:12 +01:00
868646748d Don't error if _warned_capturable_if_run_uncaptured not set (#80345) (#80345) (#81865)
Summary:
This can happen if an optimizer was pickled.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80345
Approved by: https://github.com/malfet, https://github.com/albanD

Test Plan: contbuild & OSS CI, see 57f001f35a

Reviewed By: b0noI

Differential Revision: D37523001

Pulled By: ezyang

fbshipit-source-id: 750884421d3f398695c24c351d8d6b26a501045a

Co-authored-by: Edward Z. Yang (Meta Employee) <ezyang@fb.com>
2022-07-21 13:59:19 +01:00
cd6ec07348 remove overly restrictive checks for cudagraph (#80881) (#81858)
Finish fixing https://github.com/pytorch/pytorch/issues/80809
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80881
Approved by: https://github.com/jbschlosser

Co-authored-by: albanD <desmaison.alban@gmail.com>
2022-07-21 13:58:13 +01:00
430416fc9c Fix distributed store to use add for the counter of DL shared seed (#80348) (#80348) (#81860)
Summary:
In order to get the result of `_shared_seed_recv_cnt` properly, switch from `store.get` to `store.add(key, 0)`.

See the comment from distributed team for the reason:
590d3e5774/torch/distributed/distributed_c10d.py (L242-L246)
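
A toy store illustrating why `add(key, 0)` is the safe read: unlike `get()`, it atomically initializes a missing counter to zero instead of failing or blocking when no rank has set the key yet.

```python
class ToyStore:
    """Minimal stand-in for the distributed key-value store."""
    def __init__(self):
        self._data = {}
    def add(self, key, amount):
        # atomic read-modify-write; add(key, 0) is a safe read
        self._data[key] = self._data.get(key, 0) + amount
        return self._data[key]
```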

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80348
Approved by: https://github.com/VitalyFedyunin, https://github.com/NivekT

Test Plan: contbuild & OSS CI, see 3ec9d34f21

Reviewed By: NivekT

Differential Revision: D37458370

Pulled By: ejguan

fbshipit-source-id: 386457bef43dbb47e3c5b8bb4524d456b5f4343a

Co-authored-by: erjia (Meta Employee) <erjia@fb.com>
2022-07-21 13:57:16 +01:00
db8ea2703e Remove overly restrictive assert in adam (#80222) (#81857)
This is causing issues if the user has the step on cuda for a good reason.

These asserts make code that used to run just fine fail. Note that keeping the step on cuda is a pretty bad thing to do for performance, so it is ok to try and push users away from doing it.

For the 1.12.1 milestone: this is not asking for a dot release to fix this (as this is bad practice anyway). But it would be a great thing to add if we do one: it is very low risk and will prevent breakage for users.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80222
Approved by: https://github.com/jbschlosser, https://github.com/ngimel

Co-authored-by: albanD <desmaison.alban@gmail.com>
2022-07-21 13:55:13 +01:00
939019c162 Don't include libiomp with conda install on MacOS (#78632) (#78632) (#81873)
Summary:
Fixes #78490

Following command:
```
conda install pytorch torchvision torchaudio -c pytorch-nightly
```

installs libiomp. Hence we don't want to package libiomp with conda installs. However, we still keep it for libtorch and wheels.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78632
Approved by: https://github.com/malfet

Test Plan: contbuild & OSS CI, see ca7f948806

Reviewed By: b0noI

Differential Revision: D36854265

Pulled By: atalman

fbshipit-source-id: 1b9a2f034cac822d9936febaa7b94213c31af19f
2022-07-21 13:50:01 +01:00
67ece03c8c Disable AVX512 CPU dispatch by default (#80253) (#80356)
As it can be slower, see https://github.com/pytorch/pytorch/issues/80252
Updates the trunk test matrix to test the AVX512 config in the `nogpu_AVX512` flavor.
Kills `nogpu_noAVX`, as AVX support was replaced with AVX512 when https://github.com/pytorch/pytorch/pull/61903 was landed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80253
Approved by: https://github.com/ngimel

(cherry picked from commit 14813536a7120f1104be2270be341b7a383415c5)
2022-06-27 13:41:56 -04:00
bcfb424768 [JIT] Imbue stringbuf with C locale (#79929) (#79983)
To prevent 12345 becoming "12,345" when the locale is not "C", as shown in the
following example:
```cpp

int main() {
  std::locale::global(std::locale("en_US.utf-8"));
  std::stringstream ss;
  ss << "12345 in " << std::locale().name()  << " locale is " << 12345 ;
  ss.imbue(std::locale("C"));
  ss << " but in C locale is " << 12345;
  std::cout << ss.str() << std::endl;
}

```

Fixes #79583

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79929
Approved by: https://github.com/davidberard98

Co-authored-by: Nikita Shulga <nshulga@fb.com>
2022-06-21 21:35:03 -04:00
8186aa7d6c [DataLoader] Share seed via Distributed Store to get rid of CUDA dependency (#79829) (#79890)
Fixes #79828

In a distributed environment, before this PR, DataLoader would create a Tensor holding the shared seed on RANK 0 and send the Tensor to the other processes. However, when `NCCL` is used as the distributed backend, the Tensor must be moved to cuda before being broadcast from RANK 0 to the other RANKs. This causes the issue where DataLoader doesn't move the Tensor to cuda before sharing via `NCCL`.

After offline discussion with @mrshenli, we think the distributed Store is a better solution as the shared seed is just an integer value. Then, we can get rid of the dependency on NCCL and CUDA when sharing info between distributed processes for DataLoader.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79829
Approved by: https://github.com/VitalyFedyunin, https://github.com/NivekT
2022-06-20 20:16:14 -04:00
01d9324fe1 nn: Disable nested tensor by default (#79884)
Better transformers (and by extension nested tensor) are identified as a
prototype feature and should not be enabled by default for the 1.12
release.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2022-06-20 17:15:16 -04:00
5009086150 Fix release doc builds (#79865)
This logic was lost during the last workflow migration, and as a result we do not have docs builds for the 1.12 release candidate, see pytorch/pytorch.github.io/tree/site/docs

Hattip to @brianjo for reminding me about the issue

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79865
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/seemethere

(cherry picked from commit 2bfba840847e785b4da56498041421fc4929826b)
2022-06-20 11:27:32 -07:00
bfb6b24575 [JIT] Nested fix (#79480) (#79816)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79480
Approved by: https://github.com/davidberard98

Co-authored-by: Elias Ellison <eellison@fb.com>
2022-06-20 06:10:09 -07:00
681a6e381c [v1.12.0] Fix non-reentrant hooks based checkpointing (#79490)
* merge fix

* Test fix

* Lint
2022-06-17 14:41:52 -07:00
92437c6b4e Revert behavior of Dropout2d on 3D inputs to 1D channel-wise dropout behavior & warn (#79611)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79549

Approved by: https://github.com/ngimel, https://github.com/albanD

Co-authored-by: Joel Benjamin Schlosser <jbschlosser@fb.com>
2022-06-17 14:35:45 -04:00
566286f9db Add Dropout1d module (#79610)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79545

Approved by: https://github.com/ngimel, https://github.com/albanD

Co-authored-by: Joel Benjamin Schlosser <jbschlosser@fb.com>
2022-06-17 14:35:08 -04:00
ac3086120d [DataLoader] Fix the world_size when distributed sharding MapDataPipe (#79524) (#79550)
Fixes #79449

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79524
Approved by: https://github.com/NivekT, https://github.com/VitalyFedyunin
2022-06-15 06:23:03 -07:00
eqy
7964022214 Cherry pick tf32docs (#79537) (#79539)
* Update numerical_accuracy.rst

* Update numerical_accuracy.rst

* Update numerical_accuracy.rst

* lint
2022-06-15 06:21:18 -07:00
1d5ecdb3b9 Update PeachPy submodule (#78326)
Forked the repo, merged the latest changes into the pre-generated branch, and
updated the pre-generated opcodes

Re-enabled NNPACK builds on MacOS

Picking f8ef1a3c0a  fixes https://github.com/pytorch/pytorch/issues/76094

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78326
Approved by: https://github.com/atalman, https://github.com/albanD

(cherry picked from commit fa7117c64a9cc740e71728701adb2cb2ccc143c4)
2022-06-15 06:12:50 -07:00
7eef782636 Link LazyLinalg with cusolver statically when needed (#79324) (#79522)
By copy-n-pasting the static linking logic from `libtorch_cuda` if
lazylinalg is not enabled

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79324
Approved by: https://github.com/atalman

Co-authored-by: Nikita Shulga <nshulga@fb.com>
2022-06-14 08:35:57 -07:00
fa01ea406a Add docs for Python Registration (#79481)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78753

Approved by: https://github.com/ezyang, https://github.com/albanD
2022-06-14 08:09:21 -04:00
21e1282098 [CUDA graphs] Allows Adam and AdamW to be capture-safe (#77862) (#79472)
Near term fix for https://github.com/pytorch/pytorch/issues/76368.

Q. Why does the user need to request `capturable=True` in the optimizer constructor? Why can't capture safety be completely automatic?
A. We need to set up capture-safe (device-side) state variables before capture. If we don't, and step() internally detects capture is underway, it's too late: the best we could do is create a device state variable and copy the current CPU value into it, which is not something we want baked into the graph.

Q. Ok, why not just do the capture-safe approach with device-side state variables all the time?
A. It incurs several more kernel launches per parameter, which could really add up and regress cpu overhead for ungraphed step()s. If the optimizer won't be captured, we should allow step() to stick with its current cpu-side state handling.

Q. But cuda RNG is a stateful thing that maintains its state on the cpu outside of capture and replay, and we capture it automatically. Why can't we do the same thing here?
A. The graph object can handle RNG generator increments because its capture_begin, capture_end, and replay() methods can see and access the generator object. But the graph object has no explicit knowledge of, or access to, optimizer steps in its capture scope. We could let the user tell the graph object which optimizers will be stepped in its scope, i.e., something like
```python
graph.will_use_optimizer(opt)
graph.capture_begin()
...
```
but that seems clunkier than an optimizer constructor arg.

I'm open to other ideas, but right now I think constructor arg is necessary and the least bad approach.

Long term, https://github.com/pytorch/pytorch/issues/71274 is a better fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77862
Approved by: https://github.com/ezyang
2022-06-14 08:06:07 -04:00
2d3d6f9d05 cherry-pick (#79455)
Co-authored-by: Mike Ruberry <mruberry@fb.com>
2022-06-13 18:52:31 -04:00
da93b1cbeb [CI] Turn flaky test signal to green (#79220) (#79220) (#79416)
Summary:
This implements the RFC #73573

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79220
Approved by: https://github.com/suo

Test Plan: contbuild & OSS CI, see 1bc8c87322

Reviewed By: osalpekar

Differential Revision: D37059423

Pulled By: osalpekar

fbshipit-source-id: c73d326e3aca834221cd003157f960e5bc02960a

Co-authored-by: Jane Xu (Meta Employee) <janeyx@fb.com>
2022-06-13 17:15:34 -04:00
d67c72cb53 Removing cublas static linking (#79280) (#79417)
Removing cublas static linking

Test:  https://github.com/pytorch/pytorch/runs/6837323424?check_suite_focus=true

```
(base) atalman@atalman-dev-workstation-d4c889c8-2k8hl:~/whl_test/torch/lib$ ldd libtorch_cuda.so
	linux-vdso.so.1 (0x00007fffe8f6a000)
	libc10_cuda.so (0x00007f6539e6a000)
	libcudart-80664282.so.10.2 (0x00007f6539be9000)
	libnvToolsExt-3965bdd0.so.1 (0x00007f65399df000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f65397c0000)
	libc10.so (0x00007f653952f000)
	libtorch_cpu.so (0x00007f6520921000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6520583000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f652037f000)
	libcublas.so.10 (0x00007f651c0c5000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f651bebd000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f651bb34000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f651b91c000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f651b52b000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f656aa13000)
	libgomp-a34b3233.so.1 (0x00007f651b301000)
	libcublasLt.so.10 (0x00007f651946c000)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79280
Approved by: https://github.com/seemethere
2022-06-13 16:48:13 -04:00
ef26f13df9 Install NDK 21 after GitHub update (#79024) (#79024) (#79429)
Summary:
See https://github.com/actions/virtual-environments/issues/5595

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79024
Approved by: https://github.com/janeyx99

Test Plan: contbuild & OSS CI, see 0be9df4e85

Reviewed By: osalpekar

Differential Revision: D36993242

Pulled By: kit1980

fbshipit-source-id: c2e76fee4eaf0b1474cb7221721cbb798c319001

Co-authored-by: Sergii Dymchenko (Meta Employee) <sdym@fb.com>
2022-06-13 15:07:45 -04:00
4a9779aa4d [DataPipe] Correcting deprecation version (#79309)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79302

Approved by: https://github.com/ejguan

Co-authored-by: PyTorch MergeBot <pytorchmergebot@users.noreply.github.com>
2022-06-10 15:23:18 -07:00
9a94ddc081 Fix _free_weak_ref error (#79315)
Fixes #74016

This is a cherry pick of  https://github.com/pytorch/pytorch/pull/78575 into release/1.12 branch
Approved by: https://github.com/ezyang
2022-06-10 15:14:11 -07:00
dee3dc6070 MPS: add layer_norm_backward (#79189) (#79276)
Layernorm backward

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79189
Approved by: https://github.com/razarmehr, https://github.com/albanD
2022-06-10 10:20:43 -07:00
30fce6836f Fix jit schema_matching ignoring self resulting in wrong operator schema (#79249)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79101

Approved by: https://github.com/gmagogsfm, https://github.com/eellison
2022-06-10 11:51:42 -04:00
0f93212516 adding a quick link to nvfuser README.md in jit doc for 1.12 release (#78160) (#79221)
adding a link to github 1.12 release branch nvfuser README.md in jit doc

Note that this PR is intended to be cherry-picked into the 1.12 release; we'll have a follow-up PR to update the link once this PR is merged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78160
Approved by: https://github.com/davidberard98

Co-authored-by: jjsjann123 <alex.jann2012@gmail.com>
2022-06-09 14:33:59 -04:00
eqy
585417e935 [DDP] Cherrypick support other memory formats #79060 (#79071)
* check in

* add test
2022-06-09 14:25:17 -04:00
bd93fe635e Forward fix sharding bug for DL (#79124) (#79129)
This PR solves a bug introduced by #79041

`torch.utils.data.graph_settings.apply_sharding` changes the datapipe in-place and returns `None`

It would resolve the Error in TorchData. See: https://github.com/pytorch/data/actions/runs/2461030312
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79124
Approved by: https://github.com/VitalyFedyunin
2022-06-09 14:21:33 -04:00
cc6e2d3035 Package config/template files with torchgen (#78942) (#79123)
Package config/template files with torchgen

This PR packages native_functions.yaml, tags.yaml and ATen/templates
with torchgen.

This PR:
- adds a step to setup.py to copy the relevant files over into torchgen
- adds a docstring for torchgen (so `import torchgen; help(torchgen)`
says something)
- adds a helper function in torchgen so you can get the torchgen root
directory (and figure out where the packaged files are)
- changes some scripts to explicitly pass the location of torchgen,
which will be helpful for the first item in the Future section.
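A minimal sketch of what such a root-directory helper could look like, generalized to any installed package (the function name and layout below are illustrative, not the actual torchgen API):

```python
import importlib
from pathlib import Path


def package_root(package_name: str) -> Path:
    """Return the on-disk root directory of an installed package.

    Packaged data files (e.g. templates, YAML configs) can then be
    located relative to this directory.
    """
    mod = importlib.import_module(package_name)
    return Path(mod.__file__).resolve().parent


# e.g. packaged files could then live under package_root("torchgen") / "packaged"
root = package_root("json")  # stdlib package used purely for illustration
```

The same pattern works whether the package is installed site-wide or imported from a source checkout, which is what makes it useful for finding the copied config/template files.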

Future
======

- torchgen, when invoked from the command line, should use sources
in torchgen/packaged instead of aten/src. I'm unable to do this because
people (aka PyTorch CI) invokes `python -m torchgen.gen` without
installing torchgen.
- the source of truth for all of these files should be in torchgen.
This is a bit annoying to execute on due to potential merge conflicts
and dealing with merge systems
- CI and testing. The way things are set up right now is really fragile,
we should have a CI job for torchgen.

Test Plan
=========
I ran the following locally:

```
python -m torchgen.gen -s torchgen/packaged
```
and verified that it outputted files.

Furthermore, I did a setup.py install and checked that the files are
actually being packaged with torchgen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78942
Approved by: https://github.com/ezyang
2022-06-09 14:20:32 -04:00
127922d451 Fix sharding strategy for distributed DL (#79041) (#79063)
1. Change the sharding strategy from sharding by worker first and then by rank to sharding by rank first and then by worker.
2. Fetch the Rank and World size in the main process for the sake of `spawn`.

For the change 1:
Before this PR, when the dataset could not be evenly divided by `worker_num * world_size`, more data would be retrieved by workers on the first RANKs.
Using the following example:
- dataset size: 100
- world_size: 4
- num_worker: 2

The number of data retrieved by each rank before this PR
- Rank 0: 26
- Rank 1: 26
- Rank 2: 24
- Rank 3: 24

The number of data retrieved by each rank after this PR
- Rank 0: 25
- Rank 1: 25
- Rank 2: 25
- Rank 3: 25
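The two sharding orders can be modeled in pure Python (a hypothetical sketch of the assignment, not the actual DataLoader code); it reproduces the per-rank counts listed above:

```python
def shard_counts(dataset_size, world_size, num_workers, rank_first):
    """Count how many samples each rank receives under the two sharding orders."""
    total = world_size * num_workers
    counts = [0] * world_size
    for i in range(dataset_size):
        slot = i % total
        if rank_first:
            rank = slot % world_size    # new order: rank varies fastest
        else:
            rank = slot // num_workers  # old order: worker varies fastest
        counts[rank] += 1
    return counts


old = shard_counts(100, world_size=4, num_workers=2, rank_first=False)
new = shard_counts(100, world_size=4, num_workers=2, rank_first=True)
# old == [26, 26, 24, 24]   (first ranks over-served)
# new == [25, 25, 25, 25]   (even split)
```

With worker-first order the leftover slots all land on the lowest ranks; rank-first order spreads the remainder one sample per rank.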

For the change 2:
Before this PR, `dist` functions were invoked inside the worker processes. This is fine when the worker processes are forked from the parent process: all environment variables are inherited and exposed to these `dist` functions. However, when the worker processes are spawned, they won't be able to access these environment variables, so the dataset won't be sharded by rank.
After this PR, `_sharding_worker_init_fn` should be working for both `spawn` and `fork` case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79041
Approved by: https://github.com/VitalyFedyunin, https://github.com/NivekT
2022-06-07 21:24:27 -04:00
4c3742be4b Add check for no grad in transformer encoder nestedtensor conversion (#78832) (#78832) (#79029)
Summary:
Before, we allowed inputs with grad to be converted to NestedTensors. Autograd attempts to find the size of the NestedTensor, but NestedTensor throws an exception for its size function. This causes all calls to nn.TransformerEncoder with grad enabled to fail.

Fix: we add a check for no grad in the transformer encoder so that we do not convert a tensor with grad to a NestedTensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78832
Approved by: https://github.com/cpuhrsch, https://github.com/jbschlosser

Test Plan: contbuild & OSS CI, see 1f819ee965

Reviewed By: frank-wei, mikekgfb

Differential Revision: D36907614

Pulled By: erichan1

fbshipit-source-id: 576be36530da81c1eff59ac427ae860bfb402106
2022-06-07 21:23:27 -04:00
f12a1ff7f9 [1.12][DataPipe] Disable profiler for IterDataPipe by default and add deprecation of functional DataPipe names (#79027)
* [DataPipe] Disable profiler for IterDataPipe by default

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78674

Approved by: https://github.com/VitalyFedyunin

* [DataPipe] Add function for deprecation of functional DataPipe names

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78970

Approved by: https://github.com/ejguan
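The functional-name deprecation helper can be sketched as a decorator (all names below are hypothetical illustrations, not the actual DataPipe API):

```python
import functools
import warnings


def deprecated_functional_name(old_name, new_name):
    """Wrap a functional-form DataPipe method so each call emits a deprecation warning."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"`{old_name}` is deprecated; please use `{new_name}` instead",
                FutureWarning,
            )
            return fn(*args, **kwargs)
        return wrapper
    return decorator


@deprecated_functional_name("read_from_tar", "load_from_tar")
def read_from_tar(path):
    return path  # placeholder body for illustration
```

Calling `read_from_tar("a.tar")` still returns its result but raises a `FutureWarning`, giving users a migration window before the old functional name is removed.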
2022-06-07 17:48:48 -04:00
f913b4d9fb [quant] Skip some broken tests due to hypothesis
Summary:
Some quantization tests failed even though we didn't touch any code related to them. All of the
failing tests use hypothesis, so it's likely that hypothesis is the problem. We will skip these tests for now and
gradually remove all hypothesis tests from the quantization test code, or skip running the hypothesis tests in CI

Test Plan:
ossci

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78302

Approved by: https://github.com/suo, https://github.com/dzdang

(cherry picked from commit 716f76716a842482947efbdb54ea6bf6de3577e1)
2022-06-07 10:05:29 -07:00
9229e451b2 Guard test_sparse_csr.test_mm on CUDA11+ (#77965)
Fixes #77944

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77965
Approved by: https://github.com/albanD, https://github.com/malfet

(cherry picked from commit a8467de6fa1657a6f2b3f0b426a873c6e98ce5ce)
2022-06-07 10:02:31 -07:00
d064733915 Fix coreml ios workflow (#78356)
Which was broken by the https://pypi.org/project/protobuf/4.21.0/ release
Fix by installing a pinned version of coremltools with a pinned version of protobuf

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78356
Approved by: https://github.com/atalman

(cherry picked from commit a4723d5a5f11f974f9d9ccd83564f3de5de818c5)
2022-06-07 09:51:50 -07:00
9d67727edf [FSDP][Docs] Fix typo in full_optim_state_dict() (#78981)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78784

Approved by: https://github.com/rohan-varma
2022-06-07 10:10:32 -04:00
ec86ed25e9 Run MPS tests (#78723)
This adds a workflow, that is executed on MacOS 12.3+ machines and runs just test_mps.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78723
Approved by: https://github.com/albanD, https://github.com/kulinseth

(cherry picked from commit f7ac389e71e55f84651141c01334dea668b3f90c)
2022-06-07 06:57:52 -07:00
2deba51e72 [MPS] Do not pass linker command to a compiler (#78630)
`-weak_framework` is a linker rather than a compiler option and as such
it should not be passed as a CXX flag
Also, use `string(APPEND` rather than `set(FOO "${FOO} ...")`

Likely fixes our ability to use `sccache` for MacOS CI builds, see https://github.com/pytorch/pytorch/issues/78375#issuecomment-1143697183
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78630
Approved by: https://github.com/albanD

(cherry picked from commit 634954c55c05b0c0905b2299308dd9152e08af92)
2022-06-07 06:56:57 -07:00
e9a12ec87f update mps note with more details (#78669)
Follow up to the comments in https://github.com/pytorch/pytorch/pull/77767#pullrequestreview-978807521
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78669
Approved by: https://github.com/kulinseth, https://github.com/anjali411

(cherry picked from commit b30b1f3decfd2b51ac2250b00a8ae7049143d855)
2022-06-07 06:53:59 -07:00
2a8e3ee91e Update codeowners for MPS (#78727)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78727
Approved by: https://github.com/malfet

(cherry picked from commit 48c3d8573918cf47f8091e9b5e7cea7aa0785ad4)
2022-06-07 06:52:53 -07:00
47d558e862 [MPS] Add arange_mps_out implementation (#78789)
Mostly by factoring out shader logic from `linspace_out_mps` implementation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78789
Approved by: https://github.com/albanD, https://github.com/kulinseth
2022-06-07 06:52:17 -07:00
bc0a9abad2 MPS: Fix issues with view tensors and linspace. (#78690)
Fixes: https://github.com/pytorch/pytorch/issues/78642, https://github.com/pytorch/pytorch/issues/78511
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78690
Approved by: https://github.com/razarmehr, https://github.com/DenisVieriu97

(cherry picked from commit 4858c56334aa2b09b1ba10d0a3547ef01edda363)
2022-06-07 06:51:17 -07:00
fa7d872ce3 MPS: add linspace op (#78570)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78570
Approved by: https://github.com/malfet

(cherry picked from commit a3bdafece3a07aea186e34abc28e2540aa078393)
2022-06-07 06:51:08 -07:00
d1d2be89fd Add test case for issue: https://github.com/pytorch/pytorch/issues/77851 (#78547)
The test works fine now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78547
Approved by: https://github.com/kulinseth

(cherry picked from commit aa62b3e003b53a0b36e04005fe5fdc8e2dda0253)
2022-06-07 06:50:53 -07:00
0e58e3374e MPS: Implement aten::count_nonzero.dim_IntList (#78169)
- See: #77764

Implements the `aten::count_nonzero.dim_IntList` operator (as used by [torch.count_nonzero](https://pytorch.org/docs/stable/generated/torch.count_nonzero.html)) for [MPS](https://pytorch.org/blog/introducing-accelerated-pytorch-training-on-mac/).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78169
Approved by: https://github.com/malfet, https://github.com/kulinseth, https://github.com/albanD

(cherry picked from commit f42b42d3eb9af4ea1d09f00a13e9b6dc9efcc0f8)
2022-06-07 06:50:41 -07:00
e3e753161c MPS: Fix crashes in view tensors due to buffer size mismatch (#78496)
Fixes #78247, #77886

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78496
Approved by: https://github.com/albanD, https://github.com/malfet

(cherry picked from commit 017b0ae9431ae3780a4eb9bf6d8865dfcd02cd92)
2022-06-07 06:50:33 -07:00
dc2b2f09d7 Speed up test_mps from 9min to 25s
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78488

Approved by: https://github.com/kulinseth

(cherry picked from commit bde246fcc60372c0ce7ee16dd5e3dc7652a36867)
2022-06-07 06:50:20 -07:00
19ebdd7eab Remove prints and add proper asserts
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78454

Approved by: https://github.com/kulinseth

(cherry picked from commit 02551a002575d1a40d6a6c7d6c7f319ef1b3ad2f)
2022-06-07 06:50:13 -07:00
f8160b113e MPS: Fixes the as_strided_mps implementation for contiguous view operations (#78440)
Fixes https://github.com/pytorch/pytorch/issues/78107; https://github.com/pytorch/pytorch/issues/77750

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78440
Approved by: https://github.com/malfet

(cherry picked from commit d63db52349ae3cffd6f762c9027e7363a6271d27)
2022-06-07 06:50:04 -07:00
3e8119bf9a MPS: Fix the memory growing issue and BERT_pytorch network crash fix. (#78006)
Fixes #77753

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78006
Approved by: https://github.com/albanD

(cherry picked from commit cbdb694f158b8471d71822873c3ac130203cc218)
2022-06-07 06:49:56 -07:00
6660df9f22 [MPS] Fix copy_kernel_mps (#78428)
By passing `storage_offset` of source and destination Tensors
This fixes following simple usecase:
```
python3 -c "import torch;x=torch.zeros(3, 3, device='mps'); x[1, 1]=1;print(x)"
```

Add test to validate it would not regress in the future

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78428
Approved by: https://github.com/kulinseth

(cherry picked from commit 437ecfc4612b73ada1f99de94f3c79de6b08f99a)
2022-06-07 06:43:55 -07:00
8b7e19a87b MPS: Eye op (#78408)
This can be used as a reference PR was to add Op in MPS backend.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78408
Approved by: https://github.com/albanD

(cherry picked from commit 8552acbd7435eadb184e0cedc21df64d3bf30329)
2022-06-07 06:43:48 -07:00
9828013233 [mps] Do not use malloc/free in Indexing.mm (#78409)
Allocating just two int64 values on the heap is especially wasteful (and they are leaked if the function returns early)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78409
Approved by: https://github.com/seemethere, https://github.com/kulinseth

(cherry picked from commit aefb4c9fba0edf5a71a245e4fd8f5ac1d65beeac)
2022-06-07 06:43:41 -07:00
53fc6dc3db MPS: Add adaptive max pool2d op (#78410)
Adaptive max pool 2d forward and backward with test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78410
Approved by: https://github.com/albanD

(cherry picked from commit 2e32d5fcd8de75dc2695d940925e5be181a06b54)
2022-06-07 06:43:33 -07:00
52435c6b1f MPS: add ranked tensors for addcmul ops instead of constants and update version_check (#78354)
This is a reland of https://github.com/pytorch/pytorch/pull/78312 with syntax error and formating fixed in `MPSDevice.mm`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78354
Approved by: https://github.com/kulinseth

(cherry picked from commit 45462baf7e2ef00a9aa912e2a045b20bc3ed80d3)
2022-06-07 06:43:21 -07:00
9a66061326 Fix the MPS Heap volatility (#78230)
Fixes #77829
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78230
Approved by: https://github.com/malfet

(cherry picked from commit c8ab55b2939c4cd5cd8d2e0605fdfd09e8eff294)
2022-06-07 06:42:00 -07:00
eef0ec541e Use random seed in normal_mps_out (#78010)
Fixes #78009.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78010
Approved by: https://github.com/kulinseth

(cherry picked from commit 51c4c79e3d4e600baa6a53a2afcf99ef8db5dbe0)
2022-06-07 06:41:26 -07:00
0ffefea581 Fix typo in testname (#78258)
`test_linear2D_no_bias_backwarwd` -> `test_linear2D_no_bias_backward`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78258
Approved by: https://github.com/kulinseth, https://github.com/janeyx99

(cherry picked from commit 705082656a9dd2c9f243da01f28a195b94b24d66)
2022-06-07 06:41:18 -07:00
7e12cfb29d [MPS] Lazy initialize allocators (#78227)
Do not construct MPS allocators at load time, but rather create them
lazily when needed

This significantly reduces `libtorch.dylib` load time and prevents a weird
flicker during `import torch`, when an Intel MacBook switches from
integrated to discrete graphics

Before the change `python3 -c "import timeit;import importlib;print(timeit.timeit(lambda: importlib.import_module('torch'), number=1))"` takes about 1 sec, after the change it drops down to .6 sec

Minor changes:
 - Deleted unused `__block id<MTLBuffer> buf = nil;` from
   HeapAllocatorImpl
 - Add braces for single line if statements

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78227
Approved by: https://github.com/kulinseth, https://github.com/albanD

(cherry picked from commit 2679aa47897232827771ad7bb18e14bb4be3cae8)
2022-06-07 06:41:09 -07:00
24b9bd4398 [MPS] Add version check (#78192)
Use `instancesRespondToSelector:` to test the presence of
`optimizationLevel` in `MPSGraphCompilationDescriptor`, which according
to
https://developer.apple.com/documentation/metalperformanceshadersgraph/mpsgraphcompilationdescriptor/3922624-optimizationlevel
is only available on 12.3 or newer

This works around a limitations of `@available(macOS 12.3, *)` macro in
shared libraries dynamically loaded by apps targeting older runtime.
And deployment target for macos Python conda binaries is 10.14:
```
% otool -l `which python3`
...
Load command 9
      cmd LC_BUILD_VERSION
  cmdsize 32
 platform 1
    minos 10.14
      sdk 10.14
...
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78192
Approved by: https://github.com/atalman, https://github.com/seemethere

(cherry picked from commit b7bb34d7625d95e5088638721dcc07c2bc5e2ade)
2022-06-07 06:41:02 -07:00
5342e76039 Convert MPS Tensor data using MPSGraph API (#78092)
Fixes #78091
If you are already working on this, simply disregard this or take what may be helpful. This is my attempt at MPS-native Tensor datatype conversion. It works for everything tested ~~but is currently only implemented for MPS-to-MPS copy, not MPS-to-X or X-to-MPS, but the same approach could easily be used~~.

Before:
```python
In [5]: pt.full((40,), -10.3, device="mps")
Out[5]:
tensor([-10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000,
        -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000,
        -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000,
        -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000,
        -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000,
        -10.3000, -10.3000, -10.3000, -10.3000, -10.3000], device='mps:0')

In [6]: pt.full((40,), -10.3, device="mps").int()
Out[6]:
tensor([-1054552883, -1054552883, -1054552883, -1054552883, -1054552883,
        -1054552883, -1054552883, -1054552883, -1054552883, -1054552883,
        -1054552883, -1054552883, -1054552883, -1054552883, -1054552883,
        -1054552883, -1054552883, -1054552883, -1054552883, -1054552883,
        -1054552883, -1054552883, -1054552883, -1054552883, -1054552883,
        -1054552883, -1054552883, -1054552883, -1054552883, -1054552883,
        -1054552883, -1054552883, -1054552883, -1054552883, -1054552883,
        -1054552883, -1054552883, -1054552883, -1054552883, -1054552883],
       device='mps:0', dtype=torch.int32)

In [7]: pt.full((40,), -10.3, device="mps").int().float()
Out[7]:
tensor([-10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000,
        -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000,
        -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000,
        -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000,
        -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000,
        -10.3000, -10.3000, -10.3000, -10.3000, -10.3000], device='mps:0')

In [8]: pt.full((40,), -10.3, device="mps").int().float().bool()
Out[8]:
tensor([ True, False, False,  True,  True, False, False,  True,  True, False,
        False,  True,  True, False, False,  True,  True, False, False,  True,
         True, False, False,  True,  True, False, False,  True,  True, False,
        False,  True,  True, False, False,  True,  True, False, False,  True],
       device='mps:0')
```

After:
```python
In [3]: pt.full((40,), -10.3, device="mps")
Out[3]:
tensor([-10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000,
        -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000,
        -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000,
        -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000,
        -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000, -10.3000,
        -10.3000, -10.3000, -10.3000, -10.3000, -10.3000], device='mps:0')

In [4]: pt.full((40,), -10.3, device="mps").int()
Out[4]:
tensor([-10, -10, -10, -10, -10, -10, -10, -10, -10, -10, -10, -10, -10, -10,
        -10, -10, -10, -10, -10, -10, -10, -10, -10, -10, -10, -10, -10, -10,
        -10, -10, -10, -10, -10, -10, -10, -10, -10, -10, -10, -10],
       device='mps:0', dtype=torch.int32)

In [5]: pt.full((40,), -10.3, device="mps").int().float()
Out[5]:
tensor([-10., -10., -10., -10., -10., -10., -10., -10., -10., -10., -10., -10.,
        -10., -10., -10., -10., -10., -10., -10., -10., -10., -10., -10., -10.,
        -10., -10., -10., -10., -10., -10., -10., -10., -10., -10., -10., -10.,
        -10., -10., -10., -10.], device='mps:0')

In [6]: pt.full((40,), -10.3, device="mps").int().float().bool()
Out[6]:
tensor([True, True, True, True, True, True, True, True, True, True, True, True,
        True, True, True, True, True, True, True, True, True, True, True, True,
        True, True, True, True, True, True, True, True, True, True, True, True,
        True, True, True, True], device='mps:0')
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78092
Approved by: https://github.com/kulinseth, https://github.com/malfet

(cherry picked from commit a52bfe2c5d8588b8f9e83e0beecdd18a1d672d0e)
2022-06-07 06:40:54 -07:00
08d70ab718 [MPS] Fix torch.mps.is_available() (#78121)
By introducing `at::mps::is_available()` and changing `torch._C._is_mps_available` from a property to a memoizable callable

Also, if `_mtl_device` is released in the MPSDevice destructor, shouldn't it be retained in the constructor?

Looks like the GitHub Actions Mac runner does not have any Metal devices available, according to https://github.com/malfet/deleteme/runs/6560871657?check_suite_focus=true#step:3:15

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78121
Approved by: https://github.com/albanD

(cherry picked from commit 6244daa6a9a27463f63235d88b9f728c91243a08)
2022-06-07 06:40:23 -07:00
207bde1ee8 Add ignore for -Wunsupported-availability-guard
This failed internal builds so just upstreaming the internal fix

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77995

Approved by: https://github.com/bigfootjon, https://github.com/malfet

(cherry picked from commit a9a99a901e953cf35edd010e17ee0ed4d2f347af)
2022-06-07 06:40:18 -07:00
51428a8f43 Fix a few issues on assert/double error/legacy constructor (#77966)
Fixes https://github.com/pytorch/pytorch/issues/77960, https://github.com/pytorch/pytorch/issues/77957, https://github.com/pytorch/pytorch/issues/77781
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77966
Approved by: https://github.com/soulitzer, https://github.com/kulinseth

(cherry picked from commit 04ac80c73a9f525322a8b622659a27ad065698ea)
2022-06-07 06:38:32 -07:00
c40f18454d [DataLoader] Apply sharding settings in dist when num_workers is 0 (#78967)
ghstack-source-id: 9c53e8c9adb3ac7c80ebf22a476385509b252511
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78950

Co-authored-by: Vitaly Fedyunin <vitaly.fedyunin@gmail.com>
2022-06-07 09:15:35 -04:00
8a5156a050 [DataPipe] Adding functional API for FileLister (#78419) (#78948)
Fixes #78263

Follow-up from pytorch/data#387. This adds a functional API `list_files()` to `FileListerDataPipe`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78419
Approved by: https://github.com/NivekT, https://github.com/ejguan

Co-authored-by: Robert Xiu <xiurobert@gmail.com>
2022-06-07 09:13:09 -04:00
04d75d2008 Make ShufflerDataPipe deterministic for persistent DL and distributed DL (#78765) (#78927)
Fixes https://github.com/pytorch/data/issues/426

This PR introduces two main changes:
- It ensures the `ShufflerDataPipe` would share the same seed across distributed processes.
- Users can reset `shuffle` for persistent workers per epoch.

Detail:
- `shared_seed` is shared across distributed and worker processes. It will seed a `shared_rng` to provide seeds to each `ShufflerDataPipe` in the pipeline
- `worker_loop` now accepts a new argument of `shared_seed` to accept this shared seed.
- The `shared_seed` is attached to `_ResumeIteration` for resetting seed per epoch for `persistent worker`
- I chose not to touch `base_seed`, simply to avoid BC issues

I used this [script](https://gist.github.com/ejguan/d88f75fa822cb696ab1bc5bc25844f47) to test the result with `world_size=4`. Please check the result in: https://gist.github.com/ejguan/6ee2d2de12ca57f9eb4b97ef5a0e300b

You can see there isn't any duplicated/missing element for each epoch. And, with the same seed, the order of data remains the same across epochs.
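The shared-seed scheme can be sketched in pure Python (a hypothetical model of the idea, not the actual ShufflerDataPipe code):

```python
import random


def shuffled_order(data, shared_seed, epoch):
    # Each rank seeds its shuffler from the same (shared_seed, epoch) pair,
    # so the permutation agrees across all distributed processes while
    # still changing from one epoch to the next.
    rng = random.Random(f"{shared_seed}-{epoch}")
    out = list(data)
    rng.shuffle(out)
    return out


data = list(range(8))
rank0 = shuffled_order(data, shared_seed=123, epoch=0)
rank1 = shuffled_order(data, shared_seed=123, epoch=0)
# rank0 == rank1: identical order on every rank, no duplicated/missing elements
```

Because every process derives its shuffle from the same shared value, sharding the shuffled sequence afterwards partitions the dataset cleanly.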
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78765
Approved by: https://github.com/VitalyFedyunin
2022-06-07 09:03:26 -04:00
2652da29ab Avoid CPU Sync in SyncBatchNorm When Capturing CUDA Graphs (#78810)
We recently updated `SyncBatchNorm` to support empty input batches.
The new code removes stats from ranks with empty inputs. However,
this change breaks CUDA graph capture as it forces CPU sync. This
commit uses `is_current_stream_capturing()` to guard the new code
path, and only runs the new code when not capturing CUDA graphs. To
support empty inputs with CUDA graph capturing, we might need to
update CUDA kernels for `batch_norm_backward_elemt` and
`batch_norm_gather_stats_with_counts`. See #78656.
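A schematic of that guard, with a plain boolean standing in for `is_current_stream_capturing()` — illustrative only, since the real code operates on CUDA tensors:

```python
def filter_counts(counts, is_capturing):
    # The data-dependent filtering below reads the values, which for GPU
    # tensors forces a device->host sync -- illegal during graph capture.
    if is_capturing:
        # Capture in progress: keep all ranks' stats and skip the sync.
        return counts
    # Eager path: drop stats from ranks that saw an empty input batch.
    return [c for c in counts if c > 0]

assert filter_counts([4, 0, 3], is_capturing=False) == [4, 3]
assert filter_counts([4, 0, 3], is_capturing=True) == [4, 0, 3]
```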

Fixes #78549

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78666

Approved by: https://github.com/albanD
2022-06-06 09:39:03 -04:00
aa8911885b [chalf] warn once on creating a chalf tensor (#78245) (#78710)
`chalf` is experimental as the op coverage is low.

The following script raises 6 warnings if `set_warn_always(True)` is set, and only 1 warning otherwise.
```python
import torch
torch.set_warn_always(True)
device='cpu'
t = torch.randn(3, dtype=torch.chalf, device=device)
y = torch.rand(3, dtype=torch.chalf, device=device)
# Allocates new tensor for result
t + y

device='cuda'
t = torch.randn(3, dtype=torch.chalf, device=device)
y = torch.rand(3, dtype=torch.chalf, device=device)

# Allocates new tensor for result
t + y

```
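The once-versus-always behavior parallels Python's own warning filters; a stdlib-only sketch of the same semantics (not the torch warning machinery itself):

```python
import warnings

def count_warnings(warn_always, n=6):
    # Emit the same warning n times and count how many get through,
    # mimicking set_warn_always(True) vs. the default warn-once behavior.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always" if warn_always else "once")
        for _ in range(n):
            warnings.warn("chalf is experimental", UserWarning)
    return len(caught)

always_count = count_warnings(warn_always=True)   # 6 warnings
once_count = count_warnings(warn_always=False)    # 1 warning
```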
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78245
Approved by: https://github.com/anjali411
2022-06-03 10:51:18 -04:00
528710ec89 [DataLoader] DataLoader now automatically apply sharding to DataPipes (#78762)
ghstack-source-id: ac918b064cd09cd68a04c28238481c76b46b4010
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78631

Co-authored-by: Vitaly Fedyunin <vitaly.fedyunin@gmail.com>
2022-06-03 10:21:55 -04:00
de53f70e1d [GHA] attempt to re-enable mac test workflows (#78000) (#78749)
Our mac tests have not been running since #77645 because of
<img width="1386" alt="image" src="https://user-images.githubusercontent.com/31798555/169602783-988a265a-ce4a-41a7-8f13-3eb4615b0d6f.png">

https://github.com/pytorch/pytorch/actions/runs/2345334995
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78000
Approved by: https://github.com/malfet

Co-authored-by: Jane Xu <janeyx@fb.com>
2022-06-02 15:25:22 -04:00
39ebb3e06e fix set item to scalar tensor missing gradient info (#78746)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78246

Approved by: https://github.com/ngimel
2022-06-02 15:04:36 -04:00
fd3cc823ce [DataPipe] Lazily generate exception message for performance (#78726)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78673

Approved by: https://github.com/ejguan
2022-06-02 14:26:38 -04:00
5bb7c617f6 [docs][nn] conv: complex support note (#78351) (#78709)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78351
Approved by: https://github.com/anjali411, https://github.com/jbschlosser
2022-06-02 14:23:29 -04:00
8a627381c9 [DataLoader] Minor documentation improvement (#78548)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78404

Approved by: https://github.com/ejguan
2022-06-01 14:05:54 -04:00
f56e16a70f [ONNX] Fix typo when comparing DeviceObjType (#78085) (#78370)
#77423 Introduced a typo in

1db9be70a7/torch/onnx/symbolic_opset9.py (L5012-L5017)

where the string `DeviceObjType` was replaced with `_C.DeviceObjType`. This PR reverts the changes to the strings.

**Tested:**

With torchvision,

```
pytest test/test_onnx.py::TestONNXExporter::test_mask_rcnn
pytest -n auto test/test_onnx.py::TestONNXExporter
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78085
Approved by: https://github.com/datumbox, https://github.com/BowenBao, https://github.com/ezyang

Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
2022-05-26 13:05:36 -07:00
c93a7f8bea Update PyTorch/XLA git clone branch name for 1.12 (#78315) 2022-05-25 16:06:39 -07:00
919b53c5e7 [Profiler] Fix segfault in AppendOnlyList (#78084) 2022-05-24 17:21:45 -04:00
2ad18abc49 [MPS] Initialize MPSDevice::_mtl_device property to nil (#78136) (#78204)
This prevents `import torch` from accidentally crashing on machines with no Metal devices

Should prevent crashes reported in https://github.com/pytorch/pytorch/pull/77662#issuecomment-1134637986 and https://github.com/pytorch/functorch/runs/6560056366?check_suite_focus=true

Backtrace to the crash:
```
(lldb) bt
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff7202be57 libobjc.A.dylib`objc_msgSend + 23
    frame #1: 0x000000010fd9f524 libtorch_cpu.dylib`at::mps::HeapAllocator::MPSHeapAllocatorImpl::MPSHeapAllocatorImpl() + 436
    frame #2: 0x000000010fda011d libtorch_cpu.dylib`_GLOBAL__sub_I_MPSAllocator.mm + 125
    frame #3: 0x000000010ada81e3 dyld`ImageLoaderMachO::doModInitFunctions(ImageLoader::LinkContext const&) + 535
    frame #4: 0x000000010ada85ee dyld`ImageLoaderMachO::doInitialization(ImageLoader::LinkContext const&) + 40(lldb) up
frame #1: 0x000000010fd9f524 libtorch_cpu.dylib`at::mps::HeapAllocator::MPSHeapAllocatorImpl::MPSHeapAllocatorImpl() + 436
libtorch_cpu.dylib`at::mps::HeapAllocator::MPSHeapAllocatorImpl::MPSHeapAllocatorImpl:
->  0x10fd9f524 <+436>: movq   %rax, 0x1b0(%rbx)
    0x10fd9f52b <+443>: movw   $0x0, 0x1b8(%rbx)
    0x10fd9f534 <+452>: addq   $0x8, %rsp
    0x10fd9f538 <+456>: popq   %rbx
(lldb) disassemble
 ...
    0x10fd9f514 <+420>: movq   0xf19ad15(%rip), %rsi     ; "maxBufferLength"
    0x10fd9f51b <+427>: movq   %r14, %rdi
    0x10fd9f51e <+430>: callq  *0xeaa326c(%rip)          ; (void *)0x00007fff7202be40: objc_msgSend
```

which corresponds to `[m_device maxBufferLength]` call, where `m_device` is not initialized in
2ae3c59e4b/aten/src/ATen/mps/MPSAllocator.h (L171)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78136
Approved by: https://github.com/seemethere

Co-authored-by: Nikita Shulga <nshulga@fb.com>
2022-05-24 17:03:38 -04:00
9596b999f8 Fix unit tests (#78056) 2022-05-24 17:00:57 -04:00
baabb4cb96 MPS: Add back the memory leak fixes. (#77964) (#78198)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77964
Approved by: https://github.com/albanD

Co-authored-by: Kulin Seth <kulinseth@gmail.com>
2022-05-24 16:36:16 -04:00
906a6e1df9 Fixing release rc build names (#78174) 2022-05-24 15:04:36 -04:00
974f7f8080 [1.12] Remove torch.vmap (#78021) 2022-05-23 10:30:23 -07:00
8abf37d74e ci: Pin builder to release/1.12 (#77986)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2022-05-20 14:00:44 -04:00
8ff2bc0c01 Release 1.12 Install torch from test channel, Pin builder and xla repo (#77983) 2022-05-20 10:51:22 -07:00
a119b7f6d4 retry - enable NVFuser by default
Enable NVFuser in OSS.

Retry of #77213, because it was breaking torchvision tests.

Fix in #77471 has been verified by jjsjann123

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77579

Approved by: https://github.com/eellison, https://github.com/malfet, https://github.com/atalman, https://github.com/seemethere
2022-05-20 10:31:49 -07:00
4522 changed files with 222921 additions and 466401 deletions

.bazelrc

@ -13,103 +13,15 @@ build:no-tty --curses no
build:no-tty --progress_report_interval 10
build:no-tty --show_progress_rate_limit 10
# Build with GPU support by default.
build --define=cuda=true
# rules_cuda configuration
build --@rules_cuda//cuda:enable_cuda
build --@rules_cuda//cuda:cuda_targets=sm_52
build --@rules_cuda//cuda:compiler=nvcc
build --repo_env=CUDA_PATH=/usr/local/cuda
# Configuration to build without GPU support
build:cpu-only --define=cuda=false
# Configuration to build with GPU support
build:gpu --define=cuda=true
# define a separate build folder for faster switching between configs
build:cpu-only --platform_suffix=-cpu-only
build:gpu --platform_suffix=-gpu
# See the note on the config-less build for details about why we are
# doing this. We must also do it for the "-cpu-only" platform suffix.
build --copt=-isystem --copt=bazel-out/k8-fastbuild-cpu-only/bin
# doing this. We must also do it for the "-gpu" platform suffix.
build --copt=-isystem --copt=bazel-out/k8-fastbuild-gpu/bin
# rules_cuda configuration
build:cpu-only --@rules_cuda//cuda:enable_cuda=False
# Definition of --config=shell
# interactive shell immediately before execution
build:shell --run_under="//tools/bazel_tools:shellwrap"
# Disable all warnings for external repositories. We don't care about
# their warnings.
build --per_file_copt=^external/@-w
# Set additional warnings to error level.
#
# Implementation notes:
# * we use file extensions to determine if we are using the C++
# compiler or the cuda compiler
# * we use ^// at the start of the regex to only permit matching
# PyTorch files. This excludes external repos.
#
# Note that because this is logically a command-line flag, it is
# considered the word on what warnings are enabled. This has the
# unfortunate consequence of preventing us from disabling an error at
# the target level because those flags will come before these flags in
# the action invocation. Instead we provide per-file exceptions after
# this.
#
# On the bright side, this means we don't have to more broadly apply
# the exceptions to an entire target.
#
# Looking for CUDA flags? We have a cu_library macro that we can edit
# directly. Look in //tools/rules:cu.bzl for details. Editing the
# macro over this has the following advantages:
# * making changes does not require discarding the Bazel analysis
# cache
# * it allows for selective overrides on individual targets since the
# macro-level opts will come earlier than target level overrides
build --per_file_copt='^//.*\.(cpp|cc)$'@-Werror=all
# The following warnings come from -Wall. We downgrade them from error
# to warnings here.
#
# sign-compare has a tremendous amount of violations in the
# codebase. It will be a lot of work to fix them, just disable it for
# now.
build --per_file_copt='^//.*\.(cpp|cc)$'@-Wno-sign-compare
# We intentionally use #pragma unroll, which is compiler specific.
build --per_file_copt='^//.*\.(cpp|cc)$'@-Wno-error=unknown-pragmas
build --per_file_copt='^//.*\.(cpp|cc)$'@-Werror=extra
# The following warnings come from -Wextra. We downgrade them from error
# to warnings here.
#
# unused-parameter-compare has a tremendous amount of violations in the
# codebase. It will be a lot of work to fix them, just disable it for
# now.
build --per_file_copt='^//.*\.(cpp|cc)$'@-Wno-unused-parameter
# missing-field-parameters has both a large number of violations in
# the codebase, but it also is used pervasively in the Python C
# API. There are a couple of catches though:
# * we use multiple versions of the Python API and hence have
# potentially multiple different versions of each relevant
# struct. They may have different numbers of fields. It will be
# unwieldy to support multiple versions in the same source file.
# * Python itself for many of these structs recommends only
# initializing a subset of the fields. We should respect the API
# usage conventions of our dependencies.
#
# Hence, we just disable this warning altogether. We may want to clean
# up some of the clear-cut cases that could be risky, but we still
# likely want to have this disabled for the most part.
build --per_file_copt='^//.*\.(cpp|cc)$'@-Wno-missing-field-initializers
build --per_file_copt='//:aten/src/ATen/RegisterCompositeExplicitAutograd\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:aten/src/ATen/RegisterCompositeImplicitAutograd\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:aten/src/ATen/RegisterMkldnnCPU\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:aten/src/ATen/RegisterNestedTensorCPU\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:aten/src/ATen/RegisterQuantizedCPU\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:aten/src/ATen/RegisterSparseCPU\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:aten/src/ATen/RegisterSparseCsrCPU\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:aten/src/ATen/RegisterNestedTensorMeta\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:aten/src/ATen/RegisterSparseMeta\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:aten/src/ATen/RegisterQuantizedMeta\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:aten/src/ATen/RegisterZeroTensor\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:torch/csrc/lazy/generated/RegisterAutogradLazy\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:torch/csrc/lazy/generated/RegisterLazy\.cpp$'@-Wno-error=unused-function
build:gpu --@rules_cuda//cuda:enable_cuda
build:gpu --@rules_cuda//cuda:cuda_targets=sm_52
build:gpu --@rules_cuda//cuda:compiler=nvcc
build:gpu --repo_env=CUDA_PATH=/usr/local/cuda


@ -1,13 +1,8 @@
[pt]
is_oss=1
[buildfile]
name = BUCK.oss
includes = //tools/build_defs/select.bzl
name = BUILD.buck
[repositories]
bazel_skylib = third_party/bazel-skylib/
ovr_config = .
[download]
in_build = true
@ -15,11 +10,6 @@
[cxx]
cxxflags = -std=c++17
should_remap_host_platform = true
cpp = /usr/bin/clang
cc = /usr/bin/clang
cxx = /usr/bin/clang++
cxxpp = /usr/bin/clang++
ld = /usr/bin/clang++
[project]
default_flavors_mode=all


@ -4,7 +4,6 @@ CUDA_VERSIONS = [
"102",
"113",
"116",
"117",
]
ROCM_VERSIONS = [


@ -75,7 +75,6 @@ class ExperimentalFeatureConfigNode(TreeConfigNode):
"vulkan": VulkanConfigNode,
"parallel_tbb": ParallelTBBConfigNode,
"crossref": CrossRefConfigNode,
"dynamo": DynamoConfigNode,
"parallel_native": ParallelNativeConfigNode,
"onnx": ONNXConfigNode,
"libtorch": LibTorchConfigNode,
@ -180,14 +179,6 @@ class CrossRefConfigNode(TreeConfigNode):
return ImportantConfigNode
class DynamoConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["is_dynamo"] = node_name
def child_constructor(self):
return ImportantConfigNode
class ParallelNativeConfigNode(TreeConfigNode):
def modify_label(self, label):
return "PARALLELNATIVE=" + str(label)


@ -240,7 +240,6 @@ def instantiate_configs(only_slow_gradcheck):
is_xla = fc.find_prop("is_xla") or False
is_asan = fc.find_prop("is_asan") or False
is_crossref = fc.find_prop("is_crossref") or False
is_dynamo = fc.find_prop("is_dynamo") or False
is_onnx = fc.find_prop("is_onnx") or False
is_pure_torch = fc.find_prop("is_pure_torch") or False
is_vulkan = fc.find_prop("is_vulkan") or False
@ -287,9 +286,6 @@ def instantiate_configs(only_slow_gradcheck):
if is_crossref:
parms_list_ignored_for_docker_image.append("crossref")
if is_dynamo:
parms_list_ignored_for_docker_image.append("dynamo")
if is_onnx:
parms_list.append("onnx")
python_version = fc.find_prop("pyver")


@ -1,5 +1,4 @@
from cimodel.data.simple.util.versions import MultiPartVersion
from cimodel.data.simple.util.branch_filters import gen_filter_dict_exclude
import cimodel.lib.miniutils as miniutils
XCODE_VERSION = MultiPartVersion([12, 5, 1])
@ -12,7 +11,7 @@ class ArchVariant:
def render(self):
extra_parts = [self.custom_build_name] if len(self.custom_build_name) > 0 else []
return "-".join([self.name] + extra_parts).replace("_", "-")
return "_".join([self.name] + extra_parts)
def get_platform(arch_variant_name):
@ -26,25 +25,30 @@ class IOSJob:
self.is_org_member_context = is_org_member_context
self.extra_props = extra_props
def gen_name_parts(self):
version_parts = self.xcode_version.render_dots_or_parts("-")
build_variant_suffix = self.arch_variant.render()
def gen_name_parts(self, with_version_dots):
version_parts = self.xcode_version.render_dots_or_parts(with_version_dots)
build_variant_suffix = "_".join([self.arch_variant.render(), "build"])
return [
"pytorch",
"ios",
] + version_parts + [
build_variant_suffix,
]
def gen_job_name(self):
return "-".join(self.gen_name_parts())
return "_".join(self.gen_name_parts(False))
def gen_tree(self):
platform_name = get_platform(self.arch_variant.name)
props_dict = {
"name": self.gen_job_name(),
"build_environment": self.gen_job_name(),
"build_environment": "-".join(self.gen_name_parts(True)),
"ios_arch": self.arch_variant.name,
"ios_platform": platform_name,
"name": self.gen_job_name(),
}
if self.is_org_member_context:
@ -53,28 +57,30 @@ class IOSJob:
if self.extra_props:
props_dict.update(self.extra_props)
props_dict["filters"] = gen_filter_dict_exclude()
return [{"pytorch_ios_build": props_dict}]
WORKFLOW_DATA = [
IOSJob(XCODE_VERSION, ArchVariant("x86_64"), is_org_member_context=False, extra_props={
"lite_interpreter": miniutils.quote(str(int(True)))}),
# IOSJob(XCODE_VERSION, ArchVariant("arm64"), extra_props={
# "lite_interpreter": miniutils.quote(str(int(True)))}),
# IOSJob(XCODE_VERSION, ArchVariant("arm64", "metal"), extra_props={
# "use_metal": miniutils.quote(str(int(True))),
# "lite_interpreter": miniutils.quote(str(int(True)))}),
# IOSJob(XCODE_VERSION, ArchVariant("arm64", "custom-ops"), extra_props={
# "op_list": "mobilenetv2.yaml",
# "lite_interpreter": miniutils.quote(str(int(True)))}),
IOSJob(XCODE_VERSION, ArchVariant("x86_64", "full_jit"), is_org_member_context=False, extra_props={
"lite_interpreter": miniutils.quote(str(int(False)))}),
IOSJob(XCODE_VERSION, ArchVariant("arm64"), extra_props={
"lite_interpreter": miniutils.quote(str(int(True)))}),
IOSJob(XCODE_VERSION, ArchVariant("arm64", "metal"), extra_props={
"use_metal": miniutils.quote(str(int(True))),
"lite_interpreter": miniutils.quote(str(int(True)))}),
IOSJob(XCODE_VERSION, ArchVariant("arm64", "full_jit"), extra_props={
"lite_interpreter": miniutils.quote(str(int(False)))}),
IOSJob(XCODE_VERSION, ArchVariant("arm64", "custom"), extra_props={
"op_list": "mobilenetv2.yaml",
"lite_interpreter": miniutils.quote(str(int(True)))}),
IOSJob(XCODE_VERSION, ArchVariant("x86_64", "coreml"), is_org_member_context=False, extra_props={
"use_coreml": miniutils.quote(str(int(True))),
"lite_interpreter": miniutils.quote(str(int(True)))}),
# IOSJob(XCODE_VERSION, ArchVariant("arm64", "coreml"), extra_props={
# "use_coreml": miniutils.quote(str(int(True))),
# "lite_interpreter": miniutils.quote(str(int(True)))}),
IOSJob(XCODE_VERSION, ArchVariant("arm64", "coreml"), extra_props={
"use_coreml": miniutils.quote(str(int(True))),
"lite_interpreter": miniutils.quote(str(int(True)))}),
]


@ -1,8 +1,3 @@
from collections import OrderedDict
from cimodel.lib.miniutils import quote
from cimodel.data.simple.util.branch_filters import gen_filter_dict_exclude
class MacOsJob:
def __init__(self, os_version, is_build=False, is_test=False, extra_props=tuple()):
# extra_props is tuple type, because mutable data structures for argument defaults
@ -16,14 +11,10 @@ class MacOsJob:
non_phase_parts = ["pytorch", "macos", self.os_version, "py3"]
extra_name_list = [name for name, exist in self.extra_props.items() if exist]
full_job_name_list = (
non_phase_parts
+ extra_name_list
+ [
"build" if self.is_build else None,
"test" if self.is_test else None,
]
)
full_job_name_list = non_phase_parts + extra_name_list + [
'build' if self.is_build else None,
'test' if self.is_test else None,
]
full_job_name = "_".join(list(filter(None, full_job_name_list)))
@ -50,99 +41,12 @@ WORKFLOW_DATA = [
"10_13",
is_build=True,
is_test=True,
extra_props=tuple({"lite_interpreter": True}.items()),
),
extra_props=tuple({
"lite_interpreter": True
}.items()),
)
]
def get_new_workflow_jobs():
return [
OrderedDict(
{
"mac_build": OrderedDict(
{
"name": "macos-12-py3-x86-64-build",
"build-environment": "macos-12-py3-x86-64",
"xcode-version": quote("13.3.1"),
"filters": gen_filter_dict_exclude()
}
)
}
),
OrderedDict(
{
"mac_test": OrderedDict(
{
"name": "macos-12-py3-x86-64-test-1-2-default",
"build-environment": "macos-12-py3-x86-64",
"xcode-version": quote("13.3.1"),
"shard-number": quote("1"),
"num-test-shards": quote("2"),
"requires": ["macos-12-py3-x86-64-build"],
"filters": gen_filter_dict_exclude()
}
)
}
),
OrderedDict(
{
"mac_test": OrderedDict(
{
"name": "macos-12-py3-x86-64-test-2-2-default",
"build-environment": "macos-12-py3-x86-64",
"xcode-version": quote("13.3.1"),
"shard-number": quote("2"),
"num-test-shards": quote("2"),
"requires": ["macos-12-py3-x86-64-build"],
"filters": gen_filter_dict_exclude()
}
)
}
),
OrderedDict(
{
"mac_test": OrderedDict(
{
"name": "macos-12-py3-x86-64-test-1-1-functorch",
"build-environment": "macos-12-py3-x86-64",
"xcode-version": quote("13.3.1"),
"shard-number": quote("1"),
"num-test-shards": quote("1"),
"test-config": "functorch",
"requires": ["macos-12-py3-x86-64-build"],
"filters": gen_filter_dict_exclude()
}
)
}
),
OrderedDict(
{
"mac_build": OrderedDict(
{
"name": "macos-12-py3-x86-64-lite-interpreter-build-test",
"build-environment": "macos-12-py3-lite-interpreter-x86-64",
"xcode-version": quote("13.3.1"),
"build-generates-artifacts": "false",
"filters": gen_filter_dict_exclude()
}
)
}
),
OrderedDict(
{
"mac_build": OrderedDict(
{
"name": "macos-12-py3-arm64-build",
"build-environment": "macos-12-py3-arm64",
"xcode-version": quote("13.3.1"),
"python-version": quote("3.9.12"),
"filters": gen_filter_dict_exclude()
}
)
}
),
]
def get_workflow_jobs():
return [item.gen_tree() for item in WORKFLOW_DATA]


@ -15,7 +15,7 @@ class IOSNightlyJob:
def get_phase_name(self):
return "upload" if self.is_upload else "build"
def get_common_name_pieces(self, sep):
def get_common_name_pieces(self, with_version_dots):
extra_name_suffix = [self.get_phase_name()] if self.is_upload else []
@ -24,7 +24,7 @@ class IOSNightlyJob:
common_name_pieces = [
"ios",
] + extra_name + [
] + ios_definitions.XCODE_VERSION.render_dots_or_parts(sep) + [
] + ios_definitions.XCODE_VERSION.render_dots_or_parts(with_version_dots) + [
"nightly",
self.variant,
"build",
@ -33,14 +33,14 @@ class IOSNightlyJob:
return common_name_pieces
def gen_job_name(self):
return "_".join(["pytorch"] + self.get_common_name_pieces(None))
return "_".join(["pytorch"] + self.get_common_name_pieces(False))
def gen_tree(self):
build_configs = BUILD_CONFIGS_FULL_JIT if self.is_full_jit else BUILD_CONFIGS
extra_requires = [x.gen_job_name() for x in build_configs] if self.is_upload else []
props_dict = {
"build_environment": "-".join(["libtorch"] + self.get_common_name_pieces(".")),
"build_environment": "-".join(["libtorch"] + self.get_common_name_pieces(True)),
"requires": extra_requires,
"context": "org-member",
"filters": {"branches": {"only": "nightly"}},


@ -1,22 +0,0 @@
from typing import OrderedDict
from cimodel.data.simple.util.branch_filters import gen_filter_dict_exclude
def get_workflow_job():
return [
OrderedDict(
{
"upload_test_stats": OrderedDict(
{
"name": "upload test status",
"requires": [
"macos-12-py3-x86-64-test-1-2-default",
"macos-12-py3-x86-64-test-2-2-default",
"macos-12-py3-x86-64-test-1-1-functorch",
],
"filters": gen_filter_dict_exclude()
}
)
}
),
]


@ -12,9 +12,6 @@ PR_BRANCH_LIST = [
RC_PATTERN = r"/v[0-9]+(\.[0-9]+)*-rc[0-9]+/"
MAC_IOS_EXCLUSION_LIST = ["nightly", "postnightly"]
def gen_filter_dict(
branches_list=NON_PR_BRANCH_LIST,
tags_list=None
@ -29,11 +26,3 @@ def gen_filter_dict(
if tags_list is not None:
filter_dict["tags"] = {"only": tags_list}
return filter_dict
def gen_filter_dict_exclude(branches_list=MAC_IOS_EXCLUSION_LIST):
return {
"branches": {
"ignore": branches_list,
},
}


@ -1,6 +1,3 @@
from typing import Optional
class MultiPartVersion:
def __init__(self, parts, prefix=""):
self.parts = parts
@ -16,11 +13,14 @@ class MultiPartVersion:
else:
return [self.prefix]
def render_dots_or_parts(self, sep: Optional[str] = None):
if sep is None:
return self.prefixed_parts()
def render_dots(self):
return ".".join(self.prefixed_parts())
def render_dots_or_parts(self, with_dots):
if with_dots:
return [self.render_dots()]
else:
return [sep.join(self.prefixed_parts())]
return self.prefixed_parts()
class CudaVersion(MultiPartVersion):

.circleci/config.yml

@ -174,6 +174,46 @@ commands:
echo "This is not a pull request, skipping..."
fi
upload_binary_size_for_android_build:
description: "Upload binary size data for Android build"
parameters:
build_type:
type: string
default: ""
artifacts:
type: string
default: ""
steps:
- run:
name: "Binary Size - Install Dependencies"
no_output_timeout: "5m"
command: |
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
retry pip3 install requests
- run:
name: "Binary Size - Untar Artifacts"
no_output_timeout: "5m"
command: |
# The artifact file is created inside the docker container and contains the result binaries.
# Unpack it into the project folder; the subsequent script will scan the project folder
# to locate the result binaries and report their sizes.
# If the artifact file is not provided, it is assumed that the project folder was mounted into
# the docker container during the build and already contains the result binaries, so this step can be skipped.
export ARTIFACTS="<< parameters.artifacts >>"
if [ -n "${ARTIFACTS}" ]; then
tar xf "${ARTIFACTS}" -C ~/project
fi
- run:
name: "Binary Size - Upload << parameters.build_type >>"
no_output_timeout: "5m"
command: |
cd ~/project
export ANDROID_BUILD_TYPE="<< parameters.build_type >>"
export COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
python3 -m tools.stats.upload_binary_size_to_scuba android
##############################################################################
# Binary build (nightly builds) defaults
# The binary builds use the docker executor b/c at time of writing the machine
@ -401,6 +441,245 @@ binary_windows_params: &binary_windows_params
# Job specs
##############################################################################
jobs:
binary_linux_build:
<<: *binary_linux_build_params
steps:
- checkout
- calculate_docker_image_tag
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Build
no_output_timeout: "1h"
command: |
source "/pytorch/.circleci/scripts/binary_linux_build.sh"
# Preserve build log
if [ -f /pytorch/build/.ninja_log ]; then
cp /pytorch/build/.ninja_log /final_pkgs
fi
- run:
name: Output binary sizes
no_output_timeout: "1m"
command: |
ls -lah /final_pkgs
- run:
name: upload build & binary data
no_output_timeout: "5m"
command: |
source /env
cd /pytorch && export COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
python3 -mpip install requests && \
SCRIBE_GRAPHQL_ACCESS_TOKEN=${SCRIBE_GRAPHQL_ACCESS_TOKEN} \
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- persist_to_workspace:
root: /
paths: final_pkgs
- store_artifacts:
path: /final_pkgs
# This should really just be another step of the binary_linux_build job above.
# This isn't possible right now b/c the build job uses the docker executor
# (otherwise they'd be really really slow) but this one uses the machine
# executor (b/c we have to run the docker with --runtime=nvidia and we can't do
# that on the docker executor)
binary_linux_test:
<<: *binary_linux_test_upload_params
machine:
image: ubuntu-2004:202104-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- checkout
- attach_workspace:
at: /home/circleci/project
- setup_linux_system_environment
- setup_ci_environment
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Prepare test code
no_output_timeout: "1h"
command: .circleci/scripts/binary_linux_test.sh
- run:
<<: *binary_run_in_docker
binary_upload:
parameters:
package_type:
type: string
description: "What type of package we are uploading (eg. wheel, libtorch, conda)"
default: "wheel"
upload_subfolder:
type: string
description: "What subfolder to put our package into (eg. cpu, cudaX.Y, etc.)"
default: "cpu"
docker:
- image: continuumio/miniconda3
environment:
- DRY_RUN: disabled
- PACKAGE_TYPE: "<< parameters.package_type >>"
- UPLOAD_SUBFOLDER: "<< parameters.upload_subfolder >>"
steps:
- attach_workspace:
at: /tmp/workspace
- checkout
- designate_upload_channel
- run:
name: Install dependencies
no_output_timeout: "1h"
command: |
conda install -yq anaconda-client
pip install -q awscli
- run:
name: Do upload
no_output_timeout: "1h"
command: |
AWS_ACCESS_KEY_ID="${PYTORCH_BINARY_AWS_ACCESS_KEY_ID}" \
AWS_SECRET_ACCESS_KEY="${PYTORCH_BINARY_AWS_SECRET_ACCESS_KEY}" \
ANACONDA_API_TOKEN="${CONDA_PYTORCHBOT_TOKEN}" \
.circleci/scripts/binary_upload.sh
# Nightly build smoke tests defaults
# These are the second-round smoke tests. These make sure that the binaries are
# correct from a user perspective, testing that they exist in the cloud and
# are runnable. Note that the pytorch repo is never cloned into these jobs
##############################################################################
smoke_linux_test:
<<: *binary_linux_test_upload_params
machine:
image: ubuntu-2004:202104-01
steps:
- checkout
- calculate_docker_image_tag
- setup_linux_system_environment
- setup_ci_environment
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Test
no_output_timeout: "1h"
command: |
set -ex
cat >/home/circleci/project/ci_test_script.sh \<<EOL
# The following code will be executed inside Docker container
set -eux -o pipefail
/builder/smoke_test.sh
# The above code will be executed inside Docker container
EOL
- run:
<<: *binary_run_in_docker
smoke_mac_test:
<<: *binary_linux_test_upload_params
macos:
xcode: "12.0"
steps:
- checkout
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- brew_update
- run:
<<: *binary_install_miniconda
- run:
name: Build
no_output_timeout: "1h"
command: |
set -ex
source "/Users/distiller/project/env"
export "PATH=$workdir/miniconda/bin:$PATH"
# TODO unbuffer and ts this, but it breaks cause miniconda overwrites
# tclsh. But unbuffer and ts aren't that important so they're just
# disabled for now
./builder/smoke_test.sh
binary_mac_build:
<<: *binary_mac_params
macos:
xcode: "12.0"
resource_class: "large"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- checkout
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- brew_update
- run:
<<: *binary_install_miniconda
- run:
name: Build
no_output_timeout: "90m"
command: |
# Do not set -u here; there is some problem with CircleCI
# variable expansion with PROMPT_COMMAND
set -ex -o pipefail
script="/Users/distiller/project/pytorch/.circleci/scripts/binary_macos_build.sh"
cat "$script"
source "$script"
- run:
name: Test
no_output_timeout: "1h"
command: |
# Do not set -u here; there is some problem with CircleCI
# variable expansion with PROMPT_COMMAND
set -ex -o pipefail
script="/Users/distiller/project/pytorch/.circleci/scripts/binary_macos_test.sh"
cat "$script"
source "$script"
- persist_to_workspace:
root: /Users/distiller/project
paths: final_pkgs
- store_artifacts:
path: /Users/distiller/project/final_pkgs
binary_macos_arm64_build:
<<: *binary_mac_params
macos:
xcode: "12.3.0"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- checkout
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- brew_update
- run:
<<: *binary_install_miniconda
- run:
name: Build
no_output_timeout: "90m"
command: |
# Do not set -u here; there is some problem with CircleCI
# variable expansion with PROMPT_COMMAND
set -ex -o pipefail
export CROSS_COMPILE_ARM64=1
script="/Users/distiller/project/pytorch/.circleci/scripts/binary_macos_build.sh"
cat "$script"
source "$script"
- persist_to_workspace:
root: /Users/distiller/project
paths: final_pkgs
- store_artifacts:
path: /Users/distiller/project/final_pkgs
binary_ios_build:
<<: *pytorch_ios_params
macos:
@ -445,6 +724,90 @@ jobs:
cat "$script"
source "$script"
binary_windows_build:
<<: *binary_windows_params
parameters:
build_environment:
type: string
default: ""
executor:
type: string
default: "windows-xlarge-cpu-with-nvidia-cuda"
executor: <<parameters.executor>>
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- checkout
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Build
no_output_timeout: "1h"
command: |
set -eux -o pipefail
script="/c/w/p/.circleci/scripts/binary_windows_build.sh"
cat "$script"
source "$script"
- persist_to_workspace:
root: "C:/w"
paths: final_pkgs
- store_artifacts:
path: C:/w/final_pkgs
binary_windows_test:
<<: *binary_windows_params
parameters:
build_environment:
type: string
default: ""
executor:
type: string
default: "windows-medium-cpu-with-nvidia-cuda"
executor: <<parameters.executor>>
steps:
- checkout
- attach_workspace:
at: c:/users/circleci/project
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Test
no_output_timeout: "1h"
command: |
set -eux -o pipefail
script="/c/w/p/.circleci/scripts/binary_windows_test.sh"
cat "$script"
source "$script"
smoke_windows_test:
<<: *binary_windows_params
parameters:
build_environment:
type: string
default: ""
executor:
type: string
default: "windows-medium-cpu-with-nvidia-cuda"
executor: <<parameters.executor>>
steps:
- checkout
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Test
no_output_timeout: "1h"
command: |
set -eux -o pipefail
export TEST_NIGHTLY_PACKAGE=1
script="/c/w/p/.circleci/scripts/binary_windows_test.sh"
cat "$script"
source "$script"
anaconda_prune:
parameters:
packages:
@@ -499,6 +862,95 @@ jobs:
pushd /tmp/workspace
git push -u origin "<< parameters.branch >>"
pytorch_python_doc_build:
environment:
BUILD_ENVIRONMENT: pytorch-python-doc-push
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc5.4"
resource_class: large
machine:
image: ubuntu-2004:202104-01
steps:
- checkout
- calculate_docker_image_tag
- setup_linux_system_environment
- setup_ci_environment
- run:
name: Doc Build and Push
no_output_timeout: "1h"
command: |
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
# turn v1.12.0rc3 into 1.12
tag=$(echo $CIRCLE_TAG | sed -e 's/v*\([0-9]*\.[0-9]*\).*/\1/')
target=${tag:-main}
echo "building for ${target}"
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && '"export CIRCLE_SHA1='$CIRCLE_SHA1'"' && . ./.circleci/scripts/python_doc_push_script.sh docs/'$target' '$target' site") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
mkdir -p ~/workspace/build_artifacts
docker cp $id:/var/lib/jenkins/workspace/pytorch.github.io/docs/main ~/workspace/build_artifacts
docker cp $id:/var/lib/jenkins/workspace/pytorch.github.io /tmp/workspace
# Save the docs build so we can debug any problems
export DEBUG_COMMIT_DOCKER_IMAGE=${COMMIT_DOCKER_IMAGE}-debug
docker commit "$id" ${DEBUG_COMMIT_DOCKER_IMAGE}
time docker push ${DEBUG_COMMIT_DOCKER_IMAGE}
- persist_to_workspace:
root: /tmp/workspace
paths:
- .
- store_artifacts:
path: ~/workspace/build_artifacts/main
destination: docs
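The doc-build jobs above derive their docs output folder from the release tag with a small sed expression. A standalone sketch of that derivation (the `derive_target` helper name and sample inputs are illustrative; the sed pattern and `${tag:-main}` fallback are taken from the job):

```shell
# Turn a release tag like v1.12.0rc3 into its docs folder name "1.12";
# fall back to "main" when no tag is set (e.g. on branch builds).
derive_target() {
  tag=$(echo "$1" | sed -e 's/v*\([0-9]*\.[0-9]*\).*/\1/')
  echo "${tag:-main}"
}

derive_target "v1.12.0rc3"   # prints 1.12
derive_target ""             # prints main
```

The pattern keeps only the major.minor portion, so both release candidates and patch releases of the same minor version publish to one docs directory.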
pytorch_cpp_doc_build:
environment:
BUILD_ENVIRONMENT: pytorch-cpp-doc-push
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc5.4"
resource_class: large
machine:
image: ubuntu-2004:202104-01
steps:
- checkout
- calculate_docker_image_tag
- setup_linux_system_environment
- setup_ci_environment
- run:
name: Doc Build and Push
no_output_timeout: "1h"
command: |
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
# turn v1.12.0rc3 into 1.12
tag=$(echo $CIRCLE_TAG | sed -e 's/v*\([0-9]*\.[0-9]*\).*/\1/')
target=${tag:-main}
echo "building for ${target}"
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && '"export CIRCLE_SHA1='$CIRCLE_SHA1'"' && . ./.circleci/scripts/cpp_doc_push_script.sh docs/"$target" main") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
mkdir -p ~/workspace/build_artifacts
docker cp $id:/var/lib/jenkins/workspace/cppdocs/ /tmp/workspace
# Save the docs build so we can debug any problems
export DEBUG_COMMIT_DOCKER_IMAGE=${COMMIT_DOCKER_IMAGE}-debug
docker commit "$id" ${DEBUG_COMMIT_DOCKER_IMAGE}
time docker push ${DEBUG_COMMIT_DOCKER_IMAGE}
- persist_to_workspace:
root: /tmp/workspace
paths:
- .
pytorch_macos_10_15_py3_build:
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.15-py3-arm64-build
@@ -512,6 +964,7 @@ jobs:
no_output_timeout: "1h"
command: |
set -e
export IN_CI=1
export CROSS_COMPILE_ARM64=1
export JOB_BASE_NAME=$CIRCLE_JOB
@@ -549,6 +1002,7 @@ jobs:
no_output_timeout: "1h"
command: |
set -e
export IN_CI=1
export JOB_BASE_NAME=$CIRCLE_JOB
# Install sccache
@@ -570,198 +1024,6 @@ jobs:
paths:
- miniconda3
mac_build:
parameters:
build-environment:
type: string
description: Top-level label for what's being built/tested.
xcode-version:
type: string
default: "13.3.1"
description: What xcode version to build with.
build-generates-artifacts:
type: boolean
default: true
description: if the build generates build artifacts
python-version:
type: string
default: "3.8"
macos:
xcode: << parameters.xcode-version >>
resource_class: medium
environment:
BUILD_ENVIRONMENT: << parameters.build-environment >>
AWS_REGION: us-east-1
steps:
- checkout
- run_brew_for_macos_build
- run:
name: Install sccache
command: |
sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
echo "export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${BASH_ENV}"
echo "export SCCACHE_S3_KEY_PREFIX=${GITHUB_WORKFLOW}" >> "${BASH_ENV}"
set +x
echo "export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4}" >> "${BASH_ENV}"
echo "export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}" >> "${BASH_ENV}"
set -x
- run:
name: Get workflow job id
command: |
echo "export OUR_GITHUB_JOB_ID=${CIRCLE_WORKFLOW_JOB_ID}" >> "${BASH_ENV}"
- run:
name: Build
command: |
set -x
git submodule sync
git submodule update --init --recursive --depth 1 --jobs 0
export PATH="/usr/local/bin:$PATH"
export WORKSPACE_DIR="${HOME}/workspace"
mkdir -p "${WORKSPACE_DIR}"
MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-py38_4.12.0-MacOSX-x86_64.sh"
if [ << parameters.python-version >> == 3.9.12 ]; then
MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-MacOSX-x86_64.sh"
fi
# If a local installation of conda doesn't exist, we download and install conda
if [ ! -d "${WORKSPACE_DIR}/miniconda3" ]; then
mkdir -p "${WORKSPACE_DIR}"
curl --retry 3 ${MINICONDA_URL} -o "${WORKSPACE_DIR}"/miniconda3.sh
bash "${WORKSPACE_DIR}"/miniconda3.sh -b -p "${WORKSPACE_DIR}"/miniconda3
fi
export PATH="${WORKSPACE_DIR}/miniconda3/bin:$PATH"
# shellcheck disable=SC1091
source "${WORKSPACE_DIR}"/miniconda3/bin/activate
brew link --force libomp
echo "export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"}" >> "${BASH_ENV}"
.jenkins/pytorch/macos-build.sh
- when:
condition: << parameters.build-generates-artifacts >>
steps:
- run:
name: Archive artifacts into zip
command: |
zip -1 -r artifacts.zip dist/ build/.ninja_log build/compile_commands.json .pytorch-test-times.json
cp artifacts.zip /Users/distiller/workspace
- persist_to_workspace:
root: /Users/distiller/workspace/
paths:
- miniconda3
- artifacts.zip
- store_artifacts:
path: /Users/distiller/project/artifacts.zip
mac_test:
parameters:
build-environment:
type: string
shard-number:
type: string
num-test-shards:
type: string
xcode-version:
type: string
test-config:
type: string
default: 'default'
macos:
xcode: << parameters.xcode-version >>
environment:
GIT_DEFAULT_BRANCH: 'master'
BUILD_ENVIRONMENT: << parameters.build-environment >>
TEST_CONFIG: << parameters.test-config >>
SHARD_NUMBER: << parameters.shard-number >>
NUM_TEST_SHARDS: << parameters.num-test-shards >>
PYTORCH_RETRY_TEST_CASES: 1
PYTORCH_OVERRIDE_FLAKY_SIGNAL: 1
steps:
- checkout
- attach_workspace:
at: ~/workspace
- run_brew_for_macos_build
- run:
name: Test
no_output_timeout: "2h"
command: |
set -x
git submodule sync --recursive
git submodule update --init --recursive
mv ~/workspace/artifacts.zip .
unzip artifacts.zip
export IN_CI=1
COMMIT_MESSAGES=$(git cherry -v "origin/${GIT_DEFAULT_BRANCH:-master}")
export PATH="/usr/local/bin:$PATH"
export WORKSPACE_DIR="${HOME}/workspace"
mkdir -p "${WORKSPACE_DIR}"
export PATH="${WORKSPACE_DIR}/miniconda3/bin:$PATH"
source "${WORKSPACE_DIR}"/miniconda3/bin/activate
# sanitize the input commit message and PR body here:
# trim all new lines from commit messages to avoid issues with batch environment
# variable copying. see https://github.com/pytorch/pytorch/pull/80043#issuecomment-1167796028
COMMIT_MESSAGES="${COMMIT_MESSAGES//[$'\n\r']}"
# then trim all special characters like single and double quotes to avoid unescaped inputs to
# wreak havoc internally
export COMMIT_MESSAGES="${COMMIT_MESSAGES//[\'\"]}"
python3 -mpip install dist/*.whl
.jenkins/pytorch/macos-test.sh
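The commit-message sanitization in the test step above can be exercised on its own. This is a minimal bash sketch (the sample message is made up; the two pattern substitutions are the ones from the job):

```shell
# Strip newlines/carriage returns first, then single and double quotes,
# so the value survives environment-variable copying between steps.
COMMIT_MESSAGES=$'fix "quoted"\nsecond \'line\'\r'
COMMIT_MESSAGES="${COMMIT_MESSAGES//[$'\n\r']}"
COMMIT_MESSAGES="${COMMIT_MESSAGES//[\'\"]}"
echo "$COMMIT_MESSAGES"   # prints: fix quotedsecond line
```

Note this relies on bash's `${var//pattern/}` substitution, so it only works under bash, not plain POSIX sh.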
- run:
name: Copy files for uploading test stats
command: |
# copy into a parent folder test-reports because we can't use CIRCLE_BUILD_NUM in the path when persisting to workspace
mkdir -p test-reports/test-reports_${CIRCLE_BUILD_NUM}/test/test-reports
cp -r test/test-reports test-reports/test-reports_${CIRCLE_BUILD_NUM}/test/test-reports
- store_test_results:
path: test/test-reports
- persist_to_workspace:
root: /Users/distiller/project/
paths:
- test-reports
upload_test_stats:
machine: # executor type
image: ubuntu-2004:202010-01 # recommended linux image - includes Ubuntu 20.04, docker 19.03.13, docker-compose 1.27.4
steps:
- checkout
- attach_workspace:
at: ~/workspace
- run:
name: upload
command: |
set -ex
if [ -z ${AWS_ACCESS_KEY_FOR_OSSCI_ARTIFACT_UPLOAD} ]; then
echo "No credentials found, cannot upload test stats (are you on a fork?)"
exit 0
fi
cp -r ~/workspace/test-reports/* ~/project
pip3 install requests==2.26 rockset==0.8.3 boto3==1.19.12 six==1.16.0
export AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_FOR_OSSCI_ARTIFACT_UPLOAD}
export AWS_SECRET_ACCESS_KEY=${AWS_SECRET_KEY_FOR_OSSCI_ARTIFACT_UPLOAD}
# I don't know how to get the run attempt number for reruns, so default to 1
python3 -m tools.stats.upload_test_stats --workflow-run-id "${CIRCLE_WORKFLOW_JOB_ID}" --workflow-run-attempt 1 --head-branch << pipeline.git.branch >> --circleci
pytorch_macos_10_13_py3_test:
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-test
@@ -777,6 +1039,7 @@ jobs:
no_output_timeout: "1h"
command: |
set -e
export IN_CI=1
export JOB_BASE_NAME=$CIRCLE_JOB
chmod a+x .jenkins/pytorch/macos-test.sh
@@ -789,6 +1052,7 @@ jobs:
source /Users/distiller/workspace/miniconda3/bin/activate
python3 -m pip install boto3==1.19.12
export IN_CI=1
export JOB_BASE_NAME=$CIRCLE_JOB
# Using the same IAM user to write stats to our OSS bucket
@@ -814,6 +1078,7 @@ jobs:
no_output_timeout: "1h"
command: |
set -e
export IN_CI=1
export BUILD_LITE_INTERPRETER=1
export JOB_BASE_NAME=$CIRCLE_JOB
chmod a+x ${HOME}/project/.jenkins/pytorch/macos-lite-interpreter-build-test.sh
@@ -903,6 +1168,9 @@ jobs:
output_image=$docker_image_libtorch_android_x86_32-gradle
docker commit "$id_x86_32" ${output_image}
time docker push ${output_image}
- upload_binary_size_for_android_build:
build_type: prebuilt
artifacts: /home/circleci/workspace/build_android_artifacts/artifacts.tgz
- store_artifacts:
path: ~/workspace/build_android_artifacts/artifacts.tgz
destination: artifacts.tgz
@@ -978,6 +1246,9 @@ jobs:
output_image=${docker_image_libtorch_android_x86_32}-gradle
docker commit "$id" ${output_image}
time docker push ${output_image}
- upload_binary_size_for_android_build:
build_type: prebuilt-single
artifacts: /home/circleci/workspace/build_android_x86_32_artifacts/artifacts.tgz
- store_artifacts:
path: ~/workspace/build_android_x86_32_artifacts/artifacts.tgz
destination: artifacts.tgz
@@ -987,43 +1258,10 @@ jobs:
macos:
xcode: "12.5.1"
steps:
- run:
name: checkout with retry
command: |
checkout() {
set -ex
# Workaround old docker images with incorrect $HOME
# check https://github.com/docker/docker/issues/2968 for details
if [ "${HOME}" = "/" ]
then
export HOME=$(getent passwd $(id -un) | cut -d: -f6)
fi
mkdir -p ~/.ssh
echo 'github.com ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
' >> ~/.ssh/known_hosts
# use git+ssh instead of https
git config --global url."ssh://git@github.com".insteadOf "https://github.com" || true
git config --global gc.auto 0 || true
echo 'Cloning git repository'
mkdir -p '/Users/distiller/project'
cd '/Users/distiller/project'
git clone "$CIRCLE_REPOSITORY_URL" .
echo 'Checking out branch'
git checkout --force -B "$CIRCLE_BRANCH" "$CIRCLE_SHA1"
git --no-pager log --no-color -n 1 --format='HEAD is now at %h %s'
}
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
retry checkout
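The removed checkout step wraps its work in a small exponential-backoff helper. A standalone sketch of that pattern, with quoting tightened to `"$@"` and a made-up `flaky` command to demonstrate it:

```shell
# Exponential-backoff retry, as in the checkout step above: up to five
# attempts, sleeping 1, 2, 4, then 8 seconds between failures.
retry () {
  "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") || (sleep 4 && "$@") || (sleep 8 && "$@")
}

# Example: a command that fails twice before succeeding. Attempts are
# counted through a file because the later retries run in subshells.
attempts_file=$(mktemp)
flaky() {
  n=$(cat "$attempts_file" 2>/dev/null)
  n=$((n + 1))
  echo "$n" > "$attempts_file"
  [ "$n" -ge 3 ]
}
retry flaky && echo "succeeded after $(cat "$attempts_file") attempts"
```

The original uses unquoted `$*`, which splits arguments on whitespace; `"$@"` preserves them exactly.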
- checkout
- run_brew_for_ios_build
- run:
name: Setup Fastlane
name: Run Fastlane
no_output_timeout: "1h"
command: |
set -e
@@ -1031,11 +1269,26 @@ jobs:
cd ${PROJ_ROOT}/ios/TestApp
# install fastlane
sudo gem install bundler && bundle install
# install certificates
echo ${IOS_CERT_KEY_2022} >> cert.txt
base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_root_cert
bundle exec fastlane install_dev_cert
# install the provisioning profile
PROFILE=PyTorch_CI_2022.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
echo ${IOS_SIGN_KEY_2022} >> cert.txt
base64 --decode cert.txt -o ${PROFILE}
rm cert.txt
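The certificate and provisioning-profile steps above share one pattern: a CI secret holds base64-encoded binary material, which is round-tripped through a temporary text file and decoded into place. A portable sketch (`FAKE_SECRET` stands in for the real `IOS_CERT_KEY_2022` secret; the job uses macOS's `base64 … -o out`, while this sketch redirects stdout instead):

```shell
# Decode a base64-encoded secret into a binary file, removing the
# intermediate text file afterwards.
FAKE_SECRET=$(printf 'not-a-real-p12' | base64)
echo "${FAKE_SECRET}" >> cert.txt
base64 --decode cert.txt > Certificates.p12
rm cert.txt
cat Certificates.p12   # prints: not-a-real-p12
```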
- run:
name: Build
no_output_timeout: "1h"
command: |
set -e
export IN_CI=1
WORKSPACE=/Users/distiller/workspace
PROJ_ROOT=/Users/distiller/project
export TCLLIBPATH="/usr/local/lib"
@@ -1088,12 +1341,18 @@ jobs:
command: |
set -e
PROJ_ROOT=/Users/distiller/project
PROFILE=PyTorch_CI_2022
# run the ruby build script
if ! [ -x "$(command -v xcodebuild)" ]; then
echo 'Error: xcodebuild is not installed.'
exit 1
fi
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM}
echo ${IOS_DEV_TEAM_ID}
if [ ${IOS_PLATFORM} != "SIMULATOR" ]; then
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} -c ${PROFILE} -t ${IOS_DEV_TEAM_ID}
else
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM}
fi
if ! [ "$?" -eq "0" ]; then
echo 'xcodebuild failed!'
exit 1
@@ -1116,13 +1375,12 @@ jobs:
cd ${PROJ_ROOT}/ios/TestApp/benchmark
mkdir -p ../models
if [ ${USE_COREML_DELEGATE} == 1 ]; then
pip install coremltools==5.0b5 protobuf==3.20.1 six==1.16.0
pip install coremltools==5.0b5
pip install six
python coreml_backend.py
else
cd "${PROJ_ROOT}"
python test/mobile/model_test/gen_test_model.py ios-test
python trace_model.py
fi
cd "${PROJ_ROOT}/ios/TestApp/benchmark"
if [ ${BUILD_LITE_INTERPRETER} == 1 ]; then
echo "Setting up the TestApp for LiteInterpreter"
ruby setup.rb --lite 1
@@ -1130,10 +1388,10 @@ jobs:
echo "Setting up the TestApp for Full JIT"
ruby setup.rb
fi
cd "${PROJ_ROOT}/ios/TestApp"
# instruments -s -devices
if [ "${BUILD_LITE_INTERPRETER}" == 1 ]; then
if [ "${USE_COREML_DELEGATE}" == 1 ]; then
cd ${PROJ_ROOT}/ios/TestApp
instruments -s -devices
if [ ${BUILD_LITE_INTERPRETER} == 1 ]; then
if [ ${USE_COREML_DELEGATE} == 1 ]; then
fastlane scan --only_testing TestAppTests/TestAppTests/testCoreML
else
fastlane scan --only_testing TestAppTests/TestAppTests/testLiteInterpreter
@@ -1236,6 +1494,30 @@ jobs:
python3 -m pip install requests
python3 ./.circleci/scripts/trigger_azure_pipeline.py
pytorch_doc_test:
environment:
BUILD_ENVIRONMENT: pytorch-doc-test
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc5.4"
resource_class: medium
machine:
image: ubuntu-2004:202104-01
steps:
- checkout
- calculate_docker_image_tag
- setup_linux_system_environment
- setup_ci_environment
- run:
name: Doc test
no_output_timeout: "30m"
command: |
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && . ./.jenkins/pytorch/docs-test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# update_s3_htmls job
# These jobs create html files for every cpu/cu## folder in s3. The html
# files just store the names of all the files in that folder (which are
@@ -1447,107 +1729,4 @@ workflows:
branches:
only:
- postnightly
- mac_build:
name: macos-12-py3-x86-64-build
build-environment: macos-12-py3-x86-64
xcode-version: "13.3.1"
filters:
branches:
ignore:
- nightly
- postnightly
- mac_test:
name: macos-12-py3-x86-64-test-1-2-default
build-environment: macos-12-py3-x86-64
xcode-version: "13.3.1"
shard-number: "1"
num-test-shards: "2"
requires:
- macos-12-py3-x86-64-build
filters:
branches:
ignore:
- nightly
- postnightly
- mac_test:
name: macos-12-py3-x86-64-test-2-2-default
build-environment: macos-12-py3-x86-64
xcode-version: "13.3.1"
shard-number: "2"
num-test-shards: "2"
requires:
- macos-12-py3-x86-64-build
filters:
branches:
ignore:
- nightly
- postnightly
- mac_test:
name: macos-12-py3-x86-64-test-1-1-functorch
build-environment: macos-12-py3-x86-64
xcode-version: "13.3.1"
shard-number: "1"
num-test-shards: "1"
test-config: functorch
requires:
- macos-12-py3-x86-64-build
filters:
branches:
ignore:
- nightly
- postnightly
- mac_build:
name: macos-12-py3-x86-64-lite-interpreter-build-test
build-environment: macos-12-py3-lite-interpreter-x86-64
xcode-version: "13.3.1"
build-generates-artifacts: false
filters:
branches:
ignore:
- nightly
- postnightly
- mac_build:
name: macos-12-py3-arm64-build
build-environment: macos-12-py3-arm64
xcode-version: "13.3.1"
python-version: "3.9.12"
filters:
branches:
ignore:
- nightly
- postnightly
- upload_test_stats:
name: upload test status
requires:
- macos-12-py3-x86-64-test-1-2-default
- macos-12-py3-x86-64-test-2-2-default
- macos-12-py3-x86-64-test-1-1-functorch
filters:
branches:
ignore:
- nightly
- postnightly
- pytorch_ios_build:
build_environment: ios-12-5-1-x86-64
filters:
branches:
ignore:
- nightly
- postnightly
ios_arch: x86_64
ios_platform: SIMULATOR
lite_interpreter: "1"
name: ios-12-5-1-x86-64
- pytorch_ios_build:
build_environment: ios-12-5-1-x86-64-coreml
filters:
branches:
ignore:
- nightly
- postnightly
ios_arch: x86_64
ios_platform: SIMULATOR
lite_interpreter: "1"
name: ios-12-5-1-x86-64-coreml
use_coreml: "1"
when: << pipeline.parameters.run_build >>


@@ -53,7 +53,7 @@ dependencies {
implementation 'androidx.appcompat:appcompat:1.0.0'
implementation 'com.facebook.fbjni:fbjni-java-only:0.2.2'
implementation 'com.google.code.findbugs:jsr305:3.0.1'
implementation 'com.facebook.soloader:nativeloader:0.10.4'
implementation 'com.facebook.soloader:nativeloader:0.10.1'
implementation 'junit:junit:' + rootProject.junitVersion
implementation 'androidx.test:core:' + rootProject.coreVersion


@@ -54,8 +54,6 @@ elif [[ "$image" == *-bionic* ]]; then
UBUNTU_VERSION=18.04
elif [[ "$image" == *-focal* ]]; then
UBUNTU_VERSION=20.04
elif [[ "$image" == *-jammy* ]]; then
UBUNTU_VERSION=22.04
elif [[ "$image" == *ubuntu* ]]; then
extract_version_from_image_name ubuntu UBUNTU_VERSION
elif [[ "$image" == *centos* ]]; then
@@ -72,8 +70,7 @@ else
fi
DOCKERFILE="${OS}/Dockerfile"
# When using Ubuntu 22.04, start from the Ubuntu docker image instead of the nvidia/cuda docker image.
if [[ "$image" == *cuda* && "$UBUNTU_VERSION" != "22.04" ]]; then
if [[ "$image" == *cuda* ]]; then
DOCKERFILE="${OS}-cuda/Dockerfile"
elif [[ "$image" == *rocm* ]]; then
DOCKERFILE="${OS}-rocm/Dockerfile"
@@ -84,8 +81,6 @@ if [[ "$image" == *xenial* ]] || [[ "$image" == *bionic* ]]; then
fi
TRAVIS_DL_URL_PREFIX="https://s3.amazonaws.com/travis-python-archives/binaries/ubuntu/14.04/x86_64"
_UCX_COMMIT=31e74cac7bee0ef66bef2af72e7d86d9c282e5ab
_UCC_COMMIT=12944da33f911daf505d9bbc51411233d0ed85e1
# It's annoying to rename jobs every time you want to rewrite a
# configuration, so we hardcode everything here rather than do it
@@ -96,6 +91,14 @@ case "$image" in
GCC_VERSION=7
# Do not install PROTOBUF, DB, and VISION as a test
;;
pytorch-linux-xenial-py3.7-gcc5.4)
ANACONDA_PYTHON_VERSION=3.7
GCC_VERSION=5
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
;;
pytorch-linux-xenial-py3.7-gcc7.2)
ANACONDA_PYTHON_VERSION=3.7
GCC_VERSION=7
@@ -141,28 +144,14 @@ case "$image" in
KATEX=yes
;;
pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7)
CUDA_VERSION=11.6.2
CUDA_VERSION=11.6.0
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.10
ANACONDA_PYTHON_VERSION=3.7
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
;;
pytorch-linux-bionic-cuda11.7-cudnn8-py3-gcc7)
CUDA_VERSION=11.7.0
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
;;
pytorch-linux-xenial-py3-clang5-asan)
ANACONDA_PYTHON_VERSION=3.7
@@ -178,13 +167,6 @@ case "$image" in
DB=yes
VISION=yes
;;
pytorch-linux-focal-py3-clang7-asan)
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3-clang7-onnx)
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=7
@@ -192,13 +174,6 @@ case "$image" in
DB=yes
VISION=yes
;;
pytorch-linux-focal-py3-clang10-onnx)
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=10
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3-clang5-android-ndk-r19c)
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=5.0
@@ -250,7 +225,15 @@ case "$image" in
DB=yes
VISION=yes
;;
pytorch-linux-focal-rocm5.1-py3.7)
pytorch-linux-bionic-rocm5.0-py3.7)
ANACONDA_PYTHON_VERSION=3.7
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=5.0
;;
pytorch-linux-bionic-rocm5.1-py3.7)
ANACONDA_PYTHON_VERSION=3.7
GCC_VERSION=9
PROTOBUF=yes
@@ -258,41 +241,15 @@ case "$image" in
VISION=yes
ROCM_VERSION=5.1.1
;;
pytorch-linux-focal-rocm5.2-py3.7)
ANACONDA_PYTHON_VERSION=3.7
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=5.2
;;
pytorch-linux-focal-py3.7-gcc7)
ANACONDA_PYTHON_VERSION=3.7
CMAKE_VERSION=3.16.9 # Required for precompiled header support
CMAKE_VERSION=3.12.4 # To make sure XNNPACK is enabled for the BACKWARDS_COMPAT_TEST used with this image
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
;;
pytorch-linux-jammy-cuda11.6-cudnn8-py3.8-clang12)
ANACONDA_PYTHON_VERSION=3.8
CUDA_VERSION=11.6
CUDNN_VERSION=8
CLANG_VERSION=12
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-jammy-cuda11.7-cudnn8-py3.8-clang12)
ANACONDA_PYTHON_VERSION=3.8
CUDA_VERSION=11.7
CUDNN_VERSION=8
CLANG_VERSION=12
PROTOBUF=yes
DB=yes
VISION=yes
;;
*)
# Catch-all for builds that are not hardcoded.
PROTOBUF=yes
@@ -379,10 +336,8 @@ docker build \
--build-arg "NINJA_VERSION=${NINJA_VERSION:-}" \
--build-arg "KATEX=${KATEX:-}" \
--build-arg "ROCM_VERSION=${ROCM_VERSION:-}" \
--build-arg "PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH:-gfx906}" \
--build-arg "PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH:-gfx900;gfx906}" \
--build-arg "IMAGE_NAME=${IMAGE_NAME}" \
--build-arg "UCX_COMMIT=${UCX_COMMIT}" \
--build-arg "UCC_COMMIT=${UCC_COMMIT}" \
-f $(dirname ${DOCKERFILE})/Dockerfile \
-t "$tmp_tag" \
"$@" \


@@ -18,7 +18,6 @@ tag="${DOCKER_TAG}"
registry="308535385114.dkr.ecr.us-east-1.amazonaws.com"
image="${registry}/pytorch/${IMAGE_NAME}"
ghcr_image="ghcr.io/pytorch/ci-image"
login() {
aws ecr get-authorization-token --region us-east-1 --output text --query 'authorizationData[].authorizationToken' |
@@ -47,22 +46,7 @@ fi
# Build new image
./build.sh ${IMAGE_NAME} -t "${image}:${tag}"
# Only push if `DOCKER_SKIP_PUSH` = false
if [ "${DOCKER_SKIP_PUSH:-true}" = "false" ]; then
# Only push if docker image doesn't exist already.
# ECR image tags are immutable so this will avoid pushing if only just testing if the docker jobs work
# NOTE: The only workflow that should push these images should be the docker-builds.yml workflow
if ! docker manifest inspect "${image}:${tag}" >/dev/null 2>/dev/null; then
docker push "${image}:${tag}"
fi
if [ "${PUSH_GHCR_IMAGE:-}" = "true" ]; then
# Push docker image to the ghcr.io
echo $GHCR_PAT | docker login ghcr.io -u pytorch --password-stdin
docker tag "${image}:${tag}" "${ghcr_image}:${IMAGE_NAME}-${tag}"
docker push "${ghcr_image}:${IMAGE_NAME}-${tag}"
fi
fi
docker push "${image}:${tag}"
if [ -z "${DOCKER_SKIP_S3_UPLOAD:-}" ]; then
trap "rm -rf ${IMAGE_NAME}:${tag}.tar" EXIT


@@ -12,7 +12,7 @@ ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
# Install common dependencies (so that this step can be cached separately)
ARG EC2
COPY ./common/install_base.sh install_base.sh
ADD ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
# Update CentOS git version
@@ -23,57 +23,54 @@ RUN yum install -y git
# Install devtoolset
ARG DEVTOOLSET_VERSION
COPY ./common/install_devtoolset.sh install_devtoolset.sh
ADD ./common/install_devtoolset.sh install_devtoolset.sh
RUN bash ./install_devtoolset.sh && rm install_devtoolset.sh
ENV BASH_ENV "/etc/profile"
# (optional) Install non-default glibc version
ARG GLIBC_VERSION
COPY ./common/install_glibc.sh install_glibc.sh
ADD ./common/install_glibc.sh install_glibc.sh
RUN if [ -n "${GLIBC_VERSION}" ]; then bash ./install_glibc.sh; fi
RUN rm install_glibc.sh
# Install user
COPY ./common/install_user.sh install_user.sh
ADD ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install conda and other packages (e.g., numpy, pytest)
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
COPY requirements-ci.txt /opt/conda/requirements-ci.txt
COPY ./common/install_conda.sh install_conda.sh
ADD requirements-ci.txt /opt/conda/requirements-ci.txt
ADD ./common/install_conda.sh install_conda.sh
RUN bash ./install_conda.sh && rm install_conda.sh
RUN rm /opt/conda/requirements-ci.txt
# (optional) Install protobuf for ONNX
ARG PROTOBUF
COPY ./common/install_protobuf.sh install_protobuf.sh
ADD ./common/install_protobuf.sh install_protobuf.sh
RUN if [ -n "${PROTOBUF}" ]; then bash ./install_protobuf.sh; fi
RUN rm install_protobuf.sh
ENV INSTALLED_PROTOBUF ${PROTOBUF}
# (optional) Install database packages like LMDB and LevelDB
ARG DB
COPY ./common/install_db.sh install_db.sh
ADD ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV and ffmpeg
ARG VISION
COPY ./common/install_vision.sh install_vision.sh
ADD ./common/install_vision.sh install_vision.sh
RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh
ENV INSTALLED_VISION ${VISION}
# Install rocm
ARG ROCM_VERSION
COPY ./common/install_rocm.sh install_rocm.sh
ADD ./common/install_rocm.sh install_rocm.sh
RUN bash ./install_rocm.sh
RUN rm install_rocm.sh
COPY ./common/install_rocm_magma.sh install_rocm_magma.sh
RUN bash ./install_rocm_magma.sh
RUN rm install_rocm_magma.sh
ENV PATH /opt/rocm/bin:$PATH
ENV PATH /opt/rocm/hcc/bin:$PATH
ENV PATH /opt/rocm/hip/bin:$PATH
@@ -85,18 +82,18 @@ ENV LC_ALL en_US.utf8
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
COPY ./common/install_cmake.sh install_cmake.sh
ADD ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# (optional) Install non-default Ninja version
ARG NINJA_VERSION
COPY ./common/install_ninja.sh install_ninja.sh
ADD ./common/install_ninja.sh install_ninja.sh
RUN if [ -n "${NINJA_VERSION}" ]; then bash ./install_ninja.sh; fi
RUN rm install_ninja.sh
# Install ccache/sccache (do this last, so we get priority in PATH)
COPY ./common/install_cache.sh install_cache.sh
ADD ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
RUN bash ./install_cache.sh && rm install_cache.sh


@@ -15,22 +15,11 @@ install_ubuntu() {
elif [[ "$UBUNTU_VERSION" == "20.04"* ]]; then
cmake3="cmake=3.16*"
maybe_libiomp_dev=""
elif [[ "$UBUNTU_VERSION" == "22.04"* ]]; then
cmake3="cmake=3.22*"
maybe_libiomp_dev=""
else
cmake3="cmake=3.5*"
maybe_libiomp_dev="libiomp-dev"
fi
if [[ "$CLANG_VERSION" == 12 ]]; then
maybe_libomp_dev="libomp-12-dev"
elif [[ "$CLANG_VERSION" == 10 ]]; then
maybe_libomp_dev="libomp-10-dev"
else
maybe_libomp_dev=""
fi
# TODO: Remove this once nvidia package repos are back online
# Comment out nvidia repositories to prevent them from getting apt-get updated, see https://github.com/pytorch/pytorch/issues/74968
# shellcheck disable=SC2046
@@ -40,12 +29,10 @@ install_ubuntu() {
apt-get update
# TODO: Some of these may not be necessary
ccache_deps="asciidoc docbook-xml docbook-xsl xsltproc"
deploy_deps="libffi-dev libbz2-dev libreadline-dev libncurses5-dev libncursesw5-dev libgdbm-dev libsqlite3-dev uuid-dev tk-dev"
numpy_deps="gfortran"
apt-get install -y --no-install-recommends \
$ccache_deps \
$numpy_deps \
${deploy_deps} \
${cmake3} \
apt-transport-https \
autoconf \
@@ -62,32 +49,15 @@ install_ubuntu() {
libjpeg-dev \
libasound2-dev \
libsndfile-dev \
${maybe_libomp_dev} \
software-properties-common \
wget \
sudo \
vim \
jq \
libtool
vim
# Should resolve issues related to various apt package repository cert issues
# see: https://github.com/pytorch/pytorch/issues/65931
apt-get install -y libgnutls30
# cuda-toolkit does not work with gcc-11.2.0 which is default in Ubuntu 22.04
# see: https://github.com/NVlabs/instant-ngp/issues/119
if [[ "$UBUNTU_VERSION" == "22.04"* ]]; then
apt-get install -y g++-10
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-10 30
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-10 30
update-alternatives --install /usr/bin/gcov gcov /usr/bin/gcov-10 30
# https://www.spinics.net/lists/libreoffice/msg07549.html
sudo rm -rf /usr/lib/gcc/x86_64-linux-gnu/11
wget https://github.com/gcc-mirror/gcc/commit/2b2d97fc545635a0f6aa9c9ee3b017394bc494bf.patch -O noexecpt.patch
sudo patch /usr/include/c++/10/bits/range_access.h noexecpt.patch
fi
# Cleanup package manager
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*


@@ -5,9 +5,7 @@ set -ex
install_ubuntu() {
echo "Preparing to build sccache from source"
apt-get update
# libssl-dev will not work as it is upgraded to libssl3 in Ubuntu-22.04.
# Instead use lib and headers from OpenSSL1.1 installed in `install_openssl.sh``
apt-get install -y cargo
apt-get install -y cargo pkg-config libssl-dev
echo "Checking out sccache repo"
git clone https://github.com/pytorch/sccache
cd sccache
@@ -48,9 +46,7 @@ fi
chmod a+x /opt/cache/bin/sccache
function write_sccache_stub() {
# Unset LD_PRELOAD for ps because of asan + ps issues
# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90589
printf "#!/bin/sh\nif [ \$(env -u LD_PRELOAD ps -p \$PPID -o comm=) != sccache ]; then\n exec sccache $(which $1) \"\$@\"\nelse\n exec $(which $1) \"\$@\"\nfi" > "/opt/cache/bin/$1"
printf "#!/bin/sh\nif [ \$(ps -p \$PPID -o comm=) != sccache ]; then\n exec sccache $(which $1) \"\$@\"\nelse\n exec $(which $1) \"\$@\"\nfi" > "/opt/cache/bin/$1"
chmod a+x "/opt/cache/bin/$1"
}
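The stub generator above writes, for each compiler, a tiny wrapper that re-execs through sccache unless the parent process is already sccache (which would recurse forever); `env -u LD_PRELOAD` keeps `ps` working under ASAN. A standalone sketch of the pattern (`/tmp/cache-bin` stands in for `/opt/cache/bin`, and `sh` stands in for a compiler so the sketch runs anywhere):

```shell
# Generate an sccache wrapper script for the named tool.
mkdir -p /tmp/cache-bin
write_stub() {
  tool=$(command -v "$1")
  printf '#!/bin/sh\nif [ "$(env -u LD_PRELOAD ps -p $PPID -o comm=)" != sccache ]; then\n  exec sccache %s "$@"\nelse\n  exec %s "$@"\nfi\n' "$tool" "$tool" > "/tmp/cache-bin/$1"
  chmod a+x "/tmp/cache-bin/$1"
}

write_stub sh
cat "/tmp/cache-bin/sh"
```

Putting the wrapper directory first on `PATH` (as the Dockerfile does with `/opt/cache/bin`) then routes every compiler invocation through the cache transparently.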


@@ -13,9 +13,6 @@ if [ -n "$CLANG_VERSION" ]; then
sudo apt-get install -y --no-install-recommends gpg-agent
wget --no-check-certificate -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
apt-add-repository "deb http://apt.llvm.org/bionic/ llvm-toolchain-bionic-${CLANG_VERSION} main"
elif [[ $UBUNTU_VERSION == 22.04 ]]; then
# work around ubuntu apt-get conflicts
sudo apt-get -y -f install
fi
sudo apt-get update

View File

@ -55,10 +55,8 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
# Ensure we run conda in a directory that jenkins has write access to
pushd /opt/conda
# Prevent conda from updating to 4.14.0, which causes docker build failures
# See https://hud.pytorch.org/pytorch/pytorch/commit/754d7f05b6841e555cea5a4b2c505dd9e0baec1d
# Uncomment the below when resolved to track the latest conda update
# as_jenkins conda update -y -n base conda
# Track latest conda update
as_jenkins conda update -y -n base conda
# Install correct Python version
as_jenkins conda install -y python="$ANACONDA_PYTHON_VERSION"
@ -75,21 +73,19 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
}
# Install PyTorch conda deps, as per https://github.com/pytorch/pytorch README
# DO NOT install cmake here as it would install a version newer than 3.13, but
# we want to pin to version 3.13.
CONDA_COMMON_DEPS="astunparse pyyaml mkl=2022.0.1 mkl-include=2022.0.1 setuptools cffi future six"
if [ "$ANACONDA_PYTHON_VERSION" = "3.10" ]; then
# DO NOT install cmake here as it would install a version newer than 3.10, but
# we want to pin to version 3.10.
if [ "$ANACONDA_PYTHON_VERSION" = "3.9" ]; then
# Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
conda_install numpy=1.21.2 ${CONDA_COMMON_DEPS} llvmdev=8.0.0
elif [ "$ANACONDA_PYTHON_VERSION" = "3.9" ]; then
# Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
conda_install numpy=1.19.2 ${CONDA_COMMON_DEPS} llvmdev=8.0.0
conda_install numpy=1.19.2 astunparse pyyaml mkl mkl-include setuptools cffi future six llvmdev=8.0.0
elif [ "$ANACONDA_PYTHON_VERSION" = "3.8" ]; then
# Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
conda_install numpy=1.18.5 ${CONDA_COMMON_DEPS} llvmdev=8.0.0
conda_install numpy=1.18.5 astunparse pyyaml mkl mkl-include setuptools cffi future six llvmdev=8.0.0
elif [ "$ANACONDA_PYTHON_VERSION" = "3.7" ]; then
# DO NOT install dataclasses if installing python-3.7, since its part of python-3.7 core packages
conda_install numpy=1.18.5 astunparse pyyaml mkl mkl-include setuptools cffi future six typing_extensions
else
# Install `typing_extensions` for 3.7
conda_install numpy=1.18.5 ${CONDA_COMMON_DEPS} typing_extensions
conda_install numpy=1.18.5 astunparse pyyaml mkl mkl-include setuptools cffi future six dataclasses typing_extensions
fi
# Magma package names are concatenation of CUDA major and minor ignoring revision

View File

@ -4,13 +4,7 @@ if [[ ${CUDNN_VERSION} == 8 ]]; then
# cuDNN license: https://developer.nvidia.com/cudnn/license_agreement
mkdir tmp_cudnn && cd tmp_cudnn
CUDNN_NAME="cudnn-linux-x86_64-8.3.2.44_cuda11.5-archive"
if [[ ${CUDA_VERSION:0:4} == "11.7" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-8.5.0.96_cuda11-archive"
curl -OLs https://ossci-linux.s3.amazonaws.com/${CUDNN_NAME}.tar.xz
else
curl -OLs https://developer.download.nvidia.com/compute/redist/cudnn/v8.3.2/local_installers/11.5/${CUDNN_NAME}.tar.xz
fi
curl -OLs https://developer.download.nvidia.com/compute/redist/cudnn/v8.3.2/local_installers/11.5/${CUDNN_NAME}.tar.xz
tar xf ${CUDNN_NAME}.tar.xz
cp -a ${CUDNN_NAME}/include/* /usr/include/
cp -a ${CUDNN_NAME}/include/* /usr/local/cuda/include/
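The cuDNN selection in the hunk above keys on the first four characters of `$CUDA_VERSION`. A sketch of that prefix test under bash (the helper name and version strings are illustrative):

```shell
# Sketch of the ${CUDA_VERSION:0:4} prefix test above, assuming bash.
# The helper name and version values are illustrative.
is_cuda_117() {
  local v=$1
  [ "${v:0:4}" = "11.7" ]
}
is_cuda_117 "11.7.0" && echo "11.7 toolchain: take the cuDNN 8.5.0 path"
is_cuda_117 "11.6.2" || echo "other toolchains: take the cuDNN 8.3.2 path"
```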

View File

@ -17,8 +17,6 @@ if [ -n "$KATEX" ]; then
apt-get install -y --no-install-recommends yarn
yarn global add katex --prefix /usr/local
sudo apt-get -y install doxygen
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

View File

@ -10,7 +10,5 @@ cd "${OPENSSL}"
./config --prefix=/opt/openssl -d '-Wl,--enable-new-dtags,-rpath,$(LIBRPATH)'
# NOTE: openssl install errors out when built with the -j option
make -j6; make install_sw
# Link the ssl libraries to the /usr/lib folder.
sudo ln -s /opt/openssl/lib/lib* /usr/lib
cd ..
rm -rf "${OPENSSL}"

View File

@ -2,12 +2,40 @@
set -ex
install_magma() {
# "install" hipMAGMA into /opt/rocm/magma by copying after build
git clone https://bitbucket.org/icl/magma.git
pushd magma
# Fixes memory leaks of magma found while executing linalg UTs
git checkout 5959b8783e45f1809812ed96ae762f38ee701972
cp make.inc-examples/make.inc.hip-gcc-mkl make.inc
echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc
echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib' >> make.inc
echo 'DEVCCFLAGS += --gpu-max-threads-per-block=256' >> make.inc
export PATH="${PATH}:/opt/rocm/bin"
if [[ -n "$PYTORCH_ROCM_ARCH" ]]; then
amdgpu_targets=`echo $PYTORCH_ROCM_ARCH | sed 's/;/ /g'`
else
amdgpu_targets=`rocm_agent_enumerator | grep -v gfx000 | sort -u | xargs`
fi
for arch in $amdgpu_targets; do
echo "DEVCCFLAGS += --amdgpu-target=$arch" >> make.inc
done
# hipcc with the OpenMP flag may cause isnan() on __device__ not to be found; depending on context, the compiler may attempt to match it with the host definition
sed -i 's/^FOPENMP/#FOPENMP/g' make.inc
make -f make.gen.hipMAGMA -j $(nproc)
LANG=C.UTF-8 make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda
make testing/testing_dgemm -j $(nproc) MKLROOT=/opt/conda
popd
mv magma /opt/rocm
}
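The target list above comes either from `$PYTORCH_ROCM_ARCH` (semicolon-separated) or from `rocm_agent_enumerator`. A standalone sketch of the semicolon-splitting path (the arch list is illustrative):

```shell
# Sketch of the PYTORCH_ROCM_ARCH splitting above, assuming a POSIX shell.
# The arch list is an illustrative value.
PYTORCH_ROCM_ARCH="gfx906;gfx908"
amdgpu_targets=$(echo "$PYTORCH_ROCM_ARCH" | sed 's/;/ /g')
for arch in $amdgpu_targets; do
  echo "DEVCCFLAGS += --amdgpu-target=$arch"
done
```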
ver() {
printf "%3d%03d%03d%03d" $(echo "$1" | tr '.' ' ');
}
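ver() above packs a dotted version into a fixed-width integer so that versions can be compared numerically with `-lt`/`-gt`. A standalone usage sketch (the comparisons are illustrative):

```shell
# Standalone copy of the ver() helper above, assuming bash.
ver() {
  printf "%3d%03d%03d%03d" $(echo "$1" | tr '.' ' ')
}
# Each dotted component becomes a zero-padded field, so plain integer
# comparison gives correct version ordering:
if [ "$(ver 5.1.1)" -lt "$(ver 5.2)" ]; then
  echo "5.1.1 sorts before 5.2"
fi
```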
# Map ROCm version to AMDGPU version
declare -A AMDGPU_VERSIONS=( ["5.0"]="21.50" ["5.1.1"]="22.10.1" ["5.2"]="22.20" )
declare -A AMDGPU_VERSIONS=( ["4.5.2"]="21.40.2" ["5.0"]="21.50" ["5.1.1"]="22.10.1" )
install_ubuntu() {
apt-get update
@ -61,6 +89,8 @@ install_ubuntu() {
DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated ${MIOPENKERNELS}
fi
install_magma
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
@ -105,6 +135,8 @@ install_centos() {
rocprofiler-dev \
roctracer-dev
install_magma
# Cleanup
yum clean all
rm -rf /var/cache/yum

View File

@ -1,29 +0,0 @@
#!/bin/bash
set -ex
# "install" hipMAGMA into /opt/rocm/magma by copying after build
git clone https://bitbucket.org/icl/magma.git
pushd magma
# Fixes memory leaks of magma found while executing linalg UTs
git checkout 5959b8783e45f1809812ed96ae762f38ee701972
cp make.inc-examples/make.inc.hip-gcc-mkl make.inc
echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc
echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib' >> make.inc
echo 'DEVCCFLAGS += --gpu-max-threads-per-block=256' >> make.inc
export PATH="${PATH}:/opt/rocm/bin"
if [[ -n "$PYTORCH_ROCM_ARCH" ]]; then
amdgpu_targets=`echo $PYTORCH_ROCM_ARCH | sed 's/;/ /g'`
else
amdgpu_targets=`rocm_agent_enumerator | grep -v gfx000 | sort -u | xargs`
fi
for arch in $amdgpu_targets; do
echo "DEVCCFLAGS += --amdgpu-target=$arch" >> make.inc
done
# hipcc with the OpenMP flag may cause isnan() on __device__ not to be found; depending on context, the compiler may attempt to match it with the host definition
sed -i 's/^FOPENMP/#FOPENMP/g' make.inc
make -f make.gen.hipMAGMA -j $(nproc)
LANG=C.UTF-8 make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda
make testing/testing_dgemm -j $(nproc) MKLROOT=/opt/conda
popd
mv magma /opt/rocm

View File

@ -1,48 +0,0 @@
#!/bin/bash
set -ex
if [[ -d "/usr/local/cuda/" ]]; then
with_cuda=/usr/local/cuda/
else
with_cuda=no
fi
function install_ucx() {
set -ex
git clone --recursive https://github.com/openucx/ucx.git
pushd ucx
git checkout ${UCX_COMMIT}
git submodule update --init --recursive
./autogen.sh
./configure --prefix=$UCX_HOME \
--enable-mt \
--with-cuda=$with_cuda \
--enable-profiling \
--enable-stats
time make -j
sudo make install
popd
rm -rf ucx
}
function install_ucc() {
set -ex
git clone --recursive https://github.com/openucx/ucc.git
pushd ucc
git checkout ${UCC_COMMIT}
git submodule update --init --recursive
./autogen.sh
./configure --prefix=$UCC_HOME --with-ucx=$UCX_HOME --with-cuda=$with_cuda
time make -j
sudo make install
popd
rm -rf ucc
}
install_ucx
install_ucc

View File

@ -41,7 +41,7 @@ flatbuffers==2.0
#Pinned versions:
#test that import:
hypothesis==5.35.1
hypothesis==4.53.2
# Pin hypothesis to avoid flakiness: https://github.com/pytorch/pytorch/issues/31136
#Description: advanced library for generating parametrized tests
#Pinned versions: 3.44.6, 4.53.2
@ -80,17 +80,17 @@ librosa>=0.6.2
#Pinned versions:
#test that import:
mypy==0.960
mypy==0.812
# Pin MyPy version because new errors are likely to appear with each release
#Description: linter
#Pinned versions: 0.960
#Pinned versions: 0.812
#test that import: test_typing.py, test_type_hints.py
networkx==2.6.3
#networkx
#Description: creation, manipulation, and study of
#the structure, dynamics, and functions of complex networks
#Pinned versions: 2.6.3 (latest version that works with Python 3.7+)
#test that import: functorch
#Pinned versions: 2.0
#test that import:
#ninja
#Description: build system. Note that it installs from
@ -100,7 +100,6 @@ networkx==2.6.3
numba==0.49.0 ; python_version < "3.9"
numba==0.54.1 ; python_version == "3.9"
numba==0.55.2 ; python_version == "3.10"
#Description: Just-In-Time Compiler for Numerical Functions
#Pinned versions: 0.54.1, 0.49.0, <=0.49.1
#test that import: test_numba_integration.py
@ -124,19 +123,14 @@ numba==0.55.2 ; python_version == "3.10"
#Pinned versions: 1.9.0
#test that import:
opt-einsum==3.3
#Description: Python library to optimize tensor contraction order, used in einsum
#Pinned versions: 3.3
#test that import: test_linalg.py
#pillow
#Description: Python Imaging Library fork
#Pinned versions:
#test that import:
protobuf==3.20.2
#protobuf
#Description: Google's data interchange format
#Pinned versions: 3.20.1
#Pinned versions:
#test that import: test_tensorboard.py
psutil
@ -149,21 +143,6 @@ pytest
#Pinned versions:
#test that import: test_typing.py, test_cpp_extensions_aot.py, run_test.py
pytest-xdist
#Description: plugin for running pytest in parallel
#Pinned versions:
#test that import:
pytest-shard
#Description: plugin for splitting up tests in pytest
#Pinned versions:
#test that import:
pytest-rerunfailures
#Description: plugin for rerunning tests in pytest
#Pinned versions:
#test that import:
#pytest-benchmark
#Description: fixture for benchmarking code
#Pinned versions: 3.2.3
@ -174,16 +153,6 @@ pytest-rerunfailures
#Pinned versions:
#test that import:
xdoctest==1.0.2
#Description: runs doctests in pytest
#Pinned versions: 1.0.2
#test that import:
pygments==2.12.0
#Description: support doctest highlighting
#Pinned versions: 2.12.0
#test that import: the doctests
#PyYAML
#Description: data serialization format
#Pinned versions:
@ -209,8 +178,7 @@ scikit-image
#Pinned versions: 0.20.3
#test that import:
scipy==1.6.3 ; python_version < "3.10"
scipy==1.8.1 ; python_version == "3.10"
scipy==1.6.3
# Pin SciPy because of failing distribution tests (see #60347)
#Description: scientific python
#Pinned versions: 1.6.3

View File

@ -11,96 +11,80 @@ ENV DEBIAN_FRONTEND noninteractive
# Install common dependencies (so that this step can be cached separately)
ARG EC2
COPY ./common/install_base.sh install_base.sh
ADD ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
# Install user
COPY ./common/install_user.sh install_user.sh
ADD ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install katex
ARG KATEX
COPY ./common/install_docs_reqs.sh install_docs_reqs.sh
RUN bash ./install_docs_reqs.sh && rm install_docs_reqs.sh
ADD ./common/install_katex.sh install_katex.sh
RUN bash ./install_katex.sh && rm install_katex.sh
# Install conda and other packages (e.g., numpy, pytest)
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
COPY requirements-ci.txt /opt/conda/requirements-ci.txt
COPY ./common/install_conda.sh install_conda.sh
ADD requirements-ci.txt /opt/conda/requirements-ci.txt
ADD ./common/install_conda.sh install_conda.sh
RUN bash ./install_conda.sh && rm install_conda.sh
RUN rm /opt/conda/requirements-ci.txt
# Install gcc
ARG GCC_VERSION
COPY ./common/install_gcc.sh install_gcc.sh
ADD ./common/install_gcc.sh install_gcc.sh
RUN bash ./install_gcc.sh && rm install_gcc.sh
# Install clang
ARG CLANG_VERSION
COPY ./common/install_clang.sh install_clang.sh
ADD ./common/install_clang.sh install_clang.sh
RUN bash ./install_clang.sh && rm install_clang.sh
# (optional) Install protobuf for ONNX
ARG PROTOBUF
COPY ./common/install_protobuf.sh install_protobuf.sh
ADD ./common/install_protobuf.sh install_protobuf.sh
RUN if [ -n "${PROTOBUF}" ]; then bash ./install_protobuf.sh; fi
RUN rm install_protobuf.sh
ENV INSTALLED_PROTOBUF ${PROTOBUF}
# (optional) Install database packages like LMDB and LevelDB
ARG DB
COPY ./common/install_db.sh install_db.sh
ADD ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV and ffmpeg
ARG VISION
COPY ./common/install_vision.sh install_vision.sh
ADD ./common/install_vision.sh install_vision.sh
RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh
ENV INSTALLED_VISION ${VISION}
# (optional) Install UCC
ARG UCX_COMMIT
ARG UCC_COMMIT
ENV UCX_COMMIT $UCX_COMMIT
ENV UCC_COMMIT $UCC_COMMIT
ENV UCX_HOME /usr
ENV UCC_HOME /usr
ADD ./common/install_ucc.sh install_ucc.sh
RUN if [ -n "${UCX_COMMIT}" ] && [ -n "${UCC_COMMIT}" ]; then bash ./install_ucc.sh; fi
RUN rm install_ucc.sh
COPY ./common/install_openssl.sh install_openssl.sh
ADD ./common/install_openssl.sh install_openssl.sh
ENV OPENSSL_ROOT_DIR /opt/openssl
RUN bash ./install_openssl.sh
ENV OPENSSL_DIR /opt/openssl
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
COPY ./common/install_cmake.sh install_cmake.sh
ADD ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# Install ccache/sccache (do this last, so we get priority in PATH)
COPY ./common/install_cache.sh install_cache.sh
ADD ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
# See https://github.com/pytorch/pytorch/issues/82174
# TODO(sdym@fb.com):
# check if this is needed after the migration off Xenial is complete
ENV CARGO_NET_GIT_FETCH_WITH_CLI true
RUN bash ./install_cache.sh && rm install_cache.sh
ENV CMAKE_CUDA_COMPILER_LAUNCHER=/opt/cache/bin/sccache
# Add jni.h for java host build
COPY ./common/install_jni.sh install_jni.sh
COPY ./java/jni.h jni.h
ADD ./common/install_jni.sh install_jni.sh
ADD ./java/jni.h jni.h
RUN bash ./install_jni.sh && rm install_jni.sh
# Install Open MPI for CUDA
COPY ./common/install_openmpi.sh install_openmpi.sh
ADD ./common/install_openmpi.sh install_openmpi.sh
RUN if [ -n "${CUDA_VERSION}" ]; then bash install_openmpi.sh; fi
RUN rm install_openmpi.sh
@ -118,14 +102,9 @@ COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm
# Install CUDNN
ARG CUDNN_VERSION
ARG CUDA_VERSION
COPY ./common/install_cudnn.sh install_cudnn.sh
ADD ./common/install_cudnn.sh install_cudnn.sh
RUN if [ "${CUDNN_VERSION}" -eq 8 ]; then bash install_cudnn.sh; fi
RUN rm install_cudnn.sh
# Delete /usr/local/cuda-11.X/cuda-11.X symlinks
RUN if [ -h /usr/local/cuda-11.6/cuda-11.6 ]; then rm /usr/local/cuda-11.6/cuda-11.6; fi
RUN if [ -h /usr/local/cuda-11.7/cuda-11.7 ]; then rm /usr/local/cuda-11.7/cuda-11.7; fi
USER jenkins
CMD ["bash"]

View File

@ -12,61 +12,58 @@ ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
# Install common dependencies (so that this step can be cached separately)
ARG EC2
COPY ./common/install_base.sh install_base.sh
ADD ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
# Install clang
ARG LLVMDEV
ARG CLANG_VERSION
COPY ./common/install_clang.sh install_clang.sh
ADD ./common/install_clang.sh install_clang.sh
RUN bash ./install_clang.sh && rm install_clang.sh
# Install user
COPY ./common/install_user.sh install_user.sh
ADD ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install conda and other packages (e.g., numpy, pytest)
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
COPY requirements-ci.txt /opt/conda/requirements-ci.txt
COPY ./common/install_conda.sh install_conda.sh
ADD requirements-ci.txt /opt/conda/requirements-ci.txt
ADD ./common/install_conda.sh install_conda.sh
RUN bash ./install_conda.sh && rm install_conda.sh
RUN rm /opt/conda/requirements-ci.txt
# Install gcc
ARG GCC_VERSION
COPY ./common/install_gcc.sh install_gcc.sh
ADD ./common/install_gcc.sh install_gcc.sh
RUN bash ./install_gcc.sh && rm install_gcc.sh
# (optional) Install protobuf for ONNX
ARG PROTOBUF
COPY ./common/install_protobuf.sh install_protobuf.sh
ADD ./common/install_protobuf.sh install_protobuf.sh
RUN if [ -n "${PROTOBUF}" ]; then bash ./install_protobuf.sh; fi
RUN rm install_protobuf.sh
ENV INSTALLED_PROTOBUF ${PROTOBUF}
# (optional) Install database packages like LMDB and LevelDB
ARG DB
COPY ./common/install_db.sh install_db.sh
ADD ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV and ffmpeg
ARG VISION
COPY ./common/install_vision.sh install_vision.sh
ADD ./common/install_vision.sh install_vision.sh
RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh
ENV INSTALLED_VISION ${VISION}
# Install rocm
ARG ROCM_VERSION
COPY ./common/install_rocm.sh install_rocm.sh
ADD ./common/install_rocm.sh install_rocm.sh
RUN bash ./install_rocm.sh
RUN rm install_rocm.sh
COPY ./common/install_rocm_magma.sh install_rocm_magma.sh
RUN bash ./install_rocm_magma.sh
RUN rm install_rocm_magma.sh
ENV PATH /opt/rocm/bin:$PATH
ENV PATH /opt/rocm/hcc/bin:$PATH
ENV PATH /opt/rocm/hip/bin:$PATH
@ -78,18 +75,18 @@ ENV LC_ALL C.UTF-8
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
COPY ./common/install_cmake.sh install_cmake.sh
ADD ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# (optional) Install non-default Ninja version
ARG NINJA_VERSION
COPY ./common/install_ninja.sh install_ninja.sh
ADD ./common/install_ninja.sh install_ninja.sh
RUN if [ -n "${NINJA_VERSION}" ]; then bash ./install_ninja.sh; fi
RUN rm install_ninja.sh
# Install ccache/sccache (do this last, so we get priority in PATH)
COPY ./common/install_cache.sh install_cache.sh
ADD ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
RUN bash ./install_cache.sh && rm install_cache.sh

View File

@ -6,86 +6,67 @@ ARG UBUNTU_VERSION
ENV DEBIAN_FRONTEND noninteractive
ARG CLANG_VERSION
# Install common dependencies (so that this step can be cached separately)
ARG EC2
COPY ./common/install_base.sh install_base.sh
ADD ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
# Install clang
ARG LLVMDEV
COPY ./common/install_clang.sh install_clang.sh
ARG CLANG_VERSION
ADD ./common/install_clang.sh install_clang.sh
RUN bash ./install_clang.sh && rm install_clang.sh
# (optional) Install thrift.
ARG THRIFT
COPY ./common/install_thrift.sh install_thrift.sh
ADD ./common/install_thrift.sh install_thrift.sh
RUN if [ -n "${THRIFT}" ]; then bash ./install_thrift.sh; fi
RUN rm install_thrift.sh
ENV INSTALLED_THRIFT ${THRIFT}
# Install user
COPY ./common/install_user.sh install_user.sh
ADD ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install katex
ARG KATEX
COPY ./common/install_docs_reqs.sh install_docs_reqs.sh
RUN bash ./install_docs_reqs.sh && rm install_docs_reqs.sh
ADD ./common/install_katex.sh install_katex.sh
RUN bash ./install_katex.sh && rm install_katex.sh
# Install conda and other packages (e.g., numpy, pytest)
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
COPY requirements-ci.txt /opt/conda/requirements-ci.txt
COPY ./common/install_conda.sh install_conda.sh
ADD requirements-ci.txt /opt/conda/requirements-ci.txt
ADD ./common/install_conda.sh install_conda.sh
RUN bash ./install_conda.sh && rm install_conda.sh
RUN rm /opt/conda/requirements-ci.txt
# Install gcc
ARG GCC_VERSION
COPY ./common/install_gcc.sh install_gcc.sh
ADD ./common/install_gcc.sh install_gcc.sh
RUN bash ./install_gcc.sh && rm install_gcc.sh
# Install lcov for C++ code coverage
COPY ./common/install_lcov.sh install_lcov.sh
ADD ./common/install_lcov.sh install_lcov.sh
RUN bash ./install_lcov.sh && rm install_lcov.sh
# Install cuda and cudnn
ARG CUDA_VERSION
RUN wget -q https://raw.githubusercontent.com/pytorch/builder/main/common/install_cuda.sh -O install_cuda.sh
RUN bash ./install_cuda.sh ${CUDA_VERSION} && rm install_cuda.sh
ENV DESIRED_CUDA ${CUDA_VERSION}
ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:$PATH
# (optional) Install UCC
ARG UCX_COMMIT
ARG UCC_COMMIT
ENV UCX_COMMIT $UCX_COMMIT
ENV UCC_COMMIT $UCC_COMMIT
ENV UCX_HOME /usr
ENV UCC_HOME /usr
ADD ./common/install_ucc.sh install_ucc.sh
RUN if [ -n "${UCX_COMMIT}" ] && [ -n "${UCC_COMMIT}" ]; then bash ./install_ucc.sh; fi
RUN rm install_ucc.sh
# (optional) Install protobuf for ONNX
ARG PROTOBUF
COPY ./common/install_protobuf.sh install_protobuf.sh
ADD ./common/install_protobuf.sh install_protobuf.sh
RUN if [ -n "${PROTOBUF}" ]; then bash ./install_protobuf.sh; fi
RUN rm install_protobuf.sh
ENV INSTALLED_PROTOBUF ${PROTOBUF}
# (optional) Install database packages like LMDB and LevelDB
ARG DB
COPY ./common/install_db.sh install_db.sh
ADD ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV and ffmpeg
ARG VISION
COPY ./common/install_vision.sh install_vision.sh
ADD ./common/install_vision.sh install_vision.sh
RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh
ENV INSTALLED_VISION ${VISION}
@ -94,9 +75,9 @@ ENV INSTALLED_VISION ${VISION}
ARG ANDROID
ARG ANDROID_NDK
ARG GRADLE_VERSION
COPY ./common/install_android.sh install_android.sh
COPY ./android/AndroidManifest.xml AndroidManifest.xml
COPY ./android/build.gradle build.gradle
ADD ./common/install_android.sh install_android.sh
ADD ./android/AndroidManifest.xml AndroidManifest.xml
ADD ./android/build.gradle build.gradle
RUN if [ -n "${ANDROID}" ]; then bash ./install_android.sh; fi
RUN rm install_android.sh
RUN rm AndroidManifest.xml
@ -105,53 +86,42 @@ ENV INSTALLED_ANDROID ${ANDROID}
# (optional) Install Vulkan SDK
ARG VULKAN_SDK_VERSION
COPY ./common/install_vulkan_sdk.sh install_vulkan_sdk.sh
ADD ./common/install_vulkan_sdk.sh install_vulkan_sdk.sh
RUN if [ -n "${VULKAN_SDK_VERSION}" ]; then bash ./install_vulkan_sdk.sh; fi
RUN rm install_vulkan_sdk.sh
# (optional) Install swiftshader
ARG SWIFTSHADER
COPY ./common/install_swiftshader.sh install_swiftshader.sh
ADD ./common/install_swiftshader.sh install_swiftshader.sh
RUN if [ -n "${SWIFTSHADER}" ]; then bash ./install_swiftshader.sh; fi
RUN rm install_swiftshader.sh
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
COPY ./common/install_cmake.sh install_cmake.sh
ADD ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# (optional) Install non-default Ninja version
ARG NINJA_VERSION
COPY ./common/install_ninja.sh install_ninja.sh
ADD ./common/install_ninja.sh install_ninja.sh
RUN if [ -n "${NINJA_VERSION}" ]; then bash ./install_ninja.sh; fi
RUN rm install_ninja.sh
COPY ./common/install_openssl.sh install_openssl.sh
ADD ./common/install_openssl.sh install_openssl.sh
RUN bash ./install_openssl.sh
ENV OPENSSL_ROOT_DIR /opt/openssl
ENV OPENSSL_DIR /opt/openssl
RUN rm install_openssl.sh
# Install ccache/sccache (do this last, so we get priority in PATH)
COPY ./common/install_cache.sh install_cache.sh
ADD ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
# See https://github.com/pytorch/pytorch/issues/82174
# TODO(sdym@fb.com):
# check if this is needed after the migration off Xenial is complete
ENV CARGO_NET_GIT_FETCH_WITH_CLI true
RUN bash ./install_cache.sh && rm install_cache.sh
# Add jni.h for java host build
COPY ./common/install_jni.sh install_jni.sh
COPY ./java/jni.h jni.h
ADD ./common/install_jni.sh install_jni.sh
ADD ./java/jni.h jni.h
RUN bash ./install_jni.sh && rm install_jni.sh
# Install Open MPI for CUDA
COPY ./common/install_openmpi.sh install_openmpi.sh
RUN if [ -n "${CUDA_VERSION}" ]; then bash install_openmpi.sh; fi
RUN rm install_openmpi.sh
# Include BUILD_ENVIRONMENT environment variable in image
ARG BUILD_ENVIRONMENT
ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
@ -159,10 +129,5 @@ ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
# Install LLVM dev version (Defined in the pytorch/builder github repository)
COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm
# AWS specific CUDA build guidance
ENV TORCH_CUDA_ARCH_LIST Maxwell
ENV TORCH_NVCC_FLAGS "-Xfatbin -compress-all"
ENV CUDA_PATH /usr/local/cuda
USER jenkins
CMD ["bash"]

View File

@ -14,9 +14,6 @@ import cimodel.data.simple.docker_definitions
import cimodel.data.simple.mobile_definitions
import cimodel.data.simple.nightly_ios
import cimodel.data.simple.anaconda_prune_defintions
import cimodel.data.simple.macos_definitions
import cimodel.data.simple.upload_test_stats_definition
import cimodel.data.simple.ios_definitions
import cimodel.lib.miniutils as miniutils
import cimodel.lib.miniyaml as miniyaml
@ -73,7 +70,6 @@ class Header(object):
for line in filter(None, lines):
output_filehandle.write(line + "\n")
def _for_all_items(items, functor) -> None:
if isinstance(items, list):
for item in items:
@ -82,7 +78,6 @@ def _for_all_items(items, functor) -> None:
item_type, item = next(iter(items.items()))
functor(item_type, item)
def filter_master_only_jobs(items):
def _is_main_or_master_item(item):
filters = item.get('filters', None)
@ -121,7 +116,6 @@ def filter_master_only_jobs(items):
_for_all_items(items, _save_requires_if_master)
return _do_filtering(items)
def generate_required_docker_images(items):
required_docker_images = set()
@ -137,15 +131,11 @@ def generate_required_docker_images(items):
_for_all_items(items, _requires_docker_image)
return required_docker_images
def gen_build_workflows_tree():
build_workflows_functions = [
cimodel.data.simple.mobile_definitions.get_workflow_jobs,
cimodel.data.simple.nightly_ios.get_workflow_jobs,
cimodel.data.simple.anaconda_prune_defintions.get_workflow_jobs,
cimodel.data.simple.macos_definitions.get_new_workflow_jobs,
cimodel.data.simple.upload_test_stats_definition.get_workflow_job,
cimodel.data.simple.ios_definitions.get_workflow_jobs,
]
build_jobs = [f() for f in build_workflows_functions]
build_jobs.extend(

View File

@ -62,7 +62,7 @@ git --no-pager log --max-count 1
popd
# Clone the Builder master repo
retry git clone -q https://github.com/pytorch/builder.git -b release/1.13 "$BUILDER_ROOT"
retry git clone -q https://github.com/pytorch/builder.git -b release/1.12 "$BUILDER_ROOT"
pushd "$BUILDER_ROOT"
echo "Using builder from "
git --no-pager log --max-count 1

View File

@ -1,19 +1,30 @@
#!/bin/bash
set -ex -o pipefail
if ! [ "$IOS_PLATFORM" == "SIMULATOR" ]; then
exit 0
fi
echo ""
echo "DIR: $(pwd)"
PROJ_ROOT=/Users/distiller/project
cd ${PROJ_ROOT}/ios/TestApp
# install fastlane
sudo gem install bundler && bundle install
# install certificates
echo "${IOS_CERT_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_root_cert
bundle exec fastlane install_dev_cert
# install the provisioning profile
PROFILE=PyTorch_CI_2022.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
echo "${IOS_SIGN_KEY_2022}" >> cert.txt
base64 --decode cert.txt -o ${PROFILE}
rm cert.txt
# run the ruby build script
if ! [ -x "$(command -v xcodebuild)" ]; then
echo 'Error: xcodebuild is not installed.'
exit 1
fi
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM}
PROFILE=PyTorch_CI_2022
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} -c ${PROFILE} -t ${IOS_DEV_TEAM_ID}
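The xcodebuild guard above combines `command -v` with an executability check. A portable sketch of the same pattern (the `require_cmd` helper name and the probed commands are illustrative):

```shell
# Sketch of the 'command -v' presence check above, assuming a POSIX shell.
# The helper name and probed command names are illustrative.
require_cmd() {
  if ! [ -x "$(command -v "$1")" ]; then
    echo "Error: $1 is not installed." >&2
    return 1
  fi
}
require_cmd sh && echo "sh is available"
require_cmd surely-missing-tool || echo "missing tool detected"
```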

View File

@ -33,7 +33,7 @@ fi
cp ${PROJ_ROOT}/LICENSE ${ZIP_DIR}/
# zip the library
export DATE="$(date -u +%Y%m%d)"
export IOS_NIGHTLY_BUILD_VERSION="1.13.0.${DATE}"
export IOS_NIGHTLY_BUILD_VERSION="1.12.0.${DATE}"
if [ "${BUILD_LITE_INTERPRETER}" == "1" ]; then
# libtorch_lite_ios_nightly_1.11.0.20210810.zip
ZIPFILE="libtorch_lite_ios_nightly_${IOS_NIGHTLY_BUILD_VERSION}.zip"

View File

@ -53,7 +53,9 @@ if [[ "\$python_nodot" = *39* ]]; then
NUMPY_PIN=">=1.20"
fi
if [[ "$DESIRED_CUDA" == "cu116" ]]; then
EXTRA_CONDA_FLAGS="-c=conda-forge"
fi
# Move debug wheels out of the package dir so they don't get installed
mkdir -p /tmp/debug_final_pkgs
@ -86,14 +88,13 @@ if [[ "$PACKAGE_TYPE" == conda ]]; then
if [[ "$DESIRED_CUDA" == 'cpu' ]]; then
retry conda install -c pytorch -y cpuonly
else
cu_ver="${DESIRED_CUDA:2:2}.${DESIRED_CUDA:4}"
CUDA_PACKAGE="cudatoolkit"
if [[ "$DESIRED_CUDA" == "cu116" || "$DESIRED_CUDA" == "cu117" ]]; then
CUDA_PACKAGE="cuda"
# DESIRED_CUDA is in format cu90 or cu102
if [[ "${#DESIRED_CUDA}" == 4 ]]; then
cu_ver="${DESIRED_CUDA:2:1}.${DESIRED_CUDA:3}"
else
cu_ver="${DESIRED_CUDA:2:2}.${DESIRED_CUDA:4}"
fi
retry conda install \${EXTRA_CONDA_FLAGS} -yq -c nvidia -c pytorch "\${CUDA_PACKAGE}=\${cu_ver}"
retry conda install \${EXTRA_CONDA_FLAGS} -yq -c nvidia -c pytorch "cudatoolkit=\${cu_ver}"
fi
conda install \${EXTRA_CONDA_FLAGS} -y "\$pkg" --offline
)
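The cu90/cu116 handling above slices `$DESIRED_CUDA` by string length to recover a dotted toolkit version: four characters means a one-digit major, longer means a two-digit major. A standalone sketch (the `cu_ver_of` helper name is ours):

```shell
# Sketch of the DESIRED_CUDA -> cu_ver parsing above, assuming bash.
# Four characters (cu90) means a one-digit major version; longer strings
# (cu102, cu116) mean a two-digit major version.
cu_ver_of() {
  local desired=$1
  if [ "${#desired}" -eq 4 ]; then
    echo "${desired:2:1}.${desired:3}"
  else
    echo "${desired:2:2}.${desired:4}"
  fi
}
cu_ver_of cu90    # -> 9.0
cu_ver_of cu102   # -> 10.2
cu_ver_of cu116   # -> 11.6
```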

View File

@ -4,7 +4,7 @@ set -eux -o pipefail
source "${BINARY_ENV_FILE:-/Users/distiller/project/env}"
mkdir -p "$PYTORCH_FINAL_PACKAGE_DIR"
if [[ -z "${GITHUB_ACTIONS:-}" ]]; then
if [[ -z "${IS_GHA:-}" ]]; then
export PATH="${workdir:-${HOME}}/miniconda/bin:${PATH}"
fi

View File

@ -5,7 +5,7 @@ export TZ=UTC
tagged_version() {
# Grabs version from either the env variable CIRCLE_TAG
# or the pytorch git described version
if [[ "$OSTYPE" == "msys" && -z "${GITHUB_ACTIONS:-}" ]]; then
if [[ "$OSTYPE" == "msys" && -z "${IS_GHA:-}" ]]; then
GIT_DIR="${workdir}/p/.git"
else
GIT_DIR="${workdir}/pytorch/.git"
@ -23,12 +23,50 @@ tagged_version() {
fi
}
envfile=${BINARY_ENV_FILE:-/tmp/env}
if [[ -n "${PYTORCH_ROOT}" ]]; then
workdir=$(dirname "${PYTORCH_ROOT}")
# These are only relevant for CircleCI
# TODO: Remove these later once migrated fully to GHA
if [[ -z ${IS_GHA:-} ]]; then
# We need to write an envfile to persist these variables to the following
# steps, but the location of the envfile depends on the circleci executor
if [[ "$(uname)" == Darwin ]]; then
# macos executor (builds and tests)
workdir="/Users/distiller/project"
elif [[ "$OSTYPE" == "msys" ]]; then
# windows executor (builds and tests)
workdir="/c/w"
elif [[ -d "/home/circleci/project" ]]; then
# machine executor (binary tests)
workdir="/home/circleci/project"
else
# docker executor (binary builds)
workdir="/"
fi
envfile="$workdir/env"
touch "$envfile"
chmod +x "$envfile"
# Parse the BUILD_ENVIRONMENT into package type, python, and cuda
configs=($BUILD_ENVIRONMENT)
export PACKAGE_TYPE="${configs[0]}"
export DESIRED_PYTHON="${configs[1]}"
export DESIRED_CUDA="${configs[2]}"
if [[ "${OSTYPE}" == "msys" ]]; then
export DESIRED_DEVTOOLSET=""
export LIBTORCH_CONFIG="${configs[3]:-}"
if [[ "$LIBTORCH_CONFIG" == 'debug' ]]; then
export DEBUG=1
fi
else
export DESIRED_DEVTOOLSET="${configs[3]:-}"
fi
else
# docker executor (binary builds)
workdir="/"
envfile=${BINARY_ENV_FILE:-/tmp/env}
if [[ -n "${PYTORCH_ROOT}" ]]; then
workdir=$(dirname "${PYTORCH_ROOT}")
else
# docker executor (binary builds)
workdir="/"
fi
fi
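The CircleCI branch above recovers the package type, Python version, and CUDA tag by word-splitting `$BUILD_ENVIRONMENT` into an array. A standalone sketch under bash (the example value is illustrative, not a real CI config):

```shell
# Sketch of the BUILD_ENVIRONMENT word-splitting above, assuming bash.
# "conda 3.9 cu102" is an illustrative value.
BUILD_ENVIRONMENT="conda 3.9 cu102"
configs=($BUILD_ENVIRONMENT)        # unquoted expansion word-splits into an array
PACKAGE_TYPE="${configs[0]}"
DESIRED_PYTHON="${configs[1]}"
DESIRED_CUDA="${configs[2]}"
echo "$PACKAGE_TYPE / $DESIRED_PYTHON / $DESIRED_CUDA"   # -> conda / 3.9 / cu102
```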
if [[ "$PACKAGE_TYPE" == 'libtorch' ]]; then
@ -59,7 +97,7 @@ PIP_UPLOAD_FOLDER='nightly/'
# We put this here so that OVERRIDE_PACKAGE_VERSION below can read from it
export DATE="$(date -u +%Y%m%d)"
#TODO: We should be pulling semver version from the base version.txt
BASE_BUILD_VERSION="1.13.0.dev$DATE"
BASE_BUILD_VERSION="1.12.0.dev$DATE"
# Change BASE_BUILD_VERSION to git tag when on a git tag
# Use 'git -C' to make doubly sure we're in the correct directory for checking
# the git tag
@@ -76,11 +114,6 @@ if [[ "$(uname)" == 'Darwin' ]] || [[ "$PACKAGE_TYPE" == conda ]]; then
else
export PYTORCH_BUILD_VERSION="${BASE_BUILD_VERSION}+$DESIRED_CUDA"
fi
if [[ -n "${PYTORCH_EXTRA_INSTALL_REQUIREMENTS:-}" ]]; then
export PYTORCH_BUILD_VERSION="${PYTORCH_BUILD_VERSION}-with-pypi-cudnn"
fi
export PYTORCH_BUILD_NUMBER=1
@@ -129,9 +162,9 @@ if [[ "${OSTYPE}" == "msys" ]]; then
else
export DESIRED_DEVTOOLSET="${DESIRED_DEVTOOLSET:-}"
fi
export PYTORCH_EXTRA_INSTALL_REQUIREMENTS="${PYTORCH_EXTRA_INSTALL_REQUIREMENTS:-}"
export DATE="$DATE"
export NIGHTLIES_DATE_PREAMBLE=1.13.0.dev
export NIGHTLIES_DATE_PREAMBLE=1.12.0.dev
export PYTORCH_BUILD_VERSION="$PYTORCH_BUILD_VERSION"
export PYTORCH_BUILD_NUMBER="$PYTORCH_BUILD_NUMBER"
export OVERRIDE_PACKAGE_VERSION="$PYTORCH_BUILD_VERSION"
@@ -167,7 +200,7 @@ if [[ "$(uname)" != Darwin ]]; then
EOL
fi
if [[ -z "${GITHUB_ACTIONS:-}" ]]; then
if [[ -z "${IS_GHA:-}" ]]; then
cat >>"$envfile" <<EOL
export workdir="$workdir"
export MAC_PACKAGE_WORK_DIR="$workdir"
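The hunk above restores the CircleCI-era logic that word-splits `BUILD_ENVIRONMENT` into a package type, Python version, and CUDA version. A minimal sketch of that parsing — the `BUILD_ENVIRONMENT` value below is illustrative, not taken from any real CI job:

```shell
#!/usr/bin/env bash
# Illustrative sketch of the BUILD_ENVIRONMENT parsing above.
# The example value is hypothetical; real values come from the CI job config.
BUILD_ENVIRONMENT="manywheel 3.8 cu116 devtoolset7"

configs=($BUILD_ENVIRONMENT)              # word-split on whitespace into a bash array
PACKAGE_TYPE="${configs[0]}"              # manywheel
DESIRED_PYTHON="${configs[1]}"            # 3.8
DESIRED_CUDA="${configs[2]}"              # cu116
DESIRED_DEVTOOLSET="${configs[3]:-}"      # devtoolset7, or empty if absent

echo "$PACKAGE_TYPE $DESIRED_PYTHON $DESIRED_CUDA $DESIRED_DEVTOOLSET"
```

The `:-` default on the fourth field matches the script's behavior on non-Windows builds, where the devtoolset component is optional.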

View File

@@ -14,12 +14,6 @@ UPLOAD_CHANNEL=${UPLOAD_CHANNEL:-nightly}
UPLOAD_SUBFOLDER=${UPLOAD_SUBFOLDER:-cpu}
UPLOAD_BUCKET="s3://pytorch"
BACKUP_BUCKET="s3://pytorch-backup"
BUILD_NAME=${BUILD_NAME:-}
# this is temporary change to upload pypi-cudnn builds to separate folder
if [[ ${BUILD_NAME} == *with-pypi-cudnn* ]]; then
UPLOAD_SUBFOLDER="${UPLOAD_SUBFOLDER}_pypi_cudnn"
fi
DRY_RUN=${DRY_RUN:-enabled}
# Don't actually do work unless explicit
@@ -30,11 +24,6 @@ if [[ "${DRY_RUN}" = "disabled" ]]; then
AWS_S3_CP="aws s3 cp"
fi
# Sleep 2 minutes between retries for conda upload
retry () {
"$@" || (sleep 5m && "$@") || (sleep 5m && "$@") || (sleep 5m && "$@") || (sleep 5m && "$@")
}
do_backup() {
local backup_dir
backup_dir=$1
@@ -48,14 +37,13 @@ do_backup() {
conda_upload() {
(
set -x
retry \
${ANACONDA} \
upload \
${PKG_DIR}/*.tar.bz2 \
-u "pytorch-${UPLOAD_CHANNEL}" \
--label main \
--no-progress \
--force
upload \
${PKG_DIR}/*.tar.bz2 \
-u "pytorch-${UPLOAD_CHANNEL}" \
--label main \
--no-progress \
--force
)
}
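The diff above drops a `retry` wrapper that re-ran the conda upload up to four more times with five-minute sleeps between attempts. A small sketch of the same idiom — the short sleeps and the `flaky_cmd` helper are illustrative only:

```shell
#!/usr/bin/env bash
# Sketch of the retry idiom removed above; the real script sleeps 5m between tries.
retry () {
  "$@" || (sleep 1 && "$@") || (sleep 1 && "$@")
}

# flaky_cmd (hypothetical): fails on its first call, succeeds afterwards,
# tracking attempts in a temp file so subshell invocations share state.
attempts_file=$(mktemp)
echo 0 > "$attempts_file"
flaky_cmd () {
  local n
  n=$(($(cat "$attempts_file") + 1))
  echo "$n" > "$attempts_file"
  [ "$n" -ge 2 ]
}

retry flaky_cmd && echo "succeeded after $(cat "$attempts_file") attempts"
```

Because each fallback runs in a subshell, state that must survive across attempts (here, the counter) has to live outside the shell process — the same reason the real wrapper is safe around an idempotent upload command.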

View File

@@ -6,7 +6,7 @@ mkdir -p "$PYTORCH_FINAL_PACKAGE_DIR"
export CUDA_VERSION="${DESIRED_CUDA/cu/}"
export USE_SCCACHE=1
export SCCACHE_BUCKET=ossci-compiler-cache
export SCCACHE_BUCKET=ossci-compiler-cache-windows
export SCCACHE_IGNORE_SERVER_IO_ERROR=1
export VC_YEAR=2019

View File

@@ -78,7 +78,7 @@ if [[ "${BUILD_ENVIRONMENT}" == *-gradle-build-only-x86_32* ]]; then
GRADLE_PARAMS+=" -PABI_FILTERS=x86"
fi
if [ -n "${GRADLE_OFFLINE:-}" ]; then
if [ -n "{GRADLE_OFFLINE:-}" ]; then
GRADLE_PARAMS+=" --offline"
fi

View File

@@ -51,6 +51,8 @@ git clone https://github.com/pytorch/cppdocs
set -ex
sudo apt-get -y install doxygen
# Generate ATen files
pushd "${pt_checkout}"
pip install -r requirements.txt
@@ -98,9 +100,6 @@ git commit -m "Generate C++ docs from pytorch/pytorch@${GITHUB_SHA}" || true
git status
if [[ "${WITH_PUSH:-}" == true ]]; then
# push to a temp branch first to trigger CLA check and satisfy branch protections
git push -u origin HEAD:pytorchbot/temp-branch-cpp -f
sleep 30
git push -u origin
fi

View File

@@ -1,47 +0,0 @@
#!/bin/bash
# =================== The following code **should** be executed inside Docker container ===================
# Install dependencies
sudo apt-get -y update
sudo apt-get -y install expect-dev
# This is where the local pytorch install in the docker image is located
pt_checkout="/var/lib/jenkins/workspace"
source "$pt_checkout/.jenkins/pytorch/common_utils.sh"
echo "functorch_doc_push_script.sh: Invoked with $*"
set -ex
version=${DOCS_VERSION:-nightly}
echo "version: $version"
# Build functorch docs
pushd $pt_checkout/functorch/docs
pip -q install -r requirements.txt
make html
popd
git clone https://github.com/pytorch/functorch -b gh-pages --depth 1 functorch_ghpages
pushd functorch_ghpages
if [ $version == "master" ]; then
version=nightly
fi
git rm -rf "$version" || true
mv "$pt_checkout/functorch/docs/build/html" "$version"
git add "$version" || true
git status
git config user.email "soumith+bot@pytorch.org"
git config user.name "pytorchbot"
# If there aren't changes, don't make a commit; push is no-op
git commit -m "Generate Python docs from pytorch/pytorch@${GITHUB_SHA}" || true
git status
if [[ "${WITH_PUSH:-}" == true ]]; then
git push -u origin gh-pages
fi
popd
# =================== The above code **should** be executed inside Docker container ===================

View File

@@ -135,9 +135,6 @@ git commit -m "Generate Python docs from pytorch/pytorch@${GITHUB_SHA}" || true
git status
if [[ "${WITH_PUSH:-}" == true ]]; then
# push to a temp branch first to trigger CLA check and satisfy branch protections
git push -u origin HEAD:pytorchbot/temp-branch-py -f
sleep 30
git push -u origin "${branch}"
fi

View File

@@ -32,7 +32,7 @@ if ! command -v aws >/dev/null; then
fi
if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
DRIVER_FN="NVIDIA-Linux-x86_64-515.57.run"
DRIVER_FN="NVIDIA-Linux-x86_64-510.60.02.run"
wget "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN"
sudo /bin/bash "$DRIVER_FN" -s --no-drm || (sudo cat /var/log/nvidia-installer.log && false)
nvidia-smi
@@ -66,6 +66,7 @@ add_to_env_file() {
esac
}
add_to_env_file IN_CI 1
add_to_env_file CI_MASTER "${CI_MASTER:-}"
add_to_env_file COMMIT_SOURCE "${CIRCLE_BRANCH:-}"
add_to_env_file BUILD_ENVIRONMENT "${BUILD_ENVIRONMENT}"

View File

@@ -14,11 +14,6 @@ case ${CUDA_VERSION} in
cuda_installer_name="cuda_11.6.0_511.23_windows"
cuda_install_packages="thrust_11.6 nvcc_11.6 cuobjdump_11.6 nvprune_11.6 nvprof_11.6 cupti_11.6 cublas_11.6 cublas_dev_11.6 cudart_11.6 cufft_11.6 cufft_dev_11.6 curand_11.6 curand_dev_11.6 cusolver_11.6 cusolver_dev_11.6 cusparse_11.6 cusparse_dev_11.6 npp_11.6 npp_dev_11.6 nvrtc_11.6 nvrtc_dev_11.6 nvml_dev_11.6"
;;
11.7)
cuda_installer_name="cuda_11.7.0_516.01_windows"
cuda_install_packages="thrust_11.7 nvcc_11.7 cuobjdump_11.7 nvprune_11.7 nvprof_11.7 cupti_11.7 cublas_11.7 cublas_dev_11.7 cudart_11.7 cufft_11.7 cufft_dev_11.7 curand_11.7 curand_dev_11.7 cusolver_11.7 cusolver_dev_11.7 cusparse_11.7 cusparse_dev_11.7 npp_11.7 npp_dev_11.7 nvrtc_11.7 nvrtc_dev_11.7 nvml_dev_11.7"
;;
*)
echo "CUDA_VERSION $CUDA_VERSION is not supported yet"
exit 1

View File

@@ -16,10 +16,6 @@ case ${CUDA_VERSION} in
# Use cudnn8.3 with hard-coded cuda11.5 version
cudnn_file_name="cudnn-windows-x86_64-8.3.2.44_cuda11.5-archive"
;;
11.7)
# Use cudnn8.3 with hard-coded cuda11.5 version
cudnn_file_name="cudnn-windows-x86_64-8.5.0.96_cuda11-archive"
;;
*)
echo "CUDA_VERSION: ${CUDA_VERSION} not supported yet"
exit 1

View File

@@ -132,3 +132,43 @@ commands:
else
echo "This is not a pull request, skipping..."
fi
upload_binary_size_for_android_build:
description: "Upload binary size data for Android build"
parameters:
build_type:
type: string
default: ""
artifacts:
type: string
default: ""
steps:
- run:
name: "Binary Size - Install Dependencies"
no_output_timeout: "5m"
command: |
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
retry pip3 install requests
- run:
name: "Binary Size - Untar Artifacts"
no_output_timeout: "5m"
command: |
# The artifact file is created inside docker container, which contains the result binaries.
# Now unpackage it into the project folder. The subsequent script will scan project folder
# to locate result binaries and report their sizes.
# If artifact file is not provided it assumes that the project folder has been mounted in
# the docker during build and already contains the result binaries, so this step can be skipped.
export ARTIFACTS="<< parameters.artifacts >>"
if [ -n "${ARTIFACTS}" ]; then
tar xf "${ARTIFACTS}" -C ~/project
fi
- run:
name: "Binary Size - Upload << parameters.build_type >>"
no_output_timeout: "5m"
command: |
cd ~/project
export ANDROID_BUILD_TYPE="<< parameters.build_type >>"
export COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
python3 -m tools.stats.upload_binary_size_to_scuba android

View File

@@ -1,4 +1,243 @@
jobs:
binary_linux_build:
<<: *binary_linux_build_params
steps:
- checkout
- calculate_docker_image_tag
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Build
no_output_timeout: "1h"
command: |
source "/pytorch/.circleci/scripts/binary_linux_build.sh"
# Preserve build log
if [ -f /pytorch/build/.ninja_log ]; then
cp /pytorch/build/.ninja_log /final_pkgs
fi
- run:
name: Output binary sizes
no_output_timeout: "1m"
command: |
ls -lah /final_pkgs
- run:
name: upload build & binary data
no_output_timeout: "5m"
command: |
source /env
cd /pytorch && export COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
python3 -mpip install requests && \
SCRIBE_GRAPHQL_ACCESS_TOKEN=${SCRIBE_GRAPHQL_ACCESS_TOKEN} \
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- persist_to_workspace:
root: /
paths: final_pkgs
- store_artifacts:
path: /final_pkgs
# This should really just be another step of the binary_linux_build job above.
# This isn't possible right now b/c the build job uses the docker executor
# (otherwise they'd be really really slow) but this one uses the macine
# executor (b/c we have to run the docker with --runtime=nvidia and we can't do
# that on the docker executor)
binary_linux_test:
<<: *binary_linux_test_upload_params
machine:
image: ubuntu-2004:202104-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- checkout
- attach_workspace:
at: /home/circleci/project
- setup_linux_system_environment
- setup_ci_environment
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Prepare test code
no_output_timeout: "1h"
command: .circleci/scripts/binary_linux_test.sh
- run:
<<: *binary_run_in_docker
binary_upload:
parameters:
package_type:
type: string
description: "What type of package we are uploading (eg. wheel, libtorch, conda)"
default: "wheel"
upload_subfolder:
type: string
description: "What subfolder to put our package into (eg. cpu, cudaX.Y, etc.)"
default: "cpu"
docker:
- image: continuumio/miniconda3
environment:
- DRY_RUN: disabled
- PACKAGE_TYPE: "<< parameters.package_type >>"
- UPLOAD_SUBFOLDER: "<< parameters.upload_subfolder >>"
steps:
- attach_workspace:
at: /tmp/workspace
- checkout
- designate_upload_channel
- run:
name: Install dependencies
no_output_timeout: "1h"
command: |
conda install -yq anaconda-client
pip install -q awscli
- run:
name: Do upload
no_output_timeout: "1h"
command: |
AWS_ACCESS_KEY_ID="${PYTORCH_BINARY_AWS_ACCESS_KEY_ID}" \
AWS_SECRET_ACCESS_KEY="${PYTORCH_BINARY_AWS_SECRET_ACCESS_KEY}" \
ANACONDA_API_TOKEN="${CONDA_PYTORCHBOT_TOKEN}" \
.circleci/scripts/binary_upload.sh
# Nighlty build smoke tests defaults
# These are the second-round smoke tests. These make sure that the binaries are
# correct from a user perspective, testing that they exist from the cloud are
# are runnable. Note that the pytorch repo is never cloned into these jobs
##############################################################################
smoke_linux_test:
<<: *binary_linux_test_upload_params
machine:
image: ubuntu-2004:202104-01
steps:
- checkout
- calculate_docker_image_tag
- setup_linux_system_environment
- setup_ci_environment
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Test
no_output_timeout: "1h"
command: |
set -ex
cat >/home/circleci/project/ci_test_script.sh \<<EOL
# The following code will be executed inside Docker container
set -eux -o pipefail
/builder/smoke_test.sh
# The above code will be executed inside Docker container
EOL
- run:
<<: *binary_run_in_docker
smoke_mac_test:
<<: *binary_linux_test_upload_params
macos:
xcode: "12.0"
steps:
- checkout
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- brew_update
- run:
<<: *binary_install_miniconda
- run:
name: Build
no_output_timeout: "1h"
command: |
set -ex
source "/Users/distiller/project/env"
export "PATH=$workdir/miniconda/bin:$PATH"
# TODO unbuffer and ts this, but it breaks cause miniconda overwrites
# tclsh. But unbuffer and ts aren't that important so they're just
# disabled for now
./builder/smoke_test.sh
binary_mac_build:
<<: *binary_mac_params
macos:
xcode: "12.0"
resource_class: "large"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- checkout
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- brew_update
- run:
<<: *binary_install_miniconda
- run:
name: Build
no_output_timeout: "90m"
command: |
# Do not set -u here; there is some problem with CircleCI
# variable expansion with PROMPT_COMMAND
set -ex -o pipefail
script="/Users/distiller/project/pytorch/.circleci/scripts/binary_macos_build.sh"
cat "$script"
source "$script"
- run:
name: Test
no_output_timeout: "1h"
command: |
# Do not set -u here; there is some problem with CircleCI
# variable expansion with PROMPT_COMMAND
set -ex -o pipefail
script="/Users/distiller/project/pytorch/.circleci/scripts/binary_macos_test.sh"
cat "$script"
source "$script"
- persist_to_workspace:
root: /Users/distiller/project
paths: final_pkgs
- store_artifacts:
path: /Users/distiller/project/final_pkgs
binary_macos_arm64_build:
<<: *binary_mac_params
macos:
xcode: "12.3.0"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- checkout
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- brew_update
- run:
<<: *binary_install_miniconda
- run:
name: Build
no_output_timeout: "90m"
command: |
# Do not set -u here; there is some problem with CircleCI
# variable expansion with PROMPT_COMMAND
set -ex -o pipefail
export CROSS_COMPILE_ARM64=1
script="/Users/distiller/project/pytorch/.circleci/scripts/binary_macos_build.sh"
cat "$script"
source "$script"
- persist_to_workspace:
root: /Users/distiller/project
paths: final_pkgs
- store_artifacts:
path: /Users/distiller/project/final_pkgs
binary_ios_build:
<<: *pytorch_ios_params
macos:
@@ -43,6 +282,90 @@ jobs:
cat "$script"
source "$script"
binary_windows_build:
<<: *binary_windows_params
parameters:
build_environment:
type: string
default: ""
executor:
type: string
default: "windows-xlarge-cpu-with-nvidia-cuda"
executor: <<parameters.executor>>
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- checkout
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Build
no_output_timeout: "1h"
command: |
set -eux -o pipefail
script="/c/w/p/.circleci/scripts/binary_windows_build.sh"
cat "$script"
source "$script"
- persist_to_workspace:
root: "C:/w"
paths: final_pkgs
- store_artifacts:
path: C:/w/final_pkgs
binary_windows_test:
<<: *binary_windows_params
parameters:
build_environment:
type: string
default: ""
executor:
type: string
default: "windows-medium-cpu-with-nvidia-cuda"
executor: <<parameters.executor>>
steps:
- checkout
- attach_workspace:
at: c:/users/circleci/project
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Test
no_output_timeout: "1h"
command: |
set -eux -o pipefail
script="/c/w/p/.circleci/scripts/binary_windows_test.sh"
cat "$script"
source "$script"
smoke_windows_test:
<<: *binary_windows_params
parameters:
build_environment:
type: string
default: ""
executor:
type: string
default: "windows-medium-cpu-with-nvidia-cuda"
executor: <<parameters.executor>>
steps:
- checkout
- run:
<<: *binary_checkout
- run:
<<: *binary_populate_env
- run:
name: Test
no_output_timeout: "1h"
command: |
set -eux -o pipefail
export TEST_NIGHTLY_PACKAGE=1
script="/c/w/p/.circleci/scripts/binary_windows_test.sh"
cat "$script"
source "$script"
anaconda_prune:
parameters:
packages:

View File

@@ -24,6 +24,95 @@
pushd /tmp/workspace
git push -u origin "<< parameters.branch >>"
pytorch_python_doc_build:
environment:
BUILD_ENVIRONMENT: pytorch-python-doc-push
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc5.4"
resource_class: large
machine:
image: ubuntu-2004:202104-01
steps:
- checkout
- calculate_docker_image_tag
- setup_linux_system_environment
- setup_ci_environment
- run:
name: Doc Build and Push
no_output_timeout: "1h"
command: |
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
# turn v1.12.0rc3 into 1.12
tag=$(echo $CIRCLE_TAG | sed -e 's/v*\([0-9]*\.[0-9]*\).*/\1/')
target=${tag:-main}
echo "building for ${target}"
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && '"export CIRCLE_SHA1='$CIRCLE_SHA1'"' && . ./.circleci/scripts/python_doc_push_script.sh docs/'$target' '$target' site") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
mkdir -p ~/workspace/build_artifacts
docker cp $id:/var/lib/jenkins/workspace/pytorch.github.io/docs/main ~/workspace/build_artifacts
docker cp $id:/var/lib/jenkins/workspace/pytorch.github.io /tmp/workspace
# Save the docs build so we can debug any problems
export DEBUG_COMMIT_DOCKER_IMAGE=${COMMIT_DOCKER_IMAGE}-debug
docker commit "$id" ${DEBUG_COMMIT_DOCKER_IMAGE}
time docker push ${DEBUG_COMMIT_DOCKER_IMAGE}
- persist_to_workspace:
root: /tmp/workspace
paths:
- .
- store_artifacts:
path: ~/workspace/build_artifacts/main
destination: docs
pytorch_cpp_doc_build:
environment:
BUILD_ENVIRONMENT: pytorch-cpp-doc-push
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc5.4"
resource_class: large
machine:
image: ubuntu-2004:202104-01
steps:
- checkout
- calculate_docker_image_tag
- setup_linux_system_environment
- setup_ci_environment
- run:
name: Doc Build and Push
no_output_timeout: "1h"
command: |
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
# turn v1.12.0rc3 into 1.12
tag=$(echo $CIRCLE_TAG | sed -e 's/v*\([0-9]*\.[0-9]*\).*/\1/')
target=${tag:-main}
echo "building for ${target}"
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && '"export CIRCLE_SHA1='$CIRCLE_SHA1'"' && . ./.circleci/scripts/cpp_doc_push_script.sh docs/"$target" main") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
mkdir -p ~/workspace/build_artifacts
docker cp $id:/var/lib/jenkins/workspace/cppdocs/ /tmp/workspace
# Save the docs build so we can debug any problems
export DEBUG_COMMIT_DOCKER_IMAGE=${COMMIT_DOCKER_IMAGE}-debug
docker commit "$id" ${DEBUG_COMMIT_DOCKER_IMAGE}
time docker push ${DEBUG_COMMIT_DOCKER_IMAGE}
- persist_to_workspace:
root: /tmp/workspace
paths:
- .
pytorch_macos_10_15_py3_build:
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.15-py3-arm64-build
@@ -37,6 +126,7 @@
no_output_timeout: "1h"
command: |
set -e
export IN_CI=1
export CROSS_COMPILE_ARM64=1
export JOB_BASE_NAME=$CIRCLE_JOB
@@ -74,6 +164,7 @@
no_output_timeout: "1h"
command: |
set -e
export IN_CI=1
export JOB_BASE_NAME=$CIRCLE_JOB
# Install sccache
@@ -95,198 +186,6 @@
paths:
- miniconda3
mac_build:
parameters:
build-environment:
type: string
description: Top-level label for what's being built/tested.
xcode-version:
type: string
default: "13.3.1"
description: What xcode version to build with.
build-generates-artifacts:
type: boolean
default: true
description: if the build generates build artifacts
python-version:
type: string
default: "3.8"
macos:
xcode: << parameters.xcode-version >>
resource_class: medium
environment:
BUILD_ENVIRONMENT: << parameters.build-environment >>
AWS_REGION: us-east-1
steps:
- checkout
- run_brew_for_macos_build
- run:
name: Install sccache
command: |
sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
echo "export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> "${BASH_ENV}"
echo "export SCCACHE_S3_KEY_PREFIX=${GITHUB_WORKFLOW}" >> "${BASH_ENV}"
set +x
echo "export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4}" >> "${BASH_ENV}"
echo "export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}" >> "${BASH_ENV}"
set -x
- run:
name: Get workflow job id
command: |
echo "export OUR_GITHUB_JOB_ID=${CIRCLE_WORKFLOW_JOB_ID}" >> "${BASH_ENV}"
- run:
name: Build
command: |
set -x
git submodule sync
git submodule update --init --recursive --depth 1 --jobs 0
export PATH="/usr/local/bin:$PATH"
export WORKSPACE_DIR="${HOME}/workspace"
mkdir -p "${WORKSPACE_DIR}"
MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-py38_4.12.0-MacOSX-x86_64.sh"
if [ << parameters.python-version >> == 3.9.12 ]; then
MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-MacOSX-x86_64.sh"
fi
# If a local installation of conda doesn't exist, we download and install conda
if [ ! -d "${WORKSPACE_DIR}/miniconda3" ]; then
mkdir -p "${WORKSPACE_DIR}"
curl --retry 3 ${MINICONDA_URL} -o "${WORKSPACE_DIR}"/miniconda3.sh
bash "${WORKSPACE_DIR}"/miniconda3.sh -b -p "${WORKSPACE_DIR}"/miniconda3
fi
export PATH="${WORKSPACE_DIR}/miniconda3/bin:$PATH"
# shellcheck disable=SC1091
source "${WORKSPACE_DIR}"/miniconda3/bin/activate
brew link --force libomp
echo "export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"}" >> "${BASH_ENV}"
.jenkins/pytorch/macos-build.sh
- when:
condition: << parameters.build-generates-artifacts >>
steps:
- run:
name: Archive artifacts into zip
command: |
zip -1 -r artifacts.zip dist/ build/.ninja_log build/compile_commands.json .pytorch-test-times.json
cp artifacts.zip /Users/distiller/workspace
- persist_to_workspace:
root: /Users/distiller/workspace/
paths:
- miniconda3
- artifacts.zip
- store_artifacts:
path: /Users/distiller/project/artifacts.zip
mac_test:
parameters:
build-environment:
type: string
shard-number:
type: string
num-test-shards:
type: string
xcode-version:
type: string
test-config:
type: string
default: 'default'
macos:
xcode: << parameters.xcode-version >>
environment:
GIT_DEFAULT_BRANCH: 'master'
BUILD_ENVIRONMENT: << parameters.build-environment >>
TEST_CONFIG: << parameters.test-config >>
SHARD_NUMBER: << parameters.shard-number >>
NUM_TEST_SHARDS: << parameters.num-test-shards >>
PYTORCH_RETRY_TEST_CASES: 1
PYTORCH_OVERRIDE_FLAKY_SIGNAL: 1
steps:
- checkout
- attach_workspace:
at: ~/workspace
- run_brew_for_macos_build
- run:
name: Test
no_output_timeout: "2h"
command: |
set -x
git submodule sync --recursive
git submodule update --init --recursive
mv ~/workspace/artifacts.zip .
unzip artifacts.zip
export IN_CI=1
COMMIT_MESSAGES=$(git cherry -v "origin/${GIT_DEFAULT_BRANCH:-master}")
export PATH="/usr/local/bin:$PATH"
export WORKSPACE_DIR="${HOME}/workspace"
mkdir -p "${WORKSPACE_DIR}"
export PATH="${WORKSPACE_DIR}/miniconda3/bin:$PATH"
source "${WORKSPACE_DIR}"/miniconda3/bin/activate
# sanitize the input commit message and PR body here:
# trim all new lines from commit messages to avoid issues with batch environment
# variable copying. see https://github.com/pytorch/pytorch/pull/80043#issuecomment-1167796028
COMMIT_MESSAGES="${COMMIT_MESSAGES//[$'\n\r']}"
# then trim all special characters like single and double quotes to avoid unescaped inputs to
# wreak havoc internally
export COMMIT_MESSAGES="${COMMIT_MESSAGES//[\'\"]}"
python3 -mpip install dist/*.whl
.jenkins/pytorch/macos-test.sh
- run:
name: Copy files for uploading test stats
command: |
# copy into a parent folder test-reports because we can't use CIRCLEI_BUILD_NUM in path when persisting to workspace
mkdir -p test-reports/test-reports_${CIRCLE_BUILD_NUM}/test/test-reports
cp -r test/test-reports test-reports/test-reports_${CIRCLE_BUILD_NUM}/test/test-reports
- store_test_results:
path: test/test-reports
- persist_to_workspace:
root: /Users/distiller/project/
paths:
- test-reports
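The `mac_test` job above sanitizes `COMMIT_MESSAGES` with two bash parameter expansions before exporting it: the first strips newlines and carriage returns, the second strips quote characters. A standalone sketch of that transform — the sample message is made up:

```shell
#!/usr/bin/env bash
# Sketch of the COMMIT_MESSAGES sanitization in mac_test above.
# The sample message is hypothetical; real input comes from `git cherry -v`.
COMMIT_MESSAGES=$'fix: "quoted" change\nsecond \'line\''

COMMIT_MESSAGES="${COMMIT_MESSAGES//[$'\n\r']}"  # strip newlines and carriage returns
COMMIT_MESSAGES="${COMMIT_MESSAGES//[\'\"]}"     # strip single and double quotes

echo "$COMMIT_MESSAGES"
```

Both expansions use bash's `${var//pattern}` replace-all form, so no external tools are needed and no unescaped characters can leak into the environment-variable copy described in the comments above.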
upload_test_stats:
machine: # executor type
image: ubuntu-2004:202010-01 # # recommended linux image - includes Ubuntu 20.04, docker 19.03.13, docker-compose 1.27.4
steps:
- checkout
- attach_workspace:
at: ~/workspace
- run:
name: upload
command: |
set -ex
if [ -z ${AWS_ACCESS_KEY_FOR_OSSCI_ARTIFACT_UPLOAD} ]; then
echo "No credentials found, cannot upload test stats (are you on a fork?)"
exit 0
fi
cp -r ~/workspace/test-reports/* ~/project
pip3 install requests==2.26 rockset==0.8.3 boto3==1.19.12 six==1.16.0
export AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_FOR_OSSCI_ARTIFACT_UPLOAD}
export AWS_SECRET_ACCESS_KEY=${AWS_SECRET_KEY_FOR_OSSCI_ARTIFACT_UPLOAD}
# i dont know how to get the run attempt number for reruns so default to 1
python3 -m tools.stats.upload_test_stats --workflow-run-id "${CIRCLE_WORKFLOW_JOB_ID}" --workflow-run-attempt 1 --head-branch << pipeline.git.branch >> --circleci
pytorch_macos_10_13_py3_test:
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-py3-test
@@ -302,6 +201,7 @@
no_output_timeout: "1h"
command: |
set -e
export IN_CI=1
export JOB_BASE_NAME=$CIRCLE_JOB
chmod a+x .jenkins/pytorch/macos-test.sh
@@ -314,6 +214,7 @@
source /Users/distiller/workspace/miniconda3/bin/activate
python3 -m pip install boto3==1.19.12
export IN_CI=1
export JOB_BASE_NAME=$CIRCLE_JOB
# Using the same IAM user to write stats to our OSS bucket
@@ -339,6 +240,7 @@
no_output_timeout: "1h"
command: |
set -e
export IN_CI=1
export BUILD_LITE_INTERPRETER=1
export JOB_BASE_NAME=$CIRCLE_JOB
chmod a+x ${HOME}/project/.jenkins/pytorch/macos-lite-interpreter-build-test.sh
@@ -428,6 +330,9 @@
output_image=$docker_image_libtorch_android_x86_32-gradle
docker commit "$id_x86_32" ${output_image}
time docker push ${output_image}
- upload_binary_size_for_android_build:
build_type: prebuilt
artifacts: /home/circleci/workspace/build_android_artifacts/artifacts.tgz
- store_artifacts:
path: ~/workspace/build_android_artifacts/artifacts.tgz
destination: artifacts.tgz
@@ -503,6 +408,9 @@
output_image=${docker_image_libtorch_android_x86_32}-gradle
docker commit "$id" ${output_image}
time docker push ${output_image}
- upload_binary_size_for_android_build:
build_type: prebuilt-single
artifacts: /home/circleci/workspace/build_android_x86_32_artifacts/artifacts.tgz
- store_artifacts:
path: ~/workspace/build_android_x86_32_artifacts/artifacts.tgz
destination: artifacts.tgz
@@ -512,43 +420,10 @@
macos:
xcode: "12.5.1"
steps:
- run:
name: checkout with retry
command: |
checkout() {
set -ex
# Workaround old docker images with incorrect $HOME
# check https://github.com/docker/docker/issues/2968 for details
if [ "${HOME}" = "/" ]
then
export HOME=$(getent passwd $(id -un) | cut -d: -f6)
fi
mkdir -p ~/.ssh
echo 'github.com ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
' >> ~/.ssh/known_hosts
# use git+ssh instead of https
git config --global url."ssh://git@github.com".insteadOf "https://github.com" || true
git config --global gc.auto 0 || true
echo 'Cloning git repository'
mkdir -p '/Users/distiller/project'
cd '/Users/distiller/project'
git clone "$CIRCLE_REPOSITORY_URL" .
echo 'Checking out branch'
git checkout --force -B "$CIRCLE_BRANCH" "$CIRCLE_SHA1"
git --no-pager log --no-color -n 1 --format='HEAD is now at %h %s'
}
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
retry checkout
- checkout
- run_brew_for_ios_build
- run:
name: Setup Fastlane
name: Run Fastlane
no_output_timeout: "1h"
command: |
set -e
@@ -556,11 +431,26 @@
cd ${PROJ_ROOT}/ios/TestApp
# install fastlane
sudo gem install bundler && bundle install
# install certificates
echo ${IOS_CERT_KEY_2022} >> cert.txt
base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_root_cert
bundle exec fastlane install_dev_cert
# install the provisioning profile
PROFILE=PyTorch_CI_2022.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
echo ${IOS_SIGN_KEY_2022} >> cert.txt
base64 --decode cert.txt -o ${PROFILE}
rm cert.txt
- run:
name: Build
no_output_timeout: "1h"
command: |
set -e
export IN_CI=1
WORKSPACE=/Users/distiller/workspace
PROJ_ROOT=/Users/distiller/project
export TCLLIBPATH="/usr/local/lib"
@@ -613,12 +503,18 @@
command: |
set -e
PROJ_ROOT=/Users/distiller/project
PROFILE=PyTorch_CI_2022
# run the ruby build script
if ! [ -x "$(command -v xcodebuild)" ]; then
echo 'Error: xcodebuild is not installed.'
exit 1
fi
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM}
echo ${IOS_DEV_TEAM_ID}
if [ ${IOS_PLATFORM} != "SIMULATOR" ]; then
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} -c ${PROFILE} -t ${IOS_DEV_TEAM_ID}
else
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM}
fi
if ! [ "$?" -eq "0" ]; then
echo 'xcodebuild failed!'
exit 1
@@ -641,13 +537,12 @@
cd ${PROJ_ROOT}/ios/TestApp/benchmark
mkdir -p ../models
if [ ${USE_COREML_DELEGATE} == 1 ]; then
pip install coremltools==5.0b5 protobuf==3.20.1 six==1.16.0
pip install coremltools==5.0b5
pip install six
python coreml_backend.py
else
cd "${PROJ_ROOT}"
python test/mobile/model_test/gen_test_model.py ios-test
python trace_model.py
fi
cd "${PROJ_ROOT}/ios/TestApp/benchmark"
if [ ${BUILD_LITE_INTERPRETER} == 1 ]; then
echo "Setting up the TestApp for LiteInterpreter"
ruby setup.rb --lite 1
@@ -655,10 +550,10 @@
echo "Setting up the TestApp for Full JIT"
ruby setup.rb
fi
cd "${PROJ_ROOT}/ios/TestApp"
# instruments -s -devices
if [ "${BUILD_LITE_INTERPRETER}" == 1 ]; then
if [ "${USE_COREML_DELEGATE}" == 1 ]; then
cd ${PROJ_ROOT}/ios/TestApp
instruments -s -devices
if [ ${BUILD_LITE_INTERPRETER} == 1 ]; then
if [ ${USE_COREML_DELEGATE} == 1 ]; then
fastlane scan --only_testing TestAppTests/TestAppTests/testCoreML
else
fastlane scan --only_testing TestAppTests/TestAppTests/testLiteInterpreter
@@ -760,3 +655,27 @@
set -e
python3 -m pip install requests
python3 ./.circleci/scripts/trigger_azure_pipeline.py
pytorch_doc_test:
environment:
BUILD_ENVIRONMENT: pytorch-doc-test
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.7-gcc5.4"
resource_class: medium
machine:
image: ubuntu-2004:202104-01
steps:
- checkout
- calculate_docker_image_tag
- setup_linux_system_environment
- setup_ci_environment
- run:
name: Doc test
no_output_timeout: "30m"
command: |
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && . ./.jenkins/pytorch/docs-test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
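Both doc-build jobs above derive the docs output directory from the release tag with the same `sed` substitution, turning `v1.12.0rc3` into `1.12`. A standalone sketch of that transform:

```shell
#!/usr/bin/env bash
# Sketch of the tag-to-docs-target transform used by the doc-build jobs above.
docs_target () {
  # Strip a leading "v" and everything after the major.minor component.
  # An empty CIRCLE_TAG yields an empty result, and the jobs then fall
  # back to "main" via ${tag:-main}.
  echo "$1" | sed -e 's/v*\([0-9]*\.[0-9]*\).*/\1/'
}

echo "$(docs_target v1.12.0rc3)"  # -> 1.12
echo "$(docs_target v1.13.1)"     # -> 1.13
```

The capture group keeps only `major.minor`; the trailing `.*` discards the patch and any release-candidate suffix, which is why both jobs can reuse the result directly as a docs folder name.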

View File

@@ -22,9 +22,6 @@ exclude =
./docs/caffe2,
./docs/cpp/src,
./docs/src,
./functorch/docs,
./functorch/examples,
./functorch/notebooks,
./scripts,
./test/generated_type_hints_smoketest.py,
./third_party,

@@ -18,13 +18,7 @@ cc11aaaa60aadf28e3ec278bce26a42c1cd68a4f
e3900d2ba5c9f91a24a9ce34520794c8366d5c54
# 2021-04-21 Removed all unqualified `type: ignore`
75024e228ca441290b6a1c2e564300ad507d7af6
# 2021-04-30 [PyTorch] Autoformat c10
44cc873fba5e5ffc4d4d4eef3bd370b653ce1ce1
# 2021-05-14 Removed all versionless Python shebangs
2e26976ad3b06ce95dd6afccfdbe124802edf28f
# 2021-06-07 Strictly typed everything in `.github` and `tools`
737d920b21db9b4292d056ee1329945990656304
# 2022-06-09 Apply clang-format to ATen headers
95b15c266baaf989ef7b6bbd7c23a2d90bacf687
# 2022-06-11 [lint] autoformat test/cpp and torch/csrc
30fb2c4abaaaa966999eab11674f25b18460e609

@@ -5,8 +5,6 @@ about: Tracking incidents for PyTorch's CI infra.
> NOTE: Remember to label this issue with "`ci: sev`"
**MERGE BLOCKING** <!-- remove this line if you don't want this SEV to block merges -->
## Current Status
*Status could be: preemptive, ongoing, mitigated, closed. Also tell people if they need to take action to fix it (e.g. rebase)*.

@@ -12,7 +12,5 @@ self-hosted-runner:
- windows.8xlarge.nvidia.gpu
- bm-runner
- linux.rocm.gpu
- macos-m1-12
- macos-12-xl
- macos-12
- macos12.3-m1

@@ -37,10 +37,12 @@ runs:
shell: bash
env:
BRANCH: ${{ inputs.branch }}
JOB_BASE_NAME: ${{ inputs.build-environment }}-build-and-test
BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-${{ inputs.arch-for-build-env }}-build"
AWS_DEFAULT_REGION: us-east-1
PR_NUMBER: ${{ github.event.pull_request.number }}
SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
DOCKER_IMAGE: ${{ inputs.docker-image }}
MATRIX_ARCH: ${{ inputs.arch }}
@@ -50,12 +52,16 @@
export container_name
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e AWS_DEFAULT_REGION \
-e IS_GHA \
-e PR_NUMBER \
-e SHA1 \
-e BRANCH \
-e GITHUB_RUN_ID \
-e SCCACHE_BUCKET \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \

@@ -17,16 +17,6 @@ inputs:
pull:
description: If set to any value, run `docker pull` on the calculated image.
required: false
skip_push:
description: If set to a true value, the push is skipped; defaults to "true" so that pushing must be explicit
required: false
default: "true"
force_push:
description: If set to any value, always run the push
required: false
push-ghcr-image:
description: If set to any value, push docker image to the ghcr.io.
required: false
outputs:
docker-image:
@@ -41,7 +31,7 @@
id: calculate-tag
env:
IS_XLA: ${{ inputs.xla == 'true' && 'true' || '' }}
XLA_IMAGE_TAG: v0.4
XLA_IMAGE_TAG: v0.2
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/${{ inputs.docker-image-name }}
run: |
if [ -n "${IS_XLA}" ]; then
@@ -63,7 +53,6 @@
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
DOCKER_IMAGE: ${{ steps.calculate-tag.outputs.docker-image }}
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker-tag }}
DOCKER_FORCE_PUSH: ${{ inputs.force_push }}
run: |
set -x
# Check if image already exists, if it does then skip building it
@@ -86,15 +75,9 @@
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "WARNING: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " Will re-build docker image to store in local cache, TTS may be longer"
# NOTE: DOCKER_FORCE_PUSH will always be set to true for docker-builds.yml
if [[ "${DOCKER_FORCE_PUSH}" != "true" ]]; then
# In order to avoid a stampeding herd of jobs trying to push all at once we set it to
# skip the push. If this is negatively affecting TTS across the board the suggestion
# should be to run the docker-builds.yml workflow to generate the correct docker builds
echo ::set-output name=skip_push::true
fi
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
@@ -103,11 +86,7 @@
env:
IMAGE_NAME: ${{inputs.docker-image-name}}
DOCKER_SKIP_S3_UPLOAD: "1"
# Skip push if we don't need it, or if specified in the inputs
DOCKER_SKIP_PUSH: ${{ steps.check.outputs.skip_push || inputs.skip_push }}
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker-tag }}
PUSH_GHCR_IMAGE: ${{ inputs.push-ghcr-image }}
GHCR_PAT: ${{ env.GHCR_PAT }}
working-directory: .circleci/docker
shell: bash
run: |

@@ -23,14 +23,11 @@ runs:
env:
NO_SUDO: ${{ inputs.no-sudo }}
run: |
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
echo "${GITHUB_WORKSPACE}"
if [ -z "${NO_SUDO}" ]; then
retry sudo rm -rf "${GITHUB_WORKSPACE}"
sudo rm -rf "${GITHUB_WORKSPACE}"
else
retry rm -rf "${GITHUB_WORKSPACE}"
rm -rf "${GITHUB_WORKSPACE}"
fi
mkdir "${GITHUB_WORKSPACE}"

@@ -15,7 +15,7 @@ runs:
steps:
- name: Download PyTorch Build Artifacts from S3
if: ${{ !inputs.use-gha }}
uses: seemethere/download-artifact-s3@v4
uses: seemethere/download-artifact-s3@v3
with:
name: ${{ inputs.name }}

@@ -1,60 +0,0 @@
name: Filter test configs matrix
description: |
Apply filter to the test configs matrix to keep only entries specified
by the PR test-config labels. If no test-config label is set, the same
test configs matrix is returned untouched.
inputs:
github-token:
description: GITHUB_TOKEN
required: true
test-matrix:
required: true
type: string
description: JSON description of what test configs to run.
outputs:
test-matrix:
description: The filtered test configs matrix.
value: ${{ steps.filter.outputs.test-matrix }}
is-test-matrix-empty:
description: True if the filtered test configs matrix is empty. False otherwise.
value: ${{ steps.filter.outputs.is-test-matrix-empty }}
runs:
using: composite
steps:
- uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a
name: Setup dependencies
env:
GITHUB_TOKEN: ${{ inputs.github-token }}
with:
shell: bash
timeout_minutes: 10
max_attempts: 5
retry_wait_seconds: 30
command: |
set -eux
python3 -m pip install requests==2.26.0 pyyaml==6.0
- name: Parse ref
shell: bash
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Select all requested test configurations
shell: bash
env:
GITHUB_TOKEN: ${{ inputs.github-token }}
id: filter
run: |
.github/scripts/filter_test_configs.py \
--test-matrix "${{ inputs.test-matrix }}" \
--pr-number "${{ github.event.pull_request.number }}" \
--tag "${{ steps.parse-ref.outputs.tag }}"
- name: Print the filtered test matrix
shell: bash
run: |
echo "${{ steps.filter.outputs.test-matrix }}"

@@ -15,7 +15,7 @@ outputs:
runs:
using: composite
steps:
- uses: nick-fields/retry@7d4a37704547a311dbb66ebdf5b23ec19374a767
- uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a
id: get-job-id
env:
GITHUB_TOKEN: ${{ inputs.github-token }}
@@ -25,7 +25,7 @@
max_attempts: 5
retry_wait_seconds: 30
command: |
set -eux
set -x
python3 -m pip install requests==2.26.0
GHA_WORKFLOW_JOB_ID=$(python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}")
echo "::set-output name=job-id::${GHA_WORKFLOW_JOB_ID}"

@@ -0,0 +1,19 @@
name: Pull docker image
description: pull a specific docker image
inputs:
docker-image:
description: the image to pull
required: true
runs:
using: composite
steps:
- name: Pull Docker image
shell: bash
env:
DOCKER_IMAGE: ${{ inputs.docker-image }}
run: |
retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@"); }
retry docker pull "${DOCKER_IMAGE}"

@@ -44,5 +44,4 @@ runs:
- name: Preserve github env variables for use in docker
shell: bash
run: |
env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}"
env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}"
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"

@@ -49,8 +49,7 @@ runs:
- name: Preserve github env variables for use in docker
shell: bash
run: |
env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}"
env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}"
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: ROCm set GPU_FLAG
shell: bash

.github/actions/setup-ssh/action.yml vendored Normal file

@@ -0,0 +1,17 @@
name: Setup SSH
description: Adds ssh keys for current user to machine
inputs:
github-secret:
description: GitHub token
required: true
runs:
using: composite
steps:
- name: "Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ inputs.github-secret }}
activate-with-label: false

@@ -58,8 +58,3 @@ runs:
uses: actions/setup-python@v2
with:
python-version: "3.x"
cache: pip
cache-dependency-path: |
**/requirements.txt
**/.circleci/docker/requirements-ci.txt
**/.github/requirements-gha-cache.txt

@@ -0,0 +1,28 @@
name: Teardown Linux
description: Stuff that should always run at the end of a linux job
inputs:
skip-wait-ssh:
description: If set, don't wait for ssh to drain before tearing down
required: false
default: ""
runs:
using: composite
steps:
- name: Hold runner for 2 hours or until ssh sessions have drained
# TODO working-directory: !{{ pytorch_directory }}
# Always hold for active ssh sessions
shell: bash
if: inputs.skip-wait-ssh == ''
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Kill containers, clean up images
shell: bash
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af

@@ -0,0 +1,25 @@
name: Teardown ROCm host
description: Teardown ROCm host for CI
runs:
using: composite
steps:
- name: Kill containers, clean up images
if: always()
shell: bash
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker containers
docker container prune -f
# Prune everything docker if there are more than 10 images (~200GB).
# This is easier than using a time filter, e.g., "until=24h".
image_count=$(docker images | wc -l)
if [[ ${image_count} -gt 10 ]]; then
echo "Purging all docker caches"
docker system prune -af
else
echo "Will not purge docker, only ${image_count} images found"
fi

@@ -1,41 +0,0 @@
name: Test pytorch binary
description: Pulls the docker image and tests the pytorch binary using it. All env variable referenced in the "Test PyTorch binary" step must be set in the GITHUB_ENV file
runs:
using: composite
steps:
- name: Test PyTorch binary
shell: bash
run: |
set -x
# shellcheck disable=SC2086,SC2090
container_name=$(docker run \
${GPU_FLAG:-} \
-e BINARY_ENV_FILE \
-e BUILDER_ROOT \
-e BUILD_ENVIRONMENT \
-e BUILD_SPLIT_CUDA \
-e DESIRED_CUDA \
-e DESIRED_DEVTOOLSET \
-e DESIRED_PYTHON \
-e GITHUB_ACTIONS \
-e GPU_ARCH_TYPE \
-e GPU_ARCH_VERSION \
-e LIBTORCH_VARIANT \
-e PACKAGE_TYPE \
-e PYTORCH_FINAL_PACKAGE_DIR \
-e PYTORCH_ROOT \
-e SKIP_ALL_TESTS \
--tty \
--detach \
-v "${GITHUB_WORKSPACE}/pytorch:/pytorch" \
-v "${GITHUB_WORKSPACE}/builder:/builder" \
-v "${RUNNER_TEMP}/artifacts:/final_pkgs" \
-w / \
"${DOCKER_IMAGE}"
)
docker exec -t -w "${PYTORCH_ROOT}" "${container_name}" bash -c "bash .circleci/scripts/binary_populate_env.sh"
# Generate test script
docker exec -t -w "${PYTORCH_ROOT}" -e OUTPUT_SCRIPT="/run.sh" "${container_name}" bash -c "bash .circleci/scripts/binary_linux_test.sh"
docker exec -t "${container_name}" bash -c "source ${BINARY_ENV_FILE} && bash -x /run.sh"

@@ -36,20 +36,6 @@ runs:
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- name: Zip usage log for upload
if: runner.os != 'Windows' && !inputs.use-gha
shell: bash
env:
FILE_SUFFIX: ${{ inputs.file-suffix }}
run: |
# Remove any previous test reports if they exist
rm -f usage-log-*.zip
# this workflow is also run in bazel build test, but we dont generate usage reports for it
# so check to see if the file exists first
if [ -f 'usage_log.txt' ]; then
zip "usage-log-${FILE_SUFFIX}.zip" 'usage_log.txt'
fi
# Windows zip
- name: Zip JSONs for upload
if: runner.os == 'Windows' && !inputs.use-gha
@@ -69,46 +55,23 @@
# -ir => recursive include all files in pattern
7z a "test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml'
- name: Zip usage log for upload
if: runner.os == 'Windows' && !inputs.use-gha
shell: powershell
env:
FILE_SUFFIX: ${{ inputs.file-suffix }}
run: |
# -ir => recursive include all files in pattern
7z a "usage-log-$Env:FILE_SUFFIX.zip" 'usage_log.txt'
# S3 upload
- name: Store Test Downloaded JSONs on S3
uses: seemethere/upload-artifact-s3@v5
uses: seemethere/upload-artifact-s3@v4
if: ${{ !inputs.use-gha }}
with:
s3-prefix: |
${{ github.repository }}/${{ github.run_id }}/${{ github.run_attempt }}/artifact
retention-days: 14
if-no-files-found: warn
path: test-jsons-*.zip
- name: Store Test Reports on S3
uses: seemethere/upload-artifact-s3@v5
uses: seemethere/upload-artifact-s3@v4
if: ${{ !inputs.use-gha }}
with:
s3-prefix: |
${{ github.repository }}/${{ github.run_id }}/${{ github.run_attempt }}/artifact
retention-days: 14
if-no-files-found: error
path: test-reports-*.zip
- name: Store Usage Logs on S3
uses: seemethere/upload-artifact-s3@v5
if: ${{ !inputs.use-gha }}
with:
s3-prefix: |
${{ github.repository }}/${{ github.run_id }}/${{ github.run_attempt }}/artifact
retention-days: 14
if-no-files-found: ignore
path: usage-log-*.zip
# GHA upload
- name: Store Test Downloaded JSONs on Github
uses: actions/upload-artifact@v2

@@ -1,27 +0,0 @@
# Documented at https://github.com/necojackarc/auto-request-review
reviewers:
groups:
symbolic-shapes:
- ezyang
- Chillee
- wconstab
- anjali411
- albanD
- Krovatkin
- miladm
per_author:
symbolic-shapes:
- symbolic-shapes
- antoniojkim
files:
# none yet, TODO: migrate CODEOWNERS here
options:
ignore_draft: true
ignored_keywords:
- DO NOT REVIEW
# Just manually setup a self-referential per_author rule if you
# want group assignment
enable_group_assignment: false

@@ -1 +0,0 @@
6ead5cae0d1234aa64db06fe230ef56e12ec76fe

@@ -1 +0,0 @@
d7d90f56117ce0955332846a5f90b8d1346c4c09

@@ -1 +0,0 @@
f2b36df6a1a80137eff7644e6d0f4eeb7ff429d6

.github/merge_rules.json vendored Normal file

@@ -0,0 +1,114 @@
[
{
"name": "ONNX exporter",
"patterns": [
".jenkins/caffe2/*",
"scripts/onnx/**",
"docs/source/onnx.rst",
"test/onnx/**",
"test/jit/test_export_modes.py",
"aten/src/ATen/core/interned_strings.h",
"tools/onnx/**",
"torch/_C/__init__.pyi.in",
"torch/csrc/jit/passes/onnx.*",
"torch/csrc/jit/passes/onnx/**",
"torch/csrc/jit/serialization/export.*",
"torch/csrc/jit/serialization/onnx.*",
"torch/csrc/onnx/**",
"torch/onnx/**"
],
"approved_by": ["BowenBao", "garymm"],
"mandatory_checks_name": ["Facebook CLA Check", "Lint"]
},
{
"name": "NVFuser",
"patterns": [
"test/test_jit_cuda_fuser.py",
"torch/csrc/jit/codegen/fuser/cuda/**",
"torch/csrc/jit/codegen/cuda/**",
"benchmarks/cpp/nvfuser/**"
],
"approved_by": ["csarofeen", "ngimel", "jjsjann123"],
"mandatory_checks_name": ["Facebook CLA Check", "Lint"]
},
{
"name": "OSS CI",
"patterns": [".github/**", ".circleci/**", ".jenkins/**", "scripts/**", "tools/**"],
"approved_by": ["ezyang", "pytorch/pytorch-dev-infra"],
"mandatory_checks_name": ["Facebook CLA Check", "Lint"]
},
{
"name": "Documentation",
"patterns": ["docs/**", "torch/*docs.py"],
"approved_by": ["mruberry", "ngimel", "janeyx99"],
"mandatory_checks_name": ["Facebook CLA Check", "Lint"]
},
{
"name": "Mobile",
"patterns": ["ios/**", "android/**", "test/mobile/**"],
"approved_by": ["linbinyu", "kit1980", "IvanKobzarev", "dreiss"],
"mandatory_checks_name": ["Facebook CLA Check", "Lint"]
},
{
"name": "Linear Algebra",
"patterns": [
"aten/src/ATen/native/cuda/linalg/**",
"aten/src/ATen/LinalgBackend.h",
"aten/src/ATen/native/**/*LinearAlgebra*",
"docs/source/linalg.rst",
"torch/linalg/**",
"torch/_linalg_utils.py",
"torch/**/python_linalg_functions.*",
"torch/**/linalg.h",
"tools/autograd/templates/python_linalg_functions.cpp",
"test/test_linalg.py"
],
"approved_by": ["nikitaved", "mruberry", "pearu", "Lezcano", "IvanYashchuk"],
"mandatory_checks_name": ["Facebook CLA Check", "Lint"]
},
{
"name": "FFT",
"patterns": [
"aten/src/ATen/native/cuda/*FFT*.h",
"aten/src/ATen/native/SpectralOps.cpp",
"aten/src/ATen/native/mkl/SpectralOps.cpp",
"aten/src/ATen/native/cuda/SpectralOps.*",
"docs/source/fft.rst",
"torch/fft/**",
"torch/csrc/api/include/torch/fft.h",
"torch/**/python_fft_functions.*",
"tools/autograd/templates/python_fft_functions.cpp",
"test/cpp/api/fft.cpp"
],
"approved_by": ["mruberry", "peterbell10"],
"mandatory_checks_name": ["Facebook CLA Check", "Lint"]
},
{
"name": "Sparse",
"patterns": [
"benchmarks/sparse",
"c10/util/sparse_bitset.h",
"docs/source/sparse.rst",
"torch/**/sparse/**",
"torch/**/*sparse*",
"torch/optim/sparse*",
"torch/ao/nn/sparse/**",
"torch/utils/benchmark/**/*sparse*",
"aten/src/ATen/native/ao_sparse/**",
"aten/src/ATen/native/sparse/**",
"aten/src/ATen/**/*Sparse*",
"aten/src/ATen/*Sparse*",
"torch/_masked/**",
"test/*_masked*",
"test/**/*sparse*"
],
"approved_by": ["nikitaved", "cpuhrsch", "pearu", "IvanYashchuk"],
"mandatory_checks_name": ["Facebook CLA Check", "Lint"]
},
{
"name": "superuser",
"patterns": ["*"],
"approved_by": ["pytorch/metamates"],
"mandatory_checks_name": ["Facebook CLA Check", "Lint"]
}
]
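Each rule above pairs glob `patterns` with an `approved_by` list. As a rough sketch of how such a rule could be applied to a PR's changed files (the actual matcher in `trymerge.py` may handle `**` and directory patterns differently, so treat this as an assumption): a rule covers the PR when every changed file matches at least one of its patterns.

```python
from fnmatch import fnmatch

def rule_matches(patterns, changed_files):
    """A merge rule covers a PR when every changed file matches at least
    one of the rule's glob patterns. Note fnmatch's '*' also matches '/',
    so 'docs/**' behaves like a recursive match here."""
    return all(
        any(fnmatch(path, pattern) for pattern in patterns)
        for path in changed_files
    )
```

The catch-all `"*"` pattern of the `superuser` rule therefore matches any file, which is why it works as the fallback rule.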

@@ -1,356 +0,0 @@
- name: ONNX exporter
patterns:
- .jenkins/caffe2/*
- aten/src/ATen/core/interned_strings.h
- docs/source/onnx.rst
- docs/source/onnx*
- docs/source/scripts/onnx/**
- scripts/onnx/**
- test/jit/test_export_modes.py
- test/onnx/**
- tools/onnx/**
- torch/_C/__init__.pyi.in
- torch/csrc/jit/passes/onnx.*
- torch/csrc/jit/passes/onnx/**
- torch/csrc/jit/serialization/export.*
- torch/csrc/jit/serialization/onnx.*
- torch/csrc/onnx/**
- torch/onnx/**
- third_party/onnx
- caffe2/python/onnx/**
approved_by:
- BowenBao
- abock
mandatory_checks_name:
- EasyCLA
- Lint
- pull
- name: NVFuser
patterns:
- test/test_jit_cuda_fuser.py
- torch/csrc/jit/codegen/fuser/cuda/**
- torch/csrc/jit/codegen/cuda/**
- benchmarks/cpp/nvfuser/**
approved_by:
- csarofeen
- ngimel
- jjsjann123
- ptrblck
mandatory_checks_name:
- EasyCLA
- Lint
- pull
- name: OSS CI
patterns:
- .github/**
- .circleci/**
- .jenkins/**
- scripts/**
- tools/**
approved_by:
- alband
- dagitses
- pytorch/pytorch-dev-infra
mandatory_checks_name:
- EasyCLA
- Lint
- pull
- name: OSS CI / pytorchbot
patterns:
- .github/ci_commit_pins/vision.txt
- .github/ci_commit_pins/torchdynamo.txt
approved_by:
- pytorchbot
mandatory_checks_name:
- EasyCLA
- Lint
- pull
- name: OSS CI / pytorchbot / XLA
patterns:
- .github/ci_commit_pins/xla.txt
approved_by:
- pytorchbot
mandatory_checks_name:
- EasyCLA
- Lint
- pull / linux-bionic-py3_7-clang8-xla / build
- pull / linux-bionic-py3_7-clang8-xla / test (xla, 1, 1, linux.2xlarge)
- name: Documentation
patterns:
- docs/**
- torch/*docs.py
approved_by:
- svekars
mandatory_checks_name:
- EasyCLA
- Lint
- pull
- name: Mobile
patterns:
- ios/**
- android/**
- test/mobile/**
approved_by:
- linbinyu
- IvanKobzarev
- dreiss
- raziel
mandatory_checks_name:
- EasyCLA
- Lint
- pull
- name: Linear Algebra
patterns:
- aten/src/ATen/native/cuda/linalg/**
- aten/src/ATen/LinalgBackend.h
- aten/src/ATen/native/**LinearAlgebra*
- docs/source/linalg.rst
- torch/linalg/**
- torch/_linalg_utils.py
- torch/**python_linalg_functions.*
- torch/**linalg.h
- tools/autograd/templates/python_linalg_functions.cpp
- test/test_linalg.py
approved_by:
- mruberry
- lezcano
- IvanYashchuk
mandatory_checks_name:
- EasyCLA
- Lint
- pull
- name: FFT
patterns:
- aten/src/ATen/native/cuda/*FFT*.h
- aten/src/ATen/native/SpectralOps.cpp
- aten/src/ATen/native/mkl/SpectralOps.cpp
- aten/src/ATen/native/cuda/SpectralOps.*
- docs/source/fft.rst
- torch/fft/**
- torch/csrc/api/include/torch/fft.h
- torch/**python_fft_functions.*
- tools/autograd/templates/python_fft_functions.cpp
- test/cpp/api/fft.cpp
approved_by:
- mruberry
- peterbell10
mandatory_checks_name:
- EasyCLA
- Lint
- pull
- name: Sparse
patterns:
- benchmarks/sparse
- c10/util/sparse_bitset.h
- docs/source/sparse.rst
- torch/**sparse/**
- torch/**sparse*
- torch/optim/sparse*
- torch/ao/nn/sparse/**
- torch/utils/benchmark/**sparse*
- aten/src/ATen/native/ao_sparse/**
- aten/src/ATen/native/sparse/**
- aten/src/ATen/**Sparse*
- aten/src/ATen/*Sparse*
- torch/_masked/**
- test/*_masked*
- test/**sparse*
approved_by:
- nikitaved
- cpuhrsch
- pearu
- IvanYashchuk
mandatory_checks_name:
- EasyCLA
- Lint
- pull
- name: MPS
patterns:
- test/test_mps.py
- aten/src/ATen/native/native_functions.yaml
- aten/src/ATen/mps/**
- aten/src/ATen/native/mps/**
approved_by:
- kulinseth
- alband
- malfet
- razarmehr
mandatory_checks_name:
- EasyCLA
- Lint
- pull
- name: Distributions
patterns:
- torch/distributions/**
- test/distributions/**
approved_by:
- fritzo
- neerajprad
- alicanb
mandatory_checks_name:
- EasyCLA
- Lint
- pull
- name: Distributed
patterns:
- docs/source/pipeline.rst
- docs/source/distributed*
- docs/source/rpc.rst
- docs/source/rpc/**
- docs/source/_static/img/rpc*
- docs/source/_static/img/*distributed*
- docs/source/elastic/**
- benchmarks/distributed/**
- torch/distributed/**
- torch/nn/parallel/distributed*
- torch/_C/_distributed*
- torch/csrc/distributed/**
- torch/testing/_internal/distributed/**
- test/distributed/**
- test/cpp/dist_autograd/**
- test/cpp/rpc/**
approved_by:
- mrshenli
- pritamdamania87
- zhaojuanmao
- rohan-varma
- wanchaol
- fduwjj
- H-Huang
- d4l3k
- aazzolini
- kwen2501
mandatory_checks_name:
- EasyCLA
- Lint
- pull
- name: IDEEP
patterns:
- third_party/ideep
- caffe2/ideep/**
- caffe2/python/ideep/**
approved_by:
- XiaobingSuper
- yanbing-j
mandatory_checks_name:
- EasyCLA
- Lint
- pull
- name: oneDNN graph
patterns:
- torch/csrc/jit/codegen/onednn/**
- test/test_jit_llga_fuser.py
approved_by:
- sanchitintel
- chunyuan-w
mandatory_checks_name:
- EasyCLA
- Lint
- pull
- name: CPU ATen backend
patterns:
- aten/src/ATen/cpu/**
- aten/src/ATen/native/cpu/**
- aten/src/ATen/native/quantized/cpu/**
- aten/src/ATen/native/Convolution*.cpp
- aten/src/ATen/native/mkldnn/**
approved_by:
- mingfeima
- XiaobingSuper
mandatory_checks_name:
- EasyCLA
- Lint
- pull
- name: CPU frontend
patterns:
- torch/cpu/**
- torch/utils/mkldnn.py
- test/test_mkldnn.py
approved_by:
- leslie-fang-intel
- CaoE
mandatory_checks_name:
- EasyCLA
- Lint
- pull
- name: Autocast
patterns:
- torch/amp/**
- aten/src/ATen/autocast_mode.*
- torch/csrc/jit/passes/autocast.cpp
- test/test_autocast.py
approved_by:
- leslie-fang-intel
- CaoE
mandatory_checks_name:
- EasyCLA
- Lint
- pull
- name: Lazy Tensor
patterns:
- torch/csrc/lazy/**
- test/cpp/lazy/**
- test/lazy/**
- codegen/api/lazy.py
- codegen/dest/lazy_ir.py
- codegen/dest/lazy_ts_lowering.py
- codegen/gen_lazy_tensor.py
- aten/src/ATen/native/ts_native_functions.yaml
approved_by:
- alanwaketan
- JackCaoG
mandatory_checks_name:
- EasyCLA
- Lint
- pull
- name: superuser
patterns:
- '*'
approved_by:
- pytorch/metamates
mandatory_checks_name:
- EasyCLA
- Lint
- pull
- name: Core Reviewers
patterns:
- '*'
approved_by:
- mruberry
- lezcano
mandatory_checks_name:
- EasyCLA
- Lint
- pull
- name: Core Maintainers
patterns:
- '*'
approved_by:
- soumith
- gchanan
- ezyang
- dzhulgakov
mandatory_checks_name:
- EasyCLA
- Lint
- pull

@@ -1,16 +0,0 @@
# This file is to cache other dependencies not specified elsewhere in:
# requirement.txt
# requirements-flake8.txt
# docs/requirements.txt
# docs/cpp/requirements.txt
# functorch/docs/requirements.txt
# .circleci/docker/requirements-ci.txt
cffi==1.15.0
dataclasses==0.6
jinja2==3.0.1
lintrunner==0.9.2
ninja==1.10.0.post1
pynvml==11.4.1
requests==2.26
rich==10.9.0
rockset==0.8.10

.github/scale-config.yml vendored Normal file

@@ -0,0 +1,69 @@
# scale-config.yml:
# Powers what instance types are available for GHA auto-scaled
# runners. Runners listed here will be available as self hosted
# runners, configuration is directly pulled from the main branch.
#
# NOTE (Apr, 5, 2021): Linux runners are currently all an amazonlinux2
#
# NOTE (Jan 5, 2021): Linux runners are all non-ephemeral to reduce the number of CreateInstances calls
# to avoid RequestLimitExceeded issues
#
# TODO: Add some documentation on how the auto-scaling works
#
# NOTE: Default values,
#
# runner_types:
# runner_label:
# instance_type: m4.large
# os: linux
# max_available: 20
# disk_size: 50
# is_ephemeral: true
runner_types:
# mainly used for ciflow-should-run, not made to run any serious tests
linux.large:
instance_type: c5.large
os: linux
disk_size: 10
is_ephemeral: false
linux.2xlarge:
instance_type: c5.2xlarge
os: linux
max_available: 1000
disk_size: 150
is_ephemeral: false
linux.4xlarge: # for binary-builds
instance_type: c5.4xlarge
os: linux
max_available: 500
disk_size: 150
is_ephemeral: false
linux.8xlarge.nvidia.gpu:
instance_type: g3.8xlarge
os: linux
max_available: 200
disk_size: 150
is_ephemeral: false
linux.4xlarge.nvidia.gpu:
instance_type: g3.4xlarge
os: linux
max_available: 250
disk_size: 150
is_ephemeral: false
linux.16xlarge.nvidia.gpu:
instance_type: g3.16xlarge
os: linux
max_available: 10
disk_size: 150
is_ephemeral: false
windows.4xlarge:
instance_type: c5d.4xlarge
os: windows
max_available: 200
disk_size: 256
windows.8xlarge.nvidia.gpu:
instance_type: p3.2xlarge
os: windows
max_available: 50
disk_size: 256

@@ -1,34 +0,0 @@
from typing import Any
from trymerge import gh_post_pr_comment
from gitutils import get_git_remote_name, get_git_repo_dir, GitRepo
from trymerge_explainer import BOT_COMMANDS_WIKI
import os
def parse_args() -> Any:
from argparse import ArgumentParser
parser = ArgumentParser("Comment on a PR")
parser.add_argument("pr_num", type=int)
parser.add_argument("action", type=str)
return parser.parse_args()
def main() -> None:
args = parse_args()
repo = GitRepo(get_git_repo_dir(), get_git_remote_name(), debug=True)
org, project = repo.gh_owner_and_name()
run_url = os.environ.get("GH_RUN_URL")
job_link = f"[job]({run_url})" if run_url is not None else "job"
msg = (
f"The {args.action} {job_link} was canceled. If you believe this is a mistake,"
+ f" then you can re-trigger it through [pytorch-bot]({BOT_COMMANDS_WIKI})."
)
gh_post_pr_comment(org, project, args.pr_num, msg)
print(org, project, args.pr_num, msg)
if __name__ == "__main__":
main()

@@ -1,123 +0,0 @@
import sys
from typing import Any, Dict, List, NamedTuple, Tuple
from gitutils import _check_output
import rockset # type: ignore[import]
import os
import re
def eprint(msg: str) -> None:
print(msg, file=sys.stderr)
class WorkflowCheck(NamedTuple):
workflowName: str
name: str
jobName: str
conclusion: str
def get_latest_commits() -> List[str]:
latest_viable_commit = _check_output(
[
"git",
"log",
"-n",
"1",
"--pretty=format:%H",
"origin/viable/strict",
],
encoding="ascii",
)
commits = _check_output(
[
"git",
"rev-list",
f"{latest_viable_commit}^..HEAD",
"--remotes=*origin/master",
],
encoding="ascii",
).splitlines()
return commits
def query_commits(commits: List[str], qlambda: Any) -> Any:
params = rockset.ParamDict()
params['shas'] = ",".join(commits)
results = qlambda.execute(parameters=params)
return results
def print_commit_status(commit: str, results: Dict[str, Any]) -> None:
print(commit)
for check in results['results']:
if check['sha'] == commit:
print(f"\t{check['conclusion']:>10}: {check['name']}")
def get_commit_results(commit: str, results: Dict[str, Any]) -> List[Dict[str, Any]]:
workflow_checks = []
for check in results['results']:
if check['sha'] == commit:
workflow_checks.append(WorkflowCheck(
workflowName=check['workflowName'],
name=check['name'],
jobName=check['jobName'],
conclusion=check['conclusion'],
)._asdict())
return workflow_checks
def isGreen(commit: str, results: Dict[str, Any]) -> Tuple[bool, str]:
workflow_checks = get_commit_results(commit, results)
regex = {
"pull": False,
"trunk": False,
"lint": False,
"linux-binary": False,
"windows-binary": False,
}
for check in workflow_checks:
workflowName = check['workflowName']
conclusion = check['conclusion']
for required_check in regex:
if re.match(required_check, workflowName, flags=re.IGNORECASE):
if conclusion not in ["success", "skipped"]:
return (False, workflowName + " checks were not successful")
else:
regex[required_check] = True
if workflowName in ["periodic", "docker-release-builds"] and conclusion not in ["success", "skipped"]:
return (False, workflowName + " checks were not successful")
missing_workflows = [x for x in regex.keys() if not regex[x]]
if len(missing_workflows) > 0:
return (False, "missing required workflows: " + ", ".join(missing_workflows))
return (True, "")
def get_latest_green_commit(commits: List[str], results: Dict[str, Any]) -> Any:
for commit in commits:
eprint(f"Checking {commit}")
is_green, msg = isGreen(commit, results)
if is_green:
eprint("GREEN")
return commit
else:
eprint("RED: " + msg)
return None
def main() -> None:
rs = rockset.Client(
api_server="api.rs2.usw2.rockset.com", api_key=os.environ["ROCKSET_API_KEY"]
)
qlambda = rs.QueryLambda.retrieve(
'commit_jobs_batch_query',
version='15aba20837ae9d75',
workspace='commons')
commits = get_latest_commits()
results = query_commits(commits, qlambda)
latest_viable_commit = get_latest_green_commit(commits, results)
print(latest_viable_commit)
if __name__ == "__main__":
main()
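The removed `isGreen` gate requires every workflow family in its `regex` dict to be present with a `success`/`skipped` conclusion, and additionally fails on broken `periodic`/`docker-release-builds` runs. A condensed, testable restatement of that logic (function and variable names here are illustrative, and `checks` is simplified to `(workflowName, conclusion)` pairs):

```python
import re

REQUIRED = ["pull", "trunk", "lint", "linux-binary", "windows-binary"]

def is_green(checks):
    """Return (ok, reason) for a commit given its workflow checks.
    Every REQUIRED workflow must appear and conclude success/skipped;
    periodic and docker-release-builds may be absent but must not fail."""
    seen = {name: False for name in REQUIRED}
    for workflow, conclusion in checks:
        for required in REQUIRED:
            if re.match(required, workflow, flags=re.IGNORECASE):
                if conclusion not in ("success", "skipped"):
                    return (False, workflow + " checks were not successful")
                seen[required] = True
        if workflow in ("periodic", "docker-release-builds") \
                and conclusion not in ("success", "skipped"):
            return (False, workflow + " checks were not successful")
    missing = [name for name, ok in seen.items() if not ok]
    if missing:
        return (False, "missing required workflows: " + ", ".join(missing))
    return (True, "")
```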

@@ -1,162 +0,0 @@
#!/usr/bin/env python3
import sys
import re
import json
import os
import requests
from typing import Any, Dict, Set, List
import yaml
import warnings
PREFIX = "test-config/"
# Same as shard names
VALID_TEST_CONFIG_LABELS = {f"{PREFIX}{label}" for label in {
"backwards_compat",
"crossref",
"default",
"deploy",
"distributed",
"docs_tests",
"dynamo",
"force_on_cpu",
"functorch",
"jit_legacy",
"multigpu",
"nogpu_AVX512",
"nogpu_NO_AVX2",
"slow",
"xla",
}}
def parse_args() -> Any:
from argparse import ArgumentParser
parser = ArgumentParser("Filter all test configurations and keep only requested ones")
parser.add_argument("--test-matrix", type=str, required=True, help="the original test matrix")
parser.add_argument("--pr-number", type=str, help="the pull request number")
parser.add_argument("--tag", type=str, help="the associated tag if it exists")
return parser.parse_args()
def get_labels(pr_number: int) -> Set[str]:
"""
Dynamically get the latest list of labels from the pull request
"""
# From https://docs.github.com/en/actions/learn-github-actions/environment-variables
PYTORCH_REPO = os.environ.get("GITHUB_REPOSITORY", "pytorch/pytorch")
PYTORCH_GITHUB_API = f"https://api.github.com/repos/{PYTORCH_REPO}"
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
REQUEST_HEADERS = {
"Accept": "application/vnd.github.v3+json",
"Authorization": "token " + GITHUB_TOKEN,
}
response = requests.get(
f"{PYTORCH_GITHUB_API}/issues/{pr_number}/labels",
headers=REQUEST_HEADERS,
)
if response.status_code != requests.codes.ok:
warnings.warn(f"Failed to get the labels for #{pr_number} (status code {response.status_code})")
return set()
return {label.get("name") for label in response.json() if label.get("name")}
def filter(test_matrix: Dict[str, List[Any]], labels: Set[str]) -> Dict[str, List[Any]]:
"""
Select the list of test config to run from the test matrix. The logic works
as follows:
If the PR has one or more labels from the VALID_TEST_CONFIG_LABELS set, only
those test configs will be selected. This also works with ciflow labels; for example,
if a PR has both ciflow/trunk and test-config/functorch, only trunk functorch builds
and tests will be run.
If the PR has none of the test-config labels, all tests are run as usual.
"""
filtered_test_matrix: Dict[str, List[Any]] = {
"include": []
}
for entry in test_matrix.get("include", []):
config_name = entry.get("config", "")
if not config_name:
continue
label = f"{PREFIX}{config_name.strip()}"
if label in labels:
print(f"Select {config_name} because label {label} is presented in the pull request by the time the test starts")
filtered_test_matrix["include"].append(entry)
valid_test_config_labels = labels.intersection(VALID_TEST_CONFIG_LABELS)
if not filtered_test_matrix["include"] and not valid_test_config_labels:
# Found no valid label and the filtered test matrix is empty, return the same
# test matrix as before so that all tests can be run normally
return test_matrix
else:
# When the filtered test matrix contains matches, or a valid test-config label
# is found in the PR, return the filtered test matrix
return filtered_test_matrix
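The selection logic above can be sketched standalone. This is a simplified sketch: it omits the `VALID_TEST_CONFIG_LABELS` intersection check, so a test-config label that matches no config also falls back to the full matrix.

```python
from typing import Any, Dict, List, Set

PREFIX = "test-config/"

def filter_matrix(test_matrix: Dict[str, List[Any]], labels: Set[str]) -> Dict[str, List[Any]]:
    # Keep only entries whose "test-config/<config>" label is on the PR
    filtered: Dict[str, List[Any]] = {"include": [
        entry for entry in test_matrix.get("include", [])
        if entry.get("config") and f"{PREFIX}{entry['config'].strip()}" in labels
    ]}
    # Fall back to the full matrix when nothing matched
    return filtered if filtered["include"] else test_matrix

matrix = {"include": [{"config": "default"}, {"config": "functorch"}]}
# A PR labeled test-config/functorch selects only that config
print(filter_matrix(matrix, {"ciflow/trunk", "test-config/functorch"}))
# → {'include': [{'config': 'functorch'}]}
# With no test-config labels, the full matrix is returned unchanged
print(filter_matrix(matrix, {"ciflow/trunk"})["include"] == matrix["include"])
# → True
```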
def main() -> None:
args = parse_args()
# Load the original test matrix set by the workflow. Its format, however,
# doesn't follow the strict JSON format, so we load it using yaml here for
# its more relaxed syntax
test_matrix = yaml.safe_load(args.test_matrix)
if test_matrix is None:
warnings.warn(f"Invalid test matrix input '{args.test_matrix}', exiting")
# We handle invalid test matrix gracefully by marking it as empty
print("::set-output name=is-test-matrix-empty::True")
sys.exit(0)
pr_number = args.pr_number
tag = args.tag
# If the tag matches, we can get the PR number from it, this is from ciflow
# workflow dispatcher
tag_regex = re.compile(r"^ciflow/\w+/(?P<pr_number>\d+)$")
if pr_number:
# If a PR number is set, query all the labels from that PR
labels = get_labels(int(pr_number))
# Then filter the test matrix and keep only the selected ones
filtered_test_matrix = filter(test_matrix, labels)
elif tag:
m = tag_regex.match(tag)
if m:
pr_number = m.group("pr_number")
# The PR number can also come from the tag in ciflow tag event
labels = get_labels(int(pr_number))
# Filter the test matrix and keep only the selected ones
filtered_test_matrix = filter(test_matrix, labels)
else:
# There is a tag but it isn't ciflow, so there is nothing left to do
filtered_test_matrix = test_matrix
else:
# No PR number, no tag, we can just return the test matrix as it is
filtered_test_matrix = test_matrix
# Set the filtered test matrix as the output
print(f"::set-output name=test-matrix::{json.dumps(filtered_test_matrix)}")
filtered_test_matrix_len = len(filtered_test_matrix.get("include", []))
# and also put a flag if the test matrix is empty, so subsequent jobs can
# quickly check it without the need to parse the JSON string
print(f"::set-output name=is-test-matrix-empty::{filtered_test_matrix_len == 0}")
if __name__ == "__main__":
main()
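The ciflow tag parsing used in main() can be exercised on its own; the regex below is the same one defined above, and the sample tags are illustrative:

```python
import re

# ciflow dispatch tags embed the PR number as the last path segment
tag_regex = re.compile(r"^ciflow/\w+/(?P<pr_number>\d+)$")

m = tag_regex.match("ciflow/periodic/82094")
print(m.group("pr_number") if m else "not a ciflow tag")  # → 82094
print(tag_regex.match("v1.12.1") is None)                 # → True
```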

View File

@@ -13,10 +13,10 @@ architectures:
from typing import Dict, List, Tuple, Optional
CUDA_ARCHES = ["11.6", "11.7"]
CUDA_ARCHES = ["10.2", "11.3", "11.6"]
ROCM_ARCHES = ["5.1.1", "5.2"]
ROCM_ARCHES = ["5.0", "5.1.1"]
def arch_type(arch_version: str) -> str:
@@ -90,8 +90,11 @@ def generate_conda_matrix(os: str) -> List[Dict[str, str]]:
ret: List[Dict[str, str]] = []
arches = ["cpu"]
python_versions = FULL_PYTHON_VERSIONS
if os == "linux" or os == "windows":
if os == "linux":
arches += CUDA_ARCHES
elif os == "windows":
# We don't build CUDA 10.2 for Windows, see https://github.com/pytorch/pytorch/issues/65648
arches += list_without(CUDA_ARCHES, ["10.2"])
elif os == "macos-arm64":
python_versions = list_without(python_versions, ["3.7"])
for python_version in python_versions:
@@ -126,7 +129,8 @@ def generate_libtorch_matrix(os: str, abi_version: str,
arches += CUDA_ARCHES
arches += ROCM_ARCHES
elif os == "windows":
arches += CUDA_ARCHES
# We don't build CUDA 10.2 for Windows, see https://github.com/pytorch/pytorch/issues/65648
arches += list_without(CUDA_ARCHES, ["10.2"])
if libtorch_variants is None:
libtorch_variants = [
@@ -147,45 +151,25 @@
# ROCm builds without-deps failed even in ROCm runners; skip for now
if gpu_arch_type == "rocm" and "without-deps" in libtorch_variant:
continue
desired_cuda = translate_desired_cuda(gpu_arch_type, gpu_arch_version)
if desired_cuda == "rocm5.1.1" and abi_version == PRE_CXX11_ABI:
ret.append(
{
"gpu_arch_type": gpu_arch_type,
"gpu_arch_version": gpu_arch_version,
"desired_cuda": desired_cuda,
"libtorch_variant": libtorch_variant,
"libtorch_config": abi_version if os == "windows" else "",
"devtoolset": abi_version if os != "windows" else "",
"container_image": (
"pytorch/manylinux-builder:rocm5.1.1-cd2573d54f9bd9b8f32b4dd7f182923a846597d5"
if os != "windows" else ""
),
"package_type": "libtorch",
"build_name": f"libtorch-{gpu_arch_type}{gpu_arch_version}-{libtorch_variant}-{abi_version}".replace(
".", "_"
),
}
)
else:
ret.append(
{
"gpu_arch_type": gpu_arch_type,
"gpu_arch_version": gpu_arch_version,
"desired_cuda": desired_cuda,
"libtorch_variant": libtorch_variant,
"libtorch_config": abi_version if os == "windows" else "",
"devtoolset": abi_version if os != "windows" else "",
"container_image": LIBTORCH_CONTAINER_IMAGES[
(arch_version, abi_version)
] if os != "windows" else "",
"package_type": "libtorch",
"build_name": f"libtorch-{gpu_arch_type}{gpu_arch_version}-{libtorch_variant}-{abi_version}".replace(
".", "_"
),
}
)
ret.append(
{
"gpu_arch_type": gpu_arch_type,
"gpu_arch_version": gpu_arch_version,
"desired_cuda": translate_desired_cuda(
gpu_arch_type, gpu_arch_version
),
"libtorch_variant": libtorch_variant,
"libtorch_config": abi_version if os == "windows" else "",
"devtoolset": abi_version if os != "windows" else "",
"container_image": LIBTORCH_CONTAINER_IMAGES[
(arch_version, abi_version)
] if os != "windows" else "",
"package_type": "libtorch",
"build_name": f"libtorch-{gpu_arch_type}{gpu_arch_version}-{libtorch_variant}-{abi_version}".replace(
".", "_"
),
}
)
return ret
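The `list_without` helper called in the hunks above is not shown in this diff; a minimal sketch consistent with its call sites (the exact implementation in the repo may differ):

```python
from typing import List

def list_without(in_list: List[str], without: List[str]) -> List[str]:
    # Return a copy of in_list with every entry from `without` removed
    return [item for item in in_list if item not in without]

CUDA_ARCHES = ["10.2", "11.3", "11.6"]
# Windows builds drop CUDA 10.2 (see pytorch/pytorch#65648)
print(list_without(CUDA_ARCHES, ["10.2"]))  # → ['11.3', '11.6']
```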
@@ -199,81 +183,37 @@ def generate_wheels_matrix(os: str,
if python_versions is None:
# Define default python version
python_versions = list(FULL_PYTHON_VERSIONS)
python_versions = FULL_PYTHON_VERSIONS
if os == "macos-arm64":
python_versions = list_without(python_versions, ["3.7"])
if os == "linux":
# NOTE: We only build 3.11 wheel on linux as 3.11 is not
# available on conda right now
python_versions.append("3.11")
if arches is None:
# Define default compute architectures
arches = ["cpu"]
if os == "linux":
arches += CUDA_ARCHES + ROCM_ARCHES
elif os == "windows":
arches += CUDA_ARCHES
# We don't build CUDA 10.2 for Windows, see https://github.com/pytorch/pytorch/issues/65648
arches += list_without(CUDA_ARCHES, ["10.2"])
ret: List[Dict[str, str]] = []
for python_version in python_versions:
for arch_version in arches:
gpu_arch_type = arch_type(arch_version)
gpu_arch_version = "" if arch_version == "cpu" else arch_version
# Skip rocm 3.11 binaries for now as the docker image are not correct
if python_version == "3.11" and gpu_arch_type == "rocm":
continue
desired_cuda = translate_desired_cuda(gpu_arch_type, gpu_arch_version)
# special 11.7 wheels package without dependencies
# dependency downloaded via pip install
if arch_version == "11.7" and os == "linux":
ret.append(
{
"python_version": python_version,
"gpu_arch_type": gpu_arch_type,
"gpu_arch_version": gpu_arch_version,
"desired_cuda": desired_cuda,
"container_image": WHEEL_CONTAINER_IMAGES[arch_version],
"package_type": package_type,
"pytorch_extra_install_requirements":
"nvidia-cuda-runtime-cu11; platform_system == 'Linux' | "
"nvidia-cudnn-cu11==8.5.0.96; platform_system == 'Linux' | "
"nvidia-cublas-cu11==11.10.3.66; platform_system == 'Linux'",
"build_name":
f"{package_type}-py{python_version}-{gpu_arch_type}{gpu_arch_version}-with-pypi-cudnn"
.replace(
".", "_"
),
}
)
if desired_cuda == "rocm5.1.1":
ret.append(
{
"python_version": python_version,
"gpu_arch_type": gpu_arch_type,
"gpu_arch_version": gpu_arch_version,
"desired_cuda": desired_cuda,
"container_image": "pytorch/manylinux-builder:rocm5.1.1-cd2573d54f9bd9b8f32b4dd7f182923a846597d5",
"package_type": package_type,
"build_name": f"{package_type}-py{python_version}-{gpu_arch_type}{gpu_arch_version}".replace(
".", "_"
),
}
)
else:
ret.append(
{
"python_version": python_version,
"gpu_arch_type": gpu_arch_type,
"gpu_arch_version": gpu_arch_version,
"desired_cuda": desired_cuda,
"container_image": WHEEL_CONTAINER_IMAGES[arch_version],
"package_type": package_type,
"build_name": f"{package_type}-py{python_version}-{gpu_arch_type}{gpu_arch_version}".replace(
".", "_"
),
}
)
ret.append(
{
"python_version": python_version,
"gpu_arch_type": gpu_arch_type,
"gpu_arch_version": gpu_arch_version,
"desired_cuda": translate_desired_cuda(
gpu_arch_type, gpu_arch_version
),
"container_image": WHEEL_CONTAINER_IMAGES[arch_version],
"package_type": package_type,
"build_name": f"{package_type}-py{python_version}-{gpu_arch_type}{gpu_arch_version}".replace(
".", "_"
),
}
)
return ret

View File

@@ -17,6 +17,7 @@ Arch = Literal["windows", "linux", "macos"]
GITHUB_DIR = Path(__file__).resolve().parent.parent
LABEL_CIFLOW_TRUNK = "ciflow/trunk"
LABEL_CIFLOW_ALL = "ciflow/all"
LABEL_CIFLOW_BINARIES = "ciflow/binaries"
LABEL_CIFLOW_PERIODIC = "ciflow/periodic"
LABEL_CIFLOW_BINARIES_LIBTORCH = "ciflow/binaries_libtorch"
@@ -33,6 +34,7 @@ class CIFlowConfig:
def __post_init__(self) -> None:
if not self.isolated_workflow:
self.labels.add(LABEL_CIFLOW_ALL)
if LABEL_CIFLOW_PERIODIC not in self.labels:
self.labels.add(LABEL_CIFLOW_TRUNK)
@@ -134,7 +136,7 @@ LINUX_BINARY_SMOKE_WORKFLOWS = [
package_type="manywheel",
build_configs=generate_binary_build_matrix.generate_wheels_matrix(
OperatingSystem.LINUX,
arches=["11.6"],
arches=["10.2"],
python_versions=["3.7"]),
branches="master",
),
@@ -207,6 +209,15 @@ WINDOWS_BINARY_BUILD_WORKFLOWS = [
),
]
WINDOWS_BINARY_SMOKE_WORKFLOWS = [
BinaryBuildWorkflow(
os=OperatingSystem.WINDOWS,
package_type="wheel",
build_configs=generate_binary_build_matrix.generate_wheels_matrix(
OperatingSystem.WINDOWS,
arches=["11.3"],
python_versions=["3.7"]),
branches="master",
),
BinaryBuildWorkflow(
os=OperatingSystem.WINDOWS,
package_type="libtorch",

View File

@@ -31,9 +31,7 @@ parser.add_argument(
args = parser.parse_args()
# From https://docs.github.com/en/actions/learn-github-actions/environment-variables
PYTORCH_REPO = os.environ.get("GITHUB_REPOSITORY", "pytorch/pytorch")
PYTORCH_GITHUB_API = f"https://api.github.com/repos/{PYTORCH_REPO}"
PYTORCH_REPO = "https://api.github.com/repos/pytorch/pytorch"
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
REQUEST_HEADERS = {
"Accept": "application/vnd.github.v3+json",
@@ -41,7 +39,7 @@ REQUEST_HEADERS = {
}
response = requests.get(
f"{PYTORCH_GITHUB_API}/actions/runs/{args.workflow_run_id}/jobs?per_page=100",
f"{PYTORCH_REPO}/actions/runs/{args.workflow_run_id}/jobs?per_page=100",
headers=REQUEST_HEADERS,
)

View File

@@ -305,8 +305,8 @@ def patterns_to_regex(allowed_patterns: List[str]) -> Any:
"""
pattern is glob-like, i.e. the only special sequences it has are:
- ? - matches single character
- * - matches any non-folder separator characters or no character
- ** - matches any characters or no character
- * - matches any non-folder separator characters
- ** - matches any characters
Assuming that patterns are free of braces and backslashes
the only character that needs to be escaped are dot and plus
"""
@@ -324,9 +324,9 @@ def patterns_to_regex(allowed_patterns: List[str]) -> Any:
elif c == "*":
if pattern_.peek() == "*":
next(pattern_)
rc += ".*"
rc += ".+"
else:
rc += "[^/]*"
rc += "[^/]+"
else:
rc += c
rc += ")"

6397
.github/scripts/gql_mocks.json generated vendored

File diff suppressed because one or more lines are too long

View File

@@ -2,10 +2,8 @@
set -eou pipefail
DISTRIBUTION=$(. /etc/os-release;echo $ID$VERSION_ID)
DRIVER_VERSION="515.57"
DRIVER_FN="NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run"
DISTRIBUTION=$(. /etc/os-release;echo $ID$VERSION_ID) \
DRIVER_FN="NVIDIA-Linux-x86_64-510.60.02.run"
YUM_REPO_URL="https://nvidia.github.io/nvidia-docker/${DISTRIBUTION}/nvidia-docker.repo"
install_nvidia_docker2_amzn2() {
@@ -22,52 +20,17 @@ install_nvidia_docker2_amzn2() {
install_nvidia_driver_amzn2() {
(
set -x
HAS_NVIDIA_DRIVER=0
# Check if NVIDIA driver has already been installed
if [ -x "$(command -v nvidia-smi)" ]; then
# The driver exists, check its version next
INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader)
if [ "$INSTALLED_DRIVER_VERSION" != "$DRIVER_VERSION" ]; then
# TODO
# Remove this after torchrec and FBGEMM have both been updated to use
# PyTorch NVIDIA installation script instead of using the latest driver
# from RHEL repo
HAS_NVIDIA_DRIVER=1
echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has been installed, but we expect to have $DRIVER_VERSION instead. Skipping NVIDIA driver installation for now until torchrec and FBGEMM are updated to use PyTorch NVIDIA installation script instead of RHEL repo"
else
HAS_NVIDIA_DRIVER=1
echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has already been installed. Skipping NVIDIA driver installation"
fi
fi
if [ "$HAS_NVIDIA_DRIVER" -eq 0 ]; then
sudo yum groupinstall -y "Development Tools"
# ensure our kernel install is the same as our underlying kernel,
# groupinstall "Development Tools" has a habit of mismatching kernel headers
sudo yum install -y "kernel-devel-uname-r == $(uname -r)"
sudo modprobe backlight
sudo curl -fsL -o /tmp/nvidia_driver "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN"
sudo /bin/bash /tmp/nvidia_driver -s --no-drm || (sudo cat /var/log/nvidia-installer.log && false)
sudo rm -fv /tmp/nvidia_driver
fi
sudo yum groupinstall -y "Development Tools"
# ensure our kernel install is the same as our underlying kernel,
# groupinstall "Development Tools" has a habit of mismatching kernel headers
sudo yum install -y "kernel-devel-uname-r == $(uname -r)"
sudo curl -fsL -o /tmp/nvidia_driver "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN"
sudo /bin/bash /tmp/nvidia_driver -s --no-drm || (sudo cat /var/log/nvidia-installer.log && false)
sudo rm -fv /tmp/nvidia_driver
nvidia-smi
)
}
echo "== Installing nvidia driver ${DRIVER_FN} =="
case "${DISTRIBUTION}" in
amzn*)
install_nvidia_driver_amzn2
;;
*)
echo "ERROR: Unknown distribution ${DISTRIBUTION}"
exit 1
;;
esac
# Install container toolkit based on distribution
echo "== Installing nvidia container toolkit for ${DISTRIBUTION} =="
case "${DISTRIBUTION}" in
@@ -79,3 +42,14 @@ case "${DISTRIBUTION}" in
exit 1
;;
esac
echo "== Installing nvidia driver ${DRIVER_FN} =="
case "${DISTRIBUTION}" in
amzn*)
install_nvidia_driver_amzn2
;;
*)
echo "ERROR: Unknown distribution ${DISTRIBUTION}"
exit 1
;;
esac

View File

@@ -1,60 +0,0 @@
#!/usr/bin/env bash
set -eou pipefail
GIT_TOP_DIR=$(git rev-parse --show-toplevel)
TMPFILE=$(mktemp)
trap "rm -rf ${TMPFILE}" EXIT
# By default just run against the latest commit
BASE=${BASE:-HEAD~1}
HEAD=${HEAD:-HEAD}
ancestor=$(git merge-base "${BASE}" "${HEAD}")
echo "INFO: Checking aginst the following stats"
(
set -x
git diff --stat "$ancestor" "${HEAD}" | sed '$d' > "${TMPFILE}"
)
while read -r git_attribute; do
if echo "${git_attribute}" | grep "linguist-generated=true" >/dev/null 2>/dev/null; then
pattern=$(echo ${git_attribute} | cut -d' ' -f1)
escaped_pattern=$(printf '%s\n' "$pattern" | sed -e 's/[\/&]/\\&/g')
# Delete known generated files
sed -i '/'"${escaped_pattern}"'/d' "${TMPFILE}"
fi
done < "${GIT_TOP_DIR}/.gitattributes"
echo "INFO: Showing non-generated files:"
(
set -x
cat "${TMPFILE}"
)
# Get only files that have changed
changed_files=$(cut -d' ' -f2 "${TMPFILE}" | xargs)
details=$(git diff --shortstat "$ancestor" "${HEAD}" -- ${changed_files})
add=$(echo "$details" | grep -o '[0-9]* insertion' | grep -o '[0-9]*' || true)
remove=$(echo "$details" | grep -o '[0-9]* deletion' | grep -o '[0-9]*' || true)
pr_size=0
if [ "$add" ]; then
pr_size=$(("$pr_size" + "$add"))
fi
if [ "$remove" ]; then
pr_size=$(("$pr_size" + "$remove"))
fi
echo "INFO: PR SIZE is ${pr_size}"
if ((pr_size > 2000)); then
echo
echo 'Your PR is '"$pr_size"' LOC which is more than the 2000 maximum'
echo 'allowed within PyTorch infra. Please make sure to split up'
echo 'your PR into smaller pieces that can be reviewed.'
echo 'If you think that this rule should not apply to your PR,'
echo 'please contact @albanD or @seemethere.'
echo
exit 1
fi

View File

@@ -99,7 +99,7 @@ if __name__ == "__main__":
repo_labels = get_repo_labels()
primary_labels = set(filter(lambda x: x.startswith(PRIMARY_LABEL_FILTER), repo_labels))
has_both_labels = bool(primary_labels.intersection(labels)) and bool(SECONDARY_LABELS.intersection(labels))
has_both_labels = bool(primary_labels.intersection(labels) and SECONDARY_LABELS.intersection(labels))
is_properly_labeled = has_both_labels or bool(ALLOWED_ONLY_SECONDARY.intersection(labels))
if not is_properly_labeled:

View File

@@ -5,25 +5,22 @@ Currently, only supports running tests on specified model names
Testing environment:
- Intel Xeon 8259CL @ 2.50 GHz, 24 Cores with disabled Turbo and HT
- Nvidia Tesla T4
- Nvidia Driver 470.82.01
- Python 3.8
- CUDA 11.3
- Nvidia Driver 450.51.06
- Python 3.7
- CUDA 10.2
"""
# Known issues:
# 1. Does not reuse the build artifact in other CI workflows
# 2. CI jobs are serialized because there is only one worker
import os
import boto3 # type: ignore[import]
import git # type: ignore[import]
import pathlib
import argparse
import subprocess
from pathlib import Path
from typing import List, Tuple
from typing import List
TORCHBENCH_CONFIG_NAME = "config.yaml"
TORCHBENCH_USERBENCHMARK_CONFIG_NAME = "ub-config.yaml"
MAGIC_PREFIX = "RUN_TORCHBENCH:"
MAGIC_TORCHBENCH_PREFIX = "TORCHBENCH_BRANCH:"
ABTEST_CONFIG_TEMPLATE = """# This config is automatically generated by run_torchbench.py
@@ -33,25 +30,6 @@ threshold: 100
direction: decrease
timeout: 720
tests:"""
S3_BUCKET = "ossci-metrics"
S3_PREFIX = "torchbench-pr-test"
S3_URL_BASE = f"https://{S3_BUCKET}.s3.amazonaws.com/"
class S3Client:
def __init__(self, bucket: str = S3_BUCKET, prefix: str = S3_PREFIX):
self.s3 = boto3.client('s3')
self.resource = boto3.resource('s3')
self.bucket = bucket
self.prefix = prefix
def upload_file(self, file_path: Path, filekey_prefix: str) -> None:
assert file_path.is_file(), f"Specified file path {file_path} does not exist or is not a file."
file_name = file_path.name
s3_key = f"{self.prefix}/{filekey_prefix}/{file_name}"
print(f"Uploading file {file_name} to S3 with key: {s3_key}")
self.s3.upload_file(str(file_path), self.bucket, s3_key)
# output the result URL
print(f"Uploaded the result file {file_name} to {S3_URL_BASE}{s3_key}")
def gen_abtest_config(control: str, treatment: str, models: List[str]) -> str:
d = {}
@@ -76,51 +54,33 @@ def find_current_branch(repo_path: str) -> str:
name: str = repo.active_branch.name
return name
def deploy_torchbench_config(output_dir: str, config: str, config_name: str = TORCHBENCH_CONFIG_NAME) -> None:
def deploy_torchbench_config(output_dir: str, config: str) -> None:
# Create test dir if needed
pathlib.Path(output_dir).mkdir(exist_ok=True)
# TorchBench config file name
config_path = os.path.join(output_dir, config_name)
config_path = os.path.join(output_dir, TORCHBENCH_CONFIG_NAME)
with open(config_path, "w") as fp:
fp.write(config)
def get_valid_models(torchbench_path: str) -> List[str]:
benchmark_path = os.path.join(torchbench_path, "torchbenchmark", "models")
valid_models = [model for model in os.listdir(benchmark_path) if os.path.isdir(os.path.join(benchmark_path, model))]
return valid_models
def get_valid_userbenchmarks(torchbench_path: str) -> List[str]:
def is_valid_ub_dir(ub_path: str) -> bool:
return os.path.isdir(ub_path) and os.path.exists(os.path.join(ub_path, "__init__.py"))
ub_path = os.path.join(os.path.abspath(torchbench_path), "userbenchmark")
ubs = list(filter(is_valid_ub_dir, [os.path.join(ub_path, ubdir) for ubdir in os.listdir(ub_path)]))
valid_ubs = list(map(lambda x: os.path.basename(x), ubs))
return valid_ubs
def extract_models_from_pr(torchbench_path: str, prbody_file: str) -> Tuple[List[str], List[str]]:
def extract_models_from_pr(torchbench_path: str, prbody_file: str) -> List[str]:
model_list = []
userbenchmark_list = []
pr_list = []
with open(prbody_file, "r") as pf:
lines = map(lambda x: x.strip(), pf.read().splitlines())
magic_lines = list(filter(lambda x: x.startswith(MAGIC_PREFIX), lines))
if magic_lines:
# Only the first magic line will be recognized.
pr_list = list(map(lambda x: x.strip(), magic_lines[0][len(MAGIC_PREFIX):].split(",")))
valid_models = get_valid_models(torchbench_path)
valid_ubs = get_valid_userbenchmarks(torchbench_path)
for pr_bm in pr_list:
if pr_bm in valid_models or pr_bm == "ALL":
model_list.append(pr_bm)
elif pr_bm in valid_ubs:
userbenchmark_list.append(pr_bm)
else:
print(f"The model or benchmark {pr_bm} you specified does not exist in TorchBench suite. Please double check.")
exit(-1)
# Shortcut: if pr_list is ["ALL"], run all the model tests
if "ALL" in model_list:
model_list = ["ALL"]
return model_list, userbenchmark_list
model_list = list(map(lambda x: x.strip(), magic_lines[0][len(MAGIC_PREFIX):].split(",")))
# Shortcut: if model_list is ["ALL"], run all the tests
if model_list == ["ALL"]:
return model_list
# Sanity check: make sure all the user specified models exist in torchbench repository
benchmark_path = os.path.join(torchbench_path, "torchbenchmark", "models")
full_model_list = [model for model in os.listdir(benchmark_path) if os.path.isdir(os.path.join(benchmark_path, model))]
for m in model_list:
if m not in full_model_list:
print(f"The model {m} you specified does not exist in TorchBench suite. Please double check.")
return []
return model_list
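The `RUN_TORCHBENCH:` magic-line parsing shared by both versions of `extract_models_from_pr` can be sketched in isolation (`parse_magic_line` is a hypothetical name used only for this illustration):

```python
from typing import List

MAGIC_PREFIX = "RUN_TORCHBENCH:"

def parse_magic_line(pr_body: str) -> List[str]:
    lines = (line.strip() for line in pr_body.splitlines())
    magic_lines = [line for line in lines if line.startswith(MAGIC_PREFIX)]
    if not magic_lines:
        return []
    # Only the first magic line is recognized
    return [item.strip() for item in magic_lines[0][len(MAGIC_PREFIX):].split(",")]

body = "Fix a perf regression\n\nRUN_TORCHBENCH: resnet50, BERT_pytorch\n"
print(parse_magic_line(body))  # → ['resnet50', 'BERT_pytorch']
```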
def find_torchbench_branch(prbody_file: str) -> str:
branch_name: str = ""
@@ -140,39 +100,13 @@ def run_torchbench(pytorch_path: str, torchbench_path: str, output_dir: str) ->
env = dict(os.environ)
command = ["python", "bisection.py", "--work-dir", output_dir,
"--pytorch-src", pytorch_path, "--torchbench-src", torchbench_path,
"--config", os.path.join(output_dir, TORCHBENCH_CONFIG_NAME),
"--config", os.path.join(output_dir, "config.yaml"),
"--output", os.path.join(output_dir, "result.txt")]
print(f"Running torchbench command: {command}")
subprocess.check_call(command, cwd=torchbench_path, env=env)
def run_userbenchmarks(pytorch_path: str, torchbench_path: str, base_sha: str, head_sha: str,
userbenchmark: str, output_dir: str) -> None:
# Copy system environment so that we will not override
env = dict(os.environ)
command = ["python", "./.github/scripts/abtest.py",
"--pytorch-repo", pytorch_path,
"--base", base_sha,
"--head", head_sha,
"--userbenchmark", userbenchmark,
"--output-dir", output_dir]
print(f"Running torchbench userbenchmark command: {command}")
subprocess.check_call(command, cwd=torchbench_path, env=env)
def process_upload_s3(result_dir: str) -> None:
# validate result directory
result_dir_path = Path(result_dir)
assert result_dir_path.exists(), f"Specified result directory {result_dir} doesn't exist."
# upload all files to S3 bucket oss-ci-metrics
files = [x for x in result_dir_path.iterdir() if x.is_file()]
# upload file to S3 bucket
s3_client: S3Client = S3Client()
filekey_prefix = result_dir_path.name
for f in files:
s3_client.upload_file(f, filekey_prefix)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Run TorchBench tests based on PR')
parser.add_argument('--pr-body', help="The file that contains body of a Pull Request")
parser.add_argument('--pr-body', required=True, help="The file that contains body of a Pull Request")
subparsers = parser.add_subparsers(dest='command')
# parser for setup the torchbench branch name env
@@ -184,9 +118,6 @@ if __name__ == "__main__":
run_parser.add_argument('--pr-head-sha', required=True, type=str, help="The Pull Request head hash")
run_parser.add_argument('--pytorch-path', required=True, type=str, help="Path to pytorch repository")
run_parser.add_argument('--torchbench-path', required=True, type=str, help="Path to TorchBench repository")
# parser to upload results to S3
upload_parser = subparsers.add_parser("upload-s3")
upload_parser.add_argument('--result-dir', required=True, type=str, help="Path to benchmark output")
args = parser.parse_args()
if args.command == 'set-torchbench-branch':
@@ -195,30 +126,21 @@
setup_gha_env(MAGIC_TORCHBENCH_PREFIX[:-1], branch_name)
elif args.command == 'run':
output_dir: str = os.path.join(os.environ["HOME"], ".torchbench", "bisection", f"pr{args.pr_num}")
# Identify the specified models and verify the input
models = extract_models_from_pr(args.torchbench_path, args.pr_body)
if not models:
print("Can't parse the model filter from the pr body. Currently we only support allow-list.")
exit(-1)
# Assert the current branch in args.torchbench_path is the same as the one specified in pr body
branch_name = find_torchbench_branch(args.pr_body)
current_branch = find_current_branch(args.torchbench_path)
assert branch_name == current_branch, f"Torchbench repo {args.torchbench_path} is on branch {current_branch}, \
but user specified to run on branch {branch_name}."
print(f"Ready to run TorchBench with benchmark. Result will be saved in the directory: {output_dir}.")
# Identify the specified models and userbenchmarks
models, userbenchmarks = extract_models_from_pr(args.torchbench_path, args.pr_body)
if models:
torchbench_config = gen_abtest_config(args.pr_base_sha, args.pr_head_sha, models)
deploy_torchbench_config(output_dir, torchbench_config)
run_torchbench(pytorch_path=args.pytorch_path, torchbench_path=args.torchbench_path, output_dir=output_dir)
if userbenchmarks:
assert len(userbenchmarks) == 1, \
"We don't support running multiple userbenchmarks in single workflow yet." \
"If you need, please submit a feature request."
run_userbenchmarks(pytorch_path=args.pytorch_path, torchbench_path=args.torchbench_path,
base_sha=args.pr_base_sha, head_sha=args.pr_head_sha,
userbenchmark=userbenchmarks[0], output_dir=output_dir)
if not models and not userbenchmarks:
print("Can't parse valid models or userbenchmarks from the pr body. Quit.")
exit(-1)
elif args.command == 'upload-s3':
process_upload_s3(args.result_dir)
# Run TorchBench with the generated config
torchbench_config = gen_abtest_config(args.pr_base_sha, args.pr_head_sha, models)
deploy_torchbench_config(output_dir, torchbench_config)
run_torchbench(pytorch_path=args.pytorch_path, torchbench_path=args.torchbench_path, output_dir=output_dir)
else:
print(f"The command {args.command} is not supported.")
exit(-1)

25
.github/scripts/syncbranches.py vendored Executable file
View File

@@ -0,0 +1,25 @@
#!/usr/bin/env python3
from gitutils import get_git_repo_dir, get_git_remote_name, GitRepo
from typing import Any
def parse_args() -> Any:
from argparse import ArgumentParser
parser = ArgumentParser("Merge PR/branch into default branch")
parser.add_argument("--sync-branch", default="sync")
parser.add_argument("--default-branch", type=str, default="main")
parser.add_argument("--dry-run", action="store_true")
parser.add_argument("--debug", action="store_true")
return parser.parse_args()
def main() -> None:
args = parse_args()
repo = GitRepo(get_git_repo_dir(), get_git_remote_name(), debug=args.debug)
repo.cherry_pick_commits(args.sync_branch, args.default_branch)
repo.push(args.default_branch, args.dry_run)
if __name__ == '__main__':
main()

View File

@@ -1,101 +0,0 @@
from unittest import TestCase, main, mock
from typing import Any, List, Dict
from fetch_latest_green_commit import isGreen, WorkflowCheck
workflowNames = [
"pull",
"trunk",
"Lint",
"linux-binary-libtorch-pre-cxx11",
"android-tests",
"windows-binary-wheel",
"periodic",
"docker-release-builds",
"nightly",
"pr-labels",
"Close stale pull requests",
"Update S3 HTML indices for download.pytorch.org",
"Create Release"
]
def set_workflow_job_status(workflow: List[Dict[str, Any]], name: str, status: str) -> List[Dict[str, Any]]:
for check in workflow:
if check['workflowName'] == name:
check['conclusion'] = status
return workflow
class TestChecks:
def make_test_checks(self) -> List[Dict[str, Any]]:
workflow_checks = []
for i in range(len(workflowNames)):
workflow_checks.append(WorkflowCheck(
workflowName=workflowNames[i],
name="test/job",
jobName="job",
conclusion="success",
)._asdict())
return workflow_checks
class TestPrintCommits(TestCase):
@mock.patch('fetch_latest_green_commit.get_commit_results', return_value=TestChecks().make_test_checks())
def test_all_successful(self, mock_get_commit_results: Any) -> None:
"Test with workflows are successful"
workflow_checks = mock_get_commit_results()
self.assertTrue(isGreen("sha", workflow_checks)[0])
@mock.patch('fetch_latest_green_commit.get_commit_results', return_value=TestChecks().make_test_checks())
def test_necessary_successful(self, mock_get_commit_results: Any) -> None:
"Test with necessary workflows are successful"
workflow_checks = mock_get_commit_results()
workflow_checks = set_workflow_job_status(workflow_checks, workflowNames[8], "failed")
workflow_checks = set_workflow_job_status(workflow_checks, workflowNames[9], "failed")
workflow_checks = set_workflow_job_status(workflow_checks, workflowNames[10], "failed")
workflow_checks = set_workflow_job_status(workflow_checks, workflowNames[11], "failed")
workflow_checks = set_workflow_job_status(workflow_checks, workflowNames[12], "failed")
self.assertTrue(isGreen("sha", workflow_checks)[0])
@mock.patch('fetch_latest_green_commit.get_commit_results', return_value=TestChecks().make_test_checks())
def test_necessary_skipped(self, mock_get_commit_results: Any) -> None:
"Test with necessary job (ex: pull) skipped"
workflow_checks = mock_get_commit_results()
workflow_checks = set_workflow_job_status(workflow_checks, "pull", "skipped")
result = isGreen("sha", workflow_checks)
self.assertTrue(result[0])
@mock.patch('fetch_latest_green_commit.get_commit_results', return_value=TestChecks().make_test_checks())
def test_skippable_skipped(self, mock_get_commit_results: Any) -> None:
"Test with skippable jobs (periodic and docker-release-builds skipped"
workflow_checks = mock_get_commit_results()
workflow_checks = set_workflow_job_status(workflow_checks, "periodic", "skipped")
workflow_checks = set_workflow_job_status(workflow_checks, "docker-release-builds", "skipped")
self.assertTrue(isGreen("sha", workflow_checks))
@mock.patch('fetch_latest_green_commit.get_commit_results', return_value=TestChecks().make_test_checks())
def test_necessary_failed(self, mock_get_commit_results: Any) -> None:
"Test with necessary job (ex: Lint) failed"
workflow_checks = mock_get_commit_results()
workflow_checks = set_workflow_job_status(workflow_checks, "Lint", "failed")
result = isGreen("sha", workflow_checks)
self.assertFalse(result[0])
self.assertEqual(result[1], "Lint checks were not successful")
@mock.patch('fetch_latest_green_commit.get_commit_results', return_value=TestChecks().make_test_checks())
def test_skippable_failed(self, mock_get_commit_results: Any) -> None:
"Test with skippable job (ex: docker-release-builds) failing"
workflow_checks = mock_get_commit_results()
workflow_checks = set_workflow_job_status(workflow_checks, "periodic", "skipped")
workflow_checks = set_workflow_job_status(workflow_checks, "docker-release-builds", "failed")
result = isGreen("sha", workflow_checks)
self.assertFalse(result[0])
self.assertEqual(result[1], "docker-release-builds checks were not successful")
@mock.patch('fetch_latest_green_commit.get_commit_results', return_value={})
def test_no_workflows(self, mock_get_commit_results: Any) -> None:
"Test with missing workflows"
workflow_checks = mock_get_commit_results()
result = isGreen("sha", workflow_checks)
self.assertFalse(result[0])
self.assertEqual(result[1], "missing required workflows: pull, trunk, lint, linux-binary, windows-binary")
if __name__ == "__main__":
main()

View File

@@ -1,88 +0,0 @@
#!/usr/bin/env python3
import os
import yaml
import json
from unittest import TestCase, main, mock
from filter_test_configs import get_labels, filter, PREFIX, VALID_TEST_CONFIG_LABELS
import requests
from requests.models import Response
from typing import Any, Dict
def mocked_gh_get_labels_failed(url: str, headers: Dict[str, str]) -> Response:
mocked_response = Response()
mocked_response.status_code = requests.codes.bad_request
return mocked_response
def mocked_gh_get_labels(url: str, headers: Dict[str, str]) -> Response:
mocked_response = Response()
mocked_response.status_code = requests.codes.ok
mocked_response._content = b'[{"name": "foo"}, {"name": "bar"}, {}, {"name": ""}]'
return mocked_response
class TestConfigFilter(TestCase):
def setUp(self) -> None:
os.environ["GITHUB_TOKEN"] = "GITHUB_TOKEN"
@mock.patch("filter_test_configs.requests.get", side_effect=mocked_gh_get_labels)
def test_get_labels(self, mocked_gh: Any) -> None:
labels = get_labels(pr_number=12345)
self.assertSetEqual({"foo", "bar"}, labels)
@mock.patch("filter_test_configs.requests.get", side_effect=mocked_gh_get_labels_failed)
def test_get_labels_failed(self, mocked_gh: Any) -> None:
labels = get_labels(pr_number=54321)
self.assertFalse(labels)
def test_filter(self) -> None:
mocked_labels = {f"{PREFIX}cfg", "ciflow/trunk", "plain-cfg"}
testcases = [
{
"test_matrix": '{include: [{config: "default", runner: "linux"}]}',
"expected": '{"include": [{"config": "default", "runner": "linux"}]}',
"description": "No match, keep the same test matrix",
},
{
"test_matrix": '{include: [{config: "default", runner: "linux"}, {config: "plain-cfg"}]}',
"expected": '{"include": [{"config": "default", "runner": "linux"}, {"config": "plain-cfg"}]}',
"description": "No match because there is no prefix or suffix, keep the same test matrix",
},
{
"test_matrix": '{include: [{config: "default", runner: "linux"}, {config: "cfg", shard: 1}]}',
"expected": '{"include": [{"config": "cfg", "shard": 1}]}',
"description": "Found a match, only keep that",
},
]
for case in testcases:
filtered_test_matrix = filter(yaml.safe_load(case["test_matrix"]), mocked_labels)
self.assertEqual(case["expected"], json.dumps(filtered_test_matrix))
def test_filter_with_valid_label(self) -> None:
mocked_labels = {f"{PREFIX}cfg", "ciflow/trunk"}
VALID_TEST_CONFIG_LABELS.add(f"{PREFIX}cfg")
testcases = [
{
"test_matrix": '{include: [{config: "default", runner: "linux"}]}',
"expected": '{"include": []}',
"description": "Found a valid label in the PR body, return the filtered test matrix",
},
{
"test_matrix": '{include: [{config: "default", runner: "linux"}, {config: "cfg", shard: 1}]}',
"expected": '{"include": [{"config": "cfg", "shard": 1}]}',
"description": "Found a match, only keep that",
},
]
for case in testcases:
filtered_test_matrix = filter(yaml.safe_load(case["test_matrix"]), mocked_labels)
self.assertEqual(case["expected"], json.dumps(filtered_test_matrix))
if __name__ == '__main__':
main()
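The expectations in these cases imply the filtering rule: an entry survives only if its config name appears among the PR labels with the test-config prefix, and an empty match leaves the matrix untouched. A rough sketch of that behavior (the `filter_matrix` body and the `PREFIX` value are assumptions inferred from the test cases, not the real `filter` implementation):

```python
from typing import Any, Dict, List, Set

PREFIX = "test-config/"  # assumed label prefix; the real value lives in filter_test_configs

def filter_matrix(test_matrix: Dict[str, List[Dict[str, Any]]],
                  labels: Set[str]) -> Dict[str, List[Dict[str, Any]]]:
    # Keep entries whose config name is present as a prefixed PR label;
    # when nothing matches, return the matrix unchanged.
    kept = [entry for entry in test_matrix["include"]
            if f"{PREFIX}{entry['config']}" in labels]
    return {"include": kept} if kept else test_matrix

matrix = {"include": [{"config": "default", "runner": "linux"},
                      {"config": "cfg", "shard": 1}]}
filtered = filter_matrix(matrix, {f"{PREFIX}cfg", "ciflow/trunk"})
assert filtered == {"include": [{"config": "cfg", "shard": 1}]}
```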


@@ -1,5 +1,5 @@
#!/usr/bin/env python3
from gitutils import PeekableIterator, patterns_to_regex
from gitutils import PeekableIterator
from unittest import TestCase, main
class TestPeekableIterator(TestCase):
@@ -22,18 +22,6 @@ class TestPeekableIterator(TestCase):
self.assertTrue(iter_.peek() is None)
class TestPattern(TestCase):
def test_double_asterisks(self) -> None:
allowed_patterns = [
"aten/src/ATen/native/**LinearAlgebra*",
]
patterns_re = patterns_to_regex(allowed_patterns)
fnames = [
"aten/src/ATen/native/LinearAlgebra.cpp",
"aten/src/ATen/native/cpu/LinearAlgebraKernel.cpp"]
for filename in fnames:
self.assertTrue(patterns_re.match(filename))
if __name__ == '__main__':
main()


@@ -11,28 +11,12 @@ import json
import os
from hashlib import sha256
from trymerge import (find_matching_merge_rule,
get_land_checkrun_conclusions,
validate_land_time_checks,
gh_graphql,
gh_get_team_members,
read_merge_rules,
validate_revert,
filter_pending_checks,
filter_failed_checks,
GitHubPR,
MergeRule,
MandatoryChecksMissingError,
WorkflowCheckState,
main as trymerge_main)
from trymerge import find_matching_merge_rule, gh_graphql, gh_get_team_members, GitHubPR, MergeRule, MandatoryChecksMissingError
from gitutils import get_git_remote_name, get_git_repo_dir, GitRepo
from typing import Any, List, Optional
from typing import cast, Any, List, Optional
from unittest import TestCase, main, mock
from urllib.error import HTTPError
if 'GIT_REMOTE_URL' not in os.environ:
os.environ['GIT_REMOTE_URL'] = "https://github.com/pytorch/pytorch"
def mocked_gh_graphql(query: str, **kwargs: Any) -> Any:
gql_db_fname = os.path.join(os.path.dirname(__file__), "gql_mocks.json")
@@ -69,110 +53,39 @@ def mocked_gh_graphql(query: str, **kwargs: Any) -> Any:
return rc
def mock_parse_args(revert: bool = False,
force: bool = False) -> Any:
class Object(object):
def __init__(self) -> None:
self.revert = revert
self.force = force
self.pr_num = 76123
self.dry_run = True
self.comment_id = 0
self.on_mandatory = False
self.on_green = False
self.land_checks = False
self.reason = 'this is for testing'
return Object()
def mock_revert(repo: GitRepo, pr: GitHubPR, *,
dry_run: bool = False,
comment_id: Optional[int] = None,
reason: Optional[str] = None) -> None:
pass
def mock_merge(pr_num: int, repo: GitRepo,
dry_run: bool = False,
skip_mandatory_checks: bool = False,
comment_id: Optional[int] = None,
mandatory_only: bool = False,
on_green: bool = False,
land_checks: bool = False,
timeout_minutes: int = 400,
stale_pr_days: int = 3) -> None:
pass
def mock_gh_get_info() -> Any:
return {"closed": False, "isCrossRepository": False}
def mocked_read_merge_rules_NE(repo: Any, org: str, project: str) -> List[MergeRule]:
return [
MergeRule(name="mock with nonexistent check",
patterns=["*"],
approved_by=[],
mandatory_checks_name=["Lint",
"Facebook CLA Check",
"nonexistent"],
),
def mocked_read_merge_rules(repo: Optional[GitRepo], org: str, project: str) -> List[MergeRule]:
mock_merge_rules = """
[
{
"name": "mock with nonexistent check",
"patterns": ["*"],
"approved_by": [],
"mandatory_checks_name": [
"Facebook CLA Check",
"Lint",
"nonexistent"
]
}
]
"""
rc = json.loads(mock_merge_rules, object_hook=lambda x: MergeRule(**x))
return cast(List[MergeRule], rc)
def mocked_read_merge_rules(repo: Any, org: str, project: str) -> List[MergeRule]:
return [
MergeRule(name="super",
patterns=["*"],
approved_by=["pytorch/metamates"],
mandatory_checks_name=["Lint",
"Facebook CLA Check",
"pull / linux-xenial-cuda11.3-py3.7-gcc7 / build",
],
),
]
def mocked_read_merge_rules_raise(repo: Any, org: str, project: str) -> List[MergeRule]:
raise RuntimeError("testing")
class DummyGitRepo(GitRepo):
def __init__(self) -> None:
super().__init__(get_git_repo_dir(), get_git_remote_name())
def commits_resolving_gh_pr(self, pr_num: int) -> List[str]:
return ["FakeCommitSha"]
def commit_message(self, ref: str) -> str:
return "super awsome commit message"
class TestGitHubPR(TestCase):
def test_merge_rules_valid(self) -> None:
"Test that merge_rules.yaml can be parsed"
repo = DummyGitRepo()
self.assertGreater(len(read_merge_rules(repo, "pytorch", "pytorch")), 1)
@mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql)
@mock.patch('trymerge.read_merge_rules', side_effect=mocked_read_merge_rules)
def test_match_rules(self, mocked_gql: Any, mocked_rmr: Any) -> None:
def test_match_rules(self, mocked_gql: Any) -> None:
"Tests that PR passes merge rules"
pr = GitHubPR("pytorch", "pytorch", 77700)
repo = DummyGitRepo()
pr = GitHubPR("pytorch", "pytorch", 71759)
repo = GitRepo(get_git_repo_dir(), get_git_remote_name())
self.assertTrue(find_matching_merge_rule(pr, repo) is not None)
@mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql)
@mock.patch('trymerge.read_merge_rules', side_effect=mocked_read_merge_rules_raise)
def test_read_merge_rules_fails(self, mocked_gql: Any, mocked_rmr: Any) -> None:
"Tests that PR fails to read the merge rules"
pr = GitHubPR("pytorch", "pytorch", 77700)
repo = DummyGitRepo()
self.assertRaisesRegex(RuntimeError, "testing", lambda: find_matching_merge_rule(pr, repo))
@mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql)
@mock.patch('trymerge.read_merge_rules', side_effect=mocked_read_merge_rules)
def test_lint_fails(self, mocked_gql: Any, mocked_rmr: Any) -> None:
def test_lint_fails(self, mocked_gql: Any) -> None:
"Tests that PR fails mandatory lint check"
pr = GitHubPR("pytorch", "pytorch", 74649)
repo = DummyGitRepo()
repo = GitRepo(get_git_repo_dir(), get_git_remote_name())
self.assertRaises(RuntimeError, lambda: find_matching_merge_rule(pr, repo))
@mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql)
@@ -219,7 +132,7 @@ class TestGitHubPR(TestCase):
def test_checksuites_pagination(self, mocked_gql: Any) -> None:
"Tests that PR with lots of checksuits can be fetched"
pr = GitHubPR("pytorch", "pytorch", 73811)
self.assertEqual(len(pr.get_checkrun_conclusions()), 107)
self.assertGreater(len(pr.get_checkrun_conclusions()), 0)
@mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql)
def test_comments_pagination(self, mocked_gql: Any) -> None:
@@ -256,13 +169,13 @@ class TestGitHubPR(TestCase):
self.assertGreater(len(authors), 50)
self.assertTrue("@" in pr.get_author())
@mock.patch('trymerge.read_merge_rules', side_effect=mocked_read_merge_rules_NE)
@mock.patch('trymerge.read_merge_rules', side_effect=mocked_read_merge_rules)
@mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql)
def test_pending_status_check(self, mocked_gql: Any, mocked_read_merge_rules: Any) -> None:
""" Tests that PR with nonexistent/pending status checks fails with the right reason.
"""
pr = GitHubPR("pytorch", "pytorch", 76118)
repo = DummyGitRepo()
repo = GitRepo(get_git_repo_dir(), get_git_remote_name())
self.assertRaisesRegex(MandatoryChecksMissingError,
".*are pending/not yet run.*",
lambda: find_matching_merge_rule(pr, repo))
@@ -283,92 +196,8 @@ class TestGitHubPR(TestCase):
"""
pr = GitHubPR("pytorch", "pytorch", 77700)
conclusions = pr.get_checkrun_conclusions()
self.assertEqual(len(conclusions), 83)
self.assertTrue("pull / linux-docs / build-docs (cpp)" in conclusions.keys())
self.assertTrue("linux-docs / build-docs (cpp)" in conclusions.keys())
@mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql)
def test_cancelled_gets_ignored(self, mocked_gql: Any) -> None:
""" Tests that cancelled workflow does not override existing successfull status
"""
pr = GitHubPR("pytorch", "pytorch", 82169)
conclusions = pr.get_checkrun_conclusions()
self.assertTrue("Lint" in conclusions.keys())
self.assertEqual(conclusions["Lint"][0], "SUCCESS")
@mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql)
def test_get_many_land_checks(self, mocked_gql: Any) -> None:
""" Tests that all checkruns can be fetched for a commit
"""
conclusions = get_land_checkrun_conclusions('pytorch', 'pytorch', '6882717f73deffb692219ccd1fd6db258d8ed684')
self.assertEqual(len(conclusions), 101)
self.assertTrue("pull / linux-docs / build-docs (cpp)" in conclusions.keys())
@mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql)
def test_failed_land_checks(self, mocked_gql: Any) -> None:
""" Tests that PR with Land Checks fail with a RunTime error
"""
self.assertRaisesRegex(RuntimeError,
".*Failed to merge; some land checks failed.*",
lambda: validate_land_time_checks('pytorch', 'pytorch', '6882717f73deffb692219ccd1fd6db258d8ed684'))
@mock.patch('trymerge.gh_get_pr_info', return_value=mock_gh_get_info())
@mock.patch('trymerge.parse_args', return_value=mock_parse_args(True, False))
@mock.patch('trymerge.try_revert', side_effect=mock_revert)
def test_main_revert(self, mock_revert: Any, mock_parse_args: Any, gh_get_pr_info: Any) -> None:
trymerge_main()
mock_revert.assert_called_once()
@mock.patch('trymerge.gh_get_pr_info', return_value=mock_gh_get_info())
@mock.patch('trymerge.parse_args', return_value=mock_parse_args(False, True))
@mock.patch('trymerge.merge', side_effect=mock_merge)
def test_main_force(self, mock_merge: Any, mock_parse_args: Any, mock_gh_get_info: Any) -> None:
trymerge_main()
mock_merge.assert_called_once_with(mock.ANY,
mock.ANY,
dry_run=mock.ANY,
skip_mandatory_checks=True,
comment_id=mock.ANY,
on_green=False,
land_checks=False,
mandatory_only=False)
@mock.patch('trymerge.gh_get_pr_info', return_value=mock_gh_get_info())
@mock.patch('trymerge.parse_args', return_value=mock_parse_args(False, False))
@mock.patch('trymerge.merge', side_effect=mock_merge)
def test_main_merge(self, mock_merge: Any, mock_parse_args: Any, mock_gh_get_info: Any) -> None:
trymerge_main()
mock_merge.assert_called_once_with(mock.ANY,
mock.ANY,
dry_run=mock.ANY,
skip_mandatory_checks=False,
comment_id=mock.ANY,
on_green=False,
land_checks=False,
mandatory_only=False)
@mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql)
@mock.patch('trymerge.read_merge_rules', side_effect=mocked_read_merge_rules)
def test_revert_rules(self, mock_gql: Any, mock_mr: Any) -> None:
""" Tests that reverts from collaborators are allowed """
pr = GitHubPR("pytorch", "pytorch", 79694)
repo = DummyGitRepo()
self.assertIsNotNone(validate_revert(repo, pr, comment_id=1189459845))
def test_checks_filter(self) -> None:
checks = [
WorkflowCheckState(name="check0", status="SUCCESS", url="url0"),
WorkflowCheckState(name="check1", status="FAILURE", url="url1"),
WorkflowCheckState(name="check2", status="STARTUP_FAILURE", url="url2"),
WorkflowCheckState(name="check3", status=None, url="url3"),
]
checks_dict = {check.name : check for check in checks}
pending_checks = filter_pending_checks(checks_dict)
failing_checks = filter_failed_checks(checks_dict)
self.assertListEqual(failing_checks, [checks[1], checks[2]])
self.assertListEqual(pending_checks, [checks[3]])
if __name__ == "__main__":
main()
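`test_checks_filter` pins down what the two filters must return; a self-contained sketch consistent with those assertions (the dataclass shape and filter bodies are inferred from the test, not copied from `trymerge`):

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class WorkflowCheckState:
    name: str
    status: Optional[str]
    url: str

def filter_pending_checks(checks: Dict[str, WorkflowCheckState]) -> List[WorkflowCheckState]:
    # A check with no status yet is pending.
    return [c for c in checks.values() if c.status is None]

def filter_failed_checks(checks: Dict[str, WorkflowCheckState]) -> List[WorkflowCheckState]:
    # Both outright failures and startup failures count as failed.
    return [c for c in checks.values() if c.status in ("FAILURE", "STARTUP_FAILURE")]

checks = [
    WorkflowCheckState("check0", "SUCCESS", "url0"),
    WorkflowCheckState("check1", "FAILURE", "url1"),
    WorkflowCheckState("check2", "STARTUP_FAILURE", "url2"),
    WorkflowCheckState("check3", None, "url3"),
]
checks_dict = {c.name: c for c in checks}
assert filter_failed_checks(checks_dict) == [checks[1], checks[2]]
assert filter_pending_checks(checks_dict) == [checks[3]]
```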


@@ -15,28 +15,12 @@ class TestRebase(TestCase):
"Tests rebase successfully"
pr = GitHubPR("pytorch", "pytorch", 31093)
repo = GitRepo(get_git_repo_dir(), get_git_remote_name())
rebase_onto(pr, repo, 'master')
rebase_onto(pr, repo)
calls = [mock.call('fetch', 'origin', 'pull/31093/head:pull/31093/head'),
mock.call('rebase', 'refs/remotes/origin/master', 'pull/31093/head'),
mock.call('rebase', 'master', 'pull/31093/head'),
mock.call('push', '-f', 'https://github.com/mingxiaoh/pytorch.git', 'pull/31093/head:master')]
mocked_run_git.assert_has_calls(calls)
self.assertTrue(
"Successfully rebased `master` onto `refs/remotes/origin/master`" in mocked_post_comment.call_args[0][3])
@mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql)
@mock.patch('gitutils.GitRepo._run_git')
@mock.patch('tryrebase.gh_post_comment')
def test_rebase_to_stable(self, mocked_post_comment: Any, mocked_run_git: Any, mocked_gql: Any) -> None:
"Tests rebase to viable/strict successfully"
pr = GitHubPR("pytorch", "pytorch", 31093)
repo = GitRepo(get_git_repo_dir(), get_git_remote_name())
rebase_onto(pr, repo, 'viable/strict', False)
calls = [mock.call('fetch', 'origin', 'pull/31093/head:pull/31093/head'),
mock.call('rebase', 'refs/remotes/origin/viable/strict', 'pull/31093/head'),
mock.call('push', '-f', 'https://github.com/mingxiaoh/pytorch.git', 'pull/31093/head:master')]
mocked_run_git.assert_has_calls(calls)
self.assertTrue(
"Successfully rebased `master` onto `refs/remotes/origin/viable/strict`" in mocked_post_comment.call_args[0][3])
self.assertTrue("Successfully rebased `master` onto `master`" in mocked_post_comment.call_args[0][3])
@mock.patch('trymerge.gh_graphql', side_effect=mocked_gh_graphql)
@mock.patch('gitutils.GitRepo._run_git', return_value="Everything up-to-date")
@@ -45,9 +29,9 @@ class TestRebase(TestCase):
"Tests branch already up to date"
pr = GitHubPR("pytorch", "pytorch", 31093)
repo = GitRepo(get_git_repo_dir(), get_git_remote_name())
rebase_onto(pr, repo, 'master')
rebase_onto(pr, repo)
calls = [mock.call('fetch', 'origin', 'pull/31093/head:pull/31093/head'),
mock.call('rebase', 'refs/remotes/origin/master', 'pull/31093/head'),
mock.call('rebase', 'master', 'pull/31093/head'),
mock.call('push', '-f', 'https://github.com/mingxiaoh/pytorch.git', 'pull/31093/head:master')]
mocked_run_git.assert_has_calls(calls)
self.assertTrue(

File diff suppressed because it is too large


@@ -1,146 +0,0 @@
import os
import re
from typing import List, Pattern, Tuple, Optional
BOT_COMMANDS_WIKI = "https://github.com/pytorch/pytorch/wiki/Bot-commands"
CIFLOW_LABEL = re.compile(r"^ciflow/.+")
CIFLOW_TRUNK_LABEL = re.compile(r"^ciflow/trunk")
OFFICE_HOURS_LINK = "https://github.com/pytorch/pytorch/wiki/Dev-Infra-Office-Hours"
CONTACT_US = f"Please reach out to the [PyTorch DevX Team]({OFFICE_HOURS_LINK}) with feedback or questions!"
ALTERNATIVES = (
"If this is not the intended behavior, feel free to use some "
+ f"of the other merge options in the [wiki]({BOT_COMMANDS_WIKI})."
)
LAND_CHECK_ROLLOUT = "https://github.com/pytorch/test-infra/blob/main/torchci/lib/bot/rolloutUtils.ts#L1-L34"
def has_label(labels: List[str], pattern: Pattern[str] = CIFLOW_LABEL) -> bool:
return len(list(filter(pattern.match, labels))) > 0
class TryMergeExplainer(object):
force: bool
on_green: bool
land_checks: bool
labels: List[str]
pr_num: int
org: str
project: str
has_trunk_label: bool
has_ciflow_label: bool
def __init__(
self,
force: bool,
on_green: bool,
land_checks: bool,
labels: List[str],
pr_num: int,
org: str,
project: str,
):
self.force = force
self.on_green = on_green
self.land_checks = land_checks
self.labels = labels
self.pr_num = pr_num
self.org = org
self.project = project
self.get_flags()
def get_flags(self) -> Tuple[bool, bool]:
self.has_trunk_label = has_label(self.labels, CIFLOW_TRUNK_LABEL)
self.has_ciflow_label = has_label(self.labels, CIFLOW_LABEL)
should_check_land_branch = self.land_checks and not self.has_trunk_label
should_check_green = self.on_green or self.has_ciflow_label
return (should_check_green, should_check_land_branch)
def _get_flag_msg(self) -> str:
if self.force:
return " the force (-f) flag."
elif self.on_green:
return " the green (-g) flag."
elif self.land_checks:
return (
" the land checks (-l) flag."
+ " If you did not specify this flag yourself, "
+ f" you are likely enrolled in the [land checks rollout]({LAND_CHECK_ROLLOUT})."
)
else:
return "out a flag."
def _get_land_check_progress(self, commit: Optional[str]) -> str:
if commit is not None:
return (
" and land check "
+ f"progress [here](https://hud.pytorch.org/{self.org}/{self.project}/commit/{commit})"
)
else:
return ""
def _get_flag_explanation_message(self) -> str:
if self.force:
return "This means your change will be merged **immediately**, bypassing any CI checks (ETA: 1-5 minutes)."
elif self.on_green:
return "This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours)."
elif self.land_checks:
if self.has_trunk_label:
land_check_msg_suffix = "have passed since you have added the `ciflow/trunk` label to your PR (ETA 0-4 Hours)."
else:
land_check_msg_suffix = (
"and the land checks have passed (**ETA 4 Hours**). "
)
land_check_msg_suffix += "If you need to coordinate lands between different changes and cannot risk a land race, "
land_check_msg_suffix += "please add the `ciflow/trunk` label to your PR and wait for signal to complete, "
land_check_msg_suffix += "and then land your changes in proper order."
land_check_msg_suffix += (
" Having `trunk`, `pull`, and `Lint` pre-run on a "
)
land_check_msg_suffix += (
"PR will bypass land checks and the ETA should be immediate."
)
return (
"This means that your change will be merged once all checks on your PR "
+ land_check_msg_suffix
)
else:
return "This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours)."
def get_merge_message(self, commit: Optional[str] = None) -> str:
message_prefix = "@pytorchbot successfully started a merge job."
progress_links = f"Check the current status [here]({os.getenv('GH_RUN_URL')}){self._get_land_check_progress(commit)}."
flag_message = f"The merge job was triggered with{self._get_flag_msg()}"
explanation_message = self._get_flag_explanation_message()
msg = message_prefix + " "
msg += progress_links + "\n"
msg += flag_message + " "
msg += explanation_message + " "
msg += ALTERNATIVES + "\n"
msg += CONTACT_US
return msg
def get_revert_message(org: str, project: str, pr_num: int) -> str:
msg = (
"@pytorchbot successfully started a revert job."
+ f" Check the current status [here]({os.getenv('GH_RUN_URL')}).\n"
)
msg += CONTACT_US
return msg
def get_land_check_troubleshooting_message() -> str:
return (
" If you believe this is an error, you can use the old behavior with `@pytorchbot merge -g`"
+ " (optionally with the `ciflow/trunk` to get land checks)"
+ ' or use `@pytorchbot merge -f "some reason here"`.'
+ f" For more information, see the [bot wiki]({BOT_COMMANDS_WIKI}). \n\n"
+ CONTACT_US
)
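Both flags computed in `get_flags` bottom out in the `has_label` regex test defined at the top of the file; a quick standalone check of that helper:

```python
import re
from typing import List, Pattern

CIFLOW_LABEL = re.compile(r"^ciflow/.+")

def has_label(labels: List[str], pattern: Pattern[str] = CIFLOW_LABEL) -> bool:
    # pattern.match anchors at the start of each label, so only
    # labels that begin with "ciflow/" count as a match.
    return len(list(filter(pattern.match, labels))) > 0

assert has_label(["ciflow/trunk", "triaged"]) is True
assert has_label(["triaged"]) is False
```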


@@ -1,26 +1,22 @@
#!/usr/bin/env python3
import os
import subprocess
import sys
import re
from typing import Any
from gitutils import get_git_remote_name, get_git_repo_dir, GitRepo
from trymerge import gh_post_pr_comment as gh_post_comment, GitHubPR
from trymerge import gh_post_comment, GitHubPR
def parse_args() -> Any:
from argparse import ArgumentParser
parser = ArgumentParser("Rebase PR into branch")
parser.add_argument("--dry-run", action="store_true")
parser.add_argument("--branch", type=str)
parser.add_argument("pr_num", type=int)
return parser.parse_args()
def rebase_onto(pr: GitHubPR, repo: GitRepo, onto_branch: str, dry_run: bool = False) -> None:
def rebase_onto(pr: GitHubPR, repo: GitRepo, dry_run: bool = False) -> None:
branch = f"pull/{pr.pr_num}/head"
onto_branch = f"refs/remotes/origin/{onto_branch}"
onto_branch = pr.default_branch()
remote_url = f"https://github.com/{pr.info['headRepository']['nameWithOwner']}.git"
refspec = f"{branch}:{pr.head_ref()}"
@@ -40,91 +36,24 @@ def rebase_onto(pr: GitHubPR, repo: GitRepo, onto_branch: str, dry_run: bool = F
"git pull --rebase`)", dry_run=dry_run)
def rebase_ghstack_onto(pr: GitHubPR, repo: GitRepo, onto_branch: str, dry_run: bool = False) -> None:
if subprocess.run([sys.executable, "-m", "ghstack", "--help"], capture_output=True).returncode != 0:
subprocess.run([sys.executable, "-m", "pip", "install", "ghstack"])
orig_ref = f"{re.sub(r'/head$', '/orig', pr.head_ref())}"
onto_branch = f"refs/remotes/origin/{onto_branch}"
repo.fetch(orig_ref, orig_ref)
repo._run_git("rebase", onto_branch, orig_ref)
# steal the identity of the committer of the commit on the orig branch
email = repo._run_git("log", orig_ref, "--pretty=format:%ae", "-1")
name = repo._run_git("log", orig_ref, "--pretty=format:%an", "-1")
repo._run_git("config", "--global", "user.name", name)
repo._run_git("config", "--global", "user.email", email)
os.environ["OAUTH_TOKEN"] = os.environ["GITHUB_TOKEN"]
with open('.ghstackrc', 'w+') as f:
f.write('[ghstack]\n' +
"github_url=github.com\n" +
"github_username=pytorchmergebot\n" +
"remote_name=origin")
if dry_run:
print("Don't know how to dry-run ghstack")
else:
ghstack_result = subprocess.run(["ghstack"], capture_output=True)
push_result = ghstack_result.stdout.decode("utf-8")
print(push_result)
if ghstack_result.returncode != 0:
raise Exception(f"\n```{push_result}```")
# The contents of a successful push result should look like:
# Summary of changes (ghstack 0.6.0)
# - Updated https://github.com/clee2000/random-testing/pull/2
# - Updated https://github.com/clee2000/random-testing/pull/1
# Facebook employees can import your changes by running
# (on a Facebook machine):
# ghimport -s https://github.com/clee2000/random-testing/pull/2
# If you want to work on this diff stack on another machine:
# ghstack checkout https://github.com/clee2000/random-testing/pull/2
org, project = repo.gh_owner_and_name()
for line in push_result.splitlines():
if "Updated" in line:
pr_num = int(line.split("/")[-1])
if pr_num != pr.pr_num:
gh_post_comment(pr.org, pr.project, pr_num,
f"Rebased `{orig_ref}` onto `{onto_branch}` because #{pr.pr_num} was rebased, "
"please pull locally before adding more changes (for example, via `ghstack " +
f"checkout https://github.com/{org}/{project}/pull/{pr_num}`)", dry_run=dry_run)
else:
gh_post_comment(pr.org, pr.project, pr_num,
f"Successfully rebased `{orig_ref}` onto `{onto_branch}`, please pull locally " +
"before adding more changes (for example, via `ghstack " +
f"checkout https://github.com/{org}/{project}/pull/{pr.pr_num}`)", dry_run=dry_run)
if f"Skipped https://github.com/{org}/{project}/pull/{pr.pr_num}" in push_result:
gh_post_comment(pr.org, pr.project, pr.pr_num,
f"Tried to rebase and push PR #{pr.pr_num}, but it was already up to date", dry_run=dry_run)
def main() -> None:
args = parse_args()
repo = GitRepo(get_git_repo_dir(), get_git_remote_name(), debug=True)
org, project = repo.gh_owner_and_name()
pr = GitHubPR(org, project, args.pr_num)
onto_branch = args.branch if args.branch else pr.default_branch()
msg = "@pytorchbot successfully started a rebase job."
msg += f" Check the current status [here]({os.getenv('GH_RUN_URL')})"
gh_post_comment(org, project, args.pr_num, msg, dry_run=args.dry_run)
if pr.is_closed():
gh_post_comment(org, project, args.pr_num, f"PR #{args.pr_num} is closed, won't rebase", dry_run=args.dry_run)
return
if pr.is_ghstack_pr():
gh_post_comment(org, project, args.pr_num,
f"PR #{args.pr_num} is a ghstack, which is currently not supported", dry_run=args.dry_run)
return
try:
if pr.is_ghstack_pr():
rebase_ghstack_onto(pr, repo, onto_branch, dry_run=args.dry_run)
return
rebase_onto(pr, repo, onto_branch, dry_run=args.dry_run)
rebase_onto(pr, repo, dry_run=args.dry_run)
except Exception as e:
msg = f"Rebase failed due to {e}"
run_url = os.getenv("GH_RUN_URL")


@@ -1,169 +0,0 @@
import json
import os
import subprocess
import requests
from typing import Any, Dict
from argparse import ArgumentParser
MERGEBOT_TOKEN = os.environ["MERGEBOT_TOKEN"]
PYTORCHBOT_TOKEN = os.environ["PYTORCHBOT_TOKEN"]
OWNER, REPO = "pytorch", "pytorch"
def git_api(
url: str, params: Dict[str, str], type: str = "get", token: str = MERGEBOT_TOKEN
) -> Any:
headers = {
"Accept": "application/vnd.github.v3+json",
"Authorization": f"token {token}",
}
if type == "post":
return requests.post(
f"https://api.github.com{url}",
data=json.dumps(params),
headers=headers,
).json()
elif type == "patch":
return requests.patch(
f"https://api.github.com{url}",
data=json.dumps(params),
headers=headers,
).json()
else:
return requests.get(
f"https://api.github.com{url}",
params=params,
headers=headers,
).json()
def parse_args() -> Any:
parser = ArgumentParser("Rebase PR into branch")
parser.add_argument("--repo-name", type=str)
parser.add_argument("--branch", type=str)
return parser.parse_args()
def make_pr(repo_name: str, branch_name: str) -> Any:
params = {
"title": f"[{repo_name} hash update] update the pinned {repo_name} hash",
"head": branch_name,
"base": "master",
"body": "This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/"
+ f".github/workflows/_update-commit-hash.yml).\nUpdate the pinned {repo_name} hash.",
}
response = git_api(f"/repos/{OWNER}/{REPO}/pulls", params, type="post")
print(f"made pr {response['html_url']}")
return response["number"]
def approve_pr(pr_number: str) -> None:
params = {"event": "APPROVE"}
# use pytorchbot to approve the pr
git_api(
f"/repos/{OWNER}/{REPO}/pulls/{pr_number}/reviews",
params,
type="post",
token=PYTORCHBOT_TOKEN,
)
def make_comment(pr_number: str, msg: str) -> None:
params = {"body": msg}
# comment with pytorchbot because pytorchmergebot gets ignored
git_api(
f"/repos/{OWNER}/{REPO}/issues/{pr_number}/comments",
params,
type="post",
token=PYTORCHBOT_TOKEN,
)
def close_pr(pr_number: str) -> None:
params = {"state": "closed"}
git_api(
f"/repos/{OWNER}/{REPO}/pulls/{pr_number}",
params,
type="patch",
)
def is_newer_hash(new_hash: str, old_hash: str, repo_name: str) -> bool:
def _get_date(hash: str) -> int:
# this git command prints the unix timestamp of the hash
return int(
subprocess.run(
f"git show --no-patch --no-notes --pretty=%ct {hash}".split(),
capture_output=True,
cwd=f"{repo_name}",
)
.stdout.decode("utf-8")
.strip()
)
return _get_date(new_hash) > _get_date(old_hash)
def main() -> None:
args = parse_args()
branch_name = os.environ["NEW_BRANCH_NAME"]
pr_num = None
# query to see if a pr already exists
params = {
"q": f"is:pr is:open in:title author:pytorchmergebot repo:{OWNER}/{REPO} {args.repo_name} hash update"
}
response = git_api("/search/issues", params)
if response["total_count"] != 0:
# pr does exist
pr_num = response["items"][0]["number"]
link = response["items"][0]["html_url"]
response = git_api(f"/repos/{OWNER}/{REPO}/pulls/{pr_num}", {})
branch_name = response["head"]["ref"]
print(
f"pr does exist, number is {pr_num}, branch name is {branch_name}, link is {link}"
)
hash = (
subprocess.run(
f"git rev-parse {args.branch}".split(),
capture_output=True,
cwd=f"{args.repo_name}",
)
.stdout.decode("utf-8")
.strip()
)
with open(f".github/ci_commit_pins/{args.repo_name}.txt", "r+") as f:
old_hash = f.read().strip()
subprocess.run(f"git checkout {old_hash}".split(), cwd=args.repo_name)
f.seek(0)
f.truncate()
f.write(f"{hash}\n")
if is_newer_hash(hash, old_hash, args.repo_name):
# if there was an update, push to branch
subprocess.run(f"git checkout -b {branch_name}".split())
subprocess.run(f"git add .github/ci_commit_pins/{args.repo_name}.txt".split())
subprocess.run(
"git commit -m".split() + [f"update {args.repo_name} commit hash"]
)
subprocess.run(f"git push --set-upstream origin {branch_name} -f".split())
print(f"changes pushed to branch {branch_name}")
if pr_num is None:
# no existing pr, so make a new one and approve it
pr_num = make_pr(args.repo_name, branch_name)
approve_pr(pr_num)
# comment to merge if all checks are green
make_comment(pr_num, "@pytorchbot merge -g")
else:
print(
f"tried to update from old hash: {old_hash} to new hash: {hash} but the old hash seems to be newer, not creating pr"
)
if pr_num is not None:
make_comment(pr_num, "closing pr as the current hash seems up to date")
close_pr(pr_num)
print(f"closing PR {pr_num}")
if __name__ == "__main__":
main()
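The core of `is_newer_hash` is just a comparison of the two unix committer timestamps that `git show --no-patch --no-notes --pretty=%ct <hash>` prints; a sketch with the git lookup abstracted into an injected function (this signature is illustrative, not the script's actual one):

```python
from typing import Callable

def is_newer_hash(new_hash: str, old_hash: str,
                  get_date: Callable[[str], int]) -> bool:
    # get_date maps a commit hash to its unix committer timestamp; the
    # script obtains it by running git show with the %ct pretty format.
    return get_date(new_hash) > get_date(old_hash)

dates = {"new": 1_660_000_000, "old": 1_650_000_000}
assert is_newer_hash("new", "old", lambda h: dates[h]) is True
assert is_newer_hash("old", "new", lambda h: dates[h]) is False
```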

.github/scripts/wait_for_ssh_to_drain.sh vendored Executable file

@@ -0,0 +1,13 @@
#!/usr/bin/env bash
set -eou pipefail
echo "Holding runner for 2 hours until all ssh sessions have logged out"
for _ in $(seq 1440); do
# Break if no ssh session exists anymore
if [ "$(who)" = "" ]; then
break
fi
echo "."
sleep 5
done
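The loop above polls `who` once every 5 seconds for at most 1440 iterations, i.e. 2 hours; the same wait-until-drained pattern in Python (the `sessions_active` predicate is a hypothetical stand-in for the `who` check):

```python
import time
from typing import Callable

def wait_until_drained(sessions_active: Callable[[], bool],
                       interval: float = 5.0, max_polls: int = 1440) -> bool:
    # Poll up to max_polls times (1440 * 5 s = 2 hours), returning True as
    # soon as no session remains and False if the time budget runs out.
    for _ in range(max_polls):
        if not sessions_active():
            return True
        time.sleep(interval)
    return False

# An already-drained runner returns immediately.
assert wait_until_drained(lambda: False, interval=0.0) is True
# Sessions that never drain exhaust the poll budget.
assert wait_until_drained(lambda: True, interval=0.0, max_polls=3) is False
```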


@@ -1,7 +1,4 @@
{%- set upload_artifact_s3_action = "seemethere/upload-artifact-s3@v5" -%}
{%- set download_artifact_s3_action = "seemethere/download-artifact-s3@v4" -%}
{%- set upload_artifact_action = "actions/upload-artifact@v3" -%}
{%- set download_artifact_action = "actions/download-artifact@v3" -%}
{%- set upload_artifact_s3_action = "seemethere/upload-artifact-s3@v4" -%}
{# squid_proxy is a private ELB that is only available for GHA custom runners #}
{%- set squid_proxy = "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -%}
@@ -11,11 +8,11 @@
# NOTE: If testing pytorch/builder changes you can change this variable to change what pytorch/builder reference
# the binary builds will check out
{%- set builder_branch = "release/1.13" -%}
{%- set builder_branch = "release/1.12" -%}
{%- macro concurrency(build_environment) -%}
concurrency:
group: !{{ build_environment }}-${{ github.event.pull_request.number || github.ref_name }}-${{ github.ref_type == 'branch' && github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
group: !{{ build_environment }}-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
{%- endmacro -%}
@@ -93,6 +90,7 @@ on:
AWS_DEFAULT_REGION: us-east-1
GIT_DEFAULT_BRANCH: ${{ github.event.repository.default_branch }}
BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: !{{ build_environment }}-test
PR_NUMBER: ${{ github.event.pull_request.number }}
SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
TAG: ${{ steps.parse-ref.outputs.tag }}
@@ -195,8 +193,30 @@ on:
killall runsvc.sh
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}"
env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}"
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
{%- endmacro -%}
{%- macro teardown_ec2_linux(pytorch_directory="") -%}
- name: Hold runner for 2 hours or until ssh sessions have drained
{%- if pytorch_directory %}
working-directory: !{{ pytorch_directory }}
{%- endif %}
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
{%- endmacro -%}
{%- macro teardown_rocm_linux() -%}

Some files were not shown because too many files have changed in this diff