Compare commits

..

57 Commits

Author SHA1 Message Date
7c98e70d44 attempted fix for nvrtc with lovelace (#87611) (#87618)
Fixes #87595 (maybe?)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87611
Approved by: https://github.com/malfet, https://github.com/atalman

Co-authored-by: Natalia Gimelshein <ngimel@fb.com>
2022-10-24 16:00:35 -04:00
4e1a4b150a fix docs push (#87498) (#87628)
push docs to temp branch first then push to actual branch to satisfy CLA check in branch protections
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87498
Approved by: https://github.com/malfet

Co-authored-by: Catherine Lee <csl@fb.com>
2022-10-24 15:42:56 -04:00
341c377c0f Add General Project Policies (#87385) (#87613)
Add General Project Policies to the Governance page

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87385
Approved by: https://github.com/orionr

Co-authored-by: Svetlana Karslioglu <svekars@fb.com>
2022-10-24 15:42:15 -04:00
fdb18da4cc Fix distributed issue by including distributed files (#87612) 2022-10-24 15:40:50 -04:00
8569a44f38 [MPS] Revamp copy_to_mps_ implementation (#87475)
* [MPS] Copy from CPU always add storageOffset (#86958)

Because why wouldn't it?
Fixes https://github.com/pytorch/pytorch/issues/86052

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86958
Approved by: https://github.com/kulinseth

(cherry picked from commit 13cff2ee8ea1d7aea2ad201cbd77ebe2b9a29d25)

* [MPS] Revamp copy_to_mps_ implementation (#86956)

A tensor's view into linear storage is represented by the following parameters: `.shape`, `.stride()` and `.storage_offset()`.

Only tensors that are representable as 1d views can be copied from host to device (and vice versa) using a single [`copy(from:sourceOffset:to:destinationOffset:size:)`](https://developer.apple.com/documentation/metal/mtlblitcommandencoder/1400767-copyfrombuffer?language=objc) call.

Modify the `copy_to_mps_` function to do the following steps:
- Cast the `src` tensor to the `dst` data type if needed
- Expand the `src` tensor to the `dst` tensor shape
- Clone the `src` tensor if it is not stride-contiguous (i.e. cannot be represented by `src.view(src.numel())`)
- Create an empty tensor if `dst` is not stride-contiguous or if its strides differ from the potentially cloned `src` strides
- Do a 1d copy from `src` to the (potentially temporary) `dst`
- Finally, do re-striding/copy on MPS if needed

Add tests to cover the cases where a stride-contiguous permuted tensor is copied to MPS, a non-stride-contiguous tensor is copied to MPS, and a permuted CPU tensor is copied to a differently permuted MPS tensor.
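A minimal repro-style sketch of the three covered cases (shapes are illustrative and the snippet assumes a macOS build with MPS available):
```python
import torch  # assumes torch.backends.mps.is_available() is True

# 1) stride-contiguous permuted CPU tensor -> MPS
src1 = torch.rand(1, 4).t()                           # shape (4, 1); still viewable as 1d
torch.empty(4, 1, device="mps").copy_(src1)

# 2) non-stride-contiguous CPU tensor -> MPS
src2 = torch.rand(6, 4)[:, ::2]                       # shape (6, 2); cannot be viewed as 1d
torch.empty(6, 2, device="mps").copy_(src2)

# 3) permuted CPU tensor -> differently permuted MPS tensor
dst3 = torch.empty(4, 3, device="mps").permute(1, 0)  # shape (3, 4)
dst3.copy_(torch.rand(4, 3).permute(1, 0))
```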

Fixes https://github.com/pytorch/pytorch/issues/86954

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86956
Approved by: https://github.com/kulinseth

(cherry picked from commit ae62cf7c02c009ab1123a3bfa08ca9a5e4255e4a)
2022-10-21 10:26:23 -07:00
6a8be2cb63 [ONNX] Reland: Update training state logic to support ScriptedModule (#86745) (#87457)
In https://github.com/pytorch/pytorch/issues/86325, it was reported that a ScriptedModule does not have a `training` attribute and fails export because we did not expect it as input.

Also

- Parameterized the test_util_funs test

Thanks @borisfom for the suggestion!

Fixes #86325

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86745
Approved by: https://github.com/AllenTiTaiWang, https://github.com/BowenBao

Co-authored-by: Justin Chu <justinchu@microsoft.com>
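A hedged sketch of the kind of input this enables (module, shapes and file name are illustrative, not the exact repro from #86325):
```python
import torch

# Exporting a scripted module; its training state is now handled by the exporter.
scripted = torch.jit.script(torch.nn.Linear(3, 4))
torch.onnx.export(scripted, torch.rand(1, 3), "linear.onnx",
                  training=torch.onnx.TrainingMode.EVAL)
```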
2022-10-21 09:17:31 -07:00
f6c42ae2c2 Reenable isinstance with torch.distributed.ReduceOp (#87303) (#87463)
tentatively marking as draft as I haven't gotten a comprehensive list of side effects...

Ref: https://stackoverflow.com/questions/40244413/python-static-class-attribute-of-the-class-itself
Rel: https://github.com/pytorch/pytorch/issues/87191

cc @kwen2501
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87303
Approved by: https://github.com/wanchaol

Co-authored-by: Masaki Kozuki <mkozuki@nvidia.com>
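A minimal illustration of the restored behavior (assumes a build with distributed support):
```python
import torch.distributed as dist

op = dist.ReduceOp.SUM
assert isinstance(op, dist.ReduceOp)  # works again as an isinstance check
```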
2022-10-21 09:10:41 -07:00
51fa4fae41 Move PadNd from ATen/native to ATen (#87456)
Summary:
This header is being included from both aten/native and torch/csrc, but
some of our build configurations don't allow direct dependencies from
torch/csrc to aten/native, so put the header in aten where it's always
accessible.

Resolves https://github.com/pytorch/pytorch/issues/81198

Test Plan:
CI.
```
./scripts/build_android.sh
env ANDROID_ABI="x86_64" ANDROID_NDK=".../ndk-bundle" CMAKE_CXX_COMPILER_LAUNCHER=ccache CMAKE_C_COMPILER_LAUNCHER=ccache USE_VULKAN=0 ./scripts/build_android.sh
echo '#include <torch/torch.h>' > test.cpp
g++ -E -I $PWD/build_android/install/include/ -I $PWD/build_android/install/include/torch/csrc/api/include test.cpp >/dev/null
```

ghstack-source-id: 8f6f1b9954cd3ae5399db7a0bfe0d5ba70d39823
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82379

Co-authored-by: David Reiss <dreiss@fb.com>
2022-10-21 09:08:15 -07:00
d3aecbd9bc Delete torch::deploy from pytorch core (#85953) (#85953) (#87454)
Summary:
As we have migrated torch::deploy over to https://github.com/pytorch/multipy, we can now delete it from pytorch core as ongoing development will happen there.

This PR was created due to syncing issues with https://github.com/pytorch/pytorch/pull/85443 which is where the review history can be found.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85953
Approved by: https://github.com/seemethere, https://github.com/malfet

Test Plan: contbuild & OSS CI, see 936e93058b

Reviewed By: seemethere

Differential Revision: D40167063

Pulled By: seemethere

fbshipit-source-id: 62e8843a8d69e67c8ec77a2d0db87539516cc904

Co-authored-by: Sahan Paliskara (Meta Employee) <sahancpal@gmail.com>
2022-10-21 09:07:19 -07:00
d253eb29d8 Avoid calling logging.basicConfig (#86959) (#87455)
Fixes https://github.com/pytorch/pytorch/issues/85952

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86959
Approved by: https://github.com/xwang233, https://github.com/davidberard98

Co-authored-by: Sherlock Huang <bahuang@fb.com>
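A hedged sketch of the library-friendly pattern this implies (a general convention, not the exact diff):
```python
import logging

# Library modules attach their own logger and leave global configuration
# (handlers, levels, formatting) to the application.
logger = logging.getLogger(__name__)
logger.addHandler(logging.NullHandler())
logger.debug("no global logging state is touched here")
```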
2022-10-21 11:54:36 -04:00
0c0df0be74 Add weights_only option to torch.load (#87443)
* Tweak several test serialization to store models state_dict (#87143)

Namely, change:
- `test_meta_serialization`
- `test_serialization_2gb_file`
- `test_pathlike_serialization`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87143
Approved by: https://github.com/ezyang

(cherry picked from commit 4a533f12157ffb5c05c142490e4ceaa311981b38)

* Add `weights_only` option to `torch.load` (#86812)

This addresses the security issue in Python's default `unpickler` that allows arbitrary code execution while unpickling.
Restrict the classes allowed to be unpickled to `None`, `int`, `bool`, `str`, `float`, `list`, `tuple`, `dict`/`OrderedDict` as well as `torch.Size`, `torch.nn.Parameter`, and the `torch.Tensor` and `torch.Storage` variants.

By default `weights_only` is set to `False`, but a global override to safe-only loading is available via the `TORCH_FORCE_WEIGHTS_ONLY_LOAD` environment variable.

To some extent, addresses https://github.com/pytorch/pytorch/issues/52596
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86812
Approved by: https://github.com/ezyang

(cherry picked from commit 961ebca2255f902477e9ea7060b8f28781e3c0cd)
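A usage sketch (the checkpoint file name is hypothetical):
```python
import torch

torch.save(torch.nn.Linear(2, 2).state_dict(), "checkpoint.pt")

# Restricts unpickling to plain containers plus tensor/storage types.
state_dict = torch.load("checkpoint.pt", weights_only=True)

# Global opt-in without changing call sites (shell):
#   TORCH_FORCE_WEIGHTS_ONLY_LOAD=1 python eval.py
```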
2022-10-21 09:54:13 -04:00
59686b4c60 [maskedtensor] fix docs formatting (#87387) (#87406)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87387
Approved by: https://github.com/cpuhrsch
2022-10-21 09:41:44 -04:00
55c76baf57 [1.13] Fix functorch version docs (#87392)
The functorch docs build seems to use a build of PyTorch that is not the
RC binary to produce the documentation. This results in the version
looking like '1.13.0a??????'.

This PR just hard-codes the functorch version in the docs to be 1.13 in
the release branch.

Test Plan:
- check docs preview
2022-10-21 09:28:32 -04:00
3ffddc0b2d [functorch] fix AOTAutograd tutorial (#87415) (#87434)
It was raising asserts previously
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87415
Approved by: https://github.com/Chillee
2022-10-20 23:14:01 -07:00
385022c76e [ci] handle libomp upgrade on github (#87382) (#87408)
like #86979, idk if this is a good idea but it seems to fix the problem
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87382
Approved by: https://github.com/seemethere

Co-authored-by: Catherine Lee <csl@fb.com>
2022-10-20 18:05:30 -07:00
05cb998e11 [MPS] Do not dispatch empty job in bitwise_not (#87286) (#87427)
Follows the pattern from https://github.com/pytorch/pytorch/pull/85285 and returns before dispatching an empty Metal kernel for the bitwise-not operation.

Fixes a crash when invoked with an empty MPS tensor on an AMD GPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87286
Approved by: https://github.com/kulinseth

(cherry picked from commit a79e034d89d3d112fcb8d16f7a6862934a44955d)
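A minimal illustration (assumes a macOS build with MPS available):
```python
import torch

x = torch.empty(0, dtype=torch.int32, device="mps")
y = torch.bitwise_not(x)  # returns an empty tensor without dispatching a Metal kernel
assert y.numel() == 0
```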
2022-10-20 17:43:33 -07:00
d2ce4310ba Assert if padding mask type is unexpected (#87106) (#87384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87106

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86353

Fix the issue described in
https://github.com/pytorch/pytorch/issues/86120

Test Plan: buck test mode/opt caffe2/test:test_transformers -- test_train_with_long_type_pad

Reviewed By: malfet

Differential Revision: D40129968

fbshipit-source-id: d7530d8c4548b9d3a3ef8b9740c93bee000ed809
2022-10-20 13:39:07 -07:00
ca0ee922bc [release] Add warning to stateless.functional_call for deprecated behavior (#87079)
* add warning to stateless for deprecated behavior

* fix polyfill to not error

* moved check to happen during traversal, down to 5% regression

* cleanup: remove unnecessary files and add typing where mypy needs

* added test

* fixed public api check
2022-10-20 16:33:42 -04:00
6e2cb22d74 [functorch][docs] Downgrade the warning about forward-mode AD coverage (#87383) (#87386)
Previously we claimed that "forward-mode AD coverage is not that good".
We've since improved it so I clarified the statement in our docs and
downgraded the warning to a note.

Test Plan:
- view docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87383
Approved by: https://github.com/samdow
2022-10-20 15:43:19 -04:00
aba5544aff [maskedtensor] add docs (#84887) (#87381)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84887
Approved by: https://github.com/cpuhrsch

Co-authored-by: George Qi <georgeqi94@gmail.com>
2022-10-20 15:40:27 -04:00
7342a3b0ea Add prototype warning to MaskedTensor constructor (#87107) (#87380)
When a user constructs a MaskedTensor, we should signal its development status to set expectations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87107
Approved by: https://github.com/bhosmer
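A hedged sketch of where the warning surfaces, assuming the prototype `torch.masked.masked_tensor` constructor:
```python
import torch
from torch.masked import masked_tensor

# Constructing a MaskedTensor emits a UserWarning about its prototype status.
mt = masked_tensor(torch.rand(3), torch.tensor([True, False, True]))
```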
2022-10-20 15:36:13 -04:00
1f80ac7fa7 [Docs] Update mm family ops and F.linear to note limited sparse support. (#86220) (#87379)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86220
Approved by: https://github.com/cpuhrsch

Co-authored-by: Andrew M. James <andrew.m.james2@gmail.com>
2022-10-20 15:31:42 -04:00
0948dbb3e7 Add torch.sparse overview section (#85265) (#87376)
The goal of this section is to provide a general overview of how PyTorch handles sparsity for readers who are already familiar with sparse matrices and their operators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85265
Approved by: https://github.com/jisaacso
2022-10-20 15:22:17 -04:00
3ca59f4b8a Reenable aot tests on windows for cuda 11.7 and up (#87193) (#87307)
Reenable aot tests on windows for cuda 11.7 and up

Issue https://github.com/pytorch/pytorch/issues/69460 seems to be mitigated in CUDA 11.7, hence these tests are re-enabled.

cc @peterjc123 @mszhanyi @skyline75489 @nbcsm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87193
Approved by: https://github.com/malfet
2022-10-20 10:04:01 -07:00
9e3df4920f Improve NestedTensor documentation (#85186) (#87337)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85186
Approved by: https://github.com/cpuhrsch
2022-10-20 10:53:44 -04:00
3af7c7fc98 [einsum] fix MPS regression and fix incorrect contraction order when path is None (#87261)
* [einsum] keep the promise that we contract left to right (#87199)

We promise that if a path is not defined, we go left to right. The previous code did not keep that promise, as we pushed combined ops to the back of the list. For most use cases this is fine (einsum with 3 or fewer inputs), but we should do what we say.

Test plan:
Added a print statement to print the sizes of ops we're contracting to see if the order is fixed. Code run:
```
import torch
a = torch.rand(1)
b = torch.rand(2)
c = torch.rand(3)
d = torch.rand(4)
torch.einsum('a,b,c,d->abcd', a,b,c,d)
```

BEFORE--it does a+b, then c+d, then a+b+c+d, which...is right, but it's not the order specified by the user.
```
/Users/janeyx/pytorch/torch/functional.py:378: UserWarning: Contracting a: [1, 1, 1, 1]and b: [1, 2, 1, 1] (Triggered internally at /Users/janeyx/pytorch/aten/src/ATen/native/Linear.cpp:507.)
  return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
/Users/janeyx/pytorch/torch/functional.py:378: UserWarning: Contracting a: [1, 1, 3, 1]and b: [1, 1, 1, 4] (Triggered internally at /Users/janeyx/pytorch/aten/src/ATen/native/Linear.cpp:507.)
  return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
/Users/janeyx/pytorch/torch/functional.py:378: UserWarning: Contracting a: [1, 2, 1, 1]and b: [1, 1, 3, 4] (Triggered internally at /Users/janeyx/pytorch/aten/src/ATen/native/Linear.cpp:507.)
  return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
```

WITH THIS CHANGE--it actually goes left to right: a+b, a+b+c, a+b+c+d
```
/Users/janeyx/pytorch/torch/functional.py:378: UserWarning: Contracting a: [1, 1, 1, 1]and b: [1, 2, 1, 1] (Triggered internally at /Users/janeyx/pytorch/aten/src/ATen/native/Linear.cpp:507.)
  return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
/Users/janeyx/pytorch/torch/functional.py:378: UserWarning: Contracting a: [1, 2, 1, 1]and b: [1, 1, 3, 1] (Triggered internally at /Users/janeyx/pytorch/aten/src/ATen/native/Linear.cpp:507.)
  return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
/Users/janeyx/pytorch/torch/functional.py:378: UserWarning: Contracting a: [1, 2, 3, 1]and b: [1, 1, 1, 4] (Triggered internally at /Users/janeyx/pytorch/aten/src/ATen/native/Linear.cpp:507.)
  return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87199
Approved by: https://github.com/soulitzer

* [einsum] Call view instead of sum to remediate MPS regression (#87135)

Fixes #87010.

It turns out that squeeze is much faster than sum, and view is faster than squeeze, so we should default to view whenever possible.

Benchmarking results show that, on MPS, the following code now takes **29.89ms instead of the current 1466ms, almost a 50x speedup**.
```
q = torch.rand(16, 4096, 40, device='mps', dtype=torch.float)
k = torch.rand(16, 4096, 40, device='mps', dtype=torch.float)
torch.einsum('b i d, b j d -> b i j', q, k).max().item()
```
And a regular einsum will now take **.506ms instead of 2.76ms.**
```
q = torch.rand(16, 4096, 40, device='mps', dtype=torch.float)
k = torch.rand(16, 4096, 40, device='mps', dtype=torch.float)
torch.einsum('b i d, b j d -> b i j', q, k)
```

Special thanks to @soulitzer for helping me experiment + figure out how to squash the remaining 5x regression due to squeeze being slower than view!!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87135
Approved by: https://github.com/soulitzer, https://github.com/malfet, https://github.com/albanD
2022-10-19 13:50:07 -04:00
e6d8c1392c [einsum] Fix opt_einsum defaults to be more reasonable (#86985) (#87144)
Fixes the confusing situation mentioned here https://github.com/pytorch/pytorch/issues/85224#issuecomment-1278628262 by

- setting better OG defaults
- changing warnings to errors now that we have better defaults

Test plan:
- Ran einsum tests locally + CI
- Uninstalled opt-einsum and ran through setting
     - `enabled` to False (doesn't throw error)
     - `strategy` to anything that's not None (errors)
     - `strategy` to None (noops)
- Installed opt-einsum and ran through setting
     - `enabled` to False (doesn't throw error)
     - `enabled` to True (doesn't throw error, no ops + defaults to 'auto')
     - `strategy` to random string (errors)
     - `strategy` to None (noops, still is 'auto')
     - `strategy` to 'greedy' (is set to 'greedy')
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86985
Approved by: https://github.com/soulitzer
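A hedged sketch of the settings exercised above, assuming the `torch.backends.opt_einsum` knobs this feature exposes (values are illustrative):
```python
import torch

backend = torch.backends.opt_einsum
if backend.is_available():        # True only when the opt-einsum package is installed
    backend.strategy = "greedy"   # or "auto" (the default) / "optimal"
else:
    backend.enabled = False       # allowed without opt-einsum; setting a strategy would error
```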
2022-10-19 10:46:09 -04:00
ef6dd0d34d [ONNX] Ignore print(Tensor) during tracing (#86223) (#87122)
Fixes #73619
Fixes https://github.com/microsoft/onnxruntime/issues/11812

This PR adds new symbolics: `aten::_conj`, `aten::conj_physical`, `aten::resolve_conj`, and `aten::resolve_neg`
While the last two are always no-ops by definition (they do not change nodes), the first two raise an exception as they are not yet supported by ONNX.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86223
Approved by: https://github.com/justinchuby, https://github.com/BowenBao
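A repro-style sketch of the tracing scenario this fixes (module and file name are hypothetical):
```python
import torch

class Chatty(torch.nn.Module):
    def forward(self, x):
        print(x)       # printing inserts resolve_conj/resolve_neg nodes; now exported as no-ops
        return x * 2

torch.onnx.export(Chatty(), torch.rand(2, 3), "chatty.onnx")
```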
2022-10-18 14:19:08 -07:00
894bad72c9 [ONNX] Fix scalar_type_analysis metadata for copied constant (#86716) (#86923)
Fix the source of metadata for the copied constant. Since the constant is being implicitly cast,
it makes more sense to assign the code location etc. based on the user node.
This issue was discovered in https://github.com/pytorch/pytorch/issues/86627. This PR also adds unit test coverage for the scope
information of nodes when they are altered by CSE and related passes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86716
Approved by: https://github.com/thiagocrepaldi, https://github.com/malfet
2022-10-18 11:46:58 -07:00
d324341922 Bug fix for torch._C._willEngineExecuteNode (#86672) 2022-10-18 11:44:49 -07:00
1442f04f3b Enables lazy loading of cuda modules if not set by user (#86509) 2022-10-18 11:42:24 -07:00
1ebff1dcaf [ONNX] Re-enable assert diagnostic test (#85999) (#86924)
Fix to properly clear 'background_context' of export diagnostic 'engine' in `clear`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85999
Approved by: https://github.com/abock
2022-10-18 11:40:40 -07:00
7688e5eeab [ONNX] Fix triu/tril export with diagonal input (#86843) (#86925)
Investigation with @thiagocrepaldi discovered this bug with triu/tril export when
`diagonal` is passed in as an input. Previously the assumption was made that `diagonal`
is always provided as a constant value.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86843
Approved by: https://github.com/thiagocrepaldi, https://github.com/abock
2022-10-18 11:39:33 -07:00
d2325f3b6a handle libomp update on circleci (#86979) (#87132)
libomp got an update and is now keg-only

reverts https://github.com/pytorch/pytorch/pull/86940
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86979
Approved by: https://github.com/huydhn, https://github.com/malfet
2022-10-18 11:35:04 -07:00
eda8263f15 [Release 1.13.0][DataPipe] Documentation and interface fix (#87140)
* [DataPipe] Fix missing functional name for FileLister (#86497)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86497
Approved by: https://github.com/ejguan

* [DataPipe] Fixing interface generation in setup.py (#87081)

Based on the artifact generated on this [page](https://hud.pytorch.org/pr/87081), I downloaded [[s3] linux-focal-py3.7-clang7-asan/artifacts.zip](https://gha-artifacts.s3.amazonaws.com/pytorch/pytorch/3266430083/linux-focal-py3.7-clang7-asan/artifacts.zip) (1.14 GB) and unpacked it. `torch.utils.data.datapipes.datapipe.pyi` does exist. I believe this means the file should be part of the distribution.

I also did `wheel unpack ***.whl` to confirm the existence of the file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87081
Approved by: https://github.com/ejguan
2022-10-18 11:34:21 -07:00
c40a04415c Add error checking to flaky test bot platform parser (#86632) (#87201)
If an invalid platform is specified when disabling a test with flaky test bot, the CI crashes, skipping all tests that come after it.

This turns it into a console message instead. Not erroring out here since it would affect random PRs. The actual error message should go into the bot that parses the original issue so that it can respond on that issue directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86632
Approved by: https://github.com/huydhn

(cherry picked from commit 0337f0ad473ffc298a30e603050d2df9d0073428)

Co-authored-by: Zain Rizvi <zainr@fb.com>
2022-10-18 10:26:46 -07:00
f89a762a8e Fix the performance issue that the for-loop before ExternallCall could not be parallelized. (#85056) (#86516)
Currently, NNC only parallelizes the loop statements of the graph outputs. This logic can bypass some loop statements that could be parallelized. Take the example below and suppose the output of the `ExternalCall` is also the output of the NNC fusion group. The current [parallel logic](https://github.com/pytorch/pytorch/pull/85056/files#diff-9a11174c26e4b57ab73e819520122bc314467c72962f3a5b79e7400ea3c4bbe5L781-L785) only tries to parallelize the `ExternalCall` and bypasses `stmt1` and `stmt2`.

```c++
stmt1: For:
stmt2:   For:
stmt3: ExternalCall
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85056
Approved by: https://github.com/frank-wei, https://github.com/bertmaher
2022-10-18 07:47:40 -07:00
492d572086 [functorch] fix cross (#86934)
[ghstack-poisoned]
2022-10-17 16:58:16 -04:00
366c59e260 Allow PrivateUse1 backends to not have Storage (#86557) (#86803)
* Allow PrivateUse1 backends to not have Storage (#86557)

Allow PrivateUse1 backends to not have Storage

To unblock the DirectML backend, this change would be needed for 1.13 as well.

The DirectML backend creates tensors using the open registration pattern documented here: https://pytorch.org/tutorials/advanced/extend_dispatcher.html
[registration example](https://github.com/bdhirsh/pytorch_open_registration_example)

However, DirectML tensors are opaque and do not have Storage.
The DirectML TensorImpl derives from OpaqueTensorImpl, which does not have a storage. Because of this, various places in the code that expect storage to be present fail. We had made various changes in-tree to accommodate this:
a. def __deepcopy__(self, memo):
[b5acba8895/torch/_tensor.py (L119)](https://github.com/pytorch/pytorch/blob/b5acba88959698d35cb548c78dd3fb151f85f28b/torch/_tensor.py#L119)
or self.device.type in ["lazy", "xla", "mps", "ort", "meta", "hpu", 'dml']
b. def _reduce_ex_internal(self, proto):
[b5acba8895/torch/_tensor.py (L275)](https://github.com/pytorch/pytorch/blob/b5acba88959698d35cb548c78dd3fb151f85f28b/torch/_tensor.py#L275)
if self.device.type in ["xla", "ort", "hpu", "dml"]:
c. TensorIteratorBase::build has an unsupported list for tensors without storage.
[b5acba8895/aten/src/ATen/TensorIterator.cpp (L1497)](https://github.com/pytorch/pytorch/blob/b5acba88959698d35cb548c78dd3fb151f85f28b/aten/src/ATen/TensorIterator.cpp#L1497)

Using the PrivateUse1 backend, similar exemptions need to be made in order to relax requirements on Storage so that the DirectML backend tensors can work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86557
Approved by: https://github.com/bdhirsh, https://github.com/martinb35

* Fix incorrect tensor storage check  (#86845)

Fix incorrect tensor storage check

The earlier change (https://github.com/pytorch/pytorch/pull/86557) contained an incorrect check for storage:
**self.storage is not None**
should have been:
**not torch._C._has_storage(self)**

These fixes were run through the DirectML test suite and confirm that the check now works correctly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86845
Approved by: https://github.com/martinb35, https://github.com/bdhirsh
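A minimal sketch of the corrected check (the helper name comes from the fix itself; the wrapper function is hypothetical):
```python
import torch

def lacks_storage(t: torch.Tensor) -> bool:
    # Use the C-level helper instead of `self.storage is not None`, which does not
    # behave as intended for opaque backend tensors.
    return not torch._C._has_storage(t)

print(lacks_storage(torch.rand(2)))  # False for ordinary strided tensors
```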
2022-10-14 18:01:26 -07:00
de3fa48e38 Install c10d headers with absolute path (#86257) (#86933)
https://github.com/pytorch/pytorch/pull/85780 updated all c10d headers in pytorch to use absolute paths, following the other distributed components. However, the headers were still copied to `${TORCH_INSTALL_INCLUDE_DIR}/torch`, so external extensions still had to reference the c10d headers as `<c10d/*.h>`, making the usage inconsistent (the only exception was c10d/exception.h, which was copied to `${TORCH_INSTALL_INCLUDE_DIR}/torch/csrc/distributed/c10d`).

This patch fixes the installation step to copy all c10d headers to `${TORCH_INSTALL_INCLUDE_DIR}/torch/csrc/distributed/c10d`, thus external extensions can consistently reference c10d headers with the absolute path.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86257
Approved by: https://github.com/kumpera
2022-10-14 17:59:41 -07:00
aab841e89d [ONNX] Support device().type() string comparison with constant (#86168) (#86921)
Fixes #86168

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86168
Approved by: https://github.com/justinchuby, https://github.com/AllenTiTaiWang, https://github.com/abock
2022-10-14 11:10:12 -07:00
7e196948ca Add vmap support for slogdet; fix regression from functorch 0.2.1 (#86815) (#86902)
This PR adds vmap support for slogdet -- slogdet just decomposes into
linalg.slogdet.

This fixes a regression from functorch 0.2.1 (slogdet had a batching
rule then, and doesn't anymore). We didn't catch the regression because
it seems like slogdet doesn't have an OpInfo (I'm not sure if it had one
before).

Test Plan:
- new one-off test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86815
Approved by: https://github.com/samdow
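A minimal sketch of the restored behavior (batch size and matrix shape are arbitrary):
```python
import torch
from functorch import vmap

xs = torch.randn(8, 3, 3)
signs, logabsdets = vmap(torch.slogdet)(xs)  # decomposes into linalg.slogdet under vmap
```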
2022-10-13 11:24:19 -07:00
311b47b72c [quant] Move the order of x86 engine to avoid changing the default qengine (#86631) (#86726)
Since the default qengine is the last element of the supported_engines list, adding the x86 qengine at the end of the list changed the default quantized engine as well. This PR is a short-term fix to revert that change. We have an issue here to track the proper fix: https://github.com/pytorch/pytorch/issues/86404

Motivation:
A Meta-internal team found that inference failed in onednn prepacking with the error "could not create a primitive descriptor for a reorder primitive" on a COPPER_LAKE machine. We are working with Intel to repro and fix the problem; in the meantime, we'll revert the default option back to fbgemm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86631
Approved by: https://github.com/vkuzo
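A minimal sketch of inspecting and pinning the engine explicitly (assumes an x86 build where `fbgemm` is available):
```python
import torch

# The default engine is the last entry of this list, which is why the ordering change
# described above altered the default.
print(torch.backends.quantized.supported_engines)

# Pin the engine explicitly, independent of list ordering.
torch.backends.quantized.engine = "fbgemm"
```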
2022-10-13 09:49:53 -07:00
51c16f092c [functorch] Add more details to the functorch install page (#86823) (#86903)
Added some details about:
- `pip uninstall functorch` being helpful if there are problems
- `pip install functorch` still working for BC reasons.

Test Plan:
- wait for docs preview
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86823
Approved by: https://github.com/samdow
2022-10-13 11:54:55 -04:00
f65ac2ad20 [CI] Fix builder ref for release, linux only (#86904) 2022-10-13 11:52:45 -04:00
71251e2521 ci: Just use regular checkout (#86824) (#86895)
checkout-pytorch seems to have issues: it is purpose-made for our PR
testing and appears to conflict with what we're trying to do for binary
builds.

For builds like
https://github.com/pytorch/pytorch/actions/runs/3207520052/jobs/5242479607
there is confusion over where the reference is pulled from, and I believe it is
root-caused by the checkout logic in checkout-pytorch.

So with that in mind I suggest we just use the upstream checkout action
for this job

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86824
Approved by: https://github.com/atalman

Co-authored-by: Eli Uriegas <eliuriegas@fb.com>
2022-10-13 11:03:02 -04:00
01042487e2 [1.13] Release-only improvements to functorch docs (#86693)
This PR:
- corrects the functorch version string (previously it said "nightly",
it should instead say the version, which is 1.13).
- Adds permalinks to colabs. The colab notebooks previously linked to
the ones on the master branch, instead, they should link to the ones on
the release/1.13 branch.

Test Plan:
- build and tested docs locally
2022-10-13 07:21:14 -07:00
d9ddab5efa [1.13] Remove torch.vmap (#86333)
torch.vmap is a prototype feature (functorch.vmap is in beta). We're
still working on deduplicating the two.

This PR removes torch.vmap from the 1.13 release. Hopefully by the next
release we will have everything deduplicated so we don't need to do this
anymore.
2022-10-13 07:00:34 -07:00
03992a6fb3 Make the data types of output and input consistent for batchnorm (#86784)
* make the data types of output and input consistent for batchnorm

* Apply suggestions from code review

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
2022-10-13 08:10:50 -04:00
786431cd13 [DOC] Use type hints to show annotation in the docs (#86851)
Fixes #44964

Use type hints in the code to show type annotations in the parameters section of the docs.

For parameters that are already documented in the docstring but lack a type annotation, the type hints from the code are used:

| [Before](https://pytorch.org/docs/master/generated/torch.nn.AdaptiveMaxPool1d.html) | [After](https://docs-preview.pytorch.org/79086/generated/torch.nn.AdaptiveMaxPool1d.html) |
| --- | --- |
| <img width="462" alt="image" src="https://user-images.githubusercontent.com/6421097/172954756-96d2d8a6-7df9-4c0f-ad34-c12912a5a740.png"> | <img width="479" alt="image" src="https://user-images.githubusercontent.com/6421097/172954770-a6ce2425-99a6-4853-ac2c-e182c3849344.png"> |

| [Before](https://pytorch.org/docs/master/generated/torch.nn.Linear.html) | [After](https://docs-preview.pytorch.org/79086/generated/torch.nn.Linear.html) |
| --- | --- |
| <img width="482" alt="image" src="https://user-images.githubusercontent.com/6421097/172954992-10ce6b48-44a2-487e-b855-2a15a50805bb.png"> | <img width="471" alt="image" src="https://user-images.githubusercontent.com/6421097/172954839-84012ce6-bf42-432c-9226-d3e81500e72d.png"> |

Ref:
- PR https://github.com/pytorch/pytorch/pull/49294 removed type annotations from signatures in HTML docs.
- Sphinx version was bumped to 5.0.0 in PR #70309
- Duplicated (closed) issues: #78311 and #77501

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79086
Approved by: https://github.com/malfet

(cherry picked from commit e552cf105058e6d7ea367d31ff3d3c0a31ea0bbd)

Co-authored-by: Shawn Zhong <github@shawnzhong.com>
2022-10-12 18:53:37 -07:00
f37023b03f [MPS] Better error message for slow_conv2d_forward (#86844)
The error `Could not run 'aten::_slow_conv2d_forward' with arguments from the 'MPS' backend.` is very misleading, as this method is usually only invoked when the input is on the CPU but the weights are on an MPS device.
Raise a more user-friendly error in this case.

Add a test to `test_invalid_conv2d` to check for those conditions.

Fixes https://github.com/pytorch/pytorch/issues/77931

This is a cherry-pick of  https://github.com/pytorch/pytorch/pull/86303 into release/1.13 branch
Approved by: https://github.com/kulinseth

(cherry picked from commit fa799132d82c3c48253aaf7d3ee3a8c5e007350d)
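A repro-style sketch of the condition that now gets a clearer message (assumes a macOS build with MPS available):
```python
import torch

conv = torch.nn.Conv2d(3, 8, kernel_size=3).to("mps")
x = torch.rand(1, 3, 16, 16)   # input accidentally left on the CPU
try:
    conv(x)
except RuntimeError as e:
    print(e)                   # now a clearer device-mismatch message
```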
2022-10-12 16:36:23 -07:00
5be45fc4f1 [ROCm] set nvfuser default to disabled, keep CI (#86369) (#86725)
Bug fix. nvfuser is functional for ROCm on gfx906, but some tests are failing for other gfx targets. Disable nvfuser until all features are verified. Users may still opt in by setting the known env var PYTORCH_JIT_ENABLE_NVFUSER=1. This PR sets this env var in the GitHub Actions workflow for ROCm since all current CI hosts are gfx906.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86369
Approved by: https://github.com/huydhn
2022-10-12 15:21:07 -04:00
50a9cd95ba Add version selector back to functorch docs (#86602) (#86689)
I accidentally deleted it in
https://github.com/pytorch/pytorch/pull/85856/ . This brings the version
selector back.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86602
Approved by: https://github.com/samdow
2022-10-12 15:14:55 -04:00
c38dbd0e1d Conditionally build the TestApp benchmark based on lite interpreter (#86314) (#86377)
The TestApp benchmark was recently re-added; however, it seems it only builds when PyTorch is built with the lite interpreter. This diff adds a macro to compile out the benchmark when PyTorch is built as full JIT. This should fix our full-JIT simulator nightly builds.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86314
Approved by: https://github.com/malfet

Co-authored-by: John Detloff <johndetloff@fb.com>
2022-10-12 14:19:30 -04:00
95112ca043 Fix binary builds for the release - unblock release (#86484)
* Fix binary builds - unblock release

* Fix RC names

* Fix builder
2022-10-07 16:31:40 -04:00
f682048bf9 Fix for the binary upload (#86385) 2022-10-06 14:50:21 -04:00
56508e29e6 Release 1.13, Install torch from test channel, Pin build… (#86290)
* Release 1.13, pin builder, Install torch from test channel, Pin builder and xla repo

* Update pytorch xla name
2022-10-05 14:14:30 -04:00
4995 changed files with 182226 additions and 484788 deletions

View File

@ -1,4 +1,4 @@
build --cxxopt=--std=c++17
build --cxxopt=--std=c++14
build --copt=-I.
# Bazel does not support including its cc_library targets as system
# headers. We work around this for generated code

View File

@ -1,32 +0,0 @@
#!/bin/bash
# Work around bug where devtoolset replaces sudo and breaks it.
if [ -n "$DEVTOOLSET_VERSION" ]; then
export SUDO=/bin/sudo
else
export SUDO=sudo
fi
as_jenkins() {
# NB: unsetting the environment variables works around a conda bug
# https://github.com/conda/conda/issues/6576
# NB: Pass on PATH and LD_LIBRARY_PATH to sudo invocation
# NB: This must be run from a directory that jenkins has access to,
# works around https://github.com/conda/conda-package-handling/pull/34
$SUDO -H -u jenkins env -u SUDO_UID -u SUDO_GID -u SUDO_COMMAND -u SUDO_USER env "PATH=$PATH" "LD_LIBRARY_PATH=$LD_LIBRARY_PATH" $*
}
conda_install() {
# Ensure that the install command don't upgrade/downgrade Python
# This should be called as
# conda_install pkg1 pkg2 ... [-c channel]
as_jenkins conda install -q -n py_$ANACONDA_PYTHON_VERSION -y python="$ANACONDA_PYTHON_VERSION" $*
}
conda_run() {
as_jenkins conda run -n py_$ANACONDA_PYTHON_VERSION --no-capture-output $*
}
pip_install() {
as_jenkins conda run -n py_$ANACONDA_PYTHON_VERSION pip install --progress-bar off $*
}

View File

@ -1,29 +0,0 @@
#!/bin/bash
set -ex
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
if [ -n "${UBUNTU_VERSION}" ]; then
apt update
apt-get install -y clang doxygen git graphviz nodejs npm libtinfo5
fi
# Do shallow clone of PyTorch so that we can init lintrunner in Docker build context
git clone https://github.com/pytorch/pytorch.git --depth 1
chown -R jenkins pytorch
pushd pytorch
# Install all linter dependencies
pip_install -r requirements.txt
conda_run lintrunner init
# Cache .lintbin directory as part of the Docker image
cp -r .lintbin /tmp
popd
# Node dependencies required by toc linter job
npm install -g markdown-toc
# Cleaning up
rm -rf pytorch

View File

@ -1,34 +0,0 @@
ARG UBUNTU_VERSION
FROM ubuntu:${UBUNTU_VERSION}
ARG UBUNTU_VERSION
ENV DEBIAN_FRONTEND noninteractive
# Install common dependencies (so that this step can be cached separately)
COPY ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
# Install user
COPY ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install conda and other packages (e.g., numpy, pytest)
ARG ANACONDA_PYTHON_VERSION
ARG CONDA_CMAKE
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
COPY requirements-ci.txt /opt/conda/requirements-ci.txt
COPY ./common/install_conda.sh install_conda.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_conda.sh && rm install_conda.sh common_utils.sh /opt/conda/requirements-ci.txt
# Note that Docker build forbids copying file outside the build context
COPY ./common/install_linter.sh install_linter.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_linter.sh
RUN rm install_linter.sh common_utils.sh
USER jenkins
CMD ["bash"]

View File

@ -1,14 +0,0 @@
# Jenkins
The scripts in this directory are the entrypoint for testing ONNX exporter.
The environment variable `BUILD_ENVIRONMENT` is expected to be set to
the build environment you intend to test. It is a hint for the build
and test scripts to configure Caffe2 a certain way and include/exclude
tests. For Docker images, it equals the name of the image itself. For
example: `py2-cuda9.0-cudnn7-ubuntu16.04`. The Docker images that are
built on Jenkins and are used in triggered builds already have this
environment variable set in their manifest. Also see
`./docker/jenkins/*/Dockerfile` and search for `BUILD_ENVIRONMENT`.
Our Jenkins installation is located at https://ci.pytorch.org/jenkins/.

View File

@ -1,19 +0,0 @@
set -ex
LOCAL_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
ROOT_DIR=$(cd "$LOCAL_DIR"/../.. && pwd)
TEST_DIR="$ROOT_DIR/test"
pytest_reports_dir="${TEST_DIR}/test-reports/python"
# Figure out which Python to use
PYTHON="$(which python)"
if [[ "${BUILD_ENVIRONMENT}" =~ py((2|3)\.?[0-9]?\.?[0-9]?) ]]; then
PYTHON=$(which "python${BASH_REMATCH[1]}")
fi
if [[ "${BUILD_ENVIRONMENT}" == *rocm* ]]; then
# HIP_PLATFORM is auto-detected by hipcc; unset to avoid build errors
unset HIP_PLATFORM
fi
mkdir -p "$pytest_reports_dir" || true

View File

@ -1,74 +0,0 @@
#!/bin/bash
# shellcheck source=./common.sh
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
if [[ ${BUILD_ENVIRONMENT} == *onnx* ]]; then
pip install click mock tabulate networkx==2.0
pip -q install --user "file:///var/lib/jenkins/workspace/third_party/onnx#egg=onnx"
fi
# Skip tests in environments where they are not built/applicable
if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
echo 'Skipping tests'
exit 0
fi
if [[ "${BUILD_ENVIRONMENT}" == *-rocm* ]]; then
# temporary to locate some kernel issues on the CI nodes
export HSAKMT_DEBUG_LEVEL=4
fi
# These additional packages are needed for circleci ROCm builds.
if [[ $BUILD_ENVIRONMENT == *rocm* ]]; then
# Need networkx 2.0 because bellmand_ford was moved in 2.1 . Scikit-image by
# defaults installs the most recent networkx version, so we install this lower
# version explicitly before scikit-image pulls it in as a dependency
pip install networkx==2.0
# click - onnx
pip install --progress-bar off click protobuf tabulate virtualenv mock typing-extensions
fi
################################################################################
# Python tests #
################################################################################
if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then
exit 0
fi
# If pip is installed as root, we must use sudo.
# CircleCI docker images could install conda as jenkins user, or use the OS's python package.
PIP=$(which pip)
PIP_USER=$(stat --format '%U' $PIP)
CURRENT_USER=$(id -u -n)
if [[ "$PIP_USER" = root && "$CURRENT_USER" != root ]]; then
MAYBE_SUDO=sudo
fi
# Uninstall pre-installed hypothesis and coverage to use an older version as newer
# versions remove the timeout parameter from settings which ideep/conv_transpose_test.py uses
$MAYBE_SUDO pip -q uninstall -y hypothesis
$MAYBE_SUDO pip -q uninstall -y coverage
# "pip install hypothesis==3.44.6" from official server is unreliable on
# CircleCI, so we host a copy on S3 instead
$MAYBE_SUDO pip -q install attrs==18.1.0 -f https://s3.amazonaws.com/ossci-linux/wheels/attrs-18.1.0-py2.py3-none-any.whl
$MAYBE_SUDO pip -q install coverage==4.5.1 -f https://s3.amazonaws.com/ossci-linux/wheels/coverage-4.5.1-cp36-cp36m-macosx_10_12_x86_64.whl
$MAYBE_SUDO pip -q install hypothesis==4.57.1
##############
# ONNX tests #
##############
if [[ "$BUILD_ENVIRONMENT" == *onnx* ]]; then
pip install -q --user --no-use-pep517 "git+https://github.com/pytorch/vision.git@$(cat .github/ci_commit_pins/vision.txt)"
pip install -q --user transformers==4.25.1
pip install -q --user ninja flatbuffers==2.0 numpy==1.22.4 onnxruntime==1.14.0 beartype==0.10.4
# TODO: change this when onnx 1.13.1 is released.
pip install --no-use-pep517 'onnx @ git+https://github.com/onnx/onnx@e192ba01e438d22ca2dedd7956e28e3551626c91'
# TODO: change this when onnx-script is on testPypi
pip install 'onnx-script @ git+https://github.com/microsoft/onnx-script@a71e35bcd72537bf7572536ee57250a0c0488bf6'
# numba requires numpy <= 1.20, onnxruntime requires numpy >= 1.21.
# We don't actually need it for our tests, but it's imported if it's present, so uninstall.
pip uninstall -q --yes numba
# JIT C++ extensions require ninja, so put it into PATH.
export PATH="/var/lib/jenkins/.local/bin:$PATH"
"$ROOT_DIR/scripts/onnx/test.sh"
fi

View File

@ -1,29 +0,0 @@
#!/bin/bash
# Required environment variable: $BUILD_ENVIRONMENT
# (This is set by default in the Docker images we build, so you don't
# need to set it yourself.
# shellcheck source=./common.sh
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# shellcheck source=./common-build.sh
source "$(dirname "${BASH_SOURCE[0]}")/common-build.sh"
echo "Clang version:"
clang --version
python tools/stats/export_test_times.py
if [ -n "$(which conda)" ]; then
export CMAKE_PREFIX_PATH=/opt/conda
fi
CC="clang" CXX="clang++" LDSHARED="clang --shared" \
CFLAGS="-fsanitize=thread" \
USE_TSAN=1 USE_CUDA=0 USE_MKLDNN=0 \
python setup.py bdist_wheel
pip_install_whl "$(echo dist/*.whl)"
print_sccache_stats
assert_git_not_dirty

View File

@ -1,58 +0,0 @@
#!/bin/bash
# Required environment variables:
# $BUILD_ENVIRONMENT (should be set by your Docker image)
if [[ "$BUILD_ENVIRONMENT" != *win-* ]]; then
# Save the absolute path in case later we chdir (as occurs in the gpu perf test)
script_dir="$( cd "$(dirname "${BASH_SOURCE[0]}")" || exit ; pwd -P )"
if which sccache > /dev/null; then
# Save sccache logs to file
sccache --stop-server > /dev/null 2>&1 || true
rm -f ~/sccache_error.log || true
function sccache_epilogue() {
echo "::group::Sccache Compilation Log"
echo '=================== sccache compilation log ==================='
python "$script_dir/print_sccache_log.py" ~/sccache_error.log 2>/dev/null || true
echo '=========== If your build fails, please take a look at the log above for possible reasons ==========='
sccache --show-stats
sccache --stop-server || true
echo "::endgroup::"
}
# Register the function here so that the error log can be printed even when
# sccache fails to start, i.e. timeout error
trap_add sccache_epilogue EXIT
if [[ -n "${SKIP_SCCACHE_INITIALIZATION:-}" ]]; then
# sccache --start-server seems to hang forever on self hosted runners for GHA
# so let's just go ahead and skip the --start-server altogether since it seems
# as though sccache still gets used even when the sscache server isn't started
# explicitly
echo "Skipping sccache server initialization, setting environment variables"
export SCCACHE_IDLE_TIMEOUT=1200
export SCCACHE_ERROR_LOG=~/sccache_error.log
export RUST_LOG=sccache::server=error
elif [[ "${BUILD_ENVIRONMENT}" == *rocm* ]]; then
SCCACHE_ERROR_LOG=~/sccache_error.log SCCACHE_IDLE_TIMEOUT=0 sccache --start-server
else
# increasing SCCACHE_IDLE_TIMEOUT so that extension_backend_test.cpp can build after this PR:
# https://github.com/pytorch/pytorch/pull/16645
SCCACHE_ERROR_LOG=~/sccache_error.log SCCACHE_IDLE_TIMEOUT=1200 RUST_LOG=sccache::server=error sccache --start-server
fi
# Report sccache stats for easier debugging
sccache --zero-stats
fi
if which ccache > /dev/null; then
# Report ccache stats for easier debugging
ccache --zero-stats
ccache --show-stats
function ccache_epilogue() {
ccache --show-stats
}
trap_add ccache_epilogue EXIT
fi
fi

View File

@ -1,28 +0,0 @@
#!/bin/bash
# Common setup for all Jenkins scripts
# shellcheck source=./common_utils.sh
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
set -ex
# Required environment variables:
# $BUILD_ENVIRONMENT (should be set by your Docker image)
# Figure out which Python to use for ROCm
if [[ "${BUILD_ENVIRONMENT}" == *rocm* ]]; then
# HIP_PLATFORM is auto-detected by hipcc; unset to avoid build errors
unset HIP_PLATFORM
export PYTORCH_TEST_WITH_ROCM=1
# temporary to locate some kernel issues on the CI nodes
export HSAKMT_DEBUG_LEVEL=4
# improve rccl performance for distributed tests
export HSA_FORCE_FINE_GRAIN_PCIE=1
fi
# TODO: Renable libtorch testing for MacOS, see https://github.com/pytorch/pytorch/issues/62598
# shellcheck disable=SC2034
BUILD_TEST_LIBTORCH=0
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}

View File

@ -1,14 +0,0 @@
#!/bin/bash
# Common prelude for macos-build.sh and macos-test.sh
# shellcheck source=./common.sh
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
sysctl -a | grep machdep.cpu
# These are required for both the build job and the test job.
# In the latter to test cpp extensions.
export MACOSX_DEPLOYMENT_TARGET=10.9
export CXX=clang++
export CC=clang

View File

@ -1,468 +0,0 @@
Warning
=======
Contents may be out of date. Our CircleCI workflows are gradually being migrated to Github actions.
Structure of CI
===============
setup job:
1. Does a git checkout
2. Persists CircleCI scripts (everything in `.circleci`) into a workspace. Why?
We don't always do a Git checkout on all subjobs, but we usually
still want to be able to call scripts one way or another in a subjob.
Persisting files this way lets us have access to them without doing a
checkout. This workspace is conventionally mounted on `~/workspace`
(this is distinguished from `~/project`, which is the conventional
working directory that CircleCI will default to starting your jobs
in.)
3. Write out the commit message to `.circleci/COMMIT_MSG`. This is so
we can determine in subjobs if we should actually run the jobs or
not, even if there isn't a Git checkout.
CircleCI configuration generator
================================
One may no longer make changes to the `.circleci/config.yml` file directly.
Instead, one must edit these Python scripts or files in the `verbatim-sources/` directory.
Usage
----------
1. Make changes to these scripts.
2. Run the `regenerate.sh` script in this directory and commit the script changes and the resulting change to `config.yml`.
You'll see a build failure on GitHub if the scripts don't agree with the checked-in version.
Motivation
----------
These scripts establish a single, authoritative source of documentation for the CircleCI configuration matrix.
The documentation, in the form of diagrams, is automatically generated and cannot drift out of sync with the YAML content.
Furthermore, consistency is enforced within the YAML config itself, by using a single source of data to generate
multiple parts of the file.
* Facilitates one-off culling/enabling of CI configs for testing PRs on special targets
Also see https://github.com/pytorch/pytorch/issues/17038
Future direction
----------------
### Declaring sparse config subsets
See comment [here](https://github.com/pytorch/pytorch/pull/17323#pullrequestreview-206945747):
In contrast with a full recursive tree traversal of configuration dimensions,
> in the future I think we actually want to decrease our matrix somewhat and have only a few mostly-orthogonal builds that taste as many different features as possible on PRs, plus a more complete suite on every PR and maybe an almost full suite nightly/weekly (we don't have this yet). Specifying PR jobs in the future might be easier to read with an explicit list when we come to this.
----------------
----------------
# How do the binaries / nightlies / releases work?
### What is a binary?
A binary or package (used interchangeably) is a pre-built collection of c++ libraries, header files, python bits, and other files. We build these and distribute them so that users do not need to install from source.
A **binary configuration** is a collection of
* release or nightly
* releases are stable, nightlies are beta and built every night
* python version
* linux: 3.7m (mu is wide unicode or something like that. It usually doesn't matter but you should know that it exists)
* macos: 3.7, 3.8
* windows: 3.7, 3.8
* cpu version
* cpu, cuda 9.0, cuda 10.0
* The supported cuda versions occasionally change
* operating system
* Linux - these are all built on CentOS. There haven't been any problems in the past building on CentOS and using on Ubuntu
* MacOS
* Windows - these are built on Azure pipelines
* devtoolset version (gcc compiler version)
* This only matters on Linux cause only Linux uses gcc. tldr is gcc made a backwards incompatible change from gcc 4.8 to gcc 5, because it had to change how it implemented std::vector and std::string
### Where are the binaries?
The binaries are built in CircleCI. There are nightly binaries built every night at 9pm PST (midnight EST) and release binaries corresponding to Pytorch releases, usually every few months.
We have 3 types of binary packages
* pip packages - nightlies are stored on s3 (pip install -f \<a s3 url\>). releases are stored in a pip repo (pip install torch) (ask Soumith about this)
* conda packages - nightlies and releases are both stored in a conda repo. Nightly packages have a '_nightly' suffix
* libtorch packages - these are zips of all the c++ libraries, header files, and sometimes dependencies. These are c++ only
* shared with dependencies (the only supported option for Windows)
* static with dependencies
* shared without dependencies
* static without dependencies
All binaries are built in CircleCI workflows except Windows. There are checked-in workflows (committed into the .circleci/config.yml) to build the nightlies every night. Releases are built by manually pushing a PR that builds the suite of release binaries (overwrite the config.yml to build the release)
# CircleCI structure of the binaries
Some quick vocab:
* A \**workflow** is a CircleCI concept; it is a DAG of '**jobs**'. ctrl-f 'workflows' on https://github.com/pytorch/pytorch/blob/master/.circleci/config.yml to see the workflows.
* **jobs** are a sequence of '**steps**'
* **steps** are usually just a bash script or a builtin CircleCI command. *All steps run in new environments, environment variables declared in one script DO NOT persist to following steps*
* CircleCI has a **workspace**, which is essentially a cache between steps of the *same job* in which you can store artifacts between steps.
## How are the workflows structured?
The nightly binaries have 3 workflows. We have one job (actually 3 jobs: build, test, and upload) per binary configuration
1. binary_builds
1. every day midnight EST
2. linux: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/linux-binary-build-defaults.yml
3. macos: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/macos-binary-build-defaults.yml
4. For each binary configuration, e.g. linux_conda_3.7_cpu there is a
1. binary_linux_conda_3.7_cpu_build
1. Builds the build. On linux jobs this uses the 'docker executor'.
2. Persists the package to the workspace
2. binary_linux_conda_3.7_cpu_test
1. Loads the package to the workspace
2. Spins up a docker image (on Linux), mapping the package and code repos into the docker
3. Runs some smoke tests in the docker
4. (Actually, for macos this is a step rather than a separate job)
3. binary_linux_conda_3.7_cpu_upload
1. Logs in to aws/conda
2. Uploads the package
2. update_s3_htmls
1. every day 5am EST
2. https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/binary_update_htmls.yml
3. See below for what these are for and why they're needed
4. Three jobs that each examine the current contents of aws and the conda repo and update some html files in s3
3. binarysmoketests
1. every day
2. https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-build-smoke-tests-defaults.yml
3. For each binary configuration, e.g. linux_conda_3.7_cpu there is a
1. smoke_linux_conda_3.7_cpu
1. Downloads the package from the cloud, e.g. using the official pip or conda instructions
2. Runs the smoke tests
## How are the jobs structured?
The jobs are in https://github.com/pytorch/pytorch/tree/master/.circleci/verbatim-sources. Jobs are made of multiple steps. There are some shared steps used by all the binaries/smokes. Steps of these jobs are all delegated to scripts in https://github.com/pytorch/pytorch/tree/master/.circleci/scripts .
* Linux jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/linux-binary-build-defaults.yml
* binary_linux_build.sh
* binary_linux_test.sh
* binary_linux_upload.sh
* MacOS jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/macos-binary-build-defaults.yml
* binary_macos_build.sh
* binary_macos_test.sh
* binary_macos_upload.sh
* Update html jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/binary_update_htmls.yml
* These delegate from the pytorch/builder repo
* https://github.com/pytorch/builder/blob/master/cron/update_s3_htmls.sh
* https://github.com/pytorch/builder/blob/master/cron/upload_binary_sizes.sh
* Smoke jobs (both linux and macos): https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-build-smoke-tests-defaults.yml
* These delegate from the pytorch/builder repo
* https://github.com/pytorch/builder/blob/master/run_tests.sh
* https://github.com/pytorch/builder/blob/master/smoke_test.sh
* https://github.com/pytorch/builder/blob/master/check_binary.sh
* Common shared code (shared across linux and macos): https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-binary-build-defaults.yml
* binary_checkout.sh - checks out pytorch/builder repo. Right now this also checks out pytorch/pytorch, but it shouldn't. pytorch/pytorch should just be shared through the workspace. This can handle being run before binary_populate_env.sh
* binary_populate_env.sh - parses BUILD_ENVIRONMENT into the separate env variables that make up a binary configuration. Also sets lots of default values, the date, the version strings, the location of folders in s3, all sorts of things. This generally has to be run before other steps.
* binary_install_miniconda.sh - Installs miniconda, cross platform. Also hacks this for the update_binary_sizes job that doesn't have the right env variables
* binary_run_in_docker.sh - Takes a bash script file (the actual test code) from a hardcoded location, spins up a docker image, and runs the script inside the docker image
### **Why do the steps all refer to scripts?**
CircleCI creates a final yaml file by inlining every <<* segment, so if we were to keep all the code in the config.yml itself then the config size would go over 4 MB and cause infra problems.
### **What is binary_run_in_docker for?**
So, CircleCI has several executor types: macos, machine, and docker are the ones we use. The 'machine' executor gives you two cores on some linux vm. The 'docker' executor gives you considerably more cores (nproc was 32 instead of 2 back when I tried in February). Since the dockers are faster, we try to run everything that we can in dockers. Thus
* linux build jobs use the docker executor. Running them on the docker executor was at least 2x faster than running them on the machine executor
* linux test jobs use the machine executor in order for them to properly interface with GPUs since docker executors cannot execute with attached GPUs
* linux upload jobs use the machine executor. The upload jobs are so short that it doesn't really matter what they use
* linux smoke test jobs use the machine executor for the same reason as the linux test jobs
binary_run_in_docker.sh is a way to share the docker start-up code between the binary test jobs and the binary smoke test jobs
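Conceptually, binary_run_in_docker.sh reduces to the pattern sketched below. The image name and script path are placeholder assumptions, since the real script takes them from the populated environment and a hardcoded location.
```sh
# Minimal sketch of the "run a test script inside a docker container" pattern.
# DOCKER_IMAGE and TEST_SCRIPT are placeholders, not the real hardcoded values.
DOCKER_IMAGE="${DOCKER_IMAGE:-pytorch/conda-cuda}"
TEST_SCRIPT="/home/circleci/project/ci_test_script.sh"   # assumed location
id=$(docker run --detach --tty "${DOCKER_IMAGE}" bash)   # keep a container alive
docker cp "${TEST_SCRIPT}" "${id}:/test_script.sh"       # copy the test code in
docker exec "${id}" bash /test_script.sh                 # run it inside the container
docker rm -f "${id}"                                     # clean up
```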
### **Why does binary_checkout also checkout pytorch? Why shouldn't it?**
We want all the nightly binary jobs to run on the exact same git commit, so we wrote our own checkout logic to ensure that the same commit was always picked. Later CircleCI changed to a single pytorch checkout persisted through the workspace (they did this because our config file was too big, so they wanted to move a lot of the setup code into scripts, but the scripts needed the code repo to exist in order to be called, so they added a prereq step called 'setup' that checks out the code and persists the needed scripts to the workspace). The changes to the binary jobs were not properly tested, so they all broke because the pytorch code they expected was no longer checked out. We hotfixed the problem by adding the pytorch checkout back to binary_checkout, so now there are two checkouts of pytorch on the binary jobs. This problem still needs to be fixed, but it takes careful tracing of which code is being called where.
# Code structure of the binaries (circleci agnostic)
## Overview
The code that runs the binaries lives in two places, in the normal [github.com/pytorch/pytorch](http://github.com/pytorch/pytorch), but also in [github.com/pytorch/builder](http://github.com/pytorch/builder), which is a repo that defines how all the binaries are built. The relevant code is
```
# All code needed to set-up environments for build code to run in,
# but only code that is specific to the current CI system
pytorch/pytorch
- .circleci/ # Folder that holds all circleci related stuff
- config.yml # GENERATED file that actually controls all circleci behavior
- verbatim-sources # Used to generate job/workflow sections in ^
- scripts/ # Code needed to prepare circleci environments for binary build scripts
- setup.py # Builds pytorch. This is wrapped in pytorch/builder
- cmake files # used in normal building of pytorch
# All code needed to prepare a binary build, given an environment
# with all the right variables/packages/paths.
pytorch/builder
# Given an installed binary and a proper python env, runs some checks
# to make sure the binary was built the proper way. Checks things like
# the library dependencies, symbols present, etc.
- check_binary.sh
# Given an installed binary, runs python tests to make sure everything
# is in order. These should be de-duped. Right now they both run smoke
# tests, but are called from different places. Usually just call some
# import statements, but also has overlap with check_binary.sh above
- run_tests.sh
- smoke_test.sh
# Folders that govern how packages are built. See paragraphs below
- conda/
- build_pytorch.sh # Entrypoint. Delegates to proper conda build folder
- switch_cuda_version.sh # Switches the active CUDA installation in Docker
- pytorch-nightly/ # Build-folder
- manywheel/
- build_cpu.sh # Entrypoint for cpu builds
- build.sh # Entrypoint for CUDA builds
- build_common.sh # Actual build script that ^^ call into
- wheel/
- build_wheel.sh # Entrypoint for wheel builds
- windows/
- build_pytorch.bat # Entrypoint for wheel builds on Windows
```
Every type of package has an entrypoint build script that handles all the important logic.
## Conda
Linux, MacOS and Windows use the same code flow for the conda builds.
Conda packages are built with conda-build, see https://conda.io/projects/conda-build/en/latest/resources/commands/conda-build.html
Basically, you pass `conda build` a build folder (pytorch-nightly/ above) that contains a build script and a meta.yaml. The meta.yaml specifies the python environment to build the package in and the dependencies the resulting package should have, and the build script gets called in that env to build the thing.
tl;dr on conda-build is
1. Creates a brand new conda environment, based off of deps in the meta.yaml
1. Note that environment variables do not get passed into this build env unless they are specified in the meta.yaml
2. If the build fails this environment will stick around. You can activate it for much easier debugging. The “General Python” section below explains what exactly a python “environment” is.
2. Calls build.sh in the environment
3. Copies the finished package to a new conda env, also specified by the meta.yaml
4. Runs some simple import tests (if specified in the meta.yaml)
5. Saves the finished package as a tarball
The build.sh we use is essentially a wrapper around `python setup.py build`, but it also manually copies some of our dependent libraries into the resulting tarball and messes with some rpaths.
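For orientation, invoking conda-build by hand against that build folder looks roughly like the sketch below; the `--output-folder` path and channel list are illustrative assumptions, not copied from the real job.
```sh
# Minimal conda-build invocation against the build folder described above.
# Output folder and channels are assumptions for illustration.
cd builder/conda
conda install -y conda-build
conda build pytorch-nightly/ \
  --python 3.7 \
  --output-folder /final_pkgs \
  -c pytorch -c defaults
```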
The entrypoint file `builder/conda/build_conda.sh` is complicated because
* It works for Linux, MacOS and Windows
* The mac builds used to create their own environments, since they all used to be on the same machine. There's now a lot of extra logic to handle conda envs. This extra machinery could be removed
* It used to handle testing too, which adds more logic messing with python environments too. This extra machinery could be removed.
## Manywheels (linux pip and libtorch packages)
Manywheels are pip packages for linux distros. Note that these manywheels are not actually manylinux compliant.
`builder/manywheel/build_cpu.sh` and `builder/manywheel/build.sh` (for CUDA builds) just set different env vars and then call into `builder/manywheel/build_common.sh`
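A minimal sketch of that thin-wrapper pattern follows; apart from DESIRED_CUDA, the variable names are assumptions made for illustration rather than a copy of the real wrappers.
```sh
#!/bin/bash
# Hedged sketch of the thin-wrapper pattern used by build_cpu.sh and build.sh:
# export a few flavor-specific variables, then hand off to the shared script.
set -ex
export DESIRED_CUDA="${DESIRED_CUDA:-cpu}"   # "cpu" in build_cpu.sh, a CUDA tag in build.sh
export EXTRA_FLAVOR_SPECIFIC_VAR="example"   # placeholder for the other per-flavor settings
source "$(dirname "${BASH_SOURCE[0]}")/build_common.sh"   # shared heavy lifting
```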
The entrypoint file `builder/manywheel/build_common.sh` is really really complicated because
* This used to handle building for several different python versions at the same time. The loops have been removed, but there are still unnecessary folders and file movements here and there.
* The script is never used this way anymore. This extra machinery could be removed.
* This used to handle testing the pip packages too. This is why there's testing code at the end that messes with python installations and stuff
* The script is never used this way anymore. This extra machinery could be removed.
* This also builds libtorch packages
* This should really be separate. libtorch packages are C++ only and have no Python. They should not share infra with all the python-specific stuff in this file.
* There is a lot of messing with rpaths. This is necessary, but could be made much much simpler if the above issues were fixed.
## Wheels (MacOS pip and libtorch packages)
The entrypoint file `builder/wheel/build_wheel.sh` is complicated because
* The mac builds used to all run on one machine (we didn't have autoscaling mac machines till CircleCI). So this script handled isolating itself by setting up and tearing down its own build env and working in its own build directory.
* The script is never used this way anymore. This extra machinery could be removed.
* This also builds libtorch packages
* Ditto the comment above. This should definitely be separated out.
Note that the MacOS Python wheels are still built in conda environments. Some of the dependencies present during build also come from conda.
## Windows Wheels (Windows pip and libtorch packages)
The entrypoint file `builder/windows/build_pytorch.bat` is complicated because
* This used to handle building for several different python versions at the same time. This is why there are loops everywhere
* The script is never used this way anymore. This extra machinery could be removed.
* This used to handle testing the pip packages too. This is why there's testing code at the end that messes with python installations and stuff
* The script is never used this way anymore. This extra machinery could be removed.
* This also builds libtorch packages
* This should really be separate. libtorch packages are C++ only and have no Python. They should not share infra with all the python-specific stuff in this file.
Note that the Windows Python wheels are still built in conda environments. Some of the dependencies present during build also come from conda.
## General notes
### Note on run_tests.sh, smoke_test.sh, and check_binary.sh
* These should all be consolidated
* These must run on all OS types: MacOS, Linux, and Windows
* These all run smoke tests at the moment. They inspect the packages a bit and maybe run a few import statements (a minimal sketch of this level of checking follows after this list). They DO NOT run the python tests or the cpp tests. The idea is that python tests on master and PR merges will catch all breakages. All these tests have to do is make sure the special binary machinery didn't mess anything up.
* There are separate run_tests.sh and smoke_test.sh because one used to be called by the smoke jobs and one used to be called by the binary test jobs (see circleci structure section above). This is still true actually, but these could be united into a single script that runs these checks, given an installed pytorch package.
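As a concrete reference point, the level of checking involved is roughly what the hedged sketch below shows; the exact imports and assertions in the real scripts differ.
```sh
# Hedged sketch of a smoke-level check on an installed pytorch package.
# The specific assertions are illustrative, not copied from run_tests.sh/smoke_test.sh.
python -c "import torch; print(torch.__version__)"
python -c "import torch; x = torch.rand(2, 3); assert x.shape == (2, 3)"
# Only meaningful on CUDA builds; expected to report False on cpu-only packages
python -c "import torch; print('cuda available:', torch.cuda.is_available())"
```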
### Note on libtorch
Libtorch packages are built in the wheel build scripts: manywheel/build_*.sh for linux and build_wheel.sh for mac. There are several things wrong with this
* It's confusing. Most of those scripts deal with python specifics.
* The extra conditionals everywhere severely complicate the wheel build scripts
* The process for building libtorch is different from the official instructions (a plain call to cmake, or a call to a script); a rough sketch of such a plain cmake build follows below
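For comparison, a plain standalone cmake build of libtorch looks roughly like the following sketch; the flag set is an assumption based on standard CMake options, so consult the official docs for the authoritative list.
```sh
# Hedged sketch of a standalone libtorch-style cmake build, independent of the
# wheel scripts. Flags are illustrative.
git clone --recursive https://github.com/pytorch/pytorch
mkdir -p pytorch-build && cd pytorch-build
cmake -DBUILD_SHARED_LIBS:BOOL=ON \
      -DCMAKE_BUILD_TYPE:STRING=Release \
      -DPYTHON_EXECUTABLE:PATH="$(which python3)" \
      -DCMAKE_INSTALL_PREFIX:PATH=../pytorch-install \
      ../pytorch
cmake --build . --target install
```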
### Note on docker images / Dockerfiles
All linux builds occur in docker images. The docker images are
* pytorch/conda-cuda
* Has ALL CUDA versions installed. The script pytorch/builder/conda/switch_cuda_version.sh sets /usr/local/cuda to a symlink to e.g. /usr/local/cuda-10.0 to enable different CUDA builds (a sketch of this switch follows after this list)
* Also used for cpu builds
* pytorch/manylinux-cuda90
* pytorch/manylinux-cuda100
* Also used for cpu builds
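The CUDA switching mentioned in the list above is essentially just re-pointing a symlink; here is a rough sketch of that idea (the real switch_cuda_version.sh may do more).
```sh
# Sketch of switching the active CUDA toolkit inside the conda-cuda image.
# Assumes toolkits live side by side under /usr/local/cuda-<version>.
CUDA_VERSION="${1:-10.0}"
rm -f /usr/local/cuda
ln -s "/usr/local/cuda-${CUDA_VERSION}" /usr/local/cuda
/usr/local/cuda/bin/nvcc --version   # sanity check that the switch took effect
```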
The Dockerfiles are available in pytorch/builder, but there is no circleci job or script to build these docker images, and they cannot be run locally (unless you have the correct local packages/paths). Only Soumith can build them right now.
### General Python
* This is still a good explanation of python installations https://caffe2.ai/docs/faq.html#why-do-i-get-import-errors-in-python-when-i-try-to-use-caffe2
# How to manually rebuild the binaries
tl;dr make a PR that looks like https://github.com/pytorch/pytorch/pull/21159
Sometimes we want to push a change to master and then rebuild all of today's binaries after that change. As of May 30, 2019 there isn't a way to manually run a workflow in the UI. You can manually re-run a workflow, but it will use the exact same git commits as the first run and will not include any changes. So we have to make a PR and then force circleci to run the binary workflow instead of the normal tests. The above PR is an example of how to do this; essentially you copy-paste the binarybuilds workflow steps into the default workflow steps. If you need to point the builder repo to a different commit then you'd need to change https://github.com/pytorch/pytorch/blob/master/.circleci/scripts/binary_checkout.sh#L42-L45 to checkout what you want.
## How to test changes to the binaries via .circleci
Writing PRs that test the binaries is annoying, since the default circleci jobs that run on PRs are not the jobs that you want to run. Likely, changes to the binaries will touch something under .circleci/ and require that .circleci/config.yml be regenerated (.circleci/config.yml controls all .circleci behavior, and is generated using `.circleci/regenerate.sh` in python 3.7). But you also need to manually hardcode the binary jobs that you want to test into the .circleci/config.yml workflow, so you should actually make at least two commits, one for your changes and one to temporarily hardcode jobs. See https://github.com/pytorch/pytorch/pull/22928 as an example of how to do this.
```sh
# Make your changes
touch .circleci/verbatim-sources/nightly-binary-build-defaults.yml
# Regenerate the yaml, has to be in python 3.7
.circleci/regenerate.sh
# Make a commit
git add .circleci *
git commit -m "My real changes"
git push origin my_branch
# Now hardcode the jobs that you want in the .circleci/config.yml workflows section
# Also eliminate ensure-consistency and should_run_job checks
# e.g. https://github.com/pytorch/pytorch/commit/2b3344bfed8772fe86e5210cc4ee915dee42b32d
# Make a commit you won't keep
git add .circleci
git commit -m "[DO NOT LAND] testing binaries for above changes"
git push origin my_branch
# Now you need to make some changes to the first commit.
git rebase -i HEAD~2 # mark the first commit as 'edit'
# Make the changes
touch .circleci/verbatim-sources/nightly-binary-build-defaults.yml
.circleci/regenerate.sh
# Amend the commit and continue the rebase
git add .circleci
git commit --amend
git rebase --continue
# Update the PR, need to force since the commits are different now
git push origin my_branch --force
```
The advantage of this flow is that you can make new changes to the base commit and regenerate the .circleci config without having to rewrite which binary jobs you want to test. The downside is that all updates will be force pushes.
## How to build a binary locally
### Linux
You can easily build Linux binaries locally using docker.
```sh
# Run the docker
# Use the correct docker image, pytorch/conda-cuda used here as an example
#
# -v path/to/foo:path/to/bar makes path/to/foo on your local machine (the
# machine that you're running the command on) accessible to the docker
# container at path/to/bar. So if you then run `touch path/to/bar/baz`
# in the docker container then you will see path/to/foo/baz on your local
# machine. You could also clone the pytorch and builder repos in the docker.
#
# If you know how, add ccache as a volume too and speed up everything
docker run \
-v your/pytorch/repo:/pytorch \
-v your/builder/repo:/builder \
-v where/you/want/packages/to/appear:/final_pkgs \
-it pytorch/conda-cuda /bin/bash
# Export whatever variables are important to you. All variables that you'd
# possibly need are in .circleci/scripts/binary_populate_env.sh
# You should probably always export at least these 3 variables
export PACKAGE_TYPE=conda
export DESIRED_PYTHON=3.7
export DESIRED_CUDA=cpu
# Call the entrypoint
# `|& tee foo.log` just copies all stdout and stderr output to foo.log
# The builds generate lots of output so you probably need this when
# building locally.
/builder/conda/build_pytorch.sh |& tee build_output.log
```
**Building CUDA binaries on docker**
You can build CUDA binaries on CPU only machines, but you can only run CUDA binaries on CUDA machines. This means that you can build a CUDA binary in a docker on your laptop if you so choose (though it's going to take a long time).
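For a CUDA build the flow is the same; only the image and the exported variables change. The sketch below mirrors the conda example above, and the `manywheel` package type and `cu100` tag format are assumptions for illustration.
```sh
# Same pattern as the conda example above, but targeting a CUDA manywheel build.
# Image choice, PACKAGE_TYPE value, and the DESIRED_CUDA tag format are assumptions.
docker run \
    -v your/pytorch/repo:/pytorch \
    -v your/builder/repo:/builder \
    -v where/you/want/packages/to/appear:/final_pkgs \
    -it pytorch/manylinux-cuda100 /bin/bash
# Inside the container:
export PACKAGE_TYPE=manywheel
export DESIRED_PYTHON=3.7
export DESIRED_CUDA=cu100        # assumed tag matching the CUDA version in the image
/builder/manywheel/build.sh |& tee build_output.log
```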
For Facebook employees, ask about beefy machines that have docker support and use those instead of your laptop; it will be 5x as fast.
### MacOS
There's no easy way to generate reproducible, hermetic MacOS environments. If you have a Mac laptop then you can try emulating the .circleci environments as much as possible, but you probably have packages in /usr/local/, possibly installed by brew, that will interfere with the build. If you're trying to repro an error on a Mac build in .circleci and you can't seem to repro locally, then my best advice is actually to iterate on .circleci :/
But if you want to try, then I'd recommend
```sh
# Create a new terminal
# Clear your LD_LIBRARY_PATH and trim as much out of your PATH as you
# know how to do
# Install a new miniconda
# First remove any other python or conda installation from your PATH
# Always install miniconda 3, even if building for Python <3
new_conda="~/my_new_conda"
conda_sh="$new_conda/install_miniconda.sh"
curl -o "$conda_sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
chmod +x "$conda_sh"
"$conda_sh" -b -p "$MINICONDA_ROOT"
rm -f "$conda_sh"
export PATH="~/my_new_conda/bin:$PATH"
# Create a clean python env
# All MacOS builds use conda to manage the python env and dependencies
# that are built with, even the pip packages
conda create -yn binary python=2.7
conda activate binary
# Export whatever variables are important to you. All variables that you'd
# possibly need are in .circleci/scripts/binary_populate_env.sh
# You should probably always export at least these 3 variables
export PACKAGE_TYPE=conda
export DESIRED_PYTHON=3.7
export DESIRED_CUDA=cpu
# Call the entrypoint you want
path/to/builder/wheel/build_wheel.sh
```
N.B. installing a brand new miniconda is important. This has to do with how conda installations work. See the “General Python” section above, but the tl;dr is that
1. You make the conda command accessible by prepending `path/to/conda_root/bin` to your PATH.
2. You make a new env and activate it, which then also gets prepended to your PATH. Now you have `path/to/conda_root/envs/new_env/bin:path/to/conda_root/bin:$PATH`
3. Now say you (or some code that you ran) call a python executable `foo`
1. if you installed `foo` in `new_env`, then `path/to/conda_root/envs/new_env/bin/foo` will get called, as expected.
2. But if you forgot to install `foo` in `new_env` but happened to previously install it in your root conda env (called base), then unix/linux will still find `path/to/conda_root/bin/foo`. This is dangerous, since `foo` can be a different version than you want; `foo` can even be for an incompatible python version!
Newer conda versions and proper python hygiene can prevent this, but just install a new miniconda to be safe.
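If you want to see this resolution order concretely, `which` makes it visible; the sketch below assumes the layout described in the list above and uses the hypothetical tool `foo` from it.
```sh
# Run in an interactive shell where conda has been initialized.
# "new_env" and "foo" are the hypothetical names used in the list above.
conda activate new_env
echo "$PATH"        # .../envs/new_env/bin appears before .../conda_root/bin
which -a python     # lists every python on PATH, in resolution order
which foo           # resolves to the first hit; if foo was never installed in
                    # new_env, this may silently pick up the base env's copy
```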
### Windows
TODO: fill in

View File

@ -57,7 +57,7 @@ WINDOWS_LIBTORCH_CONFIG_VARIANTS = [
class TopLevelNode(ConfigNode):
def __init__(self, node_name, config_tree_data, smoke):
super().__init__(None, node_name)
super(TopLevelNode, self).__init__(None, node_name)
self.config_tree_data = config_tree_data
self.props["smoke"] = smoke
@ -68,7 +68,7 @@ class TopLevelNode(ConfigNode):
class OSConfigNode(ConfigNode):
def __init__(self, parent, os_name, gpu_versions, py_tree):
super().__init__(parent, os_name)
super(OSConfigNode, self).__init__(parent, os_name)
self.py_tree = py_tree
self.props["os_name"] = os_name
@ -80,7 +80,7 @@ class OSConfigNode(ConfigNode):
class PackageFormatConfigNode(ConfigNode):
def __init__(self, parent, package_format, python_versions):
super().__init__(parent, package_format)
super(PackageFormatConfigNode, self).__init__(parent, package_format)
self.props["python_versions"] = python_versions
self.props["package_format"] = package_format
@ -97,7 +97,7 @@ class PackageFormatConfigNode(ConfigNode):
class LinuxGccConfigNode(ConfigNode):
def __init__(self, parent, gcc_config_variant):
super().__init__(parent, "GCC_CONFIG_VARIANT=" + str(gcc_config_variant))
super(LinuxGccConfigNode, self).__init__(parent, "GCC_CONFIG_VARIANT=" + str(gcc_config_variant))
self.props["gcc_config_variant"] = gcc_config_variant
@ -122,7 +122,7 @@ class LinuxGccConfigNode(ConfigNode):
class WindowsLibtorchConfigNode(ConfigNode):
def __init__(self, parent, libtorch_config_variant):
super().__init__(parent, "LIBTORCH_CONFIG_VARIANT=" + str(libtorch_config_variant))
super(WindowsLibtorchConfigNode, self).__init__(parent, "LIBTORCH_CONFIG_VARIANT=" + str(libtorch_config_variant))
self.props["libtorch_config_variant"] = libtorch_config_variant
@ -132,7 +132,7 @@ class WindowsLibtorchConfigNode(ConfigNode):
class ArchConfigNode(ConfigNode):
def __init__(self, parent, gpu):
super().__init__(parent, get_processor_arch_name(gpu))
super(ArchConfigNode, self).__init__(parent, get_processor_arch_name(gpu))
self.props["gpu"] = gpu
@ -142,7 +142,7 @@ class ArchConfigNode(ConfigNode):
class PyVersionConfigNode(ConfigNode):
def __init__(self, parent, pyver):
super().__init__(parent, pyver)
super(PyVersionConfigNode, self).__init__(parent, pyver)
self.props["pyver"] = pyver
@ -158,7 +158,7 @@ class PyVersionConfigNode(ConfigNode):
class LinkingVariantConfigNode(ConfigNode):
def __init__(self, parent, linking_variant):
super().__init__(parent, linking_variant)
super(LinkingVariantConfigNode, self).__init__(parent, linking_variant)
def get_children(self):
return [DependencyInclusionConfigNode(self, v) for v in DEPS_INCLUSION_DIMENSIONS]
@ -166,6 +166,6 @@ class LinkingVariantConfigNode(ConfigNode):
class DependencyInclusionConfigNode(ConfigNode):
def __init__(self, parent, deps_variant):
super().__init__(parent, deps_variant)
super(DependencyInclusionConfigNode, self).__init__(parent, deps_variant)
self.props["libtorch_variant"] = "-".join([self.parent.get_label(), self.get_label()])

View File

@ -12,7 +12,7 @@ def get_major_pyver(dotted_version):
class TreeConfigNode(ConfigNode):
def __init__(self, parent, node_name, subtree):
super().__init__(parent, self.modify_label(node_name))
super(TreeConfigNode, self).__init__(parent, self.modify_label(node_name))
self.subtree = subtree
self.init2(node_name)
@ -28,7 +28,7 @@ class TreeConfigNode(ConfigNode):
class TopLevelNode(TreeConfigNode):
def __init__(self, node_name, subtree):
super().__init__(None, node_name, subtree)
super(TopLevelNode, self).__init__(None, node_name, subtree)
# noinspection PyMethodMayBeStatic
def child_constructor(self):

View File

@ -1,3 +1,8 @@
from collections import OrderedDict
from cimodel.lib.miniutils import quote
from cimodel.data.simple.util.branch_filters import gen_filter_dict_exclude
class MacOsJob:
def __init__(self, os_version, is_build=False, is_test=False, extra_props=tuple()):
# extra_props is tuple type, because mutable data structures for argument defaults
@ -50,5 +55,94 @@ WORKFLOW_DATA = [
]
def get_new_workflow_jobs():
return [
OrderedDict(
{
"mac_build": OrderedDict(
{
"name": "macos-12-py3-x86-64-build",
"build-environment": "macos-12-py3-x86-64",
"xcode-version": quote("13.3.1"),
"filters": gen_filter_dict_exclude()
}
)
}
),
OrderedDict(
{
"mac_test": OrderedDict(
{
"name": "macos-12-py3-x86-64-test-1-2-default",
"build-environment": "macos-12-py3-x86-64",
"xcode-version": quote("13.3.1"),
"shard-number": quote("1"),
"num-test-shards": quote("2"),
"requires": ["macos-12-py3-x86-64-build"],
"filters": gen_filter_dict_exclude()
}
)
}
),
OrderedDict(
{
"mac_test": OrderedDict(
{
"name": "macos-12-py3-x86-64-test-2-2-default",
"build-environment": "macos-12-py3-x86-64",
"xcode-version": quote("13.3.1"),
"shard-number": quote("2"),
"num-test-shards": quote("2"),
"requires": ["macos-12-py3-x86-64-build"],
"filters": gen_filter_dict_exclude()
}
)
}
),
OrderedDict(
{
"mac_test": OrderedDict(
{
"name": "macos-12-py3-x86-64-test-1-1-functorch",
"build-environment": "macos-12-py3-x86-64",
"xcode-version": quote("13.3.1"),
"shard-number": quote("1"),
"num-test-shards": quote("1"),
"test-config": "functorch",
"requires": ["macos-12-py3-x86-64-build"],
"filters": gen_filter_dict_exclude()
}
)
}
),
OrderedDict(
{
"mac_build": OrderedDict(
{
"name": "macos-12-py3-x86-64-lite-interpreter-build-test",
"build-environment": "macos-12-py3-lite-interpreter-x86-64",
"xcode-version": quote("13.3.1"),
"build-generates-artifacts": "false",
"filters": gen_filter_dict_exclude()
}
)
}
),
OrderedDict(
{
"mac_build": OrderedDict(
{
"name": "macos-12-py3-arm64-build",
"build-environment": "macos-12-py3-arm64",
"xcode-version": quote("13.3.1"),
"python-version": quote("3.9.12"),
"filters": gen_filter_dict_exclude()
}
)
}
),
]
def get_workflow_jobs():
return [item.gen_tree() for item in WORKFLOW_DATA]

View File

@ -0,0 +1,22 @@
from typing import OrderedDict
from cimodel.data.simple.util.branch_filters import gen_filter_dict_exclude
def get_workflow_job():
return [
OrderedDict(
{
"upload_test_stats": OrderedDict(
{
"name": "upload test status",
"requires": [
"macos-12-py3-x86-64-test-1-2-default",
"macos-12-py3-x86-64-test-2-2-default",
"macos-12-py3-x86-64-test-1-1-functorch",
],
"filters": gen_filter_dict_exclude()
}
)
}
),
]

164
.circleci/config.yml generated
View File

@ -47,7 +47,7 @@ commands:
- run:
name: "Calculate docker image hash"
command: |
DOCKER_TAG=$(git rev-parse HEAD:.ci/docker)
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "DOCKER_TAG=${DOCKER_TAG}" >> "${BASH_ENV}"
designate_upload_channel:
@ -526,8 +526,8 @@ jobs:
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}
set -x
chmod a+x .ci/pytorch/macos-build.sh
unbuffer .ci/pytorch/macos-build.sh 2>&1 | ts
chmod a+x .jenkins/pytorch/macos-build.sh
unbuffer .jenkins/pytorch/macos-build.sh 2>&1 | ts
- persist_to_workspace:
root: /Users/distiller/workspace/
@ -562,8 +562,8 @@ jobs:
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}
set -x
chmod a+x .ci/pytorch/macos-build.sh
unbuffer .ci/pytorch/macos-build.sh 2>&1 | ts
chmod a+x .jenkins/pytorch/macos-build.sh
unbuffer .jenkins/pytorch/macos-build.sh 2>&1 | ts
- persist_to_workspace:
root: /Users/distiller/workspace/
@ -644,7 +644,7 @@ jobs:
brew link --force libomp
echo "export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"}" >> "${BASH_ENV}"
.ci/pytorch/macos-build.sh
.jenkins/pytorch/macos-build.sh
- when:
condition: << parameters.build-generates-artifacts >>
@ -727,7 +727,7 @@ jobs:
export COMMIT_MESSAGES="${COMMIT_MESSAGES//[\'\"]}"
python3 -mpip install dist/*.whl
.ci/pytorch/macos-test.sh
.jenkins/pytorch/macos-test.sh
- run:
name: Copy files for uploading test stats
command: |
@ -757,7 +757,7 @@ jobs:
exit 0
fi
cp -r ~/workspace/test-reports/* ~/project
pip3 install requests==2.26 rockset==1.0.3 boto3==1.19.12
pip3 install requests==2.26 rockset==0.8.3 boto3==1.19.12 six==1.16.0
export AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_FOR_OSSCI_ARTIFACT_UPLOAD}
export AWS_SECRET_ACCESS_KEY=${AWS_SECRET_KEY_FOR_OSSCI_ARTIFACT_UPLOAD}
# i dont know how to get the run attempt number for reruns so default to 1
@ -779,8 +779,23 @@ jobs:
set -e
export JOB_BASE_NAME=$CIRCLE_JOB
chmod a+x .ci/pytorch/macos-test.sh
unbuffer .ci/pytorch/macos-test.sh 2>&1 | ts
chmod a+x .jenkins/pytorch/macos-test.sh
unbuffer .jenkins/pytorch/macos-test.sh 2>&1 | ts
- run:
name: Report results
no_output_timeout: "5m"
command: |
set -ex
source /Users/distiller/workspace/miniconda3/bin/activate
python3 -m pip install boto3==1.19.12
export JOB_BASE_NAME=$CIRCLE_JOB
# Using the same IAM user to write stats to our OSS bucket
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}
python -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
when: always
- store_test_results:
path: test/test-reports
@ -801,8 +816,8 @@ jobs:
set -e
export BUILD_LITE_INTERPRETER=1
export JOB_BASE_NAME=$CIRCLE_JOB
chmod a+x ${HOME}/project/.ci/pytorch/macos-lite-interpreter-build-test.sh
unbuffer ${HOME}/project/.ci/pytorch/macos-lite-interpreter-build-test.sh 2>&1 | ts
chmod a+x ${HOME}/project/.jenkins/pytorch/macos-lite-interpreter-build-test.sh
unbuffer ${HOME}/project/.jenkins/pytorch/macos-lite-interpreter-build-test.sh 2>&1 | ts
- store_test_results:
path: test/test-reports
@ -1026,7 +1041,7 @@ jobs:
export TCLLIBPATH="/usr/local/lib"
# Install conda
curl --retry 3 -o ~/conda.sh https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-MacOSX-x86_64.sh
curl --retry 3 -o ~/conda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
chmod +x ~/conda.sh
/bin/bash ~/conda.sh -b -p ~/anaconda
export PATH="~/anaconda/bin:${PATH}"
@ -1037,7 +1052,7 @@ jobs:
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
retry conda install numpy ninja pyyaml mkl mkl-include setuptools cmake requests typing-extensions --yes
retry conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi requests typing_extensions --yes
# sync submodules
cd ${PROJ_ROOT}
@ -1101,7 +1116,7 @@ jobs:
cd ${PROJ_ROOT}/ios/TestApp/benchmark
mkdir -p ../models
if [ ${USE_COREML_DELEGATE} == 1 ]; then
pip install coremltools==5.0b5 protobuf==3.20.1
pip install coremltools==5.0b5 protobuf==3.20.1 six==1.16.0
python coreml_backend.py
else
cd "${PROJ_ROOT}"
@ -1151,7 +1166,7 @@ jobs:
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && .ci/pytorch/build.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/build.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
@ -1197,9 +1212,9 @@ jobs:
trap "retrieve_test_reports" ERR
if [[ ${BUILD_ENVIRONMENT} == *"multigpu"* ]]; then
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && .ci/pytorch/multigpu-test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/multigpu-test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
else
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && .ci/pytorch/test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
fi
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
@ -1322,12 +1337,12 @@ jobs:
exit 0
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable on trees that don't have `.ci/docker` at its merge base, i.e. nightly
if ! git rev-parse "$(git merge-base HEAD << pipeline.git.base_revision >>):.ci/docker"; then
echo "Directory '.ci/docker' not found in tree << pipeline.git.base_revision >>, you should probably rebase onto a more recent commit"
# this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly
if ! git rev-parse "$(git merge-base HEAD << pipeline.git.base_revision >>):.circleci/docker"; then
echo "Directory '.circleci/docker' not found in tree << pipeline.git.base_revision >>, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$(git merge-base HEAD << pipeline.git.base_revision >>):ci/docker")
PREVIOUS_DOCKER_TAG=$(git rev-parse "$(git merge-base HEAD << pipeline.git.base_revision >>):.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
@ -1342,7 +1357,7 @@ jobs:
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_DOCKER_BUILDER_V1}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
set -x
cd .ci/docker && ./build_docker.sh
cd .circleci/docker && ./build_docker.sh
##############################################################################
# Workflows
##############################################################################
@ -1432,4 +1447,107 @@ workflows:
branches:
only:
- postnightly
- mac_build:
name: macos-12-py3-x86-64-build
build-environment: macos-12-py3-x86-64
xcode-version: "13.3.1"
filters:
branches:
ignore:
- nightly
- postnightly
- mac_test:
name: macos-12-py3-x86-64-test-1-2-default
build-environment: macos-12-py3-x86-64
xcode-version: "13.3.1"
shard-number: "1"
num-test-shards: "2"
requires:
- macos-12-py3-x86-64-build
filters:
branches:
ignore:
- nightly
- postnightly
- mac_test:
name: macos-12-py3-x86-64-test-2-2-default
build-environment: macos-12-py3-x86-64
xcode-version: "13.3.1"
shard-number: "2"
num-test-shards: "2"
requires:
- macos-12-py3-x86-64-build
filters:
branches:
ignore:
- nightly
- postnightly
- mac_test:
name: macos-12-py3-x86-64-test-1-1-functorch
build-environment: macos-12-py3-x86-64
xcode-version: "13.3.1"
shard-number: "1"
num-test-shards: "1"
test-config: functorch
requires:
- macos-12-py3-x86-64-build
filters:
branches:
ignore:
- nightly
- postnightly
- mac_build:
name: macos-12-py3-x86-64-lite-interpreter-build-test
build-environment: macos-12-py3-lite-interpreter-x86-64
xcode-version: "13.3.1"
build-generates-artifacts: false
filters:
branches:
ignore:
- nightly
- postnightly
- mac_build:
name: macos-12-py3-arm64-build
build-environment: macos-12-py3-arm64
xcode-version: "13.3.1"
python-version: "3.9.12"
filters:
branches:
ignore:
- nightly
- postnightly
- upload_test_stats:
name: upload test status
requires:
- macos-12-py3-x86-64-test-1-2-default
- macos-12-py3-x86-64-test-2-2-default
- macos-12-py3-x86-64-test-1-1-functorch
filters:
branches:
ignore:
- nightly
- postnightly
- pytorch_ios_build:
build_environment: ios-12-5-1-x86-64
filters:
branches:
ignore:
- nightly
- postnightly
ios_arch: x86_64
ios_platform: SIMULATOR
lite_interpreter: "1"
name: ios-12-5-1-x86-64
- pytorch_ios_build:
build_environment: ios-12-5-1-x86-64-coreml
filters:
branches:
ignore:
- nightly
- postnightly
ios_arch: x86_64
ios_platform: SIMULATOR
lite_interpreter: "1"
name: ios-12-5-1-x86-64-coreml
use_coreml: "1"
when: << pipeline.parameters.run_build >>

View File

@ -33,7 +33,7 @@ function extract_all_from_image_name() {
if [ "x${name}" = xpy ]; then
vername=ANACONDA_PYTHON_VERSION
fi
# skip non-conforming fields such as "pytorch", "linux" or "bionic" without version string
# skip non-conforming fields such as "pytorch", "linux" or "xenial" without version string
if [ -n "${name}" ]; then
extract_version_from_image_name "${name}" "${vername}"
fi
@ -46,7 +46,11 @@ if [[ "$image" == *xla* ]]; then
exit 0
fi
if [[ "$image" == *-bionic* ]]; then
if [[ "$image" == *-xenial* ]]; then
UBUNTU_VERSION=16.04
elif [[ "$image" == *-artful* ]]; then
UBUNTU_VERSION=17.10
elif [[ "$image" == *-bionic* ]]; then
UBUNTU_VERSION=18.04
elif [[ "$image" == *-focal* ]]; then
UBUNTU_VERSION=20.04
@ -73,21 +77,69 @@ if [[ "$image" == *cuda* && "$UBUNTU_VERSION" != "22.04" ]]; then
DOCKERFILE="${OS}-cuda/Dockerfile"
elif [[ "$image" == *rocm* ]]; then
DOCKERFILE="${OS}-rocm/Dockerfile"
elif [[ "$image" == *linter* ]]; then
# Use a separate Dockerfile for linter to keep a small image size
DOCKERFILE="linter/Dockerfile"
fi
# CMake 3.18 is needed to support CUDA17 language variant
CMAKE_VERSION=3.18.5
if [[ "$image" == *xenial* ]] || [[ "$image" == *bionic* ]]; then
CMAKE_VERSION=3.13.5
fi
TRAVIS_DL_URL_PREFIX="https://s3.amazonaws.com/travis-python-archives/binaries/ubuntu/14.04/x86_64"
_UCX_COMMIT=31e74cac7bee0ef66bef2af72e7d86d9c282e5ab
_UCC_COMMIT=1c7a7127186e7836f73aafbd7697bbc274a77eee
_UCC_COMMIT=12944da33f911daf505d9bbc51411233d0ed85e1
# It's annoying to rename jobs every time you want to rewrite a
# configuration, so we hardcode everything here rather than do it
# from scratch
case "$image" in
pytorch-linux-xenial-py3.8)
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=7
# Do not install PROTOBUF, DB, and VISION as a test
;;
pytorch-linux-xenial-py3.7-gcc7.2)
ANACONDA_PYTHON_VERSION=3.7
GCC_VERSION=7
# Do not install PROTOBUF, DB, and VISION as a test
;;
pytorch-linux-xenial-py3.7-gcc7)
ANACONDA_PYTHON_VERSION=3.7
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7)
CUDA_VERSION=10.2
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.7
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
;;
pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7)
CUDA_VERSION=11.3.0 # Deviating from major.minor to conform to nvidia's Docker image names
CUDNN_VERSION=8
TENSORRT_VERSION=8.0.1.6
ANACONDA_PYTHON_VERSION=3.7
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
;;
pytorch-linux-bionic-cuda11.3-cudnn8-py3-clang9)
CUDA_VERSION=11.3.0 # Deviating from major.minor to conform to nvidia's Docker image names
CUDNN_VERSION=8
TENSORRT_VERSION=8.0.1.6
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
;;
pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7)
CUDA_VERSION=11.6.2
CUDNN_VERSION=8
@ -99,7 +151,6 @@ case "$image" in
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
;;
pytorch-linux-bionic-cuda11.7-cudnn8-py3-gcc7)
CUDA_VERSION=11.7.0
@ -112,40 +163,45 @@ case "$image" in
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
;;
pytorch-linux-bionic-cuda11.8-cudnn8-py3-gcc7)
CUDA_VERSION=11.8.0
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=7
pytorch-linux-xenial-py3-clang5-asan)
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=5.0
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
;;
pytorch-linux-focal-py3-clang7-asan)
ANACONDA_PYTHON_VERSION=3.9
pytorch-linux-xenial-py3-clang7-asan)
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-focal-py3-clang7-asan)
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3-clang7-onnx)
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
CONDA_CMAKE=yes
;;
pytorch-linux-focal-py3-clang10-onnx)
ANACONDA_PYTHON_VERSION=3.8
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=10
PROTOBUF=yes
DB=yes
VISION=yes
CONDA_CMAKE=yes
;;
pytorch-linux-focal-py3-clang7-android-ndk-r19c)
pytorch-linux-xenial-py3-clang5-android-ndk-r19c)
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=7
CLANG_VERSION=5.0
LLVMDEV=yes
PROTOBUF=yes
ANDROID=yes
@ -153,25 +209,21 @@ case "$image" in
GRADLE_VERSION=6.8.3
NINJA_VERSION=1.9.0
;;
pytorch-linux-bionic-py3.8-clang9)
ANACONDA_PYTHON_VERSION=3.8
CLANG_VERSION=9
pytorch-linux-xenial-py3.7-clang7)
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
VULKAN_SDK_VERSION=1.2.162.1
SWIFTSHADER=yes
CONDA_CMAKE=yes
;;
pytorch-linux-bionic-py3.11-clang9)
ANACONDA_PYTHON_VERSION=3.11
pytorch-linux-bionic-py3.7-clang9)
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
VULKAN_SDK_VERSION=1.2.162.1
SWIFTSHADER=yes
CONDA_CMAKE=yes
;;
pytorch-linux-bionic-py3.8-gcc9)
ANACONDA_PYTHON_VERSION=3.8
@ -179,36 +231,49 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
CONDA_CMAKE=yes
;;
pytorch-linux-focal-rocm-n-1-py3)
ANACONDA_PYTHON_VERSION=3.8
pytorch-linux-bionic-cuda10.2-cudnn7-py3.7-clang9)
CUDA_VERSION=10.2
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7)
CUDA_VERSION=10.2
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.9
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-focal-rocm5.1-py3.7)
ANACONDA_PYTHON_VERSION=3.7
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=5.3
NINJA_VERSION=1.9.0
CONDA_CMAKE=yes
ROCM_VERSION=5.1.1
;;
pytorch-linux-focal-rocm-n-py3)
ANACONDA_PYTHON_VERSION=3.8
pytorch-linux-focal-rocm5.2-py3.7)
ANACONDA_PYTHON_VERSION=3.7
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=5.4.2
NINJA_VERSION=1.9.0
CONDA_CMAKE=yes
ROCM_VERSION=5.2
;;
pytorch-linux-focal-py3.8-gcc7)
ANACONDA_PYTHON_VERSION=3.8
pytorch-linux-focal-py3.7-gcc7)
ANACONDA_PYTHON_VERSION=3.7
CMAKE_VERSION=3.16.9 # Required for precompiled header support
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
CONDA_CMAKE=yes
;;
pytorch-linux-jammy-cuda11.6-cudnn8-py3.8-clang12)
ANACONDA_PYTHON_VERSION=3.8
@ -228,22 +293,6 @@ case "$image" in
DB=yes
VISION=yes
;;
pytorch-linux-jammy-cuda11.8-cudnn8-py3.8-clang12)
ANACONDA_PYTHON_VERSION=3.8
CUDA_VERSION=11.8
CUDNN_VERSION=8
CLANG_VERSION=12
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-focal-linter)
# TODO: Use 3.9 here because of this issue https://github.com/python/mypy/issues/13627.
# We will need to update mypy version eventually, but that's for another day. The task
# would be to upgrade mypy to 1.0.0 with Python 3.11
ANACONDA_PYTHON_VERSION=3.9
CONDA_CMAKE=yes
;;
*)
# Catch-all for builds that are not hardcoded.
PROTOBUF=yes
@ -259,10 +308,6 @@ case "$image" in
fi
if [[ "$image" == *rocm* ]]; then
extract_version_from_image_name rocm ROCM_VERSION
NINJA_VERSION=1.9.0
fi
if [[ "$image" == *centos7* ]]; then
NINJA_VERSION=1.10.2
fi
if [[ "$image" == *gcc* ]]; then
extract_version_from_image_name gcc GCC_VERSION
@ -282,6 +327,12 @@ case "$image" in
;;
esac
# Set Jenkins UID and GID if running Jenkins
if [ -n "${JENKINS:-}" ]; then
JENKINS_UID=$(id -u jenkins)
JENKINS_GID=$(id -g jenkins)
fi
tmp_tag=$(basename "$(mktemp -u)" | tr '[:upper:]' '[:lower:]')
#when using cudnn version 8 install it separately from cuda
@ -298,12 +349,17 @@ fi
docker build \
--no-cache \
--progress=plain \
--build-arg "TRAVIS_DL_URL_PREFIX=${TRAVIS_DL_URL_PREFIX}" \
--build-arg "BUILD_ENVIRONMENT=${image}" \
--build-arg "PROTOBUF=${PROTOBUF:-}" \
--build-arg "THRIFT=${THRIFT:-}" \
--build-arg "LLVMDEV=${LLVMDEV:-}" \
--build-arg "DB=${DB:-}" \
--build-arg "VISION=${VISION:-}" \
--build-arg "EC2=${EC2:-}" \
--build-arg "JENKINS=${JENKINS:-}" \
--build-arg "JENKINS_UID=${JENKINS_UID:-}" \
--build-arg "JENKINS_GID=${JENKINS_GID:-}" \
--build-arg "UBUNTU_VERSION=${UBUNTU_VERSION}" \
--build-arg "CENTOS_VERSION=${CENTOS_VERSION}" \
--build-arg "DEVTOOLSET_VERSION=${DEVTOOLSET_VERSION}" \
@ -327,7 +383,6 @@ docker build \
--build-arg "IMAGE_NAME=${IMAGE_NAME}" \
--build-arg "UCX_COMMIT=${UCX_COMMIT}" \
--build-arg "UCC_COMMIT=${UCC_COMMIT}" \
--build-arg "CONDA_CMAKE=${CONDA_CMAKE}" \
-f $(dirname ${DOCKERFILE})/Dockerfile \
-t "$tmp_tag" \
"$@" \

View File

@ -18,6 +18,7 @@ tag="${DOCKER_TAG}"
registry="308535385114.dkr.ecr.us-east-1.amazonaws.com"
image="${registry}/pytorch/${IMAGE_NAME}"
ghcr_image="ghcr.io/pytorch/ci-image"
login() {
aws ecr get-authorization-token --region us-east-1 --output text --query 'authorizationData[].authorizationToken' |
@ -35,6 +36,9 @@ if [[ -z "${GITHUB_ACTIONS}" ]]; then
trap "docker logout ${registry}" EXIT
fi
# export EC2=1
# export JENKINS=1
# Try to pull the previous image (perhaps we can reuse some layers)
# if [ -n "${last_tag}" ]; then
# docker pull "${image}:${last_tag}" || true
@ -51,6 +55,13 @@ if [ "${DOCKER_SKIP_PUSH:-true}" = "false" ]; then
if ! docker manifest inspect "${image}:${tag}" >/dev/null 2>/dev/null; then
docker push "${image}:${tag}"
fi
if [ "${PUSH_GHCR_IMAGE:-}" = "true" ]; then
# Push docker image to the ghcr.io
echo $GHCR_PAT | docker login ghcr.io -u pytorch --password-stdin
docker tag "${image}:${tag}" "${ghcr_image}:${IMAGE_NAME}-${tag}"
docker push "${ghcr_image}:${IMAGE_NAME}-${tag}"
fi
fi
if [ -z "${DOCKER_SKIP_S3_UPLOAD:-}" ]; then

View File

@ -11,15 +11,14 @@ ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
# Install required packages to build Caffe2
# Install common dependencies (so that this step can be cached separately)
ARG EC2
COPY ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
# Update CentOS git version
RUN yum -y remove git
RUN yum -y remove git-*
RUN yum -y install https://packages.endpoint.com/rhel/7/os/x86_64/endpoint-repo-1.9-1.x86_64.rpm || \
(yum -y install https://packages.endpointdev.com/rhel/7/os/x86_64/endpoint-repo-1.9-1.x86_64.rpm && \
sed -i "s/packages.endpoint/packages.endpointdev/" /etc/yum.repos.d/endpoint.repo)
RUN yum -y install https://packages.endpoint.com/rhel/7/os/x86_64/endpoint-repo-1.9-1.x86_64.rpm
RUN yum install -y git
# Install devtoolset
@ -39,14 +38,12 @@ COPY ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install conda and other packages (e.g., numpy, pytest)
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ARG CONDA_CMAKE
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
COPY requirements-ci.txt /opt/conda/requirements-ci.txt
COPY ./common/install_conda.sh install_conda.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_conda.sh && rm install_conda.sh common_utils.sh /opt/conda/requirements-ci.txt
RUN bash ./install_conda.sh && rm install_conda.sh
RUN rm /opt/conda/requirements-ci.txt
# (optional) Install protobuf for ONNX
ARG PROTOBUF

View File

@ -68,10 +68,7 @@ install_ubuntu() {
sudo \
vim \
jq \
libtool \
vim \
unzip \
gdb
libtool
# Should resolve issues related to various apt package repository cert issues
# see: https://github.com/pytorch/pytorch/issues/65931
@ -129,9 +126,7 @@ install_centos() {
opencv-devel \
sudo \
wget \
vim \
unzip \
gdb
vim
# Cleanup
yum clean all
@ -157,7 +152,7 @@ esac
# Install Valgrind separately since the apt-get version is too old.
mkdir valgrind_build && cd valgrind_build
VALGRIND_VERSION=3.20.0
VALGRIND_VERSION=3.16.1
wget https://ossci-linux.s3.amazonaws.com/valgrind-${VALGRIND_VERSION}.tar.bz2
tar -xjf valgrind-${VALGRIND_VERSION}.tar.bz2
cd valgrind-${VALGRIND_VERSION}

View File

@ -5,19 +5,7 @@ set -ex
[ -n "$CMAKE_VERSION" ]
# Remove system cmake install so it won't get used instead
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
apt-get remove cmake -y
;;
centos)
yum remove cmake -y
;;
*)
echo "Unable to determine OS..."
exit 1
;;
esac
apt-get remove cmake -y
# Turn 3.6.3 into v3.6
path=$(echo "${CMAKE_VERSION}" | sed -e 's/\([0-9].[0-9]\+\).*/v\1/')

View File

@ -24,12 +24,26 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
mkdir -p /opt/conda
chown jenkins:jenkins /opt/conda
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
# Work around bug where devtoolset replaces sudo and breaks it.
if [ -n "$DEVTOOLSET_VERSION" ]; then
SUDO=/bin/sudo
else
SUDO=sudo
fi
as_jenkins() {
# NB: unsetting the environment variables works around a conda bug
# https://github.com/conda/conda/issues/6576
# NB: Pass on PATH and LD_LIBRARY_PATH to sudo invocation
# NB: This must be run from a directory that jenkins has access to,
# works around https://github.com/conda/conda-package-handling/pull/34
$SUDO -H -u jenkins env -u SUDO_UID -u SUDO_GID -u SUDO_COMMAND -u SUDO_USER env "PATH=$PATH" "LD_LIBRARY_PATH=$LD_LIBRARY_PATH" $*
}
pushd /tmp
wget -q "${BASE_URL}/${CONDA_FILE}"
# NB: Manually invoke bash per https://github.com/conda/conda/issues/10431
as_jenkins bash "${CONDA_FILE}" -b -f -p "/opt/conda"
chmod +x "${CONDA_FILE}"
as_jenkins ./"${CONDA_FILE}" -b -f -p "/opt/conda"
popd
# NB: Don't do this, rely on the rpath to get it right
@ -47,15 +61,24 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
# as_jenkins conda update -y -n base conda
# Install correct Python version
as_jenkins conda create -n py_$ANACONDA_PYTHON_VERSION -y python="$ANACONDA_PYTHON_VERSION"
as_jenkins conda install -y python="$ANACONDA_PYTHON_VERSION"
conda_install() {
# Ensure that the install command don't upgrade/downgrade Python
# This should be called as
# conda_install pkg1 pkg2 ... [-c channel]
as_jenkins conda install -q -y python="$ANACONDA_PYTHON_VERSION" $*
}
pip_install() {
as_jenkins pip install --progress-bar off $*
}
# Install PyTorch conda deps, as per https://github.com/pytorch/pytorch README
CONDA_COMMON_DEPS="astunparse pyyaml mkl=2021.4.0 mkl-include=2021.4.0 setuptools"
if [ "$ANACONDA_PYTHON_VERSION" = "3.11" ]; then
# Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
# TODO: Stop using `-c malfet`
conda_install numpy=1.23.5 ${CONDA_COMMON_DEPS} llvmdev=8.0.0 -c malfet
elif [ "$ANACONDA_PYTHON_VERSION" = "3.10" ]; then
# DO NOT install cmake here as it would install a version newer than 3.13, but
# we want to pin to version 3.13.
CONDA_COMMON_DEPS="astunparse pyyaml mkl=2022.0.1 mkl-include=2022.0.1 setuptools cffi future six"
if [ "$ANACONDA_PYTHON_VERSION" = "3.10" ]; then
# Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
conda_install numpy=1.21.2 ${CONDA_COMMON_DEPS} llvmdev=8.0.0
elif [ "$ANACONDA_PYTHON_VERSION" = "3.9" ]; then
@ -65,16 +88,8 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
# Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
conda_install numpy=1.18.5 ${CONDA_COMMON_DEPS} llvmdev=8.0.0
else
# Install `typing-extensions` for 3.7
conda_install numpy=1.18.5 ${CONDA_COMMON_DEPS} typing-extensions
fi
# Use conda cmake in some cases. Conda cmake will be newer than our supported
# min version (3.5 for xenial and 3.10 for bionic), so we only do it in those
# following builds that we know should use conda. Specifically, Ubuntu bionic
# and focal cannot find conda mkl with stock cmake, so we need a cmake from conda
if [ -n "${CONDA_CMAKE}" ]; then
conda_install cmake
# Install `typing_extensions` for 3.7
conda_install numpy=1.18.5 ${CONDA_COMMON_DEPS} typing_extensions
fi
# Magma package names are concatenation of CUDA major and minor ignoring revision
@ -83,6 +98,9 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
conda_install magma-cuda$(TMP=${CUDA_VERSION/./};echo ${TMP%.*[0-9]}) -c pytorch
fi
# TODO: This isn't working atm
conda_install nnpack -c killeent
# Install some other packages, including those needed for Python test reporting
pip_install -r /opt/conda/requirements-ci.txt

View File

@ -6,12 +6,9 @@ if [[ ${CUDNN_VERSION} == 8 ]]; then
CUDNN_NAME="cudnn-linux-x86_64-8.3.2.44_cuda11.5-archive"
if [[ ${CUDA_VERSION:0:4} == "11.7" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-8.5.0.96_cuda11-archive"
curl --retry 3 -OLs https://ossci-linux.s3.amazonaws.com/${CUDNN_NAME}.tar.xz
elif [[ ${CUDA_VERSION:0:4} == "11.8" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-8.7.0.84_cuda11-archive"
curl --retry 3 -OLs https://developer.download.nvidia.com/compute/redist/cudnn/v8.7.0/local_installers/11.8/${CUDNN_NAME}.tar.xz
curl -OLs https://ossci-linux.s3.amazonaws.com/${CUDNN_NAME}.tar.xz
else
curl --retry 3 -OLs https://developer.download.nvidia.com/compute/redist/cudnn/v8.3.2/local_installers/11.5/${CUDNN_NAME}.tar.xz
curl -OLs https://developer.download.nvidia.com/compute/redist/cudnn/v8.3.2/local_installers/11.5/${CUDNN_NAME}.tar.xz
fi
tar xf ${CUDNN_NAME}.tar.xz

View File

@ -7,10 +7,10 @@ if [ -n "$KATEX" ]; then
# Ignore error if gpg-agent doesn't exist (for Ubuntu 16.04)
apt-get install -y gpg-agent || :
curl --retry 3 -sL https://deb.nodesource.com/setup_12.x | sudo -E bash -
curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash -
sudo apt-get install -y nodejs
curl --retry 3 -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add -
curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add -
echo "deb https://dl.yarnpkg.com/debian/ stable main" | sudo tee /etc/apt/sources.list.d/yarn.list
apt-get update

View File

@ -12,7 +12,7 @@ install_protobuf_317() {
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/protocolbuffers/protobuf/releases/download/v3.17.3/protobuf-all-3.17.3.tar.gz" --retry 3
curl -LO "https://github.com/protocolbuffers/protobuf/releases/download/v3.17.3/protobuf-all-3.17.3.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-all-3.17.3.tar.gz
# -j6 to balance memory usage and speed.
# naked `-j` seems to use too much memory.

View File

@ -29,12 +29,7 @@ install_ubuntu() {
if [[ $(ver $ROCM_VERSION) -ge $(ver 4.5) ]]; then
# Add amdgpu repository
UBUNTU_VERSION_NAME=`cat /etc/os-release | grep UBUNTU_CODENAME | awk -F= '{print $2}'`
local amdgpu_baseurl
if [[ $(ver $ROCM_VERSION) -ge $(ver 5.3) ]]; then
amdgpu_baseurl="https://repo.radeon.com/amdgpu/${ROCM_VERSION}/ubuntu"
else
amdgpu_baseurl="https://repo.radeon.com/amdgpu/${AMDGPU_VERSIONS[$ROCM_VERSION]}/ubuntu"
fi
local amdgpu_baseurl="https://repo.radeon.com/amdgpu/${AMDGPU_VERSIONS[$ROCM_VERSION]}/ubuntu"
echo "deb [arch=amd64] ${amdgpu_baseurl} ${UBUNTU_VERSION_NAME} main" > /etc/apt/sources.list.d/amdgpu.list
fi
@ -43,10 +38,6 @@ install_ubuntu() {
ROCM_REPO="xenial"
fi
if [[ $(ver $ROCM_VERSION) -ge $(ver 5.3) ]]; then
ROCM_REPO="${UBUNTU_VERSION_NAME}"
fi
# Add rocm repository
wget -qO - http://repo.radeon.com/rocm/rocm.gpg.key | apt-key add -
local rocm_baseurl="http://repo.radeon.com/rocm/apt/${ROCM_VERSION}"
@ -87,16 +78,7 @@ install_centos() {
if [[ $(ver $ROCM_VERSION) -ge $(ver 4.5) ]]; then
# Add amdgpu repository
local amdgpu_baseurl
if [[ $OS_VERSION == 9 ]]; then
amdgpu_baseurl="https://repo.radeon.com/amdgpu/${AMDGPU_VERSIONS[$ROCM_VERSION]}/rhel/9.0/main/x86_64"
else
if [[ $(ver $ROCM_VERSION) -ge $(ver 5.3) ]]; then
amdgpu_baseurl="https://repo.radeon.com/amdgpu/${ROCM_VERSION}/rhel/7.9/main/x86_64"
else
amdgpu_baseurl="https://repo.radeon.com/amdgpu/${AMDGPU_VERSIONS[$ROCM_VERSION]}/rhel/7.9/main/x86_64"
fi
fi
local amdgpu_baseurl="https://repo.radeon.com/amdgpu/${AMDGPU_VERSIONS[$ROCM_VERSION]}/rhel/7.9/main/x86_64"
echo "[AMDGPU]" > /etc/yum.repos.d/amdgpu.repo
echo "name=AMDGPU" >> /etc/yum.repos.d/amdgpu.repo
echo "baseurl=${amdgpu_baseurl}" >> /etc/yum.repos.d/amdgpu.repo

View File

@ -23,7 +23,7 @@ done
# hipcc with openmp flag may cause isnan() on __device__ not to be found; depending on context, compiler may attempt to match with host definition
sed -i 's/^FOPENMP/#FOPENMP/g' make.inc
make -f make.gen.hipMAGMA -j $(nproc)
LANG=C.UTF-8 make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda/envs/py_$ANACONDA_PYTHON_VERSION
make testing/testing_dgemm -j $(nproc) MKLROOT=/opt/conda/envs/py_$ANACONDA_PYTHON_VERSION
LANG=C.UTF-8 make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda
make testing/testing_dgemm -j $(nproc) MKLROOT=/opt/conda
popd
mv magma /opt/rocm

View File

@ -22,12 +22,5 @@ chown jenkins:jenkins /usr/local
# TODO: Maybe we shouldn't
echo 'jenkins ALL=(ALL) NOPASSWD:ALL' > /etc/sudoers.d/jenkins
# Work around bug where devtoolset replaces sudo and breaks it.
if [ -n "$DEVTOOLSET_VERSION" ]; then
SUDO=/bin/sudo
else
SUDO=sudo
fi
# Test that sudo works
$SUDO -u jenkins $SUDO -v
sudo -u jenkins sudo -v

View File

@ -36,6 +36,11 @@ flatbuffers==2.0
#Pinned versions: 2.0
#test that import:
#future #this breaks linux-bionic-rocm4.5-py3.7
#Description: compatibility layer between python 2 and python 3
#Pinned versions:
#test that import:
hypothesis==5.35.1
# Pin hypothesis to avoid flakiness: https://github.com/pytorch/pytorch/issues/31136
#Description: advanced library for generating parametrized tests
@ -47,7 +52,7 @@ junitparser==2.1.1
#Pinned versions: 2.1.1
#test that import:
librosa>=0.6.2 ; python_version < "3.11"
librosa>=0.6.2
#Description: A python package for music and audio analysis
#Pinned versions: >=0.6.2
#test that import: test_spectral_ops.py
@ -154,13 +159,8 @@ pytest-shard
#Pinned versions:
#test that import:
pytest-flakefinder==1.1.0
#Description: plugin for rerunning tests a fixed number of times in pytest
#Pinned versions: 1.1.0
#test that import:
pytest-rerunfailures
#Description: plugin for rerunning failure tests in pytest
#Description: plugin for rerunning tests in pytest
#Pinned versions:
#test that import:
@ -174,9 +174,9 @@ pytest-rerunfailures
#Pinned versions:
#test that import:
xdoctest==1.1.0
xdoctest==1.0.2
#Description: runs doctests in pytest
#Pinned versions: 1.1.0
#Pinned versions: 1.0.2
#test that import:
pygments==2.12.0
@ -211,7 +211,6 @@ scikit-image
scipy==1.6.3 ; python_version < "3.10"
scipy==1.8.1 ; python_version == "3.10"
scipy==1.9.3 ; python_version == "3.11"
# Pin SciPy because of failing distribution tests (see #60347)
#Description: scientific python
#Pinned versions: 1.6.3
@ -243,18 +242,3 @@ unittest-xml-reporting<=3.2.0,>=2.0.0
#Description: saves unit test results to xml
#Pinned versions:
#test that import:
lintrunner==0.9.2
#Description: all about linters
#Pinned versions: 0.9.2
#test that import:
rockset==1.0.3
#Description: queries Rockset
#Pinned versions: 1.0.3
#test that import:
ghstack==0.7.1
#Description: ghstack tool
#Pinned versions: 0.7.1
#test that import:

View File

@ -10,6 +10,7 @@ ARG CUDA_VERSION
ENV DEBIAN_FRONTEND noninteractive
# Install common dependencies (so that this step can be cached separately)
ARG EC2
COPY ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
@ -23,14 +24,12 @@ COPY ./common/install_docs_reqs.sh install_docs_reqs.sh
RUN bash ./install_docs_reqs.sh && rm install_docs_reqs.sh
# Install conda and other packages (e.g., numpy, pytest)
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
ARG CONDA_CMAKE
COPY requirements-ci.txt /opt/conda/requirements-ci.txt
COPY ./common/install_conda.sh install_conda.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_conda.sh && rm install_conda.sh common_utils.sh /opt/conda/requirements-ci.txt
RUN bash ./install_conda.sh && rm install_conda.sh
RUN rm /opt/conda/requirements-ci.txt
# Install gcc
ARG GCC_VERSION

View File

@ -11,6 +11,7 @@ ARG PYTORCH_ROCM_ARCH
ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
# Install common dependencies (so that this step can be cached separately)
ARG EC2
COPY ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
@ -25,14 +26,12 @@ COPY ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install conda and other packages (e.g., numpy, pytest)
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ARG CONDA_CMAKE
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
COPY requirements-ci.txt /opt/conda/requirements-ci.txt
COPY ./common/install_conda.sh install_conda.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_conda.sh && rm install_conda.sh common_utils.sh /opt/conda/requirements-ci.txt
RUN bash ./install_conda.sh && rm install_conda.sh
RUN rm /opt/conda/requirements-ci.txt
# Install gcc
ARG GCC_VERSION

View File

@ -9,6 +9,7 @@ ENV DEBIAN_FRONTEND noninteractive
ARG CLANG_VERSION
# Install common dependencies (so that this step can be cached separately)
ARG EC2
COPY ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
@ -34,14 +35,12 @@ COPY ./common/install_docs_reqs.sh install_docs_reqs.sh
RUN bash ./install_docs_reqs.sh && rm install_docs_reqs.sh
# Install conda and other packages (e.g., numpy, pytest)
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ARG CONDA_CMAKE
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
COPY requirements-ci.txt /opt/conda/requirements-ci.txt
COPY ./common/install_conda.sh install_conda.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_conda.sh && rm install_conda.sh common_utils.sh /opt/conda/requirements-ci.txt
RUN bash ./install_conda.sh && rm install_conda.sh
RUN rm /opt/conda/requirements-ci.txt
# Install gcc
ARG GCC_VERSION
@ -137,6 +136,10 @@ RUN rm install_openssl.sh
# Install ccache/sccache (do this last, so we get priority in PATH)
COPY ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
# See https://github.com/pytorch/pytorch/issues/82174
# TODO(sdym@fb.com):
# check if this is still needed after fully migrating off Xenial
ENV CARGO_NET_GIT_FETCH_WITH_CLI true
RUN bash ./install_cache.sh && rm install_cache.sh
# Add jni.h for java host build

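For context on the CARGO_NET_GIT_FETCH_WITH_CLI line above: setting it to true tells cargo to shell out to the git CLI for network fetches instead of its built-in libgit2 backend, a common workaround for proxy and authentication issues. A minimal sketch, assuming cargo and git are on PATH:

# With the variable set, cargo's fetch traffic goes through the git executable.
export CARGO_NET_GIT_FETCH_WITH_CLI=true
cargo fetch --verbose   # fetch/progress output now comes from git, not libgit2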
View File

@ -14,6 +14,9 @@ import cimodel.data.simple.docker_definitions
import cimodel.data.simple.mobile_definitions
import cimodel.data.simple.nightly_ios
import cimodel.data.simple.anaconda_prune_defintions
import cimodel.data.simple.macos_definitions
import cimodel.data.simple.upload_test_stats_definition
import cimodel.data.simple.ios_definitions
import cimodel.lib.miniutils as miniutils
import cimodel.lib.miniyaml as miniyaml
@ -140,6 +143,9 @@ def gen_build_workflows_tree():
cimodel.data.simple.mobile_definitions.get_workflow_jobs,
cimodel.data.simple.nightly_ios.get_workflow_jobs,
cimodel.data.simple.anaconda_prune_defintions.get_workflow_jobs,
cimodel.data.simple.macos_definitions.get_new_workflow_jobs,
cimodel.data.simple.upload_test_stats_definition.get_workflow_job,
cimodel.data.simple.ios_definitions.get_workflow_jobs,
]
build_jobs = [f() for f in build_workflows_functions]
build_jobs.extend(

View File

@ -56,13 +56,13 @@ else
echo "Can't tell what to checkout"
exit 1
fi
retry git submodule update --init --recursive
retry git submodule update --init --recursive --jobs 0
echo "Using Pytorch from "
git --no-pager log --max-count 1
popd
# Clone the Builder master repo
retry git clone -q https://github.com/pytorch/builder.git -b release/2.0 "$BUILDER_ROOT"
retry git clone -q https://github.com/pytorch/builder.git -b release/1.13 "$BUILDER_ROOT"
pushd "$BUILDER_ROOT"
echo "Using builder from "
git --no-pager log --max-count 1

View File

@ -31,9 +31,9 @@ fi
conda_sh="$workdir/install_miniconda.sh"
if [[ "$(uname)" == Darwin ]]; then
curl --retry 3 --retry-all-errors -o "$conda_sh" https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-MacOSX-x86_64.sh
curl --retry 3 -o "$conda_sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
else
curl --retry 3 --retry-all-errors -o "$conda_sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
curl --retry 3 -o "$conda_sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
fi
chmod +x "$conda_sh"
"$conda_sh" -b -p "$MINICONDA_ROOT"

View File

@ -8,21 +8,21 @@ PROJ_ROOT=/Users/distiller/project
export TCLLIBPATH="/usr/local/lib"
# Install conda
curl --retry 3 -o ~/conda.sh https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-MacOSX-x86_64.sh
curl --retry 3 -o ~/conda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
chmod +x ~/conda.sh
/bin/bash ~/conda.sh -b -p ~/anaconda
export PATH="~/anaconda/bin:${PATH}"
source ~/anaconda/bin/activate
# Install dependencies
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake requests typing-extensions --yes
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi requests typing_extensions --yes
conda install -c conda-forge valgrind --yes
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
# sync submodules
cd ${PROJ_ROOT}
git submodule sync
git submodule update --init --recursive
git submodule update --init --recursive --jobs 0
# run build script
chmod a+x ${PROJ_ROOT}/scripts/build_ios.sh

View File

@ -33,7 +33,7 @@ fi
cp ${PROJ_ROOT}/LICENSE ${ZIP_DIR}/
# zip the library
export DATE="$(date -u +%Y%m%d)"
export IOS_NIGHTLY_BUILD_VERSION="2.0.0.${DATE}"
export IOS_NIGHTLY_BUILD_VERSION="1.13.0.${DATE}"
if [ "${BUILD_LITE_INTERPRETER}" == "1" ]; then
# libtorch_lite_ios_nightly_1.11.0.20210810.zip
ZIPFILE="libtorch_lite_ios_nightly_${IOS_NIGHTLY_BUILD_VERSION}.zip"
@ -47,7 +47,7 @@ echo "${IOS_NIGHTLY_BUILD_VERSION}" > version.txt
zip -r ${ZIPFILE} install src version.txt LICENSE
# upload to aws
# Install conda then 'conda install' awscli
curl --retry 3 -o ~/conda.sh https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-MacOSX-x86_64.sh
curl --retry 3 -o ~/conda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
chmod +x ~/conda.sh
/bin/bash ~/conda.sh -b -p ~/anaconda
export PATH="~/anaconda/bin:${PATH}"

View File

@ -38,12 +38,8 @@ fi
EXTRA_CONDA_FLAGS=""
NUMPY_PIN=""
PROTOBUF_PACKAGE="defaults::protobuf"
if [[ "\$python_nodot" = *311* ]]; then
# Numpy is not yet available on the default conda channel
EXTRA_CONDA_FLAGS="-c=malfet"
fi
if [[ "\$python_nodot" = *310* ]]; then
EXTRA_CONDA_FLAGS="-c=conda-forge"
# There's an issue with conda channel priority where it'll randomly pick 1.19 over 1.20
# we set a lower boundary here just to be safe
NUMPY_PIN=">=1.21.2"
@ -51,6 +47,7 @@ if [[ "\$python_nodot" = *310* ]]; then
fi
if [[ "\$python_nodot" = *39* ]]; then
EXTRA_CONDA_FLAGS="-c=conda-forge"
# There's an issue with conda channel priority where it'll randomly pick 1.19 over 1.20
# we set a lower boundary here just to be safe
NUMPY_PIN=">=1.20"
@ -79,27 +76,30 @@ if [[ "$PACKAGE_TYPE" == conda ]]; then
set +u
retry conda install \${EXTRA_CONDA_FLAGS} -yq \
"numpy\${NUMPY_PIN}" \
future \
mkl>=2018 \
ninja \
dataclasses \
typing-extensions \
${PROTOBUF_PACKAGE}
${PROTOBUF_PACKAGE} \
six
if [[ "$DESIRED_CUDA" == 'cpu' ]]; then
retry conda install -c pytorch -y cpuonly
else
cu_ver="${DESIRED_CUDA:2:2}.${DESIRED_CUDA:4}"
CUDA_PACKAGE="pytorch-cuda"
PYTORCH_CHANNEL="pytorch"
if [[ "\${TORCH_CONDA_BUILD_FOLDER}" == "pytorch-nightly" ]]; then
PYTORCH_CHANNEL="pytorch-nightly"
CUDA_PACKAGE="cudatoolkit"
if [[ "$DESIRED_CUDA" == "cu116" || "$DESIRED_CUDA" == "cu117" ]]; then
CUDA_PACKAGE="cuda"
fi
retry conda install \${EXTRA_CONDA_FLAGS} -yq -c nvidia -c "\${PYTORCH_CHANNEL}" "pytorch-cuda=\${cu_ver}"
retry conda install \${EXTRA_CONDA_FLAGS} -yq -c nvidia -c pytorch "\${CUDA_PACKAGE}=\${cu_ver}"
fi
conda install \${EXTRA_CONDA_FLAGS} -y "\$pkg" --offline
)
elif [[ "$PACKAGE_TYPE" != libtorch ]]; then
pip install "\$pkg" --extra-index-url "https://download.pytorch.org/whl/nightly/${DESIRED_CUDA}"
retry pip install -q numpy protobuf typing-extensions
pip install "\$pkg"
retry pip install -q future numpy protobuf typing-extensions six
fi
if [[ "$PACKAGE_TYPE" == libtorch ]]; then
pkg="\$(ls /final_pkgs/*-latest.zip)"

View File

@ -59,7 +59,7 @@ PIP_UPLOAD_FOLDER='nightly/'
# We put this here so that OVERRIDE_PACKAGE_VERSION below can read from it
export DATE="$(date -u +%Y%m%d)"
#TODO: We should be pulling semver version from the base version.txt
BASE_BUILD_VERSION="2.0.0.dev$DATE"
BASE_BUILD_VERSION="1.13.0.dev$DATE"
# Change BASE_BUILD_VERSION to git tag when on a git tag
# Use 'git -C' to make doubly sure we're in the correct directory for checking
# the git tag
@ -92,11 +92,11 @@ if [[ "$PACKAGE_TYPE" == libtorch ]]; then
POSSIBLE_JAVA_HOMES+=(/usr/lib/jvm/java-8-openjdk-amd64)
POSSIBLE_JAVA_HOMES+=(/Library/Java/JavaVirtualMachines/*.jdk/Contents/Home)
# Add the Windows-specific JNI path
POSSIBLE_JAVA_HOMES+=("$PWD/pytorch/.circleci/windows-jni/")
POSSIBLE_JAVA_HOMES+=("$PWD/.circleci/windows-jni/")
for JH in "${POSSIBLE_JAVA_HOMES[@]}" ; do
if [[ -e "$JH/include/jni.h" ]] ; then
# Skip if we're not on Windows but haven't found a JAVA_HOME
if [[ "$JH" == "$PWD/pytorch/.circleci/windows-jni/" && "$OSTYPE" != "msys" ]] ; then
if [[ "$JH" == "$PWD/.circleci/windows-jni/" && "$OSTYPE" != "msys" ]] ; then
break
fi
echo "Found jni.h under $JH"
@ -131,7 +131,7 @@ else
fi
export PYTORCH_EXTRA_INSTALL_REQUIREMENTS="${PYTORCH_EXTRA_INSTALL_REQUIREMENTS:-}"
export DATE="$DATE"
export NIGHTLIES_DATE_PREAMBLE=1.14.0.dev
export NIGHTLIES_DATE_PREAMBLE=1.13.0.dev
export PYTORCH_BUILD_VERSION="$PYTORCH_BUILD_VERSION"
export PYTORCH_BUILD_NUMBER="$PYTORCH_BUILD_NUMBER"
export OVERRIDE_PACKAGE_VERSION="$PYTORCH_BUILD_VERSION"

View File

@ -8,7 +8,7 @@ export CUDA_VERSION="${DESIRED_CUDA/cu/}"
export USE_SCCACHE=1
export SCCACHE_BUCKET=ossci-compiler-cache
export SCCACHE_IGNORE_SERVER_IO_ERROR=1
export VC_YEAR=2022
export VC_YEAR=2019
if [[ "${DESIRED_CUDA}" == *"cu11"* ]]; then
export BUILD_SPLIT_CUDA=ON

View File

@ -4,7 +4,7 @@ set -eux -o pipefail
source "${BINARY_ENV_FILE:-/c/w/env}"
export CUDA_VERSION="${DESIRED_CUDA/cu/}"
export VC_YEAR=2022
export VC_YEAR=2019
pushd "$BUILDER_ROOT"

View File

@ -20,11 +20,6 @@ do
touch "$file" || true
done < <(find /var/lib/jenkins/.gradle -type f -print0)
# Patch pocketfft (as Android does not have aligned_alloc even if compiled with C++17)
if [ -f ~/workspace/third_party/pocketfft/pocketfft_hdronly.h ]; then
sed -i -e "s/#if __cplusplus >= 201703L/#if 0/" ~/workspace/third_party/pocketfft/pocketfft_hdronly.h
fi
export GRADLE_LOCAL_PROPERTIES=~/workspace/android/local.properties
rm -f $GRADLE_LOCAL_PROPERTIES
echo "sdk.dir=/opt/android/sdk" >> $GRADLE_LOCAL_PROPERTIES

View File

@ -98,6 +98,9 @@ git commit -m "Generate C++ docs from pytorch/pytorch@${GITHUB_SHA}" || true
git status
if [[ "${WITH_PUSH:-}" == true ]]; then
# push to a temp branch first to trigger CLA check and satisfy branch protections
git push -u origin HEAD:pytorchbot/temp-branch-cpp -f
sleep 30
git push -u origin
fi

View File

@ -1,5 +1,5 @@
set "DRIVER_DOWNLOAD_LINK=https://s3.amazonaws.com/ossci-windows/452.39-data-center-tesla-desktop-win10-64bit-international.exe"
curl --retry 3 --retry-all-errors -kL %DRIVER_DOWNLOAD_LINK% --output 452.39-data-center-tesla-desktop-win10-64bit-international.exe
curl --retry 3 -kL %DRIVER_DOWNLOAD_LINK% --output 452.39-data-center-tesla-desktop-win10-64bit-international.exe
if errorlevel 1 exit /b 1
start /wait 452.39-data-center-tesla-desktop-win10-64bit-international.exe -s -noreboot

View File

@ -7,7 +7,7 @@ sudo apt-get -y install expect-dev
# This is where the local pytorch install in the docker image is located
pt_checkout="/var/lib/jenkins/workspace"
source "$pt_checkout/.ci/pytorch/common_utils.sh"
source "$pt_checkout/.jenkins/pytorch/common_utils.sh"
echo "functorch_doc_push_script.sh: Invoked with $*"
set -ex

View File

@ -7,7 +7,7 @@ sudo apt-get -y install expect-dev
# This is where the local pytorch install in the docker image is located
pt_checkout="/var/lib/jenkins/workspace"
source "$pt_checkout/.ci/pytorch/common_utils.sh"
source "$pt_checkout/.jenkins/pytorch/common_utils.sh"
echo "python_doc_push_script.sh: Invoked with $*"
@ -77,9 +77,6 @@ pushd pytorch.github.io
export LC_ALL=C
export PATH=/opt/conda/bin:$PATH
if [ -n $ANACONDA_PYTHON_VERSION ]; then
export PATH=/opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:$PATH
fi
rm -rf pytorch || true
@ -140,7 +137,6 @@ git status
if [[ "${WITH_PUSH:-}" == true ]]; then
# push to a temp branch first to trigger CLA check and satisfy branch protections
git push -u origin HEAD:pytorchbot/temp-branch-py -f
git push -u origin HEAD^:pytorchbot/base -f
sleep 30
git push -u origin "${branch}"
fi

View File

@ -32,7 +32,7 @@ if ! command -v aws >/dev/null; then
fi
if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
DRIVER_FN="NVIDIA-Linux-x86_64-515.76.run"
DRIVER_FN="NVIDIA-Linux-x86_64-515.57.run"
wget "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN"
sudo /bin/bash "$DRIVER_FN" -s --no-drm || (sudo cat /var/log/nvidia-installer.log && false)
nvidia-smi
@ -40,8 +40,8 @@ if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
# Taken directly from https://github.com/NVIDIA/nvidia-docker
# Add the package repositories
distribution=$(. /etc/os-release;echo "$ID$VERSION_ID")
curl -s -L --retry 3 --retry-all-errors https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L --retry 3 --retry-all-errors "https://nvidia.github.io/nvidia-docker/${distribution}/nvidia-docker.list" | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L "https://nvidia.github.io/nvidia-docker/${distribution}/nvidia-docker.list" | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
retry sudo apt-get update -qq
# Necessary to get the `--gpus` flag to function within docker

View File

@ -2,7 +2,7 @@
set -eux -o pipefail
# Set up CircleCI GPG keys for apt, if needed
curl --retry 3 --retry-all-errors -s -L https://packagecloud.io/circleci/trusty/gpgkey | sudo apt-key add -
curl --retry 3 -s -L https://packagecloud.io/circleci/trusty/gpgkey | sudo apt-key add -
# Stop background apt updates. Hypothetically, the kill should not
# be necessary, because stop is supposed to send a kill signal to

View File

@ -0,0 +1,65 @@
# https://developercommunity.visualstudio.com/t/install-specific-version-of-vs-component/1142479
# Where to find the links: https://docs.microsoft.com/en-us/visualstudio/releases/2019/history#release-dates-and-build-numbers
# BuildTools from S3
$VS_DOWNLOAD_LINK = "https://s3.amazonaws.com/ossci-windows/vs${env:VS_VERSION}_BuildTools.exe"
$COLLECT_DOWNLOAD_LINK = "https://aka.ms/vscollect.exe"
$VS_INSTALL_ARGS = @("--nocache","--quiet","--wait", "--add Microsoft.VisualStudio.Workload.VCTools",
"--add Microsoft.Component.MSBuild",
"--add Microsoft.VisualStudio.Component.Roslyn.Compiler",
"--add Microsoft.VisualStudio.Component.TextTemplating",
"--add Microsoft.VisualStudio.Component.VC.CoreIde",
"--add Microsoft.VisualStudio.Component.VC.Redist.14.Latest",
"--add Microsoft.VisualStudio.ComponentGroup.NativeDesktop.Core",
"--add Microsoft.VisualStudio.Component.VC.Tools.x86.x64",
"--add Microsoft.VisualStudio.ComponentGroup.NativeDesktop.Win81")
if (${env:INSTALL_WINDOWS_SDK} -eq "1") {
$VS_INSTALL_ARGS += "--add Microsoft.VisualStudio.Component.Windows10SDK.19041"
}
if (Test-Path "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe") {
$VS_VERSION_major = [int] ${env:VS_VERSION}.split(".")[0]
$existingPath = & "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe" -products "Microsoft.VisualStudio.Product.BuildTools" -version "[${env:VS_VERSION}, ${env:VS_VERSION_major + 1})" -property installationPath
if (($existingPath -ne $null) -and (!${env:CIRCLECI})) {
echo "Found correctly versioned existing BuildTools installation in $existingPath"
exit 0
}
$pathToRemove = & "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe" -products "Microsoft.VisualStudio.Product.BuildTools" -property installationPath
}
echo "Downloading VS installer from S3."
curl.exe --retry 3 -kL $VS_DOWNLOAD_LINK --output vs_installer.exe
if ($LASTEXITCODE -ne 0) {
echo "Download of the VS 2019 Version ${env:VS_VERSION} installer failed"
exit 1
}
if ($pathToRemove -ne $null) {
echo "Uninstalling $pathToRemove."
$VS_UNINSTALL_ARGS = @("uninstall", "--installPath", "`"$pathToRemove`"", "--quiet","--wait")
$process = Start-Process "${PWD}\vs_installer.exe" -ArgumentList $VS_UNINSTALL_ARGS -NoNewWindow -Wait -PassThru
$exitCode = $process.ExitCode
if (($exitCode -ne 0) -and ($exitCode -ne 3010)) {
echo "Original BuildTools uninstall failed with code $exitCode"
exit 1
}
echo "Other versioned BuildTools uninstalled."
}
echo "Installing Visual Studio version ${env:VS_VERSION}."
$process = Start-Process "${PWD}\vs_installer.exe" -ArgumentList $VS_INSTALL_ARGS -NoNewWindow -Wait -PassThru
Remove-Item -Path vs_installer.exe -Force
$exitCode = $process.ExitCode
if (($exitCode -ne 0) -and ($exitCode -ne 3010)) {
echo "VS 2019 installer exited with code $exitCode, which should be one of [0, 3010]."
curl.exe --retry 3 -kL $COLLECT_DOWNLOAD_LINK --output Collect.exe
if ($LASTEXITCODE -ne 0) {
echo "Download of the VS Collect tool failed."
exit 1
}
Start-Process "${PWD}\Collect.exe" -NoNewWindow -Wait -PassThru
New-Item -Path "C:\w\build-results" -ItemType "directory" -Force
Copy-Item -Path "${env:TEMP}\vslogs.zip" -Destination "C:\w\build-results\"
exit 1
}

View File

@ -0,0 +1,5 @@
$CMATH_DOWNLOAD_LINK = "https://raw.githubusercontent.com/microsoft/STL/12c684bba78f9b032050526abdebf14f58ca26a3/stl/inc/cmath"
$VC14_28_INSTALL_PATH="C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29910\include"
curl.exe --retry 3 -kL $CMATH_DOWNLOAD_LINK --output "$home\cmath"
Move-Item -Path "$home\cmath" -Destination "$VC14_28_INSTALL_PATH" -Force

View File

@ -0,0 +1,75 @@
#!/bin/bash
set -eux -o pipefail
case ${CUDA_VERSION} in
10.2)
cuda_installer_name="cuda_10.2.89_441.22_win10"
cuda_install_packages="nvcc_10.2 cuobjdump_10.2 nvprune_10.2 cupti_10.2 cublas_10.2 cublas_dev_10.2 cudart_10.2 cufft_10.2 cufft_dev_10.2 curand_10.2 curand_dev_10.2 cusolver_10.2 cusolver_dev_10.2 cusparse_10.2 cusparse_dev_10.2 nvgraph_10.2 nvgraph_dev_10.2 npp_10.2 npp_dev_10.2 nvrtc_10.2 nvrtc_dev_10.2 nvml_dev_10.2"
;;
11.3)
cuda_installer_name="cuda_11.3.0_465.89_win10"
cuda_install_packages="thrust_11.3 nvcc_11.3 cuobjdump_11.3 nvprune_11.3 nvprof_11.3 cupti_11.3 cublas_11.3 cublas_dev_11.3 cudart_11.3 cufft_11.3 cufft_dev_11.3 curand_11.3 curand_dev_11.3 cusolver_11.3 cusolver_dev_11.3 cusparse_11.3 cusparse_dev_11.3 npp_11.3 npp_dev_11.3 nvrtc_11.3 nvrtc_dev_11.3 nvml_dev_11.3"
;;
11.6)
cuda_installer_name="cuda_11.6.0_511.23_windows"
cuda_install_packages="thrust_11.6 nvcc_11.6 cuobjdump_11.6 nvprune_11.6 nvprof_11.6 cupti_11.6 cublas_11.6 cublas_dev_11.6 cudart_11.6 cufft_11.6 cufft_dev_11.6 curand_11.6 curand_dev_11.6 cusolver_11.6 cusolver_dev_11.6 cusparse_11.6 cusparse_dev_11.6 npp_11.6 npp_dev_11.6 nvrtc_11.6 nvrtc_dev_11.6 nvml_dev_11.6"
;;
11.7)
cuda_installer_name="cuda_11.7.0_516.01_windows"
cuda_install_packages="thrust_11.7 nvcc_11.7 cuobjdump_11.7 nvprune_11.7 nvprof_11.7 cupti_11.7 cublas_11.7 cublas_dev_11.7 cudart_11.7 cufft_11.7 cufft_dev_11.7 curand_11.7 curand_dev_11.7 cusolver_11.7 cusolver_dev_11.7 cusparse_11.7 cusparse_dev_11.7 npp_11.7 npp_dev_11.7 nvrtc_11.7 nvrtc_dev_11.7 nvml_dev_11.7"
;;
*)
echo "CUDA_VERSION $CUDA_VERSION is not supported yet"
exit 1
;;
esac
if [[ -f "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/bin/nvcc.exe" ]]; then
echo "Existing CUDA v${CUDA_VERSION} installation found, skipping install"
else
tmp_dir=$(mktemp -d)
(
# no need to popd after, the subshell shouldn't affect the parent shell
pushd "${tmp_dir}"
cuda_installer_link="https://ossci-windows.s3.amazonaws.com/${cuda_installer_name}.exe"
curl --retry 3 -kLO $cuda_installer_link
7z x ${cuda_installer_name}.exe -o${cuda_installer_name}
pushd ${cuda_installer_name}
mkdir cuda_install_logs
set +e
# This breaks for some reason if you quote cuda_install_packages
# shellcheck disable=SC2086
./setup.exe -s ${cuda_install_packages} -loglevel:6 -log:"$(pwd -W)/cuda_install_logs"
set -e
if [[ ! -f "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/bin/nvcc.exe" ]]; then
echo "CUDA installation failed"
mkdir -p /c/w/build-results
7z a "c:\\w\\build-results\\cuda_install_logs.7z" cuda_install_logs
exit 1
fi
)
rm -rf "${tmp_dir}"
fi
if [[ -f "/c/Program Files/NVIDIA Corporation/NvToolsExt/bin/x64/nvToolsExt64_1.dll" ]]; then
echo "Existing nvtools installation found, skipping install"
else
# create tmp dir for download
tmp_dir=$(mktemp -d)
(
# no need to popd after, the subshell shouldn't affect the parent shell
pushd "${tmp_dir}"
curl --retry 3 -kLO https://ossci-windows.s3.amazonaws.com/NvToolsExt.7z
7z x NvToolsExt.7z -oNvToolsExt
mkdir -p "C:/Program Files/NVIDIA Corporation/NvToolsExt"
cp -r NvToolsExt/* "C:/Program Files/NVIDIA Corporation/NvToolsExt/"
)
rm -rf "${tmp_dir}"
fi

View File

@ -0,0 +1,52 @@
#!/bin/bash
set -eux -o pipefail
windows_s3_link="https://ossci-windows.s3.amazonaws.com"
case ${CUDA_VERSION} in
10.2)
cudnn_file_name="cudnn-${CUDA_VERSION}-windows10-x64-v7.6.5.32"
;;
11.3)
# Use cudnn 8.3 with hard-coded cuda11.5 version
cudnn_file_name="cudnn-windows-x86_64-8.3.2.44_cuda11.5-archive"
;;
11.6)
# Use cudnn8.3 with hard-coded cuda11.5 version
cudnn_file_name="cudnn-windows-x86_64-8.3.2.44_cuda11.5-archive"
;;
11.7)
# Use cudnn 8.5 with hard-coded cuda11 version
cudnn_file_name="cudnn-windows-x86_64-8.5.0.96_cuda11-archive"
;;
*)
echo "CUDA_VERSION: ${CUDA_VERSION} not supported yet"
exit 1
;;
esac
cudnn_installer_name="cudnn_installer.zip"
cudnn_installer_link="${windows_s3_link}/${cudnn_file_name}.zip"
cudnn_install_folder="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/"
if [[ -f "${cudnn_install_folder}/include/cudnn.h" ]]; then
echo "Existing cudnn installation found, skipping install..."
else
tmp_dir=$(mktemp -d)
(
pushd "${tmp_dir}"
curl --retry 3 -o "${cudnn_installer_name}" "$cudnn_installer_link"
7z x "${cudnn_installer_name}" -ocudnn
# Use '${var:?}/*' to avoid potentially expanding to '/*'
# Remove all of the directories before attempting to copy files
rm -rf "${cudnn_install_folder:?}/*"
cp -rf cudnn/cuda/* "${cudnn_install_folder}"
# Make sure the Windows PATH contains the zlib DLL
curl -k -L "${windows_s3_link}/zlib123dllx64.zip" --output "${tmp_dir}\zlib123dllx64.zip"
7z x "${tmp_dir}\zlib123dllx64.zip" -o"${tmp_dir}\zlib"
xcopy /Y "${tmp_dir}\zlib\dll_x64\*.dll" "C:\Windows\System32"
)
rm -rf "${tmp_dir}"
fi

View File

@ -6,7 +6,7 @@ commands:
- run:
name: "Calculate docker image hash"
command: |
DOCKER_TAG=$(git rev-parse HEAD:.ci/docker)
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "DOCKER_TAG=${DOCKER_TAG}" >> "${BASH_ENV}"
designate_upload_channel:

View File
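The DOCKER_TAG computed above keys the CI image off the contents of the docker build directory: `git rev-parse HEAD:<path>` prints the git tree hash for that path, which changes only when a file under it changes. A rough illustration (hash values are made up):

# Content-addressed docker tag: the tree hash of the docker directory.
git rev-parse HEAD:.circleci/docker    # e.g. 1f2e3d4c... for the current commit
# After committing any change under that directory, the same command prints a
# different hash, so CI knows it must rebuild and push a new image tag.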

@ -33,12 +33,12 @@
exit 0
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable on trees that don't have `.ci/docker` at its merge base, i.e. nightly
if ! git rev-parse "$(git merge-base HEAD << pipeline.git.base_revision >>):.ci/docker"; then
echo "Directory '.ci/docker' not found in tree << pipeline.git.base_revision >>, you should probably rebase onto a more recent commit"
# this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly
if ! git rev-parse "$(git merge-base HEAD << pipeline.git.base_revision >>):.circleci/docker"; then
echo "Directory '.circleci/docker' not found in tree << pipeline.git.base_revision >>, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$(git merge-base HEAD << pipeline.git.base_revision >>):.ci/docker")
PREVIOUS_DOCKER_TAG=$(git rev-parse "$(git merge-base HEAD << pipeline.git.base_revision >>):.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
@ -53,4 +53,4 @@
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_DOCKER_BUILDER_V1}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
set -x
cd .ci/docker && ./build_docker.sh
cd .circleci/docker && ./build_docker.sh

View File

@ -51,8 +51,8 @@
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}
set -x
chmod a+x .ci/pytorch/macos-build.sh
unbuffer .ci/pytorch/macos-build.sh 2>&1 | ts
chmod a+x .jenkins/pytorch/macos-build.sh
unbuffer .jenkins/pytorch/macos-build.sh 2>&1 | ts
- persist_to_workspace:
root: /Users/distiller/workspace/
@ -87,8 +87,8 @@
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}
set -x
chmod a+x .ci/pytorch/macos-build.sh
unbuffer .ci/pytorch/macos-build.sh 2>&1 | ts
chmod a+x .jenkins/pytorch/macos-build.sh
unbuffer .jenkins/pytorch/macos-build.sh 2>&1 | ts
- persist_to_workspace:
root: /Users/distiller/workspace/
@ -169,7 +169,7 @@
brew link --force libomp
echo "export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname "$(which conda)")/../"}" >> "${BASH_ENV}"
.ci/pytorch/macos-build.sh
.jenkins/pytorch/macos-build.sh
- when:
condition: << parameters.build-generates-artifacts >>
@ -252,7 +252,7 @@
export COMMIT_MESSAGES="${COMMIT_MESSAGES//[\'\"]}"
python3 -mpip install dist/*.whl
.ci/pytorch/macos-test.sh
.jenkins/pytorch/macos-test.sh
- run:
name: Copy files for uploading test stats
command: |
@ -282,7 +282,7 @@
exit 0
fi
cp -r ~/workspace/test-reports/* ~/project
pip3 install requests==2.26 rockset==1.0.3 boto3==1.19.12
pip3 install requests==2.26 rockset==0.8.3 boto3==1.19.12 six==1.16.0
export AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_FOR_OSSCI_ARTIFACT_UPLOAD}
export AWS_SECRET_ACCESS_KEY=${AWS_SECRET_KEY_FOR_OSSCI_ARTIFACT_UPLOAD}
# I don't know how to get the run attempt number for reruns, so default to 1
@ -304,8 +304,23 @@
set -e
export JOB_BASE_NAME=$CIRCLE_JOB
chmod a+x .ci/pytorch/macos-test.sh
unbuffer .ci/pytorch/macos-test.sh 2>&1 | ts
chmod a+x .jenkins/pytorch/macos-test.sh
unbuffer .jenkins/pytorch/macos-test.sh 2>&1 | ts
- run:
name: Report results
no_output_timeout: "5m"
command: |
set -ex
source /Users/distiller/workspace/miniconda3/bin/activate
python3 -m pip install boto3==1.19.12
export JOB_BASE_NAME=$CIRCLE_JOB
# Using the same IAM user to write stats to our OSS bucket
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}
python -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
when: always
- store_test_results:
path: test/test-reports
@ -326,8 +341,8 @@
set -e
export BUILD_LITE_INTERPRETER=1
export JOB_BASE_NAME=$CIRCLE_JOB
chmod a+x ${HOME}/project/.ci/pytorch/macos-lite-interpreter-build-test.sh
unbuffer ${HOME}/project/.ci/pytorch/macos-lite-interpreter-build-test.sh 2>&1 | ts
chmod a+x ${HOME}/project/.jenkins/pytorch/macos-lite-interpreter-build-test.sh
unbuffer ${HOME}/project/.jenkins/pytorch/macos-lite-interpreter-build-test.sh 2>&1 | ts
- store_test_results:
path: test/test-reports
@ -551,7 +566,7 @@
export TCLLIBPATH="/usr/local/lib"
# Install conda
curl --retry 3 -o ~/conda.sh https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-MacOSX-x86_64.sh
curl --retry 3 -o ~/conda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
chmod +x ~/conda.sh
/bin/bash ~/conda.sh -b -p ~/anaconda
export PATH="~/anaconda/bin:${PATH}"
@ -562,7 +577,7 @@
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
retry conda install numpy ninja pyyaml mkl mkl-include setuptools cmake requests typing-extensions --yes
retry conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi requests typing_extensions --yes
# sync submodules
cd ${PROJ_ROOT}
@ -626,7 +641,7 @@
cd ${PROJ_ROOT}/ios/TestApp/benchmark
mkdir -p ../models
if [ ${USE_COREML_DELEGATE} == 1 ]; then
pip install coremltools==5.0b5 protobuf==3.20.1
pip install coremltools==5.0b5 protobuf==3.20.1 six==1.16.0
python coreml_backend.py
else
cd "${PROJ_ROOT}"
@ -676,7 +691,7 @@
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && .ci/pytorch/build.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/build.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
@ -722,9 +737,9 @@
trap "retrieve_test_reports" ERR
if [[ ${BUILD_ENVIRONMENT} == *"multigpu"* ]]; then
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && .ci/pytorch/multigpu-test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/multigpu-test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
else
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && .ci/pytorch/test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
fi
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts

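The `retry` helper used in the macOS steps above is a plain shell function that re-runs its arguments with increasing sleeps. A minimal sketch of the same pattern (the command being retried is just an example):

# Retry a command up to five times with 1/2/4/8 second backoff between attempts.
retry () {
  "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") || (sleep 4 && "$@") || (sleep 8 && "$@")
}
retry curl -fsSL -o conda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh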
View File

@ -3,14 +3,11 @@
InheritParentConfig: true
Checks: '
bugprone-*,
-bugprone-easily-swappable-parameters,
-bugprone-forward-declaration-namespace,
-bugprone-macro-parentheses,
-bugprone-lambda-function-name,
-bugprone-reserved-identifier,
-bugprone-swapped-arguments,
cppcoreguidelines-*,
-cppcoreguidelines-avoid-do-while,
-cppcoreguidelines-avoid-magic-numbers,
-cppcoreguidelines-avoid-non-const-global-variables,
-cppcoreguidelines-interfaces-global-init,
@ -29,11 +26,8 @@ cppcoreguidelines-*,
-facebook-hte-RelativeInclude,
hicpp-exception-baseclass,
hicpp-avoid-goto,
misc-unused-alias-decls,
misc-unused-using-decls,
modernize-*,
-modernize-concat-nested-namespaces,
-modernize-macro-to-enum,
-modernize-return-braced-init-list,
-modernize-use-auto,
-modernize-use-default-member-init,
@ -43,9 +37,9 @@ modernize-*,
performance-*,
-performance-noexcept-move-constructor,
-performance-unnecessary-value-param,
readability-container-size-empty,
'
HeaderFilterRegex: '^(c10/(?!test)|torch/csrc/(?!deploy/interpreter/cpython)).*$'
HeaderFilterRegex: 'torch/csrc/(?!deploy/interpreter/cpython).*'
AnalyzeTemporaryDtors: false
WarningsAsErrors: '*'
CheckOptions:
...

View File

@ -11,12 +11,8 @@ ignore =
# these ignores are from flake8-bugbear; please fix!
B007,B008,
# these ignores are from flake8-comprehensions; please fix!
C407
per-file-ignores =
__init__.py: F401
torch/utils/cpp_extension.py: B950
torchgen/api/types/__init__.py: F401,F403
torchgen/executorch/api/types/__init__.py: F401,F403
C400,C401,C402,C403,C404,C405,C407,C411,C413,C414,C415
per-file-ignores = __init__.py: F401 torch/utils/cpp_extension.py: B950
optional-ascii-coding = True
exclude =
./.git,

View File

@ -1,61 +0,0 @@
name: 🐛 torch.compile Bug Report
description: Create a report to help us reproduce and fix the bug
labels: ["oncall: pt2"]
body:
- type: markdown
attributes:
value: >
#### Before submitting a bug, please make sure the issue hasn't already been addressed by searching through [the
existing and past issues](https://github.com/pytorch/pytorch/issues)
It's likely that your bug will be resolved by checking our FAQ or troubleshooting guide [documentation](https://pytorch.org/docs/master/dynamo/index.html)
- type: textarea
attributes:
label: 🐛 Describe the bug
description: |
Please provide a clear and concise description of what the bug is.
placeholder: |
A clear and concise description of what the bug is.
validations:
required: false
- type: textarea
attributes:
label: Error logs
description: |
Please provide the error you're seeing
placeholder: |
Error...
validations:
required: false
- type: textarea
attributes:
label: Minified repro
description: |
Please run the minifier on your example and paste the minified code below
Learn more here https://pytorch.org/docs/master/dynamo/troubleshooting.html
placeholder: |
env TORCHDYNAMO_REPRO_AFTER="aot" python your_model.py
or
env TORCHDYNAMO_REPRO_AFTER="dynamo" python your_model.py
import torch
...
# torch version: 2.0.....
class Repro(torch.nn.Module)
validations:
required: false
- type: textarea
attributes:
label: Versions
description: |
Please run the following and paste the output below.
```sh
wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
```
validations:
required: true

View File

@ -1 +1 @@
Fixes #ISSUE_NUMBER
Fixes #ISSUE_NUMBER

View File

@ -5,19 +5,14 @@ self-hosted-runner:
- linux.large
- linux.2xlarge
- linux.4xlarge
- linux.12xlarge
- linux.24xlarge
- linux.4xlarge.nvidia.gpu
- linux.8xlarge.nvidia.gpu
- linux.16xlarge.nvidia.gpu
- linux.g5.4xlarge.nvidia.gpu
- windows.4xlarge
- windows.8xlarge.nvidia.gpu
- windows.g5.4xlarge.nvidia.gpu
- bm-runner
- linux.rocm.gpu
- macos-m1-12
- macos-m1-13
- macos-12-xl
- macos-12
- macos12.3-m1

View File

@ -66,11 +66,11 @@ runs:
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
git submodule sync && git submodule update -q --init --recursive --depth 1
git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0
docker cp "${GITHUB_WORKSPACE}/." "${container_name}:/var/lib/jenkins/workspace"
(echo "sudo chown -R jenkins . && .ci/pytorch/build.sh && find ${BUILD_ROOT} -type f -name "*.a" -or -name "*.o" -delete" | docker exec -u jenkins -i "${container_name}" bash) 2>&1
(echo "sudo chown -R jenkins . && .jenkins/pytorch/build.sh && find ${BUILD_ROOT} -type f -name "*.a" -or -name "*.o" -delete" | docker exec -u jenkins -i "${container_name}" bash) 2>&1
# Copy install binaries back
mkdir -p "${GITHUB_WORKSPACE}/build_android_install_${MATRIX_ARCH}"
docker cp "${container_name}:/var/lib/jenkins/workspace/build_android/install" "${GITHUB_WORKSPACE}/build_android_install_${MATRIX_ARCH}"
echo "container_id=${container_name}" >> "${GITHUB_OUTPUT}"
echo "::set-output name=container_id::${container_name}"

View File

@ -24,6 +24,9 @@ inputs:
force_push:
description: If set to any value, always run the push
required: false
push-ghcr-image:
description: If set to any value, push docker image to the ghcr.io.
required: false
outputs:
docker-image:
@ -38,18 +41,18 @@ runs:
id: calculate-tag
env:
IS_XLA: ${{ inputs.xla == 'true' && 'true' || '' }}
XLA_IMAGE_TAG: v1.0
XLA_IMAGE_TAG: v0.4
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/${{ inputs.docker-image-name }}
run: |
if [ -n "${IS_XLA}" ]; then
echo "XLA workflow uses pre-built test image at ${XLA_IMAGE_TAG}"
DOCKER_TAG=$(git rev-parse HEAD:.ci/docker)
echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}"
echo "docker-image=${DOCKER_IMAGE_BASE}:${XLA_IMAGE_TAG}" >> "${GITHUB_OUTPUT}"
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker-tag::${DOCKER_TAG}"
echo "::set-output name=docker-image::${DOCKER_IMAGE_BASE}:${XLA_IMAGE_TAG}"
else
DOCKER_TAG=$(git rev-parse HEAD:.ci/docker)
echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}"
echo "docker-image=${DOCKER_IMAGE_BASE}:${DOCKER_TAG}" >> "${GITHUB_OUTPUT}"
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker-tag::${DOCKER_TAG}"
echo "::set-output name=docker-image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
fi
- name: Check if image should be built
@ -75,12 +78,12 @@ runs:
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable on trees that don't have `.ci/docker` at its merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.ci/docker"; then
echo "Directory '.ci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
# this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.ci/docker")
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "WARNING: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
@ -90,10 +93,10 @@ runs:
# In order to avoid a stampeding herd of jobs trying to push all at once we set it to
# skip the push. If this is negatively affecting TTS across the board the suggestion
# should be to run the docker-builds.yml workflow to generate the correct docker builds
echo "skip_push=true" >> "${GITHUB_OUTPUT}"
echo ::set-output name=skip_push::true
fi
fi
echo "rebuild=yes" >> "${GITHUB_OUTPUT}"
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: inputs.always-rebuild || steps.check.outputs.rebuild
@ -103,7 +106,9 @@ runs:
# Skip push if we don't need it, or if specified in the inputs
DOCKER_SKIP_PUSH: ${{ steps.check.outputs.skip_push || inputs.skip_push }}
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker-tag }}
working-directory: .ci/docker
PUSH_GHCR_IMAGE: ${{ inputs.push-ghcr-image }}
GHCR_PAT: ${{ env.GHCR_PAT }}
working-directory: .circleci/docker
shell: bash
run: |
./build_docker.sh

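Several hunks in this action toggle between the two ways a GitHub Actions step can expose an output to later steps; both forms below are valid syntax (the output name is just an example):

# Older workflow command, as used on the ::set-output lines:
echo "::set-output name=docker-tag::${DOCKER_TAG}"
# Newer file-based form, written to the file named by $GITHUB_OUTPUT:
echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}"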
View File

@ -1,31 +0,0 @@
name: Cleans up diskspace
description: Cleans up disk space (via docker system prune) when root filesystem usage exceeds the diskspace-cutoff (70 percent by default).
inputs:
diskspace-cutoff:
description: The percent amount after which docker prune is run.
required: true
default: 70
runs:
using: composite
steps:
- name: Cleans up diskspace
shell: bash
run: |
diskspace_cutoff=${{ inputs.diskspace-cutoff }}
diskspace=$(df -H / --output=pcent | sed -n 2p | sed 's/%//' | sed 's/ //')
msg="Please file an issue on pytorch/pytorch reporting the faulty runner. Include a link to the runner logs so the runner can be identified"
if [[ "$diskspace" -ge "$diskspace_cutoff" ]] ; then
docker system prune -af
diskspace_new=$(df -H / --output=pcent | sed -n 2p | sed 's/%//' | sed 's/ //')
if [[ "$diskspace_new" -gt "$diskspace_cutoff" ]] ; then
echo "Error: Available diskspace is less than $diskspace_cutoff percent. Not enough diskspace."
echo "$msg"
exit 1
else
difference=$((diskspace - diskspace_new))
echo "Diskspace saved: $difference percent"
fi
fi

View File

@ -21,7 +21,7 @@ runs:
- name: Download PyTorch Build Artifacts from GHA
if: inputs.use-gha
uses: actions/download-artifact@v3
uses: actions/download-artifact@v2
with:
name: ${{ inputs.name }}

View File

@ -21,14 +21,11 @@ outputs:
is-test-matrix-empty:
description: True if the filtered test configs matrix is empty. False otherwise.
value: ${{ steps.filter.outputs.is-test-matrix-empty }}
keep-going:
description: True if keep-going label was on PR.
value: ${{ steps.filter.outputs.keep-going }}
runs:
using: composite
steps:
- uses: nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482
- uses: nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a
name: Setup dependencies
env:
GITHUB_TOKEN: ${{ inputs.github-token }}
@ -55,9 +52,7 @@ runs:
.github/scripts/filter_test_configs.py \
--test-matrix "${{ inputs.test-matrix }}" \
--pr-number "${{ github.event.pull_request.number }}" \
--tag "${{ steps.parse-ref.outputs.tag }}" \
--event-name "${{ github.event_name }}" \
--schedule "${{ github.event.schedule }}"
--tag "${{ steps.parse-ref.outputs.tag }}"
- name: Print the filtered test matrix
shell: bash

View File

@ -15,14 +15,17 @@ outputs:
runs:
using: composite
steps:
- name: Get jobid or fail
# timeout-minutes is unsupported for composite workflows, see https://github.com/actions/runner/issues/1979
# timeout-minutes: 10
shell: bash
- uses: nick-fields/retry@7d4a37704547a311dbb66ebdf5b23ec19374a767
id: get-job-id
run: |
set -eux
GHA_WORKFLOW_JOB_ID=$(python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}")
echo "job-id=${GHA_WORKFLOW_JOB_ID}" >> "${GITHUB_OUTPUT}"
env:
GITHUB_TOKEN: ${{ inputs.github-token }}
with:
shell: bash
timeout_minutes: 10
max_attempts: 5
retry_wait_seconds: 30
command: |
set -eux
python3 -m pip install requests==2.26.0
GHA_WORKFLOW_JOB_ID=$(python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}")
echo "::set-output name=job-id::${GHA_WORKFLOW_JOB_ID}"

View File

@ -9,16 +9,6 @@ runs:
shell: bash
run: echo "DOCKER_HOST=unix:///run/user/$(id -u)/docker.sock" >> "${GITHUB_ENV}"
- name: Stop all running docker containers
if: always()
shell: bash
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all stopped containers.
docker container prune -f
- name: Runner health check system info
if: always()
shell: bash
@ -45,21 +35,10 @@ runs:
shell: bash
run: |
ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx')
msg="Please file an issue on pytorch/pytorch reporting the faulty runner. Include a link to the runner logs so the runner can be identified"
if [[ $ngpu -eq 0 ]]; then
echo "Error: Failed to detect any GPUs on the runner"
echo "$msg"
if [[ "x$ngpu" != "x2" && "x$ngpu" != "x4" ]]; then
echo "Failed to detect GPUs on the runner"
exit 1
fi
if [[ $ngpu -eq 1 ]]; then
echo "Error: only 1 GPU detected, at least 2 GPUs are needed for distributed jobs"
echo "$msg"
exit 1
fi
- name: Runner diskspace health check
uses: ./.github/actions/diskspace-cleanup
if: always()
- name: Runner health check disconnect on failure
if: ${{ failure() }}
@ -76,5 +55,11 @@ runs:
- name: ROCm set GPU_FLAG
shell: bash
run: |
# All GPUs are visible to the runner; visibility, if needed, will be set by run_test.py.
echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri --group-add video --group-add daemon" >> "${GITHUB_ENV}"
# Examine the runner name. If it ends with "-2", this is the second runner on the host.
if [[ ${{ runner.name }} == *-2 ]]; then
# select the last two GPUs on the host
echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri/renderD130 --device=/dev/dri/renderD131 --group-add video --group-add daemon" >> "${GITHUB_ENV}"
else
# select the first two GPUs on the host
echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd --device=/dev/dri/renderD128 --device=/dev/dri/renderD129 --group-add video --group-add daemon" >> "${GITHUB_ENV}"
fi

View File

@ -37,48 +37,29 @@ runs:
run: |
Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore
- name: Setup conda
shell: bash
- name: Install Visual Studio 2019 toolchain
shell: powershell
env:
VS_VERSION: "16.8.6"
INSTALL_WINDOWS_SDK: "1"
run: |
# Windows conda is baked into the AMI at this location
CONDA="C:\Jenkins\Miniconda3\condabin\conda.bat"
.\.circleci\scripts\vs_install.ps1
{
echo "CONDA_RUN=${CONDA} run --no-capture-output";
echo "CONDA_BUILD=${CONDA} run conda-build";
echo "CONDA_INSTALL=${CONDA} install";
} >> "${GITHUB_ENV}"
- name: Install CUDA and CUDNN
shell: bash
if: inputs.cuda-version != 'cpu'
env:
CUDA_VERSION: ${{ inputs.cuda-version }}
run: |
.circleci/scripts/windows_cuda_install.sh
.circleci/scripts/windows_cudnn_install.sh
- name: Setup Python3
shell: bash
run: |
set +e
PYTHON3=$(${CONDA_RUN} which python3)
EXIT_CODE=$?
if [[ "${EXIT_CODE}" == "0" ]]; then
echo "Found Python3 at ${PYTHON3}, adding it into GITHUB_PATH"
PYTHON_PATH=$(dirname "${PYTHON3}")
echo "${PYTHON_PATH}" >> "${GITHUB_PATH}"
else
# According to https://docs.conda.io/en/latest/miniconda.html, we are using the Miniconda3
# installation, which is Python 3 based, so its default python is Python 3. There
# is also a Python 2 based Miniconda installation, and both can be installed if
# needed. In both cases, the Python binary is just called python
PYTHON=$(${CONDA_RUN} which python)
EXIT_CODE=$?
if [[ "${EXIT_CODE}" == "0" ]]; then
PYTHON3=$(echo "${PYTHON}" | sed "s/python/python3/")
# It's difficult to set up an alias across GitHub Actions steps, so I just add a symlink
# here pointing to Python
ln -s "${PYTHON}" "${PYTHON3}"
PYTHON_PATH=$(dirname "${PYTHON}")
echo "${PYTHON_PATH}" >> "${GITHUB_PATH}"
else
echo "Found no Python using ${CONDA_RUN}"
fi
fi
uses: actions/setup-python@v2
with:
python-version: "3.x"
cache: pip
cache-dependency-path: |
**/requirements.txt
**/.circleci/docker/requirements-ci.txt
**/.github/requirements-gha-cache.txt

View File

@ -1,19 +0,0 @@
name: Teardown ROCm host
description: Tear down ROCm host for CI
runs:
using: composite
steps:
- name: Teardown ROCm
if: always()
shell: bash
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all stopped containers.
docker container prune -f
- name: Runner diskspace health check
uses: ./.github/actions/diskspace-cleanup
if: always()

View File

@ -15,6 +15,7 @@ runs:
-e BINARY_ENV_FILE \
-e BUILDER_ROOT \
-e BUILD_ENVIRONMENT \
-e BUILD_SPLIT_CUDA \
-e DESIRED_CUDA \
-e DESIRED_DEVTOOLSET \
-e DESIRED_PYTHON \

View File

@ -34,7 +34,7 @@ runs:
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' -i '*.csv'
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- name: Zip usage log for upload
if: runner.os != 'Windows' && !inputs.use-gha
@ -49,9 +49,6 @@ runs:
if [ -f 'usage_log.txt' ]; then
zip "usage-log-${FILE_SUFFIX}.zip" 'usage_log.txt'
fi
if ls test/**/*.log 1> /dev/null 2>&1; then
zip -r "usage-log-${FILE_SUFFIX}.zip" test -i '*.log'
fi
# Windows zip
- name: Zip JSONs for upload
@ -70,17 +67,16 @@ runs:
FILE_SUFFIX: ${{ inputs.file-suffix }}
run: |
# -ir => recursive include all files in pattern
7z a "test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml' -ir'!test\*.csv'
7z a "test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml'
- name: Zip usage log for upload
if: runner.os == 'Windows' && !inputs.use-gha
continue-on-error: true
shell: powershell
env:
FILE_SUFFIX: ${{ inputs.file-suffix }}
run: |
# -ir => recursive include all files in pattern
7z a "usage-log-$Env:FILE_SUFFIX.zip" 'usage_log.txt' -ir'!test\*.log'
7z a "usage-log-$Env:FILE_SUFFIX.zip" 'usage_log.txt'
# S3 upload
- name: Store Test Downloaded JSONs on S3
@ -106,7 +102,6 @@ runs:
- name: Store Usage Logs on S3
uses: seemethere/upload-artifact-s3@v5
if: ${{ !inputs.use-gha }}
continue-on-error: true
with:
s3-prefix: |
${{ github.repository }}/${{ github.run_id }}/${{ github.run_attempt }}/artifact
@ -116,9 +111,8 @@ runs:
# GHA upload
- name: Store Test Downloaded JSONs on Github
uses: actions/upload-artifact@v3
uses: actions/upload-artifact@v2
if: inputs.use-gha
continue-on-error: true
with:
# Add the run attempt, see [Artifact run attempt]
name: test-jsons-runattempt${{ github.run_attempt }}-${{ inputs.file-suffix }}.zip
@ -127,28 +121,11 @@ runs:
path: test/**/*.json
- name: Store Test Reports on Github
uses: actions/upload-artifact@v3
uses: actions/upload-artifact@v2
if: inputs.use-gha
continue-on-error: true
with:
# Add the run attempt, see [Artifact run attempt]
name: test-reports-runattempt${{ github.run_attempt }}-${{ inputs.file-suffix }}.zip
retention-days: 14
# Don't want to fail the workflow here because not all workflows have csv files
if-no-files-found: ignore
path: |
test/**/*.xml
test/**/*.csv
- name: Store Usage Logs on Github
uses: actions/upload-artifact@v3
if: inputs.use-gha
continue-on-error: true
with:
# Add the run attempt, see [Artifact run attempt]
name: usage-log-runattempt${{ github.run_attempt }}-${{ inputs.file-suffix }}.zip
retention-days: 14
if-no-files-found: ignore
path: |
usage_log.txt
test/**/*.log
if-no-files-found: error
path: test/**/*.xml

View File

@ -4,18 +4,16 @@ reviewers:
symbolic-shapes:
- ezyang
- Chillee
- wconstab
- anjali411
- albanD
- Krovatkin
- miladm
- bdhirsh
- voznesenskym
- jbschlosser
per_author:
symbolic-shapes:
- symbolic-shapes
- antoniojkim
- wconstab
- SherlockNoMad
files:
# none yet, TODO: migrate CODEOWNERS here

View File

@ -1 +0,0 @@
ebee0a27940adfbb30444d83387b9ea0f1173f40

View File

@ -1 +0,0 @@
7dd29931fa8e9bb7c970f05f8c0dc13b69e17494

View File

@ -1 +0,0 @@
5b78d074bd303eb230d30567646fcf0358ee2dd4

View File

@ -1 +0,0 @@
6635bc3f7d06c6a0d0481803b24d6ad0004b61ac

View File

@ -1 +0,0 @@
24b95f2f627bf07a61cefed653419389a7586357

Some files were not shown because too many files have changed in this diff.