Compare commits

83 Commits

Author SHA1 Message Date
29fe90e2a2 [release/1.6] [JIT] Don't include view ops in autodiff graphs (#42029)
* Don't include view ops in autodiff graphs

* skip view ops in autodiff testing

* two more tests

* appease clang format

* Pacify clang-format

Co-authored-by: eellison <eellison@fb.com>
Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
2020-07-24 13:41:32 -07:00
35ad2d8586 Revert "[jit] fix tuple alias analysis (#41992)"
This reverts commit 8aa878fc935564bdd1e4fc00d7f34381a746b504.
2020-07-24 13:32:00 -07:00
994b37b36e [release/1.6] .circleci: Don't use SCCACHE for windows release builds (#42024)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-07-24 11:15:26 -07:00
8aa878fc93 [jit] fix tuple alias analysis (#41992)
Previously when analyzing a TupleConstruct, we ignored the aliasing
information of the inputs and simply marked all elements of the returned
tuple as wildcards. But since we can fully reason about the contents of
a tuple statically, we should be able to assign them aliasing
information.

This analysis was not only incomplete but produced incorrect results,
since if `a` is not a wildcard, `a noalias wildcard`. So if we looked at
`tuple(a)` and reported the aliasing info as `tuple(wildcard)`, then
`tuple[0] noalias a`, which is...wrong.
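
A rough illustration of why reporting `tuple(a)` as `tuple(wildcard)` is unsound. This is only a plain-Python analogy, not TorchScript's alias analysis, and the names `a` and `t` are made up for the example. Element 0 of the tuple is the same object as `a`, so a write through one is visible through the other:

```python
# Plain-Python analogy only; this is not the JIT's alias analysis.
a = [1, 2, 3]          # a mutable value standing in for a tensor
t = (a,)               # analogous to TupleConstruct(a)

# The tuple element and `a` refer to the same object, i.e. they alias.
elements_alias = t[0] is a

# A write through `a` is therefore observable through the tuple.
a.append(4)
seen_through_tuple = t[0][-1]
```

An analysis that claims `tuple[0] noalias a` would allow reordering reads of `t[0]` across the write to `a`, which this example shows is not safe.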
2020-07-24 08:05:20 -07:00
7c7c9c3aa6 scatter/gather - check that inputs are of the same dimensionality (#41890)
Co-authored-by: Nikita Vedeneev <nik@quansight.com>
2020-07-22 18:33:07 -07:00
a2922f589d [1.6.0] Mark torch.set_deterministic and torch.is_deterministic as experimental (#41870)
This PR:
- renames `torch.set_deterministic` to `torch._set_deterministic`
- renames `torch.is_deterministic` to `torch._is_deterministic`
- Modifies the docstrings for both to indicate that the feature is not
yet complete.

We would like to do this because this feature is experimental and the
docstrings before this PR are misleading.

This PR does not have an accompanying change in master. That is because
there still is discussion over what the eventual state of the feature
should be: https://github.com/pytorch/pytorch/issues/15359. I expect
that there will be a better plan for this once 1.7 rolls around.

Test Plan:
- wait for CI
2020-07-22 18:32:47 -07:00
8acfecaecb [1.6] Add optimizer_for_mobile doc into python api root doc (#41491)
* Add optimizer_for_mobile doc into python api root doc

* Apply suggestions from code review

Remove all references to `optimization_blacklist` as it's missing in 1.6

Co-authored-by: Nikita Shulga <nshulga@fb.com>
2020-07-22 17:37:45 -07:00
860e18a61b Update torch.set_default_dtype doc (#41263)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41263

Test Plan: Imported from OSS

Differential Revision: D22482989

Pulled By: anjali411

fbshipit-source-id: 2aadfbb84bbab66f3111970734a37ba74d817ffd
2020-07-22 14:50:15 -07:00
8f804baaa9 Doc note for complex (#41252)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41252

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D22553266

Pulled By: anjali411

fbshipit-source-id: f6dc409da048496d72b29b0976dfd3dd6645bc4d
2020-07-22 14:49:51 -07:00
a395e0903e Autograd Doc for Complex Numbers (#41012)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41012

Test Plan: Imported from OSS

Differential Revision: D22476911

Pulled By: anjali411

fbshipit-source-id: 7da20cb4312a0465272bebe053520d9911475828
2020-07-22 14:40:52 -07:00
2ca55430d2 Add reference documentation for torch/library.h (#41470) (#41602)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41470

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D22577426

Pulled By: ezyang

fbshipit-source-id: 4bfe5806061e74181a74d161c868acb7c1ecd1e4
2020-07-22 11:10:16 -07:00
b8e77a42bd Add CUDA11 build and test (#40452) (#41543)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40452

Differential Revision: D22316007

Pulled By: malfet

fbshipit-source-id: 94f4b4ba2a46ff3d3042ba842a615f8392cdc350

Co-authored-by: Gao, Xiang <qasdfgtyuiop@gmail.com>
2020-07-22 09:53:22 -07:00
4081fdd3df Revert "port masked_select from TH to ATen and optimize perf on CPU (#33269)" (#41829)
This reverts commit fe66bdb498efe912d8b9c437a14efa4295c04fdd.

This also makes a change to THTensorEvenMoreMath because sumall was removed; see THTensor_wrap.
2020-07-22 09:52:30 -07:00
cefb9e0cd6 Update pthreadpool to pthreadpool:029c88620802e1361ccf41d1970bd5b07fd6b7bb. (#40524) (#41190)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40524

Reviewed By: ezyang

Differential Revision: D22215742

Pulled By: AshkanAliabadi

fbshipit-source-id: ef594e0901337a92b21ddd44e554da66c723eb7c
2020-07-10 09:11:32 -07:00
d9e9e0087a [v1.6] [RPC docs] Remove mention of TensorPipe's SHM and CMA backends as they're not built (#41229)
Summary:
In short, we messed up. The SHM and CMA backends of TensorPipe are Linux-specific and thus they are guarded by a #ifdef in the agent's code. Due to a mishap with CMake (due to the fact that TensorPipe has two CMake files, one for PyTorch and a "standalone" one) we were not correctly propagating some flags and these #ifdefs were always false. This means that these two backends have always been disabled and have thus never been covered by our OSS CI. It would be irresponsible to enable them now in v1.6, so instead we remove any mention of them from the docs.

Note that this is perhaps not as bad as it sounds. These two backends were providing higher performance (latency) when the two endpoints were on the same machine. However, I suspect that most RPC users will only do transfers across machines, for which SHM and CMA wouldn't have played any role.

Original PR against master: #41200 (merged as dde3d5f4a8f713ecc4649d776565b68ca75ae5c8)

Test Plan: Docs only
2020-07-10 09:02:08 -07:00
43d746305c Preserve CUDA gencode flags (#41212)
Summary:
Add `torch._C._cuda_getArchFlags()` that returns the list of architectures `torch_cuda` was compiled with
Add `torch.cuda.get_arch_list()` and `torch.cuda.get_gencode_flags()` methods that return the architecture list and gencode flags PyTorch was compiled with
Print a warning if any of the GPUs is not compatible with any of the CUBINs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41173

Differential Revision: D22459998

Pulled By: malfet

fbshipit-source-id: 65d40ae29e54a0ba0f3f2da11b821fdb4d452d95
2020-07-09 17:34:50 -07:00
9409e03903 [ONNX][1.6] Update interpolate recompute_scale_factor default (#41117)
* Update interpolate recompute_scale_factor default

* Update upsampling.h

* Update functional.py
2020-07-09 17:24:53 -07:00
c9a1853d2f [1.6] Make IterableDataset DataLoader.__len__ warning clearer (#41185)
* make IterableDataset DataLoader.__len__ warning clearer

* typo
2020-07-09 14:07:58 -07:00
7fa9b2923b quantizer.cpp: fix cuda memory pinning (#41139) (#41194)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41139

Fixes the test case in https://github.com/pytorch/pytorch/issues/41115
by using PyTorch's CUDA allocator instead of the old Caffe2 one.

Test Plan:
run the test case from the issue:
https://gist.github.com/vkuzo/6d013aa1645cb986d0d4464a931c779b

let's run CI and see what it uncovers

Imported from OSS

Reviewed By: malfet

Differential Revision: D22438787

fbshipit-source-id: 0853b0115d198a99c43e6176aef34ea951bf5c2e

Co-authored-by: Vasiliy Kuznetsov <vasiliy@fb.com>
2020-07-09 14:06:11 -07:00
40bf15a8ac Remove copy_ warnings for angle and abs for complex tensors (#41152) (#41191)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41152

fixes https://github.com/pytorch/pytorch/issues/40838

Test Plan: Imported from OSS

Differential Revision: D22444357

Pulled By: anjali411

fbshipit-source-id: 2879d0cffc0a011c624eb8e00c7b64bd33522cc3

Co-authored-by: anjali411 <chourdiaanjali123@gmail.com>
2020-07-09 13:41:15 -07:00
c164fc4d7f Patch #40883 to 1.6 release. (#41033) 2020-07-09 10:25:39 -07:00
e0b7480f34 Revert "make IterableDataset DataLoader.__len__ warning clearer (#41183)"
This reverts commit 89d7f194d8ea19f36c9afb52585a00b5b7d0ffeb.
2020-07-09 08:05:24 -07:00
89d7f194d8 make IterableDataset DataLoader.__len__ warning clearer (#41183) 2020-07-09 08:00:00 -07:00
59bb44a8e8 Add a link in RPC doc page to point to PT Distributed overview (#41108) (#41156)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41108

Test Plan: Imported from OSS

Differential Revision: D22440751

Pulled By: mrshenli

fbshipit-source-id: 9e7b002091a3161ae385fdfcc26484ae8fc243bb
2020-07-09 07:49:10 -07:00
8f4d01d9f1 Disables unary op casting to output dtype (#41097) (#41160)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41047.

Some CPU kernel implementations don't call `cast_outputs()`, so when CPU temporaries were created to hold their outputs they weren't copied back to the out parameters correctly. Instead of fixing that issue, for simplicity this PR disables the behavior. The corresponding test in test_type_promotion.py is expanded with more operations to verify that unary ops can no longer have out arguments with different dtypes than their inputs (except in special cases like torch.abs which maps complex inputs to float outputs and torch.deg2rad which is secretly torch.mul).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41097

Differential Revision: D22422352

Pulled By: mruberry

fbshipit-source-id: 8e61d34ef1c9608790b35cf035302fd226fd9421

Co-authored-by: Mike Ruberry <mruberry@devfair044.maas>
2020-07-08 22:06:48 -07:00
77ffb25925 Add guard for non-default stream in DDP's autograd engine callback (#40115) (#41151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40115

Closes https://github.com/pytorch/pytorch/issues/37790
Closes https://github.com/pytorch/pytorch/issues/37944

A user may wish to run DDP's forward + backwards step under a non-default CUDA stream such as those created by `with torch.cuda.stream(stream)`. In this case, the user should be responsible for synchronizing events on this stream with other streams used in the program (per the documentation at https://pytorch.org/docs/stable/notes/cuda.html#cuda-semantics), but currently DDP has a bug which causes DDP under non-default streams to fail.

If a user does the following:
```
model = DDP(...)
loss = model(input).sum()
loss.backward()
grad = model.module.weight.grad
average = dist.all_reduce(grad)
```

There is a chance that `average` and `grad` will not be equal. This is because the CUDA kernels corresponding to the  `all_reduce` call may run before `loss.backward()`'s kernels are finished. Specifically, in DDP we copy the allreduced gradients back to the model parameter gradients in an autograd engine callback, but this callback runs on the default stream. Note that this can also be fixed by the application synchronizing on the current stream, although this should not be expected, since the application is not using the current stream at all.

This PR fixes the issue by passing the current stream into DDP's callback.

Tested by adding a UT `test_DistributedDataParallel_non_default_stream` that fails without this PR
ghstack-source-id: 106481208

Differential Revision: D22073353

fbshipit-source-id: 70da9b44e5f546ff8b6d8c42022ecc846dff033e
2020-07-08 21:08:17 -07:00
af9600b1f5 [Caffe2] Move in-header virtual function implementation to .cc files (#41090)
* Move OperatorSchema default inference function implementations to .cc… (#40845)

Summary:
… file

This prevents the implementations of those functions (as lambdas) from being embedded as weak symbols in every shared library that includes this header.

Combination of this and https://github.com/pytorch/pytorch/pull/40844 reduces size of `libcaffe2_module_test_dynamic.so` from 500kb to 50Kb.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40845

Differential Revision: D22334779

Pulled By: malfet

fbshipit-source-id: 64706918fc2947350a58c0877f294b1b8b085455

* Move `OperatorBase::AddRelatedBlobInfo` implementation to .cc file (#40844)

Summary:
If a virtual function is implemented in a header file, its implementation will be included as a weak symbol in every shared library that includes this header, along with all of its dependencies.

This was one of the reasons why the size of libcaffe2_module_test_dynamic.so was 500Kb (the AddRelatedBlobInfo implementation pulled a quarter of libprotobuf.a with it)

Combination of this and https://github.com/pytorch/pytorch/issues/40845 reduces size of `libcaffe2_module_test_dynamic.so` from 500kb to 50Kb.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40844

Differential Revision: D22334725

Pulled By: malfet

fbshipit-source-id: 836a4cbb9f344355ddd2512667e77472546616c0
2020-07-07 21:17:11 -07:00
83262b1ba1 torch._six.PY37 should be true for Python-3.8 as well (#40868) (#41091)
Summary:
Right now it is used to check whether `math.remainder` exists, which is the case for both Python-3.7 and 3.8
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40868
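
A minimal sketch of the two ways to guard a use of `math.remainder`, assuming nothing about how `torch._six` defines `PY37` (that definition is not shown here). The helper `ieee_remainder` and its fallback formula are hypothetical and exist only for this example:

```python
import math
import sys

# Version-based check: true for 3.7 and anything newer, including 3.8.
version_at_least_37 = sys.version_info >= (3, 7)

# Feature-based check: asks directly whether the function is available.
has_remainder = hasattr(math, "remainder")


def ieee_remainder(x, y):
    """Use math.remainder when present, else fall back to a manual formula."""
    if has_remainder:
        return math.remainder(x, y)
    # Fallback (assumption for this sketch): round the quotient half-to-even,
    # which is what Python's built-in round() does.
    return x - y * round(x / y)
```

A check written as "version equals 3.7" would wrongly report the function as missing on 3.8, which is the situation the commit title describes.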

Differential Revision: D22343454

Pulled By: malfet

fbshipit-source-id: 6b6d4869705b64c4b952309120f92c04ac7e39fd
2020-07-07 17:15:01 -07:00
f862a6ba4d Remove unused Logger in get_matching_activations (#41023) (#41087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41023

Remove Logger in get_matching_activations since it's not used.
ghstack-source-id: 107237046

Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_compare_weights_lstm_dynamic'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_lstm_dynamic'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_lstm_dynamic'
buck test mode/dev caffe2/test:quantization -- 'test_compare_weights_conv_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_weights_linear_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_weights_linear_dynamic'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_conv_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_linear_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_submodule_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_functional_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_linear_dynamic'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_conv_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_linear_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_functional_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_linear_dynamic'

Differential Revision: D22394957

fbshipit-source-id: 7d59e0f35e9f4c304b8487460d48236ee6e5a872

Co-authored-by: Haixin Liu <haixin@fb.com>
2020-07-07 16:09:37 -07:00
f3c1ea7455 [PyTorch Numeric Suite] Remove unnecessary Logger in input arguments (#40890) (#41086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40890

Remove unnecessary Logger in input arguments and simplify the API.
ghstack-source-id: 107110487

Test Plan:
buck test mode/dev caffe2/test:quantization -- 'test_compare_weights_lstm_dynamic'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_lstm_dynamic'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_lstm_dynamic'
buck test mode/dev caffe2/test:quantization -- 'test_compare_weights_conv_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_weights_linear_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_weights_linear_dynamic'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_conv_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_linear_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_submodule_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_functional_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_stub_linear_dynamic'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_conv_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_linear_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_functional_static'
buck test mode/dev caffe2/test:quantization -- 'test_compare_model_outputs_linear_dynamic'

Differential Revision: D22345477

fbshipit-source-id: d8b4eb3d6cb3049aa3296dead8ba29bf5467bd1c

Co-authored-by: Haixin Liu <haixin@fb.com>
2020-07-07 16:09:11 -07:00
2ed3ad2891 fix autodoc for torch.distributed.launch (#40963) (#41089)
Summary:
The doc for `torch.distributed.launch` has been missing since v1.2.0 (see issue https://github.com/pytorch/pytorch/issues/36386) because PR https://github.com/pytorch/pytorch/issues/22501 added some imports at the first line.
542ac74987/torch/distributed/launch.py (L1-L5)
This PR moves those imports below the docstring so that Sphinx autodoc works normally.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40963

Differential Revision: D22380816

Pulled By: mrshenli

fbshipit-source-id: ee8406785b9a198bbf3fc65e589854379179496f

Co-authored-by: Xin Yao <yaox12@outlook.com>
2020-07-07 14:23:31 -07:00
a857af50a4 [quant][graphmode][fix] cloning schema in insert_observers (#40624) (#40934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40624

Previously we didn't clone the schema, so the default schema was used, which
caused issues for some models

Test Plan: Imported from OSS

Differential Revision: D22259519

fbshipit-source-id: e2a393a54cb18f55da0c7152a74ddc22079ac350
2020-07-07 13:27:36 -07:00
d0045e5520 Some fixes for graph mode quantization (#40935)
* [quant] aten::repeat work for quantized tensor (#40644)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40644

Test Plan: Imported from OSS

Differential Revision: D22268558

fbshipit-source-id: 3bc9a129bece1b547c519772ecc6b980780fb904

* [quant][graphmode][fix] remove unsupported ops in the list (#40653)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40653

(Note: this ignores all push blocking failures!)

Test Plan: Imported from OSS

Differential Revision: D22271413

fbshipit-source-id: a01611b5d90849ac673fa5a310f910c858e907a3
2020-07-07 13:26:27 -07:00
0406b69b79 [quant][graphmode][fix] Fold conv bn (#40865) (#40970)
* [quant][graphmode][fix] Fold conv bn (#40865)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40865

1. applied filter for the module types
2. removed the assumption that the conv bn are immediate child of parent module

Test Plan:
python test/test_quantization.py TestQuantizeJitPasses

Imported from OSS

Differential Revision: D22338074

fbshipit-source-id: 64739a5e56c0a74249a1dbc2c8454b88ec32aa9e

* [quant][graphmode][fix] Print the node in error message (#40889)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40889

Test Plan: Imported from OSS

Differential Revision: D22348266

fbshipit-source-id: eed2ece5c94fcfaf187d6770bed4a7109f0c0b4a
2020-07-07 13:25:39 -07:00
6220cc4380 [quant][graphmode][fix] dequantize propagation for {add/mul}_scalar + aten::repeat (#40933)
* [quant][graphmode][fix] dequantize propagation for {add/mul}_scalar (#40596)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40596

Previously the fusion patterns for {add/mul}_scalar were inconsistent, since the op pattern
produces a non-quantized tensor and the op replacement graph produces a quantized tensor

Test Plan: Imported from OSS

Differential Revision: D22251072

fbshipit-source-id: e16eb92cf6611578cca1ed8ebde961f8d0610137

* [quant][graphmode] Support quantization for `aten::append` (#40743)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40743

`aten::append` modifies its input in place and its output is ignored. Such ops are not
supported right now, so we first need to make `aten::append` non-inplace
by changing
```
ignored = aten::append(list, x)
```
to
```
x_list = aten::ListConstruct(x)
result = aten::add(list, x_list)
```
and then quantize the aten::add instead.
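
The rewrite above can be pictured with ordinary Python lists. This is only an analogy under the assumption that `aten::add` on lists behaves like list concatenation; the function names are made up for the sketch and are not part of the quantization pass:

```python
def append_inplace(lst, x):
    # Mirrors `ignored = aten::append(list, x)`: mutates the input list.
    lst.append(x)


def append_functional(lst, x):
    # Mirrors ListConstruct followed by add: builds a new list and
    # leaves the input untouched.
    x_list = [x]
    return lst + x_list


original = [1, 2]
result = append_functional(original, 3)
```

After the functional form runs, `original` is unchanged and `result` holds the extended list, so there is an explicit output value for a later pass to work with.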

Test Plan:
TestQuantizeJitOps.test_general_shape_ops

Imported from OSS

Differential Revision: D22302151

fbshipit-source-id: 931000388e7501e9dd17bec2fad8a96b71a5efc5
2020-07-07 13:25:02 -07:00
eaf3f2fd34 Added index_put to promotelist (#41036)
* Added index_put to promotelist

* docstring

Co-authored-by: Michael Carilli <mcarilli@nvidia.com>
2020-07-07 13:00:32 -07:00
c35b4c770b Bucket of shape analysis fixes (#41044)
* [JIT] fix unfold shape analysis (#40749)

Summary:
unfold on a 0-dimensioned tensor returns a 1-dim tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40749

Differential Revision: D22361481

Pulled By: eellison

fbshipit-source-id: 621597e5f97f6e39953eb86f8b85bb4142527a9f

* shape analysis fix for default dtype

ghstack-source-id: 723aa27c2685417715a0891f5ca1ae885d4c9832
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40938

* fix grad thrashing of shape analysis

ghstack-source-id: dd8742b1da52d17e9d6ab6c81ff0b27520b09417
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40939

Co-authored-by: Elias Ellison <eellison@fb.com>
2020-07-07 12:59:47 -07:00
11baccf1b5 [release/1.6] .circleci: Output binary sizes, store binaries (#41075)
We need an easy way to quickly visually grep binary sizes from builds
and then have a way to test out those binaries quickly.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
(cherry picked from commit 66813515d4dec66f319442ba967c64b87c0286cd)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-07-07 11:27:00 -07:00
f0f0cbdd4a Docstring changes for dynamic quantized classes (#40931) (#41032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40931

Fix docstrings for dynamic quantized Linear/LSTM and associated classes
ghstack-source-id: 107064446

Test Plan: Docs show up correctly

Differential Revision: D22360787

fbshipit-source-id: 8e357e081dc59ee42fd7f12ea5079ce5d0cc9df2
2020-07-06 21:37:53 -07:00
11b70b0041 [JIT] Switch executor from Simple to Legacy. (#41017)
* properly skip legacy tests regardless of the default executor (#40381)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40381

Differential Revision: D22173938

Pulled By: Krovatkin

fbshipit-source-id: 305fc4484977e828cc4cee6e053a1e1ab9f0d6c7

* [JIT] Switch executor from Simple to Legacy.

This is done for 1.6 only in order to recover performance regressions
caused by the Legacy->Simple switch that was done in 1.5. On master we
still plan to use Simple executor and fix the performance issues in 1.7
without falling back to the Legacy executor.

Co-authored-by: Nikolay Korovaiko <korovaikon@gmail.com>
2020-07-06 21:35:02 -07:00
01e9562313 [1.6 cherrypick] Fix delegating to jit.load from torch.load (#41013) 2020-07-06 16:55:00 -07:00
3f13c9a2c8 infer tensor properties based on an input tensor rather than defaults for xxx_like ctors (#40895) (#41016)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40895

Reviewed By: eellison

Differential Revision: D22358878

Pulled By: Krovatkin

fbshipit-source-id: 2db2429aa89c180d8e52a6bb1265308483da46a2
2020-07-06 16:52:59 -07:00
63a94c021a shape inference of undefined for prim::grad (#40866) (#41015)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40866

Reviewed By: pbelevich

Differential Revision: D22358988

Pulled By: Krovatkin

fbshipit-source-id: 7118d7f8d4eaf056cfb71dc0d588d38b1dfb0fc7
2020-07-06 16:51:37 -07:00
2b175ba909 update requires_grad on loop inputs correctly (master) (#40926) (#41014)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40926

Reviewed By: eellison

Differential Revision: D22359471

Pulled By: Krovatkin

fbshipit-source-id: 823e87674e2d2917f075255ec926e0485972f4e2
2020-07-06 16:30:14 -07:00
8c3f662224 Update FP16 to FP16:4dfe081cf6bcd15db339cf2680b9281b8451eeb3. (#40956) 2020-07-06 06:59:41 -07:00
0ffdd5aa1d Update cpuinfo to cpuinfo:63b254577ed77a8004a9be6ac707f3dccc4e1fd9. (#40955) 2020-07-06 06:59:30 -07:00
d53427c541 Update FXdiv to FXdiv:b408327ac2a15ec3e43352421954f5b1967701d1. (#40954) 2020-07-06 06:59:17 -07:00
b44b1d868e Update psimd to psimd:072586a71b55b7f8c584153d223e95687148a900 (#40953) 2020-07-06 06:59:01 -07:00
9184c9832e Re-apply PyTorch pthreadpool changes (#40951)
* Re-apply PyTorch pthreadpool changes

Summary:
This re-applies D21232894 (b9d3869df3) and D22162524, plus updates jni_deps in a few places
to avoid breaking host JNI tests.

Test Plan: `buck test @//fbandroid/mode/server //fbandroid/instrumentation_tests/com/facebook/caffe2:host-test`

Reviewed By: xcheng16

Differential Revision: D22199952

fbshipit-source-id: df13eef39c01738637ae8cf7f581d6ccc88d37d5

* Enable XNNPACK ops on iOS and macOS.

Test Plan: buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/pytext/pytext_mobile_inference.json --platform ios --framework pytorch --remote --devices D221AP-12.0.1

Reviewed By: xta0

Differential Revision: D21886736

fbshipit-source-id: ac482619dc1b41a110a3c4c79cc0339e5555edeb

* Respect user set thread count. (#40707)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40707

Test Plan: Imported from OSS

Differential Revision: D22318197

Pulled By: AshkanAliabadi

fbshipit-source-id: f11b7302a6e91d11d750df100d2a3d8d96b5d1db

* Fix and reenable threaded QNNPACK linear (#40587)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40587

Previously, this was causing divide-by-zero only in the multithreaded
empty-batch case, while calculating tiling parameters for the threads.
In my opinion, the bug here is using a value that is allowed to be zero
(batch size) for an argument that should not be zero (tile size), so I
fixed the bug by bailing out right before the call to
pthreadpool_compute_4d_tiled.
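
A minimal sketch of the bail-out described above, written in Python rather than the C++ of the actual fix. The function `plan_tiles`, its parameters, and the choice of tile size are hypothetical. The only point is that the zero check happens before any arithmetic that divides by a tile size:

```python
def divide_round_up(n, d):
    # Ceiling division; raises ZeroDivisionError when d == 0.
    return (n + d - 1) // d


def plan_tiles(batch_size, output_channels, channel_tile=8):
    """Return (row_tiles, col_tiles), or None when there is no work to do."""
    if batch_size == 0:
        # Bail out before a zero batch size can be used as a tile size.
        return None
    row_tile = batch_size  # hypothetical: tile size derived from batch size
    row_tiles = divide_round_up(batch_size, row_tile)
    col_tiles = divide_round_up(output_channels, channel_tile)
    return (row_tiles, col_tiles)
```

Without the early return, `plan_tiles(0, 20)` would reach `divide_round_up(0, 0)` and fail.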

Test Plan: TestQuantizedOps.test_empty_batch

Differential Revision: D22264414

Pulled By: dreiss

fbshipit-source-id: 9446d5231ff65ef19003686f3989e62f04cf18c9

* Fix batch size zero for QNNPACK linear_dynamic (#40588)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40588

Two bugs were preventing this from working.  One was a divide by zero
when multithreading was enabled, fixed similarly to the fix for static
quantized linear in the previous commit.  The other was computation of
min and max to determine qparams.  FBGEMM uses [0,0] for [min,max] of
empty input, so do the same here.
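
The empty-input convention can be sketched as follows. The helper name `min_max_for_qparams` is an assumption for illustration and does not correspond to a function in QNNPACK or FBGEMM:

```python
def min_max_for_qparams(values):
    """Return (min, max) of the input, using (0.0, 0.0) for empty input."""
    if len(values) == 0:
        # Python's min()/max() raise ValueError on an empty sequence,
        # so the empty case needs an explicit convention.
        return (0.0, 0.0)
    return (min(values), max(values))
```

For example, `min_max_for_qparams([])` yields `(0.0, 0.0)` instead of raising, while non-empty inputs are unaffected.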

Test Plan: Added a unit test.

Differential Revision: D22264415

Pulled By: dreiss

fbshipit-source-id: 6ca9cf48107dd998ef4834e5540279a8826bc754

Co-authored-by: David Reiss <dreiss@fb.com>
2020-07-06 06:58:25 -07:00
e89c4f0dec [quant] Fix fuse linear pass (#40549) (#40751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40549

Previously we didn't check whether %weight_t is produced by `aten::t`, so the pass would fuse some `matmul`/`addmm` calls that are
not 2d into `aten::linear`, which is incorrect

Test Plan: Imported from OSS

Differential Revision: D22225921

fbshipit-source-id: 9723e82fdbac6d8e1a7ade22f3a9791321ab12b6
2020-07-02 10:23:22 -07:00
ea273c68f9 Inplace construct of TorchScript Module and inplace option for quantization (#40750)
* [WIP][JIT] Add ScriptModule._reconstruct (#39979)

Summary:
**Summary**
This commit adds an instance method `_reconstruct` that permits users
to reconstruct a `ScriptModule` from a given C++ `Module` instance.

**Testing**
This commit adds a unit test for `_reconstruct`.

**Fixes**
This pull request fixes https://github.com/pytorch/pytorch/issues/33912.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39979

Differential Revision: D22172323

Pulled By: SplitInfinity

fbshipit-source-id: 9aa6551c422a5a324b822a09cd8d7c660f99ca5c

* [quant][graphmode] Enable inplace option for top level API (#40414)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40414

Now that `_reconstruct` is supported in RecursiveScriptModule (https://github.com/pytorch/pytorch/pull/39979),
we can support the inplace option in the quantization API

Test Plan: Imported from OSS

Differential Revision: D22178326

fbshipit-source-id: c78bc2bcf2c42b06280c12262bb31aebcadc6c32

Co-authored-by: Meghan Lele <meghanl@fb.com>
2020-07-02 10:22:45 -07:00
4dd37bfbf7 [jit] Remove unnecessary clone APIs for script::Module and RecursiveScriptModule (#40297) (#40748)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40297

Test Plan: Imported from OSS

Differential Revision: D22191660

fbshipit-source-id: 4b338ca82caaca04784bffe01fdae3d180c192f4
2020-07-02 10:22:27 -07:00
2533b9da83 Fix complex printing for sci_mode=True (#40513) (#40919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40513

This PR makes the following changes:
1. Complex Printing now uses print formatting for its real and imaginary values and they are joined at the end.
2. Adding 1. naturally fixes the printing of complex tensors in sci_mode=True

```
>>> torch.tensor(float('inf')+float('inf')*1j)
tensor(nan+infj)
>>> torch.randn(2000, dtype=torch.cfloat)
tensor([ 0.3015-0.2502j, -1.1102+1.2218j, -0.6324+0.0640j,  ...,
        -1.0200-0.2302j,  0.6511-0.1889j, -0.1069+0.1702j])
>>> torch.tensor([1e-3, 3+4j, 1e-5j, 1e-2+3j, 5+1e-6j])
tensor([1.0000e-03+0.0000e+00j, 3.0000e+00+4.0000e+00j, 0.0000e+00+1.0000e-05j,
        1.0000e-02+3.0000e+00j, 5.0000e+00+1.0000e-06j])
>>> torch.randn(3, dtype=torch.cfloat)
tensor([ 1.0992-0.4459j,  1.1073+0.1202j, -0.2177-0.6342j])
>>> x = torch.tensor([1e2, 1e-2])
>>> torch.set_printoptions(sci_mode=False)
>>> x
tensor([  100.0000,     0.0100])
>>> x = torch.tensor([1e2, 1e-2j])
>>> x
tensor([100.+0.0000j,   0.+0.0100j])
```

Test Plan: Imported from OSS

Differential Revision: D22309294

Pulled By: anjali411

fbshipit-source-id: 20edf9e28063725aeff39f3a246a2d7f348ff1e8

Co-authored-by: anjali411 <chourdiaanjali123@gmail.com>
2020-07-02 09:45:35 -07:00
c5c8a85a82 If ninja is being used, force build_ext to run. (#40881)
As ninja has accurate dependency tracking, if there is nothing to do,
then we will very quickly noop.  But this is important for correctness:
if a change was made to a header that is not listed explicitly in
the distutils Extension, then distutils will come to the wrong
conclusion about whether or not recompilation is needed (but Ninja
will work it out.)

This caused https://github.com/pytorch/vision/issues/2367

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

ghstack-source-id: 6409595c8ac091f3863f305c123266b9d3a167ad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40837
2020-07-02 08:05:25 -07:00
b4b8f5b9d4 Release GIL during DDP construction. (#40877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40495

As part of debugging flaky ddp_under_dist_autograd tests, I realized
we were running into the following deadlock.

1) Rank 0 would go into DDP construction, hold GIL and wait for broadcast in
DDP construction.
2) Rank 3 is a little slower and performs an RRef fetch call before the DDP
construction.
3) The RRef fetch call is done on Rank 0 and tries to acquire GIL.
4) We now have a deadlock since Rank 0 is waiting for Rank 3 to enter the
collective and Rank 3 is waiting for Rank 0 to release GIL.
ghstack-source-id: 106534442

Test Plan:
1) Ran ddp_under_dist_autograd 500 times.
2) waitforbuildbot

Differential Revision: D22205180

fbshipit-source-id: 6afd55342e801b9edb9591ff25158a244a8ea66a

Co-authored-by: Pritam Damania <pritam.damania@fb.com>
2020-07-01 13:36:50 -07:00
41816dc97f [1.6] Fix dictConstruct ordering and enable dict mix (#40797)
A combination of https://github.com/pytorch/pytorch/pull/39601 and
https://github.com/pytorch/pytorch/pull/40424 both are approved and
merged in master
2020-07-01 09:30:16 -07:00
31d9776c04 [1.6] fix autograd doc subsubsection display issue (#40796)
Master branch PR: https://github.com/pytorch/pytorch/pull/40582
2020-07-01 09:28:25 -07:00
ddea6c552f Ports full dtype inference deprecation to 1.6 (#40799)
* ports full deprecation

* fixtures

* Fixes lint

* Trying to fix phantom lint issue

* nuclear lint option

* Paradoxical linter fix

Co-authored-by: Mike Ruberry <mruberry@devfair044.maas>
2020-07-01 09:27:27 -07:00
091537a764 [JIT][1.6] Shape analysis fixes. (#40716)
* [JIT] Update type of the unsqueeze's output in shape analysis.

* [JIT] Fix shape analysis for aten::masked_select.

The reference says that this op always returns a 1-D tensor, even if
the input and the mask are 0-D.
2020-07-01 08:41:05 -07:00
bf4d905ea1 Fix wrong MSVC version constraint for CUDA 9.2 (#40794) (#40849)
Summary:
Tested with https://github.com/pytorch/pytorch/pull/40782.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40794

Differential Revision: D22318045

Pulled By: malfet

fbshipit-source-id: a737ffd7cb8a6a9efb62b84378318f4c3800ad8f
2020-07-01 08:37:40 -07:00
415e499330 Fix zip serialization for file > 2GiB for Windows (#40852) 2020-07-01 08:36:40 -07:00
eaf7dad5d6 [1.6 cherrypick] Support Pathlike for zipfile serialization (#40793) 2020-06-30 10:38:00 -07:00
75a074abdc 1.6 Port: Dynamic Versioning (#40542)
Co-authored-by: Mike Ruberry <mruberry@devfair044.maas>
2020-06-30 10:18:18 -07:00
dede34eab7 [1.6 cherrypick] Doc fix for complex views
Cherry-pick of https://github.com/pytorch/pytorch/pull/40450

Test Plan: Imported from OSS
2020-06-30 09:37:02 -07:00
0c90b6da5c [1.6 cherrypick] Fix zip serialization for file > 2GiB (#40757)
* [1.6 cherrypick] Fix zip serialization for file > 2GiB

* Update test/test_serialization.py

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
2020-06-30 07:10:02 -07:00
4316199832 Add examples and tests for combining static/class method with async execution (#40619) (#40688)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40619

Test Plan: Imported from OSS

Differential Revision: D22258407

Pulled By: mrshenli

fbshipit-source-id: 036d85a2affc4505efd2df197fc513dba010e359
2020-06-29 19:34:23 -07:00
f993e5ac88 [1.6] Update TensorPipe submodule (#40634)
Upstream PR: #40614

Summary:
This update pulls in a oneliner fix, which sets the TCP_NODELAY option on the TCP sockets of the UV transport. This leads to exceptional performance gains in terms of latency, with about a 25x improvement in one simple benchmark. This thus resolves a regression that TensorPipe had compared to the ProcessGroup agent and, in fact, ends up beating it by 2x.

The benchmark I ran is this, with the two endpoints pinned to different cores of the same machine:
```
@torch.jit.script
def remote_fn(t: int):
    return t

@torch.jit.script
def local_fn():
    for _ in range(1_000_000):
        fut = rpc.rpc_async("rhs", remote_fn, (42,))
        fut.wait()
```

And the average round-trip time (one iteration) is:
- TensorPipe with SHM: 97.2 us
- TensorPipe with UV _after the fix_: 205us
- Gloo: 440us
- TensorPipe with UV _before the fix_: 5ms

Test Plan: Ran PyTorch RPC test suite
2020-06-29 19:33:32 -07:00
c5bd737f0c [JIT] script if tracing fix (#40468) (#40572)
Summary:
Currently, torchvision annotates `batched_nms` with `torch.jit.script` so the function gets compiled when it is traced and ONNX will work. Unfortunately, this means we are eagerly compiling batched_nms, which fails if torchvision isn't built with `torchvision.ops.nms`. As a result, torchvision doesn't work on torch hub right now.

`_script_if_tracing` could solve our problem here, but right now it does not correctly interact with recursive compilation. This PR fixes that bug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40468

Reviewed By: jamesr66a

Differential Revision: D22195771

Pulled By: eellison

fbshipit-source-id: 83022ca0bab6d389a48a478aec03052c9282d2b7

Co-authored-by: Elias Ellison <eellison@fb.com>
2020-06-29 19:30:41 -07:00
fe45c2c986 Allow slicing sequential container (#40538)
- fixes #38034
- works around missing slice functionality in Sequential
  by casting to tuple and slicing that instead
- supports iterating on the resulting slice but not call()
2020-06-29 19:29:19 -07:00
a9996bb482 Fixes caffe2 loading issues on Windows (#39513) (#40487)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/27840#issuecomment-638715422.
Contains a bunch of fixes (https://github.com/pytorch/pytorch/pull/39376 + https://github.com/pytorch/pytorch/pull/39334 + https://github.com/pytorch/pytorch/pull/38302 + https://github.com/pytorch/pytorch/pull/35362)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39513

Differential Revision: D22190761

Pulled By: malfet

fbshipit-source-id: b2d52f6cb16c233d16071e9c0670dfff7da2710e
(cherry picked from commit e2201e2ed8ed7bf9c6226f8c484192949d94c248)
2020-06-29 19:17:34 -07:00
bdfcbfa18c [release/1.6] .jenkins: Install torch from test channel (#40706)
We're on a test branch so we should install from the test channel

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-06-29 13:53:14 -07:00
ea1b0dba18 Remove constexpr for NVCC on Windows (#40676) 2020-06-29 13:48:50 -07:00
6d85b2c989 Pin XLA CI to use r1.6 release branch. (#40721) 2020-06-29 13:41:14 -07:00
44f79651a7 Tweak file_diff_from_base for release/1.6 branch (#40712) 2020-06-29 11:41:46 -07:00
8682ac147b Docs merge (#40569)
Co-authored-by: Elias Ellison <eellison@fb.com>
2020-06-26 12:24:08 -07:00
4cc605e80a (1.6) Update docs feature classifications (#40539)
Co-authored-by: Eli Uriegas <1700823+seemethere@users.noreply.github.com>
2020-06-26 12:23:02 -07:00
b0cce716f7 Add beta warning for quant docs (#40540)
Add a beta warning to match stable and master docs: https://github.com/pytorch/pytorch/blob/master/docs/source/quantization.rst
2020-06-26 12:20:06 -07:00
0dc93ac119 [v1.6.0 patch] Install method docstrings from PyRRef to RRef (#40620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40461

It turned out `:inherited-members:` (see [doc](https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#directive-autoclass)) is not really usable.

This is because pybind11 generates a docstring that gives the type of `self` as the parent class, `rpc.PyRRef`.

As a workaround, I am pulling the docstrings of the parent class, `PyRRef`, into the subclass, `RRef`, and doing surgery on the docstrings generated by pybind11.

{F241283111}

ghstack-source-id: 106472496

P134031188

Differential Revision: D7933834

fbshipit-source-id: c03a8a4c9d98888b64492a8caba1591595bfe247

Co-authored-by: Shihao Xu <shihaoxu@fb.com>
2020-06-26 12:15:28 -07:00
bb848df10b [1.6] Remove table of contents at the top of rpc.rst (#40482)
Master PR: https://github.com/pytorch/pytorch/pull/40205

Remove the table of contents created by the `.. contents:: :local: :depth: 2` since this page isn't one of the large documentation pages (https://github.com/pytorch/pytorch/issues/38010) and is simply a landing page for the Distributed RPC Framework.

Changes made in this original PR: f10fbcc820 (diff-250b9b23fd6f1a5c15aecdb72afb9d7d)
2020-06-26 08:37:49 -07:00
2dc0b84aca Skip test_mem_leak on Windows (#40498)
(cherry picked from commit 3fb6f038256a3a5ce43e857409ce4ffb807d93a5)
2020-06-25 16:45:48 -07:00
168cddf5f1 .circleci: Fix upload to backup directory
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-06-23 20:57:42 -07:00
bc8760b3db .circleci: Fix pip installation of awscli
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-06-23 19:05:48 -07:00
4269b9a8fc .circleci: Fix backup uploads
awscli was not loaded on conda builds and the backup upload did not work
since it was a recursive copy instead of just specifically copying what
we want.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2020-06-23 18:18:06 -07:00
8929 changed files with 315827 additions and 1491405 deletions


@ -1,27 +1,3 @@
build --cxxopt=--std=c++14
build --copt=--std=c++14
build --copt=-I.
# Bazel does not support including its cc_library targets as system
# headers. We work around this for generated code
# (e.g. c10/macros/cmake_macros.h) by making the generated directory a
# system include path.
build --copt=-isystem --copt bazel-out/k8-fastbuild/bin
build --copt=-isystem --copt bazel-out/darwin-fastbuild/bin
build --experimental_ui_max_stdouterr_bytes=2048576
# Configuration to disable tty features for environments like CI
build:no-tty --curses no
build:no-tty --progress_report_interval 10
build:no-tty --show_progress_rate_limit 10
# Configuration to build with GPU support
build:gpu --define=cuda=true
# define a separate build folder for faster switching between configs
build:gpu --platform_suffix=-gpu
# See the note on the config-less build for details about why we are
# doing this. We must also do it for the "-gpu" platform suffix.
build --copt=-isystem --copt=bazel-out/k8-fastbuild-gpu/bin
# rules_cuda configuration
build:gpu --@rules_cuda//cuda:enable_cuda
build:gpu --@rules_cuda//cuda:cuda_targets=sm_52
build:gpu --@rules_cuda//cuda:compiler=nvcc
build:gpu --repo_env=CUDA_PATH=/usr/local/cuda


@ -1 +1 @@
4.2.1
3.1.0


@ -1,15 +0,0 @@
[buildfile]
name = BUILD.buck
[repositories]
bazel_skylib = third_party/bazel-skylib/
[download]
in_build = true
[cxx]
cxxflags = -std=c++17
should_remap_host_platform = true
[project]
default_flavors_mode=all

504
.circleci/README.md Normal file

@ -0,0 +1,504 @@
Structure of CI
===============
setup job:
1. Does a git checkout
2. Persists CircleCI scripts (everything in `.circleci`) into a workspace. Why?
We don't always do a Git checkout on all subjobs, but we usually
still want to be able to call scripts one way or another in a subjob.
Persisting files this way lets us have access to them without doing a
checkout. This workspace is conventionally mounted on `~/workspace`
(this is distinguished from `~/project`, which is the conventional
working directory that CircleCI will default to starting your jobs
in.)
3. Write out the commit message to `.circleci/COMMIT_MSG`. This is so
we can determine in subjobs if we should actually run the jobs or
not, even if there isn't a Git checkout.
CircleCI configuration generator
================================
One may no longer make changes to the `.circleci/config.yml` file directly.
Instead, one must edit these Python scripts or files in the `verbatim-sources/` directory.
Usage
----------
1. Make changes to these scripts.
2. Run the `regenerate.sh` script in this directory and commit the script changes and the resulting change to `config.yml`.
You'll see a build failure on TravisCI if the scripts don't agree with the checked-in version.
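The consistency check described above can be sketched roughly as follows. This is a minimal illustration, not the actual generator: `generate_config`, `config_drift`, and the job table are hypothetical names, and the YAML-like output is simplified so the example needs only the standard library.

```python
import difflib


def generate_config(jobs):
    """Render a (hypothetical) job table into YAML-like text, in a stable order."""
    lines = ["# GENERATED FILE - do not edit by hand", "jobs:"]
    for name in sorted(jobs):
        lines.append(f"  {name}:")
        lines.append(f"    command: {jobs[name]}")
    return "\n".join(lines) + "\n"


def config_drift(checked_in_text, jobs):
    """Return a unified diff between the checked-in text and a fresh render.

    An empty list means the checked-in file agrees with the generator.
    """
    fresh = generate_config(jobs)
    return list(
        difflib.unified_diff(
            checked_in_text.splitlines(),
            fresh.splitlines(),
            fromfile="checked-in",
            tofile="generated",
            lineterm="",
        )
    )


jobs = {"build": "make", "test": "make test"}
checked_in = generate_config(jobs)

# Simulate editing the scripts without regenerating the checked-in file.
jobs["upload"] = "make upload"
for line in config_drift(checked_in, jobs):
    print(line)
```

A CI job built on this idea would fail whenever the diff is non-empty, which is the kind of failure the paragraph above warns about.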
Motivation
----------
These scripts establish a single, authoritative source of documentation for the CircleCI configuration matrix.
The documentation, in the form of diagrams, is automatically generated and cannot drift out of sync with the YAML content.
Furthermore, consistency is enforced within the YAML config itself, by using a single source of data to generate
multiple parts of the file.
* Facilitates one-off culling/enabling of CI configs for testing PRs on special targets
Also see https://github.com/pytorch/pytorch/issues/17038
Future direction
----------------
### Declaring sparse config subsets
See comment [here](https://github.com/pytorch/pytorch/pull/17323#pullrequestreview-206945747):
In contrast with a full recursive tree traversal of configuration dimensions,
> in the future future I think we actually want to decrease our matrix somewhat and have only a few mostly-orthogonal builds that taste as many different features as possible on PRs, plus a more complete suite on every PR and maybe an almost full suite nightly/weekly (we don't have this yet). Specifying PR jobs in the future might be easier to read with an explicit list when we come to this.
----------------
----------------
# How do the binaries / nightlies / releases work?
### What is a binary?
A binary or package (used interchangeably) is a pre-built collection of c++ libraries, header files, python bits, and other files. We build these and distribute them so that users do not need to install from source.
A **binary configuration** is a collection of
* release or nightly
    * releases are stable, nightlies are beta and built every night
* python version
    * linux: 3.5m, 3.6m, 3.7m (mu is wide unicode or something like that. It usually doesn't matter but you should know that it exists)
    * macos: 3.6, 3.7, 3.8
    * windows: 3.6, 3.7, 3.8
* cpu version
    * cpu, cuda 9.0, cuda 10.0
    * The supported cuda versions occasionally change
* operating system
    * Linux - these are all built on CentOS. There haven't been any problems in the past building on CentOS and using on Ubuntu
    * MacOS
    * Windows - these are built on Azure pipelines
* devtoolset version (gcc compiler version)
    * This only matters on Linux because only Linux uses gcc. tldr is gcc made a backwards incompatible change from gcc 4.8 to gcc 5, because it had to change how it implemented std::vector and std::string
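Because a binary configuration is one choice along each of these dimensions, the full set is a cross product. The sketch below shows that idea only; the dimension values and the `os_pkg_python_variant` naming are assumptions made for illustration, loosely modelled on names like `linux_conda_3.7_cpu` used later in this document.

```python
from itertools import product

# Hypothetical dimension values, for illustration only.
PACKAGE_TYPES = ["conda", "manywheel", "libtorch"]
PYTHON_VERSIONS = ["3.6", "3.7"]
COMPUTE_VARIANTS = ["cpu", "cu90", "cu100"]


def binary_configurations(os_name):
    """Return one name per (package type, python version, compute variant)."""
    return [
        f"{os_name}_{pkg}_{py}_{variant}"
        for pkg, py, variant in product(
            PACKAGE_TYPES, PYTHON_VERSIONS, COMPUTE_VARIANTS
        )
    ]


configs = binary_configurations("linux")
print(len(configs))  # 3 package types * 2 python versions * 3 variants
print(configs[0])
```

Every extra value along one dimension multiplies the number of jobs, which is why the supported versions are kept short.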
### Where are the binaries?
The binaries are built in CircleCI. There are nightly binaries built every night at 9pm PST (midnight EST) and release binaries corresponding to Pytorch releases, usually every few months.
We have 3 types of binary packages
* pip packages - nightlies are stored on s3 (pip install -f <a s3 url>). releases are stored in a pip repo (pip install torch) (ask Soumith about this)
* conda packages - nightlies and releases are both stored in a conda repo. Nightly packages have a '_nightly' suffix
* libtorch packages - these are zips of all the c++ libraries, header files, and sometimes dependencies. These are c++ only
    * shared with dependencies (the only supported option for Windows)
    * static with dependencies
    * shared without dependencies
    * static without dependencies
All binaries are built in CircleCI workflows except Windows. There are checked-in workflows (committed into the .circleci/config.yml) to build the nightlies every night. Releases are built by manually pushing a PR that builds the suite of release binaries (overwrite the config.yml to build the release)
# CircleCI structure of the binaries
Some quick vocab:
* A **workflow** is a CircleCI concept; it is a DAG of '**jobs**'. ctrl-f 'workflows' on https://github.com/pytorch/pytorch/blob/master/.circleci/config.yml to see the workflows.
* **jobs** are a sequence of '**steps**'
* **steps** are usually just a bash script or a builtin CircleCI command. *All steps run in new environments; environment variables declared in one script DO NOT persist to following steps.*
* CircleCI has a **workspace**, which is essentially a cache between steps of the *same job* in which you can store artifacts between steps.
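The point about environment variables not persisting between steps can be reproduced locally. The sketch below is not CircleCI code: each "step" is simulated as a fresh Python subprocess, and the variable name `MY_VAR` and the env-file format are assumptions chosen for the demonstration.

```python
import os
import subprocess
import sys
import tempfile

STEP_SETS_VAR = "import os; os.environ['MY_VAR'] = 'hello'"
STEP_READS_VAR = "import os; print(os.environ.get('MY_VAR', 'unset'))"


def run_step(code, extra_env=None):
    """Run one simulated step in a fresh process and return its stdout."""
    env = {k: v for k, v in os.environ.items() if k != "MY_VAR"}
    env.update(extra_env or {})
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Step 1 sets the variable, but only inside its own process.
run_step(STEP_SETS_VAR)
without_persist = run_step(STEP_READS_VAR)

# To share a value, write it to a file and load it in the next step.
with tempfile.TemporaryDirectory() as workdir:
    env_file = os.path.join(workdir, "env")
    with open(env_file, "w") as f:
        f.write("MY_VAR=hello\n")
    with open(env_file) as f:
        loaded = dict(line.strip().split("=", 1) for line in f if "=" in line)
    with_persist = run_step(STEP_READS_VAR, loaded)

print(without_persist, with_persist)
```

Writing values to a file that later steps read back is the same pattern a step has to use when it wants to hand settings to the steps that follow it.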
## How are the workflows structured?
The nightly binaries have 3 workflows. We have one job (actually 3 jobs: build, test, and upload) per binary configuration
1. binarybuilds
    1. every day midnight EST
    2. linux: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/linux-binary-build-defaults.yml
    3. macos: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/macos-binary-build-defaults.yml
    4. For each binary configuration, e.g. linux_conda_3.7_cpu there is a
        1. binary_linux_conda_3.7_cpu_build
            1. Builds the build. On linux jobs this uses the 'docker executor'.
            2. Persists the package to the workspace
        2. binary_linux_conda_3.7_cpu_test
            1. Loads the package to the workspace
            2. Spins up a docker image (on Linux), mapping the package and code repos into the docker
            3. Runs some smoke tests in the docker
            4. (Actually, for macos this is a step rather than a separate job)
        3. binary_linux_conda_3.7_cpu_upload
            1. Logs in to aws/conda
            2. Uploads the package
2. update_s3_htmls
    1. every day 5am EST
    2. https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/binary_update_htmls.yml
    3. See below for what these are for and why they're needed
    4. Three jobs that each examine the current contents of aws and the conda repo and update some html files in s3
3. binarysmoketests
    1. every day
    2. https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-build-smoke-tests-defaults.yml
    3. For each binary configuration, e.g. linux_conda_3.7_cpu there is a
        1. smoke_linux_conda_3.7_cpu
            1. Downloads the package from the cloud, e.g. using the official pip or conda instructions
            2. Runs the smoke tests
## How are the jobs structured?
The jobs are in https://github.com/pytorch/pytorch/tree/master/.circleci/verbatim-sources . Jobs are made of multiple steps. There are some shared steps used by all the binaries/smokes. Steps of these jobs are all delegated to scripts in https://github.com/pytorch/pytorch/tree/master/.circleci/scripts .
* Linux jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/linux-binary-build-defaults.yml
    * binary_linux_build.sh
    * binary_linux_test.sh
    * binary_linux_upload.sh
* MacOS jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/macos-binary-build-defaults.yml
    * binary_macos_build.sh
    * binary_macos_test.sh
    * binary_macos_upload.sh
* Update html jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/binary_update_htmls.yml
    * These delegate from the pytorch/builder repo
    * https://github.com/pytorch/builder/blob/master/cron/update_s3_htmls.sh
    * https://github.com/pytorch/builder/blob/master/cron/upload_binary_sizes.sh
* Smoke jobs (both linux and macos): https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-build-smoke-tests-defaults.yml
    * These delegate from the pytorch/builder repo
    * https://github.com/pytorch/builder/blob/master/run_tests.sh
    * https://github.com/pytorch/builder/blob/master/smoke_test.sh
    * https://github.com/pytorch/builder/blob/master/check_binary.sh
* Common shared code (shared across linux and macos): https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-binary-build-defaults.yml
    * binary_checkout.sh - checks out pytorch/builder repo. Right now this also checks out pytorch/pytorch, but it shouldn't. pytorch/pytorch should just be shared through the workspace. This can handle being run before binary_populate_env.sh
    * binary_populate_env.sh - parses BUILD_ENVIRONMENT into the separate env variables that make up a binary configuration. Also sets lots of default values, the date, the version strings, the location of folders in s3, all sorts of things. This generally has to be run before other steps.
    * binary_install_miniconda.sh - Installs miniconda, cross platform. Also hacks this for the update_binary_sizes job that doesn't have the right env variables
    * binary_run_in_docker.sh - Takes a bash script file (the actual test code) from a hardcoded location, spins up a docker image, and runs the script inside the docker image
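The kind of parsing binary_populate_env.sh performs can be sketched as below. The exact format of BUILD_ENVIRONMENT is not given in this document, so this example assumes a string shaped like the configuration name `linux_conda_3.7_cpu` used above; the `BinaryConfig` type, its field names, and the exported variable names are hypothetical.

```python
from typing import NamedTuple


class BinaryConfig(NamedTuple):
    """Hypothetical container for the parts of one binary configuration."""

    os_name: str
    package_type: str
    python_version: str
    compute_variant: str


def parse_config_name(name):
    """Split an assumed 'os_pkg_python_variant' name into its four parts."""
    parts = name.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected 4 underscore-separated fields, got {name!r}")
    return BinaryConfig(*parts)


def to_env(config):
    """Map the parsed fields onto (hypothetical) environment variable names."""
    return {
        "EXAMPLE_OS": config.os_name,
        "EXAMPLE_PACKAGE_TYPE": config.package_type,
        "EXAMPLE_PYTHON_VERSION": config.python_version,
        "EXAMPLE_COMPUTE_VARIANT": config.compute_variant,
    }


cfg = parse_config_name("linux_conda_3.7_cpu")
print(cfg)
print(to_env(cfg))
```

Doing this split once, early, is what lets the later build, test, and upload scripts read plain variables instead of re-parsing the combined string.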
### **Why do the steps all refer to scripts?**
CircleCI creates a final yaml file by inlining every <<* segment, so if we were to keep all the code in the config.yml itself then the config size would go over 4 MB and cause infra problems.
### **What is binary_run_in_docker for?**
So, CircleCI has several executor types: macos, machine, and docker are the ones we use. The 'machine' executor gives you two cores on some linux vm. The 'docker' executor gives you considerably more cores (nproc was 32 instead of 2 back when I tried in February). Since the dockers are faster, we try to run everything that we can in dockers. Thus
* linux build jobs use the docker executor. Running them on the docker executor was at least 2x faster than running them on the machine executor
* linux test jobs use the machine executor and spin up their own docker. Why this nonsense? It's because we run nvidia-docker for our GPU tests; any code that calls into the CUDA runtime needs to be run on nvidia-docker. To run a nvidia-docker you need to install some nvidia packages on the host machine and then call docker with the '--runtime nvidia' argument. CircleCI doesn't support this, so we have to do it ourselves.
    * This is not just a mere inconvenience. **This blocks all of our linux tests from using more than 2 cores.** There is nothing we can do about it except wait for a fix on circleci's side. Right now, we only run some smoke tests (some simple imports) on the binaries, but this also affects non-binary test jobs.
* linux upload jobs use the machine executor. The upload jobs are so short that it doesn't really matter what they use
* linux smoke test jobs use the machine executor for the same reason as the linux test jobs
binary_run_in_docker.sh is a way to share the docker start-up code between the binary test jobs and the binary smoke test jobs
### **Why does binary_checkout also checkout pytorch? Why shouldn't it?**
We want all the nightly binary jobs to run on the exact same git commit, so we wrote our own checkout logic to ensure that the same commit was always picked. Later circleci changed that to use a single pytorch checkout and persist it through the workspace (they did this because our config file was too big, so they wanted to take a lot of the setup code into scripts, but the scripts needed the code repo to exist to be called, so they added a prereq step called 'setup' to checkout the code and persist the needed scripts to the workspace). The changes to the binary jobs were not properly tested, so they all broke from missing pytorch code no longer existing. We hotfixed the problem by adding the pytorch checkout back to binary_checkout, so now there's two checkouts of pytorch on the binary jobs. This problem still needs to be fixed, but it takes careful tracing of which code is being called where.
# Azure Pipelines structure of the binaries
TODO: fill in stuff
## How are the workflows structured?
TODO: fill in stuff
## How are the jobs structured?
TODO: fill in stuff
# Code structure of the binaries (circleci agnostic)
## Overview
The code that runs the binaries lives in two places, in the normal [github.com/pytorch/pytorch](http://github.com/pytorch/pytorch), but also in [github.com/pytorch/builder](http://github.com/pytorch/builder) , which is a repo that defines how all the binaries are built. The relevant code is
```
# All code needed to set-up environments for build code to run in,
# but only code that is specific to the current CI system
pytorch/pytorch
- .circleci/                # Folder that holds all circleci related stuff
  - config.yml              # GENERATED file that actually controls all circleci behavior
  - verbatim-sources        # Used to generate job/workflow sections in ^
  - scripts/                # Code needed to prepare circleci environments for binary build scripts
- setup.py                  # Builds pytorch. This is wrapped in pytorch/builder
- cmake files               # used in normal building of pytorch

# All code needed to prepare a binary build, given an environment
# with all the right variables/packages/paths.
pytorch/builder

# Given an installed binary and a proper python env, runs some checks
# to make sure the binary was built the proper way. Checks things like
# the library dependencies, symbols present, etc.
- check_binary.sh

# Given an installed binary, runs python tests to make sure everything
# is in order. These should be de-duped. Right now they both run smoke
# tests, but are called from different places. Usually just call some
# import statements, but also has overlap with check_binary.sh above
- run_tests.sh
- smoke_test.sh

# Folders that govern how packages are built. See paragraphs below
- conda/
  - build_pytorch.sh          # Entrypoint. Delegates to proper conda build folder
  - switch_cuda_version.sh    # Switches active CUDA installation in Docker
  - pytorch-nightly/          # Build-folder
- manywheel/
  - build_cpu.sh              # Entrypoint for cpu builds
  - build.sh                  # Entrypoint for CUDA builds
  - build_common.sh           # Actual build script that ^^ call into
- wheel/
  - build_wheel.sh            # Entrypoint for wheel builds
- windows/
  - build_pytorch.bat         # Entrypoint for wheel builds on Windows
```
Every type of package has an entrypoint build script that handles all the important logic.
## Conda
Linux, MacOS and Windows use the same code flow for the conda builds.
Conda packages are built with conda-build, see https://conda.io/projects/conda-build/en/latest/resources/commands/conda-build.html
Basically, you pass `conda build` a build folder (pytorch-nightly/ above) that contains a build script and a meta.yaml. The meta.yaml specifies in what python environment to build the package in, and what dependencies the resulting package should have, and the build script gets called in the env to build the thing.
tldr; on conda-build is
1. Creates a brand new conda environment, based off of deps in the meta.yaml
    1. Note that environment variables do not get passed into this build env unless they are specified in the meta.yaml
    2. If the build fails this environment will stick around. You can activate it for much easier debugging. The “General Python” section below explains what exactly a python “environment” is.
2. Calls build.sh in the environment
3. Copies the finished package to a new conda env, also specified by the meta.yaml
4. Runs some simple import tests (if specified in the meta.yaml)
5. Saves the finished package as a tarball
The build.sh we use is essentially a wrapper around ```python setup.py build``` , but it also manually copies in some of our dependent libraries into the resulting tarball and messes with some rpaths.
The entrypoint file `builder/conda/build_pytorch.sh` is complicated because
* It works for Linux, MacOS and Windows
* The mac builds used to create their own environments, since they all used to be on the same machine. There's now a lot of extra logic to handle conda envs. This extra machinery could be removed
* It used to handle testing too, which adds more logic messing with python environments too. This extra machinery could be removed.
## Manywheels (linux pip and libtorch packages)
Manywheels are pip packages for linux distros. Note that these manywheels are not actually manylinux compliant.
`builder/manywheel/build_cpu.sh` and `builder/manywheel/build.sh` (for CUDA builds) just set different env vars and then call into `builder/manywheel/build_common.sh`
The entrypoint file `builder/manywheel/build_common.sh` is really really complicated because
* This used to handle building for several different python versions at the same time. The loops have been removed, but there are still unnecessary folders and movements here and there.
    * The script is never used this way anymore. This extra machinery could be removed.
* This used to handle testing the pip packages too. This is why there's testing code at the end that messes with python installations and stuff
    * The script is never used this way anymore. This extra machinery could be removed.
* This also builds libtorch packages
    * This should really be separate. libtorch packages are c++ only and have no python. They should not share infra with all the python specific stuff in this file.
* There is a lot of messing with rpaths. This is necessary, but could be made much much simpler if the above issues were fixed.
## Wheels (MacOS pip and libtorch packages)
The entrypoint file `builder/wheel/build_wheel.sh` is complicated because
* The mac builds used to all run on one machine (we didn't have autoscaling mac machines till circleci). So this script handled siloing itself by setting-up and tearing-down its build env and siloing itself into its own build directory.
    * The script is never used this way anymore. This extra machinery could be removed.
* This also builds libtorch packages
    * Ditto the comment above. This should definitely be separated out.
Note that the MacOS Python wheels are still built in conda environments. Some of the dependencies present during build also come from conda.
## Windows Wheels (Windows pip and libtorch packages)
The entrypoint file `builder/windows/build_pytorch.bat` is complicated because
* This used to handle building for several different python versions at the same time. This is why there are loops everywhere
    * The script is never used this way anymore. This extra machinery could be removed.
* This used to handle testing the pip packages too. This is why there's testing code at the end that messes with python installations and stuff
    * The script is never used this way anymore. This extra machinery could be removed.
* This also builds libtorch packages
    * This should really be separate. libtorch packages are c++ only and have no python. They should not share infra with all the python specific stuff in this file.
Note that the Windows Python wheels are still built in conda environments. Some of the dependencies present during build also come from conda.
## General notes
### Note on run_tests.sh, smoke_test.sh, and check_binary.sh
* These should all be consolidated
* These must run on all OS types: MacOS, Linux, and Windows
* These all run smoke tests at the moment. They inspect the packages some, maybe run a few import statements. They DO NOT run the python tests nor the cpp tests. The idea is that python tests on master and PR merges will catch all breakages. All these tests have to do is make sure the special binary machinery didn't mess anything up.
* There are separate run_tests.sh and smoke_test.sh because one used to be called by the smoke jobs and one used to be called by the binary test jobs (see circleci structure section above). This is still true actually, but these could be united into a single script that runs these checks, given an installed pytorch package.
### Note on libtorch
Libtorch packages are built in the wheel build scripts: manywheel/build_*.sh for linux and build_wheel.sh for mac. There are several things wrong with this
* It's confusing. Most of those scripts deal with python specifics.
* The extra conditionals everywhere severely complicate the wheel build scripts
* The process for building libtorch is different from the official instructions (a plain call to cmake, or a call to a script)
### Note on docker images / Dockerfiles
All linux builds occur in docker images. The docker images are
* pytorch/conda-cuda
    * Has ALL CUDA versions installed. The script pytorch/builder/conda/switch_cuda_version.sh sets /usr/local/cuda to a symlink to e.g. /usr/local/cuda-10.0 to enable different CUDA builds
    * Also used for cpu builds
* pytorch/manylinux-cuda90
* pytorch/manylinux-cuda92
* pytorch/manylinux-cuda100
    * Also used for cpu builds
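The symlink switch described for the conda-cuda image can be illustrated with a small sketch. This is not the contents of switch_cuda_version.sh; it is a Python approximation of the same idea, run against a temporary directory instead of /usr/local, and the function name and directory layout are assumptions.

```python
import os
import tempfile


def switch_cuda_version(root, version):
    """Point <root>/cuda at <root>/cuda-<version> and return the resolved path."""
    target = os.path.join(root, f"cuda-{version}")
    if not os.path.isdir(target):
        raise FileNotFoundError(target)
    link = os.path.join(root, "cuda")
    tmp_link = link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    # Create the new link under a temporary name, then rename it over the
    # old one so there is never a moment without a "cuda" entry (POSIX).
    os.symlink(target, tmp_link)
    os.replace(tmp_link, link)
    return os.path.realpath(link)


with tempfile.TemporaryDirectory() as root:
    for installed in ("9.0", "10.0"):
        os.mkdir(os.path.join(root, f"cuda-{installed}"))
    first = os.path.basename(switch_cuda_version(root, "9.0"))
    second = os.path.basename(switch_cuda_version(root, "10.0"))

print(first, second)
```

Because build scripts only ever reference the unversioned path, a single image with every toolkit installed can serve all CUDA variants.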
The Dockerfiles are available in pytorch/builder, but there is no circleci job or script to build these docker images, and they cannot be run locally (unless you have the correct local packages/paths). Only Soumith can build them right now.
### General Python
* This is still a good explanation of python installations https://caffe2.ai/docs/faq.html#why-do-i-get-import-errors-in-python-when-i-try-to-use-caffe2
# How to manually rebuild the binaries
tldr; make a PR that looks like https://github.com/pytorch/pytorch/pull/21159
Sometimes we want to push a change to master and then rebuild all of today's binaries after that change. As of May 30, 2019 there isn't a way to manually run a workflow in the UI. You can manually re-run a workflow, but it will use the exact same git commits as the first run and will not include any changes. So we have to make a PR and then force circleci to run the binary workflow instead of the normal tests. The above PR is an example of how to do this; essentially you copy-paste the binarybuilds workflow steps into the default workflow steps. If you need to point the builder repo to a different commit then you'd need to change https://github.com/pytorch/pytorch/blob/master/.circleci/scripts/binary_checkout.sh#L42-L45 to checkout what you want.
## How to test changes to the binaries via .circleci
Writing PRs that test the binaries is annoying, since the default circleci jobs that run on PRs are not the jobs that you want to run. Likely, changes to the binaries will touch something under .circleci/ and require that .circleci/config.yml be regenerated (.circleci/config.yml controls all .circleci behavior, and is generated using ```.circleci/regenerate.sh``` in python 3.7). But you also need to manually hardcode the binary jobs that you want to test into the .circleci/config.yml workflow, so you should actually make at least two commits, one for your changes and one to temporarily hardcode jobs. See https://github.com/pytorch/pytorch/pull/22928 as an example of how to do this.
```
# Make your changes
touch .circleci/verbatim-sources/nightly-binary-build-defaults.yml
# Regenerate the yaml, has to be in python 3.7
.circleci/regenerate.sh
# Make a commit
git add .circleci *
git commit -m "My real changes"
git push origin my_branch
# Now hardcode the jobs that you want in the .circleci/config.yml workflows section
# Also eliminate ensure-consistency and should_run_job checks
# e.g. https://github.com/pytorch/pytorch/commit/2b3344bfed8772fe86e5210cc4ee915dee42b32d
# Make a commit you won't keep
git add .circleci
git commit -m "[DO NOT LAND] testing binaries for above changes"
git push origin my_branch
# Now you need to make some changes to the first commit.
git rebase -i HEAD~2 # mark the first commit as 'edit'
# Make the changes
touch .circleci/verbatim-sources/nightly-binary-build-defaults.yml
.circleci/regenerate.sh
# Amend the commit and continue the rebase
git add .circleci
git commit --amend
git rebase --continue
# Update the PR, need to force since the commits are different now
git push origin my_branch --force
```
The advantage of this flow is that you can make new changes to the base commit and regenerate the .circleci without having to re-write which binary jobs you want to test on. The downside is that all updates will be force pushes.
## How to build a binary locally
### Linux
You can build Linux binaries locally easily using docker.
```
# Run the docker
# Use the correct docker image, pytorch/conda-cuda used here as an example
#
# -v path/to/foo:path/to/bar makes path/to/foo on your local machine (the
# machine that you're running the command on) accessible to the docker
# container at path/to/bar. So if you then run `touch path/to/bar/baz`
# in the docker container then you will see path/to/foo/baz on your local
# machine. You could also clone the pytorch and builder repos in the docker.
#
# If you're building a CUDA binary then use `nvidia-docker run` instead, see below.
#
# If you know how, add ccache as a volume too and speed up everything
docker run \
-v your/pytorch/repo:/pytorch \
-v your/builder/repo:/builder \
-v where/you/want/packages/to/appear:/final_pkgs \
-it pytorch/conda-cuda /bin/bash
# Export whatever variables are important to you. All variables that you'd
# possibly need are in .circleci/scripts/binary_populate_env.sh
# You should probably always export at least these 3 variables
export PACKAGE_TYPE=conda
export DESIRED_PYTHON=3.6
export DESIRED_CUDA=cpu
# Call the entrypoint
# `|& tee foo.log` just copies all stdout and stderr output to foo.log
# The builds generate lots of output so you probably need this when
# building locally.
/builder/conda/build_pytorch.sh |& tee build_output.log
```
**Building CUDA binaries on docker**
To build a CUDA binary you need to use `nvidia-docker run` instead of just `docker run` (or you can manually pass `--runtime=nvidia`). This adds some needed libraries and things to build CUDA stuff.
You can build CUDA binaries on CPU-only machines, but you can only run CUDA binaries on CUDA machines. This means that you can build a CUDA binary in docker on your laptop if you so choose (though it's going to take a long time).
For Facebook employees, ask about beefy machines that have docker support and use those instead of your laptop; it will be 5x as fast.
### MacOS
There's no easy way to generate reproducible hermetic MacOS environments. If you have a Mac laptop then you can try emulating the .circleci environments as much as possible, but you probably have packages in /usr/local/, possibly installed by brew, that will interfere with the build. If you're trying to repro an error on a Mac build in .circleci and you can't seem to repro locally, then my best advice is actually to iterate on .circleci :/
But if you want to try, then I'd recommend
```
# Create a new terminal
# Clear your LD_LIBRARY_PATH and trim as much out of your PATH as you
# know how to do
# Install a new miniconda
# First remove any other python or conda installation from your PATH
# Always install miniconda 3, even if building for Python <3
# Use $HOME rather than a quoted ~, which the shell does not expand
new_conda="$HOME/my_new_conda"
# Keep the installer outside $new_conda, which must not exist yet
conda_sh="$HOME/install_miniconda.sh"
curl -o "$conda_sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
chmod +x "$conda_sh"
"$conda_sh" -b -p "$new_conda"
rm -f "$conda_sh"
export PATH="$new_conda/bin:$PATH"
# Create a clean python env
# All MacOS builds use conda to manage the python env and the dependencies
# that they are built with, even the pip packages
conda create -yn binary python=2.7
conda activate binary
# Export whatever variables are important to you. All variables that you'd
# possibly need are in .circleci/scripts/binary_populate_env.sh
# You should probably always export at least these 3 variables
export PACKAGE_TYPE=conda
export DESIRED_PYTHON=3.6
export DESIRED_CUDA=cpu
# Call the entrypoint you want
path/to/builder/wheel/build_wheel.sh
```
N.B. Installing a brand new miniconda is important because of how conda installations work. See the “General Python” section above, but the tl;dr is that:
1. You make the conda command accessible by prepending `path/to/conda_root/bin` to your PATH.
2. You make a new env and activate it, which then also gets prepended to your PATH. Now you have `path/to/conda_root/envs/new_env/bin:path/to/conda_root/bin:$PATH`
3. Now say you (or some code that you ran) call a python executable `foo`:
1. If you installed `foo` in `new_env`, then `path/to/conda_root/envs/new_env/bin/foo` will get called, as expected.
2. But if you forgot to install `foo` in `new_env` and happened to previously install it in your root conda env (called base), then unix/linux will still find `path/to/conda_root/bin/foo`. This is dangerous, since `foo` can be a different version than you want; `foo` can even be for an incompatible python version!
Newer conda versions and proper python hygiene can prevent this, but just install a new miniconda to be safe.
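The lookup order described in the steps above can be reproduced with a minimal sketch on a Unix-like system. It builds a throwaway directory tree that only mimics the conda layout (no real conda is involved), and `make_executable` and `demo` are hypothetical helpers written for this illustration. `shutil.which` searches a PATH-style string left to right, much as the shell does.

```python
import os
import shutil
import stat
import tempfile


def make_executable(directory, name):
    """Create a tiny executable file called `name` inside `directory`."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def demo():
    with tempfile.TemporaryDirectory() as conda_root:
        root_bin = os.path.join(conda_root, "bin")
        env_bin = os.path.join(conda_root, "envs", "new_env", "bin")
        os.makedirs(env_bin)
        # The activated env comes first, the base install second.
        search_path = os.pathsep.join([env_bin, root_bin])

        # `foo` exists only in the base install: it is still found.
        base_foo = make_executable(root_bin, "foo")
        before = shutil.which("foo", path=search_path)

        # Once `foo` is installed in new_env, that copy shadows base.
        env_foo = make_executable(env_bin, "foo")
        after = shutil.which("foo", path=search_path)

        labels = {base_foo: "base", env_foo: "new_env"}
        return labels[before], labels[after]


print(demo())
```

The first lookup silently falls through to the base install, which is the failure mode a fresh miniconda avoids: with nothing extra installed in base, a missing `foo` fails loudly instead of resolving to a stale copy.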
### Windows
TODO: fill in

View File

@ -25,12 +25,38 @@ DEPS_INCLUSION_DIMENSIONS = [
]
def get_processor_arch_name(gpu_version):
return "cpu" if not gpu_version else (
"cu" + gpu_version.strip("cuda") if gpu_version.startswith("cuda") else gpu_version
)
def get_processor_arch_name(cuda_version):
return "cpu" if not cuda_version else "cu" + cuda_version
LINUX_PACKAGE_VARIANTS = OrderedDict(
manywheel=[
"3.6m",
"3.7m",
"3.8m",
],
conda=dimensions.STANDARD_PYTHON_VERSIONS,
libtorch=[
"3.7m",
],
)
CONFIG_TREE_DATA = OrderedDict(
linux=(dimensions.CUDA_VERSIONS, LINUX_PACKAGE_VARIANTS),
macos=([None], OrderedDict(
wheel=dimensions.STANDARD_PYTHON_VERSIONS,
conda=dimensions.STANDARD_PYTHON_VERSIONS,
libtorch=[
"3.7",
],
)),
windows=(dimensions.CUDA_VERSIONS, OrderedDict(
wheel=dimensions.STANDARD_PYTHON_VERSIONS,
conda=dimensions.STANDARD_PYTHON_VERSIONS,
libtorch=[
"3.7",
],
)),
)
# GCC config variants:
@ -67,12 +93,12 @@ class TopLevelNode(ConfigNode):
class OSConfigNode(ConfigNode):
def __init__(self, parent, os_name, gpu_versions, py_tree):
def __init__(self, parent, os_name, cuda_versions, py_tree):
super(OSConfigNode, self).__init__(parent, os_name)
self.py_tree = py_tree
self.props["os_name"] = os_name
self.props["gpu_versions"] = gpu_versions
self.props["cuda_versions"] = cuda_versions
def get_children(self):
return [PackageFormatConfigNode(self, k, v) for k, v in self.py_tree.items()]
@ -85,14 +111,13 @@ class PackageFormatConfigNode(ConfigNode):
self.props["python_versions"] = python_versions
self.props["package_format"] = package_format
def get_children(self):
if self.find_prop("os_name") == "linux":
return [LinuxGccConfigNode(self, v) for v in LINUX_GCC_CONFIG_VARIANTS[self.find_prop("package_format")]]
elif self.find_prop("os_name") == "windows" and self.find_prop("package_format") == "libtorch":
return [WindowsLibtorchConfigNode(self, v) for v in WINDOWS_LIBTORCH_CONFIG_VARIANTS]
else:
return [ArchConfigNode(self, v) for v in self.find_prop("gpu_versions")]
return [ArchConfigNode(self, v) for v in self.find_prop("cuda_versions")]
class LinuxGccConfigNode(ConfigNode):
@ -102,22 +127,14 @@ class LinuxGccConfigNode(ConfigNode):
self.props["gcc_config_variant"] = gcc_config_variant
def get_children(self):
gpu_versions = self.find_prop("gpu_versions")
cuda_versions = self.find_prop("cuda_versions")
# XXX devtoolset7 on CUDA 9.0 is temporarily disabled
# see https://github.com/pytorch/pytorch/issues/20066
if self.find_prop("gcc_config_variant") == 'devtoolset7':
gpu_versions = filter(lambda x: x != "cuda_90", gpu_versions)
cuda_versions = filter(lambda x: x != "90", cuda_versions)
# XXX disabling conda rocm build since docker images are not there
if self.find_prop("package_format") == 'conda':
gpu_versions = filter(lambda x: x not in dimensions.ROCM_VERSION_LABELS, gpu_versions)
# XXX libtorch rocm build is temporarily disabled
if self.find_prop("package_format") == 'libtorch':
gpu_versions = filter(lambda x: x not in dimensions.ROCM_VERSION_LABELS, gpu_versions)
return [ArchConfigNode(self, v) for v in gpu_versions]
return [ArchConfigNode(self, v) for v in cuda_versions]
class WindowsLibtorchConfigNode(ConfigNode):
@ -127,14 +144,14 @@ class WindowsLibtorchConfigNode(ConfigNode):
self.props["libtorch_config_variant"] = libtorch_config_variant
def get_children(self):
return [ArchConfigNode(self, v) for v in self.find_prop("gpu_versions")]
return [ArchConfigNode(self, v) for v in self.find_prop("cuda_versions")]
class ArchConfigNode(ConfigNode):
def __init__(self, parent, gpu):
super(ArchConfigNode, self).__init__(parent, get_processor_arch_name(gpu))
def __init__(self, parent, cu):
super(ArchConfigNode, self).__init__(parent, get_processor_arch_name(cu))
self.props["gpu"] = gpu
self.props["cu"] = cu
def get_children(self):
return [PyVersionConfigNode(self, v) for v in self.find_prop("python_versions")]

View File

@ -6,10 +6,10 @@ import cimodel.lib.conf_tree as conf_tree
import cimodel.lib.miniutils as miniutils
class Conf(object):
def __init__(self, os, gpu_version, pydistro, parms, smoke, libtorch_variant, gcc_config_variant, libtorch_config_variant):
def __init__(self, os, cuda_version, pydistro, parms, smoke, libtorch_variant, gcc_config_variant, libtorch_config_variant):
self.os = os
self.gpu_version = gpu_version
self.cuda_version = cuda_version
self.pydistro = pydistro
self.parms = parms
self.smoke = smoke
@ -18,7 +18,7 @@ class Conf(object):
self.libtorch_config_variant = libtorch_config_variant
def gen_build_env_parms(self):
elems = [self.pydistro] + self.parms + [binary_build_data.get_processor_arch_name(self.gpu_version)]
elems = [self.pydistro] + self.parms + [binary_build_data.get_processor_arch_name(self.cuda_version)]
if self.gcc_config_variant is not None:
elems.append(str(self.gcc_config_variant))
if self.libtorch_config_variant is not None:
@ -27,19 +27,7 @@ class Conf(object):
def gen_docker_image(self):
if self.gcc_config_variant == 'gcc5.4_cxx11-abi':
if self.gpu_version is None:
return miniutils.quote("pytorch/libtorch-cxx11-builder:cpu")
else:
return miniutils.quote(
f"pytorch/libtorch-cxx11-builder:{self.gpu_version}"
)
if self.pydistro == "conda":
if self.gpu_version is None:
return miniutils.quote("pytorch/conda-builder:cpu")
else:
return miniutils.quote(
f"pytorch/conda-builder:{self.gpu_version}"
)
return miniutils.quote("pytorch/pytorch-binary-docker-image-ubuntu16.04:latest")
docker_word_substitution = {
"manywheel": "manylinux",
@ -49,12 +37,9 @@ class Conf(object):
docker_distro_prefix = miniutils.override(self.pydistro, docker_word_substitution)
# The cpu nightlies are built on the pytorch/manylinux-cuda102 docker image
# TODO cuda images should consolidate into tag-base images similar to rocm
alt_docker_suffix = "cuda102" if not self.gpu_version else (
"rocm:" + self.gpu_version.strip("rocm") if self.gpu_version.startswith("rocm") else self.gpu_version)
docker_distro_suffix = alt_docker_suffix if self.pydistro != "conda" else (
"cuda" if alt_docker_suffix.startswith("cuda") else "rocm")
return miniutils.quote("pytorch/" + docker_distro_prefix + "-" + docker_distro_suffix)
alt_docker_suffix = self.cuda_version or "102"
docker_distro_suffix = "" if self.pydistro == "conda" else alt_docker_suffix
return miniutils.quote("pytorch/" + docker_distro_prefix + "-cuda" + docker_distro_suffix)
def get_name_prefix(self):
return "smoke" if self.smoke else "binary"
@ -84,10 +69,14 @@ class Conf(object):
"update_s3_htmls",
]
job_def["filters"] = branch_filters.gen_filter_dict(
branches_list=["postnightly"],
branches_list=["nightly"],
tags_list=[branch_filters.RC_PATTERN],
)
else:
filter_branch = r"/.*/"
if phase in ["upload"]:
filter_branch = "nightly"
else:
filter_branch = r"/.*/"
job_def["filters"] = branch_filters.gen_filter_dict(
branches_list=[filter_branch],
tags_list=[branch_filters.RC_PATTERN],
@ -100,61 +89,28 @@ class Conf(object):
if not (self.smoke and self.os == "macos") and self.os != "windows":
job_def["docker_image"] = self.gen_docker_image()
# fix this. only works on cuda not rocm
if self.os != "windows" and self.gpu_version:
if self.os != "windows" and self.cuda_version:
job_def["use_cuda_docker_runtime"] = miniutils.quote("1")
else:
if self.os == "linux" and phase != "upload":
job_def["docker_image"] = self.gen_docker_image()
if phase == "test":
if self.gpu_version:
if self.cuda_version:
if self.os == "windows":
job_def["executor"] = "windows-with-nvidia-gpu"
else:
job_def["resource_class"] = "gpu.medium"
if phase == "upload":
job_def["context"] = "org-member"
job_def["requires"] = [
self.gen_build_name(upload_phase_dependency, nightly)
]
os_name = miniutils.override(self.os, {"macos": "mac"})
job_name = "_".join([self.get_name_prefix(), os_name, phase])
return {job_name : job_def}
def gen_upload_job(self, phase, requires_dependency):
"""Generate binary_upload job for configuration
Output looks similar to:
- binary_upload:
name: binary_linux_manywheel_3_7m_cu113_devtoolset7_nightly_upload
context: org-member
requires: binary_linux_manywheel_3_7m_cu113_devtoolset7_nightly_test
filters:
branches:
only:
- nightly
tags:
only: /v[0-9]+(\\.[0-9]+)*-rc[0-9]+/
package_type: manywheel
upload_subfolder: cu113
"""
return {
"binary_upload": OrderedDict({
"name": self.gen_build_name(phase, nightly=True),
"context": "org-member",
"requires": [self.gen_build_name(
requires_dependency,
nightly=True
)],
"filters": branch_filters.gen_filter_dict(
branches_list=["nightly"],
tags_list=[branch_filters.RC_PATTERN],
),
"package_type": self.pydistro,
"upload_subfolder": binary_build_data.get_processor_arch_name(
self.gpu_version,
),
})
}
def get_root(smoke, name):
return binary_build_data.TopLevelNode(
@ -173,10 +129,10 @@ def gen_build_env_list(smoke):
for c in config_list:
conf = Conf(
c.find_prop("os_name"),
c.find_prop("gpu"),
c.find_prop("cu"),
c.find_prop("package_format"),
[c.find_prop("pyver")],
c.find_prop("smoke") and not (c.find_prop("os_name") == "macos_arm64"), # don't test arm64
c.find_prop("smoke"),
c.find_prop("libtorch_variant"),
c.find_prop("gcc_config_variant"),
c.find_prop("libtorch_config_variant"),
@ -193,19 +149,32 @@ def get_nightly_uploads():
mylist = []
for conf in configs:
phase_dependency = "test" if predicate_exclude_macos(conf) else "build"
mylist.append(conf.gen_upload_job("upload", phase_dependency))
mylist.append(conf.gen_workflow_job("upload", phase_dependency, nightly=True))
return mylist
def get_post_upload_jobs():
"""Generate jobs to update HTML indices and report binary sizes"""
configs = gen_build_env_list(False)
common_job_def = {
"context": "org-member",
"filters": branch_filters.gen_filter_dict(
branches_list=["nightly"],
tags_list=[branch_filters.RC_PATTERN],
),
"requires": [],
}
for conf in configs:
upload_job_name = conf.gen_build_name(
build_or_test="upload",
nightly=True
)
common_job_def["requires"].append(upload_job_name)
return [
{
"update_s3_htmls": {
"name": "update_s3_htmls",
"context": "org-member",
"filters": branch_filters.gen_filter_dict(
branches_list=["postnightly"],
),
**common_job_def,
},
},
]
@ -228,9 +197,7 @@ def get_jobs(toplevel_key, smoke):
configs = gen_build_env_list(smoke)
phase = "build" if toplevel_key == "binarybuilds" else "test"
for build_config in configs:
# don't test for macos_arm64 as it's cross compiled
if phase != "test" or build_config.os != "macos_arm64":
jobs_list.append(build_config.gen_workflow_job(phase, nightly=True))
jobs_list.append(build_config.gen_workflow_job(phase, nightly=True))
return jobs_list

View File

@ -0,0 +1,91 @@
from cimodel.lib.conf_tree import ConfigNode, XImportant
from cimodel.lib.conf_tree import Ver
CONFIG_TREE_DATA = [
(Ver("ubuntu", "16.04"), [
([Ver("clang", "7")], [XImportant("onnx_main_py3.6"),
XImportant("onnx_ort1_py3.6"),
XImportant("onnx_ort2_py3.6")]),
]),
]
class TreeConfigNode(ConfigNode):
def __init__(self, parent, node_name, subtree):
super(TreeConfigNode, self).__init__(parent, self.modify_label(node_name))
self.subtree = subtree
self.init2(node_name)
# noinspection PyMethodMayBeStatic
def modify_label(self, label):
return str(label)
def init2(self, node_name):
pass
def get_children(self):
return [self.child_constructor()(self, k, v) for (k, v) in self.subtree]
def is_build_only(self):
if str(self.find_prop("language_version")) == "onnx_main_py3.6" or \
str(self.find_prop("language_version")) == "onnx_ort1_py3.6" or \
str(self.find_prop("language_version")) == "onnx_ort2_py3.6":
return False
return set(str(c) for c in self.find_prop("compiler_version")).intersection({
"clang3.8",
"clang3.9",
"clang7",
"android",
}) or self.find_prop("distro_version").name == "macos"
def is_test_only(self):
if str(self.find_prop("language_version")) == "onnx_ort1_py3.6" or \
str(self.find_prop("language_version")) == "onnx_ort2_py3.6":
return True
return False
class TopLevelNode(TreeConfigNode):
def __init__(self, node_name, subtree):
super(TopLevelNode, self).__init__(None, node_name, subtree)
# noinspection PyMethodMayBeStatic
def child_constructor(self):
return DistroConfigNode
class DistroConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["distro_version"] = node_name
# noinspection PyMethodMayBeStatic
def child_constructor(self):
return CompilerConfigNode
class CompilerConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["compiler_version"] = node_name
# noinspection PyMethodMayBeStatic
def child_constructor(self):
return LanguageConfigNode
class LanguageConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["language_version"] = node_name
self.props["build_only"] = self.is_build_only()
self.props["test_only"] = self.is_test_only()
def child_constructor(self):
return ImportantConfigNode
class ImportantConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["important"] = True
def get_children(self):
return []

View File

@ -0,0 +1,174 @@
from collections import OrderedDict
import cimodel.data.dimensions as dimensions
import cimodel.lib.conf_tree as conf_tree
from cimodel.lib.conf_tree import Ver
import cimodel.lib.miniutils as miniutils
from cimodel.data.caffe2_build_data import CONFIG_TREE_DATA, TopLevelNode
from cimodel.data.simple.util.branch_filters import gen_filter_dict
from dataclasses import dataclass
DOCKER_IMAGE_PATH_BASE = "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/"
DOCKER_IMAGE_VERSION = "376"
@dataclass
class Conf:
language: str
distro: Ver
# There could be multiple compiler versions configured (e.g. nvcc
# for gpu files and host compiler (gcc/clang) for cpu files)
compilers: [Ver]
build_only: bool
test_only: bool
is_important: bool
@property
def compiler_names(self):
return [c.name for c in self.compilers]
# TODO: Eventually we can probably just remove the cudnn7 everywhere.
def get_cudnn_insertion(self):
omit = self.language == "onnx_main_py3.6" \
or self.language == "onnx_ort1_py3.6" \
or self.language == "onnx_ort2_py3.6" \
or set(self.compiler_names).intersection({"android", "mkl", "clang"}) \
or str(self.distro) in ["ubuntu14.04", "macos10.13"]
return [] if omit else ["cudnn7"]
def get_build_name_root_parts(self):
return [
"caffe2",
self.language,
] + self.get_build_name_middle_parts()
def get_build_name_middle_parts(self):
return [str(c) for c in self.compilers] + self.get_cudnn_insertion() + [str(self.distro)]
def construct_phase_name(self, phase):
root_parts = self.get_build_name_root_parts()
build_name_substitutions = {
"onnx_ort1_py3.6": "onnx_main_py3.6",
"onnx_ort2_py3.6": "onnx_main_py3.6",
}
if phase == "build":
root_parts = [miniutils.override(r, build_name_substitutions) for r in root_parts]
return "_".join(root_parts + [phase]).replace(".", "_")
def get_platform(self):
platform = self.distro.name
if self.distro.name != "macos":
platform = "linux"
return platform
def gen_docker_image(self):
lang_substitutions = {
"onnx_main_py3.6": "py3.6",
"onnx_ort1_py3.6": "py3.6",
"onnx_ort2_py3.6": "py3.6",
"cmake": "py3",
}
lang = miniutils.override(self.language, lang_substitutions)
parts = [lang] + self.get_build_name_middle_parts()
return miniutils.quote(DOCKER_IMAGE_PATH_BASE + "-".join(parts) + ":" + str(DOCKER_IMAGE_VERSION))
def gen_workflow_params(self, phase):
parameters = OrderedDict()
lang_substitutions = {
"onnx_py3": "onnx-py3",
"onnx_main_py3.6": "onnx-main-py3.6",
"onnx_ort1_py3.6": "onnx-ort1-py3.6",
"onnx_ort2_py3.6": "onnx-ort2-py3.6",
}
lang = miniutils.override(self.language, lang_substitutions)
parts = [
"caffe2",
lang,
] + self.get_build_name_middle_parts() + [phase]
build_env_name = "-".join(parts)
parameters["build_environment"] = miniutils.quote(build_env_name)
if "ios" in self.compiler_names:
parameters["build_ios"] = miniutils.quote("1")
if phase == "test":
# TODO cuda should not be considered a compiler
if "cuda" in self.compiler_names:
parameters["use_cuda_docker_runtime"] = miniutils.quote("1")
if self.distro.name != "macos":
parameters["docker_image"] = self.gen_docker_image()
if self.build_only:
parameters["build_only"] = miniutils.quote("1")
if phase == "test":
resource_class = "large" if "cuda" not in self.compiler_names else "gpu.medium"
parameters["resource_class"] = resource_class
return parameters
def gen_workflow_job(self, phase):
job_def = OrderedDict()
job_def["name"] = self.construct_phase_name(phase)
if phase == "test":
job_def["requires"] = [self.construct_phase_name("build")]
job_name = "caffe2_" + self.get_platform() + "_test"
else:
job_name = "caffe2_" + self.get_platform() + "_build"
if not self.is_important:
job_def["filters"] = gen_filter_dict()
job_def.update(self.gen_workflow_params(phase))
return {job_name : job_def}
def get_root():
return TopLevelNode("Caffe2 Builds", CONFIG_TREE_DATA)
def instantiate_configs():
config_list = []
root = get_root()
found_configs = conf_tree.dfs(root)
for fc in found_configs:
c = Conf(
language=fc.find_prop("language_version"),
distro=fc.find_prop("distro_version"),
compilers=fc.find_prop("compiler_version"),
build_only=fc.find_prop("build_only"),
test_only=fc.find_prop("test_only"),
is_important=fc.find_prop("important"),
)
config_list.append(c)
return config_list
def get_workflow_jobs():
configs = instantiate_configs()
x = []
for conf_options in configs:
phases = ["build"]
if not conf_options.build_only:
phases = dimensions.PHASES
if conf_options.test_only:
phases = ["test"]
for phase in phases:
x.append(conf_options.gen_workflow_job(phase))
return x

View File

@ -1,23 +1,14 @@
PHASES = ["build", "test"]
CUDA_VERSIONS = [
None, # cpu build
"92",
"101",
"102",
"113",
"116",
]
ROCM_VERSIONS = [
"4.3.1",
"4.5.2",
]
ROCM_VERSION_LABELS = ["rocm" + v for v in ROCM_VERSIONS]
GPU_VERSIONS = [None] + ["cuda" + v for v in CUDA_VERSIONS] + ROCM_VERSION_LABELS
STANDARD_PYTHON_VERSIONS = [
"3.6",
"3.7",
"3.8",
"3.9",
"3.10"
"3.8"
]

View File

@ -1,7 +1,64 @@
from cimodel.lib.conf_tree import ConfigNode
from cimodel.lib.conf_tree import ConfigNode, X, XImportant
CONFIG_TREE_DATA = [
("xenial", [
(None, [
X("nightly"),
]),
("gcc", [
("5.4", [ # All this subtree rebases to master and then build
XImportant("3.6"),
("3.6", [
("parallel_tbb", [X(True)]),
("parallel_native", [X(True)]),
]),
]),
# TODO: bring back libtorch test
("7", [X("3.6")]),
]),
("clang", [
("5", [
XImportant("3.6"), # This is actually the ASAN build
]),
]),
("cuda", [
("9.2", [
X("3.6"),
("3.6", [
("cuda_gcc_override", [X("gcc5.4")])
])
]),
("10.1", [X("3.6")]),
("10.2", [
XImportant("3.6"),
("3.6", [
("libtorch", [XImportant(True)])
]),
]),
("11.0", [
X("3.8"),
("3.8", [
("libtorch", [X(True)])
]),
]),
]),
]),
("bionic", [
("clang", [
("9", [
XImportant("3.6"),
]),
("9", [
("3.6", [
("xla", [XImportant(True)]),
]),
]),
]),
("gcc", [
("9", [XImportant("3.8")]),
]),
]),
]
@ -53,8 +110,6 @@ class PyVerConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["pyver"] = node_name
self.props["abbreviated_pyver"] = get_major_pyver(node_name)
if node_name == "3.9":
self.props["abbreviated_pyver"] = "py3.9"
# noinspection PyMethodMayBeStatic
def child_constructor(self):
@ -69,43 +124,17 @@ class ExperimentalFeatureConfigNode(TreeConfigNode):
experimental_feature = self.find_prop("experimental_feature")
next_nodes = {
"asan": AsanConfigNode,
"xla": XlaConfigNode,
"mps": MPSConfigNode,
"vulkan": VulkanConfigNode,
"parallel_tbb": ParallelTBBConfigNode,
"crossref": CrossRefConfigNode,
"parallel_native": ParallelNativeConfigNode,
"onnx": ONNXConfigNode,
"libtorch": LibTorchConfigNode,
"important": ImportantConfigNode,
"build_only": BuildOnlyConfigNode,
"shard_test": ShardTestConfigNode,
"cuda_gcc_override": CudaGccOverrideConfigNode,
"pure_torch": PureTorchConfigNode,
"slow_gradcheck": SlowGradcheckConfigNode,
"cuda_gcc_override": CudaGccOverrideConfigNode
}
return next_nodes[experimental_feature]
class SlowGradcheckConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["is_slow_gradcheck"] = True
def child_constructor(self):
return ExperimentalFeatureConfigNode
class PureTorchConfigNode(TreeConfigNode):
def modify_label(self, label):
return "PURE_TORCH=" + str(label)
def init2(self, node_name):
self.props["is_pure_torch"] = node_name
def child_constructor(self):
return ImportantConfigNode
class XlaConfigNode(TreeConfigNode):
def modify_label(self, label):
return "XLA=" + str(label)
@ -116,49 +145,6 @@ class XlaConfigNode(TreeConfigNode):
def child_constructor(self):
return ImportantConfigNode
class MPSConfigNode(TreeConfigNode):
def modify_label(self, label):
return "MPS=" + str(label)
def init2(self, node_name):
self.props["is_mps"] = node_name
def child_constructor(self):
return ImportantConfigNode
class AsanConfigNode(TreeConfigNode):
def modify_label(self, label):
return "Asan=" + str(label)
def init2(self, node_name):
self.props["is_asan"] = node_name
def child_constructor(self):
return ExperimentalFeatureConfigNode
class ONNXConfigNode(TreeConfigNode):
def modify_label(self, label):
return "Onnx=" + str(label)
def init2(self, node_name):
self.props["is_onnx"] = node_name
def child_constructor(self):
return ImportantConfigNode
class VulkanConfigNode(TreeConfigNode):
def modify_label(self, label):
return "Vulkan=" + str(label)
def init2(self, node_name):
self.props["is_vulkan"] = node_name
def child_constructor(self):
return ImportantConfigNode
class ParallelTBBConfigNode(TreeConfigNode):
def modify_label(self, label):
@ -171,14 +157,6 @@ class ParallelTBBConfigNode(TreeConfigNode):
return ImportantConfigNode
class CrossRefConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["is_crossref"] = node_name
def child_constructor(self):
return ImportantConfigNode
class ParallelNativeConfigNode(TreeConfigNode):
def modify_label(self, label):
return "PARALLELNATIVE=" + str(label)
@ -198,7 +176,7 @@ class LibTorchConfigNode(TreeConfigNode):
self.props["is_libtorch"] = node_name
def child_constructor(self):
return ExperimentalFeatureConfigNode
return ImportantConfigNode
class CudaGccOverrideConfigNode(TreeConfigNode):
@ -206,21 +184,13 @@ class CudaGccOverrideConfigNode(TreeConfigNode):
self.props["cuda_gcc_override"] = node_name
def child_constructor(self):
return ExperimentalFeatureConfigNode
return ImportantConfigNode
class BuildOnlyConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["build_only"] = node_name
def child_constructor(self):
return ExperimentalFeatureConfigNode
class ShardTestConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["shard_test"] = node_name
def child_constructor(self):
return ImportantConfigNode
@ -237,6 +207,7 @@ class ImportantConfigNode(TreeConfigNode):
class XenialCompilerConfigNode(TreeConfigNode):
def modify_label(self, label):
return label or "<unspecified>"
@ -250,6 +221,7 @@ class XenialCompilerConfigNode(TreeConfigNode):
class BionicCompilerConfigNode(TreeConfigNode):
def modify_label(self, label):
return label or "<unspecified>"

View File

@ -1,13 +1,14 @@
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import List, Optional
from cimodel.data.pytorch_build_data import TopLevelNode, CONFIG_TREE_DATA
import cimodel.data.dimensions as dimensions
import cimodel.lib.conf_tree as conf_tree
import cimodel.lib.miniutils as miniutils
from cimodel.data.pytorch_build_data import CONFIG_TREE_DATA, TopLevelNode
from cimodel.data.simple.util.branch_filters import gen_filter_dict, RC_PATTERN
from cimodel.data.simple.util.docker_constants import gen_docker_image
from cimodel.data.simple.util.branch_filters import gen_filter_dict
from cimodel.data.simple.util.docker_constants import gen_docker_image_path
from dataclasses import dataclass, field
from typing import List, Optional
@dataclass
@ -17,25 +18,18 @@ class Conf:
parms_list_ignored_for_docker_image: Optional[List[str]] = None
pyver: Optional[str] = None
cuda_version: Optional[str] = None
rocm_version: Optional[str] = None
# TODO expand this to cover all the USE_* that we want to test for
# tesnrorrt, leveldb, lmdb, redis, opencv, mkldnn, ideep, etc.
# (from https://github.com/pytorch/pytorch/pull/17323#discussion_r259453608)
is_xla: bool = False
is_vulkan: bool = False
is_pure_torch: bool = False
vulkan: bool = False
restrict_phases: Optional[List[str]] = None
gpu_resource: Optional[str] = None
dependent_tests: List = field(default_factory=list)
parent_build: Optional["Conf"] = None
parent_build: Optional['Conf'] = None
is_libtorch: bool = False
is_important: bool = False
parallel_backend: Optional[str] = None
build_only: bool = False
@staticmethod
def is_test_phase(phase):
return "test" in phase
# TODO: Eliminate the special casing for docker paths
# In the short term, we *will* need to support special casing as docker images are merged for caffe2 and pytorch
@ -48,12 +42,8 @@ class Conf:
leading.append("pytorch")
if self.is_xla and not for_docker:
leading.append("xla")
if self.is_vulkan and not for_docker:
leading.append("vulkan")
if self.is_libtorch and not for_docker:
leading.append("libtorch")
if self.is_pure_torch and not for_docker:
leading.append("pure_torch")
if self.parallel_backend is not None and not for_docker:
leading.append(self.parallel_backend)
@ -61,34 +51,23 @@ class Conf:
if self.cuda_version:
cudnn = "cudnn8" if self.cuda_version.startswith("11.") else "cudnn7"
cuda_parms.extend(["cuda" + self.cuda_version, cudnn])
if self.rocm_version:
cuda_parms.extend([f"rocm{self.rocm_version}"])
result = leading + ["linux", self.distro] + cuda_parms + self.parms
if not for_docker and self.parms_list_ignored_for_docker_image is not None:
result = result + self.parms_list_ignored_for_docker_image
return result
def gen_docker_image_path(self):
parms_source = self.parent_build or self
base_build_env_name = "-".join(parms_source.get_parms(True))
image_name, _ = gen_docker_image(base_build_env_name)
return miniutils.quote(image_name)
def gen_docker_image_requires(self):
parms_source = self.parent_build or self
base_build_env_name = "-".join(parms_source.get_parms(True))
_, requires = gen_docker_image(base_build_env_name)
return miniutils.quote(requires)
return miniutils.quote(gen_docker_image_path(base_build_env_name))
def get_build_job_name_pieces(self, build_or_test):
return self.get_parms(False) + [build_or_test]
def gen_build_name(self, build_or_test):
return (
("_".join(map(str, self.get_build_job_name_pieces(build_or_test))))
.replace(".", "_")
.replace("-", "_")
)
return ("_".join(map(str, self.get_build_job_name_pieces(build_or_test)))).replace(".", "_").replace("-", "_")
def get_dependents(self):
return self.dependent_tests or []
@ -100,28 +79,20 @@ class Conf:
build_env_name = "-".join(map(str, build_job_name_pieces))
parameters["build_environment"] = miniutils.quote(build_env_name)
parameters["docker_image"] = self.gen_docker_image_path()
if Conf.is_test_phase(phase) and self.gpu_resource:
if phase == "test" and self.gpu_resource:
parameters["use_cuda_docker_runtime"] = miniutils.quote("1")
if Conf.is_test_phase(phase):
if phase == "test":
resource_class = "large"
if self.gpu_resource:
resource_class = "gpu." + self.gpu_resource
if self.rocm_version is not None:
resource_class = "pytorch/amd-gpu"
parameters["resource_class"] = resource_class
if phase == "build" and self.rocm_version is not None:
parameters["resource_class"] = "xlarge"
if hasattr(self, 'filters'):
parameters['filters'] = self.filters
if self.build_only:
parameters['build_only'] = miniutils.quote(str(int(True)))
return parameters
def gen_workflow_job(self, phase):
job_def = OrderedDict()
job_def["name"] = self.gen_build_name(phase)
if Conf.is_test_phase(phase):
if phase == "test":
# TODO When merging the caffe2 and pytorch jobs, it might be convenient for a while to make a
# caffe2 test job dependent on a pytorch build job. This way we could quickly dedup the repeated
@ -133,85 +104,63 @@ class Conf:
job_name = "pytorch_linux_test"
else:
job_name = "pytorch_linux_build"
job_def["requires"] = [self.gen_docker_image_requires()]
if not self.is_important:
job_def["filters"] = gen_filter_dict()
job_def.update(self.gen_workflow_params(phase))
return {job_name: job_def}
return {job_name : job_def}
# TODO This is a hack to special case some configs just for the workflow list
class HiddenConf(object):
def __init__(self, name, parent_build=None, filters=None):
def __init__(self, name, parent_build=None):
self.name = name
self.parent_build = parent_build
self.filters = filters
def gen_workflow_job(self, phase):
return {
self.gen_build_name(phase): {
"requires": [self.parent_build.gen_build_name("build")],
"filters": self.filters,
}
}
return {self.gen_build_name(phase): {"requires": [self.parent_build.gen_build_name("build")]}}
def gen_build_name(self, _):
return self.name
class DocPushConf(object):
def __init__(self, name, parent_build=None, branch="master"):
self.name = name
self.parent_build = parent_build
self.branch = branch
def gen_workflow_job(self, phase):
return {
"pytorch_doc_push": {
"name": self.name,
"branch": self.branch,
"requires": [self.parent_build],
"context": "org-member",
"filters": gen_filter_dict(branches_list=["nightly"],
tags_list=RC_PATTERN)
}
}
# TODO Convert these to graph nodes
def gen_dependent_configs(xenial_parent_config):
extra_parms = [
(["multigpu"], "large"),
(["NO_AVX2"], "medium"),
(["NO_AVX", "NO_AVX2"], "medium"),
(["slow"], "medium"),
(["nogpu"], None),
]
configs = []
for parms, gpu in extra_parms:
c = Conf(
xenial_parent_config.distro,
["py3"] + parms,
pyver="3.6",
cuda_version=xenial_parent_config.cuda_version,
restrict_phases=["test"],
gpu_resource=gpu,
parent_build=xenial_parent_config,
is_important=xenial_parent_config.is_important,
)
configs.append(c)
return configs
def gen_docs_configs(xenial_parent_config):
configs = []
configs.append(
HiddenConf(
"pytorch_python_doc_build",
parent_build=xenial_parent_config,
filters=gen_filter_dict(branches_list=["master", "main", "nightly"],
tags_list=RC_PATTERN),
)
)
configs.append(
DocPushConf(
"pytorch_python_doc_push",
parent_build="pytorch_python_doc_build",
branch="site",
)
)
for x in ["pytorch_python_doc_push", "pytorch_cpp_doc_push", "pytorch_doc_test"]:
configs.append(HiddenConf(x, parent_build=xenial_parent_config))
configs.append(
HiddenConf(
"pytorch_cpp_doc_build",
parent_build=xenial_parent_config,
filters=gen_filter_dict(branches_list=["master", "main", "nightly"],
tags_list=RC_PATTERN),
)
)
configs.append(
DocPushConf(
"pytorch_cpp_doc_push",
parent_build="pytorch_cpp_doc_build",
branch="master",
)
)
return configs
@ -225,7 +174,7 @@ def gen_tree():
return configs_list
def instantiate_configs(only_slow_gradcheck):
def instantiate_configs():
config_list = []
@ -238,16 +187,11 @@ def instantiate_configs(only_slow_gradcheck):
compiler_name = fc.find_prop("compiler_name")
compiler_version = fc.find_prop("compiler_version")
is_xla = fc.find_prop("is_xla") or False
is_asan = fc.find_prop("is_asan") or False
is_crossref = fc.find_prop("is_crossref") or False
is_onnx = fc.find_prop("is_onnx") or False
is_pure_torch = fc.find_prop("is_pure_torch") or False
is_vulkan = fc.find_prop("is_vulkan") or False
is_slow_gradcheck = fc.find_prop("is_slow_gradcheck") or False
parms_list_ignored_for_docker_image = []
if only_slow_gradcheck ^ is_slow_gradcheck:
continue
vulkan = fc.find_prop("vulkan") or False
if vulkan:
parms_list_ignored_for_docker_image.append("vulkan")
python_version = None
if compiler_name == "cuda" or compiler_name == "android":
@ -257,14 +201,9 @@ def instantiate_configs(only_slow_gradcheck):
parms_list = ["py" + fc.find_prop("pyver")]
cuda_version = None
rocm_version = None
if compiler_name == "cuda":
cuda_version = fc.find_prop("compiler_version")
elif compiler_name == "rocm":
rocm_version = fc.find_prop("compiler_version")
restrict_phases = ["build", "test1", "test2", "caffe2_test"]
elif compiler_name == "android":
android_ndk_version = fc.find_prop("compiler_version")
# TODO: do we need clang to compile host binaries like protoc?
@ -278,19 +217,11 @@ def instantiate_configs(only_slow_gradcheck):
gcc_version = compiler_name + (fc.find_prop("compiler_version") or "")
parms_list.append(gcc_version)
if is_asan:
parms_list.append("asan")
python_version = fc.find_prop("pyver")
parms_list[0] = fc.find_prop("abbreviated_pyver")
if is_crossref:
parms_list_ignored_for_docker_image.append("crossref")
if is_onnx:
parms_list.append("onnx")
python_version = fc.find_prop("pyver")
parms_list[0] = fc.find_prop("abbreviated_pyver")
restrict_phases = ["build", "ort_test1", "ort_test2"]
# TODO: This is a nasty special case
if gcc_version == 'clang5' and not is_xla:
parms_list.append("asan")
python_version = fc.find_prop("pyver")
parms_list[0] = fc.find_prop("abbreviated_pyver")
if cuda_version:
cuda_gcc_version = fc.find_prop("cuda_gcc_override") or "gcc7"
@ -300,18 +231,9 @@ def instantiate_configs(only_slow_gradcheck):
is_important = fc.find_prop("is_important") or False
parallel_backend = fc.find_prop("parallel_backend") or None
build_only = fc.find_prop("build_only") or False
shard_test = fc.find_prop("shard_test") or False
# TODO: fix pure_torch python test packaging issue.
if shard_test:
restrict_phases = ["build"] if restrict_phases is None else restrict_phases
restrict_phases.extend(["test1", "test2"])
if build_only or is_pure_torch:
if build_only and restrict_phases is None:
restrict_phases = ["build"]
if is_slow_gradcheck:
parms_list_ignored_for_docker_image.append("old")
parms_list_ignored_for_docker_image.append("gradcheck")
gpu_resource = None
if cuda_version and cuda_version != "10":
gpu_resource = "medium"
@ -322,43 +244,50 @@ def instantiate_configs(only_slow_gradcheck):
parms_list_ignored_for_docker_image,
python_version,
cuda_version,
rocm_version,
is_xla,
is_vulkan,
is_pure_torch,
vulkan,
restrict_phases,
gpu_resource,
is_libtorch=is_libtorch,
is_important=is_important,
parallel_backend=parallel_backend,
build_only=build_only,
)
# run docs builds on "pytorch-linux-xenial-py3.7-gcc5.4". Docs builds
# run docs builds on "pytorch-linux-xenial-py3.6-gcc5.4". Docs builds
# should run on a CPU-only build that runs on all PRs.
# XXX should this be updated to a more modern build?
if (
distro_name == "xenial"
and fc.find_prop("pyver") == "3.7"
and cuda_version is None
and parallel_backend is None
and not is_vulkan
and not is_pure_torch
and compiler_name == "gcc"
and fc.find_prop("compiler_version") == "5.4"
):
c.filters = gen_filter_dict(branches_list=r"/.*/",
tags_list=RC_PATTERN)
if distro_name == 'xenial' and fc.find_prop("pyver") == '3.6' \
and cuda_version is None \
and parallel_backend is None \
and compiler_name == 'gcc' \
and fc.find_prop('compiler_version') == '5.4':
c.dependent_tests = gen_docs_configs(c)
if cuda_version == "10.1" and python_version == "3.6" and not is_libtorch:
c.dependent_tests = gen_dependent_configs(c)
if (compiler_name == "gcc"
and compiler_version == "5.4"
and not is_libtorch
and parallel_backend is None):
bc_breaking_check = Conf(
"backward-compatibility-check",
[],
is_xla=False,
restrict_phases=["test"],
is_libtorch=False,
is_important=True,
parent_build=c,
)
c.dependent_tests.append(bc_breaking_check)
config_list.append(c)
return config_list
def get_workflow_jobs(only_slow_gradcheck=False):
def get_workflow_jobs():
config_list = instantiate_configs(only_slow_gradcheck)
config_list = instantiate_configs()
x = []
for conf_options in config_list:
@ -368,7 +297,7 @@ def get_workflow_jobs(only_slow_gradcheck=False):
for phase in phases:
# TODO why does this not have a test?
if Conf.is_test_phase(phase) and conf_options.cuda_version == "10":
if phase == "test" and conf_options.cuda_version == "10":
continue
x.append(conf_options.gen_workflow_job(phase))
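
The hunks above touch two conventions that are easy to misread in a flattened diff: how job names and build environment names are derived from the same list of pieces, and how a phase is recognized as a test phase. The following is a minimal sketch of both, assuming a hypothetical list of name pieces (real values come from Conf.get_parms()); it is not the module's actual code path.

```python
def gen_build_name(pieces):
    """Join name pieces with underscores and normalize '.' and '-' to '_'."""
    return "_".join(map(str, pieces)).replace(".", "_").replace("-", "_")


def gen_build_env_name(pieces):
    """Join the same pieces with hyphens, keeping dots in version numbers."""
    return "-".join(map(str, pieces))


def is_test_phase(phase):
    """Substring match, so sharded phases such as 'test1' also count."""
    return "test" in phase


# Hypothetical pieces for a CUDA build, chosen only for illustration.
pieces = ["pytorch", "linux", "xenial", "cuda10.2", "cudnn7", "py3", "gcc7", "build"]
job_name = gen_build_name(pieces)
build_env = gen_build_env_name(pieces)

# Compare substring matching with the exact comparison `phase == "test"`.
phases = ["build", "test1", "test2", "caffe2_test"]
substring_matches = [p for p in phases if is_test_phase(p)]
exact_matches = [p for p in phases if p == "test"]

print(job_name)
print(build_env)
print(substring_matches, exact_matches)
```

The exact comparison only fits a side of the diff that has a single "test" phase; wherever sharded phases such as "test1" and "test2" exist, it would silently skip them.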


@ -1,28 +0,0 @@
from collections import OrderedDict
from cimodel.data.simple.util.branch_filters import gen_filter_dict
from cimodel.lib.miniutils import quote
CHANNELS_TO_PRUNE = ["pytorch-nightly", "pytorch-test"]
PACKAGES_TO_PRUNE = "pytorch torchvision torchaudio torchtext ignite torchcsprng"
def gen_workflow_job(channel: str):
return OrderedDict(
{
"anaconda_prune": OrderedDict(
{
"name": f"anaconda-prune-{channel}",
"context": quote("org-member"),
"packages": quote(PACKAGES_TO_PRUNE),
"channel": channel,
"filters": gen_filter_dict(branches_list=["postnightly"]),
}
)
}
)
def get_workflow_jobs():
return [gen_workflow_job(channel) for channel in CHANNELS_TO_PRUNE]


@ -0,0 +1,92 @@
import cimodel.data.simple.util.branch_filters
from cimodel.data.simple.util.docker_constants import DOCKER_IMAGE_NDK
class AndroidJob:
def __init__(self,
variant,
template_name,
is_master_only=True):
self.variant = variant
self.template_name = template_name
self.is_master_only = is_master_only
def gen_tree(self):
base_name_parts = [
"pytorch",
"linux",
"xenial",
"py3",
"clang5",
"android",
"ndk",
"r19c",
] + self.variant + [
"build",
]
full_job_name = "_".join(base_name_parts)
build_env_name = "-".join(base_name_parts)
props_dict = {
"name": full_job_name,
"build_environment": "\"{}\"".format(build_env_name),
"docker_image": "\"{}\"".format(DOCKER_IMAGE_NDK),
}
if self.is_master_only:
props_dict["filters"] = cimodel.data.simple.util.branch_filters.gen_filter_dict()
return [{self.template_name: props_dict}]
class AndroidGradleJob:
def __init__(self,
job_name,
template_name,
dependencies,
is_master_only=True):
self.job_name = job_name
self.template_name = template_name
self.dependencies = dependencies
self.is_master_only = is_master_only
def gen_tree(self):
props_dict = {
"name": self.job_name,
"requires": self.dependencies,
}
if self.is_master_only:
props_dict["filters"] = cimodel.data.simple.util.branch_filters.gen_filter_dict()
return [{self.template_name: props_dict}]
WORKFLOW_DATA = [
AndroidJob(["x86_32"], "pytorch_linux_build", is_master_only=False),
AndroidJob(["x86_64"], "pytorch_linux_build"),
AndroidJob(["arm", "v7a"], "pytorch_linux_build"),
AndroidJob(["arm", "v8a"], "pytorch_linux_build"),
AndroidJob(["vulkan", "x86_32"], "pytorch_linux_build", is_master_only=False),
AndroidGradleJob(
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-x86_32",
"pytorch_android_gradle_build-x86_32",
["pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build"],
is_master_only=False),
AndroidGradleJob(
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build",
"pytorch_android_gradle_build",
["pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build",
"pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_64_build",
"pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build",
"pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v8a_build"]),
]
def get_workflow_jobs():
return [item.gen_tree() for item in WORKFLOW_DATA]
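
The gradle job in WORKFLOW_DATA lists its dependencies as hard-coded strings, so they only resolve if they match the names that AndroidJob.gen_tree() generates. A small sketch of that consistency check, assuming a hypothetical helper android_job_name that mirrors the join logic shown above:

```python
BASE_PARTS = ["pytorch", "linux", "xenial", "py3", "clang5", "android", "ndk", "r19c"]


def android_job_name(variant):
    """Hypothetical helper mirroring the '_'.join used in AndroidJob.gen_tree()."""
    return "_".join(BASE_PARTS + variant + ["build"])


# The four non-Vulkan variants listed in WORKFLOW_DATA.
variants = [["x86_32"], ["x86_64"], ["arm", "v7a"], ["arm", "v8a"]]
generated = [android_job_name(v) for v in variants]

# Dependencies copied from the second AndroidGradleJob entry.
gradle_requires = [
    "pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build",
    "pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_64_build",
    "pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build",
    "pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v8a_build",
]

# Any name listed here would be a dependency that no job defines.
missing = [name for name in gradle_requires if name not in generated]
print(missing)
```

A check along these lines could run when the config is generated, so that a renamed variant fails early instead of producing a workflow with an unresolved dependency.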


@ -0,0 +1,63 @@
from cimodel.data.simple.util.docker_constants import DOCKER_IMAGE_GCC7
def gen_job_name(phase):
job_name_parts = [
"pytorch",
"bazel",
phase,
]
return "_".join(job_name_parts)
class BazelJob:
def __init__(self, phase, extra_props=None):
self.phase = phase
self.extra_props = extra_props or {}
def gen_tree(self):
template_parts = [
"pytorch",
"linux",
"bazel",
self.phase,
]
build_env_parts = [
"pytorch",
"linux",
"xenial",
"py3.6",
"gcc7",
"bazel",
self.phase,
]
full_job_name = gen_job_name(self.phase)
build_env_name = "-".join(build_env_parts)
extra_requires = [gen_job_name("build")] if self.phase == "test" else []
props_dict = {
"build_environment": build_env_name,
"docker_image": DOCKER_IMAGE_GCC7,
"name": full_job_name,
"requires": extra_requires,
}
props_dict.update(self.extra_props)
template_name = "_".join(template_parts)
return [{template_name: props_dict}]
WORKFLOW_DATA = [
BazelJob("build", {"resource_class": "large"}),
BazelJob("test"),
]
def get_workflow_jobs():
return [item.gen_tree() for item in WORKFLOW_DATA]


@ -0,0 +1,193 @@
"""
TODO: Refactor circleci/cimodel/data/binary_build_data.py to generate this file
instead of doing one offs here
Binary builds (subset, to smoke test that they'll work)
NB: If you modify this file, you need to also modify
the binary_and_smoke_tests_on_pr variable in
pytorch-ci-hud to adjust the list of whitelisted builds
at https://github.com/ezyang/pytorch-ci-hud/blob/master/src/BuildHistoryDisplay.js
Note:
This binary build is currently broken, see https://github.com/pytorch/pytorch/issues/16710
- binary_linux_conda_3_6_cu90_devtoolset7_build
- binary_linux_conda_3_6_cu90_devtoolset7_test
TODO
we should test a libtorch cuda build, but they take too long
- binary_linux_libtorch_3_6m_cu90_devtoolset7_static-without-deps_build
"""
import cimodel.lib.miniutils as miniutils
import cimodel.data.simple.util.branch_filters
class SmoketestJob:
def __init__(self,
template_name,
build_env_parts,
docker_image,
job_name,
is_master_only=False,
requires=None,
has_libtorch_variant=False,
extra_props=None):
self.template_name = template_name
self.build_env_parts = build_env_parts
self.docker_image = docker_image
self.job_name = job_name
self.is_master_only = is_master_only
self.requires = requires or []
self.has_libtorch_variant = has_libtorch_variant
self.extra_props = extra_props or {}
def gen_tree(self):
props_dict = {
"build_environment": " ".join(self.build_env_parts),
"name": self.job_name,
"requires": self.requires,
}
if self.docker_image:
props_dict["docker_image"] = self.docker_image
if self.is_master_only:
props_dict["filters"] = cimodel.data.simple.util.branch_filters.gen_filter_dict()
if self.has_libtorch_variant:
props_dict["libtorch_variant"] = "shared-with-deps"
props_dict.update(self.extra_props)
return [{self.template_name: props_dict}]
WORKFLOW_DATA = [
SmoketestJob(
"binary_linux_build",
["manywheel", "3.7m", "cu102", "devtoolset7"],
"pytorch/manylinux-cuda102",
"binary_linux_manywheel_3_7m_cu102_devtoolset7_build",
is_master_only=True,
),
SmoketestJob(
"binary_linux_build",
["libtorch", "3.7m", "cpu", "devtoolset7"],
"pytorch/manylinux-cuda102",
"binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_build",
is_master_only=False,
has_libtorch_variant=True,
),
SmoketestJob(
"binary_linux_build",
["libtorch", "3.7m", "cpu", "gcc5.4_cxx11-abi"],
"pytorch/pytorch-binary-docker-image-ubuntu16.04:latest",
"binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build",
is_master_only=False,
has_libtorch_variant=True,
),
SmoketestJob(
"binary_mac_build",
["wheel", "3.7", "cpu"],
None,
"binary_macos_wheel_3_7_cpu_build",
is_master_only=True,
),
# This job has an average run time of 3 hours o.O
# Now only running this on master to reduce overhead
SmoketestJob(
"binary_mac_build",
["libtorch", "3.7", "cpu"],
None,
"binary_macos_libtorch_3_7_cpu_build",
is_master_only=True,
),
SmoketestJob(
"binary_windows_build",
["libtorch", "3.7", "cpu", "debug"],
None,
"binary_windows_libtorch_3_7_cpu_debug_build",
is_master_only=False,
),
SmoketestJob(
"binary_windows_build",
["libtorch", "3.7", "cpu", "release"],
None,
"binary_windows_libtorch_3_7_cpu_release_build",
is_master_only=False,
),
SmoketestJob(
"binary_windows_build",
["wheel", "3.7", "cu102"],
None,
"binary_windows_wheel_3_7_cu102_build",
is_master_only=True,
),
SmoketestJob(
"binary_windows_test",
["libtorch", "3.7", "cpu", "debug"],
None,
"binary_windows_libtorch_3_7_cpu_debug_test",
is_master_only=False,
requires=["binary_windows_libtorch_3_7_cpu_debug_build"],
),
SmoketestJob(
"binary_windows_test",
["libtorch", "3.7", "cpu", "release"],
None,
"binary_windows_libtorch_3_7_cpu_release_test",
is_master_only=False,
requires=["binary_windows_libtorch_3_7_cpu_release_build"],
),
SmoketestJob(
"binary_windows_test",
["wheel", "3.7", "cu102"],
None,
"binary_windows_wheel_3_7_cu102_test",
is_master_only=True,
requires=["binary_windows_wheel_3_7_cu102_build"],
extra_props={
"executor": "windows-with-nvidia-gpu",
},
),
SmoketestJob(
"binary_linux_test",
["manywheel", "3.7m", "cu102", "devtoolset7"],
"pytorch/manylinux-cuda102",
"binary_linux_manywheel_3_7m_cu102_devtoolset7_test",
is_master_only=True,
requires=["binary_linux_manywheel_3_7m_cu102_devtoolset7_build"],
extra_props={
"resource_class": "gpu.medium",
"use_cuda_docker_runtime": miniutils.quote((str(1))),
},
),
SmoketestJob(
"binary_linux_test",
["libtorch", "3.7m", "cpu", "devtoolset7"],
"pytorch/manylinux-cuda102",
"binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_test",
is_master_only=False,
requires=["binary_linux_libtorch_3_7m_cpu_devtoolset7_shared-with-deps_build"],
has_libtorch_variant=True,
),
SmoketestJob(
"binary_linux_test",
["libtorch", "3.7m", "cpu", "gcc5.4_cxx11-abi"],
"pytorch/pytorch-binary-docker-image-ubuntu16.04:latest",
"binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_test",
is_master_only=False,
requires=["binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build"],
has_libtorch_variant=True,
),
]
def get_workflow_jobs():
return [item.gen_tree() for item in WORKFLOW_DATA]


@ -1,39 +1,44 @@
from collections import OrderedDict
from cimodel.lib.miniutils import quote
from cimodel.data.simple.util.branch_filters import gen_filter_dict, RC_PATTERN
# NOTE: All hardcoded docker image builds have been migrated to GHA
# TODO: make this generated from a matrix rather than just a static list
IMAGE_NAMES = [
"pytorch-linux-bionic-cuda11.0-cudnn8-py3.6-gcc9",
"pytorch-linux-bionic-cuda11.0-cudnn8-py3.8-gcc9",
"pytorch-linux-bionic-cuda10.2-cudnn7-py3.8-gcc9",
"pytorch-linux-bionic-py3.6-clang9",
"pytorch-linux-bionic-cuda10.2-cudnn7-py3.6-clang9",
"pytorch-linux-bionic-py3.8-gcc9",
"pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7",
"pytorch-linux-xenial-cuda10.1-cudnn7-py3-gcc7",
"pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7",
"pytorch-linux-xenial-cuda11.0-cudnn8-py3-gcc7",
"pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc5.4",
"pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
"pytorch-linux-xenial-py3-clang5-asan",
"pytorch-linux-xenial-py3.8",
"pytorch-linux-xenial-py3.6-clang7",
"pytorch-linux-xenial-py3.6-gcc4.8",
"pytorch-linux-xenial-py3.6-gcc5.4",
"pytorch-linux-xenial-py3.6-gcc7.2",
"pytorch-linux-xenial-py3.6-gcc7",
"pytorch-linux-xenial-pynightly",
"pytorch-linux-xenial-rocm3.3-py3.6",
]
# This entry should be an element from the list above
# This should contain the image matching the "slow_gradcheck" entry in
# pytorch_build_data.py
SLOW_GRADCHECK_IMAGE_NAME = "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
def get_workflow_jobs(images=IMAGE_NAMES, only_slow_gradcheck=False):
def get_workflow_jobs():
"""Generates a list of docker image build definitions"""
ret = []
for image_name in images:
if image_name.startswith('docker-'):
image_name = image_name.lstrip('docker-')
if only_slow_gradcheck and image_name is not SLOW_GRADCHECK_IMAGE_NAME:
continue
parameters = OrderedDict({
"name": quote(f"docker-{image_name}"),
"image_name": quote(image_name),
})
if image_name == "pytorch-linux-xenial-py3.7-gcc5.4":
# pushing documentation on tags requires CircleCI to also
# build all the dependencies on tags, including this docker image
parameters['filters'] = gen_filter_dict(branches_list=r"/.*/",
tags_list=RC_PATTERN)
ret.append(OrderedDict(
return [
OrderedDict(
{
"docker_build_job": parameters
"docker_build_job": OrderedDict(
{"name": quote(image_name), "image_name": quote(image_name)}
)
}
))
return ret
)
for image_name in IMAGE_NAMES
]
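
One side of this hunk strips a "docker-" prefix with image_name.lstrip('docker-'). str.lstrip treats its argument as a set of characters rather than a prefix, so it can remove more than intended. Every name in IMAGE_NAMES starts with "pytorch-", and "p" is not in that set, so the issue would not show up with the names listed here. The sketch below uses a hypothetical image name to show the difference; strip_prefix is an illustrative helper, not part of the module.

```python
def strip_prefix(name, prefix):
    """Remove `prefix` once from the start of `name`, if present."""
    return name[len(prefix):] if name.startswith(prefix) else name


# Hypothetical image name whose first characters after the prefix
# ('r', 'o', 'c') all appear in the character set "docker-".
name = "docker-rocm-image"

stripped_by_charset = name.lstrip("docker-")
stripped_by_prefix = strip_prefix(name, "docker-")

print(stripped_by_charset)
print(stripped_by_prefix)
```

The same side of the hunk compares strings with `is not`, which tests object identity rather than equality; `!=` is the safer spelling for comparing image names.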


@ -0,0 +1,103 @@
import cimodel.lib.miniutils as miniutils
from cimodel.data.simple.util.versions import MultiPartVersion, CudaVersion
from cimodel.data.simple.util.docker_constants import DOCKER_IMAGE_BASIC, DOCKER_IMAGE_CUDA_10_2
class GeConfigTestJob:
def __init__(self,
py_version,
gcc_version,
cuda_version,
variant_parts,
extra_requires,
use_cuda_docker=False,
build_env_override=None):
self.py_version = py_version
self.gcc_version = gcc_version
self.cuda_version = cuda_version
self.variant_parts = variant_parts
self.extra_requires = extra_requires
self.use_cuda_docker = use_cuda_docker
self.build_env_override = build_env_override
def get_all_parts(self, with_dots):
maybe_py_version = self.py_version.render_dots_or_parts(with_dots) if self.py_version else []
maybe_gcc_version = self.gcc_version.render_dots_or_parts(with_dots) if self.gcc_version else []
maybe_cuda_version = self.cuda_version.render_dots_or_parts(with_dots) if self.cuda_version else []
common_parts = [
"pytorch",
"linux",
"xenial",
] + maybe_cuda_version + maybe_py_version + maybe_gcc_version
return common_parts + self.variant_parts
def gen_tree(self):
resource_class = "gpu.medium" if self.use_cuda_docker else "large"
docker_image = DOCKER_IMAGE_CUDA_10_2 if self.use_cuda_docker else DOCKER_IMAGE_BASIC
full_name = "_".join(self.get_all_parts(False))
build_env = self.build_env_override or "-".join(self.get_all_parts(True))
props_dict = {
"name": full_name,
"build_environment": build_env,
"requires": self.extra_requires,
"resource_class": resource_class,
"docker_image": docker_image,
}
if self.use_cuda_docker:
props_dict["use_cuda_docker_runtime"] = miniutils.quote(str(1))
return [{"pytorch_linux_test": props_dict}]
WORKFLOW_DATA = [
GeConfigTestJob(
MultiPartVersion([3, 6], "py"),
MultiPartVersion([5, 4], "gcc"),
None,
["ge_config_legacy", "test"],
["pytorch_linux_xenial_py3_6_gcc5_4_build"]),
GeConfigTestJob(
MultiPartVersion([3, 6], "py"),
MultiPartVersion([5, 4], "gcc"),
None,
["ge_config_profiling", "test"],
["pytorch_linux_xenial_py3_6_gcc5_4_build"]),
GeConfigTestJob(
MultiPartVersion([3, 6], "py"),
MultiPartVersion([5, 4], "gcc"),
None,
["ge_config_simple", "test"],
["pytorch_linux_xenial_py3_6_gcc5_4_build"],
),
GeConfigTestJob(
None,
None,
CudaVersion(10, 2),
["cudnn7", "py3", "ge_config_legacy", "test"],
["pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build"],
use_cuda_docker=True,
# TODO Why does the build environment specify cuda10.1, while the
# job name is cuda10_2?
build_env_override="pytorch-linux-xenial-cuda10.1-cudnn7-ge_config_legacy-test"),
GeConfigTestJob(
None,
None,
CudaVersion(10, 2),
["cudnn7", "py3", "ge_config_profiling", "test"],
["pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build"],
use_cuda_docker=True,
# TODO Why does the build environment specify cuda10.1, while the
# job name is cuda10_2?
build_env_override="pytorch-linux-xenial-cuda10.1-cudnn7-ge_config_profiling-test"),
]
def get_workflow_jobs():
return [item.gen_tree() for item in WORKFLOW_DATA]
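
GeConfigTestJob builds both its job name and its build environment from the same version objects via render_dots_or_parts. The versions module is not part of this diff, so the class below is a hypothetical stand-in written only to illustrate the behaviour the call sites imply: with dots a version renders as one piece ("py3.6"), without dots it renders as separate pieces ("py3", "6").

```python
class MultiPartVersion:
    """Hypothetical stand-in for cimodel.data.simple.util.versions.MultiPartVersion."""

    def __init__(self, parts, prefix=""):
        self.parts = parts
        self.prefix = prefix

    def prefixed_parts(self):
        # Assumes at least one version part; the prefix attaches to the first.
        head, *tail = [str(p) for p in self.parts]
        return [self.prefix + head] + tail

    def render_dots_or_parts(self, with_dots):
        parts = self.prefixed_parts()
        return [".".join(parts)] if with_dots else parts


def all_parts(py_version, gcc_version, variant_parts, with_dots):
    """Mirror of get_all_parts() for the CPU-only case (no CUDA version)."""
    rendered = (py_version.render_dots_or_parts(with_dots)
                + gcc_version.render_dots_or_parts(with_dots))
    return ["pytorch", "linux", "xenial"] + rendered + variant_parts


py = MultiPartVersion([3, 6], "py")
gcc = MultiPartVersion([5, 4], "gcc")
variant = ["ge_config_legacy", "test"]

full_name = "_".join(all_parts(py, gcc, variant, False))
build_env = "-".join(all_parts(py, gcc, variant, True))
print(full_name)
print(build_env)
```

Under these assumptions the generated job name lines up with the hard-coded dependency "pytorch_linux_xenial_py3_6_gcc5_4_build" in WORKFLOW_DATA, which uses the same underscore-separated pieces.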


@ -1,16 +1,16 @@
from cimodel.data.simple.util.versions import MultiPartVersion
import cimodel.lib.miniutils as miniutils
XCODE_VERSION = MultiPartVersion([12, 5, 1])
IOS_VERSION = MultiPartVersion([11, 2, 1])
class ArchVariant:
def __init__(self, name, custom_build_name=""):
def __init__(self, name, is_custom=False):
self.name = name
self.custom_build_name = custom_build_name
self.is_custom = is_custom
def render(self):
extra_parts = [self.custom_build_name] if len(self.custom_build_name) > 0 else []
extra_parts = ["custom"] if self.is_custom else []
return "_".join([self.name] + extra_parts)
@ -19,15 +19,15 @@ def get_platform(arch_variant_name):
class IOSJob:
def __init__(self, xcode_version, arch_variant, is_org_member_context=True, extra_props=None):
self.xcode_version = xcode_version
def __init__(self, ios_version, arch_variant, is_org_member_context=True, extra_props=None):
self.ios_version = ios_version
self.arch_variant = arch_variant
self.is_org_member_context = is_org_member_context
self.extra_props = extra_props
def gen_name_parts(self, with_version_dots):
version_parts = self.xcode_version.render_dots_or_parts(with_version_dots)
version_parts = self.ios_version.render_dots_or_parts(with_version_dots)
build_variant_suffix = "_".join([self.arch_variant.render(), "build"])
return [
@ -61,26 +61,9 @@ class IOSJob:
WORKFLOW_DATA = [
IOSJob(XCODE_VERSION, ArchVariant("x86_64"), is_org_member_context=False, extra_props={
"lite_interpreter": miniutils.quote(str(int(True)))}),
IOSJob(XCODE_VERSION, ArchVariant("x86_64", "full_jit"), is_org_member_context=False, extra_props={
"lite_interpreter": miniutils.quote(str(int(False)))}),
IOSJob(XCODE_VERSION, ArchVariant("arm64"), extra_props={
"lite_interpreter": miniutils.quote(str(int(True)))}),
IOSJob(XCODE_VERSION, ArchVariant("arm64", "metal"), extra_props={
"use_metal": miniutils.quote(str(int(True))),
"lite_interpreter": miniutils.quote(str(int(True)))}),
IOSJob(XCODE_VERSION, ArchVariant("arm64", "full_jit"), extra_props={
"lite_interpreter": miniutils.quote(str(int(False)))}),
IOSJob(XCODE_VERSION, ArchVariant("arm64", "custom"), extra_props={
"op_list": "mobilenetv2.yaml",
"lite_interpreter": miniutils.quote(str(int(True)))}),
IOSJob(XCODE_VERSION, ArchVariant("x86_64", "coreml"), is_org_member_context=False, extra_props={
"use_coreml": miniutils.quote(str(int(True))),
"lite_interpreter": miniutils.quote(str(int(True)))}),
IOSJob(XCODE_VERSION, ArchVariant("arm64", "coreml"), extra_props={
"use_coreml": miniutils.quote(str(int(True))),
"lite_interpreter": miniutils.quote(str(int(True)))}),
IOSJob(IOS_VERSION, ArchVariant("x86_64"), is_org_member_context=False),
IOSJob(IOS_VERSION, ArchVariant("arm64")),
IOSJob(IOS_VERSION, ArchVariant("arm64", True), extra_props={"op_list": "mobilenetv2.yaml"}),
]


@ -1,22 +1,14 @@
class MacOsJob:
def __init__(self, os_version, is_build=False, is_test=False, extra_props=tuple()):
# extra_props is a tuple because mutable data structures are not recommended
# as argument defaults.
def __init__(self, os_version, is_test=False):
self.os_version = os_version
self.is_build = is_build
self.is_test = is_test
self.extra_props = dict(extra_props)
def gen_tree(self):
non_phase_parts = ["pytorch", "macos", self.os_version, "py3"]
extra_name_list = [name for name, exist in self.extra_props.items() if exist]
full_job_name_list = non_phase_parts + extra_name_list + [
'build' if self.is_build else None,
'test' if self.is_test else None,
]
phase_name = "test" if self.is_test else "build"
full_job_name = "_".join(list(filter(None, full_job_name_list)))
full_job_name = "_".join(non_phase_parts + [phase_name])
test_build_dependency = "_".join(non_phase_parts + ["build"])
extra_dependencies = [test_build_dependency] if self.is_test else []
@ -29,23 +21,7 @@ class MacOsJob:
return [{full_job_name: props_dict}]
WORKFLOW_DATA = [
MacOsJob("10_15", is_build=True),
MacOsJob("10_13", is_build=True),
MacOsJob(
"10_13",
is_build=False,
is_test=True,
),
MacOsJob(
"10_13",
is_build=True,
is_test=True,
extra_props=tuple({
"lite_interpreter": True
}.items()),
)
]
WORKFLOW_DATA = [MacOsJob("10_13"), MacOsJob("10_13", True)]
def get_workflow_jobs():
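
The comment in one version of MacOsJob.__init__ explains that extra_props defaults to a tuple because mutable defaults are discouraged. The reason is that Python evaluates a default argument once, when the function is defined, so a mutable default is shared across calls. A short illustration with hypothetical function names:

```python
def add_job_shared(name, jobs=[]):
    """Pitfall: the default list is created once and reused by every call."""
    jobs.append(name)
    return jobs


def add_job_fresh(name, jobs=None):
    """Common fix: use None as the default and create a new list per call."""
    jobs = [] if jobs is None else jobs
    jobs.append(name)
    return jobs


first = add_job_shared("build")
second = add_job_shared("test")   # appends to the same list as the first call

fresh_first = add_job_fresh("build")
fresh_second = add_job_fresh("test")

print(first is second, second)
print(fresh_first, fresh_second)
```

An immutable default such as tuple(), converted with dict(extra_props) inside the constructor, avoids the same problem without a None check.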


@ -4,17 +4,12 @@ PyTorch Mobile PR builds (use linux host toolchain + mobile build options)
import cimodel.lib.miniutils as miniutils
import cimodel.data.simple.util.branch_filters
from cimodel.data.simple.util.docker_constants import DOCKER_IMAGE_ASAN, DOCKER_IMAGE_NDK
class MobileJob:
def __init__(
self,
docker_image,
docker_requires,
variant_parts,
is_master_only=False):
def __init__(self, docker_image, variant_parts, is_master_only=False):
self.docker_image = docker_image
self.docker_requires = docker_requires
self.variant_parts = variant_parts
self.is_master_only = is_master_only
@ -35,7 +30,6 @@ class MobileJob:
"build_environment": build_env_name,
"build_only": miniutils.quote(str(int(True))),
"docker_image": self.docker_image,
"requires": self.docker_requires,
"name": full_job_name,
}
@ -46,6 +40,15 @@ class MobileJob:
WORKFLOW_DATA = [
MobileJob(DOCKER_IMAGE_ASAN, ["build"]),
MobileJob(DOCKER_IMAGE_ASAN, ["custom", "build", "static"]),
# Use LLVM-DEV toolchain in android-ndk-r19c docker image
MobileJob(DOCKER_IMAGE_NDK, ["custom", "build", "dynamic"]),
# Use LLVM-DEV toolchain in android-ndk-r19c docker image
# Most of this CI is already covered by "mobile-custom-build-dynamic" job
MobileJob(DOCKER_IMAGE_NDK, ["code", "analysis"], True),
]


@ -0,0 +1,73 @@
from cimodel.data.simple.util.docker_constants import DOCKER_IMAGE_NDK
class AndroidNightlyJob:
def __init__(self,
variant,
template_name,
extra_props=None,
with_docker=True,
requires=None,
no_build_suffix=False):
self.variant = variant
self.template_name = template_name
self.extra_props = extra_props or {}
self.with_docker = with_docker
self.requires = requires
self.no_build_suffix = no_build_suffix
def gen_tree(self):
base_name_parts = [
"pytorch",
"linux",
"xenial",
"py3",
"clang5",
"android",
"ndk",
"r19c",
] + self.variant
build_suffix = [] if self.no_build_suffix else ["build"]
full_job_name = "_".join(["nightly"] + base_name_parts + build_suffix)
build_env_name = "-".join(base_name_parts)
props_dict = {
"name": full_job_name,
"requires": self.requires,
"filters": {"branches": {"only": "nightly"}},
}
props_dict.update(self.extra_props)
if self.with_docker:
props_dict["docker_image"] = DOCKER_IMAGE_NDK
props_dict["build_environment"] = build_env_name
return [{self.template_name: props_dict}]
WORKFLOW_DATA = [
AndroidNightlyJob(["x86_32"], "pytorch_linux_build"),
AndroidNightlyJob(["x86_64"], "pytorch_linux_build"),
AndroidNightlyJob(["arm", "v7a"], "pytorch_linux_build"),
AndroidNightlyJob(["arm", "v8a"], "pytorch_linux_build"),
AndroidNightlyJob(["android_gradle"], "pytorch_android_gradle_build",
with_docker=False,
requires=[
"nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_build",
"nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_64_build",
"nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v7a_build",
"nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_arm_v8a_build"]),
AndroidNightlyJob(["x86_32_android_publish_snapshot"], "pytorch_android_publish_snapshot",
extra_props={"context": "org-member"},
with_docker=False,
requires=["nightly_pytorch_linux_xenial_py3_clang5_android_ndk_r19c_android_gradle_build"],
no_build_suffix=True),
]
def get_workflow_jobs():
return [item.gen_tree() for item in WORKFLOW_DATA]
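
Review note: the `requires` lists above are spelled out by hand, so they must match the names that `gen_tree` produces. A minimal sketch of that name composition follows, assuming the same base parts as the hunk; `compose_names` is a hypothetical helper used only for illustration and is not part of the change.

```python
# Hypothetical helper mirroring the name composition in AndroidNightlyJob.gen_tree.
BASE_NAME_PARTS = [
    "pytorch", "linux", "xenial", "py3", "clang5", "android", "ndk", "r19c",
]


def compose_names(variant, no_build_suffix=False):
    """Return (workflow job name, build environment name) for a variant list."""
    parts = BASE_NAME_PARTS + variant
    build_suffix = [] if no_build_suffix else ["build"]
    # Job names are underscore-joined and prefixed with "nightly".
    full_job_name = "_".join(["nightly"] + parts + build_suffix)
    # Build environment names are dash-joined and carry no prefix or suffix.
    build_env_name = "-".join(parts)
    return full_job_name, build_env_name


job_name, env_name = compose_names(["arm", "v7a"])
print(job_name)
print(env_name)
```

Deriving the `requires` entries from such a helper, rather than hard-coding the strings, would keep them in sync if the base name parts ever change.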

View File

@ -1,15 +1,12 @@
import cimodel.data.simple.ios_definitions as ios_definitions
import cimodel.lib.miniutils as miniutils
class IOSNightlyJob:
def __init__(self,
variant,
is_full_jit=False,
is_upload=False):
self.variant = variant
self.is_full_jit = is_full_jit
self.is_upload = is_upload
def get_phase_name(self):
@ -19,12 +16,9 @@ class IOSNightlyJob:
extra_name_suffix = [self.get_phase_name()] if self.is_upload else []
extra_name = ["full_jit"] if self.is_full_jit else []
common_name_pieces = [
"ios",
] + extra_name + [
] + ios_definitions.XCODE_VERSION.render_dots_or_parts(with_version_dots) + [
] + ios_definitions.IOS_VERSION.render_dots_or_parts(with_version_dots) + [
"nightly",
self.variant,
"build",
@ -36,8 +30,7 @@ class IOSNightlyJob:
return "_".join(["pytorch"] + self.get_common_name_pieces(False))
def gen_tree(self):
build_configs = BUILD_CONFIGS_FULL_JIT if self.is_full_jit else BUILD_CONFIGS
extra_requires = [x.gen_job_name() for x in build_configs] if self.is_upload else []
extra_requires = [x.gen_job_name() for x in BUILD_CONFIGS] if self.is_upload else []
props_dict = {
"build_environment": "-".join(["libtorch"] + self.get_common_name_pieces(True)),
@ -50,11 +43,6 @@ class IOSNightlyJob:
props_dict["ios_arch"] = self.variant
props_dict["ios_platform"] = ios_definitions.get_platform(self.variant)
props_dict["name"] = self.gen_job_name()
props_dict["use_metal"] = miniutils.quote(str(int(True)))
props_dict["use_coreml"] = miniutils.quote(str(int(True)))
if self.is_full_jit:
props_dict["lite_interpreter"] = miniutils.quote(str(int(False)))
template_name = "_".join([
"binary",
@ -70,14 +58,9 @@ BUILD_CONFIGS = [
IOSNightlyJob("arm64"),
]
BUILD_CONFIGS_FULL_JIT = [
IOSNightlyJob("x86_64", is_full_jit=True),
IOSNightlyJob("arm64", is_full_jit=True),
]
WORKFLOW_DATA = BUILD_CONFIGS + BUILD_CONFIGS_FULL_JIT + [
IOSNightlyJob("binary", is_full_jit=False, is_upload=True),
IOSNightlyJob("binary", is_full_jit=True, is_upload=True),
WORKFLOW_DATA = BUILD_CONFIGS + [
IOSNightlyJob("binary", is_upload=True),
]

View File

@ -1,15 +1,9 @@
NON_PR_BRANCH_LIST = [
"main",
"master",
r"/ci-all\/.*/",
r"/release\/.*/",
]
PR_BRANCH_LIST = [
r"/gh\/.*\/head/",
r"/pull\/.*/",
]
RC_PATTERN = r"/v[0-9]+(\.[0-9]+)*-rc[0-9]+/"
def gen_filter_dict(

View File

@ -1,33 +1,30 @@
AWS_DOCKER_HOST = "308535385114.dkr.ecr.us-east-1.amazonaws.com"
def gen_docker_image(container_type):
return (
"/".join([AWS_DOCKER_HOST, "pytorch", container_type]),
f"docker-{container_type}",
)
def gen_docker_image_requires(image_name):
return [f"docker-{image_name}"]
# ARE YOU EDITING THIS NUMBER? MAKE SURE YOU READ THE GUIDANCE AT THE
# TOP OF .circleci/config.yml
DOCKER_IMAGE_TAG = "209062ef-ab58-422a-b295-36c4eed6e906"
DOCKER_IMAGE_BASIC, DOCKER_REQUIREMENT_BASE = gen_docker_image(
"pytorch-linux-xenial-py3.7-gcc5.4"
)
DOCKER_IMAGE_CUDA_10_2, DOCKER_REQUIREMENT_CUDA_10_2 = gen_docker_image(
"pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
)
DOCKER_IMAGE_GCC7, DOCKER_REQUIREMENT_GCC7 = gen_docker_image(
"pytorch-linux-xenial-py3.7-gcc7"
)
def gen_docker_image_path(container_type):
return "/".join([
AWS_DOCKER_HOST,
"pytorch",
container_type + ":" + DOCKER_IMAGE_TAG,
])
def gen_mobile_docker(specifier):
DOCKER_IMAGE_BASIC = gen_docker_image_path("pytorch-linux-xenial-py3.6-gcc5.4")
DOCKER_IMAGE_CUDA_10_2 = gen_docker_image_path("pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7")
DOCKER_IMAGE_GCC7 = gen_docker_image_path("pytorch-linux-xenial-py3.6-gcc7")
def gen_mobile_docker_name(specifier):
container_type = "pytorch-linux-xenial-py3-clang5-" + specifier
return gen_docker_image(container_type)
return gen_docker_image_path(container_type)
DOCKER_IMAGE_ASAN, DOCKER_REQUIREMENT_ASAN = gen_mobile_docker("asan")
DOCKER_IMAGE_ASAN = gen_mobile_docker_name("asan")
DOCKER_IMAGE_NDK, DOCKER_REQUIREMENT_NDK = gen_mobile_docker("android-ndk-r19c")
DOCKER_IMAGE_NDK = gen_mobile_docker_name("android-ndk-r19c")
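
Review note: a minimal sketch of what the path-based variant in this hunk evaluates to. It assumes that `DOCKER_IMAGE_TAG` and `gen_docker_image_path` live on the same side of the diff, which the flattened hunk does not make explicit.

```python
# Constants copied from the hunk; pairing them with the path-based helper is an assumption.
AWS_DOCKER_HOST = "308535385114.dkr.ecr.us-east-1.amazonaws.com"
DOCKER_IMAGE_TAG = "209062ef-ab58-422a-b295-36c4eed6e906"


def gen_docker_image_path(container_type):
    """Build the fully qualified ECR image reference, including the tag."""
    return "/".join([
        AWS_DOCKER_HOST,
        "pytorch",
        container_type + ":" + DOCKER_IMAGE_TAG,
    ])


def gen_mobile_docker_name(specifier):
    """Mobile images share a fixed prefix and differ only in the specifier."""
    container_type = "pytorch-linux-xenial-py3-clang5-" + specifier
    return gen_docker_image_path(container_type)


asan_image = gen_mobile_docker_name("asan")
print(asan_image)
```

The other variant in the hunk, `gen_docker_image`, instead returns a pair: an untagged image path and a `docker-<container_type>` requirement name.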

View File

@ -9,7 +9,7 @@ class MultiPartVersion:
with the prefix string.
"""
if self.parts:
return [self.prefix + str(self.parts[0])] + [str(part) for part in self.parts[1:]]
return [self.prefix + str(self.parts[0])] + list(map(str, self.parts[1:]))
else:
return [self.prefix]
@ -29,6 +29,3 @@ class CudaVersion(MultiPartVersion):
self.minor = minor
super().__init__([self.major, self.minor], "cuda")
def __str__(self):
return f"{self.major}.{self.minor}"

View File

@ -0,0 +1,142 @@
import cimodel.data.simple.util.branch_filters
import cimodel.lib.miniutils as miniutils
from cimodel.data.simple.util.versions import CudaVersion
class WindowsJob:
def __init__(
self,
test_index,
vscode_spec,
cuda_version,
force_on_cpu=False,
master_only_pred=lambda job: job.vscode_spec.year != 2019,
):
self.test_index = test_index
self.vscode_spec = vscode_spec
self.cuda_version = cuda_version
self.force_on_cpu = force_on_cpu
self.master_only_pred = master_only_pred
def gen_tree(self):
base_phase = "build" if self.test_index is None else "test"
numbered_phase = (
base_phase if self.test_index is None else base_phase + str(self.test_index)
)
key_name = "_".join(["pytorch", "windows", base_phase])
cpu_forcing_name_parts = ["on", "cpu"] if self.force_on_cpu else []
target_arch = self.cuda_version.render_dots() if self.cuda_version else "cpu"
base_name_parts = [
"pytorch",
"windows",
self.vscode_spec.render(),
"py36",
target_arch,
]
prerequisite_jobs = []
if base_phase == "test":
prerequisite_jobs.append("_".join(base_name_parts + ["build"]))
arch_env_elements = (
["cuda" + str(self.cuda_version.major), "cudnn7"]
if self.cuda_version
else ["cpu"]
)
build_environment_string = "-".join(
["pytorch", "win"]
+ self.vscode_spec.get_elements()
+ arch_env_elements
+ ["py3"]
)
is_running_on_cuda = bool(self.cuda_version) and not self.force_on_cpu
props_dict = {
"build_environment": build_environment_string,
"python_version": miniutils.quote("3.6"),
"vc_version": miniutils.quote(self.vscode_spec.dotted_version()),
"vc_year": miniutils.quote(str(self.vscode_spec.year)),
"vc_product": self.vscode_spec.get_product(),
"use_cuda": miniutils.quote(str(int(is_running_on_cuda))),
"requires": prerequisite_jobs,
}
if self.master_only_pred(self):
props_dict[
"filters"
] = cimodel.data.simple.util.branch_filters.gen_filter_dict()
name_parts = base_name_parts + cpu_forcing_name_parts + [numbered_phase]
if base_phase == "test":
test_name = "-".join(["pytorch", "windows", numbered_phase])
props_dict["test_name"] = test_name
if is_running_on_cuda:
props_dict["executor"] = "windows-with-nvidia-gpu"
props_dict["cuda_version"] = (
miniutils.quote(str(self.cuda_version.major))
if self.cuda_version
else "cpu"
)
props_dict["name"] = "_".join(name_parts)
return [{key_name: props_dict}]
class VcSpec:
def __init__(self, year, version_elements=None):
self.year = year
self.version_elements = version_elements or []
def get_elements(self):
return [self.prefixed_year()] + self.version_elements
def get_product(self):
return "Community" if self.year == 2019 else "BuildTools"
def dotted_version(self):
return ".".join(self.version_elements)
def prefixed_year(self):
return "vs" + str(self.year)
def render(self):
return "_".join(filter(None, [self.prefixed_year(), self.dotted_version()]))
def FalsePred(_):
return False
def TruePred(_):
return True
WORKFLOW_DATA = [
# VS2017 CUDA-10.1
WindowsJob(None, VcSpec(2017, ["14", "11"]), CudaVersion(10, 1), master_only_pred=FalsePred),
WindowsJob(1, VcSpec(2017, ["14", "11"]), CudaVersion(10, 1)),
# VS2017 no-CUDA (builds only)
WindowsJob(None, VcSpec(2017, ["14", "16"]), CudaVersion(10, 1)),
WindowsJob(None, VcSpec(2017, ["14", "16"]), None),
# VS2019 CUDA-10.1
WindowsJob(None, VcSpec(2019), CudaVersion(10, 1)),
WindowsJob(1, VcSpec(2019), CudaVersion(10, 1)),
WindowsJob(2, VcSpec(2019), CudaVersion(10, 1)),
# VS2019 CPU-only
WindowsJob(None, VcSpec(2019), None),
WindowsJob(1, VcSpec(2019), None),
WindowsJob(2, VcSpec(2019), None, master_only_pred=TruePred),
WindowsJob(1, VcSpec(2019), CudaVersion(10, 1), force_on_cpu=True),
WindowsJob(2, VcSpec(2019), CudaVersion(10, 1), force_on_cpu=True, master_only_pred=TruePred),
]
def get_windows_workflows():
return [item.gen_tree() for item in WORKFLOW_DATA]
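
Review note: `VcSpec` feeds both the job name (via `render`) and the build environment string (via `get_elements`), and the two use different separators. The sketch below copies the relevant methods from the hunk to show the difference; the sample values are taken from `WORKFLOW_DATA`, and the CPU-only environment string is an illustrative assumption.

```python
class VcSpec:
    """Subset of the VcSpec class from the hunk, reproduced for illustration."""

    def __init__(self, year, version_elements=None):
        self.year = year
        self.version_elements = version_elements or []

    def get_elements(self):
        return [self.prefixed_year()] + self.version_elements

    def dotted_version(self):
        return ".".join(self.version_elements)

    def prefixed_year(self):
        return "vs" + str(self.year)

    def render(self):
        # filter(None, ...) drops the empty dotted version when no elements are given.
        return "_".join(filter(None, [self.prefixed_year(), self.dotted_version()]))


vs2017 = VcSpec(2017, ["14", "11"])
vs2019 = VcSpec(2019)

print(vs2017.render())
print(vs2019.render())

# Build environment string for a CPU-only job, following the join in gen_tree.
cpu_env = "-".join(["pytorch", "win"] + vs2017.get_elements() + ["cpu"] + ["py3"])
print(cpu_env)
```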

File diff suppressed because it is too large

View File

@ -12,20 +12,8 @@ each image as the `BUILD_ENVIRONMENT` environment variable.
See `build.sh` for valid build environments (it's the giant switch).
Docker builds are now defined with `.circleci/cimodel/data/simple/docker_definitions.py`
## Contents
* `build.sh` -- dispatch script to launch all builds
* `common` -- scripts used to execute individual Docker build stages
* `ubuntu-cuda` -- Dockerfile for Ubuntu image with CUDA support for nvidia-docker
## Usage
```bash
# Build a specific image
./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest
# Set flags (see build.sh) and build image
sudo bash -c 'PROTOBUF=1 ./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest'
```

View File

@ -20,8 +20,10 @@ buildscript {
}
dependencies {
classpath 'com.android.tools.build:gradle:4.1.2'
classpath 'com.vanniktech:gradle-maven-publish-plugin:0.14.2'
classpath 'com.android.tools.build:gradle:3.3.2'
classpath "com.jfrog.bintray.gradle:gradle-bintray-plugin:1.8.0"
classpath "com.github.dcendents:android-maven-gradle-plugin:2.1"
classpath "org.jfrog.buildinfo:build-info-extractor-gradle:4.9.8"
}
}
@ -51,9 +53,9 @@ android {
dependencies {
implementation 'com.android.support:appcompat-v7:28.0.0'
implementation 'androidx.appcompat:appcompat:1.0.0'
implementation 'com.facebook.fbjni:fbjni-java-only:0.2.2'
implementation 'com.facebook.fbjni:fbjni-java-only:0.0.3'
implementation 'com.google.code.findbugs:jsr305:3.0.1'
implementation 'com.facebook.soloader:nativeloader:0.10.1'
implementation 'com.facebook.soloader:nativeloader:0.8.0'
implementation 'junit:junit:' + rootProject.junitVersion
implementation 'androidx.test:core:' + rootProject.coreVersion

View File

@ -10,43 +10,18 @@ if [ -z "${image}" ]; then
exit 1
fi
function extract_version_from_image_name() {
eval export $2=$(echo "${image}" | perl -n -e"/$1(\d+(\.\d+)?(\.\d+)?)/ && print \$1")
if [ "x${!2}" = x ]; then
echo "variable '$2' not correctly parsed from image='$image'"
exit 1
fi
}
function extract_all_from_image_name() {
# split $image into an array of parts on '-'
keep_IFS="$IFS"
IFS="-"
declare -a parts=($image)
IFS="$keep_IFS"
unset keep_IFS
for part in "${parts[@]}"; do
name=$(echo "${part}" | perl -n -e"/([a-zA-Z]+)\d+(\.\d+)?(\.\d+)?/ && print \$1")
vername="${name^^}_VERSION"
# "py" is the odd one out, needs this special case
if [ "x${name}" = xpy ]; then
vername=ANACONDA_PYTHON_VERSION
fi
# skip non-conforming fields such as "pytorch", "linux" or "xenial" without version string
if [ -n "${name}" ]; then
extract_version_from_image_name "${name}" "${vername}"
fi
done
}
# Use the same pre-built XLA test image from PyTorch/XLA
if [[ "$image" == *xla* ]]; then
echo "Using pre-built XLA test image..."
exit 0
# TODO: Generalize
OS="ubuntu"
DOCKERFILE="${OS}/Dockerfile"
if [[ "$image" == *-cuda* ]]; then
DOCKERFILE="${OS}-cuda/Dockerfile"
elif [[ "$image" == *-rocm* ]]; then
DOCKERFILE="${OS}-rocm/Dockerfile"
fi
if [[ "$image" == *-xenial* ]]; then
if [[ "$image" == *-trusty* ]]; then
UBUNTU_VERSION=14.04
elif [[ "$image" == *-xenial* ]]; then
UBUNTU_VERSION=16.04
elif [[ "$image" == *-artful* ]]; then
UBUNTU_VERSION=17.10
@ -54,30 +29,6 @@ elif [[ "$image" == *-bionic* ]]; then
UBUNTU_VERSION=18.04
elif [[ "$image" == *-focal* ]]; then
UBUNTU_VERSION=20.04
elif [[ "$image" == *ubuntu* ]]; then
extract_version_from_image_name ubuntu UBUNTU_VERSION
elif [[ "$image" == *centos* ]]; then
extract_version_from_image_name centos CENTOS_VERSION
fi
if [ -n "${UBUNTU_VERSION}" ]; then
OS="ubuntu"
elif [ -n "${CENTOS_VERSION}" ]; then
OS="centos"
else
echo "Unable to derive operating system base..."
exit 1
fi
DOCKERFILE="${OS}/Dockerfile"
if [[ "$image" == *cuda* ]]; then
DOCKERFILE="${OS}-cuda/Dockerfile"
elif [[ "$image" == *rocm* ]]; then
DOCKERFILE="${OS}-rocm/Dockerfile"
fi
if [[ "$image" == *xenial* ]] || [[ "$image" == *bionic* ]]; then
CMAKE_VERSION=3.13.5
fi
TRAVIS_DL_URL_PREFIX="https://s3.amazonaws.com/travis-python-archives/binaries/ubuntu/14.04/x86_64"
@ -87,66 +38,98 @@ TRAVIS_DL_URL_PREFIX="https://s3.amazonaws.com/travis-python-archives/binaries/u
# from scratch
case "$image" in
pytorch-linux-xenial-py3.8)
ANACONDA_PYTHON_VERSION=3.8
# TODO: This is a hack, get rid of this as soon as you get rid of the travis downloads
TRAVIS_DL_URL_PREFIX="https://s3.amazonaws.com/travis-python-archives/binaries/ubuntu/16.04/x86_64"
TRAVIS_PYTHON_VERSION=3.8
GCC_VERSION=7
# Do not install PROTOBUF, DB, and VISION as a test
;;
pytorch-linux-xenial-py3.7-gcc5.4)
ANACONDA_PYTHON_VERSION=3.7
pytorch-linux-xenial-py3.6-gcc4.8)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=4.8
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3.6-gcc5.4)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=5
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
;;
pytorch-linux-xenial-py3.7-gcc7.2)
ANACONDA_PYTHON_VERSION=3.7
pytorch-linux-xenial-py3.6-gcc7.2)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
# Do not install PROTOBUF, DB, and VISION as a test
;;
pytorch-linux-xenial-py3.7-gcc7)
ANACONDA_PYTHON_VERSION=3.7
pytorch-linux-xenial-py3.6-gcc7)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-pynightly)
TRAVIS_PYTHON_VERSION=nightly
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc5.4)
CUDA_VERSION=9.2
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=5
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7)
CUDA_VERSION=9.2
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7)
CUDA_VERSION=10.0
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-cuda10.1-cudnn7-py3-gcc7)
CUDA_VERSION=10.1
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
;;
pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7)
CUDA_VERSION=10.2
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.7
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
;;
pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7)
CUDA_VERSION=11.3.0 # Deviating from major.minor to conform to nvidia's Docker image names
pytorch-linux-xenial-cuda11.0-cudnn8-py3-gcc7)
UBUNTU_VERSION=16.04-rc
CUDA_VERSION=11.0
CUDNN_VERSION=8
TENSORRT_VERSION=8.0.1.6
ANACONDA_PYTHON_VERSION=3.7
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
;;
pytorch-linux-bionic-cuda11.3-cudnn8-py3-clang9)
CUDA_VERSION=11.3.0 # Deviating from major.minor to conform to nvidia's Docker image names
CUDNN_VERSION=8
TENSORRT_VERSION=8.0.1.6
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
;;
pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7)
CUDA_VERSION=11.6.0
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.7
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
PROTOBUF=yes
DB=yes
@ -154,51 +137,36 @@ case "$image" in
KATEX=yes
;;
pytorch-linux-xenial-py3-clang5-asan)
ANACONDA_PYTHON_VERSION=3.7
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=5.0
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3-clang7-asan)
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3-clang7-onnx)
ANACONDA_PYTHON_VERSION=3.7
CLANG_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3-clang5-android-ndk-r19c)
ANACONDA_PYTHON_VERSION=3.7
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=5.0
LLVMDEV=yes
PROTOBUF=yes
ANDROID=yes
ANDROID_NDK_VERSION=r19c
GRADLE_VERSION=6.8.3
GRADLE_VERSION=4.10.3
CMAKE_VERSION=3.7.0
NINJA_VERSION=1.9.0
;;
pytorch-linux-xenial-py3.7-clang7)
ANACONDA_PYTHON_VERSION=3.7
pytorch-linux-xenial-py3.6-clang7)
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-bionic-py3.7-clang9)
ANACONDA_PYTHON_VERSION=3.7
pytorch-linux-bionic-py3.6-clang9)
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
VULKAN_SDK_VERSION=1.2.162.1
SWIFTSHADER=yes
;;
pytorch-linux-bionic-py3.8-gcc9)
ANACONDA_PYTHON_VERSION=3.8
@ -207,81 +175,62 @@ case "$image" in
DB=yes
VISION=yes
;;
pytorch-linux-bionic-cuda10.2-cudnn7-py3.7-clang9)
pytorch-linux-bionic-cuda10.2-cudnn7-py3.6-clang9)
CUDA_VERSION=10.2
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.7
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7)
pytorch-linux-bionic-cuda10.2-cudnn7-py3.8-gcc9)
CUDA_VERSION=10.2
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.9
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-bionic-rocm5.0-py3.7)
ANACONDA_PYTHON_VERSION=3.7
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=5.0
;;
pytorch-linux-bionic-rocm5.1-py3.7)
ANACONDA_PYTHON_VERSION=3.7
pytorch-linux-bionic-cuda11.0-cudnn8-py3.6-gcc9)
UBUNTU_VERSION=18.04-rc
CUDA_VERSION=11.0
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=5.1.1
;;
pytorch-linux-focal-py3.7-gcc7)
ANACONDA_PYTHON_VERSION=3.7
CMAKE_VERSION=3.12.4 # To make sure XNNPACK is enabled for the BACKWARDS_COMPAT_TEST used with this image
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
;;
*)
# Catch-all for builds that are not hardcoded.
pytorch-linux-bionic-cuda11.0-cudnn8-py3.8-gcc9)
UBUNTU_VERSION=18.04-rc
CUDA_VERSION=11.0
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
echo "image '$image' did not match an existing build configuration"
if [[ "$image" == *py* ]]; then
extract_version_from_image_name py ANACONDA_PYTHON_VERSION
fi
if [[ "$image" == *cuda* ]]; then
extract_version_from_image_name cuda CUDA_VERSION
extract_version_from_image_name cudnn CUDNN_VERSION
fi
if [[ "$image" == *rocm* ]]; then
extract_version_from_image_name rocm ROCM_VERSION
fi
if [[ "$image" == *gcc* ]]; then
extract_version_from_image_name gcc GCC_VERSION
fi
if [[ "$image" == *clang* ]]; then
extract_version_from_image_name clang CLANG_VERSION
fi
if [[ "$image" == *devtoolset* ]]; then
extract_version_from_image_name devtoolset DEVTOOLSET_VERSION
fi
if [[ "$image" == *glibc* ]]; then
extract_version_from_image_name glibc GLIBC_VERSION
fi
if [[ "$image" == *cmake* ]]; then
extract_version_from_image_name cmake CMAKE_VERSION
fi
;;
KATEX=yes
;;
pytorch-linux-xenial-rocm3.3-py3.6)
ANACONDA_PYTHON_VERSION=3.6
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=3.3
# newer cmake version required
CMAKE_VERSION=3.6.3
;;
pytorch-linux-bionic-rocm3.3-py3.6)
ANACONDA_PYTHON_VERSION=3.6
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=3.3
;;
esac
# Set Jenkins UID and GID if running Jenkins
@ -290,15 +239,7 @@ if [ -n "${JENKINS:-}" ]; then
JENKINS_GID=$(id -g jenkins)
fi
tmp_tag=$(basename "$(mktemp -u)" | tr '[:upper:]' '[:lower:]')
# When using cuDNN version 8, install it separately from CUDA
if [[ "$image" == *cuda* && ${OS} == "ubuntu" ]]; then
IMAGE_NAME="nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION}"
if [[ ${CUDNN_VERSION} == 8 ]]; then
IMAGE_NAME="nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}"
fi
fi
tmp_tag="tmp-$(cat /dev/urandom | tr -dc 'a-z' | fold -w 32 | head -n 1)"
# Build image
# TODO: build-arg THRIFT is not turned on for any image, remove it once we confirm
@ -318,45 +259,29 @@ docker build \
--build-arg "JENKINS_UID=${JENKINS_UID:-}" \
--build-arg "JENKINS_GID=${JENKINS_GID:-}" \
--build-arg "UBUNTU_VERSION=${UBUNTU_VERSION}" \
--build-arg "CENTOS_VERSION=${CENTOS_VERSION}" \
--build-arg "DEVTOOLSET_VERSION=${DEVTOOLSET_VERSION}" \
--build-arg "GLIBC_VERSION=${GLIBC_VERSION}" \
--build-arg "CLANG_VERSION=${CLANG_VERSION}" \
--build-arg "ANACONDA_PYTHON_VERSION=${ANACONDA_PYTHON_VERSION}" \
--build-arg "TRAVIS_PYTHON_VERSION=${TRAVIS_PYTHON_VERSION}" \
--build-arg "GCC_VERSION=${GCC_VERSION}" \
--build-arg "CUDA_VERSION=${CUDA_VERSION}" \
--build-arg "CUDNN_VERSION=${CUDNN_VERSION}" \
--build-arg "TENSORRT_VERSION=${TENSORRT_VERSION}" \
--build-arg "ANDROID=${ANDROID}" \
--build-arg "ANDROID_NDK=${ANDROID_NDK_VERSION}" \
--build-arg "GRADLE_VERSION=${GRADLE_VERSION}" \
--build-arg "VULKAN_SDK_VERSION=${VULKAN_SDK_VERSION}" \
--build-arg "SWIFTSHADER=${SWIFTSHADER}" \
--build-arg "CMAKE_VERSION=${CMAKE_VERSION:-}" \
--build-arg "NINJA_VERSION=${NINJA_VERSION:-}" \
--build-arg "KATEX=${KATEX:-}" \
--build-arg "ROCM_VERSION=${ROCM_VERSION:-}" \
--build-arg "PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH:-gfx900;gfx906}" \
--build-arg "IMAGE_NAME=${IMAGE_NAME}" \
-f $(dirname ${DOCKERFILE})/Dockerfile \
-t "$tmp_tag" \
"$@" \
.
# NVIDIA dockers for RC releases use tag names like `11.0-cudnn8-devel-ubuntu18.04-rc`,
# for this case we will set UBUNTU_VERSION to `18.04-rc` so that the Dockerfile could
# find the correct image. As a result, here we have to replace the
# "$UBUNTU_VERSION" == "18.04-rc"
# with
# "$UBUNTU_VERSION" == "18.04"
UBUNTU_VERSION=$(echo ${UBUNTU_VERSION} | sed 's/-rc$//')
function drun() {
docker run --rm "$tmp_tag" $*
}
if [[ "$OS" == "ubuntu" ]]; then
if !(drun lsb_release -a 2>&1 | grep -qF Ubuntu); then
echo "OS=ubuntu, but:"
drun lsb_release -a
@ -369,6 +294,19 @@ if [[ "$OS" == "ubuntu" ]]; then
fi
fi
if [ -n "$TRAVIS_PYTHON_VERSION" ]; then
if [[ "$TRAVIS_PYTHON_VERSION" != nightly ]]; then
if !(drun python --version 2>&1 | grep -qF "Python $TRAVIS_PYTHON_VERSION"); then
echo "TRAVIS_PYTHON_VERSION=$TRAVIS_PYTHON_VERSION, but:"
drun python --version
exit 1
fi
else
echo "Please manually check nightly is OK:"
drun python --version
fi
fi
if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
if !(drun python --version 2>&1 | grep -qF "Python $ANACONDA_PYTHON_VERSION"); then
echo "ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION, but:"
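
Review note: the `extract_version_from_image_name` helper in this file pulls a version out of the image name with a Perl regex. A rough Python analogue of that pattern is sketched below to make the matching behaviour easier to see. `extract_version` is a hypothetical name, and unlike the shell function, which exits with an error, it returns `None` when nothing matches.

```python
import re


def extract_version(image, name):
    """Return the version that directly follows `name` in `image`, or None."""
    # Same shape as the Perl pattern: digits with up to two optional ".digits" groups.
    match = re.search(re.escape(name) + r"(\d+(?:\.\d+)?(?:\.\d+)?)", image)
    return match.group(1) if match else None


image = "pytorch-linux-bionic-cuda11.3-cudnn8-py3-clang9"
versions = {name: extract_version(image, name) for name in ("cuda", "cudnn", "py", "clang", "rocm")}
for name, version in versions.items():
    print(name, version)
```

Because the pattern requires a digit immediately after the name, the `py` in `pytorch` is skipped and the later `py3` is the one that matches.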

View File

@ -13,7 +13,7 @@ retry () {
# Until we find a way to reliably reuse the previous build, last_tag is not in use
# last_tag="$(( CIRCLE_BUILD_NUM - 1 ))"
tag="${DOCKER_TAG}"
tag="${CIRCLE_WORKFLOW_ID}"
registry="308535385114.dkr.ecr.us-east-1.amazonaws.com"
@ -26,14 +26,11 @@ login() {
docker login -u AWS --password-stdin "$1"
}
# Retry on timeouts (can happen on job stampede).
retry login "${registry}"
# Only run these steps if not on github actions
if [[ -z "${GITHUB_ACTIONS}" ]]; then
# Retry on timeouts (can happen on job stampede).
retry login "${registry}"
# Logout on exit
trap "docker logout ${registry}" EXIT
fi
# Logout on exit
trap "docker logout ${registry}" EXIT
# export EC2=1
# export JENKINS=1
@ -48,8 +45,9 @@ fi
docker push "${image}:${tag}"
if [ -z "${DOCKER_SKIP_S3_UPLOAD:-}" ]; then
trap "rm -rf ${IMAGE_NAME}:${tag}.tar" EXIT
docker save -o "${IMAGE_NAME}:${tag}.tar" "${image}:${tag}"
aws s3 cp "${IMAGE_NAME}:${tag}.tar" "s3://ossci-linux-build/pytorch/base/${IMAGE_NAME}:${tag}.tar" --acl public-read
fi
# TODO: Get rid of duplicate tagging once ${DOCKER_TAG} becomes the default
docker tag "${image}:${tag}" "${image}:${DOCKER_TAG}"
docker push "${image}:${DOCKER_TAG}"
docker save -o "${IMAGE_NAME}:${tag}.tar" "${image}:${tag}"
aws s3 cp "${IMAGE_NAME}:${tag}.tar" "s3://ossci-linux-build/pytorch/base/${IMAGE_NAME}:${tag}.tar" --acl public-read

View File

@ -1,105 +0,0 @@
ARG CENTOS_VERSION
FROM centos:${CENTOS_VERSION}
ARG CENTOS_VERSION
# Set AMD gpu targets to build for
ARG PYTORCH_ROCM_ARCH
ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
# Install required packages to build Caffe2
# Install common dependencies (so that this step can be cached separately)
ARG EC2
ADD ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
# Update CentOS git version
RUN yum -y remove git
RUN yum -y remove git-*
RUN yum -y install https://packages.endpoint.com/rhel/7/os/x86_64/endpoint-repo-1.9-1.x86_64.rpm
RUN yum install -y git
# Install devtoolset
ARG DEVTOOLSET_VERSION
ADD ./common/install_devtoolset.sh install_devtoolset.sh
RUN bash ./install_devtoolset.sh && rm install_devtoolset.sh
ENV BASH_ENV "/etc/profile"
# (optional) Install non-default glibc version
ARG GLIBC_VERSION
ADD ./common/install_glibc.sh install_glibc.sh
RUN if [ -n "${GLIBC_VERSION}" ]; then bash ./install_glibc.sh; fi
RUN rm install_glibc.sh
# Install user
ADD ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install conda and other packages (e.g., numpy, pytest)
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ADD requirements-ci.txt /opt/conda/requirements-ci.txt
ADD ./common/install_conda.sh install_conda.sh
RUN bash ./install_conda.sh && rm install_conda.sh
RUN rm /opt/conda/requirements-ci.txt
# (optional) Install protobuf for ONNX
ARG PROTOBUF
ADD ./common/install_protobuf.sh install_protobuf.sh
RUN if [ -n "${PROTOBUF}" ]; then bash ./install_protobuf.sh; fi
RUN rm install_protobuf.sh
ENV INSTALLED_PROTOBUF ${PROTOBUF}
# (optional) Install database packages like LMDB and LevelDB
ARG DB
ADD ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV and ffmpeg
ARG VISION
ADD ./common/install_vision.sh install_vision.sh
RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh
ENV INSTALLED_VISION ${VISION}
# Install rocm
ARG ROCM_VERSION
ADD ./common/install_rocm.sh install_rocm.sh
RUN bash ./install_rocm.sh
RUN rm install_rocm.sh
ENV PATH /opt/rocm/bin:$PATH
ENV PATH /opt/rocm/hcc/bin:$PATH
ENV PATH /opt/rocm/hip/bin:$PATH
ENV PATH /opt/rocm/opencl/bin:$PATH
ENV PATH /opt/rocm/llvm/bin:$PATH
ENV MAGMA_HOME /opt/rocm/magma
ENV LANG en_US.utf8
ENV LC_ALL en_US.utf8
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
ADD ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# (optional) Install non-default Ninja version
ARG NINJA_VERSION
ADD ./common/install_ninja.sh install_ninja.sh
RUN if [ -n "${NINJA_VERSION}" ]; then bash ./install_ninja.sh; fi
RUN rm install_ninja.sh
# Install ccache/sccache (do this last, so we get priority in PATH)
ADD ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
RUN bash ./install_cache.sh && rm install_cache.sh
# Include BUILD_ENVIRONMENT environment variable in image
ARG BUILD_ENVIRONMENT
ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
USER jenkins
CMD ["bash"]

View File

@ -4,15 +4,13 @@ set -ex
[ -n "${ANDROID_NDK}" ]
_https_amazon_aws=https://ossci-android.s3.amazonaws.com
apt-get update
apt-get install -y --no-install-recommends autotools-dev autoconf unzip
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
pushd /tmp
curl -Os --retry 3 $_https_amazon_aws/android-ndk-${ANDROID_NDK}-linux-x86_64.zip
curl -Os --retry 3 https://dl.google.com/android/repository/android-ndk-${ANDROID_NDK}-linux-x86_64.zip
popd
_ndk_dir=/opt/ndk
mkdir -p "$_ndk_dir"
@ -47,22 +45,43 @@ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
# Installing android sdk
# https://github.com/circleci/circleci-images/blob/staging/android/Dockerfile.m4
_tmp_sdk_zip=/tmp/android-sdk-linux.zip
_sdk_version=sdk-tools-linux-3859397.zip
_android_home=/opt/android/sdk
rm -rf $_android_home
sudo mkdir -p $_android_home
curl --silent --show-error --location --fail --retry 3 --output /tmp/android-sdk-linux.zip $_https_amazon_aws/android-sdk-linux-tools3859397-build-tools2803-2902-platforms28-29.zip
sudo unzip -q $_tmp_sdk_zip -d $_android_home
rm $_tmp_sdk_zip
curl --silent --show-error --location --fail --retry 3 --output /tmp/$_sdk_version https://dl.google.com/android/repository/$_sdk_version
sudo unzip -q /tmp/$_sdk_version -d $_android_home
rm /tmp/$_sdk_version
sudo chmod -R 777 $_android_home
export ANDROID_HOME=$_android_home
export ADB_INSTALL_TIMEOUT=120
export PATH="${ANDROID_HOME}/tools:${ANDROID_HOME}/tools/bin:${ANDROID_HOME}/platform-tools:${PATH}"
export PATH="${ANDROID_HOME}/emulator:${ANDROID_HOME}/tools:${ANDROID_HOME}/tools/bin:${ANDROID_HOME}/platform-tools:${PATH}"
echo "PATH:${PATH}"
alias sdkmanager="$ANDROID_HOME/tools/bin/sdkmanager"
sudo mkdir ~/.android && sudo echo '### User Sources for Android SDK Manager' > ~/.android/repositories.cfg
sudo chmod -R 777 ~/.android
yes | sdkmanager --licenses
yes | sdkmanager --update
sdkmanager \
"tools" \
"platform-tools" \
"emulator"
sdkmanager \
"build-tools;28.0.3" \
"build-tools;29.0.2"
sdkmanager \
"platforms;android-28" \
"platforms;android-29"
sdkmanager --list
# Installing Gradle
echo "GRADLE_VERSION:${GRADLE_VERSION}"
@ -70,7 +89,8 @@ _gradle_home=/opt/gradle
sudo rm -rf $_gradle_home
sudo mkdir -p $_gradle_home
curl --silent --output /tmp/gradle.zip --retry 3 $_https_amazon_aws/gradle-${GRADLE_VERSION}-bin.zip
wget --no-verbose --output-document=/tmp/gradle.zip \
"https://services.gradle.org/distributions/gradle-${GRADLE_VERSION}-bin.zip"
sudo unzip -q /tmp/gradle.zip -d $_gradle_home
rm /tmp/gradle.zip
@ -99,7 +119,7 @@ echo "ndk.dir=/opt/ndk" >> $GRADLE_LOCAL_PROPERTIES
chown -R jenkins /var/lib/jenkins/gradledeps
chgrp -R jenkins /var/lib/jenkins/gradledeps
sudo -H -u jenkins $GRADLE_HOME/bin/gradle -Pandroid.useAndroidX=true -p /var/lib/jenkins/gradledeps -g /var/lib/jenkins/.gradle --refresh-dependencies --debug --stacktrace assemble
sudo -H -u jenkins $GRADLE_HOME/bin/gradle -p /var/lib/jenkins/gradledeps -g /var/lib/jenkins/.gradle --refresh-dependencies --debug --stacktrace assemble
chown -R jenkins /var/lib/jenkins/.gradle
chgrp -R jenkins /var/lib/jenkins/.gradle

View File

@ -2,133 +2,74 @@
set -ex
install_ubuntu() {
# NVIDIA dockers for RC releases use tag names like `11.0-cudnn8-devel-ubuntu18.04-rc`,
# for this case we will set UBUNTU_VERSION to `18.04-rc` so that the Dockerfile could
# find the correct image. As a result, here we have to check for
# "$UBUNTU_VERSION" == "18.04"*
# instead of
# "$UBUNTU_VERSION" == "18.04"
if [[ "$UBUNTU_VERSION" == "18.04"* ]]; then
cmake3="cmake=3.10*"
maybe_libiomp_dev="libiomp-dev"
elif [[ "$UBUNTU_VERSION" == "20.04"* ]]; then
cmake3="cmake=3.16*"
maybe_libiomp_dev=""
else
cmake3="cmake=3.5*"
maybe_libiomp_dev="libiomp-dev"
fi
# NVIDIA dockers for RC releases use tag names like `11.0-cudnn8-devel-ubuntu18.04-rc`,
# for this case we will set UBUNTU_VERSION to `18.04-rc` so that the Dockerfile could
# find the correct image. As a result, here we have to check for
# "$UBUNTU_VERSION" == "18.04"*
# instead of
# "$UBUNTU_VERSION" == "18.04"
if [[ "$UBUNTU_VERSION" == "18.04"* ]]; then
cmake3="cmake=3.10*"
else
cmake3="cmake=3.5*"
fi
# TODO: Remove this once nvidia package repos are back online
# Comment out nvidia repositories to prevent them from getting apt-get updated, see https://github.com/pytorch/pytorch/issues/74968
# shellcheck disable=SC2046
sed -i 's/.*nvidia.*/# &/' $(find /etc/apt/ -type f -name "*.list")
# Install common dependencies
apt-get update
# TODO: Some of these may not be necessary
ccache_deps="asciidoc docbook-xml docbook-xsl xsltproc"
numpy_deps="gfortran"
apt-get install -y --no-install-recommends \
$ccache_deps \
$numpy_deps \
${cmake3} \
apt-transport-https \
autoconf \
automake \
build-essential \
ca-certificates \
curl \
git \
libatlas-base-dev \
libc6-dbg \
${maybe_libiomp_dev} \
libyaml-dev \
libz-dev \
libjpeg-dev \
libasound2-dev \
libsndfile-dev \
software-properties-common \
wget \
sudo \
vim
# Should resolve issues related to various apt package repository cert issues
# see: https://github.com/pytorch/pytorch/issues/65931
apt-get install -y libgnutls30
# Cleanup package manager
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
}
install_centos() {
# Need EPEL for many packages we depend on.
# See http://fedoraproject.org/wiki/EPEL
yum --enablerepo=extras install -y epel-release
ccache_deps="asciidoc docbook-dtds docbook-style-xsl libxslt"
numpy_deps="gcc-gfortran"
# Note: protobuf-c-{compiler,devel} on CentOS are too old to be used
# for Caffe2. That said, we still install them to make sure the build
# system opts to build/use protoc and libprotobuf from third-party.
yum install -y \
$ccache_deps \
$numpy_deps \
autoconf \
automake \
bzip2 \
cmake \
cmake3 \
curl \
gcc \
gcc-c++ \
gflags-devel \
git \
glibc-devel \
glibc-headers \
glog-devel \
hiredis-devel \
libstdc++-devel \
libsndfile-devel \
make \
opencv-devel \
sudo \
wget \
vim
# Cleanup
yum clean all
rm -rf /var/cache/yum
rm -rf /var/lib/yum/yumdb
rm -rf /var/lib/yum/history
}
# Install base packages depending on the base OS
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
install_ubuntu
;;
centos)
install_centos
;;
*)
echo "Unable to determine OS..."
exit 1
;;
esac
# Install common dependencies
apt-get update
# TODO: Some of these may not be necessary
# TODO: libiomp also gets installed by conda, aka there's a conflict
ccache_deps="asciidoc docbook-xml docbook-xsl xsltproc"
numpy_deps="gfortran"
apt-get install -y --no-install-recommends \
$ccache_deps \
$numpy_deps \
${cmake3} \
apt-transport-https \
autoconf \
automake \
build-essential \
ca-certificates \
curl \
git \
libatlas-base-dev \
libc6-dbg \
libiomp-dev \
libyaml-dev \
libz-dev \
libjpeg-dev \
libasound2-dev \
libsndfile-dev \
python \
python-dev \
python-setuptools \
python-wheel \
software-properties-common \
sudo \
wget \
vim
# Install Valgrind separately since the apt-get version is too old.
mkdir valgrind_build && cd valgrind_build
VALGRIND_VERSION=3.16.1
wget https://ossci-linux.s3.amazonaws.com/valgrind-${VALGRIND_VERSION}.tar.bz2
VALGRIND_VERSION=3.15.0
if ! wget http://valgrind.org/downloads/valgrind-${VALGRIND_VERSION}.tar.bz2
then
wget https://sourceware.org/ftp/valgrind/valgrind-${VALGRIND_VERSION}.tar.bz2
fi
tar -xjf valgrind-${VALGRIND_VERSION}.tar.bz2
cd valgrind-${VALGRIND_VERSION}
./configure --prefix=/usr/local
make -j6
make -j 4
sudo make install
cd ../../
rm -rf valgrind_build
alias valgrind="/usr/local/bin/valgrind"
# TODO: THIS IS A HACK!!!
# distributed nccl(2) tests are a bit busted, see https://github.com/pytorch/pytorch/issues/5877
if dpkg -s libnccl-dev; then
apt-get remove -y libnccl-dev libnccl2 --allow-change-held-packages
fi
# Cleanup package manager
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
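Note on the OS dispatch in this file: one side of the diff selects the installer by reading the `ID` field from /etc/os-release. A minimal sketch of that lookup follows. It assumes an os-release style file with one `KEY=value` pair per line, and it uses `sed` rather than the script's `grep -oP` so that it does not depend on GNU grep. `detect_os_id` and `installer_for` are hypothetical helper names, not part of the script.

```shell
# Sketch only: detect_os_id and installer_for are illustrative helpers.

detect_os_id() {
    # $1: path to an os-release style file (defaults to /etc/os-release)
    os_release_file="${1:-/etc/os-release}"
    # Print the value of the ID= line and drop optional double quotes.
    # "^ID=" does not match ID_LIKE=, so only the distribution ID is printed.
    sed -n 's/^ID=//p' "$os_release_file" | tr -d '"'
}

installer_for() {
    # Map the detected ID onto the installer function names used in the script.
    case "$(detect_os_id "$1")" in
        ubuntu) echo "install_ubuntu" ;;
        centos) echo "install_centos" ;;
        *)      echo "unsupported" ;;
    esac
}

# Usage with a throwaway file instead of the real /etc/os-release:
sample_file="$(mktemp)"
printf 'NAME="CentOS Linux"\nID="centos"\nID_LIKE="rhel fedora"\n' > "$sample_file"
installer_for "$sample_file"
```

Matching on `ID` lets unknown distributions fall through to the error branch, whereas a file-existence test on /etc/os-release would treat any distribution that ships that file as CentOS.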

@ -2,51 +2,17 @@
set -ex
install_ubuntu() {
echo "Preparing to build sccache from source"
apt-get update
apt-get install -y cargo pkg-config libssl-dev
echo "Checking out sccache repo"
git clone https://github.com/pytorch/sccache
cd sccache
echo "Building sccache"
cargo build --release
cp target/release/sccache /opt/cache/bin
echo "Cleaning up"
cd ..
rm -rf sccache
apt-get remove -y cargo rustc
apt-get autoclean && apt-get clean
}
install_binary() {
echo "Downloading sccache binary from S3 repo"
curl --retry 3 https://s3.amazonaws.com/ossci-linux/sccache -o /opt/cache/bin/sccache
}
mkdir -p /opt/cache/bin
mkdir -p /opt/cache/lib
sed -e 's|PATH="\(.*\)"|PATH="/opt/cache/bin:\1"|g' -i /etc/environment
export PATH="/opt/cache/bin:$PATH"
# Setup compiler cache
if [ -n "$ROCM_VERSION" ]; then
curl --retry 3 http://repo.radeon.com/misc/.sccache_amd/sccache -o /opt/cache/bin/sccache
else
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
install_ubuntu
;;
*)
install_binary
;;
esac
fi
curl --retry 3 https://s3.amazonaws.com/ossci-linux/sccache -o /opt/cache/bin/sccache
chmod a+x /opt/cache/bin/sccache
function write_sccache_stub() {
printf "#!/bin/sh\nif [ \$(ps -p \$PPID -o comm=) != sccache ]; then\n exec sccache $(which $1) \"\$@\"\nelse\n exec $(which $1) \"\$@\"\nfi" > "/opt/cache/bin/$1"
printf "#!/bin/sh\nexec sccache $(which $1) \$*" > "/opt/cache/bin/$1"
chmod a+x "/opt/cache/bin/$1"
}
@ -54,12 +20,8 @@ write_sccache_stub cc
write_sccache_stub c++
write_sccache_stub gcc
write_sccache_stub g++
# NOTE: See specific ROCM_VERSION case below.
if [ "x$ROCM_VERSION" = x ]; then
write_sccache_stub clang
write_sccache_stub clang++
fi
write_sccache_stub clang
write_sccache_stub clang++
if [ -n "$CUDA_VERSION" ]; then
# TODO: This is a workaround for the fact that PyTorch's FindCUDA
@ -68,50 +30,6 @@ if [ -n "$CUDA_VERSION" ]; then
# where CUDA is installed. Instead, we install an nvcc symlink outside
# of the PATH, and set CUDA_NVCC_EXECUTABLE so that we make use of it.
write_sccache_stub nvcc
mv /opt/cache/bin/nvcc /opt/cache/lib/
fi
if [ -n "$ROCM_VERSION" ]; then
# ROCm compiler is hcc or clang. However, it is commonly invoked via hipcc wrapper.
# hipcc will call either hcc or clang using an absolute path starting with /opt/rocm,
# causing the /opt/cache/bin to be skipped. We must create the sccache wrappers
# directly under /opt/rocm while also preserving the original compiler names.
# Note symlinks will chain as follows: [hcc or clang++] -> clang -> clang-??
# Final link in symlink chain must point back to original directory.
# Original compiler is moved one directory deeper. Wrapper replaces it.
function write_sccache_stub_rocm() {
OLDCOMP=$1
COMPNAME=$(basename $OLDCOMP)
TOPDIR=$(dirname $OLDCOMP)
WRAPPED="$TOPDIR/original/$COMPNAME"
mv "$OLDCOMP" "$WRAPPED"
printf "#!/bin/sh\nexec sccache $WRAPPED \"\$@\"" > "$OLDCOMP"
chmod a+x "$OLDCOMP"
}
if [[ -e "/opt/rocm/hcc/bin/hcc" ]]; then
# ROCm 3.3 or earlier.
mkdir /opt/rocm/hcc/bin/original
write_sccache_stub_rocm /opt/rocm/hcc/bin/hcc
write_sccache_stub_rocm /opt/rocm/hcc/bin/clang
write_sccache_stub_rocm /opt/rocm/hcc/bin/clang++
# Fix last link in symlink chain, clang points to versioned clang in prior dir
pushd /opt/rocm/hcc/bin/original
ln -s ../$(readlink clang)
popd
elif [[ -e "/opt/rocm/llvm/bin/clang" ]]; then
# ROCm 3.5 and beyond.
mkdir /opt/rocm/llvm/bin/original
write_sccache_stub_rocm /opt/rocm/llvm/bin/clang
write_sccache_stub_rocm /opt/rocm/llvm/bin/clang++
# Fix last link in symlink chain, clang points to versioned clang in prior dir
pushd /opt/rocm/llvm/bin/original
ln -s ../$(readlink clang)
popd
else
echo "Cannot find ROCm compiler."
exit 1
fi
printf "#!/bin/sh\nexec sccache $(which nvcc) \"\$@\"" > /opt/cache/lib/nvcc
chmod a+x /opt/cache/lib/nvcc
fi
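Note on the two `write_sccache_stub` variants above: one forwards arguments with `"$@"` and the other with an unquoted `$*`. The sketch below shows the difference in isolation, assuming the default `IFS`. `count_args`, `forward_unquoted` and `forward_quoted` are hypothetical names used only for this illustration.

```shell
# Sketch only: these helpers are illustrative, not part of the install script.

count_args() {
    # Report how many arguments actually arrived.
    echo "$#"
}

forward_unquoted() {
    # Unquoted $* is re-split on whitespace, so "my file.o" becomes two words.
    count_args $*
}

forward_quoted() {
    # "$@" forwards every argument exactly as it was received.
    count_args "$@"
}

forward_unquoted -o "my file.o"
forward_quoted -o "my file.o"
```

A compiler wrapper that re-splits its arguments would break on any path or macro definition containing a space, which is presumably why the `"$@"` form is the safer of the two.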

@ -4,9 +4,6 @@ set -ex
[ -n "$CMAKE_VERSION" ]
# Remove system cmake install so it won't get used instead
apt-get remove cmake -y
# Turn 3.6.3 into v3.6
path=$(echo "${CMAKE_VERSION}" | sed -e 's/\([0-9].[0-9]\+\).*/v\1/')
file="cmake-${CMAKE_VERSION}-Linux-x86_64.tar.gz"
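Note on the "Turn 3.6.3 into v3.6" step: the script derives the release directory name from `CMAKE_VERSION` with a `sed` expression that relies on the GNU `\+` extension. A rough equivalent using only POSIX parameter expansion is sketched below. It assumes the version has at least a major and a minor component, and `cmake_release_dir` is a hypothetical helper name.

```shell
# Sketch only: cmake_release_dir is an illustrative helper.

cmake_release_dir() {
    version="$1"
    major="${version%%.*}"   # "3.6.3" -> "3"
    rest="${version#*.}"     # "3.6.3" -> "6.3"
    minor="${rest%%.*}"      # "6.3"   -> "6"
    printf 'v%s.%s\n' "$major" "$minor"
}

cmake_release_dir 3.6.3
cmake_release_dir 3.10.3
```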

@ -21,23 +21,16 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
;;
esac
mkdir -p /opt/conda
mkdir /opt/conda
chown jenkins:jenkins /opt/conda
# Work around bug where devtoolset replaces sudo and breaks it.
if [ -n "$DEVTOOLSET_VERSION" ]; then
SUDO=/bin/sudo
else
SUDO=sudo
fi
as_jenkins() {
# NB: unsetting the environment variables works around a conda bug
# https://github.com/conda/conda/issues/6576
# NB: Pass on PATH and LD_LIBRARY_PATH to sudo invocation
# NB: This must be run from a directory that jenkins has access to,
# works around https://github.com/conda/conda-package-handling/pull/34
$SUDO -H -u jenkins env -u SUDO_UID -u SUDO_GID -u SUDO_COMMAND -u SUDO_USER env "PATH=$PATH" "LD_LIBRARY_PATH=$LD_LIBRARY_PATH" $*
sudo -H -u jenkins env -u SUDO_UID -u SUDO_GID -u SUDO_COMMAND -u SUDO_USER env "PATH=$PATH" "LD_LIBRARY_PATH=$LD_LIBRARY_PATH" $*
}
pushd /tmp
@ -56,10 +49,10 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
pushd /opt/conda
# Track latest conda update
as_jenkins conda update -y -n base conda
as_jenkins conda update -n base conda
# Install correct Python version
as_jenkins conda install -y python="$ANACONDA_PYTHON_VERSION"
as_jenkins conda install python="$ANACONDA_PYTHON_VERSION"
conda_install() {
# Ensure that the install command doesn't upgrade/downgrade Python
@ -68,45 +61,36 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
as_jenkins conda install -q -y python="$ANACONDA_PYTHON_VERSION" $*
}
pip_install() {
as_jenkins pip install --progress-bar off $*
}
# Install PyTorch conda deps, as per https://github.com/pytorch/pytorch README
# DO NOT install cmake here as it would install a version newer than 3.10, but
# we want to pin to version 3.10.
if [ "$ANACONDA_PYTHON_VERSION" = "3.9" ]; then
# DO NOT install cmake here as it would install a version newer than 3.5, but
# we want to pin to version 3.5.
if [ "$ANACONDA_PYTHON_VERSION" = "3.8" ]; then
# DO NOT install typing if installing python-3.8, since it's part of python-3.8 core packages
# Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
conda_install numpy=1.19.2 astunparse pyyaml mkl mkl-include setuptools cffi future six llvmdev=8.0.0
elif [ "$ANACONDA_PYTHON_VERSION" = "3.8" ]; then
# Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
conda_install numpy=1.18.5 astunparse pyyaml mkl mkl-include setuptools cffi future six llvmdev=8.0.0
elif [ "$ANACONDA_PYTHON_VERSION" = "3.7" ]; then
# DO NOT install dataclasses if installing python-3.7, since it's part of python-3.7 core packages
conda_install numpy=1.18.5 astunparse pyyaml mkl mkl-include setuptools cffi future six typing_extensions
conda_install numpy pyyaml mkl mkl-include setuptools cffi future six llvmdev=8.0.0
else
conda_install numpy=1.18.5 astunparse pyyaml mkl mkl-include setuptools cffi future six dataclasses typing_extensions
conda_install numpy pyyaml mkl mkl-include setuptools cffi typing future six
fi
# Magma package names are concatenation of CUDA major and minor ignoring revision
# I.e. magma-cuda102 package corresponds to CUDA_VERSION=10.2 and CUDA_VERSION=10.2.89
if [ -n "$CUDA_VERSION" ]; then
conda_install magma-cuda$(TMP=${CUDA_VERSION/./};echo ${TMP%.*[0-9]}) -c pytorch
if [[ "$CUDA_VERSION" == 9.2* ]]; then
conda_install magma-cuda92 -c pytorch
elif [[ "$CUDA_VERSION" == 10.0* ]]; then
conda_install magma-cuda100 -c pytorch
elif [[ "$CUDA_VERSION" == 10.1* ]]; then
conda_install magma-cuda101 -c pytorch
elif [[ "$CUDA_VERSION" == 10.2* ]]; then
conda_install magma-cuda102 -c pytorch
fi
# TODO: This isn't working atm
conda_install nnpack -c killeent
# Install some other packages, including those needed for Python test reporting
pip_install -r /opt/conda/requirements-ci.txt
# Update scikit-learn to a python-3.8 compatible version
if [[ $(python -c "import sys; print(int(sys.version_info >= (3, 8)))") == "1" ]]; then
pip_install -U scikit-learn
else
# Pinned scikit-learn due to https://github.com/scikit-learn/scikit-learn/issues/14485 (affects gcc 5.5 only)
pip_install scikit-learn==0.20.3
fi
# Install some other packages
# TODO: Why is scipy pinned
# numba & llvmlite is pinned because of https://github.com/numba/numba/issues/4368
# scikit-learn is pinned because of
# https://github.com/scikit-learn/scikit-learn/issues/14485 (affects gcc 5.5
# only)
as_jenkins pip install --progress-bar off pytest scipy==1.1.0 scikit-learn==0.20.3 scikit-image librosa>=0.6.2 psutil numba==0.46.0 llvmlite==0.30.0
popd
fi
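Note on the magma package naming: one side of the diff replaces the per-version `if`/`elif` chain with a single expression that concatenates the CUDA major and minor numbers and ignores the revision. The sketch below reproduces that mapping with POSIX parameter expansion instead of the bash-only `${CUDA_VERSION/./}` substitution. It assumes `CUDA_VERSION` always carries at least `major.minor`, and `magma_package` is a hypothetical helper name.

```shell
# Sketch only: magma_package is an illustrative helper.

magma_package() {
    cuda_version="$1"
    major="${cuda_version%%.*}"   # "10.2.89" -> "10"
    rest="${cuda_version#*.}"     # "10.2.89" -> "2.89"
    minor="${rest%%.*}"           # "2.89"    -> "2"
    printf 'magma-cuda%s%s\n' "$major" "$minor"
}

magma_package 10.2
magma_package 10.2.89
```

Deriving the name means a new CUDA release needs no new branch, at the cost of trusting that a matching magma package exists in the channel.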

@ -1,18 +0,0 @@
#!/bin/bash
if [[ ${CUDNN_VERSION} == 8 ]]; then
# cuDNN license: https://developer.nvidia.com/cudnn/license_agreement
mkdir tmp_cudnn && cd tmp_cudnn
CUDNN_NAME="cudnn-linux-x86_64-8.3.2.44_cuda11.5-archive"
curl -OLs https://developer.download.nvidia.com/compute/redist/cudnn/v8.3.2/local_installers/11.5/${CUDNN_NAME}.tar.xz
tar xf ${CUDNN_NAME}.tar.xz
cp -a ${CUDNN_NAME}/include/* /usr/include/
cp -a ${CUDNN_NAME}/include/* /usr/local/cuda/include/
cp -a ${CUDNN_NAME}/include/* /usr/include/x86_64-linux-gnu/
cp -a ${CUDNN_NAME}/lib/* /usr/local/cuda/lib64/
cp -a ${CUDNN_NAME}/lib/* /usr/lib/x86_64-linux-gnu/
cd ..
rm -rf tmp_cudnn
ldconfig
fi

@ -2,6 +2,23 @@
set -ex
# This function installs protobuf 2.6
install_protobuf_26() {
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
# On the nvidia/cuda:9-cudnn7-devel-centos7 image we need this symlink or
# else it will fail with
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-2.6.1.tar.gz
pushd "$pb_dir" && ./configure && make && make check && sudo make install && sudo ldconfig
popd
rm -rf $pb_dir
}
install_ubuntu() {
apt-get update
apt-get install -y --no-install-recommends \
@ -34,16 +51,11 @@ install_centos() {
}
# Install base packages depending on the base OS
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
install_ubuntu
;;
centos)
install_centos
;;
*)
echo "Unable to determine OS..."
exit 1
;;
esac
if [ -f /etc/lsb-release ]; then
install_ubuntu
elif [ -f /etc/os-release ]; then
install_centos
else
echo "Unable to determine OS..."
exit 1
fi

@ -7,18 +7,14 @@ if [ -n "$GCC_VERSION" ]; then
# Need the official toolchain repo to get alternate packages
add-apt-repository ppa:ubuntu-toolchain-r/test
apt-get update
if [[ "$UBUNTU_VERSION" == "16.04" && "${GCC_VERSION:0:1}" == "5" ]]; then
if [ "$UBUNTU_VERSION" = "16.04" -a "$GCC_VERSION" = "5" ]; then
apt-get install -y g++-5=5.4.0-6ubuntu1~16.04.12
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 50
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 50
update-alternatives --install /usr/bin/gcov gcov /usr/bin/gcov-5 50
else
apt-get install -y g++-$GCC_VERSION
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-"$GCC_VERSION" 50
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-"$GCC_VERSION" 50
update-alternatives --install /usr/bin/gcov gcov /usr/bin/gcov-"$GCC_VERSION" 50
fi
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-"$GCC_VERSION" 50
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-"$GCC_VERSION" 50
# Cleanup package manager
apt-get autoclean && apt-get clean

@ -3,9 +3,6 @@
set -ex
if [ -n "$KATEX" ]; then
apt-get update
# Ignore error if gpg-agent doesn't exist (for Ubuntu 16.04)
apt-get install -y gpg-agent || :
curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash -
sudo apt-get install -y nodejs

@ -1,8 +0,0 @@
#!/bin/bash
set -ex
git clone --branch v1.15 https://github.com/linux-test-project/lcov.git
pushd lcov
sudo make install # will be installed in /usr/local/bin/lcov
popd

@ -0,0 +1,30 @@
#!/bin/bash
set -ex
llvm_url="https://github.com/llvm/llvm-project/releases/download/llvmorg-9.0.1/llvm-9.0.1.src.tar.xz"
mkdir /opt/llvm
pushd /tmp
wget --no-verbose --output-document=llvm.tar.xz "$llvm_url"
mkdir llvm
tar -xf llvm.tar.xz -C llvm --strip-components 1
rm -f llvm.tar.xz
cd llvm
mkdir build
cd build
cmake -G "Unix Makefiles" \
-DCMAKE_BUILD_TYPE=MinSizeRel \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DCMAKE_INSTALL_PREFIX=/opt/llvm \
-DLLVM_TARGETS_TO_BUILD="host" \
-DLLVM_BUILD_TOOLS=OFF \
-DLLVM_BUILD_UTILS=OFF \
-DLLVM_TEMPORARILY_ALLOW_OLD_TOOLCHAIN=ON \
../
make -j4
sudo make install
popd

@ -1,10 +0,0 @@
#!/bin/bash
sudo apt-get update
# also install ssh to avoid error of:
# --------------------------------------------------------------------------
# The value of the MCA parameter "plm_rsh_agent" was set to a path
# that could not be found:
# plm_rsh_agent: ssh : rsh
sudo apt-get install -y ssh
sudo apt-get install -y --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev

@ -1,14 +0,0 @@
#!/bin/bash
set -ex
OPENSSL=openssl-1.1.1k
wget -q -O "${OPENSSL}.tar.gz" "https://ossci-linux.s3.amazonaws.com/${OPENSSL}.tar.gz"
tar xf "${OPENSSL}.tar.gz"
cd "${OPENSSL}"
./config --prefix=/opt/openssl -d '-Wl,--enable-new-dtags,-rpath,$(LIBRPATH)'
# NOTE: openssl install errors out when built with the -j option
make -j6; make install_sw
cd ..
rm -rf "${OPENSSL}"

@ -2,8 +2,8 @@
set -ex
# This function installs protobuf 3.17
install_protobuf_317() {
# This function installs protobuf 2.6
install_protobuf_26() {
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
@ -12,45 +12,45 @@ install_protobuf_317() {
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/protocolbuffers/protobuf/releases/download/v3.17.3/protobuf-all-3.17.3.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-all-3.17.3.tar.gz
# -j6 to balance memory usage and speed.
# naked `-j` seems to use too much memory.
pushd "$pb_dir" && ./configure && make -j6 && make -j6 check && sudo make -j6 install && sudo ldconfig
curl -LO "https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-2.6.1.tar.gz
pushd "$pb_dir" && ./configure && make && make check && sudo make install && sudo ldconfig
popd
rm -rf $pb_dir
}
install_ubuntu() {
# Ubuntu 14.04 has cmake 2.8.12 as the default option, so we will
# Ubuntu 14.04 ships with protobuf 2.5, but ONNX needs protobuf >= 2.6
# so we install that here if on 14.04
# Ubuntu 14.04 also has cmake 2.8.12 as the default option, so we will
# install cmake3 here and use cmake3.
apt-get update
if [[ "$UBUNTU_VERSION" == 14.04 ]]; then
apt-get install -y --no-install-recommends cmake3
install_protobuf_26
else
apt-get install -y --no-install-recommends \
libprotobuf-dev \
protobuf-compiler
fi
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
install_protobuf_317
}
install_centos() {
install_protobuf_317
# Centos7 ships with protobuf 2.5, but ONNX needs protobuf >= 2.6
# so we always install it here
install_protobuf_26
}
# Install base packages depending on the base OS
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
install_ubuntu
;;
centos)
install_centos
;;
*)
echo "Unable to determine OS..."
exit 1
;;
esac
if [ -f /etc/lsb-release ]; then
install_ubuntu
elif [ -f /etc/os-release ]; then
install_centos
else
echo "Unable to determine OS..."
exit 1
fi

@ -2,141 +2,75 @@
set -ex
install_magma() {
# "install" hipMAGMA into /opt/rocm/magma by copying after build
git clone https://bitbucket.org/icl/magma.git
pushd magma
# Fixes memory leaks of magma found while executing linalg UTs
git checkout 5959b8783e45f1809812ed96ae762f38ee701972
cp make.inc-examples/make.inc.hip-gcc-mkl make.inc
echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc
echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib' >> make.inc
echo 'DEVCCFLAGS += --gpu-max-threads-per-block=256' >> make.inc
export PATH="${PATH}:/opt/rocm/bin"
if [[ -n "$PYTORCH_ROCM_ARCH" ]]; then
amdgpu_targets=`echo $PYTORCH_ROCM_ARCH | sed 's/;/ /g'`
else
amdgpu_targets=`rocm_agent_enumerator | grep -v gfx000 | sort -u | xargs`
fi
for arch in $amdgpu_targets; do
echo "DEVCCFLAGS += --amdgpu-target=$arch" >> make.inc
done
# hipcc with openmp flag may cause isnan() on __device__ not to be found; depending on context, compiler may attempt to match with host definition
sed -i 's/^FOPENMP/#FOPENMP/g' make.inc
make -f make.gen.hipMAGMA -j $(nproc)
LANG=C.UTF-8 make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda
make testing/testing_dgemm -j $(nproc) MKLROOT=/opt/conda
popd
mv magma /opt/rocm
}
ver() {
printf "%3d%03d%03d%03d" $(echo "$1" | tr '.' ' ');
}
# Map ROCm version to AMDGPU version
declare -A AMDGPU_VERSIONS=( ["4.5.2"]="21.40.2" ["5.0"]="21.50" ["5.1.1"]="22.10.1" )
install_ubuntu() {
apt-get update
if [[ $UBUNTU_VERSION == 18.04 ]]; then
# gpg-agent is not available by default on 18.04
apt-get install -y --no-install-recommends gpg-agent
fi
if [[ $UBUNTU_VERSION == 20.04 ]]; then
# gpg-agent is not available by default on 20.04
apt-get install -y --no-install-recommends gpg-agent
fi
apt-get install -y kmod
apt-get install -y wget
apt-get install -y libopenblas-dev
# Need the libc++1 and libc++abi1 libraries to allow torch._C to load at runtime
apt-get install -y libc++1
apt-get install -y libc++abi1
if [[ $(ver $ROCM_VERSION) -ge $(ver 4.5) ]]; then
# Add amdgpu repository
UBUNTU_VERSION_NAME=`cat /etc/os-release | grep UBUNTU_CODENAME | awk -F= '{print $2}'`
local amdgpu_baseurl="https://repo.radeon.com/amdgpu/${AMDGPU_VERSIONS[$ROCM_VERSION]}/ubuntu"
echo "deb [arch=amd64] ${amdgpu_baseurl} ${UBUNTU_VERSION_NAME} main" > /etc/apt/sources.list.d/amdgpu.list
fi
ROCM_REPO="ubuntu"
if [[ $(ver $ROCM_VERSION) -lt $(ver 4.2) ]]; then
ROCM_REPO="xenial"
fi
DEB_ROCM_REPO=http://repo.radeon.com/rocm/apt/${ROCM_VERSION}
# Add rocm repository
wget -qO - http://repo.radeon.com/rocm/rocm.gpg.key | apt-key add -
local rocm_baseurl="http://repo.radeon.com/rocm/apt/${ROCM_VERSION}"
echo "deb [arch=amd64] ${rocm_baseurl} ${ROCM_REPO} main" > /etc/apt/sources.list.d/rocm.list
wget -qO - $DEB_ROCM_REPO/rocm.gpg.key | apt-key add -
echo "deb [arch=amd64] $DEB_ROCM_REPO xenial main" > /etc/apt/sources.list.d/rocm.list
apt-get update --allow-insecure-repositories
DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated \
rocm-dev \
rocm-utils \
rocm-libs \
rocfft \
miopen-hip \
rocblas \
hipsparse \
rocrand \
hipcub \
rocthrust \
rccl \
rocprofiler-dev \
roctracer-dev
# precompiled miopen kernels added in ROCm 3.5; search for all unversioned packages
# if search fails it will abort this script; use true to avoid case where search fails
MIOPENKERNELS=$(apt-cache search --names-only miopenkernels | awk '{print $1}' | grep -F -v . || true)
if [[ "x${MIOPENKERNELS}" = x ]]; then
echo "miopenkernels package not available"
else
DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated ${MIOPENKERNELS}
fi
install_magma
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
}
install_centos() {
yum update -y
yum install -y kmod
yum install -y wget
yum install -y openblas-devel
yum install -y epel-release
yum install -y dkms kernel-headers-`uname -r` kernel-devel-`uname -r`
if [[ $(ver $ROCM_VERSION) -ge $(ver 4.5) ]]; then
# Add amdgpu repository
local amdgpu_baseurl="https://repo.radeon.com/amdgpu/${AMDGPU_VERSIONS[$ROCM_VERSION]}/rhel/7.9/main/x86_64"
echo "[AMDGPU]" > /etc/yum.repos.d/amdgpu.repo
echo "name=AMDGPU" >> /etc/yum.repos.d/amdgpu.repo
echo "baseurl=${amdgpu_baseurl}" >> /etc/yum.repos.d/amdgpu.repo
echo "enabled=1" >> /etc/yum.repos.d/amdgpu.repo
echo "gpgcheck=1" >> /etc/yum.repos.d/amdgpu.repo
echo "gpgkey=http://repo.radeon.com/rocm/rocm.gpg.key" >> /etc/yum.repos.d/amdgpu.repo
fi
local rocm_baseurl="http://repo.radeon.com/rocm/yum/${ROCM_VERSION}"
echo "[ROCm]" > /etc/yum.repos.d/rocm.repo
echo "name=ROCm" >> /etc/yum.repos.d/rocm.repo
echo "baseurl=${rocm_baseurl}" >> /etc/yum.repos.d/rocm.repo
echo "baseurl=http://repo.radeon.com/rocm/yum/rpm/" >> /etc/yum.repos.d/rocm.repo
echo "enabled=1" >> /etc/yum.repos.d/rocm.repo
echo "gpgcheck=1" >> /etc/yum.repos.d/rocm.repo
echo "gpgkey=http://repo.radeon.com/rocm/rocm.gpg.key" >> /etc/yum.repos.d/rocm.repo
echo "gpgcheck=0" >> /etc/yum.repos.d/rocm.repo
yum update -y
yum install -y \
rocm-dev \
rocm-utils \
rocm-libs \
rocfft \
miopen-hip \
rocblas \
hipsparse \
rocrand \
rccl \
hipcub \
rocthrust \
rocprofiler-dev \
roctracer-dev
install_magma
# Cleanup
yum clean all
rm -rf /var/cache/yum
@ -145,16 +79,11 @@ install_centos() {
}
# Install Python packages depending on the base OS
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
install_ubuntu
;;
centos)
install_centos
;;
*)
echo "Unable to determine OS..."
exit 1
;;
esac
if [ -f /etc/lsb-release ]; then
install_ubuntu
elif [ -f /etc/os-release ]; then
install_centos
else
echo "Unable to determine OS..."
exit 1
fi
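Note on the `ver` helper used for the ROCm version checks: it packs each dotted component into a fixed-width field so that versions can be compared as integers. A minimal sketch of the idea follows. It uses `%d` for the first field instead of the script's `%3d`, so the result has no leading spaces, and it assumes purely numeric components without leading zeros. `version_ge` is a hypothetical wrapper name.

```shell
# Sketch only: version_ge is an illustrative wrapper around the ver idea.

ver() {
    # "4.5.2" -> "4005002000"; missing components are filled in as 000.
    # Word splitting of the dotted components is intentional here.
    # shellcheck disable=SC2046
    printf '%d%03d%03d%03d' $(echo "$1" | tr '.' ' ')
}

version_ge() {
    # True when version $1 is greater than or equal to version $2.
    [ "$(ver "$1")" -ge "$(ver "$2")" ]
}

if version_ge 4.10 4.9; then
    echo "4.10 is at least 4.9"
fi
```

A plain string comparison would rank 4.10 below 4.9, which is the case the padded integer form is meant to handle.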

@ -1,24 +0,0 @@
#!/bin/bash
set -ex
[ -n "${SWIFTSHADER}" ]
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
_https_amazon_aws=https://ossci-android.s3.amazonaws.com
# SwiftShader
_swiftshader_dir=/var/lib/jenkins/swiftshader
_swiftshader_file_targz=swiftshader-abe07b943-prebuilt.tar.gz
mkdir -p $_swiftshader_dir
_tmp_swiftshader_targz="/tmp/${_swiftshader_file_targz}"
curl --silent --show-error --location --fail --retry 3 \
--output "${_tmp_swiftshader_targz}" "$_https_amazon_aws/${_swiftshader_file_targz}"
tar -C "${_swiftshader_dir}" -xzf "${_tmp_swiftshader_targz}"
export VK_ICD_FILENAMES="${_swiftshader_dir}/build/Linux/vk_swiftshader_icd.json"
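Note on the `retry` helper defined in this file: it chains subshells with fixed sleeps of 1, 2, 4 and 8 seconds and re-splits its arguments through `$*`. A loop-based sketch of the same doubling back-off is shown below. The attempt count and initial delay are made parameters here, which is an assumption for illustration rather than something the script does, and `retry_with_backoff` is a hypothetical name.

```shell
# Sketch only: retry_with_backoff is an illustrative helper.
# Usage: retry_with_backoff <max_attempts> <initial_delay_seconds> <command...>

retry_with_backoff() {
    rwb_max="$1"
    rwb_delay="$2"
    shift 2
    rwb_attempt=1
    # "$@" keeps each argument of the retried command intact.
    until "$@"; do
        if [ "$rwb_attempt" -ge "$rwb_max" ]; then
            return 1    # give up after the last allowed attempt
        fi
        sleep "$rwb_delay"
        rwb_delay=$((rwb_delay * 2))       # double the wait each time
        rwb_attempt=$((rwb_attempt + 1))
    done
    return 0
}

# Example: a command that succeeds immediately needs no retry.
retry_with_backoff 5 1 true && echo "succeeded"
```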

@ -0,0 +1,97 @@
#!/bin/bash
set -ex
as_jenkins() {
# NB: Preserve PATH and LD_LIBRARY_PATH changes
sudo -H -u jenkins env "PATH=$PATH" "LD_LIBRARY_PATH=$LD_LIBRARY_PATH" $*
}
if [ -n "$TRAVIS_PYTHON_VERSION" ]; then
mkdir -p /opt/python
chown jenkins:jenkins /opt/python
# Download Python binary from Travis
pushd tmp
as_jenkins wget --quiet ${TRAVIS_DL_URL_PREFIX}/python-$TRAVIS_PYTHON_VERSION.tar.bz2
# NB: The tarball also comes with /home/travis virtualenv that we
# don't care about. (Maybe we should, but we've worked around the
# "how do I install to python" issue by making this entire directory
# user-writable "lol")
# NB: Relative ordering of opt/python and flags matters
as_jenkins tar xjf python-$TRAVIS_PYTHON_VERSION.tar.bz2 --strip-components=2 --directory /opt/python opt/python
popd
echo "/opt/python/$TRAVIS_PYTHON_VERSION/lib" > /etc/ld.so.conf.d/travis-python.conf
ldconfig
sed -e 's|PATH="\(.*\)"|PATH="/opt/python/'"$TRAVIS_PYTHON_VERSION"'/bin:\1"|g' -i /etc/environment
export PATH="/opt/python/$TRAVIS_PYTHON_VERSION/bin:$PATH"
python --version
pip --version
# Install pip from source.
# The python-pip package on Ubuntu Trusty is old
# and upon install numpy doesn't use the binary
# distribution, and fails to compile it from source.
pushd tmp
as_jenkins curl -L -O https://pypi.python.org/packages/11/b6/abcb525026a4be042b486df43905d6893fb04f05aac21c32c638e939e447/pip-9.0.1.tar.gz
as_jenkins tar zxf pip-9.0.1.tar.gz
pushd pip-9.0.1
as_jenkins python setup.py install
popd
rm -rf pip-9.0.1*
popd
# Install pip packages
as_jenkins pip install --upgrade pip
pip --version
if [[ "$TRAVIS_PYTHON_VERSION" == nightly ]]; then
# These two packages have broken Cythonizations uploaded
# to PyPi, see:
#
# - https://github.com/numpy/numpy/issues/10500
# - https://github.com/yaml/pyyaml/issues/117
#
# Furthermore, the released version of Cython does not
# have these issues fixed.
#
# While we are waiting on fixes for these, we build
# from Git for now. Feel free to delete this conditional
# branch if things start working again (you may need
# to do this if these packages regress on Git HEAD.)
as_jenkins pip install git+https://github.com/cython/cython.git
as_jenkins pip install git+https://github.com/numpy/numpy.git
as_jenkins pip install git+https://github.com/yaml/pyyaml.git
else
as_jenkins pip install numpy pyyaml
fi
as_jenkins pip install \
future \
hypothesis \
protobuf \
pytest \
pillow \
typing
as_jenkins pip install mkl mkl-devel
# SciPy does not support Python 3.7 or Python 2.7.9
if [[ "$TRAVIS_PYTHON_VERSION" != nightly ]] && [[ "$TRAVIS_PYTHON_VERSION" != "2.7.9" ]]; then
as_jenkins pip install scipy==1.1.0 scikit-image librosa>=0.6.2
fi
# Install psutil for dataloader tests
as_jenkins pip install psutil
# Install dill for serialization tests
as_jenkins pip install "dill>=0.3.1"
# Cleanup package manager
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
fi
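Note on the version specifiers in this file: `"dill>=0.3.1"` is quoted, while `librosa>=0.6.2` is not. In an unquoted word the shell treats `>` as an output redirection, so the specifier never reaches the command. The sketch below demonstrates the effect in a scratch directory, with `echo` standing in for `pip` (an assumption made so the example has no side effects beyond one temporary file).

```shell
# Sketch only: runs in a scratch directory and uses echo as a stand-in for pip.
scratch_dir="$(mktemp -d)"
cd "$scratch_dir" || exit 1

# Unquoted: ">=0.6.2" is parsed as "redirect stdout to a file named =0.6.2".
echo install librosa>=0.6.2 psutil
unquoted_result="$(cat ./=0.6.2)"

# Quoted: the whole requirement specifier is passed as a single argument.
quoted_result="$(echo install "librosa>=0.6.2" psutil)"

echo "unquoted wrote to file: $unquoted_result"
echo "quoted passed through:  $quoted_result"
```

In the unquoted form the command only sees `librosa` with no version constraint, and a stray file named `=0.6.2` is left in the working directory.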

@ -3,11 +3,8 @@
set -ex
# Mirror jenkins user in container
# jenkins user as ec2-user should have the same user-id
echo "jenkins:x:1000:1000::/var/lib/jenkins:" >> /etc/passwd
echo "jenkins:x:1000:" >> /etc/group
# Needed on focal or newer
echo "jenkins:*:19110:0:99999:7:::" >>/etc/shadow
echo "jenkins:x:1014:1014::/var/lib/jenkins:" >> /etc/passwd
echo "jenkins:x:1014:" >> /etc/group
# Create $HOME
mkdir -p /var/lib/jenkins
@ -21,6 +18,3 @@ chown jenkins:jenkins /usr/local
# Allow sudo
# TODO: Maybe we shouldn't
echo 'jenkins ALL=(ALL) NOPASSWD:ALL' > /etc/sudoers.d/jenkins
# Test that sudo works
sudo -u jenkins sudo -v

@ -2,6 +2,23 @@
set -ex
# This function installs protobuf 2.6
install_protobuf_26() {
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
# On the nvidia/cuda:9-cudnn7-devel-centos7 image we need this symlink or
# else it will fail with
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-2.6.1.tar.gz
pushd "$pb_dir" && ./configure && make && make check && sudo make install && sudo ldconfig
popd
rm -rf $pb_dir
}
install_ubuntu() {
apt-get update
apt-get install -y --no-install-recommends \
@ -30,16 +47,11 @@ install_centos() {
}
# Install base packages depending on the base OS
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
install_ubuntu
;;
centos)
install_centos
;;
*)
echo "Unable to determine OS..."
exit 1
;;
esac
if [ -f /etc/lsb-release ]; then
install_ubuntu
elif [ -f /etc/os-release ]; then
install_centos
else
echo "Unable to determine OS..."
exit 1
fi

@ -1,24 +0,0 @@
#!/bin/bash
set -ex
[ -n "${VULKAN_SDK_VERSION}" ]
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
_vulkansdk_dir=/var/lib/jenkins/vulkansdk
_tmp_vulkansdk_targz=/tmp/vulkansdk.tar.gz
curl \
--silent \
--show-error \
--location \
--fail \
--retry 3 \
--output "${_tmp_vulkansdk_targz}" "https://ossci-android.s3.amazonaws.com/vulkansdk-linux-x86_64-${VULKAN_SDK_VERSION}.tar.gz"
mkdir -p "${_vulkansdk_dir}"
tar -C "${_vulkansdk_dir}" -xzf "${_tmp_vulkansdk_targz}" --strip-components 1
rm -rf "${_tmp_vulkansdk_targz}"

@ -1,212 +0,0 @@
# Python dependencies required for unit tests
#awscli==1.6 #this breaks some platforms
#Description: AWS command line interface
#Pinned versions: 1.6
#test that import:
boto3==1.19.12
#Description: AWS SDK for python
#Pinned versions: 1.19.12, 1.16.34
#test that import:
click
#Description: Command Line Interface Creation Kit
#Pinned versions:
#test that import:
coremltools==5.0b5
#Description: Apple framework for ML integration
#Pinned versions: 5.0b5
#test that import:
#dataclasses #this breaks some platforms
#Description: Provides decorators for auto adding special methods to user classes
#Pinned versions:
#test that import:
expecttest==0.1.3
#Description: method for writing tests where test framework auto populates
# the expected output based on previous runs
#Pinned versions: 0.1.3
#test that import:
flatbuffers==2.0
#Description: cross platform serialization library
#Pinned versions: 2.0
#test that import:
#future #this breaks linux-bionic-rocm4.5-py3.7
#Description: compatibility layer between python 2 and python 3
#Pinned versions:
#test that import:
hypothesis==4.53.2
# Pin hypothesis to avoid flakiness: https://github.com/pytorch/pytorch/issues/31136
#Description: advanced library for generating parametrized tests
#Pinned versions: 3.44.6, 4.53.2
#test that import: test_xnnpack_integration.py, test_pruning_op.py, test_nn.py
junitparser==2.1.1
#Description: junitparser handles JUnit/xUnit Result XML files
#Pinned versions: 2.1.1
#test that import:
librosa>=0.6.2
#Description: A python package for music and audio analysis
#Pinned versions: >=0.6.2
#test that import: test_spectral_ops.py
#mkl #this breaks linux-bionic-rocm4.5-py3.7
#Description: Intel oneAPI Math Kernel Library
#Pinned versions:
#test that import: test_profiler.py, test_public_bindings.py, test_testing.py,
#test_nn.py, test_mkldnn.py, test_jit.py, test_fx_experimental.py,
#test_autograd.py
#mkl-devel
# see mkl
#mock # breaks ci/circleci: docker-pytorch-linux-xenial-py3-clang5-android-ndk-r19c
#Description: A testing library that allows you to replace parts of your
#system under test with mock objects
#Pinned versions:
#test that import: test_module_init.py, test_modules.py, test_nn.py,
#test_testing.py
#MonkeyType # breaks pytorch-xla-linux-bionic-py3.7-clang8
#Description: collects runtime types of function arguments and return
#values, and can automatically generate stub files
#Pinned versions:
#test that import:
mypy==0.812
# Pin MyPy version because new errors are likely to appear with each release
#Description: linter
#Pinned versions: 0.812
#test that import: test_typing.py, test_type_hints.py
#networkx
#Description: creation, manipulation, and study of
#the structure, dynamics, and functions of complex networks
#Pinned versions: 2.0
#test that import:
#ninja
#Description: build system. Note that installing it from
#here breaks things, so it is commented out
#Pinned versions: 1.10.0.post1
#test that import: run_test.py, test_cpp_extensions_aot.py,test_determination.py
numba==0.49.0 ; python_version < "3.9"
numba==0.54.1 ; python_version == "3.9"
#Description: Just-In-Time Compiler for Numerical Functions
#Pinned versions: 0.54.1, 0.49.0, <=0.49.1
#test that import: test_numba_integration.py
#For numba issue see https://github.com/pytorch/pytorch/issues/51511
#numpy
#Description: Provides N-dimensional arrays and linear algebra
#Pinned versions: 1.20
#test that import: test_view_ops.py, test_unary_ufuncs.py, test_type_promotion.py,
#test_type_info.py, test_torch.py, test_tensorexpr_pybind.py, test_tensorexpr.py,
#test_tensorboard.py, test_tensor_creation_ops.py, test_static_runtime.py,
#test_spectral_ops.py, test_sort_and_select.py, test_shape_ops.py,
#test_segment_reductions.py, test_reductions.py, test_pruning_op.py,
#test_overrides.py, test_numpy_interop.py, test_numba_integration.py
#test_nn.py, test_namedtensor.py, test_linalg.py, test_jit_cuda_fuser.py,
#test_jit.py, test_indexing.py, test_datapipe.py, test_dataloader.py,
#test_binary_ufuncs.py
#onnxruntime
#Description: scoring engine for Open Neural Network Exchange (ONNX) models
#Pinned versions: 1.9.0
#test that import:
#pillow
#Description: Python Imaging Library fork
#Pinned versions:
#test that import:
#protobuf
#Description: Google's data interchange format
#Pinned versions:
#test that import: test_tensorboard.py
psutil
#Description: information on running processes and system utilization
#Pinned versions:
#test that import: test_profiler.py, test_openmp.py, test_dataloader.py
pytest
#Description: testing framework
#Pinned versions:
#test that import: test_typing.py, test_cpp_extensions_aot.py, run_test.py
#pytest-benchmark
#Description: fixture for benchmarking code
#Pinned versions: 3.2.3
#test that import:
#pytest-sugar
#Description: shows failures and errors instantly
#Pinned versions:
#test that import:
#PyYAML
#Description: data serialization format
#Pinned versions:
#test that import:
#requests
#Description: HTTP library
#Pinned versions:
#test that import: test_type_promotion.py
#rich
#Description: rich text and beautiful formatting in the terminal
#Pinned versions: 10.9.0
#test that import:
scikit-image
#Description: image processing routines
#Pinned versions:
#test that import: test_nn.py
#scikit-learn
#Description: machine learning package
#Pinned versions: 0.20.3
#test that import:
scipy==1.6.3
# Pin SciPy because of failing distribution tests (see #60347)
#Description: scientific python
#Pinned versions: 1.6.3
#test that import: test_unary_ufuncs.py, test_torch.py,test_tensor_creation_ops.py
#test_spectral_ops.py, test_sparse_csr.py, test_reductions.py,test_nn.py
#test_linalg.py, test_binary_ufuncs.py
#tabulate
#Description: Pretty-print tabular data
#Pinned versions:
#test that import:
tb-nightly
#Description: TensorBoard
#Pinned versions:
#test that import:
#typing-extensions
#Description: type hints for python
#Pinned versions:
#test that import:
#virtualenv
#Description: virtual environment for python
#Pinned versions:
#test that import:
unittest-xml-reporting<=3.2.0,>=2.0.0
#Description: saves unit test results to xml
#Pinned versions:
#test that import:
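The annotated requirements list above mixes active pins, commented-out packages, and metadata comments. As a rough illustration of which lines pip would actually act on, here is a minimal sketch; `active_requirements` and the `SAMPLE` text are hypothetical and not part of the repository, and the parser only handles the simple `name==version ; marker` shape used in this file.

```python
# Hypothetical helper: list the requirement lines that are not commented out,
# splitting off an optional environment marker after ';'.
SAMPLE = """\
#awscli==1.6 #this breaks some platforms
#Description: AWS command line interface
boto3==1.19.12
#Description: AWS SDK for python
numba==0.49.0 ; python_version < "3.9"
numba==0.54.1 ; python_version == "3.9"
psutil
"""


def active_requirements(text):
    """Return (requirement, marker) pairs for every non-comment line."""
    result = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line, disabled package, or metadata comment
        requirement, _, marker = line.partition(";")
        result.append((requirement.strip(), marker.strip() or None))
    return result


reqs = active_requirements(SAMPLE)
for requirement, marker in reqs:
    print(requirement, "| marker:", marker)
```

Packages such as `awscli` stay documented in the file but are skipped here because their lines begin with `#`.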


@ -1,11 +1,12 @@
ARG UBUNTU_VERSION
ARG CUDA_VERSION
ARG IMAGE_NAME
ARG CUDNN_VERSION
FROM ${IMAGE_NAME}
FROM nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION}
ARG UBUNTU_VERSION
ARG CUDA_VERSION
ARG CUDNN_VERSION
ENV DEBIAN_FRONTEND noninteractive
@ -23,13 +24,11 @@ ARG KATEX
ADD ./common/install_katex.sh install_katex.sh
RUN bash ./install_katex.sh && rm install_katex.sh
# Install conda and other packages (e.g., numpy, pytest)
# Install conda
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ADD requirements-ci.txt /opt/conda/requirements-ci.txt
ADD ./common/install_conda.sh install_conda.sh
RUN bash ./install_conda.sh && rm install_conda.sh
RUN rm /opt/conda/requirements-ci.txt
# Install gcc
ARG GCC_VERSION
@ -41,6 +40,12 @@ ARG CLANG_VERSION
ADD ./common/install_clang.sh install_clang.sh
RUN bash ./install_clang.sh && rm install_clang.sh
# Install non-standard Python versions (via Travis binaries)
ARG TRAVIS_PYTHON_VERSION
ENV PATH /opt/python/$TRAVIS_PYTHON_VERSION/bin:$PATH
ADD ./common/install_travis_python.sh install_travis_python.sh
RUN bash ./install_travis_python.sh && rm install_travis_python.sh
# (optional) Install protobuf for ONNX
ARG PROTOBUF
ADD ./common/install_protobuf.sh install_protobuf.sh
@ -62,32 +67,17 @@ RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh
ENV INSTALLED_VISION ${VISION}
ADD ./common/install_openssl.sh install_openssl.sh
ENV OPENSSL_ROOT_DIR /opt/openssl
RUN bash ./install_openssl.sh
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
ADD ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# Install ccache/sccache (do this last, so we get priority in PATH)
ADD ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
RUN bash ./install_cache.sh && rm install_cache.sh
ENV CMAKE_CUDA_COMPILER_LAUNCHER=/opt/cache/bin/sccache
ENV CUDA_NVCC_EXECUTABLE=/opt/cache/lib/nvcc
# Add jni.h for java host build
ADD ./common/install_jni.sh install_jni.sh
ADD ./java/jni.h jni.h
RUN bash ./install_jni.sh && rm install_jni.sh
# Install Open MPI for CUDA
ADD ./common/install_openmpi.sh install_openmpi.sh
RUN if [ -n "${CUDA_VERSION}" ]; then bash install_openmpi.sh; fi
RUN rm install_openmpi.sh
# Include BUILD_ENVIRONMENT environment variable in image
ARG BUILD_ENVIRONMENT
ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
@ -95,16 +85,10 @@ ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
# AWS specific CUDA build guidance
ENV TORCH_CUDA_ARCH_LIST Maxwell
ENV TORCH_NVCC_FLAGS "-Xfatbin -compress-all"
ENV CUDA_PATH /usr/local/cuda
# Install LLVM dev version (Defined in the pytorch/builder github repository)
COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm
# Install CUDNN
ARG CUDNN_VERSION
ADD ./common/install_cudnn.sh install_cudnn.sh
RUN if [ "${CUDNN_VERSION}" -eq 8 ]; then bash install_cudnn.sh; fi
RUN rm install_cudnn.sh
# Install LLVM dev version
ADD ./common/install_llvm.sh install_llvm.sh
RUN bash ./install_llvm.sh
USER jenkins
CMD ["bash"]


@ -6,10 +6,6 @@ ARG UBUNTU_VERSION
ENV DEBIAN_FRONTEND noninteractive
# Set AMD gpu targets to build for
ARG PYTORCH_ROCM_ARCH
ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
# Install common dependencies (so that this step can be cached separately)
ARG EC2
ADD ./common/install_base.sh install_base.sh
@ -25,18 +21,11 @@ RUN bash ./install_clang.sh && rm install_clang.sh
ADD ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install conda and other packages (e.g., numpy, pytest)
# Install conda
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ADD requirements-ci.txt /opt/conda/requirements-ci.txt
ADD ./common/install_conda.sh install_conda.sh
RUN bash ./install_conda.sh && rm install_conda.sh
RUN rm /opt/conda/requirements-ci.txt
# Install gcc
ARG GCC_VERSION
ADD ./common/install_gcc.sh install_gcc.sh
RUN bash ./install_gcc.sh && rm install_gcc.sh
# (optional) Install protobuf for ONNX
ARG PROTOBUF
@ -68,8 +57,7 @@ ENV PATH /opt/rocm/bin:$PATH
ENV PATH /opt/rocm/hcc/bin:$PATH
ENV PATH /opt/rocm/hip/bin:$PATH
ENV PATH /opt/rocm/opencl/bin:$PATH
ENV PATH /opt/rocm/llvm/bin:$PATH
ENV MAGMA_HOME /opt/rocm/magma
ENV HIP_PLATFORM hcc
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8


@ -33,22 +33,23 @@ ARG KATEX
ADD ./common/install_katex.sh install_katex.sh
RUN bash ./install_katex.sh && rm install_katex.sh
# Install conda and other packages (e.g., numpy, pytest)
# Install conda
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ADD requirements-ci.txt /opt/conda/requirements-ci.txt
ADD ./common/install_conda.sh install_conda.sh
RUN bash ./install_conda.sh && rm install_conda.sh
RUN rm /opt/conda/requirements-ci.txt
# Install gcc
ARG GCC_VERSION
ADD ./common/install_gcc.sh install_gcc.sh
RUN bash ./install_gcc.sh && rm install_gcc.sh
# Install lcov for C++ code coverage
ADD ./common/install_lcov.sh install_lcov.sh
RUN bash ./install_lcov.sh && rm install_lcov.sh
# Install non-standard Python versions (via Travis binaries)
ARG TRAVIS_PYTHON_VERSION
ARG TRAVIS_DL_URL_PREFIX
ENV PATH /opt/python/$TRAVIS_PYTHON_VERSION/bin:$PATH
ADD ./common/install_travis_python.sh install_travis_python.sh
RUN bash ./install_travis_python.sh && rm install_travis_python.sh
# (optional) Install protobuf for ONNX
ARG PROTOBUF
@ -84,18 +85,6 @@ RUN rm AndroidManifest.xml
RUN rm build.gradle
ENV INSTALLED_ANDROID ${ANDROID}
# (optional) Install Vulkan SDK
ARG VULKAN_SDK_VERSION
ADD ./common/install_vulkan_sdk.sh install_vulkan_sdk.sh
RUN if [ -n "${VULKAN_SDK_VERSION}" ]; then bash ./install_vulkan_sdk.sh; fi
RUN rm install_vulkan_sdk.sh
# (optional) Install swiftshader
ARG SWIFTSHADER
ADD ./common/install_swiftshader.sh install_swiftshader.sh
RUN if [ -n "${SWIFTSHADER}" ]; then bash ./install_swiftshader.sh; fi
RUN rm install_swiftshader.sh
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
ADD ./common/install_cmake.sh install_cmake.sh
@ -108,10 +97,6 @@ ADD ./common/install_ninja.sh install_ninja.sh
RUN if [ -n "${NINJA_VERSION}" ]; then bash ./install_ninja.sh; fi
RUN rm install_ninja.sh
ADD ./common/install_openssl.sh install_openssl.sh
RUN bash ./install_openssl.sh
ENV OPENSSL_ROOT_DIR /opt/openssl
# Install ccache/sccache (do this last, so we get priority in PATH)
ADD ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
@ -126,8 +111,9 @@ RUN bash ./install_jni.sh && rm install_jni.sh
ARG BUILD_ENVIRONMENT
ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
# Install LLVM dev version (Defined in the pytorch/builder github repository)
COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm
# Install LLVM dev version
ADD ./common/install_llvm.sh install_llvm.sh
RUN bash ./install_llvm.sh
USER jenkins
CMD ["bash"]


@ -0,0 +1,13 @@
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y python-pip git && rm -rf /var/lib/apt/lists/* /var/log/dpkg.log
ADD requirements.txt /requirements.txt
RUN pip install -r /requirements.txt
ADD gc.py /usr/bin/gc.py
ADD docker_hub.py /usr/bin/docker_hub.py
ENTRYPOINT ["/usr/bin/gc.py"]


@ -0,0 +1,125 @@
#!/usr/bin/env python
from collections import namedtuple
import boto3
import requests
import os
IMAGE_INFO = namedtuple(
"IMAGE_INFO", ("repo", "tag", "size", "last_updated_at", "last_updated_by")
)
def build_access_token(username, password):
r = requests.post(
"https://hub.docker.com/v2/users/login/",
data={"username": username, "password": password},
)
r.raise_for_status()
token = r.json().get("token")
return {"Authorization": "JWT " + token}
def list_repos(user, token):
r = requests.get("https://hub.docker.com/v2/repositories/" + user, headers=token)
r.raise_for_status()
ret = sorted(
repo["user"] + "/" + repo["name"] for repo in r.json().get("results", [])
)
if ret:
print("repos found:")
print("".join("\n\t" + r for r in ret))
return ret
def list_tags(repo, token):
r = requests.get(
"https://hub.docker.com/v2/repositories/" + repo + "/tags", headers=token
)
r.raise_for_status()
return [
IMAGE_INFO(
repo=repo,
tag=t["name"],
size=t["full_size"],
last_updated_at=t["last_updated"],
last_updated_by=t["last_updater_username"],
)
for t in r.json().get("results", [])
]
def save_to_s3(tags):
table_content = ""
client = boto3.client("s3")
for t in tags:
table_content += (
"<tr><td>{repo}</td><td>{tag}</td><td>{size}</td>"
"<td>{last_updated_at}</td><td>{last_updated_by}</td></tr>"
).format(
repo=t.repo,
tag=t.tag,
size=t.size,
last_updated_at=t.last_updated_at,
last_updated_by=t.last_updated_by,
)
html_body = """
<html>
<head>
<link rel="stylesheet"
href="https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min.css"
integrity="sha384-Vkoo8x4CGsO3+Hhxv8T/Q5PaXtkKtu6ug5TOeNV6gBiFeWPGFN9MuhOf23Q9Ifjh"
crossorigin="anonymous">
<link rel="stylesheet" type="text/css"
href="https://cdn.datatables.net/1.10.20/css/jquery.dataTables.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js">
</script>
<script type="text/javascript" charset="utf8"
src="https://cdn.datatables.net/1.10.20/js/jquery.dataTables.js"></script>
<title> docker image info</title>
</head>
<body>
<table class="table table-striped table-hover" id="docker">
<caption>Docker images on docker hub</caption>
<thead class="thead-dark">
<tr>
<th scope="col">repo</th>
<th scope="col">tag</th>
<th scope="col">size</th>
<th scope="col">last_updated_at</th>
<th scope="col">last_updated_by</th>
</tr>
</thead>
<tbody>
{table_content}
</tbody>
</table>
</body>
<script>
$(document).ready( function () {{
$('#docker').DataTable({{paging: false}});
}} );
</script>
</html>
""".format(
table_content=table_content
)
client.put_object(
Bucket="docker.pytorch.org",
ACL="public-read",
Key="docker_hub.html",
Body=html_body,
ContentType="text/html",
)
if __name__ == "__main__":
username = os.environ.get("DOCKER_HUB_USERNAME")
password = os.environ.get("DOCKER_HUB_PASSWORD")
token = build_access_token(username, password)
tags = []
for repo in list_repos("pytorch", token):
tags.extend(list_tags(repo, token))
save_to_s3(tags)
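The page template in `save_to_s3` is filled in with `str.format`, so the embedded JavaScript has to double its braces. The following is a minimal sketch of that pattern only; `ImageInfo`, `ROW`, `PAGE`, and `render` are illustrative names, the sample values are invented, and no network or S3 call is made.

```python
from collections import namedtuple

# Toy stand-in for the IMAGE_INFO tuples built in list_tags().
ImageInfo = namedtuple("ImageInfo", ("repo", "tag", "size"))

ROW = "<tr><td>{repo}</td><td>{tag}</td><td>{size}</td></tr>"

# Doubled braces come out of str.format() as single literal braces, which is
# why the JavaScript in the template is written with {{ and }}.
PAGE = """<tbody>
{table_content}
</tbody>
<script>
$(document).ready(function () {{
  $('#docker').DataTable({{paging: false}});
}});
</script>"""


def render(tags):
    """Build the table rows first, then substitute them into the page."""
    rows = "".join(ROW.format(repo=t.repo, tag=t.tag, size=t.size) for t in tags)
    return PAGE.format(table_content=rows)


html = render([ImageInfo("pytorch/example", "latest", 123)])
print(html)
```

A single unescaped `{` in the script section would make `format` raise, so any stray text near the closing braces of the template is worth checking.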

.circleci/ecr_gc_docker/gc.py Executable file

@ -0,0 +1,214 @@
#!/usr/bin/env python
import argparse
import datetime
import boto3
import pytz
import sys
import re
def save_to_s3(project, data):
table_content = ""
client = boto3.client("s3")
for repo, tag, window, age, pushed in data:
table_content += "<tr><td>{repo}</td><td>{tag}</td><td>{window}</td><td>{age}</td><td>{pushed}</td></tr>".format(
repo=repo, tag=tag, window=window, age=age, pushed=pushed
)
html_body = """
<html>
<head>
<link rel="stylesheet"
href="https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min.css"
integrity="sha384-Vkoo8x4CGsO3+Hhxv8T/Q5PaXtkKtu6ug5TOeNV6gBiFeWPGFN9MuhOf23Q9Ifjh"
crossorigin="anonymous">
<link rel="stylesheet" type="text/css" href="https://cdn.datatables.net/1.10.20/css/jquery.dataTables.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
<script type="text/javascript" charset="utf8" src="https://cdn.datatables.net/1.10.20/js/jquery.dataTables.js"></script>
<title>{project} nightly and permanent docker image info</title>
</head>
<body>
<table class="table table-striped table-hover" id="docker">
<thead class="thead-dark">
<tr>
<th scope="col">repo</th>
<th scope="col">tag</th>
<th scope="col">keep window</th>
<th scope="col">age</th>
<th scope="col">pushed at</th>
</tr>
</thead>
<tbody>
{table_content}
</tbody>
</table>
</body>
<script>
$(document).ready( function () {{
$('#docker').DataTable({{paging: false}});
}} );
</script>
</html>
""".format(
project=project, table_content=table_content
)
# for pytorch, file can be found at
# http://ossci-docker.s3-website.us-east-1.amazonaws.com/pytorch.html
# and later on we can configure docker.pytorch.org to point to that location
client.put_object(
Bucket="docker.pytorch.org",
ACL="public-read",
Key="{project}.html".format(project=project),
Body=html_body,
ContentType="text/html",
)
def repos(client):
paginator = client.get_paginator("describe_repositories")
pages = paginator.paginate(registryId="308535385114")
for page in pages:
for repo in page["repositories"]:
yield repo
def images(client, repository):
paginator = client.get_paginator("describe_images")
pages = paginator.paginate(
registryId="308535385114", repositoryName=repository["repositoryName"]
)
for page in pages:
for image in page["imageDetails"]:
yield image
parser = argparse.ArgumentParser(description="Delete old Docker tags from registry")
parser.add_argument(
"--dry-run", action="store_true", help="Dry run; print tags that would be deleted"
)
parser.add_argument(
"--keep-stable-days",
type=int,
default=14,
help="Days of stable Docker tags to keep (non per-build images)",
)
parser.add_argument(
"--keep-unstable-days",
type=int,
default=1,
help="Days of unstable Docker tags to keep (per-build images)",
)
parser.add_argument(
"--filter-prefix",
type=str,
default="",
help="Only run cleanup for repositories with this prefix",
)
parser.add_argument(
"--ignore-tags",
type=str,
default="",
help="Never cleanup these tags (comma separated)",
)
args = parser.parse_args()
if not args.ignore_tags or not args.filter_prefix:
print(
"""
Missing required arguments --ignore-tags and --filter-prefix
You must specify --ignore-tags and --filter-prefix to avoid accidentally
pruning a stable Docker tag which is being actively used. This will
make you VERY SAD. So pay attention.
First, which filter-prefix do you want? The list of valid prefixes
is in jobs/private.groovy under the 'docker-registry-cleanup' job.
You probably want either pytorch or caffe2.
Second, which ignore-tags do you want? It should be whatever the most
up-to-date DockerVersion for the repository in question is. Follow
the imports of jobs/pytorch.groovy to find them.
"""
)
sys.exit(1)
client = boto3.client("ecr", region_name="us-east-1")
stable_window = datetime.timedelta(days=args.keep_stable_days)
unstable_window = datetime.timedelta(days=args.keep_unstable_days)
now = datetime.datetime.now(pytz.UTC)
ignore_tags = args.ignore_tags.split(",")
def chunks(chunkable, n):
""" Yield successive n-sized chunks from chunkable.
"""
for i in range(0, len(chunkable), n):
yield chunkable[i : i + n]
SHA_PATTERN = re.compile(r'^[0-9a-f]{40}$')
def looks_like_git_sha(tag):
"""Returns True if a tag looks like a git sha
For reference a sha1 is 40 characters with only 0-9a-f and contains no
"-" characters
"""
return re.match(SHA_PATTERN, tag) is not None
stable_window_tags = []
for repo in repos(client):
repositoryName = repo["repositoryName"]
if not repositoryName.startswith(args.filter_prefix):
continue
# Keep list of image digests to delete for this repository
digest_to_delete = []
print(repositoryName)
for image in images(client, repo):
tags = image.get("imageTags")
if not isinstance(tags, (list,)) or len(tags) == 0:
continue
tag = tags[0]
created = image["imagePushedAt"]
age = now - created
if any([
looks_like_git_sha(tag),
tag.isdigit(),
tag.count("-") == 4, # TODO: Remove, this no longer applies as tags are now built using a SHA1
tag in ignore_tags]):
window = stable_window
if tag in ignore_tags:
stable_window_tags.append((repositoryName, tag, "", age, created))
elif age < window:
stable_window_tags.append((repositoryName, tag, window, age, created))
else:
window = unstable_window
if tag in ignore_tags:
print("Ignoring tag {}:{} (age: {})".format(repositoryName, tag, age))
continue
if age < window:
print("Not deleting manifest for tag {}:{} (age: {})".format(repositoryName, tag, age))
continue
if args.dry_run:
print("(dry run) Deleting manifest for tag {}:{} (age: {})".format(repositoryName, tag, age))
else:
print("Deleting manifest for tag {}:{} (age: {})".format(repositoryName, tag, age))
digest_to_delete.append(image["imageDigest"])
# Issue batch delete for all images to delete for this repository
# Note that as of 2018-07-25, the maximum number of images you can
# delete in a single batch is 100, so chunk our list into batches of
# 100
for c in chunks(digest_to_delete, 100):
client.batch_delete_image(
registryId="308535385114",
repositoryName=repositoryName,
imageIds=[{"imageDigest": digest} for digest in c],
)
save_to_s3(args.filter_prefix, stable_window_tags)
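The retention logic in `gc.py` can be exercised without touching ECR. The sketch below restates the decision made in the main loop as a pure function and reuses the chunking idea for the 100-image batch limit; `should_delete` is a hypothetical helper introduced here for illustration, and the tags and ages are invented.

```python
import datetime
import re

SHA_PATTERN = re.compile(r"^[0-9a-f]{40}$")


def looks_like_git_sha(tag):
    """True for a 40-character lowercase hex string."""
    return SHA_PATTERN.match(tag) is not None


def chunks(items, n):
    """Yield successive n-sized slices of items."""
    for i in range(0, len(items), n):
        yield items[i : i + n]


def should_delete(tag, age, ignore_tags, stable_window, unstable_window):
    """Hypothetical restatement of the loop body: ignored tags are never
    deleted; SHA-like, numeric, or four-dash tags get the stable window;
    everything else gets the unstable window."""
    if tag in ignore_tags:
        return False
    is_stable = looks_like_git_sha(tag) or tag.isdigit() or tag.count("-") == 4
    window = stable_window if is_stable else unstable_window
    return age >= window


stable = datetime.timedelta(days=14)
unstable = datetime.timedelta(days=1)
three_days = datetime.timedelta(days=3)

decisions = [
    should_delete("a" * 40, three_days, [], stable, unstable),
    should_delete("tmp-build", three_days, [], stable, unstable),
    should_delete("tmp-build", three_days, ["tmp-build"], stable, unstable),
]
print(decisions)

# 250 digests would be sent as three batch_delete_image calls.
batches = list(chunks(list(range(250)), 100))
print([len(b) for b in batches])
```

Keeping the decision separate from the boto3 calls makes it straightforward to check a new `--ignore-tags` value before running without `--dry-run`.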


@ -0,0 +1,3 @@
boto3
pytz
requests


@ -8,12 +8,22 @@ Please see README.md in this directory for details.
import os
import shutil
import sys
from collections import namedtuple
from collections import OrderedDict, namedtuple
import cimodel.data.binary_build_definitions as binary_build_definitions
import cimodel.data.caffe2_build_definitions as caffe2_build_definitions
import cimodel.data.pytorch_build_definitions as pytorch_build_definitions
import cimodel.data.simple.android_definitions
import cimodel.data.simple.bazel_definitions
import cimodel.data.simple.binary_smoketest
import cimodel.data.simple.docker_definitions
import cimodel.data.simple.ge_config_tests
import cimodel.data.simple.ios_definitions
import cimodel.data.simple.macos_definitions
import cimodel.data.simple.mobile_definitions
import cimodel.data.simple.nightly_android
import cimodel.data.simple.nightly_ios
import cimodel.data.simple.anaconda_prune_defintions
import cimodel.data.windows_build_definitions as windows_build_definitions
import cimodel.lib.miniutils as miniutils
import cimodel.lib.miniyaml as miniyaml
@ -70,96 +80,57 @@ class Header(object):
for line in filter(None, lines):
output_filehandle.write(line + "\n")
def _for_all_items(items, functor) -> None:
if isinstance(items, list):
for item in items:
_for_all_items(item, functor)
if isinstance(items, dict) and len(items) == 1:
item_type, item = next(iter(items.items()))
functor(item_type, item)
def filter_master_only_jobs(items):
def _is_main_or_master_item(item):
filters = item.get('filters', None)
branches = filters.get('branches', None) if filters is not None else None
branches_only = branches.get('only', None) if branches is not None else None
return ('main' in branches_only or 'master' in branches_only) if branches_only is not None else False
master_deps = set()
def _save_requires_if_master(item_type, item):
requires = item.get('requires', None)
item_name = item.get("name", None)
if not isinstance(requires, list):
return
if _is_main_or_master_item(item) or item_name in master_deps:
master_deps.update([n.strip('"') for n in requires])
def _do_filtering(items):
if isinstance(items, list):
rc = [_do_filtering(item) for item in items]
return [item for item in rc if len(item if item is not None else []) > 0]
assert isinstance(items, dict) and len(items) == 1
item_type, item = next(iter(items.items()))
item_name = item.get("name", None)
item_name = item_name.strip('"') if item_name is not None else None
if not _is_main_or_master_item(item) and item_name not in master_deps:
return None
if 'filters' in item:
item = item.copy()
item.pop('filters')
return {item_type: item}
# Scan dependencies twice to pick up nested required jobs,
# i.e. jobs depending on jobs that main-only jobs depend on
_for_all_items(items, _save_requires_if_master)
_for_all_items(items, _save_requires_if_master)
return _do_filtering(items)
def generate_required_docker_images(items):
required_docker_images = set()
def _requires_docker_image(item_type, item):
requires = item.get('requires', None)
if not isinstance(requires, list):
return
for requirement in requires:
requirement = requirement.replace('"', '')
if requirement.startswith('docker-'):
required_docker_images.add(requirement)
_for_all_items(items, _requires_docker_image)
return required_docker_images
def gen_build_workflows_tree():
build_workflows_functions = [
pytorch_build_definitions.get_workflow_jobs,
cimodel.data.simple.macos_definitions.get_workflow_jobs,
cimodel.data.simple.android_definitions.get_workflow_jobs,
cimodel.data.simple.ios_definitions.get_workflow_jobs,
cimodel.data.simple.mobile_definitions.get_workflow_jobs,
cimodel.data.simple.ge_config_tests.get_workflow_jobs,
cimodel.data.simple.bazel_definitions.get_workflow_jobs,
caffe2_build_definitions.get_workflow_jobs,
cimodel.data.simple.binary_smoketest.get_workflow_jobs,
cimodel.data.simple.nightly_ios.get_workflow_jobs,
cimodel.data.simple.anaconda_prune_defintions.get_workflow_jobs,
cimodel.data.simple.nightly_android.get_workflow_jobs,
windows_build_definitions.get_windows_workflows,
]
build_jobs = [f() for f in build_workflows_functions]
build_jobs.extend(
cimodel.data.simple.docker_definitions.get_workflow_jobs(
# sort for consistency
sorted(generate_required_docker_images(build_jobs))
)
)
master_build_jobs = filter_master_only_jobs(build_jobs)
rc = {
binary_build_functions = [
binary_build_definitions.get_binary_build_jobs,
binary_build_definitions.get_nightly_tests,
binary_build_definitions.get_nightly_uploads,
binary_build_definitions.get_post_upload_jobs,
binary_build_definitions.get_binary_smoke_test_jobs,
]
docker_builder_functions = [
cimodel.data.simple.docker_definitions.get_workflow_jobs
]
return {
"workflows": {
"build": {
"when": r"<< pipeline.parameters.run_build >>",
"jobs": build_jobs,
"binary_builds": {
"when": r"<< pipeline.parameters.run_binary_tests >>",
"jobs": [f() for f in binary_build_functions],
},
"docker_build": OrderedDict(
{
"triggers": [
{
"schedule": {
"cron": miniutils.quote("0 15 * * 0"),
"filters": {"branches": {"only": ["master"]}},
}
}
],
"jobs": [f() for f in docker_builder_functions],
}
),
"build": {"jobs": [f() for f in build_workflows_functions]},
}
}
if len(master_build_jobs) > 0:
rc["workflows"]["master_build"] = {
"when": r"<< pipeline.parameters.run_master_build >>",
"jobs": master_build_jobs,
}
return rc
# Order of this list matters to the generated config.yml.
@ -169,15 +140,22 @@ YAML_SOURCES = [
File("nightly-binary-build-defaults.yml"),
Header("Build parameters"),
File("build-parameters/pytorch-build-params.yml"),
File("build-parameters/caffe2-build-params.yml"),
File("build-parameters/binary-build-params.yml"),
File("build-parameters/promote-build-params.yml"),
Header("Job specs"),
File("job-specs/pytorch-job-specs.yml"),
File("job-specs/caffe2-job-specs.yml"),
File("job-specs/binary-job-specs.yml"),
File("job-specs/job-specs-custom.yml"),
File("job-specs/job-specs-promote.yml"),
File("job-specs/binary_update_htmls.yml"),
File("job-specs/binary-build-tests.yml"),
File("job-specs/docker_jobs.yml"),
Header("Workflows"),
Treegen(gen_build_workflows_tree, 0),
File("workflows/workflows-ecr-gc.yml"),
File("workflows/workflows-promote.yml"),
]
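The `_for_all_items` and `generate_required_docker_images` helpers shown in this diff walk a nested list of single-key job dictionaries. A minimal sketch of that traversal on a toy job tree follows; the job and image names are invented for illustration, and the functions are simplified copies rather than the generator's full logic.

```python
def for_all_items(items, functor):
    """Call functor(item_type, item) for every single-key dict in a nested list."""
    if isinstance(items, list):
        for item in items:
            for_all_items(item, functor)
    if isinstance(items, dict) and len(items) == 1:
        item_type, item = next(iter(items.items()))
        functor(item_type, item)


def required_docker_images(items):
    """Collect every 'requires' entry whose name starts with 'docker-'."""
    found = set()

    def visit(item_type, item):
        requires = item.get("requires")
        if not isinstance(requires, list):
            return
        for requirement in requires:
            requirement = requirement.replace('"', "")  # names may be pre-quoted
            if requirement.startswith("docker-"):
                found.add(requirement)

    for_all_items(items, visit)
    return found


# Invented job tree: one nested list plus two top-level jobs.
jobs = [
    [{"example_build": {"name": "build_a", "requires": ['"docker-image-a"']}}],
    {"example_test": {"name": "test_a", "requires": ["build_a"]}},
    {"example_build": {"name": "build_b", "requires": ["docker-image-b"]}},
]

images = sorted(required_docker_images(jobs))  # sorted for stable output
print(images)
```

Sorting the resulting set mirrors the "sort for consistency" comment in the generator, so that the emitted config does not change between runs.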


@ -1,5 +0,0 @@
cd $PSScriptRoot;
$NewFile = New-TemporaryFile;
python generate_config_yml.py > $NewFile.name
(Get-Content $NewFile.name -Raw).TrimEnd().Replace("`r`n","`n") | Set-Content config.yml -Force
Remove-Item $NewFile.name


@ -1,17 +1,8 @@
#!/bin/bash -e
#!/bin/bash -xe
# Allows this script to be invoked from any directory:
cd "$(dirname "$0")"
UNCOMMIT_CHANGE=$(git status -s | grep " config.yml" | wc -l | xargs)
if [[ $UNCOMMIT_CHANGE != 0 ]]; then
OLD_FILE=$(mktemp)
cp config.yml "$OLD_FILE"
echo "Uncommitted change detected in .circleci/config.yml"
echo "It has been backed up to $OLD_FILE"
fi
cd $(dirname "$0")
NEW_FILE=$(mktemp)
./generate_config_yml.py > "$NEW_FILE"
cp "$NEW_FILE" config.yml
echo "New config generated in .circleci/config.yml"
./generate_config_yml.py > $NEW_FILE
cp $NEW_FILE config.yml


@ -33,11 +33,6 @@ else
export BUILDER_ROOT="$workdir/builder"
fi
# Try to extract PR number from branch if not already set
if [[ -z "${CIRCLE_PR_NUMBER:-}" ]]; then
CIRCLE_PR_NUMBER="$(echo ${CIRCLE_BRANCH} | sed -E -n 's/pull\/([0-9]*).*/\1/p')"
fi
# Clone the Pytorch branch
retry git clone https://github.com/pytorch/pytorch.git "$PYTORCH_ROOT"
pushd "$PYTORCH_ROOT"
@ -49,20 +44,19 @@ if [[ -n "${CIRCLE_PR_NUMBER:-}" ]]; then
git reset --hard "$CIRCLE_SHA1"
elif [[ -n "${CIRCLE_SHA1:-}" ]]; then
# Scheduled workflows & "smoke" binary build on master on PR merges
DEFAULT_BRANCH="$(git remote show $CIRCLE_REPOSITORY_URL | awk '/HEAD branch/ {print $NF}')"
git reset --hard "$CIRCLE_SHA1"
git checkout -q -B $DEFAULT_BRANCH
git checkout -q -B master
else
echo "Can't tell what to checkout"
exit 1
fi
retry git submodule update --init --recursive --jobs 0
retry git submodule update --init --recursive
echo "Using Pytorch from "
git --no-pager log --max-count 1
popd
# Clone the Builder master repo
retry git clone -q https://github.com/pytorch/builder.git -b release/1.12 "$BUILDER_ROOT"
retry git clone -q https://github.com/pytorch/builder.git "$BUILDER_ROOT"
pushd "$BUILDER_ROOT"
echo "Using builder from "
git --no-pager log --max-count 1
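The `sed` expression in this script pulls a PR number out of a branch name such as `pull/<number>/head`. A rough Python equivalent for that common case is sketched below; `pr_number_from_branch` is a hypothetical name, and unlike the `sed` substitution it requires at least one digit and discards any text before `pull/`.

```python
import re


def pr_number_from_branch(branch):
    """Return the digits following 'pull/' in a branch name, or '' if absent."""
    match = re.search(r"pull/([0-9]+)", branch)
    return match.group(1) if match else ""


examples = {
    name: pr_number_from_branch(name)
    for name in ("pull/12345/head", "pull/678", "master")
}
print(examples)
```

An empty result corresponds to the script leaving `CIRCLE_PR_NUMBER` empty and falling through to the `CIRCLE_SHA1` branch.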


@ -15,14 +15,13 @@ export PATH="~/anaconda/bin:${PATH}"
source ~/anaconda/bin/activate
# Install dependencies
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi requests typing_extensions --yes
conda install -c conda-forge valgrind --yes
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing requests --yes
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
# sync submodules
cd ${PROJ_ROOT}
git submodule sync
git submodule update --init --recursive --jobs 0
git submodule update --init --recursive
# run build script
chmod a+x ${PROJ_ROOT}/scripts/build_ios.sh
@ -31,12 +30,8 @@ cat ${PROJ_ROOT}/scripts/build_ios.sh
echo "########################################################"
echo "IOS_ARCH: ${IOS_ARCH}"
echo "IOS_PLATFORM: ${IOS_PLATFORM}"
echo "USE_PYTORCH_METAL: ${USE_PYTORCH_METAL}"
echo "USE_COREML_DELEGATE: ${USE_COREML_DELEGATE}"
export IOS_ARCH=${IOS_ARCH}
export IOS_PLATFORM=${IOS_PLATFORM}
export USE_PYTORCH_METAL=${USE_PYTORCH_METAL}
export USE_COREML_DELEGATE=${USE_COREML_DELEGATE}
unbuffer ${PROJ_ROOT}/scripts/build_ios.sh 2>&1 | ts
#store the binary


@ -8,23 +8,22 @@ cd ${PROJ_ROOT}/ios/TestApp
# install fastlane
sudo gem install bundler && bundle install
# install certificates
echo "${IOS_CERT_KEY_2022}" >> cert.txt
echo "${IOS_CERT_KEY}" >> cert.txt
base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_root_cert
bundle exec fastlane install_dev_cert
bundle exec fastlane install_cert
# install the provisioning profile
PROFILE=PyTorch_CI_2022.mobileprovision
PROFILE=TestApp_CI.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
echo "${IOS_SIGN_KEY_2022}" >> cert.txt
echo "${IOS_SIGN_KEY}" >> cert.txt
base64 --decode cert.txt -o ${PROFILE}
rm cert.txt
# run the ruby build script
if ! [ -x "$(command -v xcodebuild)" ]; then
echo 'Error: xcodebuild is not installed.'
exit 1
fi
PROFILE=PyTorch_CI_2022
fi
PROFILE=TestApp_CI
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} -c ${PROFILE} -t ${IOS_DEV_TEAM_ID}


@ -23,36 +23,18 @@ do
fi
done
lipo -i ${ZIP_DIR}/install/lib/*.a
echo "BUILD_LITE_INTERPRETER: ${BUILD_LITE_INTERPRETER}"
# copy the umbrella header and license
if [ "${BUILD_LITE_INTERPRETER}" == "1" ]; then
cp ${PROJ_ROOT}/ios/LibTorch-Lite.h ${ZIP_DIR}/src/
else
cp ${PROJ_ROOT}/ios/LibTorch.h ${ZIP_DIR}/src/
fi
cp ${PROJ_ROOT}/ios/LibTorch.h ${ZIP_DIR}/src/
cp ${PROJ_ROOT}/LICENSE ${ZIP_DIR}/
# zip the library
export DATE="$(date -u +%Y%m%d)"
export IOS_NIGHTLY_BUILD_VERSION="1.12.0.${DATE}"
if [ "${BUILD_LITE_INTERPRETER}" == "1" ]; then
# libtorch_lite_ios_nightly_1.11.0.20210810.zip
ZIPFILE="libtorch_lite_ios_nightly_${IOS_NIGHTLY_BUILD_VERSION}.zip"
else
ZIPFILE="libtorch_ios_nightly_build.zip"
fi
ZIPFILE=libtorch_ios_nightly_build.zip
cd ${ZIP_DIR}
#for testing
touch version.txt
echo "${IOS_NIGHTLY_BUILD_VERSION}" > version.txt
echo $(date +%s) > version.txt
zip -r ${ZIPFILE} install src version.txt LICENSE
# upload to aws
# Install conda then 'conda install' awscli
curl --retry 3 -o ~/conda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
chmod +x ~/conda.sh
/bin/bash ~/conda.sh -b -p ~/anaconda
export PATH="~/anaconda/bin:${PATH}"
source ~/anaconda/bin/activate
conda install -c conda-forge awscli --yes
brew install awscli
set +x
export AWS_ACCESS_KEY_ID=${AWS_S3_ACCESS_KEY_FOR_PYTORCH_BINARY_UPLOAD}
export AWS_SECRET_ACCESS_KEY=${AWS_S3_ACCESS_SECRET_FOR_PYTORCH_BINARY_UPLOAD}
@ -60,16 +42,3 @@ set +x
# echo "AWS KEY: ${AWS_ACCESS_KEY_ID}"
# echo "AWS SECRET: ${AWS_SECRET_ACCESS_KEY}"
aws s3 cp ${ZIPFILE} s3://ossci-ios-build/ --acl public-read
if [ "${BUILD_LITE_INTERPRETER}" == "1" ]; then
# create a new LibTorch-Lite-Nightly.podspec from the template
echo "cp ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec.template ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec"
cp ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec.template ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
# update pod version
sed -i '' -e "s/IOS_NIGHTLY_BUILD_VERSION/${IOS_NIGHTLY_BUILD_VERSION}/g" ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
cat ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
# push the new LibTorch-Lite-Nightly.podspec to CocoaPods
pod trunk push --verbose --allow-warnings --use-libraries --skip-import-validation ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
fi

View File

@ -4,31 +4,27 @@ echo "RUNNING ON $(uname -a) WITH $(nproc) CPUS AND $(free -m)"
set -eux -o pipefail
source /env
# Because most Circle executors only have 20 CPUs, using more causes OOMs w/ Ninja and nvcc parallelization
MEMORY_LIMIT_MAX_JOBS=18
NUM_CPUS=$(( $(nproc) - 2 ))
# Defaults here for **binary** linux builds so they can be changed in one place
export MAX_JOBS=${MAX_JOBS:-$(( ${NUM_CPUS} > ${MEMORY_LIMIT_MAX_JOBS} ? ${MEMORY_LIMIT_MAX_JOBS} : ${NUM_CPUS} ))}
if [[ "${DESIRED_CUDA}" =~ cu11[0-9] ]]; then
export BUILD_SPLIT_CUDA="ON"
fi
# Defaults here so they can be changed in one place
export MAX_JOBS=12
# Parse the parameters
if [[ "$PACKAGE_TYPE" == 'conda' ]]; then
build_script='conda/build_pytorch.sh'
elif [[ "$DESIRED_CUDA" == cpu ]]; then
build_script='manywheel/build_cpu.sh'
elif [[ "$DESIRED_CUDA" == *"rocm"* ]]; then
build_script='manywheel/build_rocm.sh'
else
build_script='manywheel/build.sh'
fi
if [[ "$CIRCLE_BRANCH" == "main" ]] || [[ "$CIRCLE_BRANCH" == "master" ]] || [[ "$CIRCLE_BRANCH" == release/* ]]; then
export BUILD_DEBUG_INFO=1
# We want to call unbuffer, which calls tclsh which finds the expect
# package. The expect was installed by yum into /usr/bin so we want to
# find /usr/bin/tclsh, but this is shadowed by /opt/conda/bin/tclsh in
# the conda docker images, so we prepend it to the path here.
if [[ "$PACKAGE_TYPE" == 'conda' ]]; then
mkdir /just_tclsh_bin
ln -s /usr/bin/tclsh /just_tclsh_bin/tclsh
export PATH=/just_tclsh_bin:$PATH
fi
# Build the package
SKIP_ALL_TESTS=1 "/builder/$build_script"
SKIP_ALL_TESTS=1 unbuffer "/builder/$build_script" | ts
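
The MAX_JOBS logic in this file takes the CPU count minus two and caps it at a memory-safe limit. A minimal sketch of that calculation as a standalone function follows; the name `capped_jobs` and its argument order are hypothetical and do not appear in the repository.

```shell
# Hypothetical helper (not part of the repository).
# Usage: capped_jobs CPU_COUNT LIMIT [RESERVED]
# Prints min(CPU_COUNT - RESERVED, LIMIT), but never less than 1.
capped_jobs() {
  local cpus=$1
  local limit=$2
  local reserved=${3:-2}
  local usable=$(( cpus - reserved ))
  # Guard against tiny machines where cpus <= reserved.
  if [ "$usable" -lt 1 ]; then
    usable=1
  fi
  echo $(( usable > limit ? limit : usable ))
}
```

On a Linux executor this could be wired in as `export MAX_JOBS=${MAX_JOBS:-$(capped_jobs "$(nproc)" 18)}`, which keeps any caller-supplied MAX_JOBS and otherwise applies the cap.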

View File

@ -1,31 +1,16 @@
#!/bin/bash
OUTPUT_SCRIPT=${OUTPUT_SCRIPT:-/home/circleci/project/ci_test_script.sh}
# only source if file exists
if [[ -f /home/circleci/project/env ]]; then
source /home/circleci/project/env
fi
cat >"${OUTPUT_SCRIPT}" <<EOL
source /home/circleci/project/env
cat >/home/circleci/project/ci_test_script.sh <<EOL
# =================== The following code will be executed inside Docker container ===================
set -eux -o pipefail
retry () {
"\$@" || (sleep 1 && "\$@") || (sleep 2 && "\$@")
}
# Source binary env file here if exists
if [[ -e "${BINARY_ENV_FILE:-/nofile}" ]]; then
source "${BINARY_ENV_FILE:-/nofile}"
fi
python_nodot="\$(echo $DESIRED_PYTHON | tr -d m.u)"
# Set up Python
if [[ "$PACKAGE_TYPE" == conda ]]; then
retry conda create -qyn testenv python="$DESIRED_PYTHON"
source activate testenv >/dev/null
elif [[ "$PACKAGE_TYPE" != libtorch ]]; then
python_nodot="\$(echo $DESIRED_PYTHON | tr -d m.u)"
python_path="/opt/python/cp\$python_nodot-cp\${python_nodot}"
# Prior to Python 3.8 paths were suffixed with an 'm'
if [[ -d "\${python_path}/bin" ]]; then
@ -35,72 +20,31 @@ elif [[ "$PACKAGE_TYPE" != libtorch ]]; then
fi
fi
EXTRA_CONDA_FLAGS=""
NUMPY_PIN=""
PROTOBUF_PACKAGE="defaults::protobuf"
if [[ "\$python_nodot" = *310* ]]; then
EXTRA_CONDA_FLAGS="-c=conda-forge"
# There's an issue with conda channel priority where it'll randomly pick 1.19 over 1.20
# we set a lower boundary here just to be safe
NUMPY_PIN=">=1.21.2"
PROTOBUF_PACKAGE="protobuf>=3.19.0"
fi
if [[ "\$python_nodot" = *39* ]]; then
EXTRA_CONDA_FLAGS="-c=conda-forge"
# There's an issue with conda channel priority where it'll randomly pick 1.19 over 1.20
# we set a lower boundary here just to be safe
NUMPY_PIN=">=1.20"
fi
if [[ "$DESIRED_CUDA" == "cu116" ]]; then
EXTRA_CONDA_FLAGS="-c=conda-forge"
fi
# Move debug wheels out of the package dir so they don't get installed
mkdir -p /tmp/debug_final_pkgs
mv /final_pkgs/debug-*.zip /tmp/debug_final_pkgs || echo "no debug packages to move"
# Install the package
# These network calls should not have 'retry's because they are installing
# locally and aren't actually network calls
# TODO there is duplicated and inconsistent test-python-env setup across this
# file, builder/smoke_test.sh, and builder/run_tests.sh, and also in the
# conda build scripts themselves. These should really be consolidated
# Pick only one package of multiple available (which happens as result of workflow re-runs)
pkg="/final_pkgs/\$(ls -1 /final_pkgs|sort|tail -1)"
pkg="/final_pkgs/\$(ls /final_pkgs)"
if [[ "$PACKAGE_TYPE" == conda ]]; then
(
# For some reason conda likes to re-activate the conda environment when attempting this install
# which means that a deactivate is run and some variables might not exist when that happens,
# namely CONDA_MKL_INTERFACE_LAYER_BACKUP from libblas so let's just ignore unbound variables when
# it comes to the conda installation commands
set +u
retry conda install \${EXTRA_CONDA_FLAGS} -yq \
"numpy\${NUMPY_PIN}" \
future \
mkl>=2018 \
ninja \
dataclasses \
typing-extensions \
${PROTOBUF_PACKAGE} \
six
if [[ "$DESIRED_CUDA" == 'cpu' ]]; then
retry conda install -c pytorch -y cpuonly
conda install -y "\$pkg" --offline
if [[ "$DESIRED_CUDA" == 'cpu' ]]; then
retry conda install -y cpuonly -c pytorch
fi
retry conda install -yq future numpy protobuf six
if [[ "$DESIRED_CUDA" != 'cpu' ]]; then
# DESIRED_CUDA is in format cu90 or cu102
if [[ "${#DESIRED_CUDA}" == 4 ]]; then
cu_ver="${DESIRED_CUDA:2:1}.${DESIRED_CUDA:3}"
else
# DESIRED_CUDA is in format cu90 or cu102
if [[ "${#DESIRED_CUDA}" == 4 ]]; then
cu_ver="${DESIRED_CUDA:2:1}.${DESIRED_CUDA:3}"
else
cu_ver="${DESIRED_CUDA:2:2}.${DESIRED_CUDA:4}"
fi
retry conda install \${EXTRA_CONDA_FLAGS} -yq -c nvidia -c pytorch "cudatoolkit=\${cu_ver}"
cu_ver="${DESIRED_CUDA:2:2}.${DESIRED_CUDA:4}"
fi
conda install \${EXTRA_CONDA_FLAGS} -y "\$pkg" --offline
)
retry conda install -yq -c pytorch "cudatoolkit=\${cu_ver}"
fi
elif [[ "$PACKAGE_TYPE" != libtorch ]]; then
pip install "\$pkg"
retry pip install -q future numpy protobuf typing-extensions six
retry pip install -q future numpy protobuf six
fi
if [[ "$PACKAGE_TYPE" == libtorch ]]; then
pkg="\$(ls /final_pkgs/*-latest.zip)"
@ -116,4 +60,4 @@ EOL
echo
echo
echo "The script that will run in the next step is:"
cat "${OUTPUT_SCRIPT}"
cat /home/circleci/project/ci_test_script.sh
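
The generated test script derives the cudatoolkit version from DESIRED_CUDA by string length: `cu90` becomes `9.0` and `cu102` becomes `10.2`. The sketch below expresses the same mapping with a `case` statement and adds input validation. The function name `cuda_version_from_tag` is hypothetical, and the validation is an addition that the original does not perform.

```shell
# Hypothetical helper (not part of the repository).
# Converts a tag such as cu90 or cu102 into a dotted version (9.0, 10.2).
# Returns 1 for anything that is not "cu" followed by two or three digits.
cuda_version_from_tag() {
  local tag=$1
  local digits major minor
  case "$tag" in
    cu[0-9][0-9])
      digits=${tag#cu}
      major=${digits%?}   # drop the last digit -> one-digit major
      minor=${digits#?}   # drop the first digit -> minor
      ;;
    cu[0-9][0-9][0-9])
      digits=${tag#cu}
      major=${digits%?}   # drop the last digit -> two-digit major
      minor=${digits#??}  # drop the first two digits -> minor
      ;;
    *)
      return 1
      ;;
  esac
  echo "${major}.${minor}"
}
```

Rejecting unexpected input up front means a value such as `cpu` or `rocm4.2` fails loudly instead of producing a malformed `cudatoolkit=` pin.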

View File

@ -0,0 +1,48 @@
#!/bin/bash
# Do NOT set -x
source /home/circleci/project/env
set -eu -o pipefail
set +x
declare -x "AWS_ACCESS_KEY_ID=${PYTORCH_BINARY_AWS_ACCESS_KEY_ID}"
declare -x "AWS_SECRET_ACCESS_KEY=${PYTORCH_BINARY_AWS_SECRET_ACCESS_KEY}"
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
# DO NOT TURN -x ON BEFORE THIS LINE
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
set -eux -o pipefail
export PATH="$MINICONDA_ROOT/bin:$PATH"
# This gets set in binary_populate_env.sh, but let's have a sane default just in case
PIP_UPLOAD_FOLDER=${PIP_UPLOAD_FOLDER:-nightly}
# TODO: Combine CONDA_UPLOAD_CHANNEL and PIP_UPLOAD_FOLDER into one variable
# The only difference is the trailing slash
# Strip trailing slashes if there
CONDA_UPLOAD_CHANNEL=$(echo "${PIP_UPLOAD_FOLDER}" | sed 's:/*$::')
BACKUP_BUCKET="s3://pytorch-backup"
retry pip install -q awscli
# Upload the package to the final location
pushd /home/circleci/project/final_pkgs
if [[ "$PACKAGE_TYPE" == conda ]]; then
retry conda install -yq anaconda-client
retry anaconda -t "${CONDA_PYTORCHBOT_TOKEN}" upload "$(ls)" -u "pytorch-${CONDA_UPLOAD_CHANNEL}" --label main --no-progress --force
# Fetch platform (eg. win-64, linux-64, etc.) from index file
# Because there's no actual conda command to read this
subdir=$(tar -xOf ./*.bz2 info/index.json | grep subdir | cut -d ':' -f2 | sed -e 's/[[:space:]]//' -e 's/"//g' -e 's/,//')
BACKUP_DIR="conda/${subdir}"
elif [[ "$PACKAGE_TYPE" == libtorch ]]; then
s3_dir="s3://pytorch/libtorch/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
for pkg in $(ls); do
retry aws s3 cp "$pkg" "$s3_dir" --acl public-read
done
BACKUP_DIR="libtorch/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
else
s3_dir="s3://pytorch/whl/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
retry aws s3 cp "$(ls)" "$s3_dir" --acl public-read
BACKUP_DIR="whl/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
fi
if [[ -n "${CIRCLE_TAG:-}" ]]; then
s3_dir="${BACKUP_BUCKET}/${CIRCLE_TAG}/${BACKUP_DIR}"
retry aws s3 cp --recursive . "$s3_dir"
fi
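
The upload script derives CONDA_UPLOAD_CHANNEL from PIP_UPLOAD_FOLDER by stripping trailing slashes with `sed`, while the S3 paths concatenate `${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}` directly and so rely on the folder ending in a slash. With the `nightly` fallback used in this file (no slash), that concatenation would yield a prefix like `nightlycu102`. A minimal sketch of normalizing both forms with parameter expansion follows; both function names are hypothetical.

```shell
# Hypothetical helpers (not part of the repository).

# Remove every trailing slash; same effect as: echo "$1" | sed 's:/*$::'
strip_trailing_slashes() {
  local value=$1
  while [ "${value%/}" != "$value" ]; do
    value=${value%/}
  done
  echo "$value"
}

# Guarantee exactly one trailing slash, for use as an S3 key prefix.
ensure_trailing_slash() {
  echo "$(strip_trailing_slashes "$1")/"
}
```

Normalizing once at the top of the script would let the conda channel use the stripped form and the S3 paths use the slash-terminated form, whichever spelling the caller supplied.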

View File

@ -1,19 +1,24 @@
#!/bin/bash
set -eux -o pipefail
source "${BINARY_ENV_FILE:-/Users/distiller/project/env}"
source "/Users/distiller/project/env"
mkdir -p "$PYTORCH_FINAL_PACKAGE_DIR"
if [[ -z "${IS_GHA:-}" ]]; then
export PATH="${workdir:-${HOME}}/miniconda/bin:${PATH}"
fi
# For some reason `unbuffer` breaks if we change the PATH here, so we
# write a script with the PATH change in it and unbuffer the whole
# thing
build_script="$workdir/build_script.sh"
touch "$build_script"
chmod +x "$build_script"
# Build
export USE_PYTORCH_METAL_EXPORT=1
export USE_COREML_DELEGATE=1
cat >"$build_script" <<EOL
export PATH="$workdir/miniconda/bin:$PATH"
if [[ "$PACKAGE_TYPE" == conda ]]; then
"${BUILDER_ROOT}/conda/build_pytorch.sh"
"$workdir/builder/conda/build_pytorch.sh"
else
export TORCH_PACKAGE_NAME="$(echo $TORCH_PACKAGE_NAME | tr '-' '_')"
"${BUILDER_ROOT}/wheel/build_wheel.sh"
"$workdir/builder/wheel/build_wheel.sh"
fi
EOL
unbuffer "$build_script" | ts
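
This script writes a second script through an unquoted heredoc (`<<EOL`), so every unescaped `$VAR` is expanded when the file is generated, not when it runs. The sketch below illustrates the difference between `$VAR` and `\$VAR` inside such a heredoc; the variable `WHEN` and the temporary paths are illustrative only.

```shell
# Illustration only: expansion time inside an unquoted heredoc.
tmpdir=$(mktemp -d)
generated="$tmpdir/generated.sh"

WHEN="generation"

cat >"$generated" <<EOL
#!/usr/bin/env bash
WHEN="run"
# Unescaped: substituted while the heredoc is written.
echo "unescaped: $WHEN"
# Escaped: left as a literal \$WHEN and evaluated when the script runs.
echo "escaped: \$WHEN"
EOL

chmod +x "$generated"
output=$("$generated")
echo "$output"
rm -rf "$tmpdir"
```

The first line of output reports `generation` and the second reports `run`. That is why the generated scripts in this diff escape variables such as `\$pkg` that must be resolved inside the container.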

View File

@ -20,9 +20,9 @@ if [[ "$PACKAGE_TYPE" == libtorch ]]; then
unzip "$pkg" -d /tmp
cd /tmp/libtorch
elif [[ "$PACKAGE_TYPE" == conda ]]; then
conda install -y "$pkg"
conda install -y "$pkg" --offline
else
pip install "$pkg" -v
pip install "$pkg" --no-index --no-dependencies -v
fi
# Test

View File

@ -0,0 +1,48 @@
#!/bin/bash
# Do NOT set -x
set -eu -o pipefail
set +x
export AWS_ACCESS_KEY_ID="${PYTORCH_BINARY_AWS_ACCESS_KEY_ID}"
export AWS_SECRET_ACCESS_KEY="${PYTORCH_BINARY_AWS_SECRET_ACCESS_KEY}"
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
# DO NOT TURN -x ON BEFORE THIS LINE
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
set -eux -o pipefail
source "/Users/distiller/project/env"
export "PATH=$workdir/miniconda/bin:$PATH"
# This gets set in binary_populate_env.sh, but let's have a sane default just in case
PIP_UPLOAD_FOLDER=${PIP_UPLOAD_FOLDER:-nightly}
# TODO: Combine CONDA_UPLOAD_CHANNEL and PIP_UPLOAD_FOLDER into one variable
# The only difference is the trailing slash
# Strip trailing slashes if there
CONDA_UPLOAD_CHANNEL=$(echo "${PIP_UPLOAD_FOLDER}" | sed 's:/*$::')
BACKUP_BUCKET="s3://pytorch-backup"
retry pip install -q awscli
pushd "$workdir/final_pkgs"
if [[ "$PACKAGE_TYPE" == conda ]]; then
retry conda install -yq anaconda-client
retry anaconda -t "${CONDA_PYTORCHBOT_TOKEN}" upload "$(ls)" -u "pytorch-${CONDA_UPLOAD_CHANNEL}" --label main --no-progress --force
# Fetch platform (eg. win-64, linux-64, etc.) from index file
# Because there's no actual conda command to read this
subdir=$(tar -xOf ./*.bz2 info/index.json | grep subdir | cut -d ':' -f2 | sed -e 's/[[:space:]]//' -e 's/"//g' -e 's/,//')
BACKUP_DIR="conda/${subdir}"
elif [[ "$PACKAGE_TYPE" == libtorch ]]; then
s3_dir="s3://pytorch/libtorch/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
for pkg in $(ls); do
retry aws s3 cp "$pkg" "$s3_dir" --acl public-read
done
BACKUP_DIR="libtorch/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
else
s3_dir="s3://pytorch/whl/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
retry aws s3 cp "$(ls)" "$s3_dir" --acl public-read
BACKUP_DIR="whl/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
fi
if [[ -n "${CIRCLE_TAG:-}" ]]; then
s3_dir="${BACKUP_BUCKET}/${CIRCLE_TAG}/${BACKUP_DIR}"
retry aws s3 cp --recursive . "$s3_dir"
fi

View File

@ -5,70 +5,53 @@ export TZ=UTC
tagged_version() {
# Grabs version from either the env variable CIRCLE_TAG
# or the pytorch git described version
if [[ "$OSTYPE" == "msys" && -z "${IS_GHA:-}" ]]; then
GIT_DIR="${workdir}/p/.git"
if [[ "$OSTYPE" == "msys" ]]; then
GIT_DESCRIBE="git --git-dir ${workdir}/p/.git describe"
else
GIT_DIR="${workdir}/pytorch/.git"
GIT_DESCRIBE="git --git-dir ${workdir}/pytorch/.git describe"
fi
GIT_DESCRIBE="git --git-dir ${GIT_DIR} describe --tags --match v[0-9]*.[0-9]*.[0-9]*"
if [[ -n "${CIRCLE_TAG:-}" ]]; then
echo "${CIRCLE_TAG}"
elif [[ ! -d "${GIT_DIR}" ]]; then
echo "Abort, abort! Git dir ${GIT_DIR} does not exist!"
kill $$
elif ${GIT_DESCRIBE} --exact >/dev/null; then
${GIT_DESCRIBE}
elif ${GIT_DESCRIBE} --exact --tags >/dev/null; then
${GIT_DESCRIBE} --tags
else
return 1
fi
}
# These are only relevant for CircleCI
# TODO: Remove these later once migrated fully to GHA
if [[ -z ${IS_GHA:-} ]]; then
# We need to write an envfile to persist these variables to following
# steps, but the location of the envfile depends on the circleci executor
if [[ "$(uname)" == Darwin ]]; then
# macos executor (builds and tests)
workdir="/Users/distiller/project"
elif [[ "$OSTYPE" == "msys" ]]; then
# windows executor (builds and tests)
workdir="/c/w"
elif [[ -d "/home/circleci/project" ]]; then
# machine executor (binary tests)
workdir="/home/circleci/project"
else
# docker executor (binary builds)
workdir="/"
fi
envfile="$workdir/env"
touch "$envfile"
chmod +x "$envfile"
# We need to write an envfile to persist these variables to following
# steps, but the location of the envfile depends on the circleci executor
if [[ "$(uname)" == Darwin ]]; then
# macos executor (builds and tests)
workdir="/Users/distiller/project"
elif [[ "$OSTYPE" == "msys" ]]; then
# windows executor (builds and tests)
workdir="/c/w"
elif [[ -d "/home/circleci/project" ]]; then
# machine executor (binary tests)
workdir="/home/circleci/project"
else
# docker executor (binary builds)
workdir="/"
fi
envfile="$workdir/env"
touch "$envfile"
chmod +x "$envfile"
# Parse the BUILD_ENVIRONMENT to package type, python, and cuda
configs=($BUILD_ENVIRONMENT)
export PACKAGE_TYPE="${configs[0]}"
export DESIRED_PYTHON="${configs[1]}"
export DESIRED_CUDA="${configs[2]}"
if [[ "${OSTYPE}" == "msys" ]]; then
export DESIRED_DEVTOOLSET=""
export LIBTORCH_CONFIG="${configs[3]:-}"
if [[ "$LIBTORCH_CONFIG" == 'debug' ]]; then
export DEBUG=1
fi
else
export DESIRED_DEVTOOLSET="${configs[3]:-}"
# Parse the BUILD_ENVIRONMENT to package type, python, and cuda
configs=($BUILD_ENVIRONMENT)
export PACKAGE_TYPE="${configs[0]}"
export DESIRED_PYTHON="${configs[1]}"
export DESIRED_CUDA="${configs[2]}"
if [[ "${BUILD_FOR_SYSTEM:-}" == "windows" ]]; then
export DESIRED_DEVTOOLSET=""
export LIBTORCH_CONFIG="${configs[3]:-}"
if [[ "$LIBTORCH_CONFIG" == 'debug' ]]; then
export DEBUG=1
fi
else
envfile=${BINARY_ENV_FILE:-/tmp/env}
if [[ -n "${PYTORCH_ROOT}" ]]; then
workdir=$(dirname "${PYTORCH_ROOT}")
else
# docker executor (binary builds)
workdir="/"
fi
export DESIRED_DEVTOOLSET="${configs[3]:-}"
fi
if [[ "$PACKAGE_TYPE" == 'libtorch' ]]; then
export BUILD_PYTHONLESS=1
fi
@ -79,25 +62,18 @@ if [[ -z "$DOCKER_IMAGE" ]]; then
if [[ "$PACKAGE_TYPE" == conda ]]; then
export DOCKER_IMAGE="pytorch/conda-cuda"
elif [[ "$DESIRED_CUDA" == cpu ]]; then
export DOCKER_IMAGE="pytorch/manylinux-cpu"
export DOCKER_IMAGE="pytorch/manylinux-cuda100"
else
export DOCKER_IMAGE="pytorch/manylinux-cuda${DESIRED_CUDA:2}"
fi
fi
USE_GOLD_LINKER="OFF"
# GOLD linker can not be used if CUPTI is statically linked into PyTorch, see https://github.com/pytorch/pytorch/issues/57744
if [[ ${DESIRED_CUDA} == "cpu" ]]; then
USE_GOLD_LINKER="ON"
fi
# Default to nightly, since that's where this normally uploads to
PIP_UPLOAD_FOLDER='nightly/'
# We put this here so that OVERRIDE_PACKAGE_VERSION below can read from it
export DATE="$(date -u +%Y%m%d)"
#TODO: We should be pulling semver version from the base version.txt
BASE_BUILD_VERSION="1.12.0.dev$DATE"
BASE_BUILD_VERSION="1.6.0.dev$DATE"
# Change BASE_BUILD_VERSION to git tag when on a git tag
# Use 'git -C' to make doubly sure we're in the correct directory for checking
# the git tag
@ -109,7 +85,7 @@ if tagged_version >/dev/null; then
# Turns tag v1.6.0-rc1 -> v1.6.0
BASE_BUILD_VERSION="$(tagged_version | sed -e 's/^v//' -e 's/-.*$//')"
fi
if [[ "$(uname)" == 'Darwin' ]] || [[ "$PACKAGE_TYPE" == conda ]]; then
if [[ "$(uname)" == 'Darwin' ]] || [[ "$DESIRED_CUDA" == "cu102" ]] || [[ "$PACKAGE_TYPE" == conda ]]; then
export PYTORCH_BUILD_VERSION="${BASE_BUILD_VERSION}"
else
export PYTORCH_BUILD_VERSION="${BASE_BUILD_VERSION}+$DESIRED_CUDA"
@ -124,14 +100,8 @@ if [[ "$PACKAGE_TYPE" == libtorch ]]; then
POSSIBLE_JAVA_HOMES+=(/usr/local)
POSSIBLE_JAVA_HOMES+=(/usr/lib/jvm/java-8-openjdk-amd64)
POSSIBLE_JAVA_HOMES+=(/Library/Java/JavaVirtualMachines/*.jdk/Contents/Home)
# Add the Windows-specific JNI path
POSSIBLE_JAVA_HOMES+=("$PWD/.circleci/windows-jni/")
for JH in "${POSSIBLE_JAVA_HOMES[@]}" ; do
if [[ -e "$JH/include/jni.h" ]] ; then
# Skip if we're not on Windows but haven't found a JAVA_HOME
if [[ "$JH" == "$PWD/.circleci/windows-jni/" && "$OSTYPE" != "msys" ]] ; then
break
fi
echo "Found jni.h under $JH"
JAVA_HOME="$JH"
BUILD_JNI=ON
@ -143,28 +113,24 @@ if [[ "$PACKAGE_TYPE" == libtorch ]]; then
fi
fi
cat >"$envfile" <<EOL
cat >>"$envfile" <<EOL
# =================== The following code will be executed inside Docker container ===================
export TZ=UTC
echo "Running on $(uname -a) at $(date)"
export PACKAGE_TYPE="$PACKAGE_TYPE"
export DESIRED_PYTHON="${DESIRED_PYTHON:-}"
export DESIRED_PYTHON="$DESIRED_PYTHON"
export DESIRED_CUDA="$DESIRED_CUDA"
export LIBTORCH_VARIANT="${LIBTORCH_VARIANT:-}"
export BUILD_PYTHONLESS="${BUILD_PYTHONLESS:-}"
if [[ "${OSTYPE}" == "msys" ]]; then
export DESIRED_DEVTOOLSET="$DESIRED_DEVTOOLSET"
if [[ "${BUILD_FOR_SYSTEM:-}" == "windows" ]]; then
export LIBTORCH_CONFIG="${LIBTORCH_CONFIG:-}"
if [[ "${LIBTORCH_CONFIG:-}" == 'debug' ]]; then
export DEBUG=1
fi
export DESIRED_DEVTOOLSET=""
else
export DESIRED_DEVTOOLSET="${DESIRED_DEVTOOLSET:-}"
export DEBUG="${DEBUG:-}"
fi
export DATE="$DATE"
export NIGHTLIES_DATE_PREAMBLE=1.12.0.dev
export NIGHTLIES_DATE_PREAMBLE=1.6.0.dev
export PYTORCH_BUILD_VERSION="$PYTORCH_BUILD_VERSION"
export PYTORCH_BUILD_NUMBER="$PYTORCH_BUILD_NUMBER"
export OVERRIDE_PACKAGE_VERSION="$PYTORCH_BUILD_VERSION"
@ -172,7 +138,6 @@ export OVERRIDE_PACKAGE_VERSION="$PYTORCH_BUILD_VERSION"
# TODO: We don't need this anymore IIUC
export TORCH_PACKAGE_NAME='torch'
export TORCH_CONDA_BUILD_FOLDER='pytorch-nightly'
export ANACONDA_USER='pytorch'
export USE_FBGEMM=1
export JAVA_HOME=$JAVA_HOME
@ -180,48 +145,25 @@ export BUILD_JNI=$BUILD_JNI
export PIP_UPLOAD_FOLDER="$PIP_UPLOAD_FOLDER"
export DOCKER_IMAGE="$DOCKER_IMAGE"
export workdir="$workdir"
export MAC_PACKAGE_WORK_DIR="$workdir"
if [[ "$OSTYPE" == "msys" ]]; then
export PYTORCH_ROOT="$workdir/p"
export BUILDER_ROOT="$workdir/b"
else
export PYTORCH_ROOT="$workdir/pytorch"
export BUILDER_ROOT="$workdir/builder"
fi
export MINICONDA_ROOT="$workdir/miniconda"
export PYTORCH_FINAL_PACKAGE_DIR="$workdir/final_pkgs"
export USE_GOLD_LINKER="${USE_GOLD_LINKER}"
export USE_GLOO_WITH_OPENSSL="ON"
export CIRCLE_TAG="${CIRCLE_TAG:-}"
export CIRCLE_SHA1="$CIRCLE_SHA1"
export CIRCLE_PR_NUMBER="${CIRCLE_PR_NUMBER:-}"
export CIRCLE_BRANCH="$CIRCLE_BRANCH"
# =================== The above code will be executed inside Docker container ===================
EOL
# nproc doesn't exist on darwin
if [[ "$(uname)" != Darwin ]]; then
# Because most Circle executors only have 20 CPUs, using more causes OOMs w/ Ninja and nvcc parallelization
MEMORY_LIMIT_MAX_JOBS=18
NUM_CPUS=$(( $(nproc) - 2 ))
# Defaults here for **binary** linux builds so they can be changed in one place
export MAX_JOBS=${MAX_JOBS:-$(( ${NUM_CPUS} > ${MEMORY_LIMIT_MAX_JOBS} ? ${MEMORY_LIMIT_MAX_JOBS} : ${NUM_CPUS} ))}
cat >>"$envfile" <<EOL
export MAX_JOBS="${MAX_JOBS}"
EOL
fi
if [[ -z "${IS_GHA:-}" ]]; then
cat >>"$envfile" <<EOL
export workdir="$workdir"
export MAC_PACKAGE_WORK_DIR="$workdir"
if [[ "$OSTYPE" == "msys" ]]; then
export PYTORCH_ROOT="$workdir/p"
export BUILDER_ROOT="$workdir/b"
else
export PYTORCH_ROOT="$workdir/pytorch"
export BUILDER_ROOT="$workdir/builder"
fi
export MINICONDA_ROOT="$workdir/miniconda"
export PYTORCH_FINAL_PACKAGE_DIR="$workdir/final_pkgs"
export CIRCLE_TAG="${CIRCLE_TAG:-}"
export CIRCLE_SHA1="$CIRCLE_SHA1"
export CIRCLE_PR_NUMBER="${CIRCLE_PR_NUMBER:-}"
export CIRCLE_BRANCH="$CIRCLE_BRANCH"
export CIRCLE_WORKFLOW_ID="$CIRCLE_WORKFLOW_ID"
EOL
fi
echo 'retry () {' >> "$envfile"
echo ' $* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)' >> "$envfile"
echo '}' >> "$envfile"
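
The `retry` function appended to the env file re-runs a command with increasing sleeps, but it expands arguments with an unquoted `$*`, which re-splits any argument containing spaces. A minimal sketch of an equivalent loop that preserves arguments with `"$@"` follows. The RETRY_DELAYS override is a hypothetical addition for testing and is not part of the original.

```shell
# Sketch of a retry helper that keeps argument boundaries intact.
# RETRY_DELAYS is a hypothetical override; the default mirrors the
# 1/2/4/8 second backoff used in the env file.
retry() {
  local delay
  for delay in ${RETRY_DELAYS:-0 1 2 4 8}; do
    if [ "$delay" -gt 0 ]; then
      sleep "$delay"
    fi
    # "$@" passes each original argument as one word.
    "$@" && return 0
  done
  return 1
}
```

For example, `retry test "a b" = "a b"` succeeds here, whereas with `$*` the same call would run `test a b = a b` and fail with a syntax error on every attempt.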

View File

@ -19,7 +19,7 @@ chmod +x /home/circleci/project/ci_test_script.sh
VOLUME_MOUNTS="-v /home/circleci/project/:/circleci_stuff -v /home/circleci/project/final_pkgs:/final_pkgs -v ${PYTORCH_ROOT}:/pytorch -v ${BUILDER_ROOT}:/builder"
# Run the docker
if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --gpus all ${VOLUME_MOUNTS} -t -d "${DOCKER_IMAGE}")
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --runtime=nvidia ${VOLUME_MOUNTS} -t -d "${DOCKER_IMAGE}")
else
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined ${VOLUME_MOUNTS} -t -d "${DOCKER_IMAGE}")
fi

View File

@ -1,102 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail
PACKAGE_TYPE=${PACKAGE_TYPE:-conda}
PKG_DIR=${PKG_DIR:-/tmp/workspace/final_pkgs}
# Designates whether to submit as a release candidate or a nightly build
# Value should be `test` when uploading release candidates
# currently set within `designate_upload_channel`
UPLOAD_CHANNEL=${UPLOAD_CHANNEL:-nightly}
# Designates what subfolder to put packages into
UPLOAD_SUBFOLDER=${UPLOAD_SUBFOLDER:-cpu}
UPLOAD_BUCKET="s3://pytorch"
BACKUP_BUCKET="s3://pytorch-backup"
DRY_RUN=${DRY_RUN:-enabled}
# Don't actually do work unless explicit
ANACONDA="true anaconda"
AWS_S3_CP="aws s3 cp --dryrun"
if [[ "${DRY_RUN}" = "disabled" ]]; then
ANACONDA="anaconda"
AWS_S3_CP="aws s3 cp"
fi
do_backup() {
local backup_dir
backup_dir=$1
(
pushd /tmp/workspace
set -x
${AWS_S3_CP} --recursive . "${BACKUP_BUCKET}/${CIRCLE_TAG}/${backup_dir}/"
)
}
conda_upload() {
(
set -x
${ANACONDA} \
upload \
${PKG_DIR}/*.tar.bz2 \
-u "pytorch-${UPLOAD_CHANNEL}" \
--label main \
--no-progress \
--force
)
}
s3_upload() {
local extension
local pkg_type
extension="$1"
pkg_type="$2"
s3_dir="${UPLOAD_BUCKET}/${pkg_type}/${UPLOAD_CHANNEL}/${UPLOAD_SUBFOLDER}/"
(
for pkg in ${PKG_DIR}/*.${extension}; do
(
set -x
${AWS_S3_CP} --no-progress --acl public-read "${pkg}" "${s3_dir}"
)
done
)
}
# Install dependencies (should be a no-op if previously installed)
conda install -yq anaconda-client
pip install -q awscli
case "${PACKAGE_TYPE}" in
conda)
conda_upload
# Fetch platform (eg. win-64, linux-64, etc.) from index file
# Because there's no actual conda command to read this
subdir=$(\
tar -xOf ${PKG_DIR}/*.bz2 info/index.json \
| grep subdir \
| cut -d ':' -f2 \
| sed -e 's/[[:space:]]//' -e 's/"//g' -e 's/,//' \
)
BACKUP_DIR="conda/${subdir}"
;;
libtorch)
s3_upload "zip" "libtorch"
BACKUP_DIR="libtorch/${UPLOAD_CHANNEL}/${UPLOAD_SUBFOLDER}"
;;
# wheel can either refer to wheel/manywheel
*wheel)
s3_upload "whl" "whl"
BACKUP_DIR="whl/${UPLOAD_CHANNEL}/${UPLOAD_SUBFOLDER}"
;;
*)
echo "ERROR: unknown package type: ${PACKAGE_TYPE}"
exit 1
;;
esac
# CIRCLE_TAG is defined by upstream circleci,
# this can be changed to recognize tagged versions
if [[ -n "${CIRCLE_TAG:-}" ]]; then
do_backup "${BACKUP_DIR}"
fi
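
The removed upload script defaults to a dry run by prefixing commands with `true` (for anaconda) or by adding `--dryrun` (for `aws s3 cp`). The sketch below shows a more general variant of the same idea: a wrapper that prints the command unless DRY_RUN is explicitly disabled. The `run` function is hypothetical and was not in the script.

```shell
# Hypothetical wrapper (not part of the repository).
# Commands only execute when DRY_RUN=disabled, mirroring the
# "don't actually do work unless explicit" default of the script.
DRY_RUN=${DRY_RUN:-enabled}

run() {
  if [ "$DRY_RUN" = "disabled" ]; then
    "$@"
  else
    echo "[dry-run] $*"
  fi
}
```

A call such as `run aws s3 cp "$pkg" "$s3_dir"` would then log the intended upload by default and perform it only when the job sets `DRY_RUN=disabled`.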

View File

@ -1,68 +1,29 @@
#!/bin/bash
set -eux -o pipefail
source "${BINARY_ENV_FILE:-/c/w/env}"
source "/c/w/env"
mkdir -p "$PYTORCH_FINAL_PACKAGE_DIR"
export CUDA_VERSION="${DESIRED_CUDA/cu/}"
export USE_SCCACHE=1
export SCCACHE_BUCKET=ossci-compiler-cache-windows
export SCCACHE_IGNORE_SERVER_IO_ERROR=1
export VC_YEAR=2019
export NIGHTLIES_PYTORCH_ROOT="$PYTORCH_ROOT"
if [[ "${DESIRED_CUDA}" == *"cu11"* ]]; then
export BUILD_SPLIT_CUDA=ON
if [[ "$CUDA_VERSION" == "92" || "$CUDA_VERSION" == "100" ]]; then
export VC_YEAR=2017
else
export VC_YEAR=2019
fi
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}
set -x
echo "Free Space for CUDA DEBUG BUILD"
if [[ "${CIRCLECI:-}" == 'true' ]]; then
export NIGHTLIES_PYTORCH_ROOT="$PYTORCH_ROOT"
if [[ -d "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community" ]]; then
rm -rf "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community"
fi
if [[ -d "C:\\Program Files (x86)\\Microsoft Visual Studio 14.0" ]]; then
rm -rf "C:\\Program Files (x86)\\Microsoft Visual Studio 14.0"
fi
if [[ -d "C:\\Program Files (x86)\\Microsoft.NET" ]]; then
rm -rf "C:\\Program Files (x86)\\Microsoft.NET"
fi
if [[ -d "C:\\Program Files\\dotnet" ]]; then
rm -rf "C:\\Program Files\\dotnet"
fi
if [[ -d "C:\\Program Files (x86)\\dotnet" ]]; then
rm -rf "C:\\Program Files (x86)\\dotnet"
fi
if [[ -d "C:\\Program Files (x86)\\Microsoft SQL Server" ]]; then
rm -rf "C:\\Program Files (x86)\\Microsoft SQL Server"
fi
if [[ -d "C:\\Program Files (x86)\\Xamarin" ]]; then
rm -rf "C:\\Program Files (x86)\\Xamarin"
fi
if [[ -d "C:\\Program Files (x86)\\Google" ]]; then
rm -rf "C:\\Program Files (x86)\\Google"
fi
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}
set -x
if [[ -d "C:\\ProgramData\\Microsoft\\VisualStudio\\Packages\\_Instances" ]]; then
mv "C:\\ProgramData\\Microsoft\\VisualStudio\\Packages\\_Instances" .
rm -rf "C:\\ProgramData\\Microsoft\\VisualStudio\\Packages"
mkdir -p "C:\\ProgramData\\Microsoft\\VisualStudio\\Packages"
mv _Instances "C:\\ProgramData\\Microsoft\\VisualStudio\\Packages"
fi
if [[ -d "C:\\Microsoft" ]]; then
# don't use quotes here
rm -rf /c/Microsoft/AndroidNDK*
fi
if [[ "$CIRCLECI" == 'true' && -d "C:\\ProgramData\\Microsoft\\VisualStudio\\Packages\\_Instances" ]]; then
mv "C:\\ProgramData\\Microsoft\\VisualStudio\\Packages\\_Instances" .
rm -rf "C:\\ProgramData\\Microsoft\\VisualStudio\\Packages"
mkdir -p "C:\\ProgramData\\Microsoft\\VisualStudio\\Packages"
mv _Instances "C:\\ProgramData\\Microsoft\\VisualStudio\\Packages"
fi
echo "Free space on filesystem before build:"
@ -70,10 +31,9 @@ df -h
pushd "$BUILDER_ROOT"
if [[ "$PACKAGE_TYPE" == 'conda' ]]; then
./windows/internal/build_conda.bat
./windows/internal/build_conda.bat
elif [[ "$PACKAGE_TYPE" == 'wheel' || "$PACKAGE_TYPE" == 'libtorch' ]]; then
export NIGHTLIES_PYTORCH_ROOT="$PYTORCH_ROOT"
./windows/internal/build_wheels.bat
./windows/internal/build_wheels.bat
fi
echo "Free space on filesystem after build:"

View File

@ -1,10 +1,16 @@
#!/bin/bash
set -eux -o pipefail
source "${BINARY_ENV_FILE:-/c/w/env}"
source "/c/w/env"
export CUDA_VERSION="${DESIRED_CUDA/cu/}"
export VC_YEAR=2019
export VC_YEAR=2017
if [[ "$CUDA_VERSION" == "92" || "$CUDA_VERSION" == "100" ]]; then
export VC_YEAR=2017
else
export VC_YEAR=2019
fi
pushd "$BUILDER_ROOT"
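
The conditional form shown in this diff picks the Visual Studio year from the CUDA version: 2017 for CUDA 9.2 and 10.0, otherwise 2019. A minimal sketch of that selection as a function follows; the name `vc_year_for_cuda` is hypothetical.

```shell
# Hypothetical helper (not part of the repository).
# Maps a DESIRED_CUDA value (cu92, cu100, cu102, cpu, ...) to a VC year,
# following the conditional shown in the diff.
vc_year_for_cuda() {
  local cuda_version=${1#cu}   # strip a leading "cu" if present
  case "$cuda_version" in
    92|100) echo 2017 ;;
    *)      echo 2019 ;;
  esac
}
```

Both Windows scripts could then share one definition, for instance `export VC_YEAR="$(vc_year_for_cuda "$DESIRED_CUDA")"`, instead of repeating the conditional.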

View File

@ -0,0 +1,47 @@
#!/bin/bash
set -eu -o pipefail
set +x
declare -x "AWS_ACCESS_KEY_ID=${PYTORCH_BINARY_AWS_ACCESS_KEY_ID}"
declare -x "AWS_SECRET_ACCESS_KEY=${PYTORCH_BINARY_AWS_SECRET_ACCESS_KEY}"
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
# DO NOT TURN -x ON BEFORE THIS LINE
#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!#!
set -eux -o pipefail
source "/env"
# This gets set in binary_populate_env.sh, but let's have a sane default just in case
PIP_UPLOAD_FOLDER=${PIP_UPLOAD_FOLDER:-nightly/}
# TODO: Combine CONDA_UPLOAD_CHANNEL and PIP_UPLOAD_FOLDER into one variable
# The only difference is the trailing slash
# Strip trailing slashes if there
CONDA_UPLOAD_CHANNEL=$(echo "${PIP_UPLOAD_FOLDER}" | sed 's:/*$::')
BACKUP_BUCKET="s3://pytorch-backup"
retry pip install -q awscli
pushd /root/workspace/final_pkgs
# Upload the package to the final location
if [[ "$PACKAGE_TYPE" == conda ]]; then
retry conda install -yq anaconda-client
retry anaconda -t "${CONDA_PYTORCHBOT_TOKEN}" upload "$(ls)" -u "pytorch-${CONDA_UPLOAD_CHANNEL}" --label main --no-progress --force
# Fetch platform (eg. win-64, linux-64, etc.) from index file
# Because there's no actual conda command to read this
subdir=$(tar -xOf ./*.bz2 info/index.json | grep subdir | cut -d ':' -f2 | sed -e 's/[[:space:]]//' -e 's/"//g' -e 's/,//')
BACKUP_DIR="conda/${subdir}"
elif [[ "$PACKAGE_TYPE" == libtorch ]]; then
s3_dir="s3://pytorch/libtorch/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
for pkg in $(ls); do
retry aws s3 cp "$pkg" "$s3_dir" --acl public-read
done
BACKUP_DIR="libtorch/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
else
s3_dir="s3://pytorch/whl/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
retry aws s3 cp "$(ls)" "$s3_dir" --acl public-read
BACKUP_DIR="whl/${PIP_UPLOAD_FOLDER}${DESIRED_CUDA}/"
fi
if [[ -n "${CIRCLE_TAG:-}" ]]; then
s3_dir="${BACKUP_BUCKET}/${CIRCLE_TAG}/${BACKUP_DIR}"
retry aws s3 cp --recursive . "$s3_dir"
fi

View File

@ -1,44 +1,15 @@
#!/usr/bin/env bash
set -eux -o pipefail
env
echo "BUILD_ENVIRONMENT:$BUILD_ENVIRONMENT"
export ANDROID_NDK_HOME=/opt/ndk
export ANDROID_NDK=/opt/ndk
export ANDROID_HOME=/opt/android/sdk
# Must be in sync with GRADLE_VERSION in docker image for android
# https://github.com/pietern/pytorch-dockerfiles/blob/master/build.sh#L155
export GRADLE_VERSION=6.8.3
export GRADLE_VERSION=4.10.3
export GRADLE_HOME=/opt/gradle/gradle-$GRADLE_VERSION
export GRADLE_PATH=$GRADLE_HOME/bin/gradle
# touch gradle cache files to prevent expiration
while IFS= read -r -d '' file
do
touch "$file" || true
done < <(find /var/lib/jenkins/.gradle -type f -print0)
export GRADLE_LOCAL_PROPERTIES=~/workspace/android/local.properties
rm -f $GRADLE_LOCAL_PROPERTIES
echo "sdk.dir=/opt/android/sdk" >> $GRADLE_LOCAL_PROPERTIES
echo "ndk.dir=/opt/ndk" >> $GRADLE_LOCAL_PROPERTIES
echo "cmake.dir=/usr/local" >> $GRADLE_LOCAL_PROPERTIES
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
# Run custom build script
if [[ "${BUILD_ENVIRONMENT}" == *-gradle-custom-build* ]]; then
# Install torch & torchvision - used to download & dump used ops from test model.
retry pip install torch torchvision --progress-bar off
exec "$(dirname "${BASH_SOURCE[0]}")/../../android/build_test_app_custom.sh" armeabi-v7a
fi
# Run default build
BUILD_ANDROID_INCLUDE_DIR_x86=~/workspace/build_android/install/include
BUILD_ANDROID_LIB_DIR_x86=~/workspace/build_android/install/lib
@ -73,6 +44,9 @@ ln -s ${BUILD_ANDROID_INCLUDE_DIR_arm_v8a} ${JNI_INCLUDE_DIR}/arm64-v8a
ln -s ${BUILD_ANDROID_LIB_DIR_arm_v8a} ${JNI_LIBS_DIR}/arm64-v8a
fi
env
echo "BUILD_ENVIRONMENT:$BUILD_ENVIRONMENT"
GRADLE_PARAMS="-p android assembleRelease --debug --stacktrace"
if [[ "${BUILD_ENVIRONMENT}" == *-gradle-build-only-x86_32* ]]; then
GRADLE_PARAMS+=" -PABI_FILTERS=x86"
@ -82,6 +56,20 @@ if [ -n "{GRADLE_OFFLINE:-}" ]; then
GRADLE_PARAMS+=" --offline"
fi
# touch gradle cache files to prevent expiration
while IFS= read -r -d '' file
do
touch "$file" || true
done < <(find /var/lib/jenkins/.gradle -type f -print0)
env
export GRADLE_LOCAL_PROPERTIES=~/workspace/android/local.properties
rm -f $GRADLE_LOCAL_PROPERTIES
echo "sdk.dir=/opt/android/sdk" >> $GRADLE_LOCAL_PROPERTIES
echo "ndk.dir=/opt/ndk" >> $GRADLE_LOCAL_PROPERTIES
echo "cmake.dir=/usr/local" >> $GRADLE_LOCAL_PROPERTIES
$GRADLE_PATH $GRADLE_PARAMS
find . -type f -name "*.a" -exec ls -lh {} \;
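
Note on the `retry` helper in the script above: it runs the command up to five times, sleeping 1, 2, 4 and 8 seconds between attempts. A minimal Python sketch of the same backoff idea follows; the names `retry`, `action`, `delays` and `sleep` are illustrative assumptions and are not part of the build scripts.

```python
import time


def retry(action, delays=(1, 2, 4, 8), sleep=time.sleep):
    """Call action(); after each failure wait the next delay and try again.

    With the default delays this makes at most five attempts, mirroring
    the shell helper. `sleep` is injectable so the sketch can be tested
    without actually waiting.
    """
    last_error = None
    for delay in (0,) + tuple(delays):
        if delay:
            sleep(delay)
        try:
            return action()
        except Exception as error:  # any failure triggers another attempt
            last_error = error
    raise last_error
```

Unlike the shell version, which re-expands `$*` on every attempt, this sketch takes a callable, so arguments containing spaces are not split again.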


@ -10,36 +10,33 @@ pt_checkout="/var/lib/jenkins/workspace"
# Since we're cat-ing this file, we need to escape all $'s
echo "cpp_doc_push_script.sh: Invoked with $*"
# for statements like ${1:-${DOCS_INSTALL_PATH:-docs/}}
# the order of operations goes:
# 1. Check if there's an argument $1
# 2. If no argument check for environment var DOCS_INSTALL_PATH
# 3. If no environment var fall back to default 'docs/'
# NOTE: It might seem weird to gather the second argument before gathering the first argument
# but since DOCS_INSTALL_PATH can be derived from DOCS_VERSION it's probably better to
# try and gather it first, just so we don't potentially break people who rely on this script
# Argument 2: What version of the Python API docs we are building.
version="${2:-${DOCS_VERSION:-master}}"
if [ -z "$version" ]; then
echo "error: cpp_doc_push_script.sh: version (arg2) not specified"
exit 1
fi
# Argument 1: Where to copy the built documentation for Python API to
# (pytorch.github.io/$install_path)
install_path="${1:-${DOCS_INSTALL_PATH:-docs/${DOCS_VERSION}}}"
install_path="$1"
if [ -z "$install_path" ]; then
echo "error: cpp_doc_push_script.sh: install_path (arg1) not specified"
exit 1
fi
is_main_doc=false
if [ "$version" == "master" ]; then
is_main_doc=true
# Argument 2: What version of the Python API docs we are building.
version="$2"
if [ -z "$version" ]; then
echo "error: cpp_doc_push_script.sh: version (arg2) not specified"
exit 1
fi
echo "install_path: $install_path version: $version"
is_master_doc=false
if [ "$version" == "master" ]; then
is_master_doc=true
fi
# Argument 3: (optional) If present, we will NOT do any pushing. Used for testing.
dry_run=false
if [ "$3" != "" ]; then
dry_run=true
fi
echo "install_path: $install_path version: $version dry_run: $dry_run"
# ======================== Building PyTorch C++ API Docs ========================
@ -56,21 +53,31 @@ sudo apt-get -y install doxygen
# Generate ATen files
pushd "${pt_checkout}"
pip install -r requirements.txt
time python -m torchgen.gen \
time python aten/src/ATen/gen.py \
-s aten/src/ATen \
-d build/aten/src/ATen
-d build/aten/src/ATen \
aten/src/ATen/Declarations.cwrap \
aten/src/THCUNN/generic/THCUNN.h \
aten/src/ATen/nn.yaml \
aten/src/ATen/native/native_functions.yaml
# Copy some required files
cp aten/src/ATen/common_with_cwrap.py tools/shared/cwrap_common.py
cp torch/_utils_internal.py tools/shared
# Generate PyTorch files
time python tools/setup_helpers/generate_code.py \
--native-functions-path aten/src/ATen/native/native_functions.yaml \
--tags-path aten/src/ATen/native/tags.yaml
--declarations-path build/aten/src/ATen/Declarations.yaml \
--nn-path aten/src/
# Build the docs
pushd docs/cpp
pip install -r requirements.txt
pip install breathe==4.13.0 bs4 lxml six
pip install --no-cache-dir -e "git+https://github.com/pytorch/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme"
pip install "exhale>=0.2.1"
pip install sphinx==2.4.4
# Uncomment once it is fixed
# pip install -r requirements.txt
time make VERBOSE=1 html -j
popd
@ -96,11 +103,23 @@ git status
git config user.email "soumith+bot@pytorch.org"
git config user.name "pytorchbot"
# If there aren't changes, don't make a commit; push is no-op
git commit -m "Generate C++ docs from pytorch/pytorch@${GITHUB_SHA}" || true
git commit -m "Automatic sync on $(date)" || true
git status
if [[ "${WITH_PUSH:-}" == true ]]; then
git push -u origin
if [ "$dry_run" = false ]; then
echo "Pushing to https://github.com/pytorch/cppdocs"
set +x
/usr/bin/expect <<DONE
spawn git push -u origin master
expect "Username*"
send "pytorchbot\n"
expect "Password*"
send "$::env(GITHUB_PYTORCHBOT_TOKEN)\n"
expect eof
DONE
set -x
else
echo "Skipping push due to dry_run"
fi
popd
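
Note on the `${1:-${DOCS_INSTALL_PATH:-docs/${DOCS_VERSION}}}` lines above: the lookup order is positional argument, then environment variable, then default, and an empty string counts as missing. A rough Python sketch of that resolution order, assuming hypothetical helper names `resolve` and `resolve_doc_settings` that do not exist in the repository:

```python
def resolve(arg, env, env_name, default):
    """Mimic "${arg:-${ENV_NAME:-default}}": empty strings count as missing."""
    if arg:
        return arg
    value = env.get(env_name, "")
    return value if value else default


def resolve_doc_settings(argv, env):
    # argv: positional arguments after the script name (assumed layout).
    arg1 = argv[0] if len(argv) > 0 else ""
    arg2 = argv[1] if len(argv) > 1 else ""
    version = resolve(arg2, env, "DOCS_VERSION", "master")
    # The default path is built from the DOCS_VERSION variable itself,
    # not from the resolved version, matching the shell expression.
    install_path = resolve(
        arg1, env, "DOCS_INSTALL_PATH", "docs/" + env.get("DOCS_VERSION", "")
    )
    return install_path, version
```

With no arguments and no environment variables this gives `("docs/", "master")`, which is why the script's `-z` checks cannot fire once the defaults are in place.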


@ -1,8 +0,0 @@
set "DRIVER_DOWNLOAD_LINK=https://s3.amazonaws.com/ossci-windows/452.39-data-center-tesla-desktop-win10-64bit-international.exe"
curl --retry 3 -kL %DRIVER_DOWNLOAD_LINK% --output 452.39-data-center-tesla-desktop-win10-64bit-international.exe
if errorlevel 1 exit /b 1
start /wait 452.39-data-center-tesla-desktop-win10-64bit-international.exe -s -noreboot
if errorlevel 1 exit /b 1
del 452.39-data-center-tesla-desktop-win10-64bit-international.exe || ver > NUL


@ -5,7 +5,7 @@ set -eu -o pipefail
export ANDROID_NDK_HOME=/opt/ndk
export ANDROID_HOME=/opt/android/sdk
export GRADLE_VERSION=6.8.3
export GRADLE_VERSION=4.10.3
export GRADLE_HOME=/opt/gradle/gradle-$GRADLE_VERSION
export GRADLE_PATH=$GRADLE_HOME/bin/gradle
@ -35,9 +35,7 @@ else
echo "ndk.dir=/opt/ndk" >> $GRADLE_LOCAL_PROPERTIES
echo "SONATYPE_NEXUS_USERNAME=${SONATYPE_NEXUS_USERNAME}" >> $GRADLE_PROPERTIES
echo "mavenCentralRepositoryUsername=${SONATYPE_NEXUS_USERNAME}" >> $GRADLE_PROPERTIES
echo "SONATYPE_NEXUS_PASSWORD=${SONATYPE_NEXUS_PASSWORD}" >> $GRADLE_PROPERTIES
echo "mavenCentralRepositoryPassword=${SONATYPE_NEXUS_PASSWORD}" >> $GRADLE_PROPERTIES
echo "signing.keyId=${ANDROID_SIGN_KEY}" >> $GRADLE_PROPERTIES
echo "signing.password=${ANDROID_SIGN_PASS}" >> $GRADLE_PROPERTIES


@ -7,72 +7,46 @@ sudo apt-get -y install expect-dev
# This is where the local pytorch install in the docker image is located
pt_checkout="/var/lib/jenkins/workspace"
source "$pt_checkout/.jenkins/pytorch/common_utils.sh"
echo "python_doc_push_script.sh: Invoked with $*"
set -ex
# for statements like ${1:-${DOCS_INSTALL_PATH:-docs/}}
# the order of operations goes:
# 1. Check if there's an argument $1
# 2. If no argument check for environment var DOCS_INSTALL_PATH
# 3. If no environment var fall back to default 'docs/'
# NOTE: It might seem weird to gather the second argument before gathering the first argument
# but since DOCS_INSTALL_PATH can be derived from DOCS_VERSION it's probably better to
# try and gather it first, just so we don't potentially break people who rely on this script
# Argument 2: What version of the docs we are building.
version="${2:-${DOCS_VERSION:-master}}"
if [ -z "$version" ]; then
echo "error: python_doc_push_script.sh: version (arg2) not specified"
exit 1
fi
# Argument 1: Where to copy the built documentation to
# (pytorch.github.io/$install_path)
install_path="${1:-${DOCS_INSTALL_PATH:-docs/${DOCS_VERSION}}}"
install_path="$1"
if [ -z "$install_path" ]; then
echo "error: python_doc_push_script.sh: install_path (arg1) not specified"
exit 1
fi
is_main_doc=false
# Argument 2: What version of the docs we are building.
version="$2"
if [ -z "$version" ]; then
echo "error: python_doc_push_script.sh: version (arg2) not specified"
exit 1
fi
is_master_doc=false
if [ "$version" == "master" ]; then
is_main_doc=true
is_master_doc=true
fi
# Argument 3: The branch to push to. Usually is "site"
branch="${3:-${DOCS_BRANCH:-site}}"
branch="$3"
if [ -z "$branch" ]; then
echo "error: python_doc_push_script.sh: branch (arg3) not specified"
exit 1
fi
echo "install_path: $install_path version: $version"
# Argument 4: (optional) If present, we will NOT do any pushing. Used for testing.
dry_run=false
if [ "$4" != "" ]; then
dry_run=true
fi
echo "install_path: $install_path version: $version dry_run: $dry_run"
build_docs () {
set +e
set -o pipefail
make $1 2>&1 | tee /tmp/docs_build.txt
code=$?
if [ $code -ne 0 ]; then
set +x
echo =========================
grep "WARNING:" /tmp/docs_build.txt
echo =========================
echo Docs build failed. If the failure is not clear, scan back in the log
echo for any WARNINGS or for the line "build finished with problems"
echo "(tried to echo the WARNINGS above the ==== line)"
echo =========================
fi
set -ex
return $code
}
git clone https://github.com/pytorch/pytorch.github.io -b $branch --depth 1
git clone https://github.com/pytorch/pytorch.github.io -b $branch
pushd pytorch.github.io
export LC_ALL=C
@ -80,15 +54,26 @@ export PATH=/opt/conda/bin:$PATH
rm -rf pytorch || true
# Install TensorBoard in python 3 so torch.utils.tensorboard classes render
pip install -q https://s3.amazonaws.com/ossci-linux/wheels/tensorboard-1.14.0a0-py3-none-any.whl
# Get all the documentation sources, put them in one place
pushd "$pt_checkout"
git clone https://github.com/pytorch/vision
pushd vision
conda install -q pillow
time python setup.py install
popd
pushd docs
rm -rf source/torchvision
cp -a ../vision/docs/source source/torchvision
# Build the docs
pip -q install -r requirements.txt
if [ "$is_main_doc" = true ]; then
build_docs html
[ $? -eq 0 ] || exit $?
pip -q install -r requirements.txt || true
if [ "$is_master_doc" = true ]; then
# TODO: fix gh-38011 then enable this which changes warnings into errors
# export SPHINXOPTS="-WT --keep-going"
make html
make coverage
# Now we have the coverage report, we need to make sure it is empty.
# Count the number of lines in the file and turn that number into a variable
@ -109,9 +94,8 @@ if [ "$is_main_doc" = true ]; then
exit 1
fi
else
# skip coverage, format for stable or tags
build_docs html-stable
[ $? -eq 0 ] || exit $?
# Don't fail the build on coverage problems
make html-stable
fi
# Move them into the docs repo
@ -120,6 +104,14 @@ popd
git rm -rf "$install_path" || true
mv "$pt_checkout/docs/build/html" "$install_path"
# Add the version handler by search and replace.
# XXX: Consider moving this to the docs Makefile or site build
if [ "$is_master_doc" = true ]; then
find "$install_path" -name "*.html" -print0 | xargs -0 perl -pi -w -e "s@master\s+\((\d\.\d\.[A-Fa-f0-9]+\+[A-Fa-f0-9]+)\s+\)@<a href='http://pytorch.org/docs/versions.html'>\1 \&#x25BC</a>@g"
else
find "$install_path" -name "*.html" -print0 | xargs -0 perl -pi -w -e "s@master\s+\((\d\.\d\.[A-Fa-f0-9]+\+[A-Fa-f0-9]+)\s+\)@<a href='http://pytorch.org/docs/versions.html'>$version \&#x25BC</a>@g"
fi
# Prevent Google from indexing $install_path/_modules. This folder contains
# generated source files.
# NB: the following only works on gnu sed. The sed shipped with mac os is different.
@ -131,11 +123,23 @@ git status
git config user.email "soumith+bot@pytorch.org"
git config user.name "pytorchbot"
# If there aren't changes, don't make a commit; push is no-op
git commit -m "Generate Python docs from pytorch/pytorch@${GITHUB_SHA}" || true
git commit -m "auto-generating sphinx docs" || true
git status
if [[ "${WITH_PUSH:-}" == true ]]; then
git push -u origin "${branch}"
if [ "$dry_run" = false ]; then
echo "Pushing to pytorch.github.io:$branch"
set +x
/usr/bin/expect <<DONE
spawn git push origin $branch
expect "Username*"
send "pytorchbot\n"
expect "Password*"
send "$::env(GITHUB_PYTORCHBOT_TOKEN)\n"
expect eof
DONE
set -x
else
echo "Skipping push due to dry_run"
fi
popd
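
Note on the "version handler" search and replace above: the perl one-liner rewrites the `master (<version> )` label in every generated HTML page into a link to the versions page. The same substitution can be sketched in Python as below; the function name `add_version_link` and the sample input in the usage note are assumptions for illustration, while the pattern and URL are copied from the script.

```python
import re

# Pattern and URL taken from the perl command in the script.
VERSION_PATTERN = re.compile(
    r"master\s+\((\d\.\d\.[A-Fa-f0-9]+\+[A-Fa-f0-9]+)\s+\)"
)
VERSIONS_URL = "http://pytorch.org/docs/versions.html"


def add_version_link(html, version=None):
    """Replace the version label with a link.

    version=None keeps the captured version (the master-docs branch of
    the script); otherwise the given version string is shown instead.
    """
    def replace(match):
        label = match.group(1) if version is None else version
        return "<a href='{}'>{} &#x25BC</a>".format(VERSIONS_URL, label)

    return VERSION_PATTERN.sub(replace, html)
```

For example, `add_version_link("master (1.6.0a0+29fe90e )", "1.6.0")` yields a link labelled `1.6.0 &#x25BC`. A replacement function is used so that a version string containing backslashes is never interpreted as a regex escape.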


@ -1,17 +1,20 @@
#!/usr/bin/env bash
set -ex -o pipefail
# Set up NVIDIA docker repo
curl -s -L --retry 3 https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
echo "deb https://nvidia.github.io/libnvidia-container/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
echo "deb https://nvidia.github.io/nvidia-container-runtime/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
echo "deb https://nvidia.github.io/nvidia-docker/ubuntu16.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
# Remove unnecessary sources
sudo rm -f /etc/apt/sources.list.d/google-chrome.list
sudo rm -f /etc/apt/heroku.list
sudo rm -f /etc/apt/openjdk-r-ubuntu-ppa-xenial.list
sudo rm -f /etc/apt/partner.list
# To increase the network reliability, let apt decide which mirror is best to use
sudo sed -i -e 's/http:\/\/.*archive/mirror:\/\/mirrors/' -e 's/\/ubuntu\//\/mirrors.txt/' /etc/apt/sources.list
retry () {
$* || $* || $* || $* || $*
$* || $* || $* || $* || $*
}
# Method adapted from here: https://askubuntu.com/questions/875213/apt-get-to-retry-downloading
@ -19,85 +22,70 @@ retry () {
# This is better than retrying the whole apt-get command
echo "APT::Acquire::Retries \"3\";" | sudo tee /etc/apt/apt.conf.d/80-retries
retry sudo apt-get update -qq
sudo apt-get -y update
sudo apt-get -y remove linux-image-generic linux-headers-generic linux-generic docker-ce
# WARNING: Docker version is hardcoded here; you must update the
# version number below for docker-ce and nvidia-docker2 to get newer
# versions of Docker. We hardcode these numbers because we kept
# getting broken CI when Docker would update their docker version,
# and nvidia-docker2 would be out of date for a day until they
# released a newer version of their package.
#
# How to figure out what the correct versions of these packages are?
# My preferred method is to start a Docker instance of the correct
# Ubuntu version (e.g., docker run -it ubuntu:16.04) and then ask
# apt what the packages you need are. Note that the CircleCI image
# comes with Docker.
#
# Using 'retry' here as belt-and-suspenders even though we are
# presumably retrying at the single-package level via the
# apt.conf.d/80-retries technique.
retry sudo apt-get -y install \
linux-headers-$(uname -r) \
linux-image-generic \
moreutils \
docker-ce=5:18.09.4~3-0~ubuntu-xenial \
nvidia-container-runtime=2.0.0+docker18.09.4-1 \
nvidia-docker2=2.0.3+docker18.09.4-1 \
expect-dev
echo "== DOCKER VERSION =="
docker version
sudo pkill -SIGHUP dockerd
if ! command -v aws >/dev/null; then
retry sudo pip3 -q install awscli==1.19.64
fi
retry sudo pip -q install awscli==1.16.35
if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
DRIVER_FN="NVIDIA-Linux-x86_64-510.60.02.run"
DRIVER_FN="NVIDIA-Linux-x86_64-440.59.run"
wget "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN"
sudo /bin/bash "$DRIVER_FN" -s --no-drm || (sudo cat /var/log/nvidia-installer.log && false)
nvidia-smi
# Taken directly from https://github.com/NVIDIA/nvidia-docker
# Add the package repositories
distribution=$(. /etc/os-release;echo "$ID$VERSION_ID")
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L "https://nvidia.github.io/nvidia-docker/${distribution}/nvidia-docker.list" | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
retry sudo apt-get update -qq
# Necessary to get the `--gpus` flag to function within docker
retry sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
else
# Explicitly remove nvidia docker apt repositories if not building for cuda
sudo rm -rf /etc/apt/sources.list.d/nvidia-docker.list
fi
add_to_env_file() {
local name=$1
local value=$2
case "$value" in
*\ *)
# BASH_ENV should be set by CircleCI
echo "${name}='${value}'" >> "${BASH_ENV:-/tmp/env}"
;;
*)
echo "${name}=${value}" >> "${BASH_ENV:-/tmp/env}"
;;
esac
}
add_to_env_file IN_CI 1
add_to_env_file CI_MASTER "${CI_MASTER:-}"
add_to_env_file COMMIT_SOURCE "${CIRCLE_BRANCH:-}"
add_to_env_file BUILD_ENVIRONMENT "${BUILD_ENVIRONMENT}"
add_to_env_file CIRCLE_PULL_REQUEST "${CIRCLE_PULL_REQUEST}"
if [[ "${BUILD_ENVIRONMENT}" == *-build ]]; then
add_to_env_file SCCACHE_BUCKET ossci-compiler-cache-circleci-v2
SCCACHE_MAX_JOBS=$(( $(nproc) - 1 ))
MEMORY_LIMIT_MAX_JOBS=8 # the "large" resource class on CircleCI has 32 CPU cores, if we use all of them we'll OOM
MAX_JOBS=$(( ${SCCACHE_MAX_JOBS} > ${MEMORY_LIMIT_MAX_JOBS} ? ${MEMORY_LIMIT_MAX_JOBS} : ${SCCACHE_MAX_JOBS} ))
add_to_env_file MAX_JOBS "${MAX_JOBS}"
echo "declare -x IN_CIRCLECI=1" > /home/circleci/project/env
echo "declare -x COMMIT_SOURCE=${CIRCLE_BRANCH:-}" >> /home/circleci/project/env
echo "declare -x SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> /home/circleci/project/env
if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
add_to_env_file TORCH_CUDA_ARCH_LIST 5.2
echo "declare -x TORCH_CUDA_ARCH_LIST=5.2" >> /home/circleci/project/env
fi
export SCCACHE_MAX_JOBS=`expr $(nproc) - 1`
export MEMORY_LIMIT_MAX_JOBS=8 # the "large" resource class on CircleCI has 32 CPU cores, if we use all of them we'll OOM
export MAX_JOBS=$(( ${SCCACHE_MAX_JOBS} > ${MEMORY_LIMIT_MAX_JOBS} ? ${MEMORY_LIMIT_MAX_JOBS} : ${SCCACHE_MAX_JOBS} ))
echo "declare -x MAX_JOBS=${MAX_JOBS}" >> /home/circleci/project/env
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
# This IAM user allows write access to S3 bucket for sccache & bazels3cache
set +x
add_to_env_file XLA_CLANG_CACHE_S3_BUCKET_NAME "${XLA_CLANG_CACHE_S3_BUCKET_NAME:-}"
add_to_env_file AWS_ACCESS_KEY_ID "${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_AND_XLA_BAZEL_S3_BUCKET_V2:-}"
add_to_env_file AWS_SECRET_ACCESS_KEY "${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_AND_XLA_BAZEL_S3_BUCKET_V2:-}"
echo "declare -x XLA_CLANG_CACHE_S3_BUCKET_NAME=${XLA_CLANG_CACHE_S3_BUCKET_NAME:-}" >> /home/circleci/project/env
echo "declare -x AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_AND_XLA_BAZEL_S3_BUCKET_V2:-}" >> /home/circleci/project/env
echo "declare -x AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_AND_XLA_BAZEL_S3_BUCKET_V2:-}" >> /home/circleci/project/env
set -x
else
# This IAM user allows write access to S3 bucket for sccache
set +x
add_to_env_file XLA_CLANG_CACHE_S3_BUCKET_NAME "${XLA_CLANG_CACHE_S3_BUCKET_NAME:-}"
add_to_env_file AWS_ACCESS_KEY_ID "${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}"
add_to_env_file AWS_SECRET_ACCESS_KEY "${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}"
echo "declare -x XLA_CLANG_CACHE_S3_BUCKET_NAME=${XLA_CLANG_CACHE_S3_BUCKET_NAME:-}" >> /home/circleci/project/env
echo "declare -x AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}" >> /home/circleci/project/env
echo "declare -x AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4:-}" >> /home/circleci/project/env
set -x
fi
fi
@ -106,7 +94,5 @@ fi
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_ECR_READ_WRITE_V4:-}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_ECR_READ_WRITE_V4:-}
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
export AWS_REGION=us-east-1
aws ecr get-login-password --region $AWS_REGION|docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com
eval $(aws ecr get-login --region us-east-1 --no-include-email)
set -x
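
Note on the `MAX_JOBS` computation above: the shell ternary picks the smaller of `nproc - 1` and a fixed memory-driven cap of 8. A small Python sketch of that rule, with `max_jobs` as a hypothetical helper name:

```python
def max_jobs(cpu_count, memory_limit_max_jobs=8):
    """Return the build parallelism: one core is left free, and the
    result is capped so that many-core machines do not run out of memory."""
    sccache_max_jobs = cpu_count - 1
    return min(sccache_max_jobs, memory_limit_max_jobs)
```

On the 32-core "large" resource class mentioned in the script comment this gives 8; on a 4-core machine it gives 3.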


@ -33,7 +33,7 @@ systemctl list-units --all | cat
sudo pkill apt-get || true
# For even better luck, purge unattended-upgrades
sudo apt-get purge -y unattended-upgrades || true
sudo apt-get purge -y unattended-upgrades
cat /etc/apt/sources.list


@ -1,140 +0,0 @@
# Documentation: https://docs.microsoft.com/en-us/rest/api/azure/devops/build/?view=azure-devops-rest-6.0
import re
import json
import os
import sys
import requests
import time
AZURE_PIPELINE_BASE_URL = "https://aiinfra.visualstudio.com/PyTorch/"
AZURE_DEVOPS_PAT_BASE64 = os.environ.get("AZURE_DEVOPS_PAT_BASE64_SECRET", "")
PIPELINE_ID = "911"
PROJECT_ID = "0628bce4-2d33-499e-bac5-530e12db160f"
TARGET_BRANCH = os.environ.get("CIRCLE_BRANCH", "main")
TARGET_COMMIT = os.environ.get("CIRCLE_SHA1", "")
build_base_url = AZURE_PIPELINE_BASE_URL + "_apis/build/builds?api-version=6.0"
s = requests.Session()
s.headers.update({"Authorization": "Basic " + AZURE_DEVOPS_PAT_BASE64})
def submit_build(pipeline_id, project_id, source_branch, source_version):
print("Submitting build for branch: " + source_branch)
print("Commit SHA1: ", source_version)
run_build_raw = s.post(build_base_url, json={
"definition": {"id": pipeline_id},
"project": {"id": project_id},
"sourceBranch": source_branch,
"sourceVersion": source_version
})
try:
run_build_json = run_build_raw.json()
except json.decoder.JSONDecodeError as e:
print(e)
print("Failed to parse the response. Check if the Azure DevOps PAT is incorrect or expired.")
sys.exit(-1)
build_id = run_build_json['id']
print("Submitted build: " + str(build_id))
print("Build URL: " + run_build_json['url'])
return build_id
def get_build(_id):
get_build_url = AZURE_PIPELINE_BASE_URL + f"/_apis/build/builds/{_id}?api-version=6.0"
get_build_raw = s.get(get_build_url)
return get_build_raw.json()
def get_build_logs(_id):
get_build_logs_url = AZURE_PIPELINE_BASE_URL + f"/_apis/build/builds/{_id}/logs?api-version=6.0"
get_build_logs_raw = s.get(get_build_logs_url)
return get_build_logs_raw.json()
def get_log_content(url):
resp = s.get(url)
return resp.text
def wait_for_build(_id):
build_detail = get_build(_id)
build_status = build_detail['status']
while build_status == 'notStarted':
print('Waiting for run to start: ' + str(_id))
sys.stdout.flush()
try:
build_detail = get_build(_id)
build_status = build_detail['status']
except Exception as e:
print("Error getting build")
print(e)
time.sleep(30)
print("Build started: ", str(_id))
handled_logs = set()
while build_status == 'inProgress':
try:
print("Waiting for log: " + str(_id))
logs = get_build_logs(_id)
except Exception as e:
print("Error fetching logs")
print(e)
time.sleep(30)
continue
for log in logs['value']:
log_id = log['id']
if log_id in handled_logs:
continue
handled_logs.add(log_id)
print('Fetching log: \n' + log['url'])
try:
log_content = get_log_content(log['url'])
print(log_content)
except Exception as e:
print("Error getting log content")
print(e)
sys.stdout.flush()
build_detail = get_build(_id)
build_status = build_detail['status']
time.sleep(30)
build_result = build_detail['result']
print("Build status: " + build_status)
print("Build result: " + build_result)
return build_status, build_result
if __name__ == '__main__':
# Convert the branch name for Azure DevOps
match = re.search(r'pull/(\d+)', TARGET_BRANCH)
if match is not None:
pr_num = match.group(1)
SOURCE_BRANCH = f'refs/pull/{pr_num}/head'
else:
SOURCE_BRANCH = f'refs/heads/{TARGET_BRANCH}'
MAX_RETRY = 2
retry = MAX_RETRY
while retry > 0:
build_id = submit_build(PIPELINE_ID, PROJECT_ID, SOURCE_BRANCH, TARGET_COMMIT)
build_status, build_result = wait_for_build(build_id)
if build_result != 'succeeded':
retry = retry - 1
if retry > 0:
print("Retrying... remaining attempt: " + str(retry))
# Wait a bit before retrying
time.sleep((MAX_RETRY - retry) * 120)
continue
else:
print("No more chance to retry. Giving up.")
sys.exit(-1)
else:
break
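
Note on the branch conversion in the `__main__` block of the script above: CircleCI branch names of the form `pull/<number>` are mapped to a pull-request ref, and everything else to a head ref. Pulled out into a standalone function (the name `to_azure_source_branch` is an assumption, not something the script defines), the logic looks roughly like this:

```python
import re


def to_azure_source_branch(target_branch):
    """Map a CircleCI branch name to the ref format passed to Azure DevOps."""
    match = re.search(r"pull/(\d+)", target_branch)
    if match is not None:
        # Pull-request builds use the PR head ref.
        return "refs/pull/{}/head".format(match.group(1))
    return "refs/heads/{}".format(target_branch)
```

Keeping the conversion separate from the submit-and-wait loop makes it easy to check without touching the network.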


@ -0,0 +1,145 @@
import glob
import json
import logging
import os
import os.path
import pathlib
import re
import sys
import time
import zipfile
import requests
def get_size(file_dir):
try:
# we should only expect one file, if no, something is wrong
file_name = glob.glob(os.path.join(file_dir, "*"))[0]
return os.stat(file_name).st_size
except:
logging.exception(f"error getting file from: {file_dir}")
return 0
def build_message(size):
pkg_type, py_ver, cu_ver, *_ = os.environ.get("BUILD_ENVIRONMENT", "").split() + [
None,
None,
None,
]
os_name = os.uname()[0].lower()
if os_name == "darwin":
os_name = "macos"
return {
"normal": {
"os": os_name,
"pkg_type": pkg_type,
"py_ver": py_ver,
"cu_ver": cu_ver,
"pr": os.environ.get("CIRCLE_PR_NUMBER"),
"build_num": os.environ.get("CIRCLE_BUILD_NUM"),
"sha1": os.environ.get("CIRCLE_SHA1"),
"branch": os.environ.get("CIRCLE_BRANCH"),
},
"int": {
"time": int(time.time()),
"size": size,
"commit_time": int(os.environ.get("COMMIT_TIME", "0")),
},
}
def send_message(messages):
access_token = os.environ.get("SCRIBE_GRAPHQL_ACCESS_TOKEN")
if not access_token:
raise ValueError("Can't find access token from environment variable")
url = "https://graph.facebook.com/scribe_logs"
r = requests.post(
url,
data={
"access_token": access_token,
"logs": json.dumps(
[
{
"category": "perfpipe_pytorch_binary_size",
"message": json.dumps(message),
"line_escape": False,
}
for message in messages
]
),
},
)
print(r.text)
r.raise_for_status()
def report_android_sizes(file_dir):
def gen_sizes():
# we should only expect one file, if no, something is wrong
aar_files = list(pathlib.Path(file_dir).rglob("pytorch_android-*.aar"))
if len(aar_files) != 1:
logging.exception(f"error getting aar files from: {file_dir} / {aar_files}")
return
aar_file = aar_files[0]
zf = zipfile.ZipFile(aar_file)
for info in zf.infolist():
# Scan ".so" libs in `jni` folder. Examples:
# jni/arm64-v8a/libfbjni.so
# jni/arm64-v8a/libpytorch_jni.so
m = re.match(r"^jni/([^/]+)/(.*\.so)$", info.filename)
if not m:
continue
arch, lib = m.groups()
# report per architecture library size
yield [arch, lib, info.compress_size, info.file_size]
# report whole package size
yield ["aar", aar_file.name, os.stat(aar_file).st_size, 0]
def gen_messages():
android_build_type = os.environ.get("ANDROID_BUILD_TYPE")
for arch, lib, comp_size, uncomp_size in gen_sizes():
print(android_build_type, arch, lib, comp_size, uncomp_size)
yield {
"normal": {
"os": "android",
# TODO: create dedicated columns
"pkg_type": "{}/{}/{}".format(android_build_type, arch, lib),
"cu_ver": "", # dummy value for derived field `build_name`
"py_ver": "", # dummy value for derived field `build_name`
"pr": os.environ.get("CIRCLE_PR_NUMBER"),
"build_num": os.environ.get("CIRCLE_BUILD_NUM"),
"sha1": os.environ.get("CIRCLE_SHA1"),
"branch": os.environ.get("CIRCLE_BRANCH"),
},
"int": {
"time": int(time.time()),
"commit_time": int(os.environ.get("COMMIT_TIME", "0")),
"size": comp_size,
"raw_size": uncomp_size,
},
}
send_message(list(gen_messages()))
if __name__ == "__main__":
file_dir = os.environ.get(
"PYTORCH_FINAL_PACKAGE_DIR", "/home/circleci/project/final_pkgs"
)
if len(sys.argv) == 2:
file_dir = sys.argv[1]
print("checking dir: " + file_dir)
if "-android" in os.environ.get("BUILD_ENVIRONMENT", ""):
report_android_sizes(file_dir)
else:
size = get_size(file_dir)
if size != 0:
try:
send_message([build_message(size)])
except:
logging.exception("can't send message")
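
Note on `report_android_sizes` above: it opens the `.aar` as a zip archive and reports the compressed and uncompressed size of each `jni/<arch>/<lib>.so` entry. The sketch below isolates that scan and runs it against a tiny in-memory archive; `jni_library_sizes`, `demo_archive` and the archive contents are made up for illustration, and only the regular expression is taken from the script.

```python
import io
import re
import zipfile

# Same pattern as in the script: jni/<arch>/<library>.so
JNI_LIB = re.compile(r"^jni/([^/]+)/(.*\.so)$")


def jni_library_sizes(archive):
    """Yield (arch, lib, compressed_size, uncompressed_size) per native library."""
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            m = JNI_LIB.match(info.filename)
            if not m:
                continue  # skip everything that is not a jni/*.so entry
            arch, lib = m.groups()
            yield arch, lib, info.compress_size, info.file_size


def demo_archive():
    # Build a small stand-in archive in memory (contents are placeholders).
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("jni/arm64-v8a/libpytorch_jni.so", b"\0" * 64)
        zf.writestr("jni/x86/libfbjni.so", b"\0" * 16)
        zf.writestr("classes.jar", b"not a native library")
    buffer.seek(0)
    return buffer
```

Calling `list(jni_library_sizes(demo_archive()))` returns one row per `.so` entry and ignores `classes.jar`.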


@ -1,10 +1,7 @@
# https://developercommunity.visualstudio.com/t/install-specific-version-of-vs-component/1142479
# Where to find the links: https://docs.microsoft.com/en-us/visualstudio/releases/2019/history#release-dates-and-build-numbers
# BuildTools from S3
$VS_DOWNLOAD_LINK = "https://s3.amazonaws.com/ossci-windows/vs${env:VS_VERSION}_BuildTools.exe"
$VS_DOWNLOAD_LINK = "https://aka.ms/vs/15/release/vs_buildtools.exe"
$COLLECT_DOWNLOAD_LINK = "https://aka.ms/vscollect.exe"
$VS_INSTALL_ARGS = @("--nocache","--quiet","--wait", "--add Microsoft.VisualStudio.Workload.VCTools",
"--add Microsoft.VisualStudio.Component.VC.Tools.14.11",
"--add Microsoft.Component.MSBuild",
"--add Microsoft.VisualStudio.Component.Roslyn.Compiler",
"--add Microsoft.VisualStudio.Component.TextTemplating",
@ -14,45 +11,17 @@ $VS_INSTALL_ARGS = @("--nocache","--quiet","--wait", "--add Microsoft.VisualStud
"--add Microsoft.VisualStudio.Component.VC.Tools.x86.x64",
"--add Microsoft.VisualStudio.ComponentGroup.NativeDesktop.Win81")
if (${env:INSTALL_WINDOWS_SDK} -eq "1") {
$VS_INSTALL_ARGS += "--add Microsoft.VisualStudio.Component.Windows10SDK.19041"
}
if (Test-Path "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe") {
$VS_VERSION_major = [int] ${env:VS_VERSION}.split(".")[0]
$existingPath = & "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe" -products "Microsoft.VisualStudio.Product.BuildTools" -version "[${env:VS_VERSION}, ${env:VS_VERSION_major + 1})" -property installationPath
if (($existingPath -ne $null) -and (!${env:CIRCLECI})) {
echo "Found correctly versioned existing BuildTools installation in $existingPath"
exit 0
}
$pathToRemove = & "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe" -products "Microsoft.VisualStudio.Product.BuildTools" -property installationPath
}
echo "Downloading VS installer from S3."
curl.exe --retry 3 -kL $VS_DOWNLOAD_LINK --output vs_installer.exe
if ($LASTEXITCODE -ne 0) {
echo "Download of the VS 2019 Version ${env:VS_VERSION} installer failed"
echo "Download of the VS 2017 installer failed"
exit 1
}
if ($pathToRemove -ne $null) {
echo "Uninstalling $pathToRemove."
$VS_UNINSTALL_ARGS = @("uninstall", "--installPath", "`"$pathToRemove`"", "--quiet","--wait")
$process = Start-Process "${PWD}\vs_installer.exe" -ArgumentList $VS_UNINSTALL_ARGS -NoNewWindow -Wait -PassThru
$exitCode = $process.ExitCode
if (($exitCode -ne 0) -and ($exitCode -ne 3010)) {
echo "Original BuildTools uninstall failed with code $exitCode"
exit 1
}
echo "Other versioned BuildTools uninstalled."
}
echo "Installing Visual Studio version ${env:VS_VERSION}."
$process = Start-Process "${PWD}\vs_installer.exe" -ArgumentList $VS_INSTALL_ARGS -NoNewWindow -Wait -PassThru
Remove-Item -Path vs_installer.exe -Force
$exitCode = $process.ExitCode
if (($exitCode -ne 0) -and ($exitCode -ne 3010)) {
echo "VS 2019 installer exited with code $exitCode, which should be one of [0, 3010]."
echo "VS 2017 installer exited with code $exitCode, which should be one of [0, 3010]."
curl.exe --retry 3 -kL $COLLECT_DOWNLOAD_LINK --output Collect.exe
if ($LASTEXITCODE -ne 0) {
echo "Download of the VS Collect tool failed."
@ -60,6 +29,6 @@ if (($exitCode -ne 0) -and ($exitCode -ne 3010)) {
}
Start-Process "${PWD}\Collect.exe" -NoNewWindow -Wait -PassThru
New-Item -Path "C:\w\build-results" -ItemType "directory" -Force
Copy-Item -Path "${env:TEMP}\vslogs.zip" -Destination "C:\w\build-results\"
Copy-Item -Path "C:\Users\circleci\AppData\Local\Temp\vslogs.zip" -Destination "C:\w\build-results\"
exit 1
}


@ -1,5 +0,0 @@
$CMATH_DOWNLOAD_LINK = "https://raw.githubusercontent.com/microsoft/STL/12c684bba78f9b032050526abdebf14f58ca26a3/stl/inc/cmath"
$VC14_28_INSTALL_PATH="C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29910\include"
curl.exe --retry 3 -kL $CMATH_DOWNLOAD_LINK --output "$home\cmath"
Move-Item -Path "$home\cmath" -Destination "$VC14_28_INSTALL_PATH" -Force


@ -1,70 +1,37 @@
#!/bin/bash
set -eux -o pipefail
case ${CUDA_VERSION} in
10.2)
cuda_installer_name="cuda_10.2.89_441.22_win10"
cuda_install_packages="nvcc_10.2 cuobjdump_10.2 nvprune_10.2 cupti_10.2 cublas_10.2 cublas_dev_10.2 cudart_10.2 cufft_10.2 cufft_dev_10.2 curand_10.2 curand_dev_10.2 cusolver_10.2 cusolver_dev_10.2 cusparse_10.2 cusparse_dev_10.2 nvgraph_10.2 nvgraph_dev_10.2 npp_10.2 npp_dev_10.2 nvrtc_10.2 nvrtc_dev_10.2 nvml_dev_10.2"
;;
11.3)
cuda_installer_name="cuda_11.3.0_465.89_win10"
cuda_install_packages="thrust_11.3 nvcc_11.3 cuobjdump_11.3 nvprune_11.3 nvprof_11.3 cupti_11.3 cublas_11.3 cublas_dev_11.3 cudart_11.3 cufft_11.3 cufft_dev_11.3 curand_11.3 curand_dev_11.3 cusolver_11.3 cusolver_dev_11.3 cusparse_11.3 cusparse_dev_11.3 npp_11.3 npp_dev_11.3 nvrtc_11.3 nvrtc_dev_11.3 nvml_dev_11.3"
;;
11.6)
cuda_installer_name="cuda_11.6.0_511.23_windows"
cuda_install_packages="thrust_11.6 nvcc_11.6 cuobjdump_11.6 nvprune_11.6 nvprof_11.6 cupti_11.6 cublas_11.6 cublas_dev_11.6 cudart_11.6 cufft_11.6 cufft_dev_11.6 curand_11.6 curand_dev_11.6 cusolver_11.6 cusolver_dev_11.6 cusparse_11.6 cusparse_dev_11.6 npp_11.6 npp_dev_11.6 nvrtc_11.6 nvrtc_dev_11.6 nvml_dev_11.6"
;;
*)
echo "CUDA_VERSION $CUDA_VERSION is not supported yet"
exit 1
;;
esac
curl --retry 3 -kLO https://ossci-windows.s3.amazonaws.com/cuda_10.1.243_426.00_win10.exe
7z x cuda_10.1.243_426.00_win10.exe -ocuda_10.1.243_426.00_win10
cd cuda_10.1.243_426.00_win10
mkdir cuda_install_logs
set +e
if [[ -f "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/bin/nvcc.exe" ]]; then
echo "Existing CUDA v${CUDA_VERSION} installation found, skipping install"
./setup.exe -s nvcc_10.1 cuobjdump_10.1 nvprune_10.1 cupti_10.1 cublas_10.1 cublas_dev_10.1 cudart_10.1 cufft_10.1 cufft_dev_10.1 curand_10.1 curand_dev_10.1 cusolver_10.1 cusolver_dev_10.1 cusparse_10.1 cusparse_dev_10.1 nvgraph_10.1 nvgraph_dev_10.1 npp_10.1 npp_dev_10.1 nvrtc_10.1 nvrtc_dev_10.1 nvml_dev_10.1 -loglevel:6 -log:"$(pwd -W)/cuda_install_logs"
set -e
if [[ "${VC_YEAR}" == "2017" ]]; then
cp -r CUDAVisualStudioIntegration/extras/visual_studio_integration/MSBuildExtensions/* "C:/Program Files (x86)/Microsoft Visual Studio/2017/${VC_PRODUCT}/Common7/IDE/VC/VCTargets/BuildCustomizations/"
else
tmp_dir=$(mktemp -d)
(
# no need to popd after, the subshell shouldn't affect the parent shell
pushd "${tmp_dir}"
cuda_installer_link="https://ossci-windows.s3.amazonaws.com/${cuda_installer_name}.exe"
curl --retry 3 -kLO $cuda_installer_link
7z x ${cuda_installer_name}.exe -o${cuda_installer_name}
pushd ${cuda_installer_name}
mkdir cuda_install_logs
set +e
# Leave cuda_install_packages unquoted: quoting would pass the whole list as a single argument instead of one argument per package
# shellcheck disable=SC2086
./setup.exe -s ${cuda_install_packages} -loglevel:6 -log:"$(pwd -W)/cuda_install_logs"
set -e
if [[ ! -f "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/bin/nvcc.exe" ]]; then
echo "CUDA installation failed"
mkdir -p /c/w/build-results
7z a "c:\\w\\build-results\\cuda_install_logs.7z" cuda_install_logs
exit 1
fi
)
rm -rf "${tmp_dir}"
cp -r CUDAVisualStudioIntegration/extras/visual_studio_integration/MSBuildExtensions/* "C:/Program Files (x86)/Microsoft Visual Studio/2019/${VC_PRODUCT}/MSBuild/Microsoft/VC/v160/BuildCustomizations/"
fi
if [[ -f "/c/Program Files/NVIDIA Corporation/NvToolsExt/bin/x64/nvToolsExt64_1.dll" ]]; then
echo "Existing nvtools installation found, skipping install"
else
# create tmp dir for download
tmp_dir=$(mktemp -d)
(
# no need to popd after, the subshell shouldn't affect the parent shell
pushd "${tmp_dir}"
curl --retry 3 -kLO https://ossci-windows.s3.amazonaws.com/NvToolsExt.7z
7z x NvToolsExt.7z -oNvToolsExt
mkdir -p "C:/Program Files/NVIDIA Corporation/NvToolsExt"
cp -r NvToolsExt/* "C:/Program Files/NVIDIA Corporation/NvToolsExt/"
)
rm -rf "${tmp_dir}"
curl --retry 3 -kLO https://ossci-windows.s3.amazonaws.com/NvToolsExt.7z
7z x NvToolsExt.7z -oNvToolsExt
mkdir -p "C:/Program Files/NVIDIA Corporation/NvToolsExt"
cp -r NvToolsExt/* "C:/Program Files/NVIDIA Corporation/NvToolsExt/"
export NVTOOLSEXT_PATH="C:\\Program Files\\NVIDIA Corporation\\NvToolsExt\\"
if ! ls "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1/bin/nvcc.exe"
then
echo "CUDA installation failed"
mkdir -p /c/w/build-results
7z a "c:\\w\\build-results\\cuda_install_logs.7z" cuda_install_logs
exit 1
fi
cd ..
rm -rf ./cuda_10.1.243_426.00_win10
rm -f ./cuda_10.1.243_426.00_win10.exe

@ -1,48 +0,0 @@
#!/bin/bash
set -eux -o pipefail
windows_s3_link="https://ossci-windows.s3.amazonaws.com"
case ${CUDA_VERSION} in
10.2)
cudnn_file_name="cudnn-${CUDA_VERSION}-windows10-x64-v7.6.5.32"
;;
11.3)
# Use cudnn8.3 with hard-coded cuda11.5 version
cudnn_file_name="cudnn-windows-x86_64-8.3.2.44_cuda11.5-archive"
;;
11.6)
# Use cudnn8.3 with hard-coded cuda11.5 version
cudnn_file_name="cudnn-windows-x86_64-8.3.2.44_cuda11.5-archive"
;;
*)
echo "CUDA_VERSION: ${CUDA_VERSION} not supported yet"
exit 1
;;
esac
cudnn_installer_name="cudnn_installer.zip"
cudnn_installer_link="${windows_s3_link}/${cudnn_file_name}.zip"
cudnn_install_folder="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/"
if [[ -f "${cudnn_install_folder}/include/cudnn.h" ]]; then
echo "Existing cudnn installation found, skipping install..."
else
tmp_dir=$(mktemp -d)
(
pushd "${tmp_dir}"
curl --retry 3 -o "${cudnn_installer_name}" "$cudnn_installer_link"
7z x "${cudnn_installer_name}" -ocudnn
# Use '${var:?}/*' to avoid potentially expanding to '/*'
# Remove all of the directories before attempting to copy files
rm -rf "${cudnn_install_folder:?}/*"
cp -rf cudnn/cuda/* "${cudnn_install_folder}"
# Make sure the Windows path contains the zlib DLL
curl -k -L "${windows_s3_link}/zlib123dllx64.zip" --output "${tmp_dir}\zlib123dllx64.zip"
7z x "${tmp_dir}\zlib123dllx64.zip" -o"${tmp_dir}\zlib"
xcopy /Y "${tmp_dir}\zlib\dll_x64\*.dll" "C:\Windows\System32"
)
rm -rf "${tmp_dir}"
fi

@ -0,0 +1,45 @@
#!/usr/bin/env python3

import cimodel.data.caffe2_build_definitions as caffe2_build_definitions
import cimodel.data.simple.util.docker_constants as pytorch_docker_constants

from yaml import load

try:
    from yaml import CLoader as Loader
except ImportError:
    from yaml import Loader


def load_config(filename=".circleci/config.yml"):
    with open(filename, "r") as fh:
        return load("".join(fh.readlines()), Loader)


def load_tags_for_projects(workflow_config):
    return {
        v["ecr_gc_job"]["project"]: v["ecr_gc_job"]["tags_to_keep"]
        for v in workflow_config["workflows"]["ecr_gc"]["jobs"]
        if isinstance(v, dict) and "ecr_gc_job" in v
    }


def check_version(job, tags, expected_version):
    valid_versions = tags[job].split(",")
    if expected_version not in valid_versions:
        raise RuntimeError(
            "We configured {} to use Docker version {}; but this "
            "version is not configured in job ecr_gc_job_for_{}. Non-deployed versions will be "
            "garbage collected two weeks after they are created. DO NOT LAND "
            "THIS TO MASTER without also updating ossci-job-dsl with this version."
            "\n\nDeployed versions: {}".format(job, expected_version, job, tags[job])
        )


def validate_docker_version():
    tags = load_tags_for_projects(load_config())
    check_version("pytorch", tags, pytorch_docker_constants.DOCKER_IMAGE_TAG)
    check_version("caffe2", tags, caffe2_build_definitions.DOCKER_IMAGE_VERSION)


if __name__ == "__main__":
    validate_docker_version()

@ -59,7 +59,8 @@ binary_windows_params: &binary_windows_params
default: ""
executor:
type: string
default: "windows-xlarge-cpu-with-nvidia-cuda"
default: "windows-cpu-with-nvidia-cuda"
environment:
BUILD_ENVIRONMENT: << parameters.build_environment >>
BUILD_FOR_SYSTEM: windows
JOB_EXECUTOR: <<parameters.executor>>

@ -0,0 +1,27 @@
caffe2_params: &caffe2_params
  parameters:
    build_environment:
      type: string
      default: ""
    build_ios:
      type: string
      default: ""
    docker_image:
      type: string
      default: ""
    use_cuda_docker_runtime:
      type: string
      default: ""
    build_only:
      type: string
      default: ""
    resource_class:
      type: string
      default: "large"
  environment:
    BUILD_ENVIRONMENT: << parameters.build_environment >>
    BUILD_IOS: << parameters.build_ios >>
    USE_CUDA_DOCKER_RUNTIME: << parameters.use_cuda_docker_runtime >>
    DOCKER_IMAGE: << parameters.docker_image >>
    BUILD_ONLY: << parameters.build_only >>
  resource_class: << parameters.resource_class >>

@ -0,0 +1,14 @@
promote_common: &promote_common
  docker:
    - image: pytorch/release
  parameters:
    package_name:
      description: "package name to promote"
      type: string
      default: ""
  environment:
    PACKAGE_NAME: << parameters.package_name >>
    ANACONDA_API_TOKEN: ${CONDA_PYTORCHBOT_TOKEN}
    AWS_ACCESS_KEY_ID: ${PYTORCH_BINARY_AWS_ACCESS_KEY_ID}
    AWS_SECRET_ACCESS_KEY: ${PYTORCH_BINARY_AWS_SECRET_ACCESS_KEY}

@ -15,15 +15,11 @@ pytorch_params: &pytorch_params
build_only:
type: string
default: ""
ci_master:
type: string
default: ""
environment:
BUILD_ENVIRONMENT: << parameters.build_environment >>
DOCKER_IMAGE: << parameters.docker_image >>
USE_CUDA_DOCKER_RUNTIME: << parameters.use_cuda_docker_runtime >>
BUILD_ONLY: << parameters.build_only >>
CI_MASTER: << pipeline.parameters.run_master_build >>
resource_class: << parameters.resource_class >>
pytorch_ios_params: &pytorch_ios_params
@ -40,29 +36,17 @@ pytorch_ios_params: &pytorch_ios_params
op_list:
type: string
default: ""
use_metal:
type: string
default: "0"
lite_interpreter:
type: string
default: "1"
use_coreml:
type: string
default: "0"
environment:
BUILD_ENVIRONMENT: << parameters.build_environment >>
IOS_ARCH: << parameters.ios_arch >>
IOS_PLATFORM: << parameters.ios_platform >>
SELECTED_OP_LIST: << parameters.op_list >>
USE_PYTORCH_METAL: << parameters.use_metal >>
BUILD_LITE_INTERPRETER: << parameters.lite_interpreter >>
USE_COREML_DELEGATE: << parameters.use_coreml >>
pytorch_windows_params: &pytorch_windows_params
parameters:
executor:
type: string
default: "windows-xlarge-cpu-with-nvidia-cuda"
default: "windows-cpu-with-nvidia-cuda"
build_environment:
type: string
default: ""
@ -71,19 +55,16 @@ pytorch_windows_params: &pytorch_windows_params
default: ""
cuda_version:
type: string
default: "10.1"
default: "10"
python_version:
type: string
default: "3.8"
vs_version:
type: string
default: "16.8.6"
default: "3.6"
vc_version:
type: string
default: "14.16"
default: "14.11"
vc_year:
type: string
default: "2019"
default: "2017"
vc_product:
type: string
default: "BuildTools"
@ -95,11 +76,10 @@ pytorch_windows_params: &pytorch_windows_params
SCCACHE_BUCKET: "ossci-compiler-cache"
CUDA_VERSION: <<parameters.cuda_version>>
PYTHON_VERSION: <<parameters.python_version>>
VS_VERSION: <<parameters.vs_version>>
VC_VERSION: <<parameters.vc_version>>
VC_YEAR: <<parameters.vc_year>>
VC_PRODUCT: <<parameters.vc_product>>
USE_CUDA: <<parameters.use_cuda>>
TORCH_CUDA_ARCH_LIST: "5.2 7.5"
TORCH_CUDA_ARCH_LIST: "7.5"
JOB_BASE_NAME: <<parameters.test_name>>
JOB_EXECUTOR: <<parameters.executor>>

@ -1,26 +1,23 @@
commands:
calculate_docker_image_tag:
description: "Calculates the docker image tag"
# Must be run after attaching workspace from previous steps
load_shared_env:
description: "Loads .circleci/shared/env_file into ${BASH_ENV}"
parameters:
# Our workspace is reattached at ~/workspace, so to keep things simple
# assume the shared env_file lives there
root:
type: string
default: "~/workspace"
steps:
- run:
name: "Calculate docker image hash"
name: "Load .circleci/shared/env_file into ${BASH_ENV}"
command: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "DOCKER_TAG=${DOCKER_TAG}" >> "${BASH_ENV}"
designate_upload_channel:
description: "inserts the correct upload channel into ${BASH_ENV}"
steps:
- run:
name: adding UPLOAD_CHANNEL to BASH_ENV
command: |
our_upload_channel=nightly
# On tags upload to test instead
if [[ -n "${CIRCLE_TAG}" ]]; then
our_upload_channel=test
if [[ -f "<< parameters.root >>/.circleci/shared/env_file" ]]; then
cat << parameters.root >>/.circleci/shared/env_file >> ${BASH_ENV}
else
echo "We didn't have a shared env file, that's weird"
fi
echo "export UPLOAD_CHANNEL=${our_upload_channel}" >> ${BASH_ENV}
# This system setup script is meant to run before the CI-related scripts, e.g.,
# installing Git client, checking out code, setting up CI env, and
@ -103,7 +100,7 @@ commands:
name: (Optional) Merge target branch
no_output_timeout: "10m"
command: |
if [[ -n "$CIRCLE_PULL_REQUEST" && "$CIRCLE_BRANCH" != "nightly" ]]; then
if [ -n "$CIRCLE_PULL_REQUEST" ]; then
PR_NUM=$(basename $CIRCLE_PULL_REQUEST)
CIRCLE_PR_BASE_BRANCH=$(curl -s https://api.github.com/repos/$CIRCLE_PROJECT_USERNAME/$CIRCLE_PROJECT_REPONAME/pulls/$PR_NUM | jq -r '.base.ref')
if [[ "${BUILD_ENVIRONMENT}" == *"xla"* || "${BUILD_ENVIRONMENT}" == *"gcc5"* ]] ; then
@ -133,42 +130,4 @@ commands:
echo "This is not a pull request, skipping..."
fi
upload_binary_size_for_android_build:
description: "Upload binary size data for Android build"
parameters:
build_type:
type: string
default: ""
artifacts:
type: string
default: ""
steps:
- run:
name: "Binary Size - Install Dependencies"
no_output_timeout: "5m"
command: |
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
retry pip3 install requests
- run:
name: "Binary Size - Untar Artifacts"
no_output_timeout: "5m"
command: |
# The artifact file is created inside the docker container and contains the result binaries.
# Unpack it into the project folder. The subsequent script scans the project folder
# to locate the result binaries and report their sizes.
# If no artifact file is provided, the project folder is assumed to have been mounted into
# the docker container during the build and to already contain the result binaries, so this step is skipped.
export ARTIFACTS="<< parameters.artifacts >>"
if [ -n "${ARTIFACTS}" ]; then
tar xf "${ARTIFACTS}" -C ~/project
fi
- run:
name: "Binary Size - Upload << parameters.build_type >>"
no_output_timeout: "5m"
command: |
cd ~/project
export ANDROID_BUILD_TYPE="<< parameters.build_type >>"
export COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
python3 -m tools.stats.upload_binary_size_to_scuba android

@ -11,31 +11,24 @@ parameters:
run_binary_tests:
type: boolean
default: false
run_build:
type: boolean
default: true
run_master_build:
type: boolean
default: false
run_slow_gradcheck_build:
type: boolean
default: false
docker_config_defaults: &docker_config_defaults
user: jenkins
aws_auth:
# This IAM user only allows read-write access to ECR
aws_access_key_id: ${CIRCLECI_AWS_ACCESS_KEY_FOR_ECR_READ_WRITE_V4}
aws_secret_access_key: ${CIRCLECI_AWS_SECRET_KEY_FOR_ECR_READ_WRITE_V4}
executors:
windows-with-nvidia-gpu:
machine:
resource_class: windows.gpu.nvidia.medium
image: windows-server-2019-nvidia:previous
image: windows-server-2019-nvidia:stable
shell: bash.exe
windows-xlarge-cpu-with-nvidia-cuda:
windows-cpu-with-nvidia-cuda:
machine:
# we will change to CPU host when it's ready
resource_class: windows.xlarge
image: windows-server-2019-vs2019:stable
shell: bash.exe
windows-medium-cpu-with-nvidia-cuda:
machine:
resource_class: windows.medium
image: windows-server-2019-vs2019:stable
shell: bash.exe

@ -3,12 +3,12 @@
# binary_linux_libtorch_3.6m_cpu_test:
# environment:
# BUILD_ENVIRONMENT: "libtorch 3.6m cpu"
# resource_class: gpu.nvidia.small
# resource_class: gpu.medium
# <<: *binary_linux_test
#
# binary_linux_libtorch_3.6m_cu90_test:
# environment:
# BUILD_ENVIRONMENT: "libtorch 3.6m cu90"
# resource_class: gpu.nvidia.small
# resource_class: gpu.medium
# <<: *binary_linux_test
#

Some files were not shown because too many files have changed in this diff.