Compare commits

..

69 Commits

Author SHA1 Message Date
dfbd030854 Fix builder pinning (#64971)
`git checkout release/1.9` should be done after the work dir is changed to
`BUILDER_ROOT`
2021-09-13 21:13:35 -07:00
e2cb357367 (torch.distributed) Add torch.distributed.is_torchelastic_launched() util method + make init_method=tcp:// compatible with torchelastic (#63910) (#64826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63910

Addresses the current issue that `init_method=tcp://` is not compatible with `torch.distributed.run` and `torch.distributed.launch`. When running with a training script that initializes the process group with `init_method=tcp://localhost:$port` as such:

```
$ python -u -m torch.distributed.run --max_restarts 0 --nproc_per_node 1 --nnodes 1 --master_addr $(hostname) --master_port 6000 ~/tmp/test.py
```

An `Address already in use` error is raised because the training script tries to create a TCPStore on port 6000, which is already taken by the TCPStore that the elastic agent is running on that port.

For details see: https://github.com/pytorch/pytorch/issues/63874.

This change does a couple of things:

1. Adds an `is_torchelastic_launched()` check function that users can call in their training scripts to see whether the script was launched via torchelastic.
2. Updates the `torch.distributed` docs page to include the new `is_torchelastic_launched()` function.
3. Makes `init_method=tcp://` torchelastic-compatible by modifying `_tcp_rendezvous_handler` in `torch.distributed.rendezvous` (this is NOT the elastic rendezvous; it is the old rendezvous module, which is slated for deprecation in future releases) to check `is_torchelastic_launched()` AND `torchelastic_use_agent_store()`, and if both are true, only create TCPStore clients (no daemons, not even for rank 0).
4. Adds a set of unit tests to cover the different code paths.

NOTE: the issue mentions that we should fail fast with an assertion on `init_method!=env://` when `is_torchelastic_launched()` is `True`. There are three registered init_methods in pytorch: env://, tcp://, and file://. Since this diff makes tcp:// compatible with torchelastic, and I've validated that file:// is already compatible with torchelastic, there is no need to add assertions. I did update the docs to point out that env:// is the RECOMMENDED init_method. We should probably deprecate the other init_methods in the future, but that is out of scope for this issue.
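
For illustration, a sketch of how a training script might use the new check (my example under stated assumptions, not code from this PR):

```
import torch.distributed as dist

def init_pg(port: int = 29500) -> None:
    if dist.is_torchelastic_launched():
        # Launched via torch.distributed.run: use the recommended env://
        # rendezvous, which reads MASTER_ADDR/MASTER_PORT, RANK and WORLD_SIZE
        # from the environment that the elastic agent sets up.
        dist.init_process_group(backend="gloo", init_method="env://")
    else:
        # Plain standalone run: fall back to an explicit TCP rendezvous.
        dist.init_process_group(
            backend="gloo",
            init_method=f"tcp://localhost:{port}",
            rank=0,
            world_size=1,
        )
```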

Test Plan: Unittests.

Reviewed By: cbalioglu

Differential Revision: D30529984

fbshipit-source-id: 267aea6d4dad73eb14a2680ac921f210ff547cc5

Co-authored-by: Kiuk Chung <kiuk@fb.com>
2021-09-12 10:38:42 -07:00
eccdfffa95 Pin macos mkl conda version to fix the cmake build (#61773) (#64831)
Summary:
Fixes the macOS build error in master; mkl recently had an upgrade.

CircleCI error:
https://app.circleci.com/pipelines/github/pytorch/pytorch/351645/workflows/d22421c1-bb8f-48fd-9efd-7c0d77f0b083/jobs/14815607

```
Jul 16 11:43:05 CMake Error at /Users/distiller/workspace/miniconda3/lib/cmake/mkl/MKLConfig.cmake:456 (list):
Jul 16 11:43:05   list does not recognize sub-command PREPEND
Jul 16 11:43:05 Call Stack (most recent call first):
Jul 16 11:43:05   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/share/cmake/Caffe2/public/mkl.cmake:1 (find_package)
Jul 16 11:43:05   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:109 (include)
Jul 16 11:43:05   /Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
Jul 16 11:43:05   CMakeLists.txt:5 (find_package)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61773

Reviewed By: soulitzer

Differential Revision: D29736742

Pulled By: zhouzhuojie

fbshipit-source-id: 68c5244196f7f7562a6c202157c4ccdcfcb64337

Co-authored-by: zhouzhuojie <zhouzhuojie@gmail.com>
2021-09-10 15:55:28 -07:00
3c81d48e29 fix(elastic-docs): Fix elastic launch doc (#62378) (#64798)
Summary:
The documentation link should be https://pytorch.org/docs/stable/elastic/run.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62378

Reviewed By: aivanou

Differential Revision: D30002830

Pulled By: kiukchung

fbshipit-source-id: 34b434acaa10222561df43f6397a2420eef02015

Co-authored-by: Ce Gao <cegao@tencent.com>
2021-09-09 20:06:18 -07:00
61e9e88c30 [Release-1.9.1] [torch] Various improvements to torch.distributed.launch and torch.distributed.run (#60925) (#64797)
* Minor bug fix in the warning message (#61127)

Summary:
The current example code does not work. The correct one is like this: cb7d813275/torch/distributed/run.py (L266)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61127

Reviewed By: cbalioglu

Differential Revision: D29572003

Pulled By: mrshenli

fbshipit-source-id: 05b470230f3d70f8a6164edb5f92894a1112069f

* Small change for torch.distributed launcher (#59152)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59152

Small change for https://fb.workplace.com/groups/319878845696681

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D28773682

Pulled By: H-Huang

fbshipit-source-id: acf82273e8622b7ffd3088d8d766bdf49273754c

* [torch] Various improvements to `torch.distributed.launch` and `torch.distributed.run` (#61294)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61294

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60925

* Make `torch.distributed.launch` default to 0 restarts
* Remove the unnecessary `--use_env` warning and move the remaining `--use_env` warnings to `torch.distributed.launch`
* Make the default log level WARNING
* Add a new doc section on transitioning to `torch.distributed.run`
* Make `torch.distributed.launch` not use error propagation
* Set the default events handler to `null`, which does not print events to the console
* Add a reference from `torch.distributed.launch` to `torch.distributed.run`
* Set the correct preexec function that sends SIGTERM to child processes when the parent dies

Issues resolved:

https://github.com/pytorch/pytorch/issues/60716
https://github.com/pytorch/pytorch/issues/60754

Test Plan:
sandcastle

    python -m torch.distributed.launch --nproc_per_node 2 main.py -> uses 0 restarts
    python -m torch.distributed.run --nproc_per_node 2 main.py -> uses default for torchelastic, 0 restarts

    python -m torch.distributed.launch --nproc_per_node=4  --use_env --no_python  main.py -> produces error
    python -m torch.distributed.launch --nproc_per_node=4  --use_env main.py -> no warning
    python -m torch.distributed.launch --nproc_per_node=4  --no_python  main.py ->warning

Output of running torch.distributed.launch without --use_env:

    $path/torch/distributed/launch.py:173: FutureWarning: The module torch.distributed.launch is deprecated
    and will be removed in future. Use torch.distributed.run.
    Note that --use_env is set by default in torch.distributed.run.
    If your script expects `--local_rank` argument to be set, please
    change it to read from `os.environ('LOCAL_RANK')` instead.
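
As the warning suggests, a minimal way for a training script to pick up its local rank without the `--local_rank` argument (illustrative sketch only) is:

```
import os

# torch.distributed.run (and launch with --use_env) export LOCAL_RANK, RANK
# and WORLD_SIZE into each worker's environment.
local_rank = int(os.environ["LOCAL_RANK"])
```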

New section:

{F628923078}

{F628974089}

Reviewed By: cbalioglu

Differential Revision: D29559553

fbshipit-source-id: 03ed9ba638bf154354e1530ffc964688431edf6b

Co-authored-by: Kento Nozawa <k_nzw@klis.tsukuba.ac.jp>
Co-authored-by: Howard Huang <howardhuang@fb.com>
Co-authored-by: Aliaksandr Ivanou <aivanou@fb.com>
2021-09-09 20:03:00 -07:00
04dd41d7cb Remove use_env from torch.distributed.run, clarify bc around that parameter in comment. (#59409) (#64796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59409

Remove use_env from torch.distributed.run, and clarify bc around that parameter in comment.

Test Plan: n/a

Reviewed By: cbalioglu

Differential Revision: D28876485

fbshipit-source-id: 5f10365968d204985ce517b83c392c688995d76e

Co-authored-by: Aliaksandr Ivanou <aivanou@fb.com>
2021-09-09 19:51:45 -07:00
9cc96ad811 [torch] Set nproc_per_node to 1 (#61552) (#64791)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61552

Set `nproc_per_node` to 1

Test Plan: unittests

Reviewed By: cbalioglu

Differential Revision: D29667056

fbshipit-source-id: 6601f66fec5e018c7737d909f8c71642451abb29

Co-authored-by: Aliaksandr Ivanou <aivanou@fb.com>
2021-09-09 19:33:26 -07:00
7a9f23c030 [torch/distributed] remove outdated FutureWarning in distributed/elastic/util/store.py (#60807) (#64792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60807

Addresses: https://github.com/pytorch/pytorch/issues/60717

This warning should have been removed since this code is no longer in "experimental" mode.

Test Plan: N/A - just removing experimental warning that should've been removed.

Reviewed By: H-Huang, aivanou

Differential Revision: D29412972

fbshipit-source-id: 16a8a98abde70a4ae0c1ac1b14bda339cb44863a

Co-authored-by: Kiuk Chung <kiuk@fb.com>
2021-09-09 19:31:47 -07:00
3a2876d98e [Release-1.9.1] fix mm not correctly report TORCH_CHECK failure issue (#61394) (#64702)
Fixes https://github.com/pytorch/pytorch/issues/61291.

Cherry-pick of  https://github.com/pytorch/pytorch/pull/61394 into `release/1.9` branch

As `torch.mm` is not a structured op, an explicit check is added to LinearAlgebra.cu

Reviewed By: zhouzhuojie, seemethere

Differential Revision: D29614208

Pulled By: walterddr

fbshipit-source-id: f49a15dde708e30b06059b47fae1cda7c2c3571c

Co-authored-by: Rong Rong (AI Infra) <rongr@fb.com>
2021-09-09 12:44:57 -07:00
81efb9fd1a [torch/elastic] Pretty print the failure message captured by @record (#64036) (#64681)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64036

This PR slightly revises the implementation of the internal `_format_failure()` method in order to pretty print the error message captured in a subprocess by the `record` annotation.
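
For context, a minimal script using the `record` annotation (an assumed example mirroring the `fail.py` in the stack trace below, not code from this PR):

```
from torch.distributed.elastic.multiprocessing.errors import record

@record
def main():
    # Any uncaught exception is captured and written to the error file that
    # the agent later summarizes in the failure log shown below.
    raise ValueError("Test")

if __name__ == "__main__":
    main()
```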

With this PR a failure log is formatted as below:

```
Root Cause:
[0]:
  time: 2021-08-26_17:12:07
  rank: 0 (local_rank: 0)
  exitcode: 1 (pid: 8045)
  error_file: /tmp/torchelastic_6cj9eppm/6d9d844a-6ce4-4838-93ed-1639a9525b00_rec9kuv3/attempt_0/0/error.json
  msg:
    {
      "message": "ValueError: Test",
      "extraInfo": {
        "py_callstack": [
          "  File \"/data/home/balioglu/fail.py\", line 7, in <module>\n    main()\n",
          "  File \"/fsx/users/balioglu/repos/pytorch/torch/distributed/elastic/multiprocessing/errors/__init__.py\", line 373, in wrapper\n    error_handler.record_exception(e)\n",
          "  File \"/fsx/users/balioglu/repos/pytorch/torch/distributed/elastic/multiprocessing/errors/error_handler.py\", line 86, in record_exception\n    _write_error(e, self._get_error_file_path())\n",
          "  File \"/fsx/users/balioglu/repos/pytorch/torch/distributed/elastic/multiprocessing/errors/error_handler.py\", line 26, in _write_error\n    \"py_callstack\": traceback.format_stack(),\n"
        ],
        "timestamp": "1629997927"
      }
    }
```

in contrast to the old formatting:

```
Root Cause:
[0]:
  time: 2021-08-26_17:15:50
  rank: 0 (local_rank: 0)
  exitcode: 1 (pid: 9417)
  error_file: /tmp/torchelastic_22pwarnq/19f22638-848c-4b8f-8379-677f34fc44e7_u43o9vs7/attempt_0/0/error.json
  msg: "{'message': 'ValueError: Test', 'extraInfo': {'py_callstack': 'Traceback (most recent call last):\n  File "/fsx/users/balioglu/repos/pytorch/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 351, in wrapper\n    return f(*args, **kwargs)\n  File "/data/home/balioglu/fail.py", line 5, in main\n    raise ValueError("BALIOGLU")\nValueError: BALIOGLU\n', 'timestamp': '1629998150'}}"
```
ghstack-source-id: 136761768

Test Plan: Run the existing unit tests.

Reviewed By: kiukchung

Differential Revision: D30579025

fbshipit-source-id: 37df0b7c7ec9b620355766122986c2c77e8495ae

Co-authored-by: Can Balioglu <balioglu@fb.com>
2021-09-08 12:00:04 -07:00
568bc668e5 Update torch.distributed.run OMP_NUM_THREADS message to log.warning (#63953) (#64679)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63953

Closes #61138

Test:
`python -m torch.distributed.run --nproc_per_node 2 test.py`
Still outputs message

`LOGLEVEL=ERROR python -m torch.distributed.run --nproc_per_node 2 test.py`
Does not output message anymore
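
A minimal sketch of the pattern (assumed, not the exact code in torch.distributed.run): the message goes through a logger whose level honors the `LOGLEVEL` environment variable, so `LOGLEVEL=ERROR` suppresses it while the default level still shows it.

```
import logging
import os

logging.basicConfig(level=os.environ.get("LOGLEVEL", "WARNING").upper())
log = logging.getLogger(__name__)

def maybe_set_omp_num_threads(nproc_per_node: int) -> None:
    # Warn (rather than print) when OMP_NUM_THREADS is defaulted for the workers.
    if "OMP_NUM_THREADS" not in os.environ and nproc_per_node > 1:
        os.environ["OMP_NUM_THREADS"] = "1"
        log.warning("Setting OMP_NUM_THREADS=1 to avoid system overload; "
                    "tune this variable for optimal performance.")
```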

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30542997

Pulled By: H-Huang

fbshipit-source-id: e7da30dcda51516abf4e56f1f510132e44397027

Co-authored-by: Howard Huang <howardhuang@fb.com>
2021-09-08 11:53:51 -07:00
088ca37c5c Add option to skip GH validation for torch.hub (#62139) (#64677)
Summary:
Split from https://github.com/pytorch/pytorch/pull/62072

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62139

Reviewed By: mthrok

Differential Revision: D29891497

Pulled By: malfet

fbshipit-source-id: 5c0baf53a2acf8f95062bd001457e1f936011529
2021-09-08 11:46:26 -07:00
544c8bd791 Fix infinite loop in _validate_not_a_forked_repo() (#62072) (#64659)
Summary:
Increase `page_idx` inside the loop rather than outside of it.
Break from the loop when an empty response is received, as it means there are no more items to fetch via the pagination request.

Also, add an option to use a provided GitHub token (via the `GITHUB_TOKEN` environment variable).

Fixes failure with "Rate Limit Exceeded" when doing something like `torch.hub.list("pytorch/test-infra:dsf")`

Fixes https://github.com/pytorch/pytorch/issues/61755
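
A rough sketch of the corrected pagination pattern described above (standalone illustration with assumed names, not the actual torch.hub code):

```
import json
import os
from urllib.request import Request, urlopen

def list_branches(owner: str, repo: str):
    headers = {}
    token = os.environ.get("GITHUB_TOKEN")  # optional, avoids rate limiting
    if token:
        headers["Authorization"] = f"token {token}"
    branches, page_idx = [], 1
    while True:
        url = (f"https://api.github.com/repos/{owner}/{repo}/branches"
               f"?per_page=100&page={page_idx}")
        with urlopen(Request(url, headers=headers)) as resp:
            batch = json.loads(resp.read())
        if not batch:        # empty page: nothing left to fetch
            break
        branches.extend(b["name"] for b in batch)
        page_idx += 1        # advance inside the loop
    return branches
```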

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62072

Reviewed By: jbschlosser

Differential Revision: D29868539

Pulled By: malfet

fbshipit-source-id: 206082a0ba1208e9b15ff6c9c6cb71d2da74f1c3
2021-09-08 10:50:29 -07:00
5057e20247 Remove Caffe2 thread-pool leak warning (#60318) (#64651)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57273.

Some users reported that they dislike the Caffe2 thread-pool leak warning, as it floods their logs, and have requested disabling it, or have asked for a way to filter it.

It seems caffe2 pthreadpool already exists because of some dependency in the binary distribution, so `torch.set_num_threads()` invocation isn't required to reproduce the issue (as is otherwise the case when building from the master branch).

The test script in https://github.com/pytorch/pytorch/issues/60171 does have a `set_num_threads` invocation, which is why I was able to reproduce the issue after building from the master branch's source code.

cc malfet & ejguan, who have the authority to make a decision.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60318

Reviewed By: albanD

Differential Revision: D29265771

Pulled By: ezyang

fbshipit-source-id: 26f678af2fec45ef8f7e1d39a57559790eb9e94b

Co-authored-by: Winston Smith <76181208+imaginary-person@users.noreply.github.com>
2021-09-08 08:27:06 -07:00
81ddd719b4 Stop warning on .names() access in max_pool2d and max_pool2d_backward (#60059) (#64616)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60059

Fixes #60053.

The problem is that `.names()` always triggers the named tensor warning.
To not trigger it, one has to guard it with has_names:
`x.has_names() ? x.names() : DimnameList{}`

This is not the first time this has happened; we should probably
make it so that .names() doesn't raise a warning unless it is actually
populated with names. That's a little tricky to implement so I'm leaving
it for the future.

Test Plan:
- New test, also run `python test/test_nn.py -v -k "max_pool"` and
confirm there are no warnings.

Reviewed By: gchanan

Differential Revision: D29152737

Pulled By: zou3519

fbshipit-source-id: 89a2fdbe6a6064a7044b5b75f7d0c58e51e57509

Co-authored-by: Richard Zou <zou3519@gmail.com>
2021-09-08 08:26:31 -07:00
dce3a4047f [Release-1.9] .circleci: Remove conda-package-handling pin #64648
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62290

No longer needed.

Fixes nightly failures that we're observing as well:

```
Jul 27 07:33:02 Found conflicts! Looking for incompatible packages.
Jul 27 07:33:02 This can take several minutes.  Press CTRL-C to abort.
Jul 27 07:33:02 failed
Jul 27 07:33:02
Jul 27 07:33:02 UnsatisfiableError: The following specifications were found
Jul 27 07:33:02 to be incompatible with the existing python installation in your environment:
Jul 27 07:33:02
Jul 27 07:33:02 Specifications:
Jul 27 07:33:02
Jul 27 07:33:02   - conda-package-handling=1.6.0 -> python[version='>=2.7,<2.8.0a0|>=3.6,<3.7.0a0|>=3.7,<3.8.0a0|>=3.8,<3.9.0a0']
Jul 27 07:33:02
Jul 27 07:33:02 Your python: python=3.9
```

From: https://app.circleci.com/pipelines/github/pytorch/pytorch/356478/workflows/2102acf1-c92a-4a59-919c-61d32d3bcd71/jobs/15027876

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D29946501

Pulled By: seemethere

fbshipit-source-id: 3e9182f4cbcf2aab185dbbc21b7a6171746e2281

Co-authored-by: Eli Uriegas <eliuriegas@fb.com>
2021-09-08 08:20:19 -07:00
4d7401c2f7 [Release-1.9.1] Fix CI failures (#64617)
Pin numpy version to 1.20 to avoid mypy failures

Re-fork https://github.com/google/breakpad to be able to access the now-deleted commit 5485e473ed (which GitHub had not yet pruned at the time of this PR)

For the record, 1.9 was built with the breakpad integration forked at f7428bc397 plus the following patch:
```
diff --git a/src/client/linux/handler/exception_handler.cc b/src/client/linux/handler/exception_handler.cc
index ca353c40..6d1fbfd5 100644
--- a/src/client/linux/handler/exception_handler.cc
+++ b/src/client/linux/handler/exception_handler.cc
@@ -381,7 +381,7 @@ void ExceptionHandler::SignalHandler(int sig, siginfo_t* info, void* uc) {
   // successfully, restore the default handler. Otherwise, restore the
   // previously installed handler. Then, when the signal is retriggered, it will
   // be delivered to the appropriate handler.
-  if (handled) {
+  if (false && handled) {
     InstallDefaultHandler(sig);
   } else {
     RestoreHandlersLocked();
```

Last but not least: build OpenSSL before sccache, otherwise the sccache-"accelerated" build inside the container times out
2021-09-07 23:06:14 -07:00
e7cd59c7a0 [package] fix tutorial link (#60113) (#60117)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60113

The tutorial link in the docs was to an fb-only colab.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D29169818

Pulled By: suo

fbshipit-source-id: 374807c234a185bd515b8ffe1300e6cf8d821636
2021-06-16 12:32:59 -07:00
1a7c23c4c0 [docs] Add pickle security warning to package docs (#59959) (#59976)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59959

**Summary**
This commit replaces the warning on the `torch.package` documentation
page about the module not being publicly released (which will no longer
be true as of 1.9) with one that warns about security issues caused by
the use of the `pickle` module.

**Test Plan**
1) Built the docs locally.
2) Continuous integration.

<img width="877" alt="Captura de Pantalla 2021-06-14 a la(s) 11 22 05 a  m" src="https://user-images.githubusercontent.com/4392003/121940300-c98cab00-cd02-11eb-99dc-08e29632079a.png">

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29108429

Pulled By: SplitInfinity

fbshipit-source-id: 3a0aeac0dc804a31203bc5071efb1c5bd6ef9725
2021-06-14 13:20:26 -07:00
d69c22dd61 [docs] Add torch.package documentation for beta release (#59886)
**Summary**
This commit adds documentation for the `torch.package` module to
accompany its beta release in 1.9.

**Test Plan**
Continuous integration.
2021-06-11 13:43:27 -07:00
4ad4f6db7f hold references to storages during TorchScript serialization (#59672)
Fixes a serialization problem caused by using the memory address of storages for mobile and torch.package models.

 - https://github.com/pytorch/pytorch/pull/59642 hold references to storages during TorchScript serialization

Uses StorageContext to hold a reference to all storages seen during TorchScript serialization, allowing tensors to be created/destroyed during the serialization process. Tracking the storages solves the ABA memory problem.
2021-06-11 13:42:58 -07:00
90e67738b1 [Release/1.9] Link whole CuDNN for CUDA-11.1 (#59873)
* Move cublas dependency after CuDNN (#58287)

Summary:
Library linking order matters during static linking.
Not sure whether it's a bug or a feature, but if cublas is referenced
before CuDNN, it will be partially statically linked into the library,
even if it is not used.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58287

Reviewed By: janeyx99

Differential Revision: D28433165

Pulled By: malfet

fbshipit-source-id: 8dffa0533075126dc383428f838f7d048074205c

* [CMake] Split caffe2::cudnn into public and private (#59721)

Summary:
This is only important for builds where cuDNN is linked statically into libtorch_cpu.
Before this PR, PyTorch wheels often accidentally contained several partial copies of the cudnn_static library.
Splitting the interface into header-only (cudnn-public) and library+headers (cudnn-private) targets prevents this from happening.
This is a preliminary step towards optionally linking the whole cudnn library to work around the issue reported in https://github.com/pytorch/pytorch/issues/50153

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59721

Reviewed By: ngimel

Differential Revision: D29000967

Pulled By: malfet

fbshipit-source-id: f054df92b265e9494076ab16c247427b39da9336

* Add USE_WHOLE_CUDNN option (#59744)

Summary:
It is only enabled if USE_STATIC_CUDNN is enabled

Next step after https://github.com/pytorch/pytorch/pull/59721 towards resolving the fast-kernel stripping reported in https://github.com/pytorch/pytorch/issues/50153

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59744

Reviewed By: seemethere, ngimel

Differential Revision: D29007314

Pulled By: malfet

fbshipit-source-id: 7091e299c0c6cc2a8aa82fbf49312cecf3bb861a

* [Binary] Link whole CuDNN for CUDA-11.1 (#59802)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/50153

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59802

Reviewed By: driazati, seemethere

Differential Revision: D29033537

Pulled By: malfet

fbshipit-source-id: e816fc71f273ae0b4ba8a0621d5368a2078561a1
2021-06-11 10:38:31 -07:00
43c581aa62 Make detach return an alias even under inference mode (#59633) (#59757)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59633

Fixes #59614

This fix isn't 100% correct, but it appears to stem the bleeding.
A better fix would be to understand how to detect when function
implementations don't uphold required invariants, leading to
refcount disasters.
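
An illustrative check of the intended behavior (my example, not part of the PR): `detach()` should return an alias that shares storage even when called under inference mode.

```
import torch

with torch.inference_mode():
    a = torch.ones(3)
    b = a.detach()

# An alias shares the underlying storage instead of being a copy.
print(a.data_ptr() == b.data_ptr())  # True
```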

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D28962183

Pulled By: ezyang

fbshipit-source-id: 6ec71994666289dadef47bac363e6902df90b094
2021-06-11 10:04:14 -07:00
bc446f6a54 Fix test_randperm_device_compatibility for 1 GPU (#59484) (#59502)
Summary:
Do not try to create tensors on the 2nd device if device_count() == 1

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59484

Reviewed By: ngimel

Differential Revision: D28910673

Pulled By: malfet

fbshipit-source-id: e3517f31a463dd049ce8a5155409b7b716c8df18
2021-06-04 20:01:02 -07:00
abe996a7fb Move CUDA async warning to suffix (#59467) (#59501)
Summary:
After the change async error warnings look as follows:
```
$ python -c "import torch;torch.eye(3,3,device='cuda:777')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59467

Reviewed By: ngimel

Differential Revision: D28904360

Pulled By: malfet

fbshipit-source-id: 2a8fa5affed5b4ffcaa602c8ab2669061cde7db0
2021-06-04 20:00:55 -07:00
795df76568 Do not use gold linker for CUDA builds (#59490) (#59500)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59490

Reviewed By: agolynski, seemethere

Differential Revision: D28913160

Pulled By: malfet

fbshipit-source-id: d27092c252fc86424028abe146cf5f33a2f74544
2021-06-04 20:00:45 -07:00
3b9cd08901 Prefer accurate reciprocal on ARMv8 (#59361) (#59470)
Summary:
The default NEON-accelerated implementation of reciprocal uses vrecpeq_f32, which yields a Newton-Raphson approximation rather than the actual value.
Use regular NEON-accelerated division for the reciprocal and reciprocal square root operations.

This fixes `test_reference_numerics_hard_frac_cpu_float32`, `test_reference_numerics_normal_rsqrt_cpu_float32`, etc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59361

Reviewed By: mruberry

Differential Revision: D28870456

Pulled By: malfet

fbshipit-source-id: e634b0887cce7efb046ea1fd9b74424e0eceb164
2021-06-04 18:34:39 -07:00
226c274f70 Search for static OpenBLAS compiled with OpenMP (#59428) (#59463)
Summary:
Before this change, only dynamically linked OpenBLAS compiled with OpenMP could
be found.

Also get rid of the hardcoded code path for libgfortran.a in FindLAPACK.cmake.

Only affects aarch64 Linux builds.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59428

Reviewed By: agolynski

Differential Revision: D28891314

Pulled By: malfet

fbshipit-source-id: 5af55a14c85ac66551ad2805c5716bbefe8d55b2
2021-06-04 11:15:58 -07:00
ce24cab257 Fix torch.randperm for CUDA (#59352) (#59452)
Summary:
Context https://github.com/pytorch/pytorch/issues/58545

The logic is that we are going to keep it consistent for both
torch.randperm and torch.randint

1. Generators can have either a fully-specified or a non-fully-specified device
2. As long as the device type matches the result's device type, we don't error out
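
A hedged illustration of these two rules (assumes a CUDA device is available; not code from the PR):

```
import torch

if torch.cuda.is_available():
    # The device type is specified but the index is not: still acceptable.
    g = torch.Generator(device="cuda")
    # OK: the generator's device type (cuda) matches the result's device type.
    torch.randperm(10, generator=g, device="cuda:0")
    # A CPU generator with a CUDA result would raise an error instead.
```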

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59352

Test Plan:
```
python test/test_tensor_creation_ops.py -k TestRandomTensorCreation
```

Reviewed By: ngimel

Differential Revision: D28855920

Pulled By: zhouzhuojie

fbshipit-source-id: f8141a2c4b2f177e1aa7baec6999b65916cba02c
2021-06-04 10:23:29 -07:00
d98d113810 .circleci: Disable USE_GOLD_LINKER for CUDA 10.2 (#59413) (#59462)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59413

For CUDA 10.2 builds linked with the gold linker we were observing
crashes when exceptions were being raised

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D28888054

Pulled By: seemethere

fbshipit-source-id: f9b38147591721803ed3cac607510fe5bbc49d6d
(cherry picked from commit c7a3a13baba0d547c5c20579328b0b3d83b94656)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2021-06-04 10:22:51 -07:00
17a44c2bb5 Added missing namespaces for C++ API (#45736) (#59367)
Summary:
Hello,

depending on the build environment you may encounter
```c++
error: reference to 'optional' is ambiguous
```
when using the Torch-C++-API.

This PR adds `c10::` to avoid possible ambiguities with **std::optional** and does not introduce any functional change.

Fixes https://discuss.pytorch.org/t/linker-failed-with-ambiguous-references/36255 .

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45736

Reviewed By: dzhulgakov

Differential Revision: D24125123

Pulled By: VitalyFedyunin

fbshipit-source-id: df21420f0a2d0270227c28976a7a4218315cc107

Co-authored-by: Johannes Czech <QueensGambit@users.noreply.github.com>
2021-06-03 10:39:51 -07:00
26e6fa380e [vulkan] Remove constant duplication for Vulkan optimize_for_mobile (#59341)
ghstack-source-id: bb809586d27d1285660d1db2c3561b46d158f499
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59276
2021-06-03 09:45:56 -07:00
bf16699cc8 [Release-1.9] Disable failing ROCM-4.2 tests (#59339)
* [ROCm] disable test test_Conv2d_groups_nobias for ROCm (#59158)

Summary:
Disabling the test since its failing in ROCm4.2

Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59158

Reviewed By: mruberry

Differential Revision: D28808953

Pulled By: ngimel

fbshipit-source-id: 134f147ead6dc559d2cde49cf8343cd976e6c224

* [ROCm] disable test test_Conv2d_groups_nobias_v2 for ROCm (#58701)

Summary:
Disable test_Conv2d_groups_nobias_v2 test because it is failing on ROCm 4.2

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58701

Reviewed By: ngimel

Differential Revision: D28626651

Pulled By: mruberry

fbshipit-source-id: a74bdf45335ae2afee0aa5e3bece6e208e75a63f

Co-authored-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>
Co-authored-by: Kyle Chen <kylechen@amd.com>
2021-06-02 15:07:06 -07:00
6d4fe05502 Build with USE_GLOO_WITH_OPENSSL=1 (#59274)
Needed for https://github.com/pytorch/builder/pull/779

Co-authored-by: Your Name <driazati@users.noreply.github.com>
2021-06-02 08:18:25 -07:00
b046542f8a Add breakpad + debug builds (#59275)
This is the combination of #59236 and #58685 which will enable <insert builder PR here> to land on the release branch. This enables breakpad for minidump collection (which is still opt-in) and debug builds for the release.

Co-authored-by: Your Name <driazati@users.noreply.github.com>
2021-06-01 23:32:08 -07:00
5d57b9392c [pkg] Catch exceptions where dependency resolution gets invalid imports (#58573) (#59272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58573

Users can create invalid imports, like:
```
# in a top-level package
if False:
  from .. import foo
```

Since this code is never executed, it will not cause the module to fail to
load. But our dependency analysis walks every `import` statement in the AST,
and will attempt to resolve the (incorrectly formed) import, throwing an exception.

For posterity, the code that triggered this: https://git.io/JsCgM

Differential Revision: D28543980

Test Plan: Added a unit test

Reviewed By: Chillee

Pulled By: suo

fbshipit-source-id: 03b7e274633945b186500fab6f974973ef8c7c7d

Co-authored-by: Michael Suo <suo@fb.com>
2021-06-01 15:51:38 -07:00
f6a9351776 [pkg] simplifications to broken dependency handling (#58572) (#59273)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58572

Right now, we have three categories of error (broken, denied, unhandled). This
PR unifies them into a single "error" field in the node, with optional context.
It also generalizes how formatting of the error in PackagingError occurs.

Differential Revision: D28543982

Test Plan: sandcastle

Reviewed By: Chillee

Pulled By: suo

fbshipit-source-id: d99d37699ec2e172e3798763e60aafe9a66ed6f4

Co-authored-by: Michael Suo <suo@fb.com>
2021-06-01 15:51:30 -07:00
3071601491 [c10d] Fix monitored_barrier with wait_all_ranks (#58702) (#59266)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58702

Off-by-one error when determining whether some ranks failed with
`wait_all_ranks=True`. This wasn't caught by tests because the tests only
covered failure scenarios, not success scenarios with `wait_all_ranks=True`.
ghstack-source-id: 129559840
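
For reference, a usage sketch of the API in question (assumes the Gloo process group is already initialized; not code from this PR). With `wait_all_ranks=True` the barrier collects and reports every unresponsive rank instead of raising on the first one.

```
import datetime
import torch.distributed as dist

# Assumes dist.init_process_group("gloo", ...) has already run on every rank.
dist.monitored_barrier(timeout=datetime.timedelta(seconds=30), wait_all_ranks=True)
```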

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D28583235

fbshipit-source-id: a8f376efb13a3f36c788667acab86543c80aff59
2021-06-01 15:45:16 -07:00
d417a094f3 Document factory_kwargs in nn.Quantize + remove Attributes section (#59025) (#59045)
Summary:
The `factory_kwargs` kwarg was previously undocumented in `nn.Quantize`. Further, the `Attributes` section of the docs was improperly filled in, resulting in bad formatting. This section doesn't apply since `nn.Quantize` doesn't have parameters, so it has been removed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59025

Reviewed By: anjali411

Differential Revision: D28723889

Pulled By: jbschlosser

fbshipit-source-id: ba86429f66d511ac35042ebd9c6cc3da7b6b5805

Co-authored-by: Joel Schlosser <jbschlosser@fb.com>
2021-05-27 20:53:52 -07:00
1fdbbc96ae fix unique for discontiguous inputs (#59003) (#59055)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58959
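
A quick repro-style check of the expected behavior (my illustration, not from the PR): `unique()` on a non-contiguous view should agree with `unique()` on its contiguous copy.

```
import torch

x = torch.randint(0, 5, (4, 6))[:, ::2]  # step slicing yields a discontiguous view
assert not x.is_contiguous()
assert torch.equal(torch.unique(x), torch.unique(x.contiguous()))
```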

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59003

Reviewed By: mruberry

Differential Revision: D28714534

Pulled By: ngimel

fbshipit-source-id: d9bf82f54be5b5919e27281e49fad74e00d8b766
2021-05-27 20:52:42 -07:00
e761f16ad5 Collect kernel version (#58485) (#59121)
Summary:
The collect_env script should collect the kernel and glibc versions

Fixes https://github.com/pytorch/pytorch/issues/58387

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58485

Reviewed By: walterddr

Differential Revision: D28510564

Pulled By: malfet

fbshipit-source-id: ad3d4b93f51db052720bfaa4322138c55816921b
2021-05-27 17:12:31 -07:00
0544a765d3 Split CUDA SpectralOp (#58459) (#59120)
Summary:
Move all cuFFT related parts to SpectralOps.cpp
Leave only _fft_fill_with_conjugate_symmetry_cuda_ in SpectralOps.cu

Keep `CUDAHooks.cpp` in torch_cuda_cpp by introducing `at::cuda::detail::THCMagma_init` functor and registering it from global constructor in `THCTensorMathMagma.cu`

Move entire detail folder to torch_cuda_cpp library.

This is a no-op that helps greatly reduce binary size for CUDA-11.x builds by avoiding cufft/cudnn symbol duplication between torch_cuda_cpp (which makes most of the cuFFT calls) and torch_cuda_cu (which only needed it to compile SpectralOps.cu)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58459

Reviewed By: ngimel

Differential Revision: D28499001

Pulled By: malfet

fbshipit-source-id: 425a981beb383c18a79d4fbd9b49ddb4e5133291
2021-05-27 17:08:20 -07:00
1ea8ae5d93 Refactor GlooDeviceFactory::makeDeviceFor... (#58996) (#59118)
Summary:
`makeDeviceForHostname` and `makeDeviceForInterface` are almost
duplicates, except for different default argument values.

Create a generic `makeGlooDevice` anonymous function that takes both a host
name and an interface name, and call it from both
makeDeviceFor[Hostname|Interface].

Also solve two other minor issues:
 - Do not call `getenv("GLOO_DEVICE_TRANSPORT")` during library load
   time
 - Raise an exception rather than crash if GLOO_DEVICE_TRANSPORT is set to an unknown value

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58996

Reviewed By: pbelevich

Differential Revision: D28713324

Pulled By: malfet

fbshipit-source-id: cb33b438078d163e3ec6f047f2e5247b07d94f8d
2021-05-27 17:08:09 -07:00
97ca7303b0 [ROCm] fix JIT codegen (#57400) (#59116)
Summary:
Fixes upcoming changes that are part of ROCm 4.2 and affect PyTorch JIT.

- ROCM_VERSION macro must be available to both device and host compilation passes.
- Unifies some of CUDA and HIP differences in the code generated.
  - NAN / POS_INFINITY / NEG_INFINITY
  - Do not hipify `extern __shared__` -> `HIP_DYNAMIC_SHARED()` macro [deprecated]
- Differentiates bf16 codegen for HIP.
- Optionally provides missing macros when using hiprtc precompiled header feature.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57400

Reviewed By: ejguan

Differential Revision: D28421065

Pulled By: malfet

fbshipit-source-id: 215f476773c61d8b0d9d148a4e5f5d016f863074

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
2021-05-27 17:07:45 -07:00
ac94547143 Change link order for BUILD_SPLIT_CUDA option (#58437) (#59119)
Summary:
torch_cuda_cu depends on torch_cuda_cpp, so it should be linked first
Otherwise linker keeps lots of cudnn symbols for no good reason

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58437

Reviewed By: janeyx99

Differential Revision: D28496472

Pulled By: malfet

fbshipit-source-id: 338605ff755591476070c172a6ea0a0dcd0beb23
2021-05-27 17:07:39 -07:00
e2027acebe Add underscores to some internal names (#59105)
* Add underscores to some internal names

Summary:
Add underscores to some of the internal names

Test Plan:
python test/test_profiler.py -v

Reviewers: anjali411

[ghstack-poisoned]

* Add underscores to some internal names

Summary:
Add underscores to some of the internal names

Test Plan:
python test/test_profiler.py -v

Reviewers: anjali411

[ghstack-poisoned]

Co-authored-by: ilia-cher <iliacher@fb.com>
2021-05-27 14:19:13 -07:00
0896c6b1f0 fix nn.MHA scriptability (#58727) (#59072)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58727

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D28593830

Pulled By: bhosmer

fbshipit-source-id: 37dee9efededaea9985a2bf040df1ba4b46f6580
2021-05-27 10:30:10 -07:00
43f6675363 [PyTorch] Remove device check from a few indexing methods (#58800) (#59048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58800

These methods leverages TensorIterator which will handle
(or skip) device check.
ghstack-source-id: 129654358

Test Plan: CI && sandcastle

Reviewed By: ngimel

Differential Revision: D28622626

fbshipit-source-id: 6153299780d4f7bf286423520ba4cb60b554335e

Co-authored-by: Wenlei Xie <wxie@fb.com>
2021-05-27 10:28:56 -07:00
450f5c6f4d Add docstring for is_inference_mode_enabled (#59047) (#59085)
Summary:
Fixes #{issue number}

Testing:
```
>>> import torch
>>> torch.is_inference_mode_enabled.__doc__
'\nis_inference_mode_enabled(input) -> (bool)\n\nReturns True if inference mode is currently enabled.\n\nArgs:\n    input (Tensor): the input tensor.\n'
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59047

Reviewed By: ailzhang

Differential Revision: D28726991

Pulled By: soulitzer

fbshipit-source-id: c117c7d73e551a1b5f0e215f2aed528bf558ef7c
2021-05-27 10:27:32 -07:00
310e528a0d Add UninitializedBuffer to nn docs (#59021) (#59044)
Summary:
The `UninitializedBuffer` class was previously left out of `nn.rst`, so it was not included in the generated documentation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59021

Reviewed By: anjali411

Differential Revision: D28723044

Pulled By: jbschlosser

fbshipit-source-id: 71e15b0c7fabaf57e8fbdf7fbd09ef2adbdb36ad

Co-authored-by: Joel Schlosser <jbschlosser@fb.com>
2021-05-27 10:27:20 -07:00
e4161d0b2b Add sparse_csr_tensor to BC allow-list (#59093)
Fix for intentional regression in #59001

Co-authored-by: driazati <driazati@users.noreply.github.com>
2021-05-27 10:27:00 -07:00
016dc8cb68 Fix build regression caused by https://github.com/pytorch/pytorch/pull/58940 (#59008)
s/Vectorized/Vec256/

Vec256 was renamed to Vectorized on master after the branch cut
2021-05-26 11:55:50 -07:00
a3ea5cee52 [docs] Clarify batch_first behavior for nn.LSTM, nn.RNN, and nn.GRU (#58809) (#58958)
Summary:
Fixes the high-pri doc component of https://github.com/pytorch/pytorch/issues/4145.

To make the input / output shapes more readable for both `batch_first` states, this PR also introduces short dim names. Opinions welcome on the readability of the restructured docs!

Screenshot for `nn.LSTM`:
<img width="791" alt="Screen Shot 2021-05-24 at 5 11 39 PM" src="https://user-images.githubusercontent.com/75754324/119408130-389e5300-bcb3-11eb-9a4f-1df96a0a4d70.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58809
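
A small sketch of the shapes the clarified docs describe (illustrative values; N = batch, L = sequence length, H_in = input size, H_out = hidden size):

```
import torch

lstm = torch.nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 10, 8)   # (N, L, H_in) because batch_first=True
out, (h, c) = lstm(x)
print(out.shape)            # torch.Size([4, 10, 16]) -> (N, L, H_out)
print(h.shape)              # torch.Size([1, 4, 16]); h/c are unaffected by batch_first
```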

Reviewed By: gchanan

Differential Revision: D28685415

Pulled By: jbschlosser

fbshipit-source-id: e8c92e3d7e052071a505b55dca976fd2ef5a8307

Co-authored-by: Joel Schlosser <jbschlosser@fb.com>
2021-05-26 11:12:56 -07:00
dfc58f4faa Underscore prefix sparse_csr_tensor and to_sparse_csr (#59001)
* Underscore prefix sparse_csr_tensor and to_sparse_csr

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* fix lint

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2021-05-26 11:11:25 -07:00
b5e2635281 Add mish activation function (#58648) (#58940)
Summary:
See issue: https://github.com/pytorch/pytorch/issues/58375

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58648
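
A brief usage sketch of the new activation, which computes Mish(x) = x * tanh(softplus(x)) (assuming the functional and module entry points added here):

```
import torch
import torch.nn.functional as F

x = torch.randn(4)
print(F.mish(x))                      # functional form
print(torch.nn.Mish()(x))             # module form
print(x * torch.tanh(F.softplus(x)))  # reference formula for comparison
```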

Reviewed By: gchanan

Differential Revision: D28625390

Pulled By: jbschlosser

fbshipit-source-id: 23ea2eb7d5b3dc89c6809ff6581b90ee742149f4

Co-authored-by: Adnios <2780199647@qq.com>
2021-05-25 13:30:36 -07:00
9dfd2e7b56 Add no-grad inference mode note (#58513) (#58939)
Summary:
Adds a note to the autograd docs explaining the difference between several often-conflated mechanisms.
Also adds a link to this note from the docs in `grad_mode` and `nn.module`.
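
A hedged illustration of the distinction the note draws (my example, not from the PR): both contexts disable gradient tracking, but tensors created under inference mode cannot participate in autograd afterwards.

```
import torch

x = torch.ones(3, requires_grad=True)

with torch.no_grad():
    y = x * 2   # y.requires_grad is False, but y can still feed autograd later

with torch.inference_mode():
    z = x * 2   # z is an "inference tensor"; using it in autograd later errors

print(y.requires_grad, z.requires_grad)  # False False
```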

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58513

Reviewed By: gchanan

Differential Revision: D28651129

Pulled By: soulitzer

fbshipit-source-id: af9eb1749b641fc1b632815634eea36bf7979156
2021-05-25 13:30:29 -07:00
f0bdbb4ce1 [Release/1.9][DataLoader] Add keyword arg to meta and support abc for typing (#58848)
ghstack-source-id: 36e1ae3e08cf19da25c00a0a5e8a2bd0ab9530c3
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58450
2021-05-24 10:26:54 -07:00
bc4471c8c9 catch exception when running print regression (#58751) (#58752)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58751

Test Plan: https://github.com/pytorch/pytorch/issues/58752

Reviewed By: samestep

Differential Revision: D28605667

Pulled By: walterddr

fbshipit-source-id: 3796c924df8e50849dd08ecbeab612ba4f0c569b
2021-05-23 22:30:07 -07:00
317fd72526 Quote in setup-ci-env (#58637) (#58763)
Summary:
Do not put quotes around arguments that do not contain spaces in add_to_env_file.

The ENV file is used both by bash and by docker, which does not strip
quotes when they are present.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58637

Reviewed By: wconstab

Differential Revision: D28561159

Pulled By: malfet

fbshipit-source-id: 0843aad22703b6c3adebeb76175de1cfc1a974b5
2021-05-21 13:55:42 -07:00
b8d36033f0 Enables builds with Compute Library backend for oneDNN (#55913) (#58746)
Summary:
Since v1.7, oneDNN (MKL-DNN) has supported the use of Compute Library
for the Arm architecture to provide optimised convolution primitives
on AArch64.

This change enables the use of Compute Library in the PyTorch build.
Following the approach used to enable the use of CBLAS in MKLDNN,
it is enabled by setting the env vars USE_MKLDNN and USE_MKLDNN_ACL.
The location of the Compute Library build must be set using `ACL_ROOT_DIR`.

This is an extension of the work in https://github.com/pytorch/pytorch/pull/50400
which added support for the oneDNN/MKL-DNN backend on AArch64.

_Note: this assumes that Compute Library has been built and installed at
ACL_ROOT_DIR. Compute library can be downloaded here:
`https://github.com/ARM-software/ComputeLibrary`_

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55913

Reviewed By: ailzhang

Differential Revision: D28559516

Pulled By: malfet

fbshipit-source-id: 29d24996097d0a54efc9ab754fb3f0bded290005
2021-05-21 10:59:00 -07:00
47507259b9 [PyTorch Edge] Use lite interpreter as default and bump model version (#58630)
* [PyTorch Edge] bytecode version bump to v5 and enable share constant table

* [Pytorch] Build lite interpreter as default for iOS

* [Pytorch] Build lite interpreter as default for Android
2021-05-20 17:43:14 -07:00
e77e8d52da Add grid_sample to fp32 list (#58683) 2021-05-20 17:34:03 -07:00
b9fb6d1c7e fix nonzero perf regression (#58714) 2021-05-20 17:31:34 -07:00
1ea310bc8e [1.9] remove gate for beta feature (torchscript support in torch.package) (#58620) 2021-05-19 15:21:11 -07:00
8e6b8d8d46 Add shape documentation for CosineEmbeddingLoss (#58403) (#58590)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52732

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58403

Reviewed By: HDCharles

Differential Revision: D28480076

Pulled By: jbschlosser

fbshipit-source-id: c2c51e9da86e274e80126bbcabebb27270f2d2d0

Co-authored-by: Joel Schlosser <jbschlosser@fb.com>
2021-05-19 14:05:11 -07:00
87c46a5e32 [1.9] Remove torch.vmap (#58589)
torch.vmap is a prototype feature and should not be in the stable
binary. This PR:
- Removes the torch.vmap API
- Removes the documentation entry for torch.vmap
- Changes the vmap tests to use an internal API instead of torch.vmap.

Test Plan:
- Tested locally (test_torch, test_autograd, test_type_hints, test_vmap),
but also wait for CI.
2021-05-19 14:04:27 -07:00
5092364d78 [release/1.9] Pin builder and xla repos (#58514)
Pin builder to https://github.com/pytorch/builder/commits/release/1.9
Pin xla to https://github.com/pytorch/xla/tree/r1.9

Co-authored-by: driazati <driazati@users.noreply.github.com>
2021-05-18 18:52:06 -07:00
085a3bcb77 [release/1.9] Fix issues regarding binary_checkout (#58495)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2021-05-18 10:36:16 -07:00
5f0bbb38ec ci: Release branch specific changes
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
2021-05-17 17:30:59 -07:00
4182 changed files with 129723 additions and 244714 deletions

View File

@ -44,7 +44,7 @@ jobs:
is_official_build: ${{ parameters.is_official_build}}
# Sync and update PyTorch submodules
- bash: git submodule update --init --recursive --jobs 0
- bash: git submodule update --init --recursive
displayName: Update PyTorch submodules
# Build PyTorch and run unit tests - no packaging

View File

@ -47,7 +47,7 @@ jobs:
is_official_build: ${{ parameters.is_official_build}}
# Sync and update PyTorch submodules
- script: git submodule update --init --recursive --jobs 0
- script: git submodule update --init --recursive
displayName: Update PyTorch submodules
# Build PyTorch and run unit tests - no packaging

View File

@ -1,26 +0,0 @@
parameters:
name: ''
pool: ''
customMatrixes: ''
jobs:
- job: ${{parameters.name}}
timeoutInMinutes: 600
strategy:
matrix:
${{ insert }}: ${{parameters.customMatrixes}}
pool:
name: ${{ parameters.pool}}
steps:
# Clone PyTorch Tests repository
- bash: |
B64_PAT=$(echo -n ":$_ADOTOKEN" | base64)
git -c http.extraHeader="Authorization: Basic ${B64_PAT}" clone $(AZURE_DEVOPS_PYTORCH_TESTS_REPO_URL)
cd pytorch_tests
git checkout $(PYTORCH_TESTS_CHECKOUT_BRANCH)
env:
_ADOTOKEN: $(AZURE_DEVOPS_CLI_PAT)
displayName: Clone PyTorch Tests repo
- bash: |
bash $(Build.SourcesDirectory)/pytorch_tests/webapp/notify_webapp.sh
displayName: Notify Webapp

View File

@ -33,7 +33,7 @@ jobs:
# Clone PyTorch Tests repository
- bash: |
B64_PAT=$(echo -n ":$_ADOTOKEN" | base64)
B64_PAT=$(printf "%s"":$_ADOTOKEN" | base64)
git -c http.extraHeader="Authorization: Basic ${B64_PAT}" clone $(AZURE_DEVOPS_PYTORCH_TESTS_REPO_URL)
cd pytorch_tests
git checkout $(PYTORCH_TESTS_CHECKOUT_BRANCH)
@ -48,14 +48,4 @@ jobs:
_TS_CLONE_P: $(TS_CLONE_PASSWORD)
_TS_P: $(TS_PAT)
_TS_SM_P: $(TS_SM_PAT)
_AZUREML_CLONE_PASSWORD: $(AZUREML_CLONE_PASSWORD)
_SPPASSWORD: $(SPPASSWORD)
displayName: Run PyTorch Unit Tests
# Tests results are available outside the docker container since
# the current directory is mounted as a volume of the container.
- task: PublishTestResults@2
condition: always()
inputs:
testResultsFiles: '**/test-*.xml'
testRunTitle: 'Publish test results for Python'

View File

@ -47,11 +47,3 @@ jobs:
_TS_P: $(TS_PAT)
_TS_SM_P: $(TS_SM_PAT)
displayName: Run PyTorch Unit Tests
# Tests results are available outside the docker container since
# the current directory is mounted as a volume of the container.
- task: PublishTestResults@2
condition: always()
inputs:
testResultsFiles: '**\test-*.xml'
testRunTitle: 'Publish test results for Python'

View File

@ -8,7 +8,7 @@ steps:
connectionType: 'connectedServiceName'
serviceConnection: circleciconn
method: 'POST'
headers: '{"Content-Type":"application/json", "BranchName":"$(_TARGET_BRANCH_TO_CHECK)", "JobName":"$(TARGET_CIRCLECI_BUILD_PR)", "PRNumber":"$(_TARGET_PR_NUMBER)", "TargetCommit":"$(_TARGET_COMMIT)", "PlanUrl":"$(System.CollectionUri)", "ProjectId":"$(System.TeamProjectId)", "HubName":"$(System.HostType)", "PlanId":"$(System.PlanId)", "JobId":"$(System.JobId)", "TimelineId":"$(System.TimelineId)", "TaskInstanceId":"$(System.TaskInstanceId)", "AuthToken":"$(System.AccessToken)"}'
headers: '{"Content-Type":"application/json", "BranchName":"$(TARGET_BRANCH_TO_CHECK_PR)", "JobName":"$(TARGET_CIRCLECI_PR)", "PlanUrl":"$(System.CollectionUri)", "ProjectId":"$(System.TeamProjectId)", "HubName":"$(System.HostType)", "PlanId":"$(System.PlanId)", "JobId":"$(System.JobId)", "TimelineId":"$(System.TimelineId)", "TaskInstanceId":"$(System.TaskInstanceId)", "AuthToken":"$(System.AccessToken)"}'
body: ''
urlSuffix: 'api/JobStatus'
waitForCompletion: true

View File

@ -1,6 +1,6 @@
# Initiate 5 agentless-server waiting jobs to check on the
# status of PR artifact builds, for a maximum wait time of
# 11*60 min=660 mins. These jobs will pass immediately
# 5 * 60 min =300 minutes. These jobs will pass immediately
# once targeted CircleCI build is ready.
jobs:
@ -8,6 +8,7 @@ jobs:
pool: server
timeoutInMinutes: 60
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
@ -16,6 +17,7 @@ jobs:
timeoutInMinutes: 60
dependsOn: checkjob1
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
@ -24,6 +26,7 @@ jobs:
timeoutInMinutes: 60
dependsOn: checkjob2
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
@ -32,6 +35,7 @@ jobs:
timeoutInMinutes: 60
dependsOn: checkjob3
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
@ -40,53 +44,6 @@ jobs:
timeoutInMinutes: 60
dependsOn: checkjob4
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
- job: checkjob6
pool: server
timeoutInMinutes: 60
dependsOn: checkjob5
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
- job: checkjob7
pool: server
timeoutInMinutes: 60
dependsOn: checkjob6
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
- job: checkjob8
pool: server
timeoutInMinutes: 60
dependsOn: checkjob7
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
- job: checkjob9
pool: server
timeoutInMinutes: 60
dependsOn: checkjob8
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
- job: checkjob10
pool: server
timeoutInMinutes: 60
dependsOn: checkjob9
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
- job: checkjob11
pool: server
timeoutInMinutes: 60
dependsOn: checkjob10
continueOnError: true
steps:
- template: wheel-wait-job-template.yml

View File

@ -48,13 +48,3 @@ stages:
_PYTHON_VERSION: $(PYTHON_VERSION_WIN_2)
_CUDA_BUILD_VERSION: $(CUDA_BUILD_VERSION_WIN_2)
_RUN_TESTS: $(RUN_TESTS_WIN)
- stage: 'NotifyWebapp'
displayName: 'Notify Webapp that pipeline is finished'
dependsOn: NightlyCustomTests
condition: succeededOrFailed()
jobs:
- template: job_templates/notify-webapp-template.yml
parameters:
name: ubuntu_1804_CPU
pool: $(BUILD_POOL_LIN_1)

View File

@ -7,28 +7,14 @@
# 2) runs custom PyTorch unit-tests on PyTorch
# wheels generated during PR builds.
resources:
webhooks:
- webhook: GitHubPyTorchPRTrigger
connection: GitHubPyTorchPRTriggerConnection
filters:
- path: repositoryName
value: pytorch_tests
stages:
- stage: 'EnsureArtifactsReady'
displayName: 'Ensure PyTorch PR Artifacts are ready'
jobs:
- template: job_templates/wheel-wait-template.yml
variables:
_TARGET_BRANCH_TO_CHECK: ${{parameters.GitHubPyTorchPRTrigger.TARGET_BRANCH_TO_CHECK_AZ_DEVOPS_PR}}
_TARGET_PR_NUMBER: ${{parameters.GitHubPyTorchPRTrigger.PR_NUMBER}}
_TARGET_COMMIT: ${{parameters.GitHubPyTorchPRTrigger.TARGET_COMMIT}}
- stage: 'PRCustomTests'
displayName: 'Run custom unit tests on PyTorch wheels'
dependsOn: EnsureArtifactsReady
condition: succeeded()
jobs:
- template: job_templates/pytorch-template-unix.yml
parameters:
@ -38,25 +24,7 @@ stages:
PR_Custom_Tests:
_PYTHON_VERSION: $(PYTHON_VERSION_PR)
_CUDA_BUILD_VERSION: $(CUDA_BUILD_VERSION_PR)
_TARGET_CIRCLECI_BUILD: $(TARGET_CIRCLECI_BUILD_PR)
_TARGET_BRANCH_TO_CHECK: ${{parameters.GitHubPyTorchPRTrigger.TARGET_BRANCH_TO_CHECK_AZ_DEVOPS_PR}}
_TARGET_PR_NUMBER: ${{parameters.GitHubPyTorchPRTrigger.PR_NUMBER}}
_TARGET_COMMIT: ${{parameters.GitHubPyTorchPRTrigger.TARGET_COMMIT}}
_TARGET_CIRCLECI_BUILD: $(TARGET_CIRCLECI_PR)
_TARGET_BRANCH_TO_CHECK: $(TARGET_BRANCH_TO_CHECK_PR)
_DOCKER_IMAGE: $(DOCKER_IMAGE_PR)
_RUN_TESTS: $(RUN_TESTS_PR)
- stage: 'NotifyWebapp'
displayName: 'Notify Webapp that pipeline is finished'
dependsOn: PRCustomTests
condition: succeededOrFailed()
jobs:
- template: job_templates/notify-webapp-template.yml
parameters:
name: ubuntu_1804_CPU
pool: $(BUILD_POOL_LIN_1)
customMatrixes:
PR_Notify_WebApp:
_TARGET_CIRCLECI_BUILD: $(TARGET_CIRCLECI_BUILD_PR)
_TARGET_BRANCH_TO_CHECK: ${{parameters.GitHubPyTorchPRTrigger.TARGET_BRANCH_TO_CHECK_AZ_DEVOPS_PR}}
_TARGET_PR_NUMBER: ${{parameters.GitHubPyTorchPRTrigger.PR_NUMBER}}
_TARGET_COMMIT: ${{parameters.GitHubPyTorchPRTrigger.TARGET_COMMIT}}

View File

@ -1,13 +1,3 @@
build --copt=--std=c++14
build --copt=-I.
build --copt=-isystem --copt bazel-out/k8-fastbuild/bin
# Configuration to disable tty features for environments like CI
build:no-tty --curses no
build:no-tty --progress_report_interval 10
build:no-tty --show_progress_rate_limit 10
# Configuration to build with GPU support
build:gpu --define=cuda=true
# define a separate build folder for faster switching between configs
build:gpu --platform_suffix=-gpu

View File

@ -1 +1 @@
4.2.1
3.1.0

View File

@ -343,6 +343,7 @@ All linux builds occur in docker images. The docker images are
* Has ALL CUDA versions installed. The script pytorch/builder/conda/switch_cuda_version.sh sets /usr/local/cuda to a symlink to e.g. /usr/local/cuda-10.0 to enable different CUDA builds
* Also used for cpu builds
* pytorch/manylinux-cuda90
* pytorch/manylinux-cuda92
* pytorch/manylinux-cuda100
* Also used for cpu builds

View File

@ -126,7 +126,6 @@ class PackageFormatConfigNode(ConfigNode):
self.props["python_versions"] = python_versions
self.props["package_format"] = package_format
def get_children(self):
if self.find_prop("os_name") == "linux":
return [LinuxGccConfigNode(self, v) for v in LINUX_GCC_CONFIG_VARIANTS[self.find_prop("package_format")]]

View File

@ -124,9 +124,9 @@ class Conf(object):
Output looks similar to:
- binary_upload:
name: binary_linux_manywheel_3_7m_cu113_devtoolset7_nightly_upload
name: binary_linux_manywheel_3_7m_cu92_devtoolset7_nightly_upload
context: org-member
requires: binary_linux_manywheel_3_7m_cu113_devtoolset7_nightly_test
requires: binary_linux_manywheel_3_7m_cu92_devtoolset7_nightly_test
filters:
branches:
only:
@ -134,7 +134,7 @@ class Conf(object):
tags:
only: /v[0-9]+(\\.[0-9]+)*-rc[0-9]+/
package_type: manywheel
upload_subfolder: cu113
upload_subfolder: cu92
"""
return {
"binary_upload": OrderedDict({

View File

@ -3,7 +3,6 @@ PHASES = ["build", "test"]
CUDA_VERSIONS = [
"102",
"111",
"113",
]
ROCM_VERSIONS = [

View File

@ -7,19 +7,26 @@ CONFIG_TREE_DATA = [
("5.4", [ # All this subtree rebases to master and then build
("3.6", [
("important", [X(True)]),
("parallel_tbb", [X(True)]),
("parallel_native", [X(True)]),
("pure_torch", [X(True)]),
]),
]),
# TODO: bring back libtorch test
("7", [X("3.6")]),
]),
("clang", [
("7", [
("5", [
("3.6", [
("asan", [
(True, [
("shard_test", [XImportant(True)]),
]),
]),
]),
]),
("7", [
("3.6", [
("onnx", [XImportant(True)]),
]),
]),
@ -27,28 +34,33 @@ CONFIG_TREE_DATA = [
("cuda", [
("10.2", [
("3.6", [
# Build are needed for slow_gradcheck
('build_only', [X(True)]),
("slow_gradcheck", [
# If you update this slow gradcheck, you should
# also update docker_definitions.py to make sure
# the docker image match the config used here
("shard_test", [X(True)]),
("libtorch", [
(True, [
('shard_test', [XImportant(True)]),
('build_only', [X(True)]),
]),
]),
]),
]),
("11.1", [
("3.8", [
("shard_test", [XImportant(True)]),
("libtorch", [
(True, [
('build_only', [X(True)]),
]),
]),
# UNCOMMENT THE BELOW TO REENABLE LIBTORCH
# ("libtorch", [
# (True, [
# ('build_only', [X(True)]),
# ]),
# ]),
]),
]),
]),
]),
("bionic", [
("clang", [
("9", [
("3.6", [
("noarch", [XImportant(True)]),
]),
]),
("9", [
("3.6", [
("xla", [XImportant(True)]),
@ -56,14 +68,31 @@ CONFIG_TREE_DATA = [
]),
]),
]),
# @jithunnair-amd believes Jenkins builds are sufficient
# ("rocm", [
# ("3.9", [
# ("3.6", [
# ('build_only', [XImportant(True)]),
# ]),
# ]),
# ]),
("cuda", [
("10.2", [
("3.9", [
("shard_test", [XImportant(True)]),
]),
]),
]),
("gcc", [
("9", [
("3.8", [
("coverage", [
(True, [
("shard_test", [XImportant(True)]),
]),
]),
]),
]),
]),
("rocm", [
("3.9", [
("3.6", [
('build_only', [XImportant(True)]),
]),
]),
]),
]),
]
@ -147,18 +176,10 @@ class ExperimentalFeatureConfigNode(TreeConfigNode):
"cuda_gcc_override": CudaGccOverrideConfigNode,
"coverage": CoverageConfigNode,
"pure_torch": PureTorchConfigNode,
"slow_gradcheck": SlowGradcheckConfigNode,
}
return next_nodes[experimental_feature]
class SlowGradcheckConfigNode(TreeConfigNode):
def init2(self, node_name):
self.props["is_slow_gradcheck"] = True
def child_constructor(self):
return ExperimentalFeatureConfigNode
class PureTorchConfigNode(TreeConfigNode):
def modify_label(self, label):
return "PURE_TORCH=" + str(label)


@ -31,7 +31,6 @@ class Conf:
is_libtorch: bool = False
is_important: bool = False
parallel_backend: Optional[str] = None
build_only: bool = False
@staticmethod
def is_test_phase(phase):
@ -113,8 +112,6 @@ class Conf:
parameters["resource_class"] = "xlarge"
if hasattr(self, 'filters'):
parameters['filters'] = self.filters
if self.build_only:
parameters['build_only'] = miniutils.quote(str(int(True)))
return parameters
def gen_workflow_job(self, phase):
@ -178,6 +175,35 @@ class DocPushConf(object):
}
}
# TODO Convert these to graph nodes
def gen_dependent_configs(xenial_parent_config):
extra_parms = [
(["multigpu"], "large"),
(["nogpu", "NO_AVX2"], None),
(["nogpu", "NO_AVX"], None),
(["slow"], "medium"),
]
configs = []
for parms, gpu in extra_parms:
c = Conf(
xenial_parent_config.distro,
["py3"] + parms,
pyver=xenial_parent_config.pyver,
cuda_version=xenial_parent_config.cuda_version,
restrict_phases=["test"],
gpu_resource=gpu,
parent_build=xenial_parent_config,
is_important=False,
)
configs.append(c)
return configs
def gen_docs_configs(xenial_parent_config):
configs = []
@ -185,7 +211,7 @@ def gen_docs_configs(xenial_parent_config):
HiddenConf(
"pytorch_python_doc_build",
parent_build=xenial_parent_config,
filters=gen_filter_dict(branches_list=["master", "nightly"],
filters=gen_filter_dict(branches_list=r"/.*/",
tags_list=RC_PATTERN),
)
)
@ -201,7 +227,7 @@ def gen_docs_configs(xenial_parent_config):
HiddenConf(
"pytorch_cpp_doc_build",
parent_build=xenial_parent_config,
filters=gen_filter_dict(branches_list=["master", "nightly"],
filters=gen_filter_dict(branches_list=r"/.*/",
tags_list=RC_PATTERN),
)
)
@ -212,6 +238,13 @@ def gen_docs_configs(xenial_parent_config):
branch="master",
)
)
configs.append(
HiddenConf(
"pytorch_doc_test",
parent_build=xenial_parent_config
)
)
return configs
@ -225,7 +258,7 @@ def gen_tree():
return configs_list
def instantiate_configs(only_slow_gradcheck):
def instantiate_configs():
config_list = []
@ -244,12 +277,8 @@ def instantiate_configs(only_slow_gradcheck):
is_onnx = fc.find_prop("is_onnx") or False
is_pure_torch = fc.find_prop("is_pure_torch") or False
is_vulkan = fc.find_prop("is_vulkan") or False
is_slow_gradcheck = fc.find_prop("is_slow_gradcheck") or False
parms_list_ignored_for_docker_image = []
if only_slow_gradcheck ^ is_slow_gradcheck:
continue
python_version = None
if compiler_name == "cuda" or compiler_name == "android":
python_version = fc.find_prop("pyver")
@ -313,10 +342,6 @@ def instantiate_configs(only_slow_gradcheck):
if build_only or is_pure_torch:
restrict_phases = ["build"]
if is_slow_gradcheck:
parms_list_ignored_for_docker_image.append("old")
parms_list_ignored_for_docker_image.append("gradcheck")
gpu_resource = None
if cuda_version and cuda_version != "10":
gpu_resource = "medium"
@ -336,7 +361,6 @@ def instantiate_configs(only_slow_gradcheck):
is_libtorch=is_libtorch,
is_important=is_important,
parallel_backend=parallel_backend,
build_only=build_only,
)
# run docs builds on "pytorch-linux-xenial-py3.6-gcc5.4". Docs builds
@ -357,19 +381,19 @@ def instantiate_configs(only_slow_gradcheck):
tags_list=RC_PATTERN)
c.dependent_tests = gen_docs_configs(c)
if cuda_version == "10.2" and python_version == "3.6" and not is_libtorch:
c.dependent_tests = gen_dependent_configs(c)
if (
compiler_name != "clang"
and not rocm_version
compiler_name == "gcc"
and compiler_version == "5.4"
and not is_libtorch
and not is_vulkan
and not is_pure_torch
and not is_noarch
and not is_slow_gradcheck
and not only_slow_gradcheck
and not build_only
and parallel_backend is None
):
distributed_test = Conf(
c.gen_build_name("") + "distributed",
bc_breaking_check = Conf(
"backward-compatibility-check",
[],
is_xla=False,
restrict_phases=["test"],
@ -377,16 +401,16 @@ def instantiate_configs(only_slow_gradcheck):
is_important=True,
parent_build=c,
)
c.dependent_tests.append(distributed_test)
c.dependent_tests.append(bc_breaking_check)
config_list.append(c)
return config_list
def get_workflow_jobs(only_slow_gradcheck=False):
def get_workflow_jobs():
config_list = instantiate_configs(only_slow_gradcheck)
config_list = instantiate_configs()
x = []
for conf_options in config_list:


@ -6,39 +6,35 @@ from cimodel.data.simple.util.branch_filters import gen_filter_dict, RC_PATTERN
# TODO: make this generated from a matrix rather than just a static list
IMAGE_NAMES = [
"pytorch-linux-bionic-cuda10.2-cudnn7-py3.8-gcc9",
"pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7",
"pytorch-linux-bionic-py3.6-clang9",
"pytorch-linux-bionic-cuda10.2-cudnn7-py3.6-clang9",
"pytorch-linux-bionic-py3.8-gcc9",
"pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7",
"pytorch-linux-xenial-cuda10.1-cudnn7-py3-gcc7",
"pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7",
"pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7",
"pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7",
"pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
"pytorch-linux-xenial-py3-clang5-asan",
"pytorch-linux-xenial-py3-clang7-asan",
"pytorch-linux-xenial-py3-clang7-onnx",
"pytorch-linux-xenial-py3.8",
"pytorch-linux-xenial-py3.6-clang7",
"pytorch-linux-xenial-py3.6-gcc5.4", # this one is used in doc builds
"pytorch-linux-xenial-py3.6-gcc7.2",
"pytorch-linux-xenial-py3.6-gcc7",
"pytorch-linux-bionic-rocm3.9-py3.6",
"pytorch-linux-bionic-rocm4.0.1-py3.6",
"pytorch-linux-bionic-rocm4.1-py3.6",
"pytorch-linux-bionic-rocm4.2-py3.6",
"pytorch-linux-bionic-rocm4.3.1-py3.6",
]
# This entry should be an element from the list above
# This should contain the image matching the "slow_gradcheck" entry in
# pytorch_build_data.py
SLOW_GRADCHECK_IMAGE_NAME = "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
def get_workflow_jobs(only_slow_gradcheck=False):
def get_workflow_jobs():
"""Generates a list of docker image build definitions"""
ret = []
for image_name in IMAGE_NAMES:
if only_slow_gradcheck and image_name is not SLOW_GRADCHECK_IMAGE_NAME:
continue
parameters = OrderedDict({
"name": quote(f"docker-{image_name}"),
"image_name": quote(image_name),


@ -0,0 +1,78 @@
import cimodel.lib.miniutils as miniutils
from cimodel.data.simple.util.versions import MultiPartVersion, CudaVersion
from cimodel.data.simple.util.docker_constants import DOCKER_IMAGE_BASIC, DOCKER_IMAGE_CUDA_10_2
class GeConfigTestJob:
def __init__(self,
py_version,
gcc_version,
cuda_version,
variant_parts,
extra_requires,
use_cuda_docker=False,
build_env_override=None):
self.py_version = py_version
self.gcc_version = gcc_version
self.cuda_version = cuda_version
self.variant_parts = variant_parts
self.extra_requires = extra_requires
self.use_cuda_docker = use_cuda_docker
self.build_env_override = build_env_override
def get_all_parts(self, with_dots):
maybe_py_version = self.py_version.render_dots_or_parts(with_dots) if self.py_version else []
maybe_gcc_version = self.gcc_version.render_dots_or_parts(with_dots) if self.gcc_version else []
maybe_cuda_version = self.cuda_version.render_dots_or_parts(with_dots) if self.cuda_version else []
common_parts = [
"pytorch",
"linux",
"xenial",
] + maybe_cuda_version + maybe_py_version + maybe_gcc_version
return common_parts + self.variant_parts
def gen_tree(self):
resource_class = "gpu.medium" if self.use_cuda_docker else "large"
docker_image = DOCKER_IMAGE_CUDA_10_2 if self.use_cuda_docker else DOCKER_IMAGE_BASIC
full_name = "_".join(self.get_all_parts(False))
build_env = self.build_env_override or "-".join(self.get_all_parts(True))
props_dict = {
"name": full_name,
"build_environment": build_env,
"requires": self.extra_requires,
"resource_class": resource_class,
"docker_image": docker_image,
}
if self.use_cuda_docker:
props_dict["use_cuda_docker_runtime"] = miniutils.quote(str(1))
return [{"pytorch_linux_test": props_dict}]
WORKFLOW_DATA = [
GeConfigTestJob(
MultiPartVersion([3, 6], "py"),
MultiPartVersion([5, 4], "gcc"),
None,
["jit_legacy", "test"],
["pytorch_linux_xenial_py3_6_gcc5_4_build"]),
GeConfigTestJob(
None,
None,
CudaVersion(10, 2),
["cudnn7", "py3", "jit_legacy", "test"],
["pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build"],
use_cuda_docker=True,
),
]
def get_workflow_jobs():
return [item.gen_tree() for item in WORKFLOW_DATA]
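
For readers skimming this new file, the dictionary that gen_tree() yields for the first WORKFLOW_DATA entry looks roughly like the sketch below; the values are reconstructed from the class above and the docker image is a placeholder, not the real tag:

```python
# Approximate shape of GeConfigTestJob.gen_tree() output for the CPU entry
# (py3.6 / gcc5.4, no CUDA). Values are reconstructed for illustration only.
job = [{
    "pytorch_linux_test": {
        "name": "pytorch_linux_xenial_py3_6_gcc5_4_jit_legacy_test",
        "build_environment": "pytorch-linux-xenial-py3.6-gcc5.4-jit_legacy-test",
        "requires": ["pytorch_linux_xenial_py3_6_gcc5_4_build"],
        "resource_class": "large",               # "gpu.medium" when use_cuda_docker=True
        "docker_image": "<DOCKER_IMAGE_BASIC>",  # placeholder, not the real tag
    }
}]
print(job)
```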


@ -1,7 +1,7 @@
from cimodel.data.simple.util.versions import MultiPartVersion
import cimodel.lib.miniutils as miniutils
XCODE_VERSION = MultiPartVersion([12, 5, 1])
XCODE_VERSION = MultiPartVersion([12, 0, 0])
class ArchVariant:


@ -1,5 +1,4 @@
import cimodel.data.simple.ios_definitions as ios_definitions
import cimodel.lib.miniutils as miniutils
class IOSNightlyJob:
@ -44,8 +43,6 @@ class IOSNightlyJob:
props_dict["ios_arch"] = self.variant
props_dict["ios_platform"] = ios_definitions.get_platform(self.variant)
props_dict["name"] = self.gen_job_name()
props_dict["use_metal"] = miniutils.quote(str(int(True)))
props_dict["use_coreml"] = miniutils.quote(str(int(True)))
template_name = "_".join([
"binary",


@ -40,13 +40,11 @@ class WindowsJob:
target_arch = self.cuda_version.render_dots() if self.cuda_version else "cpu"
python_version = "3.8"
base_name_parts = [
"pytorch",
"windows",
self.vscode_spec.render(),
"py" + python_version.replace(".", ""),
"py36",
target_arch,
]
@ -58,7 +56,7 @@ class WindowsJob:
self.cudnn_version = 8 if self.cuda_version.major == 11 else 7
arch_env_elements = (
["cuda" + str(self.cuda_version.major) + "." + str(self.cuda_version.minor)]
["cuda" + str(self.cuda_version.major), "cudnn" + str(self.cudnn_version)]
if self.cuda_version
else ["cpu"]
)
@ -67,7 +65,7 @@ class WindowsJob:
["pytorch", "win"]
+ self.vscode_spec.get_elements()
+ arch_env_elements
+ ["py" + python_version.split(".")[0]]
+ ["py3"]
)
is_running_on_cuda = bool(self.cuda_version) and not self.force_on_cpu
@ -77,8 +75,7 @@ class WindowsJob:
else:
props_dict = {
"build_environment": build_environment_string,
"python_version": miniutils.quote(python_version),
"vs_version": miniutils.quote("16.8.6"),
"python_version": miniutils.quote("3.6"),
"vc_version": miniutils.quote(self.vscode_spec.dotted_version()),
"vc_year": miniutils.quote(str(self.vscode_spec.year)),
"vc_product": self.vscode_spec.get_product(),
@ -146,13 +143,20 @@ class VcSpec:
_VC2019 = VcSpec(2019)
WORKFLOW_DATA = [
# VS2019 CUDA-10.2
WindowsJob(None, _VC2019, CudaVersion(10, 2), master_only=True),
# VS2019 CUDA-10.2 force on cpu
WindowsJob(1, _VC2019, CudaVersion(10, 2), force_on_cpu=True, master_only=True),
# TODO: This test is disabled due to https://github.com/pytorch/pytorch/issues/59724
# WindowsJob('_azure_multi_gpu', _VC2019, CudaVersion(11, 1), multi_gpu=True, master_and_nightly=True),
# VS2019 CUDA-10.1
WindowsJob(None, _VC2019, CudaVersion(10, 1), master_only=True),
WindowsJob(1, _VC2019, CudaVersion(10, 1), master_only=True),
WindowsJob(2, _VC2019, CudaVersion(10, 1), master_only=True),
# VS2019 CUDA-11.1
WindowsJob(None, _VC2019, CudaVersion(11, 1)),
WindowsJob(1, _VC2019, CudaVersion(11, 1), master_only=True),
WindowsJob(2, _VC2019, CudaVersion(11, 1), master_only=True),
WindowsJob('_azure_multi_gpu', _VC2019, CudaVersion(11, 1), multi_gpu=True, nightly_only=True),
# VS2019 CPU-only
WindowsJob(None, _VC2019, None),
WindowsJob(1, _VC2019, None),
WindowsJob(2, _VC2019, None),
WindowsJob(1, _VC2019, CudaVersion(10, 1), force_on_cpu=True, master_only=True),
]
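
As a rough guide to how these Windows jobs are named, the build environment string is assembled by joining the VS spec, the CUDA/cpu elements, and a Python tag; the sketch below uses simplified literals rather than the module's real helpers:

```python
# Illustrative reconstruction of the Windows build environment string.
vscode_elements = ["vs2019"]
arch_env_elements = ["cuda11.1"]  # ["cpu"] when no CUDA version is configured
python_tag = ["py3"]

build_environment_string = "-".join(
    ["pytorch", "win"] + vscode_elements + arch_env_elements + python_tag
)
print(build_environment_string)  # pytorch-win-vs2019-cuda11.1-py3
```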

File diff suppressed because it is too large.


@ -27,5 +27,5 @@ Docker builds are now defined with `.circleci/cimodel/data/simple/docker_definit
./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest
# Set flags (see build.sh) and build image
sudo bash -c 'PROTOBUF=1 ./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest'
sudo bash -c 'BREAKPAD=1 ./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest'
```


@ -78,108 +78,119 @@ TRAVIS_DL_URL_PREFIX="https://s3.amazonaws.com/travis-python-archives/binaries/u
case "$image" in
pytorch-linux-xenial-py3.8)
ANACONDA_PYTHON_VERSION=3.8
CMAKE_VERSION=3.10.3
GCC_VERSION=7
# Do not install PROTOBUF, DB, and VISION as a test
;;
pytorch-linux-xenial-py3.6-gcc5.4)
ANACONDA_PYTHON_VERSION=3.6
CMAKE_VERSION=3.10.3
GCC_VERSION=5
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-py3.6-gcc7.2)
ANACONDA_PYTHON_VERSION=3.6
CMAKE_VERSION=3.10.3
GCC_VERSION=7
# Do not install PROTOBUF, DB, and VISION as a test
;;
pytorch-linux-xenial-py3.6-gcc7)
ANACONDA_PYTHON_VERSION=3.6
CMAKE_VERSION=3.10.3
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7)
CUDA_VERSION=10.0
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-cuda10.1-cudnn7-py3-gcc7)
CUDA_VERSION=10.1
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7)
CUDA_VERSION=10.2
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
CMAKE_VERSION=3.10.3
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7)
CUDA_VERSION=11.1
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.6
CMAKE_VERSION=3.10.3
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7)
CUDA_VERSION=11.3.0 # Deviating from major.minor to conform to nvidia's Docker image names
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.6
CMAKE_VERSION=3.10.3
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-py3-clang5-asan)
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=5.0
CMAKE_VERSION=3.10.3
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3-clang7-asan)
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=7
CMAKE_VERSION=3.10.3
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-py3-clang7-onnx)
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=7
CMAKE_VERSION=3.10.3
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-xenial-py3-clang5-android-ndk-r19c)
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=5.0
CMAKE_VERSION=3.10.3
LLVMDEV=yes
PROTOBUF=yes
ANDROID=yes
ANDROID_NDK_VERSION=r19c
GRADLE_VERSION=6.8.3
CMAKE_VERSION=3.7.0
NINJA_VERSION=1.9.0
;;
pytorch-linux-xenial-py3.6-clang7)
ANACONDA_PYTHON_VERSION=3.6
CMAKE_VERSION=3.10.3
CLANG_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-bionic-py3.6-clang9)
ANACONDA_PYTHON_VERSION=3.6
@ -187,6 +198,7 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
VULKAN_SDK_VERSION=1.2.162.1
SWIFTSHADER=yes
;;
@ -196,6 +208,8 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
BREAKPAD=yes
;;
pytorch-linux-bionic-cuda10.2-cudnn7-py3.6-clang9)
CUDA_VERSION=10.2
@ -205,6 +219,17 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-bionic-cuda10.2-cudnn7-py3.8-gcc9)
CUDA_VERSION=10.2
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7)
CUDA_VERSION=10.2
@ -214,6 +239,7 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
;;
pytorch-linux-bionic-cuda11.0-cudnn8-py3.6-gcc9)
CUDA_VERSION=11.0
@ -223,14 +249,25 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
ROCM_VERSION=3.9
;;
pytorch-linux-bionic-rocm4.0.1-py3.6)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
ROCM_VERSION=4.0.1
;;
pytorch-linux-bionic-rocm4.1-py3.6)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
ROCM_VERSION=4.1
;;
pytorch-linux-bionic-rocm4.2-py3.6)
@ -239,25 +276,16 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
ROCM_VERSION=4.2
;;
pytorch-linux-bionic-rocm4.3.1-py3.6)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=4.3.1
;;
*)
# Catch-all for builds that are not hardcoded.
PROTOBUF=yes
DB=yes
VISION=yes
BREAKPAD=yes
echo "image '$image' did not match an existing build configuration"
if [[ "$image" == *xenial* ]]; then
CMAKE_VERSION=3.10.3
fi
if [[ "$image" == *py* ]]; then
extract_version_from_image_name py ANACONDA_PYTHON_VERSION
fi
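
The catch-all branch leans on helpers that pull version numbers out of the image name; a hedged sketch of that kind of parsing (the helper name and regex below are illustrative, not the script's own):

```python
# Illustrative version extraction from an image name segment such as
# "py3.6" or "rocm4.2"; not the build script's actual helper.
import re

def extract_version(image_name, prefix):
    match = re.search(rf"{prefix}([0-9.]+)", image_name)
    return match.group(1) if match else None

image = "pytorch-linux-bionic-rocm4.2-py3.6"
print(extract_version(image, "py"))    # -> 3.6
print(extract_version(image, "rocm"))  # -> 4.2
```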
@ -292,7 +320,7 @@ if [ -n "${JENKINS:-}" ]; then
JENKINS_GID=$(id -g jenkins)
fi
tmp_tag=$(basename "$(mktemp -u)" | tr '[:upper:]' '[:lower:]')
tmp_tag="tmp-$(cat /dev/urandom | tr -dc 'a-z' | head -c 32)"
# Build image
# TODO: build-arg THRIFT is not turned on for any image, remove it once we confirm
@ -320,6 +348,7 @@ docker build \
--build-arg "GCC_VERSION=${GCC_VERSION}" \
--build-arg "CUDA_VERSION=${CUDA_VERSION}" \
--build-arg "CUDNN_VERSION=${CUDNN_VERSION}" \
--build-arg "BREAKPAD=${BREAKPAD}" \
--build-arg "ANDROID=${ANDROID}" \
--build-arg "ANDROID_NDK=${ANDROID_NDK_VERSION}" \
--build-arg "GRADLE_VERSION=${GRADLE_VERSION}" \


@ -0,0 +1,19 @@
#!/bin/bash
set -ex
git clone https://github.com/malfet/breakpad.git -b pytorch/release-1.9
pushd breakpad
git clone https://chromium.googlesource.com/linux-syscall-support src/third_party/lss
pushd src/third_party/lss
# same as with breakpad, there are no real releases for this repo so use a
# commit as the pin
git checkout e1e7b0ad8ee99a875b272c8e33e308472e897660
popd
./configure
make
make install
popd
rm -rf breakpad


@ -4,9 +4,6 @@ set -ex
[ -n "$CMAKE_VERSION" ]
# Remove system cmake install so it won't get used instead
apt-get remove cmake -y
# Turn 3.6.3 into v3.6
path=$(echo "${CMAKE_VERSION}" | sed -e 's/\([0-9].[0-9]\+\).*/v\1/')
file="cmake-${CMAKE_VERSION}-Linux-x86_64.tar.gz"


@ -69,8 +69,8 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
}
# Install PyTorch conda deps, as per https://github.com/pytorch/pytorch README
# DO NOT install cmake here as it would install a version newer than 3.10, but
# we want to pin to version 3.10.
# DO NOT install cmake here as it would install a version newer than 3.5, but
# we want to pin to version 3.5.
SCIPY_VERSION=1.1.0
if [ "$ANACONDA_PYTHON_VERSION" = "3.9" ]; then
# Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
@ -86,7 +86,11 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
conda_install numpy=1.18.5 astunparse pyyaml mkl mkl-include setuptools cffi future six dataclasses typing_extensions
fi
if [[ "$CUDA_VERSION" == 10.2* ]]; then
if [[ "$CUDA_VERSION" == 10.0* ]]; then
conda_install magma-cuda100 -c pytorch
elif [[ "$CUDA_VERSION" == 10.1* ]]; then
conda_install magma-cuda101 -c pytorch
elif [[ "$CUDA_VERSION" == 10.2* ]]; then
conda_install magma-cuda102 -c pytorch
elif [[ "$CUDA_VERSION" == 11.0* ]]; then
conda_install magma-cuda110 -c pytorch
@ -112,7 +116,6 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
boto3==1.16.34 \
coverage==5.5 \
hypothesis==4.53.2 \
expecttest==0.1.3 \
mypy==0.812 \
tb-nightly
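
The CUDA-to-magma branches earlier in this file amount to a prefix lookup; a small illustrative sketch that mirrors only the branches shown:

```python
# Illustrative mapping from CUDA_VERSION prefix to the conda magma package.
def magma_package(cuda_version):
    mapping = [
        ("10.0", "magma-cuda100"),
        ("10.1", "magma-cuda101"),
        ("10.2", "magma-cuda102"),
        ("11.0", "magma-cuda110"),
    ]
    for prefix, package in mapping:
        if cuda_version.startswith(prefix):
            return package
    return None

print(magma_package("10.2.89"))  # -> magma-cuda102
```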


@ -2,6 +2,23 @@
set -ex
# This function installs protobuf 2.6
install_protobuf_26() {
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
# On the nvidia/cuda:9-cudnn7-devel-centos7 image we need this symlink or
# else it will fail with
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-2.6.1.tar.gz
pushd "$pb_dir" && ./configure && make && make check && sudo make install && sudo ldconfig
popd
rm -rf $pb_dir
}
install_ubuntu() {
apt-get update
apt-get install -y --no-install-recommends \


@ -0,0 +1,4 @@
#!/bin/bash
sudo apt-get -qq update
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages libnccl-dev=2.5.6-1+cuda10.1 libnccl2=2.5.6-1+cuda10.1


@ -1,10 +1,4 @@
#!/bin/bash
sudo apt-get update
# also install ssh to avoid error of:
# --------------------------------------------------------------------------
# The value of the MCA parameter "plm_rsh_agent" was set to a path
# that could not be found:
# plm_rsh_agent: ssh : rsh
sudo apt-get install -y ssh
sudo apt-get install -y --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev


@ -2,8 +2,8 @@
set -ex
# This function installs protobuf 3.17
install_protobuf_317() {
# This function installs protobuf 2.6
install_protobuf_26() {
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
@ -12,32 +12,37 @@ install_protobuf_317() {
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/protocolbuffers/protobuf/releases/download/v3.17.3/protobuf-all-3.17.3.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-all-3.17.3.tar.gz
# -j2 to balance memory usage and speed.
# naked `-j` seems to use too much memory.
pushd "$pb_dir" && ./configure && make -j2 && make -j2 check && sudo make -j2 install && sudo ldconfig
curl -LO "https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-2.6.1.tar.gz
pushd "$pb_dir" && ./configure && make && make check && sudo make install && sudo ldconfig
popd
rm -rf $pb_dir
}
install_ubuntu() {
# Ubuntu 14.04 has cmake 2.8.12 as the default option, so we will
# Ubuntu 14.04 ships with protobuf 2.5, but ONNX needs protobuf >= 2.6
# so we install that here if on 14.04
# Ubuntu 14.04 also has cmake 2.8.12 as the default option, so we will
# install cmake3 here and use cmake3.
apt-get update
if [[ "$UBUNTU_VERSION" == 14.04 ]]; then
apt-get install -y --no-install-recommends cmake3
install_protobuf_26
else
apt-get install -y --no-install-recommends \
libprotobuf-dev \
protobuf-compiler
fi
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
install_protobuf_317
}
install_centos() {
install_protobuf_317
# Centos7 ships with protobuf 2.5, but ONNX needs protobuf >= 2.6
# so we always install that here
install_protobuf_26
}
# Install base packages depending on the base OS


@ -4,13 +4,9 @@ set -ex
install_magma() {
# "install" hipMAGMA into /opt/rocm/magma by copying after build
git clone https://bitbucket.org/icl/magma.git -b magma_ctrl_launch_bounds
git clone https://bitbucket.org/icl/magma.git
pushd magma
# The branch "magma_ctrl_launch_bounds" is having a fix over the below commit, so keeping the below comment for reference.
#git checkout 878b1ce02e9cfe4a829be22c8f911e9c0b6bd88f
# Work around non-ascii characters in certain magma sources; remove this after upstream magma fixes this.
perl -i.bak -pe 's/[^[:ascii:]]//g' sparse/control/magma_zfree.cpp
perl -i.bak -pe 's/[^[:ascii:]]//g' sparse/control/magma_zsolverinfo.cpp
git checkout 878b1ce02e9cfe4a829be22c8f911e9c0b6bd88f
cp make.inc-examples/make.inc.hip-gcc-mkl make.inc
echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc
echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib' >> make.inc
@ -19,7 +15,7 @@ install_magma() {
sed -i 's/^FOPENMP/#FOPENMP/g' make.inc
export PATH="${PATH}:/opt/rocm/bin"
make -f make.gen.hipMAGMA -j $(nproc)
make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda
LANG=C.UTF-8 make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda
make testing/testing_dgemm -j $(nproc) MKLROOT=/opt/conda
popd
mv magma /opt/rocm


@ -2,6 +2,23 @@
set -ex
# This function installs protobuf 2.6
install_protobuf_26() {
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
# On the nvidia/cuda:9-cudnn7-devel-centos7 image we need this symlink or
# else it will fail with
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-2.6.1.tar.gz
pushd "$pb_dir" && ./configure && make && make check && sudo make install && sudo ldconfig
popd
rm -rf $pb_dir
}
install_ubuntu() {
apt-get update
apt-get install -y --no-install-recommends \


@ -65,12 +65,6 @@ ADD ./common/install_openssl.sh install_openssl.sh
ENV OPENSSL_ROOT_DIR /opt/openssl
RUN bash ./install_openssl.sh
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
ADD ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# Install ccache/sccache (do this last, so we get priority in PATH)
ADD ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
@ -82,6 +76,11 @@ ADD ./common/install_jni.sh install_jni.sh
ADD ./java/jni.h jni.h
RUN bash ./install_jni.sh && rm install_jni.sh
# Install NCCL for when CUDA is version 10.1
ADD ./common/install_nccl.sh install_nccl.sh
RUN if [ "${CUDA_VERSION}" = 10.1 ]; then bash ./install_nccl.sh; fi
RUN rm install_nccl.sh
# Install Open MPI for CUDA
ADD ./common/install_openmpi.sh install_openmpi.sh
RUN if [ -n "${CUDA_VERSION}" ]; then bash install_openmpi.sh; fi


@ -82,6 +82,13 @@ RUN rm AndroidManifest.xml
RUN rm build.gradle
ENV INSTALLED_ANDROID ${ANDROID}
# (optional) Install breakpad
ARG BREAKPAD
ADD ./common/install_breakpad.sh install_breakpad.sh
RUN if [ -n "${BREAKPAD}" ]; then bash ./install_breakpad.sh; fi
RUN rm install_breakpad.sh
ENV INSTALLED_BREAKPAD ${BREAKPAD}
# (optional) Install Vulkan SDK
ARG VULKAN_SDK_VERSION
ADD ./common/install_vulkan_sdk.sh install_vulkan_sdk.sh


@ -13,8 +13,10 @@ from collections import namedtuple
import cimodel.data.binary_build_definitions as binary_build_definitions
import cimodel.data.pytorch_build_definitions as pytorch_build_definitions
import cimodel.data.simple.android_definitions
import cimodel.data.simple.bazel_definitions
import cimodel.data.simple.binary_smoketest
import cimodel.data.simple.docker_definitions
import cimodel.data.simple.ge_config_tests
import cimodel.data.simple.ios_definitions
import cimodel.data.simple.macos_definitions
import cimodel.data.simple.mobile_definitions
@ -133,6 +135,8 @@ def gen_build_workflows_tree():
cimodel.data.simple.android_definitions.get_workflow_jobs,
cimodel.data.simple.ios_definitions.get_workflow_jobs,
cimodel.data.simple.mobile_definitions.get_workflow_jobs,
cimodel.data.simple.ge_config_tests.get_workflow_jobs,
cimodel.data.simple.bazel_definitions.get_workflow_jobs,
cimodel.data.simple.binary_smoketest.get_workflow_jobs,
cimodel.data.simple.nightly_ios.get_workflow_jobs,
cimodel.data.simple.nightly_android.get_workflow_jobs,
@ -141,20 +145,14 @@ def gen_build_workflows_tree():
binary_build_definitions.get_post_upload_jobs,
binary_build_definitions.get_binary_smoke_test_jobs,
]
build_jobs = [f() for f in build_workflows_functions]
master_build_jobs = filter_master_only_jobs(build_jobs)
binary_build_functions = [
binary_build_definitions.get_binary_build_jobs,
binary_build_definitions.get_nightly_tests,
binary_build_definitions.get_nightly_uploads,
]
slow_gradcheck_jobs = [
pytorch_build_definitions.get_workflow_jobs,
cimodel.data.simple.docker_definitions.get_workflow_jobs,
]
build_jobs = [f() for f in build_workflows_functions]
master_build_jobs = filter_master_only_jobs(build_jobs)
return {
"workflows": {
"binary_builds": {
@ -169,10 +167,6 @@ def gen_build_workflows_tree():
"when": r"<< pipeline.parameters.run_master_build >>",
"jobs": master_build_jobs,
},
"slow_gradcheck_build": {
"when": r"<< pipeline.parameters.run_slow_gradcheck_build >>",
"jobs": [f(only_slow_gradcheck=True) for f in slow_gradcheck_jobs],
},
}
}


@ -55,14 +55,15 @@ else
echo "Can't tell what to checkout"
exit 1
fi
retry git submodule update --init --recursive --jobs 0
retry git submodule update --init --recursive
echo "Using Pytorch from "
git --no-pager log --max-count 1
popd
# Clone the Builder master repo
retry git clone -q https://github.com/pytorch/builder.git -b release/1.10 "$BUILDER_ROOT"
retry git clone -q https://github.com/pytorch/builder.git "$BUILDER_ROOT"
pushd "$BUILDER_ROOT"
git checkout release/1.9
echo "Using builder from "
git --no-pager log --max-count 1
popd


@ -22,7 +22,7 @@ export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
# sync submodules
cd ${PROJ_ROOT}
git submodule sync
git submodule update --init --recursive --jobs 0
git submodule update --init --recursive
# run build script
chmod a+x ${PROJ_ROOT}/scripts/build_ios.sh
@ -31,12 +31,8 @@ cat ${PROJ_ROOT}/scripts/build_ios.sh
echo "########################################################"
echo "IOS_ARCH: ${IOS_ARCH}"
echo "IOS_PLATFORM: ${IOS_PLATFORM}"
echo "USE_PYTORCH_METAL: ${USE_PYTORCH_METAL}"
echo "USE_COREML_DELEGATE: ${USE_COREML_DELEGATE}"
export IOS_ARCH=${IOS_ARCH}
export IOS_PLATFORM=${IOS_PLATFORM}
export USE_PYTORCH_METAL=${USE_PYTORCH_METAL}
export USE_COREML_DELEGATE=${USE_COREML_DELEGATE}
unbuffer ${PROJ_ROOT}/scripts/build_ios.sh 2>&1 | ts
#store the binary


@ -8,17 +8,16 @@ cd ${PROJ_ROOT}/ios/TestApp
# install fastlane
sudo gem install bundler && bundle install
# install certificates
echo "${IOS_CERT_KEY_2022}" >> cert.txt
echo "${IOS_CERT_KEY}" >> cert.txt
base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_root_cert
bundle exec fastlane install_dev_cert
bundle exec fastlane install_cert
# install the provisioning profile
PROFILE=PyTorch_CI_2022.mobileprovision
PROFILE=PyTorch_CI_2021.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
echo "${IOS_SIGN_KEY_2022}" >> cert.txt
echo "${IOS_SIGN_KEY}" >> cert.txt
base64 --decode cert.txt -o ${PROFILE}
rm cert.txt
# run the ruby build script
@ -26,5 +25,5 @@ if ! [ -x "$(command -v xcodebuild)" ]; then
echo 'Error: xcodebuild is not installed.'
exit 1
fi
PROFILE=PyTorch_CI_2022
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} -c ${PROFILE} -t ${IOS_DEV_TEAM_ID} -f Accelerate,MetalPerformanceShaders,CoreML
PROFILE=PyTorch_CI_2021
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} -c ${PROFILE} -t ${IOS_DEV_TEAM_ID}


@ -24,17 +24,14 @@ do
done
lipo -i ${ZIP_DIR}/install/lib/*.a
# copy the umbrella header and license
cp ${PROJ_ROOT}/ios/LibTorch-Lite.h ${ZIP_DIR}/src/
cp ${PROJ_ROOT}/ios/LibTorch.h ${ZIP_DIR}/src/
cp ${PROJ_ROOT}/LICENSE ${ZIP_DIR}/
# zip the library
export DATE="$(date -u +%Y%m%d)"
export IOS_NIGHTLY_BUILD_VERSION="1.10.0.${DATE}"
# libtorch_lite_ios_nightly_1.10.0.20210810.zip
ZIPFILE="libtorch_lite_ios_nightly_${IOS_NIGHTLY_BUILD_VERSION}.zip"
ZIPFILE=libtorch_ios_nightly_build.zip
cd ${ZIP_DIR}
#for testing
touch version.txt
echo "${IOS_NIGHTLY_BUILD_VERSION}" > version.txt
echo $(date +%s) > version.txt
zip -r ${ZIPFILE} install src version.txt LICENSE
# upload to aws
# Install conda then 'conda install' awscli
@ -51,14 +48,3 @@ set +x
# echo "AWS KEY: ${AWS_ACCESS_KEY_ID}"
# echo "AWS SECRET: ${AWS_SECRET_ACCESS_KEY}"
aws s3 cp ${ZIPFILE} s3://ossci-ios-build/ --acl public-read
# create a new LibTorch-Lite-Nightly.podspec from the template
echo "cp ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec.template ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec"
cp ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec.template ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
# update pod version
sed -i '' -e "s/IOS_NIGHTLY_BUILD_VERSION/${IOS_NIGHTLY_BUILD_VERSION}/g" ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
cat ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec
# push the new LibTorch-Lite-Nightly.podspec to CocoaPods
pod trunk push --verbose --allow-warnings --use-libraries --skip-import-validation ${PROJ_ROOT}/ios/LibTorch-Lite-Nightly.podspec


@ -4,14 +4,10 @@ echo "RUNNING ON $(uname -a) WITH $(nproc) CPUS AND $(free -m)"
set -eux -o pipefail
source /env
# Because most Circle executors only have 20 CPUs, using more causes OOMs w/ Ninja and nvcc parallelization
MEMORY_LIMIT_MAX_JOBS=18
NUM_CPUS=$(( $(nproc) - 2 ))
# Defaults here so they can be changed in one place
export MAX_JOBS=${MAX_JOBS:-$(( $(nproc) - 2 ))}
# Defaults here for **binary** linux builds so they can be changed in one place
export MAX_JOBS=${MAX_JOBS:-$(( ${NUM_CPUS} > ${MEMORY_LIMIT_MAX_JOBS} ? ${MEMORY_LIMIT_MAX_JOBS} : ${NUM_CPUS} ))}
if [[ "${DESIRED_CUDA}" == "cu111" || "${DESIRED_CUDA}" == "cu113" ]]; then
if [[ "${DESIRED_CUDA}" == "cu111" ]]; then
export BUILD_SPLIT_CUDA="ON"
fi
@ -26,9 +22,5 @@ else
build_script='manywheel/build.sh'
fi
if [[ "$CIRCLE_BRANCH" == "master" ]] || [[ "$CIRCLE_BRANCH" == release/* ]]; then
export BUILD_DEBUG_INFO=1
fi
# Build the package
SKIP_ALL_TESTS=1 "/builder/$build_script"
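
The MEMORY_LIMIT_MAX_JOBS cap in this hunk keeps Ninja/nvcc parallelism below what roughly 20-CPU executors can feed without OOMs; the same computation in Python, purely for clarity:

```python
# nproc - 2, but never more than the memory-limited ceiling of 18 jobs.
import os

MEMORY_LIMIT_MAX_JOBS = 18
num_cpus = (os.cpu_count() or 2) - 2
max_jobs = min(num_cpus, MEMORY_LIMIT_MAX_JOBS)
print(f"MAX_JOBS={max_jobs}")
```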


@ -14,10 +14,6 @@ chmod +x "$build_script"
# Build
cat >"$build_script" <<EOL
export PATH="$workdir/miniconda/bin:$PATH"
if [[ "$CIRCLE_BRANCH" == "nightly" ]]; then
export USE_PYTORCH_METAL_EXPORT=1
export USE_COREML_DELEGATE=1
fi
if [[ "$PACKAGE_TYPE" == conda ]]; then
"$workdir/builder/conda/build_pytorch.sh"
else


@ -62,7 +62,7 @@ if [[ -z "$DOCKER_IMAGE" ]]; then
if [[ "$PACKAGE_TYPE" == conda ]]; then
export DOCKER_IMAGE="pytorch/conda-cuda"
elif [[ "$DESIRED_CUDA" == cpu ]]; then
export DOCKER_IMAGE="pytorch/manylinux-cpu"
export DOCKER_IMAGE="pytorch/manylinux-cuda100"
else
export DOCKER_IMAGE="pytorch/manylinux-cuda${DESIRED_CUDA:2}"
fi
@ -85,7 +85,7 @@ PIP_UPLOAD_FOLDER='nightly/'
# We put this here so that OVERRIDE_PACKAGE_VERSION below can read from it
export DATE="$(date -u +%Y%m%d)"
#TODO: We should be pulling semver version from the base version.txt
BASE_BUILD_VERSION="1.10.0.dev$DATE"
BASE_BUILD_VERSION="1.9.0.dev$DATE"
# Change BASE_BUILD_VERSION to git tag when on a git tag
# Use 'git -C' to make doubly sure we're in the correct directory for checking
# the git tag
@ -148,7 +148,7 @@ if [[ "${BUILD_FOR_SYSTEM:-}" == "windows" ]]; then
fi
export DATE="$DATE"
export NIGHTLIES_DATE_PREAMBLE=1.10.0.dev
export NIGHTLIES_DATE_PREAMBLE=1.9.0.dev
export PYTORCH_BUILD_VERSION="$PYTORCH_BUILD_VERSION"
export PYTORCH_BUILD_NUMBER="$PYTORCH_BUILD_NUMBER"
export OVERRIDE_PACKAGE_VERSION="$PYTORCH_BUILD_VERSION"


@ -8,45 +8,15 @@ export CUDA_VERSION="${DESIRED_CUDA/cu/}"
export USE_SCCACHE=1
export SCCACHE_BUCKET=ossci-compiler-cache-windows
export NIGHTLIES_PYTORCH_ROOT="$PYTORCH_ROOT"
export VC_YEAR=2019
if [[ "${DESIRED_CUDA}" == "cu111" || "${DESIRED_CUDA}" == "cu113" ]]; then
export BUILD_SPLIT_CUDA="ON"
if [[ "$CUDA_VERSION" == "92" || "$CUDA_VERSION" == "100" ]]; then
export VC_YEAR=2017
else
export VC_YEAR=2019
fi
echo "Free Space for CUDA DEBUG BUILD"
if [[ "$CIRCLECI" == 'true' ]]; then
if [[ -d "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community" ]]; then
rm -rf "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community"
fi
if [[ -d "C:\\Program Files (x86)\\Microsoft Visual Studio 14.0" ]]; then
rm -rf "C:\\Program Files (x86)\\Microsoft Visual Studio 14.0"
fi
if [[ -d "C:\\Program Files (x86)\\Microsoft.NET" ]]; then
rm -rf "C:\\Program Files (x86)\\Microsoft.NET"
fi
if [[ -d "C:\\Program Files\\dotnet" ]]; then
rm -rf "C:\\Program Files\\dotnet"
fi
if [[ -d "C:\\Program Files (x86)\\dotnet" ]]; then
rm -rf "C:\\Program Files (x86)\\dotnet"
fi
if [[ -d "C:\\Program Files (x86)\\Microsoft SQL Server" ]]; then
rm -rf "C:\\Program Files (x86)\\Microsoft SQL Server"
fi
if [[ -d "C:\\Program Files (x86)\\Xamarin" ]]; then
rm -rf "C:\\Program Files (x86)\\Xamarin"
fi
if [[ -d "C:\\Program Files (x86)\\Google" ]]; then
rm -rf "C:\\Program Files (x86)\\Google"
fi
if [[ "${DESIRED_CUDA}" == "cu111" ]]; then
export BUILD_SPLIT_CUDA="ON"
fi
set +x
@ -62,8 +32,7 @@ if [[ "$CIRCLECI" == 'true' && -d "C:\\ProgramData\\Microsoft\\VisualStudio\\Pac
fi
if [[ "$CIRCLECI" == 'true' && -d "C:\\Microsoft" ]]; then
# don't use quotes here
rm -rf /c/Microsoft/AndroidNDK*
rm -rf "C:\\Microsoft\\Android*"
fi
echo "Free space on filesystem before build:"


@ -4,7 +4,13 @@ set -eux -o pipefail
source "/c/w/env"
export CUDA_VERSION="${DESIRED_CUDA/cu/}"
export VC_YEAR=2019
export VC_YEAR=2017
if [[ "$CUDA_VERSION" == "92" || "$CUDA_VERSION" == "100" ]]; then
export VC_YEAR=2017
else
export VC_YEAR=2019
fi
pushd "$BUILDER_ROOT"


@ -10,27 +10,18 @@ pt_checkout="/var/lib/jenkins/workspace"
# Since we're cat-ing this file, we need to escape all $'s
echo "cpp_doc_push_script.sh: Invoked with $*"
# for statements like ${1:-${DOCS_INSTALL_PATH:-docs/}}
# the order of operations goes:
# 1. Check if there's an argument $1
# 2. If no argument check for environment var DOCS_INSTALL_PATH
# 3. If no environment var fall back to default 'docs/'
# NOTE: It might seem weird to gather the second argument before gathering the first argument
# but since DOCS_INSTALL_PATH can be derived from DOCS_VERSION it's probably better to
# try and gather it first, just so we don't potentially break people who rely on this script
# Argument 2: What version of the Python API docs we are building.
version="${2:-${DOCS_VERSION:-master}}"
if [ -z "$version" ]; then
echo "error: cpp_doc_push_script.sh: version (arg2) not specified"
# Argument 1: Where to copy the built documentation for Python API to
# (pytorch.github.io/$install_path)
install_path="$1"
if [ -z "$install_path" ]; then
echo "error: cpp_doc_push_script.sh: install_path (arg1) not specified"
exit 1
fi
# Argument 1: Where to copy the built documentation for Python API to
# (pytorch.github.io/$install_path)
install_path="${1:-${DOCS_INSTALL_PATH:-docs/${DOCS_VERSION}}}"
if [ -z "$install_path" ]; then
echo "error: cpp_doc_push_script.sh: install_path (arg1) not specified"
# Argument 2: What version of the Python API docs we are building.
version="$2"
if [ -z "$version" ]; then
echo "error: cpp_doc_push_script.sh: version (arg2) not specified"
exit 1
fi
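
The fallback order described in the comment above (explicit argument, then the DOCS_* environment variable, then a hard-coded default) can be expressed in Python for clarity; this analogy is not part of the CI scripts:

```python
# Python analogy of the ${1:-${DOCS_INSTALL_PATH:-docs/}} precedence:
# 1. explicit positional argument, 2. environment variable, 3. default.
import os
import sys

def resolve_install_path(argv=None):
    argv = sys.argv if argv is None else argv
    if len(argv) > 1 and argv[1]:
        return argv[1]
    return os.environ.get("DOCS_INSTALL_PATH", "docs/")

print(resolve_install_path(["doc_push"]))  # "docs/" unless DOCS_INSTALL_PATH is set
```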


@ -13,27 +13,18 @@ echo "python_doc_push_script.sh: Invoked with $*"
set -ex
# for statements like ${1:-${DOCS_INSTALL_PATH:-docs/}}
# the order of operations goes:
# 1. Check if there's an argument $1
# 2. If no argument check for environment var DOCS_INSTALL_PATH
# 3. If no environment var fall back to default 'docs/'
# NOTE: It might seem weird to gather the second argument before gathering the first argument
# but since DOCS_INSTALL_PATH can be derived from DOCS_VERSION it's probably better to
# try and gather it first, just so we don't potentially break people who rely on this script
# Argument 2: What version of the docs we are building.
version="${2:-${DOCS_VERSION:-master}}"
if [ -z "$version" ]; then
echo "error: python_doc_push_script.sh: version (arg2) not specified"
# Argument 1: Where to copy the built documentation to
# (pytorch.github.io/$install_path)
install_path="$1"
if [ -z "$install_path" ]; then
echo "error: python_doc_push_script.sh: install_path (arg1) not specified"
exit 1
fi
# Argument 1: Where to copy the built documentation to
# (pytorch.github.io/$install_path)
install_path="${1:-${DOCS_INSTALL_PATH:-docs/${DOCS_VERSION}}}"
if [ -z "$install_path" ]; then
echo "error: python_doc_push_script.sh: install_path (arg1) not specified"
# Argument 2: What version of the docs we are building.
version="$2"
if [ -z "$version" ]; then
echo "error: python_doc_push_script.sh: version (arg2) not specified"
exit 1
fi
@ -43,7 +34,7 @@ if [ "$version" == "master" ]; then
fi
# Argument 3: The branch to push to. Usually is "site"
branch="${3:-${DOCS_BRANCH:-site}}"
branch="$3"
if [ -z "$branch" ]; then
echo "error: python_doc_push_script.sh: branch (arg3) not specified"
exit 1


@ -7,9 +7,6 @@ sudo rm -f /etc/apt/heroku.list
sudo rm -f /etc/apt/openjdk-r-ubuntu-ppa-xenial.list
sudo rm -f /etc/apt/partner.list
# To increase the network reliability, let apt decide which mirror is best to use
sudo sed -i -e 's/http:\/\/.*archive/mirror:\/\/mirrors/' -e 's/\/ubuntu\//\/mirrors.txt/' /etc/apt/sources.list
retry () {
$* || $* || $* || $* || $*
}
@ -43,9 +40,9 @@ if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L "https://nvidia.github.io/nvidia-docker/${distribution}/nvidia-docker.list" | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
retry sudo apt-get update -qq
sudo apt-get update -qq
# Necessary to get the `--gpus` flag to function within docker
retry sudo apt-get install -y nvidia-container-toolkit
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
else
# Explicitly remove nvidia docker apt repositories if not building for cuda
@ -67,7 +64,6 @@ add_to_env_file() {
}
add_to_env_file IN_CI 1
add_to_env_file CI_MASTER "${CI_MASTER:-}"
add_to_env_file COMMIT_SOURCE "${CIRCLE_BRANCH:-}"
add_to_env_file BUILD_ENVIRONMENT "${BUILD_ENVIRONMENT}"
add_to_env_file CIRCLE_PULL_REQUEST "${CIRCLE_PULL_REQUEST}"


@ -9,16 +9,10 @@ import sys
import time
import zipfile
from typing import Any, Dict, Generator, List
from tools.stats.scribe import (
send_to_scribe,
rds_write,
register_rds_schema,
schema_from_sample,
)
import requests
def get_size(file_dir: str) -> int:
def get_size(file_dir):
try:
# we should only expect one file; if not, something is wrong
file_name = glob.glob(os.path.join(file_dir, "*"))[0]
@ -28,21 +22,15 @@ def get_size(file_dir: str) -> int:
return 0
def base_data() -> Dict[str, Any]:
return {
"run_duration_seconds": int(
time.time() - os.path.getmtime(os.path.realpath(__file__))
),
}
def build_message(size: int) -> Dict[str, Any]:
build_env_split: List[Any] = os.environ.get("BUILD_ENVIRONMENT", "").split()
pkg_type, py_ver, cu_ver, *_ = build_env_split + [None, None, None]
def build_message(size):
pkg_type, py_ver, cu_ver, *_ = os.environ.get("BUILD_ENVIRONMENT", "").split() + [
None,
None,
None,
]
os_name = os.uname()[0].lower()
if os_name == "darwin":
os_name = "macos"
return {
"normal": {
"os": os_name,
@ -59,30 +47,38 @@ def build_message(size: int) -> Dict[str, Any]:
"time": int(time.time()),
"size": size,
"commit_time": int(os.environ.get("COMMIT_TIME", "0")),
"run_duration": int(
time.time() - os.path.getmtime(os.path.realpath(__file__))
),
"run_duration": int(time.time() - os.path.getmtime(os.path.realpath(__file__))),
},
}
def send_message(messages: List[Dict[str, Any]]) -> None:
logs = json.dumps(
[
{
"category": "perfpipe_pytorch_binary_size",
"message": json.dumps(message),
"line_escape": False,
}
for message in messages
]
def send_message(messages):
access_token = os.environ.get("SCRIBE_GRAPHQL_ACCESS_TOKEN")
if not access_token:
raise ValueError("Can't find access token from environment variable")
url = "https://graph.facebook.com/scribe_logs"
r = requests.post(
url,
data={
"access_token": access_token,
"logs": json.dumps(
[
{
"category": "perfpipe_pytorch_binary_size",
"message": json.dumps(message),
"line_escape": False,
}
for message in messages
]
),
},
)
res = send_to_scribe(logs)
print(res)
print(r.text)
r.raise_for_status()
def report_android_sizes(file_dir: str) -> None:
def gen_sizes() -> Generator[List[Any], None, None]:
def report_android_sizes(file_dir):
def gen_sizes():
# we should only expect one file; if not, something is wrong
aar_files = list(pathlib.Path(file_dir).rglob("pytorch_android-*.aar"))
if len(aar_files) != 1:
@ -105,7 +101,7 @@ def report_android_sizes(file_dir: str) -> None:
# report whole package size
yield ["aar", aar_file.name, os.stat(aar_file).st_size, 0]
def gen_messages() -> Generator[Dict[str, Any], None, None]:
def gen_messages():
android_build_type = os.environ.get("ANDROID_BUILD_TYPE")
for arch, lib, comp_size, uncomp_size in gen_sizes():
print(android_build_type, arch, lib, comp_size, uncomp_size)
@ -125,9 +121,7 @@ def report_android_sizes(file_dir: str) -> None:
"int": {
"time": int(time.time()),
"commit_time": int(os.environ.get("COMMIT_TIME", "0")),
"run_duration": int(
time.time() - os.path.getmtime(os.path.realpath(__file__))
),
"run_duration": int(time.time() - os.path.getmtime(os.path.realpath(__file__))),
"size": comp_size,
"raw_size": uncomp_size,
},
@ -142,41 +136,13 @@ if __name__ == "__main__":
)
if len(sys.argv) == 2:
file_dir = sys.argv[1]
if os.getenv("IS_GHA", "0") == "1":
sample_lib = {
"library": "abcd",
"size": 1234,
}
sample_data = {
**base_data(),
**sample_lib,
}
register_rds_schema("binary_size", schema_from_sample(sample_data))
print("checking dir: " + file_dir)
if "-android" in os.environ.get("BUILD_ENVIRONMENT", ""):
report_android_sizes(file_dir)
else:
if os.getenv("IS_GHA", "0") == "1":
build_path = pathlib.Path("build") / "lib"
libraries = [
(path.name, os.stat(path).st_size) for path in build_path.glob("*")
]
data = []
for name, size in libraries:
if name.strip() == "":
continue
library_data = {
"library": name,
"size": size,
}
data.append({**base_data(), **library_data})
rds_write("binary_size", data)
print(json.dumps(data, indent=2))
else:
print("checking dir: " + file_dir)
size = get_size(file_dir)
# Sending the message anyway if no size info is collected.
size = get_size(file_dir)
if size != 0:
try:
send_message([build_message(size)])
except Exception:
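
One idiom worth noting in build_message() above: the split of BUILD_ENVIRONMENT is padded with three Nones so the tuple unpacking never raises, even when the variable is unset or short. A minimal sketch with a made-up value:

```python
# Padding the split with Nones makes the unpack safe for short or empty values.
import os

build_env = os.environ.get("BUILD_ENVIRONMENT", "conda 3.8 cu111 linux-build")
pkg_type, py_ver, cu_ver, *_ = build_env.split() + [None, None, None]
print(pkg_type, py_ver, cu_ver)  # -> conda 3.8 cu111
```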


@ -1,8 +1,8 @@
# https://developercommunity.visualstudio.com/t/install-specific-version-of-vs-component/1142479
# Where to find the links: https://docs.microsoft.com/en-us/visualstudio/releases/2019/history#release-dates-and-build-numbers
# https://docs.microsoft.com/en-us/visualstudio/releases/2019/history#release-dates-and-build-numbers
# BuildTools from S3
$VS_DOWNLOAD_LINK = "https://s3.amazonaws.com/ossci-windows/vs${env:VS_VERSION}_BuildTools.exe"
# 16.8.5 BuildTools
$VS_DOWNLOAD_LINK = "https://download.visualstudio.microsoft.com/download/pr/20130c62-1bc8-43d6-b4f0-c20bb7c79113/145a319d79a83376915d8f855605e152ef5f6fa2b2f1d2dca411fb03722eea72/vs_BuildTools.exe"
$COLLECT_DOWNLOAD_LINK = "https://aka.ms/vscollect.exe"
$VS_INSTALL_ARGS = @("--nocache","--quiet","--wait", "--add Microsoft.VisualStudio.Workload.VCTools",
"--add Microsoft.Component.MSBuild",
@ -14,45 +14,32 @@ $VS_INSTALL_ARGS = @("--nocache","--quiet","--wait", "--add Microsoft.VisualStud
"--add Microsoft.VisualStudio.Component.VC.Tools.x86.x64",
"--add Microsoft.VisualStudio.ComponentGroup.NativeDesktop.Win81")
if (${env:INSTALL_WINDOWS_SDK} -eq "1") {
$VS_INSTALL_ARGS += "--add Microsoft.VisualStudio.Component.Windows10SDK.19041"
}
if (Test-Path "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe") {
$VS_VERSION_major = [int] ${env:VS_VERSION}.split(".")[0]
$existingPath = & "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe" -products "Microsoft.VisualStudio.Product.BuildTools" -version "[${env:VS_VERSION}, ${env:VS_VERSION_major + 1})" -property installationPath
if (($existingPath -ne $null) -and (!${env:CIRCLECI})) {
echo "Found correctly versioned existing BuildTools installation in $existingPath"
exit 0
}
$pathToRemove = & "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe" -products "Microsoft.VisualStudio.Product.BuildTools" -property installationPath
}
echo "Downloading VS installer from S3."
curl.exe --retry 3 -kL $VS_DOWNLOAD_LINK --output vs_installer.exe
if ($LASTEXITCODE -ne 0) {
echo "Download of the VS 2019 Version ${env:VS_VERSION} installer failed"
echo "Download of the VS 2019 Version 16.8.5 installer failed"
exit 1
}
if ($pathToRemove -ne $null) {
echo "Uninstalling $pathToRemove."
$VS_UNINSTALL_ARGS = @("uninstall", "--installPath", "`"$pathToRemove`"", "--quiet","--wait")
$process = Start-Process "${PWD}\vs_installer.exe" -ArgumentList $VS_UNINSTALL_ARGS -NoNewWindow -Wait -PassThru
$exitCode = $process.ExitCode
if (($exitCode -ne 0) -and ($exitCode -ne 3010)) {
echo "Original BuildTools uninstall failed with code $exitCode"
exit 1
if (Test-Path "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe") {
$existingPath = & "${env:ProgramFiles(x86)}\Microsoft Visual Studio\Installer\vswhere.exe" -products "Microsoft.VisualStudio.Product.BuildTools" -version "[16, 17)" -property installationPath
if ($existingPath -ne $null) {
echo "Found existing BuildTools installation in $existingPath"
$VS_UNINSTALL_ARGS = @("uninstall", "--installPath", "`"$existingPath`"", "--quiet","--wait")
$process = Start-Process "${PWD}\vs_installer.exe" -ArgumentList $VS_UNINSTALL_ARGS -NoNewWindow -Wait -PassThru
$exitCode = $process.ExitCode
if (($exitCode -ne 0) -and ($exitCode -ne 3010)) {
echo "Original BuildTools uninstall failed with code $exitCode"
exit 1
}
echo "Original BuildTools uninstalled"
}
echo "Other versioned BuildTools uninstalled."
}
echo "Installing Visual Studio version ${env:VS_VERSION}."
$process = Start-Process "${PWD}\vs_installer.exe" -ArgumentList $VS_INSTALL_ARGS -NoNewWindow -Wait -PassThru
Remove-Item -Path vs_installer.exe -Force
$exitCode = $process.ExitCode
if (($exitCode -ne 0) -and ($exitCode -ne 3010)) {
echo "VS 2019 installer exited with code $exitCode, which should be one of [0, 3010]."
echo "VS 2017 installer exited with code $exitCode, which should be one of [0, 3010]."
curl.exe --retry 3 -kL $COLLECT_DOWNLOAD_LINK --output Collect.exe
if ($LASTEXITCODE -ne 0) {
echo "Download of the VS Collect tool failed."
@ -60,6 +47,6 @@ if (($exitCode -ne 0) -and ($exitCode -ne 3010)) {
}
Start-Process "${PWD}\Collect.exe" -NoNewWindow -Wait -PassThru
New-Item -Path "C:\w\build-results" -ItemType "directory" -Force
Copy-Item -Path "${env:TEMP}\vslogs.zip" -Destination "C:\w\build-results\"
Copy-Item -Path "C:\Users\circleci\AppData\Local\Temp\vslogs.zip" -Destination "C:\w\build-results\"
exit 1
}


@ -1,74 +1,70 @@
#!/bin/bash
set -eux -o pipefail
case ${CUDA_VERSION} in
10.1)
cuda_installer_name="cuda_10.1.243_426.00_win10"
cuda_install_packages="nvcc_10.1 cuobjdump_10.1 nvprune_10.1 cupti_10.1 cublas_10.1 cublas_dev_10.1 cudart_10.1 cufft_10.1 cufft_dev_10.1 curand_10.1 curand_dev_10.1 cusolver_10.1 cusolver_dev_10.1 cusparse_10.1 cusparse_dev_10.1 nvgraph_10.1 nvgraph_dev_10.1 npp_10.1 npp_dev_10.1 nvrtc_10.1 nvrtc_dev_10.1 nvml_dev_10.1"
;;
10.2)
cuda_installer_name="cuda_10.2.89_441.22_win10"
cuda_install_packages="nvcc_10.2 cuobjdump_10.2 nvprune_10.2 cupti_10.2 cublas_10.2 cublas_dev_10.2 cudart_10.2 cufft_10.2 cufft_dev_10.2 curand_10.2 curand_dev_10.2 cusolver_10.2 cusolver_dev_10.2 cusparse_10.2 cusparse_dev_10.2 nvgraph_10.2 nvgraph_dev_10.2 npp_10.2 npp_dev_10.2 nvrtc_10.2 nvrtc_dev_10.2 nvml_dev_10.2"
;;
11.1)
cuda_major_version=${CUDA_VERSION%.*}
if [[ "$cuda_major_version" == "10" ]]; then
cuda_installer_name="cuda_10.1.243_426.00_win10"
msbuild_project_dir="CUDAVisualStudioIntegration/extras/visual_studio_integration/MSBuildExtensions"
cuda_install_packages="nvcc_10.1 cuobjdump_10.1 nvprune_10.1 cupti_10.1 cublas_10.1 cublas_dev_10.1 cudart_10.1 cufft_10.1 cufft_dev_10.1 curand_10.1 curand_dev_10.1 cusolver_10.1 cusolver_dev_10.1 cusparse_10.1 cusparse_dev_10.1 nvgraph_10.1 nvgraph_dev_10.1 npp_10.1 npp_dev_10.1 nvrtc_10.1 nvrtc_dev_10.1 nvml_dev_10.1"
elif [[ "$cuda_major_version" == "11" ]]; then
if [[ "${CUDA_VERSION}" == "11.1" ]]; then
cuda_installer_name="cuda_11.1.0_456.43_win10"
msbuild_project_dir="visual_studio_integration/CUDAVisualStudioIntegration/extras/visual_studio_integration/MSBuildExtensions"
cuda_install_packages="nvcc_11.1 cuobjdump_11.1 nvprune_11.1 nvprof_11.1 cupti_11.1 cublas_11.1 cublas_dev_11.1 cudart_11.1 cufft_11.1 cufft_dev_11.1 curand_11.1 curand_dev_11.1 cusolver_11.1 cusolver_dev_11.1 cusparse_11.1 cusparse_dev_11.1 npp_11.1 npp_dev_11.1 nvrtc_11.1 nvrtc_dev_11.1 nvml_dev_11.1"
;;
11.3)
elif [[ "${CUDA_VERSION}" == "11.3" ]]; then
cuda_installer_name="cuda_11.3.0_465.89_win10"
msbuild_project_dir="visual_studio_integration/CUDAVisualStudioIntegration/extras/visual_studio_integration/MSBuildExtensions"
cuda_install_packages="thrust_11.3 nvcc_11.3 cuobjdump_11.3 nvprune_11.3 nvprof_11.3 cupti_11.3 cublas_11.3 cublas_dev_11.3 cudart_11.3 cufft_11.3 cufft_dev_11.3 curand_11.3 curand_dev_11.3 cusolver_11.3 cusolver_dev_11.3 cusparse_11.3 cusparse_dev_11.3 npp_11.3 npp_dev_11.3 nvrtc_11.3 nvrtc_dev_11.3 nvml_dev_11.3"
;;
*)
echo "CUDA_VERSION $CUDA_VERSION is not supported yet"
else
echo "This should not happen! ABORT."
exit 1
;;
esac
if [[ -f "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/bin/nvcc.exe" ]]; then
echo "Existing CUDA v${CUDA_VERSION} installation found, skipping install"
fi
else
tmp_dir=$(mktemp -d)
(
# no need to popd after, the subshell shouldn't affect the parent shell
pushd "${tmp_dir}"
cuda_installer_link="https://ossci-windows.s3.amazonaws.com/${cuda_installer_name}.exe"
curl --retry 3 -kLO $cuda_installer_link
7z x ${cuda_installer_name}.exe -o${cuda_installer_name}
pushd ${cuda_installer_name}
mkdir cuda_install_logs
set +e
# This breaks for some reason if you quote cuda_install_packages
# shellcheck disable=SC2086
./setup.exe -s ${cuda_install_packages} -loglevel:6 -log:"$(pwd -W)/cuda_install_logs"
set -e
if [[ ! -f "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/bin/nvcc.exe" ]]; then
echo "CUDA installation failed"
mkdir -p /c/w/build-results
7z a "c:\\w\\build-results\\cuda_install_logs.7z" cuda_install_logs
exit 1
fi
)
rm -rf "${tmp_dir}"
echo "CUDA_VERSION $CUDA_VERSION is not supported yet"
exit 1
fi
if [[ -f "/c/Program Files/NVIDIA Corporation/NvToolsExt/bin/x64/nvToolsExt64_1.dll" ]]; then
echo "Existing nvtools installation found, skipping install"
else
# create tmp dir for download
tmp_dir=$(mktemp -d)
(
# no need to popd after, the subshell shouldn't affect the parent shell
pushd "${tmp_dir}"
curl --retry 3 -kLO https://ossci-windows.s3.amazonaws.com/NvToolsExt.7z
7z x NvToolsExt.7z -oNvToolsExt
mkdir -p "C:/Program Files/NVIDIA Corporation/NvToolsExt"
cp -r NvToolsExt/* "C:/Program Files/NVIDIA Corporation/NvToolsExt/"
)
rm -rf "${tmp_dir}"
if [[ "$cuda_major_version" == "11" && "${JOB_EXECUTOR}" == "windows-with-nvidia-gpu" ]]; then
cuda_install_packages="${cuda_install_packages} Display.Driver"
fi
cuda_installer_link="https://ossci-windows.s3.amazonaws.com/${cuda_installer_name}.exe"
curl --retry 3 -kLO $cuda_installer_link
7z x ${cuda_installer_name}.exe -o${cuda_installer_name}
cd ${cuda_installer_name}
mkdir cuda_install_logs
set +e
./setup.exe -s ${cuda_install_packages} -loglevel:6 -log:"$(pwd -W)/cuda_install_logs"
set -e
if [[ "${VC_YEAR}" == "2017" ]]; then
cp -r ${msbuild_project_dir}/* "C:/Program Files (x86)/Microsoft Visual Studio/2017/${VC_PRODUCT}/Common7/IDE/VC/VCTargets/BuildCustomizations/"
else
cp -r ${msbuild_project_dir}/* "C:/Program Files (x86)/Microsoft Visual Studio/2019/${VC_PRODUCT}/MSBuild/Microsoft/VC/v160/BuildCustomizations/"
fi
if ! ls "/c/Program Files/NVIDIA Corporation/NvToolsExt/bin/x64/nvToolsExt64_1.dll"
then
curl --retry 3 -kLO https://ossci-windows.s3.amazonaws.com/NvToolsExt.7z
7z x NvToolsExt.7z -oNvToolsExt
mkdir -p "C:/Program Files/NVIDIA Corporation/NvToolsExt"
cp -r NvToolsExt/* "C:/Program Files/NVIDIA Corporation/NvToolsExt/"
export NVTOOLSEXT_PATH="C:\\Program Files\\NVIDIA Corporation\\NvToolsExt\\"
fi
if ! ls "/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/bin/nvcc.exe"
then
echo "CUDA installation failed"
mkdir -p /c/w/build-results
7z a "c:\\w\\build-results\\cuda_install_logs.7z" cuda_install_logs
exit 1
fi
cd ..
rm -rf ./${cuda_installer_name}
rm -f ./${cuda_installer_name}.exe


@ -1,46 +1,28 @@
#!/bin/bash
set -eux -o pipefail
# This is typically blank but for CUDA 10* it'll be set to 10
windows_version_qualifier=""
cuda_major_version=${CUDA_VERSION%.*}
case ${CUDA_VERSION} in
10.1)
archive_version="v7.6.4.38"
windows_version_qualifier="10"
;;
10.2)
archive_version="v7.6.5.32"
windows_version_qualifier="10"
;;
11.1)
archive_version="v8.0.5.39"
;;
11.3)
archive_version="v8.2.0.53"
;;
*)
echo "CUDA_VERSION: ${CUDA_VERSION} not supported yet"
if [[ "$cuda_major_version" == "10" ]]; then
cudnn_installer_name="cudnn-${CUDA_VERSION}-windows10-x64-v7.6.4.38"
elif [[ "$cuda_major_version" == "11" ]]; then
if [[ "${CUDA_VERSION}" == "11.1" ]]; then
cudnn_installer_name="cudnn-${CUDA_VERSION}-windows-x64-v8.0.5.39"
elif [[ "${CUDA_VERSION}" == "11.3" ]]; then
cudnn_installer_name="cudnn-${CUDA_VERSION}-windows-x64-v8.2.0.53"
else
echo "This should not happen! ABORT."
exit 1
;;
esac
cudnn_installer_name="cudnn_installer.zip"
cudnn_installer_link="https://ossci-windows.s3.amazonaws.com/cudnn-${CUDA_VERSION}-windows${windows_version_qualifier}-x64-${archive_version}.zip"
cudnn_install_folder="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/"
if [[ -f "${cudnn_install_folder}/include/cudnn.h" ]]; then
echo "Existing cudnn installation found, skipping install..."
fi
else
tmp_dir=$(mktemp -d)
(
pushd "${tmp_dir}"
curl --retry 3 -o "${cudnn_installer_name}" "$cudnn_installer_link"
7z x "${cudnn_installer_name}" -ocudnn
# Use '${var:?}/*' to avoid potentially expanding to '/*'
# Remove all of the directories before attempting to copy files
rm -rf "${cudnn_install_folder:?}/*"
cp -rf cudnn/cuda/* "${cudnn_install_folder}"
)
rm -rf "${tmp_dir}"
echo "CUDNN for CUDA_VERSION $CUDA_VERSION is not supported yet"
exit 1
fi
cudnn_installer_link="https://ossci-windows.s3.amazonaws.com/${cudnn_installer_name}.zip"
curl --retry 3 -O $cudnn_installer_link
7z x ${cudnn_installer_name}.zip -ocudnn
cp -r cudnn/cuda/* "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v${CUDA_VERSION}/"
rm -rf cudnn
rm -f ${cudnn_installer_name}.zip
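For reference, a minimal Python sketch of the cuDNN archive URL this script resolves for CUDA 11.3 (the version values come from the case table above and the URL template from the curl line; illustration only, not part of the change):
import textwrap

CUDA_VERSION = "11.3"
windows_version_qualifier = ""   # only the CUDA 10.x branches set this to "10"
archive_version = "v8.2.0.53"    # from the 11.3 case above
url = ("https://ossci-windows.s3.amazonaws.com/"
       f"cudnn-{CUDA_VERSION}-windows{windows_version_qualifier}-x64-{archive_version}.zip")
print(url)  # https://ossci-windows.s3.amazonaws.com/cudnn-11.3-windows-x64-v8.2.0.53.zip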


@ -15,15 +15,11 @@ pytorch_params: &pytorch_params
build_only:
type: string
default: ""
ci_master:
type: string
default: ""
environment:
BUILD_ENVIRONMENT: << parameters.build_environment >>
DOCKER_IMAGE: << parameters.docker_image >>
USE_CUDA_DOCKER_RUNTIME: << parameters.use_cuda_docker_runtime >>
BUILD_ONLY: << parameters.build_only >>
CI_MASTER: << pipeline.parameters.run_master_build >>
resource_class: << parameters.resource_class >>
pytorch_android_params: &pytorch_android_params
@ -64,9 +60,6 @@ pytorch_ios_params: &pytorch_ios_params
lite_interpreter:
type: string
default: "1"
use_coreml:
type: string
default: "0"
environment:
BUILD_ENVIRONMENT: << parameters.build_environment >>
IOS_ARCH: << parameters.ios_arch >>
@ -74,7 +67,6 @@ pytorch_ios_params: &pytorch_ios_params
SELECTED_OP_LIST: << parameters.op_list >>
USE_PYTORCH_METAL: << parameters.use_metal >>
BUILD_LITE_INTERPRETER: << parameters.lite_interpreter >>
USE_COREML_DELEGATE: << parameters.use_coreml >>
pytorch_windows_params: &pytorch_windows_params
parameters:
@ -92,10 +84,7 @@ pytorch_windows_params: &pytorch_windows_params
default: "10.1"
python_version:
type: string
default: "3.8"
vs_version:
type: string
default: "16.8.6"
default: "3.6"
vc_version:
type: string
default: "14.16"
@ -113,7 +102,6 @@ pytorch_windows_params: &pytorch_windows_params
SCCACHE_BUCKET: "ossci-compiler-cache"
CUDA_VERSION: <<parameters.cuda_version>>
PYTHON_VERSION: <<parameters.python_version>>
VS_VERSION: <<parameters.vs_version>>
VC_VERSION: <<parameters.vc_version>>
VC_YEAR: <<parameters.vc_year>>
VC_PRODUCT: <<parameters.vc_product>>


@ -171,4 +171,4 @@ commands:
cd ~/project
export ANDROID_BUILD_TYPE="<< parameters.build_type >>"
export COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
python3 -m tools.stats.upload_binary_size_to_scuba android
python3 .circleci/scripts/upload_binary_size_to_scuba.py android


@ -17,9 +17,6 @@ parameters:
run_master_build:
type: boolean
default: false
run_slow_gradcheck_build:
type: boolean
default: false
executors:
windows-with-nvidia-gpu:


@ -22,14 +22,14 @@
command: |
ls -lah /final_pkgs
- run:
name: upload build & binary data
name: save binary size
no_output_timeout: "5m"
command: |
source /env
cd /pytorch && export COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
python3 -mpip install requests && \
SCRIBE_GRAPHQL_ACCESS_TOKEN=${SCRIBE_GRAPHQL_ACCESS_TOKEN} \
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
python3 /pytorch/.circleci/scripts/upload_binary_size_to_scuba.py || exit 0
- persist_to_workspace:
root: /
paths: final_pkgs
@ -239,7 +239,7 @@
binary_ios_build:
<<: *pytorch_ios_params
macos:
xcode: "12.5.1"
xcode: "12.0"
steps:
- attach_workspace:
at: ~/workspace
@ -266,7 +266,7 @@
binary_ios_upload:
<<: *pytorch_ios_params
macos:
xcode: "12.5.1"
xcode: "12.0"
steps:
- attach_workspace:
at: ~/workspace


@ -41,7 +41,7 @@
no_output_timeout: "1h"
command: |
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
tag=${CIRCLE_TAG:1:5}
target=${tag:-master}
@ -86,7 +86,7 @@
no_output_timeout: "1h"
command: |
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
tag=${CIRCLE_TAG:1:5}
target=${tag:-master}
@ -126,7 +126,6 @@
set -e
export IN_CI=1
export CROSS_COMPILE_ARM64=1
export JOB_BASE_NAME=$CIRCLE_JOB
# Install sccache
sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache
@ -163,7 +162,6 @@
command: |
set -e
export IN_CI=1
export JOB_BASE_NAME=$CIRCLE_JOB
# Install sccache
sudo curl --retry 3 https://s3.amazonaws.com/ossci-macos/sccache_v2.15 --output /usr/local/bin/sccache
@ -200,7 +198,6 @@
command: |
set -e
export IN_CI=1
export JOB_BASE_NAME=$CIRCLE_JOB
chmod a+x .jenkins/pytorch/macos-test.sh
unbuffer .jenkins/pytorch/macos-test.sh 2>&1 | ts
@ -211,14 +208,12 @@
set -ex
source /Users/distiller/workspace/miniconda3/bin/activate
pip install boto3
export IN_CI=1
export JOB_BASE_NAME=$CIRCLE_JOB
export PYTHONPATH="$PWD"
# Using the same IAM user to write stats to our OSS bucket
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_SCCACHE_S3_BUCKET_V4}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET_V4}
python -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
python tools/print_test_stats.py --upload-to-s3 --compare-with-s3 test
when: always
- store_test_results:
path: test/test-reports
@ -240,7 +235,6 @@
set -e
export IN_CI=1
export BUILD_LITE_INTERPRETER=1
export JOB_BASE_NAME=$CIRCLE_JOB
chmod a+x ${HOME}/project/.jenkins/pytorch/macos-lite-interpreter-build-test.sh
unbuffer ${HOME}/project/.jenkins/pytorch/macos-lite-interpreter-build-test.sh 2>&1 | ts
- store_test_results:
@ -264,7 +258,7 @@
no_output_timeout: "1h"
command: |
set -eux
docker_image_commit=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
docker_image_commit=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
docker_image_libtorch_android_x86_32=${docker_image_commit}-android-x86_32
docker_image_libtorch_android_x86_64=${docker_image_commit}-android-x86_64
@ -353,7 +347,7 @@
no_output_timeout: "1h"
command: |
set -eux
docker_image_commit=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
docker_image_commit=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
docker_image_libtorch_android_x86_32_gradle=${docker_image_commit}-android-x86_32-gradle
@ -390,7 +384,7 @@
no_output_timeout: "1h"
command: |
set -e
docker_image_libtorch_android_x86_32=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}-android-x86_32
docker_image_libtorch_android_x86_32=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}-android-x86_32
echo "docker_image_libtorch_android_x86_32: "${docker_image_libtorch_android_x86_32}
# x86
@ -437,7 +431,7 @@
echo "DOCKER_IMAGE: ${DOCKER_IMAGE}:${DOCKER_TAG}"
time docker pull ${DOCKER_IMAGE}:${DOCKER_TAG} >/dev/null
git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0
git submodule sync && git submodule update -q --init --recursive --depth 1
VOLUME_MOUNTS="-v /home/circleci/project/:/var/lib/jenkins/workspace"
export id=$(docker run --env-file "${BASH_ENV}" ${VOLUME_MOUNTS} --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE}:${DOCKER_TAG})
@ -453,7 +447,7 @@
pytorch_ios_build:
<<: *pytorch_ios_params
macos:
xcode: "12.5.1"
xcode: "12.0"
steps:
- checkout
- run_brew_for_ios_build
@ -467,17 +461,16 @@
# install fastlane
sudo gem install bundler && bundle install
# install certificates
echo ${IOS_CERT_KEY_2022} >> cert.txt
echo ${IOS_CERT_KEY} >> cert.txt
base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_root_cert
bundle exec fastlane install_dev_cert
bundle exec fastlane install_cert
# install the provisioning profile
PROFILE=PyTorch_CI_2022.mobileprovision
PROFILE=PyTorch_CI_2021.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
echo ${IOS_SIGN_KEY_2022} >> cert.txt
echo ${IOS_SIGN_KEY} >> cert.txt
base64 --decode cert.txt -o ${PROFILE}
rm cert.txt
- run:
@ -507,7 +500,7 @@
# sync submodules
cd ${PROJ_ROOT}
git submodule sync
git submodule update --init --recursive --depth 1 --jobs 0
git submodule update --init --recursive --depth 1
# export
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
@ -535,8 +528,12 @@
no_output_timeout: "30m"
command: |
set -e
if [ ${BUILD_LITE_INTERPRETER} == 0 ]; then
echo "Run Build Test is not for full jit, skipping."
exit 0
fi
PROJ_ROOT=/Users/distiller/project
PROFILE=PyTorch_CI_2022
PROFILE=PyTorch_CI_2021
# run the ruby build script
if ! [ -x "$(command -v xcodebuild)" ]; then
echo 'Error: xcodebuild is not installed.'
@ -560,28 +557,21 @@
if [ ${IOS_PLATFORM} != "SIMULATOR" ]; then
echo "not SIMULATOR build, skip it."
exit 0
elif [ ${BUILD_LITE_INTERPRETER} == 0 ]; then
echo "Run Simulator Tests is not for full jit, skipping."
exit 0
fi
WORKSPACE=/Users/distiller/workspace
PROJ_ROOT=/Users/distiller/project
source ~/anaconda/bin/activate
# use the pytorch nightly build to generate models
conda install pytorch torchvision -c pytorch-nightly --yes
# generate models for different backends
pip install torch torchvision --progress-bar off
#run unit test
cd ${PROJ_ROOT}/ios/TestApp/benchmark
mkdir -p ../models
python trace_model.py
if [ ${BUILD_LITE_INTERPRETER} == 1 ]; then
ruby setup.rb --lite 1
else
ruby setup.rb
fi
ruby setup.rb
cd ${PROJ_ROOT}/ios/TestApp
instruments -s -devices
if [ ${BUILD_LITE_INTERPRETER} == 1 ]; then
fastlane scan --only_testing TestAppTests/TestAppTests/testLiteInterpreter
else
fastlane scan --only_testing TestAppTests/TestAppTests/testFullJIT
fi
fastlane scan
pytorch_linux_bazel_build:
<<: *pytorch_params
machine:
@ -603,7 +593,7 @@
echo "Do NOT merge master branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT"
git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0
git submodule sync && git submodule update -q --init --recursive --depth 1
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
@ -614,7 +604,7 @@
# Push intermediate Docker image for next phase to use
if [ -z "${BUILD_ONLY}" ]; then
# Augment our output image name with bazel to avoid collisions
output_image=${DOCKER_IMAGE}:build-${DOCKER_TAG}-bazel-${CIRCLE_SHA1}
output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-bazel-${CIRCLE_SHA1}
export COMMIT_DOCKER_IMAGE=$output_image
docker commit "$id" ${COMMIT_DOCKER_IMAGE}
time docker push ${COMMIT_DOCKER_IMAGE}
@ -634,7 +624,7 @@
no_output_timeout: "90m"
command: |
set -e
output_image=${DOCKER_IMAGE}:build-${DOCKER_TAG}-bazel-${CIRCLE_SHA1}
output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-bazel-${CIRCLE_SHA1}
export COMMIT_DOCKER_IMAGE=$output_image
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
@ -694,7 +684,7 @@
no_output_timeout: "30m"
command: |
set -ex
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})


@ -30,11 +30,11 @@ jobs:
time docker pull ${DOCKER_IMAGE}:${DOCKER_TAG} >/dev/null
export id=$(docker run --env-file "${BASH_ENV}" --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE}:${DOCKER_TAG})
git submodule sync && git submodule update -q --init --recursive --depth 1 --jobs 0
git submodule sync && git submodule update -q --init --recursive --depth 1
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
export COMMAND='((echo "sudo chown -R jenkins workspace && export JOB_BASE_NAME="$CIRCLE_JOB" && cd workspace && .jenkins/pytorch/build.sh && find ${BUILD_ROOT} -type f -name "*.a" -or -name "*.o" -delete") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "sudo chown -R jenkins workspace && export CIRCLE_JOB="$CIRCLE_JOB" && cd workspace && .jenkins/pytorch/build.sh && find ${BUILD_ROOT} -type f -name "*.a" -or -name "*.o" -delete") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
@ -47,7 +47,7 @@ jobs:
# The xla build uses the same docker image as
# pytorch_linux_bionic_py3_6_clang9_build. In the push step, we have to
# distinguish between them so the test can pick up the correct image.
output_image=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
if [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-xla
elif [[ ${BUILD_ENVIRONMENT} == *"libtorch"* ]]; then
@ -74,14 +74,6 @@ jobs:
docker commit "$id" ${COMMIT_DOCKER_IMAGE}
time docker push ${COMMIT_DOCKER_IMAGE}
fi
- run:
name: upload build & binary data
no_output_timeout: "5m"
command: |
cd /pytorch && export COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
python3 -mpip install requests && \
SCRIBE_GRAPHQL_ACCESS_TOKEN=${SCRIBE_GRAPHQL_ACCESS_TOKEN} \
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- store_artifacts:
path: /home/circleci/project/dist
@ -105,7 +97,7 @@ jobs:
export DOCKER_TAG="f3d89a32912f62815e4feaeed47e564e887dffd6"
fi
# See Note [Special build images]
output_image=${DOCKER_IMAGE}:build-${DOCKER_TAG}-${CIRCLE_SHA1}
output_image=${DOCKER_IMAGE}:${DOCKER_TAG}-${CIRCLE_SHA1}
if [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-xla
elif [[ ${BUILD_ENVIRONMENT} == *"libtorch"* ]]; then
@ -158,14 +150,13 @@ jobs:
}
if is_vanilla_build; then
echo "apt-get update || apt-get install libgnutls30" | docker exec -u root -i "$id" bash
echo "apt-get install -y qemu-user gdb" | docker exec -u root -i "$id" bash
echo "apt-get update && apt-get install -y qemu-user gdb" | docker exec -u root -i "$id" bash
echo "cd workspace/build; qemu-x86_64 -g 2345 -cpu Broadwell -E ATEN_CPU_CAPABILITY=default ./bin/basic --gtest_filter=BasicTest.BasicTestCPU & gdb ./bin/basic -ex 'set pagination off' -ex 'target remote :2345' -ex 'continue' -ex 'bt' -ex='set confirm off' -ex 'quit \$_isvoid(\$_exitcode)'" | docker exec -u jenkins -i "$id" bash
else
echo "Skipping for ${BUILD_ENVIRONMENT}"
fi
- run:
name: Test
name: Run tests
no_output_timeout: "90m"
command: |
set -e
@ -174,16 +165,7 @@ jobs:
# =================== The following code will be executed inside Docker container ===================
set -ex
export SCRIBE_GRAPHQL_ACCESS_TOKEN="${SCRIBE_GRAPHQL_ACCESS_TOKEN}"
export JOB_BASE_NAME="$CIRCLE_JOB"
# temporary fix for https://github.com/pytorch/pytorch/issues/60746
if [ -z "$CIRCLE_PR_NUMBER" ]; then
if [[ $CIRCLE_BRANCH =~ .*pull.* ]]; then
export PR_NUMBER="$(echo $CIRCLE_BRANCH | sed 's/[^0-9]//g')"
export CIRCLE_PR_NUMBER="$PR_NUMBER"
fi
else
export PR_NUMBER="$CIRCLE_PR_NUMBER"
fi
export CIRCLE_JOB="$CIRCLE_JOB"
${PARALLEL_FLAGS}
cd workspace
EOL
@ -230,10 +212,11 @@ jobs:
export CIRCLE_SHA1="$CIRCLE_SHA1"
export CIRCLE_PR_NUMBER="${CIRCLE_PR_NUMBER:-}"
export CIRCLE_BRANCH="$CIRCLE_BRANCH"
export JOB_BASE_NAME="$CIRCLE_JOB"
export CIRCLE_JOB="$CIRCLE_JOB"
export CIRCLE_WORKFLOW_ID="$CIRCLE_WORKFLOW_ID"
cd workspace
python -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
export PYTHONPATH="\${PWD}"
python tools/print_test_stats.py --upload-to-s3 --compare-with-s3 test
EOL
echo "(cat docker_commands.sh | docker exec -u jenkins -e LANG=C.UTF-8 -i "$id" bash) 2>&1" > command.sh
unbuffer bash command.sh | ts
@ -262,10 +245,7 @@ jobs:
default: "10.1"
python_version:
type: string
default: "3.8"
vs_version:
type: string
default: "16.8.6"
default: "3.6"
vc_version:
type: string
default: "14.16"
@ -332,10 +312,7 @@ jobs:
default: "10.1"
python_version:
type: string
default: "3.8"
vs_version:
type: string
default: "16.8.6"
default: "3.6"
vc_version:
type: string
default: "14.16"
@ -391,8 +368,9 @@ jobs:
set -ex
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_WIN_BUILD_V1}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_WIN_BUILD_V1}
export PYTHONPATH="$PWD"
pip install typing_extensions boto3
python -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
python tools/print_test_stats.py --upload-to-s3 --compare-with-s3 test
when: always
- store_test_results:
path: test/test-reports


@ -1,3 +1,169 @@
scheduled-ci:
triggers:
- schedule:
# runs every 4 hours on the 45th minute
cron: "45 0,4,8,12,16,20 * * *"
filters:
branches:
only:
- master
jobs:
- docker_build_job:
name: "docker-pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
image_name: "pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
- pytorch_linux_build:
name: periodic_pytorch_xenial_cuda11_3_cudnn8_gcc7_build
requires:
- "docker-pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
build_environment: "pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7-build"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
- pytorch_linux_test:
name: periodic_pytorch_xenial_cuda11_3_cudnn8_gcc7_test
requires:
- periodic_pytorch_xenial_cuda11_3_cudnn8_gcc7_build
build_environment: "pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7-test"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
use_cuda_docker_runtime: "1"
resource_class: gpu.medium
- pytorch_linux_build:
name: periodic_libtorch_xenial_cuda11_3_cudnn8_gcc7_build
requires:
- "docker-pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
build_environment: "pytorch-libtorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7-build"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
- pytorch_windows_build:
build_environment: pytorch-win-vs2019-cuda11-cudnn8-py3
cuda_version: "11.3"
name: periodic_pytorch_windows_cuda11.3_build
python_version: "3.6"
use_cuda: "1"
vc_product: BuildTools
vc_version: "14.28.29333"
vc_year: "2019"
- pytorch_windows_test:
build_environment: pytorch-win-vs2019-cuda11-cudnn8-py3
cuda_version: "11.3"
executor: windows-with-nvidia-gpu
name: periodic_pytorch_windows_cuda11.3_test1
python_version: "3.6"
requires:
- periodic_pytorch_windows_cuda11.3_build
test_name: pytorch-windows-test1
use_cuda: "1"
vc_product: BuildTools
vc_version: "14.28.29333"
vc_year: "2019"
- pytorch_windows_test:
build_environment: pytorch-win-vs2019-cuda11-cudnn8-py3
cuda_version: "11.3"
executor: windows-with-nvidia-gpu
name: periodic_pytorch_windows_cuda11.3_test2
python_version: "3.6"
requires:
- periodic_pytorch_windows_cuda11.3_build
test_name: pytorch-windows-test2
use_cuda: "1"
vc_product: BuildTools
vc_version: "14.28.29333"
vc_year: "2019"
# The following allows these jobs to run on ci-all and release branches
debuggable-scheduled-ci:
jobs:
- docker_build_job:
name: "docker-pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
image_name: "pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
- pytorch_linux_build:
name: pytorch_linux_xenial_cuda11_3_cudnn8_py3_gcc7_build
requires:
- "docker-pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
build_environment: "pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7-build"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
- pytorch_linux_test:
name: pytorch_linux_xenial_cuda11_3_cudnn8_py3_gcc7_test
requires:
- pytorch_linux_xenial_cuda11_3_cudnn8_py3_gcc7_build
build_environment: "pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7-test"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
use_cuda_docker_runtime: "1"
resource_class: gpu.medium
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
- pytorch_linux_build:
name: pytorch_libtorch_linux_xenial_cuda11_3_cudnn8_py3_gcc7_build
requires:
- "docker-pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
build_environment: "pytorch-libtorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7-build"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7"
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
- pytorch_windows_build:
build_environment: pytorch-win-vs2019-cuda11-cudnn8-py3
cuda_version: "11.3"
name: pytorch_windows_vs2019_py36_cuda11.3_build
python_version: "3.6"
use_cuda: "1"
vc_product: BuildTools
vc_version: "14.28.29333"
vc_year: "2019"
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
- pytorch_windows_test:
build_environment: pytorch-win-vs2019-cuda11-cudnn8-py3
cuda_version: "11.3"
executor: windows-with-nvidia-gpu
name: pytorch_windows_vs2019_py36_cuda11.3_test1
python_version: "3.6"
requires:
- pytorch_windows_vs2019_py36_cuda11.3_build
test_name: pytorch-windows-test1
use_cuda: "1"
vc_product: BuildTools
vc_version: "14.28.29333"
vc_year: "2019"
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
- pytorch_windows_test:
build_environment: pytorch-win-vs2019-cuda11-cudnn8-py3
cuda_version: "11.3"
executor: windows-with-nvidia-gpu
name: pytorch_windows_vs2019_py36_cuda11.3_test2
python_version: "3.6"
requires:
- pytorch_windows_vs2019_py36_cuda11.3_build
test_name: pytorch-windows-test2
use_cuda: "1"
vc_product: BuildTools
vc_version: "14.28.29333"
vc_year: "2019"
filters:
branches:
only:
- /ci-all\/.*/
- /release\/.*/
# the following clones pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7's tests but enables
# slow tests and sets an environment variable so gradcheck runs with fast_mode=False
slow-gradcheck-scheduled-ci:
@ -20,18 +186,10 @@
build_environment: "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7-build"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
- pytorch_linux_test:
name: periodic_pytorch_xenial_cuda10_2_cudnn7_gcc7_old_gradcheck_test1
name: periodic_pytorch_xenial_cuda10_2_cudnn7_gcc7_old_gradcheck_tests
requires:
- periodic_pytorch_xenial_cuda10_2_cudnn7_gcc7_build
build_environment: "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7-old-gradcheck-test1"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
use_cuda_docker_runtime: "1"
resource_class: gpu.medium
- pytorch_linux_test:
name: periodic_pytorch_xenial_cuda10_2_cudnn7_gcc7_old_gradcheck_test2
requires:
- periodic_pytorch_xenial_cuda10_2_cudnn7_gcc7_build
build_environment: "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7-old-gradcheck-test2"
build_environment: "pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7-old-gradcheck-tests"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7"
use_cuda_docker_runtime: "1"
resource_class: gpu.medium


@ -9,7 +9,6 @@ bugprone-*,
-bugprone-reserved-identifier,
cppcoreguidelines-*,
-cppcoreguidelines-avoid-magic-numbers,
-cppcoreguidelines-avoid-non-const-global-variables,
-cppcoreguidelines-interfaces-global-init,
-cppcoreguidelines-macro-usage,
-cppcoreguidelines-owning-memory,
@ -22,7 +21,6 @@ cppcoreguidelines-*,
-cppcoreguidelines-pro-type-union-access,
-cppcoreguidelines-pro-type-vararg,
-cppcoreguidelines-special-member-functions,
-cppcoreguidelines-non-private-member-variables-in-classes,
-facebook-hte-RelativeInclude,
hicpp-exception-baseclass,
hicpp-avoid-goto,
@ -39,6 +37,5 @@ performance-*,
'
HeaderFilterRegex: 'torch/csrc/.*'
AnalyzeTemporaryDtors: false
WarningsAsErrors: '*'
CheckOptions:
...

.gitattributes

@ -1,4 +1 @@
*.bat text eol=crlf
.circleci/config.yml linguist-generated=true
.github/workflows/generated-*.yml linguist-generated=true
.github/generated-* linguist-generated=true
*.bat text eol=crlf


@ -1,5 +1,5 @@
---
name: "\U0001F680 Feature Request"
name: "\U0001F680Feature Request"
about: Submit a proposal/request for a new PyTorch feature
---


@ -1,8 +0,0 @@
self-hosted-runner:
labels:
- linux.2xlarge
- linux.8xlarge.nvidia.gpu
- linux.16xlarge.nvidia.gpu
- windows.4xlarge
- windows.8xlarge.nvidia.gpu
- bm-runner


@ -1,102 +0,0 @@
{
"__comment": "@generated DO NOT EDIT MANUALLY, Generation script: .github/scripts/generate_ci_workflows.py",
"label_rules": {
"ciflow/all": [
"libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-bionic-py3.6-clang9",
"linux-bionic-py3.8-gcc9-coverage",
"linux-xenial-cuda10.2-py3.6-gcc7",
"linux-xenial-cuda11.3-py3.6-gcc7",
"linux-xenial-py3.6-gcc5.4",
"linux-xenial-py3.6-gcc7-bazel-test",
"parallelnative-linux-xenial-py3.6-gcc5.4",
"periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-win-vs2019-cuda11.1-py3",
"puretorch-linux-xenial-py3.6-gcc5.4",
"win-vs2019-cpu-py3",
"win-vs2019-cuda10.2-py3",
"win-vs2019-cuda11.3-py3"
],
"ciflow/bazel": [
"linux-xenial-py3.6-gcc7-bazel-test"
],
"ciflow/coverage": [
"linux-bionic-py3.8-gcc9-coverage"
],
"ciflow/cpu": [
"linux-bionic-py3.6-clang9",
"linux-bionic-py3.8-gcc9-coverage",
"linux-xenial-py3.6-gcc5.4",
"linux-xenial-py3.6-gcc7-bazel-test",
"parallelnative-linux-xenial-py3.6-gcc5.4",
"puretorch-linux-xenial-py3.6-gcc5.4",
"win-vs2019-cpu-py3"
],
"ciflow/cuda": [
"libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-xenial-cuda10.2-py3.6-gcc7",
"linux-xenial-cuda11.3-py3.6-gcc7",
"periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-win-vs2019-cuda11.1-py3",
"win-vs2019-cuda10.2-py3",
"win-vs2019-cuda11.3-py3"
],
"ciflow/default": [
"linux-bionic-py3.6-clang9",
"linux-bionic-py3.8-gcc9-coverage",
"linux-xenial-cuda11.3-py3.6-gcc7",
"linux-xenial-py3.6-gcc5.4",
"linux-xenial-py3.6-gcc7-bazel-test",
"win-vs2019-cpu-py3",
"win-vs2019-cuda11.3-py3"
],
"ciflow/libtorch": [
"libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
"periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7"
],
"ciflow/linux": [
"libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
"libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-bionic-py3.6-clang9",
"linux-bionic-py3.8-gcc9-coverage",
"linux-xenial-cuda10.2-py3.6-gcc7",
"linux-xenial-cuda11.3-py3.6-gcc7",
"linux-xenial-py3.6-gcc5.4",
"linux-xenial-py3.6-gcc7-bazel-test",
"parallelnative-linux-xenial-py3.6-gcc5.4",
"periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-linux-xenial-cuda11.1-py3.6-gcc7",
"puretorch-linux-xenial-py3.6-gcc5.4"
],
"ciflow/noarch": [
"linux-bionic-py3.6-clang9"
],
"ciflow/scheduled": [
"periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-win-vs2019-cuda11.1-py3"
],
"ciflow/slow": [
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-xenial-cuda10.2-py3.6-gcc7"
],
"ciflow/win": [
"periodic-win-vs2019-cuda11.1-py3",
"win-vs2019-cpu-py3",
"win-vs2019-cuda10.2-py3",
"win-vs2019-cuda11.3-py3"
],
"ciflow/xla": [
"linux-bionic-py3.6-clang9"
]
},
"version": "v1"
}


@ -13,9 +13,3 @@ labels_to_circle_params:
- run_build
ci/master:
parameter: run_master_build
set_to_false:
- run_build
ci/slow-gradcheck:
parameter: run_slow_gradcheck_build
set_to_false:
- run_build


@ -1,2 +1 @@
tracking_issue: 24422
ciflow_tracking_issue: 64124


@ -1,6 +0,0 @@
#!/bin/bash -e
# Allows this script to be invoked from any directory:
cd "$(dirname "$0")"
python3 scripts/generate_ci_workflows.py


@ -27,18 +27,8 @@ runner_types:
os: linux
max_available: 50
disk_size: 150
linux.16xlarge.nvidia.gpu:
instance_type: g3.16xlarge
os: linux
max_available: 10
disk_size: 150
windows.4xlarge:
instance_type: c5d.4xlarge
instance_type: c5.4xlarge
os: windows
max_available: 200
disk_size: 256
windows.8xlarge.nvidia.gpu:
instance_type: p3.2xlarge
os: windows
max_available: 25
disk_size: 256


@ -1,57 +0,0 @@
#!/usr/bin/env python3
import argparse
import sys
import yaml
from pathlib import Path
REPO_ROOT = Path(__file__).resolve().parent.parent.parent
WORKFLOWS = REPO_ROOT / ".github" / "workflows"
def concurrency_key(filename: Path) -> str:
workflow_name = filename.with_suffix("").name.replace("_", "-")
if workflow_name.startswith("generated-"):
workflow_name = workflow_name[len("generated-"):]
return f"{workflow_name}-${{{{ github.event.pull_request.number || github.sha }}}}" \
"-${{ github.event_name == 'workflow_dispatch' }}"
def should_check(filename: Path) -> bool:
with open(filename, "r") as f:
content = f.read()
data = yaml.safe_load(content)
on = data.get("on", data.get(True, {}))
return "pull_request" in on
if __name__ == "__main__":
parser = argparse.ArgumentParser(
description="Ensure all relevant GitHub actions jobs will be cancelled based on a concurrency key"
)
args = parser.parse_args()
files = list(WORKFLOWS.glob("*.yml"))
errors_found = False
files = [f for f in files if should_check(f)]
for filename in files:
with open(filename, "r") as f:
data = yaml.safe_load(f)
expected = {
"group": concurrency_key(filename),
"cancel-in-progress": True,
}
if data.get("concurrency", None) != expected:
print(
f"'concurrency' incorrect or not found in '{filename.relative_to(REPO_ROOT)}'",
file=sys.stderr,
)
errors_found = True
if errors_found:
sys.exit(1)
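As an illustration, a minimal sketch of the concurrency key this checker expects for one generated workflow file (the filename is hypothetical; the logic mirrors concurrency_key above):
from pathlib import Path

filename = Path(".github/workflows/generated-linux-xenial-py3.6-gcc5.4.yml")  # hypothetical example
name = filename.with_suffix("").name.replace("_", "-")
if name.startswith("generated-"):
    name = name[len("generated-"):]
key = (f"{name}-${{{{ github.event.pull_request.number || github.sha }}}}"
       "-${{ github.event_name == 'workflow_dispatch' }}")
print(key)
# linux-xenial-py3.6-gcc5.4-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}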


@ -1,586 +0,0 @@
#!/usr/bin/env python3
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, Set
import jinja2
import json
import os
import sys
from typing_extensions import Literal
YamlShellBool = Literal["''", 1]
Arch = Literal["windows", "linux"]
DOCKER_REGISTRY = "308535385114.dkr.ecr.us-east-1.amazonaws.com"
GITHUB_DIR = Path(__file__).resolve().parent.parent
WINDOWS_CPU_TEST_RUNNER = "windows.4xlarge"
WINDOWS_CUDA_TEST_RUNNER = "windows.8xlarge.nvidia.gpu"
WINDOWS_RUNNERS = {
WINDOWS_CPU_TEST_RUNNER,
WINDOWS_CUDA_TEST_RUNNER,
}
LINUX_CPU_TEST_RUNNER = "linux.2xlarge"
LINUX_CUDA_TEST_RUNNER = "linux.8xlarge.nvidia.gpu"
LINUX_RUNNERS = {
LINUX_CPU_TEST_RUNNER,
LINUX_CUDA_TEST_RUNNER,
}
CUDA_RUNNERS = {
WINDOWS_CUDA_TEST_RUNNER,
LINUX_CUDA_TEST_RUNNER,
}
CPU_RUNNERS = {
WINDOWS_CPU_TEST_RUNNER,
LINUX_CPU_TEST_RUNNER,
}
LABEL_CIFLOW_ALL = "ciflow/all"
LABEL_CIFLOW_BAZEL = "ciflow/bazel"
LABEL_CIFLOW_COVERAGE = "ciflow/coverage"
LABEL_CIFLOW_CPU = "ciflow/cpu"
LABEL_CIFLOW_CUDA = "ciflow/cuda"
LABEL_CIFLOW_DEFAULT = "ciflow/default"
LABEL_CIFLOW_LIBTORCH = "ciflow/libtorch"
LABEL_CIFLOW_LINUX = "ciflow/linux"
LABEL_CIFLOW_SCHEDULED = "ciflow/scheduled"
LABEL_CIFLOW_SLOW = "ciflow/slow"
LABEL_CIFLOW_WIN = "ciflow/win"
LABEL_CIFLOW_XLA = "ciflow/xla"
LABEL_CIFLOW_NOARCH = "ciflow/noarch"
@dataclass
class CIFlowConfig:
enabled: bool = False
# For use to enable workflows to run on pytorch/pytorch-canary
run_on_canary: bool = False
labels: Set[str] = field(default_factory=set)
trigger_action: str = 'unassigned'
trigger_actor: str = 'pytorchbot'
root_job_name: str = 'ciflow_should_run'
root_job_condition: str = ''
# trigger_action_only controls if we listen only on the trigger_action of a pull_request.
# If it's False, we listen on all default pull_request actions; this is useful when
# ciflow (via probot) is not automated yet.
trigger_action_only: bool = False
def gen_root_job_condition(self) -> None:
# TODO: Make conditions strict
# At the beginning of the rollout of ciflow, we keep everything the same as what we have
# Once fully rolled out, we can have strict constraints
# e.g. ADD env.GITHUB_ACTOR == '{self.trigger_actor}
# REMOVE github.event.action !='{self.trigger_action}'
label_conditions = [
f"contains(github.event.pull_request.labels.*.name, '{label}')" for label in sorted(self.labels)]
if self.run_on_canary:
self.root_job_condition = "(github.repository_owner == 'pytorch') && "
else:
self.root_job_condition = "(github.repository == 'pytorch/pytorch') && "
self.root_job_condition += f"((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || " \
f"(github.event.action !='{self.trigger_action}') || " \
f"({' || '.join(label_conditions)}))"
def reset_root_job(self) -> None:
self.root_job_name = ''
self.root_job_condition = ''
def __post_init__(self) -> None:
if not self.enabled:
self.reset_root_job()
return
self.labels.add(LABEL_CIFLOW_ALL)
self.gen_root_job_condition()
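# Example (illustration only, not part of the original file): for enabled=True,
# run_on_canary=True and labels={'ciflow/default', 'ciflow/linux', 'ciflow/cpu'},
# __post_init__ adds 'ciflow/all' and gen_root_job_condition() produces:
#   (github.repository_owner == 'pytorch') && ((github.event_name != 'pull_request') ||
#   (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') ||
#   (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || ... 'ciflow/cpu') ||
#    ... 'ciflow/default') || ... 'ciflow/linux')))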
@dataclass
class CIFlowRuleset:
version = 'v1'
output_file = f'{GITHUB_DIR}/generated-ciflow-ruleset.json'
label_rules: Dict[str, Set[str]] = field(default_factory=dict)
def add_label_rule(self, labels: Set[str], workflow_name: str) -> None:
for label in labels:
if label in self.label_rules:
self.label_rules[label].add(workflow_name)
else:
self.label_rules[label] = {workflow_name}
def generate_json(self) -> None:
GENERATED = "generated" # Note: please keep the variable GENERATED; otherwise Phabricator will hide the whole file
output = {
"__comment": f"@{GENERATED} DO NOT EDIT MANUALLY, Generation script: .github/scripts/generate_ci_workflows.py",
"version": self.version,
"label_rules": {
label: sorted(list(workflows))
for label, workflows in self.label_rules.items()
}
}
with open(self.output_file, 'w') as outfile:
json.dump(output, outfile, indent=2, sort_keys=True)
outfile.write('\n')
@dataclass
class CIWorkflow:
# Required fields
arch: Arch
build_environment: str
test_runner_type: str
# Optional fields
ciflow_config: CIFlowConfig = field(default_factory=CIFlowConfig)
cuda_version: str = ''
docker_image_base: str = ''
enable_doc_jobs: bool = False
exclude_test: bool = False
is_coverage: bool = False
is_libtorch: bool = False
is_scheduled: str = ''
num_test_shards: int = 1
on_pull_request: bool = False
only_build_on_pull_request: bool = False
only_run_smoke_tests_on_pull_request: bool = False
num_test_shards_on_pull_request: int = -1
distributed_test: bool = True
# The following variables will be set as environment variables,
# so it's easier for both shell and Python scripts to consume them if false is represented as the empty string.
enable_jit_legacy_test: YamlShellBool = "''"
enable_distributed_test: YamlShellBool = "''"
enable_multigpu_test: YamlShellBool = "''"
enable_nogpu_no_avx_test: YamlShellBool = "''"
enable_nogpu_no_avx2_test: YamlShellBool = "''"
enable_slow_test: YamlShellBool = "''"
enable_docs_test: YamlShellBool = "''"
enable_backwards_compat_test: YamlShellBool = "''"
enable_xla_test: YamlShellBool = "''"
enable_noarch_test: YamlShellBool = "''"
def __post_init__(self) -> None:
if self.is_libtorch:
self.exclude_test = True
if not self.on_pull_request:
self.only_build_on_pull_request = False
if self.distributed_test:
self.enable_distributed_test = 1
# If num_test_shards_on_pull_request is not user-defined, default to num_test_shards unless we are
# only running smoke tests on the pull request.
if self.num_test_shards_on_pull_request == -1:
# Don't waste resources on runner spinup and cooldown for another shard if we are only running a few tests
if self.only_run_smoke_tests_on_pull_request:
self.num_test_shards_on_pull_request = 1
else:
self.num_test_shards_on_pull_request = self.num_test_shards
self.assert_valid()
def assert_valid(self) -> None:
err_message = f"invalid test_runner_type for {self.arch}: {self.test_runner_type}"
if self.arch == 'linux':
assert self.test_runner_type in LINUX_RUNNERS, err_message
if self.arch == 'windows':
assert self.test_runner_type in WINDOWS_RUNNERS, err_message
if self.ciflow_config.enabled:
# make sure if LABEL_CIFLOW_DEFAULT is set, we then need to set trigger_action_only to False
assert self.ciflow_config.trigger_action_only != (LABEL_CIFLOW_DEFAULT in self.ciflow_config.labels)
assert self.on_pull_request
assert LABEL_CIFLOW_ALL in self.ciflow_config.labels
assert LABEL_CIFLOW_ALL in self.ciflow_config.root_job_condition
if self.arch == 'linux':
assert LABEL_CIFLOW_LINUX in self.ciflow_config.labels
if self.arch == 'windows':
assert LABEL_CIFLOW_WIN in self.ciflow_config.labels
if self.test_runner_type in CUDA_RUNNERS:
assert LABEL_CIFLOW_CUDA in self.ciflow_config.labels
if self.test_runner_type in CPU_RUNNERS:
assert LABEL_CIFLOW_CPU in self.ciflow_config.labels
def generate_workflow_file(self, workflow_template: jinja2.Template) -> None:
output_file_path = GITHUB_DIR / f"workflows/generated-{self.build_environment}.yml"
with open(output_file_path, "w") as output_file:
GENERATED = "generated" # Note: please keep the variable GENERATED; otherwise Phabricator will hide the whole file
output_file.writelines([f"# @{GENERATED} DO NOT EDIT MANUALLY\n"])
try:
content = workflow_template.render(asdict(self))
except Exception as e:
print(f"Failed on template: {workflow_template}", file=sys.stderr)
raise e
output_file.write(content)
if content[-1] != "\n":
output_file.write("\n")
print(output_file_path)
WINDOWS_WORKFLOWS = [
CIWorkflow(
arch="windows",
build_environment="win-vs2019-cpu-py3",
cuda_version="cpu",
test_runner_type=WINDOWS_CPU_TEST_RUNNER,
on_pull_request=True,
num_test_shards=2,
ciflow_config=CIFlowConfig(
enabled=True,
run_on_canary=True,
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_CPU, LABEL_CIFLOW_WIN}
),
),
CIWorkflow(
arch="windows",
build_environment="win-vs2019-cuda10.2-py3",
cuda_version="10.2",
test_runner_type=WINDOWS_CUDA_TEST_RUNNER,
on_pull_request=True,
num_test_shards=2,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_CUDA, LABEL_CIFLOW_WIN}
),
),
CIWorkflow(
arch="windows",
build_environment="win-vs2019-cuda11.3-py3",
cuda_version="11.3",
test_runner_type=WINDOWS_CUDA_TEST_RUNNER,
num_test_shards=2,
on_pull_request=True,
only_run_smoke_tests_on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
run_on_canary=True,
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_CUDA, LABEL_CIFLOW_WIN}
),
),
CIWorkflow(
arch="windows",
build_environment="periodic-win-vs2019-cuda11.1-py3",
cuda_version="11.1",
test_runner_type=WINDOWS_CUDA_TEST_RUNNER,
num_test_shards=2,
is_scheduled="45 0,4,8,12,16,20 * * *",
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_SCHEDULED, LABEL_CIFLOW_WIN, LABEL_CIFLOW_CUDA}
),
),
]
LINUX_WORKFLOWS = [
CIWorkflow(
arch="linux",
build_environment="linux-xenial-py3.6-gcc5.4",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc5.4",
test_runner_type=LINUX_CPU_TEST_RUNNER,
on_pull_request=True,
enable_jit_legacy_test=1,
enable_doc_jobs=True,
enable_docs_test=1,
enable_backwards_compat_test=1,
num_test_shards=2,
ciflow_config=CIFlowConfig(
enabled=True,
run_on_canary=True,
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU}
),
),
# ParallelTBB does not have a maintainer and is currently flaky
# CIWorkflow(
# arch="linux",
# build_environment="paralleltbb-linux-xenial-py3.6-gcc5.4",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc5.4",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# # This is a master-only job even though on_pull_request is set to True
# on_pull_request=True,
# ciflow_config=CIFlowConfig(
# enabled=True,
# trigger_action_only=True,
# labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU},
# ),
# ),
CIWorkflow(
arch="linux",
build_environment="parallelnative-linux-xenial-py3.6-gcc5.4",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc5.4",
test_runner_type=LINUX_CPU_TEST_RUNNER,
# This is a master-only job even though on_pull_request is set to True
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU},
),
),
# Build PyTorch with BUILD_CAFFE2=OFF
CIWorkflow(
arch="linux",
build_environment="puretorch-linux-xenial-py3.6-gcc5.4",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc5.4",
test_runner_type=LINUX_CPU_TEST_RUNNER,
exclude_test=True,
# This is a master-only job even though on_pull_request is set to True
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU},
),
),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-gcc7",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc7",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-asan",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-asan",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang7-onnx",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang7-onnx",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
CIWorkflow(
arch="linux",
build_environment="linux-bionic-cuda10.2-py3.9-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
num_test_shards=2,
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
run_on_canary=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_SLOW, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA}
),
),
CIWorkflow(
arch="linux",
build_environment="linux-xenial-cuda10.2-py3.6-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
enable_jit_legacy_test=1,
enable_multigpu_test=1,
enable_nogpu_no_avx_test=1,
enable_nogpu_no_avx2_test=1,
enable_slow_test=1,
num_test_shards=2,
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels=set([LABEL_CIFLOW_SLOW, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA]),
),
),
CIWorkflow(
arch="linux",
build_environment="libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
is_libtorch=True,
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels=set([LABEL_CIFLOW_LIBTORCH, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA]),
),
),
CIWorkflow(
arch="linux",
build_environment="linux-xenial-cuda11.3-py3.6-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
num_test_shards=2,
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
labels=set([LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA]),
),
),
CIWorkflow(
arch="linux",
build_environment="libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
is_libtorch=True,
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels=set([LABEL_CIFLOW_LIBTORCH, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA]),
),
),
CIWorkflow(
arch="linux",
build_environment="periodic-linux-xenial-cuda11.1-py3.6-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
num_test_shards=2,
is_scheduled="45 0,4,8,12,16,20 * * *",
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_SCHEDULED, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CUDA}
),
),
CIWorkflow(
arch="linux",
build_environment="periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7",
test_runner_type=LINUX_CUDA_TEST_RUNNER,
is_libtorch=True,
is_scheduled="45 0,4,8,12,16,20 * * *",
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
trigger_action_only=True,
labels={LABEL_CIFLOW_SCHEDULED, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_LIBTORCH, LABEL_CIFLOW_CUDA},
),
),
CIWorkflow(
arch="linux",
build_environment="linux-bionic-py3.8-gcc9-coverage",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-py3.8-gcc9",
test_runner_type=LINUX_CPU_TEST_RUNNER,
on_pull_request=True,
is_coverage=True,
num_test_shards=2,
ciflow_config=CIFlowConfig(
enabled=True,
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_COVERAGE, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU},
),
),
CIWorkflow(
arch="linux",
build_environment="linux-bionic-py3.6-clang9",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-py3.6-clang9",
test_runner_type=LINUX_CPU_TEST_RUNNER,
on_pull_request=True,
num_test_shards=2,
distributed_test=False,
enable_noarch_test=1,
ciflow_config=CIFlowConfig(
enabled=True,
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_LINUX, LABEL_CIFLOW_CPU, LABEL_CIFLOW_XLA, LABEL_CIFLOW_NOARCH},
),
),
# CIWorkflow(
# arch="linux",
# build_environment="linux-bionic-rocm3.9-py3.6",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-rocm3.9-py3.6",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-android-ndk-r19c-x86_32",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-android-ndk-r19c-x86_64",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-android-ndk-r19c-arm-v7a",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-android-ndk-r19c-arm-v8a",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-mobile",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-asan",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-mobile-custom-dynamic",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-mobile-custom-static",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
# CIWorkflow(
# arch="linux",
# build_environment="linux-xenial-py3.6-clang5-mobile-code-analysis",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# test_runner_type=LINUX_CPU_TEST_RUNNER,
# ),
]
BAZEL_WORKFLOWS = [
CIWorkflow(
arch="linux",
build_environment="linux-xenial-py3.6-gcc7-bazel-test",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7",
test_runner_type=LINUX_CPU_TEST_RUNNER,
on_pull_request=True,
ciflow_config=CIFlowConfig(
enabled=True,
labels={LABEL_CIFLOW_DEFAULT, LABEL_CIFLOW_BAZEL, LABEL_CIFLOW_CPU, LABEL_CIFLOW_LINUX},
),
),
]
if __name__ == "__main__":
jinja_env = jinja2.Environment(
variable_start_string="!{{",
loader=jinja2.FileSystemLoader(str(GITHUB_DIR.joinpath("templates"))),
undefined=jinja2.StrictUndefined,
)
template_and_workflows = [
(jinja_env.get_template("linux_ci_workflow.yml.j2"), LINUX_WORKFLOWS),
(jinja_env.get_template("windows_ci_workflow.yml.j2"), WINDOWS_WORKFLOWS),
(jinja_env.get_template("bazel_ci_workflow.yml.j2"), BAZEL_WORKFLOWS),
]
# Delete the existing generated files first, this should align with .gitattributes file description.
existing_workflows = GITHUB_DIR.glob("workflows/generated-*")
for w in existing_workflows:
try:
os.remove(w)
except Exception as e:
print(f"Error occurred when deleting file {w}: {e}")
ciflow_ruleset = CIFlowRuleset()
for template, workflows in template_and_workflows:
for workflow in workflows:
workflow.generate_workflow_file(workflow_template=template)
if workflow.ciflow_config.enabled:
ciflow_ruleset.add_label_rule(workflow.ciflow_config.labels, workflow.build_environment)
elif workflow.on_pull_request:
# If ciflow is disabled but still on_pull_request, we can denote
# it as a special label LABEL_CIFLOW_DEFAULT in the ruleset, which will be later
# turned into an actual LABEL_CIFLOW_DEFAULT label in the workflow.
# During the rollout phase, it has the same effect as LABEL_CIFLOW_DEFAULT
ciflow_ruleset.add_label_rule({LABEL_CIFLOW_DEFAULT}, workflow.build_environment)
ciflow_ruleset.generate_json()

.github/scripts/generate_linux_ci_workflows.py

@ -0,0 +1,161 @@
#!/usr/bin/env python3
from pathlib import Path
import jinja2
DOCKER_REGISTRY = "308535385114.dkr.ecr.us-east-1.amazonaws.com"
GITHUB_DIR = Path(__file__).parent.parent
CPU_TEST_RUNNER = "linux.2xlarge"
CUDA_TEST_RUNNER = "linux.8xlarge.nvidia.gpu"
class PyTorchLinuxWorkflow:
def __init__(
self,
build_environment: str,
docker_image_base: str,
on_pull_request: bool = False,
enable_doc_jobs: bool = False,
):
self.build_environment = build_environment
self.docker_image_base = docker_image_base
self.test_runner_type = CPU_TEST_RUNNER
self.on_pull_request = on_pull_request
self.enable_doc_jobs = enable_doc_jobs
if "cuda" in build_environment:
self.test_runner_type = CUDA_TEST_RUNNER
def generate_workflow_file(
self, workflow_template: jinja2.Template, jinja_env: jinja2.Environment
) -> Path:
output_file_path = GITHUB_DIR.joinpath(
f"workflows/{self.build_environment}.yml"
)
with open(output_file_path, "w") as output_file:
output_file.writelines(["# @generated DO NOT EDIT MANUALLY\n"])
output_file.write(
workflow_template.render(
build_environment=self.build_environment,
docker_image_base=self.docker_image_base,
test_runner_type=self.test_runner_type,
enable_doc_jobs=self.enable_doc_jobs,
on_pull_request=self.on_pull_request,
)
)
output_file.write('\n')
return output_file_path
WORKFLOWS = [
PyTorchLinuxWorkflow(
build_environment="pytorch-linux-xenial-py3.6-gcc5.4",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc5.4",
on_pull_request=True,
enable_doc_jobs=True,
),
# PyTorchLinuxWorkflow(
# build_environment="pytorch-paralleltbb-linux-xenial-py3.6-gcc5.4",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc5.4",
# ),
# PyTorchLinuxWorkflow(
# build_environment="pytorch-parallelnative-linux-xenial-py3.6-gcc5.4",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc5.4",
# ),
# PyTorchLinuxWorkflow(
# build_environment="pytorch-pure_torch-linux-xenial-py3.6-gcc5.4",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc5.4",
# ),
# PyTorchLinuxWorkflow(
# build_environment="pytorch-linux-xenial-py3.6-gcc7",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc7",
# ),
# PyTorchLinuxWorkflow(
# build_environment="pytorch-linux-xenial-py3.6-clang5-asan",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-asan",
# ),
# PyTorchLinuxWorkflow(
# build_environment="pytorch-linux-xenial-py3.6-clang7-onnx",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang7-onnx",
# ),
PyTorchLinuxWorkflow(
build_environment="pytorch-linux-xenial-cuda10.2-cudnn7-py3.6-gcc7",
docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7",
),
# PyTorchLinuxWorkflow(
# build_environment="pytorch-linux-xenial-cuda11.1-cudnn8-py3.6-gcc7",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7",
# ),
# PyTorchLinuxWorkflow(
# build_environment="pytorch-libtorch-linux-xenial-cuda11.1-cudnn8-py3.6-gcc7",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-cuda11.1-cudnn8-py3-gcc7",
# ),
# PyTorchLinuxWorkflow(
# build_environment="pytorch-linux-bionic-py3.6-clang9-noarch",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-py3.6-clang9",
# ),
# PyTorchLinuxWorkflow(
# build_environment="pytorch-xla-linux-bionic-py3.6-clang9",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-py3.6-clang9",
# ),
# PyTorchLinuxWorkflow(
# build_environment="pytorch-vulkan-linux-bionic-py3.6-clang9",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-py3.6-clang9",
# ),
# PyTorchLinuxWorkflow(
# build_environment="pytorch-linux-bionic-py3.8-gcc9-coverage",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-py3.8-gcc9",
# ),
# PyTorchLinuxWorkflow(
# build_environment="pytorch-linux-bionic-rocm3.9-py3.6",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-bionic-rocm3.9-py3.6",
# ),
# PyTorchLinuxWorkflow(
# build_environment="pytorch-linux-xenial-py3.6-clang5-android-ndk-r19c-x86_32",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# ),
# PyTorchLinuxWorkflow(
# build_environment="pytorch-linux-xenial-py3.6-clang5-android-ndk-r19c-x86_64",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# ),
# PyTorchLinuxWorkflow(
# build_environment="pytorch-linux-xenial-py3.6-clang5-android-ndk-r19c-arm-v7a",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# ),
# PyTorchLinuxWorkflow(
# build_environment="pytorch-linux-xenial-py3.6-clang5-android-ndk-r19c-arm-v8a",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# ),
# PyTorchLinuxWorkflow(
# build_environment="pytorch-linux-xenial-py3.6-clang5-mobile",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-asan",
# ),
# PyTorchLinuxWorkflow(
# build_environment="pytorch-linux-xenial-py3.6-clang5-mobile-custom-dynamic",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# ),
# PyTorchLinuxWorkflow(
# build_environment="pytorch-linux-xenial-py3.6-clang5-mobile-custom-static",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# ),
# PyTorchLinuxWorkflow(
# build_environment="pytorch-linux-xenial-py3.6-clang5-mobile-code-analysis",
# docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c",
# ),
]
if __name__ == "__main__":
jinja_env = jinja2.Environment(
variable_start_string="!{{",
loader=jinja2.FileSystemLoader(str(GITHUB_DIR.joinpath("templates"))),
)
workflow_template = jinja_env.get_template("linux_ci_workflow.yml.in")
for workflow in WORKFLOWS:
print(
workflow.generate_workflow_file(
workflow_template=workflow_template,
jinja_env=jinja_env
)
)


@ -1,94 +0,0 @@
#!/usr/bin/env python3
"""Generates a matrix to be utilized through github actions
Will output a matrix to represent our testing configurations, which is currently
dictated by just sharding.
"""
import json
import os
import re
from typing import Dict
from typing_extensions import TypedDict
class Config(TypedDict):
num_shards: int
runner: str
def get_disabled_issues() -> str:
pr_body = os.getenv('PR_BODY', '')
# The below regex is meant to match all *case-insensitive* keywords that
# GitHub has delineated would link PRs to issues, more details here:
# https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue.
# E.g., "Close #62851", "fixES #62851" and "RESOLVED #62851" would all match, but not
# "closes #62851" --> extra space, "fixing #62851" --> not a keyword, nor "fix 62851" --> no #
regex = '(?i)(Close(d|s)?|Resolve(d|s)?|Fix(ed|es)?) #([0-9]+)'
issue_numbers = [x[4] for x in re.findall(regex, pr_body)]
return ','.join(issue_numbers)
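# Example (illustration only): a PR_BODY of
#   "Fixes #62851 and Closes #12345, also see fix 62851"
# matches the first two keywords, so issue_numbers == ['62851', '12345'] and this
# function returns "62851,12345"; "fix 62851" is ignored because it lacks the '#'.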
def main() -> None:
TEST_RUNNER_TYPE = os.getenv('TEST_RUNNER_TYPE')
assert TEST_RUNNER_TYPE is not None
ON_PULL_REQUEST = os.getenv('GITHUB_HEAD_REF')
NUM_TEST_SHARDS_ON_PULL_REQUEST = os.getenv('NUM_TEST_SHARDS_ON_PULL_REQUEST')
NUM_TEST_SHARDS = int(os.getenv('NUM_TEST_SHARDS', '1'))
if ON_PULL_REQUEST and NUM_TEST_SHARDS_ON_PULL_REQUEST:
NUM_TEST_SHARDS = int(NUM_TEST_SHARDS_ON_PULL_REQUEST)
MULTIGPU_RUNNER_TYPE = os.getenv('MULTIGPU_RUNNER_TYPE')
NOGPU_RUNNER_TYPE = os.getenv('NOGPU_RUNNER_TYPE')
configs: Dict[str, Config] = {}
if os.getenv('ENABLE_JIT_LEGACY_TEST'):
configs['jit_legacy'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
if MULTIGPU_RUNNER_TYPE is not None and os.getenv('ENABLE_MULTIGPU_TEST'):
configs['multigpu'] = {'num_shards': 1, 'runner': MULTIGPU_RUNNER_TYPE}
if NOGPU_RUNNER_TYPE is not None and os.getenv('ENABLE_NOGPU_NO_AVX_TEST'):
configs['nogpu_NO_AVX'] = {'num_shards': 1, 'runner': NOGPU_RUNNER_TYPE}
if NOGPU_RUNNER_TYPE is not None and os.getenv('ENABLE_NOGPU_NO_AVX2_TEST'):
configs['nogpu_NO_AVX2'] = {'num_shards': 1, 'runner': NOGPU_RUNNER_TYPE}
if os.getenv('ENABLE_DISTRIBUTED_TEST'):
configs['distributed'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
if os.getenv('ENABLE_SLOW_TEST'):
configs['slow'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
if os.getenv('ENABLE_DOCS_TEST'):
configs['docs_test'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
if os.getenv('ENABLE_BACKWARDS_COMPAT_TEST'):
configs['backwards_compat'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
if os.getenv('ENABLE_XLA_TEST'):
configs['xla'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
if os.getenv('ENABLE_NOARCH_TEST'):
configs['noarch'] = {'num_shards': 1, 'runner': TEST_RUNNER_TYPE}
matrix = {
'include': [
{
'config': 'default',
'shard': shard,
'num_shards': NUM_TEST_SHARDS,
'runner': TEST_RUNNER_TYPE,
}
for shard in range(1, NUM_TEST_SHARDS + 1)
] + [
{
'config': name,
'shard': shard,
'num_shards': config['num_shards'],
'runner': config['runner'],
}
for name, config in configs.items()
for shard in range(1, config['num_shards'] + 1)
]
}
render_matrix = {'config': list(dict.fromkeys(x['config'] for x in matrix['include']))}
print(json.dumps({'matrix': matrix, 'render-matrix': render_matrix}, indent=2))
print(f'::set-output name=matrix::{json.dumps(matrix)}')
print(f'::set-output name=render-matrix::{json.dumps(render_matrix)}')
print(f'::set-output name=ignore-disabled-issues::{get_disabled_issues()}')
if __name__ == "__main__":
main()
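For illustration, the keyword regex from get_disabled_issues above can be exercised on a made-up PR body; the issue numbers and phrasing below are invented, only the regex is taken from the script:

# Illustrative only: exercises the same keyword regex shown above on a made-up PR body.
import re

regex = '(?i)(Close(d|s)?|Resolve(d|s)?|Fix(ed|es)?) #([0-9]+)'
pr_body = """This PR fixes #62851 and Resolves #12345.
closes  #99999   (two spaces between keyword and issue, so it should NOT match)
fixing #11111    (not a recognized keyword)
fix 22222        (no '#', so no match)"""

issue_numbers = [m[4] for m in re.findall(regex, pr_body)]
print(','.join(issue_numbers))  # expected: 62851,12345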


@ -16,12 +16,12 @@ LEGACY_BASE_VERSION_SUFFIX_PATTERN = re.compile("a0$")
class NoGitTagException(Exception):
pass
def get_pytorch_root() -> Path:
def get_pytorch_root():
return Path(subprocess.check_output(
['git', 'rev-parse', '--show-toplevel']
).decode('ascii').strip())
def get_tag() -> str:
def get_tag():
root = get_pytorch_root()
# We're on a tag
am_on_tag = (
@ -46,7 +46,7 @@ def get_tag() -> str:
tag = re.sub(TRAILING_RC_PATTERN, "", tag)
return tag
def get_base_version() -> str:
def get_base_version():
root = get_pytorch_root()
dirty_version = open(root / 'version.txt', 'r').read().strip()
# Strips trailing a0 from version.txt, not too sure why it's there in the
@ -54,44 +54,37 @@ def get_base_version() -> str:
return re.sub(LEGACY_BASE_VERSION_SUFFIX_PATTERN, "", dirty_version)
class PytorchVersion:
def __init__(
self,
gpu_arch_type: str,
gpu_arch_version: str,
no_build_suffix: bool,
) -> None:
def __init__(self, gpu_arch_type, gpu_arch_version, no_build_suffix):
self.gpu_arch_type = gpu_arch_type
self.gpu_arch_version = gpu_arch_version
self.no_build_suffix = no_build_suffix
def get_post_build_suffix(self) -> str:
if self.no_build_suffix:
return ""
def get_post_build_suffix(self):
if self.gpu_arch_type == "cuda":
return f"+cu{self.gpu_arch_version.replace('.', '')}"
return f"+{self.gpu_arch_type}{self.gpu_arch_version}"
def get_release_version(self) -> str:
def get_release_version(self):
if not get_tag():
raise NoGitTagException(
"Not on a git tag, are you sure you want a release version?"
)
return f"{get_tag()}{self.get_post_build_suffix()}"
def get_nightly_version(self) -> str:
def get_nightly_version(self):
date_str = datetime.today().strftime('%Y%m%d')
build_suffix = self.get_post_build_suffix()
return f"{get_base_version()}.dev{date_str}{build_suffix}"
def main() -> None:
def main():
parser = argparse.ArgumentParser(
description="Generate pytorch version for binary builds"
)
parser.add_argument(
"--no-build-suffix",
action="store_true",
type=strtobool,
help="Whether or not to add a build suffix typically (+cpu)",
default=strtobool(os.environ.get("NO_BUILD_SUFFIX", "False"))
default=os.environ.get("NO_BUILD_SUFFIX", False)
)
parser.add_argument(
"--gpu-arch-type",


@ -1,11 +0,0 @@
function Get-SSH-Sessions {
Get-Process sshd -IncludeUserName |
Where-Object UserName -notLike "*SYSTEM*" |
Select-Object Id
}
$runningSessions = Get-SSH-Sessions
foreach ($session in $runningSessions) {
Stop-Process -id $session.Id
}


@ -14,19 +14,19 @@ is simply to make sure that there is *some* configuration of ruamel that can rou
the YAML, not to be prescriptive about it.
'''
import ruamel.yaml # type: ignore[import]
import ruamel.yaml
import difflib
import sys
from pathlib import Path
from io import StringIO
def fn(base: str) -> str:
def fn(base):
return str(base / Path("aten/src/ATen/native/native_functions.yaml"))
with open(Path(__file__).parent.parent.parent / fn('.'), "r") as f:
contents = f.read()
yaml = ruamel.yaml.YAML() # type: ignore[attr-defined]
yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
yaml.width = 1000
yaml.boolean_representation = ['False', 'True']
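The script above only checks that ruamel can round-trip native_functions.yaml. A hedged sketch of such a load-dump-diff round trip on a small inline document (not the real file):

# Hedged sketch of a ruamel round-trip check on an inline document, not native_functions.yaml.
import difflib
from io import StringIO
import ruamel.yaml

original = "- func: add(Tensor self, Tensor other) -> Tensor\n  variants: function, method\n"

yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
yaml.width = 1000
yaml.boolean_representation = ['False', 'True']

data = yaml.load(original)
out = StringIO()
yaml.dump(data, out)

diff = list(difflib.unified_diff(original.splitlines(), out.getvalue().splitlines(), lineterm=''))
print("round-trip is clean" if not diff else "\n".join(diff))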


@ -0,0 +1,37 @@
#!/usr/bin/env python3
'''
This file verifies that the workflows that are potentially canceled in our cancel_redundant_workflow.yml
match the workflows we have running on pull requests (found in .github/workflows). This way, anytime a
workflow is added or removed, people can be reminded to modify the cancel_redundant_workflow.yml accordingly.
'''
import ruamel.yaml
from pathlib import Path
yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
yaml.boolean_representation = ['False', 'True']
yaml.default_flow_style = False
if __name__ == '__main__':
workflow_paths = (Path(__file__).parent.parent / 'workflows').rglob('*')
workflows = []
for path in workflow_paths:
if path.suffix in {'.yml', '.yaml'}:
with open(path) as f:
data = yaml.load(f)
assert 'name' in data, 'Every GHA workflow must have a name.'
if 'pull_request' in data['on']:
workflows.append(data['name'])
with open('.github/workflows/cancel_redundant_workflows.yml', 'r') as f:
data = yaml.load(f)
# Replace workflows to cancel
data['on']['workflow_run']['workflows'] = sorted(workflows)
with open('.github/workflows/cancel_redundant_workflows.yml', 'w') as f:
yaml.dump(data, f)


@ -1,5 +1,5 @@
#!/usr/bin/env bash
CHANGES=$(git status --porcelain "$1")
CHANGES=$(git status --porcelain)
echo "$CHANGES"
git diff "$1"
git diff
[ -z "$CHANGES" ]


@ -13,7 +13,6 @@ Testing environment:
# 1. Does not reuse the build artifact in other CI workflows
# 2. CI jobs are serialized because there is only one worker
import os
import git # type: ignore[import]
import pathlib
import argparse
import subprocess
@ -24,7 +23,6 @@ CUDA_VERSION = "cu102"
PYTHON_VERSION = "3.7"
TORCHBENCH_CONFIG_NAME = "config.yaml"
MAGIC_PREFIX = "RUN_TORCHBENCH:"
MAGIC_TORCHBENCH_PREFIX = "TORCHBENCH_BRANCH:"
ABTEST_CONFIG_TEMPLATE = """# This config is automatically generated by run_torchbench.py
start: {control}
end: {treatment}
@ -33,7 +31,7 @@ direction: decrease
timeout: 720
tests:"""
def gen_abtest_config(control: str, treatment: str, models: List[str]) -> str:
def gen_abtest_config(control: str, treatment: str, models: List[str]):
d = {}
d["control"] = control
d["treatment"] = treatment
@ -45,7 +43,7 @@ def gen_abtest_config(control: str, treatment: str, models: List[str]) -> str:
config = config + "\n"
return config
def deploy_torchbench_config(output_dir: str, config: str) -> None:
def deploy_torchbench_config(output_dir: str, config: str):
# Create test dir if needed
pathlib.Path(output_dir).mkdir(exist_ok=True)
# TorchBench config file name
@ -59,7 +57,7 @@ def extract_models_from_pr(torchbench_path: str, prbody_file: str) -> List[str]:
lines = map(lambda x: x.strip(), pf.read().splitlines())
magic_lines = list(filter(lambda x: x.startswith(MAGIC_PREFIX), lines))
if magic_lines:
# Only the first magic line will be recognized.
# Only the first magic line will be respected.
model_list = list(map(lambda x: x.strip(), magic_lines[0][len(MAGIC_PREFIX):].split(",")))
# Shortcut: if model_list is ["ALL"], run all the tests
if model_list == ["ALL"]:
@ -73,27 +71,7 @@ def extract_models_from_pr(torchbench_path: str, prbody_file: str) -> List[str]:
return []
return model_list
def identify_torchbench_branch(torchbench_path: str, prbody_file: str) -> None:
branch_name: str
with open(prbody_file, "r") as pf:
lines = map(lambda x: x.strip(), pf.read().splitlines())
magic_lines = list(filter(lambda x: x.startswith(MAGIC_TORCHBENCH_PREFIX), lines))
if magic_lines:
# Only the first magic line will be recognized.
branch_name = magic_lines[0][len(MAGIC_TORCHBENCH_PREFIX):].strip()
# If not specified, directly return without the branch checkout
if not branch_name:
return
try:
print(f"Checking out the TorchBench branch: {branch_name} ...")
repo = git.Repo(torchbench_path)
origin = repo.remotes.origin
origin.fetch(branch_name)
repo.create_head(branch_name, origin.refs[branch_name]).checkout()
except git.exc.GitCommandError:
raise RuntimeError(f'{branch_name} doesn\'t exist in the pytorch/benchmark repository. Please double check.')
def run_torchbench(pytorch_path: str, torchbench_path: str, output_dir: str) -> None:
def run_torchbench(pytorch_path: str, torchbench_path: str, output_dir: str):
# Copy system environment so that we will not override
env = dict(os.environ)
command = ["python", "bisection.py", "--work-dir", output_dir,
@ -118,12 +96,6 @@ if __name__ == "__main__":
if not models:
print("Can't parse the model filter from the pr body. Currently we only support allow-list.")
exit(1)
# Identify the specified TorchBench branch, verify the branch exists, and checkout the branch
try:
identify_torchbench_branch(args.torchbench_path, args.pr_body)
except RuntimeError as e:
print(f"Identify TorchBench branch failed: {str(e)}")
exit(1)
print(f"Ready to run TorchBench with benchmark. Result will be saved in the directory: {output_dir}.")
# Run TorchBench with the generated config
torchbench_config = gen_abtest_config(args.pr_base_sha, args.pr_head_sha, models)
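For illustration, this is how a "RUN_TORCHBENCH:" line in a PR body turns into a model allow-list, mirroring extract_models_from_pr above; the PR body and model names are invented:

# Illustrative parse of the "RUN_TORCHBENCH:" magic line, mirroring the logic above.
MAGIC_PREFIX = "RUN_TORCHBENCH:"

pr_body = """Speed up the attention kernel.

RUN_TORCHBENCH: BERT_pytorch, hf_GPT2
Some trailing discussion that is ignored."""

lines = [line.strip() for line in pr_body.splitlines()]
magic_lines = [line for line in lines if line.startswith(MAGIC_PREFIX)]
models = []
if magic_lines:
    # Only the first magic line is honored, matching extract_models_from_pr.
    models = [m.strip() for m in magic_lines[0][len(MAGIC_PREFIX):].split(",")]
print(models)  # ['BERT_pytorch', 'hf_GPT2']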


@ -1,17 +0,0 @@
function Get-SSH-Users {
# Gets ssh sessions for all users not named SYSTEM
Get-CimInstance -ClassName Win32_Process -Filter "Name = 'sshd.exe'" |
Get-CimAssociatedInstance -Association Win32_SessionProcess |
Get-CimAssociatedInstance -Association Win32_LoggedOnUser |
Where-Object {$_.Name -ne 'SYSTEM'} |
Measure-Object
}
$usersLoggedOn = Get-SSH-Users
Write-Output "Holding runner until all ssh sessions have logged out"
while ($usersLoggedOn.Count -gt 0) {
$usersLoggedOn = Get-SSH-Users
Write-Output "."
Start-Sleep -s 5
}


@ -1,13 +0,0 @@
#!/usr/bin/env bash
set -eou pipefail
echo "Holding runner for 2 hours until all ssh sessions have logged out"
for _ in $(seq 1440); do
# Break if no ssh session exists anymore
if [ "$(who)" = "" ]; then
break
fi
echo "."
sleep 5
done


@ -1,137 +0,0 @@
{%- extends "linux_ci_workflow.yml.j2" -%}
{%- set exclude_test = true -%}
{% block name -%}
# Template is at: .github/templates/bazel_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: !{{ build_environment }}
{%- endblock %}
on:
{%- if on_pull_request %}
pull_request:
{%- if ciflow_config.enabled %}
{%- if ciflow_config.trigger_action_only %}
types: [!{{ ciflow_config.trigger_action }}]
{%- else %}
types: [opened, synchronize, reopened, !{{ ciflow_config.trigger_action }}]
{%- endif %}
{%- endif %}
{%- else %}
# TODO: Enable pull_request builds when we can verify capacity can be met by auto-scalers
{%- endif %}
{% block build +%}
# building and testing in a single job since bazel runs only small subset of tests
build-and-test:
runs-on: !{{ test_runner_type }}
needs: [calculate-docker-image, !{{ ciflow_config.root_job_name }}]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: !{{ build_environment }}-build-and-test
NUM_TEST_SHARDS: !{{ num_test_shards }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
!{{ common.setup_ec2_linux() }}
!{{ common.checkout_pytorch("recursive") }}
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- name: Output disk space left
run: |
sudo df -H
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e PR_LABELS \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e http_proxy="!{{ common.squid_proxy }}" -e https_proxy="!{{ common.squid_proxy }}" -e no_proxy="!{{ common.squid_no_proxy }}" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && sudo chown -R jenkins /dev && .jenkins/pytorch/build.sh'
!{{ common.parse_ref() }}
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Test
run: |
# detached container should get cleaned up by teardown_ec2_linux
export SHARD_NUMBER=0
# TODO: Stop building test binaries as part of the build phase
# Make sure we copy test results from bazel-testlogs symlink to
# a regular directory ./test/test-reports
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e CONTINUE_THROUGH_ERROR \
-e PR_LABELS \
-e http_proxy="!{{ common.squid_proxy }}" -e https_proxy="!{{ common.squid_proxy }}" -e no_proxy="!{{ common.squid_no_proxy }}" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && sudo chown -R jenkins /dev && .jenkins/pytorch/test.sh && cp -Lr ./bazel-testlogs ./test/test-reports'
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
!{{ common.upload_test_reports(name='bazel') }}
!{{ common.upload_test_statistics(build_environment) }}
!{{ common.teardown_ec2_linux() }}
{%- endblock %}
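The template above relies on Jinja inheritance: it extends linux_ci_workflow.yml.j2 and overrides its build block. A tiny self-contained sketch of that mechanism, with made-up inline templates instead of the real ones:

# Tiny sketch of the extends/block-override mechanism used by the bazel template above.
import jinja2

loader = jinja2.DictLoader({
    "base.yml.j2": (
        "name: !{{ build_environment }}\n"
        "{% block build %}build: base-build-job{% endblock %}\n"
    ),
    "child.yml.j2": (
        "{% extends 'base.yml.j2' %}\n"
        "{% block build %}build-and-test: overridden-by-child{% endblock %}\n"
    ),
})
env = jinja2.Environment(variable_start_string="!{{", loader=loader)
print(env.get_template("child.yml.j2").render(build_environment="pytorch-linux-bazel-test"))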


@ -1,186 +0,0 @@
{%- set upload_artifact_s3_action = "seemethere/upload-artifact-s3@v3" -%}
{# squid_proxy is a private ELB that is only available for GHA custom runners #}
{%- set squid_proxy = "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -%}
{# squid_no_proxy is a list of common set of fixed domains or IPs that we don't need to proxy. See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/http_proxy_config.html#windows-proxy #}
{%- set squid_no_proxy = "localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" -%}
{%- macro concurrency(build_environment) -%}
concurrency:
group: !{{ build_environment }}-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
{%- endmacro -%}
{%- macro display_ec2_information() -%}
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
{%- endmacro -%}
{%- macro parse_ref() -%}
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
{%- endmacro -%}
{%- macro upload_test_statistics(build_environment) -%}
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: !{{ build_environment }}-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
{%- endmacro -%}
{%- macro setup_ec2_linux() -%}
!{{ display_ec2_information() }}
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
{%- endmacro -%}
{%- macro teardown_ec2_linux() -%}
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
{%- endmacro -%}
{%- macro checkout_pytorch(submodules) -%}
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: !{{ submodules }}
{%- endmacro -%}
{%- macro upload_test_reports(name) -%}
- name: Zip test reports for upload
if: always()
env:
{%- if name == 'linux' or name == 'windows' %}
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
{%- else %}
FILE_SUFFIX: '!{{ name }}-${{ github.job }}'
{%- endif %}
{%- if name == 'windows' %}
shell: powershell
run: |
# -ir => recursive include all files in pattern
7z a "test-reports-$Env:FILE_SUFFIX.zip" -ir'!test\*.xml'
{%- else %}
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
{%- endif %}
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
{%- if name == 'linux' or name == 'windows' %}
name: test-reports-${{ matrix.config }}
{%- else %}
name: test-reports-!{{ name }}
{%- endif %}
retention-days: 14
if-no-files-found: error
path:
{%- if name == 'windows' %}
pytorch-${{ github.run_id }}/test-reports-*.zip
{%- else %}
test-reports-*.zip
{%- endif %}
- uses: !{{ upload_artifact_s3_action }}
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
{%- if name == 'windows' %}
pytorch-${{ github.run_id }}/test-reports-*.zip
{%- else %}
test-reports-*.zip
{%- endif %}
{%- endmacro -%}
{%- macro render_test_results() -%}
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on windows, just try to default to utf-8 if possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
{%- endmacro -%}


@ -0,0 +1,368 @@
# Template is at: .github/templates/linux_ci_workflow.yml
# Generation script: .github/scripts/generate_linux_ci_workflows.py
name: Linux CI (!{{ build_environment }})
on:
# TODO: Enable pull_request builds when we can verify capacity can be met by auto-scalers
{%- if on_pull_request %}
pull_request:
{%- endif %}
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: !{{ build_environment }}
DOCKER_IMAGE_BASE: !{{ docker_image_base }}
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
jobs:
calculate-docker-image:
runs-on: linux.2xlarge
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Checkout PyTorch
uses: actions/checkout@v2
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
eval "$(aws ecr get-login --no-include-email --region us-east-1)"
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: steps.check.outputs.rebuild
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: calculate-docker-image
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
steps:
- name: Log in to ECR
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Checkout PyTorch
uses: actions/checkout@v2
with:
fetch-depth: 0 # deep clone, to allow sharding to use git rev-list
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Build PyTorch
run: |
docker run \
-e BUILD_ENVIRONMENT \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}" \
sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -r artifacts.zip dist/ build/
- uses: actions/upload-artifact@v2
name: Store PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 30
if-no-files-found: error
path:
artifacts.zip
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
test:
runs-on: !{{ test_runner_type }}
needs:
- calculate-docker-image
- build
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
steps:
- name: Log in to ECR
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)/../":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Checkout PyTorch
uses: actions/checkout@v2
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- uses: actions/download-artifact@v2
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Output disk space left
run: |
sudo df -H
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Test PyTorch
run: |
# TODO: Stop building test binaries as part of the build phase
# Used for GPU_FLAG since that doesn't play nice
# shellcheck disable=SC2086
docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e IN_CI \
-e MAX_JOBS="$(nproc --ignore=2)" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}" \
sh -c 'sudo chown -R jenkins . && pip install dist/*.whl && .jenkins/pytorch/test.sh'
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- uses: actions/upload-artifact@v2
name: Store PyTorch Test Reports
if: always()
with:
name: test-reports
retention-days: 30
if-no-files-found: error
path:
test/**/*.xml
- name: Clean up docker images
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
# Prune all of the docker images
docker system prune -af
render_test_results:
if: always()
needs:
- test
runs-on: ubuntu-18.04
steps:
- name: Checkout PyTorch
uses: actions/checkout@v2
with:
# deep clone, to allow tools/print_test_stats.py to use Git commands
fetch-depth: 0
- uses: actions/download-artifact@v2
name: Download PyTorch Test Reports
with:
name: test-reports
path: test/test-reports
- uses: actions/setup-python@v2
with:
python-version: 3.9
- name: Install dependencies
# boto3 version copied from .circleci/docker/common/install_conda.sh
run: |
pip install -r requirements.txt
pip install boto3==1.16.34 junitparser rich
- name: Output Test Results (Click Me)
run: |
python tools/render_junit.py test
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload test statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/print_test_stats.py to natively support GitHub Actions
env:
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_OSSCI_METRICS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_OSSCI_METRICS_SECRET_ACCESS_KEY }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_JOB: !{{ build_environment }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: ${{ github.run_id }} # dunno if this corresponds
run: |
export PYTHONPATH=$PWD
python tools/print_test_stats.py --upload-to-s3 --compare-with-s3 test
{%- if enable_doc_jobs %}
pytorch_python_doc_build:
runs-on: linux.2xlarge
needs:
- calculate-docker-image
- build
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
steps:
- name: Log in to ECR
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v alpine chown -R "$(id -u):$(id -g)" .
- name: Checkout PyTorch
uses: actions/checkout@v2
with:
fetch-depth: 0 # deep clone, to allow sharding to use git rev-list
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- uses: actions/download-artifact@v2
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Build Python Doc in Docker
run: |
set -ex
time docker pull "${DOCKER_IMAGE}" > /dev/null
echo "${GITHUB_REF}"
ref=${GITHUB_REF##*/}
target=${ref//v}
docker run \
-e BUILD_ENVIRONMENT \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e IN_CI \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e CIRCLE_SHA1="$GITHUB_SHA" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--name="$GITHUB_SHA" \
--tty \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}" \
bash -c "sudo chown -R jenkins . && pip install dist/*.whl && ./.circleci/scripts/python_doc_push_script.sh docs/$target $target site"
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v alpine chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -r pytorch_github_io.zip "${GITHUB_WORKSPACE}/pytorch.github.io"
- uses: actions/upload-artifact@v2
name: Store PyTorch Build Artifacts
with:
name: pytorch_github_io
if-no-files-found: error
path: pytorch_github_io.zip
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
{%- endif -%}
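The "Check if image should be built" step above decides whether to rebuild the CI Docker image by comparing the tree hash of .circleci/docker at HEAD with the hash at the merge base. A hedged Python sketch of that decision; it assumes it runs inside a git checkout, and image_exists() is just a stand-in for the docker manifest inspect call:

# Hedged sketch of the rebuild decision in the "Check if image should be built" step.
import subprocess

def rev_parse(ref: str) -> str:
    return subprocess.check_output(["git", "rev-parse", ref]).decode().strip()

def image_exists(tag: str) -> bool:
    return False  # placeholder for `docker manifest inspect "${DOCKER_IMAGE_BASE}:${tag}"`

def should_rebuild(base_revision: str) -> bool:
    docker_tag = rev_parse("HEAD:.circleci/docker")      # tree hash of the docker directory
    if image_exists(docker_tag):
        return False                                      # image already published, nothing to do
    if base_revision == rev_parse("HEAD"):
        merge_base = rev_parse("HEAD~")                   # on the base branch: compare with the parent
    else:
        merge_base = subprocess.check_output(
            ["git", "merge-base", "HEAD", base_revision]).decode().strip()
    previous_tag = rev_parse(f"{merge_base}:.circleci/docker")
    if previous_tag == docker_tag:
        # Same tree as the merge base but no published image: something is wrong upstream.
        raise RuntimeError("previous image missing for an unchanged .circleci/docker tree")
    return True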


@ -1,451 +0,0 @@
{% import 'common.yml.j2' as common %}
{%- block name -%}
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: !{{ build_environment }}
{%- endblock %}
on:
{%- if on_pull_request %}
pull_request:
{%- if ciflow_config.enabled %}
{%- if ciflow_config.trigger_action_only %}
types: [!{{ ciflow_config.trigger_action }}]
{%- else %}
types: [opened, synchronize, reopened, !{{ ciflow_config.trigger_action }}]
{%- endif %}
{%- endif %}
{%- else %}
# TODO: Enable pull_request builds when we can verify capacity can be met by auto-scalers
{%- endif %}
{%- if is_scheduled %}
schedule:
- cron: !{{ is_scheduled }}
{%- else %}
push:
branches:
- master
- release/*
{%- endif %}
workflow_dispatch:
env:
BUILD_ENVIRONMENT: !{{ build_environment }}
DOCKER_IMAGE_BASE: !{{ docker_image_base }}
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# This is used for the phase of adding wheel tests only, will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
!{{ common.concurrency(build_environment) }}
jobs:
{%- if ciflow_config.enabled %}
!{{ ciflow_config.root_job_name }}:
runs-on: ubuntu-18.04
if: ${{ !{{ ciflow_config.root_job_condition }} }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running !{{ ciflow_config.root_job_name }}
- name: print labels
run: echo "${LABELS}"
{%- endif %}
calculate-docker-image:
runs-on: linux.2xlarge
{%- if ciflow_config.enabled %}
needs: [!{{ ciflow_config.root_job_name }}]
{%- endif %}
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
!{{ common.setup_ec2_linux() }}
!{{ common.checkout_pytorch("false") }}
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
{% block build +%}
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, !{{ ciflow_config.root_job_name }}]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: !{{ build_environment }}-build
steps:
!{{ common.setup_ec2_linux() }}
!{{ common.checkout_pytorch("recursive") }}
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="!{{ common.squid_proxy }}" -e https_proxy="!{{ common.squid_proxy }}" -e no_proxy="!{{ common.squid_no_proxy }}" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
!{{ common.parse_ref() }}
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
{%- if not is_libtorch %}
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: !{{ common.upload_artifact_s3_action }}
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
{%- endif %}
!{{ common.teardown_ec2_linux() }}
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
{%- endblock %}
{%- if not exclude_test %}
{% block test +%}
generate-test-matrix:
runs-on: ubuntu-18.04
{%- if ciflow_config.enabled %}
needs: [!{{ ciflow_config.root_job_name }}]
{%- endif %}
env:
TEST_RUNNER_TYPE: !{{ test_runner_type }}
ENABLE_DISTRIBUTED_TEST: !{{ enable_distributed_test }}
ENABLE_JIT_LEGACY_TEST: !{{ enable_jit_legacy_test }}
ENABLE_MULTIGPU_TEST: !{{ enable_multigpu_test }}
ENABLE_NOGPU_NO_AVX_TEST: !{{ enable_nogpu_no_avx_test }}
ENABLE_NOGPU_NO_AVX2_TEST: !{{ enable_nogpu_no_avx2_test }}
ENABLE_SLOW_TEST: !{{ enable_slow_test }}
ENABLE_DOCS_TEST: !{{ enable_docs_test }}
ENABLE_BACKWARDS_COMPAT_TEST: !{{ enable_backwards_compat_test }}
ENABLE_XLA_TEST: !{{ enable_xla_test }}
ENABLE_NOARCH_TEST: !{{ enable_noarch_test }}
NUM_TEST_SHARDS: !{{ num_test_shards }}
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
needs: [calculate-docker-image, build, generate-test-matrix, !{{ ciflow_config.root_job_name }}]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: !{{ build_environment }}-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
!{{ common.setup_ec2_linux() }}
!{{ common.checkout_pytorch("recursive") }}
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Output disk space left
run: |
sudo df -H
!{{ common.parse_ref() }}
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
run: |
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# Used for GPU_FLAG since that doesn't play nice
# shellcheck disable=SC2086
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e PR_NUMBER \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="!{{ common.squid_proxy }}" -e https_proxy="!{{ common.squid_proxy }}" -e no_proxy="!{{ common.squid_no_proxy }}" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--name="${container_name}" \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}"
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
!{{ common.render_test_results() }}
{%- if is_coverage %}
- name: Report coverage
run: |
python3 -mpip install codecov==2.1.12
python3 -mcodecov
{%- endif %}
!{{ common.upload_test_reports(name='linux') }}
!{{ common.upload_test_statistics(build_environment) }}
!{{ common.teardown_ec2_linux() }}
{% endblock %}
{%- endif -%}
{%- if enable_doc_jobs %}
build-docs:
runs-on: linux.2xlarge
strategy:
matrix:
docs_type: [cpp, python]
needs: [calculate-docker-image, build, !{{ ciflow_config.root_job_name }}]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
DOCS_TYPE: ${{ matrix.docs_type }}
steps:
!{{ common.setup_ec2_linux() }}
!{{ common.checkout_pytorch("recursive") }}
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Build ${{ matrix.docs_type }} docs
run: |
set -ex
time docker pull "${DOCKER_IMAGE}" > /dev/null
echo "${GITHUB_REF}"
ref=${GITHUB_REF##*/}
target=${ref//v}
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e IN_CI \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e CIRCLE_SHA1="$GITHUB_SHA" \
-e DOCS_VERSION="${target}" \
-e DOCS_TYPE \
-e PR_LABELS \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" bash -c "sudo chown -R jenkins . && pip install dist/*.whl && ./.circleci/scripts/${DOCS_TYPE}_doc_push_script.sh"
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- uses: !{{ common.upload_artifact_s3_action }}
name: Upload Python Docs Preview
if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'python' }}
with:
retention-days: 14
s3-bucket: doc-previews
if-no-files-found: error
path: pytorch.github.io/docs/merge/
s3-prefix: pytorch/${{ github.event.pull_request.number }}
- uses: !{{ common.upload_artifact_s3_action }}
name: Upload C++ Docs Preview
if: ${{ github.event_name == 'pull_request' && matrix.docs_type == 'cpp' }}
with:
retention-days: 14
if-no-files-found: error
s3-bucket: doc-previews
path: cppdocs/
s3-prefix: pytorch/${{ github.event.pull_request.number }}/cppdocs
- name: Archive artifacts into zip
run: |
zip -r "docs_${DOCS_TYPE}.zip" "${GITHUB_WORKSPACE}/pytorch.github.io" "${GITHUB_WORKSPACE}/cppdocs"
- uses: actions/upload-artifact@v2
name: Store PyTorch Build Artifacts
with:
name: docs_${{ matrix.docs_type }}
path: docs_${{ matrix.docs_type }}.zip
if-no-files-found: error
!{{ common.teardown_ec2_linux() }}
{%- endif -%}


@ -1,259 +0,0 @@
{% import 'common.yml.j2' as common %}
{%- macro wait_and_kill_ssh() -%}
- name: Wait until all sessions have drained
shell: powershell
if: always()
timeout-minutes: 120
run: |
.github\scripts\wait_for_ssh_to_drain.ps1
- name: Kill active ssh sessions if still around (Useful if workflow was cancelled)
shell: powershell
if: always()
run: |
.github\scripts\kill_active_ssh_sessions.ps1
{%- endmacro -%}
# Template is at: .github/templates/windows_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: !{{ build_environment }}
on:
{%- if on_pull_request %}
pull_request:
{%- if ciflow_config.enabled %}
{%- if ciflow_config.trigger_action_only %}
types: [!{{ ciflow_config.trigger_action }}]
{%- else %}
types: [opened, synchronize, reopened, !{{ ciflow_config.trigger_action }}]
{%- endif %}
{%- endif %}
{%- endif %}
{%- if is_scheduled %}
schedule:
- cron: !{{ is_scheduled }}
{%- else %}
push:
branches:
- master
- release/*
{%- endif %}
workflow_dispatch:
env:
BUILD_ENVIRONMENT: !{{ build_environment }}
BUILD_WHEEL: 1
CUDA_VERSION: "!{{ cuda_version }}"
IN_CI: 1
INSTALL_WINDOWS_SDK: 1
PYTHON_VERSION: "3.8"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
SCCACHE_BUCKET: "ossci-compiler-cache"
VC_PRODUCT: "BuildTools"
VC_VERSION: ""
VS_VERSION: "16.8.6"
VC_YEAR: "2019"
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
no_proxy: !{{ common.squid_no_proxy }}
{%- if cuda_version != "cpu" %}
TORCH_CUDA_ARCH_LIST: "7.0"
USE_CUDA: 1
{%- endif %}
!{{ common.concurrency(build_environment) }}
jobs:
{%- if ciflow_config.enabled %}
!{{ ciflow_config.root_job_name }}:
runs-on: ubuntu-18.04
if: ${{ !{{ ciflow_config.root_job_condition }} }}
steps:
- name: noop
run: echo running !{{ ciflow_config.root_job_name }}
{%- endif %}
build:
runs-on: "windows.4xlarge"
defaults:
run:
working-directory: pytorch-${{ github.run_id }}
{%- if ciflow_config.enabled %}
needs: [!{{ ciflow_config.root_job_name }}]
{%- endif %}
env:
JOB_BASE_NAME: !{{ build_environment }}-build
http_proxy: "!{{ common. squid_proxy }}"
https_proxy: "!{{ common.squid_proxy }}"
steps:
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: recursive
path: pytorch-${{ github.run_id }}
# deep clone, to allow use of git merge-base
fetch-depth: 0
!{{ common.display_ec2_information() }}
- name: Install Visual Studio 2019 toolchain
shell: powershell
run: |
.\.circleci\scripts\vs_install.ps1
{%- if cuda_version != "cpu" %}
- name: Install Cuda
shell: bash
run: |
.circleci/scripts/windows_cuda_install.sh
- name: Install Cudnn
shell: bash
run: |
.circleci/scripts/windows_cudnn_install.sh
{%- endif %}
- name: Build
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
run: |
.jenkins/pytorch/win-build.sh
# Upload to github so that people can click and download artifacts
- name: Upload artifacts to Github
if: always()
uses: actions/upload-artifact@v2
# Don't fail on upload to GH since it's only for user convenience
continue-on-error: true
with:
retention-days: 14
if-no-files-found: error
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Upload artifacts to s3
if: always()
uses: !{{ common.upload_artifact_s3_action }}
with:
retention-days: 14
if-no-files-found: error
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
!{{ wait_and_kill_ssh() }}
- name: Cleanup build-results and workspaces
if: always()
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
# Should remove the entirety of pytorch-${{ github.run_id }}
run: |
rm -rf "${PYTORCH_FINAL_PACKAGE_DIR}"
rm -rf ./*
generate-test-matrix:
{%- if ciflow_config.enabled %}
needs: [!{{ ciflow_config.root_job_name }}]
{%- endif %}
runs-on: ubuntu-18.04
env:
TEST_RUNNER_TYPE: !{{ test_runner_type }}
NUM_TEST_SHARDS: !{{ num_test_shards }}
NUM_TEST_SHARDS_ON_PULL_REQUEST: !{{ num_test_shards_on_pull_request }}
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
{%- if only_build_on_pull_request %}
if: ${{ github.event_name == 'push' }}
{%- endif %}
env:
JOB_BASE_NAME: !{{ build_environment }}-test
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
TEST_CONFIG: ${{ matrix.config }}
http_proxy: "!{{ common.squid_proxy }}"
https_proxy: "!{{ common.squid_proxy }}"
RUN_SMOKE_TESTS_ONLY_ON_PR: !{{ only_run_smoke_tests_on_pull_request }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
needs: [build, generate-test-matrix, !{{ ciflow_config.root_job_name }}]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
defaults:
run:
working-directory: pytorch-${{ github.run_id }}
steps:
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: recursive
path: pytorch-${{ github.run_id }}
# deep clone, to allow use of git merge-base
fetch-depth: 0
!{{ common.display_ec2_information() }}
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Install Visual Studio 2019 toolchain
shell: powershell
run: |
.\.circleci\scripts\vs_install.ps1
{%- if cuda_version != "cpu" %}
- name: Install Cuda
shell: bash
run: |
.circleci/scripts/windows_cuda_install.sh
- name: Install Cudnn
shell: bash
run: |
.circleci/scripts/windows_cudnn_install.sh
{%- endif %}
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
path: C:\${{ github.run_id }}\build-results
- name: Check build-results folder
shell: powershell
run: |
tree /F C:\$Env:GITHUB_RUN_ID\build-results
# Needed for coverage in win-test.sh
- uses: actions/setup-python@v2
name: Setup Python3
with:
python-version: '3.x'
- name: Test
shell: bash
env:
PYTORCH_FINAL_PACKAGE_DIR: /c/${{ github.run_id }}/build-results/
run: |
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
if [[ -n $GITHUB_HEAD_REF && "$RUN_SMOKE_TESTS_ONLY_ON_PR" == "true" ]]; then
export RUN_SMOKE_TESTS_ONLY=1
fi
.jenkins/pytorch/win-test.sh
!{{ common.upload_test_reports(name='windows') }}
!{{ common.render_test_results() }}
!{{ wait_and_kill_ssh() }}
!{{ common.parse_ref() }}
!{{ common.upload_test_statistics(build_environment) }}
- name: Cleanup workspace
if: always()
shell: bash
# Should remove the entirety of pytorch-${{ github.run_id }}
run: |
rm -rf ./*

66
.github/workflows/add_annotations.yml vendored Normal file

@ -0,0 +1,66 @@
name: Add annotations
on:
workflow_run:
types:
- completed
workflows:
- Lint
jobs:
annotate:
strategy:
fail-fast: false
matrix:
name:
- flake8-py3
- clang-tidy
runs-on: ubuntu-18.04
steps:
- name: Download artifact
uses: actions/github-script@v3
env:
RUN_ID: ${{ github.event.workflow_run.id }}
LINT_NAME: ${{ matrix.name }}
with:
# https://securitylab.github.com/research/github-actions-preventing-pwn-requests/
script: |
const artifacts = await github.actions.listWorkflowRunArtifacts({
owner: context.repo.owner,
repo: context.repo.repo,
run_id: process.env.RUN_ID,
});
const filteredArtifacts = artifacts.data.artifacts.filter(artifact => {
return artifact.name == process.env.LINT_NAME;
});
if (filteredArtifacts.length > 0) {
const matchArtifact = filteredArtifacts[0];
const download = await github.actions.downloadArtifact({
owner: context.repo.owner,
repo: context.repo.repo,
artifact_id: matchArtifact.id,
archive_format: 'zip',
});
const fs = require('fs');
fs.writeFileSync(
`${process.env.GITHUB_WORKSPACE}/linter-output.zip`,
Buffer.from(download.data),
);
}
- name: Unzip artifact
id: unzip
run: |
if unzip linter-output.zip annotations.json commit-sha.txt; then
echo ::set-output \
name=sha::"$(grep -Em1 '^[[:xdigit:]]{40}$' commit-sha.txt)"
fi
- if: steps.unzip.outputs.sha
name: Add annotations
uses: pytorch/add-annotations-github-action@master
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
check_name: ${{ matrix.name }}
linter_output_path: annotations.json
commit_sha: ${{ steps.unzip.outputs.sha }}
mode: json
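For orientation, the github-script step above boils down to two GitHub REST calls: list the triggering run's artifacts, then download the matching one as a zip. A rough Python equivalent, assuming a token with read access to Actions artifacts (RUN_ID and LINT_NAME mirror the env vars above; everything else is illustrative):

```python
import io
import os
import zipfile

import requests

API = "https://api.github.com/repos/pytorch/pytorch"
HEADERS = {"Authorization": f"token {os.environ['GITHUB_TOKEN']}"}

def download_lint_artifact(run_id, name, dest_dir="."):
    # List artifacts produced by the Lint run that triggered this workflow_run event
    listing = requests.get(f"{API}/actions/runs/{run_id}/artifacts", headers=HEADERS)
    listing.raise_for_status()
    matches = [a for a in listing.json()["artifacts"] if a["name"] == name]
    if not matches:
        return False
    # Download the zipped artifact and unpack annotations.json / commit-sha.txt
    archive = requests.get(f"{API}/actions/artifacts/{matches[0]['id']}/zip", headers=HEADERS)
    archive.raise_for_status()
    zipfile.ZipFile(io.BytesIO(archive.content)).extractall(dest_dir)
    return True

if __name__ == "__main__":
    download_lint_artifact(os.environ["RUN_ID"], os.environ["LINT_NAME"])
```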


@@ -6,15 +6,8 @@ on:
pull_request_target:
types: [edited, opened, synchronize, reopened]
concurrency:
group: auto-label-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
auto-label-rocm:
if: ${{ github.repository == 'pytorch/pytorch' }}
runs-on: ubuntu-18.04
steps:
- name: Retrieve information


@@ -16,7 +16,7 @@ jobs:
image: python:3.9
steps:
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
uses: actions/checkout@v2
- name: Generating build matrix
id: set-matrix
run: |
@@ -29,7 +29,8 @@ jobs:
needs: generate-build-matrix
runs-on: linux.2xlarge
strategy:
matrix: ${{ fromJson(needs.generate-build-matrix.outputs.matrix) }}
matrix:
${{ fromJson(needs.generate-build-matrix.outputs.matrix) }}
fail-fast: false
container:
image: ${{ matrix.container_image }}
@@ -40,7 +41,7 @@ jobs:
DESIRED_CUDA: ${{ matrix.gpu_arch_version }}
GPU_ARCH_VERSION: ${{ matrix.GPU_ARCH_VERSION }}
GPU_ARCH_TYPE: ${{ matrix.gpu_arch_type }}
NO_BUILD_SUFFIX: true
NO_BUILD_SUFFIX: True
# TODO: This is a legacy variable, we should just default all build to use
# this folder within the conda/build_pytorch.sh script
TORCH_CONDA_BUILD_FOLDER: pytorch-nightly
@@ -57,12 +58,12 @@ jobs:
- name: Clean runner workspace
run: rm -rf "$GITHUB_WORKSPACE"
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
uses: actions/checkout@v2
with:
path: pytorch
submodules: recursive
- name: Clone pytorch/builder
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
uses: actions/checkout@v2
with:
repository: pytorch/builder
path: builder
@@ -91,25 +92,4 @@ jobs:
with:
name: pytorch-conda-py${{ matrix.python_version }}-${{matrix.gpu_arch_type}}-${{ matrix.gpu_arch_version }}
path: /remote/**/*.bz2
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
concurrency:
group: build-linux-conda-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
# TODO: Add a step here for uploading binaries


@@ -16,7 +16,7 @@ jobs:
image: python:3.9
steps:
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
uses: actions/checkout@v2
- name: Generating build matrix
id: set-matrix
run: |
@@ -29,7 +29,8 @@ jobs:
needs: generate-build-matrix
runs-on: linux.2xlarge
strategy:
matrix: ${{ fromJson(needs.generate-build-matrix.outputs.matrix) }}
matrix:
${{ fromJson(needs.generate-build-matrix.outputs.matrix) }}
fail-fast: false
container:
image: ${{ matrix.container_image }}
@@ -51,12 +52,12 @@ jobs:
- name: Clean runner workspace
run: rm -rf "$GITHUB_WORKSPACE"
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
uses: actions/checkout@v2
with:
path: pytorch
submodules: recursive
- name: Clone pytorch/builder
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
uses: actions/checkout@v2
with:
repository: pytorch/builder
path: builder
@@ -90,25 +91,4 @@ jobs:
with:
name: pytorch-libtorch-${{ matrix.libtorch_variant }}-${{ matrix.devtoolset }}-${{matrix.gpu_arch_type}}-${{ matrix.gpu_arch_version }}
path: /remote/**/*.zip
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
concurrency:
group: build-linux-libtorch-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
# TODO: Add a step here for uploading binaries


@@ -16,7 +16,7 @@ jobs:
image: python:3.9
steps:
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
uses: actions/checkout@v2
- name: Generating build matrix
id: set-matrix
run: |
@@ -29,7 +29,8 @@ jobs:
needs: generate-build-matrix
runs-on: linux.2xlarge
strategy:
matrix: ${{ fromJson(needs.generate-build-matrix.outputs.matrix) }}
matrix:
${{ fromJson(needs.generate-build-matrix.outputs.matrix) }}
fail-fast: false
container:
image: ${{ matrix.container_image }}
@@ -46,12 +47,12 @@ jobs:
- name: Clean runner workspace
run: rm -rf "$GITHUB_WORKSPACE"
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
uses: actions/checkout@v2
with:
path: pytorch
submodules: recursive
- name: Clone pytorch/builder
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
uses: actions/checkout@v2
with:
repository: pytorch/builder
path: builder
@@ -89,25 +90,4 @@ jobs:
with:
name: pytorch-wheel-py${{ matrix.python_version }}-${{matrix.gpu_arch_type}}-${{ matrix.gpu_arch_version }}
path: /remote/**/*.whl
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
concurrency:
group: build-linux-wheels-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
# TODO: Add a step here for uploading binaries


@@ -0,0 +1,24 @@
name: Cancel redundant workflows
on:
workflow_run:
types:
- requested
# NOTE: Make sure to add to this list as you add more workflows running on 'pull_request'
workflows:
- Lint
- Linux CI (pytorch-linux-xenial-py3.6-gcc5.4)
- Test tools
- TorchBench CI (pytorch-linux-py3.7-cu102)
- clang-format
jobs:
cancel:
# We do not want to cancel reruns on master
if: github.event.workflow_run.head_branch != 'master'
runs-on: ubuntu-18.04
steps:
- name: Cancel duplicate workflow runs
uses: potiuk/cancel-workflow-runs@a81b3c4d59c61e27484cfacdc13897dd908419c9
with:
cancelMode: duplicates
token: ${{ secrets.GITHUB_TOKEN }}
sourceRunId: ${{ github.event.workflow_run.id }}
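This cancel job and the `concurrency:` stanzas added throughout these workflows rely on the same deduplication key: the PR number when the event comes from a pull request, otherwise the commit SHA. A small sketch of that grouping rule (run records and values are hypothetical; the actual cancellation is handled by GitHub Actions):

```python
# Sketch of the concurrency-group key used above:
# `<workflow>-${{ github.event.pull_request.number || github.sha }}-<is workflow_dispatch>`
def concurrency_group(workflow, pr_number=None, sha="", manual=False):
    key = pr_number if pr_number else sha   # PR runs collapse per PR, pushes per commit
    return f"{workflow}-{key}-{str(manual).lower()}"

runs = [
    {"pr_number": 12345, "sha": "aaa111"},  # first push to a PR
    {"pr_number": 12345, "sha": "bbb222"},  # follow-up push: same group, older run cancelled
    {"pr_number": None,  "sha": "ccc333"},  # push to master: its own group, never cancelled
]
print([concurrency_group("Lint", r["pr_number"], r["sha"]) for r in runs])
```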

.github/workflows/clang_format.yml

@@ -0,0 +1,44 @@
name: clang-format
on:
pull_request:
jobs:
clang-format:
runs-on: ubuntu-18.04
steps:
- name: Setup Python
uses: actions/setup-python@v2
with:
python-version: 3.x
architecture: x64
- name: Fetch PyTorch
uses: actions/checkout@v2
with:
fetch-depth: 0 # deep clone, to allow us to use git merge-base
- name: Run clang-format
env:
BASE_SHA: ${{ github.event.pull_request.base.sha }}
run: |
set -eu
# This is necessary to get the same results regardless of whether the
# PR was opened directly or from a forked repo. See: `9f890a92` for more info.
git remote add upstream https://github.com/pytorch/pytorch
git fetch upstream "$GITHUB_BASE_REF"
# only run clang-format on allowlisted files
echo "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
echo "| clang-format failures found! Run: "
echo "| tools/clang_format_ci.sh ${BASE_SHA} "
echo "| to fix this error. "
echo "| For more info, see: https://github.com/pytorch/pytorch/wiki/clang-format "
echo "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
tools/clang_format_ci.sh "${BASE_SHA}"
GIT_DIFF=$(git diff)
if [[ -z $GIT_DIFF ]]; then
exit 0
fi
echo "$GIT_DIFF"
exit 1
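The job above treats any non-empty `git diff` left behind by the formatter as a failure. A hedged Python sketch of that "formatting must be a no-op" gate, assuming git and tools/clang_format_ci.sh are available on the runner (paths and the default base are illustrative):

```python
import subprocess
import sys

def check_clang_format(base_sha):
    # Run the CI formatter against the PR's base, exactly like the step above
    subprocess.run(["tools/clang_format_ci.sh", base_sha], check=True)
    diff = subprocess.run(["git", "diff"], capture_output=True, text=True, check=True).stdout
    if not diff:
        return 0        # formatter changed nothing: pass
    print(diff)         # show what clang-format would rewrite
    return 1            # mirror the `exit 1` above

if __name__ == "__main__":
    sys.exit(check_clang_format(sys.argv[1] if len(sys.argv) > 1 else "HEAD~1"))
```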


@@ -1,53 +0,0 @@
name: Create Release
on:
push:
tags: ['v*']
branches: [master]
release:
types: [published]
pull_request:
paths: [.github/workflows/create_release.yml]
jobs:
release:
if: ${{ github.repository == 'pytorch/pytorch' }}
name: Create Release
runs-on: ubuntu-latest
steps:
- uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
submodules: 'recursive'
- name: Fake name for PRs
if: ${{ github.event_name == 'pull_request' }}
run: echo "PT_GITHUB_REF=refs/tags/pr-tag" >> "$GITHUB_ENV"
- name: Real name for non-PRs
if: ${{ github.event_name != 'pull_request' }}
run: echo "PT_GITHUB_REF=$GITHUB_REF" >> "$GITHUB_ENV"
- name: Set filenames
run: |
tag_or_branch="${PT_GITHUB_REF#refs/tags/}"
tag_or_branch="${tag_or_branch#refs/heads/}"
echo "PT_RELEASE_NAME=pytorch-$tag_or_branch" >> "$GITHUB_ENV"
echo "PT_RELEASE_FILE=pytorch-$tag_or_branch.tar.gz" >> "$GITHUB_ENV"
- name: Create source distribution
run: |
# Create new folder with specified name so extracting the archive yields that
rm -rf "/tmp/$PT_RELEASE_NAME"
cp -r "$PWD" "/tmp/$PT_RELEASE_NAME"
mv "/tmp/$PT_RELEASE_NAME" .
# Cleanup
rm -r "$PT_RELEASE_NAME"/{.azure_pipelines,.circleci,.jenkins}
find "$PT_RELEASE_NAME" -name '.git*' -exec rm -rv {} \; || true
# Create archive
tar -czf "$PT_RELEASE_FILE" "$PT_RELEASE_NAME"
echo "Created source archive $PT_RELEASE_FILE with content: $(ls -a "$PT_RELEASE_NAME")"
- name: Upload source distribution
if: ${{ github.event_name == 'release' }}
uses: softprops/action-gh-release@v1
with:
files: ${{env.PT_RELEASE_FILE}}
concurrency:
group: create-release-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
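The "Set filenames" step above derives the release name by stripping either a refs/tags/ or refs/heads/ prefix from the ref. A one-function sketch of that derivation (pure string handling; the sample refs are illustrative):

```python
def release_names(pt_github_ref):
    # Same as the two ${VAR#prefix} expansions in the "Set filenames" step
    tag_or_branch = pt_github_ref
    for prefix in ("refs/tags/", "refs/heads/"):
        if tag_or_branch.startswith(prefix):
            tag_or_branch = tag_or_branch[len(prefix):]
    name = f"pytorch-{tag_or_branch}"
    return name, f"{name}.tar.gz"

print(release_names("refs/tags/v1.9.1"))   # ('pytorch-v1.9.1', 'pytorch-v1.9.1.tar.gz')
print(release_names("refs/heads/master"))  # ('pytorch-master', 'pytorch-master.tar.gz')
print(release_names("refs/tags/pr-tag"))   # PR builds use the fake ref set above
```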


@@ -1,283 +0,0 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: libtorch-linux-xenial-cuda11.3-py3.6-gcc7
on:
pull_request:
types: [unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: libtorch-linux-xenial-cuda11.3-py3.6-gcc7
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# This is used for the phase of adding wheel tests only, will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: libtorch-linux-xenial-cuda11.3-py3.6-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/libtorch') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: libtorch-linux-xenial-cuda11.3-py3.6-gcc7-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
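The "Calculate docker image tag" and "Check if image should be built" steps in the workflow above key the CI image off the git tree hash of .circleci/docker and rebuild only when that hash differs from the one at the merge base. A condensed Python sketch of that decision, assuming it runs in a full clone (image_exists stands in for `docker manifest inspect`, and exceptions stand in for the `exit 1` paths):

```python
import subprocess

def _git(*args):
    return subprocess.run(["git", *args], capture_output=True, text=True, check=True).stdout.strip()

def should_rebuild(base_revision, image_exists):
    docker_tag = _git("rev-parse", "HEAD:.circleci/docker")    # tree hash doubles as image tag
    if image_exists(docker_tag):
        return False                                            # image already published: skip build
    if base_revision == _git("rev-parse", "HEAD"):
        merge_base = _git("rev-parse", "HEAD~")                 # on the base branch: use parent commit
    else:
        merge_base = _git("merge-base", "HEAD", base_revision)  # on a PR: use the merge base
    previous_tag = _git("rev-parse", f"{merge_base}:.circleci/docker")
    if previous_tag == docker_tag:
        raise RuntimeError("previous image missing for this merge-base; contact the PyTorch team")
    return True
```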


@@ -1,562 +0,0 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: linux-bionic-cuda10.2-py3.9-gcc7
on:
pull_request:
types: [unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: linux-bionic-cuda10.2-py3.9-gcc7
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda10.2-cudnn7-py3.9-gcc7
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# This is used for the phase of adding wheel tests only, will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: linux-bionic-cuda10.2-py3.9-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository_owner == 'pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/slow'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: seemethere/upload-artifact-s3@v3
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
generate-test-matrix:
runs-on: ubuntu-18.04
needs: [ciflow_should_run]
env:
TEST_RUNNER_TYPE: linux.8xlarge.nvidia.gpu
ENABLE_DISTRIBUTED_TEST: 1
ENABLE_JIT_LEGACY_TEST: ''
ENABLE_MULTIGPU_TEST: ''
ENABLE_NOGPU_NO_AVX_TEST: ''
ENABLE_NOGPU_NO_AVX2_TEST: ''
ENABLE_SLOW_TEST: ''
ENABLE_DOCS_TEST: ''
ENABLE_BACKWARDS_COMPAT_TEST: ''
ENABLE_XLA_TEST: ''
ENABLE_NOARCH_TEST: ''
NUM_TEST_SHARDS: 2
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
needs: [calculate-docker-image, build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Output disk space left
run: |
sudo df -H
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
run: |
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# Used for GPU_FLAG since that doesn't play nice
# shellcheck disable=SC2086
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e PR_NUMBER \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--name="${container_name}" \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}"
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on windows, just try to default to utf-8 if possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: linux-bionic-cuda10.2-py3.9-gcc7-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
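The generate-test-matrix job above emits a JSON matrix (consumed via fromJson) whose entries carry a config name, shard index, shard count, and runner type. A hedged sketch of the kind of matrix the flags above could produce; the real shapes come from .github/scripts/generate_pytorch_test_matrix.py, which is not shown in this diff, so the field names here are inferred from how matrix.config, matrix.shard, matrix.num_shards, and matrix.runner are used:

```python
import json
import os

def build_matrix(runner, num_shards, enable_distributed):
    # Hypothetical reconstruction of the matrix shape; the generation script may differ
    entries = [
        {"config": "default", "shard": s, "num_shards": num_shards, "runner": runner}
        for s in range(1, num_shards + 1)
    ]
    if enable_distributed:
        entries.append({"config": "distributed", "shard": 1, "num_shards": 1, "runner": runner})
    return {"include": entries}

matrix = build_matrix(
    runner=os.environ.get("TEST_RUNNER_TYPE", "linux.8xlarge.nvidia.gpu"),
    num_shards=int(os.environ.get("NUM_TEST_SHARDS", "2")),
    enable_distributed=bool(os.environ.get("ENABLE_DISTRIBUTED_TEST")),
)
# The job would publish this with `::set-output name=matrix::<json>` for `fromJson` to consume
print(json.dumps(matrix))
```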


@@ -1,562 +0,0 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: linux-bionic-py3.6-clang9
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: linux-bionic-py3.6-clang9
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-py3.6-clang9
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# This is used for the phase of adding wheel tests only, will be removed once completed
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: linux-bionic-py3.6-clang9-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/default') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/noarch') || contains(github.event.pull_request.labels.*.name, 'ciflow/xla'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable on trees that don't have `.circleci/docker` at its merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-bionic-py3.6-clang9-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: seemethere/upload-artifact-s3@v3
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
generate-test-matrix:
runs-on: ubuntu-18.04
needs: [ciflow_should_run]
env:
TEST_RUNNER_TYPE: linux.2xlarge
ENABLE_DISTRIBUTED_TEST: ''
ENABLE_JIT_LEGACY_TEST: ''
ENABLE_MULTIGPU_TEST: ''
ENABLE_NOGPU_NO_AVX_TEST: ''
ENABLE_NOGPU_NO_AVX2_TEST: ''
ENABLE_SLOW_TEST: ''
ENABLE_DOCS_TEST: ''
ENABLE_BACKWARDS_COMPAT_TEST: ''
ENABLE_XLA_TEST: ''
ENABLE_NOARCH_TEST: 1
NUM_TEST_SHARDS: 2
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
needs: [calculate-docker-image, build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-bionic-py3.6-clang9-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Output disk space left
run: |
sudo df -H
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
run: |
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# Used for GPU_FLAG since that doesn't play nice
# shellcheck disable=SC2086
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e PR_NUMBER \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--name="${container_name}" \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}"
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on windows, just try to default to utf-8 if possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: linux-bionic-py3.6-clang9-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
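
The test job above fans out over a JSON matrix produced by .github/scripts/generate_pytorch_test_matrix.py and consumed through fromJson, with each entry read back as matrix.config, matrix.shard, matrix.num_shards and matrix.runner. The exact contents depend on the ENABLE_* and NUM_TEST_SHARDS settings of the generate-test-matrix job; the following is only a rough sketch of the shape that step might emit (the values are illustrative assumptions, not taken from the script):

# Illustrative sketch only: approximate shape of the matrix output the test job expects.
MATRIX='{"include":[{"config":"default","shard":1,"num_shards":2,"runner":"linux.2xlarge"},{"config":"default","shard":2,"num_shards":2,"runner":"linux.2xlarge"},{"config":"distributed","shard":1,"num_shards":1,"runner":"linux.2xlarge"}]}'
echo "::set-output name=matrix::${MATRIX}"
# The workflows also declare render-matrix and ignore-disabled-issues outputs from the same step.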


@@ -1,566 +0,0 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: linux-bionic-py3.8-gcc9-coverage
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: linux-bionic-py3.8-gcc9-coverage
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-py3.8-gcc9
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# This is used only during the phase of adding wheel tests and will be removed once that phase is complete
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: linux-bionic-py3.8-gcc9-coverage-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/coverage') || contains(github.event.pull_request.labels.*.name, 'ciflow/cpu') || contains(github.event.pull_request.labels.*.name, 'ciflow/default') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/"*
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable to trees that don't have `.circleci/docker` at their merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-bionic-py3.8-gcc9-coverage-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/"*
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: seemethere/upload-artifact-s3@v3
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
generate-test-matrix:
runs-on: ubuntu-18.04
needs: [ciflow_should_run]
env:
TEST_RUNNER_TYPE: linux.2xlarge
ENABLE_DISTRIBUTED_TEST: 1
ENABLE_JIT_LEGACY_TEST: ''
ENABLE_MULTIGPU_TEST: ''
ENABLE_NOGPU_NO_AVX_TEST: ''
ENABLE_NOGPU_NO_AVX2_TEST: ''
ENABLE_SLOW_TEST: ''
ENABLE_DOCS_TEST: ''
ENABLE_BACKWARDS_COMPAT_TEST: ''
ENABLE_XLA_TEST: ''
ENABLE_NOARCH_TEST: ''
NUM_TEST_SHARDS: 2
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
needs: [calculate-docker-image, build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-bionic-py3.8-gcc9-coverage-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/"*
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Output disk space left
run: |
sudo df -H
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
run: |
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# Used for GPU_FLAG since that doesn't play nice
# shellcheck disable=SC2086
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e PR_NUMBER \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--name="${container_name}" \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}"
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on windows, just try to default to utf-8 if possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Report coverage
run: |
python3 -mpip install codecov==2.1.12
python3 -mcodecov
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: linux-bionic-py3.8-gcc9-coverage-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
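
The calculate-docker-image job above only rebuilds the CI image when the contents of .circleci/docker change: the image tag is the git tree hash of that directory, and a manifest lookup decides whether a rebuild is needed. A minimal standalone sketch of that first check, assuming DOCKER_IMAGE_BASE is exported as in the workflow env and that an ECR login has already been performed:

# Sketch only: mirrors the "Calculate docker image tag" step and the first check in
# "Check if image should be built" above.
set -euo pipefail
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)   # tree hash of the image build context
DOCKER_IMAGE="${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
if docker manifest inspect "${DOCKER_IMAGE}" >/dev/null 2>&1; then
  echo "Image ${DOCKER_IMAGE} already exists; no rebuild needed"
else
  echo "Image ${DOCKER_IMAGE} not found; the workflow would rebuild and push it"
fi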


@@ -1,562 +0,0 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: linux-xenial-cuda10.2-py3.6-gcc7
on:
pull_request:
types: [unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: linux-xenial-cuda10.2-py3.6-gcc7
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# This is used only during the phase of adding wheel tests and will be removed once that phase is complete
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: linux-xenial-cuda10.2-py3.6-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux') || contains(github.event.pull_request.labels.*.name, 'ciflow/slow'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/"*
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable to trees that don't have `.circleci/docker` at their merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-xenial-cuda10.2-py3.6-gcc7-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/"*
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: seemethere/upload-artifact-s3@v3
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
generate-test-matrix:
runs-on: ubuntu-18.04
needs: [ciflow_should_run]
env:
TEST_RUNNER_TYPE: linux.8xlarge.nvidia.gpu
ENABLE_DISTRIBUTED_TEST: 1
ENABLE_JIT_LEGACY_TEST: 1
ENABLE_MULTIGPU_TEST: 1
ENABLE_NOGPU_NO_AVX_TEST: 1
ENABLE_NOGPU_NO_AVX2_TEST: 1
ENABLE_SLOW_TEST: 1
ENABLE_DOCS_TEST: ''
ENABLE_BACKWARDS_COMPAT_TEST: ''
ENABLE_XLA_TEST: ''
ENABLE_NOARCH_TEST: ''
NUM_TEST_SHARDS: 2
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
needs: [calculate-docker-image, build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-xenial-cuda10.2-py3.6-gcc7-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/"*
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Output disk space left
run: |
sudo df -H
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
run: |
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# Used for GPU_FLAG since that doesn't play nice
# shellcheck disable=SC2086
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e PR_NUMBER \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--name="${container_name}" \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}"
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is weird on windows, just try to default to utf-8 if possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: linux-xenial-cuda10.2-py3.6-gcc7-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
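
The Test step above amounts to starting the CI image with the checked-out workspace mounted and running .jenkins/pytorch/test.sh inside it as the jenkins user. A hedged local approximation for a single shard (DOCKER_IMAGE, a CUDA-capable host, and a workspace already containing dist/*.whl from the build job are all assumed):

# Sketch only: condenses the detached docker run + docker exec pair above into one command.
docker run --rm -it \
  --gpus all \
  --shm-size="2g" \
  -e BUILD_ENVIRONMENT=linux-xenial-cuda10.2-py3.6-gcc7 \
  -e TEST_CONFIG=default -e SHARD_NUMBER=1 -e NUM_TEST_SHARDS=2 -e IN_CI=1 \
  --user jenkins \
  -v "$(pwd):/var/lib/jenkins/workspace" \
  -w /var/lib/jenkins/workspace \
  "${DOCKER_IMAGE}" \
  sh -c "sudo chown -R jenkins . && pip install dist/*.whl && .jenkins/pytorch/test.sh"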


@@ -1,562 +0,0 @@
# @generated DO NOT EDIT MANUALLY
# Template is at: .github/templates/linux_ci_workflow.yml.j2
# Generation script: .github/scripts/generate_ci_workflows.py
name: linux-xenial-cuda11.3-py3.6-gcc7
on:
pull_request:
types: [opened, synchronize, reopened, unassigned]
push:
branches:
- master
- release/*
workflow_dispatch:
env:
BUILD_ENVIRONMENT: linux-xenial-cuda11.3-py3.6-gcc7
DOCKER_IMAGE_BASE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7
SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
TORCH_CUDA_ARCH_LIST: 5.2
IN_CI: 1
# This is used only during the phase of adding wheel tests and will be removed once that phase is complete
IN_WHEEL_TEST: 1
# Used for custom_operator, jit_hooks, custom_backend, see .jenkins/pytorch/build.sh
CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
PR_LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
concurrency:
group: linux-xenial-cuda11.3-py3.6-gcc7-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
cancel-in-progress: true
jobs:
ciflow_should_run:
runs-on: ubuntu-18.04
if: ${{ (github.repository == 'pytorch/pytorch') && ((github.event_name != 'pull_request') || (github.event.assignee.login != 'pytorchbot' ) || (github.event.action !='unassigned') || (contains(github.event.pull_request.labels.*.name, 'ciflow/all') || contains(github.event.pull_request.labels.*.name, 'ciflow/cuda') || contains(github.event.pull_request.labels.*.name, 'ciflow/default') || contains(github.event.pull_request.labels.*.name, 'ciflow/linux'))) }}
env:
LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
steps:
- name: noop
run: echo running ciflow_should_run
- name: print labels
run: echo "${LABELS}"
calculate-docker-image:
runs-on: linux.2xlarge
needs: [ciflow_should_run]
env:
DOCKER_BUILDKIT: 1
timeout-minutes: 90
outputs:
docker_image: ${{ steps.calculate-tag.outputs.docker_image }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/"*
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: false
- name: Calculate docker image tag
id: calculate-tag
run: |
DOCKER_TAG=$(git rev-parse HEAD:.circleci/docker)
echo "::set-output name=docker_tag::${DOCKER_TAG}"
echo "::set-output name=docker_image::${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"
- name: Check if image should be built
id: check
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
BASE_REVISION: ${{ github.event.pull_request.base.sha || github.sha }}
run: |
set -x
# Check if image already exists, if it does then skip building it
if docker manifest inspect "${DOCKER_IMAGE_BASE}:${DOCKER_TAG}"; then
exit 0
fi
if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then
# if we're on the base branch then use the parent commit
MERGE_BASE=$(git rev-parse HEAD~)
else
# otherwise we're on a PR, so use the most recent base commit
MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")
fi
# Covers the case where a previous tag doesn't exist for the tree
# this is only really applicable to trees that don't have `.circleci/docker` at their merge base, i.e. nightly
if ! git rev-parse "$MERGE_BASE:.circleci/docker"; then
echo "Directory '.circleci/docker' not found in commit $MERGE_BASE, you should probably rebase onto a more recent commit"
exit 1
fi
PREVIOUS_DOCKER_TAG=$(git rev-parse "$MERGE_BASE:.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
if [[ "${PREVIOUS_DOCKER_TAG}" = "${DOCKER_TAG}" ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
exit 1
fi
echo ::set-output name=rebuild::yes
- name: Build and push docker image
if: ${{ steps.check.outputs.rebuild }}
env:
DOCKER_TAG: ${{ steps.calculate-tag.outputs.docker_tag }}
DOCKER_SKIP_S3_UPLOAD: 1
run: |
export IMAGE_NAME=${DOCKER_IMAGE_BASE#308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/}
cd .circleci/docker && ./build_docker.sh
build:
runs-on: linux.2xlarge
needs: [calculate-docker-image, ciflow_should_run]
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-xenial-cuda11.3-py3.6-gcc7-build
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/"*
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Build
run: |
# detached container should get cleaned up by teardown_ec2_linux
container_name=$(docker run \
-e BUILD_ENVIRONMENT \
-e JOB_BASE_NAME \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e SKIP_SCCACHE_INITIALIZATION=1 \
-e TORCH_CUDA_ARCH_LIST \
-e PR_LABELS \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--tty \
--detach \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c 'sudo chown -R jenkins . && .jenkins/pytorch/build.sh'
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Display and upload binary build size statistics (Click Me)
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
IS_GHA: 1
SCRIBE_GRAPHQL_ACCESS_TOKEN: ${{ secrets.SCRIBE_GRAPHQL_ACCESS_TOKEN }}
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
run: |
COMMIT_TIME=$(git log --max-count=1 --format=%ct || echo 0)
export COMMIT_TIME
pip3 install requests==2.26 boto3==1.16.34
python3 -m tools.stats.upload_binary_size_to_scuba || exit 0
- name: Chown workspace
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Archive artifacts into zip
run: |
zip -1 -r artifacts.zip dist/ build/custom_test_artifacts build/lib build/bin .pytorch-test-times.json
- uses: seemethere/upload-artifact-s3@v3
name: Store PyTorch Build Artifacts on S3
with:
name: ${{ env.BUILD_ENVIRONMENT }}
retention-days: 14
if-no-files-found: error
path:
artifacts.zip
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Clean up docker images
if: always()
run: |
# Prune all of the docker images
docker system prune -af
generate-test-matrix:
runs-on: ubuntu-18.04
needs: [ciflow_should_run]
env:
TEST_RUNNER_TYPE: linux.8xlarge.nvidia.gpu
ENABLE_DISTRIBUTED_TEST: 1
ENABLE_JIT_LEGACY_TEST: ''
ENABLE_MULTIGPU_TEST: ''
ENABLE_NOGPU_NO_AVX_TEST: ''
ENABLE_NOGPU_NO_AVX2_TEST: ''
ENABLE_SLOW_TEST: ''
ENABLE_DOCS_TEST: ''
ENABLE_BACKWARDS_COMPAT_TEST: ''
ENABLE_XLA_TEST: ''
ENABLE_NOARCH_TEST: ''
NUM_TEST_SHARDS: 2
MULTIGPU_RUNNER_TYPE: linux.16xlarge.nvidia.gpu
NOGPU_RUNNER_TYPE: linux.2xlarge
PR_BODY: ${{ github.event.pull_request.body }}
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
render-matrix: ${{ steps.set-matrix.outputs.render-matrix }}
ignore-disabled-issues: ${{ steps.set-matrix.outputs.ignore-disabled-issues }}
container:
image: python:3.9
steps:
- name: Install dependencies
run: pip install typing-extensions==3.10
- name: Clone pytorch/pytorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
- name: Generating test matrix
id: set-matrix
run: .github/scripts/generate_pytorch_test_matrix.py
test:
needs: [calculate-docker-image, build, generate-test-matrix, ciflow_should_run]
strategy:
matrix: ${{ fromJson(needs.generate-test-matrix.outputs.matrix) }}
fail-fast: false
runs-on: ${{ matrix.runner }}
env:
DOCKER_IMAGE: ${{ needs.calculate-docker-image.outputs.docker_image }}
JOB_BASE_NAME: linux-xenial-cuda11.3-py3.6-gcc7-test
TEST_CONFIG: ${{ matrix.config }}
SHARD_NUMBER: ${{ matrix.shard }}
NUM_TEST_SHARDS: ${{ matrix.num_shards }}
PYTORCH_IGNORE_DISABLED_ISSUES: ${{ needs.generate-test-matrix.outputs.ignore-disabled-issues }}
CONTINUE_THROUGH_ERROR: ${{ github.repository == 'pytorch/pytorch' && (github.event_name == 'push' || github.event_name == 'schedule') }}
steps:
- name: Display EC2 information
shell: bash
run: |
set -euo pipefail
function get_ec2_metadata() {
# Pulled from instance metadata endpoint for EC2
# see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
category=$1
curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
}
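# 169.254.169.254 is the EC2 instance metadata service endpoint (unauthenticated,
# IMDSv1-style GET); -f makes curl exit non-zero on HTTP errors instead of printing the error body.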
echo "ami-id: $(get_ec2_metadata ami-id)"
echo "instance-id: $(get_ec2_metadata instance-id)"
echo "instance-type: $(get_ec2_metadata instance-type)"
- name: Log in to ECR
env:
AWS_RETRY_MODE: standard
AWS_MAX_ATTEMPTS: 5
run: |
aws ecr get-login --no-include-email --region us-east-1 > /tmp/ecr-login.sh
bash /tmp/ecr-login.sh
rm /tmp/ecr-login.sh
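# `aws ecr get-login` (AWS CLI v1) prints a `docker login` command containing a temporary
# token; the script saves it to a file, runs it, then deletes the file so the token is not left behind.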
- name: Chown workspace
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
retry () {
"$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
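# retry: run the given command up to three times, sleeping 1s and then 2s between attempts,
# presumably to ride out transient ECR/network failures.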
# Ensure the working directory gets chowned back to the current user
retry docker pull "${ALPINE_IMAGE}"
docker run --pull=never --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Clean workspace
run: |
rm -rf "${GITHUB_WORKSPACE:?}/*"
rm -f ~/.ssh/authorized_keys
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Preserve github env variables for use in docker
run: |
env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}"
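# This file is later passed to `docker run` via --env-file (see the Test step below), so the
# GITHUB_* variables are visible inside the test container.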
- name: Checkout PyTorch
uses: zhouzhuojie/checkout@05b13c9a0d21f08f6d5e64a1d5042246d13619d9
with:
# deep clone, to allow use of git merge-base
fetch-depth: 0
submodules: recursive
- name: Pull docker image
run: |
docker pull "${DOCKER_IMAGE}"
- name: Install nvidia driver, nvidia-docker runtime, set GPU_FLAG
if: ${{ contains(env.BUILD_ENVIRONMENT, 'cuda') && !contains(matrix.config, 'nogpu') }}
run: |
bash .github/scripts/install_nvidia_utils_linux.sh
echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
- name: Determine shm-size
run: |
shm_size="1g"
case "${BUILD_ENVIRONMENT}" in
*cuda*)
shm_size="2g"
;;
*rocm*)
shm_size="8g"
;;
esac
echo "SHM_SIZE=${shm_size}" >> "${GITHUB_ENV}"
- uses: seemethere/download-artifact-s3@0504774707cbc8603d7dca922e8026eb8bf3b47b
name: Download PyTorch Build Artifacts
with:
name: ${{ env.BUILD_ENVIRONMENT }}
- name: Unzip artifacts
run: |
unzip -o artifacts.zip
- name: Output disk space left
run: |
sudo df -H
- name: Parse ref
id: parse-ref
run: .github/scripts/parse_ref.py
- name: Test
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
IS_GHA: 1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
AWS_DEFAULT_REGION: us-east-1
run: |
if [[ $TEST_CONFIG == 'multigpu' ]]; then
TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
else
TEST_COMMAND=.jenkins/pytorch/test.sh
fi
if [[ $NUM_TEST_SHARDS -ne 2 ]]; then
export SHARD_NUMBER=0
fi
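# SHARD_NUMBER=0 presumably signals the test scripts to run the full, unsharded suite when
# the matrix is not split into exactly two shards.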
# detached container should get cleaned up by teardown_ec2_linux
# TODO: Stop building test binaries as part of the build phase
# Used for GPU_FLAG, which must stay unquoted so it can expand to nothing or to "--gpus all"
# shellcheck disable=SC2086
container_name=$(docker run \
${GPU_FLAG:-} \
-e BUILD_ENVIRONMENT \
-e PR_NUMBER \
-e CUSTOM_TEST_ARTIFACT_BUILD_DIR \
-e GITHUB_ACTIONS \
-e IN_CI \
-e IS_GHA \
-e CIRCLE_BRANCH \
-e CIRCLE_SHA1 \
-e CIRCLE_PR_NUMBER \
-e AWS_DEFAULT_REGION \
-e IN_WHEEL_TEST \
-e SHARD_NUMBER \
-e JOB_BASE_NAME \
-e TEST_CONFIG \
-e NUM_TEST_SHARDS \
-e PYTORCH_IGNORE_DISABLED_ISSUES \
-e PR_LABELS \
-e CONTINUE_THROUGH_ERROR \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e SCCACHE_BUCKET \
-e http_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e https_proxy="http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -e no_proxy="localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" \
-e XLA_CLANG_CACHE_S3_BUCKET_NAME \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size="${SHM_SIZE}" \
--tty \
--detach \
--name="${container_name}" \
--user jenkins \
-v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
-w /var/lib/jenkins/workspace \
"${DOCKER_IMAGE}"
)
docker exec -t "${container_name}" sh -c "sudo chown -R jenkins . && pip install dist/*.whl && ${TEST_COMMAND}"
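# The container is started detached with host env vars forwarded: bare `-e VAR` flags copy each
# variable's current value from the runner, and --env-file injects the GITHUB_* variables saved
# earlier. The actual test run then happens via `docker exec`, which chowns the workspace to the
# jenkins user, installs the freshly built wheel from dist/, and invokes ${TEST_COMMAND}.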
- name: Chown workspace
if: always()
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Install render_test_results dependencies
if: always()
shell: bash
run: |
python3 -m pip install junitparser==2.1.1 rich==10.9.0
- name: "[[ Click me for rendered test results (useful for finding failing tests) ]]"
if: always()
shell: bash
# Encoding is unreliable on Windows, so default to utf-8 where possible
env:
PYTHONIOENCODING: "utf-8"
run: |
python3 tools/render_junit.py test/
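# render_junit.py depends on the junitparser/rich packages installed above; the expectation
# (inferred from the package names) is that it parses the JUnit XML under test/ and pretty-prints
# the results in the job log.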
- name: Zip test reports for upload
if: always()
env:
FILE_SUFFIX: '${{ github.job }}-${{ matrix.config }}-${{ matrix.shard }}-${{ matrix.num_shards }}-${{ matrix.runner }}'
run: |
# Remove any previous test reports if they exist
rm -f test-reports-*.zip
zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'
- uses: actions/upload-artifact@v2
name: Store Test Reports
if: always()
with:
name: test-reports-${{ matrix.config }}
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- uses: seemethere/upload-artifact-s3@v3
name: Store Test Reports on S3
if: always()
with:
retention-days: 14
if-no-files-found: error
path:
test-reports-*.zip
- name: Display and upload test statistics (Click Me)
if: always()
# temporary hack: set CIRCLE_* vars, until we update
# tools/stats/print_test_stats.py to natively support GitHub Actions
env:
AWS_DEFAULT_REGION: us-east-1
CIRCLE_BRANCH: ${{ steps.parse-ref.outputs.branch }}
JOB_BASE_NAME: linux-xenial-cuda11.3-py3.6-gcc7-test
CIRCLE_PR_NUMBER: ${{ github.event.pull_request.number }}
CIRCLE_SHA1: ${{ github.event.pull_request.head.sha || github.sha }}
CIRCLE_TAG: ${{ steps.parse-ref.outputs.tag }}
CIRCLE_WORKFLOW_ID: '${{ github.run_id }}_${{ github.run_number }}'
shell: bash
run: |
python3 -m pip install -r requirements.txt
python3 -m pip install boto3==1.16.34
python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test
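# print_test_stats reads the JUnit XML under test/; per its flags it uploads the stats to S3 and
# compares them against previously uploaded runs (hence the boto3 and AWS_DEFAULT_REGION
# requirements above). The exact comparison logic lives in tools/stats/print_test_stats.py.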
- name: Hold runner for 2 hours or until ssh sessions have drained
# Always hold for active ssh sessions
if: always()
run: .github/scripts/wait_for_ssh_to_drain.sh
- name: Chown workspace
if: always()
env:
ALPINE_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine"
run: |
# Ensure the working directory gets chowned back to the current user
docker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .
- name: Kill containers, clean up images
if: always()
run: |
# ignore expansion of "docker ps -q" since it could be empty
# shellcheck disable=SC2046
docker stop $(docker ps -q) || true
# Prune all of the docker images
docker system prune -af

Some files were not shown because too many files have changed in this diff.