Compare commits

..

30 Commits

Author SHA1 Message Date
302ee7bfb6 [release/1.10] Fix adaptive_max_pool2d for channels-last on CUDA (#67697) (#69618)
Co-authored-by: Xiao Wang <24860335+xwang233@users.noreply.github.com>
2021-12-09 08:59:45 -08:00
0c91a7063d [release/1.10] TST Adds test for non-contiguous tensors (#64954) (#69617)
* TST Adds test for non-contiguous tensors (#64954)

Summary:
Follow up to https://github.com/pytorch/pytorch/issues/61935

This PR:

1. Adds test for non-contiguous tensors
2. Fixes bug in `NLLLoss` that was catch by the test.

The reason this was not catch in `common_nn` is because `CriterionTest` overrides `test_cuda` but does not call `test_nonconfig`.

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64954

Reviewed By: zou3519

Differential Revision: D31174149

Pulled By: jbschlosser

fbshipit-source-id: a16073e59b40ccc01c82ede016b63a8db2e810f5
(cherry picked from commit 0d3bf97fd05ce6ef5ddfb0a100c78ad82914cee4)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

* Cherry-pick changes from #64444

Namely, `make_weight` partial into `module_inputs_torch_nn_NLLLoss`

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
2021-12-09 07:09:08 -08:00
eadb03895a [ONNX] Update onnxruntime to 1.9 for CI (#65029) (#67269) (#69641)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67269

Test Plan: Imported from OSS

Reviewed By: ngimel, msaroufim

Differential Revision: D31962516

Pulled By: malfet

fbshipit-source-id: 39b3c6a4a05d7b769f0ef5ce7ea597209516cde2

Co-authored-by: Gary Miguel <garymiguel@microsoft.com>
2021-12-08 20:09:11 -08:00
8416d630c9 Fix strict aliasing rule violation in bitwise_binary_op (#66194) (#69619)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66119

Failure on ARM Neoverse N1 before this PR:
```
======================================================================
FAIL: test_bitwise_ops_cpu_int16 (__main__.TestBinaryUfuncsCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/pytorch/pytorch/torch/testing/_internal/common_device_type.py", line 373, in instantiated_test
    result = test(self, **param_kwargs)
  File "test_binary_ufuncs.py", line 315, in test_bitwise_ops
    self.assertEqual(op(a, b), op(a_np, b_np))
  File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 1633, in assertEqual
    self.assertEqual(
  File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 1611, in assertEqual
    super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Tensors failed to compare as equal!Found 176 different element(s) (out of 225), with the greatest difference of 21850 (-21846 vs. 4) occuring at index (0, 2).

======================================================================
FAIL: test_bitwise_ops_cpu_int32 (__main__.TestBinaryUfuncsCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/pytorch/pytorch/torch/testing/_internal/common_device_type.py", line 373, in instantiated_test
    result = test(self, **param_kwargs)
  File "test_binary_ufuncs.py", line 315, in test_bitwise_ops
    self.assertEqual(op(a, b), op(a_np, b_np))
  File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 1633, in assertEqual
    self.assertEqual(
  File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 1611, in assertEqual
    super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Tensors failed to compare as equal!Found 188 different element(s) (out of 225), with the greatest difference of 1335341061 (-1335341056 vs. 5) occuring at index (14, 8).

----------------------------------------------------------------------
```
which passes now.

CC malfet ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66194

Reviewed By: dagitses, bdhirsh, ngimel

Differential Revision: D31430274

Pulled By: malfet

fbshipit-source-id: bcf1c9d584c02eff328dd5b1f7af064fac5942c9
(cherry picked from commit 0b0674121aeb7d8bbcccd0461d939b64879a1273)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Co-authored-by: pbialecki <pbialecki@nvidia.com>
2021-12-08 15:01:18 -08:00
c78ceadbb0 [LiteInterpreter] Specify Loader to yaml.load (#67694) (#69642)
Summary:
It became a mandatory argument since PyYaml-6, but has been present since PyYaml-3

Unblock migration to newer runtime

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67694

Reviewed By: seemethere

Differential Revision: D32106043

Pulled By: malfet

fbshipit-source-id: 35246b97a974b168c066396ea31987b267534c7f
2021-12-08 14:59:16 -08:00
70af72c794 Fix python version in test tools CI job (#66947) (#69643)
Summary:
On the HUD, the test tools job is failing as the runners now install Python 3.10, which is not compatible with numpy 1.20

See https://github.com/pytorch/pytorch/runs/3952169950?check_suite_focus=true Install dependencies step:
```
 ERROR: Command errored out with exit status 1:
   command: /opt/hostedtoolcache/Python/3.10.0/x64/bin/python /opt/hostedtoolcache/Python/3.10.0/x64/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /tmp/tmptq8aay7m
       cwd: /tmp/pip-install-dk_6t98q/numpy_e9431bf106b746148c0e7c36e46551b4
  Complete output (1169 lines):
  setup.py:66: RuntimeWarning: NumPy 1.20.0 may not yet support Python 3.10.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66947

Reviewed By: suo, malfet

Differential Revision: D31799205

Pulled By: janeyx99

fbshipit-source-id: 64bf10c37c0aa4f5837c48e92d56e81d920722bd

Co-authored-by: Jane Xu <janeyx@fb.com>
2021-12-08 14:51:06 -08:00
36449ea931 (torch/elastic) add fqdn hostname to error printout (#66182) (#66662)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66182

closes https://github.com/pytorch/pytorch/issues/63174

Does a few things:

1. adds hostname to the error report
2. moves the "root cause" section to the end (presumably since the logs are being "tailed" we want the root cause to appear at the end)
3. moves redundant error info logging to debug
4. makes the border max 60 char in length and justifies left for the header

NOTE: YOU HAVE TO annotate your main function with torch.distributed.elastic.multiprocessing.errors.record, otherwise no traceback is printed (this is because python exception propagation does NOT work out of the both for IPC - hence the extra record annotation).

Test Plan:
Sample

```
============================================================
run_script_path FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2021-10-05_17:37:22
  host      : devvm4955.prn0.facebook.com
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 3296201)
  error_file: /home/kiuk/tmp/elastic/none_3_lsytqe/attempt_0/0/error.json
  traceback :
  Traceback (most recent call last):
    File "/tmp/jetter.xr3_x6qq/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 372, in wrapper
      return f(*args, **kwargs)
    File "main.py", line 28, in main
      raise RuntimeError(args.throws)
  RuntimeError: foobar

============================================================
```

Reviewed By: cbalioglu, aivanou

Differential Revision: D31416492

fbshipit-source-id: 0aeaf6e634e23ce0ea7f6a03b12c8a9ac57246e9
2021-10-14 18:35:23 -07:00
b544cbddfa Handle shared memory cases in MathBitFallback (#66667)
* Handle shared memory cases in MathBithFallback (#63602)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63602

This PR fixes the case when a read and write is performed on a memory shared between mutable and (or) non-mutable arguments. Example:
```
a=torch.tensor([1+1j])
b=a.conj()
b.add_(a) # should return tensor([2]) but returns tensor ([2-2j])
```

The issue here is that in the conjugate fallback, we resolve the conjugation in-place for mutable arguments which can be a problem as shown above in the case when other input arguments share memory with the mutable argument(s).
This PR fixes this issue by:
1. first scanning through the operator input arguments and creating a vector of mutable arguments that have the conj bit set to `True` (and accordingly setting the flag `check_for_alias_with_mut_arg ` to `True` or `False`).
2. Iterating through all the arguments. At this time we only look at the non-mutable arguments. If `check_for_alias_with_mut_arg` is set to `True`, then we iterate through `mutable_inputs` to check if the current arg tensor in question doesn't alias any of the entries in `mutable_inputs`. If yes, then we clone the non-mutable tensor arg, else we resolve the conjugation as before.
3. Now we look through the mutable_inputs vector (which contains only mutable input tensors with conj bit set to `True`). We in-place conjugate each of the entries in the vector.
4. Do the computation.
5. Re-conjugate the mutable argument tensors.

NOTE: `TensorLists` are not fully handled in ConjugateFallback. Please see the in-line comment for more details.

Fixes https://github.com/pytorch/pytorch/issues/59943

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D30466905

Pulled By: anjali411

fbshipit-source-id: 58058e5e6481da04a12d03f743c1491942a6cc9b

* fix lint (#66572)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66572

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D31624043

Pulled By: suo

fbshipit-source-id: 9db9cee3140d78c2a2f0c937be84755206fee1dd

Co-authored-by: anjali411 <chourdiaanjali123@gmail.com>
Co-authored-by: Michael Suo <suo@fb.com>
2021-10-14 18:34:13 -07:00
ddf3092581 Disable .numpy() and .tolist() for tensor subclasses subclasses and f… (#66642)
* Disable .numpy() and .tolist() for tensor subclasses subclasses and fix .tolist() for conjugated and negated tensors (#66082)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66082

Fixes https://github.com/pytorch/pytorch/issues/66024 #65779

cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved albanD

Test Plan: Imported from OSS

Reviewed By: Gamrix, albanD

Differential Revision: D31615588

Pulled By: anjali411

fbshipit-source-id: c3e65ef0fe301630eb76732ccd7819683c09aa19

* Apply suggestions from code review

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
2021-10-14 16:00:56 -07:00
cc360fa38f Delete extraneous whitespaces 2021-10-14 15:57:16 -07:00
3c134b8b1e Disable .numpy() and .tolist() for tensor subclasses subclasses and fix .tolist() for conjugated and negated tensors (#66082) (#66576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66082

Fixes https://github.com/pytorch/pytorch/issues/66024 #65779

cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved albanD

Test Plan: Imported from OSS

Reviewed By: Gamrix, albanD

Differential Revision: D31615588

Pulled By: anjali411

fbshipit-source-id: c3e65ef0fe301630eb76732ccd7819683c09aa19
2021-10-14 13:16:03 -07:00
4a514dd81e Call PyArray_Check only if NumPy is available (#66433) (#66629)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66353

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66433

Reviewed By: seemethere, janeyx99

Differential Revision: D31548290

Pulled By: malfet

fbshipit-source-id: 3b094bc8195d0392338e0bdc6df2f39587b85bb3
2021-10-14 09:46:41 -07:00
c3ea586e32 fix normal with empty std (#66524) 2021-10-14 09:42:41 -07:00
9509e8a3d6 Fix cosine similarity dim checks (#66214)
* fix cosine similarity dimensionality check

* fix shapes in the doc
2021-10-08 07:22:40 -07:00
1774a6a2f4 [ONNX] Deprecate various args (#65962)
* [ONNX] Remove argument _retain_param_name from torch.onnx.export() function. (#61702) (#64370)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64370

As of now, the "_retain_param_name" parameter has no description in PyTorch docs website. According to code, this argument determines if we keep the original parameter names of PyTorch model in the final ONNX graph. If this is False, those original parameter names will be replaced with a series of integers starting from 1.

Since setting numbers as parameter names make no sense to users, we remove this argument from the torch.onnx.export() function to increase user experience of calling this function.

This PR will still keep it in torch.onnx.export() function for backward support while all backend logic has been changed to work as _retain_param_name is set to True.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905270

Pulled By: malfet

fbshipit-source-id: ca60757ca17daaff937e9f08da42596086795f4a

Co-authored-by: fatcat-z <zhang-ji@outlook.com>

* [ONNX] Remove strip_doc_string param from torch.onnx.export() function. (#61712) (#64371)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64371

As of now, the "strip_doc_string" parameter was described as below:

strip_doc_string (bool, default True): do not include the field
doc_string``` from the exported model. Otherwise the field will mention the source code locations for model``.

This is usually useless to users who want to transform a PyTorch model to ONNX one. Only when the user wants to debug the export process, these source code locations could provide benefits.

To make the export() function more friendly by providing less parameters, we combined "strip_doc_string" into "verbose" parameter. If a user set verbose to True, it means the users need some log information for debugging the export process and this is similar with the purpose of strip_doc_string parameter.

But the usage of these 2 arguments are opposite: setting verbose to True means we want to print log information to help debug, which means strip_doc_string should be False. And this is how we replace strip_doc_string with verbose argument in this PR.

This PR will still keep it in torch.onnx.export() function for backward support while the usage of it has been combined with verbose argument.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905268

Pulled By: malfet

fbshipit-source-id: 2f06eb805c01fe15ff7a1b4f6595c937ba716d60

Co-authored-by: fatcat-z <zhang-ji@outlook.com>

* [ONNX] minor doc improvements and cleanup (#62514) (#64373)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64373

* Fix some bad formatting and clarify things in onnx.rst.
* In `export_to_pretty_string`:
    * Add documentation for previously undocumented args.
    * Document that `f` arg is ignored and mark it deprecated.
    * Update tests to stop setting `f`.
    * Warn if `_retain_param_name` is set.
* Use double quotes for string literals in test_operators.py.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905271

Pulled By: malfet

fbshipit-source-id: 3627eeabf40b9516c4a83cfab424ce537b36e4b3

* [ONNX] Deprecated the example_outputs param from torch.onnx.export() function. (#62815) (#64380)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64380

* `example_outputs` used to determine the type and shape of the outputs without tracing the execution of the model. And it must be provided when exporting a ScriptModule or ScriptFunction when using export() function.

* Since we can work out `example_outputs` in internal function instead of being provided by user, so we deprecated this argument in the export() function to increase user experience of calling this function.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905266

Pulled By: malfet

fbshipit-source-id: d00b00d7d02b365d165028288ad915678caa51f2

Co-authored-by: hwangdeyu <dejack953@outlook.com>

* [ONNX] Deprecate use_external_data_format param from torch.onnx.export() function. (#62257) (#64382)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64382

* This `use_external_data_format` parameter is used for large models cannot be exported because of the 2GB protobuf limit.

* When `use_external_data_format` set to True, the model is exported in ONNX external data format, in which case some of the model parameters are stored in external binary files and not in the ONNX model file itself.

* This PR will set this paramter to DEPRECATED and check the model proto sizes by code instead of by user, if the sizes lager than 2GB, then `use_external_data_format = True` automatically.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905265

Pulled By: malfet

fbshipit-source-id: 82b4e17bfa6a8de2bfd700a5282c12f6835603cb

Co-authored-by: hwangdeyu <dejack953@outlook.com>

* fix clang-tidy error introduced by #64382 (#65977)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65977

Reviewed By: ngimel

Differential Revision: D31423174

Pulled By: malfet

fbshipit-source-id: 0ea560b9a6ddd6431f70bd3ac10ace68e26ab352

Co-authored-by: BowenBao <bowbao@microsoft.com>
Co-authored-by: fatcat-z <zhang-ji@outlook.com>
Co-authored-by: hwangdeyu <dejack953@outlook.com>
2021-10-08 07:21:29 -07:00
a27906c250 Convert Sampler back to lazily construction (#63646) (#65926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63646

Fixes #63609

Test Plan: Imported from OSS

Reviewed By: NivekT

Differential Revision: D30451774

Pulled By: ejguan

fbshipit-source-id: 550d77494326446d1a42b5da0559e0d384c47413
2021-10-08 07:20:03 -07:00
49f52b6c07 Revert "Added option to update parameters using state_dict in AveragedModel (#65495) (#65755)" (#66308)
This reverts commit 5f1a434599b46afd99607839d15892e09269a1c4.
2021-10-08 07:17:47 -07:00
5f1a434599 Added option to update parameters using state_dict in AveragedModel (#65495) (#65755)
* Added option to update parameters using state_dict in AveragedModel (#65495)

Summary:
While implementing [EMA](https://github.com/pytorch/vision/pull/4381)(which extends AveragedModel) in torchvision, update_parameters() from AveragedModel could not be used as it did not handle state_dict(), so a custom update_parameters() needed to be defined in [EMA class](https://github.com/pytorch/vision/pull/4406). This PR aims to handle this scenario removing the need for this custom update_parameters() implementation.

Discussion: https://github.com/pytorch/vision/pull/4406#pullrequestreview-753734102

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65495

Reviewed By: datumbox

Differential Revision: D31176742

Pulled By: prabhat00155

fbshipit-source-id: 326d14876018f21cf602bab5eaba344678dbabe2
(cherry picked from commit 2ea724b1fd543304e3be7bd223cac451cd093e16)

* Added validation of mode parameter in AveragedModel (#65921)

Summary:
Discussion: https://github.com/pytorch/pytorch/pull/65495#issuecomment-930460469

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65921

Reviewed By: albanD

Differential Revision: D31310105

Pulled By: prabhat00155

fbshipit-source-id: 417691832a7c793744830c11e0ce53e3972d21a3
(cherry picked from commit c7748fc172553da66368fd0b7fea3fe5661e2dc1)
2021-10-06 11:13:31 -07:00
ecbf5a7439 Tweak file_diff_from_base for release/1.10 branch (#66202) 2021-10-06 08:34:46 -07:00
4e3ebebcff [DataPipe] DataPipe Fix and Deprecation Warnings for Release 1.10 (#65932)
* Unify the output pathname of archive reader and extractor (#65424)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65424

This PR is re-implementation for https://github.com/facebookexternal/torchdata/pull/93
Same PR has landed into torchdata https://github.com/facebookexternal/torchdata/pull/157

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D31090447

Pulled By: ejguan

fbshipit-source-id: 45af1ad9b24310bebfd6e010f41cff398946ba65

* [DatePipe] add deprecation warnings for DataPipes that will solely exist in TorchData (#65827)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65827

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31272794

Pulled By: NivekT

fbshipit-source-id: 8da8266184b4df050422904cbc5fca6d7c3d2e02

* [DataPipe] Fixes an issue where TarArchiveReader closes stream when read into a buffer (#65877)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65877

Fixes #65808

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D31296041

Pulled By: NivekT

fbshipit-source-id: cdcad3a333ae9781d6063678a122a128955b0ff4

Co-authored-by: Erjia Guan <erjia@fb.com>
2021-10-05 20:54:40 -07:00
2b46c95e7c [iOS][CI] Update dev certs (#66004) (#66188)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65988

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66004

Reviewed By: xta0

Differential Revision: D31340893

Pulled By: malfet

fbshipit-source-id: 3bf0be266e9686a73d62e86c5cf0bebeb0416260

Co-authored-by: Tao Xu <taox@fb.com>
2021-10-05 20:12:40 -07:00
5f3eee1ca5 Fix backward compatibility tests (#66186)
Compare operator list against RC1 build rather than against nightly
2021-10-05 20:12:13 -07:00
4731f33d02 Fix Windows ninja builds when MAX_JOBS is specified (#65444) (#66155)
Summary:
Reported by cloudhan in https://github.com/pytorch/pytorch/pull/64733#issuecomment-924545463

Fixes regression introduced by 047e68235f

cc malfet seemethere

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65444

Reviewed By: dagitses, seemethere

Differential Revision: D31103260

Pulled By: malfet

fbshipit-source-id: 9d5454a64cb8a0b96264119cf16582cc5afed284
2021-10-05 12:03:27 -07:00
ecfcb8ff5a Binary building wthout python fix (#66031) (#66117)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66030

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66031

Reviewed By: VitalyFedyunin

Differential Revision: D31356243

Pulled By: malfet

fbshipit-source-id: d1537bc65bbba5d6497ecb8db7160a397eca81fd
2021-10-05 12:02:51 -07:00
6aadfda9e2 [ci] try installing libgnutls to fix cert error (#65934) (#65979)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65934

see: https://github.com/pytorch/pytorch/issues/65931, this was a
suggested remediation on the linked issue

Test Plan: Imported from OSS

Reviewed By: malfet, zhouzhuojie

Differential Revision: D31313040

Pulled By: suo

fbshipit-source-id: a9e2b82a1e879962af768ed3049c73ab77394738

Co-authored-by: Michael Suo <suo@fb.com>
2021-09-30 18:55:44 -07:00
13666d20fd [DataPipe] Fix deepcopy filehandle for Mapper and in-place modification for IterableWrapper (#65220) (#65924)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65220

Fixes #65221

- Remove deepcopy from Mapper to support file handles
- Convert `IterableWrapper` to deepcopy iterable instance within each iterator to prevent in-place modification (different data per epoch)
- Convert `IDP` to `IterableWrapper` in test_datapipe.py
- Refine the variable names (prevent using `dp` that is module reference)

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D31021886

Pulled By: ejguan

fbshipit-source-id: 72a9eee66c758e2717d591cd0942892bddedc223
2021-09-30 18:36:49 -07:00
1fa17a20fc Fix the slowdown of _object_to_tensor since 1.9 (#65721) (#65835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65721

#Closes: https://github.com/pytorch/pytorch/issues/65696

The bug is introduced in https://github.com/pytorch/pytorch/pull/55861, and it causes 100X slowdown since 1.9.
ghstack-source-id: 139128267

Test Plan:
Performance test:
```
import time

from torch.distributed.distributed_c10d import _object_to_tensor

start = time.time()
_object_to_tensor("x" * 50_000_000)
print("Time:", time.time() - start)
```

Reviewed By: rohan-varma

Differential Revision: D31219794

fbshipit-source-id: 1abec38f9d51361c1eab6ad5efd87b589322e208

Co-authored-by: Yi Wang <wayi@fb.com>
2021-09-29 14:38:54 -07:00
c05547fa6c Fix test reporting git merge-base (#65787) 2021-09-28 15:48:32 -07:00
0e857bf109 [1.10] Remove torch.vmap (#65496)
torch.vmap is a prototype feature and should not be in the stable
binary. This PR:
- Removes the torch.vmap API
- Removes the documentation entry for torch.vmap
- Changes the vmap tests to use an internal API instead of torch.vmap.

Test Plan:
- Tested locally (test_torch, test_autograd, test_type_hints, test_vmap),
but also wait for CI.
2021-09-24 10:29:08 -07:00
ad22804b95 [release/1.10] Pin builder and xla repo (#65433)
Pin builder to https://github.com/pytorch/builder/commits/release/1.10
Pin xla to https://github.com/pytorch/xla/tree/r1.10
2021-09-21 16:16:22 -07:00
18158 changed files with 974469 additions and 1991270 deletions

View File

@ -0,0 +1,63 @@
# PyTorch CI Builds Pipeline on Azure DevOps
#
# This pipeline:
# 1) builds PyTorch on select configurations
# 2) runs only TestTorch unit tests.
stages:
- stage: 'Build'
displayName: 'Build PyTorch'
jobs:
- template: job_templates/build-verify-publish-template-unix.yml
parameters:
name: ubuntu_1804_CPU_docker
pool: 'PyTorch-Linux-CPU'
container_endpoint: pytorchms.azurecr.io
build_stage: True
is_ci_build: True
os: ubuntu
cuda: cpu
customMatrixes:
Py_38:
configuration: ubuntu_1804_py_38_cpu
container_image: pytorchms.azurecr.io/ubuntu_1804_py_38_cpu_dev
- template: job_templates/build-verify-publish-template-unix.yml
parameters:
name: ubuntu_1804_GPU_docker
pool: 'PyTorch-Linux-GPU'
container_endpoint: pytorchms.azurecr.io
build_stage: True
is_ci_build: True
os: ubuntu
cuda: gpu
customMatrixes:
Py_39_CUDA_112_cuDNN_810:
configuration: ubuntu_1804_py_39_cuda_112_cudnn_810
container_image: pytorchms.azurecr.io/ubuntu_1804_py_39_cuda_112_cudnn_8_dev
CUDA_VERSION: 112
- template: job_templates/build-verify-publish-template-win.yml
parameters:
name: windows_2019_CPU
pool: 'PyTorch-Win-CPU'
build_stage: True
is_ci_build: True
os: windows
cuda: cpu
customMatrixes:
Py_37:
configuration: windows_2019_py_37_cpu
- template: job_templates/build-verify-publish-template-win.yml
parameters:
name: windows_2019_GPU
pool: 'PyTorch-Win-GPU'
build_stage: True
is_ci_build: True
os: windows
cuda: gpu
customMatrixes:
Py_38_CUDA_102_cuDNN_765:
configuration: windows_2019_py_38_cuda_102_cudnn_765
CUDA_VERSION: 102

View File

@ -0,0 +1,82 @@
# PyTorch Daily Builds Pipeline on Azure DevOps
#
# This pipeline:
# 1) builds PyTorch on all available configurations
# 2) runs all PyTorch unit tests
stages:
- stage: 'BuildTest'
displayName: 'Build and Test PyTorch'
jobs:
- template: job_templates/build-verify-publish-template-unix.yml
parameters:
name: ubuntu_1804_CPU_docker
pool: 'PyTorch-Linux-CPU'
container_endpoint: pytorchms.azurecr.io
build_stage: True
is_daily_build: True
os: ubuntu
cuda: cpu
customMatrixes:
Py_38:
configuration: ubuntu_1804_py_38_cpu
container_image: pytorchms.azurecr.io/ubuntu_1804_py_38_cpu_dev
Py_37:
configuration: ubuntu_1804_py_37_cpu
container_image: pytorchms.azurecr.io/ubuntu_1804_py_37_cpu_dev
- template: job_templates/build-verify-publish-template-unix.yml
parameters:
name: ubuntu_1804_GPU_docker
pool: 'PyTorch-Linux-GPU'
container_endpoint: pytorchms.azurecr.io
build_stage: True
is_daily_build: True
os: ubuntu
cuda: gpu
customMatrixes:
Py_39_CUDA_112_cuDNN_810:
configuration: ubuntu_1804_py_39_cuda_112_cudnn_810
container_image: pytorchms.azurecr.io/ubuntu_1804_py_39_cuda_112_cudnn_8_dev
CUDA_VERSION: 112
Py_38_CUDA_102_cuDNN_810:
configuration: ubuntu_1804_py_38_cuda_102_cudnn_810
container_image: pytorchms.azurecr.io/ubuntu_1804_py_38_cuda_102_cudnn_8_dev
CUDA_VERSION: 102
Py_37_CUDA_101_cuDNN_765:
configuration: ubuntu_1804_py_37_cuda_101_cudnn_765
container_image: pytorchms.azurecr.io/ubuntu_1804_py_37_cuda_101_cudnn_7_dev
CUDA_VERSION: 101
- template: job_templates/build-verify-publish-template-win.yml
parameters:
name: windows_2019_CPU
pool: 'PyTorch-Win-CPU'
build_stage: True
is_daily_build: True
os: windows
cuda: cpu
customMatrixes:
Py_38:
configuration: windows_2019_py_38_cpu
Py_37:
configuration: windows_2019_py_37_cpu
- template: job_templates/build-verify-publish-template-win.yml
parameters:
name: windows_2019_GPU
pool: 'PyTorch-Win-GPU'
build_stage: True
is_daily_build: True
os: windows
cuda: gpu
customMatrixes:
Py_39_CUDA_112_cuDNN_810:
configuration: windows_2019_py_39_cuda_112_cudnn_810
CUDA_VERSION: 112
Py_38_CUDA_102_cuDNN_765:
configuration: windows_2019_py_38_cuda_102_cudnn_765
CUDA_VERSION: 102
Py_37_CUDA_101_cuDNN_764:
configuration: windows_2019_py_37_cuda_101_cudnn_764
CUDA_VERSION: 101

View File

@ -0,0 +1,134 @@
# PyTorch build steps template with Unix images Azure DevOps Instances
#
# This build depends on 3 parameters set as environment variables in the pipeline:
# - AZURE_DEVOPS_CLI_PAT: Secret var for authenticating to Azure DevOps
# - AZURE_DEVOPS_ARTIFACTS_ORGANIZATION: Azure Artifacts Organization name to publish artifacts
# - AZURE_DEVOPS_ARTIFACTS_PROJECT: Azure Artifacts Project name to publish artifacts
parameters:
name: ''
pool: ''
container_endpoint: ''
os: ''
cuda: ''
is_ci_build: False
is_official_build: False
is_daily_build: False
build_stage: False
verify_stage: False
publish_stage: False
customMatrixes: ''
jobs:
- job: ${{parameters.name}}
timeoutInMinutes: 300
strategy:
matrix:
${{ insert }}: ${{parameters.customMatrixes}}
pool:
name: ${{ parameters.pool}}
variables:
DECODE_PERCENTS: false
container:
image: $[variables['container_image']]
endpoint: ${{parameters.container_endpoint}}
steps:
# Build stage
- ${{ if eq(parameters.build_stage, 'True') }}:
# Set up environment variables for specific pipeline build
- template: set-environment-variables.yml
parameters:
os: ${{ parameters.os}}
cuda: ${{ parameters.cuda}}
is_official_build: ${{ parameters.is_official_build}}
# Sync and update PyTorch submodules
- bash: git submodule update --init --recursive --jobs 0
displayName: Update PyTorch submodules
# Build PyTorch and run unit tests - no packaging
- ${{ if or(eq(parameters.is_ci_build, 'True'), eq(parameters.is_daily_build, 'True')) }}:
# Build PyTorch from source in develop mode
- bash: python setup.py develop
displayName: Build PyTorch from source
- ${{ if eq(parameters.is_ci_build, 'True') }}:
# Run TestTorch unit tests to demonstrate successful PyTorch build
- bash: python test/test_torch.py TestTorch
displayName: Run TestTorch unit tests
- ${{ if eq(parameters.is_daily_build, 'True') }}:
# Run all unit tests to demonstrate successful PyTorch build
- bash: python test/run_test.py --continue-through-error --exclude-jit-executor --verbose
displayName: Run all unit tests
# Run ComponentGovernance
- task: ComponentGovernanceComponentDetection@0
inputs:
scanType: 'Register'
verbosity: 'Verbose'
alertWarningLevel: 'High'
# Build PyTorch and produce artifacts for verification stage
- ${{ if eq(parameters.is_official_build, 'True') }}:
# Build PyTorch from source in install mode and exclude test binaries
- bash: python setup.py install
displayName: Build PyTorch from source without test binaries
# Package PyTorch Wheel
- bash: python setup.py bdist_wheel
displayName: Package PyTorch Wheel
# Publish PyTorch Wheel
- task: PublishPipelineArtifact@1
inputs:
targetPath: $(Build.SourcesDirectory)/dist/
artifactName: Build_$(Build.BuildNumber)_$(configuration)
displayName: Publish PyTorch Wheel to Pipeline Artifacts
# Verification stage
- ${{ if eq(parameters.verify_stage, 'True') }}:
# Download PyTorch Wheel
- task: DownloadPipelineArtifact@2
inputs:
artifact: Build_$(Build.BuildNumber)_$(configuration)
path: $(Build.SourcesDirectory)/verify
displayName: Download PyTorch Wheel
# Install PyTorch Wheel on Windows
- bash: python -m pip install $(Build.SourcesDirectory)/verify/torch*linux*.whl
displayName: Install PyTorch Wheel
# Ensure PyTorch installed correctly from produced wheel
- bash: |
cd $(Build.SourcesDirectory)/verify
python -c "import torch; print('Installed Torch version: ' + torch.__version__)"
displayName: Check PyTorch correctly installed from wheel
# Publishing stage
- ${{ if eq(parameters.publish_stage, 'True') }}:
# Download PyTorch Wheel
- task: DownloadPipelineArtifact@2
inputs:
artifact: Build_$(Build.BuildNumber)_$(configuration)
path: $(Build.SourcesDirectory)/publish
displayName: Download PyTorch Wheel
# Publish wheel to Azure Artifacts
# The flag continueOnError=true is needed as the artifact to be published
# may already exist, because the artifact is differentiated based on the
# last commit date.
- bash: |
export TORCH_VERSION=$(head -c 5 ./version.txt)
export LAST_COMMIT=$(git rev-parse --short HEAD)
export LAST_COMMIT_DATE=$(git log -1 --pretty=%ad --date=format:%Y%m%d)
cd $(Build.SourcesDirectory)/publish
export TORCH_WHEEL=$(echo torch*linux*whl)
az extension add -n azure-devops
echo $ADOTOKEN | az devops login
az artifacts universal publish --organization $AZURE_DEVOPS_ARTIFACTS_ORGANIZATION --project $AZURE_DEVOPS_ARTIFACTS_PROJECT --scope project --feed "PyTorch" --name $TORCH_WHEEL --description "PyTorch Official Build Artifact" --version $TORCH_VERSION-$LAST_COMMIT_DATE-$LAST_COMMIT --path .
env:
ADOTOKEN: $(AZURE_DEVOPS_CLI_PAT)
continueOnError: true
displayName: Upload PyTorch Official Build package to Azure Artifacts

View File

@ -0,0 +1,150 @@
# PyTorch build steps template with Windows images Azure DevOps Instances
#
# This build depends on 3 parameters set as environment variables in the pipeline:
# - AZURE_DEVOPS_CLI_PAT: Secret var for authenticating to Azure DevOps
# - AZURE_DEVOPS_ARTIFACTS_ORGANIZATION: Azure Artifacts Organization name to publish artifacts
# - AZURE_DEVOPS_ARTIFACTS_PROJECT: Azure Artifacts Project name to publish artifacts
parameters:
name: ''
pool: ''
os: ''
cuda: ''
is_ci_build: False
is_official_build: False
is_daily_build: False
build_stage: False
verify_stage: False
publish_stage: False
customMatrixes: ''
jobs:
- job: ${{parameters.name}}
timeoutInMinutes: 300
strategy:
matrix:
${{ insert }}: ${{parameters.customMatrixes}}
pool:
name: ${{ parameters.pool}}
variables:
CMAKE_GENERATOR: Ninja
PACKAGE_PDBS: 0
steps:
# Prepare for PyTorch build on Windows
- template: prepare-build-template.yml
parameters:
configuration: $(configuration)
build_stage: ${{ parameters.build_stage}}
# Build Stage
- ${{ if eq(parameters.build_stage, 'True') }}:
# Set up environment variables for specific pipeline build
- template: set-environment-variables.yml
parameters:
os: ${{ parameters.os}}
cuda: ${{ parameters.cuda}}
is_official_build: ${{ parameters.is_official_build}}
# Sync and update PyTorch submodules
- script: git submodule update --init --recursive --jobs 0
displayName: Update PyTorch submodules
# Build PyTorch and run unit tests - no packaging
- ${{ if or(eq(parameters.is_ci_build, 'True'), eq(parameters.is_daily_build, 'True')) }}:
# Build PyTorch from source in develop mode with Ninja
- script: call activate $(configuration) && python setup.py develop
displayName: Build PyTorch from source
- ${{ if eq(parameters.is_ci_build, 'True') }}:
# Run TestTorch unit tests to demonstrate successful PyTorch build
- script: call activate $(configuration) && python test\test_torch.py TestTorch
displayName: Run TestTorch unit tests
- ${{ if eq(parameters.is_daily_build, 'True') }}:
# Run all unit tests to demonstrate successful PyTorch build
- script: call activate $(configuration) && python test/run_test.py --continue-through-error --exclude-jit-executor --verbose
displayName: Run all unit tests
# Run ComponentGovernance
- task: ComponentGovernanceComponentDetection@0
inputs:
scanType: 'Register'
verbosity: 'Verbose'
alertWarningLevel: 'High'
# Build PyTorch and produce artifacts for verification stage
- ${{ if eq(parameters.is_official_build, 'True') }}:
# Build PyTorch from source in install mode with Ninja and exclude test binaries
- script: call activate $(configuration) && python setup.py install
displayName: Build PyTorch from source without test binaries
# Package PyTorch Wheel
- script: call activate $(configuration) && python setup.py bdist_wheel
displayName: Package PyTorch Wheel
# Publish PyTorch Wheel
- task: PublishPipelineArtifact@1
inputs:
targetPath: $(Build.SourcesDirectory)\dist\
artifactName: Build_$(Build.BuildNumber)_$(configuration)
displayName: Publish PyTorch Wheel to Pipeline Artifacts
# Verification Stage
- ${{ if eq(parameters.verify_stage, 'True') }}:
# Download PyTorch Wheel
- task: DownloadPipelineArtifact@2
inputs:
artifact: Build_$(Build.BuildNumber)_$(configuration)
path: $(Build.SourcesDirectory)\verify
displayName: Download PyTorch Wheel
# Install PyTorch Wheel on Windows
- script: |
call activate $(configuration)
cd $(Build.SourcesDirectory)\verify
dir torch*win*.whl /b > whl.txt
set /p whl= < whl.txt
python -m pip install %whl%
displayName: Install PyTorch Wheel
# Ensure PyTorch installed correctly from produced wheel
- script: |
call activate $(configuration)
cd $(Build.SourcesDirectory)\verify
python -c "import torch; print('Installed Torch version: ' + torch.__version__)"
displayName: Check PyTorch correctly installed from wheel
# Publishing stage
- ${{ if eq(parameters.publish_stage, 'True') }}:
# Download PyTorch Wheel
- task: DownloadPipelineArtifact@2
inputs:
artifact: Build_$(Build.BuildNumber)_$(configuration)
path: $(Build.SourcesDirectory)\publish
displayName: Download PyTorch Wheel
# Set up Azure Artifacts for Windows
# The pip install --upgrade command is a bug fix for Azure CLI on Windows
# More info: https://github.com/Azure/azure-cli/issues/16858
- script: |
pip install --upgrade pip --target \opt\az\lib\python3.6\site-packages\
az extension add -n azure-devops
displayName: Set up Azure Artifacts download on Windows
# Publish wheel to Azure Artifacts
# The flag continueOnError=true is needed as the artifact to be published
# may already exist, because the artifact is differentiated based on the
# last commit date.
- script: |
set /p TORCH_VERSION= < version.txt
cd $(Build.SourcesDirectory)\publish
git rev-parse --short HEAD > last_commit.txt && set /p LAST_COMMIT= < last_commit.txt
git log -1 --pretty=%ad --date=format:%Y%m%d > last_commit_date.txt && set /p LAST_COMMIT_DATE= < last_commit_date.txt
dir torch*win*.whl /b > whl.txt && set /p TORCH_WHEEL= < whl.txt
echo %ADOTOKEN% | az devops login
az artifacts universal publish --organization %AZURE_DEVOPS_ARTIFACTS_ORGANIZATION% --project %AZURE_DEVOPS_ARTIFACTS_PROJECT% --scope project --feed "PyTorch" --name %TORCH_WHEEL% --description "PyTorch Official Build Artifact" --version %TORCH_VERSION:~0,5%-%LAST_COMMIT_DATE%-%LAST_COMMIT% --path .
env:
ADOTOKEN: $(AZURE_DEVOPS_CLI_PAT)
continueOnError: true
displayName: Upload PyTorch nigthly package to Azure Artifacts

View File

@ -0,0 +1,17 @@
dependencies:
- python=PYTHON_VERSION
- numpy
- ninja
- pyyaml
- mkl
- mkl-include
- setuptools
- cmake
- cffi
- typing_extensions
- future
- six
- requests
- dataclasses
- pip:
- -r ../../requirements.txt

View File

@ -0,0 +1,26 @@
parameters:
name: ''
pool: ''
customMatrixes: ''
jobs:
- job: ${{parameters.name}}
timeoutInMinutes: 600
strategy:
matrix:
${{ insert }}: ${{parameters.customMatrixes}}
pool:
name: ${{ parameters.pool}}
steps:
# Clone PyTorch Tests repository
- bash: |
B64_PAT=$(echo -n ":$_ADOTOKEN" | base64)
git -c http.extraHeader="Authorization: Basic ${B64_PAT}" clone $(AZURE_DEVOPS_PYTORCH_TESTS_REPO_URL)
cd pytorch_tests
git checkout $(PYTORCH_TESTS_CHECKOUT_BRANCH)
env:
_ADOTOKEN: $(AZURE_DEVOPS_CLI_PAT)
displayName: Clone PyTorch Tests repo
- bash: |
bash $(Build.SourcesDirectory)/pytorch_tests/webapp/notify_webapp.sh
displayName: Notify Webapp

View File

@ -0,0 +1,62 @@
# Build prepare steps for PyTorch on Azure DevOps to build from source.
# These steps share between normal build process and semmle security scan tasks
parameters:
build_stage: False
configuration: ''
steps:
# End Python tasks that may be lingering over from previous runs
# Note: If python.exe isn't currently running, exit code becomes 128,
# which fails the run. Here exit code is set to 0 to avoid failed run.
- script: |
taskkill /f /im python.exe
IF %ERRORLEVEL% EQU 128 exit 0
displayName: End previous Python processes
# Clean up env directory in conda for fresh builds and set up conda environment YAML
- powershell: |
Remove-Item 'C:\Miniconda\envs' -Recurse -ErrorAction Ignore
$env:PYTHON_VERSION = $env:SYSTEM_JOBNAME.Substring(3,1) + '.' + $env:SYSTEM_JOBNAME.Substring(4,1)
(Get-Content .azure_pipelines\job_templates\common-packages.yml) -replace 'PYTHON_VERSION', $env:PYTHON_VERSION | Out-File -encoding ASCII .azure_pipelines\job_templates\common-packages.yml
displayName: Clean up previous environments and Set up conda environment YAML
# Make conda environment and install required packages
- script: |
call conda clean --all -y
call conda env create -n $(configuration) --file .azure_pipelines\job_templates\common-packages.yml
call activate $(configuration)
call conda install -c conda-forge libuv=1.39
displayName: Set up conda environment for building from source
- ${{ if eq(parameters.build_stage, 'True') }}:
# Install MKL
- script: |
rmdir /s /q mkl
del mkl_2020.2.254.7z
curl https://s3.amazonaws.com/ossci-windows/mkl_2020.2.254.7z -k -O
7z x -aoa mkl_2020.2.254.7z -omkl
displayName: Install MKL
# Install sccache and randomtemp
# Related PyTorch GitHub issue: https://github.com/pytorch/pytorch/issues/25393
# Related fix: https://github.com/pytorch/builder/pull/448/
- script: |
mkdir .\tmp_bin
curl -k https://s3.amazonaws.com/ossci-windows/sccache.exe --output .\tmp_bin\sccache.exe
curl -k https://s3.amazonaws.com/ossci-windows/sccache-cl.exe --output .\tmp_bin\sccache-cl.exe
copy .\tmp_bin\sccache.exe .\tmp_bin\nvcc.exe
curl -kL https://github.com/peterjc123/randomtemp-rust/releases/download/v0.3/randomtemp.exe --output .\tmp_bin\randomtemp.exe
displayName: Install sccache and randomtemp
condition: not(eq(variables.CUDA_VERSION, ''))
# CUDA 11.2's CUB directory conflicts with CUDA 10.2 and 10.1
# builds, where CUDA 11.2's CUB is injected into non-CUDA
# 11.2 builds.
- powershell: Remove-Item "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\include\cub" -Recurse -ErrorAction Ignore
displayName: Remove conflicting CUB from CUDA installation
condition: not(eq(variables.CUDA_VERSION, ''))
- powershell: Copy-Item -Path "F:\cuda_11_2\cub\" -Destination "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\include" -Recurse
displayName: Copy CUDA CUB for CUDA 11.2 build
condition: eq(variables.CUDA_VERSION, '112')

View File

@ -0,0 +1,61 @@
# PyTorch build steps template with Unix images Azure DevOps Instances
#
# This build depends on 5 parameters set as an environment variables in the pipeline:
# - AZURE_DEVOPS_CLI_PAT: Secret var for authenticating to Azure DevOps
# - AZURE_STORAGE_KEY: Secret var for authenticating to Azure Storage
# - _TS_CLONE_P, _TS_P, _TS_SM_P: Secret vars for specific unit tests
parameters:
name: ''
pool: ''
container_endpoint: ''
customMatrixes: ''
jobs:
- job: ${{parameters.name}}
timeoutInMinutes: 600
strategy:
matrix:
${{ insert }}: ${{parameters.customMatrixes}}
pool:
name: ${{ parameters.pool}}
variables:
DECODE_PERCENTS: false
steps:
# Don't checkout repo contents to save time and CPU compute. Environment variables
# related to checkout branch such as $(BUILD_SOURCEBRANCH) are still available.
- checkout: none
# Delete pytorch_tests repo from previous builds if exists
- bash: rm -rf pytorch_tests/
displayName: Delete pytorch_tests repo from previous builds if exists
# Clone PyTorch Tests repository
- bash: |
B64_PAT=$(echo -n ":$_ADOTOKEN" | base64)
git -c http.extraHeader="Authorization: Basic ${B64_PAT}" clone $(AZURE_DEVOPS_PYTORCH_TESTS_REPO_URL)
cd pytorch_tests
git checkout $(PYTORCH_TESTS_CHECKOUT_BRANCH)
env:
_ADOTOKEN: $(AZURE_DEVOPS_CLI_PAT)
displayName: Clone PyTorch Tests repo
# Run PyTorch Unit Tests
- bash: bash $(Build.SourcesDirectory)/pytorch_tests/scripts/linux/run.sh
env:
_AZURE_STORAGE_KEY: $(AZURE_STORAGE_KEY)
_TS_CLONE_P: $(TS_CLONE_PASSWORD)
_TS_P: $(TS_PAT)
_TS_SM_P: $(TS_SM_PAT)
_AZUREML_CLONE_PASSWORD: $(AZUREML_CLONE_PASSWORD)
_SPPASSWORD: $(SPPASSWORD)
displayName: Run PyTorch Unit Tests
# Tests results are available outside the docker container since
# the current directory is mounted as a volume of the container.
- task: PublishTestResults@2
condition: always()
inputs:
testResultsFiles: '**/test-*.xml'
testRunTitle: 'Publish test results for Python'

View File

@ -0,0 +1,57 @@
# PyTorch build steps template with Windows images Azure DevOps Instances
#
# This build depends on 5 parameters set as an environment variables in the pipeline:
# - AZURE_DEVOPS_CLI_PAT: Secret var for authenticating to Azure DevOps
# - AZURE_STORAGE_KEY: Secret var for authenticating to Azure Storage
# - _TS_CLONE_P, _TS_P, _TS_SM_P: Secret vars for specific unit tests
parameters:
name: ''
pool: ''
customMatrixes: ''
jobs:
- job: ${{parameters.name}}
timeoutInMinutes: 600
strategy:
matrix:
${{ insert }}: ${{parameters.customMatrixes}}
pool:
name: ${{ parameters.pool}}
steps:
# Don't checkout repo contents to save time and CPU compute. Environment variables
# related to checkout branch such as $(BUILD_SOURCEBRANCH) are still available.
- checkout: none
# Delete pytorch_tests repo from previous builds if exists
- script: if exist "pytorch_tests/" rmdir "pytorch_tests/" /q /s
displayName: Delete pytorch_tests repo from previous builds if exists
# Clone PyTorch Tests repository
- powershell: |
$env:B64Pat = [Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes(":$env:_ADOTOKEN"))
git -c http.extraHeader="Authorization: Basic $env:B64Pat" clone $env:AZURE_DEVOPS_pytorch_tests_REPO_URL
cd pytorch_tests
git checkout $(PYTORCH_TESTS_CHECKOUT_BRANCH)
env:
_ADOTOKEN: $(AZURE_DEVOPS_CLI_PAT)
displayName: Clone PyTorch Tests repo
# Run PyTorch Unit Tests
- script: call $(Build.SourcesDirectory)\pytorch_tests\scripts\windows\run.bat
env:
_ADOTOKEN: $(AZURE_DEVOPS_CLI_PAT)
_AZURE_STORAGE_KEY: $(AZURE_STORAGE_KEY)
_TS_CLONE_P: $(TS_CLONE_PASSWORD)
_TS_P: $(TS_PAT)
_TS_SM_P: $(TS_SM_PAT)
displayName: Run PyTorch Unit Tests
# Tests results are available outside the docker container since
# the current directory is mounted as a volume of the container.
- task: PublishTestResults@2
condition: always()
inputs:
testResultsFiles: '**\test-*.xml'
testRunTitle: 'Publish test results for Python'

View File

@ -0,0 +1,131 @@
# Set environment variables for specific configurations
parameters:
is_official_build: False
os: ''
cuda: ''
steps:
# Environment configuration steps for Ubuntu builds
- ${{ if contains(parameters.os, 'ubuntu') }}:
# Set configuration specific build flags
- ${{ if eq(parameters.is_official_build, True) }}:
- bash: |
echo "##vso[task.setvariable variable=INSTALL_TEST;]0"
echo "##vso[task.setvariable variable=PYTORCH_BUILD_NUMBER;]1"
export PYTORCH_VERSION=$(head -c 5 ./version.txt)
echo "##vso[task.setvariable variable=PYTORCH_BUILD_VERSION;]$PYTORCH_VERSION.dev"
displayName: Set configuration-specific build flags
# Set PyTorch CPU/GPU build flags.
- ${{ if contains(parameters.cuda, 'cpu') }}:
- bash: |
echo "##vso[task.setvariable variable=USE_CUDA;]0"
echo "##vso[task.setvariable variable=PYTORCH_BUILD_VERSION;]$(PYTORCH_BUILD_VERSION).cpu"
displayName: Set CUDA-specific build flag for CPU builds
- ${{ if contains(parameters.cuda, 'gpu') }}:
- bash: |
echo "##vso[task.setvariable variable=USE_CUDA;]1"
echo "##vso[task.setvariable variable=PYTORCH_BUILD_VERSION;]$(PYTORCH_BUILD_VERSION).cu$(CUDA_VERSION)"
displayName: Set CUDA-specific build flag for GPU builds
# Set MKL environment variables
- bash: |
echo "##vso[task.setvariable variable=CMAKE_LIBRARY_PATH;]/opt/intel/lib:$CMAKE_LIBRARY_PATH"
echo "##vso[task.setvariable variable=CMAKE_INCLUDE_PATH;]/opt/intel/include:$CMAKE_INCLUDE_PATH"
displayName: Set MKL paths
# View current environment variables
- bash:
printenv
displayName: Show environment variables
# Environment configuration steps for Windows builds
- ${{ if contains(parameters.os, 'windows') }}:
# Set Conda Lib Path
- powershell: Write-Host "##vso[task.setvariable variable=CONDA_LIB_PATH;]C:\Miniconda\envs\$(configuration)\Library\bin"
displayName: Set Conda Lib Path
# Set configuration specific build flags
- ${{ if eq(parameters.is_official_build, True) }}:
- powershell: |
Write-Host "##vso[task.setvariable variable=INSTALL_TEST;]0"
Write-Host "##vso[task.setvariable variable=PYTORCH_BUILD_NUMBER;]1"
Set-Variable -Name PYTORCH_VERSION -Value (Get-Content .\version.txt).Substring(0,5)
Write-Host "##vso[task.setvariable variable=PYTORCH_BUILD_VERSION;]$PYTORCH_VERSION.dev"
displayName: Set configuration-specific build flags
# Set PyTorch CPU/GPU build flags..
- ${{ if contains(parameters.cuda, 'cpu') }}:
- powershell: |
Write-Host "##vso[task.setvariable variable=USE_CUDA;]0"
Write-Host "##vso[task.setvariable variable=PYTORCH_BUILD_VERSION;]$(PYTORCH_BUILD_VERSION).cpu"
displayName: Set CUDA-specific build flag for CPU build
- ${{ if contains(parameters.cuda, 'gpu') }}:
- powershell: |
Write-Host "##vso[task.setvariable variable=USE_CUDA;]1"
Write-Host "##vso[task.setvariable variable=PYTORCH_BUILD_VERSION;]$(PYTORCH_BUILD_VERSION).cu$(CUDA_VERSION)"
displayName: Set CUDA-specific build flag for GPU build
# Set CUDA 11.2, 10.2 or 10.1 specific build flags
- ${{ if eq(parameters.cuda, 'gpu') }}:
- powershell: |
Write-Host "##vso[task.setvariable variable=TORCH_CUDA_ARCH_LIST;]3.7+PTX;5.0;6.0;6.1;7.0;7.5;8.0;8.6"
Write-Host "##vso[task.setvariable variable=CUDA_PATH;]C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\"
displayName: Set CUDA 11.2 specific build flags
condition: eq(variables.CUDA_VERSION, '112')
- powershell: |
Write-Host "##vso[task.setvariable variable=TORCH_CUDA_ARCH_LIST;]3.7+PTX;5.0;6.0;6.1;7.0;7.5"
Write-Host "##vso[task.setvariable variable=CUDA_PATH;]C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\"
displayName: Set CUDA 10.2 specific build flags
condition: eq(variables.CUDA_VERSION, '102')
- powershell: |
Write-Host "##vso[task.setvariable variable=TORCH_CUDA_ARCH_LIST;]3.7+PTX;5.0;6.0;6.1;7.0;7.5"
Write-Host "##vso[task.setvariable variable=CUDA_PATH;]C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\"
displayName: Set CUDA 10.1 specific build flags
condition: eq(variables.CUDA_VERSION, '101')
- powershell: |
Write-Host "##vso[task.setvariable variable=CUDA_BIN_PATH;]$env:CUDA_PATH\bin\"
Write-Host "##vso[task.setvariable variable=CUDNN_ROOT;]$env:CUDA_PATH"
Write-Host "##vso[task.setvariable variable=CUDNN_INCLUDE_DIR;]$env:CUDA_PATH\include\"
Write-Host "##vso[task.setvariable variable=CUDNN_LIBRARY;]$env:CUDA_PATH\lib\x64\"
Write-Host "##vso[task.prependpath]$env:CUDA_PATH\bin"
Write-Host "##vso[task.setvariable variable=TORCH_NVCC_FLAGS;]-Xfatbin -compress-all --no-host-device-move-forward"
Write-Host "##vso[task.setvariable variable=THRUST_IGNORE_CUB_VERSION_CHECK;]1"
Write-Host "##vso[task.setvariable variable=NVTOOLSEXT_PATH;]C:\Program Files\NVIDIA Corporation\NvToolsExt\"
displayName: Set CUDA environment variables
- powershell: |
copy "$(CUDA_BIN_PATH)\cusparse*64_*.dll*" $(Build.SourcesDirectory)\torch\lib
copy "$(CUDA_BIN_PATH)\cublas*64_*.dll*" $(Build.SourcesDirectory)\torch\lib
copy "$(CUDA_BIN_PATH)\cudart*64_*.dll*" $(Build.SourcesDirectory)\torch\lib
copy "$(CUDA_BIN_PATH)\curand*64_*.dll*" $(Build.SourcesDirectory)\torch\lib
copy "$(CUDA_BIN_PATH)\cufft*64_*.dll*" $(Build.SourcesDirectory)\torch\lib
copy "$(CUDA_BIN_PATH)\cusolver*64_*.dll*" $(Build.SourcesDirectory)\torch\lib
copy "$(CUDA_BIN_PATH)\cudnn*64_*.dll*" $(Build.SourcesDirectory)\torch\lib
copy "$(CUDA_BIN_PATH)\nvrtc*64_*.dll*" $(Build.SourcesDirectory)\torch\lib
copy "C:\Program Files\NVIDIA Corporation\NvToolsExt\bin\x64\nvToolsExt64_1.dll*" $(Build.SourcesDirectory)\torch\lib
copy "$(CONDA_LIB_PATH)\libiomp*5md.dll" $(Build.SourcesDirectory)\torch\lib
copy "$(CONDA_LIB_PATH)\uv.dll" $(Build.SourcesDirectory)\torch\lib
displayName: Copy CUDA/cuDNN/libomp/libuv dlls to torch\lib
# Set MKL, sccache and randomtemp environment variables
- powershell: |
Write-Host "##vso[task.setvariable variable=CMAKE_INCLUDE_PATH;]$(Build.SourcesDirectory)\mkl\include"
Write-Host "##vso[task.setvariable variable=CMAKE_LIBRARY_PATH;]$(Build.SourcesDirectory)\mkl\lib;$env:CMAKE_LIBRARY_PATH"
Write-Host "##vso[task.setvariable variable=ADDITIONAL_PATH;]$(Build.SourcesDirectory)\tmp_bin"
Write-Host "##vso[task.setvariable variable=SCCACHE_IDLE_TIMEOUT;]1500"
Write-Host "##vso[task.setvariable variable=RANDOMTEMP_EXECUTABLE;]$(Build.SourcesDirectory)\tmp_bin\nvcc.exe"
Write-Host "##vso[task.setvariable variable=CUDA_NVCC_EXECUTABLE;]$(Build.SourcesDirectory)\tmp_bin\randomtemp.exe"
Write-Host "##vso[task.setvariable variable=RANDOMTEMP_BASEDIR;]$(Build.SourcesDirectory)\tmp_bin"
displayName: Set MKL, sccache and randomtemp environment variables
# View current environment variables
- script:
set
displayName: Show environment variables

View File

@ -0,0 +1,14 @@
# Main logic to initiate wait for PR artifact to be ready
steps:
- task: InvokeRESTAPI@1
displayName: 'Wait for job success and wheel ready'
timeoutInMinutes: 60
inputs:
connectionType: 'connectedServiceName'
serviceConnection: circleciconn
method: 'POST'
headers: '{"Content-Type":"application/json", "BranchName":"$(_TARGET_BRANCH_TO_CHECK)", "JobName":"$(TARGET_CIRCLECI_BUILD_PR)", "PRNumber":"$(_TARGET_PR_NUMBER)", "TargetCommit":"$(_TARGET_COMMIT)", "PlanUrl":"$(System.CollectionUri)", "ProjectId":"$(System.TeamProjectId)", "HubName":"$(System.HostType)", "PlanId":"$(System.PlanId)", "JobId":"$(System.JobId)", "TimelineId":"$(System.TimelineId)", "TaskInstanceId":"$(System.TaskInstanceId)", "AuthToken":"$(System.AccessToken)"}'
body: ''
urlSuffix: 'api/JobStatus'
waitForCompletion: true

View File

@ -0,0 +1,92 @@
# Initiate 5 agentless-server waiting jobs to check on the
# status of PR artifact builds, for a maximum wait time of
# 11*60 min=660 mins. These jobs will pass immediately
# once targeted CircleCI build is ready.
jobs:
- job: checkjob1
pool: server
timeoutInMinutes: 60
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
- job: checkjob2
pool: server
timeoutInMinutes: 60
dependsOn: checkjob1
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
- job: checkjob3
pool: server
timeoutInMinutes: 60
dependsOn: checkjob2
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
- job: checkjob4
pool: server
timeoutInMinutes: 60
dependsOn: checkjob3
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
- job: checkjob5
pool: server
timeoutInMinutes: 60
dependsOn: checkjob4
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
- job: checkjob6
pool: server
timeoutInMinutes: 60
dependsOn: checkjob5
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
- job: checkjob7
pool: server
timeoutInMinutes: 60
dependsOn: checkjob6
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
- job: checkjob8
pool: server
timeoutInMinutes: 60
dependsOn: checkjob7
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
- job: checkjob9
pool: server
timeoutInMinutes: 60
dependsOn: checkjob8
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
- job: checkjob10
pool: server
timeoutInMinutes: 60
dependsOn: checkjob9
continueOnError: true
steps:
- template: wheel-wait-job-template.yml
- job: checkjob11
pool: server
timeoutInMinutes: 60
dependsOn: checkjob10
continueOnError: true
steps:
- template: wheel-wait-job-template.yml

View File

@ -0,0 +1,60 @@
# PyTorch Nightly PyTorch Tests Builds Pipeline on Azure DevOps
#
# This pipeline runs custom PyTorch unit-tests on nightly
# PyTorch wheels.
stages:
- stage: 'NightlyCustomTests'
displayName: 'Run custom unit tests on PyTorch wheels'
jobs:
- template: job_templates/pytorch-template-unix.yml
parameters:
name: ubuntu_1804_CPU_docker
pool: $(BUILD_POOL_LIN_1)
customMatrixes:
Nightly_Custom_Tests:
_DOCKER_IMAGE: $(DOCKER_IMAGE_LIN_1)
_PYTHON_VERSION: $(PYTHON_VERSION_LIN_1)
_CUDA_BUILD_VERSION: $(CUDA_BUILD_VERSION_LIN_1)
_RUN_TESTS: $(RUN_TESTS_LIN)
- template: job_templates/pytorch-template-unix.yml
parameters:
name: ubuntu_1804_GPU_docker
pool: $(BUILD_POOL_LIN_2)
customMatrixes:
Nightly_Custom_Tests:
_DOCKER_IMAGE: $(DOCKER_IMAGE_LIN_2)
_PYTHON_VERSION: $(PYTHON_VERSION_LIN_2)
_CUDA_BUILD_VERSION: $(CUDA_BUILD_VERSION_LIN_2)
_RUN_TESTS: $(RUN_TESTS_LIN)
- template: job_templates/pytorch-template-win.yml
parameters:
name: windows_2019_CPU
pool: $(BUILD_POOL_WIN_1)
customMatrixes:
Nightly_Custom_Tests:
_PYTHON_VERSION: $(PYTHON_VERSION_WIN_1)
_CUDA_BUILD_VERSION: $(CUDA_BUILD_VERSION_WIN_1)
_RUN_TESTS: $(RUN_TESTS_WIN)
- template: job_templates/pytorch-template-win.yml
parameters:
name: windows_2019_GPU
pool: $(BUILD_POOL_WIN_2)
customMatrixes:
Nightly_Custom_Tests:
_PYTHON_VERSION: $(PYTHON_VERSION_WIN_2)
_CUDA_BUILD_VERSION: $(CUDA_BUILD_VERSION_WIN_2)
_RUN_TESTS: $(RUN_TESTS_WIN)
- stage: 'NotifyWebapp'
displayName: 'Notify Webapp that pipeline is finished'
dependsOn: NightlyCustomTests
condition: succeededOrFailed()
jobs:
- template: job_templates/notify-webapp-template.yml
parameters:
name: ubuntu_1804_CPU
pool: $(BUILD_POOL_LIN_1)

View File

@ -0,0 +1,62 @@
# PyTorch PR PyTorch Tests Builds Pipeline on Azure DevOps
#
# This pipeline:
# 1) ensures that CircleCI builds for a given PR
# have finished, and that its artifacts are
# ready for download
# 2) runs custom PyTorch unit-tests on PyTorch
# wheels generated during PR builds.
resources:
webhooks:
- webhook: GitHubPyTorchPRTrigger
connection: GitHubPyTorchPRTriggerConnection
filters:
- path: repositoryName
value: pytorch_tests
stages:
- stage: 'EnsureArtifactsReady'
displayName: 'Ensure PyTorch PR Artifacts are ready'
jobs:
- template: job_templates/wheel-wait-template.yml
variables:
_TARGET_BRANCH_TO_CHECK: ${{parameters.GitHubPyTorchPRTrigger.TARGET_BRANCH_TO_CHECK_AZ_DEVOPS_PR}}
_TARGET_PR_NUMBER: ${{parameters.GitHubPyTorchPRTrigger.PR_NUMBER}}
_TARGET_COMMIT: ${{parameters.GitHubPyTorchPRTrigger.TARGET_COMMIT}}
- stage: 'PRCustomTests'
displayName: 'Run custom unit tests on PyTorch wheels'
dependsOn: EnsureArtifactsReady
condition: succeeded()
jobs:
- template: job_templates/pytorch-template-unix.yml
parameters:
name: ubuntu_1804_GPU_docker
pool: $(BUILD_POOL_PR)
customMatrixes:
PR_Custom_Tests:
_PYTHON_VERSION: $(PYTHON_VERSION_PR)
_CUDA_BUILD_VERSION: $(CUDA_BUILD_VERSION_PR)
_TARGET_CIRCLECI_BUILD: $(TARGET_CIRCLECI_BUILD_PR)
_TARGET_BRANCH_TO_CHECK: ${{parameters.GitHubPyTorchPRTrigger.TARGET_BRANCH_TO_CHECK_AZ_DEVOPS_PR}}
_TARGET_PR_NUMBER: ${{parameters.GitHubPyTorchPRTrigger.PR_NUMBER}}
_TARGET_COMMIT: ${{parameters.GitHubPyTorchPRTrigger.TARGET_COMMIT}}
_DOCKER_IMAGE: $(DOCKER_IMAGE_PR)
_RUN_TESTS: $(RUN_TESTS_PR)
- stage: 'NotifyWebapp'
displayName: 'Notify Webapp that pipeline is finished'
dependsOn: PRCustomTests
condition: succeededOrFailed()
jobs:
- template: job_templates/notify-webapp-template.yml
parameters:
name: ubuntu_1804_CPU
pool: $(BUILD_POOL_LIN_1)
customMatrixes:
PR_Notify_WebApp:
_TARGET_CIRCLECI_BUILD: $(TARGET_CIRCLECI_BUILD_PR)
_TARGET_BRANCH_TO_CHECK: ${{parameters.GitHubPyTorchPRTrigger.TARGET_BRANCH_TO_CHECK_AZ_DEVOPS_PR}}
_TARGET_PR_NUMBER: ${{parameters.GitHubPyTorchPRTrigger.PR_NUMBER}}
_TARGET_COMMIT: ${{parameters.GitHubPyTorchPRTrigger.TARGET_COMMIT}}

View File

@ -0,0 +1,224 @@
# PyTorch Official Builds Pipeline on Azure DevOps
#
# This pipeline:
# 1) builds PyTorch on all available configurations
# 2) verifies PyTorch artifacts by installing them in a clean environment
# and checking torch.__version_
# 3) publishes official PyTorch artifacts to Azure DevOps Artifacts for consumption
stages:
- stage: 'Build'
displayName: 'Build PyTorch'
jobs:
- template: job_templates/build-verify-publish-template-unix.yml
parameters:
name: ubuntu_1804_CPU_docker
pool: 'PyTorch-Linux-CPU'
container_endpoint: pytorchms.azurecr.io
build_stage: True
is_official_build: True
os: ubuntu
cuda: cpu
customMatrixes:
Py_38:
configuration: ubuntu_1804_py_38_cpu
container_image: pytorchms.azurecr.io/ubuntu_1804_py_38_cpu_dev
Py_37:
configuration: ubuntu_1804_py_37_cpu
container_image: pytorchms.azurecr.io/ubuntu_1804_py_37_cpu_dev
- template: job_templates/build-verify-publish-template-unix.yml
parameters:
name: ubuntu_1804_GPU_docker
pool: 'PyTorch-Linux-GPU'
container_endpoint: pytorchms.azurecr.io
build_stage: True
is_official_build: True
os: ubuntu
cuda: gpu
customMatrixes:
Py_39_CUDA_112_cuDNN_810:
configuration: ubuntu_1804_py_39_cuda_112_cudnn_810
container_image: pytorchms.azurecr.io/ubuntu_1804_py_39_cuda_112_cudnn_8_dev
CUDA_VERSION: 112
Py_38_CUDA_102_cuDNN_810:
configuration: ubuntu_1804_py_38_cuda_102_cudnn_810
container_image: pytorchms.azurecr.io/ubuntu_1804_py_38_cuda_102_cudnn_8_dev
CUDA_VERSION: 102
Py_37_CUDA_101_cuDNN_765:
configuration: ubuntu_1804_py_37_cuda_101_cudnn_765
container_image: pytorchms.azurecr.io/ubuntu_1804_py_37_cuda_101_cudnn_7_dev
CUDA_VERSION: 101
- template: job_templates/build-verify-publish-template-win.yml
parameters:
name: windows_2019_CPU
pool: 'PyTorch-Win-CPU'
build_stage: True
is_official_build: True
os: windows
cuda: cpu
customMatrixes:
Py_38:
configuration: windows_2019_py_38_cpu
Py_37:
configuration: windows_2019_py_37_cpu
- template: job_templates/build-verify-publish-template-win.yml
parameters:
name: windows_2019_GPU
pool: 'PyTorch-Win-GPU'
build_stage: True
is_official_build: True
os: windows
cuda: gpu
customMatrixes:
Py_39_CUDA_112_cuDNN_810:
configuration: windows_2019_py_39_cuda_112_cudnn_810
CUDA_VERSION: 112
Py_38_CUDA_102_cuDNN_765:
configuration: windows_2019_py_38_cuda_102_cudnn_765
CUDA_VERSION: 102
Py_37_CUDA_101_cuDNN_764:
configuration: windows_2019_py_37_cuda_101_cudnn_764
CUDA_VERSION: 101
- stage: 'Verify'
displayName: 'Verify PyTorch wheels'
dependsOn: Build
condition: succeeded()
jobs:
- template: job_templates/build-verify-publish-template-unix.yml
parameters:
name: ubuntu_1804_CPU_docker
pool: 'PyTorch-Linux-CPU'
container_endpoint: pytorchms.azurecr.io
verify_stage: True
is_official_build: True
customMatrixes:
Py_38:
configuration: ubuntu_1804_py_38_cpu
container_image: pytorchms.azurecr.io/ubuntu_1804_py_38_cpu_dev
Py_37:
configuration: ubuntu_1804_py_37_cpu
container_image: pytorchms.azurecr.io/ubuntu_1804_py_37_cpu_dev
- template: job_templates/build-verify-publish-template-unix.yml
parameters:
name: ubuntu_1804_GPU_docker
pool: 'PyTorch-Linux-GPU'
container_endpoint: pytorchms.azurecr.io
verify_stage: True
is_official_build: True
customMatrixes:
Py_39_CUDA_112_cuDNN_810:
configuration: ubuntu_1804_py_39_cuda_112_cudnn_810
container_image: pytorchms.azurecr.io/ubuntu_1804_py_39_cuda_112_cudnn_8_dev
CUDA_VERSION: 112
Py_38_CUDA_102_cuDNN_810:
configuration: ubuntu_1804_py_38_cuda_102_cudnn_810
container_image: pytorchms.azurecr.io/ubuntu_1804_py_38_cuda_102_cudnn_8_dev
CUDA_VERSION: 102
Py_37_CUDA_101_cuDNN_765:
configuration: ubuntu_1804_py_37_cuda_101_cudnn_765
container_image: pytorchms.azurecr.io/ubuntu_1804_py_37_cuda_101_cudnn_7_dev
CUDA_VERSION: 101
- template: job_templates/build-verify-publish-template-win.yml
parameters:
name: windows_2019_CPU
pool: 'PyTorch-Win-CPU'
verify_stage: True
is_official_build: True
customMatrixes:
Py_38:
configuration: windows_2019_py_38_cpu
Py_37:
configuration: windows_2019_py_37_cpu
- template: job_templates/build-verify-publish-template-win.yml
parameters:
name: windows_2019_GPU
pool: 'PyTorch-Win-GPU'
verify_stage: True
is_official_build: True
customMatrixes:
Py_39_CUDA_112_cuDNN_810:
configuration: windows_2019_py_39_cuda_112_cudnn_810
CUDA_VERSION: 112
Py_38_CUDA_102_cuDNN_765:
configuration: windows_2019_py_38_cuda_102_cudnn_765
CUDA_VERSION: 102
Py_37_CUDA_101_cuDNN_764:
configuration: windows_2019_py_37_cuda_101_cudnn_764
CUDA_VERSION: 101
- stage: 'Publish'
displayName: 'Publish PyTorch wheels'
dependsOn: Verify
condition: succeeded()
jobs:
- template: job_templates/build-verify-publish-template-unix.yml
parameters:
name: ubuntu_1804_CPU_docker
pool: 'PyTorch-Linux-CPU'
container_endpoint: pytorchms.azurecr.io
publish_stage: True
is_official_build: True
customMatrixes:
Py_38:
configuration: ubuntu_1804_py_38_cpu
container_image: pytorchms.azurecr.io/ubuntu_1804_py_38_cpu_dev
Py_37:
configuration: ubuntu_1804_py_37_cpu
container_image: pytorchms.azurecr.io/ubuntu_1804_py_37_cpu_dev
- template: job_templates/build-verify-publish-template-unix.yml
parameters:
name: ubuntu_1804_GPU_docker
pool: 'PyTorch-Linux-GPU'
container_endpoint: pytorchms.azurecr.io
publish_stage: True
is_official_build: True
customMatrixes:
Py_39_CUDA_112_cuDNN_810:
configuration: ubuntu_1804_py_39_cuda_112_cudnn_810
container_image: pytorchms.azurecr.io/ubuntu_1804_py_39_cuda_112_cudnn_8_dev
CUDA_VERSION: 112
Py_38_CUDA_102_cuDNN_810:
configuration: ubuntu_1804_py_38_cuda_102_cudnn_810
container_image: pytorchms.azurecr.io/ubuntu_1804_py_38_cuda_102_cudnn_8_dev
CUDA_VERSION: 102
Py_37_CUDA_101_cuDNN_765:
configuration: ubuntu_1804_py_37_cuda_101_cudnn_765
container_image: pytorchms.azurecr.io/ubuntu_1804_py_37_cuda_101_cudnn_7_dev
CUDA_VERSION: 101
- template: job_templates/build-verify-publish-template-win.yml
parameters:
name: windows_2019_CPU
pool: 'PyTorch-Win-CPU'
publish_stage: True
is_official_build: True
customMatrixes:
Py_38:
configuration: windows_2019_py_38_cpu
Py_37:
configuration: windows_2019_py_37_cpu
- template: job_templates/build-verify-publish-template-win.yml
parameters:
name: windows_2019_GPU
pool: 'PyTorch-Win-GPU'
publish_stage: True
is_official_build: True
customMatrixes:
Py_39_CUDA_112_cuDNN_810:
configuration: windows_2019_py_39_cuda_112_cudnn_810
CUDA_VERSION: 112
Py_38_CUDA_102_cuDNN_765:
configuration: windows_2019_py_38_cuda_102_cudnn_765
CUDA_VERSION: 102
Py_37_CUDA_101_cuDNN_764:
configuration: windows_2019_py_37_cuda_101_cudnn_764
CUDA_VERSION: 101

View File

@ -1,4 +0,0 @@
# We do not use this library in our Bazel build. It contains an
# infinitely recursing symlink that makes Bazel very unhappy.
third_party/ittapi/
third_party/opentelemetry-cpp

109
.bazelrc
View File

@ -1,114 +1,13 @@
build --cxxopt=--std=c++17
build --copt=--std=c++14
build --copt=-I.
# Bazel does not support including its cc_library targets as system
# headers. We work around this for generated code
# (e.g. c10/macros/cmake_macros.h) by making the generated directory a
# system include path.
build --copt=-isystem --copt bazel-out/k8-fastbuild/bin
build --copt=-isystem --copt bazel-out/darwin-fastbuild/bin
build --experimental_ui_max_stdouterr_bytes=2048576
# Configuration to disable tty features for environments like CI
build:no-tty --curses no
build:no-tty --progress_report_interval 10
build:no-tty --show_progress_rate_limit 10
# Build with GPU support by default.
build --define=cuda=true
# rules_cuda configuration
build --@rules_cuda//cuda:enable_cuda
build --@rules_cuda//cuda:cuda_targets=sm_52
build --@rules_cuda//cuda:compiler=nvcc
build --repo_env=CUDA_PATH=/usr/local/cuda
# Configuration to build without GPU support
build:cpu-only --define=cuda=false
# Configuration to build with GPU support
build:gpu --define=cuda=true
# define a separate build folder for faster switching between configs
build:cpu-only --platform_suffix=-cpu-only
# See the note on the config-less build for details about why we are
# doing this. We must also do it for the "-cpu-only" platform suffix.
build --copt=-isystem --copt=bazel-out/k8-fastbuild-cpu-only/bin
# rules_cuda configuration
build:cpu-only --@rules_cuda//cuda:enable_cuda=False
# Definition of --config=shell
# interactive shell immediately before execution
build:shell --run_under="//tools/bazel_tools:shellwrap"
# Disable all warnings for external repositories. We don't care about
# their warnings.
build --per_file_copt=^external/@-w
# Set additional warnings to error level.
#
# Implementation notes:
# * we use file extensions to determine if we are using the C++
# compiler or the cuda compiler
# * we use ^// at the start of the regex to only permit matching
# PyTorch files. This excludes external repos.
#
# Note that because this is logically a command-line flag, it is
# considered the word on what warnings are enabled. This has the
# unfortunate consequence of preventing us from disabling an error at
# the target level because those flags will come before these flags in
# the action invocation. Instead we provide per-file exceptions after
# this.
#
# On the bright side, this means we don't have to more broadly apply
# the exceptions to an entire target.
#
# Looking for CUDA flags? We have a cu_library macro that we can edit
# directly. Look in //tools/rules:cu.bzl for details. Editing the
# macro over this has the following advantages:
# * making changes does not require discarding the Bazel analysis
# cache
# * it allows for selective overrides on individual targets since the
# macro-level opts will come earlier than target level overrides
build --per_file_copt='^//.*\.(cpp|cc)$'@-Werror=all
# The following warnings come from -Wall. We downgrade them from error
# to warnings here.
#
# We intentionally use #pragma unroll, which is compiler specific.
build --per_file_copt='^//.*\.(cpp|cc)$'@-Wno-error=unknown-pragmas
build --per_file_copt='^//.*\.(cpp|cc)$'@-Werror=extra
# The following warnings come from -Wextra. We downgrade them from error
# to warnings here.
#
# unused-parameter-compare has a tremendous amount of violations in the
# codebase. It will be a lot of work to fix them, just disable it for
# now.
build --per_file_copt='^//.*\.(cpp|cc)$'@-Wno-unused-parameter
# missing-field-parameters has both a large number of violations in
# the codebase, but it also is used pervasively in the Python C
# API. There are a couple of catches though:
# * we use multiple versions of the Python API and hence have
# potentially multiple different versions of each relevant
# struct. They may have different numbers of fields. It will be
# unwieldy to support multiple versions in the same source file.
# * Python itself for many of these structs recommends only
# initializing a subset of the fields. We should respect the API
# usage conventions of our dependencies.
#
# Hence, we just disable this warning altogether. We may want to clean
# up some of the clear-cut cases that could be risky, but we still
# likely want to have this disabled for the most part.
build --per_file_copt='^//.*\.(cpp|cc)$'@-Wno-missing-field-initializers
build --per_file_copt='^//.*\.(cpp|cc)$'@-Wno-unused-function
build --per_file_copt='^//.*\.(cpp|cc)$'@-Wno-unused-variable
build --per_file_copt='//:aten/src/ATen/RegisterCompositeExplicitAutograd\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:aten/src/ATen/RegisterCompositeImplicitAutograd\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:aten/src/ATen/RegisterMkldnnCPU\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:aten/src/ATen/RegisterNestedTensorCPU\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:aten/src/ATen/RegisterQuantizedCPU\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:aten/src/ATen/RegisterSparseCPU\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:aten/src/ATen/RegisterSparseCsrCPU\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:aten/src/ATen/RegisterNestedTensorMeta\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:aten/src/ATen/RegisterSparseMeta\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:aten/src/ATen/RegisterQuantizedMeta\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:aten/src/ATen/RegisterZeroTensor\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:torch/csrc/lazy/generated/RegisterAutogradLazy\.cpp$'@-Wno-error=unused-function
build --per_file_copt='//:torch/csrc/lazy/generated/RegisterLazy\.cpp$'@-Wno-error=unused-function
build:gpu --platform_suffix=-gpu

View File

@ -1 +1 @@
6.1.1
4.2.1

View File

@ -1,26 +0,0 @@
[pt]
is_oss=1
[buildfile]
name = BUCK.oss
includes = //tools/build_defs/select.bzl
[repositories]
bazel_skylib = third_party/bazel-skylib/
ovr_config = .
[download]
in_build = true
[cxx]
cxxflags = -std=c++17
ldflags = -Wl,--no-undefined
should_remap_host_platform = true
cpp = /usr/bin/clang
cc = /usr/bin/clang
cxx = /usr/bin/clang++
cxxpp = /usr/bin/clang++
ld = /usr/bin/clang++
[project]
default_flavors_mode=all

View File

@ -1,36 +0,0 @@
set -ex
LOCAL_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
ROOT_DIR=$(cd "$LOCAL_DIR"/../.. && pwd)
TEST_DIR="$ROOT_DIR/test"
gtest_reports_dir="${TEST_DIR}/test-reports/cpp"
pytest_reports_dir="${TEST_DIR}/test-reports/python"
# Figure out which Python to use
PYTHON="$(which python)"
if [[ "${BUILD_ENVIRONMENT}" =~ py((2|3)\.?[0-9]?\.?[0-9]?) ]]; then
PYTHON=$(which "python${BASH_REMATCH[1]}")
fi
if [[ "${BUILD_ENVIRONMENT}" == *rocm* ]]; then
# HIP_PLATFORM is auto-detected by hipcc; unset to avoid build errors
unset HIP_PLATFORM
if which sccache > /dev/null; then
# Save sccache logs to file
sccache --stop-server || true
rm -f ~/sccache_error.log || true
SCCACHE_ERROR_LOG=~/sccache_error.log SCCACHE_IDLE_TIMEOUT=0 sccache --start-server
# Report sccache stats for easier debugging
sccache --zero-stats
fi
fi
# /usr/local/caffe2 is where the cpp bits are installed to in cmake-only
# builds. In +python builds the cpp tests are copied to /usr/local/caffe2 so
# that the test code in .ci/test.sh is the same
INSTALL_PREFIX="/usr/local/caffe2"
mkdir -p "$gtest_reports_dir" || true
mkdir -p "$pytest_reports_dir" || true
mkdir -p "$INSTALL_PREFIX" || true

View File

@ -1,172 +0,0 @@
#!/bin/bash
# shellcheck source=./common.sh
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
if [[ ${BUILD_ENVIRONMENT} == *onnx* ]]; then
pip install click mock tabulate networkx==2.0
pip -q install --user "file:///var/lib/jenkins/workspace/third_party/onnx#egg=onnx"
fi
# Skip tests in environments where they are not built/applicable
if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
echo 'Skipping tests'
exit 0
fi
if [[ "${BUILD_ENVIRONMENT}" == *-rocm* ]]; then
# temporary to locate some kernel issues on the CI nodes
export HSAKMT_DEBUG_LEVEL=4
fi
# These additional packages are needed for circleci ROCm builds.
if [[ $BUILD_ENVIRONMENT == *rocm* ]]; then
# Need networkx 2.0 because bellmand_ford was moved in 2.1 . Scikit-image by
# defaults installs the most recent networkx version, so we install this lower
# version explicitly before scikit-image pulls it in as a dependency
pip install networkx==2.0
# click - onnx
pip install --progress-bar off click protobuf tabulate virtualenv mock typing-extensions
fi
# Find where cpp tests and Caffe2 itself are installed
if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then
# For cmake only build we install everything into /usr/local
cpp_test_dir="$INSTALL_PREFIX/cpp_test"
ld_library_path="$INSTALL_PREFIX/lib"
else
# For Python builds we install into python
# cd to /usr first so the python import doesn't get confused by any 'caffe2'
# directory in cwd
python_installation="$(dirname $(dirname $(cd /usr && $PYTHON -c 'import os; import caffe2; print(os.path.realpath(caffe2.__file__))')))"
caffe2_pypath="$python_installation/caffe2"
cpp_test_dir="$python_installation/torch/test"
ld_library_path="$python_installation/torch/lib"
fi
################################################################################
# C++ tests #
################################################################################
# Only run cpp tests in the first shard, don't run cpp tests a second time in the second shard
if [[ "${SHARD_NUMBER:-1}" == "1" ]]; then
echo "Running C++ tests.."
for test in $(find "$cpp_test_dir" -executable -type f); do
case "$test" in
# skip tests we know are hanging or bad
*/mkl_utils_test|*/aten/integer_divider_test)
continue
;;
*/scalar_tensor_test|*/basic|*/native_test)
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
continue
else
LD_LIBRARY_PATH="$ld_library_path" "$test"
fi
;;
*/*_benchmark)
LD_LIBRARY_PATH="$ld_library_path" "$test" --benchmark_color=false
;;
*)
# Currently, we use a mixture of gtest (caffe2) and Catch2 (ATen). While
# planning to migrate to gtest as the common PyTorch c++ test suite, we
# currently do NOT use the xml test reporter, because Catch doesn't
# support multiple reporters
# c.f. https://github.com/catchorg/Catch2/blob/master/docs/release-notes.md#223
# which means that enabling XML output means you lose useful stdout
# output for Jenkins. It's more important to have useful console
# output than it is to have XML output for Jenkins.
# Note: in the future, if we want to use xml test reporter once we switch
# to all gtest, one can simply do:
LD_LIBRARY_PATH="$ld_library_path" \
"$test" --gtest_output=xml:"$gtest_reports_dir/$(basename $test).xml"
;;
esac
done
fi
################################################################################
# Python tests #
################################################################################
if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then
exit 0
fi
# If pip is installed as root, we must use sudo.
# CircleCI docker images could install conda as jenkins user, or use the OS's python package.
PIP=$(which pip)
PIP_USER=$(stat --format '%U' $PIP)
CURRENT_USER=$(id -u -n)
if [[ "$PIP_USER" = root && "$CURRENT_USER" != root ]]; then
MAYBE_SUDO=sudo
fi
# Uninstall pre-installed hypothesis and coverage to use an older version as newer
# versions remove the timeout parameter from settings which ideep/conv_transpose_test.py uses
$MAYBE_SUDO pip -q uninstall -y hypothesis
$MAYBE_SUDO pip -q uninstall -y coverage
# "pip install hypothesis==3.44.6" from official server is unreliable on
# CircleCI, so we host a copy on S3 instead
$MAYBE_SUDO pip -q install attrs==18.1.0 -f https://s3.amazonaws.com/ossci-linux/wheels/attrs-18.1.0-py2.py3-none-any.whl
$MAYBE_SUDO pip -q install coverage==4.5.1 -f https://s3.amazonaws.com/ossci-linux/wheels/coverage-4.5.1-cp36-cp36m-macosx_10_12_x86_64.whl
$MAYBE_SUDO pip -q install hypothesis==3.44.6 -f https://s3.amazonaws.com/ossci-linux/wheels/hypothesis-3.44.6-py3-none-any.whl
# Collect additional tests to run (outside caffe2/python)
EXTRA_TESTS=()
# CUDA builds always include NCCL support
if [[ "$BUILD_ENVIRONMENT" == *-cuda* ]] || [[ "$BUILD_ENVIRONMENT" == *-rocm* ]]; then
EXTRA_TESTS+=("$caffe2_pypath/contrib/nccl")
fi
rocm_ignore_test=()
if [[ $BUILD_ENVIRONMENT == *-rocm* ]]; then
# Currently these tests are failing on ROCM platform:
# On ROCm, RCCL (distributed) development isn't complete.
# https://github.com/ROCmSoftwarePlatform/rccl
rocm_ignore_test+=("--ignore $caffe2_pypath/python/data_parallel_model_test.py")
# This test has been flaky in ROCm CI (but note the tests are
# cpu-only so should be unrelated to ROCm)
rocm_ignore_test+=("--ignore $caffe2_pypath/python/operator_test/blobs_queue_db_test.py")
# This test is skipped on Jenkins(compiled without MKL) and otherwise known flaky
rocm_ignore_test+=("--ignore $caffe2_pypath/python/ideep/convfusion_op_test.py")
# This test is skipped on Jenkins(compiled without MKL) and causing segfault on Circle
rocm_ignore_test+=("--ignore $caffe2_pypath/python/ideep/pool_op_test.py")
fi
echo "Running Python tests.."
# locale setting is required by click package
for loc in "en_US.utf8" "C.UTF-8"; do
if locale -a | grep "$loc" >/dev/null 2>&1; then
export LC_ALL="$loc"
export LANG="$loc"
break;
fi
done
# Some Caffe2 tests fail when run using AVX512 ISA, see https://github.com/pytorch/pytorch/issues/66111
export DNNL_MAX_CPU_ISA=AVX2
# Should still run even in the absence of SHARD_NUMBER
if [[ "${SHARD_NUMBER:-1}" == "1" ]]; then
# TODO(sdym@meta.com) remove this when the linked issue resolved.
# py is temporary until https://github.com/Teemu/pytest-sugar/issues/241 is fixed
pip install --user py==1.11.0
pip install --user pytest-sugar
# NB: Warnings are disabled because they make it harder to see what
# the actual erroring test is
"$PYTHON" \
-m pytest \
-x \
-v \
--disable-warnings \
--junit-xml="$pytest_reports_dir/result.xml" \
--ignore "$caffe2_pypath/python/test/executor_test.py" \
--ignore "$caffe2_pypath/python/operator_test/matmul_op_test.py" \
--ignore "$caffe2_pypath/python/operator_test/pack_ops_test.py" \
--ignore "$caffe2_pypath/python/mkl/mkl_sbn_speed_test.py" \
--ignore "$caffe2_pypath/python/trt/test_pt_onnx_trt.py" \
${rocm_ignore_test[@]} \
"$caffe2_pypath/python" \
"${EXTRA_TESTS[@]}"
fi

View File

@ -1,32 +0,0 @@
# Docker images for GitHub CI
This directory contains everything needed to build the Docker images
that are used in our CI.
The Dockerfiles located in subdirectories are parameterized to
conditionally run build stages depending on build arguments passed to
`docker build`. This lets us use only a few Dockerfiles for many
images. The different configurations are identified by a freeform
string that we call a _build environment_. This string is persisted in
each image as the `BUILD_ENVIRONMENT` environment variable.
See `build.sh` for valid build environments (it's the giant switch).
## Contents
* `build.sh` -- dispatch script to launch all builds
* `common` -- scripts used to execute individual Docker build stages
* `ubuntu` -- Dockerfile for Ubuntu image for CPU build and test jobs
* `ubuntu-cuda` -- Dockerfile for Ubuntu image with CUDA support for nvidia-docker
* `ubuntu-rocm` -- Dockerfile for Ubuntu image with ROCm support
* `ubuntu-xpu` -- Dockerfile for Ubuntu image with XPU support
## Usage
```bash
# Build a specific image
./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest
# Set flags (see build.sh) and build image
sudo bash -c 'PROTOBUF=1 ./build.sh pytorch-linux-bionic-py3.8-gcc9 -t myimage:latest
```

View File

@ -1,5 +0,0 @@
0.6b
manylinux_2_17
rocm6
04b5df8c8123f90cba3ede7e971e6fbc6040d506
3db6ecbc915893ff967abd6e1b43bd5f54949868873be60dc802086c3863e648

View File

@ -1,566 +0,0 @@
#!/bin/bash
set -ex
image="$1"
shift
if [ -z "${image}" ]; then
echo "Usage: $0 IMAGE"
exit 1
fi
function extract_version_from_image_name() {
eval export $2=$(echo "${image}" | perl -n -e"/$1(\d+(\.\d+)?(\.\d+)?)/ && print \$1")
if [ "x${!2}" = x ]; then
echo "variable '$2' not correctly parsed from image='$image'"
exit 1
fi
}
function extract_all_from_image_name() {
# parts $image into array, splitting on '-'
keep_IFS="$IFS"
IFS="-"
declare -a parts=($image)
IFS="$keep_IFS"
unset keep_IFS
for part in "${parts[@]}"; do
name=$(echo "${part}" | perl -n -e"/([a-zA-Z]+)\d+(\.\d+)?(\.\d+)?/ && print \$1")
vername="${name^^}_VERSION"
# "py" is the odd one out, needs this special case
if [ "x${name}" = xpy ]; then
vername=ANACONDA_PYTHON_VERSION
fi
# skip non-conforming fields such as "pytorch", "linux" or "bionic" without version string
if [ -n "${name}" ]; then
extract_version_from_image_name "${name}" "${vername}"
fi
done
}
# Use the same pre-built XLA test image from PyTorch/XLA
if [[ "$image" == *xla* ]]; then
echo "Using pre-built XLA test image..."
exit 0
fi
if [[ "$image" == *-focal* ]]; then
UBUNTU_VERSION=20.04
elif [[ "$image" == *-jammy* ]]; then
UBUNTU_VERSION=22.04
elif [[ "$image" == *ubuntu* ]]; then
extract_version_from_image_name ubuntu UBUNTU_VERSION
elif [[ "$image" == *centos* ]]; then
extract_version_from_image_name centos CENTOS_VERSION
fi
if [ -n "${UBUNTU_VERSION}" ]; then
OS="ubuntu"
elif [ -n "${CENTOS_VERSION}" ]; then
OS="centos"
else
echo "Unable to derive operating system base..."
exit 1
fi
DOCKERFILE="${OS}/Dockerfile"
# When using ubuntu - 22.04, start from Ubuntu docker image, instead of nvidia/cuda docker image.
if [[ "$image" == *cuda* && "$UBUNTU_VERSION" != "22.04" ]]; then
DOCKERFILE="${OS}-cuda/Dockerfile"
elif [[ "$image" == *rocm* ]]; then
DOCKERFILE="${OS}-rocm/Dockerfile"
elif [[ "$image" == *xpu* ]]; then
DOCKERFILE="${OS}-xpu/Dockerfile"
elif [[ "$image" == *cuda*linter* ]]; then
# Use a separate Dockerfile for linter to keep a small image size
DOCKERFILE="linter-cuda/Dockerfile"
elif [[ "$image" == *linter* ]]; then
# Use a separate Dockerfile for linter to keep a small image size
DOCKERFILE="linter/Dockerfile"
fi
# CMake 3.18 is needed to support CUDA17 language variant
CMAKE_VERSION=3.18.5
_UCX_COMMIT=7bb2722ff2187a0cad557ae4a6afa090569f83fb
_UCC_COMMIT=20eae37090a4ce1b32bcce6144ccad0b49943e0b
# It's annoying to rename jobs every time you want to rewrite a
# configuration, so we hardcode everything here rather than do it
# from scratch
case "$image" in
pytorch-linux-focal-cuda12.4-cudnn9-py3-gcc9)
CUDA_VERSION=12.4.0
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-cuda12.1-cudnn9-py3-gcc9)
CUDA_VERSION=12.1.1
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-cuda12.4-cudnn9-py3-gcc9-inductor-benchmarks)
CUDA_VERSION=12.4.0
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-focal-cuda12.1-cudnn9-py3-gcc9-inductor-benchmarks)
CUDA_VERSION=12.1.1
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-focal-cuda12.1-cudnn9-py3.12-gcc9-inductor-benchmarks)
CUDA_VERSION=12.1.1
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.12
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-focal-cuda12.4-cudnn9-py3.12-gcc9-inductor-benchmarks)
CUDA_VERSION=12.4.0
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.12
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-focal-cuda11.8-cudnn9-py3-gcc9)
CUDA_VERSION=11.8.0
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-cuda12.4-cudnn9-py3-gcc9)
CUDA_VERSION=12.4.0
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-cuda12.1-cudnn9-py3-gcc9)
CUDA_VERSION=12.1.1
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-cuda12.4-cudnn9-py3-gcc9)
CUDA_VERSION=12.4.0
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-py3-clang10-onnx)
ANACONDA_PYTHON_VERSION=3.8
CLANG_VERSION=10
PROTOBUF=yes
DB=yes
VISION=yes
CONDA_CMAKE=yes
ONNX=yes
;;
pytorch-linux-focal-py3-clang9-android-ndk-r21e)
ANACONDA_PYTHON_VERSION=3.8
CLANG_VERSION=9
LLVMDEV=yes
PROTOBUF=yes
ANDROID=yes
ANDROID_NDK_VERSION=r21e
GRADLE_VERSION=6.8.3
NINJA_VERSION=1.9.0
;;
pytorch-linux-focal-py3.8-clang10)
ANACONDA_PYTHON_VERSION=3.8
CLANG_VERSION=10
PROTOBUF=yes
DB=yes
VISION=yes
VULKAN_SDK_VERSION=1.2.162.1
SWIFTSHADER=yes
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-py3.11-clang10)
ANACONDA_PYTHON_VERSION=3.11
CLANG_VERSION=10
PROTOBUF=yes
DB=yes
VISION=yes
VULKAN_SDK_VERSION=1.2.162.1
SWIFTSHADER=yes
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-py3.8-gcc9)
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-rocm-n-1-py3)
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=6.0
NINJA_VERSION=1.9.0
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-rocm-n-py3)
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=6.1
NINJA_VERSION=1.9.0
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-jammy-xpu-2024.0-py3)
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=11
PROTOBUF=yes
DB=yes
VISION=yes
XPU_VERSION=0.5
NINJA_VERSION=1.9.0
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-jammy-py3.8-gcc11-inductor-benchmarks)
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=11
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
CONDA_CMAKE=yes
TRITON=yes
DOCS=yes
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-jammy-cuda11.8-cudnn9-py3.8-clang12)
ANACONDA_PYTHON_VERSION=3.8
CUDA_VERSION=11.8
CUDNN_VERSION=9
CLANG_VERSION=12
PROTOBUF=yes
DB=yes
VISION=yes
TRITON=yes
;;
pytorch-linux-jammy-py3-clang12-asan)
ANACONDA_PYTHON_VERSION=3.9
CLANG_VERSION=12
PROTOBUF=yes
DB=yes
VISION=yes
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-jammy-py3-clang15-asan)
ANACONDA_PYTHON_VERSION=3.10
CLANG_VERSION=15
CONDA_CMAKE=yes
VISION=yes
;;
pytorch-linux-jammy-py3.8-gcc11)
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=11
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
CONDA_CMAKE=yes
TRITON=yes
DOCS=yes
UNINSTALL_DILL=yes
;;
pytorch-linux-jammy-py3-clang12-executorch)
ANACONDA_PYTHON_VERSION=3.10
CLANG_VERSION=12
CONDA_CMAKE=yes
EXECUTORCH=yes
;;
pytorch-linux-jammy-py3.12-halide)
CUDA_VERSION=12.4
ANACONDA_PYTHON_VERSION=3.12
GCC_VERSION=11
CONDA_CMAKE=yes
HALIDE=yes
;;
pytorch-linux-focal-linter)
# TODO: Use 3.9 here because of this issue https://github.com/python/mypy/issues/13627.
# We will need to update mypy version eventually, but that's for another day. The task
# would be to upgrade mypy to 1.0.0 with Python 3.11
ANACONDA_PYTHON_VERSION=3.9
CONDA_CMAKE=yes
;;
pytorch-linux-jammy-cuda11.8-cudnn9-py3.9-linter)
ANACONDA_PYTHON_VERSION=3.9
CUDA_VERSION=11.8
CONDA_CMAKE=yes
;;
pytorch-linux-jammy-aarch64-py3.10-gcc11)
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=11
ACL=yes
PROTOBUF=yes
DB=yes
VISION=yes
CONDA_CMAKE=yes
# snadampal: skipping sccache due to the following issue
# https://github.com/pytorch/pytorch/issues/121559
SKIP_SCCACHE_INSTALL=yes
# snadampal: skipping llvm src build install because the current version
# from pytorch/llvm:9.0.1 is x86 specific
SKIP_LLVM_SRC_BUILD_INSTALL=yes
;;
*)
# Catch-all for builds that are not hardcoded.
PROTOBUF=yes
DB=yes
VISION=yes
echo "image '$image' did not match an existing build configuration"
if [[ "$image" == *py* ]]; then
extract_version_from_image_name py ANACONDA_PYTHON_VERSION
fi
if [[ "$image" == *cuda* ]]; then
extract_version_from_image_name cuda CUDA_VERSION
extract_version_from_image_name cudnn CUDNN_VERSION
fi
if [[ "$image" == *rocm* ]]; then
extract_version_from_image_name rocm ROCM_VERSION
NINJA_VERSION=1.9.0
TRITON=yes
# To ensure that any ROCm config will build using conda cmake
# and thus have LAPACK/MKL enabled
CONDA_CMAKE=yes
fi
if [[ "$image" == *centos7* ]]; then
NINJA_VERSION=1.10.2
fi
if [[ "$image" == *gcc* ]]; then
extract_version_from_image_name gcc GCC_VERSION
fi
if [[ "$image" == *clang* ]]; then
extract_version_from_image_name clang CLANG_VERSION
fi
if [[ "$image" == *devtoolset* ]]; then
extract_version_from_image_name devtoolset DEVTOOLSET_VERSION
fi
if [[ "$image" == *glibc* ]]; then
extract_version_from_image_name glibc GLIBC_VERSION
fi
if [[ "$image" == *cmake* ]]; then
extract_version_from_image_name cmake CMAKE_VERSION
fi
;;
esac
tmp_tag=$(basename "$(mktemp -u)" | tr '[:upper:]' '[:lower:]')
#when using cudnn version 8 install it separately from cuda
if [[ "$image" == *cuda* && ${OS} == "ubuntu" ]]; then
IMAGE_NAME="nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION}"
if [[ ${CUDNN_VERSION} == 9 ]]; then
IMAGE_NAME="nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}"
fi
fi
# Build image
docker build \
--no-cache \
--progress=plain \
--build-arg "BUILD_ENVIRONMENT=${image}" \
--build-arg "PROTOBUF=${PROTOBUF:-}" \
--build-arg "LLVMDEV=${LLVMDEV:-}" \
--build-arg "DB=${DB:-}" \
--build-arg "VISION=${VISION:-}" \
--build-arg "UBUNTU_VERSION=${UBUNTU_VERSION}" \
--build-arg "CENTOS_VERSION=${CENTOS_VERSION}" \
--build-arg "DEVTOOLSET_VERSION=${DEVTOOLSET_VERSION}" \
--build-arg "GLIBC_VERSION=${GLIBC_VERSION}" \
--build-arg "CLANG_VERSION=${CLANG_VERSION}" \
--build-arg "ANACONDA_PYTHON_VERSION=${ANACONDA_PYTHON_VERSION}" \
--build-arg "GCC_VERSION=${GCC_VERSION}" \
--build-arg "CUDA_VERSION=${CUDA_VERSION}" \
--build-arg "CUDNN_VERSION=${CUDNN_VERSION}" \
--build-arg "TENSORRT_VERSION=${TENSORRT_VERSION}" \
--build-arg "ANDROID=${ANDROID}" \
--build-arg "ANDROID_NDK=${ANDROID_NDK_VERSION}" \
--build-arg "GRADLE_VERSION=${GRADLE_VERSION}" \
--build-arg "VULKAN_SDK_VERSION=${VULKAN_SDK_VERSION}" \
--build-arg "SWIFTSHADER=${SWIFTSHADER}" \
--build-arg "CMAKE_VERSION=${CMAKE_VERSION:-}" \
--build-arg "NINJA_VERSION=${NINJA_VERSION:-}" \
--build-arg "KATEX=${KATEX:-}" \
--build-arg "ROCM_VERSION=${ROCM_VERSION:-}" \
--build-arg "PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH:-gfx906;gfx90a}" \
--build-arg "IMAGE_NAME=${IMAGE_NAME}" \
--build-arg "UCX_COMMIT=${UCX_COMMIT}" \
--build-arg "UCC_COMMIT=${UCC_COMMIT}" \
--build-arg "CONDA_CMAKE=${CONDA_CMAKE}" \
--build-arg "TRITON=${TRITON}" \
--build-arg "ONNX=${ONNX}" \
--build-arg "DOCS=${DOCS}" \
--build-arg "INDUCTOR_BENCHMARKS=${INDUCTOR_BENCHMARKS}" \
--build-arg "EXECUTORCH=${EXECUTORCH}" \
--build-arg "HALIDE=${HALIDE}" \
--build-arg "XPU_VERSION=${XPU_VERSION}" \
--build-arg "ACL=${ACL:-}" \
--build-arg "SKIP_SCCACHE_INSTALL=${SKIP_SCCACHE_INSTALL:-}" \
--build-arg "SKIP_LLVM_SRC_BUILD_INSTALL=${SKIP_LLVM_SRC_BUILD_INSTALL:-}" \
-f $(dirname ${DOCKERFILE})/Dockerfile \
-t "$tmp_tag" \
"$@" \
.
# NVIDIA dockers for RC releases use tag names like `11.0-cudnn9-devel-ubuntu18.04-rc`,
# for this case we will set UBUNTU_VERSION to `18.04-rc` so that the Dockerfile could
# find the correct image. As a result, here we have to replace the
# "$UBUNTU_VERSION" == "18.04-rc"
# with
# "$UBUNTU_VERSION" == "18.04"
UBUNTU_VERSION=$(echo ${UBUNTU_VERSION} | sed 's/-rc$//')
function drun() {
docker run --rm "$tmp_tag" $*
}
if [[ "$OS" == "ubuntu" ]]; then
if !(drun lsb_release -a 2>&1 | grep -qF Ubuntu); then
echo "OS=ubuntu, but:"
drun lsb_release -a
exit 1
fi
if !(drun lsb_release -a 2>&1 | grep -qF "$UBUNTU_VERSION"); then
echo "UBUNTU_VERSION=$UBUNTU_VERSION, but:"
drun lsb_release -a
exit 1
fi
fi
if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
if !(drun python --version 2>&1 | grep -qF "Python $ANACONDA_PYTHON_VERSION"); then
echo "ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION, but:"
drun python --version
exit 1
fi
fi
if [ -n "$GCC_VERSION" ]; then
if !(drun gcc --version 2>&1 | grep -q " $GCC_VERSION\\W"); then
echo "GCC_VERSION=$GCC_VERSION, but:"
drun gcc --version
exit 1
fi
fi
if [ -n "$CLANG_VERSION" ]; then
if !(drun clang --version 2>&1 | grep -qF "clang version $CLANG_VERSION"); then
echo "CLANG_VERSION=$CLANG_VERSION, but:"
drun clang --version
exit 1
fi
fi
if [ -n "$KATEX" ]; then
if !(drun katex --version); then
echo "KATEX=$KATEX, but:"
drun katex --version
exit 1
fi
fi

View File

@ -1,133 +0,0 @@
ARG CENTOS_VERSION
FROM centos:${CENTOS_VERSION}
ARG CENTOS_VERSION
# Set AMD gpu targets to build for
ARG PYTORCH_ROCM_ARCH
ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
# Install required packages to build Caffe2
# Install common dependencies (so that this step can be cached separately)
COPY ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
# Update CentOS git version
RUN yum -y remove git
RUN yum -y remove git-*
RUN yum -y install https://packages.endpoint.com/rhel/7/os/x86_64/endpoint-repo-1.9-1.x86_64.rpm || \
(yum -y install https://packages.endpointdev.com/rhel/7/os/x86_64/endpoint-repo-1.9-1.x86_64.rpm && \
sed -i "s/packages.endpoint/packages.endpointdev/" /etc/yum.repos.d/endpoint.repo)
RUN yum install -y git
# Install devtoolset
ARG DEVTOOLSET_VERSION
COPY ./common/install_devtoolset.sh install_devtoolset.sh
RUN bash ./install_devtoolset.sh && rm install_devtoolset.sh
ENV BASH_ENV "/etc/profile"
# (optional) Install non-default glibc version
ARG GLIBC_VERSION
COPY ./common/install_glibc.sh install_glibc.sh
RUN if [ -n "${GLIBC_VERSION}" ]; then bash ./install_glibc.sh; fi
RUN rm install_glibc.sh
# Install user
COPY ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install conda and other packages (e.g., numpy, pytest)
ARG ANACONDA_PYTHON_VERSION
ARG CONDA_CMAKE
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
COPY requirements-ci.txt /opt/conda/requirements-ci.txt
COPY ./common/install_conda.sh install_conda.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_conda.sh && rm install_conda.sh common_utils.sh /opt/conda/requirements-ci.txt
# (optional) Install protobuf for ONNX
ARG PROTOBUF
COPY ./common/install_protobuf.sh install_protobuf.sh
RUN if [ -n "${PROTOBUF}" ]; then bash ./install_protobuf.sh; fi
RUN rm install_protobuf.sh
ENV INSTALLED_PROTOBUF ${PROTOBUF}
# (optional) Install database packages like LMDB and LevelDB
ARG DB
COPY ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV
ARG VISION
COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh cache_vision_models.sh common_utils.sh
ENV INSTALLED_VISION ${VISION}
# Install rocm
ARG ROCM_VERSION
COPY ./common/install_rocm.sh install_rocm.sh
RUN bash ./install_rocm.sh
RUN rm install_rocm.sh
COPY ./common/install_rocm_magma.sh install_rocm_magma.sh
RUN bash ./install_rocm_magma.sh
RUN rm install_rocm_magma.sh
COPY ./common/install_amdsmi.sh install_amdsmi.sh
RUN bash ./install_amdsmi.sh
RUN rm install_amdsmi.sh
ENV PATH /opt/rocm/bin:$PATH
ENV PATH /opt/rocm/hcc/bin:$PATH
ENV PATH /opt/rocm/hip/bin:$PATH
ENV PATH /opt/rocm/opencl/bin:$PATH
ENV PATH /opt/rocm/llvm/bin:$PATH
ENV MAGMA_HOME /opt/rocm/magma
ENV LANG en_US.utf8
ENV LC_ALL en_US.utf8
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
COPY ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# (optional) Install non-default Ninja version
ARG NINJA_VERSION
COPY ./common/install_ninja.sh install_ninja.sh
RUN if [ -n "${NINJA_VERSION}" ]; then bash ./install_ninja.sh; fi
RUN rm install_ninja.sh
ARG TRITON
# Install triton, this needs to be done before sccache because the latter will
# try to reach out to S3, which docker build runners don't have access
ENV CMAKE_C_COMPILER cc
ENV CMAKE_CXX_COMPILER c++
COPY ./common/install_triton.sh install_triton.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/triton-rocm.txt triton-rocm.txt
COPY triton_version.txt triton_version.txt
RUN if [ -n "${TRITON}" ]; then bash ./install_triton.sh; fi
RUN rm install_triton.sh common_utils.sh triton-rocm.txt triton_version.txt
# Install AOTriton (Early fail)
COPY ./aotriton_version.txt aotriton_version.txt
COPY ./common/common_utils.sh common_utils.sh
COPY ./common/install_aotriton.sh install_aotriton.sh
RUN ["/bin/bash", "-c", "./install_aotriton.sh /opt/rocm && rm -rf install_aotriton.sh aotriton_version.txt common_utils.sh"]
ENV AOTRITON_INSTALLED_PREFIX /opt/rocm/aotriton
# Install ccache/sccache (do this last, so we get priority in PATH)
COPY ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
RUN bash ./install_cache.sh && rm install_cache.sh
# Include BUILD_ENVIRONMENT environment variable in image
ARG BUILD_ENVIRONMENT
ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
USER jenkins
CMD ["bash"]

View File

@ -1 +0,0 @@
172574a6be5910a4609e4ed1bef2b6b8475ddb3d

View File

@ -1 +0,0 @@
340136fec6d3ebc73e7a19eba1663e9b0ba8ab2d

View File

@ -1 +0,0 @@
243e186efbf7fb93328dd6b34927a4e8c8f24395

View File

@ -1 +0,0 @@
730b907b4d45a4713cbc425cbf224c46089fd514

View File

@ -1 +0,0 @@
01cbe5045a6898c9a925f01435c8277b2fe6afcc

View File

@ -1 +0,0 @@
aac14a3b93f11d781d1d5ebc5400b15ae8df5185

View File

@ -1 +0,0 @@
45fff310c891f5a92d55445adf8cc9d29df5841e

View File

@ -1,18 +0,0 @@
#!/bin/bash
set -ex
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
# Cache the test models at ~/.cache/torch/hub/
IMPORT_SCRIPT_FILENAME="/tmp/torchvision_import_script.py"
as_jenkins echo 'import torchvision; torchvision.models.mobilenet_v2(pretrained=True); torchvision.models.mobilenet_v3_large(pretrained=True);' > "${IMPORT_SCRIPT_FILENAME}"
pip_install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cpu
# Very weird quoting behavior here https://github.com/conda/conda/issues/10972,
# so echo the command to a file and run the file instead
conda_run python "${IMPORT_SCRIPT_FILENAME}"
# Cleaning up
conda_run pip uninstall -y torch torchvision
rm "${IMPORT_SCRIPT_FILENAME}" || true

View File

@ -1,36 +0,0 @@
#!/bin/bash
# Work around bug where devtoolset replaces sudo and breaks it.
if [ -n "$DEVTOOLSET_VERSION" ]; then
export SUDO=/bin/sudo
else
export SUDO=sudo
fi
as_jenkins() {
# NB: unsetting the environment variables works around a conda bug
# https://github.com/conda/conda/issues/6576
# NB: Pass on PATH and LD_LIBRARY_PATH to sudo invocation
# NB: This must be run from a directory that jenkins has access to,
# works around https://github.com/conda/conda-package-handling/pull/34
$SUDO -E -H -u jenkins env -u SUDO_UID -u SUDO_GID -u SUDO_COMMAND -u SUDO_USER env "PATH=$PATH" "LD_LIBRARY_PATH=$LD_LIBRARY_PATH" $*
}
conda_install() {
# Ensure that the install command don't upgrade/downgrade Python
# This should be called as
# conda_install pkg1 pkg2 ... [-c channel]
as_jenkins conda install -q -n py_$ANACONDA_PYTHON_VERSION -y python="$ANACONDA_PYTHON_VERSION" $*
}
conda_run() {
as_jenkins conda run -n py_$ANACONDA_PYTHON_VERSION --no-capture-output $*
}
pip_install() {
as_jenkins conda run -n py_$ANACONDA_PYTHON_VERSION pip install --progress-bar off $*
}
get_pinned_commit() {
cat "${1}".txt
}

View File

@ -1,16 +0,0 @@
set -euo pipefail
readonly version=v24.04
readonly src_host=https://review.mlplatform.org/ml
readonly src_repo=ComputeLibrary
# Clone ACL
[[ ! -d ${src_repo} ]] && git clone ${src_host}/${src_repo}.git
cd ${src_repo}
git checkout $version
# Build with scons
scons -j8 Werror=0 debug=0 neon=1 opencl=0 embed_kernels=0 \
os=linux arch=armv8a build=native multi_isa=1 \
fixed_format_kernels=1 openmp=1 cppthreads=0

View File

@ -1,5 +0,0 @@
#!/bin/bash
set -ex
cd /opt/rocm/share/amd_smi && pip install .

View File

@ -1,23 +0,0 @@
#!/bin/bash
set -ex
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
TARBALL='aotriton.tar.bz2'
# This read command alwasy returns with exit code 1
read -d "\n" VER MANYLINUX ROCMBASE PINNED_COMMIT SHA256 < aotriton_version.txt || true
ARCH=$(uname -m)
AOTRITON_INSTALL_PREFIX="$1"
AOTRITON_URL="https://github.com/ROCm/aotriton/releases/download/${VER}/aotriton-${VER}-${MANYLINUX}_${ARCH}-${ROCMBASE}.tar.bz2"
cd "${AOTRITON_INSTALL_PREFIX}"
# Must use -L to follow redirects
curl -L --retry 3 -o "${TARBALL}" "${AOTRITON_URL}"
ACTUAL_SHA256=$(sha256sum "${TARBALL}" | cut -d " " -f 1)
if [ "${SHA256}" != "${ACTUAL_SHA256}" ]; then
echo -n "Error: The SHA256 of downloaded tarball is ${ACTUAL_SHA256},"
echo " which does not match the expected value ${SHA256}."
exit
fi
tar xf "${TARBALL}" && rm -rf "${TARBALL}"

View File

@ -1,159 +0,0 @@
#!/bin/bash
set -ex
install_ubuntu() {
# NVIDIA dockers for RC releases use tag names like `11.0-cudnn9-devel-ubuntu18.04-rc`,
# for this case we will set UBUNTU_VERSION to `18.04-rc` so that the Dockerfile could
# find the correct image. As a result, here we have to check for
# "$UBUNTU_VERSION" == "18.04"*
# instead of
# "$UBUNTU_VERSION" == "18.04"
if [[ "$UBUNTU_VERSION" == "20.04"* ]]; then
cmake3="cmake=3.16*"
maybe_libiomp_dev=""
elif [[ "$UBUNTU_VERSION" == "22.04"* ]]; then
cmake3="cmake=3.22*"
maybe_libiomp_dev=""
else
cmake3="cmake=3.5*"
maybe_libiomp_dev="libiomp-dev"
fi
if [[ "$CLANG_VERSION" == 15 ]]; then
maybe_libomp_dev="libomp-15-dev"
elif [[ "$CLANG_VERSION" == 12 ]]; then
maybe_libomp_dev="libomp-12-dev"
elif [[ "$CLANG_VERSION" == 10 ]]; then
maybe_libomp_dev="libomp-10-dev"
else
maybe_libomp_dev=""
fi
# HACK: UCC testing relies on libnccl library from NVIDIA repo, and version 2.16 crashes
# See https://github.com/pytorch/pytorch/pull/105260#issuecomment-1673399729
if [[ "$UBUNTU_VERSION" == "20.04"* && "$CUDA_VERSION" == "11.8"* ]]; then
maybe_libnccl_dev="libnccl2=2.15.5-1+cuda11.8 libnccl-dev=2.15.5-1+cuda11.8 --allow-downgrades --allow-change-held-packages"
else
maybe_libnccl_dev=""
fi
# Install common dependencies
apt-get update
# TODO: Some of these may not be necessary
ccache_deps="asciidoc docbook-xml docbook-xsl xsltproc"
deploy_deps="libffi-dev libbz2-dev libreadline-dev libncurses5-dev libncursesw5-dev libgdbm-dev libsqlite3-dev uuid-dev tk-dev"
numpy_deps="gfortran"
apt-get install -y --no-install-recommends \
$ccache_deps \
$numpy_deps \
${deploy_deps} \
${cmake3} \
apt-transport-https \
autoconf \
automake \
build-essential \
ca-certificates \
curl \
git \
libatlas-base-dev \
libc6-dbg \
${maybe_libiomp_dev} \
libyaml-dev \
libz-dev \
libjemalloc2 \
libjpeg-dev \
libasound2-dev \
libsndfile-dev \
${maybe_libomp_dev} \
${maybe_libnccl_dev} \
software-properties-common \
wget \
sudo \
vim \
jq \
libtool \
vim \
unzip \
gpg-agent \
gdb
# Should resolve issues related to various apt package repository cert issues
# see: https://github.com/pytorch/pytorch/issues/65931
apt-get install -y libgnutls30
# Cleanup package manager
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
}
install_centos() {
# Need EPEL for many packages we depend on.
# See http://fedoraproject.org/wiki/EPEL
yum --enablerepo=extras install -y epel-release
ccache_deps="asciidoc docbook-dtds docbook-style-xsl libxslt"
numpy_deps="gcc-gfortran"
# Note: protobuf-c-{compiler,devel} on CentOS are too old to be used
# for Caffe2. That said, we still install them to make sure the build
# system opts to build/use protoc and libprotobuf from third-party.
yum install -y \
$ccache_deps \
$numpy_deps \
autoconf \
automake \
bzip2 \
cmake \
cmake3 \
curl \
gcc \
gcc-c++ \
gflags-devel \
git \
glibc-devel \
glibc-headers \
glog-devel \
libstdc++-devel \
libsndfile-devel \
make \
opencv-devel \
sudo \
wget \
vim \
unzip \
gdb
# Cleanup
yum clean all
rm -rf /var/cache/yum
rm -rf /var/lib/yum/yumdb
rm -rf /var/lib/yum/history
}
# Install base packages depending on the base OS
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
install_ubuntu
;;
centos)
install_centos
;;
*)
echo "Unable to determine OS..."
exit 1
;;
esac
# Install Valgrind separately since the apt-get version is too old.
mkdir valgrind_build && cd valgrind_build
VALGRIND_VERSION=3.20.0
wget https://ossci-linux.s3.amazonaws.com/valgrind-${VALGRIND_VERSION}.tar.bz2
tar -xjf valgrind-${VALGRIND_VERSION}.tar.bz2
cd valgrind-${VALGRIND_VERSION}
./configure --prefix=/usr/local
make -j$[$(nproc) - 2]
sudo make install
cd ../../
rm -rf valgrind_build
alias valgrind="/usr/local/bin/valgrind"

View File

@ -1,44 +0,0 @@
#!/bin/bash
set -ex
if [ -n "$CLANG_VERSION" ]; then
if [[ $CLANG_VERSION == 9 && $UBUNTU_VERSION == 18.04 ]]; then
sudo apt-get update
# gpg-agent is not available by default on 18.04
sudo apt-get install -y --no-install-recommends gpg-agent
wget --no-check-certificate -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
apt-add-repository "deb http://apt.llvm.org/bionic/ llvm-toolchain-bionic-${CLANG_VERSION} main"
elif [[ $UBUNTU_VERSION == 22.04 ]]; then
# work around ubuntu apt-get conflicts
sudo apt-get -y -f install
fi
sudo apt-get update
apt-get install -y --no-install-recommends clang-"$CLANG_VERSION"
apt-get install -y --no-install-recommends llvm-"$CLANG_VERSION"
# Install dev version of LLVM.
if [ -n "$LLVMDEV" ]; then
sudo apt-get install -y --no-install-recommends llvm-"$CLANG_VERSION"-dev
fi
# Use update-alternatives to make this version the default
update-alternatives --install /usr/bin/clang clang /usr/bin/clang-"$CLANG_VERSION" 50
update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-"$CLANG_VERSION" 50
# Override cc/c++ to clang as well
update-alternatives --install /usr/bin/cc cc /usr/bin/clang 50
update-alternatives --install /usr/bin/c++ c++ /usr/bin/clang++ 50
# clang's packaging is a little messed up (the runtime libs aren't
# added into the linker path), so give it a little help
clang_lib=("/usr/lib/llvm-$CLANG_VERSION/lib/clang/"*"/lib/linux")
echo "$clang_lib" > /etc/ld.so.conf.d/clang.conf
ldconfig
# Cleanup package manager
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
fi

View File

@ -1,31 +0,0 @@
#!/bin/bash
set -ex
[ -n "$CMAKE_VERSION" ]
# Remove system cmake install so it won't get used instead
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
apt-get remove cmake -y
;;
centos)
yum remove cmake -y
;;
*)
echo "Unable to determine OS..."
exit 1
;;
esac
# Turn 3.6.3 into v3.6
path=$(echo "${CMAKE_VERSION}" | sed -e 's/\([0-9].[0-9]\+\).*/v\1/')
file="cmake-${CMAKE_VERSION}-Linux-x86_64.tar.gz"
# Download and install specific CMake version in /usr/local
pushd /tmp
curl -Os --retry 3 "https://cmake.org/files/${path}/${file}"
tar -C /usr/local --strip-components 1 --no-same-owner -zxf cmake-*.tar.gz
rm -f cmake-*.tar.gz
popd

View File

@ -1,127 +0,0 @@
#!/bin/bash
set -ex
# Optionally install conda
if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
BASE_URL="https://repo.anaconda.com/miniconda"
MAJOR_PYTHON_VERSION=$(echo "$ANACONDA_PYTHON_VERSION" | cut -d . -f 1)
MINOR_PYTHON_VERSION=$(echo "$ANACONDA_PYTHON_VERSION" | cut -d . -f 2)
if [[ $(uname -m) == "aarch64" ]]; then
BASE_URL="https://github.com/conda-forge/miniforge/releases/latest/download"
case "$MAJOR_PYTHON_VERSION" in
3)
CONDA_FILE="Miniforge3-Linux-aarch64.sh"
;;
*)
echo "Unsupported ANACONDA_PYTHON_VERSION: $ANACONDA_PYTHON_VERSION"
exit 1
;;
esac
else
case "$MAJOR_PYTHON_VERSION" in
3)
CONDA_FILE="Miniconda3-latest-Linux-x86_64.sh"
;;
*)
echo "Unsupported ANACONDA_PYTHON_VERSION: $ANACONDA_PYTHON_VERSION"
exit 1
;;
esac
fi
mkdir -p /opt/conda
chown jenkins:jenkins /opt/conda
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
pushd /tmp
wget -q "${BASE_URL}/${CONDA_FILE}"
# NB: Manually invoke bash per https://github.com/conda/conda/issues/10431
as_jenkins bash "${CONDA_FILE}" -b -f -p "/opt/conda"
popd
# NB: Don't do this, rely on the rpath to get it right
#echo "/opt/conda/lib" > /etc/ld.so.conf.d/conda-python.conf
#ldconfig
sed -e 's|PATH="\(.*\)"|PATH="/opt/conda/bin:\1"|g' -i /etc/environment
export PATH="/opt/conda/bin:$PATH"
# Ensure we run conda in a directory that jenkins has write access to
pushd /opt/conda
# Prevent conda from updating to 4.14.0, which causes docker build failures
# See https://hud.pytorch.org/pytorch/pytorch/commit/754d7f05b6841e555cea5a4b2c505dd9e0baec1d
# Uncomment the below when resolved to track the latest conda update
# as_jenkins conda update -y -n base conda
if [[ $(uname -m) == "aarch64" ]]; then
export SYSROOT_DEP="sysroot_linux-aarch64=2.17"
else
export SYSROOT_DEP="sysroot_linux-64=2.17"
fi
# Install correct Python version
# Also ensure sysroot is using a modern GLIBC to match system compilers
as_jenkins conda create -n py_$ANACONDA_PYTHON_VERSION -y\
python="$ANACONDA_PYTHON_VERSION" \
${SYSROOT_DEP}
# libstdcxx from conda default channels are too old, we need GLIBCXX_3.4.30
# which is provided in libstdcxx 12 and up.
conda_install libstdcxx-ng=12.3.0 -c conda-forge
# Install PyTorch conda deps, as per https://github.com/pytorch/pytorch README
if [[ $(uname -m) == "aarch64" ]]; then
CONDA_COMMON_DEPS="astunparse pyyaml setuptools openblas==0.3.25=*openmp* ninja==1.11.1 scons==4.5.2"
if [ "$ANACONDA_PYTHON_VERSION" = "3.8" ]; then
conda_install numpy=1.24.4 ${CONDA_COMMON_DEPS}
else
conda_install numpy=1.26.2 ${CONDA_COMMON_DEPS}
fi
else
CONDA_COMMON_DEPS="astunparse pyyaml mkl=2021.4.0 mkl-include=2021.4.0 setuptools"
if [ "$ANACONDA_PYTHON_VERSION" = "3.11" ] || [ "$ANACONDA_PYTHON_VERSION" = "3.12" ]; then
conda_install numpy=1.26.0 ${CONDA_COMMON_DEPS}
else
conda_install numpy=1.21.2 ${CONDA_COMMON_DEPS}
fi
fi
# Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
# and libpython-static for torch deploy
conda_install llvmdev=8.0.0 "libpython-static=${ANACONDA_PYTHON_VERSION}"
# Use conda cmake in some cases. Conda cmake will be newer than our supported
# min version (3.5 for xenial and 3.10 for bionic), so we only do it in those
# following builds that we know should use conda. Specifically, Ubuntu bionic
# and focal cannot find conda mkl with stock cmake, so we need a cmake from conda
if [ -n "${CONDA_CMAKE}" ]; then
conda_install cmake
fi
# Magma package names are concatenation of CUDA major and minor ignoring revision
# I.e. magma-cuda102 package corresponds to CUDA_VERSION=10.2 and CUDA_VERSION=10.2.89
if [ -n "$CUDA_VERSION" ]; then
conda_install magma-cuda$(TMP=${CUDA_VERSION/./};echo ${TMP%.*[0-9]}) -c pytorch
fi
# Install some other packages, including those needed for Python test reporting
pip_install -r /opt/conda/requirements-ci.txt
pip_install -U scikit-learn
if [ -n "$DOCS" ]; then
apt-get update
apt-get -y install expect-dev
# We are currently building docs with python 3.8 (min support version)
pip_install -r /opt/conda/requirements-docs.txt
fi
popd
fi

View File

@ -1,22 +0,0 @@
#!/bin/bash
if [[ -n "${CUDNN_VERSION}" ]]; then
# cuDNN license: https://developer.nvidia.com/cudnn/license_agreement
mkdir tmp_cudnn
pushd tmp_cudnn
if [[ ${CUDA_VERSION:0:2} == "12" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-9.1.0.70_cuda12-archive"
elif [[ ${CUDA_VERSION:0:2} == "11" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-9.1.0.70_cuda11-archive"
else
print "Unsupported CUDA version ${CUDA_VERSION}"
exit 1
fi
curl --retry 3 -OLs https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/${CUDNN_NAME}.tar.xz
tar xf ${CUDNN_NAME}.tar.xz
cp -a ${CUDNN_NAME}/include/* /usr/local/cuda/include/
cp -a ${CUDNN_NAME}/lib/* /usr/local/cuda/lib64/
popd
rm -rf tmp_cudnn
ldconfig
fi

View File

@ -1,26 +0,0 @@
#!/bin/bash
set -ex
# cuSPARSELt license: https://docs.nvidia.com/cuda/cusparselt/license.html
mkdir tmp_cusparselt && cd tmp_cusparselt
if [[ ${CUDA_VERSION:0:4} =~ ^12\.[1-4]$ ]]; then
arch_path='sbsa'
export TARGETARCH=${TARGETARCH:-$(uname -m)}
if [ ${TARGETARCH} = 'amd64' ] || [ "${TARGETARCH}" = 'x86_64' ]; then
arch_path='x86_64'
fi
CUSPARSELT_NAME="libcusparse_lt-linux-${arch_path}-0.5.2.1-archive"
curl --retry 3 -OLs https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-${arch_path}/${CUSPARSELT_NAME}.tar.xz
elif [[ ${CUDA_VERSION:0:4} == "11.8" ]]; then
CUSPARSELT_NAME="libcusparse_lt-linux-x86_64-0.4.0.7-archive"
curl --retry 3 -OLs https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-x86_64/${CUSPARSELT_NAME}.tar.xz
fi
tar xf ${CUSPARSELT_NAME}.tar.xz
cp -a ${CUSPARSELT_NAME}/include/* /usr/local/cuda/include/
cp -a ${CUSPARSELT_NAME}/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf tmp_cusparselt
ldconfig

View File

@ -1,38 +0,0 @@
#!/bin/bash
set -ex
install_ubuntu() {
apt-get update
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
}
install_centos() {
# Need EPEL for many packages we depend on.
# See http://fedoraproject.org/wiki/EPEL
yum --enablerepo=extras install -y epel-release
# Cleanup
yum clean all
rm -rf /var/cache/yum
rm -rf /var/lib/yum/yumdb
rm -rf /var/lib/yum/history
}
# Install base packages depending on the base OS
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
install_ubuntu
;;
centos)
install_centos
;;
*)
echo "Unable to determine OS..."
exit 1
;;
esac

View File

@ -1,25 +0,0 @@
#!/bin/bash
set -ex
if [ -n "$KATEX" ]; then
apt-get update
# Ignore error if gpg-agent doesn't exist (for Ubuntu 16.04)
apt-get install -y gpg-agent || :
curl --retry 3 -sL https://deb.nodesource.com/setup_16.x | sudo -E bash -
sudo apt-get install -y nodejs
curl --retry 3 -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add -
echo "deb https://dl.yarnpkg.com/debian/ stable main" | sudo tee /etc/apt/sources.list.d/yarn.list
apt-get update
apt-get install -y --no-install-recommends yarn
yarn global add katex --prefix /usr/local
sudo apt-get -y install doxygen
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
fi

View File

@ -1,65 +0,0 @@
#!/bin/bash
set -ex
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
clone_executorch() {
EXECUTORCH_PINNED_COMMIT=$(get_pinned_commit executorch)
# Clone the Executorch
git clone https://github.com/pytorch/executorch.git
# and fetch the target commit
pushd executorch
git checkout "${EXECUTORCH_PINNED_COMMIT}"
git submodule update --init
popd
chown -R jenkins executorch
}
install_buck2() {
pushd executorch/.ci/docker
BUCK2_VERSION=$(cat ci_commit_pins/buck2.txt)
source common/install_buck.sh
popd
}
install_conda_dependencies() {
pushd executorch/.ci/docker
# Install conda dependencies like flatbuffer
conda_install --file conda-env-ci.txt
popd
}
install_pip_dependencies() {
pushd executorch/.ci/docker
# Install PyTorch CPU build beforehand to avoid installing the much bigger CUDA
# binaries later, ExecuTorch only needs CPU
pip_install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# Install all Python dependencies
pip_install -r requirements-ci.txt
popd
}
setup_executorch() {
pushd executorch
# Setup swiftshader and Vulkan SDK which are required to build the Vulkan delegate
as_jenkins bash .ci/scripts/setup-vulkan-linux-deps.sh
export PYTHON_EXECUTABLE=python
export EXECUTORCH_BUILD_PYBIND=ON
export CMAKE_ARGS="-DEXECUTORCH_BUILD_XNNPACK=ON -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON"
as_jenkins .ci/scripts/setup-linux.sh cmake
popd
}
clone_executorch
install_buck2
install_conda_dependencies
install_pip_dependencies
setup_executorch

View File

@ -1,46 +0,0 @@
#!/bin/bash
set -ex
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
COMMIT=$(get_pinned_commit halide)
test -n "$COMMIT"
# activate conda to populate CONDA_PREFIX
test -n "$ANACONDA_PYTHON_VERSION"
eval "$(conda shell.bash hook)"
conda activate py_$ANACONDA_PYTHON_VERSION
if [ -n "${UBUNTU_VERSION}" ];then
apt update
apt-get install -y lld liblld-15-dev libpng-dev libjpeg-dev libgl-dev \
libopenblas-dev libeigen3-dev libatlas-base-dev libzstd-dev
fi
conda_install numpy scipy imageio cmake ninja
git clone --depth 1 --branch release/16.x --recursive https://github.com/llvm/llvm-project.git
cmake -DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_PROJECTS="clang" \
-DLLVM_TARGETS_TO_BUILD="X86;NVPTX" \
-DLLVM_ENABLE_TERMINFO=OFF -DLLVM_ENABLE_ASSERTIONS=ON \
-DLLVM_ENABLE_EH=ON -DLLVM_ENABLE_RTTI=ON -DLLVM_BUILD_32_BITS=OFF \
-S llvm-project/llvm -B llvm-build -G Ninja
cmake --build llvm-build
cmake --install llvm-build --prefix llvm-install
export LLVM_ROOT=`pwd`/llvm-install
export LLVM_CONFIG=$LLVM_ROOT/bin/llvm-config
git clone https://github.com/halide/Halide.git
pushd Halide
git checkout ${COMMIT} && git submodule update --init --recursive
pip_install -r requirements.txt
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -S . -B build
cmake --build build
test -e ${CONDA_PREFIX}/lib/python3 || ln -s python${ANACONDA_PYTHON_VERSION} ${CONDA_PREFIX}/lib/python3
cmake --install build --prefix ${CONDA_PREFIX}
chown -R jenkins ${CONDA_PREFIX}
popd
rm -rf Halide llvm-build llvm-project llvm-install
python -c "import halide" # check for errors

View File

@ -1,26 +0,0 @@
#!/bin/bash
set -ex
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
function install_huggingface() {
local version
commit=$(get_pinned_commit huggingface)
pip_install pandas==2.0.3
pip_install "git+https://github.com/huggingface/transformers@${commit}"
}
function install_timm() {
local commit
commit=$(get_pinned_commit timm)
pip_install pandas==2.0.3
pip_install "git+https://github.com/huggingface/pytorch-image-models@${commit}"
# Clean up
conda_run pip uninstall -y cmake torch torchvision triton
}
# Pango is needed for weasyprint which is needed for doctr
conda_install pango
install_huggingface
install_timm

View File

@ -1,29 +0,0 @@
#!/bin/bash
set -ex
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
if [ -n "${UBUNTU_VERSION}" ]; then
apt update
apt-get install -y clang doxygen git graphviz nodejs npm libtinfo5
fi
# Do shallow clone of PyTorch so that we can init lintrunner in Docker build context
git clone https://github.com/pytorch/pytorch.git --depth 1
chown -R jenkins pytorch
pushd pytorch
# Install all linter dependencies
pip_install -r requirements.txt
conda_run lintrunner init
# Cache .lintbin directory as part of the Docker image
cp -r .lintbin /tmp
popd
# Node dependencies required by toc linter job
npm install -g markdown-toc
# Cleaning up
rm -rf pytorch

View File

@ -1,53 +0,0 @@
#!/bin/bash
set -ex
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
retry () {
"$@" || (sleep 10 && "$@") || (sleep 20 && "$@") || (sleep 40 && "$@")
}
# A bunch of custom pip dependencies for ONNX
pip_install \
beartype==0.15.0 \
filelock==3.9.0 \
flatbuffers==2.0 \
mock==5.0.1 \
ninja==1.10.2 \
networkx==2.0 \
numpy==1.24.2
# ONNXRuntime should be installed before installing
# onnx-weekly. Otherwise, onnx-weekly could be
# overwritten by onnx.
pip_install \
parameterized==0.8.1 \
pytest-cov==4.0.0 \
pytest-subtests==0.10.0 \
tabulate==0.9.0 \
transformers==4.36.2
pip_install coloredlogs packaging
pip_install onnxruntime==1.18
pip_install onnx==1.16.0
# pip_install "onnxscript@git+https://github.com/microsoft/onnxscript@3e869ef8ccf19b5ebd21c10d3e9c267c9a9fa729" --no-deps
pip_install onnxscript==0.1.0.dev20240613 --no-deps
# required by onnxscript
pip_install ml_dtypes
# Cache the transformers model to be used later by ONNX tests. We need to run the transformers
# package to download the model. By default, the model is cached at ~/.cache/huggingface/hub/
IMPORT_SCRIPT_FILENAME="/tmp/onnx_import_script.py"
as_jenkins echo 'import transformers; transformers.AutoModel.from_pretrained("sshleifer/tiny-gpt2"); transformers.AutoTokenizer.from_pretrained("sshleifer/tiny-gpt2"); transformers.AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-large-v3");' > "${IMPORT_SCRIPT_FILENAME}"
# Need a PyTorch version for transformers to work
pip_install --pre torch --index-url https://download.pytorch.org/whl/nightly/cpu
# Very weird quoting behavior here https://github.com/conda/conda/issues/10972,
# so echo the command to a file and run the file instead
conda_run python "${IMPORT_SCRIPT_FILENAME}"
# Cleaning up
conda_run pip uninstall -y torch
rm "${IMPORT_SCRIPT_FILENAME}" || true

View File

@ -1,17 +0,0 @@
#!/bin/bash
set -ex
OPENSSL=openssl-1.1.1k
wget -q -O "${OPENSSL}.tar.gz" "https://ossci-linux.s3.amazonaws.com/${OPENSSL}.tar.gz"
tar xf "${OPENSSL}.tar.gz"
cd "${OPENSSL}"
./config --prefix=/opt/openssl -d '-Wl,--enable-new-dtags,-rpath,$(LIBRPATH)'
# NOTE: openssl install errors out when built with the -j option
NPROC=$[$(nproc) - 2]
make -j${NPROC}; make install_sw
# Link the ssl libraries to the /usr/lib folder.
sudo ln -s /opt/openssl/lib/lib* /usr/lib
cd ..
rm -rf "${OPENSSL}"

View File

@ -1,19 +0,0 @@
#!/bin/bash
set -ex
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
# On the nvidia/cuda:9-cudnn7-devel-centos7 image we need this symlink or
# else it will fail with
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/protocolbuffers/protobuf/releases/download/v3.17.3/protobuf-all-3.17.3.tar.gz" --retry 3
tar -xvz --no-same-owner -C "$pb_dir" --strip-components 1 -f protobuf-all-3.17.3.tar.gz
NPROC=$[$(nproc) - 2]
pushd "$pb_dir" && ./configure && make -j${NPROC} && make -j${NPROC} check && sudo make -j${NRPOC} install && sudo ldconfig
popd
rm -rf $pb_dir

View File

@ -1,148 +0,0 @@
#!/bin/bash
set -ex
ver() {
printf "%3d%03d%03d%03d" $(echo "$1" | tr '.' ' ');
}
install_ubuntu() {
apt-get update
if [[ $UBUNTU_VERSION == 18.04 ]]; then
# gpg-agent is not available by default on 18.04
apt-get install -y --no-install-recommends gpg-agent
fi
if [[ $UBUNTU_VERSION == 20.04 ]]; then
# gpg-agent is not available by default on 20.04
apt-get install -y --no-install-recommends gpg-agent
fi
apt-get install -y kmod
apt-get install -y wget
# Need the libc++1 and libc++abi1 libraries to allow torch._C to load at runtime
apt-get install -y libc++1
apt-get install -y libc++abi1
# Add amdgpu repository
UBUNTU_VERSION_NAME=`cat /etc/os-release | grep UBUNTU_CODENAME | awk -F= '{print $2}'`
echo "deb [arch=amd64] https://repo.radeon.com/amdgpu/${ROCM_VERSION}/ubuntu ${UBUNTU_VERSION_NAME} main" > /etc/apt/sources.list.d/amdgpu.list
# Add rocm repository
wget -qO - http://repo.radeon.com/rocm/rocm.gpg.key | apt-key add -
local rocm_baseurl="http://repo.radeon.com/rocm/apt/${ROCM_VERSION}"
echo "deb [arch=amd64] ${rocm_baseurl} ${UBUNTU_VERSION_NAME} main" > /etc/apt/sources.list.d/rocm.list
apt-get update --allow-insecure-repositories
DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated \
rocm-dev \
rocm-utils \
rocm-libs \
rccl \
rocprofiler-dev \
roctracer-dev \
amd-smi-lib
if [[ $(ver $ROCM_VERSION) -ge $(ver 6.1) ]]; then
DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated rocm-llvm-dev
fi
# precompiled miopen kernels added in ROCm 3.5, renamed in ROCm 5.5
# search for all unversioned packages
# if search fails it will abort this script; use true to avoid case where search fails
MIOPENHIPGFX=$(apt-cache search --names-only miopen-hip-gfx | awk '{print $1}' | grep -F -v . || true)
if [[ "x${MIOPENHIPGFX}" = x ]]; then
echo "miopen-hip-gfx package not available" && exit 1
else
DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated ${MIOPENHIPGFX}
fi
# ROCm 6.0 had a regression where journal_mode was enabled on the kdb files resulting in permission errors at runtime
for kdb in /opt/rocm/share/miopen/db/*.kdb
do
sqlite3 $kdb "PRAGMA journal_mode=off; PRAGMA VACUUM;"
done
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
}
install_centos() {
yum update -y
yum install -y kmod
yum install -y wget
yum install -y openblas-devel
yum install -y epel-release
yum install -y dkms kernel-headers-`uname -r` kernel-devel-`uname -r`
# Add amdgpu repository
local amdgpu_baseurl
if [[ $OS_VERSION == 9 ]]; then
amdgpu_baseurl="https://repo.radeon.com/amdgpu/${ROCM_VERSION}/rhel/9.0/main/x86_64"
else
amdgpu_baseurl="https://repo.radeon.com/amdgpu/${ROCM_VERSION}/rhel/7.9/main/x86_64"
fi
echo "[AMDGPU]" > /etc/yum.repos.d/amdgpu.repo
echo "name=AMDGPU" >> /etc/yum.repos.d/amdgpu.repo
echo "baseurl=${amdgpu_baseurl}" >> /etc/yum.repos.d/amdgpu.repo
echo "enabled=1" >> /etc/yum.repos.d/amdgpu.repo
echo "gpgcheck=1" >> /etc/yum.repos.d/amdgpu.repo
echo "gpgkey=http://repo.radeon.com/rocm/rocm.gpg.key" >> /etc/yum.repos.d/amdgpu.repo
local rocm_baseurl="http://repo.radeon.com/rocm/yum/${ROCM_VERSION}"
echo "[ROCm]" > /etc/yum.repos.d/rocm.repo
echo "name=ROCm" >> /etc/yum.repos.d/rocm.repo
echo "baseurl=${rocm_baseurl}" >> /etc/yum.repos.d/rocm.repo
echo "enabled=1" >> /etc/yum.repos.d/rocm.repo
echo "gpgcheck=1" >> /etc/yum.repos.d/rocm.repo
echo "gpgkey=http://repo.radeon.com/rocm/rocm.gpg.key" >> /etc/yum.repos.d/rocm.repo
yum update -y
yum install -y \
rocm-dev \
rocm-utils \
rocm-libs \
rccl \
rocprofiler-dev \
roctracer-dev \
amd-smi-lib
# precompiled miopen kernels; search for all unversioned packages
# if search fails it will abort this script; use true to avoid case where search fails
MIOPENHIPGFX=$(yum -q search miopen-hip-gfx | grep miopen-hip-gfx | awk '{print $1}'| grep -F kdb. || true)
if [[ "x${MIOPENHIPGFX}" = x ]]; then
echo "miopen-hip-gfx package not available" && exit 1
else
yum install -y ${MIOPENHIPGFX}
fi
# ROCm 6.0 had a regression where journal_mode was enabled on the kdb files resulting in permission errors at runtime
for kdb in /opt/rocm/share/miopen/db/*.kdb
do
sqlite3 $kdb "PRAGMA journal_mode=off; PRAGMA VACUUM;"
done
# Cleanup
yum clean all
rm -rf /var/cache/yum
rm -rf /var/lib/yum/yumdb
rm -rf /var/lib/yum/history
}
# Install Python packages depending on the base OS
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
install_ubuntu
;;
centos)
install_centos
;;
*)
echo "Unable to determine OS..."
exit 1
;;
esac

View File

@ -1,31 +0,0 @@
#!/bin/bash
set -ex
# "install" hipMAGMA into /opt/rocm/magma by copying after build
git clone https://bitbucket.org/icl/magma.git
pushd magma
# Version 2.7.2 + ROCm related updates
git checkout a1625ff4d9bc362906bd01f805dbbe12612953f6
cp make.inc-examples/make.inc.hip-gcc-mkl make.inc
echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc
echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib' >> make.inc
echo 'DEVCCFLAGS += --gpu-max-threads-per-block=256' >> make.inc
export PATH="${PATH}:/opt/rocm/bin"
if [[ -n "$PYTORCH_ROCM_ARCH" ]]; then
amdgpu_targets=`echo $PYTORCH_ROCM_ARCH | sed 's/;/ /g'`
else
amdgpu_targets=`rocm_agent_enumerator | grep -v gfx000 | sort -u | xargs`
fi
for arch in $amdgpu_targets; do
echo "DEVCCFLAGS += --offload-arch=$arch" >> make.inc
done
# hipcc with openmp flag may cause isnan() on __device__ not to be found; depending on context, compiler may attempt to match with host definition
sed -i 's/^FOPENMP/#FOPENMP/g' make.inc
make -f make.gen.hipMAGMA -j $(nproc)
LANG=C.UTF-8 make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda/envs/py_$ANACONDA_PYTHON_VERSION
make testing/testing_dgemm -j $(nproc) MKLROOT=/opt/conda/envs/py_$ANACONDA_PYTHON_VERSION
popd
mv magma /opt/rocm

View File

@ -1,72 +0,0 @@
#!/bin/bash
set -ex
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
get_conda_version() {
as_jenkins conda list -n py_$ANACONDA_PYTHON_VERSION | grep -w $* | head -n 1 | awk '{print $2}'
}
conda_reinstall() {
as_jenkins conda install -q -n py_$ANACONDA_PYTHON_VERSION -y --force-reinstall $*
}
if [ -n "${ROCM_VERSION}" ]; then
TRITON_REPO="https://github.com/openai/triton"
TRITON_TEXT_FILE="triton-rocm"
elif [ -n "${XPU_VERSION}" ]; then
TRITON_REPO="https://github.com/intel/intel-xpu-backend-for-triton"
TRITON_TEXT_FILE="triton-xpu"
else
TRITON_REPO="https://github.com/openai/triton"
TRITON_TEXT_FILE="triton"
fi
# The logic here is copied from .ci/pytorch/common_utils.sh
TRITON_PINNED_COMMIT=$(get_pinned_commit ${TRITON_TEXT_FILE})
if [ -n "${UBUNTU_VERSION}" ];then
apt update
apt-get install -y gpg-agent
fi
if [ -n "${CONDA_CMAKE}" ]; then
# Keep the current cmake and numpy version here, so we can reinstall them later
CMAKE_VERSION=$(get_conda_version cmake)
NUMPY_VERSION=$(get_conda_version numpy)
fi
if [ -z "${MAX_JOBS}" ]; then
export MAX_JOBS=$(nproc)
fi
if [ -n "${UBUNTU_VERSION}" ] && [ -n "${GCC_VERSION}" ] && [[ "${GCC_VERSION}" == "7" ]]; then
# Triton needs at least gcc-9 to build
apt-get install -y g++-9
CXX=g++-9 pip_install "git+${TRITON_REPO}@${TRITON_PINNED_COMMIT}#subdirectory=python"
elif [ -n "${UBUNTU_VERSION}" ] && [ -n "${CLANG_VERSION}" ]; then
# Triton needs <filesystem> which surprisingly is not available with clang-9 toolchain
add-apt-repository -y ppa:ubuntu-toolchain-r/test
apt-get install -y g++-9
CXX=g++-9 pip_install "git+${TRITON_REPO}@${TRITON_PINNED_COMMIT}#subdirectory=python"
else
pip_install "git+${TRITON_REPO}@${TRITON_PINNED_COMMIT}#subdirectory=python"
fi
if [ -n "${CONDA_CMAKE}" ]; then
# TODO: This is to make sure that the same cmake and numpy version from install conda
# script is used. Without this step, the newer cmake version (3.25.2) downloaded by
# triton build step via pip will fail to detect conda MKL. Once that issue is fixed,
# this can be removed.
#
# The correct numpy version also needs to be set here because conda claims that it
# causes inconsistent environment. Without this, conda will attempt to install the
# latest numpy version, which fails ASAN tests with the following import error: Numba
# needs NumPy 1.20 or less.
conda_reinstall cmake="${CMAKE_VERSION}"
# Note that we install numpy with pip as conda might not have the version we want
pip_install --force-reinstall numpy=="${NUMPY_VERSION}"
fi

View File

@ -1,53 +0,0 @@
#!/bin/bash
set -ex
if [[ -d "/usr/local/cuda/" ]]; then
with_cuda=/usr/local/cuda/
else
with_cuda=no
fi
function install_ucx() {
set -ex
git clone --recursive https://github.com/openucx/ucx.git
pushd ucx
git checkout ${UCX_COMMIT}
git submodule update --init --recursive
./autogen.sh
./configure --prefix=$UCX_HOME \
--enable-mt \
--with-cuda=$with_cuda \
--enable-profiling \
--enable-stats
time make -j
sudo make install
popd
rm -rf ucx
}
function install_ucc() {
set -ex
git clone --recursive https://github.com/openucx/ucc.git
pushd ucc
git checkout ${UCC_COMMIT}
git submodule update --init --recursive
./autogen.sh
# We only run distributed tests on Tesla M60 and A10G
NVCC_GENCODE="-gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_86,code=compute_86"
./configure --prefix=$UCC_HOME \
--with-ucx=$UCX_HOME \
--with-cuda=$with_cuda \
--with-nvcc-gencode="${NVCC_GENCODE}"
time make -j
sudo make install
popd
rm -rf ucc
}
install_ucx
install_ucc

View File

@ -1,33 +0,0 @@
#!/bin/bash
set -ex
# Mirror jenkins user in container
# jenkins user as ec2-user should have the same user-id
echo "jenkins:x:1000:1000::/var/lib/jenkins:" >> /etc/passwd
echo "jenkins:x:1000:" >> /etc/group
# Needed on focal or newer
echo "jenkins:*:19110:0:99999:7:::" >>/etc/shadow
# Create $HOME
mkdir -p /var/lib/jenkins
chown jenkins:jenkins /var/lib/jenkins
mkdir -p /var/lib/jenkins/.ccache
chown jenkins:jenkins /var/lib/jenkins/.ccache
# Allow writing to /usr/local (for make install)
chown jenkins:jenkins /usr/local
# Allow sudo
# TODO: Maybe we shouldn't
echo 'jenkins ALL=(ALL) NOPASSWD:ALL' > /etc/sudoers.d/jenkins
# Work around bug where devtoolset replaces sudo and breaks it.
if [ -n "$DEVTOOLSET_VERSION" ]; then
SUDO=/bin/sudo
else
SUDO=sudo
fi
# Test that sudo works
$SUDO -u jenkins $SUDO -v

View File

@ -1,114 +0,0 @@
#!/bin/bash
set -xe
# Intel® software for general purpose GPU capabilities.
# Refer to https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpus.html
# Users should update to the latest version as it becomes available
function install_ubuntu() {
apt-get update -y
apt-get install -y gpg-agent wget
# Set up the repository. To do this, download the key to the system keyring
wget -qO - https://repositories.intel.com/gpu/intel-graphics.key \
| gpg --dearmor --output /usr/share/keyrings/intel-graphics.gpg
wget -qO - https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \
| gpg --dearmor --output /usr/share/keyrings/intel-for-pytorch-gpu-dev-keyring.gpg
# Add the signed entry to APT sources and configure the APT client to use the Intel repository
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/intel-graphics.gpg] \
https://repositories.intel.com/gpu/ubuntu jammy/lts/2350 unified" \
| tee /etc/apt/sources.list.d/intel-gpu-jammy.list
echo "deb [signed-by=/usr/share/keyrings/intel-for-pytorch-gpu-dev-keyring.gpg] \
https://apt.repos.intel.com/intel-for-pytorch-gpu-dev all main" \
| tee /etc/apt/sources.list.d/intel-for-pytorch-gpu-dev.list
# Update the packages list and repository index
apt-get update
# The xpu-smi packages
apt-get install -y flex bison xpu-smi
# Compute and Media Runtimes
apt-get install -y \
intel-opencl-icd intel-level-zero-gpu level-zero \
intel-media-va-driver-non-free libmfx1 libmfxgen1 libvpl2 \
libegl-mesa0 libegl1-mesa libegl1-mesa-dev libgbm1 libgl1-mesa-dev libgl1-mesa-dri \
libglapi-mesa libgles2-mesa-dev libglx-mesa0 libigdgmm12 libxatracker2 mesa-va-drivers \
mesa-vdpau-drivers mesa-vulkan-drivers va-driver-all vainfo hwinfo clinfo
# Development Packages
apt-get install -y libigc-dev intel-igc-cm libigdfcl-dev libigfxcmrt-dev level-zero-dev
# Install Intel Support Packages
if [ -n "$XPU_VERSION" ]; then
apt-get install -y intel-for-pytorch-gpu-dev-${XPU_VERSION}
else
apt-get install -y intel-for-pytorch-gpu-dev
fi
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
}
function install_centos() {
dnf install -y 'dnf-command(config-manager)'
dnf config-manager --add-repo \
https://repositories.intel.com/gpu/rhel/8.6/production/2328/unified/intel-gpu-8.6.repo
# To add the EPEL repository needed for DKMS
dnf -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
# https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
# Create the YUM repository file in the /temp directory as a normal user
tee > /tmp/oneAPI.repo << EOF
[oneAPI]
name=Intel® oneAPI repository
baseurl=https://yum.repos.intel.com/oneapi
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
EOF
# Move the newly created oneAPI.repo file to the YUM configuration directory /etc/yum.repos.d
mv /tmp/oneAPI.repo /etc/yum.repos.d
# The xpu-smi packages
dnf install -y flex bison xpu-smi
# Compute and Media Runtimes
dnf install -y \
intel-opencl intel-media intel-mediasdk libmfxgen1 libvpl2\
level-zero intel-level-zero-gpu mesa-dri-drivers mesa-vulkan-drivers \
mesa-vdpau-drivers libdrm mesa-libEGL mesa-libgbm mesa-libGL \
mesa-libxatracker libvpl-tools intel-metrics-discovery \
intel-metrics-library intel-igc-core intel-igc-cm \
libva libva-utils intel-gmmlib libmetee intel-gsc intel-ocloc hwinfo clinfo
# Development packages
dnf install -y --refresh \
intel-igc-opencl-devel level-zero-devel intel-gsc-devel libmetee-devel \
level-zero-devel
# Install Intel® oneAPI Base Toolkit
dnf install intel-basekit -y
# Cleanup
dnf clean all
rm -rf /var/cache/yum
rm -rf /var/lib/yum/yumdb
rm -rf /var/lib/yum/history
}
# The installation depends on the base OS
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
install_ubuntu
;;
centos)
install_centos
;;
*)
echo "Unable to determine OS..."
exit 1
;;
esac

View File

@ -1,44 +0,0 @@
ARG UBUNTU_VERSION
FROM ubuntu:${UBUNTU_VERSION}
ARG UBUNTU_VERSION
ENV DEBIAN_FRONTEND noninteractive
# Install common dependencies (so that this step can be cached separately)
COPY ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
# Install missing libomp-dev
RUN apt-get update && apt-get install -y --no-install-recommends libomp-dev && apt-get autoclean && apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
# Install user
COPY ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install conda and other packages (e.g., numpy, pytest)
ARG ANACONDA_PYTHON_VERSION
ARG CONDA_CMAKE
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
COPY requirements-ci.txt /opt/conda/requirements-ci.txt
COPY ./common/install_conda.sh install_conda.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_conda.sh && rm install_conda.sh common_utils.sh /opt/conda/requirements-ci.txt
# Install cuda and cudnn
ARG CUDA_VERSION
RUN wget -q https://raw.githubusercontent.com/pytorch/builder/main/common/install_cuda.sh -O install_cuda.sh
RUN bash ./install_cuda.sh ${CUDA_VERSION} && rm install_cuda.sh
ENV DESIRED_CUDA ${CUDA_VERSION}
ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:$PATH
# Note that Docker build forbids copying file outside the build context
COPY ./common/install_linter.sh install_linter.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_linter.sh
RUN rm install_linter.sh common_utils.sh
USER jenkins
CMD ["bash"]

View File

@ -1,34 +0,0 @@
ARG UBUNTU_VERSION
FROM ubuntu:${UBUNTU_VERSION}
ARG UBUNTU_VERSION
ENV DEBIAN_FRONTEND noninteractive
# Install common dependencies (so that this step can be cached separately)
COPY ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
# Install user
COPY ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install conda and other packages (e.g., numpy, pytest)
ARG ANACONDA_PYTHON_VERSION
ARG CONDA_CMAKE
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
COPY requirements-ci.txt /opt/conda/requirements-ci.txt
COPY ./common/install_conda.sh install_conda.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_conda.sh && rm install_conda.sh common_utils.sh /opt/conda/requirements-ci.txt
# Note that Docker build forbids copying file outside the build context
COPY ./common/install_linter.sh install_linter.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_linter.sh
RUN rm install_linter.sh common_utils.sh
USER jenkins
CMD ["bash"]

View File

@ -1,314 +0,0 @@
# Python dependencies required for unit tests
#awscli==1.6 #this breaks some platforms
#Description: AWS command line interface
#Pinned versions: 1.6
#test that import:
boto3==1.19.12
#Description: AWS SDK for python
#Pinned versions: 1.19.12, 1.16.34
#test that import:
click
#Description: Command Line Interface Creation Kit
#Pinned versions:
#test that import:
coremltools==5.0b5 ; python_version < "3.12"
#Description: Apple framework for ML integration
#Pinned versions: 5.0b5
#test that import:
#dataclasses #this breaks some platforms
#Description: Provides decorators for auto adding special methods to user classes
#Pinned versions:
#test that import:
dill==0.3.7
#Description: dill extends pickle with serializing and de-serializing for most built-ins
#Pinned versions: 0.3.7
#test that import: dynamo/test_replay_record.py test_dataloader.py test_datapipe.py test_serialization.py
expecttest==0.1.6
#Description: method for writing tests where test framework auto populates
# the expected output based on previous runs
#Pinned versions: 0.1.6
#test that import:
flatbuffers==2.0
#Description: cross platform serialization library
#Pinned versions: 2.0
#test that import:
hypothesis==5.35.1
# Pin hypothesis to avoid flakiness: https://github.com/pytorch/pytorch/issues/31136
#Description: advanced library for generating parametrized tests
#Pinned versions: 3.44.6, 4.53.2
#test that import: test_xnnpack_integration.py, test_pruning_op.py, test_nn.py
junitparser==2.1.1
#Description: unitparser handles JUnit/xUnit Result XML files
#Pinned versions: 2.1.1
#test that import:
lark==0.12.0
#Description: parser
#Pinned versions: 0.12.0
#test that import:
librosa>=0.6.2 ; python_version < "3.11"
#Description: A python package for music and audio analysis
#Pinned versions: >=0.6.2
#test that import: test_spectral_ops.py
#mkl #this breaks linux-bionic-rocm4.5-py3.7
#Description: Intel oneAPI Math Kernel Library
#Pinned versions:
#test that import: test_profiler.py, test_public_bindings.py, test_testing.py,
#test_nn.py, test_mkldnn.py, test_jit.py, test_fx_experimental.py,
#test_autograd.py
#mkl-devel
# see mkl
#mock
#Description: A testing library that allows you to replace parts of your
#system under test with mock objects
#Pinned versions:
#test that import: test_modules.py, test_nn.py,
#test_testing.py
#MonkeyType # breaks pytorch-xla-linux-bionic-py3.7-clang8
#Description: collects runtime types of function arguments and return
#values, and can automatically generate stub files
#Pinned versions:
#test that import:
mypy==1.10.0
# Pin MyPy version because new errors are likely to appear with each release
#Description: linter
#Pinned versions: 1.10.0
#test that import: test_typing.py, test_type_hints.py
networkx==2.8.8
#Description: creation, manipulation, and study of
#the structure, dynamics, and functions of complex networks
#Pinned versions: 2.8.8
#test that import: functorch
#ninja
#Description: build system. Note that it install from
#here breaks things so it is commented out
#Pinned versions: 1.10.0.post1
#test that import: run_test.py, test_cpp_extensions_aot.py,test_determination.py
numba==0.49.0 ; python_version < "3.9"
numba==0.54.1 ; python_version == "3.9"
numba==0.55.2 ; python_version == "3.10"
#Description: Just-In-Time Compiler for Numerical Functions
#Pinned versions: 0.54.1, 0.49.0, <=0.49.1
#test that import: test_numba_integration.py
#For numba issue see https://github.com/pytorch/pytorch/issues/51511
#numpy
#Description: Provides N-dimensional arrays and linear algebra
#Pinned versions: 1.20
#test that import: test_view_ops.py, test_unary_ufuncs.py, test_type_promotion.py,
#test_type_info.py, test_torch.py, test_tensorexpr_pybind.py, test_tensorexpr.py,
#test_tensorboard.py, test_tensor_creation_ops.py, test_static_runtime.py,
#test_spectral_ops.py, test_sort_and_select.py, test_shape_ops.py,
#test_segment_reductions.py, test_reductions.py, test_pruning_op.py,
#test_overrides.py, test_numpy_interop.py, test_numba_integration.py
#test_nn.py, test_namedtensor.py, test_linalg.py, test_jit_cuda_fuser.py,
#test_jit.py, test_indexing.py, test_datapipe.py, test_dataloader.py,
#test_binary_ufuncs.py
#onnxruntime
#Description: scoring engine for Open Neural Network Exchange (ONNX) models
#Pinned versions: 1.9.0
#test that import:
opt-einsum==3.3
#Description: Python library to optimize tensor contraction order, used in einsum
#Pinned versions: 3.3
#test that import: test_linalg.py
optree==0.11.0
#Description: A library for tree manipulation
#Pinned versions: 0.11.0
#test that import: test_vmap.py, test_aotdispatch.py, test_dynamic_shapes.py,
#test_pytree.py, test_ops.py, test_control_flow.py, test_modules.py,
#common_utils.py, test_eager_transforms.py, test_python_dispatch.py,
#test_expanded_weights.py, test_decomp.py, test_overrides.py, test_masked.py,
#test_ops.py, test_prims.py, test_subclass.py, test_functionalization.py,
#test_schema_check.py, test_profiler_tree.py, test_meta.py, test_torchxla_num_output.py,
#test_utils.py, test_proxy_tensor.py, test_memory_profiler.py, test_view_ops.py,
#test_pointwise_ops.py, test_dtensor_ops.py, test_torchinductor.py, test_fx.py,
#test_fake_tensor.py, test_mps.py
pillow==10.3.0
#Description: Python Imaging Library fork
#Pinned versions: 10.3.0
#test that import:
protobuf==3.20.2
#Description: Googles data interchange format
#Pinned versions: 3.20.1
#test that import: test_tensorboard.py
psutil
#Description: information on running processes and system utilization
#Pinned versions:
#test that import: test_profiler.py, test_openmp.py, test_dataloader.py
pytest==7.3.2
#Description: testing framework
#Pinned versions:
#test that import: test_typing.py, test_cpp_extensions_aot.py, run_test.py
pytest-xdist==3.3.1
#Description: plugin for running pytest in parallel
#Pinned versions:
#test that import:
pytest-flakefinder==1.1.0
#Description: plugin for rerunning tests a fixed number of times in pytest
#Pinned versions: 1.1.0
#test that import:
pytest-rerunfailures>=10.3
#Description: plugin for rerunning failure tests in pytest
#Pinned versions:
#test that import:
#pytest-benchmark
#Description: fixture for benchmarking code
#Pinned versions: 3.2.3
#test that import:
#pytest-sugar
#Description: shows failures and errors instantly
#Pinned versions:
#test that import:
xdoctest==1.1.0
#Description: runs doctests in pytest
#Pinned versions: 1.1.0
#test that import:
pygments==2.15.0
#Description: support doctest highlighting
#Pinned versions: 2.12.0
#test that import: the doctests
#PyYAML
#Description: data serialization format
#Pinned versions:
#test that import:
#requests
#Description: HTTP library
#Pinned versions:
#test that import: test_type_promotion.py
#rich
#Description: rich text and beautiful formatting in the terminal
#Pinned versions: 10.9.0
#test that import:
scikit-image==0.19.3 ; python_version < "3.10"
scikit-image==0.20.0 ; python_version >= "3.10"
#Description: image processing routines
#Pinned versions:
#test that import: test_nn.py
#scikit-learn
#Description: machine learning package
#Pinned versions: 0.20.3
#test that import:
scipy==1.10.1 ; python_version <= "3.11"
scipy==1.12.0 ; python_version == "3.12"
# Pin SciPy because of failing distribution tests (see #60347)
#Description: scientific python
#Pinned versions: 1.10.1
#test that import: test_unary_ufuncs.py, test_torch.py,test_tensor_creation_ops.py
#test_spectral_ops.py, test_sparse_csr.py, test_reductions.py,test_nn.py
#test_linalg.py, test_binary_ufuncs.py
#tabulate
#Description: Pretty-print tabular data
#Pinned versions:
#test that import:
tb-nightly==2.13.0a20230426
#Description: TensorBoard
#Pinned versions:
#test that import:
# needed by torchgen utils
typing-extensions
#Description: type hints for python
#Pinned versions:
#test that import:
#virtualenv
#Description: virtual environment for python
#Pinned versions:
#test that import:
unittest-xml-reporting<=3.2.0,>=2.0.0
#Description: saves unit test results to xml
#Pinned versions:
#test that import:
#lintrunner is supported on aarch64-linux only from 0.12.4 version
lintrunner==0.12.5
#Description: all about linters!
#Pinned versions: 0.12.5
#test that import:
rockset==1.0.3
#Description: queries Rockset
#Pinned versions: 1.0.3
#test that import:
ghstack==0.8.0
#Description: ghstack tool
#Pinned versions: 0.8.0
#test that import:
jinja2==3.1.4
#Description: jinja2 template engine
#Pinned versions: 3.1.4
#test that import:
pytest-cpp==2.3.0
#Description: This is used by pytest to invoke C++ tests
#Pinned versions: 2.3.0
#test that import:
z3-solver==4.12.2.0
#Description: The Z3 Theorem Prover Project
#Pinned versions:
#test that import:
tensorboard==2.13.0
#Description: Also included in .ci/docker/requirements-docs.txt
#Pinned versions:
#test that import: test_tensorboard
pywavelets==1.4.1 ; python_version < "3.12"
pywavelets==1.5.0 ; python_version >= "3.12"
#Description: This is a requirement of scikit-image, we need to pin
# it here because 1.5.0 conflicts with numpy 1.21.2 used in CI
#Pinned versions: 1.4.1
#test that import:
lxml==5.0.0.
#Description: This is a requirement of unittest-xml-reporting
# Python-3.9 binaries
PyGithub==2.3.0

View File

@ -1,49 +0,0 @@
sphinx==5.3.0
#Description: This is used to generate PyTorch docs
#Pinned versions: 5.3.0
-e git+https://github.com/pytorch/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme
# TODO: sphinxcontrib.katex 0.9.0 adds a local KaTeX server to speed up pre-rendering
# but it doesn't seem to work and hangs around idly. The initial thought is probably
# something related to Docker setup. We can investigate this later
sphinxcontrib.katex==0.8.6
#Description: This is used to generate PyTorch docs
#Pinned versions: 0.8.6
matplotlib==3.5.3
#Description: This is used to generate PyTorch docs
#Pinned versions: 3.5.3
tensorboard==2.13.0
#Description: This is used to generate PyTorch docs
#Pinned versions: 2.13.0
breathe==4.34.0
#Description: This is used to generate PyTorch C++ docs
#Pinned versions: 4.34.0
exhale==0.2.3
#Description: This is used to generate PyTorch C++ docs
#Pinned versions: 0.2.3
docutils==0.16
#Description: This is used to generate PyTorch C++ docs
#Pinned versions: 0.16
bs4==0.0.1
#Description: This is used to generate PyTorch C++ docs
#Pinned versions: 0.0.1
IPython==8.12.0
#Description: This is used to generate PyTorch functorch docs
#Pinned versions: 8.12.0
myst-nb==0.17.2
#Description: This is used to generate PyTorch functorch docs
#Pinned versions: 0.13.2
# The following are required to build torch.distributed.elastic.rendezvous.etcd* docs
python-etcd==0.4.5
sphinx-copybutton==0.5.0
sphinx-panels==0.4.1
myst-parser==0.18.1

View File

@ -1 +0,0 @@
3.0.0

View File

@ -1,166 +0,0 @@
ARG UBUNTU_VERSION
ARG CUDA_VERSION
ARG IMAGE_NAME
FROM ${IMAGE_NAME}
ARG UBUNTU_VERSION
ARG CUDA_VERSION
ENV DEBIAN_FRONTEND noninteractive
# Install common dependencies (so that this step can be cached separately)
COPY ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
# Install user
COPY ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install katex
ARG KATEX
COPY ./common/install_docs_reqs.sh install_docs_reqs.sh
RUN bash ./install_docs_reqs.sh && rm install_docs_reqs.sh
# Install conda and other packages (e.g., numpy, pytest)
ARG ANACONDA_PYTHON_VERSION
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
ARG CONDA_CMAKE
COPY requirements-ci.txt /opt/conda/requirements-ci.txt
COPY ./common/install_conda.sh install_conda.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_conda.sh && rm install_conda.sh common_utils.sh /opt/conda/requirements-ci.txt
# Install gcc
ARG GCC_VERSION
COPY ./common/install_gcc.sh install_gcc.sh
RUN bash ./install_gcc.sh && rm install_gcc.sh
# Install clang
ARG CLANG_VERSION
COPY ./common/install_clang.sh install_clang.sh
RUN bash ./install_clang.sh && rm install_clang.sh
# (optional) Install protobuf for ONNX
ARG PROTOBUF
COPY ./common/install_protobuf.sh install_protobuf.sh
RUN if [ -n "${PROTOBUF}" ]; then bash ./install_protobuf.sh; fi
RUN rm install_protobuf.sh
ENV INSTALLED_PROTOBUF ${PROTOBUF}
# (optional) Install database packages like LMDB and LevelDB
ARG DB
COPY ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV
ARG VISION
COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh cache_vision_models.sh common_utils.sh
ENV INSTALLED_VISION ${VISION}
# (optional) Install UCC
ARG UCX_COMMIT
ARG UCC_COMMIT
ENV UCX_COMMIT $UCX_COMMIT
ENV UCC_COMMIT $UCC_COMMIT
ENV UCX_HOME /usr
ENV UCC_HOME /usr
ADD ./common/install_ucc.sh install_ucc.sh
RUN if [ -n "${UCX_COMMIT}" ] && [ -n "${UCC_COMMIT}" ]; then bash ./install_ucc.sh; fi
RUN rm install_ucc.sh
COPY ./common/install_openssl.sh install_openssl.sh
ENV OPENSSL_ROOT_DIR /opt/openssl
RUN bash ./install_openssl.sh
ENV OPENSSL_DIR /opt/openssl
ARG INDUCTOR_BENCHMARKS
COPY ./common/install_inductor_benchmark_deps.sh install_inductor_benchmark_deps.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/huggingface.txt huggingface.txt
COPY ci_commit_pins/timm.txt timm.txt
RUN if [ -n "${INDUCTOR_BENCHMARKS}" ]; then bash ./install_inductor_benchmark_deps.sh; fi
RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface.txt
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
COPY ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
ARG TRITON
# Install triton, this needs to be done before sccache because the latter will
# try to reach out to S3, which docker build runners don't have access
COPY ./common/install_triton.sh install_triton.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/triton.txt triton.txt
COPY triton_version.txt triton_version.txt
RUN if [ -n "${TRITON}" ]; then bash ./install_triton.sh; fi
RUN rm install_triton.sh common_utils.sh triton.txt triton_version.txt
ARG HALIDE
# Build and install halide
COPY ./common/install_halide.sh install_halide.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/halide.txt halide.txt
RUN if [ -n "${HALIDE}" ]; then bash ./install_halide.sh; fi
RUN rm install_halide.sh common_utils.sh halide.txt
# Install ccache/sccache (do this last, so we get priority in PATH)
COPY ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
# See https://github.com/pytorch/pytorch/issues/82174
# TODO(sdym@fb.com):
# check if this is needed after full off Xenial migration
ENV CARGO_NET_GIT_FETCH_WITH_CLI true
RUN bash ./install_cache.sh && rm install_cache.sh
ENV CMAKE_CUDA_COMPILER_LAUNCHER=/opt/cache/bin/sccache
# Add jni.h for java host build
COPY ./common/install_jni.sh install_jni.sh
COPY ./java/jni.h jni.h
RUN bash ./install_jni.sh && rm install_jni.sh
# Install Open MPI for CUDA
COPY ./common/install_openmpi.sh install_openmpi.sh
RUN if [ -n "${CUDA_VERSION}" ]; then bash install_openmpi.sh; fi
RUN rm install_openmpi.sh
# Include BUILD_ENVIRONMENT environment variable in image
ARG BUILD_ENVIRONMENT
ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
# AWS specific CUDA build guidance
ENV TORCH_CUDA_ARCH_LIST Maxwell
ENV TORCH_NVCC_FLAGS "-Xfatbin -compress-all"
ENV CUDA_PATH /usr/local/cuda
# Install LLVM dev version (Defined in the pytorch/builder github repository)
COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm
# Install CUDNN
ARG CUDNN_VERSION
ARG CUDA_VERSION
COPY ./common/install_cudnn.sh install_cudnn.sh
RUN if [ -n "${CUDNN_VERSION}" ]; then bash install_cudnn.sh; fi
RUN rm install_cudnn.sh
# Install CUSPARSELT
ARG CUDA_VERSION
COPY ./common/install_cusparselt.sh install_cusparselt.sh
RUN bash install_cusparselt.sh
RUN rm install_cusparselt.sh
# Delete /usr/local/cuda-11.X/cuda-11.X symlinks
RUN if [ -h /usr/local/cuda-11.6/cuda-11.6 ]; then rm /usr/local/cuda-11.6/cuda-11.6; fi
RUN if [ -h /usr/local/cuda-11.7/cuda-11.7 ]; then rm /usr/local/cuda-11.7/cuda-11.7; fi
RUN if [ -h /usr/local/cuda-12.1/cuda-12.1 ]; then rm /usr/local/cuda-12.1/cuda-12.1; fi
RUN if [ -h /usr/local/cuda-12.4/cuda-12.4 ]; then rm /usr/local/cuda-12.4/cuda-12.4; fi
USER jenkins
CMD ["bash"]

View File

@ -1,125 +0,0 @@
ARG UBUNTU_VERSION
FROM ubuntu:${UBUNTU_VERSION}
ARG UBUNTU_VERSION
ENV DEBIAN_FRONTEND noninteractive
# Set AMD gpu targets to build for
ARG PYTORCH_ROCM_ARCH
ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
# Install common dependencies (so that this step can be cached separately)
COPY ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
# Install clang
ARG LLVMDEV
ARG CLANG_VERSION
COPY ./common/install_clang.sh install_clang.sh
RUN bash ./install_clang.sh && rm install_clang.sh
# Install user
COPY ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install conda and other packages (e.g., numpy, pytest)
ARG ANACONDA_PYTHON_VERSION
ARG CONDA_CMAKE
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
COPY requirements-ci.txt /opt/conda/requirements-ci.txt
COPY ./common/install_conda.sh install_conda.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_conda.sh && rm install_conda.sh common_utils.sh /opt/conda/requirements-ci.txt
# Install gcc
ARG GCC_VERSION
COPY ./common/install_gcc.sh install_gcc.sh
RUN bash ./install_gcc.sh && rm install_gcc.sh
# (optional) Install protobuf for ONNX
ARG PROTOBUF
COPY ./common/install_protobuf.sh install_protobuf.sh
RUN if [ -n "${PROTOBUF}" ]; then bash ./install_protobuf.sh; fi
RUN rm install_protobuf.sh
ENV INSTALLED_PROTOBUF ${PROTOBUF}
# (optional) Install database packages like LMDB and LevelDB
ARG DB
COPY ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV
ARG VISION
COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh cache_vision_models.sh common_utils.sh
ENV INSTALLED_VISION ${VISION}
# Install rocm
ARG ROCM_VERSION
COPY ./common/install_rocm.sh install_rocm.sh
RUN bash ./install_rocm.sh
RUN rm install_rocm.sh
COPY ./common/install_rocm_magma.sh install_rocm_magma.sh
RUN bash ./install_rocm_magma.sh
RUN rm install_rocm_magma.sh
ENV ROCM_PATH /opt/rocm
ENV PATH /opt/rocm/bin:$PATH
ENV PATH /opt/rocm/hcc/bin:$PATH
ENV PATH /opt/rocm/hip/bin:$PATH
ENV PATH /opt/rocm/opencl/bin:$PATH
ENV PATH /opt/rocm/llvm/bin:$PATH
ENV MAGMA_HOME /opt/rocm/magma
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
# Install amdsmi
COPY ./common/install_amdsmi.sh install_amdsmi.sh
RUN bash ./install_amdsmi.sh
RUN rm install_amdsmi.sh
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
COPY ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# (optional) Install non-default Ninja version
ARG NINJA_VERSION
COPY ./common/install_ninja.sh install_ninja.sh
RUN if [ -n "${NINJA_VERSION}" ]; then bash ./install_ninja.sh; fi
RUN rm install_ninja.sh
ARG TRITON
# Install triton, this needs to be done before sccache because the latter will
# try to reach out to S3, which docker build runners don't have access
COPY ./common/install_triton.sh install_triton.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/triton-rocm.txt triton-rocm.txt
COPY triton_version.txt triton_version.txt
RUN if [ -n "${TRITON}" ]; then bash ./install_triton.sh; fi
RUN rm install_triton.sh common_utils.sh triton-rocm.txt triton_version.txt
# Install AOTriton
COPY ./aotriton_version.txt aotriton_version.txt
COPY ./common/common_utils.sh common_utils.sh
COPY ./common/install_aotriton.sh install_aotriton.sh
RUN ["/bin/bash", "-c", "./install_aotriton.sh /opt/rocm && rm -rf install_aotriton.sh aotriton_version.txt common_utils.sh"]
ENV AOTRITON_INSTALLED_PREFIX /opt/rocm/aotriton
# Install ccache/sccache (do this last, so we get priority in PATH)
COPY ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
RUN bash ./install_cache.sh && rm install_cache.sh
# Include BUILD_ENVIRONMENT environment variable in image
ARG BUILD_ENVIRONMENT
ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
USER jenkins
CMD ["bash"]

View File

@ -1,118 +0,0 @@
ARG UBUNTU_VERSION
FROM ubuntu:${UBUNTU_VERSION}
ARG UBUNTU_VERSION
ENV DEBIAN_FRONTEND noninteractive
ARG CLANG_VERSION
# Install common dependencies (so that this step can be cached separately)
COPY ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
# Install clang
ARG LLVMDEV
COPY ./common/install_clang.sh install_clang.sh
RUN bash ./install_clang.sh && rm install_clang.sh
# Install user
COPY ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install katex
ARG KATEX
COPY ./common/install_docs_reqs.sh install_docs_reqs.sh
RUN bash ./install_docs_reqs.sh && rm install_docs_reqs.sh
# Install conda and other packages (e.g., numpy, pytest)
ARG ANACONDA_PYTHON_VERSION
ARG CONDA_CMAKE
ARG DOCS
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
ENV DOCS=$DOCS
COPY requirements-ci.txt requirements-docs.txt /opt/conda/
COPY ./common/install_conda.sh install_conda.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_conda.sh && rm install_conda.sh common_utils.sh /opt/conda/requirements-ci.txt /opt/conda/requirements-docs.txt
# Install gcc
ARG GCC_VERSION
COPY ./common/install_gcc.sh install_gcc.sh
RUN bash ./install_gcc.sh && rm install_gcc.sh
# Install lcov for C++ code coverage
COPY ./common/install_lcov.sh install_lcov.sh
RUN bash ./install_lcov.sh && rm install_lcov.sh
COPY ./common/install_openssl.sh install_openssl.sh
RUN bash ./install_openssl.sh
ENV OPENSSL_ROOT_DIR /opt/openssl
ENV OPENSSL_DIR /opt/openssl
RUN rm install_openssl.sh
ARG INDUCTOR_BENCHMARKS
COPY ./common/install_inductor_benchmark_deps.sh install_inductor_benchmark_deps.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/huggingface.txt huggingface.txt
COPY ci_commit_pins/timm.txt timm.txt
RUN if [ -n "${INDUCTOR_BENCHMARKS}" ]; then bash ./install_inductor_benchmark_deps.sh; fi
RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface.txt
# Install XPU Dependencies
ARG XPU_VERSION
COPY ./common/install_xpu.sh install_xpu.sh
RUN bash ./install_xpu.sh && rm install_xpu.sh
ARG TRITON
# Install triton, this needs to be done before sccache because the latter will
# try to reach out to S3, which docker build runners don't have access
COPY ./common/install_triton.sh install_triton.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/triton-xpu.txt triton-xpu.txt
COPY triton_version.txt triton_version.txt
RUN if [ -n "${TRITON}" ]; then bash ./install_triton.sh; fi
RUN rm install_triton.sh common_utils.sh triton-xpu.txt triton_version.txt
# (optional) Install database packages like LMDB and LevelDB
ARG DB
COPY ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV
ARG VISION
COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh cache_vision_models.sh common_utils.sh
ENV INSTALLED_VISION ${VISION}
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
COPY ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# (optional) Install non-default Ninja version
ARG NINJA_VERSION
COPY ./common/install_ninja.sh install_ninja.sh
RUN if [ -n "${NINJA_VERSION}" ]; then bash ./install_ninja.sh; fi
RUN rm install_ninja.sh
# Install ccache/sccache (do this last, so we get priority in PATH)
COPY ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
RUN bash ./install_cache.sh && rm install_cache.sh
# Include BUILD_ENVIRONMENT environment variable in image
ARG BUILD_ENVIRONMENT
ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
# Install LLVM dev version (Defined in the pytorch/builder github repository)
COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm
USER jenkins
CMD ["bash"]

View File

@ -1,211 +0,0 @@
ARG UBUNTU_VERSION
FROM ubuntu:${UBUNTU_VERSION}
ARG UBUNTU_VERSION
ENV DEBIAN_FRONTEND noninteractive
ARG CLANG_VERSION
# Install common dependencies (so that this step can be cached separately)
COPY ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
# Install clang
ARG LLVMDEV
COPY ./common/install_clang.sh install_clang.sh
RUN bash ./install_clang.sh && rm install_clang.sh
# Install user
COPY ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install katex
ARG KATEX
COPY ./common/install_docs_reqs.sh install_docs_reqs.sh
RUN bash ./install_docs_reqs.sh && rm install_docs_reqs.sh
# Install conda and other packages (e.g., numpy, pytest)
ARG ANACONDA_PYTHON_VERSION
ARG CONDA_CMAKE
ARG DOCS
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
ENV DOCS=$DOCS
COPY requirements-ci.txt requirements-docs.txt /opt/conda/
COPY ./common/install_conda.sh install_conda.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_conda.sh && rm install_conda.sh common_utils.sh /opt/conda/requirements-ci.txt /opt/conda/requirements-docs.txt
RUN if [ -n "${UNINSTALL_DILL}" ]; then pip uninstall -y dill; fi
# Install gcc
ARG GCC_VERSION
COPY ./common/install_gcc.sh install_gcc.sh
RUN bash ./install_gcc.sh && rm install_gcc.sh
# Install lcov for C++ code coverage
COPY ./common/install_lcov.sh install_lcov.sh
RUN bash ./install_lcov.sh && rm install_lcov.sh
# Install cuda and cudnn
ARG CUDA_VERSION
RUN wget -q https://raw.githubusercontent.com/pytorch/builder/main/common/install_cuda.sh -O install_cuda.sh
RUN bash ./install_cuda.sh ${CUDA_VERSION} && rm install_cuda.sh
ENV DESIRED_CUDA ${CUDA_VERSION}
ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:$PATH
# (optional) Install UCC
ARG UCX_COMMIT
ARG UCC_COMMIT
ENV UCX_COMMIT $UCX_COMMIT
ENV UCC_COMMIT $UCC_COMMIT
ENV UCX_HOME /usr
ENV UCC_HOME /usr
ADD ./common/install_ucc.sh install_ucc.sh
RUN if [ -n "${UCX_COMMIT}" ] && [ -n "${UCC_COMMIT}" ]; then bash ./install_ucc.sh; fi
RUN rm install_ucc.sh
# (optional) Install protobuf for ONNX
ARG PROTOBUF
COPY ./common/install_protobuf.sh install_protobuf.sh
RUN if [ -n "${PROTOBUF}" ]; then bash ./install_protobuf.sh; fi
RUN rm install_protobuf.sh
ENV INSTALLED_PROTOBUF ${PROTOBUF}
# (optional) Install database packages like LMDB and LevelDB
ARG DB
COPY ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV
ARG VISION
COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh cache_vision_models.sh common_utils.sh
ENV INSTALLED_VISION ${VISION}
# (optional) Install Android NDK
ARG ANDROID
ARG ANDROID_NDK
ARG GRADLE_VERSION
COPY ./common/install_android.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
COPY ./android/AndroidManifest.xml AndroidManifest.xml
COPY ./android/build.gradle build.gradle
RUN if [ -n "${ANDROID}" ]; then bash ./install_android.sh; fi
RUN rm install_android.sh cache_vision_models.sh common_utils.sh
RUN rm AndroidManifest.xml
RUN rm build.gradle
ENV INSTALLED_ANDROID ${ANDROID}
# (optional) Install Vulkan SDK
ARG VULKAN_SDK_VERSION
COPY ./common/install_vulkan_sdk.sh install_vulkan_sdk.sh
RUN if [ -n "${VULKAN_SDK_VERSION}" ]; then bash ./install_vulkan_sdk.sh; fi
RUN rm install_vulkan_sdk.sh
# (optional) Install swiftshader
ARG SWIFTSHADER
COPY ./common/install_swiftshader.sh install_swiftshader.sh
RUN if [ -n "${SWIFTSHADER}" ]; then bash ./install_swiftshader.sh; fi
RUN rm install_swiftshader.sh
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
COPY ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# (optional) Install non-default Ninja version
ARG NINJA_VERSION
COPY ./common/install_ninja.sh install_ninja.sh
RUN if [ -n "${NINJA_VERSION}" ]; then bash ./install_ninja.sh; fi
RUN rm install_ninja.sh
COPY ./common/install_openssl.sh install_openssl.sh
RUN bash ./install_openssl.sh
ENV OPENSSL_ROOT_DIR /opt/openssl
ENV OPENSSL_DIR /opt/openssl
RUN rm install_openssl.sh
ARG INDUCTOR_BENCHMARKS
COPY ./common/install_inductor_benchmark_deps.sh install_inductor_benchmark_deps.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/huggingface.txt huggingface.txt
COPY ci_commit_pins/timm.txt timm.txt
RUN if [ -n "${INDUCTOR_BENCHMARKS}" ]; then bash ./install_inductor_benchmark_deps.sh; fi
RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface.txt
ARG TRITON
# Install triton, this needs to be done before sccache because the latter will
# try to reach out to S3, which docker build runners don't have access
COPY ./common/install_triton.sh install_triton.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/triton.txt triton.txt
RUN if [ -n "${TRITON}" ]; then bash ./install_triton.sh; fi
RUN rm install_triton.sh common_utils.sh triton.txt
ARG EXECUTORCH
# Build and install executorch
COPY ./common/install_executorch.sh install_executorch.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/executorch.txt executorch.txt
RUN if [ -n "${EXECUTORCH}" ]; then bash ./install_executorch.sh; fi
RUN rm install_executorch.sh common_utils.sh executorch.txt
ARG HALIDE
# Build and install halide
COPY ./common/install_halide.sh install_halide.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/halide.txt halide.txt
RUN if [ -n "${HALIDE}" ]; then bash ./install_halide.sh; fi
RUN rm install_halide.sh common_utils.sh halide.txt
ARG ONNX
# Install ONNX dependencies
COPY ./common/install_onnx.sh ./common/common_utils.sh ./
RUN if [ -n "${ONNX}" ]; then bash ./install_onnx.sh; fi
RUN rm install_onnx.sh common_utils.sh
# (optional) Build ACL
ARG ACL
COPY ./common/install_acl.sh install_acl.sh
RUN if [ -n "${ACL}" ]; then bash ./install_acl.sh; fi
RUN rm install_acl.sh
ENV INSTALLED_ACL ${ACL}
# Install ccache/sccache (do this last, so we get priority in PATH)
ARG SKIP_SCCACHE_INSTALL
COPY ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
RUN if [ -z "${SKIP_SCCACHE_INSTALL}" ]; then bash ./install_cache.sh; fi
RUN rm install_cache.sh
# Add jni.h for java host build
COPY ./common/install_jni.sh install_jni.sh
COPY ./java/jni.h jni.h
RUN bash ./install_jni.sh && rm install_jni.sh
# Install Open MPI for CUDA
COPY ./common/install_openmpi.sh install_openmpi.sh
RUN if [ -n "${CUDA_VERSION}" ]; then bash install_openmpi.sh; fi
RUN rm install_openmpi.sh
# Include BUILD_ENVIRONMENT environment variable in image
ARG BUILD_ENVIRONMENT
ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
# Install LLVM dev version (Defined in the pytorch/builder github repository)
ARG SKIP_LLVM_SRC_BUILD_INSTALL
COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm
RUN if [ -n "${SKIP_LLVM_SRC_BUILD_INSTALL}" ]; then set -eu; rm -rf /opt/llvm; fi
# AWS specific CUDA build guidance
ENV TORCH_CUDA_ARCH_LIST Maxwell
ENV TORCH_NVCC_FLAGS "-Xfatbin -compress-all"
ENV CUDA_PATH /usr/local/cuda
USER jenkins
CMD ["bash"]

View File

@ -1,14 +0,0 @@
# Jenkins
The scripts in this directory are the entrypoint for testing ONNX exporter.
The environment variable `BUILD_ENVIRONMENT` is expected to be set to
the build environment you intend to test. It is a hint for the build
and test scripts to configure Caffe2 a certain way and include/exclude
tests. Docker images, they equal the name of the image itself. For
example: `py2-cuda9.0-cudnn7-ubuntu16.04`. The Docker images that are
built on Jenkins and are used in triggered builds already have this
environment variable set in their manifest. Also see
`./docker/jenkins/*/Dockerfile` and search for `BUILD_ENVIRONMENT`.
Our Jenkins installation is located at https://ci.pytorch.org/jenkins/.

View File

@ -1,23 +0,0 @@
#!/bin/bash
set -ex
source "$(dirname "${BASH_SOURCE[0]}")/../pytorch/common_utils.sh"
LOCAL_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
ROOT_DIR=$(cd "$LOCAL_DIR"/../.. && pwd)
TEST_DIR="$ROOT_DIR/test"
pytest_reports_dir="${TEST_DIR}/test-reports/python"
# Figure out which Python to use
PYTHON="$(which python)"
if [[ "${BUILD_ENVIRONMENT}" =~ py((2|3)\.?[0-9]?\.?[0-9]?) ]]; then
PYTHON=$(which "python${BASH_REMATCH[1]}")
fi
if [[ "${BUILD_ENVIRONMENT}" == *rocm* ]]; then
# HIP_PLATFORM is auto-detected by hipcc; unset to avoid build errors
unset HIP_PLATFORM
fi
mkdir -p "$pytest_reports_dir" || true

View File

@ -1,29 +0,0 @@
#!/bin/bash
# shellcheck source=./common.sh
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# Workaround for dind-rootless userid mapping (https://github.com/pytorch/ci-infra/issues/96)
WORKSPACE_ORIGINAL_OWNER_ID=$(stat -c '%u' "/var/lib/jenkins/workspace")
cleanup_workspace() {
echo "sudo may print the following warning message that can be ignored. The chown command will still run."
echo " sudo: setrlimit(RLIMIT_STACK): Operation not permitted"
echo "For more details refer to https://github.com/sudo-project/sudo/issues/42"
sudo chown -R "$WORKSPACE_ORIGINAL_OWNER_ID" /var/lib/jenkins/workspace
}
# Disable shellcheck SC2064 as we want to parse the original owner immediately.
# shellcheck disable=SC2064
trap_add cleanup_workspace EXIT
sudo chown -R jenkins /var/lib/jenkins/workspace
git config --global --add safe.directory /var/lib/jenkins/workspace
if [[ "$BUILD_ENVIRONMENT" == *onnx* ]]; then
# TODO: This can be removed later once vision is also part of the Docker image
pip install -q --user --no-use-pep517 "git+https://github.com/pytorch/vision.git@$(cat .github/ci_commit_pins/vision.txt)"
# JIT C++ extensions require ninja, so put it into PATH.
export PATH="/var/lib/jenkins/.local/bin:$PATH"
# NB: ONNX test is fast (~15m) so it's ok to retry it few more times to avoid any flaky issue, we
# need to bring this to the standard PyTorch run_test eventually. The issue will be tracked in
# https://github.com/pytorch/pytorch/issues/98626
"$ROOT_DIR/scripts/onnx/test.sh"
fi

View File

@ -1,42 +0,0 @@
This directory contains scripts for our continuous integration.
One important thing to keep in mind when reading the scripts here is
that they are all based off of Docker images, which we build for each of
the various system configurations we want to run on Jenkins. This means
it is very easy to run these tests yourself:
1. Figure out what Docker image you want. The general template for our
images look like:
``registry.pytorch.org/pytorch/pytorch-$BUILD_ENVIRONMENT:$DOCKER_VERSION``,
where ``$BUILD_ENVIRONMENT`` is one of the build environments
enumerated in
[pytorch-dockerfiles](https://github.com/pytorch/pytorch/blob/master/.ci/docker/build.sh). The dockerfile used by jenkins can be found under the `.ci` [directory](https://github.com/pytorch/pytorch/blob/master/.ci/docker)
2. Run ``docker run -it -u jenkins $DOCKER_IMAGE``, clone PyTorch and
run one of the scripts in this directory.
The Docker images are designed so that any "reasonable" build commands
will work; if you look in [build.sh](build.sh) you will see that it is a
very simple script. This is intentional. Idiomatic build instructions
should work inside all of our Docker images. You can tweak the commands
however you need (e.g., in case you want to rebuild with DEBUG, or rerun
the build with higher verbosity, etc.).
We have to do some work to make this so. Here is a summary of the
mechanisms we use:
- We install binaries to directories like `/usr/local/bin` which
are automatically part of your PATH.
- We add entries to the PATH using Docker ENV variables (so
they apply when you enter Docker) and `/etc/environment` (so they
continue to apply even if you sudo), instead of modifying
`PATH` in our build scripts.
- We use `/etc/ld.so.conf.d` to register directories containing
shared libraries, instead of modifying `LD_LIBRARY_PATH` in our
build scripts.
- We reroute well known paths like `/usr/bin/gcc` to alternate
implementations with `update-alternatives`, instead of setting
`CC` and `CXX` in our implementations.

View File

@ -1,408 +0,0 @@
#!/bin/bash
set -ex
# Required environment variable: $BUILD_ENVIRONMENT
# (This is set by default in the Docker images we build, so you don't
# need to set it yourself.
# shellcheck source=./common.sh
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# shellcheck source=./common-build.sh
source "$(dirname "${BASH_SOURCE[0]}")/common-build.sh"
if [[ "$BUILD_ENVIRONMENT" == *-mobile-*build* ]]; then
exec "$(dirname "${BASH_SOURCE[0]}")/build-mobile.sh" "$@"
fi
echo "Python version:"
python --version
echo "GCC version:"
gcc --version
echo "CMake version:"
cmake --version
echo "Environment variables:"
env
if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then
# Use jemalloc during compilation to mitigate https://github.com/pytorch/pytorch/issues/116289
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
echo "NVCC version:"
nvcc --version
fi
if [[ "$BUILD_ENVIRONMENT" == *cuda11* ]]; then
if [[ "$BUILD_ENVIRONMENT" != *cuda11.3* && "$BUILD_ENVIRONMENT" != *clang* ]]; then
# TODO: there is a linking issue when building with UCC using clang,
# disable it for now and to be fix later.
# TODO: disable UCC temporarily to enable CUDA 12.1 in CI
export USE_UCC=1
export USE_SYSTEM_UCC=1
fi
fi
if [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then
export ATEN_THREADING=NATIVE
fi
# Enable LLVM dependency for TensorExpr testing
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
export USE_LLVM=/opt/rocm/llvm
export LLVM_DIR=/opt/rocm/llvm/lib/cmake/llvm
else
export USE_LLVM=/opt/llvm
export LLVM_DIR=/opt/llvm/lib/cmake/llvm
fi
if [[ "$BUILD_ENVIRONMENT" == *executorch* ]]; then
# To build test_edge_op_registration
export BUILD_EXECUTORCH=ON
export USE_CUDA=0
fi
if ! which conda; then
# In ROCm CIs, we are doing cross compilation on build machines with
# intel cpu and later run tests on machines with amd cpu.
# Also leave out two builds to make sure non-mkldnn builds still work.
if [[ "$BUILD_ENVIRONMENT" != *rocm* ]]; then
export USE_MKLDNN=1
else
export USE_MKLDNN=0
fi
else
# CMAKE_PREFIX_PATH precedences
# 1. $CONDA_PREFIX, if defined. This follows the pytorch official build instructions.
# 2. /opt/conda/envs/py_${ANACONDA_PYTHON_VERSION}, if ANACONDA_PYTHON_VERSION defined.
# This is for CI, which defines ANACONDA_PYTHON_VERSION but not CONDA_PREFIX.
# 3. $(conda info --base). The fallback value of pytorch official build
# instructions actually refers to this.
# Commonly this is /opt/conda/
if [[ -v CONDA_PREFIX ]]; then
export CMAKE_PREFIX_PATH=${CONDA_PREFIX}
elif [[ -v ANACONDA_PYTHON_VERSION ]]; then
export CMAKE_PREFIX_PATH="/opt/conda/envs/py_${ANACONDA_PYTHON_VERSION}"
else
# already checked by `! which conda`
CMAKE_PREFIX_PATH="$(conda info --base)"
export CMAKE_PREFIX_PATH
fi
# Workaround required for MKL library linkage
# https://github.com/pytorch/pytorch/issues/119557
if [ "$ANACONDA_PYTHON_VERSION" = "3.12" ]; then
export CMAKE_LIBRARY_PATH="/opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/lib/"
export CMAKE_INCLUDE_PATH="/opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/include/"
fi
fi
if [[ "$BUILD_ENVIRONMENT" == *aarch64* ]]; then
export USE_MKLDNN=1
export USE_MKLDNN_ACL=1
export ACL_ROOT_DIR=/ComputeLibrary
fi
if [[ "$BUILD_ENVIRONMENT" == *libtorch* ]]; then
POSSIBLE_JAVA_HOMES=()
POSSIBLE_JAVA_HOMES+=(/usr/local)
POSSIBLE_JAVA_HOMES+=(/usr/lib/jvm/java-8-openjdk-amd64)
POSSIBLE_JAVA_HOMES+=(/Library/Java/JavaVirtualMachines/*.jdk/Contents/Home)
# Add the Windows-specific JNI
POSSIBLE_JAVA_HOMES+=("$PWD/.circleci/windows-jni/")
for JH in "${POSSIBLE_JAVA_HOMES[@]}" ; do
if [[ -e "$JH/include/jni.h" ]] ; then
# Skip if we're not on Windows but haven't found a JAVA_HOME
if [[ "$JH" == "$PWD/.circleci/windows-jni/" && "$OSTYPE" != "msys" ]] ; then
break
fi
echo "Found jni.h under $JH"
export JAVA_HOME="$JH"
export BUILD_JNI=ON
break
fi
done
if [ -z "$JAVA_HOME" ]; then
echo "Did not find jni.h"
fi
fi
# Use special scripts for Android builds
if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
export ANDROID_NDK=/opt/ndk
build_args=()
if [[ "${BUILD_ENVIRONMENT}" == *-arm-v7a* ]]; then
build_args+=("-DANDROID_ABI=armeabi-v7a")
elif [[ "${BUILD_ENVIRONMENT}" == *-arm-v8a* ]]; then
build_args+=("-DANDROID_ABI=arm64-v8a")
elif [[ "${BUILD_ENVIRONMENT}" == *-x86_32* ]]; then
build_args+=("-DANDROID_ABI=x86")
elif [[ "${BUILD_ENVIRONMENT}" == *-x86_64* ]]; then
build_args+=("-DANDROID_ABI=x86_64")
fi
if [[ "${BUILD_ENVIRONMENT}" == *vulkan* ]]; then
build_args+=("-DUSE_VULKAN=ON")
fi
build_args+=("-DUSE_LITE_INTERPRETER_PROFILER=OFF")
exec ./scripts/build_android.sh "${build_args[@]}" "$@"
fi
if [[ "$BUILD_ENVIRONMENT" != *android* && "$BUILD_ENVIRONMENT" == *vulkan* ]]; then
export USE_VULKAN=1
# shellcheck disable=SC1091
source /var/lib/jenkins/vulkansdk/setup-env.sh
fi
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
# hcc used to run out of memory, silently exiting without stopping
# the build process, leaving undefined symbols in the shared lib,
# causing undefined symbol errors when later running tests.
# We used to set MAX_JOBS to 4 to avoid, but this is no longer an issue.
if [ -z "$MAX_JOBS" ]; then
export MAX_JOBS=$(($(nproc) - 1))
fi
if [[ -n "$CI" && -z "$PYTORCH_ROCM_ARCH" ]]; then
# Set ROCM_ARCH to gfx906 for CI builds, if user doesn't override.
echo "Limiting PYTORCH_ROCM_ARCH to gfx906 for CI builds"
export PYTORCH_ROCM_ARCH="gfx906"
fi
# hipify sources
python tools/amd_build/build_amd.py
fi
if [[ "$BUILD_ENVIRONMENT" == *xpu* ]]; then
# shellcheck disable=SC1091
source /opt/intel/oneapi/compiler/latest/env/vars.sh
export USE_XPU=1
fi
# sccache will fail for CUDA builds if all cores are used for compiling
# gcc 7 with sccache seems to have intermittent OOM issue if all cores are used
if [ -z "$MAX_JOBS" ]; then
if { [[ "$BUILD_ENVIRONMENT" == *cuda* ]] || [[ "$BUILD_ENVIRONMENT" == *gcc7* ]]; } && which sccache > /dev/null; then
export MAX_JOBS=$(($(nproc) - 1))
fi
fi
# TORCH_CUDA_ARCH_LIST must be passed from an environment variable
if [[ "$BUILD_ENVIRONMENT" == *cuda* && -z "$TORCH_CUDA_ARCH_LIST" ]]; then
echo "TORCH_CUDA_ARCH_LIST must be defined"
exit 1
fi
# We only build FlashAttention files for CUDA 8.0+, and they require large amounts of
# memory to build and will OOM
if [[ "$BUILD_ENVIRONMENT" == *cuda* ]] && [[ "$TORCH_CUDA_ARCH_LIST" == *"8.6"* || "$TORCH_CUDA_ARCH_LIST" == *"8.0"* ]]; then
echo "WARNING: FlashAttention files require large amounts of memory to build and will OOM"
echo "Setting MAX_JOBS=(nproc-2)/3 to reduce memory usage"
export MAX_JOBS="$(( $(nproc --ignore=2) / 3 ))"
fi
if [[ "${BUILD_ENVIRONMENT}" == *clang* ]]; then
export CC=clang
export CXX=clang++
fi
if [[ "$BUILD_ENVIRONMENT" == *-clang*-asan* ]]; then
export LDSHARED="clang --shared"
export USE_CUDA=0
export USE_ASAN=1
export UBSAN_FLAGS="-fno-sanitize-recover=all;-fno-sanitize=float-divide-by-zero;-fno-sanitize=float-cast-overflow"
unset USE_LLVM
fi
if [[ "${BUILD_ENVIRONMENT}" == *no-ops* ]]; then
export USE_PER_OPERATOR_HEADERS=0
fi
if [[ "${BUILD_ENVIRONMENT}" == *-pch* ]]; then
export USE_PRECOMPILED_HEADERS=1
fi
if [[ "${BUILD_ENVIRONMENT}" == *linux-focal-py3.7-gcc7-build* ]]; then
export USE_GLOO_WITH_OPENSSL=ON
fi
if [[ "${BUILD_ENVIRONMENT}" != *android* && "${BUILD_ENVIRONMENT}" != *cuda* ]]; then
export BUILD_STATIC_RUNTIME_BENCHMARK=ON
fi
# Do not change workspace permissions for ROCm CI jobs
# as it can leave workspace with bad permissions for cancelled jobs
if [[ "$BUILD_ENVIRONMENT" != *rocm* ]]; then
# Workaround for dind-rootless userid mapping (https://github.com/pytorch/ci-infra/issues/96)
WORKSPACE_ORIGINAL_OWNER_ID=$(stat -c '%u' "/var/lib/jenkins/workspace")
cleanup_workspace() {
echo "sudo may print the following warning message that can be ignored. The chown command will still run."
echo " sudo: setrlimit(RLIMIT_STACK): Operation not permitted"
echo "For more details refer to https://github.com/sudo-project/sudo/issues/42"
sudo chown -R "$WORKSPACE_ORIGINAL_OWNER_ID" /var/lib/jenkins/workspace
}
# Disable shellcheck SC2064 as we want to parse the original owner immediately.
# shellcheck disable=SC2064
trap_add cleanup_workspace EXIT
sudo chown -R jenkins /var/lib/jenkins/workspace
git config --global --add safe.directory /var/lib/jenkins/workspace
fi
if [[ "$BUILD_ENVIRONMENT" == *-bazel-* ]]; then
set -e
get_bazel
install_sccache_nvcc_for_bazel
# Leave 1 CPU free and use only up to 80% of memory to reduce the change of crashing
# the runner
BAZEL_MEM_LIMIT="--local_ram_resources=HOST_RAM*.8"
BAZEL_CPU_LIMIT="--local_cpu_resources=HOST_CPUS-1"
if [[ "$CUDA_VERSION" == "cpu" ]]; then
# Build torch, the Python module, and tests for CPU-only
tools/bazel build --config=no-tty "${BAZEL_MEM_LIMIT}" "${BAZEL_CPU_LIMIT}" --config=cpu-only :torch :torch/_C.so :all_tests
else
tools/bazel build --config=no-tty "${BAZEL_MEM_LIMIT}" "${BAZEL_CPU_LIMIT}" //...
fi
else
# check that setup.py would fail with bad arguments
echo "The next three invocations are expected to fail with invalid command error messages."
( ! get_exit_code python setup.py bad_argument )
( ! get_exit_code python setup.py clean] )
( ! get_exit_code python setup.py clean bad_argument )
if [[ "$BUILD_ENVIRONMENT" != *libtorch* ]]; then
# rocm builds fail when WERROR=1
# XLA test build fails when WERROR=1
# set only when building other architectures
# or building non-XLA tests.
if [[ "$BUILD_ENVIRONMENT" != *rocm* &&
"$BUILD_ENVIRONMENT" != *xla* ]]; then
if [[ "$BUILD_ENVIRONMENT" != *py3.8* ]]; then
# Install numpy-2.0 release candidate for builds
# Which should be backward compatible with Numpy-1.X
python -mpip install --pre numpy==2.0.0rc1
fi
WERROR=1 python setup.py clean
if [[ "$USE_SPLIT_BUILD" == "true" ]]; then
BUILD_LIBTORCH_WHL=1 BUILD_PYTHON_ONLY=0 python setup.py bdist_wheel
BUILD_LIBTORCH_WHL=0 BUILD_PYTHON_ONLY=1 python setup.py bdist_wheel --cmake
else
WERROR=1 python setup.py bdist_wheel
fi
else
python setup.py clean
if [[ "$BUILD_ENVIRONMENT" == *xla* ]]; then
source .ci/pytorch/install_cache_xla.sh
fi
if [[ "$USE_SPLIT_BUILD" == "true" ]]; then
echo "USE_SPLIT_BUILD cannot be used with xla or rocm"
exit 1
else
python setup.py bdist_wheel
fi
fi
pip_install_whl "$(echo dist/*.whl)"
# TODO: I'm not sure why, but somehow we lose verbose commands
set -x
assert_git_not_dirty
# Copy ninja build logs to dist folder
mkdir -p dist
if [ -f build/.ninja_log ]; then
cp build/.ninja_log dist
fi
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
# remove sccache wrappers post-build; runtime compilation of MIOpen kernels does not yet fully support them
sudo rm -f /opt/cache/bin/cc
sudo rm -f /opt/cache/bin/c++
sudo rm -f /opt/cache/bin/gcc
sudo rm -f /opt/cache/bin/g++
pushd /opt/rocm/llvm/bin
if [[ -d original ]]; then
sudo mv original/clang .
sudo mv original/clang++ .
fi
sudo rm -rf original
popd
fi
CUSTOM_TEST_ARTIFACT_BUILD_DIR=${CUSTOM_TEST_ARTIFACT_BUILD_DIR:-"build/custom_test_artifacts"}
CUSTOM_TEST_USE_ROCM=$([[ "$BUILD_ENVIRONMENT" == *rocm* ]] && echo "ON" || echo "OFF")
CUSTOM_TEST_MODULE_PATH="${PWD}/cmake/public"
mkdir -pv "${CUSTOM_TEST_ARTIFACT_BUILD_DIR}"
# Build custom operator tests.
CUSTOM_OP_BUILD="${CUSTOM_TEST_ARTIFACT_BUILD_DIR}/custom-op-build"
CUSTOM_OP_TEST="$PWD/test/custom_operator"
python --version
SITE_PACKAGES="$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')"
mkdir -p "$CUSTOM_OP_BUILD"
pushd "$CUSTOM_OP_BUILD"
cmake "$CUSTOM_OP_TEST" -DCMAKE_PREFIX_PATH="$SITE_PACKAGES/torch;$SITE_PACKAGES" -DPython_EXECUTABLE="$(which python)" \
-DCMAKE_MODULE_PATH="$CUSTOM_TEST_MODULE_PATH" -DUSE_ROCM="$CUSTOM_TEST_USE_ROCM"
make VERBOSE=1
popd
assert_git_not_dirty
# Build jit hook tests
JIT_HOOK_BUILD="${CUSTOM_TEST_ARTIFACT_BUILD_DIR}/jit-hook-build"
JIT_HOOK_TEST="$PWD/test/jit_hooks"
python --version
SITE_PACKAGES="$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')"
mkdir -p "$JIT_HOOK_BUILD"
pushd "$JIT_HOOK_BUILD"
cmake "$JIT_HOOK_TEST" -DCMAKE_PREFIX_PATH="$SITE_PACKAGES/torch;$SITE_PACKAGES" -DPython_EXECUTABLE="$(which python)" \
-DCMAKE_MODULE_PATH="$CUSTOM_TEST_MODULE_PATH" -DUSE_ROCM="$CUSTOM_TEST_USE_ROCM"
make VERBOSE=1
popd
assert_git_not_dirty
# Build custom backend tests.
CUSTOM_BACKEND_BUILD="${CUSTOM_TEST_ARTIFACT_BUILD_DIR}/custom-backend-build"
CUSTOM_BACKEND_TEST="$PWD/test/custom_backend"
python --version
mkdir -p "$CUSTOM_BACKEND_BUILD"
pushd "$CUSTOM_BACKEND_BUILD"
cmake "$CUSTOM_BACKEND_TEST" -DCMAKE_PREFIX_PATH="$SITE_PACKAGES/torch;$SITE_PACKAGES" -DPython_EXECUTABLE="$(which python)" \
-DCMAKE_MODULE_PATH="$CUSTOM_TEST_MODULE_PATH" -DUSE_ROCM="$CUSTOM_TEST_USE_ROCM"
make VERBOSE=1
popd
assert_git_not_dirty
else
# Test no-Python build
echo "Building libtorch"
# This is an attempt to mitigate flaky libtorch build OOM error. By default, the build parallelization
# is set to be the number of CPU minus 2. So, let's try a more conservative value here. A 4xlarge has
# 16 CPUs
MAX_JOBS=$(nproc --ignore=4)
export MAX_JOBS
# NB: Install outside of source directory (at the same level as the root
# pytorch folder) so that it doesn't get cleaned away prior to docker push.
BUILD_LIBTORCH_PY=$PWD/tools/build_libtorch.py
mkdir -p ../cpp-build/caffe2
pushd ../cpp-build/caffe2
WERROR=1 VERBOSE=1 DEBUG=1 python "$BUILD_LIBTORCH_PY"
popd
fi
fi
if [[ "$BUILD_ENVIRONMENT" != *libtorch* && "$BUILD_ENVIRONMENT" != *bazel* ]]; then
# export test times so that potential sharded tests that'll branch off this build will use consistent data
# don't do this for libtorch as libtorch is C++ only and thus won't have python tests run on its build
python tools/stats/export_test_times.py
fi
# snadampal: skipping it till sccache support added for aarch64
# https://github.com/pytorch/pytorch/issues/121559
if [[ "$BUILD_ENVIRONMENT" != *aarch64* ]]; then
print_sccache_stats
fi

View File

@ -1,59 +0,0 @@
#!/bin/bash
# Required environment variables:
# $BUILD_ENVIRONMENT (should be set by your Docker image)
if [[ "$BUILD_ENVIRONMENT" != *win-* ]]; then
# Save the absolute path in case later we chdir (as occurs in the gpu perf test)
script_dir="$( cd "$(dirname "${BASH_SOURCE[0]}")" || exit ; pwd -P )"
if which sccache > /dev/null; then
# Save sccache logs to file
sccache --stop-server > /dev/null 2>&1 || true
rm -f ~/sccache_error.log || true
function sccache_epilogue() {
echo "::group::Sccache Compilation Log"
echo '=================== sccache compilation log ==================='
python "$script_dir/print_sccache_log.py" ~/sccache_error.log 2>/dev/null || true
echo '=========== If your build fails, please take a look at the log above for possible reasons ==========='
sccache --show-stats
sccache --stop-server || true
echo "::endgroup::"
}
# Register the function here so that the error log can be printed even when
# sccache fails to start, i.e. timeout error
trap_add sccache_epilogue EXIT
if [[ -n "${SKIP_SCCACHE_INITIALIZATION:-}" ]]; then
# sccache --start-server seems to hang forever on self hosted runners for GHA
# so let's just go ahead and skip the --start-server altogether since it seems
# as though sccache still gets used even when the sscache server isn't started
# explicitly
echo "Skipping sccache server initialization, setting environment variables"
export SCCACHE_IDLE_TIMEOUT=0
export SCCACHE_ERROR_LOG=~/sccache_error.log
export RUST_LOG=sccache::server=error
elif [[ "${BUILD_ENVIRONMENT}" == *rocm* ]]; then
SCCACHE_ERROR_LOG=~/sccache_error.log SCCACHE_IDLE_TIMEOUT=0 sccache --start-server
else
# increasing SCCACHE_IDLE_TIMEOUT so that extension_backend_test.cpp can build after this PR:
# https://github.com/pytorch/pytorch/pull/16645
SCCACHE_ERROR_LOG=~/sccache_error.log SCCACHE_IDLE_TIMEOUT=0 RUST_LOG=sccache::server=error sccache --start-server
fi
# Report sccache stats for easier debugging. It's ok if this commands
# timeouts and fails on MacOS
sccache --zero-stats || true
fi
if which ccache > /dev/null; then
# Report ccache stats for easier debugging
ccache --zero-stats
ccache --show-stats
function ccache_epilogue() {
ccache --show-stats
}
trap_add ccache_epilogue EXIT
fi
fi

View File

@ -1,24 +0,0 @@
#!/bin/bash
# Common setup for all Jenkins scripts
# shellcheck source=./common_utils.sh
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
set -ex
# Required environment variables:
# $BUILD_ENVIRONMENT (should be set by your Docker image)
# Figure out which Python to use for ROCm
if [[ "${BUILD_ENVIRONMENT}" == *rocm* ]]; then
# HIP_PLATFORM is auto-detected by hipcc; unset to avoid build errors
unset HIP_PLATFORM
export PYTORCH_TEST_WITH_ROCM=1
# temporary to locate some kernel issues on the CI nodes
export HSAKMT_DEBUG_LEVEL=4
# improve rccl performance for distributed tests
export HSA_FORCE_FINE_GRAIN_PCIE=1
fi
# TODO: Renable libtorch testing for MacOS, see https://github.com/pytorch/pytorch/issues/62598
# shellcheck disable=SC2034
BUILD_TEST_LIBTORCH=0

View File

@ -1,238 +0,0 @@
#!/bin/bash
# Common util **functions** that can be sourced in other scripts.
# note: printf is used instead of echo to avoid backslash
# processing and to properly handle values that begin with a '-'.
log() { printf '%s\n' "$*"; }
error() { log "ERROR: $*" >&2; }
fatal() { error "$@"; exit 1; }
retry () {
"$@" || (sleep 10 && "$@") || (sleep 20 && "$@") || (sleep 40 && "$@")
}
# compositional trap taken from https://stackoverflow.com/a/7287873/23845
# appends a command to a trap
#
# - 1st arg: code to add
# - remaining args: names of traps to modify
#
trap_add() {
trap_add_cmd=$1; shift || fatal "${FUNCNAME[0]} usage error"
for trap_add_name in "$@"; do
trap -- "$(
# helper fn to get existing trap command from output
# of trap -p
extract_trap_cmd() { printf '%s\n' "$3"; }
# print existing trap command with newline
eval "extract_trap_cmd $(trap -p "${trap_add_name}")"
# print the new trap command
printf '%s\n' "${trap_add_cmd}"
)" "${trap_add_name}" \
|| fatal "unable to add to trap ${trap_add_name}"
done
}
# set the trace attribute for the above function. this is
# required to modify DEBUG or RETURN traps because functions don't
# inherit them unless the trace attribute is set
declare -f -t trap_add
function assert_git_not_dirty() {
# TODO: we should add an option to `build_amd.py` that reverts the repo to
# an unmodified state.
if [[ "$BUILD_ENVIRONMENT" != *rocm* ]] && [[ "$BUILD_ENVIRONMENT" != *xla* ]] ; then
git_status=$(git status --porcelain | grep -v '?? third_party' || true)
if [[ $git_status ]]; then
echo "Build left local git repository checkout dirty"
echo "git status --porcelain:"
echo "${git_status}"
exit 1
fi
fi
}
function pip_install_whl() {
# This is used to install PyTorch and other build artifacts wheel locally
# without using any network connection
# Convert the input arguments into an array
local args=("$@")
# Check if the first argument contains multiple paths separated by spaces
if [[ "${args[0]}" == *" "* ]]; then
# Split the string by spaces into an array
IFS=' ' read -r -a paths <<< "${args[0]}"
# Loop through each path and install individually
for path in "${paths[@]}"; do
echo "Installing $path"
python3 -mpip install --no-index --no-deps "$path"
done
else
# Loop through each argument and install individually
for path in "${args[@]}"; do
echo "Installing $path"
python3 -mpip install --no-index --no-deps "$path"
done
fi
}
function pip_install() {
# retry 3 times
# old versions of pip don't have the "--progress-bar" flag
pip install --progress-bar off "$@" || pip install --progress-bar off "$@" || pip install --progress-bar off "$@" ||\
pip install "$@" || pip install "$@" || pip install "$@"
}
function pip_uninstall() {
# uninstall 2 times
pip uninstall -y "$@" || pip uninstall -y "$@"
}
function get_exit_code() {
set +e
"$@"
retcode=$?
set -e
return $retcode
}
function get_bazel() {
# Download and use the cross-platform, dependency-free Python
# version of Bazelisk to fetch the platform specific version of
# Bazel to use from .bazelversion.
retry curl --location --output tools/bazel \
https://raw.githubusercontent.com/bazelbuild/bazelisk/v1.16.0/bazelisk.py
shasum --algorithm=1 --check \
<(echo 'd4369c3d293814d3188019c9f7527a948972d9f8 tools/bazel')
chmod u+x tools/bazel
}
# This function is bazel specific because of the bug
# in the bazel that requires some special paths massaging
# as a workaround. See
# https://github.com/bazelbuild/bazel/issues/10167
function install_sccache_nvcc_for_bazel() {
sudo mv /usr/local/cuda/bin/nvcc /usr/local/cuda/bin/nvcc-real
# Write the `/usr/local/cuda/bin/nvcc`
cat << EOF | sudo tee /usr/local/cuda/bin/nvcc
#!/bin/sh
if [ \$(env -u LD_PRELOAD ps -p \$PPID -o comm=) != sccache ]; then
exec sccache /usr/local/cuda/bin/nvcc "\$@"
else
exec external/local_cuda/cuda/bin/nvcc-real "\$@"
fi
EOF
sudo chmod +x /usr/local/cuda/bin/nvcc
}
function install_monkeytype {
# Install MonkeyType
pip_install MonkeyType
}
function get_pinned_commit() {
cat .github/ci_commit_pins/"${1}".txt
}
function install_torchaudio() {
local commit
commit=$(get_pinned_commit audio)
if [[ "$1" == "cuda" ]]; then
# TODO: This is better to be passed as a parameter from _linux-test workflow
# so that it can be consistent with what is set in build
TORCH_CUDA_ARCH_LIST="8.0;8.6" pip_install --no-use-pep517 --user "git+https://github.com/pytorch/audio.git@${commit}"
else
pip_install --no-use-pep517 --user "git+https://github.com/pytorch/audio.git@${commit}"
fi
}
function install_torchtext() {
local data_commit
local text_commit
data_commit=$(get_pinned_commit data)
text_commit=$(get_pinned_commit text)
pip_install --no-use-pep517 --user "git+https://github.com/pytorch/data.git@${data_commit}"
pip_install --no-use-pep517 --user "git+https://github.com/pytorch/text.git@${text_commit}"
}
function install_torchvision() {
local orig_preload
local commit
commit=$(get_pinned_commit vision)
orig_preload=${LD_PRELOAD}
if [ -n "${LD_PRELOAD}" ]; then
# Silence dlerror to work-around glibc ASAN bug, see https://sourceware.org/bugzilla/show_bug.cgi?id=27653#c9
echo 'char* dlerror(void) { return "";}'|gcc -fpic -shared -o "${HOME}/dlerror.so" -x c -
LD_PRELOAD=${orig_preload}:${HOME}/dlerror.so
fi
pip_install --no-use-pep517 --user "git+https://github.com/pytorch/vision.git@${commit}"
if [ -n "${LD_PRELOAD}" ]; then
LD_PRELOAD=${orig_preload}
fi
}
function install_tlparse() {
pip_install --user "tlparse==0.3.7"
PATH="$(python -m site --user-base)/bin:$PATH"
}
function install_torchrec_and_fbgemm() {
local torchrec_commit
torchrec_commit=$(get_pinned_commit torchrec)
local fbgemm_commit
fbgemm_commit=$(get_pinned_commit fbgemm)
pip_uninstall torchrec-nightly
pip_uninstall fbgemm-gpu-nightly
pip_install setuptools-git-versioning scikit-build pyre-extensions
# See https://github.com/pytorch/pytorch/issues/106971
CUDA_PATH=/usr/local/cuda-12.1 pip_install --no-use-pep517 --user "git+https://github.com/pytorch/FBGEMM.git@${fbgemm_commit}#egg=fbgemm-gpu&subdirectory=fbgemm_gpu"
pip_install --no-use-pep517 --user "git+https://github.com/pytorch/torchrec.git@${torchrec_commit}"
}
function clone_pytorch_xla() {
if [[ ! -d ./xla ]]; then
git clone --recursive --quiet https://github.com/pytorch/xla.git
pushd xla
# pin the xla hash so that we don't get broken by changes to xla
git checkout "$(cat ../.github/ci_commit_pins/xla.txt)"
git submodule sync
git submodule update --init --recursive
popd
fi
}
function checkout_install_torchbench() {
local commit
commit=$(get_pinned_commit torchbench)
git clone https://github.com/pytorch/benchmark torchbench
pushd torchbench
git checkout "$commit"
if [ "$1" ]; then
python install.py --continue_on_fail models "$@"
else
# Occasionally the installation may fail on one model but it is ok to continue
# to install and test other models
python install.py --continue_on_fail
fi
popd
}
function print_sccache_stats() {
echo 'PyTorch Build Statistics'
sccache --show-stats
if [[ -n "${OUR_GITHUB_JOB_ID}" ]]; then
sccache --show-stats --stats-format json | jq .stats \
> "sccache-stats-${BUILD_ENVIRONMENT}-${OUR_GITHUB_JOB_ID}.json"
else
echo "env var OUR_GITHUB_JOB_ID not set, will not write sccache stats to json"
fi
}

View File

@ -1,124 +0,0 @@
from datetime import datetime, timedelta
from tempfile import mkdtemp
from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID
temp_dir = mkdtemp()
print(temp_dir)
def genrsa(path):
key = rsa.generate_private_key(
public_exponent=65537,
key_size=2048,
)
with open(path, "wb") as f:
f.write(
key.private_bytes(
encoding=serialization.Encoding.PEM,
format=serialization.PrivateFormat.TraditionalOpenSSL,
encryption_algorithm=serialization.NoEncryption(),
)
)
return key
def create_cert(path, C, ST, L, O, key):
subject = issuer = x509.Name(
[
x509.NameAttribute(NameOID.COUNTRY_NAME, C),
x509.NameAttribute(NameOID.STATE_OR_PROVINCE_NAME, ST),
x509.NameAttribute(NameOID.LOCALITY_NAME, L),
x509.NameAttribute(NameOID.ORGANIZATION_NAME, O),
]
)
cert = (
x509.CertificateBuilder()
.subject_name(subject)
.issuer_name(issuer)
.public_key(key.public_key())
.serial_number(x509.random_serial_number())
.not_valid_before(datetime.utcnow())
.not_valid_after(
# Our certificate will be valid for 10 days
datetime.utcnow()
+ timedelta(days=10)
)
.add_extension(
x509.BasicConstraints(ca=True, path_length=None),
critical=True,
)
.sign(key, hashes.SHA256())
)
# Write our certificate out to disk.
with open(path, "wb") as f:
f.write(cert.public_bytes(serialization.Encoding.PEM))
return cert
def create_req(path, C, ST, L, O, key):
csr = (
x509.CertificateSigningRequestBuilder()
.subject_name(
x509.Name(
[
# Provide various details about who we are.
x509.NameAttribute(NameOID.COUNTRY_NAME, C),
x509.NameAttribute(NameOID.STATE_OR_PROVINCE_NAME, ST),
x509.NameAttribute(NameOID.LOCALITY_NAME, L),
x509.NameAttribute(NameOID.ORGANIZATION_NAME, O),
]
)
)
.sign(key, hashes.SHA256())
)
with open(path, "wb") as f:
f.write(csr.public_bytes(serialization.Encoding.PEM))
return csr
def sign_certificate_request(path, csr_cert, ca_cert, private_ca_key):
cert = (
x509.CertificateBuilder()
.subject_name(csr_cert.subject)
.issuer_name(ca_cert.subject)
.public_key(csr_cert.public_key())
.serial_number(x509.random_serial_number())
.not_valid_before(datetime.utcnow())
.not_valid_after(
# Our certificate will be valid for 10 days
datetime.utcnow()
+ timedelta(days=10)
# Sign our certificate with our private key
)
.sign(private_ca_key, hashes.SHA256())
)
with open(path, "wb") as f:
f.write(cert.public_bytes(serialization.Encoding.PEM))
return cert
ca_key = genrsa(temp_dir + "/ca.key")
ca_cert = create_cert(
temp_dir + "/ca.pem",
"US",
"New York",
"New York",
"Gloo Certificate Authority",
ca_key,
)
pkey = genrsa(temp_dir + "/pkey.key")
csr = create_req(
temp_dir + "/csr.csr",
"US",
"California",
"San Francisco",
"Gloo Testing Company",
pkey,
)
cert = sign_certificate_request(temp_dir + "/cert.pem", csr, ca_cert, ca_key)

View File

@ -1,6 +0,0 @@
#!/bin/bash
# shellcheck source=./common.sh
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
docker build -t pytorch .

View File

@ -1,9 +0,0 @@
#!/bin/bash
# shellcheck source=./common.sh
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Testing pytorch docs"
cd docs
TERM=vt100 make doctest

View File

@ -1,40 +0,0 @@
#!/bin/bash
# This is where the local pytorch install in the docker image is located
pt_checkout="/var/lib/jenkins/workspace"
source "$pt_checkout/.ci/pytorch/common_utils.sh"
echo "functorch_doc_push_script.sh: Invoked with $*"
set -ex
version=${DOCS_VERSION:-nightly}
echo "version: $version"
# Build functorch docs
pushd $pt_checkout/functorch/docs
make html
popd
git clone https://github.com/pytorch/functorch -b gh-pages --depth 1 functorch_ghpages
pushd functorch_ghpages
if [ "$version" == "main" ]; then
version=nightly
fi
git rm -rf "$version" || true
mv "$pt_checkout/functorch/docs/build/html" "$version"
git add "$version" || true
git status
git config user.email "soumith+bot@pytorch.org"
git config user.name "pytorchbot"
# If there aren't changes, don't make a commit; push is no-op
git commit -m "Generate Python docs from pytorch/pytorch@${GITHUB_SHA}" || true
git status
if [[ "${WITH_PUSH:-}" == true ]]; then
git push -u origin gh-pages
fi
popd

View File

@ -1,37 +0,0 @@
#!/bin/bash
# Script for installing sccache on the xla build job, which uses xla's docker
# image and doesn't have sccache installed on it. This is mostly copied from
# .ci/docker/install_cache.sh. Changes are: removing checks that will always
# return the same thing, ex checks for for rocm, CUDA, and changing the path
# where sccache is installed, and not changing /etc/environment.
set -ex
install_binary() {
echo "Downloading sccache binary from S3 repo"
curl --retry 3 https://s3.amazonaws.com/ossci-linux/sccache -o /tmp/cache/bin/sccache
}
mkdir -p /tmp/cache/bin
mkdir -p /tmp/cache/lib
export PATH="/tmp/cache/bin:$PATH"
install_binary
chmod a+x /tmp/cache/bin/sccache
function write_sccache_stub() {
# Unset LD_PRELOAD for ps because of asan + ps issues
# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90589
# shellcheck disable=SC2086
# shellcheck disable=SC2059
printf "#!/bin/sh\nif [ \$(env -u LD_PRELOAD ps -p \$PPID -o comm=) != sccache ]; then\n exec sccache $(which $1) \"\$@\"\nelse\n exec $(which $1) \"\$@\"\nfi" > "/tmp/cache/bin/$1"
chmod a+x "/tmp/cache/bin/$1"
}
write_sccache_stub cc
write_sccache_stub c++
write_sccache_stub gcc
write_sccache_stub g++
write_sccache_stub clang
write_sccache_stub clang++

View File

@ -1,92 +0,0 @@
#!/bin/bash
# shellcheck disable=SC2034
# shellcheck source=./macos-common.sh
source "$(dirname "${BASH_SOURCE[0]}")/macos-common.sh"
# shellcheck source=./common-build.sh
source "$(dirname "${BASH_SOURCE[0]}")/common-build.sh"
# Build PyTorch
if [ -z "${CI}" ]; then
export DEVELOPER_DIR=/Applications/Xcode9.app/Contents/Developer
fi
# This helper function wraps calls to binaries with sccache, but only if they're not already wrapped with sccache.
# For example, `clang` will be `sccache clang`, but `sccache clang` will not become `sccache sccache clang`.
# The way this is done is by detecting the command of the parent pid of the current process and checking whether
# that is sccache, and wrapping sccache around the process if its parent were not already sccache.
function write_sccache_stub() {
output=$1
binary=$(basename "${output}")
printf "#!/bin/sh\nif [ \$(ps auxc \$(ps auxc -o ppid \$\$ | grep \$\$ | rev | cut -d' ' -f1 | rev) | tr '\\\\n' ' ' | rev | cut -d' ' -f2 | rev) != sccache ]; then\n exec sccache %s \"\$@\"\nelse\n exec %s \"\$@\"\nfi" "$(which "${binary}")" "$(which "${binary}")" > "${output}"
chmod a+x "${output}"
}
if which sccache > /dev/null; then
# Create temp directory for sccache shims
tmp_dir=$(mktemp -d)
trap 'rm -rfv ${tmp_dir}' EXIT
write_sccache_stub "${tmp_dir}/clang++"
write_sccache_stub "${tmp_dir}/clang"
export PATH="${tmp_dir}:$PATH"
fi
cross_compile_arm64() {
# Cross compilation for arm64
# Explicitly set USE_DISTRIBUTED=0 to align with the default build config on mac. This also serves as the sole CI config that tests
# that building with USE_DISTRIBUTED=0 works at all. See https://github.com/pytorch/pytorch/issues/86448
USE_DISTRIBUTED=0 CMAKE_OSX_ARCHITECTURES=arm64 MACOSX_DEPLOYMENT_TARGET=11.0 USE_MKLDNN=OFF USE_QNNPACK=OFF WERROR=1 BUILD_TEST=OFF USE_PYTORCH_METAL=1 python setup.py bdist_wheel
}
compile_arm64() {
# Compilation for arm64
# TODO: Compile with OpenMP support (but this causes CI regressions as cross-compilation were done with OpenMP disabled)
USE_DISTRIBUTED=0 USE_OPENMP=1 MACOSX_DEPLOYMENT_TARGET=11.0 WERROR=1 BUILD_TEST=OFF USE_PYTORCH_METAL=1 python setup.py bdist_wheel
}
compile_x86_64() {
USE_DISTRIBUTED=0 WERROR=1 python setup.py bdist_wheel --plat-name=macosx_10_9_x86_64
}
build_lite_interpreter() {
echo "Testing libtorch (lite interpreter)."
CPP_BUILD="$(pwd)/../cpp_build"
# Ensure the removal of the tmp directory
trap 'rm -rfv ${CPP_BUILD}' EXIT
rm -rf "${CPP_BUILD}"
mkdir -p "${CPP_BUILD}/caffe2"
# It looks libtorch need to be built in "${CPP_BUILD}/caffe2 folder.
BUILD_LIBTORCH_PY=$PWD/tools/build_libtorch.py
pushd "${CPP_BUILD}/caffe2" || exit
VERBOSE=1 DEBUG=1 python "${BUILD_LIBTORCH_PY}"
popd || exit
"${CPP_BUILD}/caffe2/build/bin/test_lite_interpreter_runtime"
}
print_cmake_info
if [[ ${BUILD_ENVIRONMENT} = *arm64* ]]; then
if [[ $(uname -m) == "arm64" ]]; then
compile_arm64
else
cross_compile_arm64
fi
elif [[ ${BUILD_ENVIRONMENT} = *lite-interpreter* ]]; then
export BUILD_LITE_INTERPRETER=1
build_lite_interpreter
else
compile_x86_64
fi
if which sccache > /dev/null; then
print_sccache_stats
fi
python tools/stats/export_test_times.py
assert_git_not_dirty

View File

@ -1,33 +0,0 @@
#!/bin/bash
# Common prelude for macos-build.sh and macos-test.sh
# shellcheck source=./common.sh
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
sysctl -a | grep machdep.cpu
# These are required for both the build job and the test job.
# In the latter to test cpp extensions.
export MACOSX_DEPLOYMENT_TARGET=11.1
export CXX=clang++
export CC=clang
print_cmake_info() {
CMAKE_EXEC=$(which cmake)
echo "$CMAKE_EXEC"
CONDA_INSTALLATION_DIR=$(dirname "$CMAKE_EXEC")
# Print all libraries under cmake rpath for debugging
ls -la "$CONDA_INSTALLATION_DIR/../lib"
export CMAKE_EXEC
# Explicitly add conda env lib folder to cmake rpath to address the flaky issue
# where cmake dependencies couldn't be found. This seems to point to how conda
# links $CMAKE_EXEC to its package cache when cloning a new environment
install_name_tool -add_rpath @executable_path/../lib "${CMAKE_EXEC}" || true
# Adding the rpath will invalidate cmake signature, so signing it again here
# to trust the executable. EXC_BAD_ACCESS (SIGKILL (Code Signature Invalid))
# with an exit code 137 otherwise
codesign -f -s - "${CMAKE_EXEC}" || true
}

View File

@ -1,169 +0,0 @@
#!/bin/bash
# shellcheck disable=SC2034
# shellcheck source=./macos-common.sh
source "$(dirname "${BASH_SOURCE[0]}")/macos-common.sh"
if [[ -n "$CONDA_ENV" ]]; then
# Use binaries under conda environment
export PATH="$CONDA_ENV/bin":$PATH
fi
# Test that OpenMP is enabled for non-arm64 build
if [[ ${BUILD_ENVIRONMENT} != *arm64* ]]; then
pushd test
if [[ ! $(python -c "import torch; print(int(torch.backends.openmp.is_available()))") == "1" ]]; then
echo "Build should have OpenMP enabled, but torch.backends.openmp.is_available() is False"
exit 1
fi
popd
fi
setup_test_python() {
# The CircleCI worker hostname doesn't resolve to an address.
# This environment variable makes ProcessGroupGloo default to
# using the address associated with the loopback interface.
export GLOO_SOCKET_IFNAME=lo0
echo "Ninja version: $(ninja --version)"
echo "Python version: $(which python) ($(python --version))"
# Increase default limit on open file handles from 256 to 1024
ulimit -n 1024
}
test_python_all() {
setup_test_python
time python test/run_test.py --verbose --exclude-jit-executor
assert_git_not_dirty
}
test_python_shard() {
if [[ -z "$NUM_TEST_SHARDS" ]]; then
echo "NUM_TEST_SHARDS must be defined to run a Python test shard"
exit 1
fi
setup_test_python
time python test/run_test.py --verbose --exclude-jit-executor --exclude-distributed-tests --shard "$1" "$NUM_TEST_SHARDS"
assert_git_not_dirty
}
test_libtorch() {
# C++ API
if [[ "$BUILD_TEST_LIBTORCH" == "1" ]]; then
# NB: Install outside of source directory (at the same level as the root
# pytorch folder) so that it doesn't get cleaned away prior to docker push.
# But still clean it before we perform our own build.
echo "Testing libtorch"
CPP_BUILD="$PWD/../cpp-build"
rm -rf "$CPP_BUILD"
mkdir -p "$CPP_BUILD"/caffe2
BUILD_LIBTORCH_PY=$PWD/tools/build_libtorch.py
pushd "$CPP_BUILD"/caffe2
VERBOSE=1 DEBUG=1 python "$BUILD_LIBTORCH_PY"
popd
MNIST_DIR="${PWD}/test/cpp/api/mnist"
python tools/download_mnist.py --quiet -d "${MNIST_DIR}"
# Unfortunately it seems like the test can't load from miniconda3
# without these paths being set
export DYLD_LIBRARY_PATH="$DYLD_LIBRARY_PATH:$PWD/miniconda3/lib"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$PWD/miniconda3/lib"
TORCH_CPP_TEST_MNIST_PATH="${MNIST_DIR}" CPP_TESTS_DIR="${CPP_BUILD}/caffe2/bin" python test/run_test.py --cpp --verbose -i cpp/test_api
assert_git_not_dirty
fi
}
test_custom_backend() {
print_cmake_info
echo "Testing custom backends"
pushd test/custom_backend
rm -rf build && mkdir build
pushd build
SITE_PACKAGES="$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')"
CMAKE_PREFIX_PATH="$SITE_PACKAGES/torch" "${CMAKE_EXEC}" ..
make VERBOSE=1
popd
# Run Python tests and export a lowered module.
python test_custom_backend.py -v
python backend.py --export-module-to=model.pt
# Run C++ tests using the exported module.
build/test_custom_backend ./model.pt
rm -f ./model.pt
popd
assert_git_not_dirty
}
test_custom_script_ops() {
print_cmake_info
echo "Testing custom script operators"
pushd test/custom_operator
# Build the custom operator library.
rm -rf build && mkdir build
pushd build
SITE_PACKAGES="$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')"
CMAKE_PREFIX_PATH="$SITE_PACKAGES/torch" "${CMAKE_EXEC}" ..
make VERBOSE=1
popd
# Run tests Python-side and export a script module.
python test_custom_ops.py -v
python model.py --export-script-module=model.pt
# Run tests C++-side and load the exported script module.
build/test_custom_ops ./model.pt
popd
assert_git_not_dirty
}
test_jit_hooks() {
print_cmake_info
echo "Testing jit hooks in cpp"
pushd test/jit_hooks
# Build the custom operator library.
rm -rf build && mkdir build
pushd build
SITE_PACKAGES="$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')"
CMAKE_PREFIX_PATH="$SITE_PACKAGES/torch" "${CMAKE_EXEC}" ..
make VERBOSE=1
popd
# Run tests Python-side and export a script module.
python model.py --export-script-module=model
# Run tests C++-side and load the exported script module.
build/test_jit_hooks ./model
popd
assert_git_not_dirty
}
install_tlparse
if [[ $NUM_TEST_SHARDS -gt 1 ]]; then
test_python_shard "${SHARD_NUMBER}"
if [[ "${SHARD_NUMBER}" == 1 ]]; then
test_libtorch
test_custom_script_ops
elif [[ "${SHARD_NUMBER}" == 2 ]]; then
test_jit_hooks
test_custom_backend
fi
else
test_python_all
test_libtorch
test_custom_script_ops
test_jit_hooks
test_custom_backend
fi

View File

@ -1,62 +0,0 @@
#!/bin/bash
# Required environment variable: $BUILD_ENVIRONMENT
# (This is set by default in the Docker images we build, so you don't
# need to set it yourself.
# shellcheck source=./common.sh
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Testing pytorch"
time python test/run_test.py --include test_cuda_multigpu test_cuda_primary_ctx --verbose
# Disabling tests to see if they solve timeout issues; see https://github.com/pytorch/pytorch/issues/70015
# python tools/download_mnist.py --quiet -d test/cpp/api/mnist
# OMP_NUM_THREADS=2 TORCH_CPP_TEST_MNIST_PATH="test/cpp/api/mnist" build/bin/test_api
time python test/run_test.py --verbose -i distributed/test_c10d_common
time python test/run_test.py --verbose -i distributed/test_c10d_gloo
time python test/run_test.py --verbose -i distributed/test_c10d_nccl
time python test/run_test.py --verbose -i distributed/test_c10d_spawn_gloo
time python test/run_test.py --verbose -i distributed/test_c10d_spawn_nccl
time python test/run_test.py --verbose -i distributed/test_cuda_p2p
time python test/run_test.py --verbose -i distributed/test_store
time python test/run_test.py --verbose -i distributed/test_pg_wrapper
time python test/run_test.py --verbose -i distributed/rpc/cuda/test_tensorpipe_agent
# FSDP tests
for f in test/distributed/fsdp/*.py ; do time python test/run_test.py --verbose -i "${f#*/}" ; done
# ShardedTensor tests
time python test/run_test.py --verbose -i distributed/checkpoint/test_checkpoint
time python test/run_test.py --verbose -i distributed/checkpoint/test_file_system_checkpoint
time python test/run_test.py --verbose -i distributed/_shard/sharding_spec/test_sharding_spec
time python test/run_test.py --verbose -i distributed/_shard/sharding_plan/test_sharding_plan
time python test/run_test.py --verbose -i distributed/_shard/sharded_tensor/test_sharded_tensor
time python test/run_test.py --verbose -i distributed/_shard/sharded_tensor/test_sharded_tensor_reshard
# functional collective tests
time python test/run_test.py --verbose -i distributed/test_functional_api
# DTensor tests
time python test/run_test.py --verbose -i distributed/_tensor/test_random_ops
time python test/run_test.py --verbose -i distributed/_tensor/test_dtensor_compile
# DeviceMesh test
time python test/run_test.py --verbose -i distributed/test_device_mesh
# DTensor/TP tests
time python test/run_test.py --verbose -i distributed/tensor/parallel/test_ddp_2d_parallel
time python test/run_test.py --verbose -i distributed/tensor/parallel/test_fsdp_2d_parallel
time python test/run_test.py --verbose -i distributed/tensor/parallel/test_tp_examples
time python test/run_test.py --verbose -i distributed/tensor/parallel/test_tp_random_state
# FSDP2 tests
time python test/run_test.py --verbose -i distributed/_composable/fsdp/test_fully_shard_training -- -k test_2d_mlp_with_nd_mesh
# Pipelining composability tests
time python test/run_test.py --verbose -i distributed/pipelining/test_composability.py
# Other tests
time python test/run_test.py --verbose -i test_cuda_primary_ctx
time python test/run_test.py --verbose -i test_optim -- -k test_forloop_goes_right_direction_multigpu
time python test/run_test.py --verbose -i test_optim -- -k test_mixed_device_dtype
time python test/run_test.py --verbose -i test_foreach -- -k test_tensors_grouping
assert_git_not_dirty

View File

@ -1,90 +0,0 @@
import argparse
import json
import math
import sys
parser = argparse.ArgumentParser()
parser.add_argument(
"--test-name", dest="test_name", action="store", required=True, help="test name"
)
parser.add_argument(
"--sample-stats",
dest="sample_stats",
action="store",
required=True,
help="stats from sample",
)
parser.add_argument(
"--update",
action="store_true",
help="whether to update baseline using stats from sample",
)
args = parser.parse_args()
test_name = args.test_name
if "cpu" in test_name:
backend = "cpu"
elif "gpu" in test_name:
backend = "gpu"
data_file_path = f"../{backend}_runtime.json"
with open(data_file_path) as data_file:
data = json.load(data_file)
if test_name in data:
mean = float(data[test_name]["mean"])
sigma = float(data[test_name]["sigma"])
else:
# Let the test pass if baseline number doesn't exist
mean = sys.maxsize
sigma = 0.001
print("population mean: ", mean)
print("population sigma: ", sigma)
# Let the test pass if baseline number is NaN (which happened in
# the past when we didn't have logic for catching NaN numbers)
if math.isnan(mean) or math.isnan(sigma):
mean = sys.maxsize
sigma = 0.001
sample_stats_data = json.loads(args.sample_stats)
sample_mean = float(sample_stats_data["mean"])
sample_sigma = float(sample_stats_data["sigma"])
print("sample mean: ", sample_mean)
print("sample sigma: ", sample_sigma)
if math.isnan(sample_mean):
raise Exception("""Error: sample mean is NaN""") # noqa: TRY002
elif math.isnan(sample_sigma):
raise Exception("""Error: sample sigma is NaN""") # noqa: TRY002
z_value = (sample_mean - mean) / sigma
print("z-value: ", z_value)
if z_value >= 3:
raise Exception( # noqa: TRY002
f"""\n
z-value >= 3, there is high chance of perf regression.\n
To reproduce this regression, run
`cd .ci/pytorch/perf_test/ && bash {test_name}.sh` on your local machine
and compare the runtime before/after your code change.
"""
)
else:
print("z-value < 3, no perf regression detected.")
if args.update:
print("We will use these numbers as new baseline.")
new_data_file_path = f"../new_{backend}_runtime.json"
with open(new_data_file_path) as new_data_file:
new_data = json.load(new_data_file)
new_data[test_name] = {}
new_data[test_name]["mean"] = sample_mean
new_data[test_name]["sigma"] = max(sample_sigma, sample_mean * 0.1)
with open(new_data_file_path, "w") as new_data_file:
json.dump(new_data, new_data_file, indent=4)

View File

@ -1,13 +0,0 @@
import json
import sys
data_file_path = sys.argv[1]
commit_hash = sys.argv[2]
with open(data_file_path) as data_file:
data = json.load(data_file)
data["commit"] = commit_hash
with open(data_file_path, "w") as data_file:
json.dump(data, data_file)

View File

@ -1,17 +0,0 @@
import sys
log_file_path = sys.argv[1]
with open(log_file_path) as f:
lines = f.readlines()
for line in lines:
# Ignore errors from CPU instruction set, symbol existing testing,
# or compilation error formatting
ignored_keywords = [
"src.c",
"CheckSymbolExists.c",
"test_compilation_error_formatting",
]
if all(keyword not in line for keyword in ignored_keywords):
print(line)

View File

@ -1,144 +0,0 @@
#!/bin/bash
# This is where the local pytorch install in the docker image is located
pt_checkout="/var/lib/jenkins/workspace"
source "$pt_checkout/.ci/pytorch/common_utils.sh"
echo "python_doc_push_script.sh: Invoked with $*"
set -ex
# for statements like ${1:-${DOCS_INSTALL_PATH:-docs/}}
# the order of operations goes:
# 1. Check if there's an argument $1
# 2. If no argument check for environment var DOCS_INSTALL_PATH
# 3. If no environment var fall back to default 'docs/'
# NOTE: It might seem weird to gather the second argument before gathering the first argument
# but since DOCS_INSTALL_PATH can be derived from DOCS_VERSION it's probably better to
# try and gather it first, just so we don't potentially break people who rely on this script
# Argument 2: What version of the docs we are building.
version="${2:-${DOCS_VERSION:-main}}"
if [ -z "$version" ]; then
echo "error: python_doc_push_script.sh: version (arg2) not specified"
exit 1
fi
# Argument 1: Where to copy the built documentation to
# (pytorch_docs/$install_path)
install_path="${1:-${DOCS_INSTALL_PATH:-${DOCS_VERSION}}}"
if [ -z "$install_path" ]; then
echo "error: python_doc_push_script.sh: install_path (arg1) not specified"
exit 1
fi
is_main_doc=false
if [ "$version" == "main" ]; then
is_main_doc=true
fi
# Argument 3: The branch to push to. Usually is "site"
branch="${3:-${DOCS_BRANCH:-site}}"
if [ -z "$branch" ]; then
echo "error: python_doc_push_script.sh: branch (arg3) not specified"
exit 1
fi
echo "install_path: $install_path version: $version"
build_docs () {
set +e
set -o pipefail
make "$1" 2>&1 | tee /tmp/docs_build.txt
code=$?
if [ $code -ne 0 ]; then
set +x
echo =========================
grep "WARNING:" /tmp/docs_build.txt
echo =========================
echo Docs build failed. If the failure is not clear, scan back in the log
echo for any WARNINGS or for the line "build finished with problems"
echo "(tried to echo the WARNINGS above the ==== line)"
echo =========================
fi
set -ex
return $code
}
git clone https://github.com/pytorch/docs pytorch_docs -b "$branch" --depth 1
pushd pytorch_docs
export LC_ALL=C
export PATH=/opt/conda/bin:$PATH
if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
export PATH=/opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:$PATH
fi
rm -rf pytorch || true
# Get all the documentation sources, put them in one place
pushd "$pt_checkout"
pushd docs
# Build the docs
if [ "$is_main_doc" = true ]; then
build_docs html || exit $?
make coverage
# Now we have the coverage report, we need to make sure it is empty.
# Count the number of lines in the file and turn that number into a variable
# $lines. The `cut -f1 ...` is to only parse the number, not the filename
# Skip the report header by subtracting 2: the header will be output even if
# there are no undocumented items.
#
# Also: see docs/source/conf.py for "coverage_ignore*" items, which should
# be documented then removed from there.
lines=$(wc -l build/coverage/python.txt 2>/dev/null |cut -f1 -d' ')
undocumented=$((lines - 2))
if [ $undocumented -lt 0 ]; then
echo coverage output not found
exit 1
elif [ $undocumented -gt 0 ]; then
echo undocumented objects found:
cat build/coverage/python.txt
echo "Make sure you've updated relevant .rsts in docs/source!"
echo "You can reproduce locally by running 'cd docs && make coverage && cat build/coverage/python.txt'"
exit 1
fi
else
# skip coverage, format for stable or tags
build_docs html-stable || exit $?
fi
# Move them into the docs repo
popd
popd
git rm -rf "$install_path" || true
mv "$pt_checkout/docs/build/html" "$install_path"
# Prevent Google from indexing $install_path/_modules. This folder contains
# generated source files.
# NB: the following only works on gnu sed. The sed shipped with mac os is different.
# One can `brew install gnu-sed` on a mac and then use "gsed" instead of "sed".
find "$install_path/_modules" -name "*.html" -print0 | xargs -0 sed -i '/<head>/a \ \ <meta name="robots" content="noindex">'
git add "$install_path" || true
git status
git config user.email "soumith+bot@pytorch.org"
git config user.name "pytorchbot"
# If there aren't changes, don't make a commit; push is no-op
git commit -m "Generate Python docs from pytorch/pytorch@${GITHUB_SHA}" || true
git status
if [[ "${WITH_PUSH:-}" == true ]]; then
# push to a temp branch first to trigger CLA check and satisfy branch protections
git push -u origin HEAD:pytorchbot/temp-branch-py -f
git push -u origin HEAD^:pytorchbot/base -f
sleep 30
git push -u origin "${branch}"
fi
popd

View File

@ -1,71 +0,0 @@
#!/bin/bash
SCRIPT_PARENT_DIR=$(dirname "${BASH_SOURCE[0]}")
# shellcheck source=.ci/pytorch/common.sh
source "$SCRIPT_PARENT_DIR/common.sh"
cd .ci/pytorch/perf_test
echo "Running CPU perf test for PyTorch..."
pip install -q awscli
# Set multipart_threshold to be sufficiently high, so that `aws s3 cp` is not a multipart read
# More info at https://github.com/aws/aws-cli/issues/2321
aws configure set default.s3.multipart_threshold 5GB
UPSTREAM_DEFAULT_BRANCH="$(git remote show https://github.com/pytorch/pytorch.git | awk '/HEAD branch/ {print $NF}')"
if [[ "$COMMIT_SOURCE" == "$UPSTREAM_DEFAULT_BRANCH" ]]; then
# Get current default branch commit hash
DEFAULT_BRANCH_COMMIT_ID=$(git log --format="%H" -n 1)
export DEFAULT_BRANCH_COMMIT_ID
fi
# Find the default branch commit to test against
git remote add upstream https://github.com/pytorch/pytorch.git
git fetch upstream
IFS=$'\n'
while IFS='' read -r commit_id; do
if aws s3 ls s3://ossci-perf-test/pytorch/cpu_runtime/"${commit_id}".json; then
LATEST_TESTED_COMMIT=${commit_id}
break
fi
done < <(git rev-list upstream/"$UPSTREAM_DEFAULT_BRANCH")
aws s3 cp s3://ossci-perf-test/pytorch/cpu_runtime/"${LATEST_TESTED_COMMIT}".json cpu_runtime.json
if [[ "$COMMIT_SOURCE" == "$UPSTREAM_DEFAULT_BRANCH" ]]; then
# Prepare new baseline file
cp cpu_runtime.json new_cpu_runtime.json
python update_commit_hash.py new_cpu_runtime.json "${DEFAULT_BRANCH_COMMIT_ID}"
fi
# Include tests
# shellcheck source=./perf_test/test_cpu_speed_mini_sequence_labeler.sh
. ./test_cpu_speed_mini_sequence_labeler.sh
# shellcheck source=./perf_test/test_cpu_speed_mnist.sh
. ./test_cpu_speed_mnist.sh
# shellcheck source=./perf_test/test_cpu_speed_torch.sh
. ./test_cpu_speed_torch.sh
# shellcheck source=./perf_test/test_cpu_speed_torch_tensor.sh
. ./test_cpu_speed_torch_tensor.sh
# Run tests
export TEST_MODE="compare_with_baseline"
if [[ "$COMMIT_SOURCE" == "$UPSTREAM_DEFAULT_BRANCH" ]]; then
export TEST_MODE="compare_and_update"
fi
# Operator tests
run_test test_cpu_speed_torch ${TEST_MODE}
run_test test_cpu_speed_torch_tensor ${TEST_MODE}
# Sample model tests
run_test test_cpu_speed_mini_sequence_labeler 20 ${TEST_MODE}
run_test test_cpu_speed_mnist 20 ${TEST_MODE}
if [[ "$COMMIT_SOURCE" == "$UPSTREAM_DEFAULT_BRANCH" ]]; then
# This could cause race condition if we are testing the same default branch commit twice,
# but the chance of them executing this line at the same time is low.
aws s3 cp new_cpu_runtime.json s3://ossci-perf-test/pytorch/cpu_runtime/"${DEFAULT_BRANCH_COMMIT_ID}".json --acl public-read
fi

View File

@ -1,76 +0,0 @@
#!/bin/bash
# shellcheck source=./common.sh
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
pushd .ci/pytorch/perf_test
echo "Running GPU perf test for PyTorch..."
# Trying to uninstall PyYAML can cause problem. Workaround according to:
# https://github.com/pypa/pip/issues/5247#issuecomment-415571153
pip install -q awscli --ignore-installed PyYAML
# Set multipart_threshold to be sufficiently high, so that `aws s3 cp` is not a multipart read
# More info at https://github.com/aws/aws-cli/issues/2321
aws configure set default.s3.multipart_threshold 5GB
UPSTREAM_DEFAULT_BRANCH="$(git remote show https://github.com/pytorch/pytorch.git | awk '/HEAD branch/ {print $NF}')"
if [[ "$COMMIT_SOURCE" == "$UPSTREAM_DEFAULT_BRANCH" ]]; then
# Get current default branch commit hash
DEFAULT_BRANCH_COMMIT_ID=$(git log --format="%H" -n 1)
export DEFAULT_BRANCH_COMMIT_ID
fi
# Find the default branch commit to test against
git remote add upstream https://github.com/pytorch/pytorch.git
git fetch upstream
IFS=$'\n'
while IFS='' read -r commit_id; do
if aws s3 ls s3://ossci-perf-test/pytorch/gpu_runtime/"${commit_id}".json; then
LATEST_TESTED_COMMIT=${commit_id}
break
fi
done < <(git rev-list upstream/"$UPSTREAM_DEFAULT_BRANCH")
aws s3 cp s3://ossci-perf-test/pytorch/gpu_runtime/"${LATEST_TESTED_COMMIT}".json gpu_runtime.json
if [[ "$COMMIT_SOURCE" == "$UPSTREAM_DEFAULT_BRANCH" ]]; then
# Prepare new baseline file
cp gpu_runtime.json new_gpu_runtime.json
python update_commit_hash.py new_gpu_runtime.json "${DEFAULT_BRANCH_COMMIT_ID}"
fi
# Include tests
# shellcheck source=./perf_test/test_gpu_speed_mnist.sh
. ./test_gpu_speed_mnist.sh
# shellcheck source=./perf_test/test_gpu_speed_word_language_model.sh
. ./test_gpu_speed_word_language_model.sh
# shellcheck source=./perf_test/test_gpu_speed_cudnn_lstm.sh
. ./test_gpu_speed_cudnn_lstm.sh
# shellcheck source=./perf_test/test_gpu_speed_lstm.sh
. ./test_gpu_speed_lstm.sh
# shellcheck source=./perf_test/test_gpu_speed_mlstm.sh
. ./test_gpu_speed_mlstm.sh
# Run tests
if [[ "$COMMIT_SOURCE" == "$UPSTREAM_DEFAULT_BRANCH" ]]; then
run_test test_gpu_speed_mnist 20 compare_and_update
run_test test_gpu_speed_word_language_model 20 compare_and_update
run_test test_gpu_speed_cudnn_lstm 20 compare_and_update
run_test test_gpu_speed_lstm 20 compare_and_update
run_test test_gpu_speed_mlstm 20 compare_and_update
else
run_test test_gpu_speed_mnist 20 compare_with_baseline
run_test test_gpu_speed_word_language_model 20 compare_with_baseline
run_test test_gpu_speed_cudnn_lstm 20 compare_with_baseline
run_test test_gpu_speed_lstm 20 compare_with_baseline
run_test test_gpu_speed_mlstm 20 compare_with_baseline
fi
if [[ "$COMMIT_SOURCE" == "$UPSTREAM_DEFAULT_BRANCH" ]]; then
# This could cause race condition if we are testing the same default branch commit twice,
# but the chance of them executing this line at the same time is low.
aws s3 cp new_gpu_runtime.json s3://ossci-perf-test/pytorch/gpu_runtime/"${DEFAULT_BRANCH_COMMIT_ID}".json --acl public-read
fi
popd

File diff suppressed because it is too large Load Diff

View File

@ -1,47 +0,0 @@
#!/bin/bash
# If you want to rebuild, run this with REBUILD=1
# If you want to build with CUDA, run this with USE_CUDA=1
# If you want to build without CUDA, run this with USE_CUDA=0
if [ ! -f setup.py ]; then
echo "ERROR: Please run this build script from PyTorch root directory."
exit 1
fi
SCRIPT_PARENT_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
# shellcheck source=./common.sh
source "$SCRIPT_PARENT_DIR/common.sh"
# shellcheck source=./common-build.sh
source "$SCRIPT_PARENT_DIR/common-build.sh"
export TMP_DIR="${PWD}/build/win_tmp"
TMP_DIR_WIN=$(cygpath -w "${TMP_DIR}")
export TMP_DIR_WIN
export PYTORCH_FINAL_PACKAGE_DIR=${PYTORCH_FINAL_PACKAGE_DIR:-/c/w/build-results}
if [[ -n "$PYTORCH_FINAL_PACKAGE_DIR" ]]; then
mkdir -p "$PYTORCH_FINAL_PACKAGE_DIR" || true
fi
export SCRIPT_HELPERS_DIR=$SCRIPT_PARENT_DIR/win-test-helpers
set +ex
grep -E -R 'PyLong_(From|As)(Unsigned|)Long\(' --exclude=python_numbers.h --exclude=eval_frame.c torch/
PYLONG_API_CHECK=$?
if [[ $PYLONG_API_CHECK == 0 ]]; then
echo "Usage of PyLong_{From,As}{Unsigned}Long API may lead to overflow errors on Windows"
echo "because \`sizeof(long) == 4\` and \`sizeof(unsigned long) == 4\`."
echo "Please include \"torch/csrc/utils/python_numbers.h\" and use the correspoding APIs instead."
echo "PyLong_FromLong -> THPUtils_packInt32 / THPUtils_packInt64"
echo "PyLong_AsLong -> THPUtils_unpackInt (32-bit) / THPUtils_unpackLong (64-bit)"
echo "PyLong_FromUnsignedLong -> THPUtils_packUInt32 / THPUtils_packUInt64"
echo "PyLong_AsUnsignedLong -> THPUtils_unpackUInt32 / THPUtils_unpackUInt64"
exit 1
fi
set -ex
"$SCRIPT_HELPERS_DIR"/build_pytorch.bat
assert_git_not_dirty
echo "BUILD PASSED"

View File

@ -1,142 +0,0 @@
if "%DEBUG%" == "1" (
set BUILD_TYPE=debug
) ELSE (
set BUILD_TYPE=release
)
set PATH=C:\Program Files\CMake\bin;C:\Program Files\7-Zip;C:\ProgramData\chocolatey\bin;C:\Program Files\Git\cmd;C:\Program Files\Amazon\AWSCLI;C:\Program Files\Amazon\AWSCLI\bin;%PATH%
:: This inflates our log size slightly, but it is REALLY useful to be
:: able to see what our cl.exe commands are (since you can actually
:: just copy-paste them into a local Windows setup to just rebuild a
:: single file.)
:: log sizes are too long, but leaving this here incase someone wants to use it locally
:: set CMAKE_VERBOSE_MAKEFILE=1
set INSTALLER_DIR=%SCRIPT_HELPERS_DIR%\installation-helpers
call %INSTALLER_DIR%\install_magma.bat
if errorlevel 1 goto fail
if not errorlevel 0 goto fail
call %INSTALLER_DIR%\install_sccache.bat
if errorlevel 1 goto fail
if not errorlevel 0 goto fail
:: Miniconda has been installed as part of the Windows AMI with all the dependencies.
:: We just need to activate it here
call %INSTALLER_DIR%\activate_miniconda3.bat
if errorlevel 1 goto fail
if not errorlevel 0 goto fail
call pip install mkl-include==2021.4.0 mkl-devel==2021.4.0
if errorlevel 1 goto fail
if not errorlevel 0 goto fail
:: Override VS env here
pushd .
if "%VC_VERSION%" == "" (
call "C:\Program Files (x86)\Microsoft Visual Studio\%VC_YEAR%\%VC_PRODUCT%\VC\Auxiliary\Build\vcvarsall.bat" x64
) else (
call "C:\Program Files (x86)\Microsoft Visual Studio\%VC_YEAR%\%VC_PRODUCT%\VC\Auxiliary\Build\vcvarsall.bat" x64 -vcvars_ver=%VC_VERSION%
)
if errorlevel 1 goto fail
if not errorlevel 0 goto fail
@echo on
popd
if not "%USE_CUDA%"=="1" goto cuda_build_end
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v%CUDA_VERSION%
if x%CUDA_VERSION:.=%==x%CUDA_VERSION% (
echo CUDA version %CUDA_VERSION% format isn't correct, which doesn't contain '.'
goto fail
)
rem version transformer, for example 10.1 to 10_1.
if x%CUDA_VERSION:.=%==x%CUDA_VERSION% (
echo CUDA version %CUDA_VERSION% format isn't correct, which doesn't contain '.'
goto fail
)
set VERSION_SUFFIX=%CUDA_VERSION:.=_%
set CUDA_PATH_V%VERSION_SUFFIX%=%CUDA_PATH%
set CUDNN_LIB_DIR=%CUDA_PATH%\lib\x64
set CUDA_TOOLKIT_ROOT_DIR=%CUDA_PATH%
set CUDNN_ROOT_DIR=%CUDA_PATH%
set NVTOOLSEXT_PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt
set PATH=%CUDA_PATH%\bin;%CUDA_PATH%\libnvvp;%PATH%
set CUDNN_LIB_DIR=%CUDA_PATH%\lib\x64
set CUDA_TOOLKIT_ROOT_DIR=%CUDA_PATH%
set CUDNN_ROOT_DIR=%CUDA_PATH%
set NVTOOLSEXT_PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt
set PATH=%CUDA_PATH%\bin;%CUDA_PATH%\libnvvp;%PATH%
:cuda_build_end
set DISTUTILS_USE_SDK=1
set PATH=%TMP_DIR_WIN%\bin;%PATH%
:: The latest Windows CUDA test is running on AWS G5 runner with A10G GPU
if "%TORCH_CUDA_ARCH_LIST%" == "" set TORCH_CUDA_ARCH_LIST=8.6
:: The default sccache idle timeout is 600, which is too short and leads to intermittent build errors.
set SCCACHE_IDLE_TIMEOUT=0
set SCCACHE_IGNORE_SERVER_IO_ERROR=1
sccache --stop-server
sccache --start-server
sccache --zero-stats
set CMAKE_C_COMPILER_LAUNCHER=sccache
set CMAKE_CXX_COMPILER_LAUNCHER=sccache
set CMAKE_GENERATOR=Ninja
if "%USE_CUDA%"=="1" (
:: randomtemp is used to resolve the intermittent build error related to CUDA.
:: code: https://github.com/peterjc123/randomtemp-rust
:: issue: https://github.com/pytorch/pytorch/issues/25393
::
:: CMake requires a single command as CUDA_NVCC_EXECUTABLE, so we push the wrappers
:: randomtemp.exe and sccache.exe into a batch file which CMake invokes.
curl -kL https://github.com/peterjc123/randomtemp-rust/releases/download/v0.4/randomtemp.exe --output %TMP_DIR_WIN%\bin\randomtemp.exe
if errorlevel 1 goto fail
if not errorlevel 0 goto fail
echo @"%TMP_DIR_WIN%\bin\randomtemp.exe" "%TMP_DIR_WIN%\bin\sccache.exe" "%CUDA_PATH%\bin\nvcc.exe" %%* > "%TMP_DIR%/bin/nvcc.bat"
cat %TMP_DIR%/bin/nvcc.bat
set CUDA_NVCC_EXECUTABLE=%TMP_DIR%/bin/nvcc.bat
for /F "usebackq delims=" %%n in (`cygpath -m "%CUDA_PATH%\bin\nvcc.exe"`) do set CMAKE_CUDA_COMPILER=%%n
set CMAKE_CUDA_COMPILER_LAUNCHER=%TMP_DIR%/bin/randomtemp.exe;%TMP_DIR%\bin\sccache.exe
)
:: Print all existing environment variable for debugging
set
python setup.py bdist_wheel
if errorlevel 1 goto fail
if not errorlevel 0 goto fail
sccache --show-stats
python -c "import os, glob; os.system('python -mpip install --no-index --no-deps ' + glob.glob('dist/*.whl')[0])"
(
if "%BUILD_ENVIRONMENT%"=="" (
echo NOTE: To run `import torch`, please make sure to activate the conda environment by running `call %CONDA_PARENT_DIR%\Miniconda3\Scripts\activate.bat %CONDA_PARENT_DIR%\Miniconda3` in Command Prompt before running Git Bash.
) else (
copy /Y "dist\*.whl" "%PYTORCH_FINAL_PACKAGE_DIR%"
:: export test times so that potential sharded tests that'll branch off this build will use consistent data
python tools/stats/export_test_times.py
robocopy /E ".additional_ci_files" "%PYTORCH_FINAL_PACKAGE_DIR%\.additional_ci_files"
:: Also save build/.ninja_log as an artifact
copy /Y "build\.ninja_log" "%PYTORCH_FINAL_PACKAGE_DIR%\"
)
)
sccache --show-stats --stats-format json | jq .stats > sccache-stats-%BUILD_ENVIRONMENT%-%OUR_GITHUB_JOB_ID%.json
sccache --stop-server
exit /b 0
:fail
exit /b 1

View File

@ -1,26 +0,0 @@
if "%BUILD_ENVIRONMENT%"=="" (
set CONDA_PARENT_DIR=%CD%
) else (
set CONDA_PARENT_DIR=C:\Jenkins
)
:: Be conservative here when rolling out the new AMI with conda. This will try
:: to install conda as before if it couldn't find the conda installation. This
:: can be removed eventually after we gain enough confidence in the AMI
if not exist %CONDA_PARENT_DIR%\Miniconda3 (
set INSTALL_FRESH_CONDA=1
)
if "%INSTALL_FRESH_CONDA%"=="1" (
curl --retry 3 --retry-all-errors -k https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe --output %TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe
if errorlevel 1 exit /b
if not errorlevel 0 exit /b
%TMP_DIR_WIN%\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=%CONDA_PARENT_DIR%\Miniconda3
if errorlevel 1 exit /b
if not errorlevel 0 exit /b
)
:: Activate conda so that we can use its commands, i.e. conda, python, pip
call %CONDA_PARENT_DIR%\Miniconda3\Scripts\activate.bat %CONDA_PARENT_DIR%\Miniconda3

View File

@ -1,37 +0,0 @@
if "%CUDA_VERSION%" == "cpu" (
echo skip magma installation for cpu builds
exit /b 0
)
rem remove dot in cuda_version, fox example 11.1 to 111
if not "%USE_CUDA%"=="1" (
exit /b 0
)
if x%CUDA_VERSION:.=%==x%CUDA_VERSION% (
echo CUDA version %CUDA_VERSION% format isn't correct, which doesn't contain '.'
exit /b 1
)
set VERSION_SUFFIX=%CUDA_VERSION:.=%
set CUDA_SUFFIX=cuda%VERSION_SUFFIX%
if "%CUDA_SUFFIX%" == "" (
echo unknown CUDA version, please set `CUDA_VERSION` higher than 10.2
exit /b 1
)
if "%REBUILD%"=="" (
if "%BUILD_ENVIRONMENT%"=="" (
curl --retry 3 --retry-all-errors -k https://s3.amazonaws.com/ossci-windows/magma_2.5.4_%CUDA_SUFFIX%_%BUILD_TYPE%.7z --output %TMP_DIR_WIN%\magma_2.5.4_%CUDA_SUFFIX%_%BUILD_TYPE%.7z
) else (
aws s3 cp s3://ossci-windows/magma_2.5.4_%CUDA_SUFFIX%_%BUILD_TYPE%.7z %TMP_DIR_WIN%\magma_2.5.4_%CUDA_SUFFIX%_%BUILD_TYPE%.7z --quiet
)
if errorlevel 1 exit /b
if not errorlevel 0 exit /b
7z x -aoa %TMP_DIR_WIN%\magma_2.5.4_%CUDA_SUFFIX%_%BUILD_TYPE%.7z -o%TMP_DIR_WIN%\magma
if errorlevel 1 exit /b
if not errorlevel 0 exit /b
)
set MAGMA_HOME=%TMP_DIR_WIN%\magma

View File

@ -1,13 +0,0 @@
mkdir %TMP_DIR_WIN%\bin
if "%REBUILD%"=="" (
IF EXIST %TMP_DIR_WIN%\bin\sccache.exe (
taskkill /im sccache.exe /f /t || ver > nul
del %TMP_DIR_WIN%\bin\sccache.exe || ver > nul
)
if "%BUILD_ENVIRONMENT%"=="" (
curl --retry 3 --retry-all-errors -k https://s3.amazonaws.com/ossci-windows/sccache-v0.7.4.exe --output %TMP_DIR_WIN%\bin\sccache.exe
) else (
aws s3 cp s3://ossci-windows/sccache-v0.7.4.exe %TMP_DIR_WIN%\bin\sccache.exe
)
)

View File

@ -1,54 +0,0 @@
set PATH=C:\Program Files\CMake\bin;C:\Program Files\7-Zip;C:\ProgramData\chocolatey\bin;C:\Program Files\Git\cmd;C:\Program Files\Amazon\AWSCLI;C:\Program Files\Amazon\AWSCLI\bin;%PATH%
:: Install Miniconda3
set INSTALLER_DIR=%SCRIPT_HELPERS_DIR%\installation-helpers
:: Miniconda has been installed as part of the Windows AMI with all the dependencies.
:: We just need to activate it here
call %INSTALLER_DIR%\activate_miniconda3.bat
if errorlevel 1 exit /b
if not errorlevel 0 exit /b
:: PyTorch is now installed using the standard wheel on Windows into the conda environment.
:: However, the test scripts are still frequently referring to the workspace temp directory
:: build\torch. Rather than changing all these references, making a copy of torch folder
:: from conda to the current workspace is easier. The workspace will be cleaned up after
:: the job anyway
xcopy /s %CONDA_PARENT_DIR%\Miniconda3\Lib\site-packages\torch %TMP_DIR_WIN%\build\torch\
pushd .
if "%VC_VERSION%" == "" (
call "C:\Program Files (x86)\Microsoft Visual Studio\%VC_YEAR%\%VC_PRODUCT%\VC\Auxiliary\Build\vcvarsall.bat" x64
) else (
call "C:\Program Files (x86)\Microsoft Visual Studio\%VC_YEAR%\%VC_PRODUCT%\VC\Auxiliary\Build\vcvarsall.bat" x64 -vcvars_ver=%VC_VERSION%
)
if errorlevel 1 exit /b
if not errorlevel 0 exit /b
@echo on
popd
set DISTUTILS_USE_SDK=1
if not "%USE_CUDA%"=="1" goto cuda_build_end
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v%CUDA_VERSION%
rem version transformer, for example 10.1 to 10_1.
set VERSION_SUFFIX=%CUDA_VERSION:.=_%
set CUDA_PATH_V%VERSION_SUFFIX%=%CUDA_PATH%
set CUDNN_LIB_DIR=%CUDA_PATH%\lib\x64
set CUDA_TOOLKIT_ROOT_DIR=%CUDA_PATH%
set CUDNN_ROOT_DIR=%CUDA_PATH%
set NVTOOLSEXT_PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt
set PATH=%CUDA_PATH%\bin;%CUDA_PATH%\libnvvp;%PATH%
set NUMBAPRO_CUDALIB=%CUDA_PATH%\bin
set NUMBAPRO_LIBDEVICE=%CUDA_PATH%\nvvm\libdevice
set NUMBAPRO_NVVM=%CUDA_PATH%\nvvm\bin\nvvm64_32_0.dll
:cuda_build_end
set PYTHONPATH=%TMP_DIR_WIN%\build;%PYTHONPATH%
:: Print all existing environment variable for debugging
set

View File

@ -1,54 +0,0 @@
call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
if errorlevel 1 exit /b 1
:: Save the current working directory so that we can go back there
set CWD=%cd%
set CPP_TESTS_DIR=%TMP_DIR_WIN%\build\torch\bin
set PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt\bin\x64;%TMP_DIR_WIN%\build\torch\lib;%PATH%
set TORCH_CPP_TEST_MNIST_PATH=%CWD%\test\cpp\api\mnist
python tools\download_mnist.py --quiet -d %TORCH_CPP_TEST_MNIST_PATH%
python test\run_test.py --cpp --verbose -i cpp/test_api
if errorlevel 1 exit /b 1
if not errorlevel 0 exit /b 1
cd %TMP_DIR_WIN%\build\torch\test
for /r "." %%a in (*.exe) do (
call :libtorch_check "%%~na" "%%~fa"
if errorlevel 1 goto fail
)
goto :eof
:libtorch_check
cd %CWD%
set CPP_TESTS_DIR=%TMP_DIR_WIN%\build\torch\test
:: Skip verify_api_visibility as it a compile level test
if "%~1" == "verify_api_visibility" goto :eof
echo Running "%~2"
if "%~1" == "c10_intrusive_ptr_benchmark" (
:: NB: This is not a gtest executable file, thus couldn't be handled by pytest-cpp
call "%~2"
goto :eof
)
python test\run_test.py --cpp --verbose -i "cpp/%~1"
if errorlevel 1 (
echo %1 failed with exit code %errorlevel%
goto fail
)
if not errorlevel 0 (
echo %1 failed with exit code %errorlevel%
goto fail
)
:eof
exit /b 0
:fail
exit /b 1

View File

@ -1,12 +0,0 @@
call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
echo Copying over test times file
robocopy /E "%PYTORCH_FINAL_PACKAGE_DIR_WIN%\.additional_ci_files" "%PROJECT_DIR_WIN%\.additional_ci_files"
pushd test
echo Run jit_profiling tests
python run_test.py --include test_jit_legacy test_jit_fuser_legacy --verbose
if ERRORLEVEL 1 exit /b 1
popd

View File

@ -1,37 +0,0 @@
call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
:: exit the batch once there's an error
if not errorlevel 0 (
echo "setup pytorch env failed"
echo %errorlevel%
exit /b
)
pushd test
set GFLAGS_EXE="C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\gflags.exe"
if "%SHARD_NUMBER%" == "1" (
if exist %GFLAGS_EXE% (
echo Some smoke tests
%GFLAGS_EXE% /i python.exe +sls
python %SCRIPT_HELPERS_DIR%\run_python_nn_smoketests.py
if ERRORLEVEL 1 goto fail
%GFLAGS_EXE% /i python.exe -sls
if ERRORLEVEL 1 goto fail
)
)
echo Copying over test times file
robocopy /E "%PYTORCH_FINAL_PACKAGE_DIR_WIN%\.additional_ci_files" "%PROJECT_DIR_WIN%\.additional_ci_files"
echo Run nn tests
python run_test.py --exclude-jit-executor --exclude-distributed-tests --shard "%SHARD_NUMBER%" "%NUM_TEST_SHARDS%" --verbose
if ERRORLEVEL 1 goto fail
popd
:eof
exit /b 0
:fail
exit /b 1

Some files were not shown because too many files have changed in this diff Show More