Compare commits

..

57 Commits

Author SHA1 Message Date
173368cb63 CUDAEvent::elapsed_time could accidentally initialize a non-used GPU (#122538)
This sets the device before call cudaEventElapsedTime to avoid the case
where the "cudaGetCurrentDevice" device would be initialized even though
neither event is on that device.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122538
Approved by: https://github.com/shuqiangzhang, https://github.com/wconstab
2024-04-22 15:39:28 -07:00
b86edd97d6 [nccl-pg] print broadcast ncclunique id duration (#123963)
Summary: Print NCCL PG broadcast nccl unique id duration for measurement.

Differential Revision: D56048059

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123963
Approved by: https://github.com/wconstab
2024-04-16 17:03:25 -07:00
b33a283e9a [nccl-pg] Pass pg name and desc to NCCL communicator (#124149)
Summary:
Pass Process Group Name and Desc to NCCL communicator in order to access pg information in NCCL layer.
The information is passed as commDesc string(i.e. "<pg_desc>:<pg_name>")
Function only valid when NCCL_COMM_DESCRIPTION is defined.

Differential Revision: D55703310

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124149
Approved by: https://github.com/shuqiangzhang
2024-04-16 15:08:38 -07:00
7a551d81e5 [c10d/nccl-pg] allow user to pass process group description (#123472)
Summary:
We need a way to allow user set a customized description for a process group, e.g. FSDP, PP.

Here are several use cases of user specified group_desc:
- Logging: we can easily match a log line and understand what's this collective/pg is used to.
- Pytorch traces (e.g. Kineto, Execution Trace) can benefit from the PG desc since trace analysis, benchmarks will be able to easily differentiate PG purpose like FSDP, PP.
- Lower layer collectives(e.g. NCCL) debug: we will be able to expose PG desc to NCCL communicator so NCCL layer operations can be easily correlated to a PG.

Solution: Add a group_desc field to c10d

Differential Revision: D55781850

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123472
Approved by: https://github.com/kwen2501
2024-04-16 15:08:38 -07:00
1515a90475 [DCP] Adds ability to create a CPU state dict that is both shared and pinned (#122338)
[DCP] Adds ability to create a CPU state dict that is both shared and pinned, as well as a new utility specific to copying the state dict

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1ge8d5c17670f16ac4fc8fcb4181cb490c

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122338
Approved by: https://github.com/fegin
2024-04-16 15:08:22 -07:00
4882ec2a91 Pass and record process_group_name when creating ProcessGroupNCCL (#123117)
Summary:
Pass python c10d group_name to c++ ProcessGroupNCCL so that the pg name will be consistent across different layers.
Also record pg_name in flight recorder entry.

Differential Revision: D55597200

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123117
Approved by: https://github.com/wconstab
2024-04-16 13:48:35 -07:00
972b8060bd [c10d] make monitorThread sleep when we try to dump (#123788)
Summary:
We seperated the FR dump logic from the desync debug logic,
so we no longer set collectiveDebugInfoMode_ to true when we just need FR
dump. That's why monitor thread did not sleep and try to kill the
process without waiting for the dump.

The fix is simple, we should sleep whenever shouldDump_ is true
Test Plan:
Existing unit tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123788
Approved by: https://github.com/wconstab
2024-04-11 09:19:15 -07:00
3e7683ae18 [c10d] dump on any exception (timeout + nccl error) (#123023)
Summary:
Existing flight recorder dumping logic is: dump only on timeout, but not
on NCCL error. This resulted in the faulty ranks missing dumps when NCCL
error happens.

So in this PR, we revise the logic of dump such that records are dumped
when any exception is detected. Exception could be 1. NCCL async errors.
2. watchdog timeout

Also the existing code tends to mix the logic of flight recorder dump
and desync debug, which is no desirable. We only dump the desync debug
report only when timeout is detected.
Test Plan:
Added a new unit test to trigger nccl error and dump, and make sure the
dump is triggered by the error.

Also existing dump on timeout tests should still pass.

sqzhang_1) [sqzhang@devgpu009.cln1 ~/pytorch (84bf9d4c)]$ python
test/distributed/test_c10d_nccl.py NcclErrorDumpTest
NCCL version 2.19.3+cuda12.0
[E329 19:15:11.775879730 ProcessGroupNCCL.cpp:565] [Rank 0] Watchdog
caught collective operation timeout: WorkNCCL(SeqNum=2,
OpType=ALLREDUCE, NumelIn=10, NumelOut=10, Timeout(ms)=10000) ran for
10028 milliseconds before timing out.
[E329 19:15:11.777459894 ProcessGroupNCCL.cpp:1561] [PG 0 Rank 0]
Exception hit in NCCL work: 2
[E329 19:15:12.660717323 ProcessGroupNCCL.cpp:1332] [PG 0 Rank 0]
Received a timeout signal from this local rank and will start to dump
the debug info. Last enqueued NCCL work: 2, last completed NCCL work: 1.
[E329 19:15:12.660932242 ProcessGroupNCCL.cpp:1167] [PG 0 Rank 0]
ProcessGroupNCCL preparing to dump debug info.
[E329 19:15:12.661192990 ProcessGroupNCCL.cpp:1174] [PG 0 Rank 0]
ProcessGroupNCCL dumping nccl trace to /tmp/tmp06psqil3/trace_0
[F329 19:15:12.661485601 ProcessGroupNCCL.cpp:1185] [PG 0 Rank 0] [PG 0
Rank 0] ProcessGroupNCCL's watchdog detected a collective timeout from
the local rank. This is most likely caused by incorrect usages of
collectives, e.g., wrong sizes used across ranks, the order of
collectives is not same for all ranks or the scheduled collective, for
some reason, didn't run. Additionally, this can be caused by GIL
deadlock or other reasons such as network errors or bugs in the
communications library (e.g. NCCL), etc. We tried our best to dump the
debug info into the storage to help you debug the issue.

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123023
Approved by: https://github.com/wconstab
2024-04-02 15:41:15 -07:00
f2e9ec2dc5 [c10d] dump from one and only one thread (PG0's monitor thread) (#120893)
Summary:
When there are multiple PGs in a process and a hardware failure happens,
we found that multiple PGs/ threads in the same
process are competing to dump the same records at the same time. The
affects the reliability of dumps.

In this PR, we will try to make the change such that only one thread/PG
could dump: PG0's monitor thread. We use a static variable to indicate
that something (e.g., collective timeout) has triggered the dump
locally.

monitor thread would dump debug info under any one of the 3 conditions:
1: this static variable is set to true by the watchdog thread when it detects
a timeout or pipe dump signal
2: timeout signal is received from other ranks through tcpstore
3: no heartbeat of watchdog
Test Plan:
python test/distributed/test_c10d_nccl.py -k
test_timeout_dumps_on_stuck_ranks

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120893
Approved by: https://github.com/wconstab
2024-04-02 15:36:05 -07:00
dde4324d8e [NCCL PG] Enable ncclCommDevIdxMap unconditionally (#122049)
Differential Revision: D54993977

The initial purpose of ncclCommDevIdxMap is to support NCCL zero copy algorithms. Therefore, it is only enabled (with its values filled) if useTensorRegisterAllocatorHook_ is set to true. However, now we rely on it to support dumping NCCL information in a single PG. So we need it to be always available, regardless of whether we enabled useTensorRegisterAllocatorHook_.
Move the code of filling ncclCommDevIdxMap out of if (useTensorRegisterAllocatorHook_) statement.

See diff

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122049
Approved by: https://github.com/shuqiangzhang
2024-03-26 17:14:06 -07:00
94c079104d [c10d] fix the macro definition of NCCL_COMM_DUMP (#120502)
Summary:
Only if both macros are defined, should we dump the comm dump,
otherwise, use the original definition.

The previous implementation missed the function definition when IS_NCCL_EXP is defined but NCCL_COMM_DUMP is not defined

Test Plan:
Build and unit test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120502
Approved by: https://github.com/dsjohns2, https://github.com/Skylion007
2024-03-26 14:09:00 -07:00
a6afee6d94 [c10d][flight recorder] dump additinal NCCL debug info (#120063)
Summary:
This PR is mainly about flight recorder side of changes that takes a
map of maps as input, and dump it as picklable. Also add functions that
should be compiled only when NCCL_COMM_DUMP is defined
Test Plan:
Integration tests with NCCL would be done later, here we only do the
c10d side of dump test, aka,NCCLTraceTest

Testing the dump function is a bit tricky as we don't have
existing C++ unit tests for them. So we still use the Python NCCLTraceTest with
the python binding of _dump_nccl_trace(), we manually fed the
dump_nccl_trace with a map of test info, and assert the pickle result and
print the converted python dict:
```
(sqzhang_1) [sqzhang@devgpu009.cln1 ~/pytorch (main)]$  python
test/distributed/test_c10d_nccl.py NCCLTraceTest
NCCL version 2.19.3+cuda12.0
[rank0]:[E ProcessGroupNCCL.cpp:1200] [PG 0 Rank 0] ProcessGroupNCCL
preparing to dump debug info.
.NCCL version 2.19.3+cuda12.0
.NCCL version 2.19.3+cuda12.0
{'ncclID2': {'Key2': 'Value2', 'Key1': 'Value1'}, 'ncclID1': {'Key2':
'Value2', 'Key1': 'Value1'}}
{'ncclID2': {'Key2': 'Value2', 'Key1': 'Value1'}, 'ncclID1': {'Key2':
'Value2', 'Key1': 'Value1'}}
.NCCL version 2.19.3+cuda12.0
{'ncclID2': {'Key2': 'Value2', 'Key1': 'Value1'}, 'ncclID1': {'Key2':
'Value2', 'Key1': 'Value1'}}
{'ncclID2': {'Key2': 'Value2', 'Key1': 'Value1'}, 'ncclID1': {'Key2':
'Value2', 'Key1': 'Value1'}}
.NCCL version 2.19.3+cuda12.0
.NCCL version 2.19.3+cuda12.0
.NCCL version 2.19.3+cuda12.0
.NCCL version 2.19.3+cuda12.0
.
----------------------------------------------------------------------
Ran 8 tests in 95.761s
OK
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120063
Approved by: https://github.com/wconstab
2024-03-26 14:08:19 -07:00
d092857531 [Caffe2 CPU tests] Update CMakeLists.txt 2024-02-24 12:18:10 -08:00
6aad5e444a Fix missing MAST log when there is Unicode non-decodable text in logs (#119298)
Summary:
## Issue
When there is Unicode non-decodable text in logs, `tail_logger` will stop working afterwards, i.e. f527390102

In the example, the process stopped producing Python logs after 17:20:21 untill the job finished
```
[0]:I0201 17:20:21.338000 3429 gen_ai/genie_projects/llm/metaformers/reward_model_score.py:335] Progress: 118 batches out of 512 total batches. 23.05 % | (gpu mem: 25.8GB, free CPU mem: 1387.8GB)
I0201 17:39:14 Stopping twtask-main.service with Service Result: [success] Exit Code: [exited] Exit Status: [0]
```
At the end, `UnicodeDecodeError` was thrown at the end with no call stack.

## Fix
Use `errors="replace"` to avoid throwing exception when `UnicodeDecodeError` happens.

Test Plan: f528854819

Differential Revision: D53483644

Co-authored-by: Jack Zhang <jackzh@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119298
Approved by: https://github.com/XilunWu
2024-02-24 12:16:39 -08:00
c54ce9313b [c10d][flight recorder] store a copy of string in entry (#119837)
Summary:
Previously, we just store the char pointer in entry, the string is a
temp object and will be destructed when we want to dump/access it.

A quick fix is to store a copy of the string, but without changing the
upstream char*.

An alternative is to change every profilingTitle into std:string, this
however would needs comprehensive overhall of the code up to the
c10d::work layer above workNCCL and RecordFunction etc.

We chose the first option for this change

Resolve #119808

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119837
Approved by: https://github.com/zdevito, https://github.com/wconstab
2024-02-14 11:38:10 -08:00
1fe59f4ef7 [c10d][flight recorder] remove unintended assignment of entry (#119748)
Summary:
auto& entry = entries_.at(*id % max_entries_);
entry = entries_.at(*id % max_entries_);
The above line of code has unintended consequence of invoking copy/assignment
of entry objects as ref itself cannot be re-assigned.

Also what could cause the crash is that the entry ref could become invalid if entries_ are
resized by other threads. and this could result in 'copy to a garbage
location'. The fix is to use a pointer which can be re-assigned after
re-acquiring the lock

Tests: python test/distributed/test_c10d_nccl.py NCCLTraceTest

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119748
Approved by: https://github.com/wconstab, https://github.com/fegin
2024-02-14 11:38:10 -08:00
e693fb2bb1 [nccl flight recorder] record time we discover start and complete (#119249)
Some APIs like ncclCommAbort can cause nccl kernels to finish even if
they were previously stuck. Because we can gather the trace buffer after
those calls, we can end up seeing some collectives marked completed eventhough
that complete happened several minutes after they started and clearly after
the timeout. This changes how we record state so that we keep track of the time
we discover a state change, so even if eventually the collective gets marked complete,
we can observe it happened minutes after it was schedule.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119249
Approved by: https://github.com/wconstab
2024-02-14 11:38:10 -08:00
4fe510baf6 [NCCL PG] log NCCL comm at creation and abort (#118335)
Summary: It helps correlate NCCL PG with corresponding NCCL comm in separate logs.

Differential Revision: D53107647

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118335
Approved by: https://github.com/wconstab
2024-02-14 11:38:04 -08:00
7c507b78c4 [c10d] Expose check method to Python for store via pybind (#116144)
Differential Revision: [D52310987](https://our.internmc.facebook.com/intern/diff/D52310987)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116144
Approved by: https://github.com/wconstab
2024-01-31 11:08:27 -08:00
0019901601 [C10D] Fix nccl flightrecorder ignored dump timeout (#118142)
Don't call future.get() unless it's ready, because it waits.
Also, refactor the code a bit for simplicity.

We should do a follow-on PR to clean up the timeouts further, but this
should fix the glaring timeout bug.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118142
Approved by: https://github.com/shuqiangzhang
ghstack dependencies: #118044, #118046, #118047
2024-01-26 16:48:00 -08:00
18be18535b [C10D] Make Flight Recorder report time_created in ns (#118047)
Addresses (6) from #117883

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118047
Approved by: https://github.com/zdevito
ghstack dependencies: #118044, #118046
2024-01-26 16:48:00 -08:00
2729367313 [C10D] Add version tag to NCCL Flight Recorder Dump (#118046)
Addresses (3) from #117883

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118046
Approved by: https://github.com/zdevito
ghstack dependencies: #118044
2024-01-26 16:48:00 -08:00
33537aae24 [C10D] Make NCCL Flight Recorder dump produce a dict (#118044)
Putting the list of entries into a particular key of a top-level dict
paves the way for adding other metadata as other top level keys.

Addresses 1 and 2 from #117883

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118044
Approved by: https://github.com/zdevito
2024-01-26 16:48:00 -08:00
dcdb1337dd [C10D] Finer-grain nccl heartbeat, avoid false positive hangs (#118016)
Summary:
Previously, heatbeat was incremented once per finishing a for loop over a list
of in-progress work items, under the assumption that either the processing
would be predictably quick, or it would hang completely.

In fact, there can be cuda API contention that causes the processing of works
to slow down arbitrarily but not truly deadlock.  To guard against this, we
bump the heartbeat at the smallest unit of progress, one work item being
successfully processed.

Test Plan: CI

Differential Revision: D52973948

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118016
Approved by: https://github.com/shuqiangzhang, https://github.com/kwen2501
2024-01-26 16:48:00 -08:00
9cf0f2bd59 Move getDurationFromFirstEvent to USE_C10D_NCCL ifdef (#117738)
Fixes #117517

Try to move nccl related function *getDurationFromFirstEvent* to USE_C10D_NCCL ifdef (Related to https://github.com/pytorch/pytorch/issues/114575)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117738
Approved by: https://github.com/wconstab, https://github.com/XilunWu
2024-01-26 16:48:00 -08:00
1d2e877c05 [ProcessGroup] Make watchdog check work queue more frequently (#117297)
Today watchdog's sleep interval is 1s. That's a bit long compared to modern GPU link's (or network link's) speed.

Take DDP and Ampere for example:

DDP's bucket size = 25 MB
Ampere's NVLink speed = 250 GB/s

25 MB / 250 GB/s = 100 ms.
So we are updating the interval to 100 ms.

Update:
25 MB / 250 GB/s = 0.1 ms
But let's see how it goes so far between making the checking more aggressive.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117297
Approved by: https://github.com/fduwjj
2024-01-26 16:48:00 -08:00
f27b979b0c [c10d] Move the timeout dump check from watchdog to monitoring thread (#117168)
To avoid potential hang in watchdog thread which will prevent us from dumping timeout debugging info, we move the check of global collective timeout signals and dumping debugging info to monitoring thread. We also need to ensure that we don't wait very long to check out the timeout signal from store; otherwise, we will miss the signal and don't get debugging info dumped.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117168
Approved by: https://github.com/wconstab
2024-01-26 16:48:00 -08:00
f30d6047ad [c10d] Add a timeout check interval variable for timeout dump (#117093)
The current timeout check frequency is relied on monitoring thread's timeout thread which can be too long (even if we set it to 2mins) so let's use a separate timeout variable which users can configure it. And we only only let default PG to check TCPStore so even more frequent check should be fine. (Our stress test is performed on every half second).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117093
Approved by: https://github.com/wconstab, https://github.com/kwen2501
2024-01-26 16:48:00 -08:00
75311510ef [C10D] Add duration_ms to flight recorder (#114817)
Measures the duration of a collective operation using nccl start/end
events and includes this duration (in ms) in the flight recorder data.

duration_ms will be an optional field, since it only works when
timing is enabled.  Currently timing is enabled when flight recorder
is enabled, but this is not a strict requirement.  Duration is also
not available for collectives not in a completed state.

Note: computing duration can lead to a hang due to calling cudaEventDuration when
the cuda driver queue is full.

We don't ever want dump() api to hang, since we might want dump to help
debug a hang. Hence, we only query durations from the watchdog thread,
and it's possible during dump() call, some of the most recent
collectives durations won't have been computed yet at time of dump.  We
make this tradeoff to ensure that dump() itself will never hang.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114817
Approved by: https://github.com/fduwjj, https://github.com/zdevito
ghstack dependencies: #116905
2024-01-26 16:48:00 -08:00
dbd6094d05 [C10D](reland) Add GIL checker to NCCL watchdog monitor (#117312)
Whenever the monitor thread kills the watchdog thread for being stuck, we do so to save cluster time and get a faster failure signal, but we want to know more about why it got stuck.

One possible reason for watchdog stuckness is GIL contention, which could be ruled out or observed by making an attempt to acquire the GIL at exit time.

If we cannot acquire the GIL within a short time window (1s) we abort the attempt and report GIL contention, otherwise we report that GIL was acquired successfully.

Reland: uses a function pointer to avoid destructor ordering issues on dlclose. (Looks like the destructor for the std::function was being run later than the libtorchpython lib was unloaded, leading to a crash).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117312
Approved by: https://github.com/zdevito
2024-01-26 16:48:00 -08:00
397b9d47e9 [ProcessGroup] Do not print NCCL_DEBUG before NCCL init (#117328)
In case /etc/nccl.conf is used, `NCCL_DEBUG` is not set to sys env until NCCL inits.
The deleted print point is before NCCL inits, hence may be inaccurate.
This PR removes it and relies on the other print point which is after NCCL comm creation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117328
Approved by: https://github.com/wconstab, https://github.com/fduwjj
2024-01-26 16:48:00 -08:00
36a01a8ab9 [c10d][EZ] Add more logs in the destructor of ProcessGroupNCCL for better root cause investigation (#117291)
Add logs to the place where we inspect whether a hang happens.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117291
Approved by: https://github.com/XilunWu, https://github.com/shuqiangzhang
2024-01-26 16:48:00 -08:00
ee336cf58a [c10d] Add comments to the rest environment variable within NCCLPG (#117092)
Not every environment within NCCLPG has comments, let's add comments to each of them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117092
Approved by: https://github.com/kwen2501
ghstack dependencies: #116545
2024-01-26 16:48:00 -08:00
a9e2e745d7 [c10d] Add extra sleep in waitForDumpOrTimeout to ensure enough time for all ranks dump debug info (#116545)
We added an extra sleep and make it configurable so that users can set an extra wait to ensure all ranks have dumped the debug info.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116545
Approved by: https://github.com/wconstab
2024-01-26 16:48:00 -08:00
ab4df89eea [C10D] Rename flightrecorder key vars to avoid confusion (#116905)
Key vars are strings used as dict keys (e.g. duration_s was a string
"duration_ms")

_s confused me with time (seconds) since duration_s was a key string and
duration_ms is another variable holding a time value.

Now duration_key is "duration_ms".

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116905
Approved by: https://github.com/zdevito
2024-01-26 16:48:00 -08:00
9d02ebe876 [c10d] To make ProcessGroupNCCL to use globalStore for coordination (#117075)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117075
Approved by: https://github.com/wconstab
ghstack dependencies: #117074
2024-01-26 16:48:00 -08:00
b61e01cce9 [c10d] Add a recursive method to get the inner most store (#117074)
In c10d PG initialization, we wrap TCPStore with multiple layers of PrefixStore which adds layers of prefix.

One example is:
"default_pg/0//cuda//timeout_dump"
When initialized the default PG, because there is no store passed. We first add the prefix "default_pg" to the TCPStore returned from rendezvous:

bdeaaad70c/torch/distributed/distributed_c10d.py (L1240)

We then add pg_name (aka 0) bdeaaad70c/torch/distributed/distributed_c10d.py (L1376) and device (aka cuda) bdeaaad70c/torch/distributed/distributed_c10d.py (L1387)

to the prefix. Then when we call store_->set("timeout_dump"). The actual key used for writing into TCPStore is "default_pg/0//cuda//timeout_dump".

For sub-PG, things get even interesting, we put the store wrapped with default pg name to a cache:
bdeaaad70c/torch/distributed/distributed_c10d.py (L1517)

And when creating each subPG, it is append its PG name right after the cached store. The example keys are:
'default_pg/0//10//cuda//timeout_dump', 'default_pg/0//12//cuda//timeout_dump', 'default_pg/0//38//cuda//timeout_dump', 'default_pg/0//39//cuda//timeout_dump'. (10, 12, 38 and 39 are all PG names of each subPG created)

The reason why the number in the name is bumped up so high is because for each subPG creation, all ranks have to call the API together and the global variable used for PG name will be bumped up monolithically:

bdeaaad70c/torch/distributed/distributed_c10d.py (L3666)

Similar things happen for using hashing for PG names.

This has a potential issue, because each sub-PG has an instance of ProcessGroupNCCL, and if we want to set something global to notify all sub-PGs (and all ranks). This added prefix causes bugs. For example, if on sub-PG 1, we set a value to TCPStore with key ('default_pg/0//1//cuda//timeout_dump'), while we use the default PG instances to check the TCPStore, which are using the key ('default_pg/0//cuda//timeout_dump'), default PG instances will never get the notified signals. So in this PR, we added a new API in PrefixStore which we get the innermost non-PrefixStore for set and check. The next PR will make changes in NCCL watchdog.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117074
Approved by: https://github.com/wconstab, https://github.com/H-Huang
2024-01-26 16:48:00 -08:00
f7ce61ba53 [C10D] Dump cpp stacktraces on heartbeat monitor timeout (#116717)
Summary:
If heartbeat monitor times out and kills the process, we want to know why.

It's convenient to use an internal tool for this, but we plan to later
integrate with torchelastic to call into pyspy or something else, which will be
both better (including py stacks) and compatible with OSS.

Test Plan: tested manually, observed c++ stacktraces were dumped

Reviewed By: fduwjj

Differential Revision: D52370243

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116717
Approved by: https://github.com/zdevito
2024-01-26 16:48:00 -08:00
e7bae15ab1 [C10D] Make heartbeat_ atomic (#116702)
Summary:
Currently, the code is working. We know this becuase we observe heartbeat
timeouts.

However, there is a chance that if the code were refactored, the compiler could
optimize away the load of heartbeat_ inside heartbeatMonitor, and we wouldn't
know.

Using atomic here is not really for thread synchronization, but more to ensure
compiler optimizations (hoisting the read outside the loop) can never be
allowed to happen.  Again, we know this isn't currently happening bc if it
were, it  would not be an intermittent failure, it would be an always failure.
(at least with a fixed compiler/platform).

I previously avoided atomic bc we didn't want shared locks between heartbeat
monitor and watchdog thread.  Why? if watchdog held the lock and hung, monitor
could also hang.  However, this really can't happen (Afaik) when using an
atomic.

Test Plan: existing CI tests

Differential Revision: D52378257

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116702
Approved by: https://github.com/fduwjj, https://github.com/zdevito
2024-01-26 16:48:00 -08:00
e71b422908 [C10D] Improve Heartbeat Monitor exit logs (#116268) (#116661)
Summary:

- add workMetaList_.size() so we know how many outstanding works there
  were when killing
- Print our first log before debuginfo dump instead of after, since it
  is clearer when reading the logs that we time out and then dump
- Organize the log strings- put them near where they are used

cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l yf225

imported-using-ghimport

Test Plan: Imported from OSS

Reviewed By: fduwjj

Differential Revision: D52369167

Pulled By: wconstab

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116661
Approved by: https://github.com/fduwjj
2024-01-26 16:48:00 -08:00
389940ce60 [c10d] Make DebugInfoWriter Singleton across all PG objects (#116489)
Previously, we have the writer register to each NCCL PG(backend), so for every pg, we have a NCCL PG instance, so if we use some customized writer when multiple sub-PGs are used, we need to ensure user to register the writer for every backend which indicates a bad UX. Furthermore, the debug info is global, so it does not make sense to have the writer for each instance. We even have a static mutex in the `dumpDebuggingInfo` to ensure we serialize the write, that makes it more obvious that we can make the writer a singleton so that we only have one writer instance for all PG instances.

Although the rationale is clear, the implementation may vary a lot. So this PR is RFC for now to see if this implementation makes sense or not.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116489
Approved by: https://github.com/kwen2501
2024-01-26 16:48:00 -08:00
b2237a7c85 [C10d] Fix Log Prefix in NCCLPG so that each instance gets its own prefix (#116520)
Somehow the logprefix only have ProcessGroup 0 rank [global rank]. This does not give the expected result as per the comment says "a prefix that is unique to this process group and rank". So this PR fix it and make it different for different subPGs.

The reason is that we set the prefix static which is shared across all NCCLPG instances and whoever calls this function first will set `rank_` and `uid_` to the prefix. We always initialize PG 0 first that's why we always see PG[0] + global ranks for all subPGs.

<img width="484" alt="image" src="https://github.com/pytorch/pytorch/assets/6937752/7fbb0226-7e25-4306-9cee-22e17b00bc8e">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116520
Approved by: https://github.com/wconstab
ghstack dependencies: #116218
2024-01-26 16:48:00 -08:00
ef5dfe3f3e [c10d] Fix timeout dump path write path overlap when there are multiple PGs (#116218)
Basically we observed that if there are multiple PGs and if the timeout happens on one of the subPG, we somehow use the local rank in the dump file. We realize that:
1. For setting the timeout signal in the store, any watchdog thread from any PG can do that.
2. For checking and dump, only the watchdog thread of default PG which we will always create and contain all ranks (no file name conflict) is needed here because the store signal and dump debug info are all global.
3. Since dump is global, we want to avoid the case when ranks from sub-PG pollute logs from global ranks (local rank 0 vs global rank 0). So that we use global ranks here to initialize debug info writer. (Down the road, we are thinking about making it a singleton so that user only register it once for multi-PG case.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116218
Approved by: https://github.com/wconstab
2024-01-26 16:48:00 -08:00
e303dc3c08 [c10d] Add stream info during nccl comm abort call (#116076)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116076
Approved by: https://github.com/XilunWu
2024-01-26 16:48:00 -08:00
265efad2de [C10D] Increase TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC (#116267)
Change default from 2 min to 10 min.

Why? Many cases of heartbeat timeout were reported, but increasing
timeout led to the same job hanging in a different place, suggesting
heartbeat kill was working well and not a false positive.  However, some
others reported jobs running fine with increased timeouts.  One such
case was investigated below, and suggests that indeed a 2 min timeout is
too aggressive.  While we have not fully root caused the issue, it
is better to avoid killing jobs that would otherwise complete.

Current theory is that watchdog is not totally deadlocked, but is slowed
down in its processing of work objs due to some intermittent resource
contention.  Hence, allowing more time is more of a workaround than a
fix.

Debug/Analysis:
https://docs.google.com/document/d/1NMNWoTB86ZpP9bqYLZ_EVA9byOlEfxw0wynMVEMlXwM

Differential Revision: [D52368791](https://our.internmc.facebook.com/intern/diff/D52368791)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116267
Approved by: https://github.com/fduwjj
2024-01-26 16:48:00 -08:00
60f0455905 [C10D] Make all PGNCCL LOG usages use logPrefix() (#116060)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116060
Approved by: https://github.com/fduwjj
ghstack dependencies: #116059
2024-01-26 16:48:00 -08:00
4898313791 [C10D] Add logPrefix to abortCommsFromMap (#116059)
Prints additional info such as PG ID/Rank.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116059
Approved by: https://github.com/fduwjj
2024-01-26 16:48:00 -08:00
f4da9adf6b [C10D] Add waitForDumpOrTimeout to log on dump abandonment (#115876)
Helps call attention to any cases where the dump actually times out.

The timeout is likely to hit if we run into slow stacktrace processing.

Log any exceptions encountered in the background thread, but don't raise
them- we're already willing to abandon the debug dump, and want to
proceed with our normal execution (in the case of dumppipe) or shutdown
process (when dumping happens on timeout and shutdown is already
initiated).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115876
Approved by: https://github.com/zdevito
ghstack dependencies: #115807
2024-01-26 16:48:00 -08:00
8f7f35273e [c10d] Polish NCCL PG monitor thread log message (#115888)
We turned on monitor thread by default in https://github.com/pytorch/pytorch/pull/112518, and we want the error message that is displayed when the monitor kills the process to be more informative.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115888
Approved by: https://github.com/wconstab
2024-01-26 16:48:00 -08:00
44ec9612ed [C10D] Log PG size in init log (#115807)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115807
Approved by: https://github.com/XilunWu
2024-01-26 16:48:00 -08:00
4d3bea2b29 [nccl flight recorder] nullptr profiling name (#115851)
Sometimes profiling name can be a nullptr, which
throws on conversion to std::string. This adds a check.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115851
Approved by: https://github.com/wconstab
2024-01-26 16:48:00 -08:00
0bcdddc3c1 [C10D] Make dumpDebuggingInfo share a mutex across PGs (#115803)
The mutex was originally added to avoid racing to dump debuginfo,
where a race in this case would result in a corrupted dump file.

The reason a mutex helps is that it forces all dump requests to be
serialized, so that an observer would either see an in-progress file, a
complete file, or no file.  Without a mutex, a fourth state is possible
(a file that has been written to by multiple threads and is invalid).

Becuase the mutex was a ProcessGroupNCCL class member, and each PG
instance has its own watchdog thread that can launch a dump, it was not
doing its job.  Making the mutex static shares it between instances of
the class and ensures serialization of dumps triggered by any PG.

(Note: dumps triggered by different PGs have the same, global contents
anyway- there is only one global flight recorder, so it doesn't matter
who triggers it.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115803
Approved by: https://github.com/kwen2501
ghstack dependencies: #115771, #115798, #115800, #115801
2024-01-26 16:48:00 -08:00
28b6220312 [C10D] Change PGNCCL logs to prefix [PG {} Rank {}] (#115801)
Adds a PG {process group uid} prefix component to logs.

This is helpful in situations where there are multiple processgroups,
and rank information by itself is confusing.  (For example rank0 on PG1
may correspond to rank3 on PG0.  People may assume 'rank0' references
the global (PG0) world, but it may reference a sub-pg.  Prefacing the PG
helps clarify this.

Does NOT change logs from inside WorkNCCL functions, since WorkNCCL
doens't know what PG ID it corresponds to. Will address these logs
separately.

Example:

```
[I ProcessGroupNCCL.cpp:787] [PG 0 Rank 0] ProcessGroupNCCL initialization ...
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115801
Approved by: https://github.com/fduwjj
ghstack dependencies: #115771, #115798, #115800
2024-01-26 16:48:00 -08:00
210b7b65e2 [C10D] Refactor NCCL logs to use common prefix helper (#115800)
Put the repeated code that string formats [Rank {rank}] in one place.

Sets up for the next PR that also adds more info to this prefix.

(Does not change exception messages, which could be done as well.
Exception messages are not formatted quite the same way. Tries
instead to keep from changing log behavior (in this PR) and only
refactor code.

Did limited testing (some logs were observed OK).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115800
Approved by: https://github.com/fduwjj
ghstack dependencies: #115771, #115798
2024-01-26 16:48:00 -08:00
4da10b5cd3 [C10D] Only open NCCL dump pipe file once per process (#115798)
The NCCL flight recorder is per-process (it is shared by all
processgroups), but individual process groups used to construct their
own pipe for being signaled to dump the flight recorder.

This ensures that only one pipe per process is created, by only creating
the pipe on the first ProcessGroup (uid_ == 0) which should be the world
group.

Filenames are still keyed off of rank, but this should now be global
rank instead of sub-pg rank, making the filenames unique across the
whole trainer process.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115798
Approved by: https://github.com/zdevito
ghstack dependencies: #115771
2024-01-26 16:48:00 -08:00
f09763814f [C10D] Make DumpPipe disabled when FlightRecorder disabled (#115771)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115771
Approved by: https://github.com/fduwjj
2024-01-26 16:48:00 -08:00
80923ed5a6 [C10D] Make DumpPipe pipe file configurable (#115770)
Add TORCH_NCCL_DEBUG_INFO_PIPE_FILE env, allowing separate pipe file
location from dump file location.

Defaults PIPE_FILE to empty, meaning disabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115770
Approved by: https://github.com/zdevito
2024-01-26 16:48:00 -08:00
15341 changed files with 767354 additions and 667851 deletions

View File

@ -1,4 +1,3 @@
# We do not use this library in our Bazel build. It contains an
# infinitely recursing symlink that makes Bazel very unhappy.
third_party/ittapi/
third_party/opentelemetry-cpp

View File

@ -1,4 +1,4 @@
# Docker images for GitHub CI and CD
# Docker images for GitHub CI
This directory contains everything needed to build the Docker images
that are used in our CI.
@ -12,20 +12,13 @@ each image as the `BUILD_ENVIRONMENT` environment variable.
See `build.sh` for valid build environments (it's the giant switch).
## Docker CI builds
## Contents
* `build.sh` -- dispatch script to launch all builds
* `common` -- scripts used to execute individual Docker build stages
* `ubuntu` -- Dockerfile for Ubuntu image for CPU build and test jobs
* `ubuntu-cuda` -- Dockerfile for Ubuntu image with CUDA support for nvidia-docker
* `ubuntu-rocm` -- Dockerfile for Ubuntu image with ROCm support
* `ubuntu-xpu` -- Dockerfile for Ubuntu image with XPU support
### Docker CD builds
* `conda` - Dockerfile and build.sh to build Docker images used in nightly conda builds
* `manywheel` - Dockerfile and build.sh to build Docker images used in nightly manywheel builds
* `libtorch` - Dockerfile and build.sh to build Docker images used in nightly libtorch builds
## Usage

View File

@ -1,5 +0,0 @@
0.7b
manylinux_2_17
rocm6.2
9be04068c3c0857a4cfd17d7e39e71d0423ebac2
3e9e1959d23b93d78a08fcc5f868125dc3854dece32fd9458be9ef4467982291

View File

@ -71,8 +71,6 @@ if [[ "$image" == *cuda* && "$UBUNTU_VERSION" != "22.04" ]]; then
DOCKERFILE="${OS}-cuda/Dockerfile"
elif [[ "$image" == *rocm* ]]; then
DOCKERFILE="${OS}-rocm/Dockerfile"
elif [[ "$image" == *xpu* ]]; then
DOCKERFILE="${OS}-xpu/Dockerfile"
elif [[ "$image" == *cuda*linter* ]]; then
# Use a separate Dockerfile for linter to keep a small image size
DOCKERFILE="linter-cuda/Dockerfile"
@ -84,30 +82,16 @@ fi
# CMake 3.18 is needed to support CUDA17 language variant
CMAKE_VERSION=3.18.5
_UCX_COMMIT=7bb2722ff2187a0cad557ae4a6afa090569f83fb
_UCC_COMMIT=20eae37090a4ce1b32bcce6144ccad0b49943e0b
_UCX_COMMIT=00bcc6bb18fc282eb160623b4c0d300147f579af
_UCC_COMMIT=7cb07a76ccedad7e56ceb136b865eb9319c258ea
# It's annoying to rename jobs every time you want to rewrite a
# configuration, so we hardcode everything here rather than do it
# from scratch
case "$image" in
pytorch-linux-focal-cuda12.4-cudnn9-py3-gcc9)
CUDA_VERSION=12.4.1
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-cuda12.1-cudnn9-py3-gcc9)
pytorch-linux-focal-cuda12.1-cudnn8-py3-gcc9)
CUDA_VERSION=12.1.1
CUDNN_VERSION=9
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
@ -119,24 +103,9 @@ case "$image" in
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-cuda12.4-cudnn9-py3-gcc9-inductor-benchmarks)
CUDA_VERSION=12.4.1
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-focal-cuda12.1-cudnn9-py3-gcc9-inductor-benchmarks)
pytorch-linux-focal-cuda12.1-cudnn8-py3-gcc9-inductor-benchmarks)
CUDA_VERSION=12.1.1
CUDNN_VERSION=9
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
@ -149,39 +118,9 @@ case "$image" in
TRITON=yes
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-focal-cuda12.1-cudnn9-py3.12-gcc9-inductor-benchmarks)
CUDA_VERSION=12.1.1
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.12
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-focal-cuda12.4-cudnn9-py3.12-gcc9-inductor-benchmarks)
CUDA_VERSION=12.4.1
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.12
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-focal-cuda11.8-cudnn9-py3-gcc9)
pytorch-linux-focal-cuda11.8-cudnn8-py3-gcc9)
CUDA_VERSION=11.8.0
CUDNN_VERSION=9
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
@ -193,37 +132,9 @@ case "$image" in
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-cuda12.4-cudnn9-py3-gcc9)
CUDA_VERSION=12.4.1
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-cuda12.1-cudnn9-py3-gcc9)
pytorch-linux-focal-cuda12.1-cudnn8-py3-gcc9)
CUDA_VERSION=12.1.1
CUDNN_VERSION=9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-cuda12.4-cudnn9-py3-gcc9)
CUDA_VERSION=12.4.1
CUDNN_VERSION=9
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=9
PROTOBUF=yes
@ -236,7 +147,7 @@ case "$image" in
TRITON=yes
;;
pytorch-linux-focal-py3-clang10-onnx)
ANACONDA_PYTHON_VERSION=3.9
ANACONDA_PYTHON_VERSION=3.8
CLANG_VERSION=10
PROTOBUF=yes
DB=yes
@ -245,7 +156,7 @@ case "$image" in
ONNX=yes
;;
pytorch-linux-focal-py3-clang9-android-ndk-r21e)
ANACONDA_PYTHON_VERSION=3.9
ANACONDA_PYTHON_VERSION=3.8
CLANG_VERSION=9
LLVMDEV=yes
PROTOBUF=yes
@ -254,8 +165,8 @@ case "$image" in
GRADLE_VERSION=6.8.3
NINJA_VERSION=1.9.0
;;
pytorch-linux-focal-py3.9-clang10)
ANACONDA_PYTHON_VERSION=3.9
pytorch-linux-focal-py3.8-clang10)
ANACONDA_PYTHON_VERSION=3.8
CLANG_VERSION=10
PROTOBUF=yes
DB=yes
@ -276,8 +187,8 @@ case "$image" in
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-py3.9-gcc9)
ANACONDA_PYTHON_VERSION=3.9
pytorch-linux-focal-py3.8-gcc9)
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=9
PROTOBUF=yes
DB=yes
@ -286,40 +197,29 @@ case "$image" in
TRITON=yes
;;
pytorch-linux-focal-rocm-n-1-py3)
ANACONDA_PYTHON_VERSION=3.10
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=6.1
ROCM_VERSION=5.6
NINJA_VERSION=1.9.0
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-rocm-n-py3)
ANACONDA_PYTHON_VERSION=3.10
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=9
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=6.2
ROCM_VERSION=5.7
NINJA_VERSION=1.9.0
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-jammy-xpu-2024.0-py3)
ANACONDA_PYTHON_VERSION=3.9
GCC_VERSION=11
PROTOBUF=yes
DB=yes
VISION=yes
XPU_VERSION=0.5
NINJA_VERSION=1.9.0
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-jammy-py3.9-gcc11-inductor-benchmarks)
ANACONDA_PYTHON_VERSION=3.9
pytorch-linux-jammy-py3.8-gcc11-inductor-benchmarks)
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=11
PROTOBUF=yes
DB=yes
@ -330,10 +230,10 @@ case "$image" in
DOCS=yes
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-jammy-cuda11.8-cudnn9-py3.9-clang12)
ANACONDA_PYTHON_VERSION=3.9
pytorch-linux-jammy-cuda11.8-cudnn8-py3.8-clang12)
ANACONDA_PYTHON_VERSION=3.8
CUDA_VERSION=11.8
CUDNN_VERSION=9
CUDNN_VERSION=8
CLANG_VERSION=12
PROTOBUF=yes
DB=yes
@ -355,8 +255,8 @@ case "$image" in
CONDA_CMAKE=yes
VISION=yes
;;
pytorch-linux-jammy-py3.9-gcc11)
ANACONDA_PYTHON_VERSION=3.9
pytorch-linux-jammy-py3.8-gcc11)
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=11
PROTOBUF=yes
DB=yes
@ -365,7 +265,6 @@ case "$image" in
CONDA_CMAKE=yes
TRITON=yes
DOCS=yes
UNINSTALL_DILL=yes
;;
pytorch-linux-jammy-py3-clang12-executorch)
ANACONDA_PYTHON_VERSION=3.10
@ -373,13 +272,6 @@ case "$image" in
CONDA_CMAKE=yes
EXECUTORCH=yes
;;
pytorch-linux-jammy-py3.12-halide)
CUDA_VERSION=12.4
ANACONDA_PYTHON_VERSION=3.12
GCC_VERSION=11
CONDA_CMAKE=yes
HALIDE=yes
;;
pytorch-linux-focal-linter)
# TODO: Use 3.9 here because of this issue https://github.com/python/mypy/issues/13627.
# We will need to update mypy version eventually, but that's for another day. The task
@ -387,42 +279,11 @@ case "$image" in
ANACONDA_PYTHON_VERSION=3.9
CONDA_CMAKE=yes
;;
pytorch-linux-jammy-cuda11.8-cudnn9-py3.9-linter)
pytorch-linux-jammy-cuda11.8-cudnn8-py3.9-linter)
ANACONDA_PYTHON_VERSION=3.9
CUDA_VERSION=11.8
CONDA_CMAKE=yes
;;
pytorch-linux-jammy-aarch64-py3.10-gcc11)
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=11
ACL=yes
PROTOBUF=yes
DB=yes
VISION=yes
CONDA_CMAKE=yes
# snadampal: skipping sccache due to the following issue
# https://github.com/pytorch/pytorch/issues/121559
SKIP_SCCACHE_INSTALL=yes
# snadampal: skipping llvm src build install because the current version
# from pytorch/llvm:9.0.1 is x86 specific
SKIP_LLVM_SRC_BUILD_INSTALL=yes
;;
pytorch-linux-jammy-aarch64-py3.10-gcc11-inductor-benchmarks)
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=11
ACL=yes
PROTOBUF=yes
DB=yes
VISION=yes
CONDA_CMAKE=yes
# snadampal: skipping sccache due to the following issue
# https://github.com/pytorch/pytorch/issues/121559
SKIP_SCCACHE_INSTALL=yes
# snadampal: skipping llvm src build install because the current version
# from pytorch/llvm:9.0.1 is x86 specific
SKIP_LLVM_SRC_BUILD_INSTALL=yes
INDUCTOR_BENCHMARKS=yes
;;
*)
# Catch-all for builds that are not hardcoded.
PROTOBUF=yes
@ -470,7 +331,7 @@ tmp_tag=$(basename "$(mktemp -u)" | tr '[:upper:]' '[:lower:]')
#when using cudnn version 8 install it separately from cuda
if [[ "$image" == *cuda* && ${OS} == "ubuntu" ]]; then
IMAGE_NAME="nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION}"
if [[ ${CUDNN_VERSION} == 9 ]]; then
if [[ ${CUDNN_VERSION} == 8 ]]; then
IMAGE_NAME="nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}"
fi
fi
@ -513,17 +374,12 @@ docker build \
--build-arg "DOCS=${DOCS}" \
--build-arg "INDUCTOR_BENCHMARKS=${INDUCTOR_BENCHMARKS}" \
--build-arg "EXECUTORCH=${EXECUTORCH}" \
--build-arg "HALIDE=${HALIDE}" \
--build-arg "XPU_VERSION=${XPU_VERSION}" \
--build-arg "ACL=${ACL:-}" \
--build-arg "SKIP_SCCACHE_INSTALL=${SKIP_SCCACHE_INSTALL:-}" \
--build-arg "SKIP_LLVM_SRC_BUILD_INSTALL=${SKIP_LLVM_SRC_BUILD_INSTALL:-}" \
-f $(dirname ${DOCKERFILE})/Dockerfile \
-t "$tmp_tag" \
"$@" \
.
# NVIDIA dockers for RC releases use tag names like `11.0-cudnn9-devel-ubuntu18.04-rc`,
# NVIDIA dockers for RC releases use tag names like `11.0-cudnn8-devel-ubuntu18.04-rc`,
# for this case we will set UBUNTU_VERSION to `18.04-rc` so that the Dockerfile could
# find the correct image. As a result, here we have to replace the
# "$UBUNTU_VERSION" == "18.04-rc"

View File

@ -62,7 +62,7 @@ RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV
# (optional) Install vision packages like OpenCV and ffmpeg
ARG VISION
COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
@ -77,9 +77,6 @@ RUN rm install_rocm.sh
COPY ./common/install_rocm_magma.sh install_rocm_magma.sh
RUN bash ./install_rocm_magma.sh
RUN rm install_rocm_magma.sh
COPY ./common/install_amdsmi.sh install_amdsmi.sh
RUN bash ./install_amdsmi.sh
RUN rm install_amdsmi.sh
ENV PATH /opt/rocm/bin:$PATH
ENV PATH /opt/rocm/hcc/bin:$PATH
ENV PATH /opt/rocm/hip/bin:$PATH
@ -108,17 +105,10 @@ ENV CMAKE_C_COMPILER cc
ENV CMAKE_CXX_COMPILER c++
COPY ./common/install_triton.sh install_triton.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/triton.txt triton.txt
COPY ci_commit_pins/triton-rocm.txt triton-rocm.txt
COPY triton_version.txt triton_version.txt
RUN if [ -n "${TRITON}" ]; then bash ./install_triton.sh; fi
RUN rm install_triton.sh common_utils.sh triton.txt triton_version.txt
# Install AOTriton (Early fail)
COPY ./aotriton_version.txt aotriton_version.txt
COPY ./common/common_utils.sh common_utils.sh
COPY ./common/install_aotriton.sh install_aotriton.sh
RUN ["/bin/bash", "-c", "./install_aotriton.sh /opt/rocm && rm -rf install_aotriton.sh aotriton_version.txt common_utils.sh"]
ENV AOTRITON_INSTALLED_PREFIX /opt/rocm/aotriton
RUN rm install_triton.sh common_utils.sh triton-rocm.txt triton_version.txt
# Install ccache/sccache (do this last, so we get priority in PATH)
COPY ./common/install_cache.sh install_cache.sh

View File

@ -1 +1 @@
cd1c833b079adb324871dcbbe75b43d42ffc0ade
ca6322dcfc51b209a06b76d160bd95d81d58f15c

View File

@ -1 +0,0 @@
461c12871f336fe6f57b55d6a297f13ef209161b

View File

@ -1 +1 @@
243e186efbf7fb93328dd6b34927a4e8c8f24395
6c26faa159b79a42d7fa46cb66e2d21523351987

View File

@ -1 +1 @@
ac3470188b914c5d7a5058a7e28b9eb685a62427
730b907b4d45a4713cbc425cbf224c46089fd514

View File

@ -0,0 +1 @@
dafe1459823b9549417ed95e9720f1b594fab329

View File

@ -1 +0,0 @@
91b14bf5593cf58a8541f3e6b9125600a867d4ef

View File

@ -1 +1 @@
5fe38ffd73c2ac6ed6323b554205186696631c6f
bcad9dabe15021c53b6a88296e9d7a210044f108

View File

@ -1,16 +0,0 @@
set -euo pipefail
readonly version=v24.04
readonly src_host=https://review.mlplatform.org/ml
readonly src_repo=ComputeLibrary
# Clone ACL
[[ ! -d ${src_repo} ]] && git clone ${src_host}/${src_repo}.git
cd ${src_repo}
git checkout $version
# Build with scons
scons -j8 Werror=0 debug=0 neon=1 opencl=0 embed_kernels=0 \
os=linux arch=armv8a build=native multi_isa=1 \
fixed_format_kernels=1 openmp=1 cppthreads=0

View File

@ -1,5 +0,0 @@
#!/bin/bash
set -ex
cd /opt/rocm/share/amd_smi && pip install .

View File

@ -1,23 +0,0 @@
#!/bin/bash
set -ex
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
TARBALL='aotriton.tar.gz'
# This read command alwasy returns with exit code 1
read -d "\n" VER MANYLINUX ROCMBASE PINNED_COMMIT SHA256 < aotriton_version.txt || true
ARCH=$(uname -m)
AOTRITON_INSTALL_PREFIX="$1"
AOTRITON_URL="https://github.com/ROCm/aotriton/releases/download/${VER}/aotriton-${VER}-${MANYLINUX}_${ARCH}-${ROCMBASE}-shared.tar.gz"
cd "${AOTRITON_INSTALL_PREFIX}"
# Must use -L to follow redirects
curl -L --retry 3 -o "${TARBALL}" "${AOTRITON_URL}"
ACTUAL_SHA256=$(sha256sum "${TARBALL}" | cut -d " " -f 1)
if [ "${SHA256}" != "${ACTUAL_SHA256}" ]; then
echo -n "Error: The SHA256 of downloaded tarball is ${ACTUAL_SHA256},"
echo " which does not match the expected value ${SHA256}."
exit
fi
tar xf "${TARBALL}" && rm -rf "${TARBALL}"

View File

@ -3,7 +3,7 @@
set -ex
install_ubuntu() {
# NVIDIA dockers for RC releases use tag names like `11.0-cudnn9-devel-ubuntu18.04-rc`,
# NVIDIA dockers for RC releases use tag names like `11.0-cudnn8-devel-ubuntu18.04-rc`,
# for this case we will set UBUNTU_VERSION to `18.04-rc` so that the Dockerfile could
# find the correct image. As a result, here we have to check for
# "$UBUNTU_VERSION" == "18.04"*
@ -61,7 +61,6 @@ install_ubuntu() {
${maybe_libiomp_dev} \
libyaml-dev \
libz-dev \
libjemalloc2 \
libjpeg-dev \
libasound2-dev \
libsndfile-dev \
@ -75,7 +74,6 @@ install_ubuntu() {
libtool \
vim \
unzip \
gpg-agent \
gdb
# Should resolve issues related to various apt package repository cert issues
@ -113,6 +111,7 @@ install_centos() {
glibc-devel \
glibc-headers \
glog-devel \
hiredis-devel \
libstdc++-devel \
libsndfile-devel \
make \
@ -152,7 +151,7 @@ wget https://ossci-linux.s3.amazonaws.com/valgrind-${VALGRIND_VERSION}.tar.bz2
tar -xjf valgrind-${VALGRIND_VERSION}.tar.bz2
cd valgrind-${VALGRIND_VERSION}
./configure --prefix=/usr/local
make -j$[$(nproc) - 2]
make -j6
sudo make install
cd ../../
rm -rf valgrind_build

View File

@ -5,17 +5,17 @@ set -ex
# Optionally install conda
if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
BASE_URL="https://repo.anaconda.com/miniconda"
CONDA_FILE="Miniconda3-latest-Linux-x86_64.sh"
if [[ $(uname -m) == "aarch64" ]] || [[ "$BUILD_ENVIRONMENT" == *xpu* ]]; then
BASE_URL="https://github.com/conda-forge/miniforge/releases/latest/download"
CONDA_FILE="Miniforge3-Linux-$(uname -m).sh"
fi
MAJOR_PYTHON_VERSION=$(echo "$ANACONDA_PYTHON_VERSION" | cut -d . -f 1)
MINOR_PYTHON_VERSION=$(echo "$ANACONDA_PYTHON_VERSION" | cut -d . -f 2)
case "$MAJOR_PYTHON_VERSION" in
3);;
2)
CONDA_FILE="Miniconda2-latest-Linux-x86_64.sh"
;;
3)
CONDA_FILE="Miniconda3-latest-Linux-x86_64.sh"
;;
*)
echo "Unsupported ANACONDA_PYTHON_VERSION: $ANACONDA_PYTHON_VERSION"
exit 1
@ -47,41 +47,16 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
# Uncomment the below when resolved to track the latest conda update
# as_jenkins conda update -y -n base conda
if [[ $(uname -m) == "aarch64" ]]; then
export SYSROOT_DEP="sysroot_linux-aarch64=2.17"
else
export SYSROOT_DEP="sysroot_linux-64=2.17"
fi
# Install correct Python version
# Also ensure sysroot is using a modern GLIBC to match system compilers
as_jenkins conda create -n py_$ANACONDA_PYTHON_VERSION -y\
python="$ANACONDA_PYTHON_VERSION" \
${SYSROOT_DEP}
# libstdcxx from conda default channels are too old, we need GLIBCXX_3.4.30
# which is provided in libstdcxx 12 and up.
conda_install libstdcxx-ng=12.3.0 -c conda-forge
as_jenkins conda create -n py_$ANACONDA_PYTHON_VERSION -y python="$ANACONDA_PYTHON_VERSION"
# Install PyTorch conda deps, as per https://github.com/pytorch/pytorch README
if [[ $(uname -m) == "aarch64" ]]; then
CONDA_COMMON_DEPS="astunparse pyyaml setuptools openblas==0.3.25=*openmp* ninja==1.11.1 scons==4.5.2"
if [ "$ANACONDA_PYTHON_VERSION" = "3.8" ]; then
NUMPY_VERSION=1.24.4
else
NUMPY_VERSION=1.26.2
fi
CONDA_COMMON_DEPS="astunparse pyyaml mkl=2021.4.0 mkl-include=2021.4.0 setuptools"
if [ "$ANACONDA_PYTHON_VERSION" = "3.11" ]; then
conda_install numpy=1.23.5 ${CONDA_COMMON_DEPS}
else
CONDA_COMMON_DEPS="astunparse pyyaml mkl=2021.4.0 mkl-include=2021.4.0 setuptools"
if [ "$ANACONDA_PYTHON_VERSION" = "3.11" ] || [ "$ANACONDA_PYTHON_VERSION" = "3.12" ] || [ "$ANACONDA_PYTHON_VERSION" = "3.13" ]; then
NUMPY_VERSION=1.26.0
else
NUMPY_VERSION=1.21.2
fi
conda_install numpy=1.21.2 ${CONDA_COMMON_DEPS}
fi
conda_install ${CONDA_COMMON_DEPS}
# Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
# and libpython-static for torch deploy
@ -103,7 +78,7 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
# Install some other packages, including those needed for Python test reporting
pip_install -r /opt/conda/requirements-ci.txt
pip_install numpy=="$NUMPY_VERSION"
pip_install -U scikit-learn
if [ -n "$DOCS" ]; then
@ -114,5 +89,14 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
pip_install -r /opt/conda/requirements-docs.txt
fi
# HACK HACK HACK
# gcc-9 for ubuntu-18.04 from http://ppa.launchpad.net/ubuntu-toolchain-r/test/ubuntu
# Pulls llibstdc++6 13.1.0-8ubuntu1~18.04 which is too new for conda
# So remove libstdc++6.so.3.29 installed by https://anaconda.org/anaconda/libstdcxx-ng/files?version=11.2.0
# Same is true for gcc-12 from Ubuntu-22.04
if grep -e [12][82].04.[623] /etc/issue >/dev/null; then
rm /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/lib/libstdc++.so.6
fi
popd
fi

View File

@ -1,20 +0,0 @@
#!/bin/bash
# Script used only in CD pipeline
set -ex
# Anaconda
# Latest anaconda is using openssl-3 which is incompatible with all currently published versions of git
# Which are using openssl-1.1.1, see https://anaconda.org/anaconda/git/files?version=2.40.1 for example
MINICONDA_URL=https://repo.anaconda.com/miniconda/Miniconda3-py311_23.5.2-0-Linux-x86_64.sh
wget -q $MINICONDA_URL
# NB: Manually invoke bash per https://github.com/conda/conda/issues/10431
bash $(basename "$MINICONDA_URL") -b -p /opt/conda
rm $(basename "$MINICONDA_URL")
export PATH=/opt/conda/bin:$PATH
# See https://github.com/pytorch/builder/issues/1473
# Pin conda to 23.5.2 as it's the last one compatible with openssl-1.1.1
conda install -y conda=23.5.2 conda-build anaconda-client git ninja
# The cmake version here needs to match with the minimum version of cmake
# supported by PyTorch (3.18). There is only 3.18.2 on anaconda
/opt/conda/bin/pip3 install cmake==3.18.2
conda remove -y --force patchelf

View File

@ -1,96 +0,0 @@
#!/bin/bash
# Script used only in CD pipeline
set -uex -o pipefail
PYTHON_DOWNLOAD_URL=https://www.python.org/ftp/python
PYTHON_DOWNLOAD_GITHUB_BRANCH=https://github.com/python/cpython/archive/refs/heads
GET_PIP_URL=https://bootstrap.pypa.io/get-pip.py
# Python versions to be installed in /opt/$VERSION_NO
CPYTHON_VERSIONS=${CPYTHON_VERSIONS:-"3.8.1 3.9.0 3.10.1 3.11.0 3.12.0 3.13.0"}
function check_var {
if [ -z "$1" ]; then
echo "required variable not defined"
exit 1
fi
}
function do_cpython_build {
local py_ver=$1
local py_folder=$2
check_var $py_ver
check_var $py_folder
tar -xzf Python-$py_ver.tgz
pushd $py_folder
local prefix="/opt/_internal/cpython-${py_ver}"
mkdir -p ${prefix}/lib
if [[ -n $(which patchelf) ]]; then
local shared_flags="--enable-shared"
else
local shared_flags="--disable-shared"
fi
if [[ -z "${WITH_OPENSSL+x}" ]]; then
local openssl_flags=""
else
local openssl_flags="--with-openssl=${WITH_OPENSSL} --with-openssl-rpath=auto"
fi
# -Wformat added for https://bugs.python.org/issue17547 on Python 2.6
CFLAGS="-Wformat" ./configure --prefix=${prefix} ${openssl_flags} ${shared_flags} > /dev/null
make -j40 > /dev/null
make install > /dev/null
if [[ "${shared_flags}" == "--enable-shared" ]]; then
patchelf --set-rpath '$ORIGIN/../lib' ${prefix}/bin/python3
fi
popd
rm -rf $py_folder
# Some python's install as bin/python3. Make them available as
# bin/python.
if [ -e ${prefix}/bin/python3 ]; then
ln -s python3 ${prefix}/bin/python
fi
${prefix}/bin/python get-pip.py
if [ -e ${prefix}/bin/pip3 ] && [ ! -e ${prefix}/bin/pip ]; then
ln -s pip3 ${prefix}/bin/pip
fi
# install setuptools since python 3.12 is required to use distutils
${prefix}/bin/pip install wheel==0.34.2 setuptools==68.2.2
local abi_tag=$(${prefix}/bin/python -c "from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag; print('{0}{1}-{2}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag()))")
ln -s ${prefix} /opt/python/${abi_tag}
}
function build_cpython {
local py_ver=$1
check_var $py_ver
check_var $PYTHON_DOWNLOAD_URL
local py_ver_folder=$py_ver
if [ "$py_ver" = "3.13.0" ]; then
PY_VER_SHORT="3.13"
check_var $PYTHON_DOWNLOAD_GITHUB_BRANCH
wget $PYTHON_DOWNLOAD_GITHUB_BRANCH/$PY_VER_SHORT.tar.gz -O Python-$py_ver.tgz
do_cpython_build $py_ver cpython-$PY_VER_SHORT
else
wget -q $PYTHON_DOWNLOAD_URL/$py_ver_folder/Python-$py_ver.tgz
do_cpython_build $py_ver Python-$py_ver
fi
rm -f Python-$py_ver.tgz
}
function build_cpythons {
check_var $GET_PIP_URL
curl -sLO $GET_PIP_URL
for py_ver in $@; do
build_cpython $py_ver
done
rm -f get-pip.py
}
mkdir -p /opt/python
mkdir -p /opt/_internal
build_cpythons $CPYTHON_VERSIONS

View File

@ -1,250 +0,0 @@
#!/bin/bash
set -ex
NCCL_VERSION=v2.21.5-1
CUDNN_VERSION=9.1.0.70
function install_cusparselt_040 {
# cuSparseLt license: https://docs.nvidia.com/cuda/cusparselt/license.html
mkdir tmp_cusparselt && pushd tmp_cusparselt
wget -q https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-x86_64/libcusparse_lt-linux-x86_64-0.4.0.7-archive.tar.xz
tar xf libcusparse_lt-linux-x86_64-0.4.0.7-archive.tar.xz
cp -a libcusparse_lt-linux-x86_64-0.4.0.7-archive/include/* /usr/local/cuda/include/
cp -a libcusparse_lt-linux-x86_64-0.4.0.7-archive/lib/* /usr/local/cuda/lib64/
popd
rm -rf tmp_cusparselt
}
function install_cusparselt_052 {
# cuSparseLt license: https://docs.nvidia.com/cuda/cusparselt/license.html
mkdir tmp_cusparselt && pushd tmp_cusparselt
wget -q https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-x86_64/libcusparse_lt-linux-x86_64-0.5.2.1-archive.tar.xz
tar xf libcusparse_lt-linux-x86_64-0.5.2.1-archive.tar.xz
cp -a libcusparse_lt-linux-x86_64-0.5.2.1-archive/include/* /usr/local/cuda/include/
cp -a libcusparse_lt-linux-x86_64-0.5.2.1-archive/lib/* /usr/local/cuda/lib64/
popd
rm -rf tmp_cusparselt
}
function install_cusparselt_062 {
# cuSparseLt license: https://docs.nvidia.com/cuda/cusparselt/license.html
mkdir tmp_cusparselt && pushd tmp_cusparselt
wget -q https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-x86_64/libcusparse_lt-linux-x86_64-0.6.2.3-archive.tar.xz
tar xf libcusparse_lt-linux-x86_64-0.6.2.3-archive.tar.xz
cp -a libcusparse_lt-linux-x86_64-0.6.2.3-archive/include/* /usr/local/cuda/include/
cp -a libcusparse_lt-linux-x86_64-0.6.2.3-archive/lib/* /usr/local/cuda/lib64/
popd
rm -rf tmp_cusparselt
}
function install_118 {
echo "Installing CUDA 11.8 and cuDNN ${CUDNN_VERSION} and NCCL ${NCCL_VERSION} and cuSparseLt-0.4.0"
rm -rf /usr/local/cuda-11.8 /usr/local/cuda
# install CUDA 11.8.0 in the same container
wget -q https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
chmod +x cuda_11.8.0_520.61.05_linux.run
./cuda_11.8.0_520.61.05_linux.run --toolkit --silent
rm -f cuda_11.8.0_520.61.05_linux.run
rm -f /usr/local/cuda && ln -s /usr/local/cuda-11.8 /usr/local/cuda
# cuDNN license: https://developer.nvidia.com/cudnn/license_agreement
mkdir tmp_cudnn && cd tmp_cudnn
wget -q https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-${CUDNN_VERSION}_cuda11-archive.tar.xz -O cudnn-linux-x86_64-${CUDNN_VERSION}_cuda11-archive.tar.xz
tar xf cudnn-linux-x86_64-${CUDNN_VERSION}_cuda11-archive.tar.xz
cp -a cudnn-linux-x86_64-${CUDNN_VERSION}_cuda11-archive/include/* /usr/local/cuda/include/
cp -a cudnn-linux-x86_64-${CUDNN_VERSION}_cuda11-archive/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf tmp_cudnn
# NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
# Follow build: https://github.com/NVIDIA/nccl/tree/master?tab=readme-ov-file#build
git clone -b $NCCL_VERSION --depth 1 https://github.com/NVIDIA/nccl.git
cd nccl && make -j src.build
cp -a build/include/* /usr/local/cuda/include/
cp -a build/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf nccl
install_cusparselt_040
ldconfig
}
function install_121 {
echo "Installing CUDA 12.1 and cuDNN ${CUDNN_VERSION} and NCCL ${NCCL_VERSION} and cuSparseLt-0.5.2"
rm -rf /usr/local/cuda-12.1 /usr/local/cuda
# install CUDA 12.1.0 in the same container
wget -q https://developer.download.nvidia.com/compute/cuda/12.1.1/local_installers/cuda_12.1.1_530.30.02_linux.run
chmod +x cuda_12.1.1_530.30.02_linux.run
./cuda_12.1.1_530.30.02_linux.run --toolkit --silent
rm -f cuda_12.1.1_530.30.02_linux.run
rm -f /usr/local/cuda && ln -s /usr/local/cuda-12.1 /usr/local/cuda
# cuDNN license: https://developer.nvidia.com/cudnn/license_agreement
mkdir tmp_cudnn && cd tmp_cudnn
wget -q https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive.tar.xz -O cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive.tar.xz
tar xf cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive.tar.xz
cp -a cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive/include/* /usr/local/cuda/include/
cp -a cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf tmp_cudnn
# NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
# Follow build: https://github.com/NVIDIA/nccl/tree/master?tab=readme-ov-file#build
git clone -b $NCCL_VERSION --depth 1 https://github.com/NVIDIA/nccl.git
cd nccl && make -j src.build
cp -a build/include/* /usr/local/cuda/include/
cp -a build/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf nccl
install_cusparselt_052
ldconfig
}
function install_124 {
echo "Installing CUDA 12.4.1 and cuDNN ${CUDNN_VERSION} and NCCL ${NCCL_VERSION} and cuSparseLt-0.5.2"
rm -rf /usr/local/cuda-12.4 /usr/local/cuda
# install CUDA 12.4.1 in the same container
wget -q https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda_12.4.1_550.54.15_linux.run
chmod +x cuda_12.4.1_550.54.15_linux.run
./cuda_12.4.1_550.54.15_linux.run --toolkit --silent
rm -f cuda_12.4.1_550.54.15_linux.run
rm -f /usr/local/cuda && ln -s /usr/local/cuda-12.4 /usr/local/cuda
# cuDNN license: https://developer.nvidia.com/cudnn/license_agreement
mkdir tmp_cudnn && cd tmp_cudnn
wget -q https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive.tar.xz -O cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive.tar.xz
tar xf cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive.tar.xz
cp -a cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive/include/* /usr/local/cuda/include/
cp -a cudnn-linux-x86_64-${CUDNN_VERSION}_cuda12-archive/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf tmp_cudnn
# NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
# Follow build: https://github.com/NVIDIA/nccl/tree/master?tab=readme-ov-file#build
git clone -b $NCCL_VERSION --depth 1 https://github.com/NVIDIA/nccl.git
cd nccl && make -j src.build
cp -a build/include/* /usr/local/cuda/include/
cp -a build/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf nccl
install_cusparselt_062
ldconfig
}
function prune_118 {
echo "Pruning CUDA 11.8 and cuDNN"
#####################################################################################
# CUDA 11.8 prune static libs
#####################################################################################
export NVPRUNE="/usr/local/cuda-11.8/bin/nvprune"
export CUDA_LIB_DIR="/usr/local/cuda-11.8/lib64"
export GENCODE="-gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_90,code=sm_90"
export GENCODE_CUDNN="-gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_90,code=sm_90"
if [[ -n "$OVERRIDE_GENCODE" ]]; then
export GENCODE=$OVERRIDE_GENCODE
fi
# all CUDA libs except CuDNN and CuBLAS (cudnn and cublas need arch 3.7 included)
ls $CUDA_LIB_DIR/ | grep "\.a" | grep -v "culibos" | grep -v "cudart" | grep -v "cudnn" | grep -v "cublas" | grep -v "metis" \
| xargs -I {} bash -c \
"echo {} && $NVPRUNE $GENCODE $CUDA_LIB_DIR/{} -o $CUDA_LIB_DIR/{}"
# prune CuDNN and CuBLAS
$NVPRUNE $GENCODE_CUDNN $CUDA_LIB_DIR/libcublas_static.a -o $CUDA_LIB_DIR/libcublas_static.a
$NVPRUNE $GENCODE_CUDNN $CUDA_LIB_DIR/libcublasLt_static.a -o $CUDA_LIB_DIR/libcublasLt_static.a
#####################################################################################
# CUDA 11.8 prune visual tools
#####################################################################################
export CUDA_BASE="/usr/local/cuda-11.8/"
rm -rf $CUDA_BASE/libnvvp $CUDA_BASE/nsightee_plugins $CUDA_BASE/nsight-compute-2022.3.0 $CUDA_BASE/nsight-systems-2022.4.2/
}
function prune_121 {
echo "Pruning CUDA 12.1"
#####################################################################################
# CUDA 12.1 prune static libs
#####################################################################################
export NVPRUNE="/usr/local/cuda-12.1/bin/nvprune"
export CUDA_LIB_DIR="/usr/local/cuda-12.1/lib64"
export GENCODE="-gencode arch=compute_50,code=sm_50 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_90,code=sm_90"
export GENCODE_CUDNN="-gencode arch=compute_50,code=sm_50 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_90,code=sm_90"
if [[ -n "$OVERRIDE_GENCODE" ]]; then
export GENCODE=$OVERRIDE_GENCODE
fi
# all CUDA libs except CuDNN and CuBLAS
ls $CUDA_LIB_DIR/ | grep "\.a" | grep -v "culibos" | grep -v "cudart" | grep -v "cudnn" | grep -v "cublas" | grep -v "metis" \
| xargs -I {} bash -c \
"echo {} && $NVPRUNE $GENCODE $CUDA_LIB_DIR/{} -o $CUDA_LIB_DIR/{}"
# prune CuDNN and CuBLAS
$NVPRUNE $GENCODE_CUDNN $CUDA_LIB_DIR/libcublas_static.a -o $CUDA_LIB_DIR/libcublas_static.a
$NVPRUNE $GENCODE_CUDNN $CUDA_LIB_DIR/libcublasLt_static.a -o $CUDA_LIB_DIR/libcublasLt_static.a
#####################################################################################
# CUDA 12.1 prune visual tools
#####################################################################################
export CUDA_BASE="/usr/local/cuda-12.1/"
rm -rf $CUDA_BASE/libnvvp $CUDA_BASE/nsightee_plugins $CUDA_BASE/nsight-compute-2023.1.0 $CUDA_BASE/nsight-systems-2023.1.2/
}
function prune_124 {
echo "Pruning CUDA 12.4"
#####################################################################################
# CUDA 12.4 prune static libs
#####################################################################################
export NVPRUNE="/usr/local/cuda-12.4/bin/nvprune"
export CUDA_LIB_DIR="/usr/local/cuda-12.4/lib64"
export GENCODE="-gencode arch=compute_50,code=sm_50 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_90,code=sm_90"
export GENCODE_CUDNN="-gencode arch=compute_50,code=sm_50 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_90,code=sm_90"
if [[ -n "$OVERRIDE_GENCODE" ]]; then
export GENCODE=$OVERRIDE_GENCODE
fi
if [[ -n "$OVERRIDE_GENCODE_CUDNN" ]]; then
export GENCODE_CUDNN=$OVERRIDE_GENCODE_CUDNN
fi
# all CUDA libs except CuDNN and CuBLAS
ls $CUDA_LIB_DIR/ | grep "\.a" | grep -v "culibos" | grep -v "cudart" | grep -v "cudnn" | grep -v "cublas" | grep -v "metis" \
| xargs -I {} bash -c \
"echo {} && $NVPRUNE $GENCODE $CUDA_LIB_DIR/{} -o $CUDA_LIB_DIR/{}"
# prune CuDNN and CuBLAS
$NVPRUNE $GENCODE_CUDNN $CUDA_LIB_DIR/libcublas_static.a -o $CUDA_LIB_DIR/libcublas_static.a
$NVPRUNE $GENCODE_CUDNN $CUDA_LIB_DIR/libcublasLt_static.a -o $CUDA_LIB_DIR/libcublasLt_static.a
#####################################################################################
# CUDA 12.1 prune visual tools
#####################################################################################
export CUDA_BASE="/usr/local/cuda-12.4/"
rm -rf $CUDA_BASE/libnvvp $CUDA_BASE/nsightee_plugins $CUDA_BASE/nsight-compute-2024.1.0 $CUDA_BASE/nsight-systems-2023.4.4/
}
# idiomatic parameter and option handling in sh
while test $# -gt 0
do
case "$1" in
11.8) install_118; prune_118
;;
12.1) install_121; prune_121
;;
12.4) install_124; prune_124
;;
*) echo "bad argument $1"; exit 1
;;
esac
shift
done

View File

@ -1,93 +0,0 @@
#!/bin/bash
# Script used only in CD pipeline
set -ex
NCCL_VERSION=v2.21.5-1
function install_cusparselt_052 {
# cuSparseLt license: https://docs.nvidia.com/cuda/cusparselt/license.html
mkdir tmp_cusparselt && pushd tmp_cusparselt
wget -q https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-sbsa/libcusparse_lt-linux-sbsa-0.5.2.1-archive.tar.xz
tar xf libcusparse_lt-linux-sbsa-0.5.2.1-archive.tar.xz
cp -a libcusparse_lt-linux-sbsa-0.5.2.1-archive/include/* /usr/local/cuda/include/
cp -a libcusparse_lt-linux-sbsa-0.5.2.1-archive/lib/* /usr/local/cuda/lib64/
popd
rm -rf tmp_cusparselt
}
function install_124 {
echo "Installing CUDA 12.4.1 and cuDNN 9.1 and NCCL ${NCCL_VERSION} and cuSparseLt-0.5.2"
rm -rf /usr/local/cuda-12.4 /usr/local/cuda
# install CUDA 12.4.1 in the same container
wget -q https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda_12.4.1_550.54.15_linux_sbsa.run
chmod +x cuda_12.4.1_550.54.15_linux_sbsa.run
./cuda_12.4.1_550.54.15_linux_sbsa.run --toolkit --silent
rm -f cuda_12.4.1_550.54.15_linux_sbsa.run
rm -f /usr/local/cuda && ln -s /usr/local/cuda-12.4 /usr/local/cuda
# cuDNN license: https://developer.nvidia.com/cudnn/license_agreement
mkdir tmp_cudnn && cd tmp_cudnn
wget -q https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-sbsa/cudnn-linux-sbsa-9.1.0.70_cuda12-archive.tar.xz -O cudnn-linux-sbsa-9.1.0.70_cuda12-archive.tar.xz
tar xf cudnn-linux-sbsa-9.1.0.70_cuda12-archive.tar.xz
cp -a cudnn-linux-sbsa-9.1.0.70_cuda12-archive/include/* /usr/local/cuda/include/
cp -a cudnn-linux-sbsa-9.1.0.70_cuda12-archive/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf tmp_cudnn
# NCCL license: https://docs.nvidia.com/deeplearning/nccl/#licenses
# Follow build: https://github.com/NVIDIA/nccl/tree/master?tab=readme-ov-file#build
git clone -b ${NCCL_VERSION} --depth 1 https://github.com/NVIDIA/nccl.git
cd nccl && make -j src.build
cp -a build/include/* /usr/local/cuda/include/
cp -a build/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf nccl
install_cusparselt_052
ldconfig
}
function prune_124 {
echo "Pruning CUDA 12.4"
#####################################################################################
# CUDA 12.4 prune static libs
#####################################################################################
export NVPRUNE="/usr/local/cuda-12.4/bin/nvprune"
export CUDA_LIB_DIR="/usr/local/cuda-12.4/lib64"
export GENCODE="-gencode arch=compute_50,code=sm_50 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_90,code=sm_90"
export GENCODE_CUDNN="-gencode arch=compute_50,code=sm_50 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_90,code=sm_90"
if [[ -n "$OVERRIDE_GENCODE" ]]; then
export GENCODE=$OVERRIDE_GENCODE
fi
# all CUDA libs except CuDNN and CuBLAS
ls $CUDA_LIB_DIR/ | grep "\.a" | grep -v "culibos" | grep -v "cudart" | grep -v "cudnn" | grep -v "cublas" | grep -v "metis" \
| xargs -I {} bash -c \
"echo {} && $NVPRUNE $GENCODE $CUDA_LIB_DIR/{} -o $CUDA_LIB_DIR/{}"
# prune CuDNN and CuBLAS
$NVPRUNE $GENCODE_CUDNN $CUDA_LIB_DIR/libcublas_static.a -o $CUDA_LIB_DIR/libcublas_static.a
$NVPRUNE $GENCODE_CUDNN $CUDA_LIB_DIR/libcublasLt_static.a -o $CUDA_LIB_DIR/libcublasLt_static.a
#####################################################################################
# CUDA 12.1 prune visual tools
#####################################################################################
export CUDA_BASE="/usr/local/cuda-12.4/"
rm -rf $CUDA_BASE/libnvvp $CUDA_BASE/nsightee_plugins $CUDA_BASE/nsight-compute-2024.1.0 $CUDA_BASE/nsight-systems-2023.4.4/
}
# idiomatic parameter and option handling in sh
while test $# -gt 0
do
case "$1" in
12.4) install_124; prune_124
;;
*) echo "bad argument $1"; exit 1
;;
esac
shift
done

View File

@ -1,22 +1,27 @@
#!/bin/bash
if [[ -n "${CUDNN_VERSION}" ]]; then
if [[ ${CUDNN_VERSION} == 8 ]]; then
# cuDNN license: https://developer.nvidia.com/cudnn/license_agreement
mkdir tmp_cudnn
pushd tmp_cudnn
if [[ ${CUDA_VERSION:0:2} == "12" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-9.1.0.70_cuda12-archive"
elif [[ ${CUDA_VERSION:0:2} == "11" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-9.1.0.70_cuda11-archive"
mkdir tmp_cudnn && cd tmp_cudnn
CUDNN_NAME="cudnn-linux-x86_64-8.3.2.44_cuda11.5-archive"
if [[ ${CUDA_VERSION:0:4} == "12.1" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-8.9.2.26_cuda12-archive"
curl --retry 3 -OLs https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/${CUDNN_NAME}.tar.xz
elif [[ ${CUDA_VERSION:0:4} == "11.8" ]]; then
CUDNN_NAME="cudnn-linux-x86_64-8.7.0.84_cuda11-archive"
curl --retry 3 -OLs https://developer.download.nvidia.com/compute/redist/cudnn/v8.7.0/local_installers/11.8/${CUDNN_NAME}.tar.xz
else
print "Unsupported CUDA version ${CUDA_VERSION}"
exit 1
curl --retry 3 -OLs https://developer.download.nvidia.com/compute/redist/cudnn/v8.3.2/local_installers/11.5/${CUDNN_NAME}.tar.xz
fi
curl --retry 3 -OLs https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/${CUDNN_NAME}.tar.xz
tar xf ${CUDNN_NAME}.tar.xz
cp -a ${CUDNN_NAME}/include/* /usr/include/
cp -a ${CUDNN_NAME}/include/* /usr/local/cuda/include/
cp -a ${CUDNN_NAME}/include/* /usr/include/x86_64-linux-gnu/
cp -a ${CUDNN_NAME}/lib/* /usr/local/cuda/lib64/
popd
cp -a ${CUDNN_NAME}/lib/* /usr/lib/x86_64-linux-gnu/
cd ..
rm -rf tmp_cudnn
ldconfig
fi

View File

@ -1,25 +0,0 @@
#!/bin/bash
set -ex
# cudss license: https://docs.nvidia.com/cuda/cudss/license.html
mkdir tmp_cudss && cd tmp_cudss
if [[ ${CUDA_VERSION:0:4} =~ ^12\.[1-4]$ ]]; then
arch_path='sbsa'
export TARGETARCH=${TARGETARCH:-$(uname -m)}
if [ ${TARGETARCH} = 'amd64' ] || [ "${TARGETARCH}" = 'x86_64' ]; then
arch_path='x86_64'
fi
CUDSS_NAME="libcudss-linux-${arch_path}-0.3.0.9_cuda12-archive"
curl --retry 3 -OLs https://developer.download.nvidia.com/compute/cudss/redist/libcudss/linux-${arch_path}/${CUDSS_NAME}.tar.xz
# only for cuda 12
tar xf ${CUDSS_NAME}.tar.xz
cp -a ${CUDSS_NAME}/include/* /usr/local/cuda/include/
cp -a ${CUDSS_NAME}/lib/* /usr/local/cuda/lib64/
fi
cd ..
rm -rf tmp_cudss
ldconfig

View File

@ -1,34 +0,0 @@
#!/bin/bash
set -ex
# cuSPARSELt license: https://docs.nvidia.com/cuda/cusparselt/license.html
mkdir tmp_cusparselt && cd tmp_cusparselt
if [[ ${CUDA_VERSION:0:4} =~ ^12\.[2-4]$ ]]; then
arch_path='sbsa'
export TARGETARCH=${TARGETARCH:-$(uname -m)}
if [ ${TARGETARCH} = 'amd64' ] || [ "${TARGETARCH}" = 'x86_64' ]; then
arch_path='x86_64'
fi
CUSPARSELT_NAME="libcusparse_lt-linux-${arch_path}-0.6.2.3-archive"
curl --retry 3 -OLs https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-${arch_path}/${CUSPARSELT_NAME}.tar.xz
elif [[ ${CUDA_VERSION:0:4} == "12.1" ]]; then
arch_path='sbsa'
export TARGETARCH=${TARGETARCH:-$(uname -m)}
if [ ${TARGETARCH} = 'amd64' ] || [ "${TARGETARCH}" = 'x86_64' ]; then
arch_path='x86_64'
fi
CUSPARSELT_NAME="libcusparse_lt-linux-${arch_path}-0.5.2.1-archive"
curl --retry 3 -OLs https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-${arch_path}/${CUSPARSELT_NAME}.tar.xz
elif [[ ${CUDA_VERSION:0:4} == "11.8" ]]; then
CUSPARSELT_NAME="libcusparse_lt-linux-x86_64-0.4.0.7-archive"
curl --retry 3 -OLs https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-x86_64/${CUSPARSELT_NAME}.tar.xz
fi
tar xf ${CUSPARSELT_NAME}.tar.xz
cp -a ${CUSPARSELT_NAME}/include/* /usr/local/cuda/include/
cp -a ${CUSPARSELT_NAME}/lib/* /usr/local/cuda/lib64/
cd ..
rm -rf tmp_cusparselt
ldconfig

View File

@ -4,6 +4,11 @@ set -ex
install_ubuntu() {
apt-get update
apt-get install -y --no-install-recommends \
libhiredis-dev \
libleveldb-dev \
liblmdb-dev \
libsnappy-dev
# Cleanup
apt-get autoclean && apt-get clean
@ -15,6 +20,12 @@ install_centos() {
# See http://fedoraproject.org/wiki/EPEL
yum --enablerepo=extras install -y epel-release
yum install -y \
hiredis-devel \
leveldb-devel \
lmdb-devel \
snappy-devel
# Cleanup
yum clean all
rm -rf /var/cache/yum

View File

@ -37,9 +37,6 @@ install_conda_dependencies() {
install_pip_dependencies() {
pushd executorch/.ci/docker
# Install PyTorch CPU build beforehand to avoid installing the much bigger CUDA
# binaries later, ExecuTorch only needs CPU
pip_install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# Install all Python dependencies
pip_install -r requirements-ci.txt
popd
@ -47,14 +44,14 @@ install_pip_dependencies() {
setup_executorch() {
pushd executorch
# Setup swiftshader and Vulkan SDK which are required to build the Vulkan delegate
as_jenkins bash .ci/scripts/setup-vulkan-linux-deps.sh
source .ci/scripts/utils.sh
export PYTHON_EXECUTABLE=python
export EXECUTORCH_BUILD_PYBIND=ON
export CMAKE_ARGS="-DEXECUTORCH_BUILD_XNNPACK=ON -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON"
install_flatc_from_source
pip_install .
build_executorch_runner "cmake"
as_jenkins .ci/scripts/setup-linux.sh cmake
# Make sure that all the newly generate files are owned by Jenkins
chown -R jenkins .
popd
}

View File

@ -1,46 +0,0 @@
#!/bin/bash
set -ex
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
COMMIT=$(get_pinned_commit halide)
test -n "$COMMIT"
# activate conda to populate CONDA_PREFIX
test -n "$ANACONDA_PYTHON_VERSION"
eval "$(conda shell.bash hook)"
conda activate py_$ANACONDA_PYTHON_VERSION
if [ -n "${UBUNTU_VERSION}" ];then
apt update
apt-get install -y lld liblld-15-dev libpng-dev libjpeg-dev libgl-dev \
libopenblas-dev libeigen3-dev libatlas-base-dev libzstd-dev
fi
conda_install numpy scipy imageio cmake ninja
git clone --depth 1 --branch release/16.x --recursive https://github.com/llvm/llvm-project.git
cmake -DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_PROJECTS="clang" \
-DLLVM_TARGETS_TO_BUILD="X86;NVPTX" \
-DLLVM_ENABLE_TERMINFO=OFF -DLLVM_ENABLE_ASSERTIONS=ON \
-DLLVM_ENABLE_EH=ON -DLLVM_ENABLE_RTTI=ON -DLLVM_BUILD_32_BITS=OFF \
-S llvm-project/llvm -B llvm-build -G Ninja
cmake --build llvm-build
cmake --install llvm-build --prefix llvm-install
export LLVM_ROOT=`pwd`/llvm-install
export LLVM_CONFIG=$LLVM_ROOT/bin/llvm-config
git clone https://github.com/halide/Halide.git
pushd Halide
git checkout ${COMMIT} && git submodule update --init --recursive
pip_install -r requirements.txt
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -S . -B build
cmake --build build
test -e ${CONDA_PREFIX}/lib/python3 || ln -s python${ANACONDA_PYTHON_VERSION} ${CONDA_PREFIX}/lib/python3
cmake --install build --prefix ${CONDA_PREFIX}
chown -R jenkins ${CONDA_PREFIX}
popd
rm -rf Halide llvm-build llvm-project llvm-install
python -c "import halide" # check for errors

View File

@ -1,23 +0,0 @@
#!/bin/bash
# Script used only in CD pipeline
set -ex
LIBPNG_VERSION=1.6.37
mkdir -p libpng
pushd libpng
wget http://download.sourceforge.net/libpng/libpng-$LIBPNG_VERSION.tar.gz
tar -xvzf libpng-$LIBPNG_VERSION.tar.gz
pushd libpng-$LIBPNG_VERSION
./configure
make
make install
popd
popd
rm -rf libpng

View File

@ -1,29 +0,0 @@
#!/usr/bin/env bash
# Script used only in CD pipeline
set -eou pipefail
MAGMA_VERSION="2.5.2"
function do_install() {
cuda_version=$1
cuda_version_nodot=${1/./}
MAGMA_VERSION="2.6.1"
magma_archive="magma-cuda${cuda_version_nodot}-${MAGMA_VERSION}-1.tar.bz2"
cuda_dir="/usr/local/cuda-${cuda_version}"
(
set -x
tmp_dir=$(mktemp -d)
pushd ${tmp_dir}
curl -OLs https://anaconda.org/pytorch/magma-cuda${cuda_version_nodot}/${MAGMA_VERSION}/download/linux-64/${magma_archive}
tar -xvf "${magma_archive}"
mkdir -p "${cuda_dir}/magma"
mv include "${cuda_dir}/magma/include"
mv lib "${cuda_dir}/magma/lib"
popd
)
}
do_install $1

View File

@ -1,169 +0,0 @@
#!/bin/bash
# Script used only in CD pipeline
set -ex
ROCM_VERSION=$1
if [[ -z $ROCM_VERSION ]]; then
echo "missing ROCM_VERSION"
exit 1;
fi
IS_UBUNTU=0
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
IS_UBUNTU=1
;;
centos)
IS_UBUNTU=0
;;
*)
echo "Unable to determine OS..."
exit 1
;;
esac
# To make version comparison easier, create an integer representation.
save_IFS="$IFS"
IFS=. ROCM_VERSION_ARRAY=(${ROCM_VERSION})
IFS="$save_IFS"
if [[ ${#ROCM_VERSION_ARRAY[@]} == 2 ]]; then
ROCM_VERSION_MAJOR=${ROCM_VERSION_ARRAY[0]}
ROCM_VERSION_MINOR=${ROCM_VERSION_ARRAY[1]}
ROCM_VERSION_PATCH=0
elif [[ ${#ROCM_VERSION_ARRAY[@]} == 3 ]]; then
ROCM_VERSION_MAJOR=${ROCM_VERSION_ARRAY[0]}
ROCM_VERSION_MINOR=${ROCM_VERSION_ARRAY[1]}
ROCM_VERSION_PATCH=${ROCM_VERSION_ARRAY[2]}
else
echo "Unhandled ROCM_VERSION ${ROCM_VERSION}"
exit 1
fi
ROCM_INT=$(($ROCM_VERSION_MAJOR * 10000 + $ROCM_VERSION_MINOR * 100 + $ROCM_VERSION_PATCH))
# Install custom MIOpen + COMgr for ROCm >= 4.0.1
if [[ $ROCM_INT -lt 40001 ]]; then
echo "ROCm version < 4.0.1; will not install custom MIOpen"
exit 0
fi
# Function to retry functions that sometimes timeout or have flaky failures
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
}
# Build custom MIOpen to use comgr for offline compilation.
## Need a sanitized ROCM_VERSION without patchlevel; patchlevel version 0 must be added to paths.
ROCM_DOTS=$(echo ${ROCM_VERSION} | tr -d -c '.' | wc -c)
if [[ ${ROCM_DOTS} == 1 ]]; then
ROCM_VERSION_NOPATCH="${ROCM_VERSION}"
ROCM_INSTALL_PATH="/opt/rocm-${ROCM_VERSION}.0"
else
ROCM_VERSION_NOPATCH="${ROCM_VERSION%.*}"
ROCM_INSTALL_PATH="/opt/rocm-${ROCM_VERSION}"
fi
# MIOPEN_USE_HIP_KERNELS is a Workaround for COMgr issues
MIOPEN_CMAKE_COMMON_FLAGS="
-DMIOPEN_USE_COMGR=ON
-DMIOPEN_BUILD_DRIVER=OFF
"
# Pull MIOpen repo and set DMIOPEN_EMBED_DB based on ROCm version
if [[ $ROCM_INT -ge 60200 ]] && [[ $ROCM_INT -lt 60300 ]]; then
MIOPEN_BRANCH="release/rocm-rel-6.2-staging"
elif [[ $ROCM_INT -ge 60100 ]] && [[ $ROCM_INT -lt 60200 ]]; then
echo "ROCm 6.1 MIOpen does not need any patches, do not build from source"
exit 0
elif [[ $ROCM_INT -ge 60000 ]] && [[ $ROCM_INT -lt 60100 ]]; then
echo "ROCm 6.0 MIOpen does not need any patches, do not build from source"
exit 0
elif [[ $ROCM_INT -ge 50700 ]] && [[ $ROCM_INT -lt 60000 ]]; then
echo "ROCm 5.7 MIOpen does not need any patches, do not build from source"
exit 0
elif [[ $ROCM_INT -ge 50600 ]] && [[ $ROCM_INT -lt 50700 ]]; then
MIOPEN_BRANCH="release/rocm-rel-5.6-staging"
elif [[ $ROCM_INT -ge 50500 ]] && [[ $ROCM_INT -lt 50600 ]]; then
MIOPEN_BRANCH="release/rocm-rel-5.5-gfx11"
elif [[ $ROCM_INT -ge 50400 ]] && [[ $ROCM_INT -lt 50500 ]]; then
MIOPEN_CMAKE_DB_FLAGS="-DMIOPEN_EMBED_DB=gfx900_56;gfx906_60;gfx90878;gfx90a6e;gfx1030_36 -DMIOPEN_USE_MLIR=Off"
MIOPEN_BRANCH="release/rocm-rel-5.4-staging"
elif [[ $ROCM_INT -ge 50300 ]] && [[ $ROCM_INT -lt 50400 ]]; then
MIOPEN_CMAKE_DB_FLAGS="-DMIOPEN_EMBED_DB=gfx900_56;gfx906_60;gfx90878;gfx90a6e;gfx1030_36 -DMIOPEN_USE_MLIR=Off"
MIOPEN_BRANCH="release/rocm-rel-5.3-staging"
elif [[ $ROCM_INT -ge 50200 ]] && [[ $ROCM_INT -lt 50300 ]]; then
MIOPEN_CMAKE_DB_FLAGS="-DMIOPEN_EMBED_DB=gfx900_56;gfx906_60;gfx90878;gfx90a6e;gfx1030_36 -DMIOPEN_USE_MLIR=Off"
MIOPEN_BRANCH="release/rocm-rel-5.2-staging"
elif [[ $ROCM_INT -ge 50100 ]] && [[ $ROCM_INT -lt 50200 ]]; then
MIOPEN_CMAKE_DB_FLAGS="-DMIOPEN_EMBED_DB=gfx900_56;gfx906_60;gfx90878;gfx90a6e;gfx1030_36"
MIOPEN_BRANCH="release/rocm-rel-5.1-staging"
elif [[ $ROCM_INT -ge 50000 ]] && [[ $ROCM_INT -lt 50100 ]]; then
MIOPEN_CMAKE_DB_FLAGS="-DMIOPEN_EMBED_DB=gfx900_56;gfx906_60;gfx90878;gfx90a6e;gfx1030_36"
MIOPEN_BRANCH="release/rocm-rel-5.0-staging"
else
echo "Unhandled ROCM_VERSION ${ROCM_VERSION}"
exit 1
fi
if [[ ${IS_UBUNTU} == 1 ]]; then
apt-get remove -y miopen-hip
else
yum remove -y miopen-hip
fi
git clone https://github.com/ROCm/MIOpen -b ${MIOPEN_BRANCH}
pushd MIOpen
# remove .git to save disk space since CI runner was running out
rm -rf .git
# Don't build CK to save docker build time
if [[ $ROCM_INT -ge 60200 ]]; then
sed -i '/composable_kernel/d' requirements.txt
fi
# Don't build MLIR to save docker build time
# since we are disabling MLIR backend for MIOpen anyway
if [[ $ROCM_INT -ge 50400 ]] && [[ $ROCM_INT -lt 50500 ]]; then
sed -i '/rocMLIR/d' requirements.txt
elif [[ $ROCM_INT -ge 50200 ]] && [[ $ROCM_INT -lt 50400 ]]; then
sed -i '/llvm-project-mlir/d' requirements.txt
fi
## MIOpen minimum requirements
cmake -P install_deps.cmake --minimum
# clean up since CI runner was running out of disk space
rm -rf /tmp/*
if [[ ${IS_UBUNTU} == 1 ]]; then
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
else
yum clean all
rm -rf /var/cache/yum
rm -rf /var/lib/yum/yumdb
rm -rf /var/lib/yum/history
fi
## Build MIOpen
mkdir -p build
cd build
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig CXX=${ROCM_INSTALL_PATH}/llvm/bin/clang++ cmake .. \
${MIOPEN_CMAKE_COMMON_FLAGS} \
${MIOPEN_CMAKE_DB_FLAGS} \
-DCMAKE_PREFIX_PATH="${ROCM_INSTALL_PATH}/hip;${ROCM_INSTALL_PATH}"
make MIOpen -j $(nproc)
# Build MIOpen package
make -j $(nproc) package
# clean up since CI runner was running out of disk space
rm -rf /usr/local/cget
if [[ ${IS_UBUNTU} == 1 ]]; then
sudo dpkg -i miopen-hip*.deb
else
yum install -y miopen-*.rpm
fi
popd
rm -rf MIOpen

View File

@ -1,16 +0,0 @@
#!/bin/bash
set -ex
# MKL
MKL_VERSION=2024.2.0
MKLROOT=/opt/intel
mkdir -p ${MKLROOT}
pushd /tmp
python3 -mpip install wheel
python3 -mpip download -d . mkl-static==${MKL_VERSION}
python3 -m wheel unpack mkl_static-${MKL_VERSION}-py2.py3-none-manylinux1_x86_64.whl
python3 -m wheel unpack mkl_include-${MKL_VERSION}-py2.py3-none-manylinux1_x86_64.whl
mv mkl_static-${MKL_VERSION}/mkl_static-${MKL_VERSION}.data/data/lib ${MKLROOT}
mv mkl_include-${MKL_VERSION}/mkl_include-${MKL_VERSION}.data/data/include ${MKLROOT}

View File

@ -1,13 +0,0 @@
#!/bin/bash
# Script used only in CD pipeline
set -ex
mkdir -p /usr/local/mnist/
cd /usr/local/mnist
for img in train-images-idx3-ubyte.gz train-labels-idx1-ubyte.gz t10k-images-idx3-ubyte.gz t10k-labels-idx1-ubyte.gz; do
wget -q https://ossci-datasets.s3.amazonaws.com/mnist/$img
gzip -d $img
done

View File

@ -1,20 +0,0 @@
#!/bin/bash
set -ex
function install_nvpl {
mkdir -p /opt/nvpl/lib /opt/nvpl/include
wget https://developer.download.nvidia.com/compute/nvpl/redist/nvpl_blas/linux-sbsa/nvpl_blas-linux-sbsa-0.3.0-archive.tar.xz
tar xf nvpl_blas-linux-sbsa-0.3.0-archive.tar.xz
cp -r nvpl_blas-linux-sbsa-0.3.0-archive/lib/* /opt/nvpl/lib/
cp -r nvpl_blas-linux-sbsa-0.3.0-archive/include/* /opt/nvpl/include/
wget https://developer.download.nvidia.com/compute/nvpl/redist/nvpl_lapack/linux-sbsa/nvpl_lapack-linux-sbsa-0.2.3.1-archive.tar.xz
tar xf nvpl_lapack-linux-sbsa-0.2.3.1-archive.tar.xz
cp -r nvpl_lapack-linux-sbsa-0.2.3.1-archive/lib/* /opt/nvpl/lib/
cp -r nvpl_lapack-linux-sbsa-0.2.3.1-archive/include/* /opt/nvpl/include/
}
install_nvpl

View File

@ -15,7 +15,7 @@ pip_install \
flatbuffers==2.0 \
mock==5.0.1 \
ninja==1.10.2 \
networkx==2.5 \
networkx==2.0 \
numpy==1.24.2
# ONNXRuntime should be installed before installing
@ -26,20 +26,18 @@ pip_install \
pytest-cov==4.0.0 \
pytest-subtests==0.10.0 \
tabulate==0.9.0 \
transformers==4.36.2
transformers==4.32.1
pip_install coloredlogs packaging
retry pip_install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ --no-cache-dir --no-input ort-nightly==1.17.0.dev20231005006
pip_install onnxruntime==1.18.1
pip_install onnx==1.16.2
pip_install onnxscript==0.1.0.dev20240831 --no-deps
# required by onnxscript
pip_install ml_dtypes
pip_install -i https://test.pypi.org/simple/ onnx==1.15.0rc2
pip_install onnxscript==0.1.0.dev20231128 --no-deps
# Cache the transformers model to be used later by ONNX tests. We need to run the transformers
# package to download the model. By default, the model is cached at ~/.cache/huggingface/hub/
IMPORT_SCRIPT_FILENAME="/tmp/onnx_import_script.py"
as_jenkins echo 'import transformers; transformers.AutoModel.from_pretrained("sshleifer/tiny-gpt2"); transformers.AutoTokenizer.from_pretrained("sshleifer/tiny-gpt2"); transformers.AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-large-v3");' > "${IMPORT_SCRIPT_FILENAME}"
as_jenkins echo 'import transformers; transformers.AutoModel.from_pretrained("sshleifer/tiny-gpt2"); transformers.AutoTokenizer.from_pretrained("sshleifer/tiny-gpt2");' > "${IMPORT_SCRIPT_FILENAME}"
# Need a PyTorch version for transformers to work
pip_install --pre torch --index-url https://download.pytorch.org/whl/nightly/cpu

View File

@ -1,22 +0,0 @@
#!/bin/bash
# Script used only in CD pipeline
set -ex
cd /
git clone https://github.com/OpenMathLib/OpenBLAS.git -b v0.3.25 --depth 1 --shallow-submodules
OPENBLAS_BUILD_FLAGS="
NUM_THREADS=128
USE_OPENMP=1
NO_SHARED=0
DYNAMIC_ARCH=1
TARGET=ARMV8
CFLAGS=-O3
"
OPENBLAS_CHECKOUT_DIR="OpenBLAS"
make -j8 ${OPENBLAS_BUILD_FLAGS} -C ${OPENBLAS_CHECKOUT_DIR}
make -j8 ${OPENBLAS_BUILD_FLAGS} install -C ${OPENBLAS_CHECKOUT_DIR}

View File

@ -9,8 +9,7 @@ tar xf "${OPENSSL}.tar.gz"
cd "${OPENSSL}"
./config --prefix=/opt/openssl -d '-Wl,--enable-new-dtags,-rpath,$(LIBRPATH)'
# NOTE: openssl install errors out when built with the -j option
NPROC=$[$(nproc) - 2]
make -j${NPROC}; make install_sw
make -j6; make install_sw
# Link the ssl libraries to the /usr/lib folder.
sudo ln -s /opt/openssl/lib/lib* /usr/lib
cd ..

View File

@ -1,16 +0,0 @@
#!/bin/bash
# Script used only in CD pipeline
set -ex
# Pin the version to latest release 0.17.2, building newer commit starts
# to fail on the current image
git clone -b 0.17.2 --single-branch https://github.com/NixOS/patchelf
cd patchelf
sed -i 's/serial/parallel/g' configure.ac
./bootstrap.sh
./configure
make
make install
cd ..
rm -rf patchelf

View File

@ -2,18 +2,55 @@
set -ex
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
# This function installs protobuf 3.17
install_protobuf_317() {
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
# On the nvidia/cuda:9-cudnn7-devel-centos7 image we need this symlink or
# else it will fail with
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
# On the nvidia/cuda:9-cudnn7-devel-centos7 image we need this symlink or
# else it will fail with
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/protocolbuffers/protobuf/releases/download/v3.17.3/protobuf-all-3.17.3.tar.gz" --retry 3
curl -LO "https://github.com/protocolbuffers/protobuf/releases/download/v3.17.3/protobuf-all-3.17.3.tar.gz" --retry 3
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-all-3.17.3.tar.gz
# -j6 to balance memory usage and speed.
# naked `-j` seems to use too much memory.
pushd "$pb_dir" && ./configure && make -j6 && make -j6 check && sudo make -j6 install && sudo ldconfig
popd
rm -rf $pb_dir
}
tar -xvz --no-same-owner -C "$pb_dir" --strip-components 1 -f protobuf-all-3.17.3.tar.gz
NPROC=$[$(nproc) - 2]
pushd "$pb_dir" && ./configure && make -j${NPROC} && make -j${NPROC} check && sudo make -j${NRPOC} install && sudo ldconfig
popd
rm -rf $pb_dir
install_ubuntu() {
# Ubuntu 14.04 has cmake 2.8.12 as the default option, so we will
# install cmake3 here and use cmake3.
apt-get update
if [[ "$UBUNTU_VERSION" == 14.04 ]]; then
apt-get install -y --no-install-recommends cmake3
fi
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
install_protobuf_317
}
install_centos() {
install_protobuf_317
}
# Install base packages depending on the base OS
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
install_ubuntu
;;
centos)
install_centos
;;
*)
echo "Unable to determine OS..."
exit 1
;;
esac

View File

@ -6,6 +6,9 @@ ver() {
printf "%3d%03d%03d%03d" $(echo "$1" | tr '.' ' ');
}
# Map ROCm version to AMDGPU version
declare -A AMDGPU_VERSIONS=( ["5.0"]="21.50" ["5.1.1"]="22.10.1" ["5.2"]="22.20" )
install_ubuntu() {
apt-get update
if [[ $UBUNTU_VERSION == 18.04 ]]; then
@ -23,14 +26,31 @@ install_ubuntu() {
apt-get install -y libc++1
apt-get install -y libc++abi1
# Add amdgpu repository
UBUNTU_VERSION_NAME=`cat /etc/os-release | grep UBUNTU_CODENAME | awk -F= '{print $2}'`
echo "deb [arch=amd64] https://repo.radeon.com/amdgpu/${ROCM_VERSION}/ubuntu ${UBUNTU_VERSION_NAME} main" > /etc/apt/sources.list.d/amdgpu.list
if [[ $(ver $ROCM_VERSION) -ge $(ver 4.5) ]]; then
# Add amdgpu repository
UBUNTU_VERSION_NAME=`cat /etc/os-release | grep UBUNTU_CODENAME | awk -F= '{print $2}'`
local amdgpu_baseurl
if [[ $(ver $ROCM_VERSION) -ge $(ver 5.3) ]]; then
amdgpu_baseurl="https://repo.radeon.com/amdgpu/${ROCM_VERSION}/ubuntu"
else
amdgpu_baseurl="https://repo.radeon.com/amdgpu/${AMDGPU_VERSIONS[$ROCM_VERSION]}/ubuntu"
fi
echo "deb [arch=amd64] ${amdgpu_baseurl} ${UBUNTU_VERSION_NAME} main" > /etc/apt/sources.list.d/amdgpu.list
fi
ROCM_REPO="ubuntu"
if [[ $(ver $ROCM_VERSION) -lt $(ver 4.2) ]]; then
ROCM_REPO="xenial"
fi
if [[ $(ver $ROCM_VERSION) -ge $(ver 5.3) ]]; then
ROCM_REPO="${UBUNTU_VERSION_NAME}"
fi
# Add rocm repository
wget -qO - http://repo.radeon.com/rocm/rocm.gpg.key | apt-key add -
local rocm_baseurl="http://repo.radeon.com/rocm/apt/${ROCM_VERSION}"
echo "deb [arch=amd64] ${rocm_baseurl} ${UBUNTU_VERSION_NAME} main" > /etc/apt/sources.list.d/rocm.list
echo "deb [arch=amd64] ${rocm_baseurl} ${ROCM_REPO} main" > /etc/apt/sources.list.d/rocm.list
apt-get update --allow-insecure-repositories
DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated \
@ -39,29 +59,27 @@ install_ubuntu() {
rocm-libs \
rccl \
rocprofiler-dev \
roctracer-dev \
amd-smi-lib
if [[ $(ver $ROCM_VERSION) -ge $(ver 6.1) ]]; then
DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated rocm-llvm-dev
fi
roctracer-dev
# precompiled miopen kernels added in ROCm 3.5, renamed in ROCm 5.5
# search for all unversioned packages
# if search fails it will abort this script; use true to avoid case where search fails
MIOPENHIPGFX=$(apt-cache search --names-only miopen-hip-gfx | awk '{print $1}' | grep -F -v . || true)
if [[ "x${MIOPENHIPGFX}" = x ]]; then
echo "miopen-hip-gfx package not available" && exit 1
if [[ $(ver $ROCM_VERSION) -ge $(ver 5.5) ]]; then
MIOPENHIPGFX=$(apt-cache search --names-only miopen-hip-gfx | awk '{print $1}' | grep -F -v . || true)
if [[ "x${MIOPENHIPGFX}" = x ]]; then
echo "miopen-hip-gfx package not available" && exit 1
else
DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated ${MIOPENHIPGFX}
fi
else
DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated ${MIOPENHIPGFX}
MIOPENKERNELS=$(apt-cache search --names-only miopenkernels | awk '{print $1}' | grep -F -v . || true)
if [[ "x${MIOPENKERNELS}" = x ]]; then
echo "miopenkernels package not available" && exit 1
else
DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated ${MIOPENKERNELS}
fi
fi
# ROCm 6.0 had a regression where journal_mode was enabled on the kdb files resulting in permission errors at runtime
for kdb in /opt/rocm/share/miopen/db/*.kdb
do
sqlite3 $kdb "PRAGMA journal_mode=off; PRAGMA VACUUM;"
done
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
@ -77,19 +95,25 @@ install_centos() {
yum install -y epel-release
yum install -y dkms kernel-headers-`uname -r` kernel-devel-`uname -r`
# Add amdgpu repository
local amdgpu_baseurl
if [[ $OS_VERSION == 9 ]]; then
amdgpu_baseurl="https://repo.radeon.com/amdgpu/${ROCM_VERSION}/rhel/9.0/main/x86_64"
else
amdgpu_baseurl="https://repo.radeon.com/amdgpu/${ROCM_VERSION}/rhel/7.9/main/x86_64"
if [[ $(ver $ROCM_VERSION) -ge $(ver 4.5) ]]; then
# Add amdgpu repository
local amdgpu_baseurl
if [[ $OS_VERSION == 9 ]]; then
amdgpu_baseurl="https://repo.radeon.com/amdgpu/${AMDGPU_VERSIONS[$ROCM_VERSION]}/rhel/9.0/main/x86_64"
else
if [[ $(ver $ROCM_VERSION) -ge $(ver 5.3) ]]; then
amdgpu_baseurl="https://repo.radeon.com/amdgpu/${ROCM_VERSION}/rhel/7.9/main/x86_64"
else
amdgpu_baseurl="https://repo.radeon.com/amdgpu/${AMDGPU_VERSIONS[$ROCM_VERSION]}/rhel/7.9/main/x86_64"
fi
fi
echo "[AMDGPU]" > /etc/yum.repos.d/amdgpu.repo
echo "name=AMDGPU" >> /etc/yum.repos.d/amdgpu.repo
echo "baseurl=${amdgpu_baseurl}" >> /etc/yum.repos.d/amdgpu.repo
echo "enabled=1" >> /etc/yum.repos.d/amdgpu.repo
echo "gpgcheck=1" >> /etc/yum.repos.d/amdgpu.repo
echo "gpgkey=http://repo.radeon.com/rocm/rocm.gpg.key" >> /etc/yum.repos.d/amdgpu.repo
fi
echo "[AMDGPU]" > /etc/yum.repos.d/amdgpu.repo
echo "name=AMDGPU" >> /etc/yum.repos.d/amdgpu.repo
echo "baseurl=${amdgpu_baseurl}" >> /etc/yum.repos.d/amdgpu.repo
echo "enabled=1" >> /etc/yum.repos.d/amdgpu.repo
echo "gpgcheck=1" >> /etc/yum.repos.d/amdgpu.repo
echo "gpgkey=http://repo.radeon.com/rocm/rocm.gpg.key" >> /etc/yum.repos.d/amdgpu.repo
local rocm_baseurl="http://repo.radeon.com/rocm/yum/${ROCM_VERSION}"
echo "[ROCm]" > /etc/yum.repos.d/rocm.repo
@ -107,24 +131,26 @@ install_centos() {
rocm-libs \
rccl \
rocprofiler-dev \
roctracer-dev \
amd-smi-lib
roctracer-dev
# precompiled miopen kernels; search for all unversioned packages
# if search fails it will abort this script; use true to avoid case where search fails
MIOPENHIPGFX=$(yum -q search miopen-hip-gfx | grep miopen-hip-gfx | awk '{print $1}'| grep -F kdb. || true)
if [[ "x${MIOPENHIPGFX}" = x ]]; then
echo "miopen-hip-gfx package not available" && exit 1
if [[ $(ver $ROCM_VERSION) -ge $(ver 5.5) ]]; then
MIOPENHIPGFX=$(yum -q search miopen-hip-gfx | grep miopen-hip-gfx | awk '{print $1}'| grep -F kdb. || true)
if [[ "x${MIOPENHIPGFX}" = x ]]; then
echo "miopen-hip-gfx package not available" && exit 1
else
yum install -y ${MIOPENHIPGFX}
fi
else
yum install -y ${MIOPENHIPGFX}
MIOPENKERNELS=$(yum -q search miopenkernels | grep miopenkernels- | awk '{print $1}'| grep -F kdb. || true)
if [[ "x${MIOPENKERNELS}" = x ]]; then
echo "miopenkernels package not available" && exit 1
else
yum install -y ${MIOPENKERNELS}
fi
fi
# ROCm 6.0 had a regression where journal_mode was enabled on the kdb files resulting in permission errors at runtime
for kdb in /opt/rocm/share/miopen/db/*.kdb
do
sqlite3 $kdb "PRAGMA journal_mode=off; PRAGMA VACUUM;"
done
# Cleanup
yum clean all
rm -rf /var/cache/yum

View File

@ -1,150 +0,0 @@
#!/bin/bash
# Script used only in CD pipeline
###########################
### prereqs
###########################
# Install Python packages depending on the base OS
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
apt-get update -y
apt-get install -y libpciaccess-dev pkg-config
apt-get clean
;;
centos)
yum install -y libpciaccess-devel pkgconfig
;;
*)
echo "Unable to determine OS..."
exit 1
;;
esac
python3 -m pip install meson ninja
###########################
### clone repo
###########################
GIT_SSL_NO_VERIFY=true git clone https://gitlab.freedesktop.org/mesa/drm.git
pushd drm
###########################
### patch
###########################
patch -p1 <<'EOF'
diff --git a/amdgpu/amdgpu_asic_id.c b/amdgpu/amdgpu_asic_id.c
index a5007ffc..13fa07fc 100644
--- a/amdgpu/amdgpu_asic_id.c
+++ b/amdgpu/amdgpu_asic_id.c
@@ -22,6 +22,13 @@
*
*/
+#define _XOPEN_SOURCE 700
+#define _LARGEFILE64_SOURCE
+#define _FILE_OFFSET_BITS 64
+#include <ftw.h>
+#include <link.h>
+#include <limits.h>
+
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
@@ -34,6 +41,19 @@
#include "amdgpu_drm.h"
#include "amdgpu_internal.h"
+static char *amdgpuids_path = NULL;
+static const char* amdgpuids_path_msg = NULL;
+
+static int check_for_location_of_amdgpuids(const char *filepath, const struct stat *info, const int typeflag, struct FTW *pathinfo)
+{
+ if (typeflag == FTW_F && strstr(filepath, "amdgpu.ids")) {
+ amdgpuids_path = strdup(filepath);
+ return 1;
+ }
+
+ return 0;
+}
+
static int parse_one_line(struct amdgpu_device *dev, const char *line)
{
char *buf, *saveptr;
@@ -113,10 +133,46 @@ void amdgpu_parse_asic_ids(struct amdgpu_device *dev)
int line_num = 1;
int r = 0;
+ // attempt to find typical location for amdgpu.ids file
fp = fopen(AMDGPU_ASIC_ID_TABLE, "r");
+
+ // if it doesn't exist, search
+ if (!fp) {
+
+ char self_path[ PATH_MAX ];
+ ssize_t count;
+ ssize_t i;
+
+ count = readlink( "/proc/self/exe", self_path, PATH_MAX );
+ if (count > 0) {
+ self_path[count] = '\0';
+
+ // remove '/bin/python' from self_path
+ for (i=count; i>0; --i) {
+ if (self_path[i] == '/') break;
+ self_path[i] = '\0';
+ }
+ self_path[i] = '\0';
+ for (; i>0; --i) {
+ if (self_path[i] == '/') break;
+ self_path[i] = '\0';
+ }
+ self_path[i] = '\0';
+
+ if (1 == nftw(self_path, check_for_location_of_amdgpuids, 5, FTW_PHYS)) {
+ fp = fopen(amdgpuids_path, "r");
+ amdgpuids_path_msg = amdgpuids_path;
+ }
+ }
+
+ }
+ else {
+ amdgpuids_path_msg = AMDGPU_ASIC_ID_TABLE;
+ }
+
+ // both hard-coded location and search have failed
if (!fp) {
- fprintf(stderr, "%s: %s\n", AMDGPU_ASIC_ID_TABLE,
- strerror(errno));
+ fprintf(stderr, "amdgpu.ids: No such file or directory\n");
return;
}
@@ -132,7 +188,7 @@ void amdgpu_parse_asic_ids(struct amdgpu_device *dev)
continue;
}
- drmMsg("%s version: %s\n", AMDGPU_ASIC_ID_TABLE, line);
+ drmMsg("%s version: %s\n", amdgpuids_path_msg, line);
break;
}
@@ -150,7 +206,7 @@ void amdgpu_parse_asic_ids(struct amdgpu_device *dev)
if (r == -EINVAL) {
fprintf(stderr, "Invalid format: %s: line %d: %s\n",
- AMDGPU_ASIC_ID_TABLE, line_num, line);
+ amdgpuids_path_msg, line_num, line);
} else if (r && r != -EAGAIN) {
fprintf(stderr, "%s: Cannot parse ASIC IDs: %s\n",
__func__, strerror(-r));
EOF
###########################
### build
###########################
meson builddir --prefix=/opt/amdgpu
pushd builddir
ninja install
popd
popd

View File

@ -1,24 +1,17 @@
#!/bin/bash
# Script used in CI and CD pipeline
set -ex
MKLROOT=${MKLROOT:-/opt/conda/envs/py_$ANACONDA_PYTHON_VERSION}
# "install" hipMAGMA into /opt/rocm/magma by copying after build
git clone https://bitbucket.org/icl/magma.git
pushd magma
# Version 2.7.2 + ROCm related updates
git checkout a1625ff4d9bc362906bd01f805dbbe12612953f6
git checkout 823531632140d0edcb7e77c3edc0e837421471c5
cp make.inc-examples/make.inc.hip-gcc-mkl make.inc
echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc
if [[ -f "${MKLROOT}/lib/libmkl_core.a" ]]; then
echo 'LIB = -Wl,--start-group -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -Wl,--end-group -lpthread -lstdc++ -lm -lgomp -lhipblas -lhipsparse' >> make.inc
fi
echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib -ldl' >> make.inc
echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib' >> make.inc
echo 'DEVCCFLAGS += --gpu-max-threads-per-block=256' >> make.inc
export PATH="${PATH}:/opt/rocm/bin"
if [[ -n "$PYTORCH_ROCM_ARCH" ]]; then
@ -32,7 +25,7 @@ done
# hipcc with openmp flag may cause isnan() on __device__ not to be found; depending on context, compiler may attempt to match with host definition
sed -i 's/^FOPENMP/#FOPENMP/g' make.inc
make -f make.gen.hipMAGMA -j $(nproc)
LANG=C.UTF-8 make lib/libmagma.so -j $(nproc) MKLROOT="${MKLROOT}"
make testing/testing_dgemm -j $(nproc) MKLROOT="${MKLROOT}"
LANG=C.UTF-8 make lib/libmagma.so -j $(nproc) MKLROOT=/opt/conda/envs/py_$ANACONDA_PYTHON_VERSION
make testing/testing_dgemm -j $(nproc) MKLROOT=/opt/conda/envs/py_$ANACONDA_PYTHON_VERSION
popd
mv magma /opt/rocm

View File

@ -12,9 +12,9 @@ conda_reinstall() {
as_jenkins conda install -q -n py_$ANACONDA_PYTHON_VERSION -y --force-reinstall $*
}
if [ -n "${XPU_VERSION}" ]; then
TRITON_REPO="https://github.com/intel/intel-xpu-backend-for-triton"
TRITON_TEXT_FILE="triton-xpu"
if [ -n "${ROCM_VERSION}" ]; then
TRITON_REPO="https://github.com/ROCmSoftwarePlatform/triton"
TRITON_TEXT_FILE="triton-rocm"
else
TRITON_REPO="https://github.com/openai/triton"
TRITON_TEXT_FILE="triton"
@ -38,33 +38,19 @@ if [ -z "${MAX_JOBS}" ]; then
export MAX_JOBS=$(nproc)
fi
# Git checkout triton
mkdir /var/lib/jenkins/triton
chown -R jenkins /var/lib/jenkins/triton
chgrp -R jenkins /var/lib/jenkins/triton
pushd /var/lib/jenkins/
as_jenkins git clone ${TRITON_REPO} triton
cd triton
as_jenkins git checkout ${TRITON_PINNED_COMMIT}
cd python
# TODO: remove patch setup.py once we have a proper fix for https://github.com/triton-lang/triton/issues/4527
as_jenkins sed -i -e 's/https:\/\/tritonlang.blob.core.windows.net\/llvm-builds/https:\/\/oaitriton.blob.core.windows.net\/public\/llvm-builds/g' setup.py
if [ -n "${UBUNTU_VERSION}" ] && [ -n "${GCC_VERSION}" ] && [[ "${GCC_VERSION}" == "7" ]]; then
# Triton needs at least gcc-9 to build
apt-get install -y g++-9
CXX=g++-9 pip_install -e .
CXX=g++-9 pip_install "git+${TRITON_REPO}@${TRITON_PINNED_COMMIT}#subdirectory=python"
elif [ -n "${UBUNTU_VERSION}" ] && [ -n "${CLANG_VERSION}" ]; then
# Triton needs <filesystem> which surprisingly is not available with clang-9 toolchain
add-apt-repository -y ppa:ubuntu-toolchain-r/test
apt-get install -y g++-9
CXX=g++-9 pip_install -e .
CXX=g++-9 pip_install "git+${TRITON_REPO}@${TRITON_PINNED_COMMIT}#subdirectory=python"
else
pip_install -e .
pip_install "git+${TRITON_REPO}@${TRITON_PINNED_COMMIT}#subdirectory=python"
fi
if [ -n "${CONDA_CMAKE}" ]; then
@ -78,6 +64,5 @@ if [ -n "${CONDA_CMAKE}" ]; then
# latest numpy version, which fails ASAN tests with the following import error: Numba
# needs NumPy 1.20 or less.
conda_reinstall cmake="${CMAKE_VERSION}"
# Note that we install numpy with pip as conda might not have the version we want
pip_install --force-reinstall numpy=="${NUMPY_VERSION}"
conda_reinstall numpy="${NUMPY_VERSION}"
fi

View File

@ -36,12 +36,7 @@ function install_ucc() {
git submodule update --init --recursive
./autogen.sh
# We only run distributed tests on Tesla M60 and A10G
NVCC_GENCODE="-gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_86,code=compute_86"
./configure --prefix=$UCC_HOME \
--with-ucx=$UCX_HOME \
--with-cuda=$with_cuda \
--with-nvcc-gencode="${NVCC_GENCODE}"
./configure --prefix=$UCC_HOME --with-ucx=$UCX_HOME --with-cuda=$with_cuda
time make -j
sudo make install

View File

@ -5,7 +5,8 @@ set -ex
install_ubuntu() {
apt-get update
apt-get install -y --no-install-recommends \
libopencv-dev
libopencv-dev \
libavcodec-dev
# Cleanup
apt-get autoclean && apt-get clean
@ -18,7 +19,8 @@ install_centos() {
yum --enablerepo=extras install -y epel-release
yum install -y \
opencv-devel
opencv-devel \
ffmpeg-devel
# Cleanup
yum clean all

View File

@ -1,161 +0,0 @@
#!/bin/bash
set -xe
# Script used in CI and CD pipeline
# Intel® software for general purpose GPU capabilities.
# Refer to https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpus.html
# Users should update to the latest version as it becomes available
function install_ubuntu() {
. /etc/os-release
if [[ ! " jammy " =~ " ${VERSION_CODENAME} " ]]; then
echo "Ubuntu version ${VERSION_CODENAME} not supported"
exit
fi
apt-get update -y
apt-get install -y gpg-agent wget
# To add the online network package repository for the GPU Driver
wget -qO - https://repositories.intel.com/gpu/intel-graphics.key \
| gpg --yes --dearmor --output /usr/share/keyrings/intel-graphics.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/intel-graphics.gpg] \
https://repositories.intel.com/gpu/ubuntu ${VERSION_CODENAME}${XPU_DRIVER_VERSION} unified" \
| tee /etc/apt/sources.list.d/intel-gpu-${VERSION_CODENAME}.list
# To add the online network network package repository for the Intel Support Packages
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \
| gpg --dearmor > /usr/share/keyrings/intel-for-pytorch-gpu-dev-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/intel-for-pytorch-gpu-dev-keyring.gpg] \
https://apt.repos.intel.com/intel-for-pytorch-gpu-dev all main" \
| tee /etc/apt/sources.list.d/intel-for-pytorch-gpu-dev.list
# Update the packages list and repository index
apt-get update
# The xpu-smi packages
apt-get install -y flex bison xpu-smi
# Compute and Media Runtimes
apt-get install -y \
intel-opencl-icd intel-level-zero-gpu level-zero \
intel-media-va-driver-non-free libmfx1 libmfxgen1 libvpl2 \
libegl-mesa0 libegl1-mesa libegl1-mesa-dev libgbm1 libgl1-mesa-dev libgl1-mesa-dri \
libglapi-mesa libgles2-mesa-dev libglx-mesa0 libigdgmm12 libxatracker2 mesa-va-drivers \
mesa-vdpau-drivers mesa-vulkan-drivers va-driver-all vainfo hwinfo clinfo
# Development Packages
apt-get install -y libigc-dev intel-igc-cm libigdfcl-dev libigfxcmrt-dev level-zero-dev
# Install Intel Support Packages
if [ -n "$XPU_VERSION" ]; then
apt-get install -y intel-for-pytorch-gpu-dev-${XPU_VERSION} intel-pti-dev
else
apt-get install -y intel-for-pytorch-gpu-dev intel-pti-dev
fi
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
}
function install_rhel() {
. /etc/os-release
if [[ "${ID}" == "rhel" ]]; then
if [[ ! " 8.6 8.8 8.9 9.0 9.2 9.3 " =~ " ${VERSION_ID} " ]]; then
echo "RHEL version ${VERSION_ID} not supported"
exit
fi
elif [[ "${ID}" == "almalinux" ]]; then
# Workaround for almalinux8 which used by quay.io/pypa/manylinux_2_28_x86_64
VERSION_ID="8.6"
fi
dnf install -y 'dnf-command(config-manager)'
# To add the online network package repository for the GPU Driver
dnf config-manager --add-repo \
https://repositories.intel.com/gpu/rhel/${VERSION_ID}${XPU_DRIVER_VERSION}/unified/intel-gpu-${VERSION_ID}.repo
# To add the online network network package repository for the Intel Support Packages
tee > /etc/yum.repos.d/intel-for-pytorch-gpu-dev.repo << EOF
[intel-for-pytorch-gpu-dev]
name=Intel for Pytorch GPU dev repository
baseurl=https://yum.repos.intel.com/intel-for-pytorch-gpu-dev
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
EOF
# The xpu-smi packages
dnf install -y xpu-smi
# Compute and Media Runtimes
dnf install --skip-broken -y \
intel-opencl intel-media intel-mediasdk libmfxgen1 libvpl2\
level-zero intel-level-zero-gpu mesa-dri-drivers mesa-vulkan-drivers \
mesa-vdpau-drivers libdrm mesa-libEGL mesa-libgbm mesa-libGL \
mesa-libxatracker libvpl-tools intel-metrics-discovery \
intel-metrics-library intel-igc-core intel-igc-cm \
libva libva-utils intel-gmmlib libmetee intel-gsc intel-ocloc
# Development packages
dnf install -y --refresh \
intel-igc-opencl-devel level-zero-devel intel-gsc-devel libmetee-devel \
level-zero-devel
# Install Intel Support Packages
yum install -y intel-for-pytorch-gpu-dev intel-pti-dev
# Cleanup
dnf clean all
rm -rf /var/cache/yum
rm -rf /var/lib/yum/yumdb
rm -rf /var/lib/yum/history
}
function install_sles() {
. /etc/os-release
VERSION_SP=${VERSION_ID//./sp}
if [[ ! " 15sp4 15sp5 " =~ " ${VERSION_SP} " ]]; then
echo "SLES version ${VERSION_ID} not supported"
exit
fi
# To add the online network package repository for the GPU Driver
zypper addrepo -f -r \
https://repositories.intel.com/gpu/sles/${VERSION_SP}${XPU_DRIVER_VERSION}/unified/intel-gpu-${VERSION_SP}.repo
rpm --import https://repositories.intel.com/gpu/intel-graphics.key
# To add the online network network package repository for the Intel Support Packages
zypper addrepo https://yum.repos.intel.com/intel-for-pytorch-gpu-dev intel-for-pytorch-gpu-dev
rpm --import https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
# The xpu-smi packages
zypper install -y lsb-release flex bison xpu-smi
# Compute and Media Runtimes
zypper install -y intel-level-zero-gpu level-zero intel-gsc intel-opencl intel-ocloc \
intel-media-driver libigfxcmrt7 libvpl2 libvpl-tools libmfxgen1 libmfx1
# Development packages
zypper install -y libigdfcl-devel intel-igc-cm libigfxcmrt-devel level-zero-devel
# Install Intel Support Packages
zypper install -y intel-for-pytorch-gpu-dev intel-pti-dev
}
# Default use GPU driver LTS releases
XPU_DRIVER_VERSION="/lts/2350"
if [[ "${XPU_DRIVER_TYPE,,}" == "rolling" ]]; then
# Use GPU driver rolling releases
XPU_DRIVER_VERSION=""
fi
# The installation depends on the base OS
ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')
case "$ID" in
ubuntu)
install_ubuntu
;;
rhel|almalinux)
install_rhel
;;
sles)
install_sles
;;
*)
echo "Unable to determine OS..."
exit 1
;;
esac

View File

@ -1,100 +0,0 @@
ARG CUDA_VERSION=10.2
ARG BASE_TARGET=cuda${CUDA_VERSION}
FROM centos:7 as base
ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8
ARG DEVTOOLSET_VERSION=9
RUN sed -i s/mirror.centos.org/vault.centos.org/g /etc/yum.repos.d/*.repo
RUN sed -i s/^#.*baseurl=http/baseurl=http/g /etc/yum.repos.d/*.repo
RUN sed -i s/^mirrorlist=http/#mirrorlist=http/g /etc/yum.repos.d/*.repo
RUN yum update -y
RUN yum install -y wget curl perl util-linux xz bzip2 git patch which unzip
# Just add everything as a safe.directory for git since these will be used in multiple places with git
RUN git config --global --add safe.directory '*'
RUN yum install -y yum-utils centos-release-scl
RUN yum-config-manager --enable rhel-server-rhscl-7-rpms
RUN sed -i s/mirror.centos.org/vault.centos.org/g /etc/yum.repos.d/*.repo
RUN sed -i s/^#.*baseurl=http/baseurl=http/g /etc/yum.repos.d/*.repo
RUN sed -i s/^mirrorlist=http/#mirrorlist=http/g /etc/yum.repos.d/*.repo
RUN yum install -y devtoolset-${DEVTOOLSET_VERSION}-gcc devtoolset-${DEVTOOLSET_VERSION}-gcc-c++ devtoolset-${DEVTOOLSET_VERSION}-gcc-gfortran devtoolset-${DEVTOOLSET_VERSION}-binutils
# EPEL for cmake
RUN yum --enablerepo=extras install -y epel-release
# cmake
RUN yum install -y cmake3 && \
ln -s /usr/bin/cmake3 /usr/bin/cmake
ENV PATH=/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH
ENV LD_LIBRARY_PATH=/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/lib64:/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/lib:$LD_LIBRARY_PATH
RUN yum install -y autoconf aclocal automake make sudo
RUN rm -rf /usr/local/cuda-*
FROM base as patchelf
# Install patchelf
ADD ./common/install_patchelf.sh install_patchelf.sh
RUN bash ./install_patchelf.sh && rm install_patchelf.sh && cp $(which patchelf) /patchelf
FROM base as openssl
# Install openssl
ADD ./common/install_openssl.sh install_openssl.sh
RUN bash ./install_openssl.sh && rm install_openssl.sh
FROM base as conda
# Install Anaconda
ADD ./common/install_conda_docker.sh install_conda.sh
RUN bash ./install_conda.sh && rm install_conda.sh
# Install CUDA
FROM base as cuda
ARG CUDA_VERSION=10.2
RUN rm -rf /usr/local/cuda-*
ADD ./common/install_cuda.sh install_cuda.sh
ENV CUDA_HOME=/usr/local/cuda-${CUDA_VERSION}
# Preserve CUDA_VERSION for the builds
ENV CUDA_VERSION=${CUDA_VERSION}
# Make things in our path by default
ENV PATH=/usr/local/cuda-${CUDA_VERSION}/bin:$PATH
FROM cuda as cuda11.8
RUN bash ./install_cuda.sh 11.8
ENV DESIRED_CUDA=11.8
FROM cuda as cuda12.1
RUN bash ./install_cuda.sh 12.1
ENV DESIRED_CUDA=12.1
FROM cuda as cuda12.4
RUN bash ./install_cuda.sh 12.4
ENV DESIRED_CUDA=12.4
# Install MNIST test data
FROM base as mnist
ADD ./common/install_mnist.sh install_mnist.sh
RUN bash ./install_mnist.sh
FROM base as all_cuda
COPY --from=cuda11.8 /usr/local/cuda-11.8 /usr/local/cuda-11.8
COPY --from=cuda12.1 /usr/local/cuda-12.1 /usr/local/cuda-12.1
COPY --from=cuda12.4 /usr/local/cuda-12.4 /usr/local/cuda-12.4
# Final step
FROM ${BASE_TARGET} as final
COPY --from=openssl /opt/openssl /opt/openssl
COPY --from=patchelf /patchelf /usr/local/bin/patchelf
COPY --from=conda /opt/conda /opt/conda
# Add jni.h for java host build.
COPY ./common/install_jni.sh install_jni.sh
COPY ./java/jni.h jni.h
RUN bash ./install_jni.sh && rm install_jni.sh
ENV PATH /opt/conda/bin:$PATH
COPY --from=mnist /usr/local/mnist /usr/local/mnist
RUN rm -rf /usr/local/cuda
RUN chmod o+rw /usr/local
RUN touch /.condarc && \
chmod o+rw /.condarc && \
chmod -R o+rw /opt/conda

View File

@ -1,76 +0,0 @@
#!/usr/bin/env bash
# Script used only in CD pipeline
set -eou pipefail
image="$1"
shift
if [ -z "${image}" ]; then
echo "Usage: $0 IMAGE"
exit 1
fi
DOCKER_IMAGE_NAME="pytorch/${image}"
export DOCKER_BUILDKIT=1
TOPDIR=$(git rev-parse --show-toplevel)
CUDA_VERSION=${CUDA_VERSION:-12.1}
case ${CUDA_VERSION} in
cpu)
BASE_TARGET=base
DOCKER_TAG=cpu
;;
all)
BASE_TARGET=all_cuda
DOCKER_TAG=latest
;;
*)
BASE_TARGET=cuda${CUDA_VERSION}
DOCKER_TAG=cuda${CUDA_VERSION}
;;
esac
(
set -x
docker build \
--target final \
--progress plain \
--build-arg "BASE_TARGET=${BASE_TARGET}" \
--build-arg "CUDA_VERSION=${CUDA_VERSION}" \
--build-arg "DEVTOOLSET_VERSION=9" \
-t ${DOCKER_IMAGE_NAME} \
$@ \
-f "${TOPDIR}/.ci/docker/conda/Dockerfile" \
${TOPDIR}/.ci/docker/
)
if [[ "${DOCKER_TAG}" =~ ^cuda* ]]; then
# Test that we're using the right CUDA compiler
(
set -x
docker run --rm "${DOCKER_IMAGE_NAME}" nvcc --version | grep "cuda_${CUDA_VERSION}"
)
fi
GITHUB_REF=${GITHUB_REF:-$(git symbolic-ref -q HEAD || git describe --tags --exact-match)}
GIT_BRANCH_NAME=${GITHUB_REF##*/}
GIT_COMMIT_SHA=${GITHUB_SHA:-$(git rev-parse HEAD)}
DOCKER_IMAGE_BRANCH_TAG=${DOCKER_IMAGE_NAME}-${GIT_BRANCH_NAME}
DOCKER_IMAGE_SHA_TAG=${DOCKER_IMAGE_NAME}-${GIT_COMMIT_SHA}
if [[ "${WITH_PUSH:-}" == true ]]; then
(
set -x
docker push "${DOCKER_IMAGE_NAME}"
if [[ -n ${GITHUB_REF} ]]; then
docker tag ${DOCKER_IMAGE_NAME} ${DOCKER_IMAGE_BRANCH_TAG}
docker tag ${DOCKER_IMAGE_NAME} ${DOCKER_IMAGE_SHA_TAG}
docker push "${DOCKER_IMAGE_BRANCH_TAG}"
docker push "${DOCKER_IMAGE_SHA_TAG}"
fi
)
fi

View File

@ -1,107 +0,0 @@
ARG BASE_TARGET=base
ARG GPU_IMAGE=ubuntu:20.04
FROM ${GPU_IMAGE} as base
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get clean && apt-get update
RUN apt-get install -y curl locales g++ git-all autoconf automake make cmake wget unzip sudo
# Just add everything as a safe.directory for git since these will be used in multiple places with git
RUN git config --global --add safe.directory '*'
RUN locale-gen en_US.UTF-8
ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8
# Install openssl
FROM base as openssl
ADD ./common/install_openssl.sh install_openssl.sh
RUN bash ./install_openssl.sh && rm install_openssl.sh
# Install python
FROM base as python
ADD common/install_cpython.sh install_cpython.sh
RUN apt-get update -y && \
apt-get install build-essential gdb lcov libbz2-dev libffi-dev \
libgdbm-dev liblzma-dev libncurses5-dev libreadline6-dev \
libsqlite3-dev libssl-dev lzma lzma-dev tk-dev uuid-dev zlib1g-dev -y && \
bash ./install_cpython.sh && \
rm install_cpython.sh && \
apt-get clean
FROM base as conda
ADD ./common/install_conda_docker.sh install_conda.sh
RUN bash ./install_conda.sh && rm install_conda.sh
FROM base as cpu
# Install Anaconda
COPY --from=conda /opt/conda /opt/conda
# Install python
COPY --from=python /opt/python /opt/python
COPY --from=python /opt/_internal /opt/_internal
ENV PATH=/opt/conda/bin:/usr/local/cuda/bin:$PATH
# Install MKL
ADD ./common/install_mkl.sh install_mkl.sh
RUN bash ./install_mkl.sh && rm install_mkl.sh
FROM cpu as cuda
ADD ./common/install_cuda.sh install_cuda.sh
ADD ./common/install_magma.sh install_magma.sh
ENV CUDA_HOME /usr/local/cuda
FROM cuda as cuda11.8
RUN bash ./install_cuda.sh 11.8
RUN bash ./install_magma.sh 11.8
RUN ln -sf /usr/local/cuda-11.8 /usr/local/cuda
FROM cuda as cuda12.1
RUN bash ./install_cuda.sh 12.1
RUN bash ./install_magma.sh 12.1
RUN ln -sf /usr/local/cuda-12.1 /usr/local/cuda
FROM cuda as cuda12.4
RUN bash ./install_cuda.sh 12.4
RUN bash ./install_magma.sh 12.4
RUN ln -sf /usr/local/cuda-12.4 /usr/local/cuda
FROM cpu as rocm
ARG PYTORCH_ROCM_ARCH
ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
ENV MKLROOT /opt/intel
# Adding ROCM_PATH env var so that LoadHip.cmake (even with logic updated for ROCm6.0)
# find HIP works for ROCm5.7. Not needed for ROCm6.0 and above.
# Remove below when ROCm5.7 is not in support matrix anymore.
ENV ROCM_PATH /opt/rocm
# No need to install ROCm as base docker image should have full ROCm install
#ADD ./common/install_rocm.sh install_rocm.sh
ADD ./common/install_rocm_drm.sh install_rocm_drm.sh
ADD ./common/install_rocm_magma.sh install_rocm_magma.sh
# gfortran and python needed for building magma from source for ROCm
RUN apt-get update -y && \
apt-get install gfortran -y && \
apt-get install python -y && \
apt-get clean
RUN bash ./install_rocm_drm.sh && rm install_rocm_drm.sh
RUN bash ./install_rocm_magma.sh && rm install_rocm_magma.sh
# Install AOTriton
COPY ./common/common_utils.sh common_utils.sh
COPY ./aotriton_version.txt aotriton_version.txt
COPY ./common/install_aotriton.sh install_aotriton.sh
RUN bash ./install_aotriton.sh /opt/rocm && rm install_aotriton.sh aotriton_version.txt
ENV AOTRITON_INSTALLED_PREFIX /opt/rocm/aotriton
FROM ${BASE_TARGET} as final
COPY --from=openssl /opt/openssl /opt/openssl
# Install patchelf
ADD ./common/install_patchelf.sh install_patchelf.sh
RUN bash ./install_patchelf.sh && rm install_patchelf.sh
# Install Anaconda
COPY --from=conda /opt/conda /opt/conda
# Install python
COPY --from=python /opt/python /opt/python
COPY --from=python /opt/_internal /opt/_internal
ENV PATH=/opt/conda/bin:/usr/local/cuda/bin:$PATH

View File

@ -1,93 +0,0 @@
#!/usr/bin/env bash
# Script used only in CD pipeline
set -eou pipefail
image="$1"
shift
if [ -z "${image}" ]; then
echo "Usage: $0 IMAGE"
exit 1
fi
DOCKER_IMAGE="pytorch/${image}"
TOPDIR=$(git rev-parse --show-toplevel)
GPU_ARCH_TYPE=${GPU_ARCH_TYPE:-cpu}
GPU_ARCH_VERSION=${GPU_ARCH_VERSION:-}
WITH_PUSH=${WITH_PUSH:-}
DOCKER=${DOCKER:-docker}
case ${GPU_ARCH_TYPE} in
cpu)
BASE_TARGET=cpu
DOCKER_TAG=cpu
GPU_IMAGE=ubuntu:20.04
DOCKER_GPU_BUILD_ARG=""
;;
cuda)
BASE_TARGET=cuda${GPU_ARCH_VERSION}
DOCKER_TAG=cuda${GPU_ARCH_VERSION}
GPU_IMAGE=ubuntu:20.04
DOCKER_GPU_BUILD_ARG=""
;;
rocm)
BASE_TARGET=rocm
DOCKER_TAG=rocm${GPU_ARCH_VERSION}
GPU_IMAGE=rocm/dev-ubuntu-20.04:${GPU_ARCH_VERSION}-complete
PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908;gfx90a;gfx1030;gfx1100"
ROCM_REGEX="([0-9]+)\.([0-9]+)[\.]?([0-9]*)"
if [[ $GPU_ARCH_VERSION =~ $ROCM_REGEX ]]; then
ROCM_VERSION_INT=$((${BASH_REMATCH[1]}*10000 + ${BASH_REMATCH[2]}*100 + ${BASH_REMATCH[3]:-0}))
else
echo "ERROR: rocm regex failed"
exit 1
fi
if [[ $ROCM_VERSION_INT -ge 60000 ]]; then
PYTORCH_ROCM_ARCH+=";gfx942"
fi
DOCKER_GPU_BUILD_ARG="--build-arg PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH}"
;;
*)
echo "ERROR: Unrecognized GPU_ARCH_TYPE: ${GPU_ARCH_TYPE}"
exit 1
;;
esac
(
set -x
DOCKER_BUILDKIT=1 ${DOCKER} build \
--target final \
${DOCKER_GPU_BUILD_ARG} \
--build-arg "GPU_IMAGE=${GPU_IMAGE}" \
--build-arg "BASE_TARGET=${BASE_TARGET}" \
-t "${DOCKER_IMAGE}" \
$@ \
-f "${TOPDIR}/.ci/docker/libtorch/Dockerfile" \
"${TOPDIR}/.ci/docker/"
)
GITHUB_REF=${GITHUB_REF:-$(git symbolic-ref -q HEAD || git describe --tags --exact-match)}
GIT_BRANCH_NAME=${GITHUB_REF##*/}
GIT_COMMIT_SHA=${GITHUB_SHA:-$(git rev-parse HEAD)}
DOCKER_IMAGE_BRANCH_TAG=${DOCKER_IMAGE}-${GIT_BRANCH_NAME}
DOCKER_IMAGE_SHA_TAG=${DOCKER_IMAGE}-${GIT_COMMIT_SHA}
if [[ "${WITH_PUSH}" == true ]]; then
(
set -x
${DOCKER} push "${DOCKER_IMAGE}"
if [[ -n ${GITHUB_REF} ]]; then
${DOCKER} tag ${DOCKER_IMAGE} ${DOCKER_IMAGE_BRANCH_TAG}
${DOCKER} tag ${DOCKER_IMAGE} ${DOCKER_IMAGE_SHA_TAG}
${DOCKER} push "${DOCKER_IMAGE_BRANCH_TAG}"
${DOCKER} push "${DOCKER_IMAGE_SHA_TAG}"
fi
)
fi

View File

@ -29,7 +29,7 @@ RUN bash ./install_conda.sh && rm install_conda.sh common_utils.sh /opt/conda/re
# Install cuda and cudnn
ARG CUDA_VERSION
COPY ./common/install_cuda.sh install_cuda.sh
RUN wget -q https://raw.githubusercontent.com/pytorch/builder/main/common/install_cuda.sh -O install_cuda.sh
RUN bash ./install_cuda.sh ${CUDA_VERSION} && rm install_cuda.sh
ENV DESIRED_CUDA ${CUDA_VERSION}
ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:$PATH

View File

@ -1,202 +0,0 @@
# syntax = docker/dockerfile:experimental
ARG ROCM_VERSION=3.7
ARG BASE_CUDA_VERSION=11.8
ARG GPU_IMAGE=centos:7
FROM centos:7 as base
ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8
ARG DEVTOOLSET_VERSION=9
# Note: This is required patch since CentOS have reached EOL
# otherwise any yum install setp will fail
RUN sed -i s/mirror.centos.org/vault.centos.org/g /etc/yum.repos.d/*.repo
RUN sed -i s/^#.*baseurl=http/baseurl=http/g /etc/yum.repos.d/*.repo
RUN sed -i s/^mirrorlist=http/#mirrorlist=http/g /etc/yum.repos.d/*.repo
RUN yum install -y wget curl perl util-linux xz bzip2 git patch which perl zlib-devel
# Just add everything as a safe.directory for git since these will be used in multiple places with git
RUN git config --global --add safe.directory '*'
RUN yum install -y yum-utils centos-release-scl
RUN yum-config-manager --enable rhel-server-rhscl-7-rpms
# Note: After running yum-config-manager --enable rhel-server-rhscl-7-rpms
# patch is required once again. Somehow this steps adds mirror.centos.org
RUN sed -i s/mirror.centos.org/vault.centos.org/g /etc/yum.repos.d/*.repo
RUN sed -i s/^#.*baseurl=http/baseurl=http/g /etc/yum.repos.d/*.repo
RUN sed -i s/^mirrorlist=http/#mirrorlist=http/g /etc/yum.repos.d/*.repo
RUN yum install -y devtoolset-${DEVTOOLSET_VERSION}-gcc devtoolset-${DEVTOOLSET_VERSION}-gcc-c++ devtoolset-${DEVTOOLSET_VERSION}-gcc-gfortran devtoolset-${DEVTOOLSET_VERSION}-binutils
ENV PATH=/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH
ENV LD_LIBRARY_PATH=/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/lib64:/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/lib:$LD_LIBRARY_PATH
RUN yum --enablerepo=extras install -y epel-release
# cmake-3.18.4 from pip
RUN yum install -y python3-pip && \
python3 -mpip install cmake==3.18.4 && \
ln -s /usr/local/bin/cmake /usr/bin/cmake
RUN yum install -y autoconf aclocal automake make sudo
FROM base as openssl
# Install openssl (this must precede `build python` step)
# (In order to have a proper SSL module, Python is compiled
# against a recent openssl [see env vars above], which is linked
# statically. We delete openssl afterwards.)
ADD ./common/install_openssl.sh install_openssl.sh
RUN bash ./install_openssl.sh && rm install_openssl.sh
# EPEL for cmake
FROM base as patchelf
# Install patchelf
ADD ./common/install_patchelf.sh install_patchelf.sh
RUN bash ./install_patchelf.sh && rm install_patchelf.sh
RUN cp $(which patchelf) /patchelf
FROM patchelf as python
# build python
COPY manywheel/build_scripts /build_scripts
ADD ./common/install_cpython.sh /build_scripts/install_cpython.sh
RUN bash build_scripts/build.sh && rm -r build_scripts
FROM base as cuda
ARG BASE_CUDA_VERSION=10.2
# Install CUDA
ADD ./common/install_cuda.sh install_cuda.sh
RUN bash ./install_cuda.sh ${BASE_CUDA_VERSION} && rm install_cuda.sh
FROM base as intel
# MKL
ADD ./common/install_mkl.sh install_mkl.sh
RUN bash ./install_mkl.sh && rm install_mkl.sh
FROM base as magma
ARG BASE_CUDA_VERSION=10.2
# Install magma
ADD ./common/install_magma.sh install_magma.sh
RUN bash ./install_magma.sh ${BASE_CUDA_VERSION} && rm install_magma.sh
FROM base as jni
# Install java jni header
ADD ./common/install_jni.sh install_jni.sh
ADD ./java/jni.h jni.h
RUN bash ./install_jni.sh && rm install_jni.sh
FROM base as libpng
# Install libpng
ADD ./common/install_libpng.sh install_libpng.sh
RUN bash ./install_libpng.sh && rm install_libpng.sh
FROM ${GPU_IMAGE} as common
RUN sed -i s/mirror.centos.org/vault.centos.org/g /etc/yum.repos.d/*.repo
RUN sed -i s/^#.*baseurl=http/baseurl=http/g /etc/yum.repos.d/*.repo
RUN sed -i s/^mirrorlist=http/#mirrorlist=http/g /etc/yum.repos.d/*.repo
ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8
RUN yum install -y \
aclocal \
autoconf \
automake \
bison \
bzip2 \
curl \
diffutils \
file \
git \
make \
patch \
perl \
unzip \
util-linux \
wget \
which \
xz \
yasm
RUN yum install -y \
https://repo.ius.io/ius-release-el7.rpm \
https://ossci-linux.s3.amazonaws.com/epel-release-7-14.noarch.rpm
RUN yum swap -y git git236-core
# git236+ would refuse to run git commands in repos owned by other users
# Which causes version check to fail, as pytorch repo is bind-mounted into the image
# Override this behaviour by treating every folder as safe
# For more details see https://github.com/pytorch/pytorch/issues/78659#issuecomment-1144107327
RUN git config --global --add safe.directory "*"
ENV SSL_CERT_FILE=/opt/_internal/certs.pem
# Install LLVM version
COPY --from=openssl /opt/openssl /opt/openssl
COPY --from=python /opt/python /opt/python
COPY --from=python /opt/_internal /opt/_internal
COPY --from=python /opt/python/cp39-cp39/bin/auditwheel /usr/local/bin/auditwheel
COPY --from=intel /opt/intel /opt/intel
COPY --from=patchelf /usr/local/bin/patchelf /usr/local/bin/patchelf
COPY --from=jni /usr/local/include/jni.h /usr/local/include/jni.h
COPY --from=libpng /usr/local/bin/png* /usr/local/bin/
COPY --from=libpng /usr/local/bin/libpng* /usr/local/bin/
COPY --from=libpng /usr/local/include/png* /usr/local/include/
COPY --from=libpng /usr/local/include/libpng* /usr/local/include/
COPY --from=libpng /usr/local/lib/libpng* /usr/local/lib/
COPY --from=libpng /usr/local/lib/pkgconfig /usr/local/lib/pkgconfig
FROM common as cpu_final
ARG BASE_CUDA_VERSION=10.1
ARG DEVTOOLSET_VERSION=9
RUN sed -i s/mirror.centos.org/vault.centos.org/g /etc/yum.repos.d/*.repo
RUN sed -i s/^#.*baseurl=http/baseurl=http/g /etc/yum.repos.d/*.repo
RUN sed -i s/^mirrorlist=http/#mirrorlist=http/g /etc/yum.repos.d/*.repo
RUN yum install -y yum-utils centos-release-scl
RUN yum-config-manager --enable rhel-server-rhscl-7-rpms
RUN sed -i s/mirror.centos.org/vault.centos.org/g /etc/yum.repos.d/*.repo
RUN sed -i s/^#.*baseurl=http/baseurl=http/g /etc/yum.repos.d/*.repo
RUN sed -i s/^mirrorlist=http/#mirrorlist=http/g /etc/yum.repos.d/*.repo
RUN yum install -y devtoolset-${DEVTOOLSET_VERSION}-gcc devtoolset-${DEVTOOLSET_VERSION}-gcc-c++ devtoolset-${DEVTOOLSET_VERSION}-gcc-gfortran devtoolset-${DEVTOOLSET_VERSION}-binutils
ENV PATH=/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH
ENV LD_LIBRARY_PATH=/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/lib64:/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/lib:$LD_LIBRARY_PATH
# cmake is already installed inside the rocm base image, so remove if present
RUN rpm -e cmake || true
# cmake-3.18.4 from pip
RUN yum install -y python3-pip && \
python3 -mpip install cmake==3.18.4 && \
ln -s /usr/local/bin/cmake /usr/bin/cmake
# ninja
RUN yum install -y ninja-build
FROM cpu_final as cuda_final
RUN rm -rf /usr/local/cuda-${BASE_CUDA_VERSION}
COPY --from=cuda /usr/local/cuda-${BASE_CUDA_VERSION} /usr/local/cuda-${BASE_CUDA_VERSION}
COPY --from=magma /usr/local/cuda-${BASE_CUDA_VERSION} /usr/local/cuda-${BASE_CUDA_VERSION}
RUN ln -sf /usr/local/cuda-${BASE_CUDA_VERSION} /usr/local/cuda
ENV PATH=/usr/local/cuda/bin:$PATH
FROM cpu_final as rocm_final
ARG ROCM_VERSION=3.7
ARG PYTORCH_ROCM_ARCH
ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
# Adding ROCM_PATH env var so that LoadHip.cmake (even with logic updated for ROCm6.0)
# find HIP works for ROCm5.7. Not needed for ROCm6.0 and above.
# Remove below when ROCm5.7 is not in support matrix anymore.
ENV ROCM_PATH /opt/rocm
ENV MKLROOT /opt/intel
# No need to install ROCm as base docker image should have full ROCm install
#ADD ./common/install_rocm.sh install_rocm.sh
#RUN ROCM_VERSION=${ROCM_VERSION} bash ./install_rocm.sh && rm install_rocm.sh
ADD ./common/install_rocm_drm.sh install_rocm_drm.sh
RUN bash ./install_rocm_drm.sh && rm install_rocm_drm.sh
# cmake3 is needed for the MIOpen build
RUN ln -sf /usr/local/bin/cmake /usr/bin/cmake3
ADD ./common/install_rocm_magma.sh install_rocm_magma.sh
RUN bash ./install_rocm_magma.sh && rm install_rocm_magma.sh
ADD ./common/install_miopen.sh install_miopen.sh
RUN bash ./install_miopen.sh ${ROCM_VERSION} && rm install_miopen.sh
# Install AOTriton
COPY ./common/common_utils.sh common_utils.sh
COPY ./aotriton_version.txt aotriton_version.txt
COPY ./common/install_aotriton.sh install_aotriton.sh
RUN bash ./install_aotriton.sh /opt/rocm && rm install_aotriton.sh aotriton_version.txt
ENV AOTRITON_INSTALLED_PREFIX /opt/rocm/aotriton

View File

@ -1,153 +0,0 @@
# syntax = docker/dockerfile:experimental
ARG ROCM_VERSION=3.7
ARG BASE_CUDA_VERSION=10.2
ARG GPU_IMAGE=nvidia/cuda:${BASE_CUDA_VERSION}-devel-centos7
FROM quay.io/pypa/manylinux2014_x86_64 as base
ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8
RUN sed -i s/mirror.centos.org/vault.centos.org/g /etc/yum.repos.d/*.repo
RUN sed -i s/^#.*baseurl=http/baseurl=http/g /etc/yum.repos.d/*.repo
RUN sed -i s/^mirrorlist=http/#mirrorlist=http/g /etc/yum.repos.d/*.repo
RUN yum install -y wget curl perl util-linux xz bzip2 git patch which perl zlib-devel
RUN yum install -y yum-utils centos-release-scl sudo
RUN yum-config-manager --enable rhel-server-rhscl-7-rpms
RUN yum install -y devtoolset-7-gcc devtoolset-7-gcc-c++ devtoolset-7-gcc-gfortran devtoolset-7-binutils
ENV PATH=/opt/rh/devtoolset-7/root/usr/bin:$PATH
ENV LD_LIBRARY_PATH=/opt/rh/devtoolset-7/root/usr/lib64:/opt/rh/devtoolset-7/root/usr/lib:$LD_LIBRARY_PATH
# cmake
RUN yum install -y cmake3 && \
ln -s /usr/bin/cmake3 /usr/bin/cmake
FROM base as openssl
# Install openssl (this must precede `build python` step)
# (In order to have a proper SSL module, Python is compiled
# against a recent openssl [see env vars above], which is linked
# statically. We delete openssl afterwards.)
ADD ./common/install_openssl.sh install_openssl.sh
RUN bash ./install_openssl.sh && rm install_openssl.sh
# remove unncessary python versions
RUN rm -rf /opt/python/cp26-cp26m /opt/_internal/cpython-2.6.9-ucs2
RUN rm -rf /opt/python/cp26-cp26mu /opt/_internal/cpython-2.6.9-ucs4
RUN rm -rf /opt/python/cp33-cp33m /opt/_internal/cpython-3.3.6
RUN rm -rf /opt/python/cp34-cp34m /opt/_internal/cpython-3.4.6
FROM base as cuda
ARG BASE_CUDA_VERSION=10.2
# Install CUDA
ADD ./common/install_cuda.sh install_cuda.sh
RUN bash ./install_cuda.sh ${BASE_CUDA_VERSION} && rm install_cuda.sh
FROM base as intel
# MKL
ADD ./common/install_mkl.sh install_mkl.sh
RUN bash ./install_mkl.sh && rm install_mkl.sh
FROM base as magma
ARG BASE_CUDA_VERSION=10.2
# Install magma
ADD ./common/install_magma.sh install_magma.sh
RUN bash ./install_magma.sh ${BASE_CUDA_VERSION} && rm install_magma.sh
FROM base as jni
# Install java jni header
ADD ./common/install_jni.sh install_jni.sh
ADD ./java/jni.h jni.h
RUN bash ./install_jni.sh && rm install_jni.sh
FROM base as libpng
# Install libpng
ADD ./common/install_libpng.sh install_libpng.sh
RUN bash ./install_libpng.sh && rm install_libpng.sh
FROM ${GPU_IMAGE} as common
RUN sed -i s/mirror.centos.org/vault.centos.org/g /etc/yum.repos.d/*.repo
RUN sed -i s/^#.*baseurl=http/baseurl=http/g /etc/yum.repos.d/*.repo
RUN sed -i s/^mirrorlist=http/#mirrorlist=http/g /etc/yum.repos.d/*.repo
ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8
RUN yum install -y \
aclocal \
autoconf \
automake \
bison \
bzip2 \
curl \
diffutils \
file \
git \
make \
patch \
perl \
unzip \
util-linux \
wget \
which \
xz \
yasm
RUN yum install -y \
https://repo.ius.io/ius-release-el7.rpm \
https://ossci-linux.s3.amazonaws.com/epel-release-7-14.noarch.rpm
RUN yum swap -y git git236-core
# git236+ would refuse to run git commands in repos owned by other users
# Which causes version check to fail, as pytorch repo is bind-mounted into the image
# Override this behaviour by treating every folder as safe
# For more details see https://github.com/pytorch/pytorch/issues/78659#issuecomment-1144107327
RUN git config --global --add safe.directory "*"
ENV SSL_CERT_FILE=/opt/_internal/certs.pem
# Install LLVM version
COPY --from=openssl /opt/openssl /opt/openssl
COPY --from=base /opt/python /opt/python
COPY --from=base /opt/_internal /opt/_internal
COPY --from=base /usr/local/bin/auditwheel /usr/local/bin/auditwheel
COPY --from=intel /opt/intel /opt/intel
COPY --from=base /usr/local/bin/patchelf /usr/local/bin/patchelf
COPY --from=libpng /usr/local/bin/png* /usr/local/bin/
COPY --from=libpng /usr/local/bin/libpng* /usr/local/bin/
COPY --from=libpng /usr/local/include/png* /usr/local/include/
COPY --from=libpng /usr/local/include/libpng* /usr/local/include/
COPY --from=libpng /usr/local/lib/libpng* /usr/local/lib/
COPY --from=libpng /usr/local/lib/pkgconfig /usr/local/lib/pkgconfig
COPY --from=jni /usr/local/include/jni.h /usr/local/include/jni.h
FROM common as cpu_final
ARG BASE_CUDA_VERSION=10.2
RUN yum install -y yum-utils centos-release-scl
RUN yum-config-manager --enable rhel-server-rhscl-7-rpms
RUN yum install -y devtoolset-7-gcc devtoolset-7-gcc-c++ devtoolset-7-gcc-gfortran devtoolset-7-binutils
ENV PATH=/opt/rh/devtoolset-7/root/usr/bin:$PATH
ENV LD_LIBRARY_PATH=/opt/rh/devtoolset-7/root/usr/lib64:/opt/rh/devtoolset-7/root/usr/lib:$LD_LIBRARY_PATH
# cmake
RUN yum install -y cmake3 && \
ln -s /usr/bin/cmake3 /usr/bin/cmake
# ninja
RUN yum install -y http://repo.okay.com.mx/centos/7/x86_64/release/okay-release-1-1.noarch.rpm
RUN yum install -y ninja-build
FROM cpu_final as cuda_final
RUN rm -rf /usr/local/cuda-${BASE_CUDA_VERSION}
COPY --from=cuda /usr/local/cuda-${BASE_CUDA_VERSION} /usr/local/cuda-${BASE_CUDA_VERSION}
COPY --from=magma /usr/local/cuda-${BASE_CUDA_VERSION} /usr/local/cuda-${BASE_CUDA_VERSION}
FROM common as rocm_final
ARG ROCM_VERSION=3.7
# Install ROCm
ADD ./common/install_rocm.sh install_rocm.sh
RUN bash ./install_rocm.sh ${ROCM_VERSION} && rm install_rocm.sh
# cmake is already installed inside the rocm base image, but both 2 and 3 exist
# cmake3 is needed for the later MIOpen custom build, so that step is last.
RUN yum install -y cmake3 && \
rm -f /usr/bin/cmake && \
ln -s /usr/bin/cmake3 /usr/bin/cmake
ADD ./common/install_miopen.sh install_miopen.sh
RUN bash ./install_miopen.sh ${ROCM_VERSION} && rm install_miopen.sh

View File

@ -1,157 +0,0 @@
# syntax = docker/dockerfile:experimental
ARG ROCM_VERSION=3.7
ARG BASE_CUDA_VERSION=11.8
ARG GPU_IMAGE=amd64/almalinux:8
FROM quay.io/pypa/manylinux_2_28_x86_64 as base
ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8
ARG DEVTOOLSET_VERSION=11
RUN yum install -y sudo wget curl perl util-linux xz bzip2 git patch which perl zlib-devel yum-utils gcc-toolset-${DEVTOOLSET_VERSION}-toolchain
ENV PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH
ENV LD_LIBRARY_PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/lib64:/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/lib:$LD_LIBRARY_PATH
# cmake-3.18.4 from pip
RUN yum install -y python3-pip && \
python3 -mpip install cmake==3.18.4 && \
ln -s /usr/local/bin/cmake /usr/bin/cmake3
FROM base as openssl
# Install openssl (this must precede `build python` step)
# (In order to have a proper SSL module, Python is compiled
# against a recent openssl [see env vars above], which is linked
# statically. We delete openssl afterwards.)
ADD ./common/install_openssl.sh install_openssl.sh
RUN bash ./install_openssl.sh && rm install_openssl.sh
# remove unncessary python versions
RUN rm -rf /opt/python/cp26-cp26m /opt/_internal/cpython-2.6.9-ucs2
RUN rm -rf /opt/python/cp26-cp26mu /opt/_internal/cpython-2.6.9-ucs4
RUN rm -rf /opt/python/cp33-cp33m /opt/_internal/cpython-3.3.6
RUN rm -rf /opt/python/cp34-cp34m /opt/_internal/cpython-3.4.6
FROM base as cuda
ARG BASE_CUDA_VERSION=11.8
# Install CUDA
ADD ./common/install_cuda.sh install_cuda.sh
RUN bash ./install_cuda.sh ${BASE_CUDA_VERSION} && rm install_cuda.sh
FROM base as intel
# MKL
ADD ./common/install_mkl.sh install_mkl.sh
RUN bash ./install_mkl.sh && rm install_mkl.sh
FROM base as magma
ARG BASE_CUDA_VERSION=10.2
# Install magma
ADD ./common/install_magma.sh install_magma.sh
RUN bash ./install_magma.sh ${BASE_CUDA_VERSION} && rm install_magma.sh
FROM base as jni
# Install java jni header
ADD ./common/install_jni.sh install_jni.sh
ADD ./java/jni.h jni.h
RUN bash ./install_jni.sh && rm install_jni.sh
FROM base as libpng
# Install libpng
ADD ./common/install_libpng.sh install_libpng.sh
RUN bash ./install_libpng.sh && rm install_libpng.sh
FROM ${GPU_IMAGE} as common
ARG DEVTOOLSET_VERSION=11
ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8
RUN yum -y install epel-release
RUN yum -y update
RUN yum install -y \
autoconf \
automake \
bison \
bzip2 \
curl \
diffutils \
file \
git \
make \
patch \
perl \
unzip \
util-linux \
wget \
which \
xz \
gcc-toolset-${DEVTOOLSET_VERSION}-toolchain \
glibc-langpack-en
RUN yum install -y \
https://repo.ius.io/ius-release-el7.rpm \
https://ossci-linux.s3.amazonaws.com/epel-release-7-14.noarch.rpm
RUN yum swap -y git git236-core
# git236+ would refuse to run git commands in repos owned by other users
# Which causes version check to fail, as pytorch repo is bind-mounted into the image
# Override this behaviour by treating every folder as safe
# For more details see https://github.com/pytorch/pytorch/issues/78659#issuecomment-1144107327
RUN git config --global --add safe.directory "*"
ENV SSL_CERT_FILE=/opt/_internal/certs.pem
# Install LLVM version
COPY --from=openssl /opt/openssl /opt/openssl
COPY --from=base /opt/python /opt/python
COPY --from=base /opt/_internal /opt/_internal
COPY --from=base /usr/local/bin/auditwheel /usr/local/bin/auditwheel
COPY --from=intel /opt/intel /opt/intel
COPY --from=base /usr/local/bin/patchelf /usr/local/bin/patchelf
COPY --from=libpng /usr/local/bin/png* /usr/local/bin/
COPY --from=libpng /usr/local/bin/libpng* /usr/local/bin/
COPY --from=libpng /usr/local/include/png* /usr/local/include/
COPY --from=libpng /usr/local/include/libpng* /usr/local/include/
COPY --from=libpng /usr/local/lib/libpng* /usr/local/lib/
COPY --from=libpng /usr/local/lib/pkgconfig /usr/local/lib/pkgconfig
COPY --from=jni /usr/local/include/jni.h /usr/local/include/jni.h
FROM common as cpu_final
ARG BASE_CUDA_VERSION=11.8
ARG DEVTOOLSET_VERSION=11
# Ensure the expected devtoolset is used
ENV PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH
ENV LD_LIBRARY_PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/lib64:/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/lib:$LD_LIBRARY_PATH
# cmake-3.18.4 from pip
RUN yum install -y python3-pip && \
python3 -mpip install cmake==3.18.4 && \
ln -s /usr/local/bin/cmake /usr/bin/cmake3
FROM cpu_final as cuda_final
RUN rm -rf /usr/local/cuda-${BASE_CUDA_VERSION}
COPY --from=cuda /usr/local/cuda-${BASE_CUDA_VERSION} /usr/local/cuda-${BASE_CUDA_VERSION}
COPY --from=magma /usr/local/cuda-${BASE_CUDA_VERSION} /usr/local/cuda-${BASE_CUDA_VERSION}
FROM common as rocm_final
ARG ROCM_VERSION=3.7
# Install ROCm
ADD ./common/install_rocm.sh install_rocm.sh
RUN bash ./install_rocm.sh ${ROCM_VERSION} && rm install_rocm.sh
# cmake is already installed inside the rocm base image, but both 2 and 3 exist
# cmake3 is needed for the later MIOpen custom build, so that step is last.
RUN yum install -y cmake3 && \
rm -f /usr/bin/cmake && \
ln -s /usr/bin/cmake3 /usr/bin/cmake
ADD ./common/install_miopen.sh install_miopen.sh
RUN bash ./install_miopen.sh ${ROCM_VERSION} && rm install_miopen.sh
FROM cpu_final as xpu_final
# XPU CD use rolling driver
ENV XPU_DRIVER_TYPE ROLLING
# cmake-3.28.4 from pip
RUN python3 -m pip install --upgrade pip && \
python3 -mpip install cmake==3.28.4
# Install setuptools and wheel for python 3.13
RUN /opt/python/cp313-cp313/bin/python -m pip install setuptools wheel
ADD ./common/install_xpu.sh install_xpu.sh
RUN bash ./install_xpu.sh && rm install_xpu.sh
RUN pushd /opt/_internal && tar -xJf static-libs-for-embedding-only.tar.xz && popd

View File

@ -1,57 +0,0 @@
FROM quay.io/pypa/manylinux_2_28_aarch64 as base
# Graviton needs GCC 10 or above for the build. GCC12 is the default version in almalinux-8.
ARG GCCTOOLSET_VERSION=11
# Language variabes
ENV LC_ALL=en_US.UTF-8
ENV LANG=en_US.UTF-8
ENV LANGUAGE=en_US.UTF-8
# Installed needed OS packages. This is to support all
# the binary builds (torch, vision, audio, text, data)
RUN yum -y install epel-release
RUN yum -y update
RUN yum install -y \
autoconf \
automake \
bison \
bzip2 \
curl \
diffutils \
file \
git \
less \
libffi-devel \
libgomp \
make \
openssl-devel \
patch \
perl \
unzip \
util-linux \
wget \
which \
xz \
yasm \
zstd \
sudo \
gcc-toolset-${GCCTOOLSET_VERSION}-toolchain
# Ensure the expected devtoolset is used
ENV PATH=/opt/rh/gcc-toolset-${GCCTOOLSET_VERSION}/root/usr/bin:$PATH
ENV LD_LIBRARY_PATH=/opt/rh/gcc-toolset-${GCCTOOLSET_VERSION}/root/usr/lib64:/opt/rh/gcc-toolset-${GCCTOOLSET_VERSION}/root/usr/lib:$LD_LIBRARY_PATH
# git236+ would refuse to run git commands in repos owned by other users
# Which causes version check to fail, as pytorch repo is bind-mounted into the image
# Override this behaviour by treating every folder as safe
# For more details see https://github.com/pytorch/pytorch/issues/78659#issuecomment-1144107327
RUN git config --global --add safe.directory "*"
FROM base as final
# remove unncessary python versions
RUN rm -rf /opt/python/cp26-cp26m /opt/_internal/cpython-2.6.9-ucs2
RUN rm -rf /opt/python/cp26-cp26mu /opt/_internal/cpython-2.6.9-ucs4
RUN rm -rf /opt/python/cp33-cp33m /opt/_internal/cpython-3.3.6
RUN rm -rf /opt/python/cp34-cp34m /opt/_internal/cpython-3.4.6

View File

@ -1,94 +0,0 @@
FROM quay.io/pypa/manylinux2014_aarch64 as base
# Graviton needs GCC 10 for the build
ARG DEVTOOLSET_VERSION=10
# Language variabes
ENV LC_ALL=en_US.UTF-8
ENV LANG=en_US.UTF-8
ENV LANGUAGE=en_US.UTF-8
# Installed needed OS packages. This is to support all
# the binary builds (torch, vision, audio, text, data)
RUN yum -y install epel-release
RUN yum -y update
RUN yum install -y \
autoconf \
automake \
bison \
bzip2 \
curl \
diffutils \
file \
git \
make \
patch \
perl \
unzip \
util-linux \
wget \
which \
xz \
yasm \
less \
zstd \
libgomp \
sudo \
devtoolset-${DEVTOOLSET_VERSION}-gcc \
devtoolset-${DEVTOOLSET_VERSION}-gcc-c++ \
devtoolset-${DEVTOOLSET_VERSION}-gcc-gfortran \
devtoolset-${DEVTOOLSET_VERSION}-binutils
# Ensure the expected devtoolset is used
ENV PATH=/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH
ENV LD_LIBRARY_PATH=/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/lib64:/opt/rh/devtoolset-${DEVTOOLSET_VERSION}/root/usr/lib:$LD_LIBRARY_PATH
# git236+ would refuse to run git commands in repos owned by other users
# Which causes version check to fail, as pytorch repo is bind-mounted into the image
# Override this behaviour by treating every folder as safe
# For more details see https://github.com/pytorch/pytorch/issues/78659#issuecomment-1144107327
RUN git config --global --add safe.directory "*"
###############################################################################
# libglfortran.a hack
#
# libgfortran.a from quay.io/pypa/manylinux2014_aarch64 is not compiled with -fPIC.
# This causes __stack_chk_guard@@GLIBC_2.17 on pytorch build. To solve, get
# ubuntu's libgfortran.a which is compiled with -fPIC
# NOTE: Need a better way to get this library as Ubuntu's package can be removed by the vender, or changed
###############################################################################
RUN cd ~/ \
&& curl -L -o ~/libgfortran-10-dev.deb http://ports.ubuntu.com/ubuntu-ports/pool/universe/g/gcc-10/libgfortran-10-dev_10.5.0-1ubuntu1_arm64.deb \
&& ar x ~/libgfortran-10-dev.deb \
&& tar --use-compress-program=unzstd -xvf data.tar.zst -C ~/ \
&& cp -f ~/usr/lib/gcc/aarch64-linux-gnu/10/libgfortran.a /opt/rh/devtoolset-10/root/usr/lib/gcc/aarch64-redhat-linux/10/
# install cmake
RUN yum install -y cmake3 && \
ln -s /usr/bin/cmake3 /usr/bin/cmake
FROM base as openssl
# Install openssl (this must precede `build python` step)
# (In order to have a proper SSL module, Python is compiled
# against a recent openssl [see env vars above], which is linked
# statically. We delete openssl afterwards.)
ADD ./common/install_openssl.sh install_openssl.sh
RUN bash ./install_openssl.sh && rm install_openssl.sh
ENV SSL_CERT_FILE=/opt/_internal/certs.pem
FROM base as openblas
# Install openblas
ADD ./common/install_openblas.sh install_openblas.sh
RUN bash ./install_openblas.sh && rm install_openblas.sh
FROM openssl as final
# remove unncessary python versions
RUN rm -rf /opt/python/cp26-cp26m /opt/_internal/cpython-2.6.9-ucs2
RUN rm -rf /opt/python/cp26-cp26mu /opt/_internal/cpython-2.6.9-ucs4
RUN rm -rf /opt/python/cp33-cp33m /opt/_internal/cpython-3.3.6
RUN rm -rf /opt/python/cp34-cp34m /opt/_internal/cpython-3.4.6
COPY --from=openblas /opt/OpenBLAS/ /opt/OpenBLAS/
ENV LD_LIBRARY_PATH=/opt/OpenBLAS/lib:$LD_LIBRARY_PATH

View File

@ -1,91 +0,0 @@
FROM quay.io/pypa/manylinux_2_28_aarch64 as base
# Cuda ARM build needs gcc 11
ARG DEVTOOLSET_VERSION=11
# Language variables
ENV LC_ALL=en_US.UTF-8
ENV LANG=en_US.UTF-8
ENV LANGUAGE=en_US.UTF-8
# Installed needed OS packages. This is to support all
# the binary builds (torch, vision, audio, text, data)
RUN yum -y install epel-release
RUN yum -y update
RUN yum install -y \
autoconf \
automake \
bison \
bzip2 \
curl \
diffutils \
file \
git \
make \
patch \
perl \
unzip \
util-linux \
wget \
which \
xz \
yasm \
less \
zstd \
libgomp \
sudo \
gcc-toolset-${DEVTOOLSET_VERSION}-toolchain
# Ensure the expected devtoolset is used
ENV PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH
ENV LD_LIBRARY_PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/lib64:/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/lib:$LD_LIBRARY_PATH
# git236+ would refuse to run git commands in repos owned by other users
# Which causes version check to fail, as pytorch repo is bind-mounted into the image
# Override this behaviour by treating every folder as safe
# For more details see https://github.com/pytorch/pytorch/issues/78659#issuecomment-1144107327
RUN git config --global --add safe.directory "*"
FROM base as openssl
# Install openssl (this must precede `build python` step)
# (In order to have a proper SSL module, Python is compiled
# against a recent openssl [see env vars above], which is linked
# statically. We delete openssl afterwards.)
ADD ./common/install_openssl.sh install_openssl.sh
RUN bash ./install_openssl.sh && rm install_openssl.sh
ENV SSL_CERT_FILE=/opt/_internal/certs.pem
FROM openssl as final
# remove unncessary python versions
RUN rm -rf /opt/python/cp26-cp26m /opt/_internal/cpython-2.6.9-ucs2
RUN rm -rf /opt/python/cp26-cp26mu /opt/_internal/cpython-2.6.9-ucs4
RUN rm -rf /opt/python/cp33-cp33m /opt/_internal/cpython-3.3.6
RUN rm -rf /opt/python/cp34-cp34m /opt/_internal/cpython-3.4.6
FROM base as cuda
ARG BASE_CUDA_VERSION
# Install CUDA
ADD ./common/install_cuda_aarch64.sh install_cuda_aarch64.sh
RUN bash ./install_cuda_aarch64.sh ${BASE_CUDA_VERSION} && rm install_cuda_aarch64.sh
FROM base as magma
ARG BASE_CUDA_VERSION
# Install magma
ADD ./common/install_magma.sh install_magma.sh
RUN bash ./install_magma.sh ${BASE_CUDA_VERSION} && rm install_magma.sh
FROM base as nvpl
# Install nvpl
ADD ./common/install_nvpl.sh install_nvpl.sh
RUN bash ./install_nvpl.sh && rm install_nvpl.sh
FROM final as cuda_final
ARG BASE_CUDA_VERSION
RUN rm -rf /usr/local/cuda-${BASE_CUDA_VERSION}
COPY --from=cuda /usr/local/cuda-${BASE_CUDA_VERSION} /usr/local/cuda-${BASE_CUDA_VERSION}
COPY --from=magma /usr/local/cuda-${BASE_CUDA_VERSION} /usr/local/cuda-${BASE_CUDA_VERSION}
COPY --from=nvpl /opt/nvpl/lib/ /usr/local/lib/
COPY --from=nvpl /opt/nvpl/include/ /usr/local/include/
RUN ln -sf /usr/local/cuda-${BASE_CUDA_VERSION} /usr/local/cuda
ENV PATH=/usr/local/cuda/bin:$PATH

View File

@ -1,71 +0,0 @@
FROM centos:8 as base
ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8
ENV PATH /opt/rh/gcc-toolset-11/root/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# change to a valid repo
RUN sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-Linux-*.repo
# enable to install ninja-build
RUN sed -i 's|enabled=0|enabled=1|g' /etc/yum.repos.d/CentOS-Linux-PowerTools.repo
RUN yum -y update
RUN yum install -y wget curl perl util-linux xz bzip2 git patch which zlib-devel sudo
RUN yum install -y autoconf automake make cmake gdb gcc-toolset-11-gcc-c++
FROM base as openssl
ADD ./common/install_openssl.sh install_openssl.sh
RUN bash ./install_openssl.sh && rm install_openssl.sh
# Install python
FROM base as python
RUN yum install -y openssl-devel zlib-devel bzip2-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel libpcap-devel xz-devel libffi-devel
ADD common/install_cpython.sh install_cpython.sh
RUN bash ./install_cpython.sh && rm install_cpython.sh
FROM base as conda
ADD ./common/install_conda_docker.sh install_conda.sh
RUN bash ./install_conda.sh && rm install_conda.sh
RUN /opt/conda/bin/conda install -y cmake
FROM base as intel
# Install MKL
COPY --from=python /opt/python /opt/python
COPY --from=python /opt/_internal /opt/_internal
COPY --from=conda /opt/conda /opt/conda
ENV PATH=/opt/conda/bin:$PATH
ADD ./common/install_mkl.sh install_mkl.sh
RUN bash ./install_mkl.sh && rm install_mkl.sh
FROM base as patchelf
ADD ./common/install_patchelf.sh install_patchelf.sh
RUN bash ./install_patchelf.sh && rm install_patchelf.sh
RUN cp $(which patchelf) /patchelf
FROM base as jni
ADD ./common/install_jni.sh install_jni.sh
ADD ./java/jni.h jni.h
RUN bash ./install_jni.sh && rm install_jni.sh
FROM base as libpng
ADD ./common/install_libpng.sh install_libpng.sh
RUN bash ./install_libpng.sh && rm install_libpng.sh
FROM base as final
COPY --from=openssl /opt/openssl /opt/openssl
COPY --from=python /opt/python /opt/python
COPY --from=python /opt/_internal /opt/_internal
COPY --from=intel /opt/intel /opt/intel
COPY --from=conda /opt/conda /opt/conda
COPY --from=patchelf /usr/local/bin/patchelf /usr/local/bin/patchelf
COPY --from=jni /usr/local/include/jni.h /usr/local/include/jni.h
COPY --from=libpng /usr/local/bin/png* /usr/local/bin/
COPY --from=libpng /usr/local/bin/libpng* /usr/local/bin/
COPY --from=libpng /usr/local/include/png* /usr/local/include/
COPY --from=libpng /usr/local/include/libpng* /usr/local/include/
COPY --from=libpng /usr/local/lib/libpng* /usr/local/lib/
COPY --from=libpng /usr/local/lib/pkgconfig /usr/local/lib/pkgconfig
RUN yum install -y ninja-build

View File

@ -1,73 +0,0 @@
FROM --platform=linux/s390x docker.io/ubuntu:24.04 as base
# Language variables
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
ENV LANGUAGE=C.UTF-8
# Installed needed OS packages. This is to support all
# the binary builds (torch, vision, audio, text, data)
RUN apt update ; apt upgrade -y
RUN apt install -y \
build-essential \
autoconf \
automake \
bzip2 \
curl \
diffutils \
file \
git \
make \
patch \
perl \
unzip \
util-linux \
wget \
which \
xz-utils \
less \
zstd \
cmake \
python3 \
python3-dev \
python3-setuptools \
python3-yaml \
python3-typing-extensions \
libblas-dev \
libopenblas-dev \
liblapack-dev \
libatlas-base-dev
# git236+ would refuse to run git commands in repos owned by other users
# Which causes version check to fail, as pytorch repo is bind-mounted into the image
# Override this behaviour by treating every folder as safe
# For more details see https://github.com/pytorch/pytorch/issues/78659#issuecomment-1144107327
RUN git config --global --add safe.directory "*"
FROM base as openssl
# Install openssl (this must precede `build python` step)
# (In order to have a proper SSL module, Python is compiled
# against a recent openssl [see env vars above], which is linked
# statically. We delete openssl afterwards.)
ADD ./common/install_openssl.sh install_openssl.sh
RUN bash ./install_openssl.sh && rm install_openssl.sh
ENV SSL_CERT_FILE=/opt/_internal/certs.pem
# EPEL for cmake
FROM base as patchelf
# Install patchelf
ADD ./common/install_patchelf.sh install_patchelf.sh
RUN bash ./install_patchelf.sh && rm install_patchelf.sh
RUN cp $(which patchelf) /patchelf
FROM patchelf as python
# build python
COPY manywheel/build_scripts /build_scripts
ADD ./common/install_cpython.sh /build_scripts/install_cpython.sh
RUN bash build_scripts/build.sh && rm -r build_scripts
FROM openssl as final
COPY --from=python /opt/python /opt/python
COPY --from=python /opt/_internal /opt/_internal
COPY --from=python /opt/python/cp39-cp39/bin/auditwheel /usr/local/bin/auditwheel
COPY --from=patchelf /usr/local/bin/patchelf /usr/local/bin/patchelf

View File

@ -1,154 +0,0 @@
#!/usr/bin/env bash
# Script used only in CD pipeline
set -eou pipefail
TOPDIR=$(git rev-parse --show-toplevel)
image="$1"
shift
if [ -z "${image}" ]; then
echo "Usage: $0 IMAGE"
exit 1
fi
DOCKER_IMAGE="pytorch/${image}"
DOCKER_REGISTRY="${DOCKER_REGISTRY:-docker.io}"
GPU_ARCH_TYPE=${GPU_ARCH_TYPE:-cpu}
GPU_ARCH_VERSION=${GPU_ARCH_VERSION:-}
MANY_LINUX_VERSION=${MANY_LINUX_VERSION:-}
DOCKERFILE_SUFFIX=${DOCKERFILE_SUFFIX:-}
WITH_PUSH=${WITH_PUSH:-}
case ${GPU_ARCH_TYPE} in
cpu)
TARGET=cpu_final
DOCKER_TAG=cpu
GPU_IMAGE=centos:7
DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=9"
;;
cpu-manylinux_2_28)
TARGET=cpu_final
DOCKER_TAG=cpu
GPU_IMAGE=amd64/almalinux:8
DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=11"
MANY_LINUX_VERSION="2_28"
;;
cpu-aarch64)
TARGET=final
DOCKER_TAG=cpu-aarch64
GPU_IMAGE=arm64v8/centos:7
DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=10"
MANY_LINUX_VERSION="aarch64"
;;
cpu-aarch64-2_28)
TARGET=final
DOCKER_TAG=cpu-aarch64
GPU_IMAGE=arm64v8/almalinux:8
DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=11"
MANY_LINUX_VERSION="2_28_aarch64"
;;
cpu-cxx11-abi)
TARGET=final
DOCKER_TAG=cpu-cxx11-abi
GPU_IMAGE=""
DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=9"
MANY_LINUX_VERSION="cxx11-abi"
;;
cpu-s390x)
TARGET=final
DOCKER_TAG=cpu-s390x
GPU_IMAGE=redhat/ubi9
DOCKER_GPU_BUILD_ARG=""
MANY_LINUX_VERSION="s390x"
;;
cuda)
TARGET=cuda_final
DOCKER_TAG=cuda${GPU_ARCH_VERSION}
# Keep this up to date with the minimum version of CUDA we currently support
GPU_IMAGE=centos:7
DOCKER_GPU_BUILD_ARG="--build-arg BASE_CUDA_VERSION=${GPU_ARCH_VERSION} --build-arg DEVTOOLSET_VERSION=9"
;;
cuda-manylinux_2_28)
TARGET=cuda_final
DOCKER_TAG=cuda${GPU_ARCH_VERSION}
GPU_IMAGE=amd64/almalinux:8
DOCKER_GPU_BUILD_ARG="--build-arg BASE_CUDA_VERSION=${GPU_ARCH_VERSION} --build-arg DEVTOOLSET_VERSION=11"
MANY_LINUX_VERSION="2_28"
;;
cuda-aarch64)
TARGET=cuda_final
DOCKER_TAG=cuda${GPU_ARCH_VERSION}
GPU_IMAGE=arm64v8/centos:7
DOCKER_GPU_BUILD_ARG="--build-arg BASE_CUDA_VERSION=${GPU_ARCH_VERSION} --build-arg DEVTOOLSET_VERSION=11"
MANY_LINUX_VERSION="aarch64"
DOCKERFILE_SUFFIX="_cuda_aarch64"
;;
rocm)
TARGET=rocm_final
DOCKER_TAG=rocm${GPU_ARCH_VERSION}
GPU_IMAGE=rocm/dev-centos-7:${GPU_ARCH_VERSION}-complete
PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908;gfx90a;gfx1030;gfx1100"
ROCM_REGEX="([0-9]+)\.([0-9]+)[\.]?([0-9]*)"
if [[ $GPU_ARCH_VERSION =~ $ROCM_REGEX ]]; then
ROCM_VERSION_INT=$((${BASH_REMATCH[1]}*10000 + ${BASH_REMATCH[2]}*100 + ${BASH_REMATCH[3]:-0}))
else
echo "ERROR: rocm regex failed"
exit 1
fi
if [[ $ROCM_VERSION_INT -ge 60000 ]]; then
PYTORCH_ROCM_ARCH+=";gfx942"
fi
DOCKER_GPU_BUILD_ARG="--build-arg ROCM_VERSION=${GPU_ARCH_VERSION} --build-arg PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH} --build-arg DEVTOOLSET_VERSION=9"
;;
xpu)
TARGET=xpu_final
DOCKER_TAG=xpu
GPU_IMAGE=amd64/almalinux:8
DOCKER_GPU_BUILD_ARG=" --build-arg DEVTOOLSET_VERSION=11"
MANY_LINUX_VERSION="2_28"
;;
*)
echo "ERROR: Unrecognized GPU_ARCH_TYPE: ${GPU_ARCH_TYPE}"
exit 1
;;
esac
IMAGES=''
if [[ -n ${MANY_LINUX_VERSION} && -z ${DOCKERFILE_SUFFIX} ]]; then
DOCKERFILE_SUFFIX=_${MANY_LINUX_VERSION}
fi
(
set -x
DOCKER_BUILDKIT=1 docker build \
${DOCKER_GPU_BUILD_ARG} \
--build-arg "GPU_IMAGE=${GPU_IMAGE}" \
--target "${TARGET}" \
-t "${DOCKER_IMAGE}" \
$@ \
-f "${TOPDIR}/.ci/docker/manywheel/Dockerfile${DOCKERFILE_SUFFIX}" \
"${TOPDIR}/.ci/docker/"
)
GITHUB_REF=${GITHUB_REF:-$(git symbolic-ref -q HEAD || git describe --tags --exact-match)}
GIT_BRANCH_NAME="2.5"
GIT_COMMIT_SHA=${GITHUB_SHA:-$(git rev-parse HEAD)}
DOCKER_IMAGE_BRANCH_TAG=${DOCKER_IMAGE}-${GIT_BRANCH_NAME}
DOCKER_IMAGE_SHA_TAG=${DOCKER_IMAGE}-${GIT_COMMIT_SHA}
if [[ "${WITH_PUSH}" == true ]]; then
(
set -x
docker push "${DOCKER_IMAGE}"
if [[ -n ${GITHUB_REF} ]]; then
docker tag ${DOCKER_IMAGE} ${DOCKER_IMAGE_BRANCH_TAG}
docker tag ${DOCKER_IMAGE} ${DOCKER_IMAGE_SHA_TAG}
docker push "${DOCKER_IMAGE_BRANCH_TAG}"
docker push "${DOCKER_IMAGE_SHA_TAG}"
fi
)
fi

View File

@ -1,131 +0,0 @@
#!/bin/bash
# Top-level build script called from Dockerfile
# Script used only in CD pipeline
# Stop at any error, show all commands
set -ex
# openssl version to build, with expected sha256 hash of .tar.gz
# archive
OPENSSL_ROOT=openssl-1.1.1l
OPENSSL_HASH=0b7a3e5e59c34827fe0c3a74b7ec8baef302b98fa80088d7f9153aa16fa76bd1
DEVTOOLS_HASH=a8ebeb4bed624700f727179e6ef771dafe47651131a00a78b342251415646acc
PATCHELF_HASH=d9afdff4baeacfbc64861454f368b7f2c15c44d245293f7587bbf726bfe722fb
CURL_ROOT=curl-7.73.0
CURL_HASH=cf34fe0b07b800f1c01a499a6e8b2af548f6d0e044dca4a29d88a4bee146d131
AUTOCONF_ROOT=autoconf-2.69
AUTOCONF_HASH=954bd69b391edc12d6a4a51a2dd1476543da5c6bbf05a95b59dc0dd6fd4c2969
# Get build utilities
MY_DIR=$(dirname "${BASH_SOURCE[0]}")
source $MY_DIR/build_utils.sh
if [ "$(uname -m)" != "s390x" ] ; then
# Dependencies for compiling Python that we want to remove from
# the final image after compiling Python
PYTHON_COMPILE_DEPS="zlib-devel bzip2-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel libffi-devel"
# Libraries that are allowed as part of the manylinux1 profile
MANYLINUX1_DEPS="glibc-devel libstdc++-devel glib2-devel libX11-devel libXext-devel libXrender-devel mesa-libGL-devel libICE-devel libSM-devel ncurses-devel"
# Development tools and libraries
yum -y install bzip2 make git patch unzip bison yasm diffutils \
automake which file cmake28 \
kernel-devel-`uname -r` \
${PYTHON_COMPILE_DEPS}
else
# Dependencies for compiling Python that we want to remove from
# the final image after compiling Python
PYTHON_COMPILE_DEPS="zlib1g-dev libbz2-dev libncurses-dev libsqlite3-dev libdb-dev libpcap-dev liblzma-dev libffi-dev"
# Libraries that are allowed as part of the manylinux1 profile
MANYLINUX1_DEPS="libglib2.0-dev libX11-dev libncurses-dev"
# Development tools and libraries
apt install -y bzip2 make git patch unzip diffutils \
automake which file cmake \
linux-headers-virtual \
${PYTHON_COMPILE_DEPS}
fi
# Install newest autoconf
build_autoconf $AUTOCONF_ROOT $AUTOCONF_HASH
autoconf --version
# Compile the latest Python releases.
# (In order to have a proper SSL module, Python is compiled
# against a recent openssl [see env vars above], which is linked
# statically. We delete openssl afterwards.)
build_openssl $OPENSSL_ROOT $OPENSSL_HASH
/build_scripts/install_cpython.sh
PY39_BIN=/opt/python/cp39-cp39/bin
# Our openssl doesn't know how to find the system CA trust store
# (https://github.com/pypa/manylinux/issues/53)
# And it's not clear how up-to-date that is anyway
# So let's just use the same one pip and everyone uses
$PY39_BIN/pip install certifi
ln -s $($PY39_BIN/python -c 'import certifi; print(certifi.where())') \
/opt/_internal/certs.pem
# If you modify this line you also have to modify the versions in the
# Dockerfiles:
export SSL_CERT_FILE=/opt/_internal/certs.pem
# Install newest curl
build_curl $CURL_ROOT $CURL_HASH
rm -rf /usr/local/include/curl /usr/local/lib/libcurl* /usr/local/lib/pkgconfig/libcurl.pc
hash -r
curl --version
curl-config --features
# Install patchelf (latest with unreleased bug fixes)
curl -sLOk https://nixos.org/releases/patchelf/patchelf-0.10/patchelf-0.10.tar.gz
# check_sha256sum patchelf-0.9njs2.tar.gz $PATCHELF_HASH
tar -xzf patchelf-0.10.tar.gz
(cd patchelf-0.10 && ./configure && make && make install)
rm -rf patchelf-0.10.tar.gz patchelf-0.10
# Install latest pypi release of auditwheel
$PY39_BIN/pip install auditwheel
ln -s $PY39_BIN/auditwheel /usr/local/bin/auditwheel
# Clean up development headers and other unnecessary stuff for
# final image
if [ "$(uname -m)" != "s390x" ] ; then
yum -y erase wireless-tools gtk2 libX11 hicolor-icon-theme \
avahi freetype bitstream-vera-fonts \
${PYTHON_COMPILE_DEPS} || true > /dev/null 2>&1
yum -y install ${MANYLINUX1_DEPS}
yum -y clean all > /dev/null 2>&1
yum list installed
else
apt purge -y ${PYTHON_COMPILE_DEPS} || true > /dev/null 2>&1
fi
# we don't need libpython*.a, and they're many megabytes
find /opt/_internal -name '*.a' -print0 | xargs -0 rm -f
# Strip what we can -- and ignore errors, because this just attempts to strip
# *everything*, including non-ELF files:
find /opt/_internal -type f -print0 \
| xargs -0 -n1 strip --strip-unneeded 2>/dev/null || true
# We do not need the Python test suites, or indeed the precompiled .pyc and
# .pyo files. Partially cribbed from:
# https://github.com/docker-library/python/blob/master/3.4/slim/Dockerfile
find /opt/_internal \
\( -type d -a -name test -o -name tests \) \
-o \( -type f -a -name '*.pyc' -o -name '*.pyo' \) \
-print0 | xargs -0 rm -f
for PYTHON in /opt/python/*/bin/python; do
# Smoke test to make sure that our Pythons work, and do indeed detect as
# being manylinux compatible:
$PYTHON $MY_DIR/manylinux1-check.py
# Make sure that SSL cert checking works
$PYTHON $MY_DIR/ssl-check.py
done
# Fix libc headers to remain compatible with C99 compilers.
find /usr/include/ -type f -exec sed -i 's/\bextern _*inline_*\b/extern __inline __attribute__ ((__gnu_inline__))/g' {} +
# Now we can delete our built SSL
rm -rf /usr/local/ssl

View File

@ -1,91 +0,0 @@
#!/bin/bash
# Helper utilities for build
# Script used only in CD pipeline
OPENSSL_DOWNLOAD_URL=https://www.openssl.org/source/old/1.1.1/
CURL_DOWNLOAD_URL=https://curl.askapache.com/download
AUTOCONF_DOWNLOAD_URL=https://ftp.gnu.org/gnu/autoconf
function check_var {
if [ -z "$1" ]; then
echo "required variable not defined"
exit 1
fi
}
function do_openssl_build {
./config no-ssl2 no-shared -fPIC --prefix=/usr/local/ssl > /dev/null
make > /dev/null
make install > /dev/null
}
function check_sha256sum {
local fname=$1
check_var ${fname}
local sha256=$2
check_var ${sha256}
echo "${sha256} ${fname}" > ${fname}.sha256
sha256sum -c ${fname}.sha256
rm -f ${fname}.sha256
}
function build_openssl {
local openssl_fname=$1
check_var ${openssl_fname}
local openssl_sha256=$2
check_var ${openssl_sha256}
check_var ${OPENSSL_DOWNLOAD_URL}
curl -sLO ${OPENSSL_DOWNLOAD_URL}/${openssl_fname}.tar.gz
check_sha256sum ${openssl_fname}.tar.gz ${openssl_sha256}
tar -xzf ${openssl_fname}.tar.gz
(cd ${openssl_fname} && do_openssl_build)
rm -rf ${openssl_fname} ${openssl_fname}.tar.gz
}
function do_curl_build {
LIBS=-ldl ./configure --with-ssl --disable-shared > /dev/null
make > /dev/null
make install > /dev/null
}
function build_curl {
local curl_fname=$1
check_var ${curl_fname}
local curl_sha256=$2
check_var ${curl_sha256}
check_var ${CURL_DOWNLOAD_URL}
curl -sLO ${CURL_DOWNLOAD_URL}/${curl_fname}.tar.bz2
check_sha256sum ${curl_fname}.tar.bz2 ${curl_sha256}
tar -jxf ${curl_fname}.tar.bz2
(cd ${curl_fname} && do_curl_build)
rm -rf ${curl_fname} ${curl_fname}.tar.bz2
}
function do_standard_install {
./configure > /dev/null
make > /dev/null
make install > /dev/null
}
function build_autoconf {
local autoconf_fname=$1
check_var ${autoconf_fname}
local autoconf_sha256=$2
check_var ${autoconf_sha256}
check_var ${AUTOCONF_DOWNLOAD_URL}
curl -sLO ${AUTOCONF_DOWNLOAD_URL}/${autoconf_fname}.tar.gz
check_sha256sum ${autoconf_fname}.tar.gz ${autoconf_sha256}
tar -zxf ${autoconf_fname}.tar.gz
(cd ${autoconf_fname} && do_standard_install)
rm -rf ${autoconf_fname} ${autoconf_fname}.tar.gz
}

View File

@ -1,60 +0,0 @@
# Logic copied from PEP 513
def is_manylinux1_compatible():
# Only Linux, and only x86-64 / i686
from distutils.util import get_platform
if get_platform() not in ["linux-x86_64", "linux-i686", "linux-s390x"]:
return False
# Check for presence of _manylinux module
try:
import _manylinux
return bool(_manylinux.manylinux1_compatible)
except (ImportError, AttributeError):
# Fall through to heuristic check below
pass
# Check glibc version. CentOS 5 uses glibc 2.5.
return have_compatible_glibc(2, 5)
def have_compatible_glibc(major, minimum_minor):
import ctypes
process_namespace = ctypes.CDLL(None)
try:
gnu_get_libc_version = process_namespace.gnu_get_libc_version
except AttributeError:
# Symbol doesn't exist -> therefore, we are not linked to
# glibc.
return False
# Call gnu_get_libc_version, which returns a string like "2.5".
gnu_get_libc_version.restype = ctypes.c_char_p
version_str = gnu_get_libc_version()
# py2 / py3 compatibility:
if not isinstance(version_str, str):
version_str = version_str.decode("ascii")
# Parse string and check against requested version.
version = [int(piece) for piece in version_str.split(".")]
assert len(version) == 2
if major != version[0]:
return False
if minimum_minor > version[1]:
return False
return True
import sys
if is_manylinux1_compatible():
print(f"{sys.executable} is manylinux1 compatible")
sys.exit(0)
else:
print(f"{sys.executable} is NOT manylinux1 compatible")
sys.exit(1)

View File

@ -1,35 +0,0 @@
# cf. https://github.com/pypa/manylinux/issues/53
GOOD_SSL = "https://google.com"
BAD_SSL = "https://self-signed.badssl.com"
import sys
print("Testing SSL certificate checking for Python:", sys.version)
if sys.version_info[:2] < (2, 7) or sys.version_info[:2] < (3, 4):
print("This version never checks SSL certs; skipping tests")
sys.exit(0)
if sys.version_info[0] >= 3:
from urllib.request import urlopen
EXC = OSError
else:
from urllib import urlopen
EXC = IOError
print(f"Connecting to {GOOD_SSL} should work")
urlopen(GOOD_SSL)
print("...it did, yay.")
print(f"Connecting to {BAD_SSL} should fail")
try:
urlopen(BAD_SSL)
# If we get here then we failed:
print("...it DIDN'T!!!!!11!!1one!")
sys.exit(1)
except EXC:
print("...it did, yay.")

View File

@ -15,7 +15,7 @@ click
#Pinned versions:
#test that import:
coremltools==5.0b5 ; python_version < "3.12"
coremltools==5.0b5
#Description: Apple framework for ML integration
#Pinned versions: 5.0b5
#test that import:
@ -25,19 +25,9 @@ coremltools==5.0b5 ; python_version < "3.12"
#Pinned versions:
#test that import:
dill==0.3.7
#Description: dill extends pickle with serializing and de-serializing for most built-ins
#Pinned versions: 0.3.7
#test that import: dynamo/test_replay_record.py test_dataloader.py test_datapipe.py test_serialization.py
expecttest==0.2.1
expecttest==0.1.6
#Description: method for writing tests where test framework auto populates
# the expected output based on previous runs
#Pinned versions: 0.2.1
#test that import:
fbscribelogger==0.1.6
#Description: write to scribe from authenticated jobs on CI
#Pinned versions: 0.1.6
#test that import:
@ -57,11 +47,6 @@ junitparser==2.1.1
#Pinned versions: 2.1.1
#test that import:
lark==0.12.0
#Description: parser
#Pinned versions: 0.12.0
#test that import:
librosa>=0.6.2 ; python_version < "3.11"
#Description: A python package for music and audio analysis
#Pinned versions: >=0.6.2
@ -81,7 +66,7 @@ librosa>=0.6.2 ; python_version < "3.11"
#Description: A testing library that allows you to replace parts of your
#system under test with mock objects
#Pinned versions:
#test that import: test_modules.py, test_nn.py,
#test that import: test_module_init.py, test_modules.py, test_nn.py,
#test_testing.py
#MonkeyType # breaks pytorch-xla-linux-bionic-py3.7-clang8
@ -90,10 +75,10 @@ librosa>=0.6.2 ; python_version < "3.11"
#Pinned versions:
#test that import:
mypy==1.10.0
mypy==1.7.0
# Pin MyPy version because new errors are likely to appear with each release
#Description: linter
#Pinned versions: 1.10.0
#Pinned versions: 1.7.0
#test that import: test_typing.py, test_type_hints.py
networkx==2.8.8
@ -109,7 +94,7 @@ networkx==2.8.8
#test that import: run_test.py, test_cpp_extensions_aot.py,test_determination.py
numba==0.49.0 ; python_version < "3.9"
numba==0.55.2 ; python_version == "3.9"
numba==0.54.1 ; python_version == "3.9"
numba==0.55.2 ; python_version == "3.10"
#Description: Just-In-Time Compiler for Numerical Functions
#Pinned versions: 0.54.1, 0.49.0, <=0.49.1
@ -139,9 +124,9 @@ opt-einsum==3.3
#Pinned versions: 3.3
#test that import: test_linalg.py
optree==0.12.1
optree==0.9.1
#Description: A library for tree manipulation
#Pinned versions: 0.12.1
#Pinned versions: 0.9.1
#test that import: test_vmap.py, test_aotdispatch.py, test_dynamic_shapes.py,
#test_pytree.py, test_ops.py, test_control_flow.py, test_modules.py,
#common_utils.py, test_eager_transforms.py, test_python_dispatch.py,
@ -152,9 +137,9 @@ optree==0.12.1
#test_pointwise_ops.py, test_dtensor_ops.py, test_torchinductor.py, test_fx.py,
#test_fake_tensor.py, test_mps.py
pillow==10.3.0
pillow==10.0.1
#Description: Python Imaging Library fork
#Pinned versions: 10.3.0
#Pinned versions: 10.0.1
#test that import:
protobuf==3.20.2
@ -177,6 +162,11 @@ pytest-xdist==3.3.1
#Pinned versions:
#test that import:
pytest-shard==0.1.2
#Description: plugin spliting up tests in pytest
#Pinned versions:
#test that import:
pytest-flakefinder==1.1.0
#Description: plugin for rerunning tests a fixed number of times in pytest
#Pinned versions: 1.1.0
@ -223,7 +213,7 @@ pygments==2.15.0
#test that import:
scikit-image==0.19.3 ; python_version < "3.10"
scikit-image==0.22.0 ; python_version >= "3.10"
scikit-image==0.20.0 ; python_version >= "3.10"
#Description: image processing routines
#Pinned versions:
#test that import: test_nn.py
@ -233,11 +223,12 @@ scikit-image==0.22.0 ; python_version >= "3.10"
#Pinned versions: 0.20.3
#test that import:
scipy==1.10.1 ; python_version <= "3.11"
scipy==1.12.0 ; python_version == "3.12"
scipy==1.6.3 ; python_version < "3.10"
scipy==1.8.1 ; python_version == "3.10"
scipy==1.10.1 ; python_version == "3.11"
# Pin SciPy because of failing distribution tests (see #60347)
#Description: scientific python
#Pinned versions: 1.10.1
#Pinned versions: 1.6.3
#test that import: test_unary_ufuncs.py, test_torch.py,test_tensor_creation_ops.py
#test_spectral_ops.py, test_sparse_csr.py, test_reductions.py,test_nn.py
#test_linalg.py, test_binary_ufuncs.py
@ -252,8 +243,7 @@ tb-nightly==2.13.0a20230426
#Pinned versions:
#test that import:
# needed by torchgen utils
typing-extensions
#typing-extensions
#Description: type hints for python
#Pinned versions:
#test that import:
@ -268,29 +258,24 @@ unittest-xml-reporting<=3.2.0,>=2.0.0
#Pinned versions:
#test that import:
#lintrunner is supported on aarch64-linux only from 0.12.4 version
lintrunner==0.12.5
lintrunner==0.10.7
#Description: all about linters!
#Pinned versions: 0.12.5
#Pinned versions: 0.10.7
#test that import:
redis>=4.0.0
#Description: redis database
#test that import: anything that tests OSS caching/mocking (inductor/test_codecache.py, inductor/test_max_autotune.py)
rockset==1.0.3
#Description: queries Rockset
#Pinned versions: 1.0.3
#test that import:
ghstack==0.8.0
ghstack==0.7.1
#Description: ghstack tool
#Pinned versions: 0.8.0
#Pinned versions: 0.7.1
#test that import:
jinja2==3.1.4
jinja2==3.1.2
#Description: jinja2 template engine
#Pinned versions: 3.1.4
#Pinned versions: 3.1.2
#test that import:
pytest-cpp==2.3.0
@ -308,32 +293,8 @@ tensorboard==2.13.0
#Pinned versions:
#test that import: test_tensorboard
pywavelets==1.4.1 ; python_version < "3.12"
pywavelets==1.5.0 ; python_version >= "3.12"
pywavelets==1.4.1
#Description: This is a requirement of scikit-image, we need to pin
# it here because 1.5.0 conflicts with numpy 1.21.2 used in CI
#Pinned versions: 1.4.1
#test that import:
lxml==5.0.0
#Description: This is a requirement of unittest-xml-reporting
# Python-3.9 binaries
PyGithub==2.3.0
sympy==1.12.1 ; python_version == "3.8"
sympy==1.13.1 ; python_version >= "3.9"
#Description: Required by coremltools, also pinned in .github/requirements/pip-requirements-macOS.txt
#Pinned versions:
#test that import:
onnx==1.16.1
#Description: Required by mypy and test_public_bindings.py when checking torch.onnx._internal
#Pinned versions:
#test that import:
onnxscript==0.1.0.dev20240817
#Description: Required by mypy and test_public_bindings.py when checking torch.onnx._internal
#Pinned versions:
#test that import:

View File

@ -1 +1 @@
3.1.0
2.1.0

View File

@ -56,7 +56,7 @@ RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV
# (optional) Install vision packages like OpenCV and ffmpeg
ARG VISION
COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
@ -103,14 +103,6 @@ COPY triton_version.txt triton_version.txt
RUN if [ -n "${TRITON}" ]; then bash ./install_triton.sh; fi
RUN rm install_triton.sh common_utils.sh triton.txt triton_version.txt
ARG HALIDE
# Build and install halide
COPY ./common/install_halide.sh install_halide.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/halide.txt halide.txt
RUN if [ -n "${HALIDE}" ]; then bash ./install_halide.sh; fi
RUN rm install_halide.sh common_utils.sh halide.txt
# Install ccache/sccache (do this last, so we get priority in PATH)
COPY ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
@ -147,26 +139,13 @@ COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm
ARG CUDNN_VERSION
ARG CUDA_VERSION
COPY ./common/install_cudnn.sh install_cudnn.sh
RUN if [ -n "${CUDNN_VERSION}" ]; then bash install_cudnn.sh; fi
RUN if [ "${CUDNN_VERSION}" -eq 8 ]; then bash install_cudnn.sh; fi
RUN rm install_cudnn.sh
# Install CUSPARSELT
ARG CUDA_VERSION
COPY ./common/install_cusparselt.sh install_cusparselt.sh
RUN bash install_cusparselt.sh
RUN rm install_cusparselt.sh
# Install CUDSS
ARG CUDA_VERSION
COPY ./common/install_cudss.sh install_cudss.sh
RUN bash install_cudss.sh
RUN rm install_cudss.sh
# Delete /usr/local/cuda-11.X/cuda-11.X symlinks
RUN if [ -h /usr/local/cuda-11.6/cuda-11.6 ]; then rm /usr/local/cuda-11.6/cuda-11.6; fi
RUN if [ -h /usr/local/cuda-11.7/cuda-11.7 ]; then rm /usr/local/cuda-11.7/cuda-11.7; fi
RUN if [ -h /usr/local/cuda-12.1/cuda-12.1 ]; then rm /usr/local/cuda-12.1/cuda-12.1; fi
RUN if [ -h /usr/local/cuda-12.4/cuda-12.4 ]; then rm /usr/local/cuda-12.4/cuda-12.4; fi
USER jenkins
CMD ["bash"]

View File

@ -53,7 +53,7 @@ RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV
# (optional) Install vision packages like OpenCV and ffmpeg
ARG VISION
COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
@ -68,8 +68,6 @@ RUN rm install_rocm.sh
COPY ./common/install_rocm_magma.sh install_rocm_magma.sh
RUN bash ./install_rocm_magma.sh
RUN rm install_rocm_magma.sh
ADD ./common/install_miopen.sh install_miopen.sh
RUN bash ./install_miopen.sh ${ROCM_VERSION} && rm install_miopen.sh
ENV ROCM_PATH /opt/rocm
ENV PATH /opt/rocm/bin:$PATH
ENV PATH /opt/rocm/hcc/bin:$PATH
@ -80,11 +78,6 @@ ENV MAGMA_HOME /opt/rocm/magma
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
# Install amdsmi
COPY ./common/install_amdsmi.sh install_amdsmi.sh
RUN bash ./install_amdsmi.sh
RUN rm install_amdsmi.sh
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
COPY ./common/install_cmake.sh install_cmake.sh
@ -102,17 +95,10 @@ ARG TRITON
# try to reach out to S3, which docker build runners don't have access
COPY ./common/install_triton.sh install_triton.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/triton.txt triton.txt
COPY ci_commit_pins/triton-rocm.txt triton-rocm.txt
COPY triton_version.txt triton_version.txt
RUN if [ -n "${TRITON}" ]; then bash ./install_triton.sh; fi
RUN rm install_triton.sh common_utils.sh triton.txt triton_version.txt
# Install AOTriton
COPY ./aotriton_version.txt aotriton_version.txt
COPY ./common/common_utils.sh common_utils.sh
COPY ./common/install_aotriton.sh install_aotriton.sh
RUN ["/bin/bash", "-c", "./install_aotriton.sh /opt/rocm && rm -rf install_aotriton.sh aotriton_version.txt common_utils.sh"]
ENV AOTRITON_INSTALLED_PREFIX /opt/rocm/aotriton
RUN rm install_triton.sh common_utils.sh triton-rocm.txt triton_version.txt
# Install ccache/sccache (do this last, so we get priority in PATH)
COPY ./common/install_cache.sh install_cache.sh
@ -123,8 +109,5 @@ RUN bash ./install_cache.sh && rm install_cache.sh
ARG BUILD_ENVIRONMENT
ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
# Install LLVM dev version (Defined in the pytorch/builder github repository)
COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm
USER jenkins
CMD ["bash"]

View File

@ -1,119 +0,0 @@
ARG UBUNTU_VERSION
FROM ubuntu:${UBUNTU_VERSION}
ARG UBUNTU_VERSION
ENV DEBIAN_FRONTEND noninteractive
ARG CLANG_VERSION
# Install common dependencies (so that this step can be cached separately)
COPY ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
# Install clang
ARG LLVMDEV
COPY ./common/install_clang.sh install_clang.sh
RUN bash ./install_clang.sh && rm install_clang.sh
# Install user
COPY ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install katex
ARG KATEX
COPY ./common/install_docs_reqs.sh install_docs_reqs.sh
RUN bash ./install_docs_reqs.sh && rm install_docs_reqs.sh
# Install conda and other packages (e.g., numpy, pytest)
ARG ANACONDA_PYTHON_VERSION
ARG CONDA_CMAKE
ARG DOCS
ARG BUILD_ENVIRONMENT
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
ENV DOCS=$DOCS
COPY requirements-ci.txt requirements-docs.txt /opt/conda/
COPY ./common/install_conda.sh install_conda.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_conda.sh && rm install_conda.sh common_utils.sh /opt/conda/requirements-ci.txt /opt/conda/requirements-docs.txt
# Install gcc
ARG GCC_VERSION
COPY ./common/install_gcc.sh install_gcc.sh
RUN bash ./install_gcc.sh && rm install_gcc.sh
# Install lcov for C++ code coverage
COPY ./common/install_lcov.sh install_lcov.sh
RUN bash ./install_lcov.sh && rm install_lcov.sh
COPY ./common/install_openssl.sh install_openssl.sh
RUN bash ./install_openssl.sh
ENV OPENSSL_ROOT_DIR /opt/openssl
ENV OPENSSL_DIR /opt/openssl
RUN rm install_openssl.sh
ARG INDUCTOR_BENCHMARKS
COPY ./common/install_inductor_benchmark_deps.sh install_inductor_benchmark_deps.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/huggingface.txt huggingface.txt
COPY ci_commit_pins/timm.txt timm.txt
RUN if [ -n "${INDUCTOR_BENCHMARKS}" ]; then bash ./install_inductor_benchmark_deps.sh; fi
RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface.txt
# Install XPU Dependencies
ARG XPU_VERSION
COPY ./common/install_xpu.sh install_xpu.sh
RUN bash ./install_xpu.sh && rm install_xpu.sh
ARG TRITON
# Install triton, this needs to be done before sccache because the latter will
# try to reach out to S3, which docker build runners don't have access
COPY ./common/install_triton.sh install_triton.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/triton-xpu.txt triton-xpu.txt
COPY triton_version.txt triton_version.txt
RUN if [ -n "${TRITON}" ]; then bash ./install_triton.sh; fi
RUN rm install_triton.sh common_utils.sh triton-xpu.txt triton_version.txt
# (optional) Install database packages like LMDB and LevelDB
ARG DB
COPY ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV
ARG VISION
COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh cache_vision_models.sh common_utils.sh
ENV INSTALLED_VISION ${VISION}
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
COPY ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# (optional) Install non-default Ninja version
ARG NINJA_VERSION
COPY ./common/install_ninja.sh install_ninja.sh
RUN if [ -n "${NINJA_VERSION}" ]; then bash ./install_ninja.sh; fi
RUN rm install_ninja.sh
# Install ccache/sccache (do this last, so we get priority in PATH)
COPY ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
RUN bash ./install_cache.sh && rm install_cache.sh
# Include BUILD_ENVIRONMENT environment variable in image
ARG BUILD_ENVIRONMENT
ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
# Install LLVM dev version (Defined in the pytorch/builder github repository)
COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm
USER jenkins
CMD ["bash"]

View File

@ -37,7 +37,6 @@ COPY requirements-ci.txt requirements-docs.txt /opt/conda/
COPY ./common/install_conda.sh install_conda.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_conda.sh && rm install_conda.sh common_utils.sh /opt/conda/requirements-ci.txt /opt/conda/requirements-docs.txt
RUN if [ -n "${UNINSTALL_DILL}" ]; then pip uninstall -y dill; fi
# Install gcc
ARG GCC_VERSION
@ -50,7 +49,7 @@ RUN bash ./install_lcov.sh && rm install_lcov.sh
# Install cuda and cudnn
ARG CUDA_VERSION
COPY ./common/install_cuda.sh install_cuda.sh
RUN wget -q https://raw.githubusercontent.com/pytorch/builder/main/common/install_cuda.sh -O install_cuda.sh
RUN bash ./install_cuda.sh ${CUDA_VERSION} && rm install_cuda.sh
ENV DESIRED_CUDA ${CUDA_VERSION}
ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:$PATH
@ -80,7 +79,7 @@ RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV
# (optional) Install vision packages like OpenCV and ffmpeg
ARG VISION
COPY ./common/install_vision.sh ./common/cache_vision_models.sh ./common/common_utils.sh ./
RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
@ -155,33 +154,16 @@ COPY ci_commit_pins/executorch.txt executorch.txt
RUN if [ -n "${EXECUTORCH}" ]; then bash ./install_executorch.sh; fi
RUN rm install_executorch.sh common_utils.sh executorch.txt
ARG HALIDE
# Build and install halide
COPY ./common/install_halide.sh install_halide.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/halide.txt halide.txt
RUN if [ -n "${HALIDE}" ]; then bash ./install_halide.sh; fi
RUN rm install_halide.sh common_utils.sh halide.txt
ARG ONNX
# Install ONNX dependencies
COPY ./common/install_onnx.sh ./common/common_utils.sh ./
RUN if [ -n "${ONNX}" ]; then bash ./install_onnx.sh; fi
RUN rm install_onnx.sh common_utils.sh
# (optional) Build ACL
ARG ACL
COPY ./common/install_acl.sh install_acl.sh
RUN if [ -n "${ACL}" ]; then bash ./install_acl.sh; fi
RUN rm install_acl.sh
ENV INSTALLED_ACL ${ACL}
# Install ccache/sccache (do this last, so we get priority in PATH)
ARG SKIP_SCCACHE_INSTALL
COPY ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
RUN if [ -z "${SKIP_SCCACHE_INSTALL}" ]; then bash ./install_cache.sh; fi
RUN rm install_cache.sh
RUN bash ./install_cache.sh && rm install_cache.sh
# Add jni.h for java host build
COPY ./common/install_jni.sh install_jni.sh
@ -198,9 +180,7 @@ ARG BUILD_ENVIRONMENT
ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
# Install LLVM dev version (Defined in the pytorch/builder github repository)
ARG SKIP_LLVM_SRC_BUILD_INSTALL
COPY --from=pytorch/llvm:9.0.1 /opt/llvm /opt/llvm
RUN if [ -n "${SKIP_LLVM_SRC_BUILD_INSTALL}" ]; then set -eu; rm -rf /opt/llvm; fi
# AWS specific CUDA build guidance
ENV TORCH_CUDA_ARCH_LIST Maxwell

View File

@ -1,9 +1,5 @@
#!/bin/bash
set -ex
source "$(dirname "${BASH_SOURCE[0]}")/../pytorch/common_utils.sh"
LOCAL_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
ROOT_DIR=$(cd "$LOCAL_DIR"/../.. && pwd)
TEST_DIR="$ROOT_DIR/test"

View File

@ -3,20 +3,6 @@
# shellcheck source=./common.sh
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# Workaround for dind-rootless userid mapping (https://github.com/pytorch/ci-infra/issues/96)
WORKSPACE_ORIGINAL_OWNER_ID=$(stat -c '%u' "/var/lib/jenkins/workspace")
cleanup_workspace() {
echo "sudo may print the following warning message that can be ignored. The chown command will still run."
echo " sudo: setrlimit(RLIMIT_STACK): Operation not permitted"
echo "For more details refer to https://github.com/sudo-project/sudo/issues/42"
sudo chown -R "$WORKSPACE_ORIGINAL_OWNER_ID" /var/lib/jenkins/workspace
}
# Disable shellcheck SC2064 as we want to parse the original owner immediately.
# shellcheck disable=SC2064
trap_add cleanup_workspace EXIT
sudo chown -R jenkins /var/lib/jenkins/workspace
git config --global --add safe.directory /var/lib/jenkins/workspace
if [[ "$BUILD_ENVIRONMENT" == *onnx* ]]; then
# TODO: This can be removed later once vision is also part of the Docker image
pip install -q --user --no-use-pep517 "git+https://github.com/pytorch/vision.git@$(cat .github/ci_commit_pins/vision.txt)"

View File

@ -1 +1,42 @@
This directory contains scripts for our continuous integration.
One important thing to keep in mind when reading the scripts here is
that they are all based off of Docker images, which we build for each of
the various system configurations we want to run on Jenkins. This means
it is very easy to run these tests yourself:
1. Figure out what Docker image you want. The general template for our
images look like:
``registry.pytorch.org/pytorch/pytorch-$BUILD_ENVIRONMENT:$DOCKER_VERSION``,
where ``$BUILD_ENVIRONMENT`` is one of the build environments
enumerated in
[pytorch-dockerfiles](https://github.com/pytorch/pytorch/blob/master/.ci/docker/build.sh). The dockerfile used by jenkins can be found under the `.ci` [directory](https://github.com/pytorch/pytorch/blob/master/.ci/docker)
2. Run ``docker run -it -u jenkins $DOCKER_IMAGE``, clone PyTorch and
run one of the scripts in this directory.
The Docker images are designed so that any "reasonable" build commands
will work; if you look in [build.sh](build.sh) you will see that it is a
very simple script. This is intentional. Idiomatic build instructions
should work inside all of our Docker images. You can tweak the commands
however you need (e.g., in case you want to rebuild with DEBUG, or rerun
the build with higher verbosity, etc.).
We have to do some work to make this so. Here is a summary of the
mechanisms we use:
- We install binaries to directories like `/usr/local/bin` which
are automatically part of your PATH.
- We add entries to the PATH using Docker ENV variables (so
they apply when you enter Docker) and `/etc/environment` (so they
continue to apply even if you sudo), instead of modifying
`PATH` in our build scripts.
- We use `/etc/ld.so.conf.d` to register directories containing
shared libraries, instead of modifying `LD_LIBRARY_PATH` in our
build scripts.
- We reroute well known paths like `/usr/bin/gcc` to alternate
implementations with `update-alternatives`, instead of setting
`CC` and `CXX` in our implementations.

View File

@ -28,8 +28,6 @@ echo "Environment variables:"
env
if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then
# Use jemalloc during compilation to mitigate https://github.com/pytorch/pytorch/issues/116289
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
echo "NVCC version:"
nvcc --version
fi
@ -44,13 +42,26 @@ if [[ "$BUILD_ENVIRONMENT" == *cuda11* ]]; then
fi
fi
if [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then
if [[ ${BUILD_ENVIRONMENT} == *"caffe2"* ]]; then
echo "Caffe2 build is ON"
export BUILD_CAFFE2=ON
fi
if [[ ${BUILD_ENVIRONMENT} == *"paralleltbb"* ]]; then
export ATEN_THREADING=TBB
export USE_TBB=1
elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then
export ATEN_THREADING=NATIVE
fi
# Enable LLVM dependency for TensorExpr testing
export USE_LLVM=/opt/llvm
export LLVM_DIR=/opt/llvm/lib/cmake/llvm
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
export USE_LLVM=/opt/rocm/llvm
export LLVM_DIR=/opt/rocm/llvm/lib/cmake/llvm
else
export USE_LLVM=/opt/llvm
export LLVM_DIR=/opt/llvm/lib/cmake/llvm
fi
if [[ "$BUILD_ENVIRONMENT" == *executorch* ]]; then
# To build test_edge_op_registration
@ -68,35 +79,7 @@ if ! which conda; then
export USE_MKLDNN=0
fi
else
# CMAKE_PREFIX_PATH precedences
# 1. $CONDA_PREFIX, if defined. This follows the pytorch official build instructions.
# 2. /opt/conda/envs/py_${ANACONDA_PYTHON_VERSION}, if ANACONDA_PYTHON_VERSION defined.
# This is for CI, which defines ANACONDA_PYTHON_VERSION but not CONDA_PREFIX.
# 3. $(conda info --base). The fallback value of pytorch official build
# instructions actually refers to this.
# Commonly this is /opt/conda/
if [[ -v CONDA_PREFIX ]]; then
export CMAKE_PREFIX_PATH=${CONDA_PREFIX}
elif [[ -v ANACONDA_PYTHON_VERSION ]]; then
export CMAKE_PREFIX_PATH="/opt/conda/envs/py_${ANACONDA_PYTHON_VERSION}"
else
# already checked by `! which conda`
CMAKE_PREFIX_PATH="$(conda info --base)"
export CMAKE_PREFIX_PATH
fi
# Workaround required for MKL library linkage
# https://github.com/pytorch/pytorch/issues/119557
if [ "$ANACONDA_PYTHON_VERSION" = "3.12" ]; then
export CMAKE_LIBRARY_PATH="/opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/lib/"
export CMAKE_INCLUDE_PATH="/opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/include/"
fi
fi
if [[ "$BUILD_ENVIRONMENT" == *aarch64* ]]; then
export USE_MKLDNN=1
export USE_MKLDNN_ACL=1
export ACL_ROOT_DIR=/ComputeLibrary
export CMAKE_PREFIX_PATH=/opt/conda
fi
if [[ "$BUILD_ENVIRONMENT" == *libtorch* ]]; then
@ -168,13 +151,6 @@ if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
python tools/amd_build/build_amd.py
fi
if [[ "$BUILD_ENVIRONMENT" == *xpu* ]]; then
# shellcheck disable=SC1091
source /opt/intel/oneapi/compiler/latest/env/vars.sh
# XPU kineto feature dependencies are not fully ready, disable kineto build as temp WA
export USE_KINETO=0
fi
# sccache will fail for CUDA builds if all cores are used for compiling
# gcc 7 with sccache seems to have intermittent OOM issue if all cores are used
if [ -z "$MAX_JOBS" ]; then
@ -226,28 +202,6 @@ if [[ "${BUILD_ENVIRONMENT}" != *android* && "${BUILD_ENVIRONMENT}" != *cuda* ]]
export BUILD_STATIC_RUNTIME_BENCHMARK=ON
fi
if [[ "$BUILD_ENVIRONMENT" == *-debug* ]]; then
export CMAKE_BUILD_TYPE=RelWithAssert
fi
# Do not change workspace permissions for ROCm CI jobs
# as it can leave workspace with bad permissions for cancelled jobs
if [[ "$BUILD_ENVIRONMENT" != *rocm* ]]; then
# Workaround for dind-rootless userid mapping (https://github.com/pytorch/ci-infra/issues/96)
WORKSPACE_ORIGINAL_OWNER_ID=$(stat -c '%u' "/var/lib/jenkins/workspace")
cleanup_workspace() {
echo "sudo may print the following warning message that can be ignored. The chown command will still run."
echo " sudo: setrlimit(RLIMIT_STACK): Operation not permitted"
echo "For more details refer to https://github.com/sudo-project/sudo/issues/42"
sudo chown -R "$WORKSPACE_ORIGINAL_OWNER_ID" /var/lib/jenkins/workspace
}
# Disable shellcheck SC2064 as we want to parse the original owner immediately.
# shellcheck disable=SC2064
trap_add cleanup_workspace EXIT
sudo chown -R jenkins /var/lib/jenkins/workspace
git config --global --add safe.directory /var/lib/jenkins/workspace
fi
if [[ "$BUILD_ENVIRONMENT" == *-bazel-* ]]; then
set -e
@ -273,36 +227,16 @@ else
( ! get_exit_code python setup.py clean bad_argument )
if [[ "$BUILD_ENVIRONMENT" != *libtorch* ]]; then
# rocm builds fail when WERROR=1
# XLA test build fails when WERROR=1
# set only when building other architectures
# or building non-XLA tests.
if [[ "$BUILD_ENVIRONMENT" != *rocm* &&
"$BUILD_ENVIRONMENT" != *xla* ]]; then
if [[ "$BUILD_ENVIRONMENT" != *py3.8* ]]; then
# Install numpy-2.0.2 for builds which are backward compatible with 1.X
python -mpip install --pre numpy==2.0.2
fi
WERROR=1 python setup.py clean
if [[ "$USE_SPLIT_BUILD" == "true" ]]; then
BUILD_LIBTORCH_WHL=1 BUILD_PYTHON_ONLY=0 python setup.py bdist_wheel
BUILD_LIBTORCH_WHL=0 BUILD_PYTHON_ONLY=1 python setup.py bdist_wheel --cmake
else
WERROR=1 python setup.py bdist_wheel
fi
WERROR=1 python setup.py bdist_wheel
else
python setup.py clean
if [[ "$BUILD_ENVIRONMENT" == *xla* ]]; then
source .ci/pytorch/install_cache_xla.sh
fi
if [[ "$USE_SPLIT_BUILD" == "true" ]]; then
echo "USE_SPLIT_BUILD cannot be used with xla or rocm"
exit 1
else
python setup.py bdist_wheel
fi
python setup.py bdist_wheel
fi
pip_install_whl "$(echo dist/*.whl)"
@ -341,10 +275,9 @@ else
CUSTOM_OP_TEST="$PWD/test/custom_operator"
python --version
SITE_PACKAGES="$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')"
mkdir -p "$CUSTOM_OP_BUILD"
pushd "$CUSTOM_OP_BUILD"
cmake "$CUSTOM_OP_TEST" -DCMAKE_PREFIX_PATH="$SITE_PACKAGES/torch;$SITE_PACKAGES" -DPython_EXECUTABLE="$(which python)" \
cmake "$CUSTOM_OP_TEST" -DCMAKE_PREFIX_PATH="$SITE_PACKAGES/torch" -DPYTHON_EXECUTABLE="$(which python)" \
-DCMAKE_MODULE_PATH="$CUSTOM_TEST_MODULE_PATH" -DUSE_ROCM="$CUSTOM_TEST_USE_ROCM"
make VERBOSE=1
popd
@ -357,7 +290,7 @@ else
SITE_PACKAGES="$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')"
mkdir -p "$JIT_HOOK_BUILD"
pushd "$JIT_HOOK_BUILD"
cmake "$JIT_HOOK_TEST" -DCMAKE_PREFIX_PATH="$SITE_PACKAGES/torch;$SITE_PACKAGES" -DPython_EXECUTABLE="$(which python)" \
cmake "$JIT_HOOK_TEST" -DCMAKE_PREFIX_PATH="$SITE_PACKAGES/torch" -DPYTHON_EXECUTABLE="$(which python)" \
-DCMAKE_MODULE_PATH="$CUSTOM_TEST_MODULE_PATH" -DUSE_ROCM="$CUSTOM_TEST_USE_ROCM"
make VERBOSE=1
popd
@ -369,7 +302,7 @@ else
python --version
mkdir -p "$CUSTOM_BACKEND_BUILD"
pushd "$CUSTOM_BACKEND_BUILD"
cmake "$CUSTOM_BACKEND_TEST" -DCMAKE_PREFIX_PATH="$SITE_PACKAGES/torch;$SITE_PACKAGES" -DPython_EXECUTABLE="$(which python)" \
cmake "$CUSTOM_BACKEND_TEST" -DCMAKE_PREFIX_PATH="$SITE_PACKAGES/torch" -DPYTHON_EXECUTABLE="$(which python)" \
-DCMAKE_MODULE_PATH="$CUSTOM_TEST_MODULE_PATH" -DUSE_ROCM="$CUSTOM_TEST_USE_ROCM"
make VERBOSE=1
popd
@ -400,8 +333,4 @@ if [[ "$BUILD_ENVIRONMENT" != *libtorch* && "$BUILD_ENVIRONMENT" != *bazel* ]];
python tools/stats/export_test_times.py
fi
# snadampal: skipping it till sccache support added for aarch64
# https://github.com/pytorch/pytorch/issues/121559
if [[ "$BUILD_ENVIRONMENT" != *aarch64* ]]; then
print_sccache_stats
fi
print_sccache_stats

View File

@ -56,29 +56,9 @@ function assert_git_not_dirty() {
function pip_install_whl() {
# This is used to install PyTorch and other build artifacts wheel locally
# without using any network connection
# Convert the input arguments into an array
local args=("$@")
# Check if the first argument contains multiple paths separated by spaces
if [[ "${args[0]}" == *" "* ]]; then
# Split the string by spaces into an array
IFS=' ' read -r -a paths <<< "${args[0]}"
# Loop through each path and install individually
for path in "${paths[@]}"; do
echo "Installing $path"
python3 -mpip install --no-index --no-deps "$path"
done
else
# Loop through each argument and install individually
for path in "${args[@]}"; do
echo "Installing $path"
python3 -mpip install --no-index --no-deps "$path"
done
fi
python3 -mpip install --no-index --no-deps "$@"
}
function pip_install() {
# retry 3 times
# old versions of pip don't have the "--progress-bar" flag
@ -178,11 +158,6 @@ function install_torchvision() {
fi
}
function install_tlparse() {
pip_install --user "tlparse==0.3.25"
PATH="$(python -m site --user-base)/bin:$PATH"
}
function install_torchrec_and_fbgemm() {
local torchrec_commit
torchrec_commit=$(get_pinned_commit torchrec)
@ -198,7 +173,7 @@ function install_torchrec_and_fbgemm() {
function clone_pytorch_xla() {
if [[ ! -d ./xla ]]; then
git clone --recursive -b r2.5 https://github.com/pytorch/xla.git
git clone --recursive --quiet https://github.com/pytorch/xla.git
pushd xla
# pin the xla hash so that we don't get broken by changes to xla
git checkout "$(cat ../.github/ci_commit_pins/xla.txt)"
@ -208,6 +183,28 @@ function clone_pytorch_xla() {
fi
}
function checkout_install_torchdeploy() {
local commit
commit=$(get_pinned_commit multipy)
pushd ..
git clone --recurse-submodules https://github.com/pytorch/multipy.git
pushd multipy
git checkout "${commit}"
python multipy/runtime/example/generate_examples.py
BUILD_CUDA_TESTS=1 pip install -e .
popd
popd
}
function test_torch_deploy(){
pushd ..
pushd multipy
./multipy/runtime/build/test_deploy
./multipy/runtime/build/test_deploy_gpu
popd
popd
}
function checkout_install_torchbench() {
local commit
commit=$(get_pinned_commit torchbench)
@ -222,8 +219,6 @@ function checkout_install_torchbench() {
# to install and test other models
python install.py --continue_on_fail
fi
echo "Print all dependencies after TorchBench is installed"
python -mpip freeze
popd
}

View File

@ -6,7 +6,6 @@ from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID
temp_dir = mkdtemp()
print(temp_dir)

View File

@ -6,4 +6,4 @@ source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Testing pytorch docs"
cd docs
TERM=vt100 make doctest
make doctest

View File

@ -1,37 +0,0 @@
#!/bin/bash
# Script for installing sccache on the xla build job, which uses xla's docker
# image and doesn't have sccache installed on it. This is mostly copied from
# .ci/docker/install_cache.sh. Changes are: removing checks that will always
# return the same thing, ex checks for for rocm, CUDA, and changing the path
# where sccache is installed, and not changing /etc/environment.
set -ex
install_binary() {
echo "Downloading sccache binary from S3 repo"
curl --retry 3 https://s3.amazonaws.com/ossci-linux/sccache -o /tmp/cache/bin/sccache
}
mkdir -p /tmp/cache/bin
mkdir -p /tmp/cache/lib
export PATH="/tmp/cache/bin:$PATH"
install_binary
chmod a+x /tmp/cache/bin/sccache
function write_sccache_stub() {
# Unset LD_PRELOAD for ps because of asan + ps issues
# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90589
# shellcheck disable=SC2086
# shellcheck disable=SC2059
printf "#!/bin/sh\nif [ \$(env -u LD_PRELOAD ps -p \$PPID -o comm=) != sccache ]; then\n exec sccache $(which $1) \"\$@\"\nelse\n exec $(which $1) \"\$@\"\nfi" > "/tmp/cache/bin/$1"
chmod a+x "/tmp/cache/bin/$1"
}
write_sccache_stub cc
write_sccache_stub c++
write_sccache_stub gcc
write_sccache_stub g++
write_sccache_stub clang
write_sccache_stub clang++

View File

@ -9,7 +9,7 @@ sysctl -a | grep machdep.cpu
# These are required for both the build job and the test job.
# In the latter to test cpp extensions.
export MACOSX_DEPLOYMENT_TARGET=11.1
export MACOSX_DEPLOYMENT_TARGET=11.0
export CXX=clang++
export CC=clang

View File

@ -149,8 +149,6 @@ test_jit_hooks() {
assert_git_not_dirty
}
install_tlparse
if [[ $NUM_TEST_SHARDS -gt 1 ]]; then
test_python_shard "${SHARD_NUMBER}"
if [[ "${SHARD_NUMBER}" == 1 ]]; then

View File

@ -18,9 +18,7 @@ time python test/run_test.py --verbose -i distributed/test_c10d_gloo
time python test/run_test.py --verbose -i distributed/test_c10d_nccl
time python test/run_test.py --verbose -i distributed/test_c10d_spawn_gloo
time python test/run_test.py --verbose -i distributed/test_c10d_spawn_nccl
time python test/run_test.py --verbose -i distributed/test_compute_comm_reordering
time python test/run_test.py --verbose -i distributed/test_store
time python test/run_test.py --verbose -i distributed/test_symmetric_memory
time python test/run_test.py --verbose -i distributed/test_pg_wrapper
time python test/run_test.py --verbose -i distributed/rpc/cuda/test_tensorpipe_agent
# FSDP tests
@ -36,6 +34,7 @@ time python test/run_test.py --verbose -i distributed/_shard/sharded_tensor/test
# functional collective tests
time python test/run_test.py --verbose -i distributed/test_functional_api
# DTensor tests
time python test/run_test.py --verbose -i distributed/_tensor/test_random_ops
time python test/run_test.py --verbose -i distributed/_tensor/test_dtensor_compile
@ -44,19 +43,12 @@ time python test/run_test.py --verbose -i distributed/_tensor/test_dtensor_compi
time python test/run_test.py --verbose -i distributed/test_device_mesh
# DTensor/TP tests
time python test/run_test.py --verbose -i distributed/tensor/parallel/test_ddp_2d_parallel
time python test/run_test.py --verbose -i distributed/tensor/parallel/test_fsdp_2d_parallel
time python test/run_test.py --verbose -i distributed/tensor/parallel/test_tp_examples
time python test/run_test.py --verbose -i distributed/tensor/parallel/test_tp_random_state
# FSDP2 tests
time python test/run_test.py --verbose -i distributed/_composable/fsdp/test_fully_shard_training -- -k test_2d_mlp_with_nd_mesh
# ND composability tests
time python test/run_test.py --verbose -i distributed/_composable/test_composability/test_2d_composability
time python test/run_test.py --verbose -i distributed/_composable/test_composability/test_pp_composability
# Other tests
time python test/run_test.py --verbose -i test_cuda_primary_ctx
time python test/run_test.py --verbose -i test_optim -- -k test_forloop_goes_right_direction_multigpu
time python test/run_test.py --verbose -i test_optim -- -k test_mixed_device_dtype
time python test/run_test.py --verbose -i test_optim -- -k optimizers_with_varying_tensors
time python test/run_test.py --verbose -i test_foreach -- -k test_tensors_grouping
assert_git_not_dirty

View File

@ -3,7 +3,6 @@ import json
import math
import sys
parser = argparse.ArgumentParser()
parser.add_argument(
"--test-name", dest="test_name", action="store", required=True, help="test name"
@ -60,16 +59,16 @@ print("sample mean: ", sample_mean)
print("sample sigma: ", sample_sigma)
if math.isnan(sample_mean):
raise Exception("""Error: sample mean is NaN""") # noqa: TRY002
raise Exception("""Error: sample mean is NaN""")
elif math.isnan(sample_sigma):
raise Exception("""Error: sample sigma is NaN""") # noqa: TRY002
raise Exception("""Error: sample sigma is NaN""")
z_value = (sample_mean - mean) / sigma
print("z-value: ", z_value)
if z_value >= 3:
raise Exception( # noqa: TRY002
raise Exception(
f"""\n
z-value >= 3, there is high chance of perf regression.\n
To reproduce this regression, run

View File

@ -3,7 +3,6 @@ import sys
import numpy
sample_data_list = sys.argv[1:]
sample_data_list = [float(v.strip()) for v in sample_data_list]

View File

@ -1,7 +1,6 @@
import json
import sys
data_file_path = sys.argv[1]
commit_hash = sys.argv[2]

View File

@ -1,6 +1,5 @@
import sys
log_file_path = sys.argv[1]
with open(log_file_path) as f:

View File

@ -26,8 +26,8 @@ echo "error: python_doc_push_script.sh: version (arg2) not specified"
fi
# Argument 1: Where to copy the built documentation to
# (pytorch_docs/$install_path)
install_path="${1:-${DOCS_INSTALL_PATH:-${DOCS_VERSION}}}"
# (pytorch.github.io/$install_path)
install_path="${1:-${DOCS_INSTALL_PATH:-docs/${DOCS_VERSION}}}"
if [ -z "$install_path" ]; then
echo "error: python_doc_push_script.sh: install_path (arg1) not specified"
exit 1
@ -68,8 +68,8 @@ build_docs () {
}
git clone https://github.com/pytorch/docs pytorch_docs -b "$branch" --depth 1
pushd pytorch_docs
git clone https://github.com/pytorch/pytorch.github.io -b "$branch" --depth 1
pushd pytorch.github.io
export LC_ALL=C
export PATH=/opt/conda/bin:$PATH
@ -105,7 +105,6 @@ if [ "$is_main_doc" = true ]; then
echo undocumented objects found:
cat build/coverage/python.txt
echo "Make sure you've updated relevant .rsts in docs/source!"
echo "You can reproduce locally by running 'cd docs && make coverage && cat build/coverage/python.txt'"
exit 1
fi
else

View File

@ -6,30 +6,6 @@
set -ex
# Suppress ANSI color escape sequences
export TERM=vt100
# shellcheck source=./common.sh
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# Do not change workspace permissions for ROCm CI jobs
# as it can leave workspace with bad permissions for cancelled jobs
if [[ "$BUILD_ENVIRONMENT" != *rocm* ]]; then
# Workaround for dind-rootless userid mapping (https://github.com/pytorch/ci-infra/issues/96)
WORKSPACE_ORIGINAL_OWNER_ID=$(stat -c '%u' "/var/lib/jenkins/workspace")
cleanup_workspace() {
echo "sudo may print the following warning message that can be ignored. The chown command will still run."
echo " sudo: setrlimit(RLIMIT_STACK): Operation not permitted"
echo "For more details refer to https://github.com/sudo-project/sudo/issues/42"
sudo chown -R "$WORKSPACE_ORIGINAL_OWNER_ID" /var/lib/jenkins/workspace
}
# Disable shellcheck SC2064 as we want to parse the original owner immediately.
# shellcheck disable=SC2064
trap_add cleanup_workspace EXIT
sudo chown -R jenkins /var/lib/jenkins/workspace
git config --global --add safe.directory /var/lib/jenkins/workspace
fi
echo "Environment variables:"
env
@ -42,10 +18,6 @@ BUILD_DIR="build"
BUILD_RENAMED_DIR="build_renamed"
BUILD_BIN_DIR="$BUILD_DIR"/bin
#Set Default values for these variables in case they are not set
SHARD_NUMBER="${SHARD_NUMBER:=1}"
NUM_TEST_SHARDS="${NUM_TEST_SHARDS:=1}"
export VALGRIND=ON
# export TORCH_INDUCTOR_INSTALL_GXX=ON
if [[ "$BUILD_ENVIRONMENT" == *clang9* ]]; then
@ -114,6 +86,9 @@ if [[ -n $TESTS_TO_INCLUDE ]]; then
INCLUDE_CLAUSE="--include $TESTS_TO_INCLUDE"
fi
# shellcheck source=./common.sh
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Environment variables"
env
@ -149,10 +124,6 @@ if [[ "$BUILD_ENVIRONMENT" == *cuda* || "$BUILD_ENVIRONMENT" == *rocm* ]]; then
# mainly used so that we're not spending extra cycles testing cpu
# devices on expensive gpu machines
export PYTORCH_TESTING_DEVICE_ONLY_FOR="cuda"
elif [[ "$BUILD_ENVIRONMENT" == *xpu* ]]; then
export PYTORCH_TESTING_DEVICE_ONLY_FOR="xpu"
# setting PYTHON_TEST_EXTRA_OPTION
export PYTHON_TEST_EXTRA_OPTION="--xpu"
fi
if [[ "$TEST_CONFIG" == *crossref* ]]; then
@ -160,22 +131,11 @@ if [[ "$TEST_CONFIG" == *crossref* ]]; then
fi
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
# regression in ROCm 6.0 on MI50 CI runners due to hipblaslt; remove in 6.1
export VALGRIND=OFF
# Print GPU info
rocminfo
rocminfo | grep -E 'Name:.*\sgfx|Marketing'
fi
if [[ "$BUILD_ENVIRONMENT" == *xpu* ]]; then
# Source Intel oneAPI envrioment script to enable xpu runtime related libraries
# refer to https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpu/2-5.html
# shellcheck disable=SC1091
source /opt/intel/oneapi/compiler/latest/env/vars.sh
# Check XPU status before testing
xpu-smi discovery
fi
if [[ "$BUILD_ENVIRONMENT" != *-bazel-* ]] ; then
# JIT C++ extensions require ninja.
pip_install --user "ninja==1.10.2"
@ -184,13 +144,6 @@ if [[ "$BUILD_ENVIRONMENT" != *-bazel-* ]] ; then
export PATH="$HOME/.local/bin:$PATH"
fi
if [[ "$BUILD_ENVIRONMENT" == *aarch64* ]]; then
# TODO: revisit this once the CI is stabilized on aarch64 linux
export VALGRIND=OFF
fi
install_tlparse
# DANGER WILL ROBINSON. The LD_PRELOAD here could cause you problems
# if you're not careful. Check this if you made some changes and the
# ASAN test is not working
@ -237,6 +190,8 @@ if [[ "$BUILD_ENVIRONMENT" == *asan* ]]; then
export LD_PRELOAD=/usr/lib/llvm-15/lib/clang/15.0.7/lib/linux/libclang_rt.asan-x86_64.so
# Disable valgrind for asan
export VALGRIND=OFF
# Increase stack size, because ASAN red zones use more stack
ulimit -s 81920
(cd test && python -c "import torch; print(torch.__version__, torch.version.git_version)")
echo "The next four invocations are expected to crash; if they don't that means ASAN/UBSAN is misconfigured"
@ -252,7 +207,9 @@ fi
# This tests that the debug asserts are working correctly.
if [[ "$BUILD_ENVIRONMENT" == *-debug* ]]; then
echo "We are in debug mode: $BUILD_ENVIRONMENT. Expect the python assertion to fail"
(cd test && ! get_exit_code python -c "import torch; torch._C._crash_if_debug_asserts_fail(424242)")
# TODO: Enable the check after we setup the build to run debug asserts without having
# to do a full (and slow) debug build
# (cd test && ! get_exit_code python -c "import torch; torch._C._crash_if_debug_asserts_fail(424242)")
elif [[ "$BUILD_ENVIRONMENT" != *-bazel-* ]]; then
# Noop when debug is disabled. Skip bazel jobs because torch isn't available there yet.
echo "We are not in debug mode: $BUILD_ENVIRONMENT. Expect the assertion to pass"
@ -278,17 +235,14 @@ test_python_shard() {
# Bare --include flag is not supported and quoting for lint ends up with flag not being interpreted correctly
# shellcheck disable=SC2086
# modify LD_LIBRARY_PATH to ensure it has the conda env.
# This set of tests has been shown to be buggy without it for the split-build
time python test/run_test.py --exclude-jit-executor --exclude-distributed-tests $INCLUDE_CLAUSE --shard "$1" "$NUM_TEST_SHARDS" --verbose $PYTHON_TEST_EXTRA_OPTION
time python test/run_test.py --exclude-jit-executor --exclude-distributed-tests $INCLUDE_CLAUSE --shard "$1" "$NUM_TEST_SHARDS" --verbose
assert_git_not_dirty
}
test_python() {
# shellcheck disable=SC2086
time python test/run_test.py --exclude-jit-executor --exclude-distributed-tests $INCLUDE_CLAUSE --verbose $PYTHON_TEST_EXTRA_OPTION
time python test/run_test.py --exclude-jit-executor --exclude-distributed-tests $INCLUDE_CLAUSE --verbose
assert_git_not_dirty
}
@ -299,13 +253,33 @@ test_dynamo_shard() {
exit 1
fi
python tools/dynamo/verify_dynamo.py
# PLEASE DO NOT ADD ADDITIONAL EXCLUDES HERE.
# Instead, use @skipIfTorchDynamo on your tests.
# Temporarily disable test_fx for dynamo pending the investigation on TTS
# regression in https://github.com/pytorch/torchdynamo/issues/784
time python test/run_test.py --dynamo \
--exclude-inductor-tests \
--exclude-jit-executor \
--exclude-distributed-tests \
--exclude-torch-export-tests \
--exclude \
test_autograd \
test_jit \
test_proxy_tensor \
test_quantization \
test_public_bindings \
test_dataloader \
test_reductions \
test_namedtensor \
test_namedtuple_return_api \
profiler/test_profiler \
profiler/test_profiler_tree \
test_overrides \
test_python_dispatch \
test_fx \
test_package \
test_legacy_vmap \
test_custom_ops \
test_content_store \
export/test_db \
functorch/test_dims \
functorch/test_aotdispatch \
--shard "$1" "$NUM_TEST_SHARDS" \
--verbose
assert_git_not_dirty
@ -314,24 +288,11 @@ test_dynamo_shard() {
test_inductor_distributed() {
# Smuggle a few multi-gpu tests here so that we don't have to request another large node
echo "Testing multi_gpu tests in test_torchinductor"
python test/run_test.py -i inductor/test_torchinductor.py -k test_multi_gpu --verbose
python test/run_test.py -i inductor/test_aot_inductor.py -k test_non_default_cuda_device --verbose
python test/run_test.py -i inductor/test_aot_inductor.py -k test_replicate_on_devices --verbose
python test/run_test.py -i distributed/test_c10d_functional_native.py --verbose
python test/run_test.py -i distributed/_tensor/test_dtensor_compile.py --verbose
python test/run_test.py -i distributed/tensor/parallel/test_micro_pipeline_tp.py --verbose
python test/run_test.py -i distributed/_composable/fsdp/test_fully_shard_comm.py --verbose
python test/run_test.py -i distributed/_composable/fsdp/test_fully_shard_training.py -k test_train_parity_multi_group --verbose
python test/run_test.py -i distributed/_composable/fsdp/test_fully_shard_training.py -k test_train_parity_with_activation_checkpointing --verbose
python test/run_test.py -i distributed/_composable/fsdp/test_fully_shard_training.py -k test_train_parity_hsdp --verbose
python test/run_test.py -i distributed/_composable/fsdp/test_fully_shard_training.py -k test_train_parity_2d_transformer_checkpoint_resume --verbose
python test/run_test.py -i distributed/_composable/fsdp/test_fully_shard_training.py -k test_gradient_accumulation --verbose
python test/run_test.py -i distributed/_composable/fsdp/test_fully_shard_state_dict.py -k test_dp_state_dict_save_load --verbose
python test/run_test.py -i distributed/_composable/fsdp/test_fully_shard_frozen.py --verbose
python test/run_test.py -i distributed/_composable/fsdp/test_fully_shard_mixed_precision.py -k test_compute_dtype --verbose
python test/run_test.py -i distributed/_composable/fsdp/test_fully_shard_mixed_precision.py -k test_reduce_dtype --verbose
python test/run_test.py -i distributed/_composable/fsdp/test_fully_shard_clip_grad_norm_.py -k test_clip_grad_norm_2d --verbose
python test/run_test.py -i distributed/fsdp/test_fsdp_tp_integration.py -k test_fsdp_tp_integration --verbose
pytest test/inductor/test_torchinductor.py -k test_multi_gpu
pytest test/inductor/test_aot_inductor.py -k test_non_default_cuda_device
pytest test/inductor/test_aot_inductor.py -k test_replicate_on_devices
pytest test/distributed/_tensor/test_dtensor_compile.py
pytest test/distributed/tensor/parallel/test_fsdp_2d_parallel.py
# this runs on both single-gpu and multi-gpu instance. It should be smart about skipping tests that aren't supported
# with if required # gpus aren't available
@ -339,52 +300,16 @@ test_inductor_distributed() {
assert_git_not_dirty
}
test_inductor_shard() {
if [[ -z "$NUM_TEST_SHARDS" ]]; then
echo "NUM_TEST_SHARDS must be defined to run a Python test shard"
exit 1
fi
test_inductor() {
python tools/dynamo/verify_dynamo.py
python test/run_test.py --inductor \
--include test_modules test_ops test_ops_gradients test_torch \
--shard "$1" "$NUM_TEST_SHARDS" \
--verbose
python test/run_test.py --inductor --include test_modules test_ops test_ops_gradients test_torch --verbose
# Do not add --inductor for the following inductor unit tests, otherwise we will fail because of nested dynamo state
python test/run_test.py \
--include inductor/test_torchinductor inductor/test_torchinductor_opinfo inductor/test_aot_inductor \
--shard "$1" "$NUM_TEST_SHARDS" \
--verbose
}
python test/run_test.py --include inductor/test_torchinductor inductor/test_torchinductor_opinfo --verbose
test_inductor_aoti() {
# docker build uses bdist_wheel which does not work with test_aot_inductor
# TODO: need a faster way to build
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
# We need to hipify before building again
python3 tools/amd_build/build_amd.py
fi
BUILD_AOT_INDUCTOR_TEST=1 python setup.py develop
CPP_TESTS_DIR="${BUILD_BIN_DIR}" LD_LIBRARY_PATH="${TORCH_LIB_DIR}" python test/run_test.py --cpp --verbose -i cpp/test_aoti_abi_check cpp/test_aoti_inference
}
test_inductor_cpp_wrapper_abi_compatible() {
export TORCHINDUCTOR_ABI_COMPATIBLE=1
TEST_REPORTS_DIR=$(pwd)/test/test-reports
mkdir -p "$TEST_REPORTS_DIR"
echo "Testing Inductor cpp wrapper mode with TORCHINDUCTOR_ABI_COMPATIBLE=1"
# cpu stack allocation causes segfault and needs more investigation
PYTORCH_TESTING_DEVICE_ONLY_FOR="" python test/run_test.py --include inductor/test_cpu_cpp_wrapper
python test/run_test.py --include inductor/test_cuda_cpp_wrapper
TORCHINDUCTOR_CPP_WRAPPER=1 python benchmarks/dynamo/timm_models.py --device cuda --accuracy --amp \
--training --inductor --disable-cudagraphs --only vit_base_patch16_224 \
--output "$TEST_REPORTS_DIR/inductor_cpp_wrapper_training.csv"
python benchmarks/dynamo/check_accuracy.py \
--actual "$TEST_REPORTS_DIR/inductor_cpp_wrapper_training.csv" \
--expected "benchmarks/dynamo/ci_expected_accuracy/inductor_timm_training.csv"
CPP_TESTS_DIR="${BUILD_BIN_DIR}" LD_LIBRARY_PATH="${TORCH_LIB_DIR}" python test/run_test.py --cpp --verbose -i cpp/test_aot_inductor
}
# "Global" flags for inductor benchmarking controlled by TEST_CONFIG
@ -395,22 +320,7 @@ test_inductor_cpp_wrapper_abi_compatible() {
# .github/workflows/inductor-perf-test-nightly.yml
DYNAMO_BENCHMARK_FLAGS=()
pr_time_benchmarks() {
pip_install --user "fbscribelogger"
TEST_REPORTS_DIR=$(pwd)/test/test-reports
mkdir -p "$TEST_REPORTS_DIR"
PYTHONPATH=$(pwd)/benchmarks/dynamo/pr_time_benchmarks source benchmarks/dynamo/pr_time_benchmarks/benchmark_runner.sh "$TEST_REPORTS_DIR/pr_time_benchmarks_after.txt" "benchmarks/dynamo/pr_time_benchmarks/benchmarks"
echo "benchmark results on current PR: "
cat "$TEST_REPORTS_DIR/pr_time_benchmarks_after.txt"
}
if [[ "${TEST_CONFIG}" == *pr_time_benchmarks* ]]; then
pr_time_benchmarks
exit 0
elif [[ "${TEST_CONFIG}" == *dynamo_eager* ]]; then
if [[ "${TEST_CONFIG}" == *dynamo_eager* ]]; then
DYNAMO_BENCHMARK_FLAGS+=(--backend eager)
elif [[ "${TEST_CONFIG}" == *aot_eager* ]]; then
DYNAMO_BENCHMARK_FLAGS+=(--backend aot_eager)
@ -424,7 +334,7 @@ if [[ "${TEST_CONFIG}" == *dynamic* ]]; then
DYNAMO_BENCHMARK_FLAGS+=(--dynamic-shapes --dynamic-batch-only)
fi
if [[ "${TEST_CONFIG}" == *cpu* ]]; then
if [[ "${TEST_CONFIG}" == *cpu_inductor* ]]; then
DYNAMO_BENCHMARK_FLAGS+=(--device cpu)
else
DYNAMO_BENCHMARK_FLAGS+=(--device cuda)
@ -448,18 +358,6 @@ test_perf_for_dashboard() {
# TODO: All the accuracy tests can be skipped once the CI accuracy checking is stable enough
local targets=(accuracy performance)
local device=cuda
if [[ "${TEST_CONFIG}" == *cpu* ]]; then
if [[ "${TEST_CONFIG}" == *cpu_x86* ]]; then
device=cpu_x86
elif [[ "${TEST_CONFIG}" == *cpu_aarch64* ]]; then
device=cpu_aarch64
fi
test_inductor_set_cpu_affinity
elif [[ "${TEST_CONFIG}" == *cuda_a10g* ]]; then
device=cuda_a10g
fi
for mode in "${modes[@]}"; do
if [[ "$mode" == "inference" ]]; then
dtype=bfloat16
@ -475,62 +373,45 @@ test_perf_for_dashboard() {
fi
if [[ "$DASHBOARD_TAG" == *default-true* ]]; then
$TASKSET python "benchmarks/dynamo/$suite.py" \
python "benchmarks/dynamo/$suite.py" \
"${target_flag[@]}" --"$mode" --"$dtype" --backend "$backend" --disable-cudagraphs "$@" \
--output "$TEST_REPORTS_DIR/${backend}_no_cudagraphs_${suite}_${dtype}_${mode}_${device}_${target}.csv"
--output "$TEST_REPORTS_DIR/${backend}_no_cudagraphs_${suite}_${dtype}_${mode}_cuda_${target}.csv"
fi
if [[ "$DASHBOARD_TAG" == *cudagraphs-true* ]]; then
$TASKSET python "benchmarks/dynamo/$suite.py" \
python "benchmarks/dynamo/$suite.py" \
"${target_flag[@]}" --"$mode" --"$dtype" --backend "$backend" "$@" \
--output "$TEST_REPORTS_DIR/${backend}_with_cudagraphs_${suite}_${dtype}_${mode}_${device}_${target}.csv"
--output "$TEST_REPORTS_DIR/${backend}_with_cudagraphs_${suite}_${dtype}_${mode}_cuda_${target}.csv"
fi
if [[ "$DASHBOARD_TAG" == *dynamic-true* ]]; then
$TASKSET python "benchmarks/dynamo/$suite.py" \
python "benchmarks/dynamo/$suite.py" \
"${target_flag[@]}" --"$mode" --"$dtype" --backend "$backend" --dynamic-shapes \
--dynamic-batch-only "$@" \
--output "$TEST_REPORTS_DIR/${backend}_dynamic_${suite}_${dtype}_${mode}_${device}_${target}.csv"
--output "$TEST_REPORTS_DIR/${backend}_dynamic_${suite}_${dtype}_${mode}_cuda_${target}.csv"
fi
if [[ "$DASHBOARD_TAG" == *cppwrapper-true* ]] && [[ "$mode" == "inference" ]]; then
TORCHINDUCTOR_CPP_WRAPPER=1 $TASKSET python "benchmarks/dynamo/$suite.py" \
"${target_flag[@]}" --"$mode" --"$dtype" --backend "$backend" --disable-cudagraphs "$@" \
--output "$TEST_REPORTS_DIR/${backend}_cpp_wrapper_${suite}_${dtype}_${mode}_${device}_${target}.csv"
python "benchmarks/dynamo/$suite.py" \
"${target_flag[@]}" --"$mode" --"$dtype" --backend "$backend" --disable-cudagraphs --cpp-wrapper "$@" \
--output "$TEST_REPORTS_DIR/${backend}_cpp_wrapper_${suite}_${dtype}_${mode}_cuda_${target}.csv"
fi
if [[ "$DASHBOARD_TAG" == *freezing_cudagraphs-true* ]] && [[ "$mode" == "inference" ]]; then
$TASKSET python "benchmarks/dynamo/$suite.py" \
python "benchmarks/dynamo/$suite.py" \
"${target_flag[@]}" --"$mode" --"$dtype" --backend "$backend" "$@" --freezing \
--output "$TEST_REPORTS_DIR/${backend}_with_cudagraphs_freezing_${suite}_${dtype}_${mode}_${device}_${target}.csv"
--output "$TEST_REPORTS_DIR/${backend}_with_cudagraphs_freezing_${suite}_${dtype}_${mode}_cuda_${target}.csv"
fi
if [[ "$DASHBOARD_TAG" == *freeze_autotune_cudagraphs-true* ]] && [[ "$mode" == "inference" ]]; then
TORCHINDUCTOR_MAX_AUTOTUNE=1 $TASKSET python "benchmarks/dynamo/$suite.py" \
TORCHINDUCTOR_MAX_AUTOTUNE=1 python "benchmarks/dynamo/$suite.py" \
"${target_flag[@]}" --"$mode" --"$dtype" --backend "$backend" "$@" --freezing \
--output "$TEST_REPORTS_DIR/${backend}_with_cudagraphs_freezing_autotune_${suite}_${dtype}_${mode}_${device}_${target}.csv"
--output "$TEST_REPORTS_DIR/${backend}_with_cudagraphs_freezing_autotune_${suite}_${dtype}_${mode}_cuda_${target}.csv"
fi
if [[ "$DASHBOARD_TAG" == *aotinductor-true* ]] && [[ "$mode" == "inference" ]]; then
if [[ "$target" == "accuracy" ]]; then
# Also collect Export pass rate and display as a separate row
$TASKSET python "benchmarks/dynamo/$suite.py" \
"${target_flag[@]}" --"$mode" --"$dtype" --export --disable-cudagraphs "$@" \
--output "$TEST_REPORTS_DIR/${backend}_export_${suite}_${dtype}_${mode}_${device}_${target}.csv"
fi
TORCHINDUCTOR_ABI_COMPATIBLE=1 $TASKSET python "benchmarks/dynamo/$suite.py" \
python "benchmarks/dynamo/$suite.py" \
"${target_flag[@]}" --"$mode" --"$dtype" --export-aot-inductor --disable-cudagraphs "$@" \
--output "$TEST_REPORTS_DIR/${backend}_aot_inductor_${suite}_${dtype}_${mode}_${device}_${target}.csv"
--output "$TEST_REPORTS_DIR/${backend}_aot_inductor_${suite}_${dtype}_${mode}_cuda_${target}.csv"
fi
if [[ "$DASHBOARD_TAG" == *maxautotune-true* ]]; then
TORCHINDUCTOR_MAX_AUTOTUNE=1 $TASKSET python "benchmarks/dynamo/$suite.py" \
TORCHINDUCTOR_MAX_AUTOTUNE=1 python "benchmarks/dynamo/$suite.py" \
"${target_flag[@]}" --"$mode" --"$dtype" --backend "$backend" "$@" \
--output "$TEST_REPORTS_DIR/${backend}_max_autotune_${suite}_${dtype}_${mode}_${device}_${target}.csv"
fi
if [[ "$DASHBOARD_TAG" == *cudagraphs_low_precision-true* ]] && [[ "$mode" == "inference" ]]; then
# TODO: This has a new dtype called quant and the benchmarks script needs to be updated to support this.
# The tentative command is as follows. It doesn't work now, but it's ok because we only need mock data
# to fill the dashboard.
$TASKSET python "benchmarks/dynamo/$suite.py" \
"${target_flag[@]}" --"$mode" --quant --backend "$backend" "$@" \
--output "$TEST_REPORTS_DIR/${backend}_cudagraphs_low_precision_${suite}_quant_${mode}_${device}_${target}.csv" || true
# Copy cudagraph results as mock data, easiest choice?
cp "$TEST_REPORTS_DIR/${backend}_with_cudagraphs_${suite}_${dtype}_${mode}_${device}_${target}.csv" \
"$TEST_REPORTS_DIR/${backend}_cudagraphs_low_precision_${suite}_quant_${mode}_${device}_${target}.csv"
--output "$TEST_REPORTS_DIR/${backend}_max_autotune_${suite}_${dtype}_${mode}_cuda_${target}.csv"
fi
done
done
@ -567,19 +448,6 @@ test_single_dynamo_benchmark() {
test_perf_for_dashboard "$suite" \
"${DYNAMO_BENCHMARK_FLAGS[@]}" "$@" "${partition_flags[@]}"
else
if [[ "${TEST_CONFIG}" == *aot_inductor* && "${TEST_CONFIG}" != *cpu_aot_inductor* ]]; then
# Test AOTInductor with the ABI-compatible mode on CI
# This can be removed once the ABI-compatible mode becomes default.
# For CPU device, we perfer non ABI-compatible mode on CI when testing AOTInductor.
export TORCHINDUCTOR_ABI_COMPATIBLE=1
fi
if [[ "${TEST_CONFIG}" == *_avx2* ]]; then
TEST_CONFIG=${TEST_CONFIG//_avx2/}
fi
if [[ "${TEST_CONFIG}" == *_avx512* ]]; then
TEST_CONFIG=${TEST_CONFIG//_avx512/}
fi
python "benchmarks/dynamo/$suite.py" \
--ci --accuracy --timing --explain \
"${DYNAMO_BENCHMARK_FLAGS[@]}" \
@ -594,19 +462,6 @@ test_single_dynamo_benchmark() {
fi
}
test_inductor_micro_benchmark() {
TEST_REPORTS_DIR=$(pwd)/test/test-reports
if [[ "${TEST_CONFIG}" == *cpu* ]]; then
test_inductor_set_cpu_affinity
fi
python benchmarks/gpt_fast/benchmark.py --output "${TEST_REPORTS_DIR}/gpt_fast_benchmark.csv"
}
test_inductor_halide() {
python test/run_test.py --include inductor/test_halide.py --verbose
assert_git_not_dirty
}
test_dynamo_benchmark() {
# Usage: test_dynamo_benchmark huggingface 0
TEST_REPORTS_DIR=$(pwd)/test/test-reports
@ -621,16 +476,8 @@ test_dynamo_benchmark() {
elif [[ "${TEST_CONFIG}" == *perf* ]]; then
test_single_dynamo_benchmark "dashboard" "$suite" "$shard_id" "$@"
else
if [[ "${TEST_CONFIG}" == *cpu* ]]; then
local dt="float32"
if [[ "${TEST_CONFIG}" == *amp* ]]; then
dt="amp"
fi
if [[ "${TEST_CONFIG}" == *freezing* ]]; then
test_single_dynamo_benchmark "inference" "$suite" "$shard_id" --inference --"$dt" --freezing "$@"
else
test_single_dynamo_benchmark "inference" "$suite" "$shard_id" --inference --"$dt" "$@"
fi
if [[ "${TEST_CONFIG}" == *cpu_inductor* ]]; then
test_single_dynamo_benchmark "inference" "$suite" "$shard_id" --inference --float32 "$@"
elif [[ "${TEST_CONFIG}" == *aot_inductor* ]]; then
test_single_dynamo_benchmark "inference" "$suite" "$shard_id" --inference --bfloat16 "$@"
else
@ -644,24 +491,13 @@ test_inductor_torchbench_smoketest_perf() {
TEST_REPORTS_DIR=$(pwd)/test/test-reports
mkdir -p "$TEST_REPORTS_DIR"
# Test some models in the cpp wrapper mode
TORCHINDUCTOR_ABI_COMPATIBLE=1 TORCHINDUCTOR_CPP_WRAPPER=1 python benchmarks/dynamo/torchbench.py --device cuda --accuracy \
--bfloat16 --inference --inductor --only hf_T5 --output "$TEST_REPORTS_DIR/inductor_cpp_wrapper_inference.csv"
TORCHINDUCTOR_ABI_COMPATIBLE=1 TORCHINDUCTOR_CPP_WRAPPER=1 python benchmarks/dynamo/torchbench.py --device cuda --accuracy \
--bfloat16 --inference --inductor --only llama --output "$TEST_REPORTS_DIR/inductor_cpp_wrapper_inference.csv"
TORCHINDUCTOR_ABI_COMPATIBLE=1 TORCHINDUCTOR_CPP_WRAPPER=1 python benchmarks/dynamo/torchbench.py --device cuda --accuracy \
--bfloat16 --inference --inductor --only moco --output "$TEST_REPORTS_DIR/inductor_cpp_wrapper_inference.csv"
python benchmarks/dynamo/check_accuracy.py \
--actual "$TEST_REPORTS_DIR/inductor_cpp_wrapper_inference.csv" \
--expected "benchmarks/dynamo/ci_expected_accuracy/inductor_torchbench_inference.csv"
python benchmarks/dynamo/torchbench.py --device cuda --performance --backend inductor --float16 --training \
--batch-size-file "$(realpath benchmarks/dynamo/torchbench_models_list.txt)" --only hf_Bert \
--output "$TEST_REPORTS_DIR/inductor_training_smoketest.csv"
# The threshold value needs to be actively maintained to make this check useful
python benchmarks/dynamo/check_perf_csv.py -f "$TEST_REPORTS_DIR/inductor_training_smoketest.csv" -t 1.4
TORCHINDUCTOR_ABI_COMPATIBLE=1 python benchmarks/dynamo/torchbench.py --device cuda --performance --bfloat16 --inference \
python benchmarks/dynamo/torchbench.py --device cuda --performance --bfloat16 --inference \
--export-aot-inductor --only nanogpt --output "$TEST_REPORTS_DIR/inductor_inference_smoketest.csv"
# The threshold value needs to be actively maintained to make this check useful
# The perf number of nanogpt seems not very stable, e.g.
@ -680,94 +516,6 @@ test_inductor_torchbench_smoketest_perf() {
"$TEST_REPORTS_DIR/inductor_training_smoketest_$test.csv" \
--expected benchmarks/dynamo/expected_ci_perf_inductor_torchbench.csv
done
# Perform some "warm-start" runs for a few huggingface models.
for test in AlbertForQuestionAnswering AllenaiLongformerBase DistilBertForMaskedLM DistillGPT2 GoogleFnet YituTechConvBert; do
python benchmarks/dynamo/huggingface.py --accuracy --training --amp --inductor --device cuda --warm-start-latency \
--only $test --output "$TEST_REPORTS_DIR/inductor_warm_start_smoketest_$test.csv"
python benchmarks/dynamo/check_accuracy.py \
--actual "$TEST_REPORTS_DIR/inductor_warm_start_smoketest_$test.csv" \
--expected "benchmarks/dynamo/ci_expected_accuracy/inductor_huggingface_training.csv"
done
}
test_inductor_get_core_number() {
if [[ "${TEST_CONFIG}" == *aarch64* ]]; then
echo "$(($(lscpu | grep 'Cluster(s):' | awk '{print $2}') * $(lscpu | grep 'Core(s) per cluster:' | awk '{print $4}')))"
else
echo "$(($(lscpu | grep 'Socket(s):' | awk '{print $2}') * $(lscpu | grep 'Core(s) per socket:' | awk '{print $4}')))"
fi
}
test_inductor_set_cpu_affinity(){
#set jemalloc
JEMALLOC_LIB="$(find /usr/lib -name libjemalloc.so.2)"
export LD_PRELOAD="$JEMALLOC_LIB":"$LD_PRELOAD"
export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:-1,muzzy_decay_ms:-1"
if [[ "${TEST_CONFIG}" != *aarch64* ]]; then
# Use Intel OpenMP for x86
IOMP_LIB="$(dirname "$(which python)")/../lib/libiomp5.so"
export LD_PRELOAD="$IOMP_LIB":"$LD_PRELOAD"
export KMP_AFFINITY=granularity=fine,compact,1,0
export KMP_BLOCKTIME=1
fi
cores=$(test_inductor_get_core_number)
export OMP_NUM_THREADS=$cores
end_core=$((cores-1))
export TASKSET="taskset -c 0-$end_core"
}
test_inductor_torchbench_cpu_smoketest_perf(){
TEST_REPORTS_DIR=$(pwd)/test/test-reports
mkdir -p "$TEST_REPORTS_DIR"
test_inductor_set_cpu_affinity
MODELS_SPEEDUP_TARGET=benchmarks/dynamo/expected_ci_speedup_inductor_torchbench_cpu.csv
grep -v '^ *#' < "$MODELS_SPEEDUP_TARGET" | while IFS=',' read -r -a model_cfg
do
local model_name=${model_cfg[0]}
local data_type=${model_cfg[2]}
local speedup_target=${model_cfg[5]}
local backend=${model_cfg[1]}
if [[ ${model_cfg[4]} == "cpp" ]]; then
export TORCHINDUCTOR_CPP_WRAPPER=1
else
unset TORCHINDUCTOR_CPP_WRAPPER
fi
local output_name="$TEST_REPORTS_DIR/inductor_inference_${model_cfg[0]}_${model_cfg[1]}_${model_cfg[2]}_${model_cfg[3]}_cpu_smoketest.csv"
if [[ ${model_cfg[3]} == "dynamic" ]]; then
$TASKSET python benchmarks/dynamo/torchbench.py \
--inference --performance --"$data_type" -dcpu -n50 --only "$model_name" --dynamic-shapes \
--dynamic-batch-only --freezing --timeout 9000 --"$backend" --output "$output_name"
else
$TASKSET python benchmarks/dynamo/torchbench.py \
--inference --performance --"$data_type" -dcpu -n50 --only "$model_name" \
--freezing --timeout 9000 --"$backend" --output "$output_name"
fi
cat "$output_name"
# The threshold value needs to be actively maintained to make this check useful.
python benchmarks/dynamo/check_perf_csv.py -f "$output_name" -t "$speedup_target"
done
# Add a few ABI-compatible accuracy tests for CPU. These can be removed once we turn on ABI-compatible as default.
TORCHINDUCTOR_ABI_COMPATIBLE=1 python benchmarks/dynamo/timm_models.py --device cpu --accuracy \
--bfloat16 --inference --export-aot-inductor --disable-cudagraphs --only adv_inception_v3 \
--output "$TEST_REPORTS_DIR/aot_inductor_smoke_test.csv"
TORCHINDUCTOR_ABI_COMPATIBLE=1 python benchmarks/dynamo/timm_models.py --device cpu --accuracy \
--bfloat16 --inference --export-aot-inductor --disable-cudagraphs --only beit_base_patch16_224 \
--output "$TEST_REPORTS_DIR/aot_inductor_smoke_test.csv"
python benchmarks/dynamo/check_accuracy.py \
--actual "$TEST_REPORTS_DIR/aot_inductor_smoke_test.csv" \
--expected "benchmarks/dynamo/ci_expected_accuracy/aot_inductor_timm_inference.csv"
}
test_torchbench_gcp_smoketest(){
pushd "${TORCHBENCHPATH}"
python test.py -v
popd
}
test_python_gloo_with_tls() {
@ -801,6 +549,7 @@ test_aten() {
${SUDO} ln -sf "$TORCH_LIB_DIR"/libmkldnn* "$TEST_BASE_DIR"
${SUDO} ln -sf "$TORCH_LIB_DIR"/libnccl* "$TEST_BASE_DIR"
${SUDO} ln -sf "$TORCH_LIB_DIR"/libtorch* "$TEST_BASE_DIR"
${SUDO} ln -sf "$TORCH_LIB_DIR"/libtbb* "$TEST_BASE_DIR"
ls "$TEST_BASE_DIR"
aten/tools/run_tests.sh "$TEST_BASE_DIR"
@ -825,6 +574,21 @@ test_without_numpy() {
popd
}
# pytorch extensions require including torch/extension.h which includes all.h
# which includes utils.h which includes Parallel.h.
# So you can call for instance parallel_for() from your extension,
# but the compilation will fail because of Parallel.h has only declarations
# and definitions are conditionally included Parallel.h(see last lines of Parallel.h).
# I tried to solve it #39612 and #39881 by including Config.h into Parallel.h
# But if Pytorch is built with TBB it provides Config.h
# that has AT_PARALLEL_NATIVE_TBB=1(see #3961 or #39881) and it means that if you include
# torch/extension.h which transitively includes Parallel.h
# which transitively includes tbb.h which is not available!
if [[ "${BUILD_ENVIRONMENT}" == *tbb* ]]; then
sudo mkdir -p /usr/include/tbb
sudo cp -r "$PWD"/third_party/tbb/include/tbb/* /usr/include/tbb
fi
test_libtorch() {
local SHARD="$1"
@ -838,6 +602,7 @@ test_libtorch() {
ln -sf "$TORCH_LIB_DIR"/libc10* "$TORCH_BIN_DIR"
ln -sf "$TORCH_LIB_DIR"/libshm* "$TORCH_BIN_DIR"
ln -sf "$TORCH_LIB_DIR"/libtorch* "$TORCH_BIN_DIR"
ln -sf "$TORCH_LIB_DIR"/libtbb* "$TORCH_BIN_DIR"
ln -sf "$TORCH_LIB_DIR"/libnvfuser* "$TORCH_BIN_DIR"
export CPP_TESTS_DIR="${TORCH_BIN_DIR}"
@ -899,19 +664,6 @@ test_libtorch_api() {
fi
}
test_xpu_bin(){
TEST_REPORTS_DIR=$(pwd)/test/test-reports
mkdir -p "$TEST_REPORTS_DIR"
for xpu_case in "${BUILD_BIN_DIR}"/*{xpu,sycl}*; do
if [[ "$xpu_case" != *"*"* && "$xpu_case" != *.so && "$xpu_case" != *.a ]]; then
case_name=$(basename "$xpu_case")
echo "Testing ${case_name} ..."
"$xpu_case" --gtest_output=xml:"$TEST_REPORTS_DIR"/"$case_name".xml
fi
done
}
test_aot_compilation() {
echo "Testing Ahead of Time compilation"
ln -sf "$TORCH_LIB_DIR"/libc10* "$TORCH_BIN_DIR"
@ -974,6 +726,7 @@ test_rpc() {
# test reporting process to function as expected.
ln -sf "$TORCH_LIB_DIR"/libtorch* "$TORCH_BIN_DIR"
ln -sf "$TORCH_LIB_DIR"/libc10* "$TORCH_BIN_DIR"
ln -sf "$TORCH_LIB_DIR"/libtbb* "$TORCH_BIN_DIR"
CPP_TESTS_DIR="${TORCH_BIN_DIR}" python test/run_test.py --cpp --verbose -i cpp/test_cpp_rpc
}
@ -1075,113 +828,11 @@ test_xla() {
assert_git_not_dirty
}
function check_public_api_test_fails {
test_name=$1
invalid_item_name=$2
invalid_item_desc=$3
echo "Running public API test '${test_name}'..."
test_output=$(python test/test_public_bindings.py -k "${test_name}" 2>&1) && ret=$? || ret=$?
# Ensure test fails correctly.
if [ "$ret" -eq 0 ]; then
cat << EOF
Expected the public API test '${test_name}' to fail after introducing
${invalid_item_desc}, but it succeeded! Check test/test_public_bindings.py
for any changes that may have broken the test.
EOF
return 1
fi
# Ensure invalid item is in the test output.
echo "${test_output}" | grep -q "${invalid_item_name}" && ret=$? || ret=$?
if [ $ret -ne 0 ]; then
cat << EOF
Expected the public API test '${test_name}' to identify ${invalid_item_desc}, but
it didn't! It's possible the test may not have run. Check test/test_public_bindings.py
for any changes that may have broken the test.
EOF
return 1
fi
echo "Success! '${test_name}' identified ${invalid_item_desc} ${invalid_item_name}."
return 0
}
# Do NOT run this test before any other tests, like test_python_shard, etc.
# Because this function uninstalls the torch built from branch and installs
# the torch built on its base commit.
test_forward_backward_compatibility() {
set -x
# First, validate public API tests in the torch built from branch.
# Step 1. Make sure the public API test "test_correct_module_names" fails when a new file
# introduces an invalid public API function.
new_filename=$(mktemp XXXXXXXX.py -p "${TORCH_INSTALL_DIR}")
BAD_PUBLIC_FUNC=$(
cat << 'EOF'
def new_public_func():
pass
# valid public API functions have __module__ set correctly
new_public_func.__module__ = None
EOF
)
echo "${BAD_PUBLIC_FUNC}" >> "${new_filename}"
invalid_api="torch.$(basename -s '.py' "${new_filename}").new_public_func"
echo "Created an invalid public API function ${invalid_api}..."
check_public_api_test_fails \
"test_correct_module_names" \
"${invalid_api}" \
"an invalid public API function" && ret=$? || ret=$?
rm -v "${new_filename}"
if [ "$ret" -ne 0 ]; then
exit 1
fi
# Step 2. Make sure that the public API test "test_correct_module_names" fails when an existing
# file is modified to introduce an invalid public API function.
EXISTING_FILEPATH="${TORCH_INSTALL_DIR}/nn/parameter.py"
cp -v "${EXISTING_FILEPATH}" "${EXISTING_FILEPATH}.orig"
echo "${BAD_PUBLIC_FUNC}" >> "${EXISTING_FILEPATH}"
invalid_api="torch.nn.parameter.new_public_func"
echo "Appended an invalid public API function to existing file ${EXISTING_FILEPATH}..."
check_public_api_test_fails \
"test_correct_module_names" \
"${invalid_api}" \
"an invalid public API function" && ret=$? || ret=$?
mv -v "${EXISTING_FILEPATH}.orig" "${EXISTING_FILEPATH}"
if [ "$ret" -ne 0 ]; then
exit 1
fi
# Step 3. Make sure that the public API test "test_modules_can_be_imported" fails when a module
# cannot be imported.
new_module_dir=$(mktemp XXXXXXXX -d -p "${TORCH_INSTALL_DIR}")
echo "invalid syntax garbage" > "${new_module_dir}/__init__.py"
invalid_module_name="torch.$(basename "${new_module_dir}")"
check_public_api_test_fails \
"test_modules_can_be_imported" \
"${invalid_module_name}" \
"a non-importable module" && ret=$? || ret=$?
rm -rv "${new_module_dir}"
if [ "$ret" -ne 0 ]; then
exit 1
fi
# Next, build torch from the merge base.
REPO_DIR=$(pwd)
if [[ "${BASE_SHA}" == "${SHA1}" ]]; then
echo "On trunk, we should compare schemas with torch built from the parent commit"
@ -1253,8 +904,7 @@ test_bazel() {
tools/bazel test --config=cpu-only --test_timeout=480 --test_output=all --test_tag_filters=-gpu-required --test_filter=-*CUDA :all_tests
else
# Increase the test timeout to 480 like CPU tests because modules_test frequently timeout
tools/bazel test --test_timeout=480 --test_output=errors \
tools/bazel test --test_output=errors \
//:any_test \
//:autograd_test \
//:dataloader_test \
@ -1349,27 +999,18 @@ test_docs_test() {
}
test_executorch() {
echo "Install torchvision and torchaudio"
install_torchvision
install_torchaudio
pushd /executorch
export PYTHON_EXECUTABLE=python
export EXECUTORCH_BUILD_PYBIND=ON
export CMAKE_ARGS="-DEXECUTORCH_BUILD_XNNPACK=ON -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON"
# NB: We need to rebuild ExecuTorch runner here because it depends on PyTorch
# from the PR
# shellcheck disable=SC1091
source .ci/scripts/setup-linux.sh cmake
echo "Run ExecuTorch unit tests"
pytest -v -n auto
# shellcheck disable=SC1091
LLVM_PROFDATA=llvm-profdata-12 LLVM_COV=llvm-cov-12 bash test/run_oss_cpp_tests.sh
echo "Install torchvision and torchaudio"
# TODO(huydhn): Switch this to the pinned commits on ExecuTorch once they are
# there. These libraries need to be built here, and not part of the Docker
# image because they require the target version of torch to be installed first
pip_install --no-use-pep517 --user "git+https://github.com/pytorch/audio.git"
pip_install --no-use-pep517 --user "git+https://github.com/pytorch/vision.git"
echo "Run ExecuTorch regression tests for some models"
# NB: This is a sample model, more can be added here
export PYTHON_EXECUTABLE=python
# TODO(huydhn): Add more coverage here using ExecuTorch's gather models script
# shellcheck disable=SC1091
source .ci/scripts/test.sh mv3 cmake xnnpack-quantization-delegation ''
@ -1383,33 +1024,11 @@ test_executorch() {
assert_git_not_dirty
}
test_linux_aarch64(){
python test/run_test.py --include test_modules test_mkldnn test_mkldnn_fusion test_openmp test_torch test_dynamic_shapes \
test_transformers test_multiprocessing test_numpy_interop --verbose
# Dynamo tests
python test/run_test.py --include dynamo/test_compile dynamo/test_backends dynamo/test_comptime dynamo/test_config \
dynamo/test_functions dynamo/test_fx_passes_pre_grad dynamo/test_interop dynamo/test_model_output dynamo/test_modules \
dynamo/test_optimizers dynamo/test_recompile_ux dynamo/test_recompiles --verbose
# Inductor tests
python test/run_test.py --include inductor/test_torchinductor inductor/test_benchmark_fusion inductor/test_codecache \
inductor/test_config inductor/test_control_flow inductor/test_coordinate_descent_tuner inductor/test_fx_fusion \
inductor/test_group_batch_fusion inductor/test_inductor_freezing inductor/test_inductor_utils \
inductor/test_inplacing_pass inductor/test_kernel_benchmark inductor/test_layout_optim \
inductor/test_max_autotune inductor/test_memory_planning inductor/test_metrics inductor/test_multi_kernel inductor/test_pad_mm \
inductor/test_pattern_matcher inductor/test_perf inductor/test_profiler inductor/test_select_algorithm inductor/test_smoke \
inductor/test_split_cat_fx_passes inductor/test_standalone_compile inductor/test_torchinductor \
inductor/test_torchinductor_codegen_dynamic_shapes inductor/test_torchinductor_dynamic_shapes --verbose
}
if ! [[ "${BUILD_ENVIRONMENT}" == *libtorch* || "${BUILD_ENVIRONMENT}" == *-bazel-* ]]; then
(cd test && python -c "import torch; print(torch.__config__.show())")
(cd test && python -c "import torch; print(torch.__config__.parallel_info())")
fi
if [[ "${BUILD_ENVIRONMENT}" == *aarch64* && "${TEST_CONFIG}" != *perf_cpu_aarch64* ]]; then
test_linux_aarch64
elif [[ "${TEST_CONFIG}" == *backward* ]]; then
if [[ "${TEST_CONFIG}" == *backward* ]]; then
test_forward_backward_compatibility
# Do NOT add tests after bc check tests, see its comment.
elif [[ "${TEST_CONFIG}" == *xla* ]]; then
@ -1429,12 +1048,11 @@ elif [[ "$TEST_CONFIG" == distributed ]]; then
if [[ "${SHARD_NUMBER}" == 1 ]]; then
test_rpc
fi
elif [[ "$TEST_CONFIG" == deploy ]]; then
checkout_install_torchdeploy
test_torch_deploy
elif [[ "${TEST_CONFIG}" == *inductor_distributed* ]]; then
test_inductor_distributed
elif [[ "${TEST_CONFIG}" == *inductor-halide* ]]; then
test_inductor_halide
elif [[ "${TEST_CONFIG}" == *inductor-micro-benchmark* ]]; then
test_inductor_micro_benchmark
elif [[ "${TEST_CONFIG}" == *huggingface* ]]; then
install_torchvision
id=$((SHARD_NUMBER-1))
@ -1444,67 +1062,46 @@ elif [[ "${TEST_CONFIG}" == *timm* ]]; then
id=$((SHARD_NUMBER-1))
test_dynamo_benchmark timm_models "$id"
elif [[ "${TEST_CONFIG}" == *torchbench* ]]; then
if [[ "${TEST_CONFIG}" == *cpu* ]]; then
if [[ "${TEST_CONFIG}" == *cpu_inductor* ]]; then
install_torchaudio cpu
else
install_torchaudio cuda
fi
install_torchtext
install_torchvision
TORCH_CUDA_ARCH_LIST="8.0;8.6" pip_install git+https://github.com/pytorch/ao.git
id=$((SHARD_NUMBER-1))
# https://github.com/opencv/opencv-python/issues/885
pip_install opencv-python==4.8.0.74
if [[ "${TEST_CONFIG}" == *inductor_torchbench_smoketest_perf* ]]; then
checkout_install_torchbench hf_Bert hf_Albert nanogpt timm_vision_transformer
PYTHONPATH=$(pwd)/torchbench test_inductor_torchbench_smoketest_perf
elif [[ "${TEST_CONFIG}" == *inductor_torchbench_cpu_smoketest_perf* ]]; then
checkout_install_torchbench timm_vision_transformer phlippe_densenet basic_gnn_edgecnn \
llama_v2_7b_16h resnet50 timm_efficientnet mobilenet_v3_large timm_resnest \
functorch_maml_omniglot yolov3 mobilenet_v2 resnext50_32x4d densenet121 mnasnet1_0
PYTHONPATH=$(pwd)/torchbench test_inductor_torchbench_cpu_smoketest_perf
elif [[ "${TEST_CONFIG}" == *torchbench_gcp_smoketest* ]]; then
checkout_install_torchbench
TORCHBENCHPATH=$(pwd)/torchbench test_torchbench_gcp_smoketest
else
checkout_install_torchbench
# Do this after checkout_install_torchbench to ensure we clobber any
# nightlies that torchbench may pull in
if [[ "${TEST_CONFIG}" != *cpu* ]]; then
if [[ "${TEST_CONFIG}" != *cpu_inductor* ]]; then
install_torchrec_and_fbgemm
fi
PYTHONPATH=$(pwd)/torchbench test_dynamo_benchmark torchbench "$id"
fi
elif [[ "${TEST_CONFIG}" == *inductor_cpp_wrapper_abi_compatible* ]]; then
elif [[ "${TEST_CONFIG}" == *inductor* && "${SHARD_NUMBER}" == 1 ]]; then
install_torchvision
test_inductor_cpp_wrapper_abi_compatible
elif [[ "${TEST_CONFIG}" == *inductor* ]]; then
test_inductor
test_inductor_distributed
elif [[ "${TEST_CONFIG}" == *dynamo* && "${SHARD_NUMBER}" == 1 && $NUM_TEST_SHARDS -gt 1 ]]; then
test_without_numpy
install_torchvision
test_inductor_shard "${SHARD_NUMBER}"
if [[ "${SHARD_NUMBER}" == 1 ]]; then
if [[ "${BUILD_ENVIRONMENT}" != linux-jammy-py3.9-gcc11-build ]]; then
test_inductor_distributed
fi
fi
elif [[ "${TEST_CONFIG}" == *dynamo* ]]; then
install_torchvision
test_dynamo_shard "${SHARD_NUMBER}"
if [[ "${SHARD_NUMBER}" == 1 ]]; then
test_aten
fi
elif [[ "${BUILD_ENVIRONMENT}" == *rocm* && -n "$TESTS_TO_INCLUDE" ]]; then
install_torchvision
test_python_shard "$SHARD_NUMBER"
test_dynamo_shard 1
test_aten
elif [[ "${TEST_CONFIG}" == *dynamo* && "${SHARD_NUMBER}" == 2 && $NUM_TEST_SHARDS -gt 1 ]]; then
install_torchvision
test_dynamo_shard 2
elif [[ "${SHARD_NUMBER}" == 1 && $NUM_TEST_SHARDS -gt 1 ]]; then
test_without_numpy
install_torchvision
test_python_shard 1
test_aten
test_libtorch 1
if [[ "${BUILD_ENVIRONMENT}" == *xpu* ]]; then
test_xpu_bin
fi
elif [[ "${SHARD_NUMBER}" == 2 && $NUM_TEST_SHARDS -gt 1 ]]; then
install_torchvision
test_python_shard 2
@ -1525,11 +1122,10 @@ elif [[ "${BUILD_ENVIRONMENT}" == *-mobile-lightweight-dispatch* ]]; then
test_libtorch
elif [[ "${TEST_CONFIG}" = docs_test ]]; then
test_docs_test
elif [[ "${BUILD_ENVIRONMENT}" == *xpu* ]]; then
elif [[ "${BUILD_ENVIRONMENT}" == *rocm* && -n "$TESTS_TO_INCLUDE" ]]; then
install_torchvision
test_python
test_aten
test_xpu_bin
else
install_torchvision
install_monkeytype

View File

@ -16,29 +16,24 @@ set PATH=C:\Program Files\CMake\bin;C:\Program Files\7-Zip;C:\ProgramData\chocol
set INSTALLER_DIR=%SCRIPT_HELPERS_DIR%\installation-helpers
call %INSTALLER_DIR%\install_mkl.bat
if errorlevel 1 exit /b
if not errorlevel 0 exit /b
call %INSTALLER_DIR%\install_magma.bat
if errorlevel 1 goto fail
if not errorlevel 0 goto fail
if errorlevel 1 exit /b
if not errorlevel 0 exit /b
call %INSTALLER_DIR%\install_sccache.bat
if errorlevel 1 goto fail
if not errorlevel 0 goto fail
if "%USE_XPU%"=="1" (
:: Install xpu support packages
call %INSTALLER_DIR%\install_xpu.bat
if errorlevel 1 exit /b 1
)
if errorlevel 1 exit /b
if not errorlevel 0 exit /b
:: Miniconda has been installed as part of the Windows AMI with all the dependencies.
:: We just need to activate it here
call %INSTALLER_DIR%\activate_miniconda3.bat
if errorlevel 1 goto fail
if not errorlevel 0 goto fail
call pip install mkl-include==2021.4.0 mkl-devel==2021.4.0
if errorlevel 1 goto fail
if not errorlevel 0 goto fail
if errorlevel 1 exit /b
if not errorlevel 0 exit /b
:: Override VS env here
pushd .
@ -47,18 +42,8 @@ if "%VC_VERSION%" == "" (
) else (
call "C:\Program Files (x86)\Microsoft Visual Studio\%VC_YEAR%\%VC_PRODUCT%\VC\Auxiliary\Build\vcvarsall.bat" x64 -vcvars_ver=%VC_VERSION%
)
if errorlevel 1 goto fail
if not errorlevel 0 goto fail
if "%USE_XPU%"=="1" (
:: Activate xpu environment - VS env is required for xpu
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
if errorlevel 1 exit /b 1
:: Reduce build time. Only have MTL self-hosted runner now
SET TORCH_XPU_ARCH_LIST=xe-lpg
SET USE_KINETO=0
)
if errorlevel 1 exit /b
if not errorlevel 0 exit /b
@echo on
popd
@ -68,12 +53,12 @@ set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v%CUDA_VERSION%
if x%CUDA_VERSION:.=%==x%CUDA_VERSION% (
echo CUDA version %CUDA_VERSION% format isn't correct, which doesn't contain '.'
goto fail
exit /b 1
)
rem version transformer, for example 10.1 to 10_1.
if x%CUDA_VERSION:.=%==x%CUDA_VERSION% (
echo CUDA version %CUDA_VERSION% format isn't correct, which doesn't contain '.'
goto fail
exit /b 1
)
set VERSION_SUFFIX=%CUDA_VERSION:.=_%
set CUDA_PATH_V%VERSION_SUFFIX%=%CUDA_PATH%
@ -81,6 +66,13 @@ set CUDA_PATH_V%VERSION_SUFFIX%=%CUDA_PATH%
set CUDNN_LIB_DIR=%CUDA_PATH%\lib\x64
set CUDA_TOOLKIT_ROOT_DIR=%CUDA_PATH%
set CUDNN_ROOT_DIR=%CUDA_PATH%
set NVTOOLSEXT_PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt
set PATH=%CUDA_PATH%\bin;%CUDA_PATH%\libnvvp;%PATH%
set CUDNN_LIB_DIR=%CUDA_PATH%\lib\x64
set CUDA_TOOLKIT_ROOT_DIR=%CUDA_PATH%
set CUDNN_ROOT_DIR=%CUDA_PATH%
set NVTOOLSEXT_PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt
set PATH=%CUDA_PATH%\bin;%CUDA_PATH%\libnvvp;%PATH%
:cuda_build_end
@ -97,8 +89,8 @@ set SCCACHE_IGNORE_SERVER_IO_ERROR=1
sccache --stop-server
sccache --start-server
sccache --zero-stats
set CMAKE_C_COMPILER_LAUNCHER=sccache
set CMAKE_CXX_COMPILER_LAUNCHER=sccache
set CC=sccache-cl
set CXX=sccache-cl
set CMAKE_GENERATOR=Ninja
@ -110,8 +102,8 @@ if "%USE_CUDA%"=="1" (
:: CMake requires a single command as CUDA_NVCC_EXECUTABLE, so we push the wrappers
:: randomtemp.exe and sccache.exe into a batch file which CMake invokes.
curl -kL https://github.com/peterjc123/randomtemp-rust/releases/download/v0.4/randomtemp.exe --output %TMP_DIR_WIN%\bin\randomtemp.exe
if errorlevel 1 goto fail
if not errorlevel 0 goto fail
if errorlevel 1 exit /b
if not errorlevel 0 exit /b
echo @"%TMP_DIR_WIN%\bin\randomtemp.exe" "%TMP_DIR_WIN%\bin\sccache.exe" "%CUDA_PATH%\bin\nvcc.exe" %%* > "%TMP_DIR%/bin/nvcc.bat"
cat %TMP_DIR%/bin/nvcc.bat
set CUDA_NVCC_EXECUTABLE=%TMP_DIR%/bin/nvcc.bat
@ -123,8 +115,8 @@ if "%USE_CUDA%"=="1" (
set
python setup.py bdist_wheel
if errorlevel 1 goto fail
if not errorlevel 0 goto fail
if errorlevel 1 exit /b
if not errorlevel 0 exit /b
sccache --show-stats
python -c "import os, glob; os.system('python -mpip install --no-index --no-deps ' + glob.glob('dist/*.whl')[0])"
(
@ -144,8 +136,3 @@ python -c "import os, glob; os.system('python -mpip install --no-index --no-deps
sccache --show-stats --stats-format json | jq .stats > sccache-stats-%BUILD_ENVIRONMENT%-%OUR_GITHUB_JOB_ID%.json
sccache --stop-server
exit /b 0
:fail
exit /b 1

View File

@ -0,0 +1,14 @@
if "%REBUILD%"=="" (
if "%BUILD_ENVIRONMENT%"=="" (
curl --retry 3 --retry-all-errors -k https://s3.amazonaws.com/ossci-windows/mkl_2020.2.254.7z --output %TMP_DIR_WIN%\mkl.7z
) else (
aws s3 cp s3://ossci-windows/mkl_2020.2.254.7z %TMP_DIR_WIN%\mkl.7z --quiet
)
if errorlevel 1 exit /b
if not errorlevel 0 exit /b
7z x -aoa %TMP_DIR_WIN%\mkl.7z -o%TMP_DIR_WIN%\mkl
if errorlevel 1 exit /b
if not errorlevel 0 exit /b
)
set CMAKE_INCLUDE_PATH=%TMP_DIR_WIN%\mkl\include
set LIB=%TMP_DIR_WIN%\mkl\lib;%LIB%

View File

@ -1,13 +1,18 @@
mkdir %TMP_DIR_WIN%\bin
if "%REBUILD%"=="" (
IF EXIST %TMP_DIR_WIN%\bin\sccache.exe (
:check_sccache
%TMP_DIR_WIN%\bin\sccache.exe --show-stats || (
taskkill /im sccache.exe /f /t || ver > nul
del %TMP_DIR_WIN%\bin\sccache.exe || ver > nul
del %TMP_DIR_WIN%\bin\sccache-cl.exe || ver > nul
if "%BUILD_ENVIRONMENT%"=="" (
curl --retry 3 --retry-all-errors -k https://s3.amazonaws.com/ossci-windows/sccache.exe --output %TMP_DIR_WIN%\bin\sccache.exe
curl --retry 3 --retry-all-errors -k https://s3.amazonaws.com/ossci-windows/sccache-cl.exe --output %TMP_DIR_WIN%\bin\sccache-cl.exe
) else (
aws s3 cp s3://ossci-windows/sccache.exe %TMP_DIR_WIN%\bin\sccache.exe
aws s3 cp s3://ossci-windows/sccache-cl.exe %TMP_DIR_WIN%\bin\sccache-cl.exe
)
goto :check_sccache
)
if "%BUILD_ENVIRONMENT%"=="" (
curl --retry 3 --retry-all-errors -k https://s3.amazonaws.com/ossci-windows/sccache-v0.7.4.exe --output %TMP_DIR_WIN%\bin\sccache.exe
) else (
aws s3 cp s3://ossci-windows/sccache-v0.7.4.exe %TMP_DIR_WIN%\bin\sccache.exe
)
)
)

View File

@ -1,91 +0,0 @@
@echo on
REM Description: Install Intel Support Packages on Windows
REM BKM reference: https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpu/2-5.html
set XPU_INSTALL_MODE=%~1
if "%XPU_INSTALL_MODE%"=="" goto xpu_bundle_install_start
if "%XPU_INSTALL_MODE%"=="bundle" goto xpu_bundle_install_start
if "%XPU_INSTALL_MODE%"=="driver" goto xpu_driver_install_start
if "%XPU_INSTALL_MODE%"=="all" goto xpu_driver_install_start
:arg_error
echo Illegal XPU installation mode. The value can be "bundle"/"driver"/"all"
echo If keep the value as space, will use default "bundle" mode
exit /b 1
:xpu_driver_install_start
:: TODO Need more testing for driver installation
set XPU_DRIVER_LINK=https://downloadmirror.intel.com/830975/gfx_win_101.5972.exe
curl -o xpu_driver.exe --retry 3 --retry-all-errors -k %XPU_DRIVER_LINK%
echo "XPU Driver installing..."
start /wait "Intel XPU Driver Installer" "xpu_driver.exe"
if errorlevel 1 exit /b 1
del xpu_driver.exe
if "%XPU_INSTALL_MODE%"=="driver" goto xpu_install_end
:xpu_bundle_install_start
set XPU_BUNDLE_PARENT_DIR=C:\Program Files (x86)\Intel\oneAPI
set XPU_BUNDLE_URL=https://registrationcenter-download.intel.com/akdlm/IRC_NAS/9d1a91e2-e8b8-40a5-8c7f-5db768a6a60c/w_intel-for-pytorch-gpu-dev_p_0.5.3.37_offline.exe
set XPU_PTI_URL=https://registrationcenter-download.intel.com/akdlm/IRC_NAS/9d1a91e2-e8b8-40a5-8c7f-5db768a6a60c/w_intel-pti-dev_p_0.9.0.37_offline.exe
set XPU_BUNDLE_VERSION=0.5.3+31
set XPU_PTI_VERSION=0.9.0+36
set XPU_BUNDLE_PRODUCT_NAME=intel.oneapi.win.intel-for-pytorch-gpu-dev.product
set XPU_PTI_PRODUCT_NAME=intel.oneapi.win.intel-pti-dev.product
set XPU_BUNDLE_INSTALLED=0
set XPU_PTI_INSTALLED=0
set XPU_BUNDLE_UNINSTALL=0
set XPU_PTI_UNINSTALL=0
:: Check if XPU bundle is target version or already installed
if exist "%XPU_BUNDLE_PARENT_DIR%\Installer\installer.exe" goto xpu_bundle_ver_check
goto xpu_bundle_install
:xpu_bundle_ver_check
"%XPU_BUNDLE_PARENT_DIR%\Installer\installer.exe" --list-products > xpu_bundle_installed_ver.log
for /f "tokens=1,2" %%a in (xpu_bundle_installed_ver.log) do (
if "%%a"=="%XPU_BUNDLE_PRODUCT_NAME%" (
echo %%a Installed Version: %%b
set XPU_BUNDLE_INSTALLED=1
if not "%XPU_BUNDLE_VERSION%"=="%%b" (
start /wait "Installer Title" "%XPU_BUNDLE_PARENT_DIR%\Installer\installer.exe" --action=remove --eula=accept --silent --product-id %XPU_BUNDLE_PRODUCT_NAME% --product-ver %%b --log-dir uninstall_bundle
set XPU_BUNDLE_UNINSTALL=1
)
)
if "%%a"=="%XPU_PTI_PRODUCT_NAME%" (
echo %%a Installed Version: %%b
set XPU_PTI_INSTALLED=1
if not "%XPU_PTI_VERSION%"=="%%b" (
start /wait "Installer Title" "%XPU_BUNDLE_PARENT_DIR%\Installer\installer.exe" --action=remove --eula=accept --silent --product-id %XPU_PTI_PRODUCT_NAME% --product-ver %%b --log-dir uninstall_bundle
set XPU_PTI_UNINSTALL=1
)
)
)
if errorlevel 1 exit /b 1
if exist xpu_bundle_installed_ver.log del xpu_bundle_installed_ver.log
if "%XPU_BUNDLE_INSTALLED%"=="0" goto xpu_bundle_install
if "%XPU_BUNDLE_UNINSTALL%"=="1" goto xpu_bundle_install
if "%XPU_PTI_INSTALLED%"=="0" goto xpu_pti_install
if "%XPU_PTI_UNINSTALL%"=="1" goto xpu_pti_install
goto xpu_install_end
:xpu_bundle_install
curl -o xpu_bundle.exe --retry 3 --retry-all-errors -k %XPU_BUNDLE_URL%
echo "XPU Bundle installing..."
start /wait "Intel Pytorch Bundle Installer" "xpu_bundle.exe" --action=install --eula=accept --silent --log-dir install_bundle
if errorlevel 1 exit /b 1
del xpu_bundle.exe
:xpu_pti_install
curl -o xpu_pti.exe --retry 3 --retry-all-errors -k %XPU_PTI_URL%
echo "XPU PTI installing..."
start /wait "Intel PTI Installer" "xpu_pti.exe" --action=install --eula=accept --silent --log-dir install_bundle
if errorlevel 1 exit /b 1
del xpu_pti.exe
:xpu_install_end

View File

@ -4,7 +4,6 @@ import os
import subprocess
import sys
COMMON_TESTS = [
(
"Checking that torch is available",

View File

@ -40,6 +40,7 @@ set CUDA_PATH_V%VERSION_SUFFIX%=%CUDA_PATH%
set CUDNN_LIB_DIR=%CUDA_PATH%\lib\x64
set CUDA_TOOLKIT_ROOT_DIR=%CUDA_PATH%
set CUDNN_ROOT_DIR=%CUDA_PATH%
set NVTOOLSEXT_PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt
set PATH=%CUDA_PATH%\bin;%CUDA_PATH%\libnvvp;%PATH%
set NUMBAPRO_CUDALIB=%CUDA_PATH%\bin
set NUMBAPRO_LIBDEVICE=%CUDA_PATH%\nvvm\libdevice

View File

@ -31,6 +31,6 @@ if ERRORLEVEL 1 exit /b 1
:: Run tests C++-side and load the exported script module.
cd build
set PATH=%TMP_DIR_WIN%\build\torch\lib;%PATH%
set PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt\bin\x64;%TMP_DIR_WIN%\build\torch\lib;%PATH%
test_custom_backend.exe model.pt
if ERRORLEVEL 1 exit /b 1

View File

@ -31,6 +31,6 @@ if ERRORLEVEL 1 exit /b 1
:: Run tests C++-side and load the exported script module.
cd build
set PATH=%TMP_DIR_WIN%\build\torch\lib;%PATH%
set PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt\bin\x64;%TMP_DIR_WIN%\build\torch\lib;%PATH%
test_custom_ops.exe model.pt
if ERRORLEVEL 1 exit /b 1

View File

@ -5,7 +5,7 @@ if errorlevel 1 exit /b 1
set CWD=%cd%
set CPP_TESTS_DIR=%TMP_DIR_WIN%\build\torch\bin
set PATH=%TMP_DIR_WIN%\build\torch\lib;%PATH%
set PATH=C:\Program Files\NVIDIA Corporation\NvToolsExt\bin\x64;%TMP_DIR_WIN%\build\torch\lib;%PATH%
set TORCH_CPP_TEST_MNIST_PATH=%CWD%\test\cpp\api\mnist
python tools\download_mnist.py --quiet -d %TORCH_CPP_TEST_MNIST_PATH%

View File

@ -40,9 +40,6 @@ python -m pip install pytest-rerunfailures==10.3 pytest-cpp==2.3.0 tensorboard==
# Install Z3 optional dependency for Windows builds.
python -m pip install z3-solver==4.12.2.0
# Install tlparse for test\dynamo\test_structured_trace.py UTs.
python -m pip install tlparse==0.3.25
run_tests() {
# Run nvidia-smi if available
for path in '/c/Program Files/NVIDIA Corporation/NVSMI/nvidia-smi.exe' /c/Windows/System32/nvidia-smi.exe; do

View File

@ -1,4 +1,468 @@
Warning
=======
PyTorch migration from CircleCI to github actions has been completed. All continuous integration & deployment workflows are defined in `.github/workflows` folder
Contents may be out of date. Our CircleCI workflows are gradually being migrated to Github actions.
Structure of CI
===============
setup job:
1. Does a git checkout
2. Persists CircleCI scripts (everything in `.circleci`) into a workspace. Why?
We don't always do a Git checkout on all subjobs, but we usually
still want to be able to call scripts one way or another in a subjob.
Persisting files this way lets us have access to them without doing a
checkout. This workspace is conventionally mounted on `~/workspace`
(this is distinguished from `~/project`, which is the conventional
working directory that CircleCI will default to starting your jobs
in.)
3. Write out the commit message to `.circleci/COMMIT_MSG`. This is so
we can determine in subjobs if we should actually run the jobs or
not, even if there isn't a Git checkout.
CircleCI configuration generator
================================
One may no longer make changes to the `.circleci/config.yml` file directly.
Instead, one must edit these Python scripts or files in the `verbatim-sources/` directory.
Usage
----------
1. Make changes to these scripts.
2. Run the `regenerate.sh` script in this directory and commit the script changes and the resulting change to `config.yml`.
You'll see a build failure on GitHub if the scripts don't agree with the checked-in version.
Motivation
----------
These scripts establish a single, authoritative source of documentation for the CircleCI configuration matrix.
The documentation, in the form of diagrams, is automatically generated and cannot drift out of sync with the YAML content.
Furthermore, consistency is enforced within the YAML config itself, by using a single source of data to generate
multiple parts of the file.
* Facilitates one-off culling/enabling of CI configs for testing PRs on special targets
Also see https://github.com/pytorch/pytorch/issues/17038
Future direction
----------------
### Declaring sparse config subsets
See comment [here](https://github.com/pytorch/pytorch/pull/17323#pullrequestreview-206945747):
In contrast with a full recursive tree traversal of configuration dimensions,
> in the future I think we actually want to decrease our matrix somewhat and have only a few mostly-orthogonal builds that taste as many different features as possible on PRs, plus a more complete suite on every PR and maybe an almost full suite nightly/weekly (we don't have this yet). Specifying PR jobs in the future might be easier to read with an explicit list when we come to this.
----------------
----------------
# How do the binaries / nightlies / releases work?
### What is a binary?
A binary or package (used interchangeably) is a pre-built collection of c++ libraries, header files, python bits, and other files. We build these and distribute them so that users do not need to install from source.
A **binary configuration** is a collection of
* release or nightly
* releases are stable, nightlies are beta and built every night
* python version
* linux: 3.7m (mu is wide unicode or something like that. It usually doesn't matter but you should know that it exists)
* macos: 3.7, 3.8
* windows: 3.7, 3.8
* cpu version
* cpu, cuda 9.0, cuda 10.0
* The supported cuda versions occasionally change
* operating system
* Linux - these are all built on CentOS. There haven't been any problems in the past building on CentOS and using on Ubuntu
* MacOS
* Windows - these are built on Azure pipelines
* devtoolset version (gcc compiler version)
* This only matters on Linux cause only Linux uses gcc. tldr is gcc made a backwards incompatible change from gcc 4.8 to gcc 5, because it had to change how it implemented std::vector and std::string
### Where are the binaries?
The binaries are built in CircleCI. There are nightly binaries built every night at 9pm PST (midnight EST) and release binaries corresponding to Pytorch releases, usually every few months.
We have 3 types of binary packages
* pip packages - nightlies are stored on s3 (pip install -f \<a s3 url\>). releases are stored in a pip repo (pip install torch) (ask Soumith about this)
* conda packages - nightlies and releases are both stored in a conda repo. Nighty packages have a '_nightly' suffix
* libtorch packages - these are zips of all the c++ libraries, header files, and sometimes dependencies. These are c++ only
* shared with dependencies (the only supported option for Windows)
* static with dependencies
* shared without dependencies
* static without dependencies
All binaries are built in CircleCI workflows except Windows. There are checked-in workflows (committed into the .circleci/config.yml) to build the nightlies every night. Releases are built by manually pushing a PR that builds the suite of release binaries (overwrite the config.yml to build the release)
# CircleCI structure of the binaries
Some quick vocab:
* A \**workflow** is a CircleCI concept; it is a DAG of '**jobs**'. ctrl-f 'workflows' on https://github.com/pytorch/pytorch/blob/main/.circleci/config.yml to see the workflows.
* **jobs** are a sequence of '**steps**'
* **steps** are usually just a bash script or a builtin CircleCI command. *All steps run in new environments, environment variables declared in one script DO NOT persist to following steps*
* CircleCI has a **workspace**, which is essentially a cache between steps of the *same job* in which you can store artifacts between steps.
## How are the workflows structured?
The nightly binaries have 3 workflows. We have one job (actually 3 jobs: build, test, and upload) per binary configuration
1. binary_builds
1. every day midnight EST
2. linux: https://github.com/pytorch/pytorch/blob/main/.circleci/verbatim-sources/linux-binary-build-defaults.yml
3. macos: https://github.com/pytorch/pytorch/blob/main/.circleci/verbatim-sources/macos-binary-build-defaults.yml
4. For each binary configuration, e.g. linux_conda_3.7_cpu there is a
1. binary_linux_conda_3.7_cpu_build
1. Builds the build. On linux jobs this uses the 'docker executor'.
2. Persists the package to the workspace
2. binary_linux_conda_3.7_cpu_test
1. Loads the package to the workspace
2. Spins up a docker image (on Linux), mapping the package and code repos into the docker
3. Runs some smoke tests in the docker
4. (Actually, for macos this is a step rather than a separate job)
3. binary_linux_conda_3.7_cpu_upload
1. Logs in to aws/conda
2. Uploads the package
2. update_s3_htmls
1. every day 5am EST
2. https://github.com/pytorch/pytorch/blob/main/.circleci/verbatim-sources/binary_update_htmls.yml
3. See below for what these are for and why they're needed
4. Three jobs that each examine the current contents of aws and the conda repo and update some html files in s3
3. binarysmoketests
1. every day
2. https://github.com/pytorch/pytorch/blob/main/.circleci/verbatim-sources/nightly-build-smoke-tests-defaults.yml
3. For each binary configuration, e.g. linux_conda_3.7_cpu there is a
1. smoke_linux_conda_3.7_cpu
1. Downloads the package from the cloud, e.g. using the official pip or conda instructions
2. Runs the smoke tests
## How are the jobs structured?
The jobs are in https://github.com/pytorch/pytorch/tree/main/.circleci/verbatim-sources. Jobs are made of multiple steps. There are some shared steps used by all the binaries/smokes. Steps of these jobs are all delegated to scripts in https://github.com/pytorch/pytorch/tree/main/.circleci/scripts .
* Linux jobs: https://github.com/pytorch/pytorch/blob/main/.circleci/verbatim-sources/linux-binary-build-defaults.yml
* binary_linux_build.sh
* binary_linux_test.sh
* binary_linux_upload.sh
* MacOS jobs: https://github.com/pytorch/pytorch/blob/main/.circleci/verbatim-sources/macos-binary-build-defaults.yml
* binary_macos_build.sh
* binary_macos_test.sh
* binary_macos_upload.sh
* Update html jobs: https://github.com/pytorch/pytorch/blob/main/.circleci/verbatim-sources/binary_update_htmls.yml
* These delegate from the pytorch/builder repo
* https://github.com/pytorch/builder/blob/main/cron/update_s3_htmls.sh
* https://github.com/pytorch/builder/blob/main/cron/upload_binary_sizes.sh
* Smoke jobs (both linux and macos): https://github.com/pytorch/pytorch/blob/main/.circleci/verbatim-sources/nightly-build-smoke-tests-defaults.yml
* These delegate from the pytorch/builder repo
* https://github.com/pytorch/builder/blob/main/run_tests.sh
* https://github.com/pytorch/builder/blob/main/smoke_test.sh
* https://github.com/pytorch/builder/blob/main/check_binary.sh
* Common shared code (shared across linux and macos): https://github.com/pytorch/pytorch/blob/main/.circleci/verbatim-sources/nightly-binary-build-defaults.yml
* binary_checkout.sh - checks out pytorch/builder repo. Right now this also checks out pytorch/pytorch, but it shouldn't. pytorch/pytorch should just be shared through the workspace. This can handle being run before binary_populate_env.sh
* binary_populate_env.sh - parses BUILD_ENVIRONMENT into the separate env variables that make up a binary configuration. Also sets lots of default values, the date, the version strings, the location of folders in s3, all sorts of things. This generally has to be run before other steps.
* binary_install_miniconda.sh - Installs miniconda, cross platform. Also hacks this for the update_binary_sizes job that doesn't have the right env variables
* binary_run_in_docker.sh - Takes a bash script file (the actual test code) from a hardcoded location, spins up a docker image, and runs the script inside the docker image
### **Why do the steps all refer to scripts?**
CircleCI creates a final yaml file by inlining every <<* segment, so if we were to keep all the code in the config.yml itself then the config size would go over 4 MB and cause infra problems.
### **What is binary_run_in_docker for?**
So, CircleCI has several executor types: macos, machine, and docker are the ones we use. The 'machine' executor gives you two cores on some linux vm. The 'docker' executor gives you considerably more cores (nproc was 32 instead of 2 back when I tried in February). Since the dockers are faster, we try to run everything that we can in dockers. Thus
* linux build jobs use the docker executor. Running them on the docker executor was at least 2x faster than running them on the machine executor
* linux test jobs use the machine executor in order for them to properly interface with GPUs since docker executors cannot execute with attached GPUs
* linux upload jobs use the machine executor. The upload jobs are so short that it doesn't really matter what they use
* linux smoke test jobs use the machine executor for the same reason as the linux test jobs
binary_run_in_docker.sh is a way to share the docker start-up code between the binary test jobs and the binary smoke test jobs
### **Why does binary_checkout also checkout pytorch? Why shouldn't it?**
We want all the nightly binary jobs to run on the exact same git commit, so we wrote our own checkout logic to ensure that the same commit was always picked. Later circleci changed that to use a single pytorch checkout and persist it through the workspace (they did this because our config file was too big, so they wanted to take a lot of the setup code into scripts, but the scripts needed the code repo to exist to be called, so they added a prereq step called 'setup' to checkout the code and persist the needed scripts to the workspace). The changes to the binary jobs were not properly tested, so they all broke from missing pytorch code no longer existing. We hotfixed the problem by adding the pytorch checkout back to binary_checkout, so now there's two checkouts of pytorch on the binary jobs. This problem still needs to be fixed, but it takes careful tracing of which code is being called where.
# Code structure of the binaries (circleci agnostic)
## Overview
The code that runs the binaries lives in two places, in the normal [github.com/pytorch/pytorch](http://github.com/pytorch/pytorch), but also in [github.com/pytorch/builder](http://github.com/pytorch/builder), which is a repo that defines how all the binaries are built. The relevant code is
```
# All code needed to set-up environments for build code to run in,
# but only code that is specific to the current CI system
pytorch/pytorch
- .circleci/ # Folder that holds all circleci related stuff
- config.yml # GENERATED file that actually controls all circleci behavior
- verbatim-sources # Used to generate job/workflow sections in ^
- scripts/ # Code needed to prepare circleci environments for binary build scripts
- setup.py # Builds pytorch. This is wrapped in pytorch/builder
- cmake files # used in normal building of pytorch
# All code needed to prepare a binary build, given an environment
# with all the right variables/packages/paths.
pytorch/builder
# Given an installed binary and a proper python env, runs some checks
# to make sure the binary was built the proper way. Checks things like
# the library dependencies, symbols present, etc.
- check_binary.sh
# Given an installed binary, runs python tests to make sure everything
# is in order. These should be de-duped. Right now they both run smoke
# tests, but are called from different places. Usually just call some
# import statements, but also has overlap with check_binary.sh above
- run_tests.sh
- smoke_test.sh
# Folders that govern how packages are built. See paragraphs below
- conda/
- build_pytorch.sh # Entrypoint. Delegates to proper conda build folder
- switch_cuda_version.sh # Switches activate CUDA installation in Docker
- pytorch-nightly/ # Build-folder
- manywheel/
- build_cpu.sh # Entrypoint for cpu builds
- build.sh # Entrypoint for CUDA builds
- build_common.sh # Actual build script that ^^ call into
- wheel/
- build_wheel.sh # Entrypoint for wheel builds
- windows/
- build_pytorch.bat # Entrypoint for wheel builds on Windows
```
Every type of package has an entrypoint build script that handles the all the important logic.
## Conda
Linux, MacOS and Windows use the same code flow for the conda builds.
Conda packages are built with conda-build, see https://conda.io/projects/conda-build/en/latest/resources/commands/conda-build.html
Basically, you pass `conda build` a build folder (pytorch-nightly/ above) that contains a build script and a meta.yaml. The meta.yaml specifies in what python environment to build the package in, and what dependencies the resulting package should have, and the build script gets called in the env to build the thing.
tl;dr on conda-build is
1. Creates a brand new conda environment, based off of deps in the meta.yaml
1. Note that environment variables do not get passed into this build env unless they are specified in the meta.yaml
2. If the build fails this environment will stick around. You can activate it for much easier debugging. The “General Python” section below explains what exactly a python “environment” is.
2. Calls build.sh in the environment
3. Copies the finished package to a new conda env, also specified by the meta.yaml
4. Runs some simple import tests (if specified in the meta.yaml)
5. Saves the finished package as a tarball
The build.sh we use is essentially a wrapper around `python setup.py build`, but it also manually copies in some of our dependent libraries into the resulting tarball and messes with some rpaths.
The entrypoint file `builder/conda/build_conda.sh` is complicated because
* It works for Linux, MacOS and Windows
* The mac builds used to create their own environments, since they all used to be on the same machine. Theres now a lot of extra logic to handle conda envs. This extra machinery could be removed
* It used to handle testing too, which adds more logic messing with python environments too. This extra machinery could be removed.
## Manywheels (linux pip and libtorch packages)
Manywheels are pip packages for linux distros. Note that these manywheels are not actually manylinux compliant.
`builder/manywheel/build_cpu.sh` and `builder/manywheel/build.sh` (for CUDA builds) just set different env vars and then call into `builder/manywheel/build_common.sh`
The entrypoint file `builder/manywheel/build_common.sh` is really really complicated because
* This used to handle building for several different python versions at the same time. The loops have been removed, but there's still unnecessary folders and movements here and there.
* The script is never used this way anymore. This extra machinery could be removed.
* This used to handle testing the pip packages too. This is why theres testing code at the end that messes with python installations and stuff
* The script is never used this way anymore. This extra machinery could be removed.
* This also builds libtorch packages
* This should really be separate. libtorch packages are c++ only and have no python. They should not share infra with all the python specific stuff in this file.
* There is a lot of messing with rpaths. This is necessary, but could be made much much simpler if the above issues were fixed.
## Wheels (MacOS pip and libtorch packages)
The entrypoint file `builder/wheel/build_wheel.sh` is complicated because
* The mac builds used to all run on one machine (we didnt have autoscaling mac machines till circleci). So this script handled siloing itself by setting-up and tearing-down its build env and siloing itself into its own build directory.
* The script is never used this way anymore. This extra machinery could be removed.
* This also builds libtorch packages
* Ditto the comment above. This should definitely be separated out.
Note that the MacOS Python wheels are still built in conda environments. Some of the dependencies present during build also come from conda.
## Windows Wheels (Windows pip and libtorch packages)
The entrypoint file `builder/windows/build_pytorch.bat` is complicated because
* This used to handle building for several different python versions at the same time. This is why there are loops everywhere
* The script is never used this way anymore. This extra machinery could be removed.
* This used to handle testing the pip packages too. This is why theres testing code at the end that messes with python installations and stuff
* The script is never used this way anymore. This extra machinery could be removed.
* This also builds libtorch packages
* This should really be separate. libtorch packages are c++ only and have no python. They should not share infra with all the python specific stuff in this file.
Note that the Windows Python wheels are still built in conda environments. Some of the dependencies present during build also come from conda.
## General notes
### Note on run_tests.sh, smoke_test.sh, and check_binary.sh
* These should all be consolidated
* These must run on all OS types: MacOS, Linux, and Windows
* These all run smoke tests at the moment. They inspect the packages some, maybe run a few import statements. They DO NOT run the python tests nor the cpp tests. The idea is that python tests on main and PR merges will catch all breakages. All these tests have to do is make sure the special binary machinery didnt mess anything up.
* There are separate run_tests.sh and smoke_test.sh because one used to be called by the smoke jobs and one used to be called by the binary test jobs (see circleci structure section above). This is still true actually, but these could be united into a single script that runs these checks, given an installed pytorch package.
### Note on libtorch
Libtorch packages are built in the wheel build scripts: manywheel/build_*.sh for linux and build_wheel.sh for mac. There are several things wrong with this
* Its confusing. Most of those scripts deal with python specifics.
* The extra conditionals everywhere severely complicate the wheel build scripts
* The process for building libtorch is different from the official instructions (a plain call to cmake, or a call to a script)
### Note on docker images / Dockerfiles
All linux builds occur in docker images. The docker images are
* pytorch/conda-cuda
* Has ALL CUDA versions installed. The script pytorch/builder/conda/switch_cuda_version.sh sets /usr/local/cuda to a symlink to e.g. /usr/local/cuda-10.0 to enable different CUDA builds
* Also used for cpu builds
* pytorch/manylinux-cuda90
* pytorch/manylinux-cuda100
* Also used for cpu builds
The Dockerfiles are available in pytorch/builder, but there is no circleci job or script to build these docker images, and they cannot be run locally (unless you have the correct local packages/paths). Only Soumith can build them right now.
### General Python
* This is still a good explanation of python installations https://caffe2.ai/docs/faq.html#why-do-i-get-import-errors-in-python-when-i-try-to-use-caffe2
# How to manually rebuild the binaries
tl;dr make a PR that looks like https://github.com/pytorch/pytorch/pull/21159
Sometimes we want to push a change to mainand then rebuild all of today's binaries after that change. As of May 30, 2019 there isn't a way to manually run a workflow in the UI. You can manually re-run a workflow, but it will use the exact same git commits as the first run and will not include any changes. So we have to make a PR and then force circleci to run the binary workflow instead of the normal tests. The above PR is an example of how to do this; essentially you copy-paste the binarybuilds workflow steps into the default workflow steps. If you need to point the builder repo to a different commit then you'd need to change https://github.com/pytorch/pytorch/blob/main/.circleci/scripts/binary_checkout.sh#L42-L45 to checkout what you want.
## How to test changes to the binaries via .circleci
Writing PRs that test the binaries is annoying, since the default circleci jobs that run on PRs are not the jobs that you want to run. Likely, changes to the binaries will touch something under .circleci/ and require that .circleci/config.yml be regenerated (.circleci/config.yml controls all .circleci behavior, and is generated using `.circleci/regenerate.sh` in python 3.7). But you also need to manually hardcode the binary jobs that you want to test into the .circleci/config.yml workflow, so you should actually make at least two commits, one for your changes and one to temporarily hardcode jobs. See https://github.com/pytorch/pytorch/pull/22928 as an example of how to do this.
```sh
# Make your changes
touch .circleci/verbatim-sources/nightly-binary-build-defaults.yml
# Regenerate the yaml, has to be in python 3.7
.circleci/regenerate.sh
# Make a commit
git add .circleci *
git commit -m "My real changes"
git push origin my_branch
# Now hardcode the jobs that you want in the .circleci/config.yml workflows section
# Also eliminate ensure-consistency and should_run_job checks
# e.g. https://github.com/pytorch/pytorch/commit/2b3344bfed8772fe86e5210cc4ee915dee42b32d
# Make a commit you won't keep
git add .circleci
git commit -m "[DO NOT LAND] testing binaries for above changes"
git push origin my_branch
# Now you need to make some changes to the first commit.
git rebase -i HEAD~2 # mark the first commit as 'edit'
# Make the changes
touch .circleci/verbatim-sources/nightly-binary-build-defaults.yml
.circleci/regenerate.sh
# Ammend the commit and recontinue
git add .circleci
git commit --amend
git rebase --continue
# Update the PR, need to force since the commits are different now
git push origin my_branch --force
```
The advantage of this flow is that you can make new changes to the base commit and regenerate the .circleci without having to re-write which binary jobs you want to test on. The downside is that all updates will be force pushes.
## How to build a binary locally
### Linux
You can build Linux binaries locally easily using docker.
```sh
# Run the docker
# Use the correct docker image, pytorch/conda-cuda used here as an example
#
# -v path/to/foo:path/to/bar makes path/to/foo on your local machine (the
# machine that you're running the command on) accessible to the docker
# container at path/to/bar. So if you then run `touch path/to/bar/baz`
# in the docker container then you will see path/to/foo/baz on your local
# machine. You could also clone the pytorch and builder repos in the docker.
#
# If you know how, add ccache as a volume too and speed up everything
docker run \
-v your/pytorch/repo:/pytorch \
-v your/builder/repo:/builder \
-v where/you/want/packages/to/appear:/final_pkgs \
-it pytorch/conda-cuda /bin/bash
# Export whatever variables are important to you. All variables that you'd
# possibly need are in .circleci/scripts/binary_populate_env.sh
# You should probably always export at least these 3 variables
export PACKAGE_TYPE=conda
export DESIRED_PYTHON=3.7
export DESIRED_CUDA=cpu
# Call the entrypoint
# `|& tee foo.log` just copies all stdout and stderr output to foo.log
# The builds generate lots of output so you probably need this when
# building locally.
/builder/conda/build_pytorch.sh |& tee build_output.log
```
**Building CUDA binaries on docker**
You can build CUDA binaries on CPU only machines, but you can only run CUDA binaries on CUDA machines. This means that you can build a CUDA binary on a docker on your laptop if you so choose (though its gonna take a long time).
For Facebook employees, ask about beefy machines that have docker support and use those instead of your laptop; it will be 5x as fast.
### MacOS
Theres no easy way to generate reproducible hermetic MacOS environments. If you have a Mac laptop then you can try emulating the .circleci environments as much as possible, but you probably have packages in /usr/local/, possibly installed by brew, that will probably interfere with the build. If youre trying to repro an error on a Mac build in .circleci and you cant seem to repro locally, then my best advice is actually to iterate on .circleci :/
But if you want to try, then Id recommend
```sh
# Create a new terminal
# Clear your LD_LIBRARY_PATH and trim as much out of your PATH as you
# know how to do
# Install a new miniconda
# First remove any other python or conda installation from your PATH
# Always install miniconda 3, even if building for Python <3
new_conda="~/my_new_conda"
conda_sh="$new_conda/install_miniconda.sh"
curl -o "$conda_sh" https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
chmod +x "$conda_sh"
"$conda_sh" -b -p "$MINICONDA_ROOT"
rm -f "$conda_sh"
export PATH="~/my_new_conda/bin:$PATH"
# Create a clean python env
# All MacOS builds use conda to manage the python env and dependencies
# that are built with, even the pip packages
conda create -yn binary python=2.7
conda activate binary
# Export whatever variables are important to you. All variables that you'd
# possibly need are in .circleci/scripts/binary_populate_env.sh
# You should probably always export at least these 3 variables
export PACKAGE_TYPE=conda
export DESIRED_PYTHON=3.7
export DESIRED_CUDA=cpu
# Call the entrypoint you want
path/to/builder/wheel/build_wheel.sh
```
N.B. installing a brand new miniconda is important. This has to do with how conda installations work. See the “General Python” section above, but tldr; is that
1. You make the conda command accessible by prepending `path/to/conda_root/bin` to your PATH.
2. You make a new env and activate it, which then also gets prepended to your PATH. Now you have `path/to/conda_root/envs/new_env/bin:path/to/conda_root/bin:$PATH`
3. Now say you (or some code that you ran) call python executable `foo`
1. if you installed `foo` in `new_env`, then `path/to/conda_root/envs/new_env/bin/foo` will get called, as expected.
2. But if you forgot to installed `foo` in `new_env` but happened to previously install it in your root conda env (called base), then unix/linux will still find `path/to/conda_root/bin/foo` . This is dangerous, since `foo` can be a different version than you want; `foo` can even be for an incompatible python version!
Newer conda versions and proper python hygiene can prevent this, but just install a new miniconda to be safe.
### Windows
TODO: fill in

View File

@ -0,0 +1,198 @@
"""
This module models the tree of configuration variants
for "smoketest" builds.
Each subclass of ConfigNode represents a layer of the configuration hierarchy.
These tree nodes encapsulate the logic for whether a branch of the hierarchy
should be "pruned".
"""
from collections import OrderedDict
import cimodel.data.dimensions as dimensions
from cimodel.lib.conf_tree import ConfigNode
LINKING_DIMENSIONS = [
"shared",
"static",
]
DEPS_INCLUSION_DIMENSIONS = [
"with-deps",
"without-deps",
]
def get_processor_arch_name(gpu_version):
return (
"cpu"
if not gpu_version
else (
"cu" + gpu_version.strip("cuda")
if gpu_version.startswith("cuda")
else gpu_version
)
)
CONFIG_TREE_DATA = OrderedDict()
# GCC config variants:
#
# All the nightlies (except libtorch with new gcc ABI) are built with devtoolset7,
# which can only build with old gcc ABI. It is better than devtoolset3
# because it understands avx512, which is needed for good fbgemm performance.
#
# Libtorch with new gcc ABI is built with gcc 5.4 on Ubuntu 16.04.
LINUX_GCC_CONFIG_VARIANTS = OrderedDict(
manywheel=["devtoolset7"],
conda=["devtoolset7"],
libtorch=[
"devtoolset7",
"gcc5.4_cxx11-abi",
],
)
WINDOWS_LIBTORCH_CONFIG_VARIANTS = [
"debug",
"release",
]
class TopLevelNode(ConfigNode):
def __init__(self, node_name, config_tree_data, smoke):
super().__init__(None, node_name)
self.config_tree_data = config_tree_data
self.props["smoke"] = smoke
def get_children(self):
return [
OSConfigNode(self, x, c, p) for (x, (c, p)) in self.config_tree_data.items()
]
class OSConfigNode(ConfigNode):
def __init__(self, parent, os_name, gpu_versions, py_tree):
super().__init__(parent, os_name)
self.py_tree = py_tree
self.props["os_name"] = os_name
self.props["gpu_versions"] = gpu_versions
def get_children(self):
return [PackageFormatConfigNode(self, k, v) for k, v in self.py_tree.items()]
class PackageFormatConfigNode(ConfigNode):
def __init__(self, parent, package_format, python_versions):
super().__init__(parent, package_format)
self.props["python_versions"] = python_versions
self.props["package_format"] = package_format
def get_children(self):
if self.find_prop("os_name") == "linux":
return [
LinuxGccConfigNode(self, v)
for v in LINUX_GCC_CONFIG_VARIANTS[self.find_prop("package_format")]
]
elif (
self.find_prop("os_name") == "windows"
and self.find_prop("package_format") == "libtorch"
):
return [
WindowsLibtorchConfigNode(self, v)
for v in WINDOWS_LIBTORCH_CONFIG_VARIANTS
]
else:
return [ArchConfigNode(self, v) for v in self.find_prop("gpu_versions")]
class LinuxGccConfigNode(ConfigNode):
def __init__(self, parent, gcc_config_variant):
super().__init__(parent, "GCC_CONFIG_VARIANT=" + str(gcc_config_variant))
self.props["gcc_config_variant"] = gcc_config_variant
def get_children(self):
gpu_versions = self.find_prop("gpu_versions")
# XXX devtoolset7 on CUDA 9.0 is temporarily disabled
# see https://github.com/pytorch/pytorch/issues/20066
if self.find_prop("gcc_config_variant") == "devtoolset7":
gpu_versions = filter(lambda x: x != "cuda_90", gpu_versions)
# XXX disabling conda rocm build since docker images are not there
if self.find_prop("package_format") == "conda":
gpu_versions = filter(
lambda x: x not in dimensions.ROCM_VERSION_LABELS, gpu_versions
)
# XXX libtorch rocm build is temporarily disabled
if self.find_prop("package_format") == "libtorch":
gpu_versions = filter(
lambda x: x not in dimensions.ROCM_VERSION_LABELS, gpu_versions
)
return [ArchConfigNode(self, v) for v in gpu_versions]
class WindowsLibtorchConfigNode(ConfigNode):
def __init__(self, parent, libtorch_config_variant):
super().__init__(
parent, "LIBTORCH_CONFIG_VARIANT=" + str(libtorch_config_variant)
)
self.props["libtorch_config_variant"] = libtorch_config_variant
def get_children(self):
return [ArchConfigNode(self, v) for v in self.find_prop("gpu_versions")]
class ArchConfigNode(ConfigNode):
def __init__(self, parent, gpu):
super().__init__(parent, get_processor_arch_name(gpu))
self.props["gpu"] = gpu
def get_children(self):
return [PyVersionConfigNode(self, v) for v in self.find_prop("python_versions")]
class PyVersionConfigNode(ConfigNode):
def __init__(self, parent, pyver):
super().__init__(parent, pyver)
self.props["pyver"] = pyver
def get_children(self):
package_format = self.find_prop("package_format")
os_name = self.find_prop("os_name")
has_libtorch_variants = package_format == "libtorch" and os_name == "linux"
linking_variants = LINKING_DIMENSIONS if has_libtorch_variants else []
return [LinkingVariantConfigNode(self, v) for v in linking_variants]
class LinkingVariantConfigNode(ConfigNode):
def __init__(self, parent, linking_variant):
super().__init__(parent, linking_variant)
def get_children(self):
return [
DependencyInclusionConfigNode(self, v) for v in DEPS_INCLUSION_DIMENSIONS
]
class DependencyInclusionConfigNode(ConfigNode):
def __init__(self, parent, deps_variant):
super().__init__(parent, deps_variant)
self.props["libtorch_variant"] = "-".join(
[self.parent.get_label(), self.get_label()]
)

Some files were not shown because too many files have changed in this diff Show More