61 Commits

Author SHA1 Message Date
66f53889d5 [nativert] port semaphore to c10 util (#153504)
Summary:
nativert RFC: https://github.com/zhxchen17/rfcs/blob/master/RFC-0043-torch-native-runtime.md

To land the runtime into PyTorch core, we will gradually land logical parts of the code into the GitHub repo and get each piece properly reviewed.

This diff adds a simple semaphore interface to c10 as a stopgap until C++20, which provides std::counting_semaphore.
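For reference, counting-semaphore semantics in a minimal sketch (Python's threading.Semaphore is the closest stdlib analogue to C++20's std::counting_semaphore; this is an illustration of the concept, not the c10 interface itself):

```python
# A counting semaphore allows up to N concurrent holders; further
# acquires block until another holder releases.
import threading

sem = threading.Semaphore(2)  # at most 2 concurrent holders

def worker(i: int) -> None:
    with sem:  # acquire on entry, release on exit
        print(f"worker {i} holds one of the 2 permits")

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```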

Gonna need an OSS build export to take a look at this...

Test Plan: CI

Differential Revision: D73882656

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153504
Approved by: https://github.com/zhxchen17
2025-05-28 19:17:30 +00:00
05bc78e64f [submodule] Update fbgemm pinned version (#153950)
Summary:
Update fbgemm pinned version in PyTorch.
Related update in fbgemm: D74434751

Included changes:
- Update the fbgemm external dependencies directory in setup.py
- Add the DISABLE_FBGEMM_AUTOVEC flag to disable fbgemm's autovec

Test Plan: PyTorch OSS CI

Differential Revision: D75073516

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153950
Approved by: https://github.com/Skylion007, https://github.com/ngimel
2025-05-20 20:24:27 +00:00
41b38f755c Revert "Reverting the PR adding Kleidiai-based int4 kernels (#145392)" (#145505)
https://github.com/pytorch/pytorch/pull/134124 was reverted by https://github.com/pytorch/pytorch/pull/145392 due to a KleidiAI clone issue.

1. This reverts commit 0940eb6d44f3cf69dd840db990245cbe1f78e770 (https://github.com/pytorch/pytorch/pull/145392) and fixes the KleidiAI mirror issue.
2. KleidiAI is now cloned from the GitHub mirror instead of Arm's GitLab.

Change-Id: I7d6eee7214cd117d3057d615936fcc3ee6052fa2

Fixes https://github.com/pytorch/pytorch/issues/145273

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145505
Approved by: https://github.com/malfet
2025-01-23 18:50:59 +00:00
0940eb6d44 Reverting the PR adding Kleidiai-based int4 kernels (#145392)
Mitigation for https://github.com/pytorch/pytorch/issues/145273
Reverting https://github.com/pytorch/pytorch/pull/134124 and https://github.com/pytorch/pytorch/pull/144074

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145392
Approved by: https://github.com/ZainRizvi, https://github.com/malfet, https://github.com/atalman, https://github.com/digantdesai
2025-01-22 20:11:49 +00:00
94737e8a2a [ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124)
Description:
1. Quantize Linear Layer Weights to 4-bits:
Quantize the weights of the Linear layer to 4 bits, using symmetric quantization.
Pack two 4-bit weights into one uint8 container.
Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32.
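For intuition, a minimal sketch of the two-values-per-byte packing (whether the op stores the low nibble first is an assumption here, not specified above):

```python
# Pack pairs of 4-bit values (0..15) into single uint8 bytes, then unpack.
import torch

q = torch.tensor([3, 12, 7, 1], dtype=torch.uint8)  # 4-bit values
packed = (q[1::2] << 4) | q[0::2]                    # two values per byte
low, high = packed & 0xF, packed >> 4                # recover both nibbles
assert torch.equal(low, q[0::2]) and torch.equal(high, q[1::2])
```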

2. Prepare Quantized Weights, Scales, and Optional Bias:
After quantizing, obtain the quantized_weights, scales, and groupsize.
If the original Linear layer has a bias, prepare it as well.

3. Pack the Weights Efficiently:
Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias.
```python
packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features)
```
Input parameters should include:
in_features and out_features (the same as the Linear layer’s corresponding parameters).

4. Perform Dynamic Quantized Matrix Multiplication:
Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights.
```python
output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights, groupsize, in_features, out_features)
```
Inputs required include:
The input tensor, packed_weights, groupsize, in_features, and out_features.

API Usage: https://github.com/pytorch/pytorch/issues/143289
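Putting the two ops together, a minimal end-to-end sketch (the shapes, dtypes, and scales_and_zeros layout below are illustrative assumptions, and the ops require a build with KleidiAI enabled):

```python
import torch

in_features, out_features, groupsize = 64, 32, 32

# Assumed layout: two 4-bit weights per uint8 byte, one float scale per
# group (symmetric quantization, so zero points go unused).
weight = torch.randint(0, 256, (out_features, in_features // 2), dtype=torch.uint8)
scales_and_zeros = torch.rand(out_features, in_features // groupsize, dtype=torch.float32)
bias = None

packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(
    weight, scales_and_zeros, bias, groupsize, in_features, out_features
)

x = torch.rand(2, in_features, dtype=torch.float32)  # activations quantized dynamically
output = torch.ops.aten._dyn_quant_matmul_4bit(
    x, packed_weights, groupsize, in_features, out_features
)
print(output.shape)  # expected: torch.Size([2, 32])
```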

Model Perf:
7B Transformer model:
Prefill: 340 t/s
Decode: 40 t/s
2B Transformer model:
Prefill: 747 t/s
Decode: 80 t/s

Tests:
python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight
Ran 1 test in 0.016s

OK

python test/test_linalg.py -k test__dyn_quant_matmul_4bit
Ran 8 tests in 0.077s

OK

python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit
Ran 8 tests in 11.454s

Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134124
Approved by: https://github.com/digantdesai, https://github.com/malfet
2024-12-20 19:32:03 +00:00
8136daff5a Revert "[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124)"
This reverts commit 4b82251011f85f9d1395b451d61e976af844d9b1.

Reverted https://github.com/pytorch/pytorch/pull/134124 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it breaks lots of internal build ([comment](https://github.com/pytorch/pytorch/pull/134124#issuecomment-2555953189))
2024-12-19 23:33:17 +00:00
4b82251011 [ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124)
Description:
1. Quantize Linear Layer Weights to 4-bits:
Quantize the weights of the Linear layer to 4 bits, using symmetric quantization.
Pack two 4-bit weights into one uint8 container.
Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32.

2. Prepare Quantized Weights, Scales, and Optional Bias:
After quantizing, obtain the quantized_weights, scales, and groupsize.
If the original Linear layer has a bias, prepare it as well.

3. Pack the Weights Efficiently:
Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias.
```python
packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features)
```
Input parameters should include:
in_features and out_features (the same as the Linear layer’s corresponding parameters).

4. Perform Dynamic Quantized Matrix Multiplication:
Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights.
```python
output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights, groupsize, in_features, out_features)
```
Inputs required include:
The input tensor, packed_weights, groupsize, in_features, and out_features.

API Usage: https://github.com/pytorch/pytorch/issues/143289

Model Perf:
7B Transformer model:
Prefill: 340 t/s
Decode: 40 t/s
2B Transformer model:
Prefill: 747 t/s
Decode: 80 t/s

Tests:
python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight
Ran 1 test in 0.016s

OK

python test/test_linalg.py -k test__dyn_quant_matmul_4bit
Ran 8 tests in 0.077s

OK

python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit
Ran 8 tests in 11.454s

Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134124
Approved by: https://github.com/digantdesai, https://github.com/malfet
2024-12-19 18:51:26 +00:00
14fe1f7190 Revert "[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124)"
This reverts commit d3ff2d42c28a2c187cbedfd8f60b84a4dfa2d6bf.

Reverted https://github.com/pytorch/pytorch/pull/134124 on behalf of https://github.com/malfet due to This broke S390 builds, includes cpuinfo unconditionally ([comment](https://github.com/pytorch/pytorch/pull/134124#issuecomment-2552560208))
2024-12-19 01:05:11 +00:00
d3ff2d42c2 [ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124)
Description:
1. Quantize Linear Layer Weights to 4-bits:
Quantize the weights of the Linear layer to 4 bits, using symmetric quantization.
Pack two 4-bit weights into one uint8 container.
Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32.

2. Prepare Quantized Weights, Scales, and Optional Bias:
After quantizing, obtain the quantized_weights, scales, and groupsize.
If the original Linear layer has a bias, prepare it as well.

3. Pack the Weights Efficiently:
Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias.
```python
packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features)
```
Input parameters should include:
in_features and out_features (the same as the Linear layer’s corresponding parameters).

4. Perform Dynamic Quantized Matrix Multiplication:
Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights.
```python
output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights, groupsize, in_features, out_features)
```
Inputs required include:
The input tensor, packed_weights, groupsize, in_features, and out_features.

API Usage: https://github.com/pytorch/pytorch/issues/143289

Model Perf:
7B Transformer model:
Prefill: 340 t/s
Decode: 40 t/s
2B Transformer model:
Prefill: 747 t/s
Decode: 80 t/s

Tests:
python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight
Ran 1 test in 0.016s

OK

python test/test_linalg.py -k test__dyn_quant_matmul_4bit
Ran 8 tests in 0.077s

OK

python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit
Ran 8 tests in 11.454s

Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134124
Approved by: https://github.com/digantdesai, https://github.com/malfet
2024-12-18 22:30:07 +00:00
f0f6144381 [EZ][BE] Update googletest submodule (#140988)
From v1.11.0 (released in Jun 2021) to v1.15.2 (released in Jul 2024)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/140988
Approved by: https://github.com/izaitsevfb, https://github.com/huydhn
2024-11-19 07:49:16 +00:00
cyy
05e8e87a69 [Submodule] Remove foxi (#132976)
It is no longer used after the removal of the Caffe2 code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132976
Approved by: https://github.com/ezyang
2024-08-09 03:46:52 +00:00
64f1111d38 Expose nlohmann json to torch (#129570)
Summary:

Expose nlohmann json library so that it can be used from inside Pytorch. The library already exists in the `third_party` directory. This PR is making `nlohmann/json.hpp` header available to be used from `torch.distributed`.
The next PR makes actual use of this header.

imported-using-ghimport

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D59035246

Pulled By: c-p-i-o

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129570
Approved by: https://github.com/d4l3k, https://github.com/malfet
2024-06-26 21:59:26 +00:00
597922ba21 Reapply "distributed debug handlers (#126601)" (#127805)
This reverts commit 7646825c3eb687030c4f873b01312be0eed80174.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127805
Approved by: https://github.com/PaliC
2024-06-04 19:44:30 +00:00
7646825c3e Revert "distributed debug handlers (#126601)"
This reverts commit 3d541835d509910fceca00fc5a916e9718c391d8.

Reverted https://github.com/pytorch/pytorch/pull/126601 on behalf of https://github.com/PaliC due to breaking internal typechecking tests ([comment](https://github.com/pytorch/pytorch/pull/126601#issuecomment-2141076987))
2024-05-31 01:21:24 +00:00
cyy
d44daebdbc [Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051
Approved by: https://github.com/cpuhrsch, https://github.com/malfet
2024-05-31 01:20:45 +00:00
3d541835d5 distributed debug handlers (#126601)
This adds debug handlers as described in:
* https://gist.github.com/d4l3k/828b7be585c7615e85b2c448b308d925 (public copy)
* https://docs.google.com/document/d/1la68szcS6wUYElUUX-P6zXgkPA8lnfzpagMTPys3aQ8/edit (internal copy)

This is only adding the C++ pieces that will be used from the main process. The Python and torchrun pieces will be added in a follow up PR.

This adds 2 handlers out of the box:

* `/handler/ping` for testing purposes
* `/handler/dump_nccl_trace_pickle` as a POC integration with Flight Recorder
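For illustration, poking the ping handler could look like the following (the localhost address and port are assumptions for this sketch, not taken from the PR):

```python
# Assumes the control-plane server is reachable over plain HTTP locally;
# the port here is arbitrary and purely illustrative.
import urllib.request

with urllib.request.urlopen("http://localhost:12345/handler/ping") as resp:
    print(resp.status, resp.read().decode())
```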

Test plan:

```
python test/distributed/elastic/test_control_plane.py
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126601
Approved by: https://github.com/kurman, https://github.com/c-p-i-o
2024-05-30 02:21:08 +00:00
67739d8c6f Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)"
This reverts commit 699db7988d84d163ebb6919f78885e4630182a7a.

Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2138496995))
2024-05-30 01:16:57 +00:00
cyy
699db7988d [Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051
Approved by: https://github.com/cpuhrsch, https://github.com/malfet
2024-05-29 11:58:03 +00:00
cdbb2c9acc Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)"
This reverts commit 4fdbaa794f9d5af2f171f772a51cb710c51c925f.

Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2136428735))
2024-05-29 03:02:35 +00:00
cyy
4fdbaa794f [Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051
Approved by: https://github.com/cpuhrsch, https://github.com/malfet
2024-05-27 03:54:03 +00:00
cyy
574ae9afb8 [Submodule] Remove third-party onnx-tensorrt (#126542)
It seems that tensorrt is not used by the C++ code, possibly due to the removal of Caffe2.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126542
Approved by: https://github.com/ezyang
2024-05-19 22:34:24 +00:00
fd90991790 [rfc] opentelemetry in pytorch (#122999)
1. Add the current latest version of opentelemetry-cpp (v1.14.2) to the PyTorch library.
Steps:
```
$cd pytorch
$git submodule add https://github.com/open-telemetry/opentelemetry-cpp.git third_party/opentelemetry-cpp
$cd third_party/opentelemetry-cpp
$git checkout v1.14.2
$cd ../..  # back to the repo root so the paths below resolve
$git add third_party/opentelemetry-cpp .gitmodules
$git commit
```
Expected change in checkout size:
```
(/home/cpio/local/a/pytorch-env) [cpio@devvm17556.vll0 ~/local/pytorch (gh/c-p-i-o/otel)]$ git count-objects -vH
count: 654
size: 3.59 MiB
in-pack: 1229701
packs: 17
size-pack: 1.17 GiB
prune-packable: 76
garbage: 0
size-garbage: 0 bytes
```

2. TODO
- [x] Figure out how dynamic linking works. App builders will somehow need to `target_include` opentelemetry-cpp at runtime.
- [ ] Examples on how to use opentelemetry + pytorch
- [ ] Tests + documentation (e.g. using null opentelemetry implementation).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122999
Approved by: https://github.com/ezyang
2024-04-21 15:20:21 +00:00
95a090fb56 [CI] Update bazel deps (#124076)
- Update `WORKSPACE` to actually use Python-3.10, as the job name claims it does
- Get rid of unneeded `future` and `six` dependencies (removed a long time ago)
- Update `requests`, `typing-extensions` and `setuptools` to the latest releases
- Mark `tools/build/bazel/requirements.txt` as a generated file

This also updates idna to 3.7, which contains a fix for [CVE-2024-3651](https://github.com/advisories/GHSA-jjg7-2v4v-x38h), though as we are not shipping a binary with it, it does not expose the CI system to any actual risk

TODOs:
 - Add a periodic job that runs `pip compile` to update these to the latest versions
 - Unify the various requirements.txt files (i.e. bazel requirements and requirements-ci should be one and the same)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124076
Approved by: https://github.com/seemethere, https://github.com/DanilBaibak
2024-04-15 20:39:50 +00:00
ba06951c66 [BE] [cuDNN] Always build assuming cuDNN >= 8.1 (#95722)
### <samp>🤖 Generated by Copilot at 27084ed</samp>

This pull request simplifies and cleans up the code that uses the cuDNN library for convolution, batch normalization, CTC loss, and quantized operations. It removes the unnecessary checks and conditions for older cuDNN versions and the experimental cuDNN v8 API, and ~~replaces them with the stable `cudnn_frontend` API that requires cuDNN v8 or higher. It also adds the dependency and configuration for the `cudnn_frontend` library in the cmake and bazel files.~~ Correction: The v7 API will still be available with this PR, and can still be used, without any changes to the defaults. This change simply always _builds_ the v8 API, and removes the case where _only_ the v7 API is built.

This is a re-land of https://github.com/pytorch/pytorch/pull/91527

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95722
Approved by: https://github.com/malfet, https://github.com/atalman
2024-01-03 15:41:28 +00:00
4ea7430ffb [BE] Don't copy CuDNN libs twice (#115872)
- It was installed twice: once in the `/usr/local/cuda/lib64` folder and a 2nd time in `/usr/lib64`
- And don't install CuDNN headers thrice, only in `/usr/local/cuda/include`
- Error on unknown CUDA version
- Modify bazel builds to look for cudnn in the `/usr/local/cuda` folder
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115872
Approved by: https://github.com/huydhn
2023-12-15 09:47:14 +00:00
3c9a59cb8d Revert "[BE] [cuDNN] Always build assuming cuDNN >= 8.0 (#95722)"
This reverts commit df4f0b3829f8e8b623f4e94a8536cfa58ccfb9af.

Reverted https://github.com/pytorch/pytorch/pull/95722 on behalf of https://github.com/PaliC due to is breaking a bunch of internal pytorch users ([comment](https://github.com/pytorch/pytorch/pull/95722#issuecomment-1806131675))
2023-11-10 17:26:36 +00:00
df4f0b3829 [BE] [cuDNN] Always build assuming cuDNN >= 8.0 (#95722)
### <samp>🤖 Generated by Copilot at 27084ed</samp>

This pull request simplifies and cleans up the code that uses the cuDNN library for convolution, batch normalization, CTC loss, and quantized operations. It removes the unnecessary checks and conditions for older cuDNN versions and the experimental cuDNN v8 API, and ~~replaces them with the stable `cudnn_frontend` API that requires cuDNN v8 or higher. It also adds the dependency and configuration for the `cudnn_frontend` library in the cmake and bazel files.~~ Correction: The v7 API will still be available with this PR, and can still be used, without any changes to the defaults. This change simply always _builds_ the v8 API, and removes the case where _only_ the v7 API is built.

This is a re-land of https://github.com/pytorch/pytorch/pull/91527

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95722
Approved by: https://github.com/malfet
2023-11-08 07:53:23 +00:00
9bbee245fe update rules_python and let bazel install its own pip dependencies (#101405)

Summary:
This is the official way of doing Python in Bazel.

Test Plan: Rely on CI.

---
Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/pytorch/pytorch/pull/101405).
* #101406
* __->__ #101405
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101405
Approved by: https://github.com/vors, https://github.com/huydhn
2023-05-23 06:20:33 +00:00
47d31364d7 run buildifier on WORKSPACE (#101411)

Summary: Make it easier to keep the file clean with subsequent changes.

Test Plan: Should be a no-op.

---
Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/pytorch/pytorch/pull/101411).
* #101406
* #101405
* #101445
* #101424
* __->__ #101411
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101411
Approved by: https://github.com/huydhn
2023-05-16 14:53:28 +00:00
630593d3cc [bazel] add python targets (#101003)
This PR adds bazel python targets, so that the bazel build can be used from python via `import torch`.

Notable changes:
- Add the python targets.
- Add the version.py.tpl generation.
- In order to achieve `USE_GLOBAL_DEPS = False` just for the bazel build, employ a monkey-patch hack in the aforementioned `version.py.tpl`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101003
Approved by: https://github.com/huydhn
2023-05-12 19:44:01 +00:00
447f5b5e2d [bazel] enable sccache+nvcc in CI (#95528)
Fixes #79348

This change is mostly focused on enabling nvcc+sccache in the PyTorch CI.

Along the way we had to make a couple of tweaks:
1.  Split the rules_cc from the rules_cuda that embedded them before. This is needed in order to apply a different patch to the rules_cc than the one that rules_cuda applies by default. This is in turn needed because we need to work around an nvcc behavior where it doesn't send `-iquote xxx` to the host compiler, but it does send `-isystem xxx`. So we work around this problem by (ab)using `-isystem` instead. Without it we get errors like `xxx is not found`.

2. Work around a bug in bazel https://github.com/bazelbuild/bazel/issues/10167 that prevents us from using a straightforward and honest `nvcc` sccache wrapper. Instead we generate an ad-hoc bazel-specific nvcc wrapper that has internal knowledge of the relative bazel paths to local_cuda. This allows us to work around the issue with CUDA symlinks. Without it we get `undeclared inclusion(s) in rule` errors all over the place for CUDA headers.

## Test plan

Green CI build https://github.com/pytorch/pytorch/actions/runs/4267147180/jobs/7428431740

Note that now it says "CUDA" in the sccache output

```
+ sccache --show-stats
Compile requests                    9784
Compile requests executed           6726
Cache hits                          6200
Cache hits (C/C++)                  6131
Cache hits (CUDA)                     69
Cache misses                         519
Cache misses (C/C++)                 201
Cache misses (CUDA)                  318
Cache timeouts                         0
Cache read errors                      0
Forced recaches                        0
Cache write errors                     0
Compilation failures                   0
Cache errors                           7
Cache errors (C/C++)                   7
Non-cacheable compilations             0
Non-cacheable calls                 2893
Non-compilation calls                165
Unsupported compiler calls             0
Average cache write                0.116 s
Average cache read miss           23.722 s
Average cache read hit             0.057 s
Failed distributed compilations        0
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95528
Approved by: https://github.com/huydhn
2023-02-28 03:51:11 +00:00
a530446f57 Manual submodule update: kineto and libfmt bazel issue (#94756) (#95535)
Summary:
This is a manual pull request to update the third_party submodule for [pytorch/kineto](https://github.com/pytorch/kineto). It also tries to fix the libfmt bazel build failure, similar to https://github.com/pytorch/pytorch/pull/93219.

New submodule commit: 92c5344f0b

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Differential Revision: D43588413

Pulled By: aaronenyeshi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95535
Approved by: https://github.com/davidberard98
2023-02-25 19:26:08 +00:00
2f547ae613 Remove SHA checksum for bazel http_archive from GitHub (#95039)
An action item from https://github.com/pytorch/pytorch/issues/94346

Although the security practice of setting the checksum is good, it doesn't work when the archive is downloaded from sites like GitHub, because the checksum can change. Specifically, GitHub gives no guarantee to keep the same value forever https://github.com/community/community/discussions/46034.

This also adds a new linter to make sure that SHA checksums for GitHub archives can be removed quickly.  The WORKSPACE file is actually updated using the new linter:

```
>>> Lint for WORKSPACE:

  Advice (BAZEL_LINTER) format
    Redundant SHA checksum. Run `lintrunner -a` to apply this patch.

    You can run `lintrunner -a` to apply this patch.

     5   5 |
     6   6 | http_archive(
     7   7 |     name = "rules_cuda",
     7     |-    sha256 = "f80438bee9906e9ecb1a8a4ae2365374ac1e8a283897281a2db2fb7fcf746333",
     9   8 |     strip_prefix = "runtime-b1c7cce21ba4661c17ac72421c6a0e2015e7bef3/third_party/rules_cuda",
    10   9 |     urls = ["b1c7cce21b.tar.gz"],
    11  10 | )
--------------------------------------------------------------------------------
    29  28 |   name = "pybind11_bazel",
    30  29 |   strip_prefix = "pybind11_bazel-992381ced716ae12122360b0fbadbc3dda436dbf",
    31  30 |   urls = ["992381ced7.zip"],
    31     |-  sha256 = "3dc6435bd41c058453efe102995ef084d0a86b0176fd6a67a6b7100a2e9a940e",
    33  31 | )
    34  32 |
    35  33 | new_local_repository(
--------------------------------------------------------------------------------
    52  50 |     urls = [
    53  51 |         "https://github.com/gflags/gflags/archive/v2.2.2.tar.gz",
    54  52 |     ],
    54     |-    sha256 = "34af2f15cf7367513b352bdcd2493ab14ce43692d2dcd9dfc499492966c64dcf",
    56  53 | )
    57  54 |
    58  55 | new_local_repository(
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95039
Approved by: https://github.com/ZainRizvi
2023-02-22 04:39:19 +00:00
cyy
a405c6993f [submodule] update libfmt to tag 9.1.0 (#93219)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93219
Approved by: https://github.com/malfet, https://github.com/Skylion007, https://github.com/albanD
2023-02-08 17:21:39 +00:00
42d4eca796 Update submodule kineto fix bazel1 (#92318)
Update the kineto submodule and fix a bazel build issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92318
Approved by: https://github.com/aaronenyeshi
2023-01-28 02:26:28 +00:00
523d4f2562 Revert "[cuDNN][cuDNN V8 API] Always build assuming cuDNN >= 8.0 (#91527)"
This reverts commit 4d07ad74f1c11efa55501433d6cf1f06840f5207.

Reverted https://github.com/pytorch/pytorch/pull/91527 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-01-16 13:28:09 +00:00
4d07ad74f1 [cuDNN][cuDNN V8 API] Always build assuming cuDNN >= 8.0 (#91527)
We've been building with V8 (incl. V8 API) by default for a while now; this PR cleans up some guards for cuDNN < 8.0.

CC @ptrblck @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91527
Approved by: https://github.com/ngimel
2023-01-13 18:55:37 +00:00
f6c6048b10 Use CUTLASS GEMM for NT bmm (#85894)
Copy of https://github.com/pytorch/pytorch/pull/85710
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85894
Approved by: https://github.com/drisspg
2022-10-18 23:11:47 +00:00
d169f950da Revert "Use CUTLASS GEMM for NT bmm [OSS-only] (#85894)"
This reverts commit ef58a132f223d5abf2bd3f8bee380aca6c29d17f.

Reverted https://github.com/pytorch/pytorch/pull/85894 on behalf of https://github.com/DanilBaibak due to Break internal build
2022-10-13 15:28:09 +00:00
ef58a132f2 Use CUTLASS GEMM for NT bmm [OSS-only] (#85894)
OSS-only copy of https://github.com/pytorch/pytorch/pull/85710
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85894
Approved by: https://github.com/drisspg
2022-10-12 20:03:28 +00:00
f0ee21fe0a Update cpuinfo to the latest commit (#83620)
This hasn't been updated for a while, so pulling the latest commit from https://github.com/pytorch/cpuinfo. I wonder if it breaks anything.

Fixes #83594

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83620
Approved by: https://github.com/malfet
2022-08-20 06:16:54 +00:00
9d3c35d1e1 Back out "Revert D37720837: Back out "Revert D37228314: [Profiler] Include ActivityType from Kineto"" (#81450)
Differential Revision: [D37842341](https://our.internmc.facebook.com/intern/diff/D37842341/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D37842341/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81450
Approved by: https://github.com/pbelevich
2022-07-15 18:25:40 +00:00
36d2c44cce Revert "Back out "Revert D37228314: [Profiler] Include ActivityType from Kineto" (#81122)"
This reverts commit 52a538868b9239378af3923ba64a33ad7e1fb4c6.

Reverted https://github.com/pytorch/pytorch/pull/81122 on behalf of https://github.com/clee2000 due to broke periodic buck build https://github.com/pytorch/pytorch/runs/7306516655?check_suite_focus=true
2022-07-12 18:20:00 +00:00
52a538868b Back out "Revert D37228314: [Profiler] Include ActivityType from Kineto" (#81122)
Reland

Differential Revision: [D37720837](https://our.internmc.facebook.com/intern/diff/D37720837/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D37720837/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81122
Approved by: https://github.com/chaekit
2022-07-12 14:54:01 +00:00
a965a67492 Revert "[Profiler] Include ActivityType from Kineto (#80750)"
This reverts commit 2f6f7391efd109f1ea12bbebdda58aa9169f4e9c.

Reverted https://github.com/pytorch/pytorch/pull/80750 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2022-07-08 05:16:56 +00:00
2f6f7391ef [Profiler] Include ActivityType from Kineto (#80750)
We don't want to compile with Kineto on all platforms, but if we're going to have significant integration between the profiler and Kineto, the profiler will need to be able to rely on simple API constructs like the Kineto enums.

Differential Revision: [D37228314](https://our.internmc.facebook.com/intern/diff/D37228314/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D37228314/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80750
Approved by: https://github.com/aaronenyeshi
2022-07-08 04:59:06 +00:00
a0a23c6ef8 [bazel] make it possible to build the whole world, update CI (#78870)
Fixes https://github.com/pytorch/pytorch/issues/77509

This PR supersedes https://github.com/pytorch/pytorch/pull/77510.
It allows both `bazel query //...` and `bazel build --config=gpu //...` to work.

Concretely the changes are:
1. Add "GenerateAten" mnemonic -- this is a convenience thing, so anybody who uses [Remote Execution](https://bazel.build/docs/remote-execution) can add a

```
build:rbe --strategy=GenerateAten=sandboxed,local
```

line to the `~/.bazelrc` and build this action locally (it doesn't have hermetic dependencies at the moment).

2. Replaced a few `http_archive` repos with the proper existing submodules to avoid code drift.
3. Updated `pybind11_bazel` and added `python_version="3"` to `python_configure`. This prevents hard-to-debug errors caused by an attempt to build with python2 on systems where it's the default python (Ubuntu 18.04, for example).
4. Added `unused_` repos; their purpose is to hide the unwanted submodules of submodules, which often have bazel targets in them.
5. Updated CI to build //... -- this is a great step forward to prevent regressions in targets not only in the top-level BUILD.bazel file, but in other folders too.
6. Switched the default bazel build to use gpu support.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78870
Approved by: https://github.com/ezyang
2022-06-06 21:58:47 +00:00
4bf8a9b259 add benchmark to Bazel build (#71412)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71412

This is only in CMake and internal builds right now. Add to Bazel for
parity.
ghstack-source-id: 150235094

Test Plan: Built and ran locally. Rely on CI to verify.

Reviewed By: malfet

Differential Revision: D33635743

fbshipit-source-id: b9e5abbef5feabd52c53a9c2b95713b87ce81681
(cherry picked from commit 11700dbc80200093fdd74b1be066b4e740cee516)
2022-03-02 11:33:22 +00:00
1bc3571078 [pytorch][PR] Add ability for a mobile::Module to save as flatbuffer (#70201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70201

Included functions:
save_mobile_module -> saves a mobile::Module to flatbuffer
load_mobile_module_from_file -> loads a flatbuffer into mobile::Module
parse_mobile_module -> parses from bytes or deserialized flatbuffer module object

Compared to previous attempts, this diff only adds flatbuffer to the cmake target and leaves the fbcode/xplat ones unchanged.

Test Plan: unittest

Reviewed By: malfet, gmagogsfm

Differential Revision: D33239362

fbshipit-source-id: b9ca36b83d6af2d78cc50b9eb9e2a6fa7fce0763
2022-01-12 16:30:39 -08:00
e35bf56461 [Bazel] Add CUDA build to CI (#66241)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35316
On master, the bazel cuda build is disabled due to the lack of a proper `cu_library` rule. This PR:
- Add `rules_cuda` to the WORKSPACE and forward `cu_library` to `rules_cuda`.
- Use simple local cuda and cudnn repositories (adopted from TRTorch) for cuda 11.3.
- Fix the currently broken cuda build.
- Enable the cuda build in CI, not just for the `:torch` target but for all the test binaries, to catch undefined symbols.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66241

Reviewed By: ejguan

Differential Revision: D31544091

Pulled By: malfet

fbshipit-source-id: fd3c34d0e8f80fee06f015694a4c13a8e9e12206
2021-12-17 13:44:29 -08:00