Commit Graph

37 Commits

b2054d3025 Prepare for an update to the XNNPACK submodule (#72642)
Summary:
- Target Sha1: ae108ef49aa5623b896fc93d4298c49d1750d9ba
- Make USE_XNNPACK a dependent option on cmake minimum version 3.12
- Print USE_XNNPACK under the cmake options summary, and print the
  availability from collect_env.py
- Skip XNNPACK-based tests when XNNPACK is not available
    - Add SkipIfNoXNNPACK wrapper to skip tests (see the sketch below)
- Update cmake version for xenial-py3.7-gcc5.4 image to 3.12.4
    - This is required for the backwards compatibility test.
      The PyTorch op schema is XNNPACK-dependent; see
      aten/src/ATen/native/xnnpack/RegisterOpContextClass.cpp for an
      example. The nightly version is assumed to have USE_XNNPACK=ON,
      so with this change we ensure that the test build can also
      have XNNPACK.
- HACK: skipping test_xnnpack_integration tests on ROCM
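
A minimal sketch of what such a skip wrapper could look like, assuming `torch.backends.xnnpack.enabled` reflects XNNPACK availability (the actual helper in the test suite may differ):

```python
import unittest

import torch


def skipIfNoXNNPACK(test_fn):
    """Skip a test when the current torch build has no XNNPACK support."""
    return unittest.skipUnless(
        torch.backends.xnnpack.enabled, "XNNPACK is not available"
    )(test_fn)


class TestXNNPACKOps(unittest.TestCase):
    @skipIfNoXNNPACK
    def test_prepacked_op(self):
        ...  # runs only when XNNPACK is compiled in
```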

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72642

Reviewed By: kimishpatel

Differential Revision: D34456794

Pulled By: digantdesai

fbshipit-source-id: 85dbfe0211de7846d8a84321b14fdb061cd6c037
(cherry picked from commit 6cf48e7b64d6979962d701b5d493998262cc8bfa)
2022-02-25 00:39:15 +00:00
3f06f29577 Improve pip package determination (#63321)
Summary:
Invoking `pip` or `pip3` yields the list of packages for whichever `pip` alias is first on the path, rather than for the interpreter currently being executed. Changed `get_pip_packages` to use `sys.executable + '-mpip'`
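
A minimal sketch of the idea (hypothetical helper name; the real `get_pip_packages` differs in detail):

```python
import subprocess
import sys


def list_pip_packages():
    # Ask pip for the interpreter that is actually running this script,
    # not whatever `pip`/`pip3` happens to be first on PATH.
    out = subprocess.check_output(
        [sys.executable, "-mpip", "list", "--format=freeze"]
    )
    return out.decode("utf-8", errors="replace")
```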

Also, add mypy to the list of packages of interest

Discovered while looking at https://github.com/pytorch/pytorch/issues/63279

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63321

Reviewed By: walterddr

Differential Revision: D30342099

Pulled By: malfet

fbshipit-source-id: fc8d17cf2ddcf18236cfde5c1b9edb4e72804ee0
2021-08-16 13:54:39 -07:00
150c828803 Add lint rule to keep collect_env.py python2 compliant (#60946)
Summary:
Fixes T94400857

- [x] Add lint rule
- [x] Verify lint rule works
- [x] Fix torch/utils/collect_env.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60946

Reviewed By: malfet, mruberry

Differential Revision: D29457294

Pulled By: rsemenov

fbshipit-source-id: 3c0670408d7aee1479e1de335291deb13a04ace9
2021-06-29 11:57:53 -07:00
4c00df12ec Include full Python version in collect_env.py output (#59632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59632

Before:

```
Python version: 3.7 (64-bit runtime)
```

After:

```
Python version: 3.7.7 (default, Mar 23 2020, 17:31:31)  [Clang 4.0.1 (tags/RELEASE_401/final)] (64-bit runtime)
```
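
The fuller string can be read straight from the interpreter; a minimal sketch of how such a line could be produced (not necessarily the exact collect_env.py code):

```python
import sys


def python_version_line():
    # sys.version carries the full version plus build info, e.g.
    # "3.7.7 (default, Mar 23 2020, 17:31:31) \n[Clang 4.0.1 (tags/RELEASE_401/final)]"
    version = sys.version.replace("\n", " ")
    bits = 64 if sys.maxsize > 2 ** 32 else 32
    return "Python version: {} ({}-bit runtime)".format(version, bits)
```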

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28961500

Pulled By: ezyang

fbshipit-source-id: 0f95a49abf6977941f09a64243916576a820679f
2021-06-24 12:11:01 -07:00
2f3be2735f Don't split oversize cached blocks (#44742)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35901

This change is designed to prevent fragmentation in the Caching Allocator. Permissive block splitting in the allocator allows very large blocks to be split into many pieces. Once split too finely, it is unlikely that all pieces will be 'free' at the same time, so the original allocation can never be returned. Anecdotally, we've seen a model run out of memory failing to alloc a 50 MB block on a 32 GB card while the caching allocator is holding 13 GB of 'split free blocks'.

Approach:

- Large blocks above a certain size are designated "oversize". This limit is currently set one decade (10x) above the "large" block threshold, at 200 MB
- Oversize blocks cannot be split
- Oversize blocks must closely match the requested size (e.g. a 200 MB request will match an existing 205 MB block, but not a 300 MB block)
- In lieu of splitting oversize blocks, there is a mechanism to quickly free a single oversize block (to the system allocator) to allow an appropriately sized block to be allocated. This will be activated under memory pressure and will prevent _release_cached_blocks()_ from triggering

Initial performance tests show this is similar to or quicker than the original strategy. Additional tests are ongoing.
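
An illustrative Python sketch of the matching rule described above (the real allocator lives in C++; the slack constant here is hypothetical):

```python
OVERSIZE_THRESHOLD = 200 * 1024 * 1024  # "oversize" cutoff described above
OVERSIZE_SLACK = 20 * 1024 * 1024       # hypothetical closeness tolerance


def block_is_usable(request_size, block_size):
    """Decide whether a cached free block may serve a request."""
    if request_size < OVERSIZE_THRESHOLD:
        # Normal path: any large-enough block works; the remainder can be split off.
        return block_size >= request_size
    # Oversize path: never split, and only accept a close size match
    # (a 205 MB block for a 200 MB request, but not a 300 MB one).
    return request_size <= block_size <= request_size + OVERSIZE_SLACK
```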

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44742

Reviewed By: zou3519

Differential Revision: D29186394

Pulled By: ezyang

fbshipit-source-id: c88918836db3f51df59de6d1b3e03602ebe306a9
2021-06-21 11:46:08 -07:00
f7c15610aa Collect kernel version (#58485)
Summary:
collect_env should collect the kernel and glibc versions

Fixes https://github.com/pytorch/pytorch/issues/58387
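
Both values are available from the standard library; a minimal sketch (formatting in collect_env.py may differ):

```python
import platform


def kernel_and_libc():
    kernel = platform.release()            # e.g. "5.4.0-42-generic"
    libc = "-".join(platform.libc_ver())   # e.g. "glibc-2.31" (may be empty elsewhere)
    return kernel, libc
```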

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58485

Reviewed By: walterddr

Differential Revision: D28510564

Pulled By: malfet

fbshipit-source-id: ad3d4b93f51db052720bfaa4322138c55816921b
2021-05-18 10:57:59 -07:00
1ec12fd491 Add minidump collection via breakpad (#55647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55647

This adds [breakpad](https://github.com/google/breakpad) which comes with out-of-the-box utilities to register a signal handler that writes out a minidump on an unhandled exception. Right now this is gated behind a flag in `torch.utils`, but in the future it could be on by default. Size-wise this adds about 500k to `libtorch_cpu.so` (187275968 B to 187810016 B).

```bash
$ cat <<EOF > test.py
import torch

torch.utils.enable_minidump_collection()

# temporary util that just segfaults
torch._C._crash()
EOF

$ python test.py
Wrote minidump to /tmp/pytorch_crashes/6a829041-50e9-4247-ea992f99-a74cf47a.dmp
fish: “python test.py” terminated by signal SIGSEGV (Address boundary error)
$ minidump-2-core /tmp/pytorch_crashes/6a829041-50e9-4247-ea992f99-a74cf47a.dmp -o core.dmp
$ gdb python core.dmp
... commence debugging ...
```

Right now, exceptions that get passed up to Python don't trigger the signal handler (which by default only handles [these](https://github.com/google/breakpad/blob/main/src/client/linux/handler/exception_handler.cc#L115)). It would be possible for PyTorch exceptions to explicitly write a minidump when passed up to Python (maybe only when the exception is unhandled or something).
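
A rough sketch of what that could look like, using a hypothetical `torch._C._write_minidump` hook that this PR does not add:

```python
import sys

import torch


def install_minidump_excepthook():
    # Hypothetical: write a minidump before the default handler prints the traceback.
    previous_hook = sys.excepthook

    def hook(exc_type, exc_value, exc_traceback):
        torch._C._write_minidump("/tmp/pytorch_crashes")  # hypothetical API, not in this PR
        previous_hook(exc_type, exc_value, exc_traceback)

    sys.excepthook = hook
```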

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D27679767

Pulled By: driazati

fbshipit-source-id: 1ab3b5160b6dc405f5097eb25acc644d533358d7
2021-04-16 13:05:01 -07:00
f94c95a2dd Revert D23752058: [pytorch][PR] Don't split oversize cached blocks
Test Plan: revert-hammer

Differential Revision:
D23752058 (67dcd62310)

Original commit changeset: ccb7c13e3cf8

fbshipit-source-id: 12ae9702135ea510e9714ed97fb75ca3b9f97c27
2021-04-14 09:24:08 -07:00
67dcd62310 Don't split oversize cached blocks (#44742)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35901

This change is designed to prevent fragmentation in the Caching Allocator. Permissive block splitting in the allocator allows very large blocks to be split into many pieces. Once split too finely, it is unlikely that all pieces will be 'free' at the same time, so the original allocation can never be returned. Anecdotally, we've seen a model run out of memory failing to alloc a 50 MB block on a 32 GB card while the caching allocator is holding 13 GB of 'split free blocks'.

Approach:

- Large blocks above a certain size are designated "oversize". This limit is currently set one decade (10x) above the "large" block threshold, at 200 MB
- Oversize blocks cannot be split
- Oversize blocks must closely match the requested size (e.g. a 200 MB request will match an existing 205 MB block, but not a 300 MB block)
- In lieu of splitting oversize blocks, there is a mechanism to quickly free a single oversize block (to the system allocator) to allow an appropriately sized block to be allocated. This will be activated under memory pressure and will prevent _release_cached_blocks()_ from triggering

Initial performance tests show this is similar to or quicker than the original strategy. Additional tests are ongoing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44742

Reviewed By: ngimel

Differential Revision: D23752058

Pulled By: ezyang

fbshipit-source-id: ccb7c13e3cf8ef2707706726ac9aaac3a5e3d5c8
2021-04-14 03:04:41 -07:00
42635c3e59 Fix regex in collect_env.py for CUDA 11 (#51852)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51840

Manually tested with both CUDA 10.2.89 & 11.2.67.
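
A minimal sketch of a pattern that copes with two-digit major versions such as 11.2 (the exact regex used in collect_env.py may differ):

```python
import re

# Illustrative nvidia-smi header line; the real text comes from running nvidia-smi.
smi_line = "| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |"

# One or more dot-separated digit groups, so 11.2 matches as well as 10.2.
match = re.search(r"CUDA Version: (\d+(\.\d+)*)", smi_line)
print(match.group(1) if match else "Could not collect")  # -> 11.2
```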

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51852

Reviewed By: izdeby

Differential Revision: D26326105

Pulled By: mrshenli

fbshipit-source-id: 46fbe5f20c02bca982ce2ec6e62f7cc3a14fcc97
2021-02-09 07:31:08 -08:00
f431e47a2e [collect_env] Acquire windows encoding using OEMCP (#49020)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49010.
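
Console tools on Windows emit output in the OEM code page rather than the ANSI one; a minimal sketch of querying it (hypothetical helper, guarded so it only runs on Windows):

```python
import locale
import sys


def console_encoding():
    if sys.platform == "win32":
        import ctypes
        # GetOEMCP returns the OEM code page used by console programs on Windows.
        return "cp{}".format(ctypes.windll.kernel32.GetOEMCP())
    return locale.getpreferredencoding()
```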

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49020

Reviewed By: zhangguanheng66

Differential Revision: D25398064

Pulled By: janeyx99

fbshipit-source-id: c7fd1e7d1f3dd82613d7f2031439503188b144fd
2020-12-09 15:22:18 -08:00
e538bd6695 [collect_env] Add candidate paths for nvidia-smi on Windows (#49021)
Summary:
Recently, Nvidia has started putting nvidia-smi under SystemRoot.
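
A minimal sketch of probing a few candidate locations (the exact path list in collect_env.py may differ):

```python
import os
import sys


def find_nvidia_smi():
    if sys.platform != "win32":
        return "nvidia-smi"
    system_root = os.environ.get("SYSTEMROOT", r"C:\Windows")
    program_files = os.environ.get("PROGRAMFILES", r"C:\Program Files")
    candidates = [
        os.path.join(system_root, "System32", "nvidia-smi.exe"),
        os.path.join(program_files, "NVIDIA Corporation", "NVSMI", "nvidia-smi.exe"),
    ]
    for path in candidates:
        if os.path.exists(path):
            return path
    return "nvidia-smi"  # fall back to whatever is on PATH
```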

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49021

Reviewed By: zhangguanheng66

Differential Revision: D25399831

Pulled By: ezyang

fbshipit-source-id: b1ea12452012e0a3fb4703996b6104e7115a8a7f
2020-12-08 15:02:15 -08:00
9b19880c43 Fix collect_env.py with older version of PyTorch (#48076)
Summary:
Inspired by https://github.com/pytorch/pytorch/issues/47993, this fixes the import error in `collect_env.py` with older versions of PyTorch when `torch.version` does not have the `hip` property.
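
A minimal sketch of the defensive lookup:

```python
import torch

# getattr avoids an AttributeError on PyTorch releases whose
# torch.version module has no `hip` attribute at all.
hip_version = getattr(torch.version, "hip", None)
print("HIP runtime version:", hip_version or "N/A")
```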

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48076

Reviewed By: seemethere, xuzhao9

Differential Revision: D25024352

Pulled By: samestep

fbshipit-source-id: 7dff9d2ab80b0bd25f9ca035d8660f38419cdeca
2020-11-19 12:18:08 -08:00
513f62b45b [hotfix] fix collect_env not working when torch compile/install fails (#47752)
Summary:
Fix collect_env not working when a PyTorch compile from source fails midway.
```
Traceback (most recent call last):
OSError: /home/rongr/local/pytorch/torch/lib/libtorch_global_deps.so: cannot open shared object file: No such file or directory
```
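
The usual defensive pattern looks roughly like this (collect_env.py's actual guard may catch a different set of exceptions):

```python
try:
    import torch
    TORCH_AVAILABLE = True
except (ImportError, OSError):
    # A half-built source checkout can raise OSError when a shared
    # library such as libtorch_global_deps.so is missing.
    TORCH_AVAILABLE = False

torch_version = torch.__version__ if TORCH_AVAILABLE else "N/A"
print("PyTorch version:", torch_version)
```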

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47752

Reviewed By: janeyx99

Differential Revision: D24888576

Pulled By: walterddr

fbshipit-source-id: 3b20daeddbb4118491fb0cca9fb59d861f683da7
2020-11-11 11:47:49 -08:00
848901f276 Fix collect_env when pytorch is not installed (#47398)
Summary:
Moved all torch-specific checks under an `if TORCH_AVAILABLE` block

Embedded the gpu_info dict back into the SystemEnv constructor call and deduplicated some code between the HIP and CUDA cases

Fixes https://github.com/pytorch/pytorch/issues/47397

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47398

Reviewed By: walterddr

Differential Revision: D24740421

Pulled By: malfet

fbshipit-source-id: d0a1fe5b428617cb1a9d027324d24d7371c68d64
2020-11-04 16:54:08 -08:00
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
bd257a17a1 Add HIP/ROCm version to collect_env.py (#44106)
Summary:
This adds HIP version info to the `collect_env.py` output.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44106

Reviewed By: VitalyFedyunin

Differential Revision: D23652341

Pulled By: zou3519

fbshipit-source-id: a1f5bce8da7ad27a1277a95885934293d0fd43c5
2020-09-14 09:19:18 -07:00
64a7684219 Enable typechecking of collect_env.py during CI (#43062)
Summary:
No type annotations can be added to the script, as it still has to be Python-2 compliant.
Make changes to avoid variable type redefinition.
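
A small illustrative sketch of the kind of redefinition mypy rejects and the rename that avoids it (names are hypothetical):

```python
# mypy rejects reusing one name for values of two different types:
#     out = run_command()        # bytes
#     out = out.decode("utf-8")  # error: redefines `out` as str
# Giving each type its own name keeps the script annotation-free but type-checkable.


def decode_output(raw):
    # `raw` stays bytes; the decoded text gets a separate name.
    text = raw.decode("utf-8")
    return text.strip()
```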

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43062

Reviewed By: zou3519

Differential Revision: D23132991

Pulled By: malfet

fbshipit-source-id: 360c02e564398f555273e5889a99f834a5467059
2020-08-14 12:46:42 -07:00
c7d2774d20 Fix typo in collect_env.py (#43050)
Summary:
Minor typo fix introduced in yesterday's PR: https://github.com/pytorch/pytorch/pull/42961

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43050

Reviewed By: ezyang, malfet

Differential Revision: D23130936

Pulled By: zou3519

fbshipit-source-id: e8fa2bf155ab6a5988c74e8345278d8d70855894
2020-08-14 08:33:35 -07:00
0ff51accd8 collect_env.py: Print CPU architecture after Linux OS name (#42961)
Summary:
Missed this case in https://github.com/pytorch/pytorch/pull/42887

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42961

Reviewed By: zou3519

Differential Revision: D23095264

Pulled By: malfet

fbshipit-source-id: ff1fb0eba9ecd29bfa3d8f5e4c3dcbcb11deefcb
2020-08-13 10:49:15 -07:00
b0b8340065 Collect more data in collect_env (#42887)
Summary:
Collect Python runtime bitness (32 vs 64 bit)
Collect Mac/Linux OS machine type (x86_64, arm, Power, etc.)
Collect Clang version
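
All three can be gathered with the standard library plus one subprocess call; a minimal sketch (hypothetical helper, not the exact collect_env.py code):

```python
import platform
import re
import struct
import subprocess


def extra_platform_info():
    bitness = "{}-bit runtime".format(struct.calcsize("P") * 8)  # pointer size in bits
    machine = platform.machine()  # e.g. x86_64, arm64, ppc64le
    try:
        out = subprocess.check_output(["clang", "--version"]).decode()
        found = re.search(r"version\s+([\d.]+)", out)
        clang_version = found.group(1) if found else "Could not collect"
    except (OSError, subprocess.CalledProcessError):
        clang_version = "Could not collect"
    return bitness, machine, clang_version
```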

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42887

Reviewed By: seemethere

Differential Revision: D23064788

Pulled By: malfet

fbshipit-source-id: df361bdbb79364dc521b8e1ecbed1b4bd08f9742
2020-08-11 18:01:14 -07:00
e029d678b6 Make collect_env more robust on Windows (#39136)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/39133.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39136

Differential Revision: D21763686

Pulled By: zou3519

fbshipit-source-id: d45c3b529f569554e987dfd29579fc93d4894aaf
2020-05-28 13:25:36 -07:00
e75fb4356b Remove (most) Python 2 support from Python code (#35615)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35615

Python 2 has reached end-of-life and is no longer supported by PyTorch.
Now we can clean up a lot of cruft that we put in place to support it.
These changes were all done manually, and I skipped anything that seemed
like it would take more than a few seconds, so I think it makes sense to
review it manually as well (though using side-by-side view and ignoring
whitespace change might be helpful).

Test Plan: CI

Differential Revision: D20842886

Pulled By: dreiss

fbshipit-source-id: 8cad4e87c45895e7ce3938a88e61157a79504aed
2020-04-22 09:23:14 -07:00
8ff05031b0 Update collect_env.py to detect relevant conda-installed numpy and cudatoolkit (#35646)
Summary:
Addresses a small issue noticed in gh-32369

With this PR in a clean conda env where `conda install pytorch torchvision cudatoolkit=10.1 -c pytorch` was run:

```
Versions of relevant libraries:
[pip] numpy==1.18.1
[pip] torch==1.4.0
[pip] torchvision==0.5.0
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               10.1.243             h6bb024c_0
[conda] mkl                       2020.0                      166
[conda] mkl-service               2.3.0            py37he904b0f_0
[conda] mkl_fft                   1.0.15           py37ha843d7b_0
[conda] mkl_random                1.1.0            py37hd6b4f25_0
[conda] numpy                     1.18.1           py37h4f9e942_0
[conda] numpy-base                1.18.1           py37hde5b4d6_1
[conda] pytorch                   1.4.0           py3.7_cuda10.1.243_cudnn7.6.3_0    pytorch
[conda] torchvision               0.5.0                py37_cu101    pytorch
```

With current master:
```
Versions of relevant libraries:
[pip] numpy==1.18.1
[pip] torch==1.4.0
[pip] torchvision==0.5.0
[conda] blas                      1.0                         mkl
[conda] mkl                       2020.0                      166
[conda] mkl-service               2.3.0            py37he904b0f_0
[conda] mkl_fft                   1.0.15           py37ha843d7b_0
[conda] mkl_random                1.1.0            py37hd6b4f25_0
[conda] pytorch                   1.4.0           py3.7_cuda10.1.243_cudnn7.6.3_0    pytorch
[conda] torchvision               0.5.0                py37_cu101    pytorch
```

Note that the conda output will always also include pip-installed packages, so there are still duplicates now. If it's desirable to completely remove the `pip list` output for conda envs, that's also an option.
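
A minimal sketch of filtering `conda list` output down to relevant libraries (the package-name patterns here are illustrative):

```python
import os
import re
import subprocess

# Illustrative set of package-name patterns considered "relevant".
RELEVANT = re.compile(r"numpy|torch|cudatoolkit|mkl|magma")


def get_conda_packages():
    conda = os.environ.get("CONDA_EXE", "conda")
    try:
        out = subprocess.check_output([conda, "list"]).decode()
    except (OSError, subprocess.CalledProcessError):
        return "Could not collect"
    lines = [line for line in out.splitlines()
             if not line.startswith("#") and RELEVANT.search(line)]
    return "\n".join(lines)
```
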
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35646

Differential Revision: D20736304

Pulled By: zou3519

fbshipit-source-id: fb6b3da3f69395869bc8c52bf7a85e9d15b0476d
2020-03-31 09:32:12 -07:00
d4464d3418 Use system locale in collect_env.py (#22579)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22570.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22579

Differential Revision: D16147304

Pulled By: soumith

fbshipit-source-id: db73447bffbfdf54f7b830447d4b9584f363f05f
2019-07-07 20:55:31 -07:00
173f224570 Turn on F401: Unused import warning. (#18598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**

This was requested by someone at Facebook; this lint is turned
on for Facebook by default.  "Sure, why not."

I had to noqa a number of imports in __init__.  Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it.  Left for future work.

Be careful!  flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments.  flake8-3 will
report an import unused; flake8-2 will not.  For now, I just
noqa'd all these sites.

All the changes were done by hand.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D14687478

fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
2019-03-30 09:01:17 -07:00
7c1e4258a9 Workarounds to the lack of nvidia-smi and ldconfig programs in macosx (was PR 16968) (#16999)
Summary:
Fix issue #12174 for Mac OSX.

PS: This is a duplicate of PR #16968 that got messed up. Sorry for the confusion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16999

Differential Revision: D14050669

Pulled By: zou3519

fbshipit-source-id: a4594c03ae8e0ca91a4836408b6c588720162c9f
2019-02-12 14:39:28 -08:00
7ce33c586d Robust determination of cudnn library and relevant conda packages. (#16859)
Summary:
This PR implements:
1. a fix to issue #12174 - determine the location of the cudnn library using `ldconfig`
2. a fix to determine the installed conda packages (in recent versions of conda, the command `conda` is a Bash function that cannot be called within a Python script, so we use the CONDA_EXE environment variable instead)
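
A minimal sketch of the `ldconfig` half of the approach (the actual collect_env.py logic is more involved):

```python
import subprocess


def find_cudnn_via_ldconfig():
    # `ldconfig -p` prints the shared-library cache, one "name => path" entry per line.
    # (ldconfig may live in /sbin and not be on PATH for regular users.)
    try:
        out = subprocess.check_output(["ldconfig", "-p"]).decode()
    except (OSError, subprocess.CalledProcessError):
        return None
    for line in out.splitlines():
        if "libcudnn" in line and "=>" in line:
            return line.split("=>")[-1].strip()
    return None
```
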
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16859

Differential Revision: D14000399

Pulled By: soumith

fbshipit-source-id: 905658ecacb0ca0587a162fade436de9582d32ab
2019-02-07 20:34:46 -08:00
d327965dac Fix pip list format in collect_env (#16798)
Summary:
Since pip 18.0 (2018-07-22), `legacy` is no longer a valid choice for `pip list --format` as can be seen in the [Release Notes](https://pip.pypa.io/en/stable/news/#id62). Therefore, the options now are: `columns`, `freeze` and `json`. With `legacy`, this is how it looked:

```
[...]
Versions of relevant libraries:
[pip3] numpy (1.16.1)
[pip3] torch (1.0.1)
[pip3] torchvision (0.2.1)
[...]
```

Changing to `freeze`, this is how it looks:

```
[...]
Versions of relevant libraries:
[pip3] numpy==1.16.1
[pip3] torch==1.0.1
[pip3] torchvision==0.2.1
[...]
```

Currently, this is what happens:

```
[...]
Versions of relevant libraries:
[pip] Could not collect
[...]
```
The `freeze` option is also available in old pip, so this change is backwards compatible. Also, if we would like to keep the old style, which I think is not necessary, I could easily change that.

 ---

In case anyone wants to know what `columns` looks like (I prefer `freeze`):

```
[...]
Versions of relevant libraries:
[pip3] numpy               1.16.1
[pip3] torch               1.0.1
[pip3] torchvision         0.2.1
[...]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16798

Differential Revision: D13971793

Pulled By: soumith

fbshipit-source-id: 3721d9079a2afa245e1185f725598901185ea4cd
2019-02-06 07:48:08 -08:00
b1b00f329e Fix the flake8 linter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16549

Reviewed By: bddppq

Differential Revision: D13877435

Pulled By: houseroad

fbshipit-source-id: dbe575ba3f6dd30d27ac6aa5eec2eea025063540
2019-01-30 09:36:00 -08:00
d9cad71b36 Enable running collect_env.py without building PyTorch (#15468)
Summary: Closes #15346

Differential Revision: D13537873

Pulled By: ezyang

fbshipit-source-id: 7765ce4108dae9479d8900c0815cc2f174596a83
2018-12-21 11:37:43 -08:00
3a6d473b49 collect_env fix (#15447)
Summary:
fixes #15214
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15447

Differential Revision: D13531523

Pulled By: ezyang

fbshipit-source-id: 8f24f5ae9f3e78f6c5c9ee702ba14faca7aa297a
2018-12-20 16:56:34 -08:00
e6a420114f collect_env.py: get conda magma and mkl information (#14854)
Summary:
Fixes #12371
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14854

Differential Revision: D13363635

Pulled By: zou3519

fbshipit-source-id: f8b5d05038bf5ce451399dfeed558ae298178128
2018-12-06 14:58:14 -08:00
c47f680086 arc lint torch/utils (#13141)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13141

This is an example diff to show what lint rules are being applied.

Reviewed By: mingzhe09088

Differential Revision: D10858478

fbshipit-source-id: cbeb013f10f755b0095478adf79366e7cf7836ff
2018-10-25 14:59:03 -07:00
cf235e0894 fix lint after new flake8 release added new style constraints (#13047)
Summary:
fix lint after new flake8 release added new style constraints
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13047

Differential Revision: D10527804

Pulled By: soumith

fbshipit-source-id: 6f4d02662570b6339f69117b61037c8394b0bbd8
2018-10-24 09:03:38 -07:00
bed172cf54 Fix collect_env.py for Windows (#8326)
* Fix collect_env.py for Windows

* Fix expect file for Win machine
2018-06-11 10:52:21 -04:00
7a3c38ab59 Add environment collection script (#6635)
* Add environment collection script

Fixes #6111. This should make it easier for users to report bugs by giving
them a script to collect system environment information.

Changes include:
- Refactor out the environment collecting code from utils.bottleneck
- Add script (collect_env.py)
- Cleaned up the issues template so that it suggests using the script
  and is more readable.

Testing: added expect tests to go with 4 CI configurations. Whenever one
of these configurations gets updated, the test will fail until the test
also gets updated.

* Expect tests

* Update issue template

* Fix random space

* Minor improvement to issue template; fix expect test

* Skip expect test if BUILD_ENVIRONMENT not found; test fix; split off smoke/expect test
2018-04-22 15:18:14 -04:00