Common advice we give for handling memory fragmentation issues is to
allocate a big block upfront to reserve memory, which will get split up later.
For programs with changing tensor sizes this can be especially helpful for
avoiding the OOMs that happen the first time we see a new largest input, when we
would otherwise have to allocate new segments.
However, the issue with allocating a block upfront is that it is nearly impossible
to correctly estimate the size of that block. If it is too small, space in the block
will run out and the allocator will allocate separate blocks anyway. If it is too large,
other non-PyTorch libraries might stop working because they cannot allocate
any memory.
This patch provides the same benefits as using a pre-allocated block, but
without having to choose its size upfront. Using the cuMemMap-style APIs,
it adds the ability to expand the last block in a segment when more memory is
needed.
Compared to universally switching to cudaMallocAsync to avoid fragmentation,
this patch fixes this common fragmentation issue while preserving most
of the existing allocator behavior. The expandable behavior can be enabled and disabled dynamically,
which should allow users to, for instance, allocate long-lived parameters and state in individual buffers
and put temporary state into the large expandable blocks, further reducing
fragmentation.
See inline comments for information about the implementation and its limitations.
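For reference, a minimal sketch of opting in, assuming the feature is exposed through the caching allocator's configuration string under an `expandable_segments` key (check the inline comments and allocator docs for the authoritative setting):
```python
import os

# Assumed configuration key, shown for illustration only.  The allocator reads
# PYTORCH_CUDA_ALLOC_CONF when CUDA is first initialized, so set it before
# importing torch (or before the first CUDA call).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch

# With the setting on, allocations for growing input sizes can expand the
# last block of an existing segment instead of creating new segments.
for n in (1024, 2048, 4096):
    x = torch.empty(n, n, device="cuda")
    del x
```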
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96995
Approved by: https://github.com/eellison
In C++ we have TORCH_LIBRARY_FRAGMENT. This PR adds the same
functionality to the Python torch.library API.
The motivation for this: for the simple custom op API, we don't want
users to have to deal with Library objects. One way to hide this from
them is to create library fragments.
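A minimal sketch of what this enables, assuming the Python `Library` constructor accepts the same "FRAGMENT" kind string as the C++ macro (the namespace and operator names here are illustrative):
```python
import torch
from torch.library import Library

# Two fragments of the same (made-up) "mylib" namespace, each defining and
# implementing its own operator without clobbering the other.
frag_a = Library("mylib", "FRAGMENT")
frag_a.define("add_one(Tensor x) -> Tensor")

def add_one_cpu(x):
    return x + 1

frag_a.impl("add_one", add_one_cpu, "CPU")

frag_b = Library("mylib", "FRAGMENT")
frag_b.define("add_two(Tensor x) -> Tensor")

def add_two_cpu(x):
    return x + 2

frag_b.impl("add_two", add_two_cpu, "CPU")

x = torch.ones(3)
print(torch.ops.mylib.add_one(x))  # tensor([2., 2., 2.])
print(torch.ops.mylib.add_two(x))  # tensor([3., 3., 3.])
```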
Test Plan:
- tests that you can create multiple fragments and def+impl operators on each.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98439
Approved by: https://github.com/ezyang, https://github.com/bdhirsh
This has been bugging me for a while as I work on these Python scripts: they are not tracked by the ufmt linter. So I added these scripts to that linter:
```
[[linter]]
code = 'UFMT'
include_patterns = [
'.github/**/*.py',
'test/run_test.py',
```
This change should just work and not break anything, as the ufmt (black + usort) linter is very safe to use on standalone utility scripts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97588
Approved by: https://github.com/kit1980
Fixes #96347
This PR:
- Makes the functorch tests run as a part of the "default" shards
- Deletes the functorch CI shard from all CI job configurations (if it exists)
- Increases the "default" shard count by 1 for each job, unless it was
previously set to 1, to accommodate the new functorch tests and not
regress time-to-signal.
- Adds a bunch of skips for ROCm and torchdynamo configurations. We can
investigate them later.
NB: I might go through some more iterations to figure out what other
skips need to be added, but this iteration of the PR seems to pass most
of the CI suite.
Test Plan:
- wait for CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96464
Approved by: https://github.com/huydhn
Enables the last few files under pytest.
xdist was causing problems with `test_source_multithreaded` in `profiler/test_profiler` because it creates extra threads. Luckily we don't use xdist, so we can disable it with `-p no:xdist`; however, that is incompatible with pytest-rerunfailures==10.2, so this upgrades it to 10.3. I'd update the Windows AMI, but I don't know how.
`dynamo/test_optimizers` and `dynamo/test_repros` both had tests that used skip_if_pytest. https://github.com/pytorch/pytorch/pull/93251/files suggests that it is due to pytest assertion rewriting, so I added `PYTEST_DONT_REWRITE` to their module docstrings to prevent pytest from rewriting assertions.
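A minimal sketch of that opt-out (the module contents are illustrative): pytest skips assertion rewriting for any module whose docstring mentions PYTEST_DONT_REWRITE.
```python
"""Illustrative test module.  PYTEST_DONT_REWRITE"""
# The marker above is the real mechanism; everything else in this file is a
# made-up example.  Because the docstring mentions PYTEST_DONT_REWRITE,
# pytest imports this module without rewriting its assert statements.

def test_addition():
    assert 1 + 1 == 2  # executed as a plain assert, not a rewritten one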
Disabling tests by issue in `dynamo/test_dynamic_shapes` seems sane.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96698
Approved by: https://github.com/huydhn, https://github.com/malfet
Enables pytest for a few unique files. pytest runs tests in a different order than unittest (though still in a consistent order with respect to itself), and some tests change global state, causing other tests to fail.
`test_transpose_non_contiguous` in `test_torchinductor.py` is affected by some other test, but I'm not sure which one, so my solution is to reset the metrics before the rest of the test runs.
`test_register_patterns` in `test_quantize_fx.py` adds extra keys to global variables, so they are removed when the test is done via unittest's `addCleanup`, which also works under pytest.
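A hedged sketch of that cleanup pattern (the registry and key names are made up, not the real quantization registries):
```python
import unittest

# Illustrative global registry standing in for the real global state that
# test_register_patterns mutates.
GLOBAL_PATTERN_REGISTRY = {}

class TestRegisterPatterns(unittest.TestCase):
    def test_register_patterns(self):
        GLOBAL_PATTERN_REGISTRY["custom_pattern"] = object()
        # Undo the global mutation when the test finishes, pass or fail;
        # addCleanup runs under both unittest and pytest.
        self.addCleanup(GLOBAL_PATTERN_REGISTRY.pop, "custom_pattern", None)
        self.assertIn("custom_pattern", GLOBAL_PATTERN_REGISTRY)

if __name__ == "__main__":
    unittest.main()
```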
pytest doesn't really have an equivalent for `load_tests`, so the file is changed to be like `test_jit`, which imports all the classes. I also attempted to import them dynamically, but failed.
`test_public_api_surface` in `test_fx.py` checks for a backwards compatibility classification. There is a different test in test_fx that results in `fuser_utils` being imported. pytest runs this test before `test_public_api_surface` while unittest runs it after, so pytest sees `fuser_utils` when crawling through the modules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96397
Approved by: https://github.com/huydhn
Set the environment variable
```
PYTORCH_TEST_DO_NOT_USE_PYTEST=1
```
to avoid using pytest in PyTorch unit testing.
This change is related to some recent changes, e.g. #96210, #96016, #95844, #95659, that enabled the use of pytest in many test modules. Those test modules were passing normally before, but failed immediately once pytest was used. A sample stack trace:
```
root@8e3168a83ee2:/opt/pytorch/pytorch# python test/run_test.py -v -i test_optim -- -v --save-xml
Ignoring disabled issues: []
/opt/pytorch/pytorch/test/run_test.py:1225: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if torch.version.cuda is not None and LooseVersion(torch.version.cuda) >= "11.6":
Selected tests:
test_optim
parallel (file granularity) tests:
test_optim
serial (file granularity) tests:
Ignoring disabled issues: []
Ignoring disabled issues: []
Running test_optim ... [2023-03-09 12:51:59.358110]
Executing ['/usr/local/bin/python', '-bb', 'test_optim.py', '-v', '--save-xml', '-v', '--use-pytest', '-vv', '-rfEX', '-x', '--reruns=2'] ... [2023-03-09 12:51:59.358810]
Test results will be stored in test-reports/python-pytest/test_optim/test_optim-5e41643c8bac8ace.xml
Traceback (most recent call last):
File "/opt/pytorch/pytorch/test/test_optim.py", line 4581, in <module>
run_tests()
File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 796, in run_tests
exit_code = pytest.main(args=pytest_args)
File "/usr/local/lib/python3.10/site-packages/_pytest/config/__init__.py", line 148, in main
config = _prepareconfig(args, plugins)
File "/usr/local/lib/python3.10/site-packages/_pytest/config/__init__.py", line 329, in _prepareconfig
config = pluginmanager.hook.pytest_cmdline_parse(
File "/usr/local/lib/python3.10/site-packages/pluggy/_hooks.py", line 265, in __call__
return self._hookexec(self.name, self.get_hookimpls(), kwargs, firstresult)
File "/usr/local/lib/python3.10/site-packages/pluggy/_manager.py", line 80, in _hookexec
return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
File "/usr/local/lib/python3.10/site-packages/pluggy/_callers.py", line 55, in _multicall
gen.send(outcome)
File "/usr/local/lib/python3.10/site-packages/_pytest/helpconfig.py", line 103, in pytest_cmdline_parse
config: Config = outcome.get_result()
File "/usr/local/lib/python3.10/site-packages/pluggy/_result.py", line 60, in get_result
raise ex[1].with_traceback(ex[2])
File "/usr/local/lib/python3.10/site-packages/pluggy/_callers.py", line 39, in _multicall
res = hook_impl.function(*args)
File "/usr/local/lib/python3.10/site-packages/_pytest/config/__init__.py", line 1060, in pytest_cmdline_parse
self.parse(args)
File "/usr/local/lib/python3.10/site-packages/_pytest/config/__init__.py", line 1348, in parse
self._preparse(args, addopts=addopts)
File "/usr/local/lib/python3.10/site-packages/_pytest/config/__init__.py", line 1231, in _preparse
self.pluginmanager.load_setuptools_entrypoints("pytest11")
File "/usr/local/lib/python3.10/site-packages/pluggy/_manager.py", line 287, in load_setuptools_entrypoints
plugin = ep.load()
File "/usr/local/lib/python3.10/importlib/metadata/__init__.py", line 171, in load
module = import_module(match.group('module'))
File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "/usr/local/lib/python3.10/site-packages/_pytest/assertion/rewrite.py", line 168, in exec_module
exec(co, module.__dict__)
File "/usr/local/lib/python3.10/site-packages/xdist/looponfail.py", line 16, in <module>
import execnet
File "/usr/local/lib/python3.10/site-packages/execnet/__init__.py", line 14, in <module>
from .gateway_base import DataFormatError
File "/usr/local/lib/python3.10/site-packages/execnet/gateway_base.py", line 1138, in <module>
FLOAT_FORMAT_SIZE = struct.calcsize(FLOAT_FORMAT)
BytesWarning: Comparison between bytes and string
FINISHED PRINTING LOG FILE of test_optim (/opt/pytorch/pytorch/test/test-reports/test_optim_1pnlesrz.log)
test_optim failed!
Traceback (most recent call last):
File "/opt/pytorch/pytorch/test/run_test.py", line 1428, in <module>
main()
File "/opt/pytorch/pytorch/test/run_test.py", line 1386, in main
raise RuntimeError(
RuntimeError: test_optim failed!
Tip: You can keep running tests even on failure by passing --keep-going to run_test.py.
If running on CI, add the 'keep-going' label to your PR and rerun your jobs.
```
I'd like to propose this option, which allows users to run their tests in CI the good old unittest way instead of through pytest.
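A hedged sketch of how the opt-out can be honored (the real dispatch lives in torch/testing/_internal/common_utils.py's run_tests; this is only illustrative):
```python
import os
import unittest

def run_tests_sketch():
    # Fall back to plain unittest when the opt-out variable is set to 1;
    # otherwise keep the pytest path introduced by the recent changes.
    if os.environ.get("PYTORCH_TEST_DO_NOT_USE_PYTEST", "0") == "1":
        unittest.main()
    else:
        import pytest
        raise SystemExit(pytest.main())

if __name__ == "__main__":
    run_tests_sketch()
```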
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96444
Approved by: https://github.com/malfet
Run more tests through pytest.
Use a blocklist for tests that shouldn't run through pytest. As far as I can tell, the numbers of tests run, skipped, and xfailed for those not on the blocklist are the same.
Regarding the main module:
Usually when tests are run in CI, we call `python <test file>`, which causes the file to be imported under the module name `__main__`. However, pytest looks for the module under its file name, so the file gets re-imported. This can cause issues for tests that run module-level code and change global state, like test_nn, which modifies lists imported from another file, or the tests in test/lazy, which initialize a backend that cannot coexist with a second copy of itself.
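A minimal, self-contained sketch of that double-import hazard (save it as double_import_demo.py; the file name and contents are illustrative):
```python
"""Demo: the same file imported under two module names has two independent
copies of its state, and its module-level side effects run once per import."""
import importlib
import sys

SIDE_EFFECTS = []
SIDE_EFFECTS.append("module-level code ran")  # runs on every import

if __name__ == "__main__":
    # Simulate pytest importing this file by its file name while it is
    # already loaded as __main__ (what happens when pytest collects a file
    # that was started with `python <test file>`).
    reimported = importlib.import_module("double_import_demo")
    print(sys.modules["__main__"] is reimported)            # False: two module objects
    print(len(SIDE_EFFECTS), len(reimported.SIDE_EFFECTS))  # 1 1: duplicated state
```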
My workaround is to run the tests from the `__main__` module. However, this means pytest cannot rewrite assertions (and possibly affects other pytest features I'm not aware of). A better solution might be to call `pytest <test file>` directly and either move all the code in run_tests(argv) to module level or put it in a hook in conftest.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95844
Approved by: https://github.com/huydhn
Summary: Currently, running PyTorch tests with dynamo and inductor is
controlled by environment variables, and CI sets them based on test
config name matching. This change switches them to command-line options of run_test.py.
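A hedged sketch of the shape of that change (the option names below are assumptions for illustration, not necessarily the exact flags added to run_test.py):
```python
import argparse

# Illustrative only: the point is that the test mode is chosen via explicit
# CLI options instead of environment variables matched against config names.
parser = argparse.ArgumentParser(description="sketch of run_test.py mode options")
mode = parser.add_mutually_exclusive_group()
mode.add_argument("--dynamo", action="store_true",
                  help="run the selected tests with TorchDynamo enabled")
mode.add_argument("--inductor", action="store_true",
                  help="run the selected tests with TorchInductor enabled")

args = parser.parse_args(["--inductor"])
print(args.dynamo, args.inductor)  # False True
```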
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94539
Approved by: https://github.com/huydhn
Part of my effort to move everything to pytest and decrease the number of test-runner frameworks in CI.
This gives XML reports, but they might look a bit weird because these are module-level tests rather than tests in classes.
It doesn't give the skip/disable-test infra because that is tied to classes. (For future reference, we could either put the tests in classes or move the check_if_enable logic into a pytest hook.)
Tested in CI and checked that the same number of tests is run.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95659
Approved by: https://github.com/huydhn
The time comparison between using MultiThreadedTestCase and MultiProcessTestCase on the op db tests is amazing!
Using MultiThreadedTestCase on an AWS dev node:
```
time pytest test/distributed/_tensor/test_dtensor_ops.py
============= 175 passed, 42 skipped, 397 xfailed in 80.30s (0:01:20) =======
real 1m22.330s
user 1m38.782s
sys 0m18.762s
```
MultiProcessTestCase takes anywhere from 40 minutes to more than an hour, even when using pytest's parallel testing tools.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/92198
Approved by: https://github.com/XilunWu
Fixes #90940. This PR revamps how tests are run in parallel, as well as device visibility both at the Docker container level and within the run_test.py test runner.
First, running multiple test modules concurrently on the same GPU was causing instability for ROCm runners, manifesting as timeouts. ROCm runners have at least 1 GPU each, but often 2 or more. This PR allows NUM_PROCS to be set equal to the number of devices available, while taking care to set HIP_VISIBLE_DEVICES so that no GPU is oversubscribed.
Second, we had introduced the `-e ROCR_VISIBLE_DEVICES` env var (#91031) to prepare for two GHA runners per CI node, splitting GPU visibility at the Docker level between the two runners. This effort wasn't fully realized; to date, we haven't had more than one runner per CI host. We abandon that effort in favor of making all GPUs visible to a single runner and managing GPU resources as stated above.
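A hedged sketch of the scheme described above (not the actual run_test.py implementation): NUM_PROCS equals the number of visible GPUs, and each concurrent test process is pinned to its own device via HIP_VISIBLE_DEVICES.
```python
import os

# Illustrative values: e.g. a ROCm runner with two GPUs, one test process
# per device so no GPU is oversubscribed.
NUM_GPUS = 2
NUM_PROCS = NUM_GPUS

for proc_index in range(NUM_PROCS):
    env = dict(os.environ)
    env["HIP_VISIBLE_DEVICES"] = str(proc_index % NUM_GPUS)
    print(f"proc {proc_index}: HIP_VISIBLE_DEVICES={env['HIP_VISIBLE_DEVICES']}")
```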
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91137
Approved by: https://github.com/kit1980, https://github.com/huydhn, https://github.com/pruthvistony