<!--
copilot:summary
-->
### <samp>🤖 Generated by Copilot at 4f0b524</samp>
This pull request updates the codebase and the documentation to use C++17 instead of C++14 as the minimum required C++ standard. This affects the `ATen`, `c10`, and `torch` libraries and their dependencies, as well as the CI system and the `conda` package metadata.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100557
Approved by: https://github.com/malfet
`getpass.getuser` may raise exceptions in some circumstances, and because the default cache dir is assembled eagerly, users cannot avoid the failure by overriding the cache dir with the env variable `TORCHINDUCTOR_CACHE_DIR`. Hence the default cache dir should be assembled lazily.
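A minimal sketch of the lazy pattern, not the actual Inductor code: `default_cache_dir` and `cache_dir` are hypothetical helper names, and the directory layout is an assumption. The env override is checked first, so `getpass.getuser` is only called when no override is set.

```python
import functools
import getpass
import os
import re
import tempfile


@functools.lru_cache(maxsize=None)
def default_cache_dir() -> str:
    """Hypothetical helper: build the per-user default cache dir on first use."""
    # getpass.getuser() can raise, so it is called here on demand
    # rather than once at import time.
    user = re.sub(r'[\\/:*?"<>|]', "_", getpass.getuser())
    return os.path.join(tempfile.gettempdir(), "torchinductor_" + user)


def cache_dir() -> str:
    """Return the env override if set; otherwise fall back to the lazy default."""
    override = os.environ.get("TORCHINDUCTOR_CACHE_DIR")
    if override is not None:
        return override
    return default_cache_dir()
```

With this shape, a failing `getpass.getuser` only matters when the env variable is unset.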
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100824
Approved by: https://github.com/ezyang
cudaGetLastError and hipGetLastError will clear any error value within CUDA and HIP, respectively. This is often done on purpose to clear benign errors. Discarding the return value should be indicated by casting to void and a nearby comment. This silences warnings from HIP:
warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
An audit of the PyTorch sources found one use of cudaGetLastError whose return value was incorrectly ignored, in IndexKernel.cu.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100488
Approved by: https://github.com/ezyang
Summary:
Previously the node annotation looked like the following:
```
node.meta["..."] = {
"input_act_obs_or_fq_ctr": ...,
"weight_obs_or_fq_ctr": ...,
"weight_index": 1,
}
```
Basically we needed to specify the index for the weight and also have a separate key for the weight config. In this PR we changed that to:
```
node.meta["..."] = {
"input_act_obs_or_fq_ctr_map": {input_node: ..., weight_node: ...},
}
```
This can support specifying the observer/fake quant constructor for any argument of the node.
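A rough sketch of how the two layouts relate, using plain strings as stand-ins for FX nodes and observer/fake quant constructors so that it runs without torch. `to_map_annotation` is a hypothetical helper, and treating every non-weight argument as an input activation is an assumption made only for this illustration.

```python
# Hypothetical stand-ins: strings play the role of FX nodes and of
# observer / fake-quant constructors.
input_node, weight_node = "x", "conv_weight"

# Old layout: one key per role plus an index locating the weight argument.
old_annotation = {
    "input_act_obs_or_fq_ctr": "act_observer_ctr",
    "weight_obs_or_fq_ctr": "weight_observer_ctr",
    "weight_index": 1,
}


def to_map_annotation(old, args):
    """Convert the old layout into the per-argument map layout."""
    ctr_map = {}
    for i, arg in enumerate(args):
        if i == old["weight_index"]:
            ctr_map[arg] = old["weight_obs_or_fq_ctr"]
        else:
            # Assumption: every other argument is an input activation.
            ctr_map[arg] = old["input_act_obs_or_fq_ctr"]
    return {"input_act_obs_or_fq_ctr_map": ctr_map}


new_annotation = to_map_annotation(old_annotation, [input_node, weight_node])
```

Because the map is keyed by argument node, no separate weight key or index is needed.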
Test Plan: buck2 test @//mode/opt //caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_resnet18_with_quantizer_api (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2EModels)'
Reviewed By: kimishpatel
Differential Revision: D45553195
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101005
Approved by: https://github.com/kimishpatel
Fixes the error:
```
/var/lib/jenkins/pytorch/test/inductor/test_torchinductor.py:6021: PytestCollectionWarning: cannot collect test class 'TestFailure' because it has a __init__ constructor (from: test/inductor/test_torchinductor.py)
class TestFailure:
```
It does so by marking the class as not actually being a test class, despite its name starting with `Test`.
For more details see: https://stackoverflow.com/a/72465142/21539
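A minimal sketch of the marking, assuming it uses pytest's `__test__ = False` class attribute. The constructor arguments shown here are illustrative and not taken from `test_torchinductor.py`.

```python
class TestFailure:
    # Tell pytest not to collect this class even though its name starts
    # with "Test"; it is a helper with an __init__, not a test case.
    __test__ = False

    def __init__(self, suffixes, is_skip=False):
        # Illustrative fields only.
        self.suffixes = suffixes
        self.is_skip = is_skip
```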
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100949
Approved by: https://github.com/huydhn
Dynamo will frequently segfault when attempting to print stack traces. We fix this by:
- Fixing stack size calculations, as we did not account for exception tables
- Creating shadow execution frames in a way that more closely resembles what CPython does to create its execution frames
Dynamo/inductor-wrapped pytorch tests are enabled up the stack - those need to be green before this PR can be merged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99934
Approved by: https://github.com/albanD, https://github.com/malfet, https://github.com/jansel
Summary:
For each op, we have a List[List[dtype;dim-order]]:
- the inner list contains the `dtype;dim-order` info for each arg if we have a Tensor/TensorList/OptionalTensorList
- the outer list contains different occurrences of dtype/dim-order combinations for that op in the program
Example:
```
et_kernel_metadata:
aten::add.out:
# A list of different dtype/dim-order combinations used in model
- # Each contains the list of args of Tensor dtype and dim order if applicable
- FLOAT;0,1
- FLOAT;0,1
- NON_TENSOR_ARG
- FLOAT;0,1
- FLOAT;0,1
-
- INT;0,1
- INT;0,1
- NON_TENSOR_ARG
- INT;0,1
- INT;0,1
aten::mul.out:
- - FLOAT;0,1
- FLOAT;0,1
- FLOAT;0,1
- FLOAT;0,1
```
We don't have the arg name so far; we need to parse the schema (functions.yaml) to get that info. We depend on the order of args from that file.
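As an illustration of the entry format only, here is a small sketch that parses one combination from the example above. `parse_arg` is a hypothetical helper, not part of the codegen tools.

```python
from typing import List, Optional, Tuple

NON_TENSOR_ARG = "NON_TENSOR_ARG"


def parse_arg(entry: str) -> Optional[Tuple[str, List[int]]]:
    """Parse one `dtype;dim-order` entry; return None for non-tensor args."""
    if entry == NON_TENSOR_ARG:
        return None
    dtype, dim_order = entry.split(";")
    return dtype, [int(d) for d in dim_order.split(",")]


# One combination for aten::add.out, copied from the example above.
combo = ["FLOAT;0,1", "FLOAT;0,1", "NON_TENSOR_ARG", "FLOAT;0,1", "FLOAT;0,1"]
parsed = [parse_arg(entry) for entry in combo]
```

The position of each parsed entry corresponds to the argument order in the schema, which is why the arg names have to come from `functions.yaml`.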
Test Plan: `buck run fbcode//executorch/codegen/tools:test_gen_oplist_real_model`
Differential Revision: D45551409
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100665
Approved by: https://github.com/larryliu0820
After https://github.com/pytorch/pytorch/pull/99559, we can now run C++ tests with `run_test.py`. Although advanced features such as `--import-slow-tests` and `--import-disabled-tests` won't work for now, there will still be a gain in reliability and performance as C++ tests can now be retried and run in parallel.
This covers all C++ tests in the CI, including aten, libtorch, and Vulkan C++ tests, across all platforms (Linux, Windows, macOS).
Notes:
* To support C++ test discovery, the env variable `CPP_TESTS_DIR` can be set to where the C++ test binaries are located
* Support pytest -k argument via run_test as this is used by pytest-cpp to replace `--gtest-filter`
* The XML output is in pytest format, but it's ok for now because we don't have slow test or flaky test support for C++ tests yet
* ~~I need to figure out why conftest.py doesn't work when I invoke pytest directly for C++ test, so `--sc` is not available for C++ tests at the moment. Proper pytest plugin like stepwise works fine though. I'll investigate and fix it in a separate PR~~ Found the cause: `conftest.py` is per directory and needs to be in any arbitrary directory that holds C++ tests
* Two tests, `test_api` and `test_tensorexpr`, timed out on ASAN. I suspect that ASAN is now used on top of the python executable, which is slower than running native C++ code. IMO, it's ok to run these tests as before on ASAN for now
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99956
Approved by: https://github.com/clee2000, https://github.com/ZainRizvi
Summary:
This fixes flakiness of div_to_scalar_wrapped
See [here](b89f74aa35) for flakiness of div_to_scalar_wrapped
Test Plan:
On Devserver:
```
LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run //xplat/caffe2:pt_vulkan_api_test_bin
```
On Mac:
```
buck run --target-platforms ovr_config//platform/macos:arm64-fbsource -c pt.vulkan_full_precision=1 //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64
```
To test that these changes fixed flakiness of div_to_scalar_wrapped, I ran the test 1000 times on devserver before the changes, and observed failures. Then ran it 1000 times after the changes and didn't observe any failures.
Reviewed By: SS-JIA
Differential Revision: D45670642
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100909
Approved by: https://github.com/SS-JIA
Summary: This tests running a conv2d with clamp after dividing the input tensor by another tensor. Both tensors have number of channels = 3 (i.e. not a multiple of 4), and therefore the channel dimension was padded. Hence, we are testing our divide-by-zero fix (D44392406).
Test Plan:
```
buck run --target-platforms ovr_config//platform/macos:arm64-fbsource -c pt.vulkan_full_precision=1 //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 -- --gtest_filter="VulkanAPITest.conv2d_clamp_after_div"
```
Reviewed By: SS-JIA
Differential Revision: D44550026
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100910
Approved by: https://github.com/SS-JIA
Summary:
This PR adds support for folding bn weights into conv for the QAT flow. This is equivalent
to the QAT branch of `from_float` in the eager mode quantized conv module: https://github.com/pytorch/pytorch/blob/main/torch/ao/nn/quantized/modules/conv.py#L223
Items that need follow-up:
* I added some workarounds because quantize_per_tensor uses float/int args and dynamo does not support these args. These need to be fixed after we change the quantized model representation and also change these args to Tensor
Test Plan: buck2 test @//mode/opt //caffe2/test:quantization_pt2e -- --exact 'caffe2/test:quantization_pt2e - test_convert_qat_conv_bn_fusion (quantization.pt2e.test_quantize_pt2e.TestQuantizePT2E)'
Reviewed By: andrewor14
Differential Revision: D45344281
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100442
Approved by: https://github.com/kimishpatel
PyTorch is a C++17 project, so let's use some C++17 features.
I.e. `s/std::is_same<X, Y>::value/std::is_same_v<X, Y>`
And use `if constexpr` in a few places where this construct is used.
<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at 7b7683f</samp>
> _We're sailing on the sea of code, we're making it more neat_
> _We're using `is_same_v` and `if constexpr` to keep it sweet_
> _We're refactoring the range tensor logic, we're avoiding duplication_
> _We're heaving on the ropes of `Distributions.mm`, on the count of three, with elation_
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100975
Approved by: https://github.com/jeanschmidt, https://github.com/albanD, https://github.com/kulinseth, https://github.com/Skylion007
Without these changes, it can be hard to know which magic methods are not implemented on a given ScriptObject.
before:
```py
torch.ops.load_library("somelib.so")
c = torch.classes.somelib.SomeClass()
print(len(c))
# raise NotImplementedError
```
after:
```py
torch.ops.load_library("somelib.so")
c = torch.classes.somelib.SomeClass()
print(len(c))
# raise NotImplementedError: '__len__' is not implemented for __torch__.torch.classes.somelib.SomeClass
```
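The message format can be sketched in pure Python with a stand-in class. `FakeScriptObject` is hypothetical and is not how torch implements ScriptObject; it only shows the method name and qualified class name being included in the error.

```python
class FakeScriptObject:
    """Hypothetical stand-in for a ScriptObject; not the torch implementation."""

    def __init__(self, qualified_name):
        self._qualified_name = qualified_name

    def _missing(self, method_name):
        # Name both the magic method and the class in the message.
        return NotImplementedError(
            f"'{method_name}' is not implemented for {self._qualified_name}"
        )

    def __len__(self):
        raise self._missing("__len__")


obj = FakeScriptObject("__torch__.torch.classes.somelib.SomeClass")
try:
    len(obj)
except NotImplementedError as err:
    message = str(err)
```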
------
I could not find a linked issue. If you want me to open one as well, I can do this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100171
Approved by: https://github.com/ezyang
Summary:
Currently there are build configs where the torchdynamo import trips over a
strange SystemError related to some module's __dict__.items() returning NULL,
while torchdynamo tries to iterate all torch modules and process them for
its allowed functions list.
While this is hard to repro, we should be able to work around it and then fix
it properly.
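One possible shape for such a workaround, as a sketch under assumptions: `safe_module_items` is a hypothetical helper, and skipping the offending module is an assumed policy rather than what this change necessarily does.

```python
import types


def safe_module_items(module):
    """Return the module's (name, value) pairs, or [] if the lookup fails."""
    try:
        return list(module.__dict__.items())
    except SystemError:
        # Assumed failure mode from the description above: items() returns
        # NULL, which surfaces in Python as a SystemError. Skip the module.
        return []


example = types.ModuleType("example")
example.answer = 42
items = safe_module_items(example)
```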
Test Plan: Rely on others to test this, assuming CI passes.
Reviewed By: anijain2305
Differential Revision: D45663313
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100901
Approved by: https://github.com/yanboliang, https://github.com/malfet