Similar to #143682: for large maximum values we were sampling integers via `%`, which does not produce a uniform distribution. Here we limit the maximum skew to approximately 1% (random32 is used for max values `<= 2**32 / 128`).
This comes with a significant perf penalty, especially for CUDA, but it's a pretty bad bug, so we'll have to figure out what can be done to improve it.
`torch.compile` has always produced correct results for this, and its performance is also significantly better than current eager (eager is ~660 GB/s on H100, `torch.compile` is 1200 GB/s), so we have to figure out why `torch.compile` is better.
`__launch_bounds__` slightly regresses perf, so perhaps we can figure out how to specify them better, but it's only 20-30 GB/s, so the big difference is still unexplained.
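As a rough illustration of where the skew comes from, the sketch below counts how often each residue occurs when a uniform `bits`-wide integer is reduced with `%`. The helper name `modulo_skew` is hypothetical and is not part of PyTorch; reading the "approx 1%" figure as `1/128` is an assumption based on the `2**32 / 128` threshold quoted above.

```python
def modulo_skew(n: int, bits: int = 32) -> float:
    """Relative gap between the most and least likely outputs of `r % n`,
    where r is uniform over [0, 2**bits). Hypothetical helper for illustration."""
    if not 0 < n <= 2**bits:
        raise ValueError("n must be in (0, 2**bits]")
    full, rem = divmod(2**bits, n)
    # `rem` residues occur (full + 1) times; the other n - rem occur `full` times.
    return 0.0 if rem == 0 else 1.0 / full


# Just below the 2**32 / 128 threshold the skew is at most 1/128 (about 0.8%).
print(modulo_skew(2**25 - 1))
# For a range slightly above 2**31, some values are twice as likely as others.
print(modulo_skew(2**31 + 1))
```

The skew shrinks as the range gets smaller relative to `2**bits`, which is why a wider random source is needed for large maximum values.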
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143787
Approved by: https://github.com/eqy
Related to #107302
We saw `test_float_to_int_conversion_nonfinite` fail when upgrading to NumPy 2.
The failure is caused by the undefined behavior of NumPy when casting `inf`, `-inf` and `nan` from `np.float32` to other dtypes.
The test uses NumPy as the ground-truth reference (see lines 1013-1015).
However, these behaviors are undefined in NumPy.
If you do `np.array([float("inf")]).astype(np.uint8, casting="safe")`, it results in an error `TypeError: Cannot cast array data from dtype('float64') to dtype('uint8') according to the rule 'safe'`.
Undefined behavior is always subject to change.
This PR addresses the issue by passing concrete values as the ground-truth references.
In the future, even if NumPy changes its behavior, the test will remain stable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138131
Approved by: https://github.com/drisspg
Summary: Float8 is becoming an increasingly popular datatype now that it is well supported on GPUs. This diff enables FP8 to work with `torch.cat`. This is pretty straightforward since memory operations don't vary based on the input dtype, but it can be quite helpful for FP8-based models.
Test Plan:
```
buck2 run mode/opt -c fbcode.enable_gpu_sections=true -c fbcode.platform=platform010 -c fbcode.nvcc_arch=h100a -c fbcode.platform010_cuda_version=12 //caffe2/test:tensor_creation -- -r test_cat_all_dtypes_and_devices
```
Differential Revision: D64443965
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138046
Approved by: https://github.com/eqy, https://github.com/qchip, https://github.com/jianyuh
Update ruff to 0.4.1.
This version fixes a lot of false negatives/false positives, is 20-40% faster, and has various other bug fixes.
Below is a before-and-after table showing the execution time of ruff lint and ruff format in milliseconds, courtesy of https://astral.sh/blog/ruff-v0.4.0
| Repository | Linter (v0.3) | Linter (v0.4) | Formatter (v0.3) | Formatter (v0.4) |
|----------------------------------------------------|---------------|---------------|------------------|------------------|
| [pytorch/pytorch](https://github.com/pytorch/pytorch) | 328.7 | 251.8 | 351.1 | 274.9 |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124549
Approved by: https://github.com/ezyang
Using `@skipIfTorchDynamo` without parentheses is wrong; the correct usage is
`@skipIfTorchDynamo()` or `@skipIfTorchDynamo("msg")`. The bare form would cause
tests to stop existing.
Added an assertion for this and fixed the incorrect callsites.
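To show why the bare form silently drops a test, here is a minimal sketch with a stand-in decorator factory. The names `skip_if_unchecked`, `skip_if_checked` and `SKIP_ENABLED` are hypothetical and only mimic the general shape of such a decorator; this is not the actual PyTorch implementation.

```python
import functools
import unittest

SKIP_ENABLED = False  # hypothetical flag standing in for the dynamo test mode


def skip_if_unchecked(msg="skipped"):
    """Decorator factory with no guard against being used without parentheses."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if SKIP_ENABLED:
                raise unittest.SkipTest(msg)
            return fn(*args, **kwargs)
        return wrapper
    return decorator


def skip_if_checked(msg="skipped"):
    # Bare usage passes the test function in as `msg`, so fail loudly instead.
    assert isinstance(msg, str), "use @skip_if_checked() or @skip_if_checked('msg')"
    return skip_if_unchecked(msg)


calls = []


class Demo:
    @skip_if_unchecked  # wrong: the name is bound to `decorator`, not to a test
    def test_bare(self):
        calls.append("bare")

    @skip_if_unchecked()  # right: the name is bound to the wrapped test
    def test_called(self):
        calls.append("called")


demo = Demo()
bare_result = demo.test_bare()  # returns a wrapper; the test body never runs
demo.test_called()

try:
    @skip_if_checked  # the guarded factory rejects bare usage at import time
    def test_guarded(self):
        pass
    guard_fired = False
except AssertionError:
    guard_fired = True
```

With the unguarded factory, calling `test_bare` appears to succeed while the body is never executed; the assertion turns that silent loss into an immediate error at the callsite.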
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117114
Approved by: https://github.com/voznesenskym
Fixes #115595
As an aside, there are currently no tests checking the output of `torch.signal.windows.kaiser` against the output of scipy's implementation, which is what is done with `torch.kaiser_window`. The same goes for the other window functions in `torch.signal.windows`. I did some tests on my end, but I'm not sure what the best practice is, so I haven't included them for now.
@gchanan @mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116470
Approved by: https://github.com/ezyang
Fixes #68972
Relands #107246
To avoid causing Meta-internal CI failures, this PR avoids always asserting that the default dtype is float in the `TestCase.setUp/tearDown` methods. Instead, the assert is only done if `TestCase._default_dtype_check_enabled == True`. `_default_dtype_check_enabled` is set to True in the `if __name__ == "__main__":` blocks of all the relevant test files that required changes for this issue.
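The opt-in check described above can be sketched roughly as follows. This is a simplified stand-in, not the PyTorch test harness: the module-level `_default_dtype` string replaces the real default dtype, and `LeakyTest` and `run_leaky` are hypothetical names used only for the demonstration.

```python
import unittest

_default_dtype = "float32"  # stand-in for the framework-wide default dtype


class TestCase(unittest.TestCase):
    # Off by default, so suites that never opt in are unaffected.
    _default_dtype_check_enabled = False

    def _assert_default_dtype(self):
        if TestCase._default_dtype_check_enabled:
            assert _default_dtype == "float32", "default dtype leaked from a test"

    def setUp(self):
        self._assert_default_dtype()

    def tearDown(self):
        self._assert_default_dtype()


class LeakyTest(TestCase):
    def test_leaks_default_dtype(self):
        global _default_dtype
        _default_dtype = "float64"  # changed and never restored


def run_leaky(check_enabled: bool) -> bool:
    """Run LeakyTest once and report whether the run was clean."""
    global _default_dtype
    _default_dtype = "float32"
    TestCase._default_dtype_check_enabled = check_enabled
    result = unittest.TestResult()
    try:
        unittest.defaultTestLoader.loadTestsFromTestCase(LeakyTest).run(result)
    finally:
        # Restore the global state so later runs start clean.
        TestCase._default_dtype_check_enabled = False
        _default_dtype = "float32"
    return result.wasSuccessful()
```

In this sketch `run_leaky(False)` reports a clean run while `run_leaky(True)` reports a failure from `tearDown`. A test file would opt in by setting the flag inside its `if __name__ == "__main__":` block before running the tests.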
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108088
Approved by: https://github.com/ezyang
This updates ruff to 0.285, which is faster and better, and fixes a bunch of false negatives with regard to f-strings.
I also enabled RUF017, which looks for accidental quadratic list summation. Luckily, it seems there are no instances of it in our codebase, so I'm enabling it so that it stays that way. :)
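For context, the pattern RUF017 targets is flattening a list of lists with `sum(..., [])`. The sketch below contrasts it with one common linear-time alternative; the function names are illustrative only.

```python
import itertools


def flatten_quadratic(lists):
    # Flagged pattern: every `+` builds a brand-new list, so the total
    # amount of copying grows quadratically with the number of sublists.
    return sum(lists, [])


def flatten_linear(lists):
    # Each element is visited once.
    return list(itertools.chain.from_iterable(lists))


nested = [[1, 2], [3], [], [4, 5]]
print(flatten_quadratic(nested))
print(flatten_linear(nested))
```

Both functions return the same flattened list; only the cost differs as the input grows.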
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
RFC: https://github.com/pytorch/rfcs/pull/54
First commit is the contents of https://github.com/Quansight-Labs/numpy_pytorch_interop/
We have already been using this in core for the last few months as an external dependency. This PR pulls all of it into core.
In the next commits, I do a number of things in this order:
- Fix a few small issues
- Make the tests that this PR adds pass
- Bend over backwards until lintrunner passes
- Remove the optional dependency on `torch_np` and simply rely on the upstreamed code
- Fix a number of dynamo tests that were passing before (I think they were not testing anything) and are not passing now.
Missing from this PR (but not blocking):
- Have a flag that deactivates tracing NumPy functions and simply graph-breaks. There used to be one, but it stopped working after the merge and I removed it. @lezcano to investigate.
- https://github.com/pytorch/pytorch/pull/106431#issuecomment-1667079543. @voznesenskym to submit a fix after we merge.
All the tests in `tests/torch_np` take about 75s to run.
This was joint work by @ev-br, @rgommers, @honno and me. I did not create this PR via ghstack (which would have been convenient) as this is a collaboration, and ghstack doesn't allow for shared contributions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106211
Approved by: https://github.com/ezyang
Notes:
- No segfaults observed in any CI tests: dynamo unittests, inductor unittests, dynamo-wrapped pytorch tests. So we remove the warning that using dynamo with Python 3.11 may result in segfaults.
- Some dynamo-wrapped pytorch tests hang. They will be skipped in the dynamo-wrapped test suite and will be addressed in a future PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99180
Approved by: https://github.com/malfet
This PR fixes divergent-value issues when converting float32 to uint8. The failures of `TestTensorCreationCPU.test_float_to_int_conversion_finite_cpu_uint8` came from PyTorch and NumPy producing different values across platforms. This PR makes two changes:
- Enhance `_float_to_int_conversion_helper()` to accept explicit reference values, so that the reference is stable.
- Omit the test for `float.max` since the results on PyTorch are divergent (e.g. `float.max` -> `uint8` is 0 on x86_64, or 255 on s390x).
Fixes #97794
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98916
Approved by: https://github.com/dagitses
Follow up from: Quansight-Labs/numpy_pytorch_interop#3
This PR adds support for NumPy scalars for `torch.asarray`.
**Before:** treats the scalar as an object that implements the buffer protocol. Thus, interprets the data as the default data type (`float32`)
```python
>>> torch.asarray(numpy.float64(0.5))
tensor([0.0000, 1.7500])
```
**After:** identifies the NumPy scalar and does the "right" thing, i.e. creates a 0-dimensional tensor from the NumPy array that doesn't share its memory
```python
>>> torch.asarray(numpy.float64(0.5))
tensor(0.5000, dtype=torch.float64)
```
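The `tensor([0.0000, 1.7500])` result in the "Before" case can be reproduced without PyTorch by reading the 8 bytes of a `float64` as two `float32` values. This is a stdlib-only sketch; it assumes little-endian byte order, which the `<` format prefix makes explicit.

```python
import struct

raw = struct.pack("<d", 0.5)            # the 8 bytes of a little-endian float64
as_float32 = struct.unpack("<2f", raw)  # the same bytes read as two float32 values
print(as_float32)  # (0.0, 1.75)
```

The low four bytes of `0.5` are all zero, and the high four bytes happen to encode `1.75` as a `float32`, which matches the output shown for the old behavior.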
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90914
Approved by: https://github.com/lezcano, https://github.com/mruberry
Fixes #88652
In the CPU implementation of linspace for integral types, the `base` type in the vectorized implementation is `int64_t`, which loses precision when `base` comes from a floating-point number. Meanwhile, the vectorized implementation tends to suffer from catastrophic cancellation in floating-point arithmetic, since both the `base (start + step * idx)` and the `step` are not exact. The scalar implementation is fine since `start` is always an integer and the result is truncated to an integer as well.
Therefore, in this PR, we skip the vectorized implementation, since vectorization doesn't contribute to performance here anyway. The behaviors on CPU and GPU are now the same. In some cases the results match NumPy's; in other cases they differ from NumPy's, but the difference is not related to the device (CPU or GPU). See https://github.com/pytorch/pytorch/issues/81996#issuecomment-1192980485
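The precision loss from an integer `base` can be illustrated with a deliberately simplified model. This is not the actual kernel: the chunk width of 4, the helper names, and the use of exact `Fraction` arithmetic (to isolate the truncation effect from floating-point rounding) are all assumptions made for the sketch.

```python
from fractions import Fraction


def linspace_scalar(start, end, steps):
    # Compute each value from the integer start, truncating only at the end.
    step = Fraction(end - start, steps - 1)
    return [int(start + step * i) for i in range(steps)]


def linspace_chunked(start, end, steps, width=4):
    # Model of a vectorized loop whose per-chunk base is stored as an integer.
    step = Fraction(end - start, steps - 1)
    out = []
    for chunk_start in range(0, steps, width):
        base = int(start + step * chunk_start)  # fractional part is dropped here
        for lane in range(min(width, steps - chunk_start)):
            out.append(int(base + step * lane))
    return out


scalar = linspace_scalar(0, 50, 16)
chunked = linspace_chunked(0, 50, 16)
mismatches = [i for i, (a, b) in enumerate(zip(scalar, chunked)) if a != b]
print(mismatches)  # [6, 9, 10]
```

In this model, once the fractional part of the chunk base is discarded, later lanes in the same chunk can land one below the value the scalar formula produces.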
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89048
Approved by: https://github.com/mingfeima, https://github.com/jgong5, https://github.com/albanD