pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
albanD	29ddf9a63e	Document dispatch trace build flag (#145517 ) Ok, the build flag seems to have been broken for a while since the function it calls doesn't exist anymore. Repurposed it to enable dispatcher printing (which requires a full (and slow) debug build otherwise). Pull Request resolved: https://github.com/pytorch/pytorch/pull/145517 Approved by: https://github.com/bdhirsh	2025-01-24 03:19:39 +00:00
Nikhil Gupta	41b38f755c	Revert "Reverting the PR adding Kleidiai-based int4 kernels (#145392)" (#145505) https://github.com/pytorch/pytorch/pull/134124 was reverted by https://github.com/pytorch/pytorch/pull/145392 due to a KleidiAI clone issue. 1. This reverts commit 0940eb6d44f3cf69dd840db990245cbe1f78e770 (https://github.com/pytorch/pytorch/pull/145392) and fixes the KleidiAI mirror issue. 2. KleidiAI is now cloned from the GitHub mirror instead of the Arm GitLab. Change-Id: I7d6eee7214cd117d3057d615936fcc3ee6052fa2 Fixes https://github.com/pytorch/pytorch/issues/145273 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145505 Approved by: https://github.com/malfet	2025-01-23 18:50:59 +00:00
albanD	0940eb6d44	Reverting the PR adding Kleidiai-based int4 kernels (#145392 ) Mitigation for https://github.com/pytorch/pytorch/issues/145273 Reverting https://github.com/pytorch/pytorch/pull/134124 and https://github.com/pytorch/pytorch/pull/144074 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145392 Approved by: https://github.com/ZainRizvi, https://github.com/malfet, https://github.com/atalman, https://github.com/digantdesai	2025-01-22 20:11:49 +00:00
Nikita Shulga	dc9b77cc55	[MPS] Support includes in metal objects (#145087) Useful for code reuse when building Metal shaders for both eager mode and MPSInductor, but it requires implementing a `_cpp_embed_headers` tool that, as the name suggests, preprocesses and embeds the included headers into the shader source used for dynamic compilation. Test using:
- `TestMetalLibrary.test_metal_include`
- Moving the `i0`/`i1` implementation to `c10/util/metal_special_math.h` and calling it from the `SpecialOps.metal` shader, which now looks much more compact:
```metal
template <typename T, typename Tout = T>
void kernel i0(constant T* input,
               device Tout* output,
               uint index [[thread_position_in_grid]]) {
  output[index] = c10::i0(static_cast<Tout>(input[index]));
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145087 Approved by: https://github.com/dcci ghstack dependencies: #145023	2025-01-18 05:35:22 +00:00
atalman	a215e174a1	[BE] Remove conda from scripts and build files Part 2 (#145015 ) Continuation of https://github.com/pytorch/pytorch/pull/144870 Remove conda logic from scripts: 1. Remove conda build from triton build script 2. Remove conda checks from setup.py 3. Remove conda from release scripts 4. Script read_conda_versions.sh is not used (checked via git grep) Related to: https://github.com/pytorch/pytorch/issues/138506 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145015 Approved by: https://github.com/malfet, https://github.com/Skylion007	2025-01-17 16:26:24 +00:00
PyTorch MergeBot	94c0f15302	Revert "cpp_wrapper: Move #includes to per-device header files (#143909)" This reverts commit d62b3979dadfa4928ec1c76e850f874d49803125. Reverted https://github.com/pytorch/pytorch/pull/143909 on behalf of https://github.com/kit1980 due to breaking internal builds because of the removal of torch/_inductor/codegen/aoti_runtime/implementation.cpp ([comment](https://github.com/pytorch/pytorch/pull/143909#issuecomment-2597188669))	2025-01-17 00:36:38 +00:00
Benjamin Glass	d62b3979da	cpp_wrapper: Move #includes to per-device header files (#143909 ) This prepares us for the next PR in the stack, where we introduce pre-compiled per-device header files to save compilation time. Differential Revision: [D67938955](https://our.internmc.facebook.com/intern/diff/D67938955) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143909 Approved by: https://github.com/desertfire	2025-01-15 21:14:02 +00:00
atalman	e14c36d3f4	Set maximum supported version of Python as 3.13 (#144396 ) Same as https://github.com/pytorch/pytorch/pull/119743 Required for Release 2.6.0 Pull Request resolved: https://github.com/pytorch/pytorch/pull/144396 Approved by: https://github.com/Skylion007, https://github.com/albanD, https://github.com/malfet	2025-01-08 16:16:10 +00:00
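Together with the 3.9.0 minimum set in #139085, this maximum defines a closed range of supported Python minor versions. A minimal sketch, assuming the two bounds are kept as version tuples and used both for an install-time check and to generate one trove classifier per minor version; the names `python_min_version`, `python_max_version`, `is_supported`, and `supported_classifiers` are hypothetical and not taken from PyTorch's `setup.py`.

```python
import sys

# Hypothetical bounds mirroring the commit messages: minimum 3.9.0, maximum 3.13.
python_min_version = (3, 9, 0)
python_max_version = (3, 13)


def is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) of version_info lies inside the supported range."""
    major_minor = tuple(version_info[:2])
    return python_min_version[:2] <= major_minor <= python_max_version


def supported_classifiers() -> list[str]:
    """Build one trove classifier per supported minor version (3.9 through 3.13)."""
    major = python_min_version[0]
    return [
        f"Programming Language :: Python :: {major}.{minor}"
        for minor in range(python_min_version[1], python_max_version[1] + 1)
    ]


if __name__ == "__main__":
    print(is_supported())
    print(supported_classifiers())
```

Comparing only `(major, minor)` keeps patch releases such as 3.13.1 inside the range, which matches the intent of a "maximum minor version" bound.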
Nikhil Gupta	94737e8a2a	[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124 ) Description: 1. Quantize Linear Layer Weights to 4-bits: Quantize the weights of the Linear layer to 4 bits, using symmetric quantization. Pack two 4-bit weights into one uint8 container. Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32. 2. Prepare Quantized Weights, Scales, and Optional Bias: After quantizing, obtain the quantized_weights, scales, and groupsize. If the original Linear layer has a bias, prepare it as well. 3. Pack the Weights Efficiently: Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias. ```python packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features) ``` Input parameters should include: in_features and out_features (the same as the Linear layer’s corresponding parameters). 4. Perform Dynamic Quantized Matrix Multiplication: Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights. ```python output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights, groupsize, in_features, out_features) ``` Inputs required include: The input tensor, packed_weights , groupsize, and the in_features and out_features. 
API Usage: https://github.com/pytorch/pytorch/issues/143289 Model Perf : 7B Transformer model: Prefill : 340 t/s Decode : 40 t/s 2B Transformer model Prefill : 747 t/s Decode : 80 t/s Tests: python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight Ran 1 test in 0.016s OK python test/test_linalg.py -k test__dyn_quant_matmul_4bit Ran 8 tests in 0.077s OK python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit Ran 8 tests in 11.454s Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/134124 Approved by: https://github.com/digantdesai, https://github.com/malfet	2024-12-20 19:32:03 +00:00
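Steps 1 and 3 above describe symmetric 4-bit quantization with one scale per group (group size a multiple of 32) and two 4-bit values packed into each uint8. The pure-Python sketch below illustrates only that idea; it is not the KleidiAI layout produced by `torch.ops.aten._dyn_quant_pack_4bit_weight`. The signed range [-8, 7], the choice `scale = max|w| / 7`, the low-nibble-first packing order, and the helper names `quantize_group_sym4`, `pack_nibbles`, `unpack_nibbles`, and `dequantize` are all assumptions made for illustration.

```python
def quantize_group_sym4(weights: list[float], group_size: int = 32):
    """Symmetric 4-bit quantization with one scale per group of `group_size` values.

    Returns (q, scales) where q holds signed ints in [-8, 7].
    """
    if group_size % 32 != 0 or len(weights) % group_size != 0:
        raise ValueError("group_size must be a multiple of 32 that divides len(weights)")
    q, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Symmetric: the largest magnitude in the group maps to 7, zero maps to 0.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        q.extend(max(-8, min(7, round(w / scale))) for w in group)
    return q, scales


def pack_nibbles(q: list[int]) -> bytes:
    """Pack two signed 4-bit values into one uint8 (first value in the low nibble)."""
    if len(q) % 2:
        raise ValueError("need an even number of values")
    return bytes((q[i] & 0xF) | ((q[i + 1] & 0xF) << 4) for i in range(0, len(q), 2))


def unpack_nibbles(packed: bytes) -> list[int]:
    """Inverse of pack_nibbles: sign-extend each 4-bit field back to an int."""
    out = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out


def dequantize(q: list[int], scales: list[float], group_size: int = 32) -> list[float]:
    """Recover approximate weights by multiplying each value by its group's scale."""
    return [v * scales[i // group_size] for i, v in enumerate(q)]
```

Packing halves the storage relative to one value per byte, and per-group scales keep the rounding error of each weight within half of its group's scale.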
PyTorch MergeBot	8136daff5a	Revert "[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124 )" This reverts commit 4b82251011f85f9d1395b451d61e976af844d9b1. Reverted https://github.com/pytorch/pytorch/pull/134124 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it breaks lots of internal build ([comment](https://github.com/pytorch/pytorch/pull/134124#issuecomment-2555953189))	2024-12-19 23:33:17 +00:00
Nikhil Gupta	4b82251011	[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124 ) Description: 1. Quantize Linear Layer Weights to 4-bits: Quantize the weights of the Linear layer to 4 bits, using symmetric quantization. Pack two 4-bit weights into one uint8 container. Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32. 2. Prepare Quantized Weights, Scales, and Optional Bias: After quantizing, obtain the quantized_weights, scales, and groupsize. If the original Linear layer has a bias, prepare it as well. 3. Pack the Weights Efficiently: Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias. ```python packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features) ``` Input parameters should include: in_features and out_features (the same as the Linear layer’s corresponding parameters). 4. Perform Dynamic Quantized Matrix Multiplication: Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights. ```python output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights, groupsize, in_features, out_features) ``` Inputs required include: The input tensor, packed_weights , groupsize, and the in_features and out_features. 
API Usage: https://github.com/pytorch/pytorch/issues/143289 Model Perf : 7B Transformer model: Prefill : 340 t/s Decode : 40 t/s 2B Transformer model Prefill : 747 t/s Decode : 80 t/s Tests: python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight Ran 1 test in 0.016s OK python test/test_linalg.py -k test__dyn_quant_matmul_4bit Ran 8 tests in 0.077s OK python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit Ran 8 tests in 11.454s Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/134124 Approved by: https://github.com/digantdesai, https://github.com/malfet	2024-12-19 18:51:26 +00:00
PyTorch MergeBot	14fe1f7190	Revert "[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124 )" This reverts commit d3ff2d42c28a2c187cbedfd8f60b84a4dfa2d6bf. Reverted https://github.com/pytorch/pytorch/pull/134124 on behalf of https://github.com/malfet due to This broke S390 builds, includes cpuinfo unconditionally ([comment](https://github.com/pytorch/pytorch/pull/134124#issuecomment-2552560208))	2024-12-19 01:05:11 +00:00
Nikhil Gupta	d3ff2d42c2	[ARM][feat]: Add 4 bit dynamic quantization matmuls & KleidiAI Backend (#134124 ) Description: 1. Quantize Linear Layer Weights to 4-bits: Quantize the weights of the Linear layer to 4 bits, using symmetric quantization. Pack two 4-bit weights into one uint8 container. Choose a quantization scheme (channel-wise or group-wise), with the group size being a multiple of 32. 2. Prepare Quantized Weights, Scales, and Optional Bias: After quantizing, obtain the quantized_weights, scales, and groupsize. If the original Linear layer has a bias, prepare it as well. 3. Pack the Weights Efficiently: Use torch.ops.aten._dyn_quant_pack_4bit_weight to optimally pack the weights, scales, and optional bias. ```python packed_weights = torch.ops.aten._dyn_quant_pack_4bit_weight(weight, scales_and_zeros, bias, groupsize, in_features, out_features) ``` Input parameters should include: in_features and out_features (the same as the Linear layer’s corresponding parameters). 4. Perform Dynamic Quantized Matrix Multiplication: Use torch.ops.aten._dyn_quant_matmul_4bit to perform matrix multiplication with quantized weights. ```python output = torch.ops.aten._dyn_quant_matmul_4bit(input, packed_weights, groupsize, in_features, out_features) ``` Inputs required include: The input tensor, packed_weights , groupsize, and the in_features and out_features. 
API Usage: https://github.com/pytorch/pytorch/issues/143289 Model Perf : 7B Transformer model: Prefill : 340 t/s Decode : 40 t/s 2B Transformer model Prefill : 747 t/s Decode : 80 t/s Tests: python test/test_linalg.py -k test__dyn_quant_pack_4bit_weight Ran 1 test in 0.016s OK python test/test_linalg.py -k test__dyn_quant_matmul_4bit Ran 8 tests in 0.077s OK python test/test_linalg.py -k test_compile_dyn_quant_matmul_4bit Ran 8 tests in 11.454s Change-Id: Ia1672bad5e6ec94e64d8bb1971395d60f4b3a452 Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/134124 Approved by: https://github.com/digantdesai, https://github.com/malfet	2024-12-18 22:30:07 +00:00
Xinya Zhang	424156c26c	[ROCm] Update to AOTriton 0.8b (#140172) Notable new features for SDPA operators on AMD systems from AOTriton 0.8b: 1. NestedTensor support; 2. MQA/GQA support; 3. Efficient attention support restored for causal=True with seqlen_q != seqlen_k (the kernel should use top-left alignment; bottom-right alignment will be added later); 4. gfx1100 (RX7900/W7800/W7900) moved out of experimental support status. However, users are strongly recommended to update to ROCm 6.2.4, notably for its firmware updates. Related unit tests are enabled as well. Notable related changes from AOTriton 0.8b: 1. AOTriton 0.8b moves the GPU kernels out of libaotriton.so into a separate directory, `aotriton.images`; 2. LZMA replaces ZSTD as the GPU kernel compression algorithm for a better compression ratio: AOTriton 0.8b (.so plus aotriton.images) takes 350MB, compared to 800MB for the AOTriton 0.7b .so; 3. Compression can no longer be disabled, and `liblzma` is a hard run-time dependency. This should not be a problem, since `lzma` is part of the Python standard library. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140172 Approved by: https://github.com/jithunnair-amd, https://github.com/jeffdaily Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>	2024-12-06 21:45:18 +00:00
Zhengxu Chen	1a7da6e7e9	[export] Add test to enforce consistency between synced thrift and generated thrift from schema.py (#141989 ) Summary: In this diff we implement a way to ensure the internal thrift schema from cfgr (configerator/structs/caffe2/torch/export/schema.thrift) and the schema in OSS (torch/_export/serde/schema.thrift) are in sync, by adding a unittest to reflect on the type names and fields from each schema and compare them field by field. When we detect new fields/types from torch/_export/serde/schema.thrift, there'll be a test failure on the trunk and the error message hints people to add the missing field/type to the thrift schema from cfgr, so that they are always in sync in practice. Test Plan: buck test mode/opt caffe2/test:test_export -- -r test_thrift_schema_in_sync Differential Revision: D66716834 Pull Request resolved: https://github.com/pytorch/pytorch/pull/141989 Approved by: https://github.com/yiming0416	2024-12-06 18:42:20 +00:00
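The sync test above reflects on the type names and fields of two schemas and compares them field by field, failing with a hint when the OSS schema gains something the synced copy lacks. A minimal sketch of that comparison, assuming each thrift schema has already been parsed into a dict of `{type name: {field name: field type}}` (parsing thrift itself is out of scope); `Schema`, `diff_schemas`, and `assert_in_sync` are hypothetical names, not the test's actual helpers.

```python
from typing import Dict, List

# Hypothetical representation: type name -> {field name: field type}
Schema = Dict[str, Dict[str, str]]


def diff_schemas(oss: Schema, synced: Schema) -> List[str]:
    """Report every type or field present in `oss` but missing or different in `synced`."""
    problems: List[str] = []
    for type_name, fields in sorted(oss.items()):
        if type_name not in synced:
            problems.append(f"missing type {type_name}")
            continue
        for field_name, field_type in sorted(fields.items()):
            other = synced[type_name].get(field_name)
            if other is None:
                problems.append(f"missing field {type_name}.{field_name}")
            elif other != field_type:
                problems.append(
                    f"type mismatch for {type_name}.{field_name}: {field_type} != {other}"
                )
    return problems


def assert_in_sync(oss: Schema, synced: Schema) -> None:
    """Fail with a message listing what must be added to the synced schema."""
    problems = diff_schemas(oss, synced)
    if problems:
        raise AssertionError(
            "Add the following to the synced thrift schema:\n" + "\n".join(problems)
        )
```

The check is deliberately one-directional (OSS to synced), matching the description that new fields in `torch/_export/serde/schema.thrift` should trigger a failure prompting an update to the internal copy.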
xinan.lin	4742080ed9	[AOTI XPU] Enable Cpp wrapper for Intel GPU. (#135318) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135318 Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/guangyey, https://github.com/desertfire	2024-11-26 11:51:32 +00:00
Nikita Shulga	a2ac96cae0	[BE] Rectify some references to caffe2 (#140204) - Rename `tools.build_pytorch_libs.build_caffe2` to `tools.build_pytorch_libs.build_pytorch` - Delete a number of `if BUILD_CAFFE2` conditions Pull Request resolved: https://github.com/pytorch/pytorch/pull/140204 Approved by: https://github.com/huydhn, https://github.com/r-barnes, https://github.com/atalman	2024-11-09 14:14:20 +00:00
Yu, Guangye	46bca8a4b6	Export XPU oneDNN header to the public (#139177) # Motivation Export the oneDNN header to the public so that, for example, third-party extensions can now use `GpuStreamManager` to manage the `dnnl::stream` used to submit oneDNN kernels. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139177 Approved by: https://github.com/gujinghui, https://github.com/EikanWang, https://github.com/malfet	2024-11-01 02:36:16 +00:00
Piotr Bialecki	bd88d40e5f	[Submodule] update submodule onnx==1.17.0 (#139128 ) Follow-up PR of: https://github.com/pytorch/pytorch/pull/138719 CC @malfet @ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/139128 Approved by: https://github.com/malfet	2024-10-31 02:50:00 +00:00
Nikita Shulga	5c49db98b4	[EZ] Update minversion to 3.9.0 (#139085 ) Fixes https://github.com/pytorch/pytorch/issues/138979 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139085 Approved by: https://github.com/kit1980, https://github.com/huydhn, https://github.com/seemethere, https://github.com/Skylion007	2024-10-28 18:04:29 +00:00
Aaron Gokaslan	49ed365b22	[BE]: Update Typeguard to TypeIs for better type inference (#133814 ) Uses TypeIs instead of TypeGuard for better inference. See https://peps.python.org/pep-0742/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/133814 Approved by: https://github.com/ezyang	2024-10-26 15:07:13 +00:00
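PEP 742's `TypeIs` differs from `TypeGuard` in what a type checker may infer: `TypeIs` narrows in both the `if` and the `else` branch (and narrows to the intersection with the declared type), while `TypeGuard` narrows only the positive branch. The sketch below illustrates that difference; runtime behavior is identical, so the effect is visible only to a checker such as mypy or pyright. It assumes `typing_extensions` (or Python 3.13's `typing`) is available to the checker; imports sit under `TYPE_CHECKING` with postponed annotations so the module also runs without it. The functions are illustrative, not code from the PR.

```python
from __future__ import annotations  # annotations stay unevaluated at runtime

from typing import TYPE_CHECKING, Union

if TYPE_CHECKING:
    # Only the type checker needs these; typing.TypeIs exists on Python 3.13+,
    # and typing_extensions provides both names for older versions.
    from typing_extensions import TypeGuard, TypeIs


def is_int_guard(x: Union[int, str]) -> TypeGuard[int]:
    return isinstance(x, int)


def is_int(x: Union[int, str]) -> TypeIs[int]:
    return isinstance(x, int)


def describe(x: Union[int, str]) -> str:
    if is_int(x):
        return f"int {x + 1}"  # checker sees x: int
    return f"str {x.upper()}"  # TypeIs also narrows here: x: str


def describe_with_guard(x: Union[int, str]) -> str:
    if is_int_guard(x):
        return f"int {x + 1}"  # checker sees x: int
    # With TypeGuard the negative branch is NOT narrowed: x is still int | str,
    # so a checker would reject x.upper() here without an extra conversion or check.
    return f"str {str(x).upper()}"
```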
Scott Wolchok	a3de067975	[PyTorch] Use 128-bit vectors for ARM64 (#137426 ) The correct vector length for ARM64 is 128 bits (16 bytes). We were previously using double this, apparently just because that would be the same length as AVX2. Differential Revision: [D63984039](https://our.internmc.facebook.com/intern/diff/D63984039/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137426 Approved by: https://github.com/jgong5, https://github.com/malfet ghstack dependencies: #138486, #138542, #138655, #138716, #138744	2024-10-26 00:20:35 +00:00
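The arithmetic behind this change: a 128-bit ARM64 NEON register is 16 bytes, so it holds half as many elements of any dtype as the 256-bit width previously borrowed from AVX2. The tiny helper below just makes that lane count explicit; `DTYPE_BYTES` and `lanes` are hypothetical names, and the sketch says nothing about how `at::vec` itself is laid out.

```python
# Bytes per element for a few common dtypes.
DTYPE_BYTES = {"float64": 8, "float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def lanes(vector_bits: int, dtype: str) -> int:
    """Number of `dtype` elements that fit in one vector register of `vector_bits` bits."""
    return (vector_bits // 8) // DTYPE_BYTES[dtype]


if __name__ == "__main__":
    for dtype in ("float32", "bfloat16", "int8"):
        print(f"{dtype}: ARM64 NEON (128-bit) = {lanes(128, dtype)}, "
              f"AVX2 (256-bit) = {lanes(256, dtype)}")
```

For example, one 128-bit register holds 4 float32 values, whereas a 256-bit width would have assumed 8.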
PyTorch MergeBot	32d4582e02	Revert "[BE]: Update Typeguard to TypeIs for better type inference (#133814 )" This reverts commit 16caa8c1b3a02e47b5f52d3c2d40d7931cc427dc. Reverted https://github.com/pytorch/pytorch/pull/133814 on behalf of https://github.com/jeanschmidt due to checking if this will solve inductor errors ([comment](https://github.com/pytorch/pytorch/pull/133814#issuecomment-2427565425))	2024-10-21 19:40:58 +00:00
Aaron Gokaslan	16caa8c1b3	[BE]: Update Typeguard to TypeIs for better type inference (#133814 ) Uses TypeIs instead of TypeGuard for better inference. See https://peps.python.org/pep-0742/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/133814 Approved by: https://github.com/ezyang	2024-10-21 17:20:06 +00:00
PyTorch MergeBot	d1027c2be6	Revert "Update sympy version constraint to 1.13.3 (#138338 )" This reverts commit d8279ad9d162b5ce71699f462d3664c3745b14f5. Reverted https://github.com/pytorch/pytorch/pull/138338 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but I think a bunch of inductor tests and test_dynamic_shapes are failing in trunk after this lands `d8279ad9d1` ([comment](https://github.com/pytorch/pytorch/pull/138338#issuecomment-2424487225))	2024-10-20 03:19:02 +00:00
Jeongseok (JS) Lee	d8279ad9d1	Update sympy version constraint to 1.13.3 (#138338) `sympy` was pinned to version 1.13.1 due to test failures with version 1.13.2 on Windows and macOS, as reported in https://github.com/pytorch/pytorch/pull/133235. Now that a newer version, 1.13.3, has been released, this PR aims to verify whether the test failures have been resolved and to allow building with newer versions for packaging purposes (e.g., https://github.com/conda-forge/pytorch-cpu-feedstock/pull/277#discussion_r1806721862). Pull Request resolved: https://github.com/pytorch/pytorch/pull/138338 Approved by: https://github.com/Skylion007, https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2024-10-20 00:20:02 +00:00
Jeongseok Lee	3cfd244495	Add USE_SYSTEM_NVTX option (#138287 ) ## Summary We are currently [updating](https://github.com/conda-forge/pytorch-cpu-feedstock/pull/277) the [`conda-forge::pytorch`](https://anaconda.org/conda-forge/pytorch) package to version 2.5.0. This update includes a new dependency, the third_party/NVTX submodule. However, like other package management frameworks (e.g., apt), conda-forge prefers using system-installed packages instead of vendor-provided third-party packages. This pull request aims to add an option, `USE_SYSTEM_NVTX`, to select whether to use the vendored nvtx or the system-installed one, with the default being the vendored one (which is the current behavior). ## Test Plan The `USE_SYSTEM_NVTX` option is tested by building the `conda-forge::pytorch` package with the change applied as a [patch](`cd1d2464dd/recipe/patches/0005-Use-system-nvtx3.patch`). Pull Request resolved: https://github.com/pytorch/pytorch/pull/138287 Approved by: https://github.com/albanD	2024-10-19 04:26:01 +00:00
Yu, Guangye	8cda774a03	Add torch.xpu.get_arch_list and torch.xpu.get_gencode_flags for XPU (#137773 ) # Motivation Add `torch.xpu.get_arch_list()` and `torch.xpu.get_gencode_flags()` methods that return architecture list and AOT flags to preserve what flags PyTorch XPU was built with. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137773 Approved by: https://github.com/EikanWang, https://github.com/albanD	2024-10-18 02:28:08 +00:00
atalman	6016b8a9be	Remove CI/CD python 3.8 requirements (#137893) Python 3.8 is deprecated from CI/CD, so there is no reason to keep these pins. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137893 Approved by: https://github.com/Skylion007, https://github.com/huydhn, https://github.com/albanD, https://github.com/kit1980	2024-10-14 20:28:48 +00:00
Xuehai Pan	59cdd8ddf1	Bump optree version to 0.13.0 to enable Python 3.13 and Python 3.13t support (#137396 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137396 Approved by: https://github.com/albanD	2024-10-08 06:49:04 +00:00
maajidkhann	5a6ddbcc3b	Extending the Pytorch vec backend for SVE (ARM) (#119571) Motivation: In PyTorch, ATen vectorization supports multiple platforms, including x86 and Arm, as well as multiple data types. It provides a generic implementation of a Vector (Vec) type that lets the programmer write code that packs various primitives (such as floats) into 256-bit and 512-bit registers. It can easily be extended to support other ISAs by adding more VecISA subclasses. Reference link: https://github.com/pytorch/pytorch/tree/main/aten/src/ATen/cpu/vec This PR: * Our goal with this contribution is to add an SVE backend for Vec in ATen's CPU vectorization, which can benefit any Arm CPU that supports SVE. * More about the SVE ISA for Arm: [https://developer.arm.com/Architectures/Scalable Vector Extensions](https://developer.arm.com/Architectures/Scalable%20Vector%20Extensions) * We use the Arm C Language Extensions for SVE (https://developer.arm.com/documentation/102699/0100/Optimizing-with-intrinsics) to accelerate various operators in the SVE backend for Vec. * Currently we add support only for SVE with a vector length of 256 bits (SVE256). In the future, we plan to extend SVE support to other vector lengths as well. Pull Request resolved: https://github.com/pytorch/pytorch/pull/119571 Approved by: https://github.com/malfet, https://github.com/snadampal Co-authored-by: Divya Kotadiya <divya.kotadiya@fujitsu.com>	2024-09-18 18:59:10 +00:00
angelayi	cd9ee49a69	[aoti] Add cpp loader (#135374 ) * Added a cpp loader, AOTIModelPackageLoader, which can load the .pt2, build the .so, and create a runner. The python-facing API is that users can directly call the `run` function, whereas in cpp users can directly access the `runner_` if they are more familiar with that. I couldn't figure out how to bind the `get_runner()` function to python... * Added a new config, `aot_inductor.package_cpp_only` which will not package the so. This means that whenever the package is loaded, we will need to build the so. This is turned off by default so that new environments do not need to rebuild their so. The `package_cpp_only` is a feature which torchchat intends to use to provide flexibility to users. * Added a new config, `aot_inductor.metadata` which stores user-provided metadata, serialized to the pt2 as a json file. It also stores the device used when exporting, "cuda" or "cpu", so that during load time, we can use that data to determine which AOTIModelContainerRunner to use. The metadata can be accessed through `loader.get_metadata()`. TODO is to move this metadata to the toplevel `package_aoti` function so that we can remove the metadata as a config. * Separated out `package_aoti` as a standalone function, instead of it automatically being called in inductor. This is to prepare for the case where users will compile multiple models, and want to bundle it in one package. The specific use case is in torchchat, where we want to package the separately-exported encoder and decoder layers. An example of how to use this is in `test_multiple_methods`. * `load_package` will load a singular model, given the model name. * The loader doesn't support windows for now, I think I need to add some more casing to make the build commands work on windows? 
Differential Revision: [D62329906](https://our.internmc.facebook.com/intern/diff/D62329906) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135374 Approved by: https://github.com/desertfire, https://github.com/malfet	2024-09-11 03:00:01 +00:00
Moritz Marseu	e000cf0ad9	Fix license metadata in setup.py (#129219 ) Package metadata in setup.py lists license as BSD-3 which is not a valid SPDX id. The correct id would be BSD-3-Clause. Specifying an SPDX id is beneficial to license compliance scanning. Taking up #129123 from my personal account. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129219 Approved by: https://github.com/malfet, https://github.com/kit1980	2024-09-04 00:21:22 +00:00
fduwjj	bdfa94b787	[RFC] Make fr trace script a console script (#134729) We want to make the fr analyzer script available as a command after users `pip install torch`, which is why we mimic what torchrun does. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134729 Approved by: https://github.com/c-p-i-o, https://github.com/malfet ghstack dependencies: #134528, #134780	2024-08-30 18:17:06 +00:00
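torchrun is exposed as a command through a setuptools `console_scripts` entry point of the form `name = package.module:function`; on `pip install`, setuptools generates an executable with that name which imports the module and calls the function. A minimal sketch of that declaration shape plus a small parser for it; the `torchrun` line follows the standard convention, while the flight-recorder entry and its module path are hypothetical placeholders, not necessarily what PyTorch ships.

```python
# Shape of a setuptools console_scripts declaration (passed as setup(entry_points=...)).
entry_points = {
    "console_scripts": [
        "torchrun = torch.distributed.run:main",
        # Hypothetical placeholder for the fr trace command; not the real name/path.
        "fr_trace_example = hypothetical.fr_trace_module:main",
    ]
}


def parse_console_script(spec: str) -> tuple[str, str, str]:
    """Split 'name = package.module:function' into (name, module, function)."""
    name, target = (part.strip() for part in spec.split("=", 1))
    module, func = target.split(":", 1)
    return name, module, func


if __name__ == "__main__":
    for spec in entry_points["console_scripts"]:
        print(parse_console_script(spec))
```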
Zitong Zhan	90c821814e	SparseCsrCUDA: cuDSS backend for linalg.solve (#129856) This PR switches to the cuDSS library and has the same purpose as #127692, which is to add Sparse CSR tensor support to linalg.solve. Fixes #69538 Minimal usage example:
```python
import torch

if __name__ == '__main__':
    # Build a small symmetric positive (semi-)definite matrix A = M^T M.
    spd = torch.rand(4, 3)
    A = spd.T @ spd
    b = torch.rand(3).to(torch.float64).cuda()
    # Convert A to sparse CSR on the GPU so linalg.solve takes the sparse path.
    A = A.to_sparse_csr().to(torch.float64).cuda()
    x = torch.linalg.solve(A, b)
    print((A @ x - b).norm())  # residual norm, expected to be close to 0
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129856 Approved by: https://github.com/amjames, https://github.com/lezcano, https://github.com/huydhn Co-authored-by: Zihang Fang <zhfang1108@gmail.com> Co-authored-by: Huy Do <huydhn@gmail.com>	2024-08-22 07:57:30 +00:00
PyTorch MergeBot	2db28a9611	Revert "[BE]: Update Typeguard to TypeIs for better type inference (#133814 )" This reverts commit bce0caba7804b0787684dbf1f4e1c4d9e3acded5. Reverted https://github.com/pytorch/pytorch/pull/133814 on behalf of https://github.com/ezyang due to root cause of internal failures not addressed ([comment](https://github.com/pytorch/pytorch/pull/133814#issuecomment-2302466444))	2024-08-21 16:13:34 +00:00
Aaron Gokaslan	bce0caba78	[BE]: Update Typeguard to TypeIs for better type inference (#133814 ) Uses TypeIs instead of TypeGuard for better inference. See https://peps.python.org/pep-0742/ Pull Request resolved: https://github.com/pytorch/pytorch/pull/133814 Approved by: https://github.com/ezyang	2024-08-20 17:19:57 +00:00
cyy	c3d02fa390	[Reland2] Update NVTX to NVTX3 (#109843) Another attempt to update NVTX to NVTX3. We now avoid changing the NVTX header inclusion of existing code. The advantage of NVTX3 over NVTX is that it is a header-only library, so linking with NVTX3 can greatly simplify our CMake and other build scripts for finding libraries in user environments. In addition, NVTX is still present in the latest CUDA versions, but it is no longer a compiled library: it is now header-only. That's why there isn't a .lib file anymore. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109843 Approved by: https://github.com/peterbell10, https://github.com/eqy Co-authored-by: Ivan Zaitsev <108101595+izaitsevfb@users.noreply.github.com>	2024-08-20 16:33:26 +00:00
Christophe Bornet	d6368985af	[BE]: Fix setuptools not installed with Python 3.12 (#133561 ) setuptools is not installed correctly for Python 3.12. See https://github.com/python-poetry/poetry/issues/9630#issuecomment-2291114885 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133561 Approved by: https://github.com/Skylion007 Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>	2024-08-17 17:42:04 +00:00
Mikayla Gawarecki	018e48c337	[Reland] Add wrappers for synchronous GPUDirect Storage APIs (#133489 ) Reland #130633 USE_CUFILE turned off by default in this version Pull Request resolved: https://github.com/pytorch/pytorch/pull/133489 Approved by: https://github.com/albanD	2024-08-15 17:11:52 +00:00
cyy	e76f0e0646	Remove QNNPACK reference from setup.py (#133177 ) QNNPACK has been removed from third party Pull Request resolved: https://github.com/pytorch/pytorch/pull/133177 Approved by: https://github.com/albanD	2024-08-13 01:19:12 +00:00
Catherine Lee	4f0d5f6551	Pin sympy to 1.13.1 (#133235) Sympy 1.13.2 was released yesterday, and it results in test failures on Windows and macOS `454713fe9d/1`. Hopefully these are all the places where it needs to be pinned. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133235 Approved by: https://github.com/atalman, https://github.com/ZainRizvi	2024-08-12 20:10:09 +00:00
cyy	05e8e87a69	[Submodule] Remove foxi (#132976 ) It is not used after removal of Caffe2 code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132976 Approved by: https://github.com/ezyang	2024-08-09 03:46:52 +00:00
PyTorch MergeBot	e191b83462	Revert "Add wrappers for synchronous GPUDirect Storage APIs (#130633 )" This reverts commit 709ddf7a9dcfa1268848b72f6f56b55afa6728d6. Reverted https://github.com/pytorch/pytorch/pull/130633 on behalf of https://github.com/clee2000 due to still failing internally D60265673 ([comment](https://github.com/pytorch/pytorch/pull/130633#issuecomment-2253239607))	2024-07-26 18:08:20 +00:00
Mikayla Gawarecki	709ddf7a9d	Add wrappers for synchronous GPUDirect Storage APIs (#130633 ) Based in part on https://github.com/NVIDIA/apex/pull/1774 Differential Revision: [D60155434](https://our.internmc.facebook.com/intern/diff/D60155434) Pull Request resolved: https://github.com/pytorch/pytorch/pull/130633 Approved by: https://github.com/albanD	2024-07-25 22:23:38 +00:00
PyTorch MergeBot	e4b5645f83	Revert "Add wrappers for synchronous GPUDirect Storage APIs (#130633 )" This reverts commit 5b5e0698a5f560decb9bbdd150ed7b0622eb7777. Reverted https://github.com/pytorch/pytorch/pull/130633 on behalf of https://github.com/clee2000 due to breaking a lot of jobs and build rules internally D60085885, possibly needs to update some bazel build? ([comment](https://github.com/pytorch/pytorch/pull/130633#issuecomment-2245806738))	2024-07-23 17:19:34 +00:00
Mikayla Gawarecki	5b5e0698a5	Add wrappers for synchronous GPUDirect Storage APIs (#130633 ) Based in part on https://github.com/NVIDIA/apex/pull/1774 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130633 Approved by: https://github.com/albanD	2024-07-22 14:51:24 +00:00
Xuehai Pan	d2bd9acabd	[BE] bump `optree` version to 0.12.1 (#130139) 0.12.0 Major Updates: - Add context manager to temporarily set the dictionary sorting mode - Add accessor APIs - Use `stable` tag for `pybind11` for Python 3.13 support - Fix potential segmentation fault for pickling support 0.12.1 Updates: - Fix warning regression during import when launched with strict warning filters Closes #130155 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130139 Approved by: https://github.com/zou3519 ghstack dependencies: #130895	2024-07-20 02:41:10 +00:00
Xuehai Pan	f0075c179b	Pin `sympy >= 1.13.0` (#130895) The opposite of #130836. Pin `sympy >= 1.13.0` for Python >= 3.9 and `sympy == 1.12.1` for Python 3.8. - #130836 See the PR description of #130836 for more details. `sympy` 1.13.0 introduces some breaking changes which break our tests. More specifically: - Ref [Backwards compatibility breaks and deprecations](https://github.com/sympy/sympy/wiki/release-notes-for-1.13.0#backwards-compatibility-breaks-and-deprecations) > BREAKING CHANGE: Float and Integer/Rational no longer compare equal with a == b. From now on Float(2.0) != Integer(2). Previously expressions involving Float would compare unequal e.g. x*2.0 != x*2 but an individual Float would compare equal to an Integer. In SymPy 1.7 a Float will always compare unequal to an Integer even if they have the same "value". Use sympy.numbers.int_valued(number) to test if a number is a concrete number with no decimal part. ([#25614](https://github.com/sympy/sympy/pull/25614) by [@smichr](https://github.com/smichr)) `sympy >= 1.13.0` is required to enable Python 3.13 support. This should be part of #130689. - #130689 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130895 Approved by: https://github.com/ezyang	2024-07-20 00:59:24 +00:00
Xuehai Pan	a3abfa5cb5	[BE][Easy][1/19] enforce style for empty lines in import segments (#129752 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129752 Approved by: https://github.com/ezyang, https://github.com/malfet	2024-07-16 00:42:56 +00:00


845 Commits