pytorch

mirror of https://github.com/pytorch/pytorch.git synced 2025-10-20 21:14:14 +08:00

Author	SHA1	Message	Date
penguin-wwy	217ba7b2ab	[Docs] Update FileCheck doc (#135199 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135199 Approved by: https://github.com/soulitzer	2024-09-06 08:18:38 +00:00
CaoE	758d515d98	[Inductor][CPP] Select tiling factor for lower precision data types (#133830 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133830 Approved by: https://github.com/jgong5, https://github.com/jansel	2024-09-06 08:12:37 +00:00
Feng Yuan	60d98b4cfb	Update torch-xpu-ops pin (ATen XPU implementation) (#135300 ) Release cycle for PyTorch 2.5 1. Bugfixing: correct reduction logic in cdist kernel. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135300 Approved by: https://github.com/EikanWang	2024-09-06 07:30:09 +00:00
Shangdi Yu	590a3e9f8a	[export][training ir migration] quantized_decomposed.quantize_per_tensor decomposition (#134525 ) Summary: In the graph of TestXNNPACKQuantizer.test_dynamic_linear_with_con test, some quantized_decomposed.quantize_per_tensor.default ops are becoming quantized_decomposed.dequantize_per_tensor.tensor ops when using the new training ir. This is because we lift params/buffers before calling make_fx. Previously, in the graph passed to make_fx, `graph.L__self___linear1.weight` was a tensor; now, in training ir, `graph.L__self___linear1.weight` is a FakeTensor. This caused the node overload to be different. Test Plan: ``` buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test/quantization:test_quantization -- -r test_dynamic_linear_with_conv ``` Differential Revision: D61364547 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134525 Approved by: https://github.com/tugsbayasgalan, https://github.com/jerryzh168	2024-09-06 07:06:06 +00:00
drisspg	764ee6e3f9	[FlexAttention] Specify padding_value for boundary checked loads (#134573 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/134573 Approved by: https://github.com/Chillee	2024-09-06 06:47:26 +00:00
wz337	67f98a99a4	[DeviceMesh][Easy] Make RuntimeError a bit more descriptive by including the actual world_size (#135271 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135271 Approved by: https://github.com/fduwjj	2024-09-06 06:23:20 +00:00
fduwjj	e020a8755a	[Fix][FR][ez] Remove debugging logs (#135308 ) Removing the print added during debugging process. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135308 Approved by: https://github.com/wz337	2024-09-06 06:14:33 +00:00
Jason Ansel	7ffb3b201c	[inductor] Remove LoopBody.reads,writes,other (#135256 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135256 Approved by: https://github.com/oulgen ghstack dependencies: #135070, #135076, #135082, #135084, #135079, #135235	2024-09-06 06:11:55 +00:00
Jason Ansel	f946bf88c4	[inductor] Skip retracing an existing LoopBody (#135235 ) This is roughly a 7% speedup in inductor compile time for hf_Bert_large. The time spent in `LoopBody.__init__` improves from 15% to 8% of `fx_codegen_and_compile`. Before ![image](https://github.com/user-attachments/assets/7de0f28e-35bd-472f-b4be-b52733d2a85c) After ![image](https://github.com/user-attachments/assets/5f0cf11a-43c5-43ae-b13c-f32383a75a7f) Overall ![image](https://github.com/user-attachments/assets/6a369d8c-fb5e-4ad2-9504-0fc745ad6568) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135235 Approved by: https://github.com/oulgen ghstack dependencies: #135070, #135076, #135082, #135084, #135079	2024-09-06 06:11:55 +00:00
Jason Ansel	66da3b3b2a	[fx] Bypass custom __setattr__ in Node.__init__ (#135079 ) Before: ![image](https://github.com/user-attachments/assets/5f0a6ae6-6049-44d0-b5f2-a549a23ad97f) After: ![image](https://github.com/user-attachments/assets/51c9f91b-f8a0-4043-8362-65813feec823) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135079 Approved by: https://github.com/oulgen ghstack dependencies: #135070, #135076, #135082, #135084	2024-09-06 06:11:46 +00:00
Laith Sakka	41e653456e	[RDP] Fix "No module named 'libfb'" (#135244 ) Summary: D62215095 introduced an import error to arvr pipelines as the is_fbcode() function does not work as intended. This changes is_fbcode() to be a much stricter check. Test Plan: ``` buck2 run arvr/mode/platform010/opt-stripped //arvr/libraries/depthlink/clients/mr_replay:pipeline_runner -c bolt.use_eva3_sim=True -- --config_file arvr/libraries/depthlink/clients/mr_replay/configs/runner_config.yaml --features DEPTH ``` Differential Revision: D62237502 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135244 Approved by: https://github.com/aorenste	2024-09-06 04:52:31 +00:00
chilli	e40a0a9359	Add randomness checking for sdpa vmap (#135176 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135176 Approved by: https://github.com/zou3519	2024-09-06 04:50:49 +00:00
Xuan Zhang	c05a7adb36	[inductor][debug] fix draw_buffers (#135266 ) Before: ![image](https://github.com/user-attachments/assets/aac756f3-1349-4647-9da3-87cf105cf647) After: <img width="791" alt="image" src="https://github.com/user-attachments/assets/d72c663c-e598-42fa-ac40-9e58956f1ec1"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/135266 Approved by: https://github.com/yf225	2024-09-06 04:12:41 +00:00
hippocookie	5f57be7571	[Distributed] Change function call in test to non-deprecated to eliminate warning (#134938 ) Migrate the function calls in the test to eliminate the warning message below and reduce the chance of test failures when the deprecated methods are removed: - from deprecated `save_state_dict` to `save` - from deprecated `load_state_dict` to `load` Warning message: ```bash pytorch/test/distributed/checkpoint/test_fsdp_model_state.py:37: FutureWarning: `save_state_dict` is deprecated and will be removed in future versions.Please use `save` instead. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/134938 Approved by: https://github.com/wz337, https://github.com/fegin	2024-09-06 03:25:09 +00:00
Xu Han	29d72c1100	[inductor] check intel compiler minimal version (#135209 ) On Windows, early versions of icx have a `-print-file-name` issue and can't preload correctly for inductor. Add a minimal version check for the Intel compiler. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135209 Approved by: https://github.com/ezyang	2024-09-06 03:21:07 +00:00
leslie-fang-intel	3b1a334c0f	[Inductor][CPP] Avoid mistake wgt tensor delete (#135100 ) Summary Fix issue: https://github.com/pytorch/pytorch/issues/134998: Previously, we only checked if the `get_attr` FX node for the weight had a single user node. However, two `get_attr` nodes may share the same tensor and should not be deleted in such cases. In this PR, we add the count of users for tensor along with the num of users for nodes to decide whether this tensor can be deleted or not. TestPlan ``` python test/inductor/test_cpu_select_algorithm.py -k test_linear_wgt_multi_users ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/135100 Approved by: https://github.com/jgong5	2024-09-06 03:13:36 +00:00
leslie-fang-intel	07689a38bf	[Inductor] Fix AOT weight alignment issue on CPU (#135205 ) Summary Fix issue: https://github.com/pytorch/pytorch/issues/135027. On CPU, the `consts_size` used to generate `_binary_constants_bin_start` is not padded to `ALIGN_BYTES`, while `serialized_weights` is, causing a failure in the 16K alignment check. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135205 Approved by: https://github.com/jgong5, https://github.com/desertfire	2024-09-06 03:06:51 +00:00
Edward Z. Yang	06a7dc21c1	Remove dead expect_rational (#135105 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/135105 Approved by: https://github.com/malfet	2024-09-06 02:57:27 +00:00
Edward Z. Yang	d9a18173fa	Report qualname of exception type rather than <class 'RuntimeError'> (#135146 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/135146 Approved by: https://github.com/Skylion007, https://github.com/albanD, https://github.com/yanboliang ghstack dependencies: #135148, #135145	2024-09-06 02:56:50 +00:00
Edward Z. Yang	d8543e3162	Include exception type qualname when rewrapping InternalTorchDynamoError (#135145 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/135145 Approved by: https://github.com/drisspg, https://github.com/anijain2305 ghstack dependencies: #135148	2024-09-06 02:56:50 +00:00
Edward Z. Yang	ad01fc194d	Consolidate raise and rewrap raise error branches (#135148 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/135148 Approved by: https://github.com/anijain2305, https://github.com/albanD, https://github.com/yanboliang, https://github.com/malfet	2024-09-06 02:56:46 +00:00
Haibo Chen	e162414963	add instrumentation of CCA stats for reserved and allocated memory size (#135231 ) As titled Pull Request resolved: https://github.com/pytorch/pytorch/pull/135231 Approved by: https://github.com/c-p-i-o	2024-09-06 02:48:56 +00:00
Edward Z. Yang	9e5a797771	Improve test_public_bindings import module error reporting (#135258 ) Error was hard to understand without message. Render it now. See https://github.com/pytorch/pytorch/pull/135259 for it in action. Example failure: ``` 2024-09-05T20:04:45.3022000Z FAILED [5.9524s] test_public_bindings.py::TestPublicBindings::test_modules_can_be_imported - AssertionError: String comparison failed: '' != "torch._logging.scribe failed to import w[112 chars].py)" 2024-09-05T20:04:45.3025413Z + torch._logging.scribe failed to import with error ImportError: cannot import name 'TypeAlias' from 'typing' (/opt/conda/envs/py_3.9/lib/python3.9/typing.py) 2024-09-05T20:04:45.3026990Z ``` Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/135258 Approved by: https://github.com/albanD	2024-09-06 02:40:03 +00:00
atalman	b46a1b9e2d	Use Python 3.9 on all libtorch jobs (#135245 ) Part of the migration py3.8->3.9 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135245 Approved by: https://github.com/izaitsevfb	2024-09-06 02:27:22 +00:00
Sunita Nadampalli	9688014820	aarch64: extend matmul heuristic checks to all neoverse platforms (#134548 ) For aarch64 neoverse platforms there are two gemm backends available for the matmul operator in PyTorch: (1) Arm Compute Library and (2) OpenBLAS. While Arm Compute Library provides better performance than OpenBLAS, it has kernel launch overhead, and hence we use OpenBLAS for smaller tensor compute. The heuristic was originally implemented for neoverse_v1. This commit extends the heuristic to other neoverse platforms. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134548 Approved by: https://github.com/malfet	2024-09-06 01:40:50 +00:00
titaiwangms	8f6e73f068	[ONNX] Enable experimental exporter logic to dynamo_export and support refine dynamic_shapes (#134976 ) (1) Enable experimental exporter logic to dynamo_export (2) Refine dynamic shapes and retry export in export strategies (3) Delete `torch_export_graph_extractor` and use the new export logic (4) Disable ExportedProgram test in `test_fx_onnx_with_onnxruntime.py`, as ONNXProgram is different now. Fixes https://github.com/pytorch/pytorch/issues/126479 Fixes #135183 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134976 Approved by: https://github.com/justinchuby	2024-09-06 01:29:56 +00:00
Bin Bao	1e57ef08fa	[AOTI] Support MKLDNN qconv ops in cpp wrapper (#134795 ) Summary: Similar to https://github.com/pytorch/pytorch/pull/134475, support qconv in the ABI-compatible mode for cpp-wrapper Inductor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134795 Approved by: https://github.com/leslie-fang-intel, https://github.com/chunyuan-w, https://github.com/angelayi ghstack dependencies: #134475, #134783	2024-09-06 01:01:53 +00:00
Bin Bao	614b86d602	[AOTI] Support MKLDNN qlinear ops in cpp wrapper (#134783 ) Summary: Similar to https://github.com/pytorch/pytorch/pull/134475, support qlinear in the ABI-compatible mode for cpp-wrapper Inductor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134783 Approved by: https://github.com/leslie-fang-intel, https://github.com/chunyuan-w, https://github.com/angelayi ghstack dependencies: #134475	2024-09-06 01:01:53 +00:00
Bin Bao	0b96dfb736	[AOTI] Support MKLDNN conv ops in cpp wrapper (#134475 ) Summary: Partially fix https://github.com/pytorch/pytorch/issues/123040. In the ABI-compatible mode, MKLDNN fallback ops do not have C shim implementations and thus need to go through the custom ops launch path. Other MKLDNN ops will be fixed in following PRs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134475 Approved by: https://github.com/leslie-fang-intel, https://github.com/chunyuan-w, https://github.com/angelayi	2024-09-06 01:01:53 +00:00
Shivam Raikundalia	62b221d5cc	Add Percentages to Function Events (#135155 ) Summary: Users have recently asked that the profiler add self/total CPU and device percentages to FunctionEvents so that teams can process the data procedurally. Some of it could be done mathematically via subroutines, but since we already have the information in _build_table, let's build it there. Test Plan: Check that we have the same table as before, and also check that the new parameters have the expected values. Differential Revision: D62210351 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135155 Approved by: https://github.com/shanw-meta, https://github.com/kit1980	2024-09-06 00:39:11 +00:00
Laith Sakka	66dd4577b1	Track base of FunctionalTensor in inference mode. (#135141 ) The idea behind the tracking is the following: whenever we see a tensor, if it is a root tensor (it does not have any view metas), then we consider it the base of all the tensors that share its storage. Pull Request resolved: https://github.com/pytorch/pytorch/pull/135141 Approved by: https://github.com/zou3519	2024-09-06 00:10:25 +00:00
cyy	cc28634172	[Submodule] Bump pybind11 to v2.13.5 (#135202 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/135202 Approved by: https://github.com/Skylion007	2024-09-06 00:09:00 +00:00
wz337	c83cdf068b	[DTensor] Fix view op replicating on tensor dim when the size of the tensor dim = 1 (#135054 ) We found a corner case: when a tensor dimension is 1, calling `view(1)` would result in an unexpected replication (see case 1 below). When the tensor dimension to shard is not 1, regardless of whether the tensor dimension is evenly shardable across the mesh dimension, it won't cause an implicit replication behind the scenes if view doesn't change the size of the given tensor dimension (see cases 2 and 3). When the tensor dimension to shard is of size 1, it is not being added to shardable_dims here: https://github.com/pytorch/pytorch/blob/main/torch/distributed/_tensor/ops/_view_ops.py#L518 ``` # uneven case where the size of the tensor dimension to shard is 1 p = torch.randn(1,2) mesh = init_device_mesh("cuda", (2,)) dtensor = distribute_tensor(p, mesh, [Shard(0)]) t = dtensor.view(1, 2) # this would result in replication, meaning t is now replicated across all ranks. # uneven case where the size of the tensor dimension to shard is not 1 p = torch.randn(3, 2) mesh = init_device_mesh("cuda", (2,)) dtensor = distribute_tensor(p, mesh, [Shard(0)]) t = dtensor.view(3, 2) # this would not result in replication, meaning t stays as sharded. # even case p = torch.randn(2,2) dtensor = distribute_tensor(p, mesh, [Shard(0)]) t = dtensor.view(2, 2) # this would not result in replication, meaning t stays as sharded. ``` Differential Revision: [D62155606](https://our.internmc.facebook.com/intern/diff/D62155606) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135054 Approved by: https://github.com/tianyu-l, https://github.com/wanchaol	2024-09-06 00:03:54 +00:00
titaiwangms	28ccfba248	[ONNX] Delete ONNXProgramSerializer (#135261 ) Fixes #135182 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135261 Approved by: https://github.com/justinchuby	2024-09-05 23:52:51 +00:00
Jason Ansel	b2386bdca1	[debug] Add helper to run cProfile on a function (#135084 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135084 Approved by: https://github.com/oulgen ghstack dependencies: #135070, #135076, #135082	2024-09-05 23:41:30 +00:00
Jason Ansel	bdfc8d9f96	[fx] Don't use generators in map_aggregate (#135082 ) While the generators avoid a copy, they are slow. Before: ![image](https://github.com/user-attachments/assets/70a55a9a-0595-4105-b0ab-22cf77c7409c) After: ![image](https://github.com/user-attachments/assets/cecb9c59-ae36-47de-8b08-cab2c7cb3d57) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135082 Approved by: https://github.com/oulgen ghstack dependencies: #135070, #135076	2024-09-05 23:41:30 +00:00
Jason Ansel	70779dded8	[fx] Compile time optimization in Node.__update_args_kwargs (#135076 ) Before this we took two passes over all of the args. Before: ![image](https://github.com/user-attachments/assets/24ce5628-03f4-4983-9f2d-5ddf0ca5816e) After: ![image](https://github.com/user-attachments/assets/c9681aa2-32f0-4f6b-a598-fc6f90ffafb5) Pull Request resolved: https://github.com/pytorch/pytorch/pull/135076 Approved by: https://github.com/Chillee ghstack dependencies: #135070	2024-09-05 23:41:30 +00:00
Jason Ansel	ea231300d1	[inductor] Improve compile time regression from MemoryDep.normalize (#135070 ) Possible fix for #135056 Before ![image](https://github.com/user-attachments/assets/3962cb85-e808-4fd4-991f-471ff5ef7eae) After ![image](https://github.com/user-attachments/assets/2322d48d-6518-4518-baca-336027b5cda8) Measured based on: ``` python benchmarks/dynamo/torchbench.py --ci --accuracy --timing --explain --inductor --device cuda --training --only hf_Bert_large --stats -n1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/135070 Approved by: https://github.com/Chillee	2024-09-05 23:41:30 +00:00
PyTorch MergeBot	8f66995459	Revert "Support rolling over a percentage of workflows (#134816 )" This reverts commit fc890b55b51098437b6149abf1026a8b2aaee389. Reverted https://github.com/pytorch/pytorch/pull/134816 on behalf of https://github.com/malfet due to Causes lint to intermittently fail ([comment](https://github.com/pytorch/pytorch/pull/134816#issuecomment-2332902609))	2024-09-05 23:39:41 +00:00
Kulin Seth	144fde4fd2	[MPS] Add support for autocast in MPS (#99272 ) Fixes https://github.com/pytorch/pytorch/issues/88415 Need to run inductor/test_cpu_select_algorithm Pull Request resolved: https://github.com/pytorch/pytorch/pull/99272 Approved by: https://github.com/malfet Co-authored-by: Siddharth Kotapati <skotapati@apple.com> Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com> Co-authored-by: Roy Hvaara <roy@lightyear.no>	2024-09-05 23:23:17 +00:00
Avik Chaudhuri	43f4947d44	fix fake tensor tolist implementation (#135131 ) Summary: When exporting for training with `tolist`, we do not hit `FunctionalTensor.tolist` since we do not functionalize. Unfortunately, this means we hit `FakeTensor.tolist`, which creates unbacked symints that are not backed by proxies. Rather than trying to patch up this low-level implementation, we replace it with essentially what `FunctionalTensor.tolist` does, which is higher-level: we essentially desugar to `item()` calls and let it take care of unbacked symints. Test Plan: Some expected failures are gone now. Also found a test for `tolist` that was written when `FunctionalTensor.tolist` was implemented but not really doing much; repurposed it now to exercise more modes. Differential Revision: D62197742 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135131 Approved by: https://github.com/ezyang	2024-09-05 23:20:31 +00:00
Chirag Pandya	65e1c34061	[rfc] scuba for flight recorder (#134794 ) Summary: Record flight recorder status in a scuba table. Test Plan: Testing with timing out a job. Will post results soon. Differential Revision: D61729221 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134794 Approved by: https://github.com/fduwjj	2024-09-05 23:18:10 +00:00
Stonepia	830247c355	[Intel Triton] Update Intel Triton to release/2.5.0 (#134074 ) This PR relands https://github.com/pytorch/pytorch/pull/134053 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134074 Approved by: https://github.com/EikanWang	2024-09-05 22:46:31 +00:00
Yidi Wu	4262755b5a	[cond] fix typo in cond codegen (#134708 ) As titled. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134708 Approved by: https://github.com/jansel	2024-09-05 22:38:24 +00:00
Edward Z. Yang	3825607144	Add torch._logging.scribe (#135224 ) See https://github.com/pytorch/pytorch/pull/135138 for a usage example. Meta only, see https://docs.google.com/document/d/1JpbAQvRhTmuxjnKKjT7qq57dsnV84nxSLpWJo1abJuE/edit#heading=h.9wi46k7np6xw for context. fbscribelogger is a library that allows us to write to scribe, which is Meta's logging infrastructure, when you have an appropriate access token (this token is available for jobs running on main, as well as authorized jobs with the ci-scribe label). The resulting data is accessible via Scuba (a real time in-memory database) and Hive (a more traditional SQL persisted database). Here's the motivating use case. Suppose there is somewhere in PyTorch's codebase where you'd like to log an event, and then you'd like to find all the situations where this log is called. If PyTorch is rolled out to our internal users, we have some FB-oriented APIs (like torch._utils_internal.signpost_event) with which you can do this. But you have to actually land your PR to main, wait for it to be ingested to fbcode, and then wait for us to actually roll out this version, before you get any data. But what if you want the results within the next few hours? Instead, you can use torch._logging.scribe to directly write to our logging infrastructure from inside CI jobs. The most convenient approach is to log unstructured JSON blobs to `open_source_signpost` (added in this PR; you can also add your own dedicated table as described in the GDoc above). After adding logging code to your code, you can push your PR to CI, add the 'ci-scribe' label, and in a few hours view the results in Scuba, e.g., (Meta-only) https://fburl.com/scuba/torch_open_source_signpost/z2mq8o4l If you want continuous logging on all commits on master, you can land your PR and it will continuously get logging for all CI runs that happen on main. 
Eventually, if your dataset is important enough, you can consider collaborating with PyTorch Dev Infra to get the data collected in our public AWS cloud so that OSS users can view it without access to Meta's internal users. But this facility is really good for prototyping / one-off experiments. It's entirely self serve: just add your logging, run your PR CI with ci-scribe, get results, do analysis in Scuba. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/135224 Approved by: https://github.com/Skylion007	2024-09-05 22:37:13 +00:00
eqy	3c8f71ff93	[cuDNN][64-bit indexing] cuDNN v9.3+ supports non-batch-splittable convolutions with > 2**31 elements (#134890 ) For longstanding issues such as #95024 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134890 Approved by: https://github.com/Skylion007	2024-09-05 22:22:45 +00:00
Zain Rizvi	fc890b55b5	Support rolling over a percentage of workflows (#134816 ) In order to support adding a rollover percentage, this ended up being a complete rewrite of runner_determinator.py. Details of the new format are in the comments up top. On the plus side, this now includes some unit tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134816 Approved by: https://github.com/PaliC, https://github.com/zxiiro	2024-09-05 22:21:45 +00:00
Animesh Jain	058a69d91a	[fbcode][dynamo] Turn on guard_nn_modules using justknobs_check (#134928 ) As Title Pull Request resolved: https://github.com/pytorch/pytorch/pull/134928 Approved by: https://github.com/ezyang	2024-09-05 22:05:54 +00:00
sanchitintel	6c5920d515	Tune int8 AMX WoQ micro-kernel for CPU (#134832 ) This patch prevents performance regression against the default ATen implementation for LLaMA 3.1 int8 GPTQ WoQ workload. Uses AMX micro-kernel only if `M` >= `block_m` Pull Request resolved: https://github.com/pytorch/pytorch/pull/134832 Approved by: https://github.com/jgong5	2024-09-05 22:01:14 +00:00
Zhengxu Chen	116fd474da	[export] Expand coverage to more copied sym ops for unflattener. (#135119 ) Test Plan: buck2 test 'fbcode//mode/opt' fbcode//torchrec/ir/tests:test_serializer -- --run-disabled ``` File changed: fbcode//caffe2/torch/export/unflatten.py Buck UI: https://www.internalfb.com/buck2/2e0377e7-e2b6-4bd0-8133-a787245165a0 Test UI: https://www.internalfb.com/intern/testinfra/testrun/5066549824883887 Network: Up: 0B Down: 0B Jobs completed: 16. Time elapsed: 10.2s. Tests finished: Pass 6. Fail 0. Fatal 0. Skip 0. Build failure 0 ``` Differential Revision: D62190172 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135119 Approved by: https://github.com/yushangdi	2024-09-05 21:58:20 +00:00
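
Commit d9a18173fa above reports the qualname of an exception type instead of a repr such as `<class 'RuntimeError'>`. The difference can be sketched in plain Python. This is a minimal illustration, not the code from that PR; `describe_exception`, `Outer`, and `NestedError` are hypothetical names introduced here.

```python
class Outer:
    """Hypothetical container, used only to show a nested qualname."""

    class NestedError(RuntimeError):
        """Hypothetical nested exception type."""


def describe_exception(exc: BaseException) -> tuple[str, str]:
    """Return (str() of the exception type, its __qualname__)."""
    exc_type = type(exc)
    return str(exc_type), exc_type.__qualname__


raw, qual = describe_exception(RuntimeError("boom"))
print(raw)   # <class 'RuntimeError'>
print(qual)  # RuntimeError

# For a nested class, the qualname keeps the enclosing class name.
print(describe_exception(Outer.NestedError("boom"))[1])  # Outer.NestedError
```

The qualname is shorter than the repr and still distinguishes nested types, which is presumably why it reads better in an error message.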

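Commit 07689a38bf above describes a mismatch between a `consts_size` that is not padded to `ALIGN_BYTES` and a `serialized_weights` blob that is. The sketch below shows how such a mismatch can arise. The value 64 for `ALIGN_BYTES`, the per-weight padding scheme, and the helpers `align_up` and `serialize_weights` are assumptions made for illustration; this is not Inductor's actual implementation.

```python
ALIGN_BYTES = 64  # assumed value, for illustration only


def align_up(nbytes: int, align: int = ALIGN_BYTES) -> int:
    """Round nbytes up to the next multiple of align."""
    return (nbytes + align - 1) // align * align


def serialize_weights(raw_weights: list[bytes]) -> bytes:
    """Concatenate weights, zero-padding each one to ALIGN_BYTES (hypothetical scheme)."""
    out = bytearray()
    for w in raw_weights:
        out += w
        out += b"\x00" * (align_up(len(w)) - len(w))
    return bytes(out)


weights = [b"\x01" * 10, b"\x02" * 70]

# Size computed without padding: 10 + 70 bytes.
unpadded_size = sum(len(w) for w in weights)
# Size computed with each weight rounded up: 64 + 128 bytes.
padded_size = sum(align_up(len(w)) for w in weights)

blob = serialize_weights(weights)
print(unpadded_size, padded_size, len(blob))  # 80 192 192
```

Under these assumptions, a size computed without padding disagrees with the length of the padded blob, so any offset or alignment check derived from the unpadded size would fail.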

78140 Commits