470 Commits

e2830e6328 [PyTorch] SDPA decomp: actually use attn_mask (#117579)
Summary: Need to pass this along
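For context, a pure-Python sketch (not the actual PyTorch decomp; all names here are illustrative) of how an additive attn_mask participates in the math form of scaled dot-product attention, which is what fails to happen if the mask isn't passed along:

```python
import math

def sdpa_math(q, k, v, attn_mask=None):
    """q, k, v: lists of row vectors; attn_mask: additive mask matrix or None."""
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    # scores[i][j] = (q_i . k_j) * scale + mask[i][j]; dropping the mask
    # silently changes which positions can attend to which.
    scores = [[sum(qi * kj for qi, kj in zip(q[i], k[j])) * scale
               + (attn_mask[i][j] if attn_mask is not None else 0.0)
               for j in range(len(k))] for i in range(len(q))]
    out = []
    for row in scores:
        m = max(row)  # subtract max for numerical stability
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        w = [e / z for e in exps]
        out.append([sum(wi * vj[c] for wi, vj in zip(w, v))
                    for c in range(len(v[0]))])
    return out
```

Masking a key position with `-inf` drives its softmax weight to zero, so the output depends only on the unmasked positions.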

Test Plan:
```
cd ~/fbsource/fbcode/executorch/backends/xnnpack/test
buck test fbcode//mode/dev-nosan :test_xnnpack_ops -- test_fp32_sdpa
buck run fbcode//mode/dev-nosan :test_xnnpack_models -- executorch.backends.xnnpack.test.models.llama2_et_example.TestLlama2ETExample.test_fp32
```

Reviewed By: larryliu0820

Differential Revision: D52812369

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117579
Approved by: https://github.com/larryliu0820
2024-01-17 10:26:43 +00:00
f6767244cf Added meta function for _upsample_bicubic2d_aa (#117347)
This should fix remaining errors with Resize op in torchvision: https://github.com/pytorch/vision/actions/runs/7298953575?pr=8127
```
/opt/conda/envs/ci/lib/python3.8/site-packages/torch/nn/functional.py:4072: in interpolate
    return torch._C._nn._upsample_bicubic2d_aa(input, output_size, align_corners, scale_factors)
E   torch._dynamo.exc.TorchRuntimeError: Failed running call_function <function interpolate at 0x7f4443fe00d0>(*(FakeTensor(..., size=(1, s0, s1, s2)),), **{'size': [s4, floor(s3*s4/floor(s1*s3/s2))], 'mode': 'bicubic', 'align_corners': False, 'antialias': True}):
E   aten/src/ATen/RegisterCompositeImplicitAutograd.cpp:5567: SymIntArrayRef expected to contain only concrete integers
E
E   from user code:
E      File "/pytorch/vision/torchvision/transforms/v2/functional/_geometry.py", line 260, in resize_image
E       image = interpolate(
E
E   Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
E
E
E   You can suppress this exception and fall back to eager by setting:
E       import torch._dynamo
E       torch._dynamo.config.suppress_errors = True
```
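As a loose illustration (hypothetical names, not the real meta-registration API), a meta function computes only output metadata without touching data, which is exactly what fake tensors need to trace through the op:

```python
# Sketch: a meta ("fake") kernel returns the output shape/dtype only,
# so symbolic/fake-tensor tracing can proceed without real data.
def upsample_bicubic2d_aa_meta(in_shape, output_size):
    # NCHW input; the spatial dims are replaced by the requested output size
    n, c, _, _ = in_shape
    return (n, c, output_size[0], output_size[1])
```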
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117347
Approved by: https://github.com/peterbell10
2024-01-16 23:33:55 +00:00
638f85fd67 Add default parameters to rrelu_with_noise() (#117141)
Summary:
rrelu_with_noise() was listed as having default parameters in the schema, but
the actual code definition didn't have them.

The failing example called rrelu(), which DOES have default parameters, and it
passes those defaulted values to C++. Under the covers, the C++ code calls the
Python version of rrelu_with_noise().

Although the C++ code passes all the values to the Python version of
rrelu_with_noise(), PyTorch's C++ -> Python dispatch code looks at the schema
and strips any parameters that match the schema's listed defaults. So if the
schema lists defaults that the code definition doesn't have, the call breaks.
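The failure mode can be sketched in plain Python (a simplified illustration; the names and the dispatch logic here are not the real dispatcher):

```python
# Simplified model of the C++ -> Python boundary: arguments equal to the
# schema's listed defaults are stripped before the Python op is called.
SCHEMA_DEFAULTS = {"lower": 0.125, "upper": 0.333}

def dispatch(fn, **kwargs):
    passed = {k: v for k, v in kwargs.items()
              if SCHEMA_DEFAULTS.get(k) != v}
    return fn(**passed)

def rrelu_with_noise_no_defaults(x, lower, upper):
    return x  # stand-in body

def rrelu_with_noise_with_defaults(x, lower=0.125, upper=0.333):
    return x  # stand-in body

# Without defaults in the definition, the stripped call fails:
try:
    dispatch(rrelu_with_noise_no_defaults, x=1.0, lower=0.125, upper=0.333)
    failed = False
except TypeError:
    failed = True
```

With defaults declared on the Python definition, the stripped arguments are simply refilled, which is the fix this commit makes.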

Test Plan:
I added a unit test for this specific case. It would probably be better to write
a more general one to validate all the ops against their schemas - but I haven't
learned enough about the test harness to do that yet.

Fixes #115811

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117141
Approved by: https://github.com/yanboliang, https://github.com/oulgen
2024-01-12 05:32:13 +00:00
e3d4f4d14b [ProxyTensor] dedupe symbolic shapes in tracing (#116158)
Dedupes symbolic shapes in proxy tensor tracing. Reusing the existing sym shape avoids inserting spurious sym_size calls, which can interfere with pattern matching and graph passes.
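The dedup idea can be sketched as a cache keyed by the symbolic expression (toy code; the real ProxyTensor tracer is considerably more involved):

```python
# Toy tracer: reuse the node recorded for a symbolic size instead of
# emitting a new sym_size call on every lookup of the same expression.
class ToyTracer:
    def __init__(self):
        self.nodes = []
        self._sym_cache = {}

    def sym_size(self, expr):
        if expr not in self._sym_cache:
            self.nodes.append(("sym_size", expr))
            self._sym_cache[expr] = len(self.nodes) - 1
        return self._sym_cache[expr]
```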

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116158
Approved by: https://github.com/ezyang
2024-01-11 07:15:11 +00:00
8783fe9cf3 [export] Modify SDPA decomposition to decompose _scaled_dot_product_flash_attention_for_cpu (#117097)
Summary: As titled. #115913 added
`_scaled_dot_product_flash_attention_for_cpu`, and the export result of
`scaled_dot_product_attention` includes this op. This adds the
decomposition so that it is decomposed the same way as
`_scaled_dot_product_attention_math`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117097
Approved by: https://github.com/lezcano
2024-01-10 23:46:14 +00:00
d6540038c0 Fix 0-dim Index in Index Copy decomp (#117065)
Fix for https://github.com/pytorch/pytorch/issues/115931
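For reference, the dim-0 semantics of index_copy can be written as a small pure-Python sketch (illustrative only; a 0-dim index is the degenerate single-element case the fix covers):

```python
# Reference sketch of index_copy along dim 0: out[index[i]] = source[i].
def index_copy_ref(x, index, source):
    out = list(x)
    # a 0-dim (scalar) index behaves like a single-element index list
    if not isinstance(index, list):
        index, source = [index], [source]
    for row, i in enumerate(index):
        out[i] = source[row]
    return out
```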

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117065
Approved by: https://github.com/jansel, https://github.com/shunting314
2024-01-10 22:13:43 +00:00
b3f7fdbf0a Add decomp for pad_sequence (#116285)
Summary: Currently pad_sequence causes unintended symbolic shape specialization in export. Adding a decomp avoids the C++ kernel that caused the specialization.
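A pure-Python sketch of the pad_sequence semantics the decomp implements (illustrative; batch-first only):

```python
# Reference semantics: right-pad every sequence to the batch max length.
def pad_sequence_ref(seqs, padding_value=0):
    max_len = max(len(s) for s in seqs)
    return [list(s) + [padding_value] * (max_len - len(s)) for s in seqs]
```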

Test Plan: buck test mode/opt caffe2/test:test_export -- -r pad_sequence

Reviewed By: SherlockNoMad

Differential Revision: D52345667

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116285
Approved by: https://github.com/peterbell10, https://github.com/lezcano
2023-12-27 23:56:51 +00:00
f08c4da86d Add a decomposition for take() (#114813)
Presumably this can close https://github.com/pytorch/pytorch/pull/109784

Also related to https://github.com/pytorch/pytorch/issues/93757 (though `take` is not listed there).

There's no bounds checking here (out-of-bounds indices cause a segfault or undefined behavior). Should that be added somehow?
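A minimal sketch of the take() semantics (illustrative; Python lists don't segfault, so out-of-range negative indices wrap here rather than being undefined behavior):

```python
# take(): index into the input as if it were flattened to 1-D (row-major).
def take_ref(flat_input, indices):
    # deliberately no bounds checking, mirroring the note above
    return [flat_input[i] for i in indices]
```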

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114813
Approved by: https://github.com/peterbell10, https://github.com/lezcano
2023-12-22 18:14:57 +00:00
f727bed2e6 [inductor] Updated upsample_bilinear2d decomposition (#104182)
Description:
- Updated upsample_bilinear2d decomposition
  - added support for uint8 dtype support
  - code improvements
- Added uint8 dtype tests

Perf considerations:
- There is a minor perf regression (speed-up ~0.7) for uint8, align_corners=True cases when the output is smaller than or equal to (256, 256)
- For cases when the output is larger than (256, 256) and the input dtype is uint8, the nightly output is wrong, so IMO the large perf regression there (speed-up around ~0.2) should not be taken into account.

## Perfs benchmarks

```
[--------------------------------------------------------------------------------------------------------------------------------------------------------- Interpolate, cpu --------------------------------------------------------------------------------------------------------------------------------------------------------]
                                                                                                                                                    |  Eager (2.3.0a0+gitafcfdb1) PR  |  Compiled (2.3.0a0+gitafcfdb1) PR  |  Compiled (2.3.0a0+gitde89a53) Nightly  |  speed-up PR vs Nightly  |  Eager (2.3.0a0+gitde89a53) Nightly
1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      Input (1, 3, 500, 400), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (256, 256)       |        565.212 (+-3.548)        |        1384.210 (+-10.798)         |           1230.996 (+-32.930)           |     0.889 (+-0.000)      |          566.253 (+-1.526)
      Input (1, 3, 500, 400), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (256, 256)      |        565.404 (+-1.614)        |         1491.649 (+-7.763)         |            2974.959 (+-6.006)           |     1.994 (+-0.000)      |          566.476 (+-1.742)
      Input (1, 3, 500, 400), torch.uint8, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (256, 256)           |        270.761 (+-0.861)        |         1557.777 (+-4.699)         |            1080.919 (+-4.243)           |     0.694 (+-0.000)      |          269.829 (+-0.986)
      Input (1, 3, 500, 400), torch.uint8, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (256, 256)          |        270.960 (+-0.995)        |        1723.913 (+-12.433)         |            3191.938 (+-6.194)           |     1.852 (+-0.000)      |          269.962 (+-1.657)
      Input (1, 3, 500, 400), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (256, 256)     |        1555.884 (+-5.169)       |         1178.753 (+-4.957)         |            1910.445 (+-5.988)           |     1.621 (+-0.000)      |          1560.804 (+-6.793)
      Input (1, 3, 500, 400), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (256, 256)    |        1651.193 (+-6.952)       |         1323.466 (+-6.059)         |            3374.842 (+-8.168)           |     2.550 (+-0.000)      |          1653.497 (+-8.018)
      Input (1, 3, 500, 400), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (256, 256)         |        978.482 (+-10.183)       |         1383.768 (+-4.341)         |            2147.841 (+-6.581)           |     1.552 (+-0.000)      |          979.983 (+-1.499)
      Input (1, 3, 500, 400), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (256, 256)        |        1074.472 (+-5.031)       |         1414.912 (+-5.754)         |           3590.968 (+-10.042)           |     2.538 (+-0.000)      |          1074.589 (+-3.948)
      Input (4, 3, 500, 400), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (256, 256)       |        2168.703 (+-8.964)       |        5400.528 (+-26.628)         |           4777.299 (+-11.891)           |     0.885 (+-0.000)      |          2168.133 (+-7.667)
      Input (4, 3, 500, 400), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (256, 256)      |       2169.132 (+-12.618)       |        6583.866 (+-28.959)         |           11986.894 (+-45.838)          |     1.821 (+-0.000)      |         2174.488 (+-10.317)
      Input (4, 3, 500, 400), torch.uint8, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (256, 256)           |        992.808 (+-6.086)        |         5985.028 (+-9.532)         |            4334.158 (+-9.423)           |     0.724 (+-0.000)      |          989.604 (+-5.499)
      Input (4, 3, 500, 400), torch.uint8, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (256, 256)          |        987.618 (+-6.350)        |        6963.044 (+-28.885)         |           15441.096 (+-55.324)          |     2.218 (+-0.000)      |          985.573 (+-5.159)
      Input (4, 3, 500, 400), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (256, 256)     |       6695.557 (+-35.067)       |        4657.603 (+-14.220)         |           8058.708 (+-41.684)           |     1.730 (+-0.000)      |         6714.996 (+-38.626)
      Input (4, 3, 500, 400), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (256, 256)    |       7040.481 (+-39.486)       |        5445.704 (+-16.659)         |           13906.618 (+-53.298)          |     2.554 (+-0.000)      |         7034.453 (+-44.626)
      Input (4, 3, 500, 400), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (256, 256)         |       3926.186 (+-10.660)       |        5741.433 (+-12.748)         |           9356.036 (+-40.848)           |     1.630 (+-0.000)      |         3930.598 (+-17.086)
      Input (4, 3, 500, 400), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (256, 256)        |        4308.536 (+-9.607)       |        6122.755 (+-47.278)         |           15637.567 (+-54.392)          |     2.554 (+-0.000)      |         4307.463 (+-11.268)
      Input (1, 3, 1200, 1300), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (200, 300)     |       2512.740 (+-10.860)       |         1573.590 (+-5.061)         |            451.355 (+-1.210)            |     0.287 (+-0.000)      |         2511.727 (+-10.930)
      Input (1, 3, 1200, 1300), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (200, 300)    |       2489.926 (+-11.915)       |         1537.233 (+-4.212)         |            2501.470 (+-7.446)           |     1.627 (+-0.000)      |         2500.000 (+-12.155)
      Input (1, 3, 1200, 1300), torch.uint8, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (200, 300)         |        632.032 (+-2.108)        |         1496.994 (+-4.194)         |            404.759 (+-1.064)            |     0.270 (+-0.000)      |          630.122 (+-4.086)
      Input (1, 3, 1200, 1300), torch.uint8, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (200, 300)        |        629.174 (+-4.386)        |         1708.935 (+-8.817)         |            2643.296 (+-9.723)           |     1.547 (+-0.000)      |          628.388 (+-1.326)
      Input (1, 3, 1200, 1300), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (200, 300)   |        4409.941 (+-8.016)       |         1160.133 (+-4.698)         |            1897.089 (+-9.392)           |     1.635 (+-0.000)      |         4450.959 (+-10.438)
      Input (1, 3, 1200, 1300), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (200, 300)  |       4493.427 (+-11.703)       |         1329.226 (+-4.740)         |           2835.872 (+-12.241)           |     2.133 (+-0.000)      |          4506.973 (+-9.914)
      Input (1, 3, 1200, 1300), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (200, 300)       |        901.712 (+-4.071)        |         1320.739 (+-5.197)         |            2207.605 (+-8.219)           |     1.671 (+-0.000)      |          904.757 (+-4.558)
      Input (1, 3, 1200, 1300), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (200, 300)      |        990.080 (+-3.922)        |         1702.563 (+-7.909)         |           3074.196 (+-10.478)           |     1.806 (+-0.000)      |          990.482 (+-4.444)
      Input (4, 3, 1200, 1300), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (200, 300)     |       9785.550 (+-58.445)       |        6135.680 (+-33.569)         |           1628.572 (+-19.770)           |     0.265 (+-0.000)      |         9893.606 (+-62.377)
      Input (4, 3, 1200, 1300), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (200, 300)    |       9710.191 (+-57.597)       |        6066.824 (+-36.364)         |           10469.110 (+-42.775)          |     1.726 (+-0.000)      |         9919.022 (+-72.190)
      Input (4, 3, 1200, 1300), torch.uint8, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (200, 300)         |       2790.356 (+-12.188)       |        6134.101 (+-28.694)         |            1576.832 (+-6.030)           |     0.257 (+-0.000)      |         2761.122 (+-11.503)
      Input (4, 3, 1200, 1300), torch.uint8, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (200, 300)        |       2778.711 (+-13.603)       |        6608.528 (+-37.776)         |           10841.549 (+-49.429)          |     1.641 (+-0.000)      |         2753.037 (+-10.995)
      Input (4, 3, 1200, 1300), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (200, 300)   |      45533.868 (+-102.618)      |         4962.994 (+-8.215)         |           9003.968 (+-38.179)           |     1.814 (+-0.000)      |        43531.261 (+-102.951)
      Input (4, 3, 1200, 1300), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (200, 300)  |       45932.699 (+-81.207)      |        5595.682 (+-11.482)         |           12302.907 (+-50.254)          |     2.199 (+-0.000)      |         43916.455 (+-80.468)
      Input (4, 3, 1200, 1300), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (200, 300)       |        3827.804 (+-8.057)       |        6311.580 (+-25.021)         |           11760.614 (+-51.531)          |     1.863 (+-0.000)      |         3849.959 (+-10.848)
      Input (4, 3, 1200, 1300), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (200, 300)      |        4169.007 (+-8.452)       |        6820.716 (+-35.310)         |           15264.633 (+-49.982)          |     2.238 (+-0.000)      |         4183.875 (+-19.104)
      Input (1, 3, 300, 400), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (600, 700)       |        1306.914 (+-7.470)       |        10598.101 (+-38.410)        |           2678.031 (+-11.051)           |     0.253 (+-0.000)      |          1307.470 (+-8.519)
      Input (1, 3, 300, 400), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (600, 700)      |        1307.268 (+-8.197)       |        10161.123 (+-45.643)        |           17148.842 (+-55.402)          |     1.688 (+-0.000)      |          1308.077 (+-8.553)
      Input (1, 3, 300, 400), torch.uint8, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (600, 700)           |        548.574 (+-2.157)        |        10072.806 (+-41.368)        |            2408.971 (+-6.997)           |     0.239 (+-0.000)      |          547.726 (+-1.721)
      Input (1, 3, 300, 400), torch.uint8, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (600, 700)          |        546.664 (+-1.484)        |        11123.694 (+-43.636)        |           18058.070 (+-48.552)          |     1.623 (+-0.000)      |          547.151 (+-1.627)
      Input (1, 3, 300, 400), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (600, 700)     |       7935.051 (+-71.022)       |        7654.533 (+-29.512)         |           12414.194 (+-87.450)          |     1.622 (+-0.000)      |         7900.056 (+-53.997)
      Input (1, 3, 300, 400), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (600, 700)    |       8546.732 (+-53.118)       |        8583.572 (+-35.656)         |          19111.824 (+-166.978)          |     2.227 (+-0.000)      |         8515.433 (+-63.300)
      Input (1, 3, 300, 400), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (600, 700)         |       6202.642 (+-34.355)       |        8915.622 (+-62.293)         |           14327.295 (+-52.188)          |     1.607 (+-0.000)      |         6213.329 (+-39.740)
      Input (1, 3, 300, 400), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (600, 700)        |       6811.128 (+-33.747)       |        9647.316 (+-50.837)         |           20830.594 (+-62.979)          |     2.159 (+-0.000)      |         6822.512 (+-37.092)
      Input (4, 3, 300, 400), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (600, 700)       |       5079.586 (+-19.067)       |        42238.442 (+-87.643)        |           11282.141 (+-42.477)          |     0.267 (+-0.000)      |         5104.234 (+-17.706)
      Input (4, 3, 300, 400), torch.uint8, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (600, 700)      |       5079.575 (+-16.306)       |        41512.995 (+-83.710)        |          68789.816 (+-440.001)          |     1.657 (+-0.000)      |         5097.446 (+-21.724)
      Input (4, 3, 300, 400), torch.uint8, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (600, 700)           |        2039.974 (+-8.614)       |       42322.773 (+-111.866)        |           10399.237 (+-43.140)          |     0.246 (+-0.000)      |         2043.808 (+-10.707)
      Input (4, 3, 300, 400), torch.uint8, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (600, 700)          |       2036.214 (+-10.083)       |        44353.281 (+-71.548)        |          73340.412 (+-324.780)          |     1.654 (+-0.000)      |          2039.000 (+-9.554)
      Input (4, 3, 300, 400), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (600, 700)     |       33821.523 (+-96.639)      |        30552.094 (+-65.023)        |          49494.486 (+-872.916)          |     1.620 (+-0.000)      |         33844.404 (+-92.466)
      Input (4, 3, 300, 400), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (600, 700)    |      36196.104 (+-128.169)      |        34038.432 (+-79.697)        |          75761.226 (+-905.194)          |     2.226 (+-0.000)      |         36260.473 (+-94.642)
      Input (4, 3, 300, 400), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (600, 700)         |       24827.821 (+-77.335)      |        37006.218 (+-86.318)        |          61297.625 (+-898.192)          |     1.656 (+-0.000)      |         24823.275 (+-80.945)
      Input (4, 3, 300, 400), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (600, 700)        |       27266.138 (+-70.262)      |        40109.475 (+-94.248)        |          92086.075 (+-404.922)          |     2.296 (+-0.000)      |         27287.992 (+-89.507)

Times are in microseconds (us).

[--------------------------------------------------------------------------------------------------------------------------------------------------------- Interpolate, cuda ---------------------------------------------------------------------------------------------------------------------------------------------------------]
                                                                                                                                                      |  Eager (2.3.0a0+gitafcfdb1) PR  |  Compiled (2.3.0a0+gitafcfdb1) PR  |  Compiled (2.3.0a0+gitde89a53) Nightly  |  speed-up PR vs Nightly  |  Eager (2.3.0a0+gitde89a53) Nightly
1 threads: ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      Input (1, 3, 2345, 2456), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (1234, 1345)   |         98.259 (+-0.014)        |          97.156 (+-0.008)          |             97.443 (+-0.031)            |     1.003 (+-0.000)      |           98.248 (+-0.021)
      Input (1, 3, 2345, 2456), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (1234, 1345)  |         97.048 (+-0.016)        |          97.480 (+-0.018)          |             96.819 (+-0.126)            |     0.993 (+-0.000)      |           97.045 (+-0.015)
      Input (1, 3, 2345, 2456), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (1234, 1345)       |         97.944 (+-0.028)        |          91.686 (+-0.411)          |             93.894 (+-1.011)            |     1.024 (+-0.000)      |           97.933 (+-0.008)
      Input (1, 3, 2345, 2456), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (1234, 1345)      |         98.008 (+-0.011)        |          91.205 (+-0.346)          |             96.854 (+-0.058)            |     1.062 (+-0.000)      |           97.203 (+-0.010)
      Input (4, 3, 2345, 2456), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (1234, 1345)   |        384.318 (+-0.011)        |         382.793 (+-0.007)          |            382.472 (+-0.011)            |     0.999 (+-0.000)      |          384.701 (+-0.012)
      Input (4, 3, 2345, 2456), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (1234, 1345)  |        384.266 (+-0.009)        |         385.333 (+-0.024)          |            382.554 (+-0.022)            |     0.993 (+-0.000)      |          384.386 (+-0.016)
      Input (4, 3, 2345, 2456), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (1234, 1345)       |        383.924 (+-0.011)        |         570.071 (+-0.030)          |            545.615 (+-0.051)            |     0.957 (+-0.000)      |          384.044 (+-0.012)
      Input (4, 3, 2345, 2456), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (1234, 1345)      |        384.184 (+-0.016)        |         560.857 (+-0.026)          |            552.447 (+-0.040)            |     0.985 (+-0.000)      |          384.063 (+-0.016)
      Input (1, 3, 1234, 1345), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (2345, 2456)   |        122.188 (+-0.053)        |         116.744 (+-1.006)          |            163.762 (+-0.015)            |     1.403 (+-0.000)      |          121.874 (+-0.015)
      Input (1, 3, 1234, 1345), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (2345, 2456)  |        122.156 (+-0.012)        |         182.692 (+-0.013)          |            161.653 (+-0.018)            |     0.885 (+-0.000)      |          121.926 (+-0.014)
      Input (1, 3, 1234, 1345), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (2345, 2456)       |        105.852 (+-0.324)        |         119.545 (+-0.294)          |            190.527 (+-0.023)            |     1.594 (+-0.000)      |          105.999 (+-0.446)
      Input (1, 3, 1234, 1345), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (2345, 2456)      |        106.507 (+-0.282)        |         120.060 (+-0.257)          |            162.330 (+-0.012)            |     1.352 (+-0.000)      |          106.567 (+-0.385)
      Input (4, 3, 1234, 1345), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: True, antialias: False, osize: (2345, 2456)   |        447.907 (+-0.015)        |         463.863 (+-1.779)          |            650.492 (+-0.331)            |     1.402 (+-0.000)      |          446.596 (+-0.017)
      Input (4, 3, 1234, 1345), torch.float32, torch.contiguous_format | mode: bilinear, align_corners: False, antialias: False, osize: (2345, 2456)  |        447.750 (+-0.017)        |         723.832 (+-0.170)          |            641.539 (+-0.075)            |     0.886 (+-0.000)      |          446.467 (+-0.019)
      Input (4, 3, 1234, 1345), torch.float32, torch.channels_last | mode: bilinear, align_corners: True, antialias: False, osize: (2345, 2456)       |        439.549 (+-0.031)        |         507.772 (+-2.879)          |            758.795 (+-0.482)            |     1.494 (+-0.000)      |          440.372 (+-0.025)
      Input (4, 3, 1234, 1345), torch.float32, torch.channels_last | mode: bilinear, align_corners: False, antialias: False, osize: (2345, 2456)      |        439.538 (+-0.029)        |         509.260 (+-2.704)          |            654.195 (+-2.621)            |     1.285 (+-0.000)      |          440.362 (+-0.026)

Times are in microseconds (us).
```

[Source](f4751a3196/perf_interp_mode.py), [Output](899f34c024/output/20231213-214209-upsample-bilinear-pr_vs_nightly-speedup.md)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104182
Approved by: https://github.com/lezcano
2023-12-14 14:50:06 +00:00
639060cb0b Use get_mkldnn_enabled for decompositions (#115448)
`torch._C.has_mkldnn` does not respect cases where users try to disable mkldnn using `torch._C._set_mkldnn_enabled()`. This is relevant to edge use cases, where users do not want decompositions to lower to the ATen opset and do not want the mkldnn operator to appear in the graph.
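The distinction can be sketched as follows (hypothetical names; the point is that a build-time capability flag ignores runtime disabling, while the getter respects both):

```python
# Build-time flag vs. runtime getter: only the getter reflects a user's
# request to disable the backend at runtime.
class MkldnnConfig:
    HAS_MKLDNN = True   # analogous to torch._C.has_mkldnn: compiled-in support
    _enabled = True     # runtime switch

    @classmethod
    def set_mkldnn_enabled(cls, flag):
        cls._enabled = bool(flag)

    @classmethod
    def get_mkldnn_enabled(cls):
        # respects both compile-time support and the runtime switch
        return cls.HAS_MKLDNN and cls._enabled
```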
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115448
Approved by: https://github.com/jgong5, https://github.com/ydwu4
2023-12-12 22:42:51 +00:00
d40a7c6026 Add decompositions for replication_pad (#115113)
Fixes #115395
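The core trick behind such a decomposition can be sketched in 1-D (illustrative only): replication padding is a gather with clamped indices.

```python
# Replication pad as a clamped gather: every out-of-range position reads
# the nearest edge element of the input.
def replication_pad1d_ref(x, pad_left, pad_right):
    n = len(x)
    idx = [min(max(i - pad_left, 0), n - 1)
           for i in range(n + pad_left + pad_right)]
    return [x[i] for i in idx]
```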

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115113
Approved by: https://github.com/peterbell10
2023-12-09 02:44:07 +00:00
fb19947962 Add decompositions for reflection_pad{1, 2, 3}d (#115100)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115100
Approved by: https://github.com/peterbell10
2023-12-08 23:05:57 +00:00
7979ba7b43 [inductor] Add dropout type check to match eager (#115040)
Fixes #98970

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115040
Approved by: https://github.com/oulgen
2023-12-03 23:05:02 +00:00
6f32eb7eef Add decomp for replication_pad2d and use for CUDA deterministic (#111590)
Fixes #95578

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111590
Approved by: https://github.com/peterbell10
2023-12-01 18:56:09 +00:00
013675ff59 Revert "Add decomp for replication_pad2d and use for CUDA deterministic (#111590)"
This reverts commit f1286161a637e9fc0797a22a7b7d90eaa04ddc4f.

Reverted https://github.com/pytorch/pytorch/pull/111590 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing XLA job.  The job is also failing on the PR, but the log classifier failed to find the failed test, which led to it being wrongly marked as flaky ([comment](https://github.com/pytorch/pytorch/pull/111590#issuecomment-1833004794))
2023-11-30 02:28:14 +00:00
f1286161a6 Add decomp for replication_pad2d and use for CUDA deterministic (#111590)
Fixes #95578

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111590
Approved by: https://github.com/peterbell10
2023-11-29 21:50:46 +00:00
7fc292930c Add support for torch.Generator type in TorchScript (#110413)
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)
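As a pure-Python analogy for the added `generator` argument (names here are hypothetical), threading an explicit RNG through init functions makes initialization reproducible without touching global RNG state:

```python
import random

# An explicit generator argument lets callers reproduce initialization
# independently of the global RNG.
def uniform_init(n, a=0.0, b=1.0, generator=None):
    rng = generator if generator is not None else random
    return [rng.uniform(a, b) for _ in range(n)]
```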

CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/glebk-cerebras, https://github.com/davidberard98
2023-11-21 23:07:21 +00:00
1f8d00c5a3 [inductor] Added decomposition for upsample_nearest_exact Nd (#113749)
Description:
- Added decomposition for upsample_nearest_exact: 1d, 2d, 3d
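The 1-D index rule such a decomposition uses can be sketched as follows (illustrative; the nearest-exact mapping is src = floor((dst + 0.5) * scale)):

```python
# nearest-exact 1-D: each output index samples the input at
# floor((i + 0.5) * in_len / out_len), clamped to the valid range.
def upsample_nearest_exact1d_ref(x, out_len):
    scale = len(x) / out_len
    return [x[min(int((i + 0.5) * scale), len(x) - 1)]
            for i in range(out_len)]
```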

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113749
Approved by: https://github.com/lezcano
2023-11-21 13:03:47 +00:00
b30580e121 [PT] Include tensor shape info in the error messages of torch split (#113984)
Summary: Include tensor shape info in the error messages of torch split.

Test Plan: CI

Differential Revision: D51436684

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113984
Approved by: https://github.com/ezyang
2023-11-19 01:34:57 +00:00
252e68a83b Revert "Add support for torch.Generator type in TorchScript (#110413)"
This reverts commit 54493fe8c4b1cca4c5ff993b99eb3e3dbc984226.

Reverted https://github.com/pytorch/pytorch/pull/110413 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is, unfortunately, still breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/110413#issuecomment-1811625557))
2023-11-15 00:51:23 +00:00
54493fe8c4 Add support for torch.Generator type in TorchScript (#110413)
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)

CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/glebk-cerebras, https://github.com/davidberard98
2023-11-13 23:18:14 +00:00
5506b9db43 [decomp] Fix _scaled_dot_product_flash_attention decomposition bug (#113102)
`_scaled_dot_product_flash_attention` does not take

`Tensor? attn_mask=None`

but `scaled_dot_product_attention` does. In the original decomp there was a
mix-up where this argument was added to
`_scaled_dot_product_flash_attention`.

Fix it so that `_scaled_dot_product_flash_attention` is decomposed correctly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113102
Approved by: https://github.com/ezyang
2023-11-08 21:47:37 +00:00
9a28a7b498 Revert "Add support for torch.Generator type in TorchScript (#110413)"
This reverts commit 27e31ab6e86259b27d816d6fb6e7a69de526a0e4.

Reverted https://github.com/pytorch/pytorch/pull/110413 on behalf of https://github.com/PaliC due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/110413#issuecomment-1799003164))
2023-11-07 15:53:32 +00:00
27e31ab6e8 Add support for torch.Generator type in TorchScript (#110413)
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)

CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/glebk-cerebras, https://github.com/davidberard98
2023-11-06 21:27:02 +00:00
5a6f8014c4 Add a decomposition for _weight_norm_interface. (#112193)
Fixes #112086
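For reference, the reparameterization that `_weight_norm_interface` computes is `w = g * v / ||v||`. A minimal 1-D sketch in plain Python (hypothetical, illustration only):

```python
import math

def weight_norm(v, g):
    # w = g * v / ||v||: rescale the direction vector v to have
    # norm g (1-D list sketch of the weight-norm reparameterization).
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]
```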

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112193
Approved by: https://github.com/ezyang
2023-11-01 19:51:11 +00:00
66c32d099a Use pytree.arg_tree_leaves everywhere (#112394)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112394
Approved by: https://github.com/lezcano
ghstack dependencies: #112391, #112392, #112393
2023-10-31 15:57:06 +00:00
bbd5b935e4 Use pytree.tree_leaves everywhere (#112324)
This changes all the instances I could find of `tree_flatten(...)[0]` or
`x, _ = tree_flatten` to use `tree_leaves`.
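The behaviour `tree_leaves` provides can be sketched in plain Python: recurse into containers and collect the leaves, without building the spec that `tree_flatten` also returns (hypothetical sketch, not the pytree implementation):

```python
def tree_leaves(tree):
    # Pytree-style leaf extraction: recurse into standard containers,
    # treat everything else as a leaf.
    if isinstance(tree, dict):
        return [l for v in tree.values() for l in tree_leaves(v)]
    if isinstance(tree, (list, tuple)):
        return [l for v in tree for l in tree_leaves(v)]
    return [tree]
```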

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112324
Approved by: https://github.com/lezcano
ghstack dependencies: #112327, #112323
2023-10-30 03:39:04 +00:00
c8a5bb451e Do not import sympy within torch._prims_common (#112034)
This is the first of a few PRs that avoid importing SymPy at import time.
The pitch here is that we (almost!) do not have SymPy on our API, so
this should be feasible.

This should speed-up torch imports by a good 15% as per
https://dev-discuss.pytorch.org/t/delving-into-what-happens-when-you-import-torch/1589

In this PR we just move a few global imports into local imports.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112034
Approved by: https://github.com/ezyang
2023-10-26 12:53:25 +00:00
98c329b19e Revert "[core ATen IR] Add decompositions for max, min, var_mean (#110906)"
This reverts commit 9606cda64e97210cfcca07110ef4872cedc5a1d9.

Reverted https://github.com/pytorch/pytorch/pull/110906 on behalf of https://github.com/SS-JIA due to Breaks internal CI ([comment](https://github.com/pytorch/pytorch/pull/110906#issuecomment-1757490740))
2023-10-11 11:41:21 +00:00
9606cda64e [core ATen IR] Add decompositions for max, min, var_mean (#110906)
## Context

Add decompositions for `aten.max`, `aten.min`, and `aten.var_mean`. These operators follow a pattern of returning a tuple of outputs from two component operators:

```
aten.max(x) -> return aten.amax(x), aten.argmax(x)
aten.min(x) -> return aten.amin(x), aten.argmin(x)
aten.var_mean(x) -> return aten.var(x), aten.mean(x)
```
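The same tuple-of-two-reductions pattern can be sketched in plain Python (hypothetical list-based analogues, not the actual ATen decompositions):

```python
def decomposed_max(xs):
    # aten.max(x) -> (aten.amax(x), aten.argmax(x)) analogue on lists.
    amax = max(xs)
    argmax = xs.index(amax)  # index of the first occurrence
    return amax, argmax

def decomposed_var_mean(xs):
    # aten.var_mean(x) -> (aten.var(x), aten.mean(x)) analogue;
    # aten.var defaults to the unbiased estimator (correction=1).
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
    return var, mean
```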

For `var_mean`, the `refs` implementation was doing something similar, so I changed it to call `torch.` ops instead, as was done for other `refs` implementations previously. cc: @peterbell10 @lezcano

Note that Inductor lowers all these directly, so they are excluded from the Inductor decomp table.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110906
Approved by: https://github.com/manuelcandales
2023-10-11 00:06:24 +00:00
fde28fdc8c Fix typo under torch/_decomp directory (#110821)
This PR fixes typo of comments in files under `torch/_decomp` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110821
Approved by: https://github.com/Skylion007
2023-10-08 20:33:49 +00:00
c2e7a0d689 [core IR] Add decomps for aten.sum and aten.squeeze variants (#110645)
Summary:
## Context

Both `aten.sum` and `aten.squeeze` have a "most generic" variant in the form of `aten.sum.dim_IntList` and `aten.squeeze.dims` respectively. Add decompositions for other non generic variants of these operators to express them using the most generic variant.

Note that to register these decomps, the reference implementation under `_refs` had to be removed as registered decompositions. cc: @lezcano @peterbell10

Test Plan: Github CI + Meta Internal CI

Differential Revision: D49965952

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110645
Approved by: https://github.com/peterbell10, https://github.com/digantdesai, https://github.com/manuelcandales
2023-10-07 04:21:51 +00:00
7cc0020a80 [decomp] Fix different return type in threshold_backward vs. eager (#110689)
due to type promotion with floating point scalar in decompositions.py

Fixes part of #100838

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110689
Approved by: https://github.com/ezyang
2023-10-06 20:59:58 +00:00
ceb773b68d Fix #110680 (requires_grad typo in decomp) (#110687)
Fixes https://github.com/pytorch/pytorch/issues/110680
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110687
Approved by: https://github.com/voznesenskym, https://github.com/lezcano
ghstack dependencies: #110501, #110504, #110591, #110668
2023-10-06 10:36:01 +00:00
f2a1b93549 Back out "[quant] Support integer implementations for adaptive_avg_pool2d (#104226)" (#110316)
Summary:
Original commit changeset: acdb5b34e3aa

Original Phabricator Diff: D47321689

Test Plan: opinfo tests in CI

Differential Revision: D49789403

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110316
Approved by: https://github.com/kimishpatel
2023-10-03 16:59:23 +00:00
be3b16daad [decomp] Fix baddbmm decomposition (#109714)
The decomposition is currently registered without the pw_cast_for_opmath
decorator, due to the ordering of decorators being meaningful.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109714
Approved by: https://github.com/lezcano
2023-09-28 21:23:44 +00:00
5df8aca994 [core IR] Add a core decomposition for floor_divide (#110046)
## Context

Introduce a core decomposition for `aten.floor_divide` into other `aten` ops, and add it to the core ATen decomposition table.

This replaces the decomposition of `floor_divide` that was used by Inductor. I noticed there was a note on that decomposition

```
# TorchInductor-only decomposition. It should not be taken to core.
# See https://github.com/pytorch/torchdynamo/pull/1120
```

but couldn't discern the reason why this is the case. cc: @lezcano
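In plain arithmetic, a decomposition has to express floor division via operations the backend already has. One hypothetical pure-Python sketch uses truncating division plus a sign correction (the actual core decomposition may differ):

```python
import math

def floor_divide(a, b):
    # Floor division expressed as truncating division plus a
    # correction when the operands have opposite signs and the
    # division is inexact (hypothetical sketch).
    q = math.trunc(a / b)
    if (a % b != 0) and ((a < 0) != (b < 0)):
        q -= 1
    return q
```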

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110046
Approved by: https://github.com/peterbell10
2023-09-26 08:39:21 +00:00
5c4b5baf21 Fix python decomps for OpOverloadPackets and add tests (#107707)
- Extend `test_torch_dispatch_meta_outplace` to test torch ops that do not have an out parameter but have aten op overloads that do. Additionally, Python decompositions may register `OpOverloadPacket`s, so decompositions need to be tested to ensure all `OpOverloads` still function for the `Meta` key (e.g. if a python decomposition is registered for an aten op `aten.foo` with overloads `[default, out]`, the python function needs to support receiving out arguments)

- Add out parameter wrappers to python decomps for aten ops that have out overloads

CC. @ezyang @albanD @lezcano

Fixes #107713

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107707
Approved by: https://github.com/lezcano
2023-09-25 20:53:30 +00:00
7de669f2f9 [core IR] Remove trunc decomp and add trunc to core (#109902)
Following up from [this comment](https://github.com/pytorch/pytorch/pull/109319#discussion_r1330803226). Remove the decomposition for `trunc`, and add it as a core operator.

Going forward, provide similar treatment for operators that map cleanly to hardware instructions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109902
Approved by: https://github.com/peterbell10
2023-09-25 18:18:06 +00:00
334ead04a9 Back out "[decomp] Fix baddbmm decomposition (#109714)" (#109855)
Summary:
Original commit changeset: 95c462a380c9

Original Phabricator Diff: D49484954

this diff cause test failure for deterministic ne test see:https://www.internalfb.com/sandcastle/job/18014399565419856/

Test Plan:
buck2 test 'fbcode//mode/opt' fbcode//aps_models/ads/icvr/tests:icvr_fm_e2e_deterministic_ne_test -- --exact 'aps_models/ads/icvr/tests:icvr_fm_e2e_deterministic_ne_test - aps_models.ads.icvr.tests.icvr_fm_e2e_deterministic_ne_test.ICVR_FM_E2EDeterministicNeTest: test_e2e_deterministic_icvr_fm_pt2_fsdp_multi_gpus'

https://www.internalfb.com/intern/testinfra/testrun/16888498605839953

Differential Revision: D49527271

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109855
Approved by: https://github.com/yanboliang
2023-09-22 22:01:38 +00:00
8dedc9dd9b Add meta tests for layer/group/batch norm backward (#109591)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109591
Approved by: https://github.com/ezyang
2023-09-21 18:58:51 +00:00
6f0cf5a837 [decomp] Decompose unsafe_split{,_with_sizes} into safe variants (#109668)
The "safety" aspect refers to the output not being registered as aliasing the
input, but after AOTAutograd I don't think this distinction matters. However,
we shouldn't use the same decomposition as the safe variant in case the backend
doesn't want to decompose split.
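Conceptually the decomposition is trivial, since only the aliasing metadata differs between the unsafe and safe variants; a hypothetical list-based sketch:

```python
def unsafe_split(xs, split_size):
    # Decompose the "unsafe" variant into the safe one: identical
    # computation, chunking the input into pieces of split_size
    # (the last piece may be shorter).
    return [xs[i:i + split_size] for i in range(0, len(xs), split_size)]
```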

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109668
Approved by: https://github.com/lezcano
ghstack dependencies: #109667
2023-09-20 18:45:56 +00:00
36a8105f54 [decomp] Fix baddbmm decomposition (#109714)
The decomposition is currently registered without the pw_cast_for_opmath
decorator, due to the ordering of decorators being meaningful.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109714
Approved by: https://github.com/lezcano
2023-09-20 18:40:21 +00:00
40b2c796dc [Decomposition] baddbmm (#108534)
Summary:
Move the decomposition of baddbmm from `_inductor/decomposition.py` and include it in `core_aten_decompositions`
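Mathematically, `baddbmm` computes `beta * input + alpha * bmm(batch1, batch2)`; written out on nested Python lists purely for illustration (hypothetical sketch, not the Inductor code being moved):

```python
def baddbmm(inp, batch1, batch2, beta=1.0, alpha=1.0):
    # beta * inp + alpha * bmm(batch1, batch2) on nested lists:
    # batch1 is (B, M, K), batch2 is (B, K, N), inp is (B, M, N).
    out = []
    for b in range(len(batch1)):
        m, k, n = len(batch1[b]), len(batch2[b]), len(batch2[b][0])
        mat = []
        for i in range(m):
            row = []
            for j in range(n):
                acc = sum(batch1[b][i][t] * batch2[b][t][j] for t in range(k))
                row.append(beta * inp[b][i][j] + alpha * acc)
            mat.append(row)
        out.append(mat)
    return out
```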

ff38c0e2f9/torch/_inductor/decomposition.py (L203)

Test Plan: Phabricator + OSS Tests

Differential Revision: D48871741

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108534
Approved by: https://github.com/SherlockNoMad
2023-09-20 12:49:32 +00:00
2e721aab98 [Decomposition] Trunc (#109319)
Summary:
Add a decomp for `trunc` and include it in `core_aten_decompositions`

Differential Revision: D49042033

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109319
Approved by: https://github.com/SherlockNoMad
2023-09-19 13:30:13 +00:00
db48bc80d9 Check index size during decomp of index_add (#108826)
This partially fixes the `test_index_add_correctness` test (#108181)
when run under inductor: it causes an exception to be raised [here][1]
as expected.

The test as a whole still cannot be made to pass under inductor because
the [last assert][2] still fails, likely due to #108798.

[1]: dec2b267d4/test/test_torch.py (L6049)
[2]: dec2b267d4/test/test_torch.py (L6051)
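The behaviour the fix restores can be sketched on plain lists: validate each index before accumulating, so an out-of-range index raises instead of silently writing out of bounds (hypothetical 1-D sketch):

```python
def index_add(xs, index, source):
    # Bounds-checked index_add analogue: out[index[i]] += source[i],
    # raising if any index is outside [0, len(xs)).
    out = list(xs)
    for i, s in zip(index, source):
        if not (0 <= i < len(out)):
            raise IndexError(f"index {i} out of bounds for size {len(out)}")
        out[i] += s
    return out
```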
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108826
Approved by: https://github.com/eellison
2023-09-13 13:06:26 +00:00
9f37aec964 Add torch._check_is_size (#108685)
Check the code comments for what it does.  The key distinction is that if
you feed it an unbacked SymInt, we will also apply a `>= 2` assumption
at compile time.
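In eager mode the check is a plain runtime validation; only under compilation does the extra symbolic assumption kick in. A hypothetical sketch of the eager-mode behaviour only:

```python
def check_is_size(i):
    # Eager-mode analogue: a size must be a non-negative integer.
    # The real torch._check_is_size additionally lets the compiler
    # assume i >= 2 for unbacked SymInts; that symbolic part is not
    # modelled here.
    if i < 0:
        raise RuntimeError(f"expected a size, got {i}")
    return i
```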

This will get exercised when I reland
https://github.com/pytorch/pytorch/pull/107788

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108685
Approved by: https://github.com/albanD, https://github.com/Skylion007
2023-09-07 12:48:39 +00:00
5a4fe05a15 Revert "Force synced KJT to trace unbacked SymInt (#107788)" (#108684)
This reverts commit 3b92ef814de4571a125294f2aa95843d7d2e2aea; reverting it manually instead.

(Not sure why the bot doesn't work on https://github.com/pytorch/pytorch/pull/107788)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108684
Approved by: https://github.com/ezyang
2023-09-06 19:15:45 +00:00
ebed490c2f [sdpa decomp] change sdpa decomp to be consistent with flash attention (#108608)
Summary: See the comment in the code for the reasons behind the change

Test Plan:
buck2 test executorch/examples/export/test:test_export --
test_vit_export_to_executorch

Differential Revision: D48992180

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108608
Approved by: https://github.com/larryliu0820
2023-09-06 15:34:03 +00:00
3b92ef814d Force synced KJT to trace unbacked SymInt (#107788)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107788
Approved by: https://github.com/voznesenskym
2023-09-06 03:18:26 +00:00