Compare commits

...

3960 Commits

Author SHA1 Message Date
b33a283e9a [nccl-pg] Pass pg name and desc to NCCL communicator (#124149)
Summary:
Pass the Process Group name and description to the NCCL communicator in order to access PG information in the NCCL layer.
The information is passed as the commDesc string (i.e., "<pg_desc>:<pg_name>").
The function is only valid when NCCL_COMM_DESCRIPTION is defined.
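
A minimal sketch of the string composition described above (the function name is illustrative; only the "<pg_desc>:<pg_name>" format and the NCCL_COMM_DESCRIPTION gate come from the commit):

```cpp
#include <string>

// Sketch only: compose the "<pg_desc>:<pg_name>" description string that is
// passed to the NCCL communicator when NCCL_COMM_DESCRIPTION is defined.
std::string makeCommDesc(const std::string& pgDesc, const std::string& pgName) {
  return pgDesc + ":" + pgName;  // e.g. "FSDP:0"
}
```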

Differential Revision: D55703310

Pull Request resolved: https://github.com/pytorch/pytorch/pull/124149
Approved by: https://github.com/shuqiangzhang
2024-04-16 15:08:38 -07:00
7a551d81e5 [c10d/nccl-pg] allow user to pass process group description (#123472)
Summary:
We need a way to allow users to set a customized description for a process group, e.g., FSDP, PP.

Here are several use cases of a user-specified group_desc:
- Logging: we can easily match a log line and understand what a given collective/PG is used for.
- PyTorch traces (e.g., Kineto, Execution Trace) can benefit from the PG desc, since trace analysis and benchmarks will be able to easily differentiate PG purposes like FSDP and PP.
- Lower-layer collective (e.g., NCCL) debugging: we will be able to expose the PG desc to the NCCL communicator, so NCCL-layer operations can be easily correlated to a PG.

Solution: Add a group_desc field to c10d
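
A rough illustration of the idea (field names here are assumptions, not the exact c10d definitions):

```cpp
#include <string>

// Sketch: carry a human-readable purpose label next to the machine-generated
// group name on the process-group options.
struct ProcessGroupOptions {
  std::string group_name;  // machine-generated, e.g. "0", "1", ...
  std::string group_desc;  // user-supplied purpose label, e.g. "FSDP", "PP"
};
```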

Differential Revision: D55781850

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123472
Approved by: https://github.com/kwen2501
2024-04-16 15:08:38 -07:00
1515a90475 [DCP] Adds ability to create a CPU state dict that is both shared and pinned (#122338)
[DCP] Adds ability to create a CPU state dict that is both shared and pinned, as well as a new utility specific to copying the state dict

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1ge8d5c17670f16ac4fc8fcb4181cb490c

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122338
Approved by: https://github.com/fegin
2024-04-16 15:08:22 -07:00
4882ec2a91 Pass and record process_group_name when creating ProcessGroupNCCL (#123117)
Summary:
Pass the Python c10d group_name to the C++ ProcessGroupNCCL so that the PG name is consistent across the different layers.
Also record pg_name in the flight recorder entry.

Differential Revision: D55597200

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123117
Approved by: https://github.com/wconstab
2024-04-16 13:48:35 -07:00
972b8060bd [c10d] make monitorThread sleep when we try to dump (#123788)
Summary:
We separated the FR dump logic from the desync debug logic,
so we no longer set collectiveDebugInfoMode_ to true when we just need an FR
dump. That's why the monitor thread did not sleep and tried to kill the
process without waiting for the dump.

The fix is simple: we should sleep whenever shouldDump_ is true.
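
A minimal sketch of that fix (names mirror the commit text; the wait duration is an illustrative placeholder):

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Sketch: the monitor thread sleeps to give the flight-recorder dump time to
// finish whenever shouldDump_ is set, instead of only when the desync-debug
// flag (collectiveDebugInfoMode_) is set.
std::atomic<bool> shouldDump_{false};

void maybeSleepBeforeKill() {
  if (shouldDump_.load()) {
    std::this_thread::sleep_for(std::chrono::seconds(30));  // placeholder wait
  }
}
```
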
Test Plan:
Existing unit tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123788
Approved by: https://github.com/wconstab
2024-04-11 09:19:15 -07:00
3e7683ae18 [c10d] dump on any exception (timeout + nccl error) (#123023)
Summary:
The existing flight recorder dumping logic is: dump only on timeout, but not
on NCCL error. This resulted in the faulty ranks missing dumps when an NCCL
error happens.

So in this PR, we revise the dump logic such that records are dumped
when any exception is detected. An exception could be (1) an NCCL async error
or (2) a watchdog timeout.

Also, the existing code tends to mix the flight recorder dump logic
with desync debug, which is not desirable. We now dump the desync debug
report only when a timeout is detected.
Test Plan:
Added a new unit test to trigger nccl error and dump, and make sure the
dump is triggered by the error.

Also existing dump on timeout tests should still pass.

(sqzhang_1) [sqzhang@devgpu009.cln1 ~/pytorch (84bf9d4c)]$ python
test/distributed/test_c10d_nccl.py NcclErrorDumpTest
NCCL version 2.19.3+cuda12.0
[E329 19:15:11.775879730 ProcessGroupNCCL.cpp:565] [Rank 0] Watchdog
caught collective operation timeout: WorkNCCL(SeqNum=2,
OpType=ALLREDUCE, NumelIn=10, NumelOut=10, Timeout(ms)=10000) ran for
10028 milliseconds before timing out.
[E329 19:15:11.777459894 ProcessGroupNCCL.cpp:1561] [PG 0 Rank 0]
Exception hit in NCCL work: 2
[E329 19:15:12.660717323 ProcessGroupNCCL.cpp:1332] [PG 0 Rank 0]
Received a timeout signal from this local rank and will start to dump
the debug info. Last enqueued NCCL work: 2, last completed NCCL work: 1.
[E329 19:15:12.660932242 ProcessGroupNCCL.cpp:1167] [PG 0 Rank 0]
ProcessGroupNCCL preparing to dump debug info.
[E329 19:15:12.661192990 ProcessGroupNCCL.cpp:1174] [PG 0 Rank 0]
ProcessGroupNCCL dumping nccl trace to /tmp/tmp06psqil3/trace_0
[F329 19:15:12.661485601 ProcessGroupNCCL.cpp:1185] [PG 0 Rank 0] [PG 0
Rank 0] ProcessGroupNCCL's watchdog detected a collective timeout from
the local rank. This is most likely caused by incorrect usages of
collectives, e.g., wrong sizes used across ranks, the order of
collectives is not same for all ranks or the scheduled collective, for
some reason, didn't run. Additionally, this can be caused by GIL
deadlock or other reasons such as network errors or bugs in the
communications library (e.g. NCCL), etc. We tried our best to dump the
debug info into the storage to help you debug the issue.

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123023
Approved by: https://github.com/wconstab
2024-04-02 15:41:15 -07:00
f2e9ec2dc5 [c10d] dump from one and only one thread (PG0's monitor thread) (#120893)
Summary:
When there are multiple PGs in a process and a hardware failure happens,
we found that multiple PGs/threads in the same
process compete to dump the same records at the same time. This
affects the reliability of dumps.

In this PR, we will try to make the change such that only one thread/PG
could dump: PG0's monitor thread. We use a static variable to indicate
that something (e.g., collective timeout) has triggered the dump
locally.

The monitor thread dumps debug info under any one of three conditions:
1. the static variable is set to true by the watchdog thread when it detects
a timeout or a pipe dump signal;
2. a timeout signal is received from other ranks through TCPStore;
3. the watchdog heartbeat has stopped.
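
A sketch of those three trigger conditions combined (names illustrative, not the actual PyTorch code):

```cpp
#include <atomic>

// Sketch: set by the watchdog thread on a local timeout or pipe dump signal.
std::atomic<bool> localDumpRequested{false};

bool shouldMonitorThreadDump(bool timeoutSignalFromStore,
                             bool watchdogHeartbeatStale) {
  return localDumpRequested.load() ||  // 1: local watchdog flagged a dump
         timeoutSignalFromStore ||     // 2: another rank signaled via TCPStore
         watchdogHeartbeatStale;       // 3: watchdog heartbeat stopped
}
```
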
Test Plan:
python test/distributed/test_c10d_nccl.py -k
test_timeout_dumps_on_stuck_ranks

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120893
Approved by: https://github.com/wconstab
2024-04-02 15:36:05 -07:00
dde4324d8e [NCCL PG] Enable ncclCommDevIdxMap unconditionally (#122049)
Differential Revision: D54993977

The initial purpose of ncclCommDevIdxMap was to support NCCL zero-copy algorithms, so it was only enabled (with its values filled) if useTensorRegisterAllocatorHook_ was set to true. However, we now rely on it to support dumping NCCL information in a single PG, so we need it to be always available, regardless of whether useTensorRegisterAllocatorHook_ is enabled.
Move the code that fills ncclCommDevIdxMap out of the if (useTensorRegisterAllocatorHook_) statement.

See diff

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122049
Approved by: https://github.com/shuqiangzhang
2024-03-26 17:14:06 -07:00
94c079104d [c10d] fix the macro definition of NCCL_COMM_DUMP (#120502)
Summary:
We should emit the comm dump only if both macros are defined;
otherwise, use the original definition.

The previous implementation missed the function definition when IS_NCCL_EXP is defined but NCCL_COMM_DUMP is not defined.
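
A sketch of the guard structure being fixed (comments only; the macro names come from the commit):

```cpp
// Sketch: the dump-capable definition is selected only when BOTH macros are
// defined; every other combination, including IS_NCCL_EXP without
// NCCL_COMM_DUMP, falls back to the original definition.
#if defined(IS_NCCL_EXP) && defined(NCCL_COMM_DUMP)
  // definition that also dumps the NCCL comm state
#else
  // original definition
#endif
```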

Test Plan:
Build and unit test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120502
Approved by: https://github.com/dsjohns2, https://github.com/Skylion007
2024-03-26 14:09:00 -07:00
a6afee6d94 [c10d][flight recorder] dump additional NCCL debug info (#120063)
Summary:
This PR is mainly about the flight recorder side of the changes: it takes a
map of maps as input and dumps it as a picklable object. It also adds functions that
should be compiled only when NCCL_COMM_DUMP is defined.
Test Plan:
Integration tests with NCCL will be done later; here we only do the
c10d side of the dump test, aka NCCLTraceTest.

Testing the dump function is a bit tricky as we don't have
existing C++ unit tests for it. So we still use the Python NCCLTraceTest with
the Python binding of _dump_nccl_trace(): we manually feed
dump_nccl_trace with a map of test info, assert on the pickle result, and
print the converted Python dict:
```
(sqzhang_1) [sqzhang@devgpu009.cln1 ~/pytorch (main)]$  python
test/distributed/test_c10d_nccl.py NCCLTraceTest
NCCL version 2.19.3+cuda12.0
[rank0]:[E ProcessGroupNCCL.cpp:1200] [PG 0 Rank 0] ProcessGroupNCCL
preparing to dump debug info.
.NCCL version 2.19.3+cuda12.0
.NCCL version 2.19.3+cuda12.0
{'ncclID2': {'Key2': 'Value2', 'Key1': 'Value1'}, 'ncclID1': {'Key2':
'Value2', 'Key1': 'Value1'}}
{'ncclID2': {'Key2': 'Value2', 'Key1': 'Value1'}, 'ncclID1': {'Key2':
'Value2', 'Key1': 'Value1'}}
.NCCL version 2.19.3+cuda12.0
{'ncclID2': {'Key2': 'Value2', 'Key1': 'Value1'}, 'ncclID1': {'Key2':
'Value2', 'Key1': 'Value1'}}
{'ncclID2': {'Key2': 'Value2', 'Key1': 'Value1'}, 'ncclID1': {'Key2':
'Value2', 'Key1': 'Value1'}}
.NCCL version 2.19.3+cuda12.0
.NCCL version 2.19.3+cuda12.0
.NCCL version 2.19.3+cuda12.0
.NCCL version 2.19.3+cuda12.0
.
----------------------------------------------------------------------
Ran 8 tests in 95.761s
OK
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120063
Approved by: https://github.com/wconstab
2024-03-26 14:08:19 -07:00
d092857531 [Caffe2 CPU tests] Update CMakeLists.txt 2024-02-24 12:18:10 -08:00
6aad5e444a Fix missing MAST log when there is Unicode non-decodable text in logs (#119298)
Summary:
## Issue
When there is Unicode non-decodable text in logs, `tail_logger` will stop working afterwards, i.e. f527390102

In the example, the process stopped producing Python logs after 17:20:21 until the job finished:
```
[0]:I0201 17:20:21.338000 3429 gen_ai/genie_projects/llm/metaformers/reward_model_score.py:335] Progress: 118 batches out of 512 total batches. 23.05 % | (gpu mem: 25.8GB, free CPU mem: 1387.8GB)
I0201 17:39:14 Stopping twtask-main.service with Service Result: [success] Exit Code: [exited] Exit Status: [0]
```
At the end, a `UnicodeDecodeError` was thrown with no call stack.

## Fix
Use `errors="replace"` to avoid throwing exception when `UnicodeDecodeError` happens.

Test Plan: f528854819

Differential Revision: D53483644

Co-authored-by: Jack Zhang <jackzh@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119298
Approved by: https://github.com/XilunWu
2024-02-24 12:16:39 -08:00
c54ce9313b [c10d][flight recorder] store a copy of string in entry (#119837)
Summary:
Previously, we just stored the char pointer in the entry; the string is a
temporary object that will already have been destructed when we want to dump/access it.

A quick fix is to store a copy of the string, without changing the
upstream char*.

An alternative is to change every profilingTitle into std::string; this,
however, would need a comprehensive overhaul of the code up to the
c10d::work layer above workNCCL, RecordFunction, etc.

We chose the first option for this change.

Resolve #119808

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119837
Approved by: https://github.com/zdevito, https://github.com/wconstab
2024-02-14 11:38:10 -08:00
1fe59f4ef7 [c10d][flight recorder] remove unintended assignment of entry (#119748)
Summary:
auto& entry = entries_.at(*id % max_entries_);
entry = entries_.at(*id % max_entries_);
The above lines of code have the unintended consequence of invoking copy assignment
of entry objects, as the reference itself cannot be re-assigned.

Also, what could cause the crash is that the entry reference could become invalid if entries_ is
resized by other threads, and this could result in a copy to a garbage
location. The fix is to use a pointer, which can be re-assigned after
re-acquiring the lock.
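
A self-contained sketch of the bug and the fix (Entry and the sizes are placeholders):

```cpp
#include <cstddef>
#include <vector>

struct Entry { /* fields omitted */ };
std::vector<Entry> entries_(1024);
const size_t max_entries_ = 1024;

void update(size_t id) {
  // Buggy pattern: the second statement does NOT rebind the reference; it
  // copy-assigns one stored entry onto another.
  //   auto& entry = entries_.at(id % max_entries_);
  //   entry = entries_.at(id % max_entries_);   // unintended copy assignment

  // Fix per the commit: a pointer can be genuinely re-assigned, e.g. after
  // re-acquiring the lock, so we re-fetch a valid address instead of copying.
  Entry* entry = &entries_.at(id % max_entries_);
  // ... later, after re-acquiring the lock ...
  entry = &entries_.at(id % max_entries_);  // real re-assignment
  (void)entry;
}
```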

Tests: python test/distributed/test_c10d_nccl.py NCCLTraceTest

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119748
Approved by: https://github.com/wconstab, https://github.com/fegin
2024-02-14 11:38:10 -08:00
e693fb2bb1 [nccl flight recorder] record time we discover start and complete (#119249)
Some APIs like ncclCommAbort can cause NCCL kernels to finish even if
they were previously stuck. Because we may gather the trace buffer after
those calls, we can end up seeing some collectives marked completed even though
that completion happened several minutes after they started and clearly after
the timeout. This changes how we record state so that we keep track of the time
we discover a state change; even if the collective eventually gets marked complete,
we can observe that it happened minutes after it was scheduled.
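
A hedged sketch of the idea (types and names are illustrative):

```cpp
#include <chrono>

// Sketch: record the wall-clock time at which we *discover* each state
// transition, so a "completed" observed long after the timeout is visible.
enum class State { Scheduled, Started, Completed };

struct CollectiveRecord {
  State state = State::Scheduled;
  std::chrono::steady_clock::time_point lastStateChange;

  void observe(State s) {
    if (s != state) {
      state = s;
      lastStateChange = std::chrono::steady_clock::now();  // discovery time
    }
  }
};
```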

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119249
Approved by: https://github.com/wconstab
2024-02-14 11:38:10 -08:00
4fe510baf6 [NCCL PG] log NCCL comm at creation and abort (#118335)
Summary: It helps correlate NCCL PG with corresponding NCCL comm in separate logs.

Differential Revision: D53107647

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118335
Approved by: https://github.com/wconstab
2024-02-14 11:38:04 -08:00
7c507b78c4 [c10d] Expose check method to Python for store via pybind (#116144)
Differential Revision: [D52310987](https://our.internmc.facebook.com/intern/diff/D52310987)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116144
Approved by: https://github.com/wconstab
2024-01-31 11:08:27 -08:00
0019901601 [C10D] Fix nccl flightrecorder ignored dump timeout (#118142)
Don't call future.get() unless it's ready, because it waits.
Also, refactor the code a bit for simplicity.

We should do a follow-on PR to clean up the timeouts further, but this
should fix the glaring timeout bug.
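
A minimal sketch of the non-blocking pattern described above (names and the return type are assumptions):

```cpp
#include <chrono>
#include <future>

// Sketch: only call get() once the future is actually ready, so the
// configured dump timeout is honored instead of blocking indefinitely.
bool tryCollectDump(std::future<bool>& fut, std::chrono::milliseconds timeout) {
  if (fut.wait_for(timeout) == std::future_status::ready) {
    return fut.get();  // safe: will not block
  }
  return false;  // abandon the dump rather than wait forever
}
```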

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118142
Approved by: https://github.com/shuqiangzhang
ghstack dependencies: #118044, #118046, #118047
2024-01-26 16:48:00 -08:00
18be18535b [C10D] Make Flight Recorder report time_created in ns (#118047)
Addresses (6) from #117883

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118047
Approved by: https://github.com/zdevito
ghstack dependencies: #118044, #118046
2024-01-26 16:48:00 -08:00
2729367313 [C10D] Add version tag to NCCL Flight Recorder Dump (#118046)
Addresses (3) from #117883

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118046
Approved by: https://github.com/zdevito
ghstack dependencies: #118044
2024-01-26 16:48:00 -08:00
33537aae24 [C10D] Make NCCL Flight Recorder dump produce a dict (#118044)
Putting the list of entries into a particular key of a top-level dict
paves the way for adding other metadata as other top level keys.

Addresses 1 and 2 from #117883

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118044
Approved by: https://github.com/zdevito
2024-01-26 16:48:00 -08:00
dcdb1337dd [C10D] Finer-grain nccl heartbeat, avoid false positive hangs (#118016)
Summary:
Previously, the heartbeat was incremented once per finishing a for loop over a list
of in-progress work items, under the assumption that the processing
would either be predictably quick or hang completely.

In fact, there can be cuda API contention that causes the processing of works
to slow down arbitrarily but not truly deadlock.  To guard against this, we
bump the heartbeat at the smallest unit of progress, one work item being
successfully processed.
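
A sketch of the finer-grained bump (types illustrative):

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Sketch: bump the heartbeat once per processed work item (the smallest unit
// of progress) rather than once per pass over the whole list.
std::atomic<uint64_t> heartbeat_{0};

template <typename Work>
void processWorks(std::vector<Work>& works) {
  for (auto& w : works) {
    w.process();   // may be slowed by CUDA API contention, but not stuck
    heartbeat_++;  // progress signal after every item, not after the loop
  }
}
```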

Test Plan: CI

Differential Revision: D52973948

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118016
Approved by: https://github.com/shuqiangzhang, https://github.com/kwen2501
2024-01-26 16:48:00 -08:00
9cf0f2bd59 Move getDurationFromFirstEvent to USE_C10D_NCCL ifdef (#117738)
Fixes #117517

Try to move nccl related function *getDurationFromFirstEvent* to USE_C10D_NCCL ifdef (Related to https://github.com/pytorch/pytorch/issues/114575)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117738
Approved by: https://github.com/wconstab, https://github.com/XilunWu
2024-01-26 16:48:00 -08:00
1d2e877c05 [ProcessGroup] Make watchdog check work queue more frequently (#117297)
Today the watchdog's sleep interval is 1 s. That's a bit long compared to modern GPU link (or network link) speeds.

Take DDP and Ampere for example:

DDP's bucket size = 25 MB
Ampere's NVLink speed = 250 GB/s

25 MB / 250 GB/s = 0.1 ms, so in principle the check could be far more frequent.
So we are updating the interval to 100 ms for now,
and we'll see how it goes before making the checking more aggressive.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117297
Approved by: https://github.com/fduwjj
2024-01-26 16:48:00 -08:00
f27b979b0c [c10d] Move the timeout dump check from watchdog to monitoring thread (#117168)
To avoid a potential hang in the watchdog thread, which would prevent us from dumping timeout debugging info, we move the check of global collective timeout signals and the dumping of debugging info to the monitoring thread. We also need to ensure that we don't wait very long to check the timeout signal from the store; otherwise, we will miss the signal and won't get the debugging info dumped.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117168
Approved by: https://github.com/wconstab
2024-01-26 16:48:00 -08:00
f30d6047ad [c10d] Add a timeout check interval variable for timeout dump (#117093)
The current timeout check frequency relies on the monitoring thread's timeout interval, which can be too long (even if we set it to 2 minutes), so let's use a separate timeout variable that users can configure. And we only let the default PG check TCPStore, so an even more frequent check should be fine. (Our stress test is performed every half second.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117093
Approved by: https://github.com/wconstab, https://github.com/kwen2501
2024-01-26 16:48:00 -08:00
75311510ef [C10D] Add duration_ms to flight recorder (#114817)
Measures the duration of a collective operation using nccl start/end
events and includes this duration (in ms) in the flight recorder data.

duration_ms will be an optional field, since it only works when
timing is enabled.  Currently timing is enabled when flight recorder
is enabled, but this is not a strict requirement.  Duration is also
not available for collectives not in a completed state.

Note: computing duration can lead to a hang due to calling cudaEventDuration when
the cuda driver queue is full.

We don't ever want dump() api to hang, since we might want dump to help
debug a hang. Hence, we only query durations from the watchdog thread,
and it's possible during dump() call, some of the most recent
collectives durations won't have been computed yet at time of dump.  We
make this tradeoff to ensure that dump() itself will never hang.
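
A sketch of the duration query (the commit refers to an event-duration call; cudaEventElapsedTime is the standard CUDA API for this, and error handling is elided):

```cpp
#include <cuda_runtime.h>

// Sketch: measure a collective's duration from its start/end CUDA events.
// Per the commit, this is queried only from the watchdog thread, and only for
// completed work, because the event query can block when the driver queue is
// full.
float collectiveDurationMs(cudaEvent_t start, cudaEvent_t end) {
  float ms = 0.0f;
  cudaEventElapsedTime(&ms, start, end);  // valid only after both events fired
  return ms;
}
```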

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114817
Approved by: https://github.com/fduwjj, https://github.com/zdevito
ghstack dependencies: #116905
2024-01-26 16:48:00 -08:00
dbd6094d05 [C10D](reland) Add GIL checker to NCCL watchdog monitor (#117312)
Whenever the monitor thread kills the watchdog thread for being stuck, we do so to save cluster time and get a faster failure signal, but we want to know more about why it got stuck.

One possible reason for watchdog stuckness is GIL contention, which could be ruled out or observed by making an attempt to acquire the GIL at exit time.

If we cannot acquire the GIL within a short time window (1s) we abort the attempt and report GIL contention, otherwise we report that GIL was acquired successfully.
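
A hedged sketch of the check (the real code reportedly uses a function pointer to dodge destructor-ordering issues on dlclose; this only shows the try-with-budget idea, and the helper thread may linger past the budget, which is acceptable at exit time):

```cpp
#include <chrono>
#include <future>
#include <thread>
#include <Python.h>

// Sketch: try to take the GIL on a helper thread and give up after ~1s.
// A timeout suggests GIL contention; success rules it out.
bool gilAcquirableWithin(std::chrono::seconds budget) {
  auto done = std::make_shared<std::promise<bool>>();
  auto fut = done->get_future();
  std::thread([done] {
    PyGILState_STATE s = PyGILState_Ensure();  // blocks while GIL is contended
    PyGILState_Release(s);
    done->set_value(true);
  }).detach();
  return fut.wait_for(budget) == std::future_status::ready;
}
```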

Reland: uses a function pointer to avoid destructor ordering issues on dlclose. (Looks like the destructor for the std::function was being run later than the libtorchpython lib was unloaded, leading to a crash).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117312
Approved by: https://github.com/zdevito
2024-01-26 16:48:00 -08:00
397b9d47e9 [ProcessGroup] Do not print NCCL_DEBUG before NCCL init (#117328)
In case /etc/nccl.conf is used, `NCCL_DEBUG` is not set in the process environment until NCCL initializes.
The deleted print point is before NCCL init, and hence may be inaccurate.
This PR removes it and relies on the other print point, which is after NCCL comm creation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117328
Approved by: https://github.com/wconstab, https://github.com/fduwjj
2024-01-26 16:48:00 -08:00
36a01a8ab9 [c10d][EZ] Add more logs in the destructor of ProcessGroupNCCL for better root cause investigation (#117291)
Add logs to the place where we inspect whether a hang happens.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117291
Approved by: https://github.com/XilunWu, https://github.com/shuqiangzhang
2024-01-26 16:48:00 -08:00
ee336cf58a [c10d] Add comments to the rest environment variable within NCCLPG (#117092)
Not every environment variable within NCCLPG has comments; let's add comments to each of them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117092
Approved by: https://github.com/kwen2501
ghstack dependencies: #116545
2024-01-26 16:48:00 -08:00
a9e2e745d7 [c10d] Add extra sleep in waitForDumpOrTimeout to ensure enough time for all ranks dump debug info (#116545)
We added an extra sleep and made it configurable so that users can set an extra wait to ensure all ranks have dumped the debug info.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116545
Approved by: https://github.com/wconstab
2024-01-26 16:48:00 -08:00
ab4df89eea [C10D] Rename flightrecorder key vars to avoid confusion (#116905)
Key vars are strings used as dict keys (e.g., duration_s was the string
"duration_ms").

The _s suffix suggested time in seconds, which was confusing since duration_s was a key string and
duration_ms is another variable holding a time value.

Now duration_key is "duration_ms".

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116905
Approved by: https://github.com/zdevito
2024-01-26 16:48:00 -08:00
9d02ebe876 [c10d] To make ProcessGroupNCCL to use globalStore for coordination (#117075)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117075
Approved by: https://github.com/wconstab
ghstack dependencies: #117074
2024-01-26 16:48:00 -08:00
b61e01cce9 [c10d] Add a recursive method to get the inner most store (#117074)
In c10d PG initialization, we wrap TCPStore with multiple layers of PrefixStore which adds layers of prefix.

One example is:
"default_pg/0//cuda//timeout_dump"
When initializing the default PG, because there is no store passed in, we first add the prefix "default_pg" to the TCPStore returned from rendezvous:

bdeaaad70c/torch/distributed/distributed_c10d.py (L1240)

We then add pg_name (aka 0) bdeaaad70c/torch/distributed/distributed_c10d.py (L1376) and device (aka cuda) bdeaaad70c/torch/distributed/distributed_c10d.py (L1387)

to the prefix. Then, when we call store_->set("timeout_dump"), the actual key used for writing into TCPStore is "default_pg/0//cuda//timeout_dump".

For sub-PGs, things get even more interesting: we put the store wrapped with the default PG name into a cache:
bdeaaad70c/torch/distributed/distributed_c10d.py (L1517)

And when creating each sub-PG, its PG name is appended right after the cached store's prefix. The example keys are:
'default_pg/0//10//cuda//timeout_dump', 'default_pg/0//12//cuda//timeout_dump', 'default_pg/0//38//cuda//timeout_dump', 'default_pg/0//39//cuda//timeout_dump'. (10, 12, 38 and 39 are all PG names of each subPG created)

The reason the number in the name gets so high is that for each sub-PG creation, all
ranks have to call the API together, and the global variable used for the PG name is bumped up monotonically:
bdeaaad70c/torch/distributed/distributed_c10d.py (L3666)

Similar things happen for using hashing for PG names.

This has a potential issue: each sub-PG has an instance of ProcessGroupNCCL, and if we want to set something global to notify all sub-PGs (and all ranks), this added prefix causes bugs. For example, if on sub-PG 1 we set a value in TCPStore with the key 'default_pg/0//1//cuda//timeout_dump', while the default PG instances check the TCPStore using the key 'default_pg/0//cuda//timeout_dump', the default PG instances will never see the signal. So in this PR, we added a new API in PrefixStore that gets the innermost non-PrefixStore for set and check operations. The next PR will make the corresponding changes in the NCCL watchdog.
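
A simplified sketch of that lookup (interfaces reduced to the essentials; not the actual c10d classes):

```cpp
#include <memory>

// Sketch: peel PrefixStore wrappers until the underlying (e.g. TCP) store is
// reached, so keys written via any sub-PG land at the same unprefixed location.
struct Store { virtual ~Store() = default; };

struct PrefixStore : Store {
  std::shared_ptr<Store> underlying;
};

std::shared_ptr<Store> getInnermostStore(std::shared_ptr<Store> s) {
  while (auto* p = dynamic_cast<PrefixStore*>(s.get())) {
    s = p->underlying;
  }
  return s;
}
```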

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117074
Approved by: https://github.com/wconstab, https://github.com/H-Huang
2024-01-26 16:48:00 -08:00
f7ce61ba53 [C10D] Dump cpp stacktraces on heartbeat monitor timeout (#116717)
Summary:
If heartbeat monitor times out and kills the process, we want to know why.

It's convenient to use an internal tool for this, but we plan to later
integrate with torchelastic to call into pyspy or something else, which will be
both better (including py stacks) and compatible with OSS.

Test Plan: tested manually, observed c++ stacktraces were dumped

Reviewed By: fduwjj

Differential Revision: D52370243

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116717
Approved by: https://github.com/zdevito
2024-01-26 16:48:00 -08:00
e7bae15ab1 [C10D] Make heartbeat_ atomic (#116702)
Summary:
Currently, the code is working. We know this because we observe heartbeat
timeouts.

However, there is a chance that if the code were refactored, the compiler could
optimize away the load of heartbeat_ inside heartbeatMonitor, and we wouldn't
know.

Using atomic here is not really for thread synchronization, but more to ensure
that compiler optimizations (hoisting the read outside the loop) can never be
allowed to happen. Again, we know this isn't currently happening, because if it
were, it would not be an intermittent failure; it would always fail
(at least with a fixed compiler/platform).

I previously avoided atomics because we didn't want shared locks between the heartbeat
monitor and the watchdog thread. Why? If the watchdog held the lock and hung, the monitor
could also hang. However, this really can't happen (AFAIK) when using an
atomic.
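
A minimal sketch of the pattern (names illustrative):

```cpp
#include <atomic>
#include <cstdint>

// Sketch: heartbeat_ becomes std::atomic so the monitor's read cannot legally
// be hoisted out of its polling loop; no lock is shared with the watchdog, so
// the monitor can't be blocked by a hung watchdog.
std::atomic<uint64_t> heartbeat_{0};

void watchdogTick() { heartbeat_.fetch_add(1, std::memory_order_relaxed); }

bool madeProgressSince(uint64_t last) {
  return heartbeat_.load(std::memory_order_relaxed) != last;
}
```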

Test Plan: existing CI tests

Differential Revision: D52378257

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116702
Approved by: https://github.com/fduwjj, https://github.com/zdevito
2024-01-26 16:48:00 -08:00
e71b422908 [C10D] Improve Heartbeat Monitor exit logs (#116268) (#116661)
Summary:

- add workMetaList_.size() so we know how many outstanding works there
  were when killing
- Print our first log before debuginfo dump instead of after, since it
  is clearer when reading the logs that we time out and then dump
- Organize the log strings: put them near where they are used

cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l yf225

imported-using-ghimport

Test Plan: Imported from OSS

Reviewed By: fduwjj

Differential Revision: D52369167

Pulled By: wconstab

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116661
Approved by: https://github.com/fduwjj
2024-01-26 16:48:00 -08:00
389940ce60 [c10d] Make DebugInfoWriter Singleton across all PG objects (#116489)
Previously, the writer was registered to each NCCL PG (backend), so for every PG we have an NCCL PG instance; if we use a customized writer when multiple sub-PGs are in use, we need to make sure the user registers the writer for every backend, which is bad UX. Furthermore, the debug info is global, so it does not make sense to have a writer per instance. We even have a static mutex in `dumpDebuggingInfo` to serialize the writes, which makes it more obvious that we can make the writer a singleton so that we only have one writer instance for all PG instances.

Although the rationale is clear, the implementation may vary a lot, so this PR is an RFC for now to see whether this implementation makes sense.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116489
Approved by: https://github.com/kwen2501
2024-01-26 16:48:00 -08:00
b2237a7c85 [C10d] Fix Log Prefix in NCCLPG so that each instance gets its own prefix (#116520)
Somehow the log prefix only had ProcessGroup 0 and rank [global rank]. This does not give the expected result, since per the comment it should be "a prefix that is unique to this process group and rank". So this PR fixes it and makes the prefix different for different sub-PGs.

The reason is that the prefix was made static, so it is shared across all NCCLPG instances, and whoever calls this function first sets `rank_` and `uid_` in the prefix. We always initialize PG 0 first; that's why we always see PG[0] + global ranks for all sub-PGs.

[Screenshot: https://github.com/pytorch/pytorch/assets/6937752/7fbb0226-7e25-4306-9cee-22e17b00bc8e]

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116520
Approved by: https://github.com/wconstab
ghstack dependencies: #116218
2024-01-26 16:48:00 -08:00
ef5dfe3f3e [c10d] Fix timeout dump path write path overlap when there are multiple PGs (#116218)
Basically, we observed that if there are multiple PGs and the timeout happens on one of the sub-PGs, we somehow use the local rank in the dump file name. We realized that:
1. For setting the timeout signal in the store, any watchdog thread from any PG can do it.
2. For checking and dumping, only the watchdog thread of the default PG is needed, since we always create the default PG and it contains all ranks (so there is no file-name conflict), and the store signal and the dumped debug info are both global.
3. Since the dump is global, we want to avoid ranks from a sub-PG polluting logs from global ranks (local rank 0 vs. global rank 0), so we use global ranks here to initialize the debug info writer. (Down the road, we are thinking about making it a singleton so that users only register it once for the multi-PG case.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116218
Approved by: https://github.com/wconstab
2024-01-26 16:48:00 -08:00
e303dc3c08 [c10d] Add stream info during nccl comm abort call (#116076)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116076
Approved by: https://github.com/XilunWu
2024-01-26 16:48:00 -08:00
265efad2de [C10D] Increase TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC (#116267)
Change default from 2 min to 10 min.

Why? Many cases of heartbeat timeout were reported, but increasing
timeout led to the same job hanging in a different place, suggesting
heartbeat kill was working well and not a false positive.  However, some
others reported jobs running fine with increased timeouts.  One such
case was investigated below, and suggests that indeed a 2 min timeout is
too aggressive.  While we have not fully root caused the issue, it
is better to avoid killing jobs that would otherwise complete.

The current theory is that the watchdog is not totally deadlocked, but is slowed
down in its processing of work objects due to some intermittent resource
contention. Hence, allowing more time is more of a workaround than a
fix.

Debug/Analysis:
https://docs.google.com/document/d/1NMNWoTB86ZpP9bqYLZ_EVA9byOlEfxw0wynMVEMlXwM

Differential Revision: [D52368791](https://our.internmc.facebook.com/intern/diff/D52368791)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116267
Approved by: https://github.com/fduwjj
2024-01-26 16:48:00 -08:00
60f0455905 [C10D] Make all PGNCCL LOG usages use logPrefix() (#116060)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116060
Approved by: https://github.com/fduwjj
ghstack dependencies: #116059
2024-01-26 16:48:00 -08:00
4898313791 [C10D] Add logPrefix to abortCommsFromMap (#116059)
Prints additional info such as PG ID/Rank.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116059
Approved by: https://github.com/fduwjj
2024-01-26 16:48:00 -08:00
f4da9adf6b [C10D] Add waitForDumpOrTimeout to log on dump abandonment (#115876)
Helps call attention to any cases where the dump actually times out.

The timeout is likely to hit if we run into slow stacktrace processing.

Log any exceptions encountered in the background thread, but don't raise
them; we're already willing to abandon the debug dump, and want to
proceed with our normal execution (in the case of DumpPipe) or the shutdown
process (when dumping happens on timeout and shutdown is already
initiated).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115876
Approved by: https://github.com/zdevito
ghstack dependencies: #115807
2024-01-26 16:48:00 -08:00
8f7f35273e [c10d] Polish NCCL PG monitor thread log message (#115888)
We turned on monitor thread by default in https://github.com/pytorch/pytorch/pull/112518, and we want the error message that is displayed when the monitor kills the process to be more informative.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115888
Approved by: https://github.com/wconstab
2024-01-26 16:48:00 -08:00
44ec9612ed [C10D] Log PG size in init log (#115807)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115807
Approved by: https://github.com/XilunWu
2024-01-26 16:48:00 -08:00
4d3bea2b29 [nccl flight recorder] nullptr profiling name (#115851)
Sometimes the profiling name can be a nullptr, which
throws on conversion to std::string. This adds a check.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115851
Approved by: https://github.com/wconstab
2024-01-26 16:48:00 -08:00
0bcdddc3c1 [C10D] Make dumpDebuggingInfo share a mutex across PGs (#115803)
The mutex was originally added to avoid racing to dump debuginfo,
where a race in this case would result in a corrupted dump file.

The reason a mutex helps is that it forces all dump requests to be
serialized, so that an observer would either see an in-progress file, a
complete file, or no file.  Without a mutex, a fourth state is possible
(a file that has been written to by multiple threads and is invalid).

Because the mutex was a ProcessGroupNCCL class member, and each PG
instance has its own watchdog thread that can launch a dump, it was not
doing its job. Making the mutex static shares it between instances of
the class and ensures serialization of dumps triggered by any PG.

(Note: dumps triggered by different PGs have the same, global contents
anyway- there is only one global flight recorder, so it doesn't matter
who triggers it.)
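
A minimal sketch of the fix (function body elided):

```cpp
#include <mutex>

// Sketch: a function-local static mutex is shared by every ProcessGroupNCCL
// instance in the process, so concurrent dump requests from different PGs'
// watchdogs serialize instead of interleaving in the dump file.
void dumpDebuggingInfo() {
  static std::mutex writeDebugInfoMutex;  // one per process, not per PG
  std::lock_guard<std::mutex> guard(writeDebugInfoMutex);
  // ... write the (global) flight-recorder contents to the dump file ...
}
```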

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115803
Approved by: https://github.com/kwen2501
ghstack dependencies: #115771, #115798, #115800, #115801
2024-01-26 16:48:00 -08:00
28b6220312 [C10D] Change PGNCCL logs to prefix [PG {} Rank {}] (#115801)
Adds a PG {process group uid} prefix component to logs.

This is helpful in situations where there are multiple process groups,
and rank information by itself is confusing. (For example, rank 0 on PG1
may correspond to rank 3 on PG0. People may assume 'rank0' references
the global (PG0) world, but it may reference a sub-PG. Prefacing the PG
helps clarify this.)

Does NOT change logs from inside WorkNCCL functions, since WorkNCCL
doesn't know what PG ID it corresponds to. Will address these logs
separately.

Example:

```
[I ProcessGroupNCCL.cpp:787] [PG 0 Rank 0] ProcessGroupNCCL initialization ...
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115801
Approved by: https://github.com/fduwjj
ghstack dependencies: #115771, #115798, #115800
2024-01-26 16:48:00 -08:00
210b7b65e2 [C10D] Refactor NCCL logs to use common prefix helper (#115800)
Put the repeated code that string formats [Rank {rank}] in one place.

Sets up for the next PR that also adds more info to this prefix.

(Does not change exception messages, which could be done as well;
exception messages are not formatted quite the same way. This PR tries
instead to avoid changing log behavior and only
refactors code.)

Did limited testing (some logs were observed OK).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115800
Approved by: https://github.com/fduwjj
ghstack dependencies: #115771, #115798
2024-01-26 16:48:00 -08:00
4da10b5cd3 [C10D] Only open NCCL dump pipe file once per process (#115798)
The NCCL flight recorder is per-process (it is shared by all
process groups), but individual process groups used to construct their
own pipe for being signaled to dump the flight recorder.

This ensures that only one pipe per process is created, by only creating
the pipe on the first ProcessGroup (uid_ == 0) which should be the world
group.

Filenames are still keyed off of rank, but this should now be global
rank instead of sub-pg rank, making the filenames unique across the
whole trainer process.
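
A sketch of the gating described above (names are illustrative):

```cpp
#include <string>

// Sketch: only the first process group (the world group, uid 0) opens the
// dump-trigger pipe, keyed by global rank so filenames are process-unique.
void maybeOpenDumpPipe(int uid, int globalRank, const std::string& pipeDir) {
  if (uid != 0) {
    return;  // sub-PGs reuse the world group's pipe; do not create another
  }
  std::string pipePath = pipeDir + "/dump_pipe_" + std::to_string(globalRank);
  // ... create/open pipePath and poll it for dump requests ...
  (void)pipePath;
}
```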

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115798
Approved by: https://github.com/zdevito
ghstack dependencies: #115771
2024-01-26 16:48:00 -08:00
f09763814f [C10D] Make DumpPipe disabled when FlightRecorder disabled (#115771)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115771
Approved by: https://github.com/fduwjj
2024-01-26 16:48:00 -08:00
80923ed5a6 [C10D] Make DumpPipe pipe file configurable (#115770)
Add TORCH_NCCL_DEBUG_INFO_PIPE_FILE env, allowing separate pipe file
location from dump file location.

Defaults PIPE_FILE to empty, meaning disabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115770
Approved by: https://github.com/zdevito
2024-01-26 16:48:00 -08:00
0ff155fb65 Fix SDPA for SAM (#115636)
Addresses the regression for Segment Anything Fast in https://github.com/pytorch-labs/segment-anything-fast/issues/99
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115636
Approved by: https://github.com/soulitzer, https://github.com/ani300
2023-12-12 18:52:38 +00:00
8885128dcc Fix backward for SDPA NT jagged layout (#115576)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115576
Approved by: https://github.com/jbschlosser, https://github.com/ani300
2023-12-12 18:35:40 +00:00
7553c49514 [S382174] Fix distributed debug w/ non-equal split (#115483)
Summary:
In collectives, it's possible to have a non-equal split, which has a different implementation, and the output tensor sizes will differ, e.g. https://www.internalfb.com/code/fbsource/[460afb1172b5]/fbcode/caffe2/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp?lines=3104. However, TORCH_DISTRIBUTED_DEBUG=DETAIL assumes the output tensor sizes are the same, performs the check, and fails the job if they don't match: https://fburl.com/code/mhte9ty8. c10d code should handle this.

Ideally, we should check the input sizes across ranks and make sure they're the same. Maybe in the next diff.

Test Plan: Tested torchrec's TWRW with a non-even split, and it's working now.

Reviewed By: zhangruiskyline

Differential Revision: D52010942

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115483
Approved by: https://github.com/kwen2501, https://github.com/fegin, https://github.com/XilunWu
2023-12-12 18:02:05 +00:00
d521857411 Terminate handler (#101332)
Fixes #50051.
This PR is based on #50320, and I address the last feedback.
On Windows it is enabled by default. It can be enabled or disabled via the USE_CUSTOM_TERMINATE env variable.

This PR adds support for overriding the terminate handler in order to log uncaught exceptions in threads.
If an exception is thrown and not caught, it will print `<Unhandled exception caught in c10/util/AbortHandler.h>`.
The point of doing this is that in issue #50051, exceptions were thrown but not logged. With this logging system it will be easier to debug such issues in the future.
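
A self-contained sketch of the mechanism (the message string mirrors the one quoted above; everything else is illustrative, not the actual AbortHandler.h code):

```cpp
#include <cstdlib>
#include <exception>
#include <iostream>

// Sketch: install a custom std::terminate handler so an exception escaping
// any thread is logged before the process dies, instead of vanishing.
void logAndAbort() {
  std::cerr << "<Unhandled exception caught in c10/util/AbortHandler.h>\n";
  if (auto e = std::current_exception()) {
    try { std::rethrow_exception(e); }
    catch (const std::exception& ex) { std::cerr << ex.what() << "\n"; }
    catch (...) { std::cerr << "unknown exception\n"; }
  }
  std::abort();
}

struct InstallHandler {
  InstallHandler() { std::set_terminate(logAndAbort); }
} installHandler;  // runs at static-init time, matching the opt-in described above
```
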
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101332
Approved by: https://github.com/albanD, https://github.com/malfet
2023-12-12 17:55:27 +00:00
36b5136270 [inductor] Don't print disable_cudagraphs_reason when cudagraphs is disabled (#115489)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115489
Approved by: https://github.com/yanboliang
2023-12-12 17:50:18 +00:00
670eb83573 Enable test_sparse_addmm for crossref tests (#115536)
Fixes https://github.com/pytorch/pytorch/issues/97284

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115536
Approved by: https://github.com/cpuhrsch
2023-12-12 17:26:40 +00:00
a8dc9d8e35 [8/n] Update XNNPACK Version Part 8 Everything Remaining to get it to work (#115587)
> **__Note:__** The XNNPACK upgrade is very large, in the range of **40k** files and **10M** lines of code, so we break the update of the library into multiple parts. All parts [1 - 6/n] must be landed together for it to work. ***This also means that if there is a revert, please revert the entire stack.***

This change is everything remaining requiring XNNPACK version to work.

Differential Revision: [D52044420](https://our.internmc.facebook.com/intern/diff/D52044420/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115587
Approved by: https://github.com/digantdesai
2023-12-12 17:17:19 +00:00
e918461377 Add instructions for generating optimal Triton kernel parameters of bsr_dense_addmm (#115504)
As in the title.

In addition, enable verbose output when executing the torch/sparse/_triton_ops_meta.py script.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115504
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #115499
2023-12-12 16:44:51 +00:00
32286512cc Add tune_bsr_dense_addmm as an API to find optimal triton kernel parameters for bsr_dense_addmm (#115499)
As in the title.

In addition:
- improve the algorithm for finding a minimum of operation timings: break out of the inner loop early when the next minimum candidate is found
- add tests and fix bugs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115499
Approved by: https://github.com/cpuhrsch
2023-12-12 16:44:51 +00:00
40dc0580a6 [inductor] De-duplicate triton helper functions (#115546)
Previously, if two calls to cumsum were generated in the same Triton kernel,
we would generate identical helper functions with different names. Now this
recognizes identical functions and defines each only once. To do this, I defer
choosing the name until after codegen.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115546
Approved by: https://github.com/lezcano
ghstack dependencies: #109132
2023-12-12 16:30:50 +00:00
02196c21ac [inductor] Parameterize ir.Scan on combine_fn (#109132)
This replaces `tl.cumsum` and `tl.cumprod` with calls to `tl.associative_scan`
where the combine function is generated from inductor IR.

So before we had:
```python
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr):
    xnumel = 20
    rnumel = 30
    RBLOCK: tl.constexpr = 32
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
    xmask = xindex < xnumel
    rindex = tl.arange(0, RBLOCK)[None, :]
    rmask = rindex < rnumel
    r1 = rindex
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (r1 + (30*x0)), rmask & xmask, other=0).to(tl.float32)
    tmp1 = tl.broadcast_to(tmp0, [XBLOCK, RBLOCK])
    tmp2 = tl.where(rmask & xmask, tmp1, 0)
    tmp3 = tl.cumsum(tmp2, 1)
    tl.store(out_ptr0 + (r1 + (30*x0)), tmp3, rmask & xmask)
```

Now we have:
```python
@triton.jit
def _triton_helper_fn0(arg0, arg1):
    tmp0 = arg0 + arg1
    return tmp0

@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr):
    xnumel = 20
    rnumel = 30
    RBLOCK: tl.constexpr = 32
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
    xmask = xindex < xnumel
    rindex = tl.arange(0, RBLOCK)[None, :]
    rmask = rindex < rnumel
    r1 = rindex
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (r1 + (30*x0)), rmask & xmask, other=0).to(tl.float32)
    tmp1 = tl.broadcast_to(tmp0, [XBLOCK, RBLOCK])
    tmp2 = tl.where(rmask & xmask, tmp1, 0)
    tmp3 = tl.associative_scan(tmp2, 1, _triton_helper_fn0)
    tl.store(out_ptr0 + (r1 + (30*x0)), tmp3, rmask & xmask)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109132
Approved by: https://github.com/lezcano
2023-12-12 16:30:50 +00:00
d5286d7ea8 [export] Add canonical form for differentiating IR (#115589)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115589
Approved by: https://github.com/suo
2023-12-12 16:21:57 +00:00
de4b2e59a7 [PyTorch] AOTI: add more basic aoti_torch getters (#112799)
There is a lot of simple information about tensors that we couldn't get. In
particular, we didn't know the lengths of the arrays returned by sizes
and strides.

Differential Revision: [D50949929](https://our.internmc.facebook.com/intern/diff/D50949929/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112799
Approved by: https://github.com/desertfire, https://github.com/aakhundov
ghstack dependencies: #112116, #112174, #112405, #112798
2023-12-12 15:56:33 +00:00
c5c4d81b1b Switched stale workflow to linux.large.arc (#115635)
Switched stale workflow to linux.large.arc
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115635
Approved by: https://github.com/jeanschmidt
2023-12-12 15:33:59 +00:00
4fafc36c33 [MPS] Fix sum and prod for complex types (#115554)
By not force-casting dtype to float

Test plan: `python -c "import torch;print(torch.linspace(-3.0, 3.0, 50, dtype=torch.cfloat, device='mps').sqrt().sin().sum())"`

Before:
```
tensor(21.1778+0.j, device='mps:0')
```
After
```
tensor(21.1778+39.1377j, device='mps:0')
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115554
Approved by: https://github.com/lezcano
ghstack dependencies: #115512, #115513
2023-12-12 15:04:45 +00:00
07f03b4a62 [MPS] Add support for MPSDataTypeComplexFloat[16|32] (#115513)
But limit it to macOS Sonoma+.

Before this change, calling `torch.cat` with complex types failed, but now it works.
Before:
```
% python -c "import torch;print(torch.cat([torch.rand(3, 3, dtype=torch.cfloat).to('mps'), torch.rand(3, 3, dtype=torch.cfloat).to('mps')]))"
TypeError: Trying to convert ComplexFloat to the MPS backend but it does not have support for that dtype.
```
After:
```
% python -c "import torch;print(torch.cat([torch.rand(3, 3, dtype=torch.cfloat).to('mps'), torch.rand(3, 3, dtype=torch.cfloat).to('mps')]))"
tensor([[0.4857+0.0030j, 0.9375+0.8630j, 0.3544+0.9911j],
        [0.5293+0.8652j, 0.8440+0.1991j, 0.5152+0.8276j],
        [0.0136+0.7469j, 0.1403+0.4761j, 0.2943+0.0896j],
        [0.6458+0.0035j, 0.3579+0.4577j, 0.1723+0.1508j],
        [0.4420+0.3554j, 0.4396+0.7272j, 0.2479+0.1191j],
        [0.3895+0.2292j, 0.7886+0.1613j, 0.9243+0.4180j]], device='mps:0')
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115513
Approved by: https://github.com/kulinseth
ghstack dependencies: #115512
2023-12-12 15:04:45 +00:00
21cf6e76c2 Revert "Use linux.large.arc for stale workflow (#115440)"
This reverts commit dadb3694ffaa2a0bfe78516c294a46566430c1ad.

Reverted https://github.com/pytorch/pytorch/pull/115440 on behalf of https://github.com/DanilBaibak due to Did not merge properly ([comment](https://github.com/pytorch/pytorch/pull/115440#issuecomment-1852126050))
2023-12-12 14:20:29 +00:00
dadb3694ff Use linux.large.arc for stale workflow (#115440)
* Try linux.large.arc for stale workflow

* Run stale workflow on PR changes

* Added arc runner label to the list of self-hosted runners

* Added concurrency for the linux job

* Cleanup

* Added workflow_dispatch for testing purpose
2023-12-12 15:11:09 +01:00
7350dcb307 [CI] Fix lint errors on master (#115627)
Differential Revision: [D52073432](https://our.internmc.facebook.com/intern/diff/D52073432)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115627
Approved by: https://github.com/atalman
2023-12-12 13:53:14 +00:00
bc51a0c22f Revert "[PyTorch] AOTI: add more basic aoti_torch getters (#112799)"
This reverts commit 3de2596abed9717a166635b48126302fcf46527a.

Reverted https://github.com/pytorch/pytorch/pull/112799 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/112799#issuecomment-1852076887))
2023-12-12 13:52:34 +00:00
f98b0f3ebc Add bfloat16 support to torch.sparse.addmm for CPU (#115535)
Fixes https://github.com/pytorch/pytorch/issues/73145.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115535
Approved by: https://github.com/cpuhrsch
2023-12-12 13:26:33 +00:00
d6f8850653 Revert "[Export] Test non-strict mode on existing test cases (#115399)"
This reverts commit 36527df344c0c33dae8bc6c94eded8646013b736.

Reverted https://github.com/pytorch/pytorch/pull/115399 on behalf of https://github.com/atalman due to OSSCI oncall, broke CI tests ([comment](https://github.com/pytorch/pytorch/pull/115399#issuecomment-1851988651))
2023-12-12 13:02:18 +00:00
a8acd6c410 Add Half support for AvgPool2d on CPU (#109578)
Add Half support for AvgPool2d (both channels last and channels first) on CPU

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109578
Approved by: https://github.com/mingfeima, https://github.com/albanD
2023-12-12 12:59:47 +00:00
92fd3927b0 [export][reland] Add math.* ops to pass base (#115559)
Reland of https://github.com/pytorch/pytorch/pull/115271/
Fixes https://github.com/pytorch/pytorch/issues/115209
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115559
Approved by: https://github.com/zhxchen17, https://github.com/atalman
ghstack dependencies: #115556, #115557, #115558
2023-12-12 10:46:41 +00:00
36527df344 [Export] Test non-strict mode on existing test cases (#115399)
Summary:
Dynamo's test methodology provides a good example of applying various
treatments to the same set of test cases. A pitfall is the global config,
which can easily be modified somewhere. Here we change the behavior of
the export API by hijacking it with self-defined code.

To support the non-strict test suite, `strict=False` is explicitly
passed into the export API whether it's called with or without the strict arg.

* For existing failing strict test cases, non-strict also fails.
* For cases that pass in strict mode but fail in non-strict, we mark them as
`@testing.expectedFailureNonStrict`.
* Moreover, I manually checked the failure reasons, and some of them are not
related to the nn.Module assertion exception. I mark them as `# Need to fix
for non-strict mode`.

Test Plan:
python test/export/test_export_nonstrict.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115399
Approved by: https://github.com/zhxchen17, https://github.com/tugsbayasgalan
2023-12-12 07:11:53 +00:00
fdf814c6ca Revert "[MPS] Add support for MPSDataTypeComplexFloat[16|32] (#115513)"
This reverts commit a4bb4a237348ff8d688e43ba542ee59a9d7ed4a6.

Reverted https://github.com/pytorch/pytorch/pull/115513 on behalf of https://github.com/malfet due to Broke Mac x86 periodic builds ([comment](https://github.com/pytorch/pytorch/pull/115513#issuecomment-1851398773))
2023-12-12 06:50:47 +00:00
46694e92b7 Revert "[MPS] Fix sum and prod for complex types (#115554)"
This reverts commit 8b28380c8ed5b5bfe479392bcffeccf8b89be328.

Reverted https://github.com/pytorch/pytorch/pull/115554 on behalf of https://github.com/malfet due to Broke MacOS x86 builds ([comment](https://github.com/pytorch/pytorch/pull/115554#issuecomment-1851395982))
2023-12-12 06:47:39 +00:00
f28687dfb2 Do not use pytorchbot-env from upload-test-stats (#115606)
As it was only needed to check our token rate limits

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115606
Approved by: https://github.com/huydhn
2023-12-12 06:42:33 +00:00
1eca63c6ac [DeviceMesh] Move helper function 'get_mesh_dim_by_name' to MeshEnv class (#115572)
Move the helper function `get_mesh_dim_by_name` outside of the DeviceMesh class to keep the public class cleaner.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115572
Approved by: https://github.com/XilunWu, https://github.com/wanchaol
2023-12-12 06:29:46 +00:00
3de2596abe [PyTorch] AOTI: add more basic aoti_torch getters (#112799)
There is a lot of simple information about tensors that we couldn't get. In
particular, we didn't know the lengths of the arrays returned by sizes
and strides.

Differential Revision: [D50949929](https://our.internmc.facebook.com/intern/diff/D50949929/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112799
Approved by: https://github.com/desertfire, https://github.com/aakhundov
ghstack dependencies: #112116, #112174, #112405, #112798
2023-12-12 06:19:45 +00:00
2b323e61ad [PyTorch] AOTI: Use static_cast, not dynamic_cast (#112798)
dynamic_cast is for when we aren't certain about the type. We are certain (and will crash anyway if we're wrong).

Differential Revision: [D50812978](https://our.internmc.facebook.com/intern/diff/D50812978/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112798
Approved by: https://github.com/chenyang78, https://github.com/desertfire, https://github.com/jansel, https://github.com/khabinov
ghstack dependencies: #112116, #112174, #112405
2023-12-12 06:19:45 +00:00
ca52195112 [PyTorch] AOTI: Avoid aoti_torch_data_ptr calls for constants at inference time (#112405)
Cache aoti_torch_get_data_ptr at constants update time.

Differential Revision: [D50708982](https://our.internmc.facebook.com/intern/diff/D50708982/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112405
Approved by: https://github.com/chenyang78, https://github.com/desertfire, https://github.com/khabinov
ghstack dependencies: #112116, #112174
2023-12-12 06:19:45 +00:00
24c67fe8cf [PyTorch] AOTI: Emit static constexpr int array vars when possible (#112174)
No need to populate a stack-based array for a shape/stride array when it's statically known.

Differential Revision: [D50699889](https://our.internmc.facebook.com/intern/diff/D50699889/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112174
Approved by: https://github.com/chenyang78, https://github.com/desertfire, https://github.com/jansel
ghstack dependencies: #112116
2023-12-12 06:19:45 +00:00
ff6f987adc [PyTorch] Replace cached thread_locals with stack allocation in AOTI (#112116)
This changes cached thread_local tensors to stack-allocated buffers. Since we were incidentally caching output in a thread_local, I had to add manual thread_local caching of outputs, which I implemented by caching a buffer and a Tensor whose storage is that buffer and then just memcpying the result into the cached buffer every time. Ideally, memory planning would be able to identify allocations that are the backing storage for outputs, but this should be good enough in the absence of planning.

Differential Revision: [D50416438](https://our.internmc.facebook.com/intern/diff/D50416438/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112116
Approved by: https://github.com/jansel, https://github.com/desertfire
2023-12-12 06:19:45 +00:00
405a0040cf Adds tool to visualize sharding (#114307)
This pull request adds a tool to visualize sharding. It uses the device_mesh and placement details to construct a visualization of the split of a torch dtensor.

Things to fix:

- [x] This implementation only uses the first element of the placement tuple; when can there be more than one element?
- [x] The calculation of the split is happening here, but maybe it is already done somewhere internally in the Shard class; can we directly call that here?

Fixes #108746

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114307
Approved by: https://github.com/wanchaol
2023-12-12 06:18:03 +00:00
65651d970b Optimize the copy of Half to Float and Float to Half on CPU (#103148)
### Description
Optimize the copy of Half to Float and Float to Half on CPU.

### Testing

Single core:
shape | fp16 -> fp32 / ms | fp32 -> fp16 / ms | bf16 -> fp32 / ms | fp32 -> bf16 / ms
-- | -- | -- | -- | --
size: (1, 777) | 0.00345 | 0.00344 | 0.00411 | 0.00410
size: (2, 512) | 0.00355 | 0.00344 | 0.00431 | 0.00400
size: (10, 555) | 0.00473 | 0.00391 | 0.00562 | 0.00477
size: (1, 2048, 1024) | 0.488 | 0.480 | 0.498 | 0.499
size: (32, 100, 777) | 0.584 | 0.568 | 0.571 | 0.587

28 cores:
shape | fp16 -> fp32 / ms | fp32 -> fp16 / ms | bf16 -> fp32 / ms | fp32 -> bf16 / ms
-- | -- | -- | -- | --
size: (10, 555) |  0.00472 | 0.00369 | 0.00576 |  0.00481
size: (1, 2048, 1024) |  0.0189 | 0.0188 | 0.0173 | 0.0251
size: (64, 512, 1024) | 3.159 | 2.375 |  3.152 | 2.358
size: (32, 100, 777) | 0.0225 | 0.0195 | 0.0193 | 0.0261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103148
Approved by: https://github.com/jgong5, https://github.com/cpuhrsch
2023-12-12 05:57:52 +00:00
b6a4866330 [export][reland][refactor][3/n] Move unlift to separate file (#115558)
Reland of https://github.com/pytorch/pytorch/pull/114787

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115558
Approved by: https://github.com/zhxchen17, https://github.com/atalman
ghstack dependencies: #115556, #115557
2023-12-12 05:37:07 +00:00
36199747f3 [export][reland][refactor][2/n] Move tracing logic (#115557)
Reland of https://github.com/pytorch/pytorch/pull/114768
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115557
Approved by: https://github.com/zhxchen17
ghstack dependencies: #115556
2023-12-12 05:37:07 +00:00
dd9a989b83 [export][reland][refactor][1/n] Split dynamic shapes (#115556)
Reland of https://github.com/pytorch/pytorch/pull/114764
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115556
Approved by: https://github.com/zhxchen17
2023-12-12 05:36:41 +00:00
744d74c456 [inductor][optimus] enable smart fusion (#115471)
Summary: Enable gmm smart fusion in D51698686

Test Plan: buck test

Differential Revision: D52002137

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115471
Approved by: https://github.com/mengluy0125
2023-12-12 05:04:36 +00:00
fbb744fd49 [dtensor] enable radam foreach optimizer (#115566)
As titled, test both non-foreach and foreach optim

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115566
Approved by: https://github.com/XilunWu
ghstack dependencies: #115297, #115564, #115565
2023-12-12 03:57:00 +00:00
c322e5b5e9 [dtensor] add test for nadam optimizer (#115565)
as titled, foreach ops already supported, just add test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115565
Approved by: https://github.com/XilunWu
ghstack dependencies: #115297, #115564
2023-12-12 03:57:00 +00:00
4bd661c472 [dtensor] enable adadelta foreach optimizer (#115564)
as titled

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115564
Approved by: https://github.com/XilunWu
ghstack dependencies: #115297
2023-12-12 03:56:55 +00:00
8a27352d6b [dtensor] add a implicit replication flag (#115297)
This PR adds experimental implicit replication support for DTensor to
interoperate with torch.Tensor: under this context manager, DTensor
can work together with torch.Tensor by assuming the torch.Tensor's
sharding layout is replicated.

Note that this is risky for DTensor, so we don't turn it on by default;
but for certain cases where the tensor is for sure replicated, users can use
this to allow DTensor and torch.Tensor computation to work together (see the sketch below).
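
A minimal sketch, assuming the flag landed as the `implicit_replication` context manager under `torch.distributed._tensor.experimental` (treat the import path as an assumption):

```python
import torch
from torch.distributed._tensor import DeviceMesh, Replicate, distribute_tensor
from torch.distributed._tensor.experimental import implicit_replication  # assumed path

mesh = DeviceMesh("cuda", [0, 1])
dt = distribute_tensor(torch.ones(4, 4), mesh, placements=[Replicate()])

with implicit_replication():
    # the plain torch.Tensor operand is assumed to be replicated
    out = dt + torch.ones(4, 4)
```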

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115297
Approved by: https://github.com/awgu
2023-12-12 03:56:48 +00:00
c70f995b5c [DeviceMesh] Add mesh_dim_names to DeviceMesh __repr__ if it exists (#115579)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115579
Approved by: https://github.com/wanchaol
2023-12-12 02:18:34 +00:00
0fc04e274d [inductor] Fix an aliased output bug (#115373)
Summary: addresses the aliased-output issue reported in https://github.com/pytorch/pytorch/issues/97083.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115373
Approved by: https://github.com/jansel
2023-12-12 01:18:59 +00:00
89ee3af076 [Reland][Dynamo] Don't log compilation metrics for PyTorch unit tests (#115571)
Reland #115452, which was reverted to simplify a merge conflict with #115386

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115571
Approved by: https://github.com/yanboliang
2023-12-12 01:15:54 +00:00
064846dbc2 [cpu] flash attention optimization (#115151)
### Modifications
- **EXP**: Add a fast version with a reduced accuracy (ULP20) to vec exp `exp_u20` and use it in flash attention.
- **FUSION**: Do fusion for `softmax` ops.
- **SCALE**: Move the calculation of `scaling_factor` after `gemm`.

### Performance
_Model: Stable Diffusion V2.1_

| Version | BF16 Kernel latency (s) | BF16 speedup | FP32 Kernel latency (s) | FP32 speedup |
| ----- | ----- | ----- | ----- | ----- |
| PT | 15.865 |  | 35.362 |  |
| PT + EXP | 12.518 | 21.10% | 19.327 | 45.35% |
| PT + EXP + FUSION | 11.774 | 25.79% | 18.306 | 48.23% |
| PT + EXP + FUSION + SCALE | 11.053 | 30.33% | 18.360 | 48.08% |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115151
Approved by: https://github.com/jgong5, https://github.com/drisspg
2023-12-12 01:09:55 +00:00
0379c11248 [c10d] Enable PG NCCL monitor thread by default (#115577)
We added a monitor thread in NCCL PG in https://github.com/pytorch/pytorch/pull/112518. To summarize what we are doing in the monitor thread: it listens to the heartbeat from the watchdog thread and detects unhealthy NCCL watchdog hangs (due to several reasons such as NCCL/CUDA API bugs or unexpected blocking behaviors). This is the last resort to ensure that we don't silently let the training job run for hours.

We didn't enable this feature by default at first, since we wanted to perform more due diligence and have some customers try it out. So far, we haven't seen any obstacle to turning this feature on and have received positive feedback from users. We now turn it on by default in this PR.

If this feature turns out not to work as expected and disturbs one's training process, one can set `TORCH_NCCL_ENABLE_MONITORING=0` to disable it (see the sketch below). Please kindly file an issue with us so that we can see if we missed any corner cases during the design.
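
For reference, a minimal sketch of the opt-out described above (the env var name is taken from this PR):

```python
# Opt out of the monitor thread, e.g. while debugging a suspected false positive.
# Must be set before the NCCL process group is created.
import os
os.environ["TORCH_NCCL_ENABLE_MONITORING"] = "0"

import torch.distributed as dist
# dist.init_process_group("nccl")  # the monitor thread now stays disabled
```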

Differential Revision: [D52045911](https://our.internmc.facebook.com/intern/diff/D52045911)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115577
Approved by: https://github.com/wconstab, https://github.com/kwen2501
2023-12-12 00:45:54 +00:00
6988e40b48 [quant][fx] Lower operator.matmul in convert_fx (#113954)
Summary: We support lowering `torch.matmul` but not
`operator.matmul`. This commit adds support for the latter,
which enables lowering the shorthand `@`. This addresses
https://github.com/pytorch/pytorch/issues/111450.
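
A minimal sketch of the newly supported pattern (the wrapper module is illustrative):

```python
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

class MatMul(torch.nn.Module):
    def forward(self, x, y):
        return x @ y  # the `@` shorthand dispatches to operator.matmul

m = MatMul().eval()
example_inputs = (torch.randn(2, 3), torch.randn(3, 4))
prepared = prepare_fx(m, get_default_qconfig_mapping(), example_inputs)
converted = convert_fx(prepared)  # `@` is now lowered like torch.matmul
```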

Test Plan:
python test/test_quantization.py TestQuantizeFx

Reviewers: jerryzh168

Subscribers: jerryzh168, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113954
Approved by: https://github.com/jerryzh168
2023-12-12 00:34:58 +00:00
0a464ad1a7 [dtensor] turn back on symbolic shape in tests (#115568)
As titled: now that @jbschlosser has enabled dynamic shape support for traceable
subclasses, turn the tests back on with the default setting.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115568
Approved by: https://github.com/XilunWu
2023-12-12 00:26:23 +00:00
078773b32b [ROCm] Add owners for more HIP-specific paths (#113989)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113989
Approved by: https://github.com/jeffdaily, https://github.com/malfet
2023-12-12 00:24:38 +00:00
17de38c9af [Dynamo] Check duplication when loading dynamo tracing rules (#115059)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115059
Approved by: https://github.com/jansel
2023-12-12 00:22:20 +00:00
0692240b90 [dtensor] account for empty list when turning to OpStrategy (#115298)
Trying to fix https://github.com/pytorch/pytorch/issues/115065

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115298
Approved by: https://github.com/XilunWu
2023-12-12 00:11:16 +00:00
19c67a9db5 [dynamo] Fix a closure cell empty error (#115541)
Summary: Fixes https://github.com/pytorch/pytorch/issues/97115. The solution given by @jansel in that issue works. Checking in the code so it won't get lost.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115541
Approved by: https://github.com/jansel
2023-12-12 00:01:51 +00:00
617c228fba [CI] Lower the smoketest speedup threshold for nangpt (#115562)
Summary:
https://github.com/pytorch/pytorch/actions/runs/7158691360/job/19491437314
shows the variance can be larger than previously expected. Lowering it
for now and if it continues to be a problem, we should switch to some
other more stable model.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115562
Approved by: https://github.com/chenyang78
2023-12-11 23:46:30 +00:00
4471fe6c39 [sparse][semi-structured] add alg_id to _cslt_sparse_mm and _cslt_sparse_mm_search (#115178)
Summary:

cuSPARSELt supports different algorithm ids (alg_id), which are set via
`cusparseLTMatmulAlgSetAttribute`; in total there are 4 different
alg_ids, 0 - 3.

Previously we were just using the default alg_id, since our initial
experiments found that for most shapes the default alg_id is the
fastest, and that the choice made no difference to numerical correctness,
only performance. From those experiments the fastest alg_id seemed to
differ only on small matmul shapes.

@danthe3rd found a performance regression when running with
cuSPARSELt v0.4.0 vs v0.5.0 on LLM shapes, which match these
characteristics (activations are small, weights are large).

However, it's likely that this is due to the alg_id ordering changing, as
mentioned in the release notes for v0.5.0:
```
cusparseLtMatmulAlgSelectionInit() does not ensure the same ordering of
algorithm id alg as in v0.4.0.
```

This PR adds the following:
- support for passing an alg_id to _cslt_sparse_mm
- a new op, _cslt_sparse_mm_search, which returns the optimal alg_id for
  a given matmul

_cslt_sparse_mm_search has the same function signature as
_cslt_sparse_mm, minus the alg_id parameter.
We are able to achieve v0.4.0 performance with alg_id=1 on the shapes
that @danthe3rd provided.

We will address autoselecting the best alg_id in a future PR, possibly
with torch.compile. A hedged sketch of the resulting flow follows below.
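
A hedged sketch of the search-then-run flow (these are private ops; the `alg_id` keyword follows the PR description and is not a stable API):

```python
import torch

# A must already be 2:4 semi-structured sparse for cuSPARSELt to compress it.
A = torch.tensor([[1, 0, 2, 0]], dtype=torch.float16, device="cuda").tile(64, 32)
B = torch.randn(128, 128, dtype=torch.float16, device="cuda")
A_compressed = torch._cslt_compress(A)

alg_id = torch._cslt_sparse_mm_search(A_compressed, B)  # probe for the fastest alg_id
out = torch._cslt_sparse_mm(A_compressed, B, alg_id=alg_id)
```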

Test Plan:
```
python test/test_sparse_semi_structured.py -k cslt
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115178
Approved by: https://github.com/cpuhrsch
2023-12-11 23:08:51 +00:00
8b28380c8e [MPS] Fix sum and prod for complex types (#115554)
By not force-casting dtype to float

Test plan: `python -c "import torch;print(torch.linspace(-3.0, 3.0, 50, dtype=torch.cfloat, device='mps').sqrt().sin().sum())"`

Before:
```
tensor(21.1778+0.j, device='mps:0')
```
After
```
tensor(21.1778+39.1377j, device='mps:0')
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115554
Approved by: https://github.com/lezcano
ghstack dependencies: #115512, #115513
2023-12-11 23:03:44 +00:00
a4bb4a2373 [MPS] Add support for MPSDataTypeComplexFloat[16|32] (#115513)
But limit it to macOS Sonoma+.

Previously, calling `torch.cat` with complex types failed; now it works.
Before:
```
% python -c "import torch;print(torch.cat([torch.rand(3, 3, dtype=torch.cfloat).to('mps'), torch.rand(3, 3, dtype=torch.cfloat).to('mps')]))"
TypeError: Trying to convert ComplexFloat to the MPS backend but it does not have support for that dtype.
```
After:
```
% python -c "import torch;print(torch.cat([torch.rand(3, 3, dtype=torch.cfloat).to('mps'), torch.rand(3, 3, dtype=torch.cfloat).to('mps')]))"
tensor([[0.4857+0.0030j, 0.9375+0.8630j, 0.3544+0.9911j],
        [0.5293+0.8652j, 0.8440+0.1991j, 0.5152+0.8276j],
        [0.0136+0.7469j, 0.1403+0.4761j, 0.2943+0.0896j],
        [0.6458+0.0035j, 0.3579+0.4577j, 0.1723+0.1508j],
        [0.4420+0.3554j, 0.4396+0.7272j, 0.2479+0.1191j],
        [0.3895+0.2292j, 0.7886+0.1613j, 0.9243+0.4180j]], device='mps:0')
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115513
Approved by: https://github.com/kulinseth
ghstack dependencies: #115512
2023-12-11 23:03:44 +00:00
288822c968 Increase ROCm test shards to 6 (#110997)
To reduce signal time

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110997
Approved by: https://github.com/huydhn, https://github.com/malfet
2023-12-11 22:30:16 +00:00
4307ccde99 Move ONNX's TorchModelType to pytorch_test_common to fix circ. dep. (#115353)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115353
Approved by: https://github.com/BowenBao
2023-12-11 22:23:03 +00:00
suo
ccd5bde6a3 [export] Reintroduce InterpreterModule to unflatten (#115436)
InterpreterModule is better than GraphModule codegen; it's more debuggable and
has better stack traces. The only reason we don't use it today is because
torch.compile doesn't work with it.

I work around this by constructing a GraphModule separately for usage during
dynamo tracing, but otherwise using torch.fx.Interpreter.

Differential Revision: [D51971661](https://our.internmc.facebook.com/intern/diff/D51971661/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115436
Approved by: https://github.com/zhxchen17
ghstack dependencies: #115408
2023-12-11 22:15:32 +00:00
suo
c137335b5c [export] make UnflattenedModule not inherit from GraphModule (#115408)
UnflattenedModule doesn't really behave like a graph module; we customize `__call__` to do something completely different from what GraphModule does. So, things that test `isinstance(unflattened_module, GraphModule)` and then operate on the GraphModule are often broken.

This change makes UnflattenedModule its own thing.

Differential Revision: [D51959097](https://our.internmc.facebook.com/intern/diff/D51959097/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115408
Approved by: https://github.com/zhxchen17
2023-12-11 22:15:21 +00:00
8c1567d021 [c10d] Change watchdog inner loop function name to make it more accurate (#115404)
The name `workCleanupLoop` does not reflect all the things we do in the watchdog thread, so we propose a new name here that reflects what the watchdog thread is actually doing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115404
Approved by: https://github.com/kwen2501, https://github.com/wconstab
2023-12-11 22:00:06 +00:00
99f06c0cc2 [BE] update errors to be more descriptive (#115443)
we call `_check_single_tensor` and `_check_tensor_list` as validation but don't print out the param types that were invalid

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115443
Approved by: https://github.com/XilunWu
2023-12-11 21:21:10 +00:00
b706c4116d [MPS] Add MacOS 14 runtime check (#115512)
Prerequisite for adding more complex type support and FFT operation

Check using `conjugateWithTensor:name:` selector defined as follows
```objc
/// Returns the complex conjugate of the input tensor elements.
///
/// - Parameters:
///   - tensor: The input tensor.
///   - name: An optional string which serves as an identifier for the operation..
/// - Returns: A valid `MPSGraphTensor` object containing the elementwise result of the applied operation.
-(MPSGraphTensor *) conjugateWithTensor:(MPSGraphTensor *) tensor
                                   name:(NSString * _Nullable) name
MPS_AVAILABLE_STARTING(macos(14.0), ios(17.0), tvos(17.0))
MPS_SWIFT_NAME( conjugate(tensor:name:) );
```

- Rename the `isOnMacOS13orNewer(unsigned minor)` hook to `isOnMacOSorNewer(major, minor)`
- Replace `torch._C.__mps_is_on_macos_13_or_newer` with `torch._C._mps_is_on_macos_or_newer`
- Add the `torch.backends.mps.is_macos_or_newer` public API (see the usage sketch below)
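
A usage sketch of the new public API (assuming it takes `(major, minor)` like the renamed C++ hook):

```python
import torch

# Gate complex-dtype MPS work on the new runtime check.
if torch.backends.mps.is_available() and torch.backends.mps.is_macos_or_newer(14, 0):
    x = torch.rand(3, 3, dtype=torch.cfloat, device="mps")
    print(x.conj())
```
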
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115512
Approved by: https://github.com/albanD
2023-12-11 21:11:42 +00:00
03ff44c958 [c10d] Fix Store check condition in NCCL PG watchdog (#115475)
In https://github.com/pytorch/pytorch/pull/115449/, after turning on `DUMP_ON_TIMEOUT=1`, some existing tests somehow failed. Upon checking, the failure is caused by the TCPStore check call within the watchdog thread.

1. It's not because TCPStore creation has not completed: even if we make the test sleep for a long time, it still fails. Rather, it's because we query the TCPStore after we have shut down the PG.

2. The reason for that is: the `std::chrono::steady_clock::now()` function in C++ returns a `time_point` object representing the current point in time according to the steady clock. The default unit of this time_point is not directly specified in terms of seconds or nanoseconds; rather, it depends on the internal representation of the steady clock, which can vary between implementations. In reality it's nanoseconds, which makes the delta so big that we check the store every time the watchdog thread wakes up. To make things even worse, `terminateProcessGroup_` might be set to `true` after the outermost while-loop check but before the TCPStore check, so the watchdog gets stuck checking a TCPStore which has already been deleted, while the main thread is still waiting for the watchdog to join.

The solution here is:
1. Add back `std::chrono::duration_cast` to ensure the delta is indeed in milliseconds, so that the timeout check logic works as expected.
2. Check `terminateProcessGroup_` as well, so that we don't do any dump when the main thread has already marked the process as exited.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115475
Approved by: https://github.com/wconstab
2023-12-11 21:06:05 +00:00
ccc9e5f5bc Optimize conv2d pw quantized (#115221)
Summary:
In order to get better performance for pointwise (pw) conv2d, it's better to read the input together in a batch.

With this optimization on CUNET-enc ops:

Kernel Name              Workgroup Size         Duration P50 (ns)
===========              ==============         =================
vulkan.quantized_conv2d_pw_2x2{96, 72, 2}                       891332
vulkan.quantized_conv2d_pw_2x2{48, 36, 4}                       528528
vulkan.quantized_conv2d_pw_2x2{24, 18, 8}                       557336

Without this optimization:
Kernel Name              Workgroup Size         Duration P50 (ns)
===========              ==============         =================
vulkan.quantized_conv2d_pw_2x2{96, 72, 2}                      1633268
vulkan.quantized_conv2d_pw_2x2{48, 36, 4}                      1177228
vulkan.quantized_conv2d_pw_2x2{24, 18, 8}                      1343264

Test Plan:
Ensure all vulkan quantize tests pass:
buck2 run --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_quantized_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1 --show-output
Running main() from third-party/googletest/1.11.0/googletest/googletest/src/gtest_main.cc
[==========] Running 78 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 78 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.uniform_buffer_copy
...
[----------] Global test environment tear-down
[==========] 78 tests from 1 test suite ran. (1519 ms total)
[  PASSED  ] 78 tests.

buck2 run --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1 --show-output

Running main() from third-party/googletest/1.11.0/googletest/googletest/src/gtest_main.cc
[==========] Running 395 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 395 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.zero_size_tensor
[       OK ] VulkanAPITest.zero_size_tensor (83 ms)
...
xplat/caffe2/aten/src/ATen/test/vulkan_api_test.cpp:7593: Skipped
QueryPool is not available
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log (0 ms)
[----------] 395 tests from VulkanAPITest (6515 ms total)

[----------] Global test environment tear-down
[==========] 395 tests from 1 test suite ran. (6515 ms total)
[  PASSED  ] 394 tests.
[  SKIPPED ] 1 test, listed below:
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log

  YOU HAVE 5 DISABLED TESTS

Reviewed By: yipjustin

Differential Revision: D50997530

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115221
Approved by: https://github.com/yipjustin
2023-12-11 20:59:15 +00:00
585aea6e77 [xla hash update] update the pinned xla hash (#115528)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned xla hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115528
Approved by: https://github.com/clee2000
2023-12-11 20:22:46 +00:00
505574c46a Add decomposition for torch.block_diag (#115096)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115096
Approved by: https://github.com/peterbell10
2023-12-11 20:04:22 +00:00
5fe2b138e3 Revert "[inductor] Fix an aliased output bug (#115373)"
This reverts commit 1310f0bf38293b68a781287d1de8cf699a76974d.

Reverted https://github.com/pytorch/pytorch/pull/115373 on behalf of https://github.com/atalman due to Sorry for reverting your change it broke inductor tests ([comment](https://github.com/pytorch/pytorch/pull/115373#issuecomment-1850792869))
2023-12-11 20:02:15 +00:00
c52b78ebc2 [ez] Remove some args from run_test.py (#115459)
Don't think anyone uses these
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115459
Approved by: https://github.com/malfet, https://github.com/huydhn
2023-12-11 19:56:37 +00:00
b5578cb08b [ez] Remove unittest retries (#115460)
Pytest is now used in CI for reruns, and I doubt people are using the env vars when running locally. IMO, removing this code makes the run function easier to read.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115460
Approved by: https://github.com/malfet, https://github.com/huydhn
2023-12-11 19:46:09 +00:00
5c0976fa04 Revert "[dynamo] guarded config (#111299)" (#115386)
This reverts commit 5927e9cbf2ac18aaaaecaab02258b7a35ac10969.

Differential Revision: [D51959266](https://our.internmc.facebook.com/intern/diff/D51959266)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115386
Approved by: https://github.com/yanboliang, https://github.com/malfet
ghstack dependencies: #115384, #115401, #115385
2023-12-11 19:35:42 +00:00
6db7b30db4 Revert "[dynamo] Cache size calc for differing config (#111300)" (#115385)
This reverts commit 78318d024989cf86e1ede424997cd42d2d291694.

Differential Revision: [D51959268](https://our.internmc.facebook.com/intern/diff/D51959268)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115385
Approved by: https://github.com/malfet
ghstack dependencies: #115384, #115401
2023-12-11 19:35:42 +00:00
f06f51b152 Revert "[Dynamo] Don't log compilation metrics for PyTorch unit tests (#115452)"
This reverts commit cd444aa075dd1e9c5d85cf3fbca9e078c74a7580.

Reverted https://github.com/pytorch/pytorch/pull/115452 on behalf of https://github.com/davidberard98 due to Merge conflict with #115385, which already landed in fbcode ([comment](https://github.com/pytorch/pytorch/pull/115452#issuecomment-1850729965))
2023-12-11 19:21:40 +00:00
f5f6618813 [executorch hash update] update the pinned executorch hash (#115311)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned executorch hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115311
Approved by: https://github.com/pytorchbot
2023-12-11 18:31:44 +00:00
40a14e07ef Revert "[sparse][semi-structured] add alg_id to _cslt_sparse_mm and _cslt_sparse_mm_search (#115178)"
This reverts commit 1e5636f7915035b09dce22ad1d2170a65f344214.

Reverted https://github.com/pytorch/pytorch/pull/115178 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but the Window build failure looks legit 1e5636f791 ([comment](https://github.com/pytorch/pytorch/pull/115178#issuecomment-1850605711))
2023-12-11 18:07:17 +00:00
5f41fc7619 [c10d] Change NCCL PG watchdog error msg and test comments (#115403)
Address the nit comments in https://github.com/pytorch/pytorch/pull/115226/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115403
Approved by: https://github.com/wconstab
ghstack dependencies: #115226
2023-12-11 17:55:28 +00:00
794545c11f [BE]: Enable RUF015 codebase wide (#115507)
Constant-time access of the first value in a collection. `next(iter(...))` is a constant-time operation, instead of converting the collection to a list just to get the first item, which is linear (see the sketch below). The rule is turned on, which automatically autofixes and enforces this.
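
For illustration, the before/after pattern the rule enforces:

```python
items = {"apple", "banana", "cherry"}

first = list(items)[0]     # O(n): copies the whole collection to read one element
first = next(iter(items))  # O(1): the form RUF015 rewrites it to
```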

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115507
Approved by: https://github.com/malfet
2023-12-11 15:51:01 +00:00
1e5636f791 [sparse][semi-structured] add alg_id to _cslt_sparse_mm and _cslt_sparse_mm_search (#115178)
Summary:

cuSPARSELt supports different algorithm ids (alg_id), which are set via
`cusparseLTMatmulAlgSetAttribute`; in total there are 4 different
alg_ids, 0 - 3.

Previously we were just using the default alg_id, since our initial
experiments found that for most shapes the default alg_id is the
fastest, and that the choice made no difference to numerical correctness,
only performance. From those experiments the fastest alg_id seemed to
differ only on small matmul shapes.

@danthe3rd found a performance regression when running with
cuSPARSELt v0.4.0 vs v0.5.0 on LLM shapes, which match these
characteristics (activations are small, weights are large).

However, it's likely that this is due to the alg_id ordering changing, as
mentioned in the release notes for v0.5.0:
```
cusparseLtMatmulAlgSelectionInit() does not ensure the same ordering of
algorithm id alg as in v0.4.0.
```

This PR adds the following:
- support for passing an alg_id to _cslt_sparse_mm
- a new op, _cslt_sparse_mm_search, which returns the optimal alg_id for
  a given matmul

_cslt_sparse_mm_search has the same function signature as
_cslt_sparse_mm, minus the alg_id parameter.
We are able to achieve v0.4.0 performance with alg_id=1 on the shapes
that @danthe3rd provided.

We will address autoselecting the best alg_id in a future PR, possibly
with torch.compile.

Test Plan:
```
python test/test_sparse_semi_structured.py -k cslt
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115178
Approved by: https://github.com/cpuhrsch
2023-12-11 15:47:28 +00:00
b88be1686d Revert "[export][refactor][1/n] Move dynamic shapes logic (#114764)" (#115508)
GitHub first oncall.
This reverts commit 53bf8cfcf9c966096e829247380462d0a3a61e8d.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115508
Approved by: https://github.com/malfet, https://github.com/angelayi
2023-12-11 14:54:51 +00:00
f017a1af3f [MPS] add complex_out to MPS backend (#110851)
Adds support for at::complex_out to the MPS backend

Implemented in a binary kernel using the view_as_real pattern for handling complex dtypes in the mps backend.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110851
Approved by: https://github.com/kulinseth
2023-12-11 13:37:55 +00:00
de89a53df8 [benchmarking] Reduce box_detections_per_img for vision_maskrcnn (#115487)
This fixes a failure on the [perf dashboard](https://hud.pytorch.org/benchmark/compilers) with `--amp` mode.  I believe boxes 5 and 6 were getting swapped.  The existing comment explains the issue.

Before
```
$ ./benchmarks/dynamo/torchbench.py --training --accuracy --no-translation-validation --amp --backend=inductor --disable-cudagraphs --only vision_maskrcnn
...
[2023-12-09 13:21:27,292] torch._dynamo.utils: [ERROR] RMSE (res-fp64): 0.00171, (ref-fp64): 0.00054 and shape=torch.Size([256, 256, 3, 3])
[2023-12-09 13:21:27,292] torch._dynamo.utils: [ERROR] Accuracy failed for key name backbone.fpn.layer_blocks.2.0.weight.grad
fail_accuracy
```

After
```
$ ./benchmarks/dynamo/torchbench.py --training --accuracy --no-translation-validation --amp --backend=inductor --disable-cudagraphs --only vision_maskrcnn
...
pass
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115487
Approved by: https://github.com/yanboliang
2023-12-11 08:42:25 +00:00
274fdc81f8 [Dynamo][6.3/N] Further cleanup torch.py (#114669)
A follow-up PR to clean up what I found during the refactor of torch.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114669
Approved by: https://github.com/jansel
2023-12-11 07:16:03 +00:00
fe01605830 [aotinductor] replace lld with the default ld linker (#115478)
Currently, we place constants in the .so. To avoid cases
where constants are too large (i.e. >2G), we put the
constants into .lrodata, which doesn't have the 2G limit.
Not sure why, but lld still issues errors like below even if
those large constant data are stored in the .lrodata section:

"relocation R_X86_64_PC32 out of range: 5459191920 is not in
[-2147483648, 2147483647]"

In contrast, the default GNU ld linker works fine. Let's
switch back to using ld to unblock some internal models.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115478
Approved by: https://github.com/desertfire, https://github.com/htyu
2023-12-11 02:35:26 +00:00
1310f0bf38 [inductor] Fix an aliased output bug (#115373)
Summary: addresses the aliased-output issue reported in https://github.com/pytorch/pytorch/issues/97083.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115373
Approved by: https://github.com/jansel
2023-12-10 23:52:39 +00:00
2e6b809d6b [AOTI] Fix a missing declaration for the result of item() (#115175)
Differential Revision: [D51968539](https://our.internmc.facebook.com/intern/diff/D51968539)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115175
Approved by: https://github.com/chenyang78
2023-12-10 22:49:45 +00:00
9b3cb1c66c Fix environment condition for docker-release.yml
As those are run on nightlies and release tags, the environment should be set accordingly.

Also simplify `WITH_PUSH` condition.

Should fix https://github.com/pytorch/pytorch/actions/runs/7156407285/job/19494049140
2023-12-10 14:09:39 -08:00
38f890341d Implement pass-through state_dict and load_state_dict for dynamo OptimizedModule (#113423)
Fixes #113422
Fixes #94575

This is now possible:
```py
model = Model()
compiled_model = torch.compile(model)

model.load_state_dict(compiled_model.state_dict())  # previously key mismatch!
```

This also makes it much easier to checkpoint and load models that were wrapped like so:
```py
FSDP(torch.compile(model))
# or
DDP(torch.compile(model))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113423
Approved by: https://github.com/msaroufim
2023-12-10 22:09:19 +00:00
26266c9718 [CI] Call torch.cuda.empty_cache to release device memory (#114663)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114663
Approved by: https://github.com/eellison
2023-12-10 21:27:42 +00:00
694cc6af56 [benchmarks] Fix NameError: name 'args' is not defined (#115494)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115494
Approved by: https://github.com/Skylion007, https://github.com/desertfire
2023-12-10 21:22:21 +00:00
21a1d31ed8 [caffe2] update Meta-internal googletest references (#115407)
Summary: Update test dependencies to point to the new internal googletest location.

Test Plan: CI

Differential Revision: D51951643

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115407
Approved by: https://github.com/cccclai
2023-12-10 20:37:13 +00:00
24a463c46c Revert "[export][refactor][2/n] Move tracing logic (#114768)" (#115503)
Github first oncall.
This reverts commit 0ab57ee7eab5391289d30e8c49fceee3f503f539.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115503
Approved by: https://github.com/angelayi, https://github.com/kit1980
2023-12-10 19:30:15 +00:00
b4ef59f740 Revert "[dynamo] remove unused OptimizeCtx field - export (#113901)" (#115401)
This reverts commit b62230a685666e8c2b8a5cb31b16352d286bcf9f.

Differential Revision: [D52001024](https://our.internmc.facebook.com/intern/diff/D52001024)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115401
Approved by: https://github.com/malfet
ghstack dependencies: #115384
2023-12-10 18:17:24 +00:00
b36fc6790e Revert "[dynamo] Guard on HAS_GRAPH_BREAKS if graph breaks are present (i.e. cache miss if compiled object requires nopython) (#114073)" (#115384)
This reverts commit 0bb29f945079ac4c83d674f7b3ff755cfb5396cf.

Differential Revision: [D51959267](https://our.internmc.facebook.com/intern/diff/D51959267)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115384
Approved by: https://github.com/malfet
2023-12-10 18:16:02 +00:00
6c1e75e646 Revert "[HigherOrderOp] make MapHigherOrder create map_impl call_function node instead of map (#115205)"
This reverts commit 8b747358783d2411afe1136dcc9da95c01bfbdaa.

Reverted https://github.com/pytorch/pytorch/pull/115205 on behalf of https://github.com/atalman due to ghfirst broke internal tests ([comment](https://github.com/pytorch/pytorch/pull/115205#issuecomment-1848995376))
2023-12-10 15:25:55 +00:00
100c466bff [CI][Inductor] Skip CPU tests when running on GPU (#115430)
This just follows the standard practice for CI: when one specifies `PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda`, only tests targeting that device should be run

Do it by refactoring part of `instantiate_device_type_tests` into `get_desired_device_type_test_bases` and using it from test_torchinductor.py to skip CPU tests

Fixes https://github.com/pytorch/pytorch/issues/115423

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115430
Approved by: https://github.com/seemethere
2023-12-10 15:21:24 +00:00
08d63a75a4 Revert "[HigherOrderOp] Remove additional get item calls in MapHigherOrder. (#115207)"
This reverts commit dd6ae6d3b473906d32fcb8a319895e31b039f224.

Reverted https://github.com/pytorch/pytorch/pull/115207 on behalf of https://github.com/atalman due to ghfirst broke internal tests ([comment](https://github.com/pytorch/pytorch/pull/115207#issuecomment-1848991919))
2023-12-10 15:12:12 +00:00
fbeca60b1f Remove replace_all and make VTs mutable (#113725)
1.  Removes calls to `replace_all` and `clone` and makes VTs mutable.
2. Properly handles Tuple Iterator mutation. Previously TupleIterator variables would only be properly reconstructed if they were advanced at least once in a frame. On calls to `next`, the source information would be lost (due to constructing a new iterator without using builder), which would ensure that during codegen the variable would be reconstructed from scratch. Now that VTs are mutated, the source is never lost, so we need to properly track mutation and handle it by replaying calls to `next` at the end of the modified bytecode.
3. Added test for checking iadd side effects, this was missing in our unit test coverage.
4. Fixed two incorrect sources: DelayGraphBreakVariable and UserMethodVariable both relied on setting the source to AttrSource(parent, name) at the callsite of `var_getattr`.
5. Fixed a bug in inplace adding for lists: it would set the resulting VariableTracker's source to `None`, which would utilize a different reconstruct path in codegen. Now this is handled explicitly by reconstructing vars when allow_cache=`False`, so that during side-effect replay the mutated var is correctly updated.

In subsequent PRs:
* Refactoring side effect tracking to be significantly simpler (I think we only need an `is_modified` flag)
* Refactor `next_variables` iterator to match the signature of `next`
* Remove all references to `options` in the code
* Refactor VTs representing mutable collections to implement their own mutation update handling
* Remove clone and/or make it specific to lists for creating slices
* Add mutation tracking/replay for sets
* Add mutation tracking/replay for iter.py
* Removing setting source in builder (it's set at the top level after a var is returned)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113725
Approved by: https://github.com/jansel
2023-12-10 09:31:21 +00:00
f71d931b32 [Dynamo][6.2/N] Dump the in graph function list(~2600 ops) and add unit tests. (#114196)
This is the second PR; please check https://github.com/pytorch/pytorch/pull/113009#issuecomment-1804417925 for more details

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114196
Approved by: https://github.com/jansel
2023-12-10 06:41:51 +00:00
4eb5838e18 Revert "Enable builtin tests for ONNX Export with ExportedProgram models (#114762)"
This reverts commit 13d2e3eba79000028291f4739a6e9c937dbe4264.

Reverted https://github.com/pytorch/pytorch/pull/114762 on behalf of https://github.com/huydhn due to Sorry for reverting your change but ONNX test is failing from this commit 13d2e3eba7 ([comment](https://github.com/pytorch/pytorch/pull/114762#issuecomment-1848831147))
2023-12-10 01:55:47 +00:00
2ee240d14a Revert "Move ONNX's TorchModelType to pytorch_test_common to fix circ. dep. (#115353)"
This reverts commit 960ad9d94e365c758b19298b45bcba5225b79e0c.

Reverted https://github.com/pytorch/pytorch/pull/115353 on behalf of https://github.com/huydhn due to Sorry for reverting your change but ONNX test is failing from the commit below in the stack 13d2e3eba7 ([comment](https://github.com/pytorch/pytorch/pull/115353#issuecomment-1848830883))
2023-12-10 01:53:50 +00:00
4490d4692b [doc] Rewrite benchmarks/dynamo/README.md (#115485)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115485
Approved by: https://github.com/yanboliang
2023-12-10 00:37:53 +00:00
8ddc549c0f [BE][JIT] Do not wrap shared_ptr with optional (#115473)
While reviewing https://github.com/pytorch/pytorch/pull/115381 I noticed that `torch::jit::GraphFunction::optimized_graph_` is an `std::array<c10::optional<std::shared_ptr<Graph>>, N>`, which feels excessive, as `shared_ptr` is already nullable and has `operator bool()`. Looking at https://github.com/pytorch/pytorch/pull/26488, which introduced the change, also does not hint that this indirection is necessary.

Test plan: CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115473
Approved by: https://github.com/davidberard98, https://github.com/Skylion007
2023-12-09 20:43:40 +00:00
641ec2115f [AOTI] move model runner into a library (#115220)
Summary: So that we can import it in fbcode and do AOTI runs in a Python environment.

Test Plan: existed AOTI tests

Reviewed By: chenyang78

Differential Revision: D51780021

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115220
Approved by: https://github.com/desertfire
2023-12-09 19:03:32 +00:00
c039f01bd9 Increased hardcoded limit for number of GPUs. (#115368)
Fixes #115331.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115368
Approved by: https://github.com/albanD
2023-12-09 18:10:51 +00:00
cyy
99f222372b [5/N] Fixes clang-tidy warnings in c10/{core,util}/*.h (#115354)
This PR continues to fix clang-tidy warnings for headers in c10/core and c10/util.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115354
Approved by: https://github.com/Skylion007
2023-12-09 17:16:04 +00:00
937d616e82 Re-enable type checking for distributed_c10d.py (#115223)
Re-enable type checking for distributed_c10d.py

Type checking for distributed_c10d.py was inadvertently turned off, and type errors have accumulated since. This re-enables it and addresses those errors.

Note: the backwards-compatibility linter does not like some of these changes, but they were incorrect before. This needs human verification, however.

#suppress-api-compatibility-check

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115223
Approved by: https://github.com/wconstab
2023-12-09 11:07:54 +00:00
485ea9a70a [DTensor] Add DTensor experimental op for LayerNorm backward sharding rule propagation (#115398)
Summary: This diff is only a prototype to unblock the TP work. The PyTorch distributed team is working on a more generic backward op for `aten.layer_norm`. We will remove this op from the experimental file once it is ready.

Test Plan:
**Local Test**:
Accuracy:
- Dtensor + Checkpoint: first run loss: P884569822 (on-par with baseline: P884213363)
- 2nd by loading saved checkpoint: P884583429 (on-par with baseline: P884271869)

Trace:
- Collective functions are inserted automatically.
- Example: https://fburl.com/perfdoctor/l567ww1x

**MAST Test**:
With: trainer = 128, batch_size=512
- NE on-par:
(see: 4441_ep_bs512_2fsdp_tp_sp_dtensor)

Differential Revision: D51490868

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115398
Approved by: https://github.com/wanchaol
2023-12-09 09:38:56 +00:00
eb3aa424ce [Reland][Dynamo] Added support for math.radians on ints with dynamic shapes (#115477)
Reland #114507

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115477
Approved by: https://github.com/larryliu0820
2023-12-09 08:58:18 +00:00
960ad9d94e Move ONNX's TorchModelType to pytorch_test_common to fix circ. dep. (#115353)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115353
Approved by: https://github.com/BowenBao
ghstack dependencies: #114407, #115281, #114762
2023-12-09 07:47:03 +00:00
13d2e3eba7 Enable builtin tests for ONNX Export with ExportedProgram models (#114762)
Fixed by https://github.com/pytorch/pytorch/pull/113982
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114762
Approved by: https://github.com/BowenBao
ghstack dependencies: #114407, #115281
2023-12-09 07:46:43 +00:00
7e941a932b Store user model to simplify ONNXProgram.{adapt_torch_*,__call__} APIs (#115281)
Currently (after https://github.com/pytorch/pytorch/pull/114407), the user must pass the original user ``model`` to APIs such as ``ONNXProgram.__call__``, ``ONNXProgram.adapt_torch_inputs_to_onnx`` and ``ONNXProgram.adapt_torch_outputs_to_onnx``.

This was needed because when the model is fakefied, a version of the non-fakefied model is needed so that the initializers, buffers and constants can be extracted from a real model (and used as input to the ONNX model). That approach brings an unnecessary usability burden to the user when the model is not fakefied, because the model that was already passed to ``torch.onnx.dynamo_export`` could be used to extract the ``state_dict``.

This PR adds an ``ONNXProgram._model_torch`` attribute to store the user model and demotes the ``model`` argument of the aforementioned APIs from required to optional.

As a result, for the fakefied-model scenario the user still needs to pass the model, but for non-fakefied models the persisted model is implicitly used to extract the model ``state_dict``, making the API easier to use (see the sketch below).
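
A hedged sketch of the simplified non-fakefied path (the model is illustrative):

```python
import torch

class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

model, x = MLP(), torch.randn(1, 4)
onnx_program = torch.onnx.dynamo_export(model, x)
out = onnx_program(x)  # no need to re-pass `model`; its state_dict is persisted
```
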
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115281
Approved by: https://github.com/BowenBao
ghstack dependencies: #114407
2023-12-09 07:46:12 +00:00
da341d0d48 [Dynamo][6.1/N] Refactor out TorchInGraphFunctionVariable and improve heuristic (#113432)
This is split from #113009; please check https://github.com/pytorch/pytorch/pull/113009#issuecomment-1804417925 for more details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113432
Approved by: https://github.com/ezyang, https://github.com/jansel
2023-12-09 05:11:44 +00:00
1c1f2bbe8a Add a space in the error message (#115465)
Summary:
As title says

Created from CodeHub with https://fburl.com/edit-in-codehub

Test Plan:
waitforsandcastle

Sandcastle run

Reviewed By: eeggl

Differential Revision: D52000286

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115465
Approved by: https://github.com/kwen2501
2023-12-09 04:35:51 +00:00
3ebf9acea1 [Triton] Replace triton.runtime.jit.get_cuda_stream with torch.cuda.c… (#115397)
triton.runtime.jit.get_cuda_stream was removed in https://github.com/openai/triton/pull/2756

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115397
Approved by: https://github.com/jansel
2023-12-09 04:30:42 +00:00
cyy
516bd4a72c [1/N] Use std::in_place (#115170)
It is time to gradually replace c10::in_place with std::in_place.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115170
Approved by: https://github.com/colesbury
2023-12-09 03:52:39 +00:00
2ed47fecc5 Robustify torch.multiprocessing.spawn error reporting to be less deadlock prone (#114688)
multiprocessing.Queue relies on, among other things, background threads to send messages between processes. This works in the happy path but can cause issues if a process exits by bypassing atexit handlers or crashes, because the writer to the Queue can terminate while the reader is blocked reading the queue. The reader sees the queue as non-empty, yet even with a timeout it will actually block forever.

An example of a Queue deadlock is here: https://gist.github.com/chipturner/342f72341f087737befe9df84d0e41ce

Since the error-reporting case here is a simple one-shot message from the dying child to the parent, we can just use a file-based rendezvous (sketched below). This eliminates the deadlock when a large traceback is still being flushed to the network as a child exits.
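
An illustrative sketch of the file-based handoff idea (hypothetical names; not the actual implementation):

```python
import pickle
import traceback

def child_entry(fn, error_file: str):
    """Run `fn`; on failure, persist the traceback to a file the parent reads."""
    try:
        fn()
    except Exception:
        with open(error_file, "wb") as fh:
            pickle.dump(traceback.format_exc(), fh)  # one-shot write, no queue threads
        raise  # let the child die; the parent reads error_file after it exits
```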

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114688
Approved by: https://github.com/suo, https://github.com/yifuwang
2023-12-09 03:36:43 +00:00
2962271f58 [ONNX][dynamo_export] Extend expected fx output types for int, float, bool (#115431)
Fixes exporting ops, such as `aten::_scaled_dot_product_flash_attention` that returns int, float, bool typed outputs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115431
Approved by: https://github.com/titaiwangms, https://github.com/thiagocrepaldi
2023-12-09 03:24:48 +00:00
41b1919208 [nested_tensor]Python subclass NT overhead improvement (2/n): avoid getting from WeakTensorKeyDictionary twice during __init__ (#115450)
Summary:
Most NT operations end with creating a new NestedTensor, which is time-consuming; this tries to reduce overhead during NestedTensor creation.

The ops return a new NestedTensor with the same offsets, so `tensor not in _tensor_symint_registry` would be false in most cases. The `in` (`__contains__`) check takes ~8 us; if we use `get` directly, we save a few microseconds for most NT operations (see the sketch below).
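
A sketch of the micro-optimization (`make_symint` is a hypothetical stand-in for the real factory):

```python
from torch.utils.weak import WeakTensorKeyDictionary

_tensor_symint_registry = WeakTensorKeyDictionary()

def get_tensor_symint(tensor, make_symint):
    # Before: `if tensor not in registry` cost an extra ~8 us __contains__ lookup.
    # After: a single .get() covers the common hit path.
    symint = _tensor_symint_registry.get(tensor)
    if symint is None:
        symint = _tensor_symint_registry[tensor] = make_symint()
    return symint
```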

Test Plan:
Before:
get_tensor_symint take 15us
https://pxl.cl/3XF83
After
get_tensor_symint take 10us
https://pxl.cl/3XFc9

Differential Revision: D51992836

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115450
Approved by: https://github.com/soulitzer
2023-12-09 03:12:31 +00:00
d40a7c6026 Add decompositions for replication_pad (#115113)
Fixes #115395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115113
Approved by: https://github.com/peterbell10
2023-12-09 02:44:07 +00:00
d7705f325d Patch --save-xml when TEST_IN_SUBPROCESS (#115463)
Patch `--save-xml` when `TEST_IN_SUBPROCESS`

When `--save-xml` is given as a unit-test argument and the test is handled by a `TEST_IN_SUBPROCESS` handler (e.g., `run_test_with_subprocess` for `distributed/test_c10d_nccl`), the `--save-xml` args were first "consumed" by the argparser in `common_utils.py`. When a following subprocess in this `if TEST_IN_SUBPROCESS:` section starts, there are no `--save-xml` args left, leaving `args.save_xml` as `None`.

Since the argparser for the `--save-xml` option defaults to `_get_test_report_path()` when the arg is `None`, it's not a problem for GitHub CI runs. It could be an issue when people run those tests without `CI=1`: test reports won't be saved in this case even if they passed `--save-xml=xxx` (see the sketch below).
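
A minimal sketch of the fix's idea (hypothetical helper): re-append the already-parsed flag so the child process sees it.

```python
import subprocess
import sys
from typing import Optional

def run_in_subprocess(test_file: str, save_xml: Optional[str]) -> int:
    cmd = [sys.executable, test_file]
    if save_xml is not None:
        # the parent argparser consumed --save-xml, so forward it explicitly
        cmd.append(f"--save-xml={save_xml}")
    return subprocess.call(cmd)
```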

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115463
Approved by: https://github.com/clee2000
2023-12-09 02:38:31 +00:00
c9c4cdf9a9 [AOTAutograd] Do not call ctx.mark_dirty on mutations hidden from autograd (#115324)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115324
Approved by: https://github.com/bdhirsh
2023-12-09 02:23:13 +00:00
3361496f96 Fix the corner case of index_add (#114929)
Fixes #114864

As the title stated.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114929
Approved by: https://github.com/mikaylagawarecki
2023-12-09 01:57:25 +00:00
3c54ff6bcd Update ONNX's IO Adapter to support FakeTensor with ExportedProgram (#114407)
Currently, the ONNX exporter using torch.nn.Module as input can support
FakeTensor because the ONNX model stores all initializers.

When using torch.export.ExportedProgram as input, the initializers are
lifted as inputs. In order to execute the ONNX model, we need to pass a
reference to the non-fake model to the
ONNXProgram.adapt_torch_inputs_to_onnx API, so that initializers can be
fetched from the model and fed to the ONNX model as input

ps: https://github.com/pytorch/pytorch/issues/115461 will track the API revision for the cases where additional `model_with_state_dict` are required to produce complete ONNX files exported with fake support. This is also tracked by the umbrella fake tensor issue https://github.com/pytorch/pytorch/issues/105464 FYI @BowenBao
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114407
Approved by: https://github.com/BowenBao
2023-12-09 01:48:27 +00:00
495054545c Allow preserve_rng_state=True when torch.compile + selective checkpointing + CUDA (#113718)
Fixes https://github.com/pytorch/pytorch/issues/113717.

When `preserve_rng_state=True`, we let AOTAutograd trace through the `torch.random.fork_rng` op, and that tracing doesn't work under CUDA, hence the original error reported in the issue.

But since we are already doing RNG functionalization at the Inductor level, we don't actually need to trace this `fork_rng` op. So we should just rewrite `preserve_rng_state` to False when using torch.compile (and let Inductor do the RNG functionalization it's already been doing); a usage sketch follows.
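
A usage sketch of the now-working combination (the checkpointed block is illustrative):

```python
import torch
from torch.utils.checkpoint import checkpoint

def block(x):
    return torch.nn.functional.dropout(torch.relu(x), p=0.1, training=True)

@torch.compile
def fn(x):
    # preserve_rng_state=True is rewritten to False internally under compile
    return checkpoint(block, x, use_reentrant=False, preserve_rng_state=True)

out = fn(torch.randn(8, 8, device="cuda", requires_grad=True))
```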

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113718
Approved by: https://github.com/wanchaol
2023-12-09 01:47:25 +00:00
cd444aa075 [Dynamo] Don't log compilation metrics for PyTorch unit tests (#115452)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115452
Approved by: https://github.com/zou3519
2023-12-09 01:39:36 +00:00
e1370ff80f Vectorize CPU ATen mean kernel for BF16 & FP16 dtypes (#114582)
## Summary
Since #97351, the CPU ATen kernel for `mean` for BF16 & FP16 dtypes has been unvectorized (it's not even implicitly vectorized).

This PR vectorizes `mean` for BF16 & FP16 on CPU in a `cast_fp32 -> sum -> div -> cast_bf16_or_fp16` fashion.

The perf benefit would be especially pronounced on machines with `AVX512_BF16` and/or `AVX512_FP16` ISA support.

## Benchmarking data for BF16 (collected before & after the change in this PR)

**Machine:** Intel&reg; Xeon&reg; (4th generation series, formerly codenamed Sapphire Rapids) Platinum 8468H
One socket (48 physical cores) - used `numactl --membind=0 --cpunodebind=0`
libtcmalloc & Intel OpenMP were preloaded

Environment variable used -
`KMP_AFFINITY=granularity=fine,compact,1,0 KMP_BLOCKTIME=1 KMP_SETTINGS=1 OMP_NUM_THREADS=48 MKL_NUM_THREADS=48`

**Workload:** E2E performance on BS 32 resnet50 (using BF16 via AMP) inference using oneDNN Graph JIT fuser (`mean` kernel is dispatched to eager mode ATen kernel, and is the bottleneck right now)

| **BEFORE:** Latency with unvectorized mean (lower is better)| **AFTER:** Latency with vectorized mean (lower is better)| Speedup due to vectorizing mean|
|----------------------------|-------------------------|------------|
|                19.1 ms           |                10.8  ms       | latency reduced by ~43.45%      |

**Benchmarking script for BF16 -**

 ```
import time
import torch
import torchvision

# enable oneDNN Graph JIT fuser
torch.jit.enable_onednn_fusion(True)
# AMP for JIT mode is enabled by default, and is divergent with its eager mode counterpart
torch._C._jit_set_autocast_mode(False)

# sample input should be of the same shape as expected inputs
example_input = torch.rand(32, 3, 224, 224)
# Using resnet50 from torchvision in this example for illustrative purposes,
# but the line below can indeed be modified to use custom models as well.
model = getattr(torchvision.models, "resnet50")().eval()

with torch.no_grad(), torch.cpu.amp.autocast(cache_enabled=False, dtype=torch.bfloat16):
    # Conv-BatchNorm folding for CNN-based Vision Models should be done with ``torch.fx.experimental.optimization.fuse`` when AMP is used
    import torch.fx.experimental.optimization as optimization
    # Please note that optimization.fuse need not be called when AMP is not used
    model = optimization.fuse(model)
    model = torch.jit.trace(model, (example_input))
    model = torch.jit.freeze(model)
    # a couple of warm-up runs
    model(example_input)
    model(example_input)
    # speedup would be observed in subsequent runs
    start = time.time()
    model(example_input)
    end = time.time()
    inference_time = (end - start) * 1000
    print("Inference time is ", inference_time)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114582
Approved by: https://github.com/jgong5, https://github.com/malfet
2023-12-09 01:02:13 +00:00
f614ed78b8 [docs, dynamo] fix typos in dynamo custom backend docs (#115444)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115444
Approved by: https://github.com/eellison
2023-12-08 23:58:26 +00:00
fb19947962 Add decompositions for reflection_pad{1, 2, 3}d (#115100)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115100
Approved by: https://github.com/peterbell10
2023-12-08 23:05:57 +00:00
9f7b3a4e18 Move autolabeler to "oncall: distributed" not "module:.." (#115447)
Reasoning for the change is spelled out in this issue

https://github.com/pytorch/pytorch/issues/115168

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115447
Approved by: https://github.com/huydhn, https://github.com/malfet
2023-12-08 22:53:20 +00:00
749f0c90e1 Revert "[export][refactor][3/n] Move unlift to separate file (#114787)" (#115457)
Github First Oncall: This reverts commit 967863d91dbe0a56fa7bcc4e075a25cc4ad67c81.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115457
Approved by: https://github.com/osalpekar
2023-12-08 22:33:28 +00:00
28de29fdda [releng] version 2.2 -> 2.3 (#115446)
The release 2.2 branch cut is completed, hence bump the nightly version to 2.3.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115446
Approved by: https://github.com/huydhn, https://github.com/seemethere, https://github.com/malfet
2023-12-08 22:25:52 +00:00
3e47e3f441 Revert "[export] Fix graph output mismatch issue with constant outputs. (#115280)"
This reverts commit 622688fab9fc6d20ff3475a8a0a1fdb6af9d837e.

Reverted https://github.com/pytorch/pytorch/pull/115280 on behalf of https://github.com/atalman due to ghfirst issue when importing, will reland this PR ([comment](https://github.com/pytorch/pytorch/pull/115280#issuecomment-1847903624))
2023-12-08 22:10:03 +00:00
3dab46fe19 Revert "[export] Dont skip output caching for now. (#115374)"
This reverts commit fd79995fd6d9f599ff60b721ae56bb7b0aa4eb93.

Reverted https://github.com/pytorch/pytorch/pull/115374 on behalf of https://github.com/atalman due to ghfirst issue when importing, will reland this PR ([comment](https://github.com/pytorch/pytorch/pull/115374#issuecomment-1847899901))
2023-12-08 22:06:21 +00:00
aaaf5c08fb [ez] Don't run workflows on forks (#115429)
Adds the `if: github.repository_owner == 'pytorch'` to some jobs to make sure they don't run on forks, since they usually either fail or remain pending due to not having the correct machines to run.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115429
Approved by: https://github.com/huydhn, https://github.com/botmethere, https://github.com/malfet, https://github.com/atalman
2023-12-08 21:41:58 +00:00
b5d3d3ebf0 [ao] making hist_obs handle torch.inf and closeby values (#103467)
Summary: This PR does 2 things:

1) Previously this would simply error; now it will ignore any
torch.inf values that it receives. Note: the code checks for torch.inf after
aminmax, so that if no torch.inf values are found, perf is
relatively unchanged.

2) As mentioned in https://github.com/pytorch/pytorch/issues/100051,
values close to (but not quite at) the maximum/minimum float value could
overflow to infinity in the course of _adjust_min_max() (when such a large
value is multiplied by something in the middle of a calculation
that would otherwise produce a non-inf value). This was fixed by
rearranging the order of operations for the lines in question without
altering the actual equations. Specifically, where the operations in lines
1095, 1098 and 1100 multiply and divide large values,
it's better to divide the two large values before multiplying, rather
than multiplying the two large values together (creating overflow) before dividing, as the code previously did. A tiny numeric illustration follows below.
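
A tiny numeric illustration of why the ordering matters:

```python
import torch

big = torch.tensor([torch.finfo(torch.float32).max / 2])

print(big * big / big)  # tensor([inf]): multiplying first overflows
print(big / big * big)  # tensor([1.7014e+38]): dividing first stays finite
```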

Test Plan: python test/test_quantization.py
TestObserver.test_histogram_observer_ignore_infinity

python test/test_quantization.py TestObserver.test_histogram_observer_handle_close_to_infinity

Differential Revision: [D51489345](https://our.internmc.facebook.com/intern/diff/D51489345)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103467
Approved by: https://github.com/andrewor14
2023-12-08 21:41:31 +00:00
1215f2ffe2 [dtensor] readme typo (#115383)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115383
Approved by: https://github.com/awgu
ghstack dependencies: #115365
2023-12-08 21:40:40 +00:00
af925a56a1 Revert "[export] Add math.* ops to pass base (#115271)"
This reverts commit 6c0a4ced530dab78db455c37508931de2eb56239.

Reverted https://github.com/pytorch/pytorch/pull/115271 on behalf of https://github.com/atalman due to ghfirst issue when importing, will reland this PR ([comment](https://github.com/pytorch/pytorch/pull/115271#issuecomment-1847852211))
2023-12-08 21:17:56 +00:00
12d7ea19af [Indcutor][fx pass] Add sub and div pointwise ops to the post grad fusion (#115389)
Summary: As titled.

Test Plan:
# unit test
```
buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/inductor:group_batch_fusion
```
Buck UI: https://www.internalfb.com/buck2/792c58db-c369-487d-9a42-b5da471657c0
Test UI: https://www.internalfb.com/intern/testinfra/testrun/2814749981661407
Network: Up: 74KiB  Down: 29KiB  (reSessionID-b47c266b-12d6-4e88-8dc3-4af1dd7ecbb4)
Jobs completed: 20. Time elapsed: 2:09.6s.
Cache hits: 0%. Commands: 2 (cached: 0, remote: 0, local: 2)
Tests finished: Pass 7. Fail 0. Fatal 0. Skip 0. Build failure 0

# local reproduce
OC: P899142918
MAI: P899175452
# e2e (oc)

Differential Revision: D51957242

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115389
Approved by: https://github.com/dshi7, https://github.com/jackiexu1992, https://github.com/xuzhao9
2023-12-08 21:07:03 +00:00
e8e4141773 Revert "[Dynamo][6.1/N] Refactor out TorchInGraphFunctionVariable and improve heuristic (#113432)"
This reverts commit e61d6b42f0f4e4fa5bb816e03fb81e5bbcc9fa06.

Reverted https://github.com/pytorch/pytorch/pull/113432 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing dynamo tests in trunk e61d6b42f0, landrace? ([comment](https://github.com/pytorch/pytorch/pull/113432#issuecomment-1847787981))
2023-12-08 20:15:39 +00:00
d7180161b5 Revert "[SparseCsr] Remove triton sdpa skip after triton pin update (#109601)"
This reverts commit f64b10803f5fdd34e43fba7f421401bcfe247c19.

Reverted https://github.com/pytorch/pytorch/pull/109601 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing in trunk with this error ZeroDivisionError: integer division or modulo by zero ([comment](https://github.com/pytorch/pytorch/pull/109601#issuecomment-1847784383))
2023-12-08 20:12:53 +00:00
4186932bac Revert "[export] Remove runtime assertion pass (#115196)"
This reverts commit c163b3c03563c11640d4dbee504ef63101b019fe.

Reverted https://github.com/pytorch/pytorch/pull/115196 on behalf of https://github.com/atalman due to Broke internal test ([comment](https://github.com/pytorch/pytorch/pull/115196#issuecomment-1847778344))
2023-12-08 20:07:04 +00:00
317486edb0 [C10D] Decouple flight recorder from enableTiming (#115358)
RE #115301

Decoupling gives us a path to disable timing without disabling the
flight recorder.

Flight recorder is still useful for stuckness analysis without 'timing'.

Disabling timing makes it miss the 'started'
state that comes from using an extra nccl event at the start of each
collective.  It will also be missing 'duration_ms' of collectives, which
hasn't been landed yet, but is useful for timing/perf work more than
stuckness analysis.

Hopefully we can enable timing by default and leave both on, but it's
nice to have the flexibility for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115358
Approved by: https://github.com/fduwjj
2023-12-08 19:44:45 +00:00
suo
3d999d2f2c [export] optimize unflattener (#115364)
Unflattening was slow on the APS FM model (which has thousands of nn.EmbeddingBag modules).

A quick glance at the profile shows that 75% of the time in unflattening was spent copying this node list, which is immutable and globally shared. So just passing it around as a tuple yields a 4x speedup.

Differential Revision: [D51929775](https://our.internmc.facebook.com/intern/diff/D51929775/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115364
Approved by: https://github.com/zhxchen17
2023-12-08 19:32:01 +00:00
494cb28231 [PyTorch] AOTI: add ArrayRefTensor (#112115)
This adds a shim for AOTI generated code to pretend a raw array works like an AtenTensorHandle. This allows parts of AOTI that generate uses of tensors to continue to be unaware of how those tensors are allocated. See the following diff/PR for usage.

Differential Revision: [D50570252](https://our.internmc.facebook.com/intern/diff/D50570252/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112115
Approved by: https://github.com/chenyang78, https://github.com/desertfire
2023-12-08 19:31:50 +00:00
a2b89154bf New swap function (#111747)
This PR proposes a new approach to solving the problem of nn/optim being linked only by Python object identity.
The idea is to have a function that can swap the content of two Tensors t1 and t2 while preserving all the old references.
This would allow us to swap the `model.weight` with a new Tensor (can be any subclass of Tensor and any TensorImpl (xla, sparse, nested tensorimpl would work)). The use within nn will be done in a follow up.

This is done by swapping the whole content of the PyObject and then putting back the fields associated with external references (refcount, gc tracking and weakrefs).
Note that we have to properly handle all the cases where there is memory used before the public pointer PyObject* and where the PyObject is bigger due to dict/weakref being inlined (older CPython version) or due to slots.

The main limitation of this approach is that the number of slots needs to match for the objects being swapped, which limits the usage of slots in subclasses.
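A minimal sketch of the intended behavior, assuming the function lands as `torch.utils.swap_tensors` (the name of the eventual public API; the PR text itself doesn't name the entry point):

```python
import torch

t1 = torch.ones(2)
t2 = torch.zeros(3)
alias = t1                        # an external reference to t1's PyObject

torch.utils.swap_tensors(t1, t2)  # assumed public entry point for this PR

print(alias)  # tensor([0., 0., 0.]) -- the old reference now sees t2's content
```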

Draft right now to see what @colesbury thinks about doing this?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111747
Approved by: https://github.com/colesbury
2023-12-08 18:49:35 +00:00
5f2ff29569 Fix typo in https://pytorch.org/docs/stable/sparse.html (#115282)
Fixes #111473

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115282
Approved by: https://github.com/svekars
2023-12-08 18:31:33 +00:00
68f74dd162 Add python and C++ support for LPPool3d (#114199)
Add Python and C++ support for LPPool3d. Fixes #114114
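A brief usage sketch, assuming the new module mirrors the LPPool1d/LPPool2d signature:

```python
import torch
import torch.nn as nn

pool = nn.LPPool3d(norm_type=2, kernel_size=2)  # power-average (L2) pooling
x = torch.randn(1, 3, 8, 8, 8)                  # (N, C, D, H, W)
print(pool(x).shape)                            # torch.Size([1, 3, 4, 4, 4])
```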

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114199
Approved by: https://github.com/mikaylagawarecki
2023-12-08 18:18:44 +00:00
1c3a4a864c Remove always restore (#115317)
Removes always restore, assuming that a HOP will cleanup any leftover state from tracing fwd + bwd

This required a minor change to the autograd fn variable higher order op. If we are tracing forward DON'T add the call_function node into the main graph, since we are only tracing it for the purposes of speculation. Instead return the result directly to be passed to the backward for speculation. This was the only observable side effect on the output graph that I found.

Test plan:
test_smoke_from_test_autograd in test_autograd_function.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115317
Approved by: https://github.com/voznesenskym, https://github.com/jansel
2023-12-08 18:17:37 +00:00
a3f93dc44d [EZ] [CD] Enable Triton 3.12 conda builds (#115424)
Currently there is a chicken and egg problem with enabling triton builds for the platform, as the package depends on `torch`, so I can only submit this change a few days after https://github.com/pytorch/pytorch/pull/114819

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115424
Approved by: https://github.com/clee2000, https://github.com/seemethere
2023-12-08 18:10:45 +00:00
81b565b142 [CI] Fix a missing write_csv_when_exception problem (#115370)
Summary: Fix a problem shown in https://github.com/pytorch/pytorch/actions/runs/7124839624/job/19400589129 when a model times out.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115370
Approved by: https://github.com/eellison
2023-12-08 18:09:53 +00:00
c370450f02 [inductor] Remove hashing of tensor data for constants (#115356)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115356
Approved by: https://github.com/eellison
2023-12-08 18:05:34 +00:00
e61d6b42f0 [Dynamo][6.1/N] Refactor out TorchInGraphFunctionVariable and improve heuristic (#113432)
This is split from #113009; please check https://github.com/pytorch/pytorch/pull/113009#issuecomment-1804417925 for more details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113432
Approved by: https://github.com/ezyang, https://github.com/jansel
2023-12-08 17:15:14 +00:00
898554a3a3 [torchgen] Add logic in custom ops to return empty tensor (#114143)
Summary: Add two pieces of logic:

1. If the custom op returns a `Tensor` but doesn't have an out tensor as input, return an empty tensor.
2. If the custom op returns more than one Tensor and the number of out tensors differs from the number of returned Tensors, return a tuple of empty tensors.

Test Plan: Rely on new unit tests

Differential Revision: D51471651

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114143
Approved by: https://github.com/cccclai
2023-12-08 17:03:44 +00:00
b3b5bd51ea [raas][torch][jit] Allow not storing the optimized graph (#115381)
Summary:
GraphFunction internally stores the optimized graph after generating it, and it is then passed into the executor, which makes a copy of it. So we effectively store the optimized graph twice.

This diff allows setting a flag to not store the optimized graph inside the GraphFunction.

The code is a no-op until the flag is enabled.

Test Plan:
I ran SL with this on raas, with good memory savings on the raas server. From the command line:

example model run
```
buck run mode/opt-clang  sigrid/predictor/client/localnet:run_model -- --model_id_to_load=953556500 --model_snapshot_to_load=362

I1207 11:04:58.657143 3556226 SigridPredictorLocalModelFactory.cpp:32] Memory usage for 953556500_362 is 255646 Kb
```

then with flag enabled:
```
buck run mode/opt-clang  sigrid/predictor/client/localnet:run_model -- --model_id_to_load=953556500 --model_snapshot_to_load=362 --torch_jit_do_not_store_optimized_graph=true
I1207 11:06:25.245779 3577383 SigridPredictorLocalModelFactory.cpp:32] Memory usage for 953556500_362 is 165167 Kb
```
And combining this flag with the flag from D51950418:
```
buck run mode/opt-clang  sigrid/predictor/client/localnet:run_model -- --model_id_to_load=953556500 --model_snapshot_to_load=362 --torch_jit_do_not_store_optimized_graph=true --torch_jit_enable_profiling_graph_executor=false

I1207 11:09:17.502743 3592345 SigridPredictorLocalModelFactory.cpp:32] Memory usage for 953556500_362 is 114848 Kb
```

Differential Revision: D51931895

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115381
Approved by: https://github.com/malfet
2023-12-08 16:29:13 +00:00
f64b10803f [SparseCsr] Remove triton sdpa skip after triton pin update (#109601)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109601
Approved by: https://github.com/desertfire, https://github.com/amjames
2023-12-08 15:49:16 +00:00
72e58a756c Set markDynamoStrictTest in functorch/test_vmap.py (#115274)
We set markDynamoStrictTest in most of functorch/test_vmap.py. This
revealed many existing failing tests, so we mark those all as expected
failures or skip them.

Test Plan:
- CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115274
Approved by: https://github.com/guilhermeleobas, https://github.com/kshitij12345
ghstack dependencies: #115267, #115276, #115268
2023-12-08 14:51:19 +00:00
cc8f6f56dc [quant][pt2e] Add convert callback to Observer module (#115001)
Summary:
This is to allow easier extension of the quant workflow in the future, as we are seeing more
diverse ways of doing quantization.

Putting this up for feedback first.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_observer_callback

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115001
Approved by: https://github.com/kimishpatel
2023-12-08 13:47:37 +00:00
ca15671c30 Fix failing test_invalid_input_csr_large (#114940)
The test introduced in #102530 has a bug:
Construction of `crow_indices` raises an exception: "value cannot be converted to type int32 without overflow", which is obviously correct.
This makes the test fail, even though it is supposed to check for an overflow in nnz.
Fix by making the construction of `crow_indices` pass, albeit with an invalid value that would error later, so that the correct check is triggered.

Given that I'm not sure it is even worth checking for an overflow in nnz:
- `crow_indices[..., -1] == nnz` is already enforced
- this can only hold if `crow_indices` is able to hold `nnz` without overflow
- `col_indices` has to be of the same type as `crow_indices`
- Hence the type of `col_indices` has to be able to hold the value of `nnz`

So in conclusion: The situation being checked for cannot reasonably occur

CC @pearu as the test author for additional insight

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114940
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2023-12-08 11:55:21 +00:00
23fa9621e4 [DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#115099) (#115193)
Summary:

Rename _device_mesh.py to device_mesh.py, update all callsites, add documentation.
We created stubs for the public class and methods in torch.distributed.device_mesh so that it can be imported whether or not distributed is available.
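A minimal sketch of the now-public import path (assumes a 4-rank job launched via torchrun):

```python
from torch.distributed.device_mesh import init_device_mesh

# 2x2 mesh: outer dim for data parallel, inner dim for tensor parallel
mesh = init_device_mesh("cuda", (2, 2), mesh_dim_names=("dp", "tp"))
dp_mesh = mesh["dp"]  # slice out a sub-mesh by dimension name
```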

Original diff reverted: D51629761
Original PR reverted: https://github.com/pytorch/pytorch/pull/115099
Prior to landing, all CI signals passed. Shipit added the "ci/trunk" label to the PR, DID NOT wait for it, and went ahead with committing. More context can be found in the reverted PR above.

Test Plan: CI.

Differential Revision: D51861018

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115193
Approved by: https://github.com/fegin
2023-12-08 08:44:32 +00:00
6c585de076 [CUDA] baddmm should fall back to addmm for batch=1 (#114992)
I.e. it feels reasonable to always call `at::cuda::gemm` rather than `at::cuda::bgemm` when num_batches == 1
After the change, benchmarking torch built with CUDA-12 using  [following perf script](https://gist.github.com/malfet/6a17156d7f5663b8b12054a1beff3fe1) on A100  are as follows:
|      Shape     |  bmm_time |  mm_time  | slow down (%) |
| -------------- | --------- | --------- | ------------- |
|    1x1x4096    |   14.18   |   14.31   |     -0.89     |
|    1x1x8192    |   14.37   |   14.37   |     -0.05     |
|   1x1x16384    |   14.03   |   14.12   |     -0.68     |
|   1x1x32768    |   14.19   |   14.24   |     -0.35     |
|   1x1x65536    |   14.85   |   14.52   |     2.30      |
|   1x1x131072   |   14.03   |   14.07   |     -0.33     |
|  128x128x128   |   11.34   |   11.06   |     2.56      |
|  256x256x256   |   14.85   |   14.40   |     3.15      |
|  512x512x512   |   27.22   |   27.22   |     -0.01     |
| 1024x1024x1024 |  129.66   |  129.50   |     0.12      |
| 2048x2048x2048 |  972.18   |  973.24   |     -0.11     |
|  129x127x129   |   11.21   |   11.25   |     -0.39     |
|  257x255x257   |   14.50   |   14.43   |     0.44      |
|  513x511x513   |   29.01   |   29.01   |     0.01      |
| 1025x1023x1025 |  137.65   |  137.64   |     0.01      |
| 2049x2047x2049 |  982.58   |  982.65   |     -0.01     |
|  4097x3x4097   |   86.65   |   86.64   |     0.01      |
|  8193x3x8193   |  384.02   |  383.96   |     0.02      |
| 16385x3x16385  |  1106.73  |  1107.32  |     -0.05     |
| 32769x3x32769  |  4739.49  |  4739.48  |     0.00      |
| 65537x3x65537  | 17377.78  | 17378.74  |     -0.01     |
|  4097x5x4097   |   87.09   |   87.12   |     -0.03     |
|  8193x5x8193   |  301.38   |  301.36   |     0.01      |
| 16385x5x16385  |  1107.38  |  1108.04  |     -0.06     |
| 32769x5x32769  |  4743.73  |  4744.07  |     -0.01     |
| 65537x5x65537  | 17392.32  | 17395.42  |     -0.02     |
|  4097x7x4097   |   87.17   |   87.19   |     -0.02     |
|  8193x7x8193   |  301.94   |  302.00   |     -0.02     |
| 16385x7x16385  |  1107.17  |  1106.79  |     0.03      |
| 32769x7x32769  |  4747.15  |  4747.13  |     0.00      |
| 65537x7x65537  | 17403.85  | 17405.02  |     -0.01     |

Fixes perf problem reported in https://github.com/pytorch/pytorch/issues/114911
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114992
Approved by: https://github.com/Skylion007, https://github.com/eqy
2023-12-08 07:53:17 +00:00
4d70802133 [c10d] Use TCPStore to record NCCL timeout and dump debug info (#115226)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115226
Approved by: https://github.com/wconstab
2023-12-08 06:19:40 +00:00
2c84616a94 Move the shape env symint cache to a symbol cache, better routing for subclass fakification [re-pr 115227] (#115396)
Context:

Joel sees that unless he manually writes to the fake tensor memo, fakification seems to produce spurious symbols! Voz (me) objects, saying that not only is directly writing to memo a bad pattern, recursively invoking fakification on tensor subclass elements in dynamo should suffice! Joel says that while he morally agrees, he has a test proving otherwise, a most perplexing situation.

Digging in, I figured out that while *we were* making fake tensors correctly, with properly cached symbols and the like, we were *also* incorrectly creating spurious symbols, leading the test to fail.

Before this PR, we would only cache source->symint. This was generally fine, but meant that you would create a symbol, then potentially throw it out due to symint cache. For example, the cache hit flow was:

make a symbol (ex: s2) -> use it to make a symint -> hit the cache (my_source-s1)

Now, in this example,  you have a symbol in your val_to_var/var_to_val (s2) that is unused. This is sound, but wasteful, and furthermore, misleading.

This was causing a test added in a PR in this stack to fail, specifically, because the test was using

```
curr_var_to_val = {
    str(k): v for k, v in context.fake_mode.shape_env.var_to_val.items()
}
```

To validate that no new symbols were being created (that is, that recursively creating fake tensors for subclasses was working).

The test is correct, but the implementation of caching would make (by this method of observation) cache hits look like cache misses.

So, the fix here is to move the cache up to be a general symbol cache, rather than only a cache for symints.

The initial implementation did that! But then, it ran into some interesting errors when it came to replay. When replaying symbol creation, behaviors would diverge in the new shape env! How could that be? The answer is because creating a new shape_env resulted in us replaying symbol creation... but with a cache from a different shape env! This was short circuiting symbol creation - and so, adding an extra layer to the cache for id(shape_env) fixes the problem.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115396
Approved by: https://github.com/mlazos
2023-12-08 05:02:21 +00:00
d0f161eae4 [vision hash update] update the pinned vision hash (#111264)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111264
Approved by: https://github.com/pytorchbot
2023-12-08 03:33:33 +00:00
9521331ba5 [pytorch] Multiprocessing api to use sigkill if sigterm doesn't kill the process (#115219)
Summary:
[pytorch] Multiprocessing api to use sigkill if sigterm doesn't kill the process
We have seen a handful of training jobs get stuck where one of the trainers goes down
while the others are stuck in C++ land and hence not handling the SIGTERM.

Test Plan: Manually validated by attaching gdb to one of the processes and sending a kill -9 to another. Saw the log ```[WARNING] Unable to shutdown process 4422 via Signals.SIGTERM, forcefully exiting via Signals.SIGKILL```

Differential Revision: D51862545

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115219
Approved by: https://github.com/wconstab, https://github.com/fduwjj
2023-12-08 02:26:19 +00:00
459845b82d [cuDNN][cuDNN frontend] Bump cudnn_frontend submodule to 1.0 (#115218)
A prerequisite for cuDNN flash attention #113713 .

CC @malfet @atalman @drisspg @ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115218
Approved by: https://github.com/drisspg, https://github.com/malfet
2023-12-08 02:24:26 +00:00
e071d6a9eb [Nested tensor]avoid using shape in python subclass NT, use _size instead (#115371)
Summary:
calling tensor.shape will call torch_dispatch which adds more overhead.

Testing overhead difference in "NT + NT" operation:
**Before:**
the add operation takes ~300us
{F1167963824}
**After:**
the add operation takes ~200us
 {F1167964056}

Test Plan: unit tests in test_nestedtensor

Differential Revision: D51949135

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115371
Approved by: https://github.com/soulitzer, https://github.com/jbschlosser
2023-12-08 02:08:36 +00:00
5432088098 Adds Checkpointer Wrapper for DCP [3/N] (#114603)
Adds a useful high level wrapper for calling `dist.save/load` with the correct storage readers and writers.

Instead of doing:

```
DCP.save(
    state_dict={...},
    storage_writer=StorageWriter(...)
)

DCP.load(
    state_dict={...},
    storage_reader=StorageReader(...)
)
```

We can now do:

```
checkpointer = Checkpointer(...)

checkpointer.save(state_dict={...})
checkpointer.load(state_dict={...})
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114603
Approved by: https://github.com/fegin, https://github.com/wz337
2023-12-08 01:03:21 +00:00
3b01f30b20 Prevent invalid pointwise ops on jagged with transposed ragged dim (#115190)
TODO: tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115190
Approved by: https://github.com/soulitzer, https://github.com/ani300
2023-12-08 00:54:03 +00:00
784e20e3d7 [C10D] Make dumpPipe use async launcher (#115375)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115375
Approved by: https://github.com/fduwjj
ghstack dependencies: #115332
2023-12-08 00:16:22 +00:00
bb7746275c Add is_integer to SymFloat (#114703)
Fixes #114676

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114703
Approved by: https://github.com/peterbell10
2023-12-07 23:23:53 +00:00
f5919335db Fix _load_from_state_dict for num_batches_tracked in batchnorm (#115285)
I approved https://github.com/pytorch/pytorch/pull/110850 which did the following

Previously:
`num_batches_tracked` not in state_dict when doing `m.load_state_dict(state_dict)` --> always overwrite module's `num_batches_tracked` in `load_from_state_dict` with a 0 cpu tensor

Now:
`num_batches_tracked` not in state_dict loaded when doing `m.load_state_dict(state_dict)` --> only overwrite module's `num_batches_tracked`  in `load_from_state_dict` with a 0 cpu tensor if module does not have `num_batches_tracked`

This causes the following issue:

```
with torch.device('meta'):
     m = BatchNorm(...)
m.load_state_dict(state_dict, assign=True)
```

If `num_batches_tracked` is not in `state_dict`, since the module's `num_batches_tracked` is present on the meta device, it is not overwritten with a 0 cpu tensor. When compiling, this error is raised

```
AssertionError: Does not support mixing cuda+meta
```

I am not sure whether the explicit check for the meta device makes sense as a fix; will add testing if this fix is OK.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115285
Approved by: https://github.com/albanD
2023-12-07 22:48:26 +00:00
18d57dde2d Remove remaining uses of copy_graphstate (#115321)
After auditing higher_order_ops.py, the graph checkpoints were only getting used in the event of an exception, so it is safe to remove because we restart analysis in this case now.

To make this clearer the current state is the following:
```
Checkpoint side effects
Capture subgraph
if graph break:
  restore as usual
else:
  throw away inlining translator and subgraph tracer
Restore side effects
```

This will change to the following after this change:
```
Checkpoint side effects
Capture subgraph:
if graph break:
  restart analysis
else:
  throw away inlining translator and subgraph tracer
Restore side effects
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115321
Approved by: https://github.com/jansel, https://github.com/zou3519
2023-12-07 22:35:02 +00:00
ecba053cff [quant][pt2e] XNNPACKQuantizer skip inserting observers for non-float Tensors (#114999)
Summary:
As titled.

Test Plan:
python test/test_quantization.py -k test_add_mul_long

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114999
Approved by: https://github.com/kimishpatel, https://github.com/guangy10
2023-12-07 22:13:36 +00:00
dacf5d6e92 [DTensor] Remove assert to allow tensor sharding dimension < Shard(x).ndim (#115114)
Consolidates changes made by @yoyoyocmu: https://www.internalfb.com/diff/D51821717
Remove the assert to allow tensor sharding dimension < Shard(x).ndim. With the current padding, we do support this already.

Follow up: we will still need to fix the size mismatch and `full_tensor()` hang when tensor is uneven-sharded.
Created issue here: https://github.com/pytorch/pytorch/issues/115310

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115114
Approved by: https://github.com/yoyoyocmu, https://github.com/wanchaol
2023-12-07 21:57:30 +00:00
7562b45454 Reland "[C10D] Use future for flight recorder dump (#115176)" (#115332)
Replaces the "always sleep 30 sec before abort" with "wait up to 30 sec
for the future to complete then abort". The difference in this case is
the abort happens as soon as the dump finishes up to a maximum, instead
of always waiting the maximum.

Allows multiple calls to dump, which will be serialized.

Renames tryWriteDebugInfo to launchAsyncDebugDump in spirit of the
change to support more than one launch and to always launch rather than
only launching on the first call.

Adds a test for dumping on timeout.

This reverts commit ac7d14baad53fa7d63119418f760190f289d8a01.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115332
Approved by: https://github.com/fduwjj
2023-12-07 21:20:58 +00:00
fd79995fd6 [export] Dont skip output caching for now. (#115374)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115374
Approved by: https://github.com/tugsbayasgalan
2023-12-07 20:31:30 +00:00
6a6a1e3ef7 [dtensor] update README to make all example runnable (#115365)
As titled; also add torchrun commands.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115365
Approved by: https://github.com/fegin
2023-12-07 20:23:37 +00:00
c06ab369e8 [OAT] toggle for forcing matmul precision matching (#115326)
Summary: Add a toggle to inductor config that will force matmul precision dtypes to match between cublas and triton backends for addmm, bmm, and mm operations.

Test Plan: CI + model launches

Reviewed By: jansel

Differential Revision: D51442001

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115326
Approved by: https://github.com/jansel
2023-12-07 20:22:12 +00:00
7faa67f6ef [inductor] enable mkldnn op weight pre-packing on aarch64 (#115037)
This PR enables the fx passes and mkldnn optimizations for aarch64. It improved BERT inference performance by up to 5.8x on an AWS c7g instance when comparing torch.compile() against the no-compile path. This is enabled when pytorch is built with the USE_MKLDNN_ACL option for aarch64.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115037
Approved by: https://github.com/jgong5, https://github.com/malfet
2023-12-07 19:58:38 +00:00
7201edc0a5 Fix RNN class constructor signature (#115341)
Fixes #114617

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115341
Approved by: https://github.com/mikaylagawarecki
2023-12-07 19:46:33 +00:00
21cca2494d Move test_multi_tensor_optimizers to use OptimizerInfos (#114797)
This PR aims for parity+ compared to the old testing for the simplest foreach test case.

Test coverage increase: we now test foreach optimizers with CPU as well as on GPU.

Before:
```
(pytorch-3.10) [janeyx@devgpu023.odn1 ~/local/pytorch (19136605)]$ python test/test_optim.py -v -k test_multi_tensor_optimizers
/home/janeyx/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.0
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
test_multi_tensor_optimizers (optim.test_optim.TestOptim) ... ok

----------------------------------------------------------------------
Ran 1 test in 7.253s

OK
(pytorch-3.10) [janeyx@devgpu023.odn1 ~/local/pytorch (19136605)]$
```

Now, we get granular test cases at the cost of overhead!
```
(pytorch-3.10) [janeyx@devgpu023.odn1 ~/local/pytorch (19136605)]$ python test/test_optim.py -v -k test_foreach
/home/janeyx/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.0
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
test_foreach_ASGD_cpu_float64 (__main__.TestOptimRenewedCPU) ... ok
test_foreach_Adadelta_cpu_float64 (__main__.TestOptimRenewedCPU) ... ok
test_foreach_Adagrad_cpu_float64 (__main__.TestOptimRenewedCPU) ... ok
test_foreach_AdamW_cpu_float64 (__main__.TestOptimRenewedCPU) ... ok
test_foreach_Adam_cpu_float64 (__main__.TestOptimRenewedCPU) ... ok
test_foreach_Adamax_cpu_float64 (__main__.TestOptimRenewedCPU) ... ok
test_foreach_NAdam_cpu_float64 (__main__.TestOptimRenewedCPU) ... ok
test_foreach_RAdam_cpu_float64 (__main__.TestOptimRenewedCPU) ... ok
test_foreach_RMSprop_cpu_float64 (__main__.TestOptimRenewedCPU) ... ok
test_foreach_Rprop_cpu_float64 (__main__.TestOptimRenewedCPU) ... ok
test_foreach_SGD_cpu_float64 (__main__.TestOptimRenewedCPU) ... ok
test_foreach_ASGD_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_Adadelta_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_Adagrad_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_AdamW_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_Adam_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_Adamax_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_NAdam_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_RAdam_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_RMSprop_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_Rprop_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_SGD_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok

----------------------------------------------------------------------
Ran 22 tests in 30.954s

OK
(pytorch-3.10) [janeyx@devgpu023.odn1 ~/local/pytorch (19136605)]$
```

Why the increase in time?
Two reasons:
1. overhead. Any _CUDA_ *Info test (OpInfo, ModuleInfo, OptimizerInfo) will wrap itself with the `CudaNonDefaultStream` policy, and `CudaNonDefaultStream.__enter__` when called for the first time will go through all visible CUDA devices and synchronize each of them, thus forcing the CUDAContext to be init'd. Doing this for all 8 devices takes ~10-15s. Also, test parametrization costs a little overhead too, but not to the level init'ing CUDA context does.
2. We test more! Now, we have 72 configs (in the foreach optimizer world) whereas we only had 59 before.

Next steps for the future:
- consider adding more Tensor LR configs (like a Tensor LR without capturable in the single tensor case)
- this is likely the next PR or 2: migrate all uses of _test_derived_optimizers in test_optim to TestOptimRenewed

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114797
Approved by: https://github.com/albanD
2023-12-07 19:37:56 +00:00
16373bbc1f fix error message in pytorch (#115349)
Fixes https://dev-discuss.pytorch.org/t/typo-in-error-message/1709 .

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115349
Approved by: https://github.com/Skylion007
2023-12-07 19:27:29 +00:00
suo
eb4ba35b07 fix test_weak.py on mac (#115367)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115367
Approved by: https://github.com/albanD
2023-12-07 19:19:56 +00:00
b0a9641815 [Inductor][fx pass] Fuse pointwise operators in the post grad (#114778)
Summary: We construct a unified API that makes it easy to add pointwise ops to be batched in the post grad pass

Test Plan:
# unit test
```
buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/inductor:group_batch_fusion
```
Buck UI: https://www.internalfb.com/buck2/19b3f641-782f-4f94-a953-3ff9ce2cfa7b
Test UI: https://www.internalfb.com/intern/testinfra/testrun/1125900251953016
Network: Up: 67KiB  Down: 32KiB  (reSessionID-c2a80f26-8227-4f78-89fc-bcbda0ae8353)
Jobs completed: 18. Time elapsed: 1:19.8s.
Cache hits: 0%. Commands: 2 (cached: 0, remote: 0, local: 2)
Tests finished: Pass 6. Fail 0. Fatal 0. Skip 0. Build failure 0
# local reproduce
### cmf
P881792289
### igctr
### dsnn
### icvr

Reviewed By: xuzhao9

Differential Revision: D51332067

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114778
Approved by: https://github.com/xuzhao9
2023-12-07 19:04:03 +00:00
3a5fb0d456 markDynamoStrictTest in functorch/test_eager_transforms.py (#115268)
We're doing some more work around the functorch-torch.compile
interaction. The current state is that these tests might not get run in
the Dynamo CI shard. Using this decorator makes them actually run (by
resetting the Dynamo state before/after each test).

Test Plan:
Wait for CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115268
Approved by: https://github.com/voznesenskym, https://github.com/guilhermeleobas
ghstack dependencies: #115267, #115276
2023-12-07 18:42:21 +00:00
a1bfaf75dc markDynamoStrictTest: add nopython flag, set default to False (#115276)
Default should be False because in general, we're interested
in reliability and composability: we want to check that
running PyTorch with and without Dynamo has the same semantics (with
graph breaks allowed).

Test Plan:
Existing tests?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115276
Approved by: https://github.com/voznesenskym
ghstack dependencies: #115267
2023-12-07 18:42:21 +00:00
2847045ed9 Set _dynamo.config.capture_func_transforms=False (#115267)
Due to not all tests in the Dynamo shard actually running in CI, we've
started to bitrot on this implementation. Since our plan is to trace
into the functorch implementations instead of construct a HOP
(which is what capture_func_transforms=True does), let's turn off this
config by default.

Test Plan:
- Tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115267
Approved by: https://github.com/voznesenskym, https://github.com/guilhermeleobas
2023-12-07 18:42:15 +00:00
3e66385ddd Add Work to distributed docs (#115172)
Summary:
Documenting the `Work` object

For a collective (broadcast, all_reduce, etc.), when async_op=True we return a `Work` object on which users can call `.wait()`, `.is_success()`, among other things, but this class was not documented
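A minimal sketch of the pattern being documented (assumes an initialized process group, e.g. under torchrun):

```python
import torch
import torch.distributed as dist

t = torch.ones(4)
work = dist.all_reduce(t, async_op=True)  # returns a Work handle immediately
work.wait()                               # block until the collective completes
```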

Test Plan: Preview the docs build in OSS

Differential Revision: D51854974

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115172
Approved by: https://github.com/wconstab
2023-12-07 18:12:10 +00:00
ee8b33f7d5 Fixed crash when calling pad_packed_sequence on sequences packed with cuda tensors and enforce_sorted=False, due to indexing with tensors on different devices (#115028)
Fixes #115027

Fix in csrc as done in the python code [here](https://github.com/pytorch/pytorch/blob/main/torch/nn/utils/rnn.py#L338).
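A hedged repro sketch of the scenario described (requires a CUDA device):

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

x = torch.randn(3, 5, 2, device="cuda")   # (batch, time, feature)
lengths = torch.tensor([3, 5, 2])         # unsorted -> enforce_sorted=False
packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
out, out_lengths = pad_packed_sequence(packed, batch_first=True)  # crashed before the fix
```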

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115028
Approved by: https://github.com/drisspg
2023-12-07 18:09:18 +00:00
suo
686a3e0bf0 [pytorch][PR] introduce WeakHashRef (#115216)
We would like weak dictionaries that have `torch.ScriptObject` keys. Similar to tensors, we need to override the behavior of the ref to do the right thing under comparison.

This change also makes it so that WeakIdKeyDictionary works with a pluggable ref_type.

Differential Revision: [D51828205](https://our.internmc.facebook.com/intern/diff/D51828205/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115216
Approved by: https://github.com/albanD
2023-12-07 17:48:11 +00:00
684ce1b21d Revert "Assert that output could only be the last node of the FX graph (#115179)"
This reverts commit 4a9fb9832abc00dff9729b7d7a9647b376882f38.

Reverted https://github.com/pytorch/pytorch/pull/115179 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/115179#issuecomment-1845776365))
2023-12-07 17:26:27 +00:00
dd6ae6d3b4 [HigherOrderOp] Remove additional get item calls in MapHigherOrder. (#115207)
As titled, this PR removes the unnecessary getitem call from the graph that's manipulated in MapHigherOrder. There we want to get the first-dim slice of the original tensor for speculation, but using call_method would accidentally create a get_item call in the graph, so we avoid it by calling unpack_var_sequence on the input tensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115207
Approved by: https://github.com/yanboliang
ghstack dependencies: #115115, #115204, #115205
2023-12-07 17:06:44 +00:00
8b74735878 [HigherOrderOp] make MapHigherOrder create map_impl call_function node instead of map (#115205)
We want to remove the map_wrapper and replace it with dynamo always on. This is the first step of this plan.

In this PR, we make dynamo directly generate map_impl nodes. This hasn't touched the eager logic yet. So the execution path after this PR looks like: 1. `dynamo -> map_impl` when torch.compile is on (before this PR, it was `dynamo -> map_wrapper -> map_impl`), and 2. `map_wrapper -> map_impl` (this PR didn't touch the logic here).

The added TODO(yidi) is addressed in the following PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115205
Approved by: https://github.com/yanboliang
ghstack dependencies: #115115, #115204
2023-12-07 17:06:44 +00:00
be3efbebb6 [HigherOrderOp] make MapHigherOrder use should_flatten_output=True (#115204)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115204
Approved by: https://github.com/yanboliang
ghstack dependencies: #115115
2023-12-07 17:06:35 +00:00
998c87f93c [BE][HigherOrderOp] extract redundant code that unflattens the output (#115115)
We need this function to unflatten the variable tracker for HOPs that want pytree output support, e.g. map.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115115
Approved by: https://github.com/yanboliang
2023-12-07 17:06:28 +00:00
43f42bf3cb Updated docs for deprecated torch.set_default_tensor_type (#115041)
Added deprecation note for torch.set_default_tensor_type. Updated docs that referenced this method.
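A minimal sketch of the migration path mentioned in the updated docs:

```python
import torch

# torch.set_default_tensor_type(torch.DoubleTensor)  # deprecated pattern
torch.set_default_dtype(torch.float64)  # replaces the dtype half of the old call
# torch.set_default_device("cuda")      # replaces the device half, if needed
print(torch.empty(2).dtype)             # torch.float64
```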

Fixes #113646.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115041
Approved by: https://github.com/janeyx99
2023-12-07 16:17:36 +00:00
441ecf03e2 Update gloo submodule (#115158)
Updates to pull ROCm 6.0 related changes and few minor updates in gloo.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115158
Approved by: https://github.com/jeffdaily, https://github.com/malfet
2023-12-07 15:55:08 +00:00
cyy
7b8084d1c6 [5/N] Fixes clang-tidy warnings in c10/core/*.h (#115232)
This PR continues to fix clang-tidy warnings for headers in c10/core.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115232
Approved by: https://github.com/Skylion007
2023-12-07 15:48:03 +00:00
d08b20d534 Update FlashAttention to v2.3.6 (#115313)
# Summary
This PR updates the FlashAttention code from:
02ac572f3f.
Or Tag 2.3.2

To 92dd5703ec

Or tag 2.3.6.

I also think this should be cherry-picked into the 2.2.0 release, since there was a temporary ~15% perf regression for causal masking. It is not technically a regression, since this Flash version wasn't released yet, but it would be nice to have in the release.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115313
Approved by: https://github.com/Skylion007
2023-12-07 15:47:16 +00:00
78b945484b [c10d] Extend NCCL communicator splitting to more use cases (#114916)
Previously we could only use `ncclCommSplit` when we knew all backends were connected on all shards (due to the need to perform a NOCOLOR split), which in practice meant we could only use it for subgroups that were copies of the entire world.

This change allows for specifying a bound device id to `init_process_group` which tells the pg and its backends that the specified device, and the specified device only, will be associated with this rank.

This guarantee lets us do an early connect (which we could not previously do due to how ProcessGroupNCCL infers devices based on tensors and not the rank number).  And by doing the early connect, we have the guarantee ranks are connected and can perform nocolor splits when needed.
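A minimal sketch of the opt-in (the `device_id` keyword name is assumed from the eventual public API):

```python
import os
import torch
import torch.distributed as dist

local_rank = int(os.environ.get("LOCAL_RANK", "0"))
# Binding this rank to one device lets the PG connect eagerly and use
# ncclCommSplit for subgroups instead of NOCOLOR splits.
dist.init_process_group("nccl", device_id=torch.device(f"cuda:{local_rank}"))
```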

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114916
Approved by: https://github.com/kwen2501
2023-12-07 15:13:01 +00:00
a6736ac851 Add call to run_tests for a few tests (#115097)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115097
Approved by: https://github.com/wconstab, https://github.com/fduwjj
2023-12-07 08:27:40 +00:00
3c882925da Make subclass type instances constants (like UserDefinedClasses) (#115323)
As title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115323
Approved by: https://github.com/oulgen
2023-12-07 08:10:59 +00:00
5e3631db31 [DTensor] force re-compute sharding when normalized_shape differs in fwd layer norm (#115250)
**Summary**:
#114174 did not test the case where `elementwise_affine=False` (i.e. `weight` and `bias` are `None`), and this test would fail due to cached sharding propagation. The difference in sharding prop between these cases is that when `weight` and `bias` are None, the forward layer norm op is recognized as a "static shape op" and `propagate_op_sharding` is applied rather than `propagate_op_sharding_non_cached`. A fix is to force re-computing sharding when `normalized_shape` changes, by setting the op schema's `RuntimeSchemaInfo.static_argnum` to include `normalized_shape` (i.e. 1)

**Test**:
pytest test/distributed/_tensor/test_math_ops.py -s -k layer_norm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115250
Approved by: https://github.com/wanchaol
2023-12-07 07:44:06 +00:00
622688fab9 [export] Fix graph output mismatch issue with constant outputs. (#115280)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115280
Approved by: https://github.com/tugsbayasgalan
2023-12-07 06:11:08 +00:00
e1f159e6b2 Remove redundant API named is_int_list (#115136)
Fixes #114933

As the title states.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115136
Approved by: https://github.com/zou3519
2023-12-07 04:55:13 +00:00
5309ac1b98 Add test case to prove non-strict export supports external call (#115245)
Current non-strict test cases (added in #114697) are already supported by strict mode, so they can't demonstrate the incremental value of non-strict mode. How about adding test cases that fail in strict mode but pass in non-strict mode?

Test Plan:
python test/export/test_export.py -k test_external_call_non_strict_real_tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115245
Approved by: https://github.com/tugsbayasgalan, https://github.com/zhxchen17
2023-12-07 04:51:15 +00:00
a93b9ee9d8 [quant][be] Add a test for per channel quant for groupwise conv (#115224)
Summary:
just making sure this works

Test Plan:
python test/test_quantization.py -k test_groupwise_per_channel_quant

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115224
Approved by: https://github.com/andrewor14
2023-12-07 04:46:20 +00:00
b7eb9b1e7e [Autotune] Enable register pressure handling logic for H100. (#115295)
I have seen the register pressure handling logic help performance on H100 for a couple of kernels. Also, my local runs of Huggingface and timm_models both show neutral results.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115295
Approved by: https://github.com/jansel
2023-12-07 04:37:44 +00:00
f55ab176fc [OAT] move matmul precision out of system info (#115242)
Summary: move matmul precision out of the system info (system hash) and into the cache in preparation for switching precisions during compile

Test Plan: CI

Reviewed By: jansel

Differential Revision: D51442000

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115242
Approved by: https://github.com/jansel
2023-12-07 04:30:06 +00:00
7ec145bfed [Quant] [PT2] Fix XNNPACKQuantizer set_module_type issue (#115252)
**Summary**
Fix the issue https://github.com/pytorch/pytorch/issues/115251; the root cause is that we passed the `filter_fn` parameter of `find_sequential_partitions` in the wrong position. Use a keyword argument to fix this issue.

**Test Plan**
```
python -u -m pytest -s -v test_quantization.py -k test_set_module_type_case_2
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115252
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-12-07 03:08:20 +00:00
6c0a4ced53 [export] Add math.* ops to pass base (#115271)
Fixes https://github.com/pytorch/pytorch/issues/115209

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115271
Approved by: https://github.com/ydwu4
2023-12-07 02:47:04 +00:00
d7160c9223 Handle potential ValueError exception when stringifying signals (#114696)
On some systems it is possible to receive a signal that does not have a name.  Rare, but possible.  This prevents our error handler from crashing and instead properly reports the signal.
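A minimal sketch of the defensive pattern (hypothetical helper, not the actual elastic code):

```python
import signal

def signal_name(signum: int) -> str:
    try:
        return signal.Signals(signum).name
    except ValueError:  # some signal numbers have no named Signals member
        return f"<Unknown signal {signum}>"

print(signal_name(int(signal.SIGTERM)))  # SIGTERM
print(signal_name(50))                   # <Unknown signal 50> on most platforms
```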

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114696
Approved by: https://github.com/xmfan
2023-12-07 02:10:30 +00:00
ac7d14baad Revert "[C10D] Use future for flight recorder dump (#115176)"
This reverts commit 0e07e3dbe434ce31a5aea634628c7d39747f265f.

Reverted https://github.com/pytorch/pytorch/pull/115176 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but the test_timeout_dumps is failing in trunk 0e07e3dbe4 ([comment](https://github.com/pytorch/pytorch/pull/115176#issuecomment-1844076455))
2023-12-07 02:09:58 +00:00
3a18211622 Guard on subclass inner tensors (#114965)
This PR introduces guarding on subclass inner tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114965
Approved by: https://github.com/voznesenskym
ghstack dependencies: #114311, #115212
2023-12-07 01:47:48 +00:00
c163b3c035 [export] Remove runtime assertion pass (#115196)
Reland of https://github.com/pytorch/pytorch/pull/111949/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115196
Approved by: https://github.com/avikchaudhuri
2023-12-07 01:44:11 +00:00
73c0035160 Add reset_storage method to FunctionalTensorWrapper (#115235)
In certain edge cases when using lazy tensors, the base tensor stored in the `FunctionalStorageImpl` and the `value_` tensor stored in the `FunctionalTensorWrapper` diverge. For instance, take this simple example
```python
class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(4, 2, bias=False)

    def forward(self, x):
        return x @ self.fc1.weight.transpose(0, 1)

with torch.device("lazy"):
    model = Model()

    x = torch.ones(4)
    out = model(x)
```
The call to `transpose` on the lazily initialized weight `fc1.weight` applies a view op on the functional tensor, which only gets propagated to the functional tensor wrapper and not the base tensor in the storage, thus causing them to diverge.

To fix this behaviour, we need to reset the functional tensor's storage. To facilitate this, we add a `reset_storage` method to `FunctionalTensorWrapper` which clears away the old storage and view metas.

CC: @behzad-a @GlebKazantaev @wconstab @bdhirsh
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115235
Approved by: https://github.com/bdhirsh
2023-12-07 01:32:01 +00:00
cyy
4e9fe496cd Remove c10::either (#112733)
Time to remove it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112733
Approved by: https://github.com/albanD
2023-12-07 01:31:53 +00:00
240f4b2d25 make __lookup_backend return None when cache misses (#114766)
Fixes #114674. The error occurs because cached_backends is a thread-local object; when it's accessed from another thread, we get a cache miss. The naive fix is to just return None and recompile on a cache miss. This could also be related to making dynamo more thread-safe, but I'm not sure whether there is an ongoing effort or not.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114766
Approved by: https://github.com/IvanYashchuk, https://github.com/Neilblaze, https://github.com/jansel
2023-12-07 00:25:01 +00:00
7457a5f4be [inductor] adapt to the get_max_simd_tflops Triton API change (#115288)
Differential Revision: D51907617

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115288
Approved by: https://github.com/hl475, https://github.com/chenyang78
2023-12-07 00:22:06 +00:00
ae5365819d [ONNX] Extend test_fx_op_consistency.py to cover ExportedProgram model type (#114886)
This PR covers `ExportedProgram` to `test_fx_op_consistency.py`, which helps us identify the necessary but missing io_steps.
Next, we should refactor the tests to actually cover all ops supported by registry.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114886
Approved by: https://github.com/thiagocrepaldi
2023-12-07 00:03:23 +00:00
3642f29a64 DistributedDataParallel._post_forward, fix return (#114678)
Fix `return` in case of `_delay_all_reduce_all_params`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114678
Approved by: https://github.com/Skylion007, https://github.com/fegin
2023-12-06 23:44:52 +00:00
0e07e3dbe4 [C10D] Use future for flight recorder dump (#115176)
Replaces the "always sleep 30 sec before abort" with "wait up to 30 sec
for the future to complete then abort".  The difference in this case is
the abort happens as soon as the dump finishes up to a maximum, instead
of always waiting the maximum.

Allows multiple calls to dump, which will be serialized.

Renames `tryWriteDebugInfo` to `launchAsyncDebugDump` in spirit of the
change to support more than one launch and to always launch rather than
only launching on the first call.

Adds a test for dumping on timeout.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115176
Approved by: https://github.com/zdevito
2023-12-06 23:42:19 +00:00
0757e2ba84 [aotautograd] Fix an output shape error when inputs are aliased (#115279)
Summary: Per https://github.com/pytorch/pytorch/issues/97083, when an output
is marked as OutputType.is_input but a synthetic base is constructed
because of aliased inputs, we may need to update the output type to
OutputType.alias_of_input.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115279
Approved by: https://github.com/bdhirsh
2023-12-06 23:10:21 +00:00
7e0e124a5d Automated submodule update: FBGEMM (#115103)
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: dbc3157bf2

Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115103
Approved by: https://github.com/malfet
2023-12-06 22:47:40 +00:00
83cb6a75ad [dynamo] add list iterator contains (#115237)
Fixes https://github.com/pytorch/pytorch/issues/115236

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115237
Approved by: https://github.com/jansel
2023-12-06 22:26:16 +00:00
71bf4f3b87 [CI] Add torch/_functorch/_aot_autograd to auto-label rule (#115283)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115283
Approved by: https://github.com/bdhirsh
2023-12-06 20:07:53 +00:00
1489e4bcf3 [Quant] [PT2] Enable batchnorm in _move_exported_model_to_eval (#114547)
**Summary**
Add standalone batchnorm into `_move_exported_model_to_eval` to move it from training mode into eval mode

**Test Plan**
```
python -m pytest test_mkldnn_pattern_matcher.py -k test_qat_bn_conv2d
python -u -m pytest -s -v test_quantize_pt2e.py -k test_bn_move_exported_model_to_eval
```

Differential Revision: [D51853407](https://our.internmc.facebook.com/intern/diff/D51853407)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114547
Approved by: https://github.com/jgong5, https://github.com/andrewor14
2023-12-06 19:51:22 +00:00
c99db5617a Introduce general metadata cache to jagged layout NestedTensor (#115212)
Slight refactor to:
* Lazily compute the min / max seq_len used for flash; this avoids unnecessary graph breaks / specialization when we're not accessing these.
* Store min / max seq_len in a general `metadata_cache`; condensing these should make it easier to avoid specializing on these and others we may add in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115212
Approved by: https://github.com/soulitzer, https://github.com/ani300
ghstack dependencies: #114311
2023-12-06 19:40:35 +00:00
b6de337d16 [funcol] a few optimizations to funcol (#113324)
Apply a few optimizations to funcol:

- For allgather on a non-0 dim, the resulting tensor already needs to access
data in order to do torch.cat, so we sync wait here so that we don't
need to go through ACT dispatch for chunk + cat altogether.
- Add fast-return logic for aten.view, as it's a commonly hit op for
view-related ops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113324
Approved by: https://github.com/XilunWu
2023-12-06 19:25:35 +00:00
2cf0cf8137 [dynamo / DDP] - lazily compile submodules - to propagate real tensor strides to backend compiler (#114154)
Fixes https://github.com/pytorch/pytorch/issues/113812, https://github.com/pytorch/pytorch/issues/102591, Probably fixes: https://github.com/pytorch/pytorch/issues/113740, https://github.com/pytorch/pytorch/issues/113786, https://github.com/pytorch/pytorch/issues/113788

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114154
Approved by: https://github.com/wconstab, https://github.com/yf225
2023-12-06 18:50:14 +00:00
967863d91d [export][refactor][3/n] Move unlift to separate file (#114787)
Differential Revision: [D51823960](https://our.internmc.facebook.com/intern/diff/D51823960)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114787
Approved by: https://github.com/ydwu4
ghstack dependencies: #114764, #114768
2023-12-06 16:46:47 +00:00
0ab57ee7ea [export][refactor][2/n] Move tracing logic (#114768)
2/n of refactoring export code:

* Moved tracing logic in torch/_export/__init__.py to torch/export/_tracer.py

Differential Revision: [D51823961](https://our.internmc.facebook.com/intern/diff/D51823961)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114768
Approved by: https://github.com/ydwu4
ghstack dependencies: #114764
2023-12-06 16:46:47 +00:00
53bf8cfcf9 [export][refactor][1/n] Move dynamic shapes logic (#114764)
1/n of refactoring export code:
* Moved dynamic shapes/constraints/dynamic_dims logic in torch/_export/__init__.py and torch/export/__init__.py to torch/export/dynamic_shapes.py

Differential Revision: [D51823962](https://our.internmc.facebook.com/intern/diff/D51823962)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114764
Approved by: https://github.com/ydwu4
2023-12-06 16:46:38 +00:00
5f939e32e3 [CI] Log load_model failures in csv (#114784)
Summary: Right now when load_model fails (either because of loading error or validation eager run failure), the result won't be logged in generated csv files. Let's log them in csv so that they are monitored by the expected results checking.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114784
Approved by: https://github.com/malfet
2023-12-06 15:19:16 +00:00
67c8ad7285 Fix autograd.Function x enum input x torch.compile (#115206)
Fixes https://github.com/pytorch/pytorch/issues/114777. We treat Enums
like we do ConstantVariable.

Test Plan:
New test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115206
Approved by: https://github.com/yanboliang
ghstack dependencies: #115185, #115186, #115187
2023-12-06 15:18:25 +00:00
233ce0d24b Support GPU annotations for auto-trace jobs, similar to on-demand support (#114638)
Summary: When using auto_trace, gpu_user_annotation is not shown in the results. Fixing this by including `GPU_USER_ANNOTATION` in `kCudaTypes`.

Differential Revision: D51597995

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114638
Approved by: https://github.com/aaronenyeshi
2023-12-06 09:38:13 +00:00
d4c79a3078 Add an attention bias subclass for a lower right causal masking (#114823)
# Summary
This PR introduces a new Tensor subclass that is designed to be used with torch.nn.functional.scaled_dot_product_attention. Currently we have a boolean `is_causal` flag that allows users to do do causal masking without the need to actually create the "realized" attention bias and pass into sdpa. We originally added this flag since there is native support in both fused kernels we support. This provides a big performance gain ( the kernels only need to iterate over ~0.5x the sequence, and for very large sequence lengths this can provide vary large memory improvements.

The flag was introduced early on in the kernel development, and at the time it implicitly meant "upper_left" causal attention. This distinction only matters when the attention_bias is not square. For a more detailed breakdown see: https://github.com/pytorch/pytorch/issues/108108. The kernels' default behavior has since changed, largely due to the rise of autoregressive text generation, and unfortunately changing it here would lead to a BC break. In the long term it may actually be beneficial to change the default meaning of `is_causal` to represent lower_right causal masking.

The larger theme, though, is laid out here: https://github.com/pytorch/pytorch/issues/110681. The thesis is that there is a lot of innovation in SDPA revolving around the attention_bias being used. This is the first of hopefully a few more attention_biases that we would like to add. The next interesting one would be `sliding_window`, which is used by the popular Mistral model family.

Benchmark results are below. I improved the meff_attention perf, hence the slightly decreased max speedup.
```Shell
+---------+--------------------+------------+-----------+-----------+-----------+-----------+----------------+----------+
|  Type   |      Speedup       | batch_size | num_heads | q_seq_len | k_seq_len | embed_dim |     dtype      | head_dim |
+---------+--------------------+------------+-----------+-----------+-----------+-----------+----------------+----------+
| Average | 1.2388050062214226 |            |           |           |           |           |                |          |
|   Max   | 1.831672915579016  |    128     |    32     |   1024    |   2048    |   2048    | torch.bfloat16 |    64    |
|   Min   | 0.9430534166730135 |     1      |    16     |    256    |    416    |   2048    | torch.bfloat16 |   128    |
+---------+--------------------+------------+-----------+-----------+-----------+-----------+----------------+----------+
```
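
For illustration, a usage sketch; the `causal_lower_right` helper name is an assumption here (it is the name this subclass eventually shipped under, not necessarily the import path at the time of this PR):
```python
import torch
from torch.nn.attention.bias import causal_lower_right
from torch.nn.functional import scaled_dot_product_attention

# Non-square attention: 4 queries attend over 8 keys (e.g. decoding with a
# KV cache), where upper-left vs. lower-right alignment actually differs.
q = torch.randn(1, 8, 4, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn(1, 8, 8, 64, device="cuda", dtype=torch.bfloat16)
v = torch.randn(1, 8, 8, 64, device="cuda", dtype=torch.bfloat16)

# The bias is never materialized; sdpa recognizes the subclass and picks the
# fused kernel with lower-right causal alignment.
bias = causal_lower_right(q.size(-2), k.size(-2))
out = scaled_dot_product_attention(q, k, v, attn_mask=bias)
```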

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114823
Approved by: https://github.com/cpuhrsch
2023-12-06 08:29:26 +00:00
4a9fb9832a Assert that output could only be the last node of the FX graph (#115179)
Test Plan: unit tests

Differential Revision: D51856848

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115179
Approved by: https://github.com/Chillee
2023-12-06 08:17:16 +00:00
fcf6a76108 [aot_inductor][pass] fuse parallel linear based on pre grad aten IR (#114776)
Summary:
This work is for PT2 inference. Since the IR from Export will change to pre-grad aten IR in a few months, we need to start this work now. Here is what I do in this diff:
1) Copy the fuse parallel linear pass to the fb folder and adapt it to aten IR. We still want to keep the original `group_batch_fusion.py` because it is still used in training. At some point in the future, when PT2 training decides to retire the torch IR based group_batch_fusion, we can remove it. But right now it's better to keep the torch IR and aten IR versions separate.

Our plan is to gradually transform the existing and important pre-grad passes to aten IR based passes.

Differential Revision: D51017854

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114776
Approved by: https://github.com/zhxchen17
2023-12-06 05:48:20 +00:00
cyy
d250b2158e [4/N] Fixes clang-tidy warnings in header files (#115163)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115163
Approved by: https://github.com/Skylion007
2023-12-06 05:00:01 +00:00
f4c67ffff4 [dynamo] Improve support for dynamic shapes str.format and _assert (#115203)
This removes a graph break in vision_maskrcnn.
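
A minimal sketch of the pattern this targets (the exact message and shapes are made up):
```python
import torch

@torch.compile(dynamic=True)
def f(x):
    # x.size(0) is a SymInt under dynamic shapes; formatting it into the
    # assert message previously caused a graph break
    torch._assert(x.size(0) >= 2, "expected at least 2 rows, got {}".format(x.size(0)))
    return x.sum(dim=0)

f(torch.randn(4, 8))
```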

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115203
Approved by: https://github.com/yanboliang
2023-12-06 04:54:45 +00:00
4ff4e06b5b Update xla pin (#115211)
This updates the pin past 062aa91a9c so the flaky test can be skipped.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115211
Approved by: https://github.com/malfet
2023-12-06 04:52:37 +00:00
534f25887b [inductor] avoid inplace for ComplexView (#115166)
Fix https://github.com/pytorch/pytorch/issues/115071
A regression introduced by https://github.com/pytorch/pytorch/pull/112875/files#diff-d2539c9c8dc6a3d7e457767a880612e96d3c85752a77ead49a9e4e00a3e4c3c7R335

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115166
Approved by: https://github.com/Skylion007
2023-12-06 04:52:28 +00:00
490f2d7570 Skip privateuse1's checkZeroPoints (#114117)
We want to use ``quantize_per_channel`` to create a quantized tensor, but we found that ``checkZeroPoints`` for ``privateuse1`` backend failed.

``quantize_tensor_per_channel_affine`` will ``checkZeroPoints`` for all backends except ``CUDA``:
140c54e6cc/aten/src/ATen/native/quantized/AffineQuantizer.cpp (L162-L164)

However, our ``privateuse1`` backend will get a segmentation fault if we try to cast our data to int64_t in ``checkZeroPoints``:
140c54e6cc/aten/src/ATen/native/quantized/AffineQuantizer.cpp (L82-L88)

Can we skip ``privateuse1``'s ``checkZeroPoints`` and perform this check in the actual device function instead? What do you think?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114117
Approved by: https://github.com/jerryzh168
2023-12-06 04:44:49 +00:00
acdd06e00f [executorch hash update] update the pinned executorch hash (#115215)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned executorch hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115215
Approved by: https://github.com/pytorchbot
2023-12-06 04:33:25 +00:00
a548e80536 Use test_vulkan to validate run_test without boto3 (#115233)
Since `test_weak` can undergo changes while `test_vulkan` is a no-op for CPU builds, use the latter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115233
Approved by: https://github.com/suo
2023-12-06 03:45:52 +00:00
2bff36bb0e [c10d] Change set timeout API name to _set_default_timeout (#115197)
Somehow the feedback did not show up; this PR addresses the comment in https://github.com/pytorch/pytorch/pull/115141.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115197
Approved by: https://github.com/XilunWu, https://github.com/wconstab
2023-12-06 03:38:39 +00:00
b56b002842 Fix NULL dereference in binary CPU ops (#115183)
Targeted fix for https://github.com/pytorch/pytorch/issues/113037

A more fundamental fix, where those functions are not even called for empty tensors, is coming later.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115183
Approved by: https://github.com/drisspg, https://github.com/atalman, https://github.com/huydhn
2023-12-06 03:37:47 +00:00
892a14a450 [vision hash update] update the pinned vision hash (#111408)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111408
Approved by: https://github.com/pytorchbot
2023-12-06 03:25:52 +00:00
ef6cbf4e1f remove myself from CODEOWNERS (#115230)
Trying to rein in my notifications ;-)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115230
Approved by: https://github.com/malfet
2023-12-06 02:50:50 +00:00
b0b190f7c0 More descriptive error message for unsupported inputs to HOP (#115187)
Test Plan:
See updated tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115187
Approved by: https://github.com/ydwu4, https://github.com/yanboliang
ghstack dependencies: #115185, #115186
2023-12-06 01:29:03 +00:00
b5b011a5cd Expand input types for HOPs that use manually_set_subgraph_inputs=False (#115186)
Previously we only supported Tensor, Constant, and SymNode. We lift
that restriction (there's not really a good reason for it). HOPs like
torch.cond, torch.map already do input validation (those are the ones
that can only support Tensor, Constant, and SymNode inputs).

Test Plan:
New test for `wrap`, which is a HOP that has
manually_set_subgraph_inputs=False

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115186
Approved by: https://github.com/ydwu4, https://github.com/yanboliang
ghstack dependencies: #115185
2023-12-06 01:29:03 +00:00
bc46347152 Refactor how HOPs create new args to subgraphs (#115185)
This PR combines the logic for Tensor and SymNode.

Test Plan:
- Existing tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115185
Approved by: https://github.com/ydwu4, https://github.com/yanboliang
2023-12-06 01:29:03 +00:00
f6291a5e93 [Quant] [Inductor] Enable QLinear weight prepack when input dimension size exceeds 2 (#113928)
**Summary**
Enable the qlinear weight prepack when the input dimension size exceeds 2. There are extra reshape nodes before and after the `addmm` or `mm` node if the input dimension size exceeds 2.

**Test Plan**
```
python -m pytest test_mkldnn_pattern_matcher.py -k input_dim_exceeds_2
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113928
Approved by: https://github.com/jgong5, https://github.com/eellison
ghstack dependencies: #113733, #113912
2023-12-06 01:24:15 +00:00
6d0cf26c3a [Quant] [Inductor] Enable Dequant Promotion when Linear input dimension size exceeds 2 (#113912)
**Summary**
When decomposing `Linear` to `addmm` or `mm` within Inductor, if the input dimension size exceeds 2, `reshape` nodes are introduced to convert the input into a 2-dimensional form before and after the `addmm` or `mm` node. It is essential to identify and match this pattern during quantization for dequantization promotion. For instance,
```
        #            quant
        #      + - - - | - - - +
        #      |    dequant    |
        #      |       |       |
        #      |    reshape    |
        #      |    /     \    |
        #      |  node1  node2 |
        #      + - | - - - | - +
        #        reshape reshape
        #      + - | - - - | - +
        #        quant    quant
```
In this PR, we mainly do 2 things:

- Extend support for the dequantization pattern in QLinear when the input dimension size exceeds 2.
- Revise the implementation of the dequant promotion pass, as it now needs to accommodate the matching of four different patterns.

**Test Plan**
```
python -m pytest test_mkldnn_pattern_matcher.py -k input_dim_exceeds_2
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113912
Approved by: https://github.com/jgong5, https://github.com/eellison
ghstack dependencies: #113733
2023-12-06 01:20:36 +00:00
4a624d1f8a [Quant] [PT2] Enable QLinear input with multi dims (#113733)
**Summary**
In the previous QLinear implementation, it was assumed that inputs have a dimension of 2. In this update, we have modified QLinear to accept inputs with a dimension greater than 2, incorporating input and output reshaping accordingly.

**Test Plan**
```
python -u -m pytest -s -v test_quantized_op.py -k test_qlinear_pt2e
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113733
Approved by: https://github.com/jgong5, https://github.com/eellison
2023-12-06 01:16:51 +00:00
b8ce05456c enable cat for cuda bits types (#115044)
It was already working for CPU, so this brings parity.
Also, this slightly reduces the number of compiled kernels by using OpaqueType.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115044
Approved by: https://github.com/malfet
2023-12-06 00:05:18 +00:00
b9c4fb68c5 [ONNX][Bench] Fix model name retrieval and remove unused argument (#115108)
There might have been some upstream updates; the previous hack started failing to pick up model names. This updates it to use the other, more appropriate variable.
Also fix a bug with an unused argument that was supposed to be removed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115108
Approved by: https://github.com/thiagocrepaldi
2023-12-05 23:55:12 +00:00
ae457a2c4a [PyTorch] Change test_aot_inductor CPU test failures syntax (#115180)
This portion of D50416438 is extremely subject to merge conflicts. It can also be safely landed without full CI round trip because it changes just one test file that we can simply run to make sure it works.

Differential Revision: [D51856943](https://our.internmc.facebook.com/intern/diff/D51856943/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115180
Approved by: https://github.com/mikekgfb, https://github.com/desertfire
2023-12-05 23:55:08 +00:00
01ec71e466 [NFC][Autotune] Use device_prop.regsPerMultiprocessor instead of hardcoded reg number. (#115094)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115094
Approved by: https://github.com/jansel
2023-12-05 23:49:46 +00:00
1102d37958 remove aot_config.keep_inference_input_mutations from assert_functional_graph (#115195)
We technically allow backends to aot_autograd to pass a config saying "yes I am ok with seeing input mutations in my graph".

With https://github.com/pytorch/pytorch/pull/112906 though, there can be input mutations that show up in the backward (which we need to handle for correctness) and that are a large pain to keep out of the graph. The meta-point is that it's been ~a year since we added the config, and it almost always makes sense for backends to support input mutations for performance reasons (inductor does). So I just allow these input mutations in the graph in this rare backward situation, even if the backend didn't explicitly set the config.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115195
Approved by: https://github.com/drisspg
2023-12-05 23:36:37 +00:00
7aac689b19 [inductor] Add ir.Scan and lower aten.cumsum on CUDA (#106581)
This adds the `ir.Scan` node (currently only supported on CUDA) which re-uses the existing reduction kernel machinery to support different kinds of non-pointwise ops. Just like reductions it supports prologue and epilogue fusions and has both persistent and non-persistent kernel generation.

Currently this doesn't support the equivalent of `Reduction.create_multilayer` and will instead fall back to eager in those cases. This is because splitting into multiple kernel invocations ends up being far slower than cub's single kernel strategy which matches the performance of a copy kernel.

Fixes https://github.com/pytorch/pytorch/issues/93631
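
No user-facing change is needed; a sketch of code that now lowers through the new node on CUDA:
```python
import torch

@torch.compile
def prefix_sum(x):
    # on CUDA this now lowers through ir.Scan (reusing the reduction kernel
    # machinery, with prologue/epilogue fusion) instead of falling back to eager
    return torch.cumsum(x, dim=-1)

y = prefix_sum(torch.randn(4096, 4096, device="cuda"))
```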

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106581
Approved by: https://github.com/lezcano, https://github.com/atalman
2023-12-05 23:31:49 +00:00
d78fe039eb Introduce OptimizerInfos + add a test_errors (#114178)
Introduce OptimizerInfos + use them to refactor out the error testing.

Why OptimizerInfos?
- cleaner, easier way to test all configs of optimizers
- would plug in well with devicetype to auto-enable tests for devices like MPS, meta
- would allow for more granular testing. currently, lots of functionality is tested in `_test_basic_cases` and some of that should be broken down more.

What did I do for error testing?
- I moved out some error cases from `_test_basic_cases` into a new test_errors parametrized test.
- The new test has to live in TestOptimRenewed (bikeshedding welcome) because the parametrized tests need to take in device and dtype and hook correctly, and not all tests in TestOptim do that.
- TestOptimRenewed also is migrating to the toplevel test/test_optim.py now because importing TestOptimRenewed does not work (because of test instantiation, TestOptimRenewed gets replaced with TestOptimRenewedDevice for CPU, CUDA, and whatever other device).

Is there any change in test coverage?
- INCREASE: The error case where a single Parameter (vs a container of them) is passed in has now been expanded to all optims instead of only LBFGS
- DECREASE: Not much. The only thing is we no longer test two error cases for foreach=True AND foreach=False, which I think is redundant. (Highlighted in comments)

Possible but not urgent next step: test ALL possible error cases by going through all the constructors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114178
Approved by: https://github.com/albanD
2023-12-05 22:58:36 +00:00
99257002fa Extend auto_functionalized to support ops that return Tensors (#115135)
We can auto-functionalize operators that mutate their inputs as long as
the outputs of the operator do not alias their inputs. The user needs
to provide an abstract impl for the operator if it has non-trivial
returns.
- We update can_auto_functionalize(op) to include ops that return (but
  do not alias) Tensors
- We update auto_functionalized(op, mutated_args_names, kwargs) to
  return (out, mutated_args), where `out = op(**kwargs)` and
  `mutated_args` are the new values of the inputs that would have been
  mutated.

Test Plan:
- new test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115135
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114955, #114956, #115134
2023-12-05 22:43:06 +00:00
d0aad93249 Refactor can_auto_functionalize (#115134)
In preparation for the next PR up in the stack, which is going to update
"can_auto_functionalize" to support more operators than just ones that
return nothing. We are unable to auto-generate FakeTensor kernels for
operators that do not return nothing, but we are able to generate
functionalization kernels for operators that return something.

Test Plan:
Existing tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115134
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114955, #114956
2023-12-05 22:43:06 +00:00
4620170008 [Dynamo] Revert multiple PRs since they triggered compilation stuck internally (#115126)
Revert the following PRs to mitigate an internal compilation hang:
#113432
#114016
#114507
#114196
#114739
#114669

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115126
Approved by: https://github.com/xush6528
2023-12-05 22:35:37 +00:00
80527c0cf2 [AOTInductor] Double buffering for Weights (#114446)
Summary:
This adds a function to the model container that does weight swapping with double buffering.

There are 2 parts to double buffering:
a) writing constants into the inactive buffer
b) swapping the active buffer

For (a), we write the constants into the buffer that's currently not in use, and store the information in both the constants map and the corresponding constant array to read.
For (b), we obtain the lock, activate the constant map/constant array that was inactive, and flag the one that was in use as inactive.

Test Plan:
test/cpp/aot_inductor/test.cpp

Differential Revision: [D51543732](https://our.internmc.facebook.com/intern/diff/D51543732)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114446
Approved by: https://github.com/chenyang78, https://github.com/eellison
2023-12-05 22:31:56 +00:00
12085914b8 Replace bsr_dense_mm triton kernel with bsr_dense_addm triton kernel (#115030)
The `bsr_dense_addmm` triton kernel introduced in https://github.com/pytorch/pytorch/pull/114595 is a generalization of the `bsr_dense_mm` triton kernel and a more efficient version of it, because it uses an extra kernel parameter `SPLIT_N` that has a notable effect on performance when the r.h.s. operand has a larger number of columns.

This PR eliminates the `bsr_dense_mm` triton kernel in favor of using `bsr_dense_addmm` triton kernel.

The performance increase of `bsr_dense_mm` is as follows (float16, `NVIDIA A100-SXM4-80GB`):
- with 16x16 blocks, the average/maximal speed up is 50/71 %
- with 32x32 blocks, the average/maximal speed up is 30/63 %
- with 64x64 blocks, the average/maximal speed up is 12/26 %
- with 128x128 blocks, the average/maximal speed up is 7/17 %
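
For context, a minimal sketch of a call that exercises these kernels; that `F.linear` with a half-precision BSR weight on CUDA routes through this triton path is my assumption, not something stated in this commit:
```python
import torch
import torch.nn.functional as F

# Build a BSR-layout weight with 32x32 blocks and run a half-precision linear.
w = torch.randn(2048, 2048, device="cuda", dtype=torch.bfloat16)
w_bsr = w.to_sparse_bsr(blocksize=(32, 32))
x = torch.randn(16, 2048, device="cuda", dtype=torch.bfloat16)
y = F.linear(x, w_bsr)  # assumed to dispatch to the triton BSR kernels
```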

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115030
Approved by: https://github.com/cpuhrsch
2023-12-05 22:29:24 +00:00
f35f52e4a6 Update auto_request_review.yml (#115182)
remove myself to avoid notification noise

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115182
Approved by: https://github.com/huydhn, https://github.com/albanD
2023-12-05 21:36:18 +00:00
f09e8381b7 [Inductor][fx pass] Fix a bug in batch linear fusion in the post grad (#115061) (#115131)
Summary:

As titled.

Test Plan:
```
buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/inductor:group_batch_fusion
```
Buck UI: https://www.internalfb.com/buck2/ab4b918c-9ffa-4d00-a747-880521a27851
Test UI: https://www.internalfb.com/intern/testinfra/testrun/16607023638890043
Network: Up: 11MiB  Down: 117MiB  (reSessionID-079402d0-8fd7-4797-9ed5-dd0f778dce1a)
Jobs completed: 189430. Time elapsed: 2:02.5s.
Cache hits: 99%. Commands: 77000 (cached: 76995, remote: 5, local: 0)
Tests finished: Pass 7. Fail 0. Fatal 0. Skip 0. Build failure 0

Reviewed By: mengluy0125

Differential Revision: D51796899

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115131
Approved by: https://github.com/mengluy0125
2023-12-05 21:20:17 +00:00
ab120e65fb Fix FSDP + TP state dict in param unflattening (#115105)
Summary:
This diff fixes the param unflattening when using FSDP together with TP. Currently we hardcode the `reshape_size` multiplier to 2, when it should instead be the size of the process group.

Before the fix, example exception: `shape '[257, 514]' is invalid for input of size 264196`, where the process group size is 4 instead of 2.

Test Plan:
**CI**:
CI test

**Unit test**:
`buck2 test mode/dev-nosan //caffe2/test/distributed/tensor/parallel:fsdp_2d_parallel`
- Passed

**Test model with WHEN**:
- Verified that checkpoint can be saved and resumed successfully;
- Verified the accuracy with window_ne, which is on-par with baseline.
https://pxl.cl/3Wp8w

Differential Revision: D51826120

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115105
Approved by: https://github.com/fegin
2023-12-05 21:19:56 +00:00
22704426c3 Expand dynamic dims support for traceable subclasses (#114311)
Continuation of #112185, following the design in this [doc](https://docs.google.com/document/d/1ipSxcTzEMMOAPvxP-YJlD5JBZZmIGgh8Q34ixtOUCRo).

Summary:
* Introduce `SubclassSymbolicPolicy` containing separate dynamic dim / constraint policies for the outer and inner tensors
    * Expand the automatic dynamic algorithm to recurse into inner tensors and produce one of these for a subclass instance
    * Maintain legacy behavior for subclasses by recursively calling `mark_dynamic()` on inner tensors *of the same dim as outer* when `mark_dynamic(outer, ...)` is called
    * Addresses this: 6a86cf00ad/torch/_dynamo/variables/builder.py (L1750)
* Add `outer_size` and `outer_stride` arguments to `__tensor_unflatten__()` so that you can find out what symbols were allocated for the outer size / stride (you are expected to return a tensor that compares equal to the outer symbols)
    * Signatures now:
    ```python
    # attrs is a list of inner tensor attributes on x; inner_tensor = getattr(x, attr)
    # ctx is anything useful for rebuilding the class we want to guard on
    attrs, ctx = x.__tensor_flatten__()
    ...
    # inner_tensors is a dict of {attr -> tensor}
    # ctx is taken unmodified from flattening and (eventually) guarded on
    # outer_size is the expected size of the output; possibly symbolic
    # outer_stride is the expected strides of the output; possibly symbolic
    y = MySubclass.__tensor_unflatten__(inner_tensors, ctx, outer_size, outer_stride)

    # at the __tensor_unflatten__() call-site in PT2, we assert y.shape == outer_size and y.stride() == outer_stride
    # the assert simplifies symbols when there are relationships between outer and inner symbols
    ```
    * Size info needed for `NestedTensor` at least, stride info needed for `DTensor` at least
    * Punting on `outer_storage_offset` because storage_offset handling is horribly broken in PT2 right now
* ~~Add new `__tensor_mark_dynamic__()` to allow overriding the behavior of mark_dynamic on a per-subclass basis~~ (booted to future work)
* ~~Add guards for tensor subclasses by calling `__tensor_flatten__()` in the guard to test equality on `ctx`~~
    * Now handled in #114469
* Next PR: add TENSOR_MATCH guards on inner tensors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114311
Approved by: https://github.com/ezyang, https://github.com/drisspg, https://github.com/voznesenskym, https://github.com/bdhirsh
2023-12-05 21:09:25 +00:00
259a99669d [NCCL flight recorder] Dump when writing to pipe (#115139)
If TORCH_NCCL_DUMP_ON_TIMEOUT is set, then along with producing a dump file when a timeout happens, you can trigger a dump by writing to the local pipe `<TORCH_NCCL_DEBUG_INFO_TEMP_FILE>_<rank>.pipe` (by default `/tmp/nccl_trace_rank_<rank>.pipe`).
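
A hypothetical trigger from Python on the affected host; the default prefix and the idea that any write acts as the trigger are assumptions:
```python
import os

rank = 0  # hypothetical: the rank whose watchdog should produce a dump
pipe = f"/tmp/nccl_trace_rank_{rank}.pipe"  # assumes the default file prefix
if os.path.exists(pipe):
    with open(pipe, "w") as f:
        f.write("\n")  # assumption: any write acts as the trigger
```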

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115139
Approved by: https://github.com/wconstab
2023-12-05 20:44:23 +00:00
5fdae89c03 [docs][aoti] Link to export docs in AOTI docs (#115088)
Context: https://fb.workplace.com/groups/1075192433118967/posts/1341833143121560/?comment_id=1341841786454029

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115088
Approved by: https://github.com/desertfire
2023-12-05 20:22:42 +00:00
a8bd593252 [c10d] Add _reset_nccl_collective_timeout so users can change timeout of a NCCL PG (#115141)
There are some use cases when users want to change the timeout for a NCCL process group in the middle of training. This PR enables it by adding a pybind api.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115141
Approved by: https://github.com/wconstab
2023-12-05 19:55:28 +00:00
85d4708512 HTA docs (#115060)
Added documentation for Holistic Trace Analysis

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115060
Approved by: https://github.com/aaronenyeshi
2023-12-05 19:38:09 +00:00
063423edf5 Revert "enable cat for cuda bits types (#115044)"
This reverts commit 4cf97c40f7145b1bd1ab76b2240327d7000c27d2.

Reverted https://github.com/pytorch/pytorch/pull/115044 on behalf of https://github.com/malfet due to This breaks ROCM ([comment](https://github.com/pytorch/pytorch/pull/115044#issuecomment-1841494814))
2023-12-05 19:37:25 +00:00
01afa54df5 [dynamo][FSDP] unit test: FSDP should not be lifted as fx graph attrs (#115112)
There was a SEV where FSDP modules were registered as graph attributes; this unit test prevents it from happening again.

without SEV fix: D48810186
```
python test/distributed/test_dynamo_distributed.py -k
test_fsdp_skip_register_attr_or_module

  File "/data/users/weif/pytorch/torch/_dynamo/repro/after_dynamo.py",
line 117, in debug_wrapper
    compiled_gm = compiler_fn(gm, example_inputs)
  File
"/data/users/weif/pytorch/test/distributed/test_dynamo_distributed.py", line 897, in debug_compiler
    self.assertFalse(name in node.name, f"FSDP module {name} should not
be registered as attributes")
torch._dynamo.exc.BackendCompilerFailed: backend='debug_compiler' raised:
AssertionError: True is not false : FSDP module l__self___net_0_weight should not be registered as attributes
```

with SEV fix: D48810186
```
python test/distributed/test_dynamo_distributed.py -k test_fsdp_skip_register_attr_or_module

Ran 1 test in 6.438s
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115112
Approved by: https://github.com/mlazos
2023-12-05 19:16:03 +00:00
4b8ddbbc7e [dynamo] Improve graph break message for copy.deepcopy (#115120)
I was curious what hf_T5_generate was trying to deepcopy, so I updated the error message:
Before:
```
STATS graph_break
  ("'skip function deepcopy in file /home/jansel/conda/envs/pytorch/lib/python3.10/copy.py'', skipped according skipfiles.SKIP_DIRS'", 3)
  ...
```
After:
```
STATS graph_break
  ('copy.deepcopy UserDefinedObjectVariable(GenerationConfig)', 3)
  ...
```

Related issue: #115122

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115120
Approved by: https://github.com/oulgen
ghstack dependencies: #115095, #115046, #115057, #115119
2023-12-05 19:01:31 +00:00
522bae20df [dynamo] Support any() on SymNodeVariable (#115119)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115119
Approved by: https://github.com/yanboliang
ghstack dependencies: #115095, #115046, #115057
2023-12-05 19:01:31 +00:00
88642d44d9 [dynamo] Add RestrictedListSubclassVariable (#115057)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115057
Approved by: https://github.com/yanboliang
ghstack dependencies: #115095, #115046
2023-12-05 19:01:23 +00:00
a97ed2470a [dynamo] Support hasattr on dataclass (#115046)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115046
Approved by: https://github.com/yanboliang
ghstack dependencies: #115095
2023-12-05 19:01:14 +00:00
aa70e31610 [dynamo] Fix MutableSideEffects returning alias (#115095)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115095
Approved by: https://github.com/yanboliang
2023-12-05 19:01:03 +00:00
5f89cedf9b Add note to set_default_device about functions with shared memory (#114825)
Fixes #114691

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114825
Approved by: https://github.com/mikaylagawarecki
2023-12-05 18:52:54 +00:00
a987ad3d89 [BE]: Update ruff to v0.1.7 (#115169)
Update ruff to v0.1.7 with the latest and greatest fixes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115169
Approved by: https://github.com/albanD
2023-12-05 18:50:11 +00:00
4c5fe66880 [DTensor][BE] fix bug in OpStrategy for Tuple output (#115161)
**Summary**:
DTensor sharding propagation returns a single `OpStrategy` object in the case of a Tuple of multiple DTensors with the same `placements`, and this object is later expanded into a tuple of `DTensorSpec`s. However, the expansion copied the object's reference instead of copying/creating new objects, and this led to a wrong-overriding issue in the Tensor Meta propagation logic.

**Test**:
pytest test/distributed/_tensor/test_math_ops.py
pytest test/distributed/_tensor/test_dtensor_ops.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115161
Approved by: https://github.com/wanchaol
2023-12-05 18:28:40 +00:00
c9853ccadc Relax tensor contiguity requirement for P2P ops (#114982)
I hit the following error when performing pipeline parallel for T5:
```
    return default_pg.send([tensor], dst, tag)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Tensors must be contiguous
```

In theory, we shouldn't require the tensors to be contiguous, especially for P2P ops, because we are just doing bit-wise "copy".

Thus, this PR relaxes the requirement and instead calls out that it is the user's responsibility to guarantee that the source and destination tensors have the same contiguity setting.
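
A minimal sketch, assuming a 2-rank process group has already been initialized:
```python
import torch
import torch.distributed as dist

# Both peers use a transposed (non-contiguous) view with matching contiguity.
t = torch.arange(32, dtype=torch.float32).reshape(4, 8).t()
if dist.get_rank() == 0:
    dist.send(t, dst=1)
else:
    buf = torch.empty(4, 8).t()
    dist.recv(buf, src=0)
```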

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114982
Approved by: https://github.com/H-Huang
2023-12-05 18:25:42 +00:00
daf89b4101 Update oneDNN submodule to v3.3.2 (#112700)
Update oneDNN submodule to v3.3.2.
Add a macro to check the version of `third_party/ideep`.
Since we have versioning now, the changes won't break any pipeline even if `third_party/ideep` is not updated at the same time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112700
Approved by: https://github.com/leslie-fang-intel, https://github.com/atalman
2023-12-05 17:51:55 +00:00
4cf97c40f7 enable cat for cuda bits types (#115044)
It was already working for CPU, so this brings parity.
Also, this slightly reduces the number of compiled kernels by using OpaqueType.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115044
Approved by: https://github.com/malfet
2023-12-05 17:14:42 +00:00
a827ac71f2 Revert "[DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#115099)"
This reverts commit eaa64339d640ed1d36520ada379213f8361be5ff.
2023-12-05 08:59:36 -08:00
0a9819e3e1 Prefer is_number over is_constant() (#114513)
`is_constant` tries really hard to check whether an expression is
constant. `is_number` is often enough. Note that `sympy.nan.is_number`
is true. The same holds for infinities.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114513
Approved by: https://github.com/peterbell10
2023-12-05 16:56:15 +00:00
5de0dff7ea Disable bugprone-unchecked-optional-access as it can cause clang-tidy to hang (#115124)
Let's see if it helps https://github.com/pytorch/pytorch/issues/114913

The issues on llvm are at https://github.com/llvm/llvm-project/issues/55530 and https://github.com/llvm/llvm-project/issues/69369.  In my CI test, I saw the following process hang:

```
/pytorch/pytorch/.lintbin/clang-tidy -p=/pytorch/pytorch/build --extra-arg -I/usr/lib/llvm-11/include/openmp --extra-arg -I/opt/conda/envs/py_3.9/include/python3.9 --extra-arg -I/pytorch/pytorch/third_party/pybind11/include --extra-arg -I/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11 --extra-arg -I/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11 --extra-arg -I/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/backward --extra-arg -I/usr/lib/llvm-14/lib/clang/14.0.0/include --extra-arg -I/usr/local/include --extra-arg -I/usr/include/x86_64-linux-gnu --extra-arg -I/usr/include /pytorch/pytorch/torch/csrc/autograd/python_nested_functions_manual.cpp
```

and the core dump matches the description found in https://github.com/llvm/llvm-project/issues/69369, showing it stuck in `clang::tidy::bugprone::UncheckedOptionalAccessCheck::check`:

```
#0  0x00000000030c7420 in clang::dataflow::WatchedLiteralsSolverImpl::updateWatchedLiterals() ()
#1  0x00000000030c6c2a in clang::dataflow::WatchedLiteralsSolverImpl::solve() && ()
#2  0x00000000030c6572 in clang::dataflow::WatchedLiteralsSolver::solve(llvm::DenseSet<clang::dataflow::BoolValue*, llvm::DenseMapInfo<clang::dataflow::BoolValue*, void> >) ()
#3  0x00000000030b3bd3 in clang::dataflow::DataflowAnalysisContext::querySolver(llvm::DenseSet<clang::dataflow::BoolValue*, llvm::DenseMapInfo<clang::dataflow::BoolValue*, void> >) ()
#4  0x00000000030b3ca5 in clang::dataflow::DataflowAnalysisContext::flowConditionImplies(clang::dataflow::AtomicBoolValue&, clang::dataflow::BoolValue&) ()
#5  0x00000000030b1213 in clang::dataflow::(anonymous namespace)::diagnoseUnwrapCall(clang::Expr const*, clang::Expr const*, clang::dataflow::Environment const&) ()
#6  0x00000000030b1357 in std::_Function_handler<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::CallExpr const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&), clang::dataflow::(anonymous namespace)::buildDiagnoseMatchSwitch(clang::dataflow::UncheckedOptionalAccessModelOptions const&)::$_7>::_M_invoke(std::_Any_data const&, clang::CallExpr const*&&, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&) ()
#7  0x00000000030b1292 in std::_Function_handler<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::Stmt const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&), clang::dataflow::MatchSwitchBuilder<clang::dataflow::Environment const, std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > >::CaseOf<clang::CallExpr>(clang::ast_matchers::internal::Matcher<clang::Stmt>, std::function<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::CallExpr const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&)>) &&::{lambda(clang::Stmt const*, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&)#1}>::_M_invoke(std::_Any_data const&, clang::Stmt const*&&, clang::ast_matchers::MatchFinder::MatchResult const&, clang::dataflow::Environment const&) ()
#8  0x00000000030b1995 in clang::dataflow::MatchSwitchBuilder<clang::dataflow::Environment const, std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > >::Build() &&::{lambda(clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&)#1}::operator()(clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&) const ()
#9  0x00000000030b170c in std::_Function_handler<std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > (clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&), clang::dataflow::MatchSwitchBuilder<clang::dataflow::Environment const, std::vector<clang::SourceLocation, std::allocator<clang::SourceLocation> > >::Build() &&::{lambda(clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&)#1}>::_M_invoke(std::_Any_data const&, clang::Stmt const&, clang::ASTContext&, clang::dataflow::Environment const&) ()
#10 0x00000000030a7c27 in clang::dataflow::UncheckedOptionalAccessDiagnoser::diagnose(clang::ASTContext&, clang::Stmt const*, clang::dataflow::Environment const&) ()
#11 0x0000000002931286 in std::_Function_handler<void (clang::Stmt const*, clang::dataflow::DataflowAnalysisState<clang::dataflow::NoopLattice> const&), clang::tidy::bugprone::analyzeFunction(clang::FunctionDecl const&, clang::ASTContext&)::$_0>::_M_invoke(std::_Any_data const&, clang::Stmt const*&&, clang::dataflow::DataflowAnalysisState<clang::dataflow::NoopLattice> const&) ()
#12 0x0000000002930b41 in clang::dataflow::runDataflowAnalysis<clang::dataflow::UncheckedOptionalAccessModel>(clang::dataflow::ControlFlowContext const&, clang::dataflow::UncheckedOptionalAccessModel&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> const&)>)::{lambda(clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&)#1}::operator()(clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&) const ()
#13 0x00000000030c18cc in std::_Function_handler<void (clang::CFGStmt const&, clang::dataflow::TypeErasedDataflowAnalysisState const&), clang::dataflow::runTypeErasedDataflowAnalysis(clang::dataflow::ControlFlowContext const&, clang::dataflow::TypeErasedDataflowAnalysis&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&)>)::$_1>::_M_invoke(std::_Any_data const&, clang::CFGStmt const&, clang::dataflow::TypeErasedDataflowAnalysisState const&) ()
#14 0x00000000030bf069 in clang::dataflow::transferBlock(clang::dataflow::ControlFlowContext const&, std::vector<llvm::Optional<clang::dataflow::TypeErasedDataflowAnalysisState>, std::allocator<llvm::Optional<clang::dataflow::TypeErasedDataflowAnalysisState> > >&, clang::CFGBlock const&, clang::dataflow::Environment const&, clang::dataflow::TypeErasedDataflowAnalysis&, std::function<void (clang::CFGStmt const&, clang::dataflow::TypeErasedDataflowAnalysisState const&)>) ()
#15 0x00000000030bfaa5 in clang::dataflow::runTypeErasedDataflowAnalysis(clang::dataflow::ControlFlowContext const&, clang::dataflow::TypeErasedDataflowAnalysis&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&)>) ()
#16 0x00000000029301b3 in llvm::Expected<std::vector<llvm::Optional<clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> >, std::allocator<llvm::Optional<clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> > > > > clang::dataflow::runDataflowAnalysis<clang::dataflow::UncheckedOptionalAccessModel>(clang::dataflow::ControlFlowContext const&, clang::dataflow::UncheckedOptionalAccessModel&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> const&)>) ()
#17 0x000000000292fbe8 in clang::tidy::bugprone::UncheckedOptionalAccessCheck::check(clang::ast_matchers::MatchFinder::MatchResult const&) ()
#18 0x00000000022e1572 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::MatchVisitor::visitMatch(clang::ast_matchers::BoundNodes const&) ()
#19 0x0000000002797a1c in clang::ast_matchers::internal::BoundNodesTreeBuilder::visitMatches(clang::ast_matchers::internal::BoundNodesTreeBuilder::Visitor*) ()
#20 0x00000000022e0dc6 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::matchWithFilter(clang::DynTypedNode const&) ()
#21 0x00000000022e3b57 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#22 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#23 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#24 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#25 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#26 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#27 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#28 0x00000000022e4c0c in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#29 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#30 0x00000000022e8791 in clang::RecursiveASTVisitor<clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor>::TraverseDecl(clang::Decl*) ()
#31 0x00000000022e3b62 in clang::ast_matchers::internal::(anonymous namespace)::MatchASTVisitor::TraverseDecl(clang::Decl*) ()
#32 0x00000000022c017a in clang::ast_matchers::MatchFinder::matchAST(clang::ASTContext&) ()
#33 0x000000000370ad3c in clang::MultiplexConsumer::HandleTranslationUnit(clang::ASTContext&) ()
#34 0x00000000038ed4bb in clang::ParseAST(clang::Sema&, bool, bool) ()
#35 0x000000000369eda7 in clang::FrontendAction::Execute() ()
#36 0x000000000360d3f6 in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) ()
#37 0x00000000027c475c in clang::tooling::FrontendActionFactory::runInvocation(std::shared_ptr<clang::CompilerInvocation>, clang::FileManager*, std::shared_ptr<clang::PCHContainerOperations>, clang::DiagnosticConsumer*) ()
#38 0x00000000022ad486 in clang::tidy::runClangTidy(clang::tidy::ClangTidyContext&, clang::tooling::CompilationDatabase const&, llvm::ArrayRef<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, llvm::IntrusiveRefCntPtr<llvm::vfs::OverlayFileSystem>, bool, bool, llvm::StringRef)::ActionFactory::runInvocation(std::shared_ptr<clang::CompilerInvocation>, clang::FileManager*, std::shared_ptr<clang::PCHContainerOperations>, clang::DiagnosticConsumer*) ()
#39 0x00000000027c44c6 in clang::tooling::ToolInvocation::runInvocation(char const*, clang::driver::Compilation*, std::shared_ptr<clang::CompilerInvocation>, std::shared_ptr<clang::PCHContainerOperations>) ()
#40 0x00000000027c360b in clang::tooling::ToolInvocation::run() ()
#41 0x00000000027c5bb1 in clang::tooling::ClangTool::run(clang::tooling::ToolAction*) ()
#42 0x00000000022a90c7 in clang::tidy::runClangTidy(clang::tidy::ClangTidyContext&, clang::tooling::CompilationDatabase const&, llvm::ArrayRef<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, llvm::IntrusiveRefCntPtr<llvm::vfs::OverlayFileSystem>, bool, bool, llvm::StringRef) ()
#43 0x0000000001ebc7f2 in clang::tidy::clangTidyMain(int, char const**) ()
#44 0x0000000004c54ba0 in __libc_start_main ()
#45 0x0000000001eb76ae in _start ()
```

Another note is that clang-tidy is CPU-bound, so we could consider running the lintrunner job on 4xlarge if needed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115124
Approved by: https://github.com/kit1980, https://github.com/Skylion007, https://github.com/malfet
2023-12-05 16:27:56 +00:00
ee96399bb4 Revert "[Reland2] Update NVTX to NVTX3 (#109843)"
This reverts commit dcb486232d3eb61024ad9e76cca367c60019c84c.

Reverted https://github.com/pytorch/pytorch/pull/109843 on behalf of https://github.com/atalman due to Diff broke internal builds and tests ([comment](https://github.com/pytorch/pytorch/pull/109843#issuecomment-1841105398))
2023-12-05 16:10:20 +00:00
e06bff8bbe [AOTI] Handle empty input args (#114682)
Summary: When the model takes no inputs, AOTInductor relies on checking weights to figure out which device to compile the model for. Currently, recording the buffer device type happens too late; this PR fixes that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114682
Approved by: https://github.com/chenyang78
2023-12-05 15:02:17 +00:00
3d8c174069 Tie some torch.library def/impls to library objects in testing (#114956)
This should deflake some of the tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114956
Approved by: https://github.com/williamwen42
ghstack dependencies: #114955
2023-12-05 14:53:32 +00:00
cfa4370c07 torch.compile should auto-functionalize certain mutable ops (#114955)
Users may wish to torch.compile custom ops that mutate their inputs
and return nothing (this is a common class of operators).
torch.compile will automatically support this op without anyone needing
to provide a functionalization kernel for it. Here's how.

Let's say we have a hypothetical mylib::sin_(Tensor(a!) x) -> ()
op. First, when FakeTensor sees this op, it can just return None.
This is the case because custom ops are not allowed to mutate input
metadata, so the FakeTensor rule for one that returns nothing is trivial.

Next, when Python FunctionalTensor sees the op, it will functionalize
it by emitting a call to an auto_functionalize(op, ["x"], {"x": ...})
HOP and replacing the mutated inputs with the outputs of this HOP.
This HOP effectively runs the functional version of the op when
called: it clones inputs that will be mutated, runs the op, and
then returns Tensors with the new values.

In the future we can teach Inductor how to do re-inplacing when it sees
this HOP (like how triton kernels do it) but this isn't urgent (and is
more of a performance problem).
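
A sketch of the class of op described above; the `mylib` namespace and CPU-only registration are illustrative:
```python
import torch
from torch import Tensor

# A custom op that mutates its input and returns nothing.
lib = torch.library.Library("mylib", "DEF")
lib.define("sin_(Tensor(a!) x) -> ()")

def sin_(x: Tensor) -> None:
    x.sin_()

lib.impl("sin_", sin_, "CPU")

@torch.compile
def f(x):
    torch.ops.mylib.sin_(x)  # rewritten into the auto_functionalize HOP
    return x + 1

print(f(torch.randn(3)))
```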

Test Plan:
- new tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114955
Approved by: https://github.com/bdhirsh
2023-12-05 14:53:08 +00:00
94faba5224 [nccl-pg] Revert accidental renaming of env variables (#115082)
Summary:

In 9cc040fef64154a2424b2ccd2c0909641e245cf0, we accidentally changed some of the environment variable names to the non-deprecated form. The intent was to support both the deprecated and the new forms of the env variables (with a warning thrown for the deprecated form).

Test Plan:

OSS CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115082
Approved by: https://github.com/zdevito
2023-12-05 14:52:30 +00:00
0ee1e469cb Revert "Modify pointwise cat heuristic to only apply when inputs are all pointwise and outputs are all pointwise (#114520)"
This reverts commit 3d47b92dfbe19362fb6e98f142b2c79b9db7645c.

Reverted https://github.com/pytorch/pytorch/pull/114520 on behalf of https://github.com/atalman due to Diff broke internal tests ([comment](https://github.com/pytorch/pytorch/pull/114520#issuecomment-1840890210))
2023-12-05 14:24:30 +00:00
1224acc018 [3/N] Fixes clang-tidy warnings in header files (#114431)
This PR series tries to enable clang-tidy for headers in torch/csrc and c10/util.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114431
Approved by: https://github.com/Skylion007
2023-12-05 12:58:27 +00:00
89569be2bd Pin z3-solver on Windows to 4.12.2.0 (#115150)
Windows trunk jobs started to fail with the new version 4.12.3.0 published today (Dec 4th, 2023): https://pypi.org/project/z3-solver/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115150
Approved by: https://github.com/kit1980
2023-12-05 10:48:57 +00:00
58809e8914 [Inductor][Optimus]Move group/batch fusion logic out of inductor (#115128)
Summary:
As discussed in D51695982, fusion may not always be beneficial. We want to let the user customize the fx passes.

Some examples of the new configs:
* Use the batch_fusion config: this automatically applies the following batch fusions: batch linear, layernorm, relu, tanh, sigmoid, and post-grad batch linear fusion
* Or use an explicit config:
```
"pre_grad_fusion_options": {
            "batch_linear": {"min_fuse_set_size": 10},
            "batch_linear_lhs": {},
            "batch_layernorm": {"max_fuse_search_depth": 100},
            "batch_tanh": {},
            "batch_relu": {},
            "batch_sigmoid": {}
          },
```

Test Plan:
with flag: f509168388

with config: f509168595

Reviewed By: frank-wei, mengluy0125

Differential Revision: D51817314

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115128
Approved by: https://github.com/mengluy0125
2023-12-05 08:19:17 +00:00
d5af6b0301 Dont pad broadcasting bias dimension in pad mm (#115098)
Fix for https://github.com/pytorch/pytorch/issues/99649. As title - we shouldn't pad a broadcasting dimension.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115098
Approved by: https://github.com/desertfire, https://github.com/jansel
2023-12-05 08:02:51 +00:00
1dc4588c6a Add an SDPA dispatcher for nested tensors with jagged layouts (#114164)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114164
Approved by: https://github.com/jbschlosser
2023-12-05 06:33:45 +00:00
Chi
fb92983c9b Added More Information About Adadelta Optimizer (#106290)
I have added more information about the Adadelta optimizer so developers can understand more quickly what it is doing.
My changes look like this:
![Screenshot from 2023-07-31 10-01-54](https://github.com/pytorch/pytorch/assets/93595990/72d7cd00-8acb-4ab0-820b-7ece4943c7c1)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106290
Approved by: https://github.com/janeyx99
2023-12-05 05:55:16 +00:00
eaa64339d6 [DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#115099)
Summary:
Rename _device_mesh.py to device_mesh.py, update all callsites, and add documentation.

Original diff reverted: D51629761
Original PR reverted: https://github.com/pytorch/pytorch/pull/114991
It was failing a public module binding test on macOS due to the change in import order in torch/distributed/fsdp/_common_utils.py. Since the original import still works, we removed the changes to this file.

Test Plan: CI.

Differential Revision: D51825114

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115099
Approved by: https://github.com/wanchaol, https://github.com/fegin
2023-12-05 05:44:52 +00:00
e199b769b6 Unbreak vectorization (#115086)
Summary: Unbreak vectorization

Test Plan: sandcastle

Differential Revision: D51818065

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115086
Approved by: https://github.com/malfet, https://github.com/seemethere
2023-12-05 04:15:54 +00:00
7843df60e4 [executorch hash update] update the pinned executorch hash (#115116)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned executorch hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115116
Approved by: https://github.com/pytorchbot
2023-12-05 04:09:48 +00:00
1d0e70ad65 Add get_mutation_names to ir.Wait (#115104)
`ir.Wait` generates the last 2 lines of this code:
```python
buf1_work = dist.all_gather_into_tensor(buf1[0], buf1_inputs[0], async_op=True, group=buf1_pg)
fun_col_impl._register_tensor_work(buf1, buf1_work)
buf2 = buf1[0]
del buf1

buf2 = _wait_tensor(buf2)  #  <- generated by ir.Wait
buf3 = buf2;  # reuse  <- generated by ir.Wait
```
`_wait_tensor` technically is a "mutation" op that changes `buf2` in place. So we should mark `ir.Wait` as a mutation op (by overriding its `get_mutation_names()`).

This fixes a very peculiar issue when inductor comm reordering is used for the llama model: downstream nodes that use the all-gather comm output sometimes take a dependency on `buf2` (the node before `ir.Wait`) instead of on `buf3` (`ir.Wait`); it's still unclear why it behaves like this. To work around the issue, we add the missing annotation that `buf3` is a mutation of `buf2`, so that the scheduler knows to schedule `buf3` before any of the `buf2` users.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115104
Approved by: https://github.com/wanchaol
2023-12-05 03:54:33 +00:00
3cf5348239 [inductor] Replace rand[n].generator with inductor prim if generator=None (#115051)
This fixes the "should have been handled in replace_random.py" error
raised during lowering.

I also fixed `test_randn_generator` to catch any regressions.
Previously, it did not use the result of randn(), so dynamo tracing
omitted that node entirely.

Fixes #114203.
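
A minimal sketch of the now-working pattern (the model code in the issue differs):
```python
import torch

@torch.compile
def f(x):
    # an explicit generator=None is now treated like omitting the generator,
    # so inductor replaces the rand with its own prim instead of erroring
    return x + torch.randn(x.shape, generator=None, device=x.device)

f(torch.randn(8, device="cuda"))
```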

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115051
Approved by: https://github.com/eellison
2023-12-05 01:53:41 +00:00
3d0bbb24a1 [dynamo] Improve support for list subclasses (#115052)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115052
Approved by: https://github.com/oulgen, https://github.com/eellison
ghstack dependencies: #114830, #115047, #115048
2023-12-05 01:31:33 +00:00
fe690f430a [dynamo] Fix dict.get with no default (#115048)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115048
Approved by: https://github.com/eellison, https://github.com/oulgen
ghstack dependencies: #114830, #115047
2023-12-05 01:31:33 +00:00
f6b6fad136 Fix torch.inductor._utils.get_device_tflops on ROCm (#115102)
The function caused numerous test regressions after https://github.com/pytorch/pytorch/pull/114772 changed the triton APIs a bit to use the `nvsmi` function, which is not available on the `hip` platform.

Fixes https://github.com/pytorch/pytorch/issues/115087

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115102
Approved by: https://github.com/desertfire, https://github.com/huydhn
2023-12-05 00:56:31 +00:00
c56d91ba39 Log pt2_compliant custom ops used with torch.compile (#115083)
Summary:
We already log non-pt2_compliant ops. This PR extends the logging to
include pt2_compliant custom ops. We do not log all pt2_compliant ops (i.e., builtin ops are excluded) because logging everything would probably take too much memory.

Test Plan:
Tested locally

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115083
Approved by: https://github.com/yanboliang, https://github.com/williamwen42
2023-12-05 00:51:33 +00:00
288b1acaa9 [dtensor] fix empty shape init for dtensor constructors (#115091)
As titled, this PR fixes the empty-shape init case: if we pass in something like `torch.dtensor.zeros([])`, it should call `torch.zeros([])` under the hood, not `torch.empty(0)`. This makes the dtensor constructors align with the torch constructors.
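
To see the distinction in plain torch:
```python
import torch

a = torch.zeros([])  # 0-dim scalar tensor: shape torch.Size([]), 1 element
b = torch.empty(0)   # 1-dim empty tensor: shape torch.Size([0]), 0 elements
assert a.shape == torch.Size([]) and a.numel() == 1
assert b.shape == torch.Size([0]) and b.numel() == 0
```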

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115091
Approved by: https://github.com/XilunWu
2023-12-05 00:51:29 +00:00
5cfda9b7f8 Revert "Add an SDPA dispatcher for nested tensors with jagged layouts (#114164)"
This reverts commit aafa8233a4a1f336014cb122d16941e5b593706c.

Reverted https://github.com/pytorch/pytorch/pull/114164 on behalf of https://github.com/malfet due to Broke ROCM, see aafa8233a4 ([comment](https://github.com/pytorch/pytorch/pull/114164#issuecomment-1839798986))
2023-12-05 00:35:20 +00:00
aa6920c542 Fix hang in VonMises rejection sampling for small values of concentration (#114498)
Fixes #88443

Forces the internal `dtype` of `torch.distributions.von_mises.VonMises` to be `torch.double` and mirrors the numpy implementation of the second order Taylor expansion for `concentration < 1e-5`. Samples and log probs are returned with `dtype` of argument `loc`.

In principle one could also use masking in the rejection sampler to return uniformly distributed numbers for `concentration < 1e-8`, as in numpy. This may be slightly more efficient, but isn't required to solve the hanging issue.
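
A minimal sketch of the previously hanging case:
```python
import torch
from torch.distributions import VonMises

# Tiny concentration: this sampling call could previously hang.
d = VonMises(loc=torch.tensor(0.0), concentration=torch.tensor(1e-6))
s = d.sample((1000,))                 # computed internally in double
print(s.dtype, d.log_prob(s).dtype)   # both returned in loc's dtype
```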

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114498
Approved by: https://github.com/fritzo
2023-12-04 23:07:06 +00:00
1474dad28c [quant][pt2e][xnnpack] Add support for QAT dynamic quantization for linear in XNNPACKQuantizer (#113288)
Summary:
The FX graph mode quant workflow and the pt2e flow rely on the `is_dynamic` flag in the observer/QuantizationSpec to convert an observer to the dynamic quantization pattern (choose_qparams -> q -> dq). This PR adds an is_dynamic flag to all observers so that it's possible to convert these observers to the pattern.

However, this dynamic quantization pattern (choose_qparams -> q -> dq) is actually only valid for MovingAverageObserver(averaging_constant=1)
for the computation before convert and after convert to match in the context of QAT. So we'll have some sanity
checks in other observers to make sure the is_dynamic is False.

Test Plan:
python test/test_quantization.py TestXNNPACKQuantizer.test_qat_dynamic_linear

Differential Revision: [D51124725](https://our.internmc.facebook.com/intern/diff/D51124725)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113288
Approved by: https://github.com/kimishpatel
2023-12-04 23:06:38 +00:00
a7bcc78bff Make it clearer that current selective AC is PT2-only and private (#115081)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115081
Approved by: https://github.com/albanD
2023-12-04 23:01:22 +00:00
4ba37e1804 Add tests for bsr_dense_addmm and bsr_dense_mm triton kernels (#114800)
As in the title.

In addition,
- resolve https://github.com/pytorch/pytorch/pull/114757#discussion_r1409547917 re triton-contiguous inputs
- support non-contiguous inputs and outputs in triton kernels
- fix a couple of minor bugs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114800
Approved by: https://github.com/cpuhrsch
2023-12-04 22:07:47 +00:00
aafa8233a4 Add an SDPA dispatcher for nested tensors with jagged layouts (#114164)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114164
Approved by: https://github.com/jbschlosser
2023-12-04 21:54:02 +00:00
43e3242490 [BE] Remove test corner cases for CUDA older than supported 11.8 (#114989)
Remove deprecated CUDA use cases from tests.
Similar to: https://github.com/pytorch/pytorch/pull/112873

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114989
Approved by: https://github.com/malfet
2023-12-04 21:41:03 +00:00
8ef44e6110 [autograd.Function] Fix torch.compile w/ once_differentiable leads to opaque graph break (#113625)
Fixes #106893

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113625
Approved by: https://github.com/zou3519
2023-12-04 21:37:06 +00:00
8dbae73e62 Use 2d weight and bias texture for conv2d quantized op (#114902)
Summary:
The performance with a 2D texture for weight and bias is better for quantized conv2d; the un-quantized version of conv2d also uses 2D textures.
The performance gain is:

With 3D:
Kernel Name                     Workgroup Size    Duration P50 (ns)
===========                     ==============    =================
vulkan.quantized_conv2d         {96, 72, 2}                 5965440
vulkan.quantized_conv2d         {96, 72, 2}                11316968
vulkan.quantized_conv2d_dw      {96, 72, 2}                 2735564
vulkan.quantized_conv2d_pw_2x2  {96, 72, 2}                 1645696

With 2D:
Kernel Name                     Workgroup Size    Duration P50 (ns)
===========                     ==============    =================
vulkan.quantized_conv2d         {96, 72, 2}                 4295772
vulkan.quantized_conv2d         {96, 72, 2}                 7874620
vulkan.quantized_conv2d_dw      {96, 72, 2}                 2658552
vulkan.quantized_conv2d_pw_2x2  {96, 72, 2}                 1632020

Test Plan:
Ensure all vulkan quantize tests pass:
buck2 run --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_quantized_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1 --show-output
Running main() from third-party/googletest/1.11.0/googletest/googletest/src/gtest_main.cc
[==========] Running 78 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 78 tests from VulkanAPITest
....
[----------] 78 tests from VulkanAPITest (1519 ms total)
[----------] Global test environment tear-down
[==========] 78 tests from 1 test suite ran. (1519 ms total)
[  PASSED  ] 78 tests.

buck2 run --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1 --show-output

Running main() from third-party/googletest/1.11.0/googletest/googletest/src/gtest_main.cc
[==========] Running 395 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 395 tests from VulkanAPITest
......
[----------] 395 tests from VulkanAPITest (6515 ms total)

[----------] Global test environment tear-down
[==========] 395 tests from 1 test suite ran. (6515 ms total)
[  PASSED  ] 394 tests.
[  SKIPPED ] 1 test, listed below:
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log

  YOU HAVE 5 DISABLED TESTS

Reviewed By: yipjustin

Differential Revision: D50997534

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114902
Approved by: https://github.com/yipjustin
2023-12-04 20:54:40 +00:00
6317a0350e [PyTorch][Vulkan] Refactor performance test binary (#114712)
Summary:
We create two files `vulkan_perf_utils.h` and `vulkan_perf_utils.cpp` which hosts several shared functions among the `perf_test` source files:
- `makeStack`
- `callOpByHandle`
- `callOpByName`
- `extractTotalShaderResultsAndSetState`
- `extractTotalOpResultsAndSetState`

so that they can be used for all perf tests.

Test Plan:
We test `vulkan_conv_arithmetic_perf_test`, `vulkan_layernorm_perf_test` and `vulkan_mm_perf_test` respectively as below.
- build binary, at `fbsource`
```
buck2 build  -c ndk.debug_info_level=0  -c ndk.static_linking=true -c pt.enable_qpl=0 -c pt.vulkan_use_gpu_diagnostics=1 --target-platforms=ovr_config//platform/android:arm32-fbsource //xplat/caffe2:pt_vulkan_layernorm_perf_test_binAndroid  --show-output  -c pt.vulkan_full_precision=1
buck2 build  -c ndk.debug_info_level=0  -c ndk.static_linking=true -c pt.enable_qpl=0 -c pt.vulkan_use_gpu_diagnostics=1 --target-platforms=ovr_config//platform/android:arm32-fbsource //xplat/caffe2:pt_vulkan_conv_arithmetic_perf_test_binAndroid  --show-output  -c pt.vulkan_full_precision=1
buck2 build  -c ndk.debug_info_level=0  -c ndk.static_linking=true -c pt.enable_qpl=0 -c pt.vulkan_use_gpu_diagnostics=1 --target-platforms=ovr_config//platform/android:arm32-fbsource //xplat/caffe2:pt_vulkan_mm_perf_test_binAndroid  --show-output  -c pt.vulkan_full_precision=1
```
- push to device
```
adb push buck-out/v2/gen/fbsource/f1f3f9bed27e143c/xplat/caffe2/__pt_vulkan_conv_arithmetic_perf_test_binAndroid__/pt_vulkan_conv_arithmetic_perf_test_binAndroid /data/local/tmp
adb push buck-out/v2/gen/fbsource/f1f3f9bed27e143c/xplat/caffe2/__pt_vulkan_mm_perf_test_binAndroid__/pt_vulkan_mm_perf_test_binAndroid /data/local/tmp
adb push buck-out/v2/gen/fbsource/f1f3f9bed27e143c/xplat/caffe2/__pt_vulkan_layernorm_perf_test_binAndroid__/pt_vulkan_layernorm_perf_test_binAndroid /data/local/tmp
```
- test on device

```
adb shell /data/local/tmp/pt_vulkan_mm_perf_test_binAndroid
adb shell /data/local/tmp/pt_vulkan_layernorm_perf_test_binAndroid
adb shell /data/local/tmp/pt_vulkan_conv_arithmetic_perf_test_binAndroid
```
full results:
vulkan_mm_perf_test: P887658084
vulkan_layernorm_perf_test P887687924
vulkan_conv_arithmetic_perf_test P887689880

Reviewed By: yipjustin, liuk22

Differential Revision: D51451751

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114712
Approved by: https://github.com/yipjustin
2023-12-04 19:49:50 +00:00
62df4f3428 Revert "Update oneDNN submodule to v3.3.2 (#112700)"
This reverts commit afbaa0c1650cf15100fb5dc579ceeba24fb8665a.

Reverted https://github.com/pytorch/pytorch/pull/112700 on behalf of https://github.com/atalman due to Diff broke internal tests ([comment](https://github.com/pytorch/pytorch/pull/112700#issuecomment-1839350284))
2023-12-04 19:41:12 +00:00
a70c85ce90 [dynamo] Improve support for inspect.signature().parameters (#115047)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115047
Approved by: https://github.com/oulgen
ghstack dependencies: #114830
2023-12-04 19:08:36 +00:00
40218436c4 Remove size asserts from fx_insert_profiling (#114830)
These are pretty old, don't work with dynamic shapes, and are failing
with --coverage mode in torchbench.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114830
Approved by: https://github.com/oulgen
2023-12-04 19:08:36 +00:00
8bb3cd192f Revert "Assert that output could only be the last node of the FX graph (#114973)"
This reverts commit a85df9eb0b35ed8c03e7db3c3cee01c2180fa3ed.

Reverted https://github.com/pytorch/pytorch/pull/114973 on behalf of https://github.com/atalman due to Diff broke internal tests ([comment](https://github.com/pytorch/pytorch/pull/114973#issuecomment-1839290400))
2023-12-04 19:07:48 +00:00
dcb486232d [Reland2] Update NVTX to NVTX3 (#109843)
Another attempt to update NVTX to NVTX3. We now avoid changing NVTX header inclusion of existing code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109843
Approved by: https://github.com/peterbell10
2023-12-04 19:02:07 +00:00
753c07bbe0 All gather keys before processing Stateful objects in save/load [2/N] (#114304)
Accounts for the case where `state_dict` keys may be present in different orders. Since users may be calling collectives inside `state_dict` and `load_state_dict`, differently ordered keys could cause a deadlock. This is mostly a defensive move, meant to match the feature in TSS.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114304
Approved by: https://github.com/fegin, https://github.com/wz337
2023-12-04 18:31:14 +00:00
f1c8c427da Fix https://github.com/pytorch/pytorch/issues/114892 (#115054)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115054
Approved by: https://github.com/bdhirsh
2023-12-04 18:29:33 +00:00
a9e9590934 FF inductor failure (#114980)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114980
Approved by: https://github.com/eellison, https://github.com/bdhirsh
2023-12-04 18:26:34 +00:00
4cb7dd0fc9 [sparse][quant] Add support for vector alpha in cusparselt mm (#112056)
Summary:

This PR adds support for passing in an alpha Tensor, which represents
a tensor of alpha values to fuse into the matmul.

```
cusparselt_sparse_mm = alpha * (A @ B) + bias
```

This operation is necessary for quantization, where we would like to
fuse one of the dequant matmuls into the sparse op.

Test Plan:

```
python test/test_sparse_semi_structured -k alpha
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112056
Approved by: https://github.com/cpuhrsch
2023-12-04 16:56:06 +00:00
f101426790 Revert "Move class definition of DebugInfoWriter to TraceUtil as well (#114901)"
This reverts commit fb325bbd46f69bea8b2debd3ab5830c9eedadc0d.

Reverted https://github.com/pytorch/pytorch/pull/114901 on behalf of https://github.com/atalman due to Diff broke internal tests ([comment](https://github.com/pytorch/pytorch/pull/114901#issuecomment-1838815178))
2023-12-04 14:55:39 +00:00
453d509b73 [xla hash update] update the pinned xla hash (#114586)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned xla hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114586
Approved by: https://github.com/pytorchbot
2023-12-04 11:24:17 +00:00
bfa2c844a8 [inductor][cpp] avoid redundant lowp type cast for direct load/store (#115006)
Fix https://github.com/pytorch/pytorch/issues/114879. See https://github.com/pytorch/pytorch/issues/114879#issuecomment-1836977610 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115006
Approved by: https://github.com/jansel
2023-12-04 06:39:27 +00:00
3da67ffad1 [Inductor] Do not promote int to float for torch.mm (#115043)
This PR fixes inductor silently promoting int to float for torch.mm, which caused a behavior difference from eager

Fixes #98978
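
A minimal sketch of the intended behavior (in the spirit of the linked issue, not the exact repro):

```python
import torch

def f(a, b):
    return torch.mm(a, b)

a = torch.randint(0, 10, (4, 4), dtype=torch.int64)
b = torch.randint(0, 10, (4, 4), dtype=torch.int64)

eager = f(a, b)
compiled = torch.compile(f)(a, b)
assert eager.dtype == compiled.dtype == torch.int64  # no silent promotion to float
```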

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115043
Approved by: https://github.com/jansel
2023-12-04 06:36:55 +00:00
3fbfa8cd0a [dynamo] support dict.copy() / OrderedDict.copy() / defaultdict.copy() (#115012)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115012
Approved by: https://github.com/jansel
ghstack dependencies: #115010, #115011
2023-12-04 01:50:10 +00:00
917a52d2a2 [dynamo] support dict.update(seq2) / OrderedDict.update(seq2) / defaultdict.update(seq2) (#115011)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115011
Approved by: https://github.com/jansel
ghstack dependencies: #115010
2023-12-04 01:50:10 +00:00
2e8ac5ea93 [dynamo] support dict.fromkeys() / OrderedDict.fromkeys() / defaultdict.fromkeys() (#115010)
Add support for `dict.fromkeys`, `OrderedDict.fromkeys`, and `defaultdict.fromkeys`.
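
A small sketch of the kind of program this makes traceable (illustrative, not the actual test case):

```python
import torch

@torch.compile(fullgraph=True)
def f(x):
    d = dict.fromkeys(("a", "b"), 0)  # now supported by dynamo
    d["a"] = x.sum()
    return d["a"] + d["b"]

print(f(torch.ones(3)))  # tensor(3.)
```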

Fixes #114963

- #114963

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115010
Approved by: https://github.com/jansel
2023-12-04 01:49:59 +00:00
541591dd79 Add the appropriate check on div_value to the cpp frontend (#114671)
Fixes #114334

As the title states.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114671
Approved by: https://github.com/mikaylagawarecki
2023-12-04 01:28:11 +00:00
50833021dd [Inductor] We re-enable the batch_fusion and group_fusion flags in order not to disturb the current production model implementation (#114841)
Summary:
We did two things:
1. We add back the batch_fusion and group_fusion flags to keep the current production model implementation working.

2. We distinguish batch and group fusion in the post grad pass, since group fusion needs fbgemm.

Test Plan:
```
buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/inductor:group_batch_fusion
```
Buck UI: https://www.internalfb.com/buck2/13d152d2-5d4d-4c7a-ab88-51f8e8218942
Test UI: https://www.internalfb.com/intern/testinfra/testrun/1125900253044737
Network: Up: 376KiB  Down: 44KiB  (reSessionID-c508aedc-8cc2-434a-8c17-bbe075a05562)
Jobs completed: 17. Time elapsed: 1:23.1s.
Cache hits: 0%. Commands: 1 (cached: 0, remote: 0, local: 1)
Tests finished: Pass 6. Fail 0. Fatal 0. Skip 0. Build failure 0

Differential Revision: D51695982

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114841
Approved by: https://github.com/jackiexu1992
2023-12-03 23:59:10 +00:00
491f3c8037 [CI] Small follow up for triton conda builds
Forgot to modify conda upload rules in https://github.com/pytorch/pytorch/pull/115039

Also remove redundant parentheses
2023-12-03 15:55:00 -08:00
bf16fec463 Fix up triton builds (#115039)
Follow ups after https://github.com/pytorch/pytorch/pull/114772 and https://github.com/pytorch/pytorch/pull/108187

- Triton builds should be published from `main` rather than `nightly` branch, as:
   - They are independent of any PyTorch changes
   - Every nightly is pinned to a specific commit therefore publishing updated triton binaries will not affect previous nightlies
   - If this is not the case, nightly promotion will never happen, as binary builds on main would continue to fail in perpetuity while searching for the new triton binary
- `patch_setup_py` is still needed to modify name of the package for ROCm builds

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115039
Approved by: https://github.com/seemethere, https://github.com/kit1980, https://github.com/huydhn
2023-12-03 23:14:41 +00:00
7979ba7b43 [inductor] Add dropout type check to match eager (#115040)
Fixes #98970

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115040
Approved by: https://github.com/oulgen
2023-12-03 23:05:02 +00:00
69a8f9b07e [inductor] Fix shape mismatch in sdpa pattern matcher (#115038)
Fixes #100316

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115038
Approved by: https://github.com/oulgen
2023-12-03 22:32:12 +00:00
55064a4ef9 [BE] add parentheses to kwargs unpacking func(*args, **(kwargs or {})) (#115026)
This PR adds parentheses to kwargs unpacking `func(*args, **(kwargs or {}))` for better code readability.

With and without the parentheses are semantically equivalent because they produce the same bytecode.

```console
$ echo "func(*args, **kwargs or {})" | python3 -m dis -
  0           0 RESUME                   0

  1           2 PUSH_NULL
              4 LOAD_NAME                0 (func)
              6 LOAD_NAME                1 (args)
              8 BUILD_MAP                0
             10 LOAD_NAME                2 (kwargs)
             12 JUMP_IF_TRUE_OR_POP      1 (to 16)
             14 BUILD_MAP                0
        >>   16 DICT_MERGE               1
             18 CALL_FUNCTION_EX         1
             20 POP_TOP
             22 LOAD_CONST               0 (None)
             24 RETURN_VALUE

$ echo "func(*args, **(kwargs or {}))" | python3 -m dis -
  0           0 RESUME                   0

  1           2 PUSH_NULL
              4 LOAD_NAME                0 (func)
              6 LOAD_NAME                1 (args)
              8 BUILD_MAP                0
             10 LOAD_NAME                2 (kwargs)
             12 JUMP_IF_TRUE_OR_POP      1 (to 16)
             14 BUILD_MAP                0
        >>   16 DICT_MERGE               1
             18 CALL_FUNCTION_EX         1
             20 POP_TOP
             22 LOAD_CONST               0 (None)
             24 RETURN_VALUE
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115026
Approved by: https://github.com/Skylion007
2023-12-03 20:03:26 +00:00
4d8b9964e1 [aotinductor] support at::convolution for AOTInductor (#114961)
This PR adds support to at::convolution for AOTInductor

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114961
Approved by: https://github.com/desertfire
2023-12-03 07:52:28 +00:00
7f49603ed3 Fix https://github.com/pytorch/pytorch/issues/114899 (#114985)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114985
Approved by: https://github.com/ydwu4
2023-12-03 05:24:02 +00:00
3cdfba0a7c Make DynamicShapes*Tests show up properly in the test failure repro string (#115019)
Set their `__module__` attributes so that Python thinks the test classes
are defined in test_dynamic_shapes and not in torch._dynamo.testing.
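
A sketch of the pattern (class and module names hypothetical):

```python
import unittest

class ReproTests(unittest.TestCase):
    def test_ok(self):
        self.assertTrue(True)

# The derived class is generated elsewhere; pointing __module__ at the test
# file makes failure repro strings name test_dynamic_shapes instead.
DynamicShapesReproTests = type("DynamicShapesReproTests", (ReproTests,), {})
DynamicShapesReproTests.__module__ = "test_dynamic_shapes"
```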

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115019
Approved by: https://github.com/Skylion007
ghstack dependencies: #115003
2023-12-03 04:48:43 +00:00
c808a84680 Better logging for "cannot fuse" reasons (#115003)
This was invaluable when I was debugging #114917. Without the node names
in the log message, it was difficult to make sense of them.

However, I did not want to bloat the number of LOC with this change.
Thus, instead of calling `debug()` directly with the node arguments, I
made a new callable class WhyNoFuse to partially apply the node
arguments at the top of each fusion-checking method. WhyNoFuse generates
the logging string only when its `__str__` method gets called, so there
is minimal overhead when logging is disabled.
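
A minimal sketch of the lazy-formatting pattern (not the exact inductor code):

```python
import logging

log = logging.getLogger(__name__)

class WhyNoFuse:
    def __init__(self, node1, node2):
        self.node1 = node1
        self.node2 = node2

    def __call__(self, reason, *args):
        self.reason = reason
        self.args = args
        log.debug("%s", self)  # logging formats lazily; __str__ runs only if enabled

    def __str__(self):
        return f"cannot fuse {self.node1} with {self.node2}: " + self.reason % self.args

why = WhyNoFuse("buf0", "buf1")
why("unaligned strides: %s", (4, 1))  # near-zero cost unless DEBUG logging is on
```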

I also removed the various logging 'tags' like "vert:1" / "triton:1" --
the log messages themselves are unique enough that the user can identify
them without the tag.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115003
Approved by: https://github.com/Skylion007
2023-12-03 04:48:43 +00:00
a797821fd6 [executorch hash update] update the pinned executorch hash (#115021)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned executorch hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115021
Approved by: https://github.com/pytorchbot
2023-12-03 04:20:41 +00:00
3f366aa317 [audio hash update] update the pinned audio hash (#114997)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned audio hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114997
Approved by: https://github.com/pytorchbot
2023-12-03 03:45:49 +00:00
a6294d8b9f [RelEng] Enable Py312 conda builds (#114819)
Once [sympy-1.12](https://anaconda.org/anaconda/sympy/files?version=1.12) has been added, it can be built across the board

Majority of the changes are in the builder repo:
* 6b8c73fecb tweaks numpy and openssl deps
* fc773dde97 <- tweak MLK requirements for Windows
* ca378c16f8 do not depend on Triton
* 3c7404d80c <- build without GLOO_SSL

And finally, to work around the chicken-and-egg problem from [smoke_test.bat:97](b92da8cd64/windows/internal/smoke_test.bat (L97))
```cmd
call conda install -yq numpy pytorch %CONDA_EXTRA_ARGS%
```

Manually upload binaries to pytorch-nightly channel (will fix it akin to Nova in followup PRs)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114819
Approved by: https://github.com/huydhn
2023-12-03 01:30:03 +00:00
2391f3717e [BE] Same install command for aarch64 and x86_64 wheels (#115017)
`--extra-index-url` should no longer be necessary

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115017
Approved by: https://github.com/kit1980
2023-12-03 00:33:52 +00:00
3cbe7a53a9 Automated submodule update: FBGEMM (#114444)
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 84c7b278be

Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114444
Approved by: https://github.com/malfet
2023-12-02 22:15:06 +00:00
d7b303dcf8 [BE]: Enable a PLC0131, PLC0132, PLC0205. Fix PLC0132 bug. (#115015)
Enable pylint rules `PLC0131` and `PLC0132`. There was a violation of `PLC0132`, so this commit also fixes it and enables the rules so the violation does not occur again. `PLC0205` checks for accidentally setting your `__slots__` to a string, which is almost always a bug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115015
Approved by: https://github.com/jansel, https://github.com/malfet
2023-12-02 20:35:10 +00:00
13410d0eda Moving target/code path to non-pytorch repo (#114095)
Differential Revision: D51460806

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114095
Approved by: https://github.com/digantdesai
2023-12-02 19:27:09 +00:00
8a90249bc2 [inductor] Update triton pin (#114772)
Differential Revision: [D51761353](https://our.internmc.facebook.com/intern/diff/D51761353)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114772
Approved by: https://github.com/shunting314, https://github.com/atalman
2023-12-02 19:13:56 +00:00
3a2e2044cd Revert "[DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#114710) (#114991)"
This reverts commit 729ac7317a50a6a195b324cf6cefd748bf4f5498.

Reverted https://github.com/pytorch/pytorch/pull/114991 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/114991#issuecomment-1837214567))
2023-12-02 17:55:51 +00:00
af5a3bda45 [merge rule] add CPU quantization (#114994)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114994
Approved by: https://github.com/jerryzh168, https://github.com/malfet
2023-12-02 08:34:55 +00:00
28925902fa [TP] fully rewrite Tensor Parallel APIs (#114732)
This PR rewrites the Tensor Parallel implementation. The Tensor Parallel APIs
are supposed to be a very thin wrapper over the DTensor APIs, but the current
implementation got too messy and buggy. It's really hard to debug what
went wrong when using it. It's crucially important for advanced users and
developers to understand the API and its implementation easily, without
going through all the different types of functions and utils, so that
they can trust what happens under the hood.

In particular this PR:

* Make ParallelStyle a real contract API for parallelize_module to
  take; each concrete ParallelStyle only needs to implement `apply` to
  apply the sharding to an nn.Module, and all unnecessary fields are removed. This
  also enables easier ParallelStyle authoring going forward.
* Keep the ColwiseParallel and RowwiseParallel public interfaces, but
  refactor them so that the parameter sharding and the input/output
  handling live within the style itself, making it easy to
  understand how Linear/Embedding layers are sharded and how the input/output
  transformations are performed (see the usage sketch after the TODOs below).
* Remove the private _prepare_input/_prepare_output_fn fields for
  both ColwiseParallel/RowwiseParallel. Since we have thrown deprecation
  messages in nightly for a while, TP is a prototype release, and the
  fields are private, it should be safe to remove them.
* Refactor the recently landed PrepareModuleInput/Output styles: change
  output_layouts to desired_input/output_layouts, group
  the functions inside the style itself, and drop the default arguments for these
  two styles so users have to specify them and think about the sharding
  layouts. Fixed bugs about not handling the
  `use_local_output` flag.
* Make default arguments None instead of a Placement object; it is
  standard python practice not to have a custom object instance as a default
  argument.
* Remove all dead APIs (i.e. the PairwiseParallel and SequenceParallel
  styles and all prepare input/output functions) as we have thrown deprecation
  msgs for a while, and we are in the process of removing all of them from the tests.
* Throw a deprecation warning for `tp_mesh_dim`, as we recommend using device
  mesh slicing/indexing instead of manually specifying the mesh dim.
* Rewrite the documentation for every ParallelStyle and make it
  clearer what each style is doing.

TODOs:
* Rewrite TP tests to adjust for the changes we have in this PR
* add more tests to guard the bug fixes
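
For orientation, a rough sketch of the post-rewrite usage shape (module paths and mesh size are assumptions, not part of this PR):

```python
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)

# shard an MLP block colwise -> rowwise across an 8-GPU mesh
mesh = init_device_mesh("cuda", (8,))
mlp = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
mlp = parallelize_module(mlp, mesh, {"0": ColwiseParallel(), "2": RowwiseParallel()})
```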

Differential Revision: [D51761183](https://our.internmc.facebook.com/intern/diff/D51761183)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114732
Approved by: https://github.com/wz337, https://github.com/fduwjj
2023-12-02 08:18:12 +00:00
729ac7317a [DeviceMesh] Rename _device_mesh.py to device_mesh.py to prepare for beta (#114710) (#114991)
Summary:

Same content of changes as https://github.com/pytorch/pytorch/pull/114710

Rename _device_mesh.py to device_mesh.py, update all callsites, adds documentation.
ghstack-source-id: 208980207
exported-using-ghexport

Test Plan: CI.

Reviewed By: wanchaol

Differential Revision: D51629761

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114991
Approved by: https://github.com/wanchaol, https://github.com/fduwjj, https://github.com/fegin
2023-12-02 04:39:41 +00:00
0fef82b3df [dcp] fix fsdp state_dict to use run_check=False (#114995)
from_local with a replicate placement would run mesh_broadcast when
run_check=True, and from_local has run_check=True by default. But in the FSDP
state_dict case we know for sure that these are replicas already, so we
don't need to check/force-check it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114995
Approved by: https://github.com/fegin, https://github.com/XilunWu, https://github.com/wz337
2023-12-02 04:16:37 +00:00
1f51f977ae misc visualization/utility improvements (#114984)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114984
Approved by: https://github.com/weifengpy
ghstack dependencies: #114520
2023-12-02 04:02:39 +00:00
3d47b92dfb Modify pointwise cat heuristic to only apply when inputs are all pointwise and outputs are all pointwise (#114520)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114520
Approved by: https://github.com/eellison
2023-12-02 04:02:39 +00:00
a5a1f0a6b1 [executorch hash update] update the pinned executorch hash (#114996)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned executorch hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114996
Approved by: https://github.com/pytorchbot
2023-12-02 03:57:47 +00:00
f1fd02503b Reland #113487 and #112527 (sdpa shim & fp8 AOTInductor support) (#114974)
This is a backout of #113747 which reverted the above two commits. Now that
#113997 has landed, this diff can be landed safely without breaking ABI compatibility.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114974
Approved by: https://github.com/chenyang78
2023-12-02 03:25:51 +00:00
fe08d995ef [vision hash update] update the pinned vision hash (#111523)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111523
Approved by: https://github.com/pytorchbot
2023-12-02 03:04:19 +00:00
2882d7fdaf [BE] Remove stale workaround for CUDA<=11.2 (#114979)
It's been dead code for the last 3+ releases
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114979
Approved by: https://github.com/Skylion007
2023-12-02 02:41:41 +00:00
a9aad4ea21 [AOTInductor] Generate Triton header even if scheduler is not invoked. (#114972)
Summary:
Generate Triton header for profiling.
If the Triton header isn't generated through the Scheduler, generate it directly
in the wrapper codegen.

Test Plan:
Test included in commit.
(test_aot_inductor.py:test_with_no_triton_profiler)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114972
Approved by: https://github.com/chenyang78, https://github.com/desertfire
2023-12-02 02:03:38 +00:00
fb806f487f [AOTInductor] Add method to get storage size in shim (#114976)
Summary:
Add a method to get storage size.

Test Plan:
N/A for FC; tests will come after it is packaged.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114976
Approved by: https://github.com/chenyang78, https://github.com/desertfire
2023-12-02 01:54:18 +00:00
8f164017ee [quant][pt2e][xnnpack] XNNPACKQuantizer skip quantization for input and output to workaround histogram observer problem (#113405)
Summary:
As titled. This is because the histogram observer does not work for a corner case in mobilebert (observing a scalar tensor of the float32 max value):
the histc operator errors out when the value is larger than a certain number.
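
Roughly, the trigger looks like this (a sketch per the description above, not the exact test):

```python
import torch

x = torch.tensor(torch.finfo(torch.float32).max)
# HistogramObserver calls histc under the hood, which errors out for values
# this large, so the quantizer now skips observing such inputs/outputs.
torch.histc(x, bins=256)  # raises
```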

Test Plan:
python test/test_quantization.py -k test_mul_float32_max

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113405
Approved by: https://github.com/mcr229
2023-12-02 00:44:42 +00:00
7bbc19adc4 [dynamo] Unskip DALLE2_pytorch (#114960)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114960
Approved by: https://github.com/eellison
ghstack dependencies: #114959
2023-12-02 00:40:25 +00:00
4cfe997490 [dynamo] handle setting .data on a tensor (#113080)
**Dynamo**

We don't want setattr in the graph. Setting data has interesting implications on both aliasing and on the autograd engine.

The safe recipe is:

1) Disable grad
2) Call set_()
3) Manually lower the version counter on the object to hide it from the autograd engine

This is effectively the same exact thing as setting .data, and it composes properly with aot_autograd and inductor.
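
Spelled out in eager code, the recipe looks roughly like this (`_unsafe_set_version_counter` is a private autograd helper, so treat this as a sketch):

```python
import torch

x = torch.zeros(6, requires_grad=True)
new = torch.ones(3)

# equivalent of `x.data = new`, written explicitly:
with torch.no_grad():
    old_version = x._version
    x.set_(new)  # swap in the new storage/metadata
    torch._C._autograd._unsafe_set_version_counter(x, old_version)  # hide from autograd
```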

**aot_autograd**

For aot_autograd, there's another snag.

Specifically, when we invoke aot_autograd, we call `fake_mode.from_tensor()`, relying on memo to get the right tensor out. For .data mutations, this doesn't work, because the memoized fake_tensor is in the state it will be in at the end of the trace, not at the beginning. This means that the .data call is already applied, and the tensor shape (as in the case of these tests) mismatches. aot_autograd produces an invalid graph, with illegal calls like `torch.ops.aten.view.default(primals_2, [0])` where primals is actually sized `([6])` on input.

The new plan here is to:
1) Record tensor fakification policy in dynamo
2) provide a fresh fake mode to all backends
3) Invoke from_tensor with the stored policy to get fresh new fake tensors in aot_autograd

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113080
Approved by: https://github.com/bdhirsh
2023-12-02 00:35:44 +00:00
77c4565d58 [ONNX][Bench] Remove double export and session init in perf test (#114907)
Previously, both the `optimize_ctx` call and the `experiment` call would do export and session creation, doubling the resource cost. This PR makes the `experiment` call re-use the onnx model created by `optimize_ctx`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114907
Approved by: https://github.com/thiagocrepaldi
ghstack dependencies: #110178
2023-12-02 00:17:07 +00:00
b0a36944cc [ONNX] Add sanity check in CI for onnxbench (#110178)
ONNX CI to run benchmark with `--quick` to validate the onnxbench infra.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110178
Approved by: https://github.com/thiagocrepaldi
2023-12-02 00:17:07 +00:00
1fce51037e Add profiler/unwind to the package (#114981)
Needed by `torch/csrc/profiler/combined_traceback.h`
Fixes https://github.com/pytorch/pytorch/issues/114978

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114981
Approved by: https://github.com/atalman
2023-12-01 23:55:01 +00:00
d47f715d29 Expose Flash attn to autograd (#114378)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114378
Approved by: https://github.com/drisspg
2023-12-01 23:42:06 +00:00
80d8a2a237 improve mkldnn_linear_pointwise performance for contiguous tensor with non default contiguous strides (#114939)
This PR converts the stride to the default contiguous stride in `mkldnn_linear_pointwise` before calling oneDNN, to hit an optimization path similar to https://github.com/pytorch/pytorch/pull/99511. Also refactored the code to provide a common utility function.

https://github.com/pytorch/pytorch/pull/111976 ignores dims of value 1 in Require_Stride_order. For a tensor with `size = [1, 1280]`, `stride = [0, 1]`:
**Before the above PR**, it was considered non-contiguous, and thus in the below call it was converted to `size = [1, 1280]`, `stride = [1280, 1]`:
25b83521be/torch/_inductor/ir.py (L5263)

**After the above PR**, dims of value 1 are ignored, so this tensor is already considered contiguous and we'd feed a tensor with `stride = [0, 1]` to oneDNN, which results in poor performance.
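
A small demonstration of the stride situation described above (the actual fix lives in the C++ mkldnn path):

```python
import torch

t = torch.as_strided(torch.randn(1280), (1, 1280), (0, 1))
print(t.is_contiguous())  # True: size-1 dims don't constrain contiguity
print(t.stride())         # (0, 1) rather than the default (1280, 1)

# restride to the default contiguous layout, as the fix does before calling oneDNN
fixed = t.as_strided(t.shape, torch.empty(t.shape).stride())
print(fixed.stride())     # (1280, 1)
```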

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114939
Approved by: https://github.com/jgong5
2023-12-01 23:30:07 +00:00
e666159e2f Fix lint in group_batch_fusion.py (#114993)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114993
Approved by: https://github.com/janeyx99
2023-12-01 23:17:12 +00:00
c546ca9f80 AOTAutograd: support mutations on buffers that happen during the bw (#114953)
Re-land of https://github.com/pytorch/pytorch/pull/112906

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114953
Approved by: https://github.com/zou3519, https://github.com/drisspg
2023-12-01 23:09:37 +00:00
a85df9eb0b Assert that output could only be the last node of the FX graph (#114973)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114973
Approved by: https://github.com/Chillee
2023-12-01 23:04:19 +00:00
3c78ea4c9d [DDP][Compile] Test to Ensure torch.compile works w/static_graph=True (#114621)
Resolves https://github.com/pytorch/pytorch/issues/93672. This was
actually fixed by https://github.com/pytorch/pytorch/pull/103487 but I didn't
realize that PR also fixes torch compile at the time.

Differential Revision: [D51596148](https://our.internmc.facebook.com/intern/diff/D51596148/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114621
Approved by: https://github.com/wconstab
2023-12-01 22:18:45 +00:00
6e495eef60 [tgif] allow preserving non-forward methods during deepcopy (#114849)
Summary:
bypass-github-export-checks
force-merge-on-github

Reviewed By: sayitmemory

Differential Revision: D51629520

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114849
Approved by: https://github.com/houseroad
2023-12-01 21:51:05 +00:00
4ee80fd7f4 [dynamo] Support UNPACK_SEQUENCE nn.ModuleList (#114959)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114959
Approved by: https://github.com/oulgen, https://github.com/yanboliang
2023-12-01 21:42:23 +00:00
68a8d74f3f [inductor] benchmark epilogue fused matmul template (#114809)
We want to be able to benchmark epilogue-fused triton matmul kernels for a couple of reasons:
1. @eellison found that certain TB models (resnet50, resnet152, moco) sometimes fail in max-autotune mode on the dashboard. The issue is quite hard to repro due to flakiness, and it only gets triggered when a certain triton config for a certain epilogue-fused kernel gets picked (disabling epilogue fusion bypasses the issue). It would be nice to have a runnable script that directly runs that kernel to ease further debugging.
2. This is a necessary piece for doing benchmark fusion for triton matmul kernels. cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler for this

Example runnable kernel script: https://gist.github.com/shunting314/00bdbc1b6b46bfa73d1389d8f40cd669

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114809
Approved by: https://github.com/eellison
2023-12-01 21:05:01 +00:00
8a51845b38 [C10D] Add filename to dump finished log (#114957)
Just shows you where to look.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114957
Approved by: https://github.com/fduwjj
2023-12-01 20:38:02 +00:00
9cc040fef6 Switch env variable use in test harnesses to the non-deprecated names to fix warnings (#114880)
Previously:

```
[W Utils.hpp:133] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
[W Utils.hpp:133] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
```

With this PR, those warnings disappear.  They were introduced in #114077

This change was generated with this sed script, applied with `sed -i -f /tmp/x **/*.{py,hpp,cpp,cc}` and hand inspected.

```
s/\bNCCL_BLOCKING_WAIT\b/TORCH_NCCL_BLOCKING_WAIT/g
s/\bNCCL_ENABLE_TIMING\b/TORCH_NCCL_ENABLE_TIMING/g
s/\bNCCL_DESYNC_DEBUG\b/TORCH_NCCL_DESYNC_DEBUG/g
s/\bNCCL_ASYNC_ERROR_HANDLING\b/TORCH_NCCL_ASYNC_ERROR_HANDLING/g
s/\bENABLE_NCCL_HEALTH_CHECK\b/TORCH_ENABLE_NCCL_HEALTH_CHECK/g
s/\bNCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK\b/TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK/g
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114880
Approved by: https://github.com/kwen2501
2023-12-01 20:08:23 +00:00
1bcefaf575 [inductor] post_grad batched linear fusion (#112504)
Summary: Fusing independent nn.Linear() functions with aten.bmm and aten.cat.
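
Schematically, the rewrite does something like this (shapes illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 16)
weights = [torch.randn(32, 16) for _ in range(3)]

# three independent aten::mm calls
outs = [F.linear(x, w) for w in weights]

# fused form: one aten::bmm over stacked weights
wb = torch.stack(weights)                   # (3, 32, 16)
xb = x.unsqueeze(0).expand(3, -1, -1)       # (3, 8, 16)
fused = torch.bmm(xb, wb.transpose(1, 2))   # (3, 8, 32)
assert all(torch.allclose(o, f, atol=1e-5) for o, f in zip(outs, fused))
```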

Test Plan:
Without the BMM fusion:
```
buck2 run @mode/opt //pytorch/benchmark:run -- test_module -d cuda --module test_linear_module --torchdynamo inductor --torchinductor_cudagraph 0 --torchinductor_batch_fusion 0
```
https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/test/torchbench_test_module_20231030_072536_6535183793.json.gz&bucket=pyper_traces

100 aten::mm operators

With the BMM fusion:
```
buck2 run @mode/opt //pytorch/benchmark:run -- test_module -d cuda --module test_linear_module --torchdynamo inductor --torchinductor_cudagraph 0 --torchinductor_batch_fusion 1
```

20 aten::bmm operators

https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/test/torchbench_test_module_20231030_072157_6535183793.json.gz&bucket=pyper_traces

Passes accuracy test:
```
$ buck2 run @mode/opt //pytorch/benchmark:run -- test_module -d cuda --module test_linear_module --torchdynamo inductor --torchinductor_cudagraph 0 --torchinductor_batch_fusion 1 --accuracy
Running eval method from test_module on cuda in dynamo inductor mode with input batch size 4 and precision tf32.
Accuracy:                            pass
```
Looks like the bmm and the input cat have been fused successfully.

Checking the triton codegen:

```
TORCH_LOGS=+dynamo,+aot,+inductor buck2 run @mode/opt //pytorch/benchmark:run -- test_module -d cuda --module test_linear_module --torchdynamo inductor --torchinductor_cudagraph 0 --torchinductor_batch_fusion 1 --dump_triton 1
```

Triton code dump: https://www.internalfb.com/intern/everpaste/?handle=GHp1ABaqYuTjYCUBALiTWmteaI1PbsIXAAAB

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112504
Approved by: https://github.com/yanboliang
2023-12-01 19:26:29 +00:00
f073dcd4f7 Stateful Checkpointing for Distributed [1/N] (#113867)
First pass at adding a save/load API, as well as a definition of Stateful objects.

Amongst a couple of TODOs, we still need to explore adding an `all_gather` & potentially a `barrier` while iterating through state keys.
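
The Stateful contract amounts to this protocol (a sketch; the real definition lives under torch.distributed.checkpoint):

```python
from typing import Any, Dict, Protocol, runtime_checkable

@runtime_checkable
class Stateful(Protocol):
    def state_dict(self) -> Dict[str, Any]: ...
    def load_state_dict(self, state_dict: Dict[str, Any]) -> None: ...
```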

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113867
Approved by: https://github.com/fegin, https://github.com/wz337
2023-12-01 19:21:03 +00:00
6f32eb7eef Add decomp for replication_pad2d and use for CUDA deterministic (#111590)
Fixes #95578

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111590
Approved by: https://github.com/peterbell10
2023-12-01 18:56:09 +00:00
c6e975bc0e Revert "[Quant] [PT2] Enable batchnorm in _move_exported_model_to_eval (#114547)"
This reverts commit bab054063c7fd6c4b3b8d55a932f2e7fa0a057bb.

Reverted https://github.com/pytorch/pytorch/pull/114547 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/114547#issuecomment-1836612143))
2023-12-01 18:52:51 +00:00
afbaa0c165 Update oneDNN submodule to v3.3.2 (#112700)
Update oneDNN submodule to v3.3.2.
Add a macro to check the version of `third_party/ideep`.
Since we have versioning now, the changes won't break any pipeline even if `third_party/ideep` is not updated at the same time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112700
Approved by: https://github.com/leslie-fang-intel, https://github.com/atalman
2023-12-01 18:40:07 +00:00
93b1e47586 [inductor][Observability] Add log for Optimus to enable easier debug (#110452)
Summary: The log breaks one of ads-model export flows, and we change the log to debug

Test Plan: see details in D49710166

Differential Revision: D49844303

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110452
Approved by: https://github.com/jackiexu1992
2023-12-01 18:25:56 +00:00
32b928e582 Tests have main linter (#114882)
The linter uses libcst to check for a call to run_tests or a raised exception when the test file is run as main, to ensure that all test files either get run in OSS CI or don't run and are expected not to run.

A better option instead of making this into a linter might be to add this code in run_test since there's also a list of blocklisted tests there that needs to be updated when a test file raises an exception.

This is possibly overkill, since run on its own the code takes ~1 minute to run without multiprocessing on all the files.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114882
Approved by: https://github.com/kit1980
2023-12-01 17:24:08 +00:00
3fc58a6bbe Revert "Make offsets dynamic by default (#113734)" (#114889)
This reverts commit 7c38b76efec65249e39ae2b8fd8280dfebd1d415.

if a graph has a lot of inputs which are views (with nonzero storage offset), then the check for overlapping tensor views will add a lot of guards (n^2?)

b35ca2cb94/torch/_functorch/_aot_autograd/input_output_analysis.py (L256-L260)

this was causing very slow compilations on an internal model.

Differential Revision: [D51733774](https://our.internmc.facebook.com/intern/diff/D51733774)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114889
Approved by: https://github.com/ckluk2, https://github.com/YuqingJ, https://github.com/aaronenyeshi
2023-12-01 16:49:42 +00:00
ec124b90b8 [pytree] hardcode values for none_is_leaf and namespace in C++ pytree (#114858)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114858
Approved by: https://github.com/zou3519
2023-12-01 15:01:33 +00:00
5eb36166f8 Fix hard-coded cuda device in ConstructorMoverPass. (#114932)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114932
Approved by: https://github.com/eellison
ghstack dependencies: #114626
2023-12-01 14:23:48 +00:00
833200c54f s390x: fix build (#114508)
Follow up to d18e6b07aa61

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114508
Approved by: https://github.com/huydhn
2023-12-01 14:23:44 +00:00
76362cc9a0 [BE] Do not use AT_ERROR (#114883)
As the latter is just an alias for `TORCH_CHECK(false,)`

Proposed as a suggestion to https://github.com/pytorch/pytorch/pull/110303, but it wasn't noticed

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114883
Approved by: https://github.com/atalman
2023-12-01 13:44:17 +00:00
d90d67a146 Added a check to prevent accessing blocksize during Tensor.to_sparse conversion if empty (#114905)
The main problem was that blocksize is an `optional<ArrayRef>`, so checking for `.has_value()` will be true even if the containing `ArrayRef` is empty.

Fixes #114865.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114905
Approved by: https://github.com/malfet
2023-12-01 12:36:15 +00:00
9e94c951a8 Fix missing meta for proxy.node (#114659)
Hello community,

There are nodes like SDPA whose meta field holds a basic type that extract_val failed to capture. Without this fix, third-party modules plugged into Dynamo that try to analyse node.meta['val'] will fail. See https://github.com/nod-ai/SHARK-Turbine/issues/206

Thanks!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114659
Approved by: https://github.com/Chillee
2023-12-01 12:17:23 +00:00
57083542ee Added support for custom pre-grad passes (#113823)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113823
Approved by: https://github.com/eellison, https://github.com/jansel
ghstack dependencies: #113913
2023-12-01 12:10:03 +00:00
25b83521be [c10d] Log NCCL trace buffer size (#114926)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114926
Approved by: https://github.com/zdevito
ghstack dependencies: #114901
2023-12-01 08:06:10 +00:00
9a075d9a8f Update expected values after #114828 (#114918)
This is failing in trunk 7b3429d97c, updating the value after chatting with @jansel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114918
Approved by: https://github.com/jansel
2023-12-01 07:55:13 +00:00
67562c8cf8 Add DALLE2_pytorch to skips (#114924)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114924
Approved by: https://github.com/huydhn
2023-12-01 07:15:59 +00:00
38e1440bae [MPS] Remove redundant topk test and move all pad tests inside a class (#113313)
Summary:
1. The removed `topk` test is essentially very similar to the following test, so I removed it:
```python
def test_topk(self):
        def helper(shape):
            cpu_x = torch.randn(shape, device='cpu', dtype=torch.float, requires_grad=False)
            x = cpu_x.detach().clone().to('mps')
            for largest_val in [True, False]:
                if (type(shape) == tuple):
                    for curr_dim in range(0, len(shape)):
                        dim_size = shape[curr_dim]
                        for k in range(1, dim_size + 1):
                            topk_values, topk_indices = torch.topk(x, k, dim=curr_dim, largest=largest_val)
                            topk_values_cpu, topk_indices_cpu = torch.topk(cpu_x, k, dim=curr_dim, largest=largest_val)
                            self.assertEqual(topk_values, topk_values_cpu)
                            self.assertEqual(topk_indices, topk_indices_cpu)
                else:
                    for k in range(1, shape):
                        topk_values, topk_indices = torch.topk(x, k, dim=0, largest=largest_val)
                        topk_values_cpu, topk_indices_cpu = torch.topk(cpu_x, k, dim=0, largest=largest_val)
                        self.assertEqual(topk_values, topk_values_cpu)
                        self.assertEqual(topk_indices, topk_indices_cpu)

        helper(2)
        helper((5, 1))
        helper((1, 5))
        helper((5, 9, 7, 4))
        helper((50, 20, 7, 4))
```
297c26bb8e/test/test_mps.py (L8054-L8091)

2. Move all pad tests to one standalone class.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113313
Approved by: https://github.com/kulinseth
ghstack dependencies: #113312
2023-12-01 06:52:07 +00:00
88a659e752 [MPS] Move non-nll loss tests outside TestNLLLoss (#113312)
The diff looks messy, but this PR essentially does one thing: move the non-nll loss tests in the `TestNLLLoss` class to the `TestMPS` class. After doing so, we end up having two stack tests with the same name `test_stack`; therefore, I rename one of them to `test_stack_storage_offset`, which is what the test actually does.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113312
Approved by: https://github.com/kulinseth
2023-12-01 06:52:07 +00:00
4875e4d63f [tp] delete dead code (#114731)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114731
Approved by: https://github.com/fegin, https://github.com/wz337
2023-12-01 06:35:42 +00:00
1b27eae65e [MPS] Fix out-of-bounds fill to sliced tensor (#114838)
This fixes a regression introduced by https://github.com/pytorch/pytorch/pull/81951 that caused an out-of-bounds access when a sliced tensor is filled with zeros

Remove bogus `TORCH_INTERNAL_ASSERT(length >= offset)` as [NSMakeRange](https://developer.apple.com/documentation/foundation/1417188-nsmakerange?language=objc) arguments are location and length rather than start and end offset.

In `fill_mps_tensor_`:
- Pass `value` argument to `MPSStream::fill`
- Pass `self.nbytes()` rather than `self.storage().nbytes()` as the length of the buffer to fill, as the latter always results in an out-of-bounds write if the offset within the storage is non-zero

Add regression test

Fixes https://github.com/pytorch/pytorch/issues/114692

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114838
Approved by: https://github.com/atalman, https://github.com/kulinseth
2023-12-01 06:24:42 +00:00
aa390cec21 [profiler] Fix description to use nelems rather than size (#114735)
We were storing the number of elements in the tensor, rather than the actual bytes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114735
Approved by: https://github.com/aaronenyeshi, https://github.com/yoyoyocmu, https://github.com/kwen2501, https://github.com/fduwjj
2023-12-01 06:21:47 +00:00
373f2060ba fix extending torch native API docs (#114863)
Couldn't think of a better `release notes:` label. Feel free to set a more fitting one
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114863
Approved by: https://github.com/mikaylagawarecki
2023-12-01 06:09:35 +00:00
5687285ca5 Skip quantization tests running from BaseTestQuantizePT2EQAT_ConvBn (#114829)
Summary: This is a follow-up from D51428979. These tests should be run only from `TestQuantizePT2EQAT_ConvBn1d` and `TestQuantizePT2EQAT_ConvBn2d`; the base class doesn't have the necessary setup to run them, and they are expected to fail. I previously ignored the failures on D51428979, and these failed tests have been disabled.

Test Plan:
Run an example test there and confirm that two versions from `TestQuantizePT2EQAT_ConvBn1d` and `TestQuantizePT2EQAT_ConvBn2d` are run while the one from `BaseTestQuantizePT2EQAT_ConvBn` is skipped

```
$ buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/quantization:test_quantization -- --run-disabled 'caffe2/test/quantization:test_quantization - test_qat_conv_bn_fusion_literal_args'
File changed: fbcode//caffe2/test/quantization/pt2e/test_quantize_pt2e_qat.py
↷ Skip: caffe2/test/quantization:test_quantization - test_qat_conv_bn_fusion_literal_args (caffe2.test.quantization.pt2e.test_quantize_pt2e_qat.BaseTestQuantizePT2EQAT_ConvBn) (0.0s)

/data/users/huydo/fbsource/buck-out/v2/gen/fbcode/689edf96bfbb5738/caffe2/test/quantization/__test_quantization__/test_quantization#link-tree/torch/_utils_internal.py:230: NCCL_DEBUG env var is set to None
/data/users/huydo/fbsource/buck-out/v2/gen/fbcode/689edf96bfbb5738/caffe2/test/quantization/__test_quantization__/test_quantization#link-tree/torch/_utils_internal.py:239: NCCL_DEBUG is WARN from /etc/nccl.conf
INFO:2023-11-29 19:20:33 3049620:3049620 CuptiActivityProfiler.cpp:225] CUDA versions. CUPTI: 18; Runtime: 12000; Driver: 12000
/data/users/huydo/fbsource/buck-out/v2/gen/fbcode/689edf96bfbb5738/caffe2/test/quantization/__test_quantization__/test_quantization#link-tree/torch/_utils_internal.py:158: DeprecationWarning: This is a NOOP in python >= 3.7, its just too dangerous with how we write code at facebook. Instead we patch os.fork and multiprocessing which can raise exceptions if a deadlock would happen.
  threadSafeForkRegisterAtFork()
test_qat_conv_bn_fusion_literal_args (caffe2.test.quantization.pt2e.test_quantize_pt2e_qat.BaseTestQuantizePT2EQAT_ConvBn) ... skipped 'Skipping test running from BaseTestQuantizePT2EQAT_ConvBn'

----------------------------------------------------------------------
Ran 1 test in 0.001s

OK (skipped=1)

Skipped: Skipping test running from BaseTestQuantizePT2EQAT_ConvBn

Buck UI: https://www.internalfb.com/buck2/7b70fb33-44cb-4745-92e1-64031bb413b8
Test UI: https://www.internalfb.com/intern/testinfra/testrun/6473924660765251
Network: Up: 12KiB  Down: 0B  (reSessionID-0399f0c3-e671-4770-a41c-75c06ae709d5)
Jobs completed: 11. Time elapsed: 1:07.2s.
Cache hits: 0%. Commands: 1 (cached: 0, remote: 0, local: 1)
Tests finished: Pass 2. Fail 0. Fatal 0. Skip 1. Build failure 0
```

Differential Revision: D51694959

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114829
Approved by: https://github.com/clee2000
2023-12-01 05:13:27 +00:00
d6c0d1b58b [pytree] support collections.deque type for Python pytree (#113256)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113256
Approved by: https://github.com/zou3519
ghstack dependencies: #112485, #113255
2023-12-01 05:12:09 +00:00
0019196f1b Refactor move_constructor_to_cuda. (#114626)
Follow-up: #114539

This PR introduces a minor change to the `move_constructor_to_cuda` implementation, while
refactoring the whole pass into a class. Here's a brief summary of the changes:

- Create a new `ConstructorMoverPass`
- Rephrase the condition:

```python
if not isinstance(
    node.target, torch._ops.OpOverload
) or node.target.namespace not in ("prims", "aten"):
    ...

if not (
    isinstance(node.target, torch._ops.OpOverload)
    and node.target.namespace in ("prims", "aten")
):
    ...
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114626
Approved by: https://github.com/eellison
2023-12-01 05:09:29 +00:00
9267ab9032 [executorch hash update] update the pinned executorch hash (#114915)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned executorch hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114915
Approved by: https://github.com/pytorchbot
2023-12-01 04:32:35 +00:00
ab5385fc50 [Dynamo][6.3/N] Further cleanup torch.py (#114669)
A follow-up PR to clean up what I found during the refactor of torch.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114669
Approved by: https://github.com/jansel
2023-12-01 04:08:29 +00:00
64fd706b21 [quant][pt2e] Add generate_numeric_debug_handle pass (#114315)
Summary:
This is a util for numeric suite in pt2 export so that we can build
a more streamlined UX for numerical debugging in quant + executorch stack

Test Plan:
python test/test_quantization.py TestGenerateNumericDebugHandle

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114315
Approved by: https://github.com/zhxchen17
2023-12-01 03:38:17 +00:00
2dd2fb91d9 [DeviceMesh] Add get_local_rank() API to DeviceMesh (#114709)
As title.

Differential Revision: [D51625152](https://our.internmc.facebook.com/intern/diff/D51625152/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114709
Approved by: https://github.com/wanchaol, https://github.com/fegin
ghstack dependencies: #114708
2023-12-01 03:28:55 +00:00
fb325bbd46 Move class definition of DebugInfoWriter to TraceUtil as well (#114901)
Since we moved the implementation of the class to TraceUtils in https://github.com/pytorch/pytorch/pull/114367, we want to move the class definition there as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114901
Approved by: https://github.com/XilunWu
2023-12-01 03:28:16 +00:00
2a2f74727a [dynamo, test] add test for backend registration API (#114908)
Add tests for backend registration API, per https://github.com/pytorch/pytorch/pull/114820.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114908
Approved by: https://github.com/eellison
ghstack dependencies: #114820
2023-12-01 03:10:56 +00:00
033f98b7e0 Remove confusing warning message from SDPA about mask alignment (#114909)
# Summary
Users have reported that this warning message leads to confusion about the correctness of the mask even though it is only concerned with performance.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114909
Approved by: https://github.com/Chillee
2023-12-01 03:02:20 +00:00
235eaabfed [inductor][easy] print out exception message upon failing to write to a file (#114836)
To address Oleg's internal review feedback.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114836
Approved by: https://github.com/khabinov
2023-12-01 02:40:43 +00:00
1aa54bdebf [ONNX] Fix op level debug on complex dtype support (#114885)
Prior to this PR, op-level debug reported mismatches whenever complex dtypes were involved, because ONNX supports the real representation. This PR makes sure we use the real representation to compare the results.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114885
Approved by: https://github.com/BowenBao
2023-12-01 02:17:27 +00:00
1d95644740 [Execution Trace] record root rank for broadcast/gather/reduce/scatter (#113828)
Summary:
Collectives like broadcast/gather/reduce/scatter need root-rank info in order to be replayed in PARAM benchmarks. Log the root rank instead of the local rank in RECORD_PARAM_COMMS_DATA.

Reference: distributed/c10d/Types.hpp

Test Plan: Tested in HPC

Differential Revision: D51381196

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113828
Approved by: https://github.com/fduwjj
2023-12-01 01:28:49 +00:00
6cba8b584d [Dynamo] Support torch.cuda.amp.custom_fwd/custom_bwd by inlining (#114891)
Fixes #114693

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114891
Approved by: https://github.com/zou3519
2023-12-01 01:23:51 +00:00
7f40640342 [Dynamo] Support torch.amp.autocast as decorator (#114845)
Fixes #114818

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114845
Approved by: https://github.com/jansel
2023-11-30 23:54:57 +00:00
ad09d81694 Allow functionalization to work with optional mutable (#114803)
Summary: Added functionalization support for optional mutable arguments.

Test Plan: CI tests.

Reviewed By: zou3519

Differential Revision: D51209981

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114803
Approved by: https://github.com/zou3519
2023-11-30 23:48:03 +00:00
7b3e45be59 [DeviceMesh] Rename get_dim_groups to get_group (#114708)
Rename get_dim_groups to get_group and update all callsites.

Differential Revision: [D51629801](https://our.internmc.facebook.com/intern/diff/D51629801/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114708
Approved by: https://github.com/XilunWu, https://github.com/wanchaol, https://github.com/fegin
2023-11-30 23:40:14 +00:00
38ae17d166 [dynamo, docs] update dynamo backend registration docs (#114820)
Update docs to reflect current backend registration API. Add `lookup_backend` to root `dynamo` module.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114820
Approved by: https://github.com/eellison
2023-11-30 21:41:05 +00:00
1f845d5898 [CI] Fix a REQUIRE_HIGHER_TOLERANCE comparison bug (#114870)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114870
Approved by: https://github.com/jansel
2023-11-30 21:11:15 +00:00
386b9c2adc build small pip wheels for CUDA 11.8 (#114620)
As discussed, we would like to start building all wheels using the CUDA PyPI dependencies.
Adding the "small wheel" workflow for CUDA 11.8 as it's already used for 12.1U1.

CC @malfet @atalman

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114620
Approved by: https://github.com/atalman, https://github.com/malfet
2023-11-30 20:50:31 +00:00
2ab2e8e1c0 [pytree] support collections.defaultdict type for Python pytree (#113255)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113255
Approved by: https://github.com/zou3519
ghstack dependencies: #112485
2023-11-30 20:46:25 +00:00
baeb0705fe [ONNX][Bench] Add warmup for onnx cuda runs (#114821)
Improves perf-measurement accuracy, especially for low-iteration runs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114821
Approved by: https://github.com/thiagocrepaldi
ghstack dependencies: #112179, #114767
2023-11-30 20:41:44 +00:00
c867fddab5 [inductor] Fix in CppPrinter._print_Pow (#114872)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114872
Approved by: https://github.com/lezcano
2023-11-30 20:21:44 +00:00
81adbb6131 Sort the output of TORCH_LOGS=help (#114657)
Previously the order was random because it was based on the order of dictionary keys.
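A sketch of the fix pattern (illustrative only, not the actual torch._logging code):

```python
# Before: iterating a dict in insertion/hash order made the help text
# nondeterministic. After: sort the registered names before printing.
registry = {"inductor": "...", "dynamo": "...", "aot": "..."}
for name in sorted(registry):
    print(f"{name}: {registry[name]}")
```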

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114657
Approved by: https://github.com/lezcano
2023-11-30 20:13:51 +00:00
b35ca2cb94 Better error message for misconfigured torchbench model (#114827)
```
  File "/home/jansel/pytorch/./benchmarks/dynamo/torchbench.py", line 381, in load_model
    benchmark_cls.name = model_name
AttributeError: 'NoneType' object has no attribute 'name
```
becomes
```
  File "/home/jansel/pytorch/./benchmarks/dynamo/torchbench.py", line 381, in load_model
    raise NotImplementedError(f"{model_name}.Model is None")
NotImplementedError: torchrec_dlrm.Model is None
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114827
Approved by: https://github.com/xuzhao9, https://github.com/yanboliang
2023-11-30 19:11:01 +00:00
57e482010a Fix build-deps in benchmarks/dynamo/Makefile (#114815)
This works around an error caused by a missing git python-versioning dependency.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114815
Approved by: https://github.com/yanboliang
2023-11-30 19:10:56 +00:00
7b3429d97c Fix error with int+SymBool (#114828)
Fixes #104797

```
  File "/home/jansel/pytorch/torch/_dynamo/utils.py", line 1486, in <lambda>
    lambda: run_node(tx.output, node, args, kwargs, nnmodule)
  File "/home/jansel/pytorch/torch/_dynamo/utils.py", line 1591, in run_node
    raise RuntimeError(fn_str + str(e)).with_traceback(e.__traceback__) from e
  File "/home/jansel/pytorch/torch/_dynamo/utils.py", line 1570, in run_node
    return node.target(*args, **kwargs)
  File "/home/jansel/conda/envs/pytorch/lib/python3.10/site-packages/einops/packing.py", line 153, in unpack
    n_unknown_composed_axes = sum(x == -1 for x in lengths_of_composed_axes)
torch._dynamo.exc.TorchRuntimeError: Failed running call_function <function unpack at 0x7f644b962710>(*(FakeTensor(..., device='cuda:0', size=(1, s0*s1, 128)), [(s0, s1)], 'b * c'), **{}):
unsupported operand type(s) for +: 'int' and 'SymBool'
```
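A minimal illustration of the failure mode (the actual fix teaches dynamo to handle `int + SymBool`; the cast shown below was only a user-level workaround):

```python
lengths_of_composed_axes = [3, -1, 5]

# Eagerly this is fine: bool is a subclass of int, so sum() works.
n_unknown = sum(x == -1 for x in lengths_of_composed_axes)

# Under torch.compile with dynamic shapes, the lengths can be SymInts, so
# `x == -1` yields a SymBool, and `int + SymBool` used to raise. An explicit
# cast sidestepped the error before this fix:
n_unknown = sum(int(x == -1) for x in lengths_of_composed_axes)
```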

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114828
Approved by: https://github.com/lezcano
2023-11-30 18:30:36 +00:00
2a3d8e50fb [pytree] test aligned API signature for C++ and Python pytree (#112485)
Add tests to ensure the C++ and Python pytree provide the same APIs with identical signatures.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112485
Approved by: https://github.com/zou3519
2023-11-30 17:50:06 +00:00
e6b3a8ce5f [export] Refactor export() and separate the non-strict part. (#114697)
Summary: Refactor torch.export to separate the strict and non-strict parts, adding an option to torch.export called `strict=True`.
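A minimal sketch of the new flag (the module and inputs are illustrative):

```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

ep_strict = torch.export.export(M(), (torch.randn(2),), strict=True)
ep_nonstrict = torch.export.export(M(), (torch.randn(2),), strict=False)
```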

Test Plan: buck2 test mode/opt caffe2/test:test_export -- -r non_strict

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114697
Approved by: https://github.com/ydwu4, https://github.com/tugsbayasgalan
2023-11-30 16:47:50 +00:00
e3c42d3fb3 Inductor cpp wrapper: fix buffer free in non-AOT mode (#114741)
We found a performance regression when using the cpp wrapper in non-AOT mode due to the change in https://github.com/pytorch/pytorch/pull/110892.
https://github.com/pytorch/pytorch/pull/110892 only handles the buffer cache in AOT mode but removes the `reset` call without checking whether AOT mode is on or off. This PR updates the buffer free change to only happen when `V.graph.aot_mode is True`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114741
Approved by: https://github.com/jgong5, https://github.com/desertfire
2023-11-30 16:46:55 +00:00
f93ea14309 [dynamo] Added support for math ops on ints with dynamic shapes (#114507)
Fixes #114218

```
import math
import torch

def func(x, a):
    b = math.floor(a + 0.5)
    b = math.radians(a) + b
    y = x + b
    return y

cfunc = torch.compile(func, dynamic=True, fullgraph=True, backend="eager")
x = torch.tensor([0, 1, 2, 3], dtype=torch.float32)
a = 12

out = cfunc(x, a)
```

```
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG] TRACED GRAPH
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]  ===== __compiled_fn_0 =====
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]  <eval_with_key>.0 class GraphModule(torch.nn.Module):
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]     def forward(self, L_a_ : torch.SymInt, s1 : torch.SymInt, L_x_ : torch.Tensor):
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         l_a_ = L_a_
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         l_x_ = L_x_
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         # File: check_math_ops.py:7, code: b = math.floor(a + 0.5)
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         add = l_a_ + 0.5
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         floor = math_floor(add);  add = None
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         # File: /pytorch/torch/_dynamo/polyfill.py:28, code: return math.pi / 180.0 * x
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         mul = 0.017453292519943295 * l_a_;  l_a_ = None
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         # File: check_math_ops.py:9, code: b = math.radians(a) + b
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         add_1 = mul + floor;  mul = floor = None
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         # File: check_math_ops.py:13, code: y = x + b
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         y = l_x_ + add_1;  l_x_ = add_1 = None
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]         return (y,)
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]
[2023-11-29 18:10:08,385] [0/0] torch._dynamo.output_graph.__graph_code: [DEBUG]
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114507
Approved by: https://github.com/lezcano
2023-11-30 14:11:57 +00:00
69f112d586 Call triton bsr_dense_mm/bsr_dense_addmm kernels on mm/addmm float32 inputs when appropriate (#114757)
As in the title.

In addition, this PR fixes a bug in the `bsr_dense_mm` and `bsr_dense_addmm` return-value handling: computations are performed on the `make_triton_contiguous` return value, while `bsr_dense_mm`/`bsr_dense_addmm` return a tensor that is an input to `make_triton_contiguous`. If `make_triton_contiguous` makes a copy of the input, the return values of `bsr_dense_mm`/`bsr_dense_addmm` will contain garbage.

The PR increases the performance of nn.linear as follows (float32, `NVIDIA A100-SXM4-80GB`):
- with 16x16 blocks, the average/maximal speed up is 67/78 %
- with 32x32 blocks, the average/maximal speed up is 72/79 %
- with 64x64 blocks, the average/maximal speed up is 71/79 %
- with 128x128 blocks, the average/maximal speed up is 62/76 %

The performance increase is illustrated also by the following sparsity-speedup graphs (before and after this PR):
<img src="https://github.com/pytorch/pytorch/assets/402156/55ce0bf7-8ef2-47ab-99e8-8878f159037d" width="48%"> <img src="https://github.com/pytorch/pytorch/assets/402156/df256175-a594-4bd7-b244-90867fb9a45e" width="48%">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114757
Approved by: https://github.com/cpuhrsch
2023-11-30 13:38:07 +00:00
d4128b164d Fix nn.utils.parametrizations.weight_norm for BFloat16 (#114785)
Fixes #107914.
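A minimal repro-style sketch of the now-working path (the layer and shapes are illustrative):

```python
import torch
from torch.nn.utils.parametrizations import weight_norm

m = weight_norm(torch.nn.Linear(4, 4).to(torch.bfloat16))
out = m(torch.randn(2, 4, dtype=torch.bfloat16))  # previously failed for BFloat16
```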

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114785
Approved by: https://github.com/lezcano
2023-11-30 13:18:47 +00:00
272e38e78b [DeviceMesh] Update DeviceMesh's hash (#114812)
Currently, when we create two DeviceMeshes with the same mesh_tensor, their hashes are the same.

To follow the pattern of `dist.new_group()`, the two DeviceMeshes should be different. Therefore, we add an id field to DeviceMesh creation to distinguish different DeviceMeshes.
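A sketch of the new behavior (import path and setup are illustrative; assumes an initialized 4-rank process group):

```python
from torch.distributed._tensor import DeviceMesh

# Two meshes built from the same mesh tensor are now distinct objects with
# distinct hashes, mirroring dist.new_group() semantics.
mesh_a = DeviceMesh("cuda", [0, 1, 2, 3])
mesh_b = DeviceMesh("cuda", [0, 1, 2, 3])
assert hash(mesh_a) != hash(mesh_b)
```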

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114812
Approved by: https://github.com/wanchaol, https://github.com/yoyoyocmu, https://github.com/fegin
2023-11-30 12:14:19 +00:00
db698f733d Update fbgemm_gpu pin (#114847)
Should have been landed together with https://github.com/pytorch/pytorch/pull/101995
Includes de731af65b

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114847
Approved by: https://github.com/kit1980, https://github.com/huydhn
2023-11-30 09:53:50 +00:00
92cd78b1df [C10D] logging/comment clean ups (#114625)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114625
Approved by: https://github.com/fduwjj, https://github.com/XilunWu
ghstack dependencies: #114810
2023-11-30 07:46:32 +00:00
5c3f03e2dd [inductor] add a config to specify the shape attribute for the generated svg graphs (#114811)
We draw our fx graphs with the "record" shape attribute by default.
Sometimes, when the graph is very complex, we may hit dot errors like below:
  "flat edge between adjacent nodes one of which has a record shape -
   replace records with HTML-like labels"
and thus fail to generate a graph. So, let's give the user an option
to specify the shape attribute for the dot graph. For example, passing
INDUCTOR_DOT_GRAPH_SHAPE_SVG = "none" would let us generate HTML-like labels
to work around the above failure.
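For example (treating the name from the commit message as an environment variable, set before inductor draws the graph):

```python
import os

# Switch from the default "record" shape to "none" so dot emits HTML-like labels
os.environ["INDUCTOR_DOT_GRAPH_SHAPE_SVG"] = "none"
```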

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114811
Approved by: https://github.com/weifengpy
2023-11-30 06:10:37 +00:00
e97e2ff445 [CI][MacOS] Cleanup left over local site-packages (#114843)
Once a janitor always a janitor!

Partially addresses https://github.com/pytorch/pytorch/issues/114840

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114843
Approved by: https://github.com/yanboliang
2023-11-30 05:37:53 +00:00
8ae3835323 further deprecate PairwiseParallel and SequenceParallel from test (#114402)
**Remaining Issue**
When replacing SequenceParallel, tests would pass even when setting `input_layouts=Replicate()`. Still looking into it...

**Summary**
This is a follow-up PR to #114314.

**Test Plan**
`python test_files.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114402
Approved by: https://github.com/wanchaol
2023-11-30 05:06:08 +00:00
c1e51fcbfc [ONNX][Bench] Relax tolerance for cuda accuracy check (#114767)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114767
Approved by: https://github.com/thiagocrepaldi
ghstack dependencies: #112179
2023-11-30 04:43:46 +00:00
fd7201029a [Quant] [PT2] Enable Inplace Dropout in _move_exported_model_to_eval (#114725)
**Summary**
Enable Inplace Dropout replacement in `_move_exported_model_to_eval`

**Test Plan**
```
python -u -m pytest -s -v test_quantize_pt2e.py -k test_move_exported_model_to_eval
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114725
Approved by: https://github.com/andrewor14, https://github.com/jgong5
ghstack dependencies: #114547
2023-11-30 04:43:22 +00:00
06eb28c32a [executorch hash update] update the pinned executorch hash (#114814)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned executorch hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114814
Approved by: https://github.com/pytorchbot
2023-11-30 04:35:53 +00:00
bab054063c [Quant] [PT2] Enable batchnorm in _move_exported_model_to_eval (#114547)
**Summary**
Add standalone batchnorm into `_move_exported_model_to_eval` to move it from training mode into eval mode

**Test Plan**
```
python -m pytest test_mkldnn_pattern_matcher.py -k test_qat_bn_conv2d
python -u -m pytest -s -v test_quantize_pt2e.py -k test_bn_move_exported_model_to_eval
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114547
Approved by: https://github.com/jgong5, https://github.com/andrewor14
2023-11-30 04:31:27 +00:00
4ed9e65038 [C10D] Add time_created_us to flight recorder (#114810)
time_created_us is the cpu-side epoch_time (in usec) when a flight-recorder
event was created. It loosely corresponds to the time the c10d collective
API was called and a work object was created.  It does NOT correspond to
the time the collective started on the GPU.

We follow the precedent of microsecond epoch time from this PR, which added timestamps
to the cuda caching allocator:
https://github.com/pytorch/pytorch/pull/112266

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114810
Approved by: https://github.com/zdevito
2023-11-30 04:15:56 +00:00
1f5726708b [PyTorch][ET] Collect Execution Traces in Chakra schema (#114753)
Summary:
Collect execution traces in the Chakra schema

Created a new diff to change email address: D48030418

Test Plan:
```
$ cd ~/fbcode
$ binary_path=$(buck2 build //param_bench/train/compute/python:pytorch_run_benchmark --show-output | tail -1 | awk '{print $2}')
$ cd ~/fbsource
$ $binary_path -c ~/fbcode/param_bench/train/compute/python/examples/pytorch/configs/alex_net.json --et

$ cat ~/is_json.py
import json
import sys

def is_json_file(filename):
    try:
        with open(filename, 'r') as f:
            json.load(f)
        return True
    except Exception as e:
        return False

if len(sys.argv) != 2:
    print("Usage: python check_json.py [filename]")
    sys.exit(1)

filename = sys.argv[1] # get filename from command-line argument
print(is_json_file(filename))

$ python3 ~/is_json.py ~/fbsource/benchmark_result_2244333_1691065899_et.json
True
```

Differential Revision: D51662384

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114753
Approved by: https://github.com/aaronenyeshi
2023-11-30 04:07:11 +00:00
3b7d60b6ff Fix keep-going (#112098)
Adds a new function for continue-on-error behavior.

Another solution might be to run the entire suite to the end and use last-failed, but I'm worried about concurrent processes writing to the same last-failed cache entry. It's also a bit different from the usual test-rerunning strategy we use, especially regarding segfaults and other ways the test suite can suddenly end, and there are some cases where the entire test suite should immediately get rerun in a new process (e.g. a CUDA error that causes sync to fail).

Find example logs on commit 2f1510839727f6ef2631040d5f0edde26265015d

TODO: continue on error for --subprocess and test_distributed aren't working fully
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112098
Approved by: https://github.com/huydhn
2023-11-30 04:01:57 +00:00
d5544125a0 [distributed] NCCLflight recorder timeout fix (#114804)
Because isCompleted() returns true on an exception, a timeout exception
will cause the flight recorder to consider the event completed even though it timed out.

This changes the logic to explicitly query the completion events on "retirement"
when the work item leaves the workMetaList. We mark events as retired so
we can distinguish between an event still in the queue but not completed and one
that timed out.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114804
Approved by: https://github.com/wconstab
2023-11-30 03:46:48 +00:00
e70a7c3296 [CI] Update torchbench pin (#114694)
Summary: also revert the regressed graph breaks count for DALLE2_pytorch in https://github.com/pytorch/pytorch/pull/114598

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114694
Approved by: https://github.com/huydhn, https://github.com/malfet
2023-11-30 03:41:03 +00:00
f1fe0b685c [export] Remove combine_args_kwargs (#114782)
Test Plan: CI

Differential Revision: D51676479

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114782
Approved by: https://github.com/zhxchen17
2023-11-30 02:49:21 +00:00
165f4f6ccf [PyTorch] Redirect c10::optional to std::optional (#101995)
We have C++17 now!

I am intentionally dropping the `c10::optional<c10::ArrayRef>` size optimization. It was intended to improve dispatch, but thanks to D34602980 / #70864 we don't use `optional<ArrayRef>` in function arguments anymore anyway.

Differential Revision: [D46079028](https://our.internmc.facebook.com/intern/diff/D46079028/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101995
Approved by: https://github.com/malfet, https://github.com/Skylion007, https://github.com/ezyang
2023-11-30 02:46:41 +00:00
013675ff59 Revert "Add decomp for replication_pad2d and use for CUDA deterministic (#111590)"
This reverts commit f1286161a637e9fc0797a22a7b7d90eaa04ddc4f.

Reverted https://github.com/pytorch/pytorch/pull/111590 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing XLA job.  The job is also failing on the PR, but the log classifier failed to find the failed test which lead to it being marked wrongly as flaky ([comment](https://github.com/pytorch/pytorch/pull/111590#issuecomment-1833004794))
2023-11-30 02:28:14 +00:00
9f3ec2ad45 deprecate PairwiseParallel from test (#114314)
**Summary**
To solve issue #113706:
1. replace `PairwiseParallel` with `ColwiseParallel` and `RowwiseParallel` (see the sketch after this list).
2. replace ColwiseParallel's `make_input_replicate_1d`/`make_output_replicate_1d` inputs with `input_layouts` and `output_layouts`.
3. deprecate the tests for `_parallelize_mlp`, as it only supports `PairwiseParallel`.
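A sketch of the replacement pattern (mesh setup omitted; submodule names and layouts are illustrative):

```python
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)

# assumes an initialized process group, a 1-D device_mesh, and a model with
# submodules "net1"/"net2"
model = parallelize_module(
    model,
    device_mesh,
    {"net1": ColwiseParallel(), "net2": RowwiseParallel()},
)
```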

**Test Plan**
`pytest pytorch/test/distributed/tensor/parallel/`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114314
Approved by: https://github.com/wanchaol, https://github.com/XilunWu
2023-11-30 02:19:30 +00:00
5262484ece [easy][aotinductor] fix typos & add static typing (#114728)
```
// check all references
$ grep -rl 'cpp_kernel_overlad_name' *
ir.py
```

```
$ lintrunner --take MYPYINDUCTOR torch/_inductor/codegen/wrapper.py torch/_inductor/ir.py
ok No lint issues.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114728
Approved by: https://github.com/Skylion007, https://github.com/chenyang78
2023-11-30 02:10:56 +00:00
4ba649e207 [FSDP][state_dict] Avoid assigning the root _device_mesh to the children _device_mesh (#114384)
Assigning the root _device_mesh to the children _device_mesh is not correct as each FSDP state can have a different DeviceMesh. We are also replacing fully_shard with a new implementation. So there is no need to worry about the fully_shard behavior.

Differential Revision: [D51507959](https://our.internmc.facebook.com/intern/diff/D51507959/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114384
Approved by: https://github.com/wz337
2023-11-30 02:08:31 +00:00
8cfc95368f [Experimental][ONNX] Export with symbolic shapes in proto (#112179)
Experimental feature to store symbolic shapes produced by torch dynamo inside the exported onnx model.
There is no official ONNX spec to support nodes within FunctionProto to have value info, https://github.com/onnx/onnx/issues/5487. The names for value info are generated uniquely to be retrievable based on the call site and call stack.
This requires onnxscript with https://github.com/microsoft/onnxscript/tree/bowbao/export_symbolic_shapes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112179
Approved by: https://github.com/titaiwangms, https://github.com/thiagocrepaldi
2023-11-30 02:03:32 +00:00
f0cc6364ed [export] Remove convert_to_cpu flag (#114775)
Test Plan: CI

Differential Revision: D51674158

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114775
Approved by: https://github.com/zhxchen17, https://github.com/SherlockNoMad
2023-11-30 01:59:52 +00:00
34ea0a2bdc [Pytorch][Vulkan] Create context for layernorm (#114701)
Summary:
`Layernorm` has two arguments, weight and bias, which are stored as constant tensors on the CPU and transferred to the GPU at every inference call. We create a context for this op to avoid the repeated transfer. Specifically, we
- created `create_layernorm_context` and `run_layernorm_context` in `Layernorm.h` and `Layernorm.cpp`
- registered them in `Register.cpp`
- rewrote the graph representation of the op in `vulkan_rewrite.cpp`

Test Plan:
## Numerical test
```
[luwei@devbig984.prn1 /data/users/luwei/fbsource (b6ccc956c)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run fbcode/mode/dev-nosan //xplat/caffe2:pt_vulkan_api_test_bin -- --gtest_filter="*layer_norm*"
Recommended: For faster builds try buck2: replace 'buck' with 'buck2'
NOTE: buck-out/ has changed: look for files in fbsource/buck-out/v2/
'buck2 build --show-output //xplat/caffe2:pt_vulkan_api_test_bin' will print the new output paths.

If you are building in fbsource//xplat and have questions, post in 'Cross Platform Dev Discussions': https://fb.workplace.com/groups/xplat.qa

  Targets matching .buckconfig buck2.supported_projects:
  {'//xplat/caffe2:pt_vulkan_api_test_bin': '//xplat'}

  To suppress this warning: touch ~/.config/.dont_hint_buck2

Building: finished in 0.1 sec (100%) 339/339 jobs, 0/339 updated
  Total time: 0.2 sec
BUILD SUCCEEDED
Running main() from third-party/googletest/1.14.0/googletest/googletest/src/gtest_main.cc
Note: Google Test filter = *layer_norm*
[==========] Running 10 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 10 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.packed_layer_norm_2d
[       OK ] VulkanAPITest.packed_layer_norm_2d (342 ms)
[ RUN      ] VulkanAPITest.packed_layer_norm_3d
[       OK ] VulkanAPITest.packed_layer_norm_3d (284 ms)
[ RUN      ] VulkanAPITest.packed_layer_norm_4d
[       OK ] VulkanAPITest.packed_layer_norm_4d (5 ms)
[ RUN      ] VulkanAPITest.layer_norm_invalid_inputs
[       OK ] VulkanAPITest.layer_norm_invalid_inputs (28 ms)
[ RUN      ] VulkanAPITest.layer_norm_2d
[       OK ] VulkanAPITest.layer_norm_2d (1 ms)
[ RUN      ] VulkanAPITest.layer_norm_3d
[       OK ] VulkanAPITest.layer_norm_3d (2 ms)
[ RUN      ] VulkanAPITest.layer_norm_4d
[       OK ] VulkanAPITest.layer_norm_4d (4 ms)
[ RUN      ] VulkanAPITest.native_layer_norm_2d
[       OK ] VulkanAPITest.native_layer_norm_2d (1 ms)
[ RUN      ] VulkanAPITest.native_layer_norm_3d
[       OK ] VulkanAPITest.native_layer_norm_3d (2 ms)
[ RUN      ] VulkanAPITest.native_layer_norm_4d
[       OK ] VulkanAPITest.native_layer_norm_4d (6 ms)
[----------] 10 tests from VulkanAPITest (679 ms total)

[----------] Global test environment tear-down
[==========] 10 tests from 1 test suite ran. (679 ms total)
[  PASSED  ] 10 tests.
```
Full test result in P888496077, summary as below
```
[----------] 419 tests from VulkanAPITest (21652 ms total)

[----------] Global test environment tear-down
[==========] 419 tests from 1 test suite ran. (21652 ms total)
[  PASSED  ] 418 tests.
[  SKIPPED ] 1 test, listed below:
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log
```

## Graph representation comparison
We created a model using `layer_norm` and traced it as below
```
class MyModel(torch.nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.layer_norm = torch.nn.LayerNorm(normalized_shape=10)

    def forward(self, x):
        return self.layer_norm(x)

# Create an instance of the model
model = MyModel()

# Create a dummy input tensor for tracing
input_tensor = torch.randn(1, 10)

# Use torch.jit.trace to trace the model and generate a graph
traced_model = torch.jit.trace(model, input_tensor)
```
Then we converted the traced model to Vulkan backend using `optimize_for_mobile`
```
from torch.utils import mobile_optimizer

vulkan_model = mobile_optimizer.optimize_for_mobile(
    traced_model, backend="vulkan", preserved_methods=to_preserve
)
```
Then we can print the graph of the `vulkan_model` as `print(vk_model.graph)`

- Before this diff
```
  %4 : bool = prim::Constant[value=1](), scope: __module.layer_norm # /mnt/xarfuse/uid-602118/33e18f68-seed-nspid4026531836_cgpid32066351-ns-4026531840/torch/nn/functional.py:2546:0
  %5 : float = prim::Constant[value=1.0000000000000001e-05](), scope: __module.layer_norm # /mnt/xarfuse/uid-602118/33e18f68-seed-nspid4026531836_cgpid32066351-ns-4026531840/torch/nn/functional.py:2546:0
  %14 : int[] = prim::Constant[value=[10]]()
  %33 : Tensor = aten::to(%x, %53, %30, %31, %31)
  %10 : Tensor = aten::layer_norm(%33, %14, %self.layer_norm.weight, %self.layer_norm.bias, %5, %4), scope: __module.layer_norm # /mnt/xarfuse/uid-602118/33e18f68-seed-nspid4026531836_cgpid32066351-ns-4026531840/torch/nn/functional.py:2546:0
```

- after this diff
```
  %14 : int[] = prim::Constant[value=[10]]()
  %47 : Tensor = aten::to(%x, %78, %44, %45, %45)
  %16 : Tensor = vulkan_prepack::run_layernorm_context(%47, %14, %17)
```

Reviewed By: SS-JIA

Differential Revision: D51530478

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114701
Approved by: https://github.com/yipjustin
2023-11-30 01:33:50 +00:00
597d3fb86a Add additional guard for index_put fallback for bfloat16 on whether it's accumulating or not (#114788)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114788
Approved by: https://github.com/cpuhrsch
2023-11-30 00:33:50 +00:00
80ae00d11a [AOT Refactor] jit compile runtime wrappers (#114564)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548

Total reduction in lines: 5200 lines -> 1100 lines

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114564
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550, #114551, #114552, #114553, #114554, #114555, #114556, #114557, #114558, #114559, #114561, #114562, #114563
2023-11-30 00:28:57 +00:00
741414b739 [AOT Refactor] dispatch compile graph (#114563)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114563
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550, #114551, #114552, #114553, #114554, #114555, #114556, #114557, #114558, #114559, #114561, #114562
2023-11-30 00:28:43 +00:00
abb84051a3 [AOT Refactor] alias runtime wrappers (#114562)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114562
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550, #114551, #114552, #114553, #114554, #114555, #114556, #114557, #114558, #114559, #114561
2023-11-30 00:24:43 +00:00
4d4093a5de [AOT Refactor] traced function transforms pt. 2 (#114561)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114561
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550, #114551, #114552, #114553, #114554, #114555, #114556, #114557, #114558, #114559
2023-11-30 00:24:05 +00:00
dab89d546c [AOT Refactor] traced function transforms pt. 1 (#114559)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548

Current progress: 5200 lines -> 2400 lines

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114559
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550, #114551, #114552, #114553, #114554, #114555, #114556, #114557, #114558
2023-11-30 00:24:05 +00:00
0f41a0e99d [AOT Refactor] (missed) graph signature to i/o analysis (#114558)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114558
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550, #114551, #114552, #114553, #114554, #114555, #114556, #114557
2023-11-30 00:23:59 +00:00
5ab61c1ae1 [AOT Refactor] runtime wrappers (#114557)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114557
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550, #114551, #114552, #114553, #114554, #114555, #114556
2023-11-30 00:23:52 +00:00
7eafdee4d6 [AOT Refactor] input/output analysis (#114556)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548

Current progress: 5200 lines -> 3000 lines

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114556
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550, #114551, #114552, #114553, #114554, #114555
2023-11-30 00:21:00 +00:00
7cb2e8387b [AOT Refactor] collect metadata analysis (#114555)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114555
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550, #114551, #114552, #114553, #114554
2023-11-30 00:21:00 +00:00
e9b03ac36d [AOT Refactor] subclass utils (#114554)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114554
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550, #114551, #114552, #114553
2023-11-30 00:17:57 +00:00
721d99181e [AOT Refactor] schemas (#114553)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548

Current progress: 5200 lines -> 4200 lines

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114553
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550, #114551, #114552
2023-11-30 00:15:28 +00:00
1971eda1db [AOT Refactor] functional utils (#114552)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114552
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550, #114551
2023-11-30 00:12:41 +00:00
850887b0de [executorch hash update] update the pinned executorch hash (#114717)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned executorch hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114717
Approved by: https://github.com/pytorchbot, https://github.com/malfet
2023-11-30 00:08:43 +00:00
ec4b59305b [AOT Refactor] logging utils (#114551)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114551
Approved by: https://github.com/bdhirsh
ghstack dependencies: #114550
2023-11-30 00:06:34 +00:00
41c1090e48 [AOT Refactor] utils (#114550)
---

Part _ of https://github.com/pytorch/pytorch/issues/114548

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114550
Approved by: https://github.com/bdhirsh
2023-11-30 00:02:40 +00:00
b5c4b1d9fe Make Float8 types serializable (#114662)
By finally breaking the FC promise on new dtypes: serialize via untyped
storage together with the tensor dtype.

- Add `_rebuild_tensor_v3` that takes an extra dtype argument
- In `Tensor.__reduce_ex__` serialize tensor using untyped storage for
  v3_dtypes (which are at the moment limited to float8 dtypes)

Test plan: `python -c "import torch;x=torch.arange(10).to(dtype=torch.float8_e4m3fn);torch.save(x, 'pt.pt');print(torch.load('pt.pt'))"`

Fixes https://github.com/pytorch/pytorch/issues/114634

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114662
Approved by: https://github.com/ngimel
2023-11-29 23:23:23 +00:00
fe7b845c8d [tgif] preserve non-forward method during torch package serialization (#114702)
Reviewed By: terrycsy, sayitmemory

Differential Revision: D51607058

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114702
Approved by: https://github.com/houseroad
2023-11-29 22:31:35 +00:00
f1286161a6 Add decomp for replication_pad2d and use for CUDA deterministic (#111590)
Fixes #95578

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111590
Approved by: https://github.com/peterbell10
2023-11-29 21:50:46 +00:00
0ced55e06c Optimize inspect.stack() call in caffe2/torch/library.py (#114700)
Summary: Same optimization as https://github.com/pytorch/pytorch/pull/105940.

Test Plan:
Wait for tests

Verify that the new code extracts the same module in a simple test case:
```
import inspect
import sys

def inside_frame() -> None:
    frame = inspect.stack()[0]
    print(f"Via inspect.stack(): {inspect.getmodule(frame[0])}, extracted frame = {frame[0]}")

    frame = sys._getframe(0)
    print(f"Via sys._getframe: {inspect.getmodule(frame)}, extracted frame = {frame}")

if __name__ == "__main__":
    inside_frame()
```

Output:
```
[jsd115@devbig1161 /tmp/test]$ python3 ./getmodule.py
Via inspect.stack(): <module '__main__' from './getmodule.py'>, extracted frame = <frame at 0x7fc9db9c4dd0, file './getmodule.py', line 6, code inside_frame>
Via sys._getframe: <module '__main__' from './getmodule.py'>, extracted frame = <frame at 0x7fc9db9c4dd0, file './getmodule.py', line 9, code inside_frame>
```

Differential Revision: D51629733

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114700
Approved by: https://github.com/zou3519
2023-11-29 20:54:02 +00:00
acdb278144 [BE]: Enable more ruff PLW checks. Disable one PLR that is preview. (#114759)
Enables a couple more `PLW` checks and disables one that was added that was still in preview mode `PLR6201`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114759
Approved by: https://github.com/jansel
2023-11-29 20:53:26 +00:00
7c1a5012f0 [BE][SparseAdam] cleaner way to verify no sparse params (#114425)
Context:

https://github.com/pytorch/pytorch/pull/47724 fixed the problem that SparseAdam could not handle generators by using the `list(...)` construct. However, this meant that SparseAdam deviated from other optimizers in that it could _accept_ a raw Tensor/Parameter vs requiring a container of them. This is not really a big deal.

So why this PR?

I do think this PR is cleaner. It uses the fact that the Optimizer parent class already containerizes parameters into parameter groups, so we can reuse that here by calling `super().__init__` first and then filtering the param_groups after. This change would also make SparseAdam consistent with the rest of our optimizers in that only containerized params are accepted, which technically is BC-breaking, SO I've added a deprecation warning that we should remove in May 2024.

(But is it really BC breaking when we've said in the docs that params should be an iterable this whole time? Maybe this is just a bug fix....😛)
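A sketch of the now-expected call pattern (the parameter is illustrative):

```python
import torch

p = torch.randn(4, 4, requires_grad=True)
opt = torch.optim.SparseAdam([p], lr=1e-3)  # pass a container, like other optimizers
# torch.optim.SparseAdam(p)                 # raw tensor: deprecated by this PR
```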

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114425
Approved by: https://github.com/drisspg
2023-11-29 19:47:03 +00:00
febbc48f43 [DeviceMesh] Make our mesh_dim kwarg naming consistent (#114707)
Changing `size(self, dim: Optional[int] = None)` to `size(self, mesh_dim: Optional[int] = None)` so it is consistent with the rest of our APIs.

We also update this API usage change in both PT and internal (pyper, APS).
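Usage after the rename (sketch; mesh construction omitted):

```python
# assuming `mesh` is a DeviceMesh
world = mesh.size()            # total number of ranks in the mesh
dim0 = mesh.size(mesh_dim=0)   # number of ranks along mesh dim 0
```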

Differential Revision: [D51602986](https://our.internmc.facebook.com/intern/diff/D51602986/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114707
Approved by: https://github.com/XilunWu, https://github.com/wanchaol, https://github.com/fegin
2023-11-29 19:43:23 +00:00
d197f5c72b Remove unused call to inspect.stack() in torch/_custom_op/impl.py (#114698)
Summary: Fetching the stack isn't free and this variable isn't used. Let's not do the work.

Test Plan: Wait for tests

Differential Revision: D51629732

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114698
Approved by: https://github.com/zou3519, https://github.com/Skylion007
2023-11-29 19:33:52 +00:00
a9d5133207 [ez][doc] Fix sample code in onnx_dynamo.rst (#114770)
By adding `import torch.nn as nn`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114770
Approved by: https://github.com/atalman, https://github.com/thiagocrepaldi
2023-11-29 19:27:52 +00:00
ffa974b940 [CI] Dump more detailed error msg in PT2 integration tests (#114683)
Summary: Sometimes a PT2 CI test shows as both pass and infra_error, e.g. https://github.com/pytorch/pytorch/actions/runs/7015184949/job/19086433407. Add more logging to investigate what has happened.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114683
Approved by: https://github.com/eellison
2023-11-29 18:44:23 +00:00
e38a3a6079 Revert "[dynamo / DDP] - lazily compile submodules - to propagate real tensor strides to backend compiler (#114154)"
This reverts commit 3f574eadb4d8a4c9cf9eb2fcd91a2944f3555886.

Reverted https://github.com/pytorch/pytorch/pull/114154 on behalf of https://github.com/clee2000 due to reverted internally, broke internal builds, not sure why bot isn't working ([comment](https://github.com/pytorch/pytorch/pull/114154#issuecomment-1832496040))
2023-11-29 18:43:17 +00:00
83c0763dda [CI] Use linux.12xlarge for cpu_inductor integration tests (#114729)
Summary: use linux.12xlarge for larger memory to avoid OOM

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114729
Approved by: https://github.com/huydhn
2023-11-29 18:39:53 +00:00
c1f7d4ad6a [Inductor][fx pass] Refactor code to easily add pointwise op to do the batch fusion (#113381)
Summary:
1. We refactor the code to have a unified API for adding pointwise ops

2. Add one more op, sigmoid, since we observed it in MC models

Test Plan:
# local reproduce for CMF

```
buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode split_batch -c
```
P876977403
P876996776

diffing: https://www.internalfb.com/intern/diffing/?paste_number=876999623

Differential Revision: D51142990

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113381
Approved by: https://github.com/xuzhao9
2023-11-29 18:29:57 +00:00
ba4285bd9e Deprecate primTorch module, replace it with decompositions in module Owners (#114754)
Context: pt2 oncall is revamping its labeling system. One of the guidelines is to remove duplicate labeling in our system. Both the primTorch and decomposition labels refer to the same thing. primTorch was the legacy name (and we no longer have a primTorch project), so using decomposition as the label name makes more sense.

Right now, the only open issues that use "module: primTorch" are the ones generated by the DISABLED bots. Once we replace the label in the bot, we can safely remove the primTorch label.

Here is an example of an issue that has the primTorch label:
https://github.com/pytorch/pytorch/issues/112719

Torchbot uses following logic to auto extract module owners:
https://github.com/pytorch/test-infra/blob/main/torchci/pages/api/flaky-tests/disable.ts#L391

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114754
Approved by: https://github.com/huydhn
2023-11-29 18:27:20 +00:00
b6df841460 Fixed an issue where a user-specified default device clashed with the device placement of the RNG (#114560)
This PR now ignores the user-specified default device, allocates the tensor on the CPU, and then moves the tensor to the device of the input tensor. This was more or less already the standard procedure in case the default device wasn't set.

Fixes #114536.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114560
Approved by: https://github.com/soulitzer
2023-11-29 17:45:49 +00:00
b20330ef81 [CI] Test PyTorch on M1 using OpenMP (#114738)
Baby step towards https://github.com/pytorch/pytorch/issues/114721
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114738
Approved by: https://github.com/DanilBaibak, https://github.com/atalman
2023-11-29 17:41:35 +00:00
e891a3bba9 [releng] Add release 2.2 to Release Compatibility Matrix for PyTorch releases (#114758)
Update RELEASE.md for release 2.2

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114758
Approved by: https://github.com/DanilBaibak
2023-11-29 16:27:59 +00:00
4a4c9fb0b8 [ROCm] Add ROCm AMDGPU support for inductor cpp codegen (#105141)
Follows from previous enablement attempt: https://github.com/pytorch/pytorch/pull/101797

Adds support for hsaco binaries in inductor's cpp_wrapper codegen and enables the CUDA tests in test_cpp_wrapper.

This PR also brings in additional required hipify mappings for the wrapper codegen file.

NOTE: we can unskip some of these tests once MI210 runners are enabled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105141
Approved by: https://github.com/jansel, https://github.com/malfet
2023-11-29 15:11:24 +00:00
a3bbf9ce3e [BE][RelEng] Remove dynamo extra (#114720)
As all dynamo dependencies are part of the default requirements, see
```
% curl -s https://pypi.org/pypi/torch/2.1.1/json | jq '.info.requires_dist'
[
  "filelock",
  "typing-extensions",
  "sympy",
  "networkx",
  "jinja2",
  "fsspec",
  "nvidia-cuda-nvrtc-cu12 (==12.1.105) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cuda-runtime-cu12 (==12.1.105) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cuda-cupti-cu12 (==12.1.105) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cudnn-cu12 (==8.9.2.26) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cublas-cu12 (==12.1.3.1) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cufft-cu12 (==11.0.2.54) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-curand-cu12 (==10.3.2.106) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cusolver-cu12 (==11.4.5.107) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cusparse-cu12 (==12.1.0.106) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-nccl-cu12 (==2.18.1) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-nvtx-cu12 (==12.1.105) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "triton (==2.1.0) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "jinja2 ; extra == 'dynamo'",
  "opt-einsum (>=3.3) ; extra == 'opt-einsum'"
]
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114720
Approved by: https://github.com/kit1980, https://github.com/huydhn
2023-11-29 15:08:27 +00:00
b6a30bbfb6 [Dynamo] Forward fix dynamo trace rule test failure due to landing race (#114739)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114739
Approved by: https://github.com/janeyx99, https://github.com/huydhn
2023-11-29 09:31:12 +00:00
d2f4215dbb [quant][pt2e] Fix the order for implicit sharing code (#114704)
Summary:
Current order of implicit sharing breaks common annotation patterns of SharedQuantizationSpec, so we changed the order here.
But it's not going to work in all possible annotation cases, so quantizer implementors still need to be careful.
In general, if people only refer to nodes/edges that come before the current node/edge in SharedQuantizationSpec, it should work, I think.

Test Plan: CI; verified this fixes some internal tests

Differential Revision: D51605918

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114704
Approved by: https://github.com/andrewor14
2023-11-29 08:58:28 +00:00
7692595834 Use different conv layout optimization heuristics for inference (#114600)
While many models regress in training when converted to channels last, in inference the results are quite different. Almost all of the models experienced a speedup when converted to channels last. There were a few big regressions in torchbench - `timm_regnet` from `1.4343 → 1.0573` and `timm_resnet` from `1.7484 → 1.2868`.

 I used a modified version of the operator benchmark script [here](https://gist.github.com/eellison/e11dc645412f52e8b45fb26ba6f9f6a1) to measure the average speedup of convolutions across all of the input shapes found in torchbench, according to the existing classifications that @shunting314 used: grouped convs, small-channel convs, and convolutions with more in-channels than out-channels. Only grouped convolutions benchmarked as a slowdown in inference.

I updated the inference heuristic to multiply the flops of each conv with its predicted speedup/slowdown in channels last. With this heuristic the two previously regressing models no longer regress.
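A hypothetical sketch of that weighted decision (the names and structure below are invented for illustration; `predicted_speedup` stands in for the benchmarked per-shape-class channels-last multiplier, where >1 means faster):

```python
def prefer_channels_last(convs, predicted_speedup):
    # weight each conv's flops by its benchmarked channels-last speedup
    weighted = sum(c.flops * predicted_speedup(c) for c in convs)
    return weighted > sum(c.flops for c in convs)
```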

Speeds up inference for torchbench ~8% and timm ~6%. The motivating model here was SDXL which now hits channels last and improves 10%.

There were some models that were sped up in training when forcing channels last (along with a number of regressions). It's possible there is some speedup in training to be had with additional heuristics. We could also have more granular classification/predictions which might benefit both training and inference.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114600
Approved by: https://github.com/jansel, https://github.com/shunting314
2023-11-29 07:53:59 +00:00
cyy
4e38178bb8 [Reland] [1/N] Fixes clang-tidy warnings in header files (#114668)
Reland of #113608 after fixing the problematic parts.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114668
Approved by: https://github.com/huydhn
2023-11-29 07:11:51 +00:00
c10893654e [export] Fix run_decomps to work with fake mode (#114714)
Fixes https://github.com/pytorch/pytorch/issues/114711
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114714
Approved by: https://github.com/ydwu4, https://github.com/zhxchen17
2023-11-29 06:52:13 +00:00
a076a74f11 [Nested Tensor] Add xpu device in assertion for nested tensor creation (#114664)
Add xpu device checking in nested tensor creation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114664
Approved by: https://github.com/jgong5, https://github.com/xunnanxu
2023-11-29 05:59:35 +00:00
69c4819f53 Add bsr_dense_addmm triton kernel (#114595)
As in the title.

The `bsr_dense_addmm` kernel implemented in this PR is a generalization of `bsr_dense_mm` in the following respects (in addition to having input, beta, and alpha parameters; see the semantics sketch below):
- it implements a `SPLIT_N` kernel parameter that enables efficient kernel launches in the case of wide inputs. For instance, the timing of nn.linear with 256x256 BSR weights having 16x16 blocks and 256x131072 strided input was reduced about 16x (this corresponds to the 94 % speed up value listed below)
- it supports rectangular blocks in sparse BSR tensor weights
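Semantics sketch of the new kernel, expressed via a dense reference (shapes and blocksize are illustrative):

```python
import torch

# bsr_dense_addmm(input, bsr, dense, beta=b, alpha=a) computes
#     b * input + a * (bsr @ dense)
# where `bsr` is a sparse BSR tensor. Dense-equivalent reference:
inp = torch.randn(16, 16)
bsr = torch.randn(16, 16).to_sparse_bsr(blocksize=(16, 16))
dense = torch.randn(16, 16)
ref = torch.addmm(inp, bsr.to_dense(), dense, beta=1.0, alpha=1.0)
```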

The performance increase of nn.linear is as follows (float16, `NVIDIA A100-SXM4-80GB`):
- with 16x16 blocks, the average/maximal speed up is  55/94 %
- with 32x32 blocks, the average/maximal speed up is  33/63 %
- with 64x64 blocks, the average/maximal speed up is  23/42 %
- with 128x128 blocks, the average/maximal speed up is  15/39 %

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114595
Approved by: https://github.com/cpuhrsch
2023-11-29 05:29:25 +00:00
57a5a687b0 [Dynamo][6.2/N] Dump the in graph function list(~2600 ops) and add unit tests. (#114196)
This is the second PR according to https://github.com/pytorch/pytorch/pull/113009#issuecomment-1804417925

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114196
Approved by: https://github.com/jansel
2023-11-29 05:09:48 +00:00
05f071d922 [export] Fix state dict device serialization (#114695)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/114000
Will check with SherlockNoMad on why we need to convert to cpu after his PTO

Test Plan: CI

Differential Revision: D51629068

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114695
Approved by: https://github.com/ydwu4
2023-11-29 05:05:22 +00:00
7c8d3639cf Revert "[fx] log the node when it's get eliminated (#112684)"
This reverts commit 6256d3710e18f08af8588d1aae88c758bd9c6b30.

Reverted https://github.com/pytorch/pytorch/pull/112684 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/112684#issuecomment-1831198778))
2023-11-29 04:31:15 +00:00
64ccdd4afb AOTAutograd: keep input mutations in the graph if they are under no_grad, even if they require_grad (#114646)
Quick recap of events:

(1) https://github.com/pytorch/pytorch/pull/111347, which fixed a perf regression in 2.1 compared to 2.0, introduced a correctness problem around input mutations on inputs that require grad that show up in an inference-only graph (the specific case where this can happen is rare and nobody reported the issue, but it was fixed a few weeks later)

(2) That fix happened here: https://github.com/pytorch/pytorch/pull/113584, which makes sure to keep input mutations outside of the graph, so the autograd engine can set metadata properly on them

That in turn caused a slight regression compared to (1), which is what this PR attempts to fix. In particular, for code like the below it is safe to keep the mutations in the graph:

```
@torch.compile
def f(x):
    x.mul_(2)

x = torch.ones(2, requires_grad=True).clone()
# x requires_grad, so the input mutation will change some autograd metadata, like the version counter
# However, the mutation is under no_grad, so we don't have to worry about e.g. aliases of x having their .grad_fn fields changed
with torch.no_grad():
    f(x)
```

This particular case is pretty important to the shampoo optimizer code, which is run under `torch.compile`, and mutates parameters (which require grad).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114646
Approved by: https://github.com/zou3519
2023-11-29 04:29:32 +00:00
ce00c8fb45 [PyTorch] Remove hardcoded device=cuda in test_aot_inductor (#112797)
All the other tests use self.device, so this seems like an oversight? Cost me a lot of time debugging the minimal arrayref interface, which is only intended for CPU.

Differential Revision: [D50949928](https://our.internmc.facebook.com/intern/diff/D50949928/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112797
Approved by: https://github.com/chenyang78, https://github.com/desertfire, https://github.com/khabinov
ghstack dependencies: #113997
2023-11-29 03:12:33 +00:00
5b9add666f [PyTorch] AOTI: Emit CACHED_TORCH_TYPE only as needed (#113997)
Avoids potential compatibility issues where a new dtype is supported by the DSO but not the binary loading it.

Differential Revision: [D51434335](https://our.internmc.facebook.com/intern/diff/D51434335/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113997
Approved by: https://github.com/int3
2023-11-29 03:12:32 +00:00
73a661abf1 Stop using excess memory in generate_opcheck_tests, re-enable fbgemm TBE tests (#114641)
Summary:
1. We stop using excess memory in generate_opcheck_tests. This is safe because
   all the individual test utils already ensure that they do not modify the
   inputs.
2. We re-enable the fbgemm TBE tests (see internal diff, but all of this is open
   source). They were previously removed because they OOM'ed when run serially;
   (1) and (3) cut down the memory usage to ~20gb peak.
3. I needed to skip some newly failing generated tests and also some that had an
   impact on the memory usage.

Test Plan: - run tests

Reviewed By: sryap

Differential Revision: D51601964

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114641
Approved by: https://github.com/williamwen42
2023-11-29 02:21:13 +00:00
6256d3710e [fx] log the node when it's get eliminated (#112684)
Summary: ATT

Test Plan: CI

Reviewed By: strisunshinewentingwang

Differential Revision: D50912413

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112684
Approved by: https://github.com/zyan0
2023-11-29 01:43:04 +00:00
24f06c7783 [no ci] Add .watchman to .gitignore (#114718)
Followup after https://github.com/pytorch/pytorch/pull/114716

TODO: should the old filename be deleted, or does it just depend on the Atom/VSCode version?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114718
Approved by: https://github.com/kit1980
2023-11-29 01:37:40 +00:00
48820c928c Revert "[test] AOTAutograd: support mutations on buffers that happen during the bw (#112906)"
This reverts commit c8974d649d684a33a5c02a0b112a6e0743201d97.

Reverted https://github.com/pytorch/pytorch/pull/112906 on behalf of https://github.com/huydhn due to lots of failures after this change c8974d649d; this is probably a landrace ([comment](https://github.com/pytorch/pytorch/pull/112906#issuecomment-1831016362))
2023-11-29 00:49:57 +00:00
4bfb19827e Cleanup .watchman file (#114716)
This seems to be an artifact from an fb tool that snuck into a commit (#113117)? CC @malfet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114716
Approved by: https://github.com/mikaylagawarecki, https://github.com/yanboliang, https://github.com/malfet
2023-11-29 00:48:58 +00:00
ae593d0393 [sparse][semi-structured][inductor] meta registrations for _cslt_sparse_mm + additional stride checking in test. (#114685)

Summary:

This PR adds in meta registrations for _cslt_sparse_mm.

Based on the work @drisspg did
in #114370.

Additionally, it updates the tests by checking that the strides of the
sparse result and the result returned by sparse+compile are the same, to
avoid errors like those found in

https://github.com/pytorch/pytorch/pull/114477.
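
A minimal sketch of the stride check described above (names are hypothetical; the real tests live in `test/test_sparse_semi_structured.py`):

```python
import torch

def check_sparse_compile_strides(fn, x):
    # Compare the eager sparse result against the compiled one:
    # both the values and the output strides should match.
    out_eager = fn(x)
    out_compiled = torch.compile(fn, fullgraph=True)(x)
    assert out_eager.stride() == out_compiled.stride()
    torch.testing.assert_close(out_eager, out_compiled)
```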

Test Plan:
```
python test/test_sparse_semi_structured.py -k compile_cusparselt
python test/test_sparse_semi_structured.py -k compile_cutlass
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114685
Approved by: https://github.com/alexsamardzic, https://github.com/drisspg
2023-11-29 00:31:52 +00:00
43d0659d74 [C10D] Fix DUMP_ON_TIMEOUT env (#114699)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114699
Approved by: https://github.com/kwen2501, https://github.com/XilunWu, https://github.com/fduwjj
2023-11-29 00:15:45 +00:00
bc34f02c38 [BE][Easy]: Apply RUF019: remove duplicate checks for dict access (#114478)
Applies RUF019 nightly preview rule to the codebase
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114478
Approved by: https://github.com/mikaylagawarecki
2023-11-29 00:14:02 +00:00
c8974d649d [test] AOTAutograd: support mutations on buffers that happen during the bw (#112906)
I can hold off on reviews / landing until I talk to Driss and we confirm that we need this for FP8. This PR also needs testing and probably shouldn't land until Tugsuu's input mutation handling [PR](https://github.com/pytorch/pytorch/pull/111046) goes through.

What this PR tries to solve is the case where a model mutates some nn module state (a buffer) during the **backward** pass. It appears that this might be necessary for FP8's delayed scaling.

Today, AOTAutograd simply does not realize when you mutate graph inputs while running the backward pass: it functionalizes the mutations away without recognizing them as input mutations. This PR tries to:

(a) detect this situation (input mutations during the backward)

(b) put `copy_()`'s in the graph to properly handle the input mutation when we can. In cases where we can't keep the copy_() in the graph, we just error loudly (I imagine that these cases will be extremely rare, but we can fix them if they ever come up).

This is mostly a prototype for now, not ready for review.

I made this example locally to test out:
```
import torch

class MutatingAutogradFn(torch.autograd.Function):

    @staticmethod
    def forward(ctx, x, buf):
        ctx.save_for_backward(buf)
        return x

    @staticmethod
    def backward(ctx, x_grad):
        buf = ctx.saved_tensors[0]
        buf.add_(x_grad)
        return x_grad * 3, None

class Mod(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.buf = torch.ones(2)

    @torch._dynamo.allow_in_graph
    def backward_mutating_fn(self, x, buf):
        return MutatingAutogradFn.apply(x, buf)

    def forward(self, x):
        tmp = self.backward_mutating_fn(x, self.buf)
        return tmp + self.buf

m = Mod()

x = torch.ones(2, requires_grad=True)
out = m(x)
# After the fw, buf should not have been mutated
print(m.buf)
out.sum().backward()
# bw has run, so buf should now be mutated
print(m.buf)
print(x.grad)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112906
Approved by: https://github.com/ezyang
2023-11-28 23:59:21 +00:00
11277cc510 [CI] Remove an exception catching for Triton compiler error (#113064)
Summary: The workaround was there when Triton compiler was at its early stage.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113064
Approved by: https://github.com/eellison
2023-11-28 23:46:30 +00:00
3fccc0446c Add dtensor and fsdp/2d tests to inductor_distributed CI (#114642)
Smuggle important and not too slow tests to run on this trunk job,
instead of just on the periodic job where they currently reside.
 - test_dtensor_compile took 70sec, test_fsdp_2d_parallel took 198sec
   locally

As a follow-up, organize the distributed-mgpu tests better and maybe
rename this job to reflect its more general 'dist mgpu' scope.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114642
Approved by: https://github.com/wanchaol, https://github.com/malfet
2023-11-28 23:06:18 +00:00
765d4599ee Give users control over packages in torch.utils.collect_env (#112993)
I'm looking to repurpose some logic in `torch.utils.collect_env` for the `geowatch` package. I'm mostly able to just use this script as a library, which is great because it reduces code in my package. However, the issue is that the package patterns that are relevant to torch are hard-coded inside of `get_conda_packages` and `get_pip_packages`.

The changes I made are simple. I defined the default package patterns as two global sets, and I added an argument to each function that lets the user customize exactly what package patterns are relevant. If they are not specified the defaults are used.

I was considering extending the power of the patterns by utilizing `fnmatch`, `re` (or [xdev.pattern](https://github.com/Erotemic/xdev/blob/main/xdev/patterns.py) which abstracts them both), but instead I opted to just use the existing `__contains__` test to keep things simple.

From torch's perspective this should make maintaining this file slightly easier: to update the relevant packages, the developer now updates two neighboring top-level globals instead of two separate local variables. However, it does add an argument to two functions, and that argument isn't used in torch itself, so there is an argument for removing it; users *could* still have some control by modifying the globals. I think the way I did it balances the tradeoffs well.
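
A hedged sketch of the resulting usage (the exact parameter name and return shape are assumptions based on the description above):

```python
from torch.utils.collect_env import get_pip_packages, run

# Override the default package patterns with ones relevant to another project;
# membership is still checked with the plain `__contains__` test described above.
pip_version, pip_list = get_pip_packages(run, patterns={"torch", "numpy", "geowatch"})
print(pip_version, pip_list)
```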
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112993
Approved by: https://github.com/zou3519
2023-11-28 22:35:25 +00:00
ce4bff4013 [dynamo] fix functools.wraps on nested functions (#114279)
Updated version of #108885 addressing the review. In this PR:
- We add a VT.can_reconstruct utility that checks if VT.reconstruct()
  does something.
- If functools.wraps(fn) is passed a `fn` that either has a source or
  has .can_reconstruct() == True, then we stash the source (or the VT)
- Later on, we use the source (or VT.reconstruct) to actually
  reconstruct the object in codegen.
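
A minimal sketch of the kind of user code these changes enable (hedged; the backend and flags are illustrative):

```python
import functools
import torch

def wrap(fn):
    @functools.wraps(fn)  # fn is a nested function: no source, only a VT
    def inner(*args, **kwargs):
        return fn(*args, **kwargs)
    return inner

@torch.compile(backend="eager", fullgraph=True)
def f(x):
    def nested(y):
        return y.sin()
    return wrap(nested)(x)  # previously functools.wraps on a nested fn could fail

f(torch.randn(3))
```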

Test Plan:
- New tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114279
Approved by: https://github.com/voznesenskym
2023-11-28 22:34:59 +00:00
a26d747615 [PyTorch][Vulkan] Fix matrix multiplication performance test binary (#114624)
Summary:
Due to recent changes in D51421256 and D51379737,
- shaders of `mm`, `addmm`, `bmm`, `baddbmm` are reduced into just `mm`,
- height and width packing logic is applied to linear operations

so the current perf tests of `addmm`, `create_linear_context`, and `run_linear_context` are no longer valid (0 latency will be printed; see test plan). Specifically, the original test extracts the latency of `vulkan.addmm`, which doesn't exist anymore. Instead, the current implementation of `addmm` invokes
```
vulkan.convert_channels_to_height_packed
vulkan.convert_channels_to_width_packed
vulkan.mm
vulkan.mul_scalar
vulkan.add
```
To deal with this:
- for `addmm` and `run_linear_context`, we apply a new function `extractTotalShaderResultsAndSetState` which aggregates the latency of all invoked shaders except `nchw_to_image` and `image_to_nchw`;
- for `create_linear_context`, besides `nchw_to_image` and `image_to_nchw`, we also aggregate `vulkan.convert_channels_to_height_packed`

Test Plan:
- build binary, at `fbsource`
```
buck2 build  -c ndk.debug_info_level=0  -c ndk.static_linking=true -c pt.enable_qpl=0 -c pt.vulkan_use_gpu_diagnostics=1 --target-platforms=ovr_config//platform/android:arm32-fbsource //xplat/caffe2:pt_vulkan_mm_perf_test_binAndroid  --show-output  -c pt.vulkan_full_precision=1
```
- test on android device
```
adb push buck-out/v2/gen/fbsource/f1f3f9bed27e143c/xplat/caffe2/__pt_vulkan_mm_perf_test_binAndroid__/pt_vulkan_mm_perf_test_binAndroid /data/local/tmp
adb shell /data/local/tmp/pt_vulkan_mm_perf_test_binAndroid
```
## Before
addmm_benchmark
```
(base) luwei@luwei-mbp ~ % adb shell /data/local/tmp/pt_vulkan_mm_perf_test_binAndroid
2023-11-16T06:48:18+00:00
Running /data/local/tmp/pt_vulkan_mm_perf_test_binAndroid
Run on (4 X 1708.8 MHz CPU s)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
...
Kernel Name              Workgroup Size             Duration (ns)
===========              ==============               ===========
vulkan.nchw_to_image     {500, 500, 1}                    4334408
vulkan.nchw_to_image     {500, 500, 1}                    4327648
vulkan.nchw_to_image     {500, 500, 1}                    4322760
vulkan.convert_channels_to_height_packed{500, 125, 1}                    1233960
vulkan.convert_channels_to_width_packed{125, 500, 1}                    1286896
vulkan.mm                {125, 125, 1}                   76186084
vulkan.mul_scalar        {500, 500, 1}                    1132924
vulkan.mul_scalar        {500, 500, 1}                    1128556
vulkan.add               {500, 500, 1}                    4285788
vulkan.image_to_nchw     {500, 500, 1}                    1421576
...
addmm_benchmark/N:500/M:500/P:500/iterations:5/manual_time/threads:1                      0.000 ms         77.2 ms            5
```
create_linear_context_benchmark
```
Kernel Name              Workgroup Size             Duration (ns)
===========              ==============               ===========
vulkan.nchw_to_image     {500, 500, 1}                    4336696
vulkan.convert_channels_to_height_packed{500, 125, 1}                    1229384
...
create_linear_context_benchmark/N:500/M:500/P:500/iterations:5/manual_time/threads:1       8.57 ms         32.9 ms            5
```
run_linear_context_benchmark
```
Kernel Name              Workgroup Size             Duration (ns)
===========              ==============               ===========
vulkan.nchw_to_image     {500, 500, 1}                    4305548
vulkan.convert_channels_to_height_packed{500, 125, 1}                    1196104
...
run_linear_context_benchmark/N:500/M:500/P:500/iterations:5/manual_time/threads:1         0.000 ms         86.2 ms            5
```

## After
addmm_benchmark
```
Kernel Name              Workgroup Size             Duration (ns)
===========              ==============               ===========
vulkan.nchw_to_image     {500, 500, 1}                    4332016
vulkan.nchw_to_image     {500, 500, 1}                    4321356
vulkan.nchw_to_image     {500, 500, 1}                    4314908
vulkan.convert_channels_to_height_packed{500, 125, 1}                    1195896
vulkan.convert_channels_to_width_packed{125, 500, 1}                    1273428
vulkan.mm                {125, 125, 1}                   77055680
vulkan.mul_scalar        {500, 500, 1}                    1111708
vulkan.mul_scalar        {500, 500, 1}                    1111032
vulkan.add               {500, 500, 1}                    4236024
vulkan.image_to_nchw     {500, 500, 1}                    1429480
...
addmm_benchmark/N:500/M:500/P:500/iterations:5/manual_time/threads:1                       51.1 ms         76.0 ms            5
```
create_linear_context_benchmark
```
Kernel Name              Workgroup Size             Duration (ns)
===========              ==============               ===========
vulkan.nchw_to_image     {500, 500, 1}                    4332432
vulkan.convert_channels_to_height_packed{500, 125, 1}                    1235884
...
create_linear_context_benchmark/N:500/M:500/P:500/iterations:5/manual_time/threads:1       9.74 ms         30.6 ms            5
```
run_linear_context_benchmark
```
Kernel Name              Workgroup Size             Duration (ns)
===========              ==============               ===========
vulkan.nchw_to_image     {500, 500, 1}                    4289740
vulkan.convert_channels_to_height_packed{500, 125, 1}                    1227928
...
run_linear_context_benchmark/N:500/M:500/P:500/iterations:5/manual_time/threads:1          50.4 ms         86.0 ms            5
```
full result in P887658084

Reviewed By: liuk22

Differential Revision: D51506293

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114624
Approved by: https://github.com/yipjustin
2023-11-28 22:27:26 +00:00
d114f31b30 add testcase when bytecode hook changes the bytecode; fix code map (#114487)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114487
Approved by: https://github.com/jansel
2023-11-28 22:14:57 +00:00
47e6cc4d22 Remove yet more type-ignores in dynamo/inductor (#114684)
Probably the last big batch for a while

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114684
Approved by: https://github.com/Skylion007
2023-11-28 22:09:38 +00:00
9f073ae304 [BE][Easy]: add some PLR pylint checks and exclusions to ruff (#114519)
Add a couple of additional checks and exclusions

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114519
Approved by: https://github.com/jansel
2023-11-28 20:49:03 +00:00
74e10f0f60 [inductor] Fix torch.split bug on unbacked symint (#113406)
torch.split(x, l) fails when the sizes in l are unbacked symints.

E.g. l = y.tolist() makes the entries of l unbacked, because they
depend on the data of y. The downstream call `SliceView.create()`
evaluates the shape even when the input shape is an unbacked symint,
which triggers the bug.
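
A hedged repro sketch of the pattern described above (capturing `.tolist()` as unbacked symints is assumed to require the config flag shown):

```python
import torch

torch._dynamo.config.capture_scalar_outputs = True  # lets .tolist() produce unbacked symints

@torch.compile(fullgraph=True)
def f(x, y):
    l = y.tolist()            # entries of l become unbacked symints
    return torch.split(x, l)  # previously hit the SliceView.create() bug

x = torch.randn(10)
y = torch.tensor([3, 3, 4])
print([t.shape for t in f(x, y)])
```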

Test Plan:
python test/inductor/test_unbacked_symints.py -k test_split_with_sizes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113406
Approved by: https://github.com/aakhundov, https://github.com/ezyang
2023-11-28 20:45:13 +00:00
4aa2c51a09 [doc] fix typo on graph 3 that is recorded (#114666)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114666
Approved by: https://github.com/eellison
2023-11-28 20:40:13 +00:00
4a35ec3c0e [docs] correct the code for cudagraph trees integration (#114583)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114583
Approved by: https://github.com/eellison
2023-11-28 20:28:52 +00:00
44c9e4cbf0 [C10D] Decouple PGNCCL desync from dbg dump (#114614)
Add a TORCH_NCCL_DUMP_DEBUG_INFO env var to control dumping
independently of the desync debug feature.

Currently this defaults to disabled (so there is no behavior change by
default), but we plan to default it to true after validation.

Moves the 'sleep for 30 sec' that used to be after desync debug to
before it. In my view, sleeping before desync is equivalent since we
always sleep the same duration, and it keeps the code simpler.

Fixes #114433

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114614
Approved by: https://github.com/zdevito
ghstack dependencies: #114651
2023-11-28 19:46:10 +00:00
cef79c0df4 [inductor] _sparse_semi_structured_linear fallback - no meta registration; not on testing path (#114477)
The test was wrong in the original PR, and the merged changes were never tested. Further, the sparse op was never actually compiled due to a missing `fullgraph=True` and a missing meta registration.

When meta is added as per this PR, it gives wrong answers when input needs to be padded and when input needs to be reshaped.

Is this something to do with the generated inductor code for:
```
 constant_pad_nd: "f16[32, 128]" = torch.ops.aten.constant_pad_nd.default(primals_3, [0, 0, 0, 31], 0.0)
...
slice_1: "f16[1, 128]" = torch.ops.aten.slice.Tensor(_sparse_semi_structured_linear, 0, 0, 1);  _sparse_semi_structured_linear = None
```
and

```
[2023-11-23 14:01:03,463] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         mul: "Sym(s0*s1)" = primals_4 * primals_5
[2023-11-23 14:01:03,463] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         view: "f16[s0*s1, 128]" = torch.ops.aten.view.default(primals_6, [mul, 128]);  primals_6 = mul = None
...
[2023-11-23 14:01:03,463] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         view_1: "f16[s0, s1, 128]" = torch.ops.aten.view.default(slice_1, [primals_4, primals_5, 128]);  slice_1 = None
```

Failing graphs:
Padded:
```
[2023-11-23 13:59:51,102] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO] TRACED GRAPH
[2023-11-23 13:59:51,102] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]  ===== Forward graph 5 =====
[2023-11-23 13:59:51,102] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]  <eval_with_key>.66 class GraphModule(torch.nn.Module):
[2023-11-23 13:59:51,102] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]     def forward(self, primals_1: "f16[128, 64]", primals_2: "i16[128, 8]", primals_3: "f16[1, 128]"):
[2023-11-23 13:59:51,102] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         # File: /home/jonch/Desktop/Programming/mlsys/pytorch/test/test_sparse_semi_structured.py:145, code: x = self.linear(x)
[2023-11-23 13:59:51,102] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         constant_pad_nd: "f16[32, 128]" = torch.ops.aten.constant_pad_nd.default(primals_3, [0, 0, 0, 31], 0.0)
[2023-11-23 13:59:51,102] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         _sparse_semi_structured_linear: "f16[32, 128]" = torch.ops.aten._sparse_semi_structured_linear.default(constant_pad_nd, primals_1, primals_2);  constant_pad_nd = primals_1 = primals_2 = None
[2023-11-23 13:59:51,102] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         slice_1: "f16[1, 128]" = torch.ops.aten.slice.Tensor(_sparse_semi_structured_linear, 0, 0, 1);  _sparse_semi_structured_linear = None
[2023-11-23 13:59:51,102] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         slice_2: "f16[1, 128]" = torch.ops.aten.slice.Tensor(slice_1, 1, 0, 9223372036854775807);  slice_1 = None
[2023-11-23 13:59:51,102] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]
[2023-11-23 13:59:51,102] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         # File: /home/jonch/Desktop/Programming/mlsys/pytorch/test/test_sparse_semi_structured.py:147, code: return torch.nn.functional.relu(x)
[2023-11-23 13:59:51,102] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         relu: "f16[1, 128]" = torch.ops.aten.relu.default(slice_2);  slice_2 = None
[2023-11-23 13:59:51,102] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         alias: "f16[1, 128]" = torch.ops.aten.alias.default(relu)
[2023-11-23 13:59:51,102] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         alias_1: "f16[1, 128]" = torch.ops.aten.alias.default(alias);  alias = None
[2023-11-23 13:59:51,102] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         le: "b8[1, 128]" = torch.ops.aten.le.Scalar(alias_1, 0);  alias_1 = None
[2023-11-23 13:59:51,102] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]
[2023-11-23 13:59:51,102] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         # File: /home/jonch/Desktop/Programming/mlsys/pytorch/test/test_sparse_semi_structured.py:145, code: x = self.linear(x)
[2023-11-23 13:59:51,102] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         permute: "f16[128, 1]" = torch.ops.aten.permute.default(primals_3, [1, 0]);  primals_3 = None
[2023-11-23 13:59:51,102] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         return [relu, le, permute]

```

Reshape:

```
[2023-11-23 14:01:03,463] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]  <eval_with_key>.69 class GraphModule(torch.nn.Module):
[2023-11-23 14:01:03,463] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]     def forward(self, primals_1: "f16[128, 64]", primals_2: "i16[128, 8]", primals_3: "f16[128]", primals_4: "Sym(s0)", primals_5: "Sym(s1)", primals_6: "f16[s0, s1, 128]"):
[2023-11-23 14:01:03,463] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         # File: /home/jonch/Desktop/Programming/mlsys/pytorch/test/test_sparse_semi_structured.py:145, code: x = self.linear(x)
[2023-11-23 14:01:03,463] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         mul: "Sym(s0*s1)" = primals_4 * primals_5
[2023-11-23 14:01:03,463] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         view: "f16[s0*s1, 128]" = torch.ops.aten.view.default(primals_6, [mul, 128]);  primals_6 = mul = None
[2023-11-23 14:01:03,463] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         _sparse_semi_structured_linear: "f16[s0*s1, 128]" = torch.ops.aten._sparse_semi_structured_linear.default(view, primals_1, primals_2, bias = primals_3);  primals_1 = primals_2 = primals_3 = None
[2023-11-23 14:01:03,463] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         slice_1: "f16[s0*s1, 128]" = torch.ops.aten.slice.Tensor(_sparse_semi_structured_linear, 1, 0, 9223372036854775807);  _sparse_semi_structured_linear = None
[2023-11-23 14:01:03,463] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         view_1: "f16[s0, s1, 128]" = torch.ops.aten.view.default(slice_1, [primals_4, primals_5, 128]);  slice_1 = None
[2023-11-23 14:01:03,463] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]
[2023-11-23 14:01:03,463] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         # File: /home/jonch/Desktop/Programming/mlsys/pytorch/test/test_sparse_semi_structured.py:147, code: return torch.nn.functional.relu(x)
[2023-11-23 14:01:03,463] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         relu: "f16[s0, s1, 128]" = torch.ops.aten.relu.default(view_1);  view_1 = None
[2023-11-23 14:01:03,463] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         alias: "f16[s0, s1, 128]" = torch.ops.aten.alias.default(relu)
[2023-11-23 14:01:03,463] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         alias_1: "f16[s0, s1, 128]" = torch.ops.aten.alias.default(alias);  alias = None
[2023-11-23 14:01:03,463] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         le: "b8[s0, s1, 128]" = torch.ops.aten.le.Scalar(alias_1, 0);  alias_1 = None
[2023-11-23 14:01:03,463] [0/2] torch._functorch.aot_autograd.__aot_graphs: [INFO]         return [relu, view, le, primals_4, primals_5]

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114477
Approved by: https://github.com/jcaip
2023-11-28 19:35:05 +00:00
ddf1cb7870 AOTAutograd: handle set_(), detect metadata mutations that cancel out (#111554)
This should be enough to get @voznesenskym 's FSDP branch to plumb `set_()` through AOTAutograd properly and have everything properly no-op out. Main changes are:

(1) graph break on `aten::set_.source_Tensor_storage_offset` (we could support it but it isn't needed, seems safer to graph break)

(2) Functionalization: add a "proper" functionalization kernel for `aten::set_.source_Tensor`. The previous one we had was codegen'd and it was wrong (it would just clone() and call set_(), which does not do the right thing). I also manually mark on the `FunctionalTensorWrapper` when a given tensor has been mutated by a `set_()` call.

(3) AOTAutograd: I added a new field, `InputAliasInfo.mutates_storage_metadata`, so we can distinguish between "regular" metadata mutations, and metadata mutations due to `set_()` calls. This is mainly because at runtime, one requires calling `as_strided_()` to fix up metadata, while the other requires calling `set_()`.

(4) Made AOTAutograd's detection for metadata mutations / set_() mutations smarter and detect no-ops (if the storage and metadata are all the same).

I also killed `was_updated()` and `was_metadata_updated()`, and replaced them with (existing) `has_data_mutation()` and (new) `has_data_mutation()`, which can more accurately distinguish between data-mutation vs. `set_()` calls vs. metadata-mutation

**This PR is still silently incorrect in one case though**, which I'd like to discuss more. In particular, this example:
```
def f(x):
    x_view = x.view(-1)
    x.set_(torch.ones(2))
    x_view.mul_(2)
    return
```

If you have an input that experiences both a data-mutation **and** a `x_old.set_(x_new)` call, there are two cases:

(a) the data mutation happened on the storage of `x_new`. This case should be handled automatically: if x_new is a graph intermediate then we will functionalize the mutation. If x_new is a different graph input, then we will perform the usual `copy_()` on that other graph input

(b) the data mutation happened on the storage of `x_old`. This is more of a pain to handle, and doesn't currently work. At runtime, the right thing to do is probably something like:
```

def functionalized_f(x):
    x_view = x.view(-1)
    # set_() desugars into a no-op; later usages of x will use x_output
    x_output = torch.ones(2)
    # functionalize the mutation on x_view
    x_view_updated = x.mul(2)
    x_updated = x_view_updated.view(x.shape)
    # x experienced TWO TYPES of mutations; a data mutation and a metadata mutation
    # We need to return both updated tensors in our graph
    return x_updated, x_output
def runtime_wrapper(x):
    x_data_mutation_result, x_set_mutation_result = compiled_graph(x)
    # First, perform the data mutation on x's old storage
    x.copy_(x_data_mutation_result)
    # Then, swap out the storage of x with the new storage
    x.set_(x_set_mutation_result)
```

There are two things that make this difficult to do though:

(1) Functionalization: the functionalization rule for `set_()` will fully throw away the old `FunctionalStorageImpl` on the graph input. So if there are any mutations to that `FunctionalStorageImpl` later on in the graph, the current graph input won't know about it. Maybe we can have a given `FunctionalTensorWrapper` remember all previous storages that it had, and track mutations on all of them - although this feels pretty complicated.

(2) AOTAutograd now needs to know that we might have *two* graph outputs that correspond to a single "mutated input", which is annoying.

It's worth pointing out that this issue is probably extremely unlikely for anyone to run into - can we just detect it and error? This feels slightly easier than solving it, although not significantly easier. We would still need `FunctionalTensorWrapper` to keep track of mutations on any of its "previous" storages, so it can report this info back to AOTAutograd so we can raise an error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111554
Approved by: https://github.com/ezyang
ghstack dependencies: #113926
2023-11-28 19:33:35 +00:00
e83c05c833 [ONNX] Add ONNX ExportedProgram tests (#114633)
Fix #114166
Fix #113705

This PR references tests from `test_export.py` to make sure the exported programs from PyTorch can all be successfully exported into ONNX models.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114633
Approved by: https://github.com/thiagocrepaldi
2023-11-28 19:03:13 +00:00
39f16c221e Adding event_tracer evalue logging calls in codegen (#114584)
Summary:
This diff adds support in the ExecuTorch codegen layer to log the outputs of kernels to event_tracer. It does this by calling the `event_tracer_log_evalue` API.

When the `ET_EVENT_TRACER_ENABLED` flag is disabled this is essentially a no-op and will add no overhead.

Test Plan: CI

Reviewed By: larryliu0820

Differential Revision: D51534590

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114584
Approved by: https://github.com/larryliu0820
2023-11-28 18:32:05 +00:00
e6a8052051 [C10D] Flight recorder - disable c++ stacktrace by default (#114651)
CPP Stacktrace processing (symbolizer) takes a long time on some systems
using a particular version of addr2line.  On slow systems, this makes
flight-recorder dumping slow enough to time out on even toy programs.

TORCH_NCCL_TRACE_CPP_STACK=True will re-enable CPP stacktrace collection
as part of the flight recorder.

CPP stacktrace is fast enough for use on certain combinations of OS. We
can investigate moving to llvm's symbolizer as a replacement.

On devserver with C++ stacktraces disabled/enabled:
```
python test/distributed/test_c10d_nccl.py -k test_short
Ran 1 test in 12.175s

TORCH_NCCL_TRACE_CPP_STACK=1 python test/distributed/test_c10d_nccl.py -k test_short
Ran 1 test in 53.338s
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114651
Approved by: https://github.com/zdevito
2023-11-28 16:49:20 +00:00
b060694088 Add bits dtypes to torch._C stubs (#114661)
As defined 6ae0554d11/c10/core/ScalarType.h (L54-L58)
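
For illustration, these dtypes already exist at runtime; the change only adds them to the `torch._C` type stubs (a sketch, assuming the attribute names mirror ScalarType.h):

```python
import torch

# The bits dtypes from c10/core/ScalarType.h
print(torch.bits1x8, torch.bits2x4, torch.bits4x2, torch.bits8, torch.bits16)
```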

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114661
Approved by: https://github.com/ngimel
2023-11-28 15:21:58 +00:00
0bef97fac3 [dynamo] Support itertools.groupby (#114192)
Summary: for https://github.com/pytorch/pytorch/issues/108698
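
A minimal sketch of the now-traceable pattern (hedged; the backend and flags are illustrative):

```python
import itertools
import torch

@torch.compile(backend="eager", fullgraph=True)
def f(x, keys):
    out = x.clone()
    # itertools.groupby over a plain Python list, inside the compiled region
    for key, group in itertools.groupby(keys):
        out = out + key * len(list(group))
    return out

print(f(torch.zeros(3), [1, 1, 2, 3, 3, 3]))
```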

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114192
Approved by: https://github.com/jansel
2023-11-28 14:58:59 +00:00
cc7a969bb3 [FSDP] Added test for ignored_states + auto wrap (#114612)
This adds some unit testing for the `ignored_states` argument and auto wrapping. There is some ongoing discussion with @erhoo82 about his particular use case, but it should not block this PR. (We can land a separate PR if needed.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114612
Approved by: https://github.com/wanchaol
ghstack dependencies: #114611
2023-11-28 14:36:34 +00:00
79ee99e6d2 [easy] Dispatch torch.from_numpy to torch.as_tensor (#114609)
...rather than detaching the tensor

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114609
Approved by: https://github.com/larryliu0820, https://github.com/voznesenskym
ghstack dependencies: #114608
2023-11-28 12:04:37 +00:00
0bb2600c28 Allow to differentiate through NumPy code (#114608)
With this PR it is possible to differentiate through NumPy code modulo
the usual caveats that apply to differentiation:
- That there are no graphbreaks
- That the decomposition in `torch._numpy` is differentiable

@ev-br and I were somewhat careful to achieve the second point, but
it is not tested through and through, so YMMV
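
A minimal sketch of the pattern this enables, mirroring the documented torch.compile NumPy interop (treat the details as illustrative):

```python
import numpy as np
import torch

@torch.compile
def f(x):
    y = np.sin(x.numpy())   # traced into torch._numpy, so it stays differentiable
    return torch.from_numpy(y)

x = torch.randn(3, requires_grad=True)
f(x).sum().backward()
print(x.grad)               # should equal torch.cos(x)
```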

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114608
Approved by: https://github.com/voznesenskym
2023-11-28 12:04:37 +00:00
89a1fe6966 [pytree] register pytree node type in both C++ pytree and Python pytree (#112111)
Changes:

1. Add `_private_register_pytree_node` API in both C++ and Python pytree. In C++ pytree, the API will only register pytree node for C++ pytree. In Python pytree, the API will only register pytree node for Python pytree.
2. Do not allow registering a type as pytree node twice in the Python pytree.
3. Add thread lock to the Python pytree node register API.
4. The old `_register_pytree_node` API will call the `_private_register_pytree_node` API and raise a deprecation warning.
5. Add a new `register_pytree_node` API to register node type in both C++ and Python implementations.
6. Add tests to ensure a warning will be raised when the old private function is called.
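
A minimal sketch of registering a custom node with the new unified API from item 5, assuming the usual `(children, context)` flatten contract:

```python
import torch.utils._pytree as pytree

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

pytree.register_pytree_node(
    Point,
    lambda p: ((p.x, p.y), None),            # flatten: (children, context)
    lambda children, ctx: Point(*children),  # unflatten
)
```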

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112111
Approved by: https://github.com/zou3519
2023-11-28 11:41:38 +00:00
088fc7779e Eliminate unnecessary copy in CUDA addmm with sparse compressed block operand (#114484)
As in the title.

As a result, `nn.functional.linear(<strided tensor>, <BSR tensor>, bias=<strided tensor>)` performance increases as follows (`float16`, `NVIDIA A100-SXM4-80GB`; a usage sketch follows the list):
- 256x256 weights, speed up is 14..27 %
- 512x512 weights, speed up is 9..25 %
- 1024x1024 weights, speed up is 5..20 %
- 2048x2048 weights, speed up is 3..16 %
- 4092x4092 weights, speed up is 2..9 %
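
A hedged sketch of the call path being exercised (shapes and blocksize are illustrative; requires a CUDA device):

```python
import torch
import torch.nn.functional as F

w = torch.randn(512, 512, dtype=torch.float16, device="cuda")
w_bsr = w.to_sparse_bsr((32, 32))  # block-sparse (BSR) weight
x = torch.randn(8, 512, dtype=torch.float16, device="cuda")
bias = torch.randn(512, dtype=torch.float16, device="cuda")

# strided input, BSR weight, strided bias: the path sped up by this PR
out = F.linear(x, w_bsr, bias)
```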

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114484
Approved by: https://github.com/cpuhrsch
2023-11-28 11:35:55 +00:00
00412e6dfa [export] Add meta to params (#114622)
The graph from `capture_pre_autograd_graph` doesn't have `meta["val"]` on the param nodes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114622
Approved by: https://github.com/frank-wei, https://github.com/zhxchen17, https://github.com/khabinov
2023-11-28 07:40:15 +00:00
95aec251aa [Quant] [Inductor] Enable the Inductor Lowering of QConv2d post op hardtanh (#114580)
**Summary**
Enable lowering of the `QConv2d -> hardtanh` fusion pattern, fusing `hardtanh` as a `QConv2d` post operator.

**Test Plan**
```
python -m pytest test_mkldnn_pattern_matcher.py -k test_qconv2d_relu6_cpu
python -m pytest test_mkldnn_pattern_matcher.py -k test_qconv2d_hardtanh_cpu

python -m pytest test_mkldnn_pattern_matcher.py -k test_qat_qconv2d_relu6
python -m pytest test_mkldnn_pattern_matcher.py -k test_qat_qconv2d_hardtanh
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114580
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #114578, #114579
2023-11-28 07:21:30 +00:00
8c1f65dc2b [Quant] [PT2] Add Hardtanh and ReLU6 into X86InductorQuantizer Conv2d Unary Annotation (#114579)
**Summary**
Add `Hardtanh` and `ReLU6` into X86InductorQuantizer Conv2d Unary Annotation

**TestPlan**
```
python -m pytest test_x86inductor_quantizer.py -k test_conv2d_unary
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_unary
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114579
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #114578
2023-11-28 07:18:00 +00:00
8a35a68bb7 [Quant] Enable QConv2d with hardtanh post op (#114578)
**Summary**
Enable QConv2d implementation with post op `hardtanh`

**Test Plan**
```
python -m pytest test_quantized_op.py -k test_qconv2d_hardtanh_pt2e
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114578
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-11-28 07:13:01 +00:00
06abac971a [FSDP] Simplified FSDP wrapping in ignored module test (#114611)
This saves some verbosity. There is no change to functionality.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114611
Approved by: https://github.com/wanchaol
2023-11-28 07:07:37 +00:00
5cfa0647a7 Update mypy to 1.7.0 (#114160)
It appears that `mypy` is now checking a few more previously-unchecked files; these files
are being found via import-following. Not sure exactly why they weren't being checked before.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114160
Approved by: https://github.com/eellison
ghstack dependencies: #114162
2023-11-28 06:45:55 +00:00
71b742b42c [inductor] Remove more type: ignore comments (#114162)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114162
Approved by: https://github.com/Skylion007, https://github.com/eellison
2023-11-28 06:45:55 +00:00
3f574eadb4 [dynamo / DDP] - lazily compile submodules - to propagate real tensor strides to backend compiler (#114154)
Fixes https://github.com/pytorch/pytorch/issues/113812, https://github.com/pytorch/pytorch/issues/102591. Probably fixes: https://github.com/pytorch/pytorch/issues/113740, https://github.com/pytorch/pytorch/issues/113786, https://github.com/pytorch/pytorch/issues/113788

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114154
Approved by: https://github.com/wconstab
2023-11-28 06:29:43 +00:00
6636c2b178 [executorch hash update] update the pinned executorch hash (#114648)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned executorch hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114648
Approved by: https://github.com/pytorchbot
2023-11-28 05:41:36 +00:00
cyy
8933ff3595 Make torch::jit::module movable (#114041)
This PR makes torch::jit::module movable to improve performance.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114041
Approved by: https://github.com/huydhn
2023-11-28 05:03:37 +00:00
2f875c74bf Print ghcr docker pull during build/test (#114510)
To make debugging easier for external devs

Test plan: Copy and run command from [`Use the following to pull public copy of the image`](https://github.com/pytorch/pytorch/actions/runs/7012511180/job/19077533416?pr=114510#step:6:9):
```
docker pull ghcr.io/pytorch/ci-image:pytorch-linux-jammy-py3.8-gcc11-0d0042fd2e432ea07301ad6f6a474d36a581f0dc

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114510
Approved by: https://github.com/atalman, https://github.com/huydhn
2023-11-28 04:38:17 +00:00
0de67e7949 [cpu] Modify inductor opt flag (#113347)
Fixes https://github.com/pytorch/pytorch/issues/113014, https://github.com/pytorch/pytorch/issues/113012, https://github.com/pytorch/pytorch/issues/93598.

For CPU inductor path, remove `-funsafe-math-optimizations` from optimization flags to fix functional issues.

### Validation on 3 benchmark suites

**FP32**
<img width="582" alt="image" src="https://github.com/pytorch/pytorch/assets/23010269/5a648497-a8e2-4057-8dd4-b322e9334456">

- No accuracy problem
- Slight geomean perf drop
- 3 outlier models (speed up < 0.8). Could be solved by adding vectorizations later.

**BF16**
<img width="583" alt="image" src="https://github.com/pytorch/pytorch/assets/23010269/ca1cbd34-5712-4d79-9238-0cc11dd279b1">

- No accuracy problem
- Slight geomean perf drop
- 4 outlier models (speed up < 0.8). Could be solved by adding vectorizations later.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113347
Approved by: https://github.com/jgong5, https://github.com/desertfire
2023-11-28 04:03:24 +00:00
11f11e95df [Quant] [Inductor] Fix an issue in QConv Binary Pattern Match (#114541)
**Summary**
Add an `extra_check` in `_register_quantized_conv_binary_lowering` to skip patterns that match unexpectedly. To match a Conv-Binary pattern, we expect the extra input of the binary node to come from a dequant pattern instead of a constant scalar.

**Test Plan**
```
python -m pytest test_mkldnn_pattern_matcher.py -k test_qconv2d_add_2
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114541
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #114540
2023-11-28 02:59:20 +00:00
8556a09d44 Require less alignment for attn bias (#114173)
# Summary
Improved Fix for Attention Mask Alignment Issue (#112577)

This PR addresses Issue #112577 by refining the previously implemented fix, which was found to be incorrect and to cause unneeded memory regressions. The update simplifies the approach to handling the alignment of the attention mask for mem eff attention.

## Changes
Alignment Check and Padding: Initially, the alignment of the attention mask is checked. If misalignment is detected, padding is applied, followed by slicing. During this process, a warning is raised to alert users.
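
A hedged sketch of the pad-then-slice trick described above (the 8-element alignment requirement is an assumption):

```python
import torch
import torch.nn.functional as F

def align_last_dim(attn_bias: torch.Tensor, align: int = 8) -> torch.Tensor:
    last = attn_bias.size(-1)
    if last % align == 0:
        return attn_bias
    # Pad the last dim up to the alignment boundary, then slice back:
    # the logical shape is unchanged but the underlying storage is aligned.
    pad = align - last % align
    return F.pad(attn_bias, (0, pad))[..., :last]
```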

Should this be warn_once?

We only call expand once, on the aligned mask.

Reference
https://github.com/facebookresearch/xformers/blob/main/xformers/ops/fmha/cutlass.py#L115

@albanD, @mruberry, @jbschlosser, @walterddr, and @mikaylagawarecki.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114173
Approved by: https://github.com/danthe3rd
2023-11-28 02:40:41 +00:00
4abf2b2261 [dynamo] fixed record_replayer issue when TORCH_COMPILE_DEBUG=1 (#114623)
In https://github.com/pytorch/pytorch/pull/113432, we changed
the behavior of _is_allowed_module_prefix, where we removed the '.'
from the module prefixes. Consequently, 'LOAD_ATTR submodule'
(e.g. LOAD_ATTR fx) is turned into PythonModuleVariable instead
of TorchVariable. This caused an issue for
record_replayer.record_module_access, which is enabled by setting
TORCH_COMPILE_DEBUG=1, because 'torch.fx' doesn't exist in
record_replayer's name_to_modrec dictionary when
record_module_access is called.

This PR fixed the issue by adding "torch.fx" into record_replayer's
EXCLUDES list. The fix is likely to be a workaround to unblock
internal workflow. There might be some fundamental changes
to the relevant pieces along with Yanbo's refactoring PRs for
tracing in-graph functions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114623
Approved by: https://github.com/mlazos, https://github.com/yanboliang
2023-11-28 02:40:07 +00:00
2333d381b2 Make 'distributed' TORCH_LOGS include ddpoptimizer (#114376)
There are now 3 ways to see logs from ddpoptimizer.
1) TORCH_LOGS="distributed"
2) TORCH_LOGS="dynamo"
3) TORCH_LOGS="torch._dynamo.backends.distributed"

(1 and 2 are different supersets of 3 that also include other content)

Note: ddp_graphs is still a separate 'artifact' logger, which just
includes graph dumps from the graph-splitting process.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114376
Approved by: https://github.com/wanchaol
2023-11-28 02:39:28 +00:00
ae40a3ebcf [inductor] added a config to dump profiling results to a file (#114587)
Currently, we print the profile bandwidth result for each Triton
kernel to stdout after each profiling run finishes. Consequently,
the profiling results are mixed with other debug output.

This PR adds a config, profile_bandwidth_output, to specify a file
where we can dump the results in sorted order. The new config can
be set via the "TORCHINDUCTOR_PROFILE_OUTPUT" environment variable.
Hopefully it offers a slightly better way to navigate the profiling
results.
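
A hedged sketch of setting the new config from code (the `profile_bandwidth` companion flag is assumed from the existing TORCHINDUCTOR_PROFILE behavior):

```python
import torch._inductor.config as inductor_config

inductor_config.profile_bandwidth = True                        # enable per-kernel bandwidth profiling
inductor_config.profile_bandwidth_output = "/tmp/profile.txt"   # new: dump sorted results to this file
```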

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114587
Approved by: https://github.com/Chillee
2023-11-28 02:21:11 +00:00
6ae0554d11 Enable the lowering of quantized reshape (#114443)
**Summary**
Enable the lowering of `dq->reshape->q` into a `qreshape`

**Test Plan**
```
python -m pytest test_mkldnn_pattern_matcher.py -k test_qflatten
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114443
Approved by: https://github.com/jgong5, https://github.com/eellison, https://github.com/jerryzh168
ghstack dependencies: #114442
2023-11-28 01:43:54 +00:00
4ba3e6758d Canonicalize runtime asserts (#114509)
This allows us to remove quite a few redundant runtime asserts, and potentially a number of guards as well.

On
```
python test/dynamo/test_subclasses.py -k test_unbind
```
we go from
```
inserting runtime assert i0 <= s0
inserting runtime assert 0 <= -i0 + s0
inserting runtime assert i0 + i1 <= s0
inserting runtime assert i0 <= -i1 + s0
inserting runtime assert i0 + i1 + i2 <= s0
inserting runtime assert i0 + i1 <= -i2 + s0
inserting runtime assert Eq(i0 + i1 + i2 + i3, s0)
inserting runtime assert i0 + i1 + i2 + i3 <= s0
inserting runtime assert i0 + i1 + i2 <= -i3 + s0
```
to
```
inserting runtime assert i0 - s0 <= 0
inserting runtime assert i0 + i1 - s0 <= 0
inserting runtime assert i0 + i1 + i2 - s0 <= 0
inserting runtime assert Eq(i0 + i1 + i2 + i3, s0)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114509
Approved by: https://github.com/voznesenskym
2023-11-28 01:38:47 +00:00
74370a8a9d Add adaptive_avg_pool2d and flatten into x86 Inductor Quantizer recipe (#114442)
**Summary**
Add adaptive_avg_pool2d and flatten into x86 Inductor Quantizer recipe

**Test Plan**
```
python -m pytest test_x86inductor_quantizer.py -k test_adaptive_avg_pool2d_recipe
python -m pytest test_x86inductor_quantizer.py -k test_flatten_recipe
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114442
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-11-28 01:35:57 +00:00
e25b146b8c [BE][Easy]: Enable flake8-exe rules in ruff too. (#114521)
Enable flake8-exe rules in ruff too. RUFF requires EXE rules to be enabled separately from the E prefix. This fixes a parity bug between flake8 and ruff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114521
Approved by: https://github.com/kit1980
2023-11-28 01:27:55 +00:00
304ea761f5 [executorch][be] update test_emit to use export (#114294)
Summary: exir.capture is deprecated. Switch to blessed path

Test Plan: fbsource/fbcode/executorch/exir/emit/test (c40a7a0d2)]$ buck test :

Differential Revision: D51503120

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114294
Approved by: https://github.com/zhxchen17
2023-11-28 01:25:46 +00:00
cf9f3ae8d8 Skip an example of test_instance_norm when running internally due to its size (#114452)
After https://github.com/pytorch/pytorch/pull/113420, `torch.unique` now includes a call to `torch.sort` and that call is slow when running in dev mode, i.e. `@fbcode//mode/dev`.  This causes the test to take more than 10 minutes and time out internally [T170720856](https://www.internalfb.com/intern/tasks/?t=170720856).  Running the test in `@fbcode//mode/opt` is fine, so please let me know if there is a way to set that.  Otherwise, this change will skip the largest example when running in sandcastle internally.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114452
Approved by: https://github.com/malfet
2023-11-28 01:11:19 +00:00
e592b9a469 [Quant] [PT2] Fix an issue in Conv Binary Quantization Annotation (#114540)
**Summary**
To annotate a conv-binary pattern, we should skip the pattern if the conv node has more than one user.

**Test Plan**
```
python -m pytest test_x86inductor_quantizer.py -k test_conv2d_binary2
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_binary2
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114540
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-11-28 01:06:48 +00:00
b1fb591272 [replicate] Simplify replicate() init logic and remove unnecessary variables in _ReplicateState (#113679)
Many variables in _ReplicateState were created only because replicate() was lazily initialized. This PR removes these variables and simplifies the logic.

Differential Revision: [D51317874](https://our.internmc.facebook.com/intern/diff/D51317874/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113679
Approved by: https://github.com/awgu
2023-11-28 00:55:36 +00:00
dffa5f3f23 [dynamo][reland] ExecutorchCallDelegateHigherOrderVariable - add sanity check that input and output tensors are disjoint (#114167)
Summary: Reland of https://github.com/pytorch/pytorch/pull/111960, Fixes https://github.com/pytorch/pytorch/issues/111917

Original PR broke some internal tests which the current diff has resolved.

Test Plan: CI

Differential Revision: D51473196

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114167
Approved by: https://github.com/jon-chuang, https://github.com/zou3519
2023-11-28 00:27:23 +00:00
a0be4b7ea7 [fx] Update symbolic_trace nn_module_stack (#114422)
Summary:
Fixed the nn_module_stack produced by symbolic trace to align with the nn_module_stack metadata produced by dynamo. The key should be the module path, with the value being a unique name and the type. Something like: `{'L__self___one_module': ("L['self'].one_module", <class 'torch.fx.graph_module.GraphModule.__new__.<locals>.GraphModuleImpl'>)}`

This was causing some tests to fail when using export + the old quantization flow (prepare_fx calls symbolic_trace).

Test Plan: D51534471 `buck2 run @//mode/dev-nosan //executorch/backends/xnnpack/test:test_xnnpack_quantized -- -r "test_xnnpack_leaky_relu"`

Differential Revision: D51539118

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114422
Approved by: https://github.com/JacobSzwejbka, https://github.com/jerryzh168
2023-11-28 00:18:41 +00:00
f505d76462 Bug fixes to DDP _update_process_group API. (#114194)
https://github.com/pytorch/pytorch/pull/113580 introduced the `DDP._update_process_group` API. However, the implementation did not correctly reset all of the necessary state in the reducer. In particular, if an error occurred during backward, DDP would end up in an incorrect state.

As a result, in this PR I've enhanced the unit test to test for this case and also appropriately fixed resetting Reducer state.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114194
Approved by: https://github.com/rohan-varma
2023-11-27 23:52:40 +00:00
7c98bac4a0 [BE] Speedup register schema compilation (#114438)
For some reason, inlining an initializer list into a std::vector takes a lot of time with clang-15. But considering that there are only a dozen or so distinct tags, creating them once and passing them as a def argument should not affect runtime speed at all, while significantly improving compilation time. On Mac M1 it reduces the time needed to compile RegisterSchema.cpp from 50 to 3 seconds.

Special-case empty tags to keep torchgen tests happy.

Before
```
% /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -ftime-report -DAT_PER_OPERATOR_HEADERS -DCAFFE2_BUILD_MAIN_LIB -DCPUINFO_SUPPORTED_PLATFORM=1 -DFMT_HEADER_ONLY=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -Dtorch_cpu_EXPORTS -I/Users/nshulga/git/pytorch/pytorch/build/aten/src -I/Users/nshulga/git/pytorch/pytorch/aten/src -I/Users/nshulga/git/pytorch/pytorch/build -I/Users/nshulga/git/pytorch/pytorch -I/Users/nshulga/git/pytorch/pytorch/cmake/../third_party/benchmark/include -I/Users/nshulga/git/pytorch/pytorch/third_party/onnx -I/Users/nshulga/git/pytorch/pytorch/build/third_party/onnx -I/Users/nshulga/git/pytorch/pytorch/third_party/foxi -I/Users/nshulga/git/pytorch/pytorch/build/third_party/foxi -I/Users/nshulga/git/pytorch/pytorch/torch/csrc/api -I/Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include -I/Users/nshulga/git/pytorch/pytorch/caffe2/aten/src/TH -I/Users/nshulga/git/pytorch/pytorch/build/caffe2/aten/src/TH -I/Users/nshulga/git/pytorch/pytorch/build/caffe2/aten/src -I/Users/nshulga/git/pytorch/pytorch/build/caffe2/../aten/src -I/Users/nshulga/git/pytorch/pytorch/torch/csrc -I/Users/nshulga/git/pytorch/pytorch/third_party/miniz-2.1.0 -I/Users/nshulga/git/pytorch/pytorch/third_party/kineto/libkineto/include -I/Users/nshulga/git/pytorch/pytorch/third_party/kineto/libkineto/src -I/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/.. -I/Users/nshulga/git/pytorch/pytorch/third_party/FXdiv/include -I/Users/nshulga/git/pytorch/pytorch/c10/.. -I/Users/nshulga/git/pytorch/pytorch/third_party/pthreadpool/include -I/Users/nshulga/git/pytorch/pytorch/third_party/cpuinfo/include -I/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/include -I/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/src -I/Users/nshulga/git/pytorch/pytorch/third_party/cpuinfo/deps/clog/include -I/Users/nshulga/git/pytorch/pytorch/third_party/NNPACK/include -I/Users/nshulga/git/pytorch/pytorch/third_party/FP16/include -I/Users/nshulga/git/pytorch/pytorch/third_party/fmt/include -I/Users/nshulga/git/pytorch/pytorch/third_party/flatbuffers/include -isystem /Users/nshulga/git/pytorch/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /Users/nshulga/git/pytorch/pytorch/cmake/../third_party/googletest/googletest/include -isystem /Users/nshulga/git/pytorch/pytorch/third_party/protobuf/src -isystem /Users/nshulga/git/pytorch/pytorch/third_party/XNNPACK/include -isystem /Users/nshulga/git/pytorch/pytorch/cmake/../third_party/eigen -isystem /Users/nshulga/git/pytorch/pytorch/build/include  -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=braced-scalar-init -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wvla-extension -Wsuggest-override -Wnewline-eof -Winconsistent-missing-override -Winconsistent-missing-destructor-override -Wno-pass-failed -Wno-error=pedantic 
-Wno-error=old-style-cast -Wno-error=inconsistent-missing-override -Wno-error=inconsistent-missing-destructor-override -Wconstant-conversion -Wno-invalid-partial-specialization -Wno-missing-braces -Qunused-arguments -fcolor-diagnostics -faligned-new -Werror -Wno-unused-but-set-variable -fno-math-errno -fno-trapping-math -Werror=format -DUSE_MPS -Wno-unused-private-field -Wno-missing-braces -O3 -DNDEBUG -DNDEBUG -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.0.sdk -fPIC -D__NEON__ -Wall -Wextra -Wdeprecated -Wno-unused-parameter -Wno-unused-function -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-type-limits -Wno-array-bounds -Wno-strict-overflow -Wno-strict-aliasing -fvisibility=hidden -O2 -Wmissing-prototypes -Werror=missing-prototypes -Xpreprocessor -fopenmp -I/Users/nshulga/miniforge3/include -std=gnu++17 -Wno-missing-prototypes -Wno-error=missing-prototypes -o caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterSchema.cpp.o -c /Users/nshulga/git/pytorch/pytorch/build/aten/src/ATen/RegisterSchema.cpp
===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 131.8054 seconds (132.5540 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
  43.6364 ( 33.2%)   0.0919 ( 30.1%)  43.7282 ( 33.2%)  43.9658 ( 33.2%)  536345245380  ModuleInlinerWrapperPass
  43.6291 ( 33.2%)   0.0891 ( 29.2%)  43.7182 ( 33.2%)  43.9549 ( 33.2%)  536264096394  DevirtSCCRepeatedPass
  42.3766 ( 32.2%)   0.0185 (  6.1%)  42.3951 ( 32.2%)  42.6198 ( 32.2%)  523040901767  GVNPass
   0.4085 (  0.3%)   0.0040 (  1.3%)   0.4125 (  0.3%)   0.4195 (  0.3%)  4106085945  SimplifyCFGPass
   0.3611 (  0.3%)   0.0115 (  3.8%)   0.3726 (  0.3%)   0.3779 (  0.3%)  4864696407  InstCombinePass
   0.1607 (  0.1%)   0.0088 (  2.9%)   0.1695 (  0.1%)   0.1720 (  0.1%)  1780986175  InlinerPass
   0.0865 (  0.1%)   0.0024 (  0.8%)   0.0889 (  0.1%)   0.0914 (  0.1%)  1489982961  SROAPass
   0.0750 (  0.1%)   0.0013 (  0.4%)   0.0763 (  0.1%)   0.0764 (  0.1%)  620016338  SCCPPass
   0.0661 (  0.1%)   0.0040 (  1.3%)   0.0701 (  0.1%)   0.0735 (  0.1%)  592027163  EarlyCSEPass
...
===-------------------------------------------------------------------------===
                          Clang front-end time report
===-------------------------------------------------------------------------===
  Total Execution Time: 48.2802 seconds (48.8638 wall clock)
...
 ```

After
```
% /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -ftime-report -DAT_PER_OPERATOR_HEADERS -DCAFFE2_BUILD_MAIN_LIB -DCPUINFO_SUPPORTED_PLATFORM=1 -DFMT_HEADER_ONLY=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -Dtorch_cpu_EXPORTS -I/Users/nshulga/git/pytorch/pytorch/build/aten/src -I/Users/nshulga/git/pytorch/pytorch/aten/src -I/Users/nshulga/git/pytorch/pytorch/build -I/Users/nshulga/git/pytorch/pytorch -I/Users/nshulga/git/pytorch/pytorch/cmake/../third_party/benchmark/include -I/Users/nshulga/git/pytorch/pytorch/third_party/onnx -I/Users/nshulga/git/pytorch/pytorch/build/third_party/onnx -I/Users/nshulga/git/pytorch/pytorch/third_party/foxi -I/Users/nshulga/git/pytorch/pytorch/build/third_party/foxi -I/Users/nshulga/git/pytorch/pytorch/torch/csrc/api -I/Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include -I/Users/nshulga/git/pytorch/pytorch/caffe2/aten/src/TH -I/Users/nshulga/git/pytorch/pytorch/build/caffe2/aten/src/TH -I/Users/nshulga/git/pytorch/pytorch/build/caffe2/aten/src -I/Users/nshulga/git/pytorch/pytorch/build/caffe2/../aten/src -I/Users/nshulga/git/pytorch/pytorch/torch/csrc -I/Users/nshulga/git/pytorch/pytorch/third_party/miniz-2.1.0 -I/Users/nshulga/git/pytorch/pytorch/third_party/kineto/libkineto/include -I/Users/nshulga/git/pytorch/pytorch/third_party/kineto/libkineto/src -I/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/.. -I/Users/nshulga/git/pytorch/pytorch/third_party/FXdiv/include -I/Users/nshulga/git/pytorch/pytorch/c10/.. -I/Users/nshulga/git/pytorch/pytorch/third_party/pthreadpool/include -I/Users/nshulga/git/pytorch/pytorch/third_party/cpuinfo/include -I/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/include -I/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/src -I/Users/nshulga/git/pytorch/pytorch/third_party/cpuinfo/deps/clog/include -I/Users/nshulga/git/pytorch/pytorch/third_party/NNPACK/include -I/Users/nshulga/git/pytorch/pytorch/third_party/FP16/include -I/Users/nshulga/git/pytorch/pytorch/third_party/fmt/include -I/Users/nshulga/git/pytorch/pytorch/third_party/flatbuffers/include -isystem /Users/nshulga/git/pytorch/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /Users/nshulga/git/pytorch/pytorch/cmake/../third_party/googletest/googletest/include -isystem /Users/nshulga/git/pytorch/pytorch/third_party/protobuf/src -isystem /Users/nshulga/git/pytorch/pytorch/third_party/XNNPACK/include -isystem /Users/nshulga/git/pytorch/pytorch/cmake/../third_party/eigen -isystem /Users/nshulga/git/pytorch/pytorch/build/include  -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=braced-scalar-init -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wvla-extension -Wsuggest-override -Wnewline-eof -Winconsistent-missing-override -Winconsistent-missing-destructor-override -Wno-pass-failed -Wno-error=pedantic 
-Wno-error=old-style-cast -Wno-error=inconsistent-missing-override -Wno-error=inconsistent-missing-destructor-override -Wconstant-conversion -Wno-invalid-partial-specialization -Wno-missing-braces -Qunused-arguments -fcolor-diagnostics -faligned-new -Werror -Wno-unused-but-set-variable -fno-math-errno -fno-trapping-math -Werror=format -DUSE_MPS -Wno-unused-private-field -Wno-missing-braces -O3 -DNDEBUG -DNDEBUG -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.0.sdk -fPIC -D__NEON__ -Wall -Wextra -Wdeprecated -Wno-unused-parameter -Wno-unused-function -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-type-limits -Wno-array-bounds -Wno-strict-overflow -Wno-strict-aliasing -fvisibility=hidden -O2 -Wmissing-prototypes -Werror=missing-prototypes -Xpreprocessor -fopenmp -I/Users/nshulga/miniforge3/include -std=gnu++17 -Wno-missing-prototypes -Wno-error=missing-prototypes -o caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/RegisterSchema.cpp.o -c /Users/nshulga/git/pytorch/pytorch/build/aten/src/ATen/RegisterSchema.cpp
===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 1.2920 seconds (1.3187 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  ---Instr---  --- Name ---
   0.3070 ( 27.6%)   0.0547 ( 30.2%)   0.3617 ( 28.0%)   0.3654 ( 27.7%)  3719690895  ModuleInlinerWrapperPass
   0.3024 ( 27.2%)   0.0525 ( 29.0%)   0.3549 ( 27.5%)   0.3585 ( 27.2%)  3653363330  DevirtSCCRepeatedPass
   0.0619 (  5.6%)   0.0073 (  4.0%)   0.0692 (  5.4%)   0.0711 (  5.4%)  868136227  InstCombinePass
   0.0601 (  5.4%)   0.0065 (  3.6%)   0.0666 (  5.2%)   0.0679 (  5.1%)  696430647  InlinerPass
   0.0363 (  3.3%)   0.0033 (  1.8%)   0.0396 (  3.1%)   0.0425 (  3.2%)  535426974  SimplifyCFGPass
   0.0280 (  2.5%)   0.0069 (  3.8%)   0.0348 (  2.7%)   0.0358 (  2.7%)  378716394  BlockFrequencyAnalysis
   0.0208 (  1.9%)   0.0049 (  2.7%)   0.0257 (  2.0%)   0.0262 (  2.0%)  283689627  BranchProbabilityAnalysis
   0.0239 (  2.1%)   0.0002 (  0.1%)   0.0241 (  1.9%)   0.0241 (  1.8%)  219122704  OpenMPOptCGSCCPass
   0.0174 (  1.6%)   0.0015 (  0.8%)   0.0189 (  1.5%)   0.0192 (  1.5%)  215583965  GVNPass
   0.0153 (  1.4%)   0.0025 (  1.4%)   0.0178 (  1.4%)   0.0187 (  1.4%)  184232295  EarlyCSEPass
...
===-------------------------------------------------------------------------===
                          Clang front-end time report
===-------------------------------------------------------------------------===
  Total Execution Time: 2.9128 seconds (3.1027 wall clock)
...
```

And the generated schema file looks as follows:
```cpp
TORCH_LIBRARY(aten, m) {
  const std::vector<at::Tag> tags_0 = {at::Tag::pt2_compliant_tag};
  m.def("_cast_Byte(Tensor self, bool non_blocking=False) -> Tensor", tags_0);
  m.def("_cast_Char(Tensor self, bool non_blocking=False) -> Tensor", tags_0);
  m.def("_cast_Double(Tensor self, bool non_blocking=False) -> Tensor", tags_0);
  m.def("_cast_Float(Tensor self, bool non_blocking=False) -> Tensor", tags_0);
  m.def("_cast_Int(Tensor self, bool non_blocking=False) -> Tensor", tags_0);
  m.def("_cast_Long(Tensor self, bool non_blocking=False) -> Tensor", tags_0);
  m.def("_cast_Short(Tensor self, bool non_blocking=False) -> Tensor", tags_0);
  m.def("_cast_Half(Tensor self, bool non_blocking=False) -> Tensor", tags_0);
  m.def("_backward(Tensor self, Tensor[] inputs, Tensor? gradient=None, bool? retain_graph=None, bool create_graph=False) -> ()", tags_0);
  m.def("set_data(Tensor(a!) self, Tensor new_data) -> ()", tags_0);
  m.def("data(Tensor self) -> Tensor", tags_0);
  m.def("is_leaf(Tensor self) -> bool", tags_0);
  m.def("output_nr(Tensor self) -> int", tags_0);
  m.def("_version(Tensor self) -> int", tags_0);
  m.def("requires_grad_(Tensor(a!) self, bool requires_grad=True) -> Tensor(a!)", tags_0);
  m.def("retain_grad(Tensor(a!) self) -> ()", tags_0);
  m.def("retains_grad(Tensor self) -> bool", tags_0);
  m.def("_fw_primal(Tensor(a) self, int level) -> Tensor(a)", tags_0);
  m.def("_make_dual(Tensor(a) primal, Tensor tangent, int level) -> Tensor(a)", tags_0);
  m.def("_unpack_dual(Tensor(a) dual, int level) -> (Tensor(a) primal, Tensor tangent)", tags_0);
  m.def("_new_zeros_with_same_feature_meta(Tensor self, Tensor other, *, int self_num_batch_dims=0) -> Tensor", tags_0);
  m.def("_has_same_storage_numel(Tensor self, Tensor other) -> bool", tags_0);
  const std::vector<at::Tag> tags_1 = {at::Tag::inplace_view, at::Tag::pt2_compliant_tag};
  m.def("rename_(Tensor(a!) self, Dimname[]? names) -> Tensor(a!)", tags_1);
  m.def("rename(Tensor(a) self, Dimname[]? names) -> Tensor(a)", tags_0);
  m.def("align_to(Tensor(a) self, Dimname[] names) -> Tensor(a)", tags_0);
  m.def("align_to.ellipsis_idx(Tensor(a) self, Dimname[] order, int ellipsis_idx) -> Tensor(a)", tags_0);
  m.def("align_as(Tensor self, Tensor other) -> Tensor", tags_0);
...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114438
Approved by: https://github.com/zou3519
2023-11-27 23:33:04 +00:00
e4b1378a92 Fix dynamo test_logging handling of partial qnames (#114429)
If logger_qname is `a.b.c` and dynamo_qnames contains `a.b`, it still
matches dynamo's INFO setting.

Concretely, `torch._dynamo.backends.distributed` is implicitly part of
the dynamo namespace since it is covered by `torch._dynamo`, which is
one of dynamo_qnames. However, it is not an exact match for any
of dynamo_qnames, which made this test fail when adding a specific
qname for backends.distributed in the subsequent PR.
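
As a toy illustration of the prefix-matching semantics involved (the function here is made up for illustration, not the test's code):

```python
# hypothetical sketch: "a.b" covers "a.b.c", mirroring how the logging
# module treats dotted names hierarchically
def covered_by(logger_qname: str, registered_qnames: list[str]) -> bool:
    return any(
        logger_qname == q or logger_qname.startswith(q + ".")
        for q in registered_qnames
    )

assert covered_by("torch._dynamo.backends.distributed", ["torch._dynamo"])
assert not covered_by("torch._inductor", ["torch._dynamo"])
```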

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114429
Approved by: https://github.com/Skylion007
ghstack dependencies: #114428
2023-11-27 22:52:11 +00:00
2ea2421b44 Skip unit tests that fail on MI210 runners (#114613)
Taken from https://github.com/pytorch/pytorch/pull/105980
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114613
Approved by: https://github.com/malfet
2023-11-27 22:25:35 +00:00
2ac0b61e60 [HigherOrderOp] dedup repeated get_attr placeholders in branches of cond (#112874)
We further de-duplicate the duplicated get_attr nodes.

For code below:
```python
def test_cond_free_variable_in_both_branches(self):
    backend = EagerAndRecordGraphs()
    cnt = CompileCounterWithBackend(backend)

    z = torch.ones(4, 4)

    class Foo(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.register_buffer("buffer", torch.ones(6, 4))

        def forward(self, x, y):
            def true_fn(x):
                return x.sum() + self.buffer.sum() + z.sum()

            def false_fn(x):
                return x.sum() - z.sum() - self.buffer.sum()

            return control_flow.cond(y, true_fn, false_fn, [x])

    mod_for_compile = torch.compile(
        Foo(), backend=cnt, dynamic=True, fullgraph=True
    )
```

Before de-duplication, we have the following graph module:
```python
class GraphModule(torch.nn.Module):
    def forward(self, L_y_ : torch.Tensor, L_x_ : torch.Tensor, s0 : torch.SymInt, L_z_ : torch.Tensor):
        l_y_ = L_y_
        l_x_ = L_x_
        l_z_ = L_z_

        # File: /home/yidi/local/pytorch/test/dynamo/test_higher_order_ops.py:1243, code: return x.sum() + self.buffer.sum() + z.sum()
        l__self___buffer = self.L__self___buffer

        # File: /home/yidi/local/pytorch/test/dynamo/test_higher_order_ops.py:1246, code: return x.sum() - z.sum() - self.buffer.sum()
        l__self___buffer_1 = self.L__self___buffer

        # File: /home/yidi/local/pytorch/torch/_higher_order_ops/cond.py:118, code: return cond_op(pred, true_fn, false_fn, operands)
        cond_true_0 = self.cond_true_0
        cond_false_0 = self.cond_false_0
        cond = torch.ops.higher_order.cond(l_y_, cond_true_0, cond_false_0, [l_x_, l_z_, l__self___buffer, l__self___buffer_1]);  l_y_ = cond_true_0 = cond_false_0 = l_x_ = l_z_ = l__self___buffer = l__self___buffer_1 = None
        return (cond,)

    class GraphModule(torch.nn.Module):
        def forward(self, l_x_, l_z_, l__self___buffer_true_branch, l__self___buffer_1_false_branch):
            l_x__1 = l_x_
            l_z__1 = l_z_

            # File: /home/yidi/local/pytorch/test/dynamo/test_higher_order_ops.py:1243, code: return x.sum() + self.buffer.sum() + z.sum()
            sum_1 = l_x__1.sum();  l_x__1 = None
            sum_2 = l__self___buffer_true_branch.sum();  l__self___buffer_true_branch = None
            add = sum_1 + sum_2;  sum_1 = sum_2 = None
            sum_3 = l_z__1.sum();  l_z__1 = None
            add_1 = add + sum_3;  add = sum_3 = None
            return add_1

    class GraphModule(torch.nn.Module):
        def forward(self, l_x_, l_z_, l__self___buffer_true_branch, l__self___buffer_1_false_branch):
            l_x__1 = l_x_
            l_z__1 = l_z_

            # File: /home/yidi/local/pytorch/test/dynamo/test_higher_order_ops.py:1246, code: return x.sum() - z.sum() - self.buffer.sum()
            sum_1 = l_x__1.sum();  l_x__1 = None
            sum_2 = l_z__1.sum();  l_z__1 = None
            sub = sum_1 - sum_2;  sum_1 = sum_2 = None
            sum_3 = l__self___buffer_1_false_branch.sum();  l__self___buffer_1_false_branch = None
            sub_1 = sub - sum_3;  sub = sum_3 = None
            return sub_1
```

After de-duplication, we have the following graph module:
```python
class GraphModule(torch.nn.Module):
    def forward(self, L_x_ : torch.Tensor, L_y_ : torch.Tensor, s0 : torch.SymInt, L_z_ : torch.Tensor):
        l_x_ = L_x_
        l_y_ = L_y_
        l_z_ = L_z_

        # File: /home/yidi/local/pytorch/test/dynamo/test_higher_order_ops.py:1232, code: return x.sum() + self.buffer.sum() + z.sum()
        l__self___buffer = self.L__self___buffer

        # File: /home/yidi/local/pytorch/torch/_higher_order_ops/cond.py:118, code: return cond_op(pred, true_fn, false_fn, operands)
        cond_true_0 = self.cond_true_0
        cond_false_0 = self.cond_false_0
        cond = torch.ops.higher_order.cond(l_y_, cond_true_0, cond_false_0, [l__self___buffer, l_x_, l_z_]);  l_y_ = cond_true_0 = cond_false_0 = l__self___buffer = l_x_ = l_z_ = None
        return (cond,)

    class GraphModule(torch.nn.Module):
        def forward(self, l__self___buffer, l_x_, l_z_):
            l__self___buffer_1 = l__self___buffer
            l_x__1 = l_x_
            l_z__1 = l_z_

            # File: /home/yidi/local/pytorch/test/dynamo/test_higher_order_ops.py:1232, code: return x.sum() + self.buffer.sum() + z.sum()
            sum_1 = l_x__1.sum();  l_x__1 = None
            sum_2 = l__self___buffer_1.sum();  l__self___buffer_1 = None
            add = sum_1 + sum_2;  sum_1 = sum_2 = None
            sum_3 = l_z__1.sum();  l_z__1 = None
            add_1 = add + sum_3;  add = sum_3 = None
            return add_1

    class GraphModule(torch.nn.Module):
        def forward(self, l__self___buffer_1, l_x_, l_z_):
            l__self___buffer_2 = l__self___buffer_1
            l_x__1 = l_x_
            l_z__1 = l_z_

            # File: /home/yidi/local/pytorch/test/dynamo/test_higher_order_ops.py:1235, code: return x.sum() - z.sum() - self.buffer.sum()
            sum_1 = l_x__1.sum();  l_x__1 = None
            sum_2 = l_z__1.sum();  l_z__1 = None
            sub = sum_1 - sum_2;  sum_1 = sum_2 = None
            sum_3 = l__self___buffer_2.sum();  l__self___buffer_2 = None
            sub_1 = sub - sum_3;  sub = sum_3 = None
            return sub_1

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112874
Approved by: https://github.com/zou3519
2023-11-27 22:07:42 +00:00
4c794f2ef1 Reinplace foreach when safe and allow aliasing during lowering (#112440)
This reduces compile time of Adam on 1k parameters from 180s to 140s (28%), the main reason being that thousands of buffers no longer get sent to the scheduler.

The idea behind this is that if a destination buffer (from a copy_) has no users, it shouldn't matter if dst aliases src.

This is implemented by reinplacing copy_ nodes when safe.
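
As a rough illustration of the rewrite (expressed in eager terms, not Inductor IR):

```python
# if the copy_ destination has no other users, the intermediate buffer
# can be eliminated by writing into the destination directly
import torch

def before(src: torch.Tensor, dst: torch.Tensor) -> None:
    tmp = src.mul(2)   # allocates an intermediate buffer
    dst.copy_(tmp)     # dst is only written here, never read elsewhere

def after(src: torch.Tensor, dst: torch.Tensor) -> None:
    torch.mul(src, 2, out=dst)  # reinplaced: no intermediate buffer
```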

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112440
Approved by: https://github.com/jansel
2023-11-27 21:32:42 +00:00
e0d2a24967 Reland "[export] Support user input mutation. [1/2]" (#114496) (#114596)
Summary:

Serialization not implemented yet. Will do in the next diff.

Resolving Github issues:
https://github.com/pytorch/pytorch/issues/112429
https://github.com/pytorch/pytorch/issues/114142

Test Plan:
onnx doc test
```
python -m xdoctest /opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/onnx/_internal/exporter.py ONNXProgram.model_signature:0
```

Differential Revision: D51588558

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114596
Approved by: https://github.com/angelayi
2023-11-27 20:19:04 +00:00
800cf5f7cb Add USE_C10D_NCCL around NCCL trace utils (#114597)
Fixes #114575

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114597
Approved by: https://github.com/malfet
2023-11-27 19:55:31 +00:00
69024883fb Make dynamo's test_logging print helpful error (#114428)
BEFORE
```
expected torch._dynamo.backends.distributed is DEBUG, got 0
```
(0 is both unhelpful and numerically wrong;
getEffectiveLevel() returns 20, not 0, for this particular case)

AFTER
```
expected torch._dynamo.backends.distributed is DEBUG, got INFO
```
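
For reference, the friendlier form comes straight from the stdlib; a snippet like this (an illustration, not the test's code) maps the numeric level back to its name:

```python
import logging

logger = logging.getLogger("torch._dynamo.backends.distributed")
level = logger.getEffectiveLevel()   # e.g. 20 when INFO is inherited
print(logging.getLevelName(level))   # "INFO" instead of a raw number
```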

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114428
Approved by: https://github.com/Skylion007
2023-11-27 19:18:53 +00:00
7fa1251080 [BE][Easy]: Enable NPY lint rules for ruff (#114476)
Enable NPY lint rules for ruff
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114476
Approved by: https://github.com/justinchuby, https://github.com/malfet
2023-11-27 18:56:10 +00:00
1793ef77c6 [BC-breaking] conv1d & conv3d (#114594)
As discussed here: https://github.com/pytorch/pytorch/pull/113885#discussion_r1404573875

#### TODO
- [x] add error inputs after #114589 is merged

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114594
Approved by: https://github.com/lezcano
2023-11-27 18:30:59 +00:00
4bb3a02d02 [BE]: Enable Ruff + Flake8 G201,G202 logging format rule. (#114474)
Standardizes logging calls to always use logging.exception instead of logging.error where appropriate and enforces it with a lint.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114474
Approved by: https://github.com/jansel, https://github.com/malfet
2023-11-27 17:38:08 +00:00
3a4dea99df ROCm triton commit pin update (#114348)
Small bump in rocm triton commit pin to resolve reported issue on 7900XTX
> RuntimeError: Triton Error [HIP]: Code: 719, Messsage: unspecified launch failure
https://github.com/ROCmSoftwarePlatform/triton/issues/396

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114348
Approved by: https://github.com/jeffdaily
2023-11-27 17:29:23 +00:00
bcfca41a2a [Inductor] fix wrong Inductor UTs (#114504)
# Motivation
These UTs seem wrong. Fix them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114504
Approved by: https://github.com/aakhundov
2023-11-27 17:12:03 +00:00
9fd447c346 [CI] Bump up the graph break count for DALLE2_pytorch temporarily (#114598)
Summary: rotary-embedding-torch's version changing from 0.3.3 to 0.3.6 caused some new graph breaks for DALLE2_pytorch. A proper fix is to pin down rotary-embedding-torch's version in torchbench, and then update our torchbench pin to pick up that change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114598
Approved by: https://github.com/seemethere, https://github.com/aakhundov
2023-11-27 16:43:28 +00:00
56a95afb42 [RelEng] Pin disabled and slow test for release (#114515)
Follow up for https://github.com/pytorch/pytorch/pull/114355
Pin disabled and slow tests when applying release only changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114515
Approved by: https://github.com/DanilBaibak
2023-11-27 15:15:19 +00:00
cff84871ce [reland][opinfo][fix] conv3d & fix conv{1, 2}d for neg dilation|groups & add ErrorInputs for conv ops (#114589)
Previous PR: #113885

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114589
Approved by: https://github.com/lezcano
2023-11-27 14:45:44 +00:00
ccb1de3595 Revert "[inductor] Fix torch.split bug on unbacked symint (#113406)"
This reverts commit cd7d6938c18d90870356553d4631f1388d2bb699.

Reverted https://github.com/pytorch/pytorch/pull/113406 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/113406#issuecomment-1827727411))
2023-11-27 12:20:52 +00:00
fa1ccc34c4 Revert "[export] Support user input mutation. [1/2] (#114496)"
This reverts commit b62c0d96bcbe5f354ddce930fbdcd992dbaf1ce8.

Reverted https://github.com/pytorch/pytorch/pull/114496 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/114496#issuecomment-1827289635))
2023-11-27 07:52:21 +00:00
8232d4d1c3 Revert "[BE]: Enable Ruff + Flake8 G201,G202 logging format rule. (#114474)"
This reverts commit d30497f6b62007c9d1e3c38179528e9d25ac1292.

Reverted https://github.com/pytorch/pytorch/pull/114474 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but I see a bunch of inductor failure after the commit d30497f6b6, trying to revert to see if it helps fix the issues ([comment](https://github.com/pytorch/pytorch/pull/114474#issuecomment-1827271887))
2023-11-27 07:36:08 +00:00
150aaf46ca Revert "[opinfo][fix] conv3d & fix conv{1, 2}d for neg dilation|groups & add ErrorInputs for conv ops (#113885)"
This reverts commit 4fa1ff8404b6c26c076288aa2a0aa77f0c24916a.

Reverted https://github.com/pytorch/pytorch/pull/113885 on behalf of https://github.com/huydhn due to Sorry for reverting you change but its TestCommonCUDA::test_compare_cpu_nn_functional_conv3d test failing in trunk 4fa1ff8404 ([comment](https://github.com/pytorch/pytorch/pull/113885#issuecomment-1827268473))
2023-11-27 07:33:00 +00:00
68a36d2faa [dtensor] refactor some existing test util to use comm mode (#114404)
As titled, this is just a test util refactor: the redistributed
profiler is not good to use, and we should use comm mode going
forward.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114404
Approved by: https://github.com/wconstab
ghstack dependencies: #113592
2023-11-27 06:43:09 +00:00
b62c0d96bc [export] Support user input mutation. [1/2] (#114496)
Summary:
Serialization not implemented yet. Will do in the next diff.

Resolving Github issues:
https://github.com/pytorch/pytorch/issues/112429
https://github.com/pytorch/pytorch/issues/114142

Test Plan:
buck2 run mode/opt caffe2/test:test_export -- -r test_export_
input_mutation

Differential Revision: D51556962

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114496
Approved by: https://github.com/tugsbayasgalan
2023-11-27 04:53:38 +00:00
624f202522 [dtensor] add CommDebugMode for debugging (#113592)
This PR adds a CommDebugMode debugging tool to record the number of
distributed collectives, utilizing TorchDispatchMode. The idea borrows
from FlopCounterMode, and we can expand this later to make it similarly
feature complete.

This is useful for debugging and testing with DTensor. In general it
fits any complex distributed algorithm where it's non-trivial to
understand what happened under the hood; we can later cover c10d
collectives directly.

Not sure if it would be a good general distributed debug tool yet,
so adding to the dtensor package first.
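
A minimal sketch of the underlying mechanism, as a generic op counter (the real CommDebugMode filters for collectives; the class here is illustrative):

```python
from collections import Counter

import torch
from torch.utils._python_dispatch import TorchDispatchMode

class OpCounterMode(TorchDispatchMode):
    """Counts every dispatched op; CommDebugMode applies the same idea
    to distributed collectives."""
    def __init__(self):
        self.counts = Counter()

    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        self.counts[str(func)] += 1
        return func(*args, **(kwargs or {}))

with OpCounterMode() as mode:
    torch.ones(4) + torch.ones(4)
print(mode.counts)
```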

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113592
Approved by: https://github.com/wconstab
2023-11-27 02:40:28 +00:00
081c5b3adc Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926) (#114526)
Summary:

The primary problem we are setting out to solve here is fake tensor freshness. Before this PR, fake tensors after dynamo represented fake tensors *at the end* of trace, so subsequent retraces like aot_autograd would start off with fake tensors in the wrong (end result) state, rather than their expected fresh state. The solution here is to start a fresh fake mode, and re-fakify the tensors. The nuance comes from ensuring that symbols are uniformly created for the symbolic sizes and strides of the tensor.

This PR is the result of *a lot* of back and forth with ezyang and eellison. Initially, the first pass at this was not super different from what we have in the PR - the broad strokes were the same:

1) We cache source->symbol in shape_env
2) We pass policy objects around, stored at dynamo fakification time, and reused for later fakification
3) We create a new fake mode for backends
(from https://github.com/pytorch/pytorch/pull/113605/files)

This is ugly, and has some layering violations. We detoured our decision making through a few other alternatives. Immutable/mutable fake tensor mode was the most interesting alternative, https://github.com/pytorch/pytorch/pull/113653, and was struck down on concerns of complexity in fake mode combined with it not covering all edge cases. We also detoured on what to do about tensor memoization returning back potentially different tensors than requested, and if that was an anti pattern (it is) we want to hack in with the symbol cache (we don't).

We went back to the drawing board here, but with a few concessions:
1) the cache for source->symbol must live outside of shape_env, for both lifecycle, and layering reasons
2) A good amount of work needs to be done to pipe policy around fake_mode and meta_utils correctly, to cover all the cases (ezyang did this)

cc penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 aakhundov kadeng

imported-using-ghimport

Test Plan: Imported from OSS

Reviewed By: huydhn, Chillee

Differential Revision: D51566250

Pulled By: voznesenskym

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114526
Approved by: https://github.com/Chillee, https://github.com/huydhn
2023-11-26 23:40:32 +00:00
4fa1ff8404 [opinfo][fix] conv3d & fix conv{1, 2}d for neg dilation|groups & add ErrorInputs for conv ops (#113885)
Previous PR: https://github.com/pytorch/pytorch/pull/85202

Also, cc'ing @lezcano @kshitij12345 @zou3519, who reviewed my previous PR. Thanks!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113885
Approved by: https://github.com/lezcano
2023-11-26 13:44:30 +00:00
028071c4a1 Fix test assertions in test_min_max_nodes_parse. (#114537)
Calls to `assertTrue` are corrected to `assertEqual` in `ElasticLaunchTest.test_min_max_nodes_parse`.

As originally written, the `assertTrue` statements will always pass, not actually asserting anything of value for the test.
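
A self-contained illustration of the trap (values are made up):

```python
import unittest

class Demo(unittest.TestCase):
    def test_always_passes(self):
        # assertTrue(expr, msg): the second argument is only a failure
        # message, so nothing is compared and any truthy expr passes
        self.assertTrue(1, 2)

    def test_actually_compares(self):
        self.assertEqual(1, 2)  # fails, as a real comparison should
```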

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114537
Approved by: https://github.com/Skylion007
2023-11-26 09:25:41 +00:00
bbdd9b059f [executorch hash update] update the pinned executorch hash (#114486)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned executorch hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114486
Approved by: https://github.com/pytorchbot
2023-11-26 03:50:54 +00:00
d37c4c6995 Update torch.compiler_troubleshooting.rst (#114530)
If you copy and paste the env var in the docs:
```console
TORCHDYNAMO_REPRO_AFTER=“aot”
```
it leads to this error:
```python
    @functools.wraps(unconfigured_compiler_fn)
    def debug_wrapper(gm, example_inputs, **kwargs):
        compiler_fn = functools.partial(unconfigured_compiler_fn, **kwargs)
>       assert config.repro_after in ("dynamo", "aot", None)
E       torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
E       AssertionError:
```
because `config.repro_after` ends up being `'“aot”'` rather than `'aot'`.

---

It would've saved a few minutes of my time 😄
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114530
Approved by: https://github.com/Chillee
2023-11-25 23:15:47 +00:00
0f5e24bda9 Properly type CachedFunction & rename to CachedMethod (#114161)
Previously, I was unsure how to properly type the parameters of a decorated method.
Then I found https://github.com/python/mypy/issues/13222#issuecomment-1193073470
which explains how to use `Concatenate` to hackily achieve it. Not entirely sure why
we can't write a user-defined version of `Callable` that works seamlessly for both functions
and methods...
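
A minimal sketch of the `Concatenate` pattern (Python 3.10+; names here are illustrative, not the PR's actual `CachedMethod` code):

```python
from typing import Callable, Concatenate, ParamSpec, TypeVar

P = ParamSpec("P")
R = TypeVar("R")
S = TypeVar("S")

def cached_method(fn: Callable[Concatenate[S, P], R]) -> Callable[Concatenate[S, P], R]:
    cache: dict = {}

    def wrapper(self: S, /, *args: P.args, **kwargs: P.kwargs) -> R:
        key = (id(self), args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = fn(self, *args, **kwargs)
        return cache[key]

    return wrapper
```

The `Concatenate[S, P]` spelling is what lets the checker see `self` as the first positional parameter while preserving the rest of the method's signature.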

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114161
Approved by: https://github.com/Skylion007
2023-11-25 01:30:23 +00:00
d30497f6b6 [BE]: Enable Ruff + Flake8 G201,G202 logging format rule. (#114474)
Standardizes logging calls to always use logging.exception instead of logging.error where appropriate and enforces it with a lint.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114474
Approved by: https://github.com/jansel
2023-11-24 23:29:51 +00:00
c6d88604d5 [Inductor] Fix mutation tracking of ConvolutionBinaryInplace (#114501)
The init function reorders the arguments, so the mutation actually
happens on argument `input[0]`.

I am not sure if there's a good way to test this, unfortunately. Added
tests in https://github.com/pytorch/pytorch/pull/114436

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114501
Approved by: https://github.com/leslie-fang-intel, https://github.com/aakhundov
2023-11-24 19:32:41 +00:00
0a063ad2c0 [inductor] Pass None and skip constexpr in custom Triton kernel calls from C++ (#114475)
Summary: `None` arguments are codegened as `*i8` in the `triton_meta` of the generated or user-defined Triton kernels:

85aa372374/torch/_inductor/codegen/triton_utils.py (L33-L36)

Due to this, contrary to conventional Triton, we actually should pass `nullptr` to the Triton kernels in the C++ wrapper codegen instead of passing nothing (as normally `None` doesn't make it to the generated PTX parameters, just like `tl.constexpr` args).

This PR adds two things:

1. Proper C++ wrapper codegening (ABI and non-ABI) of `nullptr` and `c10::nullopt`, as the prior way codegening `c10::nullopt` as tensor breaks (also `c10` breaks in the ABI mode).

2. Skipping `tl.constexpr` args when calling the loaded-from-cubin compiled Triton kernel in the C++ wrapper codegen. As a side effect, this also resolves an issue with string arguments: now they are simply omitted in the C++ wrapper codegen.
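
For context, a user-defined Triton kernel with an optional pointer argument looks roughly like this (a sketch, assuming Triton's usual compile-time specialization of `None` arguments):

```python
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, out_ptr, bias_ptr, n, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    x = tl.load(x_ptr + offs, mask=mask)
    if bias_ptr is not None:   # resolved at compile time when None is passed
        x += tl.load(bias_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x, mask=mask)
```

Conventionally a `None` argument is specialized away and never reaches the PTX parameter list; this PR handles the Inductor case where the signature keeps a `*i8` slot that must be filled with `nullptr`.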

Test Plan:

```
$ python test/inductor/test_aot_inductor.py -k test_triton_kernel_with_none_input
...
----------------------------------------------------------------------
Ran 4 tests in 40.364s

OK (skipped=2)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114475
Approved by: https://github.com/oulgen
2023-11-24 12:51:56 +00:00
cd7d6938c1 [inductor] Fix torch.split bug on unbacked symint (#113406)
torch.split(x, l) fails when l's values are unbacked symints.

E.g. l = y.tolist() makes l unbacked, because l depends on the
data access of y. The downstream call `SliceView.create()`
evaluates the shape even if the input shape is an unbacked symint,
which brings up the bug.
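
A hypothetical repro of the failure mode (the config flag is real, but this is an illustration, not the PR's test):

```python
import torch
import torch._dynamo

torch._dynamo.config.capture_scalar_outputs = True  # keep .tolist() in-graph

@torch.compile(fullgraph=True)
def f(x, y):
    sizes = y.tolist()            # data-dependent -> unbacked symints
    return torch.split(x, sizes)  # previously failed under Inductor

# f(torch.randn(10), torch.tensor([3, 7]))
```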

Test Plan:
python test/inductor/test_unbacked_symints.py -k test_split_with_sizes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113406
Approved by: https://github.com/aakhundov, https://github.com/ezyang
2023-11-24 07:21:00 +00:00
51390722e9 Fix ConvolutionBinaryInplace using target node (#114436)
This IR node mutates in place; it needs to use the argument, not the
target.

Fixes #113440

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114436
Approved by: https://github.com/jansel
ghstack dependencies: #114169
2023-11-24 06:25:11 +00:00
cyy
07e00de8d7 Add missing member initialization in c10::ExtraMeta constructor (#114448)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114448
Approved by: https://github.com/Skylion007
2023-11-24 03:54:11 +00:00
dad3cc4d02 Fix type for keep_inference_mutation flag (#114482)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114482
Approved by: https://github.com/Skylion007
ghstack dependencies: #114421, #114479, #114481
2023-11-24 00:04:31 +00:00
fa71f5efdc [BE][aot_autograd] Remove unnecessary fields from ViewMutationData (#114481)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114481
Approved by: https://github.com/zhxchen17
ghstack dependencies: #114421, #114479
2023-11-24 00:04:26 +00:00
e6e650d5eb [BE][aot_autograd] Remove num_mutated_inputs (#114479)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114479
Approved by: https://github.com/zhxchen17
ghstack dependencies: #114421
2023-11-24 00:04:25 +00:00
a378ae33e9 [BE][aot_autograd] Remove mutated_inp_indices (#114421)
We should use mutated_inp_runtime_indices moving forward

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114421
Approved by: https://github.com/zhxchen17
2023-11-23 22:41:38 +00:00
cyy
b76e2949f7 Fix pool_size type in TaskThreadPool (#114063)
Negative values of pool_size mean defaultNumThreads() is called.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114063
Approved by: https://github.com/Skylion007
2023-11-23 21:20:45 +00:00
a28876832c Fixed an export problem when moving tensors to CPU during torch.export.save (#114029)
For whatever reason, calling `.cpu()` on an `nn.Parameter` wrapping a CUDA tensor will return a plain (non-parameter) tensor. This PR fixes the symptom in the linked issue, but not the underlying issue.
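
A quick illustration of the behavior (requires a CUDA device):

```python
import torch

p = torch.nn.Parameter(torch.randn(2, device="cuda"))
print(type(p))        # <class 'torch.nn.parameter.Parameter'>
print(type(p.cpu()))  # <class 'torch.Tensor'> -- the Parameter wrapper is lost
```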

Fixes #113999.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114029
Approved by: https://github.com/zhxchen17
2023-11-23 21:17:43 +00:00
fd1a01a393 Set default LR value of SGD to 1e-3 (#114467)
Fixes https://github.com/pytorch/pytorch/issues/114089

Set the default lr of SGD to 1e-3 to increase the consistency of the input signatures of the optimizers.

@janeyx99
This should be the redacted PR #114434, sincerely.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114467
Approved by: https://github.com/janeyx99
2023-11-23 19:07:38 +00:00
85aa372374 [inductor] Fixed conv issue with dynamic shapes (#114351)
EDIT: fixes https://github.com/pytorch/pytorch/issues/114354

Description:
The following code is failing:
```python
import torch

def func(x, w):
    return torch.nn.functional.conv2d(x, w, groups=int(w.shape[0]))

x = torch.rand(1, 3, 64, 64)
w = torch.rand(3, 1, 3, 3)
y1 = func(x, w)
cfunc = torch.compile(func, fullgraph=True, dynamic=True)
y2 = cfunc(x, w)

torch.testing.assert_close(y1, y2)
```
with the error:
```
  File "/pytorch/torch/_inductor/kernel/conv.py", line 315, in convolution
    assert isinstance(groups, int)
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
LoweringException: AssertionError:
  target: aten.convolution.default
  args[0]: TensorBox(StorageBox(
    InputBuffer(name='arg3_1', layout=FixedLayout('cpu', torch.float32, size=[1, s0, s1, s1], stride=[s0*s1**2, s1**2, s1, 1]))
  ))
  args[1]: TensorBox(StorageBox(
    InputBuffer(name='arg1_1', layout=FixedLayout('cpu', torch.float32, size=[s0, 1, s0, s0], stride=[s0**2, s0**2, s0, 1]))
  ))
  args[2]: None
  args[3]: [1, 1]
  args[4]: [0, 0]
  args[5]: [1, 1]
  args[6]: False
  args[7]: [0, 0]
  args[8]: s0
```
where `groups` argument is a symbol but expected to be `int`.

This PR specializes `groups` to its int value and fixes the problem.

Context: Failing tests in torchvision with gaussian blur and adjust_sharpness ops
- https://github.com/pytorch/vision/actions/runs/6955843968/job/18926393710?pr=8127

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114351
Approved by: https://github.com/ezyang
2023-11-23 13:13:06 +00:00
01366efcc9 Revert "[pytree] register pytree node type in both C++ pytree and Python pytree (#112111)"
This reverts commit 4e4a6ad6ecd71a1aefde3992ecf7f77e37d2e264.

Reverted https://github.com/pytorch/pytorch/pull/112111 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/112111#issuecomment-1824099658))
2023-11-23 09:59:32 +00:00
a76bb5d84d Add support for models with mutated buffer on torch.onnx.dynamo_export (#112272)
This PR adds a unit test that leverages `torch.export.ExportedProgram` models that mutates registered buffers. Although the exporter already works out of the box in such scenario, the GraphModule and the exported ONNX model have extra outputs containing the mutated buffers. On future runs of the ONNX model, the mutated buffers are used as input to the model.

The aforementioned extra inputs and outputs are by design and the `ONNXProgram.model_signature` can be used to fetch detailed input/output schema for the exported model.

However, when we want to compare pytorch output to ONNX's, there is a mismatch between the schema because pytorch output does not include the mutated buffers present on the ONNX output.

This PR extends `onnx_program.adapt_torch_outputs_to_onnx(torch_outputs)` so that the mutated buffers are prepended to the Pytorch output, matching the ONNX schema.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112272
Approved by: https://github.com/titaiwangms, https://github.com/BowenBao
2023-11-23 09:59:02 +00:00
7daeb6509f Update audio pinned commit nightly (#114426)
I think we could have this pinned commit updated nightly like what we have with vision.  This will avoid having an outdated audio pinned commit that needs to be updated manually, i.e. https://github.com/pytorch/pytorch/pull/114393
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114426
Approved by: https://github.com/atalman, https://github.com/seemethere, https://github.com/malfet
2023-11-23 07:36:55 +00:00
6f340c6f30 Handle the case when opening a reverted PR with deleted head branch (#114423)
When reopening a reverted PR, `422: Unprocessable Entity` is returned if the head branch has been deleted, for example https://github.com/pytorch/pytorch/pull/112889#issuecomment-1823216686

```
{
  "message": "Validation Failed",
  "errors": [
    {
      "resource": "PullRequest",
      "code": "custom",
      "field": "state",
      "message": "state cannot be changed. The commsplit branch has been deleted."
    }
  ],
  "documentation_url": "https://docs.github.com/rest/pulls/pulls#update-a-pull-request"
}
```

The revert still happens though; only reopening the PR fails. I think that is OK to ignore in this case, instead of going the complicated route of having the merge bot try to restore the deleted branch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114423
Approved by: https://github.com/malfet, https://github.com/kit1980
2023-11-23 07:32:46 +00:00
a43edd836c Revert "Add support for models with mutated buffer on torch.onnx.dynamo_export (#112272)"
This reverts commit c4a22d6918b7ca218f2712d7e7e147aca7127fa3.

Reverted https://github.com/pytorch/pytorch/pull/112272 on behalf of https://github.com/huydhn due to Sorry for reverting you change but it is failing dynamo test in trunk c4a22d6918 ([comment](https://github.com/pytorch/pytorch/pull/112272#issuecomment-1823897964))
2023-11-23 07:07:56 +00:00
066e072524 Retry #112889 (Opportunistically use ncclCommSplit when creating new NCCL groups) (#114385)
- [c10d] (retry) Opportunistically use `ncclCommSplit` when creating new NCCL groups (#112889)
- Guard use of `split_from` with a `hasattr` check for cases when NCCL (or RCCL) lacks `ncclCommSplit`

Fixes cause of revert of original PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114385
Approved by: https://github.com/huydhn
2023-11-23 07:00:00 +00:00
ed05af278c [DTensor] Passed dynamic=False for compile tests (#114390)
Test Plan:
```
python test/distributed/_tensor/test_dtensor_compile.py
```

We found that after https://github.com/pytorch/pytorch/pull/114236 landed, DTensor + `torch.compile` tests were breaking (which was confounded with `DTensorSpec` hash changes). The temporary solution is to pass `dynamic=False`.

Otherwise, we see errors like:
<details>

```
======================================================================
ERROR: test_2d_fsdp_tp_ac_compile (__main__.TestDTensorCompileE2E)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/users/andgu/pytorch/torch/testing/_internal/common_distributed.py", line 533, in wrapper
    self._join_processes(fn)
  File "/data/users/andgu/pytorch/torch/testing/_internal/common_distributed.py", line 752, in _join_processes
    self._check_return_codes(elapsed_time)
  File "/data/users/andgu/pytorch/torch/testing/_internal/common_distributed.py", line 802, in _check_return_codes
    raise RuntimeError(error)
RuntimeError: Process 2 exited with error code 10 and exception:
Traceback (most recent call last):
  File "/data/users/andgu/pytorch/torch/testing/_internal/common_distributed.py", line 649, in run_test
    getattr(self, test_name)()
  File "/data/users/andgu/pytorch/torch/testing/_internal/common_distributed.py", line 535, in wrapper
    fn()
  File "/data/users/andgu/pytorch/torch/testing/_internal/common_utils.py", line 2652, in wrapper
    method(*args, **kwargs)
  File "/data/users/andgu/pytorch/torch/testing/_internal/distributed/_tensor/common_dtensor.py", line 193, in wrapper
    func(self, *args, **kwargs)  # type: ignore[misc]
  File "/data/users/andgu/pytorch/torch/testing/_internal/common_distributed.py", line 174, in wrapper
    return func(*args, **kwargs)
  File "/data/users/andgu/pytorch/test/distributed/_tensor/test_dtensor_compile.py", line 328, in test_2d_fsdp_tp_ac_compile
    compiled_output = compiled_2d(inp)
  File "/data/users/andgu/pytorch/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/users/andgu/pytorch/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/users/andgu/pytorch/torch/_dynamo/eval_frame.py", line 489, in _fn
    return fn(*args, **kwargs)
  File "/data/users/andgu/pytorch/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/data/users/andgu/pytorch/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/users/andgu/pytorch/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/users/andgu/pytorch/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 848, in forward
    output = self._fsdp_wrapped_module(*args, **kwargs)
  File "/data/users/andgu/pytorch/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/users/andgu/pytorch/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/users/andgu/pytorch/torch/_dynamo/eval_frame.py", line 655, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state)
  File "/data/users/andgu/pytorch/torch/_dynamo/convert_frame.py", line 721, in _convert_frame
    result = inner_convert(frame, cache_entry, hooks, frame_state)
  File "/data/users/andgu/pytorch/torch/_dynamo/convert_frame.py", line 383, in _convert_frame_assert
    compiled_product = _compile(
  File "/data/users/andgu/pytorch/torch/_dynamo/convert_frame.py", line 645, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/data/users/andgu/pytorch/torch/_dynamo/utils.py", line 244, in time_wrapper
    r = func(*args, **kwargs)
  File "/data/users/andgu/pytorch/torch/_dynamo/convert_frame.py", line 562, in compile_inner
    out_code = transform_code_object(code, transform)
  File "/data/users/andgu/pytorch/torch/_dynamo/bytecode_transformation.py", line 1033, in transform_code_object
    transformations(instructions, code_options)
  File "/data/users/andgu/pytorch/torch/_dynamo/convert_frame.py", line 151, in _fn
    return fn(*args, **kwargs)
  File "/data/users/andgu/pytorch/torch/_dynamo/convert_frame.py", line 527, in transform
    tracer.run()
  File "/data/users/andgu/pytorch/torch/_dynamo/symbolic_convert.py", line 2123, in run
    super().run()
  File "/data/users/andgu/pytorch/torch/_dynamo/symbolic_convert.py", line 818, in run
    and self.step()
  File "/data/users/andgu/pytorch/torch/_dynamo/symbolic_convert.py", line 781, in step
    getattr(self, inst.opname)(inst)
  File "/data/users/andgu/pytorch/torch/_dynamo/symbolic_convert.py", line 2238, in RETURN_VALUE
    self.output.compile_subgraph(
  File "/data/users/andgu/pytorch/torch/_dynamo/output_graph.py", line 912, in compile_subgraph
    self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root)
  File "/home/andgu/local/miniconda3/envs/pytorch-3.10/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/data/users/andgu/pytorch/torch/_dynamo/output_graph.py", line 1069, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
  File "/data/users/andgu/pytorch/torch/_dynamo/utils.py", line 244, in time_wrapper
    r = func(*args, **kwargs)
  File "/data/users/andgu/pytorch/torch/_dynamo/output_graph.py", line 1141, in call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
  File "/data/users/andgu/pytorch/torch/_dynamo/output_graph.py", line 1122, in call_user_compiler
    compiled_fn = compiler_fn(gm, self.example_inputs())
  File "/data/users/andgu/pytorch/torch/_dynamo/repro/after_dynamo.py", line 117, in debug_wrapper
    compiled_gm = compiler_fn(gm, example_inputs)
  File "/data/users/andgu/pytorch/torch/__init__.py", line 1696, in __call__
    return self.compiler_fn(model_, inputs_, **self.kwargs)
  File "/data/users/andgu/pytorch/torch/_dynamo/backends/common.py", line 55, in compiler_fn
    cg = aot_module_simplified(gm, example_inputs, **kwargs)
  File "/data/users/andgu/pytorch/torch/_functorch/aot_autograd.py", line 4946, in aot_module_simplified
    compiled_fn = create_aot_dispatcher_function(
  File "/data/users/andgu/pytorch/torch/_dynamo/utils.py", line 244, in time_wrapper
    r = func(*args, **kwargs)
  File "/data/users/andgu/pytorch/torch/_functorch/aot_autograd.py", line 4486, in create_aot_dispatcher_function
    compiled_fn = compiler_fn(flat_fn, fake_flat_args, aot_config, fw_metadata=fw_metadata)
  File "/data/users/andgu/pytorch/torch/_functorch/aot_autograd.py", line 2825, in aot_wrapper_dedupe
    return compiler_fn(flat_fn, leaf_flat_args, aot_config, fw_metadata=fw_metadata)
  File "/data/users/andgu/pytorch/torch/_functorch/aot_autograd.py", line 3011, in aot_wrapper_synthetic_base
    return compiler_fn(flat_fn, flat_args, aot_config, fw_metadata=fw_metadata)
  File "/data/users/andgu/pytorch/torch/_functorch/aot_autograd.py", line 3714, in aot_dispatch_autograd
    fx_g, joint_inputs, maybe_subclass_meta = aot_dispatch_autograd_graph(flat_fn, flat_args, aot_config, fw_metadata=fw_metadata)
  File "/data/users/andgu/pytorch/torch/_functorch/aot_autograd.py", line 3694, in aot_dispatch_autograd_graph
    fx_g = create_graph(joint_fn_to_trace, updated_joint_inputs, aot_config=aot_config)
  File "/data/users/andgu/pytorch/torch/_functorch/aot_autograd.py", line 1955, in create_graph
    fx_g = make_fx(f, decomposition_table=aot_config.decompositions)(*args)
  File "/data/users/andgu/pytorch/torch/fx/experimental/proxy_tensor.py", line 869, in wrapped
    t = dispatch_trace(wrap_key(func, args, fx_tracer, pre_dispatch), tracer=fx_tracer, concrete_args=tuple(phs))
  File "/data/users/andgu/pytorch/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/data/users/andgu/pytorch/torch/_dynamo/eval_frame.py", line 489, in _fn
    return fn(*args, **kwargs)
  File "/data/users/andgu/pytorch/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/data/users/andgu/pytorch/torch/fx/experimental/proxy_tensor.py", line 481, in dispatch_trace
    graph = tracer.trace(root, concrete_args)
  File "/data/users/andgu/pytorch/torch/_dynamo/eval_frame.py", line 489, in _fn
    return fn(*args, **kwargs)
  File "/data/users/andgu/pytorch/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/data/users/andgu/pytorch/torch/fx/_symbolic_trace.py", line 821, in trace
    (self.create_arg(fn(*args)),),
  File "/data/users/andgu/pytorch/torch/fx/_symbolic_trace.py", line 688, in flatten_fn
    tree_out = root_fn(*tree_args)
  File "/data/users/andgu/pytorch/torch/fx/experimental/proxy_tensor.py", line 517, in wrapped
    out = f(*tensors)
  File "/data/users/andgu/pytorch/torch/_functorch/aot_autograd.py", line 3607, in joint_fn
    return inner_fn(flat_fn_maybe_joint, (primals, tangents), use_trace_joint=True)
  File "/data/users/andgu/pytorch/torch/_functorch/aot_autograd.py", line 3591, in inner_fn
    wrapped_outs = fn(*all_args)
  File "/data/users/andgu/pytorch/torch/_functorch/aot_autograd.py", line 1941, in joint_helper
    return functionalized_f_helper(primals, tangents)
  File "/data/users/andgu/pytorch/torch/_functorch/aot_autograd.py", line 1894, in functionalized_f_helper
    f_outs = fn(*f_args)
  File "/data/users/andgu/pytorch/torch/_functorch/aot_autograd.py", line 1862, in inner_fn_with_anomaly
    return inner_fn(*args)
  File "/data/users/andgu/pytorch/torch/_functorch/aot_autograd.py", line 1796, in inner_fn
    outs, tangent_mask = fn(*primals)
  File "/data/users/andgu/pytorch/torch/_functorch/aot_autograd.py", line 1724, in inner_fn
    outs = fn(*args_maybe_cloned)
  File "/data/users/andgu/pytorch/torch/_functorch/aot_autograd.py", line 4552, in functional_call
    out = Interpreter(mod).run(*args[params_len:], **kwargs)
  File "/data/users/andgu/pytorch/torch/fx/interpreter.py", line 138, in run
    self.env[node] = self.run_node(node)
  File "/data/users/andgu/pytorch/torch/fx/interpreter.py", line 195, in run_node
    return getattr(self, n.op)(n.target, args, kwargs)
  File "/data/users/andgu/pytorch/torch/fx/interpreter.py", line 267, in call_function
    return target(*args, **kwargs)
  File "/data/users/andgu/pytorch/torch/distributed/_tensor/api.py", line 280, in __torch_dispatch__
    return DTensor._op_dispatcher.dispatch(
  File "/data/users/andgu/pytorch/torch/distributed/_tensor/dispatch.py", line 106, in dispatch
    self.sharding_propagator.propagate(op_info)
  File "/data/users/andgu/pytorch/torch/distributed/_tensor/sharding_prop.py", line 161, in propagate
    output_sharding = self.propagate_op_sharding_non_cached(op_info.schema)
  File "/data/users/andgu/pytorch/torch/distributed/_tensor/sharding_prop.py", line 175, in propagate_op_sharding_non_cached
    out_tensor_meta = self._propagate_tensor_meta(op_schema)
  File "/data/users/andgu/pytorch/torch/distributed/_tensor/sharding_prop.py", line 85, in _propagate_tensor_meta
    fake_args = op_schema.gen_fake_args()
  File "/data/users/andgu/pytorch/torch/distributed/_tensor/op_schema.py", line 332, in gen_fake_args
    return tree_map_only(
  File "/data/users/andgu/pytorch/torch/utils/_cxx_pytree.py", line 765, in tree_map_only
    return tree_map(
  File "/data/users/andgu/pytorch/torch/utils/_cxx_pytree.py", line 607, in tree_map
    return optree.tree_map(
  File "/home/andgu/local/miniconda3/envs/pytorch-3.10/lib/python3.10/site-packages/optree/ops.py", line 473, in tree_map
    return treespec.unflatten(flat_results)
  File "/data/users/andgu/pytorch/torch/utils/_cxx_pytree.py", line 713, in wrapped
    return func(x)
  File "/data/users/andgu/pytorch/torch/distributed/_tensor/op_schema.py", line 31, in _rebuild_tensor_from_dtensor_meta
    return torch.empty_strided(
  File "/data/users/andgu/pytorch/torch/_subclasses/functional_tensor.py", line 297, in __torch_dispatch__
    outs_unwrapped = func(*args_unwrapped, **kwargs_unwrapped)
  File "/data/users/andgu/pytorch/torch/_ops.py", line 509, in __call__
    return self._op(*args, **kwargs or {})
  File "/data/users/andgu/pytorch/torch/utils/_stats.py", line 20, in wrapper
    return fn(*args, **kwargs)
  File "/data/users/andgu/pytorch/torch/fx/experimental/proxy_tensor.py", line 594, in __torch_dispatch__
    return self.inner_torch_dispatch(func, types, args, kwargs)
  File "/data/users/andgu/pytorch/torch/fx/experimental/proxy_tensor.py", line 629, in inner_torch_dispatch
    return proxy_call(self, func, self.pre_dispatch, args, kwargs)
  File "/data/users/andgu/pytorch/torch/fx/experimental/proxy_tensor.py", line 317, in proxy_call
    proxy_args, proxy_kwargs = pytree.tree_map_only(
  File "/data/users/andgu/pytorch/torch/utils/_pytree.py", line 631, in tree_map_only
    return tree_map(map_only(__type_or_types)(func), tree)
  File "/data/users/andgu/pytorch/torch/utils/_pytree.py", line 523, in tree_map
    return tree_unflatten([func(i) for i in flat_args], spec)
  File "/data/users/andgu/pytorch/torch/utils/_pytree.py", line 523, in <listcomp>
    return tree_unflatten([func(i) for i in flat_args], spec)
  File "/data/users/andgu/pytorch/torch/utils/_pytree.py", line 591, in wrapped
    return func(x)
  File "/data/users/andgu/pytorch/torch/fx/experimental/proxy_tensor.py", line 247, in inner
    return get_proxy_slot(n, tracer)()
  File "/data/users/andgu/pytorch/torch/fx/experimental/proxy_tensor.py", line 110, in get_proxy_slot
    raise RuntimeError(f"{obj} is not tracked with proxy for {tracer}")
torch._dynamo.exc.BackendCompilerFailed: backend='aot_eager' raised:
RuntimeError: s0 is not tracked with proxy for <torch.fx.experimental.proxy_tensor.PythonKeyTracer object at 0x7fae60366c50>

While executing %result_2 : [num_users=1] = call_function[target=torch._C._nn.linear](args = (%prim_redistribute_2, %l_self_mlp_0_net2_weight, %l_self_mlp_0_net2_bias), kwargs = {})
Original traceback:
  File "/data/users/andgu/pytorch/test/distributed/_tensor/test_dtensor_compile.py", line 51, in forward
    return self.mlp_1(self.mlp_0(input))
  File "/data/users/andgu/pytorch/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/users/andgu/pytorch/torch/testing/_internal/distributed/_tensor/common_dtensor.py", line 64, in forward
    return self.net2(self.relu(self.net1(x)))
  File "/data/users/andgu/pytorch/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/data/users/andgu/pytorch/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
```

</details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114390
Approved by: https://github.com/wanchaol, https://github.com/huydhn
ghstack dependencies: #114379
2023-11-23 05:47:38 +00:00
34326e43eb [DTensor] Made DTensorSpec hash recomputation lazy (#114379)
If we assign `spec.tensor_meta = ...`, we do not have to recompute the hash eagerly. We just need to clear the existing hash so that the next call to `__hash__` recomputes it.
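
A minimal sketch of the lazy invalidation (attribute names are illustrative, not `DTensorSpec`'s actual fields):

```python
class Spec:
    def __init__(self, tensor_meta):
        self._tensor_meta = tensor_meta
        self._hash = None                  # computed on first __hash__ call

    @property
    def tensor_meta(self):
        return self._tensor_meta

    @tensor_meta.setter
    def tensor_meta(self, value):
        self._tensor_meta = value
        self._hash = None                  # invalidate; recompute lazily

    def __hash__(self):
        if self._hash is None:
            self._hash = hash(self._tensor_meta)
        return self._hash
```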

We found that the breakage of the DTensor + `torch.compile` tests comes from https://github.com/pytorch/pytorch/pull/114236 and are not directly related to the `DTensorSpec` hashing changes. We fix that in the following PR temporarily by passing `dynamic=False`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114379
Approved by: https://github.com/wanchaol
2023-11-23 05:45:18 +00:00
36763d3135 [ProcessGroupNCCL] Move new trace utils (#114367)
to TraceUtils.h

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114367
Approved by: https://github.com/wconstab, https://github.com/XilunWu
2023-11-23 05:07:41 +00:00
c340db56d5 [executorch hash update] update the pinned executorch hash (#114427)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned executorch hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114427
Approved by: https://github.com/pytorchbot
2023-11-23 04:54:06 +00:00
088043fc49 [FSDP] Passed TORCH_NCCL_DESYNC_DEBUG instead of NCCL_DESYNC_DEBUG (#114432)
This is to silence some warnings like:
```
[rank0]:[W Utils.hpp:164] Warning: Environment variable NCCL_DESYNC_DEBUG is deprecated; use TORCH_NCCL_DESYNC_DEBUG instead (function getCvarBool)
[rank3]:[W Utils.hpp:164] Warning: Environment variable NCCL_DESYNC_DEBUG is deprecated; use TORCH_NCCL_DESYNC_DEBUG instead (function getCvarBool)
[rank1]:[W Utils.hpp:164] Warning: Environment variable NCCL_DESYNC_DEBUG is deprecated; use TORCH_NCCL_DESYNC_DEBUG instead (function getCvarBool)
[rank2]:[W Utils.hpp:164] Warning: Environment variable NCCL_DESYNC_DEBUG is deprecated; use TORCH_NCCL_DESYNC_DEBUG instead (function getCvarBool)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114432
Approved by: https://github.com/fegin
2023-11-23 04:53:12 +00:00
d18e6b07aa Overload vec::dequantize to eliminate rounding error for quantized sigmoid (#114098)
**Description**
Fix #107030
Dequantize X by `(x_val - zp) * scale` instead of `x_val * scale + (-zp * scale)` to eliminate rounding error.
Now this overload is used for sigmoid only.
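
A plain-Python illustration of why the two algebraically equal forms can differ (values are arbitrary):

```python
scale, zp, x_val = 0.1, 1, 3
a = (x_val - zp) * scale           # 0.2
b = x_val * scale + (-zp * scale)  # 0.20000000000000004
print(a == b)                      # False: the second form accumulates more rounding error
```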

Performance impact:
![image](https://github.com/pytorch/pytorch/assets/12522207/655abd16-7d9d-4a9a-8c59-327ebf39157a)
Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake)

**Test plan**
`python test_quantization.py TestQuantizedOps.test_sigmoid_dequantize_rounding_error`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114098
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-11-23 04:33:57 +00:00
c4a22d6918 Add support for models with mutated buffer on torch.onnx.dynamo_export (#112272)
This PR adds a unit test that leverages `torch.export.ExportedProgram` models that mutates registered buffers. Although the exporter already works out of the box in such scenario, the GraphModule and the exported ONNX model have extra outputs containing the mutated buffers. On future runs of the ONNX model, the mutated buffers are used as input to the model.

The aforementioned extra inputs and outputs are by design and the `ONNXProgram.model_signature` can be used to fetch detailed input/output schema for the exported model.

However, when we want to compare pytorch output to ONNX's, there is a mismatch between the schema because pytorch output does not include the mutated buffers present on the ONNX output.

This PR extends `onnx_program.adapt_torch_outputs_to_onnx(torch_outputs)` so that the mutated buffers are prepended to the Pytorch output, matching the ONNX schema.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112272
Approved by: https://github.com/titaiwangms, https://github.com/BowenBao
2023-11-23 03:39:18 +00:00
b27565ad7d Forward fix D51468211 (#114381)
Summary:
Forward fix test failures caused by D51468211.

The root cause is that when converting the param_buffer into fake_tensor, we didn't set `static_shapes=True`, which causes the shape_env to have more symbols than expected. The current status is that we assume all params and buffers have constant sizes.

Test Plan: buck2 test 'fbcode//mode/opt' fbcode//aps_models/ads/icvr/tests:export_test_cpu -- --exact 'aps_models/ads/icvr/tests:export_test_cpu - test_20x_icvr_export (aps_models.ads.icvr.tests.export_test.ExportTest)'

Reviewed By: hongtansun-meta

Differential Revision: D51531279

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114381
Approved by: https://github.com/angelayi
2023-11-23 02:58:52 +00:00
7a697c4683 [RelEng] Tag docker images for release, pin unstable and disabled jobs, apply release only changes (#114355)
1. This tags docker images using docker pull/tag/push for current release
2. Sets RELEASE_VERSION_TAG var and regenerates the workflows using the new docker tag
3. Remove conda token setting and binary tests release changes; these are already automated
4. Pin unstable and disabled jobs; automates: https://github.com/pytorch/pytorch/pull/111675

Test:
```
RELEASE_VERSION=2.2 ./scripts/release/apply-release-changes.sh
Tagging pytorch/manylinux-builder:cuda11.8-main to pytorch/manylinux-builder:cuda11.8-2.2 , dry_run: enabled
Tagging pytorch/manylinux-builder:cuda12.1-main to pytorch/manylinux-builder:cuda12.1-2.2 , dry_run: enabled
Tagging pytorch/libtorch-cxx11-builder:cuda11.8-main to pytorch/libtorch-cxx11-builder:cuda11.8-2.2 , dry_run: enabled
Tagging pytorch/libtorch-cxx11-builder:cuda12.1-main to pytorch/libtorch-cxx11-builder:cuda12.1-2.2 , dry_run: enabled
Tagging pytorch/manylinux-builder:rocm5.6-main to pytorch/manylinux-builder:rocm5.6-2.2 , dry_run: enabled
Tagging pytorch/manylinux-builder:rocm5.7-main to pytorch/manylinux-builder:rocm5.7-2.2 , dry_run: enabled
Tagging pytorch/libtorch-cxx11-builder:rocm5.6-main to pytorch/libtorch-cxx11-builder:rocm5.6-2.2 , dry_run: enabled
Tagging pytorch/libtorch-cxx11-builder:rocm5.7-main to pytorch/libtorch-cxx11-builder:rocm5.7-2.2 , dry_run: enabled
Tagging pytorch/manylinux-builder:cpu-main to pytorch/manylinux-builder:cpu-2.2 , dry_run: enabled
Tagging pytorch/libtorch-cxx11-builder:cpu-main to pytorch/libtorch-cxx11-builder:cpu-2.2 , dry_run: enabled
Tagging pytorch/manylinuxcxx11-abi-builder:cpu-cxx11-abi-main to pytorch/manylinuxcxx11-abi-builder:cpu-cxx11-abi-2.2 , dry_run: enabled
Tagging pytorch/manylinuxaarch64-builder:cpu-aarch64-main to pytorch/manylinuxaarch64-builder:cpu-aarch64-2.2 , dry_run: enabled
Tagging pytorch/conda-builder:cuda11.8-main to pytorch/conda-builder:cuda11.8-2.2 , dry_run: enabled
Tagging pytorch/conda-builder:cuda12.1-main to pytorch/conda-builder:cuda12.1-2.2 , dry_run: enabled
Tagging pytorch/conda-builder:cpu-main to pytorch/conda-builder:cpu-2.2 , dry_run: enabled
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-manywheel-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-conda-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-libtorch-cxx11-abi-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-libtorch-pre-cxx11-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-aarch64-binary-manywheel-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-manywheel-main.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-libtorch-cxx11-abi-main.yml
/data/users/atalman/pytorch/.github/workflows/generated-linux-binary-libtorch-pre-cxx11-main.yml
/data/users/atalman/pytorch/.github/workflows/generated-windows-binary-wheel-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-windows-binary-conda-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-windows-binary-libtorch-release-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-windows-binary-libtorch-debug-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-windows-binary-libtorch-release-main.yml
/data/users/atalman/pytorch/.github/workflows/generated-windows-binary-libtorch-debug-main.yml
/data/users/atalman/pytorch/.github/workflows/generated-macos-binary-wheel-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-macos-binary-conda-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-macos-binary-libtorch-cxx11-abi-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-macos-arm64-binary-libtorch-cxx11-abi-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-macos-arm64-binary-wheel-nightly.yml
/data/users/atalman/pytorch/.github/workflows/generated-macos-arm64-binary-conda-nightly.yml
```

Result of pinning unstable and disabled jobs:
```
# The link to the published list of disabled jobs
DISABLED_JOBS_URL = "https://ossci-metrics.s3.amazonaws.com/disabled-jobs.json?versionid=kKJlAXdrUbk3CilXbKu.6OwNTGQB8a.B"
# and unstable jobs
UNSTABLE_JOBS_URL = "https://ossci-metrics.s3.amazonaws.com/unstable-jobs.json?versionid=vzaicOxSsh55iXBXwgGrW6dFeVtPfrhr"
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114355
Approved by: https://github.com/malfet
2023-11-23 02:14:22 +00:00
2bae888f65 Automated submodule update: FBGEMM (#113977)
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: a142e2064d

Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113977
Approved by: https://github.com/malfet
2023-11-23 01:46:34 +00:00
272b40aee5 Revert "deprecate PairwiseParallel from test (#114314)"
This reverts commit 07b6f377b401933e69a605037b8a5c2fba627601.

Reverted https://github.com/pytorch/pytorch/pull/114314 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but this seems to fail periodic multigpu tests ([comment](https://github.com/pytorch/pytorch/pull/114314#issuecomment-1823727818))
2023-11-23 01:43:32 +00:00
f961bda939 [export] Move serialized custom class objs to toplevel (#114371)
Summary:
Move the serialized CustomClassHolder objects to the toplevel SerializedArtifact instead of embedding the bytes in the graph.

Currently the CustomClassHolder objects are embedded in the graph instead of being lifted to the ExportedProgram, so some logic is introduced to lift them to the top level of the serialized ExportedProgram. However, once the CustomClassHolder objects get lifted, we can remove the TODOs I added.

Test Plan: CI

Reviewed By: zhxchen17

Differential Revision: D51479125

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114371
Approved by: https://github.com/ydwu4
2023-11-22 23:44:20 +00:00
eqy
6a86cf00ad [CUDA][cuBLAS] Remove explicit cuBLAS workspace allocation for CUDA 12.2+ (#113994)
cuBLAS should be using `cudaMallocAsync` in CUDA 12.2+, which removes the need for explicit workspace allocation to avoid increasing memory usage with multiple graph captures.

CC @ptrblck @malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113994
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-11-22 23:23:51 +00:00
5f504d1de7 Check for boolean values as argument on pow function. (#114133)
Hello everyone! 😄
Also @lezcano , nice to meet you! :)

Sorry if I miss anything, this is my first time around here. 🙃

This PR basically makes `torch.pow` behave the same way on CUDA. Python considers True as 1 and False as 0, and I just added this check to the `pow` function. From what I understood, when I call `.equal` on a `Scalar` that is boolean, the types are guaranteed to match, so that won't cause more trouble.

I know the issue suggests disabling this case, but that could be a little more complicated, in my humble opinion, and it could create some compatibility problems too, I guess.

My argument is that the code below is valid native Python, so I guess it does make sense to send booleans as Scalar.

```
>>> x = True
>>> x + x
2
```

This was my first test:
```
Python 3.12.0 | packaged by Anaconda, Inc. | (main, Oct  2 2023, 17:29:18) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.pow(torch.tensor([1, 2], device='cuda'), True)
tensor([1, 2], device='cuda:0')
>>> torch.pow(torch.tensor([1, 2]), True)
tensor([1, 2])
>>> torch.pow(torch.tensor([1, 2]), False)
tensor([1, 1])
>>> torch.pow(torch.tensor([1, 2], device='cuda'), False)
tensor([1, 1], device='cuda:0')
```

I've run `test_torch.py` and got the following results, so my guess is that I didn't break anything. I was just looking for a test that uses linear regression, as suggested.

```
Ran 1619 tests in 52.363s

OK (skipped=111)
[TORCH_VITAL] Dataloader.enabled		 True
[TORCH_VITAL] Dataloader.basic_unit_test		 TEST_VALUE_STRING
[TORCH_VITAL] CUDA.used		 true

```
(I can paste whole log, if necessary)

If this is a bad idea overall, don't worry about it. It's not a big deal; it's actually a two-line change 😅, so we can talk about how to do things with a different strategy.

For the record, I've signed the agreement already. And I didn't run the linter because it's not working 😞. It looks like PyYAML 6.0 is broken and there's a 6.0.1 fix already, but I have no idea how to update that 😅

Fixes #113198

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114133
Approved by: https://github.com/lezcano
2023-11-22 22:57:36 +00:00
aca6446a6e [executorch hash update] update the pinned executorch hash (#114325)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned executorch hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114325
Approved by: https://github.com/pytorchbot
2023-11-22 22:38:40 +00:00
6f3cd046ab [BE] remove skipIfDynamo for some module hook tests (#114387)
As titled.

Test Plan:
Existing tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114387
Approved by: https://github.com/ezyang
2023-11-22 22:15:34 +00:00
2f536ff92c Refactor values kwarg in foreach tests (#112781)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112781
Approved by: https://github.com/lezcano
ghstack dependencies: #112778
2023-11-22 22:10:54 +00:00
ea7d70aecc [BE]: ruff FURB136: replace ternary with min/max (preview) (#114382)
Replaces ternary if else statements with simple min max when appropriate.
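For example, a hypothetical instance of the rewrite:

```python
a, b = 3, 7
highest = a if a > b else b  # ternary form flagged by FURB136
highest = max(a, b)          # equivalent builtin form
```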
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114382
Approved by: https://github.com/albanD
2023-11-22 22:10:01 +00:00
88a8a0daa4 Revert "Require less alignment for masking (#114173)"
This reverts commit f882c175d8e9731238c3f29ca10821f2fe9f0797.

Reverted https://github.com/pytorch/pytorch/pull/114173 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing some inductor tests f882c175d8 ([comment](https://github.com/pytorch/pytorch/pull/114173#issuecomment-1823552362))
2023-11-22 21:49:31 +00:00
e7726b596e [FSDP] Added DDP parity test for CPU training (#114372)
This is a follow-up to https://github.com/pytorch/pytorch/pull/112145/ to include a numerical parity test with DDP for CPU training.
```
python -m pytest test/distributed/fsdp/test_fsdp_misc.py -k test_fsdp_cpu_training -s
```

We should follow-up on https://github.com/pytorch/pytorch/pull/112145/files#r1375102283 at some point too.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114372
Approved by: https://github.com/XilunWu
2023-11-22 21:46:57 +00:00
1b66701379 ci: Bump TorchAudio, less third_party deps (#114393)
Installing the current pinned version of TorchAudio can be problematic because it
expects to be able to download a file from sourceware.org (see [ref](a8f4e97bd5/third_party/bzip2/CMakeLists.txt (L14))) and that does
not have any guarantees of uptime.

This bumps the pin to the latest v2.1.1 commit (https://github.com/pytorch/audio/releases/tag/v2.1.1), which should have fewer third_party dependencies and thus be less flaky.

<details>

<summary> Should help with errors like: </summary>

logs link: https://github.com/pytorch/pytorch/actions/runs/6959510046/job/18942955523#step:15:592

```
5h+ pip install --progress-bar off --no-use-pep517 --user git+https://github.com/pytorch/audio.git@a8f4e97bd5356a7a77510cdf6a3a62e25a5dc602
Collecting git+https://github.com/pytorch/audio.git@a8f4e97bd5356a7a77510cdf6a3a62e25a5dc602
  Cloning https://github.com/pytorch/audio.git (to revision a8f4e97bd5356a7a77510cdf6a3a62e25a5dc602) to /tmp/pip-req-build-6b5hkzmq
  Running command git clone --filter=blob:none --quiet https://github.com/pytorch/audio.git /tmp/pip-req-build-6b5hkzmq
  Running command git rev-parse -q --verify 'sha^a8f4e97bd5356a7a77510cdf6a3a62e25a5dc602'
  Running command git fetch -q https://github.com/pytorch/audio.git a8f4e97bd5356a7a77510cdf6a3a62e25a5dc602
  Running command git checkout -q a8f4e97bd5356a7a77510cdf6a3a62e25a5dc602
  Resolved https://github.com/pytorch/audio.git to commit a8f4e97bd5356a7a77510cdf6a3a62e25a5dc602
  Running command git submodule update --init --recursive -q
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [60 lines of output]
      Traceback (most recent call last):
        File "/opt/conda/envs/py_3.10/lib/python3.10/urllib/request.py", line 1348, in do_open
          h.request(req.get_method(), req.selector, req.data, headers,
        File "/opt/conda/envs/py_3.10/lib/python3.10/http/client.py", line 1283, in request
          self._send_request(method, url, body, headers, encode_chunked)
        File "/opt/conda/envs/py_3.10/lib/python3.10/http/client.py", line 1329, in _send_request
          self.endheaders(body, encode_chunked=encode_chunked)
        File "/opt/conda/envs/py_3.10/lib/python3.10/http/client.py", line 1278, in endheaders
          self._send_output(message_body, encode_chunked=encode_chunked)
        File "/opt/conda/envs/py_3.10/lib/python3.10/http/client.py", line 1038, in _send_output
          self.send(msg)
        File "/opt/conda/envs/py_3.10/lib/python3.10/http/client.py", line 976, in send
          self.connect()
        File "/opt/conda/envs/py_3.10/lib/python3.10/http/client.py", line 1448, in connect
          super().connect()
        File "/opt/conda/envs/py_3.10/lib/python3.10/http/client.py", line 942, in connect
          self.sock = self._create_connection(
        File "/opt/conda/envs/py_3.10/lib/python3.10/socket.py", line 845, in create_connection
          raise err
        File "/opt/conda/envs/py_3.10/lib/python3.10/socket.py", line 833, in create_connection
          sock.connect(sa)
      OSError: [Errno 99] Cannot assign requested address

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-req-build-6b5hkzmq/setup.py", line 184, in <module>
          _main()
        File "/tmp/pip-req-build-6b5hkzmq/setup.py", line 145, in _main
          _fetch_third_party_libraries()
        File "/tmp/pip-req-build-6b5hkzmq/setup.py", line 129, in _fetch_third_party_libraries
          _fetch_archives(_parse_sources())
        File "/tmp/pip-req-build-6b5hkzmq/setup.py", line 123, in _fetch_archives
          torch.hub.download_url_to_file(url, dest, progress=False)
        File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/hub.py", line 620, in download_url_to_file
          u = urlopen(req)
        File "/opt/conda/envs/py_3.10/lib/python3.10/urllib/request.py", line 216, in urlopen
          return opener.open(url, data, timeout)
        File "/opt/conda/envs/py_3.10/lib/python3.10/urllib/request.py", line 519, in open
          response = self._open(req, data)
        File "/opt/conda/envs/py_3.10/lib/python3.10/urllib/request.py", line 536, in _open
          result = self._call_chain(self.handle_open, protocol, protocol +
        File "/opt/conda/envs/py_3.10/lib/python3.10/urllib/request.py", line 496, in _call_chain
          result = func(*args)
        File "/opt/conda/envs/py_3.10/lib/python3.10/urllib/request.py", line 1391, in https_open
          return self.do_open(http.client.HTTPSConnection, req,
        File "/opt/conda/envs/py_3.10/lib/python3.10/urllib/request.py", line 1351, in do_open
          raise URLError(err)
      urllib.error.URLError: <urlopen error [Errno 99] Cannot assign requested address>
      -- Git branch: HEAD
      -- Git SHA: a8f4e97bd5356a7a77510cdf6a3a62e25a5dc602
      -- Git tag: None
      -- PyTorch dependency: torch
      -- Building version 2.0.0a0+a8f4e97
       --- Initializing submodules
       --- Initialized submodule
       --- Fetching v1.2.12.tar.gz
       --- Fetching bzip2-1.0.8.tar.gz
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
```

</details>

Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114393
Approved by: https://github.com/atalman, https://github.com/kit1980, https://github.com/huydhn
2023-11-22 21:05:20 +00:00
d416e5b34f [torchrun] fix incorrect warning for non static backend (#114335)
This PR fixes an incorrect warning for non-static rdzv backends; the
warning should only be thrown when the rdzv endpoint is not specified.

error repro from @stas00

```
$ cat test.py
import torch

$ python -u -m torch.distributed.run --nproc_per_node=1 --rdzv_endpoint localhost:6000  --rdzv_backend c10d test.py
master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114335
Approved by: https://github.com/H-Huang
2023-11-22 20:09:14 +00:00
f882c175d8 Require less alignment for masking (#114173)
# Summary
Improved Fix for Attention Mask Alignment Issue (#112577)

This PR addresses Issue #112577 by refining the previously implemented fix, which was found to be incorrect and caused unneeded memory regressions. The update simplifies the approach to handling the alignment of the attention mask for mem-efficient attention.

## Changes
Alignment Check and Padding: Initially, the alignment of the attention mask is checked. If misalignment is detected, padding is applied, followed by slicing. During this process, a warning is raised to alert users.

Should this be warn_once?

We only call expand once, on the aligned mask.

Reference
https://github.com/facebookresearch/xformers/blob/main/xformers/ops/fmha/cutlass.py#L115
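A minimal sketch of the pad-then-slice approach, assuming a 16-element alignment requirement (illustrative names, not the ATen implementation):

```python
import torch

def align_last_dim(mask: torch.Tensor, alignment: int = 16) -> torch.Tensor:
    last = mask.size(-1)
    if last % alignment == 0:
        return mask  # already aligned, no copy needed
    pad = alignment - last % alignment
    # Pad the last dim so the underlying storage is aligned, then slice
    # back to the logical size; the slice is a view over aligned storage.
    return torch.nn.functional.pad(mask, (0, pad))[..., :last]
```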

@albanD, @mruberry, @jbschlosser, @walterddr, and @mikaylagawarecki.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114173
Approved by: https://github.com/danthe3rd
2023-11-22 20:02:51 +00:00
07b6f377b4 deprecate PairwiseParallel from test (#114314)
**Summary**
To solve issue #113706:
1. replace `PairwiseParallel` with `ColwiseParallel` and `RowwiseParallel`.
2. replace the `make_input_replicate_1d` and `make_output_replicate_1d` inputs of `ColwiseParallel` with `input_layouts` and `output_layouts`.
3. deprecate the tests for `_parallelize_mlp` as it only supports `PairwiseParallel`.

**Test Plan**
`pytest pytorch/test/distributed/tensor/parallel/`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114314
Approved by: https://github.com/wanchaol, https://github.com/XilunWu
2023-11-22 19:45:40 +00:00
9d68cfee0d [sparse][semi-structured] Make cusparseLt handle + flag thread_local (#114273)
Summary:

As raised in this issue: https://github.com/pytorch/pytorch/issues/113776

cuSPARSELt does not support sharing handles across different threads.
Ideally we would use something like CuSparseHandlePool to do this, but
since cuSPARSELt handle creation is inconsistent with the rest of CUDA,
we have to make these variables thread_local instead.
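The fix itself is C++ `thread_local` storage; as a rough Python analogue (illustrative only, not the actual code), the same lazy per-thread caching pattern looks like:

```python
import threading

_tls = threading.local()

def get_handle():
    # Each thread lazily creates and caches its own handle, mirroring the
    # thread_local cuSPARSELt handle in the C++ change.
    if not hasattr(_tls, "handle"):
        _tls.handle = object()  # stand-in for a cusparseLtInit-created handle
    return _tls.handle
```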

Test Plan:

`python test/test_sparse_semi_structured.py`

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114273
Approved by: https://github.com/danthe3rd
2023-11-22 18:55:52 +00:00
84909fef52 Add meta registration for aten.linear_backward (#114359)
Fixes #114358
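Roughly, a meta kernel for this op only computes output shapes; a sketch of the logic (illustrative, not the exact registration added here):

```python
import torch

def linear_backward_meta(self, grad_output, weight, output_mask):
    # Meta tensors carry shapes and dtypes but no data, which is all that
    # fake-tensor tracing needs from aten.linear_backward.
    grad_input = torch.empty_like(self) if output_mask[0] else None
    grad_weight = torch.empty_like(weight) if output_mask[1] else None
    grad_bias = grad_output.new_empty(grad_output.shape[-1]) if output_mask[2] else None
    return grad_input, grad_weight, grad_bias
```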

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114359
Approved by: https://github.com/ezyang
2023-11-22 18:24:24 +00:00
0f887a6d1a limit fused kernel num args. (#113131)
Fixes #97361

When a fused kernel has more than 1024 parameters, ctypes throws an error.
Limiting the number of args is a mechanism to protect stack memory: as we know, C++ passes args via the stack, and stack memory has a size limit.

Code change:

1. The cpp backend checks the fused nodes' arg count; once it reaches the limit, the backend flushes its status to ready.
2. The scheduler checks the `ready_to_flush` API and helps the backend flush codegen.
3. Adds a `ready_to_flush` API to `BaseScheduling`; the Triton backend returns False since it does not support it yet.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113131
Approved by: https://github.com/jgong5, https://github.com/mlazos
2023-11-22 18:05:33 +00:00
1f1ff629a8 Use parent class attribute supports_out for foreach_zero opinfo (#112778)
Instead of introducing a new has_no_out_of_place attribute
Also fixes foreach_copy tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112778
Approved by: https://github.com/lezcano
2023-11-22 18:00:44 +00:00
d6578b3678 [quant][pt2e] Refactor some internal code for observer insertion (#113500)
Summary:
att

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113500
Approved by: https://github.com/kimishpatel
2023-11-22 17:44:46 +00:00
b927a4e2ca Revert "Opportunistically use ncclCommSplit when creating new NCCL groups (#112889)"
This reverts commit 64a5372e6ce9b6ca0ee5c7482b27e24561725b28.

Reverted https://github.com/pytorch/pytorch/pull/112889 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing ROCm distributed jobs in trunk 4d07428ede ([comment](https://github.com/pytorch/pytorch/pull/112889#issuecomment-1823214376))
2023-11-22 17:43:51 +00:00
00ae299016 [c10d] Remove unused function (#114341)
Summary: As the title suggests

Test Plan: OSS CI

Differential Revision: D51386619

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114341
Approved by: https://github.com/Skylion007
2023-11-22 17:31:20 +00:00
9fcf1f9632 [export] Update schema (#114172)
Summary: Will update CustomClassHolder in a followup

Test Plan: CI

Differential Revision: D51343522

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114172
Approved by: https://github.com/zhxchen17
2023-11-22 16:43:43 +00:00
9bab96c78c [ONNX] Consider negative dim in _index_fill_reshape_helper (#114050)
Fix export issue of index_copy op with negative dim.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114050
Approved by: https://github.com/thiagocrepaldi
2023-11-22 15:40:57 +00:00
f2ca07b680 [ProcessGroupNCCL] Remove jumper to UCC (#114170)
The "jumper" to UCC lib in ProcessGroupNCCL was a temporary solution a while back. Cleaning it now that UCC has its own "PG" representation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114170
Approved by: https://github.com/wconstab, https://github.com/fduwjj, https://github.com/XilunWu, https://github.com/Aidyn-A
2023-11-22 15:35:06 +00:00
d7f698102e Disable MPS tests on macos-m1-13 runners (#114360)
As all of them are down at the moment, see screenshot below from [HUD](https://hud.pytorch.org/metrics)
<img width="669" alt="image" src="https://github.com/pytorch/pytorch/assets/2453524/6c400791-ae7e-460a-9e77-55d454b587f3">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114360
Approved by: https://github.com/atalman
2023-11-22 15:08:15 +00:00
324cde59b2 [MPS] Fix test_copy_cast_no_leak (#114313)
When running on macOS 13.2, the test always fails on the first run but succeeds on the second, as presumably it reserves some memory to cache the f32->f16 graph. Make it resilient against such failures by adding a warmup step in which one conversion is performed before recording driver memory utilization.

Fixes https://github.com/pytorch/pytorch/issues/114305
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114313
Approved by: https://github.com/huydhn
2023-11-22 14:48:24 +00:00
33fad1c0d4 [AOTI] Fix a weight loading issue when the weight size can be 0 (#114280)
Summary: When a weight tensor is 0-size, no device memory should be allocated for it. This PR fixes the weight loading logic for such a case. This problem was found when running the 14K model test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114280
Approved by: https://github.com/chenyang78
2023-11-22 14:03:51 +00:00
2f3beb715c Revert "Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926)"
This reverts commit 2ca1119d532af0ba385c7b5944b954c9385b4901.

Reverted https://github.com/pytorch/pytorch/pull/113926 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/113926#issuecomment-1822713852))
2023-11-22 12:52:33 +00:00
e239a2b2d7 Revert "[dynamo / DDP] - lazily compile submodules - to propagate real tensor strides to backend compiler (#114154)"
This reverts commit 266054c3cac0f800f37348aea1409c4759dd2315.

Reverted https://github.com/pytorch/pytorch/pull/114154 on behalf of https://github.com/DanilBaibak due to The lower PR in the stack https://github.com/pytorch/pytorch/pull/113926 breaks the internal build ([comment](https://github.com/pytorch/pytorch/pull/114154#issuecomment-1822704476))
2023-11-22 12:46:15 +00:00
b4faa6bfa4 [dynamo] report guard failure user stack, fix incorrectly skipping interesting files (#114053)
Fixes https://github.com/pytorch/pytorch/issues/114015

Before:
```
test/dynamo/test_functions.py::DefaultsTests::test_zip_strict [2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] GUARDS:
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] hasattr(L['x'], '_dynamo_dynamic_indices') == False
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['ys'], 94696321555200)
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] len(L['ys']) == 3
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['zs'], 94696321555200)
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] len(L['zs']) == 3
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['ys'][0], 94696321556032)
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] L['ys'][0] == 1.0
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['ys'][1], 94696321556032)
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] L['ys'][1] == 2.0
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['ys'][2], 94696321556032)
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] L['ys'][2] == 3.0
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['zs'][0], 94696321556032)
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] L['zs'][0] == 2.0
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['zs'][1], 94696321556032)
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] L['zs'][1] == 5.0
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['zs'][2], 94696321556032)
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] L['zs'][2] == 8.0
[2023-11-18 23:11:09,317] [0/0] torch._dynamo.guards.__guards: [DEBUG] utils_device.CURRENT_DEVICE == None                           # _dynamo/output_graph.py:365 in init_ambient_guards
[2023-11-18 23:11:09,317] [0/0] torch._dynamo.guards.__guards: [DEBUG] (___skip_backend_check() or ___current_backend() == ___lookup_backend(140084534469552))  # _dynamo/output_graph.py:371 in init_ambient_guards
[2023-11-18 23:11:09,317] [0/0] torch._dynamo.guards.__guards: [DEBUG] check_tensor(L['x'], Tensor, DispatchKeySet(CPU, BackendSelect, ADInplaceOrView, AutogradCPU), torch.float32, device=None, requires_grad=False, size=[3], stride=[1])
[2023-11-18 23:11:09,320] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function fn in /home/jonch/Desktop/Programming/mlsys/pytorch/test/dynamo/test_functions.py:2539
[2023-11-18 23:11:09,320] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-18 23:11:09,320] torch._dynamo.guards.__recompiles: [DEBUG]     - L['zs'][2] == 8.0

```

After:
```
test/dynamo/test_functions.py::DefaultsTests::test_zip_strict [2023-11-18 23:07:33,341] [0/0] torch._dynamo.guards.__guards: [DEBUG] GUARDS:
[2023-11-18 23:07:33,341] [0/0] torch._dynamo.guards.__guards: [DEBUG] hasattr(L['x'], '_dynamo_dynamic_indices') == False           # x = x.clone()  # test/dynamo/test_functions.py:2540 in fn
[2023-11-18 23:07:33,341] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['ys'], 94568804551424)                     # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] len(L['ys']) == 3                                             # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['zs'], 94568804551424)                     # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] len(L['zs']) == 3                                             # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['ys'][0], 94568804552256)                  # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] L['ys'][0] == 1.0                                             # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['ys'][1], 94568804552256)                  # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] L['ys'][1] == 2.0                                             # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['ys'][2], 94568804552256)                  # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] L['ys'][2] == 3.0                                             # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['zs'][0], 94568804552256)                  # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] L['zs'][0] == 2.0                                             # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['zs'][1], 94568804552256)                  # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] L['zs'][1] == 5.0                                             # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['zs'][2], 94568804552256)                  # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] L['zs'][2] == 8.0                                             # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] utils_device.CURRENT_DEVICE == None                           # _dynamo/output_graph.py:365 in init_ambient_guards
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] (___skip_backend_check() or ___current_backend() == ___lookup_backend(140370726823264))  # _dynamo/output_graph.py:371 in init_ambient_guards
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] check_tensor(L['x'], Tensor, DispatchKeySet(CPU, BackendSelect, ADInplaceOrView, AutogradCPU), torch.float32, device=None, requires_grad=False, size=[3], stride=[1])  # x = x.clone()  # test/dynamo/test_functions.py:2540 in fn
[2023-11-18 23:07:33,346] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function fn in /home/jonch/Desktop/Programming/mlsys/pytorch/test/dynamo/test_functions.py:2539
[2023-11-18 23:07:33,346] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-18 23:07:33,346] torch._dynamo.guards.__recompiles: [DEBUG]     - L['zs'][2] == 8.0                                             # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114053
Approved by: https://github.com/ezyang
2023-11-22 12:26:41 +00:00
2b72543f36 Solving pickle error when saving CyclicLR state_dict (#110931)
## How to reproduce:
```py
import os
import tempfile

import torch
from torch import nn
from torch.optim import SGD
from torch.optim.lr_scheduler import CyclicLR

model = nn.Linear(100, 100)
opt = SGD(model.parameters(), lr=1.)
scheduler = CyclicLR(opt, base_lr=0.1, max_lr=0.2, scale_fn=lambda x: 0.99)

tmp = tempfile.NamedTemporaryFile(delete=False)
try:
    torch.save(scheduler.state_dict(), tmp.name)
    scheduler.load_state_dict(torch.load(tmp.name))
finally:
    tmp.close()
    os.unlink(tmp.name)
```
Error:
```
_pickle.PicklingError: Can't pickle <function <lambda> at 0x000001A51DF67600>: attribute lookup <lambda> on __main__ failed
```
## Fix:
Save `scale_fn` to the state dict only if it is a callable object, and not if it is a function or lambda.
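The distinction matters because instances of callable classes pickle by reference to their (importable) class, while lambdas do not; a quick illustration:

```python
import pickle

class ScalePolicy:
    def __call__(self, x):
        return 0.99

pickle.dumps(ScalePolicy())  # fine: pickled by reference to the class

try:
    pickle.dumps(lambda x: 0.99)
except pickle.PicklingError as e:
    print("lambdas are not picklable:", e)
```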

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110931
Approved by: https://github.com/janeyx99
2023-11-22 11:38:35 +00:00
a0e3321f0c [inductor cpp] vectorize embedding lookup (#114062)
For embedding lookup, there is indirect indexing with indices that are invariant to the vectorized itervar. To vectorize it, we need to keep the related indexing variables as scalars and allow vectorization when the related index_exprs are invariant to the vectorized itervar.

This PR adds the support by lazily broadcasting scalar values (index_expr and constant) to vectors, so that `CppVecKernel` only generates vector operations when any of the inputs are vectors; otherwise, scalar ops are generated. The cse variable in cpp is now represented with `CppCSEVariable`, which bookkeeps the itervars relevant to the variable and has a flag marking whether it is a scalar or a vector. `CppVecOverrides` is improved to propagate these states when the ops are executed.

For the added UT `test_embedding_vec`, the generated code before this PR is:
```c++
extern "C" void kernel(const long* in_ptr0,
                       const float* in_ptr1,
                       const float* in_ptr2,
                       float* out_ptr0)
{
    #pragma omp parallel num_threads(64)
    {
        {
            #pragma omp for
            for(long x0=static_cast<long>(0L); x0<static_cast<long>(128L); x0+=static_cast<long>(1L))
            {
                #pragma GCC ivdep
                for(long x1=static_cast<long>(0L); x1<static_cast<long>(128L); x1+=static_cast<long>(1L))
                {
                    auto tmp0 = in_ptr0[static_cast<long>(x0)];
                    auto tmp5 = in_ptr2[static_cast<long>(x1 + (128L*x0))];
                    auto tmp1 = decltype(tmp0)(tmp0 + 64);
                    auto tmp2 = tmp0 < 0;
                    auto tmp3 = tmp2 ? tmp1 : tmp0;
                    TORCH_CHECK((0 <= tmp3) & (tmp3 < 64L), "index out of bounds: 0 <= tmp3 < 64L")
                    auto tmp4 = in_ptr1[static_cast<long>(x1 + (128L*tmp3))];
                    auto tmp6 = decltype(tmp4)(tmp4 + tmp5);
                    out_ptr0[static_cast<long>(x1 + (128L*x0))] = tmp6;
                }
            }
        }
    }
}
```

After this PR, we have:
```c++
extern "C" void kernel(const long* in_ptr0,
                       const float* in_ptr1,
                       const float* in_ptr2,
                       float* out_ptr0)
{
    #pragma omp parallel num_threads(64)
    {
        {
            #pragma omp for
            for(long x0=static_cast<long>(0L); x0<static_cast<long>(128L); x0+=static_cast<long>(1L))
            {
                for(long x1=static_cast<long>(0L); x1<static_cast<long>(128L); x1+=static_cast<long>(16L))
                {
                    auto tmp0 = in_ptr0[static_cast<long>(x0)];
                    auto tmp5 = at::vec::Vectorized<float>::loadu(in_ptr2 + static_cast<long>(x1 + (128L*x0)));
                    auto tmp1 = decltype(tmp0)(tmp0 + 64);
                    auto tmp2 = tmp0 < 0;
                    auto tmp3 = tmp2 ? tmp1 : tmp0;
                    TORCH_CHECK((0 <= tmp3) & (tmp3 < 64L), "index out of bounds: 0 <= tmp3 < 64L")
                    auto tmp4 = at::vec::Vectorized<float>::loadu(in_ptr1 + static_cast<long>(x1 + (128L*tmp3)));
                    auto tmp6 = tmp4 + tmp5;
                    tmp6.store(out_ptr0 + static_cast<long>(x1 + (128L*x0)));
                }
            }
        }
    }
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114062
Approved by: https://github.com/jansel
2023-11-22 11:19:42 +00:00
3e1abde46d Revert "AOTAutograd: handle set_(), detect metadata mutations that cancel out (#111554)"
This reverts commit a911b4db9d82238a1d423e2b4c0a3d700217f0c1.

Reverted https://github.com/pytorch/pytorch/pull/111554 on behalf of https://github.com/DanilBaibak due to The lower PR in the stack #113926 breaks the internal build ([comment](https://github.com/pytorch/pytorch/pull/111554#issuecomment-1822472206))
2023-11-22 10:13:48 +00:00
172a103857 [dynamo] strict=True kwarg for zip (#114047)
Fixes https://github.com/pytorch/pytorch/issues/113894
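An illustrative sketch (not the PR's test) of the kwarg under torch.compile; note zip's strict kwarg requires Python 3.10+:

```python
import torch

@torch.compile(fullgraph=True)
def fn(xs, ys):
    return [x + y for x, y in zip(xs, ys, strict=True)]

out = fn([torch.ones(2)] * 3, [torch.ones(2)] * 3)  # OK: equal lengths
# With mismatched lengths, strict=True raises ValueError, matching eager.
```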

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114047
Approved by: https://github.com/ezyang
2023-11-22 08:48:51 +00:00
c77a4a4096 Fix compiling add with torch.int32 and scalars (#113965)
Fixes #113944

When `b` and `alpha` are both scalars, using `prims.mul` will create a tensor with dtype `int64` resulting in wrong dtype.
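A repro sketch of the described dtype mismatch, with eager as the reference behaviour:

```python
import torch

def f(x):
    return torch.add(x, 2, alpha=3)  # b and alpha are both Python scalars

x = torch.arange(4, dtype=torch.int32)
print(f(x).dtype)                 # torch.int32 in eager
print(torch.compile(f)(x).dtype)  # was torch.int64 before this fix
```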

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113965
Approved by: https://github.com/ezyang
2023-11-22 07:32:19 +00:00
9f0deb132b [Inductor] Refactor group/batch fusion to support user defined execution order and configs (#113738)
Meta internal customers need more flexible configs for these group/batch fusions' execution order and parameters, so I'd like to provide a new inductor config with which users can fine-tune and auto-tune these group/batch fusions for different models.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113738
Approved by: https://github.com/xuzhao9
2023-11-22 05:46:23 +00:00
bebe66e262 [ONNX] Benchmark to save sample inputs to disk before running (#114163)
Such that even if failures occur during model run, the sample inputs
are accessible for later investigation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114163
Approved by: https://github.com/thiagocrepaldi
ghstack dependencies: #113780
2023-11-22 05:39:00 +00:00
bd44bdb675 [ONNX][dynamo_export] Turn off opmath for type promotion (#113780)
Although opmath is the right thing to do to retain on-par precision, it inserts
upcasts everywhere in the graph. This is particularly hard for backends to optimize, since there is no way to differentiate between inserted upcasts and casts from the model code. Hence we consolidate the input dtype to the result dtype to avoid this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113780
Approved by: https://github.com/titaiwangms, https://github.com/justinchuby
2023-11-22 05:39:00 +00:00
e7326ec295 [DTensor] Computed DTensorSpec hash lazily (#114322)
This is a forward fix for https://github.com/pytorch/pytorch/issues/113781.

We lazily compute the hash so that we do not try to compute the hash on `SymInt`s (for the stride) during Dynamo tracing.
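A minimal sketch of the lazy-hash pattern (illustrative names, not the actual DTensorSpec code):

```python
class SpecLike:
    def __init__(self, mesh, placements, stride):
        self.mesh, self.placements, self.stride = mesh, placements, stride
        self._hash = None  # deliberately not computed at construction time

    def __hash__(self):
        # Compute and cache on first use, so tracing never hashes SymInt
        # strides eagerly while the spec is being constructed.
        if self._hash is None:
            self._hash = hash((self.mesh, tuple(self.placements), tuple(self.stride)))
        return self._hash
```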

Tested via:
```
python test/distributed/_tensor/test_dtensor_compile.py -k test_2d_fsdp_tp_ac_compile
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114322
Approved by: https://github.com/wanchaol
ghstack dependencies: #113919, #113924, #114134, #113925, #113930, #114141, #113915, #114140
2023-11-22 04:13:11 +00:00
c5ddfa79b3 [HigherOrderOp] add output tensor meta check for cond (#113900)
This PR checks the tensor meta of the outputs of cond's branches. This helps us identify several tests that return outputs with different requires_grad. It also fixes the error messages, which previously were raised in torch.ops.higher_order.cond and now are raised in dynamo's CondHigherOrder.

Test Plan:
Existing tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113900
Approved by: https://github.com/zou3519
ghstack dependencies: #113819
2023-11-22 04:06:30 +00:00
9e657ce2ed [HigherOrderOp] set should_flatten_output=True for cond (#113819)
This PR adds should_flatten_output=True for cond. This effectively allows cond to support pytree outputs, with the output being flattened. Note: a single tensor output will be automatically cast to a tuple for torch.ops.higher_order.cond.

This PR also adds support for comparing BuiltinVariables (e.g. tuple); this lets dynamo inline the comparison of two tree_specs to make sure both branches return the same tree_spec.

Test Plan:
Existing tests. Will add more pytree tests and modify the documentations in the follow-up prs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113819
Approved by: https://github.com/zou3519
2023-11-22 04:06:30 +00:00
e0ec71deab Fix module: distributed labeler (#114324)
Removes preceding `/` which was preventing labeler from working.  (looks like a typo in the original PR)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114324
Approved by: https://github.com/XilunWu, https://github.com/fegin
2023-11-22 03:43:14 +00:00
0a33cf95c6 Add python-3.12 to triton wheels build matrix (#114327)
Not sure if it will work, but perhaps worth a try

Inspired by [following comment](56556d0aac/manywheel/build_cuda.sh (L266)):
```
# No triton dependency for now on 3.12 since we don't have binaries for it
# and torch.compile doesn't work.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114327
Approved by: https://github.com/kit1980, https://github.com/PaliC
2023-11-22 03:26:32 +00:00
2c4930a91d Revert "[fx/DDP] add nested ctx_manager test for DDP Dynamo (#114056)"
This reverts commit d5d62e85615fdf345e0556a9d8edbee2d3c64ae2.

Reverted https://github.com/pytorch/pytorch/pull/114056 on behalf of https://github.com/malfet due to Breaks inductor_distributed, see d5d62e8561 ([comment](https://github.com/pytorch/pytorch/pull/114056#issuecomment-1822006423))
2023-11-22 02:52:31 +00:00
db8f9686a7 [cmake] set 'mcpu=generic' as the default build flag for mkldnn on aarch64 (#113820)
This removes the dependency on mkldnn's default cmake definitions.

Fixes https://github.com/pytorch/pytorch/issues/109312

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113820
Approved by: https://github.com/malfet
2023-11-22 02:49:33 +00:00
6187153753 Consolidate sym/non-sym overloads for _make_wrapper_subclass (#114236)
I'm not sure why we needed two overloads previously, let's find out! Removing the int overload is load bearing because it now forces specialization on SymInt arguments instead of falling through to the SymInt overload, see new test.

I decided NOT to allow storage offset simultaneously with None strides.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114236
Approved by: https://github.com/albanD
2023-11-22 02:03:29 +00:00
a785fbe513 [reland][quant][pt2e] Refactor insert observer to do sharing checking in the same place (#113458) (#113920)
Summary:
Previously this logic was scattered in two different places: before inserting observers and during observer insertion; this PR moves everything to before we insert observers.

* Next: refactor QuantizationSpec and check more fields for sharing

Test Plan:
CI (regression tests)

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D51420029](https://our.internmc.facebook.com/intern/diff/D51420029)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113920
Approved by: https://github.com/andrewor14
2023-11-22 01:48:51 +00:00
3f736c2d77 Add ONNXProgram.__call__ API to run model with ONNX Runtime (#113495)
Currently the user can use torch.onnx.dynamo_export to export the model.
to ONNX.

```python
import torch

class Model(torch.nn.Module):
    def forward(self, x):
        return x + 1.0

onnx_program = torch.onnx.dynamo_export(
    Model(),
    torch.randn(1, 1, 2, dtype=torch.float),
)
```

The next step would be instantiating an ONNX Runtime session to execute it.

```python
import onnxruntime  # type: ignore[import]

onnx_input = self.adapt_torch_inputs_to_onnx(*args, **kwargs)
options = options or {}
providers = options.get("providers", onnxruntime.get_available_providers())
onnx_model = self.model_proto.SerializeToString()
ort_session = onnxruntime.InferenceSession(onnx_model, providers=providers)

def to_numpy(tensor):
    return (
        tensor.detach().cpu().numpy()
        if tensor.requires_grad
        else tensor.cpu().numpy()
    )

onnxruntime_input = {
    k.name: to_numpy(v) for k, v in zip(ort_session.get_inputs(), onnx_input)
}

return ort_session.run(None, onnxruntime_input)
```

This PR provides the `ONNXProgram.__call__` method as a facilitator that runs the model with ONNX Runtime under the hood, similar to how `torch.export.ExportedProgram.__call__` allows the underlying `torch.fx.GraphModule` to be executed.
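With this change, the boilerplate above collapses to a direct call (a usage sketch reusing `onnx_program` from the first snippet):

```python
onnx_output = onnx_program(torch.randn(1, 1, 2, dtype=torch.float))
```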
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113495
Approved by: https://github.com/titaiwangms
2023-11-22 01:48:45 +00:00
044cd56dcc [Easy] make @markDynamoStrictTest set nopython=True (#114308)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114308
Approved by: https://github.com/zou3519, https://github.com/oulgen
2023-11-22 01:36:29 +00:00
d5d62e8561 [fx/DDP] add nested ctx_manager test for DDP Dynamo (#114056)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114056
Approved by: https://github.com/wconstab
2023-11-22 01:08:25 +00:00
4d07428ede Fix for out of bounds read in mobile interpreter FORMAT opcode handler (#110303)
Summary:
The FORMAT opcode for the mobile TorchScript interpreter contained an out of bounds read issue leading to memory corruption.

This change adds an explicit check that the number of inputs passed to the format method called when handling the FORMAT opcode is valid and within the bounds of the stack.

Test Plan: contbuild + OSS signals

Differential Revision: D49739095

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110303
Approved by: https://github.com/malfet
2023-11-22 01:05:42 +00:00
9cbee4757e [Autotune] Reduce XBLOCK for outer reduction (#114284)
I have observed that quite a few Reduction.Outer kernels have potential for performance improvement by reducing register pressure. This is because our current register-pressure reduction logic, which only reduces RBLOCK, doesn't work for outer reductions. While we could tighten things up there, which would likely increase compile time, I found a better workaround: tune down XBLOCK in the first place.

Perf job: main 9efbb4ea73 (11/16) vs hoy/autotune/reduction
Slight compile time and perf improvement seen.
I also saw perf improvement locally for the few kernels being investigated.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114284
Approved by: https://github.com/jansel
2023-11-22 00:28:08 +00:00
995fae6060 Move small pypi build as default for linux cuda 12.1 (#114281)
This is the first PR to resolve https://github.com/pytorch/pytorch/issues/113972:
move our small-wheel build to be the default.
Test:
```
pip3 install --no-cache-dir --pre torch-2.2.0.dev20231121%2Bcu121-cp310-cp310-linux_x86_64.whl  --index-url https://download.pytorch.org/whl/nightly/cu121
Looking in indexes: https://download.pytorch.org/whl/nightly/cu121
Processing ./torch-2.2.0.dev20231121%2Bcu121-cp310-cp310-linux_x86_64.whl
Collecting filelock (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/filelock-3.9.0-py3-none-any.whl (9.7 kB)
Collecting typing-extensions>=4.8.0 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/typing_extensions-4.8.0-py3-none-any.whl (31 kB)
Collecting sympy (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/sympy-1.11.1-py3-none-any.whl (6.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.5/6.5 MB 253.4 MB/s eta 0:00:00
Collecting networkx (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/networkx-3.0rc1-py3-none-any.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 387.1 MB/s eta 0:00:00
Collecting jinja2 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/Jinja2-3.1.2-py3-none-any.whl (133 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.1/133.1 kB 365.3 MB/s eta 0:00:00
Collecting fsspec (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/fsspec-2023.4.0-py3-none-any.whl (153 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 154.0/154.0 kB 370.6 MB/s eta 0:00:00
Collecting pytorch-triton==2.1.0+6e4932cda8 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/pytorch_triton-2.1.0%2B6e4932cda8-cp310-cp310-linux_x86_64.whl (125.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 125.4/125.4 MB 384.1 MB/s eta 0:00:00
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 404.9 MB/s eta 0:00:00
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 kB 402.5 MB/s eta 0:00:00
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 383.9 MB/s eta 0:00:00
Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (731.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 731.7/731.7 MB 406.9 MB/s eta 0:00:00
Collecting nvidia-cublas-cu12==12.1.3.1 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 388.2 MB/s eta 0:00:00
Collecting nvidia-cufft-cu12==11.0.2.54 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 410.5 MB/s eta 0:00:00
Collecting nvidia-curand-cu12==10.3.2.106 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 272.9 MB/s eta 0:00:00
Collecting nvidia-cusolver-cu12==11.4.5.107 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 381.5 MB/s eta 0:00:00
Collecting nvidia-cusparse-cu12==12.1.0.106 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 394.6 MB/s eta 0:00:00
Collecting nvidia-nccl-cu12==2.19.3 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl (166.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 166.0/166.0 MB 384.7 MB/s eta 0:00:00
Collecting nvidia-nvtx-cu12==12.1.105 (from torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 kB 281.8 MB/s eta 0:00:00
Collecting nvidia-nvjitlink-cu12 (from nvidia-cusolver-cu12==11.4.5.107->torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/cu121/nvidia_nvjitlink_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (19.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19.8/19.8 MB 367.3 MB/s eta 0:00:00
Collecting MarkupSafe>=2.0 (from jinja2->torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/MarkupSafe-2.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
Collecting mpmath>=0.19 (from sympy->torch==2.2.0.dev20231121+cu121)
  Downloading https://download.pytorch.org/whl/nightly/mpmath-1.2.1-py3-none-any.whl (532 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 532.6/532.6 kB 391.3 MB/s eta 0:00:00
Installing collected packages: mpmath, typing-extensions, sympy, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, networkx, MarkupSafe, fsspec, filelock, pytorch-triton, nvidia-cusparse-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torch
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114281
Approved by: https://github.com/malfet, https://github.com/huydhn
2023-11-22 00:10:03 +00:00
628586606e [test] fix broken test, enable test (#114235)
Fixes root cause of https://github.com/pytorch/pytorch/pull/114053#issuecomment-1820632457

This test was not running on OSS CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114235
Approved by: https://github.com/ezyang
2023-11-22 00:04:39 +00:00
066ac56e02 ci: Clean up logic for merge -r (#114295)
Rely on built-in bash conditionals for the if statement rather than relying on $?

To avoid issues observed in https://github.com/pytorch/pytorch/pull/111008#issuecomment-1821547141

Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114295
Approved by: https://github.com/huydhn, https://github.com/malfet
2023-11-21 23:36:34 +00:00
afdc528520 Print the index and summary of the SampleInput that failed an OpInfo test (#99444)
Related to the Reproducible Testing BE project. Goal is to print out the sample input that failed an OpInfo test.

Crazy idea: to avoid requiring widespread changes across tests that use OpInfo sample inputs, return a new special iterator type from `OpInfo.sample_inputs()`, etc. that tracks the most recent item seen. If a test fails later on, print out this info to identify the sample that failed the test.

This solves the problem that the test framework currently has no concept of which sample input is being operated on.

This PR contains the following changes:
* New `TrackedInputIter` that wraps a sample inputs func iterator and tracks the most recent input seen in a `TrackedInput` structure
    * The information is stored in a dictionary on the test function itself, mapping `full test ID -> most recent TrackedInput`
* To determine the test function that is being run, we do some stack crawling hackery in `extract_test_fn_and_id()`
* Above applies only when one of the following is called: `OpInfo.sample_inputs()`, `OpInfo.error_inputs()`, `OpInfo.reference_inputs()`, and `OpInfo.conjugate_sample_inputs()`. This could easily be extended to `ModuleInfo`s and the sparse sample input funcs as well

Example output when a sample input causes a failure:
```
======================================================================
ERROR: test_foo_add_cpu_uint8 (__main__.TestFakeTensorCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jbschlosser/branches/reproducible_testing/torch/testing/_internal/common_device_type.py", line 911, in test_wrapper
    return test(*args, **kwargs)
  File "/home/jbschlosser/branches/reproducible_testing/torch/testing/_internal/common_device_type.py", line 1097, in only_fn
    return fn(slf, *args, **kwargs)
  File "/home/jbschlosser/branches/reproducible_testing/test/test_ops.py", line 2211, in test_foo
    self.fail('Example failure')
AssertionError: Example failure

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/jbschlosser/branches/reproducible_testing/torch/testing/_internal/common_utils.py", line 2436, in wrapper
    method(*args, **kwargs)
  File "/home/jbschlosser/branches/reproducible_testing/torch/testing/_internal/common_device_type.py", line 414, in instantiated_test
    result = test(self, **param_kwargs)
  File "/home/jbschlosser/branches/reproducible_testing/torch/testing/_internal/common_device_type.py", line 917, in test_wrapper
    raise Exception(
Exception: Caused by sample input at index 2: SampleInput(input=Tensor[size=(5, 1), device="cpu", dtype=torch.uint8], args=TensorList[Tensor[size=(5,), device="cpu", dtype=torch.uint8]], kwargs={}, broadcasts_input=True, name='')

To execute this test, run the following from the base repo dir:
     python test/test_ops.py -k test_foo_add_cpu_uint8

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

----------------------------------------------------------------------
```

This notably doesn't print the actual `SampleInput` values, as that's hard without fully reproducible random sample generation. I went down this path for a while and it seems infeasible without adding an untenable amount of overhead to set the random seed per SampleInput (see https://github.com/pytorch/pytorch/issues/86694#issuecomment-1614943708 for more details). For now, I am settling for at least spitting out the index and some metadata of the `SampleInput`, as it seems better than nothing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99444
Approved by: https://github.com/janeyx99
2023-11-21 23:08:35 +00:00
7fc292930c Add support for torch.Generator type in TorchScript (#110413)
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)

CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/glebk-cerebras, https://github.com/davidberard98
2023-11-21 23:07:21 +00:00
b88abb1674 [ONNX] Fix export issue of aten::layer_norm in opset 17 (#114058)
For torch.nn.LayerNorm, weight and bias can be None (when the parameter elementwise_affine is False or bias is False), but for the ONNX op LayerNormalization from opset 17, weight and bias cannot be None.

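A minimal sketch of the affected configuration (the file name is arbitrary): a `LayerNorm` without affine parameters exported at opset 17.

```python
import torch

# LayerNorm with elementwise_affine=False has weight=None and bias=None,
# which opset 17's LayerNormalization cannot represent directly.
m = torch.nn.LayerNorm(8, elementwise_affine=False)
x = torch.randn(2, 8)
torch.onnx.export(m, (x,), "layer_norm.onnx", opset_version=17)
```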
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114058
Approved by: https://github.com/thiagocrepaldi
2023-11-21 22:45:50 +00:00
62de29d06f [optim] be explicit about CPU scalar tensor dtypes (#111008)
Fixes https://github.com/pytorch/pytorch/issues/110940

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111008
Approved by: https://github.com/janeyx99
2023-11-21 22:44:50 +00:00
266054c3ca [dynamo / DDP] - lazily compile submodules - to propagate real tensor strides to backend compiler (#114154)
Fixes https://github.com/pytorch/pytorch/issues/113812, https://github.com/pytorch/pytorch/issues/102591, Probably fixes: https://github.com/pytorch/pytorch/issues/113740, https://github.com/pytorch/pytorch/issues/113786, https://github.com/pytorch/pytorch/issues/113788

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114154
Approved by: https://github.com/wconstab
2023-11-21 22:40:08 +00:00
54d04553ea [fx, DDP] fx.split_module will setup/unwind autocast & grad_mode (#113374)
---

Replaces: https://github.com/pytorch/pytorch/pull/112231
Fixes: https://github.com/pytorch/pytorch/issues/111794

DDPOptimizer splits modules. We need to set up/unwind global states (autocast, grad_enabled) for each split, as this affects downstream compilation.

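For reference, a minimal sketch of the `split_module` API involved (the counter-based callback is illustrative); after this change, each produced submodule re-establishes the autocast/grad state that was active at trace time:

```python
import torch
from torch.fx import symbolic_trace
from torch.fx.passes.split_module import split_module

class M(torch.nn.Module):
    def forward(self, x):
        return (x + 1).relu() * 2

m = M()
gm = symbolic_trace(m)
seen = {"count": 0}

def split_callback(node: torch.fx.Node) -> int:
    # Assign the first two ops to submodule 0 and the rest to submodule 1.
    seen["count"] += 1
    return 0 if seen["count"] <= 2 else 1

split_gm = split_module(gm, m, split_callback)
print(split_gm.graph)  # calls submod_0, then submod_1
```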
---

See before and after this PR for the split fx modules here (for autocast mode): https://github.com/pytorch/pytorch/pull/112231#issuecomment-1804274605

---

### Discussion
We don't actually have to do this for grad mode: https://github.com/pytorch/pytorch/pull/112231#issuecomment-1804280031. It's not wrong to do it anyway, but maybe unnecessary? But may still be better to keep this PR's changes so we're sure what the grad mode state ought to be for each subgraph.

It may come in handy in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113374
Approved by: https://github.com/wconstab
2023-11-21 21:29:59 +00:00
6ff7260700 [CI] Switch to check against expected result files for cpu inductor integration tests (#113668)
Summary: With this, we can completely remove CI_SKIP from common.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113668
Approved by: https://github.com/ezyang, https://github.com/jansel
ghstack dependencies: #113574, #113575, #113446, #113559
2023-11-21 21:20:47 +00:00
a9f9f98e2f [CI] Switch to check against expected result files for dynamo_eager and aot_eager benchmark tests (#113559)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113559
Approved by: https://github.com/ezyang, https://github.com/jansel
ghstack dependencies: #113574, #113575, #113446
2023-11-21 21:20:47 +00:00
212f668408 [CI] Remove CI skip list for inductor integration tests (#113446)
Summary: Switch to completely rely on checking against expected result files.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113446
Approved by: https://github.com/ezyang, https://github.com/malfet, https://github.com/jansel
ghstack dependencies: #113574, #113575
2023-11-21 21:20:41 +00:00
3c8a4f01b9 [CI] Increase the shard numbers for torchbench tests (#113575)
Summary: torchbench tests are always the lagging shards compared to other integration test shards, so let's bump up the corresponding shard numbers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113575
Approved by: https://github.com/ezyang, https://github.com/malfet, https://github.com/jansel
ghstack dependencies: #113574
2023-11-21 21:20:34 +00:00
799d8c3035 [CI] Rename the inductor test config names for dynamic shapes tests (#113574)
Summary: To make the naming consistent with tests in inductor-periodic and simplify update_expected.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113574
Approved by: https://github.com/eellison, https://github.com/malfet, https://github.com/jansel
2023-11-21 21:20:27 +00:00
ebeaec71bf [aotinductor] don't generate python profiling code in the cpp world (#114182)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114182
Approved by: https://github.com/aakhundov, https://github.com/desertfire
2023-11-21 21:11:58 +00:00
64a5372e6c Opportunistically use ncclCommSplit when creating new NCCL groups (#112889)
Currently `ncclCommInitRankConfig` is always used when creating new
communicator groups. This is wasteful, as it creates non-shared pairs
of endpoint queues and costs time to re-establish communication.

This change is transparent and opportunistic; when `dist.new_group` is
called, it will use the existing, healthy world process group to
select the right ranks to include in the process group.

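A minimal usage sketch (assuming a torchrun-style launch); nothing changes on the user side, the split happens under the hood:

```python
import torch.distributed as dist

dist.init_process_group("nccl")
# With this change, new_group can split the healthy default communicator
# instead of calling ncclCommInitRankConfig from scratch.
even_ranks = [r for r in range(dist.get_world_size()) if r % 2 == 0]
even_group = dist.new_group(ranks=even_ranks)
```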
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112889
Approved by: https://github.com/kwen2501
2023-11-21 21:03:52 +00:00
3b108a150a A fix for reduction + pointwise + multi-level reduction optimization (#112935)
ATT: for cases like reduction + multiple pointwises + multi-level reduction, previously, to decide num_splits for the multi-level reduction, we only checked whether the input of the multi-level reduction, or the input of that input, was a reduction node (i.e., the max search depth was 2). This PR changes the behavior to search recursively for a reduction input node as long as the intervening input nodes are pointwise.

Performance-wise it looks fine.
![Screenshot 2023-11-15 at 11 52 28 PM](https://github.com/pytorch/pytorch/assets/10527447/e726948c-0c00-4839-87a4-bcf9044c66d7)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112935
Approved by: https://github.com/chenyang78
2023-11-21 20:34:07 +00:00
2abfb8ec7d Correctly codegen math.inf in Inductor (#114159)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114159
Approved by: https://github.com/lezcano
2023-11-21 20:16:48 +00:00
c47d2b8035 Add Half support for CPU autocast on eager mode (#112484)
Add Half support for CPU autocast in eager mode, since common operators have Half support on CPU.
https://github.com/pytorch/pytorch/issues/96093.

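A minimal sketch of what this enables in eager mode:

```python
import torch

x = torch.randn(4, 4)
# Before this change, eager CPU autocast only accepted bfloat16.
with torch.autocast(device_type="cpu", dtype=torch.float16):
    y = torch.mm(x, x)
print(y.dtype)  # torch.float16
```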
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112484
Approved by: https://github.com/leslie-fang-intel, https://github.com/ezyang
2023-11-21 20:08:28 +00:00
4e4a6ad6ec [pytree] register pytree node type in both C++ pytree and Python pytree (#112111)
Changes:

1. Add `_private_register_pytree_node` API in both C++ and Python pytree. In C++ pytree, the API will only register pytree node for C++ pytree. In Python pytree, the API will only register pytree node for Python pytree.
2. Do not allow registering a type as pytree node twice in the Python pytree.
3. Add thread lock to the Python pytree node register API.
4. The old `_register_pytree_node` API will call the `_private_register_pytree_node` API and raise a deprecation warning.
5. Add a new `register_pytree_node` API to register a node type in both the C++ and Python implementations (a usage sketch follows this list).
6. Add tests to ensure a warning will be raised when the old private function is called.

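A minimal sketch of the new unified API from item 5 (the `Pair` container is a hypothetical example type; keyword-only serialization args omitted):

```python
import torch.utils._pytree as pytree

class Pair:
    def __init__(self, first, second):
        self.first, self.second = first, second

pytree.register_pytree_node(
    Pair,
    lambda p: ((p.first, p.second), None),  # flatten: (children, context)
    lambda children, ctx: Pair(*children),  # unflatten
)

leaves, spec = pytree.tree_flatten(Pair(1, 2))
print(leaves)  # [1, 2]
```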
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112111
Approved by: https://github.com/zou3519
2023-11-21 19:53:13 +00:00
85b97605ab Enable set sequence nr (#114120)
Summary:
In some cases (especially those involving collective calls) we would want to always kick off a collective call first before going down another path.

For  example:

```
tbe lookup -> a2a ->
                     overarch
dense ------------->
```

if the forward code is written as
a2a_out = a2a
dense = dense_net
out = overarch(a2a_out, dense)
out.backward()

The current default is to run backward in the opposite order of the forward calls. However, there is no data dependency between a2a and dense, so in reality either of them could run first. We would like the a2a to run first because it provides optimal (on average) overlap.

Changing the seq_nr of a2a_out to something large enough would allow autograd engine to kick it off first.

Test Plan: Tests incoming

Differential Revision: D51445261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114120
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-11-21 19:47:28 +00:00
1a3dbf57ca vmap: simple inplace batch rule (#113513)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113513
Approved by: https://github.com/zou3519
2023-11-21 18:55:54 +00:00
f66add9b85 [dynamo] graph break on np.ndarray.tobytes (#114208)
We can't model this accurately across np and tnp https://github.com/pytorch/pytorch/issues/114204#issuecomment-1820269949

So let's not even try. Just graph break.

Fixes: https://github.com/pytorch/pytorch/issues/114204

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114208
Approved by: https://github.com/lezcano
2023-11-21 18:19:37 +00:00
7694b05416 [DTensor] Reduced to one isinstance call in is_shard (#114140)
This is a nit change to save one `isinstance` call for when `dim` is not `None` but the placement is not `Shard`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114140
Approved by: https://github.com/Skylion007, https://github.com/wanchaol
ghstack dependencies: #113919, #113924, #114134, #113925, #113930, #114141, #113915
2023-11-21 17:31:02 +00:00
ef90508f75 [AOTI] Support ReinterpretView in abi mode (#114169)
https://github.com/pytorch/pytorch/pull/113967 added support for
ReinterpretView, but it turns out we codegen it differently in ABI
compat mode. This PR adds support for ABI compat mode as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114169
Approved by: https://github.com/aakhundov
2023-11-21 17:08:00 +00:00
b5dd37f23e [MPS] Fix memory leak in copy_from_mps_ (#114197)
By always calling `[destBuffer release]` before leaving the scope in which it was allocated.
Leak was introduced by https://github.com/pytorch/pytorch/pull/84928
Add regression test.
Before the change:
```
% python ../test/test_mps.py -v -k test_copy_cast_no_leak --repeat 10
test_copy_cast_no_leak (__main__.TestMemoryLeak) ... FAIL

======================================================================
FAIL: test_copy_cast_no_leak (__main__.TestMemoryLeak)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/nshulga/git/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 2554, in wrapper
    method(*args, **kwargs)
  File "/Users/nshulga/git/pytorch/pytorch/build/../test/test_mps.py", line 1064, in test_copy_cast_no_leak
    self.assertTrue(driver_before == driver_after, f"Detected {driver_after-driver_before} bytes leak of GPU memory")
AssertionError: False is not true : Detected 65536 bytes leak of GPU memory

To execute this test, run the following from the base repo dir:
     python test/test_mps.py -k test_copy_cast_no_leak

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

----------------------------------------------------------------------
Ran 1 test in 1.102s

FAILED (failures=1)
```
After:
```
% python ../test/test_mps.py -k test_copy_cast_no_leak --repeat 10
.
----------------------------------------------------------------------
Ran 1 test in 0.819s

OK
.
----------------------------------------------------------------------
Ran 1 test in 0.001s

OK
.
----------------------------------------------------------------------
Ran 1 test in 0.002s

OK
...
```

Fixes https://github.com/pytorch/pytorch/issues/114096

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114197
Approved by: https://github.com/kit1980
2023-11-21 14:52:55 +00:00
4b7f9fa436 Meta register all foreach ops (#112281)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112281
Approved by: https://github.com/lezcano
2023-11-21 14:23:09 +00:00
1f8d00c5a3 [inductor] Added decomposition for upsample_nearest_exact Nd (#113749)
Description:
- Added decomposition for upsample_nearest_exact: 1d, 2d, 3d

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113749
Approved by: https://github.com/lezcano
2023-11-21 13:03:47 +00:00
7733599b2e update pthreadpool to 4fe0e1e183925bf8cfa6aae24237e724a96479b (#113904)
Submodule update: bump pthreadpool to this revision.

This is in preparation for upgrading XNNPACK, as the new XNNPACK version uses some of the new pthreadpool APIs introduced in this revision.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113904
Approved by: https://github.com/Skylion007
2023-11-21 12:45:16 +00:00
2aa486de9b vendor packaging.version (#114108)
Fixes #113940. This vendors the relevant parts of `packaging==23.2.0` to have access to `Version` and `InvalidVersion` without taking a runtime dependency on `setuptools` or `packaging`.

I didn't find any vendoring policy so I put it under `torch._vendor.packaging`. While I have only vendored the files we need, I have not touched or trimmed the files otherwise.

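A usage sketch, assuming the vendored module keeps packaging's layout:

```python
from torch._vendor.packaging.version import InvalidVersion, Version

v = Version("2.2.0.dev20231121+cu121")
print(v.release, v.local)  # (2, 2, 0) cu121

try:
    Version("not-a-version")
except InvalidVersion as e:
    print("rejected:", e)
```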
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114108
Approved by: https://github.com/malfet, https://github.com/albanD
2023-11-21 11:51:23 +00:00
8ec59d3553 Revert "[dynamo] report guard failure user stack, fix incorrectly skipping interesting files (#114053)"
This reverts commit 826ab0e32d558415d5d682842417fd16b2223739.

Reverted https://github.com/pytorch/pytorch/pull/114053 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/114053#issuecomment-1820584281))
2023-11-21 10:05:15 +00:00
dd6ef0877e Revert "[inductor cpp] vectorize embedding lookup (#114062)"
This reverts commit 2c0474c02d3ac04a429504225d7f1a6536d3b9e6.

Reverted https://github.com/pytorch/pytorch/pull/114062 on behalf of https://github.com/huydhn due to Sorry for reverting your change, please help fix lint and reland it 2c0474c02d ([comment](https://github.com/pytorch/pytorch/pull/114062#issuecomment-1820526515))
2023-11-21 09:21:20 +00:00
1efff12a88 [pytorch-vulkan] BinaryOps auto convert int tensors into float (#114145)
Summary: Some models have hardcoded int constant tensors for some binary operations.

Test Plan:
```
yipjustin@yipjustin-mbp fbsource % buck2 run -c pt.has_backtraces=1   --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 -- --gtest_filter="*"
...
[       OK ] VulkanAPITest.linear_3d_flat (0 ms)
[ RUN      ] VulkanAPITest.linear_3d_small
[       OK ] VulkanAPITest.linear_3d_small (0 ms)
[ RUN      ] VulkanAPITest.linear_3d_large
[       OK ] VulkanAPITest.linear_3d_large (0 ms)
[ RUN      ] VulkanAPITest.linear_4d_flat
[       OK ] VulkanAPITest.linear_4d_flat (0 ms)
[ RUN      ] VulkanAPITest.linear_4d_small
[       OK ] VulkanAPITest.linear_4d_small (0 ms)
[ RUN      ] VulkanAPITest.linear_4d_large
[       OK ] VulkanAPITest.linear_4d_large (0 ms)
[ RUN      ] VulkanAPITest.lstm_success
[       OK ] VulkanAPITest.lstm_success (5 ms)
[ RUN      ] VulkanAPITest.lstm_mclareninputs_success
[       OK ] VulkanAPITest.lstm_mclareninputs_success (21 ms)
[ RUN      ] VulkanAPITest.lstm_prepack_success
[       OK ] VulkanAPITest.lstm_prepack_success (8 ms)
[ RUN      ] VulkanAPITest.querypool_flushed_shader_log
xplat/caffe2/aten/src/ATen/test/vulkan_api_test.cpp:8108: Skipped
QueryPool is not available

[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log (0 ms)
[----------] 414 tests from VulkanAPITest (5690 ms total)

[----------] Global test environment tear-down
[==========] 414 tests from 1 test suite ran. (5690 ms total)
[  PASSED  ] 413 tests.
[  SKIPPED ] 1 test, listed below:
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log

  YOU HAVE 9 DISABLED TESTS

```

Full Paste: P885827407

Differential Revision: D51452935

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114145
Approved by: https://github.com/SS-JIA
2023-11-21 09:06:33 +00:00
5f0d72124e Revert "Print the index and summary of the SampleInput that failed an OpInfo test (#99444)"
This reverts commit e7f12b1eb0cedfd20dcb41ea35e21e9a71e3390a.

Reverted https://github.com/pytorch/pytorch/pull/99444 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to cause memory leak on CUDA job e7f12b1eb0 ([comment](https://github.com/pytorch/pytorch/pull/99444#issuecomment-1820491298))
2023-11-21 08:58:54 +00:00
6c597ef015 [PyTorch] Fix attr cleanup after constant folding (#113957)
Summary:
Two nodes can point to the same attribute via node.target.

This makes sure that:
    - we don't try to delete an already-deleted attribute, i.e. the attr is deleted only once
    - we do delete all the nodes pointing to the attribute

Test Plan:
```
buck run fbcode//mode/dev-nosan fbcode//executorch/backends/xnnpack/test:test_xnnpack_passes -- executorch.backends.xnnpack.test.passes.test_batch_norm_fusion.TestBatchNormFusion.test_q8_batch_norm_fusion
```

Differential Revision: D51419442

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113957
Approved by: https://github.com/Skylion007
2023-11-21 07:48:15 +00:00
2c0474c02d [inductor cpp] vectorize embedding lookup (#114062)
For embedding lookups, there is indirect indexing with indices that are invariant to the vectorized itervar. To vectorize such a kernel, we need to keep the related indexing variables as scalars and allow vectorization when the related index_exprs are invariant to the vectorized itervar.

This PR adds that support by lazily broadcasting scalar values (index_expr and constant) to vectors, so that `CppVecKernel` generates vector operations only when any of the inputs are vectors; otherwise, scalar ops are generated. The cse variable in cpp is now represented with `CppCSEVariable`, which bookkeeps the itervars relevant to the variable and has a flag marking whether it is a scalar or a vector. `CppVecOverrides` is improved to propagate these states when the ops are executed.

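A minimal sketch of the pattern being vectorized (shapes inferred from the generated kernels below): an indirect embedding load followed by a pointwise add.

```python
import torch

weight = torch.randn(64, 128)       # embedding table (in_ptr1)
idx = torch.randint(0, 64, (128,))  # lookup indices (in_ptr0)
other = torch.randn(128, 128)       # pointwise operand (in_ptr2)

@torch.compile
def f(weight, idx, other):
    return weight[idx] + other      # indirect load, then pointwise add

out = f(weight, idx, other)
```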
For the added UT `test_embedding_vec`, the generated code before this PR is:
```c++
extern "C" void kernel(const long* in_ptr0,
                       const float* in_ptr1,
                       const float* in_ptr2,
                       float* out_ptr0)
{
    #pragma omp parallel num_threads(64)
    {
        {
            #pragma omp for
            for(long x0=static_cast<long>(0L); x0<static_cast<long>(128L); x0+=static_cast<long>(1L))
            {
                #pragma GCC ivdep
                for(long x1=static_cast<long>(0L); x1<static_cast<long>(128L); x1+=static_cast<long>(1L))
                {
                    auto tmp0 = in_ptr0[static_cast<long>(x0)];
                    auto tmp5 = in_ptr2[static_cast<long>(x1 + (128L*x0))];
                    auto tmp1 = decltype(tmp0)(tmp0 + 64);
                    auto tmp2 = tmp0 < 0;
                    auto tmp3 = tmp2 ? tmp1 : tmp0;
                    TORCH_CHECK((0 <= tmp3) & (tmp3 < 64L), "index out of bounds: 0 <= tmp3 < 64L")
                    auto tmp4 = in_ptr1[static_cast<long>(x1 + (128L*tmp3))];
                    auto tmp6 = decltype(tmp4)(tmp4 + tmp5);
                    out_ptr0[static_cast<long>(x1 + (128L*x0))] = tmp6;
                }
            }
        }
    }
}
```

After this PR, we have:
```c++
extern "C" void kernel(const long* in_ptr0,
                       const float* in_ptr1,
                       const float* in_ptr2,
                       float* out_ptr0)
{
    #pragma omp parallel num_threads(64)
    {
        {
            #pragma omp for
            for(long x0=static_cast<long>(0L); x0<static_cast<long>(128L); x0+=static_cast<long>(1L))
            {
                for(long x1=static_cast<long>(0L); x1<static_cast<long>(128L); x1+=static_cast<long>(16L))
                {
                    auto tmp0 = in_ptr0[static_cast<long>(x0)];
                    auto tmp5 = at::vec::Vectorized<float>::loadu(in_ptr2 + static_cast<long>(x1 + (128L*x0)));
                    auto tmp1 = decltype(tmp0)(tmp0 + 64);
                    auto tmp2 = tmp0 < 0;
                    auto tmp3 = tmp2 ? tmp1 : tmp0;
                    TORCH_CHECK((0 <= tmp3) & (tmp3 < 64L), "index out of bounds: 0 <= tmp3 < 64L")
                    auto tmp4 = at::vec::Vectorized<float>::loadu(in_ptr1 + static_cast<long>(x1 + (128L*tmp3)));
                    auto tmp6 = tmp4 + tmp5;
                    tmp6.store(out_ptr0 + static_cast<long>(x1 + (128L*x0)));
                }
            }
        }
    }
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114062
Approved by: https://github.com/jansel
ghstack dependencies: #113950
2023-11-21 07:37:15 +00:00
8f8722e3f1 [nccl-pg] Avoid using NCCL_ prefix for non-NCCL env variables (#114077)
The NCCL_ prefix should only be used for the NCCL library's environment variables. We currently use a few environment variables in PyTorch with the NCCL_ prefix that the NCCL library does not understand.

This patch renames such environment variables to use the TORCH_NCCL_ prefix instead.  We still maintain the old NCCL_ variables, but throw a warning when they are used.

The following env changes have been made:

`NCCL_BLOCKING_WAIT` -> `TORCH_NCCL_BLOCKING_WAIT`
`NCCL_ENABLE_TIMING` -> `TORCH_NCCL_ENABLE_TIMING`
`NCCL_DESYNC_DEBUG` -> `TORCH_NCCL_DESYNC_DEBUG`
`NCCL_ASYNC_ERROR_HANDLING` -> `TORCH_NCCL_ASYNC_ERROR_HANDLING`
`ENABLE_NCCL_HEALTH_CHECK` -> `TORCH_ENABLE_NCCL_HEALTH_CHECK`
`NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK` -> `TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK`

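For example, launcher scripts or user code would switch to the new spellings (a sketch; the old names keep working but warn):

```python
import os

os.environ["TORCH_NCCL_BLOCKING_WAIT"] = "1"         # was NCCL_BLOCKING_WAIT
os.environ["TORCH_NCCL_ASYNC_ERROR_HANDLING"] = "1"  # was NCCL_ASYNC_ERROR_HANDLING
```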

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114077
Approved by: https://github.com/fduwjj
2023-11-21 07:23:42 +00:00
e122c90d3c [executorch hash update] update the pinned executorch hash (#114008)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned executorch hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114008
Approved by: https://github.com/pytorchbot, https://github.com/huydhn
2023-11-21 06:31:14 +00:00
99af534e93 [docs][jit] Mention dynamic-shapes settings in jit/OVERVIEW.md (#113964)
Document torch._C._jit_set_fusion_strategy, which controls how many static-shape compilation attempts are made before falling back to dynamic shapes, and then to uncompiled graph execution.

Would be good to keep all the graph executor settings documented in one place.
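A usage sketch (the same setting is also exposed as `torch.jit.set_fusion_strategy`):

```python
import torch

# Try static-shape specialization twice, then dynamic-shape compilation ten
# times, then fall back to running the uncompiled graph.
torch._C._jit_set_fusion_strategy([("STATIC", 2), ("DYNAMIC", 10)])
```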
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113964
Approved by: https://github.com/eellison
2023-11-21 06:21:38 +00:00
7ea184d7e3 Handle item() on boolean tensor (#114157)
This needs some special handling because we don't actually allocate
boolean symbols in sympy; we allocate an integer indicator variable.
See comment for more details.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114157
Approved by: https://github.com/ydwu4
2023-11-21 04:34:58 +00:00
18e1a37c4e [ao] updating embedding_bag support for fx and eager (#107623)
Summary: our docs said dynamic embedding bag wasn't supported, but it actually is (at least at the same level as embeddings were); it just wasn't previously tested/listed.

Test Plan: python test/test_quantization.py -k "test_embedding"

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107623
Approved by: https://github.com/jerryzh168
2023-11-21 03:54:00 +00:00
dc65f6c601 [c10d] Remove deprecated multi-gpu-per-thread APIs (#114156)
As of today, PyTorch Distributed's preferred programming model is one device per thread, as exemplified by the APIs in its documentation. The multi-GPU functions (which stand for multiple GPUs per CPU thread) have been deprecated for three versions. Removing them now, before the 2.2 release.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114156
Approved by: https://github.com/albanD, https://github.com/fduwjj, https://github.com/H-Huang
2023-11-21 03:50:23 +00:00
f67696f45e Update TorchFix to 0.2.0 (#114190)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114190
Approved by: https://github.com/malfet
2023-11-21 03:46:28 +00:00
e76c54bd87 [vision hash update] update the pinned vision hash (#113217)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113217
Approved by: https://github.com/pytorchbot
2023-11-21 03:39:45 +00:00
bbc39b7bb4 [dtensor] enable RMSprop optimizer foreach support (#114152)
as titled

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114152
Approved by: https://github.com/XilunWu
ghstack dependencies: #114149, #114150, #114151
2023-11-21 03:23:40 +00:00
bcd310a7ad [dtensor] enable adagrad foreach support (#114151)
This PR enables the adagrad foreach mode support

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114151
Approved by: https://github.com/XilunWu
ghstack dependencies: #114149, #114150
2023-11-21 03:23:40 +00:00
9b50611002 [dtensor] add test for SGD optimizer (#114150)
as titled

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114150
Approved by: https://github.com/XilunWu
ghstack dependencies: #114149
2023-11-21 03:23:35 +00:00
b09bd36402 [dtensor] add test for adamw (#114149)
This PR add tests for adamw optimizers

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114149
Approved by: https://github.com/XilunWu
2023-11-21 03:23:28 +00:00
36869463e0 [DTensor] add forward layer norm test (#114174)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114174
Approved by: https://github.com/fduwjj, https://github.com/wanchaol
2023-11-21 03:01:21 +00:00
87925789ae Make V.graph properly typed (#114025)
Previously it lacked a type hint and so was treated as an Any type. This
resulted in a lot of untyped code downstream as V.graph is referenced in
many places in inductor code. I've typed it properly now as
GraphLowering, and fixed the numerous type errors this surfaced.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114025
Approved by: https://github.com/eellison
ghstack dependencies: #114013
2023-11-21 02:14:29 +00:00
4812a62ca0 [inductor] Delete more type-ignores in dependencies.py (#114013)
A couple of type hints were wrong

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114013
Approved by: https://github.com/eellison
2023-11-21 02:14:29 +00:00
a911b4db9d AOTAutograd: handle set_(), detect metadata mutations that cancel out (#111554)
This should be enough to get @voznesenskym 's FSDP branch to plumb `set_()` through AOTAutograd properly and have everything properly no-op out. Main changes are:

(1) graph break on `aten::set_.source_Tensor_storage_offset` (we could support it but it isn't needed, seems safer to graph break)

(2) Functionalization: add a "proper" functionalization kernel for `aten::set_.source_Tensor`. The previous one we had was codegen'd and it was wrong (it would just clone() and call set_(), which does not do the right thing). I also manually mark on the `FunctionalTensorWrapper` when a given tensor has been mutated by a `set_()` call.

(3) AOTAutograd: I added a new field, `InputAliasInfo.mutates_storage_metadata`, so we can distinguish between "regular" metadata mutations, and metadata mutations due to `set_()` calls. This is mainly because at runtime, one requires calling `as_strided_()` to fix up metadata, while the other requires calling `set_()`.

(4) Made AOTAutograd's detection for metadata mutations / set_() mutations smarter and detect no-ops (if the storage and metadata are all the same).

I also killed `was_updated()` and `was_metadata_updated()`, and replaced them with (existing) `has_data_mutation()` and (new) `has_data_mutation()`, which can more accurately distinguish between data mutations vs. `set_()` calls vs. metadata mutations

**This PR is still silently incorrect in one case though**, which I'd like to discuss more. In particular, this example:
```
def f(x):
    x_view = x.view(-1)
    x.set_(torch.ones(2))
    x_view.mul_(2)
    return
```

If you have an input that experiences both a data-mutation **and** a `x_old.set_(x_new)` call, there are two cases:

(a) the data mutation happened on the storage of `x_new`. This case should be handled automatically: if x_new is a graph intermediate then we will functionalize the mutation. If x_new is a different graph input, then we will perform the usual `copy_()` on that other graph input

(b) the data mutation happened on the storage of `x_old`. This is more of a pain to handle, and doesn't currently work. At runtime, the right thing to do is probably something like:
```

def functionalized_f(x):
    x_view = x.view(-1)
    # set_() desugars into a no-op; later usages of x will use x_output
    x_output = torch.ones(2)
    # functionalize the mutation on x_view
    x_view_updated = x.mul(2)
    x_updated = x_view_updated.view(x.shape)
    # x experienced TWO TYPES of mutations; a data mutation and a metadata mutation
    # We need to return both updated tensors in our graph
    return x_updated, x_output
def runtime_wrapper(x):
    x_data_mutation_result, x_set_mutation_result = compiled_graph(x)
    # First, perform the data mutation on x's old storage
    x.copy_(x_data_mutation_result)
    # Then, swap out the storage of x with the new storage
    x.set_(x_set_mutation_result)
```

There are two things that make this difficult to do though:

(1) Functionalization: the functionalization rule for `set_()` will fully throw away the old `FunctionalStorageImpl` on the graph input. So if there are any mutations to that `FunctionalStorageImpl` later on in the graph, the current graph input won't know about it. Maybe we can have a given `FunctionalTensorWrapper` remember all previous storages that it had, and track mutations on all of them - although this feels pretty complicated.

(2) AOTAutograd now needs to know that we might have *two* graph outputs that correspond to a single "mutated input", which is annoying.

It's worth pointing out that this issue is probably extremely unlikely for anyone to run into - can we just detect it and error? This feels slightly easier than solving it, although not significantly easier. We would still need `FunctionalTensorWrapper` to keep track of mutations on any of its "previous" storages, so it can report this info back to AOTAutograd so we can raise an error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111554
Approved by: https://github.com/ezyang
ghstack dependencies: #113926
2023-11-21 01:52:46 +00:00
81f93991d3 Update merge rule to allow pytorchbot to land ExecuTorch hash update (#114180)
The bot cannot merge the hash update PR otherwise, for example https://github.com/pytorch/pytorch/pull/114008#issuecomment-1818032181.  I also need to move ExecuTorch jobs in trunk to pull to match the rule without the need to add `ciflow/trunk` label.  The test job takes less than 20 minutes to finish atm on `2xlarge`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114180
Approved by: https://github.com/seemethere, https://github.com/ZainRizvi, https://github.com/malfet
2023-11-21 01:36:52 +00:00
e8996055a9 [iOS][PTMCoreMLCompiler] update other deprecated function (#114177)
Summary: old way was deprecated

Test Plan: ci

Reviewed By: kirklandsign

Differential Revision: D51172622

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114177
Approved by: https://github.com/kirklandsign
2023-11-21 01:36:00 +00:00
77f16eb00c Fix prod double backward when there are 2+ zeros (#113969)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113969
Approved by: https://github.com/albanD
2023-11-21 01:32:10 +00:00
85ce8a602b Pin pywavelets to 1.4.1 (scikit-image dependency) (#114146)
This is to prevent pip from pulling in 1.22.4 and failing Docker image builds, for example, https://github.com/pytorch/pytorch/actions/runs/6923861547/job/18842791777

The new package was released on Nov 17th https://pypi.org/project/PyWavelets/1.5.0/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114146
Approved by: https://github.com/malfet, https://github.com/kit1980
2023-11-21 01:29:33 +00:00
585332fb8d [ProcessGroupNCCL] Fix avoid-record-stream warning for P2P (#114168)
I have been seeing the below warning even though I did not set `TORCH_NCCL_AVOID_RECORD_STREAMS` to 1.
```
Warning: 0TORCH_NCCL_AVOID_RECORD_STREAMS=1 has no effect for point-to-point collectives.  (function operator())
```

Turns out that `TORCH_WARN_ONCE` is unconditional, so the original code below would print out both the value of `avoidRecordStreams_` and the error message:
```
TORCH_WARN_ONCE(
   avoidRecordStreams_,
   "TORCH_NCCL_AVOID_RECORD_STREAMS=1 has no effect for point-to-point "
   "collectives.");
```
 That's also where the "0" in the message came from.

Cc: @eqy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114168
Approved by: https://github.com/eqy, https://github.com/fduwjj, https://github.com/H-Huang
2023-11-21 01:29:00 +00:00
6ec344b08f Fix empty cpu tensor output in cudagraph (#114144)
We can ignore empty cpu tensors

Differential Revision: [D51472324](https://our.internmc.facebook.com/intern/diff/D51472324)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114144
Approved by: https://github.com/davidberard98
2023-11-21 01:24:48 +00:00
3e49621f3b [DTensor] Cached hash for DTensorSpec (#113915)
**Overview**
Generally, I think we can try to freeze as many of these classes used in DTensor sharding propagation as possible so that we can cache hashes. This PR targets hashing `DTensorSpec`, which turns out to be relatively expensive.

**Details**
It looks like `tensor_meta` is only updated in `_wrap_output_spec_tensor_meta`, which only runs if the propagation was not cached:
ae94c7e491/torch/distributed/_tensor/sharding_prop.py (L137)
ae94c7e491/torch/distributed/_tensor/sharding_prop.py (L153)
In that case, I think we can cache the hash for the `DTensorSpec` and only update it when one of the hashed attributes changes, which we only really expect to happen for `tensor_meta`.

To ensure correctness, we need that all hashed attributes are immutable.
- `DeviceMesh` caches its hash: a9134fa99a/torch/distributed/_device_mesh.py (L181)
- This PR makes each `Placement` a frozen `dataclass`, making them immutable (relying on the fact that they do not have references to any mutable objects).
- `TensorMeta` is a `NamedTuple` of `torch.Size`, `Tuple[int, ...]`, and `torch.dtype`, so it is immutable: 9916d8a9ea/torch/distributed/_tensor/placement_types.py (L369-L375)

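A minimal sketch of the pattern (a hypothetical `Spec`, not the real `DTensorSpec`): hash lazily, and invalidate the cache whenever a hashed field such as `tensor_meta` is reassigned.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

@dataclass
class Spec:
    mesh: Any
    placements: Tuple[Any, ...]
    tensor_meta: Optional[Any] = None

    def __setattr__(self, name: str, value: Any) -> None:
        object.__setattr__(self, name, value)
        if name != "_hash":
            object.__setattr__(self, "_hash", None)  # invalidate cached hash

    def __hash__(self) -> int:  # explicit __hash__ survives @dataclass(eq=True)
        if self._hash is None:
            object.__setattr__(
                self, "_hash", hash((self.mesh, self.placements, self.tensor_meta))
            )
        return self._hash

s = Spec(mesh="mesh0", placements=("Shard(0)",))
h1 = hash(s)            # computed once, then cached
s.tensor_meta = "meta"  # reassignment invalidates the cache
h2 = hash(s)            # recomputed
```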
**Example**
For some simple small GPT model:
Before: 0.125 ms
<img width="509" alt="Screenshot 2023-11-16 at 10 08 05 PM" src="https://github.com/pytorch/pytorch/assets/31054793/10e59401-f635-431f-80b5-1b48df3a706e">

After: 0.048 ms
<img width="294" alt="Screenshot 2023-11-16 at 10 08 47 PM" src="https://github.com/pytorch/pytorch/assets/31054793/09a3b0b9-f68c-4afc-bca1-c29a4b01c2fb">

The overall Adam CPU step time decreases from 7.647 ms to 6.451 ms.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113915
Approved by: https://github.com/wanchaol
ghstack dependencies: #113919, #113924, #114134, #113925, #113930, #114141
2023-11-21 01:24:21 +00:00
fb25fd6f86 [DTensor] Replaced neg dim normalization with assert in helper (#114141)
This is a replacement for https://github.com/pytorch/pytorch/pull/113922. I think we can still leave the check for negative shard dimension in `compute_local_shape_and_global_offset` and replace the normalization logic with an assert. This should provide us a stack trace to see which user-facing API did not normalize the dim as expected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114141
Approved by: https://github.com/wanchaol
ghstack dependencies: #113919, #113924, #114134, #113925, #113930
2023-11-21 01:24:21 +00:00
d70857bd9e [pytorch][lite interpreter] add tracer run under inference guard (#114003)
Summary: This can change the ops called under the hood. It's not safe to always call because of on-device training.

Test Plan: ci

Differential Revision: D51440119

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114003
Approved by: https://github.com/Jack-Khuu
2023-11-21 00:45:52 +00:00
e7f12b1eb0 Print the index and summary of the SampleInput that failed an OpInfo test (#99444)
Related to the Reproducible Testing BE project. Goal is to print out the sample input that failed an OpInfo test.

Crazy idea: to avoid requiring widespread changes across tests that use OpInfo sample inputs, return a new special iterator type from `OpInfo.sample_inputs()`, etc. that tracks the most recent item seen. If a test fails later on, print out this info to identify the sample that failed the test.

This solves the problem that the test framework currently has no concept of which sample input is being operated on.

This PR contains the following changes:
* New `TrackedInputIter` that wraps a sample inputs func iterator and tracks the most recent input seen in a `TrackedInput` structure
    * The information is stored in a dictionary on the test function itself, mapping `full test ID -> most recent TrackedInput`
* To determine the test function that is being run, we do some stack crawling hackery in `extract_test_fn_and_id()`
* Above applies only when one of the following is called: `OpInfo.sample_inputs()`, `OpInfo.error_inputs()`, `OpInfo.reference_inputs()`, and `OpInfo.conjugate_sample_inputs()`. This could easily be extended to `ModuleInfo`s and the sparse sample input funcs as well

Example output when a sample input causes a failure:
```
======================================================================
ERROR: test_foo_add_cpu_uint8 (__main__.TestFakeTensorCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jbschlosser/branches/reproducible_testing/torch/testing/_internal/common_device_type.py", line 911, in test_wrapper
    return test(*args, **kwargs)
  File "/home/jbschlosser/branches/reproducible_testing/torch/testing/_internal/common_device_type.py", line 1097, in only_fn
    return fn(slf, *args, **kwargs)
  File "/home/jbschlosser/branches/reproducible_testing/test/test_ops.py", line 2211, in test_foo
    self.fail('Example failure')
AssertionError: Example failure

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/jbschlosser/branches/reproducible_testing/torch/testing/_internal/common_utils.py", line 2436, in wrapper
    method(*args, **kwargs)
  File "/home/jbschlosser/branches/reproducible_testing/torch/testing/_internal/common_device_type.py", line 414, in instantiated_test
    result = test(self, **param_kwargs)
  File "/home/jbschlosser/branches/reproducible_testing/torch/testing/_internal/common_device_type.py", line 917, in test_wrapper
    raise Exception(
Exception: Caused by sample input at index 2: SampleInput(input=Tensor[size=(5, 1), device="cpu", dtype=torch.uint8], args=TensorList[Tensor[size=(5,), device="cpu", dtype=torch.uint8]], kwargs={}, broadcasts_input=True, name='')

To execute this test, run the following from the base repo dir:
     python test/test_ops.py -k test_foo_add_cpu_uint8

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

----------------------------------------------------------------------
```

This notably doesn't print the actual `SampleInput` values, as that's hard without fully reproducible random sample generation. I went down this path for a while and it seems infeasible without adding an untenable amount of overhead to set the random seed per SampleInput (see https://github.com/pytorch/pytorch/issues/86694#issuecomment-1614943708 for more details). For now, I am settling for at least spitting out the index and some metadata of the `SampleInput`, as it seems better than nothing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99444
Approved by: https://github.com/janeyx99
2023-11-21 00:11:20 +00:00
e4a88d9581 Convert SymInts to SymFloats with SymPy (#113683)
Fixes #109365

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113683
Approved by: https://github.com/ezyang, https://github.com/lezcano
2023-11-20 23:35:40 +00:00
4182092feb [reland][HigherOrderOp] remove _deprecated_global_ns (#113813)
This is a reland of #112757. Cannot land original one internally because internal diff is not in sync with OSS due to issues in dealing with two export repos (executorch and pytorch) using the ghimport-ghexport approach.

Will try the web UI of import and export instead of ghimport and ghexport flow.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113813
Approved by: https://github.com/angelayi
2023-11-20 23:16:18 +00:00
c1d9d4a2b5 checkpoint_sequential warns if use_reentrant not passed explicitly (#114158)
Use warning text for deprecation message.
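A sketch of the call that avoids the warning by being explicit:

```python
import torch
from torch.utils.checkpoint import checkpoint_sequential

model = torch.nn.Sequential(
    torch.nn.Linear(8, 8), torch.nn.ReLU(), torch.nn.Linear(8, 8)
)
x = torch.randn(2, 8, requires_grad=True)
# Omitting use_reentrant now emits the deprecation warning.
out = checkpoint_sequential(model, 2, x, use_reentrant=False)
out.sum().backward()
```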
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114158
Approved by: https://github.com/albanD
2023-11-20 23:08:44 +00:00
2ca1119d53 Add Stateful/Stateless symbolic contexts, use fresh fake mode for dynamo backends (#113926)
The primary problem we are setting out to solve here is fake tensor freshness. Before this PR, fake tensors after dynamo represented fake tensors *at the end* of trace, so subsequent retraces like aot_autograd would start off with fake tensors in the wrong (end result) state, rather than their expected fresh state. The solution here is to start a fresh fake mode, and re-fakify the tensors. The nuance comes from ensuring that symbols are uniformly created for the symbolic sizes and strides of the tensor.

This PR is the result of *a lot* of back and forth with @ezyang and @eellison. Initially, the first pass at this was not super different from what we have in the PR - the broad strokes were the same:

1) We cache source->symbol in shape_env
2) We pass policy objects around, stored at dynamo fakification time, and reused for later fakification
3) We create a new fake mode for backends
(from https://github.com/pytorch/pytorch/pull/113605/files)

This is ugly, and has some layering violations. We detoured our decision making through a few other alternatives. Immutable/mutable fake tensor mode was the most interesting alternative, https://github.com/pytorch/pytorch/pull/113653, and was struck down on concerns of complexity in fake mode combined with it not covering all edge cases. We also detoured on what to do about tensor memoization returning potentially different tensors than requested, and whether that was an anti-pattern (it is) that we wanted to hack in via the symbol cache (we don't).

We went back to the drawing board here, but with a few concessions:
1) the cache for source->symbol must live outside of shape_env, for both lifecycle, and layering reasons
2) A good amount of work needs to be done to pipe policy around fake_mode and meta_utils correctly, to cover all the cases (@ezyang did this)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113926
Approved by: https://github.com/ezyang, https://github.com/eellison
2023-11-20 23:06:37 +00:00
7afceb9f64 [AOTI] add float support of triton (#114014)
Summary: As the title

Test Plan: buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/dynamo:test_dynamo -- --exact 'caffe2/test/dynamo:test_dynamo - test_functions.py::DefaultsTests::test_triton_kernel_None_arg' --print-passing-details

Differential Revision: D51421325

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114014
Approved by: https://github.com/oulgen, https://github.com/aakhundov
2023-11-20 23:03:37 +00:00
ae00d9623e [inductor] Add ABI shim function for torch.scatter (#114027)
Summary: Scatter fallback calls `at::scatter` in the C++ wrapper codegen. This doesn't work in the ABI compatibility mode, as the latter requires a shim function. One is added in this PR.

Test Plan:

```
$ python test/inductor/test_aot_inductor.py -k test_scatter_fallback
s...
----------------------------------------------------------------------
Ran 4 tests in 52.713s

OK (skipped=1)
```


Pull Request resolved: https://github.com/pytorch/pytorch/pull/114027
Approved by: https://github.com/chenyang78, https://github.com/desertfire
ghstack dependencies: #114024
2023-11-20 22:51:59 +00:00
4b07fca7d7 [export] Allow shifted constraint ranges in dynamo._export (#114024)
Summary: Previously, when we had two dynamic shape symbols `s0` and `s1` bound by the relationship `s1 == s0 + 1`, even when the range constraints were set in accordance with the relationship (e.g., to `[2, 1024]` for `s0` and to `[3, 1025]` for `s1`), `torch._dynamo.export` raised an error saying that the constraint is violated. Here we add a range check between the expression and the constraint and, if the ranges match, don't declare the constraint violated.

We also add a flag to disable the dim constraint solver in `torch._dynamo.export` (not set by default for BC), passed down from the `torch._export.aot_compile`. This is because, even for simple constraints like `s1 == s0 + 1`, the solver claims that the constraint is too complex and the dimension `s0` must be specialized. The new flag is not exposed as a part of the public API (i.e., the one without `_`s in the module names).

Both changes are required to unblock PT2 compilation of an internal model with AOT Inductor.

Test Plan:

```
$ python test/inductor/test_aot_inductor.py -k test_shifted_constraint_ranges
s...
----------------------------------------------------------------------
Ran 4 tests in 53.247s

OK (skipped=1)
```


Pull Request resolved: https://github.com/pytorch/pytorch/pull/114024
Approved by: https://github.com/zhxchen17
2023-11-20 22:49:14 +00:00
c39c69953f [DTensor] Used new placements for neg dim in distribute_tensor (#113930)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113930
Approved by: https://github.com/wanchaol
ghstack dependencies: #113919, #113924, #114134, #113925
2023-11-20 22:32:58 +00:00
e2095a04ae [DTensor] Ensured grad_placements was tuple (#113925)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113925
Approved by: https://github.com/wanchaol
ghstack dependencies: #113919, #113924, #114134
2023-11-20 22:32:58 +00:00
f4ffd46c08 [DTensor] Used new placements for neg dim in from_local (#114134)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114134
Approved by: https://github.com/wanchaol
ghstack dependencies: #113919, #113924
2023-11-20 22:32:51 +00:00
b41ad7d695 [DTensor] Used new placements for neg dim in redistribute (#113924)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113924
Approved by: https://github.com/wanchaol
ghstack dependencies: #113919
2023-11-20 22:30:16 +00:00
77e058f055 [DTensor] Made _Partial, Replicate frozen dataclasses (#113919)
This is part of the larger stack to work toward being able to cache hashes for `DTensorSpec`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113919
Approved by: https://github.com/wanchaol
2023-11-20 22:28:47 +00:00
97d2b439ce [BE] Use definitely_true/sym_eq for same_meta (#114137)
Follows https://github.com/pytorch/pytorch/pull/113159

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114137
Approved by: https://github.com/Skylion007
2023-11-20 22:22:26 +00:00
13dd7f0c98 [export] Add missing builtin ops. (#113982)
Summary: Fixing issue https://github.com/pytorch/pytorch/issues/113778

Test Plan: eyes.

Differential Revision: D51436177

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113982
Approved by: https://github.com/Skylion007, https://github.com/ydwu4
2023-11-20 21:59:49 +00:00
8c4812be80 Replace expect_int with guard_int (#113921)
The idea is that instead of erroring, we will just specialize at these sites.

Fixes https://github.com/pytorch/pytorch/issues/113142

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113921
Approved by: https://github.com/zou3519
2023-11-20 21:27:48 +00:00
59ad51e10a Insert deferred runtime asserts into Dynamo FX graph (#113958)
During the course of fake tensor propagation (and, potentially, also Dynamo execution, although I do not believe it is possible to exercise this right now), we may generate deferred runtime asserts, which represent "guards" on unbacked symbols which cannot be immediately checked on entry to a code block; instead, they have to be checked at runtime. However, we currently accumulate these deferred runtime asserts into the ShapeEnv, and don't do anything with them.

This PR modifies Dynamo to automatically insert these runtime asserts into the FX graph, before passing it on to the backend compiler. The assert format coincides with the export assert format as practiced in `torch/_export/passes/add_runtime_assertions_for_constraints_pass.py`, but actually these passes are completely disjoint right now as I only handle deferred runtime asserts, while export only handles ranges (which I should probably also handle, but don't in this PR.)

The assertions must be inserted by Dynamo, because you could potentially then pass the asserts onto another backend like "eager" which no longer looks at the ShapeEnv before. Thanks to previous work in export, these asserts are preserved in AOTAutograd, but they are dropped by Inductor, which needs to be fixed in future work. This piece will be a bit awkward, as Inductor would have preferred to work with the Sympy expressions directly, ah well.

Here is what the Dynamo traced FX graph looks like for the test in question:

```
  <eval_with_key>.0 class GraphModule(torch.nn.Module):
     def forward(self, L_x_ : torch.Tensor):
         l_x_ = L_x_

         # File: /data/users/ezyang/c/pytorch/wu.py:8, code: y = x.item()
         item = l_x_.item()

         # No stacktrace found for following nodes
         ge_1 = item >= 0
         scalar_tensor_default = torch.ops.aten.scalar_tensor.default(ge_1);  ge_1 = None
         _assert_async_msg = torch.ops.aten._assert_async.msg(scalar_tensor_default, "Deferred runtime assert failed: i0 >= 0, where i0 was defined by 'item' (for more information, run with TORCH_LOGS=+dynamo,dynamic)");  scalar_tensor_default = None

         # File: /data/users/ezyang/c/pytorch/wu.py:9, code: torch._check_is_size

         _check_is_size = torch._check_is_size(item)

         # File: /data/users/ezyang/c/pytorch/wu.py:10, code: if y >= 0:
         ge = item >= 0;  item = None

         # File: /data/users/ezyang/c/pytorch/wu.py:11, code: return x * 2
         mul = l_x_ * 2;  l_x_ = None
         return (mul,)

```

Note that we actually keep the `_check_is_size` in the graph redundantly. However, assert_async is retained in the graph, whereas _check_is_size ends up getting DCE'ed.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113958
Approved by: https://github.com/aakhundov, https://github.com/tugsbayasgalan
ghstack dependencies: #113978
2023-11-20 21:25:11 +00:00
473b17c4c1 Run sympy expressions with Python values / FX tracing (#113978)
To codegen deferred runtime asserts, I need to be able to convert sympy expressions back into regular Python expressions that I can put in FX graphs. This PR adds some of the machinery to do this: it adds a new sympy analysis that runs operations on all FX traceable operations that can also be run with plain Python int/float/bool/etc. It's tested by symbolic tracing through the analysis, and then testing that this traced graph gives the same result as running the Python analysis directly.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113978
Approved by: https://github.com/aakhundov, https://github.com/lezcano
2023-11-20 21:25:11 +00:00
cd2798943d [dtensor] support convolution ops (#113123)
This PR creates a prototype of training convolutional neural networks based on DTensor.

- Register required ops and implement operator dispatch
- Add unit tests and example

Basically, we shard the activations and replicate the model weights in this prototype. We can scale out to multiple GPUs and reduce the per-GPU memory footprint with this approach, and achieve weak scaling in terms of training performance (i.e., time per iteration).

Reference log (on 2xA100 GPU):

Unit Test
```bash
root@luna-prod-78-80gb:/pytorch# python3 test/distributed/_tensor/test_convolution_ops.py
/opt/conda/lib/python3.10/site-packages/torch/nn/modules/conv.py:456: UserWarning: 0TORCH_NCCL_AVOID_RECORD_STREAMS=1 has no effect for point-to-point collectives. (Triggered internally at /opt/conda/conda-bld/pytorch_1699257304556/work/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:2170.)
  return F.conv2d(input, weight, bias, self.stride,
/opt/conda/lib/python3.10/site-packages/torch/nn/modules/conv.py:456: UserWarning: 0TORCH_NCCL_AVOID_RECORD_STREAMS=1 has no effect for point-to-point collectives. (Triggered internally at /opt/conda/conda-bld/pytorch_1699257304556/work/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:2170.)
  return F.conv2d(input, weight, bias, self.stride,
..
----------------------------------------------------------------------
Ran 2 tests in 30.354s

OK
root@luna-prod-78-80gb:/pytorch# python3 test/distributed/_tensor/test_other_ops.py
[rank0]:[W ProcessGroupNCCL.cpp:2170] Warning: 0TORCH_NCCL_AVOID_RECORD_STREAMS=1 has no effect for point-to-point collectives. (function operator())
[rank0]:[W ProcessGroupNCCL.cpp:2170] Warning: 0TORCH_NCCL_AVOID_RECORD_STREAMS=1 has no effect for point-to-point collectives. (function operator())
[rank1]:[W ProcessGroupNCCL.cpp:2170] Warning: 0TORCH_NCCL_AVOID_RECORD_STREAMS=1 has no effect for point-to-point collectives. (function operator())
[rank1]:[W ProcessGroupNCCL.cpp:2170] Warning: 0TORCH_NCCL_AVOID_RECORD_STREAMS=1 has no effect for point-to-point collectives. (function operator())
...
----------------------------------------------------------------------
Ran 3 tests in 16.343s

OK
```
ConvNeXt Example
```bash
root@luna-prod-78-80gb:/pytorch# python3 torch/distributed/_tensor/examples/convnext_example.py
rank 3, 20 iterations, latency     584.80 ms, forward     102.84 ms, backward     297.80 ms, max reserved    16.34 GiB, max allocated    14.75 GiB
rank 1, 20 iterations, latency     584.64 ms, forward     104.85 ms, backward     297.60 ms, max reserved    16.40 GiB, max allocated    14.74 GiB
rank 0, 20 iterations, latency     584.48 ms, forward     104.64 ms, backward     297.90 ms, max reserved    16.39 GiB, max allocated    14.75 GiB
rank 2, 20 iterations, latency     584.96 ms, forward      93.21 ms, backward     297.95 ms, max reserved    16.40 GiB, max allocated    14.74 GiB
```

@wanchaol @fduwjj FYI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113123
Approved by: https://github.com/wanchaol
2023-11-20 21:01:28 +00:00
af51c948ac Add mechanism for make_fx to not error on data-dependent-ops (#114129)
I'm looking for a make_fx(tracing_mode=real) that doesn't error out on
data-dependent operations. This PR adds a flag to do that. We use this
to help implement offline generation, but this is useful by itself:
sometimes we want to trace a function with real tensors and don't care
if we bake values in (because we just want to see what happened).
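
A hedged usage sketch; the keyword `_error_on_data_dependent_ops` reflects this PR's intent, but treat the exact name as an assumption rather than a stable public API:

```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx

def f(x):
    # .item() is data-dependent: tracing with real tensors bakes in the value.
    return x + x.sum().item()

gm = make_fx(f, tracing_mode="real", _error_on_data_dependent_ops=False)(torch.ones(3))
print(gm.code)  # the concrete value appears baked into the traced graph
```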

Test Plan:
- new test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114129
Approved by: https://github.com/ezyang
ghstack dependencies: #114128
2023-11-20 20:55:55 +00:00
d1bb0b0e4d Mark more built-in ops as pt2_compliant (#114128)
See title

Test Plan:
- code reading
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114128
Approved by: https://github.com/ezyang
2023-11-20 20:55:55 +00:00
811bec46ef Don't DCE item nodes if they're float (#114135)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114135
Approved by: https://github.com/Skylion007
2023-11-20 20:42:27 +00:00
0c450f4504 [functorch] fix potential race condition while loading vmap decomposition library (#113520)
There can be a potential race condition while loading the `vmap` decomposition library in multi-threading programs.

This PR adds a thread lock to avoid the case of registering the kernel multiple times.

```python
import threading
from torch._functorch.vmap import lazy_load_decompositions

threads = []
for i in range(10000):
    thread = threading.Thread(target=lazy_load_decompositions)
    threads.append(thread)
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

```text
RuntimeError: This is not allowed since there's already a kernel registered from python overriding mse_loss_backward's behavior for FuncTorchBatched dispatch key and aten namespace.
    VMAP_DECOMPOSITIONS_LIB.impl(decomp, decomposition_table[decomp])
RuntimeError: This is not allowed since there's already a kernel registered from python overriding mse_loss_backward's behavior for FuncTorchBatched dispatch key and aten namespace.
RuntimeError: This is not allowed since there's already a kernel registered from python overriding mse_loss_backward's behavior for FuncTorchBatched dispatch key and aten namespace.
RuntimeError: This is not allowed since there's already a kernel registered from python overriding mse_loss_backward's behavior for FuncTorchBatched dispatch key and aten namespace.
RuntimeError: This is not allowed since there's already a kernel registered from python overriding mse_loss_backward's behavior for FuncTorchBatched dispatch key and aten namespace.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113520
Approved by: https://github.com/zou3519
2023-11-20 19:50:54 +00:00
2b97f5a9a1 Disallow fp8 type promotion (#113975)
Fixes #113663

As well as updating the promotion logic to disallow automatic type promotion between fp8 types, this PR also cleans up the table entries.
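
An illustrative repro of the new behavior (a sketch; the exact exception type and message are assumptions):

```python
import torch

a = torch.randn(2).to(torch.float8_e4m3fn)
b = torch.randn(2).to(torch.float8_e5m2)
try:
    a + b  # implicit promotion between different fp8 dtypes is disallowed
except RuntimeError as e:
    print("promotion refused:", e)
```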

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113975
Approved by: https://github.com/albanD, https://github.com/malfet
2023-11-20 19:47:43 +00:00
0bb29f9450 [dynamo] Guard on HAS_GRAPH_BREAKS if graph breaks are present (i.e. cache miss if compiled object requires nopython) (#114073)
Fixes https://github.com/pytorch/pytorch/issues/114059

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114073
Approved by: https://github.com/ezyang
2023-11-20 19:32:03 +00:00
2b4c489f71 [lint] Install compatible numpy for 3.8 (#113869)
Not ideal, but better than barring lint on 3.8

Fixes https://github.com/pytorch/pytorch/issues/113864

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113869
Approved by: https://github.com/albanD
2023-11-20 19:23:43 +00:00
fc39efc4c1 Fix filename typo 'funtionalized' (#114132)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114132
Approved by: https://github.com/zhxchen17, https://github.com/Skylion007
2023-11-20 19:19:25 +00:00
934e9c3346 Boolean masking backwards doesn't work even with dynamic output shape ops, break accordingly (#114126)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114126
Approved by: https://github.com/albanD
2023-11-20 19:07:37 +00:00
039a4689a2 Update sdpa doctstring to point to flash-attn-v2 (#114124)
# Summary
See title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114124
Approved by: https://github.com/albanD, https://github.com/Skylion007
2023-11-20 19:05:30 +00:00
9d2425c8a4 [dynamo] Be clearer about dict subtype source availability (#114069)
```
# [NOTE] OrderedDict, dict subtypes must always have source
# We cannot instantiate such subtypes in-graph due to builtin __new__
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114069
Approved by: https://github.com/ezyang
2023-11-20 18:49:42 +00:00
100b9952b1 [dynamo] Fix user defined object sourceless callable (#114066)
Fixes https://github.com/pytorch/pytorch/issues/114019
We do not need to guard on callable user object defined instantiated in graph

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114066
Approved by: https://github.com/ezyang
2023-11-20 18:38:03 +00:00
e4ec5545cd [export] Turn on verifier for serialization. (#113980)
Summary: as title.

Test Plan: CI

Differential Revision: D51435909

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113980
Approved by: https://github.com/larryliu0820
2023-11-20 18:32:16 +00:00
d1ae5efa94 [torch][fsdp] More informative assertion error when rank mismatch (#113765)
Summary: I had a job fail due to rank mismatch but didn't find enough information in the assertion message. This change makes the message more informative.

Test Plan:
CI tests and I ran a test job which failed as expected:

```
Rank 1 has different values for step: 8016.0. Other ranks: 7870.0
```

Differential Revision: D51322046

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113765
Approved by: https://github.com/wz337, https://github.com/fegin
2023-11-20 17:44:41 +00:00
59bc98e4ae [EASY] Rewrite test_anomaly_aot_autograd to more reliably trigger error (#114122)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114122
Approved by: https://github.com/albanD
2023-11-20 17:42:42 +00:00
95eab508e3 [caffe2] Add non-x86 stub definition for libraryFor too (#114023)
Summary: Fix non-x86 build errors with missing `libraryFor` symbol.

Test Plan:
```
$ buck2 build -c fbcode.arch=aarch64 fbcode//admarket/adfinder:adfinder
```

Reviewed By: malfet

Differential Revision: D51444766

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114023
Approved by: https://github.com/aaronenyeshi, https://github.com/malfet
2023-11-20 17:01:47 +00:00
aeb5fd52c7 Remove dead tensor_has_hints. (#114071)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114071
Approved by: https://github.com/aakhundov
2023-11-20 16:02:24 +00:00
7d5e8c1d51 [BE][easy]: Update ruff to 0.1.6 (#114125)
Updates ruff to 0.1.6 for more bugfixes, less false positives / false negatives, and support for more rules.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114125
Approved by: https://github.com/albanD, https://github.com/malfet
2023-11-20 15:36:27 +00:00
cbc6873538 [Dynamo][Forward fix] Add torch.ao back to is_allowed list (#114016) (#114111)
Summary:

As title

Test Plan: Sandcastle

Reviewed By: drisspg, huydhn, voznesenskym

Differential Revision: D51445366

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114111
Approved by: https://github.com/jeanschmidt
2023-11-20 14:59:33 +00:00
140c54e6cc [xla hash update] update the pinned xla hash (#110377)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned xla hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110377
Approved by: https://github.com/pytorchbot
2023-11-20 10:54:05 +00:00
f36d09fcb7 Revert "Add function to materialize COW storages (#113396)"
This reverts commit e2f090086bd494ee7b25da5b8e4f48d6cf61cc98.

Reverted https://github.com/pytorch/pytorch/pull/113396 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/113396#issuecomment-1818769090))
2023-11-20 10:26:01 +00:00
fe428a284b Revert "Add torch._lazy_clone to create COW tensors (#113397)"
This reverts commit 9916d8a9eaaf2c05c131f2a2dbe9eabeeaa9dffc.

Reverted https://github.com/pytorch/pytorch/pull/113397 on behalf of https://github.com/DanilBaibak due to Unfortunately, I need to revert your PR because the lower [PR in the stack](https://github.com/pytorch/pytorch/pull/113396) is failing a bunch of internal build jobs. ([comment](https://github.com/pytorch/pytorch/pull/113397#issuecomment-1818761224))
2023-11-20 10:21:09 +00:00
d40d72d664 Revert "Skip test_lazy_clone for Inductor (#114012)"
This reverts commit ecd8d388b9dec01c5abdf4978e632c9a3db34f95.

Reverted https://github.com/pytorch/pytorch/pull/114012 on behalf of https://github.com/DanilBaibak due to I revert the PR due to the original changes broke the internal build. Here is the original diff stack [D51444337](https://www.internalfb.com/diff/D51444337) ([comment](https://github.com/pytorch/pytorch/pull/114012#issuecomment-1818745425))
2023-11-20 10:12:44 +00:00
7d0339fb9a Revert "[Dynamo][Forward fix] Add torch.ao back to is_allowed list (#114016)"
This reverts commit 09fe36274acb77249a058de0d778b73b29570036.

Reverted https://github.com/pytorch/pytorch/pull/114016 on behalf of https://github.com/DanilBaibak due to The PR was exported as part of the co-dev approach and needs to merged once the internal diff will landed. ([comment](https://github.com/pytorch/pytorch/pull/114016#issuecomment-1818591191))
2023-11-20 09:32:15 +00:00
7963aaac41 add Half support for AdaptiveAvgPool2d and AdaptiveMaxPool2d on CPU (#102079)
### Testing

Single core:

AdaptiveMaxPool2d:
shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
input size: (2, 56, 264, 264), output size: (100, 100)  | 71.5826 | 78.7460 | 85.7195 | 7.3925 | 6.0618 | 6.2596
input size: (2, 56, 264, 264), output size: (50, 50)  | 28.122 | 30.8572 | 36.6366 | 6.2645 | 3.4781 | 3.6628
input size: (32, 32, 100, 100), output size: (50, 50)  | 109.2978 | 115.0330 | 121.9500 | 13.4329 | 10.2769 | 12.1975
input size: (16, 4, 300, 300), output size: (100, 100) | 34.1849 | 36.5876 | 40.9862 | 4.7719 | 4.3362 | 4.1417

28 cores:

AdaptiveMaxPool2d:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
input size: (2, 56, 264, 264), output size: (100, 100)  | 3.1809 | 3.5057 | 3.6728 | 0.6657 | 0.3138 | 0.2934
input size: (2, 56, 264, 264), output size: (50, 50)  | 1.2779 | 1.3869 | 1.5238 | 0.4223 | 0.1775 | 0.1825
input size: (32, 32, 100, 100), output size: (50, 50)  | 4.7942 | 4.9670 | 5.2330 | 1.7146 | 0.6477 | 0.7001
input size: (16, 4, 300, 300), output size: (100, 100) | 1.9522 | 2.0879 | 2.3155 | 0.4370 | 0.3175 | 0.2828

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102079
Approved by: https://github.com/jgong5, https://github.com/malfet
2023-11-20 03:01:00 +00:00
5a96a42cea [AOTI] Improve the two-pass wrapper codegen (#114067)
Summary: For the second-pass, we don't have to rerun the whole inductor flow again. This PR moves that second-pass to the codegen time. This change not only speeds up the compilation, but also removes kernel scheduling inconsistency between the two passes. Another future improvement is to make the second-pass reuse the scheduler and do the wrapper codegen only.

This is a copy of https://github.com/pytorch/pytorch/pull/113762 to land in github first.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114067
Approved by: https://github.com/chenyang78
2023-11-19 23:30:36 +00:00
cyy
226384b460 [2/N] Cleanup header inclusions in torch_cpu by iwyu (#109964)
Further cleaning up of torch_cpu header inclusions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109964
Approved by: https://github.com/ezyang, https://github.com/Skylion007
2023-11-19 20:56:32 +00:00
0bd4d1f4ab Add sparse tensors support to dataloader. (#112842)
Fixes https://github.com/pytorch/pytorch/issues/106837

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112842
Approved by: https://github.com/cpuhrsch, https://github.com/gokulavasan
2023-11-19 16:05:27 +00:00
12f95df0e9 Eliminate unnecessary multiplications by 1 in addmm with sparse compressed tensor operand (#114026)
This PR:
- updates `torch/sparse/_triton_ops_meta.py` for the API change in `triton.testing.do_bench`
- force `num_stages` to be 1 when blocksize is 128x128 to avoid out of resources exception when `bsr_dense_mm` is called from `nn.linear`.
- as in the title. The performance of `nn.linear` on BSR tensor weights (dtypes `float16` and `bfloat16`) is increased as follows (`NVIDIA A100-SXM4-80GB`):
  - for blocksize 16x16, the average/maximum speed up is about 11/20 %
  - for blocksize 32x32, the average/maximum speed up is about 15/24 %
  - for blocksize 64x64, the average/maximum speed up is about 18/26 %
  - for blocksize 128x128, the average/maximum speed up is about 15/28 %

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114026
Approved by: https://github.com/cpuhrsch
2023-11-19 12:13:54 +00:00
826ab0e32d [dynamo] report guard failure user stack, fix incorrectly skipping interesting files (#114053)
Fixes https://github.com/pytorch/pytorch/issues/114015

Before:
```
test/dynamo/test_functions.py::DefaultsTests::test_zip_strict [2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] GUARDS:
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] hasattr(L['x'], '_dynamo_dynamic_indices') == False
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['ys'], 94696321555200)
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] len(L['ys']) == 3
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['zs'], 94696321555200)
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] len(L['zs']) == 3
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['ys'][0], 94696321556032)
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] L['ys'][0] == 1.0
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['ys'][1], 94696321556032)
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] L['ys'][1] == 2.0
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['ys'][2], 94696321556032)
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] L['ys'][2] == 3.0
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['zs'][0], 94696321556032)
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] L['zs'][0] == 2.0
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['zs'][1], 94696321556032)
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] L['zs'][1] == 5.0
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['zs'][2], 94696321556032)
[2023-11-18 23:11:09,316] [0/0] torch._dynamo.guards.__guards: [DEBUG] L['zs'][2] == 8.0
[2023-11-18 23:11:09,317] [0/0] torch._dynamo.guards.__guards: [DEBUG] utils_device.CURRENT_DEVICE == None                           # _dynamo/output_graph.py:365 in init_ambient_guards
[2023-11-18 23:11:09,317] [0/0] torch._dynamo.guards.__guards: [DEBUG] (___skip_backend_check() or ___current_backend() == ___lookup_backend(140084534469552))  # _dynamo/output_graph.py:371 in init_ambient_guards
[2023-11-18 23:11:09,317] [0/0] torch._dynamo.guards.__guards: [DEBUG] check_tensor(L['x'], Tensor, DispatchKeySet(CPU, BackendSelect, ADInplaceOrView, AutogradCPU), torch.float32, device=None, requires_grad=False, size=[3], stride=[1])
[2023-11-18 23:11:09,320] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function fn in /home/jonch/Desktop/Programming/mlsys/pytorch/test/dynamo/test_functions.py:2539
[2023-11-18 23:11:09,320] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-18 23:11:09,320] torch._dynamo.guards.__recompiles: [DEBUG]     - L['zs'][2] == 8.0

```

After:
```
test/dynamo/test_functions.py::DefaultsTests::test_zip_strict [2023-11-18 23:07:33,341] [0/0] torch._dynamo.guards.__guards: [DEBUG] GUARDS:
[2023-11-18 23:07:33,341] [0/0] torch._dynamo.guards.__guards: [DEBUG] hasattr(L['x'], '_dynamo_dynamic_indices') == False           # x = x.clone()  # test/dynamo/test_functions.py:2540 in fn
[2023-11-18 23:07:33,341] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['ys'], 94568804551424)                     # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] len(L['ys']) == 3                                             # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['zs'], 94568804551424)                     # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] len(L['zs']) == 3                                             # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['ys'][0], 94568804552256)                  # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] L['ys'][0] == 1.0                                             # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['ys'][1], 94568804552256)                  # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] L['ys'][1] == 2.0                                             # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['ys'][2], 94568804552256)                  # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] L['ys'][2] == 3.0                                             # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['zs'][0], 94568804552256)                  # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] L['zs'][0] == 2.0                                             # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['zs'][1], 94568804552256)                  # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] L['zs'][1] == 5.0                                             # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] ___check_type_id(L['zs'][2], 94568804552256)                  # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] L['zs'][2] == 8.0                                             # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] utils_device.CURRENT_DEVICE == None                           # _dynamo/output_graph.py:365 in init_ambient_guards
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] (___skip_backend_check() or ___current_backend() == ___lookup_backend(140370726823264))  # _dynamo/output_graph.py:371 in init_ambient_guards
[2023-11-18 23:07:33,342] [0/0] torch._dynamo.guards.__guards: [DEBUG] check_tensor(L['x'], Tensor, DispatchKeySet(CPU, BackendSelect, ADInplaceOrView, AutogradCPU), torch.float32, device=None, requires_grad=False, size=[3], stride=[1])  # x = x.clone()  # test/dynamo/test_functions.py:2540 in fn
[2023-11-18 23:07:33,346] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function fn in /home/jonch/Desktop/Programming/mlsys/pytorch/test/dynamo/test_functions.py:2539
[2023-11-18 23:07:33,346] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-18 23:07:33,346] torch._dynamo.guards.__recompiles: [DEBUG]     - L['zs'][2] == 8.0                                             # for y, z in zip(ys, zs, strict=True):  # test/dynamo/test_functions.py:2541 in fn

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114053
Approved by: https://github.com/ezyang
2023-11-19 10:24:10 +00:00
edc5ae3113 Allow for calling lift_fresh_copy manually (#113923)
In this case, the input could be fake!  Just treat it normally in that case.

Fixes https://github.com/pytorch/pytorch/issues/113331

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113923
Approved by: https://github.com/eellison, https://github.com/bdhirsh, https://github.com/leslie-fang-intel
2023-11-19 07:13:49 +00:00
72a8329ec9 [reland][aotinductor] Add example_value metadata to nodes (#113986)
Test Plan:
`TORCH_LOGS=dynamo,inductor,aot  CUDA_VISIBLE_DEVICES=7 TORCH_COMPILE_DEBUG=0 TORCHINDUCTOR_MAX_AUTOTUNE=1 buck2 run mode/opt-split-dwarf mode/inplace -c fbcode.enable_gpu_sections=true -c fbcode.platform=platform010  caffe2/torch/fb/model_transform/experimental/benchmark:mts_gpu_benchmark -- --local-model /tmp/409501788/66/gpu_lowering/input.predictor.disagg.gpu.merge --lower-backend="AOT_INDUCTOR"`

Without passes:
`BS: 2048, MFLOPS/BS: 40.51, TFLOP/s: 37.32, Time per iter: 2.22ms, Threads: 1, QPS: 921146.83, Accuracy: True (rtol=0.01), AOT_INDUCTOR lowering duration: 66.15s`

With passes:
`BS: 2048, MFLOPS/BS: 40.51, TFLOP/s: 37.49, Time per iter: 2.21ms, Threads: 1, QPS: 925450.82, Accuracy: True (rtol=0.01), AOT_INDUCTOR lowering duration: 261.11s`

Differential Revision: D51436878

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113986
Approved by: https://github.com/zhxchen17
2023-11-19 07:12:24 +00:00
33c6cae13b [pytorch-vulkan][5/n] Enable BMM with the new packing. Massive refactor. (#113943)
Summary:
After the refactoring of matrix multiplication, the `bmm` logic can easily be adapted using the existing `mm` code, since the only difference is on the "batch" dimension, which is not packed now. So we can simply add computation along the "z" dimension.

Further, I realized that `bias` and `beta` are simply a post-processing step after the matrix multiplication, so I have factored them out. The nice part about this factoring is that we can directly leverage the broadcasting logic; hence we don't need a separate shader just to add the bias.

So this diff massively simplifies the `mm` code.
1. Reduce 4 shaders `mm`, `addmm`, `bmm`, `baddbmm` into just `mm`.
2. Remove packing for bias.
3. Add support on `at::bmm(m1.vulkan(), m2.vulkan())` <= This is a blocking feature for the emformer models.

Test Plan:
```

% buck2 run  -c pt.has_backtraces=1 --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64
...

[       OK ] VulkanAPITest.linear_4d_flat (0 ms)
[ RUN      ] VulkanAPITest.linear_4d_small
[       OK ] VulkanAPITest.linear_4d_small (0 ms)
[ RUN      ] VulkanAPITest.linear_4d_large
[       OK ] VulkanAPITest.linear_4d_large (1 ms)
[ RUN      ] VulkanAPITest.lstm_success
[       OK ] VulkanAPITest.lstm_success (5 ms)
[ RUN      ] VulkanAPITest.lstm_mclareninputs_success
[       OK ] VulkanAPITest.lstm_mclareninputs_success (22 ms)
[ RUN      ] VulkanAPITest.lstm_prepack_success
[       OK ] VulkanAPITest.lstm_prepack_success (3 ms)
[ RUN      ] VulkanAPITest.querypool_flushed_shader_log
xplat/caffe2/aten/src/ATen/test/vulkan_api_test.cpp:8056: Skipped
QueryPool is not available
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log (0 ms)
[----------] 411 tests from VulkanAPITest (5754 ms total)
[----------] Global test environment tear-down
[==========] 411 tests from 1 test suite ran. (5754 ms total)
[  PASSED  ] 410 tests.
[  SKIPPED ] 1 test, listed below:
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log

```

Full Paste: P884697749

Differential Revision: D51421256

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113943
Approved by: https://github.com/SS-JIA
2023-11-19 06:24:30 +00:00
e3eca4c49f Revert "Convert SymInts to SymFloats with SymPy (#113683)"
This reverts commit 0ec66b3be5a53ab960872981b5027c49c2e6b7e9.

Reverted https://github.com/pytorch/pytorch/pull/113683 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing in trunk 0ec66b3be5, probably a landrace as this is not failing on your PR ([comment](https://github.com/pytorch/pytorch/pull/113683#issuecomment-1817759130))
2023-11-19 06:09:15 +00:00
fb3bc3949a [Inductor] remove GPT2ForSequenceClassification from ci skip list (#112100)
**Summary**
As discussed in https://github.com/pytorch/pytorch/issues/109019, the accuracy issue of `GPT2ForSequenceClassification` has been fixed in https://github.com/pytorch/pytorch/pull/108690. Remove it from CI Skip list.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112100
Approved by: https://github.com/lezcano
2023-11-19 05:12:18 +00:00
84f791e697 Fix checking symbolic shapes inside torch._check (#113811)
Fixes https://github.com/pytorch/pytorch/issues/110719#issuecomment-1768710678

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113811
Approved by: https://github.com/ezyang, https://github.com/peterbell10
2023-11-19 04:13:18 +00:00
cyy
bae61ecb96 [Reland 1] Cleanup header inclusions in torch_cpu by iwyu (#112311)
Reland https://github.com/pytorch/pytorch/pull/101178 to use IWYU on torch_cpu. The header file changes are excluded to avoid breaking internal jobs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112311
Approved by: https://github.com/ezyang
2023-11-19 04:06:36 +00:00
68ab458fe3 Don't recommend max_split_size_mb first (#113481)
I've run into a couple of cases now where max_split_size_mb has been set
in projects as a workaround for fragmentation, but it ends up causing problems
later, such as degraded performance from freeing empty segments. While it
is a useful setting to have, expandable_segments is probably a better first
resort for fixing fragmentation: when it works, it is less likely to require
synchronous GPU operations for the program to continue running.
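
For reference, a hedged sketch of opting into expandable_segments; the env var and option are real, the tiny script around them is illustrative:

```python
import os

# The allocator reads this variable when it initializes, so set it before
# the first CUDA allocation (safest: before importing torch).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch

x = torch.empty(1024, device="cuda")  # allocations now use expandable segments
```
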
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113481
Approved by: https://github.com/msaroufim, https://github.com/albanD
ghstack dependencies: #113231
2023-11-19 04:05:01 +00:00
d968c4cac3 [torchelastic] ensure grandchild processes are restarted correctly (#113231)
When torchelastic notices that one rank has failed, it will send a SIGTERM
signal to the other trainer ranks to tear them down before restarting. However,
if the trainer itself launches subprocesses, or is launched by a non-python
wrapper script, then the SIGTERM will be delivered only to the direct child of
torchelastic and not to all descendants. This change opens subprocesses in a new
Linux 'session', which starts a new process group whose pgid is the same as the
trainer's pid. Then when we send signals, we deliver them to the process group
rather than just the direct child.
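
A minimal sketch of the pattern described above (not torchelastic's exact code; `wrapper.sh` is a hypothetical launcher script):

```python
import os
import signal
import subprocess

# start_new_session=True calls setsid() in the child, so the child's pgid
# equals its own pid and all of its descendants share that process group.
proc = subprocess.Popen(["bash", "wrapper.sh"], start_new_session=True)

# Later, when another rank fails, signal the whole group, not just the child:
os.killpg(proc.pid, signal.SIGTERM)
```
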
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113231
Approved by: https://github.com/H-Huang
2023-11-19 04:05:01 +00:00
958f3b0df6 [nccl-pg] Migrate to getCvar* functions for env variable checking (#113797)
Summary:
The getCvar* functions allow us to provide multiple environment variables for the same value.  This allows us to deprecate some variables in favor of others, while still allowing users to temporarily use the old variables for some time.
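
A Python sketch of the idea (the real helpers are C++ utilities in the c10d code; the alias names below are purely illustrative):

```python
import os

def get_cvar_string(aliases, default):
    # Earlier names take precedence, so the new variable wins over a deprecated one.
    for name in aliases:
        if name in os.environ:
            return os.environ[name]
    return default

val = get_cvar_string(["TORCH_NCCL_BLOCKING_WAIT", "NCCL_BLOCKING_WAIT"], "0")
```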

Test Plan: OSS CI

Reviewed By: fduwjj, XilunWu

Differential Revision: D51225487

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113797
Approved by: https://github.com/fduwjj
2023-11-19 03:48:58 +00:00
09fe36274a [Dynamo][Forward fix] Add torch.ao back to is_allowed list (#114016)
Summary: As title

Test Plan: Sandcastle

Differential Revision: D51445366

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114016
Approved by: https://github.com/drisspg, https://github.com/voznesenskym, https://github.com/huydhn
2023-11-19 02:59:34 +00:00
b30580e121 [PT] Include tensor shape info in the error messages of torch split (#113984)
Summary: Include tensor shape info in the error messages of torch split.

Test Plan: CI

Differential Revision: D51436684

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113984
Approved by: https://github.com/ezyang
2023-11-19 01:34:57 +00:00
0ec66b3be5 Convert SymInts to SymFloats with SymPy (#113683)
Fixes #109365

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113683
Approved by: https://github.com/ezyang
2023-11-18 22:18:24 +00:00
870539670a [Dynamo] Support skip/inline function by name and consolidate skip/inline check logics (#113888)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113888
Approved by: https://github.com/mlazos
2023-11-18 21:36:29 +00:00
f0dedb340f [C++] Fix clang compilation issue. (#114017)
Summary:
Clang compilation failed recently due to unfound crtbeginS.o and libgcc, but we should not be using them; for self-containedness, Clang uses compiler-rt instead. I'm also switching to the lld linker, which also comes from the clang release package.

There was another issue where glibc was not found during linking. It looks like the glibc path was passed into the linker via `-B`. It should also be passed via `-L`, which reaches the linker for library reference.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114017
Approved by: https://github.com/hl475
2023-11-18 19:56:44 +00:00
11857e9a64 [Inductor] Allow autotuned argument to be anywhere in the argument list (#114002)
Prior to this PR, autotuned arguments could only be at the back of the argument list. This is an Inductor limitation, not a Triton limitation. Fixing this allows more MRS kernels to use user-defined triton kernels.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114002
Approved by: https://github.com/aakhundov
ghstack dependencies: #113967
2023-11-18 18:19:32 +00:00
e0c3936843 [Inductor] Support ReinterpretView in inductor codegen (#113967)
Adding support for ReinterpretView in inductor so that jagged MRS kernels can use native triton kernels

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113967
Approved by: https://github.com/aakhundov
2023-11-18 18:19:32 +00:00
ff7c06a01b Revert "limit fused kernel num args. (#113131)"
This reverts commit 7b442c2b0ae0d9c944a777d7352135f370837c15.

Reverted https://github.com/pytorch/pytorch/pull/113131 on behalf of https://github.com/albanD due to Breaks lint on trunk ([comment](https://github.com/pytorch/pytorch/pull/113131#issuecomment-1817548349))
2023-11-18 16:14:08 +00:00
b53d47a719 [inductor cpp] refactor: CppVecOverrides inherits CppOverrides (#113950)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113950
Approved by: https://github.com/Skylion007
2023-11-18 15:33:30 +00:00
f8516cef88 [pytorch-vulkan][2/n] Height packing (#113883)
Summary:
Enable logic for converting a channel-packed tensor into a height-packed one.

Not yet connected with the rest of the system.

Test Plan:
```
(base) yipjustin@yipjustin-mac fbsource % buck2 run  -c pt.has_backtraces=1  --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_quantized_api_test_binAppleMac\#macosx-arm64  -- --gtest_filter="*packing*"
File changed: fbsource//xplat/caffe2/aten/src/ATen/test/vulkan_quantized_api_test.cpp
Buck UI: https://www.internalfb.com/buck2/9a0d6bd6-e4a2-4d58-8f38-f806a0703122
Network: Up: 0B  Down: 0B
Jobs completed: 4. Time elapsed: 0.1s.
BUILD SUCCEEDED
Running main() from third-party/googletest/1.14.0/googletest/googletest/src/gtest_main.cc
Note: Google Test filter = *packing*
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from VulkanAPITest
[ RUN      ] VulkanAPITest.channel_to_height_packing_test
[       OK ] VulkanAPITest.channel_to_height_packing_test (35 ms)
[----------] 1 test from VulkanAPITest (35 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (36 ms total)
[  PASSED  ] 1 test.
```

Reviewed By: SS-JIA

Differential Revision: D51379737

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113883
Approved by: https://github.com/SS-JIA
2023-11-18 09:46:48 +00:00
fdaddec2c3 make_fx can now SymIntify int inputs (#113452)
This PR also contains a basket of fixes that were turned up now that more arguments are tested with SymInt. I fixed as many of the easy ones as I could, some earlier in this stack and a bunch here, but there are some more annoying ones that I xfailed.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113452
Approved by: https://github.com/Chillee
ghstack dependencies: #113877, #113911
2023-11-18 06:39:09 +00:00
33f7c6638f Guard when fetching non-symbolic value out of Scalar (#113911)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113911
Approved by: https://github.com/voznesenskym
ghstack dependencies: #113877
2023-11-18 06:39:09 +00:00
bc0d87cde3 Explicitly enumerate all method to operator mappings (#113968)
This is useful for documentary purposes, since these are precisely the
operators you need to understand to deal with int/float compute inside
make_fx traced graphs with symbolic ints/floats.
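
An illustrative subset of what such a mapping looks like (an assumption for exposition, not the PR's exact table):

```python
import operator

# Magic-method names on symbolic ints/floats paired with the plain Python
# operator that implements each of them.
METHOD_TO_OPERATOR = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "truediv": operator.truediv,
    "floordiv": operator.floordiv,
    "mod": operator.mod,
    "eq": operator.eq,
    "lt": operator.lt,
    "neg": operator.neg,
}
```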

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113968
Approved by: https://github.com/Skylion007
2023-11-18 05:43:39 +00:00
ecd8d388b9 Skip test_lazy_clone for Inductor (#114012)
Half of those tests fail when run individually, but the first failure masks all subsequent ones, e.g.
```
PYTORCH_TEST_WITH_INDUCTOR=1 python3 test/test_torch.py -v -k test_lazy_clone_cuda_float32
test_lazy_clone_cuda_float32 (__main__.TestTorchDeviceTypeCUDA) ... FAIL
...
   self.assertTrue(torch._C._is_cow_tensor(t))
AssertionError: False is not true
----------------------------------------------------------------------
Ran 1 test in 19.419s

FAILED (failures=1)
```
But
```
$ PYTORCH_TEST_WITH_INDUCTOR=1 python3 test/test_torch.py -k test_lazy_clone_
...
......................
----------------------------------------------------------------------
Ran 24 tests in 24.969s

OK
```
This flaky behavior was already detected, for example see https://github.com/pytorch/pytorch/issues/113953
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114012
Approved by: https://github.com/huydhn, https://github.com/kit1980
2023-11-18 04:57:00 +00:00
caffa44b1c Correctly use real boolean operators, not bitwise in shape guard prints (#113927)
Fixes https://github.com/pytorch/pytorch/issues/113875

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113927
Approved by: https://github.com/voznesenskym
2023-11-18 04:24:45 +00:00
7b442c2b0a limit fused kernel num args. (#113131)
Fixes #97361

When a fused kernel has more than 1024 parameters, ctypes throws an error.
Limiting the number of args is a mechanism to protect stack memory: C++ passes args via the stack, and stack memory has a size limit.

Code change:

1. The cpp backend checks the fused nodes' arg count; if it reaches the limit, it sets the flush status to ready.
2. The scheduler checks the `ready_to_flush` API and helps the backend flush codegen.
3. Add a `ready_to_flush` API to `BaseScheduling`; the Triton backend returns False since it does not support this yet.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113131
Approved by: https://github.com/jgong5, https://github.com/mlazos
2023-11-18 03:55:52 +00:00
5e30741754 Clean up optimizer imports in test_optim (#113971)
This is purely a cosmetic change to set up for my optimizer infos, which will benefit from not needing to type optim.SparseAdam or whatever.

The next step is actually adding the OptimizerInfos, similar to my attempt in https://github.com/pytorch/pytorch/pull/102774/files

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113971
Approved by: https://github.com/cpuhrsch
2023-11-18 03:52:01 +00:00
46542f6ce2 [reland][export] make aot_export_module uses dynamo's fake_mode (#114009)
Retry landing https://github.com/pytorch/pytorch/pull/113681

Fixes https://github.com/pytorch/pytorch/issues/110100.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114009
Approved by: https://github.com/angelayi
2023-11-18 03:36:34 +00:00
310e3060b7 [Caffe2] Handle cpuinfo_initialize() failure (#114011)
It can fail on the ARM platform if the `/sys` folder is not accessible.
In that case, call `std::thread::hardware_concurrency()`, which is
aligned with the thread_pool initialization logic of `c10::TaskThreadPoolBase::defaultNumThreads()`

Further addresses issue raised in https://github.com/pytorch/pytorch/issues/113568
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114011
Approved by: https://github.com/kit1980
ghstack dependencies: #113771
2023-11-18 03:20:22 +00:00
855a5cf427 312 test fix in named tensor and TS deprecations (#113981)
Fix existing bugs / deprecations that become hard errors when running CI with Python 3.12

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113981
Approved by: https://github.com/malfet
2023-11-18 03:06:04 +00:00
4667e20b3f Delete a bunch of type-ignores (#113990)
* Replaced `ignore[import]` by mypy config file entries
* Removed a bunch of ignores around previously-fixed attr-defined /
  call-arg issues
* Fixed some invalid / undefined types; added a few more type-ignores to
  squelch the downstream errors this exposed

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113990
Approved by: https://github.com/eellison, https://github.com/Skylion007
ghstack dependencies: #113979
2023-11-18 02:48:38 +00:00
47220bc72a fixes multiple GPU detected error for test_fsdp_fine_tune.py (#112406)
fixes "Duplicate GPU detected : rank 1 and rank 0 both on CUDA device" on  test_fsdp_fine_tune.py. Only run the test if GPU number > 1.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112406
Approved by: https://github.com/awgu
2023-11-18 02:07:18 +00:00
1567917e5a [ROCm] Enable several inductor UTs (#112777)
- test_compiled_optimizers.py
- test_foreach.py
- test_profiler.py
- Fix test_profiler.py:test_inductor_profiling_triton_launch - Look for hipModuleLaunchKernel in the events list for AMD GPUs instead of cuLaunchKernel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112777
Approved by: https://github.com/jataylo, https://github.com/malfet
2023-11-18 02:05:57 +00:00
b169f04170 [ONNX] Fix bench w/ iobinding; Remove cpu fallback (#113703)
Summary
- `TORCH_TO_NUMPY_DTYPE` was misplaced previously hence subclasses cannot access it.
- Remove cpu fallback when benching onnx with gpu, expose gpu run failures properly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113703
Approved by: https://github.com/thiagocrepaldi
ghstack dependencies: #113404, #113697
2023-11-18 01:33:06 +00:00
d4189d8007 Extend _TestONNXRuntime to reuses all tests for new model format (#112289)
`_TestONNXRuntime` has infra to test models which are either Callable or a `torch.nn.Module`.

After #111497, we want to re-run all those tests for models of type `torch.export.ExportedProgram`.

This PR adds to `self.run_test_with_fx_to_onnx_exporter_and_onnx_runtime` the capability of detecting the model type to be tested and exporting the incoming `torch.nn.Module` model to `torch.export.ExportedProgram` before running ONNX export tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112289
Approved by: https://github.com/titaiwangms
2023-11-18 00:27:56 +00:00
2efa89a388 [torch/csrc/onnx] Use nested namespaces (3/N) (#113993)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113993
Approved by: https://github.com/ZainRizvi
ghstack dependencies: #113991, #113992
2023-11-18 00:20:19 +00:00
d6744a698c [torch/csrc/onnx] Use nested namespaces (2/N) (#113992)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113992
Approved by: https://github.com/ZainRizvi
ghstack dependencies: #113991
2023-11-18 00:20:19 +00:00
c83a897348 [torch/csrc/onnx] Use nested namespaces (1/N) (#113991)
Differential Revision: [D51439849](https://our.internmc.facebook.com/intern/diff/D51439849)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113991
Approved by: https://github.com/ZainRizvi
2023-11-18 00:20:10 +00:00
e360f4c6dd [DTensor] Renamed shard_spec -> placements in test file (#113917)
Public APIs like `from_local` and `distribute_tensor` name the argument as `placements`, not `shard_spec` anymore. This was a direct find and replace.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113917
Approved by: https://github.com/wanchaol
ghstack dependencies: #113654, #113903
2023-11-18 00:13:30 +00:00
8372983fe3 [AOTInductor] Use ProxyExecutor for aten op if c-shim is missing (#113918)
Summary:
As discussed in the meeting, we are inverting the policy on the use of proxy executor for aten fallbacks.
By default, aten fallback ops will use proxy executor, unless a c-shim is available.

Test Plan: CIs

Differential Revision: D51417683

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113918
Approved by: https://github.com/chenyang78
2023-11-18 00:04:21 +00:00
dab272eed8 [td] Consistent pytest cache (#113804)
Move the pytest cache downloading into the build step and store it in additional ci files so that it stays consistent during sharding.

Only the build env is taken into account now, instead of also the test config, since we might not have the test config at build time. This makes the cache key less specific, but I also think this might be better, since tests are likely to fail across the same test config. (I also think it might be worth not even looking at the build env, but that's a different topic.)

Each cache upload should only include information from the current run.  Do not merge current cache with downloaded cache during upload (shouldn't matter anyways since the downloaded cache won't exist at the time)

From what I can tell of the S3 retention policy, pytest cache files will be deleted after 30 days (cc @ZainRizvi to confirm), so we never have to worry about space or pulling old versions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113804
Approved by: https://github.com/ZainRizvi
2023-11-17 23:45:47 +00:00
033d7b670a [Dynamo][6.1/N] Refactor out TorchInGraphFunctionVariable and improve heuristic (#113432)
This is splitted from #113009, please check https://github.com/pytorch/pytorch/pull/113009#issuecomment-1804417925 for more details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113432
Approved by: https://github.com/ezyang
2023-11-17 23:42:00 +00:00
3fc38e6c83 [GHF] Abort merge on rebase failure (#113960)
Abort merges invoked with `-r` if there is nothing to rebase

Make `rebase_onto`/`rebase_ghstack_onto` return False if rebase is no-op and abort merge in that case

Remove `-e` option from both trymerge and tryrebase workflows as  one should never report failures on workflow dispatch

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113960
Approved by: https://github.com/clee2000
2023-11-17 23:11:00 +00:00
a450c784da [AotAutograd] Move mutations hidden from autograd in graph (#113454)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113454
Approved by: https://github.com/bdhirsh
2023-11-17 22:47:06 +00:00
4d8c73b2b7 Trivial fix for minor typo in torch.jit._script.py (#113892)
Trivial PR to close an open issue regarding a typo.  Looked for more typos in file, but found none.

Fixes #113866

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113892
Approved by: https://github.com/janeyx99
2023-11-17 22:20:21 +00:00
e736d27e38 [inductor] Fix slice scatter shape calculation (#113838)
Fixes #113641

As written, there is an off-by-one error whenever `end - start` doesn't evenly
divide into `step`. e.g. if `end - start = 1` and `step = 2` we should get a
single element but `1 // 2 == 0` so this wouldn't take anything from the slice.
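
The fix in one line of arithmetic: the slice length needs ceiling division rather than floor division.

```python
start, end, step = 0, 1, 2
wrong = (end - start) // step             # 0: floor division drops the element
right = (end - start + step - 1) // step  # 1: ceiling division keeps it
assert right == len(range(start, end, step))
```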

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113838
Approved by: https://github.com/Chillee
2023-11-17 22:09:35 +00:00
e5102ccd27 [quant][pt2] Support conv1d-bn QAT fusion (#113714)
Summary: Previously the PT2 QAT code only supported conv2d-bn.
This commit extends all existing QAT fusion support to conv1d-bn,
including support for all variants like relu, no bias, literal
args, cuda etc. This commit also refactors the code such that
we can support conv3d-bn easily in the future.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn1d

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar

Differential Revision: [D51428979](https://our.internmc.facebook.com/intern/diff/D51428979)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113714
Approved by: https://github.com/jerryzh168
2023-11-17 22:09:30 +00:00
d40d2709c9 Minor fix in Unit Test test_max_autotune.py (#113889)
The benchmark method of TestBenchmarkRequest accesses a non-existent property in a codepath. Looks like a typo, this fixes it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113889
Approved by: https://github.com/Skylion007
2023-11-17 21:51:56 +00:00
5d439b07ca Fix failing test_mkldnn_pattern_matcher if built without MKL (#113949)
The test checks for the `mkldnn_fusion.linear` pass, which checks `_is_packable_linear`, which in turn depends on `torch._C.has_mkl`. So skip the test, as it would otherwise fail because no pattern matches are counted.

See https://github.com/pytorch/pytorch/blob/main/torch/_inductor/fx_passes/mkldnn_fusion.py#L827

CC @XiaobingSuper as the author of the test.

Not sure how many other test are affected by similar issues but this is the one in pattern matcher I see failing.

Strangely the first part of the test succeeds where `bias = True` as it finds a match for `unfuse_bias_add_to_pointwise` (torch/_inductor/fx_passes/post_grad.py)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113949
Approved by: https://github.com/jansel
2023-11-17 21:29:10 +00:00
69d9267c4f [BE]: ruff - enable PIE804 (#113951)
Enables ruff PIE804 which kills some more unnecessary temporary dicts.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113951
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-11-17 21:23:02 +00:00
4b1583fe57 type-ignore issues exposed by import following (#113979)
Some new errors were introduced in a land-race with
https://github.com/pytorch/pytorch/pull/113830. Silence them for now to
get the lintrunner job green again.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113979
Approved by: https://github.com/huydhn
2023-11-17 21:20:09 +00:00
0885c58296 Add Bfloat16 scalar support to gloo backend (#113557)
Support for bfloat16 scalars was missing. When I use the gloo backend
`torch.distributed.init_process_group(backend='gloo')`
and run
`torch.nn.parallel.DistributedDataParallel(model)`
and _model_ has bfloat16 features, I receive the following error:
`RuntimeError: Invalid scalar type`

This change fixes the issue.
c10::BFloat16 defines conversions from/to float, so calculations for bfloat16 are performed in float.
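
A minimal shape of the failing scenario (a sketch; it assumes a multi-process launch such as torchrun, and gloo runs on CPU so no GPUs are needed):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="gloo")
model = torch.nn.Linear(4, 4).to(torch.bfloat16)
ddp = DDP(model)  # previously raised "RuntimeError: Invalid scalar type"
```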

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113557
Approved by: https://github.com/XilunWu, https://github.com/jgong5
2023-11-17 21:16:54 +00:00
c435b8c10a Fix autograd engine callback error propagation from device thread (#113702)
The existing try-catch doesn't work because it doesn't call err.persist(). This is in contrast to the try-catch for evaluate_function which does work because it calls into python_engine's thread_on_exception which calls persist.

Calling persist on a python_error stashes the PyErr state from the thread-local PyThreadState onto the python_error object, so that when this error object is stored onto the future and passed back to the calling cpu thread, python_engine's execute try-catch can then err.restore() the error state. Finally, the python_engine's execute would re-raise so that this is re-caught by the HANDLE_TH_ERRORS macro.

Fixes https://github.com/pytorch/pytorch/issues/75750

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113702
Approved by: https://github.com/albanD
2023-11-17 20:17:02 +00:00
957312a4cf [ONNX] Relax unsupported node analysis on complex dtype (#113785)
In cases like #113444, users usually stop at UnsupportedNodeAnalysis with unsupported-node information. Although in SARIF they can clearly see it's due to lack of COMPLEX support, the on-screen error message only shows the original FX node name, such as `aten.mul.Tensor`. ~~This PR catches the information from diagnostic messages and reveals it to users.~~

The root cause is that UnsupportedNodeAnalysis leverages `onnxfunction_dispatcher.get_function_overloads()` to decide whether an ATen op is supported. However, in `onnxfunction_dispatcher.get_function_overloads()`, lack of complex function support is treated as unsupported. This PR instead defines unsupported FX nodes as those not in the registry.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113785
Approved by: https://github.com/thiagocrepaldi
2023-11-17 20:11:20 +00:00
76bf10e551 Revert "Fix checking symbolic shapes inside torch._check (#113811)"
This reverts commit 4f8cb52ed94bcdce16c421d7a5e3e9d32acfa439.

Reverted https://github.com/pytorch/pytorch/pull/113811 on behalf of https://github.com/huydhn due to This one still break inductor tests on main 4f8cb52ed9 ([comment](https://github.com/pytorch/pytorch/pull/113811#issuecomment-1817001514))
2023-11-17 19:56:02 +00:00
c51827b8ce [ez] Hash update to reuse issues again (#113961)
The bot that creates the issue got changed, but the search did not, so it wasn't finding old PRs and was just making new ones.

This PR makes it reuse PRs again instead of making a new one everytime.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113961
Approved by: https://github.com/huydhn
2023-11-17 19:06:38 +00:00
ac08022137 [BE][benchmarks] Minor comment cleanup, typos (#113898)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113898
Approved by: https://github.com/desertfire
2023-11-17 19:03:41 +00:00
00b67193ef [utils] move config_typing.pyi to torch.utils (#113929)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113929
Approved by: https://github.com/ezyang, https://github.com/jansel
ghstack dependencies: #111299, #111300, #113901, #113916
2023-11-17 18:51:57 +00:00
a7b701ed21 Update ExecuTorch pinned commit daily (#113832)
WIP

* [X] Update this pinned commit periodically, similar to https://github.com/pytorch/pytorch/pull/113499
* [ ] Increase ET coverage on PT CI, ideally, we should run all ET pull jobs?
* [ ] Switch ExecuTorch's torch, vision, and audio nightly pins to commit pins
* [ ] Update ExecuTorch's torch, vision, and audio commit pins periodically

### Testing

`python .github/scripts/update_commit_hashes.py --repo-name executorch --branch main --pin-folder .ci/docker/ci_commit_pins`

The testing PR is https://github.com/pytorch/pytorch/pull/113834

(I will move the pinned commit out of the Docker image if the Docker build process is flaky; otherwise, refreshing the Docker image daily seems like a good way to catch issues with the images early.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113832
Approved by: https://github.com/clee2000
2023-11-17 18:38:46 +00:00
d4bb16f443 Change functorch import to proxy_tensor import (#113913)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113913
Approved by: https://github.com/ezyang, https://github.com/zou3519
2023-11-17 18:32:50 +00:00
631fb33fd6 Enable import following in MYPYNOFOLLOW (now MYPYINDUCTOR) (#113830)
Skipping importing some packages for now to make this change more
tractable.

For some reason, lintrunner on CI raises errors in all imported `.pyi` files,
even though it doesn't on my local machine. The errors are all from missing
generic types, as the MYPYINDUCTOR config has `disallow_any_generics`
set. I have thus added `disable-error-code` comments to the relevant files,
though I fixed a few that were easy enough.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113830
Approved by: https://github.com/Skylion007
ghstack dependencies: #113722, #113721
2023-11-17 18:24:21 +00:00
0c8362de1a [dynamo] Make {guards,eval_frame}.py pass follow_imports typechecking (#113721)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113721
Approved by: https://github.com/Skylion007
ghstack dependencies: #113722
2023-11-17 18:24:21 +00:00
e2b114ab9f [BE] Package dynamic_dims/constraint_dims into CreateSymbolicPolicy (#113802)
This will make it more convenient to propagate more information through
all of these functions in the future (e.g., for storage offset
information.)

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113802
Approved by: https://github.com/davidberard98, https://github.com/voznesenskym
2023-11-17 18:22:46 +00:00
dc3d0caab3 BUG: fix np.ndarray.resize under dynamo (#113931)
Make sure ndarray.resize actually works in-place, so that dynamo does the right thing tracking the result.

Fixes #113539

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113931
Approved by: https://github.com/lezcano
2023-11-17 18:12:17 +00:00
6849d75300 Automated submodule update: FBGEMM (#112312)
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 049f2a9ac6

Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112312
Approved by: https://github.com/malfet
2023-11-17 17:46:18 +00:00
7c35874ad6 Fix for PyTorch mobile flatbuffer loader out of bounds reads (#110162)
Summary:
The mobile_ivalue_size field in the mobile_bytecode flatbuffer schema can be larger than the ivalues vector. This introduces potential for memory corruption when parsing the mobile_bytecode Module.

This diff fixes the issue by ensuring that mobile_ivalue_size is less than the size of the ivalues vector.

Test Plan: contbuild & OSS CI

Differential Revision: D49687548

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110162
Approved by: https://github.com/malfet
2023-11-17 17:29:07 +00:00
9f47580ad7 [BE] Don't mutate torch.compile global config in tests (#113882)
We should uniformly use `config.patch` so the configuration changes don't affect
other tests.
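
A minimal sketch of the pattern, using `torch._dynamo.config` as an example (the inductor config module exposes the same `patch` helper):

```python
from torch._dynamo import config

# bad: a bare assignment leaks the setting into every test that runs afterwards
# config.suppress_errors = True

# good: the override is reverted when the block exits
with config.patch(suppress_errors=True):
    assert config.suppress_errors is True
assert config.suppress_errors is False  # restored outside the block
```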

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113882
Approved by: https://github.com/lezcano
2023-11-17 16:49:48 +00:00
4f8cb52ed9 Fix checking symbolic shapes inside torch._check (#113811)
Fixes https://github.com/pytorch/pytorch/issues/110719#issuecomment-1768710678

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113811
Approved by: https://github.com/ezyang, https://github.com/peterbell10
2023-11-17 16:14:02 +00:00
dbb96ef30d improve annotation device parameters where a device ordinal is allowed (#113647)
Using mypy in code that depends on pytorch, I noticed that the type annotation doesn't allow a device ordinal.

`error: Argument "device" to "to_empty" of "Module" has incompatible type "int"; expected "str | device"  [arg-type]`
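
A hypothetical illustration of what the widened annotation permits; note that a bare ordinal resolves to a CUDA device at runtime, so this is primarily a typing example:

```python
import torch

m = torch.nn.Linear(2, 2)
m = m.to_empty(device=0)  # previously mypy rejected int; expected str | device
```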

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113647
Approved by: https://github.com/albanD
2023-11-17 14:41:22 +00:00
a56af02913 [dynamo] Added support for is_contiguous with dynamic shapes (#113645)
Description:
- Added support for `x.is_contiguous` with dynamic shapes

On `main` the following code is giving a graph break:
```python
import torch

@torch.compile(backend="eager", dynamic=True, fullgraph=True)
def f(x):
    if x.is_contiguous():
        return x
    else:
        return 0

x = torch.randn(13, 14)
f(x)
```
with the error message:
```
  File "pytorch/torch/_dynamo/variables/builder.py", line 1541, in wrap_fx_proxy_cls
    unimplemented(
  File "pytorch/torch/_dynamo/exc.py", line 193, in unimplemented
    raise Unsupported(msg)
torch._dynamo.exc.Unsupported: torch.* op returned non-Tensor bool call_method is_contiguous

from user code:
   File "check_is_contig_dynamic_true.py", line 37, in f
    if x.is_contiguous():
```

This PR fixes the issue.
```
TORCH_COMPILE_DEBUG=1 python check_is_contig_dynamic_true.py
[2023-11-14 15:49:04,399] [0/0] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo start tracing f check_is_contig_dynamic_true.py:34
[2023-11-14 15:49:04,403] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG] TRACE starts_line check_is_contig_dynamic_true.py:34 in f ()
[2023-11-14 15:49:04,403] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG]     @torch.compile(backend="eager", dynamic=True, fullgraph=True)
[2023-11-14 15:49:04,405] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG] TRACE starts_line check_is_contig_dynamic_true.py:37 in f (f)
[2023-11-14 15:49:04,405] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG]         if x.is_contiguous():
[2023-11-14 15:49:04,405] [0/0] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x []
[2023-11-14 15:49:04,405] [0/0] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR is_contiguous [LazyVariableTracker()]
[2023-11-14 15:49:04,804] [0/0] torch._dynamo.output_graph: [DEBUG] create_graph_input L_x_ L['x']
[2023-11-14 15:49:04,805] [0/0] torch._dynamo.variables.builder: [DEBUG] wrap_to_fake L['x'] (5, 4) [<DimDynamic.DUCK: 1>, <DimDynamic.DUCK: 1>] [None, None]
[2023-11-14 15:49:04,839] [0/0] torch._dynamo.output_graph: [DEBUG] create_graph_input s0 L['x'].size()[0]
[2023-11-14 15:49:04,840] [0/0] torch._dynamo.output_graph: [DEBUG] create_graph_input s1 L['x'].size()[1]
[2023-11-14 15:49:04,840] [0/0] torch._dynamo.output_graph: [DEBUG] create_graph_input s2 L['x'].stride()[0]
[2023-11-14 15:49:04,840] [0/0] torch._dynamo.output_graph: [DEBUG] create_graph_input s1 L['x'].stride()[1]
[2023-11-14 15:49:04,840] [0/0] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 0 [GetAttrVariable(TensorVariable(), is_contiguous)]
[2023-11-14 15:49:04,843] [0/0] torch._dynamo.symbolic_convert: [DEBUG] TRACE POP_JUMP_IF_FALSE 12 [ConstantVariable(bool)]
[2023-11-14 15:49:04,844] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG] TRACE starts_line check_is_contig_dynamic_true.py:42 in f (f)
[2023-11-14 15:49:04,844] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG]             return 0
[2023-11-14 15:49:04,844] [0/0] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 0 []
[2023-11-14 15:49:04,844] [0/0] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [ConstantVariable(int)]
[2023-11-14 15:49:04,844] [0/0] torch._dynamo.convert_frame: [DEBUG] Skipping frame because no content in function call f                     check_is_contig_dynamic_true.py 34
[2023-11-14 15:49:04,844] [0/0] torch._dynamo.convert_frame: [DEBUG] No graph captured with one_graph=True
[2023-11-14 15:49:04,848] torch._dynamo.utils: [INFO] TorchDynamo compilation metrics:
[2023-11-14 15:49:04,848] torch._dynamo.utils: [INFO] Function                           Runtimes (s)
[2023-11-14 15:49:04,848] torch._dynamo.utils: [INFO] -------------------------------  --------------
[2023-11-14 15:49:04,848] torch._dynamo.utils: [INFO] _compile.<locals>.compile_inner          1.2083
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113645
Approved by: https://github.com/lezcano
2023-11-17 12:32:38 +00:00
3df2c42921 [dynamic_shapes] SymNode's hint does not always conform to pytype (#113848)
Fixes https://github.com/pytorch/pytorch/issues/113393

Another chapter in the story of Python's horrible handling of int <-> bool interactions.

```python
print(True and 1)  # 1
print(1 and True)  # True
print(True or 1)  # True
print(1 or True)  # 1
```
For sanity's sake, since we have defined more sane type promotion rules, let's use those and ensure `out_hint` conforms to `SymNode`'s `pytype`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113848
Approved by: https://github.com/ezyang
2023-11-17 11:28:55 +00:00
a5e4d4f25f [dynamo] promote skipfiles logging to verbose (#113916)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113916
Approved by: https://github.com/ezyang
ghstack dependencies: #111299, #111300, #113901
2023-11-17 10:00:44 +00:00
b62230a685 [dynamo] remove unused OptimizeCtx field - export (#113901)
This is only an internal API, so it's not really a BC breaking concern

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113901
Approved by: https://github.com/ezyang
ghstack dependencies: #111299, #111300
2023-11-17 10:00:44 +00:00
78318d0249 [dynamo] Cache size calc for differing config (#111300)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111300
Approved by: https://github.com/ezyang
ghstack dependencies: #111299
2023-11-17 09:59:58 +00:00
5927e9cbf2 [dynamo] guarded config (#111299)
---

Fixes: https://github.com/pytorch/pytorch/issues/110682

Replaces: https://github.com/pytorch/pytorch/pull/111074

The guards are installed based on the config that is valid at the call to `torch.compile`, rather than at any subsequent call / triggered compilation. Subsequent compilations will restore the saved config if the current global config does not match it.

TODO:
- [X] add tests

Follow up PRs:
- [x] add revised cache size computation (follow up PR: #111300 , based on: https://github.com/pytorch/pytorch/pull/107496)
- [ ] handle run-only mode?
- [ ] config restoration itself is not thread-safe (tracked: https://github.com/pytorch/pytorch/issues/111150)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111299
Approved by: https://github.com/ezyang
2023-11-17 09:59:58 +00:00
7731c97e06 Revert "Fix checking symbolic shapes inside torch._check (#113811)"
This reverts commit 7f224f6714419f3d56e64a66079340b0e914a2ca.

Reverted https://github.com/pytorch/pytorch/pull/113811 on behalf of https://github.com/jeanschmidt due to Breaking inductor tests on main ([comment](https://github.com/pytorch/pytorch/pull/113811#issuecomment-1816024288))
2023-11-17 09:29:45 +00:00
f27ab241a4 [dynamo] Fix UnspecializedNNModuleVariable's source (#113852)
Fixes https://github.com/pytorch/pytorch/issues/113041

In the case where we have an object represented as an UnspecializedNNModuleVariable, the source of an attribute on that object is `AttrSource(base=NotNNModuleSource(base=NNModuleSource(base=AttrSource(base=LocalSource(local_name='self', cell_or_freevar=False), member='seq'))), member='b')`. This causes dynamo to add an extra attribute as it doesn't go to this [`register_attr` step](eddce3c054/torch/_dynamo/variables/builder.py (L955-L962)).

However if we have an object represented as a UserDefinedObjectVariable, the source of an attribute on that object is `AttrSource(base=NNModuleSource(base=AttrSource(base=LocalSource(local_name='self', cell_or_freevar=False), member='seq')), member='b')`.

It seems that UnspecializedNNModuleVariables should behave in the same way as UserDefinedObjectVariables, but the sources in these two cases are different. So, I removed the part that changes the source in the UnspecializedNNModuleVariables, and it seems to work! And CI is green (+ reduced graph breaks).

```
   def test_unspecialized_nnmodule(self):
        class TestModule(torch.nn.Module):
            def __init__(self):
                super().__init__()
                self.a = torch.tensor(1.0)

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                return x + self.a

        def forward_hook(
            module: torch.nn.Module, inputs, output
        ) -> torch.Tensor:
            return 2 * output

        seq = torch.nn.Sequential(TestModule()).eval()
        seq.b = torch.tensor(2)
        handle = seq.register_forward_hook(forward_hook)

        class M(torch.nn.Module):
            def __init__(self):
                super().__init__()
                self.seq = seq

            def forward(self, x):
                # self.seq.b has source: AttrSource(base=NotNNModuleSource(base=NNModuleSource(base=AttrSource(base=LocalSource(local_name='self', cell_or_freevar=False), member='seq'))), member='b')
                return self.seq(x) + self.seq.b

        inp = (torch.randn(2, 8),)
        ep = export(M(), inp)
```
```
    def test_user_defined_var(self):
        class TestModule(torch.nn.Module):
            def __init__(self):
                super().__init__()
                self.a = torch.tensor(1.0)

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                return x + self.a

        class UserDefined:
            def __init__(self):
                self.test_module = TestModule()
                self.b = torch.tensor(2)

            def __call__(self, x):
                return self.test_module(x)

        class M(torch.nn.Module):
            def __init__(self):
                super().__init__()
                self.seq = UserDefined()

            def forward(self, x):
                # self.seq.b has source: AttrSource(base=NNModuleSource(base=AttrSource(base=LocalSource(local_name='self', cell_or_freevar=False), member='seq')), member='b')
                return self.seq(x) + self.seq.b

        inp = (torch.randn(2, 8),)
        ep = export(M(), inp)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113852
Approved by: https://github.com/yanboliang
2023-11-17 08:17:27 +00:00
7c38b76efe Make offsets dynamic by default (#113734)
Copied from @ezyang 's #113693.

The motivation for this change is that we'd like to guard on storage offset in inductor, to make assumptions about data alignment.

create_symbolic_sizes_strides_storage_offset() creates the sizes/strides/offset for fake tensors - they can either be integers or symints. This PR changes storage_offset to always be dynamic. In variables/builder.py, we remove a conditional so that all tensors get added to tracked_fakes. This is because the storage offset will be dynamic even if the other logic in builder.py suggests that it will be static; otherwise, we run into this issue:

1e260c851b/torch/fx/experimental/symbolic_shapes.py (L892-L895)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113734
Approved by: https://github.com/ezyang
2023-11-17 07:57:21 +00:00
c94fdebd3e [dynamo] chore: Fallback on const_handler instead of special-casing on ConstantVariable (#113893)
Fixes https://github.com/pytorch/pytorch/pull/113874#issuecomment-1815269686

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113893
Approved by: https://github.com/ezyang
2023-11-17 07:46:58 +00:00
c233cef8fd [dynamo] Enforce lifetime of output fx graph and its metadata (#113517)
Fixes https://github.com/pytorch/pytorch/issues/113516

Also asserts that by the time we modify the output's graph nodes, we are in the irreversible state of `should_exit`.

Remove `creation_timestamp` from graph as it is only consumed by dynamo for checkpoint restore.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113517
Approved by: https://github.com/ezyang
2023-11-17 07:34:43 +00:00
16da135550 More replacing assert with CUDA_KERNEL_ASSERT in kernels (#113563)
Fixes #103973

**Background:**
After https://github.com/pytorch/pytorch/pull/113098, users verified that torch.sum() worked in environments where PCIe atomics had been exposed as a problem for such operations.

**Goal:**
This extends the changes to other kernels where assert is called. The goal is the same: once the call sites consistently use CUDA_KERNEL_ASSERT, we can easily disable kernel assertions for those users.

**Test:**
We built wheels with these fixes for the users who had the PCIe atomics issue, and they verified that they can now run their workflows.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113563
Approved by: https://github.com/jeffdaily, https://github.com/ezyang
2023-11-17 07:28:00 +00:00
015fd2eb41 [NCCL PG] Add dumping flight recorder in the NCCL watchdog timeout (#113678)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113678
Approved by: https://github.com/XilunWu
ghstack dependencies: #113503
2023-11-17 07:00:41 +00:00
0ea126e834 add use_fake_all_gather and use_fake_reduce_scatter to FSDP for ablation studies (#113106)
Summary: As titled
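
Since the summary is terse, here is a hedged sketch of the presumed usage, inferred from the title; the exact placement of these flags on the FSDP constructor is an assumption, and a real run requires an initialized process group:

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

module = torch.nn.Linear(8, 8).cuda()
model = FSDP(
    module,
    use_fake_all_gather=True,      # assumed flag: skip real all-gather traffic
    use_fake_reduce_scatter=True,  # assumed flag: skip real reduce-scatter traffic
)
```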

Test Plan: Not needed because this is only for doing ablation studies

Reviewed By: awgu

Differential Revision: D50867908

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113106
Approved by: https://github.com/awgu
2023-11-17 05:43:30 +00:00
4979f9c0d7 [EASY] Support SymInt tracing on broadcast_shapes (#113877)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113877
Approved by: https://github.com/Skylion007
2023-11-17 04:43:57 +00:00
e8ee14292e Export _C in torch/__init__.py explicitly with from . import (#113887)
This is now required with mypy 1.7. See release blog post: https://mypy-lang.blogspot.com/2023/11/mypy-17-released.html under the heading "New Rules for Re-exports".

Under normal circumstances this isn't noticeable, but when the setting
```
implicit_reexport = false
```
is used in the mypy config file, then mypy can't find `torch._C` when only `torch` has been imported.
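
A minimal sketch of an explicit re-export that mypy 1.7 accepts even with `implicit_reexport = false` (the exact edit in torch/__init__.py may differ); inside a package's `__init__.py`:

```python
from . import _C as _C  # the "as X" form marks the import as an intentional re-export
```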
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113887
Approved by: https://github.com/Skylion007
2023-11-17 03:32:14 +00:00
7f224f6714 Fix checking symbolic shapes inside torch._check (#113811)
Fixes https://github.com/pytorch/pytorch/issues/110719#issuecomment-1768710678

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113811
Approved by: https://github.com/ezyang, https://github.com/peterbell10
2023-11-17 03:05:49 +00:00
237cbd5be6 BUG: trace frames with numpy scalar -> ndarray functions (#112959)
Fixes #112951

Make dynamo detect that `np.arange(3)` returns a FakeTensor, so the frame needs to be traced.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112959
Approved by: https://github.com/lezcano
2023-11-17 03:00:24 +00:00
99b89db174 [DTensor] Added op_call in no-mesh dispatch assert message (#113903)
This helps debug, e.g. when there is an unsupported op.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113903
Approved by: https://github.com/wanchaol
ghstack dependencies: #113654
2023-11-17 02:44:54 +00:00
0894981f6c [HigherOrderOp][BE] change _make_inlined check callable() (#113881)
A follow up of discussion #113814

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113881
Approved by: https://github.com/Skylion007
2023-11-17 02:44:12 +00:00
ae94c7e491 [dtensor] add foreach_zero_ support (#113897)
This PR adds foreach_zero_ op support, fixing the case where
optim.zero_grad(set_to_none=False) hits this op and errors out with a
"device mesh not found" issue.

Also moves the test to call zero_grad as the last step, as that's when we'll
have DTensors as grads.
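
An illustrative repro shape, assuming `model` holds DTensor parameters and `inp` is a matching input (requires an initialized device mesh, so this is not runnable standalone):

```python
import torch

opt = torch.optim.SGD(model.parameters(), lr=0.1)
model(inp).sum().backward()        # grads materialize as DTensors here
opt.zero_grad(set_to_none=False)   # dispatches aten._foreach_zero_ on DTensor grads
```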

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113897
Approved by: https://github.com/awgu
2023-11-17 02:11:19 +00:00
9916d8a9ea Add torch._lazy_clone to create COW tensors (#113397)
Part of #109833
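
A minimal sketch of the private API this adds:

```python
import torch

a = torch.randn(4)
b = torch._lazy_clone(a)   # copy-on-write clone: storage is shared with `a`
assert torch.equal(a, b)
b.add_(1.0)                # the first write materializes b's own storage
assert not torch.equal(a, b)
```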

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113397
Approved by: https://github.com/ezyang
ghstack dependencies: #113396
2023-11-17 01:58:51 +00:00
e2f090086b Add function to materialize COW storages (#113396)
Part of #109833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113396
Approved by: https://github.com/ezyang
2023-11-17 01:58:51 +00:00
a9134fa99a Skip cudagraphs when there is sparsity (#113791)
Fix for dlrm training

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113791
Approved by: https://github.com/Chillee
2023-11-17 01:36:03 +00:00
31459e3e56 [ONNX][dynamo_export] Add 'aten::rsub' type promotion (#113697)
The logic is the same as 'aten::sub'. Needed by llama2.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113697
Approved by: https://github.com/justinchuby, https://github.com/thiagocrepaldi
ghstack dependencies: #113404
2023-11-17 00:50:05 +00:00
b3308c4856 [FSDP][Docs] Omit "on CPU" (#113753)
This initialization can take place on CPU, GPU, or meta device and the current comment sort of implies users need to do it on CPU for this to work.


Pull Request resolved: https://github.com/pytorch/pytorch/pull/113753
Approved by: https://github.com/wz337
2023-11-17 00:15:41 +00:00
2ac33ad98a [dtensor] group dispatch unwrapping to a method (#113846)
This PR groups the dispatch unwrapping logic into a method, so that even
custom handlers can reuse many parts of the dispatch logic to do custom
things.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113846
Approved by: https://github.com/wz337
2023-11-16 23:54:18 +00:00
769f924bc6 robustify parametrize default name (#113856)
#113340 was reverted initially due to a bad default parametrization name. The test looked like

```python
@common_utils.parametrize(
    "type_fn",
    [
        type,
        lambda obj: obj.__class__,
    ],
)
def test_access_class_method_from_user_class(self, type_fn):
```

This is a valid parametrization, but results in these default test names:

```bash
❯ pytest test/dynamo/test_export.py -k test_access_class_method_from_user_class --co -q
test/dynamo/test_export.py::ExportTests::test_access_class_method_from_user_class_type_fn_<class 'type'>
test/dynamo/test_export.py::ExportTests::test_access_class_method_from_user_class_type_fn_<function ExportTests_<lambda> at 0x7f3be5de0c10>
```

Ignoring the whitespace in the test names, which can lead to other issues down the line, the problem in #113340 was that the lambda parameter included a memory address. IIUC, internally, the tests are not collected and run in the same process. Meaning, the address of the lambda and in turn the test name is no longer valid on the runner. This is fixed earlier in the stack by giving the parametrization an explicit name with `subtest`, but this PR is about preventing issues in the default case.

`pytest` solves this by simply using the name of the parameter plus its index as id in the test name:

```python
import pytest

class Foo:
    def __repr__(self):
        return str(id(self))

@pytest.mark.parametrize(
    "bar",
    [
        pytest.param(type),
        pytest.param(lambda obj: obj.__class__),
        pytest.param(Foo()),
    ],
)
def test_foo(bar):
    pass
```

```
❯ pytest main.py --co -q
main.py::test_foo[type]
main.py::test_foo[<lambda>]
main.py::test_foo[bar2]
```

`pytest` has better defaults for `type` and `lambda` than we do, but it has a safe default for custom objects.

This PR aligns our default test name with `pytest`. Using the parametrization from above again, we now collect

```bash
❯ pytest test/dynamo/test_export.py -k test_access_class_method_from_user_class --co -q
test/dynamo/test_export.py::ExportTests::test_access_class_method_from_user_class_type_fn0
test/dynamo/test_export.py::ExportTests::test_access_class_method_from_user_class_type_fn1
```

which might not be as expressive at first glance, but at least prevents bugs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113856
Approved by: https://github.com/malfet, https://github.com/huydhn
ghstack dependencies: #113855
2023-11-16 23:25:04 +00:00
03bebd90f6 cleanup test parametrization (#113855)
Cleanup from https://github.com/pytorch/pytorch/pull/113340#issuecomment-1814020469.

```
❯ pytest test/dynamo/test_export.py -k test_access_class_method_from_user_class --co -q
test/dynamo/test_export.py::ExportTests::test_access_class_method_from_user_class_attr
test/dynamo/test_export.py::ExportTests::test_access_class_method_from_user_class_builtin
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113855
Approved by: https://github.com/lezcano, https://github.com/huydhn
2023-11-16 23:25:04 +00:00
277229d0c6 [dynamo] Fix incorrectly casting SymNode to int when input is bool (#113871)
Fixes https://github.com/pytorch/pytorch/issues/113393, https://github.com/pytorch/pytorch/pull/113848#issuecomment-1814624510

Incorrectly casting the SymNode type will cause it to take the wrong path in symbolic_shapes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113871
Approved by: https://github.com/jansel
2023-11-16 23:24:57 +00:00
986634a117 Add Pass to move constructors from cpu to cuda (#109665)
Sometimes indexing tensors are constructed on CPU and then used to index a CUDA tensor. This prevents cudagraphs from being used when it doesn't need to. This adds a pass that moves constructors from CPU to CUDA when we can prove the downstream uses can be safely converted.

This PR allows us to cudagraph `clip` from the blueberries model, which improves perf from ~1.5x to ~4x.
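
A sketch of the pattern the pass targets (requires CUDA to run):

```python
import torch

x = torch.randn(8, device="cuda")
idx = torch.tensor([0, 2, 4])  # indexing tensor constructed on CPU...
y = x[idx]                     # ...then used to index a CUDA tensor
# The pass rewrites such constructors to device="cuda" when all downstream
# uses are provably safe, keeping the graph cudagraph-able.
```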

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109665
Approved by: https://github.com/ezyang, https://github.com/jansel
2023-11-16 23:19:57 +00:00
ec20c9044e [TD] Fix metric emission for split test files (#113789)
Fixes a bug in TD metrics generation where it wouldn't be able to find the rank and relevance that a heuristic gave a test run if that heuristic had divided that test into multiple test runs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113789
Approved by: https://github.com/clee2000
2023-11-16 23:19:40 +00:00
1480c670a0 [AOTI] Delay the fallback kernel naming decision to the codegen time (#113660)
Summary: This is to prepare for a later change that changes AOTI's second-pass to perform codegen only.

Differential Revision: [D51382677](https://our.internmc.facebook.com/intern/diff/D51382677)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113660
Approved by: https://github.com/chenyang78
2023-11-16 23:07:30 +00:00
bab41f44b8 [dynamo] Fix allow_in_graph decorator doesn't work on autograd.Function (#113510)
Fixes #111032

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113510
Approved by: https://github.com/zou3519
2023-11-16 22:44:46 +00:00
3f6e5e87f8 Revert "[1/N] Fixes clang-tidy warnings in header files (#113608)"
This reverts commit cab039fe9b9466f09f98318a11d2dcafef235426.

Reverted https://github.com/pytorch/pytorch/pull/113608 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing with an internal build when -Wpessimizing-move is used ([comment](https://github.com/pytorch/pytorch/pull/113608#issuecomment-1815424448))
2023-11-16 22:38:41 +00:00
d9f2cf9974 [BE]: Enable ruff rule PIE800 - unnecessary nested dict expansion (#113880)
Adds an additional list which removes unnecessary dict literal unpacking, also applies the fixes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113880
Approved by: https://github.com/albanD
2023-11-16 22:34:38 +00:00
bdf0b196db Quantize bias for conv2d quantized op during setup (#113582)
Summary: Quantize bias in setup step so that we do not incur additional time on quantizing bias in the first iteration.

Test Plan:
Ensure all vulkan quantize tests pass:
buck2 run --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_quantized_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1 --show-output
.....
[----------] Global test environment tear-down
[==========] 78 tests from 1 test suite ran. (1519 ms total)
[  PASSED  ] 78 tests.

  YOU HAVE 8 DISABLED TESTS

buck2 run --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1 --show-output

Running main() from third-party/googletest/1.11.0/googletest/googletest/src/gtest_main.cc
[==========] Running 395 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 395 tests from VulkanAPITest
[----------] 395 tests from VulkanAPITest (6515 ms total)
.....
[----------] Global test environment tear-down
[==========] 395 tests from 1 test suite ran. (6515 ms total)
[  PASSED  ] 394 tests.
[  SKIPPED ] 1 test, listed below:
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log

  YOU HAVE 5 DISABLED TESTS

Reviewed By: yipjustin, copyrightly

Differential Revision: D50997531

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113582
Approved by: https://github.com/yipjustin
2023-11-16 22:31:36 +00:00
e19ea53e1d Add optional torch.export.ExportGraphSignature to ONNXProgram (#113477)
When the ONNX model is exported from a torch.export.ExportedProgram, a
torch.export.ExportedGraphSignature is available with the specification
of the model inputs and outputs.

ExportedGraphSignature includes information such as the mapping between
the exported input/buffer/output ONNX name to the original pytorch input/buffer/output name.

It also specifies the kind of each input, such as user_input, parameter,
buffer, or constant_tensor. Output kinds can be user_output, loss_output,
buffer_mutation, etc.

Such information can be useful for understanding what the ONNX model expects
as inputs and what the output will look like when the ONNX input/output
schema differs from the original PyTorch one.

When the ONNX model is exported from a Callable or a regular
torch.nn.Module, such information is not available and
ONNXProgram.model_signature will yield None.
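
A hedged sketch of accessing the signature; this assumes `torch.onnx.dynamo_export` accepts the ExportedProgram directly, per this PR's framing, and API details may differ across versions:

```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

args = (torch.randn(2),)
ep = torch.export.export(M(), args)
onnx_program = torch.onnx.dynamo_export(ep, *args)
print(onnx_program.model_signature)  # ExportGraphSignature (None for a plain nn.Module export)
```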
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113477
Approved by: https://github.com/BowenBao
2023-11-16 22:04:44 +00:00
9a9232956f Include job name in the emitted metrics (#113884)
What it says in the title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113884
Approved by: https://github.com/clee2000
2023-11-16 21:26:49 +00:00
2530d47cbe [dynamo] re-add option to log all guard check fails (#113585)
Followup to https://github.com/pytorch/pytorch/pull/110325 - re-add the `report_all_guard_failures config` as a logging artifact `recompiles_verbose` with the following changes:
- evaluating the check must be wrapped with exception handling because subsequent code parts following the first failure may result in errors if evaluated (e.g. if a guard checks first for size, then tries to index - a guard failure due to insufficient size would result in an index error for the latter check).
- Adding a test for this case

Sample:
```python
import torch

def fn(x):
    return torch.rand(x[-1], len(x))

opt_fn = torch.compile(fn)
opt_fn([4, 5, 6])
opt_fn([7, 8])
opt_fn([9])
```

Output (with `TORCH_LOGS="recompiles_verbose"`):
```bash
[2023-11-15 16:13:26,741] torch._dynamo.guards.__recompiles_verbose: [DEBUG] Recompiling function fn in /data/users/williamwen/pytorch/playground5.py:15
[2023-11-15 16:13:26,741] torch._dynamo.guards.__recompiles_verbose: [DEBUG]     triggered by the following guard failure(s):
[2023-11-15 16:13:26,741] torch._dynamo.guards.__recompiles_verbose: [DEBUG]     guard 0 failures:
[2023-11-15 16:13:26,741] torch._dynamo.guards.__recompiles_verbose: [DEBUG]     - len(L['x']) == 3
[2023-11-15 16:13:26,741] torch._dynamo.guards.__recompiles_verbose: [DEBUG]     - L['x'][0] == 4
[2023-11-15 16:13:26,741] torch._dynamo.guards.__recompiles_verbose: [DEBUG]     - L['x'][1] == 5
[2023-11-15 16:13:26,970] torch._dynamo.guards.__recompiles_verbose: [DEBUG] Recompiling function fn in /data/users/williamwen/pytorch/playground5.py:15
[2023-11-15 16:13:26,970] torch._dynamo.guards.__recompiles_verbose: [DEBUG]     triggered by the following guard failure(s):
[2023-11-15 16:13:26,970] torch._dynamo.guards.__recompiles_verbose: [DEBUG]     guard 0 failures:
[2023-11-15 16:13:26,970] torch._dynamo.guards.__recompiles_verbose: [DEBUG]     - len(L['x']) == 2
[2023-11-15 16:13:26,970] torch._dynamo.guards.__recompiles_verbose: [DEBUG]
[2023-11-15 16:13:26,970] torch._dynamo.guards.__recompiles_verbose: [DEBUG]     guard 1 failures:
[2023-11-15 16:13:26,970] torch._dynamo.guards.__recompiles_verbose: [DEBUG]     - len(L['x']) == 3
[2023-11-15 16:13:26,970] torch._dynamo.guards.__recompiles_verbose: [DEBUG]     - L['x'][0] == 4
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113585
Approved by: https://github.com/jon-chuang, https://github.com/ezyang
2023-11-16 21:20:29 +00:00
40dfabf970 Revert "[export] make aot_export_module uses dynamo's fake_mode (#113681)"
This reverts commit 094beca0c6ebc2ac7d70c5badc271a1663e05de6.

Reverted https://github.com/pytorch/pytorch/pull/113681 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing an internal ExecuTorch test ([comment](https://github.com/pytorch/pytorch/pull/113681#issuecomment-1815329750))
2023-11-16 21:20:02 +00:00
2abb04d1dc [inductor] Relax symbolic guard for sizevars.evaluate_min (#113841)
We shorten two conditional guards (guard_equals, guard_lt)
into a single one (guard_leq). This saves recompilation for the
access-the-last-element-of-the-tensor op. [test_torchinductor.test_setitem_with_int_parameter](8efa6ad1fc/test/inductor/test_torchinductor.py (L6896C1-L6902))
will become `frame_count = 2 if torch._dynamo.config.assume_static_by_default else 1`.

Test plan:
`python test/inductor/test_torchinductor.py -k test_setitem_with_int_parameter_cpu`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113841
Approved by: https://github.com/peterbell10, https://github.com/aakhundov
2023-11-16 21:16:50 +00:00
98df3088c3 Revert "Make offsets dynamic by default (#113734)"
This reverts commit 9efbb4ea73009950a2d99e4d871351c898aae0dd.

Reverted https://github.com/pytorch/pytorch/pull/113734 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is causing a memory leak in one of the test 9efbb4ea73 ([comment](https://github.com/pytorch/pytorch/pull/113734#issuecomment-1815297222))
2023-11-16 20:56:27 +00:00
3c4e4d9947 Revert "[quant][pt2e] Refactor insert observer to do sharing checking in the same place (#113458)"
This reverts commit 585e315b3afc962bda4449957dc0d25eca3e4d4e.

Reverted https://github.com/pytorch/pytorch/pull/113458 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing executorch export test for llama2 ([comment](https://github.com/pytorch/pytorch/pull/113458#issuecomment-1815280715))
2023-11-16 20:43:38 +00:00
de4fd3843c [Inductor][fx pass] Fix a bug in the merge getitem cat pattern (#113822)
Summary: The split-cat pattern in D50100667 may change the sliced node returned by the split node if the getitems to be merged are not at consecutive indices.

Test Plan:
```
buck2 test 'fbcode//mode/opt' fbcode//pytorch/benchmark/fb/test_gpu:run_test_gpu -- --exact 'pytorch/benchmark/fb/test_gpu:run_test_gpu - test_train_mimo_cmf_30x_inductor_accuracy (pytorch.benchmark.fb.test_gpu.test_gpu.TestBenchmarkFbGpu)' --run-disabled
```
Buck UI: https://www.internalfb.com/buck2/1fd8fa6a-83d1-4cfd-bf33-c7ddb28de5b5
Test UI: https://www.internalfb.com/intern/testinfra/testrun/6473924659080211
Network: Up: 1.3GiB  Down: 48MiB  (reSessionID-acaa2760-abff-442e-989f-3eefd1d1e034)
Jobs completed: 75. Time elapsed: 18:37.5s.
Cache hits: 0%. Commands: 68 (cached: 0, remote: 0, local: 68)
Tests finished: Pass 1. Fail 0. Fatal 0. Skip 0. Build failure 0

```
buck2 test 'fbcode//mode/opt' fbcode//pytorch/benchmark/fb/test_gpu:run_test_gpu -- --exact 'pytorch/benchmark/fb/test_gpu:run_test_gpu - test_train_mimo_cmf_30x_inductor_speedup (pytorch.benchmark.fb.test_gpu.test_gpu.TestBenchmarkFbGpu)'
```
Buck UI: https://www.internalfb.com/buck2/7de122c6-23e0-4f13-b2b4-934cf780b60b
Test UI: https://www.internalfb.com/intern/testinfra/testrun/16888498613412388
Network: Up: 90KiB  Down: 2.1MiB  (reSessionID-f75d6b7b-93ea-4d47-a52a-8d2429b30ad1)
Jobs completed: 6. Time elapsed: 17:28.0s.
Tests finished: Pass 1. Fail 0. Fatal 0. Skip 0. Build failure 0

Differential Revision: D51378532

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113822
Approved by: https://github.com/xuzhao9
2023-11-16 20:40:03 +00:00
8dc4b12fa7 [Pytorch][Vulkan] refactor layer_norm (#113676)
Summary: Due to the implementation of `native_layer_norm`, we can simplify the implementation of `layer_norm` by just invoking `native_layer_norm`.

Test Plan:
```
[luwei@devbig984.prn1 /data/users/luwei/fbsource (7f66eb77b)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run fbcode/mode/dev-nosan //xplat/caffe2:pt_vulkan_api_test_bin -- --gtest_filter="*layer_norm*"
Building: finished in 0.1 sec (100%) 339/339 jobs, 0/339 updated
  Total time: 0.2 sec
BUILD SUCCEEDED
Running main() from third-party/googletest/1.14.0/googletest/googletest/src/gtest_main.cc
Note: Google Test filter = *layer_norm*
[==========] Running 7 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 7 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.layer_norm_invalid_inputs
[       OK ] VulkanAPITest.layer_norm_invalid_inputs (69 ms)
[ RUN      ] VulkanAPITest.layer_norm_2d
[       OK ] VulkanAPITest.layer_norm_2d (292 ms)
[ RUN      ] VulkanAPITest.layer_norm_3d
[       OK ] VulkanAPITest.layer_norm_3d (289 ms)
[ RUN      ] VulkanAPITest.layer_norm_4d
[       OK ] VulkanAPITest.layer_norm_4d (4 ms)
[ RUN      ] VulkanAPITest.native_layer_norm_2d
[       OK ] VulkanAPITest.native_layer_norm_2d (5 ms)
[ RUN      ] VulkanAPITest.native_layer_norm_3d
[       OK ] VulkanAPITest.native_layer_norm_3d (2 ms)
[ RUN      ] VulkanAPITest.native_layer_norm_4d
[       OK ] VulkanAPITest.native_layer_norm_4d (4 ms)
[----------] 7 tests from VulkanAPITest (667 ms total)

[----------] Global test environment tear-down
[==========] 7 tests from 1 test suite ran. (667 ms total)
[  PASSED  ] 7 tests.
```

Reviewed By: yipjustin

Differential Revision: D51297971

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113676
Approved by: https://github.com/yipjustin
2023-11-16 20:39:58 +00:00
0d6d97d956 Relax constraints on test_cast_round_trip (#113872)
Results of floating-point operations can be affected by execution order, and the compiler is not guaranteed to make trivial optimizations when compiling in debug mode, which might result in a loss of precision.

Fixes https://github.com/pytorch/pytorch/issues/113829

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113872
Approved by: https://github.com/Skylion007, https://github.com/huydhn
2023-11-16 19:52:05 +00:00
c4c45ab9b5 Fix resize matrix_power.out dynamic shapes (#113695)
Fixes https://github.com/pytorch/pytorch/issues/113003

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113695
Approved by: https://github.com/bdhirsh, https://github.com/lezcano
2023-11-16 19:36:27 +00:00
8a183bf1ab [BE] Consistently query tracing context for fake mode in Dynamo (#113768)
Split from https://github.com/pytorch/pytorch/pull/113666

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113768
Approved by: https://github.com/bdhirsh
2023-11-16 19:31:10 +00:00
3a3a979984 Add torch.distributed.breakpoint (#113775)
I tested it works by patching

```
diff --git a/test/distributed/test_dynamo_distributed.py b/test/distributed/test_dynamo_distributed.py
index 96b3a82bdfa..dea9bac9302 100644
--- a/test/distributed/test_dynamo_distributed.py
+++ b/test/distributed/test_dynamo_distributed.py
@@ -18,6 +18,7 @@ from torch._dynamo import config
 from torch._dynamo.utils import same
 from torch._dynamo.testing import collect_results
 from torch.utils._triton import has_triton
+import torch.distributed as dist
 from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy, lambda_auto_wrap_policy
 from torch._higher_order_ops.wrap import tag_activation_checkpoint
 from torch.nn.parallel import DistributedDataParallel as DDP
@@ -398,6 +399,7 @@ class TestMultiProc(DynamoDistributedMultiProcTestCase):
     @unittest.skipIf(not has_triton(), "Inductor+gpu needs triton and recent GPU arch")
     def test_fsdp_activation_checkpointing(self):
         with _dynamo_dist_per_rank_init(self.rank, self.world_size):
+            dist.breakpoint()
             model, inputs = get_toy_model_for_activation_checkpointing(f"cuda:{self.rank}")
             is_inner = lambda module: isinstance(module, ToyInnerModel)  # noqa: E731
             wrap_policy = functools.partial(lambda_auto_wrap_policy, lambda_fn=is_inner)
```

and then running `python test/distributed/test_dynamo_distributed.py -k test_fsdp_activation_checkpointing`

It prints:

```
ATTENTION!!!

Type 'up' to get to the frame that called dist.breakpoint(rank=0)

> /data/users/ezyang/c/pytorch/torch/distributed/__init__.py(71)breakpoint()
-> barrier()
(Pdb) up
> /data/users/ezyang/c/pytorch/test/distributed/test_dynamo_distributed.py(402)test_fsdp_activation_checkpointing()
-> dist.breakpoint()
(Pdb) list
397
398         @skip_if_lt_x_gpu(1)
399         @unittest.skipIf(not has_triton(), "Inductor+gpu needs triton and recent GPU arch")
400         def test_fsdp_activation_checkpointing(self):
401             with _dynamo_dist_per_rank_init(self.rank, self.world_size):
402  ->             dist.breakpoint()
403                 model, inputs = get_toy_model_for_activation_checkpointing(f"cuda:{self.rank}")
404                 is_inner = lambda module: isinstance(module, ToyInnerModel)  # noqa: E731
405                 wrap_policy = functools.partial(lambda_auto_wrap_policy, lambda_fn=is_inner)
406                 model = apply_fsdp_with_checkpointing(model, wrap_policy, is_inner)
407                 correct_outputs = model(inputs)
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113775
Approved by: https://github.com/wconstab, https://github.com/wanchaol
2023-11-16 19:30:57 +00:00
eddce3c054 [AOTInductor] Rename model_runner to model_container_runner (#111324)
Summary:
We rename model_runner to model_container_runner to prepare for
adding tests of a pure model without a container.

Test Plan:
commit itself is a test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111324
Approved by: https://github.com/desertfire, https://github.com/chenyang78
2023-11-16 19:14:22 +00:00
1d96034816 [BE][easy] Simplify the registration of a few metafunctions (#113635)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113635
Approved by: https://github.com/Skylion007
ghstack dependencies: #113634, #113674
2023-11-16 19:09:12 +00:00
ef982418df Add OpInfo test that tests meta functions binary ufuncs with different dtypes (#113674)
As per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113674
Approved by: https://github.com/peterbell10
ghstack dependencies: #113634
2023-11-16 19:09:12 +00:00
9b3e694f5d Fix metafunction for many pointwise operations (#113634)
The previous metafunction was completely broken.
It incorrectly used a metafunction that was designed for prims. It also
passed in an incorrect enum class for the type promotion.

Fixes https://github.com/pytorch/pytorch/issues/113119

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113634
Approved by: https://github.com/peterbell10
2023-11-16 19:09:12 +00:00
3e3c6cc05e Do not error when printing view created in no-grad modified in-place in no-grad (#113716)
Fixes https://github.com/pytorch/pytorch/issues/99968
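
A minimal repro sketch inferred from the title (exact conditions are per issue #99968):

```python
import torch

a = torch.randn(3, requires_grad=True)
with torch.no_grad():
    v = a[:2]   # view created in no-grad mode
    v.mul_(2)   # modified in place, still in no-grad mode
print(v)        # previously raised when formatting the tensor; now prints
```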

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113716
Approved by: https://github.com/albanD
2023-11-16 18:57:56 +00:00
6cdb6234d6 [ROCm] Supports ROCm6.0 reorganization and cleanup (#111486)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111486
Approved by: https://github.com/jithunnair-amd, https://github.com/pruthvistony, https://github.com/malfet
2023-11-16 18:37:12 +00:00
070b2d3cff cholesky_solve_backward: speed up using output_mask (#112981)
Introduces a faster path for `cholesky_solve_backward` when the gradient with respect to the cholesky factor isn't required.

Adds test coverage in `test_linalg.py`.

# Example

## Setup

```py
import torch
torch.set_num_threads(1)
mat = torch.randn(500, 1000)
mat = mat @ mat.T
L = torch.linalg.cholesky(mat, upper=False)

rhs = torch.randn(500, 1)
rhs.requires_grad = True

sol = torch.cholesky_solve(rhs, L, upper=False).sum(dim=0)
```

## Before
```
%timeit torch.autograd.grad(sol, rhs, retain_graph=True)
2.61 ms ± 18.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

## After
```
%timeit torch.autograd.grad(sol, rhs, retain_graph=True)
109 µs ± 3.42 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112981
Approved by: https://github.com/lezcano
2023-11-16 18:30:57 +00:00
25fb88cf23 Add all 3.12 binary build for wheel. Let's see how it goes. V2 (#112882)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112882
Approved by: https://github.com/malfet, https://github.com/sammcj
2023-11-16 18:20:12 +00:00
275403be16 [doc] Add nn.parametrizations.weight_norm (#113783)
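For reference, basic usage of the parametrization being documented:

```python
import torch
from torch.nn.utils.parametrizations import weight_norm

layer = weight_norm(torch.nn.Linear(20, 40), name="weight", dim=0)
print(layer.weight.shape)  # torch.Size([40, 20]), recomputed from g and v on access
```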
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113783
Approved by: https://github.com/albanD
2023-11-16 17:42:48 +00:00
62d86f27c2 Revert "Add Pass to move constructors from cpu to cuda (#109665)"
This reverts commit 3bac94b107bb808b158b005d248804895d844d40.

Reverted https://github.com/pytorch/pytorch/pull/109665 on behalf of https://github.com/eellison due to want to maek one last change ([comment](https://github.com/pytorch/pytorch/pull/109665#issuecomment-1814924579))
2023-11-16 17:39:49 +00:00
3bac94b107 Add Pass to move constructors from cpu to cuda (#109665)
Sometimes indexing tensors are constructed on CPU and then used to index a CUDA tensor. This prevents cudagraphs from being used when it doesn't need to. This adds a pass that moves constructors from CPU to CUDA when we can prove the downstream uses can be safely converted.

This PR allows us to cudagraph `clip` from the blueberries model, which improves perf from ~1.5x to ~4x.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109665
Approved by: https://github.com/ezyang, https://github.com/jansel
2023-11-16 17:28:46 +00:00
7183926622 [HigherOrderOp][BE] consolidate UserFunctionVariable.call_function pattern to _make_inlined (#113814)
We saw some use cases in higher-order operators that try to directly inline a user-level function (e.g. pytree.tree_flatten and pytree.tree_unflatten) with no tensor operations by manually constructing a UserFunctionVariable and running call_function on it.

This PR consolidates this pattern a bit by adding a _make_inlined helper function to improve the UX (i.e. the calling convention is kept the same as that of the function we'd like to inline), reduce redundancy, and increase readability.

Test Plan:
Existing tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113814
Approved by: https://github.com/yanboliang
2023-11-16 16:56:24 +00:00
d19cef34fb Do not attempt to compile unwind.cpp on aarch64 (#113782)
Summary:
As almost the entire unwinding logic is built around the x86_64 ABI.
In essence, this reverts https://github.com/pytorch/pytorch/pull/104707 and adds `#ifndef FBCODE_CAFFE2` guards around `symbolize` dummy

Use nested namespaces, as PyTorch is finally C++ compatible.
Remove extraneous semicolon spotted by clang-tidy.

Fixes https://github.com/pytorch/pytorch/issues/113208

Test Plan: CI + `buck2 build fbcode//mode/opt fbcode//caffe2/torch/fb/model_transform/fx2trt/packaging:generate_merge_net_file -c fbcode.arch=aarch64`

Differential Revision: D51358469

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113782
Approved by: https://github.com/aaronenyeshi
2023-11-16 16:08:47 +00:00
f9bf104c64 [2/N] Fixes clang-tidy warnings in header files (#113727)
This PR fixes more clang-tidy warnings in common headers.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113727
Approved by: https://github.com/Skylion007
2023-11-16 13:21:15 +00:00
ecf129565b Avoid adding to lazy device cache if cache size is 0 (#113710)
Fixes #113672

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113710
Approved by: https://github.com/antoniojkim, https://github.com/alanwaketan, https://github.com/desertfire
2023-11-16 12:45:34 +00:00
51cbe780cb [pytorch-vulkan][1/n] Enable Packing for Vulkan Tensors (#113627)
Summary:
The new implementation of mat-mul missed a critical step that does width-packing on GPU: see T169764697.

The existing implementation of mat-mul also missed a case: when the "B" matrix is already in Vulkan, it fails to do packing, leading to wrong results. (I have added a disabled unittest to reflect the issue.)

We will take multiple steps to enable (width / height) packing and transformations between different packings on Vulkan.

This is a first diff that enables a critical toolset: it allows developers to fetch values from the underlying tensor, making it possible to implement tests for the transformation shaders.

Test Plan:
P882053410,
P882053410

Differential Revision: D51291737

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113627
Approved by: https://github.com/SS-JIA
2023-11-16 09:04:07 +00:00
5fb1d8f18a [NCCL PG] Enable storing nccl traces into storage and make it configurable (#113503)
This PR enables storing the NCCL flight recorder to storage and makes it configurable by letting users register their own way of storing the debug info. We will then provide users a script to parse and process the dumped blobs offline.

One thing this PR is not trying to resolve is deciding where to dump the debug info; I will send a follow-up PR to address that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113503
Approved by: https://github.com/zdevito
2023-11-16 07:44:15 +00:00
c1c4882367 [aps] Sync thrift (#113810)
Summary:
Based on discussions with Sherlock + Zhengxu in D51118067, updated the internal thrift schema to match the OSS schema.

Verifier failures:
* Test contains a None as input, resulting in no meta["val"]
* Test contains torch.autograd.grad_mode.set_grad_enabled as an op, which also results in no meta["val"]
* torch.autograd.grad_mode.set_grad_enabled is also not a valid op
* Test adds a "parameter" to the state dict but the parameter is not an nn.Parameter, causing an assertion failure

So to bypass these failures I did the following hacks(?):
* Before creating the exported program in deserialization, populate nodes w/o meta["val"] with meta["val"] = None
* Add torch.autograd.grad_mode.set_grad_enabled to the skip opset
* Duplicated ExportGraphSignature into aot_export.py so that the graph signature checks will be skipped

Configerator changes in D51343615

Test Plan: CI

Reviewed By: zhxchen17

Differential Revision: D51342921

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113810
Approved by: https://github.com/zhxchen17
2023-11-16 07:42:30 +00:00
8033f65c0b Don't toggle torch logger to NOTSET if it is not set; always use pre-existing (#113842)
This is kind of hard to test.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113842
Approved by: https://github.com/wanchaol
2023-11-16 07:06:05 +00:00
9efbb4ea73 Make offsets dynamic by default (#113734)
Copied from @ezyang 's #113693.

The motivation for this change is that we'd like to guard on storage offset in inductor, to make assumptions about data alignment.

create_symbolic_sizes_strides_storage_offset() creates the sizes/strides/offset for fake tensors - they can either be integers or symints. This PR changes storage_offset to always be dynamic. In variables/builder.py, we remove a conditional so that all tensors get added to tracked_fakes. This is because the storage offset will be dynamic even if the other logic in builder.py suggests that it will be static; otherwise, we run into this issue:

1e260c851b/torch/fx/experimental/symbolic_shapes.py (L892-L895)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113734
Approved by: https://github.com/ezyang
2023-11-16 06:49:09 +00:00
b612e27221 [Easy] Fix typo in TagActivationCheckpoint comment (#113818)
As titled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113818
Approved by: https://github.com/Chillee, https://github.com/bdhirsh
2023-11-16 06:06:09 +00:00
cffea773e3 Fix bsr_dense_mm with a non-contiguous out argument. (#113801)
Fixes https://github.com/pytorch/pytorch/issues/113754

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113801
Approved by: https://github.com/cpuhrsch
2023-11-16 05:56:17 +00:00
0a9dbbbaad Make _inductor/fx_utils.py, _dynamo/utils.py pass follow_imports typechecking (#113722)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113722
Approved by: https://github.com/lezcano
2023-11-16 05:44:15 +00:00
bbd73c746e Revert "[ONNX][dynamo_export] Add 'aten::rsub' type promotion (#113697)"
This reverts commit 48800e9bb0fd0d8aa56f961fe207b1040922fa2e.

Reverted https://github.com/pytorch/pytorch/pull/113697 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing in trunk 48800e9bb0.  The failure on the PR is legit https://github.com/pytorch/pytorch/actions/runs/6884783862/job/18728219414, let me take a look on why Dr.CI marks it as flaky ([comment](https://github.com/pytorch/pytorch/pull/113697#issuecomment-1813790907))
2023-11-16 04:59:32 +00:00
8241fe6edb [quant][pt2][be] Rewrite QAT annotations using subgraph matcher (#113709)
Summary: This is the recommended way to write quantizers according
to https://pytorch.org/tutorials/prototype/pt2e_quantizer.html#a-note-on-ir-for-pt2e-quantization-flow.
It is agnostic to changes in the aten IR and can be easily extended
to support conv1d-bn and conv3d-bn fusion patterns in the future.
This is the first step towards rewriting XNNPACKQuantizer using
this subgraph matcher.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn2d

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar

Differential Revision: [D51366525](https://our.internmc.facebook.com/intern/diff/D51366525)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113709
Approved by: https://github.com/jerryzh168
2023-11-16 03:57:37 +00:00
8efa6ad1fc [vision hash update] update the pinned vision hash (#113821)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113821
Approved by: https://github.com/pytorchbot
2023-11-16 03:36:29 +00:00
48800e9bb0 [ONNX][dynamo_export] Add 'aten::rsub' type promotion (#113697)
The logic is the same as 'aten::sub'. Needed by llama2.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113697
Approved by: https://github.com/justinchuby, https://github.com/thiagocrepaldi
ghstack dependencies: #113404
2023-11-16 03:31:07 +00:00
670311190d [HigherOrderOp] Move _map.py to _higher_order_ops (#111152)
Differential Revision: [D50332159](https://our.internmc.facebook.com/intern/diff/D50332159)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111152
Approved by: https://github.com/zou3519
2023-11-16 03:04:12 +00:00
1364f84b42 [easy] encapsulate fb changes from OSS (#113677)
Summary:
encapsulate fb changes into `torch._inductor.fx_passes.fb`, so that adding new passes (`fb.xxx`) won't need to touch OSS code like so:

```
# in torch/_inductor/fx_passes/pre_grad.py
if config.is_fbcode():
 from .fb import xxx  # every new fb/xxx.py would have needed this change in OSS code base
```

Test Plan: CI

Differential Revision: D51315193

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113677
Approved by: https://github.com/khabinov, https://github.com/chenyang78
2023-11-16 03:03:57 +00:00
cebad9867b graph break on intermediate leaves that require grad (#113277)
Fixes https://github.com/pytorch/pytorch/issues/90552. This is a simpler fix that just detects the situation where AOTAutograd can't create a proper backward graph, and graph-breaks. This was technically a silent correctness issue before.

This PR tries to always graph break when we see a factory function that returns a tensor requiring grad. I check this by seeing if the op returned a `TensorVariable` in dynamo, and if one of the input arguments was a `requires_grad=True` kwarg. I think this is high-fidelity enough, and I'm also hoping that this is uncommon enough that a graph break is reasonable here.

The fix to avoid the graph break in user land is also pretty easy - just instantiate your tensor outside of the compiled region and plumb it in.
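
A hedged before/after sketch of the user-land workaround (names are illustrative):

```python
import torch

# graph-breaks: a factory call inside the compiled region returns a leaf
# tensor that requires grad
@torch.compile
def f_breaks(x):
    w = torch.ones(3, requires_grad=True)
    return (x * w).sum()

# workaround: create the leaf outside and plumb it in
w = torch.ones(3, requires_grad=True)

@torch.compile
def f_ok(x, w):
    return (x * w).sum()

print(f_ok(torch.randn(3), w))
```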

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113277
Approved by: https://github.com/eellison
ghstack dependencies: #113267, #113416, #113584
2023-11-16 02:47:45 +00:00
c5f26a409a Build and test ExecuTorch on PyTorch (#113364)
This is the first part to start build and test ExecuTorch on PyTorch using a pinned commit.  There will be another PR later to update the pinned commit periodically.

* The pinned commit is in `.ci/docker/ci_commit_pins/executorch.txt` as part of PT Docker image
* I added one simple test: `source .ci/scripts/test.sh mv3 cmake xnnpack-quantization-delegation ''`. More could be added later; in fact, any ET test on Linux could be run here
* Building and installing vision and audio need to be done in CI after building PyTorch because they would be broken otherwise

Next steps, in sequence:

* [ ] Update this pinned commit periodically, similar to https://github.com/pytorch/pytorch/pull/113499
* [ ] Increase ET coverage on PT CI, ideally, we should run all ET pull jobs?
* [ ] Switch ExecuTorch's torch, vision, and audio nightly pins to commit pins
* [ ] Update ExecuTorch's torch, vision, and audio commit pins periodically
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113364
Approved by: https://github.com/ZainRizvi, https://github.com/malfet, https://github.com/guangy10
2023-11-16 02:19:58 +00:00
c41a32a3bf Move test_utils.py back to MYPY (#113745)
Since MYPYNOFOLLOW is about to turn on import following, there's no
reason to keep test_utils.py in the MYPYNOFOLLOW config. Moreover, I'm
not sure it still takes 10 minutes to typecheck this file; adding it to
the MYPY config takes `lintrunner --take MYPY --all-files` from 53s to
57s on my machine, which is substantial but not horrible. I guess we'll
see how it fares on CI.

(Note that we cannot simply merge MYPY and MYPYNOFOLLOW because the
latter config turns on `disallow_any_generics` and so is in that sense
stricter than the MYPY config.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113745
Approved by: https://github.com/clee2000
2023-11-16 01:57:58 +00:00
a3b859fc67 Drop dynamo-specific type hints on Tensor in favor of type-ignores (#113720)
Per [this][1] discussion, plus some offline discussion. The summary:
@albanD considers the core PyTorch types like Tensor to be extremely
brittle, and does not think the risk of adding these typed attributes to
be worth it.

@eellison mentioned that we could use `WeakTensorKeyDictionary` instead.
However, based on the sparse usage of these bonus attributes, I think
that would be overkill. So I've opted to go with a few more type-ignore
comments instead.
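
For context, a minimal sketch of the `WeakTensorKeyDictionary` alternative mentioned above (usage here is illustrative, not from the PR):

```python
import torch
from torch.utils.weak import WeakTensorKeyDictionary

# side table keyed by tensor identity instead of attributes on Tensor
extra_attrs = WeakTensorKeyDictionary()

t = torch.randn(2)
extra_attrs[t] = {"tangent": None}
print(t in extra_attrs)  # True

del t  # the entry is dropped once the tensor is garbage collected
```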

[1]: https://github.com/pytorch/pytorch/pull/113610#discussion_r1392907367

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113720
Approved by: https://github.com/ezyang, https://github.com/albanD, https://github.com/eellison
ghstack dependencies: #113534, #113610
2023-11-16 01:54:00 +00:00
605d274300 [dynamo] Make {mutation_guard,symbolic_convert,side_effects}.py pass follow_imports typechecking (#113610)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113610
Approved by: https://github.com/ezyang
ghstack dependencies: #113534
2023-11-16 01:54:00 +00:00
df9acc61fb [inductor] Make {freezing,ir}.py pass follow-imports typechecking (#113534)
I used a couple of type-ignore comments in ir.py because it constructs
short-lived instances of FixedLayout and GraphModuleSerializer, just to
call a single method on them that doesn't use all their members. Making
those unused members optional would make the rest of the code a lot
messier with sprinkled `assert` statements.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113534
Approved by: https://github.com/albanD
2023-11-16 01:53:52 +00:00
d52b9ba6a8 [torch.compile + selective checkpoint] Attach context_fn to the checkpointed graph module, fixing flaky tests (#112672)
The torch.compile + SAC unit test was causing adjacent unit tests to be flaky due to its modification of a shared singleton object. This PR attaches the checkpoint context fn to the checkpointed GraphModule and looks it up during execution, avoiding the need to make the higher-order op stateful.

Specifically, we attach the `context_fn` to the checkpointed GraphModule. These two will be gc'ed at the same time, so it satisfies the lifetime requirement.
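
For reference, the user-facing shape of selective checkpointing's `context_fn` (a minimal sketch; the no-op contexts are placeholders for a real save/recompute policy):

```python
import contextlib
import torch
from torch.utils.checkpoint import checkpoint

def context_fn():
    # returns (forward_context, recompute_context); real policies decide
    # which ops to save vs. recompute inside these contexts
    return contextlib.nullcontext(), contextlib.nullcontext()

def block(x):
    return torch.relu(x @ x)

x = torch.randn(4, 4, requires_grad=True)
out = checkpoint(block, x, use_reentrant=False, context_fn=context_fn)
out.sum().backward()
```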

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112672
Approved by: https://github.com/wanchaol
2023-11-16 01:34:52 +00:00
b526aae95a test_lazy: skip HashTest.Scalar (#112747)
Fixes #99883

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112747
Approved by: https://github.com/huydhn
2023-11-16 01:22:58 +00:00
72ce5dd13e [2D] Remove enable_2d_with_fsdp() API and make remove_enable_2d_with_fsdp private (#112473)
As we have our new 2D flow out, we want to remove `enable_2d_with_fsdp()`.
In addition, we change pre_dp_module_transform to private, as we may need to change the UX later on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112473
Approved by: https://github.com/fegin, https://github.com/wanchaol
2023-11-16 01:14:00 +00:00
c2c22dc427 [BE] Some debug logging for track_symint in produce_guards (#113774)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113774
Approved by: https://github.com/Skylion007, https://github.com/bdhirsh
2023-11-16 01:02:43 +00:00
bd6b3c4df4 [BE][profiler] add test for EventList (#113764)
EventList isn't really tested in CI because it seems to be used only when Kineto is not available.

Add a basic sanity test that would have caught #113756
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113764
Approved by: https://github.com/malfet
2023-11-16 00:49:29 +00:00
f8eb46d623 index put device error checking (#113729)
Fix for https://github.com/pytorch/pytorch/issues/101371

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113729
Approved by: https://github.com/bdhirsh
2023-11-16 00:39:04 +00:00
1e260c851b [ez] Don't retry onnx in shell (#113803)
Is this important? Not really, but the retries given by run_test.py on its own should be enough.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113803
Approved by: https://github.com/BowenBao
2023-11-15 23:45:25 +00:00
5d170fce29 Revert "Support tensors as Dict keys (#111196)"
This reverts commit b0805fa5d0f73f3419129b1606a3e9a58eed2768.

Reverted https://github.com/pytorch/pytorch/pull/111196 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing internally. I will provide the details there ([comment](https://github.com/pytorch/pytorch/pull/111196#issuecomment-1813410149))
2023-11-15 23:08:00 +00:00
463489ec95 [ez] Add some more pyre related files to gitignore (#113796)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113796
Approved by: https://github.com/huydhn
2023-11-15 23:07:39 +00:00
7137f5f8c3 Revert "[easy]Remove specialized value (#112252)"
This reverts commit 149b9dfd04ba7dee88168758bf7a5c603dd79d72.

Reverted https://github.com/pytorch/pytorch/pull/112252 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but https://github.com/pytorch/pytorch/pull/111196 is failing internally. I will provide the details there ([comment](https://github.com/pytorch/pytorch/pull/112252#issuecomment-1813401896))
2023-11-15 23:02:49 +00:00
c99d88afa4 [AOTI] Remove try_find_schema (#113617)
Differential Revision: [D51350727](https://our.internmc.facebook.com/intern/diff/D51350727)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113617
Approved by: https://github.com/aakhundov, https://github.com/chenyang78, https://github.com/khabinov
2023-11-15 22:42:47 +00:00
b19cf868e8 Back out "Support fp8 in AOTInductor + support optional<> in C ABI (#112527)" (#113747)
Test Plan: sandcastle

Differential Revision: D51330618

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113747
Approved by: https://github.com/chenyang78, https://github.com/khabinov
2023-11-15 22:42:22 +00:00
094beca0c6 [export] make aot_export_module uses dynamo's fake_mode (#113681)
Fixes #110100 by making aot_export_module use dynamo.export's fake_mode in export.

Test Plan:
Add new tests. One of the tests places the fake tensor on CUDA devices manually, and we are able to export the program and preserve the device information in the final produced graph module even on a machine with a CPU-only build of PyTorch. One workaround we need is to set all tensors' requires_grad to False, as fake tensors on CUDA devices don't compose well with aot_autograd right now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113681
Approved by: https://github.com/SherlockNoMad
2023-11-15 22:34:00 +00:00
6435fc17bb Remove ignore_subclass from FakeTensorMode (#113795)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113795
Approved by: https://github.com/ezyang
2023-11-15 22:30:13 +00:00
97a62c715d [BE] Remove duplicate storage_offset equality test (#113790)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113790
Approved by: https://github.com/albanD
2023-11-15 22:25:07 +00:00
9b736c707c [Codemod][python/main_function] caffe2: (#113357)
Differential Revision: D51149464

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113357
Approved by: https://github.com/huydhn
2023-11-15 22:17:31 +00:00
87aeb248c9 More random stepcurrent (#113620)
Distributed tests for different backends have the same name, so they end up clashing under the current stepcurrent key, which meant some tests were not being run.

Disabled the following tests because they are failing:
test_ddp_has_finalized

test_broadcast_object_list
<details>

```

2023-11-14T06:44:01.0428686Z
2023-11-14T06:44:01.0430447Z distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_broadcast_object_list <- ../../../../opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/distributed/distributed_test.py INFO:numba.cuda.cudadrv.driver:init
2023-11-14T06:44:01.0431048Z [1699943450.893723] [99f90b6e6ff3:10028:0]     ucc_context.c:402  UCC  ERROR failed to create tl context for cuda
2023-11-14T06:44:01.0431625Z [1699943450.914385] [99f90b6e6ff3:10029:0]     ucc_context.c:402  UCC  ERROR failed to create tl context for cuda
2023-11-14T06:44:01.0432314Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR] Caught exception:
2023-11-14T06:44:01.0433178Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR] Traceback (most recent call last):
2023-11-14T06:44:01.0434677Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test
2023-11-14T06:44:01.0435435Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]     getattr(self, test_name)()
2023-11-14T06:44:01.0436895Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 544, in wrapper
2023-11-14T06:44:01.0437500Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]     fn()
2023-11-14T06:44:01.0438917Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2536, in wrapper
2023-11-14T06:44:01.0439637Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]     method(*args, **kwargs)
2023-11-14T06:44:01.0441122Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 143, in wrapper
2023-11-14T06:44:01.0441873Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]     return func(*args, **kwargs)
2023-11-14T06:44:01.0443340Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 274, in wrapper
2023-11-14T06:44:01.0444077Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]     ret = func(*args, **kwargs)
2023-11-14T06:44:01.0445769Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/distributed/distributed_test.py", line 7717, in test_broadcast_object_list
2023-11-14T06:44:01.0446732Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]     return self._test_broadcast_object_list()
2023-11-14T06:44:01.0448433Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/distributed/distributed_test.py", line 7683, in _test_broadcast_object_list
2023-11-14T06:44:01.0449187Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]     dist.broadcast_object_list(
2023-11-14T06:44:01.0450553Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 72, in wrapper
2023-11-14T06:44:01.0451621Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]     return func(*args, **kwargs)
2023-11-14T06:44:01.0453161Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 2650, in broadcast_object_list
2023-11-14T06:44:01.0454065Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]     broadcast(object_sizes_tensor, src=src, group=group)
2023-11-14T06:44:01.0455441Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 72, in wrapper
2023-11-14T06:44:01.0456183Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]     return func(*args, **kwargs)
2023-11-14T06:44:01.0457775Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1947, in broadcast
2023-11-14T06:44:01.0458649Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]     work = default_pg.broadcast([tensor], opts)
2023-11-14T06:44:01.0460923Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR] RuntimeError: [/var/lib/jenkins/workspace/torch/csrc/distributed/c10d/ProcessGroupUCC.cpp:488] [Rank 1][ProcessGroupUCC-0][READY]failed to init cuda collective, error code -1: Operation is not supported, system error code 2
2023-11-14T06:44:01.0461471Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]
2023-11-14T06:44:01.0462430Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR] To execute this test, run the following from the base repo dir:
2023-11-14T06:44:01.0463552Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]      python test/distributed/test_distributed_spawn.py -k test_broadcast_object_list
2023-11-14T06:44:01.0464082Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]
2023-11-14T06:44:01.0465136Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2023-11-14T06:44:01.0465945Z [rank1]:[2023-11-14 06:30:51,405] torch.testing._internal.common_distributed: [ERROR]  exiting process 1 with exit code: 10
2023-11-14T06:44:01.0466605Z [1699943451.005633] [99f90b6e6ff3:10029:0]          parser.c:2034 UCX  WARN  unused environment variables: UCX_COMMIT; UCX_HOME
2023-11-14T06:44:01.0467303Z [1699943451.005633] [99f90b6e6ff3:10029:0]          parser.c:2034 UCX  WARN  (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning)
2023-11-14T06:44:01.0467972Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR] Caught exception:
2023-11-14T06:44:01.0468743Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR] Traceback (most recent call last):
2023-11-14T06:44:01.0470233Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test
2023-11-14T06:44:01.0471106Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]     getattr(self, test_name)()
2023-11-14T06:44:01.0472581Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 544, in wrapper
2023-11-14T06:44:01.0473162Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]     fn()
2023-11-14T06:44:01.0474581Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2536, in wrapper
2023-11-14T06:44:01.0475314Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]     method(*args, **kwargs)
2023-11-14T06:44:01.0476776Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 143, in wrapper
2023-11-14T06:44:01.0477535Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]     return func(*args, **kwargs)
2023-11-14T06:44:01.0478993Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 274, in wrapper
2023-11-14T06:44:01.0479886Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]     ret = func(*args, **kwargs)
2023-11-14T06:44:01.0481593Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/distributed/distributed_test.py", line 7717, in test_broadcast_object_list
2023-11-14T06:44:01.0482429Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]     return self._test_broadcast_object_list()
2023-11-14T06:44:01.0484145Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/distributed/distributed_test.py", line 7683, in _test_broadcast_object_list
2023-11-14T06:44:01.0484886Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]     dist.broadcast_object_list(
2023-11-14T06:44:01.0486271Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 72, in wrapper
2023-11-14T06:44:01.0487018Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]     return func(*args, **kwargs)
2023-11-14T06:44:01.0488559Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 2650, in broadcast_object_list
2023-11-14T06:44:01.0489470Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]     broadcast(object_sizes_tensor, src=src, group=group)
2023-11-14T06:44:01.0491078Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 72, in wrapper
2023-11-14T06:44:01.0491912Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]     return func(*args, **kwargs)
2023-11-14T06:44:01.0493369Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1947, in broadcast
2023-11-14T06:44:01.0494419Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]     work = default_pg.broadcast([tensor], opts)
2023-11-14T06:44:01.0496679Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR] RuntimeError: [/var/lib/jenkins/workspace/torch/csrc/distributed/c10d/ProcessGroupUCC.cpp:488] [Rank 0][ProcessGroupUCC-0][READY]failed to init cuda collective, error code -1: Operation is not supported, system error code 2
2023-11-14T06:44:01.0497211Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]
2023-11-14T06:44:01.0498198Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR] To execute this test, run the following from the base repo dir:
2023-11-14T06:44:01.0499291Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]      python test/distributed/test_distributed_spawn.py -k test_broadcast_object_list
2023-11-14T06:44:01.0499838Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]
2023-11-14T06:44:01.0500881Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2023-11-14T06:44:01.0501667Z [rank0]:[2023-11-14 06:30:51,462] torch.testing._internal.common_distributed: [ERROR]  exiting process 0 with exit code: 10
2023-11-14T06:44:01.0502343Z [1699943451.002362] [99f90b6e6ff3:10028:0]          parser.c:2034 UCX  WARN  unused environment variables: UCX_COMMIT; UCX_HOME
2023-11-14T06:44:01.0503024Z [1699943451.002362] [99f90b6e6ff3:10028:0]          parser.c:2034 UCX  WARN  (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning)
2023-11-14T06:44:01.0503411Z ('RERUN', {'yellow': True}) [6.1102s] [100%]
```
</details>

test_ddp_sync_bn_training_vs_eval

<details>

```

2023-11-14T06:44:01.1494815Z
2023-11-14T06:44:01.1496630Z distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_ddp_sync_bn_training_vs_eval <- ../../../../opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/distributed/distributed_test.py INFO:numba.cuda.cudadrv.driver:init
2023-11-14T06:44:01.1497290Z [1699943779.976037] [99f90b6e6ff3:10758:0]          parser.c:2034 UCX  WARN  unused environment variables: UCX_COMMIT; UCX_HOME
2023-11-14T06:44:01.1498119Z [1699943779.976037] [99f90b6e6ff3:10758:0]          parser.c:2034 UCX  WARN  (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning)
2023-11-14T06:44:01.1498808Z STAGE:2023-11-14 06:36:20 10758:10758 ActivityProfilerController.cpp:314] Completed Stage: Warm Up
2023-11-14T06:44:01.1499465Z [1699943779.970792] [99f90b6e6ff3:10757:0]          parser.c:2034 UCX  WARN  unused environment variables: UCX_COMMIT; UCX_HOME
2023-11-14T06:44:01.1500160Z [1699943779.970792] [99f90b6e6ff3:10757:0]          parser.c:2034 UCX  WARN  (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning)
2023-11-14T06:44:01.1500820Z STAGE:2023-11-14 06:36:20 10757:10757 ActivityProfilerController.cpp:314] Completed Stage: Warm Up
2023-11-14T06:44:01.1501556Z STAGE:2023-11-14 06:36:20 10758:10758 ActivityProfilerController.cpp:320] Completed Stage: Collection
2023-11-14T06:44:01.1502239Z STAGE:2023-11-14 06:36:20 10757:10757 ActivityProfilerController.cpp:320] Completed Stage: Collection
2023-11-14T06:44:01.1502952Z STAGE:2023-11-14 06:36:20 10757:10757 ActivityProfilerController.cpp:324] Completed Stage: Post Processing
2023-11-14T06:44:01.1503678Z STAGE:2023-11-14 06:36:20 10758:10758 ActivityProfilerController.cpp:324] Completed Stage: Post Processing
2023-11-14T06:44:01.1504350Z [rank0]:[2023-11-14 06:36:20,668] torch.testing._internal.common_distributed: [ERROR] Caught exception:
2023-11-14T06:44:01.1505119Z [rank0]:[2023-11-14 06:36:20,668] torch.testing._internal.common_distributed: [ERROR] Traceback (most recent call last):
2023-11-14T06:44:01.1506729Z [rank0]:[2023-11-14 06:36:20,668] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test
2023-11-14T06:44:01.1507492Z [rank0]:[2023-11-14 06:36:20,668] torch.testing._internal.common_distributed: [ERROR]     getattr(self, test_name)()
2023-11-14T06:44:01.1508992Z [rank0]:[2023-11-14 06:36:20,668] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 544, in wrapper
2023-11-14T06:44:01.1509578Z [rank0]:[2023-11-14 06:36:20,668] torch.testing._internal.common_distributed: [ERROR]     fn()
2023-11-14T06:44:01.1510994Z [rank0]:[2023-11-14 06:36:20,668] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2536, in wrapper
2023-11-14T06:44:01.1511725Z [rank0]:[2023-11-14 06:36:20,668] torch.testing._internal.common_distributed: [ERROR]     method(*args, **kwargs)
2023-11-14T06:44:01.1513193Z [rank0]:[2023-11-14 06:36:20,668] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 174, in wrapper
2023-11-14T06:44:01.1513962Z [rank0]:[2023-11-14 06:36:20,668] torch.testing._internal.common_distributed: [ERROR]     return func(*args, **kwargs)
2023-11-14T06:44:01.1515697Z [rank0]:[2023-11-14 06:36:20,668] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/distributed/distributed_test.py", line 9230, in test_ddp_sync_bn_training_vs_eval
2023-11-14T06:44:01.1516529Z [rank0]:[2023-11-14 06:36:20,668] torch.testing._internal.common_distributed: [ERROR]     self.assertNotEqual([], all_gather_calls)
2023-11-14T06:44:01.1518019Z [rank0]:[2023-11-14 06:36:20,668] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3448, in assertNotEqual
2023-11-14T06:44:01.1518910Z [rank0]:[2023-11-14 06:36:20,668] torch.testing._internal.common_distributed: [ERROR]     with self.assertRaises(AssertionError, msg=msg):
2023-11-14T06:44:01.1520177Z [rank0]:[2023-11-14 06:36:20,668] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 226, in __exit__
2023-11-14T06:44:01.1521062Z [rank0]:[2023-11-14 06:36:20,668] torch.testing._internal.common_distributed: [ERROR]     self._raiseFailure("{} not raised".format(exc_name))
2023-11-14T06:44:01.1522238Z [rank0]:[2023-11-14 06:36:20,668] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 163, in _raiseFailure
2023-11-14T06:44:01.1523099Z [rank0]:[2023-11-14 06:36:20,668] torch.testing._internal.common_distributed: [ERROR]     raise self.test_case.failureException(msg)
2023-11-14T06:44:01.1523923Z [rank0]:[2023-11-14 06:36:20,668] torch.testing._internal.common_distributed: [ERROR] AssertionError: AssertionError not raised
2023-11-14T06:44:01.1524470Z [rank0]:[2023-11-14 06:36:20,668] torch.testing._internal.common_distributed: [ERROR]
2023-11-14T06:44:01.1525481Z [rank0]:[2023-11-14 06:36:20,668] torch.testing._internal.common_distributed: [ERROR] To execute this test, run the following from the base repo dir:
2023-11-14T06:44:01.1526632Z [rank0]:[2023-11-14 06:36:20,668] torch.testing._internal.common_distributed: [ERROR]      python test/distributed/test_distributed_spawn.py -k test_ddp_sync_bn_training_vs_eval
2023-11-14T06:44:01.1527180Z [rank0]:[2023-11-14 06:36:20,668] torch.testing._internal.common_distributed: [ERROR]
2023-11-14T06:44:01.1528223Z [rank0]:[2023-11-14 06:36:20,668] torch.testing._internal.common_distributed: [ERROR] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2023-11-14T06:44:01.1529029Z [rank0]:[2023-11-14 06:36:20,668] torch.testing._internal.common_distributed: [ERROR]  exiting process 0 with exit code: 10
2023-11-14T06:44:01.1529786Z [rank1]:[2023-11-14 06:36:20,890] torch.testing._internal.common_distributed: [ERROR] Caught exception:
2023-11-14T06:44:01.1530576Z [rank1]:[2023-11-14 06:36:20,890] torch.testing._internal.common_distributed: [ERROR] Traceback (most recent call last):
2023-11-14T06:44:01.1532383Z [rank1]:[2023-11-14 06:36:20,890] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test
2023-11-14T06:44:01.1533127Z [rank1]:[2023-11-14 06:36:20,890] torch.testing._internal.common_distributed: [ERROR]     getattr(self, test_name)()
2023-11-14T06:44:01.1534608Z [rank1]:[2023-11-14 06:36:20,890] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 544, in wrapper
2023-11-14T06:44:01.1535194Z [rank1]:[2023-11-14 06:36:20,890] torch.testing._internal.common_distributed: [ERROR]     fn()
2023-11-14T06:44:01.1536817Z [rank1]:[2023-11-14 06:36:20,890] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2536, in wrapper
2023-11-14T06:44:01.1537575Z [rank1]:[2023-11-14 06:36:20,890] torch.testing._internal.common_distributed: [ERROR]     method(*args, **kwargs)
2023-11-14T06:44:01.1539036Z [rank1]:[2023-11-14 06:36:20,890] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 174, in wrapper
2023-11-14T06:44:01.1539800Z [rank1]:[2023-11-14 06:36:20,890] torch.testing._internal.common_distributed: [ERROR]     return func(*args, **kwargs)
2023-11-14T06:44:01.1541531Z [rank1]:[2023-11-14 06:36:20,890] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/distributed/distributed_test.py", line 9230, in test_ddp_sync_bn_training_vs_eval
2023-11-14T06:44:01.1542388Z [rank1]:[2023-11-14 06:36:20,890] torch.testing._internal.common_distributed: [ERROR]     self.assertNotEqual([], all_gather_calls)
2023-11-14T06:44:01.1544015Z [rank1]:[2023-11-14 06:36:20,890] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3448, in assertNotEqual
2023-11-14T06:44:01.1544907Z [rank1]:[2023-11-14 06:36:20,890] torch.testing._internal.common_distributed: [ERROR]     with self.assertRaises(AssertionError, msg=msg):
2023-11-14T06:44:01.1546061Z [rank1]:[2023-11-14 06:36:20,890] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 226, in __exit__
2023-11-14T06:44:01.1546944Z [rank1]:[2023-11-14 06:36:20,890] torch.testing._internal.common_distributed: [ERROR]     self._raiseFailure("{} not raised".format(exc_name))
2023-11-14T06:44:01.1548142Z [rank1]:[2023-11-14 06:36:20,890] torch.testing._internal.common_distributed: [ERROR]   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 163, in _raiseFailure
2023-11-14T06:44:01.1548991Z [rank1]:[2023-11-14 06:36:20,890] torch.testing._internal.common_distributed: [ERROR]     raise self.test_case.failureException(msg)
2023-11-14T06:44:01.1549806Z [rank1]:[2023-11-14 06:36:20,890] torch.testing._internal.common_distributed: [ERROR] AssertionError: AssertionError not raised
2023-11-14T06:44:01.1550350Z [rank1]:[2023-11-14 06:36:20,890] torch.testing._internal.common_distributed: [ERROR]
2023-11-14T06:44:01.1551304Z [rank1]:[2023-11-14 06:36:20,890] torch.testing._internal.common_distributed: [ERROR] To execute this test, run the following from the base repo dir:
2023-11-14T06:44:01.1552462Z [rank1]:[2023-11-14 06:36:20,890] torch.testing._internal.common_distributed: [ERROR]      python test/distributed/test_distributed_spawn.py -k test_ddp_sync_bn_training_vs_eval
2023-11-14T06:44:01.1553095Z [rank1]:[2023-11-14 06:36:20,890] torch.testing._internal.common_distributed: [ERROR]
2023-11-14T06:44:01.1554166Z [rank1]:[2023-11-14 06:36:20,890] torch.testing._internal.common_distributed: [ERROR] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2023-11-14T06:44:01.1554976Z [rank1]:[2023-11-14 06:36:20,890] torch.testing._internal.common_distributed: [ERROR]  exiting process 1 with exit code: 10
2023-11-14T06:44:01.1555235Z ('RERUN', {'yellow': True}) [6.6107s] [100%]
```
</details>

test_backend_full_group
<details>

```
2023-11-14T22:51:56.4502470Z FAILED [5.2125s] distributed/test_distributed_spawn.py::TestDistBackendWithSpawn::test_backend_full_group - RuntimeError: Process 0 exited with error code 10 and exception:
2023-11-14T22:51:56.4502665Z Traceback (most recent call last):
2023-11-14T22:51:56.4503603Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test
2023-11-14T22:51:56.4503796Z     getattr(self, test_name)()
2023-11-14T22:51:56.4504710Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 544, in wrapper
2023-11-14T22:51:56.4504845Z     fn()
2023-11-14T22:51:56.4505737Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2536, in wrapper
2023-11-14T22:51:56.4505896Z     method(*args, **kwargs)
2023-11-14T22:51:56.4506823Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 174, in wrapper
2023-11-14T22:51:56.4506992Z     return func(*args, **kwargs)
2023-11-14T22:51:56.4508285Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/distributed/distributed_test.py", line 882, in test_backend_full_group
2023-11-14T22:51:56.4508640Z     self._test_group_override_backend(self._init_full_group_test)
2023-11-14T22:51:56.4509798Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/distributed/distributed_test.py", line 852, in _test_group_override_backend
2023-11-14T22:51:56.4510104Z     group, group_id, rank = initializer(backend=new_backend)
2023-11-14T22:51:56.4510629Z UnboundLocalError: local variable 'new_backend' referenced before assignment
2023-11-14T22:51:56.4510650Z
2023-11-14T22:51:56.4510987Z To execute this test, run the following from the base repo dir:
2023-11-14T22:51:56.4511525Z      python test/distributed/test_distributed_spawn.py -k test_backend_full_group
2023-11-14T22:51:56.4511545Z
2023-11-14T22:51:56.4511970Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2023-11-14T22:51:56.4511989Z
2023-11-14T22:51:56.4512242Z Process 1 exited with error code 10 and exception:
2023-11-14T22:51:56.4512454Z Traceback (most recent call last):
2023-11-14T22:51:56.4513380Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 658, in run_test
2023-11-14T22:51:56.4513687Z     getattr(self, test_name)()
2023-11-14T22:51:56.4514612Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 544, in wrapper
2023-11-14T22:51:56.4514746Z     fn()
2023-11-14T22:51:56.4515633Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2536, in wrapper
2023-11-14T22:51:56.4515791Z     method(*args, **kwargs)
2023-11-14T22:51:56.4516708Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_distributed.py", line 174, in wrapper
2023-11-14T22:51:56.4516895Z     return func(*args, **kwargs)
2023-11-14T22:51:56.4518008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/distributed/distributed_test.py", line 882, in test_backend_full_group
2023-11-14T22:51:56.4518352Z     self._test_group_override_backend(self._init_full_group_test)
2023-11-14T22:51:56.4519509Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/distributed/distributed_test.py", line 852, in _test_group_override_backend
2023-11-14T22:51:56.4519813Z     group, group_id, rank = initializer(backend=new_backend)
2023-11-14T22:51:56.4520334Z UnboundLocalError: local variable 'new_backend' referenced before assignment
2023-11-14T22:51:56.4520355Z
2023-11-14T22:51:56.4528843Z To execute this test, run the following from the base repo dir:
2023-11-14T22:51:56.4529492Z      python test/distributed/test_distributed_spawn.py -k test_backend_full_group
2023-11-14T22:51:56.4529681Z
2023-11-14T22:51:56.4530122Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2023-11-14T22:51:56.4530423Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
```
</details>

Pretty sure the solution for this one is to add ucc to _test_group_override_backend; see the sketch below.
https://ossci-raw-job-status.s3.amazonaws.com/log/18651430019
https://ossci-raw-job-status.s3.amazonaws.com/log/18651430132
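
A hypothetical sketch of the failing pattern (not the actual test helper source; names are illustrative):

```python
BACKEND = "ucc"  # hypothetical module-level setting

def pick_override_backend():
    if BACKEND == "gloo":
        new_backend = "nccl"
    elif BACKEND == "nccl":
        new_backend = "gloo"
    # missing: elif BACKEND == "ucc": ...
    return new_backend  # UnboundLocalError when BACKEND == "ucc"

pick_override_backend()
```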
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113620
Approved by: https://github.com/huydhn
2023-11-15 21:56:10 +00:00
4534cf102a Revert "[funcol] a few optimizations to funcol (#113324)"
This reverts commit 7117bffff916c44122ae73b5ce32a8411138db96.

Reverted https://github.com/pytorch/pytorch/pull/113324 on behalf of https://github.com/huydhn due to Sorry for reverting your change here, but it is failing internal test ([comment](https://github.com/pytorch/pytorch/pull/113324#issuecomment-1813317913))
2023-11-15 21:53:23 +00:00
dd28006d8d SGR/Assistant: making sure linker drops unnecessary dependencies (#112871)
Summary:
Assistant/SGR is linked in a way that drops links to all unreferenced libraries: https://www.internalfb.com/code/fbsource/[c74911ac21d6b90d1fbca8f2de08d6269f44e1fc]/xplat/toolchains/android/ndk/ndk_toolchains.bzl?lines=931
However, `caffe2` overrides this setting: https://www.internalfb.com/code/fbsource/[2536ee6849b08da1adcd5b9da0e455a4af3a06d1][blame]/xplat/caffe2/c2_defs.bzl?lines=496. That results in build breaks like the one discussed here: https://fb.workplace.com/groups/llvm.gcc/permalink/25390586597229949/ : Assistant doesn't use libforce_dlopen but still requires it, so that library must exist on the device.

As we statically link all operators, the `caffe2` override doesn't seem to be necessary.

This diff adds a build parameter affecting `caffe2` linker options.

Test Plan:
Built supernova experimental build, made sure Assistant starts without operator issues.
Tried tts, ocr and asr command in SGR, made sure they work.

Verified that the hypernova build doesn't require libforce_dlopen when D50695343 is applied.

Reviewed By: veselinp

Differential Revision: D50870489

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112871
Approved by: https://github.com/vybv, https://github.com/PaliC
2023-11-15 21:12:33 +00:00
585e315b3a [quant][pt2e] Refactor insert observer to do sharing checking in the same place (#113458)
Summary:
Previously this checking was scattered across two different places: before inserting observers and during observer insertion. This PR moves everything to before we insert observers.

* Next: refactor QuantizationSpec and check more fields for sharing

Test Plan:
CI (regression tests)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113458
Approved by: https://github.com/kimishpatel
2023-11-15 21:08:39 +00:00
deec2380c7 Add 0dim Tensor overload for _foreach_div (#113688)
This PR almost just follows the steps from #106677, except that we add one feature. Similar to fused_adam(w), for the CUDA dispatches: when the scalar tensor is on CPU, we call .item() and redispatch to the normal scalar overload. Otherwise, the CUDA kernel would complain about a device mismatch between the scalar and the tensors.

Why do we add this feature? Our optimizers want to allow lr as a tensor, and lr could be a CPU tensor. lr is used with foreach_div_ in Adam, so our CI will break otherwise.

After this PR, `_foreach_mul` and `_foreach_div` will accept either a CPU or a GPU tensor for the scalar tensor (vs. only a GPU tensor). They join the ranks of `fused_adam(w)` in this characteristic. I did not yet do the same thing for foreach_add (the only other foreach op with a .Tensor overload) because there is no use case and it would be more involved.
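
A quick sketch of what this enables (`torch._foreach_div` is a private API, and this snippet assumes a CUDA build; shown for illustration only):

```python
import torch

params = [torch.ones(3, device="cuda"), torch.ones(2, device="cuda")]
lr = torch.tensor(0.5)  # 0-dim scalar tensor living on CPU

# previously the CUDA dispatch rejected the CPU scalar tensor; now it
# .item()s the value and redispatches to the scalar overload
out = torch._foreach_div(params, lr)
```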

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113688
Approved by: https://github.com/mlazos, https://github.com/albanD
2023-11-15 20:59:32 +00:00
2164598c40 Improves comparison of state dicts for Checkpoint E2E Tests (#113181)
Addresses the following comment - https://github.com/pytorch/pytorch/pull/112541#discussion_r1380197424

Changes the comparison of models in the checkpointing E2E test to compare a non-parallelized model against a distributed model after training, saving, and loading.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113181
Approved by: https://github.com/fegin, https://github.com/huydhn, https://github.com/wz337
2023-11-15 20:48:45 +00:00
275a4521a9 [ONNX] Fix scalar type promotion between fp16 tensor and fp32 scalar (#113404)
Fixes https://github.com/pytorch/pytorch/issues/104594.

The reason for the exporter behavior in the originally posted issue is as follows:
The ONNX model tracks shape-related computations that were done in PyTorch with Python
numbers as tensor computations. This is the only way for ONNX to track them properly,
since ONNX only has tensor types; otherwise the computation result would be tracked
statically as a constant, and the model wouldn't work for another input that differs in shape.

Now, for type promotion logic, scalars should be treated differently from tensors.
The exporter mistook the shape-related scalars for tensors in this case and promoted incorrectly.

This PR fixes the behavior and relaxes the criteria for scalar recognition. For floating point,
previously only a value from a model initializer with dtype torch.double and rank 0 was
treated as a scalar. Now this is relaxed to any intermediate value, as well as to dtype torch.float.
The previous assumption was that a Python number is traced with dtype torch.double, which
no longer appears to hold.

NOTE that this might introduce a regression where a real rank-0 tensor is now recognized as
a scalar. The downside is that the model will drop in accuracy for these cases, as certain
computations will happen in lower-precision data types.
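
For a quick illustration of the eager-mode promotion rules the exporter is matching (standard PyTorch behavior, not new to this PR):

```python
import torch

x = torch.ones(2, dtype=torch.float16)
print((x * 2.0).dtype)            # torch.float16: a Python scalar does not promote
print((x * torch.ones(2)).dtype)  # torch.float32: a float32 tensor operand does
```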

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113404
Approved by: https://github.com/justinchuby
2023-11-15 20:32:55 +00:00
12b2dd16b0 [Kineto] Initialize libkineto profilers during torch init process during pybind set-up (#112623)
Summary:
We are planning to lazily initialize CUPTI when profiling is actually performed. Therefore, we need to remove profiler init dependency on CUPTI Callbacks' RESOURCE_CONTEXT_CREATED.

Instead, we can initialize the profilers during the torch profiler pybind set-up, i.e. THPAutograd_initExtension(), and lazily in profilerStep().

Test Plan:
CI and ran internally, see internal diff logs.

Pulled By: aaronenyeshi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112623
Approved by: https://github.com/albanD
2023-11-15 20:26:13 +00:00
cc11c0d11b aot_autograd: keep input mutations on requires_grad=True tensor out of the graph for inference (#113584)
The original behavior of torch.compile w.r.t. input mutations maintains that if an input to a graph was mutated, **and** requires grad, we will keep the input mutation outside of the graph and replay it at runtime.

This is important because, e.g., an input can have outstanding aliases, and mutating the input in eager mode will cause autograd to change the `grad_fn` of all outstanding aliases.

It looks like landing https://github.com/pytorch/pytorch/pull/111347 changed this behavior slightly:
* The linked PR makes it possible for AOTAutograd to go down the inference code path, even if some inputs require grad (because all of the outputs of the graph were seen to not require grad)
* AOTAutograd's logic in the inference code path today is to **always** keep input mutations in the graph.

This PR fixes that regression: regardless of inference vs. training, we should always keep input mutations outside of the graph if the input requires_grad.
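
A minimal sketch of the situation being fixed (shapes and names are illustrative):

```python
import torch

def f(x):
    x.mul_(2)                # mutates a graph input that requires grad
    return x.sum().detach()  # no output requires grad -> inference path

x = torch.ones(3, requires_grad=True).clone()  # non-leaf, requires_grad=True
alias = x.view(3)                              # outstanding alias of the input
torch.compile(f)(x)
# keeping x.mul_(2) outside the compiled graph and replaying it at runtime
# lets autograd update alias.grad_fn just as eager mode would
```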

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113584
Approved by: https://github.com/tugsbayasgalan
ghstack dependencies: #113267, #113416
2023-11-15 19:55:47 +00:00
032e5a4528 handle cross-dtype views during AOTAutograd view-replay (#113416)
Fixes https://github.com/pytorch/pytorch/issues/109053

I think "partitioning views out of the graph" will be a more robust fix for the class of errors that we've seen around incorrectly regenerating views at runtime.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113416
Approved by: https://github.com/ezyang
ghstack dependencies: #113267
2023-11-15 19:55:47 +00:00
720e866d18 graph break on out= ops with noncontiguous out args (#113267)
Fixes https://github.com/pytorch/pytorch/issues/113010

In eager mode, when you call an out= op like `add(..., out=out_arg)` with an out argument that is noncontiguous, the noncontiguous out arg will be returned directly. When we functionalize though, functionalization replaces it with a call to `add(...)` which ignores the contiguity of the original out arg.

Instead of trying to support this, this PR detects that situation and graph breaks.
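
A minimal sketch of the eager behavior being preserved (shapes are illustrative):

```python
import torch

out = torch.empty(2, 3).t()  # noncontiguous out tensor, shape (3, 2)
res = torch.add(torch.ones(3, 2), 1, out=out)
print(res.is_contiguous())   # False: eager hands back `out` with its strides
# under torch.compile, functionalization would ignore those strides, so
# dynamo now graph breaks on this pattern instead
```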

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113267
Approved by: https://github.com/albanD
2023-11-15 19:55:47 +00:00
05d949279c [C10] cpuinfo error handling (#113771)
If `cpuinfo_initialize` returns false, calls to subsequent cpuinfo functions may result in `abort()`.
Also, the `defaultNumThreads()` method now works on the assumption that if one method fails, we should try another, and finally return 1.

Alas, there is no good way to test this on the x86 platform, but on ARM one can replicate it by running `sudo chmod 750 /sys` and then `python3 -c "import torch;torch._C.profiler.gather_traceback(True, True, True)"`

Partially addresses https://github.com/pytorch/pytorch/issues/113568
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113771
Approved by: https://github.com/atalman
2023-11-15 19:49:34 +00:00
c1315ae2b9 Only check significant strides in test torchinductor (#113389)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113389
Approved by: https://github.com/int3
2023-11-15 19:47:55 +00:00
42b2b9e663 fixed pyi file for ReduceLROnPlateau (#113659)
Fixes #63143

The issue reappeared because the subclassing was not present in the stub file.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113659
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/kit1980
2023-11-15 19:33:36 +00:00
b3423889fe [inductor][fx pass] handle numpy compatibility arg names (#113078)
Fixes #113038

the "dim" kwarg can also be referred to with "axis" - handle this case.

21b6030ac3/torch/csrc/utils/python_arg_parser.cpp (L72-L77)

Previously, if the "axis" kwarg was used, it would not be matched and "dim" would default to 0.

75adb9f371/torch/_inductor/fx_passes/split_cat.py (L172-L176)
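
A hypothetical helper capturing the idea of the fix (names are illustrative, not the actual pattern-matcher code):

```python
def get_dim(kwargs, default=0):
    # numpy compatibility: "axis" is an alias for "dim"
    if "dim" in kwargs:
        return kwargs["dim"]
    return kwargs.get("axis", default)

print(get_dim({"axis": 1}))  # 1, instead of silently defaulting to 0
```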

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113078
Approved by: https://github.com/eellison
2023-11-15 19:27:24 +00:00
ca9e654353 [FSDP] Fix FSDP submodule with DeviceMesh does not return DTensor state_dict error (#113593)
For scenarios where FSDP is not the root module, the `_use_dtensor` flag would not be switched on. This PR fixes it by checking whether the submodule has a `device_mesh` and turning the `_use_dtensor` flag on accordingly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113593
Approved by: https://github.com/fegin
2023-11-15 19:00:19 +00:00
277474f1a0 Revert "[2d] pass shape/stride during tensor unflatten (#113547)"
This reverts commit 93372455a73043332c16a71cb9dccdf3e0412a57.

Reverted https://github.com/pytorch/pytorch/pull/113547 on behalf of https://github.com/wanchaol due to broken compile test ([comment](https://github.com/pytorch/pytorch/pull/113547#issuecomment-1813048318))
2023-11-15 18:32:54 +00:00
c678c5ef38 [doc] caution torch.multinomial usage (#112892)
Fixes #107406

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112892
Approved by: https://github.com/albanD
2023-11-15 18:20:48 +00:00
296c9e3ce7 upgrade lintrunner to the lowest supported versions on python 3.12 (#113562)
As per title, the current versions fail to install on 3.12.

The failures are related to https://github.com/numpy/numpy/issues/25147
They are fixed by adding manual annotations for the code in PyTorch and ignoring them on caffe2 as discussed with @malfet.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113562
Approved by: https://github.com/malfet
2023-11-15 18:12:01 +00:00
7f9fafed53 Resolve docstring errors in throughput_benchmark.py, weak.py, _traceback.py, file_baton.py, _contextlib.py, _device.py, cpp_backtrace.py, bundled_inputs.py, run_cpu.py, hooks.py, mobile_optimizer.py, _freeze.py, __init__.py, mkldnn.py, dlpack.py (#113311)
Fixes #112633

Fixed errors relating to pydocstyle in the following files. The remaining errors are not covered in this issue. `torch/utils/dlpack.py` was not modified, as its errors relate to the function signature in the first line of the docstring, which must be maintained as-is for proper Sphinx interpretation.

```python
def from_dlpack(ext_tensor: Any) -> 'torch.Tensor':
    """from_dlpack(ext_tensor) -> Tensor
         .....
    """
```
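
For illustration, a typical fix for the most common errors below (D205/D400) looks like this (hypothetical example, not from the PR):

```python
def extract(frames):
    """Extract a structured summary from captured frames.

    A blank line now separates the one-line summary from the longer
    description, and the summary line ends with a period (D205, D400).
    """
    return [f for f in frames if f is not None]
```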

pydocstyle torch/utils/_contextlib.py --count
before: 4
after: 0

pydocstyle torch/backends/mps/__init__.py --count
before: 8
after: 1

**remaining errors**
```
torch/backends/mps/__init__.py:1 at module level:
        D104: Missing docstring in public package
```

pydocstyle torch/backends/xeon/run_cpu.py --count
before: 13
after: 1

**remaining errors**
```
torch/backends/xeon/run_cpu.py:864 in public function `main`:
        D103: Missing docstring in public function
```

pydocstyle torch/backends/cpu/__init__.py --count
before: 2
after: 1

**remaining errors**
```
torch/backends/cpu/__init__.py:1 at module level:
        D104: Missing docstring in public package
```

pydocstyle torch/utils/cpp_backtrace.py --count
before: 4
after: 1

**remaining errors**
```
torch/utils/cpp_backtrace.py:1 at module level:
        D100: Missing docstring in public module
```

pydocstyle torch/utils/bundled_inputs.py --count
before: 8
after: 1

**remaining errors**
```
torch/utils/bundled_inputs.py:1 at module level:
        D100: Missing docstring in public module
```

pydocstyle torch/utils/file_baton.py --count
before: 8
after: 1

**remaining errors**
```
torch/utils/file_baton.py:1 at module level:
        D100: Missing docstring in public module
```

pydocstyle torch/utils/mobile_optimizer.py --count
before: 6
after: 1

**remaining errors**
```
torch/utils/mobile_optimizer.py:8 in public class `LintCode`:
        D101: Missing docstring in public class
```

pydocstyle torch/backends/opt_einsum/__init__.py --count
before: 7
after: 5

**remaining errors**
```
torch/backends/opt_einsum/__init__.py:1 at module level:
        D104: Missing docstring in public package
torch/backends/opt_einsum/__init__.py:67 in public function `set_flags`:
        D103: Missing docstring in public function
torch/backends/opt_einsum/__init__.py:77 in public function `flags`:
        D103: Missing docstring in public function
torch/backends/opt_einsum/__init__.py:93 in public class `OptEinsumModule`:
        D101: Missing docstring in public class
torch/backends/opt_einsum/__init__.py:94 in public method `__init__`:
        D107: Missing docstring in __init__
```

pydocstyle torch/utils/_device.py --count
before:  9
after: 6

**remaining errors**
```
torch/utils/_device.py:58 in public class `DeviceContext`:
        D101: Missing docstring in public class
torch/utils/_device.py:59 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/_device.py:62 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/utils/_device.py:68 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/utils/_device.py:73 in public method `__torch_function__`:
        D105: Missing docstring in magic method
torch/utils/_device.py:80 in public function `device_decorator`:
        D103: Missing docstring in public function

```

pydocstyle torch/utils/_freeze.py --count
before: 15
after: 7

**remaining errors**
```
torch/utils/_freeze.py:77 in public function `indent_msg`:
        D103: Missing docstring in public function
torch/utils/_freeze.py:89 in public class `FrozenModule`:
        D101: Missing docstring in public class
torch/utils/_freeze.py:100 in public class `Freezer`:
        D101: Missing docstring in public class
torch/utils/_freeze.py:101 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/_freeze.py:106 in public method `msg`:
        D102: Missing docstring in public method
torch/utils/_freeze.py:185 in public method `get_module_qualname`:
        D102: Missing docstring in public method
torch/utils/_freeze.py:206 in public method `compile_string`:
        D102: Missing docstring in public method

```

pydocstyle torch/utils/throughput_benchmark.py --count
before: 25
after: 8
**remaining errors**
```
torch/utils/throughput_benchmark.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/throughput_benchmark.py:27 in public class `ExecutionStats`:
        D101: Missing docstring in public class
torch/utils/throughput_benchmark.py:28 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/throughput_benchmark.py:33 in public method `latency_avg_ms`:
        D102: Missing docstring in public method
torch/utils/throughput_benchmark.py:37 in public method `num_iters`:
        D102: Missing docstring in public method
torch/utils/throughput_benchmark.py:46 in public method `total_time_seconds`:
        D102: Missing docstring in public method
torch/utils/throughput_benchmark.py:50 in public method `__str__`:
        D105: Missing docstring in magic method
torch/utils/throughput_benchmark.py:94 in public method `__init__`:
        D107: Missing docstring in __init__

```

pydocstyle torch/utils/hooks.py --count

before: 14
after: 11

**remaining errors**
```
torch/utils/hooks.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/hooks.py:23 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/hooks.py:34 in public method `remove`:
        D102: Missing docstring in public method
torch/utils/hooks.py:44 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/utils/hooks.py:50 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/utils/hooks.py:64 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/utils/hooks.py:67 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/utils/hooks.py:82 in public function `warn_if_has_hooks`:
        D103: Missing docstring in public function
torch/utils/hooks.py:103 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/hooks.py:188 in public method `setup_input_hook`:
        D102: Missing docstring in public method
torch/utils/hooks.py:197 in public method `setup_output_hook`:
        D102: Missing docstring in public method
```

pydocstyle torch/utils/_traceback.py --count
before: 19
after: 14

**remaining errors**
```
torch/utils/_traceback.py:47 in public function `report_compile_source_on_error`:
        D103: Missing docstring in public function
torch/utils/_traceback.py:160 in public class `CapturedTraceback`:
        D101: Missing docstring in public class
torch/utils/_traceback.py:163 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/_traceback.py:167 in public method `cleanup`:
        D102: Missing docstring in public method
torch/utils/_traceback.py:170 in public method `summary`:
        D102: Missing docstring in public method
torch/utils/_traceback.py:182 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/utils/_traceback.py:190 in public method `extract`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/_traceback.py:190 in public method `extract`:
        D400: First line should end with a period (not 't')
torch/utils/_traceback.py:213 in public method `format`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/_traceback.py:213 in public method `format`:
        D400: First line should end with a period (not 'f')
torch/utils/_traceback.py:213 in public method `format`:
        D401: First line should be in imperative mood (perhaps 'Format', not 'Formats')
torch/utils/_traceback.py:224 in public method `format_all`:
        D200: One-line docstring should fit on one line with quotes (found 3)
torch/utils/_traceback.py:247 in private function `_extract_symbolized_tb`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/_traceback.py:247 in private function `_extract_symbolized_tb`:
        D400: First line should end with a period (not 'f')
```

pydocstyle torch/utils/mkldnn.py --count
before: 28
after: 26

**remaining errors**
```
torch/utils/mkldnn.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/mkldnn.py:4 in public class `MkldnnLinear`:
        D101: Missing docstring in public class
torch/utils/mkldnn.py:5 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/mkldnn.py:19 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/utils/mkldnn.py:23 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/utils/mkldnn.py:29 in public method `forward`:
        D102: Missing docstring in public method
torch/utils/mkldnn.py:75 in public class `MkldnnConv1d`:
        D101: Missing docstring in public class
torch/utils/mkldnn.py:76 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/mkldnn.py:82 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/utils/mkldnn.py:88 in public class `MkldnnConv2d`:
        D101: Missing docstring in public class
torch/utils/mkldnn.py:89 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/mkldnn.py:100 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/utils/mkldnn.py:110 in public class `MkldnnConv3d`:
        D101: Missing docstring in public class
torch/utils/mkldnn.py:111 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/mkldnn.py:122 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/utils/mkldnn.py:133 in public class `MkldnnBatchNorm`:
        D101: Missing docstring in public class
torch/utils/mkldnn.py:136 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/mkldnn.py:155 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/utils/mkldnn.py:163 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/utils/mkldnn.py:171 in public method `forward`:
        D102: Missing docstring in public method
torch/utils/mkldnn.py:184 in public class `MkldnnPrelu`:
        D101: Missing docstring in public class
torch/utils/mkldnn.py:185 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/mkldnn.py:190 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/utils/mkldnn.py:194 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/utils/mkldnn.py:199 in public method `forward`:
        D102: Missing docstring in public method
torch/utils/mkldnn.py:205 in public function `to_mkldnn`:
        D103: Missing docstring in public function
```

pydocstyle torch/utils/weak.py --count
before: 32
after: 30

**remaining errors**
```
torch/utils/weak.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/weak.py:42 in public class `WeakIdRef`:
        D101: Missing docstring in public class
torch/utils/weak.py:45 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/weak.py:54 in public method `__call__`:
        D102: Missing docstring in public method
torch/utils/weak.py:61 in public method `__hash__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:64 in public method `__eq__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:84 in public class `WeakIdKeyDictionary`:
        D101: Missing docstring in public class
torch/utils/weak.py:87 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/weak.py:131 in public method `__delitem__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:135 in public method `__getitem__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:138 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:145 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:148 in public method `__setitem__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:151 in public method `copy`:
        D102: Missing docstring in public method
torch/utils/weak.py:162 in public method `__deepcopy__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:172 in public method `get`:
        D102: Missing docstring in public method
torch/utils/weak.py:175 in public method `__contains__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:182 in public method `items`:
        D102: Missing docstring in public method
torch/utils/weak.py:189 in public method `keys`:
        D102: Missing docstring in public method
torch/utils/weak.py:198 in public method `values`:
        D102: Missing docstring in public method
torch/utils/weak.py:216 in public method `popitem`:
        D102: Missing docstring in public method
torch/utils/weak.py:224 in public method `pop`:
        D102: Missing docstring in public method
torch/utils/weak.py:228 in public method `setdefault`:
        D102: Missing docstring in public method
torch/utils/weak.py:231 in public method `update`:
        D102: Missing docstring in public method
torch/utils/weak.py:241 in public method `__ior__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:245 in public method `__or__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:252 in public method `__ror__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:262 in public method `__eq__`:
        D105: Missing docstring in magic method
torch/utils/weak.py:276 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/weak.py:280 in public method `__call__`:
        D102: Missing docstring in public method

```

@mikaylagawarecki @jbschlosser @svekars
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113311
Approved by: https://github.com/ezyang
2023-11-15 17:40:04 +00:00
e100ff42fd Fix chrome trace entry format (#113763)
Fix regression introduced by https://github.com/pytorch/pytorch/pull/107519

`'"args": {{}}}}, '` was part of a format string, where curly braces are doubled so that they print a single time; the ruff change left the string as-is even though it is no longer run through the formatter, so the doubled braces leak through.
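
A minimal illustration of the brace-escaping behavior (hypothetical snippet, not from the diff):

```python
template = '"args": {{}}}}, '

# As a str.format template, {{ -> { and }} -> }, so each brace prints once:
print(template.format())  # '"args": {}}, '

# If the string is no longer formatted but the escapes are kept, the raw
# doubled braces leak into the output:
print(template)           # '"args": {{}}}}, '
```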

Fixes https://github.com/pytorch/pytorch/issues/113756

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113763
Approved by: https://github.com/Skylion007, https://github.com/aaronenyeshi
2023-11-15 17:07:40 +00:00
dedb47d94c Revert "Fix resize matrix_power.out dynamic shapes (#113695)"
This reverts commit c3918c18b5f7b98ef83f9022062a9f1990e3324d.

Reverted https://github.com/pytorch/pytorch/pull/113695 on behalf of https://github.com/ezyang due to sorry about that ([comment](https://github.com/pytorch/pytorch/pull/113695#issuecomment-1812705370))
2023-11-15 15:06:08 +00:00
6c187246d6 Add support for float8_e4m3fnuz and _e5m2fnuz (#107586)
This PR relates to the feature in [this feature submission](https://docs.google.com/document/d/1pF2T1xz54IPg1jG7FhykbrpbcJZVelQw0v8vBaoLkfs/edit). It is based on #104242, which adds similar float8 types.

These new types added in this PR are described in the paper at https://arxiv.org/abs/2206.02915. A brief description and comparison of the types with other float8 types can be also found in the [OpenXLA RFC](https://github.com/openxla/stablehlo/blob/main/rfcs/20230321-fp8_fnuz.md).
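
A small usage sketch of the new dtypes (assuming the `torch.float8_e4m3fnuz` and `torch.float8_e5m2fnuz` names this PR exposes):

```python
import torch

x = torch.randn(4)
print(x.to(torch.float8_e4m3fnuz))  # cast to the e4m3fnuz variant
print(x.to(torch.float8_e5m2fnuz))  # cast to the e5m2fnuz variant
```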

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107586
Approved by: https://github.com/seemethere, https://github.com/malfet
2023-11-15 15:01:11 +00:00
c3918c18b5 Fix resize matrix_power.out dynamic shapes (#113695)
Fixes https://github.com/pytorch/pytorch/issues/113003

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113695
Approved by: https://github.com/bdhirsh, https://github.com/lezcano
2023-11-15 13:35:54 +00:00
9146ca6a07 use sourceless builder for builtin getattr (#113340)
In TorchVision we use the following (simplified) dispatch mechanism:

```python
import torch

def kernel1(tensor):
    return tensor + 2

def dispatcher1(input):
    kernel = get_kernel(dispatcher1, type(input))
    return kernel(input)

def kernel2(tensor):
    return tensor - 2

def dispatcher2(input):
    kernel = get_kernel(dispatcher2, type(input))
    return kernel(input)

# We actually use the function and type as keys, rather than their names.
# However, this is currently not supported; it should be easy to add after
# https://github.com/pytorch/pytorch/pull/111196
REGISTRY = {
    "dispatcher1": {"Tensor": kernel1},
    "dispatcher2": {"Tensor": kernel2},
}

def get_kernel(dispatcher, input_type):
    dispatcher_registry = REGISTRY[dispatcher.__name__]
    for cls in input_type.__mro__:
        kernel = dispatcher_registry[cls.__name__]
        break
    return kernel
```

This can be compiled without graph breaks:

```python
cfn = torch.compile(dispatcher1, fullgraph=True)
torch.testing.assert_close(int(cfn(torch.tensor(3))), 5)

cfn = torch.compile(dispatcher2, fullgraph=True)
torch.testing.assert_close(int(cfn(torch.tensor(3))), 1)
```

However, if we start chaining these calls, we hit some issues:

```python
class Pipeline(torch.nn.Module):
    def forward(self, input):
        input = dispatcher1(input)
        input = dispatcher2(input)
        return input

cfn = torch.compile(Pipeline(), fullgraph=True)
torch.testing.assert_close(int(cfn(torch.tensor(3))), 3)
```

```
Can't access members of type(obj) for a generated custom object. Please use __class__ instead
```

The error message is not really helpful here. The following happens: when compiling `dispatcher1`, `get_kernel` gets inlined. That means when hitting `dispatcher2`, the `type` call no longer happens on an input with a source. Thus, in the first iteration we hit the top branch, while in the second we hit the bottom:

addb8e29cd/torch/_dynamo/variables/builtin.py (L1264-L1268)

And the error message I posted above originates from the type being treated as constant. This PR replaces this with a `SourcelessBuilder` instead.

With that fix in place, we hit another error, pointing to `input_type.__mro__`:

```
AssertionError: Consider SourcelessBuilder for ephemeral objects, usually objects created locally.
```

The fix is similar: instead of using a `VariableBuilder` here, we use a `SourcelessBuilder` when there is no `source`:

addb8e29cd/torch/_dynamo/variables/builtin.py (L1167-L1168)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113340
Approved by: https://github.com/peterbell10, https://github.com/lezcano
2023-11-15 13:01:20 +00:00
50101d59ba [export][retry] Move lifted tensors out of state_dict (#113689)
Test Plan: CI

Differential Revision: D51321532

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113689
Approved by: https://github.com/zhxchen17
2023-11-15 09:24:49 +00:00
17e2313dd3 Add an API to DDP for dynamically updating the underlying process group. (#113580)
# Motivation

If we would like to reinitialize DDP with a different PG with `torch.compile`, we need to do the following:

```
del old_ddp
del old_pg
pg = init_pg(...)                   # set up the new process group
ddp = DDP(model, process_group=pg)  # re-wrap the module with the new PG
compiled = torch.compile(ddp)       # recompiles the entire model
```

This results in recompilation of the entire model and is very expensive. Since the only thing we need to update is the PG, we should be able to do this without having to compile the model again.

# Proposal

As a result, in this PR I've introduced an `_update_process_group` API which can dynamically update the underlying ProcessGroup used by DDP without needing to reinitialize DDP again.
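
Continuing from the snippet above, a usage sketch of the new API (the group construction is illustrative):

```python
import torch.distributed as dist

# Illustrative: build a replacement group over the same ranks
new_pg = dist.new_group(ranks=list(range(dist.get_world_size())))

# Swap the PG on the existing DDP instance; the compiled module is reused
# without triggering a recompile:
ddp._update_process_group(new_pg)
```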

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113580
Approved by: https://github.com/fduwjj
2023-11-15 09:05:02 +00:00
7f1eda8c29 Minor: fix a typo (#113648)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113648
Approved by: https://github.com/Skylion007, https://github.com/kit1980
2023-11-15 08:42:58 +00:00
757f36b988 [docs] Fix torch.compile "tensorrt" backend docs (#113711)
- Update description from ONNX to current state (Torch-TensorRT)
- Add clarification about import

Fixes documentation on this page: https://pytorch.org/docs/stable/torch.compiler.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113711
Approved by: https://github.com/msaroufim
2023-11-15 08:42:53 +00:00
9b0f2f8d94 expose sdpa helpers to python (#110496)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110496
Approved by: https://github.com/jbschlosser
2023-11-15 07:34:34 +00:00
78f3937ee8 [BE] Handle errors in set_num_threads (#113684)
and `set_num_interop_threads`

Before this change, calling `torch.set_num_threads(2**65)` resulted in a segmentation fault; afterwards it becomes a good old runtime error:
```
% python -c "import torch;torch.set_num_threads(2**65)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
RuntimeError: Overflow when unpacking long
```

Similar to https://github.com/pytorch/pytorch/pull/60073

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113684
Approved by: https://github.com/Skylion007, https://github.com/albanD
2023-11-15 06:17:41 +00:00
1a8d076e0c [inductor cpp] simplify test for uint8 add/sub (#113407)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113407
Approved by: https://github.com/lezcano
ghstack dependencies: #113261
2023-11-15 06:17:25 +00:00
dadca7aeec remove \ in cache_dir (#110945)
Fixes #110933

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110945
Approved by: https://github.com/masnesral, https://github.com/shunting314
2023-11-15 06:01:08 +00:00
fda94124d7 [inductor] Make {cudagraph_trees,decomposition,post_grad}.py pass follow_imports typechecking (#113609)
I added explicit imports to `kernel/__init__.py` as mypy doesn't seem to
understand an empty `__init__.py`.
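
The explicit-import style looks roughly like this (a sketch; the exact submodule names are illustrative):

```python
# torch/_inductor/kernel/__init__.py
from . import bmm, conv, mm  # noqa: F401  -- re-export submodules for mypy
```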

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113609
Approved by: https://github.com/eellison
2023-11-15 05:04:11 +00:00
6f4409073f [doc] two diff meanings of rv generated by torch.tensor.geometric_ and torch.distributions.geometric.Geometric (#113183)
The random variables generated by `torch.tensor.geometric_` and `torch.distributions.geometric.Geometric` have different meanings; they are defined by two different PMFs.
Inform the user, so the user can choose the desired one.
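
For reference, a sketch of the two conventions at play (the docs change spells out which API uses which):

```latex
% Tensor.geometric_: number of Bernoulli trials up to and including the first success
P(X = k) = (1 - p)^{k - 1} \, p, \qquad k = 1, 2, \dots

% distributions.Geometric: number of failures before the first success
P(X = k) = (1 - p)^{k} \, p, \qquad k = 0, 1, \dots
```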

Background: https://github.com/pytorch/pytorch/pull/37984#issuecomment-630336511

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113183
Approved by: https://github.com/albanD
2023-11-15 03:49:04 +00:00
fcdfcdeef9 [inductor cpp] fix non-contiguous reduction store (#113261)
Fix https://github.com/pytorch/pytorch/issues/113018

The reduction store in this case works on a non-contiguous buffer. Previously, we only did the scalar fallback for normal stores, not for reduction stores. This PR fixes that.

Before fix
```c++
            #pragma omp for
            for(long x0=static_cast<long>(0L); x0<static_cast<long>(39L); x0+=static_cast<long>(1L))
            {
                for(long x1=static_cast<long>(0L); x1<static_cast<long>(16L); x1+=static_cast<long>(16L))
                {
                    {
                        #pragma omp declare reduction(max:at::vec::Vectorized<float>:omp_out = at::vec::maximum(omp_out, omp_in)) initializer(omp_priv={at::vec::Vectorized<float>(-std::numeric_limits<float>::infinity())})
                        float tmp_acc0 = -std::numeric_limits<float>::infinity();
                        at::vec::Vectorized<float> tmp_acc0_vec = at::vec::Vectorized<float>(-std::numeric_limits<float>::infinity());
                        for(long x2=static_cast<long>(0L); x2<static_cast<long>(18L); x2+=static_cast<long>(1L))
                        {
                            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr1 + static_cast<long>(x1 + (17L*x2) + (306L*x0)));
                            tmp_acc0_vec = at::vec::maximum(tmp_acc0_vec, tmp0);
                        }
                        tmp_acc0_vec.store(out_ptr1 + static_cast<long>(x0 + (39L*x1))); // this is wrong since x0 is not vector dim
                    }
                }
                #pragma omp simd simdlen(8)
                for(long x1=static_cast<long>(16L); x1<static_cast<long>(17L); x1+=static_cast<long>(1L))
                {
                    {
                        float tmp_acc0 = -std::numeric_limits<float>::infinity();
                        for(long x2=static_cast<long>(0L); x2<static_cast<long>(18L); x2+=static_cast<long>(1L))
                        {
                            auto tmp0 = in_ptr1[static_cast<long>(x1 + (17L*x2) + (306L*x0))];
                            tmp_acc0 = max_propagate_nan(tmp_acc0, tmp0);
                        }
                        out_ptr1[static_cast<long>(x0 + (39L*x1))] = tmp_acc0;
                    }
                }
            }
```

After fix
```c++
            #pragma omp for
            for(long x0=static_cast<long>(0L); x0<static_cast<long>(39L); x0+=static_cast<long>(1L))
            {
                for(long x1=static_cast<long>(0L); x1<static_cast<long>(16L); x1+=static_cast<long>(16L))
                {
                    {
                        #pragma omp declare reduction(max:at::vec::Vectorized<float>:omp_out = at::vec::maximum(omp_out, omp_in)) initializer(omp_priv={at::vec::Vectorized<float>(-std::numeric_limits<float>::infinity())})
                        float tmp_acc0 = -std::numeric_limits<float>::infinity();
                        at::vec::Vectorized<float> tmp_acc0_vec = at::vec::Vectorized<float>(-std::numeric_limits<float>::infinity());
                        for(long x2=static_cast<long>(0L); x2<static_cast<long>(18L); x2+=static_cast<long>(1L))
                        {
                            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr1 + static_cast<long>(x1 + (17L*x2) + (306L*x0)));
                            tmp_acc0_vec = at::vec::maximum(tmp_acc0_vec, tmp0);
                        }
                        { __at_align__ float tmpbuf[16*sizeof(float)/sizeof(float)]; tmp_acc0_vec.store(tmpbuf); for (long x1_inner = 0; x1_inner < 16; x1_inner++) out_ptr1[static_cast<long>(x0 + (39L*x1) + (39L*x1_inner))] = tmpbuf[x1_inner]; }
                    }
                }
                #pragma omp simd simdlen(8)
                for(long x1=static_cast<long>(16L); x1<static_cast<long>(17L); x1+=static_cast<long>(1L))
                {
                    {
                        float tmp_acc0 = -std::numeric_limits<float>::infinity();
                        for(long x2=static_cast<long>(0L); x2<static_cast<long>(18L); x2+=static_cast<long>(1L))
                        {
                            auto tmp0 = in_ptr1[static_cast<long>(x1 + (17L*x2) + (306L*x0))];
                            tmp_acc0 = max_propagate_nan(tmp_acc0, tmp0);
                        }
                        out_ptr1[static_cast<long>(x0 + (39L*x1))] = tmp_acc0;
                    }
                }
            }
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113261
Approved by: https://github.com/lezcano
2023-11-15 03:27:17 +00:00
f9ea697112 [quant][pt2][be] Refactor QAT tests for future patterns (#113658)
Summary: Currently the QAT tests are very specific to conv-bn-2d.
This makes it difficult to test new patterns like conv-bn-1d if
we want to add them. This commit refactors these tests so we can
add and test future patterns easily.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT_ConvBn2d

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113658
Approved by: https://github.com/jerryzh168
2023-11-15 02:17:13 +00:00
77f66ade66 Revert "use sourceless builder for builtin getattr (#113340)"
This reverts commit d64bc8f0f81bd9b514eb1a5ee6f5b03094e4e6e9.

Reverted https://github.com/pytorch/pytorch/pull/113340 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but the test is failing internally ([comment](https://github.com/pytorch/pytorch/pull/113340#issuecomment-1811684167))
2023-11-15 02:06:00 +00:00
84ee7453ad ci: Add clickable PR link to trymerge (#113712)
Adds a link to trymerge so that you can quickly click through the job to
the pull request for debugging.

Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113712
Approved by: https://github.com/clee2000, https://github.com/malfet
2023-11-15 01:55:33 +00:00
92e3f45f0e Revert "[dynamo] Refactor test cross importing (#113242)"
This reverts commit 4309d38f5d33530cbd875bded551e3fc08286c5d.

Reverted https://github.com/pytorch/pytorch/pull/113242 on behalf of https://github.com/huydhn due to Sorry for reverting your stack, but it is failing to list test internally with buck2 ([comment](https://github.com/pytorch/pytorch/pull/113242#issuecomment-1811674395))
2023-11-15 01:53:07 +00:00
6bffde99b0 Revert "[inductor] Move things into torch/testing/_internal/inductor_utils.py (#113275)"
This reverts commit 66d09f82170c528698b5ec606ba7838268ae1f8a.

Reverted https://github.com/pytorch/pytorch/pull/113275 on behalf of https://github.com/huydhn due to Sorry for reverting your stack, but it is failing to list test internally with buck2 ([comment](https://github.com/pytorch/pytorch/pull/113275#issuecomment-1811666004))
2023-11-15 01:44:26 +00:00
45671be2a0 Revert "Only check significant strides in test torchinductor (#113389)"
This reverts commit 28228e1517738f66f11ba278ed8e821c36dcff63.

Reverted https://github.com/pytorch/pytorch/pull/113389 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is conflicting with this revert https://github.com/pytorch/pytorch/pull/113275#issuecomment-1811651388, so I need to revert this to clean thing up ([comment](https://github.com/pytorch/pytorch/pull/113389#issuecomment-1811663791))
2023-11-15 01:41:16 +00:00
6a25bb8545 [inductor] use fusion_log for verbose logs (#113701)
Fixes https://github.com/pytorch/pytorch/issues/113696

Previously, log hygiene was not respected.
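
With the logs routed through the artifact logger, verbose fusion output can be enabled selectively; a sketch assuming the standard `torch._logging` mechanism and the `fusion` artifact name:

```python
import torch._logging

# Turn on only the scheduler's fusion logs (artifact name assumed):
torch._logging.set_logs(fusion=True)
```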

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113701
Approved by: https://github.com/ezyang
2023-11-15 01:39:03 +00:00
1e60174891 Revert "[dynamo] Add run_inductor_tests entrypoint (#113278)"
This reverts commit b00311ce9e430cf1b98d2103e21ed2179450a424.

Reverted https://github.com/pytorch/pytorch/pull/113278 on behalf of https://github.com/huydhn due to Sorry for reverting your stack, but it is failing to list test internally with buck2 ([comment](https://github.com/pytorch/pytorch/pull/113278#issuecomment-1811646325))
2023-11-15 01:19:48 +00:00
9724d0fd87 docstyle _correct_bias.py _equalize.py _learnable_fake_quantize.py backend_config experimental fake_quantize.py fuse_modules.py fuser_method_mappings.py (#112992)
Fixes #112988

For files

__init__.py
_correct_bias.py
_equalize.py
_learnable_fake_quantize.py
backend_config
experimental
fake_quantize.py
fuse_modules.py
fuser_method_mappings.py

Corrected the following errors:

__init__.py:1 at module level:
        D104: Missing docstring in public package
__init__.py:144 in public function `default_eval_fn`:
        D205: 1 blank line required between summary line and description (found 0)
__init__.py:144 in public function `default_eval_fn`:
        D400: First line should end with a period (not 'f')
__init__.py:144 in public function `default_eval_fn`:
        D401: First line should be in imperative mood; try rephrasing (found 'Default')
__init__.py:152 in private class `_DerivedObserverOrFakeQuantize`:
        D204: 1 blank line required after class docstring (found 0)
__init__.py:152 in private class `_DerivedObserverOrFakeQuantize`:
        D205: 1 blank line required between summary line and description (found 0)
__init__.py:152 in private class `_DerivedObserverOrFakeQuantize`:
        D210: No whitespaces allowed surrounding docstring text
__init__.py:152 in private class `_DerivedObserverOrFakeQuantize`:
        D400: First line should end with a period (not 's')
_correct_bias.py:20 in public function `get_module`:
        D200: One-line docstring should fit on one line with quotes (found 2)
_correct_bias.py:20 in public function `get_module`:
        D210: No whitespaces allowed surrounding docstring text
_correct_bias.py:20 in public function `get_module`:
        D300: Use """triple double quotes""" (found '''-quotes)
_correct_bias.py:20 in public function `get_module`:
        D400: First line should end with a period (not 'l')
_correct_bias.py:25 in public function `parent_child_names`:
        D200: One-line docstring should fit on one line with quotes (found 2)
_correct_bias.py:25 in public function `parent_child_names`:
        D300: Use """triple double quotes""" (found '''-quotes)
_correct_bias.py:25 in public function `parent_child_names`:
        D400: First line should end with a period (not 'e')
_correct_bias.py:25 in public function `parent_child_names`:
        D401: First line should be in imperative mood (perhaps 'Split', not 'Splits')
_correct_bias.py:34 in public function `get_param`:
        D205: 1 blank line required between summary line and description (found 0)
_correct_bias.py:34 in public function `get_param`:
        D210: No whitespaces allowed surrounding docstring text
_correct_bias.py:34 in public function `get_param`:
        D300: Use """triple double quotes""" (found '''-quotes)
_correct_bias.py:34 in public function `get_param`:
        D400: First line should end with a period (not 's')
_correct_bias.py:44 in public class `MeanShadowLogger`:
        D204: 1 blank line required after class docstring (found 0)
_correct_bias.py:44 in public class `MeanShadowLogger`:
        D205: 1 blank line required between summary line and description (found 0)
_correct_bias.py:44 in public class `MeanShadowLogger`:
        D400: First line should end with a period (not 'n')
_correct_bias.py:47 in public method `__init__`:
        D107: Missing docstring in __init__
_correct_bias.py:56 in public method `forward`:
        D205: 1 blank line required between summary line and description (found 0)
_correct_bias.py:56 in public method `forward`:
        D210: No whitespaces allowed surrounding docstring text
_correct_bias.py:56 in public method `forward`:
        D300: Use """triple double quotes""" (found '''-quotes)
_correct_bias.py:56 in public method `forward`:
        D401: First line should be in imperative mood; try rephrasing (found 'The')
_correct_bias.py:77 in public method `clear`:
        D102: Missing docstring in public method
_correct_bias.py:85 in public function `bias_correction`:
        D205: 1 blank line required between summary line and description (found 0)
_correct_bias.py:85 in public function `bias_correction`:
        D210: No whitespaces allowed surrounding docstring text
_correct_bias.py:85 in public function `bias_correction`:
        D300: Use """triple double quotes""" (found '''-quotes)
_correct_bias.py:85 in public function `bias_correction`:
        D400: First line should end with a period (not 's')
_correct_bias.py:85 in public function `bias_correction`:
        D401: First line should be in imperative mood (perhaps 'Use', not 'Using')
_equalize.py:22 in public function `set_module_weight`:
        D103: Missing docstring in public function
_equalize.py:28 in public function `set_module_bias`:
        D103: Missing docstring in public function
_equalize.py:34 in public function `get_module_weight`:
        D103: Missing docstring in public function
_equalize.py:40 in public function `get_module_bias`:
        D103: Missing docstring in public function
_equalize.py:47 in public function `max_over_ndim`:
        D200: One-line docstring should fit on one line with quotes (found 2)
_equalize.py:47 in public function `max_over_ndim`:
        D210: No whitespaces allowed surrounding docstring text
_equalize.py:47 in public function `max_over_ndim`:
        D300: Use """triple double quotes""" (found '''-quotes)
_equalize.py:47 in public function `max_over_ndim`:
        D400: First line should end with a period (not 's')
_equalize.py:47 in public function `max_over_ndim`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
_equalize.py:55 in public function `min_over_ndim`:
        D200: One-line docstring should fit on one line with quotes (found 2)
_equalize.py:55 in public function `min_over_ndim`:
        D210: No whitespaces allowed surrounding docstring text
_equalize.py:55 in public function `min_over_ndim`:
        D300: Use """triple double quotes""" (found '''-quotes)
_equalize.py:55 in public function `min_over_ndim`:
        D400: First line should end with a period (not 's')
_equalize.py:55 in public function `min_over_ndim`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
_equalize.py:63 in public function `channel_range`:
        D200: One-line docstring should fit on one line with quotes (found 2)
_equalize.py:63 in public function `channel_range`:
        D210: No whitespaces allowed surrounding docstring text
_equalize.py:63 in public function `channel_range`:
        D300: Use """triple double quotes""" (found '''-quotes)
_equalize.py:63 in public function `channel_range`:
        D400: First line should end with a period (not 'l')
_equalize.py:63 in public function `channel_range`:
        D401: First line should be in imperative mood (perhaps 'Find', not 'finds')
_equalize.py:63 in public function `channel_range`:
        D403: First word of the first line should be properly capitalized ('Finds', not 'finds')
_equalize.py:76 in public function `cross_layer_equalization`:
        D205: 1 blank line required between summary line and description (found 0)
_equalize.py:76 in public function `cross_layer_equalization`:
        D210: No whitespaces allowed surrounding docstring text
_equalize.py:76 in public function `cross_layer_equalization`:
        D300: Use """triple double quotes""" (found '''-quotes)
_equalize.py:76 in public function `cross_layer_equalization`:
        D400: First line should end with a period (not 't')
_equalize.py:120 in public function `equalize`:
        D205: 1 blank line required between summary line and description (found 0)
_equalize.py:120 in public function `equalize`:
        D210: No whitespaces allowed surrounding docstring text
_equalize.py:120 in public function `equalize`:
        D300: Use """triple double quotes""" (found '''-quotes)
_equalize.py:120 in public function `equalize`:
        D400: First line should end with a period (not 'l')
_equalize.py:159 in public function `converged`:
        D205: 1 blank line required between summary line and description (found 0)
_equalize.py:159 in public function `converged`:
        D210: No whitespaces allowed surrounding docstring text
_equalize.py:159 in public function `converged`:
        D300: Use """triple double quotes""" (found '''-quotes)
_equalize.py:159 in public function `converged`:
        D400: First line should end with a period (not 's')
_equalize.py:159 in public function `converged`:
        D401: First line should be in imperative mood (perhaps 'Test', not 'Tests')
_learnable_fake_quantize.py:8 in private class `_LearnableFakeQuantize`:
        D204: 1 blank line required after class docstring (found 0)
_learnable_fake_quantize.py:8 in private class `_LearnableFakeQuantize`:
        D205: 1 blank line required between summary line and description (found 0)
_learnable_fake_quantize.py:8 in private class `_LearnableFakeQuantize`:
        D210: No whitespaces allowed surrounding docstring text
_learnable_fake_quantize.py:8 in private class `_LearnableFakeQuantize`:
        D400: First line should end with a period (not 'h')
_learnable_fake_quantize.py:68 in private method `enable_param_learning`:
        D205: 1 blank line required between summary line and description (found 0)
_learnable_fake_quantize.py:68 in private method `enable_param_learning`:
        D400: First line should end with a period (not 'd')
_learnable_fake_quantize.py:68 in private method `enable_param_learning`:
        D401: First line should be in imperative mood (perhaps 'Enable', not 'Enables')
_learnable_fake_quantize.py:78 in private method `enable_static_estimate`:
        D205: 1 blank line required between summary line and description (found 0)
_learnable_fake_quantize.py:78 in private method `enable_static_estimate`:
        D400: First line should end with a period (not 'f')
_learnable_fake_quantize.py:78 in private method `enable_static_estimate`:
        D401: First line should be in imperative mood (perhaps 'Enable', not 'Enables')
_learnable_fake_quantize.py:87 in private method `enable_static_observation`:
        D205: 1 blank line required between summary line and description (found 0)
_learnable_fake_quantize.py:87 in private method `enable_static_observation`:
        D400: First line should end with a period (not 't')
_learnable_fake_quantize.py:87 in private method `enable_static_observation`:
        D401: First line should be in imperative mood (perhaps 'Enable', not 'Enables')
fake_quantize.py:1 at module level:
        D205: 1 blank line required between summary line and description (found 0)
fake_quantize.py:1 at module level:
        D400: First line should end with a period (not 'n')
fake_quantize.py:61 in public class `FakeQuantizeBase`:
        D205: 1 blank line required between summary line and description (found 0)
fake_quantize.py:61 in public class `FakeQuantizeBase`:
        D210: No whitespaces allowed surrounding docstring text
fake_quantize.py:61 in public class `FakeQuantizeBase`:
        D400: First line should end with a period (not 'e')
fake_quantize.py:74 in public method `__init__`:
        D107: Missing docstring in __init__
fake_quantize.py:83 in public method `forward`:
        D102: Missing docstring in public method
fake_quantize.py:87 in public method `calculate_qparams`:
        D102: Missing docstring in public method
fake_quantize.py:91 in public method `enable_fake_quant`:
        D102: Missing docstring in public method
fake_quantize.py:95 in public method `disable_fake_quant`:
        D102: Missing docstring in public method
fake_quantize.py:99 in public method `enable_observer`:
        D102: Missing docstring in public method
fake_quantize.py:103 in public method `disable_observer`:
        D102: Missing docstring in public method
fake_quantize.py:107 in public method `with_args`:
        D102: Missing docstring in public method
fake_quantize.py:115 in public class `FakeQuantize`:
        D205: 1 blank line required between summary line and description (found 0)
fake_quantize.py:115 in public class `FakeQuantize`:
        D210: No whitespaces allowed surrounding docstring text
fake_quantize.py:115 in public class `FakeQuantize`:
        D412: No blank lines allowed between a section header and its content ('Attributes')
fake_quantize.py:150 in public method `__init__`:
        D107: Missing docstring in __init__
fake_quantize.py:188 in public method `calculate_qparams`:
        D102: Missing docstring in public method
fake_quantize.py:191 in public method `forward`:
        D102: Missing docstring in public method
fake_quantize.py:214 in public method `extra_repr`:
        D102: Missing docstring in public method
fake_quantize.py:262 in public class `FixedQParamsFakeQuantize`:
        D205: 1 blank line required between summary line and description (found 0)
fake_quantize.py:262 in public class `FixedQParamsFakeQuantize`:
        D210: No whitespaces allowed surrounding docstring text
fake_quantize.py:262 in public class `FixedQParamsFakeQuantize`:
        D400: First line should end with a period (not 'n')
fake_quantize.py:268 in public method `__init__`:
        D107: Missing docstring in __init__
fake_quantize.py:279 in public method `calculate_qparams`:
        D102: Missing docstring in public method
fake_quantize.py:283 in public method `extra_repr`:
        D102: Missing docstring in public method
fake_quantize.py:292 in public class `FusedMovingAvgObsFakeQuantize`:
        D205: 1 blank line required between summary line and description (found 0)
fake_quantize.py:292 in public class `FusedMovingAvgObsFakeQuantize`:
        D400: First line should end with a period (not 'e')
fake_quantize.py:307 in public method `__init__`:
        D107: Missing docstring in __init__
fake_quantize.py:322 in public method `calculate_qparams`:
        D102: Missing docstring in public method
fake_quantize.py:326 in public method `extra_repr`:
        D102: Missing docstring in public method
fake_quantize.py:342 in public method `forward`:
        D102: Missing docstring in public method
fake_quantize.py:480 in private function `_is_fake_quant_script_module`:
        D200: One-line docstring should fit on one line with quotes (found 2)
fake_quantize.py:480 in private function `_is_fake_quant_script_module`:
        D210: No whitespaces allowed surrounding docstring text
fake_quantize.py:480 in private function `_is_fake_quant_script_module`:
        D300: Use """triple double quotes""" (found '''-quotes)
fake_quantize.py:480 in private function `_is_fake_quant_script_module`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
fake_quantize.py:491 in public function `disable_fake_quant`:
        D400: First line should end with a period (not ':')
fake_quantize.py:502 in public function `enable_fake_quant`:
        D400: First line should end with a period (not ':')
fake_quantize.py:513 in public function `disable_observer`:
        D400: First line should end with a period (not ':')
fake_quantize.py:524 in public function `enable_observer`:
        D400: First line should end with a period (not ':')
fuse_modules.py:1 at module level:
        D100: Missing docstring in public module
fuse_modules.py:39 in public function `fuse_known_modules`:
        D205: 1 blank line required between summary line and description (found 0)
fuse_modules.py:39 in public function `fuse_known_modules`:
        D400: First line should end with a period (not 'd')
fuse_modules.py:39 in public function `fuse_known_modules`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
fuse_modules.py:104 in public function `fuse_modules`:
        D400: First line should end with a period (not 'e')
fuse_modules.py:167 in public function `fuse_modules_qat`:
        D200: One-line docstring should fit on one line with quotes (found 2)
fuse_modules.py:167 in public function `fuse_modules_qat`:
        D210: No whitespaces allowed surrounding docstring text
fuse_modules.py:167 in public function `fuse_modules_qat`:
        D400: First line should end with a period (not '`')
fuser_method_mappings.py:1 at module level:
        D100: Missing docstring in public module
fuser_method_mappings.py:18 in public function `fuse_conv_bn`:
        D400: First line should end with a period (not 'e')
fuser_method_mappings.py:55 in public function `fuse_conv_bn_relu`:
        D400: First line should end with a period (not 'e')
fuser_method_mappings.py:102 in public function `fuse_linear_bn`:
        D400: First line should end with a period (not 'e')
fuser_method_mappings.py:131 in public function `fuse_convtranspose_bn`:
        D400: First line should end with a period (not 'e')
fuser_method_mappings.py:154 in private function `_sequential_wrapper2`:
        D205: 1 blank line required between summary line and description (found 0)
fuser_method_mappings.py:154 in private function `_sequential_wrapper2`:
        D210: No whitespaces allowed surrounding docstring text
fuser_method_mappings.py:154 in private function `_sequential_wrapper2`:
        D400: First line should end with a period (not 's')
fuser_method_mappings.py:182 in public function `get_fuser_method`:
        D205: 1 blank line required between summary line and description (found 0)
fuser_method_mappings.py:182 in public function `get_fuser_method`:
        D210: No whitespaces allowed surrounding docstring text
fuser_method_mappings.py:182 in public function `get_fuser_method`:
        D300: Use """triple double quotes""" (found '''-quotes)
fuser_method_mappings.py:182 in public function `get_fuser_method`:
        D400: First line should end with a period (not ',')
fuser_method_mappings.py:205 in private function `_get_valid_patterns`:
        D205: 1 blank line required between summary line and description (found 0)
fuser_method_mappings.py:205 in private function `_get_valid_patterns`:
        D400: First line should end with a period (not ',')
fuser_method_mappings.py:205 in private function `_get_valid_patterns`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
fuser_method_mappings.py:238 in public function `get_fuser_method_new`:
        D205: 1 blank line required between summary line and description (found 0)
fuser_method_mappings.py:238 in public function `get_fuser_method_new`:
        D210: No whitespaces allowed surrounding docstring text
fuser_method_mappings.py:238 in public function `get_fuser_method_new`:
        D400: First line should end with a period (not 'd')
fuser_method_mappings.py:238 in public function `get_fuser_method_new`:
        D401: First line should be in imperative mood; try rephrasing (found 'This')
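
For illustration, a typical fix for the `channel_range` errors above might look like this (hypothetical docstring text, not taken from the diff):

```python
# Before: '''-quotes, a one-liner split over two lines, non-imperative first word
def channel_range(input, axis=0):
    '''finds the range of weights
    associated with a specific channel'''

# After: one line, """-quotes, imperative mood, trailing period
def channel_range(input, axis=0):
    """Find the range of weights associated with a specific channel."""
```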

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112992
Approved by: https://github.com/kit1980
2023-11-15 00:59:44 +00:00
252e68a83b Revert "Add support for torch.Generator type in TorchScript (#110413)"
This reverts commit 54493fe8c4b1cca4c5ff993b99eb3e3dbc984226.

Reverted https://github.com/pytorch/pytorch/pull/110413 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is, unfortunately, still breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/110413#issuecomment-1811625557))
2023-11-15 00:51:23 +00:00
c892f1a318 Doc: Add and fix docstrings for torch.distributed files (#112735)
Fixes #112647

Fixed and tested docstrings for all files as defined in the issue.

```
> pydocstyle '/Users/guptaaryan16/Desktop/OSS/pytorch/torch/distributed/pipeline/sync/skip/skippable.py' --count
Before: 15
After: 2

> pydocstyle torch/distributed/elastic/agent/server/local_elastic_agent.py --count
Before: 4
After: 2

> pydocstyle '/Users/guptaaryan16/Desktop/OSS/pytorch/torch/distributed/elastic/agent/server/api.py' --count
Before: 65
After: 12

> pydocstyle torch/distributed/elastic/agent/server/__init__.py --count
Before: 2
After: 0
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112735
Approved by: https://github.com/kit1980
2023-11-15 00:49:07 +00:00
b8b3c26d3d If we re-fakeify a FakeTensor with the same ShapeEnv, preserve symbols (#113651)
Subsumes half of https://github.com/pytorch/pytorch/pull/113605

We support fakeifying an already fake tensor, which gives you a new fake tensor mirroring the structure of the original; this is what https://github.com/pytorch/pytorch/issues/113643 needs. However, when this refakeification happens, we naively allocate fresh sizes for the entire fake tensor. That is the right thing to do if you are re-fakeifying on a fresh ShapeEnv (because you are reparametrizing the sizes or something), but if you have two fake tensor modes sharing a shape environment, you would rather just reuse the original sizes/strides/offset from the original fake tensor. This ends up being pretty simple. I recommend viewing with whitespace diff turned off.
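
A sketch of the shared-ShapeEnv scenario (internal APIs, shown only for illustration; exact behavior depends on the mode configuration):

```python
import torch
from torch._subclasses.fake_tensor import FakeTensorMode
from torch.fx.experimental.symbolic_shapes import ShapeEnv

shape_env = ShapeEnv()
mode_a = FakeTensorMode(shape_env=shape_env)
mode_b = FakeTensorMode(shape_env=shape_env)  # second mode, same ShapeEnv

t = mode_a.from_tensor(torch.empty(4, 8))
# Re-fakeifying t under mode_b should reuse t's sizes/strides/offset
# rather than allocating fresh symbols:
t2 = mode_b.from_tensor(t)
```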

There's some fuzz around jagged tensor handling; that code is probably not quite right, but I fixed it for this particular case in the most straightforward way.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113651
Approved by: https://github.com/albanD, https://github.com/eellison, https://github.com/bdhirsh
2023-11-15 00:36:04 +00:00
cyy
cab039fe9b [1/N] Fixes clang-tidy warnings in header files (#113608)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113608
Approved by: https://github.com/Skylion007
2023-11-15 00:32:43 +00:00
31e16847ea [doc] torch.tensor.geometric_, torch.tensor.uniform_ fix PMF vs PDF (#113109)
- The geometric distribution is discrete, so fix the docs to say PMF (probability mass function)
- The continuous uniform distribution has a density, so fix the docs to say PDF (probability density function); see the sketch below
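
For the uniform case, the density the docs should be referring to is (a sketch, matching the `Tensor.uniform_` documentation):

```latex
f(x) = \frac{1}{\text{to} - \text{from}}, \qquad \text{from} \le x < \text{to}
```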

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113109
Approved by: https://github.com/albanD
2023-11-15 00:30:19 +00:00
56c453233f [doc] clarify the range of sampled rv for torch.tensor.exponential_ (#113195)
The range of the sampled random variable needs to be clarified for `torch.tensor.exponential_`: its supported interval (0, inf) differs from the [0, inf) support of the exponential distribution.
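
A quick check of the documented behavior (a sketch; the assertion reflects the clarified docs):

```python
import torch

x = torch.empty(1_000_000).exponential_()
assert (x > 0).all()  # 0 is excluded: the support is (0, inf)
```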

Background: https://github.com/pytorch/pytorch/pull/37984#discussion_r1059527457, https://github.com/pytorch/pytorch/issues/48841#issuecomment-1530439039, https://github.com/pytorch/pytorch/pull/91673#discussion_r1069955813

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113195
Approved by: https://github.com/albanD
2023-11-15 00:30:14 +00:00
f5ce4d8baf Fixed docstring errors in gradcheck.py, forward_ad.py, profiler_util.py, profiler_legacy.py, functional.py, grad_mode.py, function.py (#113266)
Fixes #112594

Docstrings updated.

Here are the outputs, with the error counts before and after.

1) torch/autograd/forward_ad.py

Before :

```
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/forward_ad.py:1 at module level:
        D100: Missing docstring in public module
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/forward_ad.py:23 in public function `enter_dual_level`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/forward_ad.py:23 in public function `enter_dual_level`:
        D401: First line should be in imperative mood; try rephrasing (found 'Function')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/forward_ad.py:42 in public function `exit_dual_level`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/forward_ad.py:42 in public function `exit_dual_level`:
        D401: First line should be in imperative mood; try rephrasing (found 'Function')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/forward_ad.py:62 in public function `make_dual`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/forward_ad.py:62 in public function `make_dual`:
        D400: First line should end with a period (not 'a')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/forward_ad.py:128 in public class `UnpackedDualTensor`:
        D204: 1 blank line required after class docstring (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/forward_ad.py:128 in public class `UnpackedDualTensor`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/forward_ad.py:128 in public class `UnpackedDualTensor`:
        D209: Multi-line docstring closing quotes should be on a separate line
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/forward_ad.py:134 in public function `unpack_dual`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/forward_ad.py:165 in public class `dual_level`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/forward_ad.py:165 in public class `dual_level`:
        D400: First line should end with a period (not 't')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/forward_ad.py:199 in public method `__enter__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/forward_ad.py:202 in public method `__exit__`:
        D105: Missing docstring in magic method
15
```

After:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/forward_ad.py:1 at module level:
        D100: Missing docstring in public module
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/forward_ad.py:205 in public method `__enter__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/forward_ad.py:208 in public method `__exit__`:
        D105: Missing docstring in magic method
3
```

2) torch/autograd/functional.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/functional.py:1 at module level:
        D100: Missing docstring in public module
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/functional.py:262 in public function `vjp`:
        D202: No blank lines allowed after function docstring (found 1)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/functional.py:262 in public function `vjp`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/functional.py:262 in public function `vjp`:
        D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/functional.py:262 in public function `vjp`:
        D401: First line should be in imperative mood; try rephrasing (found 'Function')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/functional.py:359 in public function `jvp`:
        D202: No blank lines allowed after function docstring (found 1)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/functional.py:359 in public function `jvp`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/functional.py:359 in public function `jvp`:
        D400: First line should end with a period (not 'f')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/functional.py:359 in public function `jvp`:
        D401: First line should be in imperative mood; try rephrasing (found 'Function')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/functional.py:584 in public function `jacobian`:
        D401: First line should be in imperative mood; try rephrasing (found 'Function')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/functional.py:841 in public function `hessian`:
        D202: No blank lines allowed after function docstring (found 1)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/functional.py:841 in public function `hessian`:
        D401: First line should be in imperative mood; try rephrasing (found 'Function')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/functional.py:973 in public function `vhp`:
        D202: No blank lines allowed after function docstring (found 1)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/functional.py:973 in public function `vhp`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/functional.py:973 in public function `vhp`:
        D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/functional.py:973 in public function `vhp`:
        D401: First line should be in imperative mood; try rephrasing (found 'Function')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/functional.py:1076 in public function `hvp`:
        D202: No blank lines allowed after function docstring (found 1)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/functional.py:1076 in public function `hvp`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/functional.py:1076 in public function `hvp`:
        D400: First line should end with a period (not 'r')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/functional.py:1076 in public function `hvp`:
        D401: First line should be in imperative mood; try rephrasing (found 'Function')
20
```
After:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/functional.py:1 at module level:
        D100: Missing docstring in public module
1
```
3) torch/autograd/profiler_legacy.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:1 at module level:
        D100: Missing docstring in public module
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:27 in public class `profile`:
        D400: First line should end with a period (not 'd')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:29 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:62 in public method `config`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:74 in public method `__enter__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:86 in public method `__exit__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:103 in public method `__repr__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:108 in public method `__str__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:117 in public method `table`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:141 in public method `export_chrome_trace`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:148 in public method `export_stacks`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:154 in public method `key_averages`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:161 in public method `total_average`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:170 in public method `self_cpu_time_total`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:170 in public method `self_cpu_time_total`:
        D400: First line should end with a period (not 'f')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:180 in private nested function `_get_record_key`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:180 in private nested function `_get_record_key`:
        D400: First line should end with a period (not 'd')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:180 in private nested function `_get_record_key`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
18
```
After:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:1 at module level:
        D100: Missing docstring in public module
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:29 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:62 in public method `config`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:74 in public method `__enter__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:86 in public method `__exit__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:103 in public method `__repr__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:108 in public method `__str__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:117 in public method `table`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:141 in public method `export_chrome_trace`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:148 in public method `export_stacks`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:154 in public method `key_averages`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_legacy.py:161 in public method `total_average`:
        D102: Missing docstring in public method
12
```

4) torch/autograd/gradcheck.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/gradcheck.py:1 at module level:
        D100: Missing docstring in public module
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/gradcheck.py:27 in public class `GradcheckError`:
        D204: 1 blank line required after class docstring (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/gradcheck.py:27 in public class `GradcheckError`:
        D400: First line should end with a period (not '`')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/gradcheck.py:258 in private function `_get_numerical_jacobian`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/gradcheck.py:258 in private function `_get_numerical_jacobian`:
        D400: First line should end with a period (not 'f')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/gradcheck.py:258 in private function `_get_numerical_jacobian`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/gradcheck.py:308 in public function `get_numerical_jacobian`:
        D401: First line should be in imperative mood; try rephrasing (found 'Deprecated')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/gradcheck.py:459 in public function `get_numerical_jacobian_wrt_specific_input`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/gradcheck.py:488 in private function `_get_analytical_jacobian_forward_ad`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/gradcheck.py:488 in private function `_get_analytical_jacobian_forward_ad`:
        D400: First line should end with a period (not 't')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/gradcheck.py:488 in private function `_get_analytical_jacobian_forward_ad`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/gradcheck.py:816 in public function `get_analytical_jacobian`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/gradcheck.py:1944 in public function `gradcheck`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/gradcheck.py:1944 in public function `gradcheck`:
        D400: First line should end with a period (not 'l')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/gradcheck.py:2133 in public function `gradgradcheck`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/gradcheck.py:2133 in public function `gradgradcheck`:
        D400: First line should end with a period (not 's')
16
```
After:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/gradcheck.py:1 at module level:
        D100: Missing docstring in public module
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/gradcheck.py:463 in public function `get_numerical_jacobian_wrt_specific_input`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/gradcheck.py:820 in public function `get_analytical_jacobian`:
        D103: Missing docstring in public function
3
```
5) torch/autograd/function.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:1 at module level:
        D100: Missing docstring in public module
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:27 in public class `FunctionCtx`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:29 in public method `save_for_backward`:
        D401: First line should be in imperative mood (perhaps 'Save', not 'Saves')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:88 in public method `save_for_forward`:
        D401: First line should be in imperative mood (perhaps 'Save', not 'Saves')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:141 in public method `mark_dirty`:
        D401: First line should be in imperative mood (perhaps 'Mark', not 'Marks')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:177 in public method `mark_shared_storage`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:185 in public method `mark_non_differentiable`:
        D401: First line should be in imperative mood (perhaps 'Mark', not 'Marks')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:217 in public method `set_materialize_grads`:
        D401: First line should be in imperative mood (perhaps 'Set', not 'Sets')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:276 in public class `BackwardCFunction`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:277 in public method `apply`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:291 in public method `apply_jvp`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:308 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:322 in private method `forward`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:322 in private method `forward`:
        D400: First line should end with a period (not 's')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:322 in private method `forward`:
        D401: First line should be in imperative mood; try rephrasing (found 'This')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:384 in private method `backward`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:384 in private method `backward`:
        D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:384 in private method `backward`:
        D401: First line should be in imperative mood (perhaps 'Define', not 'Defines')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:416 in private method `jvp`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:416 in private method `jvp`:
        D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:416 in private method `jvp`:
        D401: First line should be in imperative mood (perhaps 'Define', not 'Defines')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:439 in public class `Function`:
        D400: First line should end with a period (not '`')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:472 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:482 in public method `__call__`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:505 in public method `vmap`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:505 in public method `vmap`:
        D400: First line should end with a period (not 'h')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:505 in public method `vmap`:
        D401: First line should be in imperative mood (perhaps 'Define', not 'Defines')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:536 in public method `apply`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:564 in public function `once_differentiable`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:612 in public function `traceable`:
        D401: First line should be in imperative mood (perhaps 'Mark', not 'Marks')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:626 in public class `InplaceFunction`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:627 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:741 in public class `NestedIOFunction`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:761 in public method `backward`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:768 in public method `forward`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:775 in public method `save_for_backward`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:780 in public method `saved_tensors`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:784 in public method `mark_dirty`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:787 in public method `mark_non_differentiable`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:790 in public method `forward_extended`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:793 in public method `backward_extended`:
        D102: Missing docstring in public method
41
```
After:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:1 at module level:
        D100: Missing docstring in public module
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:27 in public class `FunctionCtx`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:177 in public method `mark_shared_storage`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:276 in public class `BackwardCFunction`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:277 in public method `apply`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:291 in public method `apply_jvp`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:308 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:471 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:481 in public method `__call__`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:536 in public method `apply`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:564 in public function `once_differentiable`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:626 in public class `InplaceFunction`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:627 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:741 in public class `NestedIOFunction`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:761 in public method `backward`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:768 in public method `forward`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:775 in public method `save_for_backward`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:780 in public method `saved_tensors`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:784 in public method `mark_dirty`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:787 in public method `mark_non_differentiable`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:790 in public method `forward_extended`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/function.py:793 in public method `backward_extended`:
        D102: Missing docstring in public method
22
```
6) torch/autograd/profiler_util.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:1 at module level:
        D100: Missing docstring in public module
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:26 in public class `EventList`:
        D400: First line should end with a period (not ')')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:28 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:46 in public method `__str__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:70 in private method `_populate_cpu_children`:
        D202: No blank lines allowed after function docstring (found 1)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:70 in private method `_populate_cpu_children`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:70 in private method `_populate_cpu_children`:
        D401: First line should be in imperative mood (perhaps 'Populate', not 'Populates')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:166 in public method `self_cpu_time_total`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:179 in public method `table`:
        D401: First line should be in imperative mood (perhaps 'Print', not 'Prints')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:210 in public method `export_chrome_trace`:
        D401: First line should be in imperative mood (perhaps 'Export', not 'Exports')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:266 in public method `supported_export_stacks_metrics`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:273 in public method `export_stacks`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:354 in private function `_format_time`:
        D400: First line should end with a period (not 't')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:354 in private function `_format_time`:
        D401: First line should be in imperative mood (perhaps 'Define', not 'Defines')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:365 in private function `_format_time_share`:
        D400: First line should end with a period (not 't')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:365 in private function `_format_time_share`:
        D401: First line should be in imperative mood (perhaps 'Define', not 'Defines')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:373 in private function `_format_memory`:
        D400: First line should end with a period (not 'g')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:373 in private function `_format_memory`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:408 in public method `cpu_time`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:412 in public method `cuda_time`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:416 in public method `privateuse1_time`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:420 in public class `Interval`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:421 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:425 in public method `elapsed_us`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:435 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:488 in public method `append_kernel`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:504 in public method `set_cpu_parent`:
        D400: First line should end with a period (not 't')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:518 in public method `self_cpu_memory_usage`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:526 in public method `self_cuda_memory_usage`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:534 in public method `self_privateuse1_memory_usage`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:542 in public method `self_cpu_time_total`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:550 in public method `cuda_time_total`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:567 in public method `self_cuda_time_total`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:579 in public method `cpu_time_total`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:586 in public method `self_privateuse1_time_total`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:598 in public method `privateuse1_time_total`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:615 in public method `key`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:618 in public method `__repr__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:659 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:687 in public method `add`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:726 in public method `__iadd__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:729 in public method `__repr__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:763 in public class `StringTable`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:764 in public method `__missing__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:773 in public class `MemRecordsAcc`:
        D400: First line should end with a period (not 'l')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:775 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:783 in public method `in_interval`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:846 in private function `_build_table`:
        D401: First line should be in imperative mood (perhaps 'Print', not 'Prints')
48
```
After:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:1 at module level:
        D100: Missing docstring in public module
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:28 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:46 in public method `__str__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:166 in public method `self_cpu_time_total`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:266 in public method `supported_export_stacks_metrics`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:273 in public method `export_stacks`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:408 in public method `cpu_time`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:412 in public method `cuda_time`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:416 in public method `privateuse1_time`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:420 in public class `Interval`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:421 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:425 in public method `elapsed_us`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:435 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:488 in public method `append_kernel`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:518 in public method `self_cpu_memory_usage`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:526 in public method `self_cuda_memory_usage`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:534 in public method `self_privateuse1_memory_usage`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:542 in public method `self_cpu_time_total`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:550 in public method `cuda_time_total`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:567 in public method `self_cuda_time_total`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:579 in public method `cpu_time_total`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:586 in public method `self_privateuse1_time_total`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:598 in public method `privateuse1_time_total`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:615 in public method `key`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:618 in public method `__repr__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:659 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:687 in public method `add`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:726 in public method `__iadd__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:729 in public method `__repr__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:763 in public class `StringTable`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:764 in public method `__missing__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:775 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/profiler_util.py:783 in public method `in_interval`:
        D102: Missing docstring in public method
33
```
7) torch/autograd/grad_mode.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:1 at module level:
        D100: Missing docstring in public module
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:73 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:78 in public method `__enter__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:82 in public method `__exit__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:133 in public method `__enter__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:137 in public method `__exit__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:182 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:187 in public method `__enter__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:190 in public method `__exit__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:193 in public method `clone`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:198 in public class `inference_mode`:
        D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:250 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:257 in public method `__new__`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:262 in public method `__enter__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:266 in public method `__exit__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:269 in public method `clone`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:301 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:306 in public method `__enter__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:309 in public method `__exit__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:312 in public method `clone`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:354 in private class `_unsafe_preserve_version_counter`:
        D400: First line should end with a period (not '!')
21
```
After:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:1 at module level:
        D100: Missing docstring in public module
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:73 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:78 in public method `__enter__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:82 in public method `__exit__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:133 in public method `__enter__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:137 in public method `__exit__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:182 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:187 in public method `__enter__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:190 in public method `__exit__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:193 in public method `clone`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:250 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:257 in public method `__new__`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:262 in public method `__enter__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:266 in public method `__exit__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:269 in public method `clone`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:301 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:306 in public method `__enter__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:309 in public method `__exit__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/autograd/grad_mode.py:312 in public method `clone`:
        D102: Missing docstring in public method
19
```
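
The fixes behind these diffs are mostly one-line docstring edits. A minimal sketch (hypothetical method, not taken from the PR) of how D401, D400, and D205 get resolved:
```python
# Before: D401 (summary not imperative), D400 (no trailing period),
# D205 (no blank line between summary and description).
def clone(self):
    """Returns a copy of this object
    preserving the current grad mode settings"""

# After: imperative one-line summary ending in a period, then a blank line.
def clone(self):
    """Return a copy of this object.

    The copy preserves the current grad mode settings.
    """
```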

@svekars @kit1980 @subramen

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113266
Approved by: https://github.com/aaronenyeshi, https://github.com/soulitzer, https://github.com/kit1980
2023-11-14 23:39:43 +00:00
28228e1517 Only check significant strides in test torchinductor (#113389)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113389
Approved by: https://github.com/int3
2023-11-14 22:45:09 +00:00
cf6e9f572e Update xla pin (#113603)
Fixes XLA workflow CI failures
```

======================================================================
FAIL: test_set (__main__.TestAtenXlaTensor)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/pytorch/xla/test/test_operations.py", line 1007, in test_set
    self.assertEqual(met.counter_value('DestroyXlaTensor'), 6)
  File "/tmp/pytorch/xla/test/test_utils.py", line 301, in assertEqual
    super(XlaTestCase, self).assertLessEqual(abs(x - y), prec, message)
AssertionError: 1 not less than or equal to 1e-05 :

----------------------------------------------------------------------
```

We've disabled the failing test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113603
Approved by: https://github.com/JackCaoG, https://github.com/bdhirsh, https://github.com/malfet
2023-11-14 22:32:06 +00:00
91973e1c31 Issue113185 (#113523)
Fixes #113185

I have fixed the given docstring errors. The following are the outputs, with error counts, before and after the changes:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113523
Approved by: https://github.com/kit1980
2023-11-14 22:25:28 +00:00
6b01126df5 [Easy] [Dynamo] Catch OSError when calling inspect.getfile (#113671)
Fixes #111328
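
A minimal sketch of the guard this implies (the helper name is hypothetical; `inspect.getfile` raises `TypeError` for built-ins and `OSError` when no source file can be found):
```python
import inspect

def try_getfile(obj):
    # Return None instead of raising when the object has no source file.
    try:
        return inspect.getfile(obj)
    except (OSError, TypeError):
        return None
```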

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113671
Approved by: https://github.com/Skylion007, https://github.com/williamwen42
2023-11-14 22:15:32 +00:00
1d640566d4 [BE] Do not warn when safely loading legacy dicts (#113614)
Use the same strategy as for unsafe pickler, i.e. use dummy `torch.serialization.StorageType` to represent legacy typed storage classes during deserialization. Add `_dtype` property to be able to use it for both new and legacy format deserialization.

Parametrize `test_serialization_new_format_old_format_compat`

Add a regression test to validate that legacy models can be loaded
without any warnings
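
A minimal self-contained version of such a check might look like this (a sketch assuming the legacy format via `_use_new_zipfile_serialization=False`, not the exact test from the PR):
```python
import io
import warnings

import torch

buf = io.BytesIO()
# Save in the legacy (non-zipfile) serialization format.
torch.save({"w": torch.ones(2)}, buf, _use_new_zipfile_serialization=False)
buf.seek(0)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    torch.load(buf, weights_only=True)
assert not caught, f"Expected no warnings but got {[str(w) for w in caught]}"
```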

Before the change:
```
% python test_serialization.py -v -k test_serialization_new_format_old_format_compat_
test_serialization_new_format_old_format_compat_cpu (__main__.TestBothSerializationCPU) ... ok
test_serialization_new_format_old_format_compat_safe_cpu (__main__.TestBothSerializationCPU) ... /Users/nshulga/git/pytorch/pytorch/torch/_utils.py:836: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
ok

----------------------------------------------------------------------
Ran 2 tests in 0.116s

OK
```
Without the change but update test to catch warnings:
```
 % python test_serialization.py -v -k test_serialization_new_format_old_format_compat_
test_serialization_new_format_old_format_compat_weights_only_False_cpu (__main__.TestBothSerializationCPU) ... ok
test_serialization_new_format_old_format_compat_weights_only_True_cpu (__main__.TestBothSerializationCPU) ... FAIL

======================================================================
FAIL: test_serialization_new_format_old_format_compat_weights_only_True_cpu (__main__.TestBothSerializationCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/nshulga/git/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 2536, in wrapper
    method(*args, **kwargs)
  File "/Users/nshulga/git/pytorch/pytorch/torch/testing/_internal/common_device_type.py", line 415, in instantiated_test
    result = test(self, **param_kwargs)
  File "/Users/nshulga/git/pytorch/pytorch/test/test_serialization.py", line 807, in test_serialization_new_format_old_format_compat
    self.assertTrue(len(w) == 0, msg=f"Expected no warnings but got {[str(x) for x in w]}")
AssertionError: False is not true : Expected no warnings but got ["{message : UserWarning('TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()'), category : 'UserWarning', filename : '/Users/nshulga/git/pytorch/pytorch/torch/_utils.py', lineno : 836, line : None}"]

To execute this test, run the following from the base repo dir:
     python test/test_serialization.py -k test_serialization_new_format_old_format_compat_weights_only_True_cpu

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

----------------------------------------------------------------------
Ran 2 tests in 0.109s

FAILED (failures=1)

```

Fixes problem reported in https://github.com/pytorch/pytorch/issues/52181#issuecomment-1715738910
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113614
Approved by: https://github.com/kit1980, https://github.com/albanD
2023-11-14 22:09:10 +00:00
538114db65 [MPS] Fix and refactor unary/binary ops with non-zero offset or non-contiguous output (#97085)
Fixes #100764

This PR fixes the unary ops implementation and refactors the binary ops implementation a bit.

For unary ops:
Previously we didn't take into account unary ops that have a non-contiguous/storage-offset output, causing incorrect results (because the MPS graph kernel always writes the buffer contiguously). Therefore, this PR creates a temporary output tensor for the graph first and then copies the result back to the original output tensor. We currently do not have a better fix than this, I think.

For binary ops, see https://github.com/pytorch/pytorch/pull/97085#discussion_r1140999125

See the added test for repro.
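
A minimal sketch of the workaround described above (a pure-Python illustration, not the actual MPS graph code):
```python
import torch

def run_unary_into(out, op, inp):
    # The graph kernel always writes its buffer contiguously, so stage the
    # result in a contiguous temporary, then copy it back into the possibly
    # non-contiguous / storage-offset destination.
    tmp = torch.empty(out.shape, dtype=out.dtype, device=out.device)
    tmp.copy_(op(inp))
    out.copy_(tmp)
    return out
```
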
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97085
Approved by: https://github.com/malfet
2023-11-14 22:03:21 +00:00
9f71452331 Disable atomic_add fallback for cpu (#113655)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113655
Approved by: https://github.com/eellison
2023-11-14 21:40:29 +00:00
18d7b8e4f7 [BE]: ruff apply rule PLW1510 to find silent subprocess errors (#113644)
Reopens #111682, which I messed up due to a bad rebase that triggered some CLA issues. This explicitly adds `check=True` or `check=False` to subprocess calls where appropriate.
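
For illustration, the pattern the rule enforces (a sketch, not a diff from this PR):
```python
import subprocess

# Before: a non-zero exit status is silently ignored (PLW1510).
subprocess.run(["git", "fetch"])

# After: the intent is explicit; check=True raises CalledProcessError on failure.
subprocess.run(["git", "fetch"], check=True)
```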

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113644
Approved by: https://github.com/ezyang, https://github.com/kit1980
2023-11-14 20:59:40 +00:00
53e7de4b65 Issue 112599 - fix pydocstyle errors (#113177)
Fixes #112599

Fixed errors relating to pydocstyle in the following files. The remaining errors are related to docstrings at the module level and in methods within each module (`forward()`, `reset_parameters`, `__init__`, etc.)

pydocstyle torch/nn/modules/pooling.py --count
before: 49
after: 29

**remaining errors:**
```
torch/nn/modules/pooling.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/modules/pooling.py:90 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:163 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:240 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:315 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/pooling.py:321 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:402 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/pooling.py:408 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:472 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/pooling.py:478 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:541 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/pooling.py:550 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:620 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/pooling.py:630 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:706 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/pooling.py:716 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:720 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/nn/modules/pooling.py:774 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/pooling.py:792 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:845 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/pooling.py:863 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:925 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:979 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:1026 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:1068 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:1111 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:1150 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:1189 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pooling.py:1228 in public method `forward`:
        D102: Missing docstring in public method
```

pydocstyle torch/nn/modules/upsampling.py --count
before: 14
after: 7

**remaining:**
```
torch/nn/modules/upsampling.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/modules/upsampling.py:142 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/upsampling.py:156 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/upsampling.py:160 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/nn/modules/upsampling.py:166 in public method `extra_repr`:
        D102: Missing docstring in public method
torch/nn/modules/upsampling.py:216 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/upsampling.py:263 in public method `__init__`:
        D107: Missing docstring in __init__
```

pydocstyle torch/nn/modules/rnn.py --count
before: 47
after: 40

**remaining**
```
torch/nn/modules/rnn.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/modules/rnn.py:59 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:160 in public method `__setattr__`:
        D105: Missing docstring in magic method
torch/nn/modules/rnn.py:225 in public method `reset_parameters`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:230 in public method `check_input`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:242 in public method `get_expected_hidden_size`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:256 in public method `check_hidden_size`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:272 in public method `check_forward_args`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:278 in public method `permute_hidden`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:284 in public method `extra_repr`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:305 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/nn/modules/rnn.py:313 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/nn/modules/rnn.py:355 in public method `all_weights`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:471 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:478 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:481 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:503 in public method `forward` (skipping F811):
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:762 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:768 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:771 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:774 in public method `get_expected_cell_size`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:786 in public method `check_forward_args`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:798 in public method `permute_hidden`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:809 in public method `forward` (skipping F811):
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:820 in public method `forward` (skipping F811):
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:1030 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:1036 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:1039 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:1046 in public method `forward` (skipping F811):
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:1054 in public method `forward` (skipping F811):
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:1123 in public class `RNNCellBase`:
        D101: Missing docstring in public class
torch/nn/modules/rnn.py:1134 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:1152 in public method `extra_repr`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:1160 in public method `reset_parameters`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:1224 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:1230 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:1327 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:1332 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/rnn.py:1422 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/rnn.py:1427 in public method `forward`:
        D102: Missing docstring in public method
```

pydocstyle torch/nn/modules/pixelshuffle.py --count
before: 13
after: 8

**remaining:**
```
torch/nn/modules/pixelshuffle.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/modules/pixelshuffle.py:52 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/pixelshuffle.py:56 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pixelshuffle.py:59 in public method `extra_repr`:
        D102: Missing docstring in public method
torch/nn/modules/pixelshuffle.py:105 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/pixelshuffle.py:109 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/pixelshuffle.py:112 in public method `extra_repr`:
        D102: Missing docstring in public method
```

pydocstyle torch/nn/modules/sparse.py --count
before: 14
after: 8

**remaining errors:**
```
torch/nn/modules/sparse.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/modules/sparse.py:124 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/sparse.py:153 in public method `reset_parameters`:
        D102: Missing docstring in public method
torch/nn/modules/sparse.py:162 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/modules/sparse.py:167 in public method `extra_repr`:
        D102: Missing docstring in public method
torch/nn/modules/sparse.py:320 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/modules/sparse.py:350 in public method `reset_parameters`:
        D102: Missing docstring in public method
torch/nn/modules/sparse.py:396 in public method `extra_repr`:
        D102: Missing docstring in public method
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113177
Approved by: https://github.com/ezyang
2023-11-14 20:55:22 +00:00
a05639cea6 Add some checks about Device and Layout when create/convert named tensor (#113628)
Fixes #113597

As the title states
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113628
Approved by: https://github.com/ezyang
2023-11-14 20:40:27 +00:00
20eaa49dde [PT-D] Made _get_registry return None if no APIs applied (#113654)
I prefer not to modify the module if it does not have any of our APIs applied. The side effect of inserting a registry on the module when calling a getter is non-intuitive to me.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113654
Approved by: https://github.com/fegin
2023-11-14 20:28:11 +00:00
afef32bd23 [Pytorch][Vulkan] native_layer_norm (#113573)
Summary: We implement `native_layer_norm`. Compared to [`layer_norm`](https://pytorch.org/docs/stable/generated/torch.nn.LayerNorm.html), the output of `native_layer_norm` is a tuple of tensors: (layer_norm, mean, 1/sqrt(var + eps)).
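
As a CPU-side sketch of that contract (the op already exists on CPU; this PR adds the Vulkan backend):
```python
import torch

x = torch.randn(4, 3, 8)
out, mean, rstd = torch.native_layer_norm(x, [8], None, None, 1e-5)
# out:  normalized tensor, same shape as x
# mean: mean over the normalized dimensions
# rstd: 1 / sqrt(var + eps) over the same dimensions
```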

Test Plan:
```
[luwei@devbig984.prn1 /data/users/luwei/fbsource (2b2052666|remote/fbandroid/stable)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run fbcode/mode/dev-nosan //xplat/caffe2:pt_vulkan_api_test_bin -- --gtest_filter="*native_layer_norm*"
Building: finished in 0.1 sec (100%) 339/339 jobs, 0/339 updated
  Total time: 0.2 sec
BUILD SUCCEEDED
Running main() from third-party/googletest/1.14.0/googletest/googletest/src/gtest_main.cc
Note: Google Test filter = *native_layer_norm*
[==========] Running 3 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 3 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.native_layer_norm_2d
[       OK ] VulkanAPITest.native_layer_norm_2d (352 ms)
[ RUN      ] VulkanAPITest.native_layer_norm_3d
[       OK ] VulkanAPITest.native_layer_norm_3d (308 ms)
[ RUN      ] VulkanAPITest.native_layer_norm_4d
[       OK ] VulkanAPITest.native_layer_norm_4d (6 ms)
[----------] 3 tests from VulkanAPITest (667 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test suite ran. (667 ms total)
[  PASSED  ] 3 tests.
```
full test result in P881016177

Reviewed By: yipjustin

Differential Revision: D51247030

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113573
Approved by: https://github.com/yipjustin
2023-11-14 20:11:32 +00:00
b7b2178204 [BE]: Remove useless lambdas (#113602)
Applies PLW0108 which removes useless lambda calls in Python, the rule is in preview so it is not ready to be enabled by default just yet. These are the autofixes from the rule.
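
The autofix pattern, for illustration:
```python
# Before: the lambda only forwards its argument (PLW0108).
to_str = lambda x: str(x)

# After: bind the callable directly.
to_str = str
```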

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113602
Approved by: https://github.com/albanD
2023-11-14 20:06:48 +00:00
2a8a7425be Fix to wrap jagged dims for split() / split_with_sizes() (#113591)
Still need OpInfo-style tests to catch things like this.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113591
Approved by: https://github.com/soulitzer
2023-11-14 19:36:08 +00:00
ea39cc34f9 Refactor NestedTensor subclass to remove ragged_size from constructor (#113491)
This PR removes the need for passing `ragged_size` into the `NestedTensor` constructor. This was an artifact of fake-ification, where sometimes we needed the NT to have a symbolic singleton symint shape for the ragged dimension. The new way of achieving this is to also store mappings between fake / functional tensors -> symbolic symints in the ragged structure registry. Now the `NestedTensor` constructor can just query this registry for the `ragged_size`.

Old: `NestedTensor(values, offsets, *, ragged_size=None, **kwargs)`
New: `NestedTensor(values, offsets, **kwargs)`

This makes it possible to have a `_nested_view_from_values_offsets(values, offsets)` without needing to pass a `ragged_size`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113491
Approved by: https://github.com/ezyang, https://github.com/soulitzer
2023-11-14 19:32:21 +00:00
cdc9a05c89 cudagraph_trees.py: remove duplicate line (#113624)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113624
Approved by: https://github.com/eellison
2023-11-14 19:20:23 +00:00
149b9dfd04 [easy]Remove specialized value (#112252)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112252
Approved by: https://github.com/jansel
ghstack dependencies: #111196
2023-11-14 19:14:03 +00:00
b0805fa5d0 Support tensors as Dict keys (#111196)
This prepares the PR where we implement sets in terms of dicts.
To do so, rather than internally storing a dictionary that maps literals
to VariableTrackers, it stores (pretty much) a dictionary from VTs to VTs.
Keys are wrapped in an opaque internal class `_Hashable`. The class is
opaque on purpose so that it fails hard if it inadvertently leaks back
into user code.
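
A minimal sketch of the opaque-wrapper idea (the hashing strategy shown is an assumption, not Dynamo's exact implementation):
```python
class _Hashable:
    """Opaque dict key wrapping a VariableTracker-like object."""

    def __init__(self, vt):
        self.vt = vt

    def __hash__(self):
        # Assumption: the wrapped VT exposes some hashable identity.
        return hash(self.vt.as_key())

    def __eq__(self, other):
        # Fails hard (AttributeError) if an unwrapped key leaks in.
        return self.vt.as_key() == other.vt.as_key()
```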

We also found and fixed a number of latent bugs and inconsistencies
in the way dynamo checked what can be a dict key. More generally, we
make it much clearer what needs to be modified to add
a new supported key type to Dicts.
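A rough sketch of the opaque-wrapper idea described above (the method names on the wrapped tracker are hypothetical, not the actual dynamo internals):

```python
class _Hashable:
    """Opaque key wrapper: anything outside dynamo that touches it fails hard."""

    def __init__(self, vt) -> None:
        self.vt = vt  # the VariableTracker used as a dict key

    def __hash__(self) -> int:
        return hash(self.vt.underlying_key())  # hypothetical key extraction

    def __eq__(self, other) -> bool:
        return (
            isinstance(other, _Hashable)
            and self.vt.underlying_key() == other.vt.underlying_key()
        )
```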

Fixes https://github.com/pytorch/pytorch/issues/107595
Fixes https://github.com/pytorch/pytorch/issues/111603
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111196
Approved by: https://github.com/jansel
2023-11-14 19:14:03 +00:00
f22486b0fc [doc] scale parameter notation for torch.Tensor.cauchy_ is misleading (#113178)
The scale parameter notation currently used for `torch.Tensor.cauchy_` is misleading.
Sigma (σ) is usually used to denote the square root of the variance, and the variance is undefined for the Cauchy distribution.
Replace sigma (σ) with gamma (γ).
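For reference, the standard Cauchy density that motivates the γ notation (textbook definition, not taken from the PR):

```latex
f(x;\, x_0, \gamma) = \frac{1}{\pi \gamma \left[ 1 + \left( \frac{x - x_0}{\gamma} \right)^2 \right]}
```

Here γ is the half-width at half-maximum; the distribution has no finite mean or variance, so σ is the wrong symbol for its scale.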

Background: https://github.com/pytorch/pytorch/pull/37984#discussion_r1059551749

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113178
Approved by: https://github.com/mingxzhao, https://github.com/albanD
2023-11-14 18:55:42 +00:00
e6bffc6b87 Fix docstring errors in default_hooks.py, optimizer_overlap.py, checkpoint_wrapper.py, copy.py, benchmark_ddp_rpc.py, utils.py, dependency.py, phony.py, pipeline.py, checkpoint.py, worker.py, batchnorm.py, quantization.py (#113511)
Fixes #112645

Updated the files by fixing the docstring errors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113511
Approved by: https://github.com/weifengpy
2023-11-14 18:52:41 +00:00
3b80577212 [Memory Snapshot] Add timestamps to memory events collected in snapshots (#112266)
Summary: Use the same clock as the profiler to collect timestamps for when memory events occurred. Save these to the snapshot dicts as well, so that they can be stored with the raw memory events.

Test Plan:
CI

Observed that each trace entry now has a `time_us` field, and it is ascending. For example:
```
trace entry: {'action': 'free_requested', 'addr': 140366476918784, 'size': 8192, 'stream': 0, 'time_us': 1698326576864190}
trace entry: {'action': 'free_completed', 'addr': 140366476918784, 'size': 8192, 'stream': 0, 'time_us': 1698326576864190}
trace entry: {'action': 'free_requested', 'addr': 140366476936192, 'size': 8192, 'stream': 0, 'time_us': 1698326576864194}
trace entry: {'action': 'free_completed', 'addr': 140366476936192, 'size': 8192, 'stream': 0, 'time_us': 1698326576864194}
trace entry: {'action': 'free_requested', 'addr': 140366641430528, 'size': 8192000, 'stream': 0, 'time_us': 1698326576864205}
trace entry: {'action': 'free_completed', 'addr': 140366641430528, 'size': 8192000, 'stream': 0, 'time_us': 1698326576864205}
trace entry: {'action': 'free_requested', 'addr': 140366403571712, 'size': 4000, 'stream': 0, 'time_us': 1698326576864209}
trace entry: {'action': 'free_completed', 'addr': 140366403571712, 'size': 4000, 'stream': 0, 'time_us': 1698326576864209}
```

Differential Revision: D50602011

Pulled By: aaronenyeshi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112266
Approved by: https://github.com/zdevito
2023-11-14 18:48:59 +00:00
5465f2bb6c Revert "Improves comparison of state dicts for Checkpoint E2E Tests (#113181)"
This reverts commit 8f5fead86ea9a9eac85d20c6aee780e06ce04eb7.

Reverted https://github.com/pytorch/pytorch/pull/113181 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing distribute test in trunk 8f5fead86e with a not defined DTensor error ([comment](https://github.com/pytorch/pytorch/pull/113181#issuecomment-1810925052))
2023-11-14 18:42:40 +00:00
cyy
79e3833703 Enable clang-tidy in torch/csrc/quantized and some fixes (#113604)
This PR enables clang-tidy checks in torch/csrc/quantized/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113604
Approved by: https://github.com/Skylion007
2023-11-14 16:51:18 +00:00
14eb92cb43 [quant][pt2][be] Remove add/relu from conv-bn QAT pattern (#113006)
Summary: This commit significantly simplifies the QAT fusion
code for the `conv-bn` pattern by removing add and relu nodes
from the match and replacement patterns. This does not reduce
functionality; patterns like `conv-bn-relu`, `conv-bn-add`,
and `conv-bn-add-relu` are still supported. We simply do not
match these extra nodes, since there is actually no need to
replace them.

This has the additional benefit of reducing the number of
patterns being matched by 16x, since for each add and relu
variant of the `conv-bn` pattern there is also an in-place
variant. This also enables more flexible `conv-bn` pattern
matching in the future and keeps the number of patterns
more scalable.

One important change needed in this commit was to remove
the match filter that requires the input and output
activations to be quantized. This was necessary because
otherwise we would always expect q-dq nodes immediately
after the getitem node, instead of after the add or relu
nodes for example. This has another side benefit of
keeping QAT fusion flexible enough to support weight
only quantization.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113006
Approved by: https://github.com/jerryzh168
2023-11-14 16:08:37 +00:00
a7b75f586a [RELAND] Disallow skipping dynamo (#110222)
Previous discussion: https://github.com/pytorch/pytorch/pull/109476

In this PR, I made following additions to the original PR:
1) The unlifted graph module now runs the runtime assertions in its forward call.
2) When we retrace, we run the assertions to verify that the user is tracing the module with inputs consistent with the assumptions made during the first tracing. To do this, I create a new graph module type with a modified call method. The runtime assertions happen under torchdynamo.disable so that they just run directly in eager mode; we don't want them to be a traced part of the graph.
3) Both ep.module and capture_pre_autograd now return _UnliftedGraphModule.

Differential Revision: [D51078056](https://our.internmc.facebook.com/intern/diff/D51078056)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110222
Approved by: https://github.com/zhxchen17
2023-11-14 16:02:01 +00:00
8f5fead86e Improves comparison of state dicts for Checkpoint E2E Tests (#113181)
Addresses the following comment - https://github.com/pytorch/pytorch/pull/112541#discussion_r1380197424

Changes the checkpointing E2E test to compare a non-parallelized model against the distributed model after training, saving, and loading.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113181
Approved by: https://github.com/fegin
2023-11-14 14:54:40 +00:00
93372455a7 [2d] pass shape/stride during tensor unflatten (#113547)
As titled. Built on top of the work @wz337 enabled, this saves some
runtime CPU time when recreating DTensor parameters with the correct
shape/stride, and avoids issues with unevenly sharded parameters.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113547
Approved by: https://github.com/XilunWu
ghstack dependencies: #113323, #113324
2023-11-14 09:28:09 +00:00
7117bffff9 [funcol] a few optimizations to funcol (#113324)
Apply a few optimizations to funcol:

- For allgather on a non-0 dim, the resulting tensor already needs to access
its data in order to do torch.cat, so we sync-wait here so that we don't
need to go through ACT dispatch for the chunk + cat altogether (see the
sketch after this list)
- Add fast-return logic for aten.view, as it's a commonly hit op for
view-related ops
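A minimal sketch of the non-0-dim allgather pattern referenced above, in plain tensor ops (not the funcol implementation itself):

```python
import torch

def cat_gathered(gathered: torch.Tensor, world_size: int, dim: int) -> torch.Tensor:
    # All-gather stacks shards along dim 0; landing them on a non-0 dim
    # requires materialized data to chunk and re-cat, which is why the
    # result is waited on eagerly instead of deferred through ACT dispatch.
    if dim == 0:
        return gathered
    return torch.cat(torch.chunk(gathered, world_size, dim=0), dim=dim)
```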

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113324
Approved by: https://github.com/XilunWu
ghstack dependencies: #113323
2023-11-14 09:28:09 +00:00
b16e3b5373 [funcol] add two APIs: wait() and numpy() (#113323)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113323
Approved by: https://github.com/XilunWu, https://github.com/wz337, https://github.com/wconstab
2023-11-14 09:27:45 +00:00
a1e3c50165 A small fix for do_bench_using_profiling (#113611)
ATT: there are cases where multiple kernel invocations share the same kernel name, and key_averages() will wrongly average results across those different invocations. This fix uses cuda_time_total / n_repeat instead.
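A hedged sketch of the fixed arithmetic (function and variable names are mine; assumes `prof` is a completed profiler run covering `n_repeat` invocations):

```python
def time_per_iter_ms(prof, n_repeat: int) -> float:
    # key_averages() groups events by name, so distinct launches of
    # same-named kernels get merged; summing cuda_time_total over all
    # entries and dividing by the repeat count avoids that grouping.
    total_us = sum(evt.cuda_time_total for evt in prof.key_averages())
    return total_us / n_repeat / 1000.0
```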

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113611
Approved by: https://github.com/chenyang78
2023-11-14 06:31:22 +00:00
c21320b3b1 CPU Publish: Fix Assign device error, when module has multiple devices (#109149) (#113509)
Summary:
new version of this: https://www.internalfb.com/diff/D49110166?dst_version_fbid=252052334533986

Fix an assign-device error when a module has multiple devices.
If fc_fp16_quantization is enabled for a CPU model and the module REMOTE_OTHER has multiple devices, {device(type='meta'), device(type='cpu')}, we fail on this assertion in fbcode/caffe2/torch/ao/quantization/fx/utils.py:232:
    assert len(devices) <= 1, (
Since CPU models work on CPU devices, a condition was added before the assertion: if CPU is in the module's list of devices, set the device to CPU.
Please see debug details:
https://docs.google.com/document/d/1pMPCeJyMPA15NhFc2uAyNDkS9azR40uaNyOP0DIgHjU/edit
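A hedged sketch of the added condition (not the exact diff):

```python
import torch

def narrow_devices(devices: set) -> set:
    # A CPU model may also report a 'meta' device; prefer CPU so the
    # single-device assertion below still holds.
    cpu = torch.device("cpu")
    if cpu in devices:
        devices = {cpu}
    assert len(devices) <= 1
    return devices
```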

Test Plan:
AIMP_DISAGG_CPU=true buck run mode/opt -c python.package_style=inplace -c fbcode.enable_gpu_sections=true lego/scripts:lego_cli -- run-locally --model_entity_id 959168967 --config_version 28 --publish_context OFFLINE_PUBLISH --lego_pipeline aiplatform.modelstore.model_generation.lego.lego_pipeline_builder.gmpp_lego_pipeline --gmpp_config '{"gmpp_pipeline_descriptor": "aiplatform.modelstore.model_generation.v1.ads_pipelines.aimp_pyper_pipeline.model_generation_pipeline", "worker_process_number":12, "worker_thread_per_process_number": 6, "use_work_assignment": true}' 2>&1 | tee /tmp/gmpp_lc.txt
Snapshot:
https://www.internalfb.com/manifold/explorer/ads_storage_fblearner/tree/user/facebook/fblearner/predictor/959168967/47

Differential Revision: D51226114

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113509
Approved by: https://github.com/jerryzh168
2023-11-14 06:15:32 +00:00
b3a76ccc12 [BE] Make legacy type storage warning point to the caller (#113601)
The `@classproperty` decorator adds another wrapper, so a warning with the default stacklevel (2) would always point to the wrapper implementation rather than at the call site.

For example, before this change following code
```python
import torch
print(torch.FloatStorage.dtype)
```
will produce a non-actionable warning:
```
/Users/nshulga/git/pytorch/pytorch/torch/_utils.py:836: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()

```
But after the change warning turns into:
```
/Users/nshulga/test/bar.py:2: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  print(torch.FloatStorage.dtype)
```

Discovered while reading https://github.com/pytorch/pytorch/issues/109108
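The mechanics, as a stdlib-only sketch (the PR's exact plumbing may differ): each wrapper frame pushes the reported location one level away from user code, so the warning has to compensate with a larger `stacklevel`.

```python
import warnings

def _warn_typed_storage_removal() -> None:
    # stacklevel=2 would blame the @classproperty wrapper frame; the extra
    # wrapper means stacklevel=3 is needed to point at the actual caller.
    warnings.warn("TypedStorage is deprecated", UserWarning, stacklevel=3)
```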

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113601
Approved by: https://github.com/kit1980
2023-11-14 04:37:57 +00:00
ffc3731dc4 Update TensorBase.to()'s' signature; create {guards,compiled_autograd}.pyi (#113536)
I had to explicitly import submodules in torch/_C/_dynamo/__init__.pyi
because mypy doesn't seem to understand empty `__init__.py[i]` files.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113536
Approved by: https://github.com/ezyang
ghstack dependencies: #113412, #113535
2023-11-14 04:31:12 +00:00
5b95715bc0 Make {Tracing,Compile}Context.get() return non-optional type (#113535)
They are used in many contexts that don't actually check if the returned
type is `None`. I have also created `try_get()` for the cases where we
do actually want an Optional type returned.
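A short sketch of the resulting contract (class body is illustrative, not the actual torch code):

```python
from typing import Optional

class TracingContext:
    _current: Optional["TracingContext"] = None

    @classmethod
    def get(cls) -> "TracingContext":
        # Non-optional: callers that assume a live context fail loudly here
        # instead of hitting AttributeError on None somewhere downstream.
        assert cls._current is not None, "no TracingContext active"
        return cls._current

    @classmethod
    def try_get(cls) -> Optional["TracingContext"]:
        return cls._current  # for callers that genuinely handle None
```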

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113535
Approved by: https://github.com/ezyang
ghstack dependencies: #113412
2023-11-14 04:31:12 +00:00
d561654d99 [ONNX] Support more sympy operations in fx-onnx exporter (#112758)
Fix https://github.com/microsoft/onnx-converters-private/issues/190

This PR retires the built-in function mapping by adding built-in ops into torchlib (https://github.com/microsoft/onnxscript/pull/1135), and provides runtime tests to guard the operation conversion.

More built-in ops are supported in torchlib as well.

~~NOTE: `native_batch_norm` regression is caused by https://github.com/microsoft/onnxscript/issues/1140. Will fix it before I merge this.~~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112758
Approved by: https://github.com/justinchuby, https://github.com/thiagocrepaldi
2023-11-14 03:40:48 +00:00
78ae49d104 [vision hash update] update the pinned vision hash (#113598)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113598
Approved by: https://github.com/pytorchbot, https://github.com/PaliC
2023-11-14 03:35:25 +00:00
567db94d87 Add markDynamoStrictTest (#112768)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112768
Approved by: https://github.com/zou3519
2023-11-14 02:52:12 +00:00
edd967fe78 Add testing for foreach scalar Tensor overloads in inductor (#111600)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111600
Approved by: https://github.com/mlazos
2023-11-14 02:05:06 +00:00
d94bfaff2e Add TorchFix to the CI (#113403)
Enable flake8 plugin for https://github.com/pytorch/test-infra/tree/main/tools/torchfix - TorchFix 0.1.1.
Disable TorchFix codes that don't make sense for PyTorch itself.
Update deprecated TorchVision APIs to make TorchFix pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113403
Approved by: https://github.com/Skylion007, https://github.com/malfet
2023-11-14 01:26:06 +00:00
e1c872e009 Add optimal triton kernel parameters to bsr_dense_mm and scatter_mm for bfloat16 and float32 dtypes (#113553)
As in the title.

This PR is a follow-up to PR https://github.com/pytorch/pytorch/pull/112737 to address bfloat16 and float32 dtype cases. The performance increase is as follows (`NVIDIA A100-SXM4-80GB`):

- bsr_scatter_mm and bfloat16
  - for blocksize 16x16, the average/maximum speed up is about 29/75 %.
  - for blocksize 32x32, the average/maximum speed up is about 23/58 %.
  - for blocksize 64x64, the average/maximum speed up is about 27/66 %.
  - for blocksize 128x128, the average/maximum speed up is about 33/72 %.
- bsr_dense_mm and bfloat16
  - for blocksize 16x16, the average/maximum speed up is about 47/61 %.
  - for blocksize 32x32, the average/maximum speed up is about 29/43 %.
  - for blocksize 64x64, the average/maximum speed up is about 21/41 %.
  - for blocksize 128x128, the average/maximum speed up is about 12/29 %.
- bsr_dense_mm and  float32
  - for blocksize 16x16, the average/maximum speed up is about 35/49 %.
  - for blocksize 32x32, the average/maximum speed up is about 2/5 %.
  - for blocksize 64x64, the average/maximum speed up is about 2/21 %.
  - for blocksize 128x128, the average/maximum speed up is about 79/84 %.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113553
Approved by: https://github.com/cpuhrsch
2023-11-14 00:47:59 +00:00
cyy
ff82dcd8fa [2/N] Enable clang-tidy checks in torch/csrc/profiler (#113439)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113439
Approved by: https://github.com/Skylion007
2023-11-14 00:39:54 +00:00
a43c757275 Fixed error with cuda_ver in cpp_extension.py (#113555)
Reported in 71ca42787f (r132390833)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113555
Approved by: https://github.com/ezyang
2023-11-14 00:12:22 +00:00
4b09b08d2e Fix recompilation issue with content store (#113533)
While running the accuracy minifier, I was getting the error:
```
NotImplementedError("xor_sum only implemented with inductor")
```

The logs showed that the cache limit was exceeded and it was falling back to
eager mode, which doesn't work for this function. The cache failures were due to
the code guarding on the id of the function being compiled, which in this case is
a closure that gets re-created for each call, so the guard always fails.

This fixes the issue by making the storage hash kernel a global function and
working around the dynamo dependency with the `lazy_compile` helper, which defers
the `torch.compile` call to the first invocation.
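A hedged sketch of the `lazy_compile` idea (the real helper may differ): wrap the function so `torch.compile` only runs on the first call, keeping module import free of the dynamo dependency.

```python
import functools
import torch

def lazy_compile(fn):
    """Defer torch.compile(fn) until the wrapper is first invoked."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if wrapper._compiled is None:
            wrapper._compiled = torch.compile(fn)
        return wrapper._compiled(*args, **kwargs)
    wrapper._compiled = None
    return wrapper
```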
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113533
Approved by: https://github.com/Skylion007
2023-11-13 23:58:13 +00:00
ad06e9f060 Support logging aliases to list of modules (#113567)
When SymNode was refactored into its own module, this broke logging for this file, as the `dynamic` alias no longer covered it. This PR adds supports for an alias to point to multiple qualified module names. To drive the refactor, I renamed `log_alias_to_log_qname` to `log_alias_to_log_qnames` and then audited all use sites. I invite you to do so as well.

For good measure, I also add dynamic to dynamo, so that I always get dynamic logs when dynamo is enabled. Empirically this will be helpful because people keep sending me dynamo debug logs that don't have dynamic logs.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113567
Approved by: https://github.com/Skylion007, https://github.com/lezcano, https://github.com/mlazos
ghstack dependencies: #113566
2023-11-13 23:35:18 +00:00
92ebf74ac1 Refactor loggers to use NOTSET when not set by user (#113566)
Previously, the way our logging system worked was that for every registered log, we would explicit set a log level for it. This would lead to unintuitive behavior when you had multiple overlapping loggers, e.g., from the module hierarchy. Specifically, if you had `TORCH_LOGS=torch`, this would not actually set the logging level for torch._dynamo to be INFO, because the default log level is WARNING, and because torch._dynamo has a registered logger 'dynamo' this would end up setting the level on torch._dynamo to be WARNING, thereby overriding the level of the parent module torch. The 'all' logger did not display this behavior, but only because it was special cased to directly modify the default log level of all other loggers (and so this would not work for any sub-hierarchies).

This PR refactors our code into a much more logical setup using NOTSET. Instead of setting the level of all loggers to some level, we instead default all loggers to NOTSET, unless a user explicitly requested logging from some logger. This means that if we have some logger which isn't explicitly mentioned by the user, parent loggers now have a chance to influence their log behavior. With this, I can eliminate the 'all' special case; 'all' really just means 'torch'. (I keep special handling directing all to torch for backwards compatibility, though arguably I can probably just turn all into an alias.)
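A stdlib-only sketch of the NOTSET behavior this relies on (module names illustrative): a child logger left at NOTSET defers to its nearest explicitly-set ancestor instead of clobbering it.

```python
import logging

logging.getLogger("torch").setLevel(logging.INFO)  # user asked for torch
child = logging.getLogger("torch._dynamo")         # never explicitly set
child.setLevel(logging.NOTSET)                     # the new default
# getEffectiveLevel() walks up the hierarchy past NOTSET loggers:
assert child.getEffectiveLevel() == logging.INFO
```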

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113566
Approved by: https://github.com/mlazos, https://github.com/Skylion007
2023-11-13 23:35:18 +00:00
54493fe8c4 Add support for torch.Generator type in TorchScript (#110413)
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_` (see the usage sketch below)
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)
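A short usage sketch of the new `generator` argument on the init functions after this change (eager mode):

```python
import torch

g = torch.Generator().manual_seed(0)
w = torch.empty(3, 5)
torch.nn.init.uniform_(w, a=0.0, b=1.0, generator=g)  # reproducible init
```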

CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/glebk-cerebras, https://github.com/davidberard98
2023-11-13 23:18:14 +00:00
3eacdaf1b3 [HigherOrderOp] add pytree operands tests for cond (#112661)
This is a follow-up to #111611. After this PR, we allow pytrees with tensor-only leaves as operands of the branches.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112661
Approved by: https://github.com/zou3519
2023-11-13 23:09:46 +00:00
68278cf7a8 [dynamo] Initialize tensor_weakref_to_sizes_strides with a weak dict (#113412)
Spotted while working on getting output_graph.py to typecheck.

The type hint indicates that it was intended to be initialized with a
WeakIdKeyDictionary, but the actual runtime value was a regular dict.
Not sure if there's some kind of test we should add for this fix.
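A small sketch of why the distinction matters (using `torch.utils.weak`, which backs the intended type):

```python
import torch
from torch.utils.weak import WeakIdKeyDictionary

# Keys are held by tensor identity, weakly: entries are dropped when the
# tensor dies, which a plain dict (the buggy initializer) never does.
cache = WeakIdKeyDictionary()
t = torch.randn(2)
cache[t] = (t.size(), t.stride())
del t  # the cache entry is now eligible for cleanup
```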

Looks like the code was originally added in
https://github.com/pytorch/pytorch/pull/100128.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113412
Approved by: https://github.com/Skylion007, https://github.com/voznesenskym
ghstack dependencies: #113413, #113518, #113519
2023-11-13 22:53:47 +00:00
6ed20af10e [dtensor] refactor op dispatch and fix is_same_size/equal (#112927)
torch.equal/is_same_size currently skip sharding prop and directly do
local tensor compute, which is wrong. For these two ops:

- torch.equal: should not skip sharding prop; the two DTensors need to
have the SAME sharding before comparing local shard values (sketched
below)
- torch.is_same_size: needs to completely skip both sharding prop and
local compute

This PR refactors the existing op_dispatch into a class instance
so that we can do custom op handling, then fixes both torch.equal and
torch.is_same_size.
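A hedged sketch of the torch.equal rule above, expressed outside the dispatcher (not the actual handler code):

```python
import torch
from torch.distributed._tensor import DTensor

def dtensor_equal(a: DTensor, b: DTensor) -> bool:
    # Force identical sharding before comparing local shards.
    if a.placements != b.placements:
        b = b.redistribute(device_mesh=a.device_mesh, placements=a.placements)
    return torch.equal(a.to_local(), b.to_local())
```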

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112927
Approved by: https://github.com/fduwjj, https://github.com/XilunWu
2023-11-13 22:46:31 +00:00
9062e429db Fixed docstring errors in torch/nn/functional.py (Docathon H2) (#112856)
Fixes #112597
### Output:
**BEFORE:**
```functional.py:1 at module level:
        D400: First line should end with a period (not 'e')
functional.py:438 in public function `fractional_max_pool2d_with_indices`:
        D400: First line should end with a period (not ')')
functional.py:537 in public function `fractional_max_pool3d_with_indices`:
        D400: First line should end with a period (not ')')
functional.py:646 in public function `max_pool1d_with_indices`:
        D400: First line should end with a period (not ')')
functional.py:732 in public function `max_pool2d_with_indices`:
        D400: First line should end with a period (not ')')
functional.py:818 in public function `max_pool3d_with_indices`:
        D400: First line should end with a period (not ')')
functional.py:932 in public function `max_unpool1d`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
functional.py:968 in public function `max_unpool2d`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
functional.py:1000 in public function `max_unpool3d`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
functional.py:1031 in public function `lp_pool2d`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:1031 in public function `lp_pool2d`:
        D400: First line should end with a period (not 'f')
functional.py:1031 in public function `lp_pool2d`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
functional.py:1056 in public function `lp_pool1d`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:1056 in public function `lp_pool1d`:
        D400: First line should end with a period (not 'f')
functional.py:1056 in public function `lp_pool1d`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
functional.py:1077 in public function `adaptive_max_pool1d_with_indices`:
        D400: First line should end with a period (not ')')
functional.py:1119 in public function `adaptive_max_pool2d_with_indices`:
        D400: First line should end with a period (not ')')
functional.py:1163 in public function `adaptive_max_pool3d_with_indices`:
        D400: First line should end with a period (not ')')
functional.py:1220 in public function `adaptive_avg_pool2d`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:1220 in public function `adaptive_avg_pool2d`:
        D400: First line should end with a period (not 'f')
functional.py:1220 in public function `adaptive_avg_pool2d`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
functional.py:1237 in public function `adaptive_avg_pool3d`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:1237 in public function `adaptive_avg_pool3d`:
        D400: First line should end with a period (not 'f')
functional.py:1237 in public function `adaptive_avg_pool3d`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
functional.py:1255 in public function `dropout`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:1255 in public function `dropout`:
        D400: First line should end with a period (not 't')
functional.py:1275 in public function `alpha_dropout`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
functional.py:1287 in public function `dropout1d`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:1287 in public function `dropout1d`:
        D400: First line should end with a period (not ',')
functional.py:1325 in public function `dropout2d`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:1325 in public function `dropout2d`:
        D400: First line should end with a period (not ',')
functional.py:1369 in public function `dropout3d`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:1369 in public function `dropout3d`:
        D400: First line should end with a period (not ',')
functional.py:1408 in public function `feature_alpha_dropout`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:1408 in public function `feature_alpha_dropout`:
        D400: First line should end with a period (not ',')
functional.py:1466 in public function `relu`:
        D400: First line should end with a period (not 'r')
functional.py:1466 in public function `relu`:
        D402: First line should not be the function's "signature"
functional.py:1491 in public function `glu`:
        D400: First line should end with a period (not 'r')
functional.py:1491 in public function `glu`:
        D402: First line should not be the function's "signature"
functional.py:1516 in public function `hardtanh`:
        D400: First line should end with a period (not 'r')
functional.py:1516 in public function `hardtanh`:
        D402: First line should not be the function's "signature"
functional.py:1542 in public function `relu6`:
        D400: First line should end with a period (not 'r')
functional.py:1542 in public function `relu6`:
        D402: First line should not be the function's "signature"
functional.py:1558 in public function `elu`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
functional.py:1582 in public function `selu`:
        D400: First line should end with a period (not 'r')
functional.py:1582 in public function `selu`:
        D402: First line should not be the function's "signature"
functional.py:1611 in public function `celu`:
        D400: First line should end with a period (not 'r')
functional.py:1611 in public function `celu`:
        D402: First line should not be the function's "signature"
functional.py:1638 in public function `leaky_relu`:
        D400: First line should end with a period (not 'r')
functional.py:1638 in public function `leaky_relu`:
        D402: First line should not be the function's "signature"
functional.py:1688 in public function `rrelu`:
        D400: First line should end with a period (not 'r')
functional.py:1688 in public function `rrelu`:
        D402: First line should not be the function's "signature"
functional.py:1755 in public function `tanhshrink`:
        D400: First line should end with a period (not 'r')
functional.py:1755 in public function `tanhshrink`:
        D402: First line should not be the function's "signature"
functional.py:1767 in public function `softsign`:
        D400: First line should end with a period (not 'r')
functional.py:1767 in public function `softsign`:
        D402: First line should not be the function's "signature"
functional.py:1806 in public function `softmin`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
functional.py:1832 in public function `softmax`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
functional.py:1868 in public function `gumbel_softmax`:
        D401: First line should be in imperative mood (perhaps 'Sample', not 'Samples')
functional.py:1930 in public function `log_softmax`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
functional.py:1969 in public function `tanh`:
        D400: First line should end with a period (not 'r')
functional.py:1969 in public function `tanh`:
        D402: First line should not be the function's "signature"
functional.py:1980 in public function `sigmoid`:
        D400: First line should end with a period (not 'r')
functional.py:1980 in public function `sigmoid`:
        D402: First line should not be the function's "signature"
functional.py:1990 in public function `hardsigmoid`:
        D400: First line should end with a period (not 'n')
functional.py:1990 in public function `hardsigmoid`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
functional.py:2057 in public function `silu`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:2057 in public function `silu`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
functional.py:2081 in public function `mish`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:2081 in public function `mish`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
functional.py:2100 in public function `hardswish`:
        D400: First line should end with a period (not ':')
functional.py:2100 in public function `hardswish`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
functional.py:2136 in public function `embedding`:
        D202: No blank lines allowed after function docstring (found 1)
functional.py:2136 in public function `embedding`:
        D401: First line should be in imperative mood; try rephrasing (found 'A')
functional.py:2254 in public function `embedding_bag`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:2254 in public function `embedding_bag`:
        D400: First line should end with a period (not 'e')
functional.py:2254 in public function `embedding_bag`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
functional.py:2462 in public function `batch_norm`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
functional.py:2507 in public function `instance_norm`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:2507 in public function `instance_norm`:
        D400: First line should end with a period (not 'a')
functional.py:2507 in public function `instance_norm`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
functional.py:2540 in public function `layer_norm`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
functional.py:2554 in public function `group_norm`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
functional.py:2567 in public function `local_response_norm`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:2567 in public function `local_response_norm`:
        D400: First line should end with a period (not 'f')
functional.py:2567 in public function `local_response_norm`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
functional.py:2611 in public function `ctc_loss`:
        D401: First line should be in imperative mood; try rephrasing (found 'The')
functional.py:2679 in public function `nll_loss`:
        D401: First line should be in imperative mood; try rephrasing (found 'The')
functional.py:2895 in public function `kl_div`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:2895 in public function `kl_div`:
        D400: First line should end with a period (not 's')
functional.py:2895 in public function `kl_div`:
        D401: First line should be in imperative mood; try rephrasing (found 'The')
functional.py:2978 in public function `cross_entropy`:
        D401: First line should be in imperative mood; try rephrasing (found 'This')
functional.py:3069 in public function `binary_cross_entropy`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:3069 in public function `binary_cross_entropy`:
        D400: First line should end with a period (not 't')
functional.py:3069 in public function `binary_cross_entropy`:
        D401: First line should be in imperative mood; try rephrasing (found 'Function')
functional.py:3139 in public function `binary_cross_entropy_with_logits`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:3139 in public function `binary_cross_entropy_with_logits`:
        D400: First line should end with a period (not 't')
functional.py:3139 in public function `binary_cross_entropy_with_logits`:
        D401: First line should be in imperative mood; try rephrasing (found 'Function')
functional.py:3211 in public function `smooth_l1_loss`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:3211 in public function `smooth_l1_loss`:
        D400: First line should end with a period (not 'e')
functional.py:3211 in public function `smooth_l1_loss`:
        D401: First line should be in imperative mood; try rephrasing (found 'Function')
functional.py:3251 in public function `huber_loss`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:3251 in public function `huber_loss`:
        D400: First line should end with a period (not 'e')
functional.py:3251 in public function `huber_loss`:
        D401: First line should be in imperative mood; try rephrasing (found 'Function')
functional.py:3282 in public function `l1_loss`:
        D400: First line should end with a period (not 'r')
functional.py:3282 in public function `l1_loss`:
        D402: First line should not be the function's "signature"
functional.py:3313 in public function `mse_loss`:
        D400: First line should end with a period (not 'r')
functional.py:3313 in public function `mse_loss`:
        D402: First line should not be the function's "signature"
functional.py:3346 in public function `margin_ranking_loss`:
        D400: First line should end with a period (not 'r')
functional.py:3346 in public function `margin_ranking_loss`:
        D402: First line should not be the function's "signature"
functional.py:3382 in public function `hinge_embedding_loss`:
        D400: First line should end with a period (not 'r')
functional.py:3382 in public function `hinge_embedding_loss`:
        D402: First line should not be the function's "signature"
functional.py:3411 in public function `multilabel_margin_loss`:
        D400: First line should end with a period (not 'r')
functional.py:3411 in public function `multilabel_margin_loss`:
        D402: First line should not be the function's "signature"
functional.py:3439 in public function `soft_margin_loss`:
        D400: First line should end with a period (not 'r')
functional.py:3439 in public function `soft_margin_loss`:
        D402: First line should not be the function's "signature"
functional.py:3462 in public function `multilabel_soft_margin_loss`:
        D400: First line should end with a period (not 'r')
functional.py:3462 in public function `multilabel_soft_margin_loss`:
        D402: First line should not be the function's "signature"
functional.py:3510 in public function `cosine_embedding_loss`:
        D400: First line should end with a period (not 'r')
functional.py:3510 in public function `cosine_embedding_loss`:
        D402: First line should not be the function's "signature"
functional.py:3543 in public function `multi_margin_loss`:
        D400: First line should end with a period (not 'r')
functional.py:3543 in public function `multi_margin_loss`:
        D402: First line should not be the function's "signature"
functional.py:3708 in public function `upsample` (skipping F811,B950):
        D103: Missing docstring in public function
functional.py:3713 in public function `upsample` (skipping F811,B950):
        D103: Missing docstring in public function
functional.py:3718 in public function `upsample` (skipping F811):
        D205: 1 blank line required between summary line and description (found 0)
functional.py:3718 in public function `upsample` (skipping F811):
        D400: First line should end with a period (not 'n')
functional.py:3783 in private function `_is_integer`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:3794 in public function `interpolate` (skipping F811,B950):
        D103: Missing docstring in public function
functional.py:3799 in public function `interpolate` (skipping F811,B950):
        D103: Missing docstring in public function
functional.py:3804 in public function `interpolate` (skipping F811,B950):
        D103: Missing docstring in public function
functional.py:3809 in public function `interpolate` (skipping F811):
        D103: Missing docstring in public function
functional.py:3821 in public function `interpolate` (skipping F811,B950):
        D205: 1 blank line required between summary line and description (found 0)
functional.py:3821 in public function `interpolate` (skipping F811,B950):
        D400: First line should end with a period (not 'n')
functional.py:4062 in public function `upsample_nearest` (skipping F811):
        D103: Missing docstring in public function
functional.py:4067 in public function `upsample_nearest` (skipping F811):
        D103: Missing docstring in public function
functional.py:4100 in public function `upsample_bilinear` (skipping F811):
        D103: Missing docstring in public function
functional.py:4107 in public function `upsample_bilinear` (skipping F811):
        D103: Missing docstring in public function
functional.py:4114 in public function `upsample_bilinear` (skipping F811):
        D103: Missing docstring in public function
functional.py:4121 in public function `upsample_bilinear` (skipping F811):
        D103: Missing docstring in public function
functional.py:4174 in public function `grid_sample`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:4174 in public function `grid_sample`:
        D400: First line should end with a period (not 'e')
functional.py:4315 in public function `affine_grid`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:4315 in public function `affine_grid`:
        D400: First line should end with a period (not 'f')
functional.py:4315 in public function `affine_grid`:
        D401: First line should be in imperative mood (perhaps 'Generate', not 'Generates')
functional.py:4608 in public function `triplet_margin_loss`:
        D200: One-line docstring should fit on one line with quotes (found 3)
functional.py:4608 in public function `triplet_margin_loss`:
        D400: First line should end with a period (not 's')
functional.py:4643 in public function `triplet_margin_with_distance_loss`:
        D200: One-line docstring should fit on one line with quotes (found 3)
functional.py:4705 in public function `normalize`:
        D401: First line should be in imperative mood (perhaps 'Perform', not 'Performs')
functional.py:4733 in public function `assert_int_or_pair`:
        D103: Missing docstring in public function
functional.py:4743 in public function `unfold`:
        D401: First line should be in imperative mood (perhaps 'Extract', not 'Extracts')
functional.py:4773 in public function `fold`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:4773 in public function `fold`:
        D400: First line should end with a period (not 'g')
functional.py:4773 in public function `fold`:
        D401: First line should be in imperative mood (perhaps 'Combine', not 'Combines')
functional.py:4800 in private function `_in_projection_packed`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:4800 in private function `_in_projection_packed`:
        D401: First line should be in imperative mood (perhaps 'Perform', not 'Performs')
functional.py:4867 in private function `_in_projection`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:4867 in private function `_in_projection`:
        D400: First line should end with a period (not 'y')
functional.py:4867 in private function `_in_projection`:
        D401: First line should be in imperative mood (perhaps 'Perform', not 'Performs')
functional.py:5128 in public function `multi_head_attention_forward`:
        D205: 1 blank line required between summary line and description (found 0)
functional.py:5128 in public function `multi_head_attention_forward`:
        D400: First line should end with a period (not ':')
160
```

**AFTER:**

```
functional.py:3709 in public function `upsample` (skipping F811,B950):
        D103: Missing docstring in public function
functional.py:3714 in public function `upsample` (skipping F811,B950):
        D103: Missing docstring in public function
functional.py:3798 in public function `interpolate` (skipping F811,B950):
        D103: Missing docstring in public function
functional.py:3803 in public function `interpolate` (skipping F811,B950):
        D103: Missing docstring in public function
functional.py:3808 in public function `interpolate` (skipping F811,B950):
        D103: Missing docstring in public function
functional.py:3813 in public function `interpolate` (skipping F811):
        D103: Missing docstring in public function
functional.py:4068 in public function `upsample_nearest` (skipping F811):
        D103: Missing docstring in public function
functional.py:4073 in public function `upsample_nearest` (skipping F811):
        D103: Missing docstring in public function
functional.py:4106 in public function `upsample_bilinear` (skipping F811):
        D103: Missing docstring in public function
functional.py:4113 in public function `upsample_bilinear` (skipping F811):
        D103: Missing docstring in public function
functional.py:4120 in public function `upsample_bilinear` (skipping F811):
        D103: Missing docstring in public function
functional.py:4127 in public function `upsample_bilinear` (skipping F811):
        D103: Missing docstring in public function
functional.py:4742 in public function `assert_int_or_pair`:
        D103: Missing docstring in public function
13
```

The file contained several docstring errors. I have fixed all of them (hopefully) and have tried to improve the overall readability of the code. For the most part, I have included relevant descriptions of the functions (referencing the official PyTorch docs). In cases where a function is purely mathematical or it is difficult to give a one-line description, I have just included references.

For testing, I relied on my local system and created a separate file. For the final edits, I directly changed the contents of the forked repo, as already visible.

Kindly review @svekars @subramen @kit1980

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112856
Approved by: https://github.com/kit1980
2023-11-13 22:16:49 +00:00
a2552d5521 Fixed docstring errors inside torch/cuda/ and torch/optim/ (Docathon H2) (#112964)
Fixes #112592
1) **File: torch/cuda/random.py**
```
Before:
/content/pytorch/torch/cuda/random.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/cuda/random.py:21 in public function `get_rng_state`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/content/pytorch/torch/cuda/random.py:43 in public function `get_rng_state_all`:
        D202: No blank lines allowed after function docstring (found 1)
/content/pytorch/torch/cuda/random.py:43 in public function `get_rng_state_all`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/content/pytorch/torch/cuda/random.py:54 in public function `set_rng_state`:
        D401: First line should be in imperative mood (perhaps 'Set', not 'Sets')
/content/pytorch/torch/cuda/random.py:79 in public function `set_rng_state_all`:
        D208: Docstring is over-indented
/content/pytorch/torch/cuda/random.py:79 in public function `set_rng_state_all`:
        D209: Multi-line docstring closing quotes should be on a separate line
/content/pytorch/torch/cuda/random.py:79 in public function `set_rng_state_all`:
        D401: First line should be in imperative mood (perhaps 'Set', not 'Sets')
/content/pytorch/torch/cuda/random.py:79 in public function `set_rng_state_all`:
        D414: Section has no content ('Args')
/content/pytorch/torch/cuda/random.py:88 in public function `manual_seed`:
        D205: 1 blank line required between summary line and description (found 0)
/content/pytorch/torch/cuda/random.py:88 in public function `manual_seed`:
        D401: First line should be in imperative mood (perhaps 'Set', not 'Sets')
/content/pytorch/torch/cuda/random.py:110 in public function `manual_seed_all`:
        D205: 1 blank line required between summary line and description (found 0)
/content/pytorch/torch/cuda/random.py:110 in public function `manual_seed_all`:
        D401: First line should be in imperative mood (perhaps 'Set', not 'Sets')
/content/pytorch/torch/cuda/random.py:128 in public function `seed`:
        D205: 1 blank line required between summary line and description (found 0)
/content/pytorch/torch/cuda/random.py:128 in public function `seed`:
        D401: First line should be in imperative mood (perhaps 'Set', not 'Sets')
/content/pytorch/torch/cuda/random.py:146 in public function `seed_all`:
        D205: 1 blank line required between summary line and description (found 0)
/content/pytorch/torch/cuda/random.py:146 in public function `seed_all`:
        D401: First line should be in imperative mood (perhaps 'Set', not 'Sets')
/content/pytorch/torch/cuda/random.py:167 in public function `initial_seed`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
18
```

```
After:
/content/pytorch/torch/cuda/random.py:1 at module level:
        D100: Missing docstring in public module
1

```
2) **File: torch/cuda/amp/autocast_mode.py**
```
Before: /content/pytorch/torch/cuda/amp/autocast_mode.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/cuda/amp/autocast_mode.py:18 in public class `autocast`:
        D205: 1 blank line required between summary line and description (found 0)
/content/pytorch/torch/cuda/amp/autocast_mode.py:23 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/cuda/amp/autocast_mode.py:38 in public method `__enter__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/cuda/amp/autocast_mode.py:44 in public method `__exit__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/cuda/amp/autocast_mode.py:49 in public method `__call__`:
        D102: Missing docstring in public method
/content/pytorch/torch/cuda/amp/autocast_mode.py:90 in public function `custom_fwd`:
        D205: 1 blank line required between summary line and description (found 0)
/content/pytorch/torch/cuda/amp/autocast_mode.py:90 in public function `custom_fwd`:
        D400: First line should end with a period (not 'f')
/content/pytorch/torch/cuda/amp/autocast_mode.py:90 in public function `custom_fwd`:
        D401: First line should be in imperative mood; try rephrasing (found 'Helper')
/content/pytorch/torch/cuda/amp/autocast_mode.py:130 in public function `custom_bwd`:
        D205: 1 blank line required between summary line and description (found 0)
/content/pytorch/torch/cuda/amp/autocast_mode.py:130 in public function `custom_bwd`:
        D400: First line should end with a period (not 'f')
/content/pytorch/torch/cuda/amp/autocast_mode.py:130 in public function `custom_bwd`:
        D401: First line should be in imperative mood; try rephrasing (found 'Helper')
12
```
```
After:
/content/pytorch/torch/cuda/amp/autocast_mode.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/cuda/amp/autocast_mode.py:23 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/cuda/amp/autocast_mode.py:38 in public method `__enter__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/cuda/amp/autocast_mode.py:44 in public method `__exit__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/cuda/amp/autocast_mode.py:49 in public method `__call__`:
        D102: Missing docstring in public method
5
```

3)  **File: torch/cuda/amp/grad_scaler.py**
```
Before: /content/pytorch/torch/cuda/amp/grad_scaler.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/cuda/amp/grad_scaler.py:17 in private class `_MultiDeviceReplicator`:
        D200: One-line docstring should fit on one line with quotes (found 3)
/content/pytorch/torch/cuda/amp/grad_scaler.py:39 in public class `OptState`:
        D101: Missing docstring in public class
/content/pytorch/torch/cuda/amp/grad_scaler.py:50 in public class `GradScaler`:
        D205: 1 blank line required between summary line and description (found 0)
/content/pytorch/torch/cuda/amp/grad_scaler.py:50 in public class `GradScaler`:
        D400: First line should end with a period (not 'g')
/content/pytorch/torch/cuda/amp/grad_scaler.py:115 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/cuda/amp/grad_scaler.py:354 in public method `step`:
        D400: First line should end with a period (not ':')
/content/pytorch/torch/cuda/amp/grad_scaler.py:456 in public method `update`:
        D401: First line should be in imperative mood (perhaps 'Update', not 'Updates')
/content/pytorch/torch/cuda/amp/grad_scaler.py:529 in public method `get_scale`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/content/pytorch/torch/cuda/amp/grad_scaler.py:544 in public method `get_growth_factor`:
        D200: One-line docstring should fit on one line with quotes (found 3)
/content/pytorch/torch/cuda/amp/grad_scaler.py:544 in public method `get_growth_factor`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/content/pytorch/torch/cuda/amp/grad_scaler.py:550 in public method `set_growth_factor`:
        D205: 1 blank line required between summary line and description (found 0)
/content/pytorch/torch/cuda/amp/grad_scaler.py:550 in public method `set_growth_factor`:
        D400: First line should end with a period (not ':')
/content/pytorch/torch/cuda/amp/grad_scaler.py:557 in public method `get_backoff_factor`:
        D200: One-line docstring should fit on one line with quotes (found 3)
/content/pytorch/torch/cuda/amp/grad_scaler.py:557 in public method `get_backoff_factor`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/content/pytorch/torch/cuda/amp/grad_scaler.py:563 in public method `set_backoff_factor`:
        D205: 1 blank line required between summary line and description (found 0)
/content/pytorch/torch/cuda/amp/grad_scaler.py:563 in public method `set_backoff_factor`:
        D400: First line should end with a period (not ':')
/content/pytorch/torch/cuda/amp/grad_scaler.py:570 in public method `get_growth_interval`:
        D200: One-line docstring should fit on one line with quotes (found 3)
/content/pytorch/torch/cuda/amp/grad_scaler.py:570 in public method `get_growth_interval`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/content/pytorch/torch/cuda/amp/grad_scaler.py:576 in public method `set_growth_interval`:
        D205: 1 blank line required between summary line and description (found 0)
/content/pytorch/torch/cuda/amp/grad_scaler.py:576 in public method `set_growth_interval`:
        D400: First line should end with a period (not ':')
/content/pytorch/torch/cuda/amp/grad_scaler.py:592 in public method `is_enabled`:
        D200: One-line docstring should fit on one line with quotes (found 3)
/content/pytorch/torch/cuda/amp/grad_scaler.py:592 in public method `is_enabled`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/content/pytorch/torch/cuda/amp/grad_scaler.py:598 in public method `state_dict`:
        D400: First line should end with a period (not ':')
/content/pytorch/torch/cuda/amp/grad_scaler.py:598 in public method `state_dict`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/content/pytorch/torch/cuda/amp/grad_scaler.py:624 in public method `load_state_dict`:
        D401: First line should be in imperative mood (perhaps 'Load', not 'Loads')
/content/pytorch/torch/cuda/amp/grad_scaler.py:649 in public method `__getstate__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/cuda/amp/grad_scaler.py:665 in public method `__setstate__`:
        D105: Missing docstring in magic method
28
```
```
After:
/content/pytorch/torch/cuda/amp/grad_scaler.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/cuda/amp/grad_scaler.py:40 in public class `OptState`:
        D101: Missing docstring in public class
/content/pytorch/torch/cuda/amp/grad_scaler.py:117 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/cuda/amp/grad_scaler.py:647 in public method `__getstate__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/cuda/amp/grad_scaler.py:663 in public method `__setstate__`:
        D105: Missing docstring in magic method
5
```
4) **File: torch/optim/_functional.py**
```
Before:
/content/pytorch/torch/optim/_functional.py:1 at module level:
        D400: First line should end with a period (not 'e')
1
```
```
After:
0

```
5) **File: torch/optim/__init__.py**
```
Before:
/content/pytorch/torch/optim/__init__.py:1 at module level:
        D205: 1 blank line required between summary line and description (found 0)
1
```
```
After:
0

```
6) **File: torch/optim/lbfgs.py**
```
Before:
/content/pytorch/torch/optim/lbfgs.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/lbfgs.py:185 in public class `LBFGS`:
        D205: 1 blank line required between summary line and description (found 0)
/content/pytorch/torch/optim/lbfgs.py:185 in public class `LBFGS`:
        D400: First line should end with a period (not 'c')
/content/pytorch/torch/optim/lbfgs.py:215 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/lbfgs.py:285 in public method `step`:
        D401: First line should be in imperative mood (perhaps 'Perform', not 'Performs')
5
```
```
After:
/content/pytorch/torch/optim/lbfgs.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/lbfgs.py:217 in public method `__init__`:
        D107: Missing docstring in __init__
2
```
7)**File: torch/optim/sparse_adam.py**
```
Before:
/content/pytorch/torch/optim/sparse_adam.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/sparse_adam.py:7 in public class `SparseAdam`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/sparse_adam.py:8 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/sparse_adam.py:40 in public method `step`:
        D401: First line should be in imperative mood (perhaps 'Perform', not 'Performs')
4
```
```
After:
/content/pytorch/torch/optim/sparse_adam.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/sparse_adam.py:7 in public class `SparseAdam`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/sparse_adam.py:8 in public method `__init__`:
        D107: Missing docstring in __init__
3
```
8) **File: torch/optim/adadelta.py**
```
Before:
/content/pytorch/torch/optim/adadelta.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/adadelta.py:11 in public class `Adadelta`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/adadelta.py:12 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/adadelta.py:44 in public method `__setstate__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/optim/adadelta.py:82 in public method `step`:
        D401: First line should be in imperative mood (perhaps 'Perform', not 'Performs')
/content/pytorch/torch/optim/adadelta.py:193 in public function `adadelta`:
        D202: No blank lines allowed after function docstring (found 1)
6
```
```
After:
/content/pytorch/torch/optim/adadelta.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/adadelta.py:11 in public class `Adadelta`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/adadelta.py:12 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/adadelta.py:44 in public method `__setstate__`:
        D105: Missing docstring in magic method
4
```
9) **File: torch/optim/adagrad.py**
```
Before:
/content/pytorch/torch/optim/adagrad.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/adagrad.py:11 in public class `Adagrad`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/adagrad.py:12 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/adagrad.py:63 in public method `__setstate__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/optim/adagrad.py:78 in public method `share_memory`:
        D102: Missing docstring in public method
/content/pytorch/torch/optim/adagrad.py:100 in public method `step`:
        D401: First line should be in imperative mood (perhaps 'Perform', not 'Performs')
/content/pytorch/torch/optim/adagrad.py:201 in public function `adagrad`:
        D202: No blank lines allowed after function docstring (found 1)
7
```
```
After:
/content/pytorch/torch/optim/adagrad.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/adagrad.py:11 in public class `Adagrad`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/adagrad.py:12 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/adagrad.py:63 in public method `__setstate__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/optim/adagrad.py:78 in public method `share_memory`:
        D102: Missing docstring in public method
5
```
10) **File: torch/optim/adam.py**
```
Before:
/content/pytorch/torch/optim/adam.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/adam.py:14 in public class `Adam`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/adam.py:15 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/adam.py:65 in public method `__setstate__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/optim/adam.py:135 in public method `step`:
        D401: First line should be in imperative mood (perhaps 'Perform', not 'Performs')
/content/pytorch/torch/optim/adam.py:281 in public function `adam`:
        D202: No blank lines allowed after function docstring (found 1)
/content/pytorch/torch/optim/adam.py:281 in public function `adam`:
        D205: 1 blank line required between summary line and description (found 0)
7
```
```
After:
/content/pytorch/torch/optim/adam.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/adam.py:14 in public class `Adam`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/adam.py:15 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/adam.py:65 in public method `__setstate__`:
        D105: Missing docstring in magic method
4

```
11) **File: torch/optim/adamax.py**
```
Before:
/content/pytorch/torch/optim/adamax.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/adamax.py:12 in public class `Adamax`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/adamax.py:13 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/adamax.py:47 in public method `__setstate__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/optim/adamax.py:91 in public method `step`:
        D401: First line should be in imperative mood (perhaps 'Perform', not 'Performs')
/content/pytorch/torch/optim/adamax.py:203 in public function `adamax`:
        D202: No blank lines allowed after function docstring (found 1)
6
```
```
After:
/content/pytorch/torch/optim/adamax.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/adamax.py:12 in public class `Adamax`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/adamax.py:13 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/adamax.py:47 in public method `__setstate__`:
        D105: Missing docstring in magic method
4
```
12) **File: torch/optim/adamw.py**
```
Before:
/content/pytorch/torch/optim/adamw.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/adamw.py:12 in public class `AdamW`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/adamw.py:13 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/adamw.py:73 in public method `__setstate__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/optim/adamw.py:153 in public method `step`:
        D401: First line should be in imperative mood (perhaps 'Perform', not 'Performs')
/content/pytorch/torch/optim/adamw.py:304 in public function `adamw`:
        D202: No blank lines allowed after function docstring (found 1)
6

```
```
After:
/content/pytorch/torch/optim/adamw.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/adamw.py:12 in public class `AdamW`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/adamw.py:13 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/adamw.py:73 in public method `__setstate__`:
        D105: Missing docstring in magic method
4

```
13) **File: torch/optim/asgd.py**
```
Before:
/content/pytorch/torch/optim/asgd.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/asgd.py:17 in public class `ASGD`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/asgd.py:18 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/asgd.py:52 in public method `__setstate__`:
        D105: Missing docstring in magic method
/content/pytorch/torch/optim/asgd.py:107 in public method `step`:
        D401: First line should be in imperative mood (perhaps 'Perform', not 'Performs')
/content/pytorch/torch/optim/asgd.py:195 in public function `asgd`:
        D202: No blank lines allowed after function docstring (found 1)
6
```
```
After:
/content/pytorch/torch/optim/asgd.py:1 at module level:
        D100: Missing docstring in public module
/content/pytorch/torch/optim/asgd.py:17 in public class `ASGD`:
        D101: Missing docstring in public class
/content/pytorch/torch/optim/asgd.py:18 in public method `__init__`:
        D107: Missing docstring in __init__
/content/pytorch/torch/optim/asgd.py:52 in public method `__setstate__`:
        D105: Missing docstring in magic method
4
```
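For reference, here is a minimal invented illustration of the D205/D400/D401 family of fixes applied throughout this PR (not taken from the actual diff):
```python
class Optimizer:
    # Before: the summary is not imperative (D401), does not end with a
    # period (D400), and lacks a blank line before the description (D205).
    def step_before(self):
        """Performs a single optimization step
        Re-evaluates the model and returns the loss.
        """

    # After: an imperative summary ending in a period, then a blank line.
    def step(self):
        """Perform a single optimization step.

        Re-evaluate the model and return the loss.
        """
```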
Resolved the docstring errors as listed. I initially made my changes on the main branch of my forked repo, which caused changes to appear in my PR for another issue. I have fixed that and hope this PR won't have any conflicts.
Kindly review @svekars @jbschlosser.
In case of any other issues, please let me know. Thanks!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112964
Approved by: https://github.com/kit1980
2023-11-13 22:16:44 +00:00
27c3774320 Forward fix efficient attention rocm failure (#113588)
See #110495

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113588
Approved by: https://github.com/malfet
2023-11-13 22:15:18 +00:00
b3a7d9208b disable test int_mm for sm90 or later (#113327)
disable test int_mm for sm90 or later

```
python test/test_linalg.py -k test__int_mm_k_32_n_32_use_transpose_a_False_use_transpose_b_False_cuda

_ TestLinalgCUDA.test__int_mm_k_32_n_32_use_transpose_a_False_use_transpose_b_False_cuda _
Traceback (most recent call last):
  File "/usr/lib/python3.10/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/lib/python3.10/unittest/case.py", line 591, in run
    self._callTestMethod(testMethod)
  File "/usr/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
    method()
  File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 2410, in wrapper
    method(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 2410, in wrapper
    method(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_device_type.py", line 428, in instantiated_test
    raise rte
  File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_device_type.py", line 415, in instantiated_test
    result = test(self, **param_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_device_type.py", line 1084, in only_fn
    return fn(slf, *args, **kwargs)
  File "/opt/pytorch/pytorch/test/test_linalg.py", line 5719, in test__int_mm
    _test(17, k, n, use_transpose_a, use_transpose_b)
  File "/opt/pytorch/pytorch/test/test_linalg.py", line 5680, in _test
    c_int32 = torch._int_mm(a_int8, b_int8)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasLtMatmul with transpose_mat1 0 transpose_mat2 0 m 32 n 17 k 32 mat1_ld 32 mat2_ld 32 result_ld 32 abType 3 cType 10 computeType 72 scaleType 10
```
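
For reference, a minimal sketch of the kind of capability gate involved, written with plain `unittest` (the real test uses PyTorch's internal test decorators; this is an illustration, not the actual change):
```python
import unittest

import torch

# sm90 (Hopper) and later currently fail these int8 shapes inside
# cublasLtMatmul with CUBLAS_STATUS_NOT_SUPPORTED, hence the skip.
SM90_OR_LATER = (
    torch.cuda.is_available() and torch.cuda.get_device_capability() >= (9, 0)
)


class TestIntMM(unittest.TestCase):
    @unittest.skipUnless(torch.cuda.is_available(), "requires CUDA")
    @unittest.skipIf(SM90_OR_LATER, "int8 matmul unsupported on sm90+")
    def test_int_mm_k_32_n_32(self):
        a_int8 = torch.randint(-8, 8, (17, 32), dtype=torch.int8, device="cuda")
        b_int8 = torch.randint(-8, 8, (32, 32), dtype=torch.int8, device="cuda")
        c_int32 = torch._int_mm(a_int8, b_int8)  # int8 x int8 -> int32
        self.assertEqual(c_int32.dtype, torch.int32)
```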

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113327
Approved by: https://github.com/malfet
2023-11-13 22:13:44 +00:00
01478f1afa Fix pydocstyle errors listed in issue 112589 (#113227)
Fixes #112589

Fixed errors relating to pydocstyle in the following files. The remaining errors are related to docstrings at the module level and at methods within each module (see details below).

pydocstyle torch/cuda/_utils.py --count
before: 3
after: 0

pydocstyle torch/cuda/jiterator.py --count
before: 3
after: 1

**remaining errors:**
```
torch/cuda/jiterator.py:1 at module level:
        D100: Missing docstring in public module
```

pydocstyle torch/cuda/graphs.py --count
before: 25
after: 7

**remaining errors:**
```
torch/cuda/graphs.py:1 at module level:
        D100: Missing docstring in public module
torch/cuda/graphs.py:54 in public method `__new__`:
        D102: Missing docstring in public method
torch/cuda/graphs.py:108 in public method `debug_dump`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/graphs.py:108 in public method `debug_dump`:
        D400: First line should end with a period (not ':')
torch/cuda/graphs.py:150 in public method `__init__`:
        D107: Missing docstring in __init__
torch/cuda/graphs.py:172 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/cuda/graphs.py:186 in public method `__exit__`:
        D105: Missing docstring in magic method
```

pydocstyle torch/cuda/_sanitizer.py --count
before: 35
after: 31

**remaining errors:**
```
torch/cuda/_sanitizer.py:43 in public class `AccessType`:
        D101: Missing docstring in public class
torch/cuda/_sanitizer.py:47 in public method `__str__`:
        D105: Missing docstring in magic method
torch/cuda/_sanitizer.py:84 in public method `__init__`:
        D107: Missing docstring in __init__
torch/cuda/_sanitizer.py:96 in public method `__str__`:
        D105: Missing docstring in magic method
torch/cuda/_sanitizer.py:139 in public method `__init__`:
        D107: Missing docstring in __init__
torch/cuda/_sanitizer.py:142 in public method `__str__`:
        D105: Missing docstring in magic method
torch/cuda/_sanitizer.py:218 in public class `StreamSynchronizations`:
        D101: Missing docstring in public class
torch/cuda/_sanitizer.py:219 in public method `__init__`:
        D107: Missing docstring in __init__
torch/cuda/_sanitizer.py:256 in public method `create_stream`:
        D102: Missing docstring in public method
torch/cuda/_sanitizer.py:268 in public method `create_event`:
        D102: Missing docstring in public method
torch/cuda/_sanitizer.py:272 in public method `delete_event`:
        D102: Missing docstring in public method
torch/cuda/_sanitizer.py:276 in public method `update_seq_num`:
        D102: Missing docstring in public method
torch/cuda/_sanitizer.py:280 in public method `record_state`:
        D102: Missing docstring in public method
torch/cuda/_sanitizer.py:291 in public method `stream_wait_for_event`:
        D102: Missing docstring in public method
torch/cuda/_sanitizer.py:298 in public method `all_streams_wait_for_event`:
        D102: Missing docstring in public method
torch/cuda/_sanitizer.py:307 in public method `all_streams_wait_for_stream`:
        D102: Missing docstring in public method
torch/cuda/_sanitizer.py:316 in public method `sync_all_streams`:
        D102: Missing docstring in public method
torch/cuda/_sanitizer.py:323 in public method `is_ordered_after`:
        D102: Missing docstring in public method
torch/cuda/_sanitizer.py:339 in public method `__init__`:
        D107: Missing docstring in __init__
torch/cuda/_sanitizer.py:460 in public function `zip_by_key`:
        D103: Missing docstring in public function
torch/cuda/_sanitizer.py:466 in public function `zip_arguments`:
        D103: Missing docstring in public function
torch/cuda/_sanitizer.py:478 in public class `ArgumentHandler`:
        D101: Missing docstring in public class
torch/cuda/_sanitizer.py:479 in public method `__init__`:
        D107: Missing docstring in __init__
torch/cuda/_sanitizer.py:505 in public method `parse_inputs`:
        D102: Missing docstring in public method
torch/cuda/_sanitizer.py:520 in public method `parse_outputs`:
        D102: Missing docstring in public method
torch/cuda/_sanitizer.py:527 in public class `CUDASanitizerDispatchMode`:
        D101: Missing docstring in public class
torch/cuda/_sanitizer.py:528 in public method `__init__`:
        D107: Missing docstring in __init__
torch/cuda/_sanitizer.py:562 in public method `__torch_dispatch__`:
        D105: Missing docstring in magic method
torch/cuda/_sanitizer.py:597 in public method `__init__`:
        D107: Missing docstring in __init__
torch/cuda/_sanitizer.py:601 in public method `enable`:
        D102: Missing docstring in public method
torch/cuda/_sanitizer.py:605 in public method `__del__`:
        D105: Missing docstring in magic method
```

pydocstyle torch/storage.py --count
before: 90
after: 37

**remaining errors:**
```
torch/storage.py:1 at module level:
        D100: Missing docstring in public module
torch/storage.py:310 in public class `UntypedStorage`:
        D101: Missing docstring in public class
torch/storage.py:311 in public method `__getitem__`:
        D105: Missing docstring in magic method
torch/storage.py:317 in public method `is_cuda`:
        D102: Missing docstring in public method
torch/storage.py:321 in public method `is_hpu`:
        D102: Missing docstring in public method
torch/storage.py:325 in public method `share_memory_`:
        D102: Missing docstring in public method
torch/storage.py:444 in public class `TypedStorage`:
        D101: Missing docstring in public class
torch/storage.py:453 in public method `fill_`:
        D102: Missing docstring in public method
torch/storage.py:458 in public method `__new__`:
        D102: Missing docstring in public method
torch/storage.py:530 in public method `__init__`:
        D107: Missing docstring in __init__
torch/storage.py:599 in public method `is_cuda`:
        D102: Missing docstring in public method
torch/storage.py:604 in public method `is_hpu`:
        D102: Missing docstring in public method
torch/storage.py:624 in public method `__len__`:
        D105: Missing docstring in magic method
torch/storage.py:653 in public method `__setitem__`:
        D105: Missing docstring in magic method
torch/storage.py:681 in public method `__getitem__`:
        D105: Missing docstring in magic method
torch/storage.py:715 in public method `copy_`:
        D102: Missing docstring in public method
torch/storage.py:723 in public method `nbytes`:
        D102: Missing docstring in public method
torch/storage.py:731 in public method `type`:
        D102: Missing docstring in public method
torch/storage.py:744 in public method `cuda`:
        D102: Missing docstring in public method
torch/storage.py:751 in public method `hpu`:
        D102: Missing docstring in public method
torch/storage.py:758 in public method `element_size`:
        D102: Missing docstring in public method
torch/storage.py:766 in public method `get_device`:
        D102: Missing docstring in public method
torch/storage.py:770 in public method `__str__`:
        D105: Missing docstring in magic method
torch/storage.py:781 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/storage.py:785 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/storage.py:789 in public method `__copy__`:
        D105: Missing docstring in magic method
torch/storage.py:793 in public method `__deepcopy__`:
        D105: Missing docstring in magic method
torch/storage.py:801 in public method `__sizeof__`:
        D105: Missing docstring in magic method
torch/storage.py:877 in public method `device`:
        D102: Missing docstring in public method
torch/storage.py:881 in public method `size`:
        D102: Missing docstring in public method
torch/storage.py:891 in public method `pickle_storage_type`:
        D102: Missing docstring in public method
torch/storage.py:902 in public method `__reduce__`:
        D105: Missing docstring in magic method
torch/storage.py:907 in public method `data_ptr`:
        D102: Missing docstring in public method
torch/storage.py:915 in public method `resize_`:
        D102: Missing docstring in public method
torch/storage.py:931 in public method `from_buffer`:
        D102: Missing docstring in public method
torch/storage.py:1032 in public method `from_file`:
        D402: First line should not be the function's "signature"
torch/storage.py:1075 in public method `is_shared`:
        D102: Missing docstring in public method

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113227
Approved by: https://github.com/kit1980
2023-11-13 22:05:45 +00:00
0e6b6a2483 Revert "AOTAutograd: handle set_(), detect metadata mutations that cancel out (#111554)"
This reverts commit 3afb4e5cf7b0162c532449fb5c9e7c7058a4c803.

Reverted https://github.com/pytorch/pytorch/pull/111554 on behalf of https://github.com/clee2000 due to the xla failure is real sorry, log classifier is showing the wrong line ([comment](https://github.com/pytorch/pytorch/pull/111554#issuecomment-1809177978))
2023-11-13 21:46:57 +00:00
cfee3bcf97 Add inheritance to ONNX's InputAdaptStep and OutputAdaptSet impl (#113476)
This is a minor compliance change that specifies InputAdaptStep and
OutputAdaptStep as the base classes for the actual implementations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113476
Approved by: https://github.com/justinchuby
2023-11-13 21:27:44 +00:00
b01e89587e [ROCM][CI] Introduce tests-to-include as rocm-test workflow input (#110511)
Fixes https://github.com/pytorch/pytorch/issues/110181

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110511
Approved by: https://github.com/huydhn
2023-11-13 21:25:49 +00:00
2ea3d64f47 fix docstring issues in torch.utils.tensorboard (#113336)
Fixes #112637

Fixed all the issues listed.

### Error Counts

|File | Count Before | Count now|
|---- | ---- | ---- |
|`torch/utils/tensorboard/_proto_graph.py` | 9 | 0|
|`torch/utils/tensorboard/_pytorch_graph.py` | 27 | 14|
|`torch/utils/tensorboard/_utils.py` | 5 | 2|
|`torch/utils/tensorboard/summary.py` | 27 | 12|
|`torch/utils/tensorboard/writer.py` | 42 | 4|
|`torch/utils/tensorboard/_caffe2_graph.py` | 19 | 0|
|`torch/utils/hipify/constants.py` | 2 | 0|

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113336
Approved by: https://github.com/ezyang
2023-11-13 20:50:01 +00:00
a144eb502a [aotinductor] add versions for the sdpa shim api (#113487)
In our first implementation of the sdpa shim API, we didn't consider
the case where the optional scale argument could be None. It went
unnoticed because we always got a default argument for the CUDA backend.
The issue was detected with the CPU backend.

This PR implements versioning for shim kernels. Currently, only the
sdpa API has multiple versions. We expect to maintain only a very small
number of ABI-compatible shim APIs with multiple versions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113487
Approved by: https://github.com/int3, https://github.com/desertfire
2023-11-13 20:18:58 +00:00
6ea20f5dc5 [AOTI] Use expr_printer to print sympy expr (#113317)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113317
Approved by: https://github.com/aakhundov, https://github.com/chenyang78
2023-11-13 20:14:04 +00:00
c0b57d4e3b fix docstring issues in torch.distributed (#113337)
Fixes #112643

Fixes all the issues listed.

### Error Count

|File | Count Before | Count now|
|---- | ---- | ---- |
|`torch/distributed/optim/named_optimizer.py` | 13 | 1|
|`torch/distributed/nn/functional.py` | 7 | 1|
|`torch/distributed/nn/api/remote_module.py` | 25 | 3|
|`torch/distributed/algorithms/join.py` | 43 | 4|

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113337
Approved by: https://github.com/ezyang
2023-11-13 19:37:29 +00:00
5e10dd2c78 fix docstring issues in torch.utils (#113335)
Fixes #112634

Fixes all the issues listed except in `torch/utils/_pytree.py` as the file no longer exists.

### Error counts

|File | Count Before | Count now|
|---- | ---- | ---- |
|`torch/utils/collect_env.py` | 39 | 25|
|`torch/utils/cpp_extension.py` | 51 | 13|
|`torch/utils/flop_counter.py` | 25 | 8|
|`torch/utils/_foreach_utils.py` | 2 | 0|
|`torch/utils/_python_dispatch.py` | 26 | 25|
|`torch/utils/backend_registration.py` | 15 | 4|
|`torch/utils/checkpoint.py` | 29 | 21|

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113335
Approved by: https://github.com/ezyang
2023-11-13 19:37:25 +00:00
44367c59b2 Update skip reason for failing unit tests on ROCm 5.7 (#113286)
Follow-up to https://github.com/pytorch/pytorch/pull/110465. Updated the skip reason for failing unit tests on ROCm 5.7.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113286
Approved by: https://github.com/malfet
2023-11-13 19:29:04 +00:00
1aece432ba Implement narrow from a regular tensor to jagged tensor (#112770)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112770
Approved by: https://github.com/cpuhrsch
2023-11-13 19:09:59 +00:00
3700894099 Fix FSDP summon_full_params(..., with_grads=True) when grad precision is not fp32 (#112746)
Fixes #112717

I moved the `torch.empty` call after the conditional so that we don't need to check whether `flat_param.grad` is None.
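
A minimal sketch of the shape of the change (the function and attribute names here are illustrative, not the exact FSDP internals):
```python
from typing import Optional

import torch


def padded_unsharded_grad(
    flat_param: torch.nn.Parameter, padded_numel: int
) -> Optional[torch.Tensor]:
    # Returning early avoids allocating a buffer that would then need a
    # None-check on flat_param.grad.
    if flat_param.grad is None:
        return None
    # Allocate with the grad's own dtype/device, so this stays correct
    # when gradients are kept in reduced precision (the bug in #112717).
    return torch.empty(
        padded_numel,
        dtype=flat_param.grad.dtype,
        device=flat_param.grad.device,
    )
```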

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112746
Approved by: https://github.com/awgu
2023-11-13 19:04:24 +00:00
47a59ee4d1 [ONNX] Update exporter issue report instructions for quantized models (#113494)
Update the instructions to point users to the right place for creating issues.

https://github.com/onnx/onnx/issues/5674#issuecomment-1806505240

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113494
Approved by: https://github.com/jerryzh168
2023-11-13 18:18:19 +00:00
c46fc46dba expose mem-eff to autograd (#110495)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110495
Approved by: https://github.com/jbschlosser
2023-11-13 17:47:40 +00:00
3afb4e5cf7 AOTAutograd: handle set_(), detect metadata mutations that cancel out (#111554)
This should be enough to get @voznesenskym's FSDP branch to plumb `set_()` through AOTAutograd properly and have everything no-op out as expected. The main changes are:

(1) graph break on `aten::set_.source_Tensor_storage_offset` (we could support it, but it isn't needed, and it seems safer to graph break)

(2) Functionalization: add a "proper" functionalization kernel for `aten::set_.source_Tensor`. The previous one we had was codegen'd and wrong (it would just clone() and call set_(), which does not do the right thing). I also manually record on the `FunctionalTensorWrapper` when a given tensor has been mutated by a `set_()` call.

(3) AOTAutograd: I added a new field, `InputAliasInfo.mutates_storage_metadata`, so we can distinguish between "regular" metadata mutations, and metadata mutations due to `set_()` calls. This is mainly because at runtime, one requires calling `as_strided_()` to fix up metadata, while the other requires calling `set_()`.

(4) Made AOTAutograd's detection of metadata mutations / `set_()` mutations smarter, so it detects no-ops (when the storage and metadata are all the same).

I also killed `was_updated()` and `was_metadata_updated()`, and replaced them with (existing) `has_data_mutation()` and (new) `has_metadata_mutation()`, which more accurately distinguish between data mutations, `set_()` calls, and metadata mutations.

**This PR is still silently incorrect in one case though**, which I'd like to discuss more. In particular, this example:
```
def f(x):
    x_view = x.view(-1)
    x.set_(torch.ones(2))
    x_view.mul_(2)
    return
```

If you have an input that experiences both a data-mutation **and** a `x_old.set_(x_new)` call, there are two cases:

(a) the data mutation happened on the storage of `x_new`. This case should be handled automatically: if x_new is a graph intermediate then we will functionalize the mutation. If x_new is a different graph input, then we will perform the usual `copy_()` on that other graph input.

(b) the data mutation happened on the storage of `x_old`. This is more of a pain to handle, and doesn't currently work. At runtime, the right thing to do is probably something like:
```

def functionalized_f(x):
    x_view = x.view(-1)
    # set_() desugars into a no-op; later usages of x will use x_output
    x_output = torch.ones(2)
    # functionalize the mutation on x_view
    x_view_updated = x.mul(2)
    x_updated = x_view_updated.view(x.shape)
    # x experienced TWO TYPES of mutations: a data mutation and a metadata mutation
    # We need to return both updated tensors in our graph
    return x_updated, x_output
def runtime_wrapper(x):
    x_data_mutation_result, x_set_mutation_result = compiled_graph(x)
    # First, perform the data mutation on x's old storage
    x.copy_(x_data_mutation_result)
    # Then, swap out the storage of x with the new storage
    x.set_(x_set_mutation_result)
```

There are two things that make this difficult to do though:

(1) Functionalization: the functionalization rule for `set_()` will fully throw away the old `FunctionalStorageImpl` on the graph input. So if there are any mutations to that `FunctionalStorageImpl` later on in the graph, the current graph input won't know about it. Maybe we can have a given `FunctionalTensorWrapper` remember all previous storages that it had, and track mutations on all of them - although this feels pretty complicated.

(2) AOTAutograd now needs to know that we might have *two* graph outputs that correspond to a single "mutated input", which is annoying.

It's worth pointing out that this issue is probably extremely unlikely for anyone to run into; can we just detect it and error? That feels slightly easier than solving it, although not significantly easier. We would still need `FunctionalTensorWrapper` to keep track of mutations on any of its "previous" storages, so it can report this info back to AOTAutograd so we can raise an error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111554
Approved by: https://github.com/ezyang
2023-11-13 16:39:25 +00:00
1d9919c46d Fix pydocstyle for issue 112591 (#113233)
Fixes #112591

Fixed errors relating to pydocstyle in the following files. The remaining errors are related to docstrings at the module level and methods within each module (see details below).

pydocstyle torch/cuda/_memory_viz.py --count
before: 7
after: 4

**remaining errors:**
```
torch/cuda/_memory_viz.py:77 in public function `format_flamegraph`:
        D103: Missing docstring in public function
torch/cuda/_memory_viz.py:121 in public function `segments`:
        D103: Missing docstring in public function
torch/cuda/_memory_viz.py:128 in public function `memory`:
        D103: Missing docstring in public function
torch/cuda/_memory_viz.py:135 in public function `compare`:
        D103: Missing docstring in public function
```

pydocstyle torch/cuda/streams.py --count
before: 29
after: 8

**remaining errors:**
```
torch/cuda/streams.py:1 at module level:
        D100: Missing docstring in public module
torch/cuda/streams.py:31 in public method `__new__`:
        D102: Missing docstring in public method
torch/cuda/streams.py:105 in public method `__eq__`:
        D105: Missing docstring in magic method
torch/cuda/streams.py:110 in public method `__hash__`:
        D105: Missing docstring in magic method
torch/cuda/streams.py:113 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/cuda/streams.py:135 in public method `__new__`:
        D102: Missing docstring in public method
torch/cuda/streams.py:163 in public method `__new__`:
        D102: Missing docstring in public method
torch/cuda/streams.py:237 in public method `__repr__`:
        D105: Missing docstring in magic method
```

pydocstyle torch/cuda/__init__.py --count
before: 100
after: 46

**remaining errors:**
```
torch/cuda/__init__.py:251 in public class `DeferredCudaCallError`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:327 in public function `cudart`:
        D103: Missing docstring in public function
torch/cuda/__init__.py:332 in public class `cudaStatus`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:337 in public class `CudaError`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:338 in public method `__init__`:
        D107: Missing docstring in __init__
torch/cuda/__init__.py:343 in public function `check_error`:
        D103: Missing docstring in public function
torch/cuda/__init__.py:369 in public method `__init__`:
        D107: Missing docstring in __init__
torch/cuda/__init__.py:373 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/cuda/__init__.py:376 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/cuda/__init__.py:391 in public method `__init__`:
        D107: Missing docstring in __init__
torch/cuda/__init__.py:473 in public class `StreamContext`:
        D204: 1 blank line required after class docstring (found 0)
torch/cuda/__init__.py:485 in public method `__init__`:
        D107: Missing docstring in __init__
torch/cuda/__init__.py:499 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/cuda/__init__.py:514 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/cuda/__init__.py:541 in public function `set_stream`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/__init__.py:838 in public function `current_blas_handle`:
        D400: First line should end with a period (not 'e')
torch/cuda/__init__.py:894 in public function `memory_usage`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/__init__.py:894 in public function `memory_usage`:
        D400: First line should end with a period (not ')')
torch/cuda/__init__.py:913 in public function `utilization`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/__init__.py:913 in public function `utilization`:
        D400: First line should end with a period (not 'r')
torch/cuda/__init__.py:949 in public function `power_draw`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/__init__.py:949 in public function `power_draw`:
        D400: First line should end with a period (not ')')
torch/cuda/__init__.py:1089 in public class `ByteStorage`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:1091 in public method `dtype`:
        D102: Missing docstring in public method
torch/cuda/__init__.py:1100 in public class `DoubleStorage`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:1102 in public method `dtype`:
        D102: Missing docstring in public method
torch/cuda/__init__.py:1111 in public class `FloatStorage`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:1113 in public method `dtype`:
        D102: Missing docstring in public method
torch/cuda/__init__.py:1122 in public class `HalfStorage`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:1124 in public method `dtype`:
        D102: Missing docstring in public method
torch/cuda/__init__.py:1133 in public class `LongStorage`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:1135 in public method `dtype`:
        D102: Missing docstring in public method
torch/cuda/__init__.py:1144 in public class `IntStorage`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:1146 in public method `dtype`:
        D102: Missing docstring in public method
torch/cuda/__init__.py:1155 in public class `ShortStorage`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:1157 in public method `dtype`:
        D102: Missing docstring in public method
torch/cuda/__init__.py:1166 in public class `CharStorage`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:1168 in public method `dtype`:
        D102: Missing docstring in public method
torch/cuda/__init__.py:1177 in public class `BoolStorage`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:1179 in public method `dtype`:
        D102: Missing docstring in public method
torch/cuda/__init__.py:1188 in public class `BFloat16Storage`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:1190 in public method `dtype`:
        D102: Missing docstring in public method
torch/cuda/__init__.py:1199 in public class `ComplexDoubleStorage`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:1201 in public method `dtype`:
        D102: Missing docstring in public method
torch/cuda/__init__.py:1210 in public class `ComplexFloatStorage`:
        D101: Missing docstring in public class
torch/cuda/__init__.py:1212 in public method `dtype`:
        D102: Missing docstring in public method
```

@mikaylagawarecki @albanD @svekars @jbschlosser

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113233
Approved by: https://github.com/malfet
2023-11-13 16:24:53 +00:00
0fd856ca22 Revert "[ONNX] Fix scalar type promotion between fp16 tensor and fp32 scalar (#113404)"
This reverts commit 39ca5a3226331428465a84d53d5b50dfb4406cfe.

Reverted https://github.com/pytorch/pytorch/pull/113404 on behalf of https://github.com/jeanschmidt due to sorry it is breaking CI jobs on main ([comment](https://github.com/pytorch/pytorch/pull/113404#issuecomment-1808314277))
2023-11-13 14:56:35 +00:00
d64bc8f0f8 use sourceless builder for builtin getattr (#113340)
In TorchVision we use the following (simplified) dispatch mechanism:

```python
import torch

def kernel1(tensor):
    return tensor + 2

def dispatcher1(input):
    kernel = get_kernel(dispatcher1, type(input))
    return kernel(input)

def kernel2(tensor):
    return tensor - 2

def dispatcher2(input):
    kernel = get_kernel(dispatcher2, type(input))
    return kernel(input)

# We actually use the function and type as keys, rather than their names.
# However, this is currently not supported, but should be easy to add after
# https://github.com/pytorch/pytorch/pull/111196
REGISTRY = {
    "dispatcher1": {"Tensor": kernel1},
    "dispatcher2": {"Tensor": kernel2},
}

def get_kernel(dispatcher, input_type):
    dispatcher_registry = REGISTRY[dispatcher.__name__]
    for cls in input_type.__mro__:
        kernel = dispatcher_registry[cls.__name__]
        break
    return kernel
```

This can be compiled without graph breaks:

```python
cfn = torch.compile(dispatcher1, fullgraph=True)
torch.testing.assert_close(int(cfn(torch.tensor(3))), 5)

cfn = torch.compile(dispatcher2, fullgraph=True)
torch.testing.assert_close(int(cfn(torch.tensor(3))), 1)
```

However, if we start chaining these calls, we hit some issues:

```python
class Pipeline(torch.nn.Module):
    def forward(self, input):
        input = dispatcher1(input)
        input = dispatcher2(input)
        return input

cfn = torch.compile(Pipeline(), fullgraph=True)
torch.testing.assert_close(int(cfn(torch.tensor(3))), 3)
```

```
Can't access members of type(obj) for a generated custom object. Please use __class__ instead
```

The error message is not really helpful here. The following happens: when compiling `dispatcher1`, `get_kernel` gets inlined. That means when hitting `dispatcher2`, the `type` call no longer happens on an input with a source. Thus, in the first iteration we hit the top branch, while in the second we hit the bottom:

addb8e29cd/torch/_dynamo/variables/builtin.py (L1264-L1268)

And the error message I posted above originates from the type being treated as constant. This PR replaces this with a `SourcelessBuilder` instead.

With that fix in place, we hit another error, this time pointing to `input_type.__mro__`:

```
AssertionError: Consider SourcelessBuilder for ephemeral objects, usually objects created locally.
```

The fix is similar: instead of using a `VariableBuilder` here, we use a `SourcelessBuilder` when we have no `source`:

addb8e29cd/torch/_dynamo/variables/builtin.py (L1167-L1168)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113340
Approved by: https://github.com/peterbell10, https://github.com/lezcano
2023-11-13 14:29:17 +00:00
115da02432 [xla hash update] update the pinned xla hash (#113549)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned xla hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113549
Approved by: https://github.com/pytorchbot
2023-11-13 12:07:20 +00:00
44f1c6e41c [inductor] Handle variance corrections larger than number of data points (#113284)
Fixes #113167

When the correction is larger than the number of data points, we should return a NaN
by dividing by zero, as is done in the eager implementation.

5ea76f1760/aten/src/ATen/native/SharedReduceOps.h (L137)
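
A small repro sketch of the now-matching behavior, assuming the semantics described above:
```python
import torch

x = torch.randn(3)
cfn = torch.compile(lambda t: torch.var(t, correction=10))

# correction (10) exceeds the number of data points (3): eager yields
# nan by dividing by zero, and the compiled version now matches it.
print(torch.var(x, correction=10))  # nan
print(cfn(x))                       # nan after this fix
```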

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113284
Approved by: https://github.com/lezcano
2023-11-13 11:16:17 +00:00
2bcff4d8e3 [state_dict][11/N] Implement cpu_offload and full_state_dict for get_state_dict (#112837)
As title
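
Since the summary is terse, here is a hedged usage sketch of the two options (assuming the `torch.distributed.checkpoint.state_dict` API; exact signatures may differ across versions):
```python
import torch
from torch.distributed.checkpoint.state_dict import (
    StateDictOptions,
    get_state_dict,
)

model = torch.nn.Linear(8, 8)
optim = torch.optim.Adam(model.parameters())

# full_state_dict=True gathers sharded tensors into full tensors;
# cpu_offload=True moves the gathered result to CPU.
opts = StateDictOptions(full_state_dict=True, cpu_offload=True)
model_sd, optim_sd = get_state_dict(model, optim, options=opts)
```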

Differential Revision: [D50962991](https://our.internmc.facebook.com/intern/diff/D50962991/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112837
Approved by: https://github.com/LucasLLC, https://github.com/wz337
ghstack dependencies: #112836, #112885
2023-11-13 10:03:06 +00:00
b910d9eaa6 Add tensor.is_privateuseone (#113421)
We found a scenario where ``tensor.device().is_privateuseone()`` is used to determine whether a tensor is on the PrivateUse1 backend, but it fails.
For example, in the ``Autograd`` code:
```
::std::tuple<at::Tensor,at::Tensor,at::Tensor> native_batch_norm(c10::DispatchKeySet ks, const at::Tensor & input, const c10::optional<at::Tensor> & weight, const c10::optional<at::Tensor> & bias, const c10::optional<at::Tensor> & running_mean, const c10::optional<at::Tensor> & running_var, bool training, double momentum, double eps) {
  auto& input_ = unpack(input, "input", 0);
  [[maybe_unused]] auto _any_requires_grad = compute_requires_grad( input, weight, bias );

  [[maybe_unused]] auto _any_has_forward_grad_result0 = (isFwGradDefined(input) || isFwGradDefined(weight) || isFwGradDefined(bias));
  check_no_requires_grad(running_mean, "running_mean", "native_batch_norm");
  check_no_requires_grad(running_var, "running_var", "native_batch_norm");
  std::shared_ptr<NativeBatchNormBackward0> grad_fn;
  if (_any_requires_grad) {
    grad_fn = std::shared_ptr<NativeBatchNormBackward0>(new NativeBatchNormBackward0(), deleteNode);
    grad_fn->set_next_edges(collect_next_edges( input, weight, bias ));
    grad_fn->eps = eps;
    grad_fn->input_ = SavedVariable(input, false);
    grad_fn->running_mean_ = SavedVariable(running_mean, false);
    grad_fn->running_var_ = SavedVariable(running_var, false);
    grad_fn->training = training;
    grad_fn->weight_ = SavedVariable(weight, false);
  }
  ...
}
```
When ``weight`` is ``None``, an empty tensor is automatically generated and passed to the backward calculation:
c7e12c7427/torch/csrc/autograd/saved_variable.cpp (L121-L128)
At the beginning of the backward calculation in our scenario, we need to determine whether the input tensor is ``PrivateUse1``. However, if we use ``tensor.device().is_privateuseone()``, we will get the error ``"tensor does not have a device"``:
c7e12c7427/c10/core/TensorImpl.h (L1223-L1235)
I think this part of the code can be optimized; what do you think?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113421
Approved by: https://github.com/albanD
2023-11-13 01:51:27 +00:00
7afb503e3c [inductor] Label align() with [[maybe_unused]] (#113502)
This squelches the "defined but not used" warning that occurs when
memory planning is disabled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113502
Approved by: https://github.com/jansel
2023-11-12 16:33:47 +00:00
8b61daaf73 Prune more unnecessary includes from CUDA transformers (#113493)
These kernels are incredibly slow to compile and for the most part are
completely independant of ATen/c10 yet they still end up including
half of `c10` transitively through `CUDAGeneratorImpl.h` and
`CUDAContext.h`.

This trims the fat so `mem_eff_attention` doesn't depend on ATen/c10 at all,
and `flash_attn` now only depends on `PhiloxUtils.cuh` (split out from
`CUDAGeneratorImpl.h`) and `CUDAContextLight.h` which doesn't transitively
include `TensorImpl.h`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113493
Approved by: https://github.com/lezcano
2023-11-12 16:00:05 +00:00
9c331be919 [pytorch] Remove dot if no suffix (#113273)
Summary: Appending the suffix to the version string shouldn't happen if there is no suffix.

Test Plan:
```
/data/users/wbland/fbsource/buck-out/v2/gen/fbcode/param_bench/train/comms/pt/comms.par \
--backend nccl --device cuda --collective all_gather \
--master-ip <snip> --log INFO --b 256 --e 1K \
--num-coll-per-iteration 10 --mode comms --num_iters 5 --w 1 --z 1
...
I1108 07:58:33.852557 2344130 ProcessGroupNCCL.cpp:990] [Rank 0] ProcessGroupNCCL initialization options: NCCL version: 2.17.1, NCCL_ASYNC_ERROR_HANDLING: 3, NCCL_DESYNC_DEBUG: 0, NCCL_ENABLE_TIMING: 0, NCCL_BLOCKING_WAIT: 0, TIMEOUT(ms): 600000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DEBUG: OFF, NCCL_DEBUG: OFF, ID=139992854228992
...
```
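
The gist of the fix as a small illustrative sketch (Python for brevity; the actual change is in the C++ version-string formatting, and all names here are hypothetical):
```python
def format_version(major: int, minor: int, patch: int, suffix: str = "") -> str:
    # Append the "." separator only when a suffix is present, so a
    # suffix-less build prints "2.17.1" rather than "2.17.1.".
    version = f"{major}.{minor}.{patch}"
    if suffix:
        version += f".{suffix}"
    return version


print(format_version(2, 17, 1))         # 2.17.1
print(format_version(2, 17, 1, "rc1"))  # 2.17.1.rc1
```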

Differential Revision: D51116095

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113273
Approved by: https://github.com/kwen2501
2023-11-12 15:41:27 +00:00
7f1cbc8b5a remove intel_extension_for_pytorch from THIRDPARTY_SKIPLIST (#112840)
Motivation: Because `intel_extension_for_pytorch` is in `THIRDPARTY_SKIPLIST`, the functions defined in IPEX are skipped when an IPEX-optimized model uses `torch.compile`: Dynamo cannot trace them into FX graphs, they cannot be optimized by the compiler, and unnecessary graph breaks occur. This PR removes `intel_extension_for_pytorch` from `THIRDPARTY_SKIPLIST` so that IPEX and `torch.compile` work better together.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112840
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-11-12 09:40:51 +00:00
70064ac416 [Dynamo] Match closures by code ID (#109427)
Closes https://github.com/pytorch/pytorch/issues/107866

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109427
Approved by: https://github.com/ezyang, https://github.com/jansel
2023-11-12 08:20:14 +00:00
fe5d8850e2 Fixed docstring errors in _fuser.py, _state.py, __init__.py, _freeze.py, _async.py, _recursive.py, _tensorboard_vis.py, _trace.py, _await.py, _check.py, _serialization.py, _script.py, annotations.py, _monkeytype_config.py (#113371)
Fixes #113194

Docstrings updated.

Here are the error counts before and after:

1) torch/sparse/__init__.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:1 at module level:
        D104: Missing docstring in public package
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:183 in public function `sum`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:183 in public function `sum`:
        D400: First line should end with a period (not 'n')
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:183 in public function `sum`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:391 in public class `check_sparse_tensor_invariants`:
        D207: Docstring is under-indented
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:436 in public method `is_enabled`:
        D207: Docstring is under-indented
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:436 in public method `is_enabled`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:448 in public method `enable`:
        D207: Docstring is under-indented
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:468 in public method `disable`:
        D207: Docstring is under-indented
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:475 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:479 in public method `__enter__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:486 in public method `__exit__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:492 in public method `__call__`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:502 in public function `as_sparse_gradcheck`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:502 in public function `as_sparse_gradcheck`:
        D400: First line should end with a period (not 'l')
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:502 in public function `as_sparse_gradcheck`:
        D401: First line should be in imperative mood (perhaps 'Decorate', not 'Decorator')
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:518 in private nested function `gradcheck_with_sparse_support`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:518 in private nested function `gradcheck_with_sparse_support`:
        D400: First line should end with a period (not 's')
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:518 in private nested function `gradcheck_with_sparse_support`:
        D401: First line should be in imperative mood; try rephrasing (found 'Same')
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:528 in private nested function `convert_to_strided_representation`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:528 in private nested function `convert_to_strided_representation`:
        D400: First line should end with a period (not 'n')
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:559 in private nested function `restore_from_strided_representation`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:559 in private nested function `restore_from_strided_representation`:
        D400: First line should end with a period (not 'd')
23
```
After:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:1 at module level:
        D104: Missing docstring in public package
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:476 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:480 in public method `__enter__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:487 in public method `__exit__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:493 in public method `__call__`:
        D102: Missing docstring in public method
5
```
2) torch/contrib/_tensorboard_vis.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/contrib/_tensorboard_vis.py:21 in public function `dump_tensorboard_summary`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/contrib/_tensorboard_vis.py:54 in public function `visualize_graph_executor`:
        D401: First line should be in imperative mood (perhaps 'Append', not 'Appends')
2
```
After:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/contrib/_tensorboard_vis.py:21 in public function `dump_tensorboard_summary`:
        D103: Missing docstring in public function
1
```
3) torch/jit/_state.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:1 at module level:
        D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:20 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:25 in public method `parse_env`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:41 in public method `__bool__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:48 in public function `disable`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:52 in public function `enable`:
        D103: Missing docstring in public function
6
```
After:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:20 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:25 in public method `parse_env`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:41 in public method `__bool__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:48 in public function `disable`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:52 in public function `enable`:
        D103: Missing docstring in public function
5
```
4) torch/jit/_monkeytype_config.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:27 in public function `is_torch_native_class`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:40 in public function `get_type`:
        D200: One-line docstring should fit on one line with quotes (found 3)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:40 in public function `get_type`:
        D401: First line should be in imperative mood; try rephrasing (found 'Helper')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:62 in public function `get_optional_of_element_type`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:62 in public function `get_optional_of_element_type`:
        D400: First line should end with a period (not 'l')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:62 in public function `get_optional_of_element_type`:
        D401: First line should be in imperative mood; try rephrasing (found 'Helper')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:75 in public function `get_qualified_name`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:84 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:87 in public method `log`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:90 in public class `JitTypeTraceStore`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:91 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:98 in public method `add`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:103 in public method `filter`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:111 in public method `analyze`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:122 in public method `consolidate_types`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:139 in public method `get_args_types`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:142 in public class `JitTypeTraceConfig`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:143 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:148 in public method `trace_logger`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:148 in public method `trace_logger`:
        D400: First line should end with a period (not 'd')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:148 in public method `trace_logger`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:154 in public method `trace_store`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:157 in public method `code_filter`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:163 in public class `JitTypeTraceStoreLogger`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:164 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:167 in public class `JitTypeTraceStore`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:168 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:171 in public class `JitTypeTraceConfig`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:172 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:179 in public function `jit_code_filter`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:179 in public function `jit_code_filter`:
        D401: First line should be in imperative mood; try rephrasing (found 'Custom')
31
```
After:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:27 in public function `is_torch_native_class`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:74 in public function `get_qualified_name`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:83 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:86 in public method `log`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:89 in public class `JitTypeTraceStore`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:90 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:97 in public method `add`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:102 in public method `filter`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:110 in public method `analyze`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:121 in public method `consolidate_types`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:138 in public method `get_args_types`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:141 in public class `JitTypeTraceConfig`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:142 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:150 in public method `trace_store`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:153 in public method `code_filter`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:159 in public class `JitTypeTraceStoreLogger`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:160 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:163 in public class `JitTypeTraceStore`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:164 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:167 in public class `JitTypeTraceConfig`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:168 in public method `__init__`:
        D107: Missing docstring in __init__
21
```
5) torch/jit/_fuser.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_fuser.py:9 in public function `optimized_execution`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_fuser.py:9 in public function `optimized_execution`:
        D400: First line should end with a period (not 'n')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_fuser.py:9 in public function `optimized_execution`:
        D401: First line should be in imperative mood; try rephrasing (found 'A')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_fuser.py:23 in public function `fuser`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_fuser.py:23 in public function `fuser`:
        D400: First line should end with a period (not 'n')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_fuser.py:23 in public function `fuser`:
        D401: First line should be in imperative mood; try rephrasing (found 'A')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_fuser.py:136 in public function `set_fusion_strategy`:
        D401: First line should be in imperative mood (perhaps 'Set', not 'Sets')
7
```
After:
```
0
```
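For reference, a rewrite that clears D205, D400, and D401 takes the following shape (a hypothetical function, not the actual torch/jit code):

```python
def set_fuser(name: str) -> str:
    """Set the active fuser backend.

    The summary line above is in the imperative mood ("Set", not "Sets",
    satisfying D401), ends with a period (D400), and is separated from this
    description by a blank line (D205).
    """
    return name
```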
6) torch/jit/_async.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_async.py:1 at module level:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_async.py:1 at module level:
        D400: First line should end with a period (not 'I')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_async.py:20 in public function `fork`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_async.py:20 in public function `fork`:
        D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_async.py:20 in public function `fork`:
        D401: First line should be in imperative mood (perhaps 'Create', not 'Creates')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_async.py:88 in public function `wait`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_async.py:88 in public function `wait`:
        D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_async.py:88 in public function `wait`:
        D401: First line should be in imperative mood (perhaps 'Force', not 'Forces')
8
```
After:
```
0
```
7) torch/jit/_await.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_await.py:11 in private function `_awaitable`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_await.py:11 in private function `_awaitable`:
        D400: First line should end with a period (not ',')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_await.py:11 in private function `_awaitable`:
        D401: First line should be in imperative mood (perhaps 'Create', not 'Creates')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_await.py:19 in private function `_awaitable_wait`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_await.py:19 in private function `_awaitable_wait`:
        D400: First line should end with a period (not ',')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_await.py:19 in private function `_awaitable_wait`:
        D401: First line should be in imperative mood (perhaps 'Request', not 'Requests')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_await.py:27 in private function `_awaitable_nowait`:
        D200: One-line docstring should fit on one line with quotes (found 3)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_await.py:27 in private function `_awaitable_nowait`:
        D401: First line should be in imperative mood (perhaps 'Create', not 'Creates')
8
```
After:
```
0
```
8) torch/jit/_check.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:10 in public class `AttributeTypeIsSupportedChecker`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:10 in public class `AttributeTypeIsSupportedChecker`:
        D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:10 in public class `AttributeTypeIsSupportedChecker`:
        D412: No blank lines allowed between a section header and its content ('Example')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:61 in public method `check`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:110 in public method `visit_Assign`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:110 in public method `visit_Assign`:
        D400: First line should end with a period (not 'n')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:132 in public method `visit_AnnAssign`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:132 in public method `visit_AnnAssign`:
        D400: First line should end with a period (not '`')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:187 in public method `visit_Call`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:187 in public method `visit_Call`:
        D400: First line should end with a period (not '`')
10
```
After:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:58 in public method `check`:
        D102: Missing docstring in public method
1
```
9) torch/jit/_freeze.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_freeze.py:1 at module level:
        D400: First line should end with a period (not 'g')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_freeze.py:16 in public function `freeze`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_freeze.py:16 in public function `freeze`:
        D400: First line should end with a period (not 'd')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_freeze.py:127 in public function `run_frozen_optimizations`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_freeze.py:127 in public function `run_frozen_optimizations`:
        D401: First line should be in imperative mood (perhaps 'Run', not 'Runs')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_freeze.py:182 in public function `optimize_for_inference`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_freeze.py:182 in public function `optimize_for_inference`:
        D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_freeze.py:182 in public function `optimize_for_inference`:
        D401: First line should be in imperative mood (perhaps 'Perform', not 'Performs')
8
```
After:
```
0
```
10) torch/jit/_recursive.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:69 in public function `make_stub`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:75 in public function `make_stub_from_method`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:90 in public function `make_stubs_from_exported_methods`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:103 in public function `jit_ignored_properties`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:155 in public class `SourceContext`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:156 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:160 in public function `get_annotations`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:186 in public function `infer_concrete_type_builder`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:186 in public function `infer_concrete_type_builder`:
        D400: First line should end with a period (not 's')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:423 in public class `ConcreteTypeStore`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:427 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:434 in public method `get_or_create_concrete_type`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:434 in public method `get_or_create_concrete_type`:
        D400: First line should end with a period (not 'T')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:459 in public function `create_methods_and_properties_from_stubs`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:474 in public function `create_hooks_from_stubs`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:485 in public function `get_module_concrete_type`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:485 in public function `get_module_concrete_type`:
        D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:485 in public function `get_module_concrete_type`:
        D401: First line should be in imperative mood (perhaps 'Get', not 'Gets')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:539 in public function `create_script_module`:
        D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:539 in public function `create_script_module`:
        D401: First line should be in imperative mood (perhaps 'Create', not 'Creates')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:725 in public function `script_model_defines_attr`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:735 in public function `add_python_attr_to_scripted_model`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:740 in public function `get_overload_annotations`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:772 in public function `get_overload_name_mapping`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:797 in public function `make_stubs_for_overloads`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:816 in public function `check_module_initialized`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:842 in public function `infer_methods_to_compile`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:842 in public function `infer_methods_to_compile`:
        D400: First line should end with a period (not 'g')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:842 in public function `infer_methods_to_compile`:
        D401: First line should be in imperative mood (perhaps 'Implement', not 'Implements')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:904 in public function `get_hook_stubs`:
        D200: One-line docstring should fit on one line with quotes (found 3)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:904 in public function `get_hook_stubs`:
        D400: First line should end with a period (not 's')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:904 in public function `get_hook_stubs`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:940 in public function `get_property_stubs`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:940 in public function `get_property_stubs`:
        D400: First line should end with a period (not 'd')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:963 in public function `interface_script`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:963 in public function `interface_script`:
        D400: First line should end with a period (not 'r')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:963 in public function `interface_script`:
        D401: First line should be in imperative mood (perhaps 'Make', not 'Makes')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:977 in private nested function `infer_interface_methods_to_compile`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:977 in private nested function `infer_interface_methods_to_compile`:
        D400: First line should end with a period (not 'h')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:989 in public function `try_compile_fn`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:1014 in public function `wrap_cpp_class`:
        D200: One-line docstring should fit on one line with quotes (found 3)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:1021 in public function `wrap_cpp_module`:
        D200: One-line docstring should fit on one line with quotes (found 3)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:1021 in public function `wrap_cpp_module`:
        D400: First line should end with a period (not 's')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:1040 in public function `compile_unbound_method`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:1052 in public function `lazy_bind`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:1052 in public function `lazy_bind`:
        D400: First line should end with a period (not 'd')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:1052 in public function `lazy_bind`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
47
```
After:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:69 in public function `make_stub`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:75 in public function `make_stub_from_method`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:90 in public function `make_stubs_from_exported_methods`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:103 in public function `jit_ignored_properties`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:155 in public class `SourceContext`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:156 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:160 in public function `get_annotations`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:424 in public class `ConcreteTypeStore`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:428 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:457 in public function `create_methods_and_properties_from_stubs`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:472 in public function `create_hooks_from_stubs`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:724 in public function `script_model_defines_attr`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:734 in public function `add_python_attr_to_scripted_model`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:739 in public function `get_overload_annotations`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:771 in public function `get_overload_name_mapping`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:796 in public function `make_stubs_for_overloads`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:815 in public function `check_module_initialized`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:979 in public function `try_compile_fn`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:1026 in public function `compile_unbound_method`:
        D103: Missing docstring in public function
19
```

@svekars

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113371
Approved by: https://github.com/davidberard98
2023-11-12 03:19:02 +00:00
15a2caea8e Enables copy/clone/reshape/contiguous operations for bits types (#113508)
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113508
Approved by: https://github.com/albanD
2023-11-11 22:51:50 +00:00
d00c983b63 [dynamo] Make {testing,debug_utils,utils}.py pass follow_imports typechecking (#113519)
Notes:

* `debug_insert_nops` in testing.py was passing `None` to the compiler_fn
parameter of `OutputGraph`, hence the modifications there.
* I added `disable-error-code="method-assign"` to debug_utils.py as it
does several such assignments. I guess mypy doesn't like it because it
makes code near-impossible to safely typecheck.
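For reference, that setting can be applied with a module-level mypy comment; a minimal sketch (the class and assignment below are illustrative, not the debug_utils.py code):

```python
# mypy: disable-error-code="method-assign"


class Patchable:
    def run(self) -> None:
        print("original")


p = Patchable()
p.run = lambda: print("patched")  # would normally be flagged as [method-assign]
p.run()
```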

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113519
Approved by: https://github.com/Skylion007
ghstack dependencies: #113413, #113518
2023-11-11 22:15:46 +00:00
6805d1e1d6 [inductor] Make graph.py pass follow_imports typechecking (#113518)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113518
Approved by: https://github.com/Skylion007
ghstack dependencies: #113413
2023-11-11 22:15:46 +00:00
a8cf04fd2a [inductor] Make {output_graph,pad_mm}.py pass follow_imports typechecking (#113413)
I changed OutputGraph.nn_modules' type to `Dict[str, Any]` because it
seems that `register_attr_or_module` can populate it with essentially
any type.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113413
Approved by: https://github.com/Skylion007
2023-11-11 22:15:46 +00:00
8d41a5c605 [inductor] Fix cat decomp when first tensor is empty (#113514)
Summary: Previously, when the first tensor argument to `aten.cat` was empty and there was only one non-empty tensor argument, the first (empty) tensor was erroneously returned by the `aten.cat` decomposition. Here we fix the bug.
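
A minimal sketch of the corrected selection logic (illustrative names; the real decomposition lives in the inductor decomposition table and handles more cases):

```python
import torch

def cat_decomposition(tensors, dim=0):
    # Empty tensors contribute nothing to the concatenation, so drop them.
    non_empty = [t for t in tensors if t.numel() > 0]
    if not non_empty:
        return tensors[0].clone()
    if len(non_empty) == 1:
        # The bug was returning tensors[0] here, which may be the empty one;
        # return the surviving non-empty tensor instead.
        return non_empty[0].clone()
    return torch.cat(non_empty, dim=dim)
```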

Test Plan:

```
$ python test/inductor/test_torchinductor.py -k test_cat_empty
...
----------------------------------------------------------------------
Ran 2 tests in 5.760s

OK
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113514
Approved by: https://github.com/jansel
2023-11-11 20:34:22 +00:00
39ca5a3226 [ONNX] Fix scalar type promotion between fp16 tensor and fp32 scalar (#113404)
Fixes https://github.com/pytorch/pytorch/issues/104594.

The reason for the exporter behavior in the originally posted issue is as follows:
the ONNX model tracks shape-related computations that were done in PyTorch with
Python numbers as tensor computations. This is the only way for ONNX to track
them properly, since ONNX only has a tensor type; otherwise the computation
result would be baked in statically as a constant, and the model would not work
for another input that differs in shape.

In the type promotion logic, scalars should be treated differently from tensors.
The exporter mistook these shape-related scalars for tensors and promoted them
incorrectly.

This PR fixes the behavior and relaxes the criteria for scalar recognition. For
floating point, previously only a rank-0 value from a model initializer with
dtype torch.double was treated as a scalar. Now any intermediate value qualifies,
and dtype torch.float is accepted as well. The previous assumption was that a
Python number is traced with dtype torch.double, which also no longer appears to
hold.

NOTE that this might introduce a regression: a real rank-0 tensor is now
recognized as a scalar. The downside is that the model will lose accuracy in
these cases, since certain computations will happen in lower-precision data
types.
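
For context, eager PyTorch does not let a Python scalar promote a tensor's dtype, which is the behavior the exporter needs to reproduce (a quick illustration, not exporter code):

```python
import torch

t = torch.ones(3, dtype=torch.float16)
# A Python float is a "weak" scalar: the fp16 tensor dtype wins.
print((t * 2.0).dtype)                                  # torch.float16
# A non-scalar fp32 tensor, by contrast, promotes the result.
print((t * torch.ones(3, dtype=torch.float32)).dtype)   # torch.float32
```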

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113404
Approved by: https://github.com/justinchuby
2023-11-11 15:08:07 +00:00
b00311ce9e [dynamo] Add run_inductor_tests entrypoint (#113278)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113278
Approved by: https://github.com/yanboliang
2023-11-11 08:54:43 +00:00
fb9a136383 [pytorch-vulkan] Add operator<< for uvec3 (#112113)
Summary: Useful for debugging.

Test Plan:
```
LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck2 run fbcode/mode/dev-nosan    //xplat/caffe2:pt_vulkan_api_test_bin  | pastry
```
Test all pass: P865112285

Differential Revision: D50676692

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112113
Approved by: https://github.com/manuelcandales
2023-11-11 08:21:35 +00:00
ef49f61f19 [vision hash update] update the pinned vision hash (#113499)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113499
Approved by: https://github.com/pytorchbot
2023-11-11 03:21:37 +00:00
66d09f8217 [inductor] Move things into torch/testing/_internal/inductor_utils.py (#113275)
This PR is just moving things around, so code shared by multiple test files is in torch/testing/_internal/inductor_utils.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113275
Approved by: https://github.com/yanboliang
ghstack dependencies: #113242
2023-11-11 03:17:35 +00:00
4309d38f5d [dynamo] Refactor test cross importing (#113242)
Having tests import tests is a bit annoying because fbcode/oss have different paths.  This moves that stuff into a helper function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113242
Approved by: https://github.com/yanboliang
2023-11-11 03:17:35 +00:00
5e03af8295 [inductor] Enable floor_div indexing to work under ABI-compat mode (#113276)
Previously, floor_div operations were defined in
ATen/native/BinaryOps.h. Since this header was not included under
ABI-compat mode, trying to use those indexing operations would result in
compilation errors.

Technically, it is safe to use aten::native::floor_div_* functions in
ABI-compat mode as they are header-only; we could simply include
BinaryOps.h. However, there are other declarations in BinaryOps.h that
are not binary-compatible, so this is not ideal. Thus, I have moved those
functions into a separate file, and put them under c10/util, since they
don't really have tensor-specific logic.

c10 functions are not all header-only, so this still isn't ideal, but
this still seems like an improvement. Moreover, cpp_prefix.h -- used
when compiling cpp kernels -- already includes c10 header files, so
ABI-compatibility already depends on maintaining some c10 functions as
header-only.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113276
Approved by: https://github.com/chenyang78, https://github.com/desertfire
2023-11-11 02:51:29 +00:00
e75e01e6b9 Skip if the max element is 0 to avoid invalid config for CAT (#113321)
Summary:
We observed a CUDA invalid-configuration error during training (example log: https://www.internalfb.com/phabricator/paste/view/P876519113). It is actually caused by a grid dimension of 0.

Here is an example failed job: https://www.internalfb.com/mlhub/pipelines/runs/mast/aps-zorror-996644c19c?version=0&env=PRODUCTION
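
The shape of the fix is a guard before the launch, since a grid dimension of 0 is an invalid CUDA launch configuration; a minimal sketch with assumed names (`max_elem`, `block`):

```python
def launch_cat_kernel(max_elem: int, block: int = 256) -> None:
    if max_elem == 0:
        # All inputs are empty: grid would be (0,), which CUDA rejects with
        # an invalid-configuration error, so skip the kernel entirely.
        return
    grid = ((max_elem + block - 1) // block,)
    print(f"launching with grid={grid}")  # stand-in for the real kernel launch
```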

Test Plan: unit test

Differential Revision: D51136494

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113321
Approved by: https://github.com/jianyuh
2023-11-11 02:45:43 +00:00
3b915f9de0 [pt2] enable meta tests for foreach ops (#113484)
Try https://github.com/pytorch/pytorch/pull/113059 again.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113484
Approved by: https://github.com/lezcano
2023-11-11 02:43:41 +00:00
28e11f54ab [dynamo] skip test_internal_error_suppress_errors in fbcode (#113482)
Summary: This test generates a different stack trace in fbcode and seems to have been failing for a while.

Test Plan: sandcastle

Differential Revision: D51210355

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113482
Approved by: https://github.com/oulgen
2023-11-11 02:41:29 +00:00
575be044c3 [TD] Disable HistoricalClassFailurCorrelation (#113497)
Ex. https://github.com/pytorch/pytorch/actions/runs/6829618325/job/18576593307
```
  File "test/run_test.py", line 1806, in main
    test_stats = aggregated_heuristics.get_test_stats(test)
  File "/var/lib/jenkins/pytorch/tools/testing/target_determination/heuristics/interface.py", line 391, in get_test_stats
    metrics = heuristic_results.get_priority_info_for_test(test)
  File "/var/lib/jenkins/pytorch/tools/testing/target_determination/heuristics/interface.py", line 307, in get_priority_info_for_test
    relevance = self._get_test_relevance_group(test_run)
  File "/var/lib/jenkins/pytorch/tools/testing/target_determination/heuristics/interface.py", line 278, in _get_test_relevance_group
    raise ValueError(f"Test {test_run} not found in any relevance group")
ValueError: Test test_cuda_expandable_segments not found in any relevance group
```
I believe that the root cause is that HistoricalClassFailurCorrelation splits `test_cuda_expandable_segments` into two sets: one with a class and one without the class. Then, when the entire `test_cuda_expandable_segments` fails (because we currently don't do class-level granularity in TD), it is unable to find the rank that HistoricalClassFailurCorrelation assigned to the test, since the test is split in two.

I don't think this is that important for normal CI users since this code only runs if a test failed in the first place.  However, it does mean that we can't gather TD stats, so I am going to disable it for now.

One possible solution is to switch the contains call and take the worst or best of the bunch, which I think is what https://github.com/pytorch/pytorch/blob/main/tools/testing/target_determination/heuristics/interface.py#L272 is trying to do, though that is unclear.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113497
Approved by: https://github.com/kit1980, https://github.com/huydhn, https://github.com/malfet
2023-11-11 02:34:27 +00:00
3cb6cf1e8a Revert "[ONNX] Fix scalar type promotion between fp16 tensor and fp32 scalar (#113404)"
This reverts commit f2cd68102a56cd0427f25b748bbe3b463d43807b.

Reverted https://github.com/pytorch/pytorch/pull/113404 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing in trunk f2cd68102a; it may be a land race or flakiness of some sort ([comment](https://github.com/pytorch/pytorch/pull/113404#issuecomment-1806613497))
2023-11-11 02:09:22 +00:00
9f15fbae53 [Dynamo] Fix bug for bytecode hook and leave a test case (#113457)
Fixes https://github.com/pytorch/pytorch/pull/113234#issuecomment-1805584787.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113457
Approved by: https://github.com/jansel
2023-11-11 01:59:48 +00:00
670abff6ff docs: Fix docstring lint errors in torch/distributed/fsdp/_flat_param.py & torch/distributed/fsdp/_init_utils.py (#113358)
Fixes #113189

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113358
Approved by: https://github.com/kit1980
2023-11-11 01:53:02 +00:00
4916a7e94f Revert "[Kineto] Initialize libkineto profilers during torch init process during pybind set-up (#112623)"
This reverts commit a62a88bb84f633581242bd0107e01d2a075884a3.

Reverted https://github.com/pytorch/pytorch/pull/112623 on behalf of https://github.com/huydhn due to This breaks TestCuda::test_lazy_init on ROCm ([comment](https://github.com/pytorch/pytorch/pull/112623#issuecomment-1806597750))
2023-11-11 00:35:56 +00:00
0a7eef9bcf [BE] Remove stale CUDA version check from cpp_extension.py (#113447)
At least CUDA 11.x is now required to build PyTorch on the latest trunk, so the old version check is stale.
We still skip `--generate-dependencies-with-compile` when running on ROCm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113447
Approved by: https://github.com/Skylion007, https://github.com/atalman, https://github.com/PaliC, https://github.com/huydhn
2023-11-11 00:20:08 +00:00
740e8a536f Perf improvements for eager GridSampler (#113341)
Description:
- Added a vectorized `cast` and fixed a `mask_gather` signature bug, both used in GridSampler

Perf speed-up results:
- CPU capability usage: AVX2
```
[--------------------------------------------------------------------------------------- Affine grid sampling, cpu ----------------------------------------------------------------------------------------]
                                                                                                          |  Eager (2.2.0a0+git971a50e) PR  |  Eager (2.2.0a0+git3ca81ae) nightly  |  Speed-up PR vs Nightly
1 threads: -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      Input: (1, 3, 500, 400) torch.float32, torch.contiguous_format, align_corners=True, mode=nearest    |        698.871 (+-42.998)       |         1196.590 (+-16.223)          |     1.712 (+-0.000)
      Input: (1, 3, 500, 400) torch.float64, torch.contiguous_format, align_corners=True, mode=nearest    |       1363.909 (+-49.798)       |         2658.933 (+-62.208)          |     1.949 (+-0.000)
      Input: (1, 3, 500, 400) torch.float32, torch.channels_last, align_corners=True, mode=nearest        |        542.857 (+-3.547)        |         1166.259 (+-13.349)          |     2.148 (+-0.000)
      Input: (1, 3, 500, 400) torch.float64, torch.channels_last, align_corners=True, mode=nearest        |       1110.957 (+-173.044)      |         2472.511 (+-37.322)          |     2.226 (+-0.000)
      Input: (1, 3, 500, 400) torch.float32, torch.contiguous_format, align_corners=False, mode=nearest   |        666.702 (+-3.624)        |         1211.040 (+-15.933)          |     1.816 (+-0.000)
      Input: (1, 3, 500, 400) torch.float64, torch.contiguous_format, align_corners=False, mode=nearest   |       1383.907 (+-52.735)       |         2680.096 (+-72.214)          |     1.937 (+-0.000)
      Input: (1, 3, 500, 400) torch.float32, torch.channels_last, align_corners=False, mode=nearest       |        552.020 (+-4.574)        |         1165.713 (+-13.829)          |     2.112 (+-0.000)
      Input: (1, 3, 500, 400) torch.float64, torch.channels_last, align_corners=False, mode=nearest       |       1195.561 (+-43.627)       |         2479.525 (+-37.279)          |     2.074 (+-0.000)
      Input: (1, 3, 500, 400) torch.float32, torch.contiguous_format, align_corners=True, mode=bilinear   |       1434.594 (+-18.829)       |         3713.318 (+-53.087)          |     2.588 (+-0.000)
      Input: (1, 3, 500, 400) torch.float64, torch.contiguous_format, align_corners=True, mode=bilinear   |       2584.424 (+-61.646)       |         6266.618 (+-70.403)          |     2.425 (+-0.000)
      Input: (1, 3, 500, 400) torch.float32, torch.channels_last, align_corners=True, mode=bilinear       |       1064.318 (+-17.605)       |         3689.232 (+-35.200)          |     3.466 (+-0.000)
      Input: (1, 3, 500, 400) torch.float64, torch.channels_last, align_corners=True, mode=bilinear       |       2227.200 (+-46.111)       |         6053.448 (+-43.859)          |     2.718 (+-0.000)
      Input: (1, 3, 500, 400) torch.float32, torch.contiguous_format, align_corners=False, mode=bilinear  |       1479.566 (+-23.023)       |         3695.113 (+-48.203)          |     2.497 (+-0.000)
      Input: (1, 3, 500, 400) torch.float64, torch.contiguous_format, align_corners=False, mode=bilinear  |       2551.005 (+-58.898)       |         6244.574 (+-66.058)          |     2.448 (+-0.000)
      Input: (1, 3, 500, 400) torch.float32, torch.channels_last, align_corners=False, mode=bilinear      |       1081.029 (+-13.911)       |         3680.292 (+-35.145)          |     3.404 (+-0.000)
      Input: (1, 3, 500, 400) torch.float64, torch.channels_last, align_corners=False, mode=bilinear      |       2209.528 (+-61.779)       |         6073.101 (+-99.366)          |     2.749 (+-0.000)
      Input: (1, 3, 500, 400) torch.float32, torch.contiguous_format, align_corners=True, mode=bicubic    |       4607.162 (+-40.688)       |        14703.326 (+-564.378)         |     3.191 (+-0.000)
      Input: (1, 3, 500, 400) torch.float64, torch.contiguous_format, align_corners=True, mode=bicubic    |      30132.017 (+-679.033)      |        38338.429 (+-768.288)         |     1.272 (+-0.000)
      Input: (1, 3, 500, 400) torch.float32, torch.channels_last, align_corners=True, mode=bicubic        |       4274.459 (+-33.603)       |        14766.649 (+-260.509)         |     3.455 (+-0.000)
      Input: (1, 3, 500, 400) torch.float64, torch.channels_last, align_corners=True, mode=bicubic        |      29137.615 (+-617.591)      |        37420.822 (+-785.526)         |     1.284 (+-0.000)
      Input: (1, 3, 500, 400) torch.float32, torch.contiguous_format, align_corners=False, mode=bicubic   |       4954.048 (+-79.199)       |        14704.016 (+-330.618)         |     2.968 (+-0.000)
      Input: (1, 3, 500, 400) torch.float64, torch.contiguous_format, align_corners=False, mode=bicubic   |      30068.414 (+-792.686)      |        38409.600 (+-691.079)         |     1.277 (+-0.000)
      Input: (1, 3, 500, 400) torch.float32, torch.channels_last, align_corners=False, mode=bicubic       |       4274.381 (+-35.679)       |        14756.324 (+-236.034)         |     3.452 (+-0.000)
      Input: (1, 3, 500, 400) torch.float64, torch.channels_last, align_corners=False, mode=bicubic       |      29148.286 (+-780.277)      |        37389.990 (+-663.702)         |     1.283 (+-0.000)

      Input: (8, 3, 500, 400) torch.float32, torch.contiguous_format, align_corners=True, mode=nearest    |       9656.722 (+-66.127)       |        13726.028 (+-112.412)         |     1.421 (+-0.000)
      Input: (8, 3, 500, 400) torch.float64, torch.contiguous_format, align_corners=True, mode=nearest    |      19947.575 (+-108.492)      |        41501.452 (+-327.186)         |     2.081 (+-0.000)
      Input: (8, 3, 500, 400) torch.float32, torch.channels_last, align_corners=True, mode=nearest        |       7597.021 (+-52.866)       |         10839.269 (+-93.029)         |     1.427 (+-0.000)
      Input: (8, 3, 500, 400) torch.float64, torch.channels_last, align_corners=True, mode=nearest        |      28164.663 (+-179.955)      |        34985.201 (+-350.970)         |     1.242 (+-0.000)
      Input: (8, 3, 500, 400) torch.float32, torch.contiguous_format, align_corners=False, mode=nearest   |       9703.983 (+-154.907)      |        13858.466 (+-128.411)         |     1.428 (+-0.000)
      Input: (8, 3, 500, 400) torch.float64, torch.contiguous_format, align_corners=False, mode=nearest   |      34086.142 (+-212.213)      |        41104.817 (+-433.195)         |     1.206 (+-0.000)
      Input: (8, 3, 500, 400) torch.float32, torch.channels_last, align_corners=False, mode=nearest       |       7626.922 (+-56.371)       |         10916.952 (+-96.023)         |     1.431 (+-0.000)
      Input: (8, 3, 500, 400) torch.float64, torch.channels_last, align_corners=False, mode=nearest       |      28277.855 (+-228.616)      |        34851.453 (+-260.788)         |     1.232 (+-0.000)
      Input: (8, 3, 500, 400) torch.float32, torch.contiguous_format, align_corners=True, mode=bilinear   |      14180.691 (+-184.150)      |        36243.299 (+-350.811)         |     2.556 (+-0.000)
      Input: (8, 3, 500, 400) torch.float64, torch.contiguous_format, align_corners=True, mode=bilinear   |      40699.798 (+-234.600)      |        68053.260 (+-1057.869)        |     1.672 (+-0.000)
      Input: (8, 3, 500, 400) torch.float32, torch.channels_last, align_corners=True, mode=bilinear       |      11190.905 (+-103.419)      |        30729.080 (+-381.639)         |     2.746 (+-0.000)
      Input: (8, 3, 500, 400) torch.float64, torch.channels_last, align_corners=True, mode=bilinear       |      35965.958 (+-298.474)      |        63030.143 (+-390.692)         |     1.752 (+-0.000)
      Input: (8, 3, 500, 400) torch.float32, torch.contiguous_format, align_corners=False, mode=bilinear  |      14461.459 (+-120.555)      |        36150.986 (+-293.416)         |     2.500 (+-0.000)
      Input: (8, 3, 500, 400) torch.float64, torch.contiguous_format, align_corners=False, mode=bilinear  |      40891.653 (+-195.887)      |        67757.076 (+-991.072)         |     1.657 (+-0.000)
      Input: (8, 3, 500, 400) torch.float32, torch.channels_last, align_corners=False, mode=bilinear      |      11437.092 (+-100.145)      |        30465.192 (+-282.936)         |     2.664 (+-0.000)
      Input: (8, 3, 500, 400) torch.float64, torch.channels_last, align_corners=False, mode=bilinear      |      36112.937 (+-306.527)      |        63729.695 (+-678.976)         |     1.765 (+-0.000)
      Input: (8, 3, 500, 400) torch.float32, torch.contiguous_format, align_corners=True, mode=bicubic    |      39512.380 (+-368.172)      |       129854.028 (+-1635.314)        |     3.286 (+-0.000)
      Input: (8, 3, 500, 400) torch.float64, torch.contiguous_format, align_corners=True, mode=bicubic    |     283835.203 (+-2166.425)     |       352072.211 (+-3178.250)        |     1.240 (+-0.000)
      Input: (8, 3, 500, 400) torch.float32, torch.channels_last, align_corners=True, mode=bicubic        |      35804.934 (+-341.254)      |       126762.714 (+-1740.266)        |     3.540 (+-0.000)
      Input: (8, 3, 500, 400) torch.float64, torch.channels_last, align_corners=True, mode=bicubic        |     275862.511 (+-2549.251)     |       341804.886 (+-2974.238)        |     1.239 (+-0.000)
      Input: (8, 3, 500, 400) torch.float32, torch.contiguous_format, align_corners=False, mode=bicubic   |      39514.504 (+-307.814)      |       130436.644 (+-3081.411)        |     3.301 (+-0.000)
      Input: (8, 3, 500, 400) torch.float64, torch.contiguous_format, align_corners=False, mode=bicubic   |     283929.198 (+-2373.485)     |       353432.316 (+-3600.725)        |     1.245 (+-0.000)
      Input: (8, 3, 500, 400) torch.float32, torch.channels_last, align_corners=False, mode=bicubic       |      35776.293 (+-267.109)      |       126884.936 (+-1718.414)        |     3.547 (+-0.000)
      Input: (8, 3, 500, 400) torch.float64, torch.channels_last, align_corners=False, mode=bicubic       |     276278.294 (+-2150.899)     |       326207.948 (+-2578.309)        |     1.181 (+-0.000)

Times are in microseconds (us).
```
[Source](https://github.com/vfdev-5/pth-grid-sampler/blob/master/output/20231109-161300-pr_vs_nightly-speedup.md)

TODO:
- [ ] Add AVX512 benchmark results (I have no access to a cpu with avx512 capabilities anymore)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113341
Approved by: https://github.com/lezcano
2023-11-11 00:16:26 +00:00
e8e3afb784 [ONNX] Refactor MaxPool to support dynamic inputs (#113318)
In https://github.com/pytorch/pytorch/pull/106270, the solution managed to solve the [`ceil_mode` corner issue](https://github.com/onnx/onnx/issues/5711) with the usage of `get_pool_ceil_padding`. However, padding for ceil mode on the converter side only works when the input shapes are already known, so a regression occurred for users who need dynamic inputs.

This PR (1) refactors the code with the torchlib implementation, (2) adds a dynamic-shapes test, and (3) disables the corner-case tests, with comments saying to re-enable them when the [real fix from ONNX](https://github.com/onnx/onnx/pull/5741) is merged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113318
Approved by: https://github.com/thiagocrepaldi
2023-11-10 23:23:49 +00:00
a4dc3716c0 Deprecated verbose parameter in LR schedulers (#111302)
Fixes https://github.com/pytorch/pytorch/issues/100847

This PR follows the comment in https://github.com/pytorch/pytorch/issues/100847#issuecomment-1546247239 by deprecating the `verbose` parameter and removing the print statements. Removing the print statements is technically BC breaking, so I would be okay with putting them back in.

To be less annoying, this PR raises a warning only when `verbose` is explicitly passed in.
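
A minimal sketch of that pattern (hypothetical names, not the torch.optim code): a sentinel default distinguishes "not passed" from any explicit value, so only explicit callers see the warning.

```python
import warnings

_NOT_PASSED = object()


def resolve_verbose(verbose=_NOT_PASSED) -> bool:
    if verbose is _NOT_PASSED:
        return False  # caller relied on the default: stay silent
    warnings.warn(
        "The verbose parameter is deprecated and will be removed.",
        UserWarning,
    )
    return bool(verbose)
```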
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111302
Approved by: https://github.com/albanD
2023-11-10 23:17:27 +00:00
d4e670c37c Add pyre internal configs to gitignore (#113480)
TSIA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113480
Approved by: https://github.com/clee2000
2023-11-10 22:44:13 +00:00
06dc2f162d [AOTI] Implement support for user defined kernels that use triton.autotune (#113229)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113229
Approved by: https://github.com/chenyang78
2023-11-10 22:40:51 +00:00
f2cd68102a [ONNX] Fix scalar type promotion between fp16 tensor and fp32 scalar (#113404)
Fixes https://github.com/pytorch/pytorch/issues/104594.

The reason for the exporter behavior in the originally posted issue is as follows:
the ONNX model tracks shape-related computations that were done in PyTorch with
Python numbers as tensor computations. This is the only way for ONNX to track
them properly, since ONNX only has a tensor type; otherwise the computation
result would be baked in statically as a constant, and the model would not work
for another input that differs in shape.

In the type promotion logic, scalars should be treated differently from tensors.
The exporter mistook these shape-related scalars for tensors and promoted them
incorrectly.

This PR fixes the behavior and relaxes the criteria for scalar recognition. For
floating point, previously only a rank-0 value from a model initializer with
dtype torch.double was treated as a scalar. Now any intermediate value qualifies,
and dtype torch.float is accepted as well. The previous assumption was that a
Python number is traced with dtype torch.double, which also no longer appears to
hold.

NOTE that this might introduce a regression: a real rank-0 tensor is now
recognized as a scalar. The downside is that the model will lose accuracy in
these cases, since certain computations will happen in lower-precision data
types.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113404
Approved by: https://github.com/justinchuby
2023-11-10 22:31:25 +00:00
cbf12dfba6 [LLVM] Replaced getInt8PtrTy with getUnqual (#113455)
[llvm-fb-staging] The build failed in PyTorch JIT due to an upstream LLVM API change. The fix is simply to replace getInt8PtrTy with getUnqual. The corresponding task: [T169468309](https://www.internalfb.com/tasks/?t=169468309).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113455
Approved by: https://github.com/malfet
2023-11-10 22:26:20 +00:00
48c2f89399 [BE] Add friendly error message if compile_fx_inner does not return a tuple/list (#113451)
Previously it would fail here:

```
  File "/data/users/ezyang/a/pytorch/torch/_inductor/fx_passes/post_grad.py", line 597, in remove_noop_ops
    for out in tuple(graph.nodes)[-1].args[0]:
```

Now you'll trigger this assert instead.
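
The friendlier failure amounts to validating the return container up front; a hedged sketch with an assumed helper name:

```python
def check_compiled_output(result):
    assert isinstance(result, (tuple, list)), (
        "compile_fx_inner is expected to return a tuple or list of output "
        f"tensors, got {type(result).__name__}"
    )
```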

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113451
Approved by: https://github.com/albanD
2023-11-10 21:43:58 +00:00
dfa9e7b511 Allow inferring divisibility on unbacked SymInts and do replacement trick (#113165)
We want something like torch.empty(i0, 12).view(4, -1, 12) to work. Right now, it chokes on guards on data-dependent accesses. It turns out we are very close to having it work, based on experiments in https://github.com/pytorch/pytorch/issues/112347, if we do the replacement trick: setting i0 = i1 * 4 to explicitly encode the divisibility. This is good enough for Sympy to handle the rest.

There are two parts to this PR.

* First, we must discover that there is this divisibility constraint. The place where this happens for view is in `infer_size`; however, we are unable to discover the modulus test with `expect_true` because the condition is currently written with a Python boolean operator that forces guarding too early: `numel == newsize or (dim is not None and newsize > 0 and numel % newsize == 0)`. We rewrite this into an equivalent version which first tests whether dim is None, before performing the individual tests (see the sketch after this list). The main nontrivial reasoning here is that the tests in the `dim is not None` branch must be sufficient when `numel == newsize`; but if `numel == newsize`, then the modulus test passes, so the rewrite is equivalent.
* Given the modifications to `infer_size`, this suffices to produce a runtime assert `Eq(Mod(192*i0, 2304), 0)`. Now we must simply turn this into the replacement automatically. I wasn't really sure how to use Sympy to do this for me, so I just manually pattern matched this particular expression form and perform the replacement when it matches.
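
A simplified sketch of the rewritten condition from the first bullet (the real code raises descriptive errors and runs under the symbolic-shapes machinery rather than returning a bool):

```python
def infer_size_ok(numel: int, newsize: int, dim) -> bool:
    if dim is None:
        # No wildcard dimension: the sizes must match exactly.
        return numel == newsize
    # With a wildcard dimension, test positivity and divisibility directly.
    # When numel == newsize these tests pass anyway, so this branch subsumes
    # the dropped disjunct while letting each test be recorded individually
    # (e.g. via expect_true) on unbacked symbols.
    return newsize > 0 and numel % newsize == 0
```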

Note that this is mostly useful for export, because inductor chokes on views involving unbacked SymInts. That will be a follow-up.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113165
Approved by: https://github.com/lezcano, https://github.com/aakhundov
2023-11-10 21:28:02 +00:00
91c90f232a Fix docstring errors in reductions.py, spawn.py, pool.py, parameter.py, cpp.py, grad.py, __init__.py, profiler.py, queue.py, graph.py (#113052)
Fixes #112595
- `torch/autograd/profiler.py` </br>
**Before: 37**

```
torch/autograd/profiler.py:1 at module level:
        D100: Missing docstring in public module
torch/autograd/profiler.py:91 in public class `profile`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/profiler.py:175 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:261 in public method `config`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:272 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:290 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:308 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:313 in public method `__str__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:322 in public method `table`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:346 in public method `export_chrome_trace`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:355 in public method `export_stacks`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:361 in public method `key_averages`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:368 in public method `total_average`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:377 in public method `self_cpu_time_total`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/profiler.py:377 in public method `self_cpu_time_total`:
        D400: First line should end with a period (not 'f')
torch/autograd/profiler.py:555 in public class `record_function`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/profiler.py:555 in public class `record_function`:
        D400: First line should end with a period (not 'f')
torch/autograd/profiler.py:591 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:602 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:608 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:625 in private method `_call_end_callbacks_on_future`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/profiler.py:625 in private method `_call_end_callbacks_on_future`:
        D400: First line should end with a period (not 'c')
torch/autograd/profiler.py:707 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:712 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:733 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:826 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:831 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:853 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:863 in public function `load_nvprof`:
        D401: First line should be in imperative mood (perhaps 'Open', not 'Opens')
torch/autograd/profiler.py:874 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:877 in public method `see`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:883 in public function `parse_nvprof_trace`:
        D103: Missing docstring in public function
torch/autograd/profiler.py:951 in public class `KinetoStepTracker`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/profiler.py:991 in public method `init_step_count`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:995 in public method `erase_step_count`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:1000 in public method `increment_step`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/profiler.py:1023 in public method `current_step`:
        D102: Missing docstring in public method
37
```

**After: 27**

```
torch/autograd/profiler.py:1 at module level:
        D100: Missing docstring in public module
torch/autograd/profiler.py:176 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:262 in public method `config`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:273 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:291 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:309 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:314 in public method `__str__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:323 in public method `table`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:347 in public method `export_chrome_trace`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:356 in public method `export_stacks`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:362 in public method `key_averages`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:369 in public method `total_average`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:593 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:604 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:610 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:708 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:713 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:734 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:827 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:832 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:854 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:875 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:878 in public method `see`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:884 in public function `parse_nvprof_trace`:
        D103: Missing docstring in public function
torch/autograd/profiler.py:993 in public method `init_step_count`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:997 in public method `erase_step_count`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:1025 in public method `current_step`:
        D102: Missing docstring in public method
27
```

- `torch/autograd/graph.py` <br>
**Before: 22**

```
torch/autograd/graph.py:1 at module level:
        D100: Missing docstring in public module
torch/autograd/graph.py:24 in public class `Node`:
        D101: Missing docstring in public class
torch/autograd/graph.py:27 in public method `name`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/autograd/graph.py:42 in public method `next_functions`:
        D102: Missing docstring in public method
torch/autograd/graph.py:47 in public method `metadata`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/autograd/graph.py:56 in public method `register_hook`:
        D401: First line should be in imperative mood (perhaps 'Register', not 'Registers')
torch/autograd/graph.py:94 in public method `register_prehook`:
        D401: First line should be in imperative mood (perhaps 'Register', not 'Registers')
torch/autograd/graph.py:129 in public method `__subclasshook__`:
        D105: Missing docstring in magic method
torch/autograd/graph.py:147 in public function `get_gradient_edge`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/graph.py:147 in public function `get_gradient_edge`:
        D400: First line should end with a period (not 'f')
torch/autograd/graph.py:147 in public function `get_gradient_edge`:
        D401: First line should be in imperative mood; try rephrasing (found 'This')
torch/autograd/graph.py:166 in public function `increment_version`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/graph.py:166 in public function `increment_version`:
        D400: First line should end with a period (not 'd')
torch/autograd/graph.py:166 in public function `increment_version`:
        D401: First line should be in imperative mood; try rephrasing (found 'This')
torch/autograd/graph.py:243 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/graph.py:251 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/graph.py:256 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/graph.py:261 in public class `save_on_cpu`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/graph.py:261 in public class `save_on_cpu`:
        D400: First line should end with a period (not 'e')
torch/autograd/graph.py:303 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/graph.py:365 in public function `register_multi_grad_hook`:
        D401: First line should be in imperative mood (perhaps 'Register', not 'Registers')
torch/autograd/graph.py:588 in public function `allow_mutation_on_saved_tensors`:
        D400: First line should end with a period (not 'd')
22
```

**After: 8**

```
torch/autograd/graph.py:1 at module level:
        D100: Missing docstring in public module
torch/autograd/graph.py:24 in public class `Node`:
        D101: Missing docstring in public class
torch/autograd/graph.py:42 in public method `next_functions`:
        D102: Missing docstring in public method
torch/autograd/graph.py:129 in public method `__subclasshook__`:
        D105: Missing docstring in magic method
torch/autograd/graph.py:244 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/graph.py:252 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/graph.py:257 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/graph.py:303 in public method `__init__`:
        D107: Missing docstring in __init__
8
```

- `torch/multiprocessing/pool.py` <br>
**Before: 6**

```
torch/multiprocessing/pool.py:1 at module level:
        D100: Missing docstring in public module
torch/multiprocessing/pool.py:7 in public function `clean_worker`:
        D103: Missing docstring in public function
torch/multiprocessing/pool.py:18 in public class `Pool`:
        D205: 1 blank line required between summary line and description (found 0)
torch/multiprocessing/pool.py:18 in public class `Pool`:
        D209: Multi-line docstring closing quotes should be on a separate line
torch/multiprocessing/pool.py:29 in private method `_repopulate_pool`:
        D205: 1 blank line required between summary line and description (found 0)
torch/multiprocessing/pool.py:29 in private method `_repopulate_pool`:
        D400: First line should end with a period (not ',')
6
```

**After: 2**

```
torch/multiprocessing/pool.py:1 at module level:
        D100: Missing docstring in public module
torch/multiprocessing/pool.py:7 in public function `clean_worker`:
        D103: Missing docstring in public function
2
```

- `torch/multiprocessing/queue.py` <br>
**Before: 11**

```
torch/multiprocessing/queue.py:1 at module level:
        D100: Missing docstring in public module
torch/multiprocessing/queue.py:8 in public class `ConnectionWrapper`:
        D205: 1 blank line required between summary line and description (found 0)
torch/multiprocessing/queue.py:8 in public class `ConnectionWrapper`:
        D209: Multi-line docstring closing quotes should be on a separate line
torch/multiprocessing/queue.py:8 in public class `ConnectionWrapper`:
        D400: First line should end with a period (not 'o')
torch/multiprocessing/queue.py:11 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/queue.py:14 in public method `send`:
        D102: Missing docstring in public method
torch/multiprocessing/queue.py:19 in public method `recv`:
        D102: Missing docstring in public method
torch/multiprocessing/queue.py:23 in public method `__getattr__`:
        D105: Missing docstring in magic method
torch/multiprocessing/queue.py:29 in public class `Queue`:
        D101: Missing docstring in public class
torch/multiprocessing/queue.py:30 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/queue.py:38 in public class `SimpleQueue`:
        D101: Missing docstring in public class
11
```

**After: 8**

```
torch/multiprocessing/queue.py:1 at module level:
        D100: Missing docstring in public module
torch/multiprocessing/queue.py:10 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/queue.py:13 in public method `send`:
        D102: Missing docstring in public method
torch/multiprocessing/queue.py:18 in public method `recv`:
        D102: Missing docstring in public method
torch/multiprocessing/queue.py:22 in public method `__getattr__`:
        D105: Missing docstring in magic method
torch/multiprocessing/queue.py:28 in public class `Queue`:
        D101: Missing docstring in public class
torch/multiprocessing/queue.py:29 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/queue.py:37 in public class `SimpleQueue`:
        D101: Missing docstring in public class
8
```

- `torch/multiprocessing/reductions.py` <br>
**Before: 31**

```
torch/multiprocessing/reductions.py:1 at module level:
        D100: Missing docstring in public module
torch/multiprocessing/reductions.py:24 in public class `StorageWeakRef`:
        D209: Multi-line docstring closing quotes should be on a separate line
torch/multiprocessing/reductions.py:31 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/reductions.py:38 in public method `from_weakref`:
        D102: Missing docstring in public method
torch/multiprocessing/reductions.py:44 in public method `expired`:
        D102: Missing docstring in public method
torch/multiprocessing/reductions.py:47 in public method `__del__`:
        D105: Missing docstring in magic method
torch/multiprocessing/reductions.py:50 in public method `__hash__`:
        D105: Missing docstring in magic method
torch/multiprocessing/reductions.py:53 in public method `__eq__`:
        D105: Missing docstring in magic method
torch/multiprocessing/reductions.py:60 in public class `SharedCache`:
        D400: First line should end with a period (not 'f')
torch/multiprocessing/reductions.py:62 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/reductions.py:75 in public method `get`:
        D102: Missing docstring in public method
torch/multiprocessing/reductions.py:79 in public method `__setitem__`:
        D105: Missing docstring in magic method
torch/multiprocessing/reductions.py:85 in public method `free_dead_references`:
        D102: Missing docstring in public method
torch/multiprocessing/reductions.py:99 in public function `rebuild_event`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:103 in public function `reduce_event`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:108 in public function `rebuild_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:121 in public function `rebuild_cuda_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:189 in public function `reduce_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:347 in public function `rebuild_nested_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:364 in public function `reduce_nested_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:389 in public function `fd_id`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:397 in public function `storage_from_cache`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:404 in public function `rebuild_storage_fd`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:417 in public function `rebuild_storage_filename`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:437 in public function `rebuild_storage_empty`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:441 in public function `rebuild_typed_storage`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:446 in public function `reduce_typed_storage`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:450 in public function `rebuild_typed_storage_child`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:455 in public function `reduce_typed_storage_child`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:459 in public function `reduce_storage`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:488 in public function `init_reductions`:
        D103: Missing docstring in public function
31
```

**After: 29**

```
torch/multiprocessing/reductions.py:1 at module level:
        D100: Missing docstring in public module
torch/multiprocessing/reductions.py:32 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/reductions.py:39 in public method `from_weakref`:
        D102: Missing docstring in public method
torch/multiprocessing/reductions.py:45 in public method `expired`:
        D102: Missing docstring in public method
torch/multiprocessing/reductions.py:48 in public method `__del__`:
        D105: Missing docstring in magic method
torch/multiprocessing/reductions.py:51 in public method `__hash__`:
        D105: Missing docstring in magic method
torch/multiprocessing/reductions.py:54 in public method `__eq__`:
        D105: Missing docstring in magic method
torch/multiprocessing/reductions.py:63 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/reductions.py:76 in public method `get`:
        D102: Missing docstring in public method
torch/multiprocessing/reductions.py:80 in public method `__setitem__`:
        D105: Missing docstring in magic method
torch/multiprocessing/reductions.py:86 in public method `free_dead_references`:
        D102: Missing docstring in public method
torch/multiprocessing/reductions.py:100 in public function `rebuild_event`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:104 in public function `reduce_event`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:109 in public function `rebuild_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:122 in public function `rebuild_cuda_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:190 in public function `reduce_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:348 in public function `rebuild_nested_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:365 in public function `reduce_nested_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:390 in public function `fd_id`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:398 in public function `storage_from_cache`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:405 in public function `rebuild_storage_fd`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:418 in public function `rebuild_storage_filename`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:438 in public function `rebuild_storage_empty`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:442 in public function `rebuild_typed_storage`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:447 in public function `reduce_typed_storage`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:451 in public function `rebuild_typed_storage_child`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:456 in public function `reduce_typed_storage_child`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:460 in public function `reduce_storage`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:489 in public function `init_reductions`:
        D103: Missing docstring in public function
29
```

- `torch/multiprocessing/spawn.py` <br>
**Before: 19**

```
torch/multiprocessing/spawn.py:1 at module level:
        D100: Missing docstring in public module
torch/multiprocessing/spawn.py:11 in public class `ProcessException`:
        D101: Missing docstring in public class
torch/multiprocessing/spawn.py:14 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:20 in public method `__reduce__`:
        D105: Missing docstring in magic method
torch/multiprocessing/spawn.py:25 in public class `ProcessRaisedException`:
        D205: 1 blank line required between summary line and description (found 0)
torch/multiprocessing/spawn.py:25 in public class `ProcessRaisedException`:
        D400: First line should end with a period (not 'n')
torch/multiprocessing/spawn.py:30 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:40 in public class `ProcessExitedException`:
        D205: 1 blank line required between summary line and description (found 0)
torch/multiprocessing/spawn.py:40 in public class `ProcessExitedException`:
        D400: First line should end with a period (not 'l')
torch/multiprocessing/spawn.py:47 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:59 in public method `__reduce__`:
        D105: Missing docstring in magic method
torch/multiprocessing/spawn.py:85 in public class `ProcessContext`:
        D101: Missing docstring in public class
torch/multiprocessing/spawn.py:86 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:93 in public method `pids`:
        D102: Missing docstring in public method
torch/multiprocessing/spawn.py:97 in public method `join`:
        D205: 1 blank line required between summary line and description (found 0)
torch/multiprocessing/spawn.py:97 in public method `join`:
        D401: First line should be in imperative mood (perhaps 'Try', not 'Tries')
torch/multiprocessing/spawn.py:166 in public class `SpawnContext`:
        D101: Missing docstring in public class
torch/multiprocessing/spawn.py:167 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:180 in public function `start_processes`:
        D103: Missing docstring in public function
19
```

**After: 13**

```
torch/multiprocessing/spawn.py:1 at module level:
        D100: Missing docstring in public module
torch/multiprocessing/spawn.py:11 in public class `ProcessException`:
        D101: Missing docstring in public class
torch/multiprocessing/spawn.py:14 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:20 in public method `__reduce__`:
        D105: Missing docstring in magic method
torch/multiprocessing/spawn.py:27 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:41 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:53 in public method `__reduce__`:
        D105: Missing docstring in magic method
torch/multiprocessing/spawn.py:79 in public class `ProcessContext`:
        D101: Missing docstring in public class
torch/multiprocessing/spawn.py:80 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:87 in public method `pids`:
        D102: Missing docstring in public method
torch/multiprocessing/spawn.py:161 in public class `SpawnContext`:
        D101: Missing docstring in public class
torch/multiprocessing/spawn.py:162 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:175 in public function `start_processes`:
        D103: Missing docstring in public function
13
```

- `torch/multiprocessing/__init__.py` <br>
**Before: 5**

```
torch/multiprocessing/__init__.py:1 at module level:
        D205: 1 blank line required between summary line and description (found 0)
torch/multiprocessing/__init__.py:1 at module level:
        D400: First line should end with a period (not '`')
torch/multiprocessing/__init__.py:57 in public function `set_sharing_strategy`:
        D401: First line should be in imperative mood (perhaps 'Set', not 'Sets')
torch/multiprocessing/__init__.py:69 in public function `get_sharing_strategy`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/multiprocessing/__init__.py:74 in public function `get_all_sharing_strategies`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
5
```

**After: 0**

- `torch/nn/__init__.py` <br>
**Before: 3**

```
torch/nn/__init__.py:1 at module level:
        D104: Missing docstring in public package
torch/nn/__init__.py:14 in public function `factory_kwargs`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/__init__.py:14 in public function `factory_kwargs`:
        D400: First line should end with a period (not 'd')
3
```

**After: 1**

```
torch/nn/__init__.py:1 at module level:
        D104: Missing docstring in public package
1
```

- `torch/nn/cpp.py` <br>
**Before: 16**

```
torch/nn/cpp.py:7 in public class `OrderedDictWrapper`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/cpp.py:7 in public class `OrderedDictWrapper`:
        D400: First line should end with a period (not 'e')
torch/nn/cpp.py:16 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/cpp.py:21 in public method `cpp_dict`:
        D102: Missing docstring in public method
torch/nn/cpp.py:27 in public method `items`:
        D102: Missing docstring in public method
torch/nn/cpp.py:30 in public method `keys`:
        D102: Missing docstring in public method
torch/nn/cpp.py:33 in public method `values`:
        D102: Missing docstring in public method
torch/nn/cpp.py:36 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/nn/cpp.py:39 in public method `__len__`:
        D105: Missing docstring in magic method
torch/nn/cpp.py:42 in public method `__contains__`:
        D105: Missing docstring in magic method
torch/nn/cpp.py:45 in public method `__getitem__`:
        D105: Missing docstring in magic method
torch/nn/cpp.py:50 in public class `ModuleWrapper`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/cpp.py:50 in public class `ModuleWrapper`:
        D400: First line should end with a period (not 'd')
torch/nn/cpp.py:55 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/cpp.py:83 in public method `training`:
        D102: Missing docstring in public method
torch/nn/cpp.py:90 in public method `__repr__`:
        D105: Missing docstring in magic method
16
```

**After: 12**

```
torch/nn/cpp.py:16 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/cpp.py:21 in public method `cpp_dict`:
        D102: Missing docstring in public method
torch/nn/cpp.py:27 in public method `items`:
        D102: Missing docstring in public method
torch/nn/cpp.py:30 in public method `keys`:
        D102: Missing docstring in public method
torch/nn/cpp.py:33 in public method `values`:
        D102: Missing docstring in public method
torch/nn/cpp.py:36 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/nn/cpp.py:39 in public method `__len__`:
        D105: Missing docstring in magic method
torch/nn/cpp.py:42 in public method `__contains__`:
        D105: Missing docstring in magic method
torch/nn/cpp.py:45 in public method `__getitem__`:
        D105: Missing docstring in magic method
torch/nn/cpp.py:52 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/cpp.py:80 in public method `training`:
        D102: Missing docstring in public method
torch/nn/cpp.py:87 in public method `__repr__`:
        D105: Missing docstring in magic method
12
```

- `torch/nn/grad.py` <br>
**Before: 10**

```
torch/nn/grad.py:1 at module level:
        D400: First line should end with a period (not 'e')
torch/nn/grad.py:8 in public function `conv1d_input`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/grad.py:8 in public function `conv1d_input`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
torch/nn/grad.py:40 in public function `conv1d_weight`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
torch/nn/grad.py:71 in public function `conv2d_input`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/grad.py:71 in public function `conv2d_input`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
torch/nn/grad.py:103 in public function `conv2d_weight`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
torch/nn/grad.py:134 in public function `conv3d_input`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/grad.py:134 in public function `conv3d_input`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
torch/nn/grad.py:166 in public function `conv3d_weight`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
10
```

**After: 0**

- `torch/nn/parameter.py` <br>
**Before: 17**

```
torch/nn/parameter.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/parameter.py:14 in public class `Parameter`:
        D204: 1 blank line required after class docstring (found 0)
torch/nn/parameter.py:33 in public method `__new__`:
        D102: Missing docstring in public method
torch/nn/parameter.py:54 in public method `__deepcopy__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:62 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:65 in public method `__reduce_ex__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:84 in public class `UninitializedTensorMixin`:
        D101: Missing docstring in public class
torch/nn/parameter.py:105 in public method `materialize`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/parameter.py:125 in public method `shape`:
        D102: Missing docstring in public method
torch/nn/parameter.py:132 in public method `share_memory_`:
        D102: Missing docstring in public method
torch/nn/parameter.py:138 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:141 in public method `__reduce_ex__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:149 in public method `__torch_function__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:164 in public function `is_lazy`:
        D103: Missing docstring in public function
torch/nn/parameter.py:186 in public method `__new__`:
        D102: Missing docstring in public method
torch/nn/parameter.py:191 in public method `__deepcopy__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:217 in public method `__new__`:
        D102: Missing docstring in public method
17
```

**After: 15**

```
torch/nn/parameter.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/parameter.py:34 in public method `__new__`:
        D102: Missing docstring in public method
torch/nn/parameter.py:55 in public method `__deepcopy__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:63 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:66 in public method `__reduce_ex__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:85 in public class `UninitializedTensorMixin`:
        D101: Missing docstring in public class
torch/nn/parameter.py:127 in public method `shape`:
        D102: Missing docstring in public method
torch/nn/parameter.py:134 in public method `share_memory_`:
        D102: Missing docstring in public method
torch/nn/parameter.py:140 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:143 in public method `__reduce_ex__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:151 in public method `__torch_function__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:166 in public function `is_lazy`:
        D103: Missing docstring in public function
torch/nn/parameter.py:188 in public method `__new__`:
        D102: Missing docstring in public method
torch/nn/parameter.py:193 in public method `__deepcopy__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:219 in public method `__new__`:
        D102: Missing docstring in public method
15
```
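
Most of the fixes in this PR follow the same few patterns. A minimal sketch (hypothetical function, not taken from the actual diff) of what resolving D205, D400, and D401 together looks like:

```python
# Before: violates D205 (no blank line between summary and description),
# D400 (summary does not end with a period), D401 (not imperative mood).
def load_trace(path):
    """Opens a trace file
    and parses the recorded events"""


# After: imperative summary ending with a period, then one blank line
# before the extended description.
def load_trace(path):
    """Open a trace file and parse the recorded events.

    The extended description continues after a single blank line.
    """
```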

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113052
Approved by: https://github.com/mikaylagawarecki, https://github.com/soulitzer
2023-11-10 21:19:17 +00:00
9752ef595c [BE] Consistently use the sym_stride lowering, instead of short-circuiting before (#113071)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113071
Approved by: https://github.com/voznesenskym
2023-11-10 21:19:12 +00:00
958f755a0e [FX][CodeGen] Make sure fx code is valid in python (#113345)
This PR fixes two cases where fx-generated code is invalid Python (a syntax error); a short demonstration follows the list:

1. multiple type annotations in one line: `var1: annotation1, var2: annotation2 = function_call()`
2. invalid type annotation for scalars like `var1: f32[] = function_call()`.
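
Both patterns are rejected by CPython's parser; a quick illustrative check (not part of the PR itself):

```python
# Each of the two patterns above is a SyntaxError in plain Python.
for src in (
    "var1: int, var2: int = fn()",  # case 1: multiple annotated targets
    "var1: f32[] = fn()",           # case 2: empty subscript in annotation
):
    try:
        compile(src, "<fx-generated>", "exec")
    except SyntaxError as e:
        print(f"rejected: {src!r} ({e.msg})")
```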

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113345
Approved by: https://github.com/ezyang
2023-11-10 21:12:16 +00:00
5540d276ce Fix docstring errors in container.py, _functions.py, transformer.py, comm.py, parallel_apply.py, data_parallel.py, scatter_gather.py (#113250)

Fixes #112603

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113250
Approved by: https://github.com/mikaylagawarecki
2023-11-10 21:07:25 +00:00
7b28f8c5ea Better error message when applying interpolation on non-4D tensors (#113459)
Fixes #113445

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113459
Approved by: https://github.com/albanD
2023-11-10 21:06:51 +00:00
a62a88bb84 [Kineto] Initialize libkineto profilers during torch init process during pybind set-up (#112623)
Summary:
We are planning to lazily initialize CUPTI when profiling is actually performed. Therefore, we need to remove profiler init dependency on CUPTI Callbacks' RESOURCE_CONTEXT_CREATED.

Instead, we can initialize the profilers during torch profiler pybind set-up, i.e. in THPAutograd_initExtension(), and lazily in profilerStep().

Test Plan:
CI and ran internally, see internal diff logs.

Differential Revision: D50894961

Pulled By: aaronenyeshi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112623
Approved by: https://github.com/albanD
2023-11-10 20:50:54 +00:00
6b38836c73 [BE] Don't reify entire graph.nodes just to access last element (#113450)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113450
Approved by: https://github.com/albanD
2023-11-10 20:50:14 +00:00
ae2c219de2 Revert "[BE] Remove stale CUDA version check from cpp_extension.py (#113447)"
This reverts commit 7ccca60927cdccde63d6a1d40480950f24e9877a.

Reverted https://github.com/pytorch/pytorch/pull/113447 on behalf of https://github.com/malfet due to Broke ROCM ([comment](https://github.com/pytorch/pytorch/pull/113447#issuecomment-1806407892))
2023-11-10 20:46:13 +00:00
a2c32b8bd0 [inductor] Make codegen/{common,wrapper,cuda/cutlass_utils}.py pass follow_imports typechecking (#113411)
SymIntType is referenced by wrapper.py, so I added its .pyi definition.
I also added SymBoolType along the way for completeness.

The `isinstance` checks in wrapper.py reference torch.Type, which seems
to cause mypy to choke. Not entirely sure why; I've just added
type-ignore comments for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113411
Approved by: https://github.com/Skylion007
ghstack dependencies: #113409, #113410
2023-11-10 19:58:08 +00:00
5a9f08feb5 [inductor] Make {joint_graph,inductor_prims,utils}.py pass follow_imports typechecking (#113410)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113410
Approved by: https://github.com/lezcano
ghstack dependencies: #113409
2023-11-10 19:58:08 +00:00
b0ede09682 [inductor] Make pattern_matcher.py pass follow_imports typechecking (#113409)
Import following reveals that a good number of hints were wrong...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113409
Approved by: https://github.com/Skylion007
2023-11-10 19:58:08 +00:00
6e243f475d [inductor] Move has_torchvision_roi_align check inside test_roi_align (#113385)
Currently `test_torchinductor.py` imports `torchvision` at import time, which
is problematic when you have a broken `torchvision` install as test collection
will fail. This could happen, for example, if `torchvision` was built against a
different version of PyTorch, as may happen regularly in development.

This moves the check inside `test_roi_align` so a failure to import
`torchvision` only causes a test failure and the other tests can run fine.
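
A sketch of the deferred-import pattern (test and class names are illustrative):

```python
import unittest

class TestRoiAlign(unittest.TestCase):
    def test_roi_align(self):
        # Importing inside the test means a broken torchvision install
        # fails or skips just this test instead of test collection.
        try:
            from torchvision.ops import roi_align  # noqa: F401
        except ImportError:
            self.skipTest("torchvision not importable")
        # ... exercise roi_align through inductor here ...
```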

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113385
Approved by: https://github.com/lezcano
ghstack dependencies: #113384
2023-11-10 19:45:33 +00:00
c4fe817a69 [inductor] Fix test_dist on pre-sm80 and add skipCUDAIf decorator (#113384)
`test_dist` uses bfloat16 which isn't well supported by triton on pre-sm80
hardware, so split the test in two and add a skip. This also adds a
`skipCUDAIf` decorator which only skips on CUDA devices so the test still runs
on CPU.
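
Roughly, such a decorator looks like the sketch below (the real one lives in the device-type test framework; the `device_type` attribute is assumed to come from that framework):

```python
import functools
import unittest

def skipCUDAIf(condition, reason):
    """Skip a device-parametrized test, but only when it runs on CUDA."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(self, *args, **kwargs):
            if condition and self.device_type == "cuda":
                raise unittest.SkipTest(reason)
            return fn(self, *args, **kwargs)
        return wrapper
    return decorator
```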

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113384
Approved by: https://github.com/lezcano
2023-11-10 19:45:33 +00:00
7ccca60927 [BE] Remove stale CUDA version check from cpp_extension.py (#113447)
As at least CUDA-11.x is needed to build PyTorch on latest trunk

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113447
Approved by: https://github.com/Skylion007, https://github.com/atalman, https://github.com/PaliC, https://github.com/huydhn
2023-11-10 18:54:19 +00:00
cb233dada4 Fix docstrings on torch/nn/modules (#113260)
Fixes #112598

## Description
Fixes the docstrings on following files.

```bash
pydocstyle path-to-file --count
```
| File                                  |  Count  |
| ------------------------------------- | ------- |
| torch/nn/modules/adaptive.py          |  20 -> 4 |
| torch/nn/modules/channelshuffle.py    |  7 -> 4 |
| torch/nn/modules/conv.py              |  37 -> 25 |
| torch/nn/modules/distance.py          |  7 -> 5 |
| torch/nn/modules/dropout.py           |  17 -> 7 |
| torch/nn/modules/flatten.py           |  10 -> 7 |
| torch/nn/modules/fold.py              |  11 -> 7 |
| torch/nn/modules/instancenorm.py      |  13 -> 1 |
| torch/nn/modules/lazy.py              |  11 -> 2 |
| torch/nn/modules/linear.py            |  20 -> 14 |
| torch/nn/modules/normalization.py     |  25 -> 16 |
| torch/nn/modules/padding.py           |  33 -> 19 |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113260
Approved by: https://github.com/mikaylagawarecki
2023-11-10 18:22:48 +00:00
b794bec581 [PyTorch] AOTI: add AOTIInductorModelGetNumOutputs & use for internal runner (#113299)
I don't see why you couldn't get the number of outputs for a model directly without going through a container. Now you can.

Differential Revision: [D51050435](https://our.internmc.facebook.com/intern/diff/D51050435/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D51050435/)!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113299
Approved by: https://github.com/desertfire, https://github.com/chenyang78
2023-11-10 18:03:24 +00:00
b1eb9e172a remove jit from dynamo benchmark (#113338)
Continuation of https://github.com/pytorch/pytorch/pull/106071; without this, dynamo dist cannot run at the moment.

Related to https://github.com/pytorch/benchmark/pull/1787

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113338
Approved by: https://github.com/ezyang
2023-11-10 18:02:08 +00:00
2cd8c0565c Revert "[AOTI] Implement support for user defined kernels that use triton.autotune (#113229)"
This reverts commit 1488bafb274fcc82c8aac429bad61738bc3f950e.

Reverted https://github.com/pytorch/pytorch/pull/113229 on behalf of https://github.com/PaliC due to breaking test_aot_inductor.py tests though a forward fix is coming ([comment](https://github.com/pytorch/pytorch/pull/113229#issuecomment-1806159396))
2023-11-10 17:46:14 +00:00
3c9a59cb8d Revert "[BE] [cuDNN] Always build assuming cuDNN >= 8.0 (#95722)"
This reverts commit df4f0b3829f8e8b623f4e94a8536cfa58ccfb9af.

Reverted https://github.com/pytorch/pytorch/pull/95722 on behalf of https://github.com/PaliC due to is breaking a bunch of internal pytorch users ([comment](https://github.com/pytorch/pytorch/pull/95722#issuecomment-1806131675))
2023-11-10 17:26:36 +00:00
2a271a3efa Revert "[pytree] register pytree node type in both C++ pytree and Python pytree (#112111)"
This reverts commit a0d00349edbe09087b7bb8769cd1f49fbe7117ca.

Reverted https://github.com/pytorch/pytorch/pull/112111 on behalf of https://github.com/PaliC due to _private_register_pytree_node now checks for duplicate registering, unfortunately, this breaks composability with torchrec internally :(  ([comment](https://github.com/pytorch/pytorch/pull/112111#issuecomment-1806130993))
2023-11-10 17:24:40 +00:00
6e714d7315 [state_dict] Rewrite _gather_state_dict to extract the traversal logic (#112885)
This allows us to do cpu_offload with the same traversal logic

Differential Revision: [D50982355](https://our.internmc.facebook.com/intern/diff/D50982355/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112885
Approved by: https://github.com/LucasLLC, https://github.com/wz337
ghstack dependencies: #112836
2023-11-10 17:07:52 +00:00
c197c48ceb [aotinductor] Add a demo tutorial (#112457)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112457
Approved by: https://github.com/msaroufim, https://github.com/albanD
2023-11-10 17:01:03 +00:00
91e4b0fc4e Improve torch.unique docs (#113424)
Related issue: https://github.com/pytorch/pytorch/issues/105742.
In fact, `torch.unique` always sorts the tensor at the beginning, regardless of the `sorted` argument and the `dim` argument.
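
A quick demonstration of the behavior being documented:

```python
import torch

t = torch.tensor([3, 1, 2, 1, 3])
# Even with sorted=False, the kernels sort internally, so the
# returned unique values still come back in ascending order.
print(torch.unique(t, sorted=False))  # tensor([1, 2, 3])
```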

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113424
Approved by: https://github.com/malfet
ghstack dependencies: #113420
2023-11-10 16:36:30 +00:00
23e0923c74 Revert "[pytree] reorganize submodule structure for C++ and Python pytree (#112278)"
This reverts commit eeeb40b32717bab75bd7d8f28f8343385688b3ab.

Reverted https://github.com/pytorch/pytorch/pull/112278 on behalf of https://github.com/PaliC due to Reverting this pr as the one under it in the stack is causing regressions in torchrec ([comment](https://github.com/pytorch/pytorch/pull/112278#issuecomment-1806044435))
2023-11-10 16:30:36 +00:00
d4c810cc11 [state_dict] Add cpu_only and ranks_only support for _gather_state_dict (#112836)
Add cpu_only and ranks_only support for _gather_state_dict

Differential Revision: [D50962980](https://our.internmc.facebook.com/intern/diff/D50962980/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112836
Approved by: https://github.com/LucasLLC, https://github.com/wz337
2023-11-10 16:03:46 +00:00
08641a3232 Make FakeProcessGroup traceable (#113314)
This PR mimics what we have done to trace ProcessGroup. This allows us to use FakeProcessGroup with torch.compile. FakeProcessGroup allows us to use world_size > 1 without creating multiple processes, thus enabling the usage of PDB to debug bucketing DDP allreduce in the Inductor. We can theoretically use GLOO with world_size==1 to achieve the same goal. However, the `wait()` seems to be optimized away when the world_size is 1.
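
Typical usage looks roughly like this (FakeStore and the "fake" backend are internal test utilities, so the exact import path may vary across versions):

```python
import torch
import torch.distributed as dist
from torch.testing._internal.distributed.fake_pg import FakeStore

# One process pretends to be rank 0 of a 4-rank job; collectives are
# no-ops, so you can single-step compiled DDP code under pdb.
dist.init_process_group("fake", rank=0, world_size=4, store=FakeStore())
t = torch.ones(4)
dist.all_reduce(t)  # returns immediately, no real communication
```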

Differential Revision: [D51136463](https://our.internmc.facebook.com/intern/diff/D51136463/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113314
Approved by: https://github.com/wanchaol
2023-11-10 16:03:38 +00:00
c3c4e70b2c Revert "Revert 107846 and 109695 (#111099)" (#113420)
The algorithm is taken from the NumPy implementation at https://github.com/numpy/numpy/blob/main/numpy/lib/arraysetops.py#L323: it first does a sort on the input sequence and then uses a `mask` to record the unique element of each consecutive section.

We don't yet have parallel sort for 1-dimensional float tensors; it will be enabled in a next step. Parallel radix sort is used for 1-dimensional int tensors.

The following data was collected with the script in the issue on an Intel(R) Xeon(R) Gold 6248 CPU @ 2.5GHz with a single socket (20 cores):

#### before (dtype int64)
```
Numpy just sort: 0.4271528720855713 s
Numpy sort + indexes: 6.383563041687012 s
Torch just sort: 0.46924352645874023 s
Torch sort + indexes: 1.8140404224395752 s
```

#### after (dtype int64)
```
Torch just sort: 0.2540090084075928 s
Torch sort + indexes: 0.2766146659851074 s
```

#### before (float32)
```
Numpy just sort: 0.41129398345947266 s
Numpy sort + indexes: 6.422696590423584 s
Torch just sort: 9.109549283981323 s
Torch sort + indexes: 37.59021711349487 s
```

#### after (float32)
```
Torch just sort: 3.5369982719421387 s
Torch sort + indexes: 3.582240581512451 s
```

If we enable parallel sort on 1-dimensional float tensors, the performance is:
```
Torch just sort: 0.3212606906890869 s
Torch sort + indexes: 0.36211371421813965 s
```

Since I have fused the `inverse_indices` and `count` calculations into a single parallel loop (the algorithm is identical to NumPy's but better optimized), they add only a small amount of additional time.
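
For reference, a minimal eager-mode sketch of the sort + mask idea for the 1-D case (illustrative only; the actual implementation is a fused parallel C++ loop):

```python
import torch

def unique_via_sort(x):
    sorted_x, _ = torch.sort(x)
    # Mark positions whose value differs from the predecessor: those
    # are exactly the first occurrences of each unique element.
    mask = torch.ones_like(sorted_x, dtype=torch.bool)
    mask[1:] = sorted_x[1:] != sorted_x[:-1]
    uniques = sorted_x[mask]
    # Counts fall out of the gaps between consecutive first occurrences.
    first = torch.nonzero(mask).flatten()
    counts = torch.diff(torch.cat([first, torch.tensor([x.numel()])]))
    return uniques, counts

print(unique_via_sort(torch.tensor([3, 1, 2, 1, 3])))
# (tensor([1, 2, 3]), tensor([2, 1, 2]))
```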

Use a reduction implementation for unique when dtype is bool on CPU.

This reverts commit 6dca81c054c1f7e378e956900265b085ca521e47, as the `torch.sort` errors have been fixed in FBGEMM by 70c6e83c29.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113420
Approved by: https://github.com/malfet
2023-11-10 15:45:28 +00:00
8880584015 Improve test_float8.py (#113361)
The numeric test for round-trip casting of float8 dtypes originally consisted of generating a 100x100 tensor in the range 0..max.

This change refactors the test, adds further edge cases and fixes multiple issues with the lower precision simulation which the results of the round-trip cast test were checked against.

Set atol=0 and rtol=0 to ensure an exact equality comparison.
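
For illustration, the exact-comparison style now used (assuming a float8-capable build):

```python
import torch

x = torch.linspace(0, 448.0, steps=100)  # 448 is the e4m3 max normal
a = x.to(torch.float8_e4m3fn).to(torch.float32)
b = x.to(torch.float8_e4m3fn).to(torch.float32)
# With atol=0 and rtol=0, assert_close degenerates into an exact
# element-wise equality check, so any simulation mismatch is fatal.
torch.testing.assert_close(a, b, atol=0, rtol=0)
```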
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113361
Approved by: https://github.com/malfet, https://github.com/Neilblaze
2023-11-10 15:23:22 +00:00
574e313643 Add thiagocrepaldi as person of interest for onnx exporter (#113402)
@malfet @kit1980

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113402
Approved by: https://github.com/malfet
2023-11-10 15:19:58 +00:00
71ca42787f Replaced deprecated pkg_resources.packaging with packaging module (#113023)
Usage of `from pkg_resources import packaging` leads to a deprecation warning:
```
DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
```
and in strict tests where warnings are errors, this leads to CI breaks, e.g.: https://github.com/pytorch/vision/pull/8092

Replacing `pkg_resources.packaging` with `packaging`, as it is now a PyTorch dependency:
fa9045a872/requirements.txt (L19)
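
The migration itself is a one-line swap:

```python
# Before: emits DeprecationWarning (an error under strict warning filters)
# from pkg_resources import packaging

# After: depend on the standalone `packaging` distribution directly.
from packaging import version
import torch

if version.parse(torch.__version__) >= version.parse("2.1"):
    print("new enough")
```
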
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113023
Approved by: https://github.com/Skylion007, https://github.com/malfet
2023-11-10 15:06:03 +00:00
f49b8e9313 Register SymInt-aware meta function for mm out, symintify resize (#113202)
Fixes https://github.com/pytorch/pytorch/issues/112489

Fixes https://github.com/pytorch/pytorch/issues/112494

New OpInfo tests for out variants added, since these were not exercised previously.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113202
Approved by: https://github.com/albanD
2023-11-10 14:27:05 +00:00
4f2b2883dc [Inductor] [Quant] Enable QLinear int8-mixed-bf16 Lowering (#112486)
**Summary**
- PR 7 for enabling Int8-Mixed-BF16 PT2E PTQ Quantization with Inductor https://github.com/pytorch/pytorch/issues/111640.
- Enable the QLinear int8-mixed-bf16 weight prepack and post grad lowering inside inductor.

**TestPlan**
```
python -m pytest test_mkldnn_pattern_matcher.py -k test_qlinear
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112486
Approved by: https://github.com/jgong5, https://github.com/eellison, https://github.com/jerryzh168
2023-11-10 12:35:13 +00:00
eb1534027f Back out "[inductor] scale up num_warps for reductions to lower register pressure (#113039)" (#113400)
Test Plan: CI

Differential Revision: D51180501

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113400
Approved by: https://github.com/htyu
2023-11-10 09:22:29 +00:00
86d32bedc2 [Inductor] [Quant] Enable QConv2d Binary int8-mixed-bf16 Lowering (#112551)
**Summary**
- PR 6 for enabling Int8-Mixed-BF16 PT2E PTQ Quantization with Inductor https://github.com/pytorch/pytorch/issues/111640.
- Enable the QConv2d Binary int8-mixed-bf16 post grad lowering inside inductor.

**TestPlan**
```
python -m pytest test_mkldnn_pattern_matcher.py -k test_qconv2d
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112551
Approved by: https://github.com/jgong5, https://github.com/eellison, https://github.com/jerryzh168
ghstack dependencies: #112550
2023-11-10 09:11:11 +00:00
65e99357ae [Inductor] [Quant] Enable QConv2d Unary int8-mixed-bf16 Lowering (#112550)
**Summary**
- PR 5 for enabling Int8-Mixed-BF16 PT2E PTQ Quantization with Inductor https://github.com/pytorch/pytorch/issues/111640.
- Enable the QConv2d Unary int8-mixed-bf16 weight prepack and post grad lowering inside inductor.

**TestPlan**
```
python -m pytest test_mkldnn_pattern_matcher.py -k test_qconv2d
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112550
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-11-10 08:59:40 +00:00
63d65dd6cd Correct output shape of meta registration for qlinear_pointwise (#112390)
Corrected output shape of meta registration for qlinear_pointwise.
Because the weight of `qlinear_pointwise` is transposed during the qlinear weight-prepack process, its shape is (in_features, out_features).
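
A hypothetical sketch of the corrected shape logic (names illustrative, not the actual registration):

```python
# Since the prepacked weight is stored as (in_features, out_features),
# out_features must be read from dim 1, not dim 0 as for nn.Linear.
def meta_qlinear_pointwise(x, packed_weight, *args, **kwargs):
    return x.new_empty((*x.shape[:-1], packed_weight.shape[1]))
```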

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112390
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/eellison
2023-11-10 07:50:59 +00:00
cyy
41e8632ca4 [1/N] Fix clang-tidy warnings in torch/csrc/profiler (#112360)
This PR fixes some clang-tidy warnings in torch/csrc/profiler

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112360
Approved by: https://github.com/ezyang
2023-11-10 07:37:23 +00:00
0f7ac2635d Uniformly use SourcelessBuilder to handle user defined types (#113390)
Subsumes https://github.com/pytorch/pytorch/pull/110794

Fixes https://github.com/pytorch/pytorch/issues/110315

This is not really a 100% sound fix, a deeper analysis of the bug can be found at https://docs.google.com/document/d/1y-nRAPdbZEji52MPKYzC0U3VhvW9yEAEDqP5t5GhWZ0/edit

The general idea behind the fix here is that we are going to play fast and loose with user defined classes: as Dynamo is written today, we are willing to pull out these types and directly manipulate them (e.g., look at their `__mro__`, etc) without an intervening VariableTracker. As such, if I use `python_type` to extract out the Python type of a VT or if I am manually reading out the `__bases__` of a type, which may be a user defined class, if it is sourceless, all I need to do is use SourcelessBuilder instead of ConstantVariable to make sure I wrap it into the correct VT class.

The approach in https://github.com/pytorch/pytorch/pull/110794 was "more correct", but we'd have to go substantially further to get it all working. So I am doing this to unblock suo for now.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113390
Approved by: https://github.com/suo
2023-11-10 07:26:52 +00:00
59592389fc Revert "[dynamo] Refactor test cross importing (#113242)"
This reverts commit 8858edad656f505728c9810093f796f96e1285cb.

Reverted https://github.com/pytorch/pytorch/pull/113242 on behalf of https://github.com/PaliC due to this diff appears to be causing inductor failures internally ([comment](https://github.com/pytorch/pytorch/pull/113242#issuecomment-1805132719))
2023-11-10 05:43:08 +00:00
eeeb40b327 [pytree] reorganize submodule structure for C++ and Python pytree (#112278)
Reorganized the two C++ and Python pytree submodules into a subpackage. I think this would be easier to implement the abstract `PyTreeAPI` class with two implementations. And it will be much easier for the user to switch between the two implementations.

Before:

```text
torch
├── utils
│   ├── _pytree.py
│   ├── _cxx_pytree.py
│   ...
...
```

After:

```text
torch
├── utils
│   ├── _pytree
│   │   ├── __init__.py
│   │   └── api
│   │       ├── __init__.py
│   │       ├── cxx.py
│   │       └── python.py
│   ...
...
```

The `torch.utils._pytree` module will import all APIs from `torch.utils._pytree.api.python`.
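
Callers are unaffected; the package keeps re-exporting the Python implementation, e.g.:

```python
from torch.utils._pytree import tree_flatten, tree_unflatten

leaves, spec = tree_flatten({"a": 1, "b": (2, 3)})
assert tree_unflatten(leaves, spec) == {"a": 1, "b": (2, 3)}
```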

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112278
Approved by: https://github.com/zou3519
ghstack dependencies: #112111
2023-11-10 05:41:32 +00:00
68bf0f1e7d Revert "[inductor] Move things into torch/testing/_internal/inductor_utils.py (#113275)"
This reverts commit c967dc526a40f4b15003f9c99383acabe66367a6.

Reverted https://github.com/pytorch/pytorch/pull/113275 on behalf of https://github.com/PaliC due to the diff this is stacked on top of appears to be causing inductor failures internally ([comment](https://github.com/pytorch/pytorch/pull/113275#issuecomment-1805131017))
2023-11-10 05:40:55 +00:00
8943207925 [dynamo] Support kwargs for lazy module call. (#113387)
Summary: Seems like we already support kwargs in _infer_argument, so we don't need the extra assertion here.

Test Plan: buck test caffe2/test:test_export -- -r lazy_module_kwargs

Differential Revision: D51170339

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113387
Approved by: https://github.com/yanboliang
2023-11-10 05:17:58 +00:00
7a1314c548 [Kineto] Fix the Chrome trace loading issue with all_to_all input split length > 30 (#113392)
Summary:
This change fixes the Chrome trace loading issue with all_to_all input split length > 30.

Currently, when the `all_to_all` input split size is larger than 30, we truncate the content and add `...` at the end, which causes trouble when loading the trace in Chrome.

Test Plan:
**Trace with length = 2**:
- Link: https://fburl.com/perfdoctor/b94u4x82
 {F1145436735}

**Looking into the json file**:
```
Before:
"In split size": [6058496, 5942784]

After
"In split size": "[6058496, 5942784]"
```

Reviewed By: aaronenyeshi

Differential Revision: D51167843

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113392
Approved by: https://github.com/aaronenyeshi
2023-11-10 05:03:18 +00:00
f9114193bd [NCCL PG] ADD a separate monitoring thread to ensure we collect debug info and check watchdog heartbeat (#112518)
This PR has the following goals:
1. Detect an unhealthy nccl watchdog thread by implementing a heartbeat. NCCL watchdog sometimes can hang for several reasons such as nccl/cuda API bugs or unexpected blocking behaviors. This is the last resort to ensure that we don't silently keep the training job running for hours.
2. Sometimes, the process gets stuck in the destroy of NCCL PG, and this PR will ensure that we will eventually abort it after some time (by default 2 mins)
3. Once heartbeat cannot be heard, we dump debug information (for now, we just use the flight recorder implemented in https://github.com/pytorch/pytorch/pull/110960/files) to disk. (How and where to dump the debug info will be addressed in the following PR).
4. Finally, we initiate std::abort via `LOG(FATAL)` to kill the process.

To clarify further what this PR is trying to solve, we first list the four cases an NCCL PG can end up in:
- case 1: ncclwatchdog gets stuck (maybe some blocking API) and heartbeat monitor kills it during regular heartbeat monitor loop.
- case 2: ncclwatchdog times out and the desync report or destroy kicks in (let's call it shutdown), but this shutdown takes so long that the heartbeat monitor decides it has to kill the process anyway.
- case 3: ncclwatchdog aborts the process (heartbeat monitor not involved)
- case 4: program exits cleanly (heartbeat monitor not involved)

As we can see here, this PR is trying to address cases one and two, and we also want to ensure that adding one more monitor thread does not interfere with what we are currently doing in cases three and four. That's why we added the two flags `terminateHeartbeatMonitorThread_` and `collectiveDebugInfoMode_`.

For cases three and four, either `monitorWakeUpCV_` will be woken up in the destructor or `terminateHeartbeatMonitorThread_` will be set to true, so the monitor thread will just exit ASAP.

For case one, both `terminateHeartbeatMonitorThread_` and `collectiveDebugInfoMode_` will still be false when the monitor thread sees there is no heartbeat, so it will directly kill the process. For case two, either `terminateHeartbeatMonitorThread_` or `collectiveDebugInfoMode_` will be true, and the monitor thread will wait extra time before killing the process.
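
The actual implementation is C++ inside ProcessGroupNCCL; the following is a language-agnostic sketch of the heartbeat-monitor idea (illustrative names, Python for brevity):

```python
import os, signal, threading, time

class HeartbeatMonitor:
    def __init__(self, timeout_s=120):
        self.timeout_s = timeout_s
        self.last_beat = time.monotonic()
        self.stop = threading.Event()
        threading.Thread(target=self._monitor, daemon=True).start()

    def beat(self):
        # Called by the watchdog thread on every loop iteration.
        self.last_beat = time.monotonic()

    def _monitor(self):
        while not self.stop.wait(self.timeout_s / 4):
            if time.monotonic() - self.last_beat > self.timeout_s:
                # Watchdog is presumed stuck: dump debug info (flight
                # recorder) to disk here, then abort the process.
                os.kill(os.getpid(), signal.SIGABRT)
```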

Differential Revision: [D51146305](https://our.internmc.facebook.com/intern/diff/D51146305)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112518
Approved by: https://github.com/kwen2501, https://github.com/wconstab
2023-11-10 04:41:14 +00:00
265d6aac0b [MPS] Fix crashes during Conv backward pass (#113398)
By adding the weights tensor to the MPSGraph cache key.
Also add a regression test to validate that the collision no longer happens.

Fixes https://github.com/pytorch/pytorch/issues/112998

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113398
Approved by: https://github.com/kulinseth
2023-11-10 04:29:33 +00:00
7064fbf1ea Fix selective activation checkpointing with subclasses that override sizes() (#113380)
The problem is that we have a subclass (FunctionalTensor) that overrides size/stride calls, causing them to go through __torch_dispatch__.

But when SAC is active, we have _CachingTorchDispatchMode.__torch_dispatch__ active, that intercepts those size/stride calls first, and does something different with them instead of letting FunctionalTensor.__torch_dispatch__ handle them.

This PR updates the SAC torch dispatch mode to know to not handle metadata calls, and let its tensor arguments handle them directly.

Right now, `FunctionalTensor` has a hardcoded list of metadata ops, but we should probably put them somewhere more general.

I'll add better testing before landing this PR.
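
A minimal sketch of the dispatch-mode change, with a hypothetical metadata-op set standing in for FunctionalTensor's hardcoded list:
```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode

# Hypothetical stand-in for FunctionalTensor's hardcoded metadata-op list.
_METADATA_OPS = {
    torch.ops.aten.sym_size.default,
    torch.ops.aten.sym_stride.default,
    torch.ops.aten.sym_numel.default,
}

class CachingModeSketch(TorchDispatchMode):
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func in _METADATA_OPS:
            # Don't intercept metadata calls; let the tensor arguments
            # (e.g. FunctionalTensor.__torch_dispatch__) handle them.
            return func(*args, **kwargs)
        # ... SAC caching logic for all other ops would go here ...
        return func(*args, **kwargs)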

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113380
Approved by: https://github.com/yf225, https://github.com/wanchaol
2023-11-10 04:12:50 +00:00
cb48f7855a [inductor cpu] fix uint8 add and sub (#113253)
Fix https://github.com/pytorch/pytorch/issues/113016 and https://github.com/pytorch/pytorch/issues/113020 and https://github.com/pytorch/pytorch/issues/113141 and https://github.com/pytorch/pytorch/issues/113143 and https://github.com/pytorch/pytorch/issues/113144
Explicitly typecast the result of add/sub to uint8 (similar to how we fixed mul previously) to avoid C's implicit integer promotion.
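
For reference, a quick eager check of the semantics the generated C++ must match: uint8 arithmetic wraps modulo 256 instead of promoting to int.
```python
import torch

a = torch.tensor([200], dtype=torch.uint8)
b = torch.tensor([100], dtype=torch.uint8)
print(a + b)  # tensor([44], dtype=torch.uint8): 300 wraps, stays uint8
print(b - a)  # tensor([156], dtype=torch.uint8): underflow wraps too
# Without the explicit cast, C's integer promotion keeps the intermediate
# as int (300 / -100), which changes downstream results in the kernel.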

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113253
Approved by: https://github.com/lezcano, https://github.com/jansel
2023-11-10 04:06:42 +00:00
c7e12c7427 Rerun disabled tests on MacOS x86 (#113315)
After the recent change https://github.com/pytorch/pytorch/pull/112103 to get the correct job name for the GitHub runner, I expected rerun disabled tests and memory leak checks to start running on MacOS x86, but they are still not there. It turns out that we also need to fix the schedule there.

Pretty simple change, I guess I will let it test in trunk?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113315
Approved by: https://github.com/clee2000
2023-11-10 03:24:27 +00:00
866457e746 Fix pydocstyle errors in fully_sharded_data_parallel.py, api.py, graph_utils.py, distribute.py, iter_graph_module.py, comm_tensor.py, experimental_ops.py, batch_dim_utils.py, data_parallel.py, graph_optimization.py (#113216)
Fixes #113191

```
pydocstyle torch/distributed/fsdp/fully_sharded_data_parallel.py --count
```

On master: 80
After my changes on this PR: 3

```
pydocstyle torch/distributed/_spmd/comm_tensor.py --count
```
On master: 5
After my changes on this PR: 3

```
pydocstyle torch/distributed/_spmd/experimental_ops.py --count
```
On master: 3
After my changes on this PR: 1

```
pydocstyle torch/distributed/_spmd/iter_graph_module.py --count
```
On master: 39
After my changes on this PR: 27

```
pydocstyle torch/distributed/_spmd/graph_utils.py --count
```
On master: 16
After my changes on this PR: 4

```
pydocstyle torch/distributed/_spmd/distribute.py --count
```
On master: 19
After my changes on this PR: 10

```
pydocstyle torch/distributed/_spmd/api.py --count
```
On master: 10
After my changes on this PR: 3

```
pydocstyle torch/distributed/_spmd/batch_dim_utils.py  --count
```
On master: 14
After my changes on this PR: 3

```
pydocstyle torch/distributed/_spmd/data_parallel.py --count
```
On master: 34
After my changes on this PR: 2

```
pydocstyle torch/distributed/_spmd/graph_optimization.py --count
```
On master: 35
After my changes on this PR: 13

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113216
Approved by: https://github.com/ezyang
2023-11-10 03:08:32 +00:00
773b1cbe4f [BE] Parenthesize `and` clauses for clarity (#113362)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113362
Approved by: https://github.com/albanD, https://github.com/Skylion007
2023-11-10 03:01:48 +00:00
a0d00349ed [pytree] register pytree node type in both C++ pytree and Python pytree (#112111)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112111
Approved by: https://github.com/zou3519
2023-11-10 02:41:30 +00:00
5e2adc8650 [pytree] align function signature between C++ and Python pytree (#112482)
Change the argument name in C++ and Python pytree APIs. Also add a test to ensure the function signatures are the same in the two implementations.

- #112485

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112482
Approved by: https://github.com/zou3519
2023-11-10 02:37:48 +00:00
605236af06 Force fp16 for vision_maskrcnn inference (#113110)
Force fp16 for maskrcnn inference (it doesn't support bf16). Also skip phi_1_5 in training - it OOMs even with batch size 1.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113110
Approved by: https://github.com/xmfan
2023-11-10 02:25:11 +00:00
8bdce9bb74 Fix UntypedStorage.resize_ to keep same CUDA device index (#113386)
Fixes #113300

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113386
Approved by: https://github.com/albanD
2023-11-10 01:57:25 +00:00
1488bafb27 [AOTI] Implement support for user defined kernels that use triton.autotune (#113229)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113229
Approved by: https://github.com/chenyang78
2023-11-10 01:39:00 +00:00
44d0226690 Fix logging exception/stacks from logging (#113394)
We were accidentally dropping them in our formatter, oops.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113394
Approved by: https://github.com/albanD
2023-11-10 01:17:29 +00:00
66150b29e3 Revert "[pytree] align function signature between C++ and Python pytree (#112482)"
This reverts commit 4893a2814ffb5adeec102c17d71d2f25ba5eeb3c.

Reverted https://github.com/pytorch/pytorch/pull/112482 on behalf of https://github.com/PaliC due to changing _register_pytree_node's signature is bc breaking, please revert the signature and reland ([comment](https://github.com/pytorch/pytorch/pull/112482#issuecomment-1804909926))
2023-11-10 00:59:23 +00:00
9a90989121 Revert "[pytree] register pytree node type in both C++ pytree and Python pytree (#112111)"
This reverts commit 95f52611c735ad5d4eb7967f8588fec065a1b323.

Reverted https://github.com/pytorch/pytorch/pull/112111 on behalf of https://github.com/PaliC due to in the bottom diff in the stack changing _register_pytree_node's signature is bc breaking, please revert the signature and reland ([comment](https://github.com/pytorch/pytorch/pull/112111#issuecomment-1804892924))
2023-11-10 00:38:28 +00:00
d18d7a603e [fbgemm_gpu] add pt2_compliant tag to some ops (#113201)
Summary:
X-link: https://github.com/pytorch/FBGEMM/pull/2119

Logs show these ops are being used with PT2, so we are grandfathering in these
ops to the pt2_compliant tag. Most of these ops are tested, some aren't.

Test Plan: - existing tests

Differential Revision: D51076460

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113201
Approved by: https://github.com/williamwen42
2023-11-10 00:32:30 +00:00
cada6c7fee [dynamo] Fix a bug by desugaring in-place ops on constants (#113117)
Summary:

Python allows users to write code like
```
x = 1
x += y
x += z
```

This code has well-defined semantics: because x is an immutable primitive, the first `+=` will actually re-bind x; it is equivalent to `x = x + y`.

The second in-place operation will either similarly desugar (if the result of `x + y` is itself immutable), or possibly result in "true" in-place operation.

Now, this is a problem for us because today, dynamo tries to both resolve constant variables to their literal values at compile time and also compile in a way that treats `operator.*` builtin functions consistently. This leads to a bug where code like
```
x = 1
x += y
```
actually gets compiled to
```
1 += y
```
which is both semantically meaningless and a syntax error.

A very simple fix that we've already used to fix the special case of `+=` is to detect this, treat it as an edge case, and desugar eagerly into `x = x + y`.

The problem with that fix is that it only patched `iadd`, but actually *all* of the in-place operators exhibit this behavior.

This commit proposes that we tackle all of the in-place operators supported by fx in the same way: eagerly remap the operation to an assignment when the left-hand side is actually an immutable constant.

**Alternatives?**

There might be some other fix possible that wouldn't produce a hardcoded remapping; I know that we generally don't like the growth of mappings and blocklists in dynamo.

I'm a little skeptical about a general solution though, because the bug is due precisely to Python's highly dynamic dispatching of inplace operations by type; since the fx graph has to be purely static, I suspect that we actually have to desugar this somewhere, because the dataflow is fundamentally different for true inplace operations on types that define `__iadd__`, etc vs the desugaring on primitives.

I'm open to other suggestions
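
To make the proposed remapping concrete, here is a minimal sketch (the helper and table names are hypothetical; the real change lives in dynamo's handling of the `operator` builtins):
```python
import operator

# In-place builtin -> out-of-place equivalent, consulted only when the
# left-hand side is an immutable constant at trace time.
_INPLACE_TO_BINARY = {
    operator.iadd: operator.add,
    operator.isub: operator.sub,
    operator.imul: operator.mul,
    operator.itruediv: operator.truediv,
    operator.ipow: operator.pow,
}

def desugar_inplace(op, lhs, rhs, lhs_is_immutable_constant):
    if lhs_is_immutable_constant and op in _INPLACE_TO_BINARY:
        # `x += y` on a constant re-binds x, i.e. behaves as `x = x + y`.
        return _INPLACE_TO_BINARY[op](lhs, rhs)
    return op(lhs, rhs)

print(desugar_inplace(operator.iadd, 1, 2, True))  # 3; no `1 += y` emitted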

Test Plan:

I verified that the code in
https://github.com/pytorch/pytorch/issues/112656
compiles with this fix, and the compiled functions produce the same outputs as the originals.

This needs unit tests, but I'd like to get feedback on the approach in the meantime.

Fixes #112656

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113117
Approved by: https://github.com/yanboliang
2023-11-10 00:22:55 +00:00
bf452dcde6 Revert "[pytree] reorganize submodule structure for C++ and Python pytree (#112278)"
This reverts commit fa895da968ec6f1ae128ee95fcb96ba9addac8a0.

Reverted https://github.com/pytorch/pytorch/pull/112278 on behalf of https://github.com/PaliC due to in the bottom diff in the stack changing _register_pytree_node's signature is bc breaking, please revert the signature and reland ([comment](https://github.com/pytorch/pytorch/pull/112278#issuecomment-1804870560))
2023-11-10 00:12:52 +00:00
c967dc526a [inductor] Move things into torch/testing/_internal/inductor_utils.py (#113275)
This PR is just moving things around, so code shared by multiple tests files is in torch/testing/_internal/inductor_utils.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113275
Approved by: https://github.com/yanboliang
2023-11-10 00:11:09 +00:00
8a91138f60 Dont error on returned constant, fix for levit_128 (#112544)
Previously, levit_128 would fail on inference because we would return a view of a constant, which messed up our assertions of outputs being in the cuda graph pool.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112544
Approved by: https://github.com/ezyang
ghstack dependencies: #112543
2023-11-10 00:04:25 +00:00
f8a6ea770c [UCC] Fix input tensor in scatter (#112246)
The input tensor is valid only on the root rank. Fixes https://github.com/openucx/ucc/issues/859

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112246
Approved by: https://github.com/Aidyn-A, https://github.com/Fuzzkatt, https://github.com/kwen2501
2023-11-09 22:53:40 +00:00
c7e0fa49b6 [UCC][CUDA] Overlap p2p (#111608)
The process group needs to set different streams for send and recv ops to make them asynchronous.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111608
Approved by: https://github.com/kwen2501
2023-11-09 22:48:25 +00:00
bb06725ee0 Update mentions of deprecated functions in complex_numbers.rst (#113391)
`torch.svd` is deprecated, and `torch.solve` is completely removed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113391
Approved by: https://github.com/malfet, https://github.com/lezcano
2023-11-09 22:32:26 +00:00
afbf345807 [ROCm] Unskip functorch tests that now work (#110760)
This PR unskips some of the now-working tests that were skipped as a result of https://github.com/pytorch/pytorch/issues/96560

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110760
Approved by: https://github.com/zou3519, https://github.com/jeffdaily
2023-11-09 22:26:02 +00:00
0aed86a175 Fix docstring errors in Zero Redundancy Optimizer (#113200)
This PR reduces docstring errors from 98 to 0. This can be verified by running:
`pydocstyle path-to-zero_redundancy_optimizer.py --count`

BEFORE the PR:
`pydocstyle torch/distributed/optim/zero_redundancy_optimizer.py --count`
98
AFTER the PR:
`pydocstyle torch/distributed/optim/zero_redundancy_optimizer.py --count`
0

Fixes #112642

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113200
Approved by: https://github.com/weifengpy
2023-11-09 22:21:40 +00:00
e6f0960762 [inductor] Make debug.py pass follow-imports typechecking (#113307)
pydot accepts both a str and a list of str for its `prog` parameter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113307
Approved by: https://github.com/Skylion007
ghstack dependencies: #113304, #113305, #113306
2023-11-09 22:08:17 +00:00
a65969928c [inductor] Make codecache.py pass follow-imports typechecking (#113306)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113306
Approved by: https://github.com/Skylion007
ghstack dependencies: #113304, #113305
2023-11-09 22:08:17 +00:00
87082bd025 Reduce single reader check time for inline_container (#113328)
Differential Revision: D51089711

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113328
Approved by: https://github.com/jiayisuse
2023-11-09 22:02:28 +00:00
a3a55df4af [dynamo] Add .pyi declaration of _CacheEntry (#113305)
This is required for enabling follow-imports=silent; referenced by
_dynamo/types.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113305
Approved by: https://github.com/Skylion007, https://github.com/ezyang
ghstack dependencies: #113304
2023-11-09 21:55:49 +00:00
767ce2b81c [dynamo] Make decorators.py pass follow-import typechecking (#113304)
I am trying to turn on `follow_imports=silent` for MYPYNOFOLLOW.
However, this requires a huge number of changes, so I am breaking it
down to a per-file basis.

Unfortunately, we will not be able to turn on `follow_imports` until all
files are fixed, so there is no way to stop regressions. So I hope to get
these fixes in as fast as possible.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113304
Approved by: https://github.com/Skylion007
2023-11-09 21:55:49 +00:00
4e2e0437ea [fx] stylistic improvements for fx.split_module (#113373)
Was overly verbose before. Less qualified / long names = more clarity

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113373
Approved by: https://github.com/wconstab
2023-11-09 21:49:27 +00:00
82369e44a9 Add sym_node to uninteresting files (#113349)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113349
Approved by: https://github.com/Skylion007
2023-11-09 21:38:57 +00:00
ff592f1038 [iOS][PTMCoreMLCompiler] Refactor use of deprecated writeToFile:atomically: (#113377)
Summary:
The NSString writeToFile:atomically: method was deprecated in iOS 2.0.
This diff replaces it with a call to writeToFile:atomically:encoding:error:

duplicate of D51003188 to fix gh permissions

Test Plan: ci

Differential Revision: D51164941

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113377
Approved by: https://github.com/kirklandsign
2023-11-09 21:08:23 +00:00
b8a302ae6a Disable flaky cpp test (#113302)
Fixes [#113251](https://github.com/pytorch/pytorch/issues/113251)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113302
Approved by: https://github.com/clee2000
2023-11-09 20:30:31 +00:00
501d118255 [quant][pt2e] Add transform_for_annotation method in Quantizer (#113115)
Summary:
Adding the method so that people can do some transformations before annotation to make the graph easier to annotate

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_transform_for_annotation

Differential Revision: [D51141080](https://our.internmc.facebook.com/intern/diff/D51141080)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113115
Approved by: https://github.com/kimishpatel
2023-11-09 20:23:29 +00:00
e53da90fe6 [Execution Trace] record global rank in pg_config_info (#113316)
Summary:
pg_config_info is used to dump PG information in the Execution Trace (ET). For trace analysis purposes and the PARAM replay benchmark, the global rank is more meaningful than group ranks.

P.S. `ranks` is a map of global rank -> group rank.

Test Plan: Tested in HPC

Differential Revision: D51136587

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113316
Approved by: https://github.com/XilunWu
2023-11-09 20:04:43 +00:00
5ccd22502f [contextlib] Wrapping a function with set_grad_enabled will consume its global mutation (#113359)
Fixes https://github.com/pytorch/pytorch/issues/113298

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113359
Approved by: https://github.com/soulitzer, https://github.com/jansel
2023-11-09 19:16:20 +00:00
0381d8ce68 Quantized max pool 2d (#112937)
Summary: Add quantized max pool 2d operation

Test Plan:
Check that all quantized tests pass:

buck2 build --target-platforms ovr_config//platform/macos:arm64-fbsource  //xplat/caffe2:pt_vulkan_quantized_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1 --show-output

Running main() from third-party/googletest/1.11.0/googletest/googletest/src/gtest_main.cc
[==========] Running 78 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 78 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.uniform_buffer_copy
[       OK ] VulkanAPITest.uniform_buffer_copy (66 ms)
[ RUN      ] VulkanAPITest.copy_to_buffer
[       OK ] VulkanAPITest.copy_to_buffer (61 ms)
[ RUN      ] VulkanAPITest.copy_to_buffer_channels_last
[       OK ] VulkanAPITest.copy_to_buffer_channels_last (28 ms)
[ RUN      ] VulkanAPITest.cpu_to_vulkan_and_dequantize_quint8
[       OK ] VulkanAPITest.cpu_to_vulkan_and_dequantize_quint8 (58 ms)
[ RUN      ] VulkanAPITest.cpu_to_vulkan_and_dequantize_qint8
[       OK ] VulkanAPITest.cpu_to_vulkan_and_dequantize_qint8 (44 ms)
[ RUN      ] VulkanAPITest.cpu_to_vulkan_and_dequantize_qint32
[       OK ] VulkanAPITest.cpu_to_vulkan_and_dequantize_qint32 (72 ms)
[ RUN      ] VulkanAPITest.quantize_dequantize
[       OK ] VulkanAPITest.quantize_dequantize (2 ms)
[ RUN      ] VulkanAPITest.quantize_per_tensor_and_dequantize_quint8
[       OK ] VulkanAPITest.quantize_per_tensor_and_dequantize_quint8 (69 ms)
[ RUN      ] VulkanAPITest.quantize_per_tensor_and_dequantize_quint8_qparams
[       OK ] VulkanAPITest.quantize_per_tensor_and_dequantize_quint8_qparams (58 ms)
[ RUN      ] VulkanAPITest.quantize_per_tensor_and_dequantize_qint8
[       OK ] VulkanAPITest.quantize_per_tensor_and_dequantize_qint8 (77 ms)
[ RUN      ] VulkanAPITest.quantize_per_tensor_and_dequantize_qint8_qparams
[       OK ] VulkanAPITest.quantize_per_tensor_and_dequantize_qint8_qparams (54 ms)
[ RUN      ] VulkanAPITest.quantize_per_tensor_and_dequantize_qint32
[       OK ] VulkanAPITest.quantize_per_tensor_and_dequantize_qint32 (93 ms)
[ RUN      ] VulkanAPITest.quantize_per_tensor_and_dequantize_qint32_qparams
[       OK ] VulkanAPITest.quantize_per_tensor_and_dequantize_qint32_qparams (90 ms)
[ RUN      ] VulkanAPITest.quantized_add
[       OK ] VulkanAPITest.quantized_add (2 ms)
[ RUN      ] VulkanAPITest.quantized_add_broadcast
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1103 17:42:18.018113 4075724928 Resize.cpp:35] Warning: An output with one or more elements was resized since it had shape [2, 13, 1, 27], which does not match the required output shape [2, 13, 32, 27]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (function _resize_output_check)
[       OK ] VulkanAPITest.quantized_add_broadcast (2 ms)
[ RUN      ] VulkanAPITest.quantized_add_broadcast1
[       OK ] VulkanAPITest.quantized_add_broadcast1 (1 ms)
[ RUN      ] VulkanAPITest.quantized_add_broadcast2
W1103 17:42:18.022008 4075724928 Resize.cpp:35] Warning: An output with one or more elements was resized since it had shape [32, 1], which does not match the required output shape [32, 27]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (function _resize_output_check)
[       OK ] VulkanAPITest.quantized_add_broadcast2 (0 ms)
[ RUN      ] VulkanAPITest.quantized_add_broadcast3
[       OK ] VulkanAPITest.quantized_add_broadcast3 (0 ms)
[ RUN      ] VulkanAPITest.quantized_add_dif_params
[       OK ] VulkanAPITest.quantized_add_dif_params (1 ms)
[ RUN      ] VulkanAPITest.conv2d
[       OK ] VulkanAPITest.conv2d (4 ms)
[ RUN      ] VulkanAPITest.conv2d_pw
[       OK ] VulkanAPITest.conv2d_pw (88 ms)
[ RUN      ] VulkanAPITest.conv2d_dw
[       OK ] VulkanAPITest.conv2d_dw (32 ms)
[ RUN      ] VulkanAPITest.quantized_sub
[       OK ] VulkanAPITest.quantized_sub (1 ms)
[ RUN      ] VulkanAPITest.quantized_mul
[       OK ] VulkanAPITest.quantized_mul (1 ms)
[ RUN      ] VulkanAPITest.quantized_div
[       OK ] VulkanAPITest.quantized_div (1 ms)
[ RUN      ] VulkanAPITest.quantized_upsample_nearest2d
[       OK ] VulkanAPITest.quantized_upsample_nearest2d (0 ms)
[ RUN      ] VulkanAPITest.max_pool2d_qint8
[       OK ] VulkanAPITest.max_pool2d_qint8 (5 ms)
[ RUN      ] VulkanAPITest.max_pool2d_quint8
[       OK ] VulkanAPITest.max_pool2d_quint8 (4 ms)
[ RUN      ] VulkanAPITest.quantized_add_tests
[       OK ] VulkanAPITest.quantized_add_tests (77 ms)
[ RUN      ] VulkanAPITest.quantized_sub_tests
[       OK ] VulkanAPITest.quantized_sub_tests (104 ms)
[ RUN      ] VulkanAPITest.quantized_mul_tests
[       OK ] VulkanAPITest.quantized_mul_tests (78 ms)
[ RUN      ] VulkanAPITest.quantized_div_tests
[       OK ] VulkanAPITest.quantized_div_tests (124 ms)
[ RUN      ] VulkanAPITest.conv2d_quantized_fixed_params_uint8
[       OK ] VulkanAPITest.conv2d_quantized_fixed_params_uint8 (1 ms)
[ RUN      ] VulkanAPITest.conv2d_quantized_computed_params_uint8
[       OK ] VulkanAPITest.conv2d_quantized_computed_params_uint8 (0 ms)
[ RUN      ] VulkanAPITest.conv2d_quantized_random_params_uint8
[       OK ] VulkanAPITest.conv2d_quantized_random_params_uint8 (0 ms)
[ RUN      ] VulkanAPITest.conv2d_quantized_prepack_fixed_params_uint8
[       OK ] VulkanAPITest.conv2d_quantized_prepack_fixed_params_uint8 (0 ms)
[ RUN      ] VulkanAPITest.conv2d_quantized_prepack_computed_params_uint8
[       OK ] VulkanAPITest.conv2d_quantized_prepack_computed_params_uint8 (0 ms)
[ RUN      ] VulkanAPITest.conv2d_quantized_prepack_random_params_uint8
[       OK ] VulkanAPITest.conv2d_quantized_prepack_random_params_uint8 (0 ms)
[ RUN      ] VulkanAPITest.conv2d_dw_quantized_fixed_params_uint8
[       OK ] VulkanAPITest.conv2d_dw_quantized_fixed_params_uint8 (4 ms)
[ RUN      ] VulkanAPITest.conv2d_dw_quantized_computed_params_uint8
[       OK ] VulkanAPITest.conv2d_dw_quantized_computed_params_uint8 (3 ms)
[ RUN      ] VulkanAPITest.conv2d_dw_quantized_random_params_uint8
[       OK ] VulkanAPITest.conv2d_dw_quantized_random_params_uint8 (3 ms)
[ RUN      ] VulkanAPITest.conv2d_dw_quantized_prepack_fixed_params_uint8
[       OK ] VulkanAPITest.conv2d_dw_quantized_prepack_fixed_params_uint8 (3 ms)
[ RUN      ] VulkanAPITest.conv2d_dw_quantized_prepack_computed_params_uint8
[       OK ] VulkanAPITest.conv2d_dw_quantized_prepack_computed_params_uint8 (3 ms)
[ RUN      ] VulkanAPITest.conv2d_dw_quantized_prepack_random_params_uint8
[       OK ] VulkanAPITest.conv2d_dw_quantized_prepack_random_params_uint8 (3 ms)
[ RUN      ] VulkanAPITest.conv2d_pw_quantized_fixed_params_uint8
[       OK ] VulkanAPITest.conv2d_pw_quantized_fixed_params_uint8 (11 ms)
[ RUN      ] VulkanAPITest.conv2d_pw_quantized_computed_params_uint8
input_dif too big: 0.0175897. generating input again ...
[       OK ] VulkanAPITest.conv2d_pw_quantized_computed_params_uint8 (17 ms)
[ RUN      ] VulkanAPITest.conv2d_pw_quantized_random_params_uint8
[       OK ] VulkanAPITest.conv2d_pw_quantized_random_params_uint8 (11 ms)
[ RUN      ] VulkanAPITest.conv2d_pw_quantized_prepack_fixed_params_uint8
[       OK ] VulkanAPITest.conv2d_pw_quantized_prepack_fixed_params_uint8 (11 ms)
[ RUN      ] VulkanAPITest.conv2d_pw_quantized_prepack_computed_params_uint8
[       OK ] VulkanAPITest.conv2d_pw_quantized_prepack_computed_params_uint8 (11 ms)
[ RUN      ] VulkanAPITest.conv2d_pw_quantized_prepack_random_params_uint8
[       OK ] VulkanAPITest.conv2d_pw_quantized_prepack_random_params_uint8 (11 ms)
[ RUN      ] VulkanAPITest.conv2d_quantized_fixed_params_int8_int32
[       OK ] VulkanAPITest.conv2d_quantized_fixed_params_int8_int32 (1 ms)
[ RUN      ] VulkanAPITest.conv2d_quantized_computed_params_int8_int32
[       OK ] VulkanAPITest.conv2d_quantized_computed_params_int8_int32 (0 ms)
[ RUN      ] VulkanAPITest.conv2d_quantized_random_params_int8_int32
[       OK ] VulkanAPITest.conv2d_quantized_random_params_int8_int32 (0 ms)
[ RUN      ] VulkanAPITest.conv2d_quantized_prepack_fixed_params_int8_int32
[       OK ] VulkanAPITest.conv2d_quantized_prepack_fixed_params_int8_int32 (0 ms)
[ RUN      ] VulkanAPITest.conv2d_quantized_prepack_computed_params_int8_int32
[       OK ] VulkanAPITest.conv2d_quantized_prepack_computed_params_int8_int32 (0 ms)
[ RUN      ] VulkanAPITest.conv2d_quantized_prepack_random_params_int8_int32
[       OK ] VulkanAPITest.conv2d_quantized_prepack_random_params_int8_int32 (0 ms)
[ RUN      ] VulkanAPITest.conv2d_dw_quantized_fixed_params_int8_int32
[       OK ] VulkanAPITest.conv2d_dw_quantized_fixed_params_int8_int32 (3 ms)
[ RUN      ] VulkanAPITest.conv2d_dw_quantized_computed_params_int8_int32
[       OK ] VulkanAPITest.conv2d_dw_quantized_computed_params_int8_int32 (4 ms)
[ RUN      ] VulkanAPITest.conv2d_dw_quantized_random_params_int8_int32
[       OK ] VulkanAPITest.conv2d_dw_quantized_random_params_int8_int32 (3 ms)
[ RUN      ] VulkanAPITest.conv2d_dw_quantized_prepack_fixed_params_int8_int32
[       OK ] VulkanAPITest.conv2d_dw_quantized_prepack_fixed_params_int8_int32 (4 ms)
[ RUN      ] VulkanAPITest.conv2d_dw_quantized_prepack_computed_params_int8_int32
[       OK ] VulkanAPITest.conv2d_dw_quantized_prepack_computed_params_int8_int32 (3 ms)
[ RUN      ] VulkanAPITest.conv2d_dw_quantized_prepack_random_params_int8_int32
[       OK ] VulkanAPITest.conv2d_dw_quantized_prepack_random_params_int8_int32 (3 ms)
[ RUN      ] VulkanAPITest.conv2d_pw_quantized_fixed_params_int8_int32
[       OK ] VulkanAPITest.conv2d_pw_quantized_fixed_params_int8_int32 (11 ms)
[ RUN      ] VulkanAPITest.conv2d_pw_quantized_computed_params_int8_int32
[       OK ] VulkanAPITest.conv2d_pw_quantized_computed_params_int8_int32 (12 ms)
[ RUN      ] VulkanAPITest.conv2d_pw_quantized_random_params_int8_int32
[       OK ] VulkanAPITest.conv2d_pw_quantized_random_params_int8_int32 (11 ms)
[ RUN      ] VulkanAPITest.conv2d_pw_quantized_prepack_fixed_params_int8_int32
[       OK ] VulkanAPITest.conv2d_pw_quantized_prepack_fixed_params_int8_int32 (11 ms)
[ RUN      ] VulkanAPITest.conv2d_pw_quantized_prepack_computed_params_int8_int32
[       OK ] VulkanAPITest.conv2d_pw_quantized_prepack_computed_params_int8_int32 (12 ms)
[ RUN      ] VulkanAPITest.conv2d_pw_quantized_prepack_random_params_int8_int32
[       OK ] VulkanAPITest.conv2d_pw_quantized_prepack_random_params_int8_int32 (11 ms)
[ RUN      ] VulkanAPITest.quantized_tensor_get_scale_zero_point
[       OK ] VulkanAPITest.quantized_tensor_get_scale_zero_point (0 ms)
[ RUN      ] VulkanAPITest.linear_2d_flat
[       OK ] VulkanAPITest.linear_2d_flat (3 ms)
[ RUN      ] VulkanAPITest.linear_2d_small
[       OK ] VulkanAPITest.linear_2d_small (0 ms)
[ RUN      ] VulkanAPITest.linear_2d_large
[       OK ] VulkanAPITest.linear_2d_large (2 ms)
[ RUN      ] VulkanAPITest.linear_3d_flat
[       OK ] VulkanAPITest.linear_3d_flat (2 ms)
[ RUN      ] VulkanAPITest.linear_3d_small
[       OK ] VulkanAPITest.linear_3d_small (1 ms)
[ RUN      ] VulkanAPITest.linear_3d_large
[       OK ] VulkanAPITest.linear_3d_large (1 ms)
[ RUN      ] VulkanAPITest.linear_4d_flat
[       OK ] VulkanAPITest.linear_4d_flat (1 ms)
[ RUN      ] VulkanAPITest.linear_4d_small
[       OK ] VulkanAPITest.linear_4d_small (1 ms)
[ RUN      ] VulkanAPITest.linear_4d_large
[       OK ] VulkanAPITest.linear_4d_large (2 ms)
[----------] 78 tests from VulkanAPITest (1537 ms total)

[----------] Global test environment tear-down
[==========] 78 tests from 1 test suite ran. (1537 ms total)
[  PASSED  ] 78 tests.

Differential Revision: D50821920

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112937
Approved by: https://github.com/yipjustin
2023-11-09 19:15:26 +00:00
44c0521e8c fix: docstring error in torch/distributed module (#113241)
Fixes: #113193

`pydocstyle <all_files_in_issue> --count`

- Before: 345
- After: 130

For deprecated methods, I have added a `noqa` to ignore them. I was not able to find the file `torch/distributed/tensor/parallel/multihead_attention_tp.py`, so I've ignored it for this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113241
Approved by: https://github.com/kit1980
2023-11-09 19:10:20 +00:00
977e555ca6 Skip conv-bn folding on multiple conv uses (#112543)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112543
Approved by: https://github.com/XiaobingSuper, https://github.com/davidberard98
2023-11-09 18:38:10 +00:00
b0c9ccdc4b Add standard deviation of metrics over runs to inference benchmark (#113309)
Run each `(batch_size, compile)` benchmark 10 times in `./runner.sh` and get mean and standard deviation of metrics in output table

Only report `warmup latency`, `average_latency`, `throughput` and `gpu_util`

Break `output.md` file into a single markdown file per `(batch_size, compile)` configuration. Further runs of `./runner.sh` will append one row to the table in each file for easy comparison
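
A small sketch of the per-configuration aggregation (the metric name and values are illustrative):
```python
import statistics

# e.g. average_latency (ms) collected over 10 runs of one
# (batch_size, compile) configuration
runs = [12.1, 11.8, 12.4, 12.0, 12.2, 11.9, 12.3, 12.1, 12.0, 12.2]
print(f"{statistics.mean(runs):.2f} ± {statistics.stdev(runs):.2f} ms")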

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113309
Approved by: https://github.com/albanD
2023-11-09 18:38:05 +00:00
d977f118ad Update ruff linter to v0.1.5 (#113355)
Update ruff linter to v0.1.5. Mainly bugfixes, primarily to autofixes, but good to include since there is at least one pydocstyle autofix update in there as people prepare their pydocstyle PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113355
Approved by: https://github.com/kit1980, https://github.com/malfet
2023-11-09 18:06:54 +00:00
9834fb7fd0 [dtensor] full_tensor to return synchronously (#113322)
The full_tensor API should return synchronously instead of returning an
AsyncCollectiveTensor; if the return value is one, we now wait on it
directly. This makes the full_tensor API more precise.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113322
Approved by: https://github.com/wz337
2023-11-09 18:02:40 +00:00
4da5d4b2ef Fix Clang compilation error with Lib ATen for ppc64le (#106446)
This patch fixes errors while compiling Lib ATen with Clang for ppc64le.
I used clang version 15.0.7.
The errors are as follows:
```
No matching function for call to 'vec_sel’
No matching function for call to 'vec_splats'
Excess elements in scalar initializer
Use of undeclared identifier 'vec_vsubudm'
Fix for multiple error within int64_t  DEFINE_MEMBER_OP_AND_ONE
```
References:
- https://releases.llvm.org/9.0.0/tools/clang/docs/AttributeReference.html
- https://reviews.llvm.org/D81083

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106446
Approved by: https://github.com/malfet
2023-11-09 17:27:14 +00:00
289d887a41 Fix ZeroDivisionError when unfolding a zero-dimension tensor in compile mode (#113259)
Fixes #113026

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113259
Approved by: https://github.com/peterbell10
2023-11-09 17:25:36 +00:00
1d56e7b5af Adds broadcast to functional collectives (#112668)
Adds `broadcast` to functional collectives, including inductor support.

Test with `python test_inductor_collectives.py -- TestCollectivesMultiProc.test_broadcast_inductor`
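
A usage sketch; the exact signature is an assumption modeled on the other functional collectives (tensor, source rank, group):
```python
import torch
import torch.distributed._functional_collectives as funcol

def broadcast_from_rank0(t: torch.Tensor, group) -> torch.Tensor:
    # Functional collectives return a new tensor instead of mutating the
    # input in place, so inductor can reason about the dataflow.
    return funcol.broadcast(t, src=0, group=group)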

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112668
Approved by: https://github.com/wanchaol, https://github.com/wconstab
2023-11-09 15:47:52 +00:00
bf2c20be55 [inductor test] enable dynamic loop for test_adaptive_avg_pool1d_argmax (#113339)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113339
Approved by: https://github.com/lezcano
ghstack dependencies: #113168
2023-11-09 15:14:52 +00:00
f98ba596f1 Use CapturedTraceback symbolizer for C++ exceptions from Python library (#113207)
This is the cheap and cheerful implementation, which is only enabled on TORCH_SHOW_CPP_STACKTRACES, because it *eagerly* symbolizes immediately at exception throw time, even if the exception will end up getting caught. It would be better to do this lazily and only symbolize when we try to print the exception, but that requires a more involved refactor of c10::Error that I don't feel like doing.

Compare the output before:

```
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x95 (0x7fa21b99d975 in /data/users/ezyang/c/pytorch/torch/lib/libc10.so)
frame #1: c10::TensorImpl::throw_cannot_call_with_symbolic(char const*) const + 0x8d (0x7fa21b951269 in /data/users/ezyang/c/pytorch/torch/lib/libc10.so)
frame #2: c10::TensorImpl::sizes_custom() const + 0x9f (0x7fa21b9770df in /data/users/ezyang/c/pytorch/torch/lib/libc10.so)
frame #3: at::meta::structured_mm::meta(at::Tensor const&, at::Tensor const&) + 0x31e (0x7fa20a202a8e in /data/users/ezyang/c/pytorch/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x29f34de (0x7fa20b5f34de in /data/users/ezyang/c/pytorch/torch/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0x2a1fd8e (0x7fa20b61fd8e in /data/users/ezyang/c/pytorch/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0x6b907b (0x7fa2142b907b in /data/users/ezyang/c/pytorch/torch/lib/libtorch_python.so)
frame #7: <unknown function> + 0x6b6175 (0x7fa2142b6175 in /data/users/ezyang/c/pytorch/torch/lib/libtorch_python.so)
```

and after:

```
#4 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
#5 c10::TensorImpl::throw_cannot_call_with_symbolic(char const*) const from ??:0
#6 c10::TensorImpl::sizes_custom() const [clone .localalias] from TensorImpl.cpp:0
#7 at::meta::structured_mm::meta(at::Tensor const&, at::Tensor const&) from ??:0
#8 at::(anonymous namespace)::wrapper_Meta_mm_out_out(at::Tensor const&, at::Tensor const&, at::Tensor&) from RegisterMeta.cpp:0
#9 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor& (at::Tensor const&, at::Tensor const&, at::Tensor&), &at::(anonymous namespace)::wrapper_Meta_mm_out_out>, at::Tensor&, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor&> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from RegisterMeta.cpp:0
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113207
Approved by: https://github.com/Skylion007
2023-11-09 15:06:08 +00:00
e6eab49e11 [dynamo] graph break on setattr requires_grad (#113163)
On main: `RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn`
With this PR: we graph break, eager applies the mutation, and new tensors are tracked.

Fixes https://github.com/pytorch/pytorch/issues/109505 (the original bug does not occur, but a new bug where the mutation isn't applied - because AOTAutograd is not `requires_grad` mutation aware - is mitigated)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113163
Approved by: https://github.com/bdhirsh
2023-11-09 13:13:29 +00:00
8c704f7a0e [inductor cpp] fix argmax with >1 reduction dims (#113168)
Fix #113013.

The argmax (and argmin) implementation doesn't handle the index computation properly when the number of reduction dims is larger than one; it wrongly assumed a single reduction dim.

With the given reproducer, the generated code before the change:
```c++
#include "/tmp/torchinductor_jgong5/tb/ctbgktuhgnnlel6ipqkfk76lfztr5pledachdkcq3asdqtlxpzt6.h"
extern "C" void kernel(const double* in_ptr0,
                       long* out_ptr0)
{
    {
        {
            struct IndexValue_1 {size_t index; double value;};
            IndexValue_1 tmp_acc0{0, -std::numeric_limits<double>::infinity()};
            #if !defined(__clang_major__) || __clang_major__ > 9
            #pragma omp declare reduction(argmax : IndexValue_1 :\
                omp_out.value = omp_in.value < omp_out.value ? omp_out.value : omp_in.value,\
                omp_out.index = omp_in.value < omp_out.value ? omp_out.index : omp_in.index)\
            	initializer(omp_priv = {0, -std::numeric_limits<double>::infinity()})
            #endif
            for(long x0=static_cast<long>(0L); x0<static_cast<long>(9L); x0+=static_cast<long>(1L))
            {
                for(long x1=static_cast<long>(0L); x1<static_cast<long>(2L); x1+=static_cast<long>(1L))
                {
                    auto tmp0 = c10::convert<long>(0);
                    auto tmp1 = c10::convert<long>(1);
                    auto tmp2 = tmp0 < tmp1;
                    auto tmp3 = c10::convert<long>(at::native::div_floor_integer((3L*x1), 2L));
                    auto tmp4 = c10::convert<long>(2L + (at::native::div_floor_integer((3L*x1), 2L)));
                    auto tmp5 = tmp3 < tmp4;
                    auto tmp6 = tmp2 & tmp5;
                    auto tmp7 = [&]
                    {
                        auto tmp8 = in_ptr0[static_cast<long>((3L*x0) + (at::native::div_floor_integer((3L*x1), 2L)))];
                        return tmp8;
                    }
                    ;
                    auto tmp9 = tmp6 ? tmp7() : static_cast<decltype(tmp7())>(0.0);
                    auto tmp10 = c10::convert<long>(1L + (at::native::div_floor_integer((3L*x1), 2L)));
                    auto tmp11 = tmp10 < tmp4;
                    auto tmp12 = tmp2 & tmp11;
                    auto tmp13 = [&]
                    {
                        auto tmp14 = in_ptr0[static_cast<long>(1L + (3L*x0) + (at::native::div_floor_integer((3L*x1), 2L)))];
                        return tmp14;
                    }
                    ;
                    auto tmp15 = tmp12 ? tmp13() : static_cast<decltype(tmp13())>(0.0);
                    auto tmp16 = tmp15 + tmp9;
                    auto tmp17 = [&]
                    {
                        auto tmp18 = c10::convert<double>(1.0);
                        return tmp18;
                    }
                    ;
                    auto tmp19 = tmp6 ? tmp17() : static_cast<decltype(tmp17())>(0.0);
                    auto tmp20 = [&]
                    {
                        auto tmp21 = c10::convert<double>(1.0);
                        return tmp21;
                    }
                    ;
                    auto tmp22 = tmp12 ? tmp20() : static_cast<decltype(tmp20())>(0.0);
                    auto tmp23 = tmp22 + tmp19;
                    auto tmp24 = tmp16 / tmp23;
                    if (tmp_acc0.value < tmp24) {
                        tmp_acc0.index = x1; tmp_acc0.value = tmp24; // both x0 and x1 are reduction vars while only x1 is assigned to tmp_acc0.index
                    }
                }
            }
            out_ptr0[static_cast<long>(0L)] = tmp_acc0.index;
        }
    }
}
```
After fix:
```c++
#include "/tmp/torchinductor_jgong5/tb/ctbgktuhgnnlel6ipqkfk76lfztr5pledachdkcq3asdqtlxpzt6.h"
extern "C" void kernel(const double* in_ptr0,
                       long* out_ptr0)
{
    {
        {
            struct IndexValue_1 {size_t index; double value;};
            IndexValue_1 tmp_acc0{0, -std::numeric_limits<double>::infinity()};
            #if !defined(__clang_major__) || __clang_major__ > 9
            #pragma omp declare reduction(argmax : IndexValue_1 :\
                omp_out.value = omp_in.value < omp_out.value ? omp_out.value : omp_in.value,\
                omp_out.index = omp_in.value < omp_out.value ? omp_out.index : omp_in.index)\
            	initializer(omp_priv = {0, -std::numeric_limits<double>::infinity()})
            #endif
            for(long x0=static_cast<long>(0L); x0<static_cast<long>(9L); x0+=static_cast<long>(1L))
            {
                for(long x1=static_cast<long>(0L); x1<static_cast<long>(2L); x1+=static_cast<long>(1L))
                {
                    auto tmp0 = c10::convert<long>(0);
                    auto tmp1 = c10::convert<long>(1);
                    auto tmp2 = tmp0 < tmp1;
                    auto tmp3 = c10::convert<long>(at::native::div_floor_integer((3L*x1), 2L));
                    auto tmp4 = c10::convert<long>(2L + (at::native::div_floor_integer((3L*x1), 2L)));
                    auto tmp5 = tmp3 < tmp4;
                    auto tmp6 = tmp2 & tmp5;
                    auto tmp7 = [&]
                    {
                        auto tmp8 = in_ptr0[static_cast<long>((3L*x0) + (at::native::div_floor_integer((3L*x1), 2L)))];
                        return tmp8;
                    }
                    ;
                    auto tmp9 = tmp6 ? tmp7() : static_cast<decltype(tmp7())>(0.0);
                    auto tmp10 = c10::convert<long>(1L + (at::native::div_floor_integer((3L*x1), 2L)));
                    auto tmp11 = tmp10 < tmp4;
                    auto tmp12 = tmp2 & tmp11;
                    auto tmp13 = [&]
                    {
                        auto tmp14 = in_ptr0[static_cast<long>(1L + (3L*x0) + (at::native::div_floor_integer((3L*x1), 2L)))];
                        return tmp14;
                    }
                    ;
                    auto tmp15 = tmp12 ? tmp13() : static_cast<decltype(tmp13())>(0.0);
                    auto tmp16 = tmp15 + tmp9;
                    auto tmp17 = [&]
                    {
                        auto tmp18 = c10::convert<double>(1.0);
                        return tmp18;
                    }
                    ;
                    auto tmp19 = tmp6 ? tmp17() : static_cast<decltype(tmp17())>(0.0);
                    auto tmp20 = [&]
                    {
                        auto tmp21 = c10::convert<double>(1.0);
                        return tmp21;
                    }
                    ;
                    auto tmp22 = tmp12 ? tmp20() : static_cast<decltype(tmp20())>(0.0);
                    auto tmp23 = tmp22 + tmp19;
                    auto tmp24 = tmp16 / tmp23;
                    if (tmp_acc0.value < tmp24) {
                        tmp_acc0.index = static_cast<long>(x1 + (2L*x0)); tmp_acc0.value = tmp24;
                    }
                }
            }
            out_ptr0[static_cast<long>(0L)] = tmp_acc0.index;
        }
    }
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113168
Approved by: https://github.com/lezcano, https://github.com/jansel
2023-11-09 11:47:51 +00:00
be66d5e845 Add file name and size to the serialization metadata logging (#113077)
Summary:
To be able to get more info on serialization/deserialization events, add these two fields to the metadata logging:
- file_name
- file_size

Test Plan: buck2 test mode/dev caffe2/caffe2/serialize:inline_container_test

Reviewed By: davidberard98

Differential Revision: D51040426

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113077
Approved by: https://github.com/davidberard98
2023-11-09 11:14:24 +00:00
addb8e29cd Enable 2d + AC torch.compile (#112536)
This PR enables AC + torch.compile to work with FSDP + TP. The fix to the
higher-order op path is that we need to check both tensor and
tensor-subclass bases when making a sourceless builder.

NOTE: selective AC + 2D is still not working, need to fix this
separately

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112536
Approved by: https://github.com/yf225
2023-11-09 06:12:13 +00:00
acd595e352 [easy][tp] Fix typo (#113292)
Summary: as title

Test Plan: buck test mode/opt  -c fbcode.enable_gpu_sections=true //caffe2/test/distributed/_tensor/experimental:tp_transform

Differential Revision: D51124333

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113292
Approved by: https://github.com/Skylion007
2023-11-09 06:02:18 +00:00
0093e23e52 [dynamo] GradModeVariable should only be eagerly initialized when doing the equivalent of set_grad_enabled (#113293)
The grad mode variable was previously initialized eagerly when called, which is wrong when it is not explicitly used via `set_grad_enabled`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113293
Approved by: https://github.com/jansel
2023-11-09 06:00:14 +00:00
b3ad29e269 [export] Fix executorch models. (#113296)
Summary: yolo fixing issues. See Test plan

Test Plan:
buck2 run 'fbcode//mode/dev' fbcode//executorch/examples/portable/test:test_export -- -r test_mv3_export_to_executorch

[Need acl to repro this but the error message looks straightforward]
buck2 test 'fbcode//mode/dev-nosan' fbcode//pye/model_inventory/nlu_stella_cap:nlu_stella_cap_test -- --exact 'pye/model_inventory/nlu_stella_cap:nlu_stella_cap_test - test_export_to_backend_dynamic_quantized (pye.model_inventory.nlu_stella_cap.NluStellaCapTest.NluStellaCapTest)'

Differential Revision: D51128480

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113296
Approved by: https://github.com/tugsbayasgalan
2023-11-09 03:58:16 +00:00
fbf7866ac9 [Inductor] Fallback scatter when src dtype is bf16 (#113204)
basic_gnn_gcn, basic_gnn_gin, basic_gnn_sage now pass

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113204
Approved by: https://github.com/eellison
2023-11-09 03:43:11 +00:00
31ded95cd5 [2D] Bind _fsdp_extension to FSDP instances (#113237)
Currently, when we have 2D composition, a global variable _extensions controls the 2D deviation we need to take in state_dict calls (See https://github.com/pytorch/pytorch/blob/release/2.1/torch/distributed/fsdp/_fsdp_extensions.py#L66-L68). This is problematic when we have both a 2D model and a plain FSDP model in the same dist environment, as the _extensions will be mistakenly turned on for the plain FSDP model, resulting in state_dict error (RuntimeError: No parent device_mesh is found for FSDP device_mesh.).

This PR binds _fsdp_extension to the FSDP instances to make sure that state_dict calls do not interfere with each other when mixing 2D and 1D parallelism.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113237
Approved by: https://github.com/fduwjj, https://github.com/fegin
2023-11-09 03:31:03 +00:00
204ec11e6d [inductor][easy] Fix fusion logging (#113308)
We should use %s instead of %d, as the numels may be sympy Exprs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113308
Approved by: https://github.com/lezcano
2023-11-09 03:19:39 +00:00
adcf9bb2bd optimize case where div denominator is -1 (#112878)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112878
Approved by: https://github.com/lezcano
2023-11-09 02:41:05 +00:00
b694f88ef6 Grandfather in built-in TorchScript ops to being pt2_compliant (#113061)
I'm seeing ops like torch.ops.aten.mul.complex being used with
torch.compile (though this seems strange to me), but we should
grandfather these in.

Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113061
Approved by: https://github.com/ezyang
ghstack dependencies: #113050
2023-11-09 02:35:33 +00:00
c88a36ebce Grandfather in some more pytorch ops to be pt2_compliant (#113050)
We're not directly testing these, but in general the policy is to assume
that PyTorch ops inside the pytorch repo are compliant.

Test Plan:
- existing tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113050
Approved by: https://github.com/ezyang
2023-11-09 02:35:33 +00:00
e2236ae097 [tp] Fix test_tp_transform_with_uncovered_op (#113310)
Summary:
Test fails on CPU currently with some weird error when `wait_tensor = torch.ops.c10d_functional.wait_tensor.default(all_gather_into_tensor);` runs.
```
[rank2]:[2023-11-08 13:30:29,940] torch.testing._internal.common_distributed: [ERROR] RuntimeError: A view was created in no_grad mode and is being modified inplace with grad mode enabled. This view is the output of a function that returns multiple views. Such functions do not allow the output views to be modified inplace. You should replace the inplace operation by an out-of-place one.
```

https://www.internalfb.com/intern/test/562950070959214/

Test Plan: buck test mode/opt  -c fbcode.enable_gpu_sections=true //caffe2/test/distributed/_tensor/experimental:tp_transform

Reviewed By: weifengpy

Differential Revision: D51131676

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113310
Approved by: https://github.com/wanchaol
2023-11-09 02:00:11 +00:00
15b61d6c1a TensorImpl: Lazily compute numel and contiguity when symbolic (#112785)
Currently whenever the sizes or strides are modified for a `TensorImpl` we
eagerly recompute the numel and memory format flags. This is fine for static
shapes as it's all fast C++ code, but for symbolic shapes it runs slow python code.

This instead changes the `SymbolicShapeMeta` object to compute the derived
quantities lazily at the first request. This has the added benefit that we can
now pass assumptions in `empty_tensor_restride` which remove the need to compute
some contiguity flags at all.
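
A Python sketch of the laziness pattern described above: invalidate on mutation, compute on first read (names are illustrative; the real change is in the C++ `SymbolicShapeMeta`).
```python
import math

class SymbolicShapeMetaSketch:
    def __init__(self, sizes):
        self._sizes = list(sizes)
        self._numel = None  # derived quantity, computed lazily

    def set_sizes(self, sizes):
        self._sizes = list(sizes)
        self._numel = None  # invalidate instead of eagerly recomputing

    @property
    def numel(self):
        if self._numel is None:  # first request runs the (slow) computation
            self._numel = math.prod(self._sizes)
        return self._numel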

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112785
Approved by: https://github.com/ezyang
ghstack dependencies: #112689, #112890
2023-11-09 01:36:37 +00:00
8c4bdac560 TensorImpl: Move symbolic refresh_numel and refresh_contiguous into their own class (#112890)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112890
Approved by: https://github.com/lezcano
ghstack dependencies: #112689
2023-11-09 01:36:37 +00:00
8858edad65 [dynamo] Refactor test cross importing (#113242)
Having tests import tests is a bit annoying because fbcode/oss have different paths.  This moves that stuff into a helper function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113242
Approved by: https://github.com/yanboliang
2023-11-09 01:36:27 +00:00
325e0fdfdd Enable masked_scatter_backward for inductor (#109642)
masked_scatter_backward was previously implemented as a
CompositeExplicitAutograd, which involved a decomp that calls
masked_select, and masked_select in general produces data-dependent
shapes that inductor doesn't support. But masked_scatter_backward
reshapes the return value of masked_select such that the end result has
a static shape again.

I have converted masked_scatter_backward into an aten op to avoid this
issue.
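
An illustration of the shape issue: the `masked_select` output size depends on the mask's contents, but scattering back restores the static input shape.
```python
import torch

x = torch.randn(4, 4)
mask = torch.rand(4, 4) > 0.5
selected = torch.masked_select(x, mask)   # data-dependent shape
print(selected.shape)                     # depends on how many True entries

out = torch.zeros_like(x).masked_scatter_(mask, selected)
print(out.shape)                          # torch.Size([4, 4]): static again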

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109642
Approved by: https://github.com/ezyang
ghstack dependencies: #108170
2023-11-09 01:27:57 +00:00
14811d69d7 [BE] Cleanup sdpa test helper usage (#113294)
# Summary

standardizes usage of the rand_sdpa_tensor helper

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113294
Approved by: https://github.com/soulitzer
2023-11-09 01:16:53 +00:00
84d64d72d6 Persist copy_ in training graph for inputs that don't require grad (#111046)
In this PR, we try to keep the input mutations in the forward graph IFF input mutation is data mutation and not metadata mutation and doesn't require grad. This is for optimizing inductor training graphs. (For more details: https://github.com/pytorch/pytorch/issues/109240)

We keep the input mutation in the graph by wrapping the original callable in a wrapper function that, at the end, adds an input.copy_(updated_input) call, which is then traced via make_fx. Previously, this was only enabled for the forward-only path and unconditionally disabled for the joint graph.

Another caveat is that when we are tracing through tensor subclasses, we won't allow any input mutations to be preserved in the graph. The reason is that it makes the code logic quite ugly for no obvious performance improvement.

Most of the changes in this PR are mechanical, and I didn't have to make any changes to the partitioner. Previously, forward/backward heavily relied on the metadata field `num_mutated_inps` to figure out whether something is returned as an extra output or not. But now, since we keep some mutations in the graph, we need to propagate something similar to `num_mutated_inps - num_graph_handled_inps`.
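
A minimal sketch of the wrapping trick (function names are illustrative), showing how the `copy_` ends up in the traced graph:
```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx

def compute(x):          # stand-in for the original callable
    return x.sin()

def wrapper(x):
    out = compute(x)
    x.copy_(out)         # persist the input mutation inside the graph
    return out

gm = make_fx(wrapper)(torch.randn(3))
print(gm.graph)          # contains a copy_ node applied to the input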

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111046
Approved by: https://github.com/ezyang, https://github.com/bdhirsh
2023-11-09 00:40:29 +00:00
2c4be77f02 Revert "[dynamo] Graph break on setattr(Tensor, "data", Tensor) (#113043)" (#113297)
This reverts commit ddfe5725342b0c0f707222879ca9dac305f97210.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113297
Approved by: https://github.com/PaliC
2023-11-09 00:26:21 +00:00
94d95a91a2 Revert "[dynamo] graph break on setattr requires_grad (#113163)"
This reverts commit d261687d5f56ac8148fab2567cf1fa6dd5264def.

Reverted https://github.com/pytorch/pytorch/pull/113163 on behalf of https://github.com/PaliC due to relevant tests are not running for this pr, however, this is fixed after landing https://github.com/pytorch/pytorch/pull/113297/ ([comment](https://github.com/pytorch/pytorch/pull/113163#issuecomment-1802967236))
2023-11-09 00:23:04 +00:00
12c257cc00 [qunat][pt2e] Support allow_implicit_sharing flag (#112929)
Summary:
For a node `node1` and an edge `(node1, node2)`: since they observe the same
Tensor, we may want to implicitly share observers. This flag allows people to
turn off this behavior for the output of the node.

See the test_allow_implicit_sharing test for use case

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_allow_implicit_sharing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112929
Approved by: https://github.com/kimishpatel
2023-11-08 23:47:17 +00:00
625958d8bc Inductor support for native c10d_functional (#112439)
This PR adds Inductor support for [native c10d_functional ops](https://github.com/pytorch/pytorch/pull/110570).

The Inductor IRs introduced in this PR will replace the existing `CollectiveKernel` IR hierarchy. Compared to the existing collective IRs, the new IRs:
- Are target language agnostic and support AOTInductor.
- Express the constraints solely with read/write deps. This maximizes the potential for buffer reuse.
- Address an issue where out-of-place collective's input buffers could be mutated while being volatilely read.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112439
Approved by: https://github.com/Chillee
2023-11-08 23:40:21 +00:00
297c26bb8e Support fp8 in AOTInductor + support optional<> in C ABI (#112527)
This was originally ipiszy's PR: https://github.com/pytorch/pytorch/pull/112358

It turns out that we need to add support for optional types in order to
support fp8 gemm (i.e. scaled_mm). Since our ABI-stable C interface
can't support optional<> directly, I am passing in optional types via
pointer instead.

`AtenTensorHandle`s are already pointers, so nothing needs to change
there. Only value types need to change.

We decided on this approach instead of adding an extra `bool` param to
the callee because this simplifies things. Having the same number of
arguments regardless of whether we are emitting Python / C++ /
ABI-compatible C++ makes codegen easier.
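
A Python-side sketch of the convention, using ctypes purely for illustration (the actual interface is the generated C shim): `None` crosses the ABI-stable boundary as a NULL pointer.
```python
import ctypes

def to_optional_double_arg(value):
    """Convert an Optional[float] into the nullable pointer the C callee expects."""
    if value is None:
        return ctypes.POINTER(ctypes.c_double)()   # NULL pointer == nullopt
    return ctypes.pointer(ctypes.c_double(value))  # pointer to the value

# The callee always receives the same number of arguments; it checks the
# pointer for NULL instead of taking an extra `bool` presence flag.
arg = to_optional_double_arg(None)
print(bool(arg))  # False: a NULL ctypes pointer is falsy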

There are a number of existing ABI-compatible functions that have
optional-typed value parameters. Previously, they just assumed they
would never be passed a `nullopt` / `None` at runtime. Changing them to
use pointer types now would break ABI stability, so I have created an
exclude list for those functions.

Finally, I think the current implementation is kind of messy, and only
works for FallbackKernels, even though technically ExternKernels could
also have the same issue. It also doesn't support optional types nested
in lists. I've left FIXME comments for both issues.

Differential Revision: [D51084289](https://our.internmc.facebook.com/intern/diff/D51084289)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112527
Approved by: https://github.com/chenyang78, https://github.com/desertfire
2023-11-08 22:56:48 +00:00
ee777a7c3c docs: Add docstring for torch.masked._ops.logaddexp (#113206)
logaddexp is neither a reduction nor a normalization, so
_apply_docstring_templates cannot be used to add a docstring.

Fixes https://github.com/pytorch/pytorch/issues/113082

Also fix another misspelling.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113206
Approved by: https://github.com/cpuhrsch
2023-11-08 22:45:35 +00:00
f6c00b16c8 [aotinductor] Update the benchmarking script to clone an eager model (#113046)
Summary: Fix https://github.com/pytorch/pytorch/issues/113029, where running a model in eager mode can somehow change a weight's stride.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113046
Approved by: https://github.com/angelayi
2023-11-08 22:05:03 +00:00
24bb60d8a1 [inductor] Add test for debug.trace mode (#113240)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113240
Approved by: https://github.com/oulgen
2023-11-08 21:50:18 +00:00
5506b9db43 [decomp] Fix _scaled_dot_product_flash_attention decomposition bug (#113102)
For `_scaled_dot_product_flash_attention` we don't have

`Tensor? attn_mask=None`

but `scaled_dot_product_attention` does. In the original decomp there's a
mixup where I added this argument to
`_scaled_dot_product_flash_attention`.

Fix it so that `_scaled_dot_product_flash_attention` is decomposed correctly.
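For illustration, a minimal sketch of the signature mismatch the decomposition has to respect:

```
import torch
import torch.nn.functional as F

q = k = v = torch.randn(2, 4, 8, 16)

# The public API accepts an optional mask (Tensor? attn_mask=None)...
out = F.scaled_dot_product_attention(q, k, v, attn_mask=None)

# ...whereas aten::_scaled_dot_product_flash_attention has no such
# parameter, so its decomposition must not take one either.
```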

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113102
Approved by: https://github.com/ezyang
2023-11-08 21:47:37 +00:00
aef9e43fe6 Revert "Replaced deprecated pkg_resources.packaging with packaging module (#113023)"
This reverts commit 81ea7a489a85d6f6de2c3b63206ca090927e203a.

Reverted https://github.com/pytorch/pytorch/pull/113023 on behalf of https://github.com/atalman due to breaks nightlies ([comment](https://github.com/pytorch/pytorch/pull/113023#issuecomment-1802720774))
2023-11-08 21:39:59 +00:00
b30f178d09 Replace assert with CUDA_KERNEL_ASSERT in Reduce.cuh for consistency (#113098)
Related to #94891

**Problem:**
We are trying to disable `printf` in kernels for the PyTorch build on ROCm, to fix the `torch.sum()` issues certain community users hit, by disabling `CUDA_KERNEL_ASSERT`. However, we found that hostcall `printf`s still happen in `ReduceSumProdKernel`, which is used by `torch.sum`.

**Reason:**
The reason is that there are `assert` calls inside `Reduce.cuh` (defined as `__assert_fail`), which cause the `printf`s.

**Fix:**
This pull request changes `assert` to `CUDA_KERNEL_ASSERT` so that we can consistently disable assertions/printf in CUDA/HIP kernel code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113098
Approved by: https://github.com/ezyang
2023-11-08 21:25:54 +00:00
77e8e8fd2d Rewrite docs so that it is OK to use record_stream before uses (#113282)
The previous documentation did not appear to accurately describe
the actual semantics in CUDA caching allocator.

When you call record_stream, we only record a stream use:

```
  void recordStream(Block* block, cuda::CUDAStream stream) {
    std::lock_guard<std::recursive_mutex> lock(mutex);
    if (stream.stream() == block->stream) {
      // ignore uses on the allocation stream, since those don't require any
      // special synchronization
      return;
    }
    block->stream_uses.insert(stream);
  }
```

It is only at deallocation time when we actually install an event on
stream uses that we will subsequently query to determine if the block
can be reused or not.
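In user terms, a minimal sketch of the now-documented pattern (requires a CUDA machine; the comments restate the semantics above):

```
import torch

s = torch.cuda.Stream()
buf = torch.empty(1024, device="cuda")
with torch.cuda.stream(s):
    buf.fill_(1.0)

# Recording before/at the time of use is fine: this only inserts s
# into the block's stream_uses set.
buf.record_stream(s)

# Only when the block is freed are events installed on the recorded
# streams, gating reuse of the memory.
del buf
```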

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113282
Approved by: https://github.com/Skylion007, https://github.com/albanD
2023-11-08 21:24:50 +00:00
5da9abfec2 [dynamo] Enable typechecking for comptime.py (#112999)
I made `comptime` a callable instance instead of a function because mypy
doesn't allow creating extra attributes on a plain function.
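The pattern, reduced to a generic sketch (names here are illustrative, not dynamo's actual attributes):

```
from typing import Any

class _Comptime:
    # A callable instance may carry extra attributes that mypy will
    # type-check, unlike attributes tacked onto a plain function object.
    def __call__(self, fn: Any) -> Any:
        return fn

    def print_graph(self) -> None:
        ...

comptime = _Comptime()

@comptime
def hook(ctx: Any) -> None:
    pass

comptime.print_graph()  # attribute access mypy can verify
```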

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112999
Approved by: https://github.com/ezyang
ghstack dependencies: #112130, #112970, #112971, #112972, #112973, #112974, #112975
2023-11-08 21:17:45 +00:00
26f907e09b [dynamo] Enable typechecking for skipfiles.py (#112975)
Not sure why mypy thinks `importlib.util.find_spec` is not a valid
lookup, but it seems OK if I explicitly import it.
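A small sketch of the mypy behavior described above:

```
# Referencing the function through the package attribute confuses mypy:
import importlib.util
spec = importlib.util.find_spec("json")

# Importing the name explicitly type-checks cleanly:
from importlib.util import find_spec
spec = find_spec("json")
```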

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112975
Approved by: https://github.com/yanboliang, https://github.com/eellison
ghstack dependencies: #112130, #112970, #112971, #112972, #112973, #112974
2023-11-08 21:17:45 +00:00
7fb56993ba [dynamo] Enable typechecking for device_interface.py (#112974)
One small runtime change: `get_interface_for_device()` now throws
instead of returning None when an interface is not found. Inspecting all
the callsites in the codebase shows that none of them actually check if
the return type is None, so I think this is safe.

I also silenced a bunch of mypy errors around method assignment; mypy
seems unable to handle the subtype checks correctly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112974
Approved by: https://github.com/eellison
ghstack dependencies: #112130, #112970, #112971, #112972, #112973
2023-11-08 21:17:45 +00:00
152f9bbb9a [dynamo] Switch MYPYNOFOLLOW config from includes to excludes (#112973)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112973
Approved by: https://github.com/Skylion007
ghstack dependencies: #112130, #112970, #112971, #112972
2023-11-08 21:17:45 +00:00
bea2b703b0 [dynamo] Enable typechecking for bytecode_analysis.py (#112972)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112972
Approved by: https://github.com/jansel, https://github.com/eellison
ghstack dependencies: #112130, #112970, #112971
2023-11-08 21:17:45 +00:00
c1fa708b03 [dynamo] Enable typechecking for utils.py (#112971)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112971
Approved by: https://github.com/lezcano, https://github.com/jansel
ghstack dependencies: #112130, #112970
2023-11-08 21:17:45 +00:00
1c40d1c683 [dynamo] Enable typechecking for profiler.py (#112970)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112970
Approved by: https://github.com/ezyang
ghstack dependencies: #112130
2023-11-08 21:17:45 +00:00
dc63248b76 Make dynamo configs more amenable to static type checking (#112130)
`install_config_module` makes a regular module into a ConfigModule with
extra methods defined on it. mypy thinks those extra methods (or module
functions) are undefined since it cannot analyze something so
dynamic. As a workaround, I've created a fake module that defines these
extra functions, which I import into the config modules during type
checking.
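Roughly the following pattern, assuming a hypothetical stub module name (the import only runs for the type checker, so the dynamic `install_config_module` machinery is untouched at runtime):

```
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Stub that declares save_config(), patch(), etc. for mypy only.
    from torch.utils._config_typing import *  # noqa: F401,F403
```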

As part of this change, I've also added more types to config_utils.py
and enabled typechecking for torch/_dynamo/config.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112130
Approved by: https://github.com/jansel
2023-11-08 21:17:45 +00:00
d5eb9f725c Fix test_add_scalar_with_empty_list_tensor (#113262)
By actually instantiating the test method for different types and devices, rather than always creating the tensor on CPU.
Also, remove `bool` from the list, as adding 1 to bool is not supported.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113262
Approved by: https://github.com/jeanschmidt, https://github.com/atalman, https://github.com/lezcano
2023-11-08 20:56:37 +00:00
d261687d5f [dynamo] graph break on setattr requires_grad (#113163)
Main: `RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn`
This PR: graph breaks, eager applies the mutation, and the new tensors are tracked

Fixes https://github.com/pytorch/pytorch/issues/109505 (the original bug does not occur, but a new bug where the mutation isn't applied - because AOTAutograd is not `requires_grad` mutation aware - is mitigated)
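A repro-style sketch of the new behavior (assuming a build with this change):

```
import torch

@torch.compile(backend="eager")
def f(x):
    y = x + 1
    # This setattr now triggers a graph break; the mutation is applied
    # eagerly and the resulting tensor is tracked afterwards.
    y.requires_grad = True
    return y * 2

print(f(torch.ones(3)))
```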

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113163
Approved by: https://github.com/bdhirsh
2023-11-08 19:51:23 +00:00
a66f2a1b99 [state_dict] Move _gather_state_dict to dcp module (#112835)
This API is used by more than just FSDP, so this PR moves it to the DCP module.

Differential Revision: [D50962966](https://our.internmc.facebook.com/intern/diff/D50962966/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112835
Approved by: https://github.com/wz337
2023-11-08 19:42:56 +00:00
d98182e34e Revert "Grandfather in built-in TorchScript ops to being pt2_compliant (#113061)"
This reverts commit 493b52b3d9395bde3c0dc072885a15e71f786c78.

Reverted https://github.com/pytorch/pytorch/pull/113061 on behalf of https://github.com/PaliC due to breaking internal tests - contacted author with errors ([comment](https://github.com/pytorch/pytorch/pull/113061#issuecomment-1802528592))
2023-11-08 19:36:41 +00:00
81bf0bd68d [no ci] Fix typo in persons_of_interest.rst (#113283)
There is no `c` in `Hirsh`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113283
Approved by: https://github.com/bdhirsh
2023-11-08 19:36:32 +00:00
e49b9492c6 Revert "Grandfather in some more pytorch ops to be pt2_compliant (#113050)"
This reverts commit 85832c0b9b2c7a4299aa1640a952f4d0f48efa66.

Reverted https://github.com/pytorch/pytorch/pull/113050 on behalf of https://github.com/PaliC due to breaking internal tests - contacted author with errors ([comment](https://github.com/pytorch/pytorch/pull/113050#issuecomment-1802524046))
2023-11-08 19:33:15 +00:00
16f82198ca Export ReduleL1/ReduceL2 ONNX ops for aten::linalg_vector_norm(ord={1,2}) (#113173)
After #84624, aten::linalg_vector_norm started being used instead of aten::norm. In the ONNX exporter, the latter leveraged Reduce{L1,L2} when p={1,2}, which resulted in more optimized code in the ONNX Runtime

This PR extends aten::linalg_vector_norm to also use Reduce{L1,L2} when ord={1,2}, producing an equivalent ONNX subgraph.

This PR is a WIP. Pending work includes checking argument equivalence between `aten::norm` and `aten::linalg_vector_norm`, and maybe re-enabling the tests disabled by #84624.
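The calls affected, in a minimal sketch:

```
import torch

x = torch.randn(4, 5)

# With this change, ord=1 and ord=2 export to ONNX ReduceL1 and
# ReduceL2 respectively, matching what aten::norm used to produce.
l1 = torch.linalg.vector_norm(x, ord=1)
l2 = torch.linalg.vector_norm(x, ord=2)
```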
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113173
Approved by: https://github.com/justinchuby
2023-11-08 19:08:43 +00:00
81b0166ca2 [Inductor][fx pass] Normalize nodes created by users (#113179)
Summary: We noticed that nodes created by users lack an example value, so they could not be normalized in the normalization pass. We therefore convert them to the normalized format to enable the split-cat merge.

Test Plan: N/A

Differential Revision: D51058817

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113179
Approved by: https://github.com/jackiexu1992
2023-11-08 19:08:18 +00:00
0ab2a48e7e Reland: [TD] Add heuristic for class level historical correlations (#113213)
Relands PR https://github.com/pytorch/pytorch/pull/112162
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113213
Approved by: https://github.com/clee2000
2023-11-08 19:06:20 +00:00
5ea76f1760 [DeviceMesh][Test] Update 2D related test to use init_device_mesh (#113236)
This PR:
1. Update all 2D-related tests to use DeviceMesh and remove `tp_mesh_dim` from TP calls.
2. Remove `test_fsdp_tp_checkpoint_integration` from `test/distributed/fsdp/test_fsdp_tp_integration.py` as checkpointing tests are covered in https://github.com/pytorch/pytorch/blob/main/test/distributed/tensor/parallel/test_fsdp_2d_parallel.py#L330

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113236
Approved by: https://github.com/wanchaol, https://github.com/fduwjj, https://github.com/fegin
2023-11-08 18:41:50 +00:00
e138d80e8e [DTensor][2/N][forward fix] extend util function normalize_to_torch_size to accept single int size (#113244)
**Summary**:
In #113105 I used the util function `normalize_to_torch_size` to unify the `size` argument, which may arrive in multiple formats. However, that function only handled inputs of type `Sequence[int]`, so I am submitting this forward fix to make `normalize_to_torch_size` also able to handle a size argument of type `int` or `torch.Size`. A side benefit of this fix is that it also enables 3 dtensor op tests (check `test_dtensor_ops.py`).
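An illustrative sketch of the widened helper (the real one lives in DTensor's internals; this only shows the accepted input shapes):

```
from collections.abc import Sequence
from typing import Union

import torch

def normalize_to_torch_size(size: Union[int, Sequence[int], torch.Size]) -> torch.Size:
    if isinstance(size, torch.Size):
        return size
    if isinstance(size, int):
        return torch.Size([size])
    # A flat sequence of ints, or a single wrapped sequence like ((2, 3),)
    if len(size) == 1 and isinstance(size[0], Sequence):
        return torch.Size(size[0])
    return torch.Size(size)
```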

**Test**:
`pytest test/distributed/_tensor/test_dtensor_ops.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113244
Approved by: https://github.com/wanchaol
ghstack dependencies: #113105
2023-11-08 18:29:08 +00:00
088587574d [DTensor][1/N] add forward layer norm support (#113105)
**Summary**:
This PR adds DTensor implementation for ATen op `native_layer_norm`.

**Test**:
`pytest test/distributed/_tensor/test_dtensor_ops.py -s -k layer_norm`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113105
Approved by: https://github.com/wanchaol
2023-11-08 18:29:08 +00:00
9e6e9587c1 Make numel/sym_numel PyInterpreter work symmetrically to others (#113065)
Just some better engineering code cleanup.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113065
Approved by: https://github.com/voznesenskym
2023-11-08 17:44:29 +00:00
78b8465565 [Distributed] Limit world_size to 8 for FSDP Unit tests (#103412)
There are a few unit tests in FSDP that can support up to 8 GPUs.
For example, test_fsdp_uneven has an input of size [8, 3], and each process/rank indexes the data as input[self.rank] (see the links below). So when we run these tests on 16 GPUs, they throw an index/key error. To avoid such corner cases, this change caps the tests at 8 GPUs when more than 8 are available. This is applicable to both ROCm and CUDA builds.

https://github.com/pytorch/pytorch/blob/main/test/distributed/fsdp/test_fsdp_uneven.py#L44
https://github.com/pytorch/pytorch/blob/main/test/distributed/fsdp/test_fsdp_uneven.py#L55
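The cap itself, as a one-line sketch (helper name is hypothetical):

```
import torch

def capped_world_size(limit: int = 8) -> int:
    return min(torch.cuda.device_count(), limit)
```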

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103412
Approved by: https://github.com/jithunnair-amd, https://github.com/pruthvistony, https://github.com/malfet
2023-11-08 17:21:38 +00:00
66577c0f3b Update ROCm triton pin (#111129)
Changes:
- Enables bfloat16 support in MFMA dot on MI200 (23979098c8)
- Add support for int8 to bfloat16 conversion (2d3e38e182) fixing a bug in bf16 triton gemm workloads.
- Enable scanOp lowering by adding shfl_up support https://github.com/ROCmSoftwarePlatform/triton/pull/324
- MFMA16 support - support for the mfma_16x16xX instructions - these help perf on smaller sized GEMMs - 7e34c244c2
- configurable wavefront-per-eu - this helps us increase our occupancy in certain use cases such as Flash Attention - e801638b40
- Many bug fixes and optimisations

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111129
Approved by: https://github.com/malfet, https://github.com/pruthvistony
2023-11-08 17:16:48 +00:00
9bda1e874c Reland "[aot inductor] Move constant loading logic from Container to Model" (#112197)
Trying again, hopefully with 100% fewer merge conflicts

Original diff: D50582959
Revert diff: D50657400

Differential Revision: [D50710815](https://our.internmc.facebook.com/intern/diff/D50710815/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112197
Approved by: https://github.com/desertfire, https://github.com/chenyang78
2023-11-08 15:08:26 +00:00
6e73ae2022 [ci][ez] Add job_id to emit_metrics (#113099)
As in title.

Also print the job id in the step, since I'm struggling to find it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113099
Approved by: https://github.com/seemethere
2023-11-08 10:32:41 +00:00
3914566c73 [dynamo] Refactor OrderedDict to dict (#113234)
In Python 3.7+, all dicts preserve insertion order, so `OrderedDict` is no longer needed for ordering.
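A quick demonstration of why the refactor is safe:

```
# Plain dicts have guaranteed insertion order since Python 3.7, so
# OrderedDict adds nothing for ordering alone.
d = {}
d["b"] = 1
d["a"] = 2
assert list(d) == ["b", "a"]
```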

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113234
Approved by: https://github.com/oulgen, https://github.com/lezcano
2023-11-08 09:27:08 +00:00
728ed37663 [AOTInductor] Allow using ProxyExecutor for ATen fallbacks (#112976)
Summary: Use ProxyExecutor for aten._scaled_dot_product_efficient_attention in ABI-mode

Test Plan: OSS CI

Differential Revision: D51005807

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112976
Approved by: https://github.com/chenyang78, https://github.com/jansel
2023-11-08 08:34:11 +00:00
df4f0b3829 [BE] [cuDNN] Always build assuming cuDNN >= 8.0 (#95722)
### <samp>🤖 Generated by Copilot at 27084ed</samp>

This pull request simplifies and cleans up the code that uses the cuDNN library for convolution, batch normalization, CTC loss, and quantized operations. It removes the unnecessary checks and conditions for older cuDNN versions and the experimental cuDNN v8 API, and ~~replaces them with the stable `cudnn_frontend` API that requires cuDNN v8 or higher. It also adds the dependency and configuration for the `cudnn_frontend` library in the cmake and bazel files.~~ Correction: The v7 API will still be available with this PR, and can still be used, without any changes to the defaults. This change simply always _builds_ the v8 API, and removes the case where _only_ the v7 API is built.

This is a re-land of https://github.com/pytorch/pytorch/pull/91527

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95722
Approved by: https://github.com/malfet
2023-11-08 07:53:23 +00:00
8ba11bf79d [AOTI] Support non auto-tuned triton kernels in aoti (#113090)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113090
Approved by: https://github.com/aakhundov, https://github.com/chenyang78, https://github.com/desertfire
2023-11-08 07:48:15 +00:00
9f3e378125 [nested tensor]add split and layer_norm_backward operations (#113108)
Summary:
Add split and layer_norm_backward.

Note: it is non-trivial to support the backward of split_with_sizes, so we add the split operation to support the use case in the model.

Test Plan: unit tests

Differential Revision: D51052966

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113108
Approved by: https://github.com/soulitzer
2023-11-08 07:44:35 +00:00
3a429423fc Upgrade CI to ROCm5.7 (#110465)
This PR upgrades CI to ROCm 5.7.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110465
Approved by: https://github.com/pruthvistony, https://github.com/malfet
2023-11-08 06:11:10 +00:00
fa895da968 [pytree] reorganize submodule structure for C++ and Python pytree (#112278)
Reorganized the two C++ and Python pytree submodules into a subpackage. I think this will make it easier to implement the abstract `PyTreeAPI` class with two implementations, and much easier for the user to switch between the two implementations.

Before:

```text
torch
├── utils
│   ├── _pytree.py
│   ├── _cxx_pytree.py
│   ...
...
```

After:

```text
torch
├── utils
│   ├── _pytree
│   │   ├── __init__.py
│   │   └── api
│   │       ├── __init__.py
│   │       ├── cxx.py
│   │       └── python.py
│   ...
...
```

The `torch.utils._pytree` module will import all APIs from `torch.utils._pytree.api.python`.
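Usage is unchanged for existing callers, e.g.:

```
import torch.utils._pytree as pytree

# The Python implementation remains the default import surface.
leaves, spec = pytree.tree_flatten({"a": 1, "b": (2, 3)})
assert pytree.tree_unflatten(leaves, spec) == {"a": 1, "b": (2, 3)}
```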

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112278
Approved by: https://github.com/zou3519
ghstack dependencies: #112111
2023-11-08 06:05:39 +00:00
3e4d14702a On grad access, check if grad has changed and update stored example grad as needed (#112811)
Fixes https://github.com/pytorch/pytorch/issues/112446

This is a doozy of a PR; there are a few important things to keep in mind here:

1) We MUST lift all tensors accessed via attrs to inputs, getattr is a no go in the graph, it violates the aot_autograd contract. Furthermore, aot_autograd does not know how to apply in-place ops to intermediary tensors that are attributes (aka from getattr) anyway. Views from ops are fine.

2) `.grad` access handling in dynamo peeks at the underlying value, the real tensor, because re-piping FakeTensors already made with this fake_mode through builder anew is a no go.

3) We have no proper mechanism for updating the hint / grapharg.example (the real value in (2) above) midway through trace

Therefore, what we need to do is reconcile the difference in grad stashed on grapharg.example. The easiest way to do this is lazily, upon .grad access, by reading the new value off the right fake tensors. We can then make a tensor using that data as a hint to VariableBuilder to make the right VariableTracker. Note that the example value used here in the PR (torch.zeros) is a dummy value used only as a tracing hint; it does not leak out into real runtime code.

Alternatively, we could implement accumulate_grad_ in python...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112811
Approved by: https://github.com/jansel
2023-11-08 05:45:00 +00:00
d01f8b291d Fix visualize_overlap for Inductor comm reordering (#113066)
The following assumptions are not always valid and need checking:
1. `snode.node` exists
2. `snode.node.layout.size` exists
3. `snode.node.layout.stride` exists
4. `snode.node.name` exists

Also there is no guarantee that there won't be two collectives running at the same time. But it's hard to visualize the overlap in that case. So disable the visualization for that case for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113066
Approved by: https://github.com/wanchaol
2023-11-08 05:27:15 +00:00
95f52611c7 [pytree] register pytree node type in both C++ pytree and Python pytree (#112111)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112111
Approved by: https://github.com/zou3519
2023-11-08 05:02:03 +00:00
1f3fa13f0a Handle unbacked SymInt sized outputs in AOTAutograd (#113159)
Thanks aakhundov for constructing the test case. This PR was constructed by running the failing test case, and then fixing problems until we got all the way to the end. There are a few distinct fixes:

* AOTAutograd performs equality tests on tensor metadata to determine if a metadata mutation had occurred. If we test i0 vs i1, we should report these are NOT equal, since obviously we have somehow resized the tensor from i0 to i1 (even if, on a particular run, it is possible i0 == i1).
* There's a sketchy fix for `test_aot_autograd_exhaustive_matmul_cpu_float32` where we check if the output shape equals the tangent shape. Unfortunately, the same `definitely_true` treatment does not work here; it still fails on the example. I piled an extra sketchy fix on top of it, where I just try my best to avoid doing the view. Maybe we should have some sort of logging here.
* Partitioner needs to get out a size for unbacked SymInt when partitioning. I just feed it a random heuristic value in this case, similar to how we've been dealing with this in Inductor.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113159
Approved by: https://github.com/aakhundov, https://github.com/bdhirsh
2023-11-08 04:28:38 +00:00
aa376e31fd [export] Enable verifier [2/n] (#113075)
Summary: Turn on the verifier check in the exported program constructor. Note that this effectively detects a large surface of spec violations, so we also spent some time fixing them one by one in this diff.

Test Plan: CI

Differential Revision: D51014944

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113075
Approved by: https://github.com/angelayi
2023-11-08 03:32:11 +00:00
f2963642c2 [DDP] Add device_mesh to DDP ctor (#112761)
As title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112761
Approved by: https://github.com/fegin
2023-11-08 03:08:08 +00:00
9d765d28ca [pytorch] Add binding to get nccl version suffix (#112884)
Summary: Adds a Python-to-C binding to get the NCCL_SUFFIX value for more accurate NCCL version information, and adds that to the NCCL version tuple.

Differential Revision: D50978181

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112884
Approved by: https://github.com/kwen2501
2023-11-08 02:51:22 +00:00
93cea394de CMake: Loosen CUDA consistency check (#113174)
Closes #108931, closes #108932, see also conda-forge/pytorch-cpu-feedstock#203

Currently we compare `CUDA_INCLUDE_DIRS` and expect exact equality
with `CUDAToolkit_INCLUDE_DIR`; however, this fails in the presence of
symbolic links, or for split installs where there are multiple include paths.
Given that, it makes sense to loosen the requirement to just version
equality, under the assumption that two installs of the same version
should still be compatible.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113174
Approved by: https://github.com/malfet
2023-11-08 02:51:18 +00:00
b7acd374c9 Remove unnecessary warning when getting storage.filename (#113212)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113212
Approved by: https://github.com/vmoens
2023-11-08 02:09:59 +00:00
ceb07656c2 [dynamo] use APIs to use device interface instead of raw object in dynamo capture (#113000)
This PR makes up for https://github.com/pytorch/pytorch/pull/108312.
It uses `get_registered_device_interfaces` to look up the device interface instead of using raw objects.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113000
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-11-08 01:45:00 +00:00
a6ed86bfdb Add torch.onnx.dynamo_export test using ExportedProgram from file (#112271)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112271
Approved by: https://github.com/BowenBao
2023-11-08 01:34:18 +00:00
2043d92472 [PyTorch][Vulkan] Add LayerNorm performance test binary (#112915)
Summary:
We add a performance test binary for the recently implemented operator `LayerNorm` D50436726. The difference of this test compared to the existing `vulkan_conv_arithmetic_perf_test.cpp` and `vulkan_mm_perf_test.cpp` is that
- the existing tests benchmark a specific Vulkan shader such as `vulkan.mm`, `vulkan.addmm`, etc.
- but `LayerNorm` is implemented by invoking [a sequence of other operators (shader files)](https://www.internalfb.com/code/fbsource/[ff4989384cacda66a2eed4c800f69c69f6832c52]/fbcode/caffe2/aten/src/ATen/native/vulkan/ops/Layernorm.cpp?lines=94) instead of a dedicated single shader file. Reusing the existing test code wouldn't print a meaningful result.

To deal with this, we add a function `extractTotalShaderResultsAndSetState`, which aggregates the latency of all invoked shaders except `nchw_to_image` and `image_to_nchw`. This test can be applied to other operators that don't have a dedicated shader file.

Test Plan:
- build the binary
```
(base) luwei@luwei-mbp fbsource % buck2 build  -c ndk.debug_info_level=0  -c ndk.static_linking=true -c pt.enable_qpl=0 -c pt.vulkan_use_gpu_diagnostics=1 --target-platforms=ovr_config//platform/android:arm32-fbsource //xplat/caffe2:pt_vulkan_layernorm_perf_test_binAndroid  --show-output  -c pt.vulkan_full_precision=1
```
- push to device
```
(base) luwei@luwei-mbp fbsource % adb push buck-out/v2/gen/fbsource/f1f3f9bed27e143c/xplat/caffe2/__pt_vulkan_layernorm_perf_test_binAndroid__/pt_vulkan_layernorm_perf_test_binAndroid /data/local/tmp
```
- test on device
```
(base) luwei@luwei-mbp ~ % adb shell /data/local/tmp/pt_vulkan_layernorm_perf_test_binAndroid
```
- output, excerpt below, full test result in P871803721
**it shows that the aggregation of invoked shaders takes 14.2 ms on average**
```
Kernel Name              Workgroup Size             Duration (ns)
===========              ==============               ===========
vulkan.nchw_to_image     {75, 75, 19}                     1310660
vulkan.nchw_to_image     {75, 75, 19}                     1313260
vulkan.nchw_to_image     {75, 75, 19}                     1268748
vulkan.mean_dim_keepdim  {1, 75, 19}                       878124
vulkan.mean_dim_keepdim  {1, 1, 19}                         53300
vulkan.mean_dim_keepdim  {1, 1, 1}                          62660
vulkan.mean_dim_keepdim  {1, 75, 19}                       871260
vulkan.mean_dim_keepdim  {1, 1, 19}                         53144
vulkan.mean_dim_keepdim  {1, 1, 1}                          62400
vulkan.sub               {75, 75, 19}                     1787760
vulkan.mul               {75, 75, 19}                     1866904
vulkan.mean_dim_keepdim  {1, 75, 19}                       868764
vulkan.mean_dim_keepdim  {1, 1, 19}                         56212
vulkan.mean_dim_keepdim  {1, 1, 1}                          62400
vulkan.sub               {75, 75, 19}                     1782872
vulkan.add_scalar        {1, 1, 1}                           2028
vulkan.pow_tensor_scalar {1, 1, 1}                           2236
vulkan.mul               {75, 75, 19}                     1771276
...
vulkan.add               {75, 75, 19}                     1909544
vulkan.image_to_nchw     {75, 75, 19}                     1143844
------------------------------------------------------------------------------------------------------------------
Benchmark                                                                        Time             CPU   Iterations
------------------------------------------------------------------------------------------------------------------
layer_norm_benchmark/N:75/M:75/P:75/iterations:50/manual_time/threads:1       14.2 ms         48.9 ms           50
```

Reviewed By: yipjustin

Differential Revision: D50940613

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112915
Approved by: https://github.com/yipjustin
2023-11-08 01:29:10 +00:00
e5b758b855 S390x complex division (#108516)
Adopt the algorithm from the AVX2 implementation.
This change fixes the test test_complex_div_underflow_overflow_cpu_complex128
from test/test_binary_ufuncs.py.

At the same time it breaks some of the Arithmetics/*.Division tests
from vec_test_all_types_ZVECTOR,
but those are also broken on AVX2 and AVX512.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108516
Approved by: https://github.com/ezyang
2023-11-08 01:28:29 +00:00
a8097ed479 Fix docstring errors in _composable_state.py, remote_device.py, value_ranges.py, utils.py, run.py, rendezvous.py, launch.py, argparse_util.py, __init__.py, _cycles.py (#112953)
Fixes #112639

```txt
 torch/utils/_sympy/value_ranges.py
 torch/utils/_sympy/value_ranges.py:60 in public class `ValueRanges`:
        D101: Missing docstring in public class
torch/utils/_sympy/value_ranges.py:68 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/_sympy/value_ranges.py:81 in public method `__contains__`:
        D105: Missing docstring in magic method
torch/utils/_sympy/value_ranges.py:86 in public method `tighten`:
        D400: First line should end with a period (not 'n')
torch/utils/_sympy/value_ranges.py:90 in public method `__and__`:
        D105: Missing docstring in magic method
torch/utils/_sympy/value_ranges.py:103 in public method `__or__`:
        D105: Missing docstring in magic method
torch/utils/_sympy/value_ranges.py:113 in public method `is_singleton`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:118 in public method `unknown`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:122 in public method `wrap`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:129 in public method `increasing_map`:
        D400: First line should end with a period (not ')')
torch/utils/_sympy/value_ranges.py:135 in public method `decreasing_map`:
        D400: First line should end with a period (not ')')
torch/utils/_sympy/value_ranges.py:141 in public method `monotone_map`:
        D400: First line should end with a period (not 'g')
torch/utils/_sympy/value_ranges.py:149 in public method `convex_min_zero_map`:
        D400: First line should end with a period (not '0')
torch/utils/_sympy/value_ranges.py:149 in public method `convex_min_zero_map`:
        D403: First word of the first line should be properly capitalized ('Fn', not 'fn')
torch/utils/_sympy/value_ranges.py:158 in public method `coordinatewise_increasing_map`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/_sympy/value_ranges.py:158 in public method `coordinatewise_increasing_map`:
        D400: First line should end with a period (not ':')
torch/utils/_sympy/value_ranges.py:171 in public method `coordinatewise_monotone_map`:
        D400: First line should end with a period (not 'e')
torch/utils/_sympy/value_ranges.py:180 in private class `SymPyValueRangeAnalysis`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/_sympy/value_ranges.py:180 in private class `SymPyValueRangeAnalysis`:
        D400: First line should end with a period (not 's')
torch/utils/_sympy/value_ranges.py:386 in private method `reciprocal`:
        D210: No whitespaces allowed surrounding docstring text
torch/utils/_sympy/value_ranges.py:386 in private method `reciprocal`:
        D400: First line should end with a period (not 'n')
torch/utils/_sympy/value_ranges.py:488 in public class `ValueRangeAnalysis`:
        D101: Missing docstring in public class
torch/utils/_sympy/value_ranges.py:489 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/_sympy/value_ranges.py:501 in public method `bool_handler`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:506 in public method `default_handler`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:511 in public method `load`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:514 in public method `store`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:517 in public method `reduction`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:520 in public method `index_expr`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:525 in public method `to_dtype`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:558 in public method `square`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:562 in public method `neg`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:566 in public method `truncdiv`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:577 in public method `sub`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:580 in public method `__getattr__`:
        D105: Missing docstring in magic method
torch/utils/_sympy/value_ranges.py:585 in public function `bound_sympy`:
        D103: Missing docstring in public function
36
torch/utils/_sympy/value_ranges.py:60 in public class `ValueRanges`:
        D101: Missing docstring in public class
torch/utils/_sympy/value_ranges.py:68 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/_sympy/value_ranges.py:81 in public method `__contains__`:
        D105: Missing docstring in magic method
torch/utils/_sympy/value_ranges.py:86 in public method `tighten`:
        D400: First line should end with a period (not 'n')
torch/utils/_sympy/value_ranges.py:90 in public method `__and__`:
        D105: Missing docstring in magic method
torch/utils/_sympy/value_ranges.py:103 in public method `__or__`:
        D105: Missing docstring in magic method
torch/utils/_sympy/value_ranges.py:113 in public method `is_singleton`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:118 in public method `unknown`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:122 in public method `wrap`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:182 in private class `SymPyValueRangeAnalysis`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/_sympy/value_ranges.py:182 in private class `SymPyValueRangeAnalysis`:
        D400: First line should end with a period (not 's')
torch/utils/_sympy/value_ranges.py:388 in private method `reciprocal`:
        D210: No whitespaces allowed surrounding docstring text
torch/utils/_sympy/value_ranges.py:388 in private method `reciprocal`:
        D400: First line should end with a period (not 'n')
torch/utils/_sympy/value_ranges.py:490 in public class `ValueRangeAnalysis`:
        D101: Missing docstring in public class
torch/utils/_sympy/value_ranges.py:491 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/_sympy/value_ranges.py:503 in public method `bool_handler`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:508 in public method `default_handler`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:513 in public method `load`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:516 in public method `store`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:519 in public method `reduction`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:522 in public method `index_expr`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:527 in public method `to_dtype`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:560 in public method `square`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:564 in public method `neg`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:568 in public method `truncdiv`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:579 in public method `sub`:
        D102: Missing docstring in public method
torch/utils/_sympy/value_ranges.py:582 in public method `__getattr__`:
        D105: Missing docstring in magic method
torch/utils/_sympy/value_ranges.py:587 in public function `bound_sympy`:
        D103: Missing docstring in public function
28

torch/utils/viz/_cycles.py
torch/utils/viz/_cycles.py:14 in public function `observe_garbage`:
        D103: Missing docstring in public function
torch/utils/viz/_cycles.py:207 in public function `object_annotation`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/viz/_cycles.py:207 in public function `object_annotation`:
        D400: First line should end with a period (not 'g')
torch/utils/viz/_cycles.py:256 in public class `Node`:
        D101: Missing docstring in public class
torch/utils/viz/_cycles.py:262 in public function `create_graph`:
        D103: Missing docstring in public function
torch/utils/viz/_cycles.py:308 in public function `escape`:
        D103: Missing docstring in public function
torch/utils/viz/_cycles.py:312 in public function `is_cuda_tensor`:
        D103: Missing docstring in public function
torch/utils/viz/_cycles.py:315 in public function `cuda_allocation_context`:
        D103: Missing docstring in public function
torch/utils/viz/_cycles.py:335 in public function `to_dot`:
        D103: Missing docstring in public function
torch/utils/viz/_cycles.py:406 in public function `to_html`:
        D103: Missing docstring in public function
torch/utils/viz/_cycles.py:416 in public function `observe_tensor_cycles`:
        D103: Missing docstring in public function
torch/utils/viz/_cycles.py:429 in public function `warn_tensor_cycles`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/viz/_cycles.py:429 in public function `warn_tensor_cycles`:
        D400: First line should end with a period (not 'p')
torch/utils/viz/_cycles.py:429 in public function `warn_tensor_cycles`:
        D401: First line should be in imperative mood; try rephrasing (found 'Reference')
14
torch/utils/viz/_cycles.py:14 in public function `observe_garbage`:
        D103: Missing docstring in public function
torch/utils/viz/_cycles.py:256 in public class `Node`:
        D101: Missing docstring in public class
torch/utils/viz/_cycles.py:262 in public function `create_graph`:
        D103: Missing docstring in public function
torch/utils/viz/_cycles.py:308 in public function `escape`:
        D103: Missing docstring in public function
torch/utils/viz/_cycles.py:312 in public function `is_cuda_tensor`:
        D103: Missing docstring in public function
torch/utils/viz/_cycles.py:315 in public function `cuda_allocation_context`:
        D103: Missing docstring in public function
torch/utils/viz/_cycles.py:335 in public function `to_dot`:
        D103: Missing docstring in public function
torch/utils/viz/_cycles.py:406 in public function `to_html`:
        D103: Missing docstring in public function
torch/utils/viz/_cycles.py:416 in public function `observe_tensor_cycles`:
        D103: Missing docstring in public function
9

torch/distributed/argparse_util.py
torch/distributed/argparse_util.py:1 at module level:
        D100: Missing docstring in public module
torch/distributed/argparse_util.py:13 in public class `env`:
        D205: 1 blank line required between summary line and description (found 0)
torch/distributed/argparse_util.py:13 in public class `env`:
        D400: First line should end with a period (not 'g')
torch/distributed/argparse_util.py:13 in public class `env`:
        D412: No blank lines allowed between a section header and its content ('Example')
torch/distributed/argparse_util.py:43 in public method `__init__`:
        D107: Missing docstring in __init__
torch/distributed/argparse_util.py:56 in public method `__call__`:
        D102: Missing docstring in public method
torch/distributed/argparse_util.py:61 in public class `check_env`:
        D205: 1 blank line required between summary line and description (found 0)
torch/distributed/argparse_util.py:61 in public class `check_env`:
        D400: First line should end with a period (not 's')
torch/distributed/argparse_util.py:61 in public class `check_env`:
        D412: No blank lines allowed between a section header and its content ('Example')
torch/distributed/argparse_util.py:97 in public method `__init__`:
        D107: Missing docstring in __init__
torch/distributed/argparse_util.py:102 in public method `__call__`:
        D102: Missing docstring in public method
11
torch/distributed/argparse_util.py:1 at module level:
        D100: Missing docstring in public module
torch/distributed/argparse_util.py:43 in public method `__init__`:
        D107: Missing docstring in __init__
torch/distributed/argparse_util.py:56 in public method `__call__`:
        D102: Missing docstring in public method
torch/distributed/argparse_util.py:97 in public method `__init__`:
        D107: Missing docstring in __init__
torch/distributed/argparse_util.py:102 in public method `__call__`:
        D102: Missing docstring in public method
5

torch/distributed/_composable_state.py
torch/distributed/_composable_state.py:20 in private function `_get_module_state`:
        D202: No blank lines allowed after function docstring (found 1)
torch/distributed/_composable_state.py:20 in private function `_get_module_state`:
        D205: 1 blank line required between summary line and description (found 0)
torch/distributed/_composable_state.py:20 in private function `_get_module_state`:
        D400: First line should end with a period (not '`')
3
0

torch/distributed/launch.py
torch/distributed/launch.py:1 at module level:
        D205: 1 blank line required between summary line and description (found 0)
torch/distributed/launch.py:1 at module level:
        D400: First line should end with a period (not 'd')
torch/distributed/launch.py:156 in public function `parse_args`:
        D103: Missing docstring in public function
torch/distributed/launch.py:171 in public function `launch`:
        D103: Missing docstring in public function
torch/distributed/launch.py:180 in public function `main`:
        D103: Missing docstring in public function
5
torch/distributed/launch.py:157 in public function `parse_args`:
        D103: Missing docstring in public function
torch/distributed/launch.py:172 in public function `launch`:
        D103: Missing docstring in public function
torch/distributed/launch.py:181 in public function `main`:
        D103: Missing docstring in public function
3

torch/distributed/remote_device.py
torch/distributed/remote_device.py:1 at module level:
        D100: Missing docstring in public module
torch/distributed/remote_device.py:81 in private method `worker_name`:
        D205: 1 blank line required between summary line and description (found 0)
torch/distributed/remote_device.py:81 in private method `worker_name`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/distributed/remote_device.py:88 in private method `rank`:
        D205: 1 blank line required between summary line and description (found 0)
torch/distributed/remote_device.py:88 in private method `rank`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/distributed/remote_device.py:95 in private method `device`:
        D200: One-line docstring should fit on one line with quotes (found 3)
torch/distributed/remote_device.py:95 in private method `device`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
7
torch/distributed/remote_device.py:1 at module level:
        D100: Missing docstring in public module
torch/distributed/remote_device.py:85 in private method `rank`:
        D205: 1 blank line required between summary line and description (found 0)
torch/distributed/remote_device.py:85 in private method `rank`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
3

torch/distributed/rendezvous.py
torch/distributed/rendezvous.py:1 at module level:
        D100: Missing docstring in public module
torch/distributed/rendezvous.py:23 in public function `register_rendezvous_handler`:
        D401: First line should be in imperative mood (perhaps 'Register', not 'Registers')
torch/distributed/rendezvous.py:88 in public function `rendezvous`:
        D103: Missing docstring in public function
torch/distributed/rendezvous.py:147 in private function `_create_c10d_store`:
        D205: 1 blank line required between summary line and description (found 0)
torch/distributed/rendezvous.py:147 in private function `_create_c10d_store`:
        D400: First line should end with a period (not 'r')
5
torch/distributed/rendezvous.py:1 at module level:
        D100: Missing docstring in public module
torch/distributed/rendezvous.py:89 in public function `rendezvous`:
        D103: Missing docstring in public function
2

torch/distributed/run.py
torch/distributed/run.py:9 at module level:
        D205: 1 blank line required between summary line and description (found 0)
torch/distributed/run.py:9 at module level:
        D400: First line should end with a period (not '`')
torch/distributed/run.py:393 in public function `get_args_parser`:
        D202: No blank lines allowed after function docstring (found 1)
torch/distributed/run.py:393 in public function `get_args_parser`:
        D401: First line should be in imperative mood; try rephrasing (found 'Helper')
torch/distributed/run.py:610 in public function `parse_args`:
        D103: Missing docstring in public function
torch/distributed/run.py:615 in public function `parse_min_max_nnodes`:
        D103: Missing docstring in public function
torch/distributed/run.py:629 in public function `determine_local_world_size`:
        D103: Missing docstring in public function
torch/distributed/run.py:670 in public function `get_rdzv_endpoint`:
        D103: Missing docstring in public function
torch/distributed/run.py:677 in public function `get_use_env`:
        D205: 1 blank line required between summary line and description (found 0)
torch/distributed/run.py:677 in public function `get_use_env`:
        D401: First line should be in imperative mood (perhaps 'Retrieve', not 'Retrieves')
torch/distributed/run.py:689 in public function `config_from_args`:
        D103: Missing docstring in public function
torch/distributed/run.py:770 in public function `run_script_path`:
        D205: 1 blank line required between summary line and description (found 0)
torch/distributed/run.py:770 in public function `run_script_path`:
        D401: First line should be in imperative mood (perhaps 'Run', not 'Runs')
torch/distributed/run.py:781 in public function `run`:
        D103: Missing docstring in public function
torch/distributed/run.py:804 in public function `main`:
        D103: Missing docstring in public function
15
torch/distributed/run.py:611 in public function `parse_args`:
        D103: Missing docstring in public function
torch/distributed/run.py:616 in public function `parse_min_max_nnodes`:
        D103: Missing docstring in public function
torch/distributed/run.py:630 in public function `determine_local_world_size`:
        D103: Missing docstring in public function
torch/distributed/run.py:671 in public function `get_rdzv_endpoint`:
        D103: Missing docstring in public function
torch/distributed/run.py:691 in public function `config_from_args`:
        D103: Missing docstring in public function
torch/distributed/run.py:784 in public function `run`:
        D103: Missing docstring in public function
torch/distributed/run.py:807 in public function `main`:
        D103: Missing docstring in public function
7

torch/distributed/__init__.py
torch/distributed/__init__.py:1 at module level:
        D104: Missing docstring in public package
torch/distributed/__init__.py:8 in public function `is_available`:
        D205: 1 blank line required between summary line and description (found 0)
torch/distributed/__init__.py:8 in public function `is_available`:
        D400: First line should end with a period (not ',')
torch/distributed/__init__.py:8 in public function `is_available`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
4
torch/distributed/__init__.py:1 at module level:
        D104: Missing docstring in public package
1

torch/distributed/utils.py:1 at module level:
        D100: Missing docstring in public module
torch/distributed/utils.py:16 in private function `_pack_kwargs`:
        D205: 1 blank line required between summary line and description (found 0)
torch/distributed/utils.py:16 in private function `_pack_kwargs`:
        D400: First line should end with a period (not ')')
torch/distributed/utils.py:47 in private function `_cast_forward_inputs`:
        D205: 1 blank line required between summary line and description (found 0)
torch/distributed/utils.py:88 in private function `_recursive_to`:
        D200: One-line docstring should fit on one line with quotes (found 3)
torch/distributed/utils.py:141 in private function `_p_assert`:
        D205: 1 blank line required between summary line and description (found 0)
torch/distributed/utils.py:141 in private function `_p_assert`:
        D209: Multi-line docstring closing quotes should be on a separate line
torch/distributed/utils.py:141 in private function `_p_assert`:
        D400: First line should end with a period (not 't')
torch/distributed/utils.py:141 in private function `_p_assert`:
        D401: First line should be in imperative mood; try rephrasing (found 'This')
torch/distributed/utils.py:275 in private function `_sync_module_states`:
        D205: 1 blank line required between summary line and description (found 0)
torch/distributed/utils.py:275 in private function `_sync_module_states`:
        D400: First line should end with a period (not 'n')
torch/distributed/utils.py:275 in private function `_sync_module_states`:
        D401: First line should be in imperative mood (perhaps 'Sync', not 'Syncs')
torch/distributed/utils.py:300 in private function `_sync_params_and_buffers`:
        D205: 1 blank line required between summary line and description (found 0)
torch/distributed/utils.py:300 in private function `_sync_params_and_buffers`:
        D400: First line should end with a period (not 'y')
torch/distributed/utils.py:300 in private function `_sync_params_and_buffers`:
        D401: First line should be in imperative mood (perhaps 'Synchronize', not 'Synchronizes')
15
torch/distributed/utils.py:1 at module level:
        D100: Missing docstring in public module
1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112953
Approved by: https://github.com/weifengpy
2023-11-08 01:13:09 +00:00
bae8506589 [TorchElastic] Add option to configure log prefix for each rank (#112357)
Summary:
Add the ability to customize log lines, with additional template-like behavior to enrich log information.

Motivation:
a) Log stream processing/aggregation gains additional value when it includes information about the global rank. An extension of that is that it becomes easier to map ranks to hosts from log stream information (less relevant at the moment).
b) Users can easily map a failure to the right rank without matching node rank offset + local rank.

Implementation
- BC change: keeps the log line prefix as `[<role name><local rank>]:`
- Optional env variable TORCHELASTIC_LOG_LINE_HEADER that will be used as a prefix when specified; it currently exposes the `role_name`, `rank`, and `local_rank` variables, which are bound when the agent assigns the ranks.
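For example, something along these lines (the exact substitution syntax is an assumption on my part; the PR is authoritative):

```
import os

# Hypothetical template using the exposed variables; set before the
# elastic agent spawns workers.
os.environ["TORCHELASTIC_LOG_LINE_HEADER"] = "[${role_name}-${rank}]:"
```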

Test Plan:
CI

https://fburl.com/mlhub/mzx5xspv

Differential Revision: D50584590

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112357
Approved by: https://github.com/kiukchung
2023-11-08 01:00:26 +00:00
d1c092ae1b Update impl_abstract_pystub to be less boilerplatey (#113182)
Summary:

We've made the following changes:
- The new way to use the API is `m.impl_abstract_pystub(module, context)`.
  Every subsequent m.def of an op inside the TORCH_LIBRARY block gives
  the op the `impl_abstract_pystub`.
- Added a mechanism to determine if an operator was defined in Python or C++.
  Library.define in Python appends the op to a global set, which is analogous
  to what we do for tracking Library.impl.
- If someone does `torch.library.impl_abstract` in Python for an operator, then
  we require that it has an `impl_abstract_pystub` specified and we also check
  that the module in the `impl_abstract_pystub` is the same as the module where
  the call to `torch.library.impl_abstract` exists.
- Unfortunately we can't check the "context" (which is the buck target on
  buck-based systems) because buck sits above us.
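On the Python side, the contract looks roughly like this ("mylib::foo" is a hypothetical op defined in Python, for which no pystub is required; a C++-defined op would instead need `m.impl_abstract_pystub(module, context)` in its TORCH_LIBRARY block):

```
import torch

lib = torch.library.Library("mylib", "DEF")
lib.define("foo(Tensor x) -> Tensor")

@torch.library.impl_abstract("mylib::foo")
def foo_abstract(x):
    # Abstract/fake impl: compute output metadata without real data.
    return torch.empty_like(x)
```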

bypass-github-export-checks

Test Plan: - existing tests

Differential Revision: D51080493

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113182
Approved by: https://github.com/ezyang
2023-11-08 00:39:00 +00:00
aae418aea6 Remove TODOs to add docstrings (#113197)
Because the docstrings are actually defined in torch/_torch_docs.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113197
Approved by: https://github.com/atalman, https://github.com/malfet
2023-11-08 00:34:26 +00:00
eb5487361d docs: fix docstring errors in quantized modules and others (#112695)
Fixes #112632

Before: 171
```
torch/backends/_nnapi/prepare.py:24 in public method `__init__`:
        D107: Missing docstring in __init__
torch/backends/_nnapi/prepare.py:46 in public method `init`:
        D102: Missing docstring in public method
torch/backends/_nnapi/prepare.py:60 in public method `forward`:
        D102: Missing docstring in public method
torch/backends/_nnapi/prepare.py:94 in public function `convert_model_to_nnapi`:
        D103: Missing docstring in public function
torch/backends/_nnapi/prepare.py:153 in public function `process_for_nnapi`:
        D103: Missing docstring in public function
torch/backends/_nnapi/prepare.py:177 in private nested class `ShapeComputeModule`:
        D400: First line should end with a period (not 'n')
torch/backends/_nnapi/serializer.py:19 in public class `NNAPI_OperandCode`:
        D101: Missing docstring in public class
torch/backends/_nnapi/serializer.py:35 in public class `NNAPI_OperationCode`:
        D101: Missing docstring in public class
torch/backends/_nnapi/serializer.py:133 in public class `NNAPI_FuseCode`:
        D101: Missing docstring in public class
torch/backends/_nnapi/serializer.py:140 in public class `OperandValueSourceType`:
        D101: Missing docstring in public class
torch/backends/_nnapi/serializer.py:150 in public class `TorchScalarTypes`:
        D101: Missing docstring in public class
torch/backends/_nnapi/serializer.py:154 in public function `approx_equal`:
        D103: Missing docstring in public function
torch/backends/_nnapi/serializer.py:158 in public function `tensor_size`:
        D103: Missing docstring in public function
torch/backends/_nnapi/serializer.py:172 in public function `change_element`:
        D103: Missing docstring in public function
torch/backends/_nnapi/serializer.py:194 in public class `DimOrder`:
        D101: Missing docstring in public class
torch/backends/_nnapi/serializer.py:225 in public method `use_nchw`:
        D102: Missing docstring in public method
torch/backends/_nnapi/serializer.py:233 in public function `broadcast_shapes`:
        D103: Missing docstring in public function
torch/backends/_nnapi/serializer.py:260 in public function `get_conv_pool_shape`:
        D103: Missing docstring in public function
torch/backends/_nnapi/serializer.py:284 in public function `fix_shape`:
        D103: Missing docstring in public function
torch/backends/_nnapi/serializer.py:301 in public function `reverse_map_dim`:
        D103: Missing docstring in public function
torch/backends/_nnapi/serializer.py:312 in public function `flex_name`:
        D103: Missing docstring in public function
torch/backends/_nnapi/serializer.py:1337 in private method `_do_add_binary`:
        D400: First line should end with a period (not 's')
torch/backends/_nnapi/serializer.py:1337 in private method `_do_add_binary`:
        D401: First line should be in imperative mood; try rephrasing (found 'Helper')
torch/backends/_nnapi/serializer.py:2180 in public function `serialize_model`:
        D202: No blank lines allowed after function docstring (found 1)
torch/backends/_nnapi/serializer.py:2180 in public function `serialize_model`:
        D205: 1 blank line required between summary line and description (found 0)
torch/backends/_nnapi/serializer.py:2180 in public function `serialize_model`:
        D400: First line should end with a period (not ':')
torch/backends/cuda/__init__.py:1 at module level:
        D104: Missing docstring in public package
torch/backends/cuda/__init__.py:30 in public function `is_built`:
        D205: 1 blank line required between summary line and description (found 0)
torch/backends/cuda/__init__.py:30 in public function `is_built`:
        D209: Multi-line docstring closing quotes should be on a separate line
torch/backends/cuda/__init__.py:30 in public function `is_built`:
        D400: First line should end with a period (not 's')
torch/backends/cuda/__init__.py:30 in public function `is_built`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/backends/cuda/__init__.py:37 in public class `cuFFTPlanCacheAttrContextProp`:
        D101: Missing docstring in public class
torch/backends/cuda/__init__.py:40 in public method `__init__`:
        D107: Missing docstring in __init__
torch/backends/cuda/__init__.py:44 in public method `__get__`:
        D105: Missing docstring in magic method
torch/backends/cuda/__init__.py:47 in public method `__set__`:
        D105: Missing docstring in magic method
torch/backends/cuda/__init__.py:54 in public class `cuFFTPlanCache`:
        D205: 1 blank line required between summary line and description (found 0)
torch/backends/cuda/__init__.py:54 in public class `cuFFTPlanCache`:
        D400: First line should end with a period (not 'e')
torch/backends/cuda/__init__.py:60 in public method `__init__`:
        D107: Missing docstring in __init__
torch/backends/cuda/__init__.py:73 in public method `clear`:
        D102: Missing docstring in public method
torch/backends/cuda/__init__.py:78 in public class `cuFFTPlanCacheManager`:
        D205: 1 blank line required between summary line and description (found 0)
torch/backends/cuda/__init__.py:78 in public class `cuFFTPlanCacheManager`:
        D400: First line should end with a period (not ',')
torch/backends/cuda/__init__.py:89 in public method `__init__`:
        D107: Missing docstring in __init__
torch/backends/cuda/__init__.py:93 in public method `__getitem__`:
        D105: Missing docstring in magic method
torch/backends/cuda/__init__.py:106 in public method `__getattr__`:
        D105: Missing docstring in magic method
torch/backends/cuda/__init__.py:109 in public method `__setattr__`:
        D105: Missing docstring in magic method
torch/backends/cuda/__init__.py:116 in public class `cuBLASModule`:
        D101: Missing docstring in public class
torch/backends/cuda/__init__.py:117 in public method `__getattr__`:
        D105: Missing docstring in magic method
torch/backends/cuda/__init__.py:126 in public method `__setattr__`:
        D105: Missing docstring in magic method
torch/backends/cuda/__init__.py:147 in public function `preferred_linalg_library`:
        D202: No blank lines allowed after function docstring (found 1)
torch/backends/cuda/__init__.py:204 in public class `SDPBackend`:
        D204: 1 blank line required after class docstring (found 0)
torch/backends/cudnn/__init__.py:1 at module level:
        D104: Missing docstring in public package
torch/backends/cudnn/__init__.py:81 in public function `version`:
        D400: First line should end with a period (not 'N')
torch/backends/cudnn/__init__.py:81 in public function `version`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/backends/cudnn/__init__.py:95 in public function `is_available`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/backends/cudnn/__init__.py:99 in public function `is_acceptable`:
        D103: Missing docstring in public function
torch/backends/cudnn/__init__.py:122 in public function `set_flags`:
        D103: Missing docstring in public function
torch/backends/cudnn/__init__.py:150 in public function `flags`:
        D103: Missing docstring in public function
torch/backends/cudnn/__init__.py:174 in public class `CudnnModule`:
        D101: Missing docstring in public class
torch/backends/cudnn/__init__.py:175 in public method `__init__`:
        D107: Missing docstring in __init__
torch/backends/mkl/__init__.py:1 at module level:
        D104: Missing docstring in public package
torch/backends/mkl/__init__.py:5 in public function `is_available`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/backends/mkl/__init__.py:14 in public class `verbose`:
        D205: 1 blank line required between summary line and description (found 0)
torch/backends/mkl/__init__.py:14 in public class `verbose`:
        D400: First line should end with a period (not 'y')
torch/backends/mkl/__init__.py:41 in public method `__init__`:
        D107: Missing docstring in __init__
torch/backends/mkl/__init__.py:44 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/backends/mkl/__init__.py:53 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/backends/mkldnn/__init__.py:1 at module level:
        D104: Missing docstring in public package
torch/backends/mkldnn/__init__.py:9 in public function `is_available`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/backends/mkldnn/__init__.py:19 in public class `verbose`:
        D205: 1 blank line required between summary line and description (found 0)
torch/backends/mkldnn/__init__.py:19 in public class `verbose`:
        D400: First line should end with a period (not 'y')
torch/backends/mkldnn/__init__.py:47 in public method `__init__`:
        D107: Missing docstring in __init__
torch/backends/mkldnn/__init__.py:50 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/backends/mkldnn/__init__.py:59 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/backends/mkldnn/__init__.py:64 in public function `set_flags`:
        D103: Missing docstring in public function
torch/backends/mkldnn/__init__.py:71 in public function `flags`:
        D103: Missing docstring in public function
torch/backends/mkldnn/__init__.py:81 in public class `MkldnnModule`:
        D101: Missing docstring in public class
torch/backends/mkldnn/__init__.py:82 in public method `__init__`:
        D107: Missing docstring in __init__
torch/backends/openmp/__init__.py:1 at module level:
        D104: Missing docstring in public package
torch/backends/openmp/__init__.py:5 in public function `is_available`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/nn/intrinsic/qat/modules/conv_fused.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/intrinsic/qat/modules/linear_fused.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/intrinsic/qat/modules/linear_relu.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/qat/__init__.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/qat/dynamic/__init__.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/qat/dynamic/modules/linear.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/qat/modules/__init__.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/qat/modules/conv.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/qat/modules/embedding_ops.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/qat/modules/linear.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/quantizable/modules/activation.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/quantizable/modules/rnn.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/quantized/_reference/modules/__init__.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/quantized/_reference/modules/conv.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/quantized/_reference/modules/linear.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/quantized/_reference/modules/rnn.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/quantized/_reference/modules/sparse.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/quantized/_reference/modules/utils.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/quantized/dynamic/modules/__init__.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/quantized/dynamic/modules/conv.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/quantized/dynamic/modules/linear.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/quantized/dynamic/modules/rnn.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/quantized/functional.py:1 at module level:
        D400: First line should end with a period (not 'l')
torch/nn/quantized/modules/__init__.py:1 at module level:
        D400: First line should end with a period (not 's')
torch/nn/quantized/modules/activation.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/quantized/modules/batchnorm.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/quantized/modules/conv.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/quantized/modules/dropout.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/quantized/modules/embedding_ops.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/quantized/modules/functional_modules.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/quantized/modules/linear.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/quantized/modules/normalization.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/quantized/modules/rnn.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/quantized/modules/utils.py:2 at module level:
        D400: First line should end with a period (not 's')
torch/nn/utils/_expanded_weights/conv_utils.py:13 in public function `conv_picker`:
        D103: Missing docstring in public function
torch/nn/utils/_expanded_weights/conv_utils.py:23 in public function `conv_args_and_kwargs`:
        D103: Missing docstring in public function
torch/nn/utils/_expanded_weights/conv_utils.py:31 in public function `conv_normalizer`:
        D103: Missing docstring in public function
torch/nn/utils/_expanded_weights/conv_utils.py:35 in public function `conv_input_for_string_padding`:
        D103: Missing docstring in public function
torch/nn/utils/_expanded_weights/conv_utils.py:43 in public function `int_padding_for_string_padding`:
        D103: Missing docstring in public function
torch/nn/utils/_expanded_weights/conv_utils.py:59 in public function `conv_padding_for_same`:
        D103: Missing docstring in public function
torch/nn/utils/_expanded_weights/conv_utils.py:66 in public function `conv_backward`:
        D103: Missing docstring in public function
torch/nn/utils/_expanded_weights/conv_utils.py:131 in public function `conv_unfold_weight_grad_sample`:
        D103: Missing docstring in public function
torch/nn/utils/_expanded_weights/conv_utils.py:166 in public function `conv_group_weight_grad_sample`:
        D103: Missing docstring in public function
torch/nn/utils/_expanded_weights/conv_utils.py:189 in public function `unfold3d`:
        D202: No blank lines allowed after function docstring (found 1)
torch/nn/utils/_expanded_weights/conv_utils.py:189 in public function `unfold3d`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_expanded_weights/conv_utils.py:189 in public function `unfold3d`:
        D401: First line should be in imperative mood (perhaps 'Extract', not 'Extracts')
torch/nn/utils/_expanded_weights/expanded_weights_utils.py:6 in public function `is_batch_first`:
        D103: Missing docstring in public function
torch/nn/utils/_expanded_weights/expanded_weights_utils.py:19 in public function `standard_kwargs`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_expanded_weights/expanded_weights_utils.py:19 in public function `standard_kwargs`:
        D300: Use """triple double quotes""" (found '''-quotes)
torch/nn/utils/_expanded_weights/expanded_weights_utils.py:19 in public function `standard_kwargs`:
        D400: First line should end with a period (not 'e')
torch/nn/utils/_expanded_weights/expanded_weights_utils.py:28 in public function `forward_helper`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_expanded_weights/expanded_weights_utils.py:28 in public function `forward_helper`:
        D300: Use """triple double quotes""" (found '''-quotes)
torch/nn/utils/_expanded_weights/expanded_weights_utils.py:28 in public function `forward_helper`:
        D400: First line should end with a period (not ')')
torch/nn/utils/_expanded_weights/expanded_weights_utils.py:84 in public function `maybe_scale_by_batch_size`:
        D103: Missing docstring in public function
torch/nn/utils/_expanded_weights/expanded_weights_utils.py:90 in public function `set_grad_sample_if_exists`:
        D103: Missing docstring in public function
torch/nn/utils/_expanded_weights/expanded_weights_utils.py:108 in public function `unpack_expanded_weight_or_tensor`:
        D103: Missing docstring in public function
torch/nn/utils/_expanded_weights/expanded_weights_utils.py:123 in public function `sum_over_all_but_batch_and_last_n`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_expanded_weights/expanded_weights_utils.py:123 in public function `sum_over_all_but_batch_and_last_n`:
        D400: First line should end with a period (not 't')
torch/nn/utils/_expanded_weights/expanded_weights_utils.py:123 in public function `sum_over_all_but_batch_and_last_n`:
        D401: First line should be in imperative mood (perhaps 'Calculate', not 'Calculates')
torch/nn/utils/convert_parameters.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/utils/convert_parameters.py:57 in private function `_check_param_device`:
        D202: No blank lines allowed after function docstring (found 1)
torch/nn/utils/convert_parameters.py:57 in private function `_check_param_device`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/convert_parameters.py:57 in private function `_check_param_device`:
        D400: First line should end with a period (not 'd')
torch/nn/utils/convert_parameters.py:57 in private function `_check_param_device`:
        D401: First line should be in imperative mood; try rephrasing (found 'This')
torch/nn/utils/rnn.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/utils/rnn.py:28 in public class `PackedSequence`:
        D204: 1 blank line required after class docstring (found 0)
torch/nn/utils/rnn.py:63 in public method `__new__`:
        D102: Missing docstring in public method
torch/nn/utils/rnn.py:73 in public method `pin_memory`:
        D102: Missing docstring in public method
torch/nn/utils/rnn.py:80 in public method `cuda`:
        D102: Missing docstring in public method
torch/nn/utils/rnn.py:87 in public method `cpu`:
        D102: Missing docstring in public method
torch/nn/utils/rnn.py:94 in public method `double`:
        D102: Missing docstring in public method
torch/nn/utils/rnn.py:97 in public method `float`:
        D102: Missing docstring in public method
torch/nn/utils/rnn.py:100 in public method `half`:
        D102: Missing docstring in public method
torch/nn/utils/rnn.py:103 in public method `long`:
        D102: Missing docstring in public method
torch/nn/utils/rnn.py:106 in public method `int`:
        D102: Missing docstring in public method
torch/nn/utils/rnn.py:109 in public method `short`:
        D102: Missing docstring in public method
torch/nn/utils/rnn.py:112 in public method `char`:
        D102: Missing docstring in public method
torch/nn/utils/rnn.py:115 in public method `byte`:
        D102: Missing docstring in public method
torch/nn/utils/rnn.py:119 in public method `to`:
        D202: No blank lines allowed after function docstring (found 1)
torch/nn/utils/rnn.py:119 in public method `to`:
        D401: First line should be in imperative mood (perhaps 'Perform', not 'Performs')
torch/nn/utils/rnn.py:146 in public method `is_cuda`:
        D400: First line should end with a period (not 'u')
torch/nn/utils/rnn.py:150 in public method `is_pinned`:
        D400: First line should end with a period (not 'y')
torch/nn/utils/rnn.py:150 in public method `is_pinned`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/nn/utils/rnn.py:198 in public function `invert_permutation`:
        D103: Missing docstring in public function
torch/nn/utils/rnn.py:274 in public function `pad_packed_sequence`:
        D401: First line should be in imperative mood (perhaps 'Pad', not 'Pads')
torch/nn/utils/rnn.py:347 in public function `pad_sequence`:
        D202: No blank lines allowed after function docstring (found 1)
torch/nn/utils/rnn.py:347 in public function `pad_sequence`:
        D400: First line should end with a period (not '`')
torch/nn/utils/rnn.py:408 in public function `unpad_sequence`:
        D202: No blank lines allowed after function docstring (found 1)
torch/nn/utils/rnn.py:408 in public function `unpad_sequence`:
        D400: First line should end with a period (not 's')
torch/nn/utils/rnn.py:454 in public function `pack_sequence`:
        D400: First line should end with a period (not 's')
torch/nn/utils/rnn.py:490 in public function `unpack_sequence`:
        D202: No blank lines allowed after function docstring (found 1)
torch/nn/utils/rnn.py:490 in public function `unpack_sequence`:
        D400: First line should end with a period (not 's')
171
```

After: 81
```
torch/backends/_nnapi/prepare.py:24 in public method `__init__`:
        D107: Missing docstring in __init__
torch/backends/_nnapi/prepare.py:46 in public method `init`:
        D102: Missing docstring in public method
torch/backends/_nnapi/prepare.py:60 in public method `forward`:
        D102: Missing docstring in public method
torch/backends/_nnapi/prepare.py:94 in public function `convert_model_to_nnapi`:
        D103: Missing docstring in public function
torch/backends/_nnapi/prepare.py:153 in public function `process_for_nnapi`:
        D103: Missing docstring in public function
torch/backends/_nnapi/serializer.py:19 in public class `NNAPI_OperandCode`:
        D101: Missing docstring in public class
torch/backends/_nnapi/serializer.py:35 in public class `NNAPI_OperationCode`:
        D101: Missing docstring in public class
torch/backends/_nnapi/serializer.py:133 in public class `NNAPI_FuseCode`:
        D101: Missing docstring in public class
torch/backends/_nnapi/serializer.py:140 in public class `OperandValueSourceType`:
        D101: Missing docstring in public class
torch/backends/_nnapi/serializer.py:150 in public class `TorchScalarTypes`:
        D101: Missing docstring in public class
torch/backends/_nnapi/serializer.py:154 in public function `approx_equal`:
        D103: Missing docstring in public function
torch/backends/_nnapi/serializer.py:158 in public function `tensor_size`:
        D103: Missing docstring in public function
torch/backends/_nnapi/serializer.py:172 in public function `change_element`:
        D103: Missing docstring in public function
torch/backends/_nnapi/serializer.py:194 in public class `DimOrder`:
        D101: Missing docstring in public class
torch/backends/_nnapi/serializer.py:225 in public method `use_nchw`:
        D102: Missing docstring in public method
torch/backends/_nnapi/serializer.py:233 in public function `broadcast_shapes`:
        D103: Missing docstring in public function
torch/backends/_nnapi/serializer.py:260 in public function `get_conv_pool_shape`:
        D103: Missing docstring in public function
torch/backends/_nnapi/serializer.py:284 in public function `fix_shape`:
        D103: Missing docstring in public function
torch/backends/_nnapi/serializer.py:301 in public function `reverse_map_dim`:
        D103: Missing docstring in public function
torch/backends/_nnapi/serializer.py:312 in public function `flex_name`:
        D103: Missing docstring in public function
torch/backends/cuda/__init__.py:1 at module level:
        D104: Missing docstring in public package
torch/backends/cuda/__init__.py:39 in public class `cuFFTPlanCacheAttrContextProp`:
        D101: Missing docstring in public class
torch/backends/cuda/__init__.py:42 in public method `__init__`:
        D107: Missing docstring in __init__
torch/backends/cuda/__init__.py:46 in public method `__get__`:
        D105: Missing docstring in magic method
torch/backends/cuda/__init__.py:49 in public method `__set__`:
        D105: Missing docstring in magic method
torch/backends/cuda/__init__.py:63 in public method `__init__`:
        D107: Missing docstring in __init__
torch/backends/cuda/__init__.py:76 in public method `clear`:
        D102: Missing docstring in public method
torch/backends/cuda/__init__.py:91 in public method `__init__`:
        D107: Missing docstring in __init__
torch/backends/cuda/__init__.py:95 in public method `__getitem__`:
        D105: Missing docstring in magic method
torch/backends/cuda/__init__.py:108 in public method `__getattr__`:
        D105: Missing docstring in magic method
torch/backends/cuda/__init__.py:111 in public method `__setattr__`:
        D105: Missing docstring in magic method
torch/backends/cuda/__init__.py:118 in public class `cuBLASModule`:
        D101: Missing docstring in public class
torch/backends/cuda/__init__.py:119 in public method `__getattr__`:
        D105: Missing docstring in magic method
torch/backends/cuda/__init__.py:128 in public method `__setattr__`:
        D105: Missing docstring in magic method
torch/backends/cudnn/__init__.py:1 at module level:
        D104: Missing docstring in public package
torch/backends/cudnn/__init__.py:99 in public function `is_acceptable`:
        D103: Missing docstring in public function
torch/backends/cudnn/__init__.py:122 in public function `set_flags`:
        D103: Missing docstring in public function
torch/backends/cudnn/__init__.py:150 in public function `flags`:
        D103: Missing docstring in public function
torch/backends/cudnn/__init__.py:174 in public class `CudnnModule`:
        D101: Missing docstring in public class
torch/backends/cudnn/__init__.py:175 in public method `__init__`:
        D107: Missing docstring in __init__
torch/backends/mkl/__init__.py:1 at module level:
        D104: Missing docstring in public package
torch/backends/mkl/__init__.py:42 in public method `__init__`:
        D107: Missing docstring in __init__
torch/backends/mkl/__init__.py:45 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/backends/mkl/__init__.py:54 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/backends/mkldnn/__init__.py:1 at module level:
        D104: Missing docstring in public package
torch/backends/mkldnn/__init__.py:48 in public method `__init__`:
        D107: Missing docstring in __init__
torch/backends/mkldnn/__init__.py:51 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/backends/mkldnn/__init__.py:60 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/backends/mkldnn/__init__.py:65 in public function `set_flags`:
        D103: Missing docstring in public function
torch/backends/mkldnn/__init__.py:72 in public function `flags`:
        D103: Missing docstring in public function
torch/backends/mkldnn/__init__.py:82 in public class `MkldnnModule`:
        D101: Missing docstring in public class
torch/backends/mkldnn/__init__.py:83 in public method `__init__`:
        D107: Missing docstring in __init__
torch/backends/openmp/__init__.py:1 at module level:
        D104: Missing docstring in public package
torch/nn/utils/_expanded_weights/conv_utils.py:13 in public function `conv_picker`:
        D103: Missing docstring in public function
torch/nn/utils/_expanded_weights/conv_utils.py:23 in public function `conv_args_and_kwargs`:
        D103: Missing docstring in public function
torch/nn/utils/_expanded_weights/conv_utils.py:31 in public function `conv_normalizer`:
        D103: Missing docstring in public function
torch/nn/utils/_expanded_weights/conv_utils.py:35 in public function `conv_input_for_string_padding`:
        D103: Missing docstring in public function
torch/nn/utils/_expanded_weights/conv_utils.py:43 in public function `int_padding_for_string_padding`:
        D103: Missing docstring in public function
torch/nn/utils/_expanded_weights/conv_utils.py:59 in public function `conv_padding_for_same`:
        D103: Missing docstring in public function
torch/nn/utils/_expanded_weights/conv_utils.py:66 in public function `conv_backward`:
        D103: Missing docstring in public function
torch/nn/utils/_expanded_weights/conv_utils.py:131 in public function `conv_unfold_weight_grad_sample`:
        D103: Missing docstring in public function
torch/nn/utils/_expanded_weights/conv_utils.py:166 in public function `conv_group_weight_grad_sample`:
        D103: Missing docstring in public function
torch/nn/utils/_expanded_weights/expanded_weights_utils.py:6 in public function `is_batch_first`:
        D103: Missing docstring in public function
torch/nn/utils/_expanded_weights/expanded_weights_utils.py:87 in public function `maybe_scale_by_batch_size`:
        D103: Missing docstring in public function
torch/nn/utils/_expanded_weights/expanded_weights_utils.py:93 in public function `set_grad_sample_if_exists`:
        D103: Missing docstring in public function
torch/nn/utils/_expanded_weights/expanded_weights_utils.py:111 in public function `unpack_expanded_weight_or_tensor`:
        D103: Missing docstring in public function
torch/nn/utils/convert_parameters.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/utils/rnn.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/utils/rnn.py:64 in public method `__new__`:
        D102: Missing docstring in public method
torch/nn/utils/rnn.py:74 in public method `pin_memory`:
        D102: Missing docstring in public method
torch/nn/utils/rnn.py:81 in public method `cuda`:
        D102: Missing docstring in public method
torch/nn/utils/rnn.py:88 in public method `cpu`:
        D102: Missing docstring in public method
torch/nn/utils/rnn.py:95 in public method `double`:
        D102: Missing docstring in public method
torch/nn/utils/rnn.py:98 in public method `float`:
        D102: Missing docstring in public method
torch/nn/utils/rnn.py:101 in public method `half`:
        D102: Missing docstring in public method
torch/nn/utils/rnn.py:104 in public method `long`:
        D102: Missing docstring in public method
torch/nn/utils/rnn.py:107 in public method `int`:
        D102: Missing docstring in public method
torch/nn/utils/rnn.py:110 in public method `short`:
        D102: Missing docstring in public method
torch/nn/utils/rnn.py:113 in public method `char`:
        D102: Missing docstring in public method
torch/nn/utils/rnn.py:116 in public method `byte`:
        D102: Missing docstring in public method
torch/nn/utils/rnn.py:198 in public function `invert_permutation`:
        D103: Missing docstring in public function
81
```
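
For reference, a minimal before/after for the two most common violations in the listings above (D400: trailing period, D401: imperative mood), modeled on `is_available` from the report:

```python
# Before: fails D400 (no trailing period) and D401 (not imperative mood).
def is_available():
    """Returns whether MKL is available"""
    return True

# After: imperative summary line ending with a period; both checks pass.
def is_available():  # noqa: F811 (intentional redefinition for contrast)
    """Return whether MKL is available."""
    return True
```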

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112695
Approved by: https://github.com/mikaylagawarecki
2023-11-07 23:52:16 +00:00
edcbd5a895 Make TORCH_COMPILE_DEBUG=1 work again (#112917)
As titled. After the fix, `self.node` is `Optional[ir.Buffer]` in `FusedSchedulerNode` and `ForeachKernelSchedulerNode`, but `ir.Buffer` in `BaseSchedulerNode`. Using `ir.Buffer` for `BaseSchedulerNode.node` avoids all mypy complaints about Optionals.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112917
Approved by: https://github.com/davidberard98, https://github.com/int3, https://github.com/leslie-fang-intel, https://github.com/aakhundov
2023-11-07 23:34:30 +00:00
041b6b5c6b TorchInductor Opinfo fixes for rng ops (#108170)
Tests rng ops under both configurations (a sketch follows this list):
- fallback_random=True, assertEqual=True
- fallback_random=False, assertEqual=False
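
A hedged sketch of the first configuration; `fn` is a made-up op, and the exact-equality check is the premise the tests rely on when `fallback_random=True`:

```python
import torch
from torch._inductor import config as inductor_config

def fn(x):
    return torch.rand_like(x)

x = torch.randn(4)
# fallback_random=True makes inductor fall back to eager RNG, so seeded
# runs can be compared exactly (the assertEqual=True case above).
with inductor_config.patch(fallback_random=True):
    torch.manual_seed(0)
    eager_out = fn(x)
    torch.manual_seed(0)
    compiled_out = torch.compile(fn)(x)
    assert torch.equal(eager_out, compiled_out)
```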

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108170
Approved by: https://github.com/davidberard98
2023-11-07 23:13:57 +00:00
498a760802 Update comm_analysis.py license (#113184)
Consulted with legal; this is the right way to do it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113184
Approved by: https://github.com/Chillee, https://github.com/malfet
2023-11-07 22:58:56 +00:00
a3a2486be8 [dynamo] Avoid eager imports of classes with custom VariableTrackers (#112319)
Currently custom VariableTrackers exist for classes that live outside of pytorch.
For these cases dynamo currently eagerly imports the module to get the class
object to compare against.

This instead uses `sys.modules.get("module.path")`, so the module is never
imported by dynamo itself; if the user has imported the module, we can still
access it and grab the type we need to compare against.
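
A hedged sketch of that lazy-lookup pattern (the module path here is a guess for illustration, not necessarily the path dynamo uses):

```python
import sys

def get_keyed_jagged_tensor_class():
    # Never triggers an import; the module is visible only if the user
    # has already imported it somewhere else.
    mod = sys.modules.get("torchrec.sparse.jagged_tensor")
    return getattr(mod, "KeyedJaggedTensor", None) if mod is not None else None

cls = get_keyed_jagged_tensor_class()
if cls is not None:
    ...  # safe to compare values against `cls` and build the VariableTracker
```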

I noticed this issue because importing `KeyedJaggedTensor` fails half-way
through if `fbgemm_gpu` has been built against an incompatible PyTorch version,
in which case dynamo retried the import each time!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112319
Approved by: https://github.com/lezcano, https://github.com/ezyang
2023-11-07 22:45:54 +00:00
e4c8737a0c [PT-D] Updated Dynamo skip message for @contract tests (#112793)
Even though Dynamo can now trace through module hooks, its regex matcher for `HASATTR` does not like the state key:
12a6f5aa6b/torch/distributed/_composable/contract.py (L10-L14)
12a6f5aa6b/torch/_dynamo/guards.py (L353-L355)

```
PYTORCH_TEST_WITH_DYNAMO=1 python -m pytest test/distributed/_composable/test_contract.py
```

```
------------------------------------- Captured stderr call -------------------------------------
[2023-11-02 14:40:02,242] torch._dynamo.convert_frame: [WARNING] WON'T CONVERT resume_in_test_registry /data/users/andgu/pytorch/test/distributed/_composable/test_contract.py line 125
[2023-11-02 14:40:02,242] torch._dynamo.convert_frame: [WARNING] due to:
[2023-11-02 14:40:02,242] torch._dynamo.convert_frame: [WARNING] Traceback (most recent call last):
[2023-11-02 14:40:02,242] torch._dynamo.convert_frame: [WARNING]   File "/data/users/andgu/pytorch/torch/_dynamo/convert_frame.py", line 687, in _convert_frame
[2023-11-02 14:40:02,242] torch._dynamo.convert_frame: [WARNING]     result = inner_convert(frame, cache_entry, hooks, frame_state)
[2023-11-02 14:40:02,242] torch._dynamo.convert_frame: [WARNING]   File "/data/users/andgu/pytorch/torch/_dynamo/convert_frame.py", line 148, in _fn
[2023-11-02 14:40:02,242] torch._dynamo.convert_frame: [WARNING]     return fn(*args, **kwargs)
[2023-11-02 14:40:02,242] torch._dynamo.convert_frame: [WARNING]   File "/data/users/andgu/pytorch/torch/_dynamo/convert_frame.py", line 406, in _convert_frame_assert
[2023-11-02 14:40:02,242] torch._dynamo.convert_frame: [WARNING]     return _compile(
[2023-11-02 14:40:02,242] torch._dynamo.convert_frame: [WARNING]   File "/data/users/andgu/pytorch/torch/_dynamo/convert_frame.py", line 614, in _compile
[2023-11-02 14:40:02,242] torch._dynamo.convert_frame: [WARNING]     guarded_code = compile_inner(code, one_graph, hooks, transform)
[2023-11-02 14:40:02,242] torch._dynamo.convert_frame: [WARNING]   File "/data/users/andgu/pytorch/torch/_dynamo/utils.py", line 221, in time_wrapper
[2023-11-02 14:40:02,242] torch._dynamo.convert_frame: [WARNING]     r = func(*args, **kwargs)
[2023-11-02 14:40:02,242] torch._dynamo.convert_frame: [WARNING]   File "/data/users/andgu/pytorch/torch/_dynamo/convert_frame.py", line 594, in compile_inner
[2023-11-02 14:40:02,242] torch._dynamo.convert_frame: [WARNING]     check_fn = CheckFunctionManager(
[2023-11-02 14:40:02,242] torch._dynamo.convert_frame: [WARNING]   File "/data/users/andgu/pytorch/torch/_dynamo/guards.py", line 987, in __init__
[2023-11-02 14:40:02,242] torch._dynamo.convert_frame: [WARNING]     guard.create(builder)
[2023-11-02 14:40:02,242] torch._dynamo.convert_frame: [WARNING]   File "/data/users/andgu/pytorch/torch/_guards.py", line 244, in create
[2023-11-02 14:40:02,242] torch._dynamo.convert_frame: [WARNING]     return self.create_fn(builder, self)
[2023-11-02 14:40:02,242] torch._dynamo.convert_frame: [WARNING]   File "/data/users/andgu/pytorch/torch/_dynamo/guards.py", line 354, in HASATTR
[2023-11-02 14:40:02,242] torch._dynamo.convert_frame: [WARNING]     assert m, f"invalid hasattr check {guard.name}"
[2023-11-02 14:40:02,242] torch._dynamo.convert_frame: [WARNING] AssertionError: invalid hasattr check getattr(L['___stack0'], '__composable_api_state_key_643e6a56-3313-4c8f-9401-a5af7bd3ee26')
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112793
Approved by: https://github.com/wanchaol
2023-11-07 22:42:03 +00:00
0fee7a0181 Revert "[TD] Add heuristic for class level historical correlations (#112162)"
This reverts commit ff1ae3520506045c266463a05b0ce346552363c7.

Reverted https://github.com/pytorch/pytorch/pull/112162 on behalf of https://github.com/clee2000 due to broke lint? probably a landrace ([comment](https://github.com/pytorch/pytorch/pull/112162#issuecomment-1800310012))
2023-11-07 22:40:35 +00:00
356f3458c4 [dynamo] Remove incorrect sources (#112961)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112961
Approved by: https://github.com/voznesenskym, https://github.com/Skylion007
ghstack dependencies: #111306, #111415, #111725, #111726, #112962
2023-11-07 22:01:40 +00:00
bd8d924e9b [dynamo] Relax NullContextVariable and RangeVariable guards (#112962)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112962
Approved by: https://github.com/voznesenskym
ghstack dependencies: #111306, #111415, #111725, #111726
2023-11-07 22:01:40 +00:00
8cee0a25bd fix: Flake8-BugBear code B-026 for PyTorch (#111362)
Fixes #106571

I have fixed the B-026 error codes for Flake8 tests on the codebase. Please review and let me know if anything else needs doing.
Thanks; I'm excited to make this first contribution to PyTorch.

I also refer to the issue that introduced [B-026](https://github.com/PyCQA/flake8-bugbear/issues/286) in `flake8-bugbear`, which discusses the error code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111362
Approved by: https://github.com/Skylion007
2023-11-07 21:38:18 +00:00
2da062da51 [pytorch-vulkan] fix zero-dim test (#113116)
Summary:
Fix the zero-dim test. Use `at::zeros` instead of `at::empty`, since the initial values inside an `at::empty` tensor are undefined; this is likely the cause of the test flakiness.

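The same pitfall is visible at the Python level; a minimal analogue of the `at::empty` vs `at::zeros` distinction:

```python
import torch

uninitialized = torch.empty(3)  # contents are undefined: whatever was in memory
deterministic = torch.zeros(3)  # contents are well-defined (all zeros)
```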

Test Plan:
Run on devserver

```
$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck2 run fbcode/mode/dev-nosan    //xplat/caffe2:pt_vulkan_api_test_bin
...
[       OK ] VulkanAPITest.linear_4d_large (2 ms)
[ RUN      ] VulkanAPITest.lstm_success
[       OK ] VulkanAPITest.lstm_success (4 ms)
[ RUN      ] VulkanAPITest.lstm_mclareninputs_success
[       OK ] VulkanAPITest.lstm_mclareninputs_success (45 ms)
[ RUN      ] VulkanAPITest.lstm_prepack_success
[       OK ] VulkanAPITest.lstm_prepack_success (2 ms)
[ RUN      ] VulkanAPITest.querypool_flushed_shader_log
xplat/caffe2/aten/src/ATen/test/vulkan_api_test.cpp:7773: Skipped
QueryPool is not available

[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log (0 ms)
[----------] 402 tests from VulkanAPITest (24598 ms total)

[----------] Global test environment tear-down
[==========] 402 tests from 1 test suite ran. (24598 ms total)
[  PASSED  ] 399 tests.
[  SKIPPED ] 1 test, listed below:
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] VulkanAPITest.conv2d_pw_prepack
[  FAILED  ] VulkanAPITest.conv2d_pw_prepack_bc

 2 FAILED TESTS
  YOU HAVE 7 DISABLED TESTS

```

The last two are known failures on the devserver.

Full output: P875058890

Differential Revision: D51055623

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113116
Approved by: https://github.com/manuelcandales
2023-11-07 21:32:03 +00:00
ff1ae35205 [TD] Add heuristic for class level historical correlations (#112162)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112162
Approved by: https://github.com/clee2000
2023-11-07 20:57:03 +00:00
056f2cba17 Deprecate "fallthrough" as autograd fallback default (#113166)
This got reverted a couple of months ago. We have since fixed the known
problems with the flag. It is time to try again.

Context:

This PR adds a new fallback to the Autograd dispatch keys. The previous
behavior was a big footgun; we are deprecating it.

If you would prefer the old behavior:
- A quick (unsupported) way to get the previous behavior is to call
torch._C._set_autograd_fallback("nothing") (see the snippet after this list)
- Register "torch::CppFunction::makeFallthrough()" to your Autograd key,
like in https://gist.github.com/zou3519/d09a5f4b1afe2430af09fea67c6ff2c8
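
A minimal snippet for the first option, using the private call quoted in the list above (unsupported API, so gate it defensively):

```python
import torch

# Unsupported escape hatch: restore the previous "nothing" fallback
# behavior on the Autograd dispatch keys.
if hasattr(torch._C, "_set_autograd_fallback"):
    torch._C._set_autograd_fallback("nothing")
```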

It is possible that this PR regresses performance of overhead-bound
models. If this is the case, please reach out (and apply one of the
temporary fixes in the previous section).

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113166
Approved by: https://github.com/soulitzer
2023-11-07 20:26:39 +00:00
f496c8c4a7 [tp] handle non-covered ops (#112530)
Summary: Only propagate sharding if the op has a sharding strategy registered; otherwise, mark the op's inputs and outputs as `Replicate`.
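
A hedged sketch of that rule; `SHARDING_STRATEGIES` is a stand-in for the real per-op strategy registry, which this summary does not show:

```python
from torch.distributed._tensor import Replicate

SHARDING_STRATEGIES = {}  # hypothetical: op name -> propagation function

def propagate_or_replicate(op_name, input_placements):
    strategy = SHARDING_STRATEGIES.get(op_name)
    if strategy is not None:
        return strategy(input_placements)
    # No registered strategy: mark the op's inputs and outputs as Replicate.
    return [Replicate() for _ in input_placements], [Replicate()]
```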

Test Plan: buck test mode/opt  -c fbcode.enable_gpu_sections=true //caffe2/test/distributed/_tensor/experimental:tp_transform

Differential Revision: D50747611

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112530
Approved by: https://github.com/wanchaol
2023-11-07 20:20:44 +00:00
0af8fb71ab add test for consecutive aot inductor compiles (#111170)
Differential Revision: D50246956

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111170
Approved by: https://github.com/khabinov
2023-11-07 20:11:53 +00:00
ad1c3467e2 [dynamo] run guard fail hooks for each cache entry for which there is a cache miss (#110325)
Attempt number 2 at https://github.com/pytorch/pytorch/issues/108950.

Improves debugging for guard failures/recompilations by:
- only running guard fail reason generation during recompilation, instead of when a guard fails during dynamo cache lookup (so generating guard failure reasons is not on the critical path)
- ~~always reporting all guard failures~~ Reports the first-failing guard failure for each cache entry.

We don't expect a performance hit since the guard fail reasons are only generated at recompile time rather than runtime. Perf benchmark to check this (https://hud.pytorch.org/benchmark/torchbench/inductor_with_cudagraphs?startTime=Fri,%2027%20Oct%202023%2017:42:43%20GMT&stopTime=Fri,%2003%20Nov%202023%2017:42:43%20GMT&granularity=hour&mode=training&dtype=amp&lBranch=gh/williamwen42/62/head&lCommit=f4724f5ffc6d17ceae513a42fc18627be7b85482&rBranch=main&rCommit=29f3d392bf230072e3bffae37b078e770cae1956). We may also need to verify this on benchmarks where guard failures are common.

Sample script:
```python
import torch
def generate_data(b):
    return (
        torch.randn(b, 3, 32, 32).to(torch.float32).cuda(),
        torch.randint(1000, (b,)).cuda(),
    )

from torchvision.models import resnet18
def init_model():
    return resnet18().to(torch.float32).cuda()

model = init_model()
model_opt = torch.compile(model, dynamic=False)

for b in range(16, 32):
    data = generate_data(b)
    model_opt(data[0])
```

Sample logs:
```bash
(/data/users/williamwen/py310-env) [williamwen@devgpu020.odn1 /data/users/williamwen/pytorch (wwen/log-all-guards)]$ python playground5.py
/data/users/williamwen/pytorch/torch/_inductor/compile_fx.py:141: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
  warnings.warn(
[2023-11-06 14:50:47,605] torch._dynamo.convert_frame: [WARNING] torch._dynamo hit config.cache_size_limit (8)
[2023-11-06 14:50:47,605] torch._dynamo.convert_frame: [WARNING]    function: 'forward' (/data/users/williamwen/torchvision/torchvision/models/resnet.py:284)
[2023-11-06 14:50:47,605] torch._dynamo.convert_frame: [WARNING]    last reason: tensor 'L['x']' size mismatch at index 0. expected 16, actual 24
[2023-11-06 14:50:47,605] torch._dynamo.convert_frame: [WARNING] To log all recompilation reasons, use TORCH_LOGS="recompiles".
[2023-11-06 14:50:47,605] torch._dynamo.convert_frame: [WARNING] To diagnose recompilation issues, see https://pytorch.org/docs/master/compile/troubleshooting.html.
(/data/users/williamwen/py310-env) [williamwen@devgpu020.odn1 /data/users/williamwen/pytorch (wwen/log-all-guards)]$ TORCH_LOGS="recompiles" python playground5.py
/data/users/williamwen/pytorch/torch/_inductor/compile_fx.py:141: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
  warnings.warn(
[2023-11-06 14:53:31,591] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284
[2023-11-06 14:53:31,591] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:53:31,591] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 16, actual 17
[2023-11-06 14:53:41,333] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284
[2023-11-06 14:53:41,333] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:53:41,333] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 17, actual 18
[2023-11-06 14:53:41,333] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 16, actual 18
[2023-11-06 14:53:50,463] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284
[2023-11-06 14:53:50,463] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:53:50,463] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 18, actual 19
[2023-11-06 14:53:50,463] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 17, actual 19
[2023-11-06 14:53:50,463] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 16, actual 19
[2023-11-06 14:53:59,848] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284
[2023-11-06 14:53:59,848] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:53:59,848] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 19, actual 20
[2023-11-06 14:53:59,848] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 18, actual 20
[2023-11-06 14:53:59,848] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 17, actual 20
[2023-11-06 14:53:59,848] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 16, actual 20
[2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284
[2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 20, actual 21
[2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 19, actual 21
[2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 18, actual 21
[2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 17, actual 21
[2023-11-06 14:54:08,549] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 16, actual 21
[2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284
[2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 21, actual 22
[2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 20, actual 22
[2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 19, actual 22
[2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 18, actual 22
[2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 17, actual 22
[2023-11-06 14:54:17,795] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 16, actual 22
[2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284
[2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 22, actual 23
[2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 21, actual 23
[2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 20, actual 23
[2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 19, actual 23
[2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 18, actual 23
[2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 17, actual 23
[2023-11-06 14:54:27,430] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 16, actual 23
[2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function forward in /data/users/williamwen/torchvision/torchvision/models/resnet.py:284
[2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 23, actual 24
[2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 22, actual 24
[2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 21, actual 24
[2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 20, actual 24
[2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 19, actual 24
[2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 18, actual 24
[2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 17, actual 24
[2023-11-06 14:54:36,744] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 16, actual 24
[2023-11-06 14:54:36,744] torch._dynamo.convert_frame: [WARNING] torch._dynamo hit config.cache_size_limit (8)
[2023-11-06 14:54:36,744] torch._dynamo.convert_frame: [WARNING]    function: 'forward' (/data/users/williamwen/torchvision/torchvision/models/resnet.py:284)
[2023-11-06 14:54:36,744] torch._dynamo.convert_frame: [WARNING]    last reason: tensor 'L['x']' size mismatch at index 0. expected 16, actual 24
[2023-11-06 14:54:36,744] torch._dynamo.convert_frame: [WARNING] To log all recompilation reasons, use TORCH_LOGS="recompiles".
[2023-11-06 14:54:36,744] torch._dynamo.convert_frame: [WARNING] To diagnose recompilation issues, see https://pytorch.org/docs/master/compile/troubleshooting.html.
[2023-11-06 14:54:45,922] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266
[2023-11-06 14:54:45,922] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:54:45,922] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 24, actual 25
[2023-11-06 14:54:54,691] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266
[2023-11-06 14:54:54,691] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:54:54,691] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 25, actual 26
[2023-11-06 14:54:54,691] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 24, actual 26
[2023-11-06 14:55:03,591] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266
[2023-11-06 14:55:03,591] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:55:03,591] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 26, actual 27
[2023-11-06 14:55:03,591] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 25, actual 27
[2023-11-06 14:55:03,591] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 24, actual 27
[2023-11-06 14:55:12,384] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266
[2023-11-06 14:55:12,384] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:55:12,384] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 27, actual 28
[2023-11-06 14:55:12,384] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 26, actual 28
[2023-11-06 14:55:12,384] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 25, actual 28
[2023-11-06 14:55:12,384] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 24, actual 28
[2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266
[2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 28, actual 29
[2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 27, actual 29
[2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 26, actual 29
[2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 25, actual 29
[2023-11-06 14:55:21,442] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 24, actual 29
[2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266
[2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 29, actual 30
[2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 28, actual 30
[2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 27, actual 30
[2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 26, actual 30
[2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 25, actual 30
[2023-11-06 14:55:30,315] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 24, actual 30
[2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG] Recompiling function _forward_impl in /data/users/williamwen/torchvision/torchvision/models/resnet.py:266
[2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG]     triggered by the following guard failure(s):
[2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 30, actual 31
[2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 29, actual 31
[2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 28, actual 31
[2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 27, actual 31
[2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 26, actual 31
[2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 25, actual 31
[2023-11-06 14:55:39,839] torch._dynamo.guards.__recompiles: [DEBUG]     - tensor 'L['x']' size mismatch at index 0. expected 24, actual 31
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110325
Approved by: https://github.com/ezyang, https://github.com/jon-chuang
2023-11-07 20:10:59 +00:00
c0aba9be41 [quant][pt2] Fix custom dtype per channel weight in QAT (#112612)
Summary: Previously we only copied over q/dq args for the per
tensor case. This was because the qparams for `quantize_per_tensor`
are literals while the qparams for `quantize_per_channel` are
`get_attr` nodes (tensors), which disappear from the original
nodes in the graph after subgraph rewriting.

However, this is problematic because, in the per channel case,
not all q/dq args are tensors. In particular, the args after
the qparams (axis, qmin, qmax, dtype) are all literals. For
these literal args we simply used the hardcoded ones
(0, -127, 127, torch.int8 respectively), even if the user
explicitly specified to use a different weight dtype. This
commit fixes this by copying over these literal args for the
per channel case as well.
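
As a rough sketch of the call shape involved (the op spelling and the registration import are assumptions for illustration; the literal-vs-tensor split follows the description above):
```python
import torch
import torch.ao.quantization.fx._decomposed  # assumption: registers the quantized_decomposed ops

w = torch.randn(8, 4)
scales = torch.rand(8)
zero_points = torch.zeros(8, dtype=torch.int64)

# scales/zero_points are tensors (get_attr nodes in the graph); everything
# after them is a literal arg that used to be hardcoded by the QAT pass.
q = torch.ops.quantized_decomposed.quantize_per_channel(
    w, scales, zero_points,
    0,           # axis -- literal
    -127, 127,   # qmin / qmax -- literals, previously always (-127, 127)
    torch.int8,  # dtype -- literal, previously always torch.int8
)
```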

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT.test_qat_per_channel_weight_custom_dtype

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112612
Approved by: https://github.com/jerryzh168
2023-11-07 20:10:53 +00:00
538ec4942a Do not generate zero-numel NT by default in helper and improve to_padded_tensor msg (#113162)
Improves the to_padded_tensor error message when passed an NT with zero numel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113162
Approved by: https://github.com/jbschlosser
ghstack dependencies: #113031, #112519, #113091
2023-11-07 19:56:26 +00:00
0c991acab0 Factor out test_nestedtensor setUp tearDown and call super (#113091)
Fixes https://github.com/pytorch/pytorch/issues/112845

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113091
Approved by: https://github.com/jbschlosser
ghstack dependencies: #113031, #112519
2023-11-07 19:56:26 +00:00
5fe96eaaf4 [dynamo] Remove VariableTracker.propagate (#111726)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111726
Approved by: https://github.com/voznesenskym
ghstack dependencies: #111306, #111415, #111725
2023-11-07 19:55:19 +00:00
843a8ecd24 [dynamo] Remove VariableTracker.add_options (#111725)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111725
Approved by: https://github.com/voznesenskym
ghstack dependencies: #111306, #111415
2023-11-07 19:55:19 +00:00
9664190952 [dynamo] Eagerly install guards (#111415)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111415
Approved by: https://github.com/voznesenskym
ghstack dependencies: #111306
2023-11-07 19:55:19 +00:00
2964682490 [dynamo] Add LazyVariableTracker (#111306)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111306
Approved by: https://github.com/voznesenskym
2023-11-07 19:55:19 +00:00
2322d989e8 Apply release only changes to core (#109208)
Utility script to run after the branch cut has been completed.
Execute: ``RELEASE_VERSION=2.1 apply-release-changes.sh``
Similar to: https://github.com/pytorch/audio/pull/3590

Test PR: https://github.com/pytorch/pytorch/pull/109210

Automate generation of PRs:
https://github.com/pytorch/pytorch/pull/108053
https://github.com/pytorch/pytorch/pull/108688
https://github.com/pytorch/pytorch/pull/108064

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109208
Approved by: https://github.com/seemethere
2023-11-07 19:47:30 +00:00
0c448526a4 [experiment][TD] Rating number system (#112676)
Emits an excessive amount of heuristic info, but that just means I can do more with it later?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112676
Approved by: https://github.com/ZainRizvi
2023-11-07 19:40:11 +00:00
82875e69fe [inductor][fx pass] Fix a bug for the merge_stack_tahn_unbind pattern (#113101)
Summary:
Context:
https://fb.workplace.com/groups/1075192433118967/permalink/1328366351134906/

Test Plan:
local reproduce igctr:
```
buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode split_batch-group
```
P874994427

Differential Revision: D51052304

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113101
Approved by: https://github.com/jackiexu1992
2023-11-07 19:25:50 +00:00
785e586eb0 [CUDA][cuBLAS] Separate reduced precision reductions on/off for addmm tests (#112545)
CC @malfet @ptrblck
~~We've been seeing a lot of noise from Ampere and later devices due to reduced precision reductions, so preemptively disabling them for addmm tests.~~

Breaking out addmm tests into one with and without reduced precision reductions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112545
Approved by: https://github.com/malfet
2023-11-07 19:09:29 +00:00
bc3e2e03cd Revert "Update impl_abstract_pystub to be less boilerplatey (#112851)"
This reverts commit 6ae4e3a8d249a96d9a8bbfba389d0509783e11e1.

Reverted https://github.com/pytorch/pytorch/pull/112851 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/112851#issuecomment-1799539354))
2023-11-07 18:53:13 +00:00
2fc940e0c4 [DTensorTestbase] Add run_subtests to DTensorTestbase and fix test_ddp checkpoint test error (#113051)
This PR:

1. Adds  `run_subtests` to `DTensorTestbase`, which runs test functions given by `test_fn` as subtests. This amortizes the costly dist setup.
2. Update `test/distributed/checkpoint/test_state_dict.py` to use `DTensorTestbase`. This fixes the "Duplicate GPU detected: rank 0 and rank 1 both on CUDA device 11000" error when running on 1 GPU, since `skip_if_lt_x_gpu` currently happens after dist setup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113051
Approved by: https://github.com/fegin
2023-11-07 18:48:01 +00:00
7c4e49ec80 [Fix] add validation logics to TCPStore queries (#107607)
This PR fixes #106294.

Due to the lack of a request validation mechanism, TCPStore in torch mistakenly treats nmap scan messages as valid query messages, which leads to DDP OOM. The simple fix enforces that the very first query from a client is a validation query carrying a predefined magic number; if validation fails, the server terminates the connection.
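
A minimal sketch of the handshake idea with generic sockets (the magic number, framing, and names here are invented for illustration and are not TCPStore's actual wire format):
```python
import socket
import struct

MAGIC = 0x3C85F7CE  # hypothetical constant, not the real TCPStore value

def serve_once(server: socket.socket) -> None:
    conn, _ = server.accept()
    header = conn.recv(4)
    # The very first message from a legitimate client must be the magic
    # number; anything else (e.g. an nmap probe) gets the connection dropped.
    if len(header) < 4 or struct.unpack("!I", header)[0] != MAGIC:
        conn.close()
        return
    conn.sendall(b"ok")  # validated: normal query handling would follow
    conn.close()
```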
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107607
Approved by: https://github.com/cbalioglu, https://github.com/XilunWu
2023-11-07 18:36:25 +00:00
56e514aefb [dtensor][BE][1/N] fix DTensor Ops test (#113104)
**Summary**:
The dtensor_ops test has a helper function `assert_ref_dtensor_equal` that was written expecting a `DTensor` argument `dtensor_rs` but actually receives a `torch.Tensor` in the test. This PR removes the `to_local()` call on that object since it is actually a `torch.Tensor`.

This PR is a part of internal task [T169242924](https://www.internalfb.com/intern/tasks/?t=169242924) for better engineering.

**Test**:
`pytest test/distributed/_tensor/test_dtensor_ops.py`
`pytest test/distributed/_tensor/test_dtensor_ops.py -s -k baddbmm`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113104
Approved by: https://github.com/wanchaol
2023-11-07 18:23:14 +00:00
92e7f79609 Doc: Add and Fix docstrings for torch.util.data files (#112817)
Fixes #112635

Fix docstrings for `torch.utils.data` files.

```
> pydocstyle torch/utils/data/graph.py --count
Before: 5
After: 1

> pydocstyle torch/utils/data/graph_settings.py --count
Before: 8
After: 3

> pydocstyle torch/utils/data/dataloader.py --count
Before: 12
After: 6

> pydocstyle torch/utils/data/dataset.py --count
Before: 28
After: 23

> pydocstyle torch/utils/data/sampler.py --count
Before: 24
After: 19

> pydocstyle torch/utils/data/_utils/signal_handling.py --count
Before: 1
After: 0

> pydocstyle torch/utils/data/_utils/__init__.py --count
Before: 2
After: 0

> pydocstyle torch/utils/data/_utils/collate.py --count
Before: 20
After: 6

> pydocstyle torch/utils/data/_utils/fetch.py --count
Before: 3
After: 0

> pydocstyle torch/utils/data/_utils/pin_memory.py --count
Before: 4
After: 1

> pydocstyle torch/utils/data/datapipes/_decorator.py --count
Before: 19
After: 16

> pydocstyle torch/utils/data/datapipes/_hook_iterator.py --count
Before: 13
After: 0

> pydocstyle torch/utils/data/datapipes/_typing.py --count
Before: 17
After: 4

> pydocstyle torch/utils/data/datapipes/gen_pyi.py --count
Before: 19
After: 4
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112817
Approved by: https://github.com/kit1980
2023-11-07 17:59:56 +00:00
740137df6f [MPS] Add bucketize op (#112830)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112830
Approved by: https://github.com/kulinseth, https://github.com/malfet
ghstack dependencies: #112829
2023-11-07 17:22:08 +00:00
c4bb77323d [MPS] Add searchsorted op (#112829)
The metal kernels implemented are closely following `Bucketization.cu`.

Benchmark:
```
[----------------------------- searchsorted ----------------------------]
                                                         |  cpu   |  mps
1 threads: --------------------------------------------------------------
      Batch size: 8; In features: 64; Sorter: True       |    44  |   530
      Batch size: 8; In features: 64; Sorter: False      |    31  |    12
      Batch size: 8; In features: 256; Sorter: True      |   131  |   520
      Batch size: 8; In features: 256; Sorter: False     |   107  |    12
      Batch size: 8; In features: 1024; Sorter: True     |   499  |   590
      Batch size: 8; In features: 1024; Sorter: False    |   398  |    12
      Batch size: 16; In features: 64; Sorter: True      |    71  |   540
      Batch size: 16; In features: 64; Sorter: False     |    57  |    12
      Batch size: 16; In features: 256; Sorter: True     |   242  |   610
      Batch size: 16; In features: 256; Sorter: False    |   200  |    12
      Batch size: 16; In features: 1024; Sorter: True    |   999  |   720
      Batch size: 16; In features: 1024; Sorter: False   |   842  |    12
      Batch size: 32; In features: 64; Sorter: True      |   124  |   509
      Batch size: 32; In features: 64; Sorter: False     |   103  |    12
      Batch size: 32; In features: 256; Sorter: True     |   477  |   650
      Batch size: 32; In features: 256; Sorter: False    |   407  |    12
      Batch size: 32; In features: 1024; Sorter: True    |  1940  |   833
      Batch size: 32; In features: 1024; Sorter: False   |  1710  |    12
      Batch size: 64; In features: 64; Sorter: True      |   231  |   590
      Batch size: 64; In features: 64; Sorter: False     |   194  |    12
      Batch size: 64; In features: 256; Sorter: True     |   937  |   710
      Batch size: 64; In features: 256; Sorter: False    |   800  |    13
      Batch size: 64; In features: 1024; Sorter: True    |  3980  |  1290
      Batch size: 64; In features: 1024; Sorter: False   |  3330  |    12
      Batch size: 128; In features: 64; Sorter: True     |   448  |   650
      Batch size: 128; In features: 64; Sorter: False    |   390  |    13
      Batch size: 128; In features: 256; Sorter: True    |  1830  |   850
      Batch size: 128; In features: 256; Sorter: False   |  1590  |    12
      Batch size: 128; In features: 1024; Sorter: True   |  7790  |  2850
      Batch size: 128; In features: 1024; Sorter: False  |  6670  |    13
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112829
Approved by: https://github.com/malfet
2023-11-07 17:22:08 +00:00
70eeb82f00 s390x: skip tests relying on specific openblas precision (#112843)
This change skips test_forward_mode_AD_linalg_det_singular_cpu_complex128 and test_forward_mode_AD_linalg_det_singular_cpu_float64 from test/test_ops_fwd_gradients.py
due to https://github.com/OpenMathLib/OpenBLAS/issues/4194

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112843
Approved by: https://github.com/kit1980
2023-11-07 17:18:20 +00:00
611a7457ca [Inductor] Kill MutationLayout from ir.py (#112925)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112925
Approved by: https://github.com/jansel
2023-11-07 17:03:52 +00:00
562c4ae4bc Update Pillow pin to 10.0.1 (#113111)
To err on the side of caution and fix dependabot warnings

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113111
Approved by: https://github.com/clee2000
2023-11-07 16:41:06 +00:00
4fecbebc37 Fix OOM in test_large_block_sizes (#113153)
This test is causing flakiness on CPU, see #113134

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113153
Approved by: https://github.com/lezcano
2023-11-07 16:12:19 +00:00
6ce5de5275 Avoid calling as_tensor twice (#112866)
Sometimes doing so may copy and that's not good. We avoid that by
setting global flags.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112866
Approved by: https://github.com/kit1980, https://github.com/ev-br
2023-11-07 16:10:59 +00:00
6ae4e3a8d2 Update impl_abstract_pystub to be less boilerplatey (#112851)
Summary:
We've made the following changes:
- The new way to use the API is `m.impl_abstract_pystub(module, context)`.
  Every subsequent m.def of an op inside the TORCH_LIBRARY block gives
  the op the `impl_abstract_pystub`.
- Added a mechanism to determine if an operator was defined in Python or C++.
  Library.define in Python appends the op to a global set, which is analogous
  to what we do for tracking Library.impl.
- If someone does `torch.library.impl_abstract` in Python for an operator, then
  we require that it has an `impl_abstract_pystub` specified and we also check
  that the module in the `impl_abstract_pystub` is the same as the module where
  the call to `torch.library.impl_abstract` exists.
- Unfortunately we can't check the "context" (which is the buck target on
  buck-based systems) because buck sits above us.
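
A sketch of the Python side under these rules (the op name and module are invented for illustration):
```python
# mylib/abstract_impls.py -- must be the module named in the C++ call
# m.impl_abstract_pystub("mylib.abstract_impls", ...), or the check fails.
import torch

@torch.library.impl_abstract("mylib::my_op")
def my_op_abstract(x):
    # Abstract/meta implementation: compute output metadata only.
    return torch.empty_like(x)
```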

Test Plan: - existing tests

Differential Revision: D50972148

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112851
Approved by: https://github.com/ezyang
2023-11-07 16:07:42 +00:00
9a28a7b498 Revert "Add support for torch.Generator type in TorchScript (#110413)"
This reverts commit 27e31ab6e86259b27d816d6fb6e7a69de526a0e4.

Reverted https://github.com/pytorch/pytorch/pull/110413 on behalf of https://github.com/PaliC due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/110413#issuecomment-1799003164))
2023-11-07 15:53:32 +00:00
bb7ac12cbf [ProcessGroupNCCL] Avoid recording stream for broadcast and scatter (#112896)
Summary: Follows PR #111431, save memory for DTensor init

Test Plan: Sandcastle

Reviewed By: wanchaol

Differential Revision: D50985365

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112896
Approved by: https://github.com/wanchaol
2023-11-07 15:44:04 +00:00
98564d2d7a If you have i0 = i1 * 12, perform this replacement directly (#112653)
In https://github.com/pytorch/pytorch/pull/112156 I added support for creating replacements on unbacked SymInts, so if you asserted that `i0 == s0`, we would replace i0 with s0 (only ever replacing unbacked with backed.)

However, if we have assertions involving only unbacked SymInts, we can also replace in this case! E.g., `i0 == i1` or `i0 == i1 * 12`. The previous logic for generating replacements would reject these cases, because you're not allowed to replace unbacked with unbacked. Modifying the logic is not so easy though; ordinarily, we decide what substitution to prioritize by trying to replace the largest hinted symbol, but for unbacked integers we don't have this. To get around this problem, for now I only setup replacements for trivial symbol equals something else situations. Check the diff with whitespace ignored, the addition is quite small.
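
A rough sketch of the situation this enables (simplified; the exact flags and checks needed may vary):
```python
import torch
import torch._dynamo

torch._dynamo.config.capture_scalar_outputs = True  # capture .item() as unbacked SymInts

@torch.compile(fullgraph=True)
def f(a, b):
    i0 = a.item()  # unbacked SymInt
    i1 = b.item()  # unbacked SymInt
    torch._check_is_size(i0)
    # Asserting the relation lets the ShapeEnv substitute i0 -> i1 * 12,
    # so later sizes mentioning i0 unify with those mentioning i1.
    torch._check(i0 == i1 * 12)
    return torch.zeros(i0) + torch.zeros(i1 * 12)

print(f(torch.tensor(24), torch.tensor(2)).shape)
```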

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112653
Approved by: https://github.com/aakhundov
2023-11-07 14:31:54 +00:00
493b52b3d9 Grandfather in built-in TorchScript ops to being pt2_compliant (#113061)
I'm seeing ops like torch.ops.aten.mul.complex being used with
torch.compile (though this seems strange to me), but we should
grandfather these in.

Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113061
Approved by: https://github.com/ezyang
ghstack dependencies: #113049, #113050
2023-11-07 12:55:16 +00:00
85832c0b9b Grandfather in some more pytorch ops to be pt2_compliant (#113050)
We're not directly testing these, but in general the policy is to assume
that PyTorch ops inside the pytorch repo are compliant.

Test Plan:
- existing tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113050
Approved by: https://github.com/ezyang
ghstack dependencies: #113049
2023-11-07 12:55:16 +00:00
a06832f911 Grandfather in c10d_functional ops to pt2_compliant (#113049)
This PR also adds the ability to specify Tags for more `m.def(`
overloads.

Test Plan:
- new test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113049
Approved by: https://github.com/williamwen42
2023-11-07 12:55:05 +00:00
c6f435befd Don't recompute numel and contiguous in detach (#112689)
When symbolic shapes are involved, `refresh_numel` and `refresh_contiguous` are
fairly expensive since they dispatch to python for SymInt handling. However, in
the case of detach we can just copy the existing values instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112689
Approved by: https://github.com/lezcano, https://github.com/ezyang
2023-11-07 10:55:48 +00:00
52e2b87d00 [Kineto][NCCL][5/n] Populate in/out split size info for all_to_all from CPU to CUDA kernel (#112308)
Summary: This diff populates all_to_all input and out split size from CPU op to GPU kernel when valid.

Test Plan:
**Trace example**:
- For non all_to_all collective functions: https://fburl.com/perfdoctor/4nobsu15
https://pxl.cl/3GNVb

- For all_to_all: https://fburl.com/perfdoctor/f418goys

https://pxl.cl/3H2nd

Differential Revision: D50762093

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112308
Approved by: https://github.com/aaronenyeshi
2023-11-07 09:50:37 +00:00
220c3bae6d Pass in parallel strategy to tp_transform API (#112286)
Summary: Support passing in a parallel strategy map and apply the TP transform based on it. This makes it easier to manually select layers from a real model to parallelize and benchmark.

Test Plan: buck test mode/opt  -c fbcode.enable_gpu_sections=true //caffe2/test/distributed/_tensor/experimental:tp_transform

Differential Revision: D50591039

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112286
Approved by: https://github.com/wanchaol
2023-11-07 09:28:58 +00:00
f6fb9fd681 use smaller batch size for timm_efficientdet in inference (#113095)
Previously had OOMs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113095
Approved by: https://github.com/xmfan
ghstack dependencies: #112650
2023-11-07 07:08:16 +00:00
65304d8fd0 s390x: fix inductor constructing floats out of bytes (#112723)
This change fixes test_embedding_bag_byte_unpack_cpu from test/inductor/test_torchinductor.py on s390x.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112723
Approved by: https://github.com/jansel
2023-11-07 06:51:46 +00:00
ff51f94e32 [Reland] Fix default timeouts for python entrypoints (e.g. init_process_group) (#113094)
Previous PRs changed the c++ default timeout for PGNccl, but this path
was only hit in some cases, and the python defaults took over in other
cases.

This PR ensures that NCCL pg always default to the changed NCCL-specific
timeout value.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113094
Approved by: https://github.com/fduwjj
2023-11-07 05:34:26 +00:00
68c4507bc2 [Inductor] Allow None values to be passed in as arguments to triton kernels (#113056)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113056
Approved by: https://github.com/jansel
ghstack dependencies: #112752, #113008, #112801
2023-11-07 05:29:42 +00:00
bfa717c6a6 [Inductor] Improve reinplace_scatters pass (#112801)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112801
Approved by: https://github.com/Chillee, https://github.com/jansel
ghstack dependencies: #112752, #113008
2023-11-07 05:29:42 +00:00
f6008be266 Move all triton related testing utils into shared file (#113008)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113008
Approved by: https://github.com/zou3519, https://github.com/jansel
ghstack dependencies: #112752
2023-11-07 05:29:29 +00:00
dbf44dffc9 [Inductor] Cache generated user defined triton kernels on tensor dtype and non tensor parameters (#112752)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112752
Approved by: https://github.com/jansel
2023-11-07 05:29:16 +00:00
f99b5f1f23 [Inductor][fx pass] Remove split nodes with split section size one (#112922)
Summary: We observe that DSNN has many split nodes with split section size one, which hinder the split-cat merge in a later pass, so we remove such nodes at an early stage.

Test Plan:
# local reproduce with DSNN model
```
buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode split_batch-group -c
```
P872705076
diffing: https://www.internalfb.com/intern/diffing/?paste_number=872698775

# unit test

```
buck2 test mode/dev-nosan //caffe2/test/inductor:split_cat_fx_passes
```
Buck UI: https://www.internalfb.com/buck2/b248410e-a556-47a2-9293-7f113b49f0d6
Test UI: https://www.internalfb.com/intern/testinfra/testrun/10696049124469023
Network: Up: 80KiB  Down: 47KiB  (reSessionID-a31dec17-d322-4757-ba84-4d262bd139cf)
Jobs completed: 24. Time elapsed: 1:52.8s.
Cache hits: 0%. Commands: 2 (cached: 0, remote: 0, local: 2)
Tests finished: Pass 9. Fail 0. Fatal 0. Skip 0. Build failure 0

Differential Revision: D50990290

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112922
Approved by: https://github.com/jackiexu1992
2023-11-07 04:53:32 +00:00
7bd066ab48 Package pybind11/eigen/ (#113055)
Which was added in the pybind11 2.11 release, see https://github.com/pybind/pybind11/tree/v2.11.0/include/pybind11/eigen

Fixes https://github.com/pytorch/pytorch/issues/112841

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113055
Approved by: https://github.com/Skylion007, https://github.com/seemethere
2023-11-07 04:27:43 +00:00
10a829b85d Retarget sym_size/sym_stride lowerings to their .int overloads (#113054)
Fixes https://github.com/pytorch/pytorch/issues/112913

The new logging looks like this:

```
[2023-11-06 12:48:57,732] [0/0] torch._inductor.graph: [DEBUG] lowering %arg0_1 : [num_users=0] = placeholder[target=arg0_1]
[2023-11-06 12:48:57,732] [0/0] torch._inductor.graph: [DEBUG] lowering %arg1_1 : [num_users=2] = placeholder[target=arg1_1]
[2023-11-06 12:48:57,733] [0/0] torch._inductor.graph: [DEBUG] lowering %add : [num_users=1] = call_function[target=torch.ops.aten.add.Tensor](args = (%arg1_1, 1), kwargs = {})
[2023-11-06 12:48:57,733] [0/0] torch._inductor.graph: [DEBUG]   via <function make_pointwise.<locals>.inner at 0x7f0abed28ee0>
[2023-11-06 12:48:57,735] [0/0] torch._inductor.graph: [DEBUG] lowering %sym_stride_int : [num_users=1] = call_function[target=torch.ops.aten.sym_stride.int](args = (%add, 0), kwargs = {}) sym_stride
[2023-11-06 12:48:57,735] [0/0] torch._inductor.graph: [DEBUG] lowering %mul : [num_users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%arg1_1, %sym_stride_int), kwargs = {})
[2023-11-06 12:48:57,735] [0/0] torch._inductor.graph: [DEBUG]   via <function mul at 0x7f0abec8bd00>
[2023-11-06 12:48:57,744] [0/0] torch._inductor.graph: [DEBUG] lowering return (mul,)
```

Notice that `sym_stride` no longer is hitting the lowering. This is what the behavior was before I broke it. A better refactor coming soon.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113054
Approved by: https://github.com/davidberard98
2023-11-07 04:15:38 +00:00
c847fd2ac8 Fix torch.compiler.cudagraph_mark_step_begin example (#112807)
Per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112807
Approved by: https://github.com/eellison
2023-11-07 04:15:31 +00:00
74c24d2367 Fixes a bug in inductor.triton.load (#113047)
Letting CI/CD tell me if there is anything wrong with this

Original bug:
``` Shell
        r1 = rindex
        tmp37 = tl.load(out_ptr2 + (r1 + (8192*x0)), rmask, eviction_policy='evict_first', other=0)
                                                     ^
AssertionError('cannot cast int32[constexpr[1],constexpr[2048]] to <[1, 2048], fp8e4nv>')
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113047
Approved by: https://github.com/Skylion007, https://github.com/ipiszy
2023-11-07 04:06:54 +00:00
ddfe572534 [dynamo] Graph break on setattr(Tensor, "data", Tensor) (#113043)
Fixes https://github.com/pytorch/pytorch/issues/113030

Alias information needs to be applied in eager before we can continue to trace the graph.

----

Perhaps this is too strict - couldn't we fx trace through the in-graph (pointer) aliasing, and track mutations through fake tensors instead, and still apply the aliasing mutation epilogue for further mutations outside of graph? 🤔

Regardless, it didn't seem to work too well when I tried this. Seems that `Tensor.__setattr__` doesn't work well in fx graph.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113043
Approved by: https://github.com/ezyang, https://github.com/voznesenskym
2023-11-07 03:56:21 +00:00
5c1ea30ca3 bump torchbench commit (#112650)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112650
Approved by: https://github.com/msaroufim, https://github.com/xuzhao9
2023-11-07 03:56:16 +00:00
5cfe973bed [PyTorch FX] ProxyableClassMeta skip map_aggregate if not is_fx_tracing (#112934)
Summary: TorchRec KJT (https://fburl.com/code/yoaqqsgi) and LazyAwaitable (https://fburl.com/code/4bygm7tg) inherit ProxyableClassMeta in order to make torchrec models fx traceable. The issue is that even when we are not fx tracing, it still triggers this `map_aggregate(args, check_proxy)` https://fburl.com/code/mpbmjsqw, which iterates over every input to the KJT and flattens the list/dict to run a function on every element. This is super slow if len(list) is large. This diff skips the map_aggregate when it's not fx tracing.
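
A minimal sketch of the guard (assuming `is_fx_tracing` from `torch.fx._symbolic_trace`; the surrounding names are invented):
```python
from torch.fx._symbolic_trace import is_fx_tracing
from torch.fx.node import map_aggregate

def maybe_check_proxies(args):
    # Only pay for flattening every element when actually tracing; a plain
    # eager call on a huge list is now a cheap early return.
    if not is_fx_tracing():
        return
    map_aggregate(args, lambda a: a)  # placeholder for the real check_proxy
```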

Test Plan:
#facebook
# before: [trace](https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devgpu021.odn1.facebook.com/rank-0.Nov_03_16_56_11.243575.pt.trace.json.gz&bucket=aps_traces)
move_id_list features takes ~80ms when profiling with stack, most of the time is `map_aggregate`

{F1140039564}

# after: [trace](https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/traces/dynocli/devgpu021.odn1.facebook.com/rank-0.Nov_03_16_27_50.3617247.pt.trace.json.gz&bucket=aps_traces)

now it's less than 3ms, no `map_aggregate`
 {F1140038095}

Differential Revision: D50994285

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112934
Approved by: https://github.com/angelayi
2023-11-07 03:16:30 +00:00
4c04ae2451 [ROCm] fix test_softmax_forward_64bit_indexing_cuda OOM (#113093)
TestNNDeviceTypeCUDA.test_softmax_forward_64bit_indexing_cuda started failing for ROCm after #112096 with the message

torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 13.35 GiB. GPU 0 has a total capacity of 31.98 GiB of which 3.89 GiB is free. Of the allocated memory 26.69 GiB is allocated by PyTorch, and 18.91 MiB is reserved by PyTorch but unallocated.

This amounts to approximately 41GB. The test is currently decorated with `largeTensorTest("30GB", "cuda")` but this is not sufficient for ROCm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113093
Approved by: https://github.com/malfet
2023-11-07 03:00:37 +00:00
8768b87bd1 Remove torch distributed from CODEOWNERS (#112813)
After adding support for labeler, we don't need CODEOWNERS.

This change will cause the distributed team members previously listed in
CODEOWNERS to stop being auto-added as reviewers on PRs touching these
files.  The preceding PR adds labeler support for these same sets of
files, and contains instructions for adding yourself to be cc'd for that
label.

It is preferable to be auto-cc'd rather than auto-tagged as a reviewer, so
that there is more signal in the reviewers list (either someone opted in,
which shows the PR author that someone is likely looking at it, or the PR
author added someone specifically, which is a stronger notification to
the tagged reviewer than the blanket CODEOWNERS behavior).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112813
Approved by: https://github.com/wanchaol, https://github.com/fduwjj
2023-11-07 02:43:04 +00:00
1fea599d9a Revert "Grandfather in c10d_functional ops to pt2_compliant (#113049)"
This reverts commit fe8570a1fe5c6678a4be8deff561dbc48693410e.

Reverted https://github.com/pytorch/pytorch/pull/113049 on behalf of https://github.com/clee2000 due to something in the stack broke distributed and inductor, pretty sure its this one ([comment](https://github.com/pytorch/pytorch/pull/113049#issuecomment-1797298969))
2023-11-07 02:34:13 +00:00
19dbd8aca3 Revert "Grandfather in some more pytorch ops to be pt2_compliant (#113050)"
This reverts commit efae8449a83df2bcd2e5f3c0f531051b6860cf0c.

Reverted https://github.com/pytorch/pytorch/pull/113050 on behalf of https://github.com/clee2000 due to something in the stack broke distributed and inductor, pretty sure its the c10 one ([comment](https://github.com/pytorch/pytorch/pull/113050#issuecomment-1797279756))
2023-11-07 02:30:42 +00:00
d94d72b397 Revert "Grandfather in built-in TorchScript ops to being pt2_compliant (#113061)"
This reverts commit 1d4d5e4319a5ddacdb4e0d1ac944bbb63921fdb1.

Reverted https://github.com/pytorch/pytorch/pull/113061 on behalf of https://github.com/clee2000 due to something in the stack broke distributed and inductor, pretty sure its the c10 one.  Not sure why so many things were flaky on this PR ([comment](https://github.com/pytorch/pytorch/pull/113061#issuecomment-1797251293))
2023-11-07 02:28:14 +00:00
ad844e7919 [inductor] fix out of shared memory issue (#112916)
Fix https://github.com/pytorch/pytorch/issues/112454 .

The current fix is quite simple. The kernel has multiple triton configs. Previously, if any triton config failed to compile, we skipped everything else and failed. Now we just skip the bad configs and pick the best one from the remaining configs.

There are other ways to fix the issues more fundamentally but requires much more work:
1. Horace mentioned an idea to make sure the largest one of size_hints is the rightmost dimension. This way, that largest dimension will be mapped to XBLOCK and we won't scale it up too much, since the threshold for the max grid size for the x dimension is quite large (2**31 - 1). But this may require us to change loop ordering heuristics, which may have other perf impact.
2. The issue happens because the kernel requires 2D tiling, which uses shared memory. We can stop scaling up block size if `XBLOCK * YBLOCK * element_size >= max_shared_memory`. max_shared_memory is around 160K for A100. The tricky part is that we don't know the dtype in the `triton_config` method to decide the `element_size`. From metadata, we can find the dtype for each tensor, but if the kernel uses tensors of mixed types, we won't know what dtype is actually used for the data loaded into the shared memory.
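
A minimal sketch of the chosen fix, with all helper names invented for illustration:
```python
def pick_best_config(configs, compile_fn, bench_fn):
    """Try each candidate config; skip the ones that fail to compile."""
    timings = {}
    for cfg in configs:
        try:
            kernel = compile_fn(cfg)
        except Exception:  # e.g. Triton's out-of-shared-memory error
            continue  # previously this aborted the whole compilation
        timings[cfg] = bench_fn(kernel)
    if not timings:
        raise RuntimeError("all candidate configs failed to compile")
    return min(timings, key=timings.get)
```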

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112916
Approved by: https://github.com/Chillee, https://github.com/eellison, https://github.com/jansel
2023-11-07 01:47:53 +00:00
c608b0eb35 [Dist] Enable FSDP on CPU (#112145)
Differential Revision: [D50688958](https://our.internmc.facebook.com/intern/diff/D50688958/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112145
Approved by: https://github.com/fegin
ghstack dependencies: #112144
2023-11-07 01:37:02 +00:00
5ffa98f7ba [Dist] Add fallback reduce_scatter_base, allgather_base APIs to Gloo (#112144)
Per Ke's suggestion, adding these APIs in ProcessGroupGloo directly to
enable FSDP on CPUs

Differential Revision: [D50636382](https://our.internmc.facebook.com/intern/diff/D50636382/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112144
Approved by: https://github.com/wz337, https://github.com/fegin, https://github.com/wanchaol, https://github.com/XilunWu
2023-11-07 01:37:02 +00:00
e9496fdc34 [pytorch-vulkan] Disable failing test on vulkan_api_test (#112936)
Summary:
`conv2d_pw_prepack` and `conv2d_pw_prepack_bc` have been broken for a long time on Meta's CI.

The cause is unknown yet. The tests pass with a smaller input.

Hence, disable the large tests and enable tests with a smaller tensor.

Test Plan:
Devserver:

```
 LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck2 run fbcode/mode/dev-nosan    //xplat/caffe2:pt_vulkan_api_test_bin  -- --gtest_filter="*"
```

Output: P872944689

```
...
[       OK ] VulkanAPITest.linear_4d_small (0 ms)
[ RUN      ] VulkanAPITest.linear_4d_large
[       OK ] VulkanAPITest.linear_4d_large (2 ms)
[ RUN      ] VulkanAPITest.lstm_success
[       OK ] VulkanAPITest.lstm_success (7 ms)
[ RUN      ] VulkanAPITest.lstm_mclareninputs_success
[       OK ] VulkanAPITest.lstm_mclareninputs_success (39 ms)
[ RUN      ] VulkanAPITest.lstm_prepack_success
[       OK ] VulkanAPITest.lstm_prepack_success (3 ms)
[ RUN      ] VulkanAPITest.querypool_flushed_shader_log
xplat/caffe2/aten/src/ATen/test/vulkan_api_test.cpp:7627: Skipped
QueryPool is not available
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log (0 ms)
[----------] 396 tests from VulkanAPITest (23847 ms total)
[----------] Global test environment tear-down
[==========] 396 tests from 1 test suite ran. (23847 ms total)
[  PASSED  ] 395 tests.
[  SKIPPED ] 1 test, listed below:
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log
  YOU HAVE 9 DISABLED TESTS
```

Differential Revision: D50997218

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112936
Approved by: https://github.com/manuelcandales
2023-11-07 01:33:45 +00:00
4893a2814f [pytree] align function signature between C++ and Python pytree (#112482)
Change the argument name in C++ and Python pytree APIs. Also add a test to ensure the function signatures are the same in the two implementations.

- #112485

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112482
Approved by: https://github.com/zou3519
2023-11-07 01:26:41 +00:00
7715b47f44 [fx] Speedup ShapeEnv cache invalidation checks (#112687)
This may seem a bit silly but we spend ~5% of compilation on simply checking if the `ShapeEnv` cache has been invalidated. It isn't necessarily slow, but we call it millions of times per compile so everything adds up.

To improve the situation, I've added a version counter to the shape env that gets incremented whenever the cache key changes. This does require a bit of care in `ShapeEnv` that we don't modify the relevant state without calling `self._update_version_counter()`. However, we already have a similar situation for the translation validation feature which requires `_set_replacement` to be called instead of modifying the replacements directly.
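
A minimal sketch of the scheme (names illustrative, not the actual ShapeEnv internals):
```python
class Env:
    def __init__(self):
        self._version = 0
        self.replacements = {}

    def _update_version_counter(self):
        self._version += 1

    def set_replacement(self, sym, expr):
        # Every state mutation must go through a method that bumps the
        # version, mirroring how _set_replacement is required today.
        self.replacements[sym] = expr
        self._update_version_counter()

def cached(fn):
    memo = {}
    def wrapper(env, *args):
        # Comparing an int is far cheaper than hashing the env's full state.
        key = (env._version, args)
        if key not in memo:
            memo[key] = fn(env, *args)
        return memo[key]
    return wrapper
```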

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112687
Approved by: https://github.com/ezyang
ghstack dependencies: #112933
2023-11-07 01:10:25 +00:00
65ecb36621 Move ShapeEnv config out of dynamo (#112933)
Previously there was a circular dependency between fx and dynamo that happened
to work out since ShapeEnv didn't access the config at module init time.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112933
Approved by: https://github.com/ezyang
2023-11-07 01:10:25 +00:00
b4dbb02d46 Adjust _list_with_default to also work with SymInt input (#113073)
Fixes https://github.com/pytorch/pytorch/issues/112496

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113073
Approved by: https://github.com/jbschlosser
2023-11-07 00:59:25 +00:00
8219bf051b [BE]: Apply RUF015 to torch folder (#113025)
Removes unnecessary allocations of iterators. There is a small chance this may have side effects as the entire iterator is no longer consumed, but this is a way more efficient method for retrieving the first element.
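
For reference, the shape of the change (illustrative only):
```python
items = (str(i) for i in range(1_000_000))
# Flagged by RUF015: materializes the whole iterable to take one element.
first = list(items)[0]

items = (str(i) for i in range(1_000_000))
# Preferred: stops after the first element. The iterator is no longer fully
# consumed, which is the side-effect risk mentioned above.
first = next(iter(items))
```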

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113025
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-11-07 00:48:15 +00:00
fb8ffba47f [PyTorch][Vulkan] Reduce 2D float matrix multiplication shader latency by more than 50% on some Android GPUs (#112918)
Summary:
- Introduce improved algorithm for 2d float GEMM [ output = alpha * (input) * (weight) + beta * (bias) ] that shows more than 50% shader latency reduction on Qualcomm GPUs. Does not apply for the quantized [integer] and batch [3d] matrix multiplication cases.
  - At function call of `run_linear_context()`/`run_addmm_context()`, re-pack the input tensor data to be row-wise element-dense in each texel
  - Reducing global I/O reads and writes through "batching" by fetching 4 input and weight texels each, then performing 16 output computations and writes, in each shader invocation
  - Leverage a loop unrolling/coalescing compile-time optimization of the GLSL->SPIR-V compiler using a macro for 4

Test Plan:
# Numerical Validation
- There are two pre-existing failures on trunk related to conv2d, unrelated to this diff's code paths
- `LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run fbcode/mode/dev-nosan //xplat/caffe2:pt_vulkan_api_test_bin`

```
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log (0 ms)
[----------] 396 tests from VulkanAPITest (38014 ms total)

[----------] Global test environment tear-down
[==========] 396 tests from 1 test suite ran. (38014 ms total)
[  PASSED  ] 393 tests.
[  SKIPPED ] 1 test, listed below:
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] VulkanAPITest.conv2d_pw_prepack
[  FAILED  ] VulkanAPITest.conv2d_pw_prepack_bc
```
# Performance Validation with Matrix Multiplication Benchmark Binary
- build the benchmark binary on both this diff and trunk modified for 100 iterations, `buck2 build -c ndk.debug_info_level=0 -c ndk.static_linking=true -c pt.enable_qpl=0 -c pt.vulkan_use_gpu_diagnostics=1 --target-platforms=ovr_config//platform/android:arm32-fbsource //xplat/caffe2:pt_vulkan_mm_perf_test_binAndroid --show-output`

__on local testing against a Samsung Galaxy S22 Ultra **75% reduction**__
- this diff
```
Benchmark                                                                                       Time             CPU   Iterations
---------------------------------------------------------------------------------------------------------------------------------
run_linear_context_benchmark/N:500/M:500/P:500/iterations:100/manual_time/threads:1          2.08 ms         10.3 ms          100
```
- trunk:
```
Benchmark                                                                                       Time             CPU   Iterations
---------------------------------------------------------------------------------------------------------------------------------
run_linear_context_benchmark/N:500/M:500/P:500/iterations:100/manual_time/threads:1          9.11 ms         13.8 ms          100
```
__on local testing against our Android chipset of interest **50% reduction**__
- this diff:
```
Benchmark                                                                                       Time             CPU   Iterations
---------------------------------------------------------------------------------------------------------------------------------
[...]
run_linear_context_benchmark/N:500/M:500/P:500/iterations:100/manual_time/threads:1          40.0 ms         90.6 ms          100

```
- trunk:
```
Benchmark                                                                                       Time             CPU   Iterations
---------------------------------------------------------------------------------------------------------------------------------
[...]
run_linear_context_benchmark/N:500/M:500/P:500/iterations:100/manual_time/threads:1          81.3 ms          106 ms          100
```
__on local testing against Google Pixel 7 Pro **55% reduction**__
- this diff:
```
Benchmark                                                                                       Time             CPU   Iterations
---------------------------------------------------------------------------------------------------------------------------------
[...]
run_linear_context_benchmark/N:500/M:500/P:500/iterations:100/manual_time/threads:1          7.38 ms         10.7 ms          100
```
- trunk:
```
Benchmark                                                                                       Time             CPU   Iterations
---------------------------------------------------------------------------------------------------------------------------------
[...]
run_linear_context_benchmark/N:500/M:500/P:500/iterations:100/manual_time/threads:1          16.2 ms         12.4 ms          100

```

Reviewed By: yipjustin

Differential Revision: D50441864

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112918
Approved by: https://github.com/yipjustin
2023-11-07 00:32:04 +00:00
24b61a45c9 [inductor] scale up num_warps for reductions to lower register pressure (#113039)
Recent work (https://github.com/pytorch/pytorch/pull/108193 and https://github.com/pytorch/pytorch/pull/109275) revealed that bigger Triton kernels can regress performance due to increased register pressure, which in turn lowers thread occupancy. Taking a look at the Triton internals, I see an opportunity to reduce the register pressure by decreasing the amount of work each thread does. I'm bumping up `num_warps` to achieve this. The change should only affect reduction cases.

I'm seeing real compilation time reduction with this change which is likely due to smaller LLVM IR:
https://hud.pytorch.org/benchmark/compilers?startTime=Mon%2C%2023%20Oct%202023%2017%3A57%3A40%20GMT&stopTime=Mon%2C%2006%20Nov%202023%2018%3A57%3A40%20GMT&granularity=hour&suite=torchbench&mode=training&dtype=amp&lBranch=hoy-reduction&lCommit=f2d31b83aa170914018407d88a76d5951153b316&rBranch=main&rCommit=64f326097be8ac66ff057365f3bed2d64c697563

The slight performance improvement could be noise; if not, the lower register pressure could explain it.

Ideally, we should improve Triton to automatically reroll large kernels into an inner loop without hurting vectorization. That's something I'm considering on the LLVM side.

I'm also seeing that the fused kernel provided in https://github.com/pytorch/pytorch/pull/108193 gets better performance by benefiting from lower register pressure. PTXAS shows a usage of 32 registers, compared to 55 previously.
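
A toy illustration of the tradeoff (numbers made up; 32 threads per warp):
```python
def elems_per_thread(r_numel: int, num_warps: int) -> float:
    # Each thread holds roughly this many partial values live at once,
    # which is what drives register pressure in a reduction.
    return r_numel / (32 * num_warps)

print(elems_per_thread(4096, 4))   # 32.0 -> high register pressure
print(elems_per_thread(4096, 16))  # 8.0  -> lower pressure, better occupancy
```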

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113039
Approved by: https://github.com/shunting314
2023-11-07 00:12:16 +00:00
c2084da14a [NT] Backward support for broadcasting binary ops (#112519)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112519
Approved by: https://github.com/jbschlosser
ghstack dependencies: #113031
2023-11-07 00:03:21 +00:00
d5007d8d8e Split out input_metadata.cpp from input_metadata.h (#113031)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113031
Approved by: https://github.com/albanD
2023-11-07 00:03:21 +00:00
1d4d5e4319 Grandfather in built-in TorchScript ops to being pt2_compliant (#113061)
I'm seeing ops like torch.ops.aten.mul.complex being used with
torch.compile (though this seems strange to me), but we should
grandfather these in.

Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113061
Approved by: https://github.com/ezyang
ghstack dependencies: #113036, #113049, #113050
2023-11-06 23:43:31 +00:00
efae8449a8 Grandfather in some more pytorch ops to be pt2_compliant (#113050)
We're not directly testing these, but in general the policy is to assume
that PyTorch ops inside the pytorch repo are compliant.

Test Plan:
- existing tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113050
Approved by: https://github.com/ezyang
ghstack dependencies: #113036, #113049
2023-11-06 23:43:31 +00:00
fe8570a1fe Grandfather in c10d_functional ops to pt2_compliant (#113049)
This PR also adds the ability to specify Tags for more `m.def(`
overloads.

Test Plan:
- new test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113049
Approved by: https://github.com/williamwen42
ghstack dependencies: #113036
2023-11-06 23:43:23 +00:00
71dca16610 Grandfather autogen'ed ops as pt2_compliant (#113036)
Summary:
I missed this when I grandfathered torchgen'ed aten ops as pt2_compliant.

Test Plan:
New test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113036
Approved by: https://github.com/williamwen42
2023-11-06 23:43:17 +00:00
75adb9f371 Revert "Fix default timeouts for python entrypoints (e.g. init_process_group) (#112893)"
This reverts commit f9d47e13813bbefc9f19a6c0430b7122f9d09b91.

Reverted https://github.com/pytorch/pytorch/pull/112893 on behalf of https://github.com/clee2000 due to sorry this seems to have broken inductor f9d47e1381 https://github.com/pytorch/pytorch/actions/runs/6776367936/job/18418174752 ([comment](https://github.com/pytorch/pytorch/pull/112893#issuecomment-1796979811))
2023-11-06 22:49:53 +00:00
eefe327b11 Rename torch.onnx.ExportOutput* to ONNXProgram* (#112263)
In PyTorch 2.1, the torch.export API was introduced and the term "export"
got overloaded due to the already existing torch.onnx.export API.

The torch.onnx.dynamo_export API was introduced in PyTorch 2.0 and it
exposed a torch.onnx.ExportOutput which now can be confused with
torch.export.export output

To prevent such ambiguity and standardize names around the new
torch.export.ExportedProgram, this PR renames torch.onnx.ExportOutput to
torch.onnx.ONNXProgram

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112263
Approved by: https://github.com/BowenBao
ghstack dependencies: #112444
2023-11-06 22:27:15 +00:00
21b6030ac3 Don't set CUDA_HOME when not compiled with CUDA support (#106310)
It doesn't make sense to set this (on import!) as CUDA cannot be used with PyTorch in this case but leads to messages like
> No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
when CUDA happens to be installed which is at least confusing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106310
Approved by: https://github.com/ezyang
2023-11-06 21:48:49 +00:00
27e31ab6e8 Add support for torch.Generator type in TorchScript (#110413)
- Add support for `torch.Generator` type in TorchScript
- Add `generator` args to all `torch.nn.init` functions that call `uniform_` or `normal_`
- Add support for `torch.Generator` in LTC's TorchScript backend (CC: @wconstab)

CC: @eellison @davidberard98 @GlebKazantaev @behzad-a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110413
Approved by: https://github.com/wconstab, https://github.com/albanD, https://github.com/glebk-cerebras, https://github.com/davidberard98
2023-11-06 21:27:02 +00:00
7b99b3efb1 added 'weights_only' param in torch.load examples (#112860)
Fixes #111876

`torch.load` without setting `weights_only=True` is unsafe, so this updates the examples of `torch.load` to use `weights_only=True` where possible and `weights_only=False` elsewhere, with a warning about the safety risk.
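
For example (the file name is illustrative):
```python
import torch

torch.save({"w": torch.ones(2)}, "ckpt.pt")

# Safe: restricts unpickling to plain tensors and containers.
state = torch.load("ckpt.pt", weights_only=True)

# Unsafe in general: full pickle can execute arbitrary code, so only use
# weights_only=False on checkpoints from a trusted source.
state = torch.load("ckpt.pt", weights_only=False)
```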

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112860
Approved by: https://github.com/kit1980
2023-11-06 21:17:36 +00:00
c83112a31f Add Autocast support to Conv thourgh explicit cast (#112806)
Fix ONNX Runtime failure due to `[ONNXRuntimeError] : 1 : FAIL : Type Error : Type parameter (T) of Optype (Conv) bound to different types (tensor(float) and tensor(float16) in node (Conv_5401).`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112806
Approved by: https://github.com/BowenBao
2023-11-06 21:00:18 +00:00
f9d47e1381 Fix default timeouts for python entrypoints (e.g. init_process_group) (#112893)
Previous PRs changed the c++ default timeout for PGNccl, but this path
was only hit in some cases, and the python defaults took over in other
cases.

This PR ensures that NCCL pg always default to the changed NCCL-specific
timeout value.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112893
Approved by: https://github.com/xw285cornell, https://github.com/kwen2501, https://github.com/XilunWu
ghstack dependencies: #112611, #112803
2023-11-06 20:48:39 +00:00
81ea7a489a Replaced deprecated pkg_resources.packaging with packaging module (#113023)
Usage of `from pkg_resources import packaging` leads to a deprecation warning:
```
DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
```
and in strict tests where warnings are errors, this leads to CI breaks, e.g.: https://github.com/pytorch/vision/pull/8092

Replacing `pkg_resources.packaging` with `packaging`, as it is now a PyTorch dependency:
fa9045a872/requirements.txt (L19)
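
The shape of the change (illustrative):
```python
# Before (emits the DeprecationWarning):
#   from pkg_resources import packaging
#   packaging.version.parse(torch.__version__)

# After, using the standalone packaging distribution directly:
import torch
from packaging import version

assert version.parse(torch.__version__) >= version.parse("1.0")
```
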
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113023
Approved by: https://github.com/Skylion007
2023-11-06 20:26:32 +00:00
67256d5c1c [aotinductor] Solves a problem where a tensor is returned more than once (#112177)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112177
Approved by: https://github.com/zhxchen17
2023-11-06 20:12:25 +00:00
718035791d Prefer e.is_number over not e.free_symbols in SymPy (#112688)
We spend somewhere on the order 1% in `sympy.Expr.free_symbols` as it is called millions of times.
Most of the time we actually just want to know "is this a constant", however `e.is_constant()` is
horribly slow. It turns out though that there is another propery `is_number` that does what we want.

> property is_number:
>
> Returns True if self has no free symbols and no undefined functions (AppliedUndef, to be precise). It will be faster
> than if not self.free_symbols, however, since is_number will fail as soon as it hits a free symbol or undefined
> function.

Even further, we also avoid the overhead of building the unnecessary set object.
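
A quick demonstration of the difference:
```python
import sympy

x = sympy.Symbol("x")

e = sympy.Integer(6) / 2
print(e.is_number)         # True: no free symbols, no undefined functions
print(not e.free_symbols)  # Same answer, but allocates a set first

print((x + 1).is_number)   # False, short-circuiting as soon as x is hit
```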

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112688
Approved by: https://github.com/lezcano
2023-11-06 20:05:13 +00:00
19e9f5cc7b [torchgen] Add support for optional tensor (#112938)
Summary: As titled

Test Plan: rely on CI

Differential Revision: D50997957

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112938
Approved by: https://github.com/Skylion007
2023-11-06 20:03:05 +00:00
bdfde62e54 [Inductor CUTLASS backend] Epilogue fusion codegen (Step 1) (#110890)
Summary:

This PR adds epilogue fusion code generation support for the new experimental
[Inductor Cutlass backend]([https://github.com/pytorch/pytorch/pull/108015]).

Details:

A fusion happens on the GEMM template level by taking a Cutlass 3.x GEMM Universal Matmul Kernel template
and adding a custom template functor based on Cutlass new “Epilogue Visitor Trees” (EVT) on top, which represents and
performs the computation of the fused Pointwise / Elementwise computation nodes.

This is the approach dictated by [NVIDIA/cutlass example 49](https://github.com/NVIDIA/cutlass/blob/main/examples/49_hopper_gemm_with_collective_builder/49_collective_builder.cu),
which is currently the only documentation and example of Cutlass Epilogue Visitor Trees.

This EVT functor in turn is a hierarchical template expression which represents an abstract syntax tree of the fused computation to perform.
A second codegen task is to create a hierarchical initializer expression, which provides potentially necessary arguments
to each of the functor subexpressions.

Step 1 functionality:

 * End to end code generation is possible using the above approach.
 * Supports simple elementwise expression fusion of chains of elementwise operations (with scalar constants)
   after a matmul.
 * Elementwise operation support includes addition, subtraction, multiplication, division, minimum, maximum etc.
 * Examples / Unit tests include ReLU and ReLU6 fusion.
 * Support for fp16 and fp16 with fp32 accumulation data types.
 * Generates SM90 ( Hopper ) based CUDA Kernels ( as Cutlass up to 3.2.0 only supported EVT for SM90 )

The following is not yet supported, and is left for future work:

 * Full operation support ( e.g. full set of all ops usually handled via V.ops handlers )
 * Cutlass EVT with SM80 support ( possible in Cutlass 3.2.1 according to release notes, but not yet documented )
 * Add support for additional (auxiliary) inputs, which changes the Template Kernels' call signature
 * Add support for additional (auxiliary) outputs ( requires support for full computation graphs )
 * Add support for reduction operations and operations which use different output layouts than the input
 * Add support for additional dtypes ( as far as Cutlass allows )

This PR updates third_party/cutlass to v3.2.2, which has some important improvements and features
for the inductor backend.

See also Cutlass release notes:
https://github.com/NVIDIA/cutlass/releases/tag/v3.2.1 and https://github.com/NVIDIA/cutlass/releases/tag/v3.2.2

Notable changes in Cutlass 3.2.1 include:
 * Cutlass codegen python code has moved into a package with the "cutlass_library" namespace, which allows to
   prevent namespace clashes without resolving to monkey-patching ( which was done earlier ).
 * Support for SM80 epilogue visitor trees ( according to the Release Notes, not tried yet )
 * Small API changes to the cutlass_library API ( requires adapting the inductor backend code )

Notable changes in Cutlass 3.2.2 include:
 * Bugfix that led to CUDA Illegal memory access in some Pytorch unit tests involving flash attention

 Test Plan:
  * CI
  * pytest test/inductor/test_max_autotune.py

Note: So far, the CUTLASS backend is still disabled by default. Benchmarks are planned once more advanced fusions are enabled.

Differential Revision: [D50988161](https://our.internmc.facebook.com/intern/diff/D50988161)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110890
Approved by: https://github.com/jansel
ghstack dependencies: #112762
2023-11-06 19:42:10 +00:00
59e003d159 Fixed cat uint8 lowering (#112753)
Description:
- Fixed cat uint8 lowering

Otherwise, it gives the following issue on the repro code:
```python
def func(x):
    batch_shape = x.shape[:1]
    out = torch.cat([x.new_zeros(1).expand(batch_shape + (1,)), x], dim=-1)
    return out

cfunc = torch.compile(func)

x = torch.randint(0, 256, size=(3, 255), dtype=torch.uint8)
out = cfunc(x)
```
Error message:
```
  File "/pytorch/torch/_inductor/lowering.py", line 1037, in <genexpr>
    if all(len(input.layout.size) == 4 for input in inputs):
  File "/pytorch/torch/_inductor/ir.py", line 5795, in __getattr__
    fn = getattr(self.data, name)
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
LoweringException: AttributeError: 'ExpandView' object has no attribute 'layout'
  target: aten.cat.default
  args[0]: [TensorBox(
    ExpandView(data=StorageBox(
      ComputedBuffer(name='buf0', layout=FlexibleLayout('cpu', torch.uint8, size=[1], stride=[1]), data=Pointwise(
        'cpu',
        torch.uint8,
        def inner_fn(index):
            _ = index
            tmp0 = ops.constant(0, torch.uint8)
            return tmp0
        ,
        ranges=[1],
        origin_node=full,
        origins={full}
      ))
    ), size=[3, 1])
  ), TensorBox(StorageBox(
    InputBuffer(name='arg0_1', layout=FixedLayout('cpu', torch.uint8, size=[3, 255], stride=[255, 1]))
  ))]
  args[1]: 1

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
```

Context: compiling is not working for torchvision's `F.equalize` op: https://github.com/pytorch/vision/issues/8056
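
The shape of the fix, as a sketch ( an assumption about the patch, not its exact contents ):
query sizes through the IR node's generic accessor instead of assuming a realized layout.

```python
# Sketch only: ExpandView has no `.layout`, so probing it raises the
# AttributeError above, while every inductor IR node can report its size.

# before ( breaks on unrealized views such as ExpandView ):
#   if all(len(input.layout.size) == 4 for input in inputs):

# after ( works for any IR node ):
#   if all(len(input.get_size()) == 4 for input in inputs):
```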

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112753
Approved by: https://github.com/peterbell10
2023-11-06 19:42:04 +00:00
542fa4a2e7 Revert "Revert "Use OpOverload instead of OpOverloadPacket for size/s… (#113058)
Revert "Revert "Use OpOverload instead of OpOverloadPacket for size/stride/etc slots (#112119)""

This reverts commit a1d1b73a7c2cf6b9a2edb4170ec268dfd90956bd.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113058
Approved by: https://github.com/izaitsevfb
2023-11-06 19:38:49 +00:00
118e842fdf [2D][test] Update 2d test to reflect distributed_state_dict API changes (#112967)
As title

Fixes https://github.com/pytorch/pytorch/issues/113033
Fixes https://github.com/pytorch/pytorch/issues/112969
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112967
Approved by: https://github.com/wanchaol, https://github.com/fegin, https://github.com/huydhn
2023-11-06 19:36:30 +00:00
4d9546cc1b [pytorch-vulkan] conv1d, only handle special case (#112880)
Summary:
Just enough to cover the requirement for our target use-case.

Will add complete implementation later.

Test Plan:
```
(base) yipjustin@yipjustin-mac fbsource % buck2 run --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 -- --gtest_filter="*conv1d*"
File changed: fbsource//xplat/caffe2/aten/src/ATen/test/vulkan_api_test.cpp
Buck UI: https://www.internalfb.com/buck2/27291bfe-940a-4bed-9616-8f3b4f2a3fc7
Network: Up: 20MiB  Down: 142B  (reSessionID-5632e058-9f48-40eb-8157-30e2db104272)
Jobs completed: 6. Time elapsed: 13.5s.
Cache hits: 0%. Commands: 2 (cached: 0, remote: 0, local: 2)
BUILD SUCCEEDED
Running main() from third-party/googletest/1.11.0/googletest/googletest/src/gtest_main.cc
Note: Google Test filter = *conv1d*
[==========] Running 2 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 2 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.conv1d_simple
[       OK ] VulkanAPITest.conv1d_simple (37 ms)
[ RUN      ] VulkanAPITest.conv1d
[       OK ] VulkanAPITest.conv1d (2 ms)
[----------] 2 tests from VulkanAPITest (39 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test suite ran. (39 ms total)
[  PASSED  ] 2 tests.
```

Differential Revision: D50914117

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112880
Approved by: https://github.com/manuelcandales
2023-11-06 19:36:08 +00:00
ab1f6d58bc [c10d] use allocator trace callbacks for NCCL PG register (#112850)
Summary:
We need to register all cache segments allocated by allocator, so that NCCL can apply zero copy algorithms at collective and point-to-point operations.

How to track and register all cache segments:
- It registers a register hook and a deregister hook with the cache allocator as action tracker callbacks, tracking SEGMENT_ALLOC and SEGMENT_FREE trace entries, respectively. When SEGMENT_ALLOC is tracked, the register hook registers the new segment with the PG's communicators on the same device. Similarly, when SEGMENT_FREE is tracked, the deregister hook handles deregistration before cudaFree ( see the sketch below ).
- When a new NCCL communicator is created, it dumps the snapshot from the cache allocator to register all existing cache segments at once.
- When an NCCL communicator is aborted, it deregisters all segments that have been registered by this communicator.
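
Conceptually, the callback logic looks like this ( illustrative Python pseudocode; the real
implementation is C++ inside ProcessGroupNCCL, and every name below is hypothetical ):

```python
# Hypothetical sketch of the described trace-tracker callback; the actual
# hooks are C++ and attached to the CUDA caching allocator.
def on_allocator_trace(entry, comms_by_device):
    if entry.action == "SEGMENT_ALLOC":
        # Register the new segment with every communicator on this device.
        for comm in comms_by_device.get(entry.device, []):
            comm.register_segment(entry.addr, entry.size)
    elif entry.action == "SEGMENT_FREE":
        # Deregister before the allocator hands the segment to cudaFree.
        for comm in comms_by_device.get(entry.device, []):
            comm.deregister_segment(entry.addr)
```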

Test Plan: See test in D50726971

Reviewed By: wconstab

Differential Revision: D50726970

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112850
Approved by: https://github.com/wconstab
2023-11-06 19:29:32 +00:00
c6ecd018d5 Fix docstring errors (#112693)
This PR reduces docstring errors from a total of 128 to 0. This can be verified by running `pydocstyle path-to-distributed_c10d.py --count`,

where `path-to-distributed_c10d.py` is `torch/distributed/distributed_c10d.py`

BEFORE the PR:
`pydocstyle torch/distributed/distributed_c10d.py --count`
128
AFTER the PR:
`pydocstyle torch/distributed/distributed_c10d.py --count`
0

Fixes #112640

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112693
Approved by: https://github.com/H-Huang
2023-11-06 18:45:05 +00:00
5248bc9c8e [LTC] Fix type inference for native_layer_norm_backward (#112948)
## Description
Fix a bug in compute_shape_native_layer_norm_backward function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112948
Approved by: https://github.com/Skylion007
2023-11-06 18:30:08 +00:00
a810126cf7 [FSDP][optim_state_dict] Skip the parameter if the parameter does not belong to the current FSDP instance (#112804)
Skip the FSDP-managed parameter if the parameter is not managed by the current FSDP instance. This can happen when not all FSDP instances have all the parameters, e.g. with FSDP + some MPMD-style parallelism.

Differential Revision: [D50562170](https://our.internmc.facebook.com/intern/diff/D50562170/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112804
Approved by: https://github.com/wz337
2023-11-06 18:23:36 +00:00
5f562afff3 [DTensor] min, max and prod sharding propagation rules (#112403)
* `torch/distributed/_tensor/ops/math_ops.py` and `test/distributed/_tensor/test_math_ops.py`: add min, max and prod sharding propagation rules
* `torch/distributed/_tensor/sharding_prop.py` Validate OutputSpec to provide better errors when provided invalid specs
* `torch/distributed/_tensor/op_schema.py`: import `OpOverload` directly to aid linters

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112403
Approved by: https://github.com/wanchaol
2023-11-06 18:02:39 +00:00
b6e85eb8d5 [quant][pt2] Support quantized conv bias in QAT fusion (#112528)
Summary: Previously QAT fusion assumes bias is not quantized.
This works for the existing XNNPACKQuantizer, but not for custom
quantizers that wish to quantize the bias. This commit supports
this by adding the necessary patterns. This requires refactoring
the code, however, since it previously assumed that there will
only be one pair of q-dq (from conv weight) in the matched
pattern, and this is no longer true.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT.test_qat_conv_bn_bias_derived_qspec

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar

Differential Revision: [D50856377](https://our.internmc.facebook.com/intern/diff/D50856377)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112528
Approved by: https://github.com/jerryzh168
2023-11-06 17:58:57 +00:00
e39668770a [CUDA] 64-bit indexing fixes for cross-entropy kernels (#112096)
For #108345, #111484

Addresses the forward kernels implicated in the issues, but will take another look at the backward kernels (in follow-up PRs if necessary).

The spatial softmax kernel is changed to use signed integer indexing rather than unsigned as `ScalarType` only has signed integer types declared for now, but this should be a minor change.

CC @ptrblck @crcrpar (who landed a few related PRs recently).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112096
Approved by: https://github.com/mikaylagawarecki
2023-11-06 17:37:08 +00:00
a50f6d3685 Move release docker container builds to ubuntu22.04 (#113032)
Move Official Docker builds for the release to :
nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113032
Approved by: https://github.com/malfet, https://github.com/huydhn
2023-11-06 17:33:40 +00:00
3f62531191 Fix: docstring errors in torch.nn.utils - parametrizations.py/prune.py/weight_norm.py (#113021)
Fixes #112631. The previous PR #112943 had an accidental merge, which is resolved through this PR.

- torch/nn/utils/parametrizations.py
**Before - 6**
```
torch\nn\utils\parametrizations.py:1 at module level:
        D100: Missing docstring in public module
torch\nn\utils\parametrizations.py:23 in private function `_make_orthogonal`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\parametrizations.py:23 in private function `_make_orthogonal`:
        D210: No whitespaces allowed surrounding docstring text
torch\nn\utils\parametrizations.py:178 in public function `orthogonal`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
torch\nn\utils\parametrizations.py:309 in public function `weight_norm`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
torch\nn\utils\parametrizations.py:483 in public function `spectral_norm`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
6
```
**After - 1**
```
torch\nn\utils\parametrizations.py:1 at module level:
        D100: Missing docstring in public module
1
```
- torch/nn/utils/prune.py
**Before - 100**
```
torch\nn\utils\prune.py:1 at module level:
        D200: One-line docstring should fit on one line with quotes (found 3)
torch\nn\utils\prune.py:1 at module level:
        D400: First line should end with a period (not 's')
torch\nn\utils\prune.py:13 in public class `BasePruningMethod`:
        D204: 1 blank line required after class docstring (found 0)
torch\nn\utils\prune.py:21 in public method `__call__`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:21 in public method `__call__`:
        D400: First line should end with a period (not ')')
torch\nn\utils\prune.py:34 in public method `compute_mask`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:34 in public method `compute_mask`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
torch\nn\utils\prune.py:53 in public method `apply_mask`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:53 in public method `apply_mask`:
        D400: First line should end with a period (not 'g')
torch\nn\utils\prune.py:74 in public method `apply`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:74 in public method `apply`:
        D400: First line should end with a period (not 'd')
torch\nn\utils\prune.py:74 in public method `apply`:
        D401: First line should be in imperative mood (perhaps 'Add', not 'Adds')
torch\nn\utils\prune.py:200 in public method `prune`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:200 in public method `prune`:
        D400: First line should end with a period (not '`')
torch\nn\utils\prune.py:200 in public method `prune`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
torch\nn\utils\prune.py:229 in public method `remove`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:229 in public method `remove`:
        D400: First line should end with a period (not 'd')
torch\nn\utils\prune.py:229 in public method `remove`:
        D401: First line should be in imperative mood (perhaps 'Remove', not 'Removes')
torch\nn\utils\prune.py:256 in public class `PruningContainer`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:264 in public method `__init__`:
        D107: Missing docstring in __init__
torch\nn\utils\prune.py:277 in public method `add_pruning_method`:
        D401: First line should be in imperative mood (perhaps 'Add', not 'Adds')
torch\nn\utils\prune.py:297 in public method `__len__`:
        D105: Missing docstring in magic method
torch\nn\utils\prune.py:300 in public method `__iter__`:
        D105: Missing docstring in magic method
torch\nn\utils\prune.py:303 in public method `__getitem__`:
        D105: Missing docstring in magic method
torch\nn\utils\prune.py:307 in public method `compute_mask`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:307 in public method `compute_mask`:
        D400: First line should end with a period (not 's')
torch\nn\utils\prune.py:307 in public method `compute_mask`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
torch\nn\utils\prune.py:335 in private nested function `_combine_masks`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:335 in private nested function `_combine_masks`:
        D400: First line should end with a period (not ':')
torch\nn\utils\prune.py:404 in public class `Identity`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:404 in public class `Identity`:
        D400: First line should end with a period (not 'e')
torch\nn\utils\prune.py:410 in public method `compute_mask`:
        D102: Missing docstring in public method
torch\nn\utils\prune.py:416 in public method `apply`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:416 in public method `apply`:
        D400: First line should end with a period (not 'd')
torch\nn\utils\prune.py:416 in public method `apply`:
        D401: First line should be in imperative mood (perhaps 'Add', not 'Adds')
torch\nn\utils\prune.py:442 in public method `__init__`:
        D107: Missing docstring in __init__
torch\nn\utils\prune.py:447 in public method `compute_mask`:
        D102: Missing docstring in public method
torch\nn\utils\prune.py:469 in public method `apply`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:469 in public method `apply`:
        D400: First line should end with a period (not 'd')
torch\nn\utils\prune.py:469 in public method `apply`:
        D401: First line should be in imperative mood (perhaps 'Add', not 'Adds')
torch\nn\utils\prune.py:486 in public class `L1Unstructured`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:486 in public class `L1Unstructured`:
        D400: First line should end with a period (not 's')
torch\nn\utils\prune.py:498 in public method `__init__`:
        D107: Missing docstring in __init__
torch\nn\utils\prune.py:503 in public method `compute_mask`:
        D102: Missing docstring in public method
torch\nn\utils\prune.py:527 in public method `apply`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:527 in public method `apply`:
        D400: First line should end with a period (not 'd')
torch\nn\utils\prune.py:527 in public method `apply`:
        D401: First line should be in imperative mood (perhaps 'Add', not 'Adds')
torch\nn\utils\prune.py:564 in public method `__init__`:
        D107: Missing docstring in __init__
torch\nn\utils\prune.py:571 in public method `compute_mask`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:571 in public method `compute_mask`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
torch\nn\utils\prune.py:634 in public method `apply`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:634 in public method `apply`:
        D400: First line should end with a period (not 'd')
torch\nn\utils\prune.py:634 in public method `apply`:
        D401: First line should be in imperative mood (perhaps 'Add', not 'Adds')
torch\nn\utils\prune.py:653 in public class `LnStructured`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:653 in public class `LnStructured`:
        D400: First line should end with a period (not 'r')
torch\nn\utils\prune.py:669 in public method `__init__`:
        D107: Missing docstring in __init__
torch\nn\utils\prune.py:677 in public method `compute_mask`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:677 in public method `compute_mask`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
torch\nn\utils\prune.py:747 in public method `apply`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:747 in public method `apply`:
        D400: First line should end with a period (not 'd')
torch\nn\utils\prune.py:747 in public method `apply`:
        D401: First line should be in imperative mood (perhaps 'Add', not 'Adds')
torch\nn\utils\prune.py:779 in public class `CustomFromMask`:
        D101: Missing docstring in public class
torch\nn\utils\prune.py:783 in public method `__init__`:
        D107: Missing docstring in __init__
torch\nn\utils\prune.py:786 in public method `compute_mask`:
        D102: Missing docstring in public method
torch\nn\utils\prune.py:793 in public method `apply`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:793 in public method `apply`:
        D400: First line should end with a period (not 'd')
torch\nn\utils\prune.py:793 in public method `apply`:
        D401: First line should be in imperative mood (perhaps 'Add', not 'Adds')
torch\nn\utils\prune.py:806 in public function `identity`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:806 in public function `identity`:
        D400: First line should end with a period (not 'e')
torch\nn\utils\prune.py:806 in public function `identity`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
torch\nn\utils\prune.py:839 in public function `random_unstructured`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:839 in public function `random_unstructured`:
        D400: First line should end with a period (not '`')
torch\nn\utils\prune.py:874 in public function `l1_unstructured`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:874 in public function `l1_unstructured`:
        D400: First line should end with a period (not '`')
torch\nn\utils\prune.py:916 in public function `random_structured`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:916 in public function `random_structured`:
        D400: First line should end with a period (not '`')
torch\nn\utils\prune.py:955 in public function `ln_structured`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:955 in public function `ln_structured`:
        D400: First line should end with a period (not '`')
torch\nn\utils\prune.py:1000 in public function `global_unstructured`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:1000 in public function `global_unstructured`:
        D400: First line should end with a period (not '`')
torch\nn\utils\prune.py:1120 in public function `custom_from_mask`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:1120 in public function `custom_from_mask`:
        D400: First line should end with a period (not '`')
torch\nn\utils\prune.py:1154 in public function `remove`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:1154 in public function `remove`:
        D400: First line should end with a period (not 'e')
torch\nn\utils\prune.py:1154 in public function `remove`:
        D401: First line should be in imperative mood (perhaps 'Remove', not 'Removes')
torch\nn\utils\prune.py:1184 in public function `is_pruned`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:1184 in public function `is_pruned`:
        D400: First line should end with a period (not 'r')
torch\nn\utils\prune.py:1211 in private function `_validate_pruning_amount_init`:
        D401: First line should be in imperative mood (perhaps 'Validate', not 'Validation')
torch\nn\utils\prune.py:1243 in private function `_validate_pruning_amount`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:1243 in private function `_validate_pruning_amount`:
        D400: First line should end with a period (not 'e')
torch\nn\utils\prune.py:1243 in private function `_validate_pruning_amount`:
        D401: First line should be in imperative mood (perhaps 'Validate', not 'Validation')
torch\nn\utils\prune.py:1265 in private function `_validate_structured_pruning`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:1265 in private function `_validate_structured_pruning`:
        D400: First line should end with a period (not '-')
torch\nn\utils\prune.py:1265 in private function `_validate_structured_pruning`:
        D401: First line should be in imperative mood (perhaps 'Validate', not 'Validation')
torch\nn\utils\prune.py:1284 in private function `_compute_nparams_toprune`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:1284 in private function `_compute_nparams_toprune`:
        D400: First line should end with a period (not 'a')
torch\nn\utils\prune.py:1308 in private function `_validate_pruning_dim`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:1308 in private function `_validate_pruning_dim`:
        D400: First line should end with a period (not ':')
torch\nn\utils\prune.py:1318 in private function `_compute_norm`:
        D205: 1 blank line required between summary line and description (found 0)
torch\nn\utils\prune.py:1318 in private function `_compute_norm`:
        D400: First line should end with a period (not 'n')
100
```
**After - 14**
```
torch\nn\utils\prune.py:266 in public method `__init__`:
        D107: Missing docstring in __init__
torch\nn\utils\prune.py:299 in public method `__len__`:
        D105: Missing docstring in magic method
torch\nn\utils\prune.py:302 in public method `__iter__`:
        D105: Missing docstring in magic method
torch\nn\utils\prune.py:305 in public method `__getitem__`:
        D105: Missing docstring in magic method
torch\nn\utils\prune.py:411 in public method `compute_mask`:
        D102: Missing docstring in public method
torch\nn\utils\prune.py:445 in public method `__init__`:
        D107: Missing docstring in __init__
torch\nn\utils\prune.py:450 in public method `compute_mask`:
        D102: Missing docstring in public method
torch\nn\utils\prune.py:502 in public method `__init__`:
        D107: Missing docstring in __init__
torch\nn\utils\prune.py:507 in public method `compute_mask`:
        D102: Missing docstring in public method
torch\nn\utils\prune.py:570 in public method `__init__`:
        D107: Missing docstring in __init__
torch\nn\utils\prune.py:677 in public method `__init__`:
        D107: Missing docstring in __init__
torch\nn\utils\prune.py:790 in public class `CustomFromMask`:
        D101: Missing docstring in public class
torch\nn\utils\prune.py:794 in public method `__init__`:
        D107: Missing docstring in __init__
torch\nn\utils\prune.py:797 in public method `compute_mask`:
        D102: Missing docstring in public method
14
```
- torch/nn/utils/weight_norm.py
**Before - 10**
```
torch\nn\utils\weight_norm.py:1 at module level:
        D200: One-line docstring should fit on one line with quotes (found 3)
torch\nn\utils\weight_norm.py:1 at module level:
        D400: First line should end with a period (not '8')
torch\nn\utils\weight_norm.py:12 in public class `WeightNorm`:
        D101: Missing docstring in public class
torch\nn\utils\weight_norm.py:16 in public method `__init__`:
        D107: Missing docstring in __init__
torch\nn\utils\weight_norm.py:23 in public method `compute_weight`:
        D102: Missing docstring in public method
torch\nn\utils\weight_norm.py:29 in public method `apply`:
        D102: Missing docstring in public method
torch\nn\utils\weight_norm.py:59 in public method `remove`:
        D102: Missing docstring in public method
torch\nn\utils\weight_norm.py:66 in public method `__call__`:
        D102: Missing docstring in public method
torch\nn\utils\weight_norm.py:73 in public function `weight_norm`:
        D401: First line should be in imperative mood (perhaps 'Apply', not 'Applies')
torch\nn\utils\weight_norm.py:137 in public function `remove_weight_norm`:
        D401: First line should be in imperative mood (perhaps 'Remove', not 'Removes')
10
```
**After - 6**
```
torch\nn\utils\weight_norm.py:10 in public class `WeightNorm`:
        D101: Missing docstring in public class
torch\nn\utils\weight_norm.py:14 in public method `__init__`:
        D107: Missing docstring in __init__
torch\nn\utils\weight_norm.py:21 in public method `compute_weight`:
        D102: Missing docstring in public method
torch\nn\utils\weight_norm.py:27 in public method `apply`:
        D102: Missing docstring in public method
torch\nn\utils\weight_norm.py:57 in public method `remove`:
        D102: Missing docstring in public method
torch\nn\utils\weight_norm.py:64 in public method `__call__`:
        D102: Missing docstring in public method
6
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113021
Approved by: https://github.com/lezcano
2023-11-06 17:24:32 +00:00
88920b26be [Cmake] Check that gcc-9.4 or newer is used (#112858)
As this is the oldest gcc that is fully compatible with the C++17 standard.
- Replace a number of conditional version checks with the simpler `if(CMAKE_COMPILER_IS_GNUCXX)` or `append_cxx_flag_if_supported`.
- As the `-Wsuggest-override` condition was hidden behind an incorrect guard, add missing `override` keywords to `torch::autograd::PyFunctionTensorPostAccGradHooks::apply_with_saved`, `caffe2::python::TensorFeeder::Feed` and `caffe2::NetObserverReporterPrint::report`

Fixes https://github.com/pytorch/pytorch/issues/101839

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112858
Approved by: https://github.com/Skylion007, https://github.com/albanD
2023-11-06 17:19:53 +00:00
77d5f0379e Revert "[HigherOrderOp] remove _deprecated_global_ns (#112757)"
This reverts commit fa81237af74e21e8d5b8e2d0f600ee9056bde4b8.

Reverted https://github.com/pytorch/pytorch/pull/112757 on behalf of https://github.com/PaliC due to breaking a bunch of executorch tests ([comment](https://github.com/pytorch/pytorch/pull/112757#issuecomment-1795503740))
2023-11-06 17:04:19 +00:00
a1d1b73a7c Revert "Use OpOverload instead of OpOverloadPacket for size/stride/etc slots (#112119)"
This reverts commit 2337d8d0625f230f9a0469c5806e282fa4b964e9.

Reverted https://github.com/pytorch/pytorch/pull/112119 on behalf of https://github.com/PaliC due to still breaking trt tests :( refer to diff ([comment](https://github.com/pytorch/pytorch/pull/112119#issuecomment-1795496395))
2023-11-06 17:01:50 +00:00
679ca510b0 Revert "[Cmake] Check that gcc-9.4 or newer is used (#112858)"
This reverts commit ad894cd0728e97c649cd9b33e1f98b18fa12a1da.

Reverted https://github.com/pytorch/pytorch/pull/112858 on behalf of https://github.com/PaliC due to breaking internal tests (check diff for test page) ([comment](https://github.com/pytorch/pytorch/pull/112858#issuecomment-1795485009))
2023-11-06 16:56:09 +00:00
185515368b Add generated opcheck test for if the pt2_compliant_tag is incorrectly applied (#112759)
Summary:
If there are xfails in the failures_dict and the operator has the
pt2_compliant_tag, then we raise an error. These generated tests are separate
from those in the failures dict because we don't actually need any sample
inputs to check this.

Test Plan: - New tests

Differential Revision: D50936201

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112759
Approved by: https://github.com/ezyang
2023-11-06 13:45:35 +00:00
376217cc0b [BE]: Apply FURB145 to make code more readable and idiomatic. (#112990)
Testing out some new rules that are in beta; I think I will apply this one codebase-wide once it's out of preview. Replaces the hack of using `[:]` to copy lists with the proper `copy()` method. More efficient and more readable, as the example below shows.
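
For example ( a minimal sketch ):

```python
# What FURB145 rewrites: slice-copy vs. the explicit copy() method.
xs = [1, 2, 3]

ys = xs[:]       # flagged by FURB145
zs = xs.copy()   # preferred: clearer intent, same result
assert ys == zs == xs and ys is not xs
```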
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112990
Approved by: https://github.com/ezyang
2023-11-06 13:15:04 +00:00
fa9045a872 [xla hash update] update the pinned xla hash (#113011)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned xla hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113011
Approved by: https://github.com/pytorchbot
2023-11-06 10:57:25 +00:00
2bc1378d7b Revert "[aotinductor] Solves a problem where a tensor is returned more than once (#112177)"
This reverts commit a91baaf314999abaaf93260f87b1ee109bb36541.

Reverted https://github.com/pytorch/pytorch/pull/112177 on behalf of https://github.com/PaliC due to breaking internal tests (refer to internal diff) ([comment](https://github.com/pytorch/pytorch/pull/112177#issuecomment-1794153272))
2023-11-06 06:20:32 +00:00
455241bbd3 Add Half for atan2, logaddexp, logaddexp2, hypot, and nextafter on CPU (#112138)
Add Half support for atan2, logaddexp, logaddexp2, hypot, and nextafter on CPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112138
Approved by: https://github.com/cpuhrsch
2023-11-06 06:01:29 +00:00
bd9be877e4 [aotinductor] Move cache_dir to utils.py (#112728)
Summary: Some tests can utilize cache_dir()

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112728
Approved by: https://github.com/jansel, https://github.com/chenyang78
ghstack dependencies: #112651
2023-11-06 03:42:10 +00:00
46a34e8c75 Inductor cpp wrapper: fix QMaxPool (#112379)
Based on the `Argument types` section of the `func` documentation under cb942ef2b1/aten/src/ATen/native, a non-inplace `Tensor` type in a schema should be mapped to a C++ argument of type `const Tensor&`.

For `quantized_max_pool1d` and `quantized_max_pool2d`, the `qx` input has `Tensor` type in the schema, so the C++ type is modified to `const Tensor&`:
cb942ef2b1/aten/src/ATen/native/quantized/library.cpp (L222-L223)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112379
Approved by: https://github.com/jgong5, https://github.com/jansel
ghstack dependencies: #112373, #112378
2023-11-06 02:07:51 +00:00
3be0e1cd58 c10::DriverAPI Try opening libcuda.so.1 (#112996)
As `libcuda.so` is only installed in dev environments (i.e. when CUDAToolkit is installed), while `libcuda.so.1` is part of the NVIDIA driver.
Also, this will keep it aligned with a5cb8f75a7/aten/src/ATen/cuda/detail/LazyNVRTC.cpp (L16)

Also, change `TORCH_INTERNAL_ASSERT` to `TORCH_CHECK`, as one can legitimately fail to open it if the driver cannot be found.

Fixes https://github.com/pytorch/pytorch/issues/112957
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112996
Approved by: https://github.com/kit1980, https://github.com/Skylion007
ghstack dependencies: #112994, #112995
2023-11-05 23:20:22 +00:00
d0a80f8af1 Better errors in c10::DriverAPI on dl failure (#112995)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112995
Approved by: https://github.com/Skylion007
ghstack dependencies: #112994
2023-11-05 23:20:22 +00:00
57191172f8 [BE] Use static local variable instead of call_once (#112994)
See https://en.cppreference.com/w/cpp/language/storage_duration#Static_local_variables

And also, it's weird to mix two paradigms together, as a static local variable is used to initialize the `DriverAPI::get()` singleton a mere 3 lines below
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112994
Approved by: https://github.com/Skylion007
2023-11-05 23:20:11 +00:00
9c1fb2cbb3 [BE]: Enable ruff PIE794 and fix bugs it found in test suite (#112989)
Enables some tests that were incorrectly not being run and enables PIE794 globally. This rule checks whether a class variable is defined twice and flags it, as that is likely a bug; in fact, we found several cases where it was. It does have a couple of false positives, which I flagged upstream and replaced with noqas: https://github.com/astral-sh/ruff/issues/8497
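
A minimal example of what the rule flags:

```python
# PIE794: the second definition silently overwrites the first, which in a
# test class often disables a knob without anyone noticing.
class TestConfig:
    check_gradients = True
    num_iterations = 10
    check_gradients = False  # flagged: redefined class variable
```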

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112989
Approved by: https://github.com/malfet
2023-11-05 22:11:53 +00:00
07123bc198 [ROCm] Build Triton in Centos for ROCm (#112050)
The Triton build for the centos-based ROCm Dockerfile was missing. This brings the centos Dockerfile up to date with the ubuntu Dockerfile. No CI job covers this change; it was independently verified by the ROCm QA team.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112050
Approved by: https://github.com/jataylo, https://github.com/malfet
2023-11-05 20:43:56 +00:00
a5cb8f75a7 [dynamo] Replace checkpointing with speculate/restart in graph_break_if_unsupported (#112921)
See comment in #112902 for context.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112921
Approved by: https://github.com/voznesenskym
ghstack dependencies: #112902
2023-11-05 17:09:29 +00:00
7818a2887a [dynamo] Replace InstructionTranslator.checkpoint with speculate/restart (#112902)
In my work on making guards installed eagerly (look up the stack), I found that our checkpoint/restore mechanism is very broken.  There is lots of state (especially in shape_env) which we don't checkpoint and restore properly.  We also have lots of mutable state on variable trackers already which is not checkpointed/restored.  (See other PRs in this stack for some spot fixes.)

Since we wanted to get rid of this anyway for making VariableTracker mutable, I figured I would just switch to restarting analysis.

For other usages of copy_graphstate/restore_graphstate:
1) Many usages were pointless and not needed; these are removed in PRs below this.
2) Some other usage (similar to this one) is removed in PRs above this.
3) The tricky one I am not handling is higher_order_ops, which uses checkpoint/restore a lot.    There might be some cases there where this speculate/restart trick won't work.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112902
Approved by: https://github.com/voznesenskym
2023-11-05 17:09:29 +00:00
7a18376187 Add Half support for poisson and use float for Half cumulative distribution on CPU (#112124)
Add Half support for poisson and use float for Half cumulative distribution on CPU

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112124
Approved by: https://github.com/cpuhrsch
2023-11-05 16:10:27 +00:00
674c104d12 Fix RecursionError in Inductor for large for loops (#112320)
Fixes https://github.com/pytorch/pytorch/issues/111686

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112320
Approved by: https://github.com/peterbell10
2023-11-05 13:12:54 +00:00
e64d250210 Add a tool for a semi-automatic optimization of bsr_dense_mm meta parameters. (#112737)
Finding optimal meta parameters for bsr_dense_mm and bsr_scatter_mm triton kernels is a tedious job. This PR introduces a tool (a Python script `torch/sparse/_triton_ops_meta.py`) that finds the optimal set of meta parameters for a given set of matrix multiplication inputs and their block sizes. Currently, such a set is found for square bsr tensor inputs with sizes 256...16384 and square blocksizes 16...128, and dense tensor inputs with sizes 256...131072.
As a result, bsr_dense_mm performance has increased as follows (`NVIDIA A100-SXM4-80GB`):
- for blocksize 16x16, the average/maximum speed up is about 40/60 %.
- for blocksize 32x32, the average/maximum speed up is about 28/45 %.
- for blocksize 64x64, the average/maximum speed up is about 26/43 %.
- for blocksize 128x128, the average/maximum speed up is about 12/28 %.

To enable the performance improvements through meta parameter optimization for other CUDA devices, one must execute `_triton_ops_meta.py`, which will calculate the optimal meta parameters and store the results in a dictionary object defined in `_triton_ops_meta.py`.
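
Exercising the tuned kernel looks roughly like this ( a sketch; requires a CUDA device, and the
module path and dtype choice are assumptions ):

```python
# Sketch: multiply a block-sparse (BSR) matrix by a dense one via the Triton
# kernel, which picks up tuned meta parameters when they are available.
import torch
from torch.sparse._triton_ops import bsr_dense_mm

lhs = torch.randn(1024, 1024, device="cuda", dtype=torch.bfloat16)
bsr = lhs.to_sparse_bsr(blocksize=(32, 32))  # square 32x32 blocks
rhs = torch.randn(1024, 1024, device="cuda", dtype=torch.bfloat16)

out = bsr_dense_mm(bsr, rhs)
```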

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112737
Approved by: https://github.com/cpuhrsch
2023-11-05 12:52:09 +00:00
26b5e27ace Add Half support for cummax, cummin, cumprod, logcumsumexp, and prod on CPU (#112132)
Add Half support for cummax, cummin, cumprod, logcumsumexp, and prod on CPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112132
Approved by: https://github.com/cpuhrsch
2023-11-05 12:31:38 +00:00
64f326097b [dynamo] Refactor handling of state in context managers (#112939)
The prior handling was rather buggy...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112939
Approved by: https://github.com/voznesenskym, https://github.com/yanboliang
ghstack dependencies: #112897, #112898, #112920, #112899
2023-11-05 03:10:30 +00:00
ea4b63db62 Back out "[aotinductor] Add example_value metadata to nodes (#112415)" (#112946)
Summary:
Original commit changeset: 967c6272c8e2

Original Phabricator Diff: D50802786

D50802786 is introding perf regression for AOTInductor internal models.

Differential Revision: D51002032

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112946
Approved by: https://github.com/houseroad
2023-11-05 01:27:42 +00:00
3a41fff5c0 [dynamo] Remove empty_checkpoint (#112899)
Refactor to make it easier to remove `self.checkpoint`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112899
Approved by: https://github.com/voznesenskym, https://github.com/yanboliang
ghstack dependencies: #112897, #112898, #112920
2023-11-05 00:44:21 +00:00
d78b5e5403 [dynamo] Remove checkpoint in GenericContextManager (#112920)
Checkpointing here is pointless since we just call `unimplemented()`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112920
Approved by: https://github.com/voznesenskym, https://github.com/yanboliang
ghstack dependencies: #112897, #112898
2023-11-05 00:44:21 +00:00
2ba2525d12 [dynamo] Remove checkpoint in conditional (#112898)
Checkpointing here is pointless since we just call `unimplemented()`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112898
Approved by: https://github.com/voznesenskym, https://github.com/yanboliang
ghstack dependencies: #112897
2023-11-05 00:44:02 +00:00
a6b42b5ada [dynamo] Remove checkpoint in inline_user_function_return (#112897)
This usage is pointless since if we are throwing an exception the state doesn't matter.

Extra graphs come from fixing an AttributeError("tensor_variable") which previously caused the remainder of the frame to fall back.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112897
Approved by: https://github.com/voznesenskym, https://github.com/yanboliang
2023-11-05 00:43:52 +00:00
847c7c6da6 Update ruff to v0.1.4 (#112966)
Updates ruff which fixes some bugs and updates an API to be used more consistently `rule now takes --output-format with the old argname deprecated`. A lot of rule bugfixes and autofixes have been added to the pydocstyle rules which will be useful for the docathon.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112966
Approved by: https://github.com/kit1980, https://github.com/justinchuby
2023-11-05 00:00:11 +00:00
f908b0e9a3 [dynamo] Enable typechecking for hooks.py (#112565)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112565
Approved by: https://github.com/Skylion007
ghstack dependencies: #112561, #112562, #112563, #112564
2023-11-04 19:37:06 +00:00
fe41a9ce08 [dynamo] Enable typechecking for resume_execution.py (#112564)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112564
Approved by: https://github.com/williamwen42, https://github.com/eellison
ghstack dependencies: #112561, #112562, #112563
2023-11-04 19:37:06 +00:00
3b34c818ac [dynamo] Enable typechecking for test_minifier_common.py (#112563)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112563
Approved by: https://github.com/Skylion007, https://github.com/eellison
ghstack dependencies: #112561, #112562
2023-11-04 19:36:56 +00:00
ca4fe028c8 [dynamo] Enable typechecking for replay_record.py (#112562)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112562
Approved by: https://github.com/Skylion007
ghstack dependencies: #112561
2023-11-04 19:36:38 +00:00
b8ac5bbcbd [dynamo] Enable typechecking for bytecode_transformation.py (#112561)
As part of this diff, I have upgraded the `python_version` config setting to 3.11. `bytecode_transformation.py` (and a few other files) have functions using APIs only available in Python 3.11+. Those APIs are gated by a sys.version_info check in their typeshed .pyi files. So setting the min version to 3.11 allows those functions to typecheck properly.

An alternative is to make the relevant types Any:

```
if sys.version_info >= (3, 11):
    _Positions = dis.Positions
else:
    _Positions = Any
```

However, with python_version = 3.8, that means we're not getting any useful typechecking signal when encountering values of type _Positions.

Changing the python_version to 3.11 does mean that we will stop typechecking codepaths that run only on lower versions, but that seems a small price to pay. It does also mean that we won't catch code that uses newer APIs without the appropriate version check, but again, not sure this has much of an impact.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112561
Approved by: https://github.com/ezyang
2023-11-04 19:36:27 +00:00
854882bbf4 Add test for init_process_group timeout (#112803)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112803
Approved by: https://github.com/H-Huang
ghstack dependencies: #112611
2023-11-04 19:10:41 +00:00
247b5bdbb5 [dynamo (easy)] Add skip reason to debug logs (#112869)
Fixes https://github.com/pytorch/pytorch/issues/112867

Example logs
```
[2023-11-03 12:51:02,230] torch._dynamo.eval_frame: [DEBUG] skipping: helper (reason: in skipfiles, file: /usr/lib/python3.10/contextlib.py)
[2023-11-03 12:51:02,230] torch._dynamo.eval_frame: [DEBUG] skipping: __init__ (reason: in skipfiles, file: /usr/lib/python3.10/contextlib.py)
[2023-11-03 12:51:02,230] torch._dynamo.eval_frame: [DEBUG] skipping: __enter__ (reason: in skipfiles, file: /usr/lib/python3.10/contextlib.py)
[2023-11-03 12:51:02,230] torch._dynamo.eval_frame: [DEBUG] skipping: backend_cache_wrapper (reason: in skipfiles, file: /home/jonch/Desktop/Programming/mlsys/pytorch/torch/_dynamo/eval_frame.py)
[2023-11-03 12:51:02,230] torch._dynamo.eval_frame: [DEBUG] skipping: _maybe_init_guarded_backend_cache (reason: in skipfiles, file: /home/jonch/Desktop/Programming/mlsys/pytorch/torch/_dynamo/eval_frame.py)
[2023-11-03 12:51:02,230] torch._dynamo.eval_frame: [DEBUG] skipping: innermost_fn (reason: in skipfiles, file: /home/jonch/Desktop/Programming/mlsys/pytorch/torch/_dynamo/eval_frame.py)
[2023-11-03 12:51:02,230] torch._dynamo.eval_frame: [DEBUG] skipping: _set_current_backend (reason: in skipfiles, file: /home/jonch/Desktop/Programming/mlsys/pytorch/torch/_dynamo/eval_frame.py)
[2023-11-03 12:51:02,230] torch._dynamo.eval_frame: [DEBUG] skipping: __init__ (reason: in skipfiles, file: /usr/lib/python3.10/contextlib.py)
[2023-11-03 12:51:02,230] torch._dynamo.eval_frame: [DEBUG] skipping: __enter__ (reason: in skipfiles, file: /usr/lib/python3.10/contextlib.py)
[2023-11-03 12:51:02,230] torch._dynamo.eval_frame: [DEBUG] skipping: enable_dynamic (reason: in skipfiles, file: /home/jonch/Desktop/Programming/mlsys/pytorch/torch/_dynamo/eval_frame.py)
[2023-11-03 12:51:02,247] [0/0] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo start tracing fn /home/jonch/Desktop/sdpa.py:1635
[2023-11-03 12:51:02,248] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG] TRACE starts_line /home/jonch/Desktop/sdpa.py:1635 in fn (fn)
[2023-11-03 12:51:02,248] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG]     def fn(x):
[2023-11-03 12:51:02,313] [0/0] torch._dynamo.output_graph: [DEBUG] create_graph_input L_x_ L['x']
[2023-11-03 12:51:02,314] [0/0] torch._dynamo.variables.builder: [DEBUG] wrap_to_fake L['x'] (3,) [<DimDynamic.STATIC: 2>] [None]
[2023-11-03 12:51:02,314] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG] TRACE starts_line /home/jonch/Desktop/sdpa.py:1636 in fn (fn)
[2023-11-03 12:51:02,314] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG]         x = x + 1
[2023-11-03 12:51:02,314] [0/0] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x []

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112869
Approved by: https://github.com/jansel
2023-11-04 18:08:42 +00:00
d5fff7338e BUG: gracefully fall back to numpy.random if asked in dynamo.config (#109205)
Graph break if `config.use_numpy_random_stream=True` instead of a hard failure in inductor.
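
With the flag set, a compiled function that calls into `np.random` now graph-breaks back to eager
NumPy instead of failing hard in inductor ( a sketch ):

```python
import numpy as np
import torch
import torch._dynamo

# Opt in to NumPy's own random stream; np.random calls then cause a graph
# break ( eager fallback ) rather than a hard inductor failure.
torch._dynamo.config.use_numpy_random_stream = True

@torch.compile
def f(x):
    noise = np.random.randn(*x.shape).astype(np.float32)
    return x + torch.from_numpy(noise)

print(f(torch.zeros(3)))
```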

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109205
Approved by: https://github.com/lezcano
2023-11-04 14:54:05 +00:00
9af3f98faf [DTensor] Fix DTensor.from_local() returns DTensor with wrong size for uneven sharded tensor (#110781)
Fixes #110762

This PR:
fixes the issue described in #110762 by adding kwargs for shape and stride when creating a DTensor using `DTensor.from_local()`. When `shape` and `stride` are provided, we skip the calculation of `tensor_shape` and `tensor_stride` via `compute_global_tensor_info()`, as `compute_global_tensor_info()` always assumes even sharding.
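
A sketch of the new call shape ( assumes a 2-rank job launched with `torchrun --nproc-per-node 2`;
the kwarg names follow this PR's description ):

```python
import torch
import torch.distributed as dist
from torch.distributed._tensor import DTensor, DeviceMesh, Shard

dist.init_process_group("gloo")
rank = dist.get_rank()
mesh = DeviceMesh("cpu", [0, 1])

# Global tensor is (5, 4), sharded unevenly on dim 0: rank 0 holds 3 rows,
# rank 1 holds 2. Passing shape/stride skips compute_global_tensor_info(),
# which assumes even sharding.
local = torch.randn(3 if rank == 0 else 2, 4)
dt = DTensor.from_local(local, mesh, [Shard(0)], shape=(5, 4), stride=(4, 1))
assert dt.shape == (5, 4)
```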

Test plan:
```
python3 test/distributed/_tensor/test_dtensor.py -k test_from_local_uneven_sharding
python3 test/distributed/_tensor/test_dtensor.py -k test_from_local_uneven_sharding_raise_error
```

cc. @wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110781
Approved by: https://github.com/wanchaol
2023-11-04 11:21:10 +00:00
add78ac425 Fix a type error in AppendOnlyList (#112362)
AppendOnlyList::emplace_back allocates an array and overwrites the first slot, which is unsafe on a non-trivial type. This PR fixes it and adds other checks.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112362
Approved by: https://github.com/aaronenyeshi
2023-11-04 07:06:42 +00:00
ad894cd072 [Cmake] Check that gcc-9.4 or newer is used (#112858)
As this is the oldest gcc that is fully compatible with the C++17 standard.
- Replace a number of conditional version checks with the simpler `if(CMAKE_COMPILER_IS_GNUCXX)` or `append_cxx_flag_if_supported`.
- As the `-Wsuggest-override` condition was hidden behind an incorrect guard, add missing `override` keywords to `torch::autograd::PyFunctionTensorPostAccGradHooks::apply_with_saved`, `caffe2::python::TensorFeeder::Feed` and `caffe2::NetObserverReporterPrint::report`

Fixes https://github.com/pytorch/pytorch/issues/101839

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112858
Approved by: https://github.com/Skylion007, https://github.com/albanD
2023-11-04 05:40:08 +00:00
dfb26d5999 Reland "Symintify repeat_interleave (#109133)" (#112726)
This reverts commit 08dbfecdbdf2af6f66b3226881c71d8977431197.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112726
Approved by: https://github.com/albanD
2023-11-04 05:15:55 +00:00
596dab4277 [DeviceMesh] Remove _validate_mesh from device_mesh.py (#112928)
Plan B for https://github.com/pytorch/pytorch/pull/112839

Motivation for the change:
1. We need to remove `funcol` as a dependency for device_mesh.py to resolve circular dependency issues when introducing device_mesh as an arg for DDP. In the meantime, we should not go from funcol to non-funcol as @voznesenskym suggested. Therefore, we want to remove this all_gather check completely.
2. For large scale, it would not make sense to validate the mesh at global scale anyway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112928
Approved by: https://github.com/wanchaol
2023-11-04 05:12:27 +00:00
fb044e2b17 [aot_autograd] Check that autocast states are never mutated by graphs passed to AOTAutograd (#112822)
Fixes https://github.com/pytorch/pytorch/issues/112659

As explained in https://github.com/pytorch/pytorch/pull/112396, Dynamo will never pass a graph to AOTAutograd that mutates autocast state.

If it is not needed, we do not want to support mutation wrappers for autocast state like for grad mode (https://github.com/pytorch/pytorch/pull/112396)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112822
Approved by: https://github.com/bdhirsh
2023-11-04 03:29:55 +00:00
0ac748cd29 Make pattern-matcher failure diagnostics lazy (again) and added an error message if format string is too long (#112923)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112923
Approved by: https://github.com/eellison
ghstack dependencies: #112476
2023-11-04 02:54:17 +00:00
418c5206ec Make test_distributed_spawn.py tell you how to run it correctly (#112924)
Sample output if incorrect/missing args are specified:

```
RuntimeError: Missing expected env vars for `test_distributed_spawn.py`.  Please
ensure to specify the following:
'BACKEND' = one of ('gloo', 'nccl', 'ucc')
'WORLD_SIZE' = int >= 2
'TEMP_DIR' specifying a directory containing a barrier file named
'barrier'.

e.g.
touch /tmp/barrier && TEMP_DIR=/tmp BACKEND='nccl' WORLD_SIZE=2 python
/data/users/whc/pytorch/test/distributed/test_distributed_spawn.py
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112924
Approved by: https://github.com/wanchaol
2023-11-04 02:43:43 +00:00
b4ce501137 [Inductor] [Quant] Re-structure Quantization testcase pattern matcher check (#112570)
**Summary**
This Diff re-structures the Quantization testcase pattern matcher check. Instead of checking all the patterns matched in Inductor, we will only check the core pattern match count and node numbers, such as: dequant promotion, QConv/Linear Unary and QConv Binary.

**TestPlan**
```
python -m pytest test_mkldnn_pattern_matcher.py -k test_q
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112570
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-11-04 01:11:34 +00:00
042445b7d3 Add new Macro to count ops and time lazy tracing (#112679)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112679
Approved by: https://github.com/alanwaketan
2023-11-04 00:40:29 +00:00
075cb6bab6 [pytorch-vulkan] slices to support zero-size output (#112879)
Summary: With D50030659, we are able to support zero-size tensor. Hence remove the check in slice. Also update related tests.

Test Plan:
```
[yipjustin@189650.od ~/fbsource (876ab81e3)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck2 run fbcode/mode/dev-nosan    //xplat/caffe2:pt_vulkan_api_test_bin  -- --gtest_filter="*slice*"
File changed: fbsource//xplat/caffe2/aten/src/ATen/test/vulkan_api_test.cpp
File changed: fbcode//caffe2/aten/src/ATen/test/vulkan_api_test.cpp
File changed: fbcode//caffe2/aten/src/ATen/native/vulkan/ops/Slice.cpp
1 additional file change events
Buck UI: https://www.internalfb.com/buck2/85adf6a3-7d17-4685-8d8b-a0b600df0b73
Network: Up: 44KiB  Down: 1.3MiB  (reSessionID-5afd53d4-0303-4f4d-a245-1eb810308fd3)
Jobs completed: 6. Time elapsed: 22.8s.
Cache hits: 0%. Commands: 2 (cached: 0, remote: 1, local: 1)
BUILD SUCCEEDED
Running main() from third-party/googletest/1.11.0/googletest/googletest/src/gtest_main.cc
Note: Google Test filter = *slice*
[==========] Running 6 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 6 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.slice_width_success
[       OK ] VulkanAPITest.slice_width_success (160 ms)
[ RUN      ] VulkanAPITest.slice_height_success
[       OK ] VulkanAPITest.slice_height_success (6 ms)
[ RUN      ] VulkanAPITest.slice_feature_success
[       OK ] VulkanAPITest.slice_feature_success (84 ms)
[ RUN      ] VulkanAPITest.slice_batch_success
[       OK ] VulkanAPITest.slice_batch_success (5 ms)
[ RUN      ] VulkanAPITest.slice_zero_sized
[       OK ] VulkanAPITest.slice_zero_sized (0 ms)
[ RUN      ] VulkanAPITest.slice_invalidinputs_exceptions
[       OK ] VulkanAPITest.slice_invalidinputs_exceptions (0 ms)
[----------] 6 tests from VulkanAPITest (257 ms total)

[----------] Global test environment tear-down
[==========] 6 tests from 1 test suite ran. (257 ms total)
[  PASSED  ] 6 tests.
```

Differential Revision: D50961979

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112879
Approved by: https://github.com/manuelcandales
2023-11-04 00:22:38 +00:00
62cbe86ac0 [torch] Skip the assertion on the return type when the annotation is a forward reference (#112870)
Summary:
The assertion is causing build failures when running Pysa, our security-focused static analyzer.
This is because we run `pyre infer` on the source code before analyzing it, which introduces annotations such as `def foo() -> 'torch._tensor.Tensor'`.
This does not work with the `out_wrapper` decorator which relies on inspecting the signature of the decorated function.
Let's skip the check on the return type if we detect that it was introduced by `pyre infer`.

Test Plan: eyes

Differential Revision: D50976601

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112870
Approved by: https://github.com/ZainRizvi
2023-11-04 00:22:13 +00:00
e36dba3a94 [Cutlass 3.2.2 submodule upgrade] Adapt Inductor cutlass backend to Cutlass 3.2.2 (#112762)
The inductor cutlass backend was written against Cutlass version 3.1.x,
there are some incompatible changes in Cutlass 3.2.2 which the
Inductor cutlass backend needs to adapt to.

Test plan:

If third_party/cutlass is upgraded to Cutlass tag v3.2.2,
several tests within test/inductor/test_max_autotune.py start to
fail. With this diff applied, they pass again.

Differential Revision: [D50986555](https://our.internmc.facebook.com/intern/diff/D50986555)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112762
Approved by: https://github.com/ipiszy, https://github.com/drisspg
2023-11-04 00:10:50 +00:00
8f10a2321d [pytorch-vulkan] log, log_softmax (#112828)
Summary: tsia

Test Plan:
```
[yipjustin@189650.od ~/fbsource (631468db3)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck2 run fbcode/mode/dev-nosan    //xplat/caffe2:pt_vulkan_api_test_bin  -- --gtest_filter="*softmax*"
File changed: fbsource//xplat/caffe2/aten/src/ATen/test/vulkan_api_test.cpp
File changed: fbcode//caffe2/aten/src/ATen/native/vulkan/ops/Softmax.cpp
File changed: fbcode//caffe2/aten/src/ATen/test/vulkan_api_test.cpp
1 additional file change events
Buck UI: https://www.internalfb.com/buck2/d4f62e52-aba9-448a-a181-cf8881affb14
Network: Up: 0B  Down: 0B
Jobs completed: 4. Time elapsed: 0.5s.
BUILD SUCCEEDED
Running main() from third-party/googletest/1.11.0/googletest/googletest/src/gtest_main.cc
Note: Google Test filter = *softmax*
[==========] Running 2 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 2 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.softmax
[       OK ] VulkanAPITest.softmax (467 ms)
[ RUN      ] VulkanAPITest.log_softmax
[       OK ] VulkanAPITest.log_softmax (95 ms)
[----------] 2 tests from VulkanAPITest (563 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test suite ran. (563 ms total)
[  PASSED  ] 2 tests.

  YOU HAVE 1 DISABLED TEST

[yipjustin@189650.od ~/fbsource (631468db3)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck2 run fbcode/mode/dev-nosan    //xplat/caffe2:pt_vulkan_api_test_bin  -- --gtest_filter="*log*"
Buck UI: https://www.internalfb.com/buck2/e8210eb5-fd56-45f7-bf6c-5024931e778e
Network: Up: 0B  Down: 0B
Jobs completed: 4. Time elapsed: 0.2s.
BUILD SUCCEEDED
Running main() from third-party/googletest/1.11.0/googletest/googletest/src/gtest_main.cc
Note: Google Test filter = *log*
[==========] Running 4 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 4 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.log_softmax
[       OK ] VulkanAPITest.log_softmax (572 ms)
[ RUN      ] VulkanAPITest.unary_op_log
[       OK ] VulkanAPITest.unary_op_log (0 ms)
[ RUN      ] VulkanAPITest.unary_op_log_
[       OK ] VulkanAPITest.unary_op_log_ (59 ms)
[ RUN      ] VulkanAPITest.querypool_flushed_shader_log
xplat/caffe2/aten/src/ATen/test/vulkan_api_test.cpp:7677: Skipped
QueryPool is not available
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log (0 ms)
[----------] 4 tests from VulkanAPITest (633 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test suite ran. (633 ms total)
[  PASSED  ] 3 tests.
[  SKIPPED ] 1 test, listed below:
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log

  YOU HAVE 1 DISABLED TEST
```
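
For context, a minimal usage sketch of the newly supported ops (assumes a PyTorch build with Vulkan enabled; `.to("vulkan")` is the standard way to exercise this backend):

```
import torch

# Requires a build with USE_VULKAN=1; otherwise .to("vulkan") will fail.
x = torch.rand(2, 3)
x_vk = x.to("vulkan")

y = torch.log(x_vk)                                # newly added unary op
z = torch.nn.functional.log_softmax(x_vk, dim=-1)  # newly added op

# Copy back to CPU and compare against the reference implementations.
torch.testing.assert_close(y.cpu(), torch.log(x))
torch.testing.assert_close(z.cpu(), torch.nn.functional.log_softmax(x, dim=-1))
```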

Differential Revision: D50961359

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112828
Approved by: https://github.com/manuelcandales
2023-11-04 00:08:44 +00:00
df149581bc Tabulate outputs in inference benchmark (#112900)
- Fix error where the script was always compiling the model
- Make `runner.sh` parse outputs into a nice `.md` format (a minimal sketch follows below)
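
A minimal sketch of the tabulation step (the row data and field names are illustrative, not the actual benchmark output):

```
# Hypothetical benchmark rows; the real script parses these from runner output.
rows = [
    {"model": "resnet50", "mode": "eager", "latency_ms": 12.3},
    {"model": "resnet50", "mode": "compiled", "latency_ms": 7.9},
]

headers = list(rows[0])
lines = ["| " + " | ".join(headers) + " |",
         "| " + " | ".join("---" for _ in headers) + " |"]
for r in rows:
    lines.append("| " + " | ".join(str(r[h]) for h in headers) + " |")

print("\n".join(lines))  # ready to paste into a .md file
```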

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112900
Approved by: https://github.com/albanD
ghstack dependencies: #112582, #112863
2023-11-03 23:53:30 +00:00
6ba2748690 [Quant] [PT2] Enable Decomposed quant per tensor/channel to accept bfloat16 input (#112225)
**Summary**
- PR 4 for enabling Int8-Mixed-BF16 PT2E PTQ Quantization with Inductor https://github.com/pytorch/pytorch/issues/111640.
- Enable decomposed `quant_per_tensor` and `quant_per_channel` to accept bfloat16 input (a usage sketch follows the test plan).

**TestPlan**
```
python -m pytest test_quantized_tensor.py -k test_decomposed_quantize_per_tensor_bfloat16_input
python -m pytest test_quantized_tensor.py -k test_decomposed_quantize_per_channel_bfloat16_input
```
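
A usage sketch of what this enables (the op namespace and signature are taken from the decomposed quant ops under `torch/ao/quantization`, but treat them as an assumption since they are internal):

```
import torch

x = torch.randn(4, 4, dtype=torch.bfloat16)  # bfloat16 input now accepted
# quantize_per_tensor(input, scale, zero_point, quant_min, quant_max, dtype)
xq = torch.ops.quantized_decomposed.quantize_per_tensor(
    x, 0.05, 0, -128, 127, torch.int8
)
print(xq.dtype)  # torch.int8
```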

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112225
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-11-03 23:47:43 +00:00
67e8762e83 [Inductor] Kill has_aliasing (#112875)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112875
Approved by: https://github.com/Chillee
2023-11-03 23:22:22 +00:00
65b74c9254 Make init_process_group timeout kwarg override pg_options (#112611)
This used to be ambiguous: the `pg_options._timeout` value, if passed
in, was being ignored. Make the behavior sane and warn if two values are provided.
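
A sketch of the now-unambiguous call (standard `torch.distributed` API; the warning on conflicting values is what this change adds):

```
from datetime import timedelta

import torch.distributed as dist

# Assumes the usual rendezvous env vars (MASTER_ADDR, MASTER_PORT, RANK,
# WORLD_SIZE) are set by the launcher.
dist.init_process_group(
    backend="nccl",
    timeout=timedelta(minutes=30),  # this kwarg now reliably takes effect;
                                    # passing a pg_options timeout as well warns
)
```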
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112611
Approved by: https://github.com/H-Huang
2023-11-03 23:13:03 +00:00
fa81237af7 [HigherOrderOp] remove _deprecated_global_ns (#112757)
As titled.

Test Plan:
existing test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112757
Approved by: https://github.com/zou3519
2023-11-03 23:03:18 +00:00
55971c5c4e Enable concurrent reader for getRecord function (#112818)
Summary:
Use multiple concurrent readers to access a record starting from different indices. This can provide better performance when the data being accessed is large.
bypass-github-pytorch-ci-checks
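
A generic sketch of the idea using Python threads (not the actual C++ reader; names are illustrative):

```
import threading

def _read_chunk(path, offset, length, out, idx):
    # Each reader opens its own handle so seeks do not interfere.
    with open(path, "rb") as f:
        f.seek(offset)
        out[idx] = f.read(length)

def concurrent_read(path, total_size, num_readers=4):
    chunk = (total_size + num_readers - 1) // num_readers
    out = [b""] * num_readers
    threads = []
    for i in range(num_readers):
        length = max(0, min(chunk, total_size - i * chunk))
        t = threading.Thread(target=_read_chunk,
                             args=(path, i * chunk, length, out, i))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return b"".join(out)
```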

Test Plan:
```
buck2 run @//mode/dev //caffe2/caffe2/serialize:inline_container_test
```

Reviewed By: YazhiGao

Differential Revision: D50957607

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112818
Approved by: https://github.com/houseroad, https://github.com/huydhn
2023-11-03 22:55:27 +00:00
57a3af900e Add suggested changes to init.py (#112864)
A follow-up of PR #112617 on issue #112596.

Added suggested changes from the review:
- Be more specific about the type of uniform and normal distribution used.

```py
def xavier_uniform_(tensor: Tensor, gain: float = 1.) -> Tensor:
    r"""Fill the input `Tensor` with values using a Xavier uniform distribution.

    The method is described in `Understanding the difficulty of training...
"""
```

```py
def kaiming_normal_(
    tensor: Tensor, a: float = 0, mode: str = 'fan_in', nonlinearity: str = 'leaky_relu'
):
    r"""Fill the input `Tensor` with values using a Kaiming normal distribution.

    The method is described in `Delving deep into rectifiers: Surpassing...
"""
```
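
For reference, a quick usage sketch of the two initializers whose docstrings changed (standard `torch.nn.init` API):

```
import torch
from torch import nn

w = torch.empty(3, 5)

# Xavier/Glorot: uniform distribution scaled by fan_in + fan_out.
nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain("relu"))

# Kaiming/He: normal distribution scaled by fan_in (the default mode).
nn.init.kaiming_normal_(w, mode="fan_in", nonlinearity="leaky_relu")
```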

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112864
Approved by: https://github.com/kit1980
2023-11-03 22:46:48 +00:00
973f730dda [DCP] Add test for planner option for load_sharded_optimizer_state_dict (#112891)
Add test for a user-submitted PR: https://github.com/pytorch/pytorch/pull/112259
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112891
Approved by: https://github.com/fegin
2023-11-03 22:37:49 +00:00
63fc48257a Configure labeler for 'module: distributed' (#112812)
To opt in to getting notified based on this label, simply add yourself
to the summary field at the top of this issue:
https://github.com/pytorch/pytorch/issues/24422

Uses same file paths as current CODEOWNERS.

Note: we can easily add sub-labels for components within distributed if we
want.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112812
Approved by: https://github.com/wanchaol, https://github.com/fduwjj
2023-11-03 21:51:49 +00:00
6e1494ec7c correct output dir (#112760)
I was incorrectly overwriting the cudagraphs freezing dir for cudagraphs freezing autotune.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112760
Approved by: https://github.com/desertfire
2023-11-03 21:19:44 +00:00
f58ecd4823 docs: fix docstrings for datapipes and other (#112765)
Fixes #112636
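
For context, the `Dxxx` codes below are pydocstyle rules. A before/after sketch of a typical D205 fix (illustrative, not a specific file from this PR):

```
# Before: D205 -- no blank line between summary line and description.
def reset(self):
    """Reset the DataPipe to its initial state,
    so that it can be iterated again."""

# After: one-sentence summary, blank line, then the description.
def reset(self):  # redefinition is intentional in this sketch
    """Reset the DataPipe to its initial state.

    After this call the pipe can be iterated again from the beginning.
    """
```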

Before: 265
```
torch/utils/data/datapipes/dataframe/structures.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/dataframe/structures.py:8 in public class `DataChunkDF`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/dataframe/structures.py:8 in public class `DataChunkDF`:
        D208: Docstring is over-indented
torch/utils/data/datapipes/dataframe/structures.py:8 in public class `DataChunkDF`:
        D400: First line should end with a period (not ',')
torch/utils/data/datapipes/dataframe/structures.py:13 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/dataframe/structures.py:17 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/datapipe.py:43 in public class `IterDataPipe`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/datapipe.py:119 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:122 in public method `__getattr__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:135 in public method `register_function`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/datapipe.py:139 in public method `register_datapipe_as_function`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/datapipe.py:161 in public method `__getstate__`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/datapipe.py:161 in public method `__getstate__`:
        D401: First line should be in imperative mood; try rephrasing (found 'This')
torch/utils/data/datapipes/datapipe.py:171 in public method `__reduce_ex__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:180 in public method `set_getstate_hook`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/datapipe.py:186 in public method `set_reduce_ex_hook`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/datapipe.py:191 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:197 in public method `__str__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:203 in public method `__dir__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:208 in public method `reset`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/datapipe.py:208 in public method `reset`:
        D400: First line should end with a period (not ',')
torch/utils/data/datapipes/datapipe.py:217 in public class `DFIterDataPipe`:
        D101: Missing docstring in public class
torch/utils/data/datapipes/datapipe.py:223 in public class `MapDataPipe`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/datapipe.py:261 in public method `__getattr__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:274 in public method `register_function`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/datapipe.py:278 in public method `register_datapipe_as_function`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/datapipe.py:293 in public method `__getstate__`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/datapipe.py:293 in public method `__getstate__`:
        D401: First line should be in imperative mood; try rephrasing (found 'This')
torch/utils/data/datapipes/datapipe.py:303 in public method `__reduce_ex__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:312 in public method `set_getstate_hook`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/datapipe.py:318 in public method `set_reduce_ex_hook`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/datapipe.py:323 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:329 in public method `__str__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:335 in public method `__dir__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:392 in public class `DataChunk`:
        D101: Missing docstring in public class
torch/utils/data/datapipes/datapipe.py:393 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/datapipe.py:397 in public method `as_str`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/datapipe.py:401 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:404 in public method `raw_iterator`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/iter/callable.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/iter/callable.py:23 in public class `MapperIterDataPipe`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/iter/callable.py:23 in public class `MapperIterDataPipe`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/iter/callable.py:63 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/callable.py:121 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/callable.py:125 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/callable.py:173 in public class `CollatorIterDataPipe`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/iter/callable.py:213 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/combinatorics.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/iter/combinatorics.py:18 in public class `SamplerIterDataPipe`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/iter/combinatorics.py:29 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/combinatorics.py:44 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combinatorics.py:47 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combinatorics.py:56 in public class `ShufflerIterDataPipe`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/iter/combinatorics.py:56 in public class `ShufflerIterDataPipe`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/iter/combinatorics.py:56 in public class `ShufflerIterDataPipe`:
        D400: First line should end with a period (not 'r')
torch/utils/data/datapipes/iter/combinatorics.py:94 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/combinatorics.py:114 in public method `set_shuffle`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/iter/combinatorics.py:118 in public method `set_seed`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/iter/combinatorics.py:122 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combinatorics.py:137 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combinatorics.py:142 in public method `reset`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/iter/combinatorics.py:150 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combinatorics.py:165 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combinatorics.py:179 in public method `__del__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combining.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/iter/combining.py:26 in public class `ConcaterIterDataPipe`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/iter/combining.py:26 in public class `ConcaterIterDataPipe`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/iter/combining.py:26 in public class `ConcaterIterDataPipe`:
        D400: First line should end with a period (not 'l')
torch/utils/data/datapipes/iter/combining.py:44 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/combining.py:51 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combining.py:55 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combining.py:64 in public class `ForkerIterDataPipe`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/iter/combining.py:92 in public method `__new__`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/iter/combining.py:108 in private class `_ContainerTemplate`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/iter/combining.py:108 in private class `_ContainerTemplate`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/iter/combining.py:108 in private class `_ContainerTemplate`:
        D400: First line should end with a period (not 'd')
torch/utils/data/datapipes/iter/combining.py:126 in private method `get_length_by_instance`:
        D200: One-line docstring should fit on one line with quotes (found 3)
torch/utils/data/datapipes/iter/combining.py:126 in private method `get_length_by_instance`:
        D400: First line should end with a period (not '`')
torch/utils/data/datapipes/iter/combining.py:136 in private class `_ForkerIterDataPipe`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/iter/combining.py:136 in private class `_ForkerIterDataPipe`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/iter/combining.py:136 in private class `_ForkerIterDataPipe`:
        D400: First line should end with a period (not 's')
torch/utils/data/datapipes/iter/combining.py:275 in private class `_ChildDataPipe`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/iter/combining.py:275 in private class `_ChildDataPipe`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/iter/combining.py:275 in private class `_ChildDataPipe`:
        D400: First line should end with a period (not 's')
torch/utils/data/datapipes/iter/combining.py:320 in private method `_set_main_datapipe_valid_iterator_id`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/iter/combining.py:343 in private method `_check_valid_iterator_id`:
        D200: One-line docstring should fit on one line with quotes (found 3)
torch/utils/data/datapipes/iter/combining.py:351 in public class `DemultiplexerIterDataPipe`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/iter/combining.py:351 in public class `DemultiplexerIterDataPipe`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/iter/combining.py:351 in public class `DemultiplexerIterDataPipe`:
        D400: First line should end with a period (not 'n')
torch/utils/data/datapipes/iter/combining.py:384 in public method `__new__`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/iter/combining.py:399 in private class `_DemultiplexerIterDataPipe`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/iter/combining.py:399 in private class `_DemultiplexerIterDataPipe`:
        D400: First line should end with a period (not 's')
torch/utils/data/datapipes/iter/combining.py:534 in public class `MultiplexerIterDataPipe`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/iter/combining.py:534 in public class `MultiplexerIterDataPipe`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/iter/combining.py:534 in public class `MultiplexerIterDataPipe`:
        D400: First line should end with a period (not ',')
torch/utils/data/datapipes/iter/combining.py:549 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/combining.py:553 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combining.py:566 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combining.py:572 in public method `reset`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/iter/combining.py:575 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combining.py:585 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combining.py:593 in public method `__del__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combining.py:599 in public class `ZipperIterDataPipe`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/iter/combining.py:599 in public class `ZipperIterDataPipe`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/iter/combining.py:615 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/combining.py:622 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combining.py:626 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/filelister.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/iter/filelister.py:15 in public class `FileListerIterDataPipe`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/iter/filelister.py:36 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/filelister.py:58 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/filelister.py:62 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/fileopener.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/iter/fileopener.py:15 in public class `FileOpenerIterDataPipe`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/iter/fileopener.py:15 in public class `FileOpenerIterDataPipe`:
        D400: First line should end with a period (not 'm')
torch/utils/data/datapipes/iter/fileopener.py:42 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/fileopener.py:66 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/fileopener.py:69 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/grouping.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/iter/grouping.py:31 in public class `BatcherIterDataPipe`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/iter/grouping.py:31 in public class `BatcherIterDataPipe`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/iter/grouping.py:31 in public class `BatcherIterDataPipe`:
        D400: First line should end with a period (not 's')
torch/utils/data/datapipes/iter/grouping.py:55 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/grouping.py:68 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/grouping.py:79 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/grouping.py:91 in public class `UnBatcherIterDataPipe`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/iter/grouping.py:91 in public class `UnBatcherIterDataPipe`:
        D400: First line should end with a period (not 'l')
torch/utils/data/datapipes/iter/grouping.py:112 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/grouping.py:118 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/grouping.py:143 in public class `GrouperIterDataPipe`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/iter/grouping.py:143 in public class `GrouperIterDataPipe`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/iter/grouping.py:143 in public class `GrouperIterDataPipe`:
        D400: First line should end with a period (not ',')
torch/utils/data/datapipes/iter/grouping.py:185 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/grouping.py:233 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/grouping.py:257 in public method `reset`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/iter/grouping.py:261 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/grouping.py:278 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/grouping.py:294 in public method `__del__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/routeddecoder.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/iter/routeddecoder.py:19 in public class `RoutedDecoderIterDataPipe`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/iter/routeddecoder.py:19 in public class `RoutedDecoderIterDataPipe`:
        D400: First line should end with a period (not 'a')
torch/utils/data/datapipes/iter/routeddecoder.py:37 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/routeddecoder.py:53 in public method `add_handler`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/iter/routeddecoder.py:56 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/routeddecoder.py:62 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/selecting.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/iter/selecting.py:21 in public class `FilterIterDataPipe`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/iter/selecting.py:46 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/selecting.py:70 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/sharding.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/iter/sharding.py:17 in public class `SHARDING_PRIORITIES`:
        D101: Missing docstring in public class
torch/utils/data/datapipes/iter/sharding.py:30 in public class `ShardingFilterIterDataPipe`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/iter/sharding.py:30 in public class `ShardingFilterIterDataPipe`:
        D400: First line should end with a period (not 's')
torch/utils/data/datapipes/iter/sharding.py:39 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/sharding.py:47 in public method `apply_sharding`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/iter/sharding.py:74 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/sharding.py:79 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/streamreader.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/iter/streamreader.py:10 in public class `StreamReaderIterDataPipe`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/iter/streamreader.py:10 in public class `StreamReaderIterDataPipe`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/iter/streamreader.py:10 in public class `StreamReaderIterDataPipe`:
        D400: First line should end with a period (not 'l')
torch/utils/data/datapipes/iter/streamreader.py:27 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/streamreader.py:31 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/utils.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/iter/utils.py:9 in public class `IterableWrapperIterDataPipe`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/iter/utils.py:29 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/utils.py:33 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/utils.py:49 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/callable.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/map/callable.py:14 in public function `default_fn`:
        D103: Missing docstring in public function
torch/utils/data/datapipes/map/callable.py:20 in public class `MapperMapDataPipe`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/map/callable.py:20 in public class `MapperMapDataPipe`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/map/callable.py:45 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/map/callable.py:55 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/callable.py:58 in public method `__getitem__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/combinatorics.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/map/combinatorics.py:15 in public class `ShufflerIterDataPipe`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/map/combinatorics.py:55 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/map/combinatorics.py:68 in public method `set_shuffle`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/map/combinatorics.py:72 in public method `set_seed`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/map/combinatorics.py:76 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/combinatorics.py:85 in public method `reset`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/map/combinatorics.py:92 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/combinatorics.py:95 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/combinatorics.py:110 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/combining.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/map/combining.py:12 in public class `ConcaterMapDataPipe`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/map/combining.py:12 in public class `ConcaterMapDataPipe`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/map/combining.py:34 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/map/combining.py:43 in public method `__getitem__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/combining.py:52 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/combining.py:58 in public class `ZipperMapDataPipe`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/map/combining.py:58 in public class `ZipperMapDataPipe`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/map/combining.py:76 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/map/combining.py:85 in public method `__getitem__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/combining.py:94 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/grouping.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/map/grouping.py:12 in public class `BatcherMapDataPipe`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/map/grouping.py:12 in public class `BatcherMapDataPipe`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/map/grouping.py:12 in public class `BatcherMapDataPipe`:
        D400: First line should end with a period (not 's')
torch/utils/data/datapipes/map/grouping.py:34 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/map/grouping.py:47 in public method `__getitem__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/grouping.py:60 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/utils.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/map/utils.py:9 in public class `SequenceWrapperMapDataPipe`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/map/utils.py:32 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/map/utils.py:45 in public method `__getitem__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/utils.py:48 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/utils/common.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/utils/common.py:26 in public function `validate_input_col`:
        D400: First line should end with a period (not 'n')
torch/utils/data/datapipes/utils/common.py:26 in public function `validate_input_col`:
        D401: First line should be in imperative mood (perhaps 'Check', not 'Checks')
torch/utils/data/datapipes/utils/common.py:127 in private function `_check_unpickable_fn`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/utils/common.py:127 in private function `_check_unpickable_fn`:
        D400: First line should end with a period (not 'g')
torch/utils/data/datapipes/utils/common.py:127 in private function `_check_unpickable_fn`:
        D401: First line should be in imperative mood (perhaps 'Check', not 'Checks')
torch/utils/data/datapipes/utils/common.py:156 in public function `match_masks`:
        D103: Missing docstring in public function
torch/utils/data/datapipes/utils/common.py:170 in public function `get_file_pathnames_from_root`:
        D103: Missing docstring in public function
torch/utils/data/datapipes/utils/common.py:207 in public function `get_file_binaries_from_pathnames`:
        D103: Missing docstring in public function
torch/utils/data/datapipes/utils/common.py:220 in public function `validate_pathname_binary_tuple`:
        D103: Missing docstring in public function
torch/utils/data/datapipes/utils/common.py:290 in public class `StreamWrapper`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/utils/common.py:290 in public class `StreamWrapper`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/utils/common.py:290 in public class `StreamWrapper`:
        D400: First line should end with a period (not 'y')
torch/utils/data/datapipes/utils/common.py:298 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/utils/common.py:315 in public method `close_streams`:
        D200: One-line docstring should fit on one line with quotes (found 3)
torch/utils/data/datapipes/utils/common.py:331 in public method `__getattr__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/utils/common.py:335 in public method `close`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/utils/common.py:351 in public method `autoclose`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/utils/common.py:351 in public method `autoclose`:
        D400: First line should end with a period (not 's')
torch/utils/data/datapipes/utils/common.py:359 in public method `__dir__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/utils/common.py:364 in public method `__del__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/utils/common.py:368 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/utils/common.py:371 in public method `__next__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/utils/common.py:374 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/utils/common.py:380 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/utils/common.py:383 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/utils/decoder.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/utils/decoder.py:31 in public function `basichandlers`:
        D103: Missing docstring in public function
torch/utils/data/datapipes/utils/decoder.py:87 in public function `handle_extension`:
        D202: No blank lines allowed after function docstring (found 1)
torch/utils/data/datapipes/utils/decoder.py:87 in public function `handle_extension`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/utils/decoder.py:87 in public function `handle_extension`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/utils/data/datapipes/utils/decoder.py:115 in public class `ImageHandler`:
        D204: 1 blank line required after class docstring (found 0)
torch/utils/data/datapipes/utils/decoder.py:115 in public class `ImageHandler`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/utils/decoder.py:139 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/utils/decoder.py:143 in public method `__call__`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/utils/decoder.py:187 in public function `imagehandler`:
        D103: Missing docstring in public function
torch/utils/data/datapipes/utils/decoder.py:194 in public function `videohandler`:
        D103: Missing docstring in public function
torch/utils/data/datapipes/utils/decoder.py:215 in public function `audiohandler`:
        D103: Missing docstring in public function
torch/utils/data/datapipes/utils/decoder.py:236 in public class `MatHandler`:
        D101: Missing docstring in public class
torch/utils/data/datapipes/utils/decoder.py:237 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/utils/decoder.py:247 in public method `__call__`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/utils/decoder.py:253 in public function `mathandler`:
        D103: Missing docstring in public function
torch/utils/data/datapipes/utils/decoder.py:261 in public function `extension_extract_fn`:
        D103: Missing docstring in public function
torch/utils/data/datapipes/utils/decoder.py:270 in public class `Decoder`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/utils/decoder.py:276 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/utils/decoder.py:282 in public method `add_handler`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/utils/decoder.py:292 in public method `decode1`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/utils/decoder.py:309 in public method `decode`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/utils/decoder.py:326 in public method `__call__`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/utils/snapshot.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/utils/snapshot.py:11 in private function `_simple_graph_snapshot_restoration`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/data/datapipes/utils/snapshot.py:11 in private function `_simple_graph_snapshot_restoration`:
        D400: First line should end with a period (not ',')
torch/utils/data/datapipes/utils/snapshot.py:11 in private function `_simple_graph_snapshot_restoration`:
        D401: First line should be in imperative mood; try rephrasing (found 'This')
torch/utils/tensorboard/_convert_np.py:1 at module level:
        D200: One-line docstring should fit on one line with quotes (found 3)
torch/utils/tensorboard/_convert_np.py:9 in public function `make_np`:
        D205: 1 blank line required between summary line and description (found 0)
torch/utils/tensorboard/_convert_np.py:9 in public function `make_np`:
        D400: First line should end with a period (not ':')
265
```

After: 166
```
torch/utils/data/datapipes/dataframe/structures.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/dataframe/structures.py:10 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/dataframe/structures.py:14 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/datapipe.py:120 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:123 in public method `__getattr__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:136 in public method `register_function`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/datapipe.py:140 in public method `register_datapipe_as_function`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/datapipe.py:173 in public method `__reduce_ex__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:182 in public method `set_getstate_hook`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/datapipe.py:188 in public method `set_reduce_ex_hook`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/datapipe.py:193 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:199 in public method `__str__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:205 in public method `__dir__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:221 in public class `DFIterDataPipe`:
        D101: Missing docstring in public class
torch/utils/data/datapipes/datapipe.py:266 in public method `__getattr__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:279 in public method `register_function`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/datapipe.py:283 in public method `register_datapipe_as_function`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/datapipe.py:309 in public method `__reduce_ex__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:318 in public method `set_getstate_hook`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/datapipe.py:324 in public method `set_reduce_ex_hook`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/datapipe.py:329 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:335 in public method `__str__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:341 in public method `__dir__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:398 in public class `DataChunk`:
        D101: Missing docstring in public class
torch/utils/data/datapipes/datapipe.py:399 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/datapipe.py:403 in public method `as_str`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/datapipe.py:407 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/datapipe.py:410 in public method `raw_iterator`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/iter/callable.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/iter/callable.py:65 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/callable.py:123 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/callable.py:127 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/callable.py:216 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/combinatorics.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/iter/combinatorics.py:30 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/combinatorics.py:45 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combinatorics.py:48 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combinatorics.py:97 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/combinatorics.py:117 in public method `set_shuffle`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/iter/combinatorics.py:121 in public method `set_seed`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/iter/combinatorics.py:125 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combinatorics.py:140 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combinatorics.py:145 in public method `reset`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/iter/combinatorics.py:153 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combinatorics.py:168 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combinatorics.py:182 in public method `__del__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combining.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/iter/combining.py:46 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/combining.py:53 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combining.py:57 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combining.py:95 in public method `__new__`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/iter/combining.py:388 in public method `__new__`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/iter/combining.py:556 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/combining.py:560 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combining.py:573 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combining.py:579 in public method `reset`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/iter/combining.py:582 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combining.py:592 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combining.py:600 in public method `__del__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combining.py:624 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/combining.py:631 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/combining.py:635 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/filelister.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/iter/filelister.py:37 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/filelister.py:59 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/filelister.py:63 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/fileopener.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/iter/fileopener.py:41 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/fileopener.py:65 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/fileopener.py:68 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/grouping.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/iter/grouping.py:57 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/grouping.py:70 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/grouping.py:81 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/grouping.py:115 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/grouping.py:121 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/grouping.py:190 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/grouping.py:238 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/grouping.py:262 in public method `reset`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/iter/grouping.py:266 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/grouping.py:283 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/grouping.py:299 in public method `__del__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/routeddecoder.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/iter/routeddecoder.py:38 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/routeddecoder.py:54 in public method `add_handler`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/iter/routeddecoder.py:57 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/routeddecoder.py:63 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/selecting.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/iter/selecting.py:47 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/selecting.py:71 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/sharding.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/iter/sharding.py:17 in public class `SHARDING_PRIORITIES`:
        D101: Missing docstring in public class
torch/utils/data/datapipes/iter/sharding.py:40 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/sharding.py:48 in public method `apply_sharding`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/iter/sharding.py:75 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/sharding.py:80 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/streamreader.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/iter/streamreader.py:29 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/streamreader.py:33 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/utils.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/iter/utils.py:30 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/iter/utils.py:34 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/iter/utils.py:50 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/callable.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/map/callable.py:14 in public function `default_fn`:
        D103: Missing docstring in public function
torch/utils/data/datapipes/map/callable.py:47 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/map/callable.py:57 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/callable.py:60 in public method `__getitem__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/combinatorics.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/map/combinatorics.py:56 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/map/combinatorics.py:69 in public method `set_shuffle`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/map/combinatorics.py:73 in public method `set_seed`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/map/combinatorics.py:77 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/combinatorics.py:86 in public method `reset`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/map/combinatorics.py:93 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/combinatorics.py:96 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/combinatorics.py:111 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/combining.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/map/combining.py:36 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/map/combining.py:45 in public method `__getitem__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/combining.py:54 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/combining.py:80 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/map/combining.py:89 in public method `__getitem__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/combining.py:98 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/grouping.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/map/grouping.py:36 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/map/grouping.py:49 in public method `__getitem__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/grouping.py:62 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/utils.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/map/utils.py:33 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/map/utils.py:46 in public method `__getitem__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/map/utils.py:49 in public method `__len__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/utils/common.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/utils/common.py:157 in public function `match_masks`:
        D103: Missing docstring in public function
torch/utils/data/datapipes/utils/common.py:171 in public function `get_file_pathnames_from_root`:
        D103: Missing docstring in public function
torch/utils/data/datapipes/utils/common.py:208 in public function `get_file_binaries_from_pathnames`:
        D103: Missing docstring in public function
torch/utils/data/datapipes/utils/common.py:221 in public function `validate_pathname_binary_tuple`:
        D103: Missing docstring in public function
torch/utils/data/datapipes/utils/common.py:300 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/utils/common.py:331 in public method `__getattr__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/utils/common.py:335 in public method `close`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/utils/common.py:356 in public method `__dir__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/utils/common.py:361 in public method `__del__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/utils/common.py:365 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/utils/common.py:368 in public method `__next__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/utils/common.py:371 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/utils/common.py:377 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/utils/common.py:380 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/utils/data/datapipes/utils/decoder.py:1 at module level:
        D100: Missing docstring in public module
torch/utils/data/datapipes/utils/decoder.py:31 in public function `basichandlers`:
        D103: Missing docstring in public function
torch/utils/data/datapipes/utils/decoder.py:141 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/utils/decoder.py:145 in public method `__call__`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/utils/decoder.py:189 in public function `imagehandler`:
        D103: Missing docstring in public function
torch/utils/data/datapipes/utils/decoder.py:196 in public function `videohandler`:
        D103: Missing docstring in public function
torch/utils/data/datapipes/utils/decoder.py:217 in public function `audiohandler`:
        D103: Missing docstring in public function
torch/utils/data/datapipes/utils/decoder.py:238 in public class `MatHandler`:
        D101: Missing docstring in public class
torch/utils/data/datapipes/utils/decoder.py:239 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/utils/decoder.py:249 in public method `__call__`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/utils/decoder.py:255 in public function `mathandler`:
        D103: Missing docstring in public function
torch/utils/data/datapipes/utils/decoder.py:263 in public function `extension_extract_fn`:
        D103: Missing docstring in public function
torch/utils/data/datapipes/utils/decoder.py:279 in public method `__init__`:
        D107: Missing docstring in __init__
torch/utils/data/datapipes/utils/decoder.py:285 in public method `add_handler`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/utils/decoder.py:295 in public method `decode1`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/utils/decoder.py:312 in public method `decode`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/utils/decoder.py:329 in public method `__call__`:
        D102: Missing docstring in public method
torch/utils/data/datapipes/utils/snapshot.py:1 at module level:
        D100: Missing docstring in public module
166
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112765
Approved by: https://github.com/ejguan
2023-11-03 21:01:19 +00:00
132cb57e47 Skip aliasing correction for lift_fresh. (#112202)
Fix: #111506

This PR skips aliasing correction on `lift_fresh` calls. Reasoning is: although unlifted and lifted tensors are technically aliases, they are from different levels of abstraction (`FunctionalTensorWrapper` and `XLATensor`).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112202
Approved by: https://github.com/bdhirsh
2023-11-03 20:46:30 +00:00
c799689437 Refactor inference benchmark and add runner script to do sweep (#112863)
- Added `runner.sh` that does a sweep over `batch_size=(1, 32, 64, 128, 256)` and `compile=(True, False)`
- Added GPU utilization as a metric
- Converted the frontend from 2 processes (one putting requests into `request_queue`, one reading from `response_queue` and collecting metrics) to a single process with 3 threads: one putting requests into `request_queue`, one reading from `response_queue` and collecting metrics, and one polling `nvidia-smi` for GPU utilization (see the sketch below)
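
A stdlib-only sketch of that thread layout, with a stand-in backend so it runs end to end (names and counts here are illustrative, not the benchmark's actual code):

```python
import queue
import threading

request_queue: queue.Queue = queue.Queue()
response_queue: queue.Queue = queue.Queue()

def send_requests(n):
    for i in range(n):
        request_queue.put(i)                     # thread 1: feed requests

def collect_metrics(n):
    for _ in range(n):
        _ = response_queue.get()                 # thread 2: gather responses/metrics

def poll_gpu(stop: threading.Event):
    while not stop.wait(0.5):
        pass                                     # thread 3: shell out to nvidia-smi here

def fake_backend(n):                             # stand-in for the model server
    for _ in range(n):
        response_queue.put(request_queue.get())

stop = threading.Event()
workers = [
    threading.Thread(target=send_requests, args=(8,)),
    threading.Thread(target=fake_backend, args=(8,)),
    threading.Thread(target=collect_metrics, args=(8,)),
    threading.Thread(target=poll_gpu, args=(stop,)),
]
for t in workers:
    t.start()
for t in workers[:3]:
    t.join()
stop.set()
workers[3].join()
```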

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112863
Approved by: https://github.com/albanD
ghstack dependencies: #112582
2023-11-03 20:26:43 +00:00
cyy
dc1a3581e4 Remove c10::variant (#112725)
Maybe it's time to remove.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112725
Approved by: https://github.com/albanD
2023-11-03 18:31:58 +00:00
a91baaf314 [aotinductor] Solves a problem where a tensor is returned more than once (#112177)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112177
Approved by: https://github.com/zhxchen17
2023-11-03 18:26:08 +00:00
a3db4377eb docs: Fix some docstring errors in torch.nn.utils parametrize/spectral_norm/stateless (#112786)
Fixes https://github.com/pytorch/pytorch/issues/112630

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112786
Approved by: https://github.com/lezcano
2023-11-03 18:19:43 +00:00
d084a024ae [easy] skipIfTorchInductor - use condition variable (#112774)
Fixes #112465
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112774
Approved by: https://github.com/jon-chuang, https://github.com/aaronenyeshi
2023-11-03 17:55:32 +00:00
2c3ab60506 [profiler] skip flop compute for Nested tensor (#112767)
Summary:
Since nested tensors don't have size(), the profiler throws an exception in saveExtraArgs() when with_flops is turned on.

It is tricky to support flop computation for nested tensors because they have dynamic shapes, so skip the flop computation for nested tensors for now instead of throwing an exception.

Test Plan:
Used the profiler with an NT; the log shows this warning instead of throwing.
```/torch/nested/_internal/nested_tensor.py:205: UserWarning: Failed to save extra arguments for flops computation of op aten::add with input[0] as nested tensor. (Triggered internally at fbcode/caffe2/torch/csrc/profiler/util.cpp:433.)```
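
A minimal sketch that exercises this path (shapes are illustrative):

```python
import torch
from torch.profiler import ProfilerActivity, profile

nt = torch.nested.nested_tensor([torch.randn(2, 3), torch.randn(4, 3)])
with profile(activities=[ProfilerActivity.CPU], with_flops=True):
    nt + nt  # flop counting is now skipped with a warning instead of raising
```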

Differential Revision: D50919789

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112767
Approved by: https://github.com/aaronenyeshi
2023-11-03 17:44:00 +00:00
43fb5147e2 [BE] Enable Ruff's Flake8 PYI001 (#112823)
Enable [unprefixed-type-param (PYI001)](https://docs.astral.sh/ruff/rules/unprefixed-type-param/#unprefixed-type-param-pyi001)

Link: #110950
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112823
Approved by: https://github.com/Skylion007
2023-11-03 17:25:39 +00:00
e2e5897269 [CI] Do not use packaging in run_tests.py (#112873)
It used to check that CUDA is newer than 11.6, but all supported CUDA versions already are.

Yet another mitigation for the missing `packaging` module on macOS.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112873
Approved by: https://github.com/huydhn
2023-11-03 17:22:46 +00:00
feb479757f Make addc[mul|div] support different out dtypes (#112682)
By adding `.cast_common_dtype_to_outputs(true)` to `build_ternary_op`.
According to profiling, this change does not result in an additional kernel invocation on GPUs, i.e. the following script
```python
import torch
def bench_addcdiv(size=(32*1024**2, 5), device="cuda"):
    x=torch.rand(size, device=device, dtype=torch.float)
    y=torch.rand(size, device=device, dtype=torch.double)
    with torch.profiler.profile(activities=[torch.profiler.ProfilerActivity.CUDA]) as prof:
      torch.addcdiv(x, x, x, out=y)
    rc=prof.key_averages()
    print(rc)

if __name__ == "__main__":
    bench_addcdiv()
```
shows that before and after the change it takes roughly the same time to finish the computation.
Before:
```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                       cudaLaunchKernel        92.99%      20.096ms        92.99%      20.096ms      20.096ms       0.000us         0.00%       0.000us       0.000us             1
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us       1.605ms       100.00%       1.605ms       1.605ms             1
                                  cudaDeviceSynchronize         7.01%       1.515ms         7.01%       1.515ms       1.515ms       0.000us         0.00%       0.000us       0.000us             1
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 21.611ms
Self CUDA time total: 1.605ms
```
After:
```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                       cudaLaunchKernel        92.92%      19.996ms        92.92%      19.996ms      19.996ms       0.000us         0.00%       0.000us       0.000us             1
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us       1.603ms       100.00%       1.603ms       1.603ms             1
                                  cudaDeviceSynchronize         7.08%       1.523ms         7.08%       1.523ms       1.523ms       0.000us         0.00%       0.000us       0.000us             1
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 21.519ms
Self CUDA time total: 1.603ms
```
Add regression test.

Fixes https://github.com/pytorch/pytorch/issues/112490

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112682
Approved by: https://github.com/albanD
2023-11-03 17:03:06 +00:00
028e4fc6fa Add packaging to requirements-macOS.txt (#112854)
Fixes https://github.com/pytorch/pytorch/issues/102299 and https://github.com/pytorch/pytorch/issues/112832

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112854
Approved by: https://github.com/DanilBaibak, https://github.com/huydhn
2023-11-03 16:55:12 +00:00
44a28a5efa [DCP][test] Make dim_0 size of params scale with world_size in torch/distributed/checkpoint/test_fsdp_optim_state.py (#112825)
Make dim_0 size of params scale with world_size so it can be used to test the impact on performance when scaling up. More context of performance improvement is added in: https://github.com/pytorch/pytorch/pull/111687

For this cherry-pick pair, we remove the `_shard_tensor()` call in `load_sharded_optimizer_state_dict()` in optimizer.py, which is reported to scale poorly with the number of GPUs. The reason is that `_shard_tensor()` calls into `dist.all_gather_object()`, which is extremely expensive in communication when world_size becomes large.

main: https://github.com/pytorch/pytorch/pull/111096
cherry-pick: https://github.com/pytorch/pytorch/pull/111687

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112825
Approved by: https://github.com/fegin
2023-11-03 16:37:57 +00:00
fd6e571207 [aot_autograd / dynamo] restore grad_mode and other globals to state prior to tracing; add grad_mode mutations to runtime wrapper (#112396)
Fixes https://github.com/pytorch/pytorch/issues/112072

Grad mode mutations, which are the responsibility of aotautograd, need to be persisted outside of the graph as side-effects in the runtime wrapper.

To facilitate this, and to maintain global state hygiene, we restore the grad mode to its value prior to tracing, for both dynamo (alongside other global states) and aot_autograd.

This is in line with the assumption that aot_autograd should work as though it were called from eager, before the given GraphModule has been run.

It is assumed that other global states (autocast mode, torch function) already maintain hygiene via their context manager APIs.

---

### Future Work?

Should we also do this for:
1. autocast mode
2. torch_function_enabled

Answer: no. (at least at present)

It is assumed that other global states (autocast mode, torch function) already maintain hygiene via their context manager APIs.

Furthermore, mutating this state directly is currently unsupported in dynamo, unlike `set_grad_enabled`

Repro:
```python
import torch
def fn(x):
    x = x + 1
    torch.set_autocast_enabled(True)
    return x + 1

print(torch.compile(fn, fullgraph=True)(torch.zeros(1)))

# torch._dynamo.exc.Unsupported: call_method UserDefinedObjectVariable(set_autocast_enabled) __call__ [ConstantVariable(bool)] {}
```

```python
import torch
def fn(x):
    x = x + 1
    torch.overrides.BaseTorchFunctionMode.__enter__()
    return x + 1, torch._C._is_torch_function_enabled()

print(torch.compile(fn, fullgraph=True)(torch.zeros(1)))

# torch._dynamo.exc.Unsupported: 'call_function TorchFunctionMode.__enter__ in skip_files /home/jonch/Desktop/Programming/mlsys/pytorch/torch/overrides.py, skipped according skipfiles.SKIP_DIRS'
```

~~I believe 1. is clearly yes - even if it is a corner case (autocast only has ctx manager public API, while dynamo will always emit ctx manager exits before compiling the graph, so one needs to use the internal _enter_autocast API to directly perform a mutation).~~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112396
Approved by: https://github.com/bdhirsh
2023-11-03 16:14:09 +00:00
001573b687 [Inductor] Support one node creating multiple mutations in scheduler (#112547)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112547
Approved by: https://github.com/Chillee
2023-11-03 16:01:31 +00:00
cyy
21bc37fad8 [5/N] Apply clang-tidy to aten/src/ATen/core (#112219)
Enlarge clang-tidy coverage to aten/src/ATen/core/* files

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112219
Approved by: https://github.com/Skylion007
2023-11-03 15:51:21 +00:00
3e2c9410e1 Fix docstring errors in memory.py, nvtx.py (#112751)
Fixes #112590

Fixed docstring errors in `torch/cuda/memory.py` and `torch/cuda/nvtx.py`.

memory.py
Before
```
torch/cuda/memory.py:1 at module level:
        D100: Missing docstring in public module
torch/cuda/memory.py:67 in public function `caching_allocator_alloc`:
        D401: First line should be in imperative mood (perhaps 'Perform', not 'Performs')
torch/cuda/memory.py:103 in public function `caching_allocator_delete`:
        D401: First line should be in imperative mood (perhaps 'Delete', not 'Deletes')
torch/cuda/memory.py:122 in public function `set_per_process_memory_fraction`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:148 in public function `empty_cache`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:148 in public function `empty_cache`:
        D400: First line should end with a period (not 'g')
torch/cuda/memory.py:163 in public function `memory_stats`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:163 in public function `memory_stats`:
        D400: First line should end with a period (not 'a')
torch/cuda/memory.py:163 in public function `memory_stats`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/cuda/memory.py:264 in public function `memory_stats_as_nested_dict`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/cuda/memory.py:272 in public function `reset_accumulated_memory_stats`:
        D401: First line should be in imperative mood (perhaps 'Reset', not 'Resets')
torch/cuda/memory.py:292 in public function `reset_peak_memory_stats`:
        D401: First line should be in imperative mood (perhaps 'Reset', not 'Resets')
torch/cuda/memory.py:311 in public function `reset_max_memory_allocated`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:311 in public function `reset_max_memory_allocated`:
        D400: First line should end with a period (not 'y')
torch/cuda/memory.py:311 in public function `reset_max_memory_allocated`:
        D401: First line should be in imperative mood (perhaps 'Reset', not 'Resets')
torch/cuda/memory.py:338 in public function `reset_max_memory_cached`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:338 in public function `reset_max_memory_cached`:
        D400: First line should end with a period (not 'e')
torch/cuda/memory.py:338 in public function `reset_max_memory_cached`:
        D401: First line should be in imperative mood (perhaps 'Reset', not 'Resets')
torch/cuda/memory.py:365 in public function `memory_allocated`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:365 in public function `memory_allocated`:
        D400: First line should end with a period (not 'n')
torch/cuda/memory.py:365 in public function `memory_allocated`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/cuda/memory.py:383 in public function `max_memory_allocated`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:383 in public function `max_memory_allocated`:
        D400: First line should end with a period (not 'n')
torch/cuda/memory.py:383 in public function `max_memory_allocated`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/cuda/memory.py:405 in public function `memory_reserved`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:405 in public function `memory_reserved`:
        D400: First line should end with a period (not 's')
torch/cuda/memory.py:405 in public function `memory_reserved`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/cuda/memory.py:421 in public function `max_memory_reserved`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:421 in public function `max_memory_reserved`:
        D400: First line should end with a period (not 's')
torch/cuda/memory.py:421 in public function `max_memory_reserved`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/cuda/memory.py:443 in public function `memory_cached`:
        D401: First line should be in imperative mood; try rephrasing (found 'Deprecated')
torch/cuda/memory.py:452 in public function `max_memory_cached`:
        D401: First line should be in imperative mood; try rephrasing (found 'Deprecated')
torch/cuda/memory.py:461 in public function `memory_snapshot`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/cuda/memory.py:474 in public function `memory_summary`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:474 in public function `memory_summary`:
        D400: First line should end with a period (not 'r')
torch/cuda/memory.py:474 in public function `memory_summary`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/cuda/memory.py:612 in public function `list_gpu_processes`:
        D202: No blank lines allowed after function docstring (found 1)
torch/cuda/memory.py:612 in public function `list_gpu_processes`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:612 in public function `list_gpu_processes`:
        D400: First line should end with a period (not 's')
torch/cuda/memory.py:612 in public function `list_gpu_processes`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/cuda/memory.py:648 in public function `mem_get_info`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:648 in public function `mem_get_info`:
        D400: First line should end with a period (not 'n')
torch/cuda/memory.py:648 in public function `mem_get_info`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/cuda/memory.py:684 in private function `_record_memory_history`:
        D202: No blank lines allowed after function docstring (found 1)
torch/cuda/memory.py:684 in private function `_record_memory_history`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:684 in private function `_record_memory_history`:
        D400: First line should end with a period (not 'y')
torch/cuda/memory.py:684 in private function `_record_memory_history`:
        D401: First line should be in imperative mood (perhaps 'Enable', not 'Enables')
torch/cuda/memory.py:742 in private function `_snapshot`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:742 in private function `_snapshot`:
        D401: First line should be in imperative mood (perhaps 'Save', not 'Saves')
torch/cuda/memory.py:818 in private function `_dump_snapshot`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:818 in private function `_dump_snapshot`:
        D401: First line should be in imperative mood (perhaps 'Save', not 'Saves')
torch/cuda/memory.py:849 in public function `get_allocator_backend`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:849 in public function `get_allocator_backend`:
        D400: First line should end with a period (not 'y')
torch/cuda/memory.py:849 in public function `get_allocator_backend`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/cuda/memory.py:894 in public method `__init__`:
        D107: Missing docstring in __init__
torch/cuda/memory.py:904 in public function `change_current_allocator`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:904 in public function `change_current_allocator`:
        D401: First line should be in imperative mood (perhaps 'Change', not 'Changes')
torch/cuda/memory.py:917 in private function `_get_current_allocator`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
58
```
After
```
torch/cuda/memory.py:151 in public function `empty_cache`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:151 in public function `empty_cache`:
        D400: First line should end with a period (not 'g')
torch/cuda/memory.py:439 in public function `memory_cached`:
        D401: First line should be in imperative mood; try rephrasing (found 'Deprecated')
torch/cuda/memory.py:448 in public function `max_memory_cached`:
        D401: First line should be in imperative mood; try rephrasing (found 'Deprecated')
torch/cuda/memory.py:676 in private function `_record_memory_history`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:676 in private function `_record_memory_history`:
        D400: First line should end with a period (not 'y')
torch/cuda/memory.py:841 in public function `get_allocator_backend`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/memory.py:841 in public function `get_allocator_backend`:
        D400: First line should end with a period (not 'y')
8
```

nvtx.py
Before
```
torch/cuda/nvtx.py:1 at module level:
        D100: Missing docstring in public module
torch/cuda/nvtx.py:24 in public function `range_push`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/nvtx.py:24 in public function `range_push`:
        D400: First line should end with a period (not 'd')
torch/cuda/nvtx.py:35 in public function `range_pop`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/nvtx.py:35 in public function `range_pop`:
        D400: First line should end with a period (not 'e')
torch/cuda/nvtx.py:43 in public function `range_start`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/nvtx.py:43 in public function `range_start`:
        D400: First line should end with a period (not 'e')
torch/cuda/nvtx.py:81 in public function `range`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/nvtx.py:81 in public function `range`:
        D400: First line should end with a period (not 'g')
9
```
After
```
torch/cuda/nvtx.py:41 in public function `range_start`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/nvtx.py:41 in public function `range_start`:
        D400: First line should end with a period (not 'e')
torch/cuda/nvtx.py:79 in public function `range`:
        D205: 1 blank line required between summary line and description (found 0)
torch/cuda/nvtx.py:79 in public function `range`:
        D400: First line should end with a period (not 'g')
4
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112751
Approved by: https://github.com/kit1980
2023-11-03 15:19:17 +00:00
29716e865c Enforce both input tensor shapes of CosineEmbeddingLoss to be equal. (#112782)
Added a test to prevent regressions.
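
A minimal sketch of the kind of mismatch that is now rejected (sizes are illustrative):

```python
import torch

loss = torch.nn.CosineEmbeddingLoss()
x1 = torch.randn(4, 8)
x2 = torch.randn(8)        # broadcastable, but not equal in shape
target = torch.ones(4)
loss(x1, x2, target)       # now raises instead of silently broadcasting
```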

Fixes #112732.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112782
Approved by: https://github.com/lezcano
2023-11-03 15:15:06 +00:00
2337d8d062 Use OpOverload instead of OpOverloadPacket for size/stride/etc slots (#112119)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112119
Approved by: https://github.com/yanboliang
2023-11-03 13:54:41 +00:00
7f143d7ef5 [aotinductor] Allow specifying a .so name in the aot_inductor.output_path config (#112651)
Differential Revision: [D50902585](https://our.internmc.facebook.com/intern/diff/D50902585)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112651
Approved by: https://github.com/chenyang78
2023-11-03 12:56:18 +00:00
871e27a61c [Quant] [PT2] Remove the output Annotation of Conv/Linear in x86InductorQuantizer (#112140)
**Summary**
- PR 3 for enabling Int8-Mixed-BF16 PT2E PTQ Quantization with Inductor https://github.com/pytorch/pytorch/issues/111640.
- Remove the output annotation of QConv/QLinear in X86InductorQuantizer.

**Test Plan**
```
python -m pytest test_mkldnn_pattern_matcher.py -k test_qconv2d
python -m pytest test_mkldnn_pattern_matcher.py -k test_qlinear
python -m pytest test_x86inductor_quantizer.py -k Conv2d
python -m pytest test_x86inductor_quantizer.py -k Linear
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112140
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #112010, #112126
2023-11-03 08:24:55 +00:00
a53d29cc18 Enable oneDNN QLinear FP32/BF16 output (#112126)
**Summary**
- PR 2 for enabling Int8-Mixed-BF16 PT2E PTQ Quantization with Inductor https://github.com/pytorch/pytorch/issues/111640.
- Enable QLinear (relu) with BFloat16 or Float32 output.

**TestPlan**
```
python -u -m pytest -s -v test_quantized_op.py -k test_qlinear_pt2e
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112126
Approved by: https://github.com/jerryzh168, https://github.com/jgong5
ghstack dependencies: #112010
2023-11-03 08:20:54 +00:00
b6fc7af8a0 Enable oneDNN QConv FP32/BF16 output (#112010)
**Summary**

- PR 1 for enabling Int8-Mixed-BF16 PT2E PTQ Quantization with Inductor https://github.com/pytorch/pytorch/issues/111640.
- Enable QConv (relu, add, add_relu) with BFloat16 or Float32 output.

**Test Plan**
```
python -u -m pytest -s -v test_quantized_op.py -k test_qconv1d_pt2e
python -u -m pytest -s -v test_quantized_op.py -k test_qconv2d_pt2e
python -u -m pytest -s -v test_quantized_op.py -k test_qconv3d_pt2e
python -u -m pytest test_quantized_op.py -k test_qconv2d_relu_pt2e
python -u -m pytest test_quantized_op.py -k test_qconv2d_add_pt2e
python -u -m pytest test_quantized_op.py -k test_qconv2d_add_relu_pt2e
python -u -m pytest test_quantized_op.py -k test_qconv2d_add_relu_float_output_pt2e
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112010
Approved by: https://github.com/jerryzh168, https://github.com/jgong5
2023-11-03 08:16:45 +00:00
9089242048 Fix typo under test directory (#112346)
This PR fixes typos in comments and messages under the `test` directory, as well as related typos in messages under the `torch` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112346
Approved by: https://github.com/kit1980, https://github.com/ezyang
2023-11-03 07:53:33 +00:00
94ebf52ea3 [cuda] introduce trace tracker callback in cache allocator (#112238)
Summary:
This patch prototypes a trace tracker callback mechanism based on existing TraceEntry records.

- It allows external of cache allocator to "attach" trace tracker callbacks.
- When a TraceEntry is recorded, it triggers all attached callbacks. Callbacks can selectively behave based on the trace action.
- **RISK**: The attached callback would be called within an allocator call stack (e.g., free during an allocate call). Potential deadlock may occur if other locks are called within the callback and has interdependency w/ the device allocator lock. It is the callback developer's responsibility to avoid any potential deadlock.
- **ADVICE**: The callback mechanism is designed **only for Pytorch internal use**. We should not expose it to Python layer due to Python GIL that would cause a deadlock.

See example in D50726970 that attaches NCCL register/deregister hooks via the trace tracker callback, so that all CUDA segments allocated by the allocator can be registered to NCCL communicators before any NCCL communication happens. This enables fast zero copy algorithms in NCCL.

Differential Revision: D50726971

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112238
Approved by: https://github.com/zdevito
2023-11-03 07:38:09 +00:00
53fff56ab8 Graph break cleanly for test_nestedtensor (#112662)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112662
Approved by: https://github.com/jbschlosser
2023-11-03 07:20:43 +00:00
88b98191b7 [FSDP][state_dict] Add world_size 1 unittest (#112669)
As title

Differential Revision: [D50754433](https://our.internmc.facebook.com/intern/diff/D50754433/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112669
Approved by: https://github.com/wz337, https://github.com/fduwjj
2023-11-03 07:02:43 +00:00
458e7d09fd Add meta func for scaled mm (#112609)
# Summary
Adds a meta implementation for _scaled_mm which is required for dynamic shapes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112609
Approved by: https://github.com/eellison, https://github.com/malfet
2023-11-03 03:44:22 +00:00
3be99012d4 Switch some more SymInt tests to TORCH_CHECK_ALWAYS_SHOW_CPP_STACKTRACE (#112626)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112626
Approved by: https://github.com/bdhirsh
2023-11-03 03:15:26 +00:00
62c88ba0fc E2E test for FSDP, HSDP, FSDP+TP in Distributed Checkpointing (#112541)
Adds E2E tests for saving/loading distributed checkpoints. Supported so far are:

- FSDP
- HSDP
- FSDP + TP

Each method is also tested using `torch.compile`

To run all tests:
`python test/distributed/checkpoint/e2e/test_e2e_save_and_load.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112541
Approved by: https://github.com/fegin, https://github.com/wz337
2023-11-03 03:04:31 +00:00
4a17693d19 [CODEMOD][caffe2] replace uses of np.float with np.float64 (#112675)
Differential Revision: D50752096

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112675
Approved by: https://github.com/Skylion007
2023-11-03 03:00:51 +00:00
8665a51baf Initialize logging facility when running ProcessGroupNCCLTest (#112809)
If code is compiled without `glog`, there is no way to control log levels other than explicitly calling `c10::initLogging()`

Test plan: Run `TORCH_CPP_LOG_LEVEL=0 ./bin/ProcessGroupNCCLTest` and observe extra log messages

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112809
Approved by: https://github.com/fduwjj
2023-11-03 02:26:13 +00:00
0d95378341 [Profiler][Easy] Make timestamps in memory timelines be in microseconds (us) (#112772)
Summary: Convert the timestamps in memory timelines from ns to us.

Test Plan: CI

Differential Revision: D50937241

Pulled By: aaronenyeshi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112772
Approved by: https://github.com/anupambhatnagar, https://github.com/davidberard98
2023-11-03 00:41:41 +00:00
2d5fec4d59 Revert "Enable concurrent reader for getRecord function (#111426)"
This reverts commit 12a6f5aa6bf3e11668293c36b436eead2f3b8614.

Reverted https://github.com/pytorch/pytorch/pull/111426 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/111426#issuecomment-1791733096))
2023-11-03 00:22:21 +00:00
32039883d1 Set default for IS_FBCODE flag (#112766)
Summary:
If IS_FBCODE is False, then we print an OSS repro if a test fails. We do
set IS_FBCODE manually on most internal tests, but we don't do it for
all of them. This PR changes it so that IS_FBCODE gets set to the
correct default value (and then tests are able to override them if
they'd like).

Test Plan:
- Tested locally
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112766
Approved by: https://github.com/williamwen42
2023-11-03 00:01:07 +00:00
13d62e28a3 [Inductor] Add Dynamic shape support to user defined triton kernels (#112523)
1) This PR moves the grid function codegen to wrapper so that we can use
   IndentBuffers as opposed to manually adding tabs for indentation.
2) In Inductor, emits the grid function in the body of the kernel call so
   that it can use free symbols from dynamic shapes (a sketch of such a kernel follows)
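
A minimal sketch of a user-defined Triton kernel whose grid depends on a runtime size (the kernel and block size are invented for the sketch):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x, y):
    out = torch.empty_like(x)
    n = out.numel()
    # the grid closure references the runtime size, which may be a free symbol
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK"]),)
    add_kernel[grid](x, y, out, n, BLOCK=1024)
    return out

# call with CUDA tensors of varying lengths; the size becomes a free symbol
compiled_add = torch.compile(add, dynamic=True)
```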

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112523
Approved by: https://github.com/Chillee
2023-11-02 23:58:50 +00:00
f6dc09c1b1 [dynamo] Fix typo in higher_order_ops.py (#112750)
"unsupported" was undefined

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112750
Approved by: https://github.com/zou3519
2023-11-02 23:43:17 +00:00
12dab00173 Fix Docstring errors in init.py (#112617)
Fixes #112596

Fix docstring errors in init.py

### Before the change -> 38 errors
```
╭─user@pc ~/Path/to/pytorch  ‹fix/docstring_init›
╰─➤  pydocstyle torch/nn/init.py --count                                                                                                                                             127 ↵
torch/nn/init.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/init.py:68 in public function `calculate_gain`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/init.py:123 in public function `uniform_`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/init.py:123 in public function `uniform_`:
        D400: First line should end with a period (not 'm')
torch/nn/init.py:123 in public function `uniform_`:
        D401: First line should be in imperative mood (perhaps 'Fill', not 'Fills')
torch/nn/init.py:141 in public function `normal_`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/init.py:141 in public function `normal_`:
        D400: First line should end with a period (not 'l')
torch/nn/init.py:141 in public function `normal_`:
        D401: First line should be in imperative mood (perhaps 'Fill', not 'Fills')
torch/nn/init.py:165 in public function `trunc_normal_`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/init.py:165 in public function `trunc_normal_`:
        D400: First line should end with a period (not 'd')
torch/nn/init.py:165 in public function `trunc_normal_`:
        D401: First line should be in imperative mood (perhaps 'Fill', not 'Fills')
torch/nn/init.py:187 in public function `constant_`:
        D401: First line should be in imperative mood (perhaps 'Fill', not 'Fills')
torch/nn/init.py:203 in public function `ones_`:
        D401: First line should be in imperative mood (perhaps 'Fill', not 'Fills')
torch/nn/init.py:216 in public function `zeros_`:
        D401: First line should be in imperative mood (perhaps 'Fill', not 'Fills')
torch/nn/init.py:229 in public function `eye_`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/init.py:229 in public function `eye_`:
        D400: First line should end with a period (not 'y')
torch/nn/init.py:229 in public function `eye_`:
        D401: First line should be in imperative mood (perhaps 'Fill', not 'Fills')
torch/nn/init.py:249 in public function `dirac_`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/init.py:249 in public function `dirac_`:
        D400: First line should end with a period (not 'c')
torch/nn/init.py:249 in public function `dirac_`:
        D401: First line should be in imperative mood (perhaps 'Fill', not 'Fills')
torch/nn/init.py:311 in public function `xavier_uniform_`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/init.py:311 in public function `xavier_uniform_`:
        D400: First line should end with a period (not 'd')
torch/nn/init.py:311 in public function `xavier_uniform_`:
        D401: First line should be in imperative mood (perhaps 'Fill', not 'Fills')
torch/nn/init.py:338 in public function `xavier_normal_`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/init.py:338 in public function `xavier_normal_`:
        D400: First line should end with a period (not 'd')
torch/nn/init.py:338 in public function `xavier_normal_`:
        D401: First line should be in imperative mood (perhaps 'Fill', not 'Fills')
torch/nn/init.py:376 in public function `kaiming_uniform_`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/init.py:376 in public function `kaiming_uniform_`:
        D400: First line should end with a period (not 'd')
torch/nn/init.py:376 in public function `kaiming_uniform_`:
        D401: First line should be in imperative mood (perhaps 'Fill', not 'Fills')
torch/nn/init.py:425 in public function `kaiming_normal_`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/init.py:425 in public function `kaiming_normal_`:
        D400: First line should end with a period (not 'd')
torch/nn/init.py:425 in public function `kaiming_normal_`:
        D401: First line should be in imperative mood (perhaps 'Fill', not 'Fills')
torch/nn/init.py:462 in public function `orthogonal_`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/init.py:462 in public function `orthogonal_`:
        D400: First line should end with a period (not 's')
torch/nn/init.py:462 in public function `orthogonal_`:
        D401: First line should be in imperative mood (perhaps 'Fill', not 'Fills')
torch/nn/init.py:507 in public function `sparse_`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/init.py:507 in public function `sparse_`:
        D400: First line should end with a period (not 'e')
torch/nn/init.py:507 in public function `sparse_`:
        D401: First line should be in imperative mood (perhaps 'Fill', not 'Fills')
38
```

### After the change -> 0 errors
```
╭─user@pc ~/Path/to/pytorch  ‹fix/docstring_init*›
╰─➤  pydocstyle torch/nn/init.py --count
0
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112617
Approved by: https://github.com/mikaylagawarecki
2023-11-02 23:42:17 +00:00
2e29172942 Revert "Add meta func for scaled mm (#112609)"
This reverts commit 75174c379712433af1ff810b36e34573b3d2587e.

Reverted https://github.com/pytorch/pytorch/pull/112609 on behalf of https://github.com/huydhn due to Sorry for reverting this change, but it is failing ROCm jobs 75174c3797 ([comment](https://github.com/pytorch/pytorch/pull/112609#issuecomment-1791704037))
2023-11-02 23:37:16 +00:00
c63693ca27 Revert "[Fix] add validation logics to TCPStore queries (#107607)"
This reverts commit 50a99812172dd7d1e808fad8dc44665c1770df50.

Reverted https://github.com/pytorch/pytorch/pull/107607 on behalf of https://github.com/huydhn due to For some reason, lint job was not run on the PR and now start failing trunk, please rebase and fix lint before relanding 50a9981217 ([comment](https://github.com/pytorch/pytorch/pull/107607#issuecomment-1791702818))
2023-11-02 23:34:08 +00:00
c27a03a4e5 [ONNX] Cast scale back to fp16 after _attention_scale. (#112554)
### **Description**:
The problem is that the graph was cast to `fp32` at a certain point but never reverted to `fp16`, causing the rest of the graph to run on `fp32`. This change aims to fix that issue and improve performance.

### **Changes Made**:
- Modified the ONNX exporter code to ensure that the graph is correctly cast back to `fp16` after a necessary cast to `fp32`.

### **Why This Change is Necessary**:
This change is necessary to ensure that the exported ONNX graph remains in `fp16` where appropriate, leading to significant gains in performance and memory savings. Without this fix, the graph would run entirely in `fp32`, causing suboptimal performance.

### **Testing**:
- Performed extensive testing with various models and scenarios to validate the correctness of the changes.

### **Benchmarking Results**:

Experiments Ran on:
8 GPUS - Tesla V100 - 32GB

**Before Fix: ort + 4 hidden layers + without fix**

- **Train Runtime**: 78.7088 seconds
- **Train Samples per Second**: 10.164
- **Train Steps per Second**: 1.271
- **Train Loss**: 5.624655108451844
- **Epoch**: 0.3

**After Fix: ort + 4 hidden layers + with fix**

- **Train Runtime**: 72.5636 seconds
- **Train Samples per Second**: 11.025
- **Train Steps per Second**: 1.378
- **Train Loss**: 5.6252727746963505
- **Epoch**: 0.3

We can see a 7.79% perf gain after this fix.

- I only ran it on 4 hidden layers due to GPU constraints; the perf gain is going to be much higher on the full model.
- You could see the gain on other models that use _attention_scale as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112554
Approved by: https://github.com/BowenBao, https://github.com/thiagocrepaldi
2023-11-02 23:18:53 +00:00
0a92ec9452 warn once for use flash attention and memory efficient attention (#112773)
Summary: these logs can get pretty spammy if we use TORCH_WARN; it could be better to use TORCH_WARN_ONCE

Test Plan: ci

Differential Revision: D50941941

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112773
Approved by: https://github.com/drisspg
2023-11-02 22:58:28 +00:00
e9d7fac89c [state_dict][10/N] Let set_state_dict returns IncompatibleKeys (#112414)
load_state_dict returns IncompatibleKeys, so set_state_dict should also return the same information to the users.

Differential Revision: [D50748157](https://our.internmc.facebook.com/intern/diff/D50748157/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112414
Approved by: https://github.com/wz337
ghstack dependencies: #112167, #112203
2023-11-02 22:39:38 +00:00
3904b81420 [pytree] Add back a default serialized name (#112748)
Previously we added a change which required users to pass in a serialized name if they want to serialize a pytree, so that the serialized name does not depend on the Python environment. However, this is currently breaking AOTInductor benchmark tests, as AOTInductor serializes the pytree into the .so for flattening/unflattening the inputs, and the registration for those pytree types in the AOTInductor benchmarks lives in the huggingface repo, so I'm not sure what's a good fix for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112748
Approved by: https://github.com/zhxchen17, https://github.com/malfet
2023-11-02 22:34:42 +00:00
50a9981217 [Fix] add validation logics to TCPStore queries (#107607)
This PR fixes #106294.

Due to the lack of a request validation mechanism, TCPStore in torch mistakenly treats nmap scan messages as valid query messages, which leads to DDP OOM. The simple solution enforces that the very first query from a client is a validation query with a predefined magic number. If the validation fails, the server will terminate the connection.
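
For illustration only, a toy sketch of the handshake idea; the magic number and wire format here are hypothetical stand-ins, not TCPStore's actual protocol:

```python
import socket
import struct

MAGIC = 0x3C85F7CE  # hypothetical constant, purely for illustration

def serve_one_connection(port):
    with socket.socket() as srv:
        srv.bind(("127.0.0.1", port))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            first = conn.recv(4, socket.MSG_WAITALL)
            if len(first) < 4 or struct.unpack("!I", first)[0] != MAGIC:
                return  # e.g. an nmap probe: terminate the connection
            ...         # proceed with normal query handling

def client_handshake(port):
    with socket.create_connection(("127.0.0.1", port)) as s:
        s.sendall(struct.pack("!I", MAGIC))  # validate before any real query
```
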
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107607
Approved by: https://github.com/cbalioglu, https://github.com/XilunWu
2023-11-02 22:12:45 +00:00
12a6f5aa6b Enable concurrent reader for getRecord function (#111426)
Summary:
Zion-4s core has poor perf when it comes to reading large tensors (e.g. 300G), whether downloading from manifold or reading from files. In this diff, I changed the getRecord function from single-threaded to multi-threaded by passing multiple readers to getRecord and accessing the same record at different chunks with different readers.
We control the number of additional readers with the `sigrid_model_manager_additional_reader` flag. The default value is 0. When `additional_reader=2`, we allocate `2` extra read client threads.
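
A stdlib-only sketch of the chunked multi-reader idea (not the actual getRecord code; the chunking scheme is invented for illustration):

```python
import concurrent.futures
import os

def read_chunk(path, offset, length):
    with open(path, "rb") as f:   # each reader gets its own file handle
        f.seek(offset)
        return f.read(length)

def read_record(path, additional_readers=2):
    n = additional_readers + 1
    size = os.path.getsize(path)
    chunk = (size + n - 1) // n   # split the record into n contiguous chunks
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        parts = [pool.submit(read_chunk, path, i * chunk, chunk) for i in range(n)]
        return b"".join(p.result() for p in parts)
```
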
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111426
Approved by: https://github.com/jiayisuse
2023-11-02 22:07:04 +00:00
9d0c3e21d0 [state_dict][9/N] Add get and set APIs for model and optimizer state_dict (#112203)
The original get_state_dict and set_state_dict pair is too complicated because of the possible combinations of usages. This PR adds APIs to get/set model_state_dict and optimizer_state_dict separately.
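
A usage sketch of the paired APIs (signatures may differ slightly by torch version; in practice these are typically used with distributed initialized):

```python
import torch
from torch.distributed.checkpoint.state_dict import (
    get_model_state_dict, get_optimizer_state_dict,
    set_model_state_dict, set_optimizer_state_dict,
)

model = torch.nn.Linear(4, 4)
optim = torch.optim.Adam(model.parameters())
model(torch.randn(2, 4)).sum().backward()
optim.step()  # populate optimizer state before snapshotting

msd = get_model_state_dict(model)
osd = get_optimizer_state_dict(model, optim)
set_model_state_dict(model, msd)
set_optimizer_state_dict(model, optim, optim_state_dict=osd)
```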

Differential Revision: [D50713584](https://our.internmc.facebook.com/intern/diff/D50713584/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112203
Approved by: https://github.com/wz337
ghstack dependencies: #112167
2023-11-02 22:03:57 +00:00
0adb28b77d Show CUDAExtension example commands as code (#112764)
The default rendering of these code snippets renders the `TORCH_CUDA_ARCH_LIST` values with typographic quotes which prevent the examples from being directly copyable. Use code style for the two extension examples.

Fixes #112763
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112764
Approved by: https://github.com/malfet
2023-11-02 21:47:50 +00:00
07c9b053f7 Enable planner to be used for loading sharded optimizer state dict (#112259)
This creates a more consistent interface for saving and loading sharded state dicts. A planner can be specified when saving a sharded optimizer state dict, but there is currently no planner support for loading one. This change does not affect the default behavior of the function.
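
A hedged usage sketch (the `planner` argument is the new part; the checkpoint path and optimizer key are illustrative):

```python
from torch.distributed.checkpoint import DefaultLoadPlanner, FileSystemReader
from torch.distributed.checkpoint.optimizer import load_sharded_optimizer_state_dict

def load_optim_state(model_sd, ckpt_dir):
    return load_sharded_optimizer_state_dict(
        model_state_dict=model_sd,      # from the sharded model
        optimizer_key="optim",
        storage_reader=FileSystemReader(ckpt_dir),
        planner=DefaultLoadPlanner(),   # newly accepted by this change
    )
```
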
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112259
Approved by: https://github.com/wz337
2023-11-02 21:40:30 +00:00
b10fa8a447 Adds lucasllc to CODEOWNERS in distributed (#112055)
Adds myself to CODEOWNERS in distributed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112055
Approved by: https://github.com/H-Huang
2023-11-02 21:29:45 +00:00
db7a3cc436 fix missing nvml in c10/cuda/driver_api.cpp issue (#112121)
Since https://github.com/pytorch/pytorch/pull/99699 introduced a dependency on nvml for oom reporting in `c10/cuda/driver_api.h`, `c10/cuda/driver_api.cpp`, and `reportProcessMemoryInfo` from `c10/cuda/CUDACachingAllocator.cpp`, we've seen failures regarding cuda expandable segments and oom reporting in NVIDIA's internal CI, specifically on Jetson devices which don't have nvml support as it is incompatible with Jetson. Example failures using the latest upstream on Orin AGX node:

`python test/test_cuda.py -k test_notifies_oom` generates

```
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/pytorch/pytorch/test/test_cuda.py", line 1643, in _worker
    results[t] = torch.nn.functional.conv2d(results[t], weight, padding=0)
RuntimeError: CUDA driver error: out of memory
```

`python test/test_cuda_expandable_segments.py` generates

```
Traceback (most recent call last):
  File "/opt/pytorch/pytorch/test/test_cuda_expandable_segments.py", line 12, in <module>
    exec(compile(open(filepath).read(), filepath, mode='exec'))
  File "/opt/pytorch/pytorch/test/test_cuda.py", line 66, in <module>
    class TestCuda(TestCase):
  File "/opt/pytorch/pytorch/test/test_cuda.py", line 1609, in TestCuda
    @unittest.skipIf(not TEST_CUDNN, 'CUDNN not available')
  File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_utils.py", line 4628, in wrapped
    self._value = self._cb()
  File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/common_cuda.py", line 20, in <lambda>
    TEST_CUDNN = LazyVal(lambda: TEST_CUDA and torch.backends.cudnn.is_acceptable(torch.tensor(1., device=CUDA_DEVICE)))
RuntimeError: handle_0 INTERNAL ASSERT FAILED at "/opt/pytorch/pytorch/c10/cuda/driver_api.cpp":15, please report a bug to PyTorch.
```

This PR intends to fix this issue by adding various dlopen checks to make sure nvml actually exists, and by safely falling back to the older libcuda-based features of cuda expandable segments and oom reporting if nvml is not found.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112121
Approved by: https://github.com/eqy, https://github.com/ngimel, https://github.com/albanD
2023-11-02 21:28:05 +00:00
4e67c69a7d [TD] Support downgrading test relevance (#112671)
Allow heuristics to actually downgrade the relevance of a test. Note that NONE/UNLIKELY tests will still get executed, but they will be run at the end of the CI

The Relevance chosen affects the outcome when Heuristics offer conflicting predictions. A relevance higher up in this list means higher confidence in the declared relevance:

HIGH > NONE > PROBABLE > UNLIKELY > UNRANKED

Given that we currently assume ordering based on the list in init (since the lists are appended), do a similar thing for UNLIKELY and NONE. For example, with HEURISTICS = [a, b, c, d]:
- currently, everything in b.high is added after a.high
- if b.none includes things in a.high, a.high trumps
- if b.none includes things in a.probable, then b.none trumps, since NONE is stronger than PROBABLE
- if b.unlikely includes things from a.high/a.probable, a.high/a.probable trumps, since HIGH/PROBABLE are stronger than UNLIKELY

A toy illustration of this resolution rule follows.
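
Toy illustration (not the actual TD code), showing that the higher-confidence relevance wins:

```python
from enum import IntEnum

class Relevance(IntEnum):  # listed from lowest to highest confidence
    UNRANKED = 0
    UNLIKELY = 1
    PROBABLE = 2
    NONE = 3
    HIGH = 4

def resolve(predictions):
    # when heuristics disagree, the higher-confidence relevance wins
    return max(predictions, key=lambda r: r.value)

print(resolve([Relevance.PROBABLE, Relevance.NONE]))  # Relevance.NONE
```
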
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112671
Approved by: https://github.com/clee2000
2023-11-02 21:02:40 +00:00
d9ad7ac390 Skip test_fork_wait_4 and test_fork_wait_4_async (#112743)
Fixes #109782

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112743
Approved by: https://github.com/jbschlosser
2023-11-02 20:46:29 +00:00
157bda1bf0 Fix pydocstyle errors in torch/nn/module (#112674)
Fixes  #112601

```
pydocstyle torch/nn/modules/module.py  --count
```
On master:
115
After my changes on this PR:
8

The remaining 8 are due to missing docstrings in the magic methods:
```
torch/nn/modules/module.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/modules/module.py:1635 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/nn/modules/module.py:1640 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/nn/modules/module.py:1674 in public method `__getattr__`:
        D105: Missing docstring in magic method
torch/nn/modules/module.py:1689 in public method `__setattr__`:
        D105: Missing docstring in magic method
torch/nn/modules/module.py:1748 in public method `__delattr__`:
        D105: Missing docstring in magic method
torch/nn/modules/module.py:2480 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/nn/modules/module.py:2505 in public method `__dir__`:
        D105: Missing docstring in magic method

```

Should I add them too? Happy to do it, I just wasn't sure if you wanted these documented. Please let me know.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112674
Approved by: https://github.com/mikaylagawarecki
2023-11-02 20:40:56 +00:00
ac9476ba99 Add .boxed() to c10d::ProcessGroup and c10d::Work's pybind (#111997)
Summary:
When passed from C++ to Python, `c10d::ProcessGroup` and `c10d::Work` are automatically converted to their pybind classes, which can't be used for dispatcher ops. `.boxed()` exposes `c10d::ProcessGroup` and `c10d::Work` as boxed custom class objects to Python.

```python
import tempfile

import torch
import torch.distributed as dist

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmpf:
        dist.init_process_group(
            backend="nccl", init_method=f"file://{tmpf.name}", rank=0, world_size=1
        )
        group = dist.group.WORLD
        print(group)
        print(group.boxed())
```

```
<torch.distributed.distributed_c10d.ProcessGroup object at 0x7fe42fb78d30>
ScriptObject <__torch__.torch.classes.c10d.ProcessGroup>
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111997
Approved by: https://github.com/lw
2023-11-02 20:35:20 +00:00
6a3922d523 BUG: compile np.array(list_of_arrays) (#112711)
Add a shortcut for a sequence of arrays only. This removes a graph break on a common pattern of
`np.array([np.cos(theta), np.sin(theta)])` and its ilk.

This PR is a simplified alternative to https://github.com/pytorch/pytorch/pull/112521 --- it still breaks on mixing arrays and scalars or array_likes (e.g. `np.array([[1, 2], np.array([3, 4])])`) and instead adds a simple shortcut.
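
A hedged usage sketch of the pattern this now handles (`rot` is an illustrative function, not from the PR):
```python
import numpy as np
import torch

@torch.compile
def rot(theta):
    # A sequence of ndarrays only: traced without a graph break after this PR.
    return np.array([np.cos(theta), np.sin(theta)])

print(rot(np.linspace(0.0, np.pi, 4)))  # shape (2, 4)
```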

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112711
Approved by: https://github.com/lezcano
2023-11-02 20:18:16 +00:00
c1dc4cda5b Delete unused is_inside_mode (#112677)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112677
Approved by: https://github.com/bdhirsh
2023-11-02 19:57:35 +00:00
eadb6aca9d Improve repeat_interleave error message to report repeats/input sizes. (#112729)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112729
Approved by: https://github.com/albanD
2023-11-02 19:50:15 +00:00
50767a075a [export] Clean up verifier [1/n]. (#112505)
Summary: Some adjustments to the verifier so that it's easier to use correctly. We will enable the verifier later, so the current diff is a no-op.

Test Plan: CI

Differential Revision: D50839295

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112505
Approved by: https://github.com/tugsbayasgalan, https://github.com/angelayi
2023-11-02 19:36:06 +00:00
8198474eb7 Fix scope name when parent scope is empty for torch.onnx.export (#112654)
Prior to this PR, we only checked TorchScript nodes for scope compatibility, skipping their parents' scope reference check.
This PR adds a check not only for the node being traversed, but for its parents as well
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112654
Approved by: https://github.com/BowenBao
2023-11-02 19:31:32 +00:00
9d09d29297 [DTensor] Add rand_like, randn_like, randint_like ops to shard propagation (#112576)
Add rand_like, randn_like, randint_like ops to shard propagation

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112576
Approved by: https://github.com/wanchaol, https://github.com/fduwjj
2023-11-02 18:45:43 +00:00
0bd2955f15 Memory leak from bsr_scatter_mm_indices_data argument cache (#112301)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112301
Approved by: https://github.com/cpuhrsch, https://github.com/pearu
2023-11-02 18:43:10 +00:00
75174c3797 Add meta func for scaled mm (#112609)
# Summary
Adds a meta implementation for `_scaled_mm`, which is required for dynamic shapes
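
For context, a minimal sketch of what registering a meta kernel looks like, using a hypothetical custom op (the PR itself adds the meta function for `aten::_scaled_mm`):
```python
import torch

lib = torch.library.Library("demo", "DEF")
lib.define("scaled_mm_like(Tensor a, Tensor b) -> Tensor")

@torch.library.impl(lib, "scaled_mm_like", "CPU")
def scaled_mm_like_cpu(a, b):
    return a @ b

@torch.library.impl(lib, "scaled_mm_like", "Meta")
def scaled_mm_like_meta(a, b):
    # On the meta device only shapes/dtypes are computed, which is
    # exactly what dynamic-shape tracing needs.
    return a.new_empty((a.shape[0], b.shape[1]))

out = torch.ops.demo.scaled_mm_like(
    torch.empty(4, 8, device="meta"), torch.empty(8, 16, device="meta")
)
print(out.shape)  # torch.Size([4, 16])
```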

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112609
Approved by: https://github.com/eellison, https://github.com/malfet
2023-11-02 18:42:41 +00:00
dd957138ec Pin Docker images to main (#112692)
This will help prevent a commit like 77901321d9, pushed to a release branch, from overwriting the Docker images used in main.  In addition, the `DEFAULT_TAG` can easily be updated to `2.1`, for example, when doing a branch-cut release.  This basically pins the Docker images like https://github.com/pytorch/pytorch/pull/111971

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112692
Approved by: https://github.com/malfet
2023-11-02 17:39:45 +00:00
543a618ae8 [inductor][fx pass] Fix a split cat bug in the pre grad (#112667)
Summary: blue reels vdd v3 has a unit test failure; this PR fixes the bug

Test Plan:
```
buck2 test 'fbcode//mode/opt' fbcode//pytorch/benchmark/fb/test_gpu:run_test_gpu -- --exact 'pytorch/benchmark/fb/test_gpu:run_test_gpu - test_train_blue_reels_vdd_v3_inductor_accuracy (pytorch.benchmark.fb.test_gpu.test_gpu.TestBenchmarkFbGpu)'
```
Test UI: https://www.internalfb.com/intern/testinfra/testrun/13229323914259182
Network: Up: 2.5MiB  Down: 8.3MiB  (reSessionID-b3362362-c80a-4ac2-8332-bc1321aaf0bd)
Jobs completed: 6. Time elapsed: 5:13.2s.
Tests finished: Pass 1. Fail 0. Fatal 0. Skip 0. Build failure 0

```
buck2 test 'fbcode//mode/opt' fbcode//pytorch/benchmark/fb/test_gpu:run_test_gpu -- --exact 'pytorch/benchmark/fb/test_gpu:run_test_gpu - test_train_blue_reels_vdd_v3_inductor_speedup (pytorch.benchmark.fb.test_gpu.test_gpu.TestBenchmarkFbGpu)'
```
Buck UI: https://www.internalfb.com/buck2/aa3031a9-3f1b-4f42-a78c-decbf2beb14f
Test UI: https://www.internalfb.com/intern/testinfra/testrun/4785074810906355
Network: Up: 1.3GiB  Down: 40MiB  (reSessionID-801ddf16-ff5d-4135-9758-ff286d1d59aa)
Jobs completed: 69. Time elapsed: 10:12.4s.
Cache hits: 10%. Commands: 61 (cached: 6, remote: 4, local: 51)
Tests finished: Pass 1. Fail 0. Fatal 0. Skip 0. Build failure 0

Differential Revision: D50901626

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112667
Approved by: https://github.com/xuzhao9, https://github.com/Skylion007
2023-11-02 17:33:15 +00:00
7cbf9869d5 Add v0 inference benchmark script (#112582)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112582
Approved by: https://github.com/albanD
2023-11-02 17:21:15 +00:00
b1f50ead4f [state_dict][8/N] Ignore meta parameters (#112167)
This PR lets `get_state_dict` ignore the parameters that are on the meta device.

This PR also demonstrates a possible use case of ignoring meta parameters -- checkpointing pipeline parallelism.

Differential Revision: [D50672521](https://our.internmc.facebook.com/intern/diff/D50672521/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112167
Approved by: https://github.com/wz337
2023-11-02 17:10:03 +00:00
6929ebf2b0 [quant][docs] Add x86 inductor quant docs (#112648)
Summary: As titled; adds documentation for X86 inductor quantization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112648
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5, https://github.com/andrewor14
2023-11-02 17:02:09 +00:00
954cba2ede [optim/dynamo] shortcut adagrad with has_complex (#112722)
Follow-up to https://github.com/pytorch/pytorch/pull/110706; it was missed as it depended on another fix

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112722
Approved by: https://github.com/albanD
2023-11-02 16:50:45 +00:00
ca33dd780e Revert "[pytree] Add back a default serialized name (#112748)"
This reverts commit ca72d23613f7976b3ad70e54234b125c1b763dde.

Reverted https://github.com/pytorch/pytorch/pull/112748 on behalf of https://github.com/angelayi due to sorry, was trying to fix CI and broke CI ([comment](https://github.com/pytorch/pytorch/pull/112748#issuecomment-1791098635))
2023-11-02 16:47:59 +00:00
82e428723a Followup patch for cpuinfo fix in ppc64le (#112707)
Previously, a crash in PyTorch on Power systems was fixed with #110708.
Even with the fix, the torch_test.py test throws the following error for one of the tests:
"Error in cpuinfo: processor architecture is not supported in cpuinfo"
This is a follow-up patch to fix that error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112707
Approved by: https://github.com/albanD
2023-11-02 16:34:41 +00:00
174aef71af Clarify maximize option in optimizer.py (#112724)
While reading the documentation of the optimizers, I noticed the description of the `maximize` option is misleading. It currently reads as if the parameters would be maximized, which is factually incorrect. This PR proposes a clearer description.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112724
Approved by: https://github.com/albanD
2023-11-02 16:34:37 +00:00
25e17f3522 Revert "Use OpOverload instead of OpOverloadPacket for size/stride/etc slots (#112119)"
This reverts commit dd24e92949ad13960dc91fac93c3be5a43579201.

Reverted https://github.com/pytorch/pytorch/pull/112119 on behalf of https://github.com/ZainRizvi due to Breaking internal tests. See D50912326 ([comment](https://github.com/pytorch/pytorch/pull/112119#issuecomment-1791072363))
2023-11-02 16:32:25 +00:00
1245a7e75b Revert "Remove default timeout from PGNCCL::Options ctor (#112555)"
This reverts commit 85e93632e7804bfe64316cbc491aa803a68b0701.

Reverted https://github.com/pytorch/pytorch/pull/112555 on behalf of https://github.com/wconstab due to This PR is wrong, see above explanation ([comment](https://github.com/pytorch/pytorch/pull/112555#issuecomment-1791063778))
2023-11-02 16:27:33 +00:00
75f6d52971 [DTensor] Fix DeviceMesh.__repr__ to output valid Python syntax (#112401)
Fix `DeviceMesh.__repr__` to output valid Python syntax

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112401
Approved by: https://github.com/wanchaol, https://github.com/fduwjj
2023-11-02 16:18:15 +00:00
ca72d23613 [pytree] Add back a default serialized name (#112748)
Previously we added a change which required users to pass in a serialized name if they want to serialize a pytree, so that the serialized name does not depend on the Python environment. However, this is currently breaking AOTInductor benchmark tests, as AOTInductor serializes the pytree into the .so for flattening/unflattening the inputs, and the registrations for those pytree types used in the AOTInductor benchmarks live in the huggingface repo, so I'm not sure what a good fix is for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112748
Approved by: https://github.com/zhxchen17, https://github.com/malfet
2023-11-02 16:18:03 +00:00
09df6b771b Add a note about performant record_stream use. (#112526)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112526
Approved by: https://github.com/albanD
2023-11-02 15:50:22 +00:00
51a38380d1 Fix torch.load(..., weights_only=True) for NT (#112516)
Found when looking into #112509
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112516
Approved by: https://github.com/soulitzer
2023-11-02 14:41:04 +00:00
85e93632e7 Remove default timeout from PGNCCL::Options ctor (#112555)
Providing this timeout to the Options ctor is overriding user-provided
values in cases where is_high_priority_stream is set.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112555
Approved by: https://github.com/fduwjj, https://github.com/H-Huang
2023-11-02 14:16:48 +00:00
a1ab22b81d Reland "Trigger specialization when you call size()/stride() from C++ (#111935)" (#112605)
This reverts commit 22221c6d60613e498aa67b7f7f0f83ec97e35b8a.

Differential Revision: [D50886564](https://our.internmc.facebook.com/intern/diff/D50886564)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112605
Approved by: https://github.com/voznesenskym
2023-11-02 13:27:31 +00:00
68dead4a6c [c10d] print NCCL_SUFFIX in NCCL version log at PG init (#112560)
Summary: See title

Test Plan:
- Build with NCCL-EXP that defines NCCL_SUFFIX "meta-exp"
output:
```
I1031 16:04:01.328174 611521 ProcessGroupNCCL.cpp:918] [Rank 1] ProcessGroupNCCL initialization options: NCCL version: 2.18.3-meta-exp, NCCL_ASYNC_ERROR_HANDLING: 3, NCCL_DESYNC_DEBUG: 0, NCCL_ENABLE_TIMING: 0, NCCL_BLOCKING_WAIT: 0, TIMEOUT(ms): 1800000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DEBUG: OFF, NCCL_DEBUG: INFO, ID=140577310728192
```

- Build with default NCCL with empty NCCL_SUFFIX
output:
```
I1031 20:35:45.665733 2360419b12 ProcessGroupNCCL.cpp:918] [Rank 1] ProcessGroupNCCL initialization options: NCCL version: 2.18.3, NCCL_ASYNC_ERROR_HANDLING: 3,...
```

Differential Revision: D50863335

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112560
Approved by: https://github.com/xw285cornell
2023-11-02 09:56:52 +00:00
0276d5621a Fix typo in compilation_unit.h (#112572)
Fix typo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112572
Approved by: https://github.com/Skylion007, https://github.com/kit1980
2023-11-02 08:26:59 +00:00
ae85ba820f [inductor] Memory planning (#112178)
This was originally @jansel's PR:
https://github.com/pytorch/pytorch/pull/102625, which I've built upon.

This diff implements static memory planning. It's disabled by default
while we examine its performance.

We use a greedy-by-size approach. For dynamic shapes, the sizes of the
example inputs are used as estimates when making planning decisions. We
generate expressions to calculate the actual memory offsets and sizes at
runtime when the values of the dynamic shapes are known. In order to
simplify these calculations, we have organized the allocations into a
tree that branches on space (address offsets) and time (live ranges).
Finally, we need to align these offsets, so we have added an `align`
sympy Expr to express these calculations.
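
A toy sketch of the greedy-by-size placement described above (illustrative data structures, not the actual inductor code):
```python
from dataclasses import dataclass

ALIGN = 64  # assumed alignment in bytes

@dataclass
class Alloc:
    name: str
    size: int    # (estimated) size in bytes
    start: int   # first timestep the buffer is live
    end: int     # last timestep the buffer is live
    offset: int = -1

def overlaps(a: Alloc, b: Alloc) -> bool:
    # Buffers conflict only if their live ranges intersect in time.
    return a.start <= b.end and b.start <= a.end

def plan(allocs: list[Alloc]) -> int:
    """Place the largest buffers first, each at the lowest aligned offset
    that doesn't collide with an already-placed, time-overlapping buffer.
    Returns the total pool size."""
    placed: list[Alloc] = []
    for a in sorted(allocs, key=lambda x: x.size, reverse=True):
        offset = 0
        for p in sorted((p for p in placed if overlaps(a, p)),
                        key=lambda p: p.offset):
            if offset + a.size <= p.offset:
                break  # fits in the gap before p
            offset = max(offset, p.offset + p.size)
            offset = (offset + ALIGN - 1) // ALIGN * ALIGN  # align up
        a.offset = offset
        placed.append(a)
    return max((p.offset + p.size for p in placed), default=0)
```
The real version additionally emits sympy expressions (including the new `align` Expr) so offsets can be computed at runtime under dynamic shapes.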

Some limitations:

1. It is only enabled during inference for now. Enabling it for training
   increases peak memory usage as we allocate all the memory needed for
   training upfront, before freeing the memory allocated during
   inference. We can probably address this by doing planning for both
   the inference and training passes together.
2. It doesn't work with PyTorch Distributed, because kernels like
   AllGatherIntoTensor codegen strings which do memory operations. We
   can fix this down the line by having them emit MemoryPlanningLines
   instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112178
Approved by: https://github.com/desertfire, https://github.com/jansel
2023-11-02 07:39:13 +00:00
db66f15785 docs: fix docstrings in distributed.py and others (fixes #112604) (#112657)
Fixes #112604

Fixes docstrings by following `pydocstyle` output.
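
For readers unfamiliar with the codes listed below, a tiny illustrative before/after (not a real function from this PR):
```python
# Before: triggers D205 (needs a blank line between summary and
# description), D400 (summary must end with a period), and D401
# (summary must be in the imperative mood).
def sync_buffers_before(module):
    """Syncs module buffers
    across processes"""

# After: a one-line imperative summary ending in a period.
def sync_buffers_after(module):
    """Sync module buffers across processes."""
```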

- torch/nn/parallel/distributed.py
Before: 84
```
torch/nn/parallel/distributed.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/parallel/distributed.py:92 in private function `_cast_buffers`:
        D200: One-line docstring should fit on one line with quotes (found 3)
torch/nn/parallel/distributed.py:103 in private function `_setup_mixed_precision_params`:
        D200: One-line docstring should fit on one line with quotes (found 3)
torch/nn/parallel/distributed.py:103 in private function `_setup_mixed_precision_params`:
        D401: First line should be in imperative mood (perhaps 'Create', not 'Creates')
torch/nn/parallel/distributed.py:143 in private function `_find_tensors`:
        D200: One-line docstring should fit on one line with quotes (found 3)
torch/nn/parallel/distributed.py:273 in private method `__init__`:
        D200: One-line docstring should fit on one line with quotes (found 3)
torch/nn/parallel/distributed.py:273 in private method `__init__`:
        D401: First line should be in imperative mood (perhaps 'Set', not 'Sets')
torch/nn/parallel/distributed.py:287 in private method `main_hook`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:287 in private method `main_hook`:
        D400: First line should end with a period (not 'd')
torch/nn/parallel/distributed.py:324 in private method `post_hook`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:324 in private method `post_hook`:
        D400: First line should end with a period (not 'l')
torch/nn/parallel/distributed.py:324 in private method `post_hook`:
        D401: First line should be in imperative mood (perhaps 'Sync', not 'Syncs')
torch/nn/parallel/distributed.py:332 in public class `DistributedDataParallel`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:332 in public class `DistributedDataParallel`:
        D400: First line should end with a period (not 'n')
torch/nn/parallel/distributed.py:633 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/parallel/distributed.py:960 in private method `_fire_reducer_autograd_hook`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:960 in private method `_fire_reducer_autograd_hook`:
        D401: First line should be in imperative mood (perhaps 'Fire', not 'Fires')
torch/nn/parallel/distributed.py:969 in private method `_root_copy_hook`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:969 in private method `_root_copy_hook`:
        D400: First line should end with a period (not 's')
torch/nn/parallel/distributed.py:1012 in private method `_module_wait_for_copy_hook`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:1012 in private method `_module_wait_for_copy_hook`:
        D400: First line should end with a period (not 'e')
torch/nn/parallel/distributed.py:1050 in private method `_ddp_init_helper`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:1050 in private method `_ddp_init_helper`:
        D400: First line should end with a period (not ':')
torch/nn/parallel/distributed.py:1050 in private method `_ddp_init_helper`:
        D401: First line should be in imperative mood (perhaps 'Initialize', not 'Initialization')
torch/nn/parallel/distributed.py:1146 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/nn/parallel/distributed.py:1154 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/nn/parallel/distributed.py:1222 in private method `_assign_modules_buffers`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:1222 in private method `_assign_modules_buffers`:
        D400: First line should end with a period (not 'o')
torch/nn/parallel/distributed.py:1222 in private method `_assign_modules_buffers`:
        D401: First line should be in imperative mood (perhaps 'Assign', not 'Assigns')
torch/nn/parallel/distributed.py:1277 in private method `_get_parameters`:
        D200: One-line docstring should fit on one line with quotes (found 3)
torch/nn/parallel/distributed.py:1277 in private method `_get_parameters`:
        D400: First line should end with a period (not 's')
torch/nn/parallel/distributed.py:1277 in private method `_get_parameters`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/nn/parallel/distributed.py:1312 in public method `no_sync`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:1312 in public method `no_sync`:
        D400: First line should end with a period (not 'P')
torch/nn/parallel/distributed.py:1312 in public method `no_sync`:
        D401: First line should be in imperative mood; try rephrasing (found 'A')
torch/nn/parallel/distributed.py:1340 in private method `_get_active_ddp_module`:
        D200: One-line docstring should fit on one line with quotes (found 3)
torch/nn/parallel/distributed.py:1340 in private method `_get_active_ddp_module`:
        D403: First word of the first line should be properly capitalized ('Torchdynamo', not 'TorchDynamo')
torch/nn/parallel/distributed.py:1517 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1527 in public method `scatter`:
        D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1530 in public method `to_kwargs`:
        D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1539 in public method `gather`:
        D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1542 in public method `train`:
        D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1617 in public method `join`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:1617 in public method `join`:
        D400: First line should end with a period (not 'f')
torch/nn/parallel/distributed.py:1617 in public method `join`:
        D401: First line should be in imperative mood; try rephrasing (found 'A')
torch/nn/parallel/distributed.py:1723 in public method `join_hook`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:1723 in public method `join_hook`:
        D400: First line should end with a period (not 'y')
torch/nn/parallel/distributed.py:1723 in public method `join_hook`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/nn/parallel/distributed.py:1752 in public method `join_device`:
        D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1756 in public method `join_process_group`:
        D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1765 in private method `_register_buffer_comm_hook`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:1765 in private method `_register_buffer_comm_hook`:
        D400: First line should end with a period (not 'e')
torch/nn/parallel/distributed.py:1765 in private method `_register_buffer_comm_hook`:
        D401: First line should be in imperative mood (perhaps 'Allow', not 'Allows')
torch/nn/parallel/distributed.py:1805 in public method `register_comm_hook`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:1805 in public method `register_comm_hook`:
        D400: First line should end with a period (not 'a')
torch/nn/parallel/distributed.py:1805 in public method `register_comm_hook`:
        D401: First line should be in imperative mood (perhaps 'Register', not 'Registers')
torch/nn/parallel/distributed.py:1887 in private method `_register_builtin_comm_hook`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:1887 in private method `_register_builtin_comm_hook`:
        D400: First line should end with a period (not 'P')
torch/nn/parallel/distributed.py:1887 in private method `_register_builtin_comm_hook`:
        D401: First line should be in imperative mood (perhaps 'Register', not 'Registers')
torch/nn/parallel/distributed.py:1914 in private method `_register_fused_optim`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:1914 in private method `_register_fused_optim`:
        D400: First line should end with a period (not 'a')
torch/nn/parallel/distributed.py:1914 in private method `_register_fused_optim`:
        D401: First line should be in imperative mood (perhaps 'Register', not 'Registers')
torch/nn/parallel/distributed.py:2005 in public method `will_sync_module_buffers`:
        D102: Missing docstring in public method
torch/nn/parallel/distributed.py:2060 in private method `_default_broadcast_coalesced`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:2060 in private method `_default_broadcast_coalesced`:
        D400: First line should end with a period (not 'e')
torch/nn/parallel/distributed.py:2128 in private method `_get_data_parallel_params`:
        D200: One-line docstring should fit on one line with quotes (found 3)
torch/nn/parallel/distributed.py:2128 in private method `_get_data_parallel_params`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/nn/parallel/distributed.py:2141 in private method `_set_params_and_buffers_to_ignore_for_model`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:2141 in private method `_set_params_and_buffers_to_ignore_for_model`:
        D400: First line should end with a period (not 'r')
torch/nn/parallel/distributed.py:2141 in private method `_set_params_and_buffers_to_ignore_for_model`:
        D401: First line should be in imperative mood (perhaps 'Set', not 'Sets')
torch/nn/parallel/distributed.py:2170 in private method `_get_ddp_logging_data`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:2170 in private method `_get_ddp_logging_data`:
        D400: First line should end with a period (not 's')
torch/nn/parallel/distributed.py:2170 in private method `_get_ddp_logging_data`:
        D401: First line should be in imperative mood; try rephrasing (found 'This')
torch/nn/parallel/distributed.py:2184 in private method `_set_ddp_runtime_logging_sample_rate`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:2184 in private method `_set_ddp_runtime_logging_sample_rate`:
        D400: First line should end with a period (not 'g')
torch/nn/parallel/distributed.py:2184 in private method `_set_ddp_runtime_logging_sample_rate`:
        D401: First line should be in imperative mood; try rephrasing (found 'This')
torch/nn/parallel/distributed.py:2202 in private method `_set_static_graph`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:2202 in private method `_set_static_graph`:
        D400: First line should end with a period (not 'l')
torch/nn/parallel/distributed.py:2202 in private method `_set_static_graph`:
        D401: First line should be in imperative mood; try rephrasing (found 'It')
torch/nn/parallel/distributed.py:2227 in private method `_remove_autograd_hooks`:
        D200: One-line docstring should fit on one line with quotes (found 3)
torch/nn/parallel/distributed.py:2227 in private method `_remove_autograd_hooks`:
        D401: First line should be in imperative mood (perhaps 'Remove', not 'Removes')
torch/nn/parallel/distributed.py:2233 in private method `_check_reducer_finalized`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/parallel/distributed.py:2233 in private method `_check_reducer_finalized`:
        D400: First line should end with a period (not 'd')
torch/nn/parallel/distributed.py:2233 in private method `_check_reducer_finalized`:
        D401: First line should be in imperative mood (perhaps 'Check', not 'Checks')
84
```

After: 12
```
torch/nn/parallel/distributed.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/parallel/distributed.py:618 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/parallel/distributed.py:1133 in public method `__getstate__`:
        D105: Missing docstring in magic method
torch/nn/parallel/distributed.py:1141 in public method `__setstate__`:
        D105: Missing docstring in magic method
torch/nn/parallel/distributed.py:1503 in public method `forward`:
        D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1513 in public method `scatter`:
        D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1516 in public method `to_kwargs`:
        D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1525 in public method `gather`:
        D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1528 in public method `train`:
        D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1734 in public method `join_device`:
        D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1738 in public method `join_process_group`:
        D102: Missing docstring in public method
torch/nn/parallel/distributed.py:1986 in public method `will_sync_module_buffers`:
        D102: Missing docstring in public method
12
```

- torch/nn/utils/_named_member_accessor.py
Before: 23
```
torch/nn/utils/_named_member_accessor.py:12 in public function `set_tensor`:
        D103: Missing docstring in public function
torch/nn/utils/_named_member_accessor.py:29 in public function `swap_tensor`:
        D103: Missing docstring in public function
torch/nn/utils/_named_member_accessor.py:85 in public function `swap_submodule`:
        D103: Missing docstring in public function
torch/nn/utils/_named_member_accessor.py:109 in public class `NamedMemberAccessor`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:109 in public class `NamedMemberAccessor`:
        D400: First line should end with a period (not 's')
torch/nn/utils/_named_member_accessor.py:115 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/utils/_named_member_accessor.py:122 in public method `get_submodule`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:155 in public method `swap_submodule`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:164 in public method `get_tensor`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:185 in public method `set_tensor`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:194 in public method `del_tensor`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:211 in public method `swap_tensor`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:224 in public method `get_tensors`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:233 in public method `set_tensors`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:249 in public method `set_tensors_dict`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:261 in public method `del_tensors`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:276 in public method `swap_tensors`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:296 in public method `swap_tensors_dict`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_named_member_accessor.py:325 in public method `check_keys`:
        D200: One-line docstring should fit on one line with quotes (found 3)
torch/nn/utils/_named_member_accessor.py:340 in public method `named_parameters`:
        D200: One-line docstring should fit on one line with quotes (found 3)
torch/nn/utils/_named_member_accessor.py:349 in public method `named_buffers`:
        D200: One-line docstring should fit on one line with quotes (found 3)
torch/nn/utils/_named_member_accessor.py:358 in public method `named_tensors`:
        D200: One-line docstring should fit on one line with quotes (found 3)
torch/nn/utils/_named_member_accessor.py:368 in public method `named_modules`:
        D200: One-line docstring should fit on one line with quotes (found 3)
23
```

After: 4
```
torch/nn/utils/_named_member_accessor.py:12 in public function `set_tensor`:
        D103: Missing docstring in public function
torch/nn/utils/_named_member_accessor.py:29 in public function `swap_tensor`:
        D103: Missing docstring in public function
torch/nn/utils/_named_member_accessor.py:85 in public function `swap_submodule`:
        D103: Missing docstring in public function
torch/nn/utils/_named_member_accessor.py:116 in public method `__init__`:
        D107: Missing docstring in __init__
4
```

- torch/nn/utils/_per_sample_grad.py
Before: 3
```
torch/nn/utils/_per_sample_grad.py:12 in public function `call_for_per_sample_grads`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/_per_sample_grad.py:12 in public function `call_for_per_sample_grads`:
        D400: First line should end with a period (not ')')
torch/nn/utils/_per_sample_grad.py:12 in public function `call_for_per_sample_grads`:
        D402: First line should not be the function's "signature"
3
```
After: 0
```
0
```

- torch/nn/utils/init.py
Before: 3
```
torch/nn/utils/init.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/utils/init.py:6 in public function `skip_init`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/init.py:6 in public function `skip_init`:
        D400: First line should end with a period (not 'g')
3
```
After: 1
```
torch/nn/utils/init.py:1 at module level:
        D100: Missing docstring in public module
1
```

- torch/nn/utils/memory_format.py
Before: 4
```
torch/nn/utils/memory_format.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/utils/memory_format.py:5 in public function `convert_conv2d_weight_memory_format`:
        D202: No blank lines allowed after function docstring (found 1)
torch/nn/utils/memory_format.py:5 in public function `convert_conv2d_weight_memory_format`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/utils/memory_format.py:5 in public function `convert_conv2d_weight_memory_format`:
        D400: First line should end with a period (not '`')
4
```
After: 1
```
torch/nn/utils/memory_format.py:1 at module level:
        D100: Missing docstring in public module
1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112657
Approved by: https://github.com/fduwjj
2023-11-02 05:52:47 +00:00
b07cfd79fe [DeviceMesh] Move DeviceMesh out from torch.distributed._tensor (#112364)
Move DeviceMesh out as a standalone module. Once we make sure everything is migrated and doc is ready, we will make `torch.distributed._device_mesh` public in follow-up PRs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112364
Approved by: https://github.com/wanchaol, https://github.com/fegin, https://github.com/fduwjj
2023-11-02 04:44:25 +00:00
6f681ab5d9 [torch.compile] autograd.Function with multiple return values (#112475)
Fixes #106389
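
A minimal example of the now-supported pattern (illustrative):
```python
import torch

class SinCos(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.sin(), x.cos()  # multiple return values

    @staticmethod
    def backward(ctx, grad_sin, grad_cos):
        (x,) = ctx.saved_tensors
        # d(sin)/dx = cos, d(cos)/dx = -sin
        return grad_sin * x.cos() - grad_cos * x.sin()

@torch.compile
def f(x):
    a, b = SinCos.apply(x)
    return a + b

print(f(torch.randn(3, requires_grad=True)))
```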

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112475
Approved by: https://github.com/zou3519
2023-11-02 04:43:49 +00:00
59869903b3 Fix mem eff bias bug (#112673)
This fixes #112577
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112673
Approved by: https://github.com/cpuhrsch
2023-11-02 04:40:51 +00:00
40ab6409da [Trivial change] Remove duplicate line in freezing.py (#112538)
## Description

`aten = torch.ops.aten` was being called twice.
Removed one assignment in this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112538
Approved by: https://github.com/jgong5, https://github.com/Skylion007, https://github.com/eellison
2023-11-02 03:20:18 +00:00
493ae78201 [inductor] nan-checker (#112091)
This PR is split out of https://github.com/pytorch/pytorch/pull/108193. It adds the ability to insert an assertion after each Triton kernel call to make sure no tensor argument is NaN/Inf. It helped me find a few bugs when working on benchmark fusion (due to messing up some kernel/graph-level state when generating kernel code).

Right now we have to disable cudagraphs to enable the nan/inf checks. Otherwise we will see errors like: https://gist.github.com/shunting314/053db66c4f121e5f4c5de159bf0032ed . My best guess is it's due to the GPU->CPU copy during capture for cudagraphs. cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler @eellison if there is an easy way to make it work with cudagraphs. But even if the nan-checker is not compatible with cudagraphs, it's probably still fine, since it's just for debugging purposes.
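
Conceptually, the generated check boils down to something like this (illustrative, not the actual codegen):
```python
import torch

def assert_no_nan_inf(kernel_name: str, tensor: torch.Tensor) -> None:
    # Runs after a kernel call; trips as soon as a kernel produces bad values.
    assert not tensor.isnan().any().item(), f"{kernel_name} produced NaN"
    assert not tensor.isinf().any().item(), f"{kernel_name} produced Inf"
```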

Test command:
```
TORCHINDUCTOR_BENCHMARK_KERNEL=1 TORCHINDUCTOR_NAN_ASSERTS=1 python benchmarks/dynamo/huggingface.py --backend inductor --amp --performance --only BertForMaskedLM --training --disable-cudagraphs
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112091
Approved by: https://github.com/eellison, https://github.com/jansel
2023-11-02 02:32:04 +00:00
01e4984bac Add decomposition for dynamo_export + ExportedProgram and remove None from input (#112444)
This PR introduces the ability to produce GraphModules with Core ATen IR only through decompositions. It also removes `None` from user inputs, as ONNX does not support them

Tests for these features will be executed when #112289 is merged, but for reference, they are as below:

```python
    def test_log_sigmoid(self):
        # This produces the op `torch.ops.aten.log_sigmoid_forward`, instead of the more
        # conventional `torch.ops.aten.log_sigmoid`.
        class Model(torch.nn.Module):
            def __init__(self):
                super().__init__()
                self.m = torch.nn.LogSigmoid()

            def forward(self, x):
                return self.m(x)

        input = torch.randn(2)
        self.run_test_with_fx_to_onnx_exporter_and_onnx_runtime(
            Model(), (input,), model_type=self.model_type
        )

    def test_none_input(self):
        class NoneInputModel(torch.nn.Module):
            def forward(
                self, x: torch.Tensor, y: Optional[torch.Tensor], z: torch.Tensor
            ):
                if y is None:
                    return x + z
                return x + y + z

        self.run_test_with_fx_to_onnx_exporter_and_onnx_runtime(
            NoneInputModel(),
            (torch.randn(1, 2), None, torch.randn(1, 2)),
            model_type=self.model_type,
        )
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112444
Approved by: https://github.com/BowenBao
2023-11-02 02:30:59 +00:00
6c19de07cd [Quant] [PT2] Add ConvBNAdd(ReLU) Annotation into X86InductorQuantizer (#111281)
**Summary**
This PR adds ConvBNAdd(ReLU) QAT Annotation into `X86InductorQuantizer`.

**Test Plan**
```
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_binary_with_quantizer_api
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_binary_unary_with_quantizer_api
python -m pytest test_mkldnn_pattern_matcher.py -k test_qat_qconv2d_add
python -m pytest test_mkldnn_pattern_matcher.py -k test_qat_qconv2d_add_relu
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111281
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #111280
2023-11-02 02:05:49 +00:00
56ca0043f6 [Quant] [PT2] Enable QAT Quantization flow in X86InductorQuantizer (#111280)
**Summary**
This PR enables PT2 QAT Quantization flow in `X86InductorQuantizer`.
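
A hedged end-to-end sketch of the flow this enables (entry points as of this era; they may have moved in later releases):
```python
import torch
import torch.ao.quantization.quantizer.x86_inductor_quantizer as xiq
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_qat_pt2e

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).train()
example_inputs = (torch.randn(1, 3, 32, 32),)

exported = torch._export.capture_pre_autograd_graph(model, example_inputs)
quantizer = xiq.X86InductorQuantizer()
quantizer.set_global(
    xiq.get_default_x86_inductor_quantization_config(is_qat=True)
)
prepared = prepare_qat_pt2e(exported, quantizer)
# ... run QAT fine-tuning on `prepared` ...
converted = convert_pt2e(prepared)
```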

**Test Plan**
```
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_with_quantizer_api
python -m pytest test_x86inductor_quantizer.py -k test_qat_conv2d_unary_with_quantizer_api
python -m pytest test_mkldnn_pattern_matcher.py -k test_qat_qconv2d
python -m pytest test_mkldnn_pattern_matcher.py -k test_qat_qconv2d_relu
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111280
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-11-02 02:03:10 +00:00
8191fb3e06 [Reland2] [inductor][BE] split triton_meta and inductor_meta (#112351)
triton_meta is intended to be passed directly to Triton. Previously we were also putting other metadata into triton_meta, but we should split out the other metadata into a separate dict to avoid possible conflicts in the future.

This PR splits out triton_meta and inductor_meta so we have a place to put additional metadata that isn't intended to be passed to triton.

Tests - wait for CI

Differential Revision: [D50864493](https://our.internmc.facebook.com/intern/diff/D50864493)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112351
Approved by: https://github.com/eellison
2023-11-02 00:40:12 +00:00
ff35e1e45b [pytree] Add custom treespec fqn field (#112428)
Custom classes that are serialized with pytree are serialized by default with `f"{class.__module__}.{class.__name__}"`. This is a dependency from our serialized program directly into the outer Python environment: if a user moves the class to a different directory, the serialized program cannot be loaded. So, we will require users to pass in an FQN if they want to serialize their custom treespec type.
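
A hedged sketch of a registration with an explicit FQN; the helper name and the `serialized_type_name` keyword are assumptions based on this PR's description, so check `torch.utils._pytree` for the exact API:
```python
import torch.utils._pytree as pytree

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

# `serialized_type_name` pins the serialized name so it no longer depends
# on where the class happens to live in the Python environment.
pytree._register_pytree_node(
    Point,
    lambda p: ((p.x, p.y), None),             # flatten
    lambda children, _ctx: Point(*children),  # unflatten
    serialized_type_name="mylib.Point",
)
```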

Differential Revision: [D50886366](https://our.internmc.facebook.com/intern/diff/D50886366)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112428
Approved by: https://github.com/suo
2023-11-02 00:26:41 +00:00
131e0f1b75 [export] Separate out graph signature (#112412)
Differential Revision: [D50800524](https://our.internmc.facebook.com/intern/diff/D50800524)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112412
Approved by: https://github.com/zhxchen17
2023-11-02 00:18:28 +00:00
b63335c27a Make ci_expected_accuracy/update_expected.py apply csv linter (#112655)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112655
Approved by: https://github.com/desertfire
2023-11-02 00:05:14 +00:00
af1a8f4cb2 Allow passing in dynamic_shapes without original argument name (#112298)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112298
Approved by: https://github.com/avikchaudhuri
2023-11-02 00:03:36 +00:00
c1e2ccdb97 AssertionError -> AttributeError in cuBLASModule (#112606)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112606
Approved by: https://github.com/eellison
2023-11-01 23:23:10 +00:00
258874888b Refine replacements with equality tests on runtime asserts (#112156)
Just poppin' off some TODOs.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112156
Approved by: https://github.com/albanD, https://github.com/aakhundov
ghstack dependencies: #112155
2023-11-01 23:02:17 +00:00
793c62b79c Allow binary pointwise operations to cause refinement on unbacked SymInts (#112155)
To do this, there is a little detour to remove hint caching for unbacked
SymInts; now, we just always attempt to update the hint (using
maybe_evaluate_static; this is much better than the replace we were
doing before) if we don't think we know it.

With this change, we now can generally infer that i0 == 1 is false for
a size-like unbacked SymInt.  So if we write the size match /
broadcasting test very carefully (see comment), we will eventually
end up expect_true(sizeA == sizeB), which is good enough to cause
refinement.  Phew!

I think I still want to set up a replacement if you do i0 == s0, but I'm
going to do that in a follow up.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112155
Approved by: https://github.com/aakhundov, https://github.com/voznesenskym
2023-11-01 23:02:17 +00:00
4f5acf8329 Log non-pt2_compliant ops encountered by Dynamo (#112581)
Summary:
See internal diff for more changes. Whenever we encounter a non-compliant op,
we add it to a set on the OutputGraph. When a compilation event happens, we log
the contents of this set.

I'm planning on flipping the `only_allow_pt2_compliant_ops` config from False
to True after the logging determines that existing models do not use
non-compliant ops.

Test Plan: - Tested the logging internally locally

Differential Revision: D50884828

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112581
Approved by: https://github.com/yanboliang
2023-11-01 22:53:16 +00:00
00d6d2f66b [aotinductor] Add example_value metadata to nodes (#112415)
split_cat fx passes expect the `example_value` metadata on every node. However, the graph module from _export_torch_ir does not contain this metadata, causing the split_cat fx passes to not run. So, I added a pass to add this metadata to every node in the graph.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112415
Approved by: https://github.com/frank-wei
2023-11-01 22:44:50 +00:00
f8285b1195 [dynamo] Fix nested torch function mode not setting correct value on exiting (#112621)
Should exit to the dynamo-stubbed value, not the real value, as the real value is never mutated.

Fixes https://github.com/pytorch/pytorch/issues/112620

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112621
Approved by: https://github.com/jansel
2023-11-01 22:07:35 +00:00
9e2af971fc [Quantization] Add "quantization_tag" as metadata to fx proxy (#108764)
Summary:
In order to make sure that quantization_tag is preserved through second-stage export, this PR adds it as special metadata that should be preserved.

Since quantization in the export path works on top of the pre-dispatch graph, subsequent post-dispatch op decomposition will decompose ops that the quant workflow tagged. To make sure that the patterns identified by the quantizer remain identifiable even after decompositions are applied, we must preserve "quantization_tag".

This enables backend delegates, that quantized a model for specific
backend, to be able to identify "quantized" patterns.

Test Plan:
metadata porting tests

Differential Revision: [D49056259](https://our.internmc.facebook.com/intern/diff/D49056259)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108764
Approved by: https://github.com/tugsbayasgalan, https://github.com/jerryzh168
2023-11-01 21:41:58 +00:00
e06288f8f1 skip test in test_eager_transforms.py while Triton lacks ARM support (#112092)
Fixes the failure of test_compile_vmap_hessian in test_eager_transforms.py by skipping the test while we wait for ARM support from Triton. cc @ptrblck @eqy

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112092
Approved by: https://github.com/eqy, https://github.com/huydhn
2023-11-01 21:33:18 +00:00
5b0840c71b Guarantee expr is a sympy.Expr before xreplace'ing it (#112619)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112619
Approved by: https://github.com/eellison, https://github.com/voznesenskym
2023-11-01 21:26:27 +00:00
5d7f23b1f4 [HighOrderOp] allow aliasing a variable from outer scope in higher order op (#112537)
Fixes #112169

This PR follows voz's idea of disabling rename inside a higher-order operator body, to avoid the confusion between renaming and mutating: we'd like to allow renames and forbid mutation. Specifically, the confusion arises because rename creates a new variable tracker and calls replace_all for MutableLocal. We would either have to (1) look at the fields of the variable tracker to determine whether it's just a name change, (2) pass information into replace_all telling it the operation is a rename so it skips the side-effect check, or (3) make rename mutate the user_code_variable_name on the variable tracker (note: we've been doing this for MutableSideEffects). All three approaches seem undesirable.

We end up disabling rename if dynamo is speculating inside a higher order operator's body.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112537
Approved by: https://github.com/zou3519
2023-11-01 20:59:00 +00:00
9d23440c81 Nvfuser code base nuke (#111447)
Removes the nvFuser code base.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111447
Approved by: https://github.com/albanD
2023-11-01 20:53:14 +00:00
5a6f8014c4 Add a decomposition for _weight_norm_interface. (#112193)
Fixes #112086
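
For reference, a hedged sketch of the math being decomposed, w = g * v / ||v||, with the norm taken over every dimension except `dim` (illustrative, not the exact decomposition added here):
```python
import torch

def weight_norm_interface_decomp(v, g, dim: int = 0):
    # Norm over all dimensions except `dim`, matching norm_except_dim.
    reduce_dims = [d for d in range(v.dim()) if d != dim]
    norm = v.norm(2, dim=reduce_dims, keepdim=True)
    return v * (g / norm), norm

v, g = torch.randn(4, 5), torch.randn(4, 1)
w, _ = weight_norm_interface_decomp(v, g)
torch.testing.assert_close(w, torch._weight_norm(v, g, 0))
```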

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112193
Approved by: https://github.com/ezyang
2023-11-01 19:51:11 +00:00
1b86d5ef2f [Ci] Add arm64 libtorch CI config (#112474)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112474
Approved by: https://github.com/ZainRizvi, https://github.com/seemethere
ghstack dependencies: #112451, #112452
2023-11-01 19:09:34 +00:00
7f77ec37be [Inductor] Clarify mutation related comments (#112466)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112466
Approved by: https://github.com/Chillee
2023-11-01 18:39:58 +00:00
dd24e92949 Use OpOverload instead of OpOverloadPacket for size/stride/etc slots (#112119)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112119
Approved by: https://github.com/yanboliang
2023-11-01 18:26:01 +00:00
ab20bab729 [ONNX] Fix partial name matching when searching parameter tensors (#112517)
Now we remove a name from `onnx_input_names` once it's matched by a parameter, so that the same name won't be matched twice.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112517
Approved by: https://github.com/thiagocrepaldi
2023-11-01 18:25:26 +00:00
623a311d22 fix torch.distributed.rpc example incorrect usage (#112367)
Fixes #112366
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112367
Approved by: https://github.com/H-Huang
2023-11-01 18:08:32 +00:00
54c7d0d99d [GHF] Bot should reopen PR after revert (#112614)
Fixes https://github.com/pytorch/test-infra/issues/4692
Test plan, see https://github.com/malfet/deleteme/pull/58#issuecomment-1789365259 / https://github.com/malfet/deleteme/actions/runs/6723011476
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112614
Approved by: https://github.com/seemethere, https://github.com/ezyang
ghstack dependencies: #112613
2023-11-01 18:03:32 +00:00
4a2242e479 [BE] Use GITHUB_API_URL (#112613)
To avoid hardcoding the same string constant over and over again
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112613
Approved by: https://github.com/seemethere
2023-11-01 18:03:32 +00:00
fd209543d5 Add torch.utils.deterministic.fill_uninitialized_memory flag (#111377)
Part of #109802

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111377
Approved by: https://github.com/albanD, https://github.com/aaronenyeshi
2023-11-01 16:10:09 +00:00
cce5016653 [Profiler] Manual Submodule Update for Kineto (#112540)
Summary:
Update the submodule of the Kineto project. Includes the following changes:

  - Fix HAS_CUPTI macro uses
  - Added error condition count tracking and prints
  - Collect more info on cudaEventRecord for stream wait sync events
  - Fix CUDA 11.7 support for new cudaLaunchKernelExC
  - Fix newlines in error info logging causing broken JSON
  - Kineto samples programs are fixed and updated
  - ROCm lib path fixed.
  - Clearing rocTracer cached data causing memory leaks
  - Fix int overflow in counter of activities
  - Populate collective metadata from CPU op to GPU kernels
  - Updated TEARDOWN_CUPTI to check value is 1

Test Plan: CI

Differential Revision: D50861994

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112540
Approved by: https://github.com/davidberard98
2023-11-01 16:03:04 +00:00
84f59d893a [fx] Cache translation_validation_enabled on ShapeEnv (#112493)
`ShapeEnv` has tons of functionality that is conditioned on this
`translation_validation_enabled()` check, to the point where 8% of
time in `empty_strided` is spent just in that function.

However, it doesn't really make sense for the value of
`translation_validation_enabled()` to change throughout the life of a `ShapeEnv`
so we might as well run the check once and store it in the `ShapeEnv`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112493
Approved by: https://github.com/lezcano
ghstack dependencies: #112418
2023-11-01 14:37:28 +00:00
9e89c36a54 [FakeTensor] Reuse flat_args throughout FakeTensorMode.dispatch (#112418)
This function repeatedly flattens and unflattens the `args, kwargs` pair so we
get a quite significant perf improvement from saving the `flat_args` and
operating directly on those. I see a 15% improvement in dispatch for
`empty_strided`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112418
Approved by: https://github.com/lezcano
2023-11-01 14:37:28 +00:00
a126bbfea3 [AOTInductor] Include AOTI debug folder in package (#112514)
Summary:
Allow user to set debug dir for Inductor

Include AOTInductor debug folder in the package.

```
zipinfo package.zip
Archive:  package.zip
Zip file size: 1325264 bytes, number of entries: 46
-rw----     0.0 fat      212 bl stor 80-000-00 00:00 package/data/aotinductor/merge-a100/aotinductor_pickle_data.json
-rw----     0.0 fat     6024 bl stor 80-000-00 00:00 package/data/aotinductor/merge-a100/debug/torchinductor/model___9.0/fx_graph_runnable.py
-rw----     0.0 fat     9031 bl stor 80-000-00 00:00 package/data/aotinductor/merge-a100/debug/torchinductor/model___9.0/fx_graph_readable.py
-rw----     0.0 fat     9202 bl stor 80-000-00 00:00 package/data/aotinductor/merge-a100/debug/torchinductor/model___9.0/fx_graph_transformed.py
-rw----     0.0 fat    10865 bl stor 80-000-00 00:00 package/data/aotinductor/merge-a100/debug/torchinductor/model___9.0/ir_pre_fusion.txt
-rw----     0.0 fat    10865 bl stor 80-000-00 00:00 package/data/aotinductor/merge-a100/debug/torchinductor/model___9.0/ir_post_fusion.txt
-rw----     0.0 fat    13553 bl stor 80-000-00 00:00 package/data/aotinductor/merge-a100/debug/torchinductor/model___9.0/output_code.py
-rw----     0.0 fat     5822 bl stor 80-000-00 00:00 package/data/aotinductor/merge-a100/debug/torchinductor/model___9.1/fx_graph_runnable.py
-rw----     0.0 fat     8817 bl stor 80-000-00 00:00 package/data/aotinductor/merge-a100/debug/torchinductor/model___9.1/fx_graph_readable.py
-rw----     0.0 fat     8988 bl stor 80-000-00 00:00 package/data/aotinductor/merge-a100/debug/torchinductor/model___9.1/fx_graph_transformed.py
-rw----     0.0 fat    10858 bl stor 80-000-00 00:00 package/data/aotinductor/merge-a100/debug/torchinductor/model___9.1/ir_pre_fusion.txt
-rw----     0.0 fat    10858 bl stor 80-000-00 00:00 package/data/aotinductor/merge-a100/debug/torchinductor/model___9.1/ir_post_fusion.txt
```

Test Plan: CIs

Reviewed By: chenyang78

Differential Revision: D50815320

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112514
Approved by: https://github.com/chenyang78, https://github.com/desertfire
2023-11-01 08:25:11 +00:00
29f3d392bf Inductor cpp wrapper: support QLinear (#112378)
Align the type of `post_op_args` in the schema of `onednn::qlinear_pointwise` to be the same as other fusion OPs like qconv, conv, conv_transpose, linear by changing from `float[]` to `Scalar?[]`:
cb942ef2b1/aten/src/ATen/native/quantized/library.cpp (L260-L266)

cb942ef2b1/aten/src/ATen/native/mkldnn/RegisterMkldnnOpContextClass.cpp (L48-L59)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112378
Approved by: https://github.com/jgong5, https://github.com/desertfire
ghstack dependencies: #112373
2023-11-01 06:22:16 +00:00
337d69e40a Inductor cpp wrapper: support QConv (#112373)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112373
Approved by: https://github.com/jgong5, https://github.com/desertfire
2023-11-01 06:15:49 +00:00
e061144aaf [inductor] replace ops.div with ops.truediv (#112243)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112243
Approved by: https://github.com/lezcano
ghstack dependencies: #112234
2023-11-01 05:50:51 +00:00
2ed3a73e40 [dynamo] treat torch.device, torch.dtype as constant literal; revise guards to have access to torch module (#112426)
Just like containers of constant literals (e.g., a list or set of them), these are constant literals.

We follow up to https://github.com/pytorch/pytorch/pull/112416, enforcing that we always use `ConstantVariable` to represent these.

Replace https://github.com/pytorch/pytorch/pull/112284, https://github.com/pytorch/pytorch/pull/112332 as incomplete, in case there is no movement there.

Ought to fix: https://github.com/pytorch/pytorch/issues/109910

We remove the old guard special-casing, which fell back on string equality when the `torch` module was not accessible in `eval`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112426
Approved by: https://github.com/ezyang
2023-11-01 05:28:28 +00:00
76918367ff fix(dynamo): Optimizer._init_group did not handle return value (#110709)
blocks: https://github.com/pytorch/pytorch/pull/110706

This causes a bug for all optimizers that use the `_init_group` return value.

`compile` + _init_group ret value is not on testing path. So we also add test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110709
Approved by: https://github.com/ezyang
2023-11-01 05:22:42 +00:00
c73da67d46 new_qtensor support privateuseone allocator. (#111464)
I want to create a quantized tensor through `PerTensorAffineQuantizer`, but I found that it throws an error because of the missing check for PrivateUse1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111464
Approved by: https://github.com/ezyang
2023-11-01 05:16:58 +00:00
748c1a1d81 [dynamo] Be stricter about HigherOrderOperator kwargs (#111938)
kwargs need to be handled carefully in speculate subgraph. We should be clearer about the contract of what the inputs are.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111938
Approved by: https://github.com/zou3519
2023-11-01 04:10:09 +00:00
320ac546ed Clarify difference between share_memory and from_file (#111856)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111856
Approved by: https://github.com/albanD
ghstack dependencies: #111688
2023-11-01 03:25:09 +00:00
df0a3c0541 Upload ROCm artifacts from the new workflow to S3 (#112442)
This is raised as a regression after https://github.com/pytorch/pytorch/pull/111394#issuecomment-1785858263.  The jobs are now in a different workflow, so their artifacts weren't uploaded to S3 like other trunk jobs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112442
Approved by: https://github.com/ZainRizvi, https://github.com/malfet
2023-11-01 03:06:15 +00:00
dcd94814a3 [inductor][fx pass] Add split-stack-tanh-unbind pattern detection (#111854)
Summary: We add a new pattern to further close the gap between fxt and pt2

Test Plan:
# unit test
```
buck2 test mode/dev-nosan //caffe2/test/inductor:split_cat_fx_passes
```
Test UI: https://www.internalfb.com/intern/testinfra/testrun/1407375224343119

# icvr local test
[P865759493](https://www.internalfb.com/intern/paste/P865759493/)

before vs after transformation after "merge_getitem_cat_pass":
https://www.internalfb.com/intern/diffing/?paste_number=854132317

# e2e test
The proposal is bundled D50207610, D50397173 and D50100667

### ICVR
baseline:
f489286934
baseline + optimus:
f489287369
proposal:
f492987960

### CMF
baseline:
f489195078
baseline + optimus:
f489215258
proposal:
f492970293

### IG_CTR
baseline:
f489237630
baseline + optimus:
f489238767
proposal:
f492977663

Differential Revision: D50397173

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111854
Approved by: https://github.com/jackiexu1992
2023-11-01 03:04:25 +00:00
a1e222ef02 metric table (#109245)
In dynamo/inductor, it sometimes helps to gather metrics/statistics for each model at different levels: model level, graph level, kernel level, or pairs of fusion nodes. This kind of thing would be very easy to do with Scuba, but we only have Scuba in fbcode. This PR builds metric tables to solve part of the problem.

Q: why not log to stdout/err directly?
A: sometimes we need more structured data. E.g., it would be helpful to gather all the stats in a CSV and then do post-processing (like calculating a geomean, etc.). Also, the metric table will tag each row with the model name, which is helpful.

Q: what's the difference with speedup_inductor.csv?
A: speedup_inductor.csv is a special case that gathers statistics at the model level, i.e., we have one row for each model. But recording statistics at a finer-grained level, like per graph, is also helpful.

Example use cases:
- As a followup to the benchmark fusion PR, I want to gather all the 'slow' fusions and analyze them. With the metric table, I can easily log slow fusions for each model into a CSV file. Here is the log gathered for huggingface:
 https://gist.github.com/shunting314/964e73cc98368b301414ec7b7ad4c702 .
- To help understand the effect of the 'loop ordering after fusion' PR, it would be helpful to gather stats like how many fusions happen for each graph. Previously we logged the metric to stderr directly, but logging these metrics in a structured way is more useful.
- Gather the number of registers, register spills, and shared memory usage for each kernel in each model, with runnable kernel code logged (a toy sketch follows this list).
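
To make the idea concrete, here is a toy sketch of such a table (hypothetical names, not this PR's API): each row is tagged with the model name and appended to a per-table CSV for later post-processing.

```python
import csv
import os

class MetricTable:
    """Append structured, model-tagged rows to a per-table CSV."""
    def __init__(self, name, fields):
        self.path = f"metric_table_{name}.csv"
        self.fields = ["model_name"] + fields
        if not os.path.exists(self.path):
            with open(self.path, "w", newline="") as f:
                csv.writer(f).writerow(self.fields)

    def add_row(self, model_name, **metrics):
        row = [model_name] + [metrics[k] for k in self.fields[1:]]
        with open(self.path, "a", newline="") as f:
            csv.writer(f).writerow(row)

slow_fusion = MetricTable("slow_fusion", ["kernel1", "kernel2", "speedup"])
slow_fusion.add_row("hf_Bert", kernel1="triton_red_0", kernel2="triton_poi_1", speedup=0.93)
```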

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109245
Approved by: https://github.com/jansel, https://github.com/mlazos
2023-11-01 02:33:42 +00:00
5296c14094 Add inverse gamma distribution and fix sign bug in PowerTransform. (#104501)
This PR comprises a few small contributions:

1. `PowerTransform` returned a sign of `+1` irrespective of exponent. However, it should return the sign of the exponent because the gradient has the same sign as the exponent. That issue has been fixed.
2. Added tests to catch errors akin to 1. in the future.
3. Added an `InverseGamma` distribution as a `TransformedDistribution` with `PowerTransform(-1)` and `Gamma` base distribution. The `InverseGamma` is often used as a prior for the length scale of Gaussian processes to aggressively suppress short length scales (see [here](https://betanalpha.github.io/assets/case_studies/gaussian_processes.html#323_Informative_Prior_Model) for a discussion).

Note: I added a `positive` constraint for the support of the inverse gamma distribution because the `PowerTransform(-1)` can fail for `nonnegative` constraints if the random variable is zero.

```python
>>> torch.distributions.InverseGamma(0.5, 1.0).log_prob(torch.zeros(1))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-758aa22deacd> in <module>
----> 1 torch.distributions.InverseGamma(0.5, 1.0).log_prob(torch.zeros(1))

~/git/pytorch/torch/distributions/transformed_distribution.py in log_prob(self, value)
    140         """
    141         if self._validate_args:
--> 142             self._validate_sample(value)
    143         event_dim = len(self.event_shape)
    144         log_prob = 0.0

~/git/pytorch/torch/distributions/distribution.py in _validate_sample(self, value)
    298         valid = support.check(value)
    299         if not valid.all():
--> 300             raise ValueError(
    301                 "Expected value argument "
    302                 f"({type(value).__name__} of shape {tuple(value.shape)}) "

ValueError: Expected value argument (Tensor of shape (1,)) to be within the support (GreaterThan(lower_bound=0.0)) of the distribution InverseGamma(), but found invalid values:
tensor([0.])
```

This differs from the scipy implementation.

```python
>>> scipy.stats.invgamma(0.5).pdf(0)
0.0
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104501
Approved by: https://github.com/fritzo, https://github.com/ezyang
2023-11-01 02:26:25 +00:00
0347b36b52 SummaryWriter.add_figure: add type hints (#110021)
Discovered a bug in our code that could have been prevented by type hints, so I added them 😄
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110021
Approved by: https://github.com/ezyang
2023-11-01 02:19:09 +00:00
6dd002f24e avoid readonly arrays (#112524)
Since PyTorch does not have read-only tensors, compiling code with read-only numpy arrays warns about possible UB. Thus we detect read-only arrays, flip them to writeable, and clone the resulting tensor.

BTW, this is a break from numpy semantics: the resulting array is writeable.
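
A sketch of the workaround, assuming a standalone helper (not the PR's actual code path):

```python
import numpy as np
import torch

def to_writeable_tensor(arr: np.ndarray) -> torch.Tensor:
    if not arr.flags.writeable:
        arr.setflags(write=True)              # flip the flag (works when arr owns its data)
        return torch.from_numpy(arr).clone()  # clone so the flipped buffer is not aliased
    return torch.from_numpy(arr)

a = np.arange(4.0)
a.setflags(write=False)
t = to_writeable_tensor(a)
t += 1  # mutating the clone is safe
```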

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112524
Approved by: https://github.com/lezcano
2023-11-01 02:15:03 +00:00
3cee033b98 Reland of a bunch of pattern matcher + indexing fixes (#112476)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112476
Approved by: https://github.com/oulgen
2023-11-01 02:13:44 +00:00
ef1f08c5a0 State_dict serialization for meta tensors (#112213)
Summary: Add cases for serializing meta tensors from state_dict

Test Plan: sandcastle

Differential Revision: D50718161

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112213
Approved by: https://github.com/zhxchen17, https://github.com/houseroad
2023-11-01 01:07:09 +00:00
41720c2a48 [dynamo] add infinite generators itertools.{count, repeat, cycle} (#110967)
Fixes https://github.com/pytorch/pytorch/pull/110953/files#r1352868935

Depends on: https://github.com/pytorch/pytorch/pull/110953

Why not use these for `repeat(item, count)`:
> These are not preferred as they return an opaque VariableTracker. In particular, one cannot do `enumerate(repeat(1))`. `repeat(1, 10)` benefits from the integration enjoyed by `ListVariableIterator`

Follow ups:
- [ ] make listiterator an IteratorVariable, define iterator integrations on base IteratorVariable where unspecialized https://github.com/pytorch/pytorch/pull/110967#discussion_r1356656469
    - Please make a new issue for this
- [ ] explore integrating cpython itertools test suite https://github.com/pytorch/pytorch/pull/110967#discussion_r1358326402
- [ ] Use something other than `StopIteration` to handle iterator termination https://github.com/pytorch/pytorch/pull/110967#discussion_r1358336038
- [ ] Add test case for consuming iterator simultaneously from two code points https://github.com/pytorch/pytorch/pull/110967/files#r1358325511

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110967
Approved by: https://github.com/ezyang
2023-11-01 00:33:17 +00:00
9bfebf754f [dynamo] fix graph break, improve hygiene - enforce using ConstantVariable for torch.device,torch.dtype (#112416)
Fixes https://github.com/pytorch/pytorch/pull/112332/files#r1375690808

Simplify code paths, fix graph break

```
torch._dynamo.exc.InternalTorchDynamoError: TorchVariable() has no type
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112416
Approved by: https://github.com/lezcano
2023-11-01 00:19:52 +00:00
74e6c877e9 Revert "[inductor] Memory planning (#112178)"
This reverts commit f64a97c6f88873363c5b3c4c33f231b5578085b2.

Reverted https://github.com/pytorch/pytorch/pull/112178 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it seems that ROCm will need to be fixed for the new test too f64a97c6f8 ([comment](https://github.com/pytorch/pytorch/pull/112178#issuecomment-1788195311))
2023-11-01 00:03:56 +00:00
333d5821ee [ROCm] Add gcnArchName to collect_env and torch.cuda.get_device_properties (#107477)
Printing just the device name is not helpful when investigating PyTorch issues filed for specific AMD GPUs, as the support/issue might depend on the gfx arch, which is part of the gcnArchName property.

`torch.cuda.get_device_properties(0).gcnArchName` will print the value of the `gcnArchName` property: eg.
```
>>> torch.cuda.get_device_properties(0).gcnArchName
'gfx906:sramecc+:xnack-'
```

```
root@6f064e3c19fb:/data/pytorch/test# python ../torch/utils/collect_env.py
...
GPU models and configuration: AMD Radeon Graphics(gfx906:sramecc+:xnack-)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107477
Approved by: https://github.com/albanD
2023-10-31 23:05:36 +00:00
4daf8afe8e Revert "Fix bug: not creating empty tensor with correct sizes and device. (#106734)" (#112170)
This reverts commit 528a2c0aa97d152b8004254040076b8ae605bf9f.

The PR is wrong, see #110941.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112170
Approved by: https://github.com/albanD
2023-10-31 23:02:33 +00:00
0f4d2904be [dynamo] compiled_autograd support for post_acc_grad hooks (#112326)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112326
Approved by: https://github.com/jansel
ghstack dependencies: #112325
2023-10-31 22:53:01 +00:00
16953482d9 Revert "Enable planner to be used for loading sharded optimizer state dict (#112259)"
This reverts commit 6188f2e899e58cc120afd571094a97047bf97681.

Reverted https://github.com/pytorch/pytorch/pull/112259 on behalf of https://github.com/ZainRizvi due to Sorry, but this breaks internal builds. @wz337 can you please help fix this? ([comment](https://github.com/pytorch/pytorch/pull/112259#issuecomment-1788119247))
2023-10-31 22:27:48 +00:00
c8b74fd012 Add assigntome-docathon workflow (#112525)
- Adding a workflow to enable docathon participants to assign issues to themselves.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112525
Approved by: https://github.com/clee2000
2023-10-31 22:23:32 +00:00
9e0cd64c5e [fx] Add Graph option for replace_pattern (#112409)
Summary:
Allow doing pattern replacement with just an fx.Graph instead of a fx.GraphModule,
which can let callers avoid paying the cost of `recompile()` for a small graph if they
don't need the module.

This is a significant speedup if you use hundreds of small patterns for replacement.
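
For contrast, here is the existing GraphModule-based entry point; this PR adds an analogous path that accepts a bare fx.Graph and skips `recompile()` (the new signature is not shown here, only the established API):

```python
import torch
import torch.fx as fx

def pattern(x, y):
    return torch.add(x, y)

def replacement(x, y):
    return torch.sub(x, y)

def f(a, b):
    return torch.add(a, b) * 2

gm = fx.symbolic_trace(f)
fx.replace_pattern(gm, pattern, replacement)  # rewrites the graph, then recompile()s
print(gm.code)  # now computes torch.sub(a, b) * 2
```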

Test Plan: Tested in a diff stacked on top of this: {D50756722}

Reviewed By: SherlockNoMad, angelayi

Differential Revision: D50756723

@diff-train-skip-merge

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112409
Approved by: https://github.com/ZainRizvi
2023-10-31 22:16:20 +00:00
53acdb66f7 [primtorch] aten.normal decomp has wrong return type due to elementwise_type_promotion_wrapper (#112467)
Fixes https://github.com/pytorch/pytorch/issues/112449

elementwise_type_promotion_wrapper will promote `aten.normal` to the dtypes of `mean`, `std` args.

But this is incorrect if we provide the dtype param. Hence, we allow overriding the result_dtype if a specified dtype arg is available.

This problem is unique to `aten.normal`; all other decorated ops do not have a dtype param.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112467
Approved by: https://github.com/lezcano
2023-10-31 20:57:09 +00:00
24f217ee64 [Nested tensor] Add more ops in Python subclass nested tensor (#112302)
Summary: Add dropout, split_with_sizes, and silu operations to the Python-subclass nested tensor

Test Plan: unit tests

Differential Revision: D50676812

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112302
Approved by: https://github.com/soulitzer, https://github.com/jbschlosser
2023-10-31 20:57:05 +00:00
17fd4885aa [dynamo] Support custom dict constructor with kwargs (#112513)
Summary:

As of https://github.com/pytorch/pytorch/pull/103192, dynamo
supports code that creates OrderedDict instances using kwargs
for the key-value pairs rather than passing a dict literal.

But custom dicts (for example subclasses of OrderedDict) follow
a different codepath so that we can check for conditions such
as a custom `__init__` that need to force a graph break.

This commit allows kwargs for custom dict constructors: if the
args are empty and the class is not also a dataclass (the case
that, for example, a `transformers.modeling_outputs.ModelOutput`
instance winds up hitting), then we treat the kwargs as the
key-value pairs.

NOTE: For this to behave 100% correctly, we are relying on
the fact that python dicts behave like ordered dicts so that they
preserve the kwargs' ordering. Technically it is not guaranteed that
future versions of Python will respect this; if that behavior changes
we would need to ensure that dynamo uses OrderedDict for kwargs all
the way down in order to handle special cases like OrderedDict where
the kwargs' ordering does matter.
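
A minimal example of the now-supported pattern, assuming the behavior described above (`fullgraph=True` asserts there is no graph break):

```python
import torch
from collections import OrderedDict

class MyDict(OrderedDict):  # custom dict subclass without a custom __init__
    pass

@torch.compile(fullgraph=True)
def f(x):
    d = MyDict(a=x + 1, b=x - 1)  # constructed from kwargs, not a dict literal
    return d["a"] * d["b"]

print(f(torch.ones(2)))
```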

Test Plan:

```
pytest test/dynamo/test_functions.py
```

I also verified that the new test fails without the changes to
`dicts.py`.

Reviewers: yanboliang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112513
Approved by: https://github.com/yanboliang
2023-10-31 20:55:38 +00:00
f74d766632 feat(optim): use has_complex shortcut flag for all applicable optimizers, use _view_as_real auxiliary function (#110706)
Follow up to: https://github.com/pytorch/pytorch/pull/110607

CC: @lezcano @janeyx99
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110706
Approved by: https://github.com/lezcano
2023-10-31 20:33:03 +00:00
90bef4411e [Profiler] Disable CUPTI Teardown when using CUDA Graphs (#112507)
Summary:
CUDA Graph does not work well with CUPTI teardown.
    1) crashes on 1st lazy CUPTI re-init after teardown (CUDA 11)
    2) crashes on 2nd non-lazy CUPTI re-init after teardown (CUDA 12)

Workaround: turn off CUPTI teardown when using CUDA Graphs completely.

Test Plan: CI

Differential Revision: D50811284

Pulled By: aaronenyeshi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112507
Approved by: https://github.com/davidberard98
2023-10-31 20:17:05 +00:00
bc098c7fc2 Revert "[dynamo] ExecutorchCallDelegateHigherOrderVariable - add sanity check that input and output tensors are disjoint (#111960)"
This reverts commit 25f06ee51b0a113d13612cdc4dc7275250436bd0.

Reverted https://github.com/pytorch/pytorch/pull/111960 on behalf of https://github.com/izaitsevfb due to Breaks internal tests, [T168506136](https://www.internalfb.com/intern/tasks/?t=168506136) ([comment](https://github.com/pytorch/pytorch/pull/111960#issuecomment-1787964742))
2023-10-31 20:14:20 +00:00
b1b3d489f3 Revert "[dynamo] Be stricter about HigherOrderOperator kwargs (#111938)"
This reverts commit eb8af4dc675c625bbe2a28077e5951d4bbe8b862.

Reverted https://github.com/pytorch/pytorch/pull/111938 on behalf of https://github.com/izaitsevfb due to Reverting to unblock the revert of #111960 ([comment](https://github.com/pytorch/pytorch/pull/111938#issuecomment-1787960567))
2023-10-31 20:10:58 +00:00
f64a97c6f8 [inductor] Memory planning (#112178)
This was originally @jansel's PR:
https://github.com/pytorch/pytorch/pull/102625, which I've built upon.

This diff implements static memory planning. It's disabled by default
while we examine its performance.

We use a greedy-by-size approach. For dynamic shapes, the sizes of the
example inputs are used as estimates when making planning decisions. We
generate expressions to calculate the actual memory offsets and sizes at
runtime when the values of the dynamic shapes are known. In order to
simplify these calculations, we have organized the allocations into a
tree that branches on space (address offsets) and time (live ranges).
Finally, we need to align these offsets, so we have added an `align`
sympy Expr to express these calculations.

Some limitations:

1. It is only enabled during inference for now. Enabling it for training
   increases peak memory usage as we allocate all the memory needed for
   training upfront, before freeing the memory allocated during
   inference. We can probably address this by doing planning for both
   the inference and training passes together.
2. It doesn't work with PyTorch Distributed, because kernels like
   AllGatherIntoTensor codegen strings which do memory operations. We
   can fix this down the line by having them emit MemoryPlanningLines
   instead.
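
To illustrate the placement strategy, here is a toy greedy-by-size planner (illustrative only, not inductor's implementation). Each buffer is a `(size, live_start, live_end)` triple; the largest buffer is placed first, at the lowest offset that does not collide with any already-placed buffer whose live range overlaps:

```python
def plan(buffers):
    placed = []  # (offset, size, live_start, live_end)
    for size, start, end in sorted(buffers, key=lambda b: -b[0]):
        offset, conflict = 0, True
        while conflict:
            conflict = False
            for o, s, ps, pe in placed:
                time_overlap = not (end < ps or pe < start)
                space_overlap = offset < o + s and o < offset + size
                if time_overlap and space_overlap:
                    offset = o + s  # bump past the colliding allocation
                    conflict = True
        placed.append((offset, size, start, end))
    return placed  # pool size = max(offset + size) over all entries

print(plan([(2048, 3, 4), (1024, 0, 2), (512, 1, 3)]))
```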

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112178
Approved by: https://github.com/desertfire, https://github.com/jansel
2023-10-31 20:02:30 +00:00
aa649f713f [dynamo, test] remove #ops comparison to fx.symbolic_trace from dynamo standard_test (#112420)
Fix https://github.com/pytorch/pytorch/issues/112230 by removing the comparison of the number of ops in dynamo vs. fx.symbolic_trace. A number of tests in `test_functions.py` fail because the number of ops is no longer the same, but this seems to be acceptable behavior by dynamo.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112420
Approved by: https://github.com/jansel, https://github.com/int3
2023-10-31 19:55:47 +00:00
bb45f89cd9 Hackable distributed filesystem reader and writer (#106635)
I propose some changes so that `FileSystemReader` and `FileSystemWriter` can be used on other file systems. The user only needs to provide `path` as a subclass of `Path` that overrides the necessary interfaces.

For example, one can utilize `tf.io.gfile` to implement an interface to save to or load from HDFS. The following code snippet shows a working implementation.

```python
from pathlib import Path
import tensorflow as tf

class GFileWrapper(tf.io.gfile.GFile):
    def __init__(self, path, mode="r") -> None:
        super().__init__(path, mode)

    def write(self, data):
        return super().write(bytes(data))

    # a not quite efficient readinto, but it works
    def readinto(self, buffer):
        # read up to buffer's length
        data = self.read(len(buffer))
        length = len(data)
        buffer[:length] = data
        return length

class HdfsPath(type(Path())):
    def __new__(cls, *pathsegments):
        return super().__new__(cls, *pathsegments)

    @staticmethod
    def _fix_path(path):
        path = str(path)
        if path.startswith("hdfs:/") and not path.startswith("hdfs://"):
            path = path.replace("hdfs:/", "hdfs://")
        return path

    def open(self, mode="r", *args, **kwargs):
        return GFileWrapper(HdfsPath._fix_path(self), mode=mode)

    def mkdir(self, **kwargs) -> None:
        return tf.io.gfile.makedirs(HdfsPath._fix_path(self))

    def rename(self, target):
        return tf.io.gfile.rename(HdfsPath._fix_path(self), HdfsPath._fix_path(target))
```

```python
writer = FileSystemWriter(HdfsPath("hdfs://..."), sync_files=False)
reader = FileSystemReader(HdfsPath("hdfs://..."))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106635
Approved by: https://github.com/fduwjj
2023-10-31 19:36:18 +00:00
1df1ae66cc [DTensor] Assert shard dim is less than tensor ndim (#112404)
Assert that the shard dim is less than the tensor ndim. Previously, an index error on line 154 was thrown and the error message was not clear

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112404
Approved by: https://github.com/wanchaol, https://github.com/fduwjj
2023-10-31 19:36:14 +00:00
6ae21e73d3 [inductor] FX graph cache: Add support for symbolic shapes (#111421)
Summary: Add support for caching graphs that have tensor args with symbolic shapes. The high-level approach is to serialize guards with the on-disk cached object and validate that those guards pass before serving a cached object.
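
A toy illustration of the guard-validation idea (not the real FX graph cache format):

```python
class CachedEntry:
    def __init__(self, artifact, guards):
        self.artifact = artifact
        self.guards = guards  # predicates over the symbolic input shapes

    def lookup(self, shapes):
        if all(guard(shapes) for guard in self.guards):
            return self.artifact
        return None  # guard failure: fall through to a fresh compile

entry = CachedEntry("compiled_kernel", [lambda s: s[0] >= 2, lambda s: s[0] % 2 == 0])
print(entry.lookup((8, 16)))  # "compiled_kernel"
print(entry.lookup((3, 16)))  # None -> recompile
```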

Test Plan: New unit tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111421
Approved by: https://github.com/ezyang
2023-10-31 19:31:05 +00:00
1483097679 Update how Dynamo decides to graph break on an OpOverloadPacket (#112200)
Previously, under config.only_allow_pt2_compliant_ops, Dynamo graph
breaks when it sees an OpOverloadPacket where any overload is not
PT2 compliant. This is potentially brittle: if someone (unlikely) adds
a new overload to a custom operator, then this would cause a
previously non-graph-breaking call to the OpOverloadPacket to graph break.

In this PR:
- When Dynamo is about to write a call to an operator to the FX graph,
we check if it is PT2 compliant.
- For OpOverload, we check to see if the tag is on it
- For OpOverloadPacket, we do overload resolution and check to see if
  the tag is on the OpOverload that it resolves to.

Test Plan:
- new tests, existing tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112200
Approved by: https://github.com/bdhirsh
2023-10-31 19:10:37 +00:00
fb0e3a5740 Refactor TD tests to own folder (#112166)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112166
Approved by: https://github.com/clee2000
ghstack dependencies: #112161
2023-10-31 18:50:54 +00:00
5f461e9ec1 Revert "Error early when dataclass is not registered (#112211)"
This reverts commit b165abaa3b5b2f81fcd69c1060e651aabe38e574.

Reverted https://github.com/pytorch/pytorch/pull/112211 on behalf of https://github.com/ZainRizvi due to Breaks internal builds. See D50820325 ([comment](https://github.com/pytorch/pytorch/pull/112211#issuecomment-1787794078))
2023-10-31 18:45:25 +00:00
a21851c69d fix(inductor): ForeachKernelSchedulerNode group shape should be opaque for graph debug (#110336)
~~Shape is assumed by `TensorMetadata` to be torch.Shape/tuple, however, some of the scheduler node groups utilize `int`, so convert to tuple.~~

The root cause is actually the `foreach` scheduler node having a silently erroneous group of `int`, when in fact it ought to be an opaque `foreach`.

**Previously:** silent error / confusing shape of (0,)
![image](https://github.com/pytorch/pytorch/assets/9093549/5bc2a3c7-151f-4433-bbf8-044c7b03e989)

**Now:** clear that it is a foreach, which does not have a well-defined shape:
![image](https://github.com/pytorch/pytorch/assets/9093549/8373080d-4519-4e74-8a3b-da463e9968da)

~~Alternate might be to create list of shapes for each of its subnodes. Actually, for debuggability sake, I may prefer this. We can ensure that the recursive generation of this string is only done dynamically in a debug code path. Else, incrementally computing it on initialization of ForeachKernel may also be feasible.~~ This is quite infeasible for 100s of params.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110336
Approved by: https://github.com/mlazos
2023-10-31 18:44:08 +00:00
2e40e09d57 [dynamo] {*}Tensor.__init__ from list of Tensor/ndarray as torch.stack(List[FakeTensor]) (#111741)
Follow up to https://github.com/pytorch/pytorch/pull/111665

Fixes: https://github.com/pytorch/pytorch/issues/106207

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111741
Approved by: https://github.com/lezcano
2023-10-31 18:44:04 +00:00
2f51b9223c Make sure namedtuple are preserved when adding backward hooks on Module (#112433)
As per title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112433
Approved by: https://github.com/mikaylagawarecki
2023-10-31 18:40:35 +00:00
94f3df27e4 [aotinductor] reland: return a copy of any constant (#112370)
When the model returns a constant, we cannot "release" its handle,
because the constant doesn't have any handle at all. Instead,
we should allocate a new tensor and then return a copy of the constant.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112370
Approved by: https://github.com/hl475, https://github.com/desertfire
2023-10-31 18:36:44 +00:00
36164265ae [export oncall] add some examples during oncall (#112445)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112445
Approved by: https://github.com/ydwu4
2023-10-31 18:33:03 +00:00
fbafff3668 [reland][inductor] benchmark fusion (#112450)
reland https://github.com/pytorch/pytorch/pull/108193

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112450
Approved by: https://github.com/jansel
2023-10-31 18:17:06 +00:00
481a7a9643 [execution trace] ignore some properties when symbolic size/strides exist (#112458)
Fixes #112235

Otherwise an exception will be thrown when we try to access storage or sizes on a tensor with symbolic size/strides.

Added a test in test/dynamo/test_profiler.py

Differential Revision: [D50821576](https://our.internmc.facebook.com/intern/diff/D50821576)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112458
Approved by: https://github.com/aaronenyeshi
2023-10-31 18:13:03 +00:00
a5641bc56b [TD] Enable Test Class granularity on heuristics (#112161)
Changes the heuristic framework to support prioritizing individual test classes within a test file.

Components of this included:
- Updating TestPrioritizations to accept individual test classes being prioritized. Previously, when a heuristic wanted to prioritize a test file, it would pass in the test's name; now, to prioritize a class within a test, it uses the notation "test::classname"
- Changes are fully backwards compatible with existing heuristics
- Test sharding now supports sharding individual tests (for when they're prioritized)
- When a TestClass is prioritized, we pass the appropriate "-k" flags down to pytest (see the sketch after this list)
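
As referenced above, a sketch of how the "file::classname" notation could map to pytest flags (hypothetical helper, not the framework's actual code):

```python
def to_pytest_args(test: str):
    if "::" in test:
        test_file, test_class = test.split("::", 1)
        return [f"{test_file}.py", "-k", test_class]
    return [f"{test}.py"]

print(to_pytest_args("test_ops::TestCommonCUDA"))  # ['test_ops.py', '-k', 'TestCommonCUDA']
```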

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112161
Approved by: https://github.com/huydhn
2023-10-31 18:11:05 +00:00
5cd1208415 [quant][pt2][be] Refactor QAT q-dq patterns (#112279)
Summary: This commit refactors q-dq patterns used in QAT fusion,
reducing code duplication. This is important for future efforts
to support quantizing bias.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112279
Approved by: https://github.com/jerryzh168
ghstack dependencies: #112159
2023-10-31 18:04:23 +00:00
231129ea36 [quant][pt2] Fix QAT conv-bn bias derived qspec (#112159)
Summary: Today, we have special handling for special qspecs like
`SharedQuantizationSpec` or `DerivedQuantizationSpec`, since these
qspecs refer to other nodes in the graph and these node references
need to be updated after replacement (since they referred to nodes
in the original graph that no longer exist in the new graph).

However, we only do the above for special nodes like conv, bn,
getitem, and relu. This doesn't cover the common use case of
having conv bias derive its qparams from those of conv input
activations and conv weight. This commit adds support for this
use case by also replacing the node references for these nodes.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT.test_qat_conv_bn_bias_derived_qspec

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar

Differential Revision: [D50697078](https://our.internmc.facebook.com/intern/diff/D50697078)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112159
Approved by: https://github.com/jerryzh168
2023-10-31 18:04:23 +00:00
30237aaeec [MPS] Fix bug when value is of complex (#111937)
When the value of `fill` is complex, the line `value.toDouble() == 0.0` errors out, saying that converting a complex to a double would cause overflow. So we should handle the complex value first, before entering this condition.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111937
Approved by: https://github.com/malfet
ghstack dependencies: #111885
2023-10-31 17:50:56 +00:00
3db0095ea2 [reland][quant][pt2e][be] Cleanup observer insertion logic (#111828) (#112453)
Summary: as titled; after the SharedQuantizationSpec bug fix we do some checks beforehand, which simplifies the logic when we insert observers

Test Plan:
contbuild & OSS CI, see bf998a2c5d

Test plan from GitHub:
python test/test_quantization.py TestQuantizePT2E

CIs

Differential Revision: D50816224

Pulled By: jerryzh168

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112453
Approved by: https://github.com/andrewor14
2023-10-31 17:33:24 +00:00
a1c56df1f0 [inductor cpp] vectorize support for truediv (#112234)
Ops like group_norm use `ops.truediv`, which doesn't have vectorization support yet. This PR adds that support.

`test_group_norm_vec`
Before:
```c++
extern "C" void kernel(const float* in_ptr0,
                       const float* in_ptr1,
                       const float* in_ptr2,
                       float* out_ptr0,
                       float* out_ptr1,
                       float* out_ptr2)
{
    #pragma omp parallel num_threads(64)
    {
        {
            #pragma omp for
            for(long x0=static_cast<long>(0L); x0<static_cast<long>(64L); x0+=static_cast<long>(1L))
            {
                {
                    #pragma omp declare reduction(welford:Welford<float>:omp_out = welford_combine(omp_out, omp_in)) initializer(omp_priv={Welford<float>()})
                    #pragma omp declare reduction(welford:Welford<at::vec::Vectorized<float>>:omp_out = welford_combine(omp_out, omp_in)) initializer(omp_priv={Welford<at::vec::Vectorized<float>>()})
                    Welford<float> tmp_acc0 = Welford<float>();
                    Welford<at::vec::Vectorized<float>> tmp_acc0_vec = Welford<at::vec::Vectorized<float>>();
                    for(long x1=static_cast<long>(0L); x1<static_cast<long>(1024L); x1+=static_cast<long>(16L))
                    {
                        auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + static_cast<long>(x1 + (1024L*x0)));
                        tmp_acc0_vec = welford_combine(tmp_acc0_vec, tmp0);
                    }
                    tmp_acc0 = welford_combine(tmp_acc0, welford_vec_reduce_all(tmp_acc0_vec));
                    out_ptr0[static_cast<long>(x0)] = static_cast<float>(tmp_acc0.mean);
                    out_ptr1[static_cast<long>(x0)] = static_cast<float>(tmp_acc0.m2);
                }
            }
        }
        {
            #pragma omp for  collapse(2)
            for(long x0=static_cast<long>(0L); x0<static_cast<long>(2L); x0+=static_cast<long>(1L))
            {
                for(long x1=static_cast<long>(0L); x1<static_cast<long>(32L); x1+=static_cast<long>(1L))
                {
                    #pragma GCC ivdep
                    for(long x2=static_cast<long>(0L); x2<static_cast<long>(1024L); x2+=static_cast<long>(1L))
                    {
                        auto tmp0 = in_ptr0[static_cast<long>(x2 + (1024L*x1) + (32768L*x0))];
                        auto tmp1 = out_ptr0[static_cast<long>(x1 + (32L*x0))];
                        auto tmp3 = out_ptr1[static_cast<long>(x1 + (32L*x0))];
                        auto tmp10 = in_ptr1[static_cast<long>(x1)];
                        auto tmp12 = in_ptr2[static_cast<long>(x1)];
                        auto tmp2 = tmp0 - tmp1;
                        auto tmp4 = c10::convert<float>(1024.0);
                        auto tmp5 = tmp3 / tmp4;
                        auto tmp6 = c10::convert<float>(1e-05);
                        auto tmp7 = tmp5 + tmp6;
                        auto tmp8 = 1 / std::sqrt(tmp7);
                        auto tmp9 = decltype(tmp2)(tmp2 * tmp8);
                        auto tmp11 = decltype(tmp9)(tmp9 * tmp10);
                        auto tmp13 = tmp11 + tmp12;
                        out_ptr2[static_cast<long>(x2 + (1024L*x1) + (32768L*x0))] = tmp13;
                    }
                }
            }
        }
    }
}
```

After:
```c++
extern "C" void kernel(const float* in_ptr0,
                       const float* in_ptr1,
                       const float* in_ptr2,
                       float* out_ptr0,
                       float* out_ptr1,
                       float* out_ptr2)
{
    #pragma omp parallel num_threads(64)
    {
        {
            #pragma omp for
            for(long x0=static_cast<long>(0L); x0<static_cast<long>(64L); x0+=static_cast<long>(1L))
            {
                {
                    #pragma omp declare reduction(welford:Welford<float>:omp_out = welford_combine(omp_out, omp_in)) initializer(omp_priv={Welford<float>()})
                    #pragma omp declare reduction(welford:Welford<at::vec::Vectorized<float>>:omp_out = welford_combine(omp_out, omp_in)) initializer(omp_priv={Welford<at::vec::Vectorized<float>>()})
                    Welford<float> tmp_acc0 = Welford<float>();
                    Welford<at::vec::Vectorized<float>> tmp_acc0_vec = Welford<at::vec::Vectorized<float>>();
                    for(long x1=static_cast<long>(0L); x1<static_cast<long>(1024L); x1+=static_cast<long>(16L))
                    {
                        auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + static_cast<long>(x1 + (1024L*x0)));
                        tmp_acc0_vec = welford_combine(tmp_acc0_vec, tmp0);
                    }
                    tmp_acc0 = welford_combine(tmp_acc0, welford_vec_reduce_all(tmp_acc0_vec));
                    out_ptr0[static_cast<long>(x0)] = static_cast<float>(tmp_acc0.mean);
                    out_ptr1[static_cast<long>(x0)] = static_cast<float>(tmp_acc0.m2);
                }
            }
        }
        {
            #pragma omp for  collapse(2)
            for(long x0=static_cast<long>(0L); x0<static_cast<long>(2L); x0+=static_cast<long>(1L))
            {
                for(long x1=static_cast<long>(0L); x1<static_cast<long>(32L); x1+=static_cast<long>(1L))
                {
                    for(long x2=static_cast<long>(0L); x2<static_cast<long>(1024L); x2+=static_cast<long>(16L))
                    {
                        auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + static_cast<long>(x2 + (1024L*x1) + (32768L*x0)));
                        auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(out_ptr0[static_cast<long>(x1 + (32L*x0))]));
                        auto tmp3 = at::vec::Vectorized<float>(static_cast<float>(out_ptr1[static_cast<long>(x1 + (32L*x0))]));
                        auto tmp10 = at::vec::Vectorized<float>(static_cast<float>(in_ptr1[static_cast<long>(x1)]));
                        auto tmp12 = at::vec::Vectorized<float>(static_cast<float>(in_ptr2[static_cast<long>(x1)]));
                        auto tmp2 = tmp0 - tmp1;
                        auto tmp4 = at::vec::Vectorized<float>(static_cast<float>(1024.0));
                        auto tmp5 = tmp3 / tmp4;
                        auto tmp6 = at::vec::Vectorized<float>(static_cast<float>(1e-05));
                        auto tmp7 = tmp5 + tmp6;
                        auto tmp8 = tmp7.rsqrt();
                        auto tmp9 = tmp2 * tmp8;
                        auto tmp11 = tmp9 * tmp10;
                        auto tmp13 = tmp11 + tmp12;
                        tmp13.store(out_ptr2 + static_cast<long>(x2 + (1024L*x1) + (32768L*x0)));
                    }
                }
            }
        }
    }
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112234
Approved by: https://github.com/lezcano, https://github.com/jansel
2023-10-31 17:15:21 +00:00
b91fcdf4aa [dynamo] Add support for register_post_accumulate_grad_hook (#112325)
lint

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112325
Approved by: https://github.com/jansel
2023-10-31 17:04:49 +00:00
04024926f4 Use pytree.tree_map_ everywhere (#112417)
Wherever we discard the output of `tree_map` it's better to call `tree_map_`
which doesn't unflatten the mapped results and so is a lot cheaper.
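
A quick contrast of the two calls:

```python
import torch
import torch.utils._pytree as pytree

tree = {"a": torch.ones(2), "b": [torch.zeros(3)]}

# tree_map builds and returns a whole new unflattened tree:
doubled = pytree.tree_map(lambda t: t * 2, tree)

# when the mapped results are discarded (side effects only), tree_map_
# skips the unflatten step and just returns its input:
pytree.tree_map_(lambda t: print(t.shape), tree)
```
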
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112417
Approved by: https://github.com/lezcano
ghstack dependencies: #112391, #112392, #112393, #112394
2023-10-31 15:57:06 +00:00
66c32d099a Use pytree.arg_tree_leaves everywhere (#112394)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112394
Approved by: https://github.com/lezcano
ghstack dependencies: #112391, #112392, #112393
2023-10-31 15:57:06 +00:00
046c0c66fd [pytree] Add arg_tree_leaves to optimize flattening function arguments (#112393)
We commonly do some variation of `tree_leaves((args, kwargs))`. This adds a new
function `arg_tree_leaves(*args, **kwargs)` which takes advantage of the known
structure of `args` and `kwargs` to skip their `flatten_fn`.

I see ~1 us improvement per call for args + kwargs, or a 0.5 us improvement
when passing just one of `args` or `kwargs`. For shallow structures, this can be
proportionally quite significant. For example, the empty_strided call I've been
using as a benchmark:
```
args = ((100, 100), (100, 1))
kwargs = dict(device="cuda")
```
Sees a 30% speedup from this.
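
Usage sketch with the benchmark arguments above:

```python
import torch.utils._pytree as pytree

args = ((100, 100), (100, 1))
kwargs = dict(device="cuda")

# the common pattern: wrap args/kwargs in a tuple and flatten everything
leaves = pytree.tree_leaves((args, kwargs))

# the new helper exploits the known top-level structure of *args/**kwargs
# and skips their flatten_fn
leaves = pytree.arg_tree_leaves(*args, **kwargs)
```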

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112393
Approved by: https://github.com/lezcano
ghstack dependencies: #112391, #112392
2023-10-31 15:57:00 +00:00
86196bf116 add batch impl. for inplace index_add operation (#112276)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112276
Approved by: https://github.com/zou3519, https://github.com/kshitij12345, https://github.com/malfet
2023-10-31 13:47:53 +00:00
424c093fc7 Fix comment spelling error (#112468)
Fix tiny spelling error in comments
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112468
Approved by: https://github.com/kit1980
2023-10-31 10:53:12 +00:00
a310cc8968 Add Half support for kthvalue, cross, hist, and logit on CPU (#112135)
Add Half support for kthvalue, cross, hist, and logit on CPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112135
Approved by: https://github.com/cpuhrsch
2023-10-31 09:12:47 +00:00
8d6b4322d0 [CI] Limit libtorch builds to shared-with-deps (#112452)
As that is the only variant that is being mentioned on  https://pytorch.org/get-started/locally/

And for MacOS those three flavors were just building and uploading the
same thing 3 times over, see [this](https://github.com/pytorch/pytorch/actions/runs/6689661275/job/18176516410) for example:
```
upload: ../../_temp/artifacts/libtorch-macos-2.2.0.dev20231030.zip to s3://pytorch/libtorch/nightly/cpu/libtorch-macos-2.2.0.dev20231030.zip
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112452
Approved by: https://github.com/huydhn
ghstack dependencies: #112451
2023-10-31 08:40:06 +00:00
70b392ae02 [dtensor] enable foreach operators for adam optimizer (#112108)
This PR enables basic foreach ops in DTensor for the Adam optimizer, to improve
performance compared to an optimizer using plain torch.Tensor. Currently, by
default, the optimizer won't do this for tensor subclasses; we will need to
enable it by default in DTensor once all ops are covered, or enable it early
when exploring the new FSDP. We just need to append DTensor to the optimizer
allow list.

Some latency measurements on a 5-layer MLP model:
single tensor adam: 17ms
![Screenshot 2023-10-29 at 10 48 22 PM](https://github.com/pytorch/pytorch/assets/9443650/8937d786-b863-4318-88c2-12e43180ce8d)
foreach multitensor adam: 4ms
![Screenshot 2023-10-29 at 10 50 58 PM](https://github.com/pytorch/pytorch/assets/9443650/de105cc3-8e12-4765-938a-763d8e958194)

so around a 4.25x improvement

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112108
Approved by: https://github.com/wz337
2023-10-31 08:09:46 +00:00
e66ec5843f [RESUBMIT] Cleanup error reporting for ProcessGroupNCCL (#112419)
Continuing some of the work from https://github.com/pytorch/pytorch/pull/108191, I realized the majority of errors raised from ProcessGroupNCCL were just generic RuntimeErrors.

In this PR, I've added appropriate error types to all the exceptions raised from ProcessGroupNCCL.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112419
Approved by: https://github.com/fduwjj
2023-10-31 05:58:21 +00:00
cb942ef2b1 Revert "add batch impl. for inplace index_add operation (#112276)"
This reverts commit e3c8c63deaf594699d827e84869a3ecd7e2ab494.

Reverted https://github.com/pytorch/pytorch/pull/112276 on behalf of https://github.com/PaliC due to breaking linux binary builds ([comment](https://github.com/pytorch/pytorch/pull/112276#issuecomment-1786455375))
2023-10-31 05:10:47 +00:00
08dbfecdbd Revert "Symintify repeat_interleave (#109133)" (#112245)
This reverts commit 41e5d410cf4bfaaf264cc97b541e00d968be6db2.

Differential Revision: [D50804696](https://our.internmc.facebook.com/intern/diff/D50804696)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112245
Approved by: https://github.com/eellison
2023-10-31 03:50:26 +00:00
6cebacdbc0 [vision hash update] update the pinned vision hash (#112455)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112455
Approved by: https://github.com/pytorchbot
2023-10-31 03:32:40 +00:00
710337244d [Inductor] Extend Pattern Matcher to Match Equivalent Function Invocation (#107832)
Fixes #104391

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107832
Approved by: https://github.com/jansel
2023-10-31 03:32:33 +00:00
f50ec341bc inductor cpp wrapper: add GIL release and acquire (#111888)
Support multi-instance inference (in different threads of the same process) as in https://github.com/pytorch/pytorch/issues/93524#issuecomment-1421816158.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111888
Approved by: https://github.com/jgong5, https://github.com/jansel, https://github.com/desertfire
2023-10-31 03:23:30 +00:00
bb97ce4c7f [ivalue] operator<<: don't error on invalid IValue tags (#112232)
While running the profiler, we observed a scenario where we encounter IValues with invalid tags. Specifically, we try to convert the IValue to a string here:

d3bf6803b6/torch/csrc/profiler/util.cpp (L306-L308)

and in the scenario with invalid IValues, an exception gets thrown here, in `operator<<`.

d3bf6803b6/aten/src/ATen/core/ivalue.cpp (L864)

IMO, `<<` shouldn't error if the IValue is bad; instead, we should just print that the IValue tag is invalid.

Differential Revision: [D50760040](https://our.internmc.facebook.com/intern/diff/D50760040)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112232
Approved by: https://github.com/albanD
2023-10-31 02:15:43 +00:00
c3113514e9 Fix regression from pointwise + multi-level reduction fusion (#112297)
In https://github.com/pytorch/pytorch/pull/111122, an optimization is introduced for reduction + pointwise + multi-level reduction fusion. The main idea of this optimization is to have the first-level reduction of the multi-level reduction reuse the reduction sizes of the first reduction kernel, so that there are better chances that the first reduction kernel and the first-level reduction of the multi-level reduction kernel can be fused. However, it introduces a bug for the pointwise + multi-level reduction pattern, where the first-level reduction kernel wrongly reuses the reduction ranges (which are `[]`) from the previous pointwise kernel. This PR fixes that issue.

Test plan:
`python timm_models.py --training --amp --performance --only=dm_nfnet_f0 --inductor`
Results before this PR: 0.869x
Results after this PR: 1.232x

Benchmark results:
![Screenshot 2023-10-30 at 2 30 10 PM](https://github.com/pytorch/pytorch/assets/10527447/c7b241c0-92a4-49ff-96fb-2805c8fcc45a)

<img width="1491" alt="Screenshot 2023-10-30 at 3 10 06 PM" src="https://github.com/pytorch/pytorch/assets/10527447/608d26ea-dcc5-4f2a-8700-4a928701392b">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112297
Approved by: https://github.com/jansel
2023-10-31 01:47:46 +00:00
6ab1121bdc Enable Mypy checking for scheduler.py (#105600)
ATT, add type annotations and type assertions to pass Mypy checks.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105600
Approved by: https://github.com/int3
2023-10-31 01:47:13 +00:00
0ce8cf7c7a Update small wheel nccl-version to 2.19.3 (#112293)
To keep it in sync with https://github.com/pytorch/pytorch/pull/110827

Added check to `scripts/generate_binary_build_matrix.py` to validate submodule and small wheel nccl versions are the same

Step one in addressing https://github.com/pytorch/pytorch/issues/112285
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112293
Approved by: https://github.com/huydhn
2023-10-31 01:20:01 +00:00
236eff9531 [BE] Refactor repeated asserts in test_foreach.py (#112348)
The tested conditions in `test_binary_op_list_error_cases` look almost identical, although they cover both method and in-place variants. Use a for loop to make the distinction a bit more explicit
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112348
Approved by: https://github.com/albanD
ghstack dependencies: #112349
2023-10-31 01:11:44 +00:00
e3c8c63dea add batch impl. for inplace index_add operation (#112276)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112276
Approved by: https://github.com/zou3519, https://github.com/kshitij12345
2023-10-31 00:59:18 +00:00
2f09da3a21 [dtensor] Introduce full_tensor API to DTensor (#112224)
This PR introduces a `full_tensor` API to DTensor. There were so many
call sites exercising the `redistribute(replicate)` path that I feel
it deserves a separate API; it's mostly just syntactic sugar
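
A single-process sketch of the sugar (assumed minimal gloo setup; real use is multi-rank):

```python
import os
import torch
import torch.distributed as dist
from torch.distributed._tensor import DeviceMesh, Replicate, Shard, distribute_tensor

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

mesh = DeviceMesh("cpu", [0])
dt = distribute_tensor(torch.arange(8.0), mesh, [Shard(0)])

full_a = dt.redistribute(mesh, [Replicate()]).to_local()  # the spelled-out path
full_b = dt.full_tensor()                                 # the new sugar

assert torch.equal(full_a, full_b)
dist.destroy_process_group()
```
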
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112224
Approved by: https://github.com/wz337
2023-10-31 00:44:09 +00:00
e2cd69a770 [CI] Call upload step upload (#112451)
Rather than `build`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112451
Approved by: https://github.com/huydhn
2023-10-31 00:37:14 +00:00
b8a10a8a2d Add batch decomposition for torch.unsafe_chunk (#110862)
This updates the docs as well to show `torch.unsafe_chunk`. Or should the `unsafe_*` functions not appear in the docs?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110862
Approved by: https://github.com/kshitij12345, https://github.com/zou3519
2023-10-31 00:37:08 +00:00
40569b28f4 Constrain fx_stride order for scaled_mm (#112430)
# Summary
cuBLASLt requires row_major @ col_major order for scaled_mm. Without adding this, it is possible for the inputs to not respect this constraint.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112430
Approved by: https://github.com/eellison
2023-10-31 00:02:35 +00:00
12a9e09200 [inductor] Fix bug handling output_strides in fx graph cache (#112041)
Summary: The current implementation is not properly attaching output strides to the tracing context when an fx graph is loaded from the cache. That bug leads to assertion failures like `AssertionError: expected size 3==3, stride 1==9 at dim=1`. This change saves the output strides in the serialized object cached on disk and inserts them into the tracing context whether the graph is loaded from the cache or freshly compiled.

Test Plan:
* New unit test using resnet18 (which repros the problem)
* Ran the timm benchmark suite with `--training`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112041
Approved by: https://github.com/ezyang
2023-10-30 23:49:10 +00:00
cf3aa985a9 Don't rewrite assert in pytest (#112436)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112436
Approved by: https://github.com/angelayi
2023-10-30 23:20:02 +00:00
479f5eb029 [dynamo] Remove dead code - real_value_tensor_positive_aliases (#111911)
(Legality) It is currently impossible (and should remain impossible) to access the same **static** tensor value from a **different source**, since dedup guards ensure all static tensors are unique.

As for `getattr(nn.Module, tensor)` source collisions, we will never instantiate a `nn.Module getattr` source for a static tensor, due to:
- side-effect tracking (as long as we track all static tensors - see also https://github.com/pytorch/pytorch/pull/112025 for extra sanity check)
- See: c8a5bb451e/torch/_dynamo/variables/builder.py (L227)

(no worse) In any case, this field is currently unused.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111911
Approved by: https://github.com/voznesenskym
2023-10-30 23:10:52 +00:00
6188f2e899 Enable planner to be used for loading sharded optimizer state dict (#112259)
This creates a more consistent interface for saving and loading sharded state dicts. A planner can be specified when saving a sharded optimizer state dict, but there is currently no planner support for loading one. This change does not affect the default behavior of the function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112259
Approved by: https://github.com/wz337
2023-10-30 22:51:09 +00:00
ac71fea1a8 [test][functorch] fix function name in factory_fns (#112315)
This PR fixes an incorrect function name in `factory_fns`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112315
Approved by: https://github.com/zou3519
2023-10-30 22:08:07 +00:00
31c0ef934b [pytree] Remove LeafSpec construction cost in tree_flatten (#112392)
On my machine, `pytree.LeafSpec()` takes ~600ns but since every leaf spec is the
same, we can just use a global constant.
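
A sketch of the optimization (not the PR's exact code):

```python
import torch.utils._pytree as pytree

# every leaf spec is identical, so one module-level instance can be shared
# instead of paying ~600ns per construction
_LEAF_SPEC = pytree.LeafSpec()

def leaf_spec():
    return _LEAF_SPEC
```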

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112392
Approved by: https://github.com/lezcano
ghstack dependencies: #112391
2023-10-30 21:45:45 +00:00
0f2b7a99e3 [pytree] Avoid constructing intermediate lists in tree_{flatten,leaves} (#112391)
Instead of concatenating lists of child nodes, this appends the leaf nodes
directly onto the list of leaves to be returned, which gives a small perf
improvement.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112391
Approved by: https://github.com/zou3519
2023-10-30 21:45:45 +00:00
da90c31593 [export] Upstream unflattener. (#112189)
Summary: Provide a way for users to get the original module structure back after exporting.

Test Plan: caffe2/test:test_export -- -r unflatten

Differential Revision: D50708490

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112189
Approved by: https://github.com/suo, https://github.com/angelayi
2023-10-30 21:27:11 +00:00
67638d4dad torch.compile: fix bug of fallback_randn when 'generator' is None (#112240)
When I run Stable Diffusion in [Huggingface/Diffusers](https://github.com/huggingface/diffusers), an error occurred:
```
LoweringException: AssertionError: should have been handled in replace_random.py.
   target:  aten.randn.generator
   args[0]:  [1, 4, 64, 64]
   kwargs: {'generator': None, 'dtype': torch.float16, 'layout': torch.strided, 'device': device(type='cuda', index=0), 'pin_memory': False}
```
It looks like a bug in dynamo, and you can reproduce it like this:
```python
import torch
def model(shape, generator):
      return torch.randn(shape, generator=generator, device="cuda:0")
model = torch.compile(model)
x = model((1, 3, 64, 64), None)
print(x)
```
The error occurs because 'None' is passed into 'generator', and dynamo processes `torch.randn` into the fx node `torch.ops.aten.randn.generator`.
aten.randn.generator is not handled by decomposition; it is handled by lowering in [torch/_inductor/lowering.py](https://github.com/pytorch/pytorch/blob/main/torch/_inductor/lowering.py#L1815), where randn.generator is processed like this:
```python
@register_lowering(aten.randn)
def randn(*args, **kwargs):
    if kwargs.get("generator", None) is not None:
        return fallback_randn_generator(*args, **kwargs)
    elif config.fallback_random:
        return fallback_randn_default(*args, **kwargs)
    raise AssertionError("should have been handled in replace_random.py")
```
As you can see, because 'generator' is None, it will not step into `fallback_randn_generator`, and of course, if you don't enable `config.fallback_random`, it will not step into `fallback_randn_default` either. Actually, if 'generator' is None, it could also be processed as `aten.randn.default`. And then an AssertionError is thrown; I will not discuss here how to fix this and will open an issue instead.

Actually, `config.fallback_random` offers a way to debug randn in [config.py](https://github.com/pytorch/pytorch/blob/main/torch/_inductor/config.py#L190), so I tried to enable `config.fallback_random` to debug my model. But when I enabled it via:
```python
# fallback to eager for random/dropout, this is slow but useful for debugging
fallback_random = True
```
Another error occurs!
```python
LoweringException: RuntimeError: Unknown keyword argument 'generator' for operator 'aten::randn'. Schema: aten::randn(SymInt[] size, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
```
Obviously, `aten::randn` does not support `kwargs:{generator: None}`, so it should be popped before kwargs is fed into `fallback_randn_default`.
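
A sketch of the implied fix (hypothetical, mirroring the lowering quoted earlier; not an actual PR diff):

```python
@register_lowering(aten.randn)
def randn(*args, **kwargs):
    if kwargs.get("generator", None) is not None:
        return fallback_randn_generator(*args, **kwargs)
    # drop the None generator: aten::randn's schema has no 'generator' kwarg
    kwargs.pop("generator", None)
    if config.fallback_random:
        return fallback_randn_default(*args, **kwargs)
    raise AssertionError("should have been handled in replace_random.py")
```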

That's all I'm going to say. Thanks for reading carefully.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112240
Approved by: https://github.com/jansel
2023-10-30 21:10:54 +00:00
9f1ccd4dac Fix internal test listing errors (#112300)
For some reason, fbcode internal tests have listing errors when a test is skipped and has ", " in its name. This fix replaces the shape list with a string to avoid internal test listing errors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112300
Approved by: https://github.com/chenyang78
2023-10-30 20:47:40 +00:00
80de49653a Prevent OOB access in foreach_list variants (#112349)
By checking that list sizes are the same before computing forward gradients.

Before the change
```cpp
::std::vector<at::Tensor> _foreach_add_List(c10::DispatchKeySet ks, at::TensorList self, at::TensorList other, const at::Scalar & alpha) {
  auto self_ = unpack(self, "self", 0);
  auto other_ = unpack(other, "other", 1);
  [[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self, other );

  std::vector<bool> _any_has_forward_grad_result(self.size());
  for (const auto& i : c10::irange(self.size())) {
    _any_has_forward_grad_result[i] = isFwGradDefined(self[i]) || isFwGradDefined(other[i]);
  }
  ...
```
after the change:
```cpp
::std::vector<at::Tensor> _foreach_add_List(c10::DispatchKeySet ks, at::TensorList self, at::TensorList other, const at::Scalar & alpha) {
    auto self_ = unpack(self, "self", 0);
    auto other_ = unpack(other, "other", 1);
    [[maybe_unused]] auto _any_requires_grad = compute_requires_grad( self, other );

    TORCH_CHECK(
        self.size() == other.size(),
          "Tensor lists must have the same number of tensors, got ",
        self.size(),
          " and ",
        other.size());
    std::vector<bool> _any_has_forward_grad_result(self.size());
    for (const auto& i : c10::irange(self.size())) {
      _any_has_forward_grad_result[i] = isFwGradDefined(self[i]) || isFwGradDefined(other[i]);
    }

```
Add regression test
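A sketch of what such a regression test can exercise, assuming the new TORCH_CHECK fires before the out-of-bounds access (the exact message may differ):
```python
import torch

a = [torch.randn(2, requires_grad=True) for _ in range(3)]
b = [torch.randn(2) for _ in range(2)]  # deliberately one tensor short
try:
    torch._foreach_add(a, b)
except RuntimeError as e:
    print(e)  # e.g. "Tensor lists must have the same number of tensors, got 3 and 2"
```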

Fixes https://github.com/pytorch/pytorch/issues/112305
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112349
Approved by: https://github.com/Chillee
2023-10-30 20:43:03 +00:00
a14f8e09bb [dynamo] torch._dynamo.optimize to torch.compile in cudagraph trees tests (#112314)
This somehow fixes test issues later on.  @eellison figured this one out and will try to figure out why.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112314
Approved by: https://github.com/eellison
2023-10-30 20:16:57 +00:00
69b9e54d45 Add openvino backend into torch.compile docs (#112321)
The torch.compile [docs page](https://pytorch.org/docs/stable/torch.compiler.html) lists commonly used backends for torch.compile. Recently, the OpenVINO backend for torch.compile was released. This PR adds the openvino backend to that docs page.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112321
Approved by: https://github.com/msaroufim
2023-10-30 20:13:41 +00:00
4fbf884f58 [fuzzing result][fuzz_torch_jit_lite_interpreter] read-heap-buffer-overflow-far-from-bounds (size 4) in c10::IValue::IValue() (#110453)
Summary: This diff fixes an OOB read found by fuzzing in torch/../jit/mobile

Test Plan:
CI and
```
arc lionhead crash reproduce 853835926354224
```
doesn't crash anymore.

Differential Revision: D49537377

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110453
Approved by: https://github.com/davidberard98
2023-10-30 20:08:22 +00:00
4b8a5e1854 [dynamo] Remove VariableTracker.as_specialized (#112363)
My local testing can't seem to find this function actually doing anything.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112363
Approved by: https://github.com/yanboliang
2023-10-30 20:07:55 +00:00
b97afc4018 Support 'BaseOutput' and subclasses from 'diffusers' in dynamo (#111978)
Extending the workarounds for `transformers` `ModelOutput` to cover `diffusers` `BaseOutput`. Together with https://github.com/huggingface/diffusers/pull/5459 it should unblock export for `diffusers` models.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111978
Approved by: https://github.com/jansel
2023-10-30 19:53:31 +00:00
d713b8dd5d Revert "[inductor] Fix bug handling output_strides in fx graph cache (#112041)"
This reverts commit 3d2041b34210bef3902f6ba86881b38ac0fbc57e.

Reverted https://github.com/pytorch/pytorch/pull/112041 on behalf of https://github.com/ZainRizvi due to fbcode failures ([comment](https://github.com/pytorch/pytorch/pull/112041#issuecomment-1785929233))
2023-10-30 19:50:23 +00:00
fc0b0820fc Revert "Readded device_assert skipping in index and index_put (and also added (#112093)"
This reverts commit b110d87ac271db01fd1d24a6595cf9633ac1ce43.

Reverted https://github.com/pytorch/pytorch/pull/112093 on behalf of https://github.com/ZainRizvi due to Stack breaks internal builds ([comment](https://github.com/pytorch/pytorch/pull/112093#issuecomment-1785922905))
2023-10-30 19:45:41 +00:00
4439b906c4 Revert "Some cleanups in pattern matcher (#112101)"
This reverts commit f7dc0ae16c4637be0a7f20a1d9cd4311e9a6d3e8.

Reverted https://github.com/pytorch/pytorch/pull/112101 on behalf of https://github.com/ZainRizvi due to Stack breaks internal builds ([comment](https://github.com/pytorch/pytorch/pull/112101#issuecomment-1785920248))
2023-10-30 19:43:40 +00:00
052f7a3edc Revert "Added patterns for randperm + index_add (#112102)"
This reverts commit 1ff0b82be977107ab67ad2817ea76d46d3478d8f.

Reverted https://github.com/pytorch/pytorch/pull/112102 on behalf of https://github.com/ZainRizvi due to Stack breaks internal builds ([comment](https://github.com/pytorch/pytorch/pull/112102#issuecomment-1785916704))
2023-10-30 19:41:29 +00:00
013f622dd2 grid_sample: support bfloat16 (#112331)
This adds bfloat16 support to `torch.nn.functional.grid_sample`. This is particularly important when doing feature sampling, such as for rendering techniques used in PyTorch3D or for camera projections to voxel grids as in SimpleBEV.

Related to #57707

Test plan:

```
pytest test/test_nn.py -k grid_sample
pytest test/test_ops.py -k grid_sample
```
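A quick usage sketch of the newly supported dtype (shapes are illustrative):
```python
import torch
import torch.nn.functional as F

inp = torch.randn(1, 3, 8, 8, dtype=torch.bfloat16)
# grid holds normalized sampling coordinates in [-1, 1]
grid = torch.rand(1, 4, 4, 2, dtype=torch.bfloat16) * 2 - 1
out = F.grid_sample(inp, grid, align_corners=False)
print(out.shape, out.dtype)  # torch.Size([1, 3, 4, 4]) torch.bfloat16
```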
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112331
Approved by: https://github.com/zou3519
2023-10-30 19:31:41 +00:00
3b58755c1c Fix FakeTensor tolist when size is not symbolic (#112206)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112206
Approved by: https://github.com/ezyang
ghstack dependencies: #112205
2023-10-30 19:25:10 +00:00
0cda4c8abe Replay view with view_func instead of as_strided in meta_utils for NT (#112205)
Currently meta_utils relies on as_strided when handling the view case (recursively meta-ify the base, and then do as_strided to simulate the view), but NestedTensor does not support as_strided today (though maybe it could?), so what we want to do instead is call `Tensor._view_func`. Conveniently, `_view_func` IS always available for nested tensors.

A detail to note is that _view_func actually incurs a guard because it needs to perform some metadata checks to make sure the view is still valid. This PR adds Tensor._unsafe_view_func which can avoid that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112205
Approved by: https://github.com/jbschlosser
2023-10-30 19:25:10 +00:00
503955f5ec [Pytorch][Vulkan] layer_norm (#112322)
Summary:
Generalize [layer_norm](https://pytorch.org/docs/stable/generated/torch.nn.LayerNorm.html) to all tensors of 2d to 4d. Using the mean and var operators in this diff stack, we can compute the layer_norm directly and remove the old shader file `layernorm.glsl`.
```
(input - input.mean(normalized_shape, keepdim=True)) / torch.sqrt(input.var(normalized_shape, correction=0, keepdims = True) + eps) * weight + bias
```
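For reference, a small PyTorch check of that composition against `torch.nn.functional.layer_norm` (shapes and eps are illustrative):
```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 4)
eps = 1e-5
ref = F.layer_norm(x, normalized_shape=(4,), eps=eps)
mean = x.mean(-1, keepdim=True)
var = x.var(-1, correction=0, keepdim=True)
manual = (x - mean) / torch.sqrt(var + eps)
print(torch.allclose(ref, manual, atol=1e-6))  # True
```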

Test Plan:
```
[luwei@devbig984.prn1 /data/users/luwei/fbsource (0a5028d8c)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run fbcode/mode/dev-nosan //xplat/caffe2:pt_vulkan_api_test_bin -- --gtest_filter="*layer_norm*"
Building: finished in 0.1 sec (100%) 339/339 jobs, 0/339 updated
  Total time: 0.1 sec
BUILD SUCCEEDED
Running main() from third-party/googletest/1.11.0/googletest/googletest/src/gtest_main.cc
Note: Google Test filter = *layer_norm*
[==========] Running 4 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 4 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.layer_norm_invalid_inputs
[       OK ] VulkanAPITest.layer_norm_invalid_inputs (69 ms)
[ RUN      ] VulkanAPITest.layer_norm_2d
[       OK ] VulkanAPITest.layer_norm_2d (288 ms)
[ RUN      ] VulkanAPITest.layer_norm_3d
[       OK ] VulkanAPITest.layer_norm_3d (302 ms)
[ RUN      ] VulkanAPITest.layer_norm_4d
[       OK ] VulkanAPITest.layer_norm_4d (8 ms)
[----------] 4 tests from VulkanAPITest (668 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test suite ran. (668 ms total)
[  PASSED  ] 4 tests.
```

Reviewed By: yipjustin

Differential Revision: D50436726

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112322
Approved by: https://github.com/yipjustin
2023-10-30 19:21:20 +00:00
33c41daf60 Fix scatter_mm kernel failure on non-contiguous tensor arguments (#112337)
This PR fixes
```
RuntimeError: Triton Error [CUDA]: an illegal memory access was encountered
```
that appears when using large non-contiguous tensor arguments in `scatter_mm` kernel launch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112337
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #112154, #112076
2023-10-30 19:16:05 +00:00
cf6041e942 Use weakref in storing tensors as keys (follow-up to #111470) (#112076)
This PR addresses the discussion items in https://github.com/pytorch/pytorch/pull/111470#discussion_r1369008167, that is,
- use weakref when storing tensors as keys,
- add `storage_offset` to the key data,
- and revise the description of the `TensorAsKey` utility.
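A generic sketch of the weakref-keyed pattern from the first two items (an illustration, not the actual `TensorAsKey`; the key fields are assumptions):
```python
import weakref
import torch

class TensorAsKeySketch:
    """Key a cache on a tensor's storage/layout without keeping it alive."""

    def __init__(self, t: torch.Tensor):
        self._ref = weakref.ref(t)
        # storage_offset is part of the key so overlapping views don't collide
        self._key = (t.data_ptr(), t.storage_offset(),
                     tuple(t.shape), tuple(t.stride()), t.dtype)

    def __hash__(self):
        return hash(self._key)

    def __eq__(self, other):
        # Keys become invalid once either tensor has been garbage collected.
        if self._ref() is None or other._ref() is None:
            return False
        return self._key == other._key
```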

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112076
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #112154
2023-10-30 19:16:05 +00:00
e5c8ac8544 Eliminate try-catch block around triton::_triton_bsr_dense_mm_out call. (#112154)
As in the title.

Currently, the try-catch block hides failures from triton kernel launches that are unrelated to the exceptions the block is meant to ignore. When a triton kernel launch fails (e.g. due to bugs in triton or lack of resources), ignoring such failures leads to hard-to-explain, unrelated errors in subsequent code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112154
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2023-10-30 19:16:05 +00:00
21330e5ba1 [pytree] align __all__ for C++ and Python pytree (#112110)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112110
Approved by: https://github.com/zou3519
2023-10-30 18:32:25 +00:00
219763c38d Support calling user defined triton kernels with kernel.run (#112292)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112292
Approved by: https://github.com/jansel
ghstack dependencies: #112290
2023-10-30 17:51:23 +00:00
1250032c2e [Inductor] Add triton.autotune support for user defined triton kernels with complex grids (#112290)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112290
Approved by: https://github.com/jansel
2023-10-30 17:48:27 +00:00
5a1a9dc354 [inductor][fx pass] Add new split cat pattern detection (#110923)
Summary: We add a new pattern to merge getitem_cat to enable further split merges

Test Plan:
### test mcf model
Patch D49972740
```
buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode split-only -c
```

P850153017

### unit test
```
buck2 test mode/dev-nosan //caffe2/test/inductor:split_cat_fx_passes -- test_getitem_cat_merge
```
Buck UI: https://www.internalfb.com/buck2/eb7411a5-a6bd-46bc-bf66-756341e3ce10
Test UI: https://www.internalfb.com/intern/testinfra/testrun/13792273864439068
Network: Up: 48KiB  Down: 15KiB  (reSessionID-39ca57cc-5743-423e-b94f-9d0f642010f8)
Jobs completed: 8. Time elapsed: 1:44.7s.
Cache hits: 0%. Commands: 2 (cached: 0, remote: 0, local: 2)
Tests finished: Pass 1. Fail 0. Fatal 0. Skip 0. Build failure 0

### before vs after transformation
https://www.internalfb.com/intern/diffing/?paste_number=847958889

Differential Revision: D50100667

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110923
Approved by: https://github.com/yanboliang
2023-10-30 17:46:13 +00:00
31c223a52c Forward fix a dynamo tracing rule test failure due to landing race (#112368)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112368
Approved by: https://github.com/Chillee, https://github.com/malfet
2023-10-30 17:34:22 +00:00
a8c74e8225 torch.export: cannot instantiate Dim from REPL (#111231)
Summary:
```
In [1]: import torch
   ...: torch.export.Dim('foo', min=1, max=16)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[1], line 2
      1 import torch
----> 2 torch.export.Dim('foo', min=1, max=16)

File /..../torch/export/__init__.py:319, in Dim(name, min, max)
    317 assert _max > _min, f"Cannot create Dim with inconsistent min={min}, max={max}"
    318 dim = _Dim(name, (int,), {"min": _min, "max": _max})
--> 319 dim.__module__ = inspect.getmodule(inspect.stack()[1][0]).__name__  # type: ignore[union-attr]
    320 return dim

AttributeError: 'NoneType' object has no attribute '__name__'
```

Test Plan: Repeat above repro

Differential Revision: D50275165

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111231
Approved by: https://github.com/avikchaudhuri, https://github.com/angelayi
2023-10-30 17:15:32 +00:00
92cc52ab0e [CPU SDP] Remove mem efficient attn checks in CPU (#112375)
It doesn't seem like memory-efficient attention can be used on CPU, as we don't check for it when iterating backends in `select_sdp_backend_cpp`. So this removes some of the logic around mem-efficient attention selection.

Created from CodeHub with https://fburl.com/edit-in-codehub

Differential Revision: [D50775562](https://our.internmc.facebook.com/intern/diff/D50775562/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D50775562/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112375
Approved by: https://github.com/drisspg
2023-10-30 16:43:20 +00:00
255a4d0bd3 Fix doc of fullgraph parameter in torch.compile (#111906)
The docstring currently states the opposite of what this parameter is doing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111906
Approved by: https://github.com/pmeier, https://github.com/zou3519
2023-10-30 15:17:59 +00:00
f77b9bf3ba [xla hash update] update the pinned xla hash (#112374)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned xla hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112374
Approved by: https://github.com/pytorchbot
2023-10-30 13:42:07 +00:00
e36dacaeed [Docs] fix typo in example of torch.linalg.solve_triangular (#112361)
Fixes #112359

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112361
Approved by: https://github.com/IvanYashchuk
2023-10-30 10:33:14 +00:00
29844adbe0 Add Half support for logspace and range on CPU (#112131)
Add Half support for logspace and range on CPU

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112131
Approved by: https://github.com/cpuhrsch
2023-10-30 07:18:47 +00:00
cyy
0d669f06a6 Update Android to R21e (#109355)
R19c is too old; R21e is the LTS version.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109355
Approved by: https://github.com/malfet, https://github.com/huydhn
2023-10-30 06:49:32 +00:00
bbd5b935e4 Use pytree.tree_leaves everywhere (#112324)
This changes all the instances I could find of `tree_flatten(...)[0]` or
`x, _ = tree_flatten` to use `tree_leaves`.
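For example:
```python
import torch.utils._pytree as pytree

tree = {"a": [1, 2], "b": (3,)}
leaves, _spec = pytree.tree_flatten(tree)  # before
leaves = pytree.tree_leaves(tree)          # after
print(leaves)  # [1, 2, 3]
```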

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112324
Approved by: https://github.com/lezcano
ghstack dependencies: #112327, #112323
2023-10-30 03:39:04 +00:00
a0bf137a78 [pytree] Add optimized tree_leaves implementation (#112323)
pytree is used in many hot paths for dynamo tracing and in many cases we don't
care about the tree spec and just want the flattened list. This improves
`pytree.tree_leaves` to not construct the spec which gives a noticeable
performance improvement when multiplied by the many times it gets called
during tracing.

Concretely, I see a 2x speedup compared to `tree_flatten` in this benchmark:
```python
import torch.utils._pytree as pytree
%timeit pytree.tree_flatten([((100, 100), (100, 1)), dict(device="cuda")])[0]
%timeit pytree.tree_leaves([((100, 100), (100, 1)), dict(device="cuda")])
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112323
Approved by: https://github.com/lezcano, https://github.com/XuehaiPan
ghstack dependencies: #112327
2023-10-30 03:39:04 +00:00
bfbc2e3ca8 [fx] Cache _torchscript_schema_to_signature (#112327)
This function is called in `normalize_function` which is in a fairly hot path for
`FakeTensor` dispatch. In this simple benchmark I see `normalize_function`
improve from 92 us to 17 us just by caching this signature object.

```python
import torch
from torch._subclasses import FakeTensorMode
from torch.fx.operator_schemas import normalize_function
aten = torch._ops.ops.aten
%timeit normalize_function(
    aten.empty_strided.default, args=((100, 100), (100, 1)),
    kwargs=dict(device="cuda"), normalize_to_only_use_kwargs=True)
```
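A minimal sketch of the caching idea, assuming the conversion is a pure function of a hashable schema (the toy parser below is a stand-in, not the real one):
```python
import functools
import inspect

def parse_schema(schema: str) -> inspect.Signature:
    # Stand-in for the real conversion; building Signature objects is the
    # expensive part worth caching.
    args_src = schema.split("(", 1)[1].rsplit(")", 1)[0]
    names = [piece.split("=")[0].split()[-1]
             for piece in (p.strip() for p in args_src.split(","))
             if piece and piece != "*"]
    return inspect.Signature(
        [inspect.Parameter(n, inspect.Parameter.POSITIONAL_OR_KEYWORD) for n in names])

@functools.lru_cache(maxsize=None)
def cached_parse_schema(schema: str) -> inspect.Signature:
    return parse_schema(schema)

sig = cached_parse_schema("aten::add(Tensor a, Tensor b, *, Scalar alpha=1) -> Tensor")
print(sig)  # (a, b, alpha)
```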
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112327
Approved by: https://github.com/lezcano
2023-10-30 03:38:52 +00:00
919c9b713e [Typo fixed] in triton_heuristics.py (#112350)
Fixes Typo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112350
Approved by: https://github.com/Skylion007
2023-10-29 22:44:27 +00:00
088d1648ec [test][fx] fix incorrect method call in test case (#112336)
This PR fixes the incorrect method name in function call in the test case

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112336
Approved by: https://github.com/jon-chuang, https://github.com/kit1980
2023-10-29 19:49:13 +00:00
a9ebee30fa Make numpy core tests Dynamo traceable. (#112141)
A follow-up to https://github.com/pytorch/pytorch/pull/112084: convert vendored numpy/core submodule tests dynamo-traceable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112141
Approved by: https://github.com/lezcano
2023-10-29 19:28:53 +00:00
ccab8ce745 Make numpy fft and linalg tests Dynamo traceable (#112146)
Follow up https://github.com/pytorch/pytorch/pull/112141 and make numpy vendored tests of fft and linalg modules dynamo-traceable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112146
Approved by: https://github.com/lezcano
2023-10-29 19:27:38 +00:00
cyy
740d636165 Add clang-tidy checks in torch/csrc/autograd (#112313)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112313
Approved by: https://github.com/Skylion007
2023-10-29 18:55:11 +00:00
ace2713d1e Revert "Add torch.utils.deterministic.fill_uninitialized_memory flag (#111377)"
This reverts commit f1785373c08b9e8383b7eec3391d57053209b525.

Reverted https://github.com/pytorch/pytorch/pull/111377 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/111377#issuecomment-1784179040))
2023-10-29 17:41:55 +00:00
ae72607e5f Add way to determine which overload an OpOverloadPacket will resolve to (#112199)
The types are a bit weird (we accept and return a string) because there
is not really a notion of OpOverloadPacket vs OpOverload in C++.

Test Plan:
- new test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112199
Approved by: https://github.com/ezyang
ghstack dependencies: #112198
2023-10-29 15:36:14 +00:00
235a04c0de Add getAllSortedOperatorsFor helper function (#112198)
I need this for later. This roughly returns all the OpOverloads
for an OpOverloadPacket in the order that the OpOverloadPacket decides
to resolve them in.

Test Plan:
- wait for CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112198
Approved by: https://github.com/ezyang
2023-10-29 15:36:14 +00:00
f5088d2e45 [dynamo] fix None routing bug during var_getattr on UDO (#111614)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111614
Approved by: https://github.com/jansel
2023-10-29 01:57:43 +00:00
b165abaa3b Error early when dataclass is not registered (#112211)
Partially fixes: https://github.com/pytorch/pytorch/issues/112043

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112211
Approved by: https://github.com/angelayi
2023-10-28 19:36:02 +00:00
eb8af4dc67 [dynamo] Be stricter about HigherOrderOperator kwargs (#111938)
kwargs need to be handled carefully in speculate subgraph. We should be clearer about the contract of what the inputs are.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111938
Approved by: https://github.com/zou3519
2023-10-28 18:54:33 +00:00
c14c4efc0e [Inductor] Add triton.autotune support for user defined triton kernels with constant/simple grids (#112228)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112228
Approved by: https://github.com/jansel
2023-10-28 17:30:35 +00:00
12c1465d76 [DeviceMesh] Make mesh_resources private (#112294)
This is to prepare moving DeviceMesh as a standalone distributed package.

`_mesh_resources` should only be used in torch.distributed package.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112294
Approved by: https://github.com/fegin
2023-10-28 17:28:46 +00:00
a7a0955790 [pytree][BE] reorganize imports and format code style and update type hints (#112268)
Reland PR:

- #112109

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112268
Approved by: https://github.com/Skylion007
2023-10-28 16:30:24 +00:00
0948550c53 [dynamo] Remove mutation in AutogradFunctionContextVariable (#112216)
AutogradFunctionContextVariable was mutating self._saved_tensors, which is generally not allowed since VariableTracker objects should be read-only and are frequently copied via apply/clone.  This was causing some test failures up the PR stack.

This moves the mutation into a separate object that is not copied.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112216
Approved by: https://github.com/voznesenskym
ghstack dependencies: #112122
2023-10-28 06:46:48 +00:00
c7b78fb76c [dynamo] Replace recursively_contains with parents_tracker (#112122)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112122
Approved by: https://github.com/voznesenskym
2023-10-28 06:46:48 +00:00
a380bf3297 [dynamo, test] skip flaky dynamo-wrapped tests (#112310)
ghstack-source-id: 7a87e33e7513e7924e4513b6473284562989ed4c
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112309

Skip flaky tests reported by
- https://github.com/pytorch/pytorch/issues/111825
- https://github.com/pytorch/pytorch/issues/111826
- https://github.com/pytorch/pytorch/issues/111909
- https://github.com/pytorch/pytorch/issues/112142
- https://github.com/pytorch/pytorch/issues/112220

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112310
Approved by: https://github.com/xmfan
2023-10-28 04:14:57 +00:00
31f605344f [Resubmit][S372460 follow up] Reduce embedding feature validation failure carry-on impact (#111838)
Summary:
## Context
The embedding feature validation for GatherRangesToDense was added in the previous diff: D18031155. The logic checks the mismatched ranges or empty ranges over the whole model lifecycle; once the ratio exceeds some threshold, it triggers an ENFORCE failure (exception).
In the current implementation, it may have carry-on impact. The mismatch ratio is equal to:
```
ratio = mismatched_ranges_from_t0_to_t1 / total_ranges_from_t0_to_t1
```
If mismatched_ranges_from_t0 somehow increases a lot (a bad-value spike) at t1, the ratio becomes much larger than the threshold. It may then take until t2 for the ratio to drop back below the threshold; however, the requests between t1 and t2 may all be good requests, so the spike has a carry-on impact.
Instead, we propose a new strategy: when the exception happens at t1, we clean up all the history counters for the bad feature, giving a clean run for the next phase until the next exception. This gets rid of the carry-on impact.
In this logic, we clean up the counters for the bad feature J.
more context: https://docs.google.com/document/d/1tYHISyiLf-PVKPVGlZRZ0iq2Hvog3g5BjCKMCDLZHHo/edit
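A minimal sketch of the reset-on-trip policy (names and the exception type are illustrative):
```python
class RangeMismatchStats:
    """Reset history when the ratio check trips, so a spike at t1 does not
    keep failing otherwise-good requests between t1 and t2."""

    def __init__(self, threshold: float, min_observations: int = 100):
        self.threshold = threshold
        self.min_observations = min_observations
        self.mismatched = 0
        self.total = 0

    def observe(self, mismatched: int, total: int) -> None:
        self.mismatched += mismatched
        self.total += total
        if (self.total >= self.min_observations
                and self.mismatched / self.total > self.threshold):
            self.mismatched = 0  # clean run until the next exception
            self.total = 0
            raise RuntimeError("mismatched range ratio exceeded threshold")
```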

Test Plan:
Hardcode a much smaller threshold (0.0001) to force-trigger the exception, then deploy on some hosts in prod tiers

```
EPHEMERAL_PACKAGE=d44a3de1305c3b4c30fd62bc354a1285 tw update fbcode/tupperware/config/admarket/sigrid/predictor/prod.tw tsp_cln/admarket/sigrid_predictor_v2_dh_t1_elastic_ha --tasks=100-199 --fast --force
```

## 1) totalRanges can be correctly logged
```
I1018 20:37:09.012523  2074 gather_ranges_to_dense_op.h:69 req:00f00000001abbf1] In GatherRangesToDenseOp:
  Lifetime empty ranges for each feature is 12354.
  Lifetime mismatched ranges for each feature is 526.
  With a total of 87503 examples for each feature.
```

## 2) exception can be still triggered
```
E1018 21:08:42.007398   668 LoggingPredictorService.cpp:701 req:001000000013df51] getRequestPrecomputedDataOnePass failure on model 481948521_146: [enforce fail at gather_ranges_to_dense_op.h:215] std::max(totalRangesTemp, minObservation_) * maxMismatchedRatio_ >= mismatchedRangesTemp. 0.1 vs 1. Ratio of range length mismatch for feature at index 0 is 0.00813008 (1/123) which exceeds 1e-05. The incorrect lengths include: 15 (Error from operator:

```

Differential Revision: D50570811

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111838
Approved by: https://github.com/malfet
2023-10-28 03:50:33 +00:00
fdcd927d8a [vision hash update] update the pinned vision hash (#112306)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112306
Approved by: https://github.com/pytorchbot
2023-10-28 03:40:44 +00:00
a2dcf26df4 [c10d] Pass avoidRecordStreams into collective() function (#112195)
Even after PR #111431, the `collective(...)` function still uses the member variable `avoidRecordStreams_` internally and does not respect each collective call's preference, as `avoidRecordStreams_` is only controlled by an environment variable.

As a fix, we pass `avoidRecordStreams` into the collective() function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112195
Approved by: https://github.com/awgu
2023-10-28 03:28:51 +00:00
25f06ee51b [dynamo] ExecutorchCallDelegateHigherOrderVariable - add sanity check that input and output tensors are disjoint (#111960)
Fixes https://github.com/pytorch/pytorch/issues/111917

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111960
Approved by: https://github.com/zou3519
2023-10-28 02:48:43 +00:00
3080fd8383 [profiler] add send/recv src/dst info (#111811)
Summary: There is an ask to add src/dst to the nccl trace. This feels like the easiest way to do it; adding it to metadata seems to require plumbing through a few stacks, so it would be more work.

Test Plan: {F1128545195}

Reviewed By: davidberard98

Differential Revision: D50560692

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111811
Approved by: https://github.com/davidberard98, https://github.com/aaronenyeshi, https://github.com/fduwjj
2023-10-28 02:48:23 +00:00
2c7c2b7827 [torch op][xs] verbose error message for type mismatch in toList() (#110872)
Summary:
Currently the error message doesn't give details on the nature of the mismatch:
  Output annotation element type and runtime tensor element type must match for tolist()

After the update, the error becomes actionable:
  RuntimeError: Output annotation element type and runtime tensor element type must match for tolist(): Long vs Int

Test Plan: existing unit tests

Reviewed By: iseeyuan

Differential Revision: D50082858

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110872
Approved by: https://github.com/houseroad
2023-10-28 02:47:47 +00:00
2225e6361d Support for as_nested_tensor() with jagged layout + fixed nested_tensor() semantics (#112304)
This PR:
* Adds support for the `layout` kwarg to `torch.nested.as_nested_tensor()`
* Fixes `torch.nested.nested_tensor()`
    * It should accept a list of lists of scalars
    * It should not preserve autograd history
* Adds extensive testing for these two functions

Semantics for the two functions follow those of the strided layout:
* `torch.nested.nested_tensor(tensor_list, layout=torch.jagged)`: Creates a new jagged layout NT **with no autograd history**
    * `tensor_list` can be a list of Tensors or list of lists of scalars
* `torch.nested.as_nested_tensor(tensor_list, layout=torch.jagged)`: Creates a new jagged layout NT **preserving autograd history of `tensor_list`**
    * `tensor_list` must be a list of Tensors
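A usage sketch of the two entry points under the jagged layout:
```python
import torch

ts = [torch.randn(2, 3, requires_grad=True),
      torch.randn(4, 3, requires_grad=True)]

nt_new = torch.nested.nested_tensor(ts, layout=torch.jagged)      # no autograd history
nt_view = torch.nested.as_nested_tensor(ts, layout=torch.jagged)  # preserves history

print(nt_new.requires_grad, nt_view.requires_grad)  # False True
```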
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112304
Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer
2023-10-28 02:34:27 +00:00
8d44999183 Revert "[Inductor] Add triton.autotune support for user defined triton kernels with constant/simple grids (#112228)"
This reverts commit dbb31a2984fa616b4bb6fac7abb2a06ec0533eb1.

Reverted https://github.com/pytorch/pytorch/pull/112228 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing ROCm test in trunk dbb31a2984 ([comment](https://github.com/pytorch/pytorch/pull/112228#issuecomment-1783660326))
2023-10-28 01:51:32 +00:00
668c3b3f3b Add embedding op to jagged NT (#112288)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112288
Approved by: https://github.com/cpuhrsch
2023-10-28 01:29:17 +00:00
1ff0b82be9 Added patterns for randperm + index_add (#112102)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112102
Approved by: https://github.com/lezcano
ghstack dependencies: #112093, #112101
2023-10-28 01:26:52 +00:00
a1a765c195 Mirror of Xformers Fix (#112267)
# Summary
See https://github.com/fairinternal/xformers/pull/850 for more details
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112267
Approved by: https://github.com/cpuhrsch
2023-10-28 00:06:11 +00:00
46a6435203 Make numpy/lib vendored tests dynamo traceable (#112147)
Follow up https://github.com/pytorch/pytorch/pull/112146 and  #112141 : make numpy/lib vendored tests dynamo traceable

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112147
Approved by: https://github.com/lezcano
2023-10-27 23:53:32 +00:00
128f4db77e A small fix in "do_bench_using_profiling" (#112223)
This is a small fix in "do_bench_using_profiling()".
When CUDA kernels are executed on a non-default CUDA stream and cuda.synchronize() is called, a CUDA kernel named "Context Sync" is launched on the default stream to wait until all other streams finish. This kernel shows up with "CUDA time" but is not a real kernel to profile. This fix excludes "Context Sync" when calculating total kernel time.
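A hedged sketch of the filtering (the event attribute names follow the torch profiler's event API, but treat them as assumptions):
```python
def kernel_time_excluding_context_sync(events) -> float:
    # Sum device time over profiled events, skipping the synthetic
    # "Context Sync" kernel launched by cuda.synchronize().
    return sum(
        e.device_time_total
        for e in events
        if e.device_time_total > 0 and "Context Sync" not in e.key
    )
```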

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112223
Approved by: https://github.com/int3, https://github.com/chenyang78
2023-10-27 23:08:38 +00:00
3d2041b342 [inductor] Fix bug handling output_strides in fx graph cache (#112041)
Summary: The current implementation does not properly attach output strides to the tracing context when an fx graph is loaded from the cache. That bug leads to assertion failures like `AssertionError: expected size 3==3, stride 1==9 at dim=1`. This change saves the output strides in the serialized object cached on disk and inserts them into the tracing context whether the graph is loaded from the cache or freshly compiled.

Test Plan:
* New unit test using resnet18 (which repros the problem)
* Ran the timm benchmark suite with `--training`

Differential Revision: [D50756653](https://our.internmc.facebook.com/intern/diff/D50756653)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112041
Approved by: https://github.com/ezyang
2023-10-27 22:30:46 +00:00
dbb31a2984 [Inductor] Add triton.autotune support for user defined triton kernels with constant/simple grids (#112228)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112228
Approved by: https://github.com/jansel
2023-10-27 21:40:22 +00:00
c67236a05d Revert "[dynamo] Be stricter about HigherOrderOperator kwargs (#111938)"
This reverts commit edafe2ddb99dd721021262fdfd58c3f796c7da0c.

Reverted https://github.com/pytorch/pytorch/pull/111938 on behalf of https://github.com/izaitsevfb due to Fails meta internal executorch tests with `torch._dynamo.exc.InternalTorchDynamoError: name 'p_kwargs' is not defined` ([comment](https://github.com/pytorch/pytorch/pull/111938#issuecomment-1783538268))
2023-10-27 21:37:48 +00:00
089e7aa4ac Revert "[dynamo] ExecutorchCallDelegateHigherOrderVariable - add sanity check that input and output tensors are disjoint (#111960)"
This reverts commit 27cf49549a35dd78475098b7de02c0a5ab1367ea.

Reverted https://github.com/pytorch/pytorch/pull/111960 on behalf of https://github.com/izaitsevfb due to Fails internal executorch tests with module 'torch.utils._pytree' has no attribute 'tree_flatten_only' ([comment](https://github.com/pytorch/pytorch/pull/111960#issuecomment-1783532843))
2023-10-27 21:32:30 +00:00
061bf1a153 [5/N] Make torch context manager a TorchCtxManagerClassVariable (#111622)
The major change in this PR is to make torch context manager classes a separate ```TorchCtxManagerClassVariable```, since we have a dynamo implementation for these ctx managers.

I considered wrapping them as ```UserDefinedClassVariable``` and dispatching at ```USCVariable.call_function```, but it is almost the same amount of work and this way is clearer.

This is on the way to moving ```TorchVariable``` to ```TorchFunctionVariable```, which will only handle the functions that are allowed in graph (e.g., ```torch.sin```) or constant folded (e.g., ```torch.is_floating_point```). All other torch functions will go through skip/inline rules and be wrapped as ```UserFunctionVariable``` (for inlined) or ```SkipFilesVariable``` (for skipped).
The next steps:
* Wrap torch modules, classes, objects as regular ```PythonModuleVariable```, ```UserDefinedClassVariable``` and ```UserDefinedObjectVariable```.
* Generate the allow in graph torch functions list and wrap them as ```TorchFunctionVariable```.
* Finally merge ```skipfiles.check``` and ```is_allowed``` into one function ```allow_skip.check(fn)``` which would return an Enum of allow, skip, and inline.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111622
Approved by: https://github.com/jansel
2023-10-27 21:26:54 +00:00
1460e5b7f5 updated aarch64 maintainers in docs (#112047)
This PR adds a new section for maintainers of `aarch64`.

Adding @snadampal to the list

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112047
Approved by: https://github.com/atalman
2023-10-27 21:09:36 +00:00
f7dc0ae16c Some cleanups in pattern matcher (#112101)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112101
Approved by: https://github.com/eellison
ghstack dependencies: #112093
2023-10-27 21:04:39 +00:00
6d685ff54f [BE] Remove float8 from vec is_floating_type definition (#112196)
As it's not supported yet, and it's also not clear, how support should look like

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112196
Approved by: https://github.com/drisspg
2023-10-27 20:48:36 +00:00
ca2106e871 [pytorch-vulkan] floor-divide for tensor, tensor (#112190)
Summary: tsia

Test Plan:
## Compile on Mac and run on Android

```
buck2 build -c ndk.static_linking=true -c pt.enable_qpl=0  --target-platforms=ovr_config//platform/android:arm32-fbsource //xplat/caffe2:pt_vulkan_api_test_binAndroid  --show-output && adb push buck-out/v2/gen/fbsource/f1f3f9bed27e143c/xplat/caffe2/__pt_vulkan_api_test_binAndroid__/pt_vulkan_api_test_binAndroid /data/local/tmp
```

Run on android
```
$ adb shell /data/local/tmp/pt_vulkan_api_test_binAndroid
...
[ RUN      ] VulkanAPITest.lstm_prepack_success
[       OK ] VulkanAPITest.lstm_prepack_success (11 ms)
[ RUN      ] VulkanAPITest.querypool_flushed_shader_log
xplat/caffe2/aten/src/ATen/test/vulkan_api_test.cpp:7667: Skipped
QueryPool is not available
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log (0 ms)
[----------] 396 tests from VulkanAPITest (29980 ms total)
[----------] Global test environment tear-down
[==========] 396 tests from 1 test suite ran. (29980 ms total)
[  PASSED  ] 395 tests.
[  SKIPPED ] 1 test, listed below:
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log
  YOU HAVE 7 DISABLED TESTS

```

All Passed.
Full Output: P865232089

Reviewed By: copyrightly

Differential Revision: D50677361

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112190
Approved by: https://github.com/manuelcandales
2023-10-27 20:20:41 +00:00
1774704fc1 [dynamo] Simplify add_dict in preparation to refactor it with call_set (#110523)
The previous implementation had a fair amount of repeated code, and did
things like calling `add_options` where options was always empty (which
is fine, as the guards are already set within ConstDictVariable).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110523
Approved by: https://github.com/yanboliang, https://github.com/jansel
ghstack dependencies: #110522
2023-10-27 20:17:10 +00:00
1dcbd1c088 [dynamo] [easy] Move Set to dicts.py (#110522)
A set is more of a dict than a list if you ask me.
This comes before the refactor where we implement sets and dicts via the
same logic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110522
Approved by: https://github.com/jansel
2023-10-27 20:17:10 +00:00
b9cb4103d7 Fix iphoneos compilation (#111502)
Summary: As title

Test Plan: buck build @//arvr/mode/iphoneos/mac/opt //xplat/third-party/XNNPACK:ukernels_asm_aarch64

Reviewed By: mcr229

Differential Revision: D50423968

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111502
Approved by: https://github.com/mcr229
2023-10-27 20:00:41 +00:00
328a4c5475 [BE] Enhance OpInfo.supported_dtype (#111995)
The current implementation is prone to errors, as it accepts any object but does not raise an error if device_type is not recognized.

Remediate this by accepting both device types and device identifiers (either a `torch.device` instance or a "{device_type}:{ordinal}" string).

Fixes https://github.com/pytorch/pytorch/issues/111179

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111995
Approved by: https://github.com/albanD
2023-10-27 19:42:01 +00:00
192e795f3f Change save -> load in comment (#112217)
Change save -> load in comment because this is the load_state_dict API

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112217
Approved by: https://github.com/wz337
2023-10-27 19:39:02 +00:00
c120e5606e Use ops_and_refs in test_ops.py instead of _ops_and_refs (#112022)
`ops_and_refs` and `_ops_and_refs` have the same definition.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112022
Approved by: https://github.com/lezcano
2023-10-27 18:37:05 +00:00
c7dcba9276 Remove passing disable_fastpath in kwargs (#112250)
Fixes an issue that came up in https://github.com/pytorch/pytorch/pull/112030

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112250
Approved by: https://github.com/lezcano
2023-10-27 18:29:20 +00:00
b110d87ac2 Readded device_assert skipping in index and index_put (and also added (#112093)
copy to noop pass)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112093
Approved by: https://github.com/oulgen, https://github.com/lezcano
2023-10-27 18:23:49 +00:00
baf3e054e3 Fixed an error in the comment of file torch.utils.data.dataloader.py#944 . (#112244)
Fixes #ISSUE_NUMBER
@ssnl
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112244
Approved by: https://github.com/albanD
2023-10-27 18:16:58 +00:00
33daaeb6b5 Automated submodule update: FBGEMM (#112118)
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 6c2be8831a

Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112118
Approved by: https://github.com/malfet
2023-10-27 18:14:54 +00:00
700071869a [no-ci][EZ] Update RELEASE.md (#112253)
Reflect default branch renames from master to main

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112253
Approved by: https://github.com/huydhn, https://github.com/ZainRizvi
2023-10-27 18:12:15 +00:00
cb48ef21cc [no-ci] Clarify revert handling in release branches (#112262)
Changes that have been reverted on trunk must be reverted in release branches as well

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112262
Approved by: https://github.com/huydhn
2023-10-27 18:11:29 +00:00
a26cb0a3f2 [dynamo] Enable typechecking for testing.py (#112129)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112129
Approved by: https://github.com/Skylion007
ghstack dependencies: #111894, #111992, #112031, #112127, #112128
2023-10-27 18:00:56 +00:00
d3bf6803b6 [dynamo] add sanity check that we do not wrap tracked tensors (#112025)
Identified as a result of https://github.com/pytorch/pytorch/pull/111911

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112025
Approved by: https://github.com/ezyang
2023-10-27 17:15:03 +00:00
d97332f839 Add cuda status checks to FA templates (#112229)
# Summary
cuda status checks were accidentely removed on latest update

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112229
Approved by: https://github.com/Skylion007
2023-10-27 16:54:23 +00:00
63c089b09d [c10] Move profiler clock to libc10 for timestamps (#111972)
Summary:
Move the profiler's Approximate Clock from libtorch to libc10. The main reason is to allow c10 features to get timestamps.

The clock uses TSC when available, for performance. The CUDA Caching Allocator's implementation of memory snapshots will add timestamps to memory events with this same clock in a subsequent diff.

Test Plan: CI

Differential Revision: D50601935

Pulled By: aaronenyeshi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111972
Approved by: https://github.com/davidberard98
2023-10-27 16:18:40 +00:00
fdbb73fa4e Check both ops and refs in test_strided_layout (#112160)
Trying #112023 again to see if CLA issue is fixed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112160
Approved by: https://github.com/lezcano, https://github.com/Neilblaze
2023-10-27 15:35:34 +00:00
bd0ea72b28 torch.library: Create helper function is_functional_schema (#111660)
I will need this again soon.

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111660
Approved by: https://github.com/soulitzer
2023-10-27 15:20:25 +00:00
7df675743c Stop using defaultdict for deferred_runtime_asserts (#112172)
In the ShapeEnv record/replay machinery we do equality tests on this dict, but `{i0: []}` is considered not equal to `{}`, and you can unpredictably end up with the former just by doing reads from the dict. Using a real dict removes this wobbliness.
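A two-line illustration of the wobbliness:
```python
from collections import defaultdict

d1, d2 = defaultdict(list), defaultdict(list)
d1["i0"]          # a mere read inserts the key
print(d1 == d2)   # False: {'i0': []} != {}
```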

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112172
Approved by: https://github.com/ysiraichi, https://github.com/Skylion007
2023-10-27 15:05:42 +00:00
9f7bff1171 Add timeout for master store if clients do not join (#111805)
Currently, if the master store does not have all clients join within the `timeout`, it just continues silently, which could lead to errors down the road. However, if a client does not connect to the master within the specified time, an exception is raised. This change makes the master store error out if not all clients have joined, making server and client behavior consistent with each other.

Since this changes the default behavior of the master store, I am open to suggestions.

Example:

```python
import torch.distributed as dist
import torch.multiprocessing as mp
from datetime import timedelta

def main(rank, world_size):
    if rank == 0:
        print("creating store")
        # world size is 2 so this eventually times out
        store = dist.TCPStore("localhost", 1234, 2, True, timeout=timedelta(seconds=5))
        print("finished creating store")

if __name__ == "__main__":
    world_size = 2
    mp.spawn(main, (world_size,), nprocs=world_size)
```

Previous
```
print("creating store")
print("finished creating store")
```

Now
```
print("creating store")
torch.distributed.DistStoreError: Timed out after 6 seconds waiting for workers. 1/2 workers joined.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111805
Approved by: https://github.com/XilunWu, https://github.com/fduwjj
2023-10-27 14:44:43 +00:00
cf5479b57e [MPS] Make the device in MPSGenerator consistent with MPSAllocator (#112188)
1b702b185e/aten/src/ATen/mps/MPSAllocator.mm (L751-L760)

The device in an MPS tensor is actually allocated with a device index, so this PR makes the device generated by `MPSGenerator` consistent with that.

Fixes https://github.com/pytorch/pytorch/issues/110820#issuecomment-1752088865
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112188
Approved by: https://github.com/malfet, https://github.com/kulinseth
2023-10-27 09:37:41 +00:00
7265c22a5d [AOTInductor] Enforce no_grad for Run entries (#111613)
Summary:
Always enter no_grad mode in AOTInductor run entries.

```
// AOTInductor uses at::addmm_out, which doesn't supports
// arguments that requires gradient. For this reason, we
// enforce no_grad context for run APIs.
```

Test Plan:
buck2 test mode/dev-nosan caffe2/test/inductor:test_aot_inductor

and OSS CI

Differential Revision: D50432042

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111613
Approved by: https://github.com/chenyang78, https://github.com/khabinov
2023-10-27 09:14:19 +00:00
2a86bcbac2 [FSDP][state_dict] Cleanup the usage of _get_pg_default_device (#112168)
_get_pg_default_device is not suitable for the FSDP use case. We should always use the compute_device when communicating.

Differential Revision: [D50698730](https://our.internmc.facebook.com/intern/diff/D50698730/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112168
Approved by: https://github.com/wz337
2023-10-27 08:09:08 +00:00
46667c97fd [Pytorch][Vulkan] var.dim (#111965)
Summary:
We implement [`torch.var`](https://pytorch.org/docs/stable/generated/torch.var.html) for tensors of 2d to 4d.

By using the `mean`, `sub` and `pow` ops, we can compute the variance as below without adding a new shader.
```
at::Tensor self_mean = self.mean(opt_dim, true);
at::Tensor output = (self.sub(self_mean).pow(2)).mean(opt_dim, keepdim);
```
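A quick CPU sanity check of that composition (biased variance, i.e. correction=0):
```python
import torch

x = torch.randn(3, 4, 5)
dim = (1, 2)
manual = (x - x.mean(dim, keepdim=True)).pow(2).mean(dim, keepdim=True)
print(torch.allclose(manual, x.var(dim, correction=0, keepdim=True)))  # True
```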

Test Plan:
```
[luwei@devbig984.prn1 /data/users/luwei/fbsource (2da0640c6)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run fbcode/mode/dev-nosan //xplat/caffe2:pt_vulkan_api_test_bin -- --gtest_filter="*var*"
Building: finished in 0.1 sec (100%) 339/339 jobs, 0/339 updated
  Total time: 0.1 sec
BUILD SUCCEEDED
Running main() from third-party/googletest/1.11.0/googletest/googletest/src/gtest_main.cc
Note: Google Test filter = *var*
[==========] Running 6 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 6 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.var_2d_unbiased
[       OK ] VulkanAPITest.var_2d_unbiased (322 ms)
[ RUN      ] VulkanAPITest.var_2d_biased
[       OK ] VulkanAPITest.var_2d_biased (0 ms)
[ RUN      ] VulkanAPITest.var_3d_unbiased
[       OK ] VulkanAPITest.var_3d_unbiased (2 ms)
[ RUN      ] VulkanAPITest.var_3d_biased
[       OK ] VulkanAPITest.var_3d_biased (2 ms)
[ RUN      ] VulkanAPITest.var_4d_unbiased
[       OK ] VulkanAPITest.var_4d_unbiased (175 ms)
[ RUN      ] VulkanAPITest.var_4d_biased
[       OK ] VulkanAPITest.var_4d_biased (5 ms)
[----------] 6 tests from VulkanAPITest (508 ms total)

[----------] Global test environment tear-down
[==========] 6 tests from 1 test suite ran. (508 ms total)
[  PASSED  ] 6 tests.
```

Reviewed By: yipjustin

Differential Revision: D50398925

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111965
Approved by: https://github.com/yipjustin
2023-10-27 07:56:01 +00:00
20fc2b4186 [dynamo] Enable typechecking for compiled_autograd.py (#112128)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112128
Approved by: https://github.com/Skylion007
ghstack dependencies: #111894, #111992, #112031, #112127
2023-10-27 06:18:58 +00:00
632ac01bef [dynamo] Enable typechecking for exc.py (#112127)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112127
Approved by: https://github.com/Skylion007
ghstack dependencies: #111894, #111992, #112031
2023-10-27 06:18:58 +00:00
6a99291546 Removing sdpa conv layout constraint (#112045)
Previously, layout optimization with sdpa would cause failures because we would pass a non-dense last dim to sdpa. Those layout constraints have been added in prior PRs. Now we can do conv layout optimization with sdpa.

Improves twins_pcpvt_base 1.4622 → 1.5351, xcit_large_24_p8_224 3.0681 → 3.1839

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112045
Approved by: https://github.com/shunting314
ghstack dependencies: #111976, #111721
2023-10-27 05:40:43 +00:00
572b66331e [PyTorch][ET] collect comms in ET for send/recv (#111985)
Summary: collect send/recv comms op in Execution Trace

Test Plan:
run param comms with arbitrary collective size to collect operator
send
```
{
      "name": "record_param_comms", "id": 153, "rf_id": 141, "parent": 152, "fw_parent": 0, "seq_id": -1, "scope": 0, "tid": 1, "fw_tid": 0, "op_schema": "",
      "inputs": [[[21,22,0,262144,4,"cuda:0"]],215038,139890792374272,1,"send",[],[]], "input_shapes": [[[262144]],[],[],[],[],[],[]], "input_types": ["GenericList[Tensor(float)]","Int","Int","Int","String","GenericList[]","GenericList[]"],
      "outputs": [[[21,22,0,262144,4,"cuda:0"]]], "output_shapes": [[[262144]]], "output_types": ["GenericList[Tensor(float)]"]
   },
```
recv
```
{
      "name": "record_param_comms", "id": 172, "rf_id": 160, "parent": 171, "fw_parent": 0, "seq_id": -1, "scope": 0, "tid": 1, "fw_tid": 0, "op_schema": "",
      "inputs": [[[138,139,0,262144,4,"cuda:0"]],215042,139890792374272,1,"recv",[],[]], "input_shapes": [[[262144]],[],[],[],[],[],[]], "input_types": ["GenericList[Tensor(float)]","Int","Int","Int","String","GenericList[]","GenericList[]"],
      "outputs": [[[138,139,0,262144,4,"cuda:0"]]], "output_shapes": [[[262144]]], "output_types": ["GenericList[Tensor(float)]"]
    },
```

Differential Revision: D50624443

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111985
Approved by: https://github.com/fduwjj
2023-10-27 05:24:04 +00:00
7e5e951dfe [tp] update node meta with partitioned val (#112080)
Test Plan:
buck run mode/opt scripts/feikou/di:export_dummy_model -- --world-size=4

buck run mode/opt scripts/feikou/di:run_model -- --num_gpus=4 --num_iters=1

In sigmoid:
Non-DI:
```
V1025 13:57:16.341391 2225036 run_model.cpp:84] Non-ditributed run outputs:[ 0.8350  0.5399  1.0196  0.9286  1.1265  1.0324
V1025 13:57:16.341391 2225036 run_model.cpp:84]  0.8350  0.5399  1.0196  0.9286  1.1265  1.0324
V1025 13:57:16.341391 2225036 run_model.cpp:84]  0.8350  0.5399  1.0196  0.9286  1.1265  1.0324
V1025 13:57:16.341391 2225036 run_model.cpp:84]  0.8350  0.5399  1.0196  0.9286  1.1265  1.0324
V1025 13:57:16.341391 2225036 run_model.cpp:84]  0.8350  0.5399  1.0196  0.9286  1.1265  1.0324
V1025 13:57:16.341391 2225036 run_model.cpp:84] [ CUDAFloatType{5,6} ]]
```
DI:
```
V1025 13:57:26.352564 2226855 run_model.cpp:278] [Rank 3] output wait_tensor_9:  0.8350  0.5399  1.0196  0.9286  1.1265  1.0324
V1025 13:57:26.352564 2226855 run_model.cpp:278]  0.8350  0.5399  1.0196  0.9286  1.1265  1.0324
V1025 13:57:26.352564 2226855 run_model.cpp:278]  0.8350  0.5399  1.0196  0.9286  1.1265  1.0324
V1025 13:57:26.352564 2226855 run_model.cpp:278]  0.8350  0.5399  1.0196  0.9286  1.1265  1.0324
V1025 13:57:26.352564 2226855 run_model.cpp:278]  0.8350  0.5399  1.0196  0.9286  1.1265  1.0324
V1025 13:57:26.352564 2226855 run_model.cpp:278] [ CUDAFloatType{5,6} ]
```

Differential Revision: D50663481

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112080
Approved by: https://github.com/wanchaol
2023-10-27 05:08:33 +00:00
033680c9af [tp] fix PrepareModuleInput for multiple inputs (#112204)
Not all inputs need sharding annotations and conversion to DTensors. If the user annotates only one input and marks the rest as None, we should skip creating DTensors for those.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112204
Approved by: https://github.com/fduwjj
2023-10-27 05:08:05 +00:00
a6e556f8b0 Support calling __torch_function__ attribute access (#111737)
Triggers `__torch_function__` tracing on attribute/method/property access matching the eager behavior for non-overridden attributes/methods/properties that are present on `torch.Tensor`.

Some caveats:
1. for methods there doesn't seem to be a way to check if the original implementation of a method is overridden via monkey patching or not. For example:
```
class LocalSubclass(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        return super().__torch_function__(func, types, args, kwargs)

x = torch.ones(2, 2).as_subclass(LocalSubclass)

> x.sigmoid
<built-in method sigmoid of LocalSubclass object at 0x7f8d305bb5e0>
```
There isn't a way to verify that this built-in method is equivalent to the base `torch.Tensor` implementation as each instance will have a different built-in method object that can't be traced back to the original `torch.Tensor` impl. You can check that the class itself has the original implementation via
```
> inspect.getattr_static(LocalSubclass, "sigmoid")
<method 'sigmoid' of 'torch._C.TensorBase' objects>
```
But we can't detect if the user dynamically patches an object with a built-in method called sigmoid which does something completely different.

2. If a user overrides a method but calls the original implementation we will still graph break. This will require modifying `SuperVariable` (and any other way to get the original impl) to handle tensor subclasses.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111737
Approved by: https://github.com/jansel, https://github.com/ezyang
2023-10-27 04:57:19 +00:00
589625cbae Add bandwidth to extern kernel calc (#110539)
Summary: Modify the result of get_estimated_runtime() for ExternKernelSchedulerNode to count both bytes and FLOPs and return the maximum of the two.

Reviewed By: xmfan

Differential Revision: D48987490

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110539
Approved by: https://github.com/xw285cornell
2023-10-27 04:46:24 +00:00
c84dbd2c03 [2D] Enable 2D optimizer set_state_dict() (#111778)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111778
Approved by: https://github.com/fegin, https://github.com/fduwjj
ghstack dependencies: #111774
2023-10-27 04:33:00 +00:00
aa9e65d8f5 [DCP] Add fsspec.transaction context when writing checkpoint to storage (#112191)
Summary: Adding fsspec.transaction to safeguard checkpointing writing. With the context, it should only commit if there was no exception and discard otherwise.

Test Plan:
```
command: buck test @//mode/dev-nosan  //caffe2/test/distributed/checkpoint/fb:test_fsspec_filesystem -- --print-passing-details
```

Reviewed By: rohan-varma

Differential Revision: D50701929

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112191
Approved by: https://github.com/rohan-varma
2023-10-27 04:27:29 +00:00
7cb72704cc Constrain sdpa to fx strides (#111721)
Fix for https://github.com/pytorch/pytorch/issues/109607. sdpa requires last dimension strides to be 1. Add constraint so that we run the op with the strides we observed in tracing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111721
Approved by: https://github.com/drisspg, https://github.com/Chillee, https://github.com/jansel
ghstack dependencies: #111976
2023-10-27 03:23:27 +00:00
94e90c199c [dtensor] fix pointwise op linearity with strategy (#112107)
This PR fixes pointwise op strategy linearity and switches the linear pointwise ops to use the strategy. It also adds tests showing that with the new approach we can enable full-shard (S(0), S(0))-like operations.

Why is this useful? For 2-D-parallel-like patterns where the named parameters are possibly fully sharded on all devices, [S(0), S(0)], [S(1), S(0)], etc. need to work. Since we don't use the sharding rules anymore, this is now possible.

@awgu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112107
Approved by: https://github.com/wz337
2023-10-27 02:41:45 +00:00
64fd027f2e Revert "[inductor] benchmark fusion (#108193)"
This reverts commit 73cc5d1cdda118007ccdb0be8d775ba76726596e.

Reverted https://github.com/pytorch/pytorch/pull/108193 on behalf of https://github.com/izaitsevfb due to Trying to unblock the revert of #108690, please rebase and reland. ([comment](https://github.com/pytorch/pytorch/pull/108193#issuecomment-1782157638))
2023-10-27 01:40:06 +00:00
0a3199dd7e Revert "Readded device_assert skipping in index and index_put (and also added (#112093)"
This reverts commit e38347f490ae14bf96913a19e7dab9b5e752c276.

Reverted https://github.com/pytorch/pytorch/pull/112093 on behalf of https://github.com/izaitsevfb due to Sorry, trying to resolve a conflict with intern, and unblock the revert of #108690 ([comment](https://github.com/pytorch/pytorch/pull/112093#issuecomment-1782154814))
2023-10-27 01:37:33 +00:00
797d7100de Revert "[quant][pt2e][be] Cleanup observer insertion logic (#111828)"
This reverts commit bf998a2c5d549cf4856c7becfca4a169bf68b709.

Reverted https://github.com/pytorch/pytorch/pull/111828 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/111828#issuecomment-1782154648))
2023-10-27 01:35:27 +00:00
ac4cc5dbea [Dynamo] Do not crash if numpy is not installed (#112175)
`s/isinstance(value, np.generic)/np is not None and isinstance(value, np.generic)/`

Found while looking at https://github.com/pytorch/pytorch/pull/110512
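
In effect, a guard like the following (a standalone sketch, not the actual Dynamo code):

```python
# Standalone sketch of the guard: `np` is None when numpy is absent, and the
# `and` short-circuits before touching np.generic.
try:
    import numpy as np
except ImportError:
    np = None

def is_numpy_scalar(value) -> bool:
    return np is not None and isinstance(value, np.generic)
```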

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112175
Approved by: https://github.com/ev-br, https://github.com/kit1980
2023-10-27 00:39:28 +00:00
22221c6d60 Revert "Trigger specialization when you call size()/stride() from C++ (#111935)"
This reverts commit 5846705e36795d76941e18073e49c6edba90c994.

Reverted https://github.com/pytorch/pytorch/pull/111935 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/111935#issuecomment-1782107024))
2023-10-27 00:23:03 +00:00
1569df7f01 Don't search getitem for batch fusions (#112088)
Batch mm fusion regressed optimizer compile time by about 1 minute; excluding getitem from the search solves this problem.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112088
Approved by: https://github.com/yanboliang
2023-10-27 00:13:55 +00:00
5b71834785 Avoid c++ exception and stack trace (#111438)
Summary:
Raising an exception here causes pybind11's dispatcher to kick in, which causes aiplatform's logic to kick in (aiplatform::error_reporting::util::printAddressesWithBestEffortLocationInfo), which ultimately uses `folly::symbolizer::Symbolizer::symbolize` to build up the stack trace.  In 3.8 this uses about 3.62% of the CPU time per pyperf (https://fburl.com/scuba/pyperf_experimental/on_demand/oi554uvy).  In Cinder 3.8, for some reason, this is worse, using 5.94% of the CPU.

This exception is happening when doing a hasattr() on `prims` for things like `bitwise_left_shift` which don't exist: https://www.internalfb.com/code/fbsource/[2d695f650d00]/fbcode/caffe2/torch/_inductor/lowering.py?lines=590

That exception is ultimately going to be swallowed anyway, and the stack trace has no meaningful value. Furthermore, because this is an expected outcome in the code rather than some random C++ exception, the stack trace is less valuable as well.

This change returns (None, None) on the failure case instead of a valid op/overload list, avoiding the exception and reclaiming the 3.62%-5.94% of CPU time.
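
For illustration, a hedged sketch of the pattern (the function and names here are hypothetical, not the actual lowering code):

```python
# Hypothetical sketch of the pattern: return a sentinel instead of raising,
# so hasattr()-style probes for missing ops stay cheap.
def find_op_and_overloads(namespace, name):
    packet = getattr(namespace, name, None)
    if packet is None:
        # Previously this raised, paying for a pybind11 exception plus
        # folly symbolizer stack-trace construction on every miss.
        return None, None
    return packet, packet.overloads()
```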

Test Plan: Existing CI and perf run: https://fburl.com/scuba/pyperf_experimental/on_demand/oi554uvy

Differential Revision: D50018789

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111438
Approved by: https://github.com/davidberard98
2023-10-26 23:55:34 +00:00
acd02a60d5 Add a test making sure we are not importing SymPy when importing torch (#112038)
As per title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112038
Approved by: https://github.com/malfet, https://github.com/peterbell10
ghstack dependencies: #112035, #112036, #112037
2023-10-26 23:32:27 +00:00
47ccf04885 Split SymNode into its own file (#112037)
This PR:

- Moves TrueDiv, LShift, RShift, IsNonOverlappingAndDenseIndicator to `_sympy.functions.py`
- Moves SymNode to `fx.experimental.sym_node`.
  - This file does not have any SymPy dependencies at import time
  - It installs the magic methods in Sym{Bool,Int,Float}.
  - N.b. With this split, we may be able to move Sym{Bool,Int,Float} to this file, and remove quite a few of the hacks around these classes
- Imports `sym_node` in `torch/__init__.py` rather than the whole `symbolic_shapes.py`.
  This breaks the import-time dependency between torch and SymPy

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112037
Approved by: https://github.com/peterbell10
ghstack dependencies: #112035, #112036
2023-10-26 23:32:27 +00:00
deac5357db Make proxy_tensor.py not depend on SymPy (#112036)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112036
Approved by: https://github.com/malfet, https://github.com/peterbell10
ghstack dependencies: #112035
2023-10-26 23:32:19 +00:00
4f7f46ee35 Move SymDispatchMode to its own file (#112035)
This is just code movement + a getter and a setter to break the
dependency of SymDispatchMode, and in turn, ProxySymDispatchMode on
sympy.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112035
Approved by: https://github.com/peterbell10
2023-10-26 23:32:11 +00:00
55ab9932f5 Revert "Constrain sdpa to fx strides (#111721)"
This reverts commit 8a7c3cec78686e661b3781b916a8aae59083f90a.

Reverted https://github.com/pytorch/pytorch/pull/111721 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is breaking ROCm job in trunk 8a7c3cec78 ([comment](https://github.com/pytorch/pytorch/pull/111721#issuecomment-1782064133))
2023-10-26 23:27:57 +00:00
4a94f77c8e Revert "Make numpy/lib vendored tests dynamo traceable (#112147)"
This reverts commit 190b6e4ba88f6cf00d0bd08d6212a3fe6bb76eaa.

Reverted https://github.com/pytorch/pytorch/pull/112147 on behalf of https://github.com/huydhn due to Sorry for reverting this again, but this is failing in trunk 190b6e4ba8 ([comment](https://github.com/pytorch/pytorch/pull/112147#issuecomment-1782056995))
2023-10-26 23:23:49 +00:00
73cc5d1cdd [inductor] benchmark fusion (#108193)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108193
Approved by: https://github.com/jansel
2023-10-26 22:18:37 +00:00
e660bd1422 Re-enable some embedded bag tests (#111712)
They were temporary disabled in 2019 by  https://github.com/pytorch/pytorch/pull/26599

As suggested, increased relative tolerance from 0 to 2% when tests are using float16 dtype

### 🤖 Generated by Copilot at 1e49d84

> _`TestEmbeddingNN`_
> _CUDA tests restored_
> _Bug fixed in autumn breeze_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111712
Approved by: https://github.com/huydhn
2023-10-26 22:16:38 +00:00
190b6e4ba8 Make numpy/lib vendored tests dynamo traceable (#112147)
Follow-up to https://github.com/pytorch/pytorch/pull/112146 and #112141: make numpy/lib vendored tests dynamo traceable

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112147
Approved by: https://github.com/lezcano
2023-10-26 21:41:22 +00:00
abe172e268 Revert "Cleanup error reporting for ProcessGroupNCCL (#111979)"
This reverts commit b29c658265d6b95d8ec77f7052eff4f25190fbfc.

Reverted https://github.com/pytorch/pytorch/pull/111979 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing multigpu test in trunk b29c658265 ([comment](https://github.com/pytorch/pytorch/pull/111979#issuecomment-1781919184))
2023-10-26 21:29:40 +00:00
d91a18c433 Grandfather in torchgen'ed aten ops to torch.Tag.pt2_compliant_tag (#112053)
In torchgen, we add the pt2_compliant_tag to all aten ops.

Test Plan:
- new test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112053
Approved by: https://github.com/soulitzer
2023-10-26 21:21:09 +00:00
27cf49549a [dynamo] ExecutorchCallDelegateHigherOrderVariable - add sanity check that input and output tensors are disjoint (#111960)
Fixes https://github.com/pytorch/pytorch/issues/111917

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111960
Approved by: https://github.com/zou3519
2023-10-26 21:13:05 +00:00
73f36e44fb [aotinductor] Add a debug compile flag (#112021)
Summary: When the debug compile flag is specified, model.so is compiled with "-O0 -g".

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112021
Approved by: https://github.com/chenyang78
ghstack dependencies: #111823
2023-10-26 21:11:08 +00:00
f66cc67562 [aotinductor] Fix duplicated unbacked symbol declarations (#111823)
Summary: For https://github.com/pytorch/pytorch/issues/111711

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111823
Approved by: https://github.com/ezyang, https://github.com/aakhundov
2023-10-26 21:11:08 +00:00
f839a5627b Add bf16 support to replicate padding (#112099)
Fixes #99433

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112099
Approved by: https://github.com/mikaylagawarecki
2023-10-26 20:30:49 +00:00
8a7c3cec78 Constrain sdpa to fx strides (#111721)
Fix for https://github.com/pytorch/pytorch/issues/109607. sdpa requires last dimension strides to be 1. Add constraint so that we run the op with the strides we observed in tracing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111721
Approved by: https://github.com/drisspg, https://github.com/Chillee, https://github.com/jansel
ghstack dependencies: #111976
2023-10-26 20:21:55 +00:00
1b702b185e [pytorch-vulkan] disable one zero-dim tensor test to fix test (#112087)
Summary:
D50347338 has a bug on Android (not Mac, not Devserver).

This diff disables the test for the time being while I identify the actual cause.

Test Plan:
##  Compile on devserver

```
[yipjustin@129360.od ~/fbsource (e415d865c)]$ buck2 build -c ndk.static_linking=true -c pt.enable_qpl=0  --target-platforms=ovr_config//platform/android:arm32-fbsource //xplat/caffe2:pt_vulkan_api_test_binAndroid  --show-output
File changed: fbcode//caffe2/aten/src/ATen/test/vulkan_api_test.cpp
File changed: fbsource//xplat/caffe2/aten/src/ATen/test/vulkan_api_test.cpp
Buck UI: https://www.internalfb.com/buck2/99d47e63-ed6e-4db9-bee2-24909d647b78
Network: Up: 3.2KiB  Down: 67KiB  (reSessionID-459e359b-773c-48a4-b129-81fde7c5e876)
Jobs completed: 4664. Time elapsed: 7.3s.
Cache hits: 100%. Commands: 38 (cached: 38, remote: 0, local: 0)
BUILD SUCCEEDED
fbsource//xplat/caffe2:pt_vulkan_api_test_binAndroid buck-out/v2/gen/fbsource/f1f3f9bed27e143c/xplat/caffe2/__pt_vulkan_api_test_binAndroid__/pt_vulkan_api_test_binAndroid
```

## Run test.
adb shell /data/local/tmp/pt_vulkan_api_test_binAndroid | pastry

Result: P864940908
```
...
[       OK ] VulkanAPITest.lstm_success (7 ms)
[ RUN      ] VulkanAPITest.lstm_mclareninputs_success
[       OK ] VulkanAPITest.lstm_mclareninputs_success (56 ms)
[ RUN      ] VulkanAPITest.lstm_prepack_success
[       OK ] VulkanAPITest.lstm_prepack_success (7 ms)
[ RUN      ] VulkanAPITest.querypool_flushed_shader_log
xplat/caffe2/aten/src/ATen/test/vulkan_api_test.cpp:7568: Skipped
QueryPool is not available
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log (0 ms)
[----------] 391 tests from VulkanAPITest (30715 ms total)
[----------] Global test environment tear-down
[==========] 391 tests from 1 test suite ran. (30715 ms total)
[  PASSED  ] 390 tests.
[  SKIPPED ] 1 test, listed below:
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log
  YOU HAVE 7 DISABLED TESTS

```

Reviewed By: liuk22

Differential Revision: D50668570

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112087
Approved by: https://github.com/izaitsevfb, https://github.com/SS-JIA
2023-10-26 19:48:40 +00:00
5e5329155e [aotinductor] only include -lc10 for non-fbcode case (#112125)
Summary: otherwise, we would break internal uses

Differential Revision: D50681467

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112125
Approved by: https://github.com/swolchok, https://github.com/desertfire, https://github.com/SherlockNoMad
2023-10-26 19:47:08 +00:00
3a284dae30 Revert "Do not materialize entire randperm in RandomSampler (#103339)"
This reverts commit d80174e2db679365f8b58ff8583bdc4af5a8b74c.

Reverted https://github.com/pytorch/pytorch/pull/103339 on behalf of https://github.com/kit1980 due to Cause issues on MPS, and also fails without numpy ([comment](https://github.com/pytorch/pytorch/pull/103339#issuecomment-1781705172))
2023-10-26 18:53:14 +00:00
b7affa2ac3 Add unit test for ONNX models with torch.distributions.normal.Normal (#111498)
Fixes #111034
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111498
Approved by: https://github.com/justinchuby, https://github.com/BowenBao
2023-10-26 17:57:34 +00:00
8bc0b382fa [HigherOrderOp] Move map_impl to torch.ops.higher_order (#111404)
The purpose of this pr is as titled. Because of some misusage of ghstack, ghimport, and export to github from internal, the stack of https://github.com/pytorch/pytorch/pull/111092 is a mess. I'll try to land them one by one. This is a replacement for #111092 and #111400.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111404
Approved by: https://github.com/tugsbayasgalan, https://github.com/zou3519
2023-10-26 16:59:10 +00:00
f6f81a5969 Update get-workflow-job-id to also return job name (#112103)
Then we can use this job name in `filter-test-configs` if it's available.  This addresses the issue in which `filter-test-configs` on GitHub runners (MacOS x86) couldn't find the runner log to get the job name.  This is expected because GitHub runners are isolated, so a job should not be able to access runner logs, which could contain information from other jobs.

This allows all missing features depending on running `filter-test-configs` on GitHub runners:
* Rerun disabled tests and memory leak check. For example, this would help avoid closing https://github.com/pytorch/pytorch/issues/110980#issuecomment-1779806466 early with the disabled test running properly on MacOS x86
* MacOS x86 jobs can now be disabled or marked as unstable

I keep the current logic to parse the log as a fallback because it's working fine on self-hosted runners.  That also handles the case where `get-workflow-job-id` fails.  Also I move the rest of `get-workflow-job-id` up before the test step like https://github.com/pytorch/pytorch/pull/111483

### Testing

Spot checks some jobs to confirm they have the correct names:

* MacOS M1 test job https://github.com/pytorch/pytorch/actions/runs/6648305319/job/18065275722?pr=112103#step:10:8
* MacOS x86 build job https://github.com/pytorch/pytorch/actions/runs/6648306305/job/18065138137?pr=112103#step:9:14
* Linux test job https://github.com/pytorch/pytorch/actions/runs/6648300991/job/18065354503?pr=112103#step:13:7
* Windows test job https://github.com/pytorch/pytorch/actions/runs/6648305319/job/18065599500?pr=112103#step:12:7
* MacOS x86 test job https://github.com/pytorch/pytorch/actions/runs/6648306305/job/18066312801#step:10:8
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112103
Approved by: https://github.com/clee2000
2023-10-26 16:42:46 +00:00
485cc0faae Revert "[inductor] benchmark fusion (#108193)"
This reverts commit ec0cdcdf6a816eadb4d868284eea86732f50da2e.

Reverted https://github.com/pytorch/pytorch/pull/108193 on behalf of https://github.com/ZainRizvi due to This test is breaking trunk. In the future please make sure to add the ciflow/trunk label before force merging any PR to ensure your code doesn't break those tests ([comment](https://github.com/pytorch/pytorch/pull/108193#issuecomment-1781473282))
2023-10-26 16:41:20 +00:00
7da713bbaf Convert evaluate_expr GuardOnDataDependentSymNode into graph break (#111919)
Extracted this failure from
https://github.com/pytorch/pytorch/pull/110155

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111919
Approved by: https://github.com/lezcano
2023-10-26 16:28:00 +00:00
036abd43b3 [dynamo] Preserve node names in export (#111947)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111947
Approved by: https://github.com/ydwu4, https://github.com/zou3519
2023-10-26 16:11:35 +00:00
b126adcdee [aotinductor] Pass TorchIR to AOTInductor (#110020)
Updates `_export.aot_compile` to pass a torch IR graph to inductor, allowing inductor to now run the pre_grad_passes, and reuse more of inductor's code.
Also updates the API to return only the `so_path`, rather than also returning the exported program. The pytree call spec is now serialized and placed inside the generated model code. When calling the model, because there is no C++ pytree implementation linked yet, we can access the call specs through `get_call_spec()` and call pytree flatten/unflatten in Python.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110020
Approved by: https://github.com/desertfire
2023-10-26 15:54:31 +00:00
ed2cc4dd59 TST: make torch_np added tests dynamo traceable (#112149)
Follow-up to https://github.com/pytorch/pytorch/pull/112146, https://github.com/pytorch/pytorch/pull/112141 and https://github.com/pytorch/pytorch/pull/112147: make torch_np added tests dynamo traceable

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112149
Approved by: https://github.com/lezcano
2023-10-26 15:36:36 +00:00
42e4c648a2 New @decorateIf decorator for param-specific conditional decoration (#112033)
Adds a new decorator `@decorateIf(decorator, predicate_fn)`. Examples:
```python
from torch.testing._internal.common_utils import decorateIf
...

@decorateIf(unittest.skip, lambda params: params["x"] == 2)
@parametrize("x", range(5))
def test_foo(self, x):
    ...

@parametrize("x,y", [(1, 'foo'), (2, 'bar'), (3, 'baz')])
@decorateIf(
    unittest.expectedFailure,
    lambda params: params["x"] == 3 and params["y"] == "baz"
)
def test_bar(self, x, y):
    ...

@decorateIf(
    unittest.expectedFailure,
    lambda params: params["op"].name == "add" and params["dtype"] == torch.float16
)
@ops(op_db)
def test_op_foo(self, device, dtype, op):
    ...

@decorateIf(
    unittest.skip,
    lambda params: params["module_info"].module_cls is torch.nn.Linear and \
        params["device"] == "cpu"
)
@modules(module_db)
def test_module_foo(self, device, dtype, module_info):
    ...
```

Follow-up for per-param decoration based on https://github.com/pytorch/pytorch/issues/79161#issuecomment-1152487359
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112033
Approved by: https://github.com/clee2000, https://github.com/pmeier
2023-10-26 14:39:59 +00:00
7671be8108 [aotinductor] allow generating default args in fbcode (#112085)
Summary:
Previously, we wanted to maintain forward compatibility by skipping
default args in the serialized artifacts in fbcode. However, some of our shim
interfaces require default values to be set. Discussed with Sherlock offline,
and we decided to allow serializing default args into the C++ wrapper code
for now. We will refine this part if we see a real FC requirement.

Test Plan: ci

Differential Revision: D50638663

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112085
Approved by: https://github.com/SherlockNoMad
2023-10-26 14:17:54 +00:00
c8a5bb451e Do not import sympy within torch._prims_common (#112034)
This is the first of a few PRs that avoid importing SymPy at import time.
The pitch here is that we (almost!) do not have SymPy on our API, so
this should be feasible.

This should speed-up torch imports by a good 15% as per
https://dev-discuss.pytorch.org/t/delving-into-what-happens-when-you-import-torch/1589

In this PR we just move a few global imports into local imports.
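
A minimal sketch of the pattern, with an assumed helper name for illustration:

```python
# Minimal sketch of the pattern: the sympy import moves from module scope into
# the function body, so `import torch` no longer pays for it. The helper name
# is ours, for illustration.
def is_sympy_expr(x) -> bool:
    import sympy  # deferred: only imported when this code path actually runs
    return isinstance(x, sympy.Basic)
```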
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112034
Approved by: https://github.com/ezyang
2023-10-26 12:53:25 +00:00
d6724a51f9 [dynamo] md5 hash non compile_ignored configs (#111298)
fixes: https://github.com/pytorch/pytorch/issues/111235

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111298
Approved by: https://github.com/ezyang
ghstack dependencies: #111303
2023-10-26 10:59:10 +00:00
1c89ea7f72 Add Half support for softmax and log_softmax on CPU (#103315)
Add Half support for softmax and log_softmax on CPU.
Note: This introduces a correctness issue with MPS https://github.com/pytorch/pytorch/issues/111416 and https://github.com/pytorch/pytorch/issues/111479.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103315
Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki, https://github.com/malfet
2023-10-26 08:38:54 +00:00
fbff99ffea Add regex matching to Inductor all2all collective unit tests (#112077)
Fixes #111776

Support check_regex in FileCheck() by adding `find_regex` to `struct TORCH_API StringCordView`.
The callsite accepts std::regex RE syntax.

However, I haven't figured out submatch ID yet.
For example, "buf5[0], buf6_inputs[0]" is still considered a match.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112077
Approved by: https://github.com/yf225
2023-10-26 08:29:30 +00:00
395614c1a4 keep sync bn training flag same with converted bn's training flag (#111998)
When converting BatchNorm to SyncBatchNorm, we need to keep the SyncBatchNorm's training flag consistent with the original BatchNorm's flag. The motivation: the given model may have the training flag set on some BatchNorm layers but not on others, and after converting to SyncBatchNorm we hope not to change this behavior. A short sketch follows.
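
A minimal sketch of the intended behavior after this change:

```python
# Per-layer training flags should survive the conversion.
import torch

model = torch.nn.Sequential(torch.nn.BatchNorm1d(4), torch.nn.BatchNorm1d(4))
model[0].eval()  # only the first BN is in eval mode

synced = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
assert synced[0].training is False
assert synced[1].training is True
```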

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111998
Approved by: https://github.com/mikaylagawarecki
2023-10-26 08:18:08 +00:00
e38347f490 Readded device_assert skipping in index and index_put (and also added (#112093)
copy to noop pass)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112093
Approved by: https://github.com/oulgen, https://github.com/lezcano
ghstack dependencies: #111990
2023-10-26 07:54:44 +00:00
d090c18fca [dynamo] annotate config with @compile_ignored (#111303)
Fixes: #111221

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111303
Approved by: https://github.com/ezyang
2023-10-26 05:41:29 +00:00
89bd17552d [dynamo] Enable typechecking for funcname_cache.py (#112031)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112031
Approved by: https://github.com/Skylion007
ghstack dependencies: #111894, #111992
2023-10-26 04:54:16 +00:00
413baa1b25 [dynamo] Enable typechecking for codegen.py (#111992)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111992
Approved by: https://github.com/Skylion007, https://github.com/eellison
ghstack dependencies: #111894
2023-10-26 04:54:16 +00:00
e67d2c9825 [dynamo] Enable typechecking for allowed_functions.py (#111894)
Motivation: MYPYNOFOLLOW currently typechecks almost all inductor files
and some dynamo files as well. However, it has `follow_imports=skip`
enabled which greatly nerfs its effectiveness. I would like to enable
import following for all the files currently checked by MYPYNOFOLLOW.
But that leads to a lot of new errors in other files.

I can exclude errors from files in other directories, but it is somewhat
difficult to do that for dynamo and inductor files themselves. Thus I am
making sure all the dynamo files typecheck first.

Note on changes: I could not type the return value of
`make_function_id_set` since it was returning a class defined in the
function body. Thus I deleted `make_function_id_set` and replaced it
with a direct construction of the `FunctionIdSet` instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111894
Approved by: https://github.com/Skylion007, https://github.com/eellison
2023-10-26 04:54:16 +00:00
b61efe1c2b Fix `torch.[size|stride](dim=None)` invocation (#111991)
Per documentation, one should be able to explicitly pass the dim argument as None to get the tensor size across all dimensions (or strides), but before this change it was incorrectly interpreted as a named-tensor call.

Modify `size` and `stride` signatures generated by `gen_pyi.py` to highlight that overload with `None` will return a Tuple, but one with `dim: _int` returns `int`.

Add regression test to validate the behavior, and remove the check for asserts from two named tensors tests (NamedTensors are dead, aren't they?)
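
A short sketch of the now-valid calls, assuming the documented behavior:

```python
# Passing dim=None explicitly returns the full size/stride tuples.
import torch

t = torch.randn(2, 3)
assert t.size(dim=None) == torch.Size([2, 3])  # Tuple overload
assert t.size(dim=0) == 2                      # int overload
assert t.stride(dim=None) == (3, 1)            # same pattern for stride
```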

Fixes https://github.com/pytorch/pytorch/issues/111944
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111991
Approved by: https://github.com/zou3519
2023-10-26 04:14:35 +00:00
ec0cdcdf6a [inductor] benchmark fusion (#108193)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108193
Approved by: https://github.com/jansel
2023-10-26 04:14:22 +00:00
edafe2ddb9 [dynamo] Be stricter about HigherOrderOperator kwargs (#111938)
kwargs need to be handled carefully in speculate subgraph. We should be clearer about the contract of what the inputs are.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111938
Approved by: https://github.com/zou3519
2023-10-26 03:51:30 +00:00
2aaa7e542c AOTAutograd: avoid intermediate_base logic when all aliased outputs came from a multi_output_view (#111411)
Partially addresses https://github.com/pytorch/pytorch/issues/111081

This fixes the majority of the slowness from https://fb.workplace.com/groups/1405155842844877/permalink/7491314274228973/. In particular, the type of example that suffers the most perf-wise in AOTAutograd looks like this:
```
@torch.compile
def f(x):
    intermediate = x.mul(2)
    outs = intermediate.unbind(0)
    return *outs

x = torch.randn(50, 50, requires_grad=True)
outs = f(x)
sum(outs).sum().backward()
```

There are 50 output tensors in the above function, that all alias each other. AOTAutograd will dutifully exercise its intermediate base [logic](https://github.com/pytorch/pytorch/blob/main/torch/_functorch/aot_autograd.py#L294), and try to regenerate the aliases outside of the compiled `autograd.Function` at runtime, to ensure that the autograd engine is aware of the aliasing.

In this case, this will result in **50 AsStridedBackward nodes in the backward**, because we will fall back to using as_strided to generate each of those 50 outputs. The current PR as is (somewhat unsafely) ensures that the backward graph consists of a single `UnbindBackward`, or a call to `aten.cat()`.

I left a long comment in the code describing the situation, but the core idea is that **autograd does not let you mutate grad_fn of tensor aliases that come from multi-output views**. So if we have `k` outputs that alias each other, but `k-1` of them are aliases that came from multi-output views, then in eager mode, it would not be possible to mutate one of the aliases in a way that would change the grad_fn of any of the other aliases, without causing an error in the backward. So the claim I'm making is that if we hide this aliasing from the autograd engine, then it is impossible for the user to perform any mutations that would cause autograd metadata to diverge between torch.compile and eager in a way that isn't an error in eager mode.

To be fair, I think that taking the approach outlined in https://docs.google.com/document/d/1DlfFq8TKbuAn2zyJxLfoW-X1qkkm5PLdHFtySo03QAk/edit would also help us avoid the as_strided calls in this particularly egregious case, **and** keep the autograd error messages. This relies on both pre-dispatch functionalization being fully hardened **and** adding some pretty invasive changes to AOTAutograd though, and is probably at least several months out.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111411
Approved by: https://github.com/ezyang
2023-10-26 02:54:50 +00:00
28c0b07d19 [ROCm] remove HCC references (#111975)
- rename `__HIP_PLATFORM_HCC__` to `__HIP_PLATFORM_AMD__`
- rename `HIP_HCC_FLAGS` to `HIP_CLANG_FLAGS`
- rename `PYTORCH_HIP_HCC_LIBRARIES` to `PYTORCH_HIP_LIBRARIES`
- workaround in tools/amd_build/build_amd.py until submodules are updated

These symbols have had a long deprecation cycle and will finally be removed in ROCm 6.0.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111975
Approved by: https://github.com/ezyang, https://github.com/hongxiayang
2023-10-26 02:39:10 +00:00
f1785373c0 Add torch.utils.deterministic.fill_uninitialized_memory flag (#111377)
Part of #109802

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111377
Approved by: https://github.com/albanD
2023-10-26 02:39:06 +00:00
7a3a00bb0b [inductor] Remove redundant views (#111773)
As a follow-up to https://github.com/pytorch/pytorch/pull/110740, this patches enables removing redundant complex views to allow more operation fusing.

E.g,  given

```
@torch.compile
def foo(X, Y):
    Z = X + Y
    A = X + Y
    return A + Z
```

the generated code is:

```
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 6
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:]
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.load(in_ptr1 + (x0), xmask)
    tmp2 = tmp0 + tmp1
    tmp3 = tmp2 + tmp2
    tl.store(out_ptr0 + (x0), tmp3, xmask)
''')

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    assert_size_stride(arg0_1, (3, ), (1, ))
    assert_size_stride(arg1_1, (3, ), (1, ))
    with torch.cuda._DeviceGuard(0):
        torch.cuda.set_device(0) # no-op to ensure context
        # Source Nodes: [A], Original ATen: [aten.add]
        buf0 = aten.view.dtype(arg0_1, torch.float32)
        del arg0_1
        buf1 = buf0
        del buf0
        # Source Nodes: [A], Original ATen: [aten.add]
        buf2 = aten.view.dtype(arg1_1, torch.float32)
        del arg1_1
        buf3 = buf2
        del buf2
        buf4 = empty_strided((6, ), (1, ), device='cuda', dtype=torch.float32)
        # Source Nodes: [add_2], Original ATen: [aten.add]
        stream0 = get_cuda_stream(0)
        triton_poi_fused_add_0.run(buf1, buf3, buf4, 6, grid=grid(6), stream=stream0)
        del buf1
        del buf3
        # Source Nodes: [add_2], Original ATen: [aten.add]
        buf5 = aten.view.dtype(buf4, torch.complex64)
        del buf4
        buf6 = buf5
        del buf5
        return (buf6, )
```

whereas previously the generated code was:

```
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 6
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:]
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.load(in_ptr1 + (x0), xmask)
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x0), tmp2, xmask)

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    assert_size_stride(arg0_1, (3, ), (1, ))
    assert_size_stride(arg1_1, (3, ), (1, ))
    with torch.cuda._DeviceGuard(0):
        torch.cuda.set_device(0) # no-op to ensure context
        # Source Nodes: [A], Original ATen: [aten.add]
        buf0 = aten.view.dtype(arg0_1, torch.float32)
        buf1 = buf0
        del buf0
        # Source Nodes: [A], Original ATen: [aten.add]
        buf2 = aten.view.dtype(arg1_1, torch.float32)
        buf3 = buf2
        del buf2
        buf4 = empty_strided((6, ), (1, ), device='cuda', dtype=torch.float32)
        # Source Nodes: [A], Original ATen: [aten.add]
        stream0 = get_cuda_stream(0)
        triton_poi_fused_add_0.run(buf1, buf3, buf4, 6, grid=grid(6), stream=stream0)
        del buf1
        del buf3
        # Source Nodes: [A], Original ATen: [aten.add]
        buf5 = aten.view.dtype(buf4, torch.complex64)
        buf6 = buf5
        del buf5
        # Source Nodes: [add_2], Original ATen: [aten.add]
        buf7 = aten.view.dtype(buf6, torch.float32)
        del buf6
        buf8 = buf7
        del buf7
        # Source Nodes: [Z], Original ATen: [aten.add]
        buf9 = aten.view.dtype(arg0_1, torch.float32)
        del arg0_1
        buf10 = buf9
        del buf9
        # Source Nodes: [Z], Original ATen: [aten.add]
        buf11 = aten.view.dtype(arg1_1, torch.float32)
        del arg1_1
        buf12 = buf11
        del buf11
        buf13 = buf4; del buf4  # reuse
        # Source Nodes: [Z], Original ATen: [aten.add]
        triton_poi_fused_add_0.run(buf10, buf12, buf13, 6, grid=grid(6), stream=stream0)
        del buf10
        del buf12
        # Source Nodes: [Z], Original ATen: [aten.add]
        buf14 = aten.view.dtype(buf13, torch.complex64)
        buf15 = buf14
        del buf14
        # Source Nodes: [add_2], Original ATen: [aten.add]
        buf16 = aten.view.dtype(buf15, torch.float32)
        del buf15
        buf17 = buf16
        del buf16
        buf18 = buf13; del buf13  # reuse
        # Source Nodes: [add_2], Original ATen: [aten.add]
        triton_poi_fused_add_0.run(buf8, buf17, buf18, 6, grid=grid(6), stream=stream0)
        del buf17
        del buf8
        # Source Nodes: [add_2], Original ATen: [aten.add]
        buf19 = aten.view.dtype(buf18, torch.complex64)
        del buf18
        buf20 = buf19
        del buf19
        return (buf20, )
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111773
Approved by: https://github.com/jansel
2023-10-26 02:37:17 +00:00
64d75f72d4 [fx] Add a faster method for inserting positional argument. (#111974)
Summary:
Traditionally, when a user wants to update the arguments of an FX node, the only way is to call the setter of the .args property on the node. This may be problematic when we insert a lot of arguments: because of the semantics of the setter method, it has worst-case O(n) complexity.

Adding a new insert_arg provides two benefits:
1. The operation is guaranteed to be O(1) cost.
2. The user can express the intention more directly, instead of writing code like `node.args = (arg,) + node.args` (see the sketch below).
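
A minimal sketch of the two idioms (the helper name is ours; `Node.insert_arg` is the API this diff adds):

```python
# `Node.insert_arg` (this diff) is O(1); assigning through the `.args` setter
# rebuilds the whole argument tuple, worst-case O(n).
from torch.fx import Node

def prepend_arg(node: Node, arg) -> None:
    # old idiom from the summary above:
    #   node.args = (arg,) + node.args
    node.insert_arg(0, arg)
```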

Test Plan: caffe2/test:fx -- -r test_insert_arg

Reviewed By: suo

Differential Revision: D50574435

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111974
Approved by: https://github.com/angelayi
2023-10-26 02:30:42 +00:00
b29c658265 Cleanup error reporting for ProcessGroupNCCL (#111979)
Continuing some of the work from https://github.com/pytorch/pytorch/pull/108191, I realized the majority of errors raised from ProcessGroupNCCL were just generic RuntimeErrors.

In this PR, I've added appropriate error types to all the exceptions raised from ProcessGroupNCCL.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111979
Approved by: https://github.com/fduwjj
2023-10-26 01:39:54 +00:00
74adb4cccc Updated flop counter to accept pytree inputs/outputs (#111990)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111990
Approved by: https://github.com/ezyang
2023-10-26 01:25:27 +00:00
d641450180 Revert "[cpu][inductor] improve cpu vec implementations of log (#111898)"
This reverts commit b5703203647644176220676af0e8e5f23de8d45a.

Reverted https://github.com/pytorch/pytorch/pull/111898 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/111898#issuecomment-1780263780))
2023-10-26 01:12:19 +00:00
3831cf4891 TST: make test_multiarray traceable by Dynamo (#112084)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112084
Approved by: https://github.com/lezcano
ghstack dependencies: #112081, #112082, #112083
2023-10-26 01:03:45 +00:00
a4e4f41cce MAINT: graph break on numpy.__version__ (#112083)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112083
Approved by: https://github.com/lezcano
ghstack dependencies: #112081, #112082
2023-10-26 01:03:45 +00:00
7352c88f58 TST: add x{pass,fail}IfTorchDynamo (#112082)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112082
Approved by: https://github.com/lezcano
ghstack dependencies: #112081
2023-10-26 01:03:45 +00:00
5b7caf31c1 CI: remove numpy_torch_interop from CI (#112081)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112081
Approved by: https://github.com/lezcano
2023-10-26 01:03:45 +00:00
d8e19bb03a Revert "[2D] Enable 2D optimizer set_state_dict() (#111778)"
This reverts commit 52eec50d31976519a5b1b75993d4945927bcc92f.

Reverted https://github.com/pytorch/pytorch/pull/111778 on behalf of https://github.com/huydhn due to Sorry for reverting you change, but it is failing multigpu test in trunk 52eec50d31 ([comment](https://github.com/pytorch/pytorch/pull/111778#issuecomment-1780227820))
2023-10-26 00:18:30 +00:00
0ed461ae4c [dynamo] Ensure Dynamo uses this graph's fakes for Tensor example_values (#111954)
Fixes https://github.com/pytorch/pytorch/issues/111869, Fixes (detailed list of cases handled): https://github.com/pytorch/pytorch/pull/111913#discussion_r1370267313, fully fixes: https://github.com/pytorch/pytorch/issues/111873

Adds sanity checks ensuring that Dynamo uses this graph's fakes for Tensor `example_values`

Handles the main (and only?) entrypoints for new `FakeTensor`s in a Dynamo graph:
- `wrap_fx_proxy_cls`
- `VariableTracker.wrap_tensor`

Ensures that `get_fake_value` returns a fake except when we know we are going to properly wrap non-fakes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111954
Approved by: https://github.com/ezyang
2023-10-25 23:54:18 +00:00
17b732eb04 increase CPU memory requirement for test_nll_loss_large (#110963)
Running `python test_nn.py -v -k test_nll_loss_large_tensor` on a machine with small host RAM availability (e.g. ~50GB) fails with a `SIGKILL` even though the currently specified memory requirements for CPU (and GPU) are set to 48GB and are thus met.

Profiling the peak memory usage via:
```
\time -v python test_nn.py -v -k test_nll_loss_large_tensor
```
and adding `print(torch.cuda.memory_summary())` at the end of the test shows a higher host RAM usage of >100GB and a device memory usage of ~32GB.
```
	Command being timed: "python test_nn.py -v -k test_nll_loss_large_tensor"
	User time (seconds): 81.66
	System time (seconds): 229.02
	Percent of CPU this job got: 671%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:46.30
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 118150096
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 90280839
	Voluntary context switches: 1669
	Involuntary context switches: 1214548
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
```
```
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |  32769 MiB |  32769 MiB |  81923 MiB |  49154 MiB |
|       from large pool |  32768 MiB |  32768 MiB |  81921 MiB |  49152 MiB |
|       from small pool |      0 MiB |      0 MiB |      1 MiB |      1 MiB |
|---------------------------------------------------------------------------|
| Active memory         |  32769 MiB |  32769 MiB |  81923 MiB |  49154 MiB |
|       from large pool |  32768 MiB |  32768 MiB |  81921 MiB |  49152 MiB |
|       from small pool |      0 MiB |      0 MiB |      1 MiB |      1 MiB |
|---------------------------------------------------------------------------|
| Requested memory      |  32769 MiB |  32769 MiB |  81923 MiB |  49154 MiB |
|       from large pool |  32768 MiB |  32768 MiB |  81921 MiB |  49152 MiB |
|       from small pool |      0 MiB |      0 MiB |      1 MiB |      1 MiB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |  32774 MiB |  32774 MiB |  81938 MiB |  49164 MiB |
|       from large pool |  32772 MiB |  32772 MiB |  81930 MiB |  49158 MiB |
|       from small pool |      2 MiB |      2 MiB |      8 MiB |      6 MiB |
|---------------------------------------------------------------------------|
...
```

We haven't seen this issue before as the majority of our runners have sufficient host RAM and I just ran into it by chance.

CC @atalman @malfet @crcrpar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110963
Approved by: https://github.com/mikaylagawarecki, https://github.com/eqy, https://github.com/malfet
2023-10-25 23:45:47 +00:00
8516b4d7da Automated submodule update: FBGEMM (#106168)
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 3579b4d627

Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106168
Approved by: https://github.com/huydhn
2023-10-25 23:32:30 +00:00
2971bdd6fc Ignore Dims of value 1 in Require_Stride_order (#111976)
Ignore dims of value 1 in require_stride_order since they don't affect layout. Previously, unsqueezed dims would always cause a copy because the stride of 0 would throw off the sorted stride order. This was causing perf problems with require_stride_order in the next commit in this stack.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111976
Approved by: https://github.com/Chillee
2023-10-25 23:14:25 +00:00
4851c973ae Update FlashAttentionV2 kernel to 02ac572 (#111886)
# Summary
We were restricted from updating to the newest version of FlashAttention because of the changes to is_causal described here: https://github.com/pytorch/pytorch/issues/108108

Prior to this PR we landed https://github.com/pytorch/pytorch/pull/111007, which enabled us to update beyond 9e5e8bc91e on FlashAttentionV2.

With this PR we have updated to commit 02ac572f3f, i.e. tag 2.3.2.

## Plans
Following this PR I plan to work more on https://github.com/pytorch/pytorch/issues/110681 in order to expose a CausalVariant attn_mask, w/ the potential for also exposing a kvcache attn_mask.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111886
Approved by: https://github.com/cpuhrsch
2023-10-25 23:07:56 +00:00
ec18ef62f4 Native c10d_functional ops (#110570)
This PR introduces a native version of c10d_functional ops. The main goal is to add collective support in AOTInductor and allow collective ops to work in multi-threaded native runtimes.

The native version also incorporated API improvements we wished to implement in Python c10d_functional:

- Removed `ranks` and `group_size` from collective op signatures which were proven to be redundant.
- Use tensor storage as opposed to `void*` to resolve in-flight work.

The native process group registration/resolution mechanism is only used for native c10d_functional in this PR. It will become the single source of truth in upcoming PRs.

The upcoming PRs will implement Inductor/AOTInductor support for c10d_functional, after which native c10d_functional will replace Python c10d_functional.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110570
Approved by: https://github.com/wanchaol
2023-10-25 22:56:06 +00:00
7fe51e3e9b Add cudagraph_mark_step_begin in torch.compiler, reference in error message (#111722)
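A hedged usage sketch (requires a CUDA device; the trivial compiled function is only for illustration):

```python
# Marks the boundary between iterations so cudagraph-backed compiled
# functions know a new step has begun.
import torch

@torch.compile(mode="reduce-overhead")
def step(x):
    return x * 2

for _ in range(3):
    torch.compiler.cudagraph_mark_step_begin()
    out = step(torch.randn(8, device="cuda"))
```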
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111722
Approved by: https://github.com/ezyang, https://github.com/msaroufim
2023-10-25 21:53:21 +00:00
f2a0bef35a [export] Upstream support of (tensor, tensor list) in op returns. (#111857)
Summary:
Upstreaming from internal to oss.
Diff: D49710320

Test Plan: buck2 build mode/opt sigmoid/inference/test_gpu:package_gen

Differential Revision: D50577490

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111857
Approved by: https://github.com/SherlockNoMad
2023-10-25 21:38:12 +00:00
e5049648be Add a "pt2 compliant" tag; add config to graph break on non-pt2_compliant ops (#111933)
This PR:
- adds the pt2 compliant tag. This tag specifies that the operator works
  with the PT2 compilation APIs. A custom op author should test their
  ops with opcheck if they choose to add this tag.
- adds a config for Dynamo to allow only pt2 compliant ops into the
  graph and graph break on all other OpOverload/OpOverloadPacket (see the sketch below).
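
A hedged sketch of flipping the new config; the exact flag name is assumed from this description and may differ:

```python
# Assumed flag name, per the PR description above.
import torch._dynamo.config as dynamo_config

dynamo_config.only_allow_pt2_compliant_ops = True  # graph break on untagged ops
```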

Bikeshedding help wanted on the name of the tag. It should be easily
grep-able so we can set up rules for it.

Test Plan:
- new tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111933
Approved by: https://github.com/ezyang
ghstack dependencies: #111912, #111915, #111948
2023-10-25 21:20:59 +00:00
6365992f92 [opcheck] Add way to initialize blank failures dict (#111948)
Summary:

Fixes #111926. The workflow is:
- create a blank file with the correct name
- run a test with PYTORCH_OPCHECK_ACCEPT=1

Test Plan:
- tested locally

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111948
Approved by: https://github.com/ezyang
ghstack dependencies: #111912, #111915
2023-10-25 21:20:59 +00:00
3219b728b6 [torch.library] Clarify torch.library.define's schema (#111915)
Unlike the previous torch.library.define, this schema doesn't take a
name (the name is a part of the qualname). We separated out the qualname
from the schema in the new APIs so that they're all consistent with each
other (they all accept the qualname separately).

Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111915
Approved by: https://github.com/suo, https://github.com/ezyang
ghstack dependencies: #111912
2023-10-25 21:20:54 +00:00
2d04be9a00 [torch.library] Add mechanism to add tags during define (#111912)
We extend torch.library.Library.define and torch.library.define
with a tags argument.
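
A hedged usage sketch (the op name and schema below are made-up examples; the exact keyword shape of the extended API may differ):

```python
# Defines a custom op and attaches a tag at definition time.
import torch

torch.library.define(
    "mylib::my_sin",
    "(Tensor x) -> Tensor",
    tags=(torch.Tag.pt2_compliant_tag,),
)
```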

Test Plan:
- new test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111912
Approved by: https://github.com/ezyang
2023-10-25 21:20:48 +00:00
ed15fa7cc2 [Kineto][NCCL][3/n] Get the NCCL communication info from PARAM_COMMS_INFO (#111846)
This diff enables the functionality to get the NCCL communication metadata from `c10::DebugInfoKind::PARAM_COMMS_INFO` available in `ThreadLocalDebugInfo`.

To keep the overhead lightweight and avoid comparing the function name on each op, we add the method `bool isNcclMeta()`, which is decided during initialization.

Differential Revision: [D50439211](https://our.internmc.facebook.com/intern/diff/D50439211/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111846
Approved by: https://github.com/aaronenyeshi
ghstack dependencies: #111842, #111843
2023-10-25 20:35:06 +00:00
1623cc5815 [easy] Make test_mandelbrot_numpy deterministic (#112042)
It fails for me locally, and I'm not the only one:
https://dev-discuss.pytorch.org/t/main-failing-unit-test-dynamicshapesmisctests/1607

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112042
Approved by: https://github.com/peterbell10
2023-10-25 20:29:50 +00:00
b33220063d [TD] Historical edited files and profiling heuristics (#111510)
Adds files for the heuristics and run them in trial mode.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111510
Approved by: https://github.com/ZainRizvi
2023-10-25 19:54:17 +00:00
36b3e1789a Docker release build don't include build suffix in the release (#112046)
This build is used in releases as far as I know. For releases we don't need the suffix.

Test in Release:
```
python3 .github/scripts/generate_pytorch_version.py
2.1.1+cpu
python3 .github/scripts/generate_pytorch_version.py --no-build-suffix
2.1.1
```

Test with nightly:
```
python3 .github/scripts/generate_pytorch_version.py --no-build-suffix
2.2.0.dev20231025
```

With suffix:
```
python3 .github/scripts/generate_pytorch_version.py
2.2.0.dev20231025+cpu
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112046
Approved by: https://github.com/huydhn
2023-10-25 19:40:01 +00:00
b54ab57522 Document torch.from_file and fix UntypedStorage.from_file docs (#111688)
Fixes https://github.com/pytorch/pytorch/issues/37439

Also threads through filename so it is accessible via `t.storage().filename`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111688
Approved by: https://github.com/albanD
2023-10-25 19:28:11 +00:00
f3b42ab5b9 feat(dynamo): remove inconsistent tracing histories by acknowledging possibility of inconsistent side-effects (#110804)
Fixes https://github.com/pytorch/pytorch/issues/110765

CC @voznesenskym  @yanboliang @Fidget-Spinner @anijain2305 @soulitzer @ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110804
Approved by: https://github.com/ezyang, https://github.com/voznesenskym
2023-10-25 19:27:11 +00:00
cb4e62a498 Fix broken lint on trunk (#112051)
Forward fix lint error introduced by https://github.com/pytorch/pytorch/pull/111146/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112051
Approved by: https://github.com/seemethere, https://github.com/kit1980, https://github.com/malfet
2023-10-25 19:18:54 +00:00
b365acba28 [ONNX] A better way to safe guard 2GB model serialization (#111984)
Summary
- faster than the previous try-catch.
- more stable than the previous try-catch. In some circumstances, serializing models > 2GB into a single protobuf file ends up with a corrupted file without raising an exception.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111984
Approved by: https://github.com/justinchuby
2023-10-25 19:18:37 +00:00
6b7b90462f [aotinductor] Turn clang warning ignored-optimization-argument into error (#112008)
Now we compile the generated wrapper C++ code with clang in fbcode.
When the Model's run_impl function is too large, clang will issue
a warning like:

  Function foo is too big to optimize [-Wignored-optimization-argument]

and compile the code without any optimization.

I think we may want to be more proactive in such cases. If the
generated C++ code is too complex or too large to be optimized,
we would like to be notified loudly with errors, so that we
can figure out ways to address the issue.

Later, if we feel that turning this warning into an error is too
aggressive, we can add a config to disable it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112008
Approved by: https://github.com/desertfire, https://github.com/htyu
2023-10-25 19:14:27 +00:00
7e654c8f88 Revert "WIP / TST: allow testing torch._numpy under Dynamo (#110401)"
This reverts commit 5ed4a423ded14138f1a724eff15ccd14648f6c49.

Reverted https://github.com/pytorch/pytorch/pull/110401 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing dynamo job in trunk 5ed4a423de ([comment](https://github.com/pytorch/pytorch/pull/110401#issuecomment-1779811943))
2023-10-25 18:21:16 +00:00
e9804aaacc Fix unit tests and add logging for Inductor intra-graph reordering (#111981)
1. Fix code to make unit tests pass (incl. collect_env issue called out by @int3  in https://github.com/pytorch/pytorch/pull/108091#discussion_r1362901686).
2. Add logging for Inductor intra-graph reordering passes (`TORCH_LOGS="overlap"`), for easier debugging. Example log:
```
[rank0]:[2023-10-24 16:28:26,446] [0/0] torch._inductor.comms.__overlap: [DEBUG] ==== Visualize overlap before reordering pass <function reorder_compute_for_overlap at 0x7fa68c5568e0> ====
[rank0]:[2023-10-24 16:28:26,446] [0/0] torch._inductor.comms.__overlap: [DEBUG] ComputedBuffer (size=[4, 4], stride=[4, 1]) (buf0)
[rank0]:[2023-10-24 16:28:26,447] [0/0] torch._inductor.comms.__overlap: [DEBUG] ExternKernelOut (extern_kernels.mm) (size=[4, 4], stride=[4, 1]) (buf1)
[rank0]:[2023-10-24 16:28:26,447] [0/0] torch._inductor.comms.__overlap: [DEBUG] InPlaceHint (size=[4, 4], stride=[4, 1]) (buf2)
[rank0]:[2023-10-24 16:28:26,447] [0/0] torch._inductor.comms.__overlap: [DEBUG] AllReduce (size=[4, 4], stride=[4, 1]) (buf3)
[rank0]:[2023-10-24 16:28:26,447] [0/0] torch._inductor.comms.__overlap: [DEBUG] Wait (size=[4, 4], stride=[4, 1]) (buf4)
[rank0]:[2023-10-24 16:28:26,447] [0/0] torch._inductor.comms.__overlap: [DEBUG] ComputedBuffer (size=[4, 4], stride=[4, 1]) (buf5)
[rank0]:[2023-10-24 16:28:26,447] [0/0] torch._inductor.comms.__overlap: [DEBUG] InPlaceHint (size=[4, 4], stride=[4, 1]) (buf6)
[rank0]:[2023-10-24 16:28:26,447] [0/0] torch._inductor.comms.__overlap: [DEBUG] AllReduce (size=[4, 4], stride=[4, 1]) (buf7)
[rank0]:[2023-10-24 16:28:26,447] [0/0] torch._inductor.comms.__overlap: [DEBUG] Wait (size=[4, 4], stride=[4, 1]) (buf8)
[rank0]:[2023-10-24 16:28:26,447] [0/0] torch._inductor.comms.__overlap: [DEBUG] ExternKernelOut (extern_kernels.mm) (size=[4, 4], stride=[4, 1]) (buf9)
[rank0]:[2023-10-24 16:28:26,447] [0/0] torch._inductor.comms.__overlap: [DEBUG] ComputedBuffer (size=[4, 4], stride=[4, 1]) (buf10)
[rank0]:[2023-10-24 16:28:26,447] [0/0] torch._inductor.comms.__overlap: [DEBUG] ExternKernelOut (extern_kernels.mm) (size=[4, 4], stride=[4, 1]) (buf11)
[rank0]:[2023-10-24 16:28:26,447] [0/0] torch._inductor.comms.__overlap: [DEBUG] Est. runtime (ms): 0.000228

[rank0]:[2023-10-24 16:28:26,448] [0/0] torch._inductor.comms.__overlap: [DEBUG] ==== Visualize overlap after reordering pass <function reorder_compute_for_overlap at 0x7fa68c5568e0> ====
[rank0]:[2023-10-24 16:28:26,448] [0/0] torch._inductor.comms.__overlap: [DEBUG] InPlaceHint (size=[4, 4], stride=[4, 1]) (buf2)
[rank0]:[2023-10-24 16:28:26,448] [0/0] torch._inductor.comms.__overlap: [DEBUG] AllReduce (size=[4, 4], stride=[4, 1]) (buf3)
[rank0]:[2023-10-24 16:28:26,448] [0/0] torch._inductor.comms.__overlap: [DEBUG] | ComputedBuffer (size=[4, 4], stride=[4, 1]) (buf0)
[rank0]:[2023-10-24 16:28:26,448] [0/0] torch._inductor.comms.__overlap: [DEBUG] | ExternKernelOut (extern_kernels.mm) (size=[4, 4], stride=[4, 1]) (buf1)
[rank0]:[2023-10-24 16:28:26,448] [0/0] torch._inductor.comms.__overlap: [DEBUG] | ExternKernelOut (extern_kernels.mm) (size=[4, 4], stride=[4, 1]) (buf9)
[rank0]:[2023-10-24 16:28:26,448] [0/0] torch._inductor.comms.__overlap: [DEBUG] Wait (size=[4, 4], stride=[4, 1]) (buf4)
[rank0]:[2023-10-24 16:28:26,448] [0/0] torch._inductor.comms.__overlap: [DEBUG] ComputedBuffer (size=[4, 4], stride=[4, 1]) (buf5)
[rank0]:[2023-10-24 16:28:26,448] [0/0] torch._inductor.comms.__overlap: [DEBUG] InPlaceHint (size=[4, 4], stride=[4, 1]) (buf6)
[rank0]:[2023-10-24 16:28:26,448] [0/0] torch._inductor.comms.__overlap: [DEBUG] AllReduce (size=[4, 4], stride=[4, 1]) (buf7)
[rank0]:[2023-10-24 16:28:26,448] [0/0] torch._inductor.comms.__overlap: [DEBUG] Wait (size=[4, 4], stride=[4, 1]) (buf8)
[rank0]:[2023-10-24 16:28:26,448] [0/0] torch._inductor.comms.__overlap: [DEBUG] ComputedBuffer (size=[4, 4], stride=[4, 1]) (buf10)
[rank0]:[2023-10-24 16:28:26,448] [0/0] torch._inductor.comms.__overlap: [DEBUG] ExternKernelOut (extern_kernels.mm) (size=[4, 4], stride=[4, 1]) (buf11)
[rank0]:[2023-10-24 16:28:26,448] [0/0] torch._inductor.comms.__overlap: [DEBUG] Est. runtime (ms): 0.000217
```
The `| SomeComputeOp` means the compute op is overlapped with the comm op above.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111981
Approved by: https://github.com/wanchaol
2023-10-25 18:19:43 +00:00
9d4dbebc34 Add support to ExportedProgram as input to torch.onnx.dynamo_export (#111497)
Fixes #109889

This PR adds `torch.export.export` as another `FXGraphExtractor` implementation. `torch.onnx.dynamo_export` automatically uses this new FX tracer when a `torch.export.ExportedProgram` is specified as `model`

The implementation is backward compatible; non-`ExportedProgram` models are handled exactly the same way as before.
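
A hedged end-to-end sketch of the new input path (the tiny module and input are placeholders):

```python
# Export first, then hand the ExportedProgram to the ONNX dynamo exporter.
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x.relu()

example_input = torch.randn(2, 3)
ep = torch.export.export(M(), (example_input,))
# dynamo_export now accepts the ExportedProgram directly as `model`
onnx_program = torch.onnx.dynamo_export(ep, example_input)
```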
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111497
Approved by: https://github.com/BowenBao
2023-10-25 18:11:19 +00:00
07ccaabee7 Make profiler function will be ignored warn only once (#111921)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111921
Approved by: https://github.com/mlazos, https://github.com/oulgen
2023-10-25 17:45:31 +00:00
2b952834c7 [pytorch][PR] [Inductor][FX passes] Pre grad batch relu fusion (#111146)
Summary: We detect independent relu operators and fuse them in the pre-grad pass.

Test Plan:
### unit test
```
buck2 test mode/dev-nosan //caffe2/test/inductor:group_batch_fusion
```
Test UI: https://www.internalfb.com/intern/testinfra/testrun/16888498608558485

### Inline cvr
f479655232
```
buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode split_batch_group
```
before vs after transformation
https://www.internalfb.com/intern/diffing/?paste_number=851907099

```
buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode split_batch_group -c
```

P852036786

Differential Revision: D50207610

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111146
Approved by: https://github.com/yanboliang
2023-10-25 17:37:39 +00:00
721b1a6683 s390x vectorization: implement atanh for complex vectorized data (#111653)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111653
Approved by: https://github.com/ezyang
2023-10-25 17:36:34 +00:00
49489d478b Update onnx 1.15.0rc2 submodule (#111964)
Update ONNX submodule to the latest RC

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111964
Approved by: https://github.com/thiagocrepaldi
2023-10-25 16:41:45 +00:00
5ce8002d24 Revert "Remove deprecated fbgemm operators (#104535)"
This reverts commit 57c7aa12dbf71617bd21fe7e076df8e823b5b7bb.

Reverted https://github.com/pytorch/pytorch/pull/104535 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/104535#issuecomment-1779650412))
2023-10-25 16:34:16 +00:00
5846705e36 Trigger specialization when you call size()/stride() from C++ (#111935)
This should be the last of the "it used to work with static shapes but
it doesn't work with dynamic shapes" hard errors.  Now we will just
specialize if you hit it from C++.

The strategy here is a bit clever.  We shunt the size() call to Python
binding if an error would have occurred.  Importantly, we already have
logic to make sure the newly allocated ints stay live for the duration
of the ArrayRef access.

storage_offset is intentionally omitted because there are some problems
with it.  I will fix them next.

This should let us get rid of the aotautograd_static test configuration.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111935
Approved by: https://github.com/zou3519
2023-10-25 16:17:55 +00:00
5ed4a423de WIP / TST: allow testing torch._numpy under Dynamo (#110401)
Use conditional imports: when running under dynamo, import the original NumPy, not torch._numpy. That is what we want to trace, not our implementation.

With this, the test suite passes with and without `PYTORCH_TEST_WITH_DYNAMO=1` (modulo a couple of test modules which are not meant to be compiled, e.g. `test_nep50_examples`). There are two new decorators, `x{fail,pass}ifTorchDynamo`, the `xpass` in most cases indicates a graph break and a fallback to eager for things we do not implement.
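
A sketch of the conditional-import pattern described above (the exact gating used by the test suite may differ; the env-var check here mirrors `PYTORCH_TEST_WITH_DYNAMO=1`):

```python
import os

if os.getenv("PYTORCH_TEST_WITH_DYNAMO") == "1":
    import numpy as np         # under dynamo, trace the real NumPy
else:
    import torch._numpy as np  # otherwise exercise our implementation

x = np.arange(6).reshape(2, 3)
print(np.sum(x, axis=0))
```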

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110401
Approved by: https://github.com/lezcano
2023-10-25 16:02:16 +00:00
6fd3659391 Make require_stride_order peek into AliasedLayout (#111681)
Summary:

`require_stride_order` doesn't know how to handle storage with `AliasedLayout`. It always resorts to a copy even when the view refers to a storage with `FixedLayout`. This causes an unnecessary allocation + copy for collective outputs. Peeking into `AliasedLayout` in `require_stride_order` seems to be the proper way to address the issue.

Original program:
```python
import tempfile

import torch
import torch.distributed as dist
from torch.distributed._functional_collectives import *  # noqa
from torch._inductor.utils import run_and_get_triton_code

def func(arg: torch.Tensor) -> torch.Tensor:
    buf0 = arg + 42
    out0 = torch.ops.c10d_functional.all_reduce(buf0, "avg", "default", [0], 1)
    out0 = torch.ops.c10d_functional.wait_tensor(out0)
    return out0

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmpf:
        dist.init_process_group(
            backend="nccl", init_method=f"file://{tmpf.name}", rank=0, world_size=1
        )
        device = torch.device("cuda:0")

        compiled = torch.compile(func)
        print(run_and_get_triton_code(compiled, torch.rand(4, 4, device=device)))

        torch.cuda.synchronize()
        dist.destroy_process_group()
```

Before:
```python
def call(args):
    arg0_1, = args
    args.clear()
    assert_size_stride(arg0_1, (4, 4), (4, 1))
    with torch.cuda._DeviceGuard(0):
        torch.cuda.set_device(0) # no-op to ensure context
        buf0 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.float32)
        # Source Nodes: [buf0], Original ATen: [aten.add]
        stream0 = get_cuda_stream(0)
        triton_poi_fused_add_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0)
        del arg0_1
        buf1 = buf0; del buf0  # reuse
        buf2_pg = c10d._find_or_create_pg_by_ranks_and_tag('default', [0], 1)
        buf2 = buf1
        buf2_work = dist.all_reduce(buf2, async_op=True, group=buf2_pg, op=fun_col_impl._str_to_reduce_op('avg'))
        fun_col_impl._register_tensor_work(buf2, buf2_work)
        buf1 = _wait_tensor(buf1)
        buf3 = buf1
        buf4 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.float32)
        # Source Nodes: [out0_1], Original ATen: [c10d_functional.wait_tensor]
        triton_poi_fused_wait_tensor_1.run(buf3, buf4, 16, grid=grid(16), stream=stream0)
        del buf1
        del buf3
        return (buf4, )
```

After:
```python
def call(args):
    arg0_1, = args
    args.clear()
    assert_size_stride(arg0_1, (4, 4), (4, 1))
    with torch.cuda._DeviceGuard(0):
        torch.cuda.set_device(0) # no-op to ensure context
        buf0 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.float32)
        # Source Nodes: [buf0], Original ATen: [aten.add]
        stream0 = get_cuda_stream(0)
        triton_poi_fused_add_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0)
        del arg0_1
        buf1 = buf0; del buf0  # reuse
        buf2_pg = c10d._find_or_create_pg_by_ranks_and_tag('default', [0], 1)
        buf2 = buf1
        buf2_work = dist.all_reduce(buf2, async_op=True, group=buf2_pg, op=fun_col_impl._str_to_reduce_op('avg'))
        fun_col_impl._register_tensor_work(buf2, buf2_work)
        buf1 = _wait_tensor(buf1)
        buf3 = buf1
        del buf3
        return (buf1, )
```


Pull Request resolved: https://github.com/pytorch/pytorch/pull/111681
Approved by: https://github.com/jansel
2023-10-25 15:44:09 +00:00
ac08b10d60 [pytorch] bfloat16 support in erfinv (#111257)
Differential Revision: D50280766
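
A quick usage check, assuming a build that includes this change (erfinv expects inputs in (-1, 1)):

```python
import torch

x = torch.linspace(-0.9, 0.9, 5, dtype=torch.bfloat16)
print(torch.erfinv(x))
print(torch.special.erfinv(x))  # same op via the special namespace
```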

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111257
Approved by: https://github.com/jianyuh
2023-10-25 15:43:48 +00:00
247f39f603 Revert "Fix inconsistency of max_split_size between DeviceStats and CUDAAllocatorConfig (#111555)"
This reverts commit 0b424ee0b7bfe09e0a438a63e8336e95eea85901.

Reverted https://github.com/pytorch/pytorch/pull/111555 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/111555#issuecomment-1779438172))
2023-10-25 14:44:18 +00:00
8253e0524c Add "device not supported" assert to inductor (#112001)
Fixes #111999

Adds an assert that provides a more informative error message

For example, when running a compiled function with mps (currently unsupported):
```
...
  File "/Users/andrew.hu/Desktop/pytorch/torch/_inductor/graph.py", line 927, in init_wrapper_code
    assert wrapper_code_gen_cls is not None, f"Device {device_type} not supported"
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
AssertionError: Device mps not supported
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112001
Approved by: https://github.com/peterbell10
2023-10-25 14:19:37 +00:00
88244cd7a9 [torchx] Do not terminate parent process if exit code from child isn't valid (#111961)
Summary:
There's no reason to terminate the parent process trying to find the name of the signal received by the child process.
Let's make sure this is handled properly, which will then ensure that the parent process can process child failures.

Test Plan: Unit tests.

Reviewed By: aaronenyeshi

Differential Revision: D50615419

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111961
Approved by: https://github.com/aaronenyeshi
2023-10-25 07:13:28 +00:00
28ebe5df7a yolov3: reduce batch size due to OOM (#111959)
yolov3 w/ cudagraphs (known to use more memory) is failing perf test due to OOM (https://hud.pytorch.org/benchmark/torchbench/inductor_with_cudagraphs?startTime=Mon,%2016%20Oct%202023%2020:19:47%20GMT&stopTime=Mon,%2023%20Oct%202023%2020:19:47%20GMT&granularity=hour&mode=training&dtype=amp&lBranch=main&lCommit=0b424ee0b7bfe09e0a438a63e8336e95eea85901&rBranch=main&rCommit=29048be41ca3aa8974795d93b9ea9fd6dee415fc)

I'm reducing the batch size from 16 to 8 to keep the same batch size for all yolov3 HUD benchmarks

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111959
Approved by: https://github.com/xuzhao9
2023-10-25 06:18:53 +00:00
5120c97f32 Revert "Add support to ExportedProgram as input to torch.onnx.dynamo_export (#111497)"
This reverts commit 4f42edfb6e5b703eec2a14b8933090646702c5a2.

Reverted https://github.com/pytorch/pytorch/pull/111497 on behalf of https://github.com/huydhn due to Sorry for reverting your change, it is failing ONNX test in trunk 4f42edfb6e, possibly a landrace ([comment](https://github.com/pytorch/pytorch/pull/111497#issuecomment-1778519212))
2023-10-25 05:07:00 +00:00
52eec50d31 [2D] Enable 2D optimizer set_state_dict() (#111778)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111778
Approved by: https://github.com/fegin
ghstack dependencies: #111774
2023-10-25 04:27:13 +00:00
d8a9b6640e [Kineto][NCCL][2/n] Add records NCCL meta to more collective functions (#111843)
This diff records NCCL metadata for more commonly used collective functions.

NOTE: the coalesced NCCL are not covered: https://fburl.com/code/ihgqqvg8 and how to support them needs further discussion.

Differential Revision: [D50439232](https://our.internmc.facebook.com/intern/diff/D50439232/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111843
Approved by: https://github.com/aaronenyeshi, https://github.com/fduwjj
ghstack dependencies: #111842
2023-10-25 03:49:09 +00:00
43d0ae4822 [Kineto][NCCL][1/n] Add the world size info in NCCL metadata (#111842)
This diff adds the world size info in NCCL metadata, as we need the information to calculate the algorithmic bandwidth and bus Bandwidth.

Differential Revision: [D50439185](https://our.internmc.facebook.com/intern/diff/D50439185/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111842
Approved by: https://github.com/aaronenyeshi, https://github.com/fduwjj
2023-10-25 03:48:55 +00:00
bf998a2c5d [quant][pt2e][be] Cleanup observer insertion logic (#111828)
Summary:
As titled: after the SharedQuantizationSpec bug fix we do some checks beforehand, which simplifies the logic when we insert observers.

Test Plan:
python test/test_quantization.py TestQuantizePT2E

CIs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111828
Approved by: https://github.com/kimishpatel
ghstack dependencies: #111827
2023-10-25 03:48:36 +00:00
8dc4887e84 [2D] Enable 2D optimizer get_state_dict() (#111774)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111774
Approved by: https://github.com/fegin
2023-10-25 03:44:14 +00:00
6625269e14 [vision hash update] update the pinned vision hash (#111982)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111982
Approved by: https://github.com/pytorchbot
2023-10-25 03:39:09 +00:00
cyy
f9cc7f6a1c Enable Wno-unused-private-field,Wunused-lambda-capture and fix CUDA warnings (#110856)
This PR enables Wno-unused-private-field,Wunused-lambda-capture  and some CUDA warnings were fixed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110856
Approved by: https://github.com/albanD, https://github.com/malfet
2023-10-25 03:39:05 +00:00
9e6c97890b Dynamo runner: add FSDP handcrafted module wrapping policy (#111505)
The default size-based auto-wrap policy may not be representative of actual usage of the models. We add support for a few handpicked models, and fall back to the size-based policy.

sample command:
`PYTHONPATH=~/benchmark/ python benchmarks/dynamo/torchbench.py -dcuda --training --backend=inductor --multiprocess --performance --only nanogpt --fsdp`

1.257x
1.256x
1.257x
1.252x
1.257x
1.262x
1.258x
1.272x

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111505
Approved by: https://github.com/H-Huang, https://github.com/xuzhao9
2023-10-25 03:05:31 +00:00
a29a844938 [Inductor] Support top level constants in user defined triton kernels (#111970)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111970
Approved by: https://github.com/jansel
ghstack dependencies: #111956
2023-10-25 02:43:51 +00:00
bb550b25c9 [Inductor] Support user defined triton kernels calling other triton kernels and activation functions (#111956)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111956
Approved by: https://github.com/jansel
2023-10-25 02:39:43 +00:00
b570320364 [cpu][inductor] improve cpu vec implementations of log (#111898)
Fixes #110611.

TorchInductor's current `log` implementations call `sleef` functions in `aten::Vec`, which show worse performance than ATen's `log` implementations that invoke `MKL` functions. The reason is that the `sleef` algorithms sacrifice performance in order to achieve higher precision. This PR changes TorchInductor's `log` implementations from the `sleef` functions with a `1.0` ULP error bound to the ones with a `3.5` ULP error bound.

**Performance**
Machine: ICX

The original perf number, perf with `Sleef_logf16_u10`:
```bash
numactl -C0 python test.py
log
eager:    368.8463559374213
compiled: 616.8672097846866
logit
eager:    565.499295014888
compiled: 1010.4096410796046
```

Perf with `Sleef_logf16_u35`:
```bash
numactl -C0 python test.py
log
eager:    364.8629770614207
compiled: 360.2141812443733
logit
eager:    562.3160391114652
compiled: 545.2622110024095
```

**Accuracy**
error_bound | tol=1e-6 | tol=1e-7
-- | -- | --
1.0 ULP | PASS | FAIL
3.5 ULP | PASS | FAIL

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111898
Approved by: https://github.com/jgong5, https://github.com/desertfire, https://github.com/jansel
2023-10-25 01:26:39 +00:00
e574a8ab55 [dynamo] Add sanity checks to ensure no double-wrapping of FakeTensors produced by the current graph (#111913)
Partially fixes: https://github.com/pytorch/pytorch/issues/111873

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111913
Approved by: https://github.com/ezyang
2023-10-25 01:18:32 +00:00
4f42edfb6e Add support to ExportedProgram as input to torch.onnx.dynamo_export (#111497)
Fixes #109889

This PR adds `torch.export.export` as another `FXGraphExtractor` implementation. `torch.onnx.dynamo_export` automatically uses this new FX tracer when a `torch.export.ExportedProgram` is specified as `model`

The implementation is backward compatible, thus non-`ExportedProgram` models are handled exactly the same way as before
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111497
Approved by: https://github.com/BowenBao
2023-10-25 00:17:43 +00:00
6e2dfb360b [quant][be] Clean up prepare code (#111827)
Summary:
att

Test Plan:
CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111827
Approved by: https://github.com/andrewor14
2023-10-25 00:14:59 +00:00
3acaf8564d [easy] use number of param bytes as the chunk size if it's not provided (#111844)
Summary: ATT

Test Plan: CI

Differential Revision: D50572228

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111844
Approved by: https://github.com/zyan0, https://github.com/houseroad
2023-10-24 23:56:33 +00:00
ad4971c0b1 Delete deepcopied model after use in benchmark to reduce memory consumption (#111868)
As title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111868
Approved by: https://github.com/msaroufim, https://github.com/thiagocrepaldi
ghstack dependencies: #111867, #111593
2023-10-24 23:44:14 +00:00
a8760f1b42 [Quantization] Add a test for QAT + PTQ selective quantization in (#111689)
xnnpack quantizer

Summary:
For some workflows you want to quantize some parts of the model via QAT
and then continue eager mode training. After training, you want to
export the whole model and perform PTQ on the rest.

Test Plan:
test added


Differential Revision: [D50510480](https://our.internmc.facebook.com/intern/diff/D50510480)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111689
Approved by: https://github.com/jerryzh168
2023-10-24 23:25:38 +00:00
192477b5ba Enable flake8-bugbear B020 lint (#110823)
Fixes part of https://github.com/pytorch/pytorch/issues/106571

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110823
Approved by: https://github.com/Skylion007
2023-10-24 22:43:47 +00:00
b600aed237 [TD] Make test class times available during CI (#111836)
Makes the test class durations uploaded by https://github.com/pytorch/test-infra/pull/4670 available during CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111836
Approved by: https://github.com/clee2000
2023-10-24 21:40:10 +00:00
1dd57082a4 [inductor] Decompose boolean min/max into all/any (#110311)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110311
Approved by: https://github.com/lezcano
ghstack dependencies: #110310
2023-10-24 21:33:53 +00:00
46e80ce58a [ATen] Support multi dim any and all reductions (#110310)
This adds a new overload to `all` and `any` with support for multiple reduction dims.
```
all.dims(Tensor self, int[1]? dim=None, bool keepdim=False) -> Tensor
any.dims(Tensor self, int[1]? dim=None, bool keepdim=False) -> Tensor
```
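
Usage sketch of the new overload, assuming a build with this change:

```python
import torch

x = torch.rand(2, 3, 4) > 0.5
print(torch.any(x, dim=(0, 2)).shape)                # torch.Size([3])
print(torch.all(x, dim=(0, 2), keepdim=True).shape)  # torch.Size([1, 3, 1])
```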
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110310
Approved by: https://github.com/lezcano, https://github.com/albanD, https://github.com/justinchuby
2023-10-24 21:33:53 +00:00
9849ef1253 Remove requires_grad_info from AOTDispatch (#110773)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110773
Approved by: https://github.com/bdhirsh
2023-10-24 21:31:34 +00:00
5344468712 Revert "[dynamo] Properly track user-defined types for type() (#110794)"
This reverts commit ad4ccf96896bdf0f098bd9192f8c5a019fddf4c6.

Reverted https://github.com/pytorch/pytorch/pull/110794 on behalf of https://github.com/ezyang due to looks like this actually fails internal tests ([comment](https://github.com/pytorch/pytorch/pull/110794#issuecomment-1778002262))
2023-10-24 20:42:26 +00:00
4839f319da Apply same 'pick_grad' on generating fp64 reference outputs (#111593)
To lower memory consumption for inference mode.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111593
Approved by: https://github.com/msaroufim, https://github.com/thiagocrepaldi
ghstack dependencies: #111867
2023-10-24 20:16:53 +00:00
ec2e0712db [ONNX] Enable onnx inlining in benchmark for >2GB models (#111867)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111867
Approved by: https://github.com/thiagocrepaldi
2023-10-24 20:16:53 +00:00
5da903ff78 [qnnpack] suppress empty translation unit warning (#111475)
Summary: Spotted this while compiling on a Mac M1. The code in these files is gated behind #ifdef and requires SSE, so when building for ARM these files become empty.

Test Plan: CI

Differential Revision: D50407334

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111475
Approved by: https://github.com/digantdesai
2023-10-24 20:08:58 +00:00
b0087b4cf7 Revert "record_function: remove legacy internal operators (#72303)"
This reverts commit 0be84bb41e6f527229b9f50ce9937038a0c14ffe.

Reverted https://github.com/pytorch/pytorch/pull/72303 on behalf of https://github.com/izaitsevfb due to Apparently _record_function_enter is still used internally at Meta in several places and in lots of internal tests. ([comment](https://github.com/pytorch/pytorch/pull/72303#issuecomment-1777942975))
2023-10-24 20:01:14 +00:00
e72fcd382b [aotinductor] Fix a problem when the generated graph is empty (#111822)
Summary: For https://github.com/pytorch/pytorch/issues/111691

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111822
Approved by: https://github.com/chenyang78
2023-10-24 20:00:27 +00:00
b01e87d0c0 [BE][EZ] Use setup-ssh actions from test-infra (#111922)
I thought I had migrated all the actions to this one, but overlooked the Windows binary builds

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111922
Approved by: https://github.com/atalman
2023-10-24 19:55:58 +00:00
ddcf9c050b [Inductor] Support calling user defined kernels with different type of arguments (#111939)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111939
Approved by: https://github.com/jansel, https://github.com/zou3519
ghstack dependencies: #111770, #111808
2023-10-24 19:49:48 +00:00
4ac848cf77 [dynamo] Perf (MapHigherOrderVariable): do not unnecessarily get_real_value (#111920)
`get_real_value` will run the real tensor computation via the fx graph, which could be really expensive.

Let's just do the sensible thing by running the fx graph on the fake value

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111920
Approved by: https://github.com/ezyang, https://github.com/zou3519
2023-10-24 19:44:25 +00:00
3c46e859aa [TD] Enable trial mode for new heuristics (#111858)
This lets one get metrics from a new heuristic and evaluate its results without having it actually reorder the tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111858
Approved by: https://github.com/clee2000
2023-10-24 19:13:07 +00:00
7bec7d95e4 Automate release only changes, binary_linux_test.sh (#111862)
Automates following release only change:
https://github.com/pytorch/pytorch/pull/108688

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111862
Approved by: https://github.com/osalpekar
2023-10-24 18:59:34 +00:00
d92459617e Automate passing conda-pytorchbot-token-test for release (#111821)
<!--
copilot:summary
-->
### <samp>🤖 Generated by Copilot at a3b51df</samp>

This pull request adds support for testing binary uploads to Anaconda Cloud using different tokens and channels based on the branch name. It modifies the `.github/workflows/_binary-upload.yml` workflow and several other workflows that use the `.github/templates/upload.yml.j2` template. It also adds a new secret variable `conda-pytorchbot-token-test` to store the test token.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111821
Approved by: https://github.com/osalpekar, https://github.com/huydhn
2023-10-24 18:58:47 +00:00
cd034e1793 [HigherOrderOp] don't manually set input for cond (#111611)
We set manually_set_graph_inputs to False for CondHigherOrder. After that, it became necessary to deduplicate the inputs. We'll add pytree tests in a follow-up PR.

Test Plan:
existing tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111611
Approved by: https://github.com/zou3519
ghstack dependencies: #111610
2023-10-24 18:56:23 +00:00
a0043d4840 [PyTorch] AOTI: cache dtypes and device types at DSO load (#111820)
Calling the `aoti_torch_{device_type,dtype}` functions on
each iteration can impose high costs on overhead-bound CPU models
because they can't be inlined across a DSO boundary. If we call them
on load, we can use simple load instructions at run time.

Differential Revision: [D50563682](https://our.internmc.facebook.com/intern/diff/D50563682/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111820
Approved by: https://github.com/chenyang78, https://github.com/desertfire
ghstack dependencies: #111815, #111816
2023-10-24 18:37:26 +00:00
de2b41bbbf [PyTorch] AOTI: override VecISA selection in fbcode (#111816)
The OSS selection mechanism does not work internally, and doesn't make sense when the machine building the .so and the machine executing it may be different anyway.

Differential Revision: [D50140024](https://our.internmc.facebook.com/intern/diff/D50140024/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111816
Approved by: https://github.com/jansel, https://github.com/msaroufim
ghstack dependencies: #111815
2023-10-24 18:37:26 +00:00
6afd00a318 [PyTorch] AOTI: use array of constants (#111815)
We continue to allow the user to set clients with a map, but under the hood we use an array of constants.

model_container thought it was OK to hand over the map, assuming we just
kept a pointer, and then mutate the map later; I had to fix that. I
hope there aren't other sites that do the same thing...

Differential Revision: [D50111512](https://our.internmc.facebook.com/intern/diff/D50111512/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111815
Approved by: https://github.com/jansel, https://github.com/desertfire
2023-10-24 18:37:18 +00:00
b70efde3ad [easy] Reapply D49842542 (remove pessimizing move) (#111910)
This fixes a pessimizing move; for some reason the linked diff was
allowed to land with this change applied only to the internal fork of pytorch.

Differential Revision: [D50599188](https://our.internmc.facebook.com/intern/diff/D50599188/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111910
Approved by: https://github.com/desertfire, https://github.com/chenyang78
2023-10-24 17:51:16 +00:00
b89c2202bc [pytorch-vulkan] Support zero-dim (#111680)
Summary:
1. Add zero-dim (Tensor with 1 element) support.
2. New operator `_local_scalar_dense` that maps a zero-dim tensor to a Scalar (illustrated after this list)
3. `sum_dim`:
3.1. Add zero-dim support.
3.2. Fix bug in negative indices when handling multi-dim reduction call
3.3. Add unit tests to cover the new cases
4. Add `aten::sum` support.
5. Fix bug in `add_tensor` (and other binary ops): when `other` is zero-dim, we now use broadcast instead.
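
For reference, a CPU illustration of the semantics the Vulkan backend now supports (the Vulkan dispatch itself is not shown):

```python
import torch

t = torch.tensor(3.0)      # zero-dim tensor holding one element
s = t.item()               # routed through _local_scalar_dense
total = torch.sum(t)       # aten::sum on a zero-dim tensor
y = torch.rand(2, 2) + t   # zero-dim `other` broadcasts in binary ops
```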

Test Plan:
## Devserver

Full Paste: P858982150

```
[yipjustin@31799.od ~/fbsource (8593e7559)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck2 run fbcode/mode/dev-nosan  -c pt.has_backtraces=1    //xplat/caffe2:pt_vulkan_api_test_bin  --
File changed: fbsource//xplat/caffe2/aten/src/ATen/test/vulkan_api_test.cpp
Buck UI: https://www.internalfb.com/buck2/90cad0ff-ac98-4dbf-8d6f-0e419c06208d
Network: Up: 43KiB  Down: 1.4MiB  (reSessionID-dfc3a318-fd1a-4ad6-b077-c454ebb4c6a8)
Jobs completed: 6. Time elapsed: 26.4s.
Cache hits: 0%. Commands: 2 (cached: 0, remote: 1, local: 1)
BUILD SUCCEEDED
Running main() from third-party/googletest/1.11.0/googletest/googletest/src/gtest_main.cc
[==========] Running 385 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 385 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.zero_size_tensor
[       OK ] VulkanAPITest.zero_size_tensor (9 ms)
[ RUN      ] VulkanAPITest.zero_dim_tensor_1
[       OK ] VulkanAPITest.zero_dim_tensor_1 (84 ms)
[ RUN      ] VulkanAPITest.zero_dim_tensor_2
[       OK ] VulkanAPITest.zero_dim_tensor_2 (22 ms)
[ RUN      ] VulkanAPITest.local_scalar_dense
[       OK ] VulkanAPITest.local_scalar_dense (10 ms)
...
[       OK ] VulkanAPITest.lstm_prepack_success (2 ms)
[ RUN      ] VulkanAPITest.querypool_flushed_shader_log
xplat/caffe2/aten/src/ATen/test/vulkan_api_test.cpp:7484: Skipped
QueryPool is not available
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log (0 ms)
[----------] 385 tests from VulkanAPITest (46915 ms total)
[----------] Global test environment tear-down
[==========] 385 tests from 1 test suite ran. (46915 ms total)
[  PASSED  ] 382 tests.
[  SKIPPED ] 1 test, listed below:
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] VulkanAPITest.conv2d_pw_prepack
[  FAILED  ] VulkanAPITest.conv2d_pw_prepack_bc
 2 FAILED TESTS
  YOU HAVE 7 DISABLED TESTS
```

## M1 MAC

P859975219
```
buck run //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64   --target-platforms ovr_config//platform/macos:arm64-fbsource -- --gtest_filter="*"
Using additional configuration options from .buckconfig.local
Building: finished in 0.2 sec (100%) 269/2875 jobs, 0/2875 updated
  Total time: 0.2 sec
BUILD SUCCEEDED
Running main() from third-party/googletest/1.11.0/googletest/googletest/src/gtest_main.cc
[==========] Running 384 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 384 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.zero_size_tensor
[       OK ] VulkanAPITest.zero_size_tensor (40 ms)
[ RUN      ] VulkanAPITest.zero_dim_tensor_1
[       OK ] VulkanAPITest.zero_dim_tensor_1 (7 ms)
[ RUN      ] VulkanAPITest.zero_dim_tensor_2
[       OK ] VulkanAPITest.zero_dim_tensor_2 (1 ms)
[ RUN      ] VulkanAPITest.local_scalar_dense
[       OK ] VulkanAPITest.local_scalar_dense (0 ms)
[ RUN      ] VulkanAPITest.copy_to_texture
[       OK ] VulkanAPITest.copy_to_texture (45 ms)
...
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log (0 ms)
[----------] 384 tests from VulkanAPITest (5127 ms total)

[----------] Global test environment tear-down
[==========] 384 tests from 1 test suite ran. (5127 ms total)
[  PASSED  ] 382 tests.
[  SKIPPED ] 1 test, listed below:
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log
[  FAILED  ] 1 test, listed below:
[  FAILED  ] VulkanAPITest.normal_large

 1 FAILED TEST
  YOU HAVE 5 DISABLED TESTS
```

Differential Revision: D50347338

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111680
Approved by: https://github.com/SS-JIA
2023-10-24 17:29:56 +00:00
062850f4b9 Remove TorchText from RELEASE.MD (#111940)
TorchText development has been paused, so it should no longer be considered a requirement for the release process

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111940
Approved by: https://github.com/atalman, https://github.com/seemethere, https://github.com/kit1980
2023-10-24 17:28:33 +00:00
f97c2dabd9 Move negative index checking to common.py - Fix issue 97365 (#108690)
Fixes https://github.com/pytorch/pytorch/issues/97365

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108690
Approved by: https://github.com/lezcano
2023-10-24 17:27:54 +00:00
f32eb9bc55 fix missing non-contiguous output handling for add op (#111758)
Patch for https://github.com/pytorch/pytorch/pull/104689, which is missing similar handling for the add op

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111758
Approved by: https://github.com/karthiknagasub, https://github.com/ezyang
2023-10-24 17:27:50 +00:00
0c64ac0d3a Add tests for strided layout in factory functions (#111463)
Fixes #111222
This pull request adds tests for factory functions that create tensors with a strided layout. The tests are added to the `test_ops.py` file and check the behavior of the `empty`, `zeros`, `ones`, and `rand` factory functions when used with the `layout=torch.strided` argument.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111463
Approved by: https://github.com/lezcano
2023-10-24 17:05:44 +00:00
fb7047e1a1 Place local_used_map_dev_ on CPU for MTIA (#111581)
Summary:
The dist backend used on MTIA doesn't support int32 allreduce for now. The local_used_map_dev_ has to be placed on CPU.

Test Plan: See diff D50387636

Differential Revision: D50460304

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111581
Approved by: https://github.com/fduwjj
2023-10-24 17:02:44 +00:00
ad3572a5dc Unify torch.SymInt and torch.types.SymInt (#110573)
Per @ezyang, this should be fine

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110573
Approved by: https://github.com/ezyang
2023-10-24 16:17:23 +00:00
099efd8346 Fix reduction + () + multi-level reduction optimization (#111781)
In https://github.com/pytorch/pytorch/pull/111122, an optimization is introduced for reduction() + () + multi-level reduction. In this case, we make the first-level reduction ranges of a multi-level reduction the same as the previous reduction's ranges, so that Inductor has a better chance of fusing the first reduction and the first-level reduction of the multi-level reduction kernel together.

There is a corner case where the multi-level reduction kernel has `keepdim=True`. In this case, the ranges of the multi-level reduction kernel are not empty, and the dim info needs to be used to create the inner loader of the first-level reduction kernel. To keep the logic simple, for now we simply disable the optimization when `keepdim=True`.

Differential Revision: [D50544876](https://our.internmc.facebook.com/intern/diff/D50544876)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111781
Approved by: https://github.com/malfet, https://github.com/jansel
2023-10-24 15:42:21 +00:00
a887ad0b60 Add continue-on-error if ssh step is failing (#111916)
This is a debugging step and should not cause the whole workflow to fail. Hence adding `continue-on-error`, which prevents a job from failing when a step fails; it is set to true to allow the job to pass when this step fails.
Failure:
https://github.com/pytorch/pytorch/actions/runs/6627941257/job/18003997514?pr=111821

Example:
```
Run seemethere/add-github-ssh-key@v1
  with:
    GITHUB_TOKEN: ***
    activate-with-label: true
    label: with-ssh
    remove-existing-keys: true
  env:
    ALPINE_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/tool/alpine
    ANACONDA_USER: pytorch
    AWS_DEFAULT_REGION: us-east-1
    BUILD_ENVIRONMENT: windows-binary-conda
    GITHUB_TOKEN: ***
    PR_NUMBER:
    SHA1: e561cd9d253d840834d8bbef4ec98ad868ba01e4
    SKIP_ALL_TESTS: 1
    PYTORCH_ROOT: C:\actions-runner\_work\pytorch\pytorch/pytorch
    BUILDER_ROOT: C:\actions-runner\_work\pytorch\pytorch/builder
    PACKAGE_TYPE: conda
    DESIRED_CUDA: cu118
    GPU_ARCH_VERSION: 11.8
    GPU_ARCH_TYPE: cuda
    DESIRED_PYTHON: 3.9
ciflow reference detected, attempting to extract PR number
Error: The request could not be processed because too many files changed
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111916
Approved by: https://github.com/malfet
2023-10-24 14:53:40 +00:00
1ddbdb5144 Optest: Allow parametrized names for xfails checks (#111797)
CC @zou3519

This is hopefully a fix for https://github.com/pytorch/vision/pull/8058/files#r1368570541. It seems to work for me locally, but maybe there's a more elegant way of handling this?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111797
Approved by: https://github.com/zou3519
2023-10-24 11:35:27 +00:00
4f79161452 Add tensor parallel sharding APIs for torch export (#111236)
Add libraries to apply tensor parallel transformation to an exported program.

Differential Revision: [D50214796](https://our.internmc.facebook.com/intern/diff/D50214796/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111236
Approved by: https://github.com/wanchaol
2023-10-24 10:07:14 +00:00
ebcc42ea10 [Dist] Fix coalescing manager + DETAIL debug mode (#111878)
Fix https://github.com/pytorch/pytorch/issues/109520 by adding it to
ProcessGroupWrapper.

Differential Revision: [D50583403](https://our.internmc.facebook.com/intern/diff/D50583403/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111878
Approved by: https://github.com/fegin, https://github.com/wanchaol, https://github.com/fduwjj
2023-10-24 07:47:39 +00:00
babb6c6ac4 nccl flight recorder (#110960)
Keep a buffer of the last 16384 nccl work actions, including the stack
trace that launched the event.

When torch._C._distributed_c10d._dump_nccl_trace() is called, it can dump these to
a pickled archive.

For each action we get:
process_group_id, seq_id, collective_name, size_of_first_tensor, stack trace

state - issued, started, completed (based on cuda events and queried if
necessary when the dump is requested)

I tested that it is possible to query event state when the streams are
otherwise stuck.
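
A hedged sketch of inspecting a dump; the per-entry layout is assumed from the field list above:

```python
import pickle
import torch

dump = torch._C._distributed_c10d._dump_nccl_trace()
for entry in pickle.loads(dump):
    # pg id, seq id, collective name, first-tensor size, stack trace, state
    print(entry)
```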

Differential Revision: [D50138956](https://our.internmc.facebook.com/intern/diff/D50138956)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110960
Approved by: https://github.com/wconstab
2023-10-24 07:12:21 +00:00
9dfaba6f10 [dynamo] add repro for functorch/fx interop issue (allow_in_graph) (#111746)
Fixes https://github.com/pytorch/pytorch/issues/109025 by adding repro

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111746
Approved by: https://github.com/voznesenskym
2023-10-24 07:03:15 +00:00
4b804dac33 [MPS] Add complex support for fill (#111885)
Fixes #110537
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111885
Approved by: https://github.com/malfet
2023-10-24 06:41:10 +00:00
0ad91c2bfb Add an explicit _shutdown method to ProcessGroupNCCL (#111392)
Currently, the only way ProcessGroupNCCL shuts down its background threads and aborts all communicators is via the destructor.

However, given how python GC works and code holding references to the PG in multiple places, in practice calling `destroy_process_group` doesn't actually end up invoking the destructor.

As a result, in this PR I'm adding an explicit shutdown method that users can call to clean up all resources.
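
A hedged usage sketch; the exact access path to the NCCL backend object is assumed for illustration:

```python
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl", init_method="env://")
# ... run collectives ...
backend = dist.group.WORLD._get_backend(torch.device("cuda"))  # access path assumed
backend._shutdown()  # explicitly release NCCL resources
dist.destroy_process_group()
```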
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111392
Approved by: https://github.com/XilunWu, https://github.com/wanchaol, https://github.com/fduwjj
2023-10-24 05:47:12 +00:00
6d78f34a06 fix regression which creates a new fake tensor (#111864)
Fixes regression identified here: ccd6b373b5 (r1369334484)

Now that `get_fake_value` will identify aliases, we should not try to wrap the fake value again.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111864
Approved by: https://github.com/eellison
2023-10-24 05:11:48 +00:00
0e0f6a248d Fix num_batches_tracked of BatchNorm when load_state_dict (#110850)
Fixes #110361

as the title shown

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110850
Approved by: https://github.com/mikaylagawarecki
2023-10-24 04:20:38 +00:00
30cbd2ea37 Add Benchmark for freezing + max autotune, turn on in weekly run (#111853)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111853
Approved by: https://github.com/desertfire
2023-10-24 04:13:56 +00:00
cbc6213f5d [inductor] Defer memory operation lowering to wrapper (#111402)
Right now, memory ops are being lowered to strings partly in
scheduler.codegen() and partly in wrapper.codegen(). But that makes
static memory planning (which is done entirely in `wrapper.codegen()`)
difficult to implement as information is "lost" by that point.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111402
Approved by: https://github.com/jansel
2023-10-24 03:47:56 +00:00
6977ba6e3c [inductor] decomposition for complex addition (#110740)
Tracks https://github.com/pytorch/pytorch/issues/98161

Complex number support in PyTorch isn't ideal today, as complex operations are mostly handled by the ATen runtime, except for `torch.angle` which is handled in [105609](https://github.com/pytorch/pytorch/pull/105609). In general, a better way to handle this could be to decompose complex operations first so that more opportunities for fusion are unveiled, and then to have Triton take care of non-contiguous (strided) tensor operations more efficiently. This change adds support for decomposing complex additions.

```
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 6
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:]
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.load(in_ptr1 + (x0), xmask)
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x0), tmp2, xmask)
```
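
Conceptually, the decomposition expresses complex addition on the real/imaginary parts, exposing plain elementwise ops to the fuser; a minimal eager-mode sketch:

```python
import torch

def complex_add(a, b):
    return torch.complex(a.real + b.real, a.imag + b.imag)

x = torch.randn(6, dtype=torch.complex64)
y = torch.randn(6, dtype=torch.complex64)
torch.testing.assert_close(complex_add(x, y), x + y)
```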

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110740
Approved by: https://github.com/jansel
2023-10-24 03:41:24 +00:00
b3bb94b980 [dynamo] Update test_invoke_in_pt2_compiled_autograd (#111817)
Summary: For some reason this test seems to only run in fbcode, not OSS

Test Plan: CI

Differential Revision: D50562753

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111817
Approved by: https://github.com/izaitsevfb
2023-10-24 03:30:36 +00:00
a469aca1cc Exposes a fast_fp8_accum option to _scaled_mm (#111847)
# Summary
Adds the option to use fast_accumulation_mode for the fp8 matmul in scaled_mm

Information can be found here: https://docs.nvidia.com/cuda/cublas/#cublasltmatmuldescattributes-t
It defaults to 0 (off).
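
A hedged sketch of turning the option on (requires an fp8-capable GPU; the keyword name and the returned (out, amax) pair are assumed for this version of the private API):

```python
import torch

a = torch.randn(32, 64, device="cuda").to(torch.float8_e4m3fn)
b = torch.randn(32, 64, device="cuda").to(torch.float8_e4m3fn).t()  # column-major operand
out, amax = torch._scaled_mm(a, b, out_dtype=torch.float16, use_fast_accum=True)
```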

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111847
Approved by: https://github.com/ipiszy, https://github.com/malfet
2023-10-24 03:26:53 +00:00
702aaf8aea [sparse] semi-structured sparse + torch.compile support (#111049)
Summary:

This PR adds in torch.compile support for semi-structured sparsity,
using the subclass tracing @bdhirsh added.

Based on whether we are using cuSPARSELt or CUTLASS, we return a
different representation of the inner tensors.
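
A hedged sketch of the end-to-end flow (requires a CUDA build with the cuSPARSELt or CUTLASS kernels available):

```python
import torch
from torch.sparse import to_sparse_semi_structured

# Build a weight with a valid 2:4 pattern (two zeros in every group of four).
w = torch.randn(128, 128, dtype=torch.float16, device="cuda")
w = w.reshape(-1, 4)
w[:, :2] = 0
w = w.reshape(128, 128)

lin = torch.nn.Linear(128, 128, bias=False).half().cuda()
lin.weight = torch.nn.Parameter(to_sparse_semi_structured(w))

x = torch.randn(64, 128, dtype=torch.float16, device="cuda")
y = torch.compile(lin)(x)
```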

Test Plan:
```
python test/test_sparse_semi_structured.py -k compile
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111049
Approved by: https://github.com/cpuhrsch
2023-10-24 02:23:20 +00:00
5eac44bc72 Ignore beartype if its version is 0.16.0 (#111859)
With this fix, 'beartype' 0.16.0 should be ignored and not crash PyTorch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111859
Approved by: https://github.com/justinchuby, https://github.com/thiagocrepaldi
2023-10-24 02:11:26 +00:00
9132734a35 Use Dr.CI GitHub checkrun summary when querying its API fails (#111628)
This will allow the internal SandCastle job to access Dr.CI classification results via the GitHub checkrun summary and correctly ignore unrelated failures.

### Testing

Adding `TestBypassFailuresOnSandCastle` where Dr.CI API returns nothing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111628
Approved by: https://github.com/clee2000
2023-10-24 01:32:30 +00:00
e62c887bab Revert "[inductor][BE] split triton_meta and inductor_meta (#111397)"
This reverts commit 070b94dc08c73e133c5231ec6acbe407ae1580f3.

Reverted https://github.com/pytorch/pytorch/pull/111397 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/111397#issuecomment-1776282039))
2023-10-24 00:52:24 +00:00
0a26e5fd8f Use 'device' argument in test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_* (#111584)
Argument "device" was missed.
So, "test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_*_cuda" was always run on the default device ("cpu") if another default torch device was not configured before.
This fix will probably detect a number of issues on various devices which were previously missed.
Should fix failed rocm CI jobs with "##[error]The action has timed out."  and speedup test execution

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111584
Approved by: https://github.com/soulitzer
2023-10-24 00:03:50 +00:00
b969c675f5 Add batched dimensions support to the second operand of bsr_scatter_mm (#111796)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111796
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #110396, #111470, #111489, #111760
2023-10-23 23:52:49 +00:00
6382011843 Add NVIDIA A100 optimized meta parameters to bsr_dense_mm (#111760)
As in the title.

The figures below illustrate the performance differences of bsr_dense_mm with optimized parameters and bsr_dense_mm with default parameters (GPU: NVIDIA A100-SXM4-80GB). The first figure represents the performance equilibrium point in BSR tensor sparsity at which value bsr_dense_mm have the same performance characteristics as torch.matmul. The second figure represents speedups from using optimized meta parameters in bsr_dense_mm at its performance equilibrium points with respect to bsr_dense_mm with default meta parameters.

In sum, this PR speeds up `bsr_dense_mm` about 50 % depending on the bsr tensor shape and blocksize and lowers the performance equilibrium points of BSR tensor sparsity and strided tensor for matmul operations.

<img src="https://github.com/pytorch/pytorch/assets/402156/6fe9d35f-dd21-4aa0-bb01-6ee257254453" width="48%"> <img src="https://github.com/pytorch/pytorch/assets/402156/506921c6-3770-4209-ad3d-498d2ae4989d" width="48%">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111760
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #110396, #111470, #111489
2023-10-23 23:52:49 +00:00
f3d08ab271 Use more performant bsr_scatter_mm within bsr_dense_mm when blocksize is 16. (#111489)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111489
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #110396, #111470
2023-10-23 23:52:49 +00:00
6078ed95cc Use lru_cache to cache indices data for bsr_scatter_mm. (#111470)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111470
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #110396
2023-10-23 23:52:49 +00:00
b56699b699 Add post grad graph logging (#111808)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111808
Approved by: https://github.com/Chillee
ghstack dependencies: #111770
2023-10-23 23:24:04 +00:00
0ea9646cdd Rewrite torch.library's documentation (#111310)
We mention the higher-level torch.library APIs and put the original docs
into a low-level API section.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111310
Approved by: https://github.com/soulitzer
ghstack dependencies: #111380, #111659
2023-10-23 23:02:41 +00:00
66b74d231a Change torch.library.impl to accept a device string (#111659)
torch.library.impl now accepts a device string (e.g. "cpu", "cuda"). It
still accepts DispatchKey strings, but we no longer document this, because
using arbitrary DispatchKeys is more for the power users.

We map the device string to a DispatchKey and then register the impl for
said DispatchKey. A user may also specify multiple device strings at once
or specify "types=default" to get a CompositeExplicitAutograd registration.

Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111659
Approved by: https://github.com/soulitzer
ghstack dependencies: #111380
2023-10-23 23:02:41 +00:00
6463f2b51c Rename name->qualname in torch.library.impl_abstract (#111380)
See title. Makes this consistent with torch.library.{define, impl, impl_device}, where we have named the same argument `qualname`. This is not BC-breaking because we have not released a version of PyTorch with impl_abstract in it yet.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111380
Approved by: https://github.com/soulitzer
2023-10-23 23:02:36 +00:00
0be84bb41e record_function: remove legacy internal operators (#72303)
These operators have not been used since #76420 but were preserved for TorchScript backward compatibility

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72303
Approved by: https://github.com/albanD
ghstack dependencies: #104535
2023-10-23 22:55:05 +00:00
4ed4753ac3 [inductor][easy] skip test_extension_backend.py in fbcode (#111591)
Summary: It's currently failing. We should skip it in fbcode because cpp extensions don't work right now.

Differential Revision: D48852412

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111591
Approved by: https://github.com/desertfire
2023-10-23 22:37:13 +00:00
d22e5e4b52 Fix DDP notes (#111833)
To include `import os` otherwise sample is not syntactically correct Reported in https://github.com/pytorch/pytorch.github.io/pull/1490

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111833
Approved by: https://github.com/wanchaol
2023-10-23 22:05:36 +00:00
070b94dc08 [inductor][BE] split triton_meta and inductor_meta (#111397)
triton_meta is intended to be passed directly to triton. Previously we were also putting other metadata into triton_meta, but we should split out the other metadata into a separate dict to avoid possible conflicts in the future.

This PR splits out triton_meta and inductor_meta so we have a place to put additional metadata that isn't intended to be passed to triton.

Tests - wait for CI

Differential Revision: [D50442547](https://our.internmc.facebook.com/intern/diff/D50442547)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111397
Approved by: https://github.com/shunting314, https://github.com/eellison
2023-10-23 21:38:21 +00:00
73170b23d4 Add compile support for NT unbind (#111531)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111531
Approved by: https://github.com/ezyang
2023-10-23 21:16:20 +00:00
4d45c21c3f [Export] Don't serialize missing args with default value (#111715)
Summary: Per https://docs.google.com/document/d/1FzWm-sHYwmRi3x_g036kOxd99KaYquUsA-L5JwOn8ys/edit

I wonder if this would break executorch? @larryliu0820
I see exir/serialize.py using export's GraphModuleSerializer.

Test Plan: Existing CIs

Differential Revision: D50519217

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111715
Approved by: https://github.com/zhxchen17
2023-10-23 21:09:15 +00:00
185e76238d [2D][Documentation] Add some comments to _chunk_dtensor (#111775)
As title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111775
Approved by: https://github.com/awgu
2023-10-23 20:43:03 +00:00
3b5b7ebd09 [ci] Save various json files from test infra into folder (#111516)
We pull a lot of files from https://github.com/pytorch/test-infra/blob/generated-stats/stats and name them separately when we add them to the artifacts in the build, so stick them in a folder and just add that instead.

Slow test and disabled test jsons remain as they were since they are pulled during the test step and do not need to be included in the artifacts during build since they are not used for sharding.

Sanity checked that test times could be found for linux, mac, windows, and rocm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111516
Approved by: https://github.com/huydhn, https://github.com/ZainRizvi
2023-10-23 20:38:25 +00:00
e509b162ed Disable FlashAttenion for is_causal=True when seqlen q not equal kv (#111007)
# Summary:
This pull request **removes** support for non-square sequence lengths in causal attention when using FlashAttention V2.

### Why we are doing this
```
// FlashAttention 2 updated the default mask meaning for causal in this PR:
// 9e5e8bc91e it is now aligned to lower_right which would be a BC break
// for non-square masks. We will not support non-square masks for causal w/ FAV2
```

 For more context see:
 https://github.com/pytorch/pytorch/issues/108108

 ### Followup
 A large number of people will likely want to use FAV2 with lower_right causal attention for non-equal sequence lengths. See this RFC: https://github.com/pytorch/pytorch/issues/110681

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111007
Approved by: https://github.com/cpuhrsch
2023-10-23 20:33:37 +00:00
98e749a306 [Pytorch][CPU] Switch building compiler to Clang (#111537)
Summary:
The slimdsnn model is currently built with GCC. After a stack of backporting (D50338220), I see that Clang-15 generates code which is 10% faster than GCC's.

There are likely further improvements available in the internal Clang, as the top-of-tree Clang in LLVM upstream generates even better code.

Test Plan:
Before:

   buck2 run mode/{opt,inplace} //accelerators/workloads/models/slimdsnn:slimdsnn_dso_benchmark -- --iterations=100000000

   Starting benchmark, 100000000 iterations...
   Batch=1 latency: 0.643 us

After:

   buck2 run mode/{opt,inplace} //accelerators/workloads/models/slimdsnn:slimdsnn_dso_benchmark -- --iterations=100000000

   Starting benchmark, 100000000 iterations...
   Batch=1 latency: 0.593  us

Reviewed By: bertmaher

Differential Revision: D50399150

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111537
Approved by: https://github.com/bertmaher
2023-10-23 20:26:46 +00:00
6c384cf4a6 Don't DCE unbacked SymInt if it is returned as shape constant buffer (#111803)
Also adds some logging for the inductor scheduler

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111803
Approved by: https://github.com/jansel
2023-10-23 19:57:38 +00:00
0b602b13c8 [small] fix tcpstore doc arg (#111807)
incorrect arg name `wait_for_worker` -> `wait_for_workers`
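
Usage with the corrected keyword:

```python
from datetime import timedelta
import torch.distributed as dist

store = dist.TCPStore("127.0.0.1", 29500, world_size=1, is_master=True,
                      timeout=timedelta(seconds=30), wait_for_workers=True)
store.set("key", "value")
print(store.get("key"))
```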
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111807
Approved by: https://github.com/awgu, https://github.com/fduwjj
2023-10-23 19:51:09 +00:00
d4708a6da7 Add scatter_mm and bsr_scatter_mm operations. (#110396)
This PR introduces the `scatter_mm` operation (computing `mm` over arbitrary pairs of tensors given in batches), which is used to implement `bsr_scatter_mm`, an equivalent of `bsr_dense_mm` (the `mm` operation on BSR and strided tensors). The implementation is provided both in Triton (when tensor dimensions are multiples of 16) and in PyTorch (otherwise); a usage sketch follows the summary below.

The figures below illustrate the performance differences of `bsr_scatter_mm` and `bsr_dense_mm` (GPU: `NVIDIA GeForce RTX 2060 SUPER`). The first figure represents the performance equilibrium point in BSR tensor sparsity at which value `bsr_scatter_mm` or `bsr_dense_mm` have the same performance characteristics as `torch.matmul`. The second figure represents speedups from using `bsr_scatter_mm` at its performance equilibrium points with respect to `bsr_dense_mm`.

<img src="https://github.com/pytorch/pytorch/assets/402156/526d182e-937f-4812-a6c4-904f52d6d5ab" width="48%"> <img src="https://github.com/pytorch/pytorch/assets/402156/ccb606ab-1f3f-4133-887c-b56285f4f168" width="48%">

The same figures for GPU card `NVIDIA A100-SXM4-80GB`:

<img src="https://github.com/pytorch/pytorch/assets/402156/25466f1d-df34-4d1c-a975-afb478e4d9f0" width="48%"> <img src="https://github.com/pytorch/pytorch/assets/402156/6ada91f0-a20f-4f0d-8a48-1f4ccc60d08e" width="48%">

In sum:
- `bsr_scatter_mm` is about 2x faster than `bsr_dense_mm` for small block sizes of 16 and 32 and large tensors [GPU: `NVIDIA GeForce RTX 2060 SUPER`].
- `bsr_scatter_mm` is up to 2x faster than `bsr_dense_mm` for small block sizes of 16 and large tensors [GPU: `NVIDIA A100-SXM4-80GB`].
- `bsr_dense_mm` is up to 20 % faster than `bsr_scatter_mm` for block sizes of 64 or larger [GPU: `NVIDIA GeForce RTX 2060 SUPER`].
- However, `bsr_dense_mm` fails with `OutOfResources` exception for block sizes of 256 or larger whereas `bsr_scatter_mm` succeeds.
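
A hedged usage sketch; the private module path is assumed and may change:

```python
import torch
from torch.sparse._triton_ops import bsr_dense_mm  # module path assumed

a = torch.randn(256, 256, device="cuda", dtype=torch.float16)
bsr = a.to_sparse_bsr((32, 32))
b = torch.randn(256, 256, device="cuda", dtype=torch.float16)
out = bsr_dense_mm(bsr, b)
torch.testing.assert_close(out, a @ b, rtol=1e-2, atol=1e-2)
```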

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110396
Approved by: https://github.com/cpuhrsch
2023-10-23 19:45:30 +00:00
3b9246ba18 Add CSR tensor with non-contiguous values support to CuSparseSpMatCsrDescriptor (#111742)
Fixes https://github.com/pytorch/pytorch/issues/111574

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111742
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2023-10-23 19:20:11 +00:00
335582584f [inductor] Adding a way to force fusion of int_mm with mul (#111413)
Summary: When doing quantization, int_mm -> mul or int_mm -> mul -> to(dtype)
is an extremely common op pattern that is currently not handled well by
inductor. Ideally, since the output of int_mm has dtype int32, we'd prefer
to only realize a smaller dtype like bf16 or float16. Currently inductor
doesn't have a way to force this; in many cases the mul gets fused with a
bunch of subsequent pointwise and reduction ops from the dequant and
following ops, creating an increase in memory overhead and a general
slowdown compared to the fused version.

As an external benchmark, for SAM this seems to improve our e2e image encoder
times by 3-5% depending on batch size, and reduces memory usage by 20%.
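
The targeted pattern, sketched in eager form (CUDA int8 matmul; the shape constraints of torch._int_mm are assumed satisfied here):

```python
import torch

a = torch.randint(-128, 127, (32, 64), dtype=torch.int8, device="cuda")
b = torch.randint(-128, 127, (64, 32), dtype=torch.int8, device="cuda")
scale = torch.rand(32, 32, device="cuda")

def int_mm_mul(a, b, scale):
    return (torch._int_mm(a, b) * scale).to(torch.bfloat16)

out = torch.compile(int_mm_mul)(a, b, scale)
```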

Test Plan: python test/inductor/test_pattern_matcher.py -k
"int_mm_mul"


Pull Request resolved: https://github.com/pytorch/pytorch/pull/111413
Approved by: https://github.com/jansel
2023-10-23 19:18:50 +00:00
e264b42a2e [re-land][inductor] Refactor and optimize allocation calls (#111117) (#111511)
Summary:
This is a re-land of https://github.com/pytorch/pytorch/pull/111117 with
updates to our internal tests included.

This splits out changes from
https://github.com/pytorch/pytorch/pull/102625 to make things easier to
review.

This diff creates a `make_allocation()` method that extracts the logic
from `make_buffer_allocation()` while allowing us to allocate non-buffer
objects. In particular, we will use this to allocate memory pools during
memory planning.

This diff also includes a small optimization -- if the desired
allocation is contiguous, then we emit a call to `empty()` instead of
`empty_strided()` with its superfluous stride argument.

Test Plan: contbuild & OSS CI, see 9ce0ae836d

Differential Revision: D50429424

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111511
Approved by: https://github.com/jansel
2023-10-23 19:18:32 +00:00
184aee12cc make no-inline calls to throw exceptions (#111787)
Previously, we threw runtime_error exceptions, built up with some string
operations, upon failures. However, inlining such calls into
the main run function causes exponential compilation-time
behavior in the host compiler, which may spend an hour running
call-graph-related passes for some large models.
This PR replaces the relevant code with no-inline calls.
With this change, we reduced the compilation time from more than
an hour down to a couple of minutes for some large models.
Note that these non-inlined calls have little impact on the
model inference runtime, because they are on the error-handling
paths.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111787
Approved by: https://github.com/desertfire, https://github.com/jansel
2023-10-23 19:12:04 +00:00
36d34ce951 [dynamo] support comparing LHS constant with tensor (#111492)
Fixes https://github.com/pytorch/pytorch/issues/108582

Depends on https://github.com/pytorch/pytorch/pull/111557 for fixing broken integration tests. (due to this PR unblocking an in-graph set membership)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111492
Approved by: https://github.com/Skylion007
2023-10-23 19:05:14 +00:00
59ae0d9f9d Allow setting logger output format with TORCH_LOGS_FORMAT (#111770)
Setting TORCH_LOGS_FORMAT="%(levelname)s: %(message)s" will dump only the log level and message contents.
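
The format string uses Python's standard `logging` syntax, so the setting above is roughly equivalent to installing a formatter like this (a sketch, not the actual torch._logging internals):

```
import logging

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
logging.getLogger("torch._dynamo").addHandler(handler)
```
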
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111770
Approved by: https://github.com/jansel
2023-10-23 18:42:27 +00:00
01a2c801d4 Pass BUILD_ENVIRONMENT to MPS tests (#111595)
- Pass `GIT_DEFAULT_BRANCH` and `TEST_CONFIG` as well.
- Unify `_mac-test.yml` and `_mac-test-mps.yml` further by passing runner type via the matrix and uploading results using the same pattern (before the change MacOS12 and MacOS13 results on PRs were overwritten)
- Add `Cleanup disk space` step to `_mac-test-mps.yml` job

Should fix the
```
Warning:  Gathered no stats from artifacts for build env None build env and None test config. Using default build env and default test config instead.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111595
Approved by: https://github.com/atalman
2023-10-23 18:37:15 +00:00
0247dce6cb [Pytorch][Vulkan] mean.dim (#111609)
Summary:
We implement [`torch.mean(input, dim, keepdim)`](https://pytorch.org/docs/stable/generated/torch.mean.html) for 2-d to 4-d tensors.

Since 0-dim tensors aren't supported yet, we only support `dim.size() < input.dim()` for now (see the sketch after this list). We will support the following cases in future work:
- `dim.size() == input.dim()`
- `input.dim() == 1`
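
A CPU-side sketch of the supported case (`dim.size() < input.dim()`); running the same call on the Vulkan backend assumes a Vulkan-enabled build:

```
import torch

x = torch.rand(2, 3, 4)                      # 3-d input
y = torch.mean(x, dim=[1, 2], keepdim=True)  # dim.size() == 2 < x.dim() == 3; y.shape == (2, 1, 1)
```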

Test Plan:
```
[luwei@devbig984.prn1 /data/users/luwei/fbsource (970fcd90c)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run fbcode/mode/dev-nosan //xplat/caffe2:pt_vulkan_api_test_bin -- --gtest_filter="*mean*"
Building: finished in 0.1 sec (100%) 339/339 jobs, 0/339 updated
  Total time: 0.1 sec
BUILD SUCCEEDED
Running main() from third-party/googletest/1.11.0/googletest/googletest/src/gtest_main.cc
Note: Google Test filter = *mean*
[==========] Running 7 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 7 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.mean_invalid_inputs
[       OK ] VulkanAPITest.mean_invalid_inputs (46 ms)
[ RUN      ] VulkanAPITest.mean_dim_2d
[       OK ] VulkanAPITest.mean_dim_2d (127 ms)
[ RUN      ] VulkanAPITest.mean_dim_3d
[       OK ] VulkanAPITest.mean_dim_3d (103 ms)
[ RUN      ] VulkanAPITest.mean_dim_4d
[       OK ] VulkanAPITest.mean_dim_4d (89 ms)
[ RUN      ] VulkanAPITest.mean_dim_keepdim_2d
[       OK ] VulkanAPITest.mean_dim_keepdim_2d (66 ms)
[ RUN      ] VulkanAPITest.mean_dim_keepdim_3d
[       OK ] VulkanAPITest.mean_dim_keepdim_3d (127 ms)
[ RUN      ] VulkanAPITest.mean_dim_keepdim_4d
[       OK ] VulkanAPITest.mean_dim_keepdim_4d (4 ms)
[----------] 7 tests from VulkanAPITest (564 ms total)

[----------] Global test environment tear-down
[==========] 7 tests from 1 test suite ran. (564 ms total)
[  PASSED  ] 7 tests.
```

Reviewed By: yipjustin

Differential Revision: D50312990

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111609
Approved by: https://github.com/yipjustin
2023-10-23 18:34:53 +00:00
39c09d4da6 Revert "Revert "Nvfuser code removal (#111093)"" (#111604)
This reverts commit 715dfced72657e5adacd5bef16e3d458cd94851b.

The original PR #111093 is reverted due to broken internal build.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111604
Approved by: https://github.com/davidberard98
2023-10-23 18:32:41 +00:00
ce48d36324 [aotinductor] Update test utility to use AOTIModelRunner (#111657)
Summary: Use the AOTIModelRunner provided by libtorch instead of the custom-written RAIIModelContainer for testing. This change also makes running AOTInductor benchmarks on CPU possible.

Differential Revision: [D50560764](https://our.internmc.facebook.com/intern/diff/D50560764)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111657
Approved by: https://github.com/chenyang78
2023-10-23 18:21:27 +00:00
4b6b8fcf6d Disable dynamo when running generated opcheck tests (#111685)
Summary: Use `TORCHDYNAMO_DISABLE=1` when running generated opcheck tests. Enable some `fbgemm::pack_segments` tests that errored out (with error `RuntimeError: expected int but got s0*s1**2`) because dynamo was being run in the opcheck tests.

Test Plan: `parsh -v --build-flags mode/dev-nosan //deeplearning/fbgemm/fbgemm_gpu:sparse_ops_test` then `run_tests("test_pack_segments")`

Differential Revision: D50508958

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111685
Approved by: https://github.com/zou3519
2023-10-23 18:21:16 +00:00
e644b03775 [Forward fix] torch.fx.passes.shape_prop should not be skipped (#111771)
Summary: As title

Test Plan: All failures in T167831495 passed

Differential Revision: D50542953

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111771
Approved by: https://github.com/aakhundov
2023-10-23 18:05:26 +00:00
4b324a8717 Add Half support for aminmax on CPU (#106853)
Add Half support for aminmax on CPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106853
Approved by: https://github.com/cpuhrsch
2023-10-23 17:43:47 +00:00
ad4ccf9689 [dynamo] Properly track user-defined types for type() (#110794)
Closes https://github.com/pytorch/pytorch/issues/110315.

Thanks to @ezyang for the easy repro!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110794
Approved by: https://github.com/ezyang
2023-10-23 17:34:23 +00:00
a22e238db0 Additional lint fixes (#111793)
Follow up to https://github.com/pytorch/pytorch/pull/111367

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111793
Approved by: https://github.com/ezyang, https://github.com/Skylion007
2023-10-23 17:18:26 +00:00
f3d02d9ae6 Add support for sym_ite (#111440)
This PR supports sym_ite. This is useful for converting SymBool to SymInt in e.g. #109916. Internally, it uses sympy.Piecewise. We cannot use sympy.ITE because it expects the arguments and output to all be boolean type, but we want to return SymInt type when converting a SymBool to SymInt. So we use sympy.Piecewise to denote the symbolic relationship.

Note that this pr uses the range analysis for sympy.Piecewise implemented in https://github.com/pytorch/pytorch/blob/main/torch/utils/_sympy/value_ranges.py.

Test Plan:
See added test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111440
Approved by: https://github.com/ezyang
2023-10-23 16:17:43 +00:00
09040f6fbb bypass nvml for torch.cuda.device_count() if rocm (#110418)
This is a quick fix to suppress printing "UserWarning: Can't initialize NVML" when calling torch.cuda.device_count() if the [NVIDIA Management Library](https://developer.nvidia.com/nvidia-management-library-nvml) (nvml module) is installed with ROCm.
Fixes https://ontrack-internal.amd.com/browse/SWDEV-414997

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110418
Approved by: https://github.com/jeffdaily, https://github.com/jithunnair-amd, https://github.com/kit1980
2023-10-23 16:15:48 +00:00
236472b32a Allow to specify specific files for debug info (#111748)
Building with `USE_CUSTOM_DEBINFO=torch/csrc/Module.cpp python setup.py develop` for example will provide debug info only for this file.
This allows to enable debug symbols very fast from a non-debug build by doing a clean then develop (as long as you have ccache) and avoid very large binaries that take a very long time to load in gdb.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111748
Approved by: https://github.com/drisspg, https://github.com/ezyang, https://github.com/malfet
2023-10-23 14:00:54 +00:00
024ffd342a [ATen] Make _unsafe_index CompositeExplicitAutograd (#111795)
The ATen implementation for this function simply calls `at::index` so there's
no reason this shouldn't be composite.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111795
Approved by: https://github.com/lezcano
2023-10-23 13:34:29 +00:00
f3991df408 [caffe2] avoid variable shadowing (#111476)
Summary:
Some builds use -Wshadow and currently there is a compiler warning when building that file.

Code inspection shows that `torch::autograd::impl::get_view_autograd_meta` simply extracts information from the passed object, which is `const`. Therefore the returned views should be the same all the time, and we can fetch the view only once.

Test Plan:
CI

NOTE: please advise for a more comprehensive test plan.

Differential Revision: D50407625

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111476
Approved by: https://github.com/Skylion007, https://github.com/albanD
2023-10-23 13:22:11 +00:00
cyy
e676ec2fe7 Fix undefined __assert_fail on FreeBSD (#111761)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111761
Approved by: https://github.com/Skylion007
2023-10-23 12:46:03 +00:00
1eb6c4314b [xla hash update] update the pinned xla hash (#111788)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned xla hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111788
Approved by: https://github.com/pytorchbot
2023-10-23 10:59:39 +00:00
fb8876069d Support tracing base torch_function impl (#111731)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111731
Approved by: https://github.com/jansel
ghstack dependencies: #111730
2023-10-23 07:11:32 +00:00
cyy
0b424ee0b7 Fix inconsistency of max_split_size between DeviceStats and CUDAAllocatorConfig (#111555)
CUDAAllocatorConfig uses a size_t max_split_size and initializes it to `std::numeric_limits<size_t>::max()`; the value is then assigned to max_split_size of DeviceStats, which is of type int64_t, so that the command
```
python3 -c "import torch;y=torch.empty(3,device='cuda');print(torch.cuda.memory_stats(0)['max_split_size'])"
```
returned -1.

After skimming through the code and reading the doc at https://pytorch.org/docs/stable/generated/torch.cuda.memory_stats.html, it was clear that negative values of max_split_size make no sense and we should use size_t instead. The error has now been fixed and the command returns `std::numeric_limits<size_t>::max()`.

This issue was found in revert of #111137

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111555
Approved by: https://github.com/colesbury
2023-10-23 06:55:29 +00:00
f7401de1bb Add mha to Autocast CPU (#107674)
Fixes #106751.

This PR adds `_native_multi_head_attention` to Autocast CPU policy.

Behavior: within the scope of `torch.cpu.amp.autocast(dtype=torch.bfloat16)`, `_native_multi_head_attention` will be forced to run with the bf16 data type.
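
A minimal sketch of that behavior through the public `nn.MultiheadAttention` module; whether the `_native_multi_head_attention` fast path is actually taken depends on the usual fast-path conditions (e.g. eval mode, no need for attention weights):

```
import torch

mha = torch.nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True).eval()
x = torch.randn(2, 4, 8)
with torch.cpu.amp.autocast(dtype=torch.bfloat16):
    out, _ = mha(x, x, x)  # runs in bf16 under the autocast scope
```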

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107674
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5, https://github.com/jon-chuang, https://github.com/drisspg
2023-10-23 06:02:55 +00:00
1d9a7f9e43 [Reland] TensorWithTFOverride inheritance from TensorVariable (#111766)
Accidentally merged https://github.com/pytorch/pytorch/pull/111730 with ghstack, so relanding

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111766
Approved by: https://github.com/jansel
2023-10-23 04:33:16 +00:00
c65c0682b1 [dynamo] Expand _nonvar_fields names (#111749)
This should be a small compile time optimization, since we won't need to
walk these fields in apply().

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111749
Approved by: https://github.com/yanboliang
2023-10-23 02:58:16 +00:00
2b2b6caf8f [inductor] Implement clone removal for user defined triton kernel via reinplace_scatters (#111627)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111627
Approved by: https://github.com/jansel
ghstack dependencies: #111434
2023-10-22 22:28:00 +00:00
d118531733 Use \odot everywhere instead of mixing \odot and * for the Hadamard product (#111763)
This pull request addresses an inconsistency in the representation of the Hadamard product across PyTorch documentation. Currently, the notation varies among different modules:

- In `torch.nn.LSTM` documentation the Hadamard product is represented with $\odot$
- In `torch.nn.GRU` documentation the Hadamard product is represented with $*$
- In `torch.nn.LSTMCell` documentation the Hadamard product is represented with $*$
- In `torch.nn.GRUCell` documentation the Hadamard product is represented with $*$
- In `torch.ao.nn.quantized.dynamic.GRU` documentation the Hadamard product is represented with $*$

This PR proposes consistently representing the Hadamard product throughout the documentation to enhance clarity and align with established standards.
The notation $\odot$ will be uniformly adopted, following the convention in the [Deep Learning Book](https://www.deeplearningbook.org/contents/linear_algebra.html).

**Changes Made:**

- Modified `torch.nn.GRU` documentation to represent the Hadamard product with $\odot$
- Modified `torch.nn.LSTMCell` documentation to represent the Hadamard product with $\odot$
- Modified `torch.nn.GRUCell` documentation to represent the Hadamard product with $\odot$
- Modified `torch.ao.nn.quantized.dynamic.GRU` documentation to represent the Hadamard product with $\odot$
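
For context, with the unified notation the LSTM cell update from the `torch.nn.LSTM` docs reads:

$$c_t = f_t \odot c_{t-1} + i_t \odot g_t, \qquad h_t = o_t \odot \tanh(c_t)$$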

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111763
Approved by: https://github.com/albanD
2023-10-22 21:01:35 +00:00
5af97fedd2 [dynamo] Fix context wrapping grad mode variable (#111534)
Fixes https://github.com/pytorch/pytorch/issues/111528

Makes use of `ContextWrappingVariable` so that the function will enter the grad mode whenever it is called, and exit once it is done calling.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111534
Approved by: https://github.com/jansel
2023-10-22 20:55:48 +00:00
798efab532 Fix S367052 to unblock ICVR MC3 (#109937)
Summary: Somehow "getitem" started to receive a Tensor starting from ads_ranking:996, which broke the SDD-pipelining FX transformer. We need to skip the Tensor node in annotation.

Test Plan:
N4326037
with ads_ranking kernel
# Before
ads_ranking:v996
 {F1100009226}
# With this diff
 {F1100009310}

Differential Revision: D49567615

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109937
Approved by: https://github.com/xush6528
2023-10-22 20:24:26 +00:00
c4ab229a82 [dynamo] Implement set.__contains__ for Tensor as object match of FakeTensor (#111738)
Fixes https://github.com/pytorch/pytorch/issues/111556

Dynamo implementation of `set.__contains__` previously used `__eq__` match.

But this is wrong when `__eq__` match does not imply `__hash__` match, as is the case for `torch.Tensor`, leading to inconsistent results. See: https://github.com/pytorch/pytorch/issues/111542

Hence we implement it as a Tensor object match, i.e. a match on the proxy node's `'example_value'` FakeTensor.
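
For reference, eager Python set membership on tensors is effectively object identity (Tensor hashes are id-based, and the identity shortcut fires before `__eq__`), which is the behavior matched here:

```
import torch

t = torch.tensor([1.0])
s = {t}
print(t in s)                    # True: same object
print(torch.tensor([1.0]) in s)  # False: equal values, different object
```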

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111738
Approved by: https://github.com/lezcano
2023-10-22 17:40:34 +00:00
977d3bcc46 [Inductor] Support user defined triton kernels in inductor (#111434)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111434
Approved by: https://github.com/jansel
2023-10-22 17:04:19 +00:00
e2e1189f41 [dynamo] Fix guard for ndarray calling torch.as_tensor(None) (#111665)
Fixes https://github.com/pytorch/pytorch/issues/111662

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111665
Approved by: https://github.com/lezcano
2023-10-22 15:16:21 +00:00
8e60d646b9 [dynamo][stream]support device-agnostic stream in dynamo and capture stream/event method in fx graph (#108312)
This PR implements two things:
1. support for device-agnostic stream and runtime APIs captured by dynamo.
2. support for stream methods (including events) captured by dynamo.

Here are the details for the first item.
Previously the stream captured in dynamo was tightly bound to CUDA. Here we implement a global singleton container named `StreamMethodContainer` for different backends to register their associated stream methods with dynamo. When the backend's package is imported, the stream operations can be registered directly by calling

```
device_stream_method = {'current_stream': method_1,
                        'create_stream_context': method_2,
                        'set_stream': method_3,
                        'set_stream_by_id': method_4}
torch._dynamo.stream.register_stream_method(device_name, device_stream_method)
```

Stream methods need to be passed to this API according to the precise semantics represented by the dict keys in `device_stream_method`. After registration, these methods can be used by dynamo to capture the stream operations in users' scripts, for example getting the current stream or setting a specific stream. Additionally, the wrapped stream variable and the stream context variable are now device-agnostic, and the proxy functions of these variables are assigned from the associated methods in the container. All of this is illustrated below.

![image](https://github.com/pytorch/pytorch/assets/74231238/37ac7350-c539-4167-9886-c3744ecab65d)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108312
Approved by: https://github.com/jansel, https://github.com/jgong5
2023-10-22 13:22:58 +00:00
57c7aa12db Remove deprecated fbgemm operators (#104535)
These operators are not used and have been deprecated since #72690 (Feb 2022). Additionally, the `torch.jit.quantized` interface has been deprecated since #40102 (June 2020).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104535
Approved by: https://github.com/ezyang
2023-10-22 06:10:09 +00:00
bf01a7b023 [3/N] Merge skipfiles.check rules (#111451)
The major change in this PR is to consolidate the skipfiles.check rules; the main thing done is merging the original ```FILE_INLINELIST``` and ```SUBMOD_INLINELIST``` into a new ```MOD_INLINELIST``` and a legacy ```LEGACY_MOD_INLINELIST```.
Let's use the following example to illustrate the expected behavior of this force-inline list:
fa995626a8/torch/_dynamo/skipfiles.py (L344-L369)

The handling logic is:
* If f2 is inlined, we check both ```MOD_INLINELIST``` and ```LEGACY_MOD_INLINELIST``` to consult the force-inline rules for f3.
* If f2 is skipped, we check only ```LEGACY_MOD_INLINELIST``` for the inline rules for f3.

The reason behind this design is: if f2 is skipped and we always trace all recursively called functions, we end up in very low-level functions (e.g., ```super().__init__```), which causes graph breaks. We treat this as a signal that all functions f2 recursively calls should be skipped as well if f2 is skipped. This is also a feature that many PyTorch developers requested: they just want to skip all recursive functions if they mark the upper-level functions as skipped.

For PyTorch developers, we should only use ```MOD_INLINELIST``` going forward. Most of the modules in ```LEGACY_MOD_INLINELIST``` are legacy workarounds from when we didn't have a good skip/inline API.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111451
Approved by: https://github.com/ezyang
2023-10-22 04:35:15 +00:00
61461f39d1 [dtensor] handle negative dim and fix TP regression (#111750)
TP style still has some regression due to negative dim specifications;
fix it by allowing the DTensor API to handle negative dims and normalize them.

i.e. TP uses `Shard(-1)` and then tries to redistribute `Shard(1) -> Shard(-1)`. This should ideally be a no-op, but currently it runs a decompose-sharding phase which turns this transformation into `Shard(1) -> Replicate -> Shard(-1)`; that is wrong and triggers unnecessary allgathers.
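
A minimal sketch of the normalization; the helper name is hypothetical, since DTensor has its own utilities for this:

```
def normalize_dim(dim: int, ndim: int) -> int:
    """Map a possibly negative dim into [0, ndim)."""
    return dim + ndim if dim < 0 else dim

assert normalize_dim(-1, 2) == 1  # Shard(-1) on a 2-d tensor is the same as Shard(1)
```
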
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111750
Approved by: https://github.com/rohan-varma
2023-10-22 04:25:45 +00:00
1d291e1f19 [dtensor] hide xla imports to avoid warning (#111751)
xla imports throw warnings when xla is not installed; we should only
import xla when needed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111751
Approved by: https://github.com/rohan-varma
2023-10-22 04:09:10 +00:00
c9ca0dde0d python_arg_parser + dynamic shapes: fix segfault coercing symint to intlist (#111642)
Fixes https://github.com/pytorch/pytorch/issues/104812.

As of https://github.com/pytorch/pytorch/pull/111216, the python arg parser will now guard and cast symints from dynamo into ints when it is forced to (e.g. when we pass a symint to an op that only accepts ints).

But the python arg parser also has logic to try to coerce ints into int[] - we need the same logic for symint -> int[].
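
For reference, the int -> int[] coercion in question is what lets a bare int stand in for a size list in calls like this; the fix extends the same coercion to SymInt under dynamo:

```
import torch

x = torch.ones(2, 3)
y = x.reshape(6)  # a bare int is coerced to int[1] by the python arg parser
```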

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111642
Approved by: https://github.com/ezyang, https://github.com/albanD
ghstack dependencies: #111553
2023-10-22 02:27:14 +00:00
62942b075c dynamo: graph break on resize_ (#111553)
AOTAutograd's handling for resize_() isn't fully robust (and on top of that, functionalization can potentially give up and raise an error if the tensor you're resizing has outstanding views).

So given that, and given that resize_() is rare, I updated dynamo to graph break on resize_() instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111553
Approved by: https://github.com/ezyang
2023-10-22 02:27:14 +00:00
f0cde8613c Revert "Use fmt::format in NCCLUtils and ProcessGroupNCCL instead of c10::str (#107268)"
This reverts commit 6c56e1ce2b8d850eb8f51731ecc8be415160e02b.

Reverted https://github.com/pytorch/pytorch/pull/107268 on behalf of https://github.com/jansel due to Breaks build on Ubuntu 23.04 ([comment](https://github.com/pytorch/pytorch/pull/107268#issuecomment-1773960355))
2023-10-22 01:03:30 +00:00
cc776d2186 [PyTorch Pinned Allocator] Create per thread task pool for mapping memory space (#111545)
Differential Revision: D50443865

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111545
Approved by: https://github.com/zdevito
2023-10-22 00:23:49 +00:00
7bd004297a [inductor] Move inductor ops to CompositeExplicitAutograd (#111702)
Relands #111274
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111702
Approved by: https://github.com/voznesenskym
ghstack dependencies: #111700, #111701
2023-10-21 17:31:43 +00:00
1a528c826e [Compiled Autograd] Error if tensor_post_acc_grad_hooks is set (#111701)
Relands #111273
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111701
Approved by: https://github.com/voznesenskym
ghstack dependencies: #111700
2023-10-21 17:31:43 +00:00
a1154e673b [Compiled Autograd] Turn accumulate_grad into an op (#111700)
Relands #111271

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111700
Approved by: https://github.com/voznesenskym
2023-10-21 17:31:09 +00:00
cyy
39f484646b [4/N] Apply clang-tidy to aten/src/ATen/core (#111406)
Applies clang-tidy to more aten/src/ATen/core/* files

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111406
Approved by: https://github.com/Skylion007
2023-10-21 15:14:00 +00:00
47eed65481 [dynamo] Add is_ support for Tensors, force get_fake_value to reuse previously computed example_value if available (#111565)
Use FakeTensor id match as equivalent to object identity match

cc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111565
Approved by: https://github.com/ezyang
2023-10-21 13:56:30 +00:00
9455af58b5 [easy][dynamo] Cleanup guard builder selection (#111723)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111723
Approved by: https://github.com/jon-chuang, https://github.com/jansel
2023-10-21 10:48:32 +00:00
cc28b9c10a Fixed a memory leak in PyTorchFileReader (#111703)
Fixes #111330.

This PR prevents `PyTorchFileReader` from leaking memory when initialized with an already opened file handle instead of a file name.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111703
Approved by: https://github.com/Skylion007
2023-10-21 10:11:43 +00:00
344fc98991 [dynamo] fix: SetVariable should test Tensor identity based example_value FakeTensor, not fx.Node (#111696)
FX Node changes after in-place op. FakeTensor remains the same.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111696
Approved by: https://github.com/ezyang
2023-10-21 08:49:21 +00:00
d054078b74 Fix missing guards from logs (#111698)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111698
Approved by: https://github.com/suo, https://github.com/voznesenskym
2023-10-21 07:17:09 +00:00
9c9f66c042 [TorchFix] Update old pretrained TorchVision API in tests (#111708)
For TorchVision models, `pretrained` parameters have been deprecated in favor of "Multi-weight support API" - see https://pytorch.org/vision/0.15/models.html

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111708
Approved by: https://github.com/NicolasHug
2023-10-21 07:05:33 +00:00
920c9adcc6 [MetaTensor] fix inplace copy for meta tensor (#111705)
Fixes #105685

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111705
Approved by: https://github.com/ezyang
2023-10-21 06:02:37 +00:00
5737545467 [vision hash update] update the pinned vision hash (#111720)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111720
Approved by: https://github.com/pytorchbot
2023-10-21 05:09:06 +00:00
3c4581d613 Remove outdated declarations from setup.py (#110660)
`-Wno-deprecated-declarations` should not be needed now that Python 2 is no longer supported.

The Clang issue behind `-Wno-missing-braces` was fixed in 2018.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110660
Approved by: https://github.com/huydhn, https://github.com/atalman, https://github.com/malfet
2023-10-21 04:55:44 +00:00
c84c86f018 SymIntify convolution (#111599)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111599
Approved by: https://github.com/wanchaol, https://github.com/bdhirsh
2023-10-21 03:03:20 +00:00
0a147fd112 Pointwise fuse cat with pointwise inputs or outputs and <= 4 inputs (#111233)
Improves perf of llama_v2 locally from 1.55 -> 1.57

The initial heuristic is to lower to pointwise if the number of inputs is <= 4 and all the inputs are pointwise or cannot be memory-planned away, or if all the outputs are pointwise.

The perf run was +3% on inference. There are definitely instances where we should be lowering to foreach kernels, but they are less flexible for fusion. The motivating example was:

```
def rotate_half(x):
    """Rotates half the hidden dims of the input."""
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin):
    iota =  torch.ops.prims.iota.default(512, start = 0, step = 1, dtype = torch.int64, device = device(type='cuda', index=0), requires_grad = False)

    # File: /scratch/eellison/work/torchdynamo/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py:657, code: position_ids = position_ids.unsqueeze(0).view(-1, seq_length)
    unsqueeze = torch.ops.aten.unsqueeze.default(iota, 0)
    position_ids = torch.ops.aten.reshape.default(unsqueeze, [-1, 512]);  unsqueeze = None

    # The first two dimensions of cos and sin are always 1, so we can `squeeze` them.
    cos = cos.squeeze(1).squeeze(0)  # [seq_len, dim]
    sin = sin.squeeze(1).squeeze(0)  # [seq_len, dim]
    cos = cos[position_ids].unsqueeze(1)  # [bs, 1, seq_len, dim]
    sin = sin[position_ids].unsqueeze(1)  # [bs, 1, seq_len, dim]
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```

Also not sure if I should be more worried about concatting reduction->pointwise inputs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111233
Approved by: https://github.com/Chillee
2023-10-21 02:34:05 +00:00
03da0694b7 Fix buffer overflow in torch.sort (#111672)
By updating the fbgemm submodule.
Add a regression test for it (though it can probably be limited to just CPU, as the reproducer only works if num_threads is 1).

Also, update call sites of `fbgemm::GenerateEmbeddingSpMDM` to pass `isbf16` twice, to match API changes introduced in https://github.com/pytorch/FBGEMM/pull/1851

Fixes https://github.com/pytorch/pytorch/issues/111189 and https://github.com/pytorch/pytorch/issues/111710

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111672
Approved by: https://github.com/Skylion007
2023-10-21 02:30:11 +00:00
62df159c3f move tf override tensor to torch_function.py (#111714)
Moves TensorWithTFOverride to torch_function.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111714
Approved by: https://github.com/eellison, https://github.com/voznesenskym
2023-10-21 02:29:01 +00:00
5034e98393 Fix create source distribution step for release (#111697)
This is fixing following failure in the release branch:
```
cp: cannot create directory '/tmp/pytorch-release/2.1': No such file or directory
```
Link: https://github.com/pytorch/pytorch/actions/runs/6591657669/job/17910724990

cp will report that error if the parent directory (pytorch-release in this case) does not exist.
This works in main since ``PT_RELEASE_NAME: pytorch-main``; however, for release it's ``PT_RELEASE_NAME: pytorch-release/2.1``.

Test:
```
export tag_or_branch=release/2.1
tag_or_branch="${tag_or_branch//\//_}"
echo $tag_or_branch
release_2.1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111697
Approved by: https://github.com/huydhn, https://github.com/osalpekar
2023-10-21 01:57:23 +00:00
8376079b97 [DTensor][XLA] Support Xla backend in distribute_tensor API (#110275)
This addresses #92909 and enables XLA backend support for the `distribute_tensor` API.

Test plan: added a unit test case & tested with CloudTPU. The CI should skip this unless it's an XLA workflow.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110275
Approved by: https://github.com/wanchaol, https://github.com/alanwaketan, https://github.com/JackCaoG
2023-10-21 01:17:15 +00:00
ff864efd53 [DCP][Test] Add use_dtensor subtests for test_state_dict FSDP test (#111615)
As title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111615
Approved by: https://github.com/fegin
2023-10-21 00:44:41 +00:00
cb2fef1f47 [DCP][Test] Update fine-tune e2e test to use init_device_mesh and DTensor state_dict (#111598)
As title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111598
Approved by: https://github.com/fegin
2023-10-21 00:37:00 +00:00
7709382b50 Fix regression in torch.equal behavior for NaNs (#111699)
`torch.equal(x, x)` should return False if `x` is a tensor of floats, one of which is NaN.
This renders some of the optimizations proposed in https://github.com/pytorch/pytorch/pull/100024 invalid, though as a result `torch.equal` will become much slower for identical floating-point tensors.

Add regression test that calls torch.equal for tensor containing NaN
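
The fixed behavior, for illustration:

```
import torch

x = torch.tensor([1.0, float("nan")])
print(torch.equal(x, x))  # False: NaN compares unequal to itself
```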

Fixes https://github.com/pytorch/pytorch/issues/111251

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111699
Approved by: https://github.com/Skylion007, https://github.com/albanD
2023-10-21 00:02:45 +00:00
aa24459595 [NCCL][CUDA][CUDA Graphs] Flush enqueued work before starting a graph capture 2 (#110665)
Alternative to #104487.

Several have chimed in that #104487 introduces a dependency from torch (c10d) to ATen, which is considered backward and messy. This alternative switches the dependency relationship at the cost of requiring graphs to potentially do some polling before the capture.

CC @huydhn @malfet @Aidyn-A @ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110665
Approved by: https://github.com/kwen2501
2023-10-20 23:57:43 +00:00
f9d45f63dd [torch] Add LOAD_METHOD_SUPER and LOAD_ATTR_SUPER (#111707)
Summary:
Cinder has two new opcodes which optimize `super()` in classes. This implements
the opcodes for `torch._dynamo`.

Test Plan:
```
buck2 test mode/opt-split-dwarf aps_models/ads/icvr/... -c fbcode.use_cinder_fast_test=true
```
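
For reference, the kind of call these opcodes specialize is an ordinary zero-argument `super()` method call (a sketch):

```
class Base:
    def forward(self, x):
        return x

class Child(Base):
    def forward(self, x):
        # zero-arg super(): compiled to LOAD_METHOD_SUPER/LOAD_ATTR_SUPER under Cinder
        return super().forward(x)
```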

Differential Revision: D50516475

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111707
Approved by: https://github.com/jansel
2023-10-20 23:50:42 +00:00
9b499b417e [BE]: Apply subprocess check to github scripts (#111684)
Add subprocess checks to raise exceptions in GitHub scripts.
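
A minimal sketch of the change pattern; the command is illustrative:

```
import subprocess

# before: a non-zero exit status is silently ignored
subprocess.run(["git", "fetch", "origin"])
# after: check=True raises CalledProcessError on failure
subprocess.run(["git", "fetch", "origin"], check=True)
```
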
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111684
Approved by: https://github.com/albanD
2023-10-20 23:37:57 +00:00
43c211facb [quant][pt2e] Actually support transitive sharing for SharedQuantizationSpec (#111172)
Summary:
Previously we did not actually support this; this PR adds the support.

Next:
* clean up the insert-observer logic
* add an allow_transitive_sharing boolean flag to let people turn this off for certain edges

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_shared_qspec_transitivity

Differential Revision: [D50250789](https://our.internmc.facebook.com/intern/diff/D50250789)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111172
Approved by: https://github.com/kimishpatel
2023-10-20 23:25:17 +00:00
1ad0f0b308 [BE]: remove unnecessary enumerate calls (#111690)
Remove unnecessary enumerate calls; these are entirely automated fixes, so probably reasonably low risk.
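
The shape of the automated fix, for illustration:

```
items = ["a", "b", "c"]

# before: the index from enumerate is never used
for _i, item in enumerate(items):
    print(item)

# after: iterate directly
for item in items:
    print(item)
```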

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111690
Approved by: https://github.com/malfet
2023-10-20 23:20:29 +00:00
c2a248bdb3 Revert "[ROCm] Unskip functorch tests that now work (#110760)"
This reverts commit 71b35862d3f4ebf0285370d2224b0d0efb118321.

Reverted https://github.com/pytorch/pytorch/pull/110760 on behalf of https://github.com/izaitsevfb due to Lint failure ([comment](https://github.com/pytorch/pytorch/pull/110760#issuecomment-1773490896))
2023-10-20 23:04:49 +00:00
e9422b1fb0 Fix test listing error (#111630)
Summary: Fix fbcode internal test listing error

Test Plan: buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/inductor:max_autotune -- --run-disabled

Differential Revision: D50485766

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111630
Approved by: https://github.com/desertfire
2023-10-20 23:00:18 +00:00
101210e2ce [dynamo] cast single-elem tensors to float and int (#111518)
Fixes https://github.com/pytorch/pytorch/issues/109538

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111518
Approved by: https://github.com/ezyang
2023-10-20 22:53:58 +00:00
079394e9d6 [documentation] adding desc for adaptive_autorange (#111612)
Summary: The missing description prevented `adaptive_autorange` from showing up in the docs.

Test Plan: no functional changes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111612
Approved by: https://github.com/cpuhrsch
2023-10-20 22:38:39 +00:00
4c6e85365f Add NVIDIA license to comm_analysis.py (#111670)
We adapted the cost model from NCCL code, so we should apply their license here as well.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111670
Approved by: https://github.com/Chillee, https://github.com/wanchaol
2023-10-20 21:34:35 +00:00
71b35862d3 [ROCm] Unskip functorch tests that now work (#110760)
This PR unskips some of the working tests that were skipped as a result of https://github.com/pytorch/pytorch/issues/96560.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110760
Approved by: https://github.com/zou3519
2023-10-20 21:33:56 +00:00
303c54dbd9 [dynamo] share a subgraph tracer across fwd and bwd in autograd.Function (#111588)
Fixes https://github.com/pytorch/pytorch/issues/111031

The current design of autograd.Function tracing in dynamo is that we:

1) speculate fwd, and if it's fine,
2) speculate bwd, and if it's fine,
3) install the .apply in the graph alongside fwd guards

The mechanism for doing so involves creating HOPs for fwd, bwd, and apply. The speculation for fwd and bwd create their own subtracer. This is fine, until a proxy created in fwd is used in bwd.

For a simple example, consider:

```
 class Foo(Function):
            @staticmethod
            def forward(ctx, x):
                ctx.x0 = x.size(0)
                return x * 2

            @staticmethod
            def backward(ctx, grad_out):
                return grad_out * ctx.x0
```
the value stored at `x0` is a proxy, but it is a proxy belonging to the fwd speculation subtracer. Rather than teaching the bwd subtracer about it, we choose to create a single subtracer that covers both fwd and bwd speculation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111588
Approved by: https://github.com/zou3519
2023-10-20 21:32:02 +00:00
bdba54fb4d [HigherOrderOp] use assertExpectedInline for control flow tests (#111610)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111610
Approved by: https://github.com/zou3519
2023-10-20 21:07:00 +00:00
8ffbc36f8f [Pytorch][Vulkan] Fix the implementation of aten::sum.dim_IntList (#111586)
Summary:
The existing implementation of `aten::sum.dim_IntList` performs the following steps:
- store the items of the argument `opt_dim` in a `std::set<int64_t> dims_set;`
- iterate through `dims_set` in reverse order (i.e. from largest to smallest) and compute the sum for one designated dim in `sum_dim`

But when `opt_dim` contains negative items and `keepdim==false`, the dimension iteration over the set gets messed up. For example, the existing implementation fails at the test case `test_sum_dim({10, 7, 5}, {-1, -2});`.

We fix the issue by invoking `int64_t dim_normalized = utils::normalize(d, self.dim());` to get a normalized dim in the range [0, `self.dim()` - 1].

Moreover, the existing TORCH_CHECK of the condition
```
d >= -self.dim() - 1 && d <= self.dim()
```
is wrong and fixed by
```
d >= -self.dim() && d < self.dim()
```
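
For reference, the CPU equivalent of the previously failing Vulkan test case `test_sum_dim({10, 7, 5}, {-1, -2})`:

```
import torch

x = torch.rand(10, 7, 5)
y = torch.sum(x, dim=[-1, -2])  # negative dims, keepdim defaults to False; y.shape == (10,)
```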

Test Plan:
```
[luwei@devbig984.prn1 /data/users/luwei/fbsource (04b08a835)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run fbcode/mode/dev-nosan //xplat/caffe2:pt_vulkan_api_test_bin -- --gtest_filter="*sum*"
Building: finished in 0.1 sec (100%) 339/339 jobs, 0/339 updated
  Total time: 0.2 sec
BUILD SUCCEEDED
Running main() from third-party/googletest/1.11.0/googletest/googletest/src/gtest_main.cc
Note: Google Test filter = *sum*
[==========] Running 8 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 8 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.cumsum
[       OK ] VulkanAPITest.cumsum (105 ms)
[ RUN      ] VulkanAPITest.sum_invalid_inputs
[       OK ] VulkanAPITest.sum_invalid_inputs (0 ms)
[ RUN      ] VulkanAPITest.sum_dim_2d
[       OK ] VulkanAPITest.sum_dim_2d (145 ms)
[ RUN      ] VulkanAPITest.sum_dim_3d
[       OK ] VulkanAPITest.sum_dim_3d (91 ms)
[ RUN      ] VulkanAPITest.sum_dim_4d
[       OK ] VulkanAPITest.sum_dim_4d (89 ms)
[ RUN      ] VulkanAPITest.sum_dim_keepdim_2d
[       OK ] VulkanAPITest.sum_dim_keepdim_2d (63 ms)
[ RUN      ] VulkanAPITest.sum_dim_keepdim_3d
[       OK ] VulkanAPITest.sum_dim_keepdim_3d (135 ms)
[ RUN      ] VulkanAPITest.sum_dim_keepdim_4d
[       OK ] VulkanAPITest.sum_dim_keepdim_4d (4 ms)
[----------] 8 tests from VulkanAPITest (637 ms total)

[----------] Global test environment tear-down
[==========] 8 tests from 1 test suite ran. (637 ms total)
[  PASSED  ] 8 tests.
```

Reviewed By: yipjustin

Differential Revision: D50442152

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111586
Approved by: https://github.com/yipjustin
2023-10-20 20:33:06 +00:00
e4e7d34fe9 [pt2][quant] Clean up QAT get conv-bn-relu nodes (#111515)
Summary: Reduces duplicate code to map original matched nodes
to replacement nodes.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT

Reviewers: jerryzh168

Subscribers: jerryzh168, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111515
Approved by: https://github.com/jerryzh168
2023-10-20 20:01:38 +00:00
cc37d8d3f8 [Easy] Fixed typo in init_device_mesh note (#111658)
It has been a while since I landed a PR...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111658
Approved by: https://github.com/H-Huang, https://github.com/wz337
2023-10-20 19:49:38 +00:00
14c2f296e0 Don't suppress original error message for data-dependent value (#111596)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111596
Approved by: https://github.com/suo
2023-10-20 19:38:50 +00:00
ba04d84089 S390x inductor support (#111367)
Use arch compile flags. They are needed for vectorization support on s390x.
Implement new helper functions for inductor.

This change fixes multiple tests in test_cpu_repro.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111367
Approved by: https://github.com/ezyang
2023-10-20 19:38:46 +00:00
8d03a0dd75 [ez] Remove extraneous files (#111668)
Accidentally added by https://github.com/pytorch/pytorch/pull/111504

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111668
Approved by: https://github.com/atalman
2023-10-20 19:34:59 +00:00
fdc29f58c6 [TP] Refactor style to make it work with torch.compile (#111625)
We are refactoring the parallel style to address the following:
1. Further simplify the code logic to make it more readable for users.
2. Remove the tuple check so that we can work with dynamo for now. Ideally dynamo needs to support this case, and we will fix it in parallel.
3. Add tests for the newly added parallel style in UT and torch compile tests so that we can catch regressions due to code changes.
4. Move the placements early-return check into DTensor since it is bypassed by dynamo.
5. Remove PairwiseParallelStyle from unit tests in favor of the new Col/Rowwise parallel styles.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111625
Approved by: https://github.com/wanchaol
2023-10-20 19:20:43 +00:00
d1afb7d43d add Half support for multinomial on CPU (#104178)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104178
Approved by: https://github.com/jgong5, https://github.com/kulinseth, https://github.com/cpuhrsch
2023-10-20 19:16:04 +00:00
d1110a18de [Dynamo]make sure resume function have valid names (#111635)
An ongoing effort for https://github.com/pytorch/pytorch/issues/111633 .

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111635
Approved by: https://github.com/ezyang, https://github.com/jansel
2023-10-20 18:54:52 +00:00
a55ecec195 [dynamo][__torch_function__ 2/n] Refactor TensorWithTFOverrideVariable (#109556)
This is purely a refactor that preserves the existing behavior and tests.

The main contributions of the PR are to refactor the dispatch of `__torch_function__`, enabling calls with TF override objects in any argument position and matching the eager dispatch behavior.

This will allow for the following in upcoming PRs:

1) have TensorWithTFOverrideVariable inherit from TensorVariable
2) enable tracing through the base `__torch_function__` implementation.

Note: this depends on https://github.com/pytorch/pytorch/pull/109542

towards tracing for https://github.com/pytorch/pytorch/issues/93723

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109556
Approved by: https://github.com/jansel, https://github.com/ezyang
2023-10-20 18:53:38 +00:00
11a3c7696b [dynamo - testing] Add repro for higher order op list inputs (#111647)
Add repro from https://github.com/pytorch/pytorch/issues/110118 now that it has been fixed

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111647
Approved by: https://github.com/ezyang
2023-10-20 18:23:23 +00:00
9656ef88b6 [sigmoid] Switch to oss serializer. (#111455)
Differential Revision: D50348807

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111455
Approved by: https://github.com/tugsbayasgalan
2023-10-20 18:19:05 +00:00
974c47a20e remove flatten.using_ints, linalg_*, linear, log_softmax.int, logdet, special_* from xfail list (#110985)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110985
Approved by: https://github.com/Skylion007, https://github.com/zou3519
2023-10-20 18:15:39 +00:00
8df42f9220 [PyTorch][Vulkan] Allow 0-size tensors to be represented in PyTorch Vulkan (#111512)
Summary:
0-size tensors are allowed in PyTorch (e.g. a tensor with size {2, 1, 0}). However, this currently causes issues with PyTorch Vulkan as the Vulkan API would raise an error when attempting to allocate a resource with no memory.

This diff fixes the behaviour by adding support for `VulkanImage` and `VulkanBuffer` objects that do not have any associated memory.

Test Plan:
Tested locally with `vulkan_api_test` on Mac as a sanity test.
```
buck run //xplat/caffe2:pt_vulkan_api_test_bin --target-platforms ovr_config//platform/macos:x86_64-fbsource -- --gtest_filter="*"
```

But given how foundational of a change this is, more extensive testing should be done in order to be safe.

Reviewed By: yipjustin

Differential Revision: D50030659

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111512
Approved by: https://github.com/yipjustin
2023-10-20 17:46:13 +00:00
2452e65960 [BE] More nested namespaces (#111575)
### <samp>🤖 Generated by Copilot at bb8fede</samp>

Simplify the syntax of various namespace definitions and declarations in `aten/src/ATen/cpu` and `aten/src/ATen/metal` files to improve code readability and consistency.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111575
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
2023-10-20 16:43:57 +00:00
a267d95c2a Reland: Add lazy_clone_storage to create COW storages (#111579)
Relands #110192

NOTE: COW storages do not actually copy on write yet, they just have the COW deleter and deleter context applied to them

Part of #109833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111579
Approved by: https://github.com/ezyang
2023-10-20 15:49:59 +00:00
619ae87a1d Disable inductor layout_opt on ROCm (#111474)
Previously we disabled this option on non-MI200 GPUs (https://github.com/pytorch/pytorch/pull/107812) due to worse NHWC conv performance on some cards. This PR disables the feature for all GPUs to make behavior uniform for ROCm, and due to the perf regressions noted here: https://github.com/pytorch/pytorch/pull/110319

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111474
Approved by: https://github.com/jithunnair-amd, https://github.com/eellison
2023-10-20 09:31:01 +00:00
3ca81aed42 Add sdpa to Autocast CPU (#111558)
Fixes #111276

This PR adds sdpa to Autocast CPU policy.

Behavior: Within the scope of `torch.cpu.amp.autocast(dtype=torch.bfloat16)`, `scaled_dot_product_attention` will be forced to run with bf16 data type.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111558
Approved by: https://github.com/colesbury
2023-10-20 05:30:09 +00:00
6c56e1ce2b Use fmt::format in NCCLUtils and ProcessGroupNCCL instead of c10::str (#107268)
Fixes #64604

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107268
Approved by: https://github.com/fduwjj
2023-10-20 05:26:51 +00:00
37253c0cd5 Update RUFF to 0.1.1 (#111618)
Updates ruff to the latest version with some bugfixes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111618
Approved by: https://github.com/colesbury
2023-10-20 04:46:24 +00:00
ff835fb464 [AOTInductor] Disable NonABI tests in fbcode (#111616)
Summary: NonABI mode is not intended to be used in fbcode.

Test Plan: buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/test/inductor:test_aot_inductor

Differential Revision: D50478575

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111616
Approved by: https://github.com/desertfire, https://github.com/khabinov
2023-10-20 04:37:05 +00:00
e24fdfa177 [vision hash update] update the pinned vision hash (#111624)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111624
Approved by: https://github.com/pytorchbot
2023-10-20 04:09:18 +00:00
93a9b1314b Make step() faster by passing in a tensor vs scalar 1 (#111084)
This is the culminated result of https://github.com/pytorch/pytorch/pull/110954#issuecomment-1758520411.

We are making the code slightly more complicated to gain some perf by minimizing calls to `.copy_()` and `.to()`.

### Code
```
import torch
with torch.cuda.device(0):
    steps = [torch.zeros((), device="cpu", dtype=torch.float32) for i in range(1000)]

    with torch.profiler.profile(
        activities=[
            torch.profiler.ProfilerActivity.CPU,
            torch.profiler.ProfilerActivity.CUDA,
        ]
    ) as p:
        # New code:
        # step_device = steps[0].device
        # one = torch.tensor(1.0, device=step_device) if str(step_device) == "cpu" else 1
        # torch._foreach_add_(steps, one, 1.0)

        # Old code:
        torch._foreach_add_(steps, 1)

    print(p.key_averages().table(sort_by="cpu_time_total"))
```

### Profiles
**with old code**
```
-------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                     Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-------------------------  ------------  ------------  ------------  ------------  ------------  ------------
      aten::_foreach_add_        35.31%      52.089ms        99.99%     147.495ms     147.495ms             1
               aten::add_        25.05%      36.949ms        64.68%      95.406ms      95.406us          1000
                 aten::to         3.97%       5.852ms        39.63%      58.457ms      58.457us          1000
           aten::_to_copy        10.11%      14.917ms        35.66%      52.605ms      52.605us          1000
              aten::copy_        21.65%      31.939ms        21.65%      31.939ms      31.939us          1000
      aten::empty_strided         3.90%       5.749ms         3.90%       5.749ms       5.749us          1000
    cudaDeviceSynchronize         0.01%      18.000us         0.01%      18.000us      18.000us             1
-------------------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 147.513ms
```

**with new code**
```
-------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                     Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-------------------------  ------------  ------------  ------------  ------------  ------------  ------------
      aten::_foreach_add_        55.06%      49.963ms        99.86%      90.625ms      90.625ms             1
               aten::add_        44.81%      40.662ms        44.81%      40.662ms      40.662us          1000
            aten::detach_         0.01%       8.000us         0.05%      45.000us      45.000us             1
                  detach_         0.04%      37.000us         0.04%      37.000us      37.000us             1
              aten::empty         0.03%      30.000us         0.03%      30.000us      30.000us             1
                 aten::to         0.03%      23.000us         0.03%      23.000us      23.000us             1
    cudaDeviceSynchronize         0.02%      22.000us         0.02%      22.000us      22.000us             1
         aten::lift_fresh         0.01%       6.000us         0.01%       6.000us       6.000us             1
-------------------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 90.751ms
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111084
Approved by: https://github.com/albanD
ghstack dependencies: #111079
2023-10-20 01:34:08 +00:00
ca7d084ff9 Add ScalarTensor or 0dim overload for _foreach_add (#111079)
Adding a Tensor overload will allow us to:
- optimize in more cases than before
- increase coverage for scalarTensor instead of just scalars in our foreach APIs

The main complication in this PR was that add.Tensor has a scalar overload, so I've now built out support for that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111079
Approved by: https://github.com/albanD
2023-10-20 01:34:07 +00:00
935f697754 remove movedim.intlist, tensor_split*, to.* from xfail list (#110999)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110999
Approved by: https://github.com/kshitij12345
2023-10-19 23:54:45 +00:00
652f4c656e Freeze fuse two mms (#111232)
Improves llama_v2 perf locally from 1.48x -> 1.55x.

A good future rewrite would be to unify the freezing batching with the other batching rules that @yanboliang & co were working on. I want to wait for the forthcoming pre-dispatch changes to settle down first.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111232
Approved by: https://github.com/Chillee
2023-10-19 22:52:34 +00:00
cb856b08b2 [BE]: Attach cause to some exceptions and enable RUFF TRY200 (#111496)
Did some easy fixes from enabling TRY200. Most of these seem like oversights instead of intentional. The proper way to silence intentional errors is with `from None` to note that you thought about whether it should contain the cause and decided against it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111496
Approved by: https://github.com/malfet
2023-10-19 21:56:36 +00:00
c90f8c883d [ONNX][s390x] byteswap data when serializing to external files during onnx exporting (#111543)
This patch is a complement to #107963: it byteswaps data when exporting to ONNX, swapping bytes from big endian to little endian when writing the external files of a big ONNX model.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111543
Approved by: https://github.com/justinchuby
2023-10-19 21:44:39 +00:00
8899abde32 [PyTorch][ET] Improve Process Groups Mapping Info Collection (#110908)
Summary:
Process Groups Mapping info collection was introduced in D46321690.

Improve the mapping info collected there:
- replace pg_id (a unique ID for the PG object) with pg_names (a unique name for each pg, shared by all ranks)
- add process-group count info with group_count
- reduce the length of pg_config_info to avoid it being truncated (max length of 4096, now doubled) by
  - replacing ranks (a map from global ranks to group ranks) with the list of global ranks of a pg, since we currently don't use the group rank id
  - using an empty rank list to indicate that all ranks are involved in a pg, and adding a group_size field to show how many ranks are involved

Test Plan:
Tested in HPC
```
buck2 run mode/opt //hpc/torchrec/models/ads:cmf_10x_launcher -- launcher=local data_loader=random data_loader.num_batches=100 checkpoint=model_store max_ind_range=10 launcher.num_trainers=8
```
Example output in ET
```
{
"name": "## process_group:init ##", "id": 3, "rf_id": 1, "parent": 2, "fw_parent": 0, "seq_id": -1, "scope": 7, "tid": 1, "fw_tid": 0, "op_schema": "",
      "inputs": ["[{\"pg_name\": \"0\", \"backend_id\": 140688385794048, \"backend_config\": \"cuda:nccl\", \"ranks\": {\"0\": 0, \"1\": 1, \"2\": 2, \"3\": 3, \"4\": 4, \"5\": 5, \"6\": 6, \"7\": 7}, \"group_count\": 4}, {\"pg_name\": \"1\", \"backend_id\": 140688386762752, \"backend_config\": \"cuda:nccl\", \"ranks\": {\"0\": 0, \"1\": 1, \"2\": 2, \"3\": 3, \"4\": 4, \"5\": 5, \"6\": 6, \"7\": 7}, \"group_count\": 4}, {\"pg_name\": \"2\", \"backend_id\": 140682531798720, \"backend_config\": \"cpu:gloo,cuda:gloo\", \"ranks\": {\"0\": 0, \"1\": 1, \"2\": 2, \"3\": 3, \"4\": 4, \"5\": 5, \"6\": 6, \"7\": 7}, \"group_count\": 4}, {\"pg_name\": \"faa29c0b1e06cd7abc873bd561414911_0\", \"backend_id\": 140672678002688, \"backend_config\": \"cpu:gloo,cuda:gloo\", \"ranks\": {\"0\": 0, \"1\": 1, \"2\": 2, \"3\": 3, \"4\": 4, \"5\": 5, \"6\": 6, \"7\": 7}, \"group_count\": 4}, {\"pg_name\": \"3\", \"backend_id\": 140672678007616, \"backend_config\": \"cpu:gloo,cuda:gloo\", \"ranks\": {\"0\": 0, \"1\": 1, \"2\": 2, \"3\": 3, \"4\": 4, \"5\": 5, \"6\": 6, \"7\": 7}, \"group_count\": 4}, {\"pg_name\": \"faa29c0b1e06cd7abc873bd561414911_1\", \"backend_id\": 140672678012544, \"backend_config\": \"cpu:gloo,cuda:gloo\", \"ranks\": {\"0\": 0, \"1\": 1, \"2\": 2, \"3\": 3, \"4\": 4, \"5\": 5, \"6\": 6, \"7\": 7}, \"group_count\": 4}]"], "input_shapes": [[]], "input_types": ["String"],
      "outputs": [], "output_shapes": [], "output_types": []
    },
```

Before the change, pg_config_info for >128 ranks would be truncated, e.g.
```
"inputs": ["[{\"pg_id\": 140321146893696, \"backend_id\": 140321113854976, \"backend_config\": \"cuda:nccl\", \"ranks\": {\"0\": 0, \"1\": 1, \"2\": 2, \"3\": 3, \"4\": 4, \"5\": 5, \"6\": 6, \"7\": 7, \"8\": 8, \"9\": 9, \"10\": 10, \"11\": 11, \"12\": 12, \"13\": 13, \"14\": 14, \"15\": 15, \"16\": 16, \"17\": 17, \"18\": 18, \"19\": 19, \"20\": 20, \"21\": 21, \"22\": 22, \"23\": 23, \"24\": 24, \"25\": 25, \"26\": 26, \"27\": 27, \"28\": 28, \"29\": 29, \"30\": 30, \"31\": 31, \"32\": 32, \"33\": 33, \"34\": 34, \"35\": 35, \"36\": 36, \"37\": 37, \"38\": 38, \"39\": 39, \"40\": 40, \"41\": 41, \"42\": 42, \"43\": 43, \"44\": 44, \"45\": 45, \"46\": 46, \"47\": 47, \"48\": 48, \"49\": 49, \"50\": 50, \"51\": 51, \"52\": 52, \"53\": 53, \"54\": 54, \"55\": 55, \"56\": 56, \"57\": 57, \"58\": 58, \"59\": 59, \"60\": 60, \"61\": 61, \"62\": 62, \"63\": 63, \"64\": 64, \"65\": 65, \"66\": 66, \"67\": 67, \"68\": 68, \"69\": 69, \"70\": 70, \"71\": 71, \"72\": 72, \"73\": 73, \"74\": 74, \"75\": 75, \"76\": 76, \"77\": 77, \"78\": 78, \"79\": 79, \"80\": 80, \"81\": 81, \"82\": 82, \"83\": 83, \"84\": 84, \"85\": 85, \"86\": 86, \"87\": 87, \"88\": 88, \"89\": 89, \"90\": 90, \"91\": 91, \"92\": 92, \"93\": 93, \"94\": 94, \"95\": 95, \"96\": 96, \"97\": 97, \"98\": 98, \"99\": 99, \"100\": 100, \"101\": 101, \"102\": 102, \"103\": 103, \"104\": 104, \"105\": 105, \"106\": 106, \"107\": 107, \"108\": 108, \"109\": 109, \"110\": 110, \"111\": 111, \"112\": 112, \"113\": 113, \"114\": 114, \"115\": 115, \"116\": 116, \"117\": 117, \"118\": 118, \"119\": 119, \"120\": 120, \"121\": 121, \"122\": 122, \"123\": 123, \"124\": 124, \"125\": 125, \"126\": 126, \"127\": 127}}, {\"pg_id\": 140321074662400, \"backend_id\": 140321100033024, \"backend_config\": \"cuda:nccl\", \"ranks\": {\"0\": 0, \"1\": 1, \"2\": 2, \"3\": 3, \"4\": 4, \"5\": 5, \"6\": 6, \"7\": 7, \"8\": 8, \"9\": 9, \"10\": 10, \"11\": 11, \"12\": 12, \"13\": 13, \"14\": 14, \"15\": 15, \"16\": 16, \"17\": 17, \"18\": 18, \"19\": 19, \"20\": 20, \"21\": 21, \"22\": 22, \"23\": 23, \"24\": 24, \"25\": 25, \"26\": 26, \"27\": 27, \"28\": 28, \"29\": 29, \"30\": 30, \"31\": 31, \"32\": 32, \"33\": 33, \"34\": 34, \"35\": 35, \"36\": 36, \"37\": 37, \"38\": 38, \"39\": 39, \"40\": 40, \"41\": 41, \"42\": 42, \"43\": 43, \"44\": 44, \"45\": 45, \"46\": 46, \"47\": 47, \"48\": 48, \"49\": 49, \"50\": 50, \"51\": 51, \"52\": 52, \"53\": 53, \"54\": 54, \"55\": 55, \"56\": 56, \"57\": 57, \"58\": 58, \"59\": 59, \"60\": 60, \"61\": 61, \"62\": 62, \"63\": 63, \"64\": 64, \"65\": 65, \"66\": 66, \"67\": 67, \"68\": 68, \"69\": 69, \"70\": 70, \"71\": 71, \"72\": 72, \"73\": 73, \"74\": 74, \"75\": 75, \"76\": 76, \"77\": 77, \"78\": 78, \"79\": 79, \"80\": 80, \"81\": 81, \"82\": 82, \"83\": 83, \"84\": 84, \"85\": 85, \"86\": 86, \"87\": 87, \"88\": 88, \"89\": 89, \"90\": 90, \"91\": 91, \"92\": 92, \"93\": 93, \"94\": 94, \"95\": 95, \"96\": 96, \"97\": 97, \"98\": 98, \"99\": 99, \"100\": 100, \"101\": 101, \"102\": 102, \"103\": 103, \"104\": 104, \"105\": 105, \"106\": 106, \"107\": 107, \"108\": 108, \"109\": 109, \"110\": 110, \"111\": 111, \"112\": 112, \"113\": 113, \"114\": 114, \"115\": 115, \"116\": 116, \"117\": 117, \"118\": 118, \"119\": 119, \"120\": 120, \"121\": 121, \"122\": 122, \"123\": 123, \"124\": 124, \"125\": 125, \"126\": 126, \"127\": 127}}, {\"pg_id\": 140321154994304, \"backend_id\": 140319780290048, \"backend_config\": \"cpu:gloo,cuda:gloo\", \"ranks\": {\"0\": 0, \"1\": 1, \"2\": 2, \"3\": 3, \"4\": 4, \"5\": 
5, \"6\": 6, \"7\": 7, \"8\": 8, \"9\": 9, \"10\": 10, \"11\": 11, \"12\": 12, \"13\": 13, \"14\": 14, \"15\": 15, \"16\": 16, \"17\": 17, \"18\": 18, \"19\": 19, \"20\": 20, \"21\": 21, \"22\": 22, \"23\": 23, \"24\": 24, \"25\": 25, \"26\": 26, \"27\": 27, \"28\": 28, \"29\": 29, \"30\": 30, \"31\": 31, \"32\": 32, \"33\": 33, \"34\": 34, \"35\": 35, \"36\": 36, \"37\": 37, \"38\": 38, \"39\": 39, \"40\": 40, \"41\": 41, \"42\": 42, \"43\": 43, \"44\": 44, \"45\": 45, \"46\": 46, \"47\": 47, \"48\": 48, \"49\": 49, \"50\": 50, \"51\": 51, \"52\": 52, \"53\": 53, \"54\": 54, \"55\": 55, \"56\": 56, \"57\": 57, \"58\": 58, \"59\": 59, \"60\": 60, \"61\": 61, \"62\": 62, \"63\": 63, \"64\": 64, \"65\": 65, \"66\": 66, \"67\": 67, \"68\": 68, \"69\": 69, \"70\": 70, \"71\": 71, \"72\": 72, \"73\": 73, \"74\": 74, \"75\": 75, \"76\": 76, \"77\": 77, \"78\": 78, \"79\": 79, \"80\": 80, \"81\": 81, \"82\": 82, \"83\": 83, \"84\": 84, \"85\": 85, \"86\": 86, \"87\": 87, \"88\": 88, \"89\": 89, \"90\": 90, \"91\": 91, \"92\": 92, \"93\": 93, \"94\": 94, \"95\": 95, \"96\": 96, \"97\": 97, \"98\": 98, \"99\": 99, \"100\": 100, \"101\": 101, \"102\": 102, \"103\": 103, \"104\": 104, \"105\": 105, \"106\": 106, \"107\": 107, \"108\": 108, \"109\": 109, \"110\": 110, \"111\": 111, \"112\": 112, \"113\": 113, \"114\""], "input_shapes": [[]], "input_types": ["String"],

```
After the change, the length is reduced:
```
"inputs": ["[{\"pg_name\": \"0\", \"backend_id\": 140551405059072, \"backend_config\": \"cuda:nccl\", \"ranks\": [], \"group_size\": 128, \"group_count\": 4}, {\"pg_name\": \"1\", \"backend_id\": 140551399745536, \"backend_config\": \"cuda:nccl\", \"ranks\": [], \"group_size\": 128, \"group_count\": 4}, {\"pg_name\": \"2\", \"backend_id\": 140578999821184, \"backend_config\": \"cpu:gloo,cuda:gloo\", \"ranks\": [], \"group_size\": 128, \"group_count\": 4}, {\"pg_name\": \"ea2f9024c70c8b9a25bc06a4723e5805_0\", \"backend_id\": 140559197777152, \"backend_config\": \"cpu:gloo,cuda:gloo\", \"ranks\": [], \"group_size\": 128, \"group_count\": 4}, {\"pg_name\": \"3\", \"backend_id\": 140549119076736, \"backend_config\": \"cpu:gloo,cuda:gloo\", \"ranks\": [], \"group_size\": 128, \"group_count\": 4}, {\"pg_name\": \"ea2f9024c70c8b9a25bc06a4723e5805_1\", \"backend_id\": 140571995143424, \"backend_config\": \"cpu:gloo,cuda:gloo\", \"ranks\": [], \"group_size\": 128, \"group_count\": 4}]"], "input_shapes": [[]], "input_types": ["String"],
```

Reviewed By: louisfeng, fduwjj

Differential Revision: D50048147

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110908
Approved by: https://github.com/fduwjj
2023-10-19 21:37:19 +00:00
675df7520a [tgif][multiforward] allow codegen to generate different func name (#111446)
Summary: see Shiyan's design doc for ATM TS publish weights dedupe https://fb.quip.com/HnUVAjUMaXMQ

Test Plan: tested in N4454041 (after D50341352) that the multiforward method works for the TS model

Differential Revision: D45750812

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111446
Approved by: https://github.com/842974287
2023-10-19 21:19:30 +00:00
f0fac6a94f Update gloo submodule commit to include recent ROCm6.0 related updates (#111465)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111465
Approved by: https://github.com/jithunnair-amd, https://github.com/malfet
2023-10-19 21:18:23 +00:00
7a3c3d63bf fix gloo cuda sparse_allreduce dispatch (#111485)
Fixes #111422

allreduce_sparse_cuda gets dispatched to allreduce_sparse, which doesn't exist for gloo. However, gloo has an existing implementation, so this just fixes the dispatching to point to it.

The reason CI didn't catch this is that we were calling the backend directly. Added a test which calls the public API (dist.XYZ) and goes through the dispatcher.
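
A hedged sketch of the public-API path the new test exercises (hypothetical helper; assumes a gloo process group initialized via torchrun and a CUDA build):

```python
import torch
import torch.distributed as dist

def sparse_allreduce_demo():
    # Going through dist.all_reduce hits the dispatcher, which is where the
    # sparse-CUDA-to-gloo routing was broken before this fix.
    t = torch.sparse_coo_tensor([[0, 1]], [1.0, 2.0], (4,), device="cuda")
    dist.all_reduce(t)
    return t
```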

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111485
Approved by: https://github.com/fduwjj
2023-10-19 21:15:45 +00:00
dc31dbbcab Optimize reduction + amax fusion (#111122)
This PR optimizes fusion for cases like layer_norm + fp8 quant (which includes the amax computation and the fp8 cast) when amax is split into multiple reduction kernels.

Benchmark:
```
python test/inductor/test_fp8.py -k test_layernorm_fp8_quant_benchmark

Before this PR:
Config: float8_dtype=torch.float8_e5m2, shape=(4, 2048, 4096).
Benchmark results: Inductor: 0.13262102689486555ms, Eager: 0.8211962616822429ms, LN only Inductor: 0.09606276150627614ms.

After this PR:
Config: float8_dtype=torch.float8_e5m2, shape=(4, 2048, 4096).
Benchmark results: Inductor: 0.08281274131274131ms, Eager: 0.8217452830188678ms, LN only Inductor: 0.09586902286902287ms.
```

LN + fp8 quant is even faster than LN itself. The reason could be that LN + fp8 outputs fp8 while LN outputs fp16.
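
For orientation, a minimal sketch of the fused pattern (shapes, dtypes, and the scaling scheme are assumptions, not taken from the test):

```python
import torch

def ln_fp8_quant(x, weight, bias):
    y = torch.nn.functional.layer_norm(x, x.shape[-1:], weight, bias)
    amax = y.abs().amax()  # the extra reduction that used to split into kernels
    scale = torch.finfo(torch.float8_e5m2).max / amax.clamp(min=1e-12)
    return (y * scale).to(torch.float8_e5m2), amax
```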

From Inductor nightly benchmark test:
There are perf differences in cuda_graph / cuda_graph_dynamic / default runs, but no difference in inductor_max_autotune. So it seems to me that the perf differences are most likely fluctuations.

![Screenshot 2023-10-18 at 4 58 55 PM](https://github.com/pytorch/pytorch/assets/10527447/6640474a-1e1d-4d33-97e9-0a60d0bc9f1f)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111122
Approved by: https://github.com/jansel
2023-10-19 20:53:50 +00:00
786c51d626 Symintify torch.diff (#111530)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111530
Approved by: https://github.com/bdhirsh, https://github.com/ezyang
ghstack dependencies: #111529
2023-10-19 20:38:57 +00:00
74f6f7adcf Fix NT subclass test typo (#111529)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111529
Approved by: https://github.com/jbschlosser
2023-10-19 20:07:04 +00:00
79529ef657 [dynamo] fix graph break when listlike of tensor contains const (#111572)
Fixes https://github.com/pytorch/pytorch/pull/111557#discussion_r1365620968

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111572
Approved by: https://github.com/voznesenskym, https://github.com/lezcano
2023-10-19 19:51:28 +00:00
2a40b7efcb Add Half support for addcmul, addcdiv, cumsum, and topk on CPU (#103319)
Add Half support for addcmul, addcdiv, cumsum, and topk on CPU.
Note: This PR will introduce the issue https://github.com/pytorch/pytorch/issues/111454.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103319
Approved by: https://github.com/jgong5, https://github.com/cpuhrsch
2023-10-19 17:47:45 +00:00
715dfced72 Revert "Nvfuser code removal (#111093)"
This reverts commit 572628e52054b0e061fbaeb0497267380fe45180.

Reverted https://github.com/pytorch/pytorch/pull/111093 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, @albanD please help to support the author with the next steps to get this diff merged ([comment](https://github.com/pytorch/pytorch/pull/111093#issuecomment-1771434853))
2023-10-19 17:39:49 +00:00
ca5f6f7af3 [MPS] Skip virtualized devices (#111576)
Skip devices that do not support `MTLGPUFamilyMac2`, for example something called "Apple Paravirtual device", which started to appear in GitHub CI, from https://github.com/malfet/deleteme/actions/runs/6577012044/job/17867739464#step:3:18
```
Found device Apple Paravirtual device isLowPower false supports Metal false
```

As the first attempt to allocate memory on such a device fails with:
```
RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 1.70 GB). Tried to allocate 0 bytes on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
```

Fixes https://github.com/pytorch/pytorch/issues/111449

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111576
Approved by: https://github.com/atalman, https://github.com/clee2000, https://github.com/huydhn
2023-10-19 17:19:35 +00:00
0617f7fa75 [ez] Remove unused code in upload_test_stats (#111504)
This is code related to parallelism and test times that isn't used, so remove it.

Tested by running locally with `python3 -m tools.stats.upload_test_stats --workflow-run-id 6551035874 --workflow-run-attempt 1 --head-branch main --head-repository "pytorch/pytorch"` and commenting out parts for uploading to s3.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111504
Approved by: https://github.com/huydhn
2023-10-19 16:09:15 +00:00
4e310fd875 [Autograd] Track when mutations are for triton kernels (#111500)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111500
Approved by: https://github.com/bdhirsh
2023-10-19 15:34:34 +00:00
971f67c988 Allow SymInt to specialize to FLOAT (#111219)
Fixes https://github.com/pytorch/pytorch/issues/111200

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111219
Approved by: https://github.com/Skylion007, https://github.com/bdhirsh
ghstack dependencies: #111216
2023-10-19 12:55:18 +00:00
40c44c2307 Force specialization on INT_LIST (#111216)
Follow up on https://github.com/pytorch/pytorch/pull/95479

Fixes https://github.com/pytorch/pytorch/issues/111198

Fixes https://github.com/pytorch/pytorch/issues/111197

Fixes https://github.com/pytorch/pytorch/issues/111188

Fixes https://github.com/pytorch/pytorch/issues/111201

Fixes https://github.com/pytorch/pytorch/issues/111202

I can also do this for some other types, will do this stacked on top.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111216
Approved by: https://github.com/voznesenskym
2023-10-19 12:55:18 +00:00
aa3243bceb [vmap] symintify : is_same_size and split_with_sizes (#111491)
Partial : https://github.com/pytorch/pytorch/issues/111312

Reference: Point 1 of https://github.com/pytorch/pytorch/issues/111312#issuecomment-1769079147

Should this have a test?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111491
Approved by: https://github.com/Skylion007, https://github.com/zou3519
2023-10-19 11:04:40 +00:00
03e28bde2e [tp] fix torch compile regression (#111521)
The most recent refactor of TP
(https://github.com/pytorch/pytorch/pull/111160) breaks the torch.compile
path, so revert to the previous behavior by:
1. using the old default prepare_input/output
2. adding colwise/rowwise parallel tests instead
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111521
Approved by: https://github.com/fduwjj
2023-10-19 10:27:10 +00:00
eqy
894b9957c8 [DOCS][CUDA] Update TF32 docs for sm90 (#111337)
For #110252.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111337
Approved by: https://github.com/msaroufim
2023-10-19 09:36:13 +00:00
503f44fbb8 Fix: preserve input's NaN values to prevent undefined behavior in the matrix_exp function (#111539)
Currently, if the input matrices (small batches) contain NaN values, `matrix_exp` keeps producing a "normal" result without any NaN values in it, which can cause problems we may not notice. This PR prevents such undefined behavior by "bringing back" those NaN values.
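
A hypothetical repro (not from the PR) of the behavior expected after the fix:

```python
import torch

a = torch.randn(2, 3, 3)
a[0, 0, 0] = float("nan")
out = torch.matrix_exp(a)
assert out[0].isnan().any()      # the NaN input must surface in the output
assert not out[1].isnan().any()  # the clean batch element stays finite
```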

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111539
Approved by: https://github.com/lezcano
2023-10-19 09:07:36 +00:00
90e2117a99 Allow optimizer state conversion to accommodate optimizers that have no tensor state (e.g. SGD) (#111501)
Fixes #111499

This PR slightly alters the new fused `all_gather` `optim_state_dict` implementation to support optimizers without tensor state (e.g. SGD) in a `use_orig_params=True` context.

The principal change is to short-circuit `_allgather_orig_param_states` if an empty `state_buffers` dict is returned after completing `_convert_all_state_info` here:
93e5065ba0/torch/distributed/fsdp/_optim_utils.py (L1481-L1484)

To allow `_convert_all_state_info` to accommodate optimizers with no tensor state, I also change the scope of `dtype` and make the return type `Optional`.

As discussed in the issue this PR fixes, I'm [extending](93e5065ba0/test/distributed/fsdp/test_fsdp_optim_state.py (L1915I)) `test_state_dict_with_none_tensor_state` to test with both Adam and SGD optimizers to validate scalar and non-tensor states continue to be restored for both optimizer types.

Thanks to the distributed team as always for their adroit design and exceptionally valuable contributions to the open source ML community. Hope you all feel appreciated commensurate with the compounding progress your work enables.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111501
Approved by: https://github.com/fegin
2023-10-19 06:47:04 +00:00
5ce2ab8466 [cuda] Preserve operations order between vectorized and non-vectorized in ln grad input (#111488)
The vectorized implementation in https://github.com/pytorch/pytorch/pull/111021 changed the order of arithmetic instructions in `layer_norm_grad_input`, causing non-bitwise-identical results when compared to the non-vectorized implementation. At merging, all accuracy checks passed, including internal inductor ones.

There are CI periodic inductor dynamo tests (e.g. `pit_b_224`) that run eager mode models several times and compare results. If the input buffers are aligned to the vector length, the vectorized implementation will be used. If not, the default one will be used. If the 2 eager runs end up having different buffer alignments, 2 implementations will be called and then the results would be very close but not bitwise identical. The tests check for bitwise identical results and in some cases they may fail.

This fix makes sure that the operation order between non-vectorized and vectorized is the same and the 2 implementations **should** produce bitwise identical results.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111488
Approved by: https://github.com/malfet
2023-10-19 06:00:15 +00:00
b2b5f1377b [caffe2] replace numpy.object with object (#111494)
Reviewed By: florazzz

Differential Revision: D50380126

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111494
Approved by: https://github.com/Skylion007
2023-10-19 04:37:00 +00:00
e3463fe4ca [ONNX] Benchmark to store test data along exported model (#111095)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111095
Approved by: https://github.com/justinchuby, https://github.com/thiagocrepaldi
2023-10-19 03:20:52 +00:00
71d7173ab3 Introduce is_big_gpu condition for test_max_autotune (#111467)
Fixes https://github.com/pytorch/pytorch/issues/111527

Other test files that rely on `max_autotune` mode being enabled already condition the UT suite on this check (e.g. test_select_algorithm). Proposing to add this condition for test_max_autotune.

Currently we are observing failures in these UTs on the ROCm runners, but on MI200+ these tests pass again (context: https://github.com/pytorch/pytorch/pull/111381#issuecomment-1768048732)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111467
Approved by: https://github.com/shunting314
2023-10-19 03:05:22 +00:00
4ec777e9a5 [BE] Clean up trymerge code handling broken trunk failures (#111520)
This is the final part of https://github.com/pytorch/pytorch/pull/110054.  The broken trunk classification has been done on Dr.CI, so we can just check for that in trymerge for consistency when ghstack is used.

* [x] https://github.com/pytorch/pytorch/pull/110054
* [x] https://github.com/pytorch/pytorch/pull/110133
* [x] This PR to clean up the broken trunk logic.

One important change is that `get_classifications` doesn't need to query the jobs from Rockset for the head and merge base SHA anymore, saving a query there.  The function looks a lot simpler now.

### Testing

https://github.com/pytorch/pytorch/pull/111253 had 1 broken trunk failure as detected by Dr.CI from the base commit 3eb5cae3af (valid) while trymerge didn't detect that because ghstack base commit be8e517174 didn't have the same failure (miss).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111520
Approved by: https://github.com/clee2000
2023-10-19 02:30:56 +00:00
4f0cf1e1ff Mark more decomp tests as slow (#111524)
Something is broken with automatic slow detection, so let's do it manually

Those tests were previously classified as slow, see:
```
test_decomp.py::TestDecompCUDA::test_quick_core_backward_baddbmm_cuda_float64 SKIPPED [0.0003s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 53%]
test_decomp.py::TestDecompCUDA::test_quick_core_backward_clamp_max_cuda_float64 SKIPPED [0.0002s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 53%]
test_decomp.py::TestDecompCUDA::test_quick_core_backward_clamp_min_cuda_float64 SKIPPED [0.0002s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 53%]
```
from https://ossci-raw-job-status.s3.amazonaws.com/log/17792633247

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111524
Approved by: https://github.com/kit1980, https://github.com/izaitsevfb, https://github.com/huydhn
2023-10-19 02:29:59 +00:00
18cc8a92ac [ProcessGroupNCCL] Avoid recording stream for synchronous ops (#111431)
For synchronous ops (i.e. `asyncOp = False`), we don't want to record streams because we know that the NCCL stream will join back to the "current" stream right after this op. So we might just as well keep the stream ownership of the input/output tensors unchanged. The benefit would be that the allocation/free of the tensors would look deterministic to the "current" stream so that the caching allocator can reuse memory pool for this stream in a clever way.

To prevent the input/output tensors from being recycled by python, we rely on the stashing mechanism in ProcessGroupNCCL (which can be also turned on by setting `TORCH_NCCL_AVOID_RECORD_STREAMS=1`).

This mechanism change is for libraries like FSDP, which use `all_gather_into_tensor` and `reduce_scatter_tensor` in a synchronous way and which cannot set `TORCH_NCCL_AVOID_RECORD_STREAMS=1` for their users. Therefore, this change is limited to these two collectives for now.
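
A minimal sketch of the synchronous usage in question (hypothetical helper; assumes an NCCL process group initialized e.g. via torchrun):

```python
import torch
import torch.distributed as dist

def sync_all_gather(local: torch.Tensor) -> torch.Tensor:
    out = torch.empty(dist.get_world_size() * local.numel(),
                      dtype=local.dtype, device=local.device)
    # async_op=False: the NCCL stream joins back to the current stream here,
    # so input/output keep their original stream ownership.
    dist.all_gather_into_tensor(out, local, async_op=False)
    return out
```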

Cc: @awgu @janeyx99 @albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111431
Approved by: https://github.com/H-Huang
2023-10-19 00:41:09 +00:00
a7883ee470 Bump urllib3 from 2.0.6 to 2.0.7 in /tools/build/bazel (#111435)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.0.6 to 2.0.7.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.0.6...2.0.7)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-18 17:14:06 -07:00
547a116fcf Fix redundant asserts (#111445)
Fixes: https://github.com/pytorch/pytorch/issues/109852

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111445
Approved by: https://github.com/zhxchen17
2023-10-18 23:57:31 +00:00
ba2ba9621c More NT subclass op support for SAM (#111253)
With this PR, we have full op support for SAM without needing to unwrap subclass into jagged buffer -> run ops -> rewrap manually. Specifically, this was previously happening in the MaskDecoder.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111253
Approved by: https://github.com/soulitzer, https://github.com/cpuhrsch
2023-10-18 21:21:28 +00:00
53c1dca6a3 [Reland] Add a workflow to release Android binaries (#110976)
This adds 2 jobs to build PyTorch Android with and without lite interpreter:

* Keeps the list of currently supported ABIs: armeabi-v7a, arm64-v8a, x86, x86_64
* Passes all the tests on the emulator
* Ran the test app on an emulator and on my Android phone (`arm64-v8a`) without any issue
![Screenshot_20231010-114453](https://github.com/pytorch/pytorch/assets/475357/57e12188-1675-44d2-a259-9f9577578590)
* Ran on AWS https://us-west-2.console.aws.amazon.com/devicefarm/home#/mobile/projects/b531574a-fb82-40ae-b687-8f0b81341ae0/runs/5fce6818-628a-4099-9aab-23e91a212076
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110976
Approved by: https://github.com/atalman
2023-10-18 21:17:11 +00:00
a771fde8b1 Update the magma to version 2.7.2 (#111442)
- 2.7.2 version + few ROCm related commits: https://bitbucket.org/icl/magma/pull-requests/37

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111442
Approved by: https://github.com/Skylion007, https://github.com/jithunnair-amd
2023-10-18 21:09:05 +00:00
102fbd402c [ci] Move step to get workflow job id before test step in linux (#111483)
We’ve been struggling to get the job id since 9/28/2023 12:03 pm. Before this we had almost 0 problems getting the job id, but after, we get a lot of `Recieved status code '502' when attempting to retrieve https://api.github.com/repos/pytorch/pytorch/actions/runs/6551579728/jobs?per_page=100:\n", 'Bad Gateway\n\nheaders=Server: GitHub.com\nDate: Tue, 17 Oct 2023 20:32:52 GMT\nContent-Type: application/json\nContent-Length: 32\nETag: "652eed15-20"\nVary: Accept-Encoding, Accept, X-Requested-With\nX-GitHub-Request-Id: EC62:7EE0:166AAF5:2D51A8E:652EEF6A\nconnection: close\n\n` ex https://github.com/pytorch/pytorch/actions/runs/6551579728/job/17793898278#step:18:22

Recently, it has been happening around 1/4 of the time, possibly more. I think this happens almost only on linux.

I believe this is somehow caused by a test, since distributed tests seem to be disproportionately affected, so I moved the step to get the job id before the test step. This also has the benefit that the test step can now get the job id if we want it.

Regardless of whether this works or not, it's a pretty harmless change that might make things easier in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111483
Approved by: https://github.com/huydhn
2023-10-18 20:54:06 +00:00
9c7391ea36 Revert " [1/N] Apply clang-tidy to c10 cuda files (#111137)"
This reverts commit 43b023694eea4348fa28e8028fa7445d6375860c.

Reverted https://github.com/pytorch/pytorch/pull/111137 on behalf of https://github.com/malfet due to Was reverted internally due to the failures in torch.cuda.memory_stats(device=0) (presumably) ([comment](https://github.com/pytorch/pytorch/pull/111137#issuecomment-1769274103))
2023-10-18 20:32:53 +00:00
7fabb73dae Add ciflow/rocm label to run ROCm jobs (#111394)
Fixes https://github.com/pytorch/test-infra/issues/4516.  As this is not part of trunk, it won't block regular merge.  On the other hand, we can still add `ciflow/rocm` to run it on PR.

~~I'll add an auto label rule for this after this is merged and the label becomes available~~ Here it is https://github.com/pytorch/test-infra/pull/4647

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111394
Approved by: https://github.com/ZainRizvi
2023-10-18 20:28:13 +00:00
16cb3bdd57 Skip test_quick_core_backward_baddbmm_cuda_float64 (#111493)
As it's painfully slow (10+ min on A100):
```shell
$ time python3 test_decomp.py -v -k test_quick_core_backward_baddbmm_cuda_float64
Fail to import hypothesis in common_utils, tests are not derandomized
test_quick_core_backward_baddbmm_cuda_float64 (__main__.TestDecompCUDA) ... ok

----------------------------------------------------------------------
Ran 1 test in 897.523s

OK

real	15m4.773s
user	15m0.207s
sys	0m6.492s
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111493
Approved by: https://github.com/clee2000, https://github.com/huydhn
2023-10-18 20:09:14 +00:00
93e5065ba0 [CODEMOD][caffe2] replace numpy.bool with bool (#111432)
Test Plan:
numpy.bool has long been deprecated and was removed starting with numpy-1.20.0 [1]. This replaces all references with the equivalent `bool` type using the following one-liner:
```
rg -l 'np\.bool' caffe2 | grep '\.py$' | xargs perl -pi -e 's,\bnp\.bool\b,bool,'
```
1. https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

Differential Revision: D50372711

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111432
Approved by: https://github.com/Skylion007
2023-10-18 18:56:40 +00:00
fa995626a8 [ROCm] Bump kineto submodule commit to clear kineto cache to avoid memory leaks (#110849)
Fixes #103999

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110849
Approved by: https://github.com/Skylion007
2023-10-18 17:34:03 +00:00
256a5ff49d int4 mm kernel enhancement (#111460)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111460
Approved by: https://github.com/Chillee
2023-10-18 17:19:52 +00:00
b72a1402f5 [AOTInductor] ProxyExecutor skips serializing missing args with default value (#111425)
Summary: In AOTInductor's ABI-compatible mode, we don't serialize missing args with default values.

Test Plan: buck2 run mode/dev-nosan deeplearning/aot_inductor/test:test_custom_ops

Differential Revision: D50345729

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111425
Approved by: https://github.com/angelayi
2023-10-18 17:10:42 +00:00
543dc75746 [Reland] horizontal concat fusion (#111437)
Reland https://github.com/pytorch/pytorch/pull/108115

The main fix is to disallow nop nodes from being included in foreach scheduler nodes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111437
Approved by: https://github.com/yanboliang
2023-10-18 17:09:01 +00:00
3eb5cae3af Revert "[Compiled Autograd] Turn accumulate_grad into an op (#111271)"
This reverts commit 04b04c068659127a53d659c44b0dd75fa9fd5887.

Reverted https://github.com/pytorch/pytorch/pull/111271 on behalf of https://github.com/jeanschmidt due to Breaking internal CI ([comment](https://github.com/pytorch/pytorch/pull/111271#issuecomment-1768527932))
2023-10-18 14:02:34 +00:00
0be90c5d7f Revert "[Compiled Autograd] Error if tensor_post_acc_grad_hooks is set (#111273)"
This reverts commit cba0dd0fdcdc550005976fd4af6fd3c70f4ddb3c.

Reverted https://github.com/pytorch/pytorch/pull/111273 on behalf of https://github.com/jeanschmidt due to Breaking internal CI ([comment](https://github.com/pytorch/pytorch/pull/111273#issuecomment-1768522328))
2023-10-18 14:00:30 +00:00
a389e2c7c7 Revert "[inductor] Move inductor ops to CompositeExplicitAutograd (#111274)"
This reverts commit 8b46a106f254fd860a4b7b99c8bb640ba58cb176.

Reverted https://github.com/pytorch/pytorch/pull/111274 on behalf of https://github.com/jeanschmidt due to Breaking internal CI ([comment](https://github.com/pytorch/pytorch/pull/111274#issuecomment-1768517555))
2023-10-18 13:57:23 +00:00
ed7739d690 Revert "[aot_inductor] return a copy of any constant (#111356)"
This reverts commit 71e1f34923af186dff46a8641c977a1cf507e06c.

Reverted https://github.com/pytorch/pytorch/pull/111356 on behalf of https://github.com/jeanschmidt due to Breaking internal ci ([comment](https://github.com/pytorch/pytorch/pull/111356#issuecomment-1768503640))
2023-10-18 13:51:30 +00:00
08f580d498 Revert "[inductor] Refactor and optimize allocation calls (#111117)"
This reverts commit 9ce0ae836d6801a39776897b9e891cd978b28aea.

Reverted https://github.com/pytorch/pytorch/pull/111117 on behalf of https://github.com/jeanschmidt due to Breaking internal CI ([comment](https://github.com/pytorch/pytorch/pull/111117#issuecomment-1768489865))
2023-10-18 13:45:02 +00:00
a4391f085b Add regression test for cuda_stream type checks (#111430)
Reported in https://github.com/pytorch/pytorch/issues/111268
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111430
Approved by: https://github.com/huydhn
ghstack dependencies: #111428
2023-10-18 07:24:01 +00:00
e2f1d03d73 [BE] Use C10_UNUSED (#111439)
<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at 21e87dc</samp>

> _We're sailing on the sea of code, with warnings to avoid_
> _We use the `C10_UNUSED` macro for variables unexploited_
> _We heave and ho and pull and push, and make the code more neat_
> _We sing this shanty as we go, to keep us in good spirits_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111439
Approved by: https://github.com/huydhn
2023-10-18 04:54:47 +00:00
1ac36dbd2a [aotinductor] Make writing of the weight files to be conditional (#111379)
Summary: Since we cache the AOTInductor-generated library file, we should not need to write the weights as a binary file if the library file already exists.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111379
Approved by: https://github.com/chenyang78
2023-10-18 04:52:36 +00:00
108378e2af Fix: torch.matrix_exp performance issue (#105225) (#110848)
Fixes #105225

- New implementation for `compute_T18_scale_square` method.
- Always use the highest degree for large batch sizes (size > 1).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110848
Approved by: https://github.com/lezcano
2023-10-18 04:43:25 +00:00
a9b3afd3d8 [aotinductor] Refactor the generated result (#111080)
Summary: Return the compiled library path as a string instead of wrapping it in a callable.

Differential Revision: [D50246941](https://our.internmc.facebook.com/intern/diff/D50246941)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111080
Approved by: https://github.com/jansel, https://github.com/chenyang78
2023-10-18 04:35:34 +00:00
e9a51a6a07 [BE] Revive test_typing (#111428)
`test_typing.py` was written to use `pytest` in https://github.com/pytorch/pytorch/pull/54234 which unfortunately rendered it incompatible with run_test.py, and therefore it was not running in CI all this time.

In this PR, same functionality is re-written using unittest framework, and `parametrize` from `torch.testing._internal._common_utils`.

Validated `test_typing.py` with ufmt

Disabled `fail/bitwise_ops.py` and `pass/jit.py`, as they regressed at some point, as well as one of the examples in `namedtuple.py`, since the `torch.linalg.qr` type is no longer revealed correctly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111428
Approved by: https://github.com/clee2000
2023-10-18 02:19:49 +00:00
572628e520 Nvfuser code removal (#111093)
Removes the existing integration code & build of nvfuser in TorchScript.

Note that I intentionally left out the part where we wipe the `third_party/nvfuser` repo; I'll do that in a separate PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111093
Approved by: https://github.com/albanD
2023-10-18 01:00:47 +00:00
0b14ec8ca6 [ONNX] Add dynamo_onnx_aot_inline to bench (#110183)
An option that applies onnx.inliner after model export.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110183
Approved by: https://github.com/thiagocrepaldi
2023-10-18 00:43:04 +00:00
eafce2394d [pytorch-vulkan] aten::floor_divide (#110785)
Summary:
As titled; tensor-scalar only.

This diff does not include the element-wise tensor-tensor operation.

Test Plan:
```
[yipjustin@33167.od ~/fbsource (9cfca7c97)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck2 run fbcode/mode/dev-nosan    //xplat/caffe2:pt_vulkan_api_test_bin  -- --gtest_filter="*floor_divide_scalar*"
Watchman fresh instance: new mergebase, cleared graph state, cleared dep files
Buck UI: https://www.internalfb.com/buck2/bcac40be-79af-47c5-bd3f-95c11179aa68
Network: Up: 29MiB  Down: 264MiB  (reSessionID-2fef8b89-76b0-4496-bb27-b10d42cf7ef4)
Jobs completed: 5196. Time elapsed: 45.9s.
Cache hits: 81%. Commands: 2070 (cached: 1672, remote: 375, local: 23)
BUILD SUCCEEDED
Running main() from third-party/googletest/1.11.0/googletest/googletest/src/gtest_main.cc
Note: Google Test filter = *floor_divide_scalar*
[==========] Running 2 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 2 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.floor_divide_scalar
[       OK ] VulkanAPITest.floor_divide_scalar (150 ms)
[ RUN      ] VulkanAPITest.floor_divide_scalar_inplace
[       OK ] VulkanAPITest.floor_divide_scalar_inplace (39 ms)
[----------] 2 tests from VulkanAPITest (189 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test suite ran. (190 ms total)
[  PASSED  ] 2 tests.
```

Differential Revision: D50001740

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110785
Approved by: https://github.com/SS-JIA
2023-10-17 23:02:35 +00:00
2dc1726ab7 Compile NestedTensor with AOTAutograd (#110529)
This PR has a number of changes that improve subclass support for AOTAutograd/Inductor in general:
- Previously, if a subclass did extra aliasing between graph outputs/inputs, the partitioner would complain because grad_outputs are the outputs reused as-is. Now we do a view_as(self) to work around this.
- Use dense -> dense metadata when working with fwd_output_strides during backward. This is important since the stride information comes from inductor, which sees the dense-to-dense graph.
- Inductor requires the inputs to the compiled backward to match the expected strides computed during compilation. We make sure to make the inner tensors of the subclass contiguous (previously, we only made the subclass itself contiguous)

Changes specific to NestedTensor relevant to compilation:
- Properly handle the case where `__tensor_unflatten__` is passed non-symbolic dense tensors and with meta extracted from fake subclasses.
- Skip var_to_range logic for singleton int
- Skip size hint logic in inductor for singleton int

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110529
Approved by: https://github.com/bdhirsh
2023-10-17 21:17:10 +00:00
e708de83b9 [4/N] Reorder VariableBuilder._wrap (#111409)
Reorganize the priority inside of ```VariableBuilder._wrap```:
* is_allowed returning True -> TorchVariable
* skipfiles.check returning True -> SkipFilesVariable
* UserFunctionVariable/UserMethodVariable (this means both is_allowed and skipfiles.check returned False, so we inline by default)
* UserDefinedClassVariable
* UserDefinedObjectVariable (the ultimate default value)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111409
Approved by: https://github.com/jansel
2023-10-17 21:12:34 +00:00
41490119f2 Revert "[sparse] semi-structured sparse + torch.compile support (#111049)"
This reverts commit 408f210938176870133a3dde5e8fbc4926cafbc0.

Reverted https://github.com/pytorch/pytorch/pull/111049 on behalf of https://github.com/clee2000 due to Sorry I'm pretty sure this caused a memory leak 408f210938 https://github.com/pytorch/pytorch/actions/runs/6550388354/job/17790615103 `test_sparse_semi_structured.py::TestSparseSemiStructuredCUDA::test_mlp_contiguous_relu_compile_backend_cutlass_dense_input_shape_(1, 128)_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSparseSemiStructuredCUDA.test_mlp_contiguous_relu_compile_backend_cutlass_dense_input_shape_(1, 128)_cuda! Caching allocator allocated memory was 235008 and is now reported as 352256 on device 0. CUDA driver allocated memory was 359333888 and is now 361431040.` ([comment](https://github.com/pytorch/pytorch/pull/111049#issuecomment-1767186569))
2023-10-17 21:11:09 +00:00
17002d25c5 [export] Remove call_spec argument from ExportedProgram ctor. (#111407)
Summary: call_spec arg is not used anymore.

Test Plan: CI

Reviewed By: SherlockNoMad, tugsbayasgalan

Differential Revision: D50335365

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111407
Approved by: https://github.com/izaitsevfb
2023-10-17 21:01:37 +00:00
2bb1692334 fix dict size change during iteration (#111267)
Summary:
_wrapped_fns_to_patch points to f_globals which might change during iteration due to factors like lazy imports. This diff fixes potential runtime errors like:

```
RuntimeError: dictionary changed size during iteration
```
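
A minimal illustration of the failure mode and the usual snapshot fix (hypothetical, not the PR's exact diff):

```python
globals_dict = {"a": 1}

def lazy_import():
    globals_dict.setdefault("b", 2)  # e.g. a lazy import mutating f_globals

# for name in globals_dict:      # RuntimeError: dictionary changed size during iteration
#     lazy_import()

for name in list(globals_dict):  # iterate over a snapshot instead
    lazy_import()
```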

Test Plan: CI

Reviewed By: Kronuz

Differential Revision: D50283983

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111267
Approved by: https://github.com/yanboliang
2023-10-17 20:36:13 +00:00
cc9b7bb85c [reland] [inductor] fix a max-autotune rng state related bug (#111381)
reland https://github.com/pytorch/pytorch/pull/109828

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111381
Approved by: https://github.com/lezcano
2023-10-17 19:16:36 +00:00
1aad6d803a [Reland][Inductor] Disallow OpOverloadPacket in ir.FallbackKernel (#110567) (#111396)
This is a reland of #110567 with additional fbcode fixed.

Summary:
In ABI-compatible mode, we always need op_overload.schema for FallbackKernel.

Approved by: https://github.com/jansel

Test Plan: contbuild & OSS CI, see 37a0265992

Differential Revision: D50339346

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111396
Approved by: https://github.com/chenyang78
2023-10-17 18:53:38 +00:00
6e8079e00f Fix timeout value for memory leak check job (#111386)
Fixes https://github.com/pytorch/pytorch/pull/110193 as it doesn't work as expected:

* I forgot the timeout on the test step
* Also MacOS test job wasn't covered

### Testing

The job timeout is set correctly to 600 https://github.com/pytorch/pytorch/actions/runs/6541825177/job/17764485473#step:14:7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111386
Approved by: https://github.com/clee2000
2023-10-17 18:25:02 +00:00
543a763cd8 [DCP] Add HSDP checkpoint unit tests (#111399)
Add two unit tests:

1. HSDP checkpoint unit test
2. HSDP FSDP checkpoint conversion unit test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111399
Approved by: https://github.com/wanchaol
2023-10-17 17:59:42 +00:00
2c313880fc [TD] Make test class correlation scores available to heuristics. (#111229)
https://github.com/pytorch/test-infra/pull/4617 generates `file_test_class_rating.json`. Now we ensure it's available for heuristics to use during the test step.

(Actual heuristics will come in a separate PR)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111229
Approved by: https://github.com/huydhn
2023-10-17 16:29:30 +00:00
973c87b320 raise instead of skip in test/test_meta.py (#110939)
Supersedes #109004.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110939
Approved by: https://github.com/lezcano, https://github.com/kurtamohler
2023-10-17 10:17:43 +00:00
71e1f34923 [aot_inductor] return a copy of any constant (#111356)
When the model returns a constant, we cannot "release" its handle,
because the constant doesn't have any handle at all. Instead,
we should allocate a new tensor and then return a copy of the constant.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111356
Approved by: https://github.com/hl475
2023-10-17 08:44:21 +00:00
7a740e2b85 Revert "direct runtime assertions (#111262)"
This reverts commit e6d9350d7f135b3e0f27a949853ae691021b51f6.

Reverted https://github.com/pytorch/pytorch/pull/111262 on behalf of https://github.com/jeanschmidt due to Breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/111262#issuecomment-1765881675))
2023-10-17 08:04:36 +00:00
29048be41c [Reland] Add int4mm kernel (#111403)
This is a reland for #110914, #111327 and #111390

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111403
Approved by: https://github.com/Chillee
2023-10-17 06:33:18 +00:00
cyy
7b7f070ec5 [3/N] Apply clang-tidy to aten/src/ATen/core/ (#111301)
Applies clang-tidy to aten/src/ATen/core/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111301
Approved by: https://github.com/Skylion007
2023-10-17 05:52:20 +00:00
cyy
43b023694e [1/N] Apply clang-tidy to c10 cuda files (#111137)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111137
Approved by: https://github.com/zou3519, https://github.com/Skylion007
2023-10-17 04:52:50 +00:00
46000bede6 Fix a typo in fake tensor test. (#111193)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111193
Approved by: https://github.com/janeyx99
2023-10-17 03:36:28 +00:00
013b51f8cc [state_dict][7/N] Add a fine tuning e2e test case for distributed.state_dict and DCP (#111111)
As title

Differential Revision: [D50209732](https://our.internmc.facebook.com/intern/diff/D50209732/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111111
Approved by: https://github.com/wz337
ghstack dependencies: #111106, #111107, #111275, #111109, #111110, #111120
2023-10-17 03:09:12 +00:00
9ce0ae836d [inductor] Refactor and optimize allocation calls (#111117)
This splits out changes from
https://github.com/pytorch/pytorch/pull/102625 to make things easier to
review.

This diff creates a `make_allocation()` method that extracts the logic
from `make_buffer_allocation()` while allowing us to allocate non-buffer
objects. In particular, we will use this to allocate memory pools during
memory planning.

This diff also includes a small optimization -- if the desired
allocation is contiguous, then we emit a call to `empty()` instead of
`empty_strided()` with its superfluous stride argument.
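
For illustration (not generated Inductor output), the two calls produce the same layout when the allocation is contiguous:

```python
import torch

a = torch.empty((4, 8))
b = torch.empty_strided((4, 8), (8, 1))  # contiguous strides spelled out
assert a.stride() == b.stride() == (8, 1)
```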

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111117
Approved by: https://github.com/jansel
2023-10-17 03:06:52 +00:00
cyy
3e354ef3e3 Increase coverage of clang-tidy to CudaIPCTypes.cpp (#111371)
This PR uses clang-tidy in torch/csrc/CudaIPCTypes.cpp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111371
Approved by: https://github.com/Skylion007
2023-10-17 02:08:10 +00:00
a0632389b7 [BE]: Update lintrunner mypy to 1.6.0 (#111375)
Follow up to #111305 that updates lintrunner's version too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111375
Approved by: https://github.com/malfet
2023-10-17 01:22:06 +00:00
c8a72db432 [BE]: Update ruff to 0.1.0 (#111391)
Updates RUFF to the latest and greatest version

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111391
Approved by: https://github.com/albanD, https://github.com/malfet
2023-10-17 01:09:16 +00:00
19a6487ad4 [state_dict][6/N] Change API names to avoid conflict and simplify the API signatures (#111120)
`state_dict` is a very common variable name people use to represent a local
state_dict and `load_state_dict` conflicts with DCP's `load_state_dict`.

This PR changes `state_dict` to `get_state_dict`. `get_state_dict` is closer to what this API does -- users use the API to get the current state_dict for saving or for loading (passed to DCP for loading in-place).

This PR also changes `load_state_dict` to `set_state_dict`. `set_state_dict` is less ideal compared to `get_state_dict` but is symmetric. We can still change the API name before it goes to beta.

This PR also simplifies the API signatures. `model_only` is removed and `optim_only` only exists for `get_state_dict`.
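
A sketch of the renamed APIs (module path and keyword names are assumptions based on this description; shown on a plain module for brevity):

```python
import torch
from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

model = torch.nn.Linear(4, 4)
optim = torch.optim.SGD(model.parameters(), lr=0.1)

model_sd, optim_sd = get_state_dict(model, optim)   # fetch current state for saving
set_state_dict(model, optim,
               model_state_dict=model_sd, optim_state_dict=optim_sd)
```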

Differential Revision: [D50213931](https://our.internmc.facebook.com/intern/diff/D50213931/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111120
Approved by: https://github.com/wz337
ghstack dependencies: #111106, #111107, #111275, #111109, #111110
2023-10-17 00:15:31 +00:00
7fb09b804b Reland "AOTAutograd: Go down inference path if no outputs require grad (#111011)" (#111347)
Re-land of https://github.com/pytorch/pytorch/pull/111011.

The original PR ended up having a bad interaction with code that tried to run `torch.compile` under `with torch.inference_mode`, which caused some internal tests to fail.

The issue was that:

(1) AOTInductor invokes the pattern matcher passes in inductor

(2) The pattern matcher registers some code with [training_graph](https://github.com/pytorch/pytorch/blob/main/torch/_inductor/fx_passes/pad_mm.py#L461)

(3) The `training_graph` function expects to be able to set the global autograd state to `requires_grad`, and always get out a joint graph (assertion [here](https://github.com/pytorch/pytorch/blob/main/torch/_inductor/pattern_matcher.py#L1196)).

(4) However, when inference_mode is activated, and you try to run AOTAutograd, AOTAutograd will witness that all outputs to the traced function will not require grad, and (now correctly) think that we are tracing an inference graph, which fails the above assert.

After talking to Bin, it sounds like these training-only patterns aren't necessary when we know we are compiling an inference graph (which should always be the case if you're running torch.compile under inference_mode). So I updated the pattern matcher to ignore any pattern matches using `training_graph` when inference_mode is enabled.
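
A hypothetical repro of the interaction (a sketch, not the internal test):

```python
import torch

@torch.compile
def f(x):
    return x.sin() + 1

with torch.inference_mode():
    f(torch.randn(8))  # no output requires grad, so AOTAutograd takes the inference path
```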

This reverts commit cf6b1cdf6ac74d375b0787bd8f9463cb3a53b0e5.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111347
Approved by: https://github.com/Chillee
2023-10-17 00:11:15 +00:00
f84755bcac Fix _CudaStreamBase type annotations (#111387)
Make it inherit from `Stream` as indeed it is, see 97a513ed07/torch/csrc/cuda/Stream.cpp (L208) and
```
python3 -c "import torch;print(torch._C._CudaStreamBase.__base__)"
<class 'torch.Stream'>
```

Fixes https://github.com/pytorch/pytorch/issues/111268

TODO (in separate PR): Revive `test_typing` and add regression test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111387
Approved by: https://github.com/jeanschmidt, https://github.com/Skylion007
2023-10-16 23:26:58 +00:00
9683a26c55 [state_dict][5/N] Add submodules save and load support (#111110)
It is not easy for users to save and load submodules (e.g., for fine-tuning) because FSDP requires access to the root module. This PR adds support for submodule save and load.

Differential Revision: [D50209727](https://our.internmc.facebook.com/intern/diff/D50209727/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111110
Approved by: https://github.com/wz337
ghstack dependencies: #111106, #111107, #111275, #111109
2023-10-16 23:25:37 +00:00
bd9a2465e7 Back out "Add a workflow to release Android binaries (#110976)" (#111401)
Summary:
Original commit changeset: 96813f0fac68

Original Phabricator Diff: D50161780

This breaks the integration test on T166457344

Test Plan: Sandcastle.

Differential Revision: D50344243

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111401
Approved by: https://github.com/izaitsevfb
2023-10-16 23:16:37 +00:00
408f210938 [sparse] semi-structured sparse + torch.compile support (#111049)
Summary:

This PR adds in torch.compile support for semi-structured sparsity,
using the subclass tracing @bdhirsh added.

Based on whether we are using cuSPARSELt or CUTLASS, we return a different representation of the inner tensors.

Test Plan:
```
python test/test_sparse_semi_structured.py -k compile
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111049
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #110583
2023-10-16 23:07:26 +00:00
deb800ee81 Fix typo under test directory (#111304)
This PR fixes typo in comments under `test` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111304
Approved by: https://github.com/Skylion007
2023-10-16 23:06:06 +00:00
1e70f4d02c Revert "Reland #2 "[C10] PG observability hooks. (#108815, #110907)" (#111072)"
This reverts commit bb1424d46e656dfcdd4c12efe58ada9f1720c4d8.

Reverted https://github.com/pytorch/pytorch/pull/111072 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/111072#issuecomment-1765399829))
2023-10-16 23:03:26 +00:00
5a8a89360d Handle the .tolist method of np.arrays in dynamo (#111382)
Fixes part 1 of https://github.com/pytorch/pytorch/issues/111370#issuecomment-1764730773

While at it, add a test for the numpy ndarray `.size` attribute. This started as an attempt to remove the delegation of what looks like a `.size()` method (which does not exist in numpy) on the same line this patch adds `tolist` to.
But this is apparently needed for something else, and existing tests start failing. Thus, declare it _ain't broke, don't fix_ and only keep the test. The test can be removed if wanted, though.
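
A sketch of the pattern this enables (hypothetical, not the added test):

```python
import torch

@torch.compile
def f(x):
    a = x.numpy()
    return a.tolist(), a.size  # both are now understood by dynamo

f(torch.arange(4))
```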

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111382
Approved by: https://github.com/lezcano
2023-10-16 22:56:52 +00:00
afb4914c3d Align torch.library.impl with the new torch.library style (#111308)
We add a new overload to torch.library.impl that accepts an optional
Library arg. If provided, the lifetime of the registration is tied to
the Library arg; otherwise, it lives forever.

Test Plan:
- existing and new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111308
Approved by: https://github.com/soulitzer
ghstack dependencies: #111307
2023-10-16 22:32:23 +00:00
9d9cc67592 Make torch.library.define consistent with the new APIs (#111307)
This PR introduces a new overload of torch.library.define. Like
impl_abstract, and in line with our plans for the rest of the
torch.library APIs, we allow it to accept an optional Library object
to tie the lifetime of the op definition to.
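
A hedged sketch of the two overloads working together (exact keyword names are assumptions):

```python
import torch
from torch.library import Library, define, impl

lib = Library("mylib", "FRAGMENT")  # registrations live and die with `lib`
define("mylib::twice", "(Tensor x) -> Tensor", lib=lib)

@impl("mylib::twice", "CPU", lib=lib)
def twice_cpu(x):
    return x * 2

print(torch.ops.mylib.twice(torch.ones(3)))  # tensor([2., 2., 2.])
```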

Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111307
Approved by: https://github.com/soulitzer, https://github.com/ezyang
2023-10-16 22:32:23 +00:00
5c3955200c Add linear quantize function to custom ops (#111148)
Summary: Add linear quantize for vulkan to custom ops so it can be used from a model.

Test Plan:
buck2 run --target-platforms ovr_config//platform/macos:arm64-fbsource -c pt.vulkan_full_precision=1
//xplat/caffe2/fb/custom_ops/vulkan_quantized:pt_vulkan_quantized_test_binAppleMac\#macosx-arm64
[       OK ] VulkanAPITest.convert_qconv2d_context (135 ms)
[ RUN      ] VulkanAPITest.linear_2d
[       OK ] VulkanAPITest.linear_2d (4 ms)
[----------] 2 tests from VulkanAPITest (139 ms total)
[----------] Global test environment tear-down
[==========] 2 tests from 1 test suite ran. (139 ms total)
[  PASSED  ] 2 tests.
##############################################################
buck2 build --target-platforms ovr_config//platform/macos:arm64-fbsource
//xplat/caffe2:pt_vulkan_quantized_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1 --show-output"
buck-out//v2/gen/fbsource/xplat/caffe2/pt_vulkan_quantized_api_test_binAppleMac
[       OK ] VulkanAPITest.conv2d_pw_quantized_prepack_random_params_int8_int32 (11 ms)
[ RUN      ] VulkanAPITest.linear_2d_flat
[       OK ] VulkanAPITest.linear_2d_flat (4 ms)
[ RUN      ] VulkanAPITest.linear_2d_small
[       OK ] VulkanAPITest.linear_2d_small (1 ms)
[ RUN      ] VulkanAPITest.linear_2d_large
[       OK ] VulkanAPITest.linear_2d_large (1 ms)
[ RUN      ] VulkanAPITest.linear_3d_flat
[       OK ] VulkanAPITest.linear_3d_flat (2 ms)
[ RUN      ] VulkanAPITest.linear_3d_small
[       OK ] VulkanAPITest.linear_3d_small (2 ms)
[ RUN      ] VulkanAPITest.linear_3d_large
[       OK ] VulkanAPITest.linear_3d_large (1 ms)
[ RUN      ] VulkanAPITest.linear_4d_flat
[       OK ] VulkanAPITest.linear_4d_flat (1 ms)
[ RUN      ] VulkanAPITest.linear_4d_small
[       OK ] VulkanAPITest.linear_4d_small (1 ms)
[ RUN      ] VulkanAPITest.linear_4d_large
[       OK ] VulkanAPITest.linear_4d_large (1 ms)
[ RUN      ] VulkanAPITest.linear_custom
[       OK ] VulkanAPITest.linear_custom (0 ms)
[----------] 76 tests from VulkanAPITest (1811 ms total)
[----------] Global test environment tear-down
[==========] 76 tests from 1 test suite ran. (1811 ms total)
[  PASSED  ] 76 tests.
YOU HAVE 8 DISABLED TESTS
##############################################################
buck2 run --target-platforms ovr_configplatform/macos:arm64-fbsourcexplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1
[----------] Global test environment tear-down
[==========] 346 tests from 1 test suite ran. (5648 ms total)
[  PASSED  ] 345 tests.
[  SKIPPED ] 1 test, listed below:
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log
YOU HAVE 5 DISABLED TESTS

Reviewed By: manuelcandales

Differential Revision: D49609985

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111148
Approved by: https://github.com/yipjustin
2023-10-16 21:47:09 +00:00
408e991dfe Revert "Quant: add weight int4pack mm kernel (#110914)"
This reverts commit 9980876cab9dcedce7d7dd1c8a2e168b548eaa36.

Reverted https://github.com/pytorch/pytorch/pull/110914 on behalf of https://github.com/jeanschmidt due to Breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/110914#issuecomment-1765302621))
2023-10-16 21:27:26 +00:00
5ff9b49063 Revert "update int4 tinygemm kernels (#111327)"
This reverts commit e0e15a4ac61648cc8f63f0ab102c32e8884fb5d1.

Reverted https://github.com/pytorch/pytorch/pull/111327 on behalf of https://github.com/jeanschmidt due to This PR is preventing the revert of https://github.com/pytorch/pytorch/pull/110914 ([comment](https://github.com/pytorch/pytorch/pull/111327#issuecomment-1765299310))
2023-10-16 21:24:54 +00:00
f29b957475 [cuda] vectorized implementation for layer_norm_grad_input_kernel (#111021)
Using vectorized loads/stores makes the `layer_norm_grad_input_kernel` generally faster. This PR accelerates medium and larger problem sizes.

```python
import torch

# NOTE: `l_inputs` (a list of ((batch_size, feature_size), iterations) pairs)
# is assumed to be defined earlier in the benchmark script.

def run_model_on_device(fs, X, gO, device_string, numeric_type):
    ln = torch.nn.LayerNorm((fs,), device=device_string, dtype=numeric_type)
    ln.reset_parameters()
    X.grad = None
    ln.zero_grad(set_to_none=True)
    out = ln(X)
    out.backward(gO)
    return (ln.weight.grad, ln.bias.grad)

def run_correctness_test(eps_weight, eps_bias):
    dtype = torch.float
    for val in l_inputs:
        bs = val[0][0]
        fs = val[0][1]

        mean_adjustment = torch.randn(fs, device="cpu", dtype=torch.float)
        X = mean_adjustment * torch.randn(
            bs, fs, device="cpu", dtype=torch.float, requires_grad=True
        )

        X = X.detach().requires_grad_()
        gO = torch.rand_like(X)
        X_gpu = X.to("cuda")
        X_gpu = X_gpu.detach().requires_grad_()
        gO_gpu = gO.to("cuda")
        gO_gpu = gO_gpu.detach().requires_grad_()

        grad_cpu_ref = run_model_on_device(fs, X, gO, "cpu", dtype)
        grad_gpu = run_model_on_device(fs, X_gpu, gO_gpu, "cuda", dtype)
        weight_grad_gpu_target = grad_gpu[0].detach().to("cpu")
        bias_grad_gpu_target = grad_gpu[1].detach().to("cpu")

        weight_delta = torch.abs(grad_cpu_ref[0] - weight_grad_gpu_target)
        weight_mismatches = (weight_delta >= eps_weight).nonzero()
        weight_mismatch_pct = len(weight_mismatches) / len(weight_delta) * 100

        bias_delta = torch.abs(grad_cpu_ref[1] - bias_grad_gpu_target)
        bias_mismatches = (bias_delta >= eps_bias).nonzero()
        bias_mismatch_pct = len(bias_mismatches) / len(bias_delta) * 100

        if weight_mismatch_pct > 0 or bias_mismatch_pct > 0:
            print(
                "Size ({} x {}) mismatch percentage: weight {:3.2f} bias {:3.2f}".format(
                    fs, bs, weight_mismatch_pct, bias_mismatch_pct
                )
            )

# Run the correctness tests
run_correctness_test(0.01, 0.01)
torch.cuda.synchronize()

# Allocate a tensor equal to L2 cache size on A100 GPUs
l2_cache_flusher = torch.empty(int(80 * (1024**2)), dtype=torch.float, device="cuda")

# Run the performance tests. We need to run this at global scope because otherwise
# the `ln` and `gO` objects are likely removed by the JIT compiler
results = []
for dtype in (torch.float, torch.half):
    for val in l_inputs:
        bs = val[0][0]
        fs = val[0][1]
        iterations = val[1]

        ln = torch.nn.LayerNorm((fs,), device="cuda", dtype=dtype)
        X = torch.randn(bs, fs, device="cuda", dtype=dtype, requires_grad=True)
        gO = torch.rand_like(X)

        # Try to measure FWD and BWD pass in the same loop
        # Use comprehensions, not list multiplication: `[Event()] * n` would
        # alias one Event object across all iterations and corrupt the timings.
        l_ev_start_fwd = [torch.cuda.Event(enable_timing=True) for _ in range(iterations)]
        l_ev_stop_fwd = [torch.cuda.Event(enable_timing=True) for _ in range(iterations)]
        l_ev_stop_bwd = [torch.cuda.Event(enable_timing=True) for _ in range(iterations)]

        l_fwd_times = []
        l_bwd_times = []
        torch.cuda.synchronize()
        for i in range(iterations):
            l2_cache_flusher.zero_()
            torch.cuda._sleep(1_000_000)

            X.grad = None
            ln.zero_grad(set_to_none=True)

            l_ev_start_fwd[i].record()
            out = ln(X)
            l_ev_stop_fwd[i].record()
            out.backward(gO)
            l_ev_stop_bwd[i].record()
        torch.cuda.synchronize()

        l_fwd_times = []
        l_bwd_times = []
        for i in range(iterations):
            l_fwd_times.append(l_ev_start_fwd[i].elapsed_time(l_ev_stop_fwd[i]))
            l_bwd_times.append(l_ev_stop_fwd[i].elapsed_time(l_ev_stop_bwd[i]))

        print(
            "({}, {}, {}, fwd_ms, bwd_ms)|{:.3f}|{:.3f}".format(
                dtype,
                bs,
                fs,
                sum(l_fwd_times) / iterations * 1000,
                sum(l_bwd_times) / iterations * 1000,
            )
        )
```

Results in the attached picture:

<img width="314" alt="Screenshot 2023-10-16 at 11 08 25 AM" src="https://github.com/pytorch/pytorch/assets/23515689/ce571fc5-c84e-47eb-95f6-9faa44042cc1">

I also isolated the previous implementation and the vectorized one into a native CUDA program and the speedup is confirmed. **Average speedup = 21.73%**

```
Size (2048, 2048); Mismatches: dX = 0 out of 4194304. Max missmatch idx = 0.                                                                                                                                                                                           [16/1529]
reference = 0.0560 (ms); optimized = 0.0435 (ms); bw_opt = 1437.54 GB/s; speedup = 28.78%
Size (4096, 512); Mismatches: dX = 0 out of 2097152. Max missmatch idx = 0.
reference = 0.0220 (ms); optimized = 0.0174 (ms); bw_opt = 1797.26 GB/s; speedup = 26.44%
Size (1024, 512); Mismatches: dX = 0 out of 524288. Max missmatch idx = 0.
reference = 0.0101 (ms); optimized = 0.0082 (ms); bw_opt = 953.49 GB/s; speedup = 22.97%
Size (1024, 256); Mismatches: dX = 1 out of 262144. Max missmatch idx = 22411.
reference = 0.0082 (ms); optimized = 0.0075 (ms); bw_opt = 521.14 GB/s; speedup = 9.21%
Size (1024, 1024); Mismatches: dX = 0 out of 1048576. Max missmatch idx = 0.
reference = 0.0137 (ms); optimized = 0.0108 (ms); bw_opt = 1447.42 GB/s; speedup = 26.93%
Size (2048, 512); Mismatches: dX = 0 out of 1048576. Max missmatch idx = 0.
reference = 0.0141 (ms); optimized = 0.0116 (ms); bw_opt = 1349.79 GB/s; speedup = 21.81%
Size (2048, 256); Mismatches: dX = 0 out of 524288. Max missmatch idx = 0.
reference = 0.0108 (ms); optimized = 0.0102 (ms); bw_opt = 768.90 GB/s; speedup = 6.09%
Size (1024, 128); Mismatches: dX = 1 out of 131072. Max missmatch idx = 9165.
reference = 0.0070 (ms); optimized = 0.0068 (ms); bw_opt = 288.56 GB/s; speedup = 2.81%
Size (1024, 2048); Mismatches: dX = 0 out of 2097152. Max missmatch idx = 0.
reference = 0.0223 (ms); optimized = 0.0164 (ms); bw_opt = 1905.58 GB/s; speedup = 35.90%
Size (1024, 768); Mismatches: dX = 3 out of 786432. Max missmatch idx = 507105.
reference = 0.0113 (ms); optimized = 0.0101 (ms); bw_opt = 1160.00 GB/s; speedup = 11.79%
Size (2048, 128); Mismatches: dX = 0 out of 262144. Max missmatch idx = 0.
reference = 0.0097 (ms); optimized = 0.0089 (ms); bw_opt = 440.97 GB/s; speedup = 9.12%
Size (2048, 1024); Mismatches: dX = 0 out of 2097152. Max missmatch idx = 0.
reference = 0.0204 (ms); optimized = 0.0166 (ms); bw_opt = 1881.43 GB/s; speedup = 22.81%
Size (4096, 256); Mismatches: dX = 1 out of 1048576. Max missmatch idx = 601965.
reference = 0.0156 (ms); optimized = 0.0154 (ms); bw_opt = 1016.47 GB/s; speedup = 1.24%
Size (4096, 1024); Mismatches: dX = 0 out of 4194304. Max missmatch idx = 0.
reference = 0.0411 (ms); optimized = 0.0417 (ms); bw_opt = 1499.55 GB/s; speedup = -1.43%
Size (4096, 4096); Mismatches: dX = 0 out of 16777216. Max missmatch idx = 0.
reference = 0.2323 (ms); optimized = 0.2077 (ms); bw_opt = 1203.75 GB/s; speedup = 11.83%
Size (1024, 4096); Mismatches: dX = 0 out of 4194304. Max missmatch idx = 0.
reference = 0.0659 (ms); optimized = 0.0570 (ms); bw_opt = 1096.51 GB/s; speedup = 15.60%
Size (1024, 3072); Mismatches: dX = 0 out of 3145728. Max missmatch idx = 0.
reference = 0.0425 (ms); optimized = 0.0299 (ms); bw_opt = 1568.10 GB/s; speedup = 42.11%
Size (1024, 2464); Mismatches: dX = 8 out of 2523136. Max missmatch idx = 2087476.
reference = 0.0292 (ms); optimized = 0.0230 (ms); bw_opt = 1636.18 GB/s; speedup = 27.07%
Size (1024, 800); Mismatches: dX = 1 out of 819200. Max missmatch idx = 652342.
reference = 0.0114 (ms); optimized = 0.0104 (ms); bw_opt = 1175.05 GB/s; speedup = 9.63%
Size (1024, 6144); Mismatches: dX = 0 out of 6291456. Max missmatch idx = 0.
reference = 0.0973 (ms); optimized = 0.0844 (ms); bw_opt = 1110.87 GB/s; speedup = 15.28%
Size (1024, 4904); Mismatches: dX = 6 out of 5021696. Max missmatch idx = 4670210.
reference = 0.0814 (ms); optimized = 0.0721 (ms); bw_opt = 1037.99 GB/s; speedup = 12.90%
Size (4096, 2048); Mismatches: dX = 0 out of 8388608. Max missmatch idx = 0.
reference = 0.0990 (ms); optimized = 0.0770 (ms); bw_opt = 1623.58 GB/s; speedup = 28.54%
Size (1024, 1860); Mismatches: dX = 0 out of 1904640. Max missmatch idx = 0.
reference = 0.0219 (ms); optimized = 0.0174 (ms); bw_opt = 1631.12 GB/s; speedup = 25.75%
Size (1024, 20160); Mismatches: dX = 23 out of 20643840. Max missmatch idx = 20274656.
reference = 0.3054 (ms); optimized = 0.2600 (ms); bw_opt = 1183.08 GB/s; speedup = 17.45%
Size (3072, 256); Mismatches: dX = 0 out of 786432. Max missmatch idx = 0.
reference = 0.0129 (ms); optimized = 0.0127 (ms); bw_opt = 925.71 GB/s; speedup = 1.69%
Size (4096, 128); Mismatches: dX = 3 out of 524288. Max missmatch idx = 451331.
reference = 0.0128 (ms); optimized = 0.0129 (ms); bw_opt = 608.06 GB/s; speedup = -0.74%
Size (512, 128); Mismatches: dX = 0 out of 65536. Max missmatch idx = 0.
reference = 0.0062 (ms); optimized = 0.0061 (ms); bw_opt = 161.25 GB/s; speedup = 2.35%
Size (2048, 64); Mismatches: dX = 0 out of 131072. Max missmatch idx = 0.
reference = 0.0084 (ms); optimized = 0.0086 (ms); bw_opt = 228.70 GB/s; speedup = -2.49%
Size (3072, 2048); Mismatches: dX = 0 out of 6291456. Max missmatch idx = 0.
reference = 0.0770 (ms); optimized = 0.0614 (ms); bw_opt = 1527.43 GB/s; speedup = 25.44%
Size (3200, 104); Mismatches: dX = 0 out of 332800. Max missmatch idx = 0.
reference = 0.0105 (ms); optimized = 0.0113 (ms); bw_opt = 440.93 GB/s; speedup = -6.96%
Size (1152, 384); Mismatches: dX = 0 out of 442368. Max missmatch idx = 0.
reference = 0.0102 (ms); optimized = 0.0084 (ms); bw_opt = 786.48 GB/s; speedup = 21.59%
Size (131072, 64); Mismatches: dX = 12 out of 8388608. Max missmatch idx = 7659094.
reference = 0.2054 (ms); optimized = 0.2873 (ms); bw_opt = 438.49 GB/s; speedup = -28.51%
Size (64, 131072); Mismatches: dX = 0 out of 8388608. Max missmatch idx = 0.
reference = 0.8372 (ms); optimized = 0.3295 (ms); bw_opt = 379.37 GB/s; speedup = 154.09%
Size (131072, 128); Mismatches: dX = 18 out of 16777216. Max missmatch idx = 16158071.
reference = 0.2296 (ms); optimized = 0.3116 (ms); bw_opt = 805.47 GB/s; speedup = -26.31%
Size (128, 131072); Mismatches: dX = 0 out of 16777216. Max missmatch idx = 0.
reference = 0.9297 (ms); optimized = 0.3785 (ms); bw_opt = 660.52 GB/s; speedup = 145.64%
Size (131072, 256); Mismatches: dX = 47 out of 33554432. Max missmatch idx = 33062426.
reference = 0.3003 (ms); optimized = 0.4231 (ms); bw_opt = 1184.07 GB/s; speedup = -29.02%
Size (256, 131072); Mismatches: dX = 0 out of 33554432. Max missmatch idx = 0.
reference = 1.0449 (ms); optimized = 0.4828 (ms); bw_opt = 1035.63 GB/s; speedup = 116.43%
Average speedup = 21.73%
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111021
Approved by: https://github.com/malfet
2023-10-16 21:22:41 +00:00
8b46a106f2 [inductor] Move inductor ops to CompositeExplicitAutograd (#111274)
I suspect in practice this won't matter, but if we do end up tracing this it causes them not to get decomposed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111274
Approved by: https://github.com/voznesenskym, https://github.com/desertfire
ghstack dependencies: #111271, #111273
2023-10-16 21:16:24 +00:00
cba0dd0fdc [Compiled Autograd] Error if tensor_post_acc_grad_hooks is set (#111273)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111273
Approved by: https://github.com/voznesenskym
ghstack dependencies: #111271
2023-10-16 21:16:24 +00:00
04b04c0686 [Compiled Autograd] Turn accumulate_grad into an op (#111271)
Rather than baking the behavior of `AccumulateGrad` nodes into the generated graph (either as `+=` or as a return value of the graph), this creates a new `accumulate_grad_` dispatcher op that is included in the generated graph like:
```
def forward(self, inputs, sizes, hooks):
    getitem = inputs[0]
    getitem_1 = inputs[1]
    getitem_2 = inputs[2]
    getitem_3 = inputs[3]
    getitem_4 = inputs[4]
    getitem_5 = inputs[5]
    getitem_6 = inputs[6]
    getitem_7 = inputs[7]
    getitem_8 = inputs[8]
    getitem_9 = inputs[9];  inputs = None
    expand = torch.ops.aten.expand.default(getitem, [2, 4]);  getitem = None
    threshold_backward = torch.ops.aten.threshold_backward.default(expand, getitem_1, 0);  expand = getitem_1 = None
    t = torch.ops.aten.t.default(getitem_3);  getitem_3 = None
    mm = torch.ops.aten.mm.default(threshold_backward, t);  t = None
    t_1 = torch.ops.aten.t.default(threshold_backward)
    mm_1 = torch.ops.aten.mm.default(t_1, getitem_2);  t_1 = getitem_2 = None
    t_2 = torch.ops.aten.t.default(mm_1);  mm_1 = None
    sum_1 = torch.ops.aten.sum.dim_IntList(threshold_backward, [0], True);  threshold_backward = None
    view = torch.ops.aten.view.default(sum_1, [4]);  sum_1 = None
    t_3 = torch.ops.aten.t.default(t_2);  t_2 = None
    accumulate_grad_ = torch.ops.inductor.accumulate_grad_.default(getitem_4, t_3);  getitem_4 = t_3 = None
    threshold_backward_1 = torch.ops.aten.threshold_backward.default(mm, getitem_5, 0);  mm = getitem_5 = None
    t_4 = torch.ops.aten.t.default(threshold_backward_1)
    mm_2 = torch.ops.aten.mm.default(t_4, getitem_6);  t_4 = getitem_6 = None
    t_5 = torch.ops.aten.t.default(mm_2);  mm_2 = None
    sum_2 = torch.ops.aten.sum.dim_IntList(threshold_backward_1, [0], True);  threshold_backward_1 = None
    view_1 = torch.ops.aten.view.default(sum_2, [4]);  sum_2 = None
    t_6 = torch.ops.aten.t.default(t_5);  t_5 = None
    accumulate_grad__1 = torch.ops.inductor.accumulate_grad_.default(getitem_7, t_6);  getitem_7 = t_6 = None
    accumulate_grad__2 = torch.ops.inductor.accumulate_grad_.default(getitem_8, view_1);  getitem_8 = view_1 = None
    accumulate_grad__3 = torch.ops.inductor.accumulate_grad_.default(getitem_9, view);  getitem_9 = view = None
    return []

```

The motivation here is that `AccumulateGrad` nodes are causing trouble in FSDP tracing, since FSDP resizes parameters and parameter storage in place inside hooks.  We will model this mutation in dynamo, but not during the initial compiled autograd capture.  This allows us to bypass failing shape checks in the initial capture.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111271
Approved by: https://github.com/voznesenskym
2023-10-16 21:16:17 +00:00
6f06832219 Fixed typo in activation.py (#111358)
liner -> linear
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111358
Approved by: https://github.com/mikaylagawarecki
2023-10-16 20:36:55 +00:00
97a513ed07 Revert "Add lazy_clone_storage to create COW storages (#110192)"
This reverts commit 1c308144177d6e1663e41aae32a89e1c49b8b3b4.

Reverted https://github.com/pytorch/pytorch/pull/110192 on behalf of https://github.com/jeanschmidt due to Breaking internal builds, @ezyang please support the author providing further details ([comment](https://github.com/pytorch/pytorch/pull/110192#issuecomment-1765157285))
2023-10-16 19:43:20 +00:00
c271df9239 IPUHooksInterface: fix a typo, remove const & (#111372)
Return an `at::Generator` from `newIPUGenerator`, not a reference to one.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111372
Approved by: https://github.com/Skylion007
2023-10-16 19:19:40 +00:00
07f0413b70 [c10d] add nccl version to c10d logger (#111215)
Summary: NCCL version is essential for debugging purpose and NCCL rollout monitoring. Log this info for easy access.

Test Plan:
run cmf10x on devgpu

https://pxl.cl/3B5gf

https://fburl.com/scuba/pytorch_c10d_logging/lybk2usq

Differential Revision: D50240853

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111215
Approved by: https://github.com/Skylion007
2023-10-16 18:47:49 +00:00
ff432c048d [easy] Remove duplicate exprs in produce_guards (#111270)
Summary: We're checking the original guard.expr in the issued set instead of the simplified expr, leading to duplicate guards in cases where one expression simplifies to another.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111270
Approved by: https://github.com/Chillee, https://github.com/ezyang
2023-10-16 18:31:38 +00:00
b691d09010 fix: reset prefetch flag upon reshard (#111354)
The `prefetched` flag should be reset upon reshard. Otherwise, for zero2, next access to the corresponding parameter will skip "unshard" operation, and results in wrong parameter shape.

The need for unsharding is also mentioned [in the comment of `FlatParameterHandle.unshard`](https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp/_flat_param.py#L1241-L1242).

As [`FlatParameterHandle` already guarded it against unnecessary all gather](https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp/_flat_param.py#L1240), this shouldn't incur extra communication overhead.

_Personally I also find `_prefetched` a bit mis-named; it should really be `_unsharded`._
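
Illustrative pseudocode of the fix described above (hypothetical names; not the actual FSDP source):

```python
def _post_forward_reshard(state, handle):
    # `should_free` is a made-up stand-in for the real condition, which is
    # often False under SHARD_GRAD_OP, so the flag was never reset before.
    handle.reshard(handle.should_free)
    handle._prefetched = False  # reset so the next _pre_forward_unshard really unshards
```
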
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111354
Approved by: https://github.com/awgu
2023-10-16 18:31:33 +00:00
9ab6ac5bc1 [ONNX] Fix aten::new_zeros due to TorchScript behavior change on Pytorch 2.1 Fix #110935 (#110956)
Fixes #110597

Summary:

* Generic code: `torch._C.Value.node().mustBeNone()` is encapsulated into the high-level API `JitScalarType.from_value`; `_is_none` was also extended to allow either `None` or `torch._C.Value.node.mustBeNone()`, so users don't have to manually call into the TorchScript API when implementing operators
* Specific to `new_zeros` (and `*_like` and `new_*` ops): when checking `dtype`, we must always use `_is_none`, which follows the behavior change proposed by #110935
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110956
Approved by: https://github.com/justinchuby, https://github.com/BowenBao
2023-10-16 18:28:20 +00:00
9f562a3de3 [dynamo] make disable_cache_limit also disable accumulated cache limit (#111334)
Fixes #111329.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111334
Approved by: https://github.com/yanboliang
2023-10-16 17:59:04 +00:00
89f11c69a8 Revert "[inductor] Adding a way to force fusion of int_mm with mul (#111125)"
This reverts commit f4297576e63e4110f6bdf2522ae6a5fb4c7f3816.

Reverted https://github.com/pytorch/pytorch/pull/111125 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it fails on ROCm f4297576e6 ([comment](https://github.com/pytorch/pytorch/pull/111125#issuecomment-1764956174))
2023-10-16 17:37:13 +00:00
59281d5631 [tp] fix SP style regression (#111353)
[tp] fix SP style regression

Although we want to remove prepare_input/output, we should still keep
the old behavior for SequenceParallel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111353
Approved by: https://github.com/fduwjj
2023-10-16 17:18:17 +00:00
493618d745 Revert "[C10D] Introduce C++ side Collective Callbacks. (#110307)"
This reverts commit 359336e3e9a0f67974e53805b5207fbbbc149490.

Reverted https://github.com/pytorch/pytorch/pull/110307 on behalf of https://github.com/wconstab due to this sits on top of another PR https://github.com/pytorch/pytorch/pull/111072 that needs to be reverted due to internal release testing failure / multisect blame ([comment](https://github.com/pytorch/pytorch/pull/110307#issuecomment-1764910301))
2023-10-16 17:07:58 +00:00
6462d71c10 Fixes a typo in docstring: should be "elastic" (#111352)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111352
Approved by: https://github.com/H-Huang
2023-10-16 16:54:52 +00:00
0d368f586a fix wrong meta for index_select.out (#111364)
fixes https://github.com/pytorch/pytorch/issues/110699

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111364
Approved by: https://github.com/ezyang
ghstack dependencies: #111040
2023-10-16 15:18:20 +00:00
4cf23c6a61 FunctionalTensor: avoid spurious not_implemented logging during proxy tracing (#111040)
This is kind of hard to test, but I can try to add a test case if requested.

I noticed locally that we now end up logging to the ProxyTensorMode and FakeTensorMode `not_implemented` logs in very simple compile examples: https://github.com/pytorch/pytorch/blob/main/torch/fx/experimental/proxy_tensor.py#L269

It was because `_mirror_autograd_meta_to()` indirectly queries sizes, and since modes have higher priority than subclasses, `aten::sym_sizes()` was getting dispatched to our modes before going to `FunctionalTensor.__torch_dispatch__`.

This works out fine (they return NotImplemented and we eventually get to `FunctionalTensor`) but I figured we want to avoid cluttering up the logs. So I wrapped the calls with `FunctionalTensorMode`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111040
Approved by: https://github.com/ezyang
2023-10-16 15:18:20 +00:00
50b80185d6 fix bugs about traceback.walk_stack in python3.8.x (#110922)
Fixes #110769

as stated.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110922
Approved by: https://github.com/mikaylagawarecki
2023-10-16 14:29:07 +00:00
126d422cf0 Error if you try to run Dynamo compiled function under torch.jit.trace (#111321)
Fixes https://github.com/pytorch/pytorch/issues/111319

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111321
Approved by: https://github.com/Chillee
2023-10-16 13:52:29 +00:00
78909a6f0b [xla hash update] update the pinned xla hash (#111360)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned xla hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111360
Approved by: https://github.com/pytorchbot
2023-10-16 11:53:00 +00:00
9af82fa2b8 Revert "[vision hash update] update the pinned vision hash (#111316)"
This reverts commit da364449909b02202e542952c271244a33412c4a.

Reverted https://github.com/pytorch/pytorch/pull/111316 on behalf of https://github.com/DanilBaibak due to Broken trunk ([comment](https://github.com/pytorch/pytorch/pull/111316#issuecomment-1763827734))
2023-10-16 06:43:09 +00:00
b4745d476c Revert "[sparse] semi-structured sparse + torch.compile support (#111049)"
This reverts commit ac02531babab028cb260d2225ff9e91e92df063b.

Reverted https://github.com/pytorch/pytorch/pull/111049 on behalf of https://github.com/DanilBaibak due to Broken trunk ([comment](https://github.com/pytorch/pytorch/pull/111049#issuecomment-1763795957))
2023-10-16 06:16:59 +00:00
bfcd86955e [TP] Fix TP doc format to show examples correctly (#111346)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111346
Approved by: https://github.com/wanchaol
ghstack dependencies: #111160, #111166, #111176, #111177
2023-10-16 06:15:10 +00:00
e0e15a4ac6 update int4 tinygemm kernels (#111327)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111327
Approved by: https://github.com/msaroufim
ghstack dependencies: #111314
2023-10-15 21:53:29 +00:00
882bc1708b [dtensor][11/n] adds some __str__ for ease of read (#111278)
This adds `__str__` to op schema and dtensor spec for ease of reading
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111278
Approved by: https://github.com/fduwjj
ghstack dependencies: #109145, #110717, #111234
2023-10-15 16:00:31 +00:00
6b5d736bf7 [dtensor][10/n] switch pointwise op to use op strategy (#111234)
As titled, this also handles cases like [Shard(0), Shard(0)] correctly for
pointwise ops, which previously errored out
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111234
Approved by: https://github.com/fduwjj
ghstack dependencies: #109145, #110717
2023-10-15 16:00:31 +00:00
f34f3b5421 [dtensor][9/n] matrix ops to generate strategy (#110717)
This PR switches matrix ops to generate sharding strategies; with the
cost-selection algorithm introduced in the previous PR, this and more ops
can leverage strategy-based sharding propagation.

This also fixes a bunch of corner cases that existing propagation does
not cover, resulting in full coverage for baddbmm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110717
Approved by: https://github.com/fduwjj
ghstack dependencies: #109145
2023-10-15 16:00:16 +00:00
b4ab8ac515 [dtensor][8/N] Introduce cost model for sharding (#109145)
This PR adds some basic comm cost model for sharding prop
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109145
Approved by: https://github.com/fduwjj
2023-10-15 16:00:06 +00:00
25a2845d78 [TP] Enable embedding sharding in TP API (#111177)
We see use cases where embedding sharding is also needed in the TP API, so we enabled it, since DTensor already supports colwise embedding sharding.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111177
Approved by: https://github.com/wanchaol
ghstack dependencies: #111160, #111166, #111176
2023-10-15 11:49:56 +00:00
e942fddb83 Fix get_estimated_runtime for symbolic shapes (#111314)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111314
Approved by: https://github.com/lezcano
2023-10-15 05:40:03 +00:00
e6d9350d7f direct runtime assertions (#111262)
Previously we were generating a graph to add runtime assertions on inputs and then running that graph to check input constraints. This PR checks input constraints directly.

Differential Revision: D50289970

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111262
Approved by: https://github.com/zhxchen17
2023-10-15 05:15:09 +00:00
7df287dc18 [state_dict][4/N] Support strict flag for model.load_state_dict (#111109)
As title

Differential Revision: [D50209723](https://our.internmc.facebook.com/intern/diff/D50209723/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111109
Approved by: https://github.com/wz337
ghstack dependencies: #111106, #111107, #111275
2023-10-15 04:58:15 +00:00
da36444990 [vision hash update] update the pinned vision hash (#111316)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111316
Approved by: https://github.com/pytorchbot
2023-10-15 04:37:20 +00:00
4a388e70f2 Update mypy to 1.6.0 (#111305)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111305
Approved by: https://github.com/janeyx99
2023-10-15 01:55:44 +00:00
48989bc820 trace frames with np.ndarray (#110512)
Fixes #109604

Resubmit gh-109715 + several skips and small fixes to make tests pass.

The main fix here is by @ysiraichi : previously, dynamo did not resume tracing numpy ndarrays after a graph break.
While at it, fix several small issues Yukio's fix uncovers:

- graph break gracefully on numpy dtypes which do not map to torch.dtypes (uint16 etc)
- recognize array scalars in dynamo, treat them as 0D ndarrays
- make sure that iterating over torch.ndarray generates arrays not bare tensors
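
As a minimal sketch of what this enables (illustrative only; `traced` is a made-up name, and the behavior assumes a build with this numpy-tracing support):

```python
import numpy as np
import torch

@torch.compile
def traced(x):
    # numpy ops on the ndarray argument are traced instead of breaking the graph
    return (np.sin(x) ** 2).sum()

out = traced(np.ones(4))
print(type(out))  # an ndarray/array scalar, not a bare torch.Tensor
```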

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110512
Approved by: https://github.com/lezcano
2023-10-15 00:56:10 +00:00
da662248fb [Dynamo] Fix autograd.Function tracing errors loudly involving saved tensors (#111277)
Fixes #104792

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111277
Approved by: https://github.com/jansel, https://github.com/zou3519
2023-10-15 00:47:59 +00:00
ff3d773dd9 [TP] Add deprecation warnings in the documentations for Pairwise parallel, sequence parallel and other prepare input/output functions (#111176)
As part of the TP UX improvements, we want to keep our API simple (not easy) so that users get the flexibility to do what they want, and to avoid an overly generic API that tries to solve everything and gets too complicated. We are updating the doc accordingly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111176
Approved by: https://github.com/wanchaol
ghstack dependencies: #111160, #111166
2023-10-15 00:39:24 +00:00
73d288fdf9 [aotinductor] Relax ExternKernel kwargs checking (#111167)
Summary: When a fallback kernel is called without specifying any kwargs, we still need to fill in default values for those kwargs when generating the cpp call.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111167
Approved by: https://github.com/chenyang78, https://github.com/jgong5
2023-10-14 21:41:33 +00:00
5caf2e55d4 [FSDP] fix: fix for fsdp zero2 validation error (#110139)
# Problem
When sharding_strategy is set to SHARD_GRAD_OP and forward_prefetch is turned on, validation after training sees an incorrect weight shape.
<img width="1508" alt="image" src="https://github.com/pytorch/pytorch/assets/41232043/57a9c3bb-cb5c-46df-ac26-922740686f9e">

# Analyze
When using `SHARD_GRAD_OP`, the `free_unsharded_flat_param` in `_post_forward_reshard` is often False, so it does not set the handle's `_prefetched` flag to False after the forward.

The normal train phase sets this flag to False in the `_post_backward_final_callback`, and the validation phase doesn't execute the hook, so after the first validation iter is done, the handle's `_prefetched` flag remains True.

This will cause the handle to skip the `_unshard` in the next `_pre_forward_unshard`, and the `_prefetch_handle` will not do a prefetch, which will result in an incorrect weight shape.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110139
Approved by: https://github.com/awgu
2023-10-14 20:59:28 +00:00
6dc54fe8d6 [BE] Compile FBGEMM with ASAN (#111266)
If `USE_ASAN` is set, compile FBGEMM with ASAN as well, by setting `USE_SANITIZER` to `address,undefined`

This fixes regression in sanitizer coverage introduced by https://github.com/pytorch/pytorch/pull/93147  that change effects of sanitizer from the entire project to just torch libraries, and finally allows one to reliably catch regression reported in https://github.com/pytorch/pytorch/issues/111189

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111266
Approved by: https://github.com/huydhn
2023-10-14 20:35:04 +00:00
cff8bf47c3 update the dispatch of some operators which accept scalar (#110918)
The scalar overloads of some ops like `bitwise_xor.Scalar` were dispatched to `CompositeImplicitAutograd` by default. This violates the rule for `CompositeImplicitAutograd` kernels that all tensor operations (except reading metadata) must go through the ATen dispatcher rather than interacting with the Tensor directly.
So this PR updates the dispatch of these overloads to `CompositeExplicitAutograd`.
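
A sketch of why the distinction is observable from Python: under a `__torch_dispatch__` mode, a `CompositeExplicitAutograd` op shows up as itself instead of as its decomposition (`LogOps` is a made-up helper):

```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode

class LogOps(TorchDispatchMode):
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        print(func)  # the scalar overload is now visible here, undecomposed
        return func(*args, **(kwargs or {}))

with LogOps():
    torch.tensor([1, 2]) ^ 1  # bitwise_xor with a Scalar operand
```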
Fixes #93224

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110918
Approved by: https://github.com/peterbell10
2023-10-14 15:54:08 +00:00
8085e08a84 [TP] Add prepareInput and output for input/output DTensor layout annotation in the parent module in TP API (#111166)
In some use cases, we found that users might want to annotate the input/output DTensor layouts on the parent module rather than on the submodule whose parameters are to be distributed, so we add these two classes for annotating input/output DTensor layouts; they register pre-forward/forward hooks on the TP-lized module.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111166
Approved by: https://github.com/wanchaol
ghstack dependencies: #111160
2023-10-14 15:37:52 +00:00
7c67139e7b [state_dict][3/N] Cleanup StateDictOptions, make it more readable (#111275)
This is a reland PR for https://github.com/pytorch/pytorch/pull/111108 with the proper docstring fix.

1. Rename DistributedStateDictOptions to StateDictOptions.
2. Remove cpu_offload as we have not yet required this option.
3. Rename save_frozen_parameters to ignore_frozen_params.

Differential Revision: [D50294352](https://our.internmc.facebook.com/intern/diff/D50294352/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111275
Approved by: https://github.com/wz337
ghstack dependencies: #111106, #111107
2023-10-14 15:34:52 +00:00
3a8b10e2da [TP] Refactor Parallel Style to make it more usable (#111160)
One thing users find challenging is the concept of prepare_input and prepare_out: there are so many function names to select from that it is quite confusing, so we don't want to expose them. On the other hand, colwise and rowwise parallel always need the input and output to be in certain layouts, so we can simplify the logic here and make it more usable.

So we added three public attributes to `ParallelStyle`; the code logic looks like this:

```python
class ParallelStyle(ABC):
    """
    The parallel style the user wants the module or submodule to be parallelized with.
    We can add more in the future, but this seems sufficient for immediate needs.
    Users can extend this class to build their own parallel style with customized
    input/output preparations.
    """
    input_layouts: Union[Placement, Tuple[Placement]]
    output_layouts: Union[Placement, Tuple[Placement]]
    use_local: bool

class RowwiseParallel(ParallelStyle):
    """
    Partition the rows of a module. We assume the input is a sharded DTensor and the output is a replicated Tensor.
    """
    def __init__(self):
        super().__init__(input_layouts=Shard(-1), output_layouts=Replicate(), use_local=True)

class ColwiseParallel(ParallelStyle):
    """
    Partition the columns of a module. We assume the input is a replicated DTensor and the output is a sharded DTensor.
    """
    def __init__(self):
        super().__init__(input_layouts=Replicate(), output_layouts=Shard(-1), use_local=True)

# For the case of sequence parallel, users just set a different input layout,
# Shard(0) or Shard(1), instead of Replicate()

class PrepareModuleInput(ParallelStyle):
    """
    Only used to specify the input distribution spec for a module.
    """
    def __init__(self):
        super().__init__(input_layouts=Shard(0), output_layouts=Replicate(), use_local=False)

class PrepareModuleOutput(ParallelStyle):
    """
    Only used to specify the output distribution spec for a module.
    """
    def __init__(self):
        super().__init__(input_layouts=Replicate(), output_layouts=Shard(0), use_local=True)

parallelize_plan = {
    "embedding": ColwiseParallel(output_layouts=Replicate()),
    "attn": PrepareModuleInput(),
    "attn.w1": ColwiseParallel(),
    "attn.w2": ColwiseParallel(),
    "attn.w3": ColwiseParallel(),
    "attn.wo": RowwiseParallel(),
}

parallelize_module(
    module=block,  # this can be a submodule or module
    device_mesh=mesh["tp"],
    parallelize_plan=parallelize_plan,
)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111160
Approved by: https://github.com/wanchaol
2023-10-14 15:26:36 +00:00
b28cb43f5c Intra-graph reordering pass on Inductor scheduler IR (based on #100762) (#108091)
This PR implements intra-graph communication reordering pass on Inductor scheduler IR, based on Horace's previous PR #100762.

Main algorithm:
1. Greedily moves waits as late as possible (i.e. until we reach a use)
2. Greedily moves comms as early as possible (i.e. until we reach an input)
3. Move computes following simple heuristics to improve overlap.
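
A toy sketch of step 1 on a plain node list (illustrative only; the real pass operates on Inductor's scheduler IR, and `Node` here is a made-up stand-in):

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # identity hashing so nodes can live in `users` sets
class Node:
    name: str
    is_wait: bool = False
    users: set = field(default_factory=set)

def sink_waits(order):
    """Move each wait node as late as possible, i.e. until its first user."""
    out, pending = [], []
    for n in order:
        keep = []
        for w in pending:
            if n in w.users:   # reached a use: place the wait just before it
                out.append(w)
            else:
                keep.append(w)
        pending = keep
        (pending if n.is_wait else out).append(n)
    out.extend(pending)        # waits that were never used land at the end
    return out
```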

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108091
Approved by: https://github.com/Chillee, https://github.com/wanchaol
2023-10-14 14:51:24 +00:00
cyy
8bd5eb8c96 [2/N] Apply clang-tidy to aten/src/ATen/core/ (#111006)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111006
Approved by: https://github.com/Skylion007
2023-10-14 14:15:42 +00:00
d7317d8a11 Fix size_hint call sites failing on unbacked SymInts (#110520)
Summary: Unbacked SymInts can't get a `sizevars.size_hint` due to being data-dependent. #109893 has added a new `fallback` parameter to `sizevars.size_hint` to specify the fallback value in cases like unbacked SymInt. In this PR we add more of those.

Test Plan: CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110520
Approved by: https://github.com/jansel, https://github.com/ezyang
2023-10-14 08:10:09 +00:00
0013611c81 [inductor] Allow backend compiler to skip (#111153)
Summary:
Sometimes the backend compiler can encounter a transient failure (in
our case, a remote build service infrequently hits a hiccup).  We'd rather run
eager than fail the training job.

Test Plan:
Inject an exception in the RE path and run:
```
buck2 run @//mode/{opt,inplace} //caffe2/test/inductor:smoke
```

Differential Revision: D50234516

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111153
Approved by: https://github.com/ezyang, https://github.com/jansel
2023-10-14 02:44:15 +00:00
48e4d18388 [BE] Move ASAN from clang-12 to clang-15 (#111218)
Hopefully this aligns with the internal systems so they detect the heap-overflow access reported in https://github.com/pytorch/pytorch/issues/111189. Also, do not build Triton, protobuf, or DB dependencies (they are not needed for ASAN builds/tests)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111218
Approved by: https://github.com/Skylion007
2023-10-14 02:31:41 +00:00
581d97c19e Revert "[state_dict][3/N] Cleanup StateDictOptions, make it more readable (#111108)"
This reverts commit b1db9590853d2ac205bc57c906d81935874daf09.

Reverted https://github.com/pytorch/pytorch/pull/111108 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but I think it is cleaner to reland this change ([comment](https://github.com/pytorch/pytorch/pull/111108#issuecomment-1762504496))
2023-10-14 02:22:19 +00:00
11ac4ace5f [export] Use meta val from the old nodes in run_decompositions(). (#111225)
Summary: fall back to the old nodes when meta val is missing.

Test Plan: buck2 run //executorch/examples/portable/scripts:export -- --model_name=emformer_predict

Differential Revision: D50278439

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111225
Approved by: https://github.com/larryliu0820
2023-10-14 02:08:49 +00:00
ac02531bab [sparse] semi-structured sparse + torch.compile support (#111049)
Summary:

This PR adds in torch.compile support for semi-structured sparsity,
using the subclass tracing @bdhirsh added.

Based on whether we are using cuSPARSELt or CUTLASS, we return a
different representation of the inner tensors.

Test Plan:
```
python test/test_sparse_semi_structured.py -k compile
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111049
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #110583
2023-10-14 01:13:01 +00:00
1c30814417 Add lazy_clone_storage to create COW storages (#110192)
This PR relands #110022 but accounts for the changes in #110191. Also, the function for creating COW storages is called `lazy_clone_storage` in this PR, instead of `try_ensure`

NOTE: COW storages do not actually copy on write yet, they just have the COW deleter and deleter context applied to them

Part of #109833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110192
Approved by: https://github.com/ezyang
2023-10-14 00:53:21 +00:00
482782406a Revert "Add lazy_clone_storage to create COW storages (#110192)"
This reverts commit 33f151348684bd74fbc9939f00c39408ef92074d.

Reverted https://github.com/pytorch/pytorch/pull/110192 on behalf of https://github.com/kit1980 due to revert to work around some importing issues ([comment](https://github.com/pytorch/pytorch/pull/110192#issuecomment-1762430374))
2023-10-14 00:48:45 +00:00
4f4e2c1c08 Add constant node sizes to proto size calculation (#111097)
Fixes #110982

https://github.com/pytorch/pytorch/pull/62257 deprecated `torch.onnx.export(use_external_data_format: bool=...)`  argument, but it seems the introduced `EncoderBase::GetGraphProtoSize` has a bug and doesn't detect models > 2GB when onnx Constant nodes are large (and responsible for the size overflow)

This PR adds the constant node to the total size of the model, along with initializers.

In python, what we need to do is:

```python
import onnx

def compute_tensor_size(tensor):
    # Compute the size of the tensor based on its shape and data type
    size = tensor.size * tensor.itemsize
    return size

def sum_constant_and_initializer_sizes(model_path):
    # Load the ONNX model
    model = onnx.load(model_path)

    total_size = 0
    initializer_size = 0
    constant_size = 0

    # Compute the size of constant nodes
    for node in model.graph.node:
        if node.op_type == 'Constant':
            constant_value = node.attribute[0].t
            # Convert constant value to numpy array
            constant_array = onnx.numpy_helper.to_array(constant_value)
            # Compute the size of the constant tensor
            tensor_size = compute_tensor_size(constant_array)
            total_size += tensor_size
            constant_size += tensor_size

    # Compute the size of initializer nodes that are not graph inputs
    for initializer in model.graph.initializer:
        if initializer.name not in [input.name for input in model.graph.input]:
            # Convert the shape and data type information to calculate size
            # tensor = onnx.helper.tensor_value_info_to_tensor(input)
            tensor = onnx.numpy_helper.to_array(initializer)
            tensor_size = compute_tensor_size(tensor)
            total_size += tensor_size
            initializer_size += tensor_size

    return total_size, constant_size, initializer_size

model_path = '/path/to/model.onnx'
total_size, constant_size, initializer_size = sum_constant_and_initializer_sizes(model_path)

print("Total size of constant nodes in bytes:", constant_size)
print("Total size of initializer nodes (excluding graph inputs) in bytes:", initializer_size)
print("Total size of constant and initializer nodes (excluding graph inputs) in bytes:", total_size)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111097
Approved by: https://github.com/justinchuby, https://github.com/zhipenghan
2023-10-14 00:37:02 +00:00
3b08a4a6b2 [dynamo] collapse local and global guard builders (#111226)
[Wait for CI] [dynamo] collapse local and global guard builders

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111226
Approved by: https://github.com/ezyang
2023-10-14 00:16:59 +00:00
bb89a9e48c Skipped CUDA Flags if C++ Extension Name includes "arch" Substring (#111211)
The CUDA architecture flags from TORCH_CUDA_ARCH_LIST will be skipped if the TORCH_EXTENSION_NAME includes the substring "arch". A C++ Extension should be allowed to have any name. I just manually skip the TORCH_EXTENSION_NAME flag when checking if one of the flags is "arch". There is probably a better fix, but I'll leave this to experts.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111211
Approved by: https://github.com/ezyang
2023-10-14 00:10:01 +00:00
f4297576e6 [inductor] Adding a way to force fusion of int_mm with mul (#111125)
Summary: When doing quantization, int_mm -> mul or int_mm -> mul -> to(dtype)
is an extremely common op pattern that is currently not handled well by
inductor. Ideally, since the output of int_mm has dtype int32, we'd prefer to
only realize a smaller dtype like bf16 or float16. Currently inductor doesn't
have a way to force this; in many cases the mul gets fused with a bunch of
subsequent pointwise ops from the dequant, increasing memory overhead and
causing a general slowdown compared to the fused version.
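
For reference, a sketch of the pattern in question, assuming the private `torch._int_mm` entry point (which has device and shape constraints):

```python
import torch

def int8_mm_scaled(a_int8, b_int8, scale):
    acc = torch._int_mm(a_int8, b_int8)      # int32 accumulator
    return (acc * scale).to(torch.bfloat16)  # the mul -> to(dtype) tail to fuse
```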

Theoretically with better control of/smarter inductor fusion, this could be something we get for free, at which point these changes can be removed.

Test Plan: python test/inductor/test_pattern_matcher.py -k
"int_mm_mul"

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111125
Approved by: https://github.com/jansel, https://github.com/cpuhrsch
2023-10-13 23:37:14 +00:00
c151163333 Documentation Clarification on torch.compile Example (#110942)
Fixes #110917
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110942
Approved by: https://github.com/msaroufim, https://github.com/malfet
2023-10-13 22:46:42 +00:00
00d962631c [BE] Enable Ruff's Flake8 PYI045 (#111184)
Enable [iter-method-return-iterable (PYI045)](https://docs.astral.sh/ruff/rules/iter-method-return-iterable/#iter-method-return-iterable-pyi045)

Link: #110950
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111184
Approved by: https://github.com/Skylion007
2023-10-13 22:20:04 +00:00
ba7b9211ee [export] Update serialization schema to input/output specs. (#845) (#111204)
Summary: Pull Request resolved: https://github.com/pytorch/executorch/pull/845

Test Plan: CI

Differential Revision: D50191531

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111204
Approved by: https://github.com/angelayi
2023-10-13 22:19:56 +00:00
a3e9b80082 Fix torch.diagonal for torch.onnx.export when dim1<0 or dim2<0 (#111130)
In many cases, torch.diagonal is called with (dim1=-2, dim2=-1), and ONNX export always fails in these cases.
This PR fixes the bug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111130
Approved by: https://github.com/thiagocrepaldi
2023-10-13 22:05:53 +00:00
375e7bd003 Un-skip a bunch of UnaryUfuncInfo bfloat16 tests (#110799)
It appears they were disabled because the test suite didn't use to
support weaker tolerances for them. But it seems that has since
been addressed; e.g. we have relaxed tolerances specified in ded5ee75ac/test/test_unary_ufuncs.py (L208-L209)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110799
Approved by: https://github.com/eellison
ghstack dependencies: #110798
2023-10-13 21:46:53 +00:00
d8de45d22c Update arg{min,max} tests and docs (#110845)
The `argmin` docs had been updated in
https://github.com/pytorch/pytorch/issues/78791 but left a minor typo.

`argmax` had a similar issue but was not noticed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110845
Approved by: https://github.com/eellison
2023-10-13 21:40:29 +00:00
d38472c176 Don't sympify reflection_pad2d ranges (#111212)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111212
Approved by: https://github.com/eellison
2023-10-13 21:36:30 +00:00
382327bd0e [BE] Enable Ruff's Flake8 PYI034 (#111105)
Enable [non-self-return-type (PYI034)](https://docs.astral.sh/ruff/rules/non-self-return-type/#non-self-return-type-pyi034)

Link: #110950

**EDIT**: to newly added reviewers, please ignore the request, it's due to a rebase error 😅

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111105
Approved by: https://github.com/Skylion007
2023-10-13 21:19:53 +00:00
2fd546aa5e Allow strided layout in torch.normal (#111205)
Fixes https://github.com/pytorch/pytorch/issues/111119

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111205
Approved by: https://github.com/ezyang
2023-10-13 21:17:38 +00:00
b1db959085 [state_dict][3/N] Cleanup StateDictOptions, make it more readable (#111108)
1. Rename DistributedStateDictOptions to StateDictOptions.
2. Remove cpu_offload as we have not yet required this option.
3. Rename save_frozen_parameters to ignore_frozen_params.

Differential Revision: [D50209711](https://our.internmc.facebook.com/intern/diff/D50209711/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111108
Approved by: https://github.com/wz337
ghstack dependencies: #111106, #111107
2023-10-13 21:03:51 +00:00
625a3b1a42 Remove some patterns from PrimTorch merge rules (#111230)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111230
Approved by: https://github.com/ezyang
2023-10-13 20:37:15 +00:00
6aa91c8dad [dynamo] Register einops functions lazily (#110575)
Fixes #110549

We currently have a circular import between dynamo and einops as described in the issue.
This works around the issue by adding a mechanism to register initialization callbacks
that are called the first time an object is seen from that particular module.

This means that dynamo will only import `einops` after it's already fully initialized
and being called in a function being traced.
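
A hypothetical sketch of such a mechanism (all names invented for illustration; not dynamo's actual API):

```python
_init_callbacks = {}

def register_on_module_seen(root_module, callback):
    _init_callbacks.setdefault(root_module, []).append(callback)

def object_seen(obj):
    """Called by the tracer; fires callbacks the first time a module appears."""
    root = (getattr(obj, "__module__", None) or "").split(".")[0]
    for cb in _init_callbacks.pop(root, []):
        cb()  # e.g. register einops functions with dynamo only at this point

register_on_module_seen("einops", lambda: print("registering einops handlers"))
```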

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110575
Approved by: https://github.com/jansel
ghstack dependencies: #110990
2023-10-13 20:08:40 +00:00
8747e4c8c1 [dynamo] Add specialized variable tracker for sys.modules (#110990)
`sys.modules` is currently treated as a constant dictionary and any reference to
it will result in guards on the full contents of `sys.modules`. This instead
adds a specialized variable tracker which tries to guard only on the modules
referenced by the code. e.g.

```
sys.modules["operator"].add(x, x)
```

will generate the guard
```
___dict_contains('operator', G['sys'].modules)
```

It does this with special support for `__contains__` `__getitem__` and `.get`
which are probably the most commonly used with `sys.modules`. For anything else
we just fall back to building the dict tracker as normal.

While accessing `sys.modules` may seem unusual, it actually comes up when
inlining the `warnings.catch_warnings` context manager which internally accesses
`sys.modules["warnings"]`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110990
Approved by: https://github.com/ezyang
2023-10-13 20:08:40 +00:00
058cb70ad9 [CI] Add auto label rule for torch/_export (#111181)
Summary: Auto label all torch/_export changes with ciflow/inductor to trigger AOTInductor tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111181
Approved by: https://github.com/angelayi
2023-10-13 20:07:47 +00:00
8db72a430d [sparse] Add padding for dense matrices in semi-structured sparse (#110583)
Summary:

Currently we have shape constraints in semi-structured sparsity for both
CUTLASS and cuSPARSELt

These shape constraints unfortunately apply to both the dense and sparse
matrices in sparse-dense matmul.

This PR adds in support for calling `F.pad` in order to pad dense
matrices to the right size with zeros and then pull out the
corresponding rows from the resulting matrix.
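
A sketch of the padding idea (the helper and the `multiple` value are illustrative assumptions; the real constraint depends on dtype and backend):

```python
import torch
import torch.nn.functional as F

def pad_dense_rows(dense: torch.Tensor, multiple: int = 8):
    rows = dense.shape[-2]
    pad = (-rows) % multiple
    if pad:
        dense = F.pad(dense, (0, 0, 0, pad))  # zero-pad the row dimension
    return dense, rows  # keep the original row count to slice the output later
```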

We also throw a warning in this case.
The tests have also been updated to take in a dense_input_shape
parameter.

Test Plan:
```
python test/test_sparse_semi_structured.py
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110583
Approved by: https://github.com/alexsamardzic, https://github.com/cpuhrsch
2023-10-13 20:04:23 +00:00
2b6f281e5c Revert "Remove dead code (#111207)"
This reverts commit c2ed714f54eb6564123bb53401c4c66aeba40625.

Reverted https://github.com/pytorch/pytorch/pull/111207 on behalf of https://github.com/huydhn due to Sorry for reverting this, but it breaks lint c2ed714f54 ([comment](https://github.com/pytorch/pytorch/pull/111207#issuecomment-1762126366))
2023-10-13 19:56:11 +00:00
cf6b1cdf6a Revert "AOTAutograd: Go down inference path if no outputs require grad (#111011)"
This reverts commit ded5ee75ac51af1614cc79cd9c6f76524f10c3d8.

Reverted https://github.com/pytorch/pytorch/pull/111011 on behalf of https://github.com/kit1980 due to broke internal aotinductor tests with inference_mode ([comment](https://github.com/pytorch/pytorch/pull/111011#issuecomment-1762056233))
2023-10-13 19:11:26 +00:00
d84dcfb3e0 [Doc] Fix typo in cpp/installing when wheel is used (#111143)
<!--
copilot:summary
-->
### <samp>🤖 Generated by Copilot at 34a15a6</samp>

Updated the cmake command in `docs/cpp/source/installing.rst` to use python3 and fix a documentation error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111143
Approved by: https://github.com/clee2000, https://github.com/kit1980
2023-10-13 18:56:27 +00:00
c2ed714f54 Remove dead code (#111207)
This dictionary is not used anywhere. The _make_dupe_guard function does
not exist anymore.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111207
Approved by: https://github.com/Skylion007, https://github.com/voznesenskym
2023-10-13 18:46:27 +00:00
e99abaae2f [state_dict][2/N] Let distributed.state_dict accepts single optimizer (#111107)
It's quite annoying that users have to create a tuple of optimizers even if there is only one optimizer. This PR makes most users' lives easier.

Differential Revision: [D50209704](https://our.internmc.facebook.com/intern/diff/D50209704/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111107
Approved by: https://github.com/wz337
ghstack dependencies: #111106
2023-10-13 18:40:57 +00:00
ac768333be [dynamo] fix prim lowering validation logic for dynamic shape args (#111208)
Fixes https://github.com/pytorch/pytorch/issues/111199

Fixes https://github.com/pytorch/pytorch/issues/111203

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111208
Approved by: https://github.com/ezyang
2023-10-13 18:36:13 +00:00
247d5e16fc [DCP] Improve with_temp_dir robustness (#111106)
Call os.sync() to ensure the tempfile can be seen across ranks.

Differential Revision: [D50209697](https://our.internmc.facebook.com/intern/diff/D50209697/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111106
Approved by: https://github.com/Skylion007, https://github.com/wz337
2023-10-13 18:03:24 +00:00
eqy
5a2ab7dcb7 [CUDA][cuFFT] Initialize CUDA context for cuFFT before execute is called (#110326)
Potential fix for #109448

CC @Aidyn-A

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110326
Approved by: https://github.com/Aidyn-A, https://github.com/malfet
2023-10-13 18:02:25 +00:00
f68d6e8108 Revert "Move at::{Refcounted,}MapAllocator to c10 (#109881)"
This reverts commit 68a1219f74467a4d2124288f3ab6f8bc471fe4a1.

Reverted https://github.com/pytorch/pytorch/pull/109881 on behalf of https://github.com/kit1980 due to breaking internal builds, undefined symbol: _ZN3c1022RefcountedMapAllocator6decrefEv ([comment](https://github.com/pytorch/pytorch/pull/109881#issuecomment-1761950014))
2023-10-13 17:57:53 +00:00
8162f4170b Fix typo under c10 directory (#111155)
This PR fixes typo in comments and messages in files under `c10` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111155
Approved by: https://github.com/Skylion007
2023-10-13 16:52:51 +00:00
ac48c11ab7 Fix typo under torchgen directory (#111154)
This PR fixes typo in comments and messages in files under `torchgen` directory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111154
Approved by: https://github.com/rajveer43, https://github.com/Skylion007
2023-10-13 16:43:46 +00:00
b460c30893 [BE] Enable Ruff's Flake8 PYI042 (#111114)
Enable [snake-case-type-alias (PYI042)](https://docs.astral.sh/ruff/rules/snake-case-type-alias/)

Link: #110950
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111114
Approved by: https://github.com/albanD
2023-10-13 16:33:07 +00:00
5db9f911ac [pt][group_fusion] fix shape guarding in fusion candidate search (#111174)
Summary:
without the `all` in the fix
```
node.kwargs.get("beta", 1.0) == 1.0
node.kwargs.get("alpha", 1.0) == 1.0
and len(input_shape) == 2
and len(weight_shape) == 2
and all(x % 2 == 0 for x in input_shape + weight_shape)
and shape <= MAX_FUSE_TENSOR_SIZE_GROUP_LINEAR # <----- HERE
for shape in input_shape + weight_shape
```
without `all`, this statement evaluates to a bare generator object, which is always truthy, so the size check never fails. One issue is that the shapes could be odd, which forces gmm to load element-by-element rather than using vectorized loads. In the VDDv3 torchbench example (posted in the test plan), you can see a 37ms GMM call which swamps any gain from fusion. Overall this change makes the GMM fusion 24% faster
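
A quick illustration of why the missing `all` matters:

```python
shapes = [3, 5]                          # odd sizes should fail the check
print(bool(s % 2 == 0 for s in shapes))  # True: a generator is always truthy
print(all(s % 2 == 0 for s in shapes))   # False: the intended check
```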

Differential Revision: D48696572

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111174
Approved by: https://github.com/davidberard98
2023-10-13 16:28:02 +00:00
84975339bd [PyTorch] AOTI: generate reused thread_locals when tensors provably have static shape (#110892)
If a Tensor can be reused and has static shape, we can just cache it across iterations.

This is meant as a quickly shippable overhead reduction for CPU overhead-bound use cases that we can ship without relying on memory planning.
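
A Python analogue of the idea (the generated code is C++ thread_locals; this sketch only shows the allocate-once-and-reuse shape):

```python
import torch

_buffers = {}

def reusable_buffer(key, shape, dtype):
    buf = _buffers.get(key)
    if buf is None:
        buf = torch.empty(shape, dtype=dtype)  # static shape => allocate once
        _buffers[key] = buf
    return buf  # reused across iterations instead of reallocating
```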

Differential Revision: [D50023678](https://our.internmc.facebook.com/intern/diff/D50023678/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110892
Approved by: https://github.com/bertmaher
ghstack dependencies: #110876, #110877, #110909
2023-10-13 16:07:05 +00:00
bf72a723ef [PyTorch] AOTI: Add aoti_torch_assign_tensors to ABI (#110909)
I need this to do a cheap and easy output copy in D50023678.

Differential Revision: [D50105080](https://our.internmc.facebook.com/intern/diff/D50105080/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110909
Approved by: https://github.com/jansel, https://github.com/chenyang78, https://github.com/desertfire
ghstack dependencies: #110876, #110877
2023-10-13 16:07:05 +00:00
cff71c47dd [dynamo] Forward fix a bunch of distributed collective allow fixes (#111171)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111171
Approved by: https://github.com/yanboliang
2023-10-13 15:49:04 +00:00
33f1513486 Add lazy_clone_storage to create COW storages (#110192)
This PR relands #110022 but accounts for the changes in #110191. Also, the function for creating COW storages is called `lazy_clone_storage` in this PR, instead of `try_ensure`.

NOTE: COW storages do not actually copy on write yet; they just have the COW deleter and deleter context applied to them

Part of #109833

Differential Revision: [D50265134](https://our.internmc.facebook.com/intern/diff/D50265134)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110192
Approved by: https://github.com/ezyang
2023-10-13 15:33:40 +00:00
35750bf9d1 [export] Fix issue with internal model (#111140)
Summary:
This error was encountered when running ExportPassBase on an exported model with lifted constant tensors:
```
  File "/data/users/angelayi/pytorch/torch/_subclasses/fake_tensor.py", line 1444, in dispatch
    len(kwargs) == 0 and len(args) == 1 and type(args[0]) is torch.Tensor
AssertionError: (FakeTensor(..., size=(s0,)),) {}

While executing %lift_fresh_copy_1 : [num_users=1] = call_function[target=torch.ops.aten.lift_fresh_copy.default](args = (%_lifted_tensor_constant99,), kwargs = {})
Original traceback:
  File "" in forward
    mean = torch.tensor([0.485, 0.456, 0.406]).reshape(3, 1, 1)
```

In ExportPassBase, we retrace using the fake tensors in the placeholder nodes, but when we run into these lift_fresh_copy operators, they cannot be called with the fake tensors.
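
A simplified repro sketch of the pattern (the module and export call here are illustrative, not the internal model):

```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        # Constructing a constant tensor inside forward produces a
        # lift_fresh_copy node on the lifted constant when exported.
        mean = torch.tensor([0.485, 0.456, 0.406]).reshape(3, 1, 1)
        return x - mean

ep = torch.export.export(M(), (torch.randn(1, 3, 4, 4),))
print(ep.graph)
```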

Test Plan: CI

Reviewed By: chakriu

Differential Revision: D50211827

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111140
Approved by: https://github.com/zhxchen17
2023-10-13 14:07:07 +00:00
359336e3e9 [C10D] Introduce C++ side Collective Callbacks. (#110307)
C++ side callbacks allow advanced users to get
access to the collective firehose.

It's worth mentioning and discussing the dire environment in which those
callbacks are invoked: from either the main thread or the watchdog thread,
and with a PTD lock held.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110307
Approved by: https://github.com/fduwjj
ghstack dependencies: #111061, #111072
2023-10-13 13:53:16 +00:00
d24539ee6a Improve reflection_pad2d lowering for dynamic shapes (#110988)
Fixes https://github.com/pytorch/pytorch/issues/110696

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110988
Approved by: https://github.com/jansel, https://github.com/lezcano
2023-10-13 13:38:46 +00:00
0dfa354570 [inductor] Implement Fx graph caching to improve warm compilation time. (#103453)
Summary: Implement an on-disk cache to save and reuse compiled FX Graphs. This implementation does not handle tensors with symbolic shapes. This needs to be done in a follow-up PR.
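
Conceptually, the mechanism looks like the following toy sketch (not Inductor's actual implementation or key schema):

```python
import hashlib
import os
import pickle

CACHE_DIR = "/tmp/fx_graph_cache"  # illustrative location

def cache_key(graph_repr: str, config: dict) -> str:
    # Key on the graph plus any config that affects codegen.
    payload = graph_repr + repr(sorted(config.items()))
    return hashlib.sha256(payload.encode()).hexdigest()

def lookup(key: str):
    path = os.path.join(CACHE_DIR, key)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)  # cache hit: reuse the compiled artifact
    return None  # cache miss: compile and store

def store(key: str, artifact) -> None:
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(os.path.join(CACHE_DIR, key), "wb") as f:
        pickle.dump(artifact, f)
```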

Test Plan:
* New unit tests exercising saving and load from the cache.
* New unit tests to exercise the cache key calculations.
* Ran several benchmarks to see cache hit and resulting compilation times.

Differential Revision: [D50255289](https://our.internmc.facebook.com/intern/diff/D50255289)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103453
Approved by: https://github.com/eellison, https://github.com/Chillee
2023-10-13 13:33:56 +00:00
69dcbc02b0 [Dynamo]Expose bytecode hooks and add example usage for decompilation in docs (#110714)
Dynamo dynamically translates the bytecode of python functions, which is powerful but produces difficult-to-understand bytecode that most users cannot read. Although a general-purpose way to decompile python bytecode into source code is very difficult, I find that this work can be greatly simplified since Dynamo already cleans up the code: the bytecode generated by Dynamo is a reduced subset of well-behaved python bytecode.

I created a tiny decompiler for pytorch 2.0, named `depyf`: https://github.com/youkaichao/depyf .

There are several takeaways:

- **It supports python 3.7 - 3.11 (both inclusive), the same python versions supported by pytorch.** Since the main usage of this library is to understand pytorch 2.0, I plan to keep pace with pytorch. If pytorch supports a new python version, I can add support for that. (Actually, the core code is just about 1k lines. Adding support for new versions of python bytecode can be done in just several days.)
- **I have tested the correctness of the decompiled source code in torchbench.** I capture the modified bytecode generated by Dynamo, decompile it into source code, compile it back into new bytecode, and replace the Dynamo-generated bytecode with the new bytecode. **It passed all the accuracy tests for timm models.** For huggingface models, the situation is more complicated: all failed cases are caused by the compile step: some functions use `__class__` as a closure variable, but the decompiler can only get the code object, so it has no way to figure out the `__class__`, leading to a name error when compiling the decompiled code. That said, it passed the remaining tests without the `__class__` issue. Please see the log files https://cloud.tsinghua.edu.cn/f/685e4af8d930499baa7c/?dl=1 and https://cloud.tsinghua.edu.cn/f/cab89500e15e4b62890b/?dl=1 for details.

With the above efforts, I think it would be great to add an additional logging option in Dynamo: we can try to decompile the generated bytecode into source code, so that users can have a rough idea of what the modified bytecode does. It does not affect the workflow of Dynamo, but just adds more debug information.

An example code from the [doc](https://pytorch.org/docs/main/torch.compiler_deepdive.html):

```python
from typing import List
import torch
from torch import _dynamo as torchdynamo
def my_compiler(gm: torch.fx.GraphModule, example_inputs: List[torch.Tensor]):
    print("my_compiler() called with FX graph:")
    gm.graph.print_tabular()
    return gm.forward  # return a python callable

@torchdynamo.optimize(my_compiler)
def toy_example(a, b):
    x = a / (torch.abs(a) + 1)
    if b.sum() < 0:
        b = b * -1
    return x * b
for _ in range(100):
    toy_example(torch.randn(10), torch.randn(10))
```

Run with `export TORCH_LOGS="+dynamo,guards,bytecode"`.

Bytecode logging:

```
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] ORIGINAL BYTECODE toy_example /Users/youkaichao/DeepLearning/depyf/ykc_test.py line 8
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]  10           0 LOAD_FAST                0 (a)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               2 LOAD_GLOBAL              0 (torch)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               4 LOAD_METHOD              1 (abs)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               6 LOAD_FAST                0 (a)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               8 CALL_METHOD              1
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              10 LOAD_CONST               1 (1)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              12 BINARY_ADD
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              14 BINARY_TRUE_DIVIDE
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              16 STORE_FAST               2 (x)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]  11          18 LOAD_FAST                1 (b)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              20 LOAD_METHOD              2 (sum)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              22 CALL_METHOD              0
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              24 STORE_FAST               3 (__temp_2)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]  12          26 LOAD_FAST                3 (__temp_2)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              28 LOAD_CONST               2 (0)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              30 COMPARE_OP               0 (<)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              32 POP_JUMP_IF_FALSE       21 (to 42)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]  13          34 LOAD_FAST                1 (b)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              36 LOAD_CONST               3 (-1)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              38 BINARY_MULTIPLY
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              40 STORE_FAST               1 (b)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]  14     >>   42 LOAD_FAST                2 (x)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              44 LOAD_FAST                1 (b)
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              46 BINARY_MULTIPLY
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              48 RETURN_VALUE
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 23:56:44,929] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] MODIFIED BYTECODE toy_example /Users/youkaichao/DeepLearning/depyf/ykc_test.py line 8
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]   8           0 LOAD_GLOBAL              3 (__compiled_fn_0)
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               2 LOAD_FAST                0 (a)
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               4 LOAD_FAST                1 (b)
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               6 CALL_FUNCTION            2
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               8 UNPACK_SEQUENCE          2
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              10 STORE_FAST               2 (x)
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              12 POP_JUMP_IF_FALSE       12 (to 24)
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              14 LOAD_GLOBAL              4 (__resume_at_34_1)
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              16 LOAD_FAST                1 (b)
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              18 LOAD_FAST                2 (x)
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              20 CALL_FUNCTION            2
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              22 RETURN_VALUE
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]         >>   24 LOAD_GLOBAL              5 (__resume_at_42_2)
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              26 LOAD_FAST                1 (b)
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              28 LOAD_FAST                2 (x)
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              30 CALL_FUNCTION            2
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              32 RETURN_VALUE
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 23:56:44,930] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
```

New output with this PR:

```
[2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] possible source code:
[2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] def toy_example(a, b):
[2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]     __temp_1 = __compiled_fn_0(a, b)
[2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]     x = __temp_1[0]
[2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]     if __temp_1[1]:
[2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]         return __resume_at_34_1(b, x)
[2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]     return __resume_at_42_2(b, x)
[2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,535] [0/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] If you find the decompiled code is wrong,please submit an issue at https://github.com/youkaichao/depyf/issues.
```

The remaining two logs (note the `possible source code:` output):

```
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] ORIGINAL BYTECODE <resume in toy_example> /workspace/youkaichao/code/pytorch/ykc.py line 12
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]  12           0 JUMP_ABSOLUTE           22 (to 44)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               2 LOAD_FAST                2 (a)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               4 LOAD_GLOBAL              0 (torch)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               6 LOAD_ATTR                1 (abs)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               8 LOAD_FAST                2 (a)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              10 CALL_FUNCTION            1
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              12 LOAD_CONST               1 (1)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              14 BINARY_ADD
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              16 BINARY_TRUE_DIVIDE
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              18 STORE_FAST               1 (x)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              20 LOAD_FAST                0 (b)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              22 LOAD_ATTR                2 (sum)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              24 CALL_FUNCTION            0
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              26 STORE_FAST               3 (__temp_2)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              28 LOAD_FAST                3 (__temp_2)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              30 LOAD_CONST               2 (0)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              32 COMPARE_OP               0 (<)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              34 POP_JUMP_IF_FALSE       22 (to 44)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              36 LOAD_FAST                0 (b)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              38 LOAD_CONST               3 (-1)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              40 BINARY_MULTIPLY
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              42 STORE_FAST               0 (b)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]  14     >>   44 LOAD_FAST                1 (x)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              46 LOAD_FAST                0 (b)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              48 BINARY_MULTIPLY
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              50 RETURN_VALUE
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] MODIFIED BYTECODE <resume in toy_example> /workspace/youkaichao/code/pytorch/ykc.py line 12
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]  12           0 LOAD_GLOBAL              3 (__compiled_fn_3)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               2 LOAD_FAST                0 (b)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               4 LOAD_FAST                1 (x)
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               6 CALL_FUNCTION            2
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               8 UNPACK_SEQUENCE          1
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              10 RETURN_VALUE
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,566] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,567] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] possible source code:
[2023-10-06 16:25:21,567] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] def <resume in toy_example>(b, x):
[2023-10-06 16:25:21,567] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]     return __compiled_fn_3(b, x)[0]
[2023-10-06 16:25:21,567] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,567] [1/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] If you find the decompiled code is wrong,please submit an issue at https://github.com/youkaichao/depyf/issues.
```

```
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] ORIGINAL BYTECODE <resume in toy_example> /workspace/youkaichao/code/pytorch/ykc.py line 12
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]  12           0 JUMP_ABSOLUTE           18 (to 36)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               2 LOAD_FAST                2 (a)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               4 LOAD_GLOBAL              0 (torch)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               6 LOAD_ATTR                1 (abs)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               8 LOAD_FAST                2 (a)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              10 CALL_FUNCTION            1
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              12 LOAD_CONST               1 (1)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              14 BINARY_ADD
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              16 BINARY_TRUE_DIVIDE
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              18 STORE_FAST               1 (x)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              20 LOAD_FAST                0 (b)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              22 LOAD_ATTR                2 (sum)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              24 CALL_FUNCTION            0
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              26 STORE_FAST               3 (__temp_2)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              28 LOAD_FAST                3 (__temp_2)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              30 LOAD_CONST               2 (0)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              32 COMPARE_OP               0 (<)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              34 POP_JUMP_IF_FALSE       22 (to 44)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]  13     >>   36 LOAD_FAST                0 (b)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              38 LOAD_CONST               3 (-1)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              40 BINARY_MULTIPLY
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              42 STORE_FAST               0 (b)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]  14     >>   44 LOAD_FAST                1 (x)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              46 LOAD_FAST                0 (b)
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              48 BINARY_MULTIPLY
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              50 RETURN_VALUE
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,579] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] MODIFIED BYTECODE <resume in toy_example> /workspace/youkaichao/code/pytorch/ykc.py line 12
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]  12           0 LOAD_GLOBAL              3 (__compiled_fn_4)
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               2 LOAD_FAST                0 (b)
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               4 LOAD_FAST                1 (x)
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               6 CALL_FUNCTION            2
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]               8 UNPACK_SEQUENCE          1
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]              10 RETURN_VALUE
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] possible source code:
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] def <resume in toy_example>(b, x):
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]     return __compiled_fn_4(b, x)[0]
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG]
[2023-10-06 16:25:21,580] [2/0] torch._dynamo.convert_frame.__bytecode: [DEBUG] If you find the decompiled code is wrong,please submit an issue at https://github.com/youkaichao/depyf/issues.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110714
Approved by: https://github.com/jansel
2023-10-13 12:36:00 +00:00
cdc8d709cb Fix mkldnn_matmul error on AArch64 (#110150)
Fixes #110149

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110150
Approved by: https://github.com/jgong5, https://github.com/peterbell10
2023-10-13 12:35:46 +00:00
24bd3301d9 Fixed description of run_on input for linux-binary-test workflow (#111191)
As part of migrating the linux arm64 runners to the autoscaling group, fixed the description of the run_on input for the linux-binary-test workflow.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111191
Approved by: https://github.com/jeanschmidt
2023-10-13 09:57:40 +00:00
af05fbb84a Linter to avoid csv merge conflicts (#111163)
This PR addresses the persistent issue of merge conflicts in the benchmarks/dynamo/ci_expected_accuracy/ directory, specifically those arising from frequently updated CSV files. Based on @malfet's suggestion, the solution implemented adds three spaces between each line in the CSV files. This approach has proven effective in preventing merge conflicts, as evidenced in [D50239634](https://www.internalfb.com/intern/diff/D50239634/). Regardless of these changes, the extra newlines should still allow the CSVs to be ingested as normal.

If you have access to the diff:
Normally, modifying a line that is later altered in the stack results in a merge conflict during restacking. With this new spacing strategy, lines that are not modified further down the stack will not trigger merge conflicts, achieving our intended outcome.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111163
Approved by: https://github.com/malfet, https://github.com/huydhn
2023-10-13 09:35:34 +00:00
8209bbbd06 [AOTInductor] Improve validation for C++ wrapper codegen (#111102)
This is a reimplementation of #111089.

1. When using fake inputs, make sure they are on the same device as the original inputs.
2. Don't change the value of self.cpp_wrapper from True to False if we can't generate a C++ wrapper; instead, check and fail early to avoid producing Python code for the C++ compiler.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111102
Approved by: https://github.com/desertfire, https://github.com/jgong5, https://github.com/chunyuan-w
2023-10-13 08:46:17 +00:00
898482f1bf [logging] log exceptions when provided (#111164)
This PR will cause logging.exception() to also dump the exception and stacktrace. Copied from 74723e1110/Lib/logging/__init__.py (L707-L711)

repro:

<details>

```python
import torch
import torch._inductor.config

torch._inductor.config.triton.inject_relu_bug_TESTING_ONLY = "runtime_error"

def fn(x, y):
    return (x @ y).relu()

x, y = [torch.rand((16, 16), device='cuda') for _ in range(2)]
torch.compile(fn)(x, y)
```
run with TORCHDYNAMO_REPRO_AFTER=aot TORCHDYNAMO_REPRO_LEVEL=4

</details>

before:
```
...
[2023-10-12 14:18:52,902] torch._dynamo.debug_utils: [ERROR] While minifying the program in accuracy minification mode, ran into a runtime exception which is likely an unrelated issue. Skipping this graph.
```

now:
```
...
[2023-10-12 14:18:52,902] torch._dynamo.debug_utils: [ERROR] While minifying the program in accuracy minification mode, ran into a runtime exception which is likely an unrelated issue. Skipping this graph.
Traceback (most recent call last):
  File "/data/users/dberard/scripts/relu_accuracy_issue.py", line 10, in <module>
    torch.compile(fn)(x, y)
...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111164
Approved by: https://github.com/eellison
2023-10-13 03:52:26 +00:00
4c01686027 Public API for constructing NT with jagged layout from tensor list (#111078)
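A usage sketch of the new constructor (argument names assumed from the nested-tensor factory API):

```python
import torch

ts = [torch.randn(2, 5), torch.randn(3, 5), torch.randn(4, 5)]
# Build a nested tensor with the jagged layout from a list of tensors.
nt = torch.nested.nested_tensor(ts, layout=torch.jagged)
print(nt.is_nested)  # True
```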
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111078
Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer
ghstack dependencies: #109123
2023-10-13 03:27:41 +00:00
a2c17a2b00 [PyTorch] AOTI: add CPU fast path in aoti_torch_empty_strided (#110877)
This seems to reduce benchmark time by 15-20%. Supersedes D49835545.

Differential Revision: [D49974460](https://our.internmc.facebook.com/intern/diff/D49974460/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110877
Approved by: https://github.com/chenyang78, https://github.com/jansel, https://github.com/desertfire
ghstack dependencies: #110876
2023-10-13 02:16:11 +00:00
b85f848233 [PyTorch] -DNDEBUG in inductor codecache builds (#110876)
Things like TORCH_INTERNAL_ASSERT_DEBUG_ONLY care about this!

Differential Revision: [D49972742](https://our.internmc.facebook.com/intern/diff/D49972742/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110876
Approved by: https://github.com/chenyang78, https://github.com/jansel, https://github.com/desertfire
2023-10-13 02:16:11 +00:00
168bad5f23 [export] Reland "Fix graph signature data model to list of specs." (#111136)
Summary: reland D49876258

Test Plan: CI

Differential Revision: D50224384

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111136
Approved by: https://github.com/angelayi
2023-10-13 02:04:29 +00:00
9980876cab Quant: add weight int4pack mm kernel (#110914)
Adding the weight int4pack mm CUDA kernel. The kernel comes from the tinygemm project, which was developed by Jeff Johnson.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110914
Approved by: https://github.com/Chillee
2023-10-13 01:21:18 +00:00
8713a1a363 add Half support for bernoulli on CPU (#104176)
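In other words, a minimal sketch of what this enables:

```python
import torch

p = torch.full((4,), 0.5, dtype=torch.half)  # float16 probabilities on CPU
print(torch.bernoulli(p))                    # Half on CPU is now supported
```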
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104176
Approved by: https://github.com/mingfeima, https://github.com/cpuhrsch
2023-10-13 01:18:55 +00:00
74b1f4f71a Update sdp_utils functions to accept const& params (#111144)
# Summary
None of our filter functions should mutate the passed-in params; this both makes the intent clearer and allows the compiler to possibly produce more optimal code.

### Note
I used East-const style because I think it is clearer:
https://mariusbancila.ro/blog/2018/11/23/join-the-east-const-revolution/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111144
Approved by: https://github.com/cpuhrsch, https://github.com/Skylion007
2023-10-13 00:48:08 +00:00
21dc1d2547 [Vulkan] Add the 2D case to Layernorm operator (#110796)
Summary:
We add a 2D implementation to the op [LayerNorm](https://pytorch.org/docs/stable/generated/torch.nn.LayerNorm.html)

The current implementation of layer_norm D37407311 supports
- input of 3D and normalized_shape also of 3D, or
- input of 4D with batch dim equal to 1 and normalized_shape of 3D

Since a 2D tensor of [H, W] can be represented as [1, H, W] in the shader, we make a straightforward generalization to the case where both input and normalized_shape are 2D.
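
In eager terms, the newly supported case is simply the following (CPU shown for illustration; the PR enables the same shapes on the Vulkan backend):

```python
import torch

x = torch.randn(8, 16)  # 2D input [H, W]
y = torch.nn.functional.layer_norm(x, normalized_shape=[8, 16])  # 2D normalized_shape
print(y.shape)  # torch.Size([8, 16])
```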

Test Plan:
## Before
```
[luwei@devbig984.prn1 ~/fbsource (e09fe4ae4|remote/fbsource/stable...)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run fbcode/mode/dev-nosan //xplat/caffe2:pt_vulkan_api_test_bin -- --gtest_filter="*layer_norm_2d*"
Recommended: For faster builds try buck2: replace 'buck' with 'buck2'
NOTE: buck-out/ has changed: look for files in fbsource/buck-out/v2/
'buck2 build --show-output //xplat/caffe2:pt_vulkan_api_test_bin' will print the new output paths.

If you are building in fbsource//xplat and have questions, post in 'Cross Platform Dev Discussions': https://fb.workplace.com/groups/xplat.qa

  Targets matching .buckconfig buck2.supported_projects:
  {'//xplat/caffe2:pt_vulkan_api_test_bin': '//xplat'}

  To suppress this warning: touch ~/.config/.dont_hint_buck2

clang-12: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]

Downloaded 2/4 artifacts, 125.45 Kbytes, 33.3% cache miss (for updated rules)
Building: finished in 4.9 sec (100%) 2637/2637 jobs, 3/2637 updated
  Total time: 4.9 sec
BUILD SUCCEEDED
Running main() from third-party/googletest/1.11.0/googletest/googletest/src/gtest_main.cc
Note: Google Test filter = *layer_norm_2d*
[==========] Running 3 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 3 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.layer_norm_2d_small
unknown file: Failure
C++ exception with description "Vulkan layernorm expects 3-dim or 4-dim input!
Exception raised from layer_norm at xplat/caffe2/aten/src/ATen/native/vulkan/ops/Layernorm.cpp:66 (most recent call first):
(no backtrace available)" thrown in the test body.
[  FAILED  ] VulkanAPITest.layer_norm_2d_small (56 ms)
[ RUN      ] VulkanAPITest.layer_norm_2d_medium
unknown file: Failure
C++ exception with description "Vulkan layernorm expects 3-dim or 4-dim input!
Exception raised from layer_norm at xplat/caffe2/aten/src/ATen/native/vulkan/ops/Layernorm.cpp:66 (most recent call first):
(no backtrace available)" thrown in the test body.
[  FAILED  ] VulkanAPITest.layer_norm_2d_medium (0 ms)
[ RUN      ] VulkanAPITest.layer_norm_2d_large
unknown file: Failure
C++ exception with description "Vulkan layernorm expects 3-dim or 4-dim input!
Exception raised from layer_norm at xplat/caffe2/aten/src/ATen/native/vulkan/ops/Layernorm.cpp:66 (most recent call first):
(no backtrace available)" thrown in the test body.
[  FAILED  ] VulkanAPITest.layer_norm_2d_large (27 ms)
[----------] 3 tests from VulkanAPITest (84 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test suite ran. (84 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 3 tests, listed below:
[  FAILED  ] VulkanAPITest.layer_norm_2d_small
[  FAILED  ] VulkanAPITest.layer_norm_2d_medium
[  FAILED  ] VulkanAPITest.layer_norm_2d_large

 3 FAILED TESTS
```

## After
```
[luwei@devbig984.prn1 ~/fbsource (e09fe4ae4|remote/fbsource/stable...)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run fbcode/mode/dev-nosan //xplat/caffe2:pt_vulkan_api_test_bin -- --gtest_filter="*layer_norm_2d*"
Recommended: For faster builds try buck2: replace 'buck' with 'buck2'
NOTE: buck-out/ has changed: look for files in fbsource/buck-out/v2/
'buck2 build --show-output //xplat/caffe2:pt_vulkan_api_test_bin' will print the new output paths.

If you are building in fbsource//xplat and have questions, post in 'Cross Platform Dev Discussions': https://fb.workplace.com/groups/xplat.qa

  Targets matching .buckconfig buck2.supported_projects:
  {'//xplat/caffe2:pt_vulkan_api_test_bin': '//xplat'}

  To suppress this warning: touch ~/.config/.dont_hint_buck2

clang-12: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]

Downloaded 1/3 artifacts, 1.40 Mbytes, 50.0% cache miss (for updated rules)
Building: finished in 5.0 sec (100%) 2637/2637 jobs, 2/2637 updated
  Total time: 5.0 sec
BUILD SUCCEEDED
Running main() from third-party/googletest/1.11.0/googletest/googletest/src/gtest_main.cc
Note: Google Test filter = *layer_norm_2d*
[==========] Running 3 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 3 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.layer_norm_2d_small
[       OK ] VulkanAPITest.layer_norm_2d_small (282 ms)
[ RUN      ] VulkanAPITest.layer_norm_2d_medium
[       OK ] VulkanAPITest.layer_norm_2d_medium (0 ms)
[ RUN      ] VulkanAPITest.layer_norm_2d_large
[       OK ] VulkanAPITest.layer_norm_2d_large (214 ms)
[----------] 3 tests from VulkanAPITest (497 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test suite ran. (497 ms total)
[  PASSED  ] 3 tests.
```
full test result: P848167714

Differential Revision: D50048054

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110796
Approved by: https://github.com/yipjustin
2023-10-13 00:37:11 +00:00
9c7f464eef [inductor]: Better debugging of can_fuse decisions with TORCH_LOGS=fusion (#110415)
Fixes https://github.com/pytorch/pytorch/issues/110393

Example logs (for adagrad on main). In this case, it clearly identifies device mismatch as a potential red flag, which is indeed the obstacle to adagrad's successful fusion. (see: https://github.com/pytorch/pytorch/pull/110339)
```
[2023-10-03 21:50:24,084] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] ===== attempting fusion (1/10): 18 nodes =====
[2023-10-03 21:50:24,084] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] cannot fuse (foreach:3): candidate consumer has no dep in any foreach producer
[2023-10-03 21:50:24,084] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] cannot fuse (foreach:3): candidate consumer has no dep in any foreach producer
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] cannot fuse (foreach:3): candidate consumer has no dep in any foreach producer
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] cannot fuse (foreach:3): candidate consumer has no dep in any foreach producer
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] cannot fuse (7): device mismatch (node1: cuda:0, node2: cpu)
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] cannot fuse (7): device mismatch (node1: cuda:0, node2: cpu)
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] cannot fuse (7): device mismatch (node1: cuda:0, node2: cpu)
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] cannot fuse (7): device mismatch (node1: cuda:0, node2: cpu)
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] cannot fuse (foreach:3): candidate consumer has no dep in any foreach producer
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] cannot fuse (7): device mismatch (node1: cuda:0, node2: cpu)
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] cannot fuse (foreach:3): candidate consumer has no dep in any foreach producer
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] cannot fuse (foreach:3): candidate consumer has no dep in any foreach producer
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] cannot fuse (7): device mismatch (node1: cuda:0, node2: cpu)
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] cannot fuse (7): device mismatch (node1: cuda:0, node2: cpu)
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] cannot fuse (7): device mismatch (node1: cuda:0, node2: cpu)
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] cannot fuse (foreach:3): candidate consumer has no dep in any foreach producer
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] 13 possible fusions:
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] (ForeachKernelSchedulerNode(nodes=buf0_buf1_buf2_buf3), ForeachKernelSchedulerNode(nodes=buf4_buf5_buf6_buf7))
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] (ForeachKernelSchedulerNode(nodes=buf4_buf5_buf6_buf7), SchedulerNode(name='buf8'))
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] (ForeachKernelSchedulerNode(nodes=buf4_buf5_buf6_buf7), SchedulerNode(name='buf10'))
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] (ForeachKernelSchedulerNode(nodes=buf0_buf1_buf2_buf3), SchedulerNode(name='buf12'))
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] (ForeachKernelSchedulerNode(nodes=buf0_buf1_buf2_buf3), SchedulerNode(name='buf14'))
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] (ForeachKernelSchedulerNode(nodes=buf4_buf5_buf6_buf7), SchedulerNode(name='buf9'))
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] (ForeachKernelSchedulerNode(nodes=buf4_buf5_buf6_buf7), SchedulerNode(name='buf11'))
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] (ForeachKernelSchedulerNode(nodes=buf0_buf1_buf2_buf3), SchedulerNode(name='buf13'))
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] (ForeachKernelSchedulerNode(nodes=buf0_buf1_buf2_buf3), SchedulerNode(name='buf15'))
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] (SchedulerNode(name='buf25'), SchedulerNode(name='buf33'))
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] (SchedulerNode(name='buf43'), SchedulerNode(name='buf51'))
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] (SchedulerNode(name='buf34'), SchedulerNode(name='buf42'))
[2023-10-03 21:50:24,085] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] (SchedulerNode(name='buf16'), SchedulerNode(name='buf24'))
[2023-10-03 21:50:24,087] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] completed fusion round (1/10): fused 18 nodes into 5 nodes
[2023-10-03 21:50:24,087] [0/0] torch._inductor.scheduler.__schedule: [DEBUG]
[2023-10-03 21:50:24,087] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] ===== attempting fusion (2/10): 5 nodes =====
[2023-10-03 21:50:24,087] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] cannot fuse (7): device mismatch (node1: cuda:0, node2: cpu)
[2023-10-03 21:50:24,087] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] cannot fuse (7): device mismatch (node1: cuda:0, node2: cpu)
[2023-10-03 21:50:24,087] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] cannot fuse (7): device mismatch (node1: cuda:0, node2: cpu)
[2023-10-03 21:50:24,087] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] cannot fuse (7): device mismatch (node1: cuda:0, node2: cpu)
[2023-10-03 21:50:24,087] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] 0 possible fusions:
[2023-10-03 21:50:24,087] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] completed fusion round (2/10): fused 5 nodes into 5 nodes
[2023-10-03 21:50:24,087] [0/0] torch._inductor.scheduler.__schedule: [DEBUG]
[2023-10-03 21:50:24,087] [0/0] torch._inductor.scheduler.__schedule: [DEBUG] ===== fusion complete (2 iterations) =====

```

CC @jansel @ngimel @mlazos @shunting314 @peterbell10  as code owners

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110415
Approved by: https://github.com/mlazos
2023-10-13 00:36:45 +00:00
1208a44799 [docs] export full aten opset (#111161)
Differential Revision: [D50240459](https://our.internmc.facebook.com/intern/diff/D50240459/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111161
Approved by: https://github.com/tugsbayasgalan
2023-10-13 00:28:35 +00:00
ad4472833c define public API for torch.nn.utils (#111026)
Adding the modules imported here and the following functions to the `__all__` (a short usage sketch follows the list):
* [clip_grad_norm_](https://pytorch.org/docs/stable/generated/torch.nn.utils.clip_grad_norm_.html)
* [clip_grad_value_](https://pytorch.org/docs/stable/generated/torch.nn.utils.clip_grad_value_.html)
* [remove_weight_norm](https://pytorch.org/docs/stable/generated/torch.nn.utils.remove_weight_norm.html)
* [parameters_to_vector](https://pytorch.org/docs/stable/generated/torch.nn.utils.parameters_to_vector.html)
* [vector_to_parameters](https://pytorch.org/docs/stable/generated/torch.nn.utils.vector_to_parameters.html)
* [remove_spectral_norm](https://pytorch.org/docs/stable/generated/torch.nn.utils.remove_spectral_norm.html)
* [skip_init](https://pytorch.org/docs/stable/generated/torch.nn.utils.skip_init.html)
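
A brief usage sketch of two of the now-public names:

```python
import torch
from torch.nn.utils import clip_grad_norm_, parameters_to_vector

model = torch.nn.Linear(4, 4)
model(torch.randn(2, 4)).sum().backward()
clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip gradients in place
vec = parameters_to_vector(model.parameters())     # flatten all params
print(vec.shape)  # torch.Size([20])  (16 weights + 4 biases)
```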
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111026
Approved by: https://github.com/mikaylagawarecki
2023-10-12 23:05:23 +00:00
8f90be4478 Expand NT subclass to support SAM (#109123)
This PR contains the changes needed to support using the NT jagged subclass within SAM. Note that a NT with multiple ragged dims is still required at the extremes for inputs / outputs, but the internal computation generally involves a single ragged dim, making the jagged layout usable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109123
Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer
2023-10-12 20:33:22 +00:00
e0eaa95e99 [DCP] Remove _shard_tensor() call in load_sharded_optimizer_state_dict in optimizer.py (#111096)
`_shard_tensor()` calls into `dist.all_gather_object()` and this is causing optimizer state dict loading to be super slow. Workaround: call `FSDP._shard_utils._create_chunk_sharded_tensor()` to construct ShardedTensor without any communication.

Thanks to @fegin for suggesting the fix!
Thanks @mvpatel2000 for reporting the issue and providing profiling details to help us isolate the problematic source code quickly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111096
Approved by: https://github.com/fegin
2023-10-12 20:27:06 +00:00
6748a14a71 [aot_inductor] add a test with AOTInductor + TorchScript (#111124)
This test may serve as a reference for using AOTInductor with TorchScript.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111124
Approved by: https://github.com/jansel
2023-10-12 19:29:07 +00:00
397deaa825 Fix typo in mixed dtypes linear operator implementation. (#111127)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111127
Approved by: https://github.com/Skylion007
2023-10-12 19:06:04 +00:00
7fbfa4e020 Revert "[inductor] Implement Fx graph caching to improve warm compilation time. (#103453)"
This reverts commit fc1105b2827ee2febc85a3c353470edfd70a66ed.

Reverted https://github.com/pytorch/pytorch/pull/103453 on behalf of https://github.com/kit1980 due to Same issue unfortunately, the newly added test fails on internal builds ([comment](https://github.com/pytorch/pytorch/pull/103453#issuecomment-1760202365))
2023-10-12 18:54:51 +00:00
f9053877b4 Add pypi required metadata to all wheels except linux (#111042)
Will fix the packaging issue after publishing: https://github.com/pytorch/pytorch/issues/100974
Poetry install requires all wheels on PyPI to have the same metadata, hence we include the linux dependencies in all non-linux wheels.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111042
Approved by: https://github.com/malfet
2023-10-12 17:40:13 +00:00
94c9dbff22 Disable cutlass_template on ROCm (#111132)
Fixes #111066 #111065 #111064

Currently `use_cutlass_template` returns True on ROCm, but the feature is not supported there; fix it to return False on ROCm. I considered adding this change to `try_import_cutlass` instead, but the comments hinted that that function would be removed at some point.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111132
Approved by: https://github.com/jansel
2023-10-12 17:14:07 +00:00
bb1424d46e Reland #2 "[C10] PG observability hooks. (#108815, #110907)" (#111072)
This reverts commit 314a502eb04c6382e2cc9af0573533efba54109d.

Changes since original PR:
Reland 1
 *  rename torch.distributed.hooks to torch.distributed._hooks

Reland 2
 * make _hooks importable even if !distributed.is_available()
 * handle cuda driver exit intermittent failure caused by new cuda api usage in callback caller (see prev PR in stack)

(original PR https://github.com/pytorch/pytorch/pull/108815 desc copied below)

Expose a set of observability hooks into C10D such that our users can
detect collective failures both faster and more easily.

The design is similar to NCCL desync debug in that it minimizes the
overhead by doing most of the work off the main thread.

This PR introduces a new module torch.distributed.hooks that exposes the following set of methods:

    register_collective_start_hook
    register_collective_end_hook
    register_process_group_hook

The process group hook exposes PG creation on the member ranks and is called inline from the
PG creation code. This is fine since this happens during initialization and only a limited number of times.

The collective start/end hooks are fired from a single background thread, which reads
events from a C++ queue and dispatches them.

Queue notification is oddly done using a pipe; this is needed so python can abort the thread on shutdown
and keep it as a background thread, which is not possible with more reasonable choices like a condvar.
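
A hypothetical registration sketch (the single-argument callback shape is an assumption; only the module path and method names come from this description):

```python
import torch.distributed._hooks as _hooks  # module name after reland 1's rename

def on_collective_start(info):
    # Fired from the background dispatch thread, not the main thread.
    print("collective started:", info)

def on_collective_end(info):
    print("collective finished:", info)

def on_process_group(info):
    # Called inline during PG creation on the member ranks.
    print("process group created:", info)

_hooks.register_collective_start_hook(on_collective_start)
_hooks.register_collective_end_hook(on_collective_end)
_hooks.register_process_group_hook(on_process_group)
```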
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111072
Approved by: https://github.com/malfet
ghstack dependencies: #111061
2023-10-12 16:59:23 +00:00
dede1e96e2 [BE] Enable Ruff's Flake8 PYI018 (#111101)
Enable [unused-private-type-var (PYI018)](https://docs.astral.sh/ruff/rules/unused-private-type-var/#unused-private-type-var-pyi018)

Link: #110950

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111101
Approved by: https://github.com/albanD
2023-10-12 16:26:21 +00:00
53a9ac534c Added decorator skipRocmIfTorchInductor and skipped failing tests (#107760)
This PR adds a skip decorator which will disable tests in CI for the ROCm inductor workflow. This new workflow will come in via https://github.com/pytorch/pytorch/pull/110544
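
A hypothetical usage sketch (the decorator name comes from the title; its import location and message argument are assumptions):

```python
from torch.testing._internal.common_utils import TestCase, skipRocmIfTorchInductor

class MyOpTests(TestCase):
    @skipRocmIfTorchInductor("disabled on the ROCm inductor workflow")
    def test_my_op(self):
        self.assertTrue(True)
```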

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107760
Approved by: https://github.com/jataylo, https://github.com/pruthvistony, https://github.com/atalman
2023-10-12 16:00:35 +00:00
918054f422 [Inductor] support channel last for xpu conv in inductor layout opt path (#111018)
# Motivation
Support XPU channels-last in the inductor layout optimization path.
Currently, `_conv_determine_backend_memory_format` always returns torch.contiguous_format for XPU conv.

# Solution
Add an XPU channels-last detection strategy in `determine_backend_memory_format`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111018
Approved by: https://github.com/jansel, https://github.com/eellison, https://github.com/EikanWang
2023-10-12 15:13:50 +00:00
5ace912263 fix: do not reshard parameters twice (#110948)
This PR fixes potential double resharding of parameters that both:

1. require no gradient, and
2. were used more than once during forward pass.

[`_register_post_backward_hook`](https://github.com/pytorch/pytorch/blob/main/torch/distributed/fsdp/_runtime_utils.py#L1415) handles the case correctly, this PR does the same for `_register_post_backward_reshard_only_hook`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110948
Approved by: https://github.com/awgu
2023-10-12 15:09:33 +00:00
aec0f98e70 Move cuda driver exit handling from helpers to threads (#111061)
The pattern here is that main may exit and kill cuda driver before
c10d watchdog related threads have cleanly exited.  If this happens,
c10d threads may still make CUDA api calls and raise an exception about
the cuda driver being dead.

In the past we've patched a few helper functions that call into cuda
to specifically handle this driver exiting message.  Instead, we know
that this problem applies only to codepaths in our background threads,
so we should catch at that scope and not worry about fine-grained
catching at the helper granularity. (and if a helper is used from the main
thread, we should NOT catch this exception; it's the application's fault)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111061
Approved by: https://github.com/malfet, https://github.com/fduwjj
2023-10-12 13:47:04 +00:00
2f53085f3f [BE] Enable Ruff's Flake8 PYI030 (#111103)
Enable [unnecessary-literal-union (PYI030)](https://docs.astral.sh/ruff/rules/unnecessary-literal-union/)

Link: #110950
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111103
Approved by: https://github.com/albanD
2023-10-12 13:31:59 +00:00
68a1219f74 Move at::{Refcounted,}MapAllocator to c10 (#109881)
`libshm.so` depends on the torch library exclusively for `at::RefcountedMapAllocator`,
 so it makes sense to move it to c10 along with the other memory allocators.

This means `libshm.so` only depends on `c10` and we don't need to relink
`libshm.so` for every ATen change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109881
Approved by: https://github.com/albanD
2023-10-12 10:51:13 +00:00
42b89aea4b Revert "[export] Fix graph signature data model to list of specs. (#111017)"
This reverts commit 33b69509d3665f82bf91cee96f9beeef0d8e0b72.

Reverted https://github.com/pytorch/pytorch/pull/111017 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/111017#issuecomment-1759292161))
2023-10-12 09:52:33 +00:00
395d0eaea0 Dynamo - config gated torch.distributed allow, exclusion for special leaf funcs (#110894)
`is_allowed` is a tricky bit of functionality - it sits early up in builder and is used to drive the creation of TorchVariable (more notes here, meta only https://fb.workplace.com/groups/pytorch.dev/permalink/1393563781222098/)

If we are tracing distributed in full, we want to route certain calls in distributed to NOT pass is_allowed (confusingly, this does not mean that they are not allowed, but rather that we don't want them to become TorchVariable); others we are fine with preserving.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110894
Approved by: https://github.com/ezyang
2023-10-12 09:25:51 +00:00
499146354e Use CUDA image for lintrunner (#110502)
We switch to pytorch-linux-jammy-cuda11.8-cudnn8-py3.9-linter for lintrunner for checking CUDA cpp source. Meanwhile, there is a Dockerfile change due to missing libiomp installation and some other clang-tidy fixes triggered by the switch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110502
Approved by: https://github.com/malfet
2023-10-12 09:16:36 +00:00
d8ad0ba5c1 [Dist][ez][nit] Formatted nccl version string in startup (#111076)
Formats the string using the existing getNCCLversion

Differential Revision: [D50193558](https://our.internmc.facebook.com/intern/diff/D50193558/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111076
Approved by: https://github.com/Skylion007
2023-10-12 08:54:32 +00:00
b35279dfac [DDP] Make _ReplicateState inherit from _State and make replicate eagerly initialized (#109647)
Following how fully_shard stores the _FSDPState, this PR makes _ReplicateState inherit from _State.  This PR also makes replicate eagerly initialize the internal DDP instance so that users can access the required methods/functions before the first forward().

Differential Revision: [D49428291](https://our.internmc.facebook.com/intern/diff/D49428291/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109647
Approved by: https://github.com/wz337, https://github.com/rohan-varma
ghstack dependencies: #110688
2023-10-12 07:58:39 +00:00
5614023f5e Move export.constrain_as_* to torch._constrain_as_* (#110757)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110757
Approved by: https://github.com/avikchaudhuri
ghstack dependencies: #109859
2023-10-12 05:37:44 +00:00
6ce3a38050 Revert "Move export.constrain_as_* to torch._constrain_as_* (#110757)"
This reverts commit 5aee22e0e033dbd2346b533fb2651ee30ca5ed86.

Reverted https://github.com/pytorch/pytorch/pull/110757 on behalf of https://github.com/kit1980 due to Depends on https://github.com/pytorch/pytorch/pull/109859 that needs to be reverted ([comment](https://github.com/pytorch/pytorch/pull/110757#issuecomment-1758908371))
2023-10-12 04:53:29 +00:00
f0e7a91030 [vision hash update] update the pinned vision hash (#111098)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111098
Approved by: https://github.com/pytorchbot
2023-10-12 04:30:55 +00:00
5e8be63e99 Allow specifiying inputs as GradientEdge in autograd APIs (#110867)
This can be useful for advanced users (like AOTAutograd) who don't want to keep the corresponding Tensor alive (for memory reasons, for example) or when an inplace op will change the Tensor's grad_fn (but gradients w.r.t. the original value are needed).

I went with a minimal API change but am open to suggestions.
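
A usage sketch (assuming the `get_gradient_edge` helper in `torch.autograd.graph`; the exact call pattern may differ from the final API):

```python
import torch
from torch.autograd.graph import get_gradient_edge

x = torch.randn(3, requires_grad=True)
y = x * 2
edge = get_gradient_edge(y)          # refer to y's position in the graph
out = (y ** 3).sum()
(g,) = torch.autograd.grad(out, inputs=(edge,))
print(g)                             # equals 3 * y**2
```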

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110867
Approved by: https://github.com/soulitzer
2023-10-12 04:08:44 +00:00
33b69509d3 [export] Fix graph signature data model to list of specs. (#111017)
Summary:
Previously we designed the GraphSignature format as a bunch of input and output node names. After a discussion in the design meeting we decided to change the format to make the signature more self-contained. Now the signature format looks like the following:
```
[
InputSpec(
   kind=InputKind.USER_INPUT,
   arg=TensorArgument(name="arg0_1"),
   target=None,
),
...
]
```

Test Plan: CI

Reviewed By: angelayi

Differential Revision: D49876258

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111017
Approved by: https://github.com/angelayi
2023-10-12 03:39:04 +00:00
097defb160 [device mesh] only check when world size > num_devices per host (#111091)
as titled
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111091
Approved by: https://github.com/awgu, https://github.com/wz337
ghstack dependencies: #110898, #110900
2023-10-12 03:37:18 +00:00
9316c8b4bc Use torch._check for cat error checking (#111035)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111035
Approved by: https://github.com/Skylion007
2023-10-12 03:28:27 +00:00
6dca81c054 Revert 107846 and 109695 (#111099)
https://github.com/pytorch/pytorch/pull/107846 caused Meta-internal S369412
https://github.com/pytorch/pytorch/pull/109695 depends on 107846 so also needs to be reverted
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111099
Approved by: https://github.com/malfet
2023-10-12 02:30:30 +00:00
07f0f383fa update tensor-like to check instance for torch function impl (#111087)
Tensor-like should check the instance for a torch function impl, not the type.
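
A hedged, pure-Python illustration of the distinction (no torch required): an object can carry `__torch_function__` on the instance without its type defining it.

```python
class Duck:
    pass

d = Duck()
d.__torch_function__ = lambda *args, **kwargs: None  # instance-level impl only

print(hasattr(type(d), "__torch_function__"))  # False -- a type-level check misses it
print(hasattr(d, "__torch_function__"))        # True  -- the instance-level check finds it
```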
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111087
Approved by: https://github.com/ezyang
2023-10-12 02:14:38 +00:00
8e32e62f67 [TP] Validate TP mesh dim for 2D composition (#111001)
Currently, we only support intranode TP when composing TP with other parallelisms. This PR adds an additional check to validate the TP mesh dim during TP initialization when a parent mesh exists.

cc. @fegin, @fduwjj
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111001
Approved by: https://github.com/fduwjj
2023-10-12 02:11:44 +00:00
80ea8784f3 Bump xla_base version tag to v1.1 (#109757)
Update to a new base image for xla workflow.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109757
Approved by: https://github.com/malfet
2023-10-12 01:45:26 +00:00
e0ddc3ff9c [quant][pt2e][be] Move xnnpack quantizer tests to separate file (#111004)
Summary:
att

Test Plan:
python test/test_quantization.py TestXNNPACKQuantizer
python test/test_quantization.py TestXNNPACKQuantizerModels

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111004
Approved by: https://github.com/andrewor14
2023-10-12 01:16:05 +00:00
8f8d8a0b50 Linear Quantize (#110581)
Summary: Adding the quantized linear operator to the Vulkan backend

Test Plan:
buck2 run --target-platforms ovr_config//platform/macos:arm64-fbsource -c pt.vulkan_full_precision=1

//xplat/caffe2/fb/custom_ops/vulkan_quantized:pt_vulkan_quantized_test_binAppleMac\#macosx-arm64

[       OK ] VulkanAPITest.convert_qconv2d_context (135 ms)
[ RUN      ] VulkanAPITest.linear_2d
[       OK ] VulkanAPITest.linear_2d (4 ms)
[----------] 2 tests from VulkanAPITest (139 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test suite ran. (139 ms total)
[  PASSED  ] 2 tests.

##############################################################

buck2 build --target-platforms ovr_config//platform/macos:arm64-fbsource

//xplat/caffe2:pt_vulkan_quantized_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1 --show-output
buck-out//v2/gen/fbsource/xplat/caffe2/pt_vulkan_quantized_api_test_binAppleMac
[       OK ] VulkanAPITest.conv2d_pw_quantized_prepack_random_params_int8_int32 (11 ms)
[ RUN      ] VulkanAPITest.linear_2d_flat
[       OK ] VulkanAPITest.linear_2d_flat (4 ms)
[ RUN      ] VulkanAPITest.linear_2d_small
[       OK ] VulkanAPITest.linear_2d_small (1 ms)
[ RUN      ] VulkanAPITest.linear_2d_large
[       OK ] VulkanAPITest.linear_2d_large (1 ms)
[ RUN      ] VulkanAPITest.linear_3d_flat
[       OK ] VulkanAPITest.linear_3d_flat (2 ms)
[ RUN      ] VulkanAPITest.linear_3d_small
[       OK ] VulkanAPITest.linear_3d_small (2 ms)
[ RUN      ] VulkanAPITest.linear_3d_large
[       OK ] VulkanAPITest.linear_3d_large (1 ms)
[ RUN      ] VulkanAPITest.linear_4d_flat
[       OK ] VulkanAPITest.linear_4d_flat (1 ms)
[ RUN      ] VulkanAPITest.linear_4d_small
[       OK ] VulkanAPITest.linear_4d_small (1 ms)
[ RUN      ] VulkanAPITest.linear_4d_large
[       OK ] VulkanAPITest.linear_4d_large (1 ms)
[ RUN      ] VulkanAPITest.linear_custom
[       OK ] VulkanAPITest.linear_custom (0 ms)
[----------] 76 tests from VulkanAPITest (1811 ms total)

[----------] Global test environment tear-down
[==========] 76 tests from 1 test suite ran. (1811 ms total)
[  PASSED  ] 76 tests.

  YOU HAVE 8 DISABLED TESTS

##############################################################

buck2 run --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1

[----------] Global test environment tear-down
[==========] 346 tests from 1 test suite ran. (5648 ms total)
[  PASSED  ] 345 tests.
[  SKIPPED ] 1 test, listed below:
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log

  YOU HAVE 5 DISABLED TESTS

Reviewed By: manuelcandales

Differential Revision: D48812642

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110581
Approved by: https://github.com/yipjustin
2023-10-12 01:04:06 +00:00
5292a92e03 Add torch.unravel_index (#110580)
Fixes #35674
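
A quick usage example of the new API:

```python
import torch

flat = torch.tensor([1, 6, 7])
rows, cols = torch.unravel_index(flat, (2, 4))  # flat indices -> per-dim coordinates
# rows: tensor([0, 1, 1]); cols: tensor([1, 2, 3])
```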

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110580
Approved by: https://github.com/lezcano, https://github.com/kulinseth
2023-10-12 00:55:51 +00:00
577e3dff88 [aotinductor] Fail models temporarily (#111100)
Temporarily mark these models as failing. The failures are due to https://github.com/pytorch/pytorch/pull/111030, which is needed for ExecuTorch's release and so can't be reverted. Will forward-fix the failures.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111100
Approved by: https://github.com/desertfire
2023-10-12 00:48:44 +00:00
986ad3bfa6 [2/N] Dynamo supports skip by function & removes skipfiles circular import (#110835)
Several improvements for skipfiles:
* Add ```FUNC_INLINELIST``` to support function-level skip/inline checks (see the sketch after this list).
  * Use ```fn.__code__``` to match functions, since sometimes we can't get the function object.
* Use python module string names for ```FILE_INLINELIST``` and ```SUBMODULE_INLINELIST```.
  * Use filenames to match files and python modules, which fundamentally resolves the circular import issues introduced by skipfiles.
  * Use ```TYPE_CHECKING``` to ensure the python module string names are correct.
* Add unit tests.
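
A hedged sketch of the first bullet's idea: match functions by their code objects when the function object itself may be unavailable. The set name mirrors the PR; its contents here are illustrative.

```python
def helper(x):
    return x + 1

FUNC_INLINELIST = {helper.__code__}  # keyed by code object, not function object

def should_inline(fn):
    return getattr(fn, "__code__", None) in FUNC_INLINELIST

assert should_inline(helper)
```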

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110835
Approved by: https://github.com/ezyang
2023-10-12 00:44:41 +00:00
cyy
a6b452dfdc [2/N] Enable Wunused-result, Wunused-variable and Wmissing-braces in torch targets (#110836)
This PR enables Wunused-result, Wunused-variable and Wmissing-braces, now that our code base is clean of such warnings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110836
Approved by: https://github.com/Skylion007
2023-10-11 23:49:15 +00:00
6d7744ca46 Fix typo under torch/_functorch directory (#111067)
This PR fixes typos in comments and exception messages in files under `torch/_functorch`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111067
Approved by: https://github.com/Skylion007
2023-10-11 23:09:36 +00:00
4d29b40299 torch.compile DTensor E2E (#105236)
This PR updates DTensor to support torch.compile

Cool stuff: there are some new tests in `test_dtensor.py` that show both the forward and backward graphs that we can send to inductor, when running a matmul with DTensor's. In particular, for this user code:
```
        def fn(x, y):
            dt = DTensor.from_local(x.reshape(2, 4), mesh, [Shard(0)], run_check=False)
            dt2 = DTensor.from_local(y.reshape(4, 2), mesh, [Shard(1)], run_check=False)
            dt_out = torch.matmul(dt, dt2)
            dt_out_redistribute = dt_out.redistribute(mesh, [Replicate()])
            return dt_out.to_local()
```

We generate the following fw and backward graphs.

Forward graph:
```
def forward(self, primals_1, primals_2):
    view = torch.ops.aten.view.default(primals_1, [2, 4]);  primals_1 = None
    _to_copy = torch.ops.aten._to_copy.default(view, dtype = torch.float32, layout = torch.strided, device = device(type='cuda', index=0));  view = None
    detach = torch.ops.aten.detach.default(_to_copy);  _to_copy = None
    detach_1 = torch.ops.aten.detach.default(detach);  detach = None
    view_1 = torch.ops.aten.view.default(primals_2, [4, 2]);  primals_2 = None
    _to_copy_1 = torch.ops.aten._to_copy.default(view_1, dtype = torch.float32, layout = torch.strided, device = device(type='cuda', index=0));  view_1 = None
    detach_2 = torch.ops.aten.detach.default(_to_copy_1);  _to_copy_1 = None
    detach_3 = torch.ops.aten.detach.default(detach_2);  detach_2 = None
    detach_4 = torch.ops.aten.detach.default(detach_1)
    all_gather_into_tensor = torch.ops.c10d_functional.all_gather_into_tensor.default(detach_3, 'ptd:0', [0, 1], 2)
    wait_tensor = torch.ops.c10d_functional.wait_tensor.default(all_gather_into_tensor);  all_gather_into_tensor = None
    split = torch.ops.aten.split.Tensor(wait_tensor, 4);  wait_tensor = None
    getitem = split[0]
    getitem_1 = split[1];  split = None
    cat = torch.ops.aten.cat.default([getitem, getitem_1], 1);  getitem = getitem_1 = None
    detach_5 = torch.ops.aten.detach.default(cat);  cat = None
    mm = torch.ops.aten.mm.default(detach_4, detach_5);  detach_4 = detach_5 = None
    detach_6 = torch.ops.aten.detach.default(mm);  mm = None
    detach_9 = torch.ops.aten.detach.default(detach_6);  detach_6 = None
    detach_10 = torch.ops.aten.detach.default(detach_9);  detach_9 = None
    t = torch.ops.aten.t.default(detach_1);  detach_1 = None
    detach_13 = torch.ops.aten.detach.default(t);  t = None
    t_1 = torch.ops.aten.t.default(detach_3);  detach_3 = None
    detach_15 = torch.ops.aten.detach.default(t_1);  t_1 = None
    clone = torch.ops.aten.clone.default(detach_15, memory_format = torch.contiguous_format);  detach_15 = None
    return [detach_10, detach_13, clone]
```

Backward graph:
```
def forward(self, detach_13, clone, tangents_1):
    detach_11 = torch.ops.aten.detach.default(tangents_1);  tangents_1 = None
    detach_12 = torch.ops.aten.detach.default(detach_11);  detach_11 = None
    mm_1 = torch.ops.aten.mm.default(detach_13, detach_12);  detach_13 = None
    detach_14 = torch.ops.aten.detach.default(mm_1);  mm_1 = None
    detach_16 = torch.ops.aten.detach.default(detach_12);  detach_12 = None
    all_gather_into_tensor_2 = torch.ops.c10d_functional.all_gather_into_tensor.default(clone, 'ptd:0', [0, 1], 2);  clone = None
    wait_tensor_2 = torch.ops.c10d_functional.wait_tensor.default(all_gather_into_tensor_2);
    detach_17 = torch.ops.aten.detach.default(wait_tensor_2);  wait_tensor_2 = None
    mm_2 = torch.ops.aten.mm.default(detach_16, detach_17);  detach_16 = detach_17 = None
    detach_18 = torch.ops.aten.detach.default(mm_2);  mm_2 = None
    split_1 = torch.ops.aten.split.Tensor(detach_14, 2, 1);  detach_14 = None
    getitem_2 = split_1[0]
    getitem_3 = split_1[1];  split_1 = None
    cat_1 = torch.ops.aten.cat.default([getitem_2, getitem_3]);  getitem_2 = getitem_3 = None
    reduce_scatter_tensor = torch.ops.c10d_functional.reduce_scatter_tensor.default(cat_1, 'SUM', 'ptd:0', [0, 1], 2);  cat_1 = None
    wait_tensor_3 = torch.ops.c10d_functional.wait_tensor.default(reduce_scatter_tensor);  reduce_scatter_tensor = None
    detach_19 = torch.ops.aten.detach.default(wait_tensor_3);  wait_tensor_3 = None
    detach_20 = torch.ops.aten.detach.default(detach_19);  detach_19 = None
    detach_21 = torch.ops.aten.detach.default(detach_20);  detach_20 = None
    detach_22 = torch.ops.aten.detach.default(detach_21);  detach_21 = None
    _to_copy_2 = torch.ops.aten._to_copy.default(detach_22, dtype = torch.float32, layout = torch.strided, device = device(type='cpu'));  detach_22 = None
    view_2 = torch.ops.aten.view.default(_to_copy_2, [8]);  _to_copy_2 = None
    detach_23 = torch.ops.aten.detach.default(detach_18);  detach_18 = None
    detach_24 = torch.ops.aten.detach.default(detach_23);  detach_23 = None
    _to_copy_3 = torch.ops.aten._to_copy.default(detach_24, dtype = torch.float32, layout = torch.strided, device = device(type='cpu'));  detach_24 = None
    view_3 = torch.ops.aten.view.default(_to_copy_3, [8]);  _to_copy_3 = None
    return [view_3, view_2]
```

Some of the stuff in this graph looks kinda of silly though (e.g. an unnecessary split() + cat(), and all the extra detach() calls).

Stuff that's broken:
- functionalization is pretty horribly broken. In particular, the original strategy I used in this stack was to have functionalization run **above** subclass desugaring. But that doesn't play well with the way we want to compile DTensor. DTensor has a few APIs, like `.redistribute()`, `.to_local()`, and the `DTensor()` constructor, that we want to put directly into the graph so that we can compile them (e.g. redistribute() will desugar into collective ops). Doing this requires functionalization to run **underneath** the subclass, though. I hacked around this for now by forcing these functions to run functionalization first if they need to.
- the backward test that I have is... wrong. The backward graph that we trace out looks kind of reasonable, but it gives incorrect gradients on one of the two inputs. This needs further debugging (presumably we should be able to stare at the graph and identify which part of it is wrong?).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105236
Approved by: https://github.com/wanchaol
2023-10-11 21:55:27 +00:00
3553eb9b89 Add CUTLASS-based support for mixed dtypes matrix multiplication (#110981)
Resubmission without ghstack to make it easier to import https://github.com/pytorch/pytorch/pull/110934/commits

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110981
Approved by: https://github.com/drisspg
2023-10-11 21:47:52 +00:00
0f924cdee3 Fix functional::smooth_l1_loss signatures to not override beta (#109798)
This splits `nn::functional::smooth_l1_loss` into two different signatures in order to keep backward compatibility for calling the function like `smooth_l1_loss(input, target, /*reduction=*/..., /*beta=*/...)`

Fixes #70163

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109798
Approved by: https://github.com/mikaylagawarecki
2023-10-11 21:37:37 +00:00
73f4c1a406 [reland2] Update custom Function preserve torch function when inputs … (#110895)
…returned as-is

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110895
Approved by: https://github.com/albanD
2023-10-11 21:37:19 +00:00
52e76a3056 fix ShardedTensor.gather when shard is empty (#110962)
Summary:
The current ShardedTensor.gather does not work as expected when the shard is empty on any rank.

The root cause: when a sharded tensor has no placement on a specific rank, the metadata doesn't include that rank's placement, which introduces a KeyError in `shard_offset = shard_placement[shard.metadata][1]`.

It's fixed by adding an empty tensor check.
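
A hypothetical, runnable illustration of the failure mode and the fix; the names here are illustrative, not the exact internals of ShardedTensor.gather.

```python
# rank 1 owns no shard, so the placement map has no entry for it
shard_placement = {"rank0_meta": (0, 0)}               # rank1_meta absent -> KeyError before the fix
local_shards = [("rank0_meta", 4), ("rank1_meta", 0)]  # (metadata, numel) pairs

for metadata, numel in local_shards:
    if numel == 0:
        continue  # the fix: skip empty shards instead of looking up their placement
    shard_offset = shard_placement[metadata][1]
```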

Test Plan:
before change:

after change:

Differential Revision: D50114085

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110962
Approved by: https://github.com/wz337
2023-10-11 21:26:41 +00:00
ded5ee75ac AOTAutograd: Go down inference path if no outputs require grad (#111011)
Fixes https://github.com/pytorch/pytorch/issues/110666

Slight update to original PR here: https://github.com/pytorch/pytorch/pull/111005

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111011
Approved by: https://github.com/Chillee
2023-10-11 20:59:47 +00:00
6d8e0c4b5a [export] Get export APIs ready for PTC (reland) (#111030)
Summary:
https://docs.google.com/document/d/1QJJEGnj2nHGPODlw38BEG3KLLCOTfdOVjPrNQbz_LM8/edit#bookmark=id.lp80wfshq130
Changes:
* `torch.export` will return a functional ATen graph but not lowered to core aten decompositions (CompositeImplicitAutograd decomps still run)
* `exported_program.run_decompositions(decomposition_table)` will optionally take a decomposition table, and run decompositions on the exported program, returning a new exported program. By default we will run the Core ATen decomposition table.

Calling convention for Executorch stays the same:
```
pre_autograd_graph = capture_pre_autograd_graph(f, args, ...)
aten_graph_no_decomps = torch.export.export(pre_autograd_graph, args, ...)
# Within to_edge we decompose to core aten and then convert to edge
edge_graph = exir.to_edge(aten_graph_no_decomps)
```

Test Plan: CI

Differential Revision: D50172210

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111030
Approved by: https://github.com/ydwu4
2023-10-11 20:48:24 +00:00
20a7366147 Fix Android publish step with lite interpreter (#111071)
This file needs to be added to the list like the others. The publish command `BUILD_LITE_INTERPRETER=1 android/gradlew -p android publish` finishes successfully with this, and the files are available on Nexus:

![Screenshot 2023-10-11 at 11 56 53](https://github.com/pytorch/pytorch/assets/475357/849d4aa7-79f6-47fa-a471-d452d7c1bdf6)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111071
Approved by: https://github.com/atalman
2023-10-11 20:28:12 +00:00
6c7013a3dc [Doc] Add weight dtype in torch.nn.CrossEntropyLoss (#110998)
Fixes #101213
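
A small, hedged example of the documented parameter; based on the PR title, the note added here concerns the weight's dtype, which should be a floating tensor matching the input.

```python
import torch

weight = torch.tensor([1.0, 2.0, 0.5])            # one weight per class, float like the input
loss_fn = torch.nn.CrossEntropyLoss(weight=weight)
logits = torch.randn(4, 3)                        # float32 logits
target = torch.randint(0, 3, (4,))
loss = loss_fn(logits, target)
```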

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110998
Approved by: https://github.com/albanD
2023-10-11 19:52:13 +00:00
d589106bcd [quant][pt2e] Disable remove_qconfig (#111000)
Summary:
This is a hacky flag that we had before in the fx flow, and we don't want it in the new flow

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111000
Approved by: https://github.com/andrewor14
2023-10-11 19:43:46 +00:00
cf1da9bd17 enable index add test (#111016)
Dynamo is swallowing a user exception when suppress_errors is set to True. There's an issue filed for that: https://github.com/pytorch/pytorch/issues/108798. In the meantime, we still want the functionality covered by this test, which works with the non-default setting (don't suppress errors), to not regress.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111016
Approved by: https://github.com/yanboliang
2023-10-11 19:41:35 +00:00
e151307db0 Clean-up composite implicit ops for aten::isfinite, isreal and log_sigmoid (#110896)
Functions:
* aten::isfinite
* aten::log_sigmoid
* aten::isreal
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110896
Approved by: https://github.com/Skylion007, https://github.com/kshitij12345
2023-10-11 19:28:10 +00:00
d3205f8377 Revert "[2/N] Dynamo supports skip by function & removes skipfiles circular import (#110835)"
This reverts commit 0bd4ce728b9af2a14cfbda89e8faa9c9cfd61a5b.

Reverted https://github.com/pytorch/pytorch/pull/110835 on behalf of https://github.com/DanilBaibak due to Broken trunk ([comment](https://github.com/pytorch/pytorch/pull/110835#issuecomment-1758279590))
2023-10-11 18:39:36 +00:00
80dfc974dd [2D] Enable 2D FSDP+TP model.load_state_dict() (#110925)
This PR adds an all_gather_dtensor() method to fsdp/_fsdp_extensions.py and the actual implementation in tensor/parallel/fsdp.py. This enables FSDP to load a 2D DTensor state_dict into the model when calling `model.load_state_dict()`.

cc. @fegin

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110925
Approved by: https://github.com/fegin
ghstack dependencies: #110831, #110846
2023-10-11 18:22:20 +00:00
fd4ba806f6 Implement tensor slice in inductor to stop falling back for aten.index (#111015)
Fixes #110711

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111015
Approved by: https://github.com/Chillee
2023-10-11 17:53:24 +00:00
6c136c3302 [2D] Enable 2D DTensor state_dict for FSDP + TP (#110846)
This PR adds a `chunk_dtensor()` method to fsdp/_fsdp_extensions.py and the actual implementation of `chunk_dtensor()` in tensor/parallel/fsdp.py. This enables FSDP to return 2D DTensor state_dict when composing FSDP with TP.

cc. @fegin
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110846
Approved by: https://github.com/fegin, https://github.com/wanchaol
ghstack dependencies: #110831
2023-10-11 17:40:39 +00:00
0bd4ce728b [2/N] Dynamo supports skip by function & removes skipfiles circular import (#110835)
Several improvements for skipfiles:
* Add ```FUNC_INLINELIST``` to support function-level skip/inline checks.
  * Use ```fn.__code__``` to match functions, since sometimes we can't get the function object.
* Use python module string names for ```FILE_INLINELIST``` and ```SUBMODULE_INLINELIST```.
  * Use filenames to match files and python modules, which fundamentally resolves the circular import issues introduced by skipfiles.
  * Use ```TYPE_CHECKING``` to ensure the python module string names are correct.
* Add unit tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110835
Approved by: https://github.com/ezyang
2023-10-11 17:24:56 +00:00
de1ca4a081 [dtensor] small change to refactor random ops (#110900)
Make random ops a set instead of a list.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110900
Approved by: https://github.com/fduwjj
ghstack dependencies: #110898
2023-10-11 17:03:08 +00:00
657e8f2cad [dtensor] make replicate -> partial do division instead (#110898)
This PR switches replicate -> partial to do division instead of zeroing out the other ranks. It preserves the same numerics, but avoids the per-rank behavior difference and is friendly to torch.compile.
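
A runnable illustration of the two strategies (no distributed setup needed): both "partial" forms sum back to the replicated value under the implied all_reduce(SUM), but division gives every rank an identical local computation.

```python
world_size = 4
replicated = 8.0
zeroed = [replicated if r == 0 else 0.0 for r in range(world_size)]  # old: per-rank branch
divided = [replicated / world_size for _ in range(world_size)]       # new: uniform on all ranks
assert sum(zeroed) == sum(divided) == replicated                     # the implied SUM agrees
```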
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110898
Approved by: https://github.com/fduwjj
2023-10-11 17:03:08 +00:00
204f338f71 Reland [Profiler] Improve the docstring for export_memory_timeline (#110983)
Summary: Add more details about the export_memory_timeline API, as we've landed new representations of the memory timeline data.

Test Plan: CI, should be no functional change, as we only changed comments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110983
Approved by: https://github.com/DanilBaibak
2023-10-11 16:42:05 +00:00
cae3a2e6eb Revert "[sparse] Add i8i8->i32 support for cuSPARSELt (#110499)"
This reverts commit 33da6c89516d9d9067f7181826826224a4cf5afe.

Reverted https://github.com/pytorch/pytorch/pull/110499 on behalf of https://github.com/jcaip due to cslt v0.5.0 requires a newer linker and we will be using v0.4.0 as the base version ([comment](https://github.com/pytorch/pytorch/pull/110499#issuecomment-1758039953))
2023-10-11 16:14:59 +00:00
86619c9c9d [aotinductor] Add both cpu and cuda tests for the AOTInductor cpp test (#110920)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110920
Approved by: https://github.com/chenyang78
ghstack dependencies: #110652, #110891
2023-10-11 15:58:28 +00:00
3058700f7f [aotinductor] Add AOTIModelRunner as a utility class (#110891)
Summary: Introduce a utility class AOTIModelRunner to take care of running an AOTInductor compiled model. It does things like dlopen a model, initialize the model container, setup inputs and outputs, and destroy the model container.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110891
Approved by: https://github.com/chenyang78
ghstack dependencies: #110652
2023-10-11 15:58:28 +00:00
b17c247eb1 [aotindutor] Update the cpp test example (#110652)
Summary: store inputs and outputs in Python, and load them back to run the compiled model in C++ and compare the outputs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110652
Approved by: https://github.com/chenyang78
2023-10-11 15:58:28 +00:00
3062e267b1 [cond] Add more tests for valid inputs of cond (#110727)
This PR adds a parametrized test for cond. It tests that cond can be traced with valid inputs (a minimal call is sketched after the first list below). Specifically, valid inputs are a combination of:
- pred (python boolean, boolean tensor, int tensor, scalar tensor)
- true_fn/false_fn (func, obj, nn_module)
- Operands (0 or more tensor inputs), tested with 0 and 2
- closures (0 or more tensor closures), tested with 0 and 2
- nested_level (no nesting or level-2 nested cond)
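
A minimal sketch of one valid-input combination from the list above (boolean-tensor pred, plain functions, two tensor operands), assuming the experimental control-flow import path in use at the time.

```python
import torch
from functorch.experimental.control_flow import cond

def true_fn(x, y):
    return x + y

def false_fn(x, y):
    return x - y

x, y = torch.randn(3), torch.randn(3)
out = cond(torch.tensor(True), true_fn, false_fn, (x, y))  # boolean tensor predicate
```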

What this test doesn't cover:
- pred: symbolic boolean expression as predicate
- true_fn/false_fn: functions that mutate intermediate tensors
- operands: non-tensor operands such as float, int
- closures: nn_module attribute closures, python constant closures
- nested_level: 3+

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110727
Approved by: https://github.com/zou3519
2023-10-11 15:56:13 +00:00
ef19824db8 Suppress warnings in tensorpipe.h (#111012)
To fix distributed compilation with clang-15

Fixes https://github.com/pytorch/pytorch/issues/110974

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111012
Approved by: https://github.com/huydhn, https://github.com/drisspg, https://github.com/Skylion007
2023-10-11 15:41:30 +00:00
f2d476843e [MPS][BE] Avoid redispatch in sign_out (#110955)
By calling `at::mps::sign_outf` rather than `at::sign_out`, which calls the dispatcher again.
Also, do not copy output unnecessarily.

### <samp>🤖 Generated by Copilot at f942e74</samp>

> _Metal tensors rise from the ashes_
> _`sign` and `sgn` unleash their flashes_
> _MPSFunctions reign supreme_
> _In the header of the metal dream_
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110955
Approved by: https://github.com/kulinseth, https://github.com/albanD
2023-10-11 15:10:21 +00:00
fc1105b282 [inductor] Implement Fx graph caching to improve warm compilation time. (#103453)
Summary: Implement an on-disk cache to save and reuse compiled FX Graphs. This implementation does not handle tensors with symbolic shapes. This needs to be done in a follow-up PR.
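
A generic illustration of the warm-start idea, not Inductor's actual implementation: hash the graph plus config into a key, then reuse the compiled artifact from disk on a hit.

```python
import hashlib
import os
import pickle

def cache_key(graph_repr: str, config: dict) -> str:
    payload = graph_repr + repr(sorted(config.items()))
    return hashlib.sha256(payload.encode()).hexdigest()

def load_or_compile(graph_repr, config, compile_fn, cache_dir="/tmp/fx_cache"):
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, cache_key(graph_repr, config))
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)       # cache hit: skip compilation entirely
    artifact = compile_fn(graph_repr)   # cache miss: compile and persist
    with open(path, "wb") as f:
        pickle.dump(artifact, f)
    return artifact
```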

Test Plan:
* New unit tests exercising saving and load from the cache.
* New unit tests to exercise the cache key calculations.
* Ran several benchmarks to see cache hit and resulting compilation times.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103453
Approved by: https://github.com/eellison, https://github.com/Chillee
2023-10-11 14:39:14 +00:00
2cf9782912 [generate_opcheck_tests] Add some reasonable defaults (#110977)
Summary:
Make it easier to add `generate_opcheck_tests` by adding defaults for
the failures_dict location, the additional decorators, and the test
utils.

Test Plan:
Existing tests

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110977
Approved by: https://github.com/williamwen42
ghstack dependencies: #110951
2023-10-11 14:28:05 +00:00
4abfa22812 [aotinductor] Add a perf smoke test for AOTInductor (#110972)
Summary: To prevent perf regression like the one caused by #110510

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110972
Approved by: https://github.com/chenyang78
2023-10-11 13:30:05 +00:00
98c329b19e Revert "[core ATen IR] Add decompositions for max, min, var_mean (#110906)"
This reverts commit 9606cda64e97210cfcca07110ef4872cedc5a1d9.

Reverted https://github.com/pytorch/pytorch/pull/110906 on behalf of https://github.com/SS-JIA due to Breaks internal CI ([comment](https://github.com/pytorch/pytorch/pull/110906#issuecomment-1757490740))
2023-10-11 11:41:21 +00:00
95ff51d8ed [MPS] Add support for Softshrink to MPS Backend (#110814)
Adds the softshrink activation function to the mps backend.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110814
Approved by: https://github.com/kulinseth
2023-10-11 07:55:39 +00:00
de370eb313 [Distributed] Small nits to apply_optimizer_in_backward (#110903)
Clarify a few things around the documentation

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110903
Approved by: https://github.com/janeyx99
2023-10-11 07:45:45 +00:00
0821868110 Revert "[export] Get export APIs ready for PTC (#110410)"
This reverts commit b96ea9f361f2ed872c4a7d662427cadec345b702.

Reverted https://github.com/pytorch/pytorch/pull/110410 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/110410#issuecomment-1757017249))
2023-10-11 07:31:51 +00:00
74029fae9d Fix broken period workflow after #110976 (#111013)
Fixes my typo mistake from #110976
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111013
Approved by: https://github.com/kit1980, https://github.com/malfet
2023-10-11 06:40:18 +00:00
056d6247c7 [MPS] Use Metal Events to synchronize buffers in MPSAllocator (Part 1) (#106938)
- This PR is the first part of a bigger change to use `MPSEvent` to synchronize shared-buffers between CPU/GPU.
- Add APIs to record and wait for `MPSEvents` in `MPSAllocator`.
- Use a container list for Buffer Pools to simplify iterating over them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106938
Approved by: https://github.com/kulinseth
2023-10-11 06:13:05 +00:00
b96ea9f361 [export] Get export APIs ready for PTC (#110410)
Summary:
https://docs.google.com/document/d/1QJJEGnj2nHGPODlw38BEG3KLLCOTfdOVjPrNQbz_LM8/edit#bookmark=id.lp80wfshq130
Changes:
* `torch.export` will return a functional ATen graph w/o decompositions
* `exported_program.run_decompositions(decomposition_table)` will optionally take a decomposition table, and run decompositions on the exported program, returning a new exported program. By default we will run the Core ATen decomposition table.

Calling convention for Executorch stays the same:
```
pre_autograd_graph = capture_pre_autograd_graph(f, args, ...)
aten_graph_no_decomps = torch.export.export(pre_autograd_graph, args, ...)
# Within to_edge we decompose to core aten and then convert to edge
edge_graph = exir.to_edge(aten_graph_no_decomps)
```

Test Plan: CI

Differential Revision: D49742989

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110410
Approved by: https://github.com/ydwu4
2023-10-11 06:10:07 +00:00
1e7947b3e0 Revert "Reland 3rd try [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#109323)" + Forward fixes + test (#110964)
This reverts commit f786fbdebdd24d3a6807e3b9fbf055836db4ad60.

Forward fixes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110964
Approved by: https://github.com/ezyang, https://github.com/anijain2305
2023-10-11 05:16:47 +00:00
e49ea87162 Fix socket.cpp compilation using gcc-9.4 (#111002)
Otherwise the following error is thrown when attempting to compile with WERROR enabled:
```
In file included from /home/nshulga/git/pytorch/pytorch/torch/csrc/distributed/c10d/socket.cpp:30:
/home/nshulga/git/pytorch/pytorch/third_party/fmt/include/fmt/chrono.h:340:24: warning: redundant redeclaration of ‘constexpr’ static data member ‘fmt::v10::detail::codecvt_result<CodeUnit>::max_size’ [-Wdeprecated]
  340 | constexpr const size_t codecvt_result<CodeUnit>::max_size;
      |                        ^~~~~~~~~~~~~~~~~~~~~~~~
/home/nshulga/git/pytorch/pytorch/third_party/fmt/include/fmt/chrono.h:335:33: note: previous declaration of ‘fmt::v10::detail::codecvt_result<CodeUnit>::max_size’
  335 |   static constexpr const size_t max_size = 32;
      |                                 ^~~~~~~~
```
or the following when using clang as the host compiler
```
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/distributed/c10d/socket.cpp:30:
/Users/nshulga/git/pytorch/pytorch/third_party/fmt/include/fmt/chrono.h:340:50: warning: out-of-line definition of constexpr static data member is redundant in C++17 and is deprecated [-Wdeprecated]
constexpr const size_t codecvt_result<CodeUnit>::max_size;
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111002
Approved by: https://github.com/drisspg
2023-10-11 05:16:00 +00:00
a614281ea9 Add current_device() to torch.cpu (#110987)
To better support device-agnostic code, add a "cpu" return value for `current_device()` in torch.cpu so that we won't run into `AttributeError: module 'torch.cpu' has no attribute 'current_device'`.
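
A hedged sketch of the device-agnostic pattern this enables; per this PR, `torch.cpu.current_device()` simply returns "cpu".

```python
import torch

mod = torch.cuda if torch.cuda.is_available() else torch.cpu
device = mod.current_device()  # a device index on CUDA, "cpu" on CPU
```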

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110987
Approved by: https://github.com/wanchaol
2023-10-11 05:13:10 +00:00
110382bacf Make NestedTensor compilable with eager backend (#109171)
In this PR:
- Adds support for strides for jagged tensor (design doc for this coming soon)
- NestedTensor skips automatic dynamic
- Make use of @bdhirsh's subclass fakification logic by adding the __tensor_{un,}flatten__ functions.
- Additional logic for fakification, since the existing subclass fakification logic does not handle the case where the outer tensor has an additional dimension. We insert one-off logic to (1) insert an extra SingletonSymInt onto the fakified NestedTensor and (2) make sure we call track_symint on the sizes of both the inner and outer tensor during guard creation.

Remaining things that are weird:
- Still need to skip some logic in meta utils for some reason (I was going to write this up more, but decided not to since we're not able to do this anyway for an immediate reason: we cannot arbitrarily compare singleton ints. For now I'm just following Brian's advice from [here](https://github.com/pytorch/pytorch/pull/109171#discussion_r1328137070) )

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109171
Approved by: https://github.com/ezyang, https://github.com/bdhirsh
2023-10-11 04:47:10 +00:00
e0dbaa04d2 Fix the meta func for mem_eff_backward (#110893)
Fixes #110832

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110893
Approved by: https://github.com/eellison
2023-10-11 02:58:54 +00:00
0e551bbcd7 [quant][pt2] Preserve source_fn_stack after QAT fusion (#110899)
Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT.test_qat_preserve_source_fn_stack

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar

Differential Revision: [D50101253](https://our.internmc.facebook.com/intern/diff/D50101253)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110899
Approved by: https://github.com/jerryzh168
2023-10-11 02:55:52 +00:00
5aee22e0e0 Move export.constrain_as_* to torch._constrain_as_* (#110757)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110757
Approved by: https://github.com/avikchaudhuri
ghstack dependencies: #109859
2023-10-11 02:37:55 +00:00
c9eb8d8d90 Add set_checkpoint_debug_enabled that overrides local setting (#110728)
People access activation checkpointing through many layers of config, and it is not always guaranteed that all the layers of wrapping around checkpoint properly propagate all the kwargs, e.g. debug mode. This context manager offers an alternative way to enable debug mode that bypasses the need for every layer to propagate kwargs.
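
A minimal sketch, assuming the `set_checkpoint_debug_enabled` context manager this PR adds under `torch.utils.checkpoint`:

```python
import torch
from torch.utils.checkpoint import checkpoint, set_checkpoint_debug_enabled

def fn(x):
    return x.sin().cos()

x = torch.randn(4, requires_grad=True)
with set_checkpoint_debug_enabled(True):  # overrides debug= at every call site underneath
    out = checkpoint(fn, x, use_reentrant=False)
out.sum().backward()
```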
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110728
Approved by: https://github.com/albanD
ghstack dependencies: #110673, #110674, #110675, #110676
2023-10-11 02:12:31 +00:00
02f6a8126e Support a simple subset of functions as backward hooks on intermediate tensors (#109537)
The main thrust of the initial effort here was to capture `register_hook` calls on tensors in compile regions. The first part of this was done in https://github.com/pytorch/pytorch/pull/108903 wherein we added support for register_hook input tensors.

The distinction between input and intermediary is due to implementation differences.

There are 2 kinds of hooks:

1) Hooks on objects with sources (inputs, params)
2) Hooks on objects w/o sources (intermediaries, and outputs).

Note: As outputs can be made simple by how dynamo handles residuals, they could actually be handled as if they were inputs, but, for the sake of this PR, we will refer to hooks as either hooks on inputs (sourced), or hooks on intermediaries (not sourced).

**The plan:**

For tensors w/ a source: (The PR above)
We record registered hooks, store them as a global, and associate them with the tensor in residuals. This means that when dynamo goes to create the frame, where we produce bytecode to stitch together our PT2-modified bytecode with the original eager code, we call register_hook. This registration of hooks in residuals is sound because (a) it happens right after a PT2 frame region ends and (b) we know that the tensor is alive in f_locals, f_globals, or a module in the user's invoking frame. This means we can soundly know it will be around to invoke register_hook on. As long as we guard on the identity of the lifted function, this is sound to do.

For tensors w/o a source: (This PR)

Ostensibly, the most correct and complete solution would be to smuggle hooks into a runtime wrapper in aot_autograd, where all the items the hooks close over are lifted to inputs as necessary and passed alongside the user provided function. This is necessary so that we can properly trace out and capture all the mutations within the user defined hook at backwards time.

This is too complicated - so, we limited the scope of this initial PR to a simple subset of hooks:

- Hooks must have a source (be known to us already, not a lambda or intermediary defined function)
- We must be tracing under compiled autograd

**The flow**:

We use the HOP added in https://github.com/pytorch/pytorch/pull/109690/files, referred to as the HOP below.

1) We intercept register_hook calls and wrap the user defined fn in the HOP
2) We write a `_register_hook_trampoline` to the graph that is a local no-arg function that is invoked as a call_function in the dynamo graph
3) aot_autograd inlines through it during its trace, and sees the HOP
4) the HOP preserves itself in the graph - it does not get traced into
5) During backwards, compiled_autograd installs the HOP under a hook call
6) When compiled_autograd enters compilation over its generated graph, dynamo traces the contents of the hook

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109537
Approved by: https://github.com/ezyang
2023-10-11 01:35:37 +00:00
79212430df feat(inductor): fx graph debug should display device (#110346)
Device mismatch issues are the root cause of https://github.com/pytorch/pytorch/issues/107006; hence, make device-related scheduling issues easier to diagnose.
Also format single-kwarg graphs to be more concise.

Example rendering:
![image](https://github.com/pytorch/pytorch/assets/9093549/1b59a994-f2df-45c9-8cb7-37eb3ba12654)

CC code owners: @ngimel @jansel @shunting314 @mlazos @peterbell10

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110346
Approved by: https://github.com/eellison
2023-10-11 00:34:55 +00:00
24bf9aeb6b Fix arange with dynamic end argument. (#110979)
Fixes https://github.com/pytorch/pytorch/issues/93468

There are a few extra tests that are sort of unrelated, but I ended up writing them while working on the fix and decided to keep them. The big idea here is to split the `_check` so that `expect_true` works; I could have probably also improved the symbolic reasoning, but I'm lazy. One small logging fix too.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110979
Approved by: https://github.com/Skylion007
2023-10-11 00:32:34 +00:00
a11d4a8378 [Reland] [Inductor] Break the loop fusion when node2 depends on node1 mutations (#110677)
Reland PR https://github.com/pytorch/pytorch/pull/109172 which has been reverted in https://github.com/pytorch/pytorch/pull/110622

Differential Revision: [D50097373](https://our.internmc.facebook.com/intern/diff/D50097373)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110677
Approved by: https://github.com/jgong5, https://github.com/ezyang
2023-10-11 00:26:45 +00:00
314a502eb0 Revert "Reland "[C10] PG observability hooks. (#108815)" (#110907)"
This reverts commit 7678cd22af46c9df4fb47a409d3e8ad71a6127ea.

Reverted https://github.com/pytorch/pytorch/pull/110907 on behalf of https://github.com/huydhn due to Sorry for reverting this, but macos job in trunk starts failing after this 7678cd22af ([comment](https://github.com/pytorch/pytorch/pull/110907#issuecomment-1756497387))
2023-10-11 00:23:42 +00:00
2edc75a669 Add a workflow to release Android binaries (#110976)
This adds 2 jobs to build PyTorch Android with and without lite interpreter:

* Keep the list of currently supported ABIs: armeabi-v7a, arm64-v8a, x86, x86_64
* Pass all the tests on the emulator
* Ran the test app on the emulator and on my Android phone (`arm64-v8a`) without any issue
![Screenshot_20231010-114453](https://github.com/pytorch/pytorch/assets/475357/57e12188-1675-44d2-a259-9f9577578590)
* Run on AWS https://us-west-2.console.aws.amazon.com/devicefarm/home#/mobile/projects/b531574a-fb82-40ae-b687-8f0b81341ae0/runs/5fce6818-628a-4099-9aab-23e91a212076
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110976
Approved by: https://github.com/atalman
2023-10-11 00:19:33 +00:00
5aa96fd336 [dynamo] list index: add more list types to testing, support namedtuple, improve error handling (#110919)
Follow up: #110817

Minor improvements as discussed in prev PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110919
Approved by: https://github.com/ezyang
2023-10-11 00:16:39 +00:00
9606cda64e [core ATen IR] Add decompositions for max, min, var_mean (#110906)
## Context

Add decompositions for `aten.max`, `aten.min`, and `aten.var_mean`. These operators follow a pattern of returning a tuple of outputs from two component operators:

```
aten.max(x) -> return aten.amax(x), aten.argmax(x)
aten.min(x) -> return aten.amin(x), aten.argmin(x)
aten.var_mean(x) -> return aten.var(x), aten.mean(x)
```

For `var_mean`, the `refs` implementation was doing something similar, so I changed it to call `torch.` ops instead, as was done for other `refs` implementations previously. cc: @peterbell10 @lezcano

Note that Inductor lowers all these directly, so they are excluded from the Inductor decomp table.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110906
Approved by: https://github.com/manuelcandales
2023-10-11 00:06:24 +00:00
3100d3e661 Revert "[inductor] Implement Fx graph caching to improve warm compilation time. (#103453)"
This reverts commit 8a8668e1aeac8d1726ac746372f5a93262994f62.

Reverted https://github.com/pytorch/pytorch/pull/103453 on behalf of https://github.com/kit1980 due to The newly added test fails on internal builds ([comment](https://github.com/pytorch/pytorch/pull/103453#issuecomment-1756449919))
2023-10-10 23:21:59 +00:00
cyy
f98d6ad8b3 [1/N] Apply clang-tidy to aten/src/ATen/core/ (#110861)
It is time to clang-tidy aten.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110861
Approved by: https://github.com/Skylion007
2023-10-10 23:20:58 +00:00
ca03f36233 Change ProcessGroupNCCL default timeout to 10 min (#110947)
Avoid changing the default for other backends, as the CPU backend (GLOO) may need
longer timeouts.

Motivated by trying to save cluster time when encountering collective
hangs. Generally collectives should time out within seconds, and 30
minutes (or 10 minutes) should provide ample headroom for edge cases.
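
A short sketch of overriding the new default explicitly at process-group creation (requires the usual distributed launch environment):

```python
import datetime
import torch.distributed as dist

dist.init_process_group(
    backend="nccl",
    timeout=datetime.timedelta(minutes=30),  # override the 10-minute default
)
```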
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110947
Approved by: https://github.com/xw285cornell, https://github.com/fduwjj
2023-10-10 22:28:39 +00:00
cd275dc24f Remove RangeConstraints in favor of ValueRanges (#109859)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109859
Approved by: https://github.com/avikchaudhuri
2023-10-10 22:22:05 +00:00
7a69e3d30b [fx][subgraph_matcher] Add a matcher that supports name to node map (#110743)
Summary:
We want the matcher to return a name -> node map in the target graph
so that we can refer to nodes by name; this is useful for downstream applications like
quantization.

Also, we can use the torch API as the source of truth instead of matching the aten API directly.

Test Plan:
python test/fx/test_matcher_utils.py

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110743
Approved by: https://github.com/SherlockNoMad
2023-10-10 22:21:24 +00:00
91eeb77260 StackDataset batched sampling (#110694)
Optimization of minibatch loading via batched sampling.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110694
Approved by: https://github.com/ejguan
2023-10-10 22:05:51 +00:00
ac01304e22 pin_memory support for NT (#110404)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110404
Approved by: https://github.com/cpuhrsch, https://github.com/albanD
ghstack dependencies: #110292
2023-10-10 21:58:19 +00:00
43ea782af3 Multiprocessing support for NT (#110292)
Fixes #110161

Allows NTs to be used in DataLoaders with `num_workers > 1`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110292
Approved by: https://github.com/cpuhrsch, https://github.com/albanD
2023-10-10 21:58:19 +00:00
7f2d25c547 [ONNX] bump onnx submodule to rel-1.15.0 (#110663)
- onnx==1.15.0rc1
- onnxscript==0.1.0.dev20231006
- ort-nightly==1.17.0.dev20231005001
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110663
Approved by: https://github.com/ezyang, https://github.com/thiagocrepaldi
2023-10-10 21:44:09 +00:00
3a29cdc5e6 [optests] Add dontGenerateOpCheckTests and is_inside_opcheck_mode (#110951)
This PR adds the following helper functions for generated opcheck tests:
- dontGenerateOpCheckTests is a decorator that skips generation of the
  opcheck tests for the generated function
- is_inside_opcheck_mode lets us query if we are in a generated test.
  Useful for fast debugging out-of-tree without needing to update
  PyTorch.

Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110951
Approved by: https://github.com/williamwen42
2023-10-10 21:43:43 +00:00
d9eb5a57aa [FSDP] Change _create_chunk_dtensor in fsdp/_shard_utils.py to use public API from DTensor (#110831)
This PR:
1) updates _create_chunk_dtensor() in _shard_utils.py to use public APIs from DTensor. This will avoid the global_size calculation error from using DTensor.from_local() for uneven-sharded parameters, as described in https://github.com/pytorch/pytorch/issues/110762
2) updates test/distributed/fsdp/test_fsdp_dtensor_state_dict.py to include a unit test for a model with uneven sharding.

cc. @wanchaol, @fegin

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110831
Approved by: https://github.com/wanchaol, https://github.com/fegin
2023-10-10 21:04:27 +00:00
6e770c0dda [dynamo] Add itertools.repeat via polyfill (#110953)
Fixes https://github.com/pytorch/pytorch/issues/110286
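
A hedged sketch of a pure-Python polyfill for `itertools.repeat`; Dynamo can inline a generator like this where it cannot trace the C-implemented original.

```python
def repeat(obj, times=None):
    if times is None:
        while True:          # infinite form
            yield obj
    else:
        for _ in range(times):
            yield obj

assert list(repeat(3, 4)) == [3, 3, 3, 3]
```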

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110953
Approved by: https://github.com/ezyang
2023-10-10 20:40:33 +00:00
02a02a23ee Revert "Move at::{Refcounted,}MapAllocator to c10 (#109881)"
This reverts commit 0341deb1c720d8c908ed40e853eaacfc8ac37181.

Reverted https://github.com/pytorch/pytorch/pull/109881 on behalf of https://github.com/albanD due to It does break buck build ([comment](https://github.com/pytorch/pytorch/pull/109881#issuecomment-1756195823))
2023-10-10 20:39:12 +00:00
495f77be7a [cpu] explicitly vectorize digamma (#110217)
### Benchmarking results
```python
[-------------- torch.digamma(x) Benchmark -------------]
                                        |  implicitly vectorized |     explicitly  vectorized
1 threads: -----------------------------------------------------------------------
      dtype torch.float16 - n : 100     |        3.8      |      3.5
      dtype torch.float16 - n : 200     |        5.8      |      5.3
      dtype torch.float16 - n : 500     |       11.8      |     10.7
      dtype torch.float16 - n : 1000    |       22.0      |     19.6
      dtype torch.float16 - n : 10000   |      203.6      |    179.7
      dtype torch.float32 - n : 100     |        3.8      |      3.6
      dtype torch.float32 - n : 200     |        5.7      |      5.5
      dtype torch.float32 - n : 500     |       11.1      |     11.1
      dtype torch.float32 - n : 1000    |       20.6      |     20.5
      dtype torch.float32 - n : 10000   |      191.7      |    189.6
      dtype torch.float64 - n : 100     |        3.8      |      3.7
      dtype torch.float64 - n : 200     |        5.9      |      5.7
      dtype torch.float64 - n : 500     |       11.9      |     11.7
      dtype torch.float64 - n : 1000    |       22.1      |     21.7
      dtype torch.float64 - n : 10000   |      203.6      |    199.7
      dtype torch.bfloat16 - n : 100    |        3.7      |      3.5
      dtype torch.bfloat16 - n : 200    |        5.6      |      5.3
      dtype torch.bfloat16 - n : 500    |       11.2      |     10.6
      dtype torch.bfloat16 - n : 1000   |       20.8      |     19.5
      dtype torch.bfloat16 - n : 10000  |      190.0      |    179.7

Times are in microseconds (us).
```

### Benchmarking config
Machine: Intel(R) Core(TM) i7-10870H CPU @ 2.20GHz
<p>

```python
>>> import torch
>>> print(f"Torch config: {torch.__config__.show()}")
Torch config: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - CPU capability usage: AVX2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CXX_COMPILER=/usr/local/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.2.0, USE_CUDA=0, USE_CUDNN=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=0, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=0, USE_OPENMP=ON, USE_ROCM=OFF,
```

</p>

Script -
```
import torch
import pickle
from torch.utils import benchmark
from itertools import product

device = 'cpu'
dtypes = (torch.float16, torch.float32, torch.float64, torch.bfloat16)
n = (100, 200, 500, 1000, 10000)

result = []

for dtype, num in product(dtypes, n):
    x = torch.rand(num, dtype=dtype, device='cpu')
    torch.digamma(x)
    stmt = 'torch.digamma(x)'
    measurement = benchmark.Timer(
        stmt=stmt,
        globals={'x': x},
        label=stmt + " Benchmark",
        sub_label=f"dtype {dtype} - n : {num}",
        description="vectorized",
    ).blocked_autorange(min_run_time=10)

    result.append(measurement)

fname_prefix = "benchmark_digamma_"

benchmark.Compare(result).print()
with open(fname_prefix+"vectorized", "wb") as f:
    pickle.dump(result, f)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110217
Approved by: https://github.com/sanchitintel, https://github.com/vfdev-5, https://github.com/ezyang
2023-10-10 20:31:25 +00:00
7678cd22af Reland "[C10] PG observability hooks. (#108815)" (#110907)
This reverts commit ff0358b0384d6a3a5b8ceeae625c93221612ba8e.

(original PR https://github.com/pytorch/pytorch/pull/108815 desc copied below)

Expose a set of observability hooks into C10D such that our users can
detect collective failures both faster and more easily.

The design is similar to NCCL desync debug in that it minimizes the
overhead by doing most of the work off the main thread.

This PR introduces a new module torch.distributed.hooks that exposes the following set of methods:

    register_collective_start_hook
    register_collective_end_hook
    register_process_group_hook

The process group hook exposes PG creation on the member ranks; the hooks are called inline from the
PG creation code. This is fine since this happens during initialization and only a limited number of times.

The collective start/end hooks are fired from a single background thread, which reads
events from a C++ queue and dispatches them.
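
A hypothetical sketch of registering one of the hooks named above, per the `torch.distributed.hooks` module this PR describes (the callback signature is an assumption):

```python
import torch.distributed.hooks as dist_hooks

def on_collective_start(info):
    print("collective started:", info)

dist_hooks.register_collective_start_hook(on_collective_start)
```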

Queue notification is, oddly, done using a pipe; this is needed so Python can abort the thread on shutdown
and keep it as a background thread. This is not possible with more reasonable choices like a condvar.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110907
Approved by: https://github.com/fduwjj
2023-10-10 20:09:40 +00:00
84ad3ed7b2 [dynamo] add config for displaying all guard failures (#110927)
Fixes https://github.com/pytorch/pytorch/issues/110879

Example output:
```
('Recompiling function fn in /home/jonch/Desktop/Programming/mlsys/pytorch/test/dynamo/test_misc.py:4578', 'triggered by the following guard failures: ["___check_type_id(L[\'obj\'], 94834370481168)", "L[\'obj\'].x == -0.5"]')
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110927
Approved by: https://github.com/lezcano
2023-10-10 19:57:44 +00:00
8cf1a02e80 Revert [Profiler] Improve the docstring for export_memory_timeline (#110978)
Revert [Profiler] Improve the docstring for export_memory_timeline
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110978
Approved by: https://github.com/huydhn, https://github.com/aaronenyeshi
2023-10-10 19:57:25 +00:00
bc49b1e50b [reland] Use is_symbolic instead of testing isinstance in some place (#110676)
reland of https://github.com/pytorch/pytorch/pull/110372

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110676
Approved by: https://github.com/ezyang
ghstack dependencies: #110673, #110674, #110675
2023-10-10 19:37:17 +00:00
df9a6bcaef [reland] Symintify guards.cpp (#110675)
reland of https://github.com/pytorch/pytorch/pull/110371

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110675
Approved by: https://github.com/ezyang
ghstack dependencies: #110673, #110674
2023-10-10 19:37:17 +00:00
3842b175d2 [reland] Add symbolic singleton int (#110674)
reland of https://github.com/pytorch/pytorch/pull/110370
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110674
Approved by: https://github.com/ezyang
ghstack dependencies: #110673
2023-10-10 19:37:17 +00:00
fda0a965c7 [reland] Support SingletonSymNode mul with coefficient (#110673)
reland of https://github.com/pytorch/pytorch/pull/110369
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110673
Approved by: https://github.com/ezyang
2023-10-10 19:37:17 +00:00
fb4b9e9c8e Re-enable a couple of fixed tests (#110770)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110770
Approved by: https://github.com/yanboliang, https://github.com/int3, https://github.com/Skylion007
ghstack dependencies: #110651
2023-10-10 19:13:14 +00:00
5183760ca5 Adding Backward Support for NestedTensors and FlashAttention (#97485)
# Summary
### <samp>🤖 Generated by Copilot at 318764f</samp>

This pull request implements the CUDA backend of the SDPA kernel for nested tensors, which enables efficient transformer models with variable-length sequences. It adds a new dispatch key, a backward function, a unit test, and some helper functions for the kernel. It modifies `test/test_transformers.py`, `aten/src/ATen/native/native_functions.yaml`, `aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctionsBackward.cpp`, and `aten/src/ATen/native/nested/cuda/NestedTensorTransformerUtils.h`.

### <samp>🤖 Generated by Copilot at ed4a773</samp>

> _Fused kernels of doom, unleash the flash attention_
> _Nested tensors on fire, reshape and pad with caution_
> _Backward pass of power, dispatch the CUDA key_
> _Test the gradients of hell, warn the user if they disagree_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97485
Approved by: https://github.com/jbschlosser
2023-10-10 18:08:17 +00:00
77e5f5d8a4 Updates to patch version release plans (#110952)
1. Updates to the patch release process
2. Added a release cadence section
3. Changed the description for "Modify release matrix" to reflect the current process
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110952
Approved by: https://github.com/malfet
2023-10-10 17:59:29 +00:00
52b1470935 [Profiler] Improve the docstring for export_memory_timeline (#110949)
Summary: Add more details about the export_memory_timeline API, as we've landed new representations of the memory timeline data.

Test Plan: CI, should be no functional change, as we only changed comments.

Differential Revision: D50123450

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110949
Approved by: https://github.com/davidberard98
2023-10-10 17:53:56 +00:00
31611b40b9 cmake: allow to build pytorch as a CMake subproject (#110373)
This is a re-attempt of fixing https://github.com/pytorch/pytorch/issues/53980, first submitted in https://github.com/pytorch/pytorch/pull/54978.

Quoting @SpaceIm
```
Fixes https://github.com/pytorch/pytorch/issues/53980

Maybe it would be nice to find why some files are generated in CMAKE_BINARY_DIR instead of CMAKE_CURRENT_BINARY_DIR or Torch_BINARY_DIR or PROJECT_BINARY_DIR, but there is a lot of indirection in the logic of pytorch build files, so I was not able to find where it comes from.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110373
Approved by: https://github.com/malfet
2023-10-10 17:47:35 +00:00
57f6368b8e [collective] Add a torch.compile + functional_collectives test (#110688)
Add a test to ensure functional_collectives + torch.compile always works.
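Roughly, the pattern the test exercises looks like this sketch (assumes an initialized default process group; not the test's actual code):

```python
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as funcol

@torch.compile
def step(x):
    # functional collectives return new tensors instead of mutating in place,
    # which is what lets them compose with torch.compile tracing
    return funcol.all_reduce(x, "sum", dist.group.WORLD) / dist.get_world_size()
```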

Differential Revision: [D50001491](https://our.internmc.facebook.com/intern/diff/D50001491/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110688
Approved by: https://github.com/wanchaol, https://github.com/fduwjj
2023-10-10 17:14:50 +00:00
c5f06b9753 Re-enable test_copy_transpose_math_view, neg_view/dce fix (#110651)
- neg view can just be lowered to neg() post functionalization
- we were treating all fallback kernels as not having side effects. we shouldn't dce mutating fallback kernels - either mutations induced by the reinplacing pass or clone_ with unsupported arguments (complex)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110651
Approved by: https://github.com/Chillee, https://github.com/jansel, https://github.com/malfet, https://github.com/Skylion007
2023-10-10 16:34:01 +00:00
ba86dfcd83 AOTDispatch subclass (#104483)
This is a PoC of AOTDispatch support. This PR actually works on basic examples, and I'm working on testing it out on `DTensor` (with @wanchaol), `SemiStructuredSparsityTensor` (with @jcaip), and `FP8Tensor`.

There are some design decisions baked into the PR that I think we need consensus on though - so I'm planning on writing a larger design doc to go over the changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104483
Approved by: https://github.com/ezyang
2023-10-10 16:13:16 +00:00
8bc04f46fe [inductor cpp] use c10::bit_cast to avoid violating strict-aliasing (#110809)
Fix https://github.com/pytorch/pytorch/issues/110807

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110809
Approved by: https://github.com/jansel
2023-10-10 11:16:31 +00:00
7b25c2b90e [FSDP][optim_state_dict] Move local optimizer state to FSDP compute_device (#110929)
This will ensure all the tensors are on FSDP compute_device.

Differential Revision: [D50059492](https://our.internmc.facebook.com/intern/diff/D50059492/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110929
Approved by: https://github.com/wz337
2023-10-10 10:34:31 +00:00
fb68aa0a92 [Easy] Remove unused return type from utils (#110887)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110887
Approved by: https://github.com/ezyang
2023-10-10 09:02:11 +00:00
a425307589 [ATen core IR] De-register full_like and empty_like as core (#110924)
## Context

Following up from @peterbell10 comments on https://github.com/pytorch/pytorch/pull/110882.

* `empty_like` was erroneously classified as `core`. It can be decomposed using `empty_permuted` (see the sketch after this list) and in fact is currently decomposed this way in the core decomposition table.
* `full_like` can be similarly decomposed to `full_permuted` once https://github.com/pytorch/pytorch/pull/110234 lands. The current decomposition into `empty_like` and `fill` doesn't work because `fill` decomposes to `full_like`, resulting in a recursive loop.
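A rough sketch of the kind of `empty_like` decomposition being referred to (illustrative only; the registered decomposition handles stride ties and edge cases more carefully):

```python
import torch

def empty_like_sketch(x: torch.Tensor) -> torch.Tensor:
    # order dimensions from largest to smallest stride so the result
    # preserves x's memory format without resorting to as_strided
    physical_layout = sorted(range(x.dim()), key=lambda d: x.stride(d), reverse=True)
    return torch.empty_permuted(x.size(), physical_layout, dtype=x.dtype, device=x.device)

x = torch.randn(2, 3, 4, 5).to(memory_format=torch.channels_last)
assert empty_like_sketch(x).stride() == x.stride()
```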
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110924
Approved by: https://github.com/kirklandsign
2023-10-10 09:02:05 +00:00
37567fdf31 Nvfuser cpp api deprecation attempt 2 (#110881)
attempting to re-try #110318 deprecating nvfuser c++ API

warning has been updated to TORCH_WARN_ONCE;
Warning thrown inside torch::jit::fuser::cuda::isEnabled() is turned off and will be deprecated when we pulled out TorchScript integration in the follow up PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110881
Approved by: https://github.com/davidberard98, https://github.com/NicolasHug
2023-10-10 08:07:03 +00:00
8820dda943 Revise def of contiguity in bmm (#110811)
Fixes #108754.

`hf_T5_generate` would encounter a regression when calling `extern_kernels.bmm` if one input is `reinterpret_tensor(buf2, (8, 1, 64), (64, 0, 1))` rather than `reinterpret_tensor(buf2, (8, 1, 64), (64, 512, 1), 0)`. As @jgong5 mentioned in a comment, the two tensors are in fact equivalent: the stride doesn't matter when the corresponding size is 1.
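The equivalence is easy to check directly (a self-contained illustration, not code from the PR):

```python
import torch

buf = torch.randn(8 * 64)
a = buf.as_strided((8, 1, 64), (64, 0, 1))
b = buf.as_strided((8, 1, 64), (64, 512, 1))
# a size-1 dimension is never stepped over, so its stride is irrelevant
# for addressing: both views read exactly the same memory
assert torch.equal(a, b)
```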

We revise the definition of contiguity in `bmm` to treat the above situation as a contiguous case. Thus, when a stride equals 0, `extern_kernels.bmm` can still use MKL's `gemm` for performance.

Speedup of `hf_T5_generate` is **1.343x** now and **1.138x** before, with script `bash inductor_single_test.sh multiple inference performance torchbench hf_T5_generate float32 first dynamic default 0`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110811
Approved by: https://github.com/jgong5, https://github.com/lezcano, https://github.com/Chillee
2023-10-10 06:48:08 +00:00
35e48e262c [custom op] Use canonical API to constrain unbacked values (#108372)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108372
Approved by: https://github.com/angelayi, https://github.com/ezyang
2023-10-10 05:14:28 +00:00
33403336fa Revert "[user errors] compulsory case names, allow multiple (#110878)"
This reverts commit 2ae71c45982109065e19a2c05473fbe7237215ab.

Reverted https://github.com/pytorch/pytorch/pull/110878 on behalf of https://github.com/kit1980 due to export/test_export.py::TestExport::test_multiple_definitions_same_name_dim - TypeError: UserError.init() missing 1 required positional argument: 'case_names' ([comment](https://github.com/pytorch/pytorch/pull/110878#issuecomment-1754360051))
2023-10-10 04:44:40 +00:00
8891da40d7 [vision hash update] update the pinned vision hash (#110915)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110915
Approved by: https://github.com/pytorchbot
2023-10-10 04:31:10 +00:00
19ecb5d0d5 Revert "[Inductor] Disallow OpOverloadPacket in ir.FallbackKernel (#110567)"
This reverts commit 37a02659921490d85b2b0712ad52b924e0c431cd.

Reverted https://github.com/pytorch/pytorch/pull/110567 on behalf of https://github.com/kit1980 due to breaking internal builds, see D50091340 ([comment](https://github.com/pytorch/pytorch/pull/110567#issuecomment-1754308982))
2023-10-10 03:49:20 +00:00
2ae71c4598 [user errors] compulsory case names, allow multiple (#110878)
We want to get to a point where most UserErrors link to exportdb examples. This PR makes passing case names non-optional to make this intent clearer and encourage developers who raise UserErrors to make or point to examples that make fixing such errors more obvious for users.

In addition, sometimes there are multiple examples that are relevant to an error. Thus this PR also enables passing multiple case names.

Retry of #110733 which was reverted due to a landrace.

Differential Revision: [D50087148](https://our.internmc.facebook.com/intern/diff/D50087148/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110878
Approved by: https://github.com/gmagogsfm, https://github.com/tugsbayasgalan
2023-10-10 03:48:07 +00:00
f10aab03c4 [sparse] Fix semi-structured sparse shape mismatch bug (#110420)
Summary:

Currently, PyTorch incorrectly calculates the size of the returned
matrix when we pass a non-contiguous batched (>2d) input to the
semi-structured sparse subclass.

This is most common in MLP layers, where we have 2 linear layers back to back.

This will lead to an error like the following:
```
RuntimeError: shape '[20, 64, 64, 3072]' is invalid for input of size
62914560

```
Where the size of the sparse matmul result is off because we infer the
output shape with the wrong tensor shape.

This happens because of a bug where we did not update the subclass
tensor shape when doing transpose.
For semi-structured sparsity, transposing is a no-op where we just set
the boolean flag, but we forgot to also update the tensor shape.

Note that this error goes away in inference mode, since we avoid
decomposing the aten.linear op and handle shape folding ourselves,
which changes the execution path.

An alternative way to fix this issue is to set
TORCH_FLATTEN_LINEAR_3D=True, which will also fix this error.

Test Plan:
```
python test/test_sparse_semi_structured.py -k test_mlp

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110420
Approved by: https://github.com/alexsamardzic, https://github.com/cpuhrsch
2023-10-10 03:07:31 +00:00
468a73f0e3 Support Numpy ints in the torch.nn.functional.interpolate dtype check (#110778)
In https://github.com/pytorch/pytorch/pull/99243, a check was added to ensure the `size` only contained integers.

This PR updates the check to also include numpy integers based on this comment (cc @kit1980): https://github.com/pytorch/pytorch/pull/99243#issuecomment-1646736646. Similar to the other commenter, I also ran into issues where existing software broke due to this after upgrading to PT2.1:

```
                if not torch.jit.is_scripting():
                    if not all(_is_integer(x) for x in size):
>                       raise TypeError(
                            "expected size to be one of int or Tuple[int] or Tuple[int, int] or "
                            f"Tuple[int, int, int], but got size with types {[type(x) for x in size]}"
                        )
E                       TypeError: expected size to be one of int or Tuple[int] or Tuple[int, int] or Tuple[int, int, int], but got size with types [<class 'numpy.int64'>, <class 'numpy.int64'>]

/conda-env/lib/python3.8/site-packages/torch/nn/functional.py:3924: TypeError
```
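After this fix, code like the following sketch is accepted again:

```python
import numpy as np
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
size = (np.int64(16), np.int64(16))  # e.g. produced by numpy-based shape math
out = F.interpolate(x, size=size)    # raised TypeError before this fix
print(out.shape)                     # torch.Size([1, 3, 16, 16])
```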
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110778
Approved by: https://github.com/mikaylagawarecki
2023-10-10 01:46:33 +00:00
de3ae93e9b Include rank of default PG in C++ log messages (#110623)
I tested by adding some warning logs in C++, run a distributed program and show that they now had `[rank0]:` in the messages. There is no existing test infra for C++ logging so I couldn't easily add a unit test.

The implementation strategy is to setup a global variable in C++, and then poke it when we initialize a process group. This was the simplest thing I could think of that would work.

This PR only works for non-glog logging. Probably need to come up with some other strategy for glog, e.g., a custom prefix, but need to make sure this doesn't conflict with fbcode. I can't easily test this from OSS, will leave as follow up work.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110623
Approved by: https://github.com/voznesenskym, https://github.com/wanchaol, https://github.com/fduwjj
2023-10-10 00:26:52 +00:00
0341deb1c7 Move at::{Refcounted,}MapAllocator to c10 (#109881)
`libshm.so` depends on the torch library exclusively for `at::RefcountedMapAllocator`,
 so it makes sense to move it to c10 along with the other memory allocators.

This means `libshm.so` only depends on `c10` and we don't need to relink
`libshm.so` for every ATen change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109881
Approved by: https://github.com/albanD
2023-10-09 23:53:47 +00:00
3704bf4ee8 [export] Update custom ops docs (#110492)
Updating the doc links in the custom ops documentation in export
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110492
Approved by: https://github.com/avikchaudhuri
2023-10-09 23:40:40 +00:00
28d7d7fc42 device agnostic: torch.cpu.set_device (#110716)
To support device-agnostic code, add a dummy placeholder in torch.cpu.
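A minimal sketch of the device-agnostic pattern this placeholder enables:

```python
import torch

# device-agnostic code can now call set_device on either backend module
backend = torch.cuda if torch.cuda.is_available() else torch.cpu
backend.set_device(0)  # a no-op placeholder on CPU
```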

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110716
Approved by: https://github.com/albanD
2023-10-09 23:00:15 +00:00
2aa0ba38a4 Make is_sparse a property of MaskedTensor (#110725)
Fixes #104574

Seeing that MaskedTensor is a prototype, the BC breaking nature of this change seems okay?

Locally tested (screenshot omitted).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110725
Approved by: https://github.com/cpuhrsch
2023-10-09 22:35:38 +00:00
6c8096ec31 [ATen core IR] Register additional ATen operators as core (#110882)
## Context

For more context, please refer to [this PyTorch forums post](https://dev-discuss.pytorch.org/t/defining-the-core-aten-opset/1464).

This PR registers some additional ATen operators as `core`, based on feedback from the forums post as well as the experiences from adding other core ATen decompositions.

The ATen operators registered as core in this diff, with the associated reasoning, are:

ATen op | reasoning
--|--
aten::atan2 | This operator often maps to a hardware intrinsic.
aten::diagonal | There is no straightforward decomposition for this operator.
aten::empty_like | Decomposition for this operator would require `as_strided` to retain the strides of the input tensor, which should be avoided.
aten::expm1 | This operator often maps to a hardware intrinsic; Furthermore, decomposing it will negatively impact the numerical precision of the output.
aten::full_like | Decomposition for this operator would require `as_strided` to retain the strides of the input tensor, which should be avoided.
aten::log10 | This operator often maps to a hardware intrinsic; Furthermore, decomposing it will negatively impact the numerical precision of the output.
aten::log1p | This operator often maps to a hardware intrinsic; Furthermore, decomposing it will negatively impact the numerical precision of the output.
aten::log2 | This operator often maps to a hardware intrinsic; Furthermore, decomposing it will negatively impact the numerical precision of the output.
aten::pow.Scalar_Tensor | This is a Scalar variant of pow.Tensor_Tensor, which is a part of core.
aten::resize | There is no valid decomposition for this operator.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110882
Approved by: https://github.com/lezcano
2023-10-09 22:27:00 +00:00
733368a822 Change default NCCL_ASYNC_ERROR_HANDLING to 3:SkipCleanUp (#110723)
Summary

Currently, when detecting a timeout/exception in the watchdog
workCleanupLoop, we call nccl APIs to abort all the active communicators
before finally re-raising the exception and killing the process.  The
nccl APIs may hang, causing additional problems. Instead, just re-raise.

@kumpera proposed that changing this default should save us from a lot of commonly observed errors.

Note: there are other cuda/nccl api calls in our watchdog, which also could hang. This change is not a substitute for a deeper refactor.

Detail

The current default (NCCL_ASYNC_ERROR_HANDLING=1:TearDown) meant the following:

SHOULD_TEAR_DOWN() evaluates to true
  - This affects 'ProcessGroupNCCL::WorkNCCL::handleException`
  - handleException is called from two places:
     - work.wait() -> synchronizeInternal() -> handleException()
     - workCleanupLoop() -> handleException()
  - when true, the exception is logged and rethrown

SHOULD_CLEAN_UP() evaluates to true
  - This only impacts the workCleanupLoop()
  - When true, it means all communicators will be aborted (ncclCommAbort())
    upon work exception or timeout

The proposed new default is NCCL_ASYNC_ERROR_HANDLING=3:SkipCleanUp.

This only changes SHOULD_CLEAN_UP() to false, impacting workCleanupLoop() behavior.
Communicators will no longer be aborted, which should avoid a class of bugs where the watchdog hangs due to calling nccl APIs which may block/hang.
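For users who want the old TearDown behavior back, something like this sketch (set before any process group is created) should work:

```python
import os

# restore the previous default: abort NCCL communicators on watchdog timeout
os.environ["NCCL_ASYNC_ERROR_HANDLING"] = "1"  # 1 == TearDown, 3 == SkipCleanUp
```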
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110723
Approved by: https://github.com/fduwjj, https://github.com/xw285cornell
2023-10-09 21:38:32 +00:00
0a580da582 Add batch decomposition for torch.linalg.eigh (#110640)
Closes https://github.com/pytorch/pytorch/issues/108481
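Assuming the linked issue is about vmap support, usage would look roughly like this sketch:

```python
import torch

A = torch.randn(4, 3, 3)
A = A + A.mT  # make each matrix in the batch symmetric
eigvals, eigvecs = torch.vmap(torch.linalg.eigh)(A)
print(eigvals.shape)  # torch.Size([4, 3])
```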

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110640
Approved by: https://github.com/kshitij12345, https://github.com/zou3519
2023-10-09 21:36:49 +00:00
201d02ef77 stop non-differentiable values from being materialized in aotautograd (#110721)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110721
Approved by: https://github.com/bdhirsh
ghstack dependencies: #110720
2023-10-09 20:18:19 +00:00
c596db762f refactor aotautograd to set requires_grad on info rather than a separate array (#110720)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110720
Approved by: https://github.com/bdhirsh
2023-10-09 20:18:19 +00:00
db760527e0 fix(dynamo): list index via polyfill (#110817)
Fixes https://github.com/pytorch/pytorch/issues/109031
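For context, a polyfill here means reimplementing the C-level builtin in plain Python so dynamo can trace through it instead of graph-breaking; an illustrative sketch (not the exact code added):

```python
def list_index(seq, target):
    # traceable stand-in for list.index
    for i, item in enumerate(seq):
        if item == target:
            return i
    raise ValueError(f"{target} is not in list")
```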

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110817
Approved by: https://github.com/ezyang
2023-10-09 19:48:39 +00:00
2a76c7f018 [dtensor] skip move to device when device_type match (#110774)
Skip tensor.to in from_local and distribute_tensor when the device_type of the device mesh matches the tensor's device type. Since from_local is on the critical path of TP, this might also reduce some overhead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110774
Approved by: https://github.com/fduwjj
2023-10-09 19:39:11 +00:00
50bd252863 Fix typo the the (#110869)
This PR fixes the typo `the the` in comments and exception messages.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110869
Approved by: https://github.com/soulitzer
2023-10-09 19:32:45 +00:00
b5f9696d81 Fix typo under torch directory (#110824)
This PR fixes the typo `the the` in comments and exception messages in files under the `torch` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110824
Approved by: https://github.com/H-Huang
2023-10-09 19:16:43 +00:00
d1c157c598 Revert "[reland] Update custom Function preserve torch function when inputs r… (#110679)"
This reverts commit 563728f61c39379070661af3a431aa49eaf5c8ac.

Reverted https://github.com/pytorch/pytorch/pull/110679 on behalf of https://github.com/kit1980 due to The diff has Meta-internal changes, please land from Phabricator ([comment](https://github.com/pytorch/pytorch/pull/110679#issuecomment-1753523182))
2023-10-09 19:09:01 +00:00
8ae623db9d Don't pass tuple to with statement (#110864)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110864
Approved by: https://github.com/Skylion007, https://github.com/awgu
2023-10-09 19:00:34 +00:00
4b881b0da3 [MPS] add support for sgn to MPS backend (#110829)
Fixes #86805

Adds support for sgn to MPS backend.

Notes:

1. @malfet self-assigned this when he was working on implementing polar, but from what I can tell, he didn't end up needing to implement it.

2. @Berzeg implemented this last year, before view_as_complex was supported. Because of @malfet's recent contributions, however, @Berzeg's implementation now works. I've removed the part of his implementation that dealt with non-complex dtypes (since these can just be passed to at::sign), matched the more recent pattern we've been using in UnaryOps.mm, and thrown in a simple implementation of _efficientzerotensor for MPS so that the backward function works.
3. @Berzeg deserves a good bit of credit for this, so let me know if there's a way to assign him some without jamming up the pr (he seems to be AWOL since last working on this)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110829
Approved by: https://github.com/malfet
2023-10-09 16:53:25 +00:00
144cda7f06 [BE]: Enable ruff's flake8-PYI rules (#110830)
Enable Flake8-PYI rules codebase wide. Most of the rules already match our codebase style, the remaining ones that were not autofixed I have added to the pyproject.toml to be enabled in a later PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110830
Approved by: https://github.com/albanD
2023-10-09 16:37:26 +00:00
306b2284f2 Add meta kernel for ctc_loss.intList (#107949)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107949
Approved by: https://github.com/zou3519
2023-10-09 16:35:14 +00:00
bbdc8c7b05 Revert "deprecating nvfuser c++ API (#110318)"
This reverts commit bf0866fc164b1eab10a5174a57e21eb3321bef89.

Reverted https://github.com/pytorch/pytorch/pull/110318 on behalf of https://github.com/davidberard98 due to too many warnings being thrown in torchvision https://github.com/pytorch/pytorch/issues/110857 ([comment](https://github.com/pytorch/pytorch/pull/110318#issuecomment-1753245449))
2023-10-09 15:41:50 +00:00
2e57b1e847 [BE]: Update NCCL submodule to v2.19.3 (#110827)
Updates NCCL submodule to v2.19.3 Mostly contains some more performance fixes for H100s as well as a couple new performance features and some new plugin support.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110827
Approved by: https://github.com/malfet
2023-10-09 13:37:26 +00:00
a18b98f8a2 [xla hash update] update the pinned xla hash (#110852)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned xla hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110852
Approved by: https://github.com/pytorchbot
2023-10-09 12:00:17 +00:00
cyy
3a70a02a81 Enable Wrange-loop-analysis (#110837)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110837
Approved by: https://github.com/Skylion007
2023-10-09 11:19:03 +00:00
d2a2a67fa4 Added new test sample to interpolate op in OpInfo (#104181)
Description:
- Added new test sample to interpolate op in OpInfo
- Fixed silent issue with zero tensor test sample for uint8 dtype

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104181
Approved by: https://github.com/pmeier, https://github.com/lezcano
2023-10-09 10:55:56 +00:00
ddb0c26511 [inductor] Re-enable more fixed tests (#110798)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110798
Approved by: https://github.com/Skylion007
2023-10-09 04:36:51 +00:00
92fea5ae3f [GHF] Re-enable test_internal_changes (#110834)
As Jon fixed the internal change status reporting after the issue is closed
Fixes https://github.com/pytorch/pytorch/issues/110218

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110834
Approved by: https://github.com/janeyx99
2023-10-09 03:23:07 +00:00
cyy
3ec33957eb [1/N] Enable Wunused-result and Wunused-variable in torch targets (#110722)
They are useful for checking results of function calls.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110722
Approved by: https://github.com/Skylion007
2023-10-08 23:43:45 +00:00
e1f0f9c64e [dynamo][easy] Move code from GetAttrVariable to a suitable place (#110535)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110535
Approved by: https://github.com/jansel
2023-10-08 22:37:34 +00:00
ad24965f6c typo: add space after cudnn error messages (#110806)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110806
Approved by: https://github.com/Skylion007
2023-10-08 20:58:40 +00:00
a603dcc307 Fix typo under test directory (#110826)
This PR fixes the typo `the the` in comments in files under the `test` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110826
Approved by: https://github.com/Skylion007
2023-10-08 20:52:38 +00:00
afed0314a8 Fix typo under aten directory (#110822)
This PR fixes the typo `the the` in comments in files under the `aten` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110822
Approved by: https://github.com/Skylion007
2023-10-08 20:52:22 +00:00
105f3b5f91 Fix typo under caffe2 directory (#110825)
This PR fixes the typo `the the` in comments in files under the `caffe2` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110825
Approved by: https://github.com/Skylion007
2023-10-08 20:48:12 +00:00
fde28fdc8c Fix typo under torch/_decomp directory (#110821)
This PR fixes typos in comments in files under the `torch/_decomp` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110821
Approved by: https://github.com/Skylion007
2023-10-08 20:33:49 +00:00
8a8668e1ae [inductor] Implement Fx graph caching to improve warm compilation time. (#103453)
Summary: Implement an on-disk cache to save and reuse compiled FX Graphs. This implementation does not handle tensors with symbolic shapes. This needs to be done in a follow-up PR.
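For reference, opting into the cache looks roughly like this (the flag and environment-variable names are assumptions based on how the feature later shipped):

```python
import torch._inductor.config as inductor_config

# enable the on-disk FX graph cache; equivalently, set the
# TORCHINDUCTOR_FX_GRAPH_CACHE=1 environment variable
inductor_config.fx_graph_cache = True
```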

Test Plan:
* New unit tests exercising saving and load from the cache.
* New unit tests to exercise the cache key calculations.
* Ran several benchmarks to see cache hit and resulting compilation times.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103453
Approved by: https://github.com/eellison
2023-10-08 20:32:15 +00:00
5ef490f736 Update AOTInductor compile logic for CPU backend for Meta internal env (#110729)
Reviewed By: muchulee8, chenyang78

Differential Revision: D49944410

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110729
Approved by: https://github.com/chenyang78
2023-10-08 19:48:12 +00:00
36e6b0cfa2 Fix cpuinfo related crash on ppc64 (#110708)
The "import  torch" crashes with following cpuinfo error on powerpc64.
==============================================================
>>> import torch
Error in cpuinfo: processor architecture is not supported in cpuinfo
Fatal error in cpuinfo: cpuinfo_get_processors_count called before cpuinfo is initialized
Aborted (core dumped)
==================================================================
The patch fixes this by excluding powerpc from using cpuinfo as it is not supported for ppc64.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110708
Approved by: https://github.com/ezyang
2023-10-08 13:31:54 +00:00
bff28ec568 Fix typo under torch/_export directory (#110808)
This PR fixes typos in comments and messages in files under the `torch/_export` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110808
Approved by: https://github.com/gmagogsfm
2023-10-08 11:47:51 +00:00
844ea6408b feat(dynamo): handle accumulate kwargs ("func", "initial") (#110686)
Follow up to: https://github.com/pytorch/pytorch/pull/110683
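A sketch of the now-supported pattern (illustrative; not the test code):

```python
import operator
from itertools import accumulate

import torch

@torch.compile
def running_prod(x):
    # both keyword arguments are now handled by dynamo instead of graph-breaking
    parts = accumulate(x.unbind(), func=operator.mul, initial=torch.ones_like(x[0]))
    return torch.stack(list(parts))

print(running_prod(torch.arange(1.0, 5.0)))  # tensor([ 1.,  1.,  2.,  6., 24.])
```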

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110686
Approved by: https://github.com/ezyang
2023-10-08 07:06:52 +00:00
fa8e4ea212 Add support for hasattr on ListVariable (#110438)
Fixes #109502
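A sketch of the newly supported pattern (illustrative only):

```python
import torch

@torch.compile(fullgraph=True)
def f(x):
    xs = [x, x + 1]
    # hasattr on a traced list no longer forces a graph break
    return sum(xs) if hasattr(xs, "append") else x

print(f(torch.ones(2)))  # tensor([3., 3.])
```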

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110438
Approved by: https://github.com/jansel
2023-10-08 05:34:00 +00:00
58637c4b43 [dynamo] Remove SuperSource (#110475)
The motivation for removing this is already present in the pre-PR comments. Copying it

~~~
# NB - SuperSource is a weird one.
# it is our only source with 2 bases, so we use the object
# as the base, rather than the type, since an invocation
# like super(Foo, foo) is represented here, the source object base is more spiritually
# aligned with the instance, rather than the type.
# This whole construction is questionable tho, and we should probably find a way to
# avoid this exception to our otherwise nice source parentage invariant.
~~~

Instead of using super(a, b), we can use `type(b).__mro__[index]`.
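A quick illustration of the equivalence (plain Python, nothing dynamo-specific):

```python
class A:
    def f(self):
        return "A"

class B(A):
    def f(self):
        return "B"

b = B()
# two equivalent ways to reach A.f from an instance of B
assert super(B, b).f() == "A"
assert type(b).__mro__[1].f(b) == "A"  # __mro__ is (B, A, object)
```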

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110475
Approved by: https://github.com/jansel
2023-10-08 04:45:06 +00:00
6b4c686b9a [aotindutor] Forward fix a performance regression (#110800)
Summary: Forward fix a performance regression caused by https://github.com/pytorch/pytorch/pull/110510. When a model is run once, all those kernel pointers are initialized, and removing the if-nullptr check will cause those loadKernel calls to be unnecessarily executed again when we rerun the forward function. Another way to do this is to codegen loadKernel in the initializer, which I may do in a later PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110800
Approved by: https://github.com/jansel
2023-10-08 04:06:44 +00:00
1824ea3c0f Add a test to make sure all modules in the codebase are importable (#110598)
As per title, running import on any of these files led to a crash.
I'm very curious how the code in them is used!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110598
Approved by: https://github.com/janeyx99, https://github.com/malfet
2023-10-08 03:52:30 +00:00
cyy
230a124a7a [5/N] Move remaining c10::variant calls to std::variant (#110423)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110423
Approved by: https://github.com/colesbury
2023-10-08 03:52:02 +00:00
459cef8649 switch dtensor and functional collective to use optree (#110670)
optree recently landed and provides quite good perf, so we conditionally import optree if it is installed.

Some numbers testing mlp layer with TP + func collective:
before this PR: 10.390ms
after this PR: 9.189ms

so roughly a 10% end-to-end CPU overhead reduction

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110670
Approved by: https://github.com/fegin
2023-10-08 03:05:39 +00:00
defa0d3a2d Add a side table for triton kernels to avoid using itertools.partial (#110633)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110633
Approved by: https://github.com/jansel
2023-10-08 02:01:59 +00:00
57cc886639 Fix public binding check to check all submodules (#110601)
Fix https://github.com/pytorch/pytorch/issues/86619

The test to make sure modules are importable is being added at https://github.com/pytorch/pytorch/pull/110598
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110601
Approved by: https://github.com/zou3519
2023-10-08 00:36:31 +00:00
8edb561631 Fix use after free in tensor creation (#106707)
Fix https://github.com/pytorch/pytorch/issues/106534
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106707
Approved by: https://github.com/Skylion007, https://github.com/ezyang
2023-10-07 22:41:21 +00:00
0a5f0b5db3 Support tracing HuggingFace models (#110748)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110748
Approved by: https://github.com/avikchaudhuri
2023-10-07 22:37:28 +00:00
d84bcb9c8c [HigherOrderOp] expose torch.cond (#110293)
This PR exposes torch._higher_order_ops.cond as torch.cond.

1. We need to add #noqa: F811 to the _check calls in torch/__init__.py to address a confusing linter error ("Redefinition of unused 'cond'"): only one cond is imported, and the lines that trigger this error don't define cond but just use it as an argument.
2. We also add cond to the list of functions allowed to be traced through, so that dynamo triggers the CondHigherOrder logic instead of creating a TorchVariable.
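A minimal usage sketch of the newly exposed API:

```python
import torch

def true_fn(x):
    return x.sin()

def false_fn(x):
    return x.cos()

@torch.compile
def f(x):
    # previously only reachable as torch._higher_order_ops.cond
    return torch.cond(x.sum() > 0, true_fn, false_fn, (x,))
```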

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110293
Approved by: https://github.com/zou3519
2023-10-07 20:39:52 +00:00
0a5bb1c2eb Feature/stft no window warn (#110695)
Fixes #88919

@mruberry @peterbell10

This PR adds a warning to the .cpp STFT and ISTFT functions if a window is not provided.
It also describes the warning in the documentation on `functional.py`.
Finally, it adds unit tests to check if the warning is being produced.
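A sketch of what the warning steers users toward:

```python
import torch

x = torch.randn(4096)
# without an explicit window, stft now warns and uses a rectangular
# window, which often leaks spectral energy
spec = torch.stft(x, n_fft=512, return_complex=True)
# preferred: pass a window explicitly
spec = torch.stft(x, n_fft=512, window=torch.hann_window(512), return_complex=True)
```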

I have audited for internal calls of `stft` and `istft` on Pytorch and haven't found any.

Thank you for the opportunity to contribute!

Eric
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110695
Approved by: https://github.com/ezyang
2023-10-07 20:24:36 +00:00
cyy
c3e4e4f6d2 [4/N] Add -Wdeprecated and related fixes (#110204)
This PR enables Wdeprecated on torch_cpu

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110204
Approved by: https://github.com/ezyang
2023-10-07 19:46:08 +00:00
096b14eae8 Fix numel test to be > 2 (#110731)
This makes it consistent with the comment.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110731
Approved by: https://github.com/angelayi
2023-10-07 19:18:59 +00:00
2dc5e166a5 [TP][Inference] Enable DTensor TP inference (#110751)
In https://github.com/pytorch/pytorch/pull/109977, we observed that in inference mode, aten.linear does not get decomposed. So instead of enabling sharding propagation for the linear op, we use func.decompose so that it gets decomposed into matmul and mm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110751
Approved by: https://github.com/bdhirsh, https://github.com/wanchaol
2023-10-07 18:57:27 +00:00
19ce68a45c Fix typo under torch/_numpy directory (#110782)
This PR fixes typos in comments in files under the torch/_numpy directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110782
Approved by: https://github.com/Skylion007
2023-10-07 17:42:35 +00:00
a119efe9c7 [AOTInductor][ez] Fix FallbackKernel.codegen() (#110777)
Summary: ProxyExecutor should only be used in fbcode for cpp codegen.

Test Plan: Existing CIs

Differential Revision: D50048488

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110777
Approved by: https://github.com/chenyang78
2023-10-07 15:29:09 +00:00
cyy
12f97bb2e9 [Reland][3/N] Add -Wdeprecated and related fixes (#110518)
Fixes the string_view errors and reland the work. The previous changes in torch/csrc/utils/invalid_arguments.cpp were too aggressive and not tested thoroughly. They are discarded.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110518
Approved by: https://github.com/ezyang
2023-10-07 08:38:40 +00:00
98b79e9488 [inductor] Add AOTI ABI shim function for torch.nonzero (#110766)
Summary: `torch.nonzero` doesn't have inductor lowering (yet). To invoke the operator in AOT Inductor's ABI compatibility mode we need a dedicated shim function.

Test Plan:

```
$ python test/inductor/test_aot_inductor.py -k test_zero_grid_with_unbacked_symbols
...
----------------------------------------------------------------------
Ran 4 tests in 78.650s

OK
```


Pull Request resolved: https://github.com/pytorch/pytorch/pull/110766
Approved by: https://github.com/chenyang78
ghstack dependencies: #110713, #110745, #110764
2023-10-07 08:32:27 +00:00
13a2f42635 [inductor] Add size, stride, storage_offset to RAIIAtenTensorHandle (#110764)
Summary: For unbacked SymInts, the C++ wrapper codegen can generate expressions like `buf123.size()` or `.stride()` or `.storage_offset()`:

7cc0020a80/torch/_inductor/ir.py (L2504-L2520)

Here we add corresponding methods to the `RAIIAtenTensorHandle` class so that the above codegen works in the ABI compatibility mode.

Test Plan: CI + the following PR


Pull Request resolved: https://github.com/pytorch/pytorch/pull/110764
Approved by: https://github.com/chenyang78
ghstack dependencies: #110713, #110745
2023-10-07 08:26:42 +00:00
abb00f66d8 [inductor] Add AOTI ABI shim function for repeat_interleave.Tensor (#110745)
Summary: `repeat_interleave.Tensor` doesn't have inductor lowering. To invoke the operator in AOT Inductor's ABI compatibility mode we need a dedicated shim function.

Test Plan:

```
$ python test/inductor/test_aot_inductor.py -k test_repeat_interleave
...
----------------------------------------------------------------------
Ran 4 tests in 70.526s

OK
```


Pull Request resolved: https://github.com/pytorch/pytorch/pull/110745
Approved by: https://github.com/chenyang78
ghstack dependencies: #110713
2023-10-07 08:18:01 +00:00
432df71820 [inductor] added a config to always add tensor constants (#110491)
Summary:
In some scenarios, we want to update constants at runtime.
In such cases, we have to keep the original constants in
the generated code without applying any constant-inlining
optimizations.

This PR adds a config to force us to add tensor constants.

Differential Revision: D49895154

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110491
Approved by: https://github.com/mikekgfb
2023-10-07 07:51:54 +00:00
840e68301c [AOTInductor] Change UpdateConstants to UpdateConstantsMap (#110576)
Summary: Change name of UpdateConstants to UpdateConstantsMap


Differential Revision: D49937744

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110576
Approved by: https://github.com/chenyang78, https://github.com/khabinov
2023-10-07 07:36:57 +00:00
18f0d3af72 Revert "[user errors] compulsory case names, allow multiple (#110733)" (#110783)
This reverts commit 983f6f36dbaf0210360926547b05deb1e4f798a4.  I have no idea how to revert https://github.com/pytorch/pytorch/pull/110733 with the bot.  So reverting it manually for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110783
Approved by: https://github.com/ZainRizvi, https://github.com/kit1980
2023-10-07 07:32:39 +00:00
d54e20f457 [FSDP][state_dict] Add a unittest for local_state_dict resharding (#110625)
This PR adds a unittest to demonstrate the ability for LOCAL_STATE_DICT to do resharding.

Differential Revision: [D44260141](https://our.internmc.facebook.com/intern/diff/D44260141/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110625
Approved by: https://github.com/wz337
2023-10-07 07:22:41 +00:00
1b34238d67 fix get device index if has _utils._get_device_index in privateuse1 (#108123)
**Get the device index via torch.privateuse1._utils._get_device_index, if that method exists.**

Reason:
Before this change, only device index 0 could be obtained when the ```location``` was a bare backend string such as 'privateuse1'.
With _get_device_index, the accurate device index can be obtained in this scenario.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108123
Approved by: https://github.com/albanD
2023-10-07 06:18:59 +00:00
c2e7a0d689 [core IR] Add decomps for aten.sum and aten.squeeze variants (#110645)
Summary:
## Context

Both `aten.sum` and `aten.squeeze` have a "most generic" variant in the form of `aten.sum.dim_IntList` and `aten.squeeze.dims` respectively. Add decompositions for other non generic variants of these operators to express them using the most generic variant.
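Illustrative sketches (not the exact registered decomps) of expressing the non-generic variants through the generic overloads:

```python
import torch

def sum_default_decomp(x):
    # aten.sum.default == reduce over every dimension
    return torch.ops.aten.sum.dim_IntList(x, list(range(x.dim())))

def squeeze_default_decomp(x):
    # aten.squeeze.default == squeeze every size-1 dimension
    dims = [d for d in range(x.dim()) if x.size(d) == 1]
    return torch.ops.aten.squeeze.dims(x, dims)

x = torch.randn(2, 1, 3)
assert torch.allclose(sum_default_decomp(x), x.sum())
assert squeeze_default_decomp(x).shape == (2, 3)
```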

Note that to register these decomps, the reference implementation under `_refs` had to be removed as registered decompositions. cc: @lezcano @peterbell10

Test Plan: Github CI + Meta Internal CI

Differential Revision: D49965952

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110645
Approved by: https://github.com/peterbell10, https://github.com/digantdesai, https://github.com/manuelcandales
2023-10-07 04:21:51 +00:00
c77dd684c9 Enable typechecking in _inductor/ir.py (#110112)
I used a bunch of ignore-type comments, mostly due to
https://github.com/pytorch/pytorch/issues/109963.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110112
Approved by: https://github.com/peterbell10
2023-10-07 04:19:38 +00:00
e8ef8bfdce [Inductor] Allow matmul to have flexible layout when we are not autotuning (#110726)
Fixes #102804

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110726
Approved by: https://github.com/Chillee
2023-10-07 04:08:37 +00:00
5cc1a38370 [release_notes] Some updates after 2.1 release (#110771)
Summary:
1. aligned topic with labels
2. added some more descriptions in release note worksheet template

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110771
Approved by: https://github.com/drisspg
2023-10-07 03:10:46 +00:00
bf0866fc16 deprecating nvfuser c++ API (#110318)
deprecating nvfuser c++ API

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110318
Approved by: https://github.com/davidberard98
2023-10-07 02:25:21 +00:00
983f6f36db [user errors] compulsory case names, allow multiple (#110733)
We want to get to a point where most `UserError`s link to `exportdb` examples. This PR makes passing case names non-optional to make this intent clearer and encourage developers who raise `UserError`s to make or point to examples that make fixing such errors more obvious for users.

In addition, sometimes there are multiple examples that are relevant to an error. Thus this PR also enables passing multiple case names.

Differential Revision: [D50020465](https://our.internmc.facebook.com/intern/diff/D50020465/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110733
Approved by: https://github.com/zhxchen17
2023-10-07 01:25:12 +00:00
90bf6e3938 [FSDP][optim_state_dict] Enable cpu_offload config for optimzer state_dict (#108434)
We had the cpu_offload option but never used it, as optimizer state_dict offloads the tensors to CPU by default. This is usually what most users want, since the tensors have to be moved to CPU eventually. However, we may want to disable offloading to CPU in some cases, especially for debugging purposes. This PR lets optimizer state_dict read the flag.

Differential Revision: [D48913340](https://our.internmc.facebook.com/intern/diff/D48913340/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108434
Approved by: https://github.com/wz337
2023-10-07 01:14:49 +00:00
563728f61c [reland] Update custom Function preserve torch function when inputs returned as-is (#110679)

reland of https://github.com/pytorch/pytorch/pull/109825#issuecomment-1749803837

Opening this without ghstack to do codev. In our PR, we changed the signature of `_wrap_outputs`. There is some internal code that calls `_wrap_outputs` directly, so we also need to update that callsite.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110679
Approved by: https://github.com/albanD
2023-10-07 00:27:45 +00:00
1c97808f81 [dtensor] support lt/gt op (#110585)
This PR enables lt/gt aten op
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110585
Approved by: https://github.com/fduwjj
ghstack dependencies: #110584
2023-10-07 00:06:36 +00:00
9378a2ceda [dtensor] support aten.where and enable implicit scalar promotion (#110584)
This PR adds support for aten.where and for implicit scalar promotion: when we encounter scalar tensors in the dispatching logic, we implicitly convert them to replicated DTensors.

The latter also enables a bunch of ops in the op db to pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110584
Approved by: https://github.com/fduwjj
2023-10-07 00:06:36 +00:00
e3bf5000a7 Hide the contiguous requirement for user input mesh when initializing DeviceMesh (#110628)
Summary:
As title, this diff hides the contiguous requirement for user input mesh when initializing DeviceMesh.

In the current implementation, when testing with inter-node model parallelism, an exception is thrown during mesh validation when the following input is provided:
```
mesh = torch.arange(0, world_size).view(mp_size, dp_size).transpose(0, 1)
device_mesh = DeviceMesh(
    "cuda",
    mesh.contiguous(),
    mesh_dim_names=("dp", "mp"),
)
```

Test Plan:
**Unit Test**:
```
buck2 test mode/dev-nosan //caffe2/test/distributed/_tensor:device_mesh -- test_validate_device_mesh

Test UI: https://www.internalfb.com/intern/testinfra/testrun/3940649876878399
Network: Up: 0B  Down: 0B
Jobs completed: 6. Time elapsed: 1:58.7s.
Tests finished: Pass 1. Fail 0. Fatal 0. Skip 0. Build failure 0
```

**Test with MP**
```
mesh = torch.arange(0, world_size).view(mp_size, dp_size).transpose(0, 1)
device_mesh = DeviceMesh(
    "cuda",
    mesh.contiguous(),
    mesh_dim_names=("dp", "mp"),
)
```
Without the change: exception.
After this change: initialized successfully.

Differential Revision: D49942839

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110628
Approved by: https://github.com/wanchaol, https://github.com/xw285cornell, https://github.com/fduwjj
2023-10-06 23:54:13 +00:00
a0bbd075b2 Add the Mode section in the extending doc (#110073)
Covers the basic principles of Modes, with an example of how to use them and an explanation of their behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110073
Approved by: https://github.com/janeyx99
2023-10-06 23:50:55 +00:00
6b1007b2a7 Fix error in div lowering with integers (#102809)
Fixes https://github.com/pytorch/pytorch/issues/101016
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102809
Approved by: https://github.com/ngimel
ghstack dependencies: #110501, #110504, #110591, #110668, #110687
2023-10-06 23:21:40 +00:00
d35e3dbd06 Fix concurrency limits for Create Release (#110759)
Also, don't run it on tags, but run on release branch and on `release` event.
Tweak linter to accept different concurrency limits for `create_release.yml`

Fixes https://github.com/pytorch/pytorch/issues/110569 as all the invocations of workflow in the past were cancelled by concurrently limit due to the tag push and release happening at roughly the same time, see https://github.com/pytorch/pytorch/actions/workflows/create_release.yml?query=event%3Arelease

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110759
Approved by: https://github.com/atalman
2023-10-06 23:14:12 +00:00
9b55194f81 fix(dynamo): Incorrect accumulate implementation, bad tests (#110683)
Root cause of: https://github.com/pytorch/pytorch/issues/110287

Fixed many tests that didn't actually test due to unreliability of `CompileCounter.frame_count` in detecting graph breaks: https://github.com/pytorch/pytorch/issues/110730

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110683
Approved by: https://github.com/voznesenskym
2023-10-06 23:07:56 +00:00
4342b0849f [vision hash update] update the pinned vision hash (#110667)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110667
Approved by: https://github.com/pytorchbot
2023-10-06 23:01:11 +00:00
f952551963 Handle invalid cancellation signals in trymerge (#110690)
This change is needed after https://github.com/pytorch/test-infra/pull/4579 and https://github.com/pytorch/test-infra/pull/4610.  All invalid cancelled signals have been removed from Dr.CI and HUD.  So trymerge should ignore them accordingly for a consistent experience.

### Testing

https://github.com/pytorch/pytorch/pull/110367#issuecomment-1750099960 is the PR where a bunch of invalid cancelled signals showed up and blocked merges

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110690
Approved by: https://github.com/clee2000, https://github.com/ZainRizvi
2023-10-06 22:43:33 +00:00
2aa3064364 [inductor] Add aoti_torch_dtype_bool to AOTI ABI shim (#110713)
Summary: ATT

Test Plan: CI


Pull Request resolved: https://github.com/pytorch/pytorch/pull/110713
Approved by: https://github.com/chenyang78, https://github.com/desertfire
2023-10-06 22:16:39 +00:00
65d40a72c4 Delete rogue print from test_quantize_pt2e.py (#110732)
Introduced by https://github.com/pytorch/pytorch/pull/110308

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110732
Approved by: https://github.com/clee2000, https://github.com/ZainRizvi, https://github.com/jerryzh168
2023-10-06 22:16:10 +00:00
59592ce9f2 [CUDA Host Allocator][ROCm] fixes (#110715)
Follow up to #110123, removing the CUDA_VERSION check for ROCm because HIP already has hipMallocAsync() and doesn't need the version check there.

Follow up to #108488, fixing the failing unit tests by accepting either a "cuda" or "hip" attribute for the caching allocator options.  This is aligned to the masquerading strategy for ROCm/HIP.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110715
Approved by: https://github.com/ezyang
2023-10-06 21:42:24 +00:00
3d87c52cef Remove stuff for Python before 3.8 from install_conda.sh (#110671)
As we only support Python 3.8+ now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110671
Approved by: https://github.com/seemethere, https://github.com/huydhn, https://github.com/atalman, https://github.com/malfet, https://github.com/ZainRizvi
2023-10-06 21:40:28 +00:00
f4796df914 Add support for generators on the IPU device (#110704)
This change adds hooks similar to those used on other device types, to allow Torch to create and use generators provided by the IPU backend.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110704
Approved by: https://github.com/ezyang
2023-10-06 21:36:14 +00:00
44d34fe65c different bounds for same Dim name (#110638)
Previously, `Dim` definitions that shared the same name but had different ranges were allowed to appear in the `dynamic_shapes` argument of an `export` call. They would correspond to the *same* dynamic dimension (identified by the shared name), whose effective range would be the *intersection* of the different ranges.

However, this behavior can be confusing, because having different definitions with the same name is more likely than not unintentional. Therefore, this PR makes it a user error.

We still allow different definitions with the same name to exist at the same time (no global uniqueness) as long as they are not confused in the same `export` call. Redefinitions with the same bounds are also allowed, in case they are accidentally created by executing the same code multiple times.
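A sketch of what now errors versus what remains allowed (names are illustrative):

```python
from torch.export import Dim

batch_a = Dim("batch", min=2, max=64)
batch_b = Dim("batch", min=4, max=32)  # same name, different bounds
# using batch_a and batch_b together in a single export(...) call is now a
# UserError instead of a silent intersection of the two ranges;
# re-executing Dim("batch", min=2, max=64) elsewhere remains fine
```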

Differential Revision: [D49965944](https://our.internmc.facebook.com/intern/diff/D49965944/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110638
Approved by: https://github.com/zhxchen17
2023-10-06 21:22:52 +00:00
0d4a360fa2 remove replaced symbols from range_constraints (#110644)
While the `range_constraints` that is initially derived by processing of constraints only contains symbols that appear in the graph module, eventually the `range_constraints` that are in the exported program seem to contain more symbols than those that appear in the graph module. Clearly this is a regression, because the example of "Expressing Dynamism" in our public docs (https://pytorch.org/docs/stable/export.html#expressing-dynamism) does not show the extra symbols in `range_constraints`, but running the example does.

The problem seems to arise when we are running `_transform` passes, where we regenerate the `range_constraints` from the `shape_env`. However, as a rule, symbols that have `replacements` are actually replaced (by other expressions, including constants or other symbols), so they should never appear in the graph module. Thus we can filter such symbols out from `range_constraints` as well.

Differential Revision: [D49969620](https://our.internmc.facebook.com/intern/diff/D49969620/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110644
Approved by: https://github.com/zhxchen17
2023-10-06 21:13:55 +00:00
f74937741e Remove runtime assertions between export and AOT compilation (#110710)
Summary: The runtime assertions inserted in the `torch._export.export` by the `_AddRuntimeAssertionsForInlineConstraintsPass` lead to errors in AOT Inductor like #109884. In `torch._export.aot_compile` export and AOT compilation are run consecutively which would lead to the above issue if any assertions are inserted.

In this PR, we're adding a new parameter / flag to `torch._export.aot_compile`, `remove_runtime_assertions`, to remove the assertions inserted during export before AOT compilation. The flag is set to `False` for BC.

Additionally, we remove the flag `add_runtime_assertions_for_inline_constraints` recently added to `torch._dynamo.config`, as it can lead to undesirable `torch._export` behavior and is no longer required for AOT Inductor testing purposes.

Test Plan: CI


Pull Request resolved: https://github.com/pytorch/pytorch/pull/110710
Approved by: https://github.com/zhxchen17, https://github.com/chenyang78
2023-10-06 21:09:35 +00:00
7cc0020a80 [decomp] Fix different return type in threshold_backward vs. eager (#110689)
due to type promotion with floating point scalar in decompositions.py

Fixes part of #100838

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110689
Approved by: https://github.com/ezyang
2023-10-06 20:59:58 +00:00
756b4e9e08 [export] Add codeowners. (#110718)
Summary: So that we can catch all changes under export/

Test Plan: CI

Differential Revision: D50017157

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110718
Approved by: https://github.com/tugsbayasgalan
2023-10-06 20:57:51 +00:00
b8a3998c23 add batch rule for missing inplace ops (#110692)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110692
Approved by: https://github.com/ezyang
2023-10-06 20:53:28 +00:00
1b1bc08557 [Dynamo] SizeVariable can be indexed by symint (#110349)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110349
Approved by: https://github.com/williamwen42
2023-10-06 20:48:07 +00:00
ff0358b038 Revert "[C10] PG observability hooks. (#108815)"
This reverts commit 0c7a877745f98b8fce8868291408945c0dd817d6.

Reverted https://github.com/pytorch/pytorch/pull/108815 on behalf of https://github.com/albanD due to Add a new torch.distributed.hooks namespace but does not document it, test was added this morning ([comment](https://github.com/pytorch/pytorch/pull/108815#issuecomment-1751327751))
2023-10-06 19:49:49 +00:00
37a0265992 [Inductor] Disallow OpOverloadPacket in ir.FallbackKernel (#110567)
In ABI compatible mode, We always need op_overload.schema for FallbackKernel.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110567
Approved by: https://github.com/jansel
2023-10-06 19:20:50 +00:00
0c7a877745 [C10] PG observability hooks. (#108815)
Expose a set of observability hooks into C10D such that our users can
detect collectives failure both faster and more easily.

The design is similar to NCCL desync debug in that it minimizes the
overhead by doing most of the work off the main thread.

This PR introduces a new module torch.distributed.hooks that exposes the following set of methods:

    register_collective_start_hook
    register_collective_end_hook
    register_process_group_hook

The process group hook exposes PG creation on the member ranks and is called inline from the
PG creation code. This is fine since it happens during initialization and a limited number of times.

The collective start/end hooks are fired from a single background thread. It reads
events from a C++ queue and dispatches them.

Queue notification is oddly done using a pipe; this is needed so Python can abort the thread on shutdown
and keep it as a background thread. This is not possible with more reasonable choices like a condvar.
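A sketch of the registration API described above (callback payloads are assumed; note this PR was reverted shortly after, so treat the names as historical):

```python
import torch.distributed.hooks as dist_hooks

def on_start(info):
    print("collective started:", info)

def on_end(info):
    print("collective finished:", info)

dist_hooks.register_collective_start_hook(on_start)
dist_hooks.register_collective_end_hook(on_end)
dist_hooks.register_process_group_hook(lambda pg: print("pg created:", pg))
```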
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108815
Approved by: https://github.com/wconstab, https://github.com/fduwjj
2023-10-06 18:52:46 +00:00
17348b0f51 Implement split_with_sizes backward for NT (#110647)
Needed internally. Note that `split_with_sizes()` for NT is currently supported only on `dim=-1`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110647
Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer
ghstack dependencies: #110646
2023-10-06 18:44:22 +00:00
48240ec62e Make unbind() overrideable for NT subclass (#110646)
Reland of #109122. Fixed the memory leak by not saving the outputs of `unbind()` for backward. Rather, the NT sizes are saved so undefined grads can be replaced with zeros of the correct size.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110646
Approved by: https://github.com/soulitzer, https://github.com/cpuhrsch
2023-10-06 18:44:22 +00:00
33da6c8951 [sparse] Add i8i8->i32 support for cuSPARSELt (#110499)
Summary:

With the release of cuSPARSELt v0.5.0, we now have support for
int8 int8 -> int32 matmul.

This PR adds support for this via out_dtype.

Test Plan:
```
python test/test_sparse_semi_structured.py -k int32
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110499
Approved by: https://github.com/cpuhrsch
2023-10-06 18:32:47 +00:00
f7ce19d40a Fix typo under torch/onnx directory (#110697)
This PR fixes typos in comments in files under the `torch/onnx` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110697
Approved by: https://github.com/ezyang
2023-10-06 18:21:00 +00:00
69ea214cc2 [reland] Update singleton int to error when inequality relation is undefined (#110672)
reland of https://github.com/pytorch/pytorch/pull/110044
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110672
Approved by: https://github.com/ezyang
2023-10-06 17:50:25 +00:00
576b80d23e Revert "[HigherOrderOp] expose torch.cond (#110293)"
This reverts commit 601f872831649bccf1069ac59b2ecfd0895a88e3.

Reverted https://github.com/pytorch/pytorch/pull/110293 on behalf of https://github.com/ydwu4 due to Sorry, didn't check the error carefully on the PR. A doc error is related to this pr ([comment](https://github.com/pytorch/pytorch/pull/110293#issuecomment-1751176719))
2023-10-06 17:44:17 +00:00
cyy
e75f2e2ea1 Fix clang-tidy warnings in CUDAPluggableAllocator (#110678)
This PR fixes clang-tidy warnings in CUDAPluggableAllocator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110678
Approved by: https://github.com/Skylion007
2023-10-06 17:33:08 +00:00
601f872831 [HigherOrderOp] expose torch.cond (#110293)
This PR exposes torch._higher_order_ops.cond as torch.cond.

1. Need to add #noqa: F811 to the _check calls in torch/__init__.py to address a confusing linter error ("Redefinition of unused 'cond'"): only one cond is imported, and the lines that trigger this error don't define cond but merely use it as an argument.
2. Also add cond to the list of functions allowed to be traced through, so that dynamo triggers the CondHigherOrder logic instead of creating a TorchVariable. A usage sketch follows the list.
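
A minimal usage sketch (not taken from the PR; the predicate and branch functions are illustrative):

```python
import torch

def true_fn(x):
    return x.sin()

def false_fn(x):
    return x.cos()

@torch.compile
def f(x):
    # branch on a data-dependent predicate without a graph break
    return torch.cond(x.sum() > 0, true_fn, false_fn, (x,))

print(f(torch.randn(4)))
```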

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110293
Approved by: https://github.com/zou3519
2023-10-06 17:04:31 +00:00
e8f1f4ed66 [quant][pt2][ROCm] follow-up PR 109908 for miopen_batch_norm (#110653)
Fixes recent broken unit tests caused by PR #109908 because cudnn and miopen have separate batch norm functions.

```
2023-10-05T09:35:01.6606614Z _______________ TestQuantizePT2EQAT.test_qat_conv_bn_fusion_cuda _______________
2023-10-05T09:35:01.6606948Z Traceback (most recent call last):
2023-10-05T09:35:01.6607362Z   File "/var/lib/jenkins/pytorch/test/quantization/pt2e/test_quantize_pt2e_qat.py", line 323, in test_qat_conv_bn_fusion_cuda
2023-10-05T09:35:01.6607767Z     self._verify_symmetric_xnnpack_qat_graph(
2023-10-05T09:35:01.6608217Z   File "/var/lib/jenkins/pytorch/test/quantization/pt2e/test_quantize_pt2e_qat.py", line 130, in _verify_symmetric_xnnpack_qat_graph
2023-10-05T09:35:01.6608658Z     self._verify_symmetric_xnnpack_qat_graph_helper(
2023-10-05T09:35:01.6609105Z   File "/var/lib/jenkins/pytorch/test/quantization/pt2e/test_quantize_pt2e_qat.py", line 173, in _verify_symmetric_xnnpack_qat_graph_helper
2023-10-05T09:35:01.6609623Z     m = prepare_qat_pt2e(m, quantizer)
2023-10-05T09:35:01.6610171Z   File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/ao/quantization/quantize_pt2e.py", line 178, in prepare_qat_pt2e
2023-10-05T09:35:01.6610561Z     _fuse_conv_bn_qat(model)
2023-10-05T09:35:01.6611072Z   File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/ao/quantization/pt2e/qat_utils.py", line 501, in _fuse_conv_bn_qat
2023-10-05T09:35:01.6611497Z     m = _fuse_conv_bn_qat_helper(m, is_cuda=True)
2023-10-05T09:35:01.6612065Z   File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/ao/quantization/pt2e/qat_utils.py", line 575, in _fuse_conv_bn_qat_helper
2023-10-05T09:35:01.6612492Z     _get_conv_bn_getitem_nodes(r.replacements)
2023-10-05T09:35:01.6613058Z   File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/ao/quantization/pt2e/qat_utils.py", line 383, in _get_conv_bn_getitem_nodes
2023-10-05T09:35:01.6613465Z     assert bn_node is not None
2023-10-05T09:35:01.6613716Z AssertionError
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110653
Approved by: https://github.com/jerryzh168, https://github.com/pruthvistony
2023-10-06 15:30:55 +00:00
c4db607607 Doc test non packages (#110568)
Add non-package python modules to the public API checks.
The original change is to remove the `ispkg` check in this line
https://github.com/pytorch/pytorch/blob/main/docs/source/conf.py#L518

Everything else is to add the appropriate modules to the rst files, make sure every module we provide can be imported (fixed by either making optional dependencies truly optional or deleting files that have been un-importable for 3 years), make APIs that are both modules and functions (like torch.autograd.gradcheck) render properly on the docs website without confusion, and add every non-documented API to the allow list (~3k of them).

Next steps will be to try and fix these missing docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110568
Approved by: https://github.com/zou3519
2023-10-06 14:16:01 +00:00
a3e5ec453a Move Docker official builds to Cuda 12.1.1 (#110703)
Since our PyPI release uses CUDA 12.1.1, move the Docker builds to 12.1.1. Related to: https://github.com/pytorch/pytorch/issues/110643
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110703
Approved by: https://github.com/DanilBaibak
2023-10-06 13:56:45 +00:00
261cae793a [cpu] remove vec code for ops that do not support complex no (#110280)
Removes dead code pertaining to ATen ops for which complex dtype is unsupported.

Reference: https://github.com/pytorch/pytorch/pull/110217#discussion_r1340599702

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110280
Approved by: https://github.com/vfdev-5
2023-10-06 12:10:18 +00:00
ceb773b68d Fix #110680 (requires_grad typo in decomp) (#110687)
Fixes https://github.com/pytorch/pytorch/issues/110680
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110687
Approved by: https://github.com/voznesenskym, https://github.com/lezcano
ghstack dependencies: #110501, #110504, #110591, #110668
2023-10-06 10:36:01 +00:00
d776dd04ac perf(optim/dynamo): shortcut is_sparse iteration in SGD multi_tensor (#110648)
Originated: https://github.com/pytorch/pytorch/pull/110353#discussion_r1347806922

Significantly speeds up the non-sparse path (the majority use case).

Benchmarks: https://github.com/pytorch/pytorch/issues/110506#issuecomment-1747732478

CC: @janeyx99
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110648
Approved by: https://github.com/janeyx99
2023-10-06 08:56:18 +00:00
96f616a054 Revert tl.int1 casting change for ROCm to avoid hangs (#110531)
Seeing hangs on ROCm seemingly after this PR https://github.com/pytorch/pytorch/pull/110388
https://ossci-raw-job-status.s3.amazonaws.com/log/17381916785
`inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp2_cuda_bool Command took >30min, returning 124`

Conditionalising out of this while we investigate.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110531
Approved by: https://github.com/peterbell10
2023-10-06 08:53:45 +00:00
6b92c367c5 Add test_jit_cuda_fuser to ROCM_BLOCKLIST (#110440)
Adds the nvfuser-related unit test suite to ROCM_BLOCKLIST, as it should not be run on ROCm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110440
Approved by: https://github.com/jeffdaily, https://github.com/pruthvistony, https://github.com/lezcano
2023-10-06 08:47:15 +00:00
65afa760a6 Add a script to run iOS test app on AWS Device Farm (#110202)
This adds a script to test PyTorch on actual iOS devices on AWS Device Farm. The test can take quite a long time waiting for devices to become available, so the steps are done manually and documented in `ios/TestApp/README.md`.

### Testing

1. TestApp itself runs fine on my local iPhone 13 and on [device farm](https://us-west-2.console.aws.amazon.com/devicefarm/home#/mobile/projects/b531574a-fb82-40ae-b687-8f0b81341ae0/runs/d2653ca8-8ee2-44dd-b15e-0402f9ab0aca).  I can see the benchmark results output in the console log.
```
BUILD_LITE_INTERPRETER=1 USE_PYTORCH_METAL=1 USE_COREML_DELEGATE=1 IOS_PLATFORM=OS IOS_ARCH=arm64 ./scripts/build_ios.sh

pushd ios/TestApp/benchmark
ruby setup.rb --lite 1 -t 9HKVT38N77 --benchmark
popd

ruby scripts/xcode_build.rb -i build_ios/install -x ios/TestApp/TestApp.xcodeproj -p "OS"
```

2. Trying to run TestAppTests https://github.com/pytorch/pytorch/blob/main/ios/TestApp/TestAppTests/TestLiteInterpreter.mm on my local iPhone ends up with this error `Logic Testing Unavailable. Logic Testing on iOS devices is not supported. You can run logic tests on the Simulator`.  I updated the Xcode project to reuse TestApp as the host application.
```
ruby setup.rb --lite 1 -t 9HKVT38N77
```

3. Trying [another round of testing on device farm](https://us-west-2.console.aws.amazon.com/devicefarm/home#/mobile/projects/b531574a-fb82-40ae-b687-8f0b81341ae0/runs/18dbd69d-8608-46d8-a868-bd05b69375db)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110202
Approved by: https://github.com/kit1980
2023-10-06 08:23:16 +00:00
7d98549ca9 retain_graph=True in compiled_autograd (#110367)
Adds support for retain_graph=True (known as keep_graph_ internally in the autograd engine).
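
A hypothetical sketch of what this enables (assumes the internal `torch._dynamo.compiled_autograd.enable` context manager, which takes a compiler callable; not a public API):

```python
import torch
import torch._dynamo.compiled_autograd

x = torch.randn(4, requires_grad=True)

with torch._dynamo.compiled_autograd.enable(torch.compile):
    loss = (x ** 2).sum()
    loss.backward(retain_graph=True)  # the graph is kept alive...
    loss.backward()                   # ...so a second backward now works
```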

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110367
Approved by: https://github.com/jansel
2023-10-06 08:22:10 +00:00
63fe5de89b feat(optim): add SGD sparse multitensor to testing path (#110562)
Follow up to: https://github.com/pytorch/pytorch/pull/110454, which defines the infra for sparse multi tensor optimizer testing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110562
Approved by: https://github.com/janeyx99
2023-10-06 07:48:25 +00:00
371d8ba599 vmap: decompose real and imag instead of registering batch rule (#110508)
Clean-up

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110508
Approved by: https://github.com/zou3519
2023-10-06 06:01:12 +00:00
e8605f6f22 Correct outdated Doxygen link (#110654)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110654
Approved by: https://github.com/huydhn
2023-10-06 05:23:27 +00:00
6d23193aab Added strict=True to zip in aot_autograd (#110668)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110668
Approved by: https://github.com/ezyang
ghstack dependencies: #110501, #110504, #110591
2023-10-06 05:12:05 +00:00
d279979102 perf(inductor): improve Adam compile times by shortcutting for loops (via has_complex) (#110607)
Adam part of: https://github.com/pytorch/pytorch/issues/110506

TODO:
- If this approach is validated as a good one, it can also be applied to all other optimizers which convert `complex` via list comprehensions

### Results:
`NUM_PARAMS=200, foreach=True`
- main: dynamo: 43s, inductor: 31s, total: 74s
- this PR: dynamo: 3.5s, inductor: 30s, total: 34s (dynamo speedup: 12.3x; overall: 74s → 34s, a 2.1x speedup)

`NUM_PARAMS=1000, foreach=True, has_complex shortcut`:

```
<class 'torch.optim.adam.Adam'> {'lr': 0.01, 'foreach': True} torch.float32 TorchDynamo compilation metrics:
Function                              Runtimes (s)
------------------------------------  -------------------------------
_compile.<locals>.compile_inner       0.0329, 50.0806, 0.0041
OutputGraph.call_user_compiler        44.9924
```

`NUM_PARAMS=1000, foreach=True`:
```
<class 'torch.optim.adam.Adam'> {'lr': 0.01, 'foreach': True} torch.float32 TorchDynamo compilation metrics:
Function                              Runtimes (s)
------------------------------------  -------------------------------
_compile.<locals>.compile_inner       0.0389, 58.6069, 0.0043
OutputGraph.call_user_compiler        44.1425
```

### Discussion
- The `has_complex` shortcut provides an additional 2x dynamo speedup, though it is not required to achieve a significant overall speedup.

CC: @janeyx99 @mlazos

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110607
Approved by: https://github.com/janeyx99, https://github.com/lezcano
2023-10-06 05:08:49 +00:00
26bfb0fc21 Check for both workflow and job names from Dr.CI (#110661)
In https://github.com/pytorch/pytorch/pull/110362, the failure was flaky but merge bot treated it as an actual failure. This is a regression after https://github.com/pytorch/test-infra/pull/4604 where the name returned by Dr.CI now includes workflow name.  For example, the name is `trunk / macos-12-py3-arm64 / test (default, 2, 3, macos-m1-12)` in the JSON response:

```
{"FAILED": [], "FLAKY": [{"workflowId": 6372581477, "id": 17297638807, "name": "trunk / macos-12-py3-arm64 / test (default, 2, 3, macos-m1-12)", "jobName": "macos-12-py3-arm64 / test (default, 2, 3, macos-m1-12)", "conclusion": "failure", "completed_at": "2023-10-01T22:18:28Z", "html_url": "https://github.com/pytorch/pytorch/actions/runs/6372581477/job/17297638807", "head_branch": "ciflow/trunk/110362", "pr_number": 110362, "head_sha": "03f51e36dedf234931006d1db61677b229c9a119", "failure_captures": ["Failure: There is only 4671284KB free space left in /, which is less than the minimum requirement of"], "failure_line": "Failure: There is only 4671284KB free space left in /, which is less than the minimum requirement of 6291456KB for macOS", "time": "2023-10-01T22:17:53.847751Z"}], "BROKEN_TRUNK": [], "UNSTABLE": []}
```

I updated the merge bot to handle this better by considering the workflow name, the job name, and the combined full name.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110661
Approved by: https://github.com/clee2000
2023-10-06 04:36:52 +00:00
64583c4d04 [CUDA Host Allocator] Add support of CudaHostRegister (#108488)
Summary: This diff adds another option to create cuda pinned memory using cudaHostRegister.

Differential Revision: D45843715

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108488
Approved by: https://github.com/zdevito
2023-10-06 04:13:02 +00:00
57e9969021 feat(optim): Add adadelta multi_tensor support for complex, with has_complex shortcut (#110631)
Partial fix: https://github.com/pytorch/pytorch/issues/110606

More on `has_complex` shortcut: https://github.com/pytorch/pytorch/pull/110613#issuecomment-1749314805

CC: @janeyx99, @mlazos, @lezcano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110631
Approved by: https://github.com/lezcano
2023-10-06 03:34:41 +00:00
11047be10e feat(optim): Add NAdam support for complex, with has_complex shortcut (#110634)
Partial fix: https://github.com/pytorch/pytorch/issues/110606

More on `has_complex` shortcut: https://github.com/pytorch/pytorch/pull/110613#issuecomment-1749314805

CC: @janeyx99 @mlazos @lezcano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110634
Approved by: https://github.com/lezcano
2023-10-06 03:31:48 +00:00
347ea3fe0d feat(optim): Add RAdam support for complex, with has_complex shortcut (#110635)
Partial fix: https://github.com/pytorch/pytorch/issues/110606

More on `has_complex` shortcut: https://github.com/pytorch/pytorch/pull/110613#issuecomment-1749314805

CC: @janeyx99 @mlazos @lezcano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110635
Approved by: https://github.com/lezcano
2023-10-06 03:29:26 +00:00
be5dc3a00d [export] Update ArgumentSpec definition. (#110612)
Summary: Changing ArgumentSpec into a true union type in Python without changing serialization format.

Test Plan: CI

Differential Revision: D49871088

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110612
Approved by: https://github.com/angelayi
2023-10-06 03:14:45 +00:00
83061ee177 [aotinductor] Fix benchmarks with self.autocast (#110490)
Fixes https://github.com/pytorch/pytorch/issues/108173

The original error was that there was a type mismatch between the output of eager mode (float16) and from aot_compile (float32). This is because when we run the model eagerly in the benchmarks, we call [self.model_iter_fn](https://github.com/pytorch/pytorch/blob/main/benchmarks/dynamo/common.py#L2072-L2076) to run the model, rather than directly calling the model. In the case of timm models, it calls the model with [self.autocast()](https://github.com/pytorch/pytorch/blob/main/benchmarks/dynamo/timm_models.py#L321-L323), causing the eager model to return a float16 value. However, the model we export with aot_compile does not have the self.autocast context, so it returns float32.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110490
Approved by: https://github.com/desertfire
2023-10-06 02:13:47 +00:00
8a09fe4a05 [ez] Remove print in heuristics aggregation (#110621)
Move the print to the beginning instead, because putting it at the end means you have to scroll through it when debugging, and nothing in that function indicates that it should be printing anything.

Also moved the line that prints disabled issues out of the for loop.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110621
Approved by: https://github.com/huydhn
2023-10-06 02:04:53 +00:00
dac895c10a Revert "Multiprocessing support for NT (#110292)"
This reverts commit f17fe89e14ef7c29690d989c857ae011b8589b80.

Reverted https://github.com/pytorch/pytorch/pull/110292 on behalf of https://github.com/kit1980 due to Causes CUDA memory leaks ([comment](https://github.com/pytorch/pytorch/pull/110292#issuecomment-1749852095))
2023-10-06 01:07:40 +00:00
555c83d097 Added a UserWarning when using torch.{std,var,std_mean,var_mean} with dof<=0 (#109824)
Fixes #109696.

This PR adds a `UserWarning` when calling
- `torch.var`
- `torch.var_mean`
- `torch.std`
- `torch.std_mean`

with an effective `dof<=0`. Until now, only `torch.cov` warned about this. The code also handles edge cases, such as `torch.empty`
```
>>> import torch; torch.std_mean(torch.empty(0), correction=0)
<stdin>:1: UserWarning: std_mean(): degrees of freedom is <= 0 (Triggered internally at /app/aten/src/ATen/native/ReduceOps.cpp:1671.)
(tensor(nan), tensor(nan))
```

multi-dim reductions

```
>>> import torch; torch.std_mean(torch.empty(10, 30, 20, 50), correction=600, dim=(1, 2))
<stdin>:1: UserWarning: std_mean(): degrees of freedom is <= 0 (Triggered internally at /app/aten/src/ATen/native/ReduceOps.cpp:1671.)
[... snip ...]
```

and a negative `correction`.

```
>>> import torch; torch.std_mean(torch.randn(0), correction=-5)
(tensor(nan), tensor(nan))
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109824
Approved by: https://github.com/soulitzer
2023-10-06 01:03:47 +00:00
81ce5d5725 Revert "pin_memory support for NT (#110404)"
This reverts commit 3597325bc7f07d97ded1c94c47bb59c98e080a0f.

Reverted https://github.com/pytorch/pytorch/pull/110404 on behalf of https://github.com/kit1980 due to Previous PR in the stack caused CUDA memory leaks ([comment](https://github.com/pytorch/pytorch/pull/110404#issuecomment-1749850211))
2023-10-06 01:03:17 +00:00
cyy
11b3210a11 [Reland2] Remove calls of c10::either (#110487)
This PR is reland of #109707 with fixes of MSVC failures.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110487
Approved by: https://github.com/soulitzer
2023-10-06 00:25:15 +00:00
330db8278b Revert "Update singleton int to error when inequality relation is undefined (#110044)"
This reverts commit 07331c65e6b47f41475fc0d81ba03917f39b55dd.

Reverted https://github.com/pytorch/pytorch/pull/110044 on behalf of https://github.com/PaliC due to bottom diff is causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/110044#issuecomment-1749805209))
2023-10-05 23:55:37 +00:00
1c3fae46ee Revert "Support SingletonSymNode mul with coefficient (#110369)"
This reverts commit eb8feb8ff8610d53d92773c2d7dce05c2196d672.

Reverted https://github.com/pytorch/pytorch/pull/110369 on behalf of https://github.com/PaliC due to bottom diff is causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/110369#issuecomment-1749802899))
2023-10-05 23:51:28 +00:00
236afe73a2 Revert "Update custom Function preserve torch function when inputs returned as-is (#109825)"
This reverts commit 4e73eee93f411596fcabb32cc8e7686890d1c7fb.

Reverted https://github.com/pytorch/pytorch/pull/109825 on behalf of https://github.com/PaliC due to causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/109825#issuecomment-1749802739))
2023-10-05 23:49:41 +00:00
fdf6055ea7 Revert "Add symbolic singleton int (#110370)"
This reverts commit a7145cb3a42e925209c7f34c0b8b169dc72ff4c6.

Reverted https://github.com/pytorch/pytorch/pull/110370 on behalf of https://github.com/PaliC due to bottom diff is causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/110370#issuecomment-1749801188))
2023-10-05 23:47:09 +00:00
585e2bd818 Revert "Symintify guards.cpp (#110371)"
This reverts commit e1cfcdfa06d476fb7c6dc9be1b677b23569d4ed6.

Reverted https://github.com/pytorch/pytorch/pull/110371 on behalf of https://github.com/PaliC due to bottom diff is causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/110371#issuecomment-1749798063))
2023-10-05 23:42:35 +00:00
bcd44dac60 Revert "Use is_symbolic instead of testing isinstance in some place (#110372)"
This reverts commit 8672d64fed2d76062f14a74075d560fe6fc38b1a.

Reverted https://github.com/pytorch/pytorch/pull/110372 on behalf of https://github.com/PaliC due to bottom diff is causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/110372#issuecomment-1749795074))
2023-10-05 23:37:37 +00:00
5d963474aa Replace enforce_dtype with dtype in ShardedTensor.gather (#110561)
Summary:
Sometimes local_shards are empty on some ranks while out.dtype is float16, which will cause an error if enforce_dtype is True, because `data` will be float32.

Callers know best what dtype they want, so we can just let callers decide.

Temporarily keep enforce_dtype for backward compatibility

Test Plan: Run local and MAST job

Reviewed By: uciyc123

Differential Revision: D46886551

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110561
Approved by: https://github.com/wanchaol, https://github.com/malfet
2023-10-05 23:16:23 +00:00
f274c7b32c Add functional collective all_to_all_single and support it in Inductor (#110195)
Copy of https://github.com/pytorch/pytorch/pull/106655 from yf225
rebased on top of item() support changes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110195
Approved by: https://github.com/Skylion007
2023-10-05 23:11:51 +00:00
df7d01aed5 perf(inductor): use for loop with shortcut in Optimizers to speedup against list comprehensions (e.g. complex conversion) (#110613)
Fully fixes: https://github.com/pytorch/pytorch/issues/110506

Depends: https://github.com/pytorch/pytorch/pull/110607
Potential merge conflicts:
- https://github.com/pytorch/pytorch/pull/110339
- https://github.com/pytorch/pytorch/pull/110345
- https://github.com/pytorch/pytorch/pull/110454

Related:
- https://github.com/pytorch/pytorch/issues/110606 (we can apply the improvements here orthogonally to the complex support)

### Results

Benchmark: 100 params.

Breakdowns (float32, dynamo):
```
Adagrad: this PR: 4.4s, main: 8.8s
Adam: this PR: 2.1s, main: 9.8s
AdamW: this PR: 2.5s, main: 8.2s
ASGD: this PR: 3.1s, main: 8.5s
RMSProp: this PR: 1.3s, main: 4.2s
RProp: this PR: 6.7s, main: 14.9s
```

Notes:
1. Adagrad is still slow due to `_get_value` list comprehension. Can be fixed in https://github.com/pytorch/pytorch/pull/110339/files by utilizing capturable path
2. Adamax is not actually compiled (it is currently disabled).
3. Inductor compile time is quite variable. We calculate dynamo time by subtracting the `call_user_compiler` time from the `compile_inner` timing.

<details>

This PR:
```
Adagrad (torch.float32): 28.47496461868286s
Adagrad (torch.complex64): 29.379547357559204s
Adam (torch.float32): 17.334211587905884s
Adam (torch.complex64): 29.637500524520874s
Adamax (torch.float32): 2.4749321937561035s
Adamax (torch.complex64): 3.1997995376586914s
AdamW (torch.float32): 18.06532859802246s
AdamW (torch.complex64): 28.25661015510559s
ASGD (torch.float32): 23.70255398750305s
ASGD (torch.complex64): 25.33756995201111s
RMSprop (torch.float32): 7.964028596878052s
RMSprop (torch.complex64): 12.909599781036377s
Rprop (torch.float32): 30.512362003326416s
Rprop (torch.complex64): 44.74405765533447s
```

Main
```
Adagrad (torch.float32): 26.919506072998047s
Adagrad (torch.complex64): 35.190622091293335s
Adam (torch.float32): 25.715000867843628s
Adam (torch.complex64): 24.17716670036316s
Adamax (torch.float32): 2.4404726028442383s
Adamax (torch.complex64): 3.3538928031921387s
AdamW (torch.float32): 25.2022807598114s
AdamW (torch.complex64): 28.915700912475586s
ASGD (torch.float32): 24.108731985092163s
ASGD (torch.complex64): 26.589075088500977s
RMSprop (torch.float32): 10.781344175338745s
RMSprop (torch.complex64): 15.136352777481079s
Rprop (torch.float32): 42.46482181549072s
Rprop (torch.complex64): 48.28277635574341s
```

Seems that it doesn't help the complex case much (but that's not the majority case). torch.float32 results are generally positive; when a case does not show drastic improvement or regresses, it is due to inductor variance (verified by manually inspecting the logs).

</details>

### Benchmark Script
```python
import torch
import time
from torch.optim import Adagrad, Adam, Adamax, AdamW, ASGD, RMSprop, Rprop

OPTIMS = [Adagrad, Adam, Adamax, AdamW, ASGD, RMSprop, Rprop]
DTYPES = [torch.float, torch.cfloat]

NUM_PARAMS = 100
kwargs = { "lr": 0.01, "foreach": True }
summary = []

for optim_cls in OPTIMS:
    for dtype in DTYPES:
        torch._dynamo.reset()
        # torch._inductor.metrics.reset()
        input = torch.ones([10, 10], dtype=dtype, device="cuda:0")
        model = torch.nn.Sequential(
            *[torch.nn.Linear(10, 10, dtype=dtype, device="cuda:0") for _ in range(NUM_PARAMS)]
        )

        model(input).sum().abs().backward()
        opt_compiled = optim_cls(model.parameters(), **kwargs)
        compiled_step = torch.compile(opt_compiled.step)

        with torch.set_grad_enabled(False):
            start_time = time.time()
            compiled_step()
            summary.append(f"{optim_cls.__name__} ({dtype}): {time.time() - start_time}s")

        print(optim_cls, kwargs, dtype, torch._dynamo.utils.compile_times())

for s in summary:
    print(s)
```

CC: @janeyx99 @mlazos
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110613
Approved by: https://github.com/janeyx99
2023-10-05 23:10:52 +00:00
7b6042111f [quant][pt2e] Refactor conv related annotation for XNNPACKQuantizer (#110308)
Summary:
Since we changed the IR that we are working with to pre-autograd aten IR, it is now easier
to use plain pattern matching instead of relying on source_matcher_utils, so this
PR refactors the annotation for conv to use aten ops directly.

Also fixed reentrant test after this change.

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110308
Approved by: https://github.com/kimishpatel
2023-10-05 22:36:18 +00:00
be02103786 [BE] Get rid of code duplication (#110619)
Replace `dispatch_to_CDouble`, `dispatch_to_CLong` and `dispatch_to_CComplexDouble` with `dispatch_to<T>` template

<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at c3d9d01</samp>

> _Sing, O Muse, of the clever coder who devised_
> _A wondrous template function, `dispatch_to<T>`, that could_
> _Handle with ease the various scalar types that vexed_
> _The previous code, which was verbose and dull as wood._
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110619
Approved by: https://github.com/soulitzer, https://github.com/albanD
ghstack dependencies: #110618
2023-10-05 22:05:57 +00:00
82e353fffc [BE] Use nested namespaces in autograd/templates (#110618)
As PyTorch can now use C++17 language features
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110618
Approved by: https://github.com/soulitzer
2023-10-05 22:05:57 +00:00
cae537126f Set _diffThreshold on our TestCase (#110603)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110603
Approved by: https://github.com/albanD
2023-10-05 21:49:28 +00:00
668eb55488 [BE]: Enable some basic pytest style rules (#110362)
Adds some basic flake8-pytest-style rules from ruff with their autofixes. I just picked a couple of uncontroversial changes about having a consistent pytest style that we were already following. We should consider enabling some more in the future, but this is a good start. I also upgraded ruff to the latest version.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110362
Approved by: https://github.com/ezyang, https://github.com/albanD, https://github.com/kit1980
2023-10-05 21:40:43 +00:00
c95cf4b4c9 [dtensor] add grad placements kwarg to to_local API (#110629)
When we convert to a local tensor, DTensor can no longer track autograd or the
gradient layout of the local tensor. If the user does something unexpected, there
needs to be a way for the user to hint at the gradient layout of the local
tensor; a sketch follows.
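
A minimal sketch under assumptions (two or more ranks launched via torchrun; the mesh, sizes, and placements are illustrative):

```python
import torch
import torch.distributed as dist
from torch.distributed._tensor import DeviceMesh, Shard, distribute_tensor

dist.init_process_group("nccl")
mesh = DeviceMesh("cuda", list(range(dist.get_world_size())))
dt = distribute_tensor(torch.randn(8, 8, requires_grad=True), mesh, [Shard(0)])

# hint that the gradient flowing back into the local tensor is sharded on dim 0
local = dt.to_local(grad_placements=[Shard(0)])
```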
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110629
Approved by: https://github.com/zdevito
2023-10-05 21:34:01 +00:00
ada65508d2 Add option to flop counter formula registration to get raw values (#110591)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110591
Approved by: https://github.com/awgu
ghstack dependencies: #110501, #110504
2023-10-05 21:14:41 +00:00
9e72c9cccd [torch] easy missing move in aoti_runtime/model.h (#110469)
Just an extra shared_ptr copy, nothing fancy.

Differential Revision: [D49792510](https://our.internmc.facebook.com/intern/diff/D49792510/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110469
Approved by: https://github.com/Skylion007
2023-10-05 20:56:06 +00:00
71beca4899 [dynamo, logging] Report name of defining class along side function name in Dynamo logs (#110190)
Implement https://github.com/pytorch/pytorch/issues/109236

Sample code:
```python
import torch

class AAA:
    class DUMMY:
        class DUMMY2:
            pass
    def dummy(self):
        def dummy2():
            pass
    class BBB:
        @staticmethod
        def CCC():
            class DDD:
                if True:
                    @staticmethod
                    def EEE():
                        x = [torch.ones(3, 3) for _ in range(5)]
                        return x
            return DDD

def fn():
    return AAA.BBB.CCC().EEE()

opt_fn = torch.compile(fn, backend="eager")

opt_fn()
```

Logs:
```bash
$TORCH_LOGS="trace_source" python playground2.py
[2023-09-27 17:38:35,641] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG] TRACE starts_line /data/users/williamwen/pytorch/playground2.py:21 in fn (fn)
[2023-09-27 17:38:35,641] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG]     def fn():
[2023-09-27 17:38:35,642] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG] TRACE starts_line /data/users/williamwen/pytorch/playground2.py:22 in fn (fn)
[2023-09-27 17:38:35,642] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG]         return AAA.BBB.CCC().EEE()
[2023-09-27 17:38:35,661] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG] TRACE starts_line /data/users/williamwen/pytorch/playground2.py:11 in CCC (AAA.BBB) (inline depth: 1)
[2023-09-27 17:38:35,661] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG]             @staticmethod
[2023-09-27 17:38:35,661] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG] TRACE starts_line /data/users/williamwen/pytorch/playground2.py:13 in CCC (AAA.BBB.CCC.DDD) (inline depth: 1)
[2023-09-27 17:38:35,661] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG]                 class DDD:
[2023-09-27 17:38:35,723] [1/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG] TRACE starts_line /data/users/williamwen/pytorch/playground2.py:17 in <listcomp> (AAA.BBB.CCC.DDD.EEE)
[2023-09-27 17:38:35,723] [1/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG]                             x = [torch.ones(3, 3) for _ in range(5)]
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110190
Approved by: https://github.com/ezyang, https://github.com/mlazos
2023-10-05 20:41:38 +00:00
c99de9f37c fix(optim): adagrad sparse multitensor incorrect early exit (#110454)
Fixes https://github.com/pytorch/pytorch/issues/110444#issuecomment-1745181530

This PR: passes.

Main:
```
test/optim/test_optim.py::TestOptim::test_adagrad_sparse FAILED [0.0058s]

==================================================================================================================================== FAILURES =====================================================================================================================================
__________________________________________________________________________________________________________________________ TestOptim.test_adagrad_sparse __________________________________________________________________________________________________________________________
Traceback (most recent call last):
  File "/home/jonch/Desktop/Programming/mlsys/pytorch/test/optim/test_optim.py", line 1448, in test_adagrad_sparse
    self._test_rosenbrock_sparse(
  File "/home/jonch/Desktop/Programming/mlsys/pytorch/test/optim/test_optim.py", line 128, in _test_rosenbrock_sparse
    self.assertEqual(params, params_c, atol=1e-6, rtol=1e-6)
  File "/home/jonch/Desktop/Programming/mlsys/pytorch/torch/testing/_internal/common_utils.py", line 3309, in assertEqual
    raise error_metas.pop()[0].to_error(
AssertionError: Tensor-likes are not close!

Mismatched elements: 1 / 2 (50.0%)
Greatest absolute difference: 0.09999999999993325 at index (1,) (up to 1e-06 allowed)
Greatest relative difference: 0.06249999999996089 at index (1,) (up to 1e-06 allowed)

```

CC: @janeyx99
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110454
Approved by: https://github.com/janeyx99
2023-10-05 20:37:57 +00:00
ecdd1bcf03 Back out "[Inductor] Break the loop fusion when node2 depends on node1 mutations (#109172)" (#110622)
Summary:
Original commit changeset: 03980fb054d5

Original Phabricator Diff: D49519512

Bisecting shows that this diff is the cause of S369683. Since this affects Ads production, need to back out this diff immediately.

Test Plan: See S369683

Reviewed By: ezyang

Differential Revision: D49958638

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110622
Approved by: https://github.com/yanboliang
2023-10-05 20:09:09 +00:00
88616349d7 [state_dict][1/N] Implement the basic functions of distributed.checkpoint._state_dict (#105902)
This PR implements the basic functions of distributed.checkpoint._state_dict. This PR currently contains the flattening of optimizer state_dict which makes the PR too large. A later version may split it into 2 for a better code review.

Differential Revision: [D47647719](https://our.internmc.facebook.com/intern/diff/D47647719/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D47647719/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105902
Approved by: https://github.com/wz337
2023-10-05 20:04:15 +00:00
298f01d9a2 [aotinductor] Avoid generating redundant kernel loading code (#110510)
Summary: 1) Stop forcing triton.unique_kernel_names to True for AOTInductor, because the unique kernel name can be read from metadata; 2) Only generate load_kernel once for each kernel since we don't have control flow in our generated code.  This solves https://github.com/pytorch/pytorch/issues/105553.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110510
Approved by: https://github.com/chenyang78, https://github.com/jansel
2023-10-05 19:59:38 +00:00
f1b94461aa [AOTInductor] ProxyExecutor support Dynamic Shape (#110526)
Summary:
Extend ProxyExecutor to support dynamic shape.

Example of ProxyExecutor invocation with symints.
```
    int64_t* arg0_1_size;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_get_sizes(arg0_1, &arg0_1_size));
    auto s0 = arg0_1_size[0];
    auto s1 = arg0_1_size[1];
    int64_t* arg1_1_size;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_get_sizes(arg1_1, &arg1_1_size));
    auto s2 = arg1_1_size[0];
    auto s3 = arg1_1_size[1];
    ...
    aoti_torch_proxy_executor_call_function(proxy_executor, 0, 15, std::vector<int64_t>{42, 16, 17, s0 + s1, s0 + s1, s2*s3, 45, 67, 16, 17, s2*s3, s2*s3, s0 + s1, 89, 910}.data(), 7, std::vector<AtenTensorHandle>{arg0_1, arg0_1, arg1_1, buf2, arg0_1, arg1_1, buf4}.data());
```

Example of serialized SymInt(s) arguments:
```
          {
            "name": "symint",
            "arg": {
              "asSymInt": {
                "asName": "s0 + s1"
              }
            }
          },
          {
            "name": "symints",
            "arg": {
              "asSymInts": [
                {
                  "asName": "s0 + s1"
                },
                {
                  "asName": "s2*s3"
                }
              ]
            }
          },
          ...
          {
            "name": "o_symint",
            "arg": {
              "asSymInt": {
                "asName": "s2*s3"
              }
            }
          },
          {
            "name": "o_symints",
            "arg": {
              "asSymInts": [
                {
                  "asName": "s2*s3"
                },
                {
                  "asName": "s0 + s1"
                }
              ]
            }
          },
```

Test Plan: buck2 run mode/dev-nosan deeplearning/aot_inductor/test:test_custom_ops

Differential Revision: D49887555

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110526
Approved by: https://github.com/chenyang78
2023-10-05 19:05:20 +00:00
a0cea517e7 Add 9.0a to cpp_extension supported compute archs (#110587)
There's an extended compute capability 9.0a for Hopper that was introduced in CUDA 12.0: https://docs.nvidia.com/cuda/archive/12.0.0/cuda-compiler-driver-nvcc/index.html#gpu-feature-list

E.g. Cutlass leverages it: 5f13dcad78/python/cutlass/emit/pytorch.py (L684)

This adds it to the list of permitted architectures to use in `cpp_extension` directly.
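
A hypothetical way to exercise this (the extension name and source file below are made up; `TORCH_CUDA_ARCH_LIST` is how the arch list typically reaches `cpp_extension`):

```python
import os
from torch.utils.cpp_extension import load

os.environ["TORCH_CUDA_ARCH_LIST"] = "9.0a"  # previously rejected, now permitted
ext = load(name="my_hopper_ext", sources=["my_hopper_ext.cu"])  # hypothetical files
```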
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110587
Approved by: https://github.com/ezyang
2023-10-05 17:41:06 +00:00
c89d35adfe Bump pillow from 9.5.0 to 10.0.1 in /.ci/docker (#110494)
Bumps [pillow](https://github.com/python-pillow/Pillow) from 9.5.0 to 10.0.1.
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst)
- [Commits](https://github.com/python-pillow/Pillow/compare/9.5.0...10.0.1)

---
updated-dependencies:
- dependency-name: pillow
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-05 10:37:26 -07:00
efdf155383 Add requirement for input to AllGatherIntoTensor to be contiguous (#109561)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109561
Approved by: https://github.com/Chillee
2023-10-05 17:04:48 +00:00
f21c322e20 Fix typo in BatchLinearAlgebraLibBlas.cpp (#110608)
accomodate -> accommodate

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110608
Approved by: https://github.com/malfet
2023-10-05 16:48:53 +00:00
d6e5898e8d Quieter logs in CI (#110033)
To reduce the amount of logs
* for successes, only print the part that says what tests ran and don't print the rest.  Zip the log into an artifact.  The line listing all the test names is really long, but if you view the source of the raw logs, it will not wrap, so it will only be one line.  The log classifier can also be configured to ignore this line. Gets rid of lines like `test_ops.py::TestCommonCPU::test_multiple_devices_round_cpu_int64 SKIPPED [0.0010s] (Only runs on cuda) [  9%]`
* for failures/reruns, print logs.  Do not zip.

Also
* change log artifact name

Examples of various logs:
a074db0f7f failures
1b439e24c4 failures

possibly controversial haha
should i include an option for always printing?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110033
Approved by: https://github.com/huydhn
2023-10-05 16:40:37 +00:00
3597325bc7 pin_memory support for NT (#110404)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110404
Approved by: https://github.com/cpuhrsch, https://github.com/albanD
ghstack dependencies: #110292
2023-10-05 16:33:22 +00:00
cc1de49340 [HigherOrderOp] fallthrough some keys by default. (#110478)
Fixes #109253

Test Plan:
Added a new test that shows default fallthrough keys can be overridden.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110478
Approved by: https://github.com/ezyang
2023-10-05 16:25:42 +00:00
26f634eefb Enable aarch64 for fixing undefined symbol error. (#110542)
Summary: ARM can be safely supported

Reviewed By: andrewjcg, aaronenyeshi

Differential Revision: D49921679

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110542
Approved by: https://github.com/aaronenyeshi
2023-10-05 16:16:06 +00:00
a94b6f39d1 [ROCm] conditionally enable hipsparse const descriptors for version >= 2.4.0 (#110317)
This is in preparation for upcoming backwards-incompatible hipsparse changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110317
Approved by: https://github.com/malfet
2023-10-05 16:07:51 +00:00
f767a6c57a Made pattern-matcher diagnostics lazily reported + added TORCH_COMPILE_CPROFILE (#110504)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110504
Approved by: https://github.com/mlazos, https://github.com/eellison
ghstack dependencies: #110501
2023-10-05 15:47:30 +00:00
1e4c0641ce Revert "Made pattern-matcher diagnostics lazily reported + added TORCH_COMPILE_CPROFILE (#110504)"
This reverts commit 9648df1a6af8509ba2f5455a8465e0c67d0dd0c2.

Reverted https://github.com/pytorch/pytorch/pull/110504 on behalf of https://github.com/PaliC due to temporarily will revert as it's causing problems with difftrain import ([comment](https://github.com/pytorch/pytorch/pull/110504#issuecomment-1749132253))
2023-10-05 15:28:23 +00:00
1a729618ef [FSDP][optim_state_dict] Make the new optimizer allgather fusion work with fine-tuning models (#110540)
With use_orig_params=True, it is possible that some parameters within the same FlatParameter are in the optimizer while other parameters are frozen. This PR makes the allgather fusion logic support that case.

Differential Revision: [D49922028](https://our.internmc.facebook.com/intern/diff/D49922028/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110540
Approved by: https://github.com/awgu, https://github.com/rohan-varma
2023-10-05 15:17:10 +00:00
f17fe89e14 Multiprocessing support for NT (#110292)
Fixes #110161

Allows NTs to be used in DataLoaders with `num_workers > 1`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110292
Approved by: https://github.com/cpuhrsch, https://github.com/albanD
2023-10-05 15:04:48 +00:00
7c72238e4b Back out "Enable pickling model prepared with QAT qconfig" (#110392)
Summary:
D49187352 caused our model conversion and loading of the QAT checkpoint to get stuck with a thrift timeout.

We are actively checking in the final code and model for the static-quant HTP prod model, and encountered this breakage at head on Thursday.

A thrift timeout is not a hard failure signal, and because of that, it's hard to bisect and find the culprit. It is also hard to set up a unit test, because the job simply times out. Better tests are needed to guard downstream model conversion against upstream changes.

Our suspicion of why this diff broke us is that we create a lot of modules with QAT (in a recursive manner), but our model is not a QAT-traceable module (it is a graph with many QAT modules and floating-point modules). With functools.partial as in the original diff, we end up caching modules in memory, causing the machine's memory to be taken up completely.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110392
Approved by: https://github.com/junesg, https://github.com/jerryzh168
2023-10-05 14:41:00 +00:00
cf1b494afd [AOTInductor] Store loaded kernels in the model (#110554)
Defining kernels as static vars is problematic for subsequent model loading on non-default CUDA devices.

If those kernels were loaded in the context of device #0, they are no longer nullptr, and therefore the kernels won't work on devices other than device #0.

This change makes loaded kernels remembered at the model level in AOT mode.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110554
Approved by: https://github.com/chenyang78, https://github.com/desertfire
2023-10-05 10:17:05 +00:00
c36b31d530 torch::nn::AdaptiveLogSoftmaxWithLoss: check length of cutoffs (#106777)
Fixes #106698

Also added a check in the Python API, because the current error message
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sehoon/pytorch-latest/torch/nn/modules/adaptive.py", line 128, in __init__
    or (min(cutoffs) <= 0) \
ValueError: min() arg is an empty sequence
```
is not very comprehensible.
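
A minimal repro sketch of the case the new check guards against (argument values are illustrative):

```python
import torch.nn as nn

# previously this crashed with "ValueError: min() arg is an empty sequence";
# with the added length check it now raises a clearer ValueError up front
nn.AdaptiveLogSoftmaxWithLoss(in_features=16, n_classes=10, cutoffs=[])
```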

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106777
Approved by: https://github.com/albanD
2023-10-05 05:35:47 +00:00
00b9afa429 [vision hash update] update the pinned vision hash (#110571)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110571
Approved by: https://github.com/pytorchbot
2023-10-05 05:14:04 +00:00
416eca9736 export db links for user errors (#110555)
Ideally all `_dynamo.exc.UserError`s should have "case names", i.e., link to examples in `exportdb`.

This PR adds case names to several instances of `_dynamo.exc.UserError`. In particular, looking at coverage based on `UserErrorType`:
* `DYNAMIC_CONTROL_FLOW`, `ANTI_PATTERN`, and `STANDARD_LIBRARY` are fully covered.
* `CONSTRAINT_VIOLATION` and `DYNAMIC_DIM` have no coverage. We don't seem to have any dedicated examples of specifying dynamic shapes in `exportdb` (although they are used in some other examples without explanation, to avoid some specialization that would make such examples moot).
* `INVALID_INPUT` is only partly covered. Frankly this is tedious to cover via examples.

Differential Revision: [D49928518](https://our.internmc.facebook.com/intern/diff/D49928518/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110555
Approved by: https://github.com/angelayi, https://github.com/ydwu4
2023-10-05 05:03:04 +00:00
21019620ee Revert "[Dynamo] SizeVariable can be indexed by symint (#110349)"
This reverts commit 510ec7e3c539dfed49df587d09e8a0a87e187201.

Reverted https://github.com/pytorch/pytorch/pull/110349 on behalf of https://github.com/PaliC due to breaking internal tests (check diff) ([comment](https://github.com/pytorch/pytorch/pull/110349#issuecomment-1748021641))
2023-10-05 04:42:33 +00:00
62cad5b5b0 [quant][pt2] Support cudnn_batch_norm in QAT fusion (#109908)
Summary: Today, we get different batch norm ops depending on
the device the model is placed on at export time. Exporting
`model.cpu()` gives `_native_batch_norm_legit`, while exporting
`model.cuda()` gives `cudnn_batch_norm`. QAT fusion currently
only supports the former and silently ignores the latter. This
commit fixes this by additionally matching on the latter op
during QAT fusion.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT.test_qat_conv_bn_fusion
python test/test_quantization.py TestQuantizePT2EQAT.test_qat_conv_bn_relu_fusion

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar

Differential Revision: [D49615145](https://our.internmc.facebook.com/intern/diff/D49615145)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109908
Approved by: https://github.com/jerryzh168
2023-10-05 04:08:44 +00:00
4b1e138162 [dynamo] [easy]Remove InstructionTranslator from within Set (#110521)
I believe this was a left over from the before times. See if CI agrees.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110521
Approved by: https://github.com/ezyang
2023-10-05 04:01:18 +00:00
a93337ed55 [export] Add ir spec (#110394)
Summary: Copied IR spec over from Executorch

Test Plan: _docs_

Differential Revision: D49829187

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110394
Approved by: https://github.com/ydwu4, https://github.com/gmagogsfm
2023-10-05 03:06:30 +00:00
a8653f35de One more small Perf Tweak to fill_ (#110294)
# Summary
Perf win by checking which device tensors are on

## Before this PR:
``` Shell
CPU | CPU: 1.3328152848407626
GPU | GPU: 6.614773320034146
CPU | GPU: 29.027153505012393
GPU | CPU: 17.22372299991548
```
## After this PR
``` Shell
CPU | CPU: 1.4241038949694484
GPU | GPU: 7.060713530518115
CPU | GPU: 15.149936103262007
GPU | CPU: 5.774620908778161
```

#### Repro Script
``` Python
    a = torch.tensor([0.2, 0.5], device="cpu")
    amax = torch.tensor(0.5, device="cpu")
    print(f"CPU | CPU: {benchmark_torch_function_in_microseconds(torch.fill_, a, amax)}")

    a = torch.tensor([0.2, 0.5], device="cuda")
    amax = torch.tensor(0.5, device="cuda")
    print(f"GPU | GPU: {benchmark_torch_function_in_microseconds(torch.fill_, a, amax)}")

    a = torch.tensor([0.2, 0.5], device="cpu")
    amax = torch.tensor(0.5, device="cuda")
    print(f"CPU | GPU: {benchmark_torch_function_in_microseconds(torch.fill_, a, amax)}")

    a = torch.tensor([0.2, 0.5], device="cuda")
    amax = torch.tensor(0.5, device="cpu")
    print(f"GPU | CPU: {benchmark_torch_function_in_microseconds(torch.fill_, a, amax)}")
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110294
Approved by: https://github.com/mikaylagawarecki
2023-10-05 02:42:57 +00:00
434a996c42 Fix typo under torch/_inductor directory (#110530)
This PR fixes typos in comments and messages in files under the `torch/_dynamo` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110530
Approved by: https://github.com/kit1980
2023-10-05 02:17:20 +00:00
9648df1a6a Made pattern-matcher diagnostics lazily reported + added TORCH_COMPILE_CPROFILE (#110504)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110504
Approved by: https://github.com/mlazos, https://github.com/eellison
ghstack dependencies: #110501
2023-10-05 01:34:57 +00:00
e686341f64 Consider that ops can be fused into cat in the min-cut partitioner (#110501)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110501
Approved by: https://github.com/eellison
2023-10-05 01:34:57 +00:00
d24e7be243 Include onnx and onnxscript information in collect_env.py (#110560)
`onnx` and `onnxscript` are used in torch.onnx.dynamo_export since 2.0. It would be helpful to collect version information in user issue reports.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110560
Approved by: https://github.com/albanD
2023-10-05 01:29:04 +00:00
653f966df0 Fix type promotion of float8_e5m2 and float8_e4m3fn (#110279)
There is an issue with float8 type promotion, because _promoteTypesLookup doesn't contain records for a few types between bfloat16 and float8.
I have simply moved the float8 types to just after bfloat16; however, I'm not sure whether this breaks serialization.

Please decide whether it can stay like this, or whether I should instead insert the missing records filled with "ud" into _promoteTypesLookup rather than moving the types.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110279
Approved by: https://github.com/albanD
2023-10-05 01:28:48 +00:00
c121f957c2 [aotinductor] Enable test_non_default_cuda_device on CI (#110509)
Summary: test_non_default_cuda_device needs to run on a multi-gpu CI instance

Differential Revision: [D49937115](https://our.internmc.facebook.com/intern/diff/D49937115)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110509
Approved by: https://github.com/angelayi, https://github.com/khabinov, https://github.com/chenyang78
2023-10-05 01:25:50 +00:00
9f40ffeec6 [optim] disable large_tensor tests for ROCm (#110559)
Closes #105825 #105820 #105754 by replacing them with an in-code skip.

Fixes #105825, fixes #105820, fixes #105754

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110559
Approved by: https://github.com/albanD
2023-10-05 01:21:21 +00:00
6a974bec5d Change flash attention outputs to be SymInt instead of int (#110533)
Fixes https://github.com/pytorch/pytorch/issues/110322

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110533
Approved by: https://github.com/albanD
2023-10-05 01:00:07 +00:00
f1d81134ef Print output type if assert fires (#110534)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110534
Approved by: https://github.com/albanD
2023-10-05 00:59:17 +00:00
f3aba45049 [ONNX] Create onnxscript-torchlib specific xfails/skips for fx tests (#110536)
Creates xfail_onnxscript/skip_onnxscript so that it is clear torchlib needs to support it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110536
Approved by: https://github.com/BowenBao
2023-10-05 00:39:05 +00:00
95c59b30b8 Update fully_sharded_data_parallel to fix typing (#110545)
Fixes typing so that the linter does not complain when using CustomPolicy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110545
Approved by: https://github.com/awgu, https://github.com/Skylion007
2023-10-05 00:00:10 +00:00
0daa7d4815 [test][docs] Fix doctest warnings for syntax errors (#110517)
Fixes some syntax errors in doctests found in CI tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110517
Approved by: https://github.com/albanD
2023-10-05 00:00:06 +00:00
053367b1ed fix: flake8-bugbear code B024 (#107265)
See #106571 item B024

This fix concerns the addition of `abstractmethod` to methods declared inside abstract classes.
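
A minimal sketch of the pattern the rule targets (class and method names are illustrative):

```python
from abc import ABC, abstractmethod

class Worker(ABC):       # B024 flags an ABC that defines no abstract methods
    @abstractmethod      # declaring the method abstract resolves the warning
    def run(self) -> None:
        ...
```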

Should I also include PEP8-compliant reformatting of the files I had to modify?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107265
Approved by: https://github.com/kit1980
2023-10-04 23:52:52 +00:00
449271f3f1 [pytree] Extract reusable generic tests for pytree (#110395)
Part of #109684

- #109684

Changes:

- Add new functions `tree_structure`, `tree_leaves`, `tree_map_` and `tree_map_only_` to Python pytree (see the sketch after this list).
- Extract reusable tests for pytree to `TestGenericPytree`.
- Change `treespec_dumps` and `treespec_loads` in C++ pytree to call Python pytree and use JSON string as serialization type.
- Rename `torch.utils.pytree` -> `torch.utils._cxx_pytree`.
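
For the new Python pytree functions, a small illustrative sketch (the data values are made up):

```python
import torch.utils._pytree as pytree

data = {"a": [1, 2], "b": (3, {"c": 4})}
print(pytree.tree_leaves(data))      # [1, 2, 3, 4]
spec = pytree.tree_structure(data)   # the container shape, minus the leaves
doubled = pytree.tree_map(lambda x: 2 * x, data)
print(pytree.tree_unflatten([0, 0, 0, 0], spec))
```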

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110395
Approved by: https://github.com/zou3519
2023-10-04 23:40:50 +00:00
37afa0c349 fix(inductor): Increase coverage of Inductor ATen lowering (#110473)
Add sqrt to decomp testing path and fix missing `minimum`, `clamp_min`, `clamp_max` lowerings and/or registrations.

Follow up to: https://github.com/pytorch/pytorch/pull/110468#issuecomment-1745718602 (requires upstream to merge to avoid merge conflict)

CC: @janeyx99

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110473
Approved by: https://github.com/janeyx99
2023-10-04 23:40:46 +00:00
2e31fae5c5 Cleanup the code in the dynamo userbenchmark (#110519)
Summary:
Skip importing the modules that are only available in the pytorch source code, not in the pytorch nightly release.

Make the dynamo benchmark work in both OSS and internal builds.

X-link: https://github.com/pytorch/benchmark/pull/1960

Test Plan:
```
$ python run_benchmark.py dynamo --only alexnet --training --performance --inductor
loading model: 0it [00:05, ?it/s]
cuda train alexnet
running benchmark: 100%|█████████████████| 30/30 [00:00<00:00, 41.46it/s]
1.129x
```

```
$ buck2 run mode/opt //pytorch/benchmark:run_benchmark -- dynamo --only alexnet --training --inductor --performance --output-directory $HOME
loading model: 0it [00:16, ?it/s]
running benchmark: 100%|█████████████████| 30/30 [00:00<00:00, 37.94it/s]
cuda train alexnet
1.120x
```

Differential Revision: D49912006

Pulled By: xuzhao9

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110519
Approved by: https://github.com/desertfire, https://github.com/jansel
2023-10-04 23:26:30 +00:00
0949d97c16 fix batch_isend_irecv example incorrect usage (#110408)
Mismatched dtypes silently lead to wrong outputs in nccl.

```
1:recv_tensor=tensor([0., 0.], device='cuda:1')
0:recv_tensor=tensor([2.8026e-45, 0.0000e+00], device='cuda:0')
```
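
A corrected sketch under assumptions (two ranks launched via torchrun; the key point is that the send and recv dtypes must match):

```python
import torch
import torch.distributed as dist

dist.init_process_group("nccl")
rank = dist.get_rank()
peer = (rank + 1) % 2

send = torch.full((2,), float(rank), dtype=torch.float32, device=f"cuda:{rank}")
recv = torch.zeros(2, dtype=torch.float32, device=f"cuda:{rank}")  # same dtype as sender

ops = [dist.P2POp(dist.isend, send, peer), dist.P2POp(dist.irecv, recv, peer)]
for req in dist.batch_isend_irecv(ops):
    req.wait()
print(f"{rank}:recv_tensor={recv}")
```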

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110408
Approved by: https://github.com/awgu, https://github.com/Neilblaze
2023-10-04 22:57:03 +00:00
8672d64fed Use is_symbolic instead of testing isinstance in some place (#110372)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110372
Approved by: https://github.com/ezyang
ghstack dependencies: #110044, #110369, #110370, #110371
2023-10-04 22:56:42 +00:00
e1cfcdfa06 Symintify guards.cpp (#110371)
Separating this out so we can check perf more easily

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110371
Approved by: https://github.com/ezyang
ghstack dependencies: #110044, #110369, #110370
2023-10-04 22:56:42 +00:00
a7145cb3a4 Add symbolic singleton int (#110370)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110370
Approved by: https://github.com/ezyang
ghstack dependencies: #110044, #110369
2023-10-04 22:56:26 +00:00
eb8feb8ff8 Support SingletonSymNode mul with coefficient (#110369)
We want to be able to use SingletonSymNode to represent strides for Jagged layout tensor. The following is for 3D, but easily generalizable to higher dimensions.

Constraints:
- [B, x, D] (where x represents the "variable-length dim") can be strided in two ways: [x, 1, sum(x)] and [dx, d, 1]. We need two different placeholder values depending on how the jagged tensor is strided.
- When doing operations we need the strides of output tensors to be expressable in terms of the strides and sizes of the inner tensors. Given [B, x, D] @ [D, D'], the output strides is [x * D', D', 1] rather than some opaque [x2, D', 1]. This constraint exists because if I'm tracing, I need a symint to represent the output stride. This symint needs to come from somewhere; I get it in several ways: (1) create a constant, (2) unbacked symint, (3) create a new input using a source, (4) output of an operation on an existing symint. It is clear that (4) is what we want here, which brings us to the design below.

Design:

Given the two constraints, the most straightforward way to implement this is actually to update SingletonSymNode to include some scalar factor, i.e., morally, SingletonSymNode represents `factor * [s_0, s_1, …, s_n]`. This enables us to symbolically compute strides from sizes.
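
A toy Python sketch of the factor idea (illustrative only; the real implementation lives in the C++ SymNode):

```python
class ToySingleton:
    """Morally represents factor * [s_0, s_1, ..., s_n]."""
    def __init__(self, factor: int = 1):
        self.factor = factor

    def __mul__(self, coeff: int) -> "ToySingleton":
        # Multiplying by a constant just scales the factor.
        return ToySingleton(self.factor * coeff)

x = ToySingleton()                   # the variable dim of a [B, x, D] jagged tensor
D_out = 7                            # output feature dim of [B, x, D] @ [D, D_out]
out_strides = [x * D_out, D_out, 1]  # dim-0 stride stays symbolic: a factor-7 singleton
```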
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110369
Approved by: https://github.com/ezyang
ghstack dependencies: #110044
2023-10-04 22:56:15 +00:00
07331c65e6 Update singleton int to error when inequality relation is undefined (#110044)
Previously, something like j0 >= 3 would return False. In sympy, however, it is not possible to make it so that both j0 >= 3 and j0 < 3 return False: you only get to dispatch on Ge, and the remaining relations are derived, e.g. defining Ge(j0, 3) to be False would force Lt(j0, 3) to be True, which is not what we want.

In this PR, we make it so that both j0 >= 3 and j0 < 3 error, so that in a future PR, when we create the symbolic counterpart of this singleton, the behaviors can be the same.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110044
Approved by: https://github.com/ezyang
2023-10-04 22:55:53 +00:00
4e73eee93f Update custom Function preserve torch function when inputs returned as-is (#109825)
Fixes https://github.com/pytorch/pytorch/issues/109805
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109825
Approved by: https://github.com/albanD
2023-10-04 22:45:11 +00:00
21d77bcf80 added path to correct directory containing headers (#110063)
After `make install`, the headers are placed in the `include/openblas/` folder instead of the `include/` folder. Updated FindOpenBLAS.cmake to make that change clear.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110063
Approved by: https://github.com/Blackhex, https://github.com/kit1980
2023-10-04 21:56:36 +00:00
6fc09aee36 constant output errors (#110472)
When mapping between the original signature of a program and the graph-captured signature of its exported program, we emit errors when we see unexpected original or graph-captured inputs or outputs.

These errors can arise because of various reasons, e.g.:
1. some input or output has been lifted because of mutation
2. some type is not pytree-registered for flattening / unflattening
3. some type cannot be realized with graph operations

(This is probably not an exhaustive list.)

Previously we used to emit errors based on a vanilla id-based membership check between the two sides, mostly anticipating (1) as the reason for errors. But this does not do justice to errors because of (2) or (3).

This PR emits a different error when it finds (3) to be a probable cause. Specifically, it considers only Tensor and Sym* types to be "supported": no other type seems to be realizable by graph operations.

When (2) is a probable cause, we sometimes also hit the same error because we would expect the supported types to show through upon registration. But this kind of error may need some more work in the future.

Differential Revision: [D49885828](https://our.internmc.facebook.com/intern/diff/D49885828/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110472
Approved by: https://github.com/ydwu4
2023-10-04 21:56:20 +00:00
a9df9e5187 [inductor] get_system shouldn't error if CUDA is not installed (#110282)
Using inductor on a CPU-only device should be OK.

Differential Revision: [D49749912](https://our.internmc.facebook.com/intern/diff/D49749912/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110282
Approved by: https://github.com/desertfire
2023-10-04 21:28:55 +00:00
6db3853eeb Add doc for torch.cond (#108691)
We add a doc for torch.cond. This PR is a replacement of https://github.com/pytorch/pytorch/pull/107977.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108691
Approved by: https://github.com/zou3519
2023-10-04 21:24:14 +00:00
901aa85b58 fix TEST_ROCM definition to disable test_jit_cudnn_extension on rocm (#110385)
Define TEST_ROCM before modifying TEST_CUDA. Otherwise TEST_ROCM will always be false and will not disable test_jit_cudnn_extension for ROCm.
Fixes https://github.com/pytorch/pytorch/issues/107182
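
A sketch of the ordering issue, assuming flag definitions along these lines:

```python
import torch

TEST_CUDA = torch.cuda.is_available()

# TEST_ROCM must be derived *before* TEST_CUDA is narrowed further below;
# otherwise it ends up False even on ROCm machines.
TEST_ROCM = TEST_CUDA and torch.version.hip is not None

TEST_CUDA = TEST_CUDA and torch.version.cuda is not None  # later modification
```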

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110385
Approved by: https://github.com/jithunnair-amd, https://github.com/kit1980
2023-10-04 20:02:02 +00:00
46a5558cd5 [AOTInductor] Simplified AOTInductor interface and model class (#110411)
Summary:
This PR removed several APIs from the AOTInductor interface,
which are not used by the client.

It also simplified AOTInductor's model class by removing
the dim info for input/output tensors. We included dim info
before to return max output shapes, which was used by the client
to allocate memory for output tensors. Now, we allocate output
tensor memory from the .so so that we don't need to maintain
such information any more. The deletion of dim info from
the model class also simplified the codegen quite a bit.

Test Plan: ci

Reviewed By: khabinov

Differential Revision: D49835430

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110411
Approved by: https://github.com/khabinov, https://github.com/desertfire, https://github.com/jansel
2023-10-04 18:35:24 +00:00
baa9af155e Add more tests for native triton kernels (#110486)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110486
Approved by: https://github.com/jansel
ghstack dependencies: #110403
2023-10-04 18:26:45 +00:00
f04b1a0d27 [AOTInductor] Implement autograd eager backend for native triton kernels (#110403)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110403
Approved by: https://github.com/zou3519, https://github.com/bdhirsh
2023-10-04 17:56:56 +00:00
c0c2e052a4 [aotinductor] Clean up fallback kernel cpp name generation (#110267)
Summary: Unify the way to generate cpp kernel name when the kernel is from OpOverload

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110267
Approved by: https://github.com/zou3519
ghstack dependencies: #110233
2023-10-04 17:18:02 +00:00
539367f0bc [aotindutor] Refactor optional value codegen (#110233)
Summary: Simplify the codegen for optional values by using c10::nullopt, and we don't need placeholders like OptionalScalar because we can simply use None for that purpose.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110233
Approved by: https://github.com/jansel
2023-10-04 17:18:02 +00:00
247c574313 [jit] make register parameter/buffer thread safe in torch::jit::Module (#110488)
Summary: Registering a param/buffer writes into a vector inside Object, so we need to maintain thread safety when some threads read from the vector while others write to it.

Test Plan: CI

Differential Revision: D49882601

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110488
Approved by: https://github.com/davidberard98
2023-10-04 17:04:23 +00:00
2c1b009e39 Fix typo under torch/_dynamo directory (#110459)
This PR fixes typo of comments in files under `torch/_dynamo` directory

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110459
Approved by: https://github.com/colesbury
2023-10-04 16:05:05 +00:00
4c3d3b7176 [inductor] Lower small gemvs on CPU (#110456)
If the gemv fits in registers, like [1,16]*[16,16], MKL isn't going to
do much better than compiling a simple for-loop, and we end up paying
allocation overhead and ATen overhead.

A very small internal inference model drops from 7->5 us with this change.

Differential Revision: [D49875991](https://our.internmc.facebook.com/intern/diff/D49875991/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110456
Approved by: https://github.com/chenyang78, https://github.com/jgong5
2023-10-04 15:16:38 +00:00
30c4c6ff9b [PyTorch CCA] Refactor caching allocator config code (#110123)
Summary: This diff refactors the code by moving CUDAAllocatorConfig into the header file. This config refactoring is done so that we can use the same config code for CUDA pinned memory as well.

Test Plan: sandcastle

Differential Revision: D49653265

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110123
Approved by: https://github.com/zdevito
2023-10-04 14:58:23 +00:00
156aefa89b Revert "[3/N] Add -Wdeprecated and related fixes (#109698)"
This reverts commit c31fcdaa4f79e83c82ec4f5ff3cf96e2cb99eecd.

Reverted https://github.com/pytorch/pytorch/pull/109698 on behalf of https://github.com/PaliC due to breaking quantization tests ( quantization/test_quantize_per_channel_sub_byte and  quantization/test_quantize_per_channel_float_qparams) internally ([comment](https://github.com/pytorch/pytorch/pull/109698#issuecomment-1746999806))
2023-10-04 14:33:47 +00:00
cyy
5220d0dfaf Increase header coverage of clang-tidy (#110443)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110443
Approved by: https://github.com/Skylion007
2023-10-04 13:52:06 +00:00
0e55cc4986 [HigherOrderOp] Flatten outputs of wrap. (#109433)
Fix: #109247

This PR flattens `wrap` outputs by inlining the `pytree.tree_flatten` function after calling
the inner function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109433
Approved by: https://github.com/zou3519
ghstack dependencies: #110290
2023-10-04 13:43:55 +00:00
f68f49c462 Refactor expect tests on test_higher_order_ops.py. (#110290)
This PR inlines the expected strings onto the `assertExpectedInline` calls, so that, when a
change is needed, we may do it by using the `expecttest` machinery: setting the
environment variable `EXPECTTEST_ACCEPT=1`.
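
A minimal sketch of the workflow (the test body is illustrative):

```python
from torch.testing._internal.common_utils import TestCase, run_tests

class ExpectTests(TestCase):
    def test_repr(self):
        # Running the test with EXPECTTEST_ACCEPT=1 rewrites the inline
        # expected string below in place instead of failing on a mismatch.
        self.assertExpectedInline(str([1, 2, 3]), """[1, 2, 3]""")

if __name__ == "__main__":
    run_tests()
```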

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110290
Approved by: https://github.com/zou3519
2023-10-04 13:43:55 +00:00
9f0601df6d Fix a typo in cholesky_inverse documentation (#110364)
Very small PR to fix a typo in the [cholesky_inverse](https://pytorch.org/docs/stable/generated/torch.cholesky_inverse.html) doc.

According to the current doc, the function expects $A$, the symmetric positive-definite matrix, as input. But the examples given (and, more importantly, the code) use $u$, the Cholesky factor of this matrix (as in cholesky_solve).

The PR also provides a correct example of batched usage of this function.
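
A sketch of the corrected usage, passing the Cholesky factor rather than the matrix itself:

```python
import torch

A = torch.randn(3, 3)
A = A @ A.mT + 3 * torch.eye(3)    # make A symmetric positive-definite
u = torch.linalg.cholesky(A)       # lower-triangular factor, as for cholesky_solve

A_inv = torch.cholesky_inverse(u)  # expects the factor u, not A
torch.testing.assert_close(A_inv, torch.inverse(A))
```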

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110364
Approved by: https://github.com/lezcano
2023-10-04 12:30:11 +00:00
31d635803b [Dynamo] Fx proxy for builtin all with list iterators (#109972)
Fixes https://github.com/pytorch/pytorch/issues/109057.
Fixes https://github.com/pytorch/pytorch/issues/103620.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109972
Approved by: https://github.com/ezyang
2023-10-04 07:59:26 +00:00
2bf3ca1be7 [torchdynamo] preserve deterministic_algorithms_warn_only in convert_context (#110457)
Summary: Preserve deterministic_algorithms_warn_only in the dynamo context.

Test Plan: modified unit tests to test warn_only

Differential Revision: D49872622

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110457
Approved by: https://github.com/jansel
2023-10-04 07:12:32 +00:00
dddf581da7 [dynamo] Add graph break on requires_grad_() (#110053)
Fixes #107861.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110053
Approved by: https://github.com/eellison
2023-10-04 06:22:16 +00:00
562c68e56f [nccl] denoise warning msg (#110433)
Summary: This is too noisy for anything set with TORCH_NCCL_USE_COMM_NONBLOCKING. Just warn once.

Test Plan: GH CI

Differential Revision: D49846339

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110433
Approved by: https://github.com/awgu
2023-10-04 06:21:53 +00:00
a0e321d5ad [vision hash update] update the pinned vision hash (#110489)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110489
Approved by: https://github.com/pytorchbot
2023-10-04 06:16:41 +00:00
3fd938369f add foreach_abs meta registration and inductor decomp (#110468)
Fixes https://github.com/pytorch/pytorch/issues/110458

Somehow it is on the allowlist but not on the testing path.

CC @janeyx99
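
A minimal sketch exercising the new decomposition under compilation:

```python
import torch

@torch.compile
def f(tensors):
    # Hits the new meta registration / inductor decomposition for _foreach_abs.
    return torch._foreach_abs(tensors)

f([torch.randn(3), torch.randn(5)])
```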

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110468
Approved by: https://github.com/janeyx99
2023-10-04 06:09:37 +00:00
08c7dcda65 [pt2e][xnnpack_quantizer] quantize "mul" (#110428)
Adding "mul" to list of partitions that are supported by the quantizer. This shows up in EDSR, where we still want to quantize the mul op

Differential Revision: [D49850151](https://our.internmc.facebook.com/intern/diff/D49850151/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110428
Approved by: https://github.com/jerryzh168
ghstack dependencies: #110427
2023-10-04 05:11:53 +00:00
66202ed29c [pt2e][xnnpack_quantizer] add util function to convert scalars to attrs (#110427)
Jerry provided a notebook solution for converting scalars to attrs so that they may be properly quantized:

https://fburl.com/anp/kzz7tfn1

Adding this pass as a util function in xnnpack_quantizer_utils.py

Differential Revision: [D49850150](https://our.internmc.facebook.com/intern/diff/D49850150/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110427
Approved by: https://github.com/jerryzh168
2023-10-04 05:11:53 +00:00
64416a1fc7 [quant][docs] Fix formatting (#110460)
Summary: as titled.

Test Plan:
check generated docs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110460
Approved by: https://github.com/andrewor14
2023-10-04 04:54:10 +00:00
005e8ddcb9 cache the hash construction on Guard (#110464)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110464
Approved by: https://github.com/zou3519, https://github.com/voznesenskym
2023-10-04 04:49:18 +00:00
3fe3439242 Use LLVMSymbolizer directly for unwind inside fbcode (#108800)
Using LLVMSymbolizer directly avoids having to call fork which has caused timeouts in some circumstances.

Differential Revision: [D49070589](https://our.internmc.facebook.com/intern/diff/D49070589/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108800
Approved by: https://github.com/aaronenyeshi
2023-10-04 04:04:08 +00:00
510ec7e3c5 [Dynamo] SizeVariable can be indexed by symint (#110349)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110349
Approved by: https://github.com/williamwen42
2023-10-04 03:20:18 +00:00
50054b1a62 [AOTInductor] ProxyExecutor support ReinterpretView inputs (#110451)
Summary:
See wrapper.codegen_reinterpret_view(): it returns a temporary tensor handle, which has the following problem.
```
            # NB, the return handle here represents a temporary tensor, which will be automatically
            # released.
            # Here's a sample usage in the cpp wrapper code:
            # ```
            # aoti_torch_addmm_out(
            #     buf1,
            #     arg1_1,
            #     RAIIAtenTensorHandle(tmp_tensor_handle_0),
            #     buf0,
            #     1L,
            #     1L));
            # ```
            # RAIIAtenTensorHandle(tmp_tensor_handle_0) will be released after the call to addmm_out.
            # This could be problematic when it's used in a different pattern, for example:
            # ````
            # AtenTensorHandle tensor_args[] = {RAIIAtenTensorHandle(tmp_tensor_handle_2), buf5, buf6};
            # aoti_torch_proxy_executor_call_function(..., tensor_args);
            # ````
            # RAIIAtenTensorHandle(tmp_tensor_handle_2) will be invalid when it's used in the latter
            # kernel call.
            return f"RAIIAtenTensorHandle({tmp_name})"
```

As a result, ProxyExecutor would generate the following code, which causes invalid memory access.

Before:

```
    // Source Nodes: [fn_with_tuple_output], Original ATen: [fb.fn_with_tuple_output]
    AtenTensorHandle tmp_tensor_handle_2;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch__reinterpret_tensor(buf3, 2, int_array_0, int_array_1, 0L, &tmp_tensor_handle_2));
    ...
    AtenTensorHandle tensor_args[] = {RAIIAtenTensorHandle(tmp_tensor_handle_2), buf5, buf6};
    int64_t int_args[] = {1};
    aoti_torch_proxy_executor_call_function(proxy_executor, 1, 1, int_args, 3, tensor_args);
    buf3.reset();
```

With the fix in this diff, ProxyExecutor generates the following code.

After:

```
    // Source Nodes: [fn_with_tuple_output], Original ATen: [fb.fn_with_tuple_output]
    AtenTensorHandle tmp_tensor_handle_2;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch__reinterpret_tensor(buf3, 2, int_array_0, int_array_1, 0L, &tmp_tensor_handle_2));
    ...
    aoti_torch_proxy_executor_call_function(proxy_executor, 1, 1, std::vector<int64_t>{1}.data(), 3, std::vector<AtenTensorHandle>{RAIIAtenTensorHandle(tmp_tensor_handle_2), buf5, buf6}.data());
    buf3.reset();
```

I am not exactly a big fan of such `std::vector{...}.data()` for creating a temp array, but I can't think of another fix.

Test Plan: buck2 run mode/dev-nosan deeplearning/aot_inductor/test:test_custom_ops

Reviewed By: desertfire

Differential Revision: D49758764

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110451
Approved by: https://github.com/desertfire
2023-10-04 02:20:31 +00:00
dd95eaaf1a turn back on constant folding in fbcode (#108604)
Differential Revision: [D49020794](https://our.internmc.facebook.com/intern/diff/D49020794)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108604
Approved by: https://github.com/davidberard98, https://github.com/mlazos
2023-10-04 02:13:03 +00:00
efb73fe8e4 Fix send()/recv() to adhere to timeout (#109611)
Summary: Point-to-point ops don't enqueue their work to the `workMetaList_`, which means that the NCCL watchdog does not watch over them; hence they do not respect the collective timeouts.

Test Plan:
While trying to add a test, I found we don't have tests which validate the NCCL watchdog. It looks like this is because we don't have a good way to detect when the NCCL watchdog has thrown an error (the exception is thrown in a side thread) in our testing framework / `MultiprocessTestCase`.

I manually tested this change with the script in https://github.com/pytorch/pytorch/issues/109401, but we need to look more closely at how to automate a test for the NCCL watchdog.

Differential Revision: D49418976

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109611
Approved by: https://github.com/wconstab
2023-10-03 23:27:45 +00:00
a0bffe7ed7 [S366352] Print nccl version during initialization (#110305)
Summary: Print the NCCL version during initialization.

Differential Revision: D49603220

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110305
Approved by: https://github.com/Skylion007, https://github.com/fegin, https://github.com/rohan-varma
2023-10-03 23:09:48 +00:00
cyy
c31fcdaa4f [3/N] Add -Wdeprecated and related fixes (#109698)
This PR follows #108626. Hopefully we can enable the warning in the next PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109698
Approved by: https://github.com/Skylion007, https://github.com/ezyang
2023-10-03 22:50:53 +00:00
836ba6430a [AOTInductor] Initial functionality for Inf and NaN checker (#109526)
Summary:
Add initial functionality for Inf and NaN checker for AOTInductor.

Test Plan:
Included in commit. Skipped for CI as SIGABRT can't be captured by pytest.

Differential Revision: [D49379751](https://our.internmc.facebook.com/intern/diff/D49379751)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109526
Approved by: https://github.com/chenyang78
2023-10-03 22:39:42 +00:00
06e88d2cfc [aotinductor] Remove output_spec from AOTInductorModelCache (#110462)
Summary: No need to store output_spec as the returned exported.call_spec already contains that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110462
Approved by: https://github.com/angelayi
2023-10-03 22:29:36 +00:00
98c8550158 Fix Triplet Margin Loss Opinfo (#110302)
Triplet Margin Loss takes in a Callable `distance_function` parameter, which is not supported as an argument on the fx graph. See the previous error:

> File "/scratch/eellison/work/pytorch/torch/_dynamo/symbolic_convert.py", line 562, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/scratch/eellison/work/pytorch/torch/_dynamo/variables/torch.py", line 723, in call_function
*proxy_args_kwargs(args, kwargs),
File "/scratch/eellison/work/pytorch/torch/_dynamo/utils.py", line 504, in proxy_args_kwargs
f"call_function args: {typestr(*args)} {typestr(*list(kwargs.values()))}"
File "/scratch/eellison/work/pytorch/torch/_dynamo/exc.py", line 143, in unimplemented
raise Unsupported(msg)
torch._dynamo.exc.Unsupported: call_function args: TensorVariable() TensorVariable() TensorVariable() ConstantVariable(float) NNModuleVariable()

This is fixable by just inlining into `triplet_margin_loss` and continuing to compile it. This required support for `has_torch_function_variadic`.
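
A sketch of the now-compilable pattern, using the functional variant that accepts a callable distance function (the lambda is illustrative):

```python
import torch
import torch.nn.functional as F

@torch.compile
def loss_fn(anchor, positive, negative):
    # The Callable is inlined by dynamo rather than passed as a graph argument.
    return F.triplet_margin_with_distance_loss(
        anchor, positive, negative,
        distance_function=lambda a, b: 1.0 - F.cosine_similarity(a, b),
    )

loss_fn(torch.randn(8, 16), torch.randn(8, 16), torch.randn(8, 16))
```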

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110302
Approved by: https://github.com/mlazos
2023-10-03 20:26:13 +00:00
a8a31bc165 [dynamo][BE] test_misc.py shouldn't change the default dtype globally (#110412)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110412
Approved by: https://github.com/jansel, https://github.com/lezcano, https://github.com/Fidget-Spinner
ghstack dependencies: #110398
2023-10-03 19:25:37 +00:00
dc794ec32c [dynamo] Trace through builtin abs (#110398)
In Python, `abs(x)` does nothing but delegate to `x.__abs__()`, so we should do
the same in dynamo. This also adds `SymNode.__abs__` so we can trace through
indexing expressions involving `abs`.
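
A sketch of what now traces without a graph break:

```python
import torch

@torch.compile(fullgraph=True)
def f(x):
    return abs(x) + 1   # builtin abs delegates to x.__abs__()

f(torch.randn(4))
```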

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110398
Approved by: https://github.com/jansel, https://github.com/lezcano
2023-10-03 19:25:37 +00:00
a389181f2e [MPS] add support for aten::nextafter (#109685)
Fixes https://github.com/pytorch/pytorch/issues/77764#issuecomment-1722515591

Adds support for aten::nextafter to the MPS backend. Supports float and half types.

Notes:
- I've added nextafter to the output_grad_check XFAILLIST since neither this nor the CPU implementation has a grad function.
- Metal Shading Language 3.1 seems to have a native nextafter() function, so once that's available, this kernel can just call that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109685
Approved by: https://github.com/kulinseth
2023-10-03 19:20:22 +00:00
9ce2e02fd6 Revert "[ROCm] Remove PYTORCH_MIOPEN_SUGGEST_NHWC flag (#90725)" (#110319)
This reverts commit 66bfcd32fd7f41154f1fd520e14012d3f717db4d.

NHWC has a perf regression on MIOpen, so reverting until the performance issue is fixed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110319
Approved by: https://github.com/jeffdaily, https://github.com/jithunnair-amd, https://github.com/kit1980
2023-10-03 19:14:47 +00:00
b457e3f79a Reland attempt 2 of "Update AOTAutograd to use FunctionalTensorMode instead of C++ functionalization (#106406)" (#109906)" (#110079)
The first reland broke internal (failing diff: D49617462).

The major error looks like it's because there's an internal-only higher order op that needs a new functionalization rule. I'm going to land an internal diff for that and confirm tests pass before relanding this PR.

Also confirmed that the issue from https://github.com/pytorch/pytorch/issues/110121 is fixed, and added a test.

This reverts commit 1b90f07f5a9fcb9187fee94f769fc117490c1e39.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110079
Approved by: https://github.com/ezyang
2023-10-03 18:50:25 +00:00
b5c3a17c2c [fuzzing result][fuzz_torch_jit_lite_interpreter] read-heap-buffer-overflow-far-from-bounds (size 4) in c10::IValue::IValue() (#110441)
Summary: This diff fixes a heap underflow found by fuzzing in torch/csrc/jit/runtime/vararg_functions.cpp

Test Plan:
CI and
```
arc lionhead crash reproduce 1753074381791061
```
doesn't crash anymore.

Differential Revision: D49537535

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110441
Approved by: https://github.com/Skylion007
2023-10-03 18:48:12 +00:00
da63c7f2c3 [AOTInductor] remove CUDA dependency for cpp backend (#110409)
Summary:
Previously, we linked against CUDA libs even for the pure cpp backend.
This caused issues for cases where the inference platform does not
have GPUs. This diff removes the CUDA dependency for the cpp backend.

Reviewed By: bertmaher, muchulee8, mikekgfb

Differential Revision: D49800712

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110409
Approved by: https://github.com/bertmaher, https://github.com/desertfire
2023-10-03 18:36:00 +00:00
df3ab70dde Revert "Added new test sample to interpolate op in OpInfo (#104181)"
This reverts commit 87f8bc65f8cbc3202d645cdfa80a206b564276ac.

Reverted https://github.com/pytorch/pytorch/pull/104181 on behalf of https://github.com/peterbell10 due to Causing OOM in slow-gradcheck ([comment](https://github.com/pytorch/pytorch/pull/104181#issuecomment-1745472323))
2023-10-03 18:07:02 +00:00
40be6b72e1 [ez] Type function in distributed_c10d (#110435)
This function returns a `torch.device`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110435
Approved by: https://github.com/awgu
2023-10-03 17:54:04 +00:00
5977d17953 Update common_methods_invocations.py (#110383)
Description:
- Fixed misleading test sample case

Context: the sample input is composed of an input tensor `(N, C, iH, iW)` and a grid tensor `(N, oH, oW, 2)`; however, the grid was defined as `(N, C, oW, 2)`.
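
For reference, a sketch with the corrected shapes:

```python
import torch
import torch.nn.functional as F

N, C, iH, iW, oH, oW = 2, 3, 8, 8, 4, 4
inp = torch.randn(N, C, iH, iW)
grid = torch.rand(N, oH, oW, 2) * 2 - 1   # (N, oH, oW, 2), not (N, C, oW, 2)
out = F.grid_sample(inp, grid, align_corners=False)
assert out.shape == (N, C, oH, oW)
```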

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110383
Approved by: https://github.com/peterbell10
2023-10-03 17:53:39 +00:00
aecfe5d168 [aoti] Remove pessimizing move (#110446)
"`std::move` of a temporary prevents copy elision" says the compiler,
and I am pretty sure it is right.  Since AtenTensorHandle* implicitly converts
to RAIIAtenTensorHandle, I simply called emplace_back; happy to put an explicit
ctor if that makes folks happier.

Differential Revision: [D49842542](https://our.internmc.facebook.com/intern/diff/D49842542/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110446
Approved by: https://github.com/desertfire, https://github.com/Skylion007
ghstack dependencies: #110445
2023-10-03 17:44:58 +00:00
174e46b853 [inductor][easy] Free functions in headers should be declared inline (#110445)
If multiple files include model.h, you end up with duplicate symbol errors.

Differential Revision: [D49842167](https://our.internmc.facebook.com/intern/diff/D49842167/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110445
Approved by: https://github.com/desertfire, https://github.com/Skylion007
2023-10-03 17:44:49 +00:00
cd0e7d133b Migrate MacOs wheel binary builds to ephemeral M1 runners (#110432)
Surprisingly, there is no speed difference between running the cross-compilation on `macos12-xl` (x86_64, 12-core machine) and `macos-13-xlarge` (M1, 6-core machine).

Most of the changes are on the https://github.com/pytorch/builder side:
- 50a6e91f97 skips installing mkl on M1 machines
- bbb29b0467 same for llvm-9
- 8bcc83dbb1 bumps minimal numpy version to 1.19 (as 1.17 is not available for m1)
- cc4f1f9055 skips building tests/distributed for M1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110432
Approved by: https://github.com/kit1980
2023-10-03 17:31:28 +00:00
7f0a659ccc Script to compare measured (trace) runtimes with estimated runtimes (#108037) (#109076)
Summary:

X-link: https://github.com/pytorch/benchmark/pull/1856

Reviewed By: xmfan, xuzhao9

Differential Revision: D48523883

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109076
Approved by: https://github.com/xw285cornell
2023-10-03 17:05:35 +00:00
f2a1b93549 Back out "[quant] Support integer implementations for adaptive_avg_pool2d (#104226)" (#110316)
Summary:
Original commit changeset: acdb5b34e3aa

Original Phabricator Diff: D47321689

Test Plan: opinfo tests in CI

Differential Revision: D49789403

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110316
Approved by: https://github.com/kimishpatel
2023-10-03 16:59:23 +00:00
9bc5e10899 [New][1/N] Dynamo skipfiles refactor (#110330)
This is the replacement for #109567. It preserves all existing semantics, focusing only on API (for developers) and code structure changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110330
Approved by: https://github.com/ezyang
2023-10-03 16:50:33 +00:00
aa3629ee3e Fix typo under docs directory (#110359)
This PR fixes typos in `.rst` files under the docs directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110359
Approved by: https://github.com/kit1980
2023-10-03 16:36:05 +00:00
4069d1de59 [distributed] Remove recordStream for callback that ends a profiler event (#109933)
**Background**: recordStream calls can result in memory spikes, so we don't want them to appear in FSDP (https://dev-discuss.pytorch.org/t/fsdp-cudacachingallocator-an-outsider-newb-perspective/1486). @awgu is working on fixing this, but it turns out the profiler was causing recordStream to get called when it is enabled.

Why profiler was causing recordStream to get called: NCCL calls add profiler events manually; they register a callback to be executed when the future for the collective is completed; this indicates the end of the CPU-side profiler event for the callback:

c2c7c4035f/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp (L1822-L1824)

In order to guarantee safety, ivalue::Future::invokeCallback calls `recordStream` on the future's storage buffers; this marks the fact that other streams (e.g. the one that the callback runs on) may need to use the storage.

c2c7c4035f/aten/src/ATen/core/ivalue_inl.h (L1171-L1173)

**Change**: The end-profiler-event callback doesn't actually use the future, so we don't need to recordStream on it. This PR introduces an optional parameter `uses_future` for adding callbacks; a user can set this variable to "false" to unsafely skip the recordStream, if the user knows that the future will not be used in the lambda.

**Tests**: (a) unit tests; (b) added an assert in recordStream: c2c7c4035f/c10/cuda/CUDACachingAllocator.cpp (L3260) and verified that it doesn't get triggered when running basic distributed tests w/ profiler enabled
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109933
Approved by: https://github.com/wconstab
2023-10-03 14:40:43 +00:00
ff96f6d04f [core IR][reland] Add split.Tensor and unbind decompositions to core ATen decomp table (#110323)
Summary:
This is a reland of [github PR #110102]( https://github.com/pytorch/pytorch/pull/110102).

The original PR had to be unlanded due to internal CI failures. This diff applies some small fixes to the failing tests to adjust to the new decompositions.

Note that `lift_fresh` will not be decomposed for now, since it was found that [constant propagation looks specifically for `lift_fresh`](13af952f94/torch/fx/experimental/proxy_tensor.py (L381-L386)). Therefore decomposing `lift_fresh` would interfere with constant propagation during export.

Test Plan: Github CI and internal CI

Differential Revision: D49761321

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110323
Approved by: https://github.com/jansel
2023-10-03 14:35:04 +00:00
4cdc52a2d4 Bump urllib3 from 2.0.2 to 2.0.6 in /tools/build/bazel (#110421)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.0.2 to 2.0.6.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.0.2...2.0.6)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-03 07:13:28 -07:00
2cbfcc740f use torch.xpu.manual_seed_all in torch.seed (#110376)
# Motivation
Use manual_seed_all instead of manual_seed, because multi-device is supported in the XPU backend.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110376
Approved by: https://github.com/ezyang
2023-10-03 13:41:55 +00:00
428cbd7513 [ao] fixing multihead attention convert size (#110407)
Summary: After converting nn.MultiheadAttention, we weren't deleting the
old in_proj_weight and in_proj_bias despite not (really) using them.

Test Plan: python test/test_quantization.py -k
"test_custom_module_multi_head_attention"

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110407
Approved by: https://github.com/jerryzh168
2023-10-03 08:49:12 +00:00
f76e5c846d Speed-up casts to FP8 (#110251)
Unlike half/bfloat16 casts, where the entire model is cast to half-precision floats, only parts of the network can be in float8, and therefore the performance of the casts is important.

Speed up casts by implementing non-dynamically-castable variants using the new refactored `gpu_kernel_nocast` template.

Measure performance using the following script:
```python
import torch

def run_cast_bench(size=(10000, 10000), src_dtype=torch.float16, dtype=torch.float8_e5m2):
    x=torch.rand(size, device="cuda", requires_grad=False, dtype=src_dtype)
    z=torch.empty(size, device="cuda", dtype=dtype, requires_grad=False)
    with torch.profiler.profile(activities=[torch.profiler.ProfilerActivity.CUDA]) as prof:
        z.copy_(x)
    rc=prof.key_averages()
    print(f"Running bench for src_dtype={src_dtype} dst_dtype={dtype} cuda_time={rc[1].cuda_time}")

if __name__ == "__main__":
    for dtype in [torch.float8_e5m2, torch.float8_e4m3fn]:
        run_cast_bench(src_dtype=torch.half, dtype=dtype)
        run_cast_bench(src_dtype=torch.float, dtype=dtype)
        run_cast_bench(src_dtype=torch.bfloat16, dtype=dtype)
```

Below are before and after results:
|  Cast type | After | Before |
| ---------- | ------ | ----- |
| fp32->e5m2 | 228 us | 336 us|
| fp16->e5m2 | 150 us | 323 us|
| bf16->e5m2 | 150 us | 322 us|
| fp32->e4m3 | 227 us | 331 us|
| fp16->e4m3 | 148 us | 318 us|
| bf16->e4m3 | 149 us | 318 us|

Skip the optimizations on the ROCm platform.
TODO:
 - Investigate why `__nv_cvt` intrinsics defined in https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__FP8__MISC.html end up being slower

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110251
Approved by: https://github.com/drisspg
2023-10-03 08:16:47 +00:00
f4c0ef95bc [AMD] Fix broken build from nested transformer utils (#110245)
Summary: D49374910 breaks the internal AMD build because we didn't hipify the header file in nested/cuda. Maybe it's just easier to move it outside.

Reviewed By: nmacchioni

Differential Revision: D49743234

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110245
Approved by: https://github.com/drisspg
2023-10-03 08:05:10 +00:00
d9fe1713c3 Enabled batch rule decompositions for upsample*.vec ops (#110333)
Follow-up PR to https://github.com/pytorch/pytorch/pull/110172
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110333
Approved by: https://github.com/zou3519
2023-10-03 06:58:18 +00:00
15219f53d1 [AOTInductor] Fix ProxyExecutor's handling on multiple outputs (#110374)
Summary: Fix ProxyExecutor after D49780781

Test Plan: buck2 run mode/dev-nosan deeplearning/aot_inductor/test:test_custom_ops

Differential Revision:
D49816044

Privacy Context Container: 368960445142440

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110374
Approved by: https://github.com/chenyang78
2023-10-03 06:42:22 +00:00
03f28dbce3 [HigherOrderOp] Better testing strategy for wrap that checks guards and recompiles (#110343)
Fixes #109251

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110343
Approved by: https://github.com/zou3519
2023-10-03 05:57:38 +00:00
ce50132748 [vision hash update] update the pinned vision hash (#110424)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110424
Approved by: https://github.com/pytorchbot
2023-10-03 05:20:27 +00:00
d15d7a6485 [DTensorTestbase] Add "cpu:gloo,cuda:nccl" backend to DTensorTestbase (#110397)
This PR makes backend a property of DTensorTestbase and adds "cpu:gloo,cuda:nccl" support so that we can use the `cpu:gloo,cuda:nccl` backend for checkpoint unit tests.

cc. @wanchaol, @fduwjj, @XilunWu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110397
Approved by: https://github.com/wanchaol
2023-10-03 04:54:02 +00:00
f7909cb947 Build and test iOS on GitHub M1 runners (#110406)
They are here https://github.blog/2023-10-02-introducing-the-new-apple-silicon-powered-m1-macos-larger-runner-for-github-actions

I have been able to run iOS simulator tests on my M1 laptop without issues.  Some numbers:

* iOS build takes ~1h with x86 runners
* The new M1 runners take ~20m https://github.com/pytorch/pytorch/actions/runs/6386171957

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110406
Approved by: https://github.com/malfet, https://github.com/seemethere
2023-10-03 03:17:10 +00:00
3fe94e46c2 Skip test_retracibility under ASAN (#110414)
See https://github.com/pytorch/pytorch/issues/110416

Skipping this under ASAN to make CI green.
This probably needs to be moved to slow tests eventually.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110414
Approved by: https://github.com/huydhn, https://github.com/malfet
2023-10-03 02:05:35 +00:00
3bd229b53c Add quantized tensor function to get scale and zero point (#110095)
Summary: See summary

Test Plan:
buck2 run --target-platforms ovr_config//platform/macos:arm64-fbsource -c pt.vulkan_full_precision=1
//xplat/caffe2/fb/custom_ops/vulkan_quantized:pt_vulkan_quantized_test_binAppleMac\#macosx-arm64
[       OK ] VulkanAPITest.convert_qconv2d_context (135 ms)
[ RUN      ] VulkanAPITest.linear_2d
[       OK ] VulkanAPITest.linear_2d (4 ms)
[----------] 2 tests from VulkanAPITest (139 ms total)
[----------] Global test environment tear-down
[==========] 2 tests from 1 test suite ran. (139 ms total)
[  PASSED  ] 2 tests.
##############################################################
buck2 build --target-platforms ovr_config//platform/macos:arm64-fbsource
//xplat/caffe2:pt_vulkan_quantized_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1 --show-output"
buck-out//v2/gen/fbsource/xplat/caffe2/pt_vulkan_quantized_api_test_binAppleMac
[       OK ] VulkanAPITest.conv2d_pw_quantized_prepack_random_params_int8_int32 (11 ms)
[ RUN      ] VulkanAPITest.linear_2d_flat
[       OK ] VulkanAPITest.linear_2d_flat (4 ms)
[ RUN      ] VulkanAPITest.linear_2d_small
[       OK ] VulkanAPITest.linear_2d_small (1 ms)
[ RUN      ] VulkanAPITest.linear_2d_large
[       OK ] VulkanAPITest.linear_2d_large (1 ms)
[ RUN      ] VulkanAPITest.linear_3d_flat
[       OK ] VulkanAPITest.linear_3d_flat (2 ms)
[ RUN      ] VulkanAPITest.linear_3d_small
[       OK ] VulkanAPITest.linear_3d_small (2 ms)
[ RUN      ] VulkanAPITest.linear_3d_large
[       OK ] VulkanAPITest.linear_3d_large (1 ms)
[ RUN      ] VulkanAPITest.linear_4d_flat
[       OK ] VulkanAPITest.linear_4d_flat (1 ms)
[ RUN      ] VulkanAPITest.linear_4d_small
[       OK ] VulkanAPITest.linear_4d_small (1 ms)
[ RUN      ] VulkanAPITest.linear_4d_large
[       OK ] VulkanAPITest.linear_4d_large (1 ms)
[ RUN      ] VulkanAPITest.linear_custom
[       OK ] VulkanAPITest.linear_custom (0 ms)
[----------] 76 tests from VulkanAPITest (1811 ms total)
[----------] Global test environment tear-down
[==========] 76 tests from 1 test suite ran. (1811 ms total)
[  PASSED  ] 76 tests.
YOU HAVE 8 DISABLED TESTS
##############################################################
buck2 run --target-platforms ovr_configplatform/macos:arm64-fbsourcexplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1
[----------] Global test environment tear-down
[==========] 346 tests from 1 test suite ran. (5648 ms total)
[  PASSED  ] 345 tests.
[  SKIPPED ] 1 test, listed below:
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log
YOU HAVE 5 DISABLED TESTS

Differential Revision: D49609986

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110095
Approved by: https://github.com/yipjustin
2023-10-03 01:48:31 +00:00
f69e9c8c91 run_tests.py minor logging changes (#110188)
Minor logging changes that just kind of annoyed me:
* prevent constant printing of `No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'` by moving the import within the function (idk if this is ok)
* prevent constant printing of `Ignoring disabled issues:  ['']` (no idea why it was not gated behind a function or main)
* change all prints in run_tests.py to go through stderr so there's no weird interleaving (although if everything goes through stderr, might as well just print everything through stdout...)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110188
Approved by: https://github.com/huydhn, https://github.com/malfet, https://github.com/ZainRizvi
2023-10-03 01:22:47 +00:00
e55d6f923c minor tf32 fixes for unit tests on H100 and L40 (#110201)
fixes the following tests which were failing in the NVIDIA internal CI on H100 and L40:

test/test_nn.py:
* test_TransformerEncoderLayer_gelu_activation_cuda_tf32
* test_Transformer_multilayer_coder_cuda_tf32

test/inductor/test_torchinductor.py:
* test_batch_norm_2d_2_cuda

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110201
Approved by: https://github.com/mikaylagawarecki, https://github.com/jansel, https://github.com/Skylion007
2023-10-03 00:10:37 +00:00
3812f2e40c Preserve layout on like constructors (#110242)
Partially fixes `test_memory_format_factory_like_functions_preserve` with PYTORCH_TEST_WITH_INDUCTOR. Inductor preserves memory layouts for user-visible outputs as annotated on the fx graph that it is passed. That graph is generated by running aot_autograd with decompositions. If the decompositions give incorrect strides, so will inductor.

This preserves the layout of `_like` operators when it corresponds to a `torch.memory_format`. It doesn't fix (a) arbitrary permutations or (b) striding of non-dense outputs. Both of these are lower priority compared to preserving channels-last. We would need either https://github.com/pytorch/pytorch/issues/92920 or a `to` variant that takes in a physical layout for arbitrary permutations. I converted the output of rand to the correct layout instead of passing the layout in so that this composes with the `replace_random` pass, and because the two pointwise ops will get fused anyway.
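
A sketch of the behavior this change preserves under compilation (shapes are illustrative):

```python
import torch

x = torch.randn(2, 3, 8, 8).to(memory_format=torch.channels_last)

@torch.compile
def f(t):
    return torch.rand_like(t)   # the _like op should keep the channels-last layout

assert f(x).is_contiguous(memory_format=torch.channels_last)
```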

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110242
Approved by: https://github.com/int3
2023-10-02 23:53:55 +00:00
cyy
d58a91b2a6 [4/N] Move remaining c10::variant calls to std::variant (#110382)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110382
Approved by: https://github.com/Skylion007
2023-10-02 23:52:04 +00:00
01b2f25ebd [inductor] Cast loads from boolean tensors to tl.int1 (#110388)
Triton currently loads pointers to `tl.int1` as `tl.int8`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110388
Approved by: https://github.com/lezcano, https://github.com/Skylion007
2023-10-02 22:52:08 +00:00
28b3ff7974 [quant][pt2e][docs] Update main quant doc with pt2 export quantization information (#110260)
Summary: as titled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110260
Approved by: https://github.com/kimishpatel
2023-10-02 21:29:38 +00:00
cba3f407b1 Revert "[HigherOrderOp] Flatten outputs of wrap. (#109433)"
This reverts commit 651b198cdfac09d506a586040b16a88db1f54d85.

Reverted https://github.com/pytorch/pytorch/pull/109433 on behalf of https://github.com/kit1980 due to Depends on reverted https://github.com/pytorch/pytorch/pull/110290 ([comment](https://github.com/pytorch/pytorch/pull/109433#issuecomment-1743766271))
2023-10-02 21:09:19 +00:00
859733512f Revert "Refactor expect tests on test_higher_order_ops.py. (#110290)"
This reverts commit d9aecaefbe477256022ae0c0eae3a77a71bcb320.

Reverted https://github.com/pytorch/pytorch/pull/110290 on behalf of https://github.com/kit1980 due to Broke multiple tests and also lint https://github.com/pytorch/pytorch/actions/runs/6384854768/job/17329068768 ([comment](https://github.com/pytorch/pytorch/pull/110290#issuecomment-1743764686))
2023-10-02 21:07:19 +00:00
cdde899a73 [FSDP][optim_state_dict] Fuse allgather for optim_state_dict when use_orig_params is True (#108298)
The original implementation of `_gather_orig_param_state` is naive. It performs one allgather_object and two allgathers (if the optimizer is Adam) per FQN. This can be slow and can make `_optim_state_dict` a bottleneck.

This PR rewrites the implementation and fuses all the `allgather_object`s into one. As for `allgather`, it is fused based on the information of the FlatParameters, so there will be 2N allgathers, where N is the number of FlatParameters and 2 is due to Adam having 2 states per FQN.

One experiment on 8 A100 GPUs shows that the execution of the gathering improves from 3 seconds to 0.3 seconds.

Differential Revision: [D48835138](https://our.internmc.facebook.com/intern/diff/D48835138/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108298
Approved by: https://github.com/awgu
2023-10-02 20:57:08 +00:00
15dfe7b8e3 Actually enable typechecking for _inductor/index_propagation.py (#110110)
It was supposed to be enabled in #105622 but that PR neglected to update
.lintrunner.toml.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110110
Approved by: https://github.com/Skylion007
2023-10-02 20:57:03 +00:00
80b6f072e3 [ATen] Remove ATen.h includes from transformers (#110199)
The kernel files here in particular are quite slow to compile and don't use anything from `ATen.h`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110199
Approved by: https://github.com/malfet
2023-10-02 20:43:23 +00:00
c28bb46445 Fix test_mem_efficient_attention_vs_math_ref_grads tolerance from test_transformers.py (#108094)
The tolerance is currently too low, triggering test failures via numerical mismatch in NVIDIA internal testing for certain H100, A16, and A40 configs. cc: @ptrblck @eqy

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108094
Approved by: https://github.com/eqy, https://github.com/msaroufim
2023-10-02 20:42:57 +00:00
6b2c52278e Benchmark flag to include slowdowns when computing gmean of speedups over eager (#108375)
`clip(1)` excludes slowdowns by treating them as 1x.
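
A sketch of the difference, assuming per-model speedups measured as eager time over compiled time:

```python
import numpy as np

speedups = np.array([1.8, 1.3, 0.7])   # 0.7x is a slowdown

gmean = lambda a: float(np.exp(np.log(a).mean()))
print(gmean(speedups.clip(1)))  # old behavior: the slowdown is counted as 1x
print(gmean(speedups))          # with the new flag: the slowdown is included
```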

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108375
Approved by: https://github.com/jansel
2023-10-02 20:35:08 +00:00
b5268456f9 Fix optimize_for_inference to support modules that don't have a forward method (#110013)
Fixes #108662

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110013
Approved by: https://github.com/davidberard98
2023-10-02 20:13:44 +00:00
651b198cdf [HigherOrderOp] Flatten outputs of wrap. (#109433)
Fix: #109247

This PR flattens `wrap` outputs by inlining `pytree.tree_flatten` function after calling
the inner function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109433
Approved by: https://github.com/zou3519
ghstack dependencies: #110290
2023-10-02 19:58:30 +00:00
d9aecaefbe Refactor expect tests on test_higher_order_ops.py. (#110290)
This PR inlines the expected strings onto the `assertExpectedInline` calls, so that, when a
change is needed, we may do it by using the `expecttest` machinery: setting the
environment variable `EXPECTTEST_ACCEPT=1`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110290
Approved by: https://github.com/zou3519
2023-10-02 19:58:30 +00:00
92242f599a [PyTorch] Add Expanded call stack to nodes [Take 2] (#110229)
Summary:
Adding back D46578700 / PR https://github.com/pytorch/pytorch/pull/108426

Note: The changes were originally reverted due to a memory regression; these changes put the code behind a gflag so it is only used by binaries that require the expanded stack for BPF profiling.

Original Diff comment:
To get a Node's call stack, we currently loop over the InlinedCallStack graph and follow the "callee" chain. Since the node's inlined stack does not change, we can optimize this by expanding the node's inlined stack once and reusing it. This is particularly useful when reading the node's stack from another process (e.g., BPF), as it simplifies the memory traversal process.
The new data structure (NodeSourceInfo) only holds pointers to the function name and file name variables, and assumes these objects will be alive throughout the lifetime of the process.
Each Node has an extended attribute that holds an index into a vector of stack frames, `expanded_node_stacks_`.
`node_stack_attr_symbol_` is only needed to make accessing the stack vector index attribute easier from BPF.

Test Plan:
- Verified using BPF Program in subsequent diffs
- Perf testing for loading large model: P822455246

Differential Revision: D49565461

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110229
Approved by: https://github.com/zdevito
2023-10-02 19:52:41 +00:00
16e3f158b9 Add function to port FX minified graph to HLO via StableHLO (#109084)
If the `XLA_HLO_DEBUG` flag is enabled, generate a minified HLO graph when using the minifier. This function enables HLO minification support by porting the minified FX graph to StableHLO via the `save_torch_model_as_stablehlo` function.

This allows users to port the minified graph to compilers that are not compatible with TorchDynamo/Inductor workflow and use XLA instead. The purpose of this PR is to help XLA users debug accuracy and compilation errors. It will also be helpful for existing TorchDynamo/XLA workflow on `torchxla_trace_once` backend as well.

Fixes [#5461](https://github.com/pytorch/xla/issues/5461) in Torch XLA repo. CC @GleasonK @qihqi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109084
Approved by: https://github.com/anijain2305
2023-10-02 19:36:04 +00:00
7e6cf04a84 Revert "Multiprocessing support for NT (#110292)"
This reverts commit 881e7304d6315c17953fa5b9bc1dfe07dcb7d166.

Reverted https://github.com/pytorch/pytorch/pull/110292 on behalf of https://github.com/jbschlosser due to Address review comments ([comment](https://github.com/pytorch/pytorch/pull/110292#issuecomment-1743524901))
2023-10-02 18:27:13 +00:00
881e7304d6 Multiprocessing support for NT (#110292)
Fixes #110161

Allows NTs to be used in DataLoaders with `num_workers > 1`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110292
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #110219
2023-10-02 18:14:34 +00:00
7827ae2864 Increase job timeout limit when running with memory leak check (#110193)
This fixes the daily timeout of ROCm jobs when running with memory leak check turned on. I want to use something like `inputs.timeout-minutes * 2`, but that syntax, unfortunately, isn't supported in GitHub Actions YAML. So I decided to just double the current timeout value of 300 minutes to make it 600 minutes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110193
Approved by: https://github.com/clee2000
2023-10-02 18:01:49 +00:00
8d6479725a Revert "Adding Backward Support for NestedTensors and FlashAttention (#97485)"
This reverts commit 28d69d52569c8d140e83a2411e6066c903b94b29.

Reverted https://github.com/pytorch/pytorch/pull/97485 on behalf of https://github.com/huydhn due to Sorry for reverting you change, but one of the tests test_fused_kernels_nested_broadcasting_requires_grad_failure_cuda is failing on Windows CUDA f7ba3e85e2 ([comment](https://github.com/pytorch/pytorch/pull/97485#issuecomment-1743474468))
2023-10-02 17:48:57 +00:00
26900d21c2 [dtensor] skip pytree when not necessary (#110132)
pytree is a great tool, but it is sometimes considered harmful for
tensor subclasses: it's useful for implementing a subclass quickly, but it:
* exposes non-trivial CPU overhead
* is unnecessary for many ops; only the ones with list/dict arguments need it
* has semantic issues for inplace/out ops when blindly used to re-wrap results

This PR avoids using pytree for most ops during torch_dispatch and only
enables it for certain ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110132
Approved by: https://github.com/fduwjj
2023-10-02 17:44:34 +00:00
cyy
fd6c993eea Add missing CUDA error check (#110368)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110368
Approved by: https://github.com/Skylion007
2023-10-02 17:34:31 +00:00
46d1f9b385 fix(lint): Fix lint issues on main (#110389)
Lint issue was introduced in https://github.com/pytorch/pytorch/pull/110186

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110389
Approved by: https://github.com/Skylion007, https://github.com/malfet
2023-10-02 17:04:01 +00:00
a3c1e3c95c Generalize toAccumulateType() (#108248)
Trying to address this comment: https://github.com/pytorch/pytorch/pull/106666#discussion_r1297397554

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108248
Approved by: https://github.com/kulinseth, https://github.com/albanD
2023-10-02 16:34:36 +00:00
cyy
7853f8f6da Fix override warnings in nvfuser (#110350)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110350
Approved by: https://github.com/Skylion007
2023-10-02 16:29:29 +00:00
e47e946bbf [aotinductor] Use dynamic_shape instead of constraints (#110360)
Summary:
Previously we used export's constraints to specify all batch-size dimensions being dynamic. This was done by creating one constraint, `dynamic_dim(inp[0][0], lower, upper)`, followed by `dynamic_dim(inp[0][0]) == dynamic_dim(inp[i][0])` for every input `i`.

Through the new `dynamic_shapes` API, we can use `Dims("batch_size")` on every dimension to specify which dimensions are dynamic and equal to each other, and `None` otherwise: `{i: [Dims("batch_size", lower, upper), None] for every input i}`

Note: `dynamic_shapes` and `constraints` utilize the same "constraints" backend so this diff should be idempotent.
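
A hedged sketch of the idea using the public `torch.export.Dim` spelling; the diff's internal helper is spelled `Dims(...)`, but the shape-spec structure is the same:

```python
import torch
from torch.export import Dim, export

batch = Dim("batch_size", min=1, max=1024)

class M(torch.nn.Module):
    def forward(self, x, y):
        return x + y

ep = export(
    M(),
    (torch.randn(4, 8), torch.randn(4, 8)),
    # dim 0 of both inputs is dynamic and constrained to be equal
    dynamic_shapes={"x": {0: batch}, "y": {0: batch}},
)
```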

Test Plan: `buck2 run @//mode/dev-nosan //caffe2/torch/fb/model_transform/experimental/benchmark/test/aotinductor:test_aot_inductor_benchmark`

Reviewed By: chenyang78, aakhundov

Differential Revision: D49784351

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110360
Approved by: https://github.com/desertfire
2023-10-02 16:09:37 +00:00
87f8bc65f8 Added new test sample to interpolate op in OpInfo (#104181)
Description:
- Added new test sample to interpolate op in OpInfo
- Fixed silent issue with zero tensor test sample for uint8 dtype

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104181
Approved by: https://github.com/pmeier, https://github.com/lezcano
2023-10-02 15:35:48 +00:00
175b626216 Enable torch.promote_types in Dynamo tracing (#110358)
Fixes #109508

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110358
Approved by: https://github.com/Skylion007
2023-10-02 15:20:36 +00:00
e0348ceceb Avoid undefined behavior in JIT-generated conversion code (#110212)
The inductor/dynamo JIT generator creates C++ code using `static_cast` for type conversions.
This can be undefined behavior for e.g. `static_cast<uint8_t>(floatVal)` where `floatVal` is a negative value.

To avoid this in the "regular" C++ code, `c10::convert` is used. So use it in the JIT-generated code too.

Fixes #110077

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110212
Approved by: https://github.com/ezyang, https://github.com/jgong5, https://github.com/desertfire
2023-10-02 12:56:41 +00:00
f7812cdbd9 [inductor][Optimus]Improve logging for Optimus (#110186)
Summary: It is based on the diff D49340843. We add more logs for better debugging and logging.

Test Plan:
```
[2023-09-27 20:35:53,844] [0/0] torch._inductor.fx_passes.group_batch_fusion: [INFO] Before group_batch fusion in pre grads pass. Print graph: https://www.internalfb.com/intern/everpaste/?color=0&handle=GEoA8xb22jibUNEEAPYecF9_RVM1br0LAAAz
[2023-09-27 20:35:55,001] [0/0] torch._inductor.fx_passes.group_batch_fusion: [INFO] Apply fusion BatchLinearFusion. Print graph: https://www.internalfb.com/intern/everpaste/?color=0&handle=GPMR9BYffjwToEQCAFS7rgixMi0pbr0LAAAz
[2023-09-27 20:35:57,419] [0/0] torch._inductor.fx_passes.group_batch_fusion: [INFO] Apply fusion BatchLinearLHSFusion. Print graph: https://www.internalfb.com/intern/everpaste/?color=0&handle=GKiA8hNycGpBdAIDAOn0c1Hpef4sbr0LAAAz
[2023-09-27 20:35:57,585] [0/0] torch._inductor.fx_passes.group_batch_fusion: [INFO] BatchLayernormFusion: key = ('batch_layernorm', 'torch.Size([2048, 128])', 'torch.Size([128])', 'torch.Size([128])', '(128,)', '1e-05'); subset size = 7
[2023-09-27 20:35:58,493] [0/0] torch._inductor.fx_passes.group_batch_fusion: [INFO] Apply fusion BatchLayernormFusion. Print graph: https://www.internalfb.com/intern/everpaste/?color=0&handle=GKpftRa9Glxm-MYDAOZb_D80JHsYbr0LAAAz
[2023-09-27 20:35:59,754] [0/0] torch._inductor.fx_passes.group_batch_fusion: [INFO] Apply fusion BatchTanhFusion. Print graph: https://www.internalfb.com/intern/everpaste/?color=0&handle=GPgh9BZQl4EKGckAAES094iV3Atrbr0LAAAz
I0927 20:36:00.532000 3750607 pre_grad.py:71] After group_batch_fusion_pre_grad_passes: https://www.internalfb.com/intern/everpaste/?color=0&handle=GBPb8xYxfrbXuCMDAI5d_a4YyhFBbr0LAAAz
```

Differential Revision: D49710166

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110186
Approved by: https://github.com/jackiexu1992, https://github.com/yanboliang
2023-10-02 07:29:25 +00:00
06464a3477 Change compiled_autograd tests to xfail instead of skip (#110348)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110348
Approved by: https://github.com/Chillee, https://github.com/jansel, https://github.com/Skylion007
2023-10-01 23:03:36 +00:00
a588648759 [DCP] Fix 'torch.cpu' has no attribute 'current_device' in checkpoint/optimizer.py (#110299)
When running on the "gloo" and "cpu:gloo,cuda:nccl" backends, it runs into the following error.

```
-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/data/users/irisz/pytorch/torch/multiprocessing/spawn.py", line 74, in _wrap
    fn(i, *args)
  File "/data/users/irisz/pytorch/torch/distributed/checkpoint/examples/fsdp_checkpoint_example.py", line 105, in run_fsdp_checkpoint_example
    optim_state = load_sharded_optimizer_state_dict(
  File "/data/users/irisz/pytorch/torch/distributed/checkpoint/optimizer.py", line 295, in load_sharded_optimizer_state_dict
    _alloc_tensor(value.properties, value.size, dp_pg_device_type), sharding_spec
  File "/data/users/irisz/pytorch/torch/distributed/checkpoint/optimizer.py", line 109, in _alloc_tensor
    device=cast(torch.device, _get_device_module(device_type).current_device()),
AttributeError: module 'torch.cpu' has no attribute 'current_device'
```

This PR fixes the error in optimizer.py. Will follow up to add "cpu:gloo,cuda:nccl" support in DTensorBase so we can update the unit test to include this backend.
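A minimal sketch of the guard this implies (hypothetical helper; the actual change in optimizer.py may differ):

```python
import torch

def _current_device(device_type: str) -> torch.device:
    mod = getattr(torch, device_type)
    if hasattr(mod, "current_device"):
        return torch.device(device_type, mod.current_device())
    # torch.cpu has no current_device() here, so fall back to a plain device
    return torch.device(device_type)
```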
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110299
Approved by: https://github.com/kumpera
2023-10-01 21:54:13 +00:00
13af952f94 [export] Add run_decomposition() function to ExportedProgram (#110236)
Summary:
https://docs.google.com/document/d/1QJJEGnj2nHGPODlw38BEG3KLLCOTfdOVjPrNQbz_LM8/edit#bookmark=id.lp80wfshq130

`exported_program.run_decompositions(decomposition_table)` will optionally take a decomposition table, and run decompositions on the exported program, returning a new exported program. By default we will run the Core ATen decomposition table.
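A sketch of the intended usage (module and inputs are illustrative):

```python
import torch
from torch.export import export

class M(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.silu(x)

ep = export(M(), (torch.randn(2, 3),))
core_ep = ep.run_decompositions()  # defaults to the Core ATen decomp table
print(core_ep.graph)
```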

Splitting up this diff with the following one (D49742989) to make migrating Executorch easier:
1. Land this diff
2. Wait for a pytorch nightly to include this diff
3. Update executorch's pytorch nightly
4. Land the following diff to have export() return no decomps

Test Plan: Tested in following diff

Differential Revision: D49743208

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110236
Approved by: https://github.com/gmagogsfm
2023-10-01 18:18:27 +00:00
13681382d5 Add heuristic for when evict_first should be set (and some other minor things) (#108841)
Example of when the `evict_first` heuristic helps.
```
@torch.compile
def f(a, b):
    return (a * b).sum(dim=-1)

N = 512
inps = (torch.randn(N, N, N).permute(2, 1, 0), torch.randn(N, N, N).permute(1, 2, 0))
from torch._inductor.utils import do_bench
print(do_bench(lambda: f(*inps)))
```

This generates code like this: http://ix.io/4HFs

```
Original: 3.8 ms
This PR: 3.54 ms
Always evict_first: 5.4 ms
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108841
Approved by: https://github.com/lezcano, https://github.com/jansel
2023-10-01 17:06:12 +00:00
e4414716d5 [onnx] support attn_mask fp16 type (#110306)
When users define a customized `attention_mask` using `dtype=torch.float16`, e.g.

```
from torch.nn import functional as F

float_min = torch.finfo(torch.float16).min

attention_mask_fp16 = (attention_mask * 1.0).masked_fill(attention_mask, float_min).to(torch.float16)

attn_output = F.scaled_dot_product_attention(
                 query_layer_, key_layer_, value_layer_, attention_mask_fp16, 0.0, is_causal=False
 )
```

the ONNX graph cannot be exported.

When q, k, v have the fp16 type, we can support an fp16 `attn_mask` as well, by changing the check to
```
elif (
    _type_utils.JitScalarType.from_value(attn_mask)
    in (_type_utils.JitScalarType.FLOAT, _type_utils.JitScalarType.HALF)
):
```
With this change, the `.onnx` graph can be exported.

Fixes #109336

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110306
Approved by: https://github.com/titaiwangms
2023-10-01 14:50:58 +00:00
cyy
55905c4a1a [2/N] Enable clang-tidy to c10/test/*cpp (#110270)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110270
Approved by: https://github.com/Skylion007, https://github.com/kit1980
2023-10-01 07:36:23 +00:00
cyy
ef5ff79019 [2/N] Clean up CMake target linking (#109986)
This PR cleans up more CMake target linking.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109986
Approved by: https://github.com/malfet
2023-10-01 05:36:08 +00:00
669faab0ad [AOTInductor] Add non-default device test (#110024)
Differential Revision: D49604597

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110024
Approved by: https://github.com/chenyang78
2023-10-01 05:08:23 +00:00
2bcae75513 [vision hash update] update the pinned vision hash (#110344)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110344
Approved by: https://github.com/pytorchbot
2023-10-01 04:20:06 +00:00
e8c0364f36 [AOTInductor] Add model runner to avoid using torch_extension (#110263)
Differential Revision: D49609669

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110263
Approved by: https://github.com/chenyang78
2023-10-01 00:52:17 +00:00
898656e9d1 [AOTInductor] ProxyExecutor supports Tuple of Tensor and List[Tensor] in returns (#110187)
Summary:
ProxyExecutor supports custom ops that return a tuple mixing Tensor and List[Tensor]
e.g. `"fn_with_mix_outputs(Tensor t, Tensor[] tensors) -> (Tensor, Tensor[])"`

Example:
`out7, [out8, out9] = torch.ops.fb.fn_with_mix_outputs(out5, [out6, out4])`
got compiled into
```
    AtenTensorHandle buf11_handle;  // output buffer
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_new_uninitialized_tensor(&buf11_handle));
    RAIIAtenTensorHandle buf11(buf11_handle);
    AtenTensorHandle buf12_handle;  // output buffer
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_new_uninitialized_tensor(&buf12_handle));
    RAIIAtenTensorHandle buf12(buf12_handle);
    AtenTensorHandle buf13_handle;  // output buffer
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_new_uninitialized_tensor(&buf13_handle));
    RAIIAtenTensorHandle buf13(buf13_handle);
    AtenTensorHandle tensor_args_var_7[] = {buf8.get(), buf9.get(), buf6.get(), buf11.get(), buf12.get(), buf13.get()};
    int64_t int_args_var_8[] = {};
    aoti_torch_proxy_executor_call_function(proxy_executor, 3, 0, int_args_var_8, 6, tensor_args_var_7);
```

Serialized extern node
```
    {
      "name": "buf10",
      "node": {
        "target": "fb::fn_with_mix_outputs",
        "inputs": [
          {
            "name": "t",
            "arg": {
              "asTensor": {
                "name": "buf8"
              }
            }
          },
          {
            "name": "tensors",
            "arg": {
              "asTensors": [
                {
                  "name": "buf9"
                },
                {
                  "name": "buf6"
                }
              ]
            }
          }
        ],
        "outputs": [
          {
            "asTensor": {
              "name": "buf11"
            }
          },
          {
            "asTensors": [
              {
                "name": "buf12"
              },
              {
                "name": "buf13"
              }
            ]
          }
        ],
        "metadata": {}
      }
    }
```

Test Plan: Test

Differential Revision: D49710320

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110187
Approved by: https://github.com/chenyang78
2023-09-30 19:47:01 +00:00
6bb448a2d3 [inductor][fbcode] Add -D C10_DISABLE_TENSORIMPL_EXTENSIBILITY to cpp_compile_command (#110122)
Summary:
## Why?

The .so and .h files are compiled separately with different flags. The .so is compiled by AOTInductor and the .h files (e.g. c10/core/TensorImpl.h) are compiled by buck2.

Let's make sure the .so is also compiled with this macro in fbcode.

Differential Revision: D49664078

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110122
Approved by: https://github.com/chenyang78, https://github.com/khabinov
2023-09-30 16:34:59 +00:00
cyy
d0ad848aa5 Enable misc clang-tidy checks (#110283)
This PR enables the misc-XX checks in clang-tidy. Meanwhile, I excluded some of them that require a lot of code changes and have no immediate benefits. Some additional fixes and suppression were also given.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110283
Approved by: https://github.com/albanD
2023-09-30 10:39:52 +00:00
2ead6c2f6e Skip launching kernels with zero grid in AOT Inductor (#110312)
Summary: with the grid computed in terms of unbacked `SymInt`s, it can happen that the grid is zero-sized. This causes a CUDA error on `cuLaunchKernel` in the AOT Inductor codegen.

In this PR, when the grid contains unbacked `SymInt`s, a check is added around the `launchKernel` in the AOT Inductor's C++ wrapper codegen to make sure that the grid is not zero-size.
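In Python pseudocode (a sketch of the effect, not the actual C++ wrapper codegen), the guard amounts to:

```python
def maybe_launch(kernel, grid, *args):
    # grid may contain unbacked symints that resolve to zero at runtime
    if all(dim > 0 for dim in grid):
        kernel[grid](*args)  # Triton-style launch
```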

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110312
Approved by: https://github.com/chenyang78
2023-09-30 09:12:56 +00:00
81a74457ca [BE] Clean up trymerge code handling flaky failures (#110133)
This is the 2nd part of https://github.com/pytorch/pytorch/pull/110054.  The flaky classification has been done on Dr.CI.  There is no need to download flaky rule files and do the check anymore.  Some tests are also updated with new examples because we mocked the list of flaky rules there.  Similar tests have been done on Dr.CI.

* [x] https://github.com/pytorch/pytorch/pull/110054
* [x] Clean up the flaky rules logic because it has already been implemented on Dr. CI
* [ ] Clean up the broken trunk logic for the same reason

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110133
Approved by: https://github.com/clee2000
2023-09-30 08:01:00 +00:00
f7ba3e85e2 [Dynamo] Add functional triton kernel wrapper (#110185)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110185
Approved by: https://github.com/jansel, https://github.com/zou3519, https://github.com/bdhirsh
ghstack dependencies: #109623
2023-09-30 04:20:20 +00:00
eqy
6b84658433 [CUDA][cudaMallocAsync] Improve PYTORCH_CUDA_ALLOC_CONF error message (#104891)
Tiny fix to improve user-facing errors for issues like #104801

CC @ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104891
Approved by: https://github.com/kit1980
2023-09-30 02:59:02 +00:00
ad8aef0f98 [BE] [3/N] Use nested namespaces (#110314)
Mostly in torch/csrc/jit/runtime and in `ATen/cuda/`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110314
Approved by: https://github.com/seemethere
2023-09-30 02:23:48 +00:00
8745d2d4f2 Small optimization to how we call flash-attention (#110324)
# Summary
Logging Mode is great, and helped me identify that we are doing an unnecessary slice sometimes.

### Numbers
For small sizes, e.g. (16, 16, 32, 32):
This brings the timing from:

`flash_time: 29.344002110883594 micro seconds`

to

`flash_time: 26.971791498363018 micro seconds`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110324
Approved by: https://github.com/cpuhrsch
2023-09-30 02:15:07 +00:00
7eeb392eb3 [Inductor] Enable the item() and nonzero() codegen test on CPU (#110262)
**Summary**
Follow-up to https://github.com/pytorch/pytorch/pull/109893, which had an issue supporting CPU as reported in https://github.com/pytorch/pytorch/issues/109897. This fix mainly includes 2 changes:

- The current implementation of `rename_indexing`
10c646295d/torch/_inductor/codegen/common.py (L1023) only adds symbol names starting with `s` or `ps` into `kernel.args.sizevars`. However, an unbacked symint's name starts with `i`, so we extend the implementation of `rename_indexing` to support symbols starting with `i`.
- Currently, the internal loop index name also starts with `i`. Since `i` has been taken by unbacked symints, change the name to start with `x`, which aligns with Triton.

**Test Plan**
```
python -u -m pytest -s -v test_torchinductor_dynamic_shapes.py -k test_bool_mask_nobreak
python -u -m pytest -s -v test_torchinductor_dynamic_shapes.py -k test_nonzero_size_factory_nobreak
python -u -m pytest -s -v test_torchinductor_dynamic_shapes.py -k test_item_zeros_nobreak
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110262
Approved by: https://github.com/ezyang, https://github.com/jgong5
2023-09-30 00:13:20 +00:00
e0be9ebc18 Simplify the conditionals used for learning rate calculation for ConstantLR learning rate scheduler (#109785)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109785
Approved by: https://github.com/janeyx99, https://github.com/kit1980
2023-09-29 23:11:23 +00:00
993eea0edd [aotinductor] Fix a missing schema issue for repeat_interleave (#110105)
Differential Revision: [D49686812](https://our.internmc.facebook.com/intern/diff/D49686812)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110105
Approved by: https://github.com/zou3519, https://github.com/jansel, https://github.com/aakhundov
2023-09-29 23:01:37 +00:00
ee0bff209c [LTC] correct AdaptiveAvgPool3d channel dim index for shape inference (#109822)
Fixes #109821

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109822
Approved by: https://github.com/mikaylagawarecki, https://github.com/alanwaketan
2023-09-29 22:54:12 +00:00
5a87477e3f [BE] Use std::make_unique (#110298)
Since C++14, `std::unique_ptr<type_t[]> x(new type_t[NUM])` is identical to `auto x = std::make_unique<type_t[]>(NUM);`

Leave two `std::unique_ptr<float[]> arr(new float[NUM]());` as that statement not just allocates, but initializes the array as well, see below:
d04b35e7e3/aten/src/ATen/native/cpu/SoftMaxKernel.cpp (L700-L701)

On the other hand, from https://github.com/pytorch/pytorch/pull/60371 it's not at all clear if it needs to be initialized to zero at that point...

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110298
Approved by: https://github.com/kit1980
2023-09-29 22:46:30 +00:00
b083058e45 Revert "Make unbind() overrideable for NT subclass (#109122)"
This reverts commit f5a23ca78d13c5e536f5062325c815c50be5f4c2.

Reverted https://github.com/pytorch/pytorch/pull/109122 on behalf of https://github.com/PaliC due to breaking slow tests ([comment](https://github.com/pytorch/pytorch/pull/109122#issuecomment-1741555305))
2023-09-29 22:41:56 +00:00
1e95a1ae8c MAINT: pytorchify torch._numpy tests: core/ and fft/ (#109815)
1. Inherit from TestCase
2. Use pytorch parametrization
3. Use unittest.expectedFailure to mark xfails, also unittest skips

All this to make pytest-less invocation work:

$ python test/torch_np/test_basic.py

cross-ref #109593, #109718, #109775

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109815
Approved by: https://github.com/lezcano
2023-09-29 22:36:13 +00:00
9c7071b0e3 [fuzzing result][fuzz_torch_jit_lite_interpreter] read-heap-use-after-free (size 8) in std::_Function_base::_M_empty() (#110289)
Summary: This diff fixes a heap UAF found by fuzzing in torch/csrc/jit/mobile/interpreter.cpp

Test Plan:
CI and
```
arc lionhead crash reproduce 1009060456885023
```
doesn't crash anymore.

Reviewed By: malfet

Differential Revision: D49538326

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110289
Approved by: https://github.com/malfet
2023-09-29 22:32:38 +00:00
f2d7faf4ba Revert "MAINT: pytorchify torch._numpy tests: core/ and fft/ (#109815)"
This reverts commit 132a138a01806b45bb050cbcacbaa782fcf2e2ae.

Reverted https://github.com/pytorch/pytorch/pull/109815 on behalf of https://github.com/PaliC due to causing various slow tests to fail ([comment](https://github.com/pytorch/pytorch/pull/109815#issuecomment-1741525574))
2023-09-29 21:53:36 +00:00
28d69d5256 Adding Backward Support for NestedTensors and FlashAttention (#97485)
# Summary
<!--
copilot:summary
-->
### <samp>🤖 Generated by Copilot at 318764f</samp>

This pull request implements the CUDA backend of the SDPA kernel for nested tensors, which enables efficient transformer models with variable-length sequences. It adds a new dispatch key, a backward function, a unit test, and some helper functions for the kernel. It modifies `test/test_transformers.py`, `aten/src/ATen/native/native_functions.yaml`, `aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctionsBackward.cpp`, and `aten/src/ATen/native/nested/cuda/NestedTensorTransformerUtils.h`.

<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at ed4a773</samp>

> _Fused kernels of doom, unleash the flash attention_
> _Nested tensors on fire, reshape and pad with caution_
> _Backward pass of power, dispatch the CUDA key_
> _Test the gradients of hell, warn the user if they disagree_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97485
Approved by: https://github.com/jbschlosser
2023-09-29 21:34:47 +00:00
359c2a53f5 dynamic_shapes + retrace exported program (#110276)
An `ExportedProgram`'s `__call__` signature is different from the original module, so `dynamic_shapes` that follow the original signature would fail when applied to re-export an `ExportedProgram`.

This PR fixes this issue; in other words, the original `dynamic_shapes` should now work when re-exporting.

Differential Revision: D49764011

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110276
Approved by: https://github.com/tugsbayasgalan
2023-09-29 21:06:46 +00:00
c2c7c4035f Revert "Simplify the conditionals used for learning rate calculation for ConstantLR learning rate scheduler (#109785)"
This reverts commit 83283b4f0dc2032a31f9a80c7aa40e3e552ec944.

Reverted https://github.com/pytorch/pytorch/pull/109785 on behalf of https://github.com/PaliC due to causing macos errors as per 83283b4f0d ([comment](https://github.com/pytorch/pytorch/pull/109785#issuecomment-1741471142))
2023-09-29 20:49:28 +00:00
b253fc9c93 Revert "[1/N] Dynamo skipfiles refactor (#109567)" (#110296)
This reverts commit 84c5435b296bf7361f0f3043f7e68b7ba13ffd70.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110296
Approved by: https://github.com/yanboliang
2023-09-29 20:35:46 +00:00
bc047ec906 [inductor] Make sure unfuse_addmm and addmm patterns don't overlap (#110235)
Inductor has two opposing patterns,
```
addmm -> add + mm
add + mm -> addmm
```

This uses the `extra_check` to disable the addmm fusion pattern when the
heuristic to unfuse add is met, for consistency.
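For reference, the two rewrites are equivalent in the sense sketched below (plain PyTorch, not Inductor's pattern code):

```python
import torch

def unfused(b, x, y):
    return torch.mm(x, y) + b    # add + mm form

def fused(b, x, y):
    return torch.addmm(b, x, y)  # fused form

b, x, y = torch.randn(4), torch.randn(2, 3), torch.randn(3, 4)
torch.testing.assert_close(unfused(b, x, y), fused(b, x, y))
```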

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110235
Approved by: https://github.com/lezcano, https://github.com/eellison
ghstack dependencies: #110232
2023-09-29 19:35:29 +00:00
d04b35e7e3 [inductor] Fix bug in input mutation (#107614)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107614
Approved by: https://github.com/jansel
2023-09-29 18:27:06 +00:00
d7de26804e [AOTInductor] ProxyExecutor supports List[Tensor] return type (#110182)
Summary:
Support custom ops returns List[Tensor] type, like `"fn_with_list_output(Tensor[] tensors, int i) -> Tensor[]"`

As an example
`out5, out6 = torch.ops.fb.fn_with_list_output([out3, out4], 1)`

got compiled into

```
    AtenTensorHandle buf8_handle;  // output buffer
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_new_uninitialized_tensor(&buf8_handle));
    RAIIAtenTensorHandle buf8(buf8_handle);
    AtenTensorHandle buf9_handle;  // output buffer
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_new_uninitialized_tensor(&buf9_handle));
    RAIIAtenTensorHandle buf9(buf9_handle);
    AtenTensorHandle tensor_args_var_5[] = {buf5.get(), buf6.get(), buf8.get(), buf9.get()};
    int64_t int_args_var_6[] = {1};
    aoti_torch_proxy_executor_call_function(proxy_executor, 2, 1, int_args_var_6, 4, tensor_args_var_5);
```

Test Plan: Test

Differential Revision: D49694691

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110182
Approved by: https://github.com/chenyang78
2023-09-29 18:21:48 +00:00
d6d3f6cfe5 Add weight update for DSOModel. (#110273)
Summary: Add weight update for DSOModel and AOTInductorModel

Test Plan: buck2 test accelerators/workloads/models/slimdsnn:slimdsnn_dso_test - SlimDSNN.DSO_Update_Constants

Reviewed By: mikekgfb

Differential Revision: D49748685

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110273
Approved by: https://github.com/hl475
2023-09-29 18:14:01 +00:00
6e2c14a0e8 [Codemod][[codemod] Replace third-party mock with unittest.mock] caffe2/caffe2 (#106541)
Reviewed By: thechrisu

Differential Revision: D47909974

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106541
Approved by: https://github.com/thechrisu
2023-09-29 18:09:49 +00:00
88ef126a93 rename nanogpt_generate to nanogpt to also support train (#109746)
Differential Revision: [D49522940](https://our.internmc.facebook.com/intern/diff/D49522940)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109746
Approved by: https://github.com/msaroufim, https://github.com/malfet, https://github.com/xuzhao9
2023-09-29 17:36:48 +00:00
30759848fa [inductor] handle non-list/tuple outputs for FallbackKernel (#110145)
generate_output may return non-list/tuple outputs. Let's force
those to be lists, because we will enumerate kernel.outputs
later in the codegen.

Also fixed a minor issue in an assertion message.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110145
Approved by: https://github.com/aakhundov
2023-09-29 17:13:26 +00:00
defb364adf Clean up test_external_module_register (#110254)
caused by #109866

The test registers a new device module; the above PR checks for XPU, sees that it got registered, and uses it, but it's a dummy module.

This causes any test after it to fail, so I "clean up" the registered module.

Another possible solution would be to run this test last.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110254
Approved by: https://github.com/huydhn
2023-09-29 17:02:13 +00:00
0ff1155d3a [aotinductor] Refactor test_aot_inductor to take different devices (#110216)
Summary: Replace the hardcoded device with self.device, to make it easier to test both CPU and CUDA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110216
Approved by: https://github.com/chenyang78, https://github.com/bertmaher
ghstack dependencies: #110215
2023-09-29 16:30:19 +00:00
ce6d09a775 [aotinductor] Refactor test_aot_inductor (#110215)
Summary: Remove the usage of output tensors in the test script, since AOTInductor now returns output tensors instead of taking in pre-allocated output tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110215
Approved by: https://github.com/angelayi, https://github.com/chenyang78
2023-09-29 16:30:19 +00:00
28f52f2f80 Fix aminmax on CUDA when input shape contains 0 (#107564)
The CUDA kernel asserts numel() > 0; the CPU kernel doesn't, and returns empty values (as expected)
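A quick sketch of the behavior being aligned (CPU already handled this case):

```python
import torch

t = torch.zeros(4, 0)              # input shape contains 0
mn, mx = torch.aminmax(t, dim=0)   # now also works on CUDA
print(mn.shape, mx.shape)          # torch.Size([0]) torch.Size([0])
```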

Fixes #95349 and #85439

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107564
Approved by: https://github.com/lezcano
2023-09-29 16:18:08 +00:00
2d50a30d77 [Dynamo] Add native support for Triton Kernels to Dynamo (#109623)
This PR adds native support to Dynamo to detect Triton kernels and
create an FX graph node out of them. AOT eager and inductor modes will
be supported in follow-up PRs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109623
Approved by: https://github.com/jansel
2023-09-29 15:49:18 +00:00
3693777a86 Pickle support for NT (#110219)
Fixes #104198
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110219
Approved by: https://github.com/cpuhrsch
2023-09-29 15:30:06 +00:00
c9511e8ac9 [foreach][BE] cleaning up MultiTensorApply.cuh (#110228)
Followup edits to #109402 as suggested by @r-barnes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110228
Approved by: https://github.com/drisspg
2023-09-29 14:44:48 +00:00
92f4a7b663 [inductor] Add fbcode include path for cuda (#110240)
We missed the cuda include, leading to failures in cases where CUDA
was not installed locally but only provided via third-party/GVFS.

Differential Revision: [D49745585](https://our.internmc.facebook.com/intern/diff/D49745585/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110240
Approved by: https://github.com/hl475
2023-09-29 13:39:40 +00:00
758735b739 [dynamo] Convert dtype arguments as well as inputs in cast_to_fp64 (#110232)
Generating reference outputs sometimes fails because of type mismatches in the graph,
an issue which was noticed previously for `prims.convert_element_type` and fixed in #92036
but the same issue happens with other functions such as tensor constructors.

This expands the fix from #92036 to all dtype keyword arguments.
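A minimal sketch of the generalized fix (hypothetical helper name; the real pass walks FX graph nodes):

```python
import torch

def cast_dtype_kwargs_to_fp64(kwargs):
    # rewrite floating-point dtype keyword arguments, not just tensor inputs
    return {
        k: torch.float64
        if isinstance(v, torch.dtype) and v.is_floating_point
        else v
        for k, v in kwargs.items()
    }

print(cast_dtype_kwargs_to_fp64({"dtype": torch.float32, "device": "cpu"}))
```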

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110232
Approved by: https://github.com/ezyang
2023-09-29 12:42:14 +00:00
24e5d61af8 Log usage of optimizer in backward (#110206)
This will allow us to inspect and aggregate jobs that use optimizer in
backward

Differential Revision: [D48674740](https://our.internmc.facebook.com/intern/diff/D48674740/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110206
Approved by: https://github.com/awgu
2023-09-29 11:00:07 +00:00
acac92f806 [vision hash update] update the pinned vision hash (#110258)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110258
Approved by: https://github.com/pytorchbot
2023-09-29 04:17:27 +00:00
d615f0078c Updating documentation for PolynomialLR (#110151)
The docstring says the power parameter is `int`, when it should be `float`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110151
Approved by: https://github.com/janeyx99
2023-09-29 03:50:11 +00:00
07ec95b17c TD: Fix sorting bug for historical correlations heuristic (#110257)
Fix a bug where the historical correlations heuristic sorted test files in the opposite order, ranking the least relevant tests most highly

<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at 70333d1</samp>

> _`test_files` sorted_
> _by ratings, high to low_
> _a faster spring test_
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110257
Approved by: https://github.com/clee2000
2023-09-29 03:29:08 +00:00
cyy
3dc479e70b [1/N] Apply clang-tidy to c10/test/*cpp (#109278)
This series of PR enables clang-tidy checks in c10/test. We aim to finally add the path to lintrunner.toml
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109278
Approved by: https://github.com/kit1980
2023-09-29 02:20:57 +00:00
e6b5e0ecc6 removing the functionality of nvfuser python APIs (#110124)
Removing the functionality from the nvfuser Python APIs.

Since the use of nvfuser was deprecated before the last release cut, we are removing TorchScript support.

The next PR will actually remove the code base.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110124
Approved by: https://github.com/davidberard98
2023-09-29 01:45:00 +00:00
88de391692 [torch.library] Fix some docstrings (#110214)
Removed some erroneous colons

Test Plan:
- code reading
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110214
Approved by: https://github.com/ezyang
2023-09-29 01:44:49 +00:00
83283b4f0d Simplify the conditionals used for learning rate calculation for ConstantLR learning rate scheduler (#109785)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109785
Approved by: https://github.com/janeyx99, https://github.com/kit1980
2023-09-29 01:19:05 +00:00
c9b8e06060 [quant] Enable quantization for wav2letter (#109830)
Summary:
Also added annotation support for conv1d_relu and conv1d in XNNPACKQuantizer; the quantized results still
match the fx quant path (which didn't quantize conv1d), so tests are not disabled

Test Plan: with-proxy buck2 run executorch/examples/quantization:example -- -m=w2l --verify

Differential Revision: D49479546

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109830
Approved by: https://github.com/kimishpatel
2023-09-29 00:47:34 +00:00
ce8b4f56d8 [dynamo] Dont put nn module guards on torch inbuilt nn modules (#110230)
This is one way to fix https://github.com/pytorch/pytorch/issues/110048

Looking for feedback.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110230
Approved by: https://github.com/ezyang
2023-09-29 00:43:16 +00:00
20dabea35d Inductor cpp wrapper: support MkldnnRnnLayer (#107858)
1. Directly use the `codegen` function of the parent class which already supported both python and cpp wrapper.
2. The output of the `at::mkldnn_rnn_layer` OP is actually a `std::tuple` 1491bae277/aten/src/ATen/native/mkldnn/RNN.cpp (L218) Fix the type when calling `MultiOutput`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107858
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-09-29 00:22:42 +00:00
d1a13129bb Add support for item() and nonzero() codegen in Inductor (#109893)
This is another version of
https://github.com/pytorch/pytorch/pull/109262 that I think is more
harmonious with inductor design.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109893
Approved by: https://github.com/jansel
2023-09-28 23:37:31 +00:00
3de42995e4 [quant][pt2e] Add quant API re-entrant test (#110125)
Summary:
Add the test to make sure we can call the quantize API multiple times

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_reentrant

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110125
Approved by: https://github.com/kimishpatel
ghstack dependencies: #110097
2023-09-28 22:41:59 +00:00
bbb95878e9 [LLVM] Update apis incompatible with llvm versions in codegen (#110200)
Opaque pointer support is disabled in LLVM 14 and enabled by default from LLVM 15 and above.
The setOpaquePointers API is deprecated from LLVM 16, so its usage was removed.

Updated the CreateMalloc and CreateFree APIs for the latest LLVM release.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110200
Approved by: https://github.com/Skylion007
2023-09-28 21:49:30 +00:00
ae546db562 [GHF] Update meregbot tests (#110221)
One should never edit `gql_mocks.json` by hand, as otherwise it does not validate mergebot behavior using the actual GitHub data, but rather snapshot of this data frozen in time.

Unfortunately, GitHub started to delete checkrun statuses against older
PRs, so some tests need to be updated.

For example, https://github.com/pytorch/pytorch/pull/77700/checks, committed on May 19th 2022, has no checks at the time of writing (Sep 28th 2023)

Deleted `test_checksuites_pagination` as its checks are gone and it tests the same functionality as `test_get_checkruns_many_runs`, which was updated to use a more recent PR.

Deleted `test_get_classifications_pending_unstable`, because what it wants to test is inherently unreliable and therefore it must be rewritten using some different mechanisms.

Disabled `test_internal_changes` as the mechanism is broken at the moment, see https://github.com/pytorch/pytorch/issues/110218

Updated `test_pr_dependencies_ghstack` and `test_pr_dependencies` to generate `msg` using `pr.get_body()` rather than hardcoding the text (which was updated after the test was committed).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110221
Approved by: https://github.com/clee2000, https://github.com/huydhn
2023-09-28 21:29:17 +00:00
be3b16daad [decomp] Fix baddbmm decomposition (#109714)
The decomposition is currently registered without the pw_cast_for_opmath
decorator, due to the ordering of decorators being meaningful.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109714
Approved by: https://github.com/lezcano
2023-09-28 21:23:44 +00:00
41d6c29b19 [BE] Fix pointless comparison warning (#110227)
Indeed, `uint32_t(x) >= 0` is always true

Warning typically looks as follows:
```
[337/1379] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/EmbeddingBag.cu.o
../aten/src/ATen/core/ivalue.h(1283): warning #186-D: pointless comparison of unsigned integer with zero

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110227
Approved by: https://github.com/atalman, https://github.com/albanD
2023-09-28 20:21:26 +00:00
f82a29e32b [inductor] Add CI jobs to test AOTInductor (#108419)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108419
Approved by: https://github.com/angelayi, https://github.com/jansel
2023-09-28 20:19:25 +00:00
81da6db74a fix a missing keyword virtual (#110220)
# Motivation
Fix a missing keyword `virtual`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110220
Approved by: https://github.com/ezyang
2023-09-28 19:45:34 +00:00
e0b035c220 Revert "[core IR] Add lift_fresh, split.Tensor, and unbind decompositions to core ATen decomp table (#110102)"
This reverts commit 22e706f76894a898036329256a3f2f58e79aee92.

Reverted https://github.com/pytorch/pytorch/pull/110102 on behalf of https://github.com/atalman due to Breaks internal CI ([comment](https://github.com/pytorch/pytorch/pull/110102#issuecomment-1739856671))
2023-09-28 19:03:25 +00:00
aaaa3c1586 Fixed minor issues for bmm/mm decompositon (#109836)
Summary:
* Fixed minor issues for bmm/mm decomposition
* Enabled addmm for inductor

Test Plan: ci

Reviewed By: mikekgfb

Differential Revision: D49522332

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109836
Approved by: https://github.com/jansel, https://github.com/mikekgfb
2023-09-28 18:45:01 +00:00
cyy
168f516fae [3/N] Move c10::variant to std::variant (#110141)
This PR moves more c10::variant calls to std::variant

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110141
Approved by: https://github.com/Skylion007
2023-09-28 18:43:55 +00:00
84c5435b29 [1/N] Dynamo skipfiles refactor (#109567)
This is 1/N of the dynamo skipfiles/allowed_functions refactor, the major change in this PR includes:
* Refactor & define the [skipfiles rules](https://github.com/pytorch/pytorch/pull/109567/files#diff-5aa3ce9db729bf0901ea97a5d3cc51924cc8575d9c516c1c8f572a35de92544aR56) and interface
* For every ```skipfiles.check```, we return both the check result and the skip/inline reason and log them for debugging.
* We found several latent issues/bugs and incorrect implementations in the codebase, but I'm planning to fix them in follow-up PRs to keep the refactor decoupled from bug fixes.
* More details in the inline comments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109567
Approved by: https://github.com/ezyang, https://github.com/jansel, https://github.com/anijain2305
2023-09-28 18:36:46 +00:00
e3eb1d92d8 [quant][docs] Add documentation for prepare_pt2e, prepare_qat_pt2e and convert_pt2e (#110097)
Summary:
att

Test Plan:
.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110097
Approved by: https://github.com/kimishpatel
2023-09-28 18:24:58 +00:00
3603f646eb BUG: fix torch._numpy.arange(5, dtype="float32") (#110005)
Make `np.arange` respect an explicitly provided dtype.

Also remove duplicated tests:
- torch_np/test_function_base.py::TestArange is a dupe of
- torch_np/numpy_tests/core/test_multiarray.py::TestArange

Fixes #109975
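A quick check of the fixed behavior (a sketch, assuming numpy-compatible dtype comparison in `torch._numpy`):

```python
import torch._numpy as np

a = np.arange(5, dtype="float32")
assert a.dtype == np.float32  # the explicit dtype is now respected
```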

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110005
Approved by: https://github.com/lezcano
2023-09-28 18:21:18 +00:00
5f7eff0adb Replace node.meta source_fn with source_fn_stack (#108595)
A resubmit of https://github.com/pytorch/pytorch/pull/108447. Copy over the descriptions:

This is a follow-up of the discussion in https://github.com/pytorch/pytorch/pull/108356, where we want to replace source_fn with source_fn_stack

Before this PR, for the following example:
```python
backend = EagerAndRecordGraphs()

@torch.compile(backend=backend, fullgraph=True)
def cond_f(pred, pred2, x, y):
    def true_fn(pred2, x, y):
        return x + y

    def false_fn(pred2, x, y):
        def true_fn2(x, y):
            return x.sin() - y.cos()

        def false_fn2(x, y):
            return x.cos() - y.sin()

        return control_flow.cond(pred2, true_fn2, false_fn2, (x, y))

    return control_flow.cond(pred, true_fn, false_fn, (pred2, x, y))
```
The graph captured is shown below:
```python
class GraphModule(torch.nn.Module):
    def forward(self, L_pred_ : torch.Tensor, L_pred2_ : torch.Tensor, L_x_ : torch.Tensor, L_y_ : torch.Tensor):
        l_pred_ = L_pred_
        l_pred2_ = L_pred2_
        l_x_ = L_x_
        l_y_ = L_y_

        cond_true_1 = self.cond_true_1
        cond_false_1 = self.cond_false_1
        cond = torch.ops.higher_order.cond(l_pred_, cond_true_1, cond_false_1, [l_pred2_, l_x_, l_y_]);  l_pred_ = cond_true_1 = cond_false_1 = l_pred2_ = l_x_ = l_y_ = None
        return (cond,)

    class GraphModule(torch.nn.Module):
        def forward(self, l_pred2_, l_x_, l_y_):
            add = l_x_ + l_y_;  l_x_ = l_y_ = None
            return add

    class GraphModule(torch.nn.Module):
        def forward(self, l_pred2_, l_x_, l_y_):
            cond_true_0 = self.cond_true_0
            cond_false_0 = self.cond_false_0
            cond = torch.ops.higher_order.cond(l_pred2_, cond_true_0, cond_false_0, [l_x_, l_y_]);  l_pred2_ = cond_true_0 = cond_false_0 = l_x_ = l_y_ = None
            return cond

        class GraphModule(torch.nn.Module):
            def forward(self, l_x_, l_y_):
                sin = l_x_.sin();  l_x_ = None
                cos = l_y_.cos();  l_y_ = None
                sub = sin - cos;  sin = cos = None
                return sub

        class GraphModule(torch.nn.Module):
            def forward(self, l_x_, l_y_):
                cos = l_x_.cos();  l_x_ = None
                sin = l_y_.sin();  l_y_ = None
                sub = cos - sin;  cos = sin = None
                return sub
```
the source_fn for inner cond, sin, cos will be a (name, target) tuple:
```
('cond', <torch._ops.HigherOrderOperator object at xxx>)
('sin', 'sin')
('cos', 'cos')
('sub'. <built-in function sub>)
```

After this PR, the source_fn_stack will be a list of (name, target) tuples. The bottom of the stack is the end of the list.
```
[('cond', <torch._ops.HigherOrderOperator object at xxx>), ('cond', <torch._ops.HigherOrderOperator object at xxx>)],
[('cond', <torch._ops.HigherOrderOperator object at xxx>), ('cond', <torch._ops.HigherOrderOperator object at xxx>), ('sin', 'sin')],
[('cond', <torch._ops.HigherOrderOperator object at xxx>), ('cond', <torch._ops.HigherOrderOperator object at xxx>), ('cos', 'cos')]
[('cond', <torch._ops.HigherOrderOperator object at xxx>), ('cond', <torch._ops.HigherOrderOperator object at xxx>), ('sub', <built-in function sub>)]
```

Test Plan:
See added tests in test_higher_order_ops.py and modify existing test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108595
Approved by: https://github.com/angelayi, https://github.com/zou3519
2023-09-28 18:18:36 +00:00
1d0a8eed5d [generate_opcheck_tests] Enable using same failures_dict for multiple testclasses (#110164)
This PR allows us to use the same failures_dict for multiple test
classes. This is helpful if you have a bunch of small TestCase(es) and
want to centralize all the failure dicts into one big one.

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110164
Approved by: https://github.com/williamwen42
2023-09-28 17:56:45 +00:00
f2c360e3e5 Reorganize and rename COW files and APIs (#110191)
This PR does the following:
* Combine `cow/context.<h/cpp>` and `cow/deleter.<h/cpp>` into `cow/COWDeleter.<h/cpp>`
* Rename `Context` to `COWDeleterContext`
* Rename `delete_context` to `cow_deleter`
* Remove the separate `impl_cow_context` bazel library, combining it with the base c10 core library
* Rename `context_test.cpp` to `cow_test.cpp`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110191
Approved by: https://github.com/ezyang
2023-09-28 17:50:44 +00:00
c62be12061 Added batch rules for _upsample_bi*2d_aa and _upsample_bi*2d_aa_backward (#110172)
Description:
- Added batch rules for `_upsample_bi*2d_aa` and `_upsample_bi*2d_aa_backward`
- Added few more test cases into `sample_inputs_upsample_aten`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110172
Approved by: https://github.com/kshitij12345, https://github.com/zou3519
2023-09-28 17:42:48 +00:00
2a246c5259 update type() calling to not use unneeded device (#110163)
The previous code path did an unnecessary CUDA init and caused an unnecessary "device" to appear in the JIT trace.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110163
Approved by: https://github.com/henryhu6, https://github.com/albanD
2023-09-28 17:34:46 +00:00
cyy
7f5fd92372 Reland use std::make_unique after internal changes (#109742)
check internal
follow up of #109780
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109742
Approved by: https://github.com/ezyang
2023-09-28 17:24:08 +00:00
7f5737392d [FSDP] fix: fix for fsdp exec order pre fwd record (#110138)
When sharding_strategy is set to SHARD_GRAD_OP and forward_prefetch=True, self.is_first_iter will always be True during a direct validation run (because training=False, iter+1 is not executed). Additionally, the _pre_forward_order_index of the first handle entering the record_pre_forward function is 0. This causes the handle to get a False result in the if condition at line 166 when it enters the record_pre_forward function again (the expected value is True, because _pre_forward_order_index has actually been assigned a value). As a result, the first handle is repeatedly added to handles_pre_forward_order, leading to an incorrect prefetching order.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110138
Approved by: https://github.com/awgu
2023-09-28 15:45:05 +00:00
6f48d872d0 Re-land: Break graph on manual_seed. (#109109)
Re-landing: #108647 (old #107594)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109109
Approved by: https://github.com/lezcano
2023-09-28 15:28:40 +00:00
5f417fd710 [aot_inductor] Lightweight model runner (#110158)
It's useful to have a simple, lightweight way to run a model that adds
essentially no overhead to calling the model's generated `run_impl` method.
This C API is a super thin wrapper around AOTInductorModel: Create, Run, and
Delete are provided, and do very little work beyond dispatch to the appropriate
helpers.

Note the Create function also provides additional functionality beyond the
Container API; it allows the user to pass in a weight map defined in userland,
which is a requirement for several serving use cases.

Differential Revision: [D49670711](https://our.internmc.facebook.com/intern/diff/D49670711/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110158
Approved by: https://github.com/desertfire, https://github.com/chenyang78
2023-09-28 14:59:41 +00:00
ad0ba5e187 [torchbench] Consistent accuracy results with dynamobench (#110189)
Summary:
Use the upstream `torch._dynamo.same` function in accuracy checking and remove the self-hosted version in torchbench.

Now cmf_10x and ads_dhen_5x can run in deterministic mode, so enable deepcopy and deterministic mode.

Test Plan:
```
$ buck2 run mode/opt //pytorch/benchmark:run -- cmf_10x -d cuda -t train --accuracy
Running train method from cmf_10x on cuda in eager mode with input batch size 4 and precision tf32.
Accuracy:                            pass
```

```
$ buck2 run mode/opt //pytorch/benchmark:run -- cmf_10x -d cuda -t train --torchdynamo inductor --torchinductor_enable_batch_fusion --torchinductor_enable_split_cat_fx_pass --accuracy
Running train method from cmf_10x on cuda in dynamo inductor mode with input batch size 4 and precision tf32.
Accuracy:                            pass
```

Without this PR, it will print:

```
  File "/mnt/xarfuse/uid-234232/9aa53cfe-seed-nspid4026531836_cgpid9238070-ns-4026531840/torch/_dynamo/utils.py", line 190, in time_wrapper
    r = func(*args, **kwargs)
  File "/mnt/xarfuse/uid-234232/9aa53cfe-seed-nspid4026531836_cgpid9238070-ns-4026531840/torch/_inductor/graph.py", line 464, in run
    return super().run(*args)
  File "/mnt/xarfuse/uid-234232/9aa53cfe-seed-nspid4026531836_cgpid9238070-ns-4026531840/torch/fx/interpreter.py", line 138, in run
    self.env[node] = self.run_node(node)
  File "/mnt/xarfuse/uid-234232/9aa53cfe-seed-nspid4026531836_cgpid9238070-ns-4026531840/torch/_inductor/graph.py", line 826, in run_node
    result.realize_hint()
  File "/mnt/xarfuse/uid-234232/9aa53cfe-seed-nspid4026531836_cgpid9238070-ns-4026531840/torch/_inductor/ir.py", line 5273, in realize_hint
    and self.is_pointwise_non_scalar_tensor_num_reads_larger_than_one()
  File "/mnt/xarfuse/uid-234232/9aa53cfe-seed-nspid4026531836_cgpid9238070-ns-4026531840/torch/_inductor/utils.py", line 343, in wrapper
    setattr(self, key, fn(self))
  File "/mnt/xarfuse/uid-234232/9aa53cfe-seed-nspid4026531836_cgpid9238070-ns-4026531840/torch/_inductor/ir.py", line 5332, in is_pointwise_non_scalar_tensor_num_reads_larger_than_one
    (sum(read.index != 0 for read in self.data.get_reads()) > 1)
  File "/mnt/xarfuse/uid-234232/9aa53cfe-seed-nspid4026531836_cgpid9238070-ns-4026531840/torch/_inductor/ir.py", line 5332, in <genexpr>
    (sum(read.index != 0 for read in self.data.get_reads()) > 1)
  File "/mnt/xarfuse/uid-234232/9aa53cfe-seed-nspid4026531836_cgpid9238070-ns-4026531840/torch/_inductor/dependencies.py", line 74, in index
    raise NotImplementedError("StarDep does not have an index")
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
NotImplementedError: StarDep does not have an index
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
```

Reviewed By: jackiexu1992, mengluy0125

Differential Revision: D49639733

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110189
Approved by: https://github.com/desertfire, https://github.com/jansel
2023-09-28 14:50:57 +00:00
8e14e76c34 [inductor] Enhance an input type assertion msg (#110176)
Summary: to address https://github.com/pytorch/pytorch/issues/110089

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110176
Approved by: https://github.com/angelayi
2023-09-28 13:35:11 +00:00
248a1b7011 Revert "Enable function declaration check in Vulkan and Metal backends (#106762)"
This reverts commit bf8617c37d6b32a1aaf7e5d63e4f558637f8d84d.

Reverted https://github.com/pytorch/pytorch/pull/106762 on behalf of https://github.com/atalman due to Breaks internal CI ([comment](https://github.com/pytorch/pytorch/pull/106762#issuecomment-1739184482))
2023-09-28 13:32:10 +00:00
eb082ef604 [inductor] Decompose addmm if it's a dot product on cpu (#110010)
Generated code for dot product is often faster (on CPU) than
dispatching to aten, since it avoids op dispatch overhead and allows fusion
with surrounding ops, which in turn avoids allocations.
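For intuition, the dot-product case can be written as the pointwise mul+sum that fuses with neighbors (a sketch in plain PyTorch, not the actual decomposition):

```python
import torch

def decomposed_addmm(bias, a, b):
    # addmm where the matmul is a dot product: 1xK @ Kx1
    return bias + (a * b.transpose(0, 1)).sum(dim=1, keepdim=True)

a, b, bias = torch.randn(1, 8), torch.randn(8, 1), torch.randn(1, 1)
torch.testing.assert_close(decomposed_addmm(bias, a, b), torch.addmm(bias, a, b))
```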

Differential Revision: [D49595876](https://our.internmc.facebook.com/intern/diff/D49595876/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110010
Approved by: https://github.com/chenyang78, https://github.com/jgong5, https://github.com/mikekgfb
2023-09-28 13:30:14 +00:00
ee8983da70 109605 dynamo scalar ndarray pow gen (#109953)
Fixes #109605

Generated code before:
```
def call(args):
    arg0_1, = args
    args.clear()
    assert_size_stride(arg0_1, (8, ), (1, ))
    buf0 = empty_strided((), (), device='cpu', dtype=torch.int64)
    cpp_fused_lift_fresh_0(c_void_p(buf0.data_ptr()))
    # Source Nodes: [wrapped_pow], Original ATen: [aten.lift_fresh, aten.pow]
    buf1 = aten.pow(arg0_1, reinterpret_tensor(buf0, (8, ), (0, ), 0))
    del arg0_1
    del buf0
    buf2 = buf1
    assert_size_stride(buf2, (8, ), (1, ))
    del buf1
    return (buf2, )
```

Generated code now:
```
def call(args):
    arg0_1, = args
    args.clear()
    assert_size_stride(arg0_1, (8, ), (1, ))
    buf0 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.int64)
    cpp_fused_pow_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
    del arg0_1
    return (buf0, )
```
@lezcano What would be a good way to add a test for this?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109953
Approved by: https://github.com/lezcano
2023-09-28 13:11:06 +00:00
5da5e068f3 deprecate constraints in favor of dynamic_shapes (#110143)
Recently we updated the `export` API to take an experimental `dynamic_shapes` argument that was meant to subsume the existing `constraints` argument.

This PR deprecates `constraints` (with a warning on its use, but without actually removing it). Simultaneously it replaces all uses of `constraints` in docs, examples, and tests with corresponding uses of `dynamic_shapes` (preserving behavior). This exercise fortunately revealed some minor bugs in the implementation which have also been fixed in this PR.

Some uses of `constraints` still remain, e.g., when `torch._dynamo.export` is called directly. (Meta-internal uses will be updated in a separate diff.)

Differential Revision: D49676049

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110143
Approved by: https://github.com/tugsbayasgalan
2023-09-28 10:26:21 +00:00
419ec3b229 Enable pickling model prepared with QAT qconfig (#109288)
Summary:
Resolving error:

AttributeError: Can't pickle local object '_add_module_to_qconfig_obs_ctr.<locals>.get_factory_kwargs_based_on_module_device'

by moving the nested function out to the main module
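A minimal reproduction of the failure mode, independent of QAT:

```python
import pickle

def outer():
    def inner():
        return 1
    return inner

try:
    pickle.dumps(outer())
except AttributeError as e:
    print(e)  # Can't pickle local object 'outer.<locals>.inner'
```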

Test Plan: Added test to CI

Reviewed By: andrewor14

Differential Revision: D49187352

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109288
Approved by: https://github.com/andrewor14
2023-09-28 09:51:19 +00:00
c71a64ccce [aotinductor] Rename if name is prefixed with integer (#110113)
Fixes https://github.com/pytorch/pytorch/issues/109894.
Since in C++ we cannot have variable names that start with a digit, we do some additional handling in Inductor to avoid producing constant tensors with names starting with integers.
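A sketch of the renaming idea (hypothetical helper, not Inductor's actual code):

```python
def legalize_cpp_name(name: str) -> str:
    # C++ identifiers cannot start with a digit, so add a prefix
    return f"constant_{name}" if name[:1].isdigit() else name

assert legalize_cpp_name("0_weight") == "constant_0_weight"
assert legalize_cpp_name("weight") == "weight"
```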

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110113
Approved by: https://github.com/desertfire
2023-09-28 07:26:28 +00:00
e20c35a53b Allow public access for imports (#108914)
Fixes #108776

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108914
Approved by: https://github.com/wanchaol
2023-09-28 06:05:59 +00:00
fc1fcc4d17 Enable typechecking for _inductor/fx_passes/group_batch_fusion.py (#110111)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110111
Approved by: https://github.com/eellison, https://github.com/Skylion007
ghstack dependencies: #110109
2023-09-28 04:53:09 +00:00
3e7f23e04f [inductor] Actually enable typing for sizevars.py and joint_graph.py (#110109)
The commit message of #107862 says it enabled mypy checking for
sizevars.py, but it seems that it neglected to update .lintrunner.toml.

New type errors appear to have crept in since then, so I've fixed them
accordingly.

A similar mistake happened with #109955 for joint_graph.py, though that
one is more recent and so hasn't had any new type errors to fix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110109
Approved by: https://github.com/Skylion007
2023-09-28 04:53:09 +00:00
cyy
a81d083b1c [Reland] Add -Wdeprecated and related fixes (#110019)
This is a reland of PRs https://github.com/pytorch/pytorch/pull/108626 and #109564. We fixed the iOS build failure by changing
```
((CHECK) ? (EXPR) : ([] { assert(!#CHECK); }(), (EXPR)))
```
to
```
((CHECK) ? (EXPR) : ([] { assert(false); }(), (EXPR)))
```
in TR2_OPTIONAL_ASSERTED_EXPRESSION, since the former syntax was invalid on Apple Clang. Anyway, we could apply this simple fix, hoping that c10::optional will be replaced by std::optional soon.
We also enabled -Wdeprecated on c10.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110019
Approved by: https://github.com/clee2000
2023-09-28 03:34:29 +00:00
7f2b51c668 [AOTInductor] ProxyExecutor supports custom op with tuple output (#110140)
Summary:
Extend ProxyExecutor to support custom ops with tuple outputs.

Generated wrapper code for `out3, out4 = torch.ops.fb.fn_with_tuple_output(out2, 1)`

```
    AtenTensorHandle buf5_handle;  // output buffer
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_new_uninitialized_tensor(&buf5_handle));
    RAIIAtenTensorHandle buf5(buf5_handle);
    AtenTensorHandle buf6_handle;  // output buffer
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_new_uninitialized_tensor(&buf6_handle));
    RAIIAtenTensorHandle buf6(buf6_handle);
    AtenTensorHandle tensor_args_var_3[] = {buf3.get(), buf5.get(), buf6.get()};
    int64_t int_args_var_4[] = {1};
    aoti_torch_proxy_executor_call_function(proxy_executor, 1, 1, int_args_var_4, 3, tensor_args_var_3);
```

Test Plan: Test

Differential Revision: D49673994

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110140
Approved by: https://github.com/chenyang78
2023-09-28 02:50:39 +00:00
75462fd870 Revert "[1/N] Dynamo skipfiles refactor (#109567)"
This reverts commit f8e0ebec8c6156922026fc2bf6e5a829097b4506.

Reverted https://github.com/pytorch/pytorch/pull/109567 on behalf of https://github.com/huydhn due to Many jobs are failing in trunk after this with FILENAME_ALLOWLIST is not defined error f8e0ebec8c. This looks like a landrace ([comment](https://github.com/pytorch/pytorch/pull/109567#issuecomment-1738344950))
2023-09-28 02:22:22 +00:00
68b0db1274 Define the public API for torch.distributed.fsdp (#109922)
Related: https://github.com/pytorch/pytorch/wiki/Public-API-definition-and-documentation
Related: https://github.com/microsoft/pylance-release/issues/2953

This fixes pylance issues for these classes:

```
"FullyShardedDataParallel" is not exported from module "torch.distributed.fsdp"
```

These classes all have public docs:

* [`BackwardPrefetch`](https://pytorch.org/docs/stable/fsdp.html#torch.distributed.fsdp.BackwardPrefetch)
* [`CPUOffload`](https://pytorch.org/docs/stable/fsdp.html#torch.distributed.fsdp.CPUOffload)
* [`FullyShardedDataParallel`](https://pytorch.org/docs/stable/fsdp.html#torch.distributed.fsdp.FullyShardedDataParallel)
* [`MixedPrecision`](https://pytorch.org/docs/stable/fsdp.html#torch.distributed.fsdp.MixedPrecision)
* [`ShardingStrategy`](https://pytorch.org/docs/stable/fsdp.html#torch.distributed.fsdp.ShardingStrategy)

And it seems like all the newly added classes will have docs once they are released.
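
A minimal sketch of the usual fix for this class of pylance issue, assuming the standard `__all__` convention (the exact list in the PR may differ):

```python
# torch/distributed/fsdp/__init__.py (sketch; the PR's actual list may differ)
__all__ = [
    "BackwardPrefetch",
    "CPUOffload",
    "FullyShardedDataParallel",
    "MixedPrecision",
    "ShardingStrategy",
]
```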

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109922
Approved by: https://github.com/wanchaol
2023-09-28 02:15:58 +00:00
1ca68c971c distributed doc fix (#110157)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110157
Approved by: https://github.com/awgu
2023-09-28 01:34:02 +00:00
f5a23ca78d Make unbind() overrideable for NT subclass (#109122)
Goal: avoid making unbind composite implicit so we can override it within `__torch_dispatch__()` for the NT subclass.
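
A simplified sketch of what overriding an op inside `__torch_dispatch__()` looks like for a tensor subclass (illustrative only, not the NestedTensor implementation):

```python
import torch

class LoggingTensor(torch.Tensor):
    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func is torch.ops.aten.unbind.int:
            print("intercepted aten.unbind")  # custom handling would go here
        # Unwrap to plain tensors before re-dispatching to avoid recursion.
        def unwrap(a):
            return a.as_subclass(torch.Tensor) if isinstance(a, LoggingTensor) else a
        return func(*[unwrap(a) for a in args], **kwargs)

t = torch.randn(3, 2).as_subclass(LoggingTensor)
parts = t.unbind(0)  # only reaches __torch_dispatch__ once unbind is not composite implicit
```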
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109122
Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer
2023-09-28 01:26:22 +00:00
f8e0ebec8c [1/N] Dynamo skipfiles refactor (#109567)
This is 1/N of the dynamo skipfiles/allowed_functions refactor; the major changes in this PR include:
* Refactor & define the [skipfiles rules](https://github.com/pytorch/pytorch/pull/109567/files#diff-5aa3ce9db729bf0901ea97a5d3cc51924cc8575d9c516c1c8f572a35de92544aR56) and interface
* For every ```skipfiles.check```, we return both the check result and the skip/inline reason and log them for debugging.
* We found several latent issues/bugs and incorrect implementations in the codebase, but I'm planning to fix them in follow-up PRs to keep the refactor decoupled from bug fixes.
* More details in the inline comments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109567
Approved by: https://github.com/ezyang, https://github.com/jansel, https://github.com/anijain2305
2023-09-28 01:21:59 +00:00
22e706f768 [core IR] Add lift_fresh, split.Tensor, and unbind decompositions to core ATen decomp table (#110102)
## Context

Add existing decomps for `lift_fresh`, `split.Tensor`, and `unbind` to the core ATen decomposition table. Do not use them in inductor, since Inductor currently lowers these directly.

One note though is that `lift_fresh`'s decomposition has a note saying it's not correct under autograd. However, my understanding is that these decompositions are registered to the `"post_autograd"` decomposition table, meaning autograd wouldn't be a factor. Would like some confirmation that this premise is correct.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110102
Approved by: https://github.com/jansel
2023-09-28 01:21:45 +00:00
840bb650f8 [AOTInductor] Update regex rule for symbol (#110184)
Summary:
Update the regex rule to also match the `_` character.

Test Plan:
Included in commit

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110184
Approved by: https://github.com/desertfire
2023-09-28 01:13:18 +00:00
9399e0b1ff add fp16 support for gemm (#99498)
### Testing

Native matmul vs. mkldnn matmul on SPR (with avx512_fp16 support)

single core:

Input | Naïve impl / ms | oneDNN / ms | Speed up
-- | -- | -- | --
M: 128, N: 128, K: 128, trans_a: False, trans_b: False | 2010.387 | 64.700 | 31.072
M: 128, N: 256, K: 128, trans_a: False, trans_b: False | 4027.116 | 107.780 | 37.364
M: 8192, N: 768, K: 768, trans_a: False, trans_b: False | 28685868.488 | 90663.008 | 316.401

56 cores:
Input | Naïve impl / ms | oneDNN / ms | Speed up
-- | -- | -- | --
M: 128, N: 128, K: 128, trans_a: False, trans_b: False | 5.091 | 0.24 | 211.30
M: 128, N: 128, K: 128, trans_a: False, trans_b: True | 5.224 | 0.23 | 220.09
M: 128, N: 256, K: 128, trans_a: False, trans_b: False | 10.006 | 0.30 | 330.31
M: 8192, N: 768, K: 768, trans_a: False, trans_b: False | 29435.372 | 1.770 | 1662.80
M: 8192, N: 768, K: 768, trans_a: False, trans_b: True | 31464.961 | 1.728 | 18204.76
M: 8192, N: 768, K: 3072, trans_a: False, trans_b: False | 115035.849 | 7.990 | 14396.90
M: 8192, N: 768, K: 3072, trans_a: False, trans_b: True | 122981.023 | 7.725 | 15918.34
Batch: 768, M: 128, N: 64, K: 128 | 2032.523 | 0.705 | 2882.23

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99498
Approved by: https://github.com/jgong5, https://github.com/malfet
2023-09-28 01:03:50 +00:00
d796518485 [refs] Fix size check from #108360 (#109083)
PR #108360 uses the same default `last_dim_size` formula from complex-to-real (C2R) transforms for
complex-to-complex (C2C) and real-to-complex (R2C) transforms. However, this is not correct, because for C2R
the input is only half the size of the full tensor, which is not the case for C2C and R2C.

This error is mostly benign since `last_dim_size` was only used for the `>= 1` condition which is
almost always met anyway.

For this PR I now use it as the argument to `_apply_norm` which makes it load-bearing for correctness
and so is thoroughly tested now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109083
Approved by: https://github.com/lezcano
2023-09-27 23:59:29 +00:00
85e408217a [ONNX] Move out onnx bench bash scripts (#103983)
Summary:
- Remove onnx bench related scripts and `_onnx` folder.
- Update `common.py` to include onnx related patches previously under `_onnx` folder.
- Update `merge_rules.json` to include bench files.
- Added quick sanity onnx bench test to onnx CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103983
Approved by: https://github.com/kit1980
2023-09-27 23:54:26 +00:00
60b46d7902 Add ROCm folks as CODEOWNERS for triton.txt (#110108)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110108
Approved by: https://github.com/kit1980
2023-09-27 23:31:15 +00:00
40b83d98de fix bugs in export docstrings (#110169)
First error

```
Traceback (most recent call last):
  File "/home/ubuntu/exporty.py", line 8, in <module>
    ep = torch.export.export(MyModule(), torch.randn(5))
  File "/opt/conda/envs/sam/lib/python3.10/site-packages/torch/export/__init__.py", line 509, in export
    return export(f, args, kwargs, constraints)
  File "/opt/conda/envs/sam/lib/python3.10/site-packages/torch/_export/__init__.py", line 314, in export
    raise UserError(UserErrorType.INVALID_INPUT,
torch._dynamo.exc.UserError: Expecting `args` to be a tuple of example positional inputs, got <class 'torch.Tensor'>
```

Second error

```
(sam) ubuntu@ip-172-31-9-217:~$ python exporty.py
Traceback (most recent call last):
  File "/home/ubuntu/exporty.py", line 13, in <module>
    torch.export.save(ep, 'exported_program.pt2', extra_files=extra_files)
  File "/opt/conda/envs/sam/lib/python3.10/site-packages/torch/export/__init__.py", line 566, in save
    save(ep, f, extra_files=extra_files, opset_version=opset_version)
  File "/opt/conda/envs/sam/lib/python3.10/site-packages/torch/_export/__init__.py", line 595, in save
    encoded_content = content.encode('utf-8')
AttributeError: 'bytes' object has no attribute 'encode'. Did you mean: 'decode'?
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110169
Approved by: https://github.com/angelayi
2023-09-27 22:56:42 +00:00
bf7307adf8 Support inference_mode decorator (#109274)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109274
Approved by: https://github.com/williamwen42
2023-09-27 22:21:42 +00:00
a200bb5e54 [BE] Do not use assert in unit tests (#110179)
One should always use `unittest` assert methods rather than plain `assert`, as the latter can be turned into a no-op if the Python runtime is invoked with optimizations enabled.

Fixes use of `assert` introduced by https://github.com/pytorch/pytorch/pull/105251
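
For illustration, the difference under `python -O`:

```python
import unittest

class ExampleTest(unittest.TestCase):
    def test_sum(self):
        assert sum([1, 2]) == 3           # stripped entirely under `python -O`
        self.assertEqual(sum([1, 2]), 3)  # always runs, with a useful failure message

if __name__ == "__main__":
    unittest.main()
```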

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110179
Approved by: https://github.com/huydhn
2023-09-27 21:53:18 +00:00
2ff9d1fda3 Add size to constant - type dispatche through BaseListVariable.cls_for (#110166)
Differential Revision: D49689895

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110166
Approved by: https://github.com/anijain2305
2023-09-27 21:44:16 +00:00
7782108792 [AOTInductor] Fix freeze for AOTInductor (#110055)
Summary:
Add a test for graph freezing in AOTInductor.
Remove unused code path.

Test Plan:
Included in commit.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110055
Approved by: https://github.com/angelayi
2023-09-27 21:21:47 +00:00
955298bc40 Use Dr.CI results to classify flaky failures in trymerge (#110054)
After https://github.com/pytorch/test-infra/pull/4589, we can now query Dr.CI to get the list of flaky failures there. This change queries the Dr.CI API endpoint and checks whether a failure is flaky using the `is_flaky` function.

Because the change is relatively large, I'm breaking it down to several smaller PRs in this order:

* [x] This PR queries Dr.CI and adds `is_flaky` check
* [ ] Clean up the flaky rules logic because it has already been implemented on Dr. CI
* [ ] Clean up the broken trunk logic for the same reason

### Testing

* Create a new `drci_mocks.json` file to capture the JSON response from the Dr.CI API endpoint. The API requires `DRCI_BOT_KEY`.
*  `pytest -v test_trymerge.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110054
Approved by: https://github.com/clee2000
2023-09-27 21:21:29 +00:00
213badf632 [dynamo][guards-log] Add debug msg for nn_module_guards only when log is enabled (#110167)
I did not do any benchmarks, but there could be a small overhead from creating the debug_msg, so it is now added only when the guards log is enabled.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110167
Approved by: https://github.com/ezyang
2023-09-27 21:11:44 +00:00
6aae636f69 chore(inductor): Simplify will_fusion_create_cycle and cleanup to node.ancestors (#109976)
recursive_predecessors == ancestors, so rename it.

Improve comments.

Simplify `will_fusion_create_cycle`: make it easier to read and add detailed comments.

A diagram to illustrate the shortcut:
![Inductor Deep Dive](https://github.com/pytorch/pytorch/assets/9093549/7a30e088-8a33-4a9c-a8a7-81199cd086e2)

CC: @ngimel
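
A generic sketch of the ancestor-based check (hypothetical `Node` type; Inductor's real implementation differs in its details):

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:
    name: str
    ancestors: set = field(default_factory=set)  # transitive predecessors

def will_fusion_create_cycle(a: Node, b: Node, all_nodes: list) -> bool:
    # Fusing a and b closes a cycle iff some outside node sits between them:
    # it depends (transitively) on one of the pair and feeds the other.
    for x in all_nodes:
        if x is a or x is b:
            continue
        if (a in x.ancestors and x in b.ancestors) or (
            b in x.ancestors and x in a.ancestors
        ):
            return True
    return False
```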

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109976
Approved by: https://github.com/jansel
2023-09-27 20:48:53 +00:00
b123fd168a Higher order op for preserving leaf functions through trace, particularly for getting user defined hooks to compiled autograd (#109690)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109690
Approved by: https://github.com/ezyang
2023-09-27 20:47:15 +00:00
fe11227764 [dynamo][higher order op] Fix minor bug in error msgs (#110099)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110099
Approved by: https://github.com/zou3519
2023-09-27 20:28:17 +00:00
7c1702f099 Keep JSON mocks file in gzip format (#110173)
This is to keep them smaller than the file size limit enforced in fbcode.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110173
Approved by: https://github.com/malfet
2023-09-27 20:16:58 +00:00
d4b06dc426 Pass S3 credentials to ios upload workflow (#109222)
This fixes the failed upload to S3 for nightly and release builds. The credentials need to be passed from the caller workflow. We also need to set up the credential in D49291627 before merging this one.

### Testing

Upload successfully https://github.com/pytorch/pytorch/actions/runs/6190836578/job/17125308432?pr=109222#step:13:51

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109222
Approved by: https://github.com/atalman
2023-09-27 20:15:02 +00:00
21ff0cc3ac [xla hash update] update the pinned xla hash (#109999)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned xla hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109999
Approved by: https://github.com/pytorchbot
2023-09-27 19:52:50 +00:00
ae064ad4c6 Fix XLA update rules (#110177)
Regression introduced during the migration from `bionic` to `focal` by https://github.com/pytorch/pytorch/pull/105260

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110177
Approved by: https://github.com/clee2000
2023-09-27 19:25:31 +00:00
5ef5f1ab9a [HigherOrderOp] wrap (and checkpoint) should accept pytree inputs (#109962)
Fixes #109250

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109962
Approved by: https://github.com/zou3519
2023-09-27 18:51:09 +00:00
58c33789c6 Fix governance.rst link rendering (#110171)
By adding `__` to the end of the link decorator according to https://sublime-and-sphinx-guide.readthedocs.io/en/latest/references.html#links-to-external-web-pages

Fixes regression introduced by https://github.com/pytorch/pytorch/pull/106863

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110171
Approved by: https://github.com/seemethere, https://github.com/msaroufim, https://github.com/atalman
2023-09-27 18:49:03 +00:00
cyy
36eb1bb548 Use constexpr members in ConstantSymNodeImpl (#110142)
A simple refactoring.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110142
Approved by: https://github.com/Skylion007
2023-09-27 18:31:33 +00:00
a8bed7191b [Easy] use BaseListVariable cls_for for all list-y type dispatching (#110159)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110159
Approved by: https://github.com/ezyang
2023-09-27 18:21:15 +00:00
ec5bbef8af [AOTInductor] Switch ProxyExecutor to use AtenTensorHandle (#109748)
Summary: Switch ProxyExecutor to use AtenTensorHandle.

Test Plan: E2E Test

Differential Revision: D49471659

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109748
Approved by: https://github.com/yifuwang, https://github.com/desertfire, https://github.com/chenyang78
2023-09-27 17:51:30 +00:00
633bd0765e Integrate xpu into torch.Generator and torch.seed (#109866)
Integrate torch.xpu.Generator into torch.Generator
Integrate torch.xpu.seed into torch.seed
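
A hedged usage sketch of the integrated behavior (requires a PyTorch build with XPU support):

```python
import torch

g = torch.Generator(device="xpu")  # routed to the XPU generator after this change
g.manual_seed(42)
torch.seed()  # seeds the default generators, now including XPU devices

```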
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109866
Approved by: https://github.com/ezyang
2023-09-27 17:44:45 +00:00
0511df0ee9 [ROCM] enable skipped test_api cpp tests (#109817)
[ROCM] enable skipped  test_api cpp tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109817
Approved by: https://github.com/jithunnair-amd, https://github.com/malfet
2023-09-27 16:52:46 +00:00
063d2572da Revert "Use Dr.CI results to classify flaky failures in trymerge (#110054)"
This reverts commit d0f82cd082fad7243226e0ab68fd995873ea7d76.

Reverted https://github.com/pytorch/pytorch/pull/110054 on behalf of https://github.com/huydhn due to The mock gql_mocks.json file is now bigger than the file size limit on fbcode ([comment](https://github.com/pytorch/pytorch/pull/110054#issuecomment-1737727552))
2023-09-27 16:33:10 +00:00
8791e8697a Print full stack trace on suppressed error (#110106)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110106
Approved by: https://github.com/zou3519, https://github.com/voznesenskym
2023-09-27 16:09:06 +00:00
0721a394b6 [executorch][kernel reg] Allow kernel manual registration (#110086)
Summary:
Exposing a codegen mode that generates a hook for users to register their kernels.

If we pass `--manual-registration` flag to `gen_executorch.py`, we will generate the following files:
1. RegisterKernels.h which declares a `register_all_kernels()` API inside `torch::executor` namespace.
2. RegisterKernelsEverything.cpp which implements `register_all_kernels()` by defining an array of generated kernels.

This way the user can depend on the library declared by the `executorch_generated_lib` macro (with `manual_registration=True`) and include `RegisterKernels.h`. Then they can manually call `register_all_kernels()` instead of relying on the C++ static initialization mechanism, which is not available in some embedded systems.

Test Plan:
Rely on the unit test:

```
buck2 test fbcode//executorch/runtime/kernel/test:test_kernel_manual_registration
```

Reviewed By: cccclai

Differential Revision: D49439673

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110086
Approved by: https://github.com/cccclai
2023-09-27 16:04:20 +00:00
1265400ba6 Revert "Reland: implement a function to convert a storage to copy-on-write (#110022)"
This reverts commit dddf07e56a9a798ae27d976d697c3d434cf63a5b.

Reverted https://github.com/pytorch/pytorch/pull/110022 on behalf of https://github.com/atalman due to New tests are failing in internal CI ([comment](https://github.com/pytorch/pytorch/pull/110022#issuecomment-1737584693))
2023-09-27 15:05:41 +00:00
7dbdf3be1e Fix inductor CI (by updating graph break count) (#110160)
There was a vision hash update which led to fewer graph breaks. This
seems expected to me (because the hash update included
https://github.com/pytorch/vision/pull/7944 and nms is used in maskrcnn).

Test Plan:
- wait for ci

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110160
Approved by: https://github.com/ezyang, https://github.com/Chillee
2023-09-27 14:37:36 +00:00
cyy
bf8617c37d Enable function declaration check in Vulkan and Metal backends (#106762)
This PR enables declaration check in Vulkan and Metal backends, so that we can identify unused functions more easily.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106762
Approved by: https://github.com/ezyang
2023-09-27 14:29:24 +00:00
774137d506 Add torch.ops.import_module (#110090)
Generally, to extend PyTorch with custom operators, a user will
create a Python module whose import triggers registration of
the custom operators via a torch.ops.load_library call or a call
to one or more torch.library.* APIs.

It is unexpected for Python modules to have side effects, so some
linters and formatters will complain. Use torch.ops.import_module to
import the module without a linter or formatter complaining.

NB: A more robust API would actually check if a custom op was registered
or modified, but this is technically challenging to do. In the future we
can add a warning if a custom op wasn't registered or modified.
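
A hedged usage sketch (the module name is hypothetical):

```python
import torch

# Importing via torch.ops makes the registration side effect explicit,
# so linters don't flag an "unused" import.
my_ops = torch.ops.import_module("my_project.custom_ops")
```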

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110090
Approved by: https://github.com/ezyang
2023-09-27 13:56:47 +00:00
34ded74399 [Dynamo] fix signature in dynamo types (#110081)
The type signature is obsolete. This PR fixes the type signature and leaves comments in the C code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110081
Approved by: https://github.com/jansel
2023-09-27 09:30:04 +00:00
a51b8df261 Add support for event_tracer in codegen layer (#109990)
Summary: Split out from D48975975, this handles the pytorch specific changes to add support for event_tracer in codegen layer.

Test Plan: CI

Reviewed By: dbort

Differential Revision: D49487710

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109990
Approved by: https://github.com/Jack-Khuu
2023-09-27 09:09:03 +00:00
10c646295d When doing typed typecheck, also check signature with symint removed (#109727)
See the test case for what we didn't catch (a SymInt vs. const SymInt& mismatch).

It's necessary to test for both, because we will fall back to the
non-SymInt signature if there is no SymInt unboxed kernel available.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109727
Approved by: https://github.com/zou3519
2023-09-27 07:29:46 +00:00
1b51d29b66 [quant][pt2e] Enable constant folding for quantize ops (#109343)
Summary:
This PR adds constant folding for quantize ops so that instead of storing the fp32 weight in the quantized model, we store the int8/int16 etc. weight.
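
A sketch of the effect, using the eager quantize op for illustration (pt2e uses the decomposed quantize ops):

```python
import torch

w_fp32 = torch.randn(4, 4)
scale, zero_point = 0.05, 0
# Before folding, the exported graph holds w_fp32 plus a quantize node.
# Constant folding evaluates the quantize once and stores only the result:
w_q = torch.quantize_per_tensor(w_fp32, scale, zero_point, torch.qint8)
print(w_q.int_repr().dtype)  # torch.int8
```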

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_fold_quantize

also will verify in executorch later

Differential Revision: [D49399210](https://our.internmc.facebook.com/intern/diff/D49399210)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109343
Approved by: https://github.com/kimishpatel, https://github.com/jgong5
2023-09-27 06:04:45 +00:00
6138750ab1 [vision hash update] update the pinned vision hash (#110127)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110127
Approved by: https://github.com/pytorchbot
2023-09-27 04:25:39 +00:00
ddbf1aab64 [export] Add dynamic_shapes to _export.aot_compile (#110101)
Summary: Following the new dynamic_shapes API (introduced in https://github.com/pytorch/pytorch/pull/108448), we will also add a dynamic_shapes API to _export.aot_compile
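
A hedged sketch of the resulting API surface (the exact signature and `dynamic_shapes` spelling may differ across versions):

```python
import torch
from torch._export import aot_compile
from torch.export import Dim

class M(torch.nn.Module):
    def forward(self, x):
        return x * 2

batch = Dim("batch")
# Marks dim 0 of input `x` as dynamic; returns a path to the compiled .so
so_path = aot_compile(M(), (torch.randn(3, 4),), dynamic_shapes={"x": {0: batch}})
```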

Test Plan: CI

Differential Revision: D49653815

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110101
Approved by: https://github.com/gmagogsfm
2023-09-27 04:10:22 +00:00
f7c9ef88f5 Add masked_select abstract impl (#110103)
Fixes https://github.com/pytorch/pytorch/issues/109871

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110103
Approved by: https://github.com/bdhirsh
2023-09-27 04:07:58 +00:00
33d8f5f73e fix typo (#109965)
fix typo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109965
Approved by: https://github.com/zou3519, https://github.com/kit1980
2023-09-27 03:32:04 +00:00
869226bf94 Avoid passing generator to parametrize (#110104)
Fixes

```
ValueError: <function TestMeta.test_layer_norm_backward at 0x7f555f56e440>: An empty arg_values was passed to @parametrize. Note that this may result from reuse of a generator.
```
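
The pitfall in miniature:

```python
vals = (x for x in range(3))
print(list(vals))  # [0, 1, 2] -- first consumption drains the generator
print(list(vals))  # []        -- reuse yields nothing, hence "empty arg_values"
# Fix: materialize once (vals = list(range(3))) or rebuild the generator per use.
```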

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110104
Approved by: https://github.com/malfet, https://github.com/jbschlosser, https://github.com/voznesenskym
2023-09-27 02:52:48 +00:00
dec140f1ea [core IR] Add a core decomposition for aten.all (#110093)
## Context

Change the ref implementation of `aten.all` to only use other `torch` operators such that we can use it for the core ATen decomposition table. This will replace the decomposition for `aten.all` that was used specifically by Inductor.
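
A sketch of such a ref, written only in terms of other torch ops (the registered ref may differ in dtype/dim handling):

```python
import torch

def all_ref(x: torch.Tensor) -> torch.Tensor:
    return torch.logical_not(torch.any(torch.logical_not(x)))

assert bool(all_ref(torch.tensor([True, True])))
assert not bool(all_ref(torch.tensor([True, False])))
```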

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110093
Approved by: https://github.com/manuelcandales, https://github.com/peterbell10, https://github.com/lezcano
2023-09-27 01:31:41 +00:00
51a8c166a6 Add test for ShapeEnv recording fallback. (#109944)
This PR adds a test for the previous PR in this stack: #109904. In summary, it calls
functions decorated with `@record_shapeenv_event` that don't have an explicit `ShapeEnv`
parameter, passing arguments that don't hold a `ShapeEnv` instance.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109944
Approved by: https://github.com/ezyang
2023-09-27 00:50:14 +00:00
9928c10e71 [core IR] Add glu as a core decomposition (#110043)
## Context

Add the decomposition for `aten.glu` as a decomposition in the core ATen decomposition table. Don't use it in the Inductor decomposition table since Inductor has a lowering for it.
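
For reference, the glu decomposition is essentially a chunk plus a sigmoid gate (a sketch, ignoring validation of even sizes):

```python
import torch

def glu_ref(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    a, b = torch.chunk(x, 2, dim=dim)
    return a * torch.sigmoid(b)

x = torch.randn(4, 6)
torch.testing.assert_close(glu_ref(x, dim=1), torch.nn.functional.glu(x, dim=1))
```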

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110043
Approved by: https://github.com/peterbell10, https://github.com/lezcano
ghstack dependencies: #110046
2023-09-27 00:23:05 +00:00
4d0ae7c9da [inductor] support _scaled_dot_product_flash_attention fallback (#110085)
Summary:
This PR supports _scaled_dot_product_flash_attention fallback kernel.
Note that in the abi_compatible mode, we retrieve outputs by passing
output argument pointers rather than relying on std::get.

It also fixes an issue related to dynamic shapes, where we wrongfully
query undefined dynamic symbols.

Test Plan: ci

Reviewed By: frank-wei

Differential Revision: D49620191

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110085
Approved by: https://github.com/desertfire
2023-09-27 00:09:56 +00:00
19ca883f8b [pytorch][jit] allow passing in obj loader in unpickle api (#109730)
Summary: We are trying to use wire messages to pass Python objects like KJT. In order for JIT to be able to unpickle them, we need to provide a type resolver as well as an object loader. This diff modifies the interface so we can do that.

Test Plan:
Rely on current CI to make sure existing usage doesn't break.

In the next diff, test e2e

Differential Revision: D49438569

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109730
Approved by: https://github.com/davidberard98
2023-09-26 23:50:20 +00:00
3262c5358f Use _check_is_size for validate_dim_length (#109849)
_check_is_size has some extra juice for unbacked SymInts, use it.
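
A small illustration of where it helps:

```python
import torch

def make_row(n: int):
    # Validates n >= 0 eagerly, and under tracing also marks an unbacked
    # SymInt as a size so size-like reasoning (e.g. guards) can proceed.
    torch._check_is_size(n)
    return torch.zeros(n)
```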

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109849
Approved by: https://github.com/yanboliang
2023-09-26 23:33:31 +00:00
27443eadeb [dtensor][7/n] remove reduction rule (#109144)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109144
Approved by: https://github.com/fduwjj
ghstack dependencies: #108263, #108264
2023-09-26 22:24:50 +00:00
2dd9a79d22 [dtensor][6/n] refactor reduction to use op strategy (#108264)
This PR refactors the reduction op to use strategy based propagation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108264
Approved by: https://github.com/fduwjj
ghstack dependencies: #108263
2023-09-26 22:24:50 +00:00
986d255db2 [dtensor][5/n] switch random ops to op strategy (#108263)
This PR switches the random ops to use op strategy instead of rule-based
propagation. This is the first in a series of PRs to refactor ops after we
refactored the op dispatch logic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108263
Approved by: https://github.com/fduwjj
2023-09-26 22:24:42 +00:00
d0f82cd082 Use Dr.CI results to classify flaky failures in trymerge (#110054)
After https://github.com/pytorch/test-infra/pull/4589, we can now query Dr.CI to get the list of flaky failures there. This change queries the Dr.CI API endpoint and checks whether a failure is flaky using the `is_flaky` function.

Because the change is relatively large, I'm breaking it down to several smaller PRs in this order:

* [x] This PR queries Dr.CI and adds `is_flaky` check
* [ ] Clean up the flaky rules logic because it has already been implemented on Dr. CI
* [ ] Clean up the broken trunk logic for the same reason

### Testing

* Create a new `drci_mocks.json` file to capture the JSON response from the Dr.CI API endpoint. The API requires `DRCI_BOT_KEY`.
*  `pytest -v test_trymerge.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110054
Approved by: https://github.com/clee2000
2023-09-26 21:24:21 +00:00
bb9779ecd2 Revert D49640259: Revert D49615962: [optests] Test names in failure dicts should be prefixed with test class (#110094)
Summary: Revert D49640259: Revert D49615962: [optests] Test names in failure dicts should

Test Plan: revert-hammer

Differential Revision: D49645397

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110094
Approved by: https://github.com/izaitsevfb
2023-09-26 21:16:36 +00:00
ac3190c52c [cpu] vectorize atanh (#107786)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107786
Approved by: https://github.com/jgong5, https://github.com/sanchitintel, https://github.com/ezyang
2023-09-26 20:20:46 +00:00
194d9aa0f2 Revert "[Dynamo] Match closures by code ID (#109427)"
This reverts commit 3de08575031bc0ea770b5935dec13046d8ba7992.

Reverted https://github.com/pytorch/pytorch/pull/109427 on behalf of https://github.com/voznesenskym due to Fails test `PYTORCH_TEST_WITH_DYNAMO=1 python test_ops.py -k test_out_warning__refs_cat_cpu ([comment](https://github.com/pytorch/pytorch/pull/109427#issuecomment-1736101561))
2023-09-26 18:54:36 +00:00
a7409695bb [export] Verifier for exported program (#109519)
Summary:
X-link: https://github.com/pytorch/executorch/pull/292

Added a verifier for the graph signature in an exported program

Test Plan: CI

Differential Revision: D48926643

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109519
Approved by: https://github.com/zhxchen17
2023-09-26 18:47:43 +00:00
0a60219fe3 [foreach] Fix 0-size handling for real for real (#109402)
@crcrpar's last attempt to fix the 0-size problem unfortunately did not pass all cases. See my comment in https://github.com/pytorch/pytorch/issues/100701. When we have a tail tensor of size 0, the old code would mess with the chunk logic to check the previous tensor's length. This is flawed because:
1. if the previous tensor was also 0-sized (so a tensor list of [tensor, tensor, tensor, ..., 0-sized tensor, 0-sized tensor]), chunks would still be 0 and the nested for loop would be skipped.
2. the nested for loop introduces side effects on tensorListMeta that _shouldn't_ be there! This can mess up the computation in unexpected ways that I haven't fully reasoned through.

We noticed through an internal report that the problem had not been fixed. This PR solves the issue by:
- removing the finagling of chunks when the tail tensor is 0-sized
- adding a surefire way for the kernel to be launched in the case where the last tensor is 0-sized AND there's content in the metadata, signifying there is stuff to compute still.

## test plan

As I went through the code, I also added some comments explaining what's up and modified our tensor inputs to ensure that this case is tested in the test_parity test in test_foreach.py. Yes, I do realize there is quite a bit of duplication and that this file could be due for a refactor. That said, the primary goal of this PR is to fix the pretty egregious bug and refactoring can be a followup.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109402
Approved by: https://github.com/albanD
2023-09-26 17:38:20 +00:00
317e39a8ad [C10d] Cleanup collective sequence number. (#109136)
Sequence numbers must be associated with a Work object
if we want to use it as a way to report collective progress.

The API surface change is introducing Work::getSequenceNumber, which
should eventually be exposed to python.

The bulk of this change is in gloo: making the sequence number always in use
and weaving it through the dozens of subclasses of Work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109136
Approved by: https://github.com/fduwjj
2023-09-26 17:17:04 +00:00
818f2297e6 Ensure fill_ works when value is a view of self (#109835)
# Summary
PR #109533 introduced a BC-breaking change when `self` is a view of `value`. By using the `copy_()` op inside `fill_`, we were hitting `assert_no_partial_overlap` in TensorIterator.

Ideally we would avoid this check when `value.numel() == 1`, but rather than monkeying around with TensorIterator I just clone the input instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109835
Approved by: https://github.com/mikaylagawarecki
2023-09-26 17:12:48 +00:00
3705e65254 Add pin_memory to torch.Tensor type annotation args (#109797)
Test Plan: Sandcastle

Differential Revision: D49504528

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109797
Approved by: https://github.com/jianyuh
2023-09-26 17:12:37 +00:00
1277d0e834 [BE] Add sharding data by default to metrics (#110035)
Extend metric library to allow setting global metrics on a process level which will always be emitted.

Current use case for them is to include shard information every time a metric is emitted by run_test.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110035
Approved by: https://github.com/clee2000
2023-09-26 17:06:49 +00:00
d91492a7a4 [MPS] Fix sort with empty tensor. (#109584)
Fixes #107284
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109584
Approved by: https://github.com/kulinseth, https://github.com/albanD
ghstack dependencies: #109557, #109574
2023-09-26 16:30:38 +00:00
993530ee4f [aotinductor] Relax the CUDAGuard device index check (#110030)
Summary: Although AOTInductor only supports running on a single CUDA device, it does work when there is a mix of CPU and CUDA ops. So instead of asserting when a CUDA device index appears for the first time, we check that there is only one CUDA device index. This solves https://github.com/pytorch/pytorch/issues/109655

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110030
Approved by: https://github.com/jansel
2023-09-26 16:23:23 +00:00
47adcd412f Increase timeout for slow tests (#109206)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109206
Approved by: https://github.com/huydhn
2023-09-26 16:18:38 +00:00
0dcea70bfd fix sfdp pattern 13 accuracy issue (#110001)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110001
Approved by: https://github.com/eellison
2023-09-26 15:23:45 +00:00
2393864070 Revert "[optests] Test names in failure dicts should be prefixed with test class (#110045)"
This reverts commit 76fcec74c413af22186f0782f02aca49ab61dc20.

Reverted https://github.com/pytorch/pytorch/pull/110045 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/110045#issuecomment-1735711094))
2023-09-26 14:56:08 +00:00
a5de10d7a5 Remove linux.t4g.2xlarge Usage (#110064)
Switched from linux.t4g.2xlarge to linux.arm64.2xlarge
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110064
Approved by: https://github.com/atalman, https://github.com/malfet
2023-09-26 14:30:35 +00:00
ea20db8aa0 [optests] Excise unused operator_compile_check (#110011)
The recommendation is to just use `opcheck`, which has superseded all
uses of `operator_compile_check`.

Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110011
Approved by: https://github.com/ezyang
ghstack dependencies: #109912
2023-09-26 13:24:21 +00:00
812bf847b7 Revert "Add test for ShapeEnv recording fallback. (#109944)"
This reverts commit a4dec8d306d96637aa4dc1ee9deba289b128c148.

Reverted https://github.com/pytorch/pytorch/pull/109944 on behalf of https://github.com/atalman due to New test failing internally ([comment](https://github.com/pytorch/pytorch/pull/109944#issuecomment-1735512734))
2023-09-26 13:11:22 +00:00
e05eb69c93 Don't link to libcpuinfo on s390x (#109875)
Don't even build it.
It does not support s390x.

This is a follow up for https://github.com/pytorch/pytorch/pull/109496

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109875
Approved by: https://github.com/kit1980
2023-09-26 12:43:35 +00:00
92d86cd1ad [inductor] Fix triton compiler error in multilayer any (#109325)
Fixes #109196

When we have a split reduction and the tensor is not an even multiple of the split size,
we use `ops.masked` to pad to an even multiple. In the case here we generated:
```python
tmp5 = tl.where(mask, tmp4, 0)
```

which implicitly promotes our boolean value to `int32`. The fix is to give the default
value the same dtype as `result`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109325
Approved by: https://github.com/lezcano
2023-09-26 12:29:29 +00:00
1b90f07f5a Revert "Reland "Update AOTAutograd to use FunctionalTensorMode instead of C++ functionalization (#106406)" (#109906)"
This reverts commit d0fe8fa5db6dd06adfe1246a72b6d3a5215ff86e.

Reverted https://github.com/pytorch/pytorch/pull/109906 on behalf of https://github.com/atalman due to Breaks internal tests ([comment](https://github.com/pytorch/pytorch/pull/109906#issuecomment-1735416852))
2023-09-26 12:10:25 +00:00
132a138a01 MAINT: pytorchify torch._numpy tests: core/ and fft/ (#109815)
1. Inherit from TestCase
2. Use pytorch parametrization
3. Use unittest.expectedFailure to mark xfails, also unittest skips

All this to make pytest-less invocation work:

$ python test/torch_np/test_basic.py

cross-ref #109593, #109718, #109775

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109815
Approved by: https://github.com/lezcano
2023-09-26 11:04:24 +00:00
8140494afd [3/N][2D] Enable training with new 2D flow (#110034)
Replacing https://github.com/pytorch/pytorch/pull/109553 as it got reverted.

This PR enables training with the new 2D flow and adds an associated test. In addition, this PR moves the fsdp-specific parts of tensor/parallel/_data_parallel_utils.py back to tensor/parallel/fsdp.py to avoid a circular dependency for ddp.py and test/distributed/tensor/parallel/test_ddp_2d_parallel.py.

state_dict related changes will come in later PRs.

cc. @fegin, @fduwjj, @wanchaol, @awgu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110034
Approved by: https://github.com/fduwjj
2023-09-26 09:14:15 +00:00
0673aa3d28 [dynamo][guards-log] Print nn module guard saved dict versions for debugging (#110028)
This is the output for nn module guards

~~~
[DEBUG] GUARDS:
[DEBUG] hasattr(L['x'], '_dynamo_dynamic_indices') == False           # _dynamo/variables/builder.py:1356 in wrap_fx_proxy_cls
[DEBUG] ___check_obj_id(L['self'], 139820807110912)                   # for mod in self.mods:  # examples/graph_break.py:35 in forward
[DEBUG] __nn_module_guard_0(L['self']) # versions(mod=9998, _parameters=1194395, _buffers=1194397, _modules=1194423, _forward_hooks=1194405, _forward_pre_hooks=1194411, _backward_hooks=1194402, _backward_pre_hooks=1194400)  # for mod in self.mods:  # examples/graph_break.py:35 in forward
[DEBUG] ___check_obj_id(L['self'].mods[0], 139817945727568)           # for mod in self.mods:  # examples/graph_break.py:35 in forward
[DEBUG] __nn_module_guard_1(L['self'].mods[0]) # versions(mod=10001, _parameters=1194428, _buffers=1194430, _modules=1194522, _forward_hooks=1194438, _forward_pre_hooks=1194444, _backward_hooks=1194435, _backward_pre_hooks=1194433)  # for mod in self.mods:  # examples/graph_break.py:35 in forward
[DEBUG] ___check_obj_id(L['self'].mods[1], 139817945560640)           # for mod in self.mods:  # examples/graph_break.py:35 in forward
[DEBUG] __nn_module_guard_2(L['self'].mods[1]) # versions(mod=10001, _parameters=1194660, _buffers=1194662, _modules=1194753, _forward_hooks=1194670, _forward_pre_hooks=1194676, _backward_hooks=1194667, _backward_pre_hooks=1194665)  # for mod in self.mods:  # examples/graph_break.py:35 in forward
[DEBUG] ___check_obj_id(L['self'].mods[0].linear, 139817945727856)    # return self.linear(a)  # examples/graph_break.py:24 in helper
[DEBUG] __nn_module_guard_3(L['self'].mods[0].linear) # versions(mod=10004, _parameters=1470004, _buffers=1194467, _modules=1194493, _forward_hooks=1194475, _forward_pre_hooks=1194481, _backward_hooks=1194472, _backward_pre_hooks=1194470)  # return self.linear(a)  # examples/graph_break.py:24 in helper
[DEBUG] ___check_obj_id(L['self'].mods[1].linear, 139817945561120)    # return self.linear(a)  # examples/graph_break.py:24 in helper
[DEBUG] __nn_module_guard_4(L['self'].mods[1].linear) # versions(mod=10004, _parameters=1470008, _buffers=1194699, _modules=1194725, _forward_hooks=1194707, _forward_pre_hooks=1194713, _backward_hooks=1194704, _backward_pre_hooks=1194702)  # return self.linear(a)  # examples/graph_break.py:24 in helper
[DEBUG] utils_device.CURRENT_DEVICE == None                           # _dynamo/output_graph.py:373 in init_ambient_guards
~~~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110028
Approved by: https://github.com/ezyang
ghstack dependencies: #110023, #110039
2023-09-26 08:53:07 +00:00
5df8aca994 [core IR] Add a core decomposition for floor_divide (#110046)
## Context

Introduce a core decomposition of `aten.floor_divide` into other `aten` ops, and add it to the core ATen decomposition table.

This replaces the decomposition of `floor_divide` that was used by Inductor. I noticed there was a note on that decomposition

```
# TorchInductor-only decomposition. It should not be taken to core.
# See https://github.com/pytorch/torchdynamo/pull/1120
```

but couldn't discern the reason why this is the case. cc: @lezcano
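
A sketch of such a decomposition in terms of existing aten ops (the registered one may handle dtypes and edge cases differently):

```python
import torch

aten = torch.ops.aten

def floor_divide_decomp(a, b):
    # floor_divide expressed as division with floor rounding
    return aten.div.Tensor_mode(a, b, rounding_mode="floor")

x, y = torch.tensor([7.0, -7.0]), torch.tensor([2.0, 2.0])
torch.testing.assert_close(floor_divide_decomp(x, y), torch.floor_divide(x, y))
```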

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110046
Approved by: https://github.com/peterbell10
2023-09-26 08:39:21 +00:00
26e8cc0465 Add test for ShapeEnv state when not recording. (#109945)
This PR adds a test for checking `ShapeEnv` state when it's built with
`should_record_events=False`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109945
Approved by: https://github.com/ezyang
ghstack dependencies: #109904, #109944
2023-09-26 07:20:46 +00:00
2ac7e52d34 [dynamo][nn_module_guards] Config flag to disable nn_module_guards (#110039)
This flag is requested by @Chillee, who is seeing recompilations with simple gpt experiments. We are observing recompilations because the `_parameters` ordered dict keeps changing from run to run, and it's unclear why that is happening.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110039
Approved by: https://github.com/Chillee
ghstack dependencies: #110023
2023-09-26 06:35:23 +00:00
dd819138da [pytorch vulkan] add tensor vulkan check for at::cat (#109936)
Summary:
Saw this issue when running PyTorch Vulkan on an LSTM model:

https://www.internalfb.com/phabricator/paste/view/P834993118

Found that we don't always do the Vulkan transfer on `at::cat`.

Test Plan:
(Not running the LSTM model yet, since there are other crashes.)

```
[yipjustin@47884.od /data/sandcastle/boxes/fbsource (3fd2308f8|remote/fbcode/warm_fbcode_od_stable...)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run fbcode/mode/dev-nosan //xplat/caffe2:pt_vulkan_api_test_bin  -- --gtest_filter="*cat*"
Building: finished in 0.1 sec (100%) 330/330 jobs, 0/330 updated
  Total time: 0.2 sec
BUILD SUCCEEDED
Running main() from third-party/googletest/1.11.0/googletest/googletest/src/gtest_main.cc
Note: Google Test filter = *cat*
[==========] Running 43 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 43 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.replication_pad2d
[       OK ] VulkanAPITest.replication_pad2d (102 ms)
[ RUN      ] VulkanAPITest.cat_4d_dim0_invalidinputs_exceptions
[       OK ] VulkanAPITest.cat_4d_dim0_invalidinputs_exceptions (67 ms)
[ RUN      ] VulkanAPITest.cat_4d_dim0_samebatch_success
[       OK ] VulkanAPITest.cat_4d_dim0_samebatch_success (111 ms)
[ RUN      ] VulkanAPITest.cat_4d_dim0_diffbatch_success
[       OK ] VulkanAPITest.cat_4d_dim0_diffbatch_success (76 ms)
[ RUN      ] VulkanAPITest.cat_4d_dim0_singledepth_success
[       OK ] VulkanAPITest.cat_4d_dim0_singledepth_success (40 ms)
[ RUN      ] VulkanAPITest.cat_4d_dim0_singletensor_success
[       OK ] VulkanAPITest.cat_4d_dim0_singletensor_success (7 ms)
[ RUN      ] VulkanAPITest.cat_4d_dim0_twotensors_success
[       OK ] VulkanAPITest.cat_4d_dim0_twotensors_success (30 ms)
[ RUN      ] VulkanAPITest.cat_4d_dim0_negdim_success
[       OK ] VulkanAPITest.cat_4d_dim0_negdim_success (78 ms)
[ RUN      ] VulkanAPITest.cat_4d_dim1_negdim_success
[       OK ] VulkanAPITest.cat_4d_dim1_negdim_success (130 ms)
[ RUN      ] VulkanAPITest.cat_4d_dim2_negdim_success
[       OK ] VulkanAPITest.cat_4d_dim2_negdim_success (75 ms)
[ RUN      ] VulkanAPITest.cat_4d_dim3_negdim_success
[       OK ] VulkanAPITest.cat_4d_dim3_negdim_success (68 ms)
[ RUN      ] VulkanAPITest.cat_4d_dim1_texture2d_success
[       OK ] VulkanAPITest.cat_4d_dim1_texture2d_success (2 ms)
[ RUN      ] VulkanAPITest.cat_4d_dim1_singledepth_success
[       OK ] VulkanAPITest.cat_4d_dim1_singledepth_success (65 ms)
[ RUN      ] VulkanAPITest.cat_4d_dim1_singletensor_success
[       OK ] VulkanAPITest.cat_4d_dim1_singletensor_success (8 ms)
[ RUN      ] VulkanAPITest.cat_4d_dim1_bat1_mult4ch_success
[       OK ] VulkanAPITest.cat_4d_dim1_bat1_mult4ch_success (9 ms)
[ RUN      ] VulkanAPITest.cat_4d_dim1_bat2_mult4ch_success
[       OK ] VulkanAPITest.cat_4d_dim1_bat2_mult4ch_success (18 ms)
[ RUN      ] VulkanAPITest.cat_4d_dim1_mult4ch_mixed_success
[       OK ] VulkanAPITest.cat_4d_dim1_mult4ch_mixed_success (60 ms)
[ RUN      ] VulkanAPITest.cat_4d_dim2_sameheight_success
[       OK ] VulkanAPITest.cat_4d_dim2_sameheight_success (80 ms)
[ RUN      ] VulkanAPITest.cat_4d_dim2_diffheight_success
[       OK ] VulkanAPITest.cat_4d_dim2_diffheight_success (69 ms)
[ RUN      ] VulkanAPITest.cat_4d_dim2_singledepth_success
[       OK ] VulkanAPITest.cat_4d_dim2_singledepth_success (12 ms)
[ RUN      ] VulkanAPITest.cat_4d_dim2_invalidinputs_exceptions
[       OK ] VulkanAPITest.cat_4d_dim2_invalidinputs_exceptions (63 ms)
[ RUN      ] VulkanAPITest.cat_4d_dim3_invalidinputs_exceptions
[       OK ] VulkanAPITest.cat_4d_dim3_invalidinputs_exceptions (86 ms)
[ RUN      ] VulkanAPITest.cat_4d_dim3_samewidth_success
[       OK ] VulkanAPITest.cat_4d_dim3_samewidth_success (117 ms)
[ RUN      ] VulkanAPITest.cat_4d_dim3_diffwidth_success
[       OK ] VulkanAPITest.cat_4d_dim3_diffwidth_success (72 ms)
[ RUN      ] VulkanAPITest.cat_3d_dim0_mult4ch_success
[       OK ] VulkanAPITest.cat_3d_dim0_mult4ch_success (12 ms)
[ RUN      ] VulkanAPITest.cat_3d_dim0_diff_channel_success
[       OK ] VulkanAPITest.cat_3d_dim0_diff_channel_success (28 ms)
[ RUN      ] VulkanAPITest.cat_3d_dim0_same_channel_success
[       OK ] VulkanAPITest.cat_3d_dim0_same_channel_success (15 ms)
[ RUN      ] VulkanAPITest.cat_3d_dim1_diffheight_success
[       OK ] VulkanAPITest.cat_3d_dim1_diffheight_success (21 ms)
[ RUN      ] VulkanAPITest.cat_3d_dim1_same_height_success
[       OK ] VulkanAPITest.cat_3d_dim1_same_height_success (10 ms)
[ RUN      ] VulkanAPITest.cat_3d_dim2_diffwidth_success
[       OK ] VulkanAPITest.cat_3d_dim2_diffwidth_success (21 ms)
[ RUN      ] VulkanAPITest.cat_3d_dim2_samewidth_success
[       OK ] VulkanAPITest.cat_3d_dim2_samewidth_success (11 ms)
[ RUN      ] VulkanAPITest.cat_3d_dim0_negdim_success
[       OK ] VulkanAPITest.cat_3d_dim0_negdim_success (25 ms)
[ RUN      ] VulkanAPITest.cat_3d_dim1_negdim_success
[       OK ] VulkanAPITest.cat_3d_dim1_negdim_success (23 ms)
[ RUN      ] VulkanAPITest.cat_3d_dim2_negdim_success
[       OK ] VulkanAPITest.cat_3d_dim2_negdim_success (10 ms)
[ RUN      ] VulkanAPITest.cat_2d_dim0_same_height_success
[       OK ] VulkanAPITest.cat_2d_dim0_same_height_success (3 ms)
[ RUN      ] VulkanAPITest.cat_2d_dim0_diff_height_success
[       OK ] VulkanAPITest.cat_2d_dim0_diff_height_success (2 ms)
[ RUN      ] VulkanAPITest.cat_2d_dim1_same_width_success
[       OK ] VulkanAPITest.cat_2d_dim1_same_width_success (3 ms)
[ RUN      ] VulkanAPITest.cat_2d_dim1_diff_width_success
[       OK ] VulkanAPITest.cat_2d_dim1_diff_width_success (4 ms)
[ RUN      ] VulkanAPITest.cat_2d_dim0_negdim_success
[       OK ] VulkanAPITest.cat_2d_dim0_negdim_success (3 ms)
[ RUN      ] VulkanAPITest.cat_2d_dim1_negdim_success
[       OK ] VulkanAPITest.cat_2d_dim1_negdim_success (3 ms)
[ RUN      ] VulkanAPITest.cat_1d_dim0_same_width_success
[       OK ] VulkanAPITest.cat_1d_dim0_same_width_success (52 ms)
[ RUN      ] VulkanAPITest.cat_1d_dim0_diff_width_success
[       OK ] VulkanAPITest.cat_1d_dim0_diff_width_success (0 ms)
[ RUN      ] VulkanAPITest.cat_1d_dim0_negdim_success
[       OK ] VulkanAPITest.cat_1d_dim0_negdim_success (0 ms)
[----------] 43 tests from VulkanAPITest (1717 ms total)

[----------] Global test environment tear-down
[==========] 43 tests from 1 test suite ran. (1717 ms total)
[  PASSED  ] 43 tests.

  YOU HAVE 4 DISABLED TESTS
```

Differential Revision: D49566743

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109936
Approved by: https://github.com/SS-JIA
2023-09-26 06:08:17 +00:00
5dcee01c2b Monitor baseline for TD prioritizations (#110031)
For tests that TD prioritizes, we should track what their ordering _would have been_ if none of the TD heuristics had applied to them.

This is useful for two reasons:
1. It lets us better understand how TD may have contributed to that test running sooner
2. it's possible that heuristics actually mark a test as less important than the default sorting would have claimed (the default sorts tests in a fixed order). This will let us track how often that happens
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110031
Approved by: https://github.com/clee2000
2023-09-26 04:27:16 +00:00
ac1e85161e [MPS] Fix nll_loss with default ignore_index (#109574)
`-100` should be a valid `ignore_index` as indicated in the linked issue. This PR also cleans up some unnecessary MPSTensor copies.

Fixes #108148
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109574
Approved by: https://github.com/kulinseth
ghstack dependencies: #109557
2023-09-26 04:13:09 +00:00
0087118997 [MPS] Fix mps to cpu copy with storage offset (#109557)
Fix #108978

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109557
Approved by: https://github.com/DenisVieriu97
2023-09-26 04:13:08 +00:00
129f535778 [VMAP] Add linspace and logspace batch rules (#105451)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105451
Approved by: https://github.com/zou3519
ghstack dependencies: #107958, #104889
2023-09-26 04:08:24 +00:00
5589b81173 Remove redundant change for gloo (#106750)
HIP-deprecated symbols were removed by d74270ece2 and fe2ad9c328, which are already included in PyTorch's gloo.

gloo in pytorch master: 597accfd79

There is no need to fix it in pytorch now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106750
Approved by: https://github.com/jithunnair-amd, https://github.com/kit1980
2023-09-26 03:46:14 +00:00
dddf07e56a Reland: implement a function to convert a storage to copy-on-write (#110022)
Relands #100819

In addition, the `impl_cow_context` library is combined into the base c10 core library, and COW unit tests are combined into just one binary.

Part of #109833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110022
Approved by: https://github.com/ezyang
2023-09-26 03:33:18 +00:00
76fcec74c4 [optests] Test names in failure dicts should be prefixed with test class (#110045)
We want to use the same failures dict for multiple TestCases. This happens
commonly, e.g. in fbgemm. To move toward that, we need to prefix each test name
with its test class to avoid ambiguity.

Differential Revision: [D49615962](https://our.internmc.facebook.com/intern/diff/D49615962/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110045
Approved by: https://github.com/williamwen42
2023-09-26 03:21:12 +00:00
41bb5c27a2 Enable typechecking for _inductor/fx_passes/joint_graph.py (#109955)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109955
Approved by: https://github.com/Skylion007
ghstack dependencies: #109951, #109952, #109954
2023-09-26 02:49:43 +00:00
86762f33d1 Enable typechecking for _inductor/fx_passes/pad_mm.py (#109954)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109954
Approved by: https://github.com/Skylion007
ghstack dependencies: #109951, #109952
2023-09-26 02:49:43 +00:00
55f8553078 Enable typechecking for _inductor/fx_passes/pre_grad.py (#109952)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109952
Approved by: https://github.com/Skylion007
ghstack dependencies: #109951
2023-09-26 02:49:42 +00:00
89fc66fb36 Enable typechecking for _inductor/fx_passes/split_cat.py (#109951)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109951
Approved by: https://github.com/Skylion007
2023-09-26 02:49:40 +00:00
ac60638c6c [ndk] Clean up LLVM and libc++ 12 and 13 (#107326)
Reviewed By: simpleton

Differential Revision: D48410595

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107326
Approved by: https://github.com/yozhu
2023-09-26 02:05:27 +00:00
f8fcc54f70 Add torch.library.impl_abstract (#109912)
Changelog:
- torch.library.impl_abstract optionally accepts a torch.library.Library
  object. If passed in, then the lifetime of the registration is tied to
  the Library object.
- we've also changed torch.library.impl_abstract to work on all
  operators, including overloads.
- we refactored the `torch._custom_ops.*` and `torch._custom_op.*`
  impl_abstract APIs and put them under torch._library. This is the
  final resting place for them. I will follow up by deleting
  all the `torch._custom_ops.*` stuff later.
- There is a new "SimpleOperatorRegistry" where we actually collect the
  abstract_impl. We will expand this to also hold the other
  torch._custom_ops.* APIs when we move those to torch.library

NB: Previously we had designed
`impl_abstract` assuming a very high-level Python-only custom op API.
We've revisited that since; now, impl_abstract works for all custom ops,
no matter python or C++, no matter the schema. The new refactored design
reflects this better.
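
A hedged usage sketch for a custom op (op name and schema here are hypothetical):

```python
import torch

lib = torch.library.Library("mylib", "DEF")
lib.define("twice(Tensor x) -> Tensor")

@torch.library.impl_abstract("mylib::twice", lib=lib)
def twice_abstract(x):
    # Runs under fake tensors: only produce an output with the right metadata.
    return torch.empty_like(x)
```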

Test Plan:
- existing and new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109912
Approved by: https://github.com/ezyang
2023-09-26 01:59:50 +00:00
b481349d3c [dynamo][guards-log] Do not print duplicate guard entries (#110023)
Cleans up logs for nn module guards. They always get duplicated.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110023
Approved by: https://github.com/ezyang
2023-09-26 01:59:25 +00:00
56659844f9 [profiler] Show shapes for lists of tensors in chrome traces #109263 (#109751)
Summary:
https://github.com/pytorch/pytorch/issues/109263
Show the shapes of tensor lists when the list length is < 30.

Test Plan:
{F1097707985}
and unit tests

Reviewed By: davidberard98

Differential Revision: D49351902

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109751
Approved by: https://github.com/davidberard98
2023-09-26 01:03:54 +00:00
4bf1cd6961 [aotinductor] Rename aot_runtime to aoti_runtime (#110007)
Summary: Make the naming more explicit

Differential Revision: D49593528

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110007
Approved by: https://github.com/houseroad
2023-09-26 00:46:54 +00:00
b07bebd4bd Add default arguments to sym_constrain_range_for_size (#109858)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109858
Approved by: https://github.com/williamwen42
2023-09-26 00:35:33 +00:00
cyy
bcedbac96a Re-enable more Windows tests (#109847)
Follows the work of #108930.

The commented-out test_custom_classes.py entry was removed since the file doesn't exist.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109847
Approved by: https://github.com/kit1980
2023-09-26 00:29:31 +00:00
a81cb0de16 [Dynamo] Support python class member_descriptor (#109956)
Fixes Meta internal cases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109956
Approved by: https://github.com/jansel
2023-09-26 00:03:41 +00:00
5f6216b12c Add torch.fx.experimental.recording to uninteresting_files() (#109887)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109887
Approved by: https://github.com/Chillee
2023-09-25 23:22:29 +00:00
7af30ea54c [AOTInductor] Bug fix for redefining symbol name (#110041)
Summary:
Bug fix for redefining symbol name.

Test Plan:
python benchmarks/dynamo/huggingface.py --bfloat16 --accuracy --inference --device cuda --export-aot-inductor --cold-start-latency --only OPTForCausalLM

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110041
Approved by: https://github.com/desertfire
2023-09-25 23:03:06 +00:00
6275f91654 Improved DDP checkpoint documentation (#106985)
Amended the documentation for the specified case.

Fixes #84589

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106985
Approved by: https://github.com/wanchaol, https://github.com/fduwjj
2023-09-25 22:54:24 +00:00
7ed06e8317 [inductor] enable mypy checking in torch/_inductor/codegen/cpp.py (#109729)
Summary: Add enough typehints / ignores to enable mypy checking in torch/_inductor/codegen/cpp.py

Test Plan: lintrunner

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109729
Approved by: https://github.com/Skylion007
2023-09-25 22:53:05 +00:00
f87863335c [BE]s/DEFINE_ENUM/DEFINE_ST_ENUM_VAL_/ (#109917)
To avoid potential collisions with other libraries that may define such an enum globally (which is bad practice, but happens sometimes)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109917
Approved by: https://github.com/Skylion007
2023-09-25 22:19:09 +00:00
57cdad2396 [aotinductor] Update benchmark to include compilation time (#109998)
Fixes [comment](https://github.com/pytorch/pytorch/pull/109820#pullrequestreview-1638629777)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109998
Approved by: https://github.com/desertfire
2023-09-25 21:30:22 +00:00
ab70183c53 [RFC] Allow "spawn" start method for torchinductor workers. (#108850)
Context: https://github.com/pytorch/pytorch/issues/108586

This PR adds a config to torchinductor such that users can specify the multiprocessing context for TorchInductor workers in codecache.

This gives users the option of using "spawn" in multithreaded environments instead of the previously hardcoded "fork" default.
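A sketch of how this might look from the user side (the config name here is assumed for illustration, not taken from the PR):
```python
import torch._inductor.config as inductor_config

# Assumed knob: select "spawn" for the parallel-compile worker processes
# instead of the previously hardcoded "fork", e.g. in multithreaded hosts.
inductor_config.worker_start_method = "spawn"
```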

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108850
Approved by: https://github.com/ezyang, https://github.com/zdevito
2023-09-25 21:30:17 +00:00
a4dec8d306 Add test for ShapeEnv recording fallback. (#109944)
This PR adds a test for the previous PR in this stack: #109904. In summary, it calls
functions decorated with `@record_shapeenv_event` that don't have an explicit `ShapeEnv`
parameter, passing arguments that don't hold a `ShapeEnv` instance.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109944
Approved by: https://github.com/ezyang
ghstack dependencies: #109904
2023-09-25 20:59:41 +00:00
5c4b5baf21 Fix python decomps for OpOverloadPackets and add tests (#107707)
- Extend `test_torch_dispatch_meta_outplace` to test torch ops that do not have an out parameter but whose aten overloads do. Additionally, Python decompositions may register `OpOverloadPacket`s, so decompositions need to be tested to ensure all `OpOverload`s still function for the `Meta` key (e.g. if a python decomposition is registered for an aten op `aten.foo` with overloads `[default, out]`, the python function needs to support receiving out arguments)

- Add out parameter wrappers to python decomps for aten ops that have out overloads

CC. @ezyang @albanD @lezcano

Fixes #107713

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107707
Approved by: https://github.com/lezcano
2023-09-25 20:53:30 +00:00
c1a2f35805 Revert "Disallow skipping dynamo (#109476)"
This reverts commit 7bb1d10c2ff06116506fb190c1b816a5b75f46ff.

Reverted https://github.com/pytorch/pytorch/pull/109476 on behalf of https://github.com/atalman due to Failing internal CI ([comment](https://github.com/pytorch/pytorch/pull/109476#issuecomment-1734402581))
2023-09-25 20:20:50 +00:00
c4f2b6dbd2 [profiler] use PyCFunction_Check to check both PyCMethod_Type and PyC… (#110002)
At https://github.com/pytorch/pytorch/blob/main/torch/csrc/autograd/profiler_python.cpp#L1096, when `what` is PyTrace_C_CALL, Py_TYPE(arg) can only be PyCFunction_Type before Python 3.9. But in Python 3.9 or later, Py_TYPE(arg) can also be PyCMethod_Type.
PyCMethod_Type is a subtype of PyCFunction_Type; ref:
f2eaa92b0c/Objects/methodobject.c (L372).
So PyCFunction_Check should be used there to check arg->ob_type.

Fixes #109877

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110002
Approved by: https://github.com/ezyang
2023-09-25 20:17:25 +00:00
83deaa16ed Revert "[1/N] Cleanup header inclusions in torch_cpu by iwyu (#101178)"
This reverts commit b7a95f4fdb8a79dc459cc757dafcdbd0953b1a62.

Reverted https://github.com/pytorch/pytorch/pull/101178 on behalf of https://github.com/atalman due to Break internal CI ([comment](https://github.com/pytorch/pytorch/pull/101178#issuecomment-1734384645))
2023-09-25 20:05:25 +00:00
d6cc3ac8b2 Add PR number to metrics when available (#109406)
Add a new metric for pull request number in `tools/stats/upload_metrics.py`. This allows tracking the CI performance of pull requests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109406
Approved by: https://github.com/kit1980, https://github.com/malfet, https://github.com/clee2000
2023-09-25 19:57:34 +00:00
3de0857503 [Dynamo] Match closures by code ID (#109427)
Closes https://github.com/pytorch/pytorch/issues/107866

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109427
Approved by: https://github.com/ezyang, https://github.com/jansel
2023-09-25 19:10:35 +00:00
09c598745c Rename torch._C._TensorBase to TensorBase (#109940)
I have gone ahead and implemented the renaming of the type `torch._C._TensorBase` to a non-private class name `TensorBase`.
The changes also include leaving `torch._C._TensorBase` as an alias to the new type, both in the C++ code: 70458768fb/torch/csrc/autograd/python_variable.cpp (L2196-L2197), and in the corresponding `__init__.pyi.in` file:
70458768fb/torch/_C/__init__.pyi.in (L1522)
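As a quick sanity check of the alias in builds that include this change (minimal sketch):
```python
import torch

# The old private name keeps working as an alias of the new public one.
assert torch._C.TensorBase is torch._C._TensorBase
```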

Fixes #109438

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109940
Approved by: https://github.com/ezyang
2023-09-25 19:10:22 +00:00
a565f1bee6 [aotinductor] Skip benchmarks with control flow (#109661)
Since AOTInductor doesn't support control flow yet, we will skip tests that are currently failing because the code contains control flow. Logs taken from https://hud.pytorch.org/benchmark/compilers?startTime=Tue%2C%2012%20Sep%202023%2022%3A56%3A40%20GMT&stopTime=Tue%2C%2019%20Sep%202023%2022%3A56%3A40%20GMT&granularity=hour&suite=torchbench&mode=inference&dtype=bfloat16&lBranch=main&lCommit=2c1554a0323107d821be3ff13df7833b9f0b960d&rBranch=main&rCommit=47be61e12bd51df27182343d312dc3df485d5559

Errors documented in https://github.com/pytorch/pytorch/issues/105217

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109661
Approved by: https://github.com/desertfire
2023-09-25 18:49:06 +00:00
6b39cf863f Fix invalid arg to getLogger in torch distributed checkpoint (#110008)
Ran the experimental LOG002 ruff check and found a bug in our codebase. A logger should not be instantiated from `__file__`; it should be instantiated from `__name__`.

https://docs.astral.sh/ruff/rules/invalid-get-logger-argument/
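A minimal illustration of the difference:
```python
import logging

# Bad: __file__ is a filesystem path, not a dotted module name, so the
# logger falls outside the normal logger hierarchy.
bad_logger = logging.getLogger(__file__)

# Good: __name__ is the module's dotted import path, so parent/child
# logger configuration works as expected.
good_logger = logging.getLogger(__name__)
```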
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110008
Approved by: https://github.com/ezyang
2023-09-25 18:21:18 +00:00
7de669f2f9 [core IR] Remove trunc decomp and add trunc to core (#109902)
Following up from [this comment](https://github.com/pytorch/pytorch/pull/109319#discussion_r1330803226). Remove the decomposition for `trunc`, and add it as a core operator.

Going forward, similar treatment should be given to operators that map cleanly to hardware instructions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109902
Approved by: https://github.com/peterbell10
2023-09-25 18:18:06 +00:00
fe5e63f5db [inductor] Do type promotion in pointless cumsum pattern replacement (#109960)
Fixes #109925

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109960
Approved by: https://github.com/Fidget-Spinner, https://github.com/lezcano
2023-09-25 18:17:15 +00:00
4734496a0c Extend storage access error api for untyped_storage() (#109750)
In cudagraph trees, we invalidate tensors at some point and drop their storage. Then, when they are accessed with .data_ptr(), a custom error message is thrown. Previously, this invalidation didn't also make untyped_storage()/storage() raise, which could result in a segfault.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109750
Approved by: https://github.com/zou3519
2023-09-25 17:51:27 +00:00
a5364b12bb Revert "[ONNX] Remove the depreacated function _export (#109763)"
This reverts commit d7c05bb2e8de24386664c01e887357ff50a09842.

Reverted https://github.com/pytorch/pytorch/pull/109763 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/109763#issuecomment-1734201053))
2023-09-25 17:47:21 +00:00
52e14787ae Revert "MAINT: pytorchify torch._numpy tests: core/ and fft/ (#109815)"
This reverts commit 5ad1baf6fa036690786cc45dafb79c6a4656cec5.

Reverted https://github.com/pytorch/pytorch/pull/109815 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing a slow test in trunk 5ad1baf6fa.  Please help fix and reland the change ([comment](https://github.com/pytorch/pytorch/pull/109815#issuecomment-1734137821))
2023-09-25 17:01:27 +00:00
f5886bf352 Revert "[3/N][2D] Enable training with new 2D flow (#109553)"
This reverts commit 217b37c023d58854a7a6117c3726ed44786c9d03.

Reverted https://github.com/pytorch/pytorch/pull/109553 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but those distributed failures look legit and they are failing in trunk https://hud.pytorch.org/pr/109553 ([comment](https://github.com/pytorch/pytorch/pull/109553#issuecomment-1734100546))
2023-09-25 16:37:19 +00:00
837272f150 Python 3.10 Union operator | support for JIT (#109293)
Fixes #101777

- [x] Duplicated the tests from `test/jit/test_union.py` into [`test/jit/test_union_pep604.py`](https://github.com/pytorch/pytorch/pull/109293/files#diff-b981f6493093482b43b0e62057b0c01b004b3e932d4e63a1166c3808c0172b83), using PEP604 style Unions
- [x] Exchanged custom `get_args` and `get_origin` with `typing.get_args` and `typing.get_origin`, which have the same functionality and became part of the standard library in 3.8
- [x] Added utility function `pep604union_to_union` in `tree_views.h` which converts a `BinOP("|")` node into the corresponding `Union`. This function intercepts `ScriptTypeParser::parseTypeFromExpr` and `ScriptTypeParser::parseTypeFromExprImpl` and patches the expression.
- [ ] There is a single failing test; I commented it out for the moment to see if CI complains about anything else. I spent several hours trying to figure out how to patch it, but I am not experienced with C++ development and debugging.

From what I could gather, the following fails:

```python
    def test_union_optional_of_union_return(self):
        @torch.jit.script
        def fn() -> None | str | int:
            y: Optional[int | str] = "foo"
            return y
```

In the section:

75b954b715/torch/csrc/jit/frontend/script_type_parser.cpp (L232-L243)

When using a regular `Union`, the `resolver` path is taken, whereas with the patched PEP 604 union, `resolveType` doesn't work.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109293
Approved by: https://github.com/ezyang
2023-09-25 15:35:54 +00:00
d0fe8fa5db Reland "Update AOTAutograd to use FunctionalTensorMode instead of C++ functionalization (#106406)" (#109906)
I'm pretty sure this is fixed, but I'll run inductor and trunk CI. The previously failing trunk test was that the recently landed selective activation checkpointing code assumes it can detect whether or not AOTAutograd is running by checking whether the inputs to SAC are C++ `FunctionalTensorWrapper`s.

The previous land broke some inductor trunk tests.

This reverts commit 629a628cc8bb1f62e2cce11bf0c8a00d3d06f896.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109906
Approved by: https://github.com/ezyang
2023-09-25 14:53:54 +00:00
3beed41e12 [Easy] Remove hook warning where source is always guaranteed (#109898)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109898
Approved by: https://github.com/ezyang
2023-09-25 14:36:28 +00:00
5565a29568 Release GIL in torch.cuda ops wherever possible. (#109159)
Most `torch.cuda` ops (e.g. `torch.cuda.synchronize`) do not release the GIL in C++ land. This has the potential to cause deadlocks and freeze the Python process. For example, `torch.cuda.synchronize` could hold the GIL and get blocked on some operation; however, that operation might never complete in Python land, since the GIL is held by `torch.cuda.synchronize`.

In this PR, I've tried to release the GIL as much as possible in `torch.cuda` ops.

See https://github.com/pytorch/pytorch/issues/109074 for an example of how holding the GIL causes a deadlock.
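A schematic sketch of what releasing the GIL enables (not the repro from the issue; `torch.cuda._sleep` is an internal helper used here just to keep the device busy):
```python
import threading
import time
import torch

heartbeats = []

def heartbeat():
    # A pure-Python thread: it can only make progress while the GIL is free.
    for _ in range(5):
        heartbeats.append(time.time())
        time.sleep(0.01)

torch.cuda._sleep(1 << 30)  # enqueue a long-running kernel
t = threading.Thread(target=heartbeat)
t.start()
torch.cuda.synchronize()  # with this PR, blocks without holding the GIL
t.join()
print(len(heartbeats))  # heartbeats were recorded while synchronize() waited
```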
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109159
Approved by: https://github.com/ezyang
2023-09-25 14:35:31 +00:00
96a3a7cc82 [pytorch] make IterableDataset of Iterable type (#109645)
Summary: Makes `IterableDataset` of `Iterable` type.

Test Plan: tests in the next diff in the stack are all green

Differential Revision: D49420146

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109645
Approved by: https://github.com/DanilBaibak, https://github.com/Skylion007
2023-09-25 14:18:15 +00:00
6a202c36af Minor fixes in semi-structured sparse code (#105595)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105595
Approved by: https://github.com/jcaip
2023-09-25 14:06:08 +00:00
829b5c0949 Revert "[Dynamo] Support python class member_descriptor (#109956)"
This reverts commit 12cd776d902dea1ee3f0ef7980bea62ff64096d2.

Reverted https://github.com/pytorch/pytorch/pull/109956 on behalf of https://github.com/jeanschmidt due to multiple slow jobs broken ([comment](https://github.com/pytorch/pytorch/pull/109956#issuecomment-1733706269))
2023-09-25 13:25:45 +00:00
217b37c023 [3/N][2D] Enable training with new 2D flow (#109553)
This PR enables training with new 2D flow and adds associated test.

state_dict related changes would be in later PRs.

cc. @fegin, @fduwjj, @wanchaol, @awgu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109553
Approved by: https://github.com/fegin, https://github.com/awgu
2023-09-25 05:32:07 +00:00
12cd776d90 [Dynamo] Support python class member_descriptor (#109956)
Fixes Meta internal cases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109956
Approved by: https://github.com/jansel
2023-09-25 03:15:39 +00:00
cyy
265acd4bea Clean up CMake target linking (#109959)
This PR cleans up more CMake target linking.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109959
Approved by: https://github.com/ezyang
2023-09-25 01:37:14 +00:00
7c9052165a add fp16 support for native conv and deconv on CPU (#99497)
### Testing

Native conv vs. mkldnn conv on SPR (with avx512_fp16 support)

Single core:

Input | Naïve impl / us | oneDNN / us | Speed up
-- | -- | -- | --
IC: 64, OC: 256, kernel: 1, stride: 1, N: 256, H: 56, W: 56, G: 1, pad: 0 | 34676789 | 524199.8 | 66.15185
IC: 128, OC: 512, kernel: 1, stride: 1, N: 256, H: 28, W: 28, G: 1, pad: 0 | 33454125 | 349844.4 | 95.62573
IC: 256, OC: 256, kernel: 3, stride: 1, N: 1, H: 16, W: 16, G: 1, pad: 0 | 317650.1 | 2317.677 | 137.0554
IC: 128, OC: 256, kernel: 3, stride: 1, N: 1, L: 64 | 15334.68 | 167.264 | 91.67952

56 cores:

Input | Naïve impl / us | oneDNN / us | Speed up
-- | -- | -- | --
IC: 64, OC: 256, kernel: 1, stride: 1, N: 256, H: 56, W: 56, G: 1, pad: 0 | 1032064 | 11073.58 | 93.20061
IC: 128, OC: 512, kernel: 1, stride: 1, N: 256, H: 28, W: 28, G: 1, pad: 0 | 1000097 | 16371.19 | 61.08883
IC: 256, OC: 1024, kernel: 1, stride: 1, N: 256, H: 14, W: 14, G: 1, pad: 0 | 981813.4 | 9008.908 | 108.9825
IC: 1024, OC: 256, kernel: 1, stride: 1, N: 256, H: 14, W: 14, G: 1, pad: 0 | 1082606 | 10150.47 | 106.6558
IC: 256, OC: 256, kernel: 3, stride: 1, N: 1, H: 16, W: 16, G: 1, pad: 0 | 319980.6 | 181.598 | 1762.027

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99497
Approved by: https://github.com/jgong5, https://github.com/cpuhrsch
2023-09-25 01:31:26 +00:00
ca5f3a7436 TST: test that numpy dtypes do not graph break (#109974)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109974
Approved by: https://github.com/lezcano
2023-09-25 01:00:39 +00:00
84a67c0665 Use wrapper instead of V.graph.wrapper_code (#109883)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109883
Approved by: https://github.com/voznesenskym, https://github.com/jansel
2023-09-24 23:11:11 +00:00
10f9edc99d Don't -Werror on cast-function-type (#109796)
I recently built PyTorch with clang and we are apparently
not warnings clean on this.  Since we don't have any contbuild
that catches this situation, just get rid of it.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109796
Approved by: https://github.com/cpuhrsch
2023-09-24 23:05:10 +00:00
bb74d9104f [PTD][TP] Refactor the test and temporary disable one test case (#109919)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109919
Approved by: https://github.com/wz337
2023-09-24 22:52:20 +00:00
5ad1baf6fa MAINT: pytorchify torch._numpy tests: core/ and fft/ (#109815)
1. Inherit from TestCase
2. Use pytorch parametrization
3. Use unittest.expectedFailure to mark xfails, also unittest skips

All this to make pytest-less invocation work:

$ python test/torch_np/test_basic.py

cross-ref #109593, #109718, #109775

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109815
Approved by: https://github.com/lezcano
2023-09-24 16:46:01 +00:00
0d3db1048a remove nvfuser test in upstream pytorch (#109918)
Removing nvfuser related tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109918
Approved by: https://github.com/msaroufim
2023-09-24 13:49:37 +00:00
befe60afc2 TST: pytorchify test/torch_np/test_dtype.py (#109967)
This file was missing from https://github.com/pytorch/pytorch/pull/109593

NB: This PR only mechanically converts the test. Will add more tests to see what's going on with `dtype=np.float64` etc under dynamo.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109967
Approved by: https://github.com/lezcano
2023-09-24 13:34:02 +00:00
95e2eec9bf Better invariants - always route list/tuple to their requisite VTs instead of ConstantVariable (#109869)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109869
Approved by: https://github.com/jansel
2023-09-24 08:52:42 +00:00
e9c9b1ed59 [Inductor] Generalize inductor triton backend device agnostic (#109486)
# Motivation
@jansel As discussed before, we wanted to generalize some CUDA-specific code. This makes Inductor friendlier to third-party backends, so that they can leverage Inductor code as much as possible.

# Solution
To implement this, we introduce a device runtime abstraction. We wrap the runtime APIs inside `DeviceInterface` and use `register_interface_for_device` to register each kind of device with Inductor, then use `get_interface_for_device` to fetch the corresponding runtime for a device type. Usage looks like this:
```python
device_interface = get_interface_for_device("xpu")
device_interface.is_available()  # check if XPU is available
device_interface.device_count()  # check how many XPU devices are available
```
`DeviceInterface` is a simple abstraction that enables third-party backends implementing CUDA-like semantics to be integrated with Inductor. It keeps third-party backends from having to monkey-patch utility functions, like `decode_device`, that are hard-coded for CUDA.
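A sketch of the registration side for a hypothetical backend (the module path and method surface are assumed from the current upstream layout):
```python
from torch._dynamo.device_interface import (
    DeviceInterface,
    get_interface_for_device,
    register_interface_for_device,
)

class MyBackendInterface(DeviceInterface):
    # A third-party backend answers CUDA-like runtime queries here.
    @staticmethod
    def is_available() -> bool:
        return False  # stand-in: a real backend queries its runtime

    @staticmethod
    def device_count() -> int:
        return 0

register_interface_for_device("privateuseone", MyBackendInterface)
assert get_interface_for_device("privateuseone") is MyBackendInterface
```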

# Additional Context
The main code change:
- To leverage AsyncCompile, make it device-agnostic
- Avoid monkey patches, make some utility functions device-agnostic

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109486
Approved by: https://github.com/jansel, https://github.com/jgong5, https://github.com/EikanWang
2023-09-24 07:49:20 +00:00
cyy
b7a95f4fdb [1/N] Cleanup header inclusions in torch_cpu by iwyu (#101178)
Following our previous IWYU work (#100304) on C10, it makes more sense to try IWYU on torch_cpu. This PR does exactly that. Meanwhile, it fixes issue #48684.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101178
Approved by: https://github.com/ezyang
2023-09-24 05:01:20 +00:00
cyy
dee100945e [2/N] Move c10::variant to std::variant (#109723)
This PR moves most of c10::variant calls to std::variant.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109723
Approved by: https://github.com/ezyang
2023-09-24 02:47:43 +00:00
c13177f2cb [FSDP] Propagate requires_grad attribute to unsharded params (#109892)
Summary:
This preserves `requires_grad` in the case where all parameters within a `FlatParameter` have the same `requires_grad` value.

Currently, unsharded parameters have `requires_grad=True` in some cases where the `FlatParameter` and all original parameters have `requires_grad=False`.

This could be extended to support `FlatParameters` with a mix of `requires_grad` states by extending `ParamInfo` to capture `requires_grad` for each parameter.

Test Plan: test added

Differential Revision: D49517155

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109892
Approved by: https://github.com/awgu
2023-09-24 01:30:50 +00:00
ebb30bdd6f Revert "Better invariants - always route list/tuple to their requisite VTs instead of ConstantVariable (#109869)"
This reverts commit 06aa6966a88586d34e6470cc2149121d17971056.

Reverted https://github.com/pytorch/pytorch/pull/109869 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but the failed test looks legit as it is also failing in trunk 06aa6966a8 ([comment](https://github.com/pytorch/pytorch/pull/109869#issuecomment-1732424765))
2023-09-23 22:42:23 +00:00
d9627c4264 Revert "[inductor] fix a max-autotune rng state related bug (#109828)"
This reverts commit 3663436db31bd3cebcb76efe05d8355553a05c57.

Reverted https://github.com/pytorch/pytorch/pull/109828 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but the rocm failure looks legit. There is also another numpy import error when running dynamo test on CPU ([comment](https://github.com/pytorch/pytorch/pull/109828#issuecomment-1732423883))
2023-09-23 22:35:37 +00:00
b89ce814c0 [FSDP] Remove _set_use_dtensor in post_load_state_dict_hook (#109924)
This is a follow up for https://github.com/pytorch/pytorch/pull/109767.
We only need _set_use_dtensor in pre_state_dict_hook() and pre_load_state_dict_hook() and we do not need _set_use_dtensor in _post_load_state_dict_hook(). This PR removes _set_use_dtensor in post_load_state_dict_hook().

In addition, this PR adjusts the test cases in test_hsdp_dtensor_state_dict.py to capture changes in https://github.com/pytorch/pytorch/pull/109767

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109924
Approved by: https://github.com/fegin
2023-09-23 22:34:36 +00:00
7bb1d10c2f Disallow skipping dynamo (#109476)
Based on William's recent diff on preserving node metadata on retracing, we no longer need to skip dynamo on retracing. This softens our previous restriction of not allowing any new constraints from the user side, because we can now use dynamo to analyze through the constraints. As a result, re-export can technically happen with any new constraints. This opens up another question: is it OK to use looser constraints on retracing? If we allow loose constraints, we could technically diverge from eager behaviour because, for example, we could have eliminated unsafe control flow based on a previous assumption. But we can also argue this is OK if we treat the exported callable as independent of its original source code.
We could technically ban loose constraints inside export, but my concern is that doing special-case checks on ExportedProgram breaks the abstraction.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109476
Approved by: https://github.com/avikchaudhuri, https://github.com/zhxchen17
2023-09-23 22:15:18 +00:00
460fc9da62 Disabled UserWarnings for some public functions in torch.overrides (#109890)
Fixes #109842.

This disables the implicit `UserWarning`s that were raised for deprecated `torch` attributes. The filtering was designed to be as specific as possible, in order to not filter any other warnings that may be raised.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109890
Approved by: https://github.com/ezyang
2023-09-23 20:40:04 +00:00
f35cc0fb6f Don't record function call if ShapeEnv is not found. (#109904)
Fix: #109844

- Redirect execution to the original function if no `ShapeEnv` instance is found in its arguments
- Remove `dont_record_shape_env_events`, as it wasn't being used anywhere

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109904
Approved by: https://github.com/ezyang
2023-09-23 19:48:24 +00:00
92c49e2168 MAINT/TST: pytorch-ify torch._numpy tests (added tests only, not vendored) (#109593)
1. Inherit from TestCase
2. Use pytorch parametrization
3. Use unittest.expectedFailure to mark xfails

All this to make pytest-less invocation work:

$ python test/torch_np/test_basic.py

Furthermore, tests can now be run under dynamo, and we see the first errors:

```
$ PYTORCH_TEST_WITH_DYNAMO=1 python test/torch_np/test_basic.py -k test_toscalar_list_func
.E.
======================================================================
ERROR: test_toscalar_list_func_<function shape at 0x7f9b83a4fc10>_np_func_<function shape at 0x7f9a8dd38af0> (__main__.TestOneArrToScalar)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ev-br/repos/pytorch/torch/testing/_internal/common_utils.py", line 356, in instantiated_test
    test(self, **param_kwargs)
  File "test/torch_np/test_basic.py", line 232, in test_toscalar_list
    @parametrize("func, np_func", one_arg_scalar_funcs)
  File "/home/ev-br/repos/pytorch/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ev-br/repos/pytorch/torch/nn/modules/module.py", line 1528, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ev-br/repos/pytorch/torch/_dynamo/eval_frame.py", line 406, in _fn
    return fn(*args, **kwargs)
  File "/home/ev-br/repos/pytorch/torch/fx/graph_module.py", line 726, in call_wrapped
    return self._wrapped_call(self, *args, **kwargs)
  File "/home/ev-br/repos/pytorch/torch/fx/graph_module.py", line 305, in __call__
    raise e
  File "/home/ev-br/repos/pytorch/torch/fx/graph_module.py", line 292, in __call__
    return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
  File "/home/ev-br/repos/pytorch/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ev-br/repos/pytorch/torch/nn/modules/module.py", line 1528, in _call_impl
    return forward_call(*args, **kwargs)
  File "<eval_with_key>.2", line 5, in forward
    shape = torch._numpy._funcs_impl.shape([[1, 2, 3], [4, 5, 6]])
  File "/home/ev-br/repos/pytorch/torch/_numpy/_funcs_impl.py", line 655, in shape
    return tuple(a.shape)
AttributeError: 'list' object has no attribute 'shape'

----------------------------------------------------------------------
Ran 3 tests in 0.915s

FAILED (errors=1)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109593
Approved by: https://github.com/lezcano
2023-09-23 18:18:50 +00:00
8d47f90e50 Pytorchify numpy vendored tests in torch_np/lib/ (#109718)
1. Inherit from TestCase
2. Use pytorch parametrization
3. Use unittest.expectedFailure to mark xfails, also unittest skips

All this to make pytest-less invocation work:

$ python test/torch_np/test_basic.py

cross-ref https://github.com/pytorch/pytorch/pull/109593, https://github.com/pytorch/pytorch/pull/109775

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109718
Approved by: https://github.com/ezyang
2023-09-23 15:31:03 +00:00
835c18e7ea Avoid saving self for mean.backward (#109935)
Fixes https://github.com/pytorch/pytorch/issues/109876
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109935
Approved by: https://github.com/soulitzer
2023-09-23 11:50:54 +00:00
a13201e857 [DCP] Add unit test for FSDP -> TP checkpoint conversion (#109899)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109899
Approved by: https://github.com/rohan-varma
2023-09-23 09:19:45 +00:00
sdp
2872f788aa add path for DPC++ SYCL device code in Float8_e4m3fn (#109911)
Building IPEX-XPU with PyTorch fails with `error: builtin is not supported on this target _BitScanReverse` on Windows.

The root cause is that the `_BitScanReverse` compiler intrinsic is not supported in SYCL target device code with the DPC++ compiler, while it is supported in host code with the MSVC compiler. Thanks to @gujinghui, @xuhancn for the help in identifying the root cause and debugging.

A minimal reproducible script:
```cpp
#include <CL/sycl.hpp>
#include <chrono>
#include <iostream>

#ifdef _MSC_VER
#include <intrin.h>
#endif

void test(
  sycl::queue& q) {

  uint8_t input = 123;
  const uint32_t w = (uint32_t)input << 24;
  const uint32_t nonsign = w & UINT32_C(0x7FFFFFFF);
  unsigned long nonsign_bsr;
  _BitScanReverse(&nonsign_bsr, (unsigned long)nonsign); // host code, no error

  sycl::range<2> global_range{1, 1};
  sycl::range<2> local_range{1, 1};

  auto e = q.submit([&](auto& h) {
    sycl::stream out(100000, 256, h);
    h.parallel_for(sycl::nd_range<2>{global_range, local_range},
      [=](sycl::nd_item<2> item) {

        #if defined(_MSC_VER)
          uint8_t input = 123;
          const uint32_t w = (uint32_t)input << 24;
          unsigned long nonsign_bsr;
          _BitScanReverse(&nonsign_bsr, (unsigned long)nonsign); // device code, error: builtin is not supported on this target
        #else
          __builtin_clz(nonsign);
        #endif

      // Fix to add a check for SYCL device code:
      /*
      #if defined(__SYCL_DEVICE_ONLY__)
          out << "DPC++ SYCL" << sycl::endl;
          __builtin_clz(nonsign);
      #elif defined(_MSC_VER)
          out << "MSVC" << sycl::endl;
          uint8_t input = 123;
          const uint32_t w = (uint32_t)input << 24;
          unsigned long nonsign_bsr;
          _BitScanReverse(&nonsign_bsr, (unsigned long)nonsign);
      #endif
      */

      });
    });
  q.wait();
}

int main() {
  #if defined(__SYCL_DEVICE_ONLY__)
    std::cout << "DPC++ SYCL" << std::endl;
  #elif defined(_MSC_VER)
    std::cout << "MSVC" << std::endl;
  #endif

  sycl::queue q(sycl::default_selector_v);
  test(q);

  return 0;
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109911
Approved by: https://github.com/ezyang
2023-09-23 07:07:22 +00:00
85ddc985d0 Back out "[pytorch][PR] [Inductor] Extend Pattern Matcher to Match Equivalent Function Invocation" (#109931)
Summary:
Original commit changeset: 3466b85fe0a1

Original Phabricator Diff: D49433268

More context D49536556

bypass-github-pytorch-ci-checks

Test Plan: revertreverthammer

Differential Revision: D49565384

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109931
Approved by: https://github.com/houseroad
2023-09-23 05:58:08 +00:00
54faedf5f2 [AOTInductor] Load model on arbitrary device (#109816)
Reviewed By: desertfire

Differential Revision: D49402404

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109816
Approved by: https://github.com/chenyang78
2023-09-23 04:45:20 +00:00
bbdce93571 Basic fp8 support in Inductor (#109168)
Add basic fp8 support in Inductor, including:
* Fix fp8 Triton codegen issues;
* Add a min_elements_per_thread requirement for fp8-related dtype conversions. More details on the Triton implementation can be found at 10f59d8ce0/lib/Conversion/TritonGPUToLLVM/ElementwiseOpToLLVM.cpp (L10).

Note that the current implementation only works for Pointwise. Will create follow-up PRs for Reduction.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109168
Approved by: https://github.com/drisspg
2023-09-23 04:41:41 +00:00
ff7af15e80 Re-enable max_autotune tests for the CUTLASS backend. (#109831)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109831
Approved by: https://github.com/aakhundov
2023-09-23 04:27:40 +00:00
c0d746c90e [ONNX] Relax getting module attributes in ONNX export (#109759)
### Description

This PR fixes a bug with getting module attributes during `torch.onnx.export` when `export_modules_as_functions` is used. With this fix, we can compare the LLaMA-2 models produced by the TorchScript exporter and the [Dynamo exporter](https://github.com/pytorch/pytorch/issues/104903).

### Context
When exporting LLaMA-2 from Hugging Face with `export_modules_as_functions`, the `Embedding` object does not have the `freeze` attribute.
```
  File "/home/kvaishnavi/.local/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 662, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "/home/kvaishnavi/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/kvaishnavi/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1558, in _call_impl
    args_result = hook(self, args)
  File "/home/kvaishnavi/.local/lib/python3.8/site-packages/torch/onnx/utils.py", line 1394, in _track_module_attributes_forward_pre_hook
    setattr(module, attr_name, _get_module_attributes(module))
  File "/home/kvaishnavi/.local/lib/python3.8/site-packages/torch/onnx/utils.py", line 1474, in _get_module_attributes
    return {k: getattr(module, k) for k in annotations}
  File "/home/kvaishnavi/.local/lib/python3.8/site-packages/torch/onnx/utils.py", line 1474, in <dictcomp>
    return {k: getattr(module, k) for k in annotations}
  File "/home/kvaishnavi/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1696, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'Embedding' object has no attribute 'freeze'
```
To get around this issue, we can skip adding keys to the dictionary when the object does not have the corresponding attribute, as sketched below.
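A minimal sketch of the workaround (the real change lives in torch/onnx/utils.py; the annotation lookup here is an assumption for illustration):
```python
import typing

def _get_module_attributes(module):
    annotations = typing.get_type_hints(type(module))
    # Skip annotated names the instance doesn't actually have
    # (e.g. Embedding's `freeze`) instead of raising AttributeError.
    return {k: getattr(module, k) for k in annotations if hasattr(module, k)}
```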
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109759
Approved by: https://github.com/BowenBao
2023-09-23 02:47:51 +00:00
c789ed6e62 [Inductor][FX]support nn.Linear nn.ConvTransposeNd for efficient_conv_bn_eval (#109722)
Using the `functional_call` API, we can handle nn.Linear and nn.ConvTransposeNd just like a normal conv; see the sketch below.

Thanks to @albanD for pointing out the API in https://github.com/pytorch/pytorch/issues/109596.
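A minimal sketch of why `functional_call` helps (the folded values are stand-ins):
```python
import torch
from torch.func import functional_call

linear = torch.nn.Linear(4, 4)
x = torch.randn(2, 4)

# functional_call runs a module with substituted parameters, so the folding
# logic can treat Linear/ConvTransposeNd the same way as a plain conv.
folded = {"weight": linear.weight * 2.0, "bias": linear.bias}  # stand-in values
out = functional_call(linear, folded, (x,))
```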

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109722
Approved by: https://github.com/jansel
2023-09-23 01:12:34 +00:00
3663436db3 [inductor] fix a max-autotune rng state related bug (#109828)
Fix https://github.com/pytorch/pytorch/issues/109736 .

The HF pin move causes a regression on the accuracy check for HF models on the dashboard. Manually reverting the HF PR ( https://github.com/huggingface/transformers/pull/24696/files ) could recover, but this may hide some real issue. I happened to find that using a warm matmul max-autotune cache can work around the issue. Put another way:
- making all calls to check_cache miss the cache repros the issue
- making all calls to check_cache hit the cache works around the issue

I did a sort of 'bisect', forcing the number of cache misses to halve each time while still making sure we could repro. Luckily, reducing to a single cache miss still repros the issue. With more debugging, it turned out that the call to `torch.randn` on a cuda device was causing the problem.

The fix is to make sure we restore the rng state when we generate random inputs for max-autotune benchmarking.
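A sketch of the idea, not the literal patch: generate the benchmarking inputs inside `fork_rng` so the global RNG state is restored afterwards.
```python
import torch

def benchmark_choice(kernel, make_random_inputs):
    devices = [torch.cuda.current_device()] if torch.cuda.is_available() else []
    # fork_rng snapshots and restores the CPU (and listed CUDA) RNG state,
    # so calling torch.randn here can't perturb the model's random behavior.
    with torch.random.fork_rng(devices=devices):
        args = make_random_inputs()
        return kernel(*args)
```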

TBH, I cannot fully explain the root cause, although I know it's caused by the rng state change. AOTAutograd already has some logic to preserve rng state, and I cannot repro the issue in unit tests. I have a few guesses as to why the RNG state is not restored in the first place after we generate random inputs for max-autotune:
- maybe AOTAutograd misses some corner case when preserving the rng state
- maybe for the failing models there are some eager fallbacks that inductor doesn't handle, and if those fallbacks call random-number-related APIs, we see the issue. But again, I haven't found a good way to simulate this.

Repro:

```
TORCHINDUCTOR_BENCHMARK_KERNEL=1 TORCHINDUCTOR_MAX_AUTOTUNE_GEMM=1 CUDA_VISIBLE_DEVICES=3 time python benchmarks/dynamo/huggingface.py --backend inductor --amp --accuracy --only PLBartForCausalLM --training --cold-start-latency
```

We always repro the issue without the PR but pass the accuracy check with the PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109828
Approved by: https://github.com/eellison
2023-09-23 00:58:10 +00:00
d7f3986314 Fix S367052 to unblock ICVR MC3 (#109853)
Summary: Somehow "getitem" started to receive a Tensor starting from ads_ranking:996, which broke the SDD pipelining FX transformer. We need to skip the Tensor node in annotation.

Test Plan:
N4326037

# Before
 {F1099052907}

# With this diff

 {F1099052270}

Differential Revision: D49528046

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109853
Approved by: https://github.com/jackiexu1992, https://github.com/lanza, https://github.com/xush6528
2023-09-23 00:23:42 +00:00
06aa6966a8 Better invariants - always route list/tuple to their requisite VTs instead of ConstantVariable (#109869)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109869
Approved by: https://github.com/jansel
ghstack dependencies: #109896
2023-09-22 22:46:29 +00:00
691f8ca4f4 faster build instructions CONTRIBUTING.md (#109900)
Discovered this as I was building pytorch on a fresh g5.4x instance on AWS; building flash attention was bricking my machine

```
Building wheel torch-2.2.0a0+gitd0c8e82
-- Building version 2.2.0a0+gitd0c8e82
cmake --build . --target install --config Release
[1/748] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim96_fp16_sm80.cu.o
FAILED: caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim96_fp16_sm80.cu.o
/opt/conda/envs/torchbench/bin/ccache /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DAT_PER_OPERATOR_HEADERS -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTORCH_CUDA_BUILD_MAIN_LIB -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_CUDA -DUSE_DISTRIBUTED -DUSE_EXPERIMENTAL_CUDNN_V8_API -DUSE_EXTERNAL_MZCRC -DUSE_FLASH_ATTENTION -DUSE_MEM_EFF_ATTENTION -DUSE_NCCL -DUSE_RPC -DUSE_TENSORPIPE -D_FILE_OFFSET_BITS=64 -Dtorch_cuda_EXPORTS -I/home/ubuntu/pytorch/build/aten/src -I/home/ubuntu/pytorch/aten/src -I/home/ubuntu/pytorch/build -I/home/ubuntu/pytorch -I/home/ubuntu/pytorch/cmake/../third_party/benchmark/include -I/home/ubuntu/pytorch/third_party/onnx -I/home/ubuntu/pytorch/build/third_party/onnx -I/home/ubuntu/pytorch/third_party/foxi -I/home/ubuntu/pytorch/build/third_party/foxi -I/home/ubuntu/pytorch/aten/src/THC -I/home/ubuntu/pytorch/aten/src/ATen/cuda -I/home/ubuntu/pytorch/aten/src/ATen/../../../third_party/cutlass/include -I/home/ubuntu/pytorch/build/caffe2/aten/src -I/home/ubuntu/pytorch/aten/src/ATen/.. -I/home/ubuntu/pytorch/build/nccl/include -I/home/ubuntu/pytorch/c10/cuda/../.. -I/home/ubuntu/pytorch/c10/.. -I/home/ubuntu/pytorch/third_party/tensorpipe -I/home/ubuntu/pytorch/build/third_party/tensorpipe -I/home/ubuntu/pytorch/third_party/tensorpipe/third_party/libnop/include -I/home/ubuntu/pytorch/torch/csrc/api -I/home/ubuntu/pytorch/torch/csrc/api/include -isystem /home/ubuntu/pytorch/build/third_party/gloo -isystem /home/ubuntu/pytorch/cmake/../third_party/gloo -isystem /home/ubuntu/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include -isystem /home/ubuntu/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /home/ubuntu/pytorch/cmake/../third_party/googletest/googletest/include -isystem /home/ubuntu/pytorch/third_party/protobuf/src -isystem /home/ubuntu/pytorch/third_party/gemmlowp -isystem /home/ubuntu/pytorch/third_party/neon2sse -isystem /home/ubuntu/pytorch/third_party/XNNPACK/include -isystem /home/ubuntu/pytorch/third_party/ittapi/include -isystem /home/ubuntu/pytorch/cmake/../third_party/eigen -isystem /usr/local/cuda/include -isystem /home/ubuntu/pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -isystem /home/ubuntu/pytorch/third_party/ideep/include -isystem /home/ubuntu/pytorch/cmake/../third_party/cudnn_frontend/include -D_GLIBCXX_USE_CXX11_ABI=1 -Xfatbin -compress-all -DONNX_NAMESPACE=onnx_torch -gencode arch=compute_86,code=sm_86 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda  -Wno-deprecated-gpu-targets --expt-extended-lambda -DCUB_WRAPPED_NAMESPACE=at_cuda_detail -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -O3 -DNDEBUG -std=c++17 -Xcompiler=-fPIC -DTORCH_USE_LIBUV -DCAFFE2_USE_GLOO -Xcompiler=-Wall,-Wextra,-Wno-unused-parameter,-Wno-unused-function,-Wno-unused-result,-Wno-missing-field-initializers,-Wno-unknown-pragmas,-Wno-type-limits,-Wno-array-bounds,-Wno-unknown-pragmas,-Wno-strict-overflow,-Wno-strict-aliasing,-Wno-missing-braces,-Wno-maybe-uninitialized -MD -MT 
caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim96_fp16_sm80.cu.o -MF caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim96_fp16_sm80.cu.o.d -x cu -c /home/ubuntu/pytorch/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim96_fp16_sm80.cu -o caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim96_fp16_sm80.cu.o
Killed
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109900
Approved by: https://github.com/drisspg
2023-09-22 22:39:51 +00:00
8ed08e5a7c [dynamo] Graph break on rng get/set state - remove GeneratorStateSource (#109410)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109410
Approved by: https://github.com/ezyang
ghstack dependencies: #109411
2023-09-22 22:31:55 +00:00
a902150a1e [Easy] ConstantVariable() -> .create (#109896)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109896
Approved by: https://github.com/ezyang
2023-09-22 22:30:15 +00:00
e42d450a55 [core IR] Add div.Tensor_mode, div.Scalar_mode, and copy as core operators (#109812)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109812
Approved by: https://github.com/kirklandsign
2023-09-22 22:05:49 +00:00
334ead04a9 Back out "[decomp] Fix baddbmm decomposition (#109714)" (#109855)
Summary:
Original commit changeset: 95c462a380c9

Original Phabricator Diff: D49484954

this diff cause test failure for deterministic ne test see:https://www.internalfb.com/sandcastle/job/18014399565419856/

Test Plan:
buck2 test 'fbcode//mode/opt' fbcode//aps_models/ads/icvr/tests:icvr_fm_e2e_deterministic_ne_test -- --exact 'aps_models/ads/icvr/tests:icvr_fm_e2e_deterministic_ne_test - aps_models.ads.icvr.tests.icvr_fm_e2e_deterministic_ne_test.ICVR_FM_E2EDeterministicNeTest: test_e2e_deterministic_icvr_fm_pt2_fsdp_multi_gpus'

https://www.internalfb.com/intern/testinfra/testrun/16888498605839953

Differential Revision: D49527271

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109855
Approved by: https://github.com/yanboliang
2023-09-22 22:01:38 +00:00
f0d71de4ac Update caffe2 with LLVM-18 API change (#109408)
Summary: https://github.com/llvm/llvm-project/pull/66295 modified some internal LLVM APIs; update these places with the changes under an LLVM version guard

Test Plan: CI

Differential Revision: D49340871

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109408
Approved by: https://github.com/Skylion007
2023-09-22 21:40:58 +00:00
c26270c733 [C10D] Even more store scalability work. (#109218)
Fix a bug in socket.cpp timeout detection that only shows up with 10k ranks.

Make the minimum wait time in _store_based_barrier adaptive based on
the number of ranks.

Longer timeouts give more room for the store to do productive work when swamped.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109218
Approved by: https://github.com/XilunWu
ghstack dependencies: #109217
2023-09-22 21:27:09 +00:00
de1b00abda inductor: tigher upperbound for rblock scaling (#109839)
Previously, when deciding whether to dynamically scale down rblock, we used the following formula to compute the upper bound on the number of blocks per SM:
```
max_threads_per_multi_processor / (32 * num_warps)
```

This is correct, but it's a bit loose, and sometimes the loose upper bound makes us skip optimization opportunities.

The new upper bound is 65536 / n_reg_used_by_each_block. This is tighter and helps when the kernel uses many registers (i.e. much more than 32).
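For an illustrative calculation (numbers assumed, not from the PR): with num_warps = 8, a block has 32 * 8 = 256 threads, so on an SM allowing 2048 threads the old bound is 2048 / 256 = 8 blocks. If each thread uses 64 registers, a block needs 256 * 64 = 16384 registers, and with a 65536-register file the new bound is 65536 / 16384 = 4 blocks, which is tighter and reflects the actual register pressure.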

For the kernel https://gist.github.com/shunting314/59aeafd297ed8ff03aa12030a2dd41ae (a real kernel inductor generates for HF), the change improves its perf from
0.485ms    0.332GB    684.29GB/s
to
0.240ms    0.332GB    1382.70GB/s.

The perf was bad previously because of register spills.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109839
Approved by: https://github.com/jansel
2023-09-22 20:55:51 +00:00
e2cfbca5ab Add clip to dynamo runners (#109840)
CLIP was moved to the canary models because we use the multimodal version, which depends on torchtext, which torchbench deprecated in https://github.com/pytorch/benchmark/pull/1837.

This issue didn't show up before because we hadn't updated the torchbench pin.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109840
Approved by: https://github.com/cpuhrsch
2023-09-22 20:50:57 +00:00
2895fbd857 Enable typechecking for _inductor/pattern_matcher.py (#109613)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109613
Approved by: https://github.com/Skylion007
2023-09-22 20:50:21 +00:00
411ca10e74 [Pytorch][Vulkan] Add baddbmm (#109851)
Summary:
The implementation is similar to BMM and ADDMM: the bias tensor uses the packed weights, similar to MM, but increments the index via the z-dim to address more matrices in the batch.

Packed bias (input of MM):
```
ivec3 pos(k_, j_, 0);
float v = texelFetch(uInput, pos, 0)
# v.xyzw are 4 numbers in one matrix
# no batch
# k_, j_ have only 1/4 of the range of the original matrix size (an H*W matrix => an H/2 * W/2 * 1 3D image).

```
Packed bias (input of BMM):
```
ivec3 pos(k_, j_, i);
float v = texelFetch(uInput, pos, 0)
# v.xyzw are 4 numbers in one matrix
# i as batch id
```

**To support broadcasting**, the bias packing of `mm` is slightly different from weight packing: it repeats the single element in the height-dim twice to fill the 4 planes (see code for details). The width-dim isn't repeated twice, but the code still works, because stacking 3 planes together with the last one empty yields the same 3D image.
However, this doesn't work for `bmm`, since it's a series of `{4 planes} {4 planes} … {4 planes}` where each `{4 planes}` represents a matrix, so having only 3 planes completely messes up the indexing. Thus, I repeat the single element in the width-dim as well to fill all 4 planes and get the correct indexing.

https://pytorch.org/docs/stable/generated/torch.baddbmm.html

Test Plan:
```
[ttingchulin@27298.od /data/sandcastle/boxes/fbsource (bmm)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run fbcode/mode/dev-nosan //xplat/caffe2:pt_vulkan_api_test_bin
```

Reviewed By: yipjustin

Differential Revision: D49402181

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109851
Approved by: https://github.com/yipjustin
2023-09-22 20:34:38 +00:00
1df14f1bf8 Move has_triton to top level triton utils so that dynamo can also access (#109832)
it without creating cyclic dependencies

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109832
Approved by: https://github.com/zou3519
2023-09-22 19:33:41 +00:00
4b0281b32c [BE][foreach] name tests correctly. noncontiguous inputs != fastpath (#109771)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109771
Approved by: https://github.com/soulitzer
2023-09-22 19:16:14 +00:00
92de1d3222 [C10D] Push store scalability a bit further. (#109217)
This is a bunch of small changes to improve store scalability:

- stagger client connections to avoid a stampede.
- warn if somaxconn is too small.
- increase the backlog to 16k.

Differential Revision: [D49238587](https://our.internmc.facebook.com/intern/diff/D49238587)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109217
Approved by: https://github.com/XilunWu
2023-09-22 17:23:46 +00:00
c27c56a5c4 [inductor] Add back a missing header include (#109845)
Summary: It was removed in https://github.com/pytorch/pytorch/pull/109678, which regressed GoogleFnet in HF.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109845
Approved by: https://github.com/angelayi, https://github.com/chenyang78
2023-09-22 17:06:06 +00:00
d0c8e8240d Revert "When doing typed typecheck, also check signature with symint removed (#109727)"
This reverts commit 56ef200c2dc8a1f1e269861b7a6e02e99d3b56a1.

Reverted https://github.com/pytorch/pytorch/pull/109727 on behalf of https://github.com/ezyang due to yolov3 problem ([comment](https://github.com/pytorch/pytorch/pull/109727#issuecomment-1731585002))
2023-09-22 15:11:27 +00:00
629a628cc8 Revert "Update AOTAutograd to use FunctionalTensorMode instead of C++ functionalization (#106406)"
This reverts commit b5d6e831a9ecbd5b8c126cace5ea8567156365c8.

Reverted https://github.com/pytorch/pytorch/pull/106406 on behalf of https://github.com/malfet due to Broke lots of tests on trunk ([comment](https://github.com/pytorch/pytorch/pull/106406#issuecomment-1731524917))
2023-09-22 14:32:34 +00:00
2512017814 Fix for out of bounds read in torch mobile flatbuffer loader (#108439)
Remove the redundant (and unsafe) `mobile::serialization::ModuleBufferHasIdentifier(data)`, as `mobile::serialization::VerifyModuleBuffer(verifier)` validates the same thing but in a boundary-check-safe manner.

Test Plan: Out of bounds read crash no longer reproduces

Differential Revision: D48914114

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108439
Approved by: https://github.com/manuelcandales, https://github.com/malfet
2023-09-22 14:26:33 +00:00
93ce6df931 Fix torch.utils.benchmark API while use privateuse1. (#108548)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108548
Approved by: https://github.com/aaronenyeshi
2023-09-22 14:24:18 +00:00
f092eecc92 Handle C++ exceptions raised during finfo/iinfo calls (#109743)
Partially fixes https://github.com/pytorch/pytorch/issues/109737
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109743
Approved by: https://github.com/albanD
ghstack dependencies: #109744
2023-09-22 14:17:58 +00:00
d7dfa91e12 [inductor] Refactor some libtorch c shim interfaces (#109834)
Summary: Move the returned values to the end of the parameter list, because 1) it is more consistent with the AOTInductor runtime API convention; 2) since out-variant ops have the out tensor at the beginning of the parameter list, this makes the return values easier to distinguish from it

Test Plan:
```
buck test mode/opt caffe2/torch/fb/model_transform/experimental/benchmark/test/aotinductor:test_aot_inductor_benchmark
```

Differential Revision: D49522928

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109834
Approved by: https://github.com/chenyang78
2023-09-22 12:45:23 +00:00
098d62d278 Add global_step parameter to SummaryWriter.add_hparams (#109572)
Fixes #37738 where all hparam metrics can only be plotted at step 0. This is basically just a resubmission of #50653.
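A sketch of the new parameter (run_name is passed here so repeated calls land in one run; exact usage may differ):
```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()
for step, acc in enumerate((0.50, 0.72, 0.81)):
    # global_step lets the metric render as a curve instead of a single
    # point at step 0.
    writer.add_hparams({"lr": 0.1}, {"hparam/accuracy": acc},
                       run_name="trial-0", global_step=step)
writer.close()
```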

Before/after screenshots (hparam metrics plotted at multiple steps instead of only at step 0) are in the PR description.

@ngimel @J0Nreynolds @ezyang @albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109572
Approved by: https://github.com/ezyang
2023-09-22 12:37:01 +00:00
b4ede53776 Use constrain_range_as_size for nonzero/repeat_interleave (#109857)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109857
Approved by: https://github.com/tugsbayasgalan
2023-09-22 12:14:46 +00:00
56ef200c2d When doing typed typecheck, also check signature with symint removed (#109727)
See the test case for what we didn't catch (SymInt vs const SymInt&
mismatch).

It's necessary to test for both, because we will fall back to the
non-SymInt signature if there is no SymInt unboxed kernel available.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109727
Approved by: https://github.com/zou3519
2023-09-22 12:12:10 +00:00
24ba4b7059 [dynamo][__torch_function__ 1/n] Add getset descriptor and __get__ vars (#109542)
Adds the MethodWrapperVariable and GetSetDescriptor variable types. These are used in `__torch_function__` tracing to represent attribute reads (`__get__`) and for comparing unbound methods (the `func` argument when `__torch_function__` is dispatched from a method call).

towards tracing for https://github.com/pytorch/pytorch/issues/93723

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109542
Approved by: https://github.com/jansel
2023-09-22 10:39:15 +00:00
d7c05bb2e8 [ONNX] Remove the depreacated function _export (#109763)
The `_export` API was deprecated and should be removed after 2.0.

See: https://github.com/pytorch/pytorch/pull/107208

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109763
Approved by: https://github.com/thiagocrepaldi
2023-09-22 07:14:13 +00:00
b5d6e831a9 Update AOTAutograd to use FunctionalTensorMode instead of C++ functionalization (#106406)
Now that FunctionalTensor and `FunctionalTensorMode` are lower down in this stack, the changes in this PR are more mechanical: Everywhere in AOTAutograd that I used to use the C++ functionalization API, I now use the python functionalization API.

Note that this doesn't actually cause functionalization to run underneath torch_dispatch. I'm saving that re-ordering for later in the stack.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106406
Approved by: https://github.com/ezyang
ghstack dependencies: #108654, #109662, #109632, #109023
2023-09-22 07:09:04 +00:00
63526a63f5 Make FunctionalTensor subclass to be more like functorch (interaction with ZeroTensor + Conjugate key) (#109023)
I added some tests for Conj, Neg and ZeroTensor for both python and C++ functionalization. This also fixes a nasty segfault when running a functorch `jacfwd` test with `torch.compile`, once AOTAutograd is using `FunctionalTensor`.

Changes:

(1) I use Jeffrey's `make_wrapper_subclass(extra_dispatch_keys)` kwarg to plumb extra dispatch keys onto the wrapper, mirroring what C++ functionalization does (C++ functionalization will mirror all dispatch keys from the inner tensor to the wrapper, except for python and functorch keys).

(2) FunctionalTensorMode will decompose CompositeImplicitAutograd ops, since (for example) ZeroTensor kernels can send ops like `.to()` directly to the Python key. We'll need a way to toggle this later for pre-dispatch functionalization

(3) Bound `_ForceDispatchKeyGuard` and BatchedTensorImpl's dispatch keyset to python

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109023
Approved by: https://github.com/zou3519
ghstack dependencies: #108654, #109662, #109632
2023-09-22 07:09:04 +00:00
7a21e960c6 fix infinite loop with primtorch and .to(meta) (#109632)
Fixes https://github.com/pytorch/pytorch/issues/103532, which I needed in order to more easily/properly test that python functionalization is at parity with C++ functionalization for conj/neg.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109632
Approved by: https://github.com/ezyang
ghstack dependencies: #108654, #109662
2023-09-22 07:09:04 +00:00
46b0b7bff7 _return_and_correct_aliasing: fix for schemas with mutable tensor in kwargs (#109662)
I missed a few tests the first time around - this fixes out= op handling for `_return_and_correct_aliasing`, which failed a few tests in the python functionalization <> AOTAutograd PR above.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109662
Approved by: https://github.com/ezyang
ghstack dependencies: #108654
2023-09-22 07:09:04 +00:00
dae9aa8925 fix subclass custom sizes dynamic shapes caching (#108654)
This PR fixes the ownership/lifetime handling for tensor subclasses that override sizes/strides, when tensors get resized.

This is needed now, because `FunctionalTensor` is a subclass that has a custom size/stride (so it can plumb requests to its inner tensor), and is also a core piece of infra (it's used during tracing in AOTAutograd, which means that metadata mutation and resizing that happens to work with torch.compile today needs to work with FunctionalTensor).

After a bunch of discussion with @ezyang and @soulitzer, I updated `PyInterpreter::sym_sizes()` (and friends) so that:
(1) They allocate a py::capsule buffer and stash it on the tensor on the first call to size/stride
(2) On a size/stride call where we notice that the number of **dimensions** on the tensor has changed (so our buffer is stale), we re-allocate the buffer
(3) On a size/stride call where we notice that the number of dimensions is the same, but the values are different (this happens whenever a tensor experiences a metadata mutation, like `.transpose_()`), we inplace-modify the buffer and put the new ints/symints into it

I also ended up doing the SmallVector optimization, which was required to fix some tests in AOTAutograd. Ideally we should look into those tests, and nail down the parts of our codebase that rely on SmallVector not re-allocating on a resize... but I'm saving this for a followup.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108654
Approved by: https://github.com/ezyang
2023-09-22 07:09:04 +00:00
ebc7039bcb New export API with dynamic shape specifications instead of constraints (#108448)
Our experience using `constraints` / `dynamic_dim` with the existing export API has found it to be (subjectively) clunky and (objectively) verbose in common cases.

This PR implements a new design for the export API that replaces the use of `constraints` / `dynamic_dim` with a new way of specifying dynamic shapes, involving the following concepts:
* a constructor `Dim` for first-class named dynamic dimensions with ranges (similar to `functorch.dim`, and analogous to internal symbolic sizes)
* a mechanism that uses the above in `export` calls to associate inputs to their dynamic shape specifications (`dynamic_shapes`)

Design doc: https://docs.google.com/presentation/d/168U7XK72C_WSsZpGESP6Cho9udh193fi0gfjxCNcJ4E/edit#slide=id.p (Meta-only). Note that we only implement Option 1 in that doc. An older version of this PR also implemented Option 3, which is an alternative way of specifying dynamic shapes using tensor type annotations on the exported callable; but we have moved that to future work for now.

See docs for these new features in `torch.export`. The existing `torch.export.export` is modified to use the new API, `torch._export.export__RC__`, whenever `constraints=None`. We have not deprecated the existing API yet, but will do so in a follow-up.

Constraint violation errors arising through use of the new API will now contain suggested fixes using the new API. No longer do we need to report all specializations for static dimensions and suggest all constraints over dynamic dimensions to fix such errors. Instead, due to the redesign, the suggested fixes are much more concise, only involving modifying the definitions of relevant `Dim`s.
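
For illustration, a minimal sketch in the new style (the exact entry points were still stabilizing at the time of this PR, so treat the call below as indicative rather than authoritative):

```python
import torch
from torch.export import Dim, export  # Dim: first-class named dynamic dimension

class M(torch.nn.Module):
    def forward(self, x):
        return x * 2

batch = Dim("batch", min=1, max=1024)            # named dimension with a range
ep = export(M(), (torch.randn(4, 8),),
            dynamic_shapes={"x": {0: batch}})    # dim 0 of input x is dynamic
```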

Differential Revision: [D48919204](https://our.internmc.facebook.com/intern/diff/D48919204/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108448
Approved by: https://github.com/suo, https://github.com/gmagogsfm
2023-09-22 06:58:26 +00:00
cyy
cd99cdc3af fix std::move warnings from gcc (#105780)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105780
Approved by: https://github.com/Skylion007
2023-09-22 05:55:21 +00:00
4ff294522a [Inductor] Extend Pattern Matcher to Match Equivalent Function Invocation (#107832)
Fixes #104391

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107832
Approved by: https://github.com/jansel
2023-09-22 05:26:08 +00:00
8124a6c40c [TORCH_LIBRARY] Add impl_abstract_pystub (#109529)
We want users to be able to define custom ops in C++ but put the
abstract impl in Python (since it is easier to write them in Python and
the abstract impl better models device semantics and data-dependent
operators).

`m.impl_abstract_pystub(opname, python_module, context)` declares the
abstract_impl of the operator to exist in the given python module.
When the abstract_impl needs to be accessed (either via FakeTensor or
Meta), and it does not exist, the PyTorch Dispatcher will yell
with a descriptive error message.
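
As a rough sketch of the Python side this points at (the toy op below is an illustrative stand-in for one declared via TORCH_LIBRARY in C++; it is not code from this PR):

```python
import torch

# Stand-in for an operator that would really be defined in C++.
lib = torch.library.Library("mylib", "DEF")
lib.define("my_op(Tensor x) -> Tensor")

@torch.library.impl_abstract("mylib::my_op")  # would live in the declared pystub module
def my_op_abstract(x):
    # Abstract/meta impl: propagate shape and dtype without touching data.
    return x.new_empty(x.shape)
```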

Some details:
- We construct a new global AbstractImplPyStub mapping in
  Dispatcher.cpp. Read/write to this map is protected by the Dispatcher
  lock.
- We add a new Meta Tensor fallback kernel. The fallback errors out if there is
  no meta kernel, but also offers a nicer error message if we see that there is
  a pystub.
- We create a `torch._utils_internal.throw_abstract_impl_not_imported_error`
  helper function to throw errors. This way, we can throw different error
  messages in OSS PyTorch vs internal PyTorch. To invoke this from C++, we
  added a PyInterpreter::throw_abstract_impl_not_imported_error.

Differential Revision: [D49464753](https://our.internmc.facebook.com/intern/diff/D49464753/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109529
Approved by: https://github.com/ezyang, https://github.com/bdhirsh
2023-09-22 04:55:36 +00:00
3268b039ec Handle unbacked symints in Triton size hints (#109609)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109609
Approved by: https://github.com/yf225
2023-09-22 03:16:53 +00:00
abd9b763ca [RFC] Add debug log as we lower each FX node (#109602)
I found this useful for orienting myself when I threw an error
mid-lowering.  What do other people think?

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109602
Approved by: https://github.com/malfet, https://github.com/voznesenskym
2023-09-22 03:10:22 +00:00
e1d71231e2 [Pytorch][Vulkan] Add bmm op (#109360)
Summary:
BMM is developed on top of the MM methodology; the main difference is that the 1st input matrix changed from a standard Vulkan 2D tensor to a Vulkan 3D tensor, so the indexing is quite different. The matrices of a batch are appended on the z-dimension, with a channel of size 4 (texel).

The 2nd input matrix remains the same as a packed format (fit a `H*W` matrix into a `H/2 * W/2 * 1` 3D image texture by utilizing all 4 values in the texel), but appends more matrices of the batch on the z-dimension (which has only 1 element in the case of MM).

**Vulkan 2D Basic (1st input of MM & output):**
```
ivec3 pos(j, i, 0);
float v = texelFetch(uInput, pos, 0)[0];
# no batch
```

**Vulkan 3D Basic (1st input of BMM & output):**
```
ivec3 pos(k, j, i/4);
float v = texelFetch(uInput, pos, 0)[i % 4];
# i as batch id
```

**Packed weights (2nd input of MM):**
```
ivec3 pos(k_, j_, 0);
float v = texelFetch(uInput, pos, 0)
# v.xyzw are 4 numbers in one matrix
# no batch
# k_, j_ has only 1/4 of the range as the original matrix size (H*W matrix i=> H/2*W/2*1 3D Image).
```

**Packed weights (2nd input of BMM):**
```
ivec3 pos(k_, j_, i);
float v = texelFetch(uInput, pos, 0)
# v.xyzw are 4 numbers in one matrix
# i as batch id
```

Based on the different indexing of MM & BMM, I modified the MM methodology to produce the desired output image.

Test Plan:
```
[ttingchulin@27298.od /data/sandcastle/boxes/fbsource (bmm)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run fbcode/mode/dev-nosan //xplat/caffe2:pt_vulkan_api_test_bin -- --gtest_filter="*<test>*" eg.  -- --gtest_filter="*mm*"
Building: finished in 0.1 sec (100%) 328/3361 jobs, 0/3361 updated
  Total time: 0.1 sec
BUILD SUCCEEDED
Running main() from xplat/third-party/gmock/googletest-1.12.1/googletest/src/gtest_main.cc
Note: Google Test filter = *mm*
[==========] Running 8 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 8 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.addmm
[       OK ] VulkanAPITest.addmm (125 ms)
[ RUN      ] VulkanAPITest.addmm_expand
[       OK ] VulkanAPITest.addmm_expand (76 ms)
[ RUN      ] VulkanAPITest.addmm_expand2
[       OK ] VulkanAPITest.addmm_expand2 (0 ms)
[ RUN      ] VulkanAPITest.bmm
[       OK ] VulkanAPITest.bmm (152 ms)
[ RUN      ] VulkanAPITest.bmm_large
[       OK ] VulkanAPITest.bmm_large (4818 ms)
[ RUN      ] VulkanAPITest.bmm_small
[       OK ] VulkanAPITest.bmm_small (4 ms)
[ RUN      ] VulkanAPITest.bmm_one
[       OK ] VulkanAPITest.bmm_one (0 ms)
[ RUN      ] VulkanAPITest.mm
[       OK ] VulkanAPITest.mm (55 ms)
[----------] 8 tests from VulkanAPITest (5233 ms total)

[----------] Global test environment tear-down
[==========] 8 tests from 1 test suite ran. (5233 ms total)
[  PASSED  ] 8 tests.

```

Differential Revision: D49306279

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109360
Approved by: https://github.com/yipjustin
2023-09-22 02:52:45 +00:00
8856c1628e [inductor] Change AOTInductor to return output tensors (#109790)
Summary:
Change AOTInductor to directly return output tensors instead of taking pre-allocated output tensors to return the results. This gives several benefits:

* It makes sure AOTInductor has the same behavior when managing the output tensors as the default Inductor, which is widely tested and thus more reliable.
* As we have debugged before, there are cases where we still have to codegen extra copy_ ops to fill the pre-allocated output tensors, which doesn't make sense for performance.
* With the coming enhanced memory planning, this again will make sure the memory planning logic is the same between AOTInductor and Inductor, which will greatly simplify the problem and improve the reliability.

This change also combines D49494954 from Yang and https://github.com/pytorch/pytorch/pull/109560 from Angela.

Differential Revision: D49502318

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109790
Approved by: https://github.com/chenyang78
2023-09-22 02:31:52 +00:00
d43f9f7707 Add redirect links to the contributor wiki (#106863)
* Update contribution guide links to the wiki page

---------

Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
2023-09-21 22:01:20 -04:00
8dcdc74915 torch->onnx export support: quantized::linear_relu (#109755)
- Adds support for quantized::linear_relu
  - Adds weight unpacking pattern matcher
  - Adds to export for opset 10 and 13.
- Adds QAT test modeled after conv2d+relu fusion test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109755
Approved by: https://github.com/BowenBao, https://github.com/thiagocrepaldi
2023-09-21 23:24:20 +00:00
175ccfc4c8 Verify flatbuffer module fields are initialized (#109794)
Fixes #109793

Add validation on flatbuffer module fields to prevent segfault

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109794
Approved by: https://github.com/malfet
2023-09-21 23:19:17 +00:00
d65e067baa Updates to attn_mask handling in mem_eff (#109620)
# Summary
Align internal changes to what is in xformers: a67cd57531

We have actually already removed the bias 4d view, so this, in theory, is a no-op and really just increases the safety checks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109620
Approved by: https://github.com/cpuhrsch
2023-09-21 22:40:58 +00:00
b5fde4c382 Revert "[Reland] Remove calls of c10::either (#109708)"
This reverts commit 0735f6c0d5857d9ae7893d23c5a4b53bdf887967.

Reverted https://github.com/pytorch/pytorch/pull/109708 on behalf of https://github.com/atalman due to Broke windows periodic tests ([comment](https://github.com/pytorch/pytorch/pull/109708#issuecomment-1730356321))
2023-09-21 22:04:25 +00:00
255d1a776a [MPS] Add support for Mish to MPS backend (#109786)
Fixes [#77764](https://github.com/pytorch/pytorch/issues/77764#issuecomment-1712894444)

Adds the mish activation function to the mps backend.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109786
Approved by: https://github.com/kulinseth
2023-09-21 21:01:20 +00:00
f7ddc54503 [aotinductor] Update performance benchmark code (109560) (#109820)
Summary: Same as #109560, made a new PR because we need to land from internal

Previously during performance benchmark testing, we would create an AOTInductorModelContainerHandle every time the compiled function is run with new inputs. However after https://github.com/pytorch/pytorch/pull/108473 we now load the constants needed in the runtime when initializing the AOTInductorModelContainerHandle. This resulted in our benchmarks displaying a ~0.4x speedup.

This diff moves the initialization of AOTInductorModelContainerHandle outside of the code where we run the compiled function with different inputs.

For example,
```
python benchmarks/dynamo/huggingface.py --performance --cold-start-latency --inference --bfloat16 --export-aot-inductor --disable-cudagraphs --device cuda --total-partitions 3 --partition-id 0 --only AlbertForMaskedLM
```
results in `1.359x` speedup.

Specifically, this adds `create_container_handle` and `delete_container_handle` functions which need to be called before and after `run`. We call `create_container_handle` to initialize the AOTInductorModelContainerHandle, call `run` to run the compiled .so with different inputs, and then `delete_container_handle` to delete it.
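
In sketch form, the intended call pattern is (function signatures are assumptions based on the description above, not the exact benchmark-harness API):

```python
# Handle creation is hoisted out of the timed region so constant loading
# (which happens at handle init since #108473) is not counted in the benchmark.
handle = create_container_handle(so_path)       # initialize once; loads constants
try:
    for example_inputs in all_inputs:
        outputs = run(handle, example_inputs)   # only this part is benchmarked
finally:
    delete_container_handle(handle)             # tear down once
```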

[Updated dashboard results](https://hud.pytorch.org/benchmark/compilers?startTime=Wed%2C%2013%20Sep%202023%2021%3A03%3A55%20GMT&stopTime=Wed%2C%2020%20Sep%202023%2021%3A03%3A55%20GMT&granularity=hour&suite=torchbench&mode=inference&dtype=bfloat16&lBranch=angelayi/aot_inductor_benchmark&lCommit=f9aa49c4c9a1a140b6f0c4520d1d6d99b57e12fa&rBranch=main&rCommit=015be4cedba357eb931e24bf188479235db7c5c8)

Test Plan: CI

Differential Revision: D49513934

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109820
Approved by: https://github.com/desertfire
2023-09-21 20:49:41 +00:00
8dedc9dd9b Add meta tests for layer/group/batch norm backward (#109591)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109591
Approved by: https://github.com/ezyang
2023-09-21 18:58:51 +00:00
83b4aab5bc Allow zero sized tensors to be resized with meta_randperm (#109721)
Failure will be handled by `_maybe_resize_out`

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109721
Approved by: https://github.com/ezyang
2023-09-21 18:41:29 +00:00
8207118d55 MAINT/TST: pytorch-ify test_linalg, vendored from NumPy (#109775)
1. Inherit from TestCase
2. Use pytorch parametrization
3. Use unittest.expectedFailure to mark xfails, also unittest skips

All this to make pytest-less invocation work:

$ python test/torch_np/test_basic.py

cross-ref https://github.com/pytorch/pytorch/pull/109593, https://github.com/pytorch/pytorch/pull/109718

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109775
Approved by: https://github.com/ezyang
2023-09-21 18:36:19 +00:00
cyy
e9e93c5350 [Reland] Move torch::make_unique to std::make_unique (#109780)
We can first try to move torch::make_unique to std::make_unique despite the revert of #108866.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109780
Approved by: https://github.com/ezyang
2023-09-21 18:30:21 +00:00
c6b9481c15 Update type hint for Tensor.__getitem__. (#109531)
Better type-hint that's similar in spirit to `numpy.ndarray.__getitem__`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109531
Approved by: https://github.com/ezyang
2023-09-21 18:19:38 +00:00
b1f1b39feb Revert "Add PR number to metrics when available (#109406)"
This reverts commit 5e19216a6e0e6ee322b7416f9a793a51b1ff8c82.

Reverted https://github.com/pytorch/pytorch/pull/109406 on behalf of https://github.com/atalman due to breaks lint ([comment](https://github.com/pytorch/pytorch/pull/109406#issuecomment-1730049340))
2023-09-21 17:59:12 +00:00
09622d8d49 Allow inferring size-nature from sizes passed to empty constructor (#109720)
This removes the need for many constrain_as_size calls as we now
infer them from error checking for sizes.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109720
Approved by: https://github.com/aakhundov
2023-09-21 17:57:40 +00:00
6ca964b410 Remove torchtext from Build Official Docker images (#109799)
Fixes nightly official Docker image build.
Failures: https://hud.pytorch.org/hud/pytorch/pytorch/nightly/1?per_page=50&name_filter=Build%20Official

### 🤖 Generated by Copilot at 8671bfc

Remove `torchtext` installation from `Dockerfile` for arm64. This fixes the arm64 build of the PyTorch Docker image.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109799
Approved by: https://github.com/seemethere
2023-09-21 17:07:45 +00:00
0351e2042b Avoid throwing exception in ClosingTHPObjectPtr (#109758)
Previously, if ClosingTHPObjectPtr was destructed because we
were unwinding the stack from an exception, we would attempt to call
close() which just isn't going to work.  Two fixes:

1. Detect if we're unwinding due to a Python error, and don't try
   to do more Python stuff if so.

2. If close() fails somehow, write an unraisable exception; don't
   try to throw, because throwing while already unwinding an exception
   will terminate the process.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109758
Approved by: https://github.com/jansel
2023-09-21 17:04:14 +00:00
2cd0b94533 Hide __getattr__ from type checkers (#109683)
Visibility of this causes type checkers to conservatively assume that all attributes are defined on the torch module.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109683
Approved by: https://github.com/ngimel, https://github.com/ezyang, https://github.com/malfet
2023-09-21 17:01:23 +00:00
ef8d461b09 Fix torchbench --multiprocess (#109657)
`python benchmarks/dynamo/torchbench.py --multiprocess` currently fails due to initializing distributed multiple times:

```
torch.distributed.DistNetworkError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:6789 (errno: 98 - Address already in use). The server socket has failed to bind to 0.0.0.0:6789
 (errno: 98 - Address already in use).
```

Because torchbench calls itself via mp.spawn, there is a parent run (with `--multiprocess`) and child runs (with `--multiprocess --only <model>`).

This PR addresses this by fixing two issues:
1) distributed is initialized once in the parent run and once in the child runs; it should be initialized only in the child runs, where we have accurate rank and world size info
2) torchbench overrides CUDA_VISIBLE_DEVICES/world_size sometimes, but it shouldn't for distributed use cases where we want to use all available GPUs
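
A sketch of the shape of fix 1 (names and the port are illustrative): initialize distributed only in the spawned children, where rank and world size are known:

```python
import os
import torch.distributed as dist

def child_main(rank: int, world_size: int):
    # Only the children initialize the process group; the parent never does,
    # so there is no double bind on the rendezvous port.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "6789")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    # ... run the model under test ...
    dist.destroy_process_group()
```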

I am also adding a CI test to cover this type of issue in #109311

### Test plan
parent run test: `python benchmarks/dynamo/torchbench.py --ci --accuracy --timing --explain --inductor --device cuda --inference --bfloat16 --output /home/xmfan/local/pytorch/test/test-reports/inference_torchbench.csv --multiprocess`
child run test: `python benchmarks/dynamo/torchbench.py --ci --accuracy --timing --explain --inductor --device cuda --inference --bfloat16 --output /home/xmfan/local/pytorch/test/test-reports/inference_torchbench.csv --multiprocess --only simple_gpt`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109657
Approved by: https://github.com/H-Huang
2023-09-21 16:53:07 +00:00
cyy
ba0362a09e Remove unused build system checks and definitions (#109711)
Remove some outdated checks.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109711
Approved by: https://github.com/ezyang
2023-09-21 16:52:16 +00:00
5e19216a6e Add PR number to metrics when available (#109406)
### 🤖 Generated by Copilot at 780bfa6

Add a new metric for pull request number in `tools/stats/upload_metrics.py`. This allows tracking the CI performance of pull requests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109406
Approved by: https://github.com/kit1980, https://github.com/malfet, https://github.com/clee2000
2023-09-21 16:47:05 +00:00
6b7b9c796e Fix registering jit decompositions for jvp for out wrapped decomps (#109367)
Python decompositions wrapped by `out_wrapper` need to be unwrapped before compiling with TorchScript since:
- `out_wrapper` extends the decomposition's signature with an `out` parameter; however, this `out` parameter is not present in the source code of the original decomposition, so the resulting `ScriptFunction` will not have an `out` parameter
- `out_wrapper` is in the `torch._prims_common.wrappers` module, so its `globals()` are different from the globals of the decomposition being wrapped. This may cause symbol resolution to fail with the TorchScript compiler, since it is compiling the unwrapped decomp's source code rather than the wrapper

The python decomposition for `aten.trace` is wrapped as an example; other decompositions are to be fixed in https://github.com/pytorch/pytorch/pull/107707
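
A self-contained illustration of the signature mismatch (toy stand-ins, not the real `torch._prims_common.wrappers` code):

```python
import functools
import inspect

def out_wrapper(fn):                      # toy version of the real wrapper
    @functools.wraps(fn)
    def wrapper(*args, out=None, **kwargs):
        result = fn(*args, **kwargs)
        return result if out is None else out.copy_(result)
    return wrapper

@out_wrapper
def trace_decomp(x):
    return x.sum()

# A compiler that unwraps the decorator (as inspect does via __wrapped__)
# sees the original source, whose signature has no `out` parameter:
print(inspect.signature(trace_decomp))   # (x) -- the added out kwarg is invisible
```
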
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109367
Approved by: https://github.com/lezcano
2023-09-21 16:36:51 +00:00
406b8412c2 Revert "[inductor] Use _unsafe_view decompostion (#109669)"
This reverts commit 90a2026cd12065994eb234e8c5f332143d9d9468.

Reverted https://github.com/pytorch/pytorch/pull/109669 on behalf of https://github.com/clee2000 due to failing internally ([comment](https://github.com/pytorch/pytorch/pull/109669#issuecomment-1729906056))
2023-09-21 16:25:00 +00:00
3f3e353885 torch.compile + selective activation checkpointing (#105489)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105489

NOTE: this PR is tagged "not user facing", because it's not ready to be announced externally yet.

This PR implements torch.compile + selective activation checkpoint (SAC) integration, by using `TagActivationCheckpoint` (same backend as torch.compile + full activation checkpoint integration).

TorchDispatchMode based implementation cannot support including inplace ops in the checkpointed region at the moment (the reason for this needs investigation), and there is also no way to ban them (because TorchDispatchMode now only sees "after-functionalization" ops, so can't detect if an op is in-place). Hence we hide torch.compile + SAC behind a flag (`torch._dynamo.config._experimental_support_context_fn_in_torch_utils_checkpoint`) and will only use it internally for cases that are known to not have in-place ops. This state won't last too long, because in-place op will at least be able to be detected after Brian's mode reordering and related functionalization changes.
So next steps after this PR:
1. Wait for Brian's mode reordering and related functionalization changes to land, and then try to enable the "inplace ops" unit test for torch.compile + selective activation checkpoint (if it doesn't work, investigate why).
2. Unify selective- and full-checkpoint under TorchDispatchMode based implementation.

Differential Revision: D47497145

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105489
Approved by: https://github.com/anijain2305
2023-09-21 16:24:11 +00:00
a5145364d9 [FSDP] Fix _use_dtensor not automatically turn on for model state dict when using DeviceMesh (#109767)
Fixes #109648

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109767
Approved by: https://github.com/fegin
2023-09-21 15:15:45 +00:00
62555930a0 [inductor] Enable mypy checking for codegen/triton_foreach (#109643)
Summary: Add enough typehints to enable mypy checking for codegen/triton_foreach. Also fixed a bug in a dtype param.

Test Plan:
* `python test/inductor/test_foreach.py`
* `lintrunner`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109643
Approved by: https://github.com/mlazos
ghstack dependencies: #109146
2023-09-21 14:30:00 +00:00
4eada253e1 [inductor] Set CUDA_VISIBLE_DEVICES for multi-device subprocess autotuning (#109500)
Summary: The current parallel autotune implementation sets the CUDA_VISIBLE_DEVICES env var too late -- after the benchmarking subprocess has started -- and the torch libraries don't recognize the change. Since the multiprocessing library doesn't support providing an environment for the subprocess, temporarily set CUDA_VISIBLE_DEVICES in the parent process so that the change is inherited by the subprocess.
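
The fix's shape, sketched (names are illustrative; the real change lives in the inductor autotuner):

```python
import os
import multiprocessing as mp

def benchmark_choice(choice):
    # Hypothetical subprocess entry point: CUDA initializes here and therefore
    # sees the CUDA_VISIBLE_DEVICES value inherited from the parent.
    ...

def benchmark_on_device(choice, device: int):
    saved = os.environ.get("CUDA_VISIBLE_DEVICES")
    os.environ["CUDA_VISIBLE_DEVICES"] = str(device)   # set BEFORE spawning
    try:
        p = mp.get_context("spawn").Process(target=benchmark_choice, args=(choice,))
        p.start()
        p.join()
    finally:                                           # restore the parent's env
        if saved is None:
            os.environ.pop("CUDA_VISIBLE_DEVICES", None)
        else:
            os.environ["CUDA_VISIBLE_DEVICES"] = saved
```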

Test Plan:
* New unit test to verify the env var is set in the sub-process and fail the benchmark if it's not.
* Ran multiprocess autotuning and looked at the output from `nvidia-smi pmon` to make sure that all GPUs were assigned processes.

Snippet:
```
    1    3442314     C     2     1     -     -   python
    2    3442318     C     2     1     -     -   python
    3    3442320     C     8     2     -     -   python
    4    3442323     C     9     4     -     -   python
    5    3442325     C    10     4     -     -   python
    6    3442327     C    10     4     -     -   python
    7    3442329     C     2     0     -     -   python
    0    3434906     C     0     0     -     -   python
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109500
Approved by: https://github.com/eellison, https://github.com/shunting314
2023-09-21 14:29:30 +00:00
169ae7540d Revert "Handle unbacked symints in Triton size hints (#109609)"
This reverts commit 654731a52b6bbe0b12f7c5aaac005f8a08c6816f.

Reverted https://github.com/pytorch/pytorch/pull/109609 on behalf of https://github.com/ezyang due to this seems to regress HF perf ([comment](https://github.com/pytorch/pytorch/pull/109609#issuecomment-1729688883))
2023-09-21 14:25:42 +00:00
ac967e9dad [export] Fix tree spec matching behavior. (#109679)
Summary:

Test Plan:
Internal test.
Reviewers:

Subscribers:

Tasks:

Tags:

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109679
Approved by: https://github.com/angelayi, https://github.com/tugsbayasgalan
2023-09-21 14:24:09 +00:00
d38379f9f1 Update dynamic shapes documentation (#109764)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109764
Approved by: https://github.com/gchanan
2023-09-21 13:53:43 +00:00
86a9534165 Upgrade nightly wheels to rocm5.7 (#109571)
Follow-up to https://github.com/pytorch/builder/pull/1541

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109571
Approved by: https://github.com/ezyang
2023-09-21 13:41:23 +00:00
600d0d0284 Add "cuda" to MPI backend capabilities (#109614)
Summary: Fixes https://github.com/pytorch/pytorch/issues/109543

Test Plan: We need to run CUDA aware MPI in PyTorch to actually test this change, we currently have no MPI tests.

Differential Revision: D49420438

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109614
Approved by: https://github.com/XilunWu
2023-09-21 13:34:58 +00:00
b91ba226ce Don't use cpuinfo on s390x (#109496)
It doesn't support s390x and just crashes pytorch on init.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109496
Approved by: https://github.com/huydhn
2023-09-21 12:20:49 +00:00
772e104dfd [inductor] visualize fused ops in svg graph (#107752)
example usage
* `TORCH_COMPILE_DEBUG=1 INDUCTOR_ORIG_FX_SVG=1 INDUCTOR_POST_FUSION_SVG=1 python trig.py`: show the original fx node name, file, and code; see snapshot 2, where we have origin_0, 1, 2
* trig.py can be found in P816304818

Implementation
* keep the original fx graph in GraphLowering: ```self.orig_gm: torch.fx.GraphModule = gm.__copy__()```
* draw the original fx graph with origins in `ir_post_fusion`: ```V.debug.draw_orig_fx_graph(self.orig_gm, self.scheduler.nodes)```; node.meta["buff_meta"] tracks buf_name

<img width="350" alt="Screenshot 2023-08-29 at 12 40 24 PM" src="https://github.com/pytorch/pytorch/assets/134637289/c4e197cb-ab3b-4a09-a584-c1356376accb">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107752
Approved by: https://github.com/mlazos
2023-09-21 08:03:05 +00:00
cyy
f5b753bab1 Fix inline_container_test on Windows (#109754)
Fix the failure mentioned in https://github.com/pytorch/pytorch/pull/109393. The reason is that IO streams were not opened in binary mode while binary data was written and read. Interestingly, the test passed on Linux.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109754
Approved by: https://github.com/malfet
2023-09-21 07:46:25 +00:00
b780b246eb Use a reduction implementation for unique when dtype is bool on CPU (#109695)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109695
Approved by: https://github.com/lezcano
2023-09-21 06:56:10 +00:00
cddd0db241 Add finfo properties for float8 dtypes (#109744)
Add float8 finfo checks to `test_type_info.py`
Fixes https://github.com/pytorch/pytorch/issues/109737
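Quick usage illustration (the exact values depend on the float8 format):

```python
import torch

fi = torch.finfo(torch.float8_e4m3fn)   # works after this PR
print(fi.max, fi.min, fi.eps)
```
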
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109744
Approved by: https://github.com/drisspg
2023-09-21 03:41:48 +00:00
e2e9d15726 Unblock float16 dtype for xla autocasting (#109554)
`torch.autocast` with `xla` backend has been restricted to `torch.bfloat16`. This shouldn't be the case anymore.

This works with `xla::cast( ..., type=f16)`
```
IR {
  %0 = f32[] prim::Constant(), xla_shape=f32[], value=1
  %1 = f32[3,2]{1,0} aten::expand(%0), xla_shape=f32[3,2]{1,0}, size=(3, 2), dynamic_dims=(0, 0)
  %2 = f16[3,2]{1,0} xla::cast(%1), xla_shape=f16[3,2]{1,0}, type=f16, dtype=Half, stype=Float
  %3 = f32[] prim::Constant(), xla_shape=f32[], value=1
  %4 = f32[2,3]{1,0} aten::expand(%3), xla_shape=f32[2,3]{1,0}, size=(2, 3), dynamic_dims=(0, 0)
  %5 = f16[2,3]{1,0} xla::cast(%4), xla_shape=f16[2,3]{1,0}, type=f16, dtype=Half, stype=Float
  %6 = f16[2,2]{1,0} aten::mm(%5, %2), xla_shape=f16[2,2]{1,0}, ROOT=0
}
```

This will allow PyTorch/XLA to extend its autocast implementation to use `xla` backend for `float16` type as well.
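
A hedged usage sketch (requires torch_xla; device setup is illustrative):

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
a = torch.ones(3, 2, device=device)
b = torch.ones(2, 3, device=device)
with torch.autocast(device_type="xla", dtype=torch.float16):
    c = b @ a   # matmul runs in f16 via xla::cast, matching the IR above
```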

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109554
Approved by: https://github.com/JackCaoG, https://github.com/bdhirsh
2023-09-21 03:19:44 +00:00
13bd4ed933 Add docs for torch.compile(numpy) (#109710)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109710
Approved by: https://github.com/ev-br, https://github.com/gchanan, https://github.com/peterbell10
2023-09-21 03:05:21 +00:00
7a04ae6fba [export] Remove redundant no_grad() for exported program execution. (#109686)
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109686
Approved by: https://github.com/angelayi
2023-09-21 01:20:54 +00:00
e4d8ec9fe8 inductor: only do the conv+bn folding for the freezing path (#109587)
Re-enable PR: https://github.com/pytorch/pytorch/pull/109270

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109587
Approved by: https://github.com/eellison
2023-09-21 00:47:37 +00:00
9e2b07ac9d [Inductor] Break the loop fusion when node2 depends on node1 mutations (#109172)
**Summary**
Fix the issue: https://github.com/pytorch/pytorch/issues/108963. After this PR, loop fusion should break when node2 depends on node1's buffer mutation. Take the UT as an example:

- Before this PR, the generated code is:
```
cpp_fused_div_index_add_0 = async_compile.cpp('''
#include "/tmp/torchinductor_root/ib/cibrnuq56cxamjj4krp4zpjvsirbmlolpbnmomodzyd46huzhdw7.h"
extern "C" void kernel(const double* in_ptr0,
                       const long* in_ptr1,
                       const double* in_ptr2,
                       double* out_ptr0,
                       double* out_ptr1)
{
    {
        auto tmp0 = in_ptr0[static_cast<long>(0L)];
        out_ptr0[static_cast<long>(0L)] = tmp0;
    }
    {
        auto tmp0 = in_ptr1[static_cast<long>(0L)];
        auto tmp1 = in_ptr2[static_cast<long>(0L)];
        auto tmp4 = out_ptr0[static_cast<long>(0L)];
        auto tmp2 = static_cast<double>(2.0);
        auto tmp3 = decltype(tmp1)(tmp1 * tmp2);
        auto tmp5 = tmp4 / tmp2;
        atomic_add(&out_ptr0[static_cast<long>(0L)], tmp3);
        out_ptr1[static_cast<long>(0L)] = tmp5;
    }
}
''')
```

- After this PR, the generated code is:
```
cpp_fused_div_index_add_0 = async_compile.cpp('''
#include "/tmp/torchinductor_root/ib/cibrnuq56cxamjj4krp4zpjvsirbmlolpbnmomodzyd46huzhdw7.h"
extern "C" void kernel(const double* in_ptr0,
                       const long* in_ptr1,
                       const double* in_ptr2,
                       double* out_ptr0,
                       double* out_ptr1)
{
    {
        auto tmp0 = in_ptr0[static_cast<long>(0L)];
        out_ptr0[static_cast<long>(0L)] = tmp0;
    }
    {
        auto tmp0 = in_ptr1[static_cast<long>(0L)];
        auto tmp1 = in_ptr2[static_cast<long>(0L)];
        auto tmp2 = static_cast<double>(2.0);
        auto tmp3 = decltype(tmp1)(tmp1 * tmp2);
        atomic_add(&out_ptr0[static_cast<long>(0L)], tmp3);
    }
    {
        auto tmp0 = out_ptr0[static_cast<long>(0L)];
        auto tmp1 = static_cast<double>(2.0);
        auto tmp2 = tmp0 / tmp1;
        out_ptr1[static_cast<long>(0L)] = tmp2;
    }
}
''')
```

**Test Plan**
```
python -u -m pytest -s -v test_torchinductor.py -k test_mutations_loop_fusion
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109172
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-09-21 00:30:51 +00:00
9c2715bbb2 [inductor] Clean up AOTInductor runtime ABI (#109678)
Summary: Change the AOTInductor runtime interface to avoid referring to aten data structures directly, mostly at::Tensor and ProxyExecutor. This is a combination of https://github.com/pytorch/pytorch/pull/109436,  https://github.com/pytorch/pytorch/pull/109498, https://github.com/pytorch/pytorch/pull/109450, https://github.com/pytorch/pytorch/pull/109606, plus a few internal build changes.

Reviewed By: frank-wei

Differential Revision: D49374820

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109678
Approved by: https://github.com/frank-wei, https://github.com/chenyang78
2023-09-21 00:25:24 +00:00
4e3b03217d [BE] Replace 8 with CHAR_BIT (#109740)
Defined in [limits.h](https://en.cppreference.com/w/c/types/limits) as the number of bits per byte

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109740
Approved by: https://github.com/kit1980, https://github.com/ZainRizvi
2023-09-20 23:42:25 +00:00
6e3a7473cf Trace calls with Python Enum values. (#109507)
Fix: #82135
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109507
Approved by: https://github.com/ezyang
2023-09-20 22:18:11 +00:00
55685d57c0 [JIT] Fix typed enum handling in 3.11 (#109717)
In Python-3.11+, typed enums (such as `enum.IntEnum`) retain `__new__`, `__str__`, and other methods of the base class via the `__init_subclass__()` method (see https://docs.python.org/3/whatsnew/3.11.html#enum ), i.e. the following code
```python
import sys
import inspect
from enum import Enum

class IntColor(int, Enum):
    RED = 1
    GREEN = 2

class Color(Enum):
    RED = 1
    GREEN = 2

def get_methods(cls):
    def predicate(m):
        if not inspect.isfunction(m) and not inspect.ismethod(m):
            return False
        return m.__name__ in cls.__dict__
    return inspect.getmembers(cls, predicate=predicate)

if __name__ == "__main__":
    print(sys.version)
    print(f"IntColor methods {get_methods(IntColor)}")
    print(f"Color methods {get_methods(Color)}")
```

Returns an empty list for both cases on older Python, but on Python-3.11+ it returns a list containing enum constructors and other methods:
```shell
% conda run -n py310 python bar.py
3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:41:52) [Clang 15.0.7 ]
IntColor methods []
Color methods []
% conda run -n py311 python bar.py
3.11.0 | packaged by conda-forge | (main, Oct 25 2022, 06:21:25) [Clang 14.0.4 ]
IntColor methods [('__format__', <function Enum.__format__ at 0x105006ac0>), ('__new__', <function Enum.__new__ at 0x105006660>), ('__repr__', <function Enum.__repr__ at 0x1050068e0>)]
Color methods []
```

This change allows typed enums to be scriptable on 3.11 by explicitly marking several `enum.Enum` methods to be dropped by jit script, and adds a test that typed enums are jit-scriptable.

Fixes https://github.com/pytorch/pytorch/issues/108933

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109717
Approved by: https://github.com/atalman, https://github.com/davidberard98
2023-09-20 22:09:41 +00:00
7ce69d5dbe [RELAND] Remove some unnecessary <iostream> includes from headers (#108150)
In almost all cases this is only included for writing the output formatter, which
only uses `std::ostream`, so including `<ostream>` is sufficient.

The istream header is ~1000 lines so the difference is non-trivial.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108150
Approved by: https://github.com/albanD, https://github.com/malfet
ghstack dependencies: #108149
2023-09-20 21:55:15 +00:00
05b3a4dd88 Fix test_libtorch.bat not exiting on error (#109393)
For some weird reason, the batch file gets rid of the `exit /b 1` inside the for loop, so failures never actually get surfaced.  Add skips for the tests that were failing.
Also don't run the Windows CPU build on main since it's in trunk. This is what currently works for the ROCm build.

The temp file failure originates from https://github.com/pytorch/pytorch/pull/108508 (got fixed before I merged this PR)

I'm not sure when the ChunkRecordIteratorTest started failing, but it was after the above.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109393
Approved by: https://github.com/malfet
2023-09-20 21:34:40 +00:00
cyy
0735f6c0d5 [Reland] Remove calls of c10::either (#109708)
While there were FB-internal issues encountered when removing c10::either in #109299, we should be able to change the OSS code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109708
Approved by: https://github.com/clee2000
2023-09-20 21:23:10 +00:00
cadb566bbc [RELAND] [ATen] Update pre-compiled header (#108149)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108149
Approved by: https://github.com/albanD
2023-09-20 20:38:30 +00:00
8bc00dfffd Hashing for constant and singleton SymInt/SymBool (#109170)
Bugfix:
- previously, SymBool did not implement `__eq__`, so Python fell back to the default `__eq__` and `__hash__`
- in this PR, we make SymBool implement `__eq__`
- symbolic SymBool now raises an error when hashed, just like SymInt/SymFloat

New feature:
- previously, SymInt and SymFloat were unhashable (even when singleton or constant)
- in this PR, SymInt and SymBool are hashable if singleton/constant

Stay the same:
- SymNode are hashable due to default Python behavior
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109170
Approved by: https://github.com/ezyang
ghstack dependencies: #109169
2023-09-20 20:37:15 +00:00
5252fcb133 Handle constant SymBool in unary and binary operations (#109169)
In this PR:
- When constant SymNodes are detected in unary/binary ops, demote them to plain int/bool before proceeding. Sometimes this means doing a unary op with a constant SymNode results in a plain bool.
- Introduce an is_symbolic method, only available from Python. We need this because isinstance(x, SymInt) is no longer sufficient to check whether a given int/SymInt is symbolic or not. See a later PR in the stack for how this is used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109169
Approved by: https://github.com/ezyang
2023-09-20 20:37:15 +00:00
8597d37536 Implement numpy(force=True) (#109636)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109636
Approved by: https://github.com/ezyang
ghstack dependencies: #109634
2023-09-20 20:06:13 +00:00
1f6828ca99 Fix numpy(force=False) (#109634)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109634
Approved by: https://github.com/ezyang
2023-09-20 20:06:13 +00:00
9a1b6d44bb [C10d] Add PG::enableCollectivesTiming to make it dynamically enabled. (#108814)
Collectives timing gates the tracking of when a collective starts on the device.

Currently it's enabled by setting the NCCL_ENABLE_TIMING env var.

The goal of this PR is to make it possible to dynamically enable that flag so users of the PG hooks don't have to set that flag in order to have their hooks work.

The design is that once set, all new collectives will have such behavior so we track it on each Work object.

We make enableTiming_ atomic in PGNCCL to avoid races on non-TSO hardware.

To ensure consistency, we copy its value during Work construction and replace all previous usage of enableTiming_ from the PG with usages from the Work, which now has an immutable value.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108814
Approved by: https://github.com/wconstab, https://github.com/fduwjj
ghstack dependencies: #108813
2023-09-20 19:47:41 +00:00
3add22b716 Created nested utils.cpp (#109304)
# Summary
This refactors the preprocessing for nestedtensors that glue into SDPA. This is done in order to aid with reviewing:
https://github.com/pytorch/pytorch/pull/97485

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109304
Approved by: https://github.com/cpuhrsch
2023-09-20 19:38:34 +00:00
559d1f94a0 Revert "[Dynamo][Test] reland testcase with state (#109713)"
This reverts commit 5c897eacff8bc8f559d336d02f5c627c0045ac9d.

Reverted https://github.com/pytorch/pytorch/pull/109713 on behalf of https://github.com/PaliC due to creates a out of memory error for macos tests ([comment](https://github.com/pytorch/pytorch/pull/109713#issuecomment-1728314478))
2023-09-20 19:34:07 +00:00
f9947830bb [ONNX] Remove the deprecated function in symbolic_helper (#109681)
These three functions in symbolic_helper are deprecated and should be removed after PyTorch 2.0.

The clean up job will be separated into several patches to ensure the safety. See: https://github.com/pytorch/pytorch/pull/107208

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109681
Approved by: https://github.com/thiagocrepaldi
2023-09-20 19:31:39 +00:00
f3c12f5aa2 [DCP][test] Update test_dtensor_resharding.py (#109619)
Remove @parametrize and replace it with a for loop. This is because @parametrize makes the test name too complicated for the internal test infrastructure to recognize.

@fegin
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109619
Approved by: https://github.com/fegin
2023-09-20 19:05:07 +00:00
7e05cd4eca [autotuning] move logging logic into logging function (#109155)
Summary: move check for use_global_cache into logging functions

Test Plan: sandcastle/ci

Differential Revision: D49211797

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109155
Approved by: https://github.com/jansel
2023-09-20 18:53:59 +00:00
90a2026cd1 [inductor] Use _unsafe_view decompostion (#109669)
As per the old comment, decomposing is better than lowering because patterns for
`view` would apply to `_unsafe_view` as well.

fc47ba2794/torch/_inductor/decomposition.py (L89)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109669
Approved by: https://github.com/lezcano
ghstack dependencies: #109667, #109668
2023-09-20 18:45:56 +00:00
6f0cf5a837 [decomp] Decompose unsafe_split{,_with_sizes} into safe variants (#109668)
The "safety" aspect refers to the output not being registered as aliasing the
input, but after AOTAutograd I don't think this distinction matters. However,
we shouldn't use the same decomposition as the safe variant in case the backend
doesn't want to decompose split.
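
A simplified sketch of the direction described (not the registered decomposition itself, which goes through the decomposition tables):

```python
import torch

def unsafe_split_decomp(x: torch.Tensor, split_size: int, dim: int = 0):
    # Route the "unsafe" variant to the safe op: post-AOTAutograd the aliasing
    # distinction no longer matters, and a backend that lowers `split` itself
    # still sees `split` instead of a fully expanded decomposition.
    return torch.split(x, split_size, dim)
```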

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109668
Approved by: https://github.com/lezcano
ghstack dependencies: #109667
2023-09-20 18:45:56 +00:00
9e629dd73c [decomp] Add all std and std_mean overloads to core decompostions (#109667)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109667
Approved by: https://github.com/lezcano
2023-09-20 18:45:56 +00:00
36a8105f54 [decomp] Fix baddbmm decomposition (#109714)
The decomposition is currently registered without the pw_cast_for_opmath
decorator, due to the ordering of decorators being meaningful.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109714
Approved by: https://github.com/lezcano
2023-09-20 18:40:21 +00:00
b60a7c59ea Refactor check_fast_path_restriction in preparation for has_empty_tensor variant (#109534)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109534
Approved by: https://github.com/albanD
2023-09-20 18:24:30 +00:00
5c897eacff [Dynamo][Test] reland testcase with state (#109713)
Reland the PR https://github.com/pytorch/pytorch/pull/108750 reverted by https://github.com/pytorch/pytorch/issues/108838 , since https://github.com/pytorch/pytorch/pull/108969 has been merged.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109713
Approved by: https://github.com/anijain2305
2023-09-20 18:19:18 +00:00
be712a02e9 Trace pytree calls inside vmap implementation. (#109107)
This PR fixes the `expectedFailure` introduced in the previous PR.

**Problem:** container variables, such as `ConstDictVariable`, aren't registered nodes anymore. But we still have to process the tensors inside them.

**Solution:** wrap the pytree functions in a `UserFunctionVariable`, and call it. This should inline the given pytree function, and return the expected processed arguments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109107
Approved by: https://github.com/zou3519
ghstack dependencies: #109201, #108533
2023-09-20 18:11:10 +00:00
654731a52b Handle unbacked symints in Triton size hints (#109609)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109609
Approved by: https://github.com/yf225
ghstack dependencies: #109603
2023-09-20 18:03:54 +00:00
1c4e811565 replace data_ptr with aoti_torch_get_data_ptr for cpp codegen (#109615)
Summary:
in cpp codegen, we should use aoti_torch_get_data_ptr
for retrieving aten tensor pointers if abi_compatible is true

Test Plan: ci

Reviewed By: bertmaher

Differential Revision: D49411392

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109615
Approved by: https://github.com/bertmaher, https://github.com/desertfire, https://github.com/jansel
2023-09-20 17:26:17 +00:00
cdb51d2ad0 Revert "[2/N] Add -Wdeprecated and related fixes (#109564)"
This reverts commit 5b50641bac49e00ad05060f0b9fe3dcc5d73bc9b.

Reverted https://github.com/pytorch/pytorch/pull/109564 on behalf of https://github.com/atalman due to Need to revert as followup revert of first PR 108626 ([comment](https://github.com/pytorch/pytorch/pull/109564#issuecomment-1728137207))
2023-09-20 17:15:57 +00:00
af3741745c [CI] Add torch.compile works without numpy test (#109624)
Fixes https://github.com/pytorch/pytorch/issues/109387

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109624
Approved by: https://github.com/albanD
2023-09-20 17:07:20 +00:00
b771c04d6e Handle unbacked symints in buffer reuse calculation (#109603)
This is rewritten from https://github.com/pytorch/pytorch/pull/106655 to land faster, with peterbell10's comments.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109603
Approved by: https://github.com/yf225
2023-09-20 16:54:57 +00:00
63025d4218 Do not redundantly min start with new_size[dim], since end is already min'ed with it (#109599)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109599
Approved by: https://github.com/aakhundov
2023-09-20 16:53:42 +00:00
1cc052bcab Revert "[1/N] Add -Wdeprecated and related fixes (#108626)"
This reverts commit a53a677b4d8b9f4b9abbfeed2a6d4c00e9ee2252.

Reverted https://github.com/pytorch/pytorch/pull/108626 on behalf of https://github.com/clee2000 due to I'm getting errors internally that look like the below on x86_64-apple-ios-simulator with clang 16 ([comment](https://github.com/pytorch/pytorch/pull/108626#issuecomment-1728102447))
2023-09-20 16:49:11 +00:00
db6e9f66f1 Use pretty print for checking no duplicated pattern (#109066)
The pretty print is faster and more concise because it memoizes objects.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109066
Approved by: https://github.com/yanboliang
ghstack dependencies: #109663, #108894, #108917, #109142, #109156
2023-09-20 16:44:09 +00:00
d24ba7a634 Add 3d Attn Pattern to match HF Whisper (#109156)
Adds a 3d pattern that improves perf of HF Whisper from 1.3 -> 4.1. We could be matching more generally on 3d, but I'll leave that for another PR.

Thanks to @drisspg for helping me write the pattern.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109156
Approved by: https://github.com/yanboliang
ghstack dependencies: #109663, #108894, #108917, #109142
2023-09-20 16:39:31 +00:00
881bfbf21d [c10d] Add tests for usig libuv through init_process_group. (#108661)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108661
Approved by: https://github.com/XilunWu, https://github.com/fduwjj
2023-09-20 16:02:20 +00:00
cyy
567e8ebf94 [1/N] Move c10::variant to std::variant (#103675)
This PR moves some calls of c10::variant to std::variant.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103675
Approved by: https://github.com/ezyang
2023-09-20 15:21:24 +00:00
e87bd9f588 [aot inductor] Make unit tests work on CPU (#109625)
Summary: AOT inductor is only sort-of supported on CPU right now, but it works
with a few hacks (the .so needs to be compiled and run with CUDA present,
because we haven't excised the CUDA deps; also there's an `is_cpu` flag that
needs to be plumbed into the call, or else all the weights are erroneously
allocated on GPU).

But, with those hacks in place, it currently works, so it's worth having the
current state of it continue working (and at some point we'll remove the
hacks).

Test Plan:
```
python test_aot_inductor -k test_simple_cpu
```

Reviewers: binbao

Subscribers:

Tasks:

Tags:

Differential Revision: [D49427400](https://our.internmc.facebook.com/intern/diff/D49427400)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109625
Approved by: https://github.com/mikekgfb, https://github.com/chenyang78, https://github.com/desertfire
2023-09-20 14:51:44 +00:00
e73efbffab [Test][ShardedTensor] Add test for corner case for chunk sharding spec (#109626)
## Description
Add a test case to cover the corner case of empty shards when creating ShardedTensor.
Original fix contributed by a user.
https://github.com/pytorch/pytorch/pull/108915

## Test
With the fix, the test added runs fine.
Without the fix in https://github.com/pytorch/pytorch/pull/108915, the test case added would throw the following assertion error.
```
(/home/irisz/local/a/pytorch-env) [irisz@devgpu051.cln3 ~/local/pytorch (add_test_for_corner_case_for_chunk_sharding_spec)]$ python3 test/distributed/_shard/sharded_tensor/test_sharded_tensor.py TestShardTensor.test_shard_tensor_with_empty_shard
Fail to import hypothesis in common_utils, tests are not derandomized
INFO:numba.cuda.cudadrv.driver:init
Fail to import hypothesis in common_utils, tests are not derandomized
Fail to import hypothesis in common_utils, tests are not derandomized
Fail to import hypothesis in common_utils, tests are not derandomized
Fail to import hypothesis in common_utils, tests are not derandomized
NCCL version 2.18.3+cuda12.0
[rank3]:[2023-09-19 11:19:27,071] torch.testing._internal.common_distributed: [ERROR] Caught exception:
[rank3]:[2023-09-19 11:19:27,071] torch.testing._internal.common_distributed: [ERROR] Traceback (most recent call last):
[rank3]:[2023-09-19 11:19:27,071] torch.testing._internal.common_distributed: [ERROR]   File "/data/users/irisz/pytorch/torch/testing/_internal/common_distributed.py", line 658, in run_test
[rank3]:[2023-09-19 11:19:27,071] torch.testing._internal.common_distributed: [ERROR]     getattr(self, test_name)()
[rank3]:[2023-09-19 11:19:27,071] torch.testing._internal.common_distributed: [ERROR]   File "/data/users/irisz/pytorch/torch/testing/_internal/common_distributed.py", line 544, in wrapper
[rank3]:[2023-09-19 11:19:27,071] torch.testing._internal.common_distributed: [ERROR]     fn()
[rank3]:[2023-09-19 11:19:27,071] torch.testing._internal.common_distributed: [ERROR]   File "/data/users/irisz/pytorch/torch/testing/_internal/common_utils.py", line 2406, in wrapper
[rank3]:[2023-09-19 11:19:27,071] torch.testing._internal.common_distributed: [ERROR]     method(*args, **kwargs)
[rank3]:[2023-09-19 11:19:27,071] torch.testing._internal.common_distributed: [ERROR]   File "/data/users/irisz/pytorch/torch/testing/_internal/distributed/_shard/sharded_tensor/__init__.py", line 94, in wrapper
[rank3]:[2023-09-19 11:19:27,071] torch.testing._internal.common_distributed: [ERROR]     func(self, *args, **kwargs)
[rank3]:[2023-09-19 11:19:27,071] torch.testing._internal.common_distributed: [ERROR]   File "/data/users/irisz/pytorch/torch/testing/_internal/common_distributed.py", line 174, in wrapper
[rank3]:[2023-09-19 11:19:27,071] torch.testing._internal.common_distributed: [ERROR]     return func(*args, **kwargs)
[rank3]:[2023-09-19 11:19:27,071] torch.testing._internal.common_distributed: [ERROR]   File "/data/users/irisz/pytorch/test/distributed/_shard/sharded_tensor/test_sharded_tensor.py", line 258, in test_shard_tensor_with_empty_shard
[rank3]:[2023-09-19 11:19:27,071] torch.testing._internal.common_distributed: [ERROR]     st = _shard_tensor(tensor, spec)
[rank3]:[2023-09-19 11:19:27,071] torch.testing._internal.common_distributed: [ERROR]   File "/data/users/irisz/pytorch/torch/distributed/_shard/api.py", line 68, in _shard_tensor
[rank3]:[2023-09-19 11:19:27,071] torch.testing._internal.common_distributed: [ERROR]     st = sharding_spec.shard(tensor, src_rank=src_rank, process_group=process_group)
[rank3]:[2023-09-19 11:19:27,071] torch.testing._internal.common_distributed: [ERROR]   File "/data/users/irisz/pytorch/torch/distributed/_shard/sharding_spec/chunk_sharding_spec.py", line 170, in shard
[rank3]:[2023-09-19 11:19:27,071] torch.testing._internal.common_distributed: [ERROR]     assert local_tensor is not None
[rank3]:[2023-09-19 11:19:27,071] torch.testing._internal.common_distributed: [ERROR] AssertionError
[rank3]:[2023-09-19 11:19:27,071] torch.testing._internal.common_distributed: [ERROR]  exiting process 3 with exit code: 10
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR] Caught exception:
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR] Traceback (most recent call last):
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]   File "/data/users/irisz/pytorch/torch/testing/_internal/common_distributed.py", line 658, in run_test
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]     getattr(self, test_name)()
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]   File "/data/users/irisz/pytorch/torch/testing/_internal/common_distributed.py", line 544, in wrapper
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]     fn()
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]   File "/data/users/irisz/pytorch/torch/testing/_internal/common_utils.py", line 2406, in wrapper
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]     method(*args, **kwargs)
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]   File "/data/users/irisz/pytorch/torch/testing/_internal/distributed/_shard/sharded_tensor/__init__.py", line 94, in wrapper
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]     func(self, *args, **kwargs)
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]   File "/data/users/irisz/pytorch/torch/testing/_internal/common_distributed.py", line 174, in wrapper
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]     return func(*args, **kwargs)
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]   File "/data/users/irisz/pytorch/test/distributed/_shard/sharded_tensor/test_sharded_tensor.py", line 258, in test_shard_tensor_with_empty_shard
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]     st = _shard_tensor(tensor, spec)
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]   File "/data/users/irisz/pytorch/torch/distributed/_shard/api.py", line 68, in _shard_tensor
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]     st = sharding_spec.shard(tensor, src_rank=src_rank, process_group=process_group)
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]   File "/data/users/irisz/pytorch/torch/distributed/_shard/sharding_spec/chunk_sharding_spec.py", line 179, in shard
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]     dist.scatter(
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]   File "/data/users/irisz/pytorch/torch/distributed/c10d_logger.py", line 68, in wrapper
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]     return func(*args, **kwargs)
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]   File "/data/users/irisz/pytorch/torch/distributed/distributed_c10d.py", line 3143, in scatter
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]     _check_tensor_list(scatter_list, "scatter_list")
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]   File "/data/users/irisz/pytorch/torch/distributed/distributed_c10d.py", line 808, in _check_tensor_list
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]     raise TypeError(
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR] TypeError: Invalid function argument. Expected parameter `scatter_list` to be of type List[torch.Tensor].
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR] To execute this test, run the following from the base repo dir:
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]      python test/distributed/_shard/sharded_tensor/test_sharded_tensor.py -k test_shard_tensor_with_empty_shard
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR] This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
[rank0]:[2023-09-19 11:19:27,123] torch.testing._internal.common_distributed: [ERROR]  exiting process 0 with exit code: 10
Process 3 terminated with exit code 10, terminating remaining processes.
E
======================================================================
ERROR: test_shard_tensor_with_empty_shard (__main__.TestShardTensor)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/users/irisz/pytorch/torch/testing/_internal/common_distributed.py", line 542, in wrapper
    self._join_processes(fn)
  File "/data/users/irisz/pytorch/torch/testing/_internal/common_distributed.py", line 761, in _join_processes
    self._check_return_codes(elapsed_time)
  File "/data/users/irisz/pytorch/torch/testing/_internal/common_distributed.py", line 811, in _check_return_codes
    raise RuntimeError(error)
RuntimeError: Process 3 exited with error code 10 and exception:
Traceback (most recent call last):
  File "/data/users/irisz/pytorch/torch/testing/_internal/common_distributed.py", line 658, in run_test
    getattr(self, test_name)()
  File "/data/users/irisz/pytorch/torch/testing/_internal/common_distributed.py", line 544, in wrapper
    fn()
  File "/data/users/irisz/pytorch/torch/testing/_internal/common_utils.py", line 2406, in wrapper
    method(*args, **kwargs)
  File "/data/users/irisz/pytorch/torch/testing/_internal/distributed/_shard/sharded_tensor/__init__.py", line 94, in wrapper
    func(self, *args, **kwargs)
  File "/data/users/irisz/pytorch/torch/testing/_internal/common_distributed.py", line 174, in wrapper
    return func(*args, **kwargs)
  File "/data/users/irisz/pytorch/test/distributed/_shard/sharded_tensor/test_sharded_tensor.py", line 258, in test_shard_tensor_with_empty_shard
    st = _shard_tensor(tensor, spec)
  File "/data/users/irisz/pytorch/torch/distributed/_shard/api.py", line 68, in _shard_tensor
    st = sharding_spec.shard(tensor, src_rank=src_rank, process_group=process_group)
  File "/data/users/irisz/pytorch/torch/distributed/_shard/sharding_spec/chunk_sharding_spec.py", line 170, in shard
    assert local_tensor is not None
AssertionError
----------------------------------------------------------------------
Ran 1 test in 21.207s

FAILED (errors=1)
```

cc. @fduwjj @wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109626
Approved by: https://github.com/fduwjj
2023-09-20 14:40:07 +00:00
a019e5cbff s390x onnx: byteswap data when serializing it (#107963)
This change fixes test_pad, test_pad_with_dynamic_input_shape, test_reshape, test_resize and test_resize_after_concat in test/onnx/test_pytorch_onnx_shape_inference.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107963
Approved by: https://github.com/justinchuby
2023-09-20 14:27:45 +00:00
40b2c796dc [Decomposition] baddbmm (#108534)
Summary:
Moving the decomposition of baddbmm out of _inductor/decomposition.py and including it in core_aten_decompositions

ff38c0e2f9/torch/_inductor/decomposition.py (L203)
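For reference, the math this decomposition implements is the standard batched affine combination; a minimal illustrative sketch (not the literal registered decomp):

```python
import torch

def baddbmm_decomp(input, batch1, batch2, *, beta=1, alpha=1):
    # baddbmm = beta * input + alpha * (batch1 @ batch2), batched over dim 0
    result = torch.bmm(batch1, batch2) * alpha
    if beta == 0:
        # skip the input term entirely so zeros don't propagate nan/inf
        return result
    return input * beta + result
```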

Test Plan: Phabricator + OSS Tests

Differential Revision: D48871741

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108534
Approved by: https://github.com/SherlockNoMad
2023-09-20 12:49:32 +00:00
b30ee35a6f [Inductor][FX]Support efficient conv bn eval (#108757)
This PR adds an `efficient_conv_bn_eval_graph_transform` pass to the inductor. It tries to identify consecutive conv + bn **computation** with bn in eval mode, and changes it to a more efficient implementation. It does not modify parameters, which makes it **support training** without any pain. If no such patterns are identified, it does nothing. Therefore, it is backward compatible.
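For intuition, the transform exploits the fact that a conv followed by an eval-mode BN is algebraically a single conv with rescaled parameters; a minimal sketch of that folding arithmetic (illustrative names, not the pass itself):

```python
import torch

def fold_conv_bn_eval(w, b, bn_mean, bn_var, bn_w, bn_b, eps=1e-5):
    # bn(conv(x)) = bn_w * (conv(x) - mean) / sqrt(var + eps) + bn_b
    # which equals conv(x) with weight w * scale and bias (b - mean) * scale + bn_b
    scale = bn_w / torch.sqrt(bn_var + eps)
    folded_w = w * scale.reshape(-1, 1, 1, 1)
    folded_b = (b - bn_mean) * scale + bn_b
    return folded_w, folded_b
```

Because the folding can be recomputed on the fly from the current parameters, the original conv and bn parameters stay untouched, which is what makes training still work.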

It has great benefit in terms of memory footprint:

For resnet50 with input batchsize 64, image size 224, forward + backward training:

| Technique                   | Memory Footprint (GB)      | Remarks                                   |
|-------------------------------|----------------------------|-------------------------------------------|
| Eager Mode  | 5.18          |                                           |
| torch.compile                 | 5.46         | Strangely, not saving memory              |
| torch.compile with this PR                       | 2.88          | **Saves about 50% memory!**         |

The script to measure the memory footprint:

```python
from torchvision.models.resnet import resnet50
import torch

net = resnet50().eval().cuda()

input = torch.randn(64, 3, 224, 224).cuda()

opt_net = torch.compile(net) # Use torch.compile
# opt_net = net # Eager mode

current_memory = torch.cuda.memory_allocated()
torch.cuda.reset_peak_memory_stats()

for i in range(10):
    opt_net.zero_grad()
    output = opt_net(input)
    output.sum().backward()
    del output

peak_memory = torch.cuda.max_memory_allocated()
additional_peak_memory = peak_memory - current_memory
print(f"Additional peak memory used: {additional_peak_memory / (1024 ** 3)} GB")
```

More results can be found in the corresponding paper (this method is called Tune Mode in the tables).

<img width="709" alt="image" src="https://github.com/pytorch/pytorch/assets/23236638/db4815b0-d93e-4726-b1d5-e6651f256484">

<img width="653" alt="image" src="https://github.com/pytorch/pytorch/assets/23236638/22e5e1ab-6129-4c3d-a875-3c7343293b2e">

Note: the difference between this PR and https://github.com/pytorch/pytorch/pull/106372 is that, https://github.com/pytorch/pytorch/pull/106372 tries to fix and change the implementation of `torch.fx.experimental.optimization.fuse`, which causes compatibility issues; this PR only introduces a new graph transform passes, and does not break the previous code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108757
Approved by: https://github.com/jansel
2023-09-20 08:10:02 +00:00
595af261b2 [ao] Support Subclasses of FloatFunctional in eager mode prepare (#109646)
Summary: As the title says, if a module subclasses `nnq.FloatFunctional`, also add observers to it, just as is done for `nnq.FloatFunctional` itself

Test Plan: CI

Differential Revision: D49431968

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109646
Approved by: https://github.com/jerryzh168
2023-09-20 08:09:55 +00:00
293205c54b [AOTInductor] Fix aot_inductor/test:test_custom_ops (#109660)
Summary: Fix aot_inductor/test:test_custom_ops, which was broken by https://github.com/pytorch/pytorch/pull/109391

Test Plan: buck2 run mode/dev-nosan //deeplearning/aot_inductor/test:test_custom_ops

Differential Revision: D49438928

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109660
Approved by: https://github.com/desertfire, https://github.com/chenyang78
2023-09-20 07:44:39 +00:00
cyy
5b50641bac [2/N] Add -Wdeprecated and related fixes (#109564)
This PR follows #108626.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109564
Approved by: https://github.com/ezyang
2023-09-20 07:03:25 +00:00
cyy
d137b620c5 Fix c10_tempfile_test failure on Windows (#109680)
Fixes c10_tempfile_test indicated by #109393.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109680
Approved by: https://github.com/clee2000
2023-09-20 07:01:42 +00:00
ad53b53518 Generate patterns in fp16 and fp32 (#109142)
aten.softmax will generate a different decomposition for fp16/bf16 and fp32 because when invoked in lower precision it will upcast the inputs to fp32 and then downcast after. This has been causing us to miss bf16 patterns. For example, Camembert improves 20% with this PR (as, I'm sure, do many other models).
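Concretely, the lower-precision path traces to roughly the following shape, which is why an fp32-traced pattern fails to match it (a sketch, not the actual decomp source):

```python
import torch

def softmax_half(x: torch.Tensor, dim: int) -> torch.Tensor:
    # fp16/bf16 inputs are upcast before the softmax and downcast after,
    # so the traced graph contains extra convert nodes that fp32 lacks
    return torch.softmax(x.to(torch.float32), dim).to(x.dtype)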

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109142
Approved by: https://github.com/yanboliang
ghstack dependencies: #109663, #108894, #108917
2023-09-20 06:38:02 +00:00
122264a0c0 [generate_opcheck_tests] tests should ignore meta/FakeTensors (#109641)
These tests generally don't work on meta tensors because they need to
compare the data of the Tensors. For example, SchemaCheckMode errors out
if any inputs are meta or Fake because it needs to check their storages
to see if any mutation occurred and those do not have storages.
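A minimal sketch of the kind of guard this implies (hypothetical helper, not the actual test-suite code):

```python
import torch
from torch._subclasses.fake_tensor import FakeTensor

def has_real_data(t: torch.Tensor) -> bool:
    # meta and fake tensors carry no real storage to compare against
    return not (t.is_meta or isinstance(t, FakeTensor))
```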

Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109641
Approved by: https://github.com/bdhirsh, https://github.com/soulitzer
ghstack dependencies: #109637, #109638, #109639, #109640
2023-09-20 06:33:37 +00:00
d3d71367b9 [generate_opcheck_tests] Always print a repro (#109640)
On failure of a test, we will always print a "repro". This repro isn't
really runnable but gives the user a sense of how to actually reproduce
the test without the test suite, because using the test suite is a bit
convoluted.

If the user passes PYTORCH_OPCHECK_PRINT_BETTER_REPRO, we will print a
fuller repro that saves the exact problematic test inputs to disk and
reads them back out.

Test Plan:
- expecttests on the generate_repro helper function
- tried this out locally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109640
Approved by: https://github.com/bdhirsh, https://github.com/soulitzer
ghstack dependencies: #109637, #109638, #109639
2023-09-20 06:33:37 +00:00
af900fe228 [generate_opcheck_tests] flip unified_diff order (#109639)
It was reversed. As written this is a bit difficult to test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109639
Approved by: https://github.com/bdhirsh, https://github.com/soulitzer
ghstack dependencies: #109637, #109638
2023-09-20 06:33:37 +00:00
7564f04389 [generate_opcheck_tests] add type checking (#109638)
Test Plan:
- lintrunner
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109638
Approved by: https://github.com/bdhirsh, https://github.com/soulitzer
ghstack dependencies: #109637
2023-09-20 06:33:37 +00:00
10d575911e [generate_opcheck_tests] rename "success" to "xsuccess" (#109637)
Not BC breaking because no existing failures dict have "success" in
them.

Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109637
Approved by: https://github.com/bdhirsh, https://github.com/soulitzer
2023-09-20 06:33:37 +00:00
d271a5c796 [minimizer]skip mode for minimizer (#109399)
Summary: skip known-issue nodes in the minimizer and check the whole graph

Reviewed By: siyan-lin

Differential Revision: D48990707

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109399
Approved by: https://github.com/jfix71
2023-09-20 06:23:46 +00:00
067f172930 Serialize Remaining Patterns (#108917)
Serializes the remaining traced patterns.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108917
Approved by: https://github.com/davidberard98
ghstack dependencies: #109663, #108894
2023-09-20 05:39:23 +00:00
16d608d70d Add Python serialization to Pattern Matcher patterns (#108894)
Adds a Python Pretty Printer to the pattern matcher that serializes patterns as python. Generating our fuse attention patterns was taking 4 seconds of compile time, which will only get worse as we add more variants (which I will do in the rest of this stack). To write out patterns, build pytorch, then run `gen_attention_patterns.py`.

Since there is a line limit for PRs, I'm only including _sdpa_pattern1 in this first diff.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108894
Approved by: https://github.com/yanboliang
ghstack dependencies: #109663
2023-09-20 05:36:52 +00:00
1a5e0edf56 [dynamo] Avoid divided by zero error when printing out choices (#109328)
Summary: We hit this problem in practice. I think whenever it occurs, something bad has probably already happened upstream, e.g. the run instance returned immediately due to an error. Throwing here should let us catch the real issue earlier.

Test Plan: CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109328
Approved by: https://github.com/ezyang, https://github.com/jansel
2023-09-20 05:27:20 +00:00
76dd38b591 add back in unsafe view decomp (#109663)
This decomp makes pattern matching easier, and was only just excluded from the decomp set in https://github.com/pytorch/pytorch/pull/108713

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109663
Approved by: https://github.com/davidberard98, https://github.com/yanboliang
2023-09-20 05:23:59 +00:00
238fb66085 python functionalization: support higher order ops (#108656)
We now have two types of functionalization, C++ Functionalization (through the `Functionalize` dispatch key), and python functionalization (through the `FunctionalTensorMode` torch_dispatch mode).

This means that all higher order ops need custom functionalization rules for the python variant too. I added them here, as well as a helper function `dispatch_functionalize()` - equivalent to `torch.func.functionalize()`, except that it uses `FunctionalTensorMode`.

In theory we could have secretly switched `torch.func.functionalize` to use `FunctionalTensorMode`. This would be BC-breaking, though, since `FunctionalTensorMode` isn't composable with the other functorch transforms (the functorch layer-mode stack doesn't know how to re-order torch_dispatch modes arbitrarily).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108656
Approved by: https://github.com/zou3519
ghstack dependencies: #109024, #109248
2023-09-20 04:37:31 +00:00
d9342cde6e custom ops: don't error if autograd input is a tensor subclass (#109248)
This is needed to allow the custom ops in our custom op autograd tests to accept `FunctionalTensor` arguments as inputs that we compute gradients for. Previously, custom ops would raise an error if you tried to pass in a tensor subclass when using autograd.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109248
Approved by: https://github.com/zou3519
ghstack dependencies: #109024
2023-09-20 04:37:31 +00:00
c9b60a691b functorch: fallthrough on calls to custom size/stride/storage_offset calls (#109024)
The problem (that @zou3519 pointed out) is that functorch assumes that when it creates a TensorImpl (like `TensorWrapper`, [code](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/functorch/TensorWrapper.cpp#L43)), it doesn't re-enter the dispatcher.

However, if the inner tensor that we hold is a tensor subclass with custom size/strides, then calls like `sym_storage_offset()` get plumbed to `__torch_dispatch__` as `torch.ops.aten.sym_storage_offset.default`, which is a real op registered to the dispatcher ([here](https://github.com/pytorch/pytorch/blob/main/torch/csrc/jit/runtime/register_prim_ops.cpp#L526)).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109024
Approved by: https://github.com/zou3519
2023-09-20 04:37:31 +00:00
0317626df5 [MPS] adding weight_norm_interface support for mps (#108008)
Fixes #104513

Adds support for aten::_weight_norm_interface to the mps backend.

Also adds a consistency test for the output and the grad.
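For context, `_weight_norm_interface` implements the weight-normalization reparameterization w = g * v / ||v||; a reference sketch of the forward math (assumed, for illustration only):

```python
import torch

def weight_norm_ref(v: torch.Tensor, g: torch.Tensor, dim: int = 0):
    # norm taken over every dim except `dim`, matching the (w, norm) outputs
    dims = [d for d in range(v.dim()) if d != dim]
    norm = v.norm(2, dim=dims, keepdim=True)
    return v * (g / norm), norm
```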
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108008
Approved by: https://github.com/kulinseth
2023-09-20 02:18:28 +00:00
1b3e5b53f3 [FSDP][optim_state_dict] Add device to _shard_utils.py to explicitly use the device from fsdp_state (#109631)
_get_pg_default_device does not always get the device we want. This PR lets the user explicitly tell us the correct device.

Differential Revision: [D49425743](https://our.internmc.facebook.com/intern/diff/D49425743/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109631
Approved by: https://github.com/awgu, https://github.com/fduwjj, https://github.com/wz337
2023-09-20 01:59:38 +00:00
6b760ffd6c improve unique performance on CPU (#107846)
Fix https://github.com/pytorch/pytorch/issues/107098, improve `unique` performance on CPU.

The algorithm is taken from the NumPy implementation at https://github.com/numpy/numpy/blob/main/numpy/lib/arraysetops.py#L323: it first does a sort on the input sequence and then uses a `mask` to record the first element of each consecutive section of equal values.

We don't currently have a parallel sort for 1-dimensional float tensors; that will be enabled in a next step. A parallel radix sort is used for 1-dimensional int tensors.
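A compact sketch of that sort-plus-mask algorithm (illustrative, single-threaded Python; the PR's C++ version fuses and parallelizes these loops):

```python
import torch

def unique_sorted(x: torch.Tensor):
    sorted_x, perm = torch.sort(x)
    # mark the first element of each consecutive run of equal values
    mask = torch.ones_like(sorted_x, dtype=torch.bool)
    mask[1:] = sorted_x[1:] != sorted_x[:-1]
    unique = sorted_x[mask]
    # unique id of each sorted element, scattered back to original order
    ids = torch.cumsum(mask, 0) - 1
    inverse = torch.empty_like(perm).scatter_(0, perm, ids)
    counts = torch.bincount(ids)
    return unique, inverse, counts
```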

The following data is collected with the script in the issue, on an Intel(R) Xeon(R) Gold 6248 CPU @ 2.5GHz using a single socket (20 cores):

#### before (dtype int64)
```
Numpy just sort: 0.4271528720855713 s
Numpy sort + indexes: 6.383563041687012 s
Torch just sort: 0.46924352645874023 s
Torch sort + indexes: 1.8140404224395752 s
```

#### after (dtype int64)
```
Torch just sort: 0.2540090084075928 s
Torch sort + indexes: 0.2766146659851074 s
```

#### before (float32)
```
Numpy just sort: 0.41129398345947266 s
Numpy sort + indexes: 6.422696590423584 s
Torch just sort: 9.109549283981323 s
Torch sort + indexes: 37.59021711349487 s
```

#### after (float32)
```
Torch just sort: 3.5369982719421387 s
Torch sort + indexes: 3.582240581512451 s
```

If we enable the parallel sort for 1-dimensional float tensors, the performance is:
```
Torch just sort: 0.3212606906890869 s
Torch sort + indexes: 0.36211371421813965 s
```

Since I have fused the `inverse_indices` and `count` calculations into a single parallel loop (the algorithm is identical to NumPy's but better optimized), they add only a small amount of extra time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107846
Approved by: https://github.com/jgong5, https://github.com/nikitaved, https://github.com/peterbell10
2023-09-20 01:38:19 +00:00
518308a740 Trace through pytree API with dynamo. (#108533)
Fix: #107315

This PR enables dynamo to trace through the `pytree` API by inlining its functions. In
order to do so, a few details of `pytree` had to be changed.

In summary, this PR:

- Introduces `TreeSpecVariable` for representing `TreeSpec` instances
- Specializes `<type>.__bases__` call, returning a `TupleVariable`
- Enables the call to `id` builtin function for every variable that implements
  `as_python_constant` method
- Specializes `ConstantVariable.call_method` for its (un)flatten functions
- Implements `UserDefinedObjectVariable.as_python_constant`
- Modifies `pytree` by:
    - Making `SUPPORTED_NODES` a map of ids (instead of types) to `NodeDef`
    - Removing the `functools.wraps` call, since it can't be inlined

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108533
Approved by: https://github.com/ezyang, https://github.com/voznesenskym
ghstack dependencies: #109201
2023-09-20 00:04:56 +00:00
103260a43b Re-define check for typing classes. (#109201)
This PR fixes the `is_typing` function, which checks whether a value is an instance of a class
from the `typing` package.

This reverts commit b09c09f7bb3adb6a5b8a107a5b96757b569daa8d.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109201
Approved by: https://github.com/ezyang
2023-09-20 00:04:56 +00:00
85d26f7868 [inductor] Enable mypy checking for torch/_inductor/codegen/triton.py (#109146)
Summary: enable mypy checking for torch/_inductor/codegen/triton.py and make the minimum number of fixes/ignores needed to get the linter to pass

Test Plan: `lintrunner -a`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109146
Approved by: https://github.com/peterbell10
2023-09-19 23:01:03 +00:00
8705fc1bbd Revert "Add Python serialization to Pattern Matcher patterns (#108894)"
This reverts commit 7db175b6f628ed18f98eeb41d8b15c85c40e0f51.

Reverted https://github.com/pytorch/pytorch/pull/108894 on behalf of https://github.com/eellison due to land race ([comment](https://github.com/pytorch/pytorch/pull/108894#issuecomment-1726649151))
2023-09-19 23:00:03 +00:00
8b4b1817c8 Revert "Serialize Remaining Patterns (#108917)"
This reverts commit 7bf08b77f378e5b540fb08dd0c61326fe3ab5583.

Reverted https://github.com/pytorch/pytorch/pull/108917 on behalf of https://github.com/eellison due to land race ([comment](https://github.com/pytorch/pytorch/pull/108917#issuecomment-1726646267))
2023-09-19 22:54:52 +00:00
b1d2028eb0 Add compiled optimizer test for nadam (#109548)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109548
Approved by: https://github.com/janeyx99
2023-09-19 22:54:36 +00:00
c2f5d4d8f0 Revert "Generate patterns in fp16 and fp32 (#109142)"
This reverts commit 14994cc9780cc66e03f8ce6720996e798dd85e19.

Reverted https://github.com/pytorch/pytorch/pull/109142 on behalf of https://github.com/eellison due to MESSAGE ([comment](https://github.com/pytorch/pytorch/pull/109142#issuecomment-1726641232))
2023-09-19 22:52:05 +00:00
11c6a98bca [torch] add use_buffers to swa_utils interface (#109078)
Summary: As title, this already exists in swa_utils.py

Differential Revision: D49155243

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109078
Approved by: https://github.com/janeyx99
2023-09-19 21:30:59 +00:00
14994cc978 Generate patterns in fp16 and fp32 (#109142)
aten.softmax will generate a different decomposition for fp16/bf16 and fp32 because when invoked in lower precision it will upcast the inputs to fp32 and then downcast after. This has been causing us to miss bf16 patterns. For example, Camembert improves 20% with this PR (as, I'm sure, do many other models).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109142
Approved by: https://github.com/yanboliang
ghstack dependencies: #108894, #108917
2023-09-19 20:59:42 +00:00
7b53303d3c Improved the docs for torch.std, torch.var, torch.std_mean, torch.var_mean and torch.cov (#109326)
Fixes #109186.

This PR updates the docs for
- `torch.var`
- `torch.var_mean`
- `torch.std`
- `torch.std_mean`
- `torch.cov`

to reflect the actual implementation behavior when `correction >= N`. The math for `torch.cov` should probably be double checked before merging.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109326
Approved by: https://github.com/albanD
2023-09-19 20:47:24 +00:00
7bf08b77f3 Serialize Remaining Patterns (#108917)
Serializes the remaining traced patterns.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108917
Approved by: https://github.com/davidberard98
ghstack dependencies: #108894
2023-09-19 20:45:52 +00:00
7db175b6f6 Add Python serialization to Pattern Matcher patterns (#108894)
Adds a Python Pretty Printer to the pattern matcher that serializes patterns as python. Generating our fuse attention patterns was taking 4 seconds of compile time, which will only get worse as we add more variants (which I will do in the rest of this stack). To write out patterns, build pytorch, then run `gen_attention_patterns.py`.

Since there is a line limit for PRs, I'm only including _sdpa_pattern1 in this first diff.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108894
Approved by: https://github.com/yanboliang
2023-09-19 20:36:52 +00:00
5845fc2fa6 [PyTorch][Coreml] Bubble up NSError from loadModel (#109444)
Summary: This can help debug issues, especially fc/bc issues with coreml tools, when a model fails to load.

Test Plan:
On a macbook fbsource,
```
arc focus2 -b pp-ios -a ModelRunner -a //xplat/caffe2/c10:c10Apple -a //xplat/caffe2/fb/dynamic_pytorch:dynamic_pytorch_implApple -a //xplat/caffe2:coreml_delegateApple --auto-test-schemes --force-with-wrong-xcode
```
It builds and runs the Playground app using a bunch of coreml models on my iPhone. Here is one for example,
https://pxl.cl/3nSPn

I also forcefully triggered an MLModel ctor failure to test this code by setting `modelURL = nil`, and as expected got this:
```
libc++abi: terminating due to uncaught exception of type c10::Error: Error loading MLModel Error details:  Localized_description: nil value for URL Domain: com.apple.CoreML Code: 3 User Info: {
    NSLocalizedDescription = "nil value for URL";
} Input Shapes: N/A

Exception raised from compile at xplat/caffe2/torch/csrc/jit/backends/coreml/objc/PTMCoreMLBackend.mm:162 (most recent call first):
(no backtrace available)
```

Instead, the previous message would have been:
```
Loading MLModel failed
```

Unrelated issues
* P829736691 - with running MaskRCNN on Coreml with the Playground app. Only happens some times.
* P829741377 - with Metal Operator Tests with the Playground app.

Differential Revision: D49349726

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109444
Approved by: https://github.com/kimishpatel
2023-09-19 20:08:37 +00:00
a86727a06b [Pytorch][Vulkan] rewrite available() check and add tests for them (#109541)
Summary: As suggested by liuk22 [[here](https://www.internalfb.com/diff/D49306279?dst_version_fbid=3583458958608887&transaction_fbid=282478474429100)], rewrote the `available()` check and added tests to ensure it works.

Test Plan:
```
LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run fbcode/mode/dev-nosan //xplat/caffe2:pt_vulkan_api_test_bin
```

Differential Revision: D49388848

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109541
Approved by: https://github.com/yipjustin
2023-09-19 18:59:01 +00:00
964b79c813 [EASY] Update dynamo dependency installing Makefile (#107229)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107229
Approved by: https://github.com/bdhirsh
2023-09-19 18:58:37 +00:00
caf4376349 [PyTorch] remove branch in isIntrusivePtr (#109273)
There is a code comment in ivalue.h that is intended to explain the motivation for this change fully; please request changes if it doesn't.

Differential Revision: [D49245910](https://our.internmc.facebook.com/intern/diff/D49245910/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D49245910/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109273
Approved by: https://github.com/ezyang
ghstack dependencies: #109272
2023-09-19 17:51:41 +00:00
e29330deab [PyTorch] clang-format ivalue.h (#109272)
I don't know how this got out of format, but now it's formatted.

Differential Revision: [D49245911](https://our.internmc.facebook.com/intern/diff/D49245911/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109272
Approved by: https://github.com/ezyang, https://github.com/Skylion007
2023-09-19 17:51:41 +00:00
cd31c170c9 Revert "[ONNX] Remove deprecated functions (#107208)"
This reverts commit 263ca7d69bb9b3b58ae0f9b4d27864587611389c.

Reverted https://github.com/pytorch/pytorch/pull/107208 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/107208#issuecomment-1726183104))
2023-09-19 17:26:48 +00:00
70f2adaec3 Setup_context does not contain default values of forward() (#108561)
Fixes #108529

As the title says.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108561
Approved by: https://github.com/soulitzer
2023-09-19 16:23:52 +00:00
1427b8149c Revert "Eliminate c10::guts::make_unique_base (#109429)"
This reverts commit 6b1a15d1bb465b9f0f07a7a7c8dc5d88d086438a.

Reverted https://github.com/pytorch/pytorch/pull/109429 on behalf of https://github.com/clee2000 due to Sorry its me again, I'm getting that this caused an instruction count regression internally ([comment](https://github.com/pytorch/pytorch/pull/109429#issuecomment-1725923294))
2023-09-19 15:47:00 +00:00
a68280e2c3 [cpu] Vectorize nan_to_num (#98329)
Locally I see a roughly 4x speedup.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98329
Approved by: https://github.com/lezcano
2023-09-19 15:25:41 +00:00
1895bd9bb5 [inductor] Decompose torch.ops.quantized.embedding_bag_byte_unpack (#109398)
This would be cleaner if we had support for u8->float32 views
(bitcasts) in inductor, but it works for now.

Differential Revision: [D49329910](https://our.internmc.facebook.com/intern/diff/D49329910/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109398
Approved by: https://github.com/hl475, https://github.com/jansel, https://github.com/jgong5
2023-09-19 14:11:47 +00:00
d0cc623192 [Decomposition] _unsafe_view (#108713)
Summary:
Decomp already exists so just add it to core_aten_decompositions

https://www.internalfb.com/code/fbsource/[9d5eabd7b213d1a356d4e7bb400355d574ea924b]/fbcode/caffe2/torch/_decomp/decompositions.py?lines=3091

Differential Revision: D48619079

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108713
Approved by: https://github.com/larryliu0820, https://github.com/SherlockNoMad
2023-09-19 13:37:35 +00:00
deea268e43 Update aten_fill to avoid d2h sync (#109533)
Fixes #109115

### Before:
<img width="1526" alt="Screenshot 2023-09-18 at 11 57 32 AM" src="https://github.com/pytorch/pytorch/assets/32754868/394a4c51-7cae-4d05-b9ad-b17d02beaf72">

### After:
<img width="1550" alt="Screenshot 2023-09-18 at 11 57 25 AM" src="https://github.com/pytorch/pytorch/assets/32754868/e2f774f5-5374-49c3-95ec-dd3a85f74a2e">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109533
Approved by: https://github.com/mikaylagawarecki
2023-09-19 13:34:49 +00:00
2e721aab98 [Decomposition] Trunc (#109319)
Summary:
Add Decomp for Trunc and add it to core_aten_decompositions
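A minimal sketch of what such a decomposition can look like in terms of existing core ops (illustrative, not necessarily the exact registered decomp):

```python
import torch

def trunc_decomp(x: torch.Tensor) -> torch.Tensor:
    # round toward zero: floor for non-negative values, ceil for negative
    return torch.where(x >= 0, torch.floor(x), torch.ceil(x))
```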

Differential Revision: D49042033

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109319
Approved by: https://github.com/SherlockNoMad
2023-09-19 13:30:13 +00:00
ae66d0b3bf [Decomposition] clamp_max (#108718)
Summary:
Decomp already exists so just add it to core_aten_decompositions

https://www.internalfb.com/code/fbsource/[abda43a5a268e83fef6d62b49531a390ce915ad2]/fbcode/caffe2/torch/_refs/__init__.py?lines=1855

Differential Revision: D48880026

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108718
Approved by: https://github.com/SherlockNoMad
2023-09-19 13:25:35 +00:00
25e81f19f3 reland "python functionalization: add helpers, functionalize_sync and mirror_autograd_meta (#107917)" (#109518)
Reland - the previous PR was reverted by internal with this error:
```
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/buck-out/v2/gen/fbcode/363cd7e240f5d021/caffe2/torch/fb/trainer/data_modules/tests/__test_dataloader__/test_dataloader#link-tree/torch/__init__.py", line 29, in <module>
    from ._utils_internal import _functionalize_sync as _sync
ImportError: cannot import name '_functionalize_sync' from 'torch._utils_internal'
```

I couldn't figure out why internal was unhappy with the import. One potential reason is that I see a build rule for *another* `_utils_internal.py` in the fb folder here ([link](https://www.internalfb.com/code/fbsource/[30ed85cd88409af98b7490be137aaa5dfd7afd01]/fbcode/caffe2/TARGETS?lines=444))

Rather than burn more time investigating, I confirmed internally that the error goes away if I move the util from `torch/_utils_internal.py` to `torch/_utils.py`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109518
Approved by: https://github.com/albanD
2023-09-19 13:25:24 +00:00
677a1010e6 Implement traceable torch.tensor when you have SymInt/SymFloat inputs (#109515)
I just ported the C++ torch.tensor implementation to Python, swapping out the inner bits to successively stack tensors together, so that we can trace through `scalar_tensor`.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109515
Approved by: https://github.com/voznesenskym
ghstack dependencies: #109513
2023-09-19 13:19:57 +00:00
8ed906030c add fp16 support for mkldnn conv and deconv on CPU (#99496)
The PR is part of https://github.com/pytorch/pytorch/issues/97068, which is to add fp16 support for mkldnn conv and mkldnn deconv to leverage  avx_ne_convert, avx512-fp16, and amx-fp16 via the oneDNN library.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99496
Approved by: https://github.com/jgong5, https://github.com/cpuhrsch
2023-09-19 12:37:28 +00:00
54c28c564f add Half support for BatchNorm on CPU (#102070)
Fixes #106543

### Testing

Single core:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
(1, 4, 256, 256) | 0.7116 | 0.1427 | 0.1744 | 0.2638 | 0.2002 | 0.2556
(1, 32, 100, 100) | 0.8579 | 0.1725 | 0.2077 | 0.3023 | 0.2399 | 0.2995
(32, 16, 200, 200) | 57.3466 | 12.2179 | 13.1320 | 45.9524 | 24.1526 | 24.9882

28 cores:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
(1, 4, 256, 256) | 0.2571 | 0.0713 | 0.0846 | 0.1140 | 0.0883 |  0.1043
(1, 32, 100, 100) | 0.1077 | 0.0510 | 0.0548 | 0.0700 | 0.0645 | 0.0713
(32, 16, 200, 200) | 5.5060 | 1.4195 | 1.4663 | 6.773 | 3.0886 | 3.1343

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102070
Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki, https://github.com/mingfeima
2023-09-19 10:43:33 +00:00
2f53bca0fc [Docs] Fix typo in torch.unflatten (#109588)
Fixes https://github.com/pytorch/pytorch/issues/109559

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109588
Approved by: https://github.com/lezcano
2023-09-19 10:37:45 +00:00
af867c2d14 [Docs] Fix compiler.list_backends invocation (#109568)
s/torch.compile.list_backends/torch.compiler.list_backends/

Fixes https://github.com/pytorch/pytorch/issues/109451

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109568
Approved by: https://github.com/msaroufim, https://github.com/svekars
2023-09-19 10:00:04 +00:00
cyy
a53a677b4d [1/N] Add -Wdeprecated and related fixes (#108626)
This PR adds -Wdeprecated to CMake warnings and fixes related issues.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108626
Approved by: https://github.com/ezyang, https://github.com/Skylion007
2023-09-19 09:24:04 +00:00
4a60bd22b2 [Quant][Inductor] Enable quantization dynamic batch size support (#108550)
**Summary**
This diff enables dynamic batch size support for the quantization use case in Inductor. Taking the UT in this PR as an example, after this PR the generated code assumes a dynamic input batch size.
```
cpp_fused_quantize_per_tensor_0 = async_compile.cpp('''
#include "/tmp/torchinductor_root/ib/cibrnuq56cxamjj4krp4zpjvsirbmlolpbnmomodzyd46huzhdw7.h"
extern "C" void kernel(const float* in_ptr0,
                       unsigned char* out_ptr0,
                       const long ks0,
                       const long ks1)
{
    {
        #pragma GCC ivdep
        for(long i0=static_cast<long>(0L); i0<static_cast<long>(ks0); i0+=static_cast<long>(1L))
        {
            #pragma GCC ivdep
            for(long i1=static_cast<long>(0L); i1<static_cast<long>(3L); i1+=static_cast<long>(1L))
            {
                #pragma GCC ivdep
                for(long i2=static_cast<long>(0L); i2<static_cast<long>(static_cast<long>(ks1*ks1)); i2+=static_cast<long>(1L))
                {
                    auto tmp0 = in_ptr0[static_cast<long>(i2 + (i1*(static_cast<long>(ks1*ks1))) + (3L*i0*(static_cast<long>(ks1*ks1))))];
                    auto tmp1 = static_cast<float>(40.36037717834931);
                    auto tmp2 = decltype(tmp0)(tmp0 * tmp1);
                    auto tmp3 = std::nearbyint(tmp2);
                    auto tmp4 = static_cast<float>(97.0);
                    auto tmp5 = tmp3 + tmp4;
                    auto tmp6 = static_cast<float>(0.0);
                    auto tmp7 = max_propagate_nan(tmp5, tmp6);
                    auto tmp8 = static_cast<float>(255.0);
                    auto tmp9 = min_propagate_nan(tmp7, tmp8);
                    auto tmp10 = static_cast<unsigned char>(tmp9);
                    out_ptr0[static_cast<long>(i1 + (3L*i2) + (3L*i0*(static_cast<long>(ks1*ks1))))] = tmp10;
                }
            }
        }
    }
}
''')

cpp_fused_dequantize_per_tensor_mean_quantize_per_tensor_1 = async_compile.cpp('''
#include "/tmp/torchinductor_root/ib/cibrnuq56cxamjj4krp4zpjvsirbmlolpbnmomodzyd46huzhdw7.h"
extern "C" void kernel(const unsigned char* in_ptr0,
                       float* out_ptr0,
                       unsigned char* out_ptr1,
                       const long ks0,
                       const long ks1)
{
    {
        #pragma GCC ivdep
        for(long i0=static_cast<long>(0L); i0<static_cast<long>(ks0); i0+=static_cast<long>(1L))
        {
            for(long i1=static_cast<long>(0L); i1<static_cast<long>(16L); i1+=static_cast<long>(16L))
            {
                {
                    #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out = omp_out + omp_in) initializer(omp_priv={at::vec::Vectorized<float>(0)})
                    float tmp_acc0 = 0;
                    at::vec::Vectorized<float> tmp_acc0_vec = at::vec::Vectorized<float>(0);
                    for(long i2=static_cast<long>(0L); i2<static_cast<long>(1L + (static_cast<long>((at::native::div_floor_integer(ks1, 2L))*(at::native::div_floor_integer(ks1, 2L)))) + (2L*(at::native::div_floor_integer(ks1, 2L)))); i2+=static_cast<long>(1L))
                    {
                        auto tmp0 = at::vec::Vectorized<uint8_t>::loadu_one_fourth(in_ptr0 + static_cast<long>(i1 + (16L*i0) + (16L*i2) + (16L*i0*(static_cast<long>((at::native::div_floor_integer(ks1, 2L))*(at::native::div_floor_integer(ks1, 2L))))) + (32L*i0*(at::native::div_floor_integer(ks1, 2L)))));
                        auto tmp1 = at::vec::convert_uint8_to_float(tmp0);
                        auto tmp2 = at::vec::Vectorized<float>(static_cast<float>(0.0));
                        auto tmp3 = tmp1 - tmp2;
                        auto tmp4 = at::vec::Vectorized<float>(static_cast<float>(0.010429476387798786));
                        auto tmp5 = tmp3 * tmp4;
                        tmp_acc0_vec = tmp_acc0_vec + tmp5;
                    }
                    tmp_acc0_vec.store(out_ptr0 + static_cast<long>(i1 + (16L*i0)));
                }
            }
        }
    }
    {
        #pragma GCC ivdep
        for(long i0=static_cast<long>(0L); i0<static_cast<long>(16L*ks0); i0+=static_cast<long>(1L))
        {
            auto tmp0 = out_ptr0[static_cast<long>(i0)];
            auto tmp1 = static_cast<float>(1L + (static_cast<long>((at::native::div_floor_integer(ks1, 2L))*(at::native::div_floor_integer(ks1, 2L)))) + (2L*(at::native::div_floor_integer(ks1, 2L))));
            auto tmp2 = tmp0 / tmp1;
            auto tmp3 = static_cast<float>(168.09128392896545);
            auto tmp4 = decltype(tmp2)(tmp2 * tmp3);
            auto tmp5 = std::nearbyint(tmp4);
            auto tmp6 = static_cast<float>(0.0);
            auto tmp7 = tmp5 + tmp6;
            auto tmp8 = max_propagate_nan(tmp7, tmp6);
            auto tmp9 = static_cast<float>(255.0);
            auto tmp10 = min_propagate_nan(tmp8, tmp9);
            auto tmp11 = static_cast<unsigned char>(tmp10);
            out_ptr1[static_cast<long>(i0)] = tmp11;
        }
    }
}
''')

cpp_fused_dequantize_per_tensor_2 = async_compile.cpp('''
#include "/tmp/torchinductor_root/ib/cibrnuq56cxamjj4krp4zpjvsirbmlolpbnmomodzyd46huzhdw7.h"
extern "C" void kernel(const unsigned char* in_ptr0,
                       float* out_ptr0,
                       const long ks0)
{
    {
        for(long i0=static_cast<long>(0L); i0<static_cast<long>(16L*ks0); i0+=static_cast<long>(16L))
        {
            auto tmp0 = at::vec::Vectorized<uint8_t>::loadu_one_fourth(in_ptr0 + static_cast<long>(i0));
            auto tmp1 = at::vec::convert_uint8_to_float(tmp0);
            auto tmp2 = at::vec::Vectorized<float>(static_cast<float>(100.0));
            auto tmp3 = tmp1 - tmp2;
            auto tmp4 = at::vec::Vectorized<float>(static_cast<float>(0.0056716203689575195));
            auto tmp5 = tmp3 * tmp4;
            tmp5.store(out_ptr0 + static_cast<long>(i0));
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg8_1, arg9_1, arg10_1 = args
    args.clear()
    s0 = arg8_1
    s2 = arg9_1
    assert_size_stride(arg10_1, (s0, 3, s2, s2), (3*(s2*s2), s2*s2, s2, 1))
    buf0 = empty_strided((s0, 3, s2, s2), (3*(s2*s2), 1, 3*s2, 3), device='cpu', dtype=torch.uint8)
    cpp_fused_quantize_per_tensor_0(c_void_p(arg10_1.data_ptr()), c_void_p(buf0.data_ptr()), c_long(s0), c_long(s2))
    del arg10_1
    buf1 = torch.ops.onednn.qconv2d_pointwise(buf0, 0.024776775389909744, 97, constant5, constant2, constant3, constant0, [1, 1], [1, 1], [1, 1], 1, 95.88209060714476, 0, False, 'relu', [], '')
    assert_size_stride(buf1, (s0, 16, 1 + s2, 1 + s2), (16 + (16*(s2*s2)) + (32*s2), 1, 16 + (16*s2), 16))
    del buf0
    # Source Nodes: [quantize_per_tensor_default_2], Original ATen: [quantized_decomposed.quantize_per_tensor]
    buf2 = torch.ops.quantized.max_pool2d(buf1, [3, 3], [2, 2], [1, 1], [1, 1], False)
    del buf1
    buf3 = buf2
    assert_size_stride(buf3, (s0, 16, 1 + (s2 // 2), 1 + (s2 // 2)), (16 + (16*((s2 // 2)*(s2 // 2))) + (32*(s2 // 2)), 1, 16 + (16*(s2 // 2)), 16))
    del buf2
    buf4 = empty_strided((s0, 16, 1, 1), (16, 1, 16*s0, 16*s0), device='cpu', dtype=torch.float32)
    buf5 = empty_strided((s0, 16), (16, 1), device='cpu', dtype=torch.uint8)
    cpp_fused_dequantize_per_tensor_mean_quantize_per_tensor_1(c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf5.data_ptr()), c_long(s0), c_long(s2))
    del buf3
    buf6 = torch.ops.onednn.qlinear_pointwise(buf5, 0.005949148442596197, 0, constant6, constant4, constant3, constant1, 176.31645543014483, 100, False, 'none', [], '')
    assert_size_stride(buf6, (s0, 16), (16, 1))
    del buf5
    buf7 = reinterpret_tensor(buf4, (s0, 16), (16, 1)); del buf4  # reuse
    cpp_fused_dequantize_per_tensor_2(c_void_p(buf6.data_ptr()), c_void_p(buf7.data_ptr()), c_long(s0))
    return (buf7, )

```

**TestPlan**
```
python -m pytest test_mkldnn_pattern_matcher.py -k test_qconv2d_maxpool2d_linear_dynamic
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108550
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-09-19 08:30:16 +00:00
cyy
ac603bc2f8 [Reland] Eliminate invocations of c10::stoi,c10::stod,c10::stoull,c10::stoll (#109566)
This is reland of #87603 with definitions of c10::stoXX kept for further investigation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109566
Approved by: https://github.com/huydhn
2023-09-19 07:15:25 +00:00
2c1554a032 Make SymFloat behave symmetrically with float in torch.tensor (#109513)
Previously, SymFloat would force double precision.  That's wrong;
instead, we must respect default dtype.
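In other words, a symbolic float should now behave like a plain Python float under `torch.tensor`; roughly:

```python
import torch

torch.set_default_dtype(torch.float32)
# a plain Python float follows the default dtype...
assert torch.tensor(3.14).dtype == torch.float32
# ...and after this change a SymFloat input does too,
# instead of being forced to float64
```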

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109513
Approved by: https://github.com/voznesenskym
2023-09-19 01:52:41 +00:00
e8ab8c877d [exir] Add lift constant tensors passes after aten_to_edge (#109382)
Summary:
X-link: https://github.com/pytorch/executorch/pull/359

When exporting using enable_aot (through the torch.export path), we want to lift all constant tensors as buffers to the exported program. The ScalarToTensor pass in EXIR's aten_to_edge passes will create some constant tensors in the graph, so we will need to run a lift_constant_tensors pass afterwards.

Note that this only needs to be applied when exporting using the torch.export path because in the original path, nothing is lifted.

Test Plan: CI

Differential Revision: D49207492

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109382
Approved by: https://github.com/cccclai
2023-09-19 01:34:58 +00:00
0ec9f59f70 Loudly Error in dynamo bench if eager fails (#109536)
Helps debug https://github.com/pytorch/benchmark/issues/1901

I will wait until the ONNX beartype sev is fixed before merging

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109536
Approved by: https://github.com/xuzhao9
2023-09-19 00:40:42 +00:00
98208e5160 [export] Update deserialized FakeTensorMode/ShapeEnv with same configs as export (#109522)
Summary: Deserialized FakeTensorMode/ShapeEnv should have the same configs as export: https://fburl.com/code/y7jxf5qw

Test Plan: CI

Differential Revision: D49377410

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109522
Approved by: https://github.com/zhxchen17
2023-09-19 00:34:30 +00:00
a44cf44067 improved type hints ScriptModule (#109535)
Added properties

- "code"
- "code_with_constants"
- "graph"
- "inlined_graph"
- "original_name"

with appropriate type hints to the `ScriptModule` stub, and removed them from the child class `RecursiveScriptModule`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109535
Approved by: https://github.com/ezyang
2023-09-19 00:13:15 +00:00
871b5caae7 Fix hpu deserialization bug (#109499)
# Motivation
fix an hpu deserialization bug. It should check for the hpu model if and only if the location starts with "hpu". Otherwise, it always raises an AssertionError when hpu is not imported, which breaks serialization/deserialization for other third-party backends like IPEX.

# Solution
only assert on the hpu model when the location starts with "hpu"
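A sketch of the intended guard (hypothetical helper names; the actual check lives in PyTorch's serialization code):

```python
def hpu_available() -> bool:
    # assumed package name for the hpu backend; illustration only
    try:
        import habana_frameworks.torch  # noqa: F401
        return True
    except ImportError:
        return False

def validate_location(location: str) -> None:
    # only assert hpu support when the tag actually starts with "hpu",
    # so other backends (e.g. IPEX) deserialize untouched
    if location.startswith("hpu"):
        assert hpu_available(), "hpu backend requested but not available"
```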

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109499
Approved by: https://github.com/ezyang
2023-09-19 00:10:51 +00:00
5b13f74e9b [export] Update how we input kwargs (#109160)
Previously, the code for passing inputs to exported program was:
```
if kwargs:
    return (args, kwargs)
else:
    return args
```

However, this causes some inconsistency where if the original input contains args and kwargs, the treespec would be a tuple containing a tuple of arguments, and a dictionary of keyword arguments. But if the original input only contained args, the treespec would just be a tuple of arguments. This inconsistency causes some inconveniences in the runtime.

So I updated the code to just always keep the kwargs around.
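i.e. roughly (a sketch of the revised shape, not the literal diff):

```python
def combine_inputs(args, kwargs):
    # always return the (args, kwargs) pair so the treespec has the
    # same shape whether or not kwargs is empty
    return (args, kwargs or {})
```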

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109160
Approved by: https://github.com/zhxchen17, https://github.com/avikchaudhuri
2023-09-19 00:04:32 +00:00
a6d34c60a1 Fixing searchsorted doc (#109364)
Removing ambiguous description

Fixes #109298

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109364
Approved by: https://github.com/colesbury
2023-09-18 23:12:53 +00:00
6f4b9cc9ab [export] Skip noop runtime assertion pass. (#109395)
Summary:
If no inline constraints were added, just return the original graph.
We want to do this because this pass sometimes messes up the node names;
until we actually fix that, we can make the behavior a bit less buggy
by skipping no-op passes.

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109395
Approved by: https://github.com/angelayi
2023-09-18 22:37:28 +00:00
550b0ec3d4 Release GIL around VariableInfo::zeros to avoid deadlocks (#109454)
See https://github.com/pytorch/pytorch/issues/109074#issue-1891369807 and https://github.com/pytorch/pytorch/issues/109074#issuecomment-1718825855
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109454
Approved by: https://github.com/albanD
2023-09-18 22:28:48 +00:00
0e2b22c451 [ONNX] switch from onnxscript-preview to onnxscript (#109139)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109139
Approved by: https://github.com/BowenBao, https://github.com/thiagocrepaldi
2023-09-18 22:24:47 +00:00
9863286abf [ROCM] Enable bwd cross_entropy on ROCM now that eps tolerance update (#109384)
Follow up to https://github.com/pytorch/pytorch/pull/109038

The fix in the PR above also fixes this test on rocm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109384
Approved by: https://github.com/jeffdaily, https://github.com/albanD
2023-09-18 22:20:38 +00:00
0bf30c140a [pytree] Use OpTree for PyTree manipulation (#93139)
Split from #92679. Use C++-based PyTree implementation.

## Highlights

1. High performance (20x speedup than the pure-Python implementation, 10%-20% overall speedup for `torch.fx`)
2. Multi-input tree-map support
3. Custom tree node registry with namespace isolation

Refs:

- #65761
- #91323
- #92679

From https://github.com/pytorch/pytorch/issues/65761#issuecomment-1334746366:

> ### 0. Out-of-box compatible with JAX's pytree, provides the same interfaces and functions (and more).
>
> ### 1. High performance: `optree`'s tree operations are comparably fast to JAX's pytree (~0.9x for `dict`s and ~2.5x for `OrderedDict`s), and 20x faster than `torch.utils._pytree`.
>
> `optree` implements some common Python container types in C++ (e.g., `OrderedDict`) and achieves 2.5x the performance of JAX's pytree. Check out section [Built-in PyTree Node Types](https://github.com/metaopt/optree#built-in-pytree-node-types) and [Benchmark](https://github.com/metaopt/optree#benchmark) for more details.
>
> | Module    | Nodes | OpTree (μs) | JAX XLA (μs) | PyTorch (μs) | DM-Tree (μs) | Speedup (J / O) | Speedup (P / O) | Speedup (D / O) |
> | :-------- | ----: | ----------: | -----------: | -----------: | -----------: | --------------: | --------------: | --------------: |
> | TinyMLP   |    53 |       26.40 |        68.19 |       586.87 |        34.14 |            2.58 |           22.23 |            1.29 |
> | AlexNet   |   188 |       84.28 |       259.51 |      2182.07 |       125.12 |            3.08 |           25.89 |            1.48 |
> | ResNet18  |   698 |      288.57 |       807.27 |      7881.69 |       429.39 |            2.80 |           27.31 |            1.49 |
> | ResNet34  |  1242 |      580.75 |      1564.97 |     15082.84 |       819.02 |            2.69 |           25.97 |            1.41 |
> | ResNet50  |  1702 |      791.18 |      2081.17 |     20982.82 |      1104.62 |            2.63 |           26.52 |            1.40 |
> | ResNet101 |  3317 |     1603.93 |      3939.37 |     40382.14 |      2208.63 |            2.46 |           25.18 |            1.38 |
> | ResNet152 |  4932 |     2446.56 |      6267.98 |     56892.36 |      3139.17 |            2.56 |           23.25 |            1.28 |
> | ViT-H/14  |  3420 |     1681.48 |      4488.33 |     41703.16 |      2504.86 |            2.67 |           24.80 |            1.49 |
> | Swin-B    |  2881 |     1565.41 |      4091.10 |     34241.99 |      1936.75 |            2.61 |           21.87 |            1.24 |
> |           |       |             |              |              |  **Average** |        **2.68** |       **24.78** |        **1.38** |
>
> <div align="center">
>   <img src="https://user-images.githubusercontent.com/16078332/200494435-fd5bb385-59f7-4811-b520-98bf5763ccf3.png" width="90%" />
> </div>
>
> ### 2. Namespace Isolation for the PyTree Type Registry
>
> In addition to the JAX's pytree registry for custom node type registration, `optree` adds `namespace` isolation to the registry. Users can register the same type multiple times for different flatten/unflatten behavior. It also provides module-level isolation for safety reasons. For example, you can add a unique prefix to your namespace to isolate your registry with other modules (e.g., `torch.xxx`, `torch.functorch.xxx`):
>
> ```python
> # Register a Python type into a namespace
> import torch
>
> optree.register_pytree_node(
>     torch.Tensor,
>     # (tensor) -> (children, metadata)
>     flatten_func=lambda tensor: (
>         (tensor.cpu().numpy(),),
>         dict(dtype=tensor.dtype, device=tensor.device, requires_grad=tensor.requires_grad),
>     ),
>     # (metadata, children) -> tensor
>     unflatten_func=lambda metadata, children: torch.tensor(children[0], **metadata),
>     namespace='torch.torch2numpy',
> )
> ```
>
> ```python
> >>> tree = {'weight': torch.ones(size=(1, 2)).cuda(), 'bias': torch.zeros(size=(2,))}
> >>> tree
> {'weight': tensor([[1., 1.]], device='cuda:0'), 'bias': tensor([0., 0.])}
>
> # Flatten without specifying the namespace
> >>> tree_flatten(tree)  # `torch.Tensor`s are leaf nodes
> ([tensor([0., 0.]), tensor([[1., 1.]], device='cuda:0')], PyTreeSpec({'bias': *, 'weight': *}))
>
> # Flatten with the namespace
> >>> leaves, treespec = optree.tree_flatten(tree, namespace='torch.torch2numpy')
> >>> leaves, treespec
> (
>     [array([0., 0.], dtype=float32), array([[1., 1.]], dtype=float32)],
>     PyTreeSpec(
>         {
>             'bias': CustomTreeNode(Tensor[{'dtype': torch.float32, 'device': device(type='cpu'), 'requires_grad': False}], [*]),
>             'weight': CustomTreeNode(Tensor[{'dtype': torch.float32, 'device': device(type='cuda', index=0), 'requires_grad': False}], [*])
>         },
>         namespace='torch.torch2numpy'
>     )
> )
>
> # `entries` are not defined and use `range(len(children))`
> >>> optree.tree_paths(tree, namespace='torch.torch2numpy')
> [('bias', 0), ('weight', 0)]
>
> # Unflatten back to a copy of the original object
> >>> optree.tree_unflatten(treespec, leaves)
> {'bias': tensor([0., 0.]), 'weight': tensor([[1., 1.]], device='cuda:0')}
> ```
>
> Check out section [Registering a Container-like Custom Type as Non-leaf Nodes](https://github.com/metaopt/optree#notes-about-the-pytree-type-registry) for more details.
>
> ### 3. Support both `None` as Non-leaf Node and `None` as Leaf
>
> In JAX's implementation, `None` is always an internal non-leaf node with an arity 0, which is like an empty tuple. This limits the usage of JAX's pytree utilities for PyTorch. For example, the `nn.Module` uses `_parameters` and `_buffers` (`OrderedDict[str, Optional[Tensor]]`) to hold the tensors, while the value can be a tensor or `None`.
>
> `optree` supports both `None` as Non-leaf Node (JAX's default) and `None` as Leaf (PyTorch's default). Check out section [None is Non-leaf Node vs. None is Leaf](https://github.com/metaopt/optree#none-is-non-leaf-node-vs-none-is-leaf) for more details.
>
> ### 4. Some other improvements and bug fixes
>
> 1. Adds in-place version of treemap (`tree_map_`), which reduces redundant unflatten operation for better performance.
> 2. Adds support for tree flatten and tree map with paths. (useful for `functorch` module extraction).
> 3. Improves the JAX's pytree sorting support for `dict`s.
> 4. Better string representation `repr(PyTreeSpec)`.
> 5. Fixes some bugs for JAX's pytree of hashing, pickle serialization, segmentation fault for infinite recursion, and tree-compose/tree-transpose.

From https://github.com/pytorch/pytorch/pull/92679#issuecomment-1398778481:

> ```python
> # pytree_make_fx_bench.py
> import time
>
> import torch
> from torch.fx.experimental.proxy_tensor import make_fx
>
> def f(x):
>     for _ in range(10000):
>         x = x + x
>     return x
>
> # Reset the timer before each run so every tracing mode is timed independently.
> begin = time.time()
> out = make_fx(f, tracing_mode="real")(torch.randn(20))
> print(f'tracing_mode="real" {time.time() - begin:.2f}')
>
> begin = time.time()
> out = make_fx(f, tracing_mode="fake")(torch.randn(20))
> print(f'tracing_mode="fake" {time.time() - begin:.2f}')
>
> begin = time.time()
> out = make_fx(f, tracing_mode="symbolic")(torch.randn(20))
> print(f'tracing_mode="symbolic" {time.time() - begin:.2f}')
> ```
>
> This seems to run around 10-20% faster with the optree implementation:
>
> ```
> # Optree
> python pytree_make_fx_bench.py
> tracing_mode="real" 0.00
> tracing_mode="fake" 6.32
> tracing_mode="symbolic" 27.13
> ```
>
> ```
> # torch.utils._pytree
> python pytree_make_fx_bench.py
> tracing_mode="real" 0.00
> tracing_mode="fake" 7.66
> tracing_mode="symbolic" 31.07
> ```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93139
Approved by: https://github.com/malfet
2023-09-18 21:24:56 +00:00
8a567bb59d [HigherOrderOp] Should automatically pop modes (#109157)
Fixes #108282

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109157
Approved by: https://github.com/zou3519
2023-09-18 20:54:09 +00:00
73ac814148 [Pytorch][quant] Move xnnpack quantizer to use aten.linear (#109254)
Summary:
Now that quantization works on pre-dispatch aten IR, moving to the full set
of aten ops is OK. Plus, when tracing models like ViT, the linear
projections of k, q, v use functional.linear rather than nn.Linear,
which results in not being able to extract the nodes corresponding to linear.

Test Plan:
quant tests

Differential Revision: [D49252194](https://our.internmc.facebook.com/intern/diff/D49252194)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109254
Approved by: https://github.com/jerryzh168
2023-09-18 20:26:44 +00:00
77d745666b Add TORCH_CHECK_ALWAYS_SHOW_CPP_STACKTRACE (#109373)
Unlike TORCH_CHECK, these always show a C++ stacktrace on error. Use them
for errors where you frequently need this information.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109373
Approved by: https://github.com/bdhirsh
ghstack dependencies: #109372
2023-09-18 19:46:32 +00:00
8a1bbf383d Out-of-line cannot call with symbolic error test (#109372)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109372
Approved by: https://github.com/bdhirsh
2023-09-18 19:46:32 +00:00
050c56d0a5 [dynamo][ci] Pin beartype to 0.15.0 (#109510)
CIs are failing because of https://github.com/beartype/beartype/issues/282

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109510
Approved by: https://github.com/thiagocrepaldi
2023-09-18 19:08:32 +00:00
4d44d8c00a Revert "Eliminate c10::stoi,c10::stod,c10::stoull,c10::stoll (#109179)"
This reverts commit 852f1b8417e80b72a7d1c4a772f66af28da02913.

Reverted https://github.com/pytorch/pytorch/pull/109179 on behalf of https://github.com/huydhn due to Sorry for reverting your change but this is breaking periodic buck build, so please fix the issue and reland the change https://github.com/pytorch/pytorch/actions/runs/6207458526/job/16852695272 ([comment](https://github.com/pytorch/pytorch/pull/109179#issuecomment-1724168571))
2023-09-18 18:41:12 +00:00
70ca3ee951 Revert "inductor: only do the conv+bn folding for the freezing path (#109270)"
This reverts commit c7017fff38e73210541124739ed9404492ddd68c.

Reverted https://github.com/pytorch/pytorch/pull/109270 on behalf of https://github.com/malfet due to Broke slow test, see c7017fff38 ([comment](https://github.com/pytorch/pytorch/pull/109270#issuecomment-1724132526))
2023-09-18 18:15:31 +00:00
5cd8a6d40a Enable typechecking for _inductor/fx_utils.py (#109415)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109415
Approved by: https://github.com/Skylion007
ghstack dependencies: #109269, #109347, #109335
2023-09-18 18:12:23 +00:00
fe452108fb Enable typechecking for _inductor/debug.py (#109335)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109335
Approved by: https://github.com/eellison
ghstack dependencies: #109269, #109347
2023-09-18 18:12:23 +00:00
9172c9f03f Fix spelling / capitalization in freezing.py error message (#109347)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109347
Approved by: https://github.com/eellison
ghstack dependencies: #109269
2023-09-18 18:12:20 +00:00
bab627073a Enable typechecking for _inductor/freezing.py (#109269)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109269
Approved by: https://github.com/eellison
2023-09-18 18:12:18 +00:00
282aa26764 Update the instruction to enable dynamo logs (#109409)
```
   torch._dynamo.config.log_level = logging.INFO
   torch._dynamo.config.output_code = True
```

These settings were replaced with the module-level log control introduced in https://github.com/pytorch/pytorch/pull/94858.
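For reference, a minimal sketch of the replacement (assuming the `torch._logging.set_logs` API and `TORCH_LOGS` environment variable from that PR; option names may vary by version):

```python
import logging

import torch

# Module-level log control replaces the old torch._dynamo.config flags.
torch._logging.set_logs(dynamo=logging.INFO, output_code=True)

# Equivalently, set an environment variable before launching Python:
#   TORCH_LOGS="dynamo,output_code" python my_script.py
```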
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109409
Approved by: https://github.com/msaroufim
2023-09-18 17:49:40 +00:00
a399f839ac Revert "Add PR number to metrics when available (#109406)"
This reverts commit f0fb4b3897e9cac3b99ee5b9b2ecab255e9e2da3.

Reverted https://github.com/pytorch/pytorch/pull/109406 on behalf of https://github.com/ZainRizvi due to breaks trunk ([comment](https://github.com/pytorch/pytorch/pull/109406#issuecomment-1724061024))
2023-09-18 17:35:37 +00:00
05c31b3b69 typo in DispatchKeySet.h (#109431)
Fixes #108641

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109431
Approved by: https://github.com/Skylion007
2023-09-18 17:34:36 +00:00
cddeceb6b6 [inductor] scale down RBLOCK for occupancy (#109275)
For large reductions (with large xnumel and rnumel), we potentially need to run a large number of thread blocks. Occupancy matters here: with larger occupancy we can run more blocks on each SM and may need fewer waves to run the entire kernel on the GPU. The number of registers used by each thread can limit occupancy. For A100, it is safe to say that register usage does not limit occupancy only if each thread uses <= 32 registers. This PR leverages that observation and reduces RBLOCK (thus reducing the registers used by each thread) when register usage would limit occupancy for large reductions.
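A rough sketch of the heuristic (names, threshold, and scaling rule are illustrative assumptions, not the actual inductor code):

```python
MAX_REGS_FULL_OCCUPANCY = 32  # assumed A100 budget: <=32 regs/thread keeps occupancy unconstrained

def scale_down_rblock(rblock, regs_per_thread, xnumel, rnumel):
    # Occupancy only matters for large reductions that need many thread blocks.
    if xnumel * rnumel < 2**20:  # hypothetical "large reduction" threshold
        return rblock
    # Halve RBLOCK until the estimated per-thread register use fits the budget;
    # fewer reduction elements per thread roughly means fewer live registers.
    while rblock > 1 and regs_per_thread > MAX_REGS_FULL_OCCUPANCY:
        rblock //= 2
        regs_per_thread //= 2
    return rblock
```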

The scenario mentioned can happen for the softmax kernel used in transformers. Here are some results obtained from a devgpu:
- PLBartForCausalLM we improve from 1.88x (58.7ms) to 2.00x (55.82ms)
- TrOCRForCausalLM we improve from 1.45x (92.9ms) to 1.51x (89.12ms)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109275
Approved by: https://github.com/jansel
2023-09-18 17:29:30 +00:00
cyy
5d5990fc49 Remaining replacement of c10::stoi with std::stoi (#109482)
PR #109179 replaced c10::stoi with std::stoi. However, some files were missed; this patch fixes them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109482
Approved by: https://github.com/vfdev-5, https://github.com/Skylion007
2023-09-18 16:05:09 +00:00
6ffa59031a [inductor] Fix CudaStreamGuard in AOTInductor ABI compatible mode (#109471)
Summary: Use an RAII class to wrap at::cuda::CUDAStreamGuard. The previous implementation didn't follow the exact CUDAStreamGuard behavior.

Test Plan: CI

Differential Revision: D49355542

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109471
Approved by: https://github.com/chenyang78
2023-09-18 15:54:58 +00:00
d2ca5fa6c5 [lintrunner] Capture mypy internal error (#109421)
Mypy internal errors are reported to stderr rather than stdout and do not contain a column number.

This should prevent internal errors from creeping into the code and occluding other legitimate errors.

Test plan: Checkout 5cd861fcf7 apply this change and see `lintrunner` run to report internal error

Fixes https://github.com/pytorch/pytorch/issues/104940

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109421
Approved by: https://github.com/Skylion007
2023-09-18 15:48:14 +00:00
591c01995b Add CONDA_CMAKE=yes for all ROCm docker configs (#109334)
This ensures all BUILD_ENVIRONMENT ROCm configs will have LAPACK/MKL support enabled due to using conda cmake. This should have no impact on pytorch/pytorch CI builds though, since those do not fall in the catch-all condition.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109334
Approved by: https://github.com/pruthvistony, https://github.com/kit1980
2023-09-18 15:08:06 +00:00
88600e7d2e [RELAND] Force synced KJT to trace unbacked SymInt (#108960) (#109216)
Summary:

The basic concept behind this diff is to modify Dynamo's tracing behavior when it encounters a KeyedJaggedTensor that is synced (aka has `_length_per_key` and `_offset_per_key` populated). These fields are lists of integers; ordinarily, Dynamo will optimistically try to specialize on integers; however, for KJTs, we know that these integers will definitely vary from run to run. Furthermore, we would ordinarily also specialize these integers if they are 0/1, but we frequently expect features in KJTs to be 0/1.

The fix is to detect KJTs and treat these integers as *unbacked integers*. This is NOT a universally sound optimization: when treating these integers as unbacked, we never report them as equal to zero or one. In return, we always generate graphs that generalize no matter the length of values on features. This is enough to trace through APS sparse arch, torchrec_dlrm and some small split-cat examples.

The special integer behavior is triggered by a dynamically scoped `force_unspec_int_unbacked_size_like` variable on TracingContext, which we trigger when we wrap a KJT. There probably are other ways to do this, but this was simple and worked.

Test Plan:
```
buck2 test mode/dev-nosan //pytorch/benchmark/fb/test_gpu:run_test_gpu
```

from aakhundov

1. first build feed_lower_benchmark:
```
buck2 build --show-output mode/opt -c python.package_style=inplace -c fbcode.enable_gpu_sections=true -c fbcode.platform=platform010 -c fbcode.split-dwarf=true hpc/new/models/feed/benchmark:feed_lower_benchmark
```
2. then run the lowering of the model with it:
```
TORCHINDUCTOR_MAX_AUTOTUNE=1 TORCHINDUCTOR_UNIQUE_KERNEL_NAMES=1 TORCH_LOGS="output_code,graph_code" TORCH_COMPILE_DEBUG=1 ../buck-out/v2/gen/fbcode/79c6b019ee0f9469/hpc/new/models/feed/benchmark/__feed_lower_benchmark__/feed_lower_benchmark.par --load=manifold://ig_inference_model/tree/user/facebook/fblearner/predictor/960999465/60/gpu_lowering/input.predictor --skip-trt --skip-ait --sync-mode=0 --enable-aot-inductor --lower-presets="ig_stories" --gpu-trace
```
cf https://docs.google.com/document/d/1yD30xYrdmM8r2HTdmXnZTg0-MHVexfVrAa0294m1AUE/edit?pli=1#heading=h.qiv3fp7e6zg0

From torchrec: https://www.internalfb.com/intern/wiki/Torchrec/Development/Testing_production_models/

From ge0405
baseline (without your diff): f477293168
your diff: f477292363

```
buck2 test //caffe2/test/dynamo:test_dynamo_torchrec
buck2 run 'fbcode//mode/opt' fbcode//pytorch/benchmark/fb/test_gpu:run_test_gpu -- 'pytorch.benchmark.fb.test_gpu.test_gpu.TestBenchmarkFbGpu.test_train_blue_reels_vdd_v3_inductor_speedup'
```

Differential Revision: D49236757

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109216
Approved by: https://github.com/voznesenskym
2023-09-18 14:39:44 +00:00
1a361e4e9f [inductor] realize_into should not alias src and dst (#108126)
Fixes #107995

In the reproducer we have the fx graph:
```python
class <lambda>(torch.nn.Module):
    def forward(self, arg0_1: f32[1]):
        # File: <ipython-input-1-5f62cb746ad5>:10, code: return self.layer1(inputs)
        gt: b8[1] = torch.ops.aten.gt.Scalar(arg0_1, 0)
        mul: f32[1] = torch.ops.aten.mul.Tensor(arg0_1, 5.2955089)
        where: f32[1] = torch.ops.aten.where.self(gt, arg0_1, mul);  gt = mul = None

        # No stacktrace found for following nodes
        copy_: f32[1] = torch.ops.aten.copy_.default(arg0_1, where);  arg0_1 = None
        return (where,)
```

The `where` node is both copied into `arg0_1` and returned as the output of the
function. Currently `realize_into` converts `where`'s storage into a
`MutationLayout` of `arg0_1`, for which no tensor named `buf0` is allocated.

This is incorrect, as `where` and `arg0_1` shouldn't share storage. It also
breaks the wrapper code generation, which references `buf0` directly in the
return but never allocates a `buf0`.

This issue only appears for size zero tensors, because otherwise the `src`
buffer becomes a user of `arg0_1` which forces this copy to happen anyway.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108126
Approved by: https://github.com/jansel
2023-09-18 14:16:43 +00:00
fc47ba2794 [Decomposition] clamp_min (#108717)
Summary:
Decomp already exists so just add it to core_aten_decompositions

https://www.internalfb.com/code/fbsource/[abda43a5a268e83fef6d62b49531a390ce915ad2]/fbcode/caffe2/torch/_refs/__init__.py?lines=1846

Differential Revision: D48880080

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108717
Approved by: https://github.com/SherlockNoMad
2023-09-18 12:43:58 +00:00
a6d4cca7c0 [Decomposition] unsafe_split.Tensor (#108544)
Summary:
Include decomp in core_aten_decompositions

Decomp already exists

https://www.internalfb.com/code/fbsource/[03ff511cad587fc27ed8fd6a54b87845246e8e0c]/fbcode/caffe2/torch/_decomp/decompositions.py?lines=1209

Test Plan: OSS + Phabricator Tests

Differential Revision: D48940445

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108544
Approved by: https://github.com/larryliu0820, https://github.com/SherlockNoMad
2023-09-18 12:43:07 +00:00
af93b29c5e [Decomposition] std.correction (#108733)
Summary:
Include decomp in core_aten_decompositions

Decomp:
https://www.internalfb.com/code/fbsource/[e69bf00ff87a55c9a30bd7905881661ff05fa211]/fbcode/caffe2/torch/_refs/__init__.py?lines=2398

Differential Revision: D48940402

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108733
Approved by: https://github.com/larryliu0820, https://github.com/SherlockNoMad
2023-09-18 11:38:23 +00:00
a683bc54fd move transform_bias_rescale_qkv vectorized code to cpu sub folder (#109095)
`at::vec::Vectorized<>` will not be properly vectorized under `aten/src/ATen/native/transformers`; move the vectorized code to `aten/src/ATen/native/cpu`, where the macros `CPU_CAPABILITY_AVX2`, `CPU_CAPABILITY_AVX512`, etc. are defined.

Here is the vtune log before and after this patch on `transform_bias_rescale_qkv_cpu`
1. before:
![transformer_bioas_rescale_qkv_before](https://github.com/pytorch/pytorch/assets/20233731/582f6873-d86e-47a6-bd2a-620b97acc5b1)
2. after:
![transformer_bioas_rescale_qkv_after](https://github.com/pytorch/pytorch/assets/20233731/949004ab-3cbc-4a1d-a03d-9a17efa981ae)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109095
Approved by: https://github.com/jgong5, https://github.com/lezcano
2023-09-18 08:40:06 +00:00
f0fb4b3897 Add PR number to metrics when available (#109406)
### <samp>🤖 Generated by Copilot at 780bfa6</samp>

Add a new metric for pull request number in `tools/stats/upload_metrics.py`. This allows tracking the CI performance of pull requests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109406
Approved by: https://github.com/kit1980, https://github.com/malfet, https://github.com/clee2000
2023-09-18 03:17:54 +00:00
9e86a093e4 add torch.device to python type (#108116)
Fixes #107856

This PR adds a torch.device instance check in the python_type method for torch variables in dynamo.

@ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108116
Approved by: https://github.com/msaroufim, https://github.com/ezyang
2023-09-18 02:20:30 +00:00
6d725e7d66 [BE]: enable ruff rules PLR1722 and PLW3301 (#109461)
Enables two ruff rules derived from pylint:
* PLR1722 replaces any exit() calls with sys.exit(). exit() is only designed to be used in REPL contexts and may not always be available by default; this rule always uses the version in the sys module, which is better.
* PLW3301 replaces nested min/max calls with simplified versions (i.e. `min(a, min(b, c))` => `min(a, b, c)`). The new version is more idiomatic and more efficient.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109461
Approved by: https://github.com/ezyang
2023-09-18 02:07:21 +00:00
cyy
a9a0f7a4ad Build CUDA image for lintrunner (#109456)
Following the recent work, it is necessary to add CUDA files to the Docker container so that we can lint CUDA code in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109456
Approved by: https://github.com/malfet
2023-09-18 02:05:17 +00:00
0cae3b5df5 Revert "[PyTorch] Add Expanded call stack to nodes (#108426)" (#109468)
This reverts commit c657d9ecc555facb18cb0eecd8ffe15141394aa1. https://github.com/pytorch/pytorch/pull/108426

The diff got reverted internally via a backout diff without getting exported to github.

Do not import this PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109468
Approved by: https://github.com/kit1980
2023-09-17 23:46:20 +00:00
f9e72acc8f Guard default dtype in torchdynamo (#109459)
Fixes https://github.com/pytorch/pytorch/issues/109458

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109459
Approved by: https://github.com/ezyang
2023-09-17 22:51:33 +00:00
71420a98ab Revert "Remove c10::either (#109299)"
This reverts commit 9d297cc77374f3a7e4bddd9b04fbe7ed64b838be.

Reverted https://github.com/pytorch/pytorch/pull/109299 on behalf of https://github.com/clee2000 due to sorry but there are a few internal usages and when I tried swapping them out, I got some errors.  I will get someone to look at them on Monday ([comment](https://github.com/pytorch/pytorch/pull/109299#issuecomment-1722579387))
2023-09-17 22:05:47 +00:00
525e4f42d0 Revert "replace torch::make_unique with std::make_unique (#108866)"
This reverts commit 03e35efbf733da28d9e1c5a4b1b203fe335b5f94.

Reverted https://github.com/pytorch/pytorch/pull/108866 on behalf of https://github.com/clee2000 due to Sorry but I found more usages of `torch::make_unique` internally, I can go change all of these, but I'd prefer if that gets done before this gets merged ([comment](https://github.com/pytorch/pytorch/pull/108866#issuecomment-1722577925))
2023-09-17 21:57:30 +00:00
07f2efa285 Revert "[HigherOrderOp] Should automatically pop modes (#109157)"
This reverts commit f03b8abd4706e53b3fb6aefbd4304884e537616d.

Reverted https://github.com/pytorch/pytorch/pull/109157 on behalf of https://github.com/clee2000 due to broke internal builds D49346922 ([comment](https://github.com/pytorch/pytorch/pull/109157#issuecomment-1722571262))
2023-09-17 21:19:52 +00:00
49b18ae546 Revert "python functionalization: add helpers, functionalize_sync and mirror_autograd_meta (#107917)"
This reverts commit 0ad595954a1766f26aa55b0f72814d55865bb1dc.

Reverted https://github.com/pytorch/pytorch/pull/107917 on behalf of https://github.com/clee2000 due to breaking internal builds D49346637 ([comment](https://github.com/pytorch/pytorch/pull/107917#issuecomment-1722566885))
2023-09-17 20:57:41 +00:00
17193faf1a Revert "Created nested utils.cpp (#109304)"
This reverts commit 924723bda7e3b7dfba1612027ecd3f7af10fb449.

Reverted https://github.com/pytorch/pytorch/pull/109304 on behalf of https://github.com/clee2000 due to sorry but this is breaking internal builds due to the new header file D49346814 ([comment](https://github.com/pytorch/pytorch/pull/109304#issuecomment-1722561958))
2023-09-17 20:32:49 +00:00
c8e4e08c8d [inductor] Forward fix a windows test error (#109449)
Summary: forward fix a windows test error from https://github.com/pytorch/pytorch/pull/109391

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109449
Approved by: https://github.com/mikekgfb
2023-09-17 19:54:21 +00:00
cyy
75b954b715 [4/N] Enable clang-tidy in torch/csrc/autograd (#109455)
The PR enables clang-tidy checks in torch/csrc/autograd.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109455
Approved by: https://github.com/Skylion007
2023-09-17 17:11:50 +00:00
c7017fff38 inductor: only do the conv+bn folding for the freezing path (#109270)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109270
Approved by: https://github.com/eellison
2023-09-17 12:36:49 +00:00
cyy
51d2d825ab [3/N] apply clang-tidy in torch/csrc/autograd (#109368)
This PR applies clang-tidy fixes in torch/csrc/autograd/FunctionsManual.cpp. There are also other fixes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109368
Approved by: https://github.com/Skylion007
2023-09-17 07:26:59 +00:00
d8da2a7c85 Switch to CUDA event based profiling (#109338)
In https://github.com/pytorch/pytorch/pull/107901, the CUDA event based
profiling is changed to profiler based profiling to avoid counting CPU-side
kernel launch overhead in final latency numbers. However, it turns out that
torch.profile() is significantly slower than CUDA events, which affects model
compilation speed quite significantly. This PR changes back to CUDA event
based profiling.
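For reference, a minimal sketch of CUDA event based timing with the public `torch.cuda.Event` API (a generic harness, not the exact profiling code in this PR):

```python
import torch

def cuda_event_time_ms(fn, *args, warmup=3, iters=10):
    # Warm up so compilation/caching costs do not pollute the measurement.
    for _ in range(warmup):
        fn(*args)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()  # wait until both events have completed
    return start.elapsed_time(end) / iters  # average milliseconds per iteration
```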

Follow-ups:
* Try CUDA event profiling with CUDAGraphs;
* Multi-GPU profiling;

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109338
Approved by: https://github.com/frank-wei
2023-09-17 06:04:41 +00:00
cyy
92b0db2967 Don't find MKL if it isn't used (#109426)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109426
Approved by: https://github.com/Skylion007
2023-09-17 03:39:39 +00:00
cyy
6b1a15d1bb Eliminate c10::guts::make_unique_base (#109429)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109429
Approved by: https://github.com/Skylion007
2023-09-17 00:04:09 +00:00
4e4314da7f [dynamo] remove DummyGlobalSource (#109411)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109411
Approved by: https://github.com/ezyang
2023-09-16 23:11:11 +00:00
9a95b4bc7b [dtensor] quick fix to #109306 (#109428)
Looks like the op argument schema type check is not reliable: for ops like
aten.div.Tensor(Tensor, Tensor), the second argument can still be
a float/scalar for some reason. Switch to checking the instance type
directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109428
Approved by: https://github.com/awgu, https://github.com/fegin
2023-09-16 20:53:55 +00:00
f15adf204b [BE]: Replace undocumented constant in logging (#109434)
Replaces the undocumented alias with the proper constant WARNING

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109434
Approved by: https://github.com/ezyang
2023-09-16 20:17:32 +00:00
0aedacb4f7 [2D][1/N] Add _enable_extension to fsdp state (#109242)
Add _enable_extension to fsdp state. We will use this to determine whether we should enable the extension or not.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109242
Approved by: https://github.com/fegin
2023-09-16 19:03:10 +00:00
322bf50dbe [2D][2/N][DeviceMesh] Add get_parent_mesh_dim() in _MeshEnv class (#109330)
Adding some additional APIs that are needed for 2D workflow.

Since each parallelism is only aware of its own mesh when we are constructing the 2D state_dict, we need to know the mesh_dim of the child mesh within the parent mesh, so we can use it to create a DTensor that is 2D-sound.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109330
Approved by: https://github.com/fegin, https://github.com/fduwjj, https://github.com/wanchaol
2023-09-16 19:03:04 +00:00
b275a902d3 Small type hint fix (#109414)
# Summary
Adds these types to the type hint list for better IDE experience

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109414
Approved by: https://github.com/Skylion007
2023-09-16 18:46:46 +00:00
247e2f8461 [BE]: Update ruff to v0.0.290 (#109435)
Updates our ruff linter to the latest and fixes a few false negatives along the way.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109435
Approved by: https://github.com/ezyang
2023-09-16 18:43:34 +00:00
0f646b1d15 [inductor] Add a C shim layer for libtorch (#109391)
Summary:
This PR adds a limited C shim layer for libtorch. The ultimate goal is to ban any direct reference to aten/c10 data structures or functions, to avoid ABI breakage by providing stable C interfaces.

To make the review and landing easier, we broke the changes into several steps. In this PR (a combination of https://github.com/pytorch/pytorch/pull/109022 and https://github.com/pytorch/pytorch/pull/109351), we add C interfaces for certain libtorch functions and modify the wrapper codegen to generate calls to those interfaces. There are a few other items to be addressed in future PRs:

* The AOTInductor runtime interface still takes lists of aten tensors as input and output
* The interaction with ProxyExecutor (general fallback support) needs to move away from aten tensor
* Remove all references to aten/c10 headers in the AOTInductor-generated code

Differential Revision: D49302669

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109391
Approved by: https://github.com/chenyang78
2023-09-16 16:46:26 +00:00
d860313903 Improve can't call type() error message (#109378)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109378
Approved by: https://github.com/Skylion007
2023-09-16 16:12:58 +00:00
58bdc63dd6 [inductor] Remove a bunch of check_gradient=False in opinfo tests (#109417)
Despite what the comments say, they do not seem to segfault nor cause
CUDA errors any more.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109417
Approved by: https://github.com/lezcano
ghstack dependencies: #109359, #109416
2023-09-16 13:31:05 +00:00
1e4f2b576d Have inductor tests call output_process_fn_grad (#109416)
This is similar to what's done in test_ops.py.

Fixes #109353.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109416
Approved by: https://github.com/lezcano
ghstack dependencies: #109359
2023-09-16 13:31:05 +00:00
7f3885137f Add meta function for _segment_reduce (#109359)
This fixes numerous tests which were xfailing. For instance, the
`_segment_reduce.lengths` OpInfo test, which was previously relying on
the fallback kernel to determine the shape of the meta tensor. The
fallback kernel would fail with

    segment_reduce(): Expected all rows of lengths along axis to sum to data.size(lengths.dim()-1) when !unsafe.

as it was trying to read the values of a meta tensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109359
Approved by: https://github.com/ezyang
2023-09-16 13:31:03 +00:00
55c19a3c6d Inductor: Increase multiplier to 3 for Inductor AMP benchmark correctness check (#109097)
**Summary**
As reported in https://github.com/pytorch/pytorch/issues/108333, we found that some of the models fail the benchmark's correctness check. However, each model's end-to-end accuracy ([test script](https://gist.github.com/leslie-fang-intel/aac8b3c2b450532fd0517c758bb845e0)) when comparing AMP with FP32 is within a difference of less than 0.1%, so the correctness-check failures for these models are likely false alarms. We use a multiplier of 3 instead of 2 in this PR to avoid these false alarms. The model end-to-end accuracy test results are:

| Model (SPR) | FP32 Imperative TOP1 Accuracy | FP32 Imperative TOP5 Accuracy | BF16 AMP Inductor TOP1 Accuracy | BF16 AMP Inductor TOP5 Accuracy | BF16/FP32 Relative Loss TOP1 Accuracy | BF16/FP32 Relative Loss TOP5 Accuracy |
| -- | -- | -- | -- | -- | -- | -- |
| gluon_inception_v3 | 73.262 | 90.774 | 73.256 | 90.802 | -0.01% | 0.03% |
| mobilenetv2_100 | 72.89 | 90.996 | 72.826 | 90.946 | -0.09% | -0.05% |
| mobilenetv3_large_100 | 75.72 | 92.55 | 75.764 | 92.554 | 0.06% | 0.00% |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109097
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-09-16 10:02:56 +00:00
cyy
7014ef0f43 Eliminates c10::guts::array (#109423)
Follow the work of #106810

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109423
Approved by: https://github.com/Skylion007
2023-09-16 09:18:03 +00:00
b03ef1d969 [Dynamo] Fix numpy error in test_numpy_torch_operators (#109087)
When you in-place matmul two one-dimensional numpy arrays, numpy=="1.24.3" gives
```
TypeError: In-place matrix multiplication is not (yet) supported. Use 'a = a @ b' instead of 'a @= b'.
```
but numpy=="1.25.2" gives
```
ValueError: inplace matrix multiplication requires the first operand to have at least one and the second at least two dimensions.
```

This diff makes it so that newer versions of numpy do not fail on this test, because we do not catch ValueError.

An alternative solution would be to update the test cases to be 2 dimensional, but that would have impact on other operators being tested.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109087
Approved by: https://github.com/jansel
2023-09-16 07:37:07 +00:00
cyy
852f1b8417 Eliminate c10::stoi,c10::stod,c10::stoull,c10::stoll (#109179)
We can remove these functions in favor of std ones.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109179
Approved by: https://github.com/colesbury
2023-09-16 07:22:50 +00:00
393fe9339a Back out "Revert D49107540: [pytorch][PR] split by tag" (#109332)
Summary:
Original commit changeset: 6391a068640b

Original Phabricator Diff: D49107540

Test Plan: same as D49107540

Differential Revision: D49297522

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109332
Approved by: https://github.com/842974287
2023-09-16 05:29:16 +00:00
cyy
7bce7f50f3 Add torchgen path in gen_vulkan_spy (#108980)
Fixes the CMake building error
```
    from torchgen.code_template import CodeTemplate
ModuleNotFoundError: No module named 'torchgen'
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108980
Approved by: https://github.com/ezyang
2023-09-16 04:09:56 +00:00
706d8e2230 [dynamo] Respect shape dynamism of SymInt sized tensor (#109331)
Before this PR, if we run the following code:
```python
def true_fn(x):
    return x - x.cos()

def false_fn(x):
    return x + x.sin()

def foo(x):
    return cond(x.shape[0] == 4, true_fn, false_fn, [x])
gm = make_fx(foo, tracing_mode='symbolic')(torch.ones(3, 4))
gm = make_fx(foo, tracing_mode='symbolic')(torch.ones(4, 5))
```
we'll have the following error:
```python
Traceback (most recent call last):
  File "/home/yidi/local/pytorch/make_fx.py", line 16, in <module>
    gm = make_fx(foo, tracing_mode='symbolic')(torch.ones(4, 5))
  File "/home/yidi/local/pytorch/torch/fx/experimental/proxy_tensor.py", line 841, in wrapped
    t = dispatch_trace(wrap_key(func, args, fx_tracer, pre_dispatch), tracer=fx_tracer, concrete_args=tuple(phs))
  File "/home/yidi/local/pytorch/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/eval_frame.py", line 397, in _fn
    return fn(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/fx/experimental/proxy_tensor.py", line 461, in dispatch_trace
    graph = tracer.trace(root, concrete_args)
  File "/home/yidi/local/pytorch/torch/_dynamo/eval_frame.py", line 397, in _fn
    return fn(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/fx/_symbolic_trace.py", line 817, in trace
    (self.create_arg(fn(*args)),),
  File "/home/yidi/local/pytorch/torch/fx/experimental/proxy_tensor.py", line 497, in wrapped
    out = f(*tensors)
  File "/home/yidi/local/pytorch/make_fx.py", line 13, in foo
    return control_flow.cond(x.shape[0] == 4, true_fn, false_fn, [x])
  File "/home/yidi/local/pytorch/torch/_higher_order_ops/cond.py", line 151, in cond
    return torch.compile(cond_op, backend="eager", fullgraph=True)(
  File "/home/yidi/local/pytorch/torch/_dynamo/eval_frame.py", line 397, in _fn
    return fn(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/eval_frame.py", line 545, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state)
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 140, in _fn
    return fn(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 380, in _convert_frame_assert
    return _compile(
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 561, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/home/yidi/local/pytorch/torch/_dynamo/utils.py", line 189, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 483, in compile_inner
    out_code = transform_code_object(code, transform)
  File "/home/yidi/local/pytorch/torch/_dynamo/bytecode_transformation.py", line 1028, in transform_code_object
    transformations(instructions, code_options)
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 432, in transform
    tracer = InstructionTranslator(
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 2032, in __init__
    self.symbolic_locals = collections.OrderedDict(
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 2035, in <genexpr>
    VariableBuilder(
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/builder.py", line 229, in __call__
    vt = self._wrap(value).clone(**self.options())
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/builder.py", line 374, in _wrap
    return type_dispatch(self, value)
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/builder.py", line 808, in wrap_listlike
    output = [
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/builder.py", line 809, in <listcomp>
    VariableBuilder(self.tx, GetItemSource(self.get_source(), i))(
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/builder.py", line 229, in __call__
    vt = self._wrap(value).clone(**self.options())
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/builder.py", line 374, in _wrap
    return type_dispatch(self, value)
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/builder.py", line 808, in wrap_listlike
    output = [
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/builder.py", line 809, in <listcomp>
    VariableBuilder(self.tx, GetItemSource(self.get_source(), i))(
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/builder.py", line 229, in __call__
    vt = self._wrap(value).clone(**self.options())
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/builder.py", line 374, in _wrap
    return type_dispatch(self, value)
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/builder.py", line 1040, in wrap_tensor
    tensor_variable = wrap_fx_proxy(
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/builder.py", line 1267, in wrap_fx_proxy
    return wrap_fx_proxy_cls(
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/builder.py", line 1382, in wrap_fx_proxy_cls
    example_value = wrap_to_fake_tensor_and_record(
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/builder.py", line 1652, in wrap_to_fake_tensor_and_record
    dynamic_dims, constraint_dims = _automatic_dynamic(
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/builder.py", line 1550, in _automatic_dynamic
    if dim is not None and e.size()[i] != dim:
  File "/home/yidi/local/pytorch/torch/__init__.py", line 352, in __bool__
    return self.node.bool_()
  File "/home/yidi/local/pytorch/torch/fx/experimental/symbolic_shapes.py", line 1019, in bool_
    return self.guard_bool("", 0)
  File "/home/yidi/local/pytorch/torch/fx/experimental/symbolic_shapes.py", line 1001, in guard_bool
    r = self.shape_env.evaluate_expr(self.expr, self.hint, fx_node=self.fx_node)
  File "/home/yidi/local/pytorch/torch/fx/experimental/recording.py", line 227, in wrapper
    return fn(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/fx/experimental/symbolic_shapes.py", line 3793, in evaluate_expr
    assert orig_expr == hint, f"{orig_expr} != {hint}"
AssertionError: False != True

from user code:

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
```

This happens because we record the SymInt in the frame state in _automatic_dynamic the first time we compile the function. The second time, when we are given a SymInt-sized input with different hints, the comparison fails.

Implementation:
This PR derives shape dynamism from the dynamism of the inputs: if a dimension is a SymInt, return DYNAMIC, else return STATIC.
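A minimal sketch of that rule (illustrative only, not the actual dynamo code):

```python
import torch

def dim_dynamism(t):
    # A dimension already backed by a SymInt is dynamic; plain ints stay static.
    return ["DYNAMIC" if isinstance(s, torch.SymInt) else "STATIC" for s in t.size()]
```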

Test Plan:
Add a test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109331
Approved by: https://github.com/ezyang
2023-09-16 02:56:53 +00:00
fb58a72d96 Use torch.cumsum instead of numpy one (#109400)
`s/list(numpy.cumsum(foo))/torch.cumsum(torch.tensor(foo), 0).tolist()/`
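For illustration, the two forms agree:

```python
import numpy
import torch

foo = [1, 2, 3]
# Both produce [1, 3, 6].
assert list(numpy.cumsum(foo)) == torch.cumsum(torch.tensor(foo), 0).tolist()
```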

Test plan: ` python3 ../test/inductor/test_split_cat_fx_passes.py -v`

Partially addresses https://github.com/pytorch/pytorch/issues/109387

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109400
Approved by: https://github.com/ezyang
2023-09-16 02:52:49 +00:00
4ee179c952 Fix ConstantVariable init method if NumPy is missing (#109388)
By adding `np is not None` check before `isinstance(value, np.number)`

Partially addresses https://github.com/pytorch/pytorch/issues/109387

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109388
Approved by: https://github.com/ezyang
2023-09-16 00:07:19 +00:00
b904432e82 [dynamo] preserve some FX node metadata of GraphModules (#107067)
Requested from @tugsbayasgalan: we want dynamo to preserve some FX node metadata when we trace `GraphModule`s (`nn_module_stack`, `source_fn`, `stack_trace`). This is helpful for the case when we export an aten-level `GraphModule`, add some (possibly non-torch or non-aten) ops, and we want to transform the graph back into an aten-level graph. Without preserving metadata, future passes that look at metadata (e.g. quantization passes) won't work.

This feature also has the additional benefit of being able to preserve origin line of code when `print_readable`'ing a `GraphModule`. This is helpful when debugging graphs that have passed through dynamo several times.

The added unit test demonstrates the added functionality of this PR.

~This PR is currently a proof-of-concept implementation that shows that preserving node metadata across dynamo is possible.~ This PR preserves node metadata across dynamo by doing the following:
- ~inject a counter variable into the `GraphModule` source code, which is incremented every time a node is run~
- Construct a line number -> node index map in `GraphModule` as the source code is being generated.
- pass a list of node metadata and the line number map to dynamo's bytecode analyzer
- ~dynamo traces the counter as a `ConstantVariable`, so when we create a new proxy, we can determine which original node index this proxy corresponds by looking at the value of the traced counter~
- When we create a new proxy, get the current instruction's line number, and get the node index using the line number map
- index into the original node metadata ~using the counter variable's tracked value.~

~Some things that should be addressed off the top of my head:~
- ~Is this feature even desirable? (Do we really want Dynamo to have special behavior for `GraphModules`? Should we expect users to re-export `GraphModules`?)~
- ~Is there a better approach than to use a counter? We considered using node names, line numbers, and assuming that proxies are created in the same order as the nodes, but each of these 3 have shortcomings. For node names, we only have access to new node names, not the old ones. Using line number is fragile. The third is problematic since not all created nodes go through `create_proxy` (e.g. inputs). We currently generate a line number to node index map when the `GraphModule`'s code is generated.~
- ~What's the best way to send data across the "CPython gap"? That is, it is not obvious how to cleanly pass data from dynamo's `eval_frame.py:_TorchDynamoContext.__call__` to `symbolic_convert.py:InstructionTranslatorBase.__init__`. In this PR, we use a global.~

Differential Revision: [D49257108](https://our.internmc.facebook.com/intern/diff/D49257108)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107067
Approved by: https://github.com/jansel
2023-09-15 23:29:14 +00:00
7af792ab05 Revert "[inductor][Optimus]Improve logging for group batch fusion (#109314)"
This reverts commit afad0d074b5504c87aa1dc9ae352686a8dd3a8eb.

Reverted https://github.com/pytorch/pytorch/pull/109314 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/109314#issuecomment-1722015037))
2023-09-15 23:28:50 +00:00
cyy
a14d30d8d1 [1/N] apply clang-tidy in torch/csrc/autograd (#109032)
This PR begins a new series of patches for enabling clang-tidy checks in torch/csrc/autograd
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109032
Approved by: https://github.com/albanD, https://github.com/Skylion007
2023-09-15 23:28:43 +00:00
b4ea3260d7 [JIT] Document torch.jit.interface (#109356)
Good option for replacing "Callable" types; we should document it so
it's searchable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109356
Approved by: https://github.com/eellison, https://github.com/gmagogsfm
2023-09-15 23:23:47 +00:00
ec8b58f5ba Add support for tolist on AsyncCollectiveTensor (#109377)
This has to be done by hand because tolist isn't supported on tensor subclasses.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109377
Approved by: https://github.com/wconstab, https://github.com/fduwjj
2023-09-15 21:48:13 +00:00
806c52b4c9 Update chunk_sharding_spec.py (#108915)
Fixes #108869

Implements the first solution proposed in the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108915
Approved by: https://github.com/wanchaol, https://github.com/wz337
2023-09-15 21:43:15 +00:00
afad0d074b [inductor][Optimus]Improve logging for group batch fusion (#109314)
Summary: Log the graph with Everpaste for debugging and to find more patterns to fuse

Test Plan: to add logs

Differential Revision: D49284640

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109314
Approved by: https://github.com/yanboliang
2023-09-15 20:46:08 +00:00
71b4b32014 return_and_correct_aliasing: massage some schemas to work with torchgen (#108897)
The issue is that `str(torch.ops.aten.conv2d.default._schema)` does not return the same schema that is in native_functions.yaml ([link](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/native_functions.yaml#L1654)).

Torchscript appears to change the default arg string `int[2] strides=1` to `int[2] strides=[1, 1]`. If you try to parse that with torchgen, torchgen is unhappy (it tries to split arguments on comma, but now we have a comma inside of the default argument).

Fixing the issue directly in torchgen was a bit more painful, so I opted just to undo the transformation that torchscript made: convert `=[1, 1]` back into `=1`.
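A hedged sketch of that normalization (the regex and helper name are invented for illustration, not the PR's actual code):

```python
import re

def contract_list_defaults(schema: str) -> str:
    # Undo TorchScript's expansion: "=[1, 1]" with all-equal entries becomes "=1",
    # so torchgen can safely split the argument list on commas again.
    def repl(m):
        items = [s.strip() for s in m.group(1).split(",")]
        return f"={items[0]}" if len(set(items)) == 1 else m.group(0)
    return re.sub(r"=\[([^\]]+)\]", repl, schema)

print(contract_list_defaults("conv2d(Tensor input, int[2] stride=[1, 1]) -> Tensor"))
# conv2d(Tensor input, int[2] stride=1) -> Tensor
```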

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108897
Approved by: https://github.com/ezyang
ghstack dependencies: #106404, #107917
2023-09-15 20:19:25 +00:00
0ad595954a python functionalization: add helpers, functionalize_sync and mirror_autograd_meta (#107917)
Added two new utils to help with turning python functionalization on in AOTAutograd (next PR):

(1) updated `torch._sync()`. Previously, this API could only handle `torch.Tensor` instances that had a `FunctionalTensorWrapper` TensorImpl. It now needs to handle python `FunctionalTensor`s. In theory I can probably break BC and change this API (since it's private?), but I decided not to do it in this PR stack to minimize the chance of reverts. Instead of updating that API directly (which is in C++), I just added a python shim that first tries to unwrap the python `FunctionalTensor` if there is one, then calls the existing C++ logic.

(2) `mirror_autograd_meta` is now a standalone API that tries to mirror the `requires_grad` and `is_leaf` autograd metadata from one tensor to another. Previously this was hardcoded into `torch._to_functional_tensor()`. But I now need to use it in a more standalone way: later in AOTAutograd when we unwrap and re-wrap a tensor subclasses, we need to manually mirror the autograd metadata from the original to the updated version of the subclass.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107917
Approved by: https://github.com/ezyang
ghstack dependencies: #106404
2023-09-15 20:19:25 +00:00
f22b303f65 Add TorchDispatch version of functionalization (#106404)
This PR adds a new `FunctionalTensor` subclass, and `FunctionalTensorMode` torch dispatch mode. Together, this class/mode are a lightweight wrapper around our existing C++ functionalization logic.

This idea came from Ed - later in the stack, I want to be able to run functionalization **underneath** torch_dispatch, when performing tracing in AOTAutograd. I can't do this easily with vanilla C++ functionalization, because it has a dedicated dispatch key that always runs before TorchDispatch. However, by adding a torch_dispatch mode shim around functionalization, we can use functionalization as a torch_dispatch mode, which will make it easier to run underneath other modes later.

This PR provides the basic new classes, and some light testing.
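As a conceptual illustration of what functionalization does to a mutating program (plain tensor ops, not this PR's classes):

```python
import torch

def mutating(x):
    x.add_(1)  # in-place ops mutate their input
    x.mul_(2)
    return x

def functionalized(x):
    # Every in-place op becomes its out-of-place equivalent...
    t0 = x.add(1)
    t1 = t0.mul(2)
    # ...and a single copy back preserves the observable input mutation.
    x.copy_(t1)
    return t1

a, b = torch.ones(3), torch.ones(3)
assert torch.equal(mutating(a), functionalized(b))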

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106404
Approved by: https://github.com/ezyang
2023-09-15 20:19:25 +00:00
504dceacb1 [ONNX] Fix indexing issue of meshgrid op (#109350)
Should unpack tensor_list before swapping the elements for indexing 'xy'.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109350
Approved by: https://github.com/thiagocrepaldi
2023-09-15 19:49:43 +00:00
cyy
4c208c1475 Remove unneeded linking in CMake targets (#109192)
This PR removes unused library dependencies, helping future refactoring.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109192
Approved by: https://github.com/ezyang
2023-09-15 19:43:25 +00:00
d3a64ff249 Display subclass name when tolist() fails due to tensor subclass (#109376)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109376
Approved by: https://github.com/wanchaol
2023-09-15 19:42:39 +00:00
cyy
9d297cc773 Remove c10::either (#109299)
We can replace it with std::variant.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109299
Approved by: https://github.com/colesbury, https://github.com/ezyang
2023-09-15 19:34:31 +00:00
cc03e3a892 [AOTInductor] Do not hardcode directory with .cubin files (#109151)
Reviewed By: frank-wei, chenyang78

Differential Revision: D49081883

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109151
Approved by: https://github.com/chenyang78
2023-09-15 18:38:05 +00:00
7da3c938cf [quant][be] Move QAT tests to its own file (#108061)
Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT
python test/test_quantization.py TestQuantizePT2EQATModels

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108061
Approved by: https://github.com/jerryzh168
2023-09-15 18:34:44 +00:00
369a84e5c4 [core][sparse][pruning] Add (i8i8)-> fp16 support to cuSPARSELt matmul (#109214)
Summary:

This PR adds support for sparse matmul using cuSPARSELt with int8
inputs and fp16 outputs.

It does so by adding an out_dtype flag to `torch_cslt_sparse_mm`.
Because the only mixed-dtype support present in cuSPARSELt is for int8
input and fp16 output, we error out if:

* out_dtype is set and the input tensors are not int8.
* out_dtype is set to any value other than fp16

Test Plan:

python test/test_sparse_semi_structured -k int8_in_fp16_out

Reviewers:

@cphursh

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109214
Approved by: https://github.com/cpuhrsch
2023-09-15 18:14:40 +00:00
ab99a95470 Update planner.py (#107998)
Fixes #107997
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107998
Approved by: https://github.com/wz337
2023-09-15 18:12:45 +00:00
86e6bd3e53 [inductor] Enable mypy checking for torch/_inductor/bounds.py (#109271)
Summary: Add type hints and enable mypy checking for torch/_inductor/bounds.py

Test Plan: lintrunner

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109271
Approved by: https://github.com/lezcano
2023-09-15 17:47:24 +00:00
a9bf1031d4 [BE] Do not use numpy in torch._inductor.codegen.cpp (#109324)
`s/numpy.iinfo(numpy.int32)/torch.iinfo(torch.int32)/` as those two are interchangeable

Partially addresses https://github.com/pytorch/pytorch/issues/109387

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109324
Approved by: https://github.com/albanD
2023-09-15 17:29:10 +00:00
653c1564bf Fix broadcasting cosine_similarity (#109363)
Fixes https://github.com/pytorch/pytorch/issues/109333
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109363
Approved by: https://github.com/peterbell10
2023-09-15 17:12:35 +00:00
aed9bee041 [inductor] Lower masked_scatter on CUDA (#108803)
This decomposes masked_scatter into `aten.cumsum` and a single pointwise kernel,
which is similar to what is done in eager. I only do this for CUDA because on CPU
it isn't split into two passes like this, so it would cause a slowdown.
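A conceptual sketch of the decomposition (reference semantics assuming `self` and `mask` share a shape; not inductor's actual lowering):

```python
import torch

def masked_scatter_ref(self, mask, source):
    # cumsum over the flattened mask gives, at each True position, the index
    # of the source element to read (this is the aten.cumsum pass).
    idx = torch.cumsum(mask.reshape(-1).to(torch.int64), dim=0) - 1
    gathered = source.reshape(-1)[idx.clamp(min=0)].reshape(self.shape)
    # The rest is a single pointwise select between self and gathered values.
    return torch.where(mask, gathered, self)

x = torch.zeros(5)
m = torch.tensor([True, False, True, False, True])
src = torch.tensor([1.0, 2.0, 3.0])
assert torch.equal(masked_scatter_ref(x, m, src), x.masked_scatter(m, src))
```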

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108803
Approved by: https://github.com/lezcano
2023-09-15 16:36:06 +00:00
3943afc94e [quant][be] Remove unused APIs (#109342)
Summary:
att

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109342
Approved by: https://github.com/kimishpatel, https://github.com/andrewor14
2023-09-15 16:07:01 +00:00
f3d1401843 Fix cond branches take no arguments (#109308)
For code like this:
```python
import torch
from functorch.experimental import control_flow
def exportdb_example2(x):
    def true_fn():
        return torch.sin(x)

    def false_fn():
        return torch.cos(x)

    return control_flow.cond(x.sum() > 0, true_fn, false_fn, [])
ep = torch._export.export(exportdb_example2, (torch.randn(4, 5),))
```
Before the PR, when the branches take an empty list/tuple of inputs, we'll get an error like the following:
```python
Traceback (most recent call last):
  File "/home/yidi/local/pytorch/test_cond.py", line 11, in <module>
    ep = torch._export.export(exportdb_example2, (torch.randn(4, 5),))
  File "/home/yidi/local/pytorch/torch/_export/__init__.py", line 340, in export
    gm_torch_level, _ = torch._dynamo.export(
  File "/home/yidi/local/pytorch/torch/_dynamo/eval_frame.py", line 1207, in inner
    result_traced = opt_f(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/eval_frame.py", line 397, in _fn
    return fn(*args, **kwargs)
  File "/home/yidi/local/pytorch/test_cond.py", line 3, in exportdb_example2
    def exportdb_example2(x):
  File "/home/yidi/local/pytorch/torch/_dynamo/eval_frame.py", line 397, in _fn
    return fn(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/eval_frame.py", line 1173, in result_capturing_wrapper
    graph_captured_result = torch.func.functional_call(
  File "/home/yidi/local/pytorch/torch/_functorch/functional_call.py", line 143, in functional_call
    return nn.utils.stateless._functional_call(
  File "/home/yidi/local/pytorch/torch/nn/utils/stateless.py", line 264, in _functional_call
    return module(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/fx/graph_module.py", line 725, in call_wrapped
    return self._wrapped_call(self, *args, **kwargs)
  File "/home/yidi/local/pytorch/torch/fx/graph_module.py", line 305, in __call__
    raise e
  File "/home/yidi/local/pytorch/torch/fx/graph_module.py", line 292, in __call__
    return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
  File "/home/yidi/local/pytorch/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/nn/modules/module.py", line 1528, in _call_impl
    return forward_call(*args, **kwargs)
  File "<eval_with_key>.2", line 10, in forward
  File "/home/yidi/local/pytorch/torch/_ops.py", line 301, in __call__
    return wrapper()
  File "/home/yidi/local/pytorch/torch/_dynamo/eval_frame.py", line 397, in _fn
    return fn(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_ops.py", line 297, in wrapper
    return self.dispatch(
  File "/home/yidi/local/pytorch/torch/_ops.py", line 280, in dispatch
    return kernel(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_higher_order_ops/utils.py", line 52, in inner
    return autograd_not_implemented_inner(op, deferred_error, *args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_higher_order_ops/utils.py", line 25, in autograd_not_implemented_inner
    result = operator(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_ops.py", line 301, in __call__
    return wrapper()
  File "/home/yidi/local/pytorch/torch/_dynamo/eval_frame.py", line 397, in _fn
    return fn(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_ops.py", line 297, in wrapper
    return self.dispatch(
  File "/home/yidi/local/pytorch/torch/_ops.py", line 255, in dispatch
    return self.python_key_mode_table[type(curr_mode)](*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_higher_order_ops/cond.py", line 310, in cond_fake_tensor_mode
    flat_false_outs, _ = pytree.tree_flatten(false_fn(*operands))
  File "/home/yidi/local/pytorch/torch/fx/graph_module.py", line 725, in call_wrapped
    return self._wrapped_call(self, *args, **kwargs)
  File "/home/yidi/local/pytorch/torch/fx/graph_module.py", line 305, in __call__
    raise e
  File "/home/yidi/local/pytorch/torch/fx/graph_module.py", line 292, in __call__
    return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
  File "/home/yidi/local/pytorch/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/nn/modules/module.py", line 1528, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: forward() takes 2 positional arguments but 3 were given
```

Thanks to @williamwen42 for spotting this error! We fix it by addressing the case when add_after is -1.

Test Plan:
See newly added tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109308
Approved by: https://github.com/williamwen42
2023-09-15 15:35:46 +00:00
1aba61e977 Allow cond to have more dynamo cache beyond limit (#109318)
This is a short-term workaround for https://github.com/pytorch/pytorch/issues/108500. In the long term, we should have separate caches when cond appears at different places in user code, or a per-true_fn/false_fn cache.

Test Plan:
See the added test; it checks that cond can go beyond the cache limit.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109318
Approved by: https://github.com/ezyang
2023-09-15 15:33:36 +00:00
dfdc0b63c9 Bisect FX node asserts on ValidationException. (#107493)
This PR introduces binary search for finding smaller validation errors, when they occur.

We do that by bisecting the sequence of `torch._assert` FX nodes recorded as the source
expression of the translation validator (TV) by `ShapeEnv.evaluate_expr` calls. Then, we
raise the error caused by the earliest node.

In summary, the changes are:
- Call `bisect` on `ValidationError` @ _torch/_dynamo/convert_frame.py_
- Implement the binary search @ _torch/fx/experimental/symbolic_shapes.py_

Edit: moved `ShapeEnv` replay-recording to #107989
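
A minimal sketch of the bisection step (the callable names here are assumptions for illustration, not the actual implementation):

```python
# Given the ordered list of torch._assert FX nodes recorded by the translation
# validator, find the earliest node whose prefix already fails validation.
# Assumes failures are monotone: once a prefix fails, longer prefixes fail too.
def bisect_failing_assert(assert_nodes, prefix_fails):
    lo, hi = 0, len(assert_nodes) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix_fails(assert_nodes[: mid + 1]):
            hi = mid          # failure already reproducible with this prefix
        else:
            lo = mid + 1      # need more asserts to trigger the error
    return assert_nodes[lo]   # the earliest node introducing the failure
```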

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107493
Approved by: https://github.com/ezyang
ghstack dependencies: #107989
2023-09-15 15:18:12 +00:00
a873f523ba [aarch64][caffe2/torch/csrc/profiler] Support aarch64 in inline assembly (#104707)
Summary:
Port x86 inline assembly to aarch64:
- Use `sp` instead of `%rsp` for stack pointer; move to second caller-
  saved register `x1` instead of `%rsi`
- Use `x29` instead of `%rbp` for base pointer; move to third caller-
   saved register `x2` instead of `%rdx`

Test Plan:
```
$ buck2 build fbcode//mode/opt fbcode//caffe2/torch/fb/model_transform/fx2trt/packaging:generate_merge_net_file
```

Reviewed By: jasonjk-park

Differential Revision: D47242468

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104707
Approved by: https://github.com/aaronenyeshi
2023-09-15 14:34:55 +00:00
faf3de35db Fix max/min.reduction_with_dim opinfo test for bool tensors (#109264)
The tests were failing because the input tensors were getting
incorrectly promoted to int64, due to `transform_args` incorrectly
considering the `dim` argument when determining the type to promote to.

It doesn't seem like type promotion is necessary in general for max and
min, so I've switched them to `type_promotion_kind=None`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109264
Approved by: https://github.com/eellison
ghstack dependencies: #109165
2023-09-15 12:37:17 +00:00
19f8b05afe Disable gradient check for linalg.eig (#109165)
Both the eager and compiled versions fail with the following message
when trying to compute the grad:

    RuntimeError: linalg_eig_backward: The eigenvectors in the complex
    case are specified up to multiplication by e^{i phi}. The specified
    loss function depends on this quantity, so it is ill-defined.

I'm not sure if there's a way to adapt the OpInfo such that the grad is
computable, but we should at least check that the forward pass is
correct.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109165
Approved by: https://github.com/eellison
2023-09-15 12:37:17 +00:00
66fdea606d Enable typing for _inductor/exc.py (#109176)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109176
Approved by: https://github.com/eellison
ghstack dependencies: #109173
2023-09-15 12:36:59 +00:00
bd89f80bae Add more types for inductor_prims.py (#109173)
Also fix a grammatical issue in the docstring.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109173
Approved by: https://github.com/eellison
2023-09-15 12:36:59 +00:00
1cc0921eb6 Add tensorboard to pip requirements (#109349)
https://github.com/pytorch/pytorch/pull/108351/files is failing on Mac and Windows because we don't have the dependency there.
It is available on Linux because it is included in .ci/docker/requirements-docs.txt.

Adding skips to make it green.

Here are some outputs for future debugging
https://github.com/pytorch/pytorch/actions/runs/6192933622/job/16813841625
https://ossci-raw-job-status.s3.amazonaws.com/log/16813841625
```

2023-09-15T02:09:43.2397460Z =================================== FAILURES ===================================
2023-09-15T02:09:43.2397650Z ______________________ TestTensorBoardSummary.test_audio _______________________
2023-09-15T02:09:43.2397830Z Traceback (most recent call last):
2023-09-15T02:09:43.2398090Z   File "/Users/ec2-user/runner/_work/pytorch/pytorch/test/test_tensorboard.py", line 417, in test_audio
2023-09-15T02:09:43.2398390Z     self.assertTrue(compare_proto(summary.audio('dummy', tensor_N(shape=(42,))), self))
2023-09-15T02:09:43.2398720Z   File "/Users/ec2-user/runner/_work/_temp/conda_environment_6192933622/lib/python3.9/unittest/case.py", line 688, in assertTrue
2023-09-15T02:09:43.2399100Z ##[endgroup]
2023-09-15T02:09:43.2399240Z     raise self.failureException(msg)
2023-09-15T02:09:43.2399400Z AssertionError: False is not true
2023-09-15T02:09:43.2399490Z
2023-09-15T02:09:43.2399590Z To execute this test, run the following from the base repo dir:
2023-09-15T02:09:43.2399820Z      python test/test_tensorboard.py -k test_audio
2023-09-15T02:09:43.2399930Z
```

https://github.com/pytorch/pytorch/actions/runs/6192933622/job/16814065258
https://ossci-raw-job-status.s3.amazonaws.com/log/16814065258
```

2023-09-15T02:38:44.6284979Z ================================== FAILURES ===================================
2023-09-15T02:38:44.6285295Z ______________________ TestTensorBoardNumpy.test_scalar _______________________
2023-09-15T02:38:44.6285556Z Traceback (most recent call last):
2023-09-15T02:38:44.6285915Z   File "C:\actions-runner\_work\pytorch\pytorch\test\test_tensorboard.py", line 794, in test_scalar
2023-09-15T02:38:44.6286325Z     res = make_np(np.float128(1.00008 + 9))
2023-09-15T02:38:44.6286705Z   File "C:\Jenkins\Miniconda3\lib\site-packages\numpy\__init__.py", line 315, in __getattr__
2023-09-15T02:38:44.6287700Z     raise AttributeError("module {!r} has no attribute "
2023-09-15T02:38:44.6288060Z AttributeError: module 'numpy' has no attribute 'float128'
2023-09-15T02:38:44.6288241Z
2023-09-15T02:38:44.6288390Z To execute this test, run the following from the base repo dir:
2023-09-15T02:38:44.6288679Z      python test\test_tensorboard.py -k test_scalar
2023-09-15T02:38:44.6288846Z
```

https://github.com/pytorch/pytorch/actions/runs/6193449301/job/16815113985
https://ossci-raw-job-status.s3.amazonaws.com/log/16815113985
```
2023-09-15T03:25:53.7797550Z =================================== FAILURES ===================================
2023-09-15T03:25:53.7797790Z __________________ TestTensorBoardSummary.test_histogram_auto __________________
2023-09-15T03:25:53.7798000Z Traceback (most recent call last):
2023-09-15T03:25:53.7798310Z   File "/Users/ec2-user/runner/_work/pytorch/pytorch/test/test_tensorboard.py", line 426, in test_histogram_auto
2023-09-15T03:25:53.7798690Z     self.assertTrue(compare_proto(summary.histogram('dummy', tensor_N(shape=(1024,)), bins='auto', max_bins=5), self))
2023-09-15T03:25:53.7799090Z   File "/Users/ec2-user/runner/_work/_temp/conda_environment_6193449301/lib/python3.9/unittest/case.py", line 688, in assertTrue
2023-09-15T03:25:53.7799430Z     raise self.failureException(msg)
2023-09-15T03:25:53.7799610Z AssertionError: False is not true
2023-09-15T03:25:53.7799720Z
2023-09-15T03:25:53.7799840Z To execute this test, run the following from the base repo dir:
2023-09-15T03:25:53.7800170Z      python test/test_tensorboard.py -k test_histogram_auto
2023-09-15T03:25:53.7800310Z
2023-09-15T03:25:53.7800430Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2023-09-15T03:25:53.7800870Z - generated xml file: /Users/ec2-user/runner/_work/pytorch/pytorch/test/test-reports/python-pytest/test_tensorboard/test_tensorboard-aef95b5e2d69c061.xml -
2023-09-15T03:25:53.7801200Z =========================== short test summary info ============================
```

https://github.com/pytorch/pytorch/actions/runs/6193576371/job/16815396352
https://ossci-raw-job-status.s3.amazonaws.com/log/16815396352
```
2023-09-15T03:47:02.9430070Z _________________ TestTensorBoardSummary.test_histogram_doane __________________
2023-09-15T03:47:02.9430250Z Traceback (most recent call last):
2023-09-15T03:47:02.9430520Z   File "/Users/ec2-user/runner/_work/pytorch/pytorch/test/test_tensorboard.py", line 433, in test_histogram_doane
2023-09-15T03:47:02.9430850Z     self.assertTrue(compare_proto(summary.histogram('dummy', tensor_N(shape=(1024,)), bins='doane', max_bins=5), self))
2023-09-15T03:47:02.9431180Z   File "/Users/ec2-user/runner/_work/_temp/conda_environment_6193576371/lib/python3.9/unittest/case.py", line 688, in assertTrue
2023-09-15T03:47:02.9431390Z     raise self.failureException(msg)
2023-09-15T03:47:02.9431550Z AssertionError: False is not true
2023-09-15T03:47:02.9431640Z
2023-09-15T03:47:02.9431730Z To execute this test, run the following from the base repo dir:
2023-09-15T03:47:02.9432000Z      python test/test_tensorboard.py -k test_histogram_doane
2023-09-15T03:47:02.9432120Z
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109349
Approved by: https://github.com/huydhn
2023-09-15 10:39:48 +00:00
9456de937b [dtensor] Fix and improve the sharding cache behavior (#109306)
resolves https://github.com/pytorch/pytorch/issues/109101

The problem is essentially that we were hashing all the arguments, including
scalars (i.e. aten.div(tensor, scalar)). In the optimizer, the scalar might
change every time we call the op, causing a cache miss on every call.

This PR improves the sharding cache behavior by introducing a
RuntimeSchemaInfo, used to record the hashing information needed at runtime
during op registration time. This enables us to:
* only hash arguments that are tensors or have static_argnum, so that cases
like aten.div.Tensor(tensor, 0.23231) hit the cache; currently we hash all
args, which excludes those cases (see the sketch below)
* with the correct cache behavior, optimizers will hit the cache again
and resolve the high CPU overhead issue.
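
A minimal sketch of the hashing idea (the helper name and RuntimeSchemaInfo's fields are assumptions for illustration):

```python
import torch

def sharding_cache_key(op, args, schema_info):
    parts = [op]
    for i, arg in enumerate(args):
        if isinstance(arg, torch.Tensor):
            parts.append((arg.shape, arg.dtype))  # hash tensor metadata only
        elif i in schema_info.static_argnum:
            parts.append(arg)  # explicitly-static scalars still participate
        # other scalars (e.g. the changing scalar in aten.div(tensor, scalar))
        # are skipped, so they no longer cause a cache miss on every call
    return hash(tuple(parts))
```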

A simple MLP shows all cache hits, and a single addmm drops to 0.319ms (from 0.341ms), showing some hashing improvements:
<img width="1172" alt="Screenshot 2023-09-14 at 11 06 07 AM" src="https://github.com/pytorch/pytorch/assets/9443650/3406d673-dd8d-4ad9-9b80-9d4721c430e3">

The Adam optimizer shows aten.div hitting the sharding cache again:
<img width="1016" alt="Screenshot 2023-09-14 at 11 02 10 AM" src="https://github.com/pytorch/pytorch/assets/9443650/4280e8e3-af44-4fc2-8360-ea80b768f1d9">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109306
Approved by: https://github.com/fduwjj
2023-09-15 10:32:49 +00:00
0cbca85707 Add check to prevent NumPy ndarray from being treated as tuple when indexing (#108954)
Fixes #108689

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108954
Approved by: https://github.com/lezcano
2023-09-15 08:51:58 +00:00
f786fbdebd Reland 3rd try [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#109323)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109323
Approved by: https://github.com/huydhn, https://github.com/voznesenskym
2023-09-15 08:44:14 +00:00
cyy
af7d79923c Remove thrift from Docker builds (#109344)
Thrift is not used in Pytorch. Outdated gcc7 configs are removed too.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109344
Approved by: https://github.com/malfet
2023-09-15 08:28:32 +00:00
34ddf08f27 [inductor] update fbcode skips for AOTInductor (#109313)
Summary: It seems like the `if __name__ == "__main__":` part doesn't run in fbcode; instead, just add skips.

Differential Revision: D49258492

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109313
Approved by: https://github.com/desertfire, https://github.com/chenyang78
2023-09-15 04:28:25 +00:00
2b6d983b8b Reland [dynamo][activation checkpointing] Trace through ActivationWrapper (#109327)
Fixes https://github.com/pytorch/pytorch/issues/108269
Original reverted PR - https://github.com/pytorch/pytorch/pull/108599

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109327
Approved by: https://github.com/aakhundov
2023-09-15 03:43:59 +00:00
924723bda7 Created nested utils.cpp (#109304)
# Summary
This refactors the preprocessing for nested tensors that glues into SDPA. This is done in order to aid with reviewing:
https://github.com/pytorch/pytorch/pull/97485

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109304
Approved by: https://github.com/cpuhrsch
2023-09-15 03:33:11 +00:00
2d4924db32 Remove S3 Update Workflow (#109317)
Simpler workflow to update S3 management is done in https://github.com/pytorch/builder/pull/1531. We can remove this job from here.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109317
Approved by: https://github.com/huydhn
2023-09-15 03:17:38 +00:00
b3272b2c00 Trace attention inference patterns with p=0, cleanup (#109118)
When dropout is traced in inference, it creates a clone() instead of the training pattern of rand() etc. This was partially addressed manually by https://github.com/pytorch/pytorch/pull/108141; however, that did not cover all of the patterns that include dropout, and there is no reason we should have to specify them manually.

This updates the inference patterns generated to trace with dropout_p = 0.0.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109118
Approved by: https://github.com/drisspg, https://github.com/Valentine233
2023-09-15 01:40:04 +00:00
5349615240 [dynamo] Unblock a model with jit.isinstance (#109178)
prevents this error

```
File "/tmp/jetter.azp5q59y/torch/fx/proxy.py", line 291, in create_arg
python/0     raise NotImplementedError(f"argument of type: {type(a)}")
python/0 torch._dynamo.exc.InternalTorchDynamoError: argument of type: <class 'typing._GenericAlias'>
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109178
Approved by: https://github.com/yanboliang
2023-09-15 01:19:46 +00:00
2bca5f2af7 [C10D] Track pg name in c++. (#108813)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108813
Approved by: https://github.com/wconstab
2023-09-15 01:10:29 +00:00
58a883093f [quant][pt2e] Add test for serialize and deserialize quantized model (#109158)
Summary:
att

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_save_load

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109158
Approved by: https://github.com/andrewor14
ghstack dependencies: #108924, #108925
2023-09-15 00:50:55 +00:00
cyy
36b8ca4e48 [2/N] apply clang-tidy in torch/csrc/autograd (#109277)
This PR follows the work of PR #109032.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109277
Approved by: https://github.com/albanD
2023-09-15 00:39:12 +00:00
ec3c748fa2 Document Triton dependency for the release process (#109296)
Document triton dependency for the Pytorch Release

### 🤖 Generated by Copilot at e9773f0

Add documentation for Triton dependency in `RELEASE.md`. The documentation covers how to install and use Triton for various PyTorch builds and platforms.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109296
Approved by: https://github.com/malfet
2023-09-14 23:48:45 +00:00
cyy
8cb96f5f2c [Reland]Use cpuinfo to determine c10::ThreadPool thread number (#107339)
Relands PR #107010 and fixes BUCK builds.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107339
Approved by: https://github.com/ezyang
2023-09-14 23:44:23 +00:00
fa62308673 [tensorboard] Fix TensorBoard summary encoding for torch.bfloat16 tensors (#108351)
Summary:
The `tensor_proto` function in the TensorBoard summary writer code doesn't correctly encode `torch.bfloat16` tensors; it tries to use a data type of `DT_BFLOAT` when creating the protobuf, but `DT_BFLOAT` is not a valid enum value (see `types.proto`). The correct value to use when encoding tensors of this type is `DT_BFLOAT16`. This diff updates the type map in the summary code to use the correct type.

While fixing this error, I also noticed the wrong field of the protobuf was being used when encoding tensors of this type; per the docs in the proto file, the DT_HALF and DT_BFLOAT16 types should use the `half_val` field, not `float_val`. Since this might confuse folks trying to read this data from storage in the future, I've updated the code to correctly use the `half_val` field for these cases. Note that there's no real size advantage from doing this, since both the `half_val` and `float_val` fields are 32 bits long.
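
A minimal sketch of the corrected mapping (the dict name is an assumption; the real table lives in torch/utils/tensorboard/summary.py):

```python
import torch

# dtype -> (proto enum label, TensorProto field to populate)
_TYPE_MAP = {
    torch.float32: ("DT_FLOAT", "float_val"),
    torch.half: ("DT_HALF", "half_val"),          # torch.half is torch.float16
    torch.bfloat16: ("DT_BFLOAT16", "half_val"),  # was "DT_BFLOAT"/"float_val"
}
```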

Test Plan:
Added a parameterized unit test that tests encoding tensors with `torch.half`, `torch.float16`, and `torch.bfloat16` data types.

# Before this change
The test fails with an `ValueError` due to the incorrect enum label:
```
======================================================================
ERROR: test_bfloat16_tensor_proto (test_tensorboard.TestTensorProtoSummary)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/users/jcarreiro/fbsource/buck-out/v2/gen/fbcode/f88b3f368c9334db/caffe2/test/__tensorboard__/tensorboard#link-tree/torch/testing/_internal/common_utils.py", line 2382, in wrapper
    method(*args, **kwargs)
  File "/data/users/jcarreiro/fbsource/buck-out/v2/gen/fbcode/f88b3f368c9334db/caffe2/test/__tensorboard__/tensorboard#link-tree/test_tensorboard.py", line 871, in test_bfloat16_tensor_proto
    tensor_proto(
  File "/data/users/jcarreiro/fbsource/buck-out/v2/gen/fbcode/f88b3f368c9334db/caffe2/test/__tensorboard__/tensorboard#link-tree/torch/utils/tensorboard/summary.py", line 400, in tensor_proto
    tensor_proto = TensorProto(**tensor_proto_args)
ValueError: unknown enum label "DT_BFLOAT"

To execute this test, run the following from the base repo dir:
     python test/__tensorboard__/tensorboard#link-tree/test_tensorboard.py -k test_bfloat16_tensor_proto

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

----------------------------------------------------------------------
```

# After this change
The test passes.

Reviewed By: tanvigupta17

Differential Revision: D48828958

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108351
Approved by: https://github.com/hamzajzmati, https://github.com/XilunWu
2023-09-14 23:12:22 +00:00
bf5622e965 Revert "split by tag (#108892)"
This reverts commit 89b6276be9f1b04491625cc0d05de01c15f75597.

Reverted https://github.com/pytorch/pytorch/pull/108892 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/108892#issuecomment-1720249148))
2023-09-14 22:43:03 +00:00
be9f73f031 Revert "Add meta and OpInfo for _embedding_bag_dense_backward (#109211)"
This reverts commit fe14e43d14420a53426215a5fff30113da6d216a.

Reverted https://github.com/pytorch/pytorch/pull/109211 on behalf of https://github.com/clee2000 due to Sorry I think the test_ops.py::TestCommonCUDA::test_compare_cpu__embedding_bag_dense_backward_cuda_float32 is failing 492a93d185 https://github.com/pytorch/pytorch/actions/runs/6190707847/job/16808644559 not sure why this is run in slow when it looks to be a new test ([comment](https://github.com/pytorch/pytorch/pull/109211#issuecomment-1720235918))
2023-09-14 22:29:12 +00:00
28169193b4 [TD] Improve heuristic metrics collection (#109305)
Fixes a bug with heuristic metrics collection where the metrics would sometimes inaccurately claim a heuristic to have ranked a test more highly than any other heuristic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109305
Approved by: https://github.com/clee2000
2023-09-14 22:20:34 +00:00
89b6276be9 split by tag (#108892)
Differential Revision: D49107540

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108892
Approved by: https://github.com/842974287
2023-09-14 21:49:11 +00:00
2bf7a283cb Remove expected test failures for cond (#108709)
Remove the expected failure in def test_control_flow_tracing(self) by changing the error message to `Expected pred to be bool or tensor, but got Proxy\(eq\)`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108709
Approved by: https://github.com/ezyang, https://github.com/zou3519
ghstack dependencies: #107662, #107850
2023-09-14 21:34:31 +00:00
6140facf00 Support SymBool input to torch.compile (#107850)
We could have SymBool inputs for torch.compile, e.g. in the following situation:
```
def f(x: torch.Tensor):
    pred = x.size(0) == 3        # a SymBool under symbolic tracing
    torch.compile(f)(pred, x)    # schematic: the compiled call receives a SymBool input

make_fx(f, tracing_mode="symbolic")(x)
```

The idea of this PR (credit to @ezyang) is to support SymBool by re-using the infra we've already had for SymInt so that we don't need to replicate a lot of stuff.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107850
Approved by: https://github.com/ezyang
ghstack dependencies: #107662
2023-09-14 21:34:31 +00:00
ea94344821 [ROCm] Enable Lerp tests for complex32 (#108100)
Enables previously disabled "lerp" opinfo tests for chalf on ROCm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108100
Approved by: https://github.com/pruthvistony, https://github.com/jithunnair-amd, https://github.com/kit1980
2023-09-14 21:21:29 +00:00
54c5f474a7 Forward rank and world size info to Torchbench models when using dynamo runner (#108438)
Adding support to pass rank and world_size to torchbench model, via its extra_args parameter: https://github.com/pytorch/benchmark/blob/main/torchbenchmark/util/model.py#L83C80-L83C90

This is used for models which distribute over multiple GPUs e.g. simple_gpt https://github.com/pytorch/benchmark/pull/1867

Also add an option to skip multiprocess-only GPU models.

Testing via `python benchmarks/dynamo/torchbench.py -d cuda --output=benchmark_logs/performance.csv --inference --performance --timing --print-memory --multiprocess --only simple_gpt`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108438
Approved by: https://github.com/Chillee
2023-09-14 21:01:20 +00:00
cyy
03e35efbf7 replace torch::make_unique with std::make_unique (#108866)
It should be safe to remove the old torch::make_unique functions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108866
Approved by: https://github.com/albanD
2023-09-14 20:52:26 +00:00
f03b8abd47 [HigherOrderOp] Should automatically pop modes (#109157)
Fixes #108282

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109157
Approved by: https://github.com/zou3519
2023-09-14 20:46:26 +00:00
492a93d185 [HSDP] Updating HSDP test - test_hsdp_init_with_device_mesh (#109202)
Remove DeviceMesh import dependency.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109202
Approved by: https://github.com/fegin
2023-09-14 20:03:22 +00:00
602413a0a0 Refactor test_foreach.py (#107869)
## Summary
- Change the default of `supports_autograd` and `supports_forward_ad` of `ForeachFuncInfo` to `True`
- Add `test_zero_size_tensor_inputs` to make sure that foreach functions can handle 0-size Tensor inputs
- Add `test_parity` to check the consistency between the outputs of a foreach function and a for-loop over the native function.
- Add `test_autodiff` to check forward-mode and reverse-mode AD
- Keep the corner cases that are not covered by the newly introduced methods

rel:
- #58833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107869
Approved by: https://github.com/janeyx99
2023-09-14 19:39:26 +00:00
f7574ea43f torch.load: Replaced multiple one byte read() calls during the _is_zipfile check with a single call (#109119)
Fixes #108955.

Right now, the `_is_zipfile` check in `torch.load` performs multiple `read()` calls, reading 1 byte at a time in a loop. This is rather wasteful and leads to performance problems when accessing files on a network share (see #108955).
This PR replaces those 1-byte calls with a single big call. Functionally, this is equivalent, as `read(n)` only reads up to `n` bytes, so even if the file is shorter there should not be any problems.
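
A minimal sketch of the change (function names assumed; the real check lives in torch/serialization.py):

```python
# Before: one read() call per magic byte.
def _is_zipfile_old(f):
    magic = b"PK\x03\x04"  # local file header signature of a zip archive
    return all(f.read(1) == bytes([b]) for b in magic)

# After: a single read. read(n) returns at most n bytes, so a file shorter
# than the magic simply fails the comparison instead of erroring.
def _is_zipfile_new(f):
    magic = b"PK\x03\x04"
    return f.read(len(magic)) == magic
```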
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109119
Approved by: https://github.com/mikaylagawarecki
2023-09-14 19:39:10 +00:00
c382ad47dd Deprecate torch.cross default behaviour (#108760)
Long overdue this one. We may be able to change it in a few years :hopeful:.

**BC-breaking note**

This PR deprecates `torch.cross`'s default dim in favor of
`torch.linalg.cross`.
An upgrade guide is added to the documentation for `torch.cross`.

Note this PR DOES NOT remove `torch.cross`.
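
A short illustration of the deprecated default (a hedged sketch; the old default picks the first dimension of size 3 rather than the last):

```python
import torch

a, b = torch.randn(3, 5), torch.randn(3, 5)
torch.linalg.cross(a, b, dim=0)  # explicit dim, recommended
torch.cross(a, b)                # deprecated default: silently uses dim=0 here,
                                 # the first dimension whose size is 3
```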

Fixes https://github.com/pytorch/pytorch/issues/108664

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108760
Approved by: https://github.com/albanD
2023-09-14 19:36:29 +00:00
78cd86c552 NT support for gt (#109121)
Needed for mask thresholding in SAM. TODO: Hook into existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109121
Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer
2023-09-14 19:35:27 +00:00
263ca7d69b [ONNX] Remove deprecated functions (#107208)
The usage of some functions is deprecated. This PR drops them.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107208
Approved by: https://github.com/justinchuby, https://github.com/thiagocrepaldi
2023-09-14 19:09:56 +00:00
0edb616793 Add test/onnx_caffe2 to ONNX Exporter merge rule (#109295)
As we deprecate the TorchScript ONNX exporter, we need to refactor the onnx_caffe2 tests to start using private functions instead of public ones.

That requires changes to the merge rules to allow the ONNX exporter to drive the deprecation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109295
Approved by: https://github.com/malfet
2023-09-14 19:06:12 +00:00
fe14e43d14 Add meta and OpInfo for _embedding_bag_dense_backward (#109211)
The sample inputs are a bit involved because there are a lot of
shenanigans in the derivative formula.  Check comments.

This is exercised in vdd, internal test `buck2 run '@fbcode//mode/opt' fbcode//pytorch/benchmark/fb/test_gpu:run_test_gpu -- 'pytorch.benchmark.fb.test_gpu.test_gpu.TestBenchmarkFbGpu.test_train_blue_reels_vdd_v3_inductor_speedup'`

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109211
Approved by: https://github.com/albanD, https://github.com/zou3519
2023-09-14 18:49:32 +00:00
b121e4df92 Increase tolerances for baddbmm opinfo test (#109164)
The compiled `baddbmm` deviates from the eager `baddbmm` due to its
decomp into badd + bmm.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109164
Approved by: https://github.com/eellison
2023-09-14 18:37:04 +00:00
9187559e75 [quant][be] Remove test/quantization/pt2e/test_quantize_pt2e_fx.py (#108925)
Summary:
this is no longer needed since we have the quantizer api now

Test Plan:
.

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108925
Approved by: https://github.com/andrewor14
ghstack dependencies: #108924
2023-09-14 18:35:17 +00:00
900288f138 Revert "[inductor] Lower masked_scatter on CUDA (#108803)"
This reverts commit e4036ed7068cdcbe07470c1740ca25ab8ead7a3b.

Reverted https://github.com/pytorch/pytorch/pull/108803 on behalf of https://github.com/peterbell10 due to Bot merged after aborted rebase ([comment](https://github.com/pytorch/pytorch/pull/108803#issuecomment-1719918831))
2023-09-14 18:12:27 +00:00
d4990ad5a1 Fix the example in the extending.func.rst (#109279)
As the title shown ,the `backward` function is missing the definition of `ind` and `ind_inv`, which will lead to error when calling backward
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109279
Approved by: https://github.com/zou3519
2023-09-14 17:29:39 +00:00
9021fb8dac [dynamo] implement custom dict variable as a general solution for HF's ModelOutput class (#105044)
Before the PR, for HF's ModelOutput class, we used dicts.py/DataClassVariable with our own implementations of __getitem__, __setattr__, and __setitem__. There is a risk that the ModelOutput logic may change, since it is user code.

After the PR, we inline __getitem__, __setattr__, and __setitem__ using dicts.py/CustomizedDictVariable, so the logic always stays in sync with the user code.

unit test
* python test/dynamo/test_model_output.py -k test_HF_bert_model_output

test on HF benchmark
* python benchmarks/dynamo/huggingface.py -d cuda --inference --accuracy --progress --inductor --print-dataframe-summary 2>&1
* all metric are the same before/after the PR, including pass rate, unique_graphs, graph_breaks, unique_graph_breaks
  * before the PR: P790393916
  * after the PR: P790368991

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105044
Approved by: https://github.com/jansel
2023-09-14 17:15:50 +00:00
e4036ed706 [inductor] Lower masked_scatter on CUDA (#108803)
This decomposes masked_scatter into `aten.cumsum` and a single pointwise kernel,
which is similar to what is done in eager. I only do this for CUDA because on CPU
it isn't split into two passes like this, so it would cause a slowdown.
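
A conceptual sketch of the decomposition (assumes self and mask share a shape; not the actual inductor lowering):

```python
import torch

def masked_scatter_decomp(self, mask, source):
    # The prefix count of True entries gives, for each masked position,
    # the index of the source element to pick (pass 1: cumsum).
    idx = torch.cumsum(mask.reshape(-1).to(torch.int64), 0) - 1
    # Pass 2: a single pointwise select between gathered source and self.
    gathered = source.reshape(-1)[idx.clamp(min=0)].reshape(self.shape)
    return torch.where(mask, gathered, self)
```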

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108803
Approved by: https://github.com/lezcano
ghstack dependencies: #108802
2023-09-14 17:07:53 +00:00
800c665618 Revert "[inductor] Add ir.Scan and lower aten.cumsum on CUDA (#106581)"
This reverts commit 5976a08eea1656a0f5420661b33e0937248f2097.

Reverted https://github.com/pytorch/pytorch/pull/106581 on behalf of https://github.com/peterbell10 due to This combined with #108803 uncovered a triton bug openai/triton#2298 ([comment](https://github.com/pytorch/pytorch/pull/106581#issuecomment-1719811113))
2023-09-14 16:58:52 +00:00
1b502139f3 Added a flag is_cpu to the AOTInductor runtime (#109300)
Summary:
added a flag is_cpu that can be specified by the user to
indicate whether the AOTInductor runtime is for CPU. It's
false by default.

Test Plan: ci

Reviewed By: hl475, aakhundov

Differential Revision: D49253826

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109300
Approved by: https://github.com/aakhundov
2023-09-14 16:24:09 +00:00
3acccb3aa0 [AOTInductor] Add is_cpu for AOTInductorModelContainer (#109287)
Summary:
If is_cpu is set for the model container, there is no need to move the weights from CPU to the device.

Reviewed By: bertmaher

Differential Revision: D49252595

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109287
Approved by: https://github.com/aakhundov
2023-09-14 16:24:01 +00:00
b226373d16 Revert "add Half support for BatchNorm on CPU (#102070)"
This reverts commit b6a1d3fb97ca8eeccf15a4c495fdd1af4b197f88.

Reverted https://github.com/pytorch/pytorch/pull/102070 on behalf of https://github.com/clee2000 due to I'm very sorry but it looks like #106543 was not fixed, I still see it failing on main b6a1d3fb97 https://github.com/pytorch/pytorch/actions/runs/6185704949/job/16793975677 ([comment](https://github.com/pytorch/pytorch/pull/102070#issuecomment-1719747065))
2023-09-14 16:13:34 +00:00
94a54b89aa [dynamo] Add BACKEND_MATCH guard to detect and recompile when backend changes (#107337)
**Motivation:**
We try to make torch.cond use torch.compile automatically so that we can error out when there are side effects in the branches and correctly handle the closures.

Before this PR, we have a warning if we don't turn on a config raise_on_backend_change (turning it on gives us an error) for the following code:
```python
def foo(*args): ...

# Inside torch.cond, we'd like to do something like
torch.compile(foo, backend="eager", fullgraph=True)(...)
...
# Users may then call torch.compile somewhere else.
# Dynamo will use the cached code of foo for "eager" backend
# but we expect dynamo to recompile with "inductor" backend.
torch.compile(foo, backend="inductor")(...)
```

This PR adds a BACKEND_MATCH guard. Effectively, it implements a per-backend cache. In the above example, the cached code for "eager" won't work for "inductor" due to guard check failures and the second torch.compile will do a re-compilation. In the future, it might be useful to have something like a configuration guard that guards against dynamo configuration changes across different compiles (e.g. compile a function with fullgraph=False then compile it again with fullgraph=True).
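
A hedged illustration of the resulting behavior:

```python
import torch

def foo(x):
    return x.sin()

x = torch.randn(3)
torch.compile(foo, backend="eager")(x)     # compiles and caches for "eager"
torch.compile(foo, backend="inductor")(x)  # BACKEND_MATCH guard fails on the
                                           # cached entry, so foo recompiles
```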

**Implementation:**
1. We add a guarded_backend_cache and check the most_recent_backend against the backend associated with cached code. We also remove the raise_on_backend_change flag.

Note: More lines are printed in the debug log due to the newly added context manager and guard additions.

**Test Plan:**
Removed original tests that raise on different backend and add a new test to test whether the BACKEND_MATCH guard can guard against backend change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107337
Approved by: https://github.com/jansel
2023-09-14 15:49:30 +00:00
9b3f5823f3 Added test for interpolate nearest exact (#108558)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108558
Approved by: https://github.com/mikaylagawarecki
2023-09-14 15:17:33 +00:00
111b9ef390 [ROCM] Enable test_fn_fwgrad_..._functional_binary_cross_entropy on ROCM (#109038)
Fixes #98431
This change addresses a hardware assertion that was triggered in ROCm-only tests.

Description of the problem:
Assertion triggered
```
Device-side assertion `target_val >= zero && target_val <= one' failed.
```
The issue in question is due to a GPU-side assertion in `binary_cross_entropy_out_cuda` where a `target_val` gets passed to the kernel that does not fall between 0 and 1. The value in question that triggers the assertion is -0.000000000810. The origin of this negative value is one of the tensors generated for the test. In this tensor, one of the values (on ROCm) is 0.000000999190, which adheres to the restriction that it is between 0 and 1. However, this value is eventually passed as a single-entry tensor to gradcheck.py::_compute_numerical_gradient
( https://github.com/pytorch/pytorch/blob/main/torch/autograd/gradcheck.py#L347)

This function perturbs the tensor value in-place by subtracting `v` from it and then adding it back. The value of `v` comes from the default `eps` value defined here https://github.com/pytorch/pytorch/blob/main/torch/autograd/gradcheck.py#L2119

Currently pegged at `1e-6`. So what occurs is that when an input is less than the default eps (like 0.000000999190), the perturbation calculation causes an entry in the tensor to flip to negative, i.e. 0.000000999190 - 1e-6 = -0.000000000810 (due to the subtraction here: https://github.com/pytorch/pytorch/blob/main/torch/autograd/gradcheck.py#L364), which then triggers the device-side assertion in `binary_cross_entropy_out_cuda`.
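
A quick numeric check of the flip described above:

```python
eps = 1e-6
target = 0.000000999190   # valid BCE target in [0, 1]
print(target - eps)       # ~ -8.1e-10: the perturbed value leaves [0, 1]
```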

This PR loosens the EPS by an order of magnitude to get around the error. Since this issue has not been caught in the field in any meaningful way, I find this to be an adequate solution, though am happy to hear opposing viewpoints.

It is important to mention that while this error was only occurring on ROCm platforms, the issue described is also present in CUDA-based environments. The difference is that CUDA doesn't seem to generate a tensor with any values less than `1e-6`. When injecting the small value on an Nvidia box, the same device-side assertion was triggered.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109038
Approved by: https://github.com/jeffdaily, https://github.com/albanD
2023-09-14 15:17:29 +00:00
7f1f5afc91 Run only one pytest parametrization when generating optest (#108936)
Richard, I'm curious to see what you think of this. I'm trying to use optest on the torchvision test suite, and after hacking up pytest support in https://github.com/pytorch/pytorch/pull/108929 I noticed that this was 5x'ing the test time... for no good reason.

* torchvision nms tests before optests: 60 passed, 4 skipped, 1206 deselected in 11.47s
* after optests: 300 passed, 20 skipped, 1206 deselected in 49.85s

There's no good reason for this: torchvision has parametrized the tests to get a spread of various random values, but for checking schemas or fake tensors, we don't actually need to test different values.

This PR hacks up the codegen to replace pytest parametrize markers so that, instead of sampling many values, we sample only one value if you mark it with `opcheck_only_one`. There's a carveout for device parametrization, where we always run all those variants.
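
A sketch of how a test opts in (usage inferred from the marker named above; hedged):

```python
import pytest

@pytest.mark.parametrize("seed", range(10))
@pytest.mark.opcheck_only_one()  # generated optests sample a single `seed`;
                                 # device parametrizations still run in full
def test_nms(seed):
    ...
```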

With this PR:

* reduced optests: 88 passed, 4 skipped, 1206 deselected in 13.89s

Companion torchvision PR which uses this at https://github.com/pytorch/vision/pull/7961

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108936
Approved by: https://github.com/zou3519
2023-09-14 14:54:57 +00:00
7f7f6267e9 [AOTInductor] Skip pre_grad_passes for exported graph. (#109246)
Summary:
We skip pre_grad_passes if the graph comes from export (aten IR), since
pre_grad_passes (i.e. remove_identity) would not preserve meta["val"] in aten IR.

Differential Revision: D49246374

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109246
Approved by: https://github.com/aakhundov
2023-09-14 13:30:12 +00:00
b6a1d3fb97 add Half support for BatchNorm on CPU (#102070)
Fixes #106543

### Testing

Single core:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
(1, 4, 256, 256) | 0.7116 | 0.1427 | 0.1744 | 0.2638 | 0.2002 | 0.2556
(1, 32, 100, 100) | 0.8579 | 0.1725 | 0.2077 | 0.3023 | 0.2399 | 0.2995
(32, 16, 200, 200) | 57.3466 | 12.2179 | 13.1320 | 45.9524 | 24.1526 | 24.9882

28 cores:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
(1, 4, 256, 256) | 0.2571 | 0.0713 | 0.0846 | 0.1140 | 0.0883 |  0.1043
(1, 32, 100, 100) | 0.1077 | 0.0510 | 0.0548 | 0.0700 | 0.0645 | 0.0713
(32, 16, 200, 200) | 5.5060 | 1.4195 | 1.4663 | 6.773 | 3.0886 | 3.1343

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102070
Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki
2023-09-14 12:23:59 +00:00
41e2189843 [quant] Remove reference representation rewrite for adaptive_avg_pool2d (#108924)
Summary:
Integer adaptive_avg_pool2d is not well defined due to different possible ways of rounding an fp32 value to an integer value, and
this op isn't too critical for numerics (since it appears not too often), so we'll skip this for now.

We might need to revert the changes that add the integer impl for the adaptive_avg_pool op as well.

Test Plan:
python test/test_quantization.py TestQuantizePT2ERepresentation

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108924
Approved by: https://github.com/kimishpatel
2023-09-14 10:18:36 +00:00
a6fadf643f Re-do D48544397: [TGIF Inplace] [xlv2][1/n] Expose a couple APIs from inline_container that will be used for chunk read" (#109183)
Summary:
Original commit changeset: 4a5f31518ad0

Original Phabricator Diff: D48544397

fix easycla

Differential Revision: D49221088

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109183
Approved by: https://github.com/wqfish
2023-09-14 08:17:14 +00:00
9cd4548f01 AOTInductor dynamic shape (#109012)
Summary: This PR adds dynamic-shape support for AOTInductor

* On the runtime/interface side, we added two structs, StaticDimInfo
and DynamicDimInfo, to hold values for static and dynamic dimensions,
respectively. Dynamic dimensions are tracked by an unordered map field
defined in AOTInductorModelBase. At inference time, the inference run
method will assign the current real dimensional value to each dynamic
dimension before executing any kernel.

* On the CUDA wrapper codegen side, we generate dynamic symbols
appropriately for shape computations. We simulate kernel launch grids
in the C++ land by re-using the grid functions from the Python world.
The returned grid configs, which may contain symbolic expressions,
are printed out in their C++ forms via the CppPrinter. Note that
when dynamic shapes are involved, we have to compute grid configs
for each kernel at runtime in the same way as we do for launching
the corresponding Triton kernel. Otherwise, we may end up with
memory-access failures or mis-computations caused by invalid indices
for fetching or storing data in device memory.

Differential Revision: D49100472

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109012
Approved by: https://github.com/khabinov, https://github.com/desertfire, https://github.com/hl475
2023-09-14 08:00:30 +00:00
f4e96df60a [export] Preserve shape dynamism for unused inputs. (#109239)
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109239
Approved by: https://github.com/ydwu4
2023-09-14 07:43:36 +00:00
25bf1a49c0 [FSDP][Wrap] ModuleWrapPolicy callable (#109117)
Makes ModuleWrapPolicy callable; in my case this is needed for
composition with or_policy. We should also make or_policy a public interface,
IMO.

Differential Revision: [D49175112](https://our.internmc.facebook.com/intern/diff/D49175112/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109117
Approved by: https://github.com/fegin
ghstack dependencies: #109116
2023-09-14 07:14:18 +00:00
f558e86fa0 [FSDP] continue if param not exist in sharded load (#109116)
If I add a param and then wrap with FSDP and load a state dict, don't hard
error here when strict=False.

Differential Revision: [D49170812](https://our.internmc.facebook.com/intern/diff/D49170812/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109116
Approved by: https://github.com/fegin
2023-09-14 07:14:18 +00:00
6898754401 [ONNX] bump ort-nightly==1.16.0.dev20230908001 (#109212)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109212
Approved by: https://github.com/malfet
2023-09-14 05:04:18 +00:00
90068ab30a Fix CUDA-12 wheel loading on AmazonLinux (#109244)
Or any other distro that has different purelib and platlib paths. The regression was introduced when the small-wheel base dependency was migrated from CUDA-11 to CUDA-12.

Not sure why, but the minor version of the package is no longer shipped with the following CUDA-12 packages:
 - nvidia_cuda_nvrtc_cu12-12.1.105
 - nvidia-cuda-cupti-cu12-12.1.105

But those were present in the CUDA-11 release, e.g.:
``` shell
bash-5.2# curl -OL 922c5996aa/nvidia_cuda_nvrtc_cu11-11.7.99-2-py3-none-manylinux1_x86_64.whl; unzip -t nvidia_cuda_nvrtc_cu11-11.7.99-2-py3-none-manylinux1_x86_64.whl |grep \.so
    testing: nvidia/cuda_nvrtc/lib/libnvrtc-builtins.so.11.7   OK
    testing: nvidia/cuda_nvrtc/lib/libnvrtc.so.11.2   OK
bash-5.2# curl -OL c64c03f49d/nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl; unzip -t nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl|grep \.so
    testing: nvidia/cuda_nvrtc/lib/libnvrtc-builtins.so.12.1   OK
    testing: nvidia/cuda_nvrtc/lib/libnvrtc.so.12   OK
```

Fixes https://github.com/pytorch/pytorch/issues/109221

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109244
Approved by: https://github.com/huydhn
2023-09-14 03:13:32 +00:00
47f79e9a2b Revert "Support SymBool input to torch.compile (#107850)"
This reverts commit 9f6d70b2fdbc4847dbff7c807c5620b4b408bb59.

Reverted https://github.com/pytorch/pytorch/pull/107850 on behalf of https://github.com/huydhn due to Sorry for reverting this, but test_export_with_symbool_inputs is failing in trunk a08e1370ef ([comment](https://github.com/pytorch/pytorch/pull/107850#issuecomment-1718675877))
2023-09-14 02:53:36 +00:00
de76c88d90 Revert "Remove expected test failures for cond (#108709)"
This reverts commit a08e1370ef8cb13cfbf18d9663427a57fa8657f2.

Reverted https://github.com/pytorch/pytorch/pull/108709 on behalf of https://github.com/huydhn due to Sorry for reverting this, but test_export_with_symbool_inputs is failing in trunk a08e1370ef ([comment](https://github.com/pytorch/pytorch/pull/108709#issuecomment-1718669964))
2023-09-14 02:47:28 +00:00
05170b0b73 Reformat line of code header to put co_name after (#109233)
I find this more intuitive as it matches the default Python traceback
formatting.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109233
Approved by: https://github.com/williamwen42
2023-09-14 02:07:16 +00:00
c914ca7577 [quant][be] Add TestPT2ERepresentation test case (#108923)
Summary:
att

Test Plan:
python test/test_quantization.py TestPT2ERepresentation
Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108923
Approved by: https://github.com/andrewor14
2023-09-14 02:01:38 +00:00
064ae9ff33 Support register_hook on input tensors (#108903)
The strategy in this PR is pretty straightforward.

There are 2 kinds of hooks:

1) Hooks on objects with sources (inputs, params)
2) Hooks on objects w/o sources (intermediaries, and outputs).

Note: As outputs can be made simple by how dynamo handles residuals, they could actually be handled as if they were inputs, but, for the sake of this PR, we will refer to hooks as either hooks on inputs (sourced), or hooks on intermediaries (not sourced).

The plan:

**For tensors w/ a source:**
We record registered hooks, store them as a global, and associate them with the tensor in residuals. This means that when dynamo goes to create the frame, where we produce bytecode to stitch together our PT2 modified bytecode with the original eager code, we call `register_hook`. This registration of hooks in residuals is sound because (a) it happens right after a PT2 frame region ends and (b) we know that the tensor is alive in f_locals, f_globals, or a module in the user's invoking frame. This means we can soundly know it will be around to invoke `register_hook` on. As long as we guard on the identity of the lifted function, this is sound to do.

**For tensors w/o a source:**
Graph break - we will support this in a subsequent PR

**Handles:**

An interesting new component here is the creation of a `STORE_FAST`->`LOAD_FAST` pair associated with the handle, the return result of `register_hook`. If the user code stored the result of `register_hook` in a handle, we need to honor that. We do so by interceding in `STORE_FAST` and recording the name of the local variable as directed by the user code. We then honor that same name in the reconstructed bytecode. If the user did not store a hook, we merely pop the produced value to preserve the stack.
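
A hedged example of the handle case being honored:

```python
import torch

def fn(x):
    h = x.register_hook(lambda grad: grad * 2)  # user stores the handle
    y = x.sin()
    h.remove()  # dynamo must keep `h` addressable under the same local name
    return y

fn(torch.randn(3, requires_grad=True))
```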

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108903
Approved by: https://github.com/ezyang
ghstack dependencies: #108846, #109092
2023-09-14 01:52:21 +00:00
50a084070f [inductor][easy] Enable mypy checking for all inductor files that already pass (#109238)
Summary: Let's just enable mypy checking wherever it already passes. I checked all entries in the exclude list and enabled any that individually pass. Also needed one trivial change to a file already enabled.

Test Plan: `lintrunner torch/_inductor/*.py torch/_inductor/*/*.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109238
Approved by: https://github.com/eellison
2023-09-14 01:45:25 +00:00
acad84ba6c Disable cutlass tests in fbcode (#109241)
Summary: ATT, fbcode requires different cutlass path setup.

Test Plan: CI

Differential Revision: D49242138

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109241
Approved by: https://github.com/DanilBaibak, https://github.com/chenyang78
2023-09-14 01:41:10 +00:00
62732bdcdb [ez][inductor][fx passes] quick fix for invalid nodes (#109234)
Summary: As title. Need to check whether a node is valid before fusion.

Test Plan: To add test

Differential Revision: D49241525

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109234
Approved by: https://github.com/yanboliang
2023-09-14 01:40:49 +00:00
5edbee9404 [export] Normalize nn_module_stack paths. (#109231)
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109231
Approved by: https://github.com/angelayi
2023-09-14 01:34:31 +00:00
109ab6a0df Support str() on user defined functions (#108973)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108973
Approved by: https://github.com/anijain2305
2023-09-14 01:32:02 +00:00
a08e1370ef Remove expected test failures for cond (#108709)
Remove the expected failure in def test_control_flow_tracing(self) by changing the error message to `Expected pred to be bool or tensor, but got Proxy\(eq\)`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108709
Approved by: https://github.com/ezyang, https://github.com/zou3519
ghstack dependencies: #107662, #107850
2023-09-14 01:16:29 +00:00
9f6d70b2fd Support SymBool input to torch.compile (#107850)
We could have SymBool inputs for torch.compile, e.g. in the following situation:
```
def f(x: torch.Tensor):
    pred = x.size(0) == 3        # a SymBool under symbolic tracing
    torch.compile(f)(pred, x)    # schematic: the compiled call receives a SymBool input

make_fx(f, tracing_mode="symbolic")(x)
```

The idea of this PR (credit to @ezyang) is to support SymBool by re-using the infra we've already had for SymInt so that we don't need to replicate a lot of stuff.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107850
Approved by: https://github.com/ezyang
ghstack dependencies: #107662
2023-09-14 01:16:29 +00:00
025d1a18ab [export] Separate out exported_program.py (#109147)
Test Plan: CI

Differential Revision: D49205011

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109147
Approved by: https://github.com/zhxchen17
2023-09-14 01:14:46 +00:00
4a09ed5459 [inductor] Parallelize Max Autotune step 2: Use multiple GPUs (#109127)
Test Plan:
`python test/inductor/test_max_autotune.py`
`TORCHINDUCTOR_AUTOTUNE_IN_SUBPROC=1 TORCHINDUCTOR_MAX_AUTOTUNE=1 python benchmarks/dynamo/torchbench.py --device cuda --performance --backend inductor --inference --only hf_Bart`
`TORCHINDUCTOR_AUTOTUNE_MULTI_DEVICE=1 TORCHINDUCTOR_AUTOTUNE_IN_SUBPROC=1 TORCHINDUCTOR_MAX_AUTOTUNE=1 python benchmarks/dynamo/torchbench.py --device cuda --performance --backend inductor --inference --only hf_Bart`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109127
Approved by: https://github.com/shunting314, https://github.com/eellison
ghstack dependencies: #109126
2023-09-14 00:37:39 +00:00
ce4283933f [inductor] Parallelize Max Autotune step 1: refactor autotune_process (#109126)
Summary: Step 1 in revamping subprocess autotune to support multiple GPUs. This diff just does some refactoring to autotune_process.py in order to prepare for the next diff:
* Move all logic for managing the sub-process (like detecting sub-process crashes) into the TuningProcess class.
* Use log.debug statements instead of print statements

Test Plan: python test/inductor/test_max_autotune.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109126
Approved by: https://github.com/shunting314, https://github.com/eellison
2023-09-14 00:37:39 +00:00
dbddf1816a Remove include_0d from sample_inputs_gather (#109125)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109125
Approved by: https://github.com/lezcano
ghstack dependencies: #108879, #108880, #109120
2023-09-13 23:13:09 +00:00
61f0578787 Update take_along_dim docs to include dim=None case (#109120)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109120
Approved by: https://github.com/lezcano
ghstack dependencies: #108879, #108880
2023-09-13 23:13:09 +00:00
d046376c4f Dispatch numpy.take_along_axis to torch.take_along_dim (#108880)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108880
Approved by: https://github.com/lezcano
ghstack dependencies: #108879
2023-09-13 23:13:09 +00:00
49e3d76684 Add SymInt support to torch.take_along_dim (#108879)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108879
Approved by: https://github.com/Skylion007, https://github.com/lezcano, https://github.com/Chillee
2023-09-13 23:13:09 +00:00
aca3bd44d1 Fix failing inductor test (#109220)
Summary: This broke as a result of the flashv2 PR. The tests couldn't be listed except on an A100 machine, which is weird.

Test Plan: buck2 test 'fbcode//mode/opt' fbcode//caffe2/test/inductor:fused_attention

Differential Revision: D49239716

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109220
Approved by: https://github.com/eellison
2023-09-13 23:12:32 +00:00
33c94b8b16 Better error handling for cond (#108817)
## Exception in cond:
For code below:
```python
import torch
import functorch.experimental.control_flow as control_flow
def true_fn(x):
    return x.sin()

def false_fn(x):
    return x, x

def f(x, y):
    return control_flow.cond(y, true_fn, false_fn, [x])

f(torch.ones(3, 4), torch.tensor(False))
```
The original exception stack trace is:
```python
Traceback (most recent call last):
  File "/home/yidi/local/pytorch/test_exc.py", line 33, in <module>
    f(torch.ones(3, 4), torch.tensor(False))
  File "/home/yidi/local/pytorch/test_exc.py", line 31, in f
    return control_flow.cond(y, true_fn, false_fn, [x])
  File "/home/yidi/local/pytorch/torch/_higher_order_ops/cond.py", line 154, in cond
    return torch.compile(cond_op, backend="eager", fullgraph=True)(
  File "/home/yidi/local/pytorch/torch/_dynamo/eval_frame.py", line 365, in _fn
    return fn(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/eval_frame.py", line 513, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state)
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 140, in _fn
    return fn(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 380, in _convert_frame_assert
    return _compile(
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 560, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/home/yidi/local/pytorch/torch/_dynamo/utils.py", line 197, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 482, in compile_inner
    out_code = transform_code_object(code, transform)
  File "/home/yidi/local/pytorch/torch/_dynamo/bytecode_transformation.py", line 1028, in transform_code_object
    transformations(instructions, code_options)
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 449, in transform
    tracer.run()
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 2083, in run
    super().run()
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 733, in run
    and self.step()
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 696, in step
    getattr(self, inst.opname)(inst)
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 397, in wrapper
    return inner_fn(self, inst)
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 1164, in CALL_FUNCTION_EX
    self.call_function(fn, argsvars.items, kwargsvars.items)
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 570, in call_function
    self.push(fn.call_function(self, args, kwargs))
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/higher_order_ops.py", line 418, in call_function
    (false_r, false_graph, false_lifted_freevars) = speculate_branch(False)
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/higher_order_ops.py", line 410, in speculate_branch
    raise UncapturedHigherOrderOpError(
torch._dynamo.exc.UncapturedHigherOrderOpError: Expected branch to return a single tensor

from user code:
   File "/home/yidi/local/pytorch/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
```
After this PR we get:
```python
Traceback (most recent call last):
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/higher_order_ops.py", line 50, in graph_break_as_hard_error
    return fn(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/higher_order_ops.py", line 429, in call_function
    (false_r, false_graph, false_lifted_freevars) = speculate_branch(False)
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/higher_order_ops.py", line 421, in speculate_branch
    unimplemented(
  File "/home/yidi/local/pytorch/torch/_dynamo/exc.py", line 187, in unimplemented
    raise Unsupported(msg)
torch._dynamo.exc.Unsupported: Expected branch to return a single tensor

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/yidi/local/pytorch/test_exc.py", line 33, in <module>
    f(torch.ones(3, 4), torch.tensor(False))
  File "/home/yidi/local/pytorch/test_exc.py", line 31, in f
    return control_flow.cond(y, true_fn, false_fn, [x])
  File "/home/yidi/local/pytorch/torch/_higher_order_ops/cond.py", line 154, in cond
    return torch.compile(cond_op, backend="eager", fullgraph=True)(
  File "/home/yidi/local/pytorch/torch/_dynamo/eval_frame.py", line 338, in _fn
    return fn(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/eval_frame.py", line 500, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state)
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 140, in _fn
    return fn(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 382, in _convert_frame_assert
    return _compile(
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 562, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/home/yidi/local/pytorch/torch/_dynamo/utils.py", line 189, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 484, in compile_inner
    out_code = transform_code_object(code, transform)
  File "/home/yidi/local/pytorch/torch/_dynamo/bytecode_transformation.py", line 1028, in transform_code_object
    transformations(instructions, code_options)
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 451, in transform
    tracer.run()
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 2088, in run
    super().run()
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 728, in run
    and self.step()
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 691, in step
    getattr(self, inst.opname)(inst)
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 392, in wrapper
    return inner_fn(self, inst)
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 1159, in CALL_FUNCTION_EX
    self.call_function(fn, argsvars.items, kwargsvars.items)
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 565, in call_function
    self.push(fn.call_function(self, args, kwargs))
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/higher_order_ops.py", line 53, in graph_break_as_hard_error
    raise UncapturedHigherOrderOpError(reason + msg) from e
torch._dynamo.exc.UncapturedHigherOrderOpError: Cond doesn't work unless it is captured completely with torch.compile. Scroll up to find out what causes the graph break.

from user code:
   File "/home/yidi/local/pytorch/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
```
## Exception during speculating branches
The example code below has an in-place buffer mutation error:
```python
import torch
import functorch.experimental.control_flow as control_flow

class Foo(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("buffer", torch.ones(6, 4))

    def forward(self, x):
        def true_fn(x):
            self.buffer += 1
            return self.buffer.sum() + x.sum()

        def false_fn(x):
            return (x - 1).sum()

        return control_flow.cond(x.shape[0] > 4, true_fn, false_fn, [x])

mod_for_compile = torch.compile(Foo(), backend="eager", dynamic=True)
mod_for_compile(torch.ones(3, 4))
```

Before this PR the exception looks like:
```python
[2023-09-08 15:20:03,332] [0/0] torch._dynamo.variables.higher_order_ops: [WARNING] speculate_subgraph: while introspecting cond, we were unable to trace function `true_fn` into a single graph. This means that Dynamo was unable to prove safety for this API and will fall back to eager-mode PyTorch, which could lead to a slowdown.
[2023-09-08 15:20:03,332] [0/0] torch._dynamo.variables.higher_order_ops: [ERROR] Can't inplace modify module params/buffers inside HigherOrderOp
Traceback (most recent call last):
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/higher_order_ops.py", line 163, in speculate_subgraph
    output = f.call_function(tx, args, sub_kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/functions.py", line 90, in call_function
    return tx.inline_user_function_return(
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 606, in inline_user_function_return
    result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 2200, in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 2316, in inline_call_
    tracer.run()
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 733, in run
    and self.step()
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 696, in step
    getattr(self, inst.opname)(inst)
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 1219, in STORE_ATTR
    .call_function(self, [obj, ConstantVariable(inst.argval), val], {})
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/builtin.py", line 618, in call_function
    result = handler(tx, *args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/builtin.py", line 1169, in call_setattr
    raise AttributeMutationError(
torch._dynamo.exc.AttributeMutationError: Can't inplace modify module params/buffers inside HigherOrderOp

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/higher_order_ops.py", line 394, in speculate_branch
    ret_val, ret_graph, ret_lifted_freevars = speculate_subgraph(
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/higher_order_ops.py", line 222, in speculate_subgraph
    raise Unsupported(
torch._dynamo.exc.Unsupported: speculate_subgraph: while introspecting cond, we were unable to trace function `true_fn` into a single graph. This means that Dynamo was unable to prove safety for this API and will fall back to eager-mode PyTorch, which could lead to a slowdown. Scroll up for the stack trace of the initial exception. The reason was: Can't inplace modify module params/buffers inside HigherOrderOp

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/yidi/local/pytorch/test_exc.py", line 20, in <module>
    mod_for_compile(torch.ones(3, 4))
  File "/home/yidi/local/pytorch/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/nn/modules/module.py", line 1528, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/eval_frame.py", line 365, in _fn
    return fn(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/nn/modules/module.py", line 1519, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/nn/modules/module.py", line 1528, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/eval_frame.py", line 513, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state)
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 632, in _convert_frame
    result = inner_convert(frame, cache_entry, hooks, frame_state)
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 140, in _fn
    return fn(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 380, in _convert_frame_assert
    return _compile(
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 560, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/home/yidi/local/pytorch/torch/_dynamo/utils.py", line 197, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 482, in compile_inner
    out_code = transform_code_object(code, transform)
  File "/home/yidi/local/pytorch/torch/_dynamo/bytecode_transformation.py", line 1028, in transform_code_object
    transformations(instructions, code_options)
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 449, in transform
    tracer.run()
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 2083, in run
    super().run()
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 733, in run
    and self.step()
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 696, in step
    getattr(self, inst.opname)(inst)
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 397, in wrapper
    return inner_fn(self, inst)
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 1124, in CALL_FUNCTION
    self.call_function(fn, args, {})
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 570, in call_function
    self.push(fn.call_function(self, args, kwargs))
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/functions.py", line 261, in call_function
    return super().call_function(tx, args, kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/functions.py", line 90, in call_function
    return tx.inline_user_function_return(
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 606, in inline_user_function_return
    result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 2200, in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 2316, in inline_call_
    tracer.run()
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 733, in run
    and self.step()
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 696, in step
    getattr(self, inst.opname)(inst)
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 397, in wrapper
    return inner_fn(self, inst)
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 1124, in CALL_FUNCTION
    self.call_function(fn, args, {})
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 570, in call_function
    self.push(fn.call_function(self, args, kwargs))
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/higher_order_ops.py", line 415, in call_function
    (true_r, true_graph, true_lifted_freevars) = speculate_branch(True)
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/higher_order_ops.py", line 405, in speculate_branch
    raise UncapturedHigherOrderOpError(
torch._dynamo.exc.UncapturedHigherOrderOpError: Cond doesn't work unless it is captured completely with torch.compile

from user code:
   File "/home/yidi/local/pytorch/test_exc.py", line 16, in forward
    return control_flow.cond(x.shape[0] > 4, true_fn, false_fn, [x])
  File "/home/yidi/local/pytorch/torch/_higher_order_ops/cond.py", line 127, in cond
    return cond_op(pred, true_fn, false_fn, operands)

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
```

After this PR, the only difference is that the error message of `UncapturedHigherOrderOpError` changes from `Cond doesn't work unless it is captured completely with torch.compile` to `Cond doesn't work unless it is captured completely with torch.compile. Scroll up to find out what causes the graph break`.

```python
[2023-09-08 15:17:02,052] [0/0] torch._dynamo.variables.higher_order_ops: [WARNING] speculate_subgraph: while introspecting cond, we were unable to trace function `true_fn` into a single graph. This means that Dynamo was unable to prove safety for this API and will fall back to eager-mode PyTorch, which could lead to a slowdown.
[2023-09-08 15:17:02,052] [0/0] torch._dynamo.variables.higher_order_ops: [ERROR] Can't inplace modify module params/buffers inside HigherOrderOp
Traceback (most recent call last):
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/higher_order_ops.py", line 177, in speculate_subgraph
    output = f.call_function(tx, args, sub_kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/functions.py", line 90, in call_function
    return tx.inline_user_function_return(
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 601, in inline_user_function_return
    result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 2193, in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 2300, in inline_call_
    tracer.run()
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 728, in run
    and self.step()
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 691, in step
    getattr(self, inst.opname)(inst)
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 1214, in STORE_ATTR
    .call_function(self, [obj, ConstantVariable(inst.argval), val], {})
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/builtin.py", line 618, in call_function
    result = handler(tx, *args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/builtin.py", line 1169, in call_setattr
    raise AttributeMutationError(
torch._dynamo.exc.AttributeMutationError: Can't inplace modify module params/buffers inside HigherOrderOp

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/higher_order_ops.py", line 50, in graph_break_as_hard_error
    return fn(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/higher_order_ops.py", line 426, in call_function
    (true_r, true_graph, true_lifted_freevars) = speculate_branch(True)
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/higher_order_ops.py", line 410, in speculate_branch
    ret_val, ret_graph, ret_lifted_freevars = speculate_subgraph(
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/higher_order_ops.py", line 236, in speculate_subgraph
    raise Unsupported(
torch._dynamo.exc.Unsupported: speculate_subgraph: while introspecting cond, we were unable to trace function `true_fn` into a single graph. This means that Dynamo was unable to prove safety for this API and will fall back to eager-mode PyTorch, which could lead to a slowdown. Scroll up for the stack trace of the initial exception. The reason was: Can't inplace modify module params/buffers inside HigherOrderOp

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/yidi/local/pytorch/test_exc.py", line 20, in <module>
    mod_for_compile(torch.ones(3, 4))
  File "/home/yidi/local/pytorch/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/eval_frame.py", line 338, in _fn
    return fn(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/eval_frame.py", line 500, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state)
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 634, in _convert_frame
    result = inner_convert(frame, cache_entry, hooks, frame_state)
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 140, in _fn
    return fn(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 382, in _convert_frame_assert
    return _compile(
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 562, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/home/yidi/local/pytorch/torch/_dynamo/utils.py", line 189, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 484, in compile_inner
    out_code = transform_code_object(code, transform)
  File "/home/yidi/local/pytorch/torch/_dynamo/bytecode_transformation.py", line 1028, in transform_code_object
    transformations(instructions, code_options)
  File "/home/yidi/local/pytorch/torch/_dynamo/convert_frame.py", line 451, in transform
    tracer.run()
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 2088, in run
    super().run()
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 728, in run
    and self.step()
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 691, in step
    getattr(self, inst.opname)(inst)
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 392, in wrapper
    return inner_fn(self, inst)
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 1119, in CALL_FUNCTION
    self.call_function(fn, args, {})
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 565, in call_function
    self.push(fn.call_function(self, args, kwargs))
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/functions.py", line 261, in call_function
    return super().call_function(tx, args, kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/functions.py", line 90, in call_function
    return tx.inline_user_function_return(
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 601, in inline_user_function_return
    result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 2193, in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 2300, in inline_call_
    tracer.run()
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 728, in run
    and self.step()
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 691, in step
    getattr(self, inst.opname)(inst)
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 392, in wrapper
    return inner_fn(self, inst)
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 1119, in CALL_FUNCTION
    self.call_function(fn, args, {})
  File "/home/yidi/local/pytorch/torch/_dynamo/symbolic_convert.py", line 565, in call_function
    self.push(fn.call_function(self, args, kwargs))
  File "/home/yidi/local/pytorch/torch/_dynamo/variables/higher_order_ops.py", line 53, in graph_break_as_hard_error
    raise UncapturedHigherOrderOpError(reason + msg) from e
torch._dynamo.exc.UncapturedHigherOrderOpError: Cond doesn't work unless it is captured completely with torch.compile. Scroll up to find out what causes the graph break.

from user code:
   File "/home/yidi/local/pytorch/test_exc.py", line 16, in forward
    return control_flow.cond(x.shape[0] > 4, true_fn, false_fn, [x])
  File "/home/yidi/local/pytorch/torch/_higher_order_ops/cond.py", line 127, in cond
    return cond_op(pred, true_fn, false_fn, operands)

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108817
Approved by: https://github.com/zou3519
2023-09-13 23:03:59 +00:00
04a765f95d Revert "add Half support for BatchNorm on CPU (#102070)"
This reverts commit 6065e7a97cfad4c2ae2b8722969648a53265fa13.

Reverted https://github.com/pytorch/pytorch/pull/102070 on behalf of https://github.com/clee2000 due to sorry it looks like this is causing an unexpected success for `test_jit_fuser_te.py::TestNNCOpInfoCPU::test_nnc_correctness_nn_functional_batch_norm_cpu_float16` 6065e7a97c https://github.com/pytorch/pytorch/actions/runs/6178069462/job/16770849782 ([comment](https://github.com/pytorch/pytorch/pull/102070#issuecomment-1718402208))
2023-09-13 22:38:42 +00:00
c44f816960 Disable tests mentioned in 109213 (#109232)
#109213
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109232
Approved by: https://github.com/huydhn
2023-09-13 22:29:00 +00:00
2d26364fb3 [caffe2][cuda] Fix instrumentation of malloc/free SDTs for CUDACachingAllocator (#108907)
Summary:
There's currently a bug in `CUDACachingAllocator` which makes it impossible to determine whether a `malloc`ed sample has been deallocated (introduced in D48229150).

It happens because we currently instrument the `malloc` SDT **before** a block of memory has been allocated by either `cudaMalloc` or a local caching allocator `malloc` call. Since this is a static tracepoint, it receives arg values at the point of instrumentation. Currently, it receives the memory pointer, `void* p`, which is NULL.

Changes in this diff:
1) Move this SDT to right before the `allocate` function returns, so that memory has already been allocated and the `p` pointer points to a valid, non-NULL address.
2) Enable tracing of `cudaMalloc` calls, in addition to `NativeCachingAllocator::malloc`
3) Rename a poorly-named local var: `r` --> `devPtr` (pointer to the allocated memory block)

Test Plan:
Tested with a local PyTorch script that leaks memory. Verified the following:
* prior to this fix (prod), malloc samples are **not** marked as "freed"
* with the fix (branch), samples **are** marked as "freed"
* results are comparable with the current uprobe implementation to sample PyTorch malloc events in `gpusnoop`

Reviewed By: chaekit

Differential Revision: D48873734

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108907
Approved by: https://github.com/chaekit
2023-09-13 22:15:41 +00:00
faa5985dfe Fix issue when input/output buffer of functional collective (e.g. allreduce / allgather) is incorrectly reused later (#108811)
For this program:
```python
def func(a, *, tag, ranks, group_size):
    ar = torch.ops.c10d_functional.all_reduce(a, "sum", tag, ranks, group_size)
    ar = torch.ops.c10d_functional.wait_tensor(ar)
    c = torch.relu(a)
    # c = a
    d = torch.matmul(c, c)
    e = d + ar
    return (e,)
```
the generated code is:
```python
def call(args):
    arg0_1, = args
    args.clear()
    assert_size_stride(arg0_1, (4, 4), (4, 1))
    with torch.cuda._DeviceGuard(1):
        torch.cuda.set_device(1) # no-op to ensure context
        buf0 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.float32)
        buf0.copy_(arg0_1) #no reuse
        buf1_pg = c10d._find_or_create_pg_by_ranks_and_tag('', [0, 1], 2)
        buf1 = buf0
        buf1_work = dist.all_reduce(buf1, async_op=True, group=buf1_pg, op=fun_col_impl._str_to_reduce_op('sum'))
        fun_col_impl._register_tensor_work(buf1, buf1_work)
        del buf1
        buf0 = _wait_tensor(buf0)
        buf2 = buf0
        buf3 = buf0; del buf0  # reuse
        # Source Nodes: [relu], Original ATen: [aten.relu]
        stream1 = get_cuda_stream(1)
        triton_poi_fused_relu_0.run(arg0_1, buf3, 16, grid=grid(16), stream=stream1)
        del arg0_1
        buf4 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.float32)
        # Source Nodes: [add, relu], Original ATen: [aten.add, aten.relu]
        extern_kernels.addmm(buf2, buf3, buf3, alpha=1, beta=1, out=buf4)
        return (buf4, )
```
We can notice that the allreduce input (`buf1`, which is an alias of `buf0`) is incorrectly reused as the input (`buf3`) to the in-place Triton kernel `triton_poi_fused_relu_0`, diverging from eager-mode logic.

In general, we should make it so that Inductor doesn't try to reuse the input buffer to an inplace functional collective.

We have a similar problem for output buffer of out-of-place functional collectives, see https://github.com/pytorch/pytorch/issues/108780#issuecomment-1714921994.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108811
Approved by: https://github.com/Chillee, https://github.com/wconstab
2023-09-13 21:39:37 +00:00
54dd65f93a [FSDP] Only check exec order if DETAIL (#109049)
The execution order check seems to have been causing more problems than it prevents. Motivated by an internal issue, we move this check so that it only runs under `DISTRIBUTED_DEBUG_LEVEL=DETAIL`.
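
A minimal usage sketch (assumption: the standard `TORCH_DISTRIBUTED_DEBUG` environment variable is what selects this debug level):

```python
import os

# Hypothetical usage sketch: enable DETAIL before process group init so the
# FSDP execution-order check runs; any other level skips it.
os.environ["TORCH_DISTRIBUTED_DEBUG"] = "DETAIL"

import torch.distributed as dist

dist.init_process_group(
    "gloo", init_method="tcp://127.0.0.1:29500", rank=0, world_size=1
)
```
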
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109049
Approved by: https://github.com/fegin
2023-09-13 20:40:38 +00:00
916183a012 [MPS] Fix crash if nonzero is called concurrently (#108996)
Surrounds the `stream->synchronize()` call with `dispatch_sync(stream->queue(), ^{});`, which is a no-op for a single-threaded program but serializes synchronize calls across threads that use the same stream.

Prevents the non-recoverable `[IOGPUMetalCommandBuffer validate]:215: failed assertion 'commit an already committed command buffer'` exception, which is triggered every time one uses PyCharm to inspect tensors on the MPS device

Fixes https://github.com/pytorch/pytorch/issues/100285
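
A hedged repro sketch of the concurrent pattern this serializes (assumes an MPS-capable machine; before the fix, a run like this could trip the assertion quoted above):

```python
import threading
import torch

x = torch.randint(0, 2, (1024,), device="mps")

def worker():
    for _ in range(100):
        torch.nonzero(x)  # each call synchronizes the MPS stream

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```
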
<!--
copilot:poem
-->
### <samp>🤖 Generated by Copilot at 1662ce2</samp>

> _Sing, O Muse, of the swift and skillful coders_
> _Who fixed the dreadful deadlock of the stream_
> _That crashed the mighty tensors of the MPS_
> _When they sought out the nonzero elements._

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108996
Approved by: https://github.com/kulinseth
2023-09-13 19:28:47 +00:00
35aeb6aa85 Do not use a specific LOC in link (#108957)
The line of code (LOC) a link points to can change, so a specific LOC should not be used when creating a link. A specific LOC is also not needed here, since the function name is what the overall documentation refers to.
Previously, a fix updated the line number for the issue mentioned in this PR, but the LOC eventually changed again, resulting in a broken link.

Fixes #102183

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108957
Approved by: https://github.com/ezyang
2023-09-13 19:21:45 +00:00
32f50b7021 Improve type annotations for jit.script (#108782)
Fixes #108781

- [x] added `@overload` for `jit.script`
- [x] added typing unittest in `test/typing/pass/jit.py`
    - NOTE: unittest is not automatically checked by mypy when executing lintrunner currently. (how to fix?)
- [x] used `stubgen` to create [torch/jit/_script.pyi](https://github.com/pytorch/pytorch/pull/108782/files#diff-738e66abee2523a952b3ddbaecf95e187cce559473cf8c1b3da7c247ee5d1132) and added overloads there. (adding them inside `_script.py` itself interfered with the JIT engine)
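
A simplified sketch of the `@overload` pattern this adds (hypothetical signatures for illustration; the real stubs live in `torch/jit/_script.pyi`):

```python
from typing import Any, Callable, TypeVar, overload

T = TypeVar("T", bound=Callable[..., Any])

@overload
def script(obj: T) -> T: ...  # decorating a function preserves its type for IDEs
@overload
def script(obj: str) -> Callable[..., Any]: ...  # scripting source code

def script(obj):
    ...  # implementation dispatches on the argument's runtime type
```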

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108782
Approved by: https://github.com/ezyang
2023-09-13 19:20:25 +00:00
8851603a9c Back out "[Inductor] Extend Pattern Matcher to Match Equivalent Function Invocation (#107832)" (#109174)
Summary:
Original commit changeset: ad8e1321811a

Original Phabricator Diff: D49151331

Test Plan: Sandcastle

Differential Revision: D49218851

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109174
Approved by: https://github.com/hl475, https://github.com/yanboliang
2023-09-13 18:17:59 +00:00
c657d9ecc5 [PyTorch] Add Expanded call stack to nodes (#108426)
Summary:
To get a Node's call stack we currently loop on the InlinedCallStack graph and follow the "callee" chain. Since the node's inlined stack does not change, we can optimize this by expanding the node's inlined stack once and reusing it. This is particularly useful when reading the node's stack from another process (e.g. BPF) as it simplifies the memory traversal process.

The new data structure (NodeSourceInfo) only holds pointers to the function name and file name variables, and assumes these objects will be alive throughout the lifetime of the process.

Each Node has an extended attribute that has an index to a vector of stack frames `expanded_node_stacks_`

`node_stack_attr_symbol_` is only needed to make accessing the stack vector index attribute easier from BPF.

Test Plan:
- Performance Impact: The cost of expanding the call stack is between 500 - 1000 ns and is incurred only once per instruction node at initialization time.
- Verified using BPF Program in subsequent diffs

Reviewed By: zdevito

Differential Revision: D46578700

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108426
Approved by: https://github.com/zdevito
2023-09-13 17:48:47 +00:00
00908475e6 Use global variables to register the return_types namedtuples (#108832)
Fixes #69221. Builds on top of #107000, fixing the buck build issue linked [here](https://github.com/pytorch/pytorch/pull/107000#issuecomment-1708857375).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108832
Approved by: https://github.com/zou3519
2023-09-13 17:42:46 +00:00
6065e7a97c add Half support for BatchNorm on CPU (#102070)
Fixes #106543

### Testing

Single core:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
(1, 4, 256, 256) | 0.7116 | 0.1427 | 0.1744 | 0.2638 | 0.2002 | 0.2556
(1, 32, 100, 100) | 0.8579 | 0.1725 | 0.2077 | 0.3023 | 0.2399 | 0.2995
(32, 16, 200, 200) | 57.3466 | 12.2179 | 13.1320 | 45.9524 | 24.1526 | 24.9882

28 cores:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
(1, 4, 256, 256) | 0.2571 | 0.0713 | 0.0846 | 0.1140 | 0.0883 |  0.1043
(1, 32, 100, 100) | 0.1077 | 0.0510 | 0.0548 | 0.0700 | 0.0645 | 0.0713
(32, 16, 200, 200) | 5.5060 | 1.4195 | 1.4663 | 6.773 | 3.0886 | 3.1343
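
Below is a minimal sketch of the path this PR enables (illustrative only; Half BatchNorm forward and backward on CPU, mirroring the first benchmark row):

```python
import torch

bn = torch.nn.BatchNorm2d(4).to(torch.half)
x = torch.randn(1, 4, 256, 256, dtype=torch.half, requires_grad=True)
y = bn(x)            # fp16 forward on CPU
y.sum().backward()   # fp16 backward on CPU
```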

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102070
Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki
2023-09-13 17:30:16 +00:00
f6d8ecf9b3 Use the correct channel token when uploading nightly triton conda (#109073)
This fixes 2 bugs on triton build workflow:

* Use the wrong conda credential when `UPLOAD_CHANNEL` is not set https://github.com/pytorch/pytorch/actions/runs/6129675580/job/16691419329#step:7:18
* Upload wheel and conda packages when pushing to main in addition to nightly.  This is needed because the binary wheel build on trunk also looks for torchtriton package after the triton pin is updated.

### Testing

https://github.com/pytorch/pytorch/actions/runs/6152447684/job/16694843862?pr=109073#step:7:38 looks correct now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109073
Approved by: https://github.com/atalman
2023-09-13 17:12:33 +00:00
c9fdfafb00 Allow marking multiple unstable configs of the same job name (#109185)
This is a bug that has stayed for a surprisingly long period of time (my fault).  When there are multiple unstable configurations (`inductor`, `inductor_huggingface`, `inductor_huggingface_dynamic`) of the same job (`inductor / cuda12.1-py3.10-gcc9-sm86`), only the first one was marked as unstable.  The for loop returned too early and missed the other two, even though they were also marked as unstable; see for example https://ossci-metrics.s3.amazonaws.com/unstable-jobs.json

### Testing

* Add an unit test
* CI run https://github.com/pytorch/pytorch/actions/runs/6169798353 shows that the configs below are all marked as unstable:
  * https://github.com/pytorch/pytorch/issues/107079
  * https://github.com/pytorch/pytorch/issues/109153
  * https://github.com/pytorch/pytorch/issues/109154
* Manually run the script to verify the test matrix output:
```
python .github/scripts/filter_test_configs.py \
    --workflow "inductor" \
    --job-name "cuda12.1-py3.10-gcc9-sm86 / build," \
    --test-matrix "{ include: [
    { config: "inductor", shard: 1, num_shards: 1, runner: "linux.g5.4xlarge.nvidia.gpu" },
    { config: "inductor_huggingface", shard: 1, num_shards: 1, runner: "linux.g5.4xlarge.nvidia.gpu" },
    { config: "inductor_timm", shard: 1, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" },
    { config: "inductor_timm", shard: 2, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" },
    { config: "inductor_torchbench", shard: 1, num_shards: 1, runner: "linux.g5.4xlarge.nvidia.gpu" },
    { config: "inductor_huggingface_dynamic", shard: 1, num_shards: 1, runner: "linux.g5.4xlarge.nvidia.gpu" },
    { config: "inductor_timm_dynamic", shard: 1, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" },
    { config: "inductor_timm_dynamic", shard: 2, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" },
    { config: "inductor_torchbench_dynamic", shard: 1, num_shards: 1, runner: "linux.g5.4xlarge.nvidia.gpu" },
    { config: "inductor_distributed", shard: 1, num_shards: 1, runner: "linux.g5.12xlarge.nvidia.gpu" },
  ]}
  " \
    --pr-number "" \
    --tag "" \
    --event-name "push" \
    --schedule "" \
    --branch ""
::set-output name=keep-going::False
::set-output name=is-unstable::False
::set-output name=reenabled-issues::
::set-output name=test-matrix::{"include": [{"config": "inductor", "shard": 1, "num_shards": 1, "runner": "linux.g5.4xlarge.nvidia.gpu", "unstable": "unstable"}, {"config": "inductor_huggingface", "shard": 1, "num_shards": 1, "runner": "linux.g5.4xlarge.nvidia.gpu", "unstable": "unstable"}, {"config": "inductor_timm", "shard": 1, "num_shards": 2, "runner": "linux.g5.4xlarge.nvidia.gpu"}, {"config": "inductor_timm", "shard": 2, "num_shards": 2, "runner": "linux.g5.4xlarge.nvidia.gpu"}, {"config": "inductor_torchbench", "shard": 1, "num_shards": 1, "runner": "linux.g5.4xlarge.nvidia.gpu"}, {"config": "inductor_huggingface_dynamic", "shard": 1, "num_shards": 1, "runner": "linux.g5.4xlarge.nvidia.gpu", "unstable": "unstable"}, {"config": "inductor_timm_dynamic", "shard": 1, "num_shards": 2, "runner": "linux.g5.4xlarge.nvidia.gpu"}, {"config": "inductor_timm_dynamic", "shard": 2, "num_shards": 2, "runner": "linux.g5.4xlarge.nvidia.gpu"}, {"config": "inductor_torchbench_dynamic", "shard": 1, "num_shards": 1, "runner": "linux.g5.4xlarge.nvidia.gpu"}, {"config": "inductor_distributed", "shard": 1, "num_shards": 1, "runner": "linux.g5.12xlarge.nvidia.gpu"}]}
::set-output name=is-test-matrix-empty::False
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109185
Approved by: https://github.com/clee2000
2023-09-13 17:06:37 +00:00
fe198f3141 inductor/test_max_autotune serial in CI (#109209)
Fixes #ISSUE_NUMBER
Trying to figure out why this keeps timing out; wondering if it's due to parallelization weirdness
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109209
Approved by: https://github.com/huydhn
2023-09-13 17:04:43 +00:00
d05a6e5ade Add missing DeviceMesh import (#109187)
The test is broken after https://github.com/pytorch/pytorch/pull/107533#issuecomment-1709529759

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109187
Approved by: https://github.com/clee2000
2023-09-13 16:50:35 +00:00
f2639a2c37 Back out "Dynamo support for autograd.Function w/ once_differentiable (#108686)" (#109199)
Summary:
Original commit changeset: e11cddf1fecc

Original Phabricator Diff: D49064185

Test Plan:
Comparing PT1 and PT2 performance on the IG Feed Model with this diff backed out: N4274204

Comparing the PT1 and PT2 performance on IG Feed with this diff committed: N4271093

Reviewed By: zou3519

Differential Revision: D49230047

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109199
Approved by: https://github.com/zou3519, https://github.com/xw285cornell
2023-09-13 15:43:20 +00:00
264f1e7b4c [inductor] Enable Mypy Checking for torch/_inductor/codecache.py (#108789)
Summary: Add type annotations to torch/_inductor/codecache.py and enable mypy checking

Test Plan:
`lintrunner torch/_inductor/*.py`
`python test/inductor/test_max_autotune.py`
`python test/inductor/test_aot_inductor.py`
`python test/inductor/test_torchinductor.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108789
Approved by: https://github.com/Skylion007, https://github.com/eellison
2023-09-13 14:05:35 +00:00
ad90ab31f2 Flash Attention v2 (#105602)
# Summary
## PR Dependencies
I don't use ghstack :( this is a PR where it would have been helpful. That being said, I am going to peel off some PRs to make reviewing this easier:
- [x] Separate build flags for Flash and MemEff: #107985

### Description
This pull request updates the version of _scaled_dot_product_flash_attention from version 1 to version 2. The changes are based on the flash attention code originally authored by @tridao
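
For orientation, a hedged sketch of routing SDPA to this kernel through the public API (not part of this PR's diff; assumes a CUDA device with sm80+ and fp16 inputs):

```python
import torch
import torch.nn.functional as F

q, k, v = (torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))
# Restrict the SDPA dispatcher to the flash backend only
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```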

### Changes Made
The majority of the changes in this pull request involve:

- Copying over the flash_attention sources.
- Updating header files.
- Removing padding and slicing code from within the flash_attention kernel and relocating it to the composite implicit region of the SDPA. This was needed to make the kernel functional and appease autograd.
- Introducing a simple kernel generator to generate different instantiations of the forward and backward flash templates.
- Adding conditional compilation (ifdef) to prevent building when nvcc is invoked with gencode < sm80.
- Introducing a separate dependent option for mem_eff_attention, as flash_attention v2 lacks support for Windows and cannot be built for sm50 generation codes.
- Modifying build.sh to reduce parallelization on sm86 runners and to lower the maximum parallelization on the manywheel builds. This adjustment was made to address out-of-memory issues during the compilation of FlashAttentionV2 sources.
- Adding/Updating tests.

### Notes for Reviewers
This is not a fun review, and I apologize in advance.
Most of the files changed are in the flash_attn/ folder. The only files of interest here IMO:
- aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp
- aten/src/ATen/native/transformers/cuda/flash_attn/kernels/generate_kernels.py ( this has been incorporated upstream to flash-attention github)

There are a number of files all related to avoiding OOMs in CI/CD. These are typically shell scripts.

### Follow up items
- Include the updates from e07aa036db and 9e5e8bc91e | https://github.com/pytorch/pytorch/issues/108108

### Work Items
- [x] I don't think Windows will be supported for 3.1.0 - Need to update cmake
- [x] Let multi_query/attention pass through and test | UPDATE: I have the fast path implemented here: https://github.com/pytorch/pytorch/pull/106730 but since this will require changes to semantics of math to call repeat_interleave, I think this should be done as a followup.
- [x] Had to drop cutlass back to 3.0.0 to get it to compile. Need to figure out how to upgrade to 3.1.0 and later. Spoke with Tri and he is going to be taking a look. Note: compiling with clang currently errors for the cute headers.
- [x] Update test exercise above codepath
- [x] Still need to disable on seq_len % 128 != 0 for backward( Tri beat me to it a4f148b6ab)
- [x] Add determinism warning to BWD, Tri got to this one as well: 1c41d2b
- [x] Update dispatcher to universally prefer FlashV2
- [x] Update tests to exercise new head_dims
- [x] Move the head_dim padding from kernel to top level composite implicit function in order to make it purely functional
- [x] Create template generator script
- [x] Initial cmake support for building kernels/ folder
- [x] Replay CudaGraph changes

### Results
#### Forward only
The TFlops are reported here are on a100 that is underclocked.
![flashv2_tflops_vs_seq_len](https://github.com/pytorch/pytorch/assets/32754868/152de46d-8fa6-42f0-9a9c-ef1eb7ae29e7)

#### Forward+Backward
Ran a sweep and for large compute bound sizes we do see a ~2x performance increase for forw+back.
<img width="1684" alt="Screenshot 2023-07-20 at 3 47 47 PM" src="https://github.com/pytorch/pytorch/assets/32754868/fdd26e07-0077-4878-a417-f3a418b6fb3b">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105602
Approved by: https://github.com/huydhn, https://github.com/cpuhrsch
2023-09-13 13:59:05 +00:00
55f956f1d2 optests improvements based on torchvision usage on nms (#108929)
- Update cross-ref FakeMode test to use ShapeEnv.  Dynamic ops can now
  return an unbacked SymInt.  We always accept this as equal to whatever
  the real value was.
- Relax test so it works on all classes, not just unittest.TestCase
- Properly wrap the original method, so things like
  pytest.mark.parametrize are carried over
- Support dynamic shapes by default for make_fx `tracing_mode="fake"` without symbolifying everything else
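
A small sketch of the `tracing_mode="fake"` usage from the last bullet; per the summary, a data-dependent op such as `nonzero` should now trace and yield an unbacked SymInt (illustrative, based on the description above):

```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx

def f(x):
    return x.nonzero()  # data-dependent output size

gm = make_fx(f, tracing_mode="fake")(torch.tensor([0, 1, 0, 1]))
print(gm.graph)
```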

Fixes https://github.com/pytorch/pytorch/issues/108927

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108929
Approved by: https://github.com/zou3519
2023-09-13 13:26:15 +00:00
bfa8429c6a [optests] Changed failures_dict format to json; automatic update of failures_dict (#109110)
We changed the failures_dict format from .py to json and added a way to
automatically update the failures dict (the user can set
PYTORCH_OPCHECK_ACCEPT=1 to do so), assuming the tests don't crash in the
process.

Some details:
- We introduced a FailuresDict class that handles save/load and from which one
can query a test status ("xfail", "skip", etc).
- PYTORCH_OPCHECK_ACCEPT=1 does not override everything. In particular: it
doesn't try to update the failures dict for a test marked as "skip", but it
will update it for tests marked as "xfail" or "success".
- PYTORCH_OPCHECK_ACCEPT=1 also does not override the "comment" field, unless
it is flipping an "xfail" into "success".
- I'll update the gdoc linked in the comments with how to actually use
PYTORCH_OPCHECK_ACCEPT=1 internally (it's not trivial).

Note that this isn't multithreading-safe, the current recommendation is to run
the tests sequentially if the user wants to use PYTORCH_OPCHECK_ACCEPT=1.

Differential Revision: D49167181

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109110
Approved by: https://github.com/ezyang
2023-09-13 13:24:15 +00:00
db48bc80d9 Check index size during decomp of index_add (#108826)
This partially fixes the `test_index_add_correctness` test (#108181)
when run under inductor: it causes an exception to be raised [here][1]
as expected.

The test as a whole still cannot be made to pass under inductor because
the [last assert][2] still fails, likely due to #108798.

[1]: dec2b267d4/test/test_torch.py (L6049)
[2]: dec2b267d4/test/test_torch.py (L6051)
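
For context, a sketch of the kind of out-of-bounds index the added check rejects (illustrative; eager already raises here, and the decomp should now match):

```python
import torch

x = torch.zeros(5)
index = torch.tensor([7])    # out of bounds for a dimension of size 5
src = torch.ones(1)
x.index_add_(0, index, src)  # raises in eager; the decomp now checks too
```
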
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108826
Approved by: https://github.com/eellison
2023-09-13 13:06:26 +00:00
d2d36aad6f Enable typechecking for _inductor/virtualized.py (#108916)
Also add a few more type annotations to utils.py (some of its functions
are called from virtualized.py)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108916
Approved by: https://github.com/eellison
2023-09-13 13:04:51 +00:00
c5e7588613 Revert "[dynamo] preserve some FX node metadata of GraphModules (#107067)"
This reverts commit 1d42148fee45e5bdb6c96a1ff45b8d4d326138ee.

Reverted https://github.com/pytorch/pytorch/pull/107067 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/107067#issuecomment-1717321061))
2023-09-13 09:59:33 +00:00
aee5dec3aa torch/csrc/profiler/README.md - stubs, RecordFunction, Autograd interaction (#108470)
Technical details about the profiler - stubs for the stuff I haven't had time to fill out yet, plus details about RecordFunction and the profiler's interaction with autograd.

reviewers - see 06c41eea9e/torch/csrc/profiler/README.md for rendered markdown
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108470
Approved by: https://github.com/aaronenyeshi
2023-09-13 07:46:01 +00:00
de0b18fad9 Use user directed names for variables where possible (#109092)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109092
Approved by: https://github.com/ezyang
ghstack dependencies: #108846
2023-09-13 07:44:04 +00:00
015be4cedb Forward fix lint (#109177)
After https://github.com/pytorch/pytorch/pull/109075
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109177
Approved by: https://github.com/angelayi
2023-09-13 06:10:34 +00:00
3d8d59e68b Update inductor ci_expected_accuracy (#109148)
Changes due to updating the HF pin: [107400](https://github.com/pytorch/pytorch/pull/107400)
Somehow during the previous PR it didn't need these changes...probably a CI bug

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109148
Approved by: https://github.com/clee2000, https://github.com/desertfire
2023-09-13 05:12:33 +00:00
3ac2396e00 Fix torch._numpy.random (#108944)
Fix several issues with `torch._numpy.random` functions on eager

1. actually return scalars when `size is None`
2. fix dispatch with USE_NUMPY_STREAM
3. make tnp.random functions composable: make numpy functions receive numpy arguments, not `tnp.ndarray`s
4. fix random.shuffle for e.g. lists

The main need for these gymnastics is that `np.random` functions return an ndarray or a Python scalar depending on the `size` argument. We decided a while ago to replicate this behavior in `tnp.random` but not elsewhere, where we always return 0D arrays instead.
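
An illustrative sketch of the `size`-dependent return type described above (assumes `torch._numpy` is importable; the types shown follow the summary):

```python
import torch._numpy as tnp

s = tnp.random.uniform(0.0, 1.0)             # size=None -> Python scalar
a = tnp.random.uniform(0.0, 1.0, size=(3,))  # explicit size -> tnp.ndarray
print(type(s), type(a))
```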

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108944
Approved by: https://github.com/lezcano
2023-09-13 05:08:19 +00:00
41e5d410cf Symintify repeat_interleave (#109133)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109133
Approved by: https://github.com/ezyang, https://github.com/voznesenskym, https://github.com/bdhirsh
2023-09-13 04:55:56 +00:00
a09539f454 Add torch.export.register_dataclass API (#109152)
`register_dataclass` allows a dataclass to be used as a valid input/output type of torch.export.export
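
A minimal usage sketch (illustrative; assumes the dataclass fields are themselves valid export input types):

```python
from dataclasses import dataclass
import torch

@dataclass
class Inputs:
    x: torch.Tensor
    y: torch.Tensor

torch.export.register_dataclass(Inputs)

class M(torch.nn.Module):
    def forward(self, inp: Inputs):
        return inp.x + inp.y

ep = torch.export.export(M(), (Inputs(torch.ones(2), torch.ones(2)),))
```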

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109152
Approved by: https://github.com/ydwu4
2023-09-13 04:17:12 +00:00
375d2ca6c9 [dtensor][4/n] don't use make_fx for strategy propagation (#108262)
We were using make_fx for strategy-based propagation so that we could get
a graph and the shape-related metadata, but this is overkill
for sharding propagation. This change refactors the strategy
propagation to remove the graph-based propagation and instead just uses the
op to index into the strategy functions.

We also just use a fake shape prop instead of relying on fx tracing for
the shape/stride propagation.

For a possible future decomposed propagation, we will exercise a different
codepath to enable that.

NOTE that this also greatly reduces latency in two places:
1. first-time dtensor operations when populating the cache; the first
iter becomes faster again!
2. the test_dtensor_ops.py run time; right now the
whole test finishes within 2-3 mins again.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108262
Approved by: https://github.com/fduwjj
ghstack dependencies: #107306, #108261
2023-09-13 04:08:02 +00:00
09f3e08bcc [dtensor][3/n] use dedicated TensorMeta instead of the fx one (#108261)
This PR switches from fx's shape-prop TensorMetadata to
DTensor's own dedicated TensorMeta, because DTensor
only cares about three fields: shape/stride/dtype; all other fields are not
necessary and can be inferred from the local_tensor directly. This
significantly simplifies how we deal with the tensor metadata by
dropping the other fields.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108261
Approved by: https://github.com/fduwjj
ghstack dependencies: #107306
2023-09-13 04:08:02 +00:00
fc1dcfb9ab [dtensor][2/n] use op overload instead of function schema (#107306)
The function schema doesn't provide us anything, since we can also get the schema from `op._schema`. Including the op directly in op_schema makes it easier for sharding prop to do fake execution, and in principle it should also make the hash comparison faster: we don't need to hash the function schema, we just hash `id(op)`, which is constant.

This PR is just a refactor to include the op in OpSchema instead of the function schema; there are no other logic changes.
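
A conceptual sketch of the hashing idea (simplified stand-in, not the actual OpSchema definition):

```python
class OpSchema:
    def __init__(self, op, args_schema):
        self.op = op  # the OpOverload itself, not its function schema
        self.args_schema = args_schema

    def __hash__(self):
        # id(op) is constant for the lifetime of the process, so this is
        # cheaper than hashing the full function schema
        return hash((id(self.op), self.args_schema))

    def __eq__(self, other):
        return self.op is other.op and self.args_schema == other.args_schema
```
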
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107306
Approved by: https://github.com/fduwjj
2023-09-13 04:08:02 +00:00
48e6ffbe30 [DCP][Test] Fix device assignment in test/distributed/checkpoint/test_file_system_checkpoint_cpu.py (#109141)
Device should always be "cpu" for cpu tensor types.

This will fix the fb buck test failure when running internally.
```
buck2 test '@fbcode//mode/dev-nosan' fbcode//caffe2/test/distributed/checkpoint:file_system_checkpoint_cpu -- --exact 'caffe2/test/distributed/checkpoint:file_system_checkpoint_cpu - test_switch_between_sharded_tensor_to_tensor_thread_count_1 (test_file_system_checkpoint_cpu.TestDistributedReshardOnLoad)'
```

This will unblock [D48667323](https://www.internalfb.com/diff/D48667323).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109141
Approved by: https://github.com/fegin
2023-09-13 03:04:14 +00:00
91e154fcd7 [ONNX] Support None in fx.args as torchlib inputs (#108708)
Prior to this PR, if None is returned from intermediate nodes, it crashes the export, because None is not expected to be passed into `_fill_tensor_shape_type`, and raises a beartype roar. That function fills in the shape and type of a TorchScriptTensor according to its info from the FX graph.

This was discovered after https://github.com/microsoft/onnxscript/pull/1043 was supported. The op specifically generates None in one of its inputs, but the only output from it that is consumed is the first one (not None).

Reference test from a TorchBench model:
```python

    def test_nanogpt(self):
        import sys

        sys.path.append("/home/titaiwang")

        from nanoGPT.model import GPT, GPTConfig

        # Load the model
        kwargs = {
            "block_size": 256,
            "vocab_size": 8096,  # GPT-2 vocab_size of 50257, padded up to nearest multiple of 64 for efficiency
            "n_layer": 2,
            "n_head": 2,
            "n_embd": 128,
            "dropout": 0.0,
            "bias": False,  # True: bias in Linears and LayerNorms, like GPT-2. False: a bit better and faster
        }
        config = GPTConfig(**kwargs)
        with torch.backends.cuda.sdp_kernel(
            enable_flash=True, enable_mem_efficient=True
        ):
            model = GPT(config)
        print("Done loading model")
        inputs = torch.arange(128).view(2, 64)
        targets = torch.arange(128).view(2, 64)

        self.run_test_with_fx_to_onnx_exporter_and_onnx_runtime(
            model,
            (inputs,),
            input_kwargs={
                "targets": targets,
            },
            verbose=True,
        )
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108708
Approved by: https://github.com/justinchuby, https://github.com/thiagocrepaldi
2023-09-13 02:47:16 +00:00
a2ff345416 [HigherOrderOp] Support SymInt as input to body function (#108967)
Fixes #108283

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108967
Approved by: https://github.com/zou3519
2023-09-13 02:14:16 +00:00
4667a5c948 Update SingletonSymNode to allow more comparisons (#108315)
In this PR:
- {in,}equality between singleton and plain ints returns false instead of erroring
- Morally define the semantics of j0 > c to be as if j0 represented an array [s_0, s_1, ... s_n] and s_k > c for all k
- Just like for equality, we don't actually want to do the comparison one by one, instead j0 is constrained to some range [min, max]. By default this range is [2, int64_t::max] so that it acts like a size and passes 0/1 specialization checks.
- In the future, we can define some API to allow users to constrain the range of their singletons
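
A conceptual sketch of these comparison semantics (illustrative stand-in, not the actual SingletonSymNode code):

```python
class Singleton:
    # j0 morally stands for an array [s_0, ..., s_n] with each s_k in [lo, hi];
    # the default range [2, int64 max] passes 0/1 specialization checks
    def __init__(self, lo=2, hi=2**63 - 1):
        self.lo, self.hi = lo, hi

    def gt(self, c):
        if self.lo > c:
            return True   # every s_k > c
        if self.hi <= c:
            return False  # every s_k <= c
        raise RuntimeError("j0 > c is not decidable for this range")

    def eq_plain_int(self, c):
        return False      # equality with a plain int is defined to be False

j0 = Singleton()
assert j0.gt(1) and not j0.eq_plain_int(5)
```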

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108315
Approved by: https://github.com/ezyang
2023-09-13 01:58:02 +00:00
a46df6ebce [pytorch-vulkan] add aten::randn_like & aten::normal_ (#109075)
Summary:
Implemented `aten::normal_` shader and used it to create `aten::randn_like`.

Op definitions:
https://pytorch.org/docs/stable/generated/torch.randn_like.html
https://pytorch.org/docs/stable/generated/torch.Tensor.normal_.html

Test Plan:
```
[ttingchulin@53491.od /data/sandcastle/boxes/fbsource (randn)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run fbcode/mode/dev-nosan //xplat/caffe2:pt_vulkan_api_test_bin  -- --gtest_filter="*<test>*" eg.  -- --gtest_filter="*randn_like*"

[==========] Running 2 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 2 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.randn_like
[       OK ] VulkanAPITest.randn_like (230 ms)
[ RUN      ] VulkanAPITest.randn_like_large
[       OK ] VulkanAPITest.randn_like_large (570 ms)
[----------] 2 tests from VulkanAPITest (801 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test suite ran. (801 ms total)
[  PASSED  ] 2 tests.

[ttingchulin@53491.od /data/sandcastle/boxes/fbsource (randn)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run fbcode/mode/dev-nosan //xplat/caffe2:pt_vulkan_api_test_bin  -- --gtest_filter="*<test>*" eg.  -- --gtest_filter="*normal_*"
[==========] Running 3 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 3 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.normal_
[       OK ] VulkanAPITest.normal_ (222 ms)
[ RUN      ] VulkanAPITest.normal_large
[       OK ] VulkanAPITest.normal_large (136 ms)
[ RUN      ] VulkanAPITest.normal_error
[       OK ] VulkanAPITest.normal_error (37 ms)
[----------] 3 tests from VulkanAPITest (396 ms total)

[----------] Global test environment tear-down
[==========] 3 tests f.
```

Reviewed By: yipjustin

Differential Revision: D48814024

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109075
Approved by: https://github.com/yipjustin
2023-09-13 01:07:34 +00:00
e5f300f085 Make mutation test work with quantized tensors (#108935)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108935
Approved by: https://github.com/zou3519
2023-09-13 00:54:01 +00:00
687f027896 [submodule] Fix eltwise share buffer issue in ideep (#108038)
Fix [#107876 ](https://github.com/pytorch/pytorch/issues/107876).

This PR fixes [#107876](https://github.com/pytorch/pytorch/issues/107876), whose root cause is that eltwise lacks the logic for dealing with src and diff_src of different shapes. By initializing a new diff_src and reordering back into diff_src's buffer, as inner_product and matmul do, the issue https://github.com/pytorch/pytorch/issues/107876 is addressed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108038
Approved by: https://github.com/jgong5, https://github.com/mingfeima
2023-09-13 00:53:57 +00:00
e027de2c86 Add torch.distributed get_rank and get_world_size to constant_fold_functions (#109029)
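
An illustrative sketch of what this enables (hypothetical example; uses a one-process gloo group for demonstration):

```python
import os
import torch
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

@torch.compile
def shard_scale(x):
    # get_rank/get_world_size are evaluated at trace time and baked into
    # the graph as constants instead of causing a graph break
    return x * (dist.get_rank() + 1) / dist.get_world_size()

print(shard_scale(torch.ones(4)))
```
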
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109029
Approved by: https://github.com/bdhirsh
2023-09-13 00:52:43 +00:00
12e8530b35 Record and replay for ShapeEnv. (#107989)
This PR introduces record and replay functionality for `ShapeEnv` instances. In short,
throughout the execution of a program, we record events (e.g. function calls that modify
its state) so that, in the future, we are able to reproduce any intermediary state of the
instance.

In summary, this PR introduces the following changes (they mostly belong to
_symbolic_shapes.py_ unless otherwise stated):

- Create `ShapeEnvEvent` class for recording function calls + arguments
- Create `record_shapeenv_event` decorator and decorate every function that changes the
  state of a `ShapeEnv`: it creates an appropriate event and adds it to the available
  ShapeEnv instance (which sometimes has to be extracted from `SymTypes`).
- Create `SymNode.with_shape_env` convenient function for replacing `ShapeEnv` references
- Wraps `ShapeEnv` initialization method: so that we also save the exact way a `ShapeEnv`
  was constructed, i.e. arguments
- Introduces a way to compare two `ShapeEnv` instances, defining a concept of state for
  that class. In short, the state of `ShapeEnv` is every variable that may change the
  execution flow
- Create `check_shape_env_recorded_events` dynamo configuration for enabling the check for
  equality the state of `ShapeEnv` with another one that was constructed by replaying all
  the recorded events. This check takes place inside `produce_guards`
- Create `replay_shape_env_events` function for replaying given events. It assumes the
  first event is `ShapeEnv` initialization function
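
For intuition, here is a minimal, self-contained sketch of the record/replay pattern (`ToyShapeEnv`, `Event`, `record_event`, and `replay` are illustrative stand-ins, not the real `_symbolic_shapes.py` names beyond those cited above):

```python
import functools
from dataclasses import dataclass, field

@dataclass
class Event:
    name: str
    args: tuple = ()
    kwargs: dict = field(default_factory=dict)

def record_event(fn):
    # Log every state-mutating call on the instance before executing it.
    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        self.events.append(Event(fn.__name__, args, kwargs))
        return fn(self, *args, **kwargs)
    return wrapper

class ToyShapeEnv:
    def __init__(self):
        self.events = [Event("__init__")]  # first event: construction
        self.guards = []

    @record_event
    def add_guard(self, expr):
        self.guards.append(expr)

def replay(events):
    env = ToyShapeEnv()  # assumes events[0] is the construction event
    for e in events[1:]:
        getattr(env, e.name)(*e.args, **e.kwargs)
    return env

env = ToyShapeEnv()
env.add_guard("s0 > 1")
assert replay(env.events).guards == env.guards  # states compare equal
```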

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107989
Approved by: https://github.com/ezyang
2023-09-13 00:22:38 +00:00
max
e066056414 fix 'Node' object is not iterable in functorch.compile.minifier (#103011)
Fixes #102169

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103011
Approved by: https://github.com/Chillee
2023-09-12 23:47:40 +00:00
063a62622b Add memory overlap check to meta_copy_ (#108989)
Fixes `test_copy_many_to_one`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108989
Approved by: https://github.com/eellison
2023-09-12 23:28:14 +00:00
f08885287f Fix cumprod f16 opinfo test via ref-in-float + increasing tolerances (#109128)
Without setting `reference_in_float`, cumprod's single sample case
passes (i.e. the compiled f16 result matches the eager mode f16 result;
in fact they are identical because they both call into aten). However,
the grad calculation does not line up.

Turning on `reference_in_float` causes the grad check to pass (i.e. we
are closer to the more accurate f64 grad calculation) but causes the
single sample case to fail. Since the compiled f16 case is no less
accurate than the eager f16 case for the single sample, relaxing the
tolerances here seems fine.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109128
Approved by: https://github.com/eellison
ghstack dependencies: #109081, #109089
2023-09-12 23:19:59 +00:00
6869b25f1b Fix a bunch of opinfo tests by using reference_in_float (#109089)
I set reference_in_float to be always True, ran the full opinfo test
suite, and observed which tests were now unexpectedly passing. However,
I didn't turn on reference_in_float by default in this diff because it
also creates some new failures.

Related: https://github.com/pytorch/pytorch/issues/105534

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109089
Approved by: https://github.com/eellison
ghstack dependencies: #109081
2023-09-12 23:19:59 +00:00
baefe47161 Fix std_mean f16 opinfo test by using reference_in_float (#109081)
It seems that the compiled f16 op is more accurate than the eager f16
op:

**Compiled float16 vs Eager float64**

    Mismatched elements: 25 / 25 (100.0%)
    Greatest absolute difference: 3.718038455710615e-05 at index (1, 0) (up to 1e-07 allowed)
    Greatest relative difference: 0.0018021699903143316 at index (0, 4) (up to 1e-07 allowed)

**Eager float16 vs Eager float64**

    Mismatched elements: 25 / 25 (100.0%)
    Greatest absolute difference: 7.280254198286512e-05 at index (3, 3) (up to 1e-07 allowed)
    Greatest relative difference: 0.004104326045245938 at index (0, 4) (up to 1e-07 allowed)

**Compiled float16 vs Eager float16**

    Mismatched elements: 7 / 25 (28.0%)
    Greatest absolute difference: 7.62939453125e-05 at index (3, 3) (up to 1e-05 allowed)
    Greatest relative difference: 0.00588226318359375 at index (0, 4) (up to 0.001 allowed)
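
For concreteness, a reference_in_float-style check in user code might look like this (a sketch assuming a CUDA build; the tolerances are illustrative, not the ones OpInfo uses):

```python
import torch

def std_mean(x):
    return torch.std_mean(x)

# Instead of comparing compiled float16 against eager float16, compare
# both against an eager float64 reference (cast back to float16), with
# float16-appropriate tolerances.
x = torch.randn(5, 5, dtype=torch.float16, device="cuda")
ref = tuple(t.to(torch.float16) for t in std_mean(x.double()))
out = torch.compile(std_mean)(x)
torch.testing.assert_close(out, ref, rtol=2e-3, atol=1e-4)
```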

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109081
Approved by: https://github.com/eellison
2023-09-12 23:19:59 +00:00
4c5e43574c Reland 2: Add PyObject preservation for UntypedStorage (#109039)
Relands #103907 after it was reverted. This PR makes the new `ignore_hermetic_tls` argument of `check_pyobj` optional to avoid causing a compilation error in torchdistx

Part of #91395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109039
Approved by: https://github.com/ezyang
2023-09-12 22:26:05 +00:00
6dc56d3490 [DTensor] Remove compute_local_offset from _utils.py (#109096)
Separating internal changes from OSS changes. This PR removes compute_local_offset from the OSS directory only.

This replaces https://github.com/pytorch/pytorch/pull/108965
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109096
Approved by: https://github.com/wanchaol, https://github.com/fduwjj
2023-09-12 21:55:15 +00:00
cf26e5575d [quant][be] Reduce warnings in tests (#108922)
Summary:
att

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108922
Approved by: https://github.com/andrewor14
ghstack dependencies: #108920, #108921
2023-09-12 21:54:33 +00:00
9118073fe7 assign var for "not populated" str (#108844)
Minor cleanup: assign a variable for the 'not populated' string value referenced in several places in `vmapify_autograd_function`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108844
Approved by: https://github.com/zou3519
2023-09-12 20:53:48 +00:00
91aab161d0 Revert "[inductor] Lower masked_scatter on CUDA (#108803)"
This reverts commit c8e577bf409591910f9667a51f2cf92b3c5455e0.

Reverted https://github.com/pytorch/pytorch/pull/108803 on behalf of https://github.com/lezcano due to makes test_comprehensive_masked_scatter_cuda_int64 flaky ([comment](https://github.com/pytorch/pytorch/pull/108803#issuecomment-1716407433))
2023-09-12 20:49:06 +00:00
b01b934aca [quant][be] Cleanup xnnpack_quantizer implementation (#108921)
Summary:
att

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108921
Approved by: https://github.com/andrewor14
2023-09-12 19:28:41 +00:00
bde75eb9a8 [Gloo] Properly pass op type to Work (#108812)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108812
Approved by: https://github.com/wconstab, https://github.com/fduwjj
2023-09-12 18:21:09 +00:00
a2d5f13310 [Inductor CUTLASS backend] Step 5: Gemm CUTLASS templates (#108015)
This is the step 5 to add cutlass as an alternative inductor backend.

Feature request: https://github.com/pytorch/pytorch/issues/106991.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108015
Approved by: https://github.com/kadeng, https://github.com/jansel, https://github.com/aakhundov
ghstack dependencies: #107802, #107847, #107901, #107931
2023-09-12 17:44:38 +00:00
097fd43f8c [Inductor CUTLASS backend] Step 4: CUDA (template) kernels (#107931)
This is the step 4 to add cutlass as an alternative inductor backend.
Full tests can be found from the last PR in the stack.

Feature request: https://github.com/pytorch/pytorch/issues/106991.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107931
Approved by: https://github.com/aakhundov, https://github.com/jansel, https://github.com/kadeng
ghstack dependencies: #107802, #107847, #107901
2023-09-12 17:44:38 +00:00
b2d764ece0 [Inductor CUTLASS backend] Step 3: autotune_process, and CUDABenchmarkRequest (#107901)
This is the step 3 to add cutlass as an alternative inductor backend.
Full tests can be found from the last PR in the stack.

Feature request: https://github.com/pytorch/pytorch/issues/106991.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107901
Approved by: https://github.com/jansel, https://github.com/aakhundov, https://github.com/kadeng
ghstack dependencies: #107802, #107847
2023-09-12 17:44:36 +00:00
102fefac21 [Inductor CUTLASS backend] Step 2: CUDACodeCache (#107847)
This is the step 2 to add cutlass as an alternative inductor backend.
Feature request: https://github.com/pytorch/pytorch/issues/106991.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107847
Approved by: https://github.com/jansel, https://github.com/kadeng, https://github.com/aakhundov
ghstack dependencies: #107802
2023-09-12 17:44:34 +00:00
a14761b68a [Inductor CUTLASS backend] Step 1: Inductor config for cuda / cutlass, util functions. (#107802)
This is the step 1 to add cutlass as an alternative inductor backend.
Feature request: https://github.com/pytorch/pytorch/issues/106991.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107802
Approved by: https://github.com/jansel, https://github.com/aakhundov, https://github.com/kadeng
2023-09-12 17:44:32 +00:00
15b13d3cff Revert "CI Sev - pin docker images for A100 workers (#108871)" (#109071)
This reverts commit 89eb7a75a251c41c4bee86e9ede1001b0d3998af.

No longer required since the issue was addressed by https://github.com/pytorch/test-infra/pull/4563.
Deploying normally, though, to get a proper green signal for the deployment.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109071
Approved by: https://github.com/huydhn
2023-09-12 17:22:04 +00:00
cd46b5db76 make sure all torch._numpy tests run on CI (#108762)
- Add `if __name__ == "__main__": run_tests()` stanzas to test files in `torch_np` folder so that these tests run on CI
- Skip / xfail things smoked out by this change
- remove a stray python file which should not have been added to tests in the first place.
- fix einsum if opt_einsum is present
- add skips for older numpies

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108762
Approved by: https://github.com/lezcano
2023-09-12 17:12:21 +00:00
abd83ce180 Small fix in SDPA docstring codeblock (#109086)
Fix https://github.com/pytorch/pytorch/issues/109072

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109086
Approved by: https://github.com/drisspg
2023-09-12 16:48:46 +00:00
1b9b3a2d15 [MPS] Adding lgamma, digamma, and polygamma implementations (#106292)
Fixes issue mentioned in #77764

e.g. https://github.com/pytorch/pytorch/issues/77764#issuecomment-1654111744

Adds MPS support for the following ops:

- lgamma
- mvlgamma
- digamma
- polygamma

The lgamma function does not yet have an MPS backend implementation. I've added one using a custom Metal kernel (following John D. Cook's C++ implementation of the log gamma function: https://www.johndcook.com/blog/cpp_gamma/). For the backward pass op, I've added a digamma kernel that follows the cpu+cuda digamma implementation, and for the backward pass of the digamma op, I've added a polygamma + trigamma kernel following, again, the cpu+cuda implementations.

NOTE:

The cpu implementation of the polygamma function incorrectly (as far as I can tell) outputs a finite number for order = 1 and x in the negative integers. The mps implementation correctly outputs infinity. (see https://github.com/pytorch/pytorch/issues/106692)

The polygamma tests currently don't pass because of the error in the cpu+cuda kernels, but also because there are smallish discrepancies near the negative integers between the cpu+cuda and the mps polygamma and trigamma kernels. I'm not sure exactly why this is, but let me know if the discrepancies are too big.
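
A small usage sketch of the kernel chain (lgamma's backward is digamma, digamma's backward is trigamma); assumes an MPS-enabled build:

```python
import torch

x = torch.rand(4) + 0.5
x_mps = x.to("mps").requires_grad_()
torch.lgamma(x_mps).sum().backward()
# d/dx lgamma(x) = digamma(x), which is why the backward pass needs the
# new digamma kernel (and digamma's backward needs trigamma).
torch.testing.assert_close(x_mps.grad.cpu(), torch.digamma(x))
```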

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106292
Approved by: https://github.com/kulinseth
2023-09-12 16:43:37 +00:00
c8e577bf40 [inductor] Lower masked_scatter on CUDA (#108803)
This decomposes masked_scatter into `aten.cumsum` and a single pointwise kernel,
which is similar to what is done in eager. I only do this for CUDA because on CPU
it isn't split into two passes like this so would cause a slowdown.
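
The decomposition can be sketched in plain PyTorch as a cumsum plus a pointwise gather/where (illustrative only; the real lowering emits `aten.cumsum` plus a single fused pointwise kernel):

```python
import torch

def masked_scatter_decomp(inp, mask, source):
    # The running count of True positions gives, for each element, the
    # index of the source element that should land there (clamped so
    # masked-off lanes index safely; torch.where discards them anyway).
    idx = (mask.reshape(-1).cumsum(0) - 1).clamp(min=0)
    gathered = source.reshape(-1)[idx].reshape(inp.shape)
    return torch.where(mask, gathered, inp)

inp = torch.zeros(2, 3)
mask = torch.tensor([[True, False, True], [False, True, False]])
src = torch.tensor([1.0, 2.0, 3.0])
assert torch.equal(masked_scatter_decomp(inp, mask, src),
                   inp.masked_scatter(mask, src))
```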

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108803
Approved by: https://github.com/lezcano
ghstack dependencies: #108802
2023-09-12 16:16:05 +00:00
464f9c3725 [meta] Add meta implementation for aten.masked_scatter (#108802)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108802
Approved by: https://github.com/lezcano
2023-09-12 16:16:05 +00:00
c3945b5f84 Update HF version to commit hash (6c26faa) (#107400)
Some [errors](https://ossci-raw-job-status.s3.amazonaws.com/log/15968424899) in the [torchinductor hf benchmarks](https://hud.pytorch.org/benchmark/huggingface/inductor_aot_inductor?startTime=Thu,%2010%20Aug%202023%2018:05:47%20GMT&stopTime=Thu,%2017%20Aug%202023%2018:05:47%20GMT&granularity=hour&mode=inference&dtype=bfloat16&lBranch=main&lCommit=384e0d104fd077d31efafc564129660e9b7a0f25&rBranch=main&rCommit=03414081ff7ee011e17ee10f9ddb2584811bf965) should be fixed in the most recent release (for example, this [line](c036c814f4/src/transformers/models/opt/modeling_opt.py (L688)) no longer exists). Additionally, I landed a [commit (6c26faa)](6c26faa159) to the HF transformers repro to fix one of the graph breaks. This PR results in [76% pass rate for the export + aot inductor HF benchmark!](https://hud.pytorch.org/benchmark/compilers?startTime=Thu%2C%2010%20Aug%202023%2022%3A45%3A09%20GMT&stopTime=Thu%2C%2017%20Aug%202023%2022%3A45%3A09%20GMT&granularity=hour&suite=torchbench&mode=inference&dtype=bfloat16&lBranch=angelayi/hf_version&lCommit=0accaaca2fa70ca2f78c1a587dd4b6750448dd90&rBranch=main&rCommit=03414081ff7ee011e17ee10f9ddb2584811bf965)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107400
Approved by: https://github.com/ezyang, https://github.com/desertfire, https://github.com/malfet
2023-09-12 15:25:28 +00:00
58391aeaf1 [export] Lift constant tensors as buffers (reland) (#109040)
Summary:
When we retrace the graph containing constant tensors, they get lifted as buffer inputs.
AotInductor also wants to lift all the constants as inputs.
Keeping the constants as a separate category adds complexity: we would have to track three kinds of inputs (params, buffers, constants).

Cons: people might care about specifically which buffers are or are not constants.

If people want to know specifically which buffers are constants, we can add an additional field in the graph signature to mark this.

Test Plan: CI

Differential Revision: D49153367

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109040
Approved by: https://github.com/zhxchen17
2023-09-12 15:23:00 +00:00
1d32c9c7f2 Revert "Force synced KJT to trace unbacked SymInt (#108960)"
This reverts commit f9a250c35bd061e2e6f4c2d92e2b1b16390e8636.

Reverted https://github.com/pytorch/pytorch/pull/108960 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/108960#issuecomment-1715850779))
2023-09-12 14:37:36 +00:00
8c981c8c4b [ONNX] bump submodule to onnx==1.14.1 (#108895)
Bump the pip and submodule ONNX dependencies to official stable 1.14.1; there were no code changes between 1.14.1rc2 and 1.14.1.

Also bump ORT to run tests against ort-nightly==1.16.0.dev20230908001.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108895
Approved by: https://github.com/justinchuby, https://github.com/thiagocrepaldi
2023-09-12 14:20:22 +00:00
5a7c008b30 Revert "[ROCm] Add ROCm AMDGPU support for inductor cpp codegen (#105141)"
This reverts commit 8ff00360a4daab7848307a9a0b1c81b1da873d0c.

Reverted https://github.com/pytorch/pytorch/pull/105141 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/105141#issuecomment-1715629007))
2023-09-12 12:29:55 +00:00
5531a23b20 Don't set requires_grad inside meta function (#108988)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108988
Approved by: https://github.com/lezcano, https://github.com/bdhirsh
2023-09-12 12:24:13 +00:00
bc3f0d341a LazyBatchNorm{1-3}d support dict&set (#109015)
Fixes #105292

As the title shows, LazyBatchNorm doesn't support dict & set; this change keeps it consistent with BatchNorm{1-3}d.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109015
Approved by: https://github.com/mikaylagawarecki
2023-09-12 09:09:59 +00:00
41bd0fde7e Revert "Remove fixed skips (#108674)"
This reverts commit ab9fb03d6f674e3592910a0c4cc8208517a71084.

Reverted https://github.com/pytorch/pytorch/pull/108674 on behalf of https://github.com/huydhn due to Sorry for picking this up a bit late, but with https://github.com/pytorch/pytorch/pull/108647 reverted, these tests are failing again. So we need to wait for the PR to reland before we can land this change ([comment](https://github.com/pytorch/pytorch/pull/108674#issuecomment-1715202692))
2023-09-12 08:04:32 +00:00
65a3d398f1 [Pytorch][Vulkan] Call binary_op_scalar when 'other' is a 0-dim tensor (#109035)
Summary:
0-dim tensors are not supported in Vulkan.

If a binary_op_tensor is called with 'other_arg' being a 0-dim tensor, then we extract the scalar out and call binary_op_scalar.
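
In pseudo-Python, the dispatch described above looks roughly like this (`binary_op` / `binary_op_scalar` are stand-ins for the C++ Vulkan ops):

```python
import torch

def binary_op_scalar(self_t, scalar):
    return self_t + scalar

def binary_op(self_t, other):
    # Vulkan has no 0-dim tensors, so a 0-dim `other` is unwrapped to a
    # Python scalar and routed to the scalar variant of the op.
    if isinstance(other, torch.Tensor) and other.dim() == 0:
        return binary_op_scalar(self_t, other.item())
    return self_t + other

x = torch.ones(3)
assert torch.equal(binary_op(x, torch.tensor(2.0)), x + 2.0)
```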

Used to run the [FD model](
https://www.internalfb.com/manifold/explorer/wrist-camera-ml/tree/models/fd-ted-pi/fd-hybrid/fd_hybrid_vulkan.ptl) on [CLI](https://www.internalfb.com/intern/wiki/Malibu/Software/Machine_Learning/PyTorch_On_Device_Catalog/#build-and-run-native-pyt)

Test Plan:
```
lfq@lfq-mbp fbsource % buck run --target-platforms ovr_config//platform/macos:arm64-fbsource //xplat/caffe2:pt_vulkan_api_test_binAppleMac\#macosx-arm64 -c pt.vulkan_full_precision=1
...

[ RUN      ] VulkanAPITest.querypool_flushed_shader_log
xplat/caffe2/aten/src/ATen/test/vulkan_api_test.cpp:6891: Skipped
QueryPool is not available
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log (0 ms)
[----------] 339 tests from VulkanAPITest (5308 ms total)

[----------] Global test environment tear-down
[==========] 339 tests from 1 test suite ran. (5308 ms total)
[  PASSED  ] 338 tests.
[  SKIPPED ] 1 test, listed below:
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log

  YOU HAVE 5 DISABLED TESTS
```

Differential Revision: D48672535

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109035
Approved by: https://github.com/manuelcandales
2023-09-12 07:35:59 +00:00
59f605be57 Revert "Reland 2: Add PyObject preservation for UntypedStorage (#109039)"
This reverts commit 419e4e17a2c991d17685754a7fb0ddcf7dfdac87.

Reverted https://github.com/pytorch/pytorch/pull/109039 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing linter job in trunk, probably due to a landrace ([comment](https://github.com/pytorch/pytorch/pull/109039#issuecomment-1715147020))
2023-09-12 07:26:11 +00:00
47be61e12b untracked inputs in constraints (#109037)
Differential Revision: [D49157009](https://our.internmc.facebook.com/intern/diff/D49157009/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109037
Approved by: https://github.com/zhxchen17
2023-09-12 06:50:01 +00:00
f9a250c35b Force synced KJT to trace unbacked SymInt (#108960)
Summary:
The basic concept behind this diff is to modify Dynamo's tracing behavior when it encounters a KeyedJaggedTensor that is synced (aka has `_length_per_key` and `_offset_per_key` populated). These fields are lists of integers; ordinarily, Dynamo will optimistically try to specialize on integers, however, for KJTs, we know that these integers will definitely vary from run-to-run. Furthermore, ordinarily, we would also specialize these integers if they are 0/1, but we will frequently expect features in KJTs to be 0/1.

The fix is to detect KJTs and treat these integers as *unbacked integers*. This is NOT a universally sound optimization: when treating these integers as unbacked, we never report them as equal to zero or one. In return, we always generate graphs that generalize no matter the length of values on features. This is enough to trace through APS sparse arch, torchrec_dlrm and some small split-cat examples.

The special integer behavior is triggered by a dynamically scoped `force_unspec_int_unbacked_size_like` variable on TracingContext, which we trigger when we wrap a KJT. There probably are other ways to do this, but this was simple and worked.
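
The "dynamically scoped" flag can be pictured as a context manager flipped around the KJT wrapping code. A toy sketch (only the flag name comes from this diff; TracingContext is stood in by a SimpleNamespace):

```python
import contextlib
from types import SimpleNamespace

ctx = SimpleNamespace(force_unspec_int_unbacked_size_like=False)

@contextlib.contextmanager
def force_unbacked_ints(ctx):
    # While wrapping a KJT's length/offset lists, treat their ints as
    # unbacked size-like symbols instead of specializing on 0/1 values.
    prev = ctx.force_unspec_int_unbacked_size_like
    ctx.force_unspec_int_unbacked_size_like = True
    try:
        yield
    finally:
        ctx.force_unspec_int_unbacked_size_like = prev

with force_unbacked_ints(ctx):           # e.g. while wrapping a KJT
    assert ctx.force_unspec_int_unbacked_size_like
assert not ctx.force_unspec_int_unbacked_size_like
```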

Test Plan:
```
buck2 test mode/dev-nosan //pytorch/benchmark/fb/test_gpu:run_test_gpu
```

from aakhundov

1. first build feed_lower_benchmark:
```
buck2 build --show-output mode/opt -c python.package_style=inplace -c fbcode.enable_gpu_sections=true -c fbcode.platform=platform010 -c fbcode.split-dwarf=true hpc/new/models/feed/benchmark:feed_lower_benchmark
```
2. then run the lowering of the model with it:
```
TORCHINDUCTOR_MAX_AUTOTUNE=1 TORCHINDUCTOR_UNIQUE_KERNEL_NAMES=1 TORCH_LOGS="output_code,graph_code" TORCH_COMPILE_DEBUG=1 ../buck-out/v2/gen/fbcode/79c6b019ee0f9469/hpc/new/models/feed/benchmark/__feed_lower_benchmark__/feed_lower_benchmark.par --load=manifold://ig_inference_model/tree/user/facebook/fblearner/predictor/960999465/60/gpu_lowering/input.predictor --skip-trt --skip-ait --sync-mode=0 --enable-aot-inductor --lower-presets="ig_stories" --gpu-trace
```
cf https://docs.google.com/document/d/1yD30xYrdmM8r2HTdmXnZTg0-MHVexfVrAa0294m1AUE/edit?pli=1#heading=h.qiv3fp7e6zg0

From torchrec: https://www.internalfb.com/intern/wiki/Torchrec/Development/Testing_production_models/

From ge0405
baseline (without your diff): f477293168
your diff: f477292363

Differential Revision: D49019987

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108960
Approved by: https://github.com/voznesenskym
2023-09-12 03:44:24 +00:00
6c8b0dfba6 [export] Add a private interface for customizing decomp. (#109058)
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109058
Approved by: https://github.com/angelayi
2023-09-12 03:05:46 +00:00
15202cc80c [caffe2] Remove cxx override to c++17 (#108687)
Summary: Allow the user to specify the cxx version to use when compiling. For applications that compile with C++20, we wish to also compile this library with C++20 to avoid subtle ODR violations with using different library standards.

Test Plan: Built the project successfully.

Reviewed By: smeenai

Differential Revision: D48636406

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108687
Approved by: https://github.com/davidberard98
2023-09-12 02:54:59 +00:00
b1f21399c8 Prerequisite of ATen/native/utils header for C++ extension (#109013)
# Motivate
Without this PR, if we would like to include a header file like ```#include <ATen/native/ForeachUtils.h>``` in our C++ extension, it will raise an error: ```/home/xxx/torch/include/ATen/native/ForeachUtils.h:7:10: fatal error: 'ATen/native/utils/ParamsHash.h' file not found```. We should fix it.

# Solution
Add the ATen/native/utils header files to the build.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109013
Approved by: https://github.com/ezyang
2023-09-12 02:30:45 +00:00
85428f5ea5 Fix 0-sized views of tensors in cudagraphs (#109055)
Fixes an internal model. If a tensor with real storage is viewed by a 0-sized tensor, the storage is still kept alive and needs to be accounted for in our storage tracking.
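
A sketch of the aliasing pattern the fix covers (assumes a CUDA build; the assertion value follows from 4 float32 elements):

```python
import torch

base = torch.ones(4, device="cuda")
zero_view = base[2:2]   # 0-sized view into base's real storage
del base
# zero_view has no elements, yet it still keeps the 16-byte storage
# alive, so cudagraphs' storage tracking must account for it.
assert zero_view.numel() == 0
assert zero_view.untyped_storage().nbytes() == 4 * 4
```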

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109055
Approved by: https://github.com/ezyang, https://github.com/xw285cornell
2023-09-12 01:24:43 +00:00
419e4e17a2 Reland 2: Add PyObject preservation for UntypedStorage (#109039)
Relands #103907 after it was reverted. This PR makes the new `ignore_hermetic_tls` argument of `check_pyobj` optional to avoid causing a compilation error in torchdistx

Part of #91395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109039
Approved by: https://github.com/ezyang
2023-09-12 01:19:40 +00:00
2039f30c06 Revert "[inductor] Parallelize Max Autotune step 1: Use Popen (#107982)"
This reverts commit d6856680039e5557b45e4cd6e95f82ca64f6435a.

Reverted https://github.com/pytorch/pytorch/pull/107982 on behalf of https://github.com/masnesral due to fbcode failures ([comment](https://github.com/pytorch/pytorch/pull/107982#issuecomment-1714818307))
2023-09-12 01:12:22 +00:00
c36c2bfcb2 Revert "[inductor] Parallelize Max Autotune step 2: Use all GPUs (#107983)"
This reverts commit 2c61313ff3b9ca585f04a4bb78263f301a8cec27.

Reverted https://github.com/pytorch/pytorch/pull/107983 on behalf of https://github.com/masnesral due to fbcode failures ([comment](https://github.com/pytorch/pytorch/pull/107983#issuecomment-1714816358))
2023-09-12 01:08:08 +00:00
cyy
f150f96255 [Reland] increase clang-tidy coverage in torch/csrc (#108875)
Reland  PR #103058 since there was a time gap between this PR and other PRs in terms of torch/csrc modifications

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108875
Approved by: https://github.com/Skylion007
2023-09-12 00:54:53 +00:00
b6f9d4dbc4 [DCP] Enable nD device_mesh resharding DTensor in DCP and add associated tests (#106230)
This PR:
     1. Drop assert for 1D DeviceMesh check to allow DTensor with nD DeviceMesh when creating write_item.
     2. Add tests for both placement changes and mesh changes for both 1D and 2D scenarios.

cc. @kumpera  @wanchaol  @fegin
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106230
Approved by: https://github.com/kumpera
2023-09-12 00:47:58 +00:00
cyy
8025b193a9 Re-enable some Windows tests (#108930)
The tests were disabled long ago. It should be fine to enable them now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108930
Approved by: https://github.com/kit1980
2023-09-12 00:33:19 +00:00
4691cb26b3 Disable compile for massive data pipe test (#109063)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109063
Approved by: https://github.com/clee2000
ghstack dependencies: #108846
2023-09-12 00:15:52 +00:00
55a204ebc8 [Easy] log graphs in compiled_autograd if TORCH_LOGS=compiled_autograd (#108991)
[Easy] log graphs in compiled_autograd if TORCH_LOGS=compiled_autograd

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108991
Approved by: https://github.com/ezyang
ghstack dependencies: #108846
2023-09-12 00:15:02 +00:00
33c1136f89 Added limit on number of warps for coordesc autotuner (#108997)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108997
Approved by: https://github.com/shunting314
2023-09-12 00:14:38 +00:00
241e84bf98 [quant][be] Rewrite xnnpack_quantizer_utils.py to use decorators (#108920)
Summary:
att

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108920
Approved by: https://github.com/kimishpatel
2023-09-12 00:09:13 +00:00
b2cba439b4 Introduce Tensor overload to linspace and logspace (#104889)
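A usage sketch of what the Tensor overload presumably enables (the values shown assume the same semantics as the scalar overload):

```python
import torch

# start/end may now be 0-dim tensors instead of Python scalars (sketch).
start, end = torch.tensor(0.0), torch.tensor(1.0)
torch.linspace(start, end, steps=5)  # tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
torch.logspace(start, end, steps=3)  # 10**linspace: tensor([ 1.0000,  3.1623, 10.0000])
```
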
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104889
Approved by: https://github.com/zou3519
ghstack dependencies: #107958
2023-09-11 23:30:40 +00:00
405f014c26 [jit] Skip NNAPI, test_ivalue, CPU NNC tests in fbcode (#108937)
Summary:
NNAPI: Internal test infra can't find test_nnapi.py. Easiest solution is to just skip these tests if test_nnapi.py can't be found
test_ivalue: fails due to qscheme op not implemented for CPU backend. In OSS, it doesn't run because it's not included in test_jit.py.
CPU NNC tests: test_working_byte_cpu_float32 is failing, but hard to repro; we don't use CPU NNC internally, so let's just skip CPU NNC tests internally.

Differential Revision: D48041615

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108937
Approved by: https://github.com/eellison
2023-09-11 22:42:30 +00:00
293d3b89d8 Add Opinfos for the Tensor overload of linspace/logspace (#107958)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107958
Approved by: https://github.com/zou3519
2023-09-11 22:30:19 +00:00
03fd3544a2 fixed lgamma documentation error (#108719)
Fixes #108527

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108719
Approved by: https://github.com/zou3519
2023-09-11 22:29:06 +00:00
97d9188178 Special treatment to build AOTInductor with cuda-12 from Meta internal (#108831)
Differential Revision: D49042577

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108831
Approved by: https://github.com/bertmaher
2023-09-11 22:16:23 +00:00
29c29339e5 Add torch_lazy_enable_device_data_cache to disable lazy device data cache (#107827)
### Add python binding variables for enabling and disabling

These changes will be used in the pytorch/xla repository for lowering HLO for the AWS Neuron compiler.  For correct tensor lowerings the device cache size must be set to zero.

It is advantageous to be able to enable and disable the cache without deleting it.  This allows use of the XLA device, and HLO lowering in a single python file, isolating cache disablement to a python context.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107827
Approved by: https://github.com/JackCaoG, https://github.com/wconstab, https://github.com/bdhirsh
2023-09-11 22:14:39 +00:00
03bf745e1d Fix the parameter error in test_device_mesh.py (#108758)
Fix the parameter error in test_device_mesh.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108758
Approved by: https://github.com/awgu
2023-09-11 21:39:13 +00:00
bb14805bcd fix an incorrect indent in documentation (#108273)
doc for `torch.distributed.send(tensor, dst, group=None, tag=0)` was rendering incorrectly here: https://pytorch.org/docs/stable/distributed.html due to lack of indent (it was interpreting the continuation as a new argument).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108273
Approved by: https://github.com/awgu, https://github.com/kit1980
2023-09-11 21:27:52 +00:00
a4138b1f99 [ez] Fix small type error in run_test (#109036)
This is really small but it has tripped me up at least 3 times.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109036
Approved by: https://github.com/kit1980
2023-09-11 21:11:20 +00:00
5c8efa6077 [export] Fix export arg type declaration (#109060)
Summary: It's an arbitrary-length tuple of anything; Tuple[Any] means exactly one element.

Test Plan: ci

Differential Revision: D49161625

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109060
Approved by: https://github.com/angelayi
2023-09-11 20:54:05 +00:00
b0656ac81f [pytorch-vulkan] move glsl random utils to Random.h (#108724)
Summary:
I plan to use the Box-Muller method for sampling from the normal distribution to implement `aten::randn_like`, which can use the existing uniform functions, so I move them out to a `random.h`.

https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform

Test Plan:
```
[ttingchulin@95660.od /data/sandcastle/boxes/fbsource (rand_lib)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run fbcode/mode/dev-nosan //xplat/caffe2:pt_vulkan_api_test_bin -- --gtest_filter="*uniform*"

BUILD SUCCEEDED
Running main() from xplat/third-party/gmock/googletest-1.12.1/googletest/src/gtest_main.cc
Note: Google Test filter = *uniform*
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from VulkanAPITest
[ RUN      ] VulkanAPITest.uniform
[       OK ] VulkanAPITest.uniform (120 ms)
[----------] 1 test from VulkanAPITest (120 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (120 ms total)
[  PASSED  ] 1 test.
```
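
For reference, a Python sketch of the Box-Muller transform the summary mentions (the eventual implementation will presumably be a GLSL shader; this is just the math):

```python
import math, random

def box_muller():
    # Two independent uniforms on (0, 1] -> two independent standard normals.
    u1 = 1.0 - random.random()   # shift to (0, 1] to avoid log(0)
    u2 = random.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)

print(box_muller())
```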

Reviewed By: yipjustin

Differential Revision: D48750679

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108724
Approved by: https://github.com/yipjustin
2023-09-11 20:27:43 +00:00
e7bd9c5315 [CUDA][CUDA Graphs] Fix CUDAGraph::reset function (#108896)
The following two cases fail due to a small oversight in `CUDAGraph::reset()` that causes failures in the graph destructor
```Python
import torch

x = torch.zeros(4, device="cuda")
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    x = x + 1

g.reset()
del g
```
that fails with:
```
terminate called after throwing an instance of 'c10::Error'
  what():  uc >= 0 INTERNAL ASSERT FAILED at ".../pytorch/c10/cuda/CUDACachingAllocator.cpp":2157, please report a bug to PyTorch.
```

and reset and subsequent re-capture
```Python
import torch

x = torch.zeros(4, device="cuda")
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    x = x + 1

g.reset()

with torch.cuda.graph(g):
    x = x + 1
g.replay()
```
which fails with:
```
Traceback (most recent call last):
  File "test_graph.py", line 11, in <module>
    with torch.cuda.graph(g):
  File ".../pytorch/torch/cuda/graphs.py", line 192, in __enter__
    self.cuda_graph.capture_begin(
  File ".../pytorch/torch/cuda/graphs.py", line 77, in capture_begin
    super().capture_begin(pool=pool, capture_error_mode=capture_error_mode)
RuntimeError: This CUDAGraph instance already owns a captured graph. To capture a new graph, create a new instance.

```

This PR fixes the `CUDAGraph::reset()` function for the above two use cases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108896
Approved by: https://github.com/ezyang
2023-09-11 19:49:31 +00:00
fb288aa99b Add Bfloat16 support to CrossKernel.cu (#108941)
Fixes #108940
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108941
Approved by: https://github.com/mikaylagawarecki
2023-09-11 19:05:01 +00:00
5976a08eea [inductor] Add ir.Scan and lower aten.cumsum on CUDA (#106581)
This adds the `ir.Scan` node (currently only supported on CUDA) which re-uses the existing reduction kernel machinery to support different kinds of non-pointwise ops. Just like reductions it supports prologue and epilogue fusions and has both persistent and non-persistent kernel generation.

Currently this doesn't support the equivalent of `Reduction.create_multilayer` and will instead fall back to eager in those cases. This is because splitting into multiple kernel invocations ends up being far slower than cub's single kernel strategy which matches the performance of a copy kernel.
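
A usage sketch of what now lowers through `ir.Scan` (CUDA only, per the above; the pointwise multiply is the kind of epilogue that can fuse):

```python
import torch

@torch.compile
def prefix_stats(x):
    # cumsum lowers to ir.Scan; the multiply fuses as a pointwise epilogue.
    return x.cumsum(0) * 0.5

prefix_stats(torch.randn(1024, device="cuda"))
```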

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106581
Approved by: https://github.com/lezcano, https://github.com/atalman
2023-09-11 18:44:10 +00:00
2bcff92540 Add NestedTensor python subclass (#108314)
Description coming soon

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108314
Approved by: https://github.com/jbschlosser
ghstack dependencies: #108808
2023-09-11 18:29:20 +00:00
4a4a2fc1a5 Enable Mypy Checking for torch/_inductor/fx_passes/fuse_attention.py (#107369)
Fixes #105230

Summary:

As suggested in https://github.com/pytorch/pytorch/issues/105230 mypy checking is enabled in torch/_inductor/fx_passes/fuse_attention.py

After Fix:

mypy --follow-imports=skip torch/_inductor/fx_passes/fuse_attention.py Success: no issues found in 1 source file

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107369
Approved by: https://github.com/mikaylagawarecki
2023-09-11 18:08:26 +00:00
e276d70451 Revert "Add Opinfos for the Tensor overload of linspace/logspace (#107958)"
This reverts commit 106e0a0ef19c8dad088fc1ec10d7d93d76409352.

Reverted https://github.com/pytorch/pytorch/pull/107958 on behalf of https://github.com/clee2000 due to I think the newly added test test_mps.py::TestConsistencyCPU::test_output_match_logspace_tensor_overload_cpu_complex64 is broken, probably a landrace since the mergebase seems to be 21 days old 106e0a0ef1 https://github.com/pytorch/pytorch/actions/runs/6149523234/job/16685849126 ([comment](https://github.com/pytorch/pytorch/pull/107958#issuecomment-1714309905))
2023-09-11 17:38:04 +00:00
a7f5abeade Revert "Introduce Tensor overload to linspace and logspace (#104889)"
This reverts commit 57e52393213b6b4fba3b334654b96396a2904087.

Reverted https://github.com/pytorch/pytorch/pull/104889 on behalf of https://github.com/clee2000 due to sorry have to revert this to revert https://github.com/pytorch/pytorch/pull/107958 ([comment](https://github.com/pytorch/pytorch/pull/104889#issuecomment-1714305768))
2023-09-11 17:33:48 +00:00
1d42148fee [dynamo] preserve some FX node metadata of GraphModules (#107067)
Requested from @tugsbayasgalan: we want dynamo to preserve some FX node metadata when we trace `GraphModule`s (`nn_module_stack`, `source_fn`, `stack_trace`). This is helpful for the case when we export an aten-level `GraphModule`, add some (possibly non-torch or non-aten) ops, and we want to transform the graph back into an aten-level graph. Without preserving metadata, future passes that look at metadata (e.g. quantization passes) won't work.

This feature also has the additional benefit of being able to preserve origin line of code when `print_readable`'ing a `GraphModule`. This is helpful when debugging graphs that have passed through dynamo several times.

The added unit test demonstrates the added functionality of this PR.

~This PR is currently a proof-of-concept implementation that shows that preserving node metadata across dynamo is possible.~ This PR preserves node metadata across dynamo by doing the following:
- ~inject a counter variable into the `GraphModule` source code, which is incremented every time a node is run~
- Construct a line number -> node index map in `GraphModule` as the source code is being generated.
- pass a list of node metadata and the line number map to dynamo's bytecode analyzer
- ~dynamo traces the counter as a `ConstantVariable`, so when we create a new proxy, we can determine which original node index this proxy corresponds by looking at the value of the traced counter~
- When we create a new proxy, get the current instruction's line number, and get the node index using the line number map
- index into the original node metadata ~using the counter variable's tracked value.~

~Some things that should be addressed off the top of my head:~
- ~Is this feature even desirable? (Do we really want Dynamo to have special behavior for `GraphModules`? Should we expect users to re-export `GraphModules`?)~
- ~Is there a better approach than to use a counter? We considered using node names, line numbers, and assuming that proxies are created in the same order as the nodes, but each of these 3 have shortcomings. For node names, we only have access to new node names, not the old ones. Using line number is fragile. The third is problematic since not all created nodes go through `create_proxy` (e.g. inputs). We currently generate a line number to node index map when the `GraphModule`'s code is generated.~
- ~What's the best way to send data across the "CPython gap"? That is, it is not obvious how to cleanly pass data from dynamo's `eval_frame.py:_TorchDynamoContext.__call__` to `symbolic_convert.py:InstructionTranslatorBase.__init__`. In this PR, we use a global.~
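
A rough sketch of the re-export scenario this enables (the exact torch._dynamo.export calling convention varies across versions, so treat this as illustrative):

```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(4, 4)

    def forward(self, x):
        return self.lin(x).relu()

gm, _ = torch._dynamo.export(M(), torch.randn(2, 4))
# Re-trace the GraphModule; with this PR, metadata such as
# nn_module_stack / source_fn / stack_trace should survive.
gm2, _ = torch._dynamo.export(gm, torch.randn(2, 4))
for node in gm2.graph.nodes:
    print(node.op, node.name, node.meta.get("nn_module_stack"))
```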

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107067
Approved by: https://github.com/jansel
2023-09-11 17:11:51 +00:00
ba4782e3c0 cleanup typos; redundant parentheses (#109003)
- minor spelling fixes in `aten/src/ATen/core/TransformationHelper.h`
- remove redundant parentheses in control statements in `torch/distributed/algorithms/_quantization/quantization.py`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109003
Approved by: https://github.com/davidradl, https://github.com/H-Huang
2023-09-11 17:09:17 +00:00
3b265e021f Support Optional typehint without graph breaking (#108970)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108970
Approved by: https://github.com/anijain2305
2023-09-11 16:42:44 +00:00
090fe45e1c Revert "make sure all torch._numpy tests run on CI (#108762)"
This reverts commit 7abeb92796635bd3ee216a0872bddd0395e97d10.

Reverted https://github.com/pytorch/pytorch/pull/108762 on behalf of https://github.com/clee2000 due to sorry but I think the asan test_scalarmath failure is real 7abeb92796 https://github.com/pytorch/pytorch/actions/runs/6132913963/job/16645381921 ([comment](https://github.com/pytorch/pytorch/pull/108762#issuecomment-1714214523))
2023-09-11 16:29:20 +00:00
3efc1882e8 Update CopySlices to not internal assert when grad_output is undefined (#108353)
Fixes https://github.com/pytorch/pytorch/issues/107928

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108353
Approved by: https://github.com/albanD
ghstack dependencies: #107296, #107349
2023-09-11 16:26:05 +00:00
e8a402c56e [quant][pt2] Fix and rename move_model_to_eval (#108891)
Summary:
This commit fixes two silent correctness problems with
the current implementation of `move_model_to_eval`:

(1) Previously the user had to manually call `eliminate_dead_code`
before calling `move_model_to_eval`, otherwise the dropout pattern
won't actually get eliminated. This is because subgraph rewriter
complains the match is not self-contained, and so silently does
not do the replacement.

(2) We wish to error when the user calls `model.train()` or
`model.eval()` on an exported model. This error is raised
correctly immediately after export today, but no longer raised
after the user calls prepare or convert.

We fix (1) by moving the `eliminate_dead_code` call into
`move_model_to_eval`, and fix (2) by ensuring the respective
errors are thrown after prepare and convert as well.

Additionally, this commit renames `move_model_to_eval` to
`move_exported_model_to_eval` to be more explicit.

bypass-github-export-checks

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_disallow_eval_train
python test/test_quantization.py TestQuantizePT2E.test_move_exported_model_to_eval

Imported from OSS

Differential Revision: D49097293

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108891
Approved by: https://github.com/jerryzh168
2023-09-11 15:37:01 +00:00
57e5239321 Introduce Tensor overload to linspace and logspace (#104889)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104889
Approved by: https://github.com/zou3519
ghstack dependencies: #107958
2023-09-11 15:29:39 +00:00
106e0a0ef1 Add Opinfos for the Tensor overload of linspace/logspace (#107958)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107958
Approved by: https://github.com/zou3519
2023-09-11 15:29:39 +00:00
e19a855b4d [HSDP] Fix Node 1 unable receive parameters from Node 0 (#108331)
When using hybrid_shard mode FSDP, state.process_group contains only gpu_0...gpu_7 on node 0, so GPUs on node 1 cannot receive parameters; setting process_group to the default (global) group fixes this issue.

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108331
Approved by: https://github.com/awgu
2023-09-11 15:13:28 +00:00
cyy
9a492fc27f Fix unknown c++ flag detection in CMake (#109000)
Unknown -Wno-XXX flags are still appended to GCC via append_cxx_flag_if_supported because of the behavior described in the GCC documentation:
```
When an unrecognized warning option is requested (e.g., -Wunknown-warning),
GCC emits a diagnostic stating that the option is not recognized.
However, if the -Wno- form is used, the behavior is slightly different:
no diagnostic is produced for -Wno-unknown-warning unless other diagnostics are being produced.
This allows the use of new -Wno- options with old compilers,
but if something goes wrong, the compiler warns that an unrecognized option is present.
```
This PR fixes this by detecting the flag in its -WXXX form instead. Unfortunately, third_party/fbgemm/CMakeLists.txt redefines append_cxx_flag_if_supported and our version is overwritten. As a result, we have to re-include utils.cmake to overwrite it again.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109000
Approved by: https://github.com/malfet
2023-09-11 08:32:07 +00:00
18225cc6aa inductor: add custom pass hooks in post_grad_passes (#108615)
Supports #107921

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108615
Approved by: https://github.com/jansel
2023-09-11 04:13:32 +00:00
a6b153b311 inductor: remove redundant memory copy when view a ExternKernelAlloc buffer (#108635)
When viewing an ExternKernelAlloc buffer, there is always a redundant memory copy:
```
buf0: ExternKernelSchedulerNode(MKLPackedLinear)
buf0.writes = [StarDep(name='buf0')]
buf0.unmet_dependencies = []
buf0.met_dependencies = [StarDep(name='arg1_1'), StarDep(name='constant0'), StarDep(name='constant1')]
buf0.users = [NodeUser(node=SchedulerNode(name='buf1'), can_inplace=True, is_weak=False)]
buf0.node.kernel = torch.ops.mkl._mkl_linear

buf1: SchedulerNode(ComputedBuffer)
buf1.writes = [MemoryDep('buf1', c0, {c0: 64})]
buf1.unmet_dependencies = [MemoryDep('buf0', c0, {c0: 64})]
buf1.met_dependencies = []
buf1.users = [NodeUser(node=OUTPUT, can_inplace=False, is_weak=False)]
buf1.group.device = cpu
buf1.group.iteration = ((64,), ())
buf1.sizes = ([64], [])
class buf1_loop_body:
    var_ranges = {z0: 64}
    index0 = z0
    def body(self, ops):
        get_index = self.get_index('index0')
        load = ops.load('buf0', get_index)
        get_index_1 = self.get_index('index0')
        store = ops.store('buf1', get_index_1, load, None)
        return store
```

and the cpp backend-generated code is:
```
cpp_fused_view_0 = async_compile.cpp('''
#include "/tmp/torchinductor_xiaobing/ib/cibrnuq56cxamjj4krp4zpjvsirbmlolpbnmomodzyd46huzhdw7.h"
extern "C" void kernel(float* in_out_ptr0)
{
    #pragma omp parallel num_threads(40)
    {
        {
            #pragma omp for
            for(long i0=static_cast<long>(0L); i0<static_cast<long>(64L); i0+=static_cast<long>(16L))
            {
                auto tmp0 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + static_cast<long>(i0));
                tmp0.store(in_out_ptr0 + static_cast<long>(i0));
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg1_1, = args
    args.clear()
    assert_size_stride(arg1_1, (4, 16), (16, 1))
    buf0 = torch.ops.mkl._mkl_linear(arg1_1, constant1, constant0, None, 4)
    del arg1_1
    buf1 = reinterpret_tensor(buf0, (4, 4, 4), (16, 4, 1)); del buf0  # reuse
    cpp_fused_view_0(c_void_p(buf1.data_ptr()))
    return (buf1, )
```

For the ExternKernelAlloc buffer, we can do a real view, rather than a memory copy.
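
After this change, the generated wrapper should presumably reduce to a real view, along these lines (a sketch of the expected output in the same style as the snippet above, not literal generated code; `reinterpret_tensor` and the constants come from inductor's generated-code environment):

```python
def call(args):
    arg1_1, = args
    args.clear()
    assert_size_stride(arg1_1, (4, 16), (16, 1))
    buf0 = torch.ops.mkl._mkl_linear(arg1_1, constant1, constant0, None, 4)
    del arg1_1
    # A real view: no cpp_fused_view kernel, no extra memory traffic.
    return (reinterpret_tensor(buf0, (4, 4, 4), (16, 4, 1)), )
```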

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108635
Approved by: https://github.com/jgong5, https://github.com/desertfire, https://github.com/jansel
ghstack dependencies: #108560
2023-09-11 01:19:37 +00:00
a6ada463ec inductor: make onednn linear inputs are always real contiguous (#108560)
For OneDNN linear, if the packed linear inputs are not in the default contiguous layout, it always falls back to the reference path and gets worse performance. This PR forces its inputs to the actual default contiguous layout.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108560
Approved by: https://github.com/jgong5, https://github.com/desertfire, https://github.com/jansel
2023-09-11 01:11:36 +00:00
e716505345 Graph break within check_kwargs for higher order ops #108597 #108730 (#108821)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108821
Approved by: https://github.com/anijain2305
2023-09-10 21:09:02 +00:00
1a3a07ac2c [inductor] Enable Mypy Checking for torch/_inductor/codegen/triton_utils.py (#108951)
Summary: Used monkeytype to generate the typehints and enabled mypy checking

Test Plan: `lintrunner torch/_inductor/codegen/*.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108951
Approved by: https://github.com/Skylion007
2023-09-10 19:18:51 +00:00
4a98c898e2 Refactor ios-build-test workflow to support binary release (#108322)
This refactors the logic from CircleCI iOS [build](https://github.com/pytorch/pytorch/blob/main/.circleci/config.yml#L1323-L1344) and [upload](https://github.com/pytorch/pytorch/blob/main/.circleci/config.yml#L1369-L1377) jobs to GHA.

* Nightly artifacts will be available again on `ossci-ios-build` S3 bucket, for example `libtorch_lite_ios_nightly_2.1.0.20230517.zip`.  The last one there was s3://ossci-ios-build/libtorch_lite_ios_nightly_2.1.0.20230517.zip from May 17th
  * [LibTorch-Lite-Nightly](https://github.com/CocoaPods/Specs/blob/master/Specs/c/3/1/LibTorch-Lite-Nightly/1.14.0.20221109/LibTorch-Lite-Nightly.podspec.json) on cocoapods
* Release artifacts will be on `ossci-ios` S3 bucket, for example `s3://ossci-ios/libtorch_lite_ios_1.13.0.zip` from Nov 3rd 2022
  * [LibTorch-Lite](https://github.com/CocoaPods/Specs/blob/master/Specs/c/c/3/LibTorch-Lite/1.13.0.1/LibTorch-Lite.podspec.json) on cocoapods
  * [LibTorch](https://github.com/CocoaPods/Specs/blob/master/Specs/1/3/c/LibTorch/1.13.0.1/LibTorch.podspec.json) on cocoapods

I will clean up Circle CI code in another PR.

### Testing

Generate new release artifacts for testing from main branch.  Simulator testing have all passed.

* With lite interpreter https://github.com/pytorch/pytorch/actions/runs/6093860118
  * https://ossci-ios.s3.amazonaws.com/libtorch_lite_ios_2.1.0.zip
  * https://ossci-ios.s3.amazonaws.com/LibTorch-Lite-2.1.0.podspec

* LibTorch binary can be built without lite interpreter https://github.com/pytorch/pytorch/actions/runs/6103616035 and uses TorchScript, but it has been long dead from my understanding.  The binary can still be built and tested though.
  * https://ossci-ios.s3.amazonaws.com/libtorch_ios_2.1.0.zip
  * https://ossci-ios.s3.amazonaws.com/LibTorch-2.1.0.podspec

### Next step for release

* Once the PR is committed.  I plan to use the workflow dispatch to build the binaries manually on `release/2.1` branch.  Once they looks good, we can publish them on cocoapods.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108322
Approved by: https://github.com/atalman
2023-09-10 19:08:15 +00:00
63ae1051e1 MAINT: do not test numpy funcs in torch._numpy (#108807)
Remove testing of numpy functions which torch._numpy does not implement.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108807
Approved by: https://github.com/lezcano
2023-09-10 17:18:30 +00:00
cyy
59254c75a1 [Reland] fix c10:TempFile APIs on Windows (#108508)
PR #106656 was reverted due to IOS failures. It seems that IOS builds don't have full support of std::filesystem. This PR discards std::filesystem changes and add temp file creation on Windows. It also moves the platform syscalls into a separate cpp file.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108508
Approved by: https://github.com/ezyang
2023-09-10 16:58:41 +00:00
f81eacd30c typo fix strategy_comb in basic_strategy.py (#108972)
Typo fix `startegy_comb` -> `strategy_comb` in `basic_strategy.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108972
Approved by: https://github.com/Skylion007
2023-09-10 15:58:15 +00:00
2c61313ff3 [inductor] Parallelize Max Autotune step 2: Use all GPUs (#107983)
Summary: Step 2 in revamping subprocess autotune to support multiple GPUs: use a pool of subprocesses and distribute benchmark calls across them.

Test Plan:
`python test/inductor/test_max_autotune.py`
`TORCHINDUCTOR_AUTOTUNE_IN_SUBPROC=1 TORCHINDUCTOR_MAX_AUTOTUNE=1 python benchmarks/dynamo/torchbench.py --device cuda --performance --backend inductor --inference --only hf_Bart`
`TORCHINDUCTOR_AUTOTUNE_MULTI_DEVICE=1 TORCHINDUCTOR_AUTOTUNE_IN_SUBPROC=1 TORCHINDUCTOR_MAX_AUTOTUNE=1 python benchmarks/dynamo/torchbench.py --device cuda --performance --backend inductor --inference --only hf_Bart`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107983
Approved by: https://github.com/eellison, https://github.com/shunting314
ghstack dependencies: #107982
2023-09-10 15:43:03 +00:00
d685668003 [inductor] Parallelize Max Autotune step 1: Use Popen (#107982)
Summary: Step 1 in revamping subprocess autotune to support multiple GPUs: use Popen to create a new process with an entry point we control so we don't reinterpret the toplevel script.

Test Plan: `python test/inductor/test_max_autotune.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107982
Approved by: https://github.com/eellison, https://github.com/shunting314
2023-09-10 15:43:03 +00:00
89eb7a75a2 CI Sev - pin docker images for A100 workers (#108871)
Pinning docker images, trying to address SEV : https://github.com/pytorch/pytorch/issues/108862
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108871
Approved by: https://github.com/huydhn
2023-09-10 13:30:00 +00:00
2b9ad3d5c4 Fix setitem with SymInt (#108873)
Fixes https://github.com/pytorch/pytorch/issues/101939

Several fixes bundled together (a repro sketch follows the list):

1. When we valueToTensor, we only handled non-symbolic inputs and not symbolic inputs. We support symbolic Scalar, so also handle symbolic values.
2. In the symbolic case, we MUST NOT lift_fresh, as you're not going to inline a constant into the graph, it's going to be from a `scalar_tensor` call (so no need to clone it to avoid mutations)
3. In indexing scalarToTensor, we must not use the static "directly read out the scalar contents" logic when the scalar is symbolic
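
A repro-style sketch of the kind of program these fixes enable (hypothetical example, not the original report from the linked issue):

```python
import torch

@torch.compile(dynamic=True)
def f(x, y):
    # y.shape[0] is a SymInt under dynamic shapes; assigning it into a
    # tensor element exercises the symbolic valueToTensor path.
    x[0] = y.shape[0]
    return x

f(torch.zeros(3), torch.randn(5))
```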

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108873
Approved by: https://github.com/jansel
2023-09-10 06:44:22 +00:00
9b12a28d89 [MPS] Implement mul operation for complex types (#108395)
Using existing BinaryKernel template

Add `mul` as well as `kron` and `outer` to list of MPS ops that support complex types

This should add all the missing ops mentioned in https://github.com/pytorch/pytorch/issues/105665
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108395
Approved by: https://github.com/albanD
ghstack dependencies: #108393, #108394
2023-09-10 05:39:12 +00:00
c7bb842d35 [MPS] Add complex add/sub (#108394)
Using `view_as_real` and running elementwise ops in resulted tensors
Add `add` and `sub` to list of complex ops that should work on MPS
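
In Python terms, the trick amounts to the following identity (a sketch; the actual change is in the MPS backend):

```python
import torch

def complex_add_via_real(a, b):
    # Complex add/sub is elementwise on the real and imaginary parts, so
    # view each tensor as (..., 2) real, add, and view back as complex.
    return torch.view_as_complex(torch.view_as_real(a) + torch.view_as_real(b))

a, b = torch.randn(4, dtype=torch.cfloat), torch.randn(4, dtype=torch.cfloat)
assert torch.equal(complex_add_via_real(a, b), a + b)
```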
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108394
Approved by: https://github.com/albanD
ghstack dependencies: #108393
2023-09-10 05:39:12 +00:00
92de1d2d02 Revert "[Dynamo][Test]Add a testcase for module with training state (#108750)"
This reverts commit f90444cf0b979ba434391fdd42fb1e2afb98ac34.

Reverted https://github.com/pytorch/pytorch/pull/108750 on behalf of https://github.com/huydhn due to Sorry for reverting you change, but it starts failing this test https://github.com/pytorch/pytorch/issues/108838 without https://github.com/pytorch/pytorch/pull/108883 and the latter has been reverted ([comment](https://github.com/pytorch/pytorch/pull/108750#issuecomment-1712708800))
2023-09-10 04:45:00 +00:00
56c2386157 Revert "reland [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#108883)"
This reverts commit d4230e55748c66c72e7a17b1cd08540b742b20a5.

Reverted https://github.com/pytorch/pytorch/pull/108883 on behalf of https://github.com/huydhn due to Per the discussion thread on D49122208, reverting this change ([comment](https://github.com/pytorch/pytorch/pull/108883#issuecomment-1712707853))
2023-09-10 04:40:02 +00:00
53a4ca4b58 [MPS][BE] Add dispatch_sync_with_rethrow (#108393)
And enable testing for match_output for complex types.
Most of them should throw an "unsupported XYZ" error, rather than crash.
This fixed several crashes when linalg ops were invoked with complex inputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108393
Approved by: https://github.com/kit1980, https://github.com/kulinseth
2023-09-10 02:07:12 +00:00
2b138e4f7d [export] torch.export landing page (#108783)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108783
Approved by: https://github.com/avikchaudhuri, https://github.com/gmagogsfm
2023-09-10 01:40:42 +00:00
7abeb92796 make sure all torch._numpy tests run on CI (#108762)
- Add `if __name__ == "__main__": run_tests()` stanzas to test files in `torch_np` folder so that these tests run on CI
- Skip / xfail things smoked out by this change
- remove a stray python file which should not have been added to tests in the first place.
- fix einsum if opt_einsum is present
- add skips for older numpies

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108762
Approved by: https://github.com/lezcano
2023-09-09 20:05:27 +00:00
003c5bb156 Add checks to num_layers for RNN, LSTM, GRU (#108853)
Fixes #108223

As the title says.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108853
Approved by: https://github.com/mikaylagawarecki
2023-09-09 19:33:52 +00:00
4c503f2451 [Inductor] Extend Pattern Matcher to Match Equivalent Function Invocation (#107832)
Fixes #104391

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107832
Approved by: https://github.com/jansel
2023-09-09 19:19:11 +00:00
e4350d6d4e Functools partial support in dynamo (#108846)
The strategy for supporting functools partials is relatively straightforward.

There are 2 cases we need to support:

**1) Functools partials as input**
In this case, we are first seeing the functools partial and it is guaranteed to have a source. As such, the args, keywords, and func of the functools partial are passed through VariableBuilder. As this is the first time we are seeing these objects (as it is an input), we re-enter VariableBuilder with a source referencing the args, keywords, and func as attributes of the input to produce:

- func: A callable VariableTracker (UDF, TorchVariable, etc) depending on the value of `func`
- args: List[VariableTracker] - note, not ListVariableTracker!
- keywords: Dict[str, VariableTracker]

A major benefit of this structure is that it very elegantly matches the args to `call_function`.

We then compose a FunctoolsPartialVariable from the VariableTrackers made above.

**2) Functools partials created within compile**
In this case, we already have all the args as known VTs, and thus just compose a FunctoolsPartialVariable as we do for case (1).

For both (1) and (2) - we propagate all guards from the func, args, and keyword VTs to the FunctoolsPartialVariable
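
A minimal sketch of case (1), assuming an eager backend; the function names here are illustrative, not from the PR:

```python
import functools
import torch

def add_scaled(x, y, *, scale):
    return x + y * scale

partial_fn = functools.partial(add_scaled, scale=2.0)

@torch.compile(backend="eager", fullgraph=True)
def f(fn, x, y):
    # fn arrives as a FunctoolsPartialVariable; its func/args/keywords
    # were rebuilt as individual VariableTrackers (case 1 above).
    return fn(x, y)

out = f(partial_fn, torch.ones(3), torch.ones(3))
```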

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108846
Approved by: https://github.com/ezyang, https://github.com/jansel
2023-09-09 17:25:02 +00:00
8ff00360a4 [ROCm] Add ROCm AMDGPU support for inductor cpp codegen (#105141)
Follows from previous enablement attempt: https://github.com/pytorch/pytorch/pull/101797

Adds support for hsaco binaries in inductor's cpp_wrapper codegen and enables the CUDA tests in test_cpp_wrapper.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105141
Approved by: https://github.com/jansel
2023-09-09 16:28:56 +00:00
0f88d93b10 decomposition spectral ops fixes (#108360)
Fixes https://github.com/pytorch/pytorch/issues/105986, https://github.com/pytorch/pytorch/issues/108204, https://github.com/pytorch/pytorch/issues/108205

Fix all issues flagged when making changes for https://github.com/pytorch/pytorch/pull/107421

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108360
Approved by: https://github.com/ezyang
2023-09-09 04:48:09 +00:00
d4230e5574 reland [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#108883)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108883
Approved by: https://github.com/voznesenskym, https://github.com/huydhn
2023-09-09 03:12:31 +00:00
ed7f9cac91 [inductor] Add CPU-side profiler event names for templates and foreach kernels (#108449)
This passes in the descriptive kernel name as part of the triton_meta dict that gets passed to the CachingAutotuner, for foreach kernels and templates.

Before:
<img width="684" alt="Screenshot 2023-09-01 at 11 56 02 AM" src="https://github.com/pytorch/pytorch/assets/5067123/c14e13fc-0d9e-425a-a08b-613ef42aa264">

After:
<img width="562" alt="Screenshot 2023-09-01 at 2 13 00 PM" src="https://github.com/pytorch/pytorch/assets/5067123/551bb9a9-865b-401e-b6e0-8ebbe5431565">

This PR also refactors the "magic strings" (KERNEL_NAME and DESCRIPTIVE_KRNL_NAME) into an enum in utils.py.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108449
Approved by: https://github.com/jansel
2023-09-09 02:11:13 +00:00
311fbe43e6 [DeviceMesh] Fix __getitem__ docstring typo (#108837)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108837
Approved by: https://github.com/wanchaol
2023-09-09 01:46:14 +00:00
7b3efeaf42 Follow-up #108379 (#108905)
Fixes #108379

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108905
Approved by: https://github.com/abock
2023-09-09 01:38:36 +00:00
2c3febb273 [dynamo] disable flaky test_unhandled_exception_in_dynamo2 (#108906)
Fix https://github.com/pytorch/pytorch/issues/106028.

The test `test_unhandled_exception_in_dynamo` should cover most cases. The disabled test `test_unhandled_exception_in_dynamo2` covered some weird case that I found when implementing dynamo 3.11.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108906
Approved by: https://github.com/yanboliang
2023-09-09 01:10:09 +00:00
324b23f337 MAINT: torch/_numpy: remove stubs raising NIError (#108902)
Remove remaining stubs. There is no point raising NotImplementedError, now that a missing function triggers a graph break just by being missing in `torch._numpy` namespace.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108902
Approved by: https://github.com/lezcano
2023-09-09 00:11:14 +00:00
b41b189b71 Un-skip the linalg_ldl_solve tests (#108842)
There's a comment that says it segfaults, but it doesn't appear to do so
any more

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108842
Approved by: https://github.com/eellison
2023-09-08 23:34:16 +00:00
a5e1d38025 add check for torch_arg (#108397)
Fixes https://github.com/pytorch/pytorch/issues/108219
Add a check to the torch_arg macro: in_channels/out_channels/groups should be greater than 0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108397
Approved by: https://github.com/mikaylagawarecki
2023-09-08 23:18:27 +00:00
af8b04d5f6 Add create_graph_input debug log (#108836)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108836
Approved by: https://github.com/mlazos, https://github.com/voznesenskym
2023-09-08 23:00:57 +00:00
66f67d9a25 Print restart attempt as part of Dynamo log context (#108864)
Now looks like:

```
[2023-09-08 06:04:48,532] [0/0] torch._dynamo.symbolic_convert: [DEBUG] TRACE STORE_ATTR foo [ConstantVariable(int), NNModule
Variable()]
[2023-09-08 06:04:48,532] [0/0] torch._dynamo.convert_frame: [INFO] Restarting analysis due to _dynamo/variables/nn_module.py:138 in convert_to_unspecialized
[2023-09-08 06:04:48,533] [0/0_1] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo start tracing f /data/users/ezyang/c/pytorch/a.py:6
[2023-09-08 06:04:48,533] [0/0_1] torch._dynamo.symbolic_convert.__trace_source: [DEBUG] TRACE starts_line f /data/users/ezyang/c/pytorch/a.py:6
```

I'm happy to bikeshed the exact formatting of the attempt number if you
want.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108864
Approved by: https://github.com/mlazos, https://github.com/voznesenskym
2023-09-08 23:00:19 +00:00
703cdd711f Revert "[export] Lift constant tensors as buffers (#108592)" (#108893)
This reverts commit e3407238f6be0583fe6dac7e2c4897f6c4480ed4.

I gave up trying to revert the original PR in the usual way https://github.com/pytorch/pytorch/pull/108592#issuecomment-1712135536, so let's manually revert it then.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108893
Approved by: https://github.com/izaitsevfb, https://github.com/atalman
2023-09-08 22:25:10 +00:00
f30f9fec87 Fix the issue described by #106769 (#108340)
Fixes #106769

Align the behavior of the C++ interface with the Python interface

1. Remove some checks in the C++ frontend API, which duplicate the ones below:
50fa5880e8/aten/src/ATen/native/RNN.cpp (L676-L690)
2. Add some checks
3. Support 1D input
4. Add tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108340
Approved by: https://github.com/mikaylagawarecki
2023-09-08 22:22:09 +00:00
8caaa4f4cd Revert "Re-land: Break graph on manual_seed. (#108647)"
This reverts commit c887309437817f39ea3ef484732af427b393899f.

Reverted https://github.com/pytorch/pytorch/pull/108647 on behalf of https://github.com/huydhn due to Ouch, we are hit again by another internal import error from https://github.com/pytorch/pytorch/blob/main/torch/_inductor/config.py#L205-L206 ([comment](https://github.com/pytorch/pytorch/pull/108647#issuecomment-1712230103))
2023-09-08 21:18:00 +00:00
296f015f42 [Dev Container]Add readme for devcontainer (#108848)
Following the PR https://github.com/pytorch/pytorch/pull/108766 , add a README to guide users through the usage of devcontainers.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108848
Approved by: https://github.com/drisspg
2023-09-08 21:03:27 +00:00
137afe74e0 Don't fastpath conj copy when conj/neg bit mismatch (#108881)
Fixes https://github.com/pytorch/pytorch/issues/106051

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108881
Approved by: https://github.com/soulitzer
2023-09-08 20:44:43 +00:00
bd1229477d [ONNX] Add initial support for FP8 ONNX export (#107962)
This PR resurrects @tcherckez-nvidia's #106379 with changes to resolve conflicts against newer `main` and defines our own constants for the new ONNX types to [avoid breaking Meta's internal usage of an old ONNX](https://github.com/pytorch/pytorch/pull/106379#issuecomment-1675189340).

- `::torch::onnx::TensorProto_DataType_FLOAT8E4M3FN=17`
- `::torch::onnx::TensorProto_DataType_FLOAT8E5M2=19`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107962
Approved by: https://github.com/justinchuby, https://github.com/titaiwangms
2023-09-08 20:40:39 +00:00
fa542cc4bb update triton pin (#108104)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108104
Approved by: https://github.com/desertfire
ghstack dependencies: #107722
2023-09-08 20:01:57 +00:00
39ff80125f Add support for an operator level thread local observer (#108822)
Summary: Add support for an operator level thread local observer

Test Plan: Verified the interception as part of a pytorch model evaluation with static runtime.

Differential Revision: D49082250

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108822
Approved by: https://github.com/davidberard98
2023-09-08 19:32:03 +00:00
68238606f3 Revert "Reland: Add PyObject preservation for UntypedStorage (#103907)"
This reverts commit 56b848157c259b4e53225e2516d603e9c8cfab79.

Reverted https://github.com/pytorch/pytorch/pull/103907 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing torchdistx build which uses check_pyobj here 9c1b9f5cb2/src/python/torchdistx/_C/deferred_init.cc (L87) ([comment](https://github.com/pytorch/pytorch/pull/103907#issuecomment-1712121158))
2023-09-08 19:27:07 +00:00
8d863560bd Allow adding extra dispatch keys to wrapper tensor subclass (#108808)
Updated version of https://github.com/pytorch/pytorch/pull/108313 which has more review comments
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108808
Approved by: https://github.com/bdhirsh
2023-09-08 18:46:09 +00:00
aa3355da8a Refactor torch.onnx documentation (#108379)
* Distinguish both TorchScript-based exporter (`torch.onnx.export`) and the TorchDynamo-based exporter (`torch.onnx.dynamo_export`) exporters
* Merge ONNX diagnostics page with the exporter page
* Add initial version of a quick overview on the new exporter
* Updates `torch.compiler.html` with the right page for the ONNX Runtime backend for `torch.compile`
* Renamed doc files to clearly identify files belonging to the legacy and newer onnx exporters

Fixes #108274

https://docs-preview.pytorch.org/pytorch/pytorch/108379/index.html
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108379
Approved by: https://github.com/justinchuby, https://github.com/wschin, https://github.com/malfet
2023-09-08 18:23:48 +00:00
e91f66471c [reland][inductor] Switch to use the runtime interface for AOTInductor testing (#108878)
Summary: This is a reland of https://github.com/pytorch/pytorch/pull/108663

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108878
Approved by: https://github.com/muchulee8
2023-09-08 17:58:35 +00:00
a81290ccb9 Add DLPack bool support (#108486)
Fixes #94463
Fixes https://github.com/pytorch/pytorch/issues/67081

- [X] Update DLPack header file
- [X] Add testing for DLPack boolean
- [X] Add boolean support to PyTorch's DLPack integration (round trip sketched below)
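
A minimal round-trip sketch of the new bool support:

```python
import torch
from torch.utils.dlpack import from_dlpack, to_dlpack

t = torch.tensor([True, False, True])
t2 = from_dlpack(to_dlpack(t))   # works once bool is in the DLPack header
assert t2.dtype == torch.bool and torch.equal(t, t2)
```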

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108486
Approved by: https://github.com/ezyang
2023-09-08 17:55:33 +00:00
b0de6a8002 [quant][executorch] Support inception_v4 in examples (#108382)
Summary: Verified that pt2e quant flow matches the fx flow with executorch backend config

Test Plan:
with-proxy buck2 run executorch/examples/quantization:example -- -m=ic4 --verify

```
[INFO 2023-08-31 16:08:06,923 example.py:77] prepare sqnr: inf
[INFO 2023-08-31 16:08:06,932 example.py:81] quant diff max: 0.0
[INFO 2023-08-31 16:08:06,936 example.py:85] quant sqnr: inf
```

full output: https://www.internalfb.com/intern/paste/P818520579/

Differential Revision: D48889075

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108382
Approved by: https://github.com/kimishpatel
2023-09-08 17:39:31 +00:00
25d657c701 Fix possible naming collision issue (#107743)
Summary: As pointed out in https://github.com/pytorch/pytorch/pull/107479, using a set prevents collisions like "a" => "a", "a" => "a_1", "a_1" => "a_1" (but should go to "a_1_1"). We can combine using counters and a set to avoid this problem. Still gets us the performance benefit in the case of collisions with a very minor penalty in a case with no collision.

Test Plan:
Extract this code and run:
```
# New version
from typing import Dict, Set

class Net:
    _net_names_used_counters: Dict[str, int] = {}
    _net_names_used: Set[str] = set()

    @staticmethod
    def current_prefix():
        return "test_prefix"

    @staticmethod
    def _get_next_net_name(basename):
        basename = "/".join(x for x in [Net.current_prefix(), basename] if x)
        idx = Net._net_names_used_counters.get(basename, 0)
        while (name := basename if idx == 0 else f"{basename}_{idx}") in Net._net_names_used:
            idx += 1
        Net._net_names_used_counters[basename] = idx + 1
        Net._net_names_used.add(name)
        return name

print(Net._get_next_net_name("basename"))
print(Net._get_next_net_name("x_basename"))
print(Net._get_next_net_name("basename"))
print(Net._get_next_net_name("basename"))
print(Net._get_next_net_name("x_basename"))
print(Net._get_next_net_name("basename_1"))

> test_prefix/basename
> test_prefix/x_basename
> test_prefix/basename_1
> test_prefix/basename_2
> test_prefix/x_basename_1
> test_prefix/basename_1_1
```

Differential Revision: D48576516

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107743
Approved by: https://github.com/zdevito
2023-09-08 17:39:27 +00:00
8990174676 [Dynamo] Should inline __new__ function rather than skipping frame (#108549)
Fixes #107460

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108549
Approved by: https://github.com/jansel
2023-09-08 16:51:47 +00:00
9b83402666 Add support for symbolic repeat_interleave (#108763)
Fixes https://github.com/pytorch/pytorch/issues/108195

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108763
Approved by: https://github.com/Chillee
2023-09-08 16:48:32 +00:00
ef2bbe1ae1 Dynamo support for autograd.Function w/ once_differentiable (#108686)
Fixes #106893

There are two main changes (a usage sketch follows this list):
- Before this PR, the function returned by once_differentiable was
included in skipfiles (because its .co_code is
torch/autograd/function.py). This PR adds a mechanism to tell Dynamo
to inline a function, no matter if it is included in skipfiles.
- A bugfix: when we are introspecting the backward, we need to turn the
grad mode off. This is to accurately model the eager-mode semantics:
In eager-mode PyTorch, if second-order gradients were not requested, then
the grad mode is off. torch.compile does not work with higher-order
gradients and just assumes we do first-order gradients, so this is OK.
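
A minimal sketch of the pattern this PR makes traceable; the concrete function is illustrative:

```python
import torch
from torch.autograd.function import once_differentiable

class MySin(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.sin()

    @staticmethod
    @once_differentiable          # previously forced a skip; now inlined
    def backward(ctx, grad):
        (x,) = ctx.saved_tensors
        return grad * x.cos()     # introspected with grad mode off (see above)

@torch.compile(backend="eager", fullgraph=True)
def f(x):
    return MySin.apply(x)

f(torch.randn(3, requires_grad=True)).sum().backward()
```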

Test Plan:
- new test

Differential Revision: [D49064185](https://our.internmc.facebook.com/intern/diff/D49064185)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108686
Approved by: https://github.com/voznesenskym
2023-09-08 16:10:32 +00:00
cyy
16c2fb702b fix a CMake syntax warning (#108849)
Fix the CMake Warning
```
(dev) at torch/CMakeLists.txt:389:
Syntax Warning in cmake code at column 115
Argument not separated from preceding token by whitespace.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108849
Approved by: https://github.com/Skylion007
2023-09-08 16:10:12 +00:00
fa8bfe5ca2 Revert "increase clang-tidy coverage in torch/csrc (#103058)"
This reverts commit cdf7f3e78032a17600f701e9153e9bb49fad8ce7.

Reverted https://github.com/pytorch/pytorch/pull/103058 on behalf of https://github.com/atalman due to Sorry for reverting your change, breaks lint ([comment](https://github.com/pytorch/pytorch/pull/103058#issuecomment-1711906915))
2023-09-08 16:07:41 +00:00
cyy
cdf7f3e780 increase clang-tidy coverage in torch/csrc (#103058)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103058
Approved by: https://github.com/Skylion007
2023-09-08 15:07:32 +00:00
2028987bf7 Fix finding Intel MKL on Windows, as well as LAPACK, cuDNN and cuSPARSELt (#108040)
Fixes #108039

Intel MKL is now found correctly:

-- MKL libraries: C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/intel64/mkl_intel_lp64.lib;C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/intel64/mkl_sequential.lib;C:/Program Files (x86)/Intel/oneAPI/mkl/latest/lib/intel64/mkl_core.lib
-- MKL include directory: C:/Program Files (x86)/Intel/oneAPI/mkl/latest/include

and LAPACK too (excerpt from build.ninja):

LINK_LIBRARIES = lib\c10.lib  lib\pthreadpool.lib  lib\cpuinfo.lib  lib\XNNPACK.lib  lib\fbgemm.lib  lib\libittnotify.lib  lib\gloo.lib  lib\foxi_loader.lib  lib\kineto.lib  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_intel_lp64.lib"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_sequential.lib"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_core.lib"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\**mkl_lapack95_lp64.lib**"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_intel_lp64.lib"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_sequential.lib"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_core.lib"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_intel_lp64.lib"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_sequential.lib"  "C:\Program Files (x86)\Intel\oneAPI\mkl\latest\lib\intel64\mkl_core.lib"

cuSPARSELt is also found correctly:

-- Found CUSPARSELT: C:/Program Files/NVIDIA cuSPARSELt/v0.4/lib/cusparseLt.lib

Also cuDNN include directory is properly added for the test target cuda_cudnn_test:

build caffe2\CMakeFiles\cuda_cudnn_test.dir\__\aten\src\ATen\test\cuda_cudnn_test.cpp.obj: CXX_COMPILER__cuda_cudnn_test_RelWithDebInfo C$:\work\Repos\pytorch\aten\src\ATen\test\cuda_cudnn_test.cpp || cmake_object_order_depends_target_cuda_cudnn_test
  DEFINES = ....
  FLAGS = ....
  INCLUDES = -IC:\work\Repos\pytorch\build\aten\src -IC:\work\Repos\pytorch\aten\src ........... -external:IC:\work\Repos\pytorch\third_party\ittapi\include -external:IC:\work\Repos\pytorch\cmake\..\third_party\eigen -external:I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\include" -external:IC:\work\Repos\pytorch\torch\include -external:IC:\work\Repos\pytorch\third_party\ideep\include -external:IC:\work\Repos\pytorch\third_party\googletest\googletest\include -external:IC:\work\Repos\pytorch\third_party\googletest\googletest **-external:I"C:\Program Files\NVIDIA cuDNN\include"** -external:IC:\work\Repos\pytorch\cmake\..\third_party\cudnn_frontend\include -external:W0

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108040
Approved by: https://github.com/ezyang
2023-09-08 14:41:00 +00:00
366baf690b Back out "[Dynamo x FSDP] Add support for params, buffers, submodules on FSDPManagedNNModuleVariable (#107923)" (#108823)
Summary:
Original commit changeset: 33650f7cb0fb

Original Phabricator Diff: D48833682

Test Plan: See T162942232 for how we figured out that this diff caused significant numeric difference.

Reviewed By: voznesenskym

Differential Revision: D49082219

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108823
Approved by: https://github.com/xw285cornell
2023-09-08 14:39:43 +00:00
39180a8414 Comment about prune_dead_locals in dynamo (#107787)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107787
Approved by: https://github.com/mlazos
2023-09-08 14:37:28 +00:00
51c2b587c9 Back out "[PyPer][BE] Fix test_scripted_module in StatCollector" (#108588)
Differential Revision: D48908507

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108588
Approved by: https://github.com/jerryzh168
2023-09-08 14:33:58 +00:00
ddbaad6d74 updated pad_sequence type hint (#108765)
Fixes #89623

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108765
Approved by: https://github.com/malfet, https://github.com/zou3519, https://github.com/ezyang
2023-09-08 13:06:03 +00:00
09f7cb0eaf fix typo of mkldnn linear dynamic shape path (#108330)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108330
Approved by: https://github.com/jgong5, https://github.com/jansel
ghstack dependencies: #108220
2023-09-08 08:47:57 +00:00
a9c663c269 Revert "Flash Attention v2 (#105602)" (#108827)
This reverts commit add45aea1cc8048fd0b43445b28fec7d93281f00.

There are some conflicts on some benchmark csv file https://github.com/pytorch/pytorch/pull/105602#issuecomment-1710988951 so I need to revert this manually.

The diff has been reverted internally.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108827
Approved by: https://github.com/kit1980
2023-09-08 07:43:04 +00:00
e40d6ae0a7 Improve torch.cuda.amp type hints (#108630)
Fixes #108629

1. Add the following to their modules' `__all__` so that pyright considers them to be publicly exported:
* [`torch.autocast`](https://pytorch.org/docs/stable/amp.html#torch.autocast)
* [`torch.cuda.amp.GradScaler`](https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.GradScaler)
* [`torch.cuda.amp.autocast`](https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.autocast)
* [`torch.cuda.amp.custom_fwd`](https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.custom_fwd)
* [`torch.cuda.amp.custom_bwd`](https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.custom_bwd)
2. Add `overload`s for `torch.cuda.amp.GradScaler.scale` to differentiate when a `torch.Tensor` is returned vs. an `Iterable[torch.Tensor]` is returned based on the type of the `outputs` parameter (sketched below).
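
A small sketch of the two overloads; `enabled=False` lets this run without CUDA:

```python
import torch

scaler = torch.cuda.amp.GradScaler(enabled=False)
loss = torch.tensor(2.0)
scaled: torch.Tensor = scaler.scale(loss)    # Tensor in -> Tensor out
scaled_list = scaler.scale([loss, loss])     # Iterable in -> Iterable out
```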

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108630
Approved by: https://github.com/ezyang
2023-09-08 06:06:25 +00:00
6c7260407b Back out "Horizontally fuse input concatenation (#108115)" (#108793)
Summary:
Original commit changeset: f15956d96311

Original Phabricator Diff: D48996091

Test Plan: Reverting to Unbreak test

Differential Revision: D49065517

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108793
Approved by: https://github.com/Chillee
2023-09-08 05:14:57 +00:00
428f5f9e7e Revert "[inductor] Switch to use the runtime interface for AOTInductor testing (#108663)"
This reverts commit 366ce589d0b6bdde8f9ca2087f224b6925841a05.

Reverted https://github.com/pytorch/pytorch/pull/108663 on behalf of https://github.com/Chillee due to Sorry :'( Need to revert to resolve merge conflict for another revert ([comment](https://github.com/pytorch/pytorch/pull/108663#issuecomment-1711076411))
2023-09-08 05:01:27 +00:00
4965fffeda [dynamo] Move global state guards to C++ (#108624)
This combines a bunch of python global state guards into a single C++ guard and switches to checking them 100% of the time.  It also adds a few new guards for things that change inductor's behavior.   Even though we are checking more things, I expect this to be much faster.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108624
Approved by: https://github.com/anijain2305
2023-09-08 04:07:08 +00:00
258bc2d845 [vision hash update] update the pinned vision hash (#108818)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108818
Approved by: https://github.com/pytorchbot
2023-09-08 04:04:06 +00:00
30a33b76b9 [AOTInductor] Include constants in AOTInductor .so file. (#108473)
Summary:
Include constants in AOTInductor .so file.
Notable differences:
1) Serialize with ctypes instead of torch.storage's native serialization.
2) Use the underlying for_blob instead of from_blob to construct the Tensor.

Test Plan:
Unit tests:
```
test/inductor/test_aot_inductor.py
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108473
Approved by: https://github.com/angelayi
2023-09-08 03:49:53 +00:00
72f24d0001 Revert "[dynamo][finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#108528)"
This reverts commit 34bb74c4cf963f1939b4988b7e76b2cea5e2a914.

Reverted https://github.com/pytorch/pytorch/pull/108528 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it has some nasty merge conflicts after the revert of D48910794. I need to revert this so the conflict could be resolved. Please help rebase this tomorrow and reland the change ([comment](https://github.com/pytorch/pytorch/pull/108528#issuecomment-1711034781))
2023-09-08 03:49:41 +00:00
e45b290127 Revert "Revert "Flash Attention v2 (#105602)" (#108827)"
This reverts commit 24e9bbe22af296048f8242c6112d13cff726c588.

Reverted https://github.com/pytorch/pytorch/pull/108827 on behalf of https://github.com/huydhn due to I need to land this revert properly as there are new failures showing up on trunk ([comment](https://github.com/pytorch/pytorch/pull/108827#issuecomment-1711020924))
2023-09-08 03:25:45 +00:00
24e9bbe22a Revert "Flash Attention v2 (#105602)" (#108827)
This reverts commit add45aea1cc8048fd0b43445b28fec7d93281f00.

There are some conflicts on some benchmark csv file https://github.com/pytorch/pytorch/pull/105602#issuecomment-1710988951 so I need to revert this manually.

The diff has been reverted internally.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108827
Approved by: https://github.com/kit1980
2023-09-08 02:54:20 +00:00
8391e3fba4 fixed nn.Module.to type hint (#108767)
Fixes #108675

- [x] adds `str` as option for `device`
- [x] use `typing_extensions.Self` instead of `T`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108767
Approved by: https://github.com/ezyang
2023-09-08 02:40:53 +00:00
f90444cf0b [Dynamo][Test]Add a testcase for module with training state (#108750)
Add a test case for the problem mentioned in https://github.com/pytorch/pytorch/issues/105653. This issue has been addressed by https://github.com/pytorch/pytorch/pull/108528 .

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108750
Approved by: https://github.com/anijain2305
2023-09-08 02:39:42 +00:00
dec2b267d4 [dynamo] Add "Torch-Compiled Region" profiler event (#108462)
**Motivation**: We already have a `CompiledFunction` event that comes from the autograd.Function added by aot_autograd. However, this doesn't appear during inference, or if none of the inputs to a graph require grad. It also doesn't appear if your backend doesn't use aot_autograd.

This adds a profiler event that will always appear.

<img width="615" alt="Screenshot 2023-09-01 at 4 46 28 PM" src="https://github.com/pytorch/pytorch/assets/5067123/fed90ca9-a8e7-458c-80eb-b4160de55218">

Perf - increase in latency (with profiler turned off) was within noise when I measured a simple cpu-only torch-compiled function that returned `x.view_as(x)`.
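
A minimal sketch of how the event can be observed; the model function is illustrative:

```python
import torch

@torch.compile
def f(x):
    return x.view_as(x)

x = torch.randn(4)
f(x)  # compile on the first call
with torch.profiler.profile() as prof:
    f(x)
# With this PR, a "Torch-Compiled Region" row appears in the trace.
print(prof.key_averages().table(sort_by="cpu_time_total"))
```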

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108462
Approved by: https://github.com/anijain2305
2023-09-08 02:10:17 +00:00
38fcf77a1b Revert "[dynamo] Add BACKEND_MATCH guard to detect and recompile when backend changes (#107337)"
This reverts commit 1a64ec7dd48408d6839a5c2cceb55b0c4be2243b.

Reverted https://github.com/pytorch/pytorch/pull/107337 on behalf of https://github.com/huydhn due to Sorry for reverting your change but inductor perf smoke test starts to regress after this ([comment](https://github.com/pytorch/pytorch/pull/107337#issuecomment-1710974588))
2023-09-08 02:03:48 +00:00
cyy
e3280a7c88 fix returning in void function (#108774)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108774
Approved by: https://github.com/Skylion007
2023-09-08 01:51:14 +00:00
a6dab86259 [C10d] Fix TCPSTore::wait to be robust to interruptions. (#108425)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108425
Approved by: https://github.com/daulet-askarov, https://github.com/fegin
2023-09-08 00:12:20 +00:00
fc2b980000 [Lint] Auto format graph_module.py (#108594)
Summary: Auto format the `graph_module.py` file

Test Plan: lint

Differential Revision: D48983066

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108594
Approved by: https://github.com/jiayisuse
2023-09-08 00:04:21 +00:00
c458fa0d35 Decompose/add reference for view_as_complex (#108005)
Aten source: d4a99631dd/aten/src/ATen/native/ComplexHelper.h (L78)

Documentation reference:
https://pytorch.org/docs/stable/generated/torch.view_as_complex.html

Note: this adds a new primitive `view_of_dtype`, which is trivially implemented, as its meta function is already implemented elsewhere.

Finally, this is not registered as a decomposition (yet), because TorchInductor does not yet support complex types. It should be added once we do.

Closes https://github.com/pytorch/pytorch/issues/108020 as well.
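
For reference, the documented semantics being decomposed:

```python
import torch

x = torch.randn(4, 2)             # last dim of size 2 -> (real, imag) pairs
z = torch.view_as_complex(x)      # shape (4,), dtype complex64
assert torch.equal(torch.view_as_real(z), x)
```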

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108005
Approved by: https://github.com/peterbell10, https://github.com/ezyang
2023-09-07 23:49:20 +00:00
366ce589d0 [inductor] Switch to use the runtime interface for AOTInductor testing (#108663)
Summary: Switch AOTInductor unit tests and integration tests to invoke the same runtime interface. This is only an effort to unify the usage of the runtime. The interface scrutiny will come in later PRs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108663
Approved by: https://github.com/ezyang
ghstack dependencies: #108653
2023-09-07 23:38:11 +00:00
c55cb29bb2 enforce equalities (#108429)
Sometimes one might want to impose equalities that are not required by guards, e.g. say that you only want square images when rectangular images would suffice.

Curiously we never checked that the concrete values passed in example shapes actually satisfy such equality constraints. So, e.g., you could multiply two tensors of shapes MxK and KxN, specify that M and N must be equal, and then pass examples where they are not equal.

Relatedly, the symbolic shape dimensions for inputs in the exported graph were not forced to be equal.

However, runtime assertions still fire because they take into account all equality constraints. This would result in the strange situation where export would succeed but the exported program with the same example inputs would fail.

This PR fixes these issues.
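
A hypothetical sketch of the failure mode, using the dynamic_dim constraint API as it existed around this PR:

```python
import torch
from torch._export import dynamic_dim, export

def mm(a, b):
    return a @ b

a, b = torch.randn(4, 3), torch.randn(3, 5)
constraints = [dynamic_dim(a, 0) == dynamic_dim(b, 1)]  # demand M == N
# With this PR, export rejects these examples up front (M=4 != N=5)
# instead of succeeding and only failing via a runtime assertion.
exported = export(mm, (a, b), constraints=constraints)
```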

Differential Revision: [D48910918](https://our.internmc.facebook.com/intern/diff/D48910918/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108429
Approved by: https://github.com/zhxchen17
2023-09-07 23:21:35 +00:00
247c603da9 Run mm decomposition tests for CPU and GPU (#108620)
Summary: Run mm decomposition tests for CPU and GPU

One nit - this will suppress CPU tests on hosts that have CUDA (i.e., TEST_CUDA is True) but don't have Triton, because we don't have access to whether the test is actually for CPU or CUDA (which would require reading the device argument).
(This is a general limitation of torch.compile tests because on CUDA they require Triton in the standard config.)

Test Plan: sandcastle, github

Differential Revision: D48998215

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108620
Approved by: https://github.com/bertmaher
2023-09-07 23:00:26 +00:00
1a64ec7dd4 [dynamo] Add BACKEND_MATCH guard to detect and recompile when backend changes (#107337)
**Motivation:**
We try to make torch.cond use torch.compile automatically so that we could error out when there is side-effects in the branches and correctly handle the closures.

Before this PR, we have a warning if we don't turn on a config raise_on_backend_change (turning it on gives us an error) for the following code:
```python
def foo()

# Inside torch.cond, we'd like to do something like
torch.compile(foo, backend="eager", fullgraph=True)(...)
...
# Users may then call torch.compile somewhere else.
# Dynamo will use the cached code of foo for "eager" backend
# but we expect dynamo to recompile with "inductor" backend.
torch.compile(foo, backend="inductor")(...)
```

This PR adds a BACKEND_MATCH guard. Effectively, it implements a per-backend cache. In the above example, the cached code for "eager" won't work for "inductor" due to guard check failures and the second torch.compile will do a re-compilation. In the future, it might be useful to have something like a configuration guard that guards against dynamo configuration changes across different compiles (e.g. compile a function with fullgraph=False then compile it again with fullgraph=True).

**Implementation:**
1. We add a guarded_backend_cache and check the most_recent_backend against the backend associated with cached code. We also remove the raise_on_backend_change flag.

2. The newly added context manager and guard add more lines to the debug log, so we raise the upper limit from 50 to 55.

**Test Plan:**
Removed original tests that raise on different backend and add a new test to test whether the BACKEND_MATCH guard can guard against backend change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107337
Approved by: https://github.com/jansel
2023-09-07 22:45:54 +00:00
b26af5d5ac [c10d] Add TCPSTore libuv backend support to c10d rendezvous. (#108284)
This enables libuv for the env:// and tcp:// rendezvous URLs.

Under env, either set the environment variable USE_LIBUV=1
or pass the URL parameter use_libuv=1.

Under tcp, pass the URL parameter use_libuv=1.
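
A minimal single-process sketch, assuming a gloo backend; the use_libuv URL parameter is the one this PR adds for the tcp:// rendezvous:

```python
import torch.distributed as dist

dist.init_process_group(
    backend="gloo",
    init_method="tcp://127.0.0.1:29500?use_libuv=1",
    rank=0,
    world_size=1,
)
dist.destroy_process_group()
```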
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108284
Approved by: https://github.com/H-Huang, https://github.com/XilunWu
2023-09-07 21:39:58 +00:00
96d269eab1 [Dev Container][CUDA]Fix linker path (#108766)
Building with CUDA in dev container leads to error: `cannot find -lcudart_static`. This is because the libraries are under a custom CUDA_HOME, and `ld` cannot find it.

Updating the `LDFLAGS` environment variable works.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108766
Approved by: https://github.com/drisspg
2023-09-07 21:32:39 +00:00
09a17c512d Add better error messaging to scaled_mm (#108454)
Fixes #108411

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108454
Approved by: https://github.com/vkuzo
2023-09-07 21:26:47 +00:00
1f20531939 fall back to eager on NotImplementedError (#107863)
Follow-up to https://github.com/pytorch/pytorch/pull/107710:

Help  dynamo fall back to eager when compiling unimplemented numpy constructs:

- arrays of strings
- (arg){min, max} for complex types
- various arguments typed as NotImplemented (`np.ones(4, order="F")` etc)
- numpy functions which torch._numpy does not implement

To test, run (we do not implement arrays of strings)

```
import torch
import numpy as np

@torch.compile(fullgraph=False)
def fn():
    return np.asarray(["L", "U"])
```

and observe it compiles with fullgraph=False and fails with fullgraph=True

Fixes https://github.com/pytorch/pytorch/issues/107970

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107863
Approved by: https://github.com/ezyang, https://github.com/lezcano
2023-09-07 21:22:20 +00:00
8ba23e48fa Revert "[inductor] Add ir.Scan and lower aten.cumsum on CUDA (#106581)"
This reverts commit 53a27021c59f1df640cba88bb48f67ee977e07f8.

Reverted https://github.com/pytorch/pytorch/pull/106581 on behalf of https://github.com/atalman due to Sorry for reverting your change, but it broke rocm CI ([comment](https://github.com/pytorch/pytorch/pull/106581#issuecomment-1710776610))
2023-09-07 21:13:42 +00:00
774c822979 Fix expected test failures for predispatch export nested cond and out_dtype (#108715)
Before this PR, we used get_fake_value to get the fake_sub_args and then called op(*fake_sub_args) to get the example value for out_dtype.

This causes a problem when the input proxy's op type is `get_attr`: get_fake_value for a `get_attr` node will actually look at the original param/buffer and **return a real tensor** instead of a fake tensor. This is OK for export, since export's fake_mode allows non_fake_inputs (see [here](https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/output_graph.py#L278)). But it causes a problem when nesting cond with out_dtype, where cond will use torch.compile(full_graph=True) to inspect out_dtype and find that the inputs to op are a mix of FakeTensors and real tensors.

This PR changes how we get the example values from proxies by directly looking at node.meta["example_value"]. This metadata is guaranteed to exist for all proxies during dynamo tracing, so it's safe to use (it's also used by get_fake_value to get fake tensors from args for general ops; see [here](https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/utils.py#L1318)).

Test Plan:
existing tests + remove expected failure for a test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108715
Approved by: https://github.com/zou3519
2023-09-07 18:13:00 +00:00
53a27021c5 [inductor] Add ir.Scan and lower aten.cumsum on CUDA (#106581)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106581
Approved by: https://github.com/lezcano
2023-09-07 17:40:45 +00:00
ab9fb03d6f Remove fixed skips (#108674)
These no longer fail with TEST_WITH_TORCHINDUCTOR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108674
Approved by: https://github.com/desertfire
2023-09-07 17:36:56 +00:00
77691e8bc3 Revert "[dynamo][activation checkpointing] Trace through ActivationWrapper (#108599)"
This reverts commit 9efe0f7bf2b397f5ba7ea778fe155e415f54ea67.

Reverted https://github.com/pytorch/pytorch/pull/108599 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but test_ddp_activation_checkpointing is failing distributed ROCm test in trunk ([comment](https://github.com/pytorch/pytorch/pull/108599#issuecomment-1710479387))
2023-09-07 16:47:40 +00:00
c75aec90d3 [dynamo] Record nn_module_stack also for unspecialized nn modules. (#108281)
Summary: Currently the node metadata "nn_module_stack" is only used by export. For some exported models, we still want to retain nn_module_stack for unspecialized modules for various purposes. This diff adds a path to also record nn_module_stack when an unspecialized module has a source available.

Test Plan: test_export_nn_module_stack_patched_module

Differential Revision: D48841193

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108281
Approved by: https://github.com/yanboliang, https://github.com/tugsbayasgalan
2023-09-07 15:38:46 +00:00
121cfb60c0 fix the issue described by #108380 (#108759)
Fixes #108380

As the title states.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108759
Approved by: https://github.com/awgu
2023-09-07 13:53:36 +00:00
b928e08f3d Initial vmap + NT support with unbind fallback (#106786)
PoC demonstrating vmap + NT based on the [design doc](https://docs.google.com/document/d/1dVVk6TOqz93PLTIneU2T3xaxCs9qZ0MaJyCvOAp_bC0). This PR:
* Allows `BatchedTensorImpl`s to contain NTs
* Introduces a `BatchedNestedTensor` dispatch key for NT-specific batching rules
* Provides a batching rule fallback that unbinds the NTs -> performs computation on constituent -> rebinds results into NT

Restrictions:
* Only supports one level of vmap
* Only supports vmapping over dim=0 for NTs
    * For operations with mixed NT / dense inputs, support is also limited to dim=0 for the dense inputs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106786
Approved by: https://github.com/zou3519
2023-09-07 13:53:20 +00:00
cyy
e4f3e5434f [Reland] Elimates c10::guts::to_string (#108748)
Reland of PR #108480, after relanding another blocking PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108748
Approved by: https://github.com/huydhn
2023-09-07 13:35:17 +00:00
c887309437 Re-land: Break graph on manual_seed. (#108647)
Trying to re-land #107594.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108647
Approved by: https://github.com/eellison
2023-09-07 12:52:38 +00:00
9f37aec964 Add torch._check_is_size (#108685)
Check the comments for what it does. The key distinction is that if
you feed it an unbacked SymInt, we will also apply the >= 2 assumption
at compile time.
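
A minimal sketch, assuming scalar-output capture is enabled so that `.item()` produces an unbacked SymInt:

```python
import torch
import torch._dynamo.config

torch._dynamo.config.capture_scalar_outputs = True

@torch.compile(fullgraph=True)
def f(lengths):
    n = lengths.sum().item()    # unbacked SymInt
    torch._check_is_size(n)     # asserts n is a size; >= 2 assumed at compile time
    return torch.zeros(n)

print(f(torch.tensor([2, 3])).shape)
```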

This will get exercised when I reland
https://github.com/pytorch/pytorch/pull/107788

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108685
Approved by: https://github.com/albanD, https://github.com/Skylion007
2023-09-07 12:48:39 +00:00
e1aba2c8c3 [CI] Update the pinned timm version (#108076)
Summary: Unify the pinned timm version and install timm at Docker build time

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108076
Approved by: https://github.com/ezyang
2023-09-07 11:38:13 +00:00
b193f295b6 Add capturable ASGD impl (#107857)
Add capturable ASGD impl + test
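
A short sketch, assuming a CUDA device is available; `capturable` is the kwarg this PR adds to ASGD:

```python
import torch

model = torch.nn.Linear(4, 4, device="cuda")
# capturable=True keeps optimizer state on-device so the step can be
# captured in a CUDA graph.
opt = torch.optim.ASGD(model.parameters(), lr=1e-2, capturable=True)
model(torch.randn(2, 4, device="cuda")).sum().backward()
opt.step()
```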

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107857
Approved by: https://github.com/janeyx99
2023-09-07 06:30:30 +00:00
cyy
4fa283e0a4 [Reland] Simplify c10::string_view implementation (#108622)
PR #108479 was reverted because
```
In file included from xplat/caffe2/c10/util/Exception.h:5:
In file included from xplat/caffe2/c10/util/StringUtil.h:6:
xplat/caffe2/c10/util/string_view.h:576:31: error: out-of-line definition of constexpr static data member is redundant in C++17 and is deprecated [-Werror,-Wdeprecated]
    basic_string_view<CharT>::npos;
```
Now this is fixed and Wdeprecated generated no warnings on my host.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108622
Approved by: https://github.com/Skylion007
2023-09-07 06:24:22 +00:00
fae9547cb7 [inductor] Refactor wrapper.py (#108653)
Summary: Cherry-pick refactoring from https://github.com/pytorch/pytorch/pull/105331 to make the code review easier.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108653
Approved by: https://github.com/ezyang, https://github.com/khabinov
2023-09-07 05:27:59 +00:00
6a448816f5 [fx][split] Copy node metadata for placeholders (#107981)
- Follow-up to #107248 which copies metadata for placeholder nodes in the top-level FX graph
- Currently, top-level placeholders do not have their metadata copied over, causing loss of `TensorMetadata` in some `torch.compile` backends

Fixes https://github.com/pytorch/TensorRT/issues/2258
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107981
Approved by: https://github.com/angelayi
2023-09-07 04:44:17 +00:00
56b848157c Reland: Add PyObject preservation for UntypedStorage (#103907)
This relands #97470 after #102553 reverted it. This PR attempts to fix the internal failure by avoiding an unnecessary intermediate storage buffer allocation in `c10::newStorageImplFromRefcountedDataPtr`.

Part of #91395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103907
Approved by: https://github.com/ezyang
2023-09-07 04:24:11 +00:00
35974234c4 [inductor] simplify time_and_log fallback (#108489)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108489
Approved by: https://github.com/eellison
ghstack dependencies: #108468
2023-09-07 04:23:36 +00:00
96dd173fa0 [inductor] simplify cudagraph_fail_reasons printing (#108468)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108468
Approved by: https://github.com/eellison
2023-09-07 04:23:36 +00:00
7bc25e38c0 [HSDP] Raise error when HSDP device_mesh has a parent_mesh (#108603)
Since we don't support HSDP + TP yet, raise an error during HSDP initialization if the device_mesh passed in has a parent mesh.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108603
Approved by: https://github.com/awgu
2023-09-07 04:17:10 +00:00
275e71c562 [inductor][easy] Enable Mypy Checking in torch/_inductor/kernel/ (#108678)
Summary: Looks like these already pass (and torch/_inductor/kernel/mm_plus_mm_new.py does not exist)

Test Plan: `lintrunner torch/_inductor/kernel/mm.py torch/_inductor/kernel/bmm.py torch/_inductor/kernel/__init__.py torch/_inductor/kernel/mm_plus_mm.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108678
Approved by: https://github.com/eellison
2023-09-07 03:29:22 +00:00
3f88e3105f Reland: Remove remaining global set_default_dtype calls from tests (#108088)
Fixes #68972

Relands #107246

To avoid causing Meta-internal CI failures, this PR avoids always asserting that the default dtype is float in the `TestCase.setUp/tearDown` methods. Instead, the assert is only done if `TestCase._default_dtype_check_enabled == True`. `_default_dtype_check_enabled` is set to True in the `if __name__ == "__main__":` blocks of all the relevant test files that have required changes for this issue

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108088
Approved by: https://github.com/ezyang
2023-09-07 03:04:34 +00:00
54e73271c7 When patching dynamic shapes test class, don't run the original tests (#108681)
redo of https://github.com/pytorch/pytorch/pull/103523

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108681
Approved by: https://github.com/ezyang
2023-09-07 02:13:59 +00:00
027e3b7910 [Forward-fix] check if source is None when using tensor out variants (#108700)
Summary: As title

Test Plan: Sandcastle

Reviewed By: JacobSzwejbka

Differential Revision: D49029357

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108700
Approved by: https://github.com/angelayi
2023-09-07 01:51:02 +00:00
34bb74c4cf [dynamo][finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#108528)
**This PR is a 99% copy paste of Sam Gross** (@colesbury) work at https://github.com/pytorch/pytorch/pull/100642. Copied from there

--------
The NN_MODULE guard now subsumes guards on Module attributes. The check_fn will fail if the module attributes are changed (such as Module.training), parameters, submodules, and buffers are added or removed, and if fields are changed on the type itself.

This gives up specificity in the guard check -- if any field is changed the check_fn fails -- for faster overall checks.
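
A minimal sketch of the behavior the guard covers; the module and backend are illustrative:

```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x * 2 if self.training else x + 1

m = M()
cm = torch.compile(m, backend="eager")
cm(torch.randn(2))   # compiles; NN_MODULE guard captures m's state
m.eval()             # mutating .training makes the check_fn fail
cm(torch.randn(2))   # recompiles, now taking the eval branch
```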

-----

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108528
Approved by: https://github.com/ezyang
2023-09-07 01:45:47 +00:00
d830e4658a [export] Fix unlifting pass param name handling. (#108659)
Summary: Fixing an internal test.

Differential Revision: D49014757

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108659
Approved by: https://github.com/huydhn
2023-09-07 01:39:07 +00:00
d301fb4022 Fix broken doc tests after #108482 (#108725)
Tiny fix so I don't wanna revert the PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108725
Approved by: https://github.com/kit1980
2023-09-07 01:24:53 +00:00
e3407238f6 [export] Lift constant tensors as buffers (#108592)
When we retrace the graph containing constant tensors, they get lifted as buffer inputs.
AotInductor also wants to lift all the constants as inputs.
Treating constants as a separate category adds complexity: we would then have to keep track of three kinds of inputs (params, buffers, constants).

Cons: people might care about which buffers are or are not constants.

If people want to know specifically which buffers are constants, we can add an additional field in the graph signature to mark this.

Differential Revision: [D49017872](https://our.internmc.facebook.com/intern/diff/D49017872)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108592
Approved by: https://github.com/zhxchen17
2023-09-07 01:14:30 +00:00
43527d41a2 Revert "Remove fixed skips (#108674)"
This reverts commit 518cfda2dd0e940603c74717b4cb33493a9ec908.

Reverted https://github.com/pytorch/pytorch/pull/108674 on behalf of https://github.com/huydhn due to Sorry for reverting this, but one test is failing on inductor 518cfda2dd, and it seems easier to revert this than disabling the test ([comment](https://github.com/pytorch/pytorch/pull/108674#issuecomment-1709310192))
2023-09-07 00:56:46 +00:00
27fe45eaf6 [inductor][easy] Enable Mypy Checking for torch/_inductor/decomposition.py (#108682)
Summary: Looks like one simple type mismatch between `get_decompositions()` and `remove_decompositions()`

Test Plan: `lintrunner torch/_inductor/decomposition.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108682
Approved by: https://github.com/eellison
2023-09-07 00:48:55 +00:00
9efe0f7bf2 [dynamo][activation checkpointing] Trace through ActivationWrapper (#108599)
Fixes https://github.com/pytorch/pytorch/issues/108269

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108599
Approved by: https://github.com/rohan-varma
2023-09-07 00:32:18 +00:00
c1877e99c5 [Quant] Move to BFS instead of DFS to check for connectedness (#108572)
Summary:
Using DFS to check whether two nodes are connected is very slow.
Using BFS instead makes it much faster.

Test Plan:
https://gist.github.com/leslie-fang-intel/9cd828623f567a3afbf41564d3546398

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D48971710](https://our.internmc.facebook.com/intern/diff/D48971710)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108572
Approved by: https://github.com/jerryzh168, https://github.com/osalpekar
2023-09-07 00:26:28 +00:00
2a40fe2dbf [experimental] use EXCEPT_FOR env to suppress CPU tests from GPU RE (#108672)
Summary:
[experimental] use EXCEPT_FOR env to suppress CPU tests from GPU RE -- alternative implementation to D48997976 using preexisting PYTORCH_TESTING_DEVICE_EXCEPT_FOR facility and building remaining logic (for assert-positive listers like test_transformers)  on top of that.

Goal: save ~100 GPU (10% of capacity), enables us to fund more aggressive PyPer unit testing on GPU RE

Test Plan: sandcastle, github

Differential Revision: D48998582

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108672
Approved by: https://github.com/bertmaher
2023-09-06 23:33:18 +00:00
6a304ed1f2 Revert "Skip ROCm jobs on PR (for now) (#108083)"
This reverts commit 9fdb5ef26b5c560d80d002ca9fea9632a523b94b.

Reverted https://github.com/pytorch/pytorch/pull/108083 on behalf of https://github.com/huydhn due to ROCm queue looks better now, reverting this to see if the queue looks ok before picking up https://github.com/pytorch/test-infra/issues/4516 ([comment](https://github.com/pytorch/pytorch/pull/108083#issuecomment-1709222748))
2023-09-06 22:47:25 +00:00
e73ec92ad2 Minor fixs to make torchbench runable on torch/xla (#107919)
`import torch_xla.core.xla_model as xm` no longer triggers the xla runtime to init, hence we explicitly create the device here. This is a workaround for https://github.com/pytorch/xla/issues/4174.

The `is_correct` reference has been deleted; I think it was dead code.

After this patch, I am able to run

```
python benchmarks/dynamo/torchbench.py --randomize-input --performance --trace-on-xla --training --backend=openxla --only resnet50
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107919
Approved by: https://github.com/shunting314, https://github.com/wconstab
2023-09-06 22:35:53 +00:00
518cfda2dd Remove fixed skips (#108674)
These no longer fail with TEST_WITH_TORCHINDUCTOR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108674
Approved by: https://github.com/desertfire
2023-09-06 22:33:43 +00:00
e6042db0f1 Try to use linux.arm64.2xlarge runners (#107672)
Try to use linux.arm64.2xlarge runners.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107672
Approved by: https://github.com/atalman
2023-09-06 22:06:57 +00:00
cd6a332bc5 Use linux.24xlarge for conda linux nightly builds (#108695)
Fixes https://github.com/pytorch/pytorch/issues/108607
CI test: https://github.com/pytorch/pytorch/pull/108666
Lowering memory limit did not worked: https://github.com/pytorch/pytorch/pull/108669
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108695
Approved by: https://github.com/drisspg, https://github.com/seemethere, https://github.com/huydhn
2023-09-06 21:39:35 +00:00
d856f3b47d [export] Change _generate_new_graph_signature (#108571)
Summary:
Previously `_generate_new_graph_signature` had the assumption that all transformations were not in place. However, this is an incorrect assumption leading to mysterious failures when running passes doing in-place modifications.

This function is technically only needed in the case where the user output node or user input node name is changed. For example, if the user output node was "add" but a pass changes all the "add"s to "mul"s, then the output node will now be named "mul", which we have to update.

For cases where users change the number of user inputs/outputs, number of parameters/buffers, or the names of parameters/buffers it will require extra work on the user's side to update the graph signature, since there is no automatic way for us to detect where to put what.

Note: this doesn't actually change the names for the buffers_to_mutate part of the graph signature, but we're going to assume this is rare, because implementing auto-fixing for that is a little hard...

Test Plan: Running `buck test fbcode//mode/dev-nosan fbcode//executorch/backends/xnnpack/test:` on top of D48710125, https://www.internalfb.com/intern/testinfra/testrun/5066549776877081

Differential Revision: D48917505

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108571
Approved by: https://github.com/zhxchen17
2023-09-06 21:39:26 +00:00
089950b83a Fix inductor sub with symbolic integers. (#108518)
Fix: #108159

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108518
Approved by: https://github.com/peterbell10
2023-09-06 21:01:34 +00:00
3f74e57e34 add packaging to requirements.txt (#108554)
As stated in https://github.com/pytorch/pytorch/pull/107207#issuecomment-1700674065, packaging is not a built-in Python module and needs to be added to requirements.txt.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108554
Approved by: https://github.com/albanD
2023-09-06 20:29:46 +00:00
8a76f8e6fe Enable mypy checking in torch/_inductor/sizevars.py (#107862)
Fixes #105230

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107862
Approved by: https://github.com/eellison
2023-09-06 19:43:07 +00:00
32a16d4999 [quant][pt2e] Support int16 quantization (#108453)
Summary:
Previously we could only use native PyTorch int dtypes that have corresponding quantized dtypes (e.g. quint8, qint8). This
PR removes that assumption in observers/fake_quants so that users can use all PyTorch native dtypes (except int64, which we can add later if needed);
the main addition here is int16.
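
A minimal sketch, assuming MinMaxObserver accepts the newly allowed dtype:

```python
import torch
from torch.ao.quantization.observer import MinMaxObserver

obs = MinMaxObserver(dtype=torch.int16,
                     quant_min=-(2**15), quant_max=2**15 - 1)
obs(torch.randn(16))
scale, zero_point = obs.calculate_qparams()
```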

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108453
Approved by: https://github.com/kimishpatel
2023-09-06 19:31:20 +00:00
bee7e78130 [PT2 Inference] Prototype of Inference Runtime (#108482)
Summary:
This diff demonstrates a simplified E2E workflow for PT2 Inference stack:
1. Model author with `torch.export()`
2. Model processing with `aot_inductor.compile()`
3. Model served with a new Inference Runtime API, named `ModelRunner`

`torch.export()` and `aot_inductor.compile()` produces a zip file using `PyTorchStreamWriter`.
Runtime reads the zip file with `PyTorchStreamReader`.
The zip file contains
 {F1080328179}
More discussion on packaging can be found in https://docs.google.com/document/d/1C-4DP5yu7ZhX1aB1p9JcVZ5TultDKObM10AqEtmZ-nU/edit?usp=sharing

Runtime can now switch between two Execution modes:
1. Graph Interpreter mode, implemented based on Sigmoid's Executor
2. AOTInductor mode, implemented based on FBAOTInductorModel

Test Plan:
buck2 run  mode/dev-nosan mode/inplace -c fbcode.enable_gpu_sections=True //sigmoid/inference/test:e2e_test

Export and Lower with AOTInductor
buck2 run mode/dev-sand mode/inplace -c fbcode.enable_gpu_sections=True sigmoid/inference:export_package

Run with GraphInterpreter and AOTInducotr
buck2 run mode/dev-nosan //sigmoid/inference:main

Reviewed By: suo

Differential Revision: D47781098

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108482
Approved by: https://github.com/zhxchen17
2023-09-06 19:28:58 +00:00
5a4fe05a15 Revert "Force synced KJT to trace unbacked SymInt (#107788)" (#108684)
This reverts commit 3b92ef814de4571a125294f2aa95843d7d2e2aea.  So let's manually revert it instead.

(Not sure why the bot doesn't work on https://github.com/pytorch/pytorch/pull/107788)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108684
Approved by: https://github.com/ezyang
2023-09-06 19:15:45 +00:00
1aacbaed8b Revert "[export] Fix dict.get() to dict.setdefault() for param lookup. (#108587)"
This reverts commit c99a70c8dfc98dd5e6905990c41194c2ceb1318b.

Reverted https://github.com/pytorch/pytorch/pull/108587 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing one internal test.  Please take a look at the diff D48995555 for more details ([comment](https://github.com/pytorch/pytorch/pull/108587#issuecomment-1708933010))
2023-09-06 19:05:01 +00:00
27d5dcf589 Revert "Use global variables to register the return_types namedtuples (#107000)"
This reverts commit ae8eb7a3f9aee106affca3b27c1f4031bd216730.

Reverted https://github.com/pytorch/pytorch/pull/107000 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing internal build ([comment](https://github.com/pytorch/pytorch/pull/107000#issuecomment-1708862325))
2023-09-06 18:13:23 +00:00
e5e653a660 Revert "docs: Match open bracket with close bracket in unsqueeze (#95215)"
This reverts commit 9d04d376d81be2f01e5ea6b68943390346f2494c.

Reverted https://github.com/pytorch/pytorch/pull/95215 on behalf of https://github.com/kit1980 due to Incorrect assumptions ([comment](https://github.com/pytorch/pytorch/pull/95215#issuecomment-1708852420))
2023-09-06 18:04:10 +00:00
20812d69e5 Fix extension rebuilding on Linux (#108613)
On Linux, CUDA header dependencies are not correctly tracked. After you modify a CUDA header, affected CUDA files won't be rebuilt. This PR will fix this problem.

```console
$ ninja -t deps
rep_penalty.o: #deps 2, deps mtime 1693956351892493247 (VALID)
    /home/qc/Workspace/NotMe/exllama/exllama_ext/cpu_func/rep_penalty.cpp
    /home/qc/Workspace/NotMe/exllama/exllama_ext/cpu_func/rep_penalty.h

rms_norm.cuda.o: #deps 0, deps mtime 1693961188871054130 (VALID)

rope.cuda.o: #deps 0, deps mtime 1693961188954388632 (VALID)

cuda_buffers.cuda.o: #deps 0, deps mtime 1693961188797719768 (VALID)

...
```

Historically, this line of code has been changed twice. It was first implemented in #49344 without `if IS_WINDOWS`, just like now. Then in #56015 someone added `if IS_WINDOWS` for an unknown reason. That PR has no description, so I don't know what bug was encountered. I don't think there's any bug with these flags on Linux, at least today; CMake generates exactly the same flags for CUDA.

```ninja
#############################################
# Rule for compiling CUDA files.

rule CUDA_COMPILER__cpp_cuda_unscanned_Debug
  depfile = $DEP_FILE
  deps = gcc
  command = ${LAUNCHER}${CODE_CHECK}/opt/cuda/bin/nvcc -forward-unknown-to-host-compiler $DEFINES $INCLUDES $FLAGS -MD -MT $out -MF $DEP_FILE -x cu -c $in -o $out
  description = Building CUDA object $out
```

where `-MD` is short for `--generate-dependencies-with-compile` and `-MF` is short for `--dependency-output`. This can be verified with `nvcc --help`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108613
Approved by: https://github.com/ezyang
2023-09-06 17:58:21 +00:00
4e042cfed5 Improve triton bsr_dense_mm performance on column-major ordered inputs with float32 dtype (#108512)
As in the title.

The bsr_dense_mm performance on inputs using column-major storage order is relevant for the `linear(x, W)` operation, which for BSR weights is defined as `bsr_dense_mm(W, x.transpose(-2, -1)).transpose(-2, -1)`, so the second argument to `bsr_dense_mm` is a strided tensor using column-major storage order when `x` is C-contiguous.

For large inputs (size > 1000) and moderate sparsity in the BSR input, the speed up can be more than 3 times, as illustrated in the following figure (raw data: [bench_bsr_dense_mm_1_results.txt](https://github.com/pytorch/pytorch/files/12512245/bench_bsr_dense_mm_1_results.txt)):

![bench_bsr_dense_mm_1](https://github.com/pytorch/pytorch/assets/402156/c6372008-dfae-4d26-b119-2c3c944a74ae)

For small inputs (size=512), there exists a slight degradation of performance.

For row-major ordered inputs, there is no change in performance (see raw data above).

For inputs with float16 dtype, there is no considerable change in performance (see blue marks in the figure).
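A rough sketch of the pattern described above, assuming a CUDA device and the private `torch.sparse._triton_ops` entry point (sizes are arbitrary, and the API may change since it is internal):

```python
import torch
from torch.sparse._triton_ops import bsr_dense_mm  # private API, subject to change

W = torch.randn(1024, 1024, device="cuda").to_sparse_bsr(blocksize=(32, 32))
x = torch.randn(512, 1024, device="cuda")

# x.transpose(-2, -1) is column-major ordered when x is C-contiguous,
# which is exactly the case this PR speeds up.
y = bsr_dense_mm(W, x.transpose(-2, -1)).transpose(-2, -1)
```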

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108512
Approved by: https://github.com/cpuhrsch
2023-09-06 17:30:06 +00:00
1dabfb68e7 Add TORCH_API to expose RPC module functions for RPC module device extension (#108553)
At present, following the existing TensorPipe RPC backend, we implement our own RPC communication backend in our extension package. During development we found that these functions are not exposed, and using them directly causes undefined-symbol problems in our extension package.

This PR adds the TORCH_API macro to the functions required to implement a custom TensorPipe agent in the RPC module, exposing them to developers. At the same time, we think this risk is very controllable and hope it can be merged into version 2.1.

cc
@albanD, @kumpera
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108553
Approved by: https://github.com/kumpera, https://github.com/albanD
2023-09-06 17:24:46 +00:00
e471c12a01 Enable mypy checking in torch/_inductor/__init__.py (#108336)
Fixes #105230

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108336
Approved by: https://github.com/ezyang
2023-09-06 17:14:54 +00:00
738106c1f7 Torchbench model tolerance changes (#108598)
Move detectron2_fcos_r_50_fpn to amp. The minifier showed the following snippet as causing the divergence, where inductor has better numerics than eager:

```
import torch

def foo(x):
    return x > .2

inp = torch.tensor([.2002], device="cuda", dtype=torch.bfloat16)
print(foo(inp))

print(torch.compile(foo)(inp))
```

doctr_reco_predictor had very minimal divergence (.002 vs .001 required), bumping tolerance here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108598
Approved by: https://github.com/shunting314
2023-09-06 16:52:29 +00:00
aa89f0a1fd [Doc] Move Dynamo IPEX backend to training/inference category (#108643)
As title.
Since the Dynamo IPEX backend supports training, move it to the category above.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108643
Approved by: https://github.com/msaroufim
2023-09-06 15:57:12 +00:00
79bc4eeb2b Fix empty vector segfault during version parsing in quantized serialization (#108418)
Hi!

I've been fuzzing different pytorch modules with [sydr-fuzz](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch), and found a SEGV that occurs during data parsing for quantized conv deserialization. The crash occurs because of an empty `optional` vector.

Docker to reproduce found error: [Dockerfile](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch).

### PoC:
[crash-aaa72b1c1431ac556118e34099ba163052dc0f96.txt](https://github.com/pytorch/pytorch/files/12499249/crash-aaa72b1c1431ac556118e34099ba163052dc0f96.txt)

### ASAN report
```
==1003193==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000000cbd1b1 bp 0x7fffffff8490 sp 0x7fffffff7a30 T0)
==1003193==The signal is caused by a READ memory access.
==1003193==Hint: address points to the zero page.
    #0 0xcbd1b1 in c10::optional_base<at::Tensor>::optional_base(c10::optional_base<at::Tensor> const&) /pytorch/c10/util/Optional.h:222:17
    #1 0x2b32336 in c10::optional<at::Tensor>::optional(c10::optional<at::Tensor> const&) /pytorch/c10/util/Optional.h:631:3
    #2 0x2b32336 in std::tuple<long, std::vector<long, std::allocator<long> >, std::vector<c10::optional<at::Tensor>, std::allocator<c10::optional<at::Tensor> > > > parse_conv_serialized_state<2u>(c10::IValue) /pytorch/aten/src/ATen/native/quantized/cpu/conv_serialization.h:183:17
    #3 0x2b30276 in int register_conv_params<2>()::'lambda'(c10::IValue)::operator()(c10::IValue) const /pytorch/aten/src/ATen/native/quantized/cpu/fbgemm_utils.cpp:410:49
    #4 0x2b30014 in std::enable_if<!(std::is_member_pointer<std::decay<int register_conv_params<2>()::'lambda'(c10::IValue) const&>::type>::value), std::invoke_result<int register_conv_params<2>()::'lambda'(c10::IValue) const&, c10::IValue>::type>::type c10::guts::invoke<int register_conv_params<2>()::'lambda'(c10::IValue) const&, c10::IValue>(int register_conv_params<2>()::'lambda'(c10::IValue) const&, c10::IValue&&) /pytorch/c10/util/C++17.h:203:10
    #5 0x2b2f7e7 in torch::class_<ConvPackedParamsBase<2> >& torch::class_<ConvPackedParamsBase<2> >::def_pickle<int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&), int register_conv_params<2>()::'lambda'(c10::IValue)>(int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&)&&, int register_conv_params<2>()::'lambda'(c10::IValue)&&)::'lambda'(c10::tagged_capsule<ConvPackedParamsBase<2> >, c10::IValue&&)::operator()(c10::tagged_capsule<ConvPackedParamsBase<2> >, c10::IValue&&) const /pytorch/torch/custom_class.h:328:11
    #6 0x2b2f570 in c10::guts::infer_function_traits<int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&)>::type::return_type torch::detail::call_torchbind_method_from_stack<torch::class_<ConvPackedParamsBase<2> >& torch::class_<ConvPackedParamsBase<2> >::def_pickle<int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&), int register_conv_params<2>()::'lambda'(c10::IValue)>(int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&)&&, int register_conv_params<2>()::'lambda'(c10::IValue)&&)::'lambda'(c10::tagged_capsule<ConvPackedParamsBase<2> >, c10::IValue&&), false, 0ul, 1ul>(int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&)&, std::vector<c10::IValue, std::allocator<c10::IValue> >&, std::integer_sequence<unsigned long, 0ul, 1ul>) /pytorch/torch/custom_class_detail.h:139:10
    #7 0x2b2f408 in c10::guts::infer_function_traits<int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&)>::type::return_type torch::detail::call_torchbind_method_from_stack<torch::class_<ConvPackedParamsBase<2> >& torch::class_<ConvPackedParamsBase<2> >::def_pickle<int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&), int register_conv_params<2>()::'lambda'(c10::IValue)>(int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&)&&, int register_conv_params<2>()::'lambda'(c10::IValue)&&)::'lambda'(c10::tagged_capsule<ConvPackedParamsBase<2> >, c10::IValue&&), false>(int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&)&, std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch/torch/custom_class_detail.h:153:10
    #8 0x2b2f408 in torch::detail::BoxedProxy<void, torch::class_<ConvPackedParamsBase<2> >& torch::class_<ConvPackedParamsBase<2> >::def_pickle<int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&), int register_conv_params<2>()::'lambda'(c10::IValue)>(int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&)&&, int register_conv_params<2>()::'lambda'(c10::IValue)&&)::'lambda'(c10::tagged_capsule<ConvPackedParamsBase<2> >, c10::IValue&&)>::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >&, torch::class_<ConvPackedParamsBase<2> >& torch::class_<ConvPackedParamsBase<2> >::def_pickle<int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&), int register_conv_params<2>()::'lambda'(c10::IValue)>(int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&)&&, int register_conv_params<2>()::'lambda'(c10::IValue)&&)::'lambda'(c10::tagged_capsule<ConvPackedParamsBase<2> >, c10::IValue&&)&) /pytorch/torch/custom_class_detail.h:174:5
    #9 0x2b2f38d in torch::jit::Function* torch::class_<ConvPackedParamsBase<2> >::defineMethod<torch::class_<ConvPackedParamsBase<2> >& torch::class_<ConvPackedParamsBase<2> >::def_pickle<int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&), int register_conv_params<2>()::'lambda'(c10::IValue)>(int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&)&&, int register_conv_params<2>()::'lambda'(c10::IValue)&&)::'lambda'(c10::tagged_capsule<ConvPackedParamsBase<2> >, c10::IValue&&)>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::initializer_list<torch::arg>)::'lambda'(std::vector<c10::IValue, std::allocator<c10::IValue> >&)::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch/torch/custom_class.h:407:7
    #10 0x2b2f38d in int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&) std::__invoke_impl<void, torch::jit::Function* torch::class_<ConvPackedParamsBase<2> >::defineMethod<torch::class_<ConvPackedParamsBase<2> >& torch::class_<ConvPackedParamsBase<2> >::def_pickle<int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&), int register_conv_params<2>()::'lambda'(c10::IValue)>(int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&)&&, int register_conv_params<2>()::'lambda'(c10::IValue)&&)::'lambda'(c10::tagged_capsule<ConvPackedParamsBase<2> >, c10::IValue&&)>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int register_conv_params<2>()::'lambda'(c10::intrusive_ptr<ConvPackedParamsBase<2>, c10::detail::intrusive_target_default_null_type<ConvPackedParamsBase<2> > > const&), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::initializer_list<torch::arg>)::'lambda'(std::vector<c10::IValue, std::allocator<c10::IValue> >&)&, std::vector<c10::IValue, std::allocator<c10::IValue> >&>(std::__invoke_other, int register_conv_params<2>()::'lambda'(c10::IValue)&&, std::vector<c10::IValue, std::allocator<c10::IValue> >&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:60:14
    #11 0x125654e in torch::jit::Function::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&) /pytorch/aten/src/ATen/core/function.h:62:5
    #12 0xec2c1c6 in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_1::operator()(c10::StrongTypePtr const&, c10::IValue) const /pytorch/torch/csrc/jit/serialization/import.cpp:172:7
    #13 0xec2c1c6 in c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > std::__invoke_impl<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> >, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_1&, c10::StrongTypePtr, c10::IValue>(std::__invoke_other, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_1&, c10::StrongTypePtr&&, c10::IValue&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:60:14
    #14 0xec2b9a0 in std::enable_if<is_invocable_r_v<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> >, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_1&, c10::StrongTypePtr, c10::IValue>, c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > >::type std::__invoke_r<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> >, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_1&, c10::StrongTypePtr, c10::IValue>(torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_1&, c10::StrongTypePtr&&, c10::IValue&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:113:9
    #15 0xec2b8ae in std::_Function_handler<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > (c10::StrongTypePtr, c10::IValue), torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_1>::_M_invoke(std::_Any_data const&, c10::StrongTypePtr&&, c10::IValue&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/std_function.h:291:9
    #16 0xeda0c63 in std::function<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > (c10::StrongTypePtr, c10::IValue)>::operator()(c10::StrongTypePtr, c10::IValue) const /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/std_function.h:622:14
    #17 0xed8062d in torch::jit::Unpickler::readGlobal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_9::operator()() const /pytorch/torch/csrc/jit/serialization/unpickler.cpp:863:20
    #18 0xed8062d in void std::__invoke_impl<void, torch::jit::Unpickler::readGlobal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_9&>(std::__invoke_other, torch::jit::Unpickler::readGlobal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_9&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:60:14
    #19 0xed877c6 in torch::jit::Unpickler::readInstruction() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:545:7
    #20 0xed85b27 in torch::jit::Unpickler::run() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:253:27
    #21 0xed85781 in torch::jit::Unpickler::parse_ivalue() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:206:3
    #22 0xec9c7be in torch::jit::readArchiveAndTensors(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<std::function<c10::StrongTypePtr (c10::QualifiedName const&)> >, c10::optional<std::function<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > (c10::StrongTypePtr, c10::IValue)> >, c10::optional<c10::Device>, caffe2::serialize::PyTorchStreamReader&, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&), std::shared_ptr<torch::jit::DeserializationStorageContext>) /pytorch/torch/csrc/jit/serialization/import_read.cpp:53:20
    #23 0xec2b168 in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /pytorch/torch/csrc/jit/serialization/import.cpp:184:10
    #24 0xec27235 in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::deserialize(c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:287:19
    #25 0xec25644 in torch::jit::import_ir_module(std::shared_ptr<torch::jit::CompilationUnit>, std::istream&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, bool, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:389:25
    #26 0xec2dcbe in torch::jit::import_ir_module(std::shared_ptr<torch::jit::CompilationUnit>, std::istream&, c10::optional<c10::Device>, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:325:10
    #27 0xec30659 in torch::jit::load(std::istream&, c10::optional<c10::Device>, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:485:10
    #28 0x8d8636 in LLVMFuzzerTestOneInput /load.cc:42:14
    #29 0x8d835d in ExecuteFilesOnyByOne /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:255:7
    #30 0x8d8168 in LLVMFuzzerRunDriver /AFLplusplus/utils/aflpp_driver/aflpp_driver.c
    #31 0x8d7d28 in main /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:300:10
    #32 0x7ffff7a37082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
    #33 0x817add in _start (/load_afl+0x817add)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /pytorch/c10/util/Optional.h:222:17 in c10::optional_base<at::Tensor>::optional_base(c10::optional_base<at::Tensor> const&)
==1003193==ABORTING

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108418
Approved by: https://github.com/Skylion007
2023-09-06 15:45:50 +00:00
ebed490c2f [sdpa decomp] change sdpa decomp to be consistent with flash attention (#108608)
Summary: See the comment in code for the reasons of the change

Test Plan:
buck2 test executorch/examples/export/test:test_export --
test_vit_export_to_executorch

Differential Revision: D48992180

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108608
Approved by: https://github.com/larryliu0820
2023-09-06 15:34:03 +00:00
6edd06441a Fix copy=True behavior for torch.asarray when device is not None/cpu (#108511)
Fixes #108408

See issue for details
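A rough illustration of the fixed behavior (assumes a CUDA device is available):

```python
import torch

x = torch.ones(3, device="cuda")
y = torch.asarray(x, device="cuda", copy=True)  # must return a real copy
y.zero_()
print(x)  # with the fix, x is unchanged even though device is not None/cpu
```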

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108511
Approved by: https://github.com/ysiraichi, https://github.com/rgommers, https://github.com/ezyang
2023-09-06 15:16:30 +00:00
aebb86fef7 Back out "Faster gc_count update for CUDACachingAllocator" (#108632)
Summary:
Original commit changeset: 1d04ae368fd8

Original Phabricator Diff: D48481557

block.pool is not guaranteed to be non-null

Test Plan: CI

Differential Revision: D49003756

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108632
Approved by: https://github.com/houseroad
2023-09-06 14:57:41 +00:00
ca9f4222e1 Inductor cpp wrapper: fix codegen of positional args with default value (#108552)
Fixes https://github.com/pytorch/pytorch/issues/108323.
The cpp wrapper had a functionality regression on `llama` and `tnt_s_patch16_224` due to the recent support of scaled dot product flash attention in Inductor.

The schema of this OP is as follows:
```
- func: _scaled_dot_product_flash_attention(Tensor query, Tensor key, Tensor value, float dropout_p=0.0, bool is_causal=False, bool return_debug_mask=False, *, float? scale=None) -> (Tensor output, Tensor logsumexp, Tensor cum_seq_q, Tensor cum_seq_k, int max_q, int max_k, Tensor philox_seed, Tensor philox_offset, Tensor debug_attn_mask)
```

For `llama` and `tnt_s_patch16_224`, the OP is called in the below way, where the three positional args with default values are not passed (`float dropout_p=0.0, bool is_causal=False, bool return_debug_mask=False`).
```python
y = torch.ops.aten._scaled_dot_product_flash_attention.default(x0, x1, x2, scale = 0.125)
```

This PR fixes the cpp wrapper support for this case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108552
Approved by: https://github.com/jgong5, https://github.com/desertfire, https://github.com/jansel
2023-09-06 13:15:12 +00:00
60bd30ee0b [inductor] Move AOTInductor runtime headers (#108564)
Summary: Move AOTInductor runtime header files into their own subdirectory, to separate them from the to-be-added libtorch C interface.

Reviewed By: frank-wei

Differential Revision: D48905038

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108564
Approved by: https://github.com/frank-wei
2023-09-06 11:50:41 +00:00
b60273b88a [MPS] Pixel shuffle unshuffle support (#99306)
Fixes #83196

Now, the MPS implementation is blazingly fast.

I have several questions on improving this PR, though:

1. I copied code from `test_nn.py`. Is there a better way to test this?
2. I decided to use `usePixelShuffleOrder:YES`. Am I right performance-wise? According to the docs:
```
`usePixelShuffleOrder` can be
used to control how the data within spatial blocks is ordered in the
`depthAxis` dimension: with `usePixelShuffleOrder=YES` the values within the
spatial blocks are stored contiguously within the `depthAxis` dimension whereas
otherwise they are stored interleaved with existing values in the `depthAxis` dimension.
```
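For reference, a minimal round-trip check of the newly supported ops (assumes a machine where the MPS backend is available):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 16, 8, 8, device="mps")
y = F.pixel_shuffle(x, upscale_factor=2)      # (1, 16, 8, 8) -> (1, 4, 16, 16)
z = F.pixel_unshuffle(y, downscale_factor=2)  # back to (1, 16, 8, 8)
assert torch.equal(x.cpu(), z.cpu())          # pure permutation, exact round trip
```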

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99306
Approved by: https://github.com/kulinseth, https://github.com/malfet
2023-09-06 09:11:39 +00:00
ca2cdb3009 [DeviceMesh] Minor docstring update for init_device_mesh and rename variables (#108391)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108391
Approved by: https://github.com/wanchaol
2023-09-06 08:27:11 +00:00
3fe8417643 [PyTorch] Add the lazy init call for p2p access function (#1991) (#108589)
Summary: Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/1991

Test Plan: sandcastle

Reviewed By: zdevito

Differential Revision: D48939723

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108589
Approved by: https://github.com/zdevito
2023-09-06 05:52:56 +00:00
49aa8d19dd [DTensor] Replace usage of compute_local_offset by compute_local_shape_and_global_offset (#108547)
This PR removes four usages of compute_local_offset() in the PyTorch repo and replaces them with the new API compute_local_shape_and_global_offset().

We will remove the compute_local_offset() API in the next diff, once the remaining internal usages are migrated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108547
Approved by: https://github.com/wanchaol
2023-09-06 04:53:44 +00:00
ce4967ad18 [vision hash update] update the pinned vision hash (#108611)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108611
Approved by: https://github.com/pytorchbot
2023-09-06 03:55:17 +00:00
3b92ef814d Force synced KJT to trace unbacked SymInt (#107788)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107788
Approved by: https://github.com/voznesenskym
2023-09-06 03:18:26 +00:00
c8e72a4a5c Improve mem efficiency of constant folding (#108421)
A couple of changes to make it more efficient:

- Because we are replacing nodes that only have a single value, store only that single value instead of the whole tensor for node replacement.
- torch.fx.Interpreter keeps a Tensor alive in the env as long as it has remaining uses. That applies even to output uses, but we are not going to constant-fold those. Instead of using the last use for garbage collection, use the last non-output use (see the sketch below).

If reviewers would prefer I ghstack this because of the code movement, let me know.

Fix for https://github.com/pytorch/pytorch/issues/108388
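A minimal sketch of the last-non-output-use idea (not the actual Inductor code; the model and names here are illustrative):

```python
import torch
import torch.fx as fx

class M(torch.nn.Module):
    def forward(self, x):
        a = torch.full((4,), 2.0)  # constant-producing op, foldable
        b = a + 1                  # also foldable: all inputs are constants
        return x * b, b            # b is used both by an op and as an output

gm = fx.symbolic_trace(M())

# Map each node to its last *non-output* user, so a folded value can be
# garbage-collected after that user runs instead of being kept alive by
# the graph output.
last_non_output_use = {}
for node in gm.graph.nodes:
    if node.op == "output":
        continue
    for inp in node.all_input_nodes:
        last_non_output_use[inp] = node
```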

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108421
Approved by: https://github.com/jansel
2023-09-06 02:19:30 +00:00
28c5b62210 [inductor] Use empty_strided to create output tensors when testing AOTInductor (#108364)
Summary: This will fix 3 fail_accuracy failures in HF.

Test Plan:
```
python benchmarks/dynamo/huggingface.py --bfloat16 --accuracy --inference --device cuda --export-aot-inductor --only  T5Small
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108364
Approved by: https://github.com/angelayi
ghstack dependencies: #108412
2023-09-06 02:04:32 +00:00
d494b923a9 [pytorch-vulkan] aten::rand_like (#108086)
Summary: Before implementing `aten::randn_like` as requested (T152843033), I think it is worth extending `aten::rand_like` from the existing `aten::uniform`, since they are so similar.

Test Plan:
```
[ttingchulin@6945.od /data/sandcastle/boxes/fbsource (rand_like)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run fbcode/mode/dev-nosan //xplat/caffe2:pt_vulkan_api_test_bin  -- --gtest_filter="*<test>*" eg.  -- --gtest_filter="*rand_like*"
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from VulkanAPITest
[ RUN      ] VulkanAPITest.rand_like
[       OK ] VulkanAPITest.rand_like (136 ms)
[----------] 1 test from VulkanAPITest (136 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (136 ms total)
[  PASSED  ] 1 test.

[ttingchulin@6945.od /data/sandcastle/boxes/fbsource (rand_like)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run fbcode/mode/dev-nosan //xplat/caffe2:pt_vulkan_api_test_bin  -- --gtest_filter="*<test>*" eg.  -- --gtest_filter="*uniform*"
Building: finished in 0.1 sec (100%) 329/329 jobs, 0/329 updated
  Total time: 0.1 sec
BUILD SUCCEEDED
Running main() from xplat/third-party/gmock/googletest-1.12.1/googletest/src/gtest_main.cc
Note: Google Test filter = *uniform*
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from VulkanAPITest
[ RUN      ] VulkanAPITest.uniform
[       OK ] VulkanAPITest.uniform (131 ms)
[----------] 1 test from VulkanAPITest (131 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (131 ms total)
[  PASSED  ] 1 test.

[ttingchulin@6945.od /data/sandcastle/boxes/fbsource (rand_like)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run fbcode/mode/dev-nosan //xplat/caffe2:pt_vulkan_api_test_bin
ALL PASS
```

Differential Revision: D48710273

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108086
Approved by: https://github.com/yipjustin
2023-09-06 01:25:03 +00:00
d471eaeb1d fix inline_container.cc inplace loading (#108573)
Summary:
bypass-github-pytorch-ci-checks
bypass-github-export-checks
force-merge-on-github

Differential Revision: D48971847

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108573
Approved by: https://github.com/wqfish
2023-09-06 00:02:42 +00:00
ff28b4b908 Fix dynamo benchmark config --print-graph-breaks (#108584)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108584
Approved by: https://github.com/anijain2305
2023-09-05 23:31:43 +00:00
cyy
bae14b3d9f Update clang7 CI jobs to clang9 (#108339)
This PR updates the remaining clang7 CI job to clang9. However, I have no permission to push the new Docker image, so Android CI tests would fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108339
Approved by: https://github.com/ezyang
2023-09-05 22:46:47 +00:00
c99a70c8df [export] Fix dict.get() to dict.setdefault() for param lookup. (#108587)
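The behavioral difference behind the fix, as a minimal illustration:

```python
params = {}

lst = params.get("weight", [])         # returns the default but does not store it
lst.append(1)
print(params)                          # {}  -- the result of the lookup is lost

lst = params.setdefault("weight", [])  # stores the default on first lookup
lst.append(1)
print(params)                          # {'weight': [1]}
```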

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108587
Approved by: https://github.com/angelayi
2023-09-05 22:08:51 +00:00
eab57145ab fix matrix_power documentation bug (#108585)
The torch.linalg.matrix_power documentation suggests using the formula
`matrix_power(torch.linalg.solve(A, B), n) == matrix_power(A, -n)  @ B`
to avoid negative matrix powers. But the ordering of the left side is not correct. This patch fixes it to:
`torch.linalg.solve(matrix_power(A, n), B) == matrix_power(A, -n)  @ B`
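A quick numerical check of the corrected identity:

```python
import torch

A = torch.randn(3, 3) + 3 * torch.eye(3)  # well-conditioned, invertible
B = torch.randn(3, 2)
n = 2

lhs = torch.linalg.solve(torch.linalg.matrix_power(A, n), B)
rhs = torch.linalg.matrix_power(A, -n) @ B
assert torch.allclose(lhs, rhs, atol=1e-5)
```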

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108585
Approved by: https://github.com/lezcano
2023-09-05 22:08:46 +00:00
208fd1cb84 [RFC] Somewhat BC breaking: make checkpoint_wrapper default to NO_REENTRANT (#108435)
We should use NO_REENTRANT. There are a lot of users of this API, but
it is in a prototype state, so it should be fine to change.
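A minimal sketch of the affected API (import path as of this change; passing checkpoint_impl explicitly pins the behavior regardless of the default):

```python
import torch
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    CheckpointImpl,
    checkpoint_wrapper,
)

layer = torch.nn.Linear(16, 16)
wrapped = checkpoint_wrapper(layer, checkpoint_impl=CheckpointImpl.NO_REENTRANT)
loss = wrapped(torch.randn(2, 16, requires_grad=True)).sum()
loss.backward()  # recomputes the forward under the no-reentrant implementation
```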

Differential Revision: [D48898148](https://our.internmc.facebook.com/intern/diff/D48898148/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108435
Approved by: https://github.com/awgu
ghstack dependencies: #108032, #108033
2023-09-05 21:43:41 +00:00
db6d09c086 [RFC][FSDP] Don't move ignored params / buffers to device (#108033)
Since these are ignored by FSDP, don't move them.

Differential Revision: [D48727044](https://our.internmc.facebook.com/intern/diff/D48727044/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108033
Approved by: https://github.com/awgu
ghstack dependencies: #108032
2023-09-05 21:43:41 +00:00
3334ec3a00 [RFC] Don't materialize ignored modules for FSDP (#108032)
Per title. This seems needed for cases where I have a large embedding
I want to separately manage, but FSDP would initialize it and thus consume the
memory.

Currently the interaction with torchdistX materialize_module is not tested;
this can be done as follow-up work.
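A minimal sketch of the scenario (assumes torch.distributed is already initialized; the model is illustrative):

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = torch.nn.Embedding(1_000_000, 128)  # separately managed
        self.proj = torch.nn.Linear(128, 128)

model = Model()
# With this change, FSDP leaves the ignored embedding alone: it is neither
# sharded nor materialized/moved to device by FSDP.
fsdp_model = FSDP(model, ignored_modules=[model.emb])
```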

Differential Revision: [D48722046](https://our.internmc.facebook.com/intern/diff/D48722046/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108032
Approved by: https://github.com/awgu
2023-09-05 21:43:41 +00:00
fee9fc1df0 [pytorch] Update docstring for FSDP.set_state_dict_type (#103864)
Summary: I noticed optim_state_dict_config was missing from the Args section

Test Plan: N/A

Reviewed By: rohan-varma

Differential Revision: D46670165

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103864
Approved by: https://github.com/rohan-varma, https://github.com/fegin, https://github.com/fduwjj
2023-09-05 21:43:31 +00:00
64ad16a5e1 [XNNPACK] Enable XX kernels (#108440)
Summary: Enables copy, pad, fill, etc. kernels in the XNNPACK library. This shouldn't have much of a size implication.

Test Plan: CI

Differential Revision: D48915384

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108440
Approved by: https://github.com/Skylion007
2023-09-05 21:37:24 +00:00
66af4f6ec7 [HSDP] Add device_mesh to FSDP kwarg and add dtensor state_dict support for HSDP (#107533)
This PR:
1) Add a device_mesh kwarg to FSDP. Remove init_device_mesh() from _runtime_utils.py, as the device_mesh is now passed in by the user as a kwarg.
2) Make the use_dtensor flag for state_dict_config and optim_state_dict_config private. If a device_mesh is used with a sharded model/optim state dict, the _use_dtensor flag is set to True and the model/optim state dict returns a DTensor state_dict. Otherwise, the _use_dtensor flag is set to False and the model/optim state dict returns a sharded_tensor state_dict.
3) Update _optim_utils.py, _shard_utils.py, and _state_dict_utils.py to add support for HSDP returning a 2D DTensor state_dict (a rough usage sketch follows below).
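A rough usage sketch, assuming an already-initialized 8-rank job (at the time of this PR, init_device_mesh lived under torch.distributed._tensor):

```python
import torch
from torch.distributed._tensor import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

# 2-D mesh for HSDP: replicate across the first dim, shard across the second.
mesh = init_device_mesh("cuda", (2, 4))
model = torch.nn.Linear(16, 16).cuda()
fsdp_model = FSDP(
    model,
    device_mesh=mesh,
    sharding_strategy=ShardingStrategy.HYBRID_SHARD,
)
```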

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107533
Approved by: https://github.com/fegin, https://github.com/awgu, https://github.com/wanchaol
2023-09-05 21:21:21 +00:00
b1729d8bbe Fix doc preview page url at CONTRIBUTING.md (#108580)
The URL for previewing documentation directly on a PR has changed, and CONTRIBUTING.md became outdated. There is also a minor fix to a non-existent document URL.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108580
Approved by: https://github.com/svekars, https://github.com/kit1980
2023-09-05 20:17:55 +00:00
fac7a1f730 fix issue with lift_fresh_copy when using export + compile (#108243)
Fixes https://github.com/pytorch/pytorch/issues/105327. The problem is that `lift_fresh_copy()`'s functionalization implementation currently assumes that the input is never functional. This is apparently too limiting: consider "user" code like this (which can potentially come from exporting a model and then running compile on the resulting graph):
```
tensor_constant0 = torch.tensor(2)
lift_fresh = torch.ops.aten.lift_fresh_copy.default(tensor_constant0)
```

When we run this through AOTAutograd, the first call (torch.tensor(2)) will **already** be lifted into a functional tensor wrapper - so the `lift_fresh_copy` call doesn't need to do any "lifting" anymore - it just needs to do a clone.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108243
Approved by: https://github.com/albanD
ghstack dependencies: #108081, #108235
2023-09-05 20:02:35 +00:00
da914aed21 error when using _dynamo.optimize_ddp=True and _inductor.keep_output_stride=False together (#108235)
From talking to @wconstab, we agreed that because of the way DDPOptimizer is written, it is (sort of) incompatible with inductor's `keep_output_stride=False` optimizations (and will cause silent correctness problems if you use them together). Added an assertion.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108235
Approved by: https://github.com/wconstab
ghstack dependencies: #108081
2023-09-05 20:02:35 +00:00
def33d4d7a Fix inductor <> ddp_optimizer issue (#108081)
@wconstab pointed out that inductor found a graph with 6 input mutations and only 1 output, and seemed to be (incorrectly) chopping off the first "6" outputs from the graph (even though there is only 1). It looks like this is because:

(1) AOTAutograd has special handling for input mutations in inference vs. training graphs. In a training graph, whenever AOTAutograd sees an input mutation, it will add an **extra** output to the graph, corresponding to the updated input (and then at runtime, it will grab the updated input, and perform the actual mutation outside of the graph).

In inference, AOTAutograd is smarter and can leave the input mutations directly in the graph for inductor to optimize (doing this in training is harder). In inference, AOTAutograd will **not** add any extra graph outputs for input mutations.

It looks like inductor was unconditionally assuming that input mutations counted as extra outputs in the graph, which is wrong for the inference case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108081
Approved by: https://github.com/wconstab
2023-09-05 20:02:35 +00:00
ae8eb7a3f9 Use global variables to register the return_types namedtuples (#107000)
Fixes #69221

@pytorchbot label "topic: not user facing"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107000
Approved by: https://github.com/zou3519
2023-09-05 20:00:29 +00:00
d64e1c5f9d Fix error message concatenation (#108581)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108581
Approved by: https://github.com/mikaylagawarecki
2023-09-05 19:46:52 +00:00
7cdfc38433 [inductor] Update how AOTInductor resizes output tensors (#108412)
Summary: Improve https://github.com/pytorch/pytorch/pull/107848 so that no resize_ is needed for output tensors when exiting the main function.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108412
Approved by: https://github.com/jansel
2023-09-05 19:33:26 +00:00
1b76a5c24b Revert "Use std::filesystem in c10 tempfile and tempdir (#106656)"
This reverts commit 7b91f762b65ea250b87aaa2e2b67e429a9d29f16.

Reverted https://github.com/pytorch/pytorch/pull/106656 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing internal iOS build.  This was missed by the periodic mobile build, I think ([comment](https://github.com/pytorch/pytorch/pull/106656#issuecomment-1707187814))
2023-09-05 19:22:56 +00:00
a9a6423261 Revert "[export] Copy gm before calling PassManager" for test or build failures (#108441)
Test Plan: CI

Differential Revision: D48916322

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108441
Approved by: https://github.com/cccclai
2023-09-05 19:21:01 +00:00
0b44fdfaec fix use_deterministic_algorithms docstring (#108551)
I fixed an error in the example.
`k` in `torch.Tensor.kthvalue(k)` is 1-indexed, so `torch.randn(10, device='cuda').kthvalue(0)` should be `torch.randn(10, device='cuda').kthvalue(1)`.
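The corrected example in context (assumes a CUDA device; kthvalue has no deterministic CUDA implementation, so this raises, which is what the docstring demonstrates):

```python
import torch

torch.use_deterministic_algorithms(True)
# k is 1-indexed: k=1 selects the smallest element.
torch.randn(10, device="cuda").kthvalue(1)  # raises RuntimeError
```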
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108551
Approved by: https://github.com/mikaylagawarecki
2023-09-05 18:44:23 +00:00
23e8a11fef [c10d] Introduce TCPStore client metrics collection. (#108348)
We collect timing and counts for every operation.
They are accessible from Python via TCPStore::collect_client_counters.
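A rough sketch (the counter-collection call is named after the C++ method in this commit; the exact Python binding may differ):

```python
import torch.distributed as dist

# single-process TCPStore acting as both server and client
store = dist.TCPStore("127.0.0.1", 29500, world_size=1, is_master=True)
store.set("key", "value")
_ = store.get("key")
print(store.collect_client_counters())  # per-operation timings and counts
```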
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108348
Approved by: https://github.com/XilunWu
2023-09-05 18:36:27 +00:00
4a472d9e95 [jit] Verify stack size and index to prevent off-by-one error (#108413)
Hi!

I've been fuzzing different pytorch modules with [sydr-fuzz](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch), and found a heap-buffer-overflow error caused by an incorrect loop condition in torch::jit::unpickler.cpp. This bug can be triggered by the `torch::distributed::rpc::deserializeRequest()` method in the RPC module.

Docker to reproduce found error: [Dockerfile](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch).

### PoC for deserializeRequest():
[crash-001e49dcd3a3c439e2b1273d580049309e052bdd.txt](https://github.com/pytorch/pytorch/files/12498999/crash-001e49dcd3a3c439e2b1273d580049309e052bdd.txt)

### ASAN report
```
==339982==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x619000086a88 at pc 0x000000996fa4 bp 0x7fffffff9c50 sp 0x7fffffff9c48
READ of size 4 at 0x619000086a88 thread T0
    #0 0x996fa3 in c10::IValue::IValue(c10::IValue const&) /pytorch/aten/src/ATen/core/ivalue.h:226:33
    #1 0xdf99a38 in std::pair<c10::impl::DictIterator<c10::IValue, c10::IValue, ska_ordered::detailv3::sherwood_v3_table<std::pair<c10::IValue, c10::IValue>, c10::IValue, c10::detail::DictKeyHash, ska_ordered::detailv3::KeyOrValueHasher<c10::IValue, std::pair<c10::IValue, c10::IValue>, c10::detail::DictKeyHash>, c10::detail::DictKeyEqualTo, ska_ordered::detailv3::KeyOrValueEquality<c10::IValue, std::pair<c10::IValue, c10::IValue>, c10::detail::DictKeyEqualTo>, std::allocator<std::pair<c10::IValue, c10::IValue> >, std::allocator<ska_ordered::detailv3::sherwood_v3_entry<std::pair<c10::IValue, c10::IValue> > > >::templated_iterator<std::pair<c10::IValue, c10::IValue> > >, bool> c10::Dict<c10::IValue, c10::IValue>::insert_or_assign<c10::IValue&, c10::IValue&>(c10::IValue&, c10::IValue&) const /pytorch/aten/src/ATen/core/Dict_inl.h:136:5
    #2 0xed966c7 in torch::jit::Unpickler::readInstruction() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:490:14
    #3 0xed94377 in torch::jit::Unpickler::run() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:253:27
    #4 0xed93fd1 in torch::jit::Unpickler::parse_ivalue() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:206:3
    #5 0xece09ee in torch::jit::unpickle(std::function<unsigned long (char*, unsigned long)>, std::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)) /pytorch/torch/csrc/jit/serialization/pickle.cpp:126:20
    #6 0xece0dac in torch::jit::unpickle(char const*, unsigned long, std::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)) /pytorch/torch/csrc/jit/serialization/pickle.cpp:136:10
    #7 0x1006a4e7 in torch::distributed::rpc::PythonRemoteCall::fromMessage(torch::distributed::rpc::Message const&) /pytorch/torch/csrc/distributed/rpc/python_remote_call.cpp:40:16
    #8 0x101d02e1 in torch::distributed::rpc::deserializeRequest(torch::distributed::rpc::Message const&) /pytorch/torch/csrc/distributed/rpc/utils.cpp:111:14
    #9 0x8db738 in LLVMFuzzerTestOneInput /message_deserialize.cc:192:27
    #10 0x8d84cd in ExecuteFilesOnyByOne /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:255:7
    #11 0x8d82d8 in LLVMFuzzerRunDriver /AFLplusplus/utils/aflpp_driver/aflpp_driver.c
    #12 0x8d7e98 in main /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:300:10
    #13 0x7ffff7a37082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
    #14 0x817c4d in _start (/message_deserialize_afl+0x817c4d)

0x619000086a88 is located 8 bytes to the right of 1024-byte region [0x619000086680,0x619000086a80)
allocated by thread T0 here:
    #0 0x8d54ca in operator new(unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/asan/asan_new_delete.cpp:95:3

SUMMARY: AddressSanitizer: heap-buffer-overflow /pytorch/aten/src/ATen/core/ivalue.h:226:33 in c10::IValue::IValue(c10::IValue const&)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108413
Approved by: https://github.com/ezyang
2023-09-05 18:28:17 +00:00
a74f50d524 torch.compile-functorch interaction: update docs (#108130)
Doc Preview: https://docs-preview.pytorch.org/pytorch/pytorch/108130/torch.compiler_faq.html#torch-func-works-with-torch-compile-for-grad-and-vmap-transforms

Will also cherry-pick this for release branch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108130
Approved by: https://github.com/zou3519
2023-09-05 18:24:08 +00:00
42f94d7e9f add Half support for maxpool on CPU (#98819)
### Testing
Single socket (28 cores):

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
size: (1, 56, 264, 264), kernel: 3,   stride: 1, mem_format: contig | 4.12895 | 6.9669 | 5.30297 | 0.55775 | 1.98917 | 0.72233
size: (1, 56, 264, 264), kernel: 3,   stride: 1, mem_format: CL | 0.85093 | 1.88813 | 1.38063 | 5.5742 | 36.5086 | 10.58552
size: (32, 16, 200, 200), kernel: 3,   stride: 1, mem_format: contig | 22.37212 | 37.90383 | 30.94482 | 6.85868 | 10.6116 | 3.9993
size: (32, 16, 200, 200), kernel: 3,   stride: 1, mem_format: CL | 5.41658 | 4.71098 | 4.66578 | 6.69875 | 14.7171 | 5.1167
size: (32, 32, 100, 100), kernel: 3,   stride: 1, mem_format: contig | 10.69831 | 18.0468 | 13.71657 | 2.61192 | 4.96172 | 1.68635
size: (32, 32, 100, 100), kernel: 3,   stride: 1, mem_format: CL | 2.52637 | 2.0096 | 2.0055 | 2.60314 | 7.2093 | 2.49843
size: (4, 19, 10, 16, 16), kernel: 3,   stride: 1, mem_format: contig | 0.47605 | 0.88398 | 0.65326 | 0.06525 | 0.115489 | 0.0674
size: (4, 19, 10, 16, 16), kernel: 3,   stride: 1, mem_format: CL3d | 0.10902 | 0.25293 | 0.157475 | 0.11386 | 0.53319 | 0.17836

Single core:

shape | fp32 forward / ms | fp16 forward / ms | bf16 forward / ms | fp32 backward / ms | fp16 backward / ms | bf16 backward / ms
-- | -- | -- | -- | -- | -- | --
size: (1, 56, 264, 264), kernel: 3,   stride: 1, mem_format: contig | 90.9809 | 163.473 | 126.1276 | 6.57721 | 41.40833 | 11.82505
size: (1, 56, 264, 264), kernel: 3,   stride: 1, mem_format: CL | 9.88405 | 38.39137 | 29.62069 | 7.10636 | 36.97535 | 11.0525
size: (32, 16, 200, 200), kernel: 3,   stride: 1, mem_format: contig | 476.782 | 855.4769 | 648.2248 | 46.6488 | 219.2586 | 67.10599
size: (32, 16, 200, 200), kernel: 3,   stride: 1, mem_format: CL | 80.29271 | 91.33854 | 87.80345 | 48.81692 | 203.9974 | 63.39004
size: (32, 32, 100, 100), kernel: 3,   stride: 1, mem_format: contig | 235.2113 | 419.0799 | 315.4284 | 20.6049 | 107.1524 | 32.39169
size: (32, 32, 100, 100), kernel: 3,   stride: 1, mem_format: CL | 29.47653 | 33.54905 | 32.82823 | 22.59674 | 98.5586 | 30.05763
size: (4, 19, 10, 16, 16), kernel: 3,   stride: 1, mem_format: contig | 7.90684 | 13.9208 | 10.03272 | 0.23725 | 1.35269 | 0.41728
size: (4, 19, 10, 16, 16), kernel: 3,   stride: 1, mem_format: CL3d | 2.33638 | 3.36894 | 2.64635 | 0.26535 | 1.244 | 0.38895
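A minimal example of what this enables (shapes are arbitrary):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 8, 32, 32, dtype=torch.half, requires_grad=True)  # CPU fp16
y = F.max_pool2d(x, kernel_size=3, stride=1)
y.sum().backward()  # fp16 backward on CPU is covered as well
```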

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98819
Approved by: https://github.com/mingfeima, https://github.com/mikaylagawarecki
2023-09-05 18:23:41 +00:00
1e0e55c504 [xplat][buck2][typing] Fix typechecker issue (#108525)
Test Plan: CI

Reviewed By: JakobDegen

Differential Revision: D48817210

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108525
Approved by: https://github.com/osalpekar
2023-09-05 18:18:45 +00:00
8da04e023e Revert "Eliminate c10::guts::to_string (#108480)"
This reverts commit 4146be192ead477360a2763c5005e46a9485c3bf.

Reverted https://github.com/pytorch/pytorch/pull/108480 on behalf of https://github.com/huydhn due to Sorry for reverting this, but this is needed to keep trunk green after https://github.com/pytorch/pytorch/pull/108479 was reverted.  Both will need to be relanded ([comment](https://github.com/pytorch/pytorch/pull/108480#issuecomment-1707067595))
2023-09-05 18:04:53 +00:00
5b31a41841 Revert "[NCCL][CUDA][CUDA Graphs] Flush enqueued work before starting a graph capture (#104487)"
This reverts commit db63bf3d7e5eef320dde9c2d4b7976eb5fcddbd6.

Reverted https://github.com/pytorch/pytorch/pull/104487 on behalf of https://github.com/huydhn due to Sorry for reverting your change, it is failing internal build ([comment](https://github.com/pytorch/pytorch/pull/104487#issuecomment-1707055346))
2023-09-05 17:57:19 +00:00
29f1097891 [dynamo] Reduce cache size limit to 8 (#108526)
As title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108526
Approved by: https://github.com/ezyang
2023-09-05 17:56:26 +00:00
03aac0bff6 add input check at the beginning for C++ API interpolate (#108506)
Fixes https://github.com/pytorch/pytorch/issues/108346
Add an input check at the beginning of the C++ API `interpolate`, raising an error when an invalid input is given.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108506
Approved by: https://github.com/ezyang
2023-09-05 17:56:17 +00:00
9f71a4ebd4 Revert "Simplify c10::string_view implementation (#108479)"
This reverts commit ce03b78a8f463139c87a4bf42e8f37ebabca5b0f.

Reverted https://github.com/pytorch/pytorch/pull/108479 on behalf of https://github.com/huydhn due to Sorry for reverting your change, it is failing internal builds ([comment](https://github.com/pytorch/pytorch/pull/108479#issuecomment-1707033082))
2023-09-05 17:39:54 +00:00
e8005781be Softmax in functorch example fixed (#107988)
The output of softmax was overwritten by the output of fc2 on the following line, so the output of the softmax was never used. Now the final output of the model includes the softmax.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107988
Approved by: https://github.com/zou3519
2023-09-05 17:18:48 +00:00
e787708ad7 [jit] Validate statement parsing during class deserialization (#108417)
Hi!

I've been fuzzing different pytorch modules with [sydr-fuzz](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch), and found a SEGV that occurs during class deserialization in the JIT module.

Docker to reproduce found error: [Dockerfile](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch).

### PoC:
[crash-bfbab61bf86755aa712bb978e26057ae76d75fe4.txt](https://github.com/pytorch/pytorch/files/12499228/crash-bfbab61bf86755aa712bb978e26057ae76d75fe4.txt)

### ASAN report
```
==1003115==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x00000db61680 bp 0x7fffffff5e30 sp 0x7fffffff5a60 T0)
==1003115==The signal is caused by a READ memory access.
==1003115==Hint: this fault was caused by a dereference of a high value address (see register values below).  Disassemble the provided pc to learn which register was used.
    #0 0xdb61680 in c10::intrusive_ptr<torch::jit::Tree, c10::detail::intrusive_target_default_null_type<torch::jit::Tree> >::retain_() /pytorch/c10/util/intrusive_ptr.h:265:54
    #1 0xdb6721c in c10::intrusive_ptr<torch::jit::Tree, c10::detail::intrusive_target_default_null_type<torch::jit::Tree> >::intrusive_ptr(c10::intrusive_ptr<torch::jit::Tree, c10::detail::intrusive_target_default_null_type<torch::jit::Tree> > const&) /pytorch/c10/util/intrusive_ptr.h:354:5
    #2 0xdb6721c in torch::jit::Expr::Expr(c10::intrusive_ptr<torch::jit::Tree, c10::detail::intrusive_target_default_null_type<torch::jit::Tree> > const&) /pytorch/torch/csrc/jit/frontend/tree_views.h:270:49
    #3 0xdbf73b9 in torch::jit::Maybe<torch::jit::Expr>::get() const /pytorch/torch/csrc/jit/frontend/tree_views.h:212:12
    #4 0xecac171 in torch::jit::SourceImporterImpl::importClass(c10::QualifiedName const&, torch::jit::ClassDef const&, bool) /pytorch/torch/csrc/jit/serialization/import_source.cpp:454:64
    #5 0xeca0ada in torch::jit::SourceImporterImpl::importNamedType(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, torch::jit::ClassDef const&) /pytorch/torch/csrc/jit/serialization/import_source.cpp:288:5
    #6 0xeca7422 in torch::jit::SourceImporterImpl::findNamedType(c10::QualifiedName const&) /pytorch/torch/csrc/jit/serialization/import_source.cpp:140:5
    #7 0xeca295c in torch::jit::SourceImporterImpl::resolveType(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, torch::jit::SourceRange const&) /pytorch/torch/csrc/jit/serialization/import_source.cpp:261:10
    #8 0xdd03bc8 in torch::jit::ScriptTypeParser::parseTypeFromExpr(torch::jit::Expr const&) const /pytorch/torch/csrc/jit/frontend/script_type_parser.cpp:238:24
    #9 0xdcfc9b6 in torch::jit::ScriptTypeParser::parseType(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /pytorch/torch/csrc/jit/frontend/script_type_parser.cpp:312:10
    #10 0xecbac43 in torch::jit::SourceImporter::loadType(c10::QualifiedName const&) const /pytorch/torch/csrc/jit/serialization/import_source.cpp:786:27
    #11 0xec2b5d3 in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0::operator()(c10::QualifiedName const&) const /pytorch/torch/csrc/jit/serialization/import.cpp:146:33
    #12 0xec2b5d3 in c10::StrongTypePtr std::__invoke_impl<c10::StrongTypePtr, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0&, c10::QualifiedName const&>(std::__invoke_other, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0&, c10::QualifiedName const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:60:14
    #13 0xec2b4a0 in std::enable_if<is_invocable_r_v<c10::StrongTypePtr, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0&, c10::QualifiedName const&>, c10::StrongTypePtr>::type std::__invoke_r<c10::StrongTypePtr, torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0&, c10::QualifiedName const&>(torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0&, c10::QualifiedName const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:113:9
    #14 0xec2b3a0 in std::_Function_handler<c10::StrongTypePtr (c10::QualifiedName const&), torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::$_0>::_M_invoke(std::_Any_data const&, c10::QualifiedName const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/std_function.h:291:9
    #15 0xec95f7c in std::function<c10::StrongTypePtr (c10::QualifiedName const&)>::operator()(c10::QualifiedName const&) const /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/std_function.h:622:14
    #16 0xed78721 in torch::jit::Unpickler::readGlobal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /pytorch/torch/csrc/jit/serialization/unpickler.cpp:844:9
    #17 0xed87821 in torch::jit::Unpickler::readInstruction() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:520:7
    #18 0xed85b27 in torch::jit::Unpickler::run() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:253:27
    #19 0xed85781 in torch::jit::Unpickler::parse_ivalue() /pytorch/torch/csrc/jit/serialization/unpickler.cpp:206:3
    #20 0xec9c7be in torch::jit::readArchiveAndTensors(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<std::function<c10::StrongTypePtr (c10::QualifiedName const&)> >, c10::optional<std::function<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > (c10::StrongTypePtr, c10::IValue)> >, c10::optional<c10::Device>, caffe2::serialize::PyTorchStreamReader&, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&), std::shared_ptr<torch::jit::DeserializationStorageContext>) /pytorch/torch/csrc/jit/serialization/import_read.cpp:53:20
    #21 0xec2b168 in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::readArchive(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /pytorch/torch/csrc/jit/serialization/import.cpp:184:10
    #22 0xec27235 in torch::jit::(anonymous namespace)::ScriptModuleDeserializer::deserialize(c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:287:19
    #23 0xec25644 in torch::jit::import_ir_module(std::shared_ptr<torch::jit::CompilationUnit>, std::istream&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, bool, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:389:25
    #24 0xec2dcbe in torch::jit::import_ir_module(std::shared_ptr<torch::jit::CompilationUnit>, std::istream&, c10::optional<c10::Device>, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:325:10
    #25 0xec30659 in torch::jit::load(std::istream&, c10::optional<c10::Device>, bool) /pytorch/torch/csrc/jit/serialization/import.cpp:485:10
    #26 0x8d8636 in LLVMFuzzerTestOneInput /load.cc:42:14
    #27 0x8d835d in ExecuteFilesOnyByOne /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:255:7
    #28 0x8d8168 in LLVMFuzzerRunDriver /AFLplusplus/utils/aflpp_driver/aflpp_driver.c
    #29 0x8d7d28 in main /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:300:10
    #30 0x7ffff7a37082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
    #31 0x817add in _start (/load_afl+0x817add)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /pytorch/c10/util/intrusive_ptr.h:265:54 in c10::intrusive_ptr<torch::jit::Tree, c10::detail::intrusive_target_default_null_type<torch::jit::Tree> >::retain_()
==1003115==ABORTING

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108417
Approved by: https://github.com/ezyang
2023-09-05 17:09:25 +00:00
96d74073f8 Horizontally fuse input concatenation (#108115)
Fixes https://github.com/pytorch/pytorch/issues/106688

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108115
Approved by: https://github.com/jansel
2023-09-05 16:55:32 +00:00
6a1a893f8f Bump version 2.1.0 -> 2.2.0 (#108156)
Same as: https://github.com/pytorch/pytorch/pull/95790

### <samp>🤖 Generated by Copilot at 50063bb</samp>

> _`PyTorch` version up_
> _Nightly and release builds change_
> _Autumn of progress_
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108156
Approved by: https://github.com/osalpekar, https://github.com/albanD
2023-09-05 15:56:23 +00:00
a16b0aa26a [dynamo] Fix return type of Tensor.shape (#108240)
This should be `torch.Size` but was returning a plain tuple under dynamo.
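
A minimal illustration of the difference, relying on `numel()`, a method `torch.Size` has but a plain tuple lacks:

```python
import torch

def fn(x):
    # torch.Size is a tuple subclass; code like this breaks if dynamo
    # returns a plain tuple instead.
    return x.shape.numel()

compiled = torch.compile(fn)
assert compiled(torch.ones(2, 3)) == 6
```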

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108240
Approved by: https://github.com/ezyang
ghstack dependencies: #108239
2023-09-05 14:58:39 +00:00
7c931f2491 [dynamo] Add dynamic shapes support to torch.Size.numel (#108239)
Currently numel only supports static shapes, but this expands it to support
generating symbolic arithmetic into the graph. e.g.
```
# x.size().numel with x.size() = [s0, 1, s1]
size = l_x_.size()
getitem = size[0]
getitem_2 = size[2];  size = None
mul = getitem * getitem_2;  getitem = getitem_2 = None
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108239
Approved by: https://github.com/ezyang
2023-09-05 14:58:39 +00:00
b2c6383f44 [pytorch] Small fix to docstring of FSDP.optim_state_dict_to_load (#108383)
Summary: Fix ordering of args in docstring

Test Plan: N/A

Differential Revision: D48889668

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108383
Approved by: https://github.com/fegin, https://github.com/awgu, https://github.com/wz337
2023-09-05 14:56:56 +00:00
0ef2556351 Update sparse_funcs to include primtorch types (#107421)
Fixes #107335.

A few issues have been identified while enabling this test and filed:
https://github.com/pytorch/pytorch/issues/105986
https://github.com/pytorch/pytorch/issues/108204
https://github.com/pytorch/pytorch/issues/108205

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107421
Approved by: https://github.com/ezyang
2023-09-05 14:34:48 +00:00
e27ddd2cee s390x SIMD: update abs() function for complex numbers (#108515)
It propagated NaNs when it should not have.
It is also replaced with std::abs due to a precision mismatch.

This change fixes test_python_ref__refs_abs_cpu_complex32 test in test/test_ops.py.
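
One case of the kind this fixes, assuming C99 hypot semantics where |inf + NaN·j| is inf rather than NaN:

```python
import torch

z = torch.tensor([complex(float("inf"), float("nan"))], dtype=torch.complex64)
# std::abs yields inf here; a NaN-propagating implementation wrongly returns nan.
print(torch.abs(z))  # tensor([inf])
```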

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108515
Approved by: https://github.com/ezyang
2023-09-05 14:20:00 +00:00
0a8296da7d ReduceLROnPlateau: inherit LRScheduler (#108464)
Fixes #106767
Fixes #104687
Fixes #49369
Fixes #63143
Fixes #50715
Fixes #21981
Fixes #2829

Hoping this is just a simple fix, but we'll see.
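
The user-visible effect, sketched under the assumption that the inheritance change is all that generic scheduler code needs:

```python
import torch

opt = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt)
# With this change, ReduceLROnPlateau is a real LRScheduler subclass,
# so generic code that type-checks schedulers accepts it.
print(isinstance(sched, torch.optim.lr_scheduler.LRScheduler))  # True
```
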
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108464
Approved by: https://github.com/ezyang
2023-09-05 13:48:54 +00:00
cyy
efc7c366f4 Remove auto_gil.h (#108492)
auto_gil.h has been deprecated for a long time. We can switch to pybind11.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108492
Approved by: https://github.com/Skylion007
2023-09-05 08:26:13 +00:00
cyy
a9d9803bfd Enable MKLDNN ASAN tests (#108478)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108478
Approved by: https://github.com/ezyang
2023-09-05 08:22:13 +00:00
cyy
468660d03e use std::initializer_list for vector literals (#108504)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108504
Approved by: https://github.com/Skylion007
2023-09-05 08:17:13 +00:00
3d2938b1fc [inductor] Add an aot_inductor class in inductor config (#108369)
Summary: Introduce an aot_inductor class to group AOTInductor specific configs

Differential Revision: D48880684

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108369
Approved by: https://github.com/frank-wei
2023-09-05 07:11:19 +00:00
ff38c0e2f9 [Inductor] Make aot-inductor work with pip installed torch (#108319)
It seems pip-installed torch is built with `-D_GLIBCXX_USE_CXX11_ABI=0`, and it fails inductor/test_aot_inductor.py with:
```
ERROR: test_with_offset (__main__.AotInductorTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/torch/testing/_internal/common_utils.py", line 2388, in wrapper
    method(*args, **kwargs)
  File "/home/ubuntu/src/pytorch/test/inductor/test_aot_inductor.py", line 112, in test_with_offset
    actual = AOTInductorModelRunner.run(model, example_inputs, expected)
  File "/home/ubuntu/src/pytorch/test/inductor/test_aot_inductor.py", line 63, in run
    optimized, exported, output_tensors, output_spec = AOTInductorModelRunner.load(
  File "/home/ubuntu/src/pytorch/test/inductor/test_aot_inductor.py", line 50, in load
    optimized = torch.utils.cpp_extension.load_inline(
  File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1635, in load_inline
    return _jit_compile(
  File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1736, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/home/ubuntu/anaconda3/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 2136, in _import_module_from_library
    module = importlib.util.module_from_spec(spec)
  File "<frozen importlib._bootstrap>", line 565, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1173, in create_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
ImportError: /tmp/torchinductor_ubuntu/cqrzlw3yizrsx2us5bnjosr4tzct24h6qwb6xbbx654fxvdupoub/cr6ndwlgeorw34etxhwvs547kbnftyxtwwrsmbdraa4hjeevsvji.so: undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
```
I'm not sure how to test this in CI, maybe run tests with prebuilt wheels?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108319
Approved by: https://github.com/ezyang
2023-09-04 19:57:38 +00:00
159ce22694 [rpc] Fix assertion on vector length during message parsing (#108414)
Hi!

I've been fuzzing different pytorch modules with [sydr-fuzz](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch), and found a heap buffer overflow that occurs during the Python object deserialization routine. The vector of `IValues` is verified to contain at least 3 elements, which are subsequently removed from it. The rest of the vector is passed further, where it is expected to contain at least one more element. The crash occurs on an empty vector.

Docker to reproduce found error: [Dockerfile](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch).

### PoC:
[crash-6d634f38a76bfeaa1fffc9472e8ea7b88ee8e776.txt](https://github.com/pytorch/pytorch/files/12499089/crash-6d634f38a76bfeaa1fffc9472e8ea7b88ee8e776.txt)

### ASAN report
```
==339647==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x604000105388 at pc 0x000000c2b3bc bp 0x7fffffffb8d0 sp 0x7fffffffb8c8
READ of size 4 at 0x604000105388 thread T0
    #0 0xc2b3bb in c10::IValue::isString() const /pytorch/aten/src/ATen/core/ivalue.h:685:27
    #1 0xc2b3bb in c10::IValue::toStringRef[abi:cxx11]() const /pytorch/aten/src/ATen/core/ivalue_inl.h:2308:3
    #2 0x101ce65f in torch::distributed::rpc::SerializedPyObj::fromIValues(std::vector<c10::IValue, std::allocator<c10::IValue> >) /pytorch/torch/csrc/distributed/rpc/types.cpp:103:39
    #3 0x1006a7a0 in torch::distributed::rpc::PythonRemoteCall::fromMessage(torch::distributed::rpc::Message const&) /pytorch/torch/csrc/distributed/rpc/python_remote_call.cpp:58:26
    #4 0x101d02e1 in torch::distributed::rpc::deserializeRequest(torch::distributed::rpc::Message const&) /pytorch/torch/csrc/distributed/rpc/utils.cpp:111:14
    #5 0x8db738 in LLVMFuzzerTestOneInput /message_deserialize.cc:192:27
    #6 0x8d84cd in ExecuteFilesOnyByOne /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:255:7
    #7 0x8d82d8 in LLVMFuzzerRunDriver /AFLplusplus/utils/aflpp_driver/aflpp_driver.c
    #8 0x8d7e98 in main /AFLplusplus/utils/aflpp_driver/aflpp_driver.c:300:10
    #9 0x7ffff7a37082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
    #10 0x817c4d in _start (/message_deserialize_afl+0x817c4d)

0x604000105388 is located 8 bytes to the left of 48-byte region [0x604000105390,0x6040001053c0)
allocated by thread T0 here:
    #0 0x8d54ca in operator new(unsigned long) /llvm-project-llvmorg-14.0.6/compiler-rt/lib/asan/asan_new_delete.cpp:95:3

SUMMARY: AddressSanitizer: heap-buffer-overflow /pytorch/aten/src/ATen/core/ivalue.h:685:27 in c10::IValue::isString() const
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108414
Approved by: https://github.com/ezyang
2023-09-04 19:32:15 +00:00
48286d34a4 Revert "Break graph on manual_seed. (#107594)"
This reverts commit 6ad5568cbc7122356b58789a1d3bcd16d5faf775.

Reverted https://github.com/pytorch/pytorch/pull/107594 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it has an import issue that breaks internal code ([comment](https://github.com/pytorch/pytorch/pull/107594#issuecomment-1705584405))
2023-09-04 18:00:37 +00:00
e08577aec5 Spelling fix (#108490)
Fixes spelling mistake: non-deterinistic -> non-deterministic
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108490
Approved by: https://github.com/ezyang
2023-09-04 16:59:35 +00:00
51c2e22e94 When byteorder record is missing load as little endian by default (#108343)
Fixes #101688

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108343
Approved by: https://github.com/mikaylagawarecki
2023-09-04 15:20:22 +00:00
7e878c9d10 Add decomposition for aten.take_along_dim (#108185)
xref #107875
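
A minimal sketch of the idea behind such a decomposition (not the exact code added here): broadcast everything except `dim`, then defer to `gather`. Assumes a non-negative `dim`.

```python
import torch

def take_along_dim_sketch(x, indices, dim):
    # Broadcast all dimensions except `dim`, then gather along `dim`.
    x_sizes, i_sizes = list(x.shape), list(indices.shape)
    x_sizes[dim] = i_sizes[dim] = 1
    common = list(torch.broadcast_shapes(tuple(x_sizes), tuple(i_sizes)))
    x_bc = x.broadcast_to(common[:dim] + [x.shape[dim]] + common[dim + 1:])
    idx_bc = indices.broadcast_to(common[:dim] + [indices.shape[dim]] + common[dim + 1:])
    return torch.gather(x_bc, dim, idx_bc)

x = torch.randn(3, 4)
idx = x.argsort(dim=1)
assert torch.equal(take_along_dim_sketch(x, idx, 1), torch.take_along_dim(x, idx, 1))
```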

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108185
Approved by: https://github.com/lezcano
2023-09-04 13:49:53 +00:00
cyy
4146be192e Eliminate c10::guts::to_string (#108480)
This PR replaces c10::guts::to_string with std::to_string. The major part of the changes is using void* as the optimizer state key, since the string is used only for serialization and pointers are more efficient hashing keys than strings.
Some other guts functions in the affected source files are also replaced.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108480
Approved by: https://github.com/Skylion007
2023-09-04 08:12:53 +00:00
06b173780d [dynamo] "TorchDynamo Cache Lookup" event: use C++ api (#108436)
**Background**: "TorchDynamo Cache Lookup" events appear in traces to indicate a dynamo cache lookup; it's useful to check when cache lookups are taking a long time. To add a profiler event, one can use the `torch.profiler.record_function` context manager, or the C++ equivalent. Previously, the python version was used; first, when the profiler was enabled, callbacks for record_function_enter and record_function_exit were registered; then those would be called before and after every cache lookup.

**This PR**: Instead of calling the python bindings for `torch.profiler.record_function`, directly call the C++ implementation. This simplifies a lot of the code for binding C/C++. It also improves performance; previously there was a lot of overhead in the "TorchDynamo Cache Lookup" event, making the event artificially take a long time. After this change the events now appear shorter, because there's less overhead in starting/stopping the event: in other words, the profiler no longer distorts the results as much.

**Performance results**:
I ran using the script below on a cpu-only 1.6GHz machine. I report the median time (from 100 measurements) of a "TorchDynamo Cache Lookup" event before and after this PR. I think it is reasonable to consider the difference to be due to a reduction in overhead.

<details>

<summary>Benchmarking script</summary>

```python
def fn(x, y):
    return (x * y).relu()

a, b = [torch.rand((4, 4), requires_grad=True) for _ in range(2)]

opt_fn = torch.compile(fn)

opt_fn(a, b)
opt_fn(a, b)

with torch.profiler.profile() as prof:
    opt_fn(a, b)
```

</details>

Median before PR: 198-228 us (median of 100, measured 5 times)
Median after PR: 27us

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108436
Approved by: https://github.com/anijain2305, https://github.com/jansel
2023-09-04 04:37:26 +00:00
cyy
621463a3e6 Update libfmt submodule to 10.1.1 (#108431)
This PR updates libfmt to version 10.1.1. We also set the utf-8 source encoding earlier, before including third-party libraries on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108431
Approved by: https://github.com/Skylion007
2023-09-03 23:44:39 +00:00
cyy
ce03b78a8f Simplify c10::string_view implementation (#108479)
Remove unnecessary code in C++17

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108479
Approved by: https://github.com/Skylion007
2023-09-03 17:45:12 +00:00
cyy
aff7fdcb4c Add a missing argument (#108477)
Fix a tiny bug in string formatting.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108477
Approved by: https://github.com/Skylion007
2023-09-03 16:42:27 +00:00
cc50e654d4 [aten decomp] Update sdpa decom (#108371)
Summary:
The earlier decomp routed the _flash* variant to the _math variant, and this
resulted in a failure during torch.export, for some reason that I
couldn't trace.

However, it seems that we should really have a decomp for
scaled_dot_product_attention, instead of
scaled_dot_product_flash_attention. Right?

This diff adds that. It also adds a test to check that a model exported
via two-stage export has the op decomposed. This test needs improvement
to figure out what the core aten opset is and check for anything that is
not inside it.

Test Plan:
test_model_exports_to_core_aten

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D48917461](https://our.internmc.facebook.com/intern/diff/D48917461)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108371
Approved by: https://github.com/larryliu0820
2023-09-03 15:17:08 +00:00
ba9acbebfc [Doc] Update the dynamo deepdive doc (#108147)
With the new tool `depyf` to decompile bytecode into human-readable source code, understanding dynamo becomes much easier.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108147
Approved by: https://github.com/jansel
2023-09-03 13:08:13 +00:00
cyy
7b91f762b6 Use std::filesystem in c10 tempfile and tempdir (#106656)
This PR simplifies c10::TempFile and c10::TempDir. It also deletes Windows temp files in c10::~TempFile; this behavior is absent in the current version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106656
Approved by: https://github.com/ezyang
2023-09-03 13:03:10 +00:00
1b3dc05c3e Use contiguous() to handle noncontiguous outputs during elementwise decomposition (#108140)
Fixes https://github.com/pytorch/pytorch/issues/108218

Use the contiguous() API to handle noncontiguous outputs during elementwise decomposition.

With this change, the op decomposes properly (testcase from the bug):
```
graph():
    %arg0_1 : [#users=3] = placeholder[target=arg0_1]
    %abs_1 : [#users=1] = call_function[target=torch.ops.aten.abs.default](args = (%arg0_1,), kwargs = {})
    %floor : [#users=1] = call_function[target=torch.ops.aten.floor.default](args = (%abs_1,), kwargs = {})
    %sign : [#users=1] = call_function[target=torch.ops.aten.sign.default](args = (%arg0_1,), kwargs = {})
    %mul : [#users=1] = call_function[target=torch.ops.aten.mul.Tensor](args = (%floor, %sign), kwargs = {})
    %sub : [#users=1] = call_function[target=torch.ops.aten.sub.Tensor](args = (%arg0_1, %mul), kwargs = {})
    return (sub,)
```
Output:
```
tensor([[ 0.2871,  0.7189,  0.7297],
        [ 0.8782, -0.4899,  0.7055]], device='hpu:0')
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108140
Approved by: https://github.com/ezyang
2023-09-03 04:32:22 +00:00
e5548f8195 NT support for cat with dim > 0 when representable as jagged (#108428)
Used in SAM
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108428
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #108361, #108370, #108362
2023-09-03 01:50:32 +00:00
76ccf6c770 NT support for narrow() on dim=0 (#108362)
Satisfies request here: https://github.com/pytorch/pytorch/issues/105913#issuecomment-1652249934
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108362
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #108361, #108370
2023-09-02 23:48:37 +00:00
01b662bafe [gen_operators_yaml] add arguments to control include_all_overloads (#108396)
Summary:
In SelectiveBuildOperator, we can specify the argument `include_all_overloads`. If True, all overloaded operators (for example, `aten::to.dtype_layout` and `aten::to.prim_Device` are considered overloads of `aten::to`) will be built and linked into the final binary. This can significantly increase the final binary size, which can be a deal breaker for on-device deployment.

In this diff, we make backward-compatible changes by adding the new arguments `--not-include-all-overloads-static-root-ops` and `--not-include-all-overloads-closure-ops`. When they are set, we set the `include_all_overloads` flag to False for static root ops and closure ops, and rely on the code analyzer to decide the actually-used overloaded operators.

Test Plan:
- unit test
```
buck test //xplat/caffe2/tools:gen_operators_yaml_test
```
- See test plan in D48771544 where we reduce the shared lib file `libmrengine.lib` from 16653072 bytes to 13686032 bytes.
- See detailed document: https://fburl.com/gdoc/mc93h6kb

Reviewed By: larryliu0820

Differential Revision: D48772302

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108396
Approved by: https://github.com/larryliu0820
2023-09-02 17:37:36 +00:00
b9dfdc091b [AOTInductor][Reland] Proxy Executor for Extern Fallback kernels (#107279) (#108350)
Summary:

This is a prototype for running extern fallback kernels with a host side proxy executor.

Sample of generated cpp wrapper call:
```
        at::Tensor buf0;  // output buffer
        void* tensor_args_var_0[] = {&arg0_1, &arg0_1, &arg1_1, &arg0_1, &arg1_1, &buf0};
        int64_t int_args_var_1[] = {81, 81, 7, 7, 7, 81};
        proxy_executor->call_function("buf0", int_args_var_1, tensor_args_var_0);
```

- In my current implementation, the proxy executor interprets the raw pointers according to the op's schema.
This assumes that a custom op MUST have a valid schema registered with the Dispatcher. (I would like to validate this assumption.)
- I am using the callBoxed() API of the custom kernels. This is inevitable, as we wish to have a single call_function API for all possible custom kernels.

- These are all the input argument types I support so far:
       union Argument {
         # Bool value does not matter
         1: bool asNone;
         2: TensorArgument asTensor;
         3: list<TensorArgument> asTensors;
         5: i64 asInt;
         7: list<i64> asInts;
         8: double asFloat;
         9: list<double> asFloats;
         10: string asString;
         10.5: list<string> asStrings;
         11: SymIntArgument asSymInt;
         12: list<SymIntArgument> asSymInts;
         13: ScalarType asScalarType;
         14: MemoryFormat asMemoryFormat;
         15: Layout asLayout;
         16: Device asDevice;
         17: bool asBool;
         18: list<bool> asBools;
       }

- Need a policy for handling unpopulated arguments with default values. Here are the options, and the choice has BC implications:
1. Require the exported fx graph to explicitly populate default values if the user doesn't specify them.
2. Require the cpp wrapper to explicitly populate default values if the fx graph doesn't specify them.
3. Have the proxy executor look up default values from the op schema.

For fixing T162112344

Test Plan:
frontend:
buck2 run mode/dev-sand mode/inplace -c fbcode.enable_gpu_sections=True sigmoid/frontend:export_main

test:
 buck2 run mode/dev-sand //deeplearning/aot_inductor/test:test_custom_ops

backend:
buck2 run mode/dev-nosan //deeplearning/aot_inductor/fb:main

buck2 test 'fbcode//mode/opt' fbcode//caffe2/torch/fb/model_transform/experimental/benchmark/test:test_aot_inductor_benchmark -- --exact 'caffe2/torch/fb/model_transform/experimental/benchmark/test:test_aot_inductor_benchmark - test_aot_inductor_benchmark_cmf30x (caffe2.torch.fb.model_transform.experimental.benchmark.test.test_aot_inductor_benchmark.AOTInductorBenchmark)'

Reviewed By: suo

Differential Revision: D48747417

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108350
Approved by: https://github.com/izaitsevfb
2023-09-02 17:14:10 +00:00
b9fc6d7ded [Dynamo] Update the implementation of _debug_get_cache_entry_list (#108335)
In https://github.com/pytorch/pytorch/pull/106673 , I created a private API `_debug_get_cache_entry_list` to help pull out cache entries from compiled functions.

Recently, I find that @anijain2305 commented in the code that this API should be revisited, and so I created this PR.

First, this API cannot be removed even if the cache entry becomes a first-class python class `torch._C._dynamo.eval_frame._CacheEntry`. The facts that `extra_index` is static and `get_extra_state` is inline static make them inaccessible elsewhere, so `_debug_get_cache_entry_list` is the only way for users to get all the cache entries from a code object.

Second, since the `torch._C._dynamo.eval_frame._CacheEntry` class is a python class, I simplified the C-side code and removed the need to create a namedtuple for this in the python code.

Third, I also added a small improvement: if the argument is a function, we automatically pass its `__code__` to the API.

The above change will slightly change the output, from a list of named tuples to a list of `torch._C._dynamo.eval_frame._CacheEntry` objects. I will update the corresponding docs that use this API.
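
A small usage sketch, assuming the API stays at its private location in `torch._dynamo.eval_frame`:

```python
import torch
from torch._dynamo.eval_frame import _debug_get_cache_entry_list

def fn(x):
    return x + 1

opt_fn = torch.compile(fn)
opt_fn(torch.ones(3))

# The function itself can now be passed; its __code__ is extracted
# automatically, and the entries are _CacheEntry objects.
entries = _debug_get_cache_entry_list(fn)
print(len(entries))  # typically 1 after a single compilation
```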

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108335
Approved by: https://github.com/jansel, https://github.com/anijain2305
2023-09-02 16:38:59 +00:00
de58600126 Improve docs for torch.unique dim argument (#108292)
Fixes #103142
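
For context, the semantics the docs now spell out: with `dim`, entire sub-tensors along that dimension are compared as single elements.

```python
import torch

x = torch.tensor([[1, 2], [1, 2], [3, 4]])
# dim=0 deduplicates whole rows, not individual values
print(torch.unique(x, dim=0))
# tensor([[1, 2],
#         [3, 4]])
```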

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108292
Approved by: https://github.com/albanD
2023-09-02 11:09:09 +00:00
cyy
0cc2f06aec [Reland] Improve MKL related logic in FindOpenMP.cmake (#104224)
Reland of PR #94924. The purpose of this PR is to deal with the complicated interactions between MKL and OpenMP.
There are two improvements:
1. It uses a flag to avoid infinite mutual recursion in calling find_package(MKL) and find_package(OpenMP) in some cases.
2. The logic of finding iomp5 is improved and now we can test  MKLDNN under ASAN.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104224
Approved by: https://github.com/malfet
2023-09-02 07:55:11 +00:00
ffc0c46092 [Quantization] Add metadata porting for nodes added by quantization (#107107)
Summary:
This diff adds metadata to Q-DQ nodes by inferring the
quantization intent from node annotations. Annotations on a node are
the way for the user to specify how a node or subgraph is supposed to be
quantized. We continue to use that information to copy metadata onto Q/DQ
nodes from the appropriate nodes.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D48488416](https://our.internmc.facebook.com/intern/diff/D48488416)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107107
Approved by: https://github.com/jerryzh168
ghstack dependencies: #107105, #107106, #107899, #107900
2023-09-02 06:38:14 +00:00
cyy
d6a9c2b4b5 [BC BREAKING] Remove outdated python submodules (#108236)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108236
Approved by: https://github.com/malfet
2023-09-02 06:24:20 +00:00
eb67c452c8 [Quant] Add DQ duplication pass (#107900)
Summary:
During the convert step, observers are first replaced by a Q-DQ pair. In some
scenarios, like the following, the output DQ has a fan-out:

                 ---> OP2 -> Q -> DQ
                /
OP -> Q -> DQ -
                \
                 ---> OP3 -> Q -> DQ

If either OP2 or OP3 is configured to be quantized, then its input
is expected to be quantized. In that case the quantized equivalent of a
pattern that the quantizer asked to be quantized should look like
[DQ -> {pattern} -> Q]. However, in a scenario like the above, where the DQ node
is shared between multiple "quantized" patterns, the boundary of a "quantized"
pattern is not clear because the DQ now belongs to multiple quantized
patterns.

This poses challenges for:
- Porting metadata: it is unclear which "quantized" partition this DQ node belongs to.
- The quantized representation, which equivalently needs to identify a
self-contained quantized pattern that can be replaced by its equivalent pattern
capturing the compute in the quantized precision.

Test Plan:
test_duplicate_dq_pass

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D48663147](https://our.internmc.facebook.com/intern/diff/D48663147)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107900
Approved by: https://github.com/jerryzh168, https://github.com/andrewor14, https://github.com/leslie-fang-intel
ghstack dependencies: #107105, #107106, #107899
2023-09-02 06:20:03 +00:00
f8d1ca9835 [Quant] Bug fix (#107899)
Summary:
When two layers are quantized differently, the observer map update uses
the key (observed_node, node), whereas it should really be
(original_input, node).

Test Plan:
The next diff adds a test that fails without this fix.

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D48663145](https://our.internmc.facebook.com/intern/diff/D48663145)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107899
Approved by: https://github.com/jerryzh168
ghstack dependencies: #107105, #107106
2023-09-02 06:20:03 +00:00
37b0d76e35 [Quantization] Make annotation util functions return annotated nodes (#107106)
Summary:
Having annotation functions return nodes that are annotated is useful
specifically for adding "quantization_tag" to those nodes

Test Plan:
CI

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D48488415](https://our.internmc.facebook.com/intern/diff/D48488415)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107106
Approved by: https://github.com/jerryzh168
ghstack dependencies: #107105
2023-09-02 06:19:55 +00:00
99168c1fa9 [Quant] Use input_qspec_map for weight quantization of linear (#107105)
Summary:
In preparation for the metadata porting diff, it is required that weight
quant annotation happens via edge quantization, i.e. input_qspec_map.

Reason: metadata is ported by associating a DQ node's metadata with its
consumer and a Q node's metadata with its producer.
Furthermore, such porting must be qualified via user intent, i.e. whether
the consumer of the DQ, or the producer of the Q, actually specified an
intent to quantize.

By making the quantization annotation on the linear node's weight via
input_qspec_map, we can associate the DQ of [weight -> Q -> DQ]
with the linear module.

Test Plan:
CI

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D48488414](https://our.internmc.facebook.com/intern/diff/D48488414)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107105
Approved by: https://github.com/jerryzh168
2023-09-02 06:19:50 +00:00
ab6a86dccd [vision hash update] update the pinned vision hash (#108460)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108460
Approved by: https://github.com/pytorchbot
2023-09-02 03:52:25 +00:00
ed92d9345e Refactorings for constant folding (#108450)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108450
Approved by: https://github.com/jansel
2023-09-02 03:49:05 +00:00
5f5caed25a do not cast all inputs in benchmarks (#108456)
Fixes why stable diffusion is not showing up in the inference dashboard even though it shows up in the training dashboard.

The reason is that stable diffusion in torchbench has a line like `input_tensor = input_tensor.long().to(self.device)`, and if you cast this to bfloat16, inference will fail.

<img width="1705" alt="Screenshot 2023-09-01 at 4 37 49 PM" src="https://github.com/pytorch/pytorch/assets/3282513/ada0d381-1af0-4378-8e8b-2375b39c3713">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108456
Approved by: https://github.com/cpuhrsch
2023-09-02 03:13:17 +00:00
b8af8ac784 [CUDACaching Allocator] Release the allocator lock on the slow path (#108367)
Summary: This diff releases the global allocator lock on the slow path when we make a synchronous cudaMalloc call.

Differential Revision: D48750077

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108367
Approved by: https://github.com/zdevito
2023-09-02 02:52:25 +00:00
4084d039b7 Only add triton dependency to CUDA and ROCm binaries if it hasn't been set as an installation requirement yet (#108424)
The dependency was added twice before in CUDA and ROCm binaries, once as an installation dependency from builder and again as an extra dependency for dynamo, for example:

```
Requires-Python: >=3.8.0
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE
Requires-Dist: filelock
Requires-Dist: typing-extensions
Requires-Dist: sympy
Requires-Dist: networkx
Requires-Dist: jinja2
Requires-Dist: fsspec
Requires-Dist: pytorch-triton (==2.1.0+e6216047b8)
Provides-Extra: dynamo
Requires-Dist: pytorch-triton (==2.1.0+e6216047b8) ; extra == 'dynamo'
Requires-Dist: jinja2 ; extra == 'dynamo'
Provides-Extra: opt-einsum
Requires-Dist: opt-einsum (>=3.3) ; extra == 'opt-einsum'
```

In the previous release, we needed to remove this part from `setup.py` to build release binaries https://github.com/pytorch/pytorch/pull/96010.  With this, that step isn't needed anymore because the dependency will come from builder.

### Testing

Using the draft https://github.com/pytorch/pytorch/pull/108374 for testing and manually inspect the wheels artifact at https://github.com/pytorch/pytorch/actions/runs/6045878399 (don't want to go through all `ciflow/binaries` again)

* torch-2.1.0.dev20230901+cu121-cp39-cp39-linux_x86_64
```
Requires-Python: >=3.8.0
Description-Content-Type: text/markdown
Requires-Dist: filelock
Requires-Dist: typing-extensions
Requires-Dist: sympy
Requires-Dist: networkx
Requires-Dist: jinja2
Requires-Dist: fsspec
Requires-Dist: pytorch-triton (==2.1.0+e6216047b8) <-- This will be 2.1.0 on the release branch after https://github.com/pytorch/builder/pull/1515
Provides-Extra: dynamo
Requires-Dist: jinja2 ; extra == 'dynamo'
Provides-Extra: opt-einsum
Requires-Dist: opt-einsum (>=3.3) ; extra == 'opt-einsum'
```

* torch-2.1.0.dev20230901+cu121.with.pypi.cudnn-cp39-cp39-linux_x86_64
```
Requires-Python: >=3.8.0
Description-Content-Type: text/markdown
Requires-Dist: filelock
Requires-Dist: typing-extensions
Requires-Dist: sympy
Requires-Dist: networkx
Requires-Dist: jinja2
Requires-Dist: fsspec
Requires-Dist: pytorch-triton (==2.1.0+e6216047b8)
Requires-Dist: nvidia-cuda-nvrtc-cu12 (==12.1.105) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cuda-runtime-cu12 (==12.1.105) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cuda-cupti-cu12 (==12.1.105) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cudnn-cu12 (==8.9.2.26) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cublas-cu12 (==12.1.3.1) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cufft-cu12 (==11.0.2.54) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-curand-cu12 (==10.3.2.106) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cusolver-cu12 (==11.4.5.107) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-cusparse-cu12 (==12.1.0.106) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-nccl-cu12 (==2.18.1) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: nvidia-nvtx-cu12 (==12.1.105) ; platform_system == "Linux" and platform_machine == "x86_64"
Requires-Dist: triton (==2.1.0) ; platform_system == "Linux" and platform_machine == "x86_64" <--This is 2.1.0 because it already has https://github.com/pytorch/pytorch/pull/108423, but the package doesn't exist yet atm
Provides-Extra: dynamo
Requires-Dist: jinja2 ; extra == 'dynamo'
Provides-Extra: opt-einsum
Requires-Dist: opt-einsum (>=3.3) ; extra == 'opt-einsum'
```

* torch-2.1.0.dev20230901+rocm5.6-cp38-cp38-linux_x86_64
```
Requires-Python: >=3.8.0
Description-Content-Type: text/markdown
Requires-Dist: filelock
Requires-Dist: typing-extensions
Requires-Dist: sympy
Requires-Dist: networkx
Requires-Dist: jinja2
Requires-Dist: fsspec
Requires-Dist: pytorch-triton-rocm (==2.1.0+34f8189eae) <-- This will be 2.1.0 on the release branch after https://github.com/pytorch/builder/pull/1515
Provides-Extra: dynamo
Requires-Dist: jinja2 ; extra == 'dynamo'
Provides-Extra: opt-einsum
Requires-Dist: opt-einsum (>=3.3) ; extra == 'opt-einsum'
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108424
Approved by: https://github.com/atalman
2023-09-02 01:16:18 +00:00
2e3fce5450 Add dynamo support for rdiv dunder method. (#108422)
Fix: #106646
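
A minimal illustration of the pattern this enables (the reflected division dunder, `__rtruediv__` in Python 3, fires when the tensor is on the right-hand side):

```python
import torch

def fn(x):
    # 2 / x dispatches to the tensor's reflected division dunder,
    # which previously caused a graph break under dynamo.
    return 2 / x

compiled = torch.compile(fn)
print(compiled(torch.full((3,), 4.0)))  # tensor([0.5000, 0.5000, 0.5000])
```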

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108422
Approved by: https://github.com/eellison
2023-09-02 00:59:22 +00:00
fa8edd93b7 [inductor] Handle aten.full's dtype in the decomposition (#108443)
In the lowering we don't have `SymFloat` and `SymInt`, we just have `sympy.Expr`,
so it is impossible to accurately determine the expected dtype of a `full` call.
For example, `sym_float(int_expr)` has `is_integer=True` but should be treated
as a float. In the decomposition, though, we can get this right.
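
A sketch of the kind of case this targets (the exact repro depends on tracing internals):

```python
import torch

@torch.compile(dynamic=True)
def fn(x):
    n = x.shape[0]  # SymInt under dynamic shapes
    # float(n) is a SymFloat whose underlying sympy expression can report
    # is_integer=True; the decomposition must still pick a float dtype.
    return torch.full((2,), float(n))

print(fn(torch.ones(3)).dtype)  # torch.float32
```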

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108443
Approved by: https://github.com/lezcano
2023-09-02 00:53:04 +00:00
2c1f0772d5 Revert "Horizontally fuse input concatenation (#108115)"
This reverts commit 5911faeb8fc3f625f9c3a42e58d45f7b7578ab8a.

Reverted https://github.com/pytorch/pytorch/pull/108115 on behalf of https://github.com/osalpekar due to Broke internal benchmarking job. See [D48890838](https://www.internalfb.com/diff/D48890838) ([comment](https://github.com/pytorch/pytorch/pull/108115#issuecomment-1703546520))
2023-09-02 00:19:00 +00:00
a27f01083d [S362716] turn off constant folding (#108389)
Summary: Constant folding is using a lot of memory and is causing OOMs. Turn it off in fbcode. Also filed an issue https://github.com/pytorch/pytorch/issues/108388

Test Plan: Cloned a failed job and it's working now

Differential Revision: D48871102

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108389
Approved by: https://github.com/eellison
2023-09-01 23:36:48 +00:00
e3933609d4 Make make_fx cond preserve node meta (#108356)
**Motivation:**
Currently, for the following code that exports cond operator:
```python
import torch
from functorch.experimental.control_flow import cond

class MySubModule(torch.nn.Module):
    def foo(self, x):
        return x.cos()

    def forward(self, x):
        return self.foo(x)

class CondBranchClassMethod(torch.nn.Module):

    def __init__(self):
        super().__init__()
        self.subm = MySubModule()

    def bar(self, x):
        return x.sin()

    def forward(self, x):
        return cond(x.shape[0] <= 2, self.subm.forward, self.bar, [x])

from torch._export import capture_pre_autograd_graph

example_inputs = (torch.randn(1, 3, 3, 3),)
m = CondBranchClassMethod()
m.eval()
gm = capture_pre_autograd_graph(m, example_inputs)
print(gm)

# source_fn for original cond op, getattr submodule op are all cond op
for n in gm.graph.nodes:
    print("n:", n.format_node(), n.meta)

print("\n\n\n")
# source_fn for submodule nodes are all cond op
# Expected: ideally this should be the real ops, e.g. torch.sin, aten.cos, etc
for n in gm.submodule_0.graph.nodes:
    print("n:", n.format_node(), n.meta)
```

Output is like below:
```
GraphModule(
  (submodule_0): GraphModule()
  (submodule_1): GraphModule()
)

def forward(self, arg_0):
    arg0_1, = fx_pytree.tree_flatten_spec([arg_0], self._in_spec)
    submodule_0 = self.submodule_0
    submodule_1 = self.submodule_1
    cond = torch.ops.higher_order.cond(True, submodule_0, submodule_1, [arg0_1]);  submodule_0 = submodule_1 = arg0_1 = None
    return pytree.tree_unflatten((cond,), self._out_spec)

# To see more debug info, please use `graph_module.print_readable()`
n: %arg0_1 : [num_users=1] = placeholder[target=arg0_1] {'val': FakeTensor(..., size=(1, 3, 3, 3)), 'tensor_meta': None, 'is_torch_exported': True, 'stack_trace': 'NoneType: None\n'}
n: %submodule_0 : [num_users=1] = get_attr[target=submodule_0] {'stack_trace': 'NoneType: None\n', 'source_fn': ('cond', <torch._ops.HigherOrderOperator object at 0x7f68ae93efd0>), 'original_aten': None, 'from_node': [('cond', <torch._ops.HigherOrderOperator object at 0x7f68ae93efd0>), ('conditional', <torch._ops.HigherOrderOperator object at 0x7f68ae93efd0>), ('cond', <torch._ops.HigherOrderOperator object at 0x7f68ae93efd0>)], 'seq_nr': -1}
n: %submodule_1 : [num_users=1] = get_attr[target=submodule_1] {'stack_trace': 'NoneType: None\n', 'source_fn': ('cond', <torch._ops.HigherOrderOperator object at 0x7f68ae93efd0>), 'original_aten': None, 'from_node': [('cond', <torch._ops.HigherOrderOperator object at 0x7f68ae93efd0>), ('conditional', <torch._ops.HigherOrderOperator object at 0x7f68ae93efd0>), ('cond', <torch._ops.HigherOrderOperator object at 0x7f68ae93efd0>)], 'seq_nr': -1}
n: %cond : [num_users=1] = call_function[target=torch.ops.higher_order.cond](args = (True, %submodule_0, %submodule_1, [%arg0_1]), kwargs = {}) {'stack_trace': 'NoneType: None\n', 'source_fn': ('cond', <torch._ops.HigherOrderOperator object at 0x7f68ae93efd0>), 'original_aten': None, 'from_node': [('cond', <torch._ops.HigherOrderOperator object at 0x7f68ae93efd0>), ('conditional', <torch._ops.HigherOrderOperator object at 0x7f68ae93efd0>), ('cond', <torch._ops.HigherOrderOperator object at 0x7f68ae93efd0>)], 'seq_nr': -1, 'val': FakeTensor(..., size=(1, 3, 3, 3)), 'tensor_meta': None, 'is_torch_exported': True}
n: return (cond,) {'stack_trace': 'NoneType: None\n', 'from_node': [('output', 'output')], 'seq_nr': -1, 'is_torch_exported': True, 'val': (FakeTensor(..., size=(1, 3, 3, 3)),), 'tensor_meta': (None,)}

n: %arg0_1 : [num_users=1] = placeholder[target=arg0_1] {'stack_trace': '  File "<ipython-input-9-2a8c7c0498ed>", line 36, in forward\n    return cond(x.shape[0] <= 2, self.subm.forward, self.bar, [x])\n', 'source_fn': ('cond', <torch._ops.HigherOrderOperator object at 0x7f68ae93efd0>), 'original_aten': None, 'from_node': [('cond', <torch._ops.HigherOrderOperator object at 0x7f68ae93efd0>), ('arg0_1', 'arg0_1')], 'seq_nr': -1, 'val': FakeTensor(..., size=(1, 3, 3, 3)), 'tensor_meta': None}
n: %cos_default : [num_users=1] = call_function[target=torch.ops.aten.cos.default](args = (%arg0_1,), kwargs = {}) {'stack_trace': '  File "<ipython-input-9-2a8c7c0498ed>", line 36, in forward\n    return cond(x.shape[0] <= 2, self.subm.forward, self.bar, [x])\n', 'source_fn': ('cond', <torch._ops.HigherOrderOperator object at 0x7f68ae93efd0>), 'original_aten': <OpOverload(op='aten.cos', overload='default')>, 'from_node': [('cond', <torch._ops.HigherOrderOperator object at 0x7f68ae93efd0>), ('cos', <OpOverload(op='aten.cos', overload='default')>), ('cos_default', <OpOverload(op='aten.cos', overload='default')>)], 'seq_nr': -1, 'val': FakeTensor(..., size=(1, 3, 3, 3)), 'tensor_meta': None}
n: return cos_default {'stack_trace': '  File "<ipython-input-9-2a8c7c0498ed>", line 36, in forward\n    return cond(x.shape[0] <= 2, self.subm.forward, self.bar, [x])\n', 'source_fn': ('cond', <torch._ops.HigherOrderOperator object at 0x7f68ae93efd0>), 'original_aten': None, 'from_node': [('cond', <torch._ops.HigherOrderOperator object at 0x7f68ae93efd0>), ('output', 'output')], 'seq_nr': -1, 'val': FakeTensor(..., size=(1, 3, 3, 3)), 'tensor_meta': None}
```

As we can see, the meta of the nodes in the subgraphs is overridden with the cond node's metadata. This is because the function _set_current_meta is only invoked at the top-level graph module in the interpreter. When we call into cond and deal with the submodules here, we don't set current_meta to the meta of the subgraph's nodes properly.

**Implementation:**
This PR fixes it as follows: in trace_cond, we optionally use an fx.Interpreter to interpret the subgraphs so that the metadata is preserved, but only when the following conditions are satisfied:
- The subgraphs are GraphModules: this is necessary for using the fx.Interpreter.
- The current make_fx has preserve_node_meta turned on (as is the case for capture_pre_autograd_graph).

**Test Plan**
See added tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108356
Approved by: https://github.com/SherlockNoMad
2023-09-01 22:43:55 +00:00
ac42b4ea4d [pt2] Turn on cudagraph tree in fbcode (#108416)
Summary:
cudagraph tree will significantly reduce the memory usage.
Memory consumption wise: {F1081833757}

with cudagraph tree: 65GB
w/o cudagraph tree: 83GB

Differential Revision: D48907239

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108416
Approved by: https://github.com/eellison
2023-09-01 22:39:43 +00:00
ad032a76f3 print equalities (#108427)
Differential Revision: [D48910802](https://our.internmc.facebook.com/intern/diff/D48910802/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108427
Approved by: https://github.com/angelayi
2023-09-01 22:37:22 +00:00
add45aea1c Flash Attention v2 (#105602)
# Summary
## PR Dependencies
I don't use ghstack :( this is a PR where it would have been helpful. That being said, I am going to peel off some PRs to make reviewing this easier:
- [x] Separate build flags for Flash and MemEff: #107985

### Description
This pull request updates the version of _scaled_dot_product_flash_attention from version 1 to version 2. The changes are based on the flash attention code originally authored by @tridao

### Changes Made
The majority of the changes in this pull request involve:

- Copying over the flash_attention sources.
- Updating header files.
- Removing padding and slicing code from within the flash_attention kernel and relocating it to the composite implicit region of the SDPA. This was needed to make the kernel functional and appease autograd.
- Introducing a simple kernel generator to generate different instantiations of the forward and backward flash templates.
- Adding conditional compilation (ifdef) to prevent building when nvcc is invoked with gencode < sm80.
- Introducing a separate dependent option for mem_eff_attention, as flash_attention v2 lacks support for Windows and cannot be built for sm50 generation codes.
- Modifying build.sh to reduce parallelization on sm86 runners and to lower the maximum parallelization on the manywheel builds. This adjustment was made to address out-of-memory issues during the compilation of FlashAttentionV2 sources.
- Adding/Updating tests.

### Notes for Reviewers
This is not a fun review, and I apologize in advance.
Most of the files-changed are in the flash_attn/ folder. The only files of interest here IMO:
- aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp
- aten/src/ATen/native/transformers/cuda/flash_attn/kernels/generate_kernels.py ( this has been incorporated upstream to flash-attention github)

There are a number of files all related to avoiding OOMs in CI/CD. These are typically shell scripts.

### Follow up items
- Include the updates from e07aa036db and 9e5e8bc91e | https://github.com/pytorch/pytorch/issues/108108

### Work Items
- [x] I don't think Windows will be supported for 3.1.0 - Need to update cmake
- [x] Let multi_query/attention pass through and test | UPDATE: I have the fast path implemented here: https://github.com/pytorch/pytorch/pull/106730 but since this will require changes to semantics of math to call repeat_interleave, I think this should be done as a followup.
- [x] Had to drop cutlass back to 3.0.0 to get it to compile. Need to figure out how to upgrade to 3.1.0 and later. Spoke with Tri and he is going to be taking a look. Note: compiling with clang currently errors for the cute headers.
- [x] Update test exercise above codepath
- [x] Still need to disable on seq_len % 128 != 0 for backward( Tri beat me to it a4f148b6ab)
- [x] Add determinism warning to BWD, Tri got to this one as well: 1c41d2b
- [x] Update dispatcher to universally prefer FlashV2
- [x] Update tests to exercise new head_dims
- [x] Move the head_dim padding from kernel to top level composite implicit function in order to make it purely functional
- [x] Create template generator script
- [x] Initial cmake support for building kernels/ folder
- [x] Replay CudaGraph changes

### Results
#### Forward only
The TFLOPs reported here are on an A100 that is underclocked.
![flashv2_tflops_vs_seq_len](https://github.com/pytorch/pytorch/assets/32754868/152de46d-8fa6-42f0-9a9c-ef1eb7ae29e7)

#### Forward+Backward
Ran a sweep and for large compute bound sizes we do see a ~2x performance increase for forw+back.
<img width="1684" alt="Screenshot 2023-07-20 at 3 47 47 PM" src="https://github.com/pytorch/pytorch/assets/32754868/fdd26e07-0077-4878-a417-f3a418b6fb3b">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105602
Approved by: https://github.com/huydhn, https://github.com/cpuhrsch
2023-09-01 22:14:44 +00:00
234f00e1cd [PyTorch][Vulkan] Add a matrix multiplication performance test binary and fix GPU latency measurement (#108266)
Summary:
- Added a new matmul perf test binary as target `pt_vulkan_mm_perf_test_bin`
- Also renamed the existing `vulkan_perf_test_bin` to `vulkan_conv_arithmetic_perf_test_bin` with associated source file name change
- **Fixed the manual time benchmark measurement for both performance binaries, which was not tracking the correct opnames (e.g. checked for runtime of nonexistent "mm" instead of "vulkan.mm")**

Test Plan:
# pt_vulkan_mm_perf_test_bin

- build the matrix multiplication performance test binary
```
~/fbsource »  buck2 build  -c ndk.debug_info_level=0  -c ndk.static_linking=true -c pt.enable_qpl=0 -c pt.vulkan_use_gpu_diagnostics=1 --target-platforms=ovr_config//platform/android:arm32-fbsource //xplat/caffe2:pt_vulkan_mm_perf_test_binAndroid  --show-output  -c pt.vulkan_full_precision=1
[...]
BUILD SUCCEEDED
fbsource//xplat/caffe2:pt_vulkan_mm_perf_test_binAndroid buck-out/v2/gen/fbsource/f1f3f9bed27e143c/xplat/caffe2/__pt_vulkan_mm_perf_test_binAndroid__/pt_vulkan_mm_perf_test_binAndroid
```
- test on arm32 android device
```
~/fbsource » adb push buck-out/v2/gen/fbsource/f1f3f9bed27e143c/xplat/caffe2/__pt_vulkan_mm_perf_test_binAndroid__/pt_vulkan_mm_perf_test_binAndroid /data/local/tmp/
~/fbsource » adb shell /data/local/tmp/pt_vulkan_mm_perf_test_binAndroid
```
- output P817269023, excerpt below
```
Kernel Name              Workgroup Size             Duration (ns)
===========              ==============               ===========
vulkan.nchw_to_image     {500, 500, 1}                    4336072
vulkan.nchw_to_image     {250, 250, 1}                    1106716
vulkan.nchw_to_image     {1, 1, 1}                           7228
vulkan.mm                {250, 250, 1}                  132570256

[...]

vulkan.mm                {250, 250, 1}                   80492152
vulkan.image_to_nchw     {500, 500, 1}                    1420328
-------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                     Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------------------------
mm_benchmark/N:500/M:500/P:500/iterations:5/manual_time/threads:1                         91047 ms          143 ms            5

```

# pt_vulkan_conv_arithmetic_perf_test_bin
- build the convolution and arithmetic performance test binary
```
~/fbsource »  buck2 build  -c ndk.debug_info_level=0  -c ndk.static_linking=true -c pt.enable_qpl=0 -c pt.vulkan_use_gpu_diagnostics=1 --target-platforms=ovr_config//platform/android:arm32-fbsource //xplat/caffe2:pt_vulkan_conv_arithmetic_perf_test_binAndroid  --show-output  -c pt.vulkan_full_precision=1
[...]
BUILD SUCCEEDED
fbsource//xplat/caffe2:pt_vulkan_conv_arithmetic_perf_test_binAndroid buck-out/v2/gen/fbsource/f1f3f9bed27e143c/xplat/caffe2/__pt_vulkan_conv_arithmetic_perf_test_binAndroid__/pt_vulkan_conv_arithmetic_perf_test_binAndroid
```
- test on arm32 android device
```
~/fbsource » adb push buck-out/v2/gen/fbsource/f1f3f9bed27e143c/xplat/caffe2/__pt_vulkan_conv_arithmetic_perf_test_binAndroid__/pt_vulkan_conv_arithmetic_perf_test_binAndroid /data/local/tmp/
~/fbsource » adb shell /data/local/tmp/pt_vulkan_conv_arithmetic_perf_test_binAndroid
2023-07-20T20:23:26+00:00

```
- output P817267332, excerpt below

```
Kernel Name              Workgroup Size             Duration (ns)
===========              ==============               ===========
vulkan.add               {193, 221, 30}                  39475696
vulkan.image_to_nchw     {193, 221, 30}                  13463424
vulkan.add               {193, 221, 30}                  72950176
vulkan.image_to_nchw     {193, 221, 30}                  17792684

[...]

vulkan.add               {193, 221, 30}                  72986368
vulkan.image_to_nchw     {193, 221, 30}                  15921672
----------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                  Time             CPU   Iterations
----------------------------------------------------------------------------------------------------------------------------
add_op_benchmark/N:3/C:40/H:221/W:193/iterations:100/manual_time/threads:1             73242 ms          602 ms          100
libc++abi: terminating due to uncaught exception of type c10::Error: Copy of vulkan quantized tensors to cpu is currently disabled!
```

Reviewed By: yipjustin

Differential Revision: D48798710

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108266
Approved by: https://github.com/manuelcandales
2023-09-01 22:11:35 +00:00
8f02884569 add Half support for GroupNorm on CPU (#100234)
### Testing
Single socket (28cores):

* Contiguous:

shape | forward fp32 / s | forward mixed fp32/fp16 / s | backward fp32 / s | backward mixed fp32/fp16 / s
-- | -- | -- | -- | --
[10,   128, 10, 10] | 2.45E-05 | 3.26E-05 | 6.87E-05 | 7.40E-05
[10,   128, 80, 80] | 0.000726 | 0.000606 | 0.002183 | 0.001112

* Channels Last:

shape | forward fp32 / s | forward mixed fp32/fp16 / s | backward fp32 / s | backward mixed fp32/fp16 / s
-- | -- | -- | -- | --
[10,   128, 10, 10] | 2.88E-05 | 2.72E-05 | 6.56E-05 | 6.63E-05
[10,   128, 80, 80] | 0.00076 | 0.000256 | 0.002385 | 0.000735

Single core:

* Contiguous:

shape | forward fp32 / s | forward mixed fp32/fp16 / s | backward fp32 / s | backward mixed fp32/fp16 / s
-- | -- | -- | -- | --
[10,   128, 10, 10] | 9.47E-05 | 1.90E-04 | 2.03E-04 | 3.10E-04
[10,   128, 80, 80] | 6.25E-03 | 8.98E-03 | 0.016485 | 0.01369

* Channels Last:

shape | forward fp32 / s | forward mixed fp32/fp16 / s | backward fp32 / s | backward mixed fp32/fp16 / s
-- | -- | -- | -- | --
[10,   128, 10, 10] | 8.66E-05 | 7.89E-05 | 1.95E-04 | 1.43E-04
[10,   128, 80, 80] | 5.97E-03 | 3.13E-03 | 0.01626 | 8.70E-03

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100234
Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki
2023-09-01 21:25:24 +00:00
54dcb0ea61 NT support for matmul of (B, *, C, D) NT with dense (D, E) (#108370)
Used in SAM
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108370
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #108361
2023-09-01 20:45:44 +00:00
a78b78cd76 [DTensor][random] add DTensor constructor: randn (#108285)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108285
Approved by: https://github.com/wanchaol
2023-09-01 20:28:41 +00:00
c67ebae344 Put logging in run_tests (#107987)
Logging about which tests run serially vs. in parallel, and which tests actually get run on each shard, was removed; it can be pretty helpful, so this adds it back in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107987
Approved by: https://github.com/huydhn, https://github.com/Neilblaze
2023-09-01 20:23:30 +00:00
29f17e1f14 Fix full on symbolic value. (#108166)
Fix: #108067

This PR adds checks for `sympy.Expr` when extracting the dtype from a value inside the
`full` lowering.
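
A minimal sketch of the kind of check this adds, with a hypothetical helper `infer_fill_dtype` (the real lowering lives in torch/_inductor and differs in detail):

```python
# Hypothetical helper: symbolic fill values need an explicit branch,
# since their Python type is sympy.Expr rather than int/float/bool.
import sympy
import torch

def infer_fill_dtype(value):
    if isinstance(value, sympy.Expr):
        # Pick the dtype from the symbolic kind, not from type(value).
        return torch.int64 if value.is_integer else torch.float32
    if isinstance(value, bool):
        return torch.bool
    if isinstance(value, int):
        return torch.int64
    return torch.float32
```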

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108166
Approved by: https://github.com/lezcano
2023-09-01 20:16:40 +00:00
fc1c862e62 [export] Properly handle duplicated params. (#108415)
Test Plan:
python benchmarks/dynamo/huggingface.py --bfloat16 --accuracy --inference --device cuda --export --only BertForMaskedLM

python benchmarks/dynamo/huggingface.py --bfloat16 --accuracy --inference --device cuda --export-aot-inductor --only BertForMaskedLM

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108415
Approved by: https://github.com/angelayi
2023-09-01 19:44:32 +00:00
2d9a828900 enabled AT_USE_JITERATOR() for tan and tanh kernels. (#102427)
This PR fixes the test failures for the `jiterator` implementation of the `tan` and `tanh` unary kernels, as mentioned in #100842.

The failures were fixed by adjusting tolerances, but some failures in `test_unary_ufuncs.py` required adjusting input values as well. Since the jiterator kernels use libstdc++, the supported input range is smaller than in the thrust implementation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102427
Approved by: https://github.com/malfet
2023-09-01 19:16:10 +00:00
6ba2b6e147 [ONNX] Show sarif_report_path (#108398)
`sarif_report_path` was not formatted correctly in the error message

@BowenBao

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108398
Approved by: https://github.com/thiagocrepaldi
2023-09-01 19:11:52 +00:00
e58d3ed81d [inductor] Generalize pointless_cumsum_replacement pattern (#108373)
The current pattern transforms:
```
ones([x, y]).cumsum(1) -> arange(1, 1 + y).expand([x, y])
```
but this generalizes it to
```
full(shape, fill_value).cumsum(d) ->
    (fill_value * arange(1, 1 + shape[d])).view([1..., shape[d], 1...]).expand(shape)
```

So we handle any fill value, any number of dimensions, and broadcasting to any dimension.
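
A quick, self-contained check of the generalized rewrite in plain PyTorch (an illustration of the equivalence, not the inductor pattern code):

```python
import torch

shape, fill_value, d = (2, 3, 4), 2.5, 1
lhs = torch.full(shape, fill_value).cumsum(d)

# Build the broadcastable view shape: 1 everywhere except dim d.
view_shape = [1] * len(shape)
view_shape[d] = shape[d]
rhs = (fill_value * torch.arange(1, 1 + shape[d], dtype=lhs.dtype))
rhs = rhs.view(view_shape).expand(shape)

assert torch.allclose(lhs, rhs)
```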

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108373
Approved by: https://github.com/lezcano
2023-09-01 17:12:09 +00:00
0f1a225f33 [CI] Enable max-autotune for Sunday dashboard run (#108386)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108386
Approved by: https://github.com/huydhn
2023-09-01 14:55:24 +00:00
2a6ef9b04d [dynamo] Avoid recompilation when the PyTorch function accepts scalars (#108162)
Before, it would create a 0D tensor from the input, which would incur a guard and specialisation.

It's not clear whether the guard and specialisation are the right behaviour when we create 0D tensors, but that's a story for another day.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108162
Approved by: https://github.com/ev-br, https://github.com/peterbell10
2023-09-01 14:35:42 +00:00
591cb776af [FSDP][state_dict][optim_state_dict] Log slow optim and model state_dict paths (#108290)
This PR adds SimpleProfiler for FSDP state_dict/load_state_dict logging purpose. SimpleProfiler use class variables to record profiling results and it does everything in the Python which can be slow. So it is only suitable for logging slow actions such as initialization and state_dict/load_state_dict.

This PR uses SimpleProfiler to log some critical/slow paths of the model and optimizer state_dict/load_state_dict.
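
A minimal sketch of the pattern described above (illustrative only; FSDP's actual SimpleProfiler differs in detail): class-level state plus a context manager, cheap enough for rare, slow paths like state_dict.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class SimpleProfiler:
    # Class variables record the results, as described above.
    results = defaultdict(float)

    @classmethod
    @contextmanager
    def profile(cls, key):
        start = time.monotonic()
        try:
            yield
        finally:
            cls.results[key] += time.monotonic() - start

# e.g. with SimpleProfiler.profile("load_state_dict"): model.load_state_dict(sd)
```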

Differential Revision: [D48774406](https://our.internmc.facebook.com/intern/diff/D48774406/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108290
Approved by: https://github.com/wz337
2023-09-01 06:57:59 +00:00
db63bf3d7e [NCCL][CUDA][CUDA Graphs] Flush enqueued work before starting a graph capture (#104487)
#103503 addresses the situation where additional work is enqueued for the NCCL watchdog to poll during a graph capture---something we want to avoid as the subsequent polling will query an event and crash the capture.

However, there is currently no check that there was not work _already_ enqueued for the watchdog to poll. If there was already work that was enqueued and not cleaned up before the start of a graph capture, then we run into a similar problem where the event query will crash the capture. We've observed this causing crashes on several workloads, although it is somewhat flaky (if the watchdog happens to poll just before the graph capture and cleanup, then we dodge the crash).

This is a bit of a tricky issue as it involves making sure that no process group has enqueued work at the start of a capture, and as such the simplest solution is to add a bit of global state to track all process groups. This PR forces the start of the graph capture to wait until all enqueued work is completed and cleaned up or times out.

I did consider the alternative of simply having the watchdog skip cleanup if we detect that we are in the middle of a graph capture, but I think deferring the cleanup until later could result in false positive timeouts (e.g., if we defer cleanup on work that has completed long ago, checking timers after the graph capture could yield a "timeout").

CC @Aidyn-A
@bottler @kwen2501 @ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104487
Approved by: https://github.com/kwen2501
2023-09-01 05:42:08 +00:00
4a9c6f1b73 [PyPer][BE] Fix test_scripted_module in StatCollector (#108232)
Summary: D41985889 removed the cast to int for the inputs to torch.histc below, allowing the inputs to still be tensors. These tensors still have require_grad_ set to True, causing issues with the call to torch.histc.

Test Plan: buck2 test 'fbcode//mode/opt' fbcode//dper3/dper3/modules/low_level_modules/tests:stat_collector_test -- --exact 'dper3/dper3/modules/low_level_modules/tests:stat_collector_test - test_scripted_module (dper3.dper3.modules.low_level_modules.tests.stat_collector_test.StatCollectorTest_1)'

Differential Revision: D48800879

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108232
Approved by: https://github.com/jerryzh168
2023-09-01 04:23:57 +00:00
d96446b9c2 [export] Fix duplicated params for AOTInductor. (#108354)
Test Plan:
python benchmarks/dynamo/huggingface.py --bfloat16 --accuracy --inference --device cuda --export --only BertForMaskedLM

python benchmarks/dynamo/huggingface.py --bfloat16 --accuracy --inference --device cuda --export-aot-inductor --only  BertForMaskedLM

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108354
Approved by: https://github.com/angelayi, https://github.com/desertfire
2023-09-01 03:18:49 +00:00
e18f512b81 Update accuracy checking for nan, floats (#108202)
Fixes inference accuracy for `doctr_reco_predictor` and `pyhpc_turbulent_kinetic_energy`.

For the `same(float, float)` comparison we weren't going through the more rigorous tensor comparison path, which takes the fp64 baseline results into account.

Also return True when the fp64 baseline result is not well formed (NaN).
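
A minimal sketch of the two fixes with a hypothetical `same_scalar` helper (the real `same` in the benchmark harness handles many more cases):

```python
import math
import torch

def same_scalar(res, fp64_ref, tol=1e-4):
    # Fix 2: an ill-formed (NaN) fp64 baseline cannot arbitrate, so pass.
    if isinstance(fp64_ref, float) and math.isnan(fp64_ref):
        return True
    # Fix 1: route float/float through the rigorous tensor comparison
    # path so the fp64 baseline is taken into account.
    return torch.allclose(
        torch.tensor(res, dtype=torch.float64),
        torch.tensor(fp64_ref, dtype=torch.float64),
        atol=tol, rtol=tol,
    )
```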

I debugged these models and the sources of divergence were innocuous:
`doctr_reco_predictor` - can be fixed by turning off layout optimization and the decomp for batch norm

`pyhpc_turbulent_kinetic_energy` - divergence caused by the fused kernel keeping precision in fp32 instead of casting back and forth between fp32 and bf16. The fused kernel has better precision anyway.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108202
Approved by: https://github.com/jansel
2023-09-01 02:54:01 +00:00
90ef3b82d1 [DeviceMesh] Add unique mesh_dim_name check in init_device_mesh() (#108326)
Each mesh_dim_name in mesh_dim_names needs to be unique. This PR adds that check when calling init_device_mesh().
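
A minimal sketch of the check (illustrative; the exact init_device_mesh code differs):

```python
def _check_unique(mesh_dim_names):
    # Duplicate names would make dims ambiguous when looked up by name.
    if len(set(mesh_dim_names)) != len(mesh_dim_names):
        raise RuntimeError(
            f"Each mesh_dim_name must be unique, got: {mesh_dim_names}"
        )
```
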
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108326
Approved by: https://github.com/wanchaol
2023-09-01 02:14:18 +00:00
3702980717 dynamo: trace autograd.Function with tensor subclass input (#108093)
Summary:

Enables dynamo eager mode tracing for the following situation:
1. we have a torch.autograd.Function
2. the input to that function is a tensor subclass which is an intermediary

This is useful for float8 training UX.

Test Plan:

```
python test/dynamo/test_autograd_function.py -k intermediary_input
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108093
Approved by: https://github.com/bdhirsh, https://github.com/wanchaol
2023-09-01 02:12:38 +00:00
414cb26ded NT support for cat with dim=0 (#108361)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108361
Approved by: https://github.com/cpuhrsch
2023-09-01 02:02:53 +00:00
a9fe0b5b74 [quant][pt2e] Move propagate_annotation from quant flow to quantizer (#108320)
Summary:
Previously we ran propagate_annotation by default in the quantization flow to propagate annotations for ops like reshape, view, etc.

Not all quantizers need this, so we moved it to xnnpack_quantizer_utils for now.

Next Step:
* make propagate_annotation function configurable with a custom list of ops
* remove unneeded ops in `_is_share_obs_or_fq_op`

Test Plan:
python test/test_quantization.py TestQuantizePT2E

Differential Revision: [D48856985](https://our.internmc.facebook.com/intern/diff/D48856985)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108320
Approved by: https://github.com/kimishpatel
2023-09-01 01:49:19 +00:00
ab5b4c4419 Revert "[HSDP] Add device_mesh to FSDP and add dtensor state_dict support for HSDP (#107533)"
This reverts commit cc220e45a80d7c01a4a58b0f386ca07236a6927a.

Reverted https://github.com/pytorch/pytorch/pull/107533 on behalf of https://github.com/huydhn due to Sorry for reverting this, but it is failing in trunk with the same failure on test_dynamo_distributed cc220e45a8 ([comment](https://github.com/pytorch/pytorch/pull/107533#issuecomment-1701983247))
2023-09-01 01:26:30 +00:00
8289ad8e5e Support is_mtia attribute. (#108307) (#108310)
Summary:

FBGEMM uses `self.iter.is_cuda` to check if the tensor is for CUDA. This diff enables a similar feature, `self.iter.is_mtia`, for tensors with the MTIA device key.

Test Plan: See diff D48693225

Reviewed By: jackm321

Differential Revision: D48809191

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108310
Approved by: https://github.com/albanD
2023-09-01 01:25:40 +00:00
d569e506ab Revert "Flash Attention v2 (#105602)"
This reverts commit 9df3d882c8fe1e57914315aa250664ad5003d4fd.

Reverted https://github.com/pytorch/pytorch/pull/105602 on behalf of https://github.com/huydhn due to I think we miss a case here for sm80 build on inductor workflow as it is now OOM on trunk https://github.com/pytorch/pytorch/actions/runs/6042843139 ([comment](https://github.com/pytorch/pytorch/pull/105602#issuecomment-1701974862))
2023-09-01 01:15:01 +00:00
ee0e04ac48 Allow float dtype when Autocast CPU Disabled (#107348)
**Summary**
Fix https://github.com/pytorch/pytorch/issues/100565 by allowing the float32 data type when Autocast CPU is disabled. The current behavior is:
- When autocast is disabled and the user passes in the float data type, it works well.
- When autocast is enabled and the user passes in the float data type, a warning is thrown (`UserWarning: In CPU autocast, but the target dtype is not supported. Disabling autocast.`) and autocast is disabled automatically. A usage sketch follows below.
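
With this fix, a sketch like the following runs without the warning (assuming a CPU build; the op is just a placeholder):

```python
import torch

x = torch.randn(4, 4)
# Autocast disabled + float32 target: now allowed, no warning raised.
with torch.autocast(device_type="cpu", dtype=torch.float32, enabled=False):
    y = torch.mm(x, x)
```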

**TestPlan**

```
python -u -m pytest -s -v test_autocast.py -k test_autocast_disabled_with_fp32_dtype
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107348
Approved by: https://github.com/jgong5, https://github.com/Neilblaze, https://github.com/albanD
2023-09-01 00:49:44 +00:00
6c342ec368 Revert PR-107951 to only support new graph capture API in Quantization (#108317)
**Summary**
Revert the changes in https://github.com/pytorch/pytorch/pull/107951 to make the utils function only support graphs captured by `capture_pre_autograd_graph`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108317
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
ghstack dependencies: #108214
2023-09-01 00:47:10 +00:00
fb808c30c7 x86_inductor_quantizer switches to new graph capture API (#108214)
**Summary**
Update `X86InductorQuantizer` and the related test cases to the new graph capture API `capture_pre_autograd_graph`.
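
A minimal usage sketch of the new capture API with the quantizer (import paths as of this change; the model and inputs are placeholders):

```python
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
import torch.ao.quantization.quantizer.x86_inductor_quantizer as xiq

m = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 3, 16, 16),)

gm = capture_pre_autograd_graph(m, example_inputs)  # new capture API
quantizer = xiq.X86InductorQuantizer().set_global(
    xiq.get_default_x86_inductor_quantization_config()
)
gm = prepare_pt2e(gm, quantizer)
gm(*example_inputs)  # calibration
gm = convert_pt2e(gm)
```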

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108214
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2023-09-01 00:43:45 +00:00
aadd86b1e8 [DCP]Add unit test for tp checkpoint (#108286)
As title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108286
Approved by: https://github.com/wanchaol, https://github.com/fduwjj
2023-09-01 00:30:13 +00:00
63eee52ba7 Add Drq to BF16 Higher Tolernace (#108368)
This passes for me on aws gpu but not devgpu, and was already in the `REQUIRE_HIGHER_FP16_TOLERANCE` set.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108368
Approved by: https://github.com/shunting314
2023-09-01 00:29:27 +00:00
9178deedff removing some redundant str splits (#106089)
Drop some redundant string splits; no functional changes, just cleaning up the codebase.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106089
Approved by: https://github.com/albanD, https://github.com/malfet
2023-09-01 00:22:58 +00:00
cc220e45a8 [HSDP] Add device_mesh to FSDP and add dtensor state_dict support for HSDP (#107533)
This PR:
1) Adds a device_mesh kwarg to FSDP and removes init_device_mesh() from _runtime_utils.py, as the device_mesh is now passed in by the user as a kwarg.
2) Changes the use_dtensor flag for state_dict_config and optim_state_dict_config to be private. If device_mesh is used with a sharded model/optim state dict, the _use_dtensor flag is set to True and the model/optim state dict returns a DTensor state_dict; otherwise, the _use_dtensor flag is set to False and a sharded_tensor state_dict is returned.
3) Updates _optim_utils.py, _shard_utils.py, and _state_dict_utils.py to add support for HSDP to return a 2D DTensor state_dict.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107533
Approved by: https://github.com/fegin, https://github.com/awgu, https://github.com/wanchaol
2023-09-01 00:15:00 +00:00
a29b9101fa [dynamo] fix dynamo + DTensor to work with 2d (#108329)
Pair-debugged with @wconstab; we found issues on both the dynamo side and the TP's FSDP extension side. This PR fixes the dynamo + DTensor integration so that the current graph-break FSDP can work with tensor parallel by moving the torch.compile call to after FSDP wrapping.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108329
Approved by: https://github.com/Skylion007, https://github.com/wconstab
2023-08-31 22:46:26 +00:00
eafc05887f [dtensor] fix two more requires_grad callsite (#108358)
redistribute returns a new DTensor, and the returned DTensor should follow the input DTensor's requires_grad instead of the input's local tensor's requires_grad.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108358
Approved by: https://github.com/fduwjj
2023-08-31 22:25:40 +00:00
3e75fd06e2 Pin pandas version for inductor Docker image (#108355)
Building docker in trunk is failing atm https://github.com/pytorch/pytorch/actions/runs/6033657019/job/16370683676 with the following error:

```
+ conda_reinstall numpy=1.24.4
+ as_jenkins conda install -q -n py_3.10 -y --force-reinstall numpy=1.24.4
+ sudo -E -H -u jenkins env -u SUDO_UID -u SUDO_GID -u SUDO_COMMAND -u SUDO_USER env PATH=/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64 conda install -q -n py_3.10 -y --force-reinstall numpy=1.24.4
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... unsuccessful initial attempt using frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... unsuccessful initial attempt using frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  - numpy=1.24.4

Current channels:

  - https://repo.anaconda.com/pkgs/main/linux-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/linux-64
  - https://repo.anaconda.com/pkgs/r/noarch
```

This was pulled in by pandas 2.1.0 released yesterday https://pypi.org/project/pandas/2.1.0
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108355
Approved by: https://github.com/kit1980, https://github.com/atalman, https://github.com/malfet
2023-08-31 21:58:40 +00:00
bae409388c [MPS] Fix .item() for multi-dim scalar (#107913)
By refactoring `_local_scalar_dense_mps` to use `_empty_like` to allocate CPU tensor.
Also, print a more reasonable error message when dst dim is less than src in mps_copy_

This fixes regression introduced by https://github.com/pytorch/pytorch/pull/105617 and adds regression test.

Fixes https://github.com/pytorch/pytorch/issues/107867

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107913
Approved by: https://github.com/albanD
2023-08-31 21:08:29 +00:00
5b6ba4110b Fallback to eager for float8 ops in inductor (#108293)
# Summary
As a stopgap to supporting the FP8 dtype within inductor, we would like to fall back to eager. Currently there are 3 ops that are needed for this:
`_scaled_mm` ( matmul for fp8 types)
`clone` (for creating new copies of fp8 tensors)
`to` ( for converting to and from fp8 types).

This PR registers a fallback for `_scaled_mm` and adds fp8 to the dtypes that trigger `unsupported_input_tensor`.

Prior to these changes this was failing with:
``` Shell
  File "/home/drisspg/meta/pytorch/torch/_inductor/codegen/triton_utils.py", line 11, in signature_of
    tye = JITFunction._type_of(arg.dtype)
  File "/home/drisspg/miniconda3/envs/dev/lib/python3.10/site-packages/triton/runtime/jit.py", line 229, in _type_of
    return key if isinstance(key, str) else f"*{tys[dtype_str]}"
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
KeyError: 'float8_e4m3fn'
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108293
Approved by: https://github.com/peterbell10
2023-08-31 20:48:18 +00:00
49df1de383 Cudagraphs support for compiled optimizers (#107504)
Marks all params/optimizer state as static addresses and adds a finalizer which cleans up the graph attributes when the optimizer goes out of scope.

**Note:** this does not mark grads as static because that would increase memory usage significantly.

There are two cases:
1. The upstream graph is cudagraphed - this case will work fine OOTB
2. The upstream graph is not cudagraphed - in this case, there will be a lot of copies introduced from the upstream (to copy the grads) into cudagraphed-owned memory, unless the user explicitly marks the grads as static. If the user does this, this will also require not deallocating the grads in zero_grad() (either the mod or optimizer version) by setting them to zero vs None. There is a PR (https://github.com/pytorch/pytorch/pull/107853) in flight to throw an error if zero_grad attempts to set static grads to None.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107504
Approved by: https://github.com/eellison
2023-08-31 20:47:18 +00:00
d5ff8ca4ef Relax divisibility by 16 for leading dimension of mat1 in scaled_gemm (#108308)
# Summary
cuBLASLt requires that the matrices be 16-byte aligned. If mat1.size(-1) % 16 == 0 and the matrix is row major, then the leading dimension can be any value. See this comment: https://github.com/pytorch/pytorch/pull/107341#discussion_r1310934737

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108308
Approved by: https://github.com/eqy, https://github.com/vkuzo
2023-08-31 20:31:47 +00:00
aeb4d6d5c5 Fix constant folding of arithmetic operations with symbolic values. (#108160)
Partial fix: #108067

This PR fixes an inductor bug where it assumed the types of arithmetic nodes' arguments were all `Tensor`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108160
Approved by: https://github.com/lezcano
2023-08-31 20:26:35 +00:00
eb8659fe81 pass inference accuracy check for detectron2_fcos_r_50_fpn (#108328)
We need a higher tolerance to pass the inference accuracy check for detectron2_fcos_r_50_fpn.

Command:
```
python benchmarks/dynamo/torchbench.py --backend inductor --bfloat16 --accuracy --only detectron2_fcos_r_50_fpn --disable-cudagraphs --inference
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108328
Approved by: https://github.com/jansel
2023-08-31 20:21:20 +00:00
95f268e426 Add examples for nn.CosineEmbeddingLoss (#108215)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108215
Approved by: https://github.com/mikaylagawarecki
2023-08-31 20:01:24 +00:00
f8c93df2d1 Fix boolean tensor for map (#108289)
torch.empty_strided is able to create a new tensor based on the metadata. For boolean tensors, we call clone directly; however, if the input is a functional tensor we'll get a functional tensor back, and that functional tensor won't be tracked by the tracer's tensor_tracker after dispatching, so it becomes a tensor_constant in the graph in create_arg. So we manually unwrap the functional tensor before calling clone.

Test Plan:
See added test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108289
Approved by: https://github.com/angelayi
2023-08-31 19:17:28 +00:00
46f0d17498 Change to torch.ops.higher_order.cond in verifier (#108302)
We need to match against torch.ops.higher_order.cond in verifier.

Test Plan:
 added test to guard against change.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108302
Approved by: https://github.com/angelayi
2023-08-31 19:12:07 +00:00
74ff028839 [dtensor] fix new_empty_strided op (#107835)
This PR fixes the new_empty_strided op to become replicate from sharding when necessary; this is a quick fix to resolve https://github.com/pytorch/pytorch/issues/107661

We'll need to think more about the behavior of this op when it comes to sharding. One possibility is to follow the input sharding, but given that the output shape of this op might not be the same as the input's, it's hard to say we should follow the input sharding; further improvement is needed once we figure out the op syntax.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107835
Approved by: https://github.com/fduwjj
2023-08-31 18:27:35 +00:00
46cd2fef3f Create empty host tensor for MTIA device type. (#108198)
Summary: Before copying a tensor from CPU memory to device memory, the MTIA device doesn't need the host memory to be pinned first.

Test Plan: See diff D48761820

Reviewed By: jackm321

Differential Revision: D48456471

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108198
Approved by: https://github.com/cx-yin, https://github.com/fduwjj
2023-08-31 18:12:59 +00:00
dabdb97087 [Dynamo] Graph break on functions using tensor out variants (#108182)
Fixes #108021

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108182
Approved by: https://github.com/eellison
2023-08-31 17:49:14 +00:00
877561f388 Enable Mypy Checking in torch/_inductor/dependencies.py (#107675)
Fixes #105230

Summary:

As suggested in https://github.com/pytorch/pytorch/issues/105230 mypy checking is enabled in torch/_inductor/dependencies.py

After Fix:

mypy --follow-imports=skip torch/_inductor/dependencies.py Success: no issues found in 1 source file

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107675
Approved by: https://github.com/jansel
2023-08-31 17:36:43 +00:00
2e1e7ed610 Revert "Fallback to eager for float8 ops in inductor (#108293)"
This reverts commit 98aa3745c258827cde8d081d0713ba2cd67c864e.

Reverted https://github.com/pytorch/pytorch/pull/108293 on behalf of https://github.com/huydhn due to Sorry for reverting your change, it is failing on ROCM 98aa3745c2 ([comment](https://github.com/pytorch/pytorch/pull/108293#issuecomment-1701446105))
2023-08-31 17:21:20 +00:00
335767e7da Raise an error for unsupported ctx managers (#108272)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108272
Approved by: https://github.com/anijain2305
2023-08-31 17:20:36 +00:00
5727b07ac6 TD: logging bugfix (#108288)
Fix bug where logging metrics don't get emitted unless the 'keep-going' label is specified on the PR

Also adds some extra logging to make debugging easier
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108288
Approved by: https://github.com/Skylion007
2023-08-31 16:51:49 +00:00
06d74e6b24 Revert "[AOTInductor] Include constants in AOTInductor .so file. (#10… (#108349)
This reverts commit c3239442a3dd1040b251ff33bef40589cba40e1c due to internal test failures.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108349
Approved by: https://github.com/aakhundov, https://github.com/zhxchen17
2023-08-31 16:26:02 +00:00
01dfa7620d MAINT: np.unique works with f16 directly (#108228)
(follow up on gh-107768)

Remove a f16->f32 workaround from np.unique, since torch.unique and np.unique seem to just work with float16 tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108228
Approved by: https://github.com/lezcano
2023-08-31 16:21:13 +00:00
cbf7c91883 inductor: make fallback for cpu scatter_add (#108220)
For the inductor CPU backend, scatter_add uses `atomic_add`, which gives worse performance. For now, we fall back to eager for it to avoid a performance regression compared with eager mode (single socket of SKX):
```
basic_gnn_gin 1.16x(after) Vs 0.509x(before)

basic_gnn_sage  1.064x(after) Vs 0.496x (before)

basic_gnn_gcn 1.373x(aftre) Vs 0.720x(before)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108220
Approved by: https://github.com/jgong5, https://github.com/desertfire
2023-08-31 16:11:07 +00:00
9df3d882c8 Flash Attention v2 (#105602)
# Summary
## PR Dependencies
I don't use ghstack :( and this is a PR where it would have been helpful. That being said, I am going to peel off some PRs to make reviewing this easier:
- [x] Separate build flags for Flash and MemEff: #107985

### Description
This pull request updates the version of _scaled_dot_product_flash_attention from version 1 to version 2. The changes are based on the flash attention code originally authored by @tridao

### Changes Made
The majority of the changes in this pull request involve:

- Copying over the flash_attention sources.
- Updating header files.
- Removing padding and slicing code from within the flash_attention kernel and relocating it to the composite implicit region of the SDPA. This was needed to make the kernel functional and appease autograd.
- Introducing a simple kernel generator to generate different instantiations of the forward and backward flash templates.
- Adding conditional compilation (ifdef) to prevent building when nvcc is invoked with gencode < sm80.
- Introducing a separate dependent option for mem_eff_attention, as flash_attention v2 lacks support for Windows and cannot be built for sm50 generation codes.
- Modifying build.sh to reduce parallelization on sm86 runners and to lower the maximum parallelization on the manywheel builds. This adjustment was made to address out-of-memory issues during the compilation of FlashAttentionV2 sources.
- Adding/Updating tests.

### Notes for Reviewers
This is not a fun review, and I apologize in advance.
Most of the files-changed are in the flash_attn/ folder. The only files of interest here IMO:
- aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp
- aten/src/ATen/native/transformers/cuda/flash_attn/kernels/generate_kernels.py ( this has been incorporated upstream to flash-attention github)

There are a number of files all related to avoiding OOMs in CI/CD. These are typically shell scripts.

### Follow up items
- Include the updates from e07aa036db and 9e5e8bc91e | https://github.com/pytorch/pytorch/issues/108108

### Work Items
- [x] I don't think Windows will be supported for 3.1.0 - need to update cmake
- [x] Let multi_query/attention pass through and test | UPDATE: I have the fast path implemented here: https://github.com/pytorch/pytorch/pull/106730 but since this will require changes to semantics of math to call repeat_interleave, I think this should be done as a followup.
- [x] Had to drop cutlass back to 3.0.0 to get it to compile. Need to figure out how to upgrade to 3.1.0 and later. Spoke with Tri and he is going to be taking a look. Note: compiling with clang currently errors for the cute headers.
- [x] Update test exercise above codepath
- [x] Still need to disable on seq_len % 128 != 0 for backward( Tri beat me to it a4f148b6ab)
- [x] Add determinism warning to BWD, Tri got to this one as well: 1c41d2b
- [x] Update dispatcher to universally prefer FlashV2
- [x] Update tests to exercise new head_dims
- [x] Move the head_dim padding from kernel to top level composite implicit function in order to make it purely functional
- [x] Create template generator script
- [x] Initial cmake support for building kernels/ folder
- [x] Replay CudaGraph changes

### Results
#### Forward only
The TFlops are reported here are on a100 that is underclocked.
![flashv2_tflops_vs_seq_len](https://github.com/pytorch/pytorch/assets/32754868/152de46d-8fa6-42f0-9a9c-ef1eb7ae29e7)

#### Forward+Backward
Ran a sweep and for large compute bound sizes we do see a ~2x performance increase for forw+back.
<img width="1684" alt="Screenshot 2023-07-20 at 3 47 47 PM" src="https://github.com/pytorch/pytorch/assets/32754868/fdd26e07-0077-4878-a417-f3a418b6fb3b">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105602
Approved by: https://github.com/huydhn, https://github.com/cpuhrsch
2023-08-31 16:02:20 +00:00
239ee76177 Add refs/decomps for dot/vdot (#108194)
Follow-up on https://github.com/pytorch/pytorch/issues/108127#issuecomment-1698142427

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108194
Approved by: https://github.com/peterbell10
ghstack dependencies: #108188
2023-08-31 15:30:23 +00:00
239fed7e1e Add reference for linalg.vecdot (#108188)
Was addressing https://github.com/pytorch/pytorch/issues/108127, but
then I realised that vecdot is already CompositeImplicit. Pushing anyway
as a short-and-sweet PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108188
Approved by: https://github.com/peterbell10
2023-08-31 15:30:23 +00:00
150088a9cd Revert "Use ctypes to serialize raw content for tensors. (#108287)"
This reverts commit 43f28beffc572474e8c5f6ba6c33115e9dc69be9.

Reverted https://github.com/pytorch/pytorch/pull/108287 on behalf of https://github.com/desertfire due to Internal test failure from https://github.com/pytorch/pytorch/pull/107718. Revert this one first and then revert 107718. ([comment](https://github.com/pytorch/pytorch/pull/108287#issuecomment-1701138632))
2023-08-31 14:17:04 +00:00
691e0e9799 [export] Copy gm before calling PassManager (#108321)
Test Plan: CI

Reviewed By: angelayi, cccclai

Differential Revision: D48801487

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108321
Approved by: https://github.com/kimishpatel, https://github.com/mcr229
2023-08-31 13:34:08 +00:00
31ef33871d [vmap][dynamo] run vmap under python dispatcher (#107947)
Before this PR, `test_op_has_batch_rule_cholesky_solve_cpu_float32` failed:
```
PYTORCH_TEST_WITH_DYNAMO=1 pytest test/functorch/test_vmap.py -k test_op_has_batch_rule_cholesky_solve_cpu_float32
test/functorch/test_vmap.py terminate called after throwing an instance of 'pybind11::error_already_set'
 what():  RuntimeError: /home/kshiteej/Pytorch/pytorch_functorch/build/aten/src/ATen/RegisterCompositeExplicitAutograd.cpp:2214: SymIntArrayRef expected to contain only concrete integers
```

After this PR, the test cases pass.

NOTE: We can't run 100% of the tests on CI till we figure out https://github.com/pytorch/pytorch/issues/107444

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107947
Approved by: https://github.com/zou3519
2023-08-31 13:16:44 +00:00
58268137f1 [pytree] Allow register_pytree_node to take in 5 inputs (#108256)
Summary: This is currently breaking internal old torch.packages from when _register_pytree_node took 5 inputs.

Test Plan: CI

Differential Revision: D48834742

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108256
Approved by: https://github.com/zhxchen17, https://github.com/zou3519
2023-08-31 11:06:17 +00:00
50fa5880e8 [vmap] symintify alias and squeeze (#107577)
Following tests now pass (both ops call into `alias` on certain paths)

```
PYTORCH_TEST_WITH_DYNAMO=1 pytest test/functorch/test_vmap.py -k test_squeeze -v
PYTORCH_TEST_WITH_DYNAMO=1 pytest test/functorch/test_vmap.py -k test_conj -v
```

NOTE: Ideally, this symint version should work with non symint version as well but that would mean changes at multiple places. Wanted to get a review for this fix before-hand.

Other sites which use the `IntArrayRef` overload.
5f56c4fb32/aten/src/ATen/native/TensorShape.cpp (L1707-L1713)

`view_impl` is called from `view` and `_unsafe_view`.
5f56c4fb32/aten/src/ATen/native/TensorShape.cpp (L3295-L3306)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107577
Approved by: https://github.com/zou3519
2023-08-31 08:08:33 +00:00
138fafe72d [export] Fix torch.export() issues for server use cases. (#108275)
Test Plan: In D48788843

Differential Revision: D48811793

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108275
Approved by: https://github.com/tugsbayasgalan
2023-08-31 07:19:18 +00:00
43f28beffc Use ctypes to serialize raw content for tensors. (#108287)
Summary:
There's a deadlock in the current storage implementation if the size of the tensor is too large. Use ctypes to do the serialization instead.

Test Plan:
python benchmarks/dynamo/huggingface.py --bfloat16 --accuracy --inference --device cuda --export-aot-inductor --only MT5ForConditionalGeneration

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108287
Approved by: https://github.com/desertfire, https://github.com/malfet
2023-08-31 06:59:18 +00:00
cyy
c24d0d3163 clang8=>clang9 in jobs (#107144)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107144
Approved by: https://github.com/malfet
2023-08-31 06:51:58 +00:00
cyy
a20fac89c8 [4/N] fix clang-tidy warnings in torch/csrc (#108305)
Fixes clang-tidy warnings in torch/csrc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108305
Approved by: https://github.com/Skylion007
2023-08-31 06:47:42 +00:00
d72b990bab [ONNX] Move large scale models without non-persistent buffers to runtime test (#108084)
Fixes https://github.com/pytorch/pytorch/issues/107715

Update models with their config to save CI running time and memories. Move some of models that doesn't need non-persistent buffers to runtime test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108084
Approved by: https://github.com/thiagocrepaldi
2023-08-31 06:05:19 +00:00
9ed0b3fcd9 [release_note_tool] Update test and skip commits that errors out (#108252)
Summary:
As titled.

Test Plan:
python test_release_notes.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108252
Approved by: https://github.com/drisspg
2023-08-31 04:38:53 +00:00
9862c7196b [Dynamo] SetVariable supports contains (#108189)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108189
Approved by: https://github.com/voznesenskym
2023-08-31 04:28:49 +00:00
98aa3745c2 Fallback to eager for float8 ops in inductor (#108293)
# Summary
As a stopgap to supporting the FP8 dtype within inductor, we would like to fall back to eager. Currently there are 3 ops that are needed for this:
`_scaled_mm` ( matmul for fp8 types)
`clone` (for creating new copies of fp8 tensors)
`to` ( for converting to and from fp8 types).

This PR registers a fallback for `_scaled_mm` and adds fp8 to the dtypes that trigger `unsupported_input_tensor`.

Prior to these changes this was failing with:
``` Shell
  File "/home/drisspg/meta/pytorch/torch/_inductor/codegen/triton_utils.py", line 11, in signature_of
    tye = JITFunction._type_of(arg.dtype)
  File "/home/drisspg/miniconda3/envs/dev/lib/python3.10/site-packages/triton/runtime/jit.py", line 229, in _type_of
    return key if isinstance(key, str) else f"*{tys[dtype_str]}"
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
KeyError: 'float8_e4m3fn'
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108293
Approved by: https://github.com/peterbell10
2023-08-31 04:09:01 +00:00
0e4752bafc Allow registering decomps for HigherOrderOp; add decomp for out_dtype (#108080)
We allow registering decomps for HigherOrderOp via the existing decomp
mechanisms:
- I refactored those APIs to accept torch._ops.OperatorBase, which is the base
  class for torch.ops.HigherOrderOperator and torch.ops.OpOverload
- HigherOrderOps must directly call maybe_handle_decomp in their
  ProxyTorchDispatchMode handling in order to resolve decompositions. We
  can change this in the future so that they do not need to do this.
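
For context, a minimal sketch of the existing decomp registration these APIs extend, using a private registry so the global table is untouched (illustrative only; the formula is the standard hardswish definition):

```python
import torch
from torch._decomp import register_decomposition

my_decomps = {}  # private registry passed to the decorator

@register_decomposition(torch.ops.aten.hardswish, my_decomps)
def hardswish(x):
    return x * torch.clamp(x + 3, 0, 6) / 6
```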

Next, we add an inductor decomp for out_dtype. This decomp shouldn't be
generally available because we want to preserve out_dtype to the backend
for other use cases (i.e. executorch).

Test Plan:
- new tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108080
Approved by: https://github.com/HDCharles
2023-08-31 03:15:38 +00:00
95e3126370 Revert "[BE] Pin scipy to 1.10.1 (#108270)"
This reverts commit cd3860cf160838a997ecbbed1ff58823c252e5b3.

Reverted https://github.com/pytorch/pytorch/pull/108270 on behalf of https://github.com/huydhn due to Some inductor tests start failing after this change. The failure comes from numba so I suspect that updating Docker pulls in an unwanted dependency update again ([comment](https://github.com/pytorch/pytorch/pull/108270#issuecomment-1700302953))
2023-08-31 03:06:13 +00:00
11860d9d41 Added info for each artifact option, added a help option to TORCH_LOGS, and changed the error message (#107758)
New message when invalid option is provided
<img width="1551" alt="image" src="https://github.com/pytorch/pytorch/assets/6355099/8b61534a-ee55-431e-94fe-2ffa25b7fd5c">

TORCH_LOGS="help"
<img width="1558" alt="image" src="https://github.com/pytorch/pytorch/assets/6355099/72e8939c-92fa-4141-8114-79db71451d42">

TORCH_LOGS="+help"
<img width="1551" alt="image" src="https://github.com/pytorch/pytorch/assets/6355099/2cdc94ac-505a-478c-aa58-0175526075d2">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107758
Approved by: https://github.com/ezyang, https://github.com/mlazos
ghstack dependencies: #106192
2023-08-31 02:12:35 +00:00
cd3860cf16 [BE] Pin scipy to 1.10.1 (#108270)
As the older version leaked memory, and there is no good reason to still test Python 3.8 against scipy 1.8.3.

Fixes https://github.com/pytorch/pytorch/security/dependabot/7

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108270
Approved by: https://github.com/kit1980
2023-08-31 01:44:47 +00:00
b535ed2c1a Update to RNN documentation (issue #106085) (#106222)
Addresses [issue #106085](https://github.com/pytorch/pytorch/issues/106085).

In `torch/nn/modules/rnn.py`:
- Adds documentation string to RNNBase class.
- Adds parameters to __init__ methods for the RNN, LSTM, and GRU classes.
- Adds type annotations to __init__ methods for RNN, LSTM, and GRU.

In `torch/ao/nn/quantized/dynamic/modules/rnn.py`:
- Adds type specifications to `_FLOAT_MODULE` attributes in RNNBase, RNN, LSTM, and GRU classes.
> This resolves a `mypy` assignment error `Incompatible types in assignment (expression has type "Type[LSTM]", base class "RNNBase" defined the type as "Type[RNNBase]")` that seemed to be a result of fully specified type annotations in `torch/nn/modules/rnn.py`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106222
Approved by: https://github.com/mikaylagawarecki
2023-08-31 00:50:32 +00:00
23a6706c7d Fix triton upload channel detection (#108291)
This should be nightly for nightly and test for release candidates.  There are 2 bugs:

* The shell needs to be set to `bash` explicitly; otherwise, GHA uses `sh`, which doesn't recognize `[[`, as shown in https://github.com/pytorch/pytorch/actions/runs/6030476858/job/16362717792#step:6:10
* `${GITHUB_REF_NAME}` is un-quoted. This is basically https://www.shellcheck.net/wiki/SC2248, but it wasn't captured by actionlint, and shellcheck doesn't work with workflow YAML files. I will think about how to add a lint rule for this later.

### Testing

https://github.com/pytorch/pytorch/actions/runs/6031330411 to confirm that setting the channel is performed correctly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108291
Approved by: https://github.com/osalpekar, https://github.com/atalman
2023-08-31 00:40:06 +00:00
7cb4bf675b [inductor] no-side-effect codegen (#107617)
Inductor kernel codegen previously had the following side effects:
- in `Kernel.__exit__`, we add locally used buffers to graph.removed_buffers
- during codegen, we do memory allocation/free.

These made doing multiple versions of codegen for the same kernel hard. This PR refactors the code so that kernel codegen does not change graph-level state. After codegening a kernel, the graph-level state is unchanged, so we can go on to codegen another version of the kernel if we want.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107617
Approved by: https://github.com/jansel
2023-08-31 00:25:17 +00:00
3817de5d84 Fix layernorm cpu precision issues (#108089)
#108072

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108089
Approved by: https://github.com/mingfeima, https://github.com/albanD
2023-08-30 23:55:10 +00:00
8a089f632e [inductor] Fix MKL issue with test_indirect_device_assert (#108172)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108172
Approved by: https://github.com/peterbell10
2023-08-30 23:13:22 +00:00
b2fe5eb710 [inductor] Ignore sympy.PolynomialError while simplifying (#108280)
Fixes #108276
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108280
Approved by: https://github.com/Skylion007
2023-08-30 22:57:15 +00:00
6830480999 [inductor] Move test_inductor_sequence_nr out of test_aot_inductor (#108237)
Summary: The initial PR that added test_inductor_sequence_nr (https://github.com/pytorch/pytorch/pull/103129) seems to think test_aot_inductor is to test aot_autograd + inductor. Move the test to test_torchinductor instead, which can simplify the logic for test_aot_inductor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108237
Approved by: https://github.com/davidberard98, https://github.com/Neilblaze
2023-08-30 22:42:41 +00:00
7fb131043c [memory snapshots] _record_memory_history_legacy bug fix (#108260)
The argument order for the legacy path got swapped in a recent patch. Because there is still a blog post documenting the legacy interface, people are hitting this pathway.

This patch fixes #108208
I will also update the blog post to the new API so that people are
more likely to use the newer `_record_memory_history` API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108260
Approved by: https://github.com/awgu
2023-08-30 22:33:04 +00:00
5911faeb8f Horizontally fuse input concatenation (#108115)
Fixes https://github.com/pytorch/pytorch/issues/106688

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108115
Approved by: https://github.com/jansel
2023-08-30 21:57:11 +00:00
704b0b3c67 [RESUBMIT] Standardize on error types for distributed errors. (#108191)
We have a plethora of error types for various errors raised from c10d. These include `RuntimeError`, `TimeoutError`, `SocketError`, `DistBackendError` etc.

This results in messy code during error handling somewhat like this:
```
if "NCCL" in exception_str:
  ...
if "Timed out initializing process group in store based barrier on rank" in exception_str:
  ...
if "The client socket has timed out after" in exception_str:
  ...
if "Broken pipe" in exception_str:
  ...
if "Connection reset by peer" in exception_str:
  ...
```

To address this issue, in this PR I've added these error types (a handling sketch follows the list):

1. **DistError** - the base type of all distributed errors
2. **DistBackendError** - this already existed and referred to PG backend errors
3. **DistStoreError** - for errors originating from the store
4. **DistNetworkError** - for general network errors coming from the socket library

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108191
Approved by: https://github.com/H-Huang
2023-08-30 21:47:39 +00:00
6dacf52f88 [submodule] [C10] Update gloo. (#107236)
This brings in an improved rendezvous for the TCP backend.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107236
Approved by: https://github.com/H-Huang, https://github.com/XilunWu
2023-08-30 20:59:13 +00:00
39130c7433 Add reinplacing pass for scatters + incremental fake tensor updating (#106192)
mutation for params)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106192
Approved by: https://github.com/jansel, https://github.com/eellison
2023-08-30 20:41:37 +00:00
d0b725ea8a reduce overhead in split and chunk for NestedTensor (#108213)
GH first copy of #108207

Uses raw pointers to reduce construction overhead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108213
Approved by: https://github.com/dracifer, https://github.com/jbschlosser
2023-08-30 20:40:44 +00:00
071f9ccd8b [inductor] Add input generation fn option for autotuning (#108242)
Summary: In certain cases, the content of some inputs is important for consistent behavior (and performance signals) of an operator. One example is fbgemm jagged tensor operators, where the offsets Tensor's content must be consistent with the shape of the values Tensor (i.e. `values.size(0) == offsets[-1]` + monotonicity).

This is particularly important in the context of autotuning, where the inputs are currently generated as random (for float types) or all-zero (for int types) `torch.Tensors`. Even if the extern kernel and Triton template are robust enough to tolerate improper input content, the performance signals would likely be useless.

In this PR, we add an option to pass input-generating functions for a subset of inputs of the autotuned op (to the `AlgorithmSelectorCache.__call__`).
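
A minimal sketch of the motivating case (illustrative; the exact hook on `AlgorithmSelectorCache.__call__` is in the PR): generate an offsets tensor that is actually consistent with the values tensor, instead of the all-zeros default.

```python
import torch

def gen_offsets(values_len, num_rows):
    # Monotonic offsets with offsets[-1] == values_len, as jagged
    # tensor ops require.
    cuts = torch.sort(torch.randint(0, values_len + 1, (num_rows - 1,))).values
    return torch.cat([torch.zeros(1, dtype=torch.long), cuts,
                      torch.tensor([values_len])])

offsets = gen_offsets(values_len=100, num_rows=8)
assert offsets[-1] == 100 and bool((offsets[1:] >= offsets[:-1]).all())
```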

Test Plan:

```
$ python test/inductor/test_max_autotune.py

...

----------------------------------------------------------------------
Ran 17 tests in 80.146s

OK
```

Differential Revision: [D48831225](https://our.internmc.facebook.com/intern/diff/D48831225)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108242
Approved by: https://github.com/jansel
2023-08-30 20:19:26 +00:00
ad17e5ec4e Faster gc_count update for CUDACachingAllocator (#108071)
Summary: Modify the way we update gc_count in CUDACachingAlloctor to make it faster.

Reviewed By: jaewonlee-fb

Differential Revision: D48481557

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108071
Approved by: https://github.com/zdevito
2023-08-30 18:51:44 +00:00
238cc84af9 [TD] Emit metrics to compare heuristic quality (#108192)
When a test fails, we will now emit fine grained details about how accurately heuristics predicted the relevance of that test.

## Context
Why only look at failing tests? Our only signal that a PR is most likely relevant to a test is whether or not a test fails on it. Green tests don't tell us if the success was due to the code being good vs being irrelevant.  This isn't a perfect measure, since it can miscategorize unstable and flaky failures as having been "missed" by the heuristics, but it's a reasonable approximation.

## What's measured?
The metrics this PR collects are designed to answer the following questions

### How comprehensive are the heuristics?
- What's the false negative rate, the % of failures that ideally should have been prioritized but weren't? (Both at an aggregate level and at a per heuristic level)

### How precise are the heuristics?
- What % of failed tests were prioritized by a given heuristic? What % was prioritized overall?
- How relevant was a failed test was considered to be? (Both a aggregate level and at a per heuristic level)
- What % of time was a given heuristic prioritizing a failing test higher than any other heuristic?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108192
Approved by: https://github.com/huydhn
ghstack dependencies: #108117
2023-08-30 18:28:18 +00:00
d695486f69 [Vulkan] Fix addmm & linear when bias needs to broadcast (#108199)
Test Plan:
On Devserver
```
LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run //xplat/caffe2:pt_vulkan_api_test_bin
```

Reviewed By: shubhraprakash1

Differential Revision: D48806899

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108199
Approved by: https://github.com/digantdesai
2023-08-30 17:50:58 +00:00
5683ab74f4 [export] Fix autogenerated stacktrace (#108217)
Summary: Existing code incorrectly overwrites the stacktrace to be None: since there is no exception being handled, `traceback.format_exc` yields nothing useful. Also, we should only populate the stack trace if it is not there in the first place.

Test Plan: CI

Differential Revision: D48818478

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108217
Approved by: https://github.com/zhxchen17
2023-08-30 17:44:06 +00:00
6ad5568cbc Break graph on manual_seed. (#107594)
Fix: #107187

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107594
Approved by: https://github.com/eellison
2023-08-30 17:24:11 +00:00
b1b9a3646a Increased logging threshold for profiler matching (#108010)
Fixed test_memory_profiler::TestMemoryProfilerE2E::test_memory_timeline by changing the (arbitrary) threshold for logging. With larger allocation sizes on newer AMD GPUs, a threshold above 1024 is needed to account for those differences and still satisfy the test requirements.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108010
Approved by: https://github.com/colesbury, https://github.com/pruthvistony
2023-08-30 17:16:36 +00:00
cyy
01fc6466d1 [Reland] [1/N] fix clang-tidy warnings in torch/csrc (#108114)
Reland of PR #107648 with auto replaced with Py_ssize_t in eval_frame.c. This PR applies fixes to some found issues by clang-tidy in torch/csrc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108114
Approved by: https://github.com/Skylion007
2023-08-30 17:11:16 +00:00
7be233f3a5 Remove commit hash when building triton wheel and conda in release mode (#108203)
This is the follow-up of https://github.com/pytorch/pytorch/pull/108187 to set the correct release version without commit hash for triton wheel and conda binaries when building them in release mode.

### Testing

* With commit hash (nightly): https://github.com/pytorch/pytorch/actions/runs/6019021716
* Without commit hash https://github.com/pytorch/pytorch/actions/runs/6019378616 (by adding `--release` into the PR)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108203
Approved by: https://github.com/atalman
2023-08-30 16:49:21 +00:00
057b807178 [quant] Move dropout replacement to move_model_to_eval (#108184)
Summary: This commit adds a public facing
`torch.ao.quantization.move_model_to_eval` util function
for QAT users. Instead of calling model.eval() on an exported
model (which doesn't work, see
https://github.com/pytorch/pytorch/issues/103681), the user
would call this new util function instead. This ensures special
ops such as dropout and batchnorm (not supported yet) will have
the right behavior when the graph is later used for inference.

Note: Support for an equivalent `move_model_to_train` will be
added in the future. This is difficult to do for dropout
currently because the eval pattern of dropout is simply a clone
op, which we cannot just match and replace with a dropout op.
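
A minimal usage sketch per the text above (the model and inputs are placeholders; capture import path as of this change):

```python
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization import move_model_to_eval

m = MyQATModel()  # placeholder: a model going through the QAT flow
example_inputs = (torch.randn(1, 3, 224, 224),)
exported = capture_pre_autograd_graph(m, example_inputs)

# Instead of exported.eval(), which doesn't work on exported models:
move_model_to_eval(exported)
```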

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_move_model_to_eval

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar

Differential Revision: [D48814735](https://our.internmc.facebook.com/intern/diff/D48814735)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108184
Approved by: https://github.com/jerryzh168
2023-08-30 16:33:17 +00:00
0fb1c05c5a [pytorch] Add decomp rule for scaled_dot_product_attention (#108180)
`scaled_dot_product_attention` used to be decomposed in pre-autograd, given that it calls `_scaled_dot_product_attention_math` and `_scaled_dot_product_attention_math` only has a `CompositeImplicitAutograd` kernel. As a result it's decomposed into ops with finer granularity.

However recent PRs (#103826 #105131) added new logic in `scaled_dot_product_attention` and now it calls `_scaled_dot_product_flash_attention` which contains a CPU kernel. This results in `_scaled_dot_product_flash_attention` showing up in `torch.export()`. This PR adds a decomposition that ensures `scaled_dot_product_attention` is still being decomposed the same way as before, i.e., going through `_scaled_dot_product_attention_math`. Notice that this decomp rule should be excluded by inductor.

Differential Revision: [D48762000](https://our.internmc.facebook.com/intern/diff/D48762000/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108180
Approved by: https://github.com/SherlockNoMad
2023-08-30 15:52:08 +00:00
e31038d574 Check results dtype in index_out (#108167)
This logic exists for index_put and index_add, but for some reason not for `index.out`
Testing is skipped, as this function is not technically exposed at the Python level.

Fixes https://github.com/pytorch/pytorch/issues/107698

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108167
Approved by: https://github.com/albanD
2023-08-30 14:55:18 +00:00
fe1f26af8a Add support for PickleOpCode::APPEND in torch unpickler (#104027)
Reviewed By: qiminglu

Differential Revision: D46760650

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104027
Approved by: https://github.com/ezyang
2023-08-30 14:24:50 +00:00
0297232053 Fix operator precedence (#108196)
Summary: Ensure that modules are only installed if they are not fsdp modules.
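
A hypothetical illustration of this bug class (not the actual code from this diff): in Python, `not` binds tighter than `and`, so parenthesization changes the condition.

```python
is_fsdp_module = True
has_hooks = False

# These two conditions parse differently:
a = not is_fsdp_module and has_hooks    # (not is_fsdp_module) and has_hooks
b = not (is_fsdp_module and has_hooks)  # a different condition entirely

print(a, b)  # False True
```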

Differential Revision: D48810186

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108196
Approved by: https://github.com/shunting314, https://github.com/anijain2305
2023-08-30 14:00:33 +00:00
813246c554 Add scalar conversion using avx instructions for half (#102140)
### Motivation

Scalar conversion between Half and Float on CPU is more time consuming than BFloat16 <-> Float. There is no direct data type conversion instruction for a single Half value on CPU, so we add scalar conversion with AVX intrinsics for Half to speed it up.

### Testing
Tested maxpool and compared with the results of #98819 (a minimal timing sketch follows the tables).
Single socket (28 cores):

shape | fp16 forward / ms | bf16 forward / ms | fp16 backward / ms | bf16 backward / ms | speedup ratio (fp16 forward) | speedup ratio (fp16 backward)
-- | -- | -- | -- | -- | -- | --
size: (1, 56, 264, 264), kernel: 3,   stride: 1, mem_format: contig | 5.07165 | 5.418 | 0.5798 | 0.5123 | 1.373694951 | 3.430786
size: (1, 56, 264, 264), kernel: 3,   stride: 1, mem_format: CL | 1.37455 | 1.2505 | 8.8336 | 9.7684 | 1.373635008 | 4.132924
size: (32, 16, 200, 200), kernel: 3,   stride: 1, mem_format: contig | 28.72 | 30.7069 | 3.813 | 3.75 | 1.31977124 | 2.783006
size: (32, 16, 200, 200), kernel: 3,   stride: 1, mem_format: CL | 4.5783 | 4.703 | 4.703 | 5.1 | 1.028980189 | 3.1293
size: (32, 32, 100, 100), kernel: 3,   stride: 1, mem_format: contig | 13.896 | 14.8138 | 1.6635 | 1.6274 | 1.298704663 | 2.982699
size: (32, 32, 100, 100), kernel: 3,   stride: 1, mem_format: CL | 2.11291 | 2.1158 | 2.26778 | 2.272 | 0.951105348 | 3.179012
size: (4, 19, 10, 16, 16), kernel: 3,   stride: 1, mem_format: contig | 0.4204 | 0.3843 | 0.0649 | 0.0633 | 2.102711703 | 1.779492
size: (4, 19, 10, 16, 16), kernel: 3,   stride: 1, mem_format: CL3d | 0.1134 | 0.11 | 0.1476 | 0.143 | 2.23042328 | 3.612398

Single core:

shape | fp16 forward / ms | bf16 forward / ms | fp16 backward / ms | bf16 backward / ms | speedup ratio (fp16 forward) | speedup ratio (fp16 backward)
-- | -- | -- | -- | -- | -- | --
size: (1, 56, 264, 264), kernel: 3,   stride: 1, mem_format: contig | 124.413 | 114.44 | 10.553 | 11.2486 | 1.31395433 | 3.923844
size: (1, 56, 264, 264), kernel: 3,   stride: 1, mem_format: CL | 28.99 | 28.0781 | 9.5092 | 10.9258 | 1.324296999 | 3.888377
size: (32, 16, 200, 200), kernel: 3,   stride: 1, mem_format: contig | 640.8276 | 591.964 | 59.18776 | 60.854 | 1.334956391 | 3.704458
size: (32, 16, 200, 200), kernel: 3,   stride: 1, mem_format: CL | 88.57 | 90.214 | 54.358 | 59.205 | 1.031258214 | 3.75285
size: (32, 32, 100, 100), kernel: 3,   stride: 1, mem_format: contig | 318.6197 | 285.155 | 28.4999 | 29.4387 | 1.315298144 | 3.759747
size: (32, 32, 100, 100), kernel: 3,   stride: 1, mem_format: CL | 31.3981 | 34.0544 | 25.6557 | 28.7811 | 1.068505738 | 3.841587
size: (4, 19, 10, 16, 16), kernel: 3,   stride: 1, mem_format: contig | 8.87882 | 8.207 | 0.386056 | 0.3939 | 1.567866 | 3.50387
size: (4, 19, 10, 16, 16), kernel: 3,   stride: 1, mem_format: CL3d | 2.4167 | 2.38295 | 0.3769 | 0.4066 | 1.39402491 | 3.30061
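
A minimal timing sketch in the spirit of the tables above (shape taken from the first row; this is not the benchmark script that produced these numbers):

```python
import torch
import torch.nn.functional as F
from torch.utils.benchmark import Timer

x = torch.randn(1, 56, 264, 264)
for dtype in (torch.half, torch.bfloat16):
    xd = x.to(dtype)
    t = Timer(
        stmt="F.max_pool2d(xd, kernel_size=3, stride=1)",
        globals={"F": F, "xd": xd},
    )
    print(dtype, t.blocked_autorange().median)
```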

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102140
Approved by: https://github.com/jgong5, https://github.com/mingfeima, https://github.com/cpuhrsch
2023-08-30 13:26:53 +00:00
ca7249b80a Remove duplicate sentences in description of torch.linalg.eig (#108230)
This removes nearly identical sentences in the description of `torch.linalg.eig` describing the checks in the backward pass by @lezcano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108230
Approved by: https://github.com/lezcano
2023-08-30 13:16:04 +00:00
3a79621c9d [Inductor] Add fused_attention pattern matcher with additional clone (#108141)
A previous PR https://github.com/pytorch/pytorch/pull/106274 decomposes `aten.dropout` and creates a `clone()` in `eval()` mode or when `p=0`. This made many SDPA-related models fail to match the fused_attention pattern matchers.

This PR adds new fused_attention pattern matchers with an additional clone to re-enable the SDPA op matching.
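
An illustrative sketch of the graph shape involved (not Inductor's actual pattern definition): with `p=0` or in `eval()` mode, the decomposed dropout degenerates into a `clone()` that the original patterns did not expect.

```python
import math
import torch

def sdpa_like(q, k, v):
    attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(q.size(-1)), dim=-1)
    attn = attn.clone()  # what aten.dropout(p=0) decomposes to in eval mode
    return attn @ v

q, k, v = (torch.randn(2, 4, 8, 16) for _ in range(3))
out = sdpa_like(q, k, v)
```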

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108141
Approved by: https://github.com/jgong5, https://github.com/eellison
2023-08-30 05:07:35 +00:00
e45b391ebd Enable Mypy Checking in mkldnn_fusion.py and quantization.py (#108131)
**Summary**
As in issue: https://github.com/pytorch/pytorch/issues/105230, enable Mypy Checking in `torch/_inductor/fx_passes/mkldnn_fusion.py` and `torch/_inductor/fx_passes/quantization.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108131
Approved by: https://github.com/Skylion007, https://github.com/eellison
ghstack dependencies: #108125
2023-08-30 05:03:35 +00:00
1a5fdc2458 Re-enable some Quantization UTs after Quantization flow updates (#108125)
**Summary**
This diff mainly does 2 things:

- Re-enable the testcases skipped in commit: 9ae3d7ca90 due to the quantization flow update.
- Break down the original testcases into small testcases to make each testcase simpler.

**TestPlans**
```
python -m pytest test_mkldnn_pattern_matcher.py -k test_qconv2d
python -m pytest test_mkldnn_pattern_matcher.py -k test_qconv2d_relu
python -m pytest test_mkldnn_pattern_matcher.py -k test_qconv2d_add
python -m pytest test_mkldnn_pattern_matcher.py -k test_qconv2d_add_relu
python -m pytest test_mkldnn_pattern_matcher.py -k test_qconv2d_dequant_promotion
python -m pytest test_mkldnn_pattern_matcher.py -k test_qlinear
python -m pytest test_mkldnn_pattern_matcher.py -k test_qlinear_relu
python -m pytest test_mkldnn_pattern_matcher.py -k test_qlinear_dequant_promotion
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108125
Approved by: https://github.com/jerryzh168, https://github.com/jgong5
2023-08-30 04:41:54 +00:00
620d267ef3 Refactor TestPrioritizations to support more priorities and reduce risk of accidental mutations (#108117)
Refactor TD code to make it easier to add additional categories later and also support the changes required to enable the metrics needed for TD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108117
Approved by: https://github.com/huydhn
2023-08-30 04:14:28 +00:00
5e0ec03a71 [inductor][easy] reuse a single is_aligned function (#108135)
Resolve comment: https://github.com/pytorch/pytorch/pull/107722#discussion_r1308117422

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108135
Approved by: https://github.com/jansel
ghstack dependencies: #107722
2023-08-30 03:33:30 +00:00
bf517f4092 [vision hash update] update the pinned vision hash (#108201)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108201
Approved by: https://github.com/pytorchbot
2023-08-30 03:23:59 +00:00
283ce12aa9 Add channels_last3d support for mkldnn conv and mkldnn deconv (#95271)
### Motivation

- Add channels_last3d support for mkldnn conv and mkldnn deconv.
- Use `ideep::convolution_transpose_forward::compute_v3` instead of `ideep::convolution_transpose_forward::compute`. compute_v3 takes an `is_channels_last` flag that tells ideep whether to use the channels-last path, aligning with PyTorch's memory format check (a usage sketch follows).
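
A small usage sketch of the two memory formats benchmarked below (shapes reduced for brevity):

```python
import torch

x = torch.randn(8, 16, 4, 32, 32)
conv = torch.nn.Conv3d(16, 16, kernel_size=3)

y_contig = conv(x)  # torch.contiguous_format (NCDHW)

x_cl = x.to(memory_format=torch.channels_last_3d)
conv_cl = conv.to(memory_format=torch.channels_last_3d)
y_cl = conv_cl(x_cl)  # channels_last_3d (NDHWC layout)

print(y_cl.is_contiguous(memory_format=torch.channels_last_3d))  # True
```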

### Testing
1 socket (28 cores):

- memory format: torch.contiguous_format

module | shape | forward / ms | backward / ms
-- | -- | -- | --
conv3d | input size: (32, 32, 10, 100, 100), weight size: (32, 32, 3, 3, 3) | 64.56885 | 150.1796
conv3d | input size: (32, 16, 10, 200, 200), weight size: (16, 16, 3, 3, 3) | 100.6754 | 231.8883
conv3d | input size: (16, 4, 5, 300, 300), weight size: (4, 4, 3, 3, 3) | 19.31751 | 68.31131

module | shape | forward / ms | backward / ms
-- | -- | -- | --
ConvTranspose3d | input size: (32, 32, 10, 100, 100), weight size: (32, 32, 3, 3, 3) | 122.7646 | 207.5125
ConvTranspose3d | input size: (32, 16, 10, 200, 200), weight size: (16, 16, 3, 3, 3) | 202.4542 | 368.5492
ConvTranspose3d | input size: (16, 4, 5, 300, 300), weight size: (4, 4, 3, 3, 3) | 122.959 | 84.62577

- memory format: torch.channels_last_3d

module | shape | forward / ms | backward / ms
-- | -- | -- | --
conv3d | input size: (32, 32, 10, 100, 100), weight size: (32, 32, 3, 3, 3) | 40.06993 | 114.317
conv3d | input size: (32, 16, 10, 200, 200), weight size: (16, 16, 3, 3, 3) | 49.08249 | 133.4079
conv3d | input size: (16, 4, 5, 300, 300), weight size: (4, 4, 3, 3, 3) | 5.873911 | 17.58647

module | shape | forward / ms | backward / ms
-- | -- | -- | --
ConvTranspose3d | input size: (32, 32, 10, 100, 100), weight size: (32, 32, 3, 3, 3) | 88.4246 | 208.2269
ConvTranspose3d | input size: (32, 16, 10, 200, 200), weight size: (16, 16, 3, 3, 3) | 140.0725 | 270.4172
ConvTranspose3d | input size: (16, 4, 5, 300, 300), weight size: (4, 4, 3, 3, 3) | 23.0223 | 37.16972

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95271
Approved by: https://github.com/jgong5, https://github.com/cpuhrsch
2023-08-30 02:53:30 +00:00
13e4cce83c [DTensor] Add util API to compute_local_shape_and_global_offset for checkpointing purpose (#107996)
The compute_local_shape_and_global_offset API does the following:
1) Calculate both local_shape and global_offset in one API to replace two API calls (compute_local_size and compute_local_shape).
2) Generate the correct global_offset for checkpointing purposes. We are currently using compute_local_offset for downstream checkpoint components, which could lead to incorrect results. For checkpointing, we need global_offset instead of local_offset. In some cases, global_offset does not equal local_offset, e.g. when a dimension is sharded multiple times on different mesh dimensions (placements = [Shard(0), Shard(0)]); see the hand-worked sketch below.
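
A hand-worked sketch of that case in plain Python (no DTensor APIs): dim 0 of a size-16 tensor sharded twice over a 2x2 mesh with placements [Shard(0), Shard(0)].

```python
global_size = 16
mesh = (2, 2)

rows_level1 = global_size // mesh[0]   # 8 rows per first-level shard
rows_level2 = rows_level1 // mesh[1]   # 4 rows per second-level shard

for i in range(mesh[0]):
    for j in range(mesh[1]):
        local_offset = j * rows_level2                   # offset within the parent shard
        global_offset = i * rows_level1 + local_offset   # offset in the full tensor
        print(f"coord ({i},{j}): local={local_offset}, global={global_offset}")
# e.g. coord (1,1): local=4, global=12 -- the two disagree,
# which is why checkpointing needs the global offset.
```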

Follow-up PRs:
1) Replace related downstream components to use compute_local_shape_and_global_offset instead of compute_local_size and compute_local_offset.
2) Audit the existing code base to see whether compute_local_size and compute_local_offset can eventually be removed, since they are still in use today.

cc. @wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107996
Approved by: https://github.com/wanchaol
2023-08-30 02:46:50 +00:00
556bfe7cb5 [inductor] let codegen not rely on node order (#107320)
We'd like to benchmark fusion (either for autotuning or for gathering data to find patterns that can guide optimizations). There is a deadlock here that prevents us from doing this: to benchmark fusion, we need to do codegen before all the fusions are done. However, codegen currently relies on xSchedulerNode.last_usage information to decide which buffers are not needed at all and thus don't even need to be allocated/written (Scheduler.removed_buffers tracks this). xSchedulerNode.last_usage can only be computed once the order of all the nodes has been decided, but each fusion pass (`fuse_nodes_once`) can also change node order. So we know the final node order only after all the fusions have completed, which blocks us from doing codegen during fusion (before all fusions are done).

Here I just show the above with a chain of dependencies to make it easier to understand (a -> b means a depends on b, or b has to happen before a):
```
  benchmark one fusion decision -> codegen -> xSchedulerNode.last_usage -> node order -> all fusions have completed
```

Actually we only need to decide whether a buffer has only local usages (if yes, it's a candidate for removal). This can be decided once we know all the users of each buffer, so we can avoid using xSchedulerNode.last_usage in this case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107320
Approved by: https://github.com/peterbell10, https://github.com/jansel
2023-08-30 02:34:20 +00:00
7264b75763 Remove Anaconda Prune (#108111)
Anaconda Prune job has been migrated to test-infra, so this job in CCI is defunct. We can remove it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108111
Approved by: https://github.com/huydhn
2023-08-30 00:47:25 +00:00
cd07214a41 Fix various issues on build-triton-wheel workflow (#108187)
There are more issues than I expected at the beginning:

* Triton was uploaded on `main` instead of `nightly` and release branch
* The environment `conda-aws-upload` wasn't used correctly in both wheel and conda upload
* Conda update wasn't run in a separate ephemeral runner
* Duplicated upload logic; should have just used `bash .circleci/scripts/binary_upload.sh` instead
* Handle `CONDA_PYTORCHBOT_TOKEN` and `CONDA_PYTORCHBOT_TOKEN_TEST` tokens in a similar way as https://github.com/pytorch/test-infra/pull/4530

Part of https://github.com/pytorch/pytorch/issues/108154
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108187
Approved by: https://github.com/atalman
2023-08-30 00:02:32 +00:00
2c87ef3dbf [inductor] Fix inputs with existing offsets (#108168)
This cherrypicks the reinterpret_tensor change from #102625 in order to fix a subtle correctness bug when the graph inputs already have a storage_offset set.

The view change also fixes some issues with quantized models in torchbench.
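
For context, a minimal illustration of the input property involved: an ordinary view already carries a nonzero storage_offset.

```python
import torch

base = torch.arange(10)
view = base[2:]                  # shares storage with base
print(view.storage_offset())     # 2, not 0
print(view.data_ptr() == base.data_ptr())  # False: the pointer is shifted by the offset
```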

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108168
Approved by: https://github.com/desertfire
2023-08-29 23:47:03 +00:00
c3239442a3 [AOTInductor] Include constants in AOTInductor .so file. (#107718)
Summary:
Include the constants in the AOTInductor .so file.
We do not modify existing API signatures; instead we create the necessary format with the weights lifted out.

Test Plan:
test/inductor/test_aot_inductor.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107718
Approved by: https://github.com/angelayi, https://github.com/eellison
2023-08-29 22:37:30 +00:00
fa49be2a49 [docs] Properly link register_post_accumulate_grad_hook docs (#108157)
The hook's docs entry now shows up properly:

![image](https://github.com/pytorch/pytorch/assets/31798555/0aa86839-b9c5-4b4b-b1b1-aa1c0c0abbab)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108157
Approved by: https://github.com/soulitzer, https://github.com/albanD
2023-08-29 22:13:33 +00:00
525b593954 Fix focus builds of macOS apps on apple silicon. (#96966) (#107816)
Summary:

Focus2 builds of some apps on apple silicon Macs are failing. We've determined that removing the `user.focus_enabled=true` config option allows the build to succeed.

Reviewed By: milend

Differential Revision: D44076509

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107816
Approved by: https://github.com/kit1980
2023-08-29 21:57:01 +00:00
86bc50ae60 Add AMP support to linalg.vecdot. (#108165)
We follow the same rules as matmul.
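
A sketch of the resulting behavior (requires a CUDA device; under the matmul policy the op runs in the autocast dtype):

```python
import torch

a = torch.randn(4, 8, device="cuda")
b = torch.randn(4, 8, device="cuda")

with torch.autocast("cuda"):
    out = torch.linalg.vecdot(a, b)

print(out.dtype)  # torch.float16 with the default CUDA autocast dtype
```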

Fixes https://github.com/pytorch/pytorch/issues/108127

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108165
Approved by: https://github.com/albanD
2023-08-29 21:33:52 +00:00
75884f4e1d Error when someone calls train/eval on pre_autograd graph (#108143)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108143
Approved by: https://github.com/andrewor14
2023-08-29 21:03:48 +00:00
cadd97feef Remove case for RecursionError on try_solve. (#108144)
This PR removes an `except` clause for `RecursionError`. It used to be there because
`sympy.solve` was being used at the time. Since we are using the simpler `try_solve`, it's
not needed anymore.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108144
Approved by: https://github.com/Skylion007
2023-08-29 19:22:20 +00:00
68b518c13e Add check for out of range pointer. (#107510)
### Summary

Hi! We've been fuzzing pytorch with [sydr-fuzz](https://github.com/ispras/oss-sydr-fuzz) and found an error where an arbitrary address can be accessed while parsing the flatbuffer format via the `torch::load` function.

pytorch version: 18bcf62bbcf7ffd47e3bcf2596f72aa07a07d65f (the last commit at the moment of reporting the issue)

### Details
The vulnerability appears while loading arbitrary user input using the `torch::load` function. To trigger the error the input must correspond to FlatbufferFileFormat, so the flatbuffer-parsing path in the `import_ir_module` function must be executed.

Firstly, the error can occur in `GetMutableRoot` in `module.h`, where we add to the input data buffer pointer a value obtained by dereferencing that pointer (whose contents fully depend on the user input and can be arbitrary), so the resulting `flatbuffer_module` address can be corrupted.

Moreover, we can get an arbitrary address later at `flatbuffer_loader.cpp:305`, when we get the `ival` pointer with the `Get` method.
There, in the `IndirectHelper::Read` function, we add to the pointer an offset obtained by dereferencing it, so the address can be corrupted again.

The corrupted `ival` pointer is dereferenced at `table.h` in the flatbuffers project, where it is used to get another address, which is later dereferenced again at `table.h`. The resulting corrupted address is written to the `func` pointer at `flatbuffer_loader.cpp:274`, which is then used in `parseFunction`, where a write access to the address occurs.

To fix the problem we can compute the end of memory area in `parse_and_initialize_mobile_module` function like this:
```
auto* end = static_cast<char*>(data) + size;
```
And then pass it to all the callees and insert corresponding checks.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107510
Approved by: https://github.com/albanD
2023-08-29 18:25:11 +00:00
78810d78e8 Fix the coredump described by #106702 (#108002)
Fixes #106702 and add some tests

As shown by [maxUnpool1d](https://pytorch.org/docs/master/generated/torch.nn.MaxUnpool1d) (`MaxUnpool2d`, `MaxUnpool3d` also), `Input` and `Output` support `(N,C,*)` or `(C,*)`, but the C++ API currently supports only the former; the latter causes a coredump.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108002
Approved by: https://github.com/albanD
2023-08-29 17:14:16 +00:00
fa885baf04 [ROCm] Update ROCm pin to fix triton wheel lib issue (#108137)
ROCm nightly wheels are facing an issue where the .so files required for the triton wheel are not being copied to third_party after https://github.com/pytorch/pytorch/pull/107600

This PR updates our pinned commit to bring in a revert of a change that searched for non-numbered .so files, which seems to be causing the issue.

Before PR 107600
https://github.com/pytorch/pytorch/actions/runs/5763938339/job/15626824908
```
2023-08-04T15:35:34.8049152Z Copying DRM Libraries
2023-08-04T15:35:34.8239451Z + ROCM_SO=("libhsa-runtime64.so.1" "libamdhip64.so.5" "libamd_comgr.so.2")
2023-08-04T15:35:34.8239816Z + mkdir -p python/triton/third_party/rocm/lib
2023-08-04T15:35:34.8295087Z + for lib in '"${ROCM_SO[@]}"'
2023-08-04T15:35:34.8295443Z + filepath=/tmp/vanilla_extract/opt/rocm-5.4.2/lib/libamdhip64.so.5
2023-08-04T15:35:34.8295891Z + cp /tmp/vanilla_extract/opt/rocm-5.4.2/lib/libamdhip64.so.5 python/triton/third_party/rocm/lib/
2023-08-04T15:35:34.8421258Z ++ echo libamdhip64.so.5
2023-08-04T15:35:34.8421621Z ++ sed -e 's/\.so.*/.so/g'
2023-08-04T15:35:34.8432022Z + LINKNAME=libamdhip64.so
2023-08-04T15:35:34.8432686Z + ln -sf libamdhip64.so.5 python/triton/third_party/rocm/lib/libamdhip64.so
...
2023-08-04T15:35:35.7473664Z copying triton/third_party/rocm/lib/libamdhip64.so -> build/lib.linux-x86_64-cpython-38/triton/third_party/rocm/lib
2023-08-04T15:35:35.7617341Z copying triton/third_party/rocm/lib/libamdhip64.so.5 -> build/lib.linux-x86_64-cpython-38/triton/third_party/rocm/lib
...
2023-08-04T15:40:10.5063779Z copying build/lib.linux-x86_64-cpython-38/triton/third_party/rocm/lib/libamdhip64.so -> build/bdist.linux-x86_64/wheel/triton/third_party/rocm/lib
2023-08-04T15:40:10.6144654Z copying build/lib.linux-x86_64-cpython-38/triton/third_party/rocm/lib/libamdhip64.so.5 -> build/bdist.linux-x86_64/wheel/triton/third_party/rocm/lib
...
2023-08-04T15:40:37.3571973Z adding 'triton/third_party/rocm/lib/libamdhip64.so'
2023-08-04T15:40:38.4553988Z adding 'triton/third_party/rocm/lib/libamdhip64.so.5'
...
2023-08-04T15:40:53.3747917Z Setting rpath of triton/third_party/rocm/lib/libamdhip64.so to ''
2023-08-04T15:40:53.4602326Z $ORIGIN
2023-08-04T15:40:53.4620034Z Setting rpath of triton/third_party/rocm/lib/libamdhip64.so.5 to ''
2023-08-04T15:40:53.6337419Z $ORIGIN
...
2023-08-04T15:40:53.7152828Z Copied triton/third_party/rocm/lib/libamdhip64.so to triton/third_party/rocm/lib/libamdhip64.so
2023-08-04T15:40:53.7215480Z Copied triton/third_party/rocm/lib/libamdhip64.so.5 to triton/third_party/rocm/lib/libamdhip64.so
...
```

After PR 107600
https://github.com/pytorch/pytorch/actions/runs/5967761429/job/16190110783
```
2023-08-24T18:59:14.3976234Z + ROCM_SO=("libhsa-runtime64.so" "libamdhip64.so" "libamd_comgr.so" "libdrm.so" "libdrm_amdgpu.so")
2023-08-24T18:59:14.3977489Z + for lib in '"${ROCM_SO[@]}"'
2023-08-24T18:59:14.4216575Z + file_path=($(find $ROCM_HOME/lib/ -name "$lib"))
2023-08-24T18:59:14.4219237Z ++ find /opt/rocm/lib/ -name libamdhip64.so
2023-08-24T18:59:14.4254361Z + [[ -z /opt/rocm/lib/libamdhip64.so ]]
2023-08-24T18:59:14.4254677Z + [[ -z /opt/rocm/lib/libamdhip64.so ]]
2023-08-24T18:59:14.4254965Z + [[ -z /opt/rocm/lib/libamdhip64.so ]]
2023-08-24T18:59:14.4255246Z + [[ -z /opt/rocm/lib/libamdhip64.so ]]
2023-08-24T18:59:14.4255538Z + cp /opt/rocm/lib/libamdhip64.so python/triton/third_party/rocm/lib
2023-08-24T18:59:14.5085155Z ++ echo libamdhip64.so
2023-08-24T18:59:14.5085818Z ++ sed -e 's/\.so.*/.so/g'
2023-08-24T18:59:14.5097094Z + LINKNAME=libamdhip64.so
2023-08-24T18:59:14.5097572Z + ln -sf libamdhip64.so python/triton/third_party/rocm/lib/libamdhip64.so
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108137
Approved by: https://github.com/sunway513, https://github.com/huydhn
2023-08-29 16:47:56 +00:00
4e47ea5131 Revert "Break graph on manual_seed. (#107594)"
This reverts commit 6c28de24374db3b2a58aabf62985d18ce899c91f.

Reverted https://github.com/pytorch/pytorch/pull/107594 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it seems to cause failures in trunk on inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_uniform_cuda_float, likely a landrace ([comment](https://github.com/pytorch/pytorch/pull/107594#issuecomment-1697783965))
2023-08-29 16:38:01 +00:00
fe2cda64dc [C10D] Implement new libuv backend for TCPStore. (#108066)
The new backend is currently gated behind a 'use_libuv' flag in the TCPStore constructor
to reduce the impact on existing users as we test it.

This is a reland of #105870 with a fix for a bad test.
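
A usage sketch based on the flag described above (the flag is experimental at this point, so its spelling and default may change in later releases):

```python
from datetime import timedelta
import torch.distributed as dist

store = dist.TCPStore(
    "localhost", 29500, world_size=1, is_master=True,
    timeout=timedelta(seconds=30), use_libuv=True,
)
store.set("key", "value")
print(store.get("key"))  # b'value'
```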

Differential Revision: [D48742554](https://our.internmc.facebook.com/intern/diff/D48742554)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108066
Approved by: https://github.com/H-Huang, https://github.com/fduwjj
2023-08-29 14:55:14 +00:00
80c7fdf49f wrapper subclasses: support non-cpu device for dynamic shape overload (#107926)
This is tested by AOTAutograd later in the stack, but I can add direct tests if anyone wants them.

Previously, the second overload of `_make_wrapper_subclass` (which supports dynamic shapes) would always return a wrapper tensor that reported itself as being on `cpu`. This updates it to properly respect the `device` arg that was passed in.

At first I thought about doing this the same way that FakeTensor does it (override device to do a custom impl), but that seemed overly complicated. Since the subclass is a wrapper, we can just bake in the value on the wrapper.
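
A minimal sketch of the fixed behavior (private API; this uses the plain overload for illustration, while the PR itself fixes the dynamic-shape overload):

```python
import torch

class DeviceWrapper(torch.Tensor):
    @staticmethod
    def __new__(cls, elem, device):
        # The reported device is baked into the wrapper at construction.
        return torch.Tensor._make_wrapper_subclass(
            cls, elem.shape, dtype=elem.dtype, device=device
        )

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        raise NotImplementedError(f"{func} is not supported in this sketch")

w = DeviceWrapper(torch.randn(2, 3), device="meta")
print(w.device)  # meta, not cpu
```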

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107926
Approved by: https://github.com/ezyang
ghstack dependencies: #107915, #107916
2023-08-29 14:27:21 +00:00
c6e3adaf54 add dynamic shapes support for subclasses that override size/stride (#107916)
This is mostly a minor fix on top of @soulitzer's PR https://github.com/pytorch/pytorch/pull/107839.

(1) `strides` wasn't going through the new `set_tensor_attr_with_capsule` flow
(2) The dynamic shapes overload for `_make_wrapper_subclass` currently errors when you try to use custom sizes - I removed the error
(3) added a test

I need this later because I'm adding a `__torch_dispatch__` `FunctionalTensor` wrapper subclass, that needs to support dynamic shapes, and also plumb metadata calls to its inner tensor later.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107916
Approved by: https://github.com/ezyang, https://github.com/soulitzer
ghstack dependencies: #107915
2023-08-29 14:27:21 +00:00
4f34caf164 add return_and_correct_aliasing() util for wrapper subclasses (#107915)
This PR adds a `return_and_correct_aliasing()` utility, that wrapper subclasses can use to get correct aliasing. I updated `TwoTensor` to use it, and added some testing that the aliasing of my `TwoTensor` subclass now matches the aliasing behavior of normal tensors.

Right now my test just uses a few hand-picked opinfos (that have varying aliasing behavior). I thought all op infos might be overkill (does that take a while to run?), but I'm happy to add them all if people prefer.

One more general question about this PR: eventually, proper aliasing will be a **requirement** in order for AOTAutograd to handle aliasing/mutations on subclasses properly during compilation. How can we make sure that wrapper subclasses use this API? A few options (from talking to Richard):

(1) Yolo require subclasses to use the API and hope users do as well (what this PR does)

(2) Yolo require subclasses to use the API, but add a kwarg to `_make_wrapper_subclass`, e.g. `manual_aliasing=True`, that torch.compile checks for before allowing the subclass to be used in compilation

(3) Automatically run this API in our python fallback, for **every** tensor subclass that currently implements `__tensor_flatten__` (aka only the "traceable" subclasses)

(4) Automatically run this API in our python fallback, for **every** tensor subclass. This would be a bit higher blast radius, since it would change the existing aliasing behavior of wrapper subclasses. Maybe.. this is the right thing to do though?

Either way, my tentative plan is to do (1) to unblock, and revisit this later once we want to come up with public docs + a more general "tensor subclass in PT2 requirements" plan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107915
Approved by: https://github.com/ezyang
2023-08-29 14:27:19 +00:00
6c28de2437 Break graph on manual_seed. (#107594)
Fix: #107187

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107594
Approved by: https://github.com/eellison
2023-08-29 12:59:57 +00:00
b7624fc91e Cleaned up test_mps.py::test_output*_match (#108092)
Description:
- cleaned up test_mps.py::test_output_match and test_mps.py::test_output_grad_match tests
  - removed unused variables and useless brackets
  - simplified atol/rtol setup if/else code

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108092
Approved by: https://github.com/kulinseth
2023-08-29 10:46:02 +00:00
f3a8d57aea [Dynamo x FSDP] Add support for params, buffers, submodules on FSDPManagedNNModuleVariable (#107923)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107923
Approved by: https://github.com/wconstab
2023-08-29 08:54:13 +00:00
977e4302ab skip dynamic shape test for test_conv_bn_fuse (#108113)
For the test_conv_bn_fuse dynamic case, we always fuse bn with convolution, so there is only an external convolution call and no loops; the dynamic loop-vars check therefore fails. This PR skips this case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108113
Approved by: https://github.com/huydhn
2023-08-29 08:39:52 +00:00
147b3495e2 [quant][pt2e] Add reference representation for dynamic quantized linear (#108073)
Summary: as titled.

Test Plan:
python test/test_quantization.py TestQuantizePT2E.test_representation_dynamic_linear
buck2 test 'fbcode//mode/opt' fbcode//caffe2/test:quantization_pt2e -- 'test_representation_dynamic_linear'

Reviewed By: kimishpatel

Differential Revision: D48703076

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108073
Approved by: https://github.com/andrewor14
2023-08-29 07:12:55 +00:00
0cfc5899f9 [inductor] Improved grid_sampler_2d decomposition for cuda (#104710)
Description:
- Improved grid_sampler_2d decomposition code to generate a single cuda kernel instead of two

Related to https://github.com/pytorch/pytorch/issues/104296

Perfs:
- speed-up on cuda (~x5) and cpu (~x2) for bicubic mode (a minimal repro sketch follows the benchmark tables)

```
Speed-up PR vs Nightly = ratio between columns "Compiled (2.1.0a0+git52598e9) PR" and "Compiled (2.1.0a0+gitcf76938) Nightly"

[------------------------------------------------------------------------------------------------------------------------------- Affine grid sampling, cpu -------------------------------------------------------------------------------------------------------------------------------]
                                                                                                          |  Eager (2.1.0a0+git52598e9) PR  |  Compiled (2.1.0a0+git52598e9) PR  |  Compiled (2.1.0a0+gitcf76938) Nightly  |  speed-up PR vs Nightly  |  Eager (2.1.0a0+gitcf76938) Nightly
1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bilinear   |         38.010 (+-0.118)        |          51.466 (+-1.257)          |             47.867 (+-0.124)            |     0.930 (+-0.000)      |           33.654 (+-0.411)
      Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bilinear       |         35.532 (+-0.236)        |          52.189 (+-0.093)          |             58.979 (+-0.206)            |     1.130 (+-0.000)      |           32.543 (+-0.198)
      Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bilinear  |         38.187 (+-0.112)        |          47.892 (+-0.117)          |             45.833 (+-0.081)            |     0.957 (+-0.000)      |           33.752 (+-0.116)
      Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bilinear      |         36.708 (+-0.244)        |          51.680 (+-0.104)          |             58.360 (+-0.108)            |     1.129 (+-0.000)      |           32.576 (+-0.751)
      Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=nearest    |         24.201 (+-0.088)        |          27.451 (+-0.059)          |             27.937 (+-0.081)            |     1.018 (+-0.000)      |           24.367 (+-0.074)
      Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=nearest        |         19.266 (+-0.105)        |          26.070 (+-0.085)          |             26.092 (+-0.054)            |     1.001 (+-0.000)      |           20.144 (+-0.064)
      Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=nearest   |         24.293 (+-0.125)        |          26.085 (+-0.064)          |             26.575 (+-0.061)            |     1.019 (+-0.000)      |           24.515 (+-0.095)
      Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=nearest       |         19.440 (+-0.075)        |          25.252 (+-0.059)          |             25.259 (+-0.051)            |     1.000 (+-0.000)      |           19.770 (+-0.070)
      Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bicubic    |        114.900 (+-0.508)        |         113.416 (+-1.271)          |            248.679 (+-1.431)            |     2.193 (+-0.000)      |          114.609 (+-0.515)
      Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bicubic        |        115.973 (+-0.555)        |         124.711 (+-1.596)          |            282.187 (+-2.418)            |     2.263 (+-0.000)      |          115.368 (+-0.652)
      Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bicubic   |        111.730 (+-0.562)        |         110.914 (+-0.865)          |            253.899 (+-2.226)            |     2.289 (+-0.000)      |          111.285 (+-1.226)
      Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bicubic       |        112.859 (+-0.487)        |         131.696 (+-1.298)          |            294.124 (+-1.963)            |     2.233 (+-0.000)      |          110.910 (+-0.969)

Times are in milliseconds (ms).

[------------------------------------------------------------------------------------------------------------------------------- Affine grid sampling, cuda ------------------------------------------------------------------------------------------------------------------------------]
                                                                                                          |  Eager (2.1.0a0+git52598e9) PR  |  Compiled (2.1.0a0+git52598e9) PR  |  Compiled (2.1.0a0+gitcf76938) Nightly  |  speed-up PR vs Nightly  |  Eager (2.1.0a0+gitcf76938) Nightly
1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bilinear   |        228.811 (+-0.037)        |          92.990 (+-0.446)          |             92.648 (+-0.286)            |     0.996 (+-0.000)      |          228.274 (+-0.067)
      Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bilinear       |        222.107 (+-0.076)        |          93.247 (+-0.387)          |             92.528 (+-0.423)            |     0.992 (+-0.000)      |          221.922 (+-0.297)
      Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bilinear  |        235.654 (+-0.055)        |          75.781 (+-0.566)          |            115.865 (+-0.419)            |     1.529 (+-0.000)      |          236.032 (+-0.111)
      Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bilinear      |        226.752 (+-0.088)        |          76.312 (+-0.328)          |            116.468 (+-0.477)            |     1.526 (+-0.000)      |          226.950 (+-0.027)
      Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=nearest    |        225.540 (+-0.013)        |          75.638 (+-0.341)          |             72.621 (+-0.292)            |     0.960 (+-0.000)      |          225.937 (+-0.017)
      Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=nearest        |        217.425 (+-0.024)        |          75.484 (+-0.545)          |             73.518 (+-0.296)            |     0.974 (+-0.000)      |          217.793 (+-0.008)
      Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=nearest   |        231.474 (+-0.020)        |          75.972 (+-0.339)          |             73.030 (+-0.387)            |     0.961 (+-0.000)      |          231.991 (+-0.184)
      Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=nearest       |        223.408 (+-0.016)        |          75.622 (+-0.279)          |             73.542 (+-0.336)            |     0.973 (+-0.000)      |          223.893 (+-0.021)
      Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bicubic    |        319.382 (+-0.023)        |         149.060 (+-0.190)          |            772.116 (+-0.266)            |     5.180 (+-0.000)      |          320.549 (+-0.387)
      Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bicubic        |        319.987 (+-0.134)        |         154.443 (+-0.014)          |            797.651 (+-0.232)            |     5.165 (+-0.000)      |          320.665 (+-0.397)
      Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bicubic   |        326.138 (+-0.439)        |         149.092 (+-0.036)          |            772.508 (+-0.259)            |     5.181 (+-0.000)      |          325.751 (+-0.398)
      Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bicubic       |        326.024 (+-0.118)        |         154.452 (+-0.209)          |            797.756 (+-0.229)            |     5.165 (+-0.000)      |          326.870 (+-0.372)

Times are in microseconds (us).

```

[Source](https://raw.githubusercontent.com/vfdev-5/pth-inductor-dev/master/output/20230828-134459-affine-grid-sampler-PR-vs-Nightly-speedup.md)
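
A minimal repro sketch for the op being decomposed (bicubic path, matching the benchmarked configs):

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 3, 345, 456)
theta = torch.eye(2, 3).unsqueeze(0).expand(8, -1, -1)  # identity affine transform

grid = F.affine_grid(theta, list(x.shape), align_corners=False)
y = F.grid_sample(x, grid, mode="bicubic", align_corners=False)
print(y.shape)  # torch.Size([8, 3, 345, 456])
```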

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104710
Approved by: https://github.com/lezcano
2023-08-29 05:54:24 +00:00
d040d5b9ee Fix multi output layout error in indexing dtype calculation (#108085)
Differential Revision: [D48757829](https://our.internmc.facebook.com/intern/diff/D48757829)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108085
Approved by: https://github.com/yanboliang, https://github.com/davidberard98, https://github.com/jansel, https://github.com/peterbell10
2023-08-29 05:43:44 +00:00
e68b3ad14f update triton pin with needed inductor change (#107722)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107722
Approved by: https://github.com/jansel, https://github.com/cpuhrsch
2023-08-29 04:31:44 +00:00
00eed6f367 Better Error Message for invalid Out_dtype + Bias for scaled_mm (#108097)
# Summary
Fixes an error case that was directly surfacing a raw cuBLASLt error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108097
Approved by: https://github.com/vkuzo
2023-08-29 04:10:17 +00:00
1b2eac00cb [vision hash update] update the pinned vision hash (#108112)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108112
Approved by: https://github.com/pytorchbot
2023-08-29 04:08:05 +00:00
6648880aca Revert "Remove Array.h (#106810)"
This reverts commit 39297eb22f13a92da40ddc79eca5f0fc937bfee1.

Reverted https://github.com/pytorch/pytorch/pull/106810 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but the build is failing precompiled header build in trunk due to a landrace with the revert of https://github.com/pytorch/pytorch/pull/106915 ([comment](https://github.com/pytorch/pytorch/pull/106810#issuecomment-1696702323))
2023-08-29 03:11:13 +00:00
de5ffa8a3a [inductor] Add aten.multinomial to disallowed cudagraphs ops (#108105)
Fixes:
```python
CUDA_LAUNCH_BLOCKING=1 ./benchmarks/dynamo/torchbench.py --inference --performance --no-skip --inductor --freezing --only nanogpt_generate
loading model: 0it [00:00, ?it/s]number of parameters: 123.69M
loading model: 0it [00:07, ?it/s]
cuda eval  nanogpt_generate
ERROR:common:Backend dynamo failed in warmup()
Traceback (most recent call last):
  File "/data/users/jansel/pytorch/torch/_inductor/cudagraph_trees.py", line 1084, in _record
    static_outputs = model(inputs)
  File "/data/users/jansel/pytorch/torch/_inductor/codecache.py", line 401, in _run_from_cache
    return compiled_graph.compiled_artifact(inputs)
  File "/tmp/torchinductor_jansel/db/cdbk4ip3fucyoccnbnoik2crjpdkliwxll653l7l3wwsxiygmade.py", line 18375, in call
    buf239 = aten.multinomial.default(buf238, 1)
  File "/data/users/jansel/pytorch/torch/_ops.py", line 448, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: CUDA error: operation not permitted when stream is capturing
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108105
Approved by: https://github.com/eellison
ghstack dependencies: #108096, #108087, #108098
2023-08-29 02:58:48 +00:00
6d61d74545 [dynamo] Fix setattr nn.Module with new attribute (#108098)
This fixes one (but not all) of the issues in DALLE2_pytorch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108098
Approved by: https://github.com/eellison
ghstack dependencies: #108096, #108087
2023-08-29 02:58:48 +00:00
39297eb22f Remove Array.h (#106810)
Summary: Replaced by std::array

Test Plan: Sandcastle

Differential Revision: D48160261

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106810
Approved by: https://github.com/peterbell10
2023-08-29 02:49:08 +00:00
da54f3c519 reorder proxy / fake modes so they always run last (#104482)
**Update:** Made refactor of the original PR. See the original description below, but here I'll describe the updates:

(1) TLS changes in `TorchDispatchModeTLS.h/cpp`.

I added a `TorchDispatchModeKey` enum, that (for now) just contains PROXY and FAKE. The ModeTLS used to just contain a `std::vector<std::shared_ptr<c10::SafePyObject>>` corresponding to the mode stack. It now **also** contains a separate array of "infra modes", indexed by mode key (PROXY and FAKE, with a new addition, FUNCTIONAL, coming later in the stack).

`TorchDispatchModeTLS::push_onto_stack` and `TorchDispatchModeTLS::pop_stack` are now a bit more complicated. Pushing accepts an optional mode_key, which if set, tells us to add the given mode directly to our "infra_modes" array. Popping will first check the "user mode" stack, before trying to pop anything from the infra mode stack. It also optionally returns the mode key of the mode we popped if there was one - that way if we push that same mode back onto the TLS later, we know where it goes.

`TorchDispatchModeTLS::dispatch_mode_enabled()` now accepts an optional `skip_infra_modes` param, so you can separately query if there are "any modes at all", or if there are "any user modes".

`TorchDispatchModeTLS::get/set/unset_mode()` all take in a mode key, and get/set/unset the mode at that particular mode key (meaning they are only meant to be used for infra modes).

There were also some mild codegen changes to support the new enum

(2) `fake_tensor.py/proxy_tensor.py/_python_dispatch.py`

The way I tell the infra that certain subclasses/modes are "infra" is through the enum: I gave `FakeTensor` and `FakeTensorMode` a `self._mode_key = torch._C.TorchDispatchModeKey.FAKE`. `TorchDispatchMode.__enter/exit__()` (in `_python_dispatch.py` now check if the current mode has a mode key, and if so they plumb it into any `push_onto_stack()` calls (which eventually instructs `TorchDispatchModeTLS` where to put the mode). Same thing for `ProxyTorchDispatchMode`.

I also had to change both of these mode's enter/exit, to handle the fact that there can no longer be multiple proxy/fake modes on the mode stack at once. I updated them both to have a `self.enter_stack: List[Optional[TorchDispatchMode]]` - whenever we push a given mode in `__enter__`, we remove the current ambient fake/proxy mode from the mode stack, and save it in `enter_stack`, so that on exit we can reset the state properly.

(2) dispatching logic in `python_arg_parser.cpp`

This is where the core dispatching logic changes are. I added two helpers, `dispatch_on_subclass()` and `dispatch_on_mode()`. The overall dispatching order is now:
```
(a) dispatch_on_mode()  # try user modes first (where the mode stack automatically considers infra modes last)
(b) dispatch_on_subclass() # try user subclasses next (skipping infra subclasses)
(c) dispatch_on_subclass() # try infra subclasses next (skipping user subclasses)
```

Note that we still want "user subclasses" to run before "infra modes". As Ed helped me realize, this will work today: if proxy/fake modes run in step (a) and see a user subclass, they'll return NotImplemented, allowing us to redispatch to the user subclass.

How do (b) and (c) distinguish between user and infra subclasses? Infra subclasses (FakeTensor, and later FunctionalTensor) are required to have a `_mode_key` hidden on the subclass - so we filter via arguments that do/don't have the _mode_key.

(3) I also changed `DoubleTensor` to `TwoTensor` to minimize confusion (@albanD  pointed out that DoubleTensor would be easily confused with `torch.FloatTensor` and friends).

----- original description below -----

The main purpose of this PR is to fix the "ordering problem" between torch_dispatch modes, where we want to ensure that our Fake and Proxy dispatch modes always run **after** any dispatch modes created by the user, regardless of where they are in the stack. See this doc for more details: https://docs.google.com/document/d/1COQ291nOZvtFnzGTQMJqoYZ3sttEYFw_7HbfSyL8gcA/edit

Full set of changes below. I ended up including a few semi-related changes in this PR that I documented - but if folks would rather I separate them out, happy to try to do that.

**(1) Add dedicated TLS slots for FakeTensorMode and ProxyTensorMode**

This is the main component of this PR. There are two new slots, `TorchDispatchModeTLS.fake_mode_` and `TorchDispatchModeTLS.proxy_mode_`, which correspond to a single "global" fake and proxy mode. There is now an invariant that `torchDispatchModeState.stack_` can never contain either of these modes.

I also added a `TorchDispatchModeTLS::maybe_highest_mode()` helper that consults the `stack_` as well as both the proxy and fake slots, and returns the highest priority mode - this is because there are a few places in the codebase where we legitimately want to get the highest priority mode, *including* fake or proxy, if one is set.

This also made the implementations of the existing `disable_proxy_modes_tracing()` and `get_innermost_proxy_mode()` marginally simpler.

**(2) Updated the dispatching logic in handle_torch_function_no_python_arg_parser()**

This is the function that actually figures out which torch_dispatch implementation to call, given the current mode stack and tensor subclass inputs. This function got marginally more complicated as part of the refactor: First we inspect the mode stack and any non-fake subclass inputs. Then we check for the proxy mode slot. Then we check for the Fake mode slot, before finally checking for any fake subclass inputs.

**(3) new python `_get_fake_tensor_mode()` and `_get_proxy_tensor_mode()` API's**

Before, if you wanted to see if proxy or fake modes were active in python, you would have to consult the mode stack. Since these two modes are no longer part of the actual mode stack, I added two new API's to directly check if either proxy or fake modes are active.

**(4) Allow traceable tensor subclasses to access storages from python**
This is convenient later in the stack, where AOTAutograd needs to detect aliasing of inputs and outputs, where those inputs and outputs might be tensor subclasses. Previously, `x.untyped_storage()` would raise an error if `x` was a subclass. In this PR, I tried to relax this constraint as little as possible: `THPVariable_storage()` will only try to return a storage to python if the tensor subclass that you are passing in is "traceable"

**(5) Fixed subclass fakeification**

@wanchaol recently added support to be able to fakeify tensor subclasses. That fakeification logic works in most cases, but there is one case it doesn't handle: autograd metadata. In particular, since autograd sees our tensor subclasses and not their desugared tensors, we need to make sure that our fakeified subclass has the same autograd metadata as the original subclass. I updated `meta_utils.py` to make sure that the autograd metadata is correct.

**(6) make tensor subclasses resizeable**

Previously we didn't allow tensor subclasses to be resizeable. I ran into an issue where fakeifying a tensor subclass occasionally requires swapping out its storage, which can involve resizing the tensor. Mechanically, this required updating `at::for_blob()` to expose a way to request that the tensor that you create has resizeable storage, and then using this new API in `_make_wrapper_tensor()`.

**(7) Added a basic DoubleTensor subclass for testing**

I use this subclass more later in this stack in my AOTAutograd tests - but it serves as a simple subclass example to test the dispatch ordering in this PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104482
Approved by: https://github.com/ezyang
ghstack dependencies: #107415
2023-08-29 02:36:48 +00:00
5efd63b1b8 better support for fakeifying and dynamoing through torch_dispatch subclasses (with dynamic shapes) (#107415)
There is already some support for plumbing `__torch_dispatch__` tensor subclasses through dynamo, but this PR beefs it up a bit and adds a test. In particular:

(1) Fakeifying tensor subclasses didn't properly set autograd metadata (requires_grad, is_leaf) on the newly fakeified wrapper subclass. I don't actually have a test for this in this PR, but it's tested pretty heavily later in my aot autograd tests

(2) Fakeifying tensor subclasses didn't properly track source information for dynamic shapes on the inner tensors. I added a new `WrapperSubclassFieldSource` subclass, that represents a source coming from a tensor field on a wrapper subclass, which I use in the fakeifying logic, and again in symbolic_shapes.py to generate proper guards.

(3) `_make_wrapper_subclass()` marginally updated this code to work better with dynamic shapes. One thing that's a bit weird about `_make_wrapper_subclass`: it has two overloads, and the first explicitly does not support dynamic shapes (and the second.. does not support kwargs). I think that later we probably want to consolidate / at least make the first overload work with dynamic shapes, but I didn't want to handle that in this PR (so these smaller changes seemed like a strict improvement).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107415
Approved by: https://github.com/ezyang
2023-08-29 02:36:48 +00:00
378ffde8c1 Revert "Remove some unnecessary <iostream> includes from headers (#106914)"
This reverts commit a6c29b722772816804d54eed070fbb38450d3e6f.

Reverted https://github.com/pytorch/pytorch/pull/106914 on behalf of https://github.com/izaitsevfb due to Causing metal breakage internally, see D48709279 ([comment](https://github.com/pytorch/pytorch/pull/106914#issuecomment-1696670027))
2023-08-29 02:22:33 +00:00
2f226804a0 Revert "Minor fixs to make torchbench runable on torch/xla (#107919)"
This reverts commit ed8f21282fca07621836a14f7d517148e1b944c3.

Reverted https://github.com/pytorch/pytorch/pull/107919 on behalf of https://github.com/izaitsevfb due to Conflicts with the revert of 106914 ([comment](https://github.com/pytorch/pytorch/pull/107919#issuecomment-1696662453))
2023-08-29 02:18:07 +00:00
de972529dc [logging] Add more flags to default logs (#107912)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107912
Approved by: https://github.com/mlazos
2023-08-29 01:01:02 +00:00
5251ae6fb7 Explicitly include iostream (#108103)
Summary: Similar to D48568760

Test Plan: Sandcastle

Reviewed By: osalpekar

Differential Revision: D48758708

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108103
Approved by: https://github.com/osalpekar
2023-08-29 00:10:34 +00:00
2d54d4c913 [inductor] Add constant_to_device for ir.Constant (#108087)
Fixes error with:
```
TORCHINDUCTOR_MAX_AUTOTUNE=1 TORCHINDUCTOR_MAX_AUTOTUNE_POINTWISE=1 TORCHINDUCTOR_MAX_AUTOTUNE_GEMM=1 ./benchmarks/dynamo/torchbench.py --inference --performance --no-skip --inductor --freezing --only pyhpc_turbulent_kinetic_energy
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108087
Approved by: https://github.com/eellison
ghstack dependencies: #108096
2023-08-29 00:08:11 +00:00
73235d08c3 [dynamo] Graph break on pack_padded_sequence (#108096)
This is to work around #93501.

Fixes errors in:
```
./benchmarks/dynamo/torchbench.py --inference --performance --no-skip --inductor --freezing --only tacotron2
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108096
Approved by: https://github.com/davidberard98
2023-08-29 00:08:11 +00:00
d4ff06ec84 Revert "Standardize on error types for distributed errors. (#107651)"
This reverts commit 0e2317479b3cb987e1f3230876654f156bd11a09.

Reverted https://github.com/pytorch/pytorch/pull/107651 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing inductor test in trunk for one of its model moco ([comment](https://github.com/pytorch/pytorch/pull/107651#issuecomment-1696578138))
2023-08-28 23:58:33 +00:00
cd4f74fb2e [PT2] - Add check for stack (#108012)
Summary:
Add a check for `guard.stack`, whose absence was causing exceptions like:

```
torch._dynamo.exc.InternalTorchDynamoError: 'NoneType' object has no attribute 'format'
```

Test Plan: contbuild & OSS CI

Differential Revision: D48709458

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108012
Approved by: https://github.com/anijain2305
2023-08-28 23:30:34 +00:00
3488837ec1 Update ruff to v0.0.286 (#108058)
Updates ruff to v0.0.286 and fixes one false negative.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108058
Approved by: https://github.com/albanD
2023-08-28 22:55:56 +00:00
8caa89917b Revert "[ATen] Update pre-compiled header (#106915)"
This reverts commit c68d0a7042e850cebc4cbe7f717fc11aedf6b9d7.

Reverted https://github.com/pytorch/pytorch/pull/106915 on behalf of https://github.com/osalpekar due to Unfortunately there is still a breaking Metal job due to the bottom PR. @kit1980 will help fix this and get this merged ([comment](https://github.com/pytorch/pytorch/pull/106915#issuecomment-1696530828))
2023-08-28 22:51:19 +00:00
9d2ffc5dfa [reland][Dynamo] cache_size policy #107496 (#108069)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108069
Approved by: https://github.com/yanboliang
2023-08-28 22:06:54 +00:00
cd20a89ccc [ROCM] Add ROCm support to debug_dump and enable_debug_mode (#107845)
enable_debug_mode and debug_dump are enabled in ROCM releases.  Add ROCM flags to #if defines so they can be accessed by PyTorch users.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107845
Approved by: https://github.com/pruthvistony, https://github.com/huydhn
2023-08-28 22:03:34 +00:00
0e2317479b Standardize on error types for distributed errors. (#107651)
We have a plethora of error types for various errors raised from c10d. These include `RuntimeError`, `TimeoutError`, `SocketError`, `DistBackendError` etc.

This results in messy code during error handling somewhat like this:
```
if "NCCL" in exception_str:
  ...
if "Timed out initializing process group in store based barrier on rank" in exception_str:
  ...
if "The client socket has timed out after" in exception_str:
  ...
if "Broken pipe" in exception_str:
  ...
if "Connection reset by peer" in exception_str:
  ...
```

To address this issue, in this PR I've added these error types (see the sketch after the list):

1. **DistError** - the base type of all distributed errors
2. **DistBackendError** - this already existed and referred to PG backend errors
3. **DistStoreError** - for errors originating from the store
4. **DistNetworkError** - for general network errors coming from the socket library
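
A sketch of what error handling can look like with this hierarchy (class names from the list above; availability depends on the installed version, and note the revert entry further up this log):

```python
from datetime import timedelta
import torch.distributed as dist

try:
    # A client that cannot reach its master times out quickly:
    dist.TCPStore("localhost", 29400, world_size=2, is_master=False,
                  timeout=timedelta(seconds=1))
except dist.DistNetworkError as e:
    print("network error:", e)
except dist.DistStoreError as e:
    print("store error:", e)
except dist.DistError as e:
    print("other distributed error:", e)
```
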
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107651
Approved by: https://github.com/H-Huang
2023-08-28 21:58:15 +00:00
9fdb5ef26b Skip ROCm jobs on PR (for now) (#108083)
Follow AMD's suggestion to relieve the ROCm queue by letting these jobs run only on trunk.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108083
Approved by: https://github.com/pruthvistony, https://github.com/seemethere
2023-08-28 21:42:31 +00:00
199e23bc3a [quant][be] Clean up QAT tests in test_quantize_pt2e.py (#107991)
Summary: This commit does 4 main things:

1. When verifying QAT numerics, automatically check both the
per tensor and the per channel cases, and automatically verify
convert numerics

2. When verifying the QAT graph, automatically check both the
per tensor and the per channel cases

3. Merge verify graph and verify numerics tests for conv-bn

4. Fix `test_prepare_qat_conv_bn_fusion_getitem_placeholder`,
which was no longer testing the right thing after recent capture
changes, since the maxpool op is no longer followed by a
getitem node. However, we do still need this test for other
ops that *are* followed by getitem nodes (e.g. standalone BN).

Items (1) - (3) make the QAT tests significantly less verbose
and easier to read.

Test Plan:
python test/test_quantization.py TestQuantizePT2E
python test/test_quantization.py TestQuantizePT2EModels

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107991
Approved by: https://github.com/jerryzh168
2023-08-28 21:12:00 +00:00
18a58f0bd6 Implement "RAdamW" optimizer (#107507)
Fixes #107282

## Overview

- The basic design decisions follow those made in #103881 (tensor operations, test cases, order and position of arguments, etc.)
- For the decoupled weight decay algorithm, I referred to [1, 2]

## Backwards-incompatible changes

- positional argument `decoupled_weight_decay` is added to:
    - `torch.optim.radam`

Existing code that refers to this API may be affected.

Note: `decoupled_weight_decay` is also added to `torch.optim.RAdam`; however, since it was added in the last position with a default value, existing callers are not affected.
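
A usage sketch (assuming the keyword lands on `torch.optim.RAdam` as described):

```python
import torch

model = torch.nn.Linear(10, 10)
# decoupled_weight_decay=True selects AdamW-style decoupled decay ("RAdamW");
# the default (False) keeps the original L2-regularization behavior.
opt = torch.optim.RAdam(model.parameters(), lr=1e-3, weight_decay=1e-2,
                        decoupled_weight_decay=True)
```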

## Reference

- [1] [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101)
- [2] https://github.com/LiyuanLucasLiu/RAdam/blob/master/radam/radam.py#L5-L94

## TODO

- [x] implement tensor operation
- [x] implement test cases
- [x] modify doc-string
- [x] pass unit test code locally `python test/test_optim.py -k test_radam`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107507
Approved by: https://github.com/janeyx99
2023-08-28 20:50:25 +00:00
8cbf77585d Revert "[1/N] fix clang-tidy warnings in torch/csrc (#107648)"
This reverts commit 49eeca00d1e76dd0158758f2c29da6b1d06bf54a.

Reverted https://github.com/pytorch/pytorch/pull/107648 on behalf of https://github.com/osalpekar due to This causes breakages due to underspecified type ([comment](https://github.com/pytorch/pytorch/pull/107648#issuecomment-1696372588))
2023-08-28 20:35:12 +00:00
b0d109f29f [ONNX] Bump onnx submodule to 1.14.1; ONNX Runtime 1.16 (#106984)
Bump dependencies:

- ort-nightly 1.16.0.dev20230824005
- onnx 1.14.1rc2
- onnxscript 0.1.0.dev20230825
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106984
Approved by: https://github.com/BowenBao, https://github.com/thiagocrepaldi
2023-08-28 20:11:29 +00:00
bcda859e34 fix typos (#108006)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108006
Approved by: https://github.com/Skylion007
2023-08-28 19:49:09 +00:00
5d85d897e0 Torchrec Enablement Fixes - Re-PR 107910 (#108018)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108018
Approved by: https://github.com/wconstab
2023-08-28 19:47:53 +00:00
73cbe95005 [pt2][autotuning] add logging for failed autotunings (#108034)
Summary: Log autotunings that fail due to CUDA misaligned address errors.

Test Plan: https://www.internalfb.com/intern/daiquery/?queryid=1587758145084896

Differential Revision: D48663354

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108034
Approved by: https://github.com/jansel
2023-08-28 19:44:38 +00:00
182a9cf366 Add Independent Memory Efficient and Flash Attention Build Flags (#107985)
# Summary
In an effort to simplify https://github.com/pytorch/pytorch/pull/105602, this PR pulls out independent chunks of code that can be landed prior to FlashV2 landing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107985
Approved by: https://github.com/cpuhrsch
2023-08-28 18:39:18 +00:00
f0c6e5c91f Fix the use of inputs.build_environment in #107868 (#108075)
It should be `${{ inputs.build_environment }}`, although I wonder why we don't just clean up the artifacts directory for all builds instead of just `aarch64`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108075
Approved by: https://github.com/atalman, https://github.com/seemethere
2023-08-28 18:29:19 +00:00
584a01b650 Fix LayerNorm(bias=False) error (#108060)
Fixes #108048

- [ ] Cherry pick this [here](https://github.com/pytorch/pytorch/issues/108055)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108060
Approved by: https://github.com/jbschlosser, https://github.com/albanD, https://github.com/malfet
2023-08-28 18:23:13 +00:00
cyy
054f3f1d8f [3/N] fix clang-tidy warnings in torch/csrc (#108024)
Apply fixes to some issues found by clang-tidy in torch/csrc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108024
Approved by: https://github.com/Skylion007, https://github.com/albanD, https://github.com/malfet
2023-08-28 18:00:00 +00:00
356b8f6339 [dynamo]bugfix:implement numel() for SizeVariable (#107944)
Fix the issue that `SizeVariable` does not support the `numel()` method.
Fixes #106407
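
A minimal repro sketch of the pattern this enables under Dynamo:

```python
import torch

@torch.compile
def f(x):
    # x.size() is traced as a SizeVariable; calling numel() on it
    # previously failed instead of returning the product of the dims.
    return x.sum() / x.size().numel()

f(torch.randn(2, 3))
```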

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107944
Approved by: https://github.com/Skylion007
2023-08-28 17:54:57 +00:00
7349e8c1a1 Don't use np.random for TorchDynamo (#108009)
Part of https://github.com/pytorch/pytorch/issues/107970

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108009
Approved by: https://github.com/lezcano
2023-08-28 17:18:40 +00:00
a1d8132210 Enable mypy check in torch/_inductor/optimize_indexing.py (#107943)
Fixes #105230

```shell
$ lintrunner init && lintrunner -a torch/_inductor/optimize_indexing.py
...
ok No lint issues.
Successfully applied all patches.
```

```shell
$ mypy torch/_inductor/optimize_indexing.py
Success: no issues found in 1 source file
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107943
Approved by: https://github.com/Skylion007
2023-08-28 17:08:13 +00:00
20f3808aa2 Implement decomposition for aten.tensor_split.tensor_indices_or_sections (#107251)
Summary: Before this change, the tensor_indices_or_sections variant of aten.tensor_split caused a `RuntimeError: The tensor has a non-zero number of elements` because that overload needs to introspect data. Decomposing into one of the other two tensor_split variants fixes the problem.
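
For reference, the two call forms agree, which is what makes the decomposition safe (a sketch):

```python
import torch

x = torch.arange(10)
# tensor_indices_or_sections variant: split points given as a 1-D tensor
a = torch.tensor_split(x, torch.tensor([2, 5]))
# list-of-ints variant, one of the forms the op decomposes into
b = torch.tensor_split(x, [2, 5])
assert all(torch.equal(u, v) for u, v in zip(a, b))
```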

Test Plan:
Enabled tensor_split tests in test/inductor/test_torchinductor_opinfo.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107251
Approved by: https://github.com/ezyang, https://github.com/eellison
2023-08-28 17:01:23 +00:00
010064159b Fix the issue described by #106532 (#108036)
Fixes #106532

As the title says
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108036
Approved by: https://github.com/albanD
2023-08-28 16:23:47 +00:00
c8f7f2659b Two small mem_eff bug fixes (#103201)
# Summary
Upstream two small bug fixes:
* https://github.com/fairinternal/xformers/pull/679
* https://github.com/fairinternal/xformers/pull/681

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103201
Approved by: https://github.com/cpuhrsch
2023-08-28 16:21:47 +00:00
67371c7431 Binary op support for (B, C, *, *) NT with (C, 1, 1) dense (#107890)
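
A sketch of the newly supported broadcast pattern (shapes chosen for illustration):

```python
import torch

# A (B, C, *, *) nested tensor: two samples with C=3 channels and ragged H, W
nt = torch.nested.nested_tensor([torch.randn(3, 4, 5), torch.randn(3, 2, 2)])
scale = torch.randn(3, 1, 1)  # dense (C, 1, 1) operand
out = nt * scale  # broadcasts across each constituent (C, H_i, W_i) tensor
```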
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107890
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #107891, #107892
2023-08-28 15:19:39 +00:00
33d70be95f Binary out-of-place ge.Scalar / eq.Scalar support for NT (#107892)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107892
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #107891
2023-08-28 15:18:37 +00:00
e917d2749a Unary out-of-place sin / cos support for NT (#107891)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107891
Approved by: https://github.com/cpuhrsch
2023-08-28 15:17:34 +00:00
264df88a2d [C10D][Logger]Add more info to c10d logger (#107331)
This PR adds pg_name and world_size to c10d logging.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107331
Approved by: https://github.com/kumpera
2023-08-28 15:10:56 +00:00
dcc674de8e remove step invocation warning (#107216)
Fixes #99734

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107216
Approved by: https://github.com/davidberard98, https://github.com/aaronenyeshi
2023-08-28 14:35:25 +00:00
60bb02a907 Fix fallback FBGEMM implementation for Big Endian systems. (#96422)
This change fixes multiple tests in
test/test_quantization.py::TestQuantizedEmbeddingOps.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96422
Approved by: https://github.com/huydhn
2023-08-28 12:44:12 +00:00
49e964cad6 Automatically turn on dynamo in cond (#108028)
A replacement of https://github.com/pytorch/pytorch/pull/107932.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108028
Approved by: https://github.com/zou3519
ghstack dependencies: #108025, #108026, #108027
2023-08-28 10:16:41 +00:00
6f8eecfb10 Add UncapturedHigherOrderOpError to always raise exceptions for cond. (#108027)
We want cond to always throw errors regardless of the user's torch.compile mode.

The current implementation is to:
1. Catch the UserError.GRAPH_BREAK_IN_CONTROL_FLOW and, once seen, raise directly: once in [break_graph_if_unsupported](bad3f2db40/torch/_dynamo/symbolic_convert.py (L1250)), which catches and raises for call_function (the entry point of higher-order operators) and a few others.
2. The raised exception is caught and raised again in [step](bad3f2db40/torch/_dynamo/symbolic_convert.py (L691)), where all instructions' exceptions are handled.
3. At the top level, we treat it as a hard error and do not suppress it (see the sketch below).
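
A sketch of the intended user-facing behavior (assuming the `cond` entry point from `functorch.experimental.control_flow`, where it lived at the time):

```python
import torch
from functorch.experimental import control_flow

def true_fn(x):
    print("side effect")  # a graph break inside a cond branch
    return x.sin()

def false_fn(x):
    return x.cos()

@torch.compile  # even without fullgraph=True ...
def f(pred, x):
    return control_flow.cond(pred, true_fn, false_fn, [x])

# ... this is expected to raise UncapturedHigherOrderOpError
# rather than silently fall back to eager.
f(torch.tensor(True), torch.randn(3))
```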

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108027
Approved by: https://github.com/zou3519
ghstack dependencies: #108025, #108026
2023-08-28 07:23:03 +00:00
3821 changed files with 241740 additions and 429315 deletions

View File

@ -71,6 +71,9 @@ if [[ "$image" == *cuda* && "$UBUNTU_VERSION" != "22.04" ]]; then
DOCKERFILE="${OS}-cuda/Dockerfile"
elif [[ "$image" == *rocm* ]]; then
DOCKERFILE="${OS}-rocm/Dockerfile"
elif [[ "$image" == *cuda*linter* ]]; then
# Use a separate Dockerfile for linter to keep a small image size
DOCKERFILE="linter-cuda/Dockerfile"
elif [[ "$image" == *linter* ]]; then
# Use a separate Dockerfile for linter to keep a small image size
DOCKERFILE="linter/Dockerfile"
@ -129,35 +132,6 @@ case "$image" in
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-cuda11.8-cudnn8-py3-gcc7)
CUDA_VERSION=11.8.0
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-cuda11.8-cudnn8-py3-gcc7-inductor-benchmarks)
CUDA_VERSION=11.8.0
CUDNN_VERSION=8
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
CONDA_CMAKE=yes
TRITON=yes
INDUCTOR_BENCHMARKS=yes
;;
pytorch-linux-focal-cuda12.1-cudnn8-py3-gcc9)
CUDA_VERSION=12.1.1
CUDNN_VERSION=8
@ -181,13 +155,13 @@ case "$image" in
CONDA_CMAKE=yes
ONNX=yes
;;
pytorch-linux-focal-py3-clang7-android-ndk-r19c)
pytorch-linux-focal-py3-clang9-android-ndk-r21e)
ANACONDA_PYTHON_VERSION=3.8
CLANG_VERSION=7
CLANG_VERSION=9
LLVMDEV=yes
PROTOBUF=yes
ANDROID=yes
ANDROID_NDK_VERSION=r19c
ANDROID_NDK_VERSION=r21e
GRADLE_VERSION=6.8.3
NINJA_VERSION=1.9.0
;;
@ -228,7 +202,7 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=5.4.2
ROCM_VERSION=5.6
NINJA_VERSION=1.9.0
CONDA_CMAKE=yes
TRITON=yes
@ -239,22 +213,11 @@ case "$image" in
PROTOBUF=yes
DB=yes
VISION=yes
ROCM_VERSION=5.6
ROCM_VERSION=5.7
NINJA_VERSION=1.9.0
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-focal-py3.8-gcc7)
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
CONDA_CMAKE=yes
TRITON=yes
DOCS=yes
;;
pytorch-linux-jammy-py3.8-gcc11-inductor-benchmarks)
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=11
@ -286,6 +249,12 @@ case "$image" in
CONDA_CMAKE=yes
TRITON=yes
;;
pytorch-linux-jammy-py3-clang15-asan)
ANACONDA_PYTHON_VERSION=3.10
CLANG_VERSION=15
CONDA_CMAKE=yes
VISION=yes
;;
pytorch-linux-jammy-py3.8-gcc11)
ANACONDA_PYTHON_VERSION=3.8
GCC_VERSION=11
@ -297,6 +266,12 @@ case "$image" in
TRITON=yes
DOCS=yes
;;
pytorch-linux-jammy-py3-clang12-executorch)
ANACONDA_PYTHON_VERSION=3.10
CLANG_VERSION=12
CONDA_CMAKE=yes
EXECUTORCH=yes
;;
pytorch-linux-focal-linter)
# TODO: Use 3.9 here because of this issue https://github.com/python/mypy/issues/13627.
# We will need to update mypy version eventually, but that's for another day. The task
@ -304,6 +279,11 @@ case "$image" in
ANACONDA_PYTHON_VERSION=3.9
CONDA_CMAKE=yes
;;
pytorch-linux-jammy-cuda11.8-cudnn8-py3.9-linter)
ANACONDA_PYTHON_VERSION=3.9
CUDA_VERSION=11.8
CONDA_CMAKE=yes
;;
*)
# Catch-all for builds that are not hardcoded.
PROTOBUF=yes
@ -321,6 +301,9 @@ case "$image" in
extract_version_from_image_name rocm ROCM_VERSION
NINJA_VERSION=1.9.0
TRITON=yes
# To ensure that any ROCm config will build using conda cmake
# and thus have LAPACK/MKL enabled
CONDA_CMAKE=yes
fi
if [[ "$image" == *centos7* ]]; then
NINJA_VERSION=1.10.2
@ -354,14 +337,11 @@ if [[ "$image" == *cuda* && ${OS} == "ubuntu" ]]; then
fi
# Build image
# TODO: build-arg THRIFT is not turned on for any image, remove it once we confirm
# it's no longer needed.
docker build \
--no-cache \
--progress=plain \
--build-arg "BUILD_ENVIRONMENT=${image}" \
--build-arg "PROTOBUF=${PROTOBUF:-}" \
--build-arg "THRIFT=${THRIFT:-}" \
--build-arg "LLVMDEV=${LLVMDEV:-}" \
--build-arg "DB=${DB:-}" \
--build-arg "VISION=${VISION:-}" \
@ -393,6 +373,7 @@ docker build \
--build-arg "ONNX=${ONNX}" \
--build-arg "DOCS=${DOCS}" \
--build-arg "INDUCTOR_BENCHMARKS=${INDUCTOR_BENCHMARKS}" \
--build-arg "EXECUTORCH=${EXECUTORCH}" \
-f $(dirname ${DOCKERFILE})/Dockerfile \
-t "$tmp_tag" \
"$@" \

View File

@ -98,6 +98,18 @@ COPY ./common/install_ninja.sh install_ninja.sh
RUN if [ -n "${NINJA_VERSION}" ]; then bash ./install_ninja.sh; fi
RUN rm install_ninja.sh
ARG TRITON
# Install triton, this needs to be done before sccache because the latter will
# try to reach out to S3, which docker build runners don't have access to
ENV CMAKE_C_COMPILER cc
ENV CMAKE_CXX_COMPILER c++
COPY ./common/install_triton.sh install_triton.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/triton-rocm.txt triton-rocm.txt
COPY triton_version.txt triton_version.txt
RUN if [ -n "${TRITON}" ]; then bash ./install_triton.sh; fi
RUN rm install_triton.sh common_utils.sh triton-rocm.txt triton_version.txt
# Install ccache/sccache (do this last, so we get priority in PATH)
COPY ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH

View File

@ -0,0 +1 @@
ca6322dcfc51b209a06b76d160bd95d81d58f15c

View File

@ -1 +1 @@
4.27.4
6c26faa159b79a42d7fa46cb66e2d21523351987

View File

@ -1 +1 @@
b9d43c7dcac1fe05e851dd7be7187b108af593d2
730b907b4d45a4713cbc425cbf224c46089fd514

View File

@ -1 +1 @@
05d67b9418cacda0d356c2102d7c1a887948b013
dafe1459823b9549417ed95e9720f1b594fab329

View File

@ -1 +1 @@
e6216047b8b0aef1fe8da6ca8667a3ad0a016411
bcad9dabe15021c53b6a88296e9d7a210044f108

View File

@ -9,10 +9,7 @@ install_ubuntu() {
# "$UBUNTU_VERSION" == "18.04"*
# instead of
# "$UBUNTU_VERSION" == "18.04"
if [[ "$UBUNTU_VERSION" == "18.04"* ]]; then
cmake3="cmake=3.10*"
maybe_libiomp_dev="libiomp-dev"
elif [[ "$UBUNTU_VERSION" == "20.04"* ]]; then
if [[ "$UBUNTU_VERSION" == "20.04"* ]]; then
cmake3="cmake=3.16*"
maybe_libiomp_dev=""
elif [[ "$UBUNTU_VERSION" == "22.04"* ]]; then
@ -23,7 +20,9 @@ install_ubuntu() {
maybe_libiomp_dev="libiomp-dev"
fi
if [[ "$CLANG_VERSION" == 12 ]]; then
if [[ "$CLANG_VERSION" == 15 ]]; then
maybe_libomp_dev="libomp-15-dev"
elif [[ "$CLANG_VERSION" == 12 ]]; then
maybe_libomp_dev="libomp-12-dev"
elif [[ "$CLANG_VERSION" == 10 ]]; then
maybe_libomp_dev="libomp-10-dev"

View File

@ -54,23 +54,13 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
CONDA_COMMON_DEPS="astunparse pyyaml mkl=2021.4.0 mkl-include=2021.4.0 setuptools"
if [ "$ANACONDA_PYTHON_VERSION" = "3.11" ]; then
conda_install numpy=1.23.5 ${CONDA_COMMON_DEPS}
elif [ "$ANACONDA_PYTHON_VERSION" = "3.10" ]; then
conda_install numpy=1.21.2 ${CONDA_COMMON_DEPS}
elif [ "$ANACONDA_PYTHON_VERSION" = "3.9" ]; then
conda_install numpy=1.21.2 ${CONDA_COMMON_DEPS}
elif [ "$ANACONDA_PYTHON_VERSION" = "3.8" ]; then
conda_install numpy=1.21.2 ${CONDA_COMMON_DEPS}
else
# Install `typing-extensions` for 3.7
conda_install numpy=1.21.2 ${CONDA_COMMON_DEPS} typing-extensions
conda_install numpy=1.21.2 ${CONDA_COMMON_DEPS}
fi
# This is only supported in 3.8 upward
if [ "$MINOR_PYTHON_VERSION" -gt "7" ]; then
# Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
# and libpython-static for torch deploy
conda_install llvmdev=8.0.0 "libpython-static=${ANACONDA_PYTHON_VERSION}"
fi
# Install llvm-8 as it is required to compile llvmlite-0.30.0 from source
# and libpython-static for torch deploy
conda_install llvmdev=8.0.0 "libpython-static=${ANACONDA_PYTHON_VERSION}"
# Use conda cmake in some cases. Conda cmake will be newer than our supported
# min version (3.5 for xenial and 3.10 for bionic), so we only do it in those
@ -89,13 +79,7 @@ if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
# Install some other packages, including those needed for Python test reporting
pip_install -r /opt/conda/requirements-ci.txt
# Update scikit-learn to a python-3.8 compatible version
if [[ $(python -c "import sys; print(int(sys.version_info >= (3, 8)))") == "1" ]]; then
pip_install -U scikit-learn
else
# Pinned scikit-learn due to https://github.com/scikit-learn/scikit-learn/issues/14485 (affects gcc 5.5 only)
pip_install scikit-learn==0.20.3
fi
pip_install -U scikit-learn
if [ -n "$DOCS" ]; then
apt-get update

View File

@ -0,0 +1,62 @@
#!/bin/bash
set -ex
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
clone_executorch() {
EXECUTORCH_PINNED_COMMIT=$(get_pinned_commit executorch)
# Clone the Executorch
git clone https://github.com/pytorch/executorch.git
# and fetch the target commit
pushd executorch
git checkout "${EXECUTORCH_PINNED_COMMIT}"
git submodule update --init
popd
chown -R jenkins executorch
}
install_buck2() {
pushd executorch/.ci/docker
BUCK2_VERSION=$(cat ci_commit_pins/buck2.txt)
source common/install_buck.sh
popd
}
install_conda_dependencies() {
pushd executorch/.ci/docker
# Install conda dependencies like flatbuffer
conda_install --file conda-env-ci.txt
popd
}
install_pip_dependencies() {
pushd executorch/.ci/docker
# Install all Python dependencies
pip_install -r requirements-ci.txt
popd
}
setup_executorch() {
pushd executorch
source .ci/scripts/utils.sh
install_flatc_from_source
pip_install .
build_executorch_runner "cmake"
# Make sure that all the newly generate files are owned by Jenkins
chown -R jenkins .
popd
}
clone_executorch
install_buck2
install_conda_dependencies
install_pip_dependencies
setup_executorch

View File

@ -6,23 +6,21 @@ source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
function install_huggingface() {
local version
version=$(get_pinned_commit huggingface)
pip_install pandas
pip_install scipy
pip_install z3-solver
pip_install "transformers==${version}"
commit=$(get_pinned_commit huggingface)
pip_install pandas==2.0.3
pip_install "git+https://github.com/huggingface/transformers@${commit}"
}
function install_timm() {
local commit
commit=$(get_pinned_commit timm)
pip_install pandas
pip_install scipy
pip_install z3-solver
pip_install "git+https://github.com/rwightman/pytorch-image-models@${commit}"
pip_install pandas==2.0.3
pip_install "git+https://github.com/huggingface/pytorch-image-models@${commit}"
# Clean up
conda_run pip uninstall -y cmake torch torchvision triton
}
# Pango is needed for weasyprint which is needed for doctr
conda_install pango
install_huggingface
# install_timm
install_timm

.ci/docker/common/install_onnx.sh (23 lines changed) Normal file → Executable file
View File

@ -4,36 +4,35 @@ set -ex
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"
retry () {
"$@" || (sleep 10 && "$@") || (sleep 20 && "$@") || (sleep 40 && "$@")
}
# A bunch of custom pip dependencies for ONNX
pip_install \
beartype==0.10.4 \
beartype==0.15.0 \
filelock==3.9.0 \
flatbuffers==2.0 \
mock==5.0.1 \
ninja==1.10.2 \
networkx==2.0 \
numpy==1.22.4
numpy==1.24.2
# ONNXRuntime should be installed before installing
# onnx-weekly. Otherwise, onnx-weekly could be
# overwritten by onnx.
pip_install \
onnxruntime==1.15.1 \
parameterized==0.8.1 \
pytest-cov==4.0.0 \
pytest-subtests==0.10.0 \
tabulate==0.9.0 \
transformers==4.31.0
transformers==4.32.1
# Using 1.15dev branch for the following not yet released features and fixes.
# - Segfault fix for shape inference.
# - Inliner to workaround ORT segfault.
pip_install onnx-weekly==1.15.0.dev20230717
pip_install coloredlogs packaging
retry pip_install -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ --no-cache-dir --no-input ort-nightly==1.17.0.dev20231005006
# TODO: change this when onnx-script is on testPypi
# pip_install onnxscript-preview==0.1.0.dev20230809 --no-deps
# NOTE: temp change for CI to run on unpublished onnxscript PR.
pip_install "onnxscript@git+https://github.com/microsoft/onnxscript@f69be19ebd3f2e0d7efe64b0c7be3329cbab3822" --no-deps
pip_install -i https://test.pypi.org/simple/ onnx==1.15.0rc2
pip_install onnxscript==0.1.0.dev20231128 --no-deps
# Cache the transformers model to be used later by ONNX tests. We need to run the transformers
# package to download the model. By default, the model is cached at ~/.cache/huggingface/hub/

View File

@ -5,8 +5,10 @@ set -ex
# "install" hipMAGMA into /opt/rocm/magma by copying after build
git clone https://bitbucket.org/icl/magma.git
pushd magma
# Fixes memory leaks of magma found while executing linalg UTs
git checkout 28592a7170e4b3707ed92644bf4a689ed600c27f
# Version 2.7.2 + ROCm related updates
git checkout 823531632140d0edcb7e77c3edc0e837421471c5
cp make.inc-examples/make.inc.hip-gcc-mkl make.inc
echo 'LIBDIR += -L$(MKLROOT)/lib' >> make.inc
echo 'LIB += -Wl,--enable-new-dtags -Wl,--rpath,/opt/rocm/lib -Wl,--rpath,$(MKLROOT)/lib -Wl,--rpath,/opt/rocm/magma/lib' >> make.inc

View File

@ -1,14 +0,0 @@
apt-get update
apt-get install -y sudo wget libboost-dev libboost-test-dev libboost-program-options-dev libboost-filesystem-dev libboost-thread-dev libevent-dev automake libtool flex bison pkg-config g++ libssl-dev
wget https://www-us.apache.org/dist/thrift/0.12.0/thrift-0.12.0.tar.gz
tar -xvf thrift-0.12.0.tar.gz
cd thrift-0.12.0
for file in ./compiler/cpp/Makefile*; do
sed -i 's/\-Werror//' $file
done
./bootstrap.sh
./configure --without-php --without-java --without-python --without-nodejs --without-go --without-ruby
sudo make
sudo make install
cd ..
rm thrift-0.12.0.tar.gz

View File

@ -23,8 +23,10 @@ fi
# The logic here is copied from .ci/pytorch/common_utils.sh
TRITON_PINNED_COMMIT=$(get_pinned_commit ${TRITON_TEXT_FILE})
apt update
apt-get install -y gpg-agent
if [ -n "${UBUNTU_VERSION}" ];then
apt update
apt-get install -y gpg-agent
fi
if [ -n "${CONDA_CMAKE}" ]; then
# Keep the current cmake and numpy version here, so we can reinstall them later
@ -36,12 +38,12 @@ if [ -z "${MAX_JOBS}" ]; then
export MAX_JOBS=$(nproc)
fi
if [ -n "${GCC_VERSION}" ] && [[ "${GCC_VERSION}" == "7" ]]; then
if [ -n "${UBUNTU_VERSION}" ] && [ -n "${GCC_VERSION}" ] && [[ "${GCC_VERSION}" == "7" ]]; then
# Triton needs at least gcc-9 to build
apt-get install -y g++-9
CXX=g++-9 pip_install "git+${TRITON_REPO}@${TRITON_PINNED_COMMIT}#subdirectory=python"
elif [ -n "${CLANG_VERSION}" ]; then
elif [ -n "${UBUNTU_VERSION}" ] && [ -n "${CLANG_VERSION}" ]; then
# Triton needs <filesystem> which surprisingly is not available with clang-9 toolchain
add-apt-repository -y ppa:ubuntu-toolchain-r/test
apt-get install -y g++-9

View File

@ -0,0 +1,44 @@
ARG UBUNTU_VERSION
FROM ubuntu:${UBUNTU_VERSION}
ARG UBUNTU_VERSION
ENV DEBIAN_FRONTEND noninteractive
# Install common dependencies (so that this step can be cached separately)
COPY ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
# Install missing libomp-dev
RUN apt-get update && apt-get install -y --no-install-recommends libomp-dev && apt-get autoclean && apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
# Install user
COPY ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install conda and other packages (e.g., numpy, pytest)
ARG ANACONDA_PYTHON_VERSION
ARG CONDA_CMAKE
ENV ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION
ENV PATH /opt/conda/envs/py_$ANACONDA_PYTHON_VERSION/bin:/opt/conda/bin:$PATH
COPY requirements-ci.txt /opt/conda/requirements-ci.txt
COPY ./common/install_conda.sh install_conda.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_conda.sh && rm install_conda.sh common_utils.sh /opt/conda/requirements-ci.txt
# Install cuda and cudnn
ARG CUDA_VERSION
RUN wget -q https://raw.githubusercontent.com/pytorch/builder/main/common/install_cuda.sh -O install_cuda.sh
RUN bash ./install_cuda.sh ${CUDA_VERSION} && rm install_cuda.sh
ENV DESIRED_CUDA ${CUDA_VERSION}
ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:$PATH
# Note that Docker build forbids copying files from outside the build context
COPY ./common/install_linter.sh install_linter.sh
COPY ./common/common_utils.sh common_utils.sh
RUN bash ./install_linter.sh
RUN rm install_linter.sh common_utils.sh
USER jenkins
CMD ["bash"]

View File

@ -75,10 +75,10 @@ librosa>=0.6.2 ; python_version < "3.11"
#Pinned versions:
#test that import:
mypy==1.4.1
mypy==1.7.0
# Pin MyPy version because new errors are likely to appear with each release
#Description: linter
#Pinned versions: 1.4.1
#Pinned versions: 1.7.0
#test that import: test_typing.py, test_type_hints.py
networkx==2.8.8
@ -124,10 +124,22 @@ opt-einsum==3.3
#Pinned versions: 3.3
#test that import: test_linalg.py
pillow==9.3.0 ; python_version <= "3.8"
pillow==9.5.0 ; python_version > "3.8"
optree==0.9.1
#Description: A library for tree manipulation
#Pinned versions: 0.9.1
#test that import: test_vmap.py, test_aotdispatch.py, test_dynamic_shapes.py,
#test_pytree.py, test_ops.py, test_control_flow.py, test_modules.py,
#common_utils.py, test_eager_transforms.py, test_python_dispatch.py,
#test_expanded_weights.py, test_decomp.py, test_overrides.py, test_masked.py,
#test_ops.py, test_prims.py, test_subclass.py, test_functionalization.py,
#test_schema_check.py, test_profiler_tree.py, test_meta.py, test_torchxla_num_output.py,
#test_utils.py, test_proxy_tensor.py, test_memory_profiler.py, test_view_ops.py,
#test_pointwise_ops.py, test_dtensor_ops.py, test_torchinductor.py, test_fx.py,
#test_fake_tensor.py, test_mps.py
pillow==10.0.1
#Description: Python Imaging Library fork
#Pinned versions:
#Pinned versions: 10.0.1
#test that import:
protobuf==3.20.2
@ -271,7 +283,18 @@ pytest-cpp==2.3.0
#Pinned versions: 2.3.0
#test that import:
z3-solver
z3-solver==4.12.2.0
#Description: The Z3 Theorem Prover Project
#Pinned versions:
#test that import:
tensorboard==2.13.0
#Description: Also included in .ci/docker/requirements-docs.txt
#Pinned versions:
#test that import: test_tensorboard
pywavelets==1.4.1
#Description: This is a requirement of scikit-image, we need to pin
# it here because 1.5.0 conflicts with numpy 1.21.2 used in CI
#Pinned versions: 1.4.1
#test that import:

View File

@ -79,12 +79,6 @@ ENV OPENSSL_ROOT_DIR /opt/openssl
RUN bash ./install_openssl.sh
ENV OPENSSL_DIR /opt/openssl
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
COPY ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
ARG INDUCTOR_BENCHMARKS
COPY ./common/install_inductor_benchmark_deps.sh install_inductor_benchmark_deps.sh
COPY ./common/common_utils.sh common_utils.sh
@ -93,6 +87,12 @@ COPY ci_commit_pins/timm.txt timm.txt
RUN if [ -n "${INDUCTOR_BENCHMARKS}" ]; then bash ./install_inductor_benchmark_deps.sh; fi
RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface.txt
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
COPY ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
ARG TRITON
# Install triton, this needs to be done before sccache because the latter will
# try to reach out to S3, which docker build runners don't have access to

View File

@ -17,13 +17,6 @@ ARG LLVMDEV
COPY ./common/install_clang.sh install_clang.sh
RUN bash ./install_clang.sh && rm install_clang.sh
# (optional) Install thrift.
ARG THRIFT
COPY ./common/install_thrift.sh install_thrift.sh
RUN if [ -n "${THRIFT}" ]; then bash ./install_thrift.sh; fi
RUN rm install_thrift.sh
ENV INSTALLED_THRIFT ${THRIFT}
# Install user
COPY ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
@ -153,6 +146,14 @@ COPY ci_commit_pins/triton.txt triton.txt
RUN if [ -n "${TRITON}" ]; then bash ./install_triton.sh; fi
RUN rm install_triton.sh common_utils.sh triton.txt
ARG EXECUTORCH
# Build and install executorch
COPY ./common/install_executorch.sh install_executorch.sh
COPY ./common/common_utils.sh common_utils.sh
COPY ci_commit_pins/executorch.txt executorch.txt
RUN if [ -n "${EXECUTORCH}" ]; then bash ./install_executorch.sh; fi
RUN rm install_executorch.sh common_utils.sh executorch.txt
ARG ONNX
# Install ONNX dependencies
COPY ./common/install_onnx.sh ./common/common_utils.sh ./

View File

@ -3,11 +3,6 @@
# shellcheck source=./common.sh
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# Use to retry ONNX test, only retry it twice
retry () {
"$@" || (sleep 60 && "$@")
}
if [[ "$BUILD_ENVIRONMENT" == *onnx* ]]; then
# TODO: This can be removed later once vision is also part of the Docker image
pip install -q --user --no-use-pep517 "git+https://github.com/pytorch/vision.git@$(cat .github/ci_commit_pins/vision.txt)"
@ -16,5 +11,5 @@ if [[ "$BUILD_ENVIRONMENT" == *onnx* ]]; then
# NB: ONNX test is fast (~15m) so it's ok to retry it few more times to avoid any flaky issue, we
# need to bring this to the standard PyTorch run_test eventually. The issue will be tracked in
# https://github.com/pytorch/pytorch/issues/98626
retry "$ROOT_DIR/scripts/onnx/test.sh"
"$ROOT_DIR/scripts/onnx/test.sh"
fi

View File

@ -63,6 +63,12 @@ else
export LLVM_DIR=/opt/llvm/lib/cmake/llvm
fi
if [[ "$BUILD_ENVIRONMENT" == *executorch* ]]; then
# To build test_edge_op_registration
export BUILD_EXECUTORCH=ON
export USE_CUDA=0
fi
if ! which conda; then
# In ROCm CIs, we are doing cross compilation on build machines with
# intel cpu and later run tests on machines with amd cpu.
@ -159,6 +165,14 @@ if [[ "$BUILD_ENVIRONMENT" == *cuda* && -z "$TORCH_CUDA_ARCH_LIST" ]]; then
exit 1
fi
# We only build FlashAttention files for CUDA 8.0+, and they require large amounts of
# memory to build and will OOM
if [[ "$BUILD_ENVIRONMENT" == *cuda* ]] && [[ "$TORCH_CUDA_ARCH_LIST" == *"8.6"* || "$TORCH_CUDA_ARCH_LIST" == *"8.0"* ]]; then
echo "WARNING: FlashAttention files require large amounts of memory to build and will OOM"
echo "Setting MAX_JOBS=(nproc-2)/3 to reduce memory usage"
export MAX_JOBS="$(( $(nproc --ignore=2) / 3 ))"
fi
if [[ "${BUILD_ENVIRONMENT}" == *clang* ]]; then
export CC=clang
export CXX=clang++
@ -168,7 +182,6 @@ if [[ "$BUILD_ENVIRONMENT" == *-clang*-asan* ]]; then
export LDSHARED="clang --shared"
export USE_CUDA=0
export USE_ASAN=1
export USE_MKLDNN=0
export UBSAN_FLAGS="-fno-sanitize-recover=all;-fno-sanitize=float-divide-by-zero;-fno-sanitize=float-cast-overflow"
unset USE_LLVM
fi

View File

@ -43,7 +43,7 @@ function assert_git_not_dirty() {
# TODO: we should add an option to `build_amd.py` that reverts the repo to
# an unmodified state.
if [[ "$BUILD_ENVIRONMENT" != *rocm* ]] && [[ "$BUILD_ENVIRONMENT" != *xla* ]] ; then
git_status=$(git status --porcelain)
git_status=$(git status --porcelain | grep -v '?? third_party' || true)
if [[ $git_status ]]; then
echo "Build left local git repository checkout dirty"
echo "git status --porcelain:"
@ -171,13 +171,6 @@ function install_torchrec_and_fbgemm() {
pip_install --no-use-pep517 --user "git+https://github.com/pytorch/torchrec.git@${torchrec_commit}"
}
function install_numpy_pytorch_interop() {
local commit
commit=$(get_pinned_commit numpy_pytorch_interop)
# TODO: --no-use-pep517 will result in failure.
pip_install --user "git+https://github.com/Quansight-Labs/numpy_pytorch_interop.git@${commit}"
}
function clone_pytorch_xla() {
if [[ ! -d ./xla ]]; then
git clone --recursive --quiet https://github.com/pytorch/xla.git
@ -212,15 +205,6 @@ function test_torch_deploy(){
popd
}
function install_timm() {
local commit
commit=$(get_pinned_commit timm)
pip_install pandas
pip_install scipy
pip_install z3-solver
pip_install "git+https://github.com/rwightman/pytorch-image-models@${commit}"
}
function checkout_install_torchbench() {
local commit
commit=$(get_pinned_commit torchbench)

View File

@ -43,7 +43,7 @@ cross_compile_arm64() {
compile_arm64() {
# Compilation for arm64
# TODO: Compile with OpenMP support (but this causes CI regressions as cross-compilation were done with OpenMP disabled)
USE_DISTRIBUTED=0 USE_OPENMP=0 MACOSX_DEPLOYMENT_TARGET=11.0 WERROR=1 BUILD_TEST=OFF USE_PYTORCH_METAL=1 python setup.py bdist_wheel
USE_DISTRIBUTED=0 USE_OPENMP=1 MACOSX_DEPLOYMENT_TARGET=11.0 WERROR=1 BUILD_TEST=OFF USE_PYTORCH_METAL=1 python setup.py bdist_wheel
}
compile_x86_64() {

View File

@ -36,10 +36,12 @@ time python test/run_test.py --verbose -i distributed/test_functional_api
# DTensor tests
time python test/run_test.py --verbose -i distributed/_tensor/test_device_mesh
time python test/run_test.py --verbose -i distributed/_tensor/test_random_ops
time python test/run_test.py --verbose -i distributed/_tensor/test_dtensor_compile
# DeviceMesh test
time python test/run_test.py --verbose -i distributed/test_device_mesh
# DTensor/TP tests
time python test/run_test.py --verbose -i distributed/tensor/parallel/test_ddp_2d_parallel
time python test/run_test.py --verbose -i distributed/tensor/parallel/test_fsdp_2d_parallel

View File

@ -80,6 +80,11 @@ if [[ "$BUILD_ENVIRONMENT" != *bazel* ]]; then
CUSTOM_TEST_ARTIFACT_BUILD_DIR=$(realpath "${CUSTOM_TEST_ARTIFACT_BUILD_DIR:-"build/custom_test_artifacts"}")
fi
# Reduce set of tests to include when running run_test.py
if [[ -n $TESTS_TO_INCLUDE ]]; then
echo "Setting INCLUDE_CLAUSE"
INCLUDE_CLAUSE="--include $TESTS_TO_INCLUDE"
fi
# shellcheck source=./common.sh
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
@ -148,7 +153,7 @@ if [[ "$BUILD_ENVIRONMENT" == *asan* ]]; then
export PYTORCH_TEST_WITH_ASAN=1
export PYTORCH_TEST_WITH_UBSAN=1
# TODO: Figure out how to avoid hard-coding these paths
export ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-12/bin/llvm-symbolizer
export ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-15/bin/llvm-symbolizer
export TORCH_USE_RTLD_GLOBAL=1
# NB: We load libtorch.so with RTLD_GLOBAL for UBSAN, unlike our
# default behavior.
@ -182,7 +187,7 @@ if [[ "$BUILD_ENVIRONMENT" == *asan* ]]; then
# have, and it applies to child processes.
# TODO: get rid of the hardcoded path
export LD_PRELOAD=/usr/lib/llvm-12/lib/clang/12.0.1/lib/linux/libclang_rt.asan-x86_64.so
export LD_PRELOAD=/usr/lib/llvm-15/lib/clang/15.0.7/lib/linux/libclang_rt.asan-x86_64.so
# Disable valgrind for asan
export VALGRIND=OFF
# Increase stack size, because ASAN red zones use more stack
@ -228,13 +233,16 @@ test_python_shard() {
exit 1
fi
time python test/run_test.py --exclude-jit-executor --exclude-distributed-tests --shard "$1" "$NUM_TEST_SHARDS" --verbose
# Bare --include flag is not supported and quoting for lint ends up with flag not being interpreted correctly
# shellcheck disable=SC2086
time python test/run_test.py --exclude-jit-executor --exclude-distributed-tests $INCLUDE_CLAUSE --shard "$1" "$NUM_TEST_SHARDS" --verbose
assert_git_not_dirty
}
test_python() {
time python test/run_test.py --exclude-jit-executor --exclude-distributed-tests --verbose
# shellcheck disable=SC2086
time python test/run_test.py --exclude-jit-executor --exclude-distributed-tests $INCLUDE_CLAUSE --verbose
assert_git_not_dirty
}
@ -281,6 +289,10 @@ test_inductor_distributed() {
# Smuggle a few multi-gpu tests here so that we don't have to request another large node
echo "Testing multi_gpu tests in test_torchinductor"
pytest test/inductor/test_torchinductor.py -k test_multi_gpu
pytest test/inductor/test_aot_inductor.py -k test_non_default_cuda_device
pytest test/inductor/test_aot_inductor.py -k test_replicate_on_devices
pytest test/distributed/_tensor/test_dtensor_compile.py
pytest test/distributed/tensor/parallel/test_fsdp_2d_parallel.py
# this runs on both single-gpu and multi-gpu instance. It should be smart about skipping tests that aren't supported
# with if required # gpus aren't available
@ -303,14 +315,17 @@ test_inductor() {
# "Global" flags for inductor benchmarking controlled by TEST_CONFIG
# For example 'dynamic_aot_eager_torchbench' TEST_CONFIG means we run
# the benchmark script with '--dynamic-shapes --backend aot_eager --device cuda'
# The matrix of test options is specified in .github/workflows/periodic.yml
# and .github/workflows/inductor.yml
# The matrix of test options is specified in .github/workflows/inductor.yml,
# .github/workflows/inductor-periodic.yml, and
# .github/workflows/inductor-perf-test-nightly.yml
DYNAMO_BENCHMARK_FLAGS=()
if [[ "${TEST_CONFIG}" == *dynamo_eager* ]]; then
DYNAMO_BENCHMARK_FLAGS+=(--backend eager)
elif [[ "${TEST_CONFIG}" == *aot_eager* ]]; then
DYNAMO_BENCHMARK_FLAGS+=(--backend aot_eager)
elif [[ "${TEST_CONFIG}" == *aot_inductor* ]]; then
DYNAMO_BENCHMARK_FLAGS+=(--export-aot-inductor)
elif [[ "${TEST_CONFIG}" == *inductor* && "${TEST_CONFIG}" != *perf* ]]; then
DYNAMO_BENCHMARK_FLAGS+=(--inductor)
fi
@ -319,7 +334,7 @@ if [[ "${TEST_CONFIG}" == *dynamic* ]]; then
DYNAMO_BENCHMARK_FLAGS+=(--dynamic-shapes --dynamic-batch-only)
fi
if [[ "${TEST_CONFIG}" == *cpu_accuracy* ]]; then
if [[ "${TEST_CONFIG}" == *cpu_inductor* ]]; then
DYNAMO_BENCHMARK_FLAGS+=(--device cpu)
else
DYNAMO_BENCHMARK_FLAGS+=(--device cuda)
@ -383,6 +398,11 @@ test_perf_for_dashboard() {
"${target_flag[@]}" --"$mode" --"$dtype" --backend "$backend" "$@" --freezing \
--output "$TEST_REPORTS_DIR/${backend}_with_cudagraphs_freezing_${suite}_${dtype}_${mode}_cuda_${target}.csv"
fi
if [[ "$DASHBOARD_TAG" == *freeze_autotune_cudagraphs-true* ]] && [[ "$mode" == "inference" ]]; then
TORCHINDUCTOR_MAX_AUTOTUNE=1 python "benchmarks/dynamo/$suite.py" \
"${target_flag[@]}" --"$mode" --"$dtype" --backend "$backend" "$@" --freezing \
--output "$TEST_REPORTS_DIR/${backend}_with_cudagraphs_freezing_autotune_${suite}_${dtype}_${mode}_cuda_${target}.csv"
fi
if [[ "$DASHBOARD_TAG" == *aotinductor-true* ]] && [[ "$mode" == "inference" ]]; then
python "benchmarks/dynamo/$suite.py" \
"${target_flag[@]}" --"$mode" --"$dtype" --export-aot-inductor --disable-cudagraphs "$@" \
@ -433,19 +453,12 @@ test_single_dynamo_benchmark() {
"${DYNAMO_BENCHMARK_FLAGS[@]}" \
"$@" "${partition_flags[@]}" \
--output "$TEST_REPORTS_DIR/${name}_${suite}.csv"
if [[ "${TEST_CONFIG}" == *inductor* ]] && [[ "${TEST_CONFIG}" != *cpu_accuracy* ]]; then
# other jobs (e.g. periodic, cpu-accuracy) may have different set of expected models.
python benchmarks/dynamo/check_accuracy.py \
--actual "$TEST_REPORTS_DIR/${name}_$suite.csv" \
--expected "benchmarks/dynamo/ci_expected_accuracy/${TEST_CONFIG}_${name}.csv"
python benchmarks/dynamo/check_graph_breaks.py \
--actual "$TEST_REPORTS_DIR/${name}_$suite.csv" \
--expected "benchmarks/dynamo/ci_expected_accuracy/${TEST_CONFIG}_${name}.csv"
else
python benchmarks/dynamo/check_csv.py \
-f "$TEST_REPORTS_DIR/${name}_${suite}.csv"
fi
python benchmarks/dynamo/check_accuracy.py \
--actual "$TEST_REPORTS_DIR/${name}_$suite.csv" \
--expected "benchmarks/dynamo/ci_expected_accuracy/${TEST_CONFIG}_${name}.csv"
python benchmarks/dynamo/check_graph_breaks.py \
--actual "$TEST_REPORTS_DIR/${name}_$suite.csv" \
--expected "benchmarks/dynamo/ci_expected_accuracy/${TEST_CONFIG}_${name}.csv"
fi
}
@ -463,8 +476,10 @@ test_dynamo_benchmark() {
elif [[ "${TEST_CONFIG}" == *perf* ]]; then
test_single_dynamo_benchmark "dashboard" "$suite" "$shard_id" "$@"
else
if [[ "${TEST_CONFIG}" == *cpu_accuracy* ]]; then
if [[ "${TEST_CONFIG}" == *cpu_inductor* ]]; then
test_single_dynamo_benchmark "inference" "$suite" "$shard_id" --inference --float32 "$@"
elif [[ "${TEST_CONFIG}" == *aot_inductor* ]]; then
test_single_dynamo_benchmark "inference" "$suite" "$shard_id" --inference --bfloat16 "$@"
else
test_single_dynamo_benchmark "inference" "$suite" "$shard_id" --inference --bfloat16 "$@"
test_single_dynamo_benchmark "training" "$suite" "$shard_id" --training --amp "$@"
@ -479,9 +494,17 @@ test_inductor_torchbench_smoketest_perf() {
python benchmarks/dynamo/torchbench.py --device cuda --performance --backend inductor --float16 --training \
--batch-size-file "$(realpath benchmarks/dynamo/torchbench_models_list.txt)" --only hf_Bert \
--output "$TEST_REPORTS_DIR/inductor_training_smoketest.csv"
# the reference speedup value is hardcoded in check_hf_bert_perf_csv.py
# this value needs to be actively maintained to make this check useful
python benchmarks/dynamo/check_hf_bert_perf_csv.py -f "$TEST_REPORTS_DIR/inductor_training_smoketest.csv"
# The threshold value needs to be actively maintained to make this check useful
python benchmarks/dynamo/check_perf_csv.py -f "$TEST_REPORTS_DIR/inductor_training_smoketest.csv" -t 1.4
python benchmarks/dynamo/torchbench.py --device cuda --performance --bfloat16 --inference \
--export-aot-inductor --only nanogpt --output "$TEST_REPORTS_DIR/inductor_inference_smoketest.csv"
# The threshold value needs to be actively maintained to make this check useful
# The perf number of nanogpt seems not very stable, e.g.
# https://github.com/pytorch/pytorch/actions/runs/7158691360/job/19491437314,
# and thus we lower its threshold to reduce flakiness. If this continues to be a problem,
# we switch to use some other model.
python benchmarks/dynamo/check_perf_csv.py -f "$TEST_REPORTS_DIR/inductor_inference_smoketest.csv" -t 4.9
# Check memory compression ratio for a few models
for test in hf_Albert timm_vision_transformer; do
@ -544,6 +567,10 @@ test_without_numpy() {
python -c "import sys;sys.path.insert(0, 'fake_numpy');from unittest import TestCase;import torch;x=torch.randn(3,3);TestCase().assertRaises(RuntimeError, lambda: x.numpy())"
# Regression test for https://github.com/pytorch/pytorch/issues/66353
python -c "import sys;sys.path.insert(0, 'fake_numpy');import torch;print(torch.tensor([torch.tensor(0.), torch.tensor(1.)]))"
# Regression test for https://github.com/pytorch/pytorch/issues/109387
if [[ "${TEST_CONFIG}" == *dynamo* ]]; then
python -c "import sys;sys.path.insert(0, 'fake_numpy');import torch;torch.compile(lambda x:print(x))('Hello World')"
fi
popd
}
@ -601,7 +628,7 @@ test_libtorch_jit() {
# Run jit and lazy tensor cpp tests together to finish them faster
if [[ "$BUILD_ENVIRONMENT" == *cuda* && "$TEST_CONFIG" != *nogpu* ]]; then
LTC_TS_CUDA=1 python test/run_test.py --cpp --verbose -i cpp/test_jit cpp/nvfuser_tests cpp/test_lazy
LTC_TS_CUDA=1 python test/run_test.py --cpp --verbose -i cpp/test_jit cpp/test_lazy
else
# CUDA tests have already been skipped when CUDA is not available
python test/run_test.py --cpp --verbose -i cpp/test_jit cpp/test_lazy -k "not CUDA"
@ -662,7 +689,8 @@ test_vulkan() {
test_distributed() {
echo "Testing distributed python tests"
time python test/run_test.py --distributed-tests --shard "$SHARD_NUMBER" "$NUM_TEST_SHARDS" --verbose
# shellcheck disable=SC2086
time python test/run_test.py --distributed-tests --shard "$SHARD_NUMBER" "$NUM_TEST_SHARDS" $INCLUDE_CLAUSE --verbose
assert_git_not_dirty
if [[ ("$BUILD_ENVIRONMENT" == *cuda* || "$BUILD_ENVIRONMENT" == *rocm*) && "$SHARD_NUMBER" == 1 ]]; then
@ -971,9 +999,28 @@ test_docs_test() {
}
test_executorch() {
pushd /executorch
echo "Install torchvision and torchaudio"
# TODO(huydhn): Switch this to the pinned commits on ExecuTorch once they are
# there. These libraries need to be built here, and not part of the Docker
# image because they require the target version of torch to be installed first
pip_install --no-use-pep517 --user "git+https://github.com/pytorch/audio.git"
pip_install --no-use-pep517 --user "git+https://github.com/pytorch/vision.git"
echo "Run ExecuTorch regression tests for some models"
# NB: This is a sample model, more can be added here
export PYTHON_EXECUTABLE=python
# TODO(huydhn): Add more coverage here using ExecuTorch's gather models script
# shellcheck disable=SC1091
source .ci/scripts/test.sh mv3 cmake xnnpack-quantization-delegation ''
popd
# Test torchgen generated code for Executorch.
echo "Testing Executorch op registration"
echo "Testing ExecuTorch op registration"
"$BUILD_BIN_DIR"/test_edge_op_registration
assert_git_not_dirty
}
@ -988,6 +1035,8 @@ elif [[ "${TEST_CONFIG}" == *xla* ]]; then
install_torchvision
build_xla
test_xla
elif [[ "${TEST_CONFIG}" == *executorch* ]]; then
test_executorch
elif [[ "$TEST_CONFIG" == 'jit_legacy' ]]; then
test_python_legacy_jit
elif [[ "${BUILD_ENVIRONMENT}" == *libtorch* ]]; then
@ -1010,11 +1059,10 @@ elif [[ "${TEST_CONFIG}" == *huggingface* ]]; then
test_dynamo_benchmark huggingface "$id"
elif [[ "${TEST_CONFIG}" == *timm* ]]; then
install_torchvision
install_timm
id=$((SHARD_NUMBER-1))
test_dynamo_benchmark timm_models "$id"
elif [[ "${TEST_CONFIG}" == *torchbench* ]]; then
if [[ "${TEST_CONFIG}" == *cpu_accuracy* ]]; then
if [[ "${TEST_CONFIG}" == *cpu_inductor* ]]; then
install_torchaudio cpu
else
install_torchaudio cuda
@ -1025,13 +1073,13 @@ elif [[ "${TEST_CONFIG}" == *torchbench* ]]; then
# https://github.com/opencv/opencv-python/issues/885
pip_install opencv-python==4.8.0.74
if [[ "${TEST_CONFIG}" == *inductor_torchbench_smoketest_perf* ]]; then
checkout_install_torchbench hf_Bert hf_Albert timm_vision_transformer
checkout_install_torchbench hf_Bert hf_Albert nanogpt timm_vision_transformer
PYTHONPATH=$(pwd)/torchbench test_inductor_torchbench_smoketest_perf
else
checkout_install_torchbench
# Do this after checkout_install_torchbench to ensure we clobber any
# nightlies that torchbench may pull in
if [[ "${TEST_CONFIG}" != *cpu_accuracy* ]]; then
if [[ "${TEST_CONFIG}" != *cpu_inductor* ]]; then
install_torchrec_and_fbgemm
fi
PYTHONPATH=$(pwd)/torchbench test_dynamo_benchmark torchbench "$id"
@ -1043,12 +1091,10 @@ elif [[ "${TEST_CONFIG}" == *inductor* && "${SHARD_NUMBER}" == 1 ]]; then
elif [[ "${TEST_CONFIG}" == *dynamo* && "${SHARD_NUMBER}" == 1 && $NUM_TEST_SHARDS -gt 1 ]]; then
test_without_numpy
install_torchvision
install_numpy_pytorch_interop
test_dynamo_shard 1
test_aten
elif [[ "${TEST_CONFIG}" == *dynamo* && "${SHARD_NUMBER}" == 2 && $NUM_TEST_SHARDS -gt 1 ]]; then
install_torchvision
install_numpy_pytorch_interop
test_dynamo_shard 2
elif [[ "${SHARD_NUMBER}" == 1 && $NUM_TEST_SHARDS -gt 1 ]]; then
test_without_numpy
@ -1076,6 +1122,10 @@ elif [[ "${BUILD_ENVIRONMENT}" == *-mobile-lightweight-dispatch* ]]; then
test_libtorch
elif [[ "${TEST_CONFIG}" = docs_test ]]; then
test_docs_test
elif [[ "${BUILD_ENVIRONMENT}" == *rocm* && -n "$TESTS_TO_INCLUDE" ]]; then
install_torchvision
test_python
test_aten
else
install_torchvision
install_monkeytype
@ -1088,5 +1138,4 @@ else
test_custom_backend
test_torch_function_benchmark
test_benchmarks
test_executorch
fi

View File

@ -127,8 +127,7 @@ python -c "import os, glob; os.system('python -mpip install --no-index --no-deps
:: export test times so that potential sharded tests that'll branch off this build will use consistent data
python tools/stats/export_test_times.py
copy /Y ".pytorch-test-times.json" "%PYTORCH_FINAL_PACKAGE_DIR%"
copy /Y ".pytorch-test-file-ratings.json" "%PYTORCH_FINAL_PACKAGE_DIR%"
robocopy /E ".additional_ci_files" "%PYTORCH_FINAL_PACKAGE_DIR%\.additional_ci_files"
:: Also save build/.ninja_log as an artifact
copy /Y "build\.ninja_log" "%PYTORCH_FINAL_PACKAGE_DIR%\"

View File

@ -2,6 +2,7 @@
import os
import subprocess
import sys
COMMON_TESTS = [
(
@ -53,4 +54,4 @@ if __name__ == "__main__":
print("Reruning with traceback enabled")
print("Command:", command_string)
subprocess.run(command_args, check=False)
exit(e.returncode)
sys.exit(e.returncode)

View File

@ -26,11 +26,6 @@ popd
python test_custom_ops.py -v
if ERRORLEVEL 1 exit /b 1
:: TODO: fix and re-enable this test
:: See https://github.com/pytorch/pytorch/issues/25155
:: python test_custom_classes.py -v
:: if ERRORLEVEL 1 exit /b 1
python model.py --export-script-module="build/model.pt"
if ERRORLEVEL 1 exit /b 1

View File

@ -1,7 +1,3 @@
:: Skip LibTorch tests when building a GPU binary and testing on a CPU machine
:: because LibTorch tests are not well designed for this use case.
if "%USE_CUDA%" == "0" IF NOT "%CUDA_VERSION%" == "cpu" exit /b 0
call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
if errorlevel 1 exit /b 1
@ -21,7 +17,7 @@ if not errorlevel 0 exit /b 1
cd %TMP_DIR_WIN%\build\torch\test
for /r "." %%a in (*.exe) do (
call :libtorch_check "%%~na" "%%~fa"
if errorlevel 1 exit /b 1
if errorlevel 1 goto fail
)
goto :eof
@ -34,18 +30,6 @@ set CPP_TESTS_DIR=%TMP_DIR_WIN%\build\torch\test
:: Skip verify_api_visibility as it a compile level test
if "%~1" == "verify_api_visibility" goto :eof
:: See https://github.com/pytorch/pytorch/issues/25161
if "%~1" == "c10_metaprogramming_test" goto :eof
if "%~1" == "module_test" goto :eof
:: See https://github.com/pytorch/pytorch/issues/25312
if "%~1" == "converter_nomigraph_test" goto :eof
:: See https://github.com/pytorch/pytorch/issues/35636
if "%~1" == "generate_proposals_op_gpu_test" goto :eof
:: See https://github.com/pytorch/pytorch/issues/35648
if "%~1" == "reshape_op_gpu_test" goto :eof
:: See https://github.com/pytorch/pytorch/issues/35651
if "%~1" == "utility_ops_gpu_test" goto :eof
echo Running "%~2"
if "%~1" == "c10_intrusive_ptr_benchmark" (
:: NB: This is not a gtest executable file, thus couldn't be handled by pytest-cpp
@ -56,11 +40,15 @@ if "%~1" == "c10_intrusive_ptr_benchmark" (
python test\run_test.py --cpp --verbose -i "cpp/%~1"
if errorlevel 1 (
echo %1 failed with exit code %errorlevel%
exit /b 1
goto fail
)
if not errorlevel 0 (
echo %1 failed with exit code %errorlevel%
exit /b 1
goto fail
)
goto :eof
:eof
exit /b 0
:fail
exit /b 1

View File

@ -1,8 +1,7 @@
call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
echo Copying over test times file
copy /Y "%PYTORCH_FINAL_PACKAGE_DIR_WIN%\.pytorch-test-times.json" "%PROJECT_DIR_WIN%"
copy /Y "%PYTORCH_FINAL_PACKAGE_DIR_WIN%\.pytorch-test-file-ratings.json" "%PROJECT_DIR_WIN%"
robocopy /E "%PYTORCH_FINAL_PACKAGE_DIR_WIN%\.additional_ci_files" "%PROJECT_DIR_WIN%\.additional_ci_files"
pushd test

View File

@ -22,8 +22,7 @@ if "%SHARD_NUMBER%" == "1" (
)
echo Copying over test times file
copy /Y "%PYTORCH_FINAL_PACKAGE_DIR_WIN%\.pytorch-test-times.json" "%PROJECT_DIR_WIN%"
copy /Y "%PYTORCH_FINAL_PACKAGE_DIR_WIN%\.pytorch-test-file-ratings.json" "%PROJECT_DIR_WIN%"
robocopy /E "%PYTORCH_FINAL_PACKAGE_DIR_WIN%\.additional_ci_files" "%PROJECT_DIR_WIN%\.additional_ci_files"
echo Run nn tests
python run_test.py --exclude-jit-executor --exclude-distributed-tests --shard "%SHARD_NUMBER%" "%NUM_TEST_SHARDS%" --verbose

View File

@ -35,10 +35,10 @@ if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then
fi
# TODO: Move both of them to Windows AMI
python -m pip install pytest-rerunfailures==10.3 pytest-cpp==2.3.0
python -m pip install pytest-rerunfailures==10.3 pytest-cpp==2.3.0 tensorboard==2.13.0
# Install Z3 optional dependency for Windows builds.
python -m pip install z3-solver
python -m pip install z3-solver==4.12.2.0
run_tests() {
# Run nvidia-smi if available

View File

@ -1,28 +0,0 @@
from collections import OrderedDict
from cimodel.data.simple.util.branch_filters import gen_filter_dict
from cimodel.lib.miniutils import quote
CHANNELS_TO_PRUNE = ["pytorch-nightly", "pytorch-test"]
PACKAGES_TO_PRUNE = "pytorch torchvision torchaudio torchtext ignite torchcsprng"
def gen_workflow_job(channel: str):
return OrderedDict(
{
"anaconda_prune": OrderedDict(
{
"name": f"anaconda-prune-{channel}",
"context": quote("org-member"),
"packages": quote(PACKAGES_TO_PRUNE),
"channel": channel,
"filters": gen_filter_dict(branches_list=["postnightly"]),
}
)
}
)
def get_workflow_jobs():
return [gen_workflow_job(channel) for channel in CHANNELS_TO_PRUNE]

View File

@ -32,4 +32,4 @@ def gen_mobile_docker(specifier):
DOCKER_IMAGE_ASAN, DOCKER_REQUIREMENT_ASAN = gen_mobile_docker("asan")
DOCKER_IMAGE_NDK, DOCKER_REQUIREMENT_NDK = gen_mobile_docker("android-ndk-r19c")
DOCKER_IMAGE_NDK, DOCKER_REQUIREMENT_NDK = gen_mobile_docker("android-ndk-r21e")

.circleci/config.yml (generated, 51 lines changed)
View File

@ -444,35 +444,6 @@ jobs:
script="/Users/distiller/project/.circleci/scripts/binary_ios_upload.sh"
cat "$script"
source "$script"
anaconda_prune:
parameters:
packages:
type: string
description: "What packages are we pruning? (quoted, space-separated string. eg. 'pytorch', 'torchvision torchaudio', etc.)"
default: "pytorch"
channel:
type: string
description: "What channel are we pruning? (eq. pytorch-nightly)"
default: "pytorch-nightly"
docker:
- image: continuumio/miniconda3
environment:
- PACKAGES: "<< parameters.packages >>"
- CHANNEL: "<< parameters.channel >>"
steps:
- checkout
- run:
name: Install dependencies
no_output_timeout: "1h"
command: |
conda install -yq anaconda-client
- run:
name: Prune packages
no_output_timeout: "1h"
command: |
ANACONDA_API_TOKEN="${CONDA_PYTORCHBOT_TOKEN}" \
scripts/release/anaconda-prune/run.sh
pytorch_doc_push:
resource_class: medium
machine:
@ -652,7 +623,7 @@ jobs:
- run:
name: Archive artifacts into zip
command: |
zip -1 -r artifacts.zip dist/ build/.ninja_log build/compile_commands.json .pytorch-test-times.json .pytorch-test-file-ratings.json
zip -1 -r artifacts.zip dist/ build/.ninja_log build/compile_commands.json .additional_ci_files
cp artifacts.zip /Users/distiller/workspace
- persist_to_workspace:
@ -686,8 +657,6 @@ jobs:
TEST_CONFIG: << parameters.test-config >>
SHARD_NUMBER: << parameters.shard-number >>
NUM_TEST_SHARDS: << parameters.num-test-shards >>
PYTORCH_RETRY_TEST_CASES: 1
PYTORCH_OVERRIDE_FLAKY_SIGNAL: 1
steps:
- checkout
- attach_workspace:
@ -1414,22 +1383,4 @@ workflows:
requires:
- pytorch_ios_full_jit_12_5_1_nightly_x86_64_build
- pytorch_ios_full_jit_12_5_1_nightly_arm64_build
- anaconda_prune:
name: anaconda-prune-pytorch-nightly
context: "org-member"
packages: "pytorch torchvision torchaudio torchtext ignite torchcsprng"
channel: pytorch-nightly
filters:
branches:
only:
- postnightly
- anaconda_prune:
name: anaconda-prune-pytorch-test
context: "org-member"
packages: "pytorch torchvision torchaudio torchtext ignite torchcsprng"
channel: pytorch-test
filters:
branches:
only:
- postnightly
when: << pipeline.parameters.run_build >>

View File

@ -10,8 +10,6 @@ import shutil
import sys
from collections import namedtuple
import cimodel.data.simple.anaconda_prune_defintions
import cimodel.data.simple.docker_definitions
import cimodel.data.simple.mobile_definitions
import cimodel.data.simple.nightly_ios
@ -144,7 +142,6 @@ def gen_build_workflows_tree():
build_workflows_functions = [
cimodel.data.simple.mobile_definitions.get_workflow_jobs,
cimodel.data.simple.nightly_ios.get_workflow_jobs,
cimodel.data.simple.anaconda_prune_defintions.get_workflow_jobs,
]
build_jobs = [f() for f in build_workflows_functions]
build_jobs.extend(

View File

@ -33,7 +33,7 @@ fi
cp ${PROJ_ROOT}/LICENSE ${ZIP_DIR}/
# zip the library
export DATE="$(date -u +%Y%m%d)"
export IOS_NIGHTLY_BUILD_VERSION="2.1.0.${DATE}"
export IOS_NIGHTLY_BUILD_VERSION="2.2.0.${DATE}"
if [ "${BUILD_LITE_INTERPRETER}" == "1" ]; then
# libtorch_lite_ios_nightly_1.11.0.20210810.zip
ZIPFILE="libtorch_lite_ios_nightly_${IOS_NIGHTLY_BUILD_VERSION}.zip"

View File

@ -54,7 +54,7 @@ fi
# Move debug wheels out of the the package dir so they don't get installed
# Move debug wheels out of the package dir so they don't get installed
mkdir -p /tmp/debug_final_pkgs
mv /final_pkgs/debug-*.zip /tmp/debug_final_pkgs || echo "no debug packages to move"
@ -66,6 +66,12 @@ mv /final_pkgs/debug-*.zip /tmp/debug_final_pkgs || echo "no debug packages to m
# conda build scripts themselves. These should really be consolidated
# Pick only one package of multiple available (which happens as result of workflow re-runs)
pkg="/final_pkgs/\$(ls -1 /final_pkgs|sort|tail -1)"
if [[ "\$PYTORCH_BUILD_VERSION" == *dev* ]]; then
CHANNEL="nightly"
else
CHANNEL="test"
fi
if [[ "$PACKAGE_TYPE" == conda ]]; then
(
# For some reason conda likes to re-activate the conda environment when attempting this install
@ -83,25 +89,14 @@ if [[ "$PACKAGE_TYPE" == conda ]]; then
if [[ "$DESIRED_CUDA" == 'cpu' ]]; then
retry conda install -c pytorch -y cpuonly
else
cu_ver="${DESIRED_CUDA:2:2}.${DESIRED_CUDA:4}"
CUDA_PACKAGE="pytorch-cuda"
PYTORCH_CHANNEL="pytorch"
if [[ "\${TORCH_CONDA_BUILD_FOLDER}" == "pytorch-nightly" ]]; then
PYTORCH_CHANNEL="pytorch-nightly"
fi
retry conda install \${EXTRA_CONDA_FLAGS} -yq -c nvidia -c "\${PYTORCH_CHANNEL}" "pytorch-cuda=\${cu_ver}"
retry conda install \${EXTRA_CONDA_FLAGS} -yq -c nvidia -c "pytorch-\${CHANNEL}" "pytorch-cuda=\${cu_ver}"
fi
conda install \${EXTRA_CONDA_FLAGS} -y "\$pkg" --offline
)
elif [[ "$PACKAGE_TYPE" != libtorch ]]; then
if [[ "$(uname -m)" == aarch64 ]]; then
# Using "extra-index-url" until all needed aarch64 dependencies are
# added to "https://download.pytorch.org/whl/nightly/"
pip install "\$pkg" --extra-index-url "https://download.pytorch.org/whl/nightly/${DESIRED_CUDA}"
else
pip install "\$pkg" --index-url "https://download.pytorch.org/whl/nightly/${DESIRED_CUDA}"
fi
pip install "\$pkg" --index-url "https://download.pytorch.org/whl/\${CHANNEL}/${DESIRED_CUDA}"
retry pip install -q numpy protobuf typing-extensions
fi
if [[ "$PACKAGE_TYPE" == libtorch ]]; then


@ -59,7 +59,7 @@ PIP_UPLOAD_FOLDER='nightly/'
# We put this here so that OVERRIDE_PACKAGE_VERSION below can read from it
export DATE="$(date -u +%Y%m%d)"
#TODO: We should be pulling semver version from the base version.txt
BASE_BUILD_VERSION="2.1.0.dev$DATE"
BASE_BUILD_VERSION="2.2.0.dev$DATE"
# Change BASE_BUILD_VERSION to git tag when on a git tag
# Use 'git -C' to make doubly sure we're in the correct directory for checking
# the git tag
@ -77,13 +77,8 @@ else
export PYTORCH_BUILD_VERSION="${BASE_BUILD_VERSION}+$DESIRED_CUDA"
fi
if [[ -n "${PYTORCH_EXTRA_INSTALL_REQUIREMENTS:-}" ]]; then
export PYTORCH_BUILD_VERSION="${PYTORCH_BUILD_VERSION}-with-pypi-cudnn"
fi
export PYTORCH_BUILD_NUMBER=1
JAVA_HOME=
BUILD_JNI=OFF
if [[ "$PACKAGE_TYPE" == libtorch ]]; then
@ -155,8 +150,8 @@ EOL
# nproc doesn't exist on darwin
if [[ "$(uname)" != Darwin ]]; then
# Because most Circle executors only have 20 CPUs, using more causes OOMs w/ Ninja and nvcc parallelization
MEMORY_LIMIT_MAX_JOBS=18
# This was lowered from 18 to 12 to avoid OOMs when compiling FlashAttentionV2
MEMORY_LIMIT_MAX_JOBS=12
NUM_CPUS=$(( $(nproc) - 2 ))
# Defaults here for **binary** linux builds so they can be changed in one place


@ -11,16 +11,11 @@ PKG_DIR=${PKG_DIR:-/tmp/workspace/final_pkgs}
# currently set within `designate_upload_channel`
UPLOAD_CHANNEL=${UPLOAD_CHANNEL:-nightly}
# Designates what subfolder to put packages into
UPLOAD_SUBFOLDER=${UPLOAD_SUBFOLDER:-cpu}
UPLOAD_SUBFOLDER=${UPLOAD_SUBFOLDER:-}
UPLOAD_BUCKET="s3://pytorch"
BACKUP_BUCKET="s3://pytorch-backup"
BUILD_NAME=${BUILD_NAME:-}
# this is temporary change to upload pypi-cudnn builds to separate folder
if [[ ${BUILD_NAME} == *with-pypi-cudnn* ]]; then
UPLOAD_SUBFOLDER="${UPLOAD_SUBFOLDER}_pypi_cudnn"
fi
DRY_RUN=${DRY_RUN:-enabled}
# Don't actually do work unless explicit
ANACONDA="true anaconda"
@ -64,12 +59,17 @@ s3_upload() {
local pkg_type
extension="$1"
pkg_type="$2"
s3_dir="${UPLOAD_BUCKET}/${pkg_type}/${UPLOAD_CHANNEL}/${UPLOAD_SUBFOLDER}/"
s3_root_dir="${UPLOAD_BUCKET}/${pkg_type}/${UPLOAD_CHANNEL}"
if [[ -z ${UPLOAD_SUBFOLDER:-} ]]; then
s3_upload_dir="${s3_root_dir}/"
else
s3_upload_dir="${s3_root_dir}/${UPLOAD_SUBFOLDER}/"
fi
(
for pkg in ${PKG_DIR}/*.${extension}; do
(
set -x
${AWS_S3_CP} --no-progress --acl public-read "${pkg}" "${s3_dir}"
${AWS_S3_CP} --no-progress --acl public-read "${pkg}" "${s3_upload_dir}"
)
done
)
@ -82,15 +82,17 @@ pip install -q awscli
case "${PACKAGE_TYPE}" in
conda)
conda_upload
# Fetch platform (eg. win-64, linux-64, etc.) from index file
# Because there's no actual conda command to read this
subdir=$(\
tar -xOf ${PKG_DIR}/*.bz2 info/index.json \
| grep subdir \
| cut -d ':' -f2 \
| sed -e 's/[[:space:]]//' -e 's/"//g' -e 's/,//' \
)
BACKUP_DIR="conda/${subdir}"
for conda_archive in ${PKG_DIR}/*.tar.bz2; do
# Fetch platform (eg. win-64, linux-64, etc.) from index file because
# there's no actual conda command to read this
subdir=$(\
tar -xOf "${conda_archive}" info/index.json \
| grep subdir \
| cut -d ':' -f2 \
| sed -e 's/[[:space:]]//' -e 's/"//g' -e 's/,//' \
)
BACKUP_DIR="conda/${subdir}"
done
;;
libtorch)
s3_upload "zip" "libtorch"


@ -42,32 +42,3 @@ jobs:
script="/Users/distiller/project/.circleci/scripts/binary_ios_upload.sh"
cat "$script"
source "$script"
anaconda_prune:
parameters:
packages:
type: string
description: "What packages are we pruning? (quoted, space-separated string. eg. 'pytorch', 'torchvision torchaudio', etc.)"
default: "pytorch"
channel:
type: string
description: "What channel are we pruning? (eq. pytorch-nightly)"
default: "pytorch-nightly"
docker:
- image: continuumio/miniconda3
environment:
- PACKAGES: "<< parameters.packages >>"
- CHANNEL: "<< parameters.channel >>"
steps:
- checkout
- run:
name: Install dependencies
no_output_timeout: "1h"
command: |
conda install -yq anaconda-client
- run:
name: Prune packages
no_output_timeout: "1h"
command: |
ANACONDA_API_TOKEN="${CONDA_PYTORCHBOT_TOKEN}" \
scripts/release/anaconda-prune/run.sh


@ -177,7 +177,7 @@
- run:
name: Archive artifacts into zip
command: |
zip -1 -r artifacts.zip dist/ build/.ninja_log build/compile_commands.json .pytorch-test-times.json .pytorch-test-file-ratings.json
zip -1 -r artifacts.zip dist/ build/.ninja_log build/compile_commands.json .additional_ci_files
cp artifacts.zip /Users/distiller/workspace
- persist_to_workspace:
@ -211,8 +211,6 @@
TEST_CONFIG: << parameters.test-config >>
SHARD_NUMBER: << parameters.shard-number >>
NUM_TEST_SHARDS: << parameters.num-test-shards >>
PYTORCH_RETRY_TEST_CASES: 1
PYTORCH_OVERRIDE_FLAKY_SIGNAL: 1
steps:
- checkout
- attach_workspace:


@ -1,5 +1,8 @@
---
# NOTE there must be no spaces before the '-', so put the comma last.
# The check bugprone-unchecked-optional-access is also turned off atm
# because it causes clang-tidy to hang randomly. The tracking issue
# can be found at https://github.com/llvm/llvm-project/issues/69369.
InheritParentConfig: true
Checks: '
bugprone-*,
@ -9,6 +12,7 @@ bugprone-*,
-bugprone-lambda-function-name,
-bugprone-reserved-identifier,
-bugprone-swapped-arguments,
-bugprone-unchecked-optional-access,
clang-diagnostic-missing-prototypes,
cppcoreguidelines-*,
-cppcoreguidelines-avoid-do-while,
@ -30,8 +34,13 @@ cppcoreguidelines-*,
-facebook-hte-RelativeInclude,
hicpp-exception-baseclass,
hicpp-avoid-goto,
misc-unused-alias-decls,
misc-unused-using-decls,
misc-*,
-misc-const-correctness,
-misc-use-anonymous-namespace,
-misc-unused-parameters,
-misc-no-recursion,
-misc-non-private-member-variables-in-classes,
-misc-confusable-identifiers,
modernize-*,
-modernize-concat-nested-namespaces,
-modernize-macro-to-enum,
@ -44,7 +53,7 @@ modernize-*,
performance-*,
readability-container-size-empty,
'
HeaderFilterRegex: '^(c10/(?!test)|torch/csrc/(?!deploy/interpreter/cpython)).*$'
HeaderFilterRegex: '^(aten/|c10/|torch/).*$'
AnalyzeTemporaryDtors: false
WarningsAsErrors: '*'
...

.devcontainer/README.md Normal file

@ -0,0 +1,72 @@
# Step by step guide on using PyTorch's DevContainer
PyTorch's DevContainer gives you an isolated, replicable development environment. The steps below walk you through the setup to make the process as smooth as possible:
## Step 1: Install VSCode
1. Navigate to the [Visual Studio Code website](https://code.visualstudio.com/).
2. Download the appropriate installer for your operating system (Windows, Linux, or macOS).
3. Run the installer and follow the on-screen instructions to install VSCode on your system.
4. After installation, launch VSCode.
## Step 2: Install DevContainer Extension
1. In VSCode, go to the Extensions view by clicking on the Extensions icon in the Activity Bar on the side of the window.
2. Search for "Dev Containers" in the Extensions view search bar.
3. Find the "Dev Containers" extension in the search results and click on the install button to install it.
You can also go to the extension's [homepage](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) and [documentation page](https://code.visualstudio.com/docs/devcontainers/containers) to find more details.
## Step 3: Install Docker and Add Current Login User to Docker Group
1. Follow the [official guide](https://docs.docker.com/get-docker/) to install Docker. Don't forget the [post installation steps](https://docs.docker.com/engine/install/linux-postinstall/).
If you are using [Visual Studio Code Remote - SSH](https://code.visualstudio.com/docs/remote/ssh), you only need to install Docker on the remote host, not on your local computer, and the following steps should likewise be run on the remote host.
## Step 4 (Optional): Install NVIDIA Container Toolkit for GPU Usage
1. If you intend to use GPU resources, first ensure you have NVIDIA drivers installed on your system. Check if `nvidia-smi` works to verify your GPU setup.
2. Follow the [official guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#docker) to install the NVIDIA Container Toolkit.
3. After installation, verify that the toolkit is installed correctly by running:
```
docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
```
## Step 5: Clone PyTorch
1. Open a terminal or command prompt.
2. Use the following command to clone the PyTorch repository:
```
git clone https://github.com/pytorch/pytorch
```
3. Navigate to the cloned directory:
```
cd pytorch
```
## Step 6: Open in DevContainer
1. In VSCode, use the Command Palette (`Ctrl+Shift+P` or `Cmd+Shift+P` on macOS) to run the "Remote-Containers: Open Folder in Container..." command.
2. You will be prompted with two options: CPU dev container or CUDA dev container. Choose the one you want to run.
## Step 7: Wait for Building the Environment
1. After opening the folder in a DevContainer, VSCode will start building the container. This process can take some time as it involves downloading necessary images and setting up the environment.
2. You can monitor the progress in the VSCode terminal.
3. Once the build process completes, you'll have a fully configured PyTorch development environment in a container.
4. The next time you open the same dev container, it will be much faster, as it does not require building the image again.
You are now all set to start developing with PyTorch in a DevContainer environment. This setup ensures you have a consistent and isolated development environment for your PyTorch projects.
## Step 8: Build PyTorch
To build PyTorch from source, run:
```
python setup.py develop
```
The process compiles thousands of files and can take a long time. Fortunately, the compiled objects are reused across builds: when you modify some files, only the changed files need to be recompiled the next time.
Note that only the contents of the `pytorch` directory are saved to disk. This directory is mounted into the container, while everything else in the container is temporary and will be lost if Docker recreates the container or the server reboots.
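As a quick sanity check after the build (a common smoke test, not part of the official guide), confirm that the freshly built torch imports and, in the CUDA dev container, sees the GPU:
```
import torch

print(torch.__version__)          # should report the dev build you just compiled
print(torch.cuda.is_available())  # expect True in the CUDA dev container
```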
For an in-depth understanding of Dev Container and its caveats, please refer to [the full documentation](https://code.visualstudio.com/docs/devcontainers/containers).


@ -9,3 +9,5 @@ make setup_lint
# Add CMAKE_PREFIX_PATH to bashrc
echo 'export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}' >> ~/.bashrc
# Add linker path so that cuda-related libraries can be found
echo 'export LDFLAGS="-L${CONDA_PREFIX}/lib/ $LDFLAGS"' >> ~/.bashrc

.flake8

@ -2,7 +2,7 @@
# NOTE: **Mirror any changes** to this file the [tool.ruff] config in pyproject.toml
# before we can fully move to use ruff
enable-extensions = G
select = B,C,E,F,G,P,SIM1,T4,W,B9
select = B,C,E,F,G,P,SIM1,T4,W,B9,TOR0,TOR1,TOR2
max-line-length = 120
# C408 ignored because we like the dict keyword argument syntax
# E501 is not flexible enough, we're using B950 instead
@ -14,15 +14,21 @@ ignore =
# to line this up with executable bit
EXE001,
# these ignores are from flake8-bugbear; please fix!
B007,B008,B017,B019,B020,B023,B024,B026,B028,B903,B904,B905,B906,B907
B007,B008,B017,B019,B023,B028,B903,B904,B905,B906,B907
# these ignores are from flake8-comprehensions; please fix!
C407,
# these ignores are from flake8-logging-format; please fix!
G100,G101,G200,G201,G202
G100,G101,G200
# these ignores are from flake8-simplify. please fix or ignore with commented reason
SIM105,SIM108,SIM110,SIM111,SIM113,SIM114,SIM115,SIM116,SIM117,SIM118,SIM119,SIM12,
# flake8-simplify code styles
SIM102,SIM103,SIM106,SIM112,
# TorchFix codes that don't make sense for PyTorch itself:
# removed and deprecated PyTorch functions.
TOR001,TOR101,
# TODO(kit1980): fix all TOR102 issues
# `torch.load` without `weights_only` parameter is unsafe
TOR102,
per-file-ignores =
__init__.py: F401
torch/utils/cpp_extension.py: B950
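
For context on the TOR102 suppression above: TorchFix flags `torch.load` calls that omit `weights_only`, because the default pickle-based loader can execute arbitrary code. A minimal illustration (the checkpoint path is hypothetical):

```
import torch

# Flagged by TOR102: full pickle deserialization can run arbitrary code.
state = torch.load("checkpoint.pt")

# Preferred: restrict deserialization to plain tensors and containers.
state = torch.load("checkpoint.pt", weights_only=True)
```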


@ -3,11 +3,12 @@ self-hosted-runner:
- linux.20_04.4x
- linux.20_04.16x
- linux.large
- linux.large.arc
- linux.2xlarge
- linux.4xlarge
- linux.12xlarge
- linux.24xlarge
- linux.t4g.2xlarge
- linux.arm64.2xlarge
- linux.4xlarge.nvidia.gpu
- linux.8xlarge.nvidia.gpu
- linux.16xlarge.nvidia.gpu
@ -23,3 +24,5 @@ self-hosted-runner:
- macos-12-xl
- macos-12
- macos12.3-m1
- macos-latest-xlarge
- macos-13-xlarge


@ -13,6 +13,10 @@ inputs:
required: true
type: string
description: JSON description of what test configs to run.
job-name:
type: string
required: false
default: ""
outputs:
test-matrix:
@ -56,6 +60,7 @@ runs:
- name: Get the job name
id: get-job-name
if: inputs.job-name == ''
continue-on-error: true
shell: bash
run: |
@ -91,7 +96,7 @@ runs:
shell: bash
env:
GITHUB_TOKEN: ${{ inputs.github-token }}
JOB_NAME: ${{ steps.get-job-name.outputs.job-name }}
JOB_NAME: ${{ inputs.job-name == '' && steps.get-job-name.outputs.job-name || inputs.job-name }}
PR_NUMBER: ${{ github.event.pull_request.number }}
TAG: ${{ steps.parse-ref.outputs.tag }}
EVENT_NAME: ${{ github.event_name }}


@ -11,18 +11,20 @@ outputs:
job-id:
description: The retrieved workflow job id
value: ${{ steps.get-job-id.outputs.job-id }}
job-name:
description: The retrieved workflow job name
value: ${{ steps.get-job-id.outputs.job-name }}
runs:
using: composite
steps:
- name: Get jobid or fail
- name: Get job id and name or fail
# timeout-minutes is unsupported for composite workflows, see https://github.com/actions/runner/issues/1979
# timeout-minutes: 10
shell: bash
id: get-job-id
run: |
set -eux
GHA_WORKFLOW_JOB_ID=$(python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}")
echo "job-id=${GHA_WORKFLOW_JOB_ID}" >> "${GITHUB_OUTPUT}"
python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}"
env:
GITHUB_TOKEN: ${{ inputs.github-token }}


@ -10,6 +10,13 @@ inputs:
description: Shard number for the current job
required: false
default: "0"
sha:
description: SHA for the commit
required: true
test_config:
description: Name of the test config
required: false
default: "default"
job_identifier:
description: Text that uniquely identifies a given job type within a workflow. All shards of a job should share the same job identifier.
required: true
@ -33,6 +40,8 @@ runs:
env:
CACHE_DIR: ${{ inputs.cache_dir }}
JOB_IDENTIFIER: ${{ inputs.job_identifier }}
SHA: ${{ inputs.sha }}
TEST_CONFIG: ${{ inputs.test_config }}
SHARD: ${{ inputs.shard }}
REPO: ${{ github.repository }}
run: |
@ -41,6 +50,8 @@ runs:
--cache_dir $GITHUB_WORKSPACE/$CACHE_DIR \
--pr_identifier $GITHUB_REF \
--job_identifier $JOB_IDENTIFIER \
--sha $SHA \
--test_config $TEST_CONFIG \
--shard $SHARD \
--repo $REPO \
--temp_dir $RUNNER_TEMP \


@ -43,14 +43,14 @@ runs:
FILE_SUFFIX: ${{ inputs.file-suffix }}
run: |
# Remove any previous test reports if they exist
rm -f usage-log-*.zip
rm -f logs-*.zip
# this workflow is also run in bazel build test, but we dont generate usage reports for it
# so check to see if the file exists first
if [ -f 'usage_log.txt' ]; then
zip "usage-log-${FILE_SUFFIX}.zip" 'usage_log.txt'
zip "logs-${FILE_SUFFIX}.zip" 'usage_log.txt'
fi
if ls test/**/*.log 1> /dev/null 2>&1; then
zip -r "usage-log-${FILE_SUFFIX}.zip" test -i '*.log'
zip -r "logs-${FILE_SUFFIX}.zip" test -i '*.log'
fi
# Windows zip
@ -80,7 +80,7 @@ runs:
FILE_SUFFIX: ${{ inputs.file-suffix }}
run: |
# -ir => recursive include all files in pattern
7z a "usage-log-$Env:FILE_SUFFIX.zip" 'usage_log.txt' -ir'!test\*.log'
7z a "logs-$Env:FILE_SUFFIX.zip" 'usage_log.txt' -ir'!test\*.log'
# S3 upload
- name: Store Test Downloaded JSONs on S3
@ -112,7 +112,7 @@ runs:
${{ github.repository }}/${{ github.run_id }}/${{ github.run_attempt }}/artifact
retention-days: 14
if-no-files-found: ignore
path: usage-log-*.zip
path: logs-*.zip
# GHA upload
- name: Store Test Downloaded JSONs on Github
@ -146,7 +146,7 @@ runs:
continue-on-error: true
with:
# Add the run attempt, see [Artifact run attempt]
name: usage-log-runattempt${{ github.run_attempt }}-${{ inputs.file-suffix }}.zip
name: logs-runattempt${{ github.run_attempt }}-${{ inputs.file-suffix }}.zip
retention-days: 14
if-no-files-found: ignore
path: |


@ -12,7 +12,6 @@ reviewers:
symbolic-shapes:
- symbolic-shapes
- antoniojkim
- wconstab
- SherlockNoMad
Chillee:
- ezyang


@ -1 +1 @@
a8f4e97bd5356a7a77510cdf6a3a62e25a5dc602
6518fa9b2c74e84d7eb1fc6e3eb51e43213f0c05


@ -1 +1 @@
1b2746f642cc2c99fe9d1a0c34359c0de45341c2
de731af65b4f04696e85c729e3282450b51b95fd


@ -1 +0,0 @@
0c4e82511d349358d2c8c492dd833334e742f27f


@ -1 +0,0 @@
b9d43c7dcac1fe05e851dd7be7187b108af593d2


@ -1 +1 @@
9371b9e13c826f3930e54346b4d619cb59182f68
99944a2fb8624947f9c0e2edc898ff42a16124da


@ -1 +1 @@
47cd5ea8e21d7596a24907710411d6b4a43f628d
e12d200c97d7aab668b976e92b46513c9ca7a0d8


@ -1 +1 @@
e1ee592d9806216d7ac0bb711cae6307b0c5b68a
a80c1e7f958e7d8e8f92319db70876940e67ad9b

.github/labeler.yml vendored

@ -15,6 +15,7 @@
"ciflow/inductor":
- torch/_decomp/**
- torch/_dynamo/**
- torch/_export/**
- torch/_inductor/**
- benchmarks/dynamo/**
- torch/_subclasses/fake_tensor.py
@ -22,12 +23,17 @@
- torch/_subclasses/meta_utils.py
- test/distributed/test_dynamo_distributed.py
- test/distributed/test_inductor_collectives.py
- torch/_functorch/partitioners.py
- torch/_functorch/_aot_autograd/**
- torch/_functorch/aot_autograd.py
- torch/_functorch/partitioners.py
- .ci/docker/ci_commit_pins/**
- .github/ci_commit_pins/**
- c10/core/Sym*
- torch/fx/experimental/symbolic_shapes.py
- test/distributed/_tensor/test_dtensor_compile.py
- test/distributed/tensor/parallel/test_fsdp_2d_parallel.py
- torch/distributed/_tensor/**
- torch/distributed/fsdp/**
"module: cpu":
- aten/src/ATen/cpu/**
@ -66,3 +72,10 @@
"ciflow/trunk":
- .ci/docker/ci_commit_pins/triton.txt
"oncall: distributed":
- torch/csrc/distributed/**
- torch/distributed/**
- torch/nn/parallel/**
- test/distributed/**
- torch/testing/_internal/distributed/**


@ -4,15 +4,19 @@
- .ci/onnx/*
- .ci/docker/common/install_onnx.sh
- aten/src/ATen/core/interned_strings.h
- benchmarks/dynamo/**
- docs/source/onnx.rst
- docs/source/onnx*
- docs/source/scripts/onnx/**
- docs/source/_static/img/onnx/**
- scripts/onnx/**
- test/onnx/**
- test/onnx_caffe2/**
- tools/onnx/**
- torch/_dynamo/backends/onnxrt.py
- torch/_C/__init__.pyi.in
- torch/_C/_onnx.pyi
- torch/_logging/**
- torch/csrc/jit/passes/onnx.*
- torch/csrc/jit/passes/onnx/**
- torch/csrc/jit/serialization/export.*
@ -22,8 +26,6 @@
- torch/testing/_internal/common_methods_invocations.py
- third_party/onnx
- caffe2/python/onnx/**
- benchmarks/dynamo/_onnx/**
- torch/_logging/**
approved_by:
- BowenBao
- abock
@ -72,6 +74,7 @@
- name: OSS CI / pytorchbot
patterns:
- .github/ci_commit_pins/audio.txt
- .github/ci_commit_pins/vision.txt
- .github/ci_commit_pins/torchdynamo.txt
- .ci/docker/ci_commit_pins/triton.txt
@ -82,6 +85,19 @@
- EasyCLA
- Lint
- pull
- inductor
- name: OSS CI /pytorchbot / Executorch
patterns:
- .ci/docker/ci_commit_pins/executorch.txt
approved_by:
- pytorchbot
ignore_flaky_failures: false
mandatory_checks_name:
- EasyCLA
- Lint
- pull / linux-jammy-py3-clang12-executorch / build
- pull / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, linux.2xlarge)
- name: OSS CI / pytorchbot / XLA
patterns:
@ -92,8 +108,8 @@
mandatory_checks_name:
- EasyCLA
- Lint
- pull / linux-bionic-py3_8-clang8-xla / build
- pull / linux-bionic-py3_8-clang8-xla / test (xla, 1, 1, linux.12xlarge)
- pull / linux-focal-py3_8-clang9-xla / build
- pull / linux-focal-py3_8-clang9-xla / test (xla, 1, 1, linux.12xlarge)
- name: Documentation
patterns:
@ -123,9 +139,6 @@
- name: PrimTorch
patterns:
- aten/src/ATen/native_functions.yaml
- aten/src/ATen/native/**
- test/**
- torch/_meta_registrations.py
- torch/_decomp/**
- torch/_refs/**
@ -319,6 +332,7 @@
- XiaobingSuper
- jgong5
- vfdev-5
- leslie-fang-intel
mandatory_checks_name:
- EasyCLA
- Lint
@ -337,6 +351,21 @@
- Lint
- pull
- name: x86 CPU quantization
patterns:
- torch/ao/quantization/quantizer/x86_inductor_quantizer.py
- torch/_inductor/fx_passes/quantization.py
- test/quantization/core/test_quantized_op.py
- test/inductor/test_mkldnn_pattern_matcher.py
- test/quantization/pt2e/test_x86inductor_quantizer.py
approved_by:
- leslie-fang-intel
- jgong5
mandatory_checks_name:
- EasyCLA
- Lint
- pull
- name: Autocast
patterns:
- torch/amp/**


@ -10,6 +10,7 @@ ciflow_push_tags:
- ciflow/mps
- ciflow/nightly
- ciflow/periodic
- ciflow/rocm
- ciflow/slow
- ciflow/trunk
- ciflow/unstable


@ -1,7 +1,5 @@
blas=1.0
cmake=3.22.1
mkl=2022.1.0
mkl-include=2022.1.0
ninja=1.10.2
numpy=1.23.3
pyyaml=6.0


@ -5,7 +5,7 @@ cmake=3.22.*
typing-extensions=4.3.0
dataclasses=0.8
pip=22.2.2
pillow=9.2.0
pillow=10.0.1
pkg-config=0.29.2
wheel=0.37.1
# NB: This is intentionally held back because anaconda main doesn't


@ -7,7 +7,7 @@ cmake=3.22.*
typing-extensions=4.3.0
dataclasses=0.8
pip=22.2.2
pillow=9.2.0
pillow=10.0.1
libuv=1.40.0
pkg-config=0.29.2
wheel=0.37.1


@ -1,3 +1,4 @@
# iOS simulator requirements
coremltools==5.0b5
protobuf==3.20.2
optree==0.9.1


@ -10,6 +10,7 @@ numba<=0.49.1; platform_machine != "arm64"
opt-einsum>=3.3
psutil==5.9.1
nvidia-ml-py==11.525.84
packaging==23.1
pygments==2.15.0
pytest==7.3.2
pytest-xdist==3.3.1
@ -25,3 +26,5 @@ sympy==1.11.1
pytest-cpp==2.3.0
rockset==1.0.3
z3-solver==4.12.2.0
tensorboard==2.13.0
optree==0.9.1


@ -1,2 +1,2 @@
typing-extensions
typing-extensions>=4.8.0
jinja2


@ -60,12 +60,20 @@ def build_triton(
build_conda: bool = False,
build_rocm: bool = False,
py_version: Optional[str] = None,
release: bool = False,
) -> Path:
env = os.environ.copy()
if "MAX_JOBS" not in env:
max_jobs = os.cpu_count() or 1
env["MAX_JOBS"] = str(max_jobs)
version_suffix = ""
if not release:
# Nightly binaries include the triton commit hash, i.e. 2.1.0+e6216047b8
# while release build should only include the version, i.e. 2.1.0
version_suffix = f"+{commit_hash[:10]}"
version += version_suffix
with TemporaryDirectory() as tmpdir:
triton_basedir = Path(tmpdir) / "triton"
triton_pythondir = triton_basedir / "python"
@ -80,7 +88,7 @@ def build_triton(
if build_conda:
with open(triton_basedir / "meta.yaml", "w") as meta:
print(
f"package:\n name: torchtriton\n version: {version}+{commit_hash[:10]}\n",
f"package:\n name: torchtriton\n version: {version}\n",
file=meta,
)
print("source:\n path: .\n", file=meta)
@ -103,7 +111,7 @@ def build_triton(
patch_init_py(
triton_pythondir / "triton" / "__init__.py",
version=f"{version}+{commit_hash[:10]}",
version=f"{version}",
)
if py_version is None:
py_version = f"{sys.version_info.major}.{sys.version_info.minor}"
@ -122,21 +130,25 @@ def build_triton(
cwd=triton_basedir,
env=env,
)
conda_path = list(Path(tmpdir).glob("linux-64/torchtriton*.bz2"))[0]
conda_path = next(iter(Path(tmpdir).glob("linux-64/torchtriton*.bz2")))
shutil.copy(conda_path, Path.cwd())
return Path.cwd() / conda_path.name
patch_setup_py(
triton_pythondir / "setup.py",
name=triton_pkg_name,
version=f"{version}+{commit_hash[:10]}",
)
# change built wheel name and version
env["TRITON_WHEEL_NAME"] = triton_pkg_name
env["TRITON_WHEEL_VERSION_SUFFIX"] = version_suffix
patch_init_py(
triton_pythondir / "triton" / "__init__.py",
version=f"{version}+{commit_hash[:10]}",
version=f"{version}",
)
if build_rocm:
# TODO: Remove me when ROCM triton is updated
patch_setup_py(
triton_pythondir / "setup.py",
name=triton_pkg_name,
version=f"{version}",
)
check_call("scripts/amd/setup_rocm_libs.sh", cwd=triton_basedir, shell=True)
print("ROCm libraries setup for triton installation...")
@ -144,7 +156,7 @@ def build_triton(
[sys.executable, "setup.py", "bdist_wheel"], cwd=triton_pythondir, env=env
)
whl_path = list((triton_pythondir / "dist").glob("*.whl"))[0]
whl_path = next(iter((triton_pythondir / "dist").glob("*.whl")))
shutil.copy(whl_path, Path.cwd())
if build_rocm:
@ -157,12 +169,14 @@ def main() -> None:
from argparse import ArgumentParser
parser = ArgumentParser("Build Triton binaries")
parser.add_argument("--release", action="store_true")
parser.add_argument("--build-conda", action="store_true")
parser.add_argument("--build-rocm", action="store_true")
parser.add_argument("--py-version", type=str)
parser.add_argument("--commit-hash", type=str)
parser.add_argument("--triton-version", type=str, default=read_triton_version())
args = parser.parse_args()
build_triton(
build_rocm=args.build_rocm,
commit_hash=args.commit_hash
@ -171,6 +185,7 @@ def main() -> None:
version=args.triton_version,
build_conda=args.build_conda,
py_version=args.py_version,
release=args.release,
)
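
The new --release flag above controls whether the Triton package version carries a commit-hash suffix. A standalone sketch of the resulting strings (the helper name is illustrative):

```
def triton_version(base: str, commit_hash: str, release: bool) -> str:
    # Nightly: "2.1.0+e6216047b8"; release: plain "2.1.0".
    return base if release else f"{base}+{commit_hash[:10]}"

assert triton_version("2.1.0", "e6216047b8deadbeef", release=False) == "2.1.0+e6216047b8"
assert triton_version("2.1.0", "e6216047b8deadbeef", release=True) == "2.1.0"
```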


@ -1,6 +1,7 @@
#!/usr/bin/env python3
"""Check whether a PR has required labels."""
import sys
from typing import Any
from github_utils import gh_delete_comment, gh_post_pr_comment
@ -46,7 +47,7 @@ def main() -> None:
except Exception as e:
pass
exit(0)
sys.exit(0)
if __name__ == "__main__":

.github/scripts/drci_mocks.json.gz vendored Normal file

Binary file not shown.


@ -1,6 +1,5 @@
#!/usr/bin/env python3
import argparse
import sys
from pathlib import Path
@ -10,9 +9,11 @@ import yaml
REPO_ROOT = Path(__file__).resolve().parent.parent.parent
WORKFLOWS = REPO_ROOT / ".github" / "workflows"
EXPECTED_GROUP = (
EXPECTED_GROUP_PREFIX = (
"${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}"
"-${{ github.event_name == 'workflow_dispatch' }}"
)
EXPECTED_GROUP = (
EXPECTED_GROUP_PREFIX + "-${{ github.event_name == 'workflow_dispatch' }}"
)
@ -26,15 +27,8 @@ def should_check(filename: Path) -> bool:
if __name__ == "__main__":
parser = argparse.ArgumentParser(
description="Ensure all relevant GitHub actions jobs will be cancelled based on a concurrency key"
)
args = parser.parse_args()
files = list(WORKFLOWS.glob("*.yml"))
errors_found = False
files = [f for f in files if should_check(f)]
files = [f for f in WORKFLOWS.glob("*.yml") if should_check(f)]
names = set()
for filename in files:
with open(filename) as f:
@ -46,7 +40,18 @@ if __name__ == "__main__":
errors_found = True
names.add(name)
actual = data.get("concurrency", {})
if not actual.get("group", "").startswith(EXPECTED_GROUP):
if filename.name == "create_release.yml":
if not actual.get("group", "").startswith(EXPECTED_GROUP_PREFIX):
print(
f"'concurrency' incorrect or not found in '{filename.relative_to(REPO_ROOT)}'",
file=sys.stderr,
)
print(
f"concurrency group should start with {EXPECTED_GROUP_PREFIX} but found {actual.get('group', None)}",
file=sys.stderr,
)
errors_found = True
elif not actual.get("group", "").startswith(EXPECTED_GROUP):
print(
f"'concurrency' incorrect or not found in '{filename.relative_to(REPO_ROOT)}'",
file=sys.stderr,


@ -410,16 +410,17 @@ def process_jobs(
if target_job in (TEST_JOB_NAME, BUILD_AND_TEST_JOB_NAME):
target_cfg = m.group("cfg")
return _filter_jobs(
# NB: There can be multiple unstable configurations, i.e. inductor, inductor_huggingface
test_matrix = _filter_jobs(
test_matrix=test_matrix,
issue_type=issue_type,
target_cfg=target_cfg,
)
warnings.warn(
f"Found a matching {issue_type.value} issue {target_url} for {workflow} / {job_name}, "
+ f"but the name {target_job_cfg} is invalid"
)
else:
warnings.warn(
f"Found a matching {issue_type.value} issue {target_url} for {workflow} / {job_name}, "
+ f"but the name {target_job_cfg} is invalid"
)
# Found no matching target, return the same input test matrix
return test_matrix


@ -10,13 +10,13 @@ architectures:
* Latest ROCM
"""
import os
from typing import Dict, List, Optional, Tuple
CUDA_ARCHES = ["11.8", "12.1"]
ROCM_ARCHES = ["5.5", "5.6"]
ROCM_ARCHES = ["5.6", "5.7"]
CPU_CXX11_ABI_ARCH = ["cpu-cxx11-abi"]
@ -24,6 +24,81 @@ CPU_CXX11_ABI_ARCH = ["cpu-cxx11-abi"]
CPU_AARCH64_ARCH = ["cpu-aarch64"]
PYTORCH_EXTRA_INSTALL_REQUIREMENTS = {
"11.8": (
"nvidia-cuda-nvrtc-cu11==11.8.89; platform_system == 'Linux' and platform_machine == 'x86_64' | " # noqa: B950
"nvidia-cuda-runtime-cu11==11.8.89; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cuda-cupti-cu11==11.8.87; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cudnn-cu11==8.7.0.84; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cublas-cu11==11.11.3.6; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cufft-cu11==10.9.0.58; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-curand-cu11==10.3.0.86; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cusolver-cu11==11.4.1.48; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cusparse-cu11==11.7.5.86; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nccl-cu11==2.19.3; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nvtx-cu11==11.8.86; platform_system == 'Linux' and platform_machine == 'x86_64'"
),
"12.1": (
"nvidia-cuda-nvrtc-cu12==12.1.105; platform_system == 'Linux' and platform_machine == 'x86_64' | " # noqa: B950
"nvidia-cuda-runtime-cu12==12.1.105; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cuda-cupti-cu12==12.1.105; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cudnn-cu12==8.9.2.26; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cublas-cu12==12.1.3.1; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cufft-cu12==11.0.2.54; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-curand-cu12==10.3.2.106; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cusolver-cu12==11.4.5.107; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cusparse-cu12==12.1.0.106; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nccl-cu12==2.19.3; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nvtx-cu12==12.1.105; platform_system == 'Linux' and platform_machine == 'x86_64'"
),
}
def get_nccl_submodule_version() -> str:
from pathlib import Path
nccl_version_mk = (
Path(__file__).absolute().parent.parent.parent
/ "third_party"
/ "nccl"
/ "nccl"
/ "makefiles"
/ "version.mk"
)
if not nccl_version_mk.exists():
raise RuntimeError(
"Please make sure that nccl submodule is checked out when importing this script"
)
with nccl_version_mk.open("r") as f:
content = f.read()
d = {}
for l in content.split("\n"):
if not l.startswith("NCCL_"):
continue
(k, v) = l.split(":=")
d[k.strip()] = v.strip()
return f"{d['NCCL_MAJOR']}.{d['NCCL_MINOR']}.{d['NCCL_PATCH']}"
def get_nccl_wheel_version(arch_version: str) -> str:
import re
requirements = map(
str.strip, re.split("[;|]", PYTORCH_EXTRA_INSTALL_REQUIREMENTS[arch_version])
)
return next(x for x in requirements if x.startswith("nvidia-nccl-cu")).split("==")[
1
]
def validate_nccl_dep_consistency(arch_version: str) -> None:
wheel_ver = get_nccl_wheel_version(arch_version)
submodule_ver = get_nccl_submodule_version()
if wheel_ver != submodule_ver:
raise RuntimeError(
f"NCCL submodule version {submodule_ver} differs from wheel version {wheel_ver}"
)
def arch_type(arch_version: str) -> str:
if arch_version in CUDA_ARCHES:
@ -38,23 +113,29 @@ def arch_type(arch_version: str) -> str:
return "cpu"
# This can be updated to the release version when cutting release branch, i.e. 2.1
DEFAULT_TAG = os.getenv("RELEASE_VERSION_TAG", "main")
WHEEL_CONTAINER_IMAGES = {
**{
gpu_arch: f"pytorch/manylinux-builder:cuda{gpu_arch}"
gpu_arch: f"pytorch/manylinux-builder:cuda{gpu_arch}-{DEFAULT_TAG}"
for gpu_arch in CUDA_ARCHES
},
**{
gpu_arch: f"pytorch/manylinux-builder:rocm{gpu_arch}"
gpu_arch: f"pytorch/manylinux-builder:rocm{gpu_arch}-{DEFAULT_TAG}"
for gpu_arch in ROCM_ARCHES
},
"cpu": "pytorch/manylinux-builder:cpu",
"cpu-cxx11-abi": "pytorch/manylinuxcxx11-abi-builder:cpu-cxx11-abi",
"cpu-aarch64": "pytorch/manylinuxaarch64-builder:cpu-aarch64",
"cpu": f"pytorch/manylinux-builder:cpu-{DEFAULT_TAG}",
"cpu-cxx11-abi": f"pytorch/manylinuxcxx11-abi-builder:cpu-cxx11-abi-{DEFAULT_TAG}",
"cpu-aarch64": f"pytorch/manylinuxaarch64-builder:cpu-aarch64-{DEFAULT_TAG}",
}
CONDA_CONTAINER_IMAGES = {
**{gpu_arch: f"pytorch/conda-builder:cuda{gpu_arch}" for gpu_arch in CUDA_ARCHES},
"cpu": "pytorch/conda-builder:cpu",
**{
gpu_arch: f"pytorch/conda-builder:cuda{gpu_arch}-{DEFAULT_TAG}"
for gpu_arch in CUDA_ARCHES
},
"cpu": f"pytorch/conda-builder:cpu-{DEFAULT_TAG}",
}
PRE_CXX11_ABI = "pre-cxx11"
@ -64,26 +145,38 @@ DEBUG = "debug"
LIBTORCH_CONTAINER_IMAGES: Dict[Tuple[str, str], str] = {
**{
(gpu_arch, PRE_CXX11_ABI): f"pytorch/manylinux-builder:cuda{gpu_arch}"
(
gpu_arch,
PRE_CXX11_ABI,
): f"pytorch/manylinux-builder:cuda{gpu_arch}-{DEFAULT_TAG}"
for gpu_arch in CUDA_ARCHES
},
**{
(gpu_arch, CXX11_ABI): f"pytorch/libtorch-cxx11-builder:cuda{gpu_arch}"
(
gpu_arch,
CXX11_ABI,
): f"pytorch/libtorch-cxx11-builder:cuda{gpu_arch}-{DEFAULT_TAG}"
for gpu_arch in CUDA_ARCHES
},
**{
(gpu_arch, PRE_CXX11_ABI): f"pytorch/manylinux-builder:rocm{gpu_arch}"
(
gpu_arch,
PRE_CXX11_ABI,
): f"pytorch/manylinux-builder:rocm{gpu_arch}-{DEFAULT_TAG}"
for gpu_arch in ROCM_ARCHES
},
**{
(gpu_arch, CXX11_ABI): f"pytorch/libtorch-cxx11-builder:rocm{gpu_arch}"
(
gpu_arch,
CXX11_ABI,
): f"pytorch/libtorch-cxx11-builder:rocm{gpu_arch}-{DEFAULT_TAG}"
for gpu_arch in ROCM_ARCHES
},
("cpu", PRE_CXX11_ABI): "pytorch/manylinux-builder:cpu",
("cpu", CXX11_ABI): "pytorch/libtorch-cxx11-builder:cpu",
("cpu", PRE_CXX11_ABI): f"pytorch/manylinux-builder:cpu-{DEFAULT_TAG}",
("cpu", CXX11_ABI): f"pytorch/libtorch-cxx11-builder:cpu-{DEFAULT_TAG}",
}
FULL_PYTHON_VERSIONS = ["3.8", "3.9", "3.10", "3.11"]
FULL_PYTHON_VERSIONS = ["3.8", "3.9", "3.10", "3.11", "3.12"]
def translate_desired_cuda(gpu_arch_type: str, gpu_arch_version: str) -> str:
@ -190,7 +283,6 @@ def generate_wheels_matrix(
os: str,
arches: Optional[List[str]] = None,
python_versions: Optional[List[str]] = None,
gen_special_an_non_special_wheel: bool = True,
) -> List[Dict[str, str]]:
package_type = "wheel"
if os == "linux" or os == "linux-aarch64":
@ -224,9 +316,8 @@ def generate_wheels_matrix(
else arch_version
)
# special 12.1 wheels package without dependencies
# dependency downloaded via pip install
if arch_version == "12.1" and os == "linux":
# 12.1 linux wheels require PYTORCH_EXTRA_INSTALL_REQUIREMENTS to install
if arch_version in ["12.1", "11.8"] and os == "linux":
ret.append(
{
"python_version": python_version,
@ -238,41 +329,36 @@ def generate_wheels_matrix(
"devtoolset": "",
"container_image": WHEEL_CONTAINER_IMAGES[arch_version],
"package_type": package_type,
"pytorch_extra_install_requirements": "nvidia-cuda-nvrtc-cu12==12.1.105; platform_system == 'Linux' and platform_machine == 'x86_64' | " # noqa: B950
"nvidia-cuda-runtime-cu12==12.1.105; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cuda-cupti-cu12==12.1.105; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cudnn-cu12==8.9.2.26; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cublas-cu12==12.1.3.1; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cufft-cu12==11.0.2.54; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-curand-cu12==10.3.2.106; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cusolver-cu12==11.4.5.107; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-cusparse-cu12==12.1.0.106; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nccl-cu12==2.18.1; platform_system == 'Linux' and platform_machine == 'x86_64' | "
"nvidia-nvtx-cu12==12.1.105; platform_system == 'Linux' and platform_machine == 'x86_64'",
"build_name": f"{package_type}-py{python_version}-{gpu_arch_type}{gpu_arch_version}-with-pypi-cudnn".replace( # noqa: B950
"pytorch_extra_install_requirements": PYTORCH_EXTRA_INSTALL_REQUIREMENTS[arch_version], # fmt: skip
"build_name": f"{package_type}-py{python_version}-{gpu_arch_type}{gpu_arch_version}".replace( # noqa: B950
".", "_"
),
}
)
if not gen_special_an_non_special_wheel:
continue
ret.append(
{
"python_version": python_version,
"gpu_arch_type": gpu_arch_type,
"gpu_arch_version": gpu_arch_version,
"desired_cuda": translate_desired_cuda(
gpu_arch_type, gpu_arch_version
),
"devtoolset": "cxx11-abi"
if arch_version == "cpu-cxx11-abi"
else "",
"container_image": WHEEL_CONTAINER_IMAGES[arch_version],
"package_type": package_type,
"build_name": f"{package_type}-py{python_version}-{gpu_arch_type}{gpu_arch_version}".replace(
".", "_"
),
}
)
else:
ret.append(
{
"python_version": python_version,
"gpu_arch_type": gpu_arch_type,
"gpu_arch_version": gpu_arch_version,
"desired_cuda": translate_desired_cuda(
gpu_arch_type, gpu_arch_version
),
"devtoolset": "cxx11-abi"
if arch_version == "cpu-cxx11-abi"
else "",
"container_image": WHEEL_CONTAINER_IMAGES[arch_version],
"package_type": package_type,
"build_name": f"{package_type}-py{python_version}-{gpu_arch_type}{gpu_arch_version}".replace(
".", "_"
),
"pytorch_extra_install_requirements":
PYTORCH_EXTRA_INSTALL_REQUIREMENTS["12.1"] # fmt: skip
if os != "linux" else "",
}
)
return ret
validate_nccl_dep_consistency("12.1")
validate_nccl_dep_consistency("11.8")


@ -60,7 +60,7 @@ class BinaryBuildWorkflow:
branches: str = "nightly"
# Mainly for macos
cross_compile_arm64: bool = False
xcode_version: str = ""
macos_runner: str = "macos-12-xl"
def __post_init__(self) -> None:
if self.abi_version:
@ -125,7 +125,9 @@ LINUX_BINARY_BUILD_WORFKLOWS = [
package_type="libtorch",
abi_version=generate_binary_build_matrix.CXX11_ABI,
build_configs=generate_binary_build_matrix.generate_libtorch_matrix(
OperatingSystem.LINUX, generate_binary_build_matrix.CXX11_ABI
OperatingSystem.LINUX,
generate_binary_build_matrix.CXX11_ABI,
libtorch_variants=["shared-with-deps"],
),
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_LIBTORCH},
@ -137,7 +139,9 @@ LINUX_BINARY_BUILD_WORFKLOWS = [
package_type="libtorch",
abi_version=generate_binary_build_matrix.PRE_CXX11_ABI,
build_configs=generate_binary_build_matrix.generate_libtorch_matrix(
OperatingSystem.LINUX, generate_binary_build_matrix.PRE_CXX11_ABI
OperatingSystem.LINUX,
generate_binary_build_matrix.PRE_CXX11_ABI,
libtorch_variants=["shared-with-deps"],
),
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_LIBTORCH},
@ -154,7 +158,6 @@ LINUX_BINARY_SMOKE_WORKFLOWS = [
OperatingSystem.LINUX,
arches=["11.8", "12.1"],
python_versions=["3.8"],
gen_special_an_non_special_wheel=False,
),
branches="main",
),
@ -212,7 +215,9 @@ WINDOWS_BINARY_BUILD_WORKFLOWS = [
package_type="libtorch",
abi_version=generate_binary_build_matrix.RELEASE,
build_configs=generate_binary_build_matrix.generate_libtorch_matrix(
OperatingSystem.WINDOWS, generate_binary_build_matrix.RELEASE
OperatingSystem.WINDOWS,
generate_binary_build_matrix.RELEASE,
libtorch_variants=["shared-with-deps"],
),
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_LIBTORCH},
@ -224,7 +229,9 @@ WINDOWS_BINARY_BUILD_WORKFLOWS = [
package_type="libtorch",
abi_version=generate_binary_build_matrix.DEBUG,
build_configs=generate_binary_build_matrix.generate_libtorch_matrix(
OperatingSystem.WINDOWS, generate_binary_build_matrix.DEBUG
OperatingSystem.WINDOWS,
generate_binary_build_matrix.DEBUG,
libtorch_variants=["shared-with-deps"],
),
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_LIBTORCH},
@ -294,20 +301,39 @@ MACOS_BINARY_BUILD_WORKFLOWS = [
package_type="libtorch",
abi_version=generate_binary_build_matrix.CXX11_ABI,
build_configs=generate_binary_build_matrix.generate_libtorch_matrix(
OperatingSystem.MACOS, generate_binary_build_matrix.CXX11_ABI
OperatingSystem.MACOS,
generate_binary_build_matrix.CXX11_ABI,
libtorch_variants=["shared-with-deps"],
),
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_LIBTORCH},
isolated_workflow=True,
),
),
BinaryBuildWorkflow(
os=OperatingSystem.MACOS_ARM64,
package_type="libtorch",
abi_version=generate_binary_build_matrix.CXX11_ABI,
build_configs=generate_binary_build_matrix.generate_libtorch_matrix(
OperatingSystem.MACOS,
generate_binary_build_matrix.CXX11_ABI,
libtorch_variants=["shared-with-deps"],
),
cross_compile_arm64=False,
macos_runner="macos-13-xlarge",
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_LIBTORCH},
isolated_workflow=True,
),
),
BinaryBuildWorkflow(
os=OperatingSystem.MACOS_ARM64,
package_type="wheel",
build_configs=generate_binary_build_matrix.generate_wheels_matrix(
OperatingSystem.MACOS_ARM64
),
cross_compile_arm64=True,
cross_compile_arm64=False,
macos_runner="macos-13-xlarge",
ciflow_config=CIFlowConfig(
labels={LABEL_CIFLOW_BINARIES, LABEL_CIFLOW_BINARIES_WHEEL},
isolated_workflow=True,


@ -111,7 +111,7 @@ def fetch_jobs(url: str, headers: Dict[str, str]) -> List[Dict[str, str]]:
# running.
def find_job_id(args: Any) -> str:
def find_job_id_name(args: Any) -> Tuple[str, str]:
# From https://docs.github.com/en/actions/learn-github-actions/environment-variables
PYTORCH_REPO = os.environ.get("GITHUB_REPOSITORY", "pytorch/pytorch")
PYTORCH_GITHUB_API = f"https://api.github.com/repos/{PYTORCH_REPO}"
@ -130,15 +130,28 @@ def find_job_id(args: Any) -> str:
for job in jobs:
if job["runner_name"] == args.runner_name:
return job["id"]
return (job["id"], job["name"])
raise RuntimeError(f"Can't find job id for runner {args.runner_name}")
def set_output(name: str, val: Any) -> None:
if os.getenv("GITHUB_OUTPUT"):
with open(str(os.getenv("GITHUB_OUTPUT")), "a") as env:
print(f"{name}={val}", file=env)
print(f"setting {name}={val}")
else:
print(f"::set-output name={name}::{val}")
def main() -> None:
args = parse_args()
try:
print(find_job_id(args))
# Get both the job ID and job name because we have already spent a request
# here to get the job info
job_id, job_name = find_job_id_name(args)
set_output("job-id", job_id)
set_output("job-name", job_name)
except Exception as e:
print(repr(e), file=sys.stderr)
print(f"workflow-{args.workflow_run_id}")


@ -5,12 +5,15 @@ import os
import warnings
from dataclasses import dataclass
from typing import Any, Callable, cast, Dict, List, Optional, Tuple
from typing import Any, Callable, cast, Dict, List, Optional, Tuple, Union
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import Request, urlopen
GITHUB_API_URL = "https://api.github.com"
@dataclass
class GitHubComment:
body_text: str
@ -26,16 +29,20 @@ def gh_fetch_url_and_headers(
url: str,
*,
headers: Optional[Dict[str, str]] = None,
data: Optional[Dict[str, Any]] = None,
data: Union[Optional[Dict[str, Any]], str] = None,
method: Optional[str] = None,
reader: Callable[[Any], Any] = lambda x: x.read(),
) -> Tuple[Any, Any]:
if headers is None:
headers = {}
token = os.environ.get("GITHUB_TOKEN")
if token is not None and url.startswith("https://api.github.com/"):
if token is not None and url.startswith(f"{GITHUB_API_URL}/"):
headers["Authorization"] = f"token {token}"
data_ = json.dumps(data).encode() if data is not None else None
data_ = None
if data is not None:
data_ = data.encode() if isinstance(data, str) else json.dumps(data).encode()
try:
with urlopen(Request(url, headers=headers, data=data_, method=method)) as conn:
return conn.headers, reader(conn)
@ -57,7 +64,7 @@ def gh_fetch_url(
url: str,
*,
headers: Optional[Dict[str, str]] = None,
data: Optional[Dict[str, Any]] = None,
data: Union[Optional[Dict[str, Any]], str] = None,
method: Optional[str] = None,
reader: Callable[[Any], Any] = lambda x: x.read(),
) -> Any:
@ -125,7 +132,7 @@ def gh_post_pr_comment(
org: str, repo: str, pr_num: int, comment: str, dry_run: bool = False
) -> List[Dict[str, Any]]:
return _gh_post_comment(
f"https://api.github.com/repos/{org}/{repo}/issues/{pr_num}/comments",
f"{GITHUB_API_URL}/repos/{org}/{repo}/issues/{pr_num}/comments",
comment,
dry_run,
)
@ -135,14 +142,14 @@ def gh_post_commit_comment(
org: str, repo: str, sha: str, comment: str, dry_run: bool = False
) -> List[Dict[str, Any]]:
return _gh_post_comment(
f"https://api.github.com/repos/{org}/{repo}/commits/{sha}/comments",
f"{GITHUB_API_URL}/repos/{org}/{repo}/commits/{sha}/comments",
comment,
dry_run,
)
def gh_delete_comment(org: str, repo: str, comment_id: int) -> None:
url = f"https://api.github.com/repos/{org}/{repo}/issues/comments/{comment_id}"
url = f"{GITHUB_API_URL}/repos/{org}/{repo}/issues/comments/{comment_id}"
gh_fetch_url(url, method="DELETE")
@ -153,7 +160,7 @@ def gh_fetch_merge_base(org: str, repo: str, base: str, head: str) -> str:
# https://docs.github.com/en/rest/commits/commits?apiVersion=2022-11-28#compare-two-commits
try:
json_data = gh_fetch_url(
f"https://api.github.com/repos/{org}/{repo}/compare/{base}...{head}",
f"{GITHUB_API_URL}/repos/{org}/{repo}/compare/{base}...{head}",
headers={"Accept": "application/vnd.github.v3+json"},
reader=json.load,
)
@ -167,3 +174,18 @@ def gh_fetch_merge_base(org: str, repo: str, base: str, head: str) -> str:
warnings.warn(f"Failed to get merge base for {base}...{head}: {error}")
return merge_base
def gh_update_pr_state(org: str, repo: str, pr_num: int, state: str = "open") -> None:
url = f"{GITHUB_API_URL}/repos/{org}/{repo}/pulls/{pr_num}"
try:
gh_fetch_url(url, method="PATCH", data={"state": state})
except HTTPError as err:
# When trying to open the pull request, error 422 means that the branch
# has been deleted and the API couldn't re-open it
if err.code == 422 and state == "open":
warnings.warn(
f"Failed to open {pr_num} because its head branch has been deleted: {err}"
)
else:
raise
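
Typical usage of the new helper (the PR number is illustrative; the module name matches the file shown above):

```
from github_utils import gh_update_pr_state

# Reopen a PR; a 422 caused by a deleted head branch is downgraded to a warning.
gh_update_pr_state("pytorch", "pytorch", 12345, state="open")

# Any other HTTP error (or a 422 when closing) is re-raised.
gh_update_pr_state("pytorch", "pytorch", 12345, state="closed")
```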

.github/scripts/gql_mocks.json generated vendored

File diff suppressed because one or more lines are too long

.github/scripts/gql_mocks.json.gz vendored Normal file

Binary file not shown.


@ -38,6 +38,12 @@ def parse_args() -> argparse.Namespace:
required=True,
help="A unique job identifier that should be the same for all runs of job",
)
parser.add_argument(
"--sha", required="--upload" in sys.argv, help="SHA of the commit"
) # Only required for upload
parser.add_argument(
"--test_config", required="--upload" in sys.argv, help="The test config"
) # Only required for upload
parser.add_argument(
"--shard", required="--upload" in sys.argv, help="The shard id"
) # Only required for upload
@ -84,6 +90,8 @@ def main() -> None:
pr_identifier=pr_identifier,
repo=repo,
job_identifier=args.job_identifier,
sha=args.sha,
test_config=args.test_config,
shard=args.shard,
cache_dir=cache_dir,
bucket=args.bucket,


@ -56,6 +56,8 @@ def upload_pytest_cache(
pr_identifier: PRIdentifier,
repo: GithubRepo,
job_identifier: str,
sha: str,
test_config: str,
shard: str,
cache_dir: Path,
temp_dir: Path,
@ -79,25 +81,11 @@ def upload_pytest_cache(
if not bucket:
bucket = BUCKET
# Merge the current cache with any caches from previous runs before uploading
# We only need to merge it with the cache for the same shard (which will have already been downloaded if it exists)
# since the other shards will handle themselves
shard_cache_path = _get_temp_cache_dir_path(
temp_dir, pr_identifier, repo, job_identifier, shard
)
if shard_cache_path.is_dir():
_merge_pytest_caches(shard_cache_path, cache_dir)
#
# Upload the cache
#
obj_key_prefix = _get_s3_key_prefix(pr_identifier, repo, job_identifier, shard)
# This doesn't include the zip file extension. That'll get added later
zip_file_path = temp_dir / ZIP_UPLOAD / obj_key_prefix
zip_file_path = zip_folder(cache_dir, zip_file_path)
obj_key_prefix = _get_s3_key_prefix(
pr_identifier, repo, job_identifier, sha, test_config, shard
)
zip_file_path = zip_folder(cache_dir, temp_dir / ZIP_UPLOAD / obj_key_prefix)
obj_key = f"{obj_key_prefix}{os.path.splitext(zip_file_path)[1]}" # Keep the new file extension
upload_file_to_s3(zip_file_path, bucket, obj_key)
@ -136,38 +124,22 @@ def download_pytest_cache(
)
for downloaded_zip in downloads:
# the file name of the zip is the shard id
shard = os.path.splitext(os.path.basename(downloaded_zip))[0]
cache_dir_for_shard = _get_temp_cache_dir_path(
temp_dir, pr_identifier, repo, job_identifier, shard
# Unzip into random folder, then merge with the current cache
cache_dir_for_shard = (
temp_dir / UNZIPPED_CACHES / os.urandom(16).hex() / PYTEST_CACHE_DIR_NAME
)
unzip_folder(downloaded_zip, cache_dir_for_shard)
print(
f"Merging cache for job_identifier `{job_identifier}`, shard `{shard}` into `{dest_cache_dir}`"
)
print(f"Merging cache from {downloaded_zip}")
_merge_pytest_caches(cache_dir_for_shard, dest_cache_dir)
def _get_temp_cache_dir_path(
temp_dir: Path,
pr_identifier: PRIdentifier,
repo: GithubRepo,
job_identifier: str,
shard: str,
) -> Path:
return (
temp_dir
/ UNZIPPED_CACHES
/ _get_s3_key_prefix(pr_identifier, repo, job_identifier, shard)
/ PYTEST_CACHE_DIR_NAME
)
def _get_s3_key_prefix(
pr_identifier: PRIdentifier,
repo: GithubRepo,
job_identifier: str,
sha: str = "",
test_config: str = "",
shard: str = "",
) -> str:
"""
@ -176,6 +148,10 @@ def _get_s3_key_prefix(
"""
prefix = f"{PYTEST_CACHE_KEY_PREFIX}/{repo.owner}/{repo.name}/{pr_identifier}/{sanitize_for_s3(job_identifier)}"
if sha:
prefix += f"/{sha}"
if test_config:
prefix += f"/{sanitize_for_s3(test_config)}"
if shard:
prefix += f"/{shard}"

File diff suppressed because it is too large

.github/scripts/rockset_mocks.json.gz vendored Normal file

Binary file not shown.


@ -0,0 +1,64 @@
import argparse
import subprocess
from typing import Dict
import generate_binary_build_matrix
def tag_image(
image: str,
default_tag: str,
release_version: str,
dry_run: str,
tagged_images: Dict[str, bool],
) -> None:
if image in tagged_images:
return
release_image = image.replace(f"-{default_tag}", f"-{release_version}")
print(f"Tagging {image} to {release_image} , dry_run: {dry_run}")
if dry_run == "disabled":
subprocess.check_call(["docker", "pull", image])
subprocess.check_call(["docker", "tag", image, release_image])
subprocess.check_call(["docker", "push", release_image])
tagged_images[image] = True
def main() -> None:
parser = argparse.ArgumentParser()
parser.add_argument(
"--version",
help="Version to tag",
type=str,
default="2.2",
)
parser.add_argument(
"--dry-run",
help="No Runtime Error check",
type=str,
choices=["enabled", "disabled"],
default="enabled",
)
options = parser.parse_args()
tagged_images: Dict[str, bool] = dict()
platform_images = [
generate_binary_build_matrix.WHEEL_CONTAINER_IMAGES,
generate_binary_build_matrix.LIBTORCH_CONTAINER_IMAGES,
generate_binary_build_matrix.CONDA_CONTAINER_IMAGES,
]
default_tag = generate_binary_build_matrix.DEFAULT_TAG
for platform_image in platform_images: # type: ignore[attr-defined]
for arch in platform_image.keys(): # type: ignore[attr-defined]
tag_image(
platform_image[arch], # type: ignore[index]
default_tag,
options.version,
options.dry_run,
tagged_images,
)
if __name__ == "__main__":
main()
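
The retag performed by tag_image is a plain string substitution on the default tag. For example, with DEFAULT_TAG == "main" and --version 2.2:

```
image = "pytorch/conda-builder:cpu-main"
release_image = image.replace("-main", "-2.2")
assert release_image == "pytorch/conda-builder:cpu-2.2"
```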


@ -102,6 +102,30 @@ MOCKED_DISABLED_UNSTABLE_JOBS = {
"manywheel-py3_8-cuda11_8-build",
"",
],
"inductor / cuda12.1-py3.10-gcc9-sm86 / test (inductor)": [
"pytorchbot",
"107079",
"https://github.com/pytorch/pytorch/issues/107079",
"inductor",
"cuda12.1-py3.10-gcc9-sm86",
"test (inductor)",
],
"inductor / cuda12.1-py3.10-gcc9-sm86 / test (inductor_huggingface)": [
"pytorchbot",
"109153",
"https://github.com/pytorch/pytorch/issues/109153",
"inductor",
"cuda12.1-py3.10-gcc9-sm86",
"test (inductor_huggingface)",
],
"inductor / cuda12.1-py3.10-gcc9-sm86 / test (inductor_huggingface_dynamic)": [
"pytorchbot",
"109154",
"https://github.com/pytorch/pytorch/issues/109154",
"inductor",
"cuda12.1-py3.10-gcc9-sm86",
"test (inductor_huggingface_dynamic)",
],
}
MOCKED_PR_INFO = {
@ -569,6 +593,37 @@ class TestConfigFilter(TestCase):
"expected": '{"include": [{"config": "default", "unstable": "unstable"}]}',
"description": "Both binary build and test jobs are unstable",
},
{
"workflow": "inductor",
"job_name": "cuda12.1-py3.10-gcc9-sm86 / build",
"test_matrix": """
{ include: [
{ config: "inductor" },
{ config: "inductor_huggingface", shard: 1 },
{ config: "inductor_huggingface", shard: 2 },
{ config: "inductor_timm", shard: 1 },
{ config: "inductor_timm", shard: 2 },
{ config: "inductor_torchbench" },
{ config: "inductor_huggingface_dynamic" },
{ config: "inductor_torchbench_dynamic" },
{ config: "inductor_distributed" },
]}
""",
"expected": """
{ "include": [
{ "config": "inductor", "unstable": "unstable" },
{ "config": "inductor_huggingface", "shard": 1, "unstable": "unstable" },
{ "config": "inductor_huggingface", "shard": 2, "unstable": "unstable" },
{ "config": "inductor_timm", "shard": 1 },
{ "config": "inductor_timm", "shard": 2 },
{ "config": "inductor_torchbench" },
{ "config": "inductor_huggingface_dynamic", "unstable": "unstable" },
{ "config": "inductor_torchbench_dynamic" },
{ "config": "inductor_distributed" }
]}
""",
"description": "Marking multiple unstable configurations",
},
]
for case in testcases:
@ -577,7 +632,7 @@ class TestConfigFilter(TestCase):
test_matrix = yaml.safe_load(case["test_matrix"])
filtered_test_matrix = mark_unstable_jobs(workflow, job_name, test_matrix)
self.assertEqual(case["expected"], json.dumps(filtered_test_matrix))
self.assertEqual(json.loads(case["expected"]), filtered_test_matrix)
@mock.patch("subprocess.check_output")
def test_perform_misc_tasks(self, mocked_subprocess: Any) -> None:


@ -7,11 +7,12 @@
# GraphQL queries in trymerge.py, please make sure to delete `gql_mocks.json`
# And re-run the test locally with ones PAT
import gzip
import json
import os
import warnings
from hashlib import sha256
from typing import Any, cast, Dict, List, Optional
from typing import Any, Dict, List, Optional
from unittest import main, mock, skip, TestCase
from urllib.error import HTTPError
@ -19,18 +20,20 @@ from gitutils import get_git_remote_name, get_git_repo_dir, GitRepo
from trymerge import (
categorize_checks,
DRCI_CHECKRUN_NAME,
find_matching_merge_rule,
FlakyRule,
get_classifications,
get_drci_classifications,
get_rockset_results,
gh_get_team_members,
gh_graphql,
GitHubPR,
is_broken_trunk,
JobCheckState,
main as trymerge_main,
MandatoryChecksMissingError,
MergeRule,
PostCommentError,
RE_GHSTACK_DESC,
read_merge_rules,
remove_job_name_suffix,
validate_revert,
@ -39,6 +42,10 @@ from trymerge import (
if "GIT_REMOTE_URL" not in os.environ:
os.environ["GIT_REMOTE_URL"] = "https://github.com/pytorch/pytorch"
GQL_MOCKS = "gql_mocks.json.gz"
ROCKSET_MOCKS = "rockset_mocks.json.gz"
DRCI_MOCKS = "drci_mocks.json.gz"
def mock_query(
fallback_function: Any,
@ -51,11 +58,11 @@ def mock_query(
def get_mocked_queries() -> Any:
if not os.path.exists(gql_db_fname):
return {}
with open(gql_db_fname, encoding="utf-8") as f:
with gzip.open(gql_db_fname, encoding="utf-8", mode="rt") as f:
return json.load(f)
def save_mocked_queries(obj: Any) -> None:
with open(gql_db_fname, encoding="utf-8", mode="w") as f:
with gzip.open(gql_db_fname, encoding="utf-8", mode="wt") as f:
json.dump(obj, f, indent=2)
f.write("\n")
@ -68,19 +75,20 @@ def mock_query(
try:
rc = fallback_function(*args)
except HTTPError as err:
if err.code == 401:
if err.code == 401 or err.code == 403:
err_msg = f"If you are seeing this message during workflow run, please make sure to update {file_name}"
err_msg += f" locally, by deleting it and running {os.path.basename(__file__)} with "
err_msg += " GitHub Personal Access Token passed via GITHUB_TOKEN environment variable"
err_msg += (
" the rockset api key passed via ROCKSET_API_KEY environment variable"
)
err_msg += f" locally, by deleting it and running {os.path.basename(__file__)} with"
err_msg += " GitHub Personal Access Token passed via GITHUB_TOKEN,"
err_msg += " the rockset api key passed via ROCKSET_API_KEY,"
err_msg += " and drci api key passed via DRCI_BOT_KEY environment variables"
if (
os.getenv("GITHUB_TOKEN") is None
or os.getenv("ROCKSET_API_KEY") is None
or os.getenv("DRCI_BOT_KEY") is None
):
err_msg = (
"Failed to update cached GraphQL queries as GITHUB_TOKEN or ROCKSET_API_KEY is not defined."
"Failed to update cached queries as GITHUB_TOKEN or ROCKSET_API_KEY or DRCI_BOT_KEY "
+ "is not defined. "
+ err_msg
)
raise RuntimeError(err_msg) from err
@ -100,19 +108,29 @@ def mocked_gh_graphql(query: str, **kwargs: Any) -> Any:
def gh_graphql_wrapper(query: str, kwargs: Any) -> Any:
return gh_graphql(query, **kwargs)
return mock_query(gh_graphql_wrapper, "gql_mocks.json", key_function, query, kwargs)
return mock_query(gh_graphql_wrapper, GQL_MOCKS, key_function, query, kwargs)
def mocked_rockset_results(head_sha: str, merge_base: str, num_retries: int = 3) -> Any:
return mock_query(
get_rockset_results,
"rockset_mocks.json",
ROCKSET_MOCKS,
lambda x, y: f"{x} {y}",
head_sha,
merge_base,
)
def mocked_drci_classifications(pr_num: int, project: str, num_retries: int = 3) -> Any:
return mock_query(
get_drci_classifications,
DRCI_MOCKS,
lambda x, y: f"{x} {y}",
pr_num,
project,
)
def mock_parse_args(revert: bool = False, force: bool = False) -> Any:
class Object:
def __init__(self) -> None:
@ -189,6 +207,18 @@ def mocked_read_merge_rules(repo: Any, org: str, project: str) -> List[MergeRule
],
ignore_flaky_failures=True,
),
MergeRule(
name="xla",
patterns=[".github/ci_commit_pins/xla.txt"],
approved_by=["pytorchbot"],
mandatory_checks_name=[
"Lint",
"EasyCLA",
"pull / linux-focal-py3_8-clang9-xla / build",
"pull / linux-focal-py3_8-clang9-xla / test (xla, 1, 1, linux.12xlarge)",
],
ignore_flaky_failures=True,
),
]
@ -196,16 +226,6 @@ def mocked_read_merge_rules_raise(repo: Any, org: str, project: str) -> List[Mer
raise RuntimeError("testing")
def empty_flaky_rules() -> List[FlakyRule]:
return []
def xla_is_flaky_rules() -> List[FlakyRule]:
return [
FlakyRule("xla", ["FAILED: Build did NOT complete successfully"]),
]
def xla_merge_rules(repo: Any, org: str, project: str) -> List[MergeRule]:
return [
MergeRule(
@ -217,6 +237,7 @@ def xla_merge_rules(repo: Any, org: str, project: str) -> List[MergeRule]:
"EasyCLA",
"pull / linux-bionic-py3_8-clang8-xla / build",
"pull / linux-bionic-py3_8-clang8-xla / test (xla, 1, 1, linux.4xlarge)",
"inductor / cuda11.8-py3.10-gcc7-sm86 / test (inductor_torchbench_dynamic, 1, 1, linux.g5.4xlarge.nvidia.gpu)",
],
ignore_flaky_failures=False,
),
@ -238,9 +259,11 @@ class DummyGitRepo(GitRepo):
return "super awsome commit message"
@mock.patch("trymerge.read_flaky_rules", side_effect=empty_flaky_rules)
@mock.patch("trymerge.get_rockset_results", side_effect=empty_rockset_results)
@mock.patch("trymerge.gh_graphql", side_effect=mocked_gh_graphql)
@mock.patch(
"trymerge.get_drci_classifications", side_effect=mocked_drci_classifications
)
class TestTryMerge(TestCase):
def test_merge_rules_valid(self, *args: Any) -> None:
"Test that merge_rules.yaml can be parsed"
@ -251,7 +274,7 @@ class TestTryMerge(TestCase):
@mock.patch("trymerge.read_merge_rules", side_effect=mocked_read_merge_rules)
def test_match_rules(self, *args: Any) -> None:
"Tests that PR passes merge rules"
pr = GitHubPR("pytorch", "pytorch", 77700)
pr = GitHubPR("pytorch", "pytorch", 109999)
repo = DummyGitRepo()
self.assertTrue(find_matching_merge_rule(pr, repo) is not None)
@ -304,14 +327,9 @@ class TestTryMerge(TestCase):
def test_internal_changes(self, *args: Any) -> None:
"Tests that PR with internal changes is detected"
pr = GitHubPR("pytorch", "pytorch", 73969)
pr = GitHubPR("pytorch", "pytorch", 110140)
self.assertTrue(pr.has_internal_changes())
def test_checksuites_pagination(self, *args: Any) -> None:
"Tests that PR with lots of checksuits can be fetched"
pr = GitHubPR("pytorch", "pytorch", 73811)
self.assertEqual(len(pr.get_checkrun_conclusions()), 76)
def test_comments_pagination(self, *args: Any) -> None:
"Tests that PR with 50+ comments can be fetched"
pr = GitHubPR("pytorch", "pytorch", 31093)
@ -323,7 +341,9 @@ class TestTryMerge(TestCase):
# see https://gist.github.com/malfet/9b93bc7eeddeaf1d84546efc4f0c577f
pr = GitHubPR("pytorch", "pytorch", 68111)
self.assertGreater(len(pr.get_comments()), 20)
self.assertGreater(len(pr.get_checkrun_conclusions()), 3)
# NS(09/27/2023): GitHub seems to recycle older checkruns
# https://github.com/pytorch/pytorch/pull/68111/checks shows 0 runs
# self.assertGreater(len(pr.get_checkrun_conclusions()), 3)
self.assertGreater(pr.get_commit_count(), 60)
def test_gql_retrieve_checksuites(self, *args: Any) -> None:
@ -368,14 +388,16 @@ class TestTryMerge(TestCase):
def test_get_checkruns_many_runs(self, *args: Any) -> None:
"""Tests that all checkruns can be fetched"""
pr = GitHubPR("pytorch", "pytorch", 77700)
pr = GitHubPR("pytorch", "pytorch", 105260)
conclusions = pr.get_checkrun_conclusions()
self.assertEqual(len(conclusions), 79)
self.assertTrue("pull / linux-docs / build-docs (cpp)" in conclusions.keys())
self.assertEqual(len(conclusions), 221)
self.assertTrue(
"pull / linux-docs / build-docs-cpp-false" in conclusions.keys()
)
def test_cancelled_gets_ignored(self, *args: Any) -> None:
"""Tests that cancelled workflow does not override existing successfull status"""
pr = GitHubPR("pytorch", "pytorch", 82169)
pr = GitHubPR("pytorch", "pytorch", 110367)
conclusions = pr.get_checkrun_conclusions()
lint_checks = [name for name in conclusions.keys() if "Lint" in name]
self.assertTrue(len(lint_checks) > 0)
@ -523,108 +545,7 @@ class TestTryMerge(TestCase):
for case in test_cases:
self.assertEqual(case["expected"], remove_job_name_suffix(case["name"]))
def test_is_broken_trunk(self, *args: Any) -> None:
test_cases: List[Dict[str, Any]] = [
{
"head_job": None,
"base_jobs": {
"job_a": {
"conclusion": "success",
"failure_captures": ["a", "b"],
},
"job_b": {
"conclusion": "failure",
"failure_captures": ["a", "b"],
},
},
"expected": False,
"description": "Invalid input - head job",
},
{
"head_job": {
"conclusion": "failure",
"failure_captures": ["a", "b"],
},
"base_jobs": None,
"expected": False,
"description": "Invalid input - base jobs",
},
{
"head_job": {
"conclusion": "failure",
"failure_captures": ["a", "b"],
},
"base_jobs": {},
"expected": False,
"description": "Invalid input - empty base jobs",
},
{
"head_job": {
"conclusion": "failure",
"failure_captures": ["x", "y"],
},
"base_jobs": {
"job_a": {
"conclusion": "success",
"failure_captures": ["a", "b"],
},
"job_b": {
"conclusion": "failure",
"failure_captures": ["x", "y"],
},
},
"expected": True,
"description": "Found a match",
},
{
"head_job": {
"conclusion": "success",
"failure_captures": ["x", "y"],
},
"base_jobs": {
"job_a": {
"conclusion": "success",
"failure_captures": ["a", "b"],
},
"job_b": {
"conclusion": "failure",
"failure_captures": ["x", "y"],
},
},
"expected": False,
"description": "Not found - different conclusion",
},
{
"head_job": {
"conclusion": "failure",
"failure_captures": ["a", "b"],
},
"base_jobs": {
"job_a": {
"conclusion": "success",
"failure_captures": ["a", "b"],
},
"job_b": {
"conclusion": "failure",
"failure_captures": ["x", "y"],
},
},
"expected": False,
"description": "Not found - different captured failures",
},
]
for case in test_cases:
self.assertEqual(
case["expected"], is_broken_trunk(case["head_job"], case["base_jobs"])
)
def test_get_merge_base(
self,
mock_gh_graphql: Any,
mock_get_rockset_results: Any,
mock_read_flaky_rules: Any,
) -> None:
def test_get_merge_base(self, *args: Any) -> None:
pr = GitHubPR("pytorch", "pytorch", 104121)
mock_merge_base = "mocked-sha"
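Dropping the individually named mock parameters in favor of *args works because stacked @mock.patch decorators inject their mocks bottom-up and this test never inspects them. A minimal illustration with arbitrary patch targets:

from unittest import mock

@mock.patch("os.getcwd", return_value="/tmp")
@mock.patch("os.getpid", return_value=1234)
def show_order(mock_getpid: mock.MagicMock, mock_getcwd: mock.MagicMock) -> None:
    # The decorator closest to the function is injected first.
    assert mock_getpid() == 1234
    assert mock_getcwd() == "/tmp"

show_order()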
@ -642,57 +563,130 @@ class TestTryMerge(TestCase):
@mock.patch("trymerge.get_rockset_results", side_effect=mocked_rockset_results)
@mock.patch("trymerge.gh_graphql", side_effect=mocked_gh_graphql)
@mock.patch("trymerge.gh_fetch_merge_base", return_value="")
@mock.patch(
"trymerge.get_drci_classifications", side_effect=mocked_drci_classifications
)
class TestBypassFailures(TestCase):
def test_get_classifications(self, *args: Any) -> None:
flaky_rules = [
# Try a regex rule
FlakyRule("distributed", ["##\\[error\\]The operation [wW]as .+"])
]
pr = GitHubPR("pytorch", "pytorch", 92863)
pr = GitHubPR("pytorch", "pytorch", 109584)
checks = pr.get_checkrun_conclusions()
checks = get_classifications(
checks, pr.last_commit()["oid"], pr.get_merge_base(), flaky_rules, []
pr.pr_num,
pr.project,
checks,
[],
)
self.assertTrue(
checks[
"pull / linux-bionic-py3_7-clang8-xla / test (xla, 1, 1, linux.4xlarge)"
"pull / linux-focal-py3.11-clang10 / test (dynamo, 1, 2, linux.2xlarge)"
].classification
== "BROKEN_TRUNK"
)
self.assertTrue(
checks[
"pull / linux-focal-py3.7-gcc7 / test (distributed, 1, 2, linux.2xlarge)"
"trunk / win-vs2019-cpu-py3 / test (default, 2, 3, windows.4xlarge.nonephemeral)"
].classification
== "FLAKY"
)
self.assertTrue(
checks[
"pull / linux-jammy-py3.8-gcc11 / test (distributed, 1, 2, linux.2xlarge)"
].classification
== "FLAKY"
)
self.assertTrue(
checks[
"pull / linux-focal-cuda11.8-py3.10-gcc9 / test (distributed, 1, 3, linux.8xlarge.nvidia.gpu)"
].classification
== "FLAKY"
)
# Set the threshold larger or equal to the number of ok failures
pending, failed, ignorable = categorize_checks(
checks, list(checks.keys()), ok_failed_checks_threshold=2
checks, list(checks.keys()), ok_failed_checks_threshold=6
)
self.assertTrue(len(pending) == 0)
self.assertTrue(len(failed) == 0)
self.assertTrue(len(ignorable["FLAKY"]) == 1)
self.assertTrue(len(ignorable["BROKEN_TRUNK"]) == 1)
self.assertTrue(len(ignorable["FLAKY"]) == 4)
self.assertTrue(len(ignorable["BROKEN_TRUNK"]) == 2)
# Not set any threshold, defaults to -1 to ignore all flaky and broken trunk failures
pending, failed, ignorable = categorize_checks(checks, list(checks.keys()))
self.assertTrue(len(pending) == 0)
self.assertTrue(len(failed) == 0)
self.assertTrue(len(ignorable["FLAKY"]) == 1)
self.assertTrue(len(ignorable["BROKEN_TRUNK"]) == 1)
self.assertTrue(len(ignorable["FLAKY"]) == 4)
self.assertTrue(len(ignorable["BROKEN_TRUNK"]) == 2)
# Set the threshold lower than the number of ok failures
pending, failed, ignorable = categorize_checks(
checks, list(checks.keys()), ok_failed_checks_threshold=1
)
self.assertTrue(len(pending) == 0)
self.assertTrue(len(failed) == 2)
self.assertTrue(len(failed) == 6)
self.assertTrue(len(ignorable["FLAKY"]) == 4)
self.assertTrue(len(ignorable["BROKEN_TRUNK"]) == 2)
# Set the threshold to 0 like when ignore_flaky_failures is on
pending, failed, ignorable = categorize_checks(
checks, list(checks.keys()), ok_failed_checks_threshold=1
)
self.assertTrue(len(pending) == 0)
self.assertTrue(len(failed) == 6)
self.assertTrue(len(ignorable["FLAKY"]) == 4)
self.assertTrue(len(ignorable["BROKEN_TRUNK"]) == 2)
def test_get_classifications_flaky_fullname(self, *args: Any) -> None:
pr = GitHubPR("pytorch", "pytorch", 110362)
checks = pr.get_checkrun_conclusions()
checks = get_classifications(
pr.pr_num,
pr.project,
checks,
[],
)
pending, failed, ignorable = categorize_checks(checks, list(checks.keys()))
self.assertTrue(len(pending) == 0)
self.assertTrue(len(failed) == 0)
self.assertTrue(len(ignorable["FLAKY"]) == 1)
def test_get_classifications_invalid_cancel(self, *args: Any) -> None:
pr = GitHubPR("pytorch", "pytorch", 110367)
checks = pr.get_checkrun_conclusions()
checks = get_classifications(
pr.pr_num,
pr.project,
checks,
[],
)
pending, failed, ignorable = categorize_checks(checks, list(checks.keys()))
self.assertTrue(len(pending) == 0)
self.assertTrue(len(failed) == 0)
self.assertTrue(len(ignorable["FLAKY"]) == 0)
self.assertTrue(len(ignorable["BROKEN_TRUNK"]) == 0)
self.assertTrue(len(ignorable["UNSTABLE"]) == 3)
def test_get_classifications_similar_failures(self, *args: Any) -> None:
pr = GitHubPR("pytorch", "pytorch", 109750)
checks = pr.get_checkrun_conclusions()
checks = get_classifications(
pr.pr_num,
pr.project,
checks,
[],
)
pending, failed, ignorable = categorize_checks(checks, list(checks.keys()))
self.assertTrue(len(pending) == 0)
self.assertTrue(len(failed) == 0)
self.assertTrue(len(ignorable["FLAKY"]) == 1)
self.assertTrue(len(ignorable["BROKEN_TRUNK"]) == 1)
def test_get_classifications_unstable(self, *args: Any) -> None:
pr = GitHubPR("pytorch", "pytorch", 104312)
checks = pr.get_checkrun_conclusions()
checks = get_classifications(
checks, pr.last_commit()["oid"], pr.get_merge_base(), [], []
pr.pr_num,
pr.project,
checks,
[],
)
workflow_name = "linux-bionic-cuda12.1-py3.10-gcc9-bazel-test"
job_name = "build-and-test (default, 1, 1, linux.4xlarge.nvidia.gpu, unstable)"
@ -706,19 +700,6 @@ class TestBypassFailures(TestCase):
self.assertTrue(len(failed) == 0)
self.assertTrue(len(ignorable["UNSTABLE"]) == 1)
def test_get_classifications_pending_unstable(self, *args: Any) -> None:
pr = GitHubPR("pytorch", "pytorch", 105998)
checks = pr.get_checkrun_conclusions()
checks = get_classifications(
checks, pr.last_commit()["oid"], pr.get_merge_base(), [], []
)
pending, failed, ignorable = categorize_checks(
checks, list(checks.keys()), ok_failed_checks_threshold=1
)
self.assertTrue(len(pending) == 0)
self.assertTrue(len(failed) == 3)
self.assertTrue(len(ignorable["UNSTABLE"]) == 3)
def test_get_classifications_broken_trunk(self, *args: Any) -> None:
# The mock merge base is the actual value returned by gh_fetch_merge_base
test_cases = [
@ -726,13 +707,13 @@ class TestBypassFailures(TestCase):
# This PR had one broken trunk failure but it was run on a different shard
# than the one on the base commit. This should still count as broken trunk
"pr_num": 104214,
"mock_merge_base": "436d035dc74db9c703297a62163b0cad0c546665",
"related_failure_count": 0,
"unrelated_failure_count": 1,
},
{
# This PR had one broken trunk failure and it used ghstack
"pr_num": 105145,
"mock_merge_base": "194fe1d12f9860734cc28ed21bdabda2fbb06336",
"related_failure_count": 0,
"unrelated_failure_count": 1,
},
{
@ -741,112 +722,81 @@ class TestBypassFailures(TestCase):
# keep the failure record from the merge base so that it can
# be used to detect broken trunk
"pr_num": 107160,
"mock_merge_base": "a5d841ef01e615e2a654fb12cf0cd08697d12ccf",
"related_failure_count": 0,
"unrelated_failure_count": 4,
},
{
# This PR used Dr.CI broken trunk classification
"pr_num": 111253,
"related_failure_count": 1,
"unrelated_failure_count": 2,
},
]
for case in test_cases:
pr_num = case["pr_num"]
mock_merge_base = case["mock_merge_base"]
related_failure_count = case["related_failure_count"]
unrelated_failure_count = case["unrelated_failure_count"]
pr = GitHubPR("pytorch", "pytorch", cast(int, pr_num))
with mock.patch(
"trymerge.gh_fetch_merge_base", return_value=mock_merge_base
) as mocked_gh_fetch_merge_base:
checks = pr.get_checkrun_conclusions()
checks = get_classifications(
checks, pr.last_commit()["oid"], pr.get_merge_base(), [], []
)
pr = GitHubPR("pytorch", "pytorch", pr_num)
checks = pr.get_checkrun_conclusions()
checks = get_classifications(
pr.pr_num,
pr.project,
checks,
[],
)
pending, failed, _ = categorize_checks(checks, list(checks.keys()))
self.assertTrue(len(pending) == 0)
self.assertTrue(len(failed) == 0)
pending, failed, _ = categorize_checks(checks, list(checks.keys()))
self.assertTrue(len(pending) == 0)
self.assertTrue(len(failed) == related_failure_count)
# When the ok_failed_checks_threshold is set to 0, the broken trunk failure
# won't be ignored
pending, failed, _ = categorize_checks(
checks, list(checks.keys()), ok_failed_checks_threshold=0
)
self.assertTrue(len(pending) == 0)
self.assertTrue(len(failed) == unrelated_failure_count)
# When the ok_failed_checks_threshold is set to 0, the broken trunk failure
# won't be ignored
pending, failed, _ = categorize_checks(
checks, list(checks.keys()), ok_failed_checks_threshold=0
)
self.assertTrue(len(pending) == 0)
self.assertTrue(
len(failed) == unrelated_failure_count + related_failure_count
)
def test_ignore_current(self, *args: Any) -> None:
# Test various interactions of the failure classifier to ensure that ignore
# current checks takes place after other classifications: flaky, unstable,
# or broken trunk. Only actual new failures should be kept in the list of
# ignore current checks to use to record force merge with actual failures
flaky_rules = [
FlakyRule("distributed", ["##\\[error\\]The operation was canceled."])
]
flaky = (
"pull / linux-focal-py3.7-gcc7 / test (distributed, 1, 2, linux.2xlarge)"
)
flaky = "pull / linux-focal-cuda11.8-py3.10-gcc9 / test (distributed, 1, 3, linux.8xlarge.nvidia.gpu)"
broken_trunk = (
"pull / linux-bionic-py3_7-clang8-xla / test (xla, 1, 1, linux.4xlarge)"
"pull / linux-focal-py3.11-clang10 / test (dynamo, 1, 2, linux.2xlarge)"
)
pr = GitHubPR("pytorch", "pytorch", 92863)
pr = GitHubPR("pytorch", "pytorch", 109584)
checks = pr.get_checkrun_conclusions()
# No broken trunk or flaky rules, then all failures are ignored when ic is used
checks = get_classifications(
checks, pr.last_commit()["oid"], None, [], [broken_trunk, flaky]
)
self.assertTrue(checks[flaky].classification == "IGNORE_CURRENT_CHECK")
self.assertTrue(checks[broken_trunk].classification == "IGNORE_CURRENT_CHECK")
_, failed, ignorable = categorize_checks(
checks, list(checks.keys()), ok_failed_checks_threshold=2
)
self.assertTrue(len(failed) == 0)
self.assertTrue(len(ignorable["IGNORE_CURRENT_CHECK"]) == 2)
self.assertTrue(len(ignorable["FLAKY"]) == 0)
self.assertTrue(len(ignorable["BROKEN_TRUNK"]) == 0)
# Known flaky failure takes precedence over ignore current (need to set the
# merge base here to get the results from Rockset, and that categorizes the
# broken trunk failure too
checks = get_classifications(
pr.pr_num,
pr.project,
checks,
pr.last_commit()["oid"],
pr.get_merge_base(),
flaky_rules,
[broken_trunk, flaky],
)
self.assertTrue(checks[flaky].classification == "FLAKY")
self.assertTrue(checks[broken_trunk].classification == "BROKEN_TRUNK")
_, failed, ignorable = categorize_checks(
checks, list(checks.keys()), ok_failed_checks_threshold=2
)
_, failed, ignorable = categorize_checks(checks, list(checks.keys()))
self.assertTrue(len(failed) == 0)
self.assertTrue(len(ignorable["IGNORE_CURRENT_CHECK"]) == 0)
self.assertTrue(len(ignorable["FLAKY"]) == 1)
self.assertTrue(len(ignorable["BROKEN_TRUNK"]) == 1)
self.assertTrue(len(ignorable["FLAKY"]) == 4)
self.assertTrue(len(ignorable["BROKEN_TRUNK"]) == 2)
# Broken trunk takes precedence over ignore current (no flaky rule is set here)
checks = get_classifications(
checks,
pr.last_commit()["oid"],
pr.get_merge_base(),
[],
[broken_trunk, flaky],
)
self.assertTrue(checks[flaky].classification == "IGNORE_CURRENT_CHECK")
self.assertTrue(checks[broken_trunk].classification == "BROKEN_TRUNK")
_, failed, ignorable = categorize_checks(
checks, list(checks.keys()), ok_failed_checks_threshold=2
)
self.assertTrue(len(failed) == 0)
self.assertTrue(len(ignorable["IGNORE_CURRENT_CHECK"]) == 1)
self.assertTrue(len(ignorable["FLAKY"]) == 0)
self.assertTrue(len(ignorable["BROKEN_TRUNK"]) == 1)
@mock.patch("trymerge.read_flaky_rules", side_effect=xla_is_flaky_rules)
@mock.patch("trymerge.read_merge_rules", side_effect=xla_merge_rules)
def test_dont_ignore_flaky_failures(self, *args: Any) -> None:
"""Regression test for https://github.com/pytorch/test-infra/issues/4126"""
pr = GitHubPR("pytorch", "pytorch", 100369)
"""
Regression test for https://github.com/pytorch/test-infra/issues/4126
"""
pr = GitHubPR("pytorch", "pytorch", 105312)
repo = DummyGitRepo()
# Check that failure is classified as flaky but still raises exception
with warnings.catch_warnings(record=True) as w, self.assertRaises(RuntimeError):
@ -861,14 +811,97 @@ class TestBypassFailures(TestCase):
@mock.patch("trymerge.get_rockset_results", side_effect=mocked_rockset_results)
@mock.patch("trymerge.gh_graphql", side_effect=mocked_gh_graphql)
@mock.patch("trymerge.gh_fetch_merge_base", return_value="")
class TestGitHubPRGhstackDependencies2(TestCase):
@mock.patch("trymerge.get_drci_classifications", return_value={})
class TestBypassFailuresOnSandCastle(TestCase):
def test_get_classifications(self, *args: Any) -> None:
pr = GitHubPR("pytorch", "pytorch", 111467)
checks = pr.get_checkrun_conclusions()
checks = get_classifications(
pr.pr_num,
pr.project,
checks,
[],
)
pending, failed, ignorable = categorize_checks(checks, list(checks.keys()))
self.assertTrue(len(pending) == 0)
self.assertTrue(len(failed) == 0)
self.assertTrue(len(ignorable["FLAKY"]) == 1)
self.assertTrue(len(ignorable["BROKEN_TRUNK"]) == 1)
def test_get_classifications_drci_checkrun_not_found(self, *args: Any) -> None:
pr = GitHubPR("pytorch", "pytorch", 111467)
# No summary
checks = pr.get_checkrun_conclusions()
checks[DRCI_CHECKRUN_NAME] = JobCheckState(
DRCI_CHECKRUN_NAME,
"",
"NEUTRAL",
None,
1,
"",
None,
)
checks = get_classifications(
pr.pr_num,
pr.project,
checks,
[],
)
pending, failed, ignorable = categorize_checks(checks, list(checks.keys()))
self.assertTrue(len(pending) == 0)
self.assertTrue(len(failed) == 2)
# Empty summary
checks = pr.get_checkrun_conclusions()
checks[DRCI_CHECKRUN_NAME] = JobCheckState(
DRCI_CHECKRUN_NAME,
"",
"NEUTRAL",
None,
1,
"",
"",
)
checks = get_classifications(
pr.pr_num,
pr.project,
checks,
[],
)
pending, failed, ignorable = categorize_checks(checks, list(checks.keys()))
self.assertTrue(len(pending) == 0)
self.assertTrue(len(failed) == 2)
# No Dr.CI checkrun
checks = pr.get_checkrun_conclusions()
del checks[DRCI_CHECKRUN_NAME]
checks = get_classifications(
pr.pr_num,
pr.project,
checks,
[],
)
pending, failed, ignorable = categorize_checks(checks, list(checks.keys()))
self.assertTrue(len(pending) == 0)
self.assertTrue(len(failed) == 2)
@mock.patch("trymerge.get_rockset_results", side_effect=mocked_rockset_results)
@mock.patch("trymerge.gh_graphql", side_effect=mocked_gh_graphql)
@mock.patch("trymerge.gh_fetch_merge_base", return_value="")
@mock.patch(
"trymerge.get_drci_classifications", side_effect=mocked_drci_classifications
)
class TestGitHubPRGhstackDependencies(TestCase):
def test_pr_dependencies(self, *args: Any) -> None:
pr = GitHubPR("pytorch", "pytorch", 106068)
msg = pr.gen_commit_message(filter_ghstack=True)
assert msg == (
"[FSDP] Break up `_post_backward_hook` into smaller funcs (#106068)\n\n\nDifferential Revision: ["
"D47852461](https://our.internmc.facebook.com/intern/diff/D47852461)\nPull Request resolved: "
"https://github.com/pytorch/pytorch/pull/106068\nApproved by: \n"
self.assertEqual(
msg,
f"{pr.get_title()} (#106068)\n\n{RE_GHSTACK_DESC.sub('', pr.get_body())}\n"
"Pull Request resolved: https://github.com/pytorch/pytorch/pull/106068\n"
"Approved by: https://github.com/ezyang, https://github.com/fegin\n",
)
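The commit message is now derived from the live PR title and body, with the ghstack stack listing stripped via RE_GHSTACK_DESC. A usage sketch; the sample body is hypothetical, and the exact pattern lives in trymerge.py:

from trymerge import RE_GHSTACK_DESC

body = (
    "Stack from [ghstack](https://github.com/ezyang/ghstack):\n"
    "* #106068\n"
    "* #106034\n"
    "Actual description of the change."
)
# Whatever the regex matches (the stack listing block) is removed,
# leaving only the real description.
print(RE_GHSTACK_DESC.sub("", body))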
def test_pr_dependencies_ghstack(self, *args: Any) -> None:
@ -876,13 +909,13 @@ class TestGitHubPRGhstackDependencies2(TestCase):
pr1 = GitHubPR("pytorch", "pytorch", 106033)
pr2 = GitHubPR("pytorch", "pytorch", 106034)
pr = GitHubPR("pytorch", "pytorch", 106068)
msg = pr.gen_commit_message(filter_ghstack=True, ghstack_deps=[pr0, pr1, pr2])
assert msg == (
"[FSDP] Break up `_post_backward_hook` into smaller funcs (#106068)\n\n\nDifferential Revision: ["
"D47852461](https://our.internmc.facebook.com/intern/diff/D47852461)\nPull Request resolved: "
"https://github.com/pytorch/pytorch/pull/106068\nApproved by: \n"
"ghstack dependencies: #106032, #106033, #106034\n"
self.assertEqual(
msg,
f"{pr.get_title()} (#106068)\n\n{RE_GHSTACK_DESC.sub('', pr.get_body())}\n"
"Pull Request resolved: https://github.com/pytorch/pytorch/pull/106068\n"
"Approved by: https://github.com/ezyang, https://github.com/fegin\n"
"ghstack dependencies: #106032, #106033, #106034\n",
)
@skip(
@ -931,7 +964,7 @@ class TestGitHubPRGhstackDependencies2(TestCase):
mock_repo.cherry_pick.assert_any_call("rev2")
mock_repo.cherry_pick.assert_any_call("rev123")
assert mock.call("rev1") not in mock_repo.cherry_pick.call_args_list
self.assertTrue(mock.call("rev1") not in mock_repo.cherry_pick.call_args_list)
# Verify the first call
message = mock_repo.amend_commit_message.call_args_list[0].args[0]
@ -944,8 +977,8 @@ class TestGitHubPRGhstackDependencies2(TestCase):
"dependencies: #106032, #106033\n"
)
assert message.startswith(prefix)
assert message.endswith(suffix)
self.assertTrue(message.startswith(prefix))
self.assertTrue(message.endswith(suffix))
# Verify the second call
mock_repo.amend_commit_message.assert_any_call(

View File

@ -30,6 +30,7 @@ from github_utils import (
gh_fetch_url,
gh_post_commit_comment,
gh_post_pr_comment,
gh_update_pr_state,
GitHubComment,
)
@ -61,6 +62,7 @@ class JobCheckState(NamedTuple):
classification: Optional[str]
job_id: Optional[int]
title: Optional[str]
summary: Optional[str]
JobNameToStateDict = Dict[str, JobCheckState]
@ -74,29 +76,6 @@ class WorkflowCheckState:
self.jobs: JobNameToStateDict = {}
class FlakyRule:
def __init__(self, name: str, captures: List[str]):
self.name = re.compile(name)
self.captures = [re.compile(r) for r in captures]
def matches(self, job: Optional[Dict[str, Any]]) -> bool:
return (
job is not None
and self.name.search(job.get("name", "")) is not None
and job.get("failure_captures") is not None
and all(
any(
r.search(capture) is not None
for capture in job.get("failure_captures", [])
)
for r in self.captures
)
)
def __repr__(self) -> str:
return f"FlakyRule[name='{self.name}', captures={self.captures}]"
GH_PR_REVIEWS_FRAGMENT = """
fragment PRReviews on PullRequestReviewConnection {
nodes {
@ -141,6 +120,7 @@ fragment PRCheckSuites on CheckSuiteConnection {
detailsUrl
databaseId
title
summary
}
pageInfo {
endCursor
@ -332,6 +312,7 @@ query ($owner: String!, $name: String!, $number: Int!, $cs_cursor: String, $cr_c
detailsUrl
databaseId
title
summary
}
pageInfo {
endCursor
@ -456,6 +437,7 @@ MERGE_RULE_PATH = Path(".github") / "merge_rules.yaml"
ROCKSET_MERGES_COLLECTION = "merges"
ROCKSET_MERGES_WORKSPACE = "commons"
REMOTE_MAIN_BRANCH = "origin/main"
DRCI_CHECKRUN_NAME = "Dr.CI"
INTERNAL_CHANGES_CHECKRUN_NAME = "Meta Internal-Only Changes Check"
HAS_NO_CONNECTED_DIFF_TITLE = (
"There is no internal Diff connected, this can be merged now"
@ -569,6 +551,7 @@ def add_workflow_conclusions(
classification=None,
job_id=checkrun_node["databaseId"],
title=checkrun_node["title"],
summary=checkrun_node["summary"],
)
if bool(checkruns["pageInfo"]["hasNextPage"]):
@ -599,6 +582,7 @@ def add_workflow_conclusions(
classification=None,
job_id=None,
title=None,
summary=None,
)
for job_name, job in no_workflow_obj.jobs.items():
res[job_name] = job
@ -924,6 +908,7 @@ class GitHubPR:
classification=None,
job_id=None,
title=None,
summary=None,
)
return self.conclusions
@ -1261,13 +1246,6 @@ def read_merge_rules(
return [MergeRule(**x) for x in rc]
@lru_cache(maxsize=None)
def read_flaky_rules() -> List[FlakyRule]:
# NOTE: This is currently hardcoded, can be extended to do per repo rules
FLAKY_RULES_URL = "https://raw.githubusercontent.com/pytorch/test-infra/generated-stats/stats/flaky-rules.json"
return _get_flaky_rules(FLAKY_RULES_URL)
def find_matching_merge_rule(
pr: GitHubPR,
repo: Optional[GitRepo] = None,
@ -1298,25 +1276,15 @@ def find_matching_merge_rule(
reject_reason = f"No rule found to match PR. Please [report]{issue_link} this issue to DevX team."
rules = read_merge_rules(repo, pr.org, pr.project)
flaky_rules = read_flaky_rules()
if not rules:
reject_reason = f"Rejecting the merge as no rules are defined for the repository in {MERGE_RULE_PATH}"
raise RuntimeError(reject_reason)
checks = pr.get_checkrun_conclusions()
base_rev = None
try:
# is allowed to fail if git is not available
base_rev = pr.get_merge_base()
except Exception as e:
print(
f"Failed fetching base git revision for {pr.pr_num}. Skipping additional classifications.\n"
f"{type(e)}\n{e}"
)
checks = get_classifications(
pr.pr_num,
pr.project,
checks,
pr.last_commit()["oid"],
base_rev,
flaky_rules,
ignore_current_checks=ignore_current_checks,
)
@ -1467,11 +1435,6 @@ def checks_to_markdown_bullets(
]
@retries_decorator(rc=[])
def _get_flaky_rules(url: str) -> List[FlakyRule]:
return [FlakyRule(**rule) for rule in gh_fetch_json_list(url)]
@retries_decorator()
def save_merge_record(
collection: str,
@ -1575,6 +1538,27 @@ where
return []
@retries_decorator()
def get_drci_classifications(pr_num: int, project: str = "pytorch") -> Any:
"""
Query HUD API to find similar failures to decide if they are flaky
"""
# NB: This doesn't work internally atm because this requires making an
# external API call to HUD
failures = gh_fetch_url(
f"https://hud.pytorch.org/api/drci/drci?prNumber={pr_num}",
data=f"repo={project}",
headers={
"Authorization": os.getenv("DRCI_BOT_KEY", ""),
"Accept": "application/vnd.github.v3+json",
},
method="POST",
reader=json.load,
)
return failures.get(str(pr_num), {}) if failures else {}
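The HUD response is keyed by PR number, and each entry is expected to carry lists of failing jobs grouped by classification, which the predicates further down consume. An illustrative, hypothetical payload shape:

# Hypothetical shape of get_drci_classifications(pr_num=109584, project="pytorch");
# only the "name" field is relied upon by the code below.
drci_classifications = {
    "FLAKY": [
        {"name": "trunk / win-vs2019-cpu-py3 / test (default, 2, 3, windows.4xlarge.nonephemeral)"}
    ],
    "BROKEN_TRUNK": [
        {"name": "pull / linux-focal-py3.11-clang10 / test (dynamo, 1, 2, linux.2xlarge)"}
    ],
    "FAILED": [],
}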
REMOVE_JOB_NAME_SUFFIX_REGEX = re.compile(r", [0-9]+, [0-9]+, .+\)$")
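The regex above strips the shard and runner suffix from a job name so that checks can be matched across shards. The one-line body below is an assumption consistent with the signature shown in the next hunk:

import re

REMOVE_JOB_NAME_SUFFIX_REGEX = re.compile(r", [0-9]+, [0-9]+, .+\)$")

def remove_job_name_suffix(name: str, replacement: str = ")") -> str:
    return re.sub(REMOVE_JOB_NAME_SUFFIX_REGEX, replacement, name)

assert (
    remove_job_name_suffix("test (default, 2, 3, windows.4xlarge.nonephemeral)")
    == "test (default)"
)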
@ -1583,78 +1567,86 @@ def remove_job_name_suffix(name: str, replacement: str = ")") -> str:
def is_broken_trunk(
head_job: Optional[Dict[str, Any]], base_jobs: Optional[Dict[str, Dict[str, Any]]]
name: str,
drci_classifications: Any,
) -> bool:
if not head_job or not base_jobs:
if not name or not drci_classifications:
return False
# Consult the list of broken trunk failures from Dr.CI
return any(
head_job["conclusion"] == base_job["conclusion"]
and head_job["failure_captures"] == base_job["failure_captures"]
for base_job in base_jobs.values()
name == broken_trunk["name"]
for broken_trunk in drci_classifications.get("BROKEN_TRUNK", [])
)
def is_flaky(
name: str,
drci_classifications: Any,
) -> bool:
if not name or not drci_classifications:
return False
# Consult the list of flaky failures from Dr.CI
return any(name == flaky["name"] for flaky in drci_classifications.get("FLAKY", []))
def is_invalid_cancel(
name: str,
conclusion: Optional[str],
drci_classifications: Any,
) -> bool:
"""
After https://github.com/pytorch/test-infra/pull/4579, invalid cancelled
signals have been removed from HUD and Dr.CI. The same needs to be done
here for consistency
"""
if (
not name
or not drci_classifications
or not conclusion
or conclusion.upper() != "CANCELLED"
):
return False
# If a job is cancelled and not listed as a failure by Dr.CI, it's an
# invalid signal and can be ignored
return all(
name != failure["name"] for failure in drci_classifications.get("FAILED", [])
)
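Taken together, the three predicates reduce to name lookups in the Dr.CI payload. A quick sketch against a hand-written classification dict:

drci = {
    "FLAKY": [{"name": "pull / a / test (distributed, 1, 2, linux.2xlarge)"}],
    "BROKEN_TRUNK": [{"name": "pull / b / test (default, 1, 1, linux.4xlarge)"}],
    "FAILED": [{"name": "pull / c / build"}],
}

assert is_flaky("pull / a / test (distributed, 1, 2, linux.2xlarge)", drci)
assert is_broken_trunk("pull / b / test (default, 1, 1, linux.4xlarge)", drci)
# A cancelled job that Dr.CI does not list as failed is an invalid signal.
assert is_invalid_cancel("pull / d / test", "CANCELLED", drci)
assert not is_invalid_cancel("pull / c / build", "CANCELLED", drci)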
def get_classifications(
pr_num: int,
project: str,
checks: Dict[str, JobCheckState],
head_sha: str,
merge_base: Optional[str],
flaky_rules: List[FlakyRule],
ignore_current_checks: Optional[List[str]],
) -> Dict[str, JobCheckState]:
# Group by job name without shard id and suffix to correctly identify broken
# trunk failures, i.e. linux-bionic-cuda12.1-py3.10-gcc9-sm86 / test (default)
head_sha_jobs: Dict[str, Dict[str, Dict[str, Any]]] = defaultdict(dict)
merge_base_jobs: Dict[str, Dict[str, Dict[str, Any]]] = defaultdict(dict)
# Get the failure classification from Dr.CI, which is the source of truth
# going forward. It's preferable to try calling Dr.CI API directly first
# to get the latest results as well as update Dr.CI PR comment
drci_classifications = get_drci_classifications(pr_num=pr_num, project=project)
print(f"From Dr.CI API: {json.dumps(drci_classifications)}")
if merge_base is not None:
def insert(
d: Dict[str, Dict[str, Dict[str, Any]]],
key: str,
val: Dict[str, Any],
overwrite_failed_run_attempt: bool,
) -> None:
key_no_suffix = remove_job_name_suffix(key)
if key not in d[key_no_suffix]:
d[key_no_suffix][key] = val
return
# When overwrite_failed_run_attempt is set to True, always overwrite
# the job with the result from the latest attempt. This option is for
# jobs from the pull request head_sha where the latest retry is used
# when merging
#
# When overwrite_failed_run_attempt is False, only overwrite the job
# with the result from the latest attempt if the latest retry failed.
# This option is for jobs from the merge_base where we want to record
# failures for broken trunk
if d[key_no_suffix][key]["id"] < val["id"] and (
overwrite_failed_run_attempt or not is_passing_status(val["conclusion"])
):
d[key_no_suffix][key] = val
rockset_results = get_rockset_results(head_sha, merge_base)
for rockset_result in rockset_results:
name = f"{rockset_result['workflow_name']} / {rockset_result['name']}"
if rockset_result["head_sha"] == head_sha:
insert(
head_sha_jobs,
name,
rockset_result,
overwrite_failed_run_attempt=True,
)
else:
insert(
merge_base_jobs,
name,
rockset_result,
overwrite_failed_run_attempt=False,
)
# NB: if the latest results from Dr.CI are not available, i.e. when calling from
# SandCastle, we fall back to any results we can find on the Dr.CI check run summary
if (
not drci_classifications
and DRCI_CHECKRUN_NAME in checks
and checks[DRCI_CHECKRUN_NAME]
and checks[DRCI_CHECKRUN_NAME].summary
):
drci_summary = checks[DRCI_CHECKRUN_NAME].summary
try:
print(f"From Dr.CI checkrun summary: {drci_summary}")
drci_classifications = json.loads(str(drci_summary))
except json.JSONDecodeError as error:
warn("Invalid Dr.CI checkrun summary")
drci_classifications = {}
checks_with_classifications = checks.copy()
for name, check in checks.items():
if check.status == "SUCCESS":
if check.status == "SUCCESS" or check.status == "NEUTRAL":
continue
if "unstable" in name:
@ -1665,13 +1657,13 @@ def get_classifications(
"UNSTABLE",
check.job_id,
check.title,
check.summary,
)
continue
name_no_suffix = remove_job_name_suffix(name)
head_sha_job = head_sha_jobs.get(name_no_suffix, {}).get(name)
if is_broken_trunk(head_sha_job, merge_base_jobs.get(name_no_suffix)):
# NB: It's important to note that when it comes to ghstack and broken trunk classification,
# Dr.CI uses the base of the whole stack
if is_broken_trunk(name, drci_classifications):
checks_with_classifications[name] = JobCheckState(
check.name,
check.url,
@ -1679,12 +1671,34 @@ def get_classifications(
"BROKEN_TRUNK",
check.job_id,
check.title,
check.summary,
)
continue
elif any(rule.matches(head_sha_job) for rule in flaky_rules):
elif is_flaky(name, drci_classifications):
checks_with_classifications[name] = JobCheckState(
check.name, check.url, check.status, "FLAKY", check.job_id, check.title
check.name,
check.url,
check.status,
"FLAKY",
check.job_id,
check.title,
check.summary,
)
continue
elif is_invalid_cancel(name, check.status, drci_classifications):
# NB: Create a new category here for invalid cancelled signals because
# there are usually many of them when they happen. So, they shouldn't
# be counted toward ignorable failures threshold
checks_with_classifications[name] = JobCheckState(
check.name,
check.url,
check.status,
"INVALID_CANCEL",
check.job_id,
check.title,
check.summary,
)
continue
@ -1696,6 +1710,7 @@ def get_classifications(
"IGNORE_CURRENT_CHECK",
check.job_id,
check.title,
check.summary,
)
return checks_with_classifications
@ -1789,6 +1804,7 @@ def try_revert(
if not dry_run:
pr.add_numbered_label("reverted")
gh_post_commit_comment(pr.org, pr.project, commit_sha, revert_msg)
gh_update_pr_state(pr.org, pr.project, pr.pr_num)
def prefix_with_github_url(suffix_str: str) -> str:
@ -1864,6 +1880,8 @@ def categorize_checks(
# ignored anyway. This is useful so we do not need to wait for scarce resources
# like ROCm, which is also frequently in unstable mode
pending_checks.append((checkname, url, job_id))
elif classification == "INVALID_CANCEL":
continue
elif not is_passing_status(check_runs[checkname].status):
target = (
ignorable_failed_checks[classification]
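A simplified model of the threshold semantics exercised by the tests above (not the real implementation): FLAKY and BROKEN_TRUNK failures keep their classification, but once their combined count exceeds ok_failed_checks_threshold they are surfaced as failed as well; the default of -1 ignores them all.

from typing import Dict, List, Tuple

def categorize_simplified(
    classifications: Dict[str, str],  # check name -> "FLAKY", "BROKEN_TRUNK", or ""
    ok_failed_checks_threshold: int = -1,
) -> Tuple[List[str], Dict[str, List[str]]]:
    ignorable: Dict[str, List[str]] = {"FLAKY": [], "BROKEN_TRUNK": []}
    failed: List[str] = []
    for name, cls in classifications.items():
        if cls in ignorable:
            ignorable[cls].append(name)
        else:
            failed.append(name)
    ok_count = sum(len(v) for v in ignorable.values())
    if 0 <= ok_failed_checks_threshold < ok_count:
        # Over the threshold: "ok" failures count as real failures too,
        # though they keep their classification.
        failed += [n for names in ignorable.values() for n in names]
    return failed, ignorable

failed, ignorable = categorize_simplified(
    {"a": "FLAKY", "b": "FLAKY", "c": "BROKEN_TRUNK"}, ok_failed_checks_threshold=1
)
assert len(failed) == 3 and len(ignorable["FLAKY"]) == 2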
@ -1909,7 +1927,8 @@ def merge(
ignore_current: bool = False,
) -> None:
initial_commit_sha = pr.last_commit()["oid"]
print(f"Attempting merge of {initial_commit_sha}")
pr_link = f"https://github.com/{pr.org}/{pr.project}/pull/{pr.pr_num}"
print(f"Attempting merge of {initial_commit_sha} ({pr_link})")
if MERGE_IN_PROGRESS_LABEL not in pr.get_labels():
gh_add_labels(pr.org, pr.project, pr.pr_num, [MERGE_IN_PROGRESS_LABEL])
@ -1974,7 +1993,6 @@ def merge(
start_time = time.time()
last_exception = ""
elapsed_time = 0.0
flaky_rules = read_flaky_rules()
ignore_current_checks = [
x[0] for x in ignore_current_checks_info
] # convert to List[str] for convenience
@ -2007,10 +2025,9 @@ def merge(
checks = pr.get_checkrun_conclusions()
checks = get_classifications(
pr.pr_num,
pr.project,
checks,
pr.last_commit()["oid"],
pr.get_merge_base(),
flaky_rules,
ignore_current_checks=ignore_current_checks,
)
pending, failing, _ = categorize_checks(

View File

@ -51,7 +51,7 @@ def post_already_uptodate(
def rebase_onto(
pr: GitHubPR, repo: GitRepo, onto_branch: str, dry_run: bool = False
) -> None:
) -> bool:
branch = f"pull/{pr.pr_num}/head"
remote_url = f"https://github.com/{pr.info['headRepository']['nameWithOwner']}.git"
refspec = f"{branch}:{pr.head_ref()}"
@ -68,6 +68,7 @@ def rebase_onto(
push_result = repo._run_git("push", "-f", remote_url, refspec)
if "Everything up-to-date" in push_result:
post_already_uptodate(pr, repo, onto_branch, dry_run)
return False
else:
gh_post_comment(
pr.org,
@ -78,18 +79,21 @@ def rebase_onto(
+ "git pull --rebase`)",
dry_run=dry_run,
)
return True
def rebase_ghstack_onto(
pr: GitHubPR, repo: GitRepo, onto_branch: str, dry_run: bool = False
) -> None:
) -> bool:
if (
subprocess.run(
[sys.executable, "-m", "ghstack", "--help"], capture_output=True
[sys.executable, "-m", "ghstack", "--help"],
capture_output=True,
check=False,
).returncode
!= 0
):
subprocess.run([sys.executable, "-m", "pip", "install", "ghstack"])
subprocess.run([sys.executable, "-m", "pip", "install", "ghstack"], check=True)
orig_ref = f"{re.sub(r'/head$', '/orig', pr.head_ref())}"
repo.fetch(orig_ref, orig_ref)
@ -115,8 +119,9 @@ def rebase_ghstack_onto(
if dry_run:
print("Don't know how to dry-run ghstack")
return False
else:
ghstack_result = subprocess.run(["ghstack"], capture_output=True)
ghstack_result = subprocess.run(["ghstack"], capture_output=True, check=True)
push_result = ghstack_result.stdout.decode("utf-8")
print(push_result)
if ghstack_result.returncode != 0:
@ -166,6 +171,8 @@ def rebase_ghstack_onto(
in push_result
):
post_already_uptodate(pr, repo, onto_branch, dry_run)
return False
return True
def additional_rebase_failure_info(e: Exception) -> str:
@ -222,9 +229,10 @@ def main() -> None:
try:
if pr.is_ghstack_pr():
with git_config_guard(repo):
rebase_ghstack_onto(pr, repo, onto_branch, dry_run=args.dry_run)
rc = rebase_ghstack_onto(pr, repo, onto_branch, dry_run=args.dry_run)
else:
rebase_onto(pr, repo, onto_branch, dry_run=args.dry_run)
rc = rebase_onto(pr, repo, onto_branch, dry_run=args.dry_run)
sys.exit(0 if rc else 1)
except Exception as e:
msg = f"Rebase failed due to {e}"

View File

@ -114,7 +114,8 @@ def main() -> None:
# query to see if a pr already exists
params = {
"q": f"is:pr is:open in:title author:pytorchmergebot repo:{OWNER}/{REPO} {args.repo_name} hash update"
"q": f"is:pr is:open in:title author:pytorchupdatebot repo:{OWNER}/{REPO} {args.repo_name} hash update",
"sort": "created",
}
response = git_api("/search/issues", params)
if response["total_count"] != 0:

View File

@ -36,9 +36,10 @@ concurrency:
{%- macro setup_ec2_windows() -%}
!{{ display_ec2_information() }}
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: seemethere/add-github-ssh-key@v1
uses: pytorch/test-infra/.github/actions/setup-ssh@main
continue-on-error: true
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
github-secret: ${{ secrets.GITHUB_TOKEN }}
# Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560
- name: Enable long paths on Windows
shell: powershell

View File

@ -55,12 +55,14 @@ jobs:
uses: ./.github/workflows/_binary-build-linux.yml
with:!{{ upload.binary_env_as_input(config) }}
{%- if "aarch64" in build_environment %}
runs_on: linux.t4g.2xlarge
runs_on: linux.arm64.2xlarge
ALPINE_IMAGE: "arm64v8/alpine"
{%- elif "conda" in build_environment and config["gpu_arch_type"] == "cuda" %}
runs_on: linux.24xlarge
{%- endif %}
build_name: !{{ config["build_name"] }}
build_environment: !{{ build_environment }}
{%- if config.pytorch_extra_install_requirements is defined %}
{%- if config.pytorch_extra_install_requirements is defined and config.pytorch_extra_install_requirements|d('')|length > 0 %}
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: !{{ config.pytorch_extra_install_requirements }}
{%- endif %}
secrets:
@ -74,7 +76,7 @@ jobs:
build_name: !{{ config["build_name"] }}
build_environment: !{{ build_environment }}
{%- if "aarch64" in build_environment %}
runs_on: linux.t4g.2xlarge
runs_on: linux.arm64.2xlarge
ALPINE_IMAGE: "arm64v8/alpine"
{%- elif config["gpu_arch_type"] == "rocm" %}
runs_on: linux.rocm.gpu
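The added |d('')|length > 0 guard emits the variable only when pytorch_extra_install_requirements is both defined and non-empty; is defined alone would pass an empty string through. A quick check of that behavior with stock Jinja2 delimiters (the templates themselves use customized ones):

from jinja2 import Template

tmpl = Template(
    "{%- if reqs is defined and reqs|d('')|length > 0 %}"
    "PYTORCH_EXTRA_INSTALL_REQUIREMENTS: {{ reqs }}"
    "{%- endif %}"
)

assert tmpl.render() == ""                   # undefined: skipped
assert tmpl.render(reqs="") == ""            # defined but empty: skipped
assert "numpy" in tmpl.render(reqs="numpy")  # non-empty: emitted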

View File

@ -58,9 +58,12 @@ jobs:
{%- for config in build_configs %}
!{{ config["build_name"] }}-build:
if: ${{ github.repository_owner == 'pytorch' }}
runs-on: macos-12-xl
runs-on: !{{ macos_runner }}
timeout-minutes: !{{ common.timeout_minutes }}
!{{ upload.binary_env(config, true) }}
{%- if config.pytorch_extra_install_requirements is defined and config.pytorch_extra_install_requirements|d('')|length > 0 %}
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: !{{ config.pytorch_extra_install_requirements }}
{%- endif %}
# For sccache access (only on non-forked PRs)
AWS_ACCESS_KEY_ID: ${{ secrets.MACOS_SCCACHE_S3_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.MACOS_SCCACHE_S3_SECRET_ACCESS_KEY }}
@ -69,11 +72,15 @@ jobs:
- name: Install conda and dependencies
run: |
# Install conda, setup-miniconda messes with the path that messes with the ruby stuff we do later on
curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" https://repo.anaconda.com/miniconda/Miniconda3-py310_23.5.2-0-MacOSX-x86_64.sh
curl --retry 3 --retry-all-errors -o "${RUNNER_TEMP}/conda.sh" "https://repo.anaconda.com/miniconda/Miniconda3-py310_23.5.2-0-MacOSX-$(uname -m).sh"
chmod +x "${RUNNER_TEMP}/conda.sh"
/bin/bash "${RUNNER_TEMP}/conda.sh" -b -p "${RUNNER_TEMP}/anaconda"
echo "${RUNNER_TEMP}/anaconda/bin" >> "${GITHUB_PATH}"
echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}"
if [ -d "/Applications/Xcode_14.3.1.app" ]; then
echo "DEVELOPER_DIR=/Applications/Xcode_14.3.1.app/Contents/Developer" >> "${GITHUB_ENV}"
elif [ -d "/Applications/Xcode_13.3.1.app" ]; then
echo "DEVELOPER_DIR=/Applications/Xcode_13.3.1.app/Contents/Developer" >> "${GITHUB_ENV}"
fi
!{{ common.checkout(deep_clone=False, directory="pytorch") }}
!{{ common.checkout(deep_clone=False, directory="builder", repository=common.builder_repo, branch=common.builder_branch) }}
- name: Install sccache (only for non-forked PRs, and pushes to trunk)
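The installer URL now keys on $(uname -m), so the same step works on Intel and Apple Silicon runners. The equivalent selection in Python, with the version string copied from the workflow above:

import platform

MINICONDA_VERSION = "Miniconda3-py310_23.5.2-0"

def miniconda_macos_url() -> str:
    # platform.machine() mirrors `uname -m`: "x86_64" on Intel, "arm64" on Apple Silicon
    return f"https://repo.anaconda.com/miniconda/{MINICONDA_VERSION}-MacOSX-{platform.machine()}.sh"

print(miniconda_macos_url())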

View File

@ -68,5 +68,6 @@
aws-pytorch-uploader-access-key-id: ${{ secrets.AWS_PYTORCH_UPLOADER_ACCESS_KEY_ID }}
aws-pytorch-uploader-secret-access-key: ${{ secrets.AWS_PYTORCH_UPLOADER_SECRET_ACCESS_KEY }}
conda-pytorchbot-token: ${{ secrets.CONDA_PYTORCHBOT_TOKEN }}
conda-pytorchbot-token-test: ${{ secrets.CONDA_PYTORCHBOT_TOKEN_TEST }}
uses: ./.github/workflows/_binary-upload.yml
{%- endmacro %}

View File

@ -59,6 +59,9 @@ jobs:
runs-on: windows.4xlarge.nonephemeral
timeout-minutes: !{{ common.timeout_minutes }}
!{{ upload.binary_env(config, True) }}
{%- if config.pytorch_extra_install_requirements is defined and config.pytorch_extra_install_requirements|d('')|length > 0 %}
PYTORCH_EXTRA_INSTALL_REQUIREMENTS: !{{ config.pytorch_extra_install_requirements }}
{%- endif %}
steps:
!{{ common.setup_ec2_windows() }}
!{{ set_runner_specific_vars() }}

View File

@ -29,6 +29,7 @@ env:
jobs:
filter:
if: github.repository_owner == 'pytorch'
runs-on: [self-hosted, linux.large]
outputs:
test-matrix: ${{ steps.filter.outputs.test-matrix }}

View File

@ -29,6 +29,7 @@ env:
jobs:
filter:
if: github.repository_owner == 'pytorch'
runs-on: [self-hosted, linux.large]
outputs:
test-matrix: ${{ steps.filter.outputs.test-matrix }}
@ -157,7 +158,7 @@ jobs:
# run gradle buildRelease
(echo "./.circleci/scripts/build_android_gradle.sh" | docker exec \
-e BUILD_ENVIRONMENT="pytorch-linux-focal-py3-clang7-android-ndk-r19c-gradle-build" \
-e BUILD_ENVIRONMENT="pytorch-linux-focal-py3-clang9-android-ndk-r21e-gradle-build" \
-e MAX_JOBS="$(nproc --ignore=2)" \
-e AWS_DEFAULT_REGION \
-e PR_NUMBER \

View File

@ -33,6 +33,7 @@ env:
jobs:
filter:
if: github.repository_owner == 'pytorch'
runs-on: [self-hosted, linux.large]
outputs:
test-matrix: ${{ steps.filter.outputs.test-matrix }}
@ -120,8 +121,7 @@ jobs:
GITHUB_RUN_ID: ${{ github.run_id }}
GITHUB_RUN_NUMBER: ${{ github.run_number }}
GITHUB_RUN_ATTEMPT: ${{ github.run_attempt }}
PYTORCH_RETRY_TEST_CASES: 1
PYTORCH_OVERRIDE_FLAKY_SIGNAL: 1
JOB_ID: ${{ steps.get-job-id.outputs.job-id }}
REENABLED_ISSUES: ${{ needs.filter.outputs.reenabled-issues }}
# TODO duplicated
AWS_DEFAULT_REGION: us-east-1
@ -147,6 +147,7 @@ jobs:
-e GITHUB_JOB \
-e GITHUB_RUN_NUMBER \
-e GITHUB_RUN_ATTEMPT \
-e JOB_ID \
-e GIT_DEFAULT_BRANCH="$GIT_DEFAULT_BRANCH" \
-e SHARD_NUMBER \
-e NUM_TEST_SHARDS \
@ -157,8 +158,6 @@ jobs:
-e TORCH_CUDA_ARCH_LIST \
-e OUR_GITHUB_JOB_ID \
-e CUDA_VERSION \
-e PYTORCH_RETRY_TEST_CASES \
-e PYTORCH_OVERRIDE_FLAKY_SIGNAL \
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
@ -184,7 +183,7 @@ jobs:
shell: bash
if: always() && steps.test.conclusion
run: |
cat test/**/*.log || true
cat test/**/*_toprint.log || true
- name: Chown workspace
uses: ./.github/actions/chown-workspace

View File

@ -15,7 +15,7 @@ on:
required: false
default: linux.12xlarge
type: string
description: Hardware to run this "build" job on, linux.12xlarge or linux.t4g.2xlarge.
description: Hardware to run this "build" job on, linux.12xlarge or linux.arm64.2xlarge.
ALPINE_IMAGE:
required: false
type: string
@ -140,6 +140,7 @@ jobs:
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: pytorch/test-infra/.github/actions/setup-ssh@main
continue-on-error: true
with:
github-secret: ${{ secrets.github-token }}
@ -159,10 +160,12 @@ jobs:
- name: Clean workspace
shell: bash
run: |
set -eux
rm -rf "${GITHUB_WORKSPACE}"
mkdir "${GITHUB_WORKSPACE}"
if [[ inputs.build_environment == 'linux-aarch64-binary-manywheel' ]]; then
if [[ ${{ inputs.build_environment }} == 'linux-aarch64-binary-manywheel' ]]; then
rm -rf "${RUNNER_TEMP}/artifacts"
mkdir "${RUNNER_TEMP}/artifacts"
fi

View File

@ -62,7 +62,7 @@ on:
runs_on:
required: true
type: string
description: Hardware to run this job on. Valid values are linux.4xlarge, linux.4xlarge.nvidia.gpu, linux.t4g.2xlarge, and linux.rocm.gpu
description: Hardware to run this job on. Valid values are linux.4xlarge, linux.4xlarge.nvidia.gpu, linux.arm64.2xlarge, and linux.rocm.gpu
secrets:
github-token:
required: true
@ -128,6 +128,7 @@ jobs:
- name: "[FB EMPLOYEES] Enable SSH (Click me for login details)"
uses: pytorch/test-infra/.github/actions/setup-ssh@main
continue-on-error: true
with:
github-secret: ${{ secrets.github-token }}

Some files were not shown because too many files have changed in this diff.