Summary: As above; this also cleans up a number of the build files.
Test Plan:
internal and external CI
Ran `buck2 build fbcode//caffe2:torch` and it succeeded.
Rollback Plan:
Reviewed By: swolchok
Differential Revision: D78016591
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158035
Approved by: https://github.com/swolchok
Summary:
This diff introduces a set of changes that makes it possible for the host to get assertions from CUDA devices. This includes the introduction of
**`CUDA_KERNEL_ASSERT2`**
A preprocessor macro to be used within a CUDA kernel that, upon an assertion failure, writes the assertion message, file, line number, and possibly other information to UVM (Managed memory). Once this is done, the original assertion is triggered, which places the GPU in a Bad State requiring recovery. In my tests, data written to UVM appears there before the GPU reaches the Bad State and is still accessible from the host after the GPU is in this state.
Messages are written to a multi-message buffer which can, in theory, hold many assertion failures. I've done this as a precaution in case there are several, but I don't actually know whether that is possible; a simpler design that holds only a single message may well be all that is necessary.
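As a rough picture of what such a record might look like in managed memory, here is a minimal sketch; the struct names, field names, and sizes are hypothetical and are not the layout used by this diff:
```
// Hypothetical sketch of one assertion record and of the multi-message
// buffer living in UVM. Fixed-size char arrays are used because device code
// cannot allocate host-visible memory at failure time.
#include <cstdint>

struct DeviceAssertionRecord {
  char message[512];           // stringified failing condition
  char file[512];              // __FILE__ at the assertion site
  uint32_t line;               // __LINE__ at the assertion site
  uint32_t caller_generation;  // generation number of the launch (described below)
  uint32_t block_id[3];        // blockIdx of the failing thread
  uint32_t thread_id[3];       // threadIdx of the failing thread
};

struct DeviceAssertionsBuffer {
  uint32_t failure_count;             // atomically incremented by failing threads
  DeviceAssertionRecord records[16];  // holds the first few failures
};
```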
**`TORCH_DSA_KERNEL_ARGS`**
This preprocessor macro is added as an _argument_ to a kernel function's signature. It expands to supply the standardized names of all the arguments needed by `C10_CUDA_COMMUNICATING_KERNEL_ASSERTION` to handle device-side assertions. This includes, e.g., the name of the pointer to the UVM memory the assertion would be written to. This macro abstracts the arguments so there is a single point of change if the system needs to be modified.
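For illustration, an instrumented kernel might look like the sketch below; the kernel itself, the include path, and the exact macro expansion are assumptions rather than code from this diff:
```
// Sketch only: the kernel declares the standardized assertion parameters via
// TORCH_DSA_KERNEL_ARGS instead of spelling them out, and CUDA_KERNEL_ASSERT2
// refers to those names. The launch-side macro (below) supplies the values.
#include <c10/cuda/CUDADeviceAssertion.h>  // assumed header for the macros

__global__ void fill_kernel(float* out, int64_t n, float value,
                            TORCH_DSA_KERNEL_ARGS) {
  const int64_t i = blockIdx.x * static_cast<int64_t>(blockDim.x) + threadIdx.x;
  if (i < n) {
    // On failure, the message, file, and line are copied into the UVM buffer
    // before the ordinary device-side assert fires.
    CUDA_KERNEL_ASSERT2(!isnan(value));
    out[i] = value;
  }
}
```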
**`c10::cuda::get_global_cuda_kernel_launch_registry()`**
This host-side function returns a singleton object that manages the host's part of the device-side assertions. Upon allocation, the singleton allocates sufficient UVM (Managed) memory to hold information about several device-side assertion failures. The singleton also provides methods for getting the current traceback (used to identify when a kernel was launched). To avoid consuming all the host's memory, the singleton stores launches in a circular buffer; a unique "generation number" is used to ensure that kernel launch failures map to their actual launch points (in the case that the circular buffer wraps before the failure is detected).
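A minimal sketch of that bookkeeping, with hypothetical names, might look like this:
```
// Hypothetical sketch of the host-side bookkeeping: a fixed-size circular
// buffer of launch records plus a monotonically increasing generation number
// that lets a device-side failure be matched to the launch that produced it
// even after the buffer has wrapped.
#include <array>
#include <cstdint>
#include <mutex>
#include <string>

struct LaunchRecord {
  uint64_t generation = 0;  // which launch this slot currently describes
  std::string traceback;    // host stack trace captured at launch time
};

class KernelLaunchRegistry {
 public:
  // Record a launch and return its generation number; the same number is
  // passed down to the kernel so a failing assertion can report it back.
  uint64_t insert(std::string traceback) {
    std::lock_guard<std::mutex> guard(mutex_);
    const uint64_t gen = next_generation_++;
    auto& slot = records_[gen % records_.size()];
    slot.generation = gen;
    slot.traceback = std::move(traceback);
    return gen;
  }

  // Look up a launch by generation; returns nullptr if the circular buffer
  // has already wrapped past it.
  const LaunchRecord* find(uint64_t generation) const {
    std::lock_guard<std::mutex> guard(mutex_);
    const auto& slot = records_[generation % records_.size()];
    return slot.generation == generation ? &slot : nullptr;
  }

 private:
  mutable std::mutex mutex_;
  uint64_t next_generation_ = 1;  // 0 marks an empty slot
  std::array<LaunchRecord, 1024> records_{};  // circular buffer of recent launches
};
```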
**`TORCH_DSA_KERNEL_LAUNCH`**
This host-side preprocessor macro replaces the standard
```
kernel_name<<<blocks, threads, shmem, stream>>>(args)
```
invocation with
```
TORCH_DSA_KERNEL_LAUNCH(blocks, threads, shmem, stream, args);
```
Internally, it fetches the UVM (Managed) pointer and generation number from the singleton and appends these to the standard argument list. It also checks that the kernel launch succeeded. This abstraction on kernel launches can be modified to provide additional safety/logging.
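One plausible shape of the wrapper, assuming the kernel is passed as the first macro argument and that the registry exposes hypothetical `insert()` and `assertion_buffer()` helpers:
```
// Hypothetical sketch of what the launch wrapper does; this is not the
// macro's actual body. It records the launch in the registry, appends the
// UVM buffer pointer and generation number to the kernel's argument list,
// and then checks that the launch itself succeeded.
#define TORCH_DSA_KERNEL_LAUNCH(kernel, blocks, threads, shmem, stream, ...)  \
  do {                                                                        \
    auto& registry = c10::cuda::get_global_cuda_kernel_launch_registry();     \
    const uint64_t generation =                                               \
        registry.insert(/* file, function, line, traceback, ... */);          \
    (kernel)<<<(blocks), (threads), (shmem), (stream)>>>(                     \
        __VA_ARGS__, registry.assertion_buffer(), generation);                \
    C10_CUDA_KERNEL_LAUNCH_CHECK();                                           \
  } while (0)
```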
**`c10::cuda::c10_retrieve_device_side_assertion_info`**
This host-side function checks, when called, whether any kernel assertions have occurred. If one has, it raises an exception (see the usage sketch after this list) that reports:
1. Information (file, line number) of what kernel was launched.
2. Information (file, line number, message) about the device-side assertion.
3. Information (file, line number) about where the failure was detected.
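A rough usage sketch follows; in practice this check is wired into `C10_CUDA_CHECK` (see below), so calling it directly should rarely be necessary, and the header path and surrounding handling here are assumptions:
```
#include <cuda_runtime.h>
#include <c10/cuda/CUDADeviceAssertionHost.h>  // assumed header location

void synchronize_and_report() {
  // A device-side assert typically surfaces as a failing CUDA API call.
  const cudaError_t err = cudaDeviceSynchronize();
  if (err != cudaSuccess) {
    // If any kernel wrote an assertion record into the UVM buffer, this
    // raises an exception combining the launch site, the device-side
    // assertion (file, line, message), and the detection site.
    c10::cuda::c10_retrieve_device_side_assertion_info();
  }
}
```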
**Checking for device-side assertions**
Device-side assertions are most likely to be noticed by the host when a CUDA API call such as `cudaDeviceSynchronize` is made and fails with a `cudaError_t` indicating
> CUDA error: device-side assert triggered
Therefore, we rewrite `C10_CUDA_CHECK()` to include a call to `c10_retrieve_device_side_assertion_info()`. To make the code cleaner, most of the logic of `C10_CUDA_CHECK()` is now contained within a new function `c10_cuda_check_implementation()` to which `C10_CUDA_CHECK` passes the preprocessor information about filenames, function names, and line numbers. (In C++20 we can use `std::source_location` to eliminate macros entirely!)
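Roughly, and with a hypothetical parameter list for the implementation function, the macro now just forwards the call-site context:
```
// Sketch only; the real macro and the exact signature of
// c10_cuda_check_implementation may differ. The macro captures the
// preprocessor context, and the shared logic (error formatting plus the
// call to c10_retrieve_device_side_assertion_info) lives in an ordinary
// function that is compiled once instead of expanded at every call site.
#define C10_CUDA_CHECK(EXPR)                                        \
  do {                                                              \
    const cudaError_t __err = (EXPR);                               \
    c10::cuda::c10_cuda_check_implementation(                       \
        static_cast<int32_t>(__err), __FILE__, __func__, __LINE__,  \
        /*include_device_assertions=*/true);                        \
  } while (0)
```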
# Notes on special cases
* Multiple assertions from the same block are recorded
* Multiple assertions from different blocks are recorded
* Launching kernels from many threads on many streams seems to be handled correctly
* If two processes are using the same GPU and one of them fails with a device-side assertion, the other process continues without issue
* X Multiple assertions from separate kernels on different streams seem to be recorded, but we can't reproduce the test condition
* X Multiple assertions from separate devices should all be shown upon exit, but we've been unable to generate a test that produces this condition
Differential Revision: D37621532
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84609
Approved by: https://github.com/ezyang, https://github.com/malfet
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76173
We need this facility temporarily to sequence some changes without
breakage. This is generally not a good idea since the main purpose of
this effort is to replicate builds in OSS Bazel.
ghstack-source-id: 155215491
Test Plan: Manual test and rely on CI.
Reviewed By: dreiss
Differential Revision: D35815290
fbshipit-source-id: 89bacda373e7ba03d6a3fcbcaa5af42ae5eac154
(cherry picked from commit 1b808bbc94c939da1fd410d81b22d43bdfe1cda0)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74470
Use it internally as well.
ghstack-source-id: 152438657
Test Plan: Rely on CI to validate.
Reviewed By: malfet
Differential Revision: D35011144
fbshipit-source-id: fb7247470df579ae23fcbc74bd2f8d6cc55cf657
(cherry picked from commit d9b476e2507807097a59c0b0a5ddf029d8dc0ab3)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74465
This requires adding py_library and its PyPI dependency provider
"requirement".
ghstack-source-id: 152438643
Test Plan: Rely on CI to validate.
Reviewed By: malfet
Differential Revision: D35009795
fbshipit-source-id: 424c4968474b3c2fb37d2c7dba932b37605a63f7
(cherry picked from commit 91e442c3bf0e204b0fb6c98405aaaa7308011511)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71907
This allows us to refactor the c10 tests without anything downstream
needing to be concerned about it.
ghstack-source-id: 150235098
Test Plan: This ought to be a no-op, rely on CI to validate.
Reviewed By: malfet
Differential Revision: D33815403
fbshipit-source-id: d358d6e8b1b45b62cef73bdbfd9c7709a7075c42
(cherry picked from commit a554dbe55a28516c8db2287552194860be87f2f0)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71411
This library is now mostly the same externally and internally. The one difference is that internally at Meta we never include cuda in this library, so the select unconditionally resolves to false there.
ghstack-source-id: 150235103
Test Plan: This ought to be a no-op, rely on CI.
Reviewed By: malfet
Differential Revision: D33635739
fbshipit-source-id: a4d3c7e30995c0e43ecd4c69ad0abb23498ee098
(cherry picked from commit c574a123615588adbe42cc51a713fccfa1b2cac0)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70928
ghstack-source-id: 148159366
Test Plan: Ensured that the same number of tests are found and run.
Reviewed By: malfet
Differential Revision: D33455272
fbshipit-source-id: fba1e3409b14794be3e6fe4445c56dd5361cfe9d
(cherry picked from commit b45fce500aa9c3f69915bf0857144ba6d268e649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70863
ghstack-source-id: 148159368
Test Plan: Ought to be a no-op: rely on CI to validate.
Reviewed By: malfet
Differential Revision: D33367290
fbshipit-source-id: cb550538b9eafaa0117f94077ebd4cb920688881
(cherry picked from commit 077d9578bcbf5e41e806c6acb7a8f7c622f66fe9)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70854
We can't do the entire package since parts of it depend on //c10/core.
ghstack-source-id: 147170901
Test Plan: Rely on CI.
Reviewed By: malfet
Differential Revision: D33321821
fbshipit-source-id: 6d634da872a382a60548e2eea37a0f9f93c6f080
(cherry picked from commit 0afa808367ff92b6011b61dcbb398a2a32e5e90d)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70852
This is the first change that uses a common build file, build.bzl, to
hold most of the build logic.
ghstack-source-id: 147170895
Test Plan: Relying on internal and external CI.
Reviewed By: malfet
Differential Revision: D33299331
fbshipit-source-id: a66afffba6deec76b758dfb39bdf61d747b5bd99
(cherry picked from commit d9163c56f55cfc97c20f5a6d505474d5b8839201)