Summary:
Resubmit of #20698, which got messed up.
The idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a simple, very lightweight mechanism to do so - only the first invocation of a trigger point is logged. This is significantly more lightweight than #18235, so we can afford to put logging in e.g. TensorImpl.
Also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking with libtorch.so will not cause any logging. Further suggestions of what to log are welcome.
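A minimal sketch of the one-time trigger idea, with illustrative names (logApiUsage and LOG_API_USAGE_ONCE are placeholders, not the actual c10 API): a function-local static ensures the logging hook fires only on the first invocation of each trigger point.
```
#include <iostream>
#include <string>

// Hypothetical stand-in for a centrally configurable logging hook.
void logApiUsage(const std::string& event) {
  std::cout << "API usage: " << event << std::endl;
}

// Log only the first time a given trigger point is hit: the function-local
// static is initialized exactly once (thread-safely since C++11).
#define LOG_API_USAGE_ONCE(event)                           \
  do {                                                      \
    static const bool logged = (logApiUsage(event), true);  \
    (void)logged;                                           \
  } while (0)

void someTrackedApi() {
  LOG_API_USAGE_ONCE("example.someTrackedApi");
  // ... actual work ...
}

int main() {
  someTrackedApi();
  someTrackedApi();  // second call does not log again
}
```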
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20745
Differential Revision: D15429196
Pulled By: dzhulgakov
fbshipit-source-id: a5e41a709a65b7ebccc6b95f93854e583cf20aca
Summary:
Fix a few instances of notifying a condition variable while holding the lock, so that the lock is released before notifying. This avoids an extra thread suspension when the notified thread tries to grab the lock.
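As an illustration of the pattern this change applies (a generic sketch, not the actual PyTorch code), the producer releases the mutex before calling notify_one():
```
#include <condition_variable>
#include <mutex>
#include <thread>

std::mutex mtx;
std::condition_variable cv;
bool ready = false;

void produce() {
  {
    std::lock_guard<std::mutex> lock(mtx);
    ready = true;
  }                 // lock released here...
  cv.notify_one();  // ...so the woken thread can acquire it without a second suspension
}

void consume() {
  std::unique_lock<std::mutex> lock(mtx);
  cv.wait(lock, [] { return ready; });
  // ... proceed with the work ...
}

int main() {
  std::thread consumer(consume);
  produce();
  consumer.join();
}
```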
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18857
Differential Revision: D14779132
Pulled By: resistor
fbshipit-source-id: b18a05c4c15be1426ebfdffac1c8f002b771cfd7
Summary:
Anywhere we used #include "foo.h", we now say #include <foo.h>.
Paths are adjusted to be rooted at aten/src, torch/lib, or
the root-level directory.
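For example (header name chosen for illustration only, not taken from the diff):
```
// Before: quoted include, possibly relative to the including file:
//   #include "ATen/ATen.h"
// After: rooted at aten/src and written with angle brackets:
#include <ATen/ATen.h>
```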
I modified CMakeLists.txt by hand to remove TH and THC from
the include paths.
I used the following script to do the canonicalization:
```
import subprocess
import re
import os.path
files = subprocess.check_output(['git', 'ls-files']).decode('utf-8').rstrip().split('\n')
for fn in files:
    if not any(fn.endswith(suff) for suff in ['.cu', '.cpp', '.in', '.h', '.hpp', '.cu', '.cuh', '.cc']):
        continue
    if not any(fn.startswith(pref) for pref in ["aten/", "torch/"]):
        continue
    with open(fn, 'r') as f:
        c = f.read()
    def fmt(p):
        return "#include <{}>".format(p)
    def repl(m):
        p = m.group(1)
        if p in ["dlfcn.h", "unistd.h", "nvrtc.h", "cuda.h", "cuda_runtime.h", "cstdint", "cudnn.h", "Python.h", "cusparse.h", "cuda_runtime_api.h", "cuda_fp16.h", "cublas_v2.h", "stdint.h", "curand_kernel.h"]:
            return fmt(p)
        if any(p.startswith(pref) for pref in ["torch/csrc", "c10/", "ATen/", "caffe2/", "TH/", "THC/", "Eigen/", "gtest/", "zdl/", "gloo/", "onnx/", "miopen/"]):
            return fmt(p)
        for root in ["aten/src", "torch/lib", ""]:
            for bad_root in [os.path.dirname(fn), "aten/src/TH", "aten/src/THC", "torch/csrc"]:
                new_p = os.path.relpath(os.path.join(bad_root, p), root)
                if not new_p.startswith("../") and (os.path.exists(os.path.join(root, new_p)) or os.path.exists(os.path.join(root, new_p + ".in"))):
                    return fmt(new_p)
        print("ERROR: ", fn, p)
        return m.group(0)
    new_c = re.sub(r'#include "([^"]+)"', repl, c)
    if new_c != c:
        print(fn)
        with open(fn, 'w') as f:
            f.write(new_c)
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14849
Reviewed By: dzhulgakov
Differential Revision: D13363445
Pulled By: ezyang
fbshipit-source-id: 52361f878a672785f9306c9e9ab2513128092b68
Summary:
This function is only implemented for the subclasses where it makes
sense. If it is not overridden, it throws an error. Having this
function removes the need for a pointer-passing hack to pass the
source rank of a recv operation back to the caller. Instead, the
caller can now call `source_rank` on the work object and achieve
the same result.
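A simplified sketch of the pattern, using stand-in types rather than the actual c10d classes: the base class throws unless a subclass overrides the method, and the caller queries the work object directly instead of passing a pointer through the recv call.
```
#include <iostream>
#include <memory>
#include <stdexcept>

// Stand-in for the work class: source_rank() throws unless a subclass overrides it.
struct Work {
  virtual ~Work() = default;
  virtual int source_rank() const {
    throw std::runtime_error("source_rank not supported for this work type");
  }
};

// A recv-from-any-source operation knows where the data came from, so it overrides it.
struct RecvWork : Work {
  explicit RecvWork(int src) : src_(src) {}
  int source_rank() const override { return src_; }
  int src_;
};

int main() {
  std::unique_ptr<Work> work = std::make_unique<RecvWork>(3);
  // The caller asks the work object for the source rank after the recv completes.
  std::cout << "received from rank " << work->source_rank() << std::endl;
}
```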
Closes #11804.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14453
Differential Revision: D13230898
Pulled By: pietern
fbshipit-source-id: ef38f48bfaca8ef9a364e5be122951bafc9f8e49
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14298
This is a breaking API change for users of the C++ c10d API. The work
object defined wait() to return a boolean. If the work completed
successfully it would return true; if it didn't, it would return false.
It was then up to the user to call the exception() function to figure
out what went wrong. This has proven suboptimal, as it allows users to
forget about failure handling, and errors may go unnoticed.
The work class is semantically very similar to std::future, where a
call to get() may throw if the underlying std::promise has set an
exception. This commit changes the semantics of the work class to
match and turns wait() into a void function that throws if the work
completes with an exception.
The exception() function can still be used to retrieve the exception
if isSuccess() returns false, but now returns an std::exception_ptr
instead of a reference to a std::exception.
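A runnable analogy of the new semantics using plain std::future (not the actual c10d code): get() rethrows the exception stored in the promise, which is the behavior wait() now mirrors.
```
#include <exception>
#include <future>
#include <iostream>
#include <stdexcept>

int main() {
  std::promise<void> p;
  std::future<void> f = p.get_future();
  // The producer stores an exception instead of a value, analogous to a
  // collective that completes with an error.
  p.set_exception(std::make_exception_ptr(std::runtime_error("collective failed")));
  try {
    f.get();  // throws, just like the new Work::wait()
  } catch (const std::exception& e) {
    std::cerr << e.what() << std::endl;
  }
}
```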
Reviewed By: manojkris
Differential Revision: D13158475
fbshipit-source-id: 9cd8569b9e7cbddc867a5f34c6fd0b7be85581b8
Summary:
This is a starting point and only implements allreduce for CPU tensors. It includes most of the base functionality, such as algorithm caching (a similar approach to the one taken in the THD GlooCache) and multi-threaded execution (new).
The expectation is that function calls on the process group class are globally serialized. They execute collective functions, so members of the collective must call the same functions in the same order, or a deadlock may happen.
The algorithm cache works as follows: the ProcessGroupGloo class has a cache map from algorithm keys to algorithm entries. The algorithm key is a struct with fields that make up the signature of a collective function. It includes the dimensionality of the input/output tensors, tensor device assignment, source/destination rank, etc. For collective calls with the same key, the process group will lazily initialize and then cache a Gloo algorithm instance. For now we only keep a single algorithm instance per key, but this may be revisited in the future, if we observe contention on a single key and can exploit additional parallelism.
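A rough sketch of the caching idea with hypothetical types (AlgorithmKey, Algorithm, and AlgorithmCache are illustrative, not the actual ProcessGroupGloo code): a key describing the collective's signature maps to a lazily created algorithm instance that is reused on later calls with the same key.
```
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <tuple>
#include <vector>

// Key capturing the signature of a collective call: which collective,
// tensor dimensionality, device assignment, and source/destination rank.
struct AlgorithmKey {
  std::string collective;       // e.g. "allreduce"
  std::vector<int64_t> sizes;   // input/output tensor dimensionality
  int device = -1;              // device assignment (-1 for CPU)
  int srcRank = -1;             // source/destination rank, where applicable

  bool operator<(const AlgorithmKey& other) const {
    return std::tie(collective, sizes, device, srcRank) <
           std::tie(other.collective, other.sizes, other.device, other.srcRank);
  }
};

struct Algorithm { /* would wrap a Gloo algorithm instance */ };

// Lazily initialize one algorithm instance per key and reuse it afterwards.
class AlgorithmCache {
 public:
  Algorithm& get(const AlgorithmKey& key) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto& entry = cache_[key];
    if (!entry) {
      entry = std::make_unique<Algorithm>();  // first call with this key: initialize
    }
    return *entry;  // subsequent calls: cached instance
  }

 private:
  std::mutex mutex_;
  std::map<AlgorithmKey, std::unique_ptr<Algorithm>> cache_;
};
```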