mirror of
https://github.com/pytorch/pytorch.git
synced 2025-10-21 05:34:18 +08:00
We have a plethora of error types for various errors raised from c10d. These include `RuntimeError`, `TimeoutError`, `SocketError`, `DistBackendError`, etc. This results in messy error-handling code, somewhat like this:

```
if "NCCL" in exception_str:
  ...
if "Timed out initializing process group in store based barrier on rank" in exception_str:
  ...
if "The client socket has timed out after" in exception_str:
  ...
if "Broken pipe" in exception_str:
  ...
if "Connection reset by peer" in exception_str:
  ...
```

To address this issue, in this PR I've added these error types:

1. **DistError** - the base type of all distributed errors
2. **DistBackendError** - this already existed and referred to PG backend errors
3. **DistStoreError** - for errors originating from the store
4. **DistNetworkError** - for general network errors coming from the socket library

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108191
Approved by: https://github.com/H-Huang
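The hierarchy listed in the commit message lets callers dispatch on exception type instead of matching message substrings. The following is a minimal sketch of that idea, not the PyTorch implementation: the `sketch` namespace, the `classify` helper, and the use of `std::runtime_error` as the root are assumptions made to keep the example self-contained (the real types live in `c10`).

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the hierarchy described in the commit message.
// std::runtime_error is used as the root only to keep the sketch
// self-contained; it is an assumption, not the real base class.
namespace sketch {

struct DistError : std::runtime_error {
  using std::runtime_error::runtime_error;
};
struct DistBackendError : DistError {
  using DistError::DistError;
};
struct DistStoreError : DistError {
  using DistError::DistError;
};
struct DistNetworkError : DistError {
  using DistError::DistError;
};

// Runs `op` and reports which category of error it raised, if any.
// Specific handlers come first; DistError is the catch-all for any
// distributed failure that has no dedicated handler.
inline std::string classify(const std::function<void()>& op) {
  try {
    op();
  } catch (const DistBackendError&) {
    return "backend";
  } catch (const DistStoreError&) {
    return "store";
  } catch (const DistNetworkError&) {
    return "network";
  } catch (const DistError&) {
    return "distributed";
  } catch (const std::exception&) {
    return "other";
  }
  return "ok";
}

} // namespace sketch
```

With this layout, a call such as `sketch::classify([] { throw sketch::DistNetworkError("Broken pipe"); })` is routed by type, so the handler no longer depends on the exact wording of the message.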
34 lines
968 B
C++
// Copyright (c) Facebook, Inc. and its affiliates.
// All rights reserved.
//
// This source code is licensed under the BSD-style license found in the
// LICENSE file in the root directory of this source tree.

#pragma once

#include <stdexcept>

#include <c10/macros/Macros.h>
#include <c10/util/Exception.h>

// Utility macro similar to C10_THROW_ERROR, the major difference is that this
// macro handles exception types defined in the c10d namespace, whereas
// C10_THROW_ERROR requires an exception to be defined in the c10 namespace.
#define C10D_THROW_ERROR(err_type, msg) \
  throw ::c10d::err_type(               \
      {__func__, __FILE__, static_cast<uint32_t>(__LINE__)}, msg)

namespace c10d {

using c10::DistNetworkError;

class TORCH_API SocketError : public DistNetworkError {
  using DistNetworkError::DistNetworkError;
};

class TORCH_API TimeoutError : public DistNetworkError {
  using DistNetworkError::DistNetworkError;
};

} // namespace c10d
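The macro above brace-initializes a source-location argument from `__func__`, `__FILE__`, and `__LINE__` and passes it, together with the message, to the exception's constructor. The sketch below imitates that pattern without depending on PyTorch. Everything in the `macro_sketch` namespace, the `SKETCH_THROW_ERROR` macro, the `where()` accessor, and `connect_with_deadline` are hypothetical names for illustration, and the constructor shape is an assumption about how such an error type might store its location.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

namespace macro_sketch {

// Hypothetical stand-in for the aggregate that the macro brace-initializes.
struct SourceLocation {
  const char* function;
  const char* file;
  std::uint32_t line;
};

// Assumed shape: the error stores where it was thrown plus a message.
class DistNetworkError : public std::runtime_error {
 public:
  DistNetworkError(SourceLocation loc, const std::string& msg)
      : std::runtime_error(msg), loc_(loc) {}
  const SourceLocation& where() const noexcept { return loc_; }

 private:
  SourceLocation loc_;
};

// Mirrors the header: the derived type only inherits the base constructors.
class TimeoutError : public DistNetworkError {
 public:
  using DistNetworkError::DistNetworkError;
};

} // namespace macro_sketch

// Same structure as C10D_THROW_ERROR, pointed at the sketch namespace.
#define SKETCH_THROW_ERROR(err_type, msg) \
  throw ::macro_sketch::err_type(         \
      {__func__, __FILE__, static_cast<std::uint32_t>(__LINE__)}, msg)

namespace macro_sketch {

// Throws a TimeoutError tagged with this function's name, file, and line.
inline void connect_with_deadline(bool expired) {
  if (expired) {
    SKETCH_THROW_ERROR(TimeoutError, "connect timed out");
  }
}

} // namespace macro_sketch
```

A caller can catch `macro_sketch::DistNetworkError` to handle both socket and timeout failures in one place, and read `where()` to find the throw site.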