[Fix] add validation logic to TCPStore queries (#107607)

This PR fixes #106294.

Due to the lack of a request validation mechanism, TCPStore in torch mistakenly treats nmap scan messages as valid query messages, which can cause DDP to run out of memory (OOM). The simple solution enforces that the very first query from a client is a validation query carrying a predefined magic number. If the validation fails, the server terminates the connection.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107607
Approved by: https://github.com/cbalioglu, https://github.com/XilunWu
Juncheng Gu
2023-11-02 22:12:45 +00:00
committed by PyTorch MergeBot
parent 12a6f5aa6b
commit 50a9981217
5 changed files with 145 additions and 56 deletions


@@ -18,7 +18,11 @@
namespace c10d {
namespace detail {
// Magic number for client validation.
static const uint32_t validationMagicNumber = 0x3C85F7CE;
enum class QueryType : uint8_t {
VALIDATE,
SET,
COMPARE_SET,
GET,