Fix the issue of printing data overflow. #1383

Open: 1 commit to merge into base: master


@wangfakang wangfakang commented Jul 31, 2024

Fix the issue of printing data overflow. Friendly ping @AddyLaddy @sjeaugey.

uint32_t byte_len;
uint64_t wr_id;

nccl/src/include/ibvcore.h

Lines 302 to 316 in 178b6b7

struct ibv_wc {
  uint64_t wr_id;
  enum ibv_wc_status status;
  enum ibv_wc_opcode opcode;
  uint32_t vendor_err;
  uint32_t byte_len;
  uint32_t imm_data; /* in network byte order */
  uint32_t qp_num;
  uint32_t src_qp;
  int wc_flags;
  uint16_t pkey_index;
  uint16_t slid;
  uint8_t sl;
  uint8_t dlid_path_bits;
};

For example, in the following log line, len is printed as a negative number because the value of byte_len (a uint32_t) exceeds the range of a signed 32-bit int.

xxxx:8036:8538 [0] transport/net_ib.cc:1295 NCCL WARN NET/IB : Got completion from peer x.x.x.x <16275> with error 12, opcode 32749, len -2147483648, vendor err 129 (Send) localGid xxxxx remoteGid xxxxx
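The effect in the log line above can be reproduced in isolation. The following is a minimal sketch, not the actual NCCL code: the helper names (format_len_signed, format_len_unsigned, format_wr_id) are hypothetical and exist only for illustration. It assumes the usual two's-complement conversion when a uint32_t is reinterpreted as int32_t, which is what a signed format specifier effectively does on common platforms.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats byte_len as a signed 32-bit value, which is
 * what a signed specifier such as "%d" effectively shows. Values above
 * INT32_MAX wrap to negative numbers on two's-complement platforms. */
static int format_len_signed(char *buf, size_t size, uint32_t byte_len) {
  assert(buf != NULL);
  return snprintf(buf, size, "len %" PRId32, (int32_t)byte_len);
}

/* Hypothetical helper: formats byte_len with an unsigned specifier that
 * matches its declared type (uint32_t), so the full range prints correctly. */
static int format_len_unsigned(char *buf, size_t size, uint32_t byte_len) {
  assert(buf != NULL);
  return snprintf(buf, size, "len %" PRIu32, byte_len);
}

/* Hypothetical helper: formats wr_id with a 64-bit unsigned specifier that
 * matches its declared type (uint64_t). */
static int format_wr_id(char *buf, size_t size, uint64_t wr_id) {
  assert(buf != NULL);
  return snprintf(buf, size, "wr_id %" PRIu64, wr_id);
}
```

With byte_len set to 2147483648, the signed variant yields the negative value seen in the warning, while the unsigned variant yields 2147483648. The PRIu32 and PRIu64 macros from <inttypes.h> are one portable way to match specifiers to fixed-width types; whether the patch uses them or plain specifiers such as %u is not stated here.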

@wangfakang
Author

friendly ping @AddyLaddy @sjeaugey
