Standard one-sided RMA synchronization method #6564
Hi. I went through the documentation trying to come up with a simple design for one-sided RMA but the best I could do doesn't look very good actually. This is a high-level summary of the steps I take: Receiver:
Sender: What is counterintuitive is that step 3 is needed to make progress (otherwise fi_write* returns FI_EAGAIN), and if I use anything other than fi_writemsg with immediate data, the receivers
So we have to read the CQ on the receiving end (for flow control, I assume, so that the sender doesn't get an EAGAIN), but we are not guaranteed to get an event in the CQ, e.g. because the sender didn't use fi_writemsg with immediate data? Surely this can't be the best way of synchronizing the receiver with the sender. What is a better approach?
PS: I'm using
Thanks,
Replies: 1 comment 3 replies
The short answer is I believe you have it.

At step 6, I assume you meant the FI_REMOTE_CQ_DATA flag. You should also be able to use fi_writedata() instead of fi_writemsg().

Backing up some...

When using RDM endpoints over verbs, the most common selection ends up with RxM as the upper-level provider and verbs using connected QPs at the lower level. RxM (reliable datagram over message endpoints) must establish a connection with its peer before any transfers occur. This happens the first time a data transfer operation is invoked. If step 2 is exchanging data through the libfabric APIs, then the connection has already been set up. If step 2 is occurring out of band (e.g. via sockets), and fi_writemsg() is the first data transfer call being made, EAGAIN is expected. The call will not succeed until the connection has been created.

The CQ reads are needed at both peers to drive progress, unless auto-progress has been enabled. See the fi_domain man page, fi_domain_attr, ctrl and data progress fields. While RxM is establishing a connection, the peer needs to process and respond to the connection request. With manual progress, that work runs on the application thread, inside the CQ read APIs. The app isn't aware that the connection setup is occurring, other than EAGAIN being returned from transfer calls.

After the connection has been established, RxM + verbs implement SW flow control, so EAGAIN can occur at any point when that kicks in. SW flow control is tied to operations that consume posted receive buffers at the target. A normal RMA does not consume receive buffers. However, an RMA with CQ data does. EAGAIN is also possible if the local transmit queue is full.

Anyway, assuming that you eventually get past seeing EAGAIN, and the transfer is accepted... If you have verbs hardware with full transport offload (most common case), then RMA operations posted by the sender will pass directly through to the hardware. Hardware will drive the transfer at the initiator and target sides.
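
The EAGAIN behavior described above usually turns into a retry loop that drives progress between attempts. Here is a minimal sketch of that pattern, not libfabric code: `post_fn`, `progress_fn`, `post_with_progress`, `SKETCH_EAGAIN`, and the toy provider state are all hypothetical names made up for illustration. In a real application, the post callback would wrap something like fi_writedata() (which reports -FI_EAGAIN), and the progress callback would wrap a CQ read such as fi_cq_read() on the local CQ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for FI_EAGAIN; libfabric calls report -FI_EAGAIN. */
#define SKETCH_EAGAIN 11

/* Hypothetical callback types: post would wrap fi_writedata(),
 * progress would wrap a CQ read on the local completion queue. */
typedef int (*post_fn)(void *ctx);
typedef void (*progress_fn)(void *ctx);

/* Retry a post until it is accepted, driving progress after each EAGAIN.
 * Returns 0 on success, any other error from post unchanged, or
 * -SKETCH_EAGAIN if max_tries attempts were all rejected. */
int post_with_progress(post_fn post, progress_fn progress, void *ctx,
                       int max_tries)
{
    assert(post != NULL && progress != NULL);
    for (int i = 0; i < max_tries; i++) {
        int ret = post(ctx);
        if (ret != -SKETCH_EAGAIN)
            return ret;          /* accepted, or a real error */
        progress(ctx);           /* let connection setup / flow control advance */
    }
    return -SKETCH_EAGAIN;
}

/* Toy model of a provider: posts are rejected until progress has been
 * driven progress_needed times (e.g. while a connection is being set up). */
struct toy_state {
    int progress_needed;
    int progress_calls;
    int posts;
};

int toy_post(void *ctx)
{
    struct toy_state *s = ctx;
    s->posts++;
    return s->progress_calls >= s->progress_needed ? 0 : -SKETCH_EAGAIN;
}

void toy_progress(void *ctx)
{
    struct toy_state *s = ctx;
    s->progress_calls++;
}
```

Calling `post_with_progress(toy_post, toy_progress, &state, 10)` on a state that needs three progress calls succeeds on the fourth post. The bounded retry count is only a design choice for the sketch; a real application might loop until success or apply a timeout instead.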
You don't have to read the CQ at the target for the RMA to happen. However, the only way for the target to know that the initiator is done is to receive some sort of 'message' from the initiator that follows the RMA. The purpose of reading the CQ at the target is to get that notification.

That message can either be a stand-alone transfer, where the initiator calls fi_send, or can be CQ data carried as part of the RMA itself. The latter is the most efficient. Because of the verbs architecture, CQ data 'consumes' a posted receive buffer at the target and is reported as a completion. With RxM, the consumed buffer is internal to RxM and not an application buffer.
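
On the target side, the notification step amounts to polling the CQ until a completion carrying remote CQ data shows up. The following is a rough sketch under stated assumptions, not libfabric code: `sketch_cq_entry`, `SKETCH_REMOTE_CQ_DATA`, `cq_read_fn`, `wait_for_remote_data`, and the toy CQ are hypothetical stand-ins. In a real application the read callback would wrap fi_cq_read() with a CQ format that reports flags and data, and the flag check would test FI_REMOTE_CQ_DATA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for FI_EAGAIN and FI_REMOTE_CQ_DATA. */
#define SKETCH_EAGAIN 11
#define SKETCH_REMOTE_CQ_DATA (1ULL << 0)

/* Hypothetical, trimmed-down completion entry (flags + immediate data). */
struct sketch_cq_entry {
    uint64_t flags;
    uint64_t data;
};

/* Hypothetical wrapper type for a CQ read: returns 1 when an entry was
 * read, -SKETCH_EAGAIN when the CQ is empty, another negative on error. */
typedef int (*cq_read_fn)(void *cq, struct sketch_cq_entry *entry);

/* Poll until a completion with remote CQ data arrives, then hand the
 * immediate data back. Returns 0 on success, -SKETCH_EAGAIN if nothing
 * arrived within max_polls reads, or the read error unchanged. */
int wait_for_remote_data(cq_read_fn cq_read, void *cq, int max_polls,
                         uint64_t *data_out)
{
    assert(cq_read != NULL && data_out != NULL);
    for (int i = 0; i < max_polls; i++) {
        struct sketch_cq_entry e = { 0, 0 };
        int ret = cq_read(cq, &e);
        if (ret == -SKETCH_EAGAIN)
            continue;            /* nothing yet; the read still drove progress */
        if (ret < 0)
            return ret;          /* real error */
        if (e.flags & SKETCH_REMOTE_CQ_DATA) {
            *data_out = e.data;  /* initiator's "I'm done" notification */
            return 0;
        }
        /* other completions are ignored in this sketch */
    }
    return -SKETCH_EAGAIN;
}

/* Toy CQ: empty for the first empty_polls reads, then yields one
 * completion carrying the given immediate data. */
struct toy_cq {
    int empty_polls;
    int reads;
    uint64_t data;
};

int toy_cq_read(void *cq, struct sketch_cq_entry *entry)
{
    struct toy_cq *q = cq;
    if (q->reads++ < q->empty_polls)
        return -SKETCH_EAGAIN;
    entry->flags = SKETCH_REMOTE_CQ_DATA;
    entry->data = q->data;
    return 1;
}
```

The immediate data value can encode whatever the two sides agree on, for example a buffer index or sequence number, so the target knows which RMA the notification refers to.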