nl_send_sync returns without consuming ack message

Mon Jan 23 01:37:42 PST 2017

2017-01-23 9:56 GMT+01:00 Thomas Haller <thaller at redhat.com>:
> On Fri, 2017-01-20 at 20:37 +0100, Christophe Gouault wrote:
>> Hi,
>>
>> I am trying to use nl_send_sync() with auto-ack enabled to
>> retrieve the response to an XFRM_MSG_GETAE request.
>>
>> In case of error (e.g. the IPsec SA does not exist), the kernel sends a
>> single NLMSG_ERROR netlink message (error message). nl_send_sync()
>> returns an error value and the NLMSG_ERROR message is consumed (OK).
>>
>> However, in case of success (IPsec SA found), the kernel sends 2
>> distinct netlink messages (with the same sequence number but without
>> the NLM_F_MULTI flag): an XFRM_MSG_NEWAE (response) and an NLMSG_ERROR
>> (ack message), but nl_send_sync() only processes the first one.
>>
>> The NL_CB_VALID callback is properly invoked to process the
>> XFRM_MSG_NEWAE message, but the ack message is not read.
>> nl_send_sync() returns 0 without consuming the ack, which remains in
>> the socket buffer. It will be read the next time someone reads the
>> socket, instead of the response to a new request.
>>
>> I expected nl_send_sync() to process all messages until an error or
>> ack is read. Am I missing something?
>>
>> Regards,
>> Christophe
>
> Hi Christophe,
>
> nl_send_sync() basically just sends the request and then calls
> nl_wait_for_ack(). I assume you did not set NL_NO_AUTO_ACK, because
> if it is set, nl_wait_for_ack() is bypassed.
>
> nl_wait_for_ack() then installs its own ACK handler:
>   nl_cb_set(cb, NL_CB_ACK, NL_CB_CUSTOM, ack_wait_handler, NULL);
> This means that if another installed handler returns NL_STOP before
> the ACK is received, it will not wait for the ACK.
>
> I would check whether conflicting handlers are installed.
>   https://github.com/thom311/libnl/blob/3dd2a0f26fa59896b4b4a262cf309a4be4aa70d3/lib/nl.c#L1112
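To make the point about handler return values concrete, here is a minimal C model of a per-datagram dispatch loop. It is not libnl code: `enum cb_action`, `dispatch()`, `ack_handler()` and the `valid_*` handlers are hypothetical stand-ins for NL_OK/NL_SKIP/NL_STOP, an NL_CB_VALID handler and `ack_wait_handler`. It assumes the reply and the ack share one datagram, which is the case where a VALID handler returning STOP would keep the ACK handler from ever running.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for libnl's NL_OK / NL_SKIP / NL_STOP. */
enum cb_action { ACT_OK, ACT_SKIP, ACT_STOP };

/* Message kinds inside one datagram: a reply, or an ack (NLMSG_ERROR, error 0). */
enum msg_kind { KIND_VALID, KIND_ACK };

typedef enum cb_action (*handler_fn)(void);

/* Plays the role of ack_wait_handler: seeing the ack ends the wait. */
static enum cb_action ack_handler(void)
{
	return ACT_STOP;
}

/* Dispatch every message of one datagram; returns 1 if the ack was reached. */
static int dispatch(const enum msg_kind *msgs, size_t n, handler_fn on_valid)
{
	int acked = 0;

	assert(msgs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		enum cb_action act;

		if (msgs[i] == KIND_ACK) {
			acked = 1;
			act = ack_handler();
		} else {
			act = on_valid();
		}
		if (act == ACT_STOP)
			break;      /* any handler returning STOP ends the loop */
		/* ACT_OK and ACT_SKIP both move on to the next message here */
	}
	return acked;
}

/* Two example VALID handlers. */
static enum cb_action valid_ok(void)   { return ACT_OK; }
static enum cb_action valid_stop(void) { return ACT_STOP; }
```

With `valid_ok` the ACK handler runs after the reply; with `valid_stop` the loop ends on the reply and the ack is never dispatched.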

Hi Thomas,

Indeed, I did not set NL_NO_AUTO_ACK (I did not invoke
nl_socket_disable_auto_ack()), and I only registered a NL_CB_VALID
handler.

I carefully read the source code of the functions invoked by
nl_send_sync(), and the behavior I observed is easy to explain.
My concern is whether it is really the expected behavior. I suspect
it is not:

When recvmsgs() parses the reply to my request, it invokes my
registered NL_CB_VALID handler, which returns NL_OK (= 0).

Then "hdr = nlmsg_next(hdr, &n);" is called to read a possible next
message in the current datagram. The ack is in a separate datagram, so
the "while (nlmsg_ok(hdr, n))" loop ends.
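The inner loop can be illustrated with a self-contained sketch that walks one datagram buffer the way `nlmsg_ok()`/`nlmsg_next()` do. `struct hdr` copies the field layout of `struct nlmsghdr`, while `ALIGN4`, `count_messages()` and `put_msg()` are hypothetical helpers modeled on the kernel's NLMSG_ALIGN/NLMSG_OK/NLMSG_NEXT macros, not the real ones. The walk only ever sees the messages packed into the buffer it is given, so an ack delivered as a separate datagram is simply not there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as struct nlmsghdr (16 bytes, no padding). */
struct hdr {
	uint32_t len;     /* message length, header included */
	uint16_t type;
	uint16_t flags;
	uint32_t seq;
	uint32_t pid;
};

/* Round up to a 4-byte boundary, like NLMSG_ALIGN (NLMSG_ALIGNTO is 4). */
#define ALIGN4(x) (((x) + (size_t)3) & ~(size_t)3)

/* Count the messages an nlmsg_ok()/nlmsg_next() style loop visits in one datagram. */
static size_t count_messages(const unsigned char *buf, size_t len)
{
	size_t off = 0, count = 0;
	struct hdr h;

	/* Like nlmsg_ok(): enough bytes left for a header and a sane length. */
	while (off < len && len - off >= sizeof h) {
		memcpy(&h, buf + off, sizeof h);   /* avoid unaligned access */
		if (h.len < sizeof h || h.len > len - off)
			break;
		count++;
		off += ALIGN4(h.len);              /* like nlmsg_next() */
	}
	return count;
}

/* Hypothetical test helper: write one message with `payload` zero bytes of body
 * at `off` and return the aligned offset where the next message would start. */
static size_t put_msg(unsigned char *buf, size_t off, uint16_t type, uint32_t payload)
{
	struct hdr h = { (uint32_t)(sizeof(struct hdr) + payload), type, 0, 1, 0 };

	assert(ALIGN4(sizeof h) == sizeof h);
	memcpy(buf + off, &h, sizeof h);
	memset(buf + off + sizeof h, 0, payload);
	return off + ALIGN4(h.len);
}
```

A buffer holding only the reply yields one message; the same walk over a buffer that also packs an NLMSG_ERROR (type 2) after the reply yields two.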

The next datagram is only read if the message has the NLM_F_MULTI flag
set, which is not the case, so recvmsgs() returns, followed in turn by
nl_recvmsgs_report(), nl_recvmsgs(), nl_wait_for_ack() and finally
nl_send_sync() itself.

Eventually, nl_send_sync() returns 0 without reading the ack.
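The outer control flow described above can be captured in a small model. It is a simplification under stated assumptions: every queued datagram holds exactly one message, `FLAG_MULTI` stands in for NLM_F_MULTI, `MSG_ACK` for an NLMSG_ERROR with error code 0, and `observed_receive()`/`expected_receive()` are hypothetical names for the reported behavior and for the behavior expected at the start of the thread (read until an error or ack).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: each queued datagram holds exactly one message. */
enum msg_type { MSG_REPLY, MSG_ACK };   /* MSG_ACK ~ NLMSG_ERROR with error 0 */
#define FLAG_MULTI 0x2u                 /* stand-in for NLM_F_MULTI */

struct datagram {
	enum msg_type type;
	unsigned flags;
};

struct sock_queue {
	const struct datagram *dgrams;
	size_t count;
	size_t next;      /* index of the next unread datagram */
};

/* Reported behavior: read another datagram only if the last message was multipart. */
static void observed_receive(struct sock_queue *q)
{
	while (q->next < q->count) {
		const struct datagram *d = &q->dgrams[q->next++];

		if (!(d->flags & FLAG_MULTI))
			return;   /* a separately queued ack is left unread */
	}
}

/* Expected behavior: keep reading until the ack (or error) has been consumed. */
static void expected_receive(struct sock_queue *q)
{
	while (q->next < q->count) {
		const struct datagram *d = &q->dgrams[q->next++];

		if (d->type == MSG_ACK)
			return;
	}
}

/* Number of datagrams still waiting in the socket buffer. */
static size_t pending(const struct sock_queue *q)
{
	assert(q->next <= q->count);
	return q->count - q->next;
}
```

For the two-datagram success reply, `observed_receive()` returns with the ack still pending, while `expected_receive()` drains both datagrams. The error case (a lone NLMSG_ERROR) is fully consumed by either.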

Regards,
Christophe