In this post I’ll take a closer look at IPv6 Duplicate Address Detection (aka ‘DAD’, which evidently invites all types of jokes and wordplay). While the general mechanism should be roughly familiar to everybody working with IPv6, there are some interesting intricacies under the hood, some of which might even have operational implications.
DAD was already part of the initial specification of SLAAC in RFC 1971 (dating from 1996), which was then obsoleted by RFC 2462. RFC 4429 describes a modification called ‘Optimistic Duplicate Address Detection’. Neighbor discovery and SLAAC, incl. DAD, were later updated/specified in RFCs 4861 and 4862, which are considered the main standards as of today. Finally, DAD was enhanced in RFC 7527, but that’s of minor relevance here.
DAD’s goal is to avoid address conflicts (within the scope of the respective address). To do so a host is supposed to perform a specific verification procedure (‘ask a certain question’) and subsequently to act on the result of that procedure. However, as we will see, the latter in particular can depend on a number of circumstances, notably on the type of the address/IID.
How to ask the question?
Generally speaking a host is expected to perform the following (for a given unicast address):
- send a Neighbor Solicitation (ICMPv6 type 135) message.
- use the unspecified address (::, see RFC 4291, section 2.5.2) as source address, the requested unicast address’s Solicited-Node multicast address (SNMA, see RFC 4291, section 2.7.1) as destination address and put the to-be-used unicast address as Target Address into the ICMPv6 payload.
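To make the addressing concrete, here is a minimal Python sketch (not taken from any particular stack) that derives the Solicited-Node multicast address and the matching Ethernet multicast MAC for a given unicast address. The function names are made up for illustration; the sample address is the documentation-prefix one that shows up later in this post.

```python
import ipaddress


def solicited_node_multicast(addr: str) -> ipaddress.IPv6Address:
    """Solicited-Node multicast address (RFC 4291, section 2.7.1) for addr."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    prefix = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(prefix | low24)


def ethernet_multicast_mac(group: ipaddress.IPv6Address) -> str:
    """Ethernet MAC for an IPv6 multicast address: 33:33 plus its low 32 bits."""
    low32 = int(group) & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return ":".join(f"{octet:02x}" for octet in octets)


target = "2001:db8:320:104::9"
snma = solicited_node_multicast(target)
print(snma, ethernet_multicast_mac(snma))  # ff02::1:ff00:9 33:33:ff:00:00:09
```

Only the low 24 bits of the unicast address end up in the multicast group, so unrelated addresses can share an SNMA; that is why the full address still has to be carried in the NS payload.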
This can look like this (ref. RFC 2464 for the ’33:33′ in the Ethernet multicast address):
It should be noted that RFC 4862 states that “Duplicate Address Detection MUST be performed on all unicast addresses prior to assigning them to an interface, regardless of whether they are obtained through stateless autoconfiguration, DHCPv6, or manual configuration”, but in practice this can be turned off on the OS level (and there might even exist situations where this could be desirable, see below). Still, the general verification procedure is mostly identical on the vast majority of operating systems.
Shall we wait for a response?
This is where the differences between scenarios start. As stated above, RFC 4429 describes a thing called ‘Optimistic DAD’. The idea here is to put an address into an ‘optimistic’ state right after sending out the NS and thereby make the address operational pretty much immediately (with some minor restrictions, like not sending certain packets from said address with the Source Link-Layer Address Option [SLLAO]). This optimization is supposed to be used when (as of RFC 4429 section 3.1) “the address is based on a most likely unique interface identifier” such as an EUI-64 generated one, a randomly generated one (Privacy Extensions, RFC 4941, more info here), a Cryptographically Generated Address (as for example used by Apple devices, see here) or a DHCPv6 address (note that the concept of ‘stable’ addresses as of RFC 7217 did not exist at the time). Optimistic DAD explicitly “SHOULD NOT be used for manually entered addresses”.
As of today it’s a fair assumption that all ‘client operating systems’ use Optimistic DAD, as can be observed in the above example, but this does not apply to servers using static addresses. This is what it looks like on macOS Big Sur (note that the router solicitation is sent just two milliseconds after the DAD neighbor solicitation):
What if the response indicates a conflict?
This is where things (differences) become really interesting. While RFC 4429 has a dedicated section on the ‘Collision Case’ (sect. 4.2), it remains relatively vague, includes terms like ‘hopefully’ 😉, and states that an address collision “may incur some penalty to the ON [optimistic node], in the form of broken connections, and some penalty to the rightful owner of the address” (which doesn’t sound right to me…).
RFC 4862 mandates (in “5.4.5. When Duplicate Address Detection Fails”) that in case of a collision of an EUI-64 generated address the IPv6 operation of the respective interface “SHOULD be disabled”, but “MAY be continued” in other (address generation) scenarios. Furthermore “the node SHOULD log a system management error”.
An interface with a static address where DAD failed could look like this:
inet6 2001:db8:320:104::9/64 scope global tentative dadfailed
valid_lft forever preferred_lft forever
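If you want to check for this state in a script, a minimal sketch could look as follows. It assumes `ip -6 addr show` output in the format shown above, and the function name is made up for illustration.

```python
import re

# Flags from the output above that mean DAD has not completed successfully.
DAD_FLAGS = {"tentative", "dadfailed"}


def dad_state(ip_output: str) -> dict:
    """Map each inet6 address in the given text to its DAD-related flags."""
    states = {}
    for line in ip_output.splitlines():
        match = re.match(r"\s*inet6\s+(\S+)\s+scope\s+\S+(.*)", line)
        if match:
            address, rest = match.groups()
            states[address] = sorted(DAD_FLAGS & set(rest.split()))
    return states


sample = """\
    inet6 2001:db8:320:104::9/64 scope global tentative dadfailed
       valid_lft forever preferred_lft forever
"""
print(dad_state(sample))  # {'2001:db8:320:104::9/64': ['dadfailed', 'tentative']}
```

An empty flag list means the address shows neither of the two flags.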
So, overall, no guidance is provided here on how to proceed in case of a detected conflict for addresses based on RFC 3972 (CGAs), RFC 4941 (Privacy Extensions) or RFC 7217 (‘Stable IIDs’), but this may be specified in other places (see below), and/or might be left to the implementors of individual OS stacks. Many years ago Christopher Werny and I performed some testing for Windows and Linux, creating various scenarios with address collisions, and off the top of my head I recall that their behavior was both quite different and not necessarily intuitive (sorry, I don’t remember the details).
CGAs have a dedicated Collision Count parameter which can be “incremented during CGA generation to recover from an address collision detected by duplicate address detection” (RFC 3972, section 3).
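To illustrate how the Collision Count changes the resulting address, here is a rough sketch of the interface identifier derivation as I read RFC 3972, section 4. It is simplified on purpose: it only covers Sec = 0 (so the Hash2 search for a suitable modifier is left out), and the modifier and public key are placeholders rather than real key material.

```python
import hashlib
import ipaddress


def cga_interface_id(modifier: bytes, prefix: bytes, collision_count: int,
                     public_key: bytes) -> bytes:
    """Derive a 64-bit IID from Hash1 for Sec = 0 (simplified sketch)."""
    if len(modifier) != 16 or len(prefix) != 8:
        raise ValueError("modifier must be 16 bytes, subnet prefix 8 bytes")
    if collision_count not in (0, 1, 2):
        raise ValueError("RFC 3972 gives up after three collisions")
    data = modifier + prefix + bytes([collision_count]) + public_key
    iid = bytearray(hashlib.sha1(data).digest()[:8])
    # Sec = 0 goes into the three leftmost bits; the "u" and "g" bits
    # (bits 6 and 7 of the first octet) are cleared as well.
    iid[0] &= 0x1C
    return bytes(iid)


prefix = ipaddress.IPv6Address("2001:db8:320:104::").packed[:8]
modifier = bytes(16)                      # placeholder, normally random
public_key = b"placeholder-public-key"    # placeholder, normally DER-encoded
for count in range(3):
    iid = cga_interface_id(modifier, prefix, count, public_key)
    print(count, ipaddress.IPv6Address(prefix + iid))
```

Since the count is part of the hashed input, each increment yields an unrelated interface identifier while the key pair stays the same.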
RFC 4941 includes this (with the TEMP_IDGEN_RETRIES defaulting to the value 3):
RFC 8415 on DHCPv6 specifies as follows (with a DEC_MAX_RC parameter indicating the maximum number of times the client retransmits its Decline message; it defaults to the value 4):
Furthermore the DHCPv6 server “SHOULD mark the addresses declined by the client so that those addresses are not assigned to other clients”.
I’m not sure about the exact sequence of things when the client uses optimistic DAD (which in turn should be the default for DHCPv6 addresses).
tl;dr of this section: the exact behavior when reacting to an address collision might not always be the same, and it might depend on several circumstances.
Operational Implications (1): Service Bindings
As laid out above, optimistic DAD is not supposed to be performed when static IPv6 addresses are used. This can create issues when, during system boot, a service is to be bound to an address which is still in ‘tentative’ state (during DAD), as discussed in this thread (there is also an interesting comment at the bottom, on the differences re: DAD between FreeBSD and NetBSD).
This could look like this:
020/09/26 10:08:22 [emerg] 11298#11298: bind() to [2001:db8:104:1700::12]:80 failed (99: Cannot assign requested address)
Apparently this may be fixed by touching the following sysctl but I don’t fully understand its mechanism, so this might only work in certain scenarios:
In any case the delay induced by DAD (with static addresses) should be considered for service bindings during startup.
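One way to account for that delay inside a service (instead of changing system settings) is to simply retry the bind for a short while. The following is a minimal sketch under that assumption; both helper names are hypothetical, and the retry count and delay are arbitrary values, not recommendations.

```python
import errno
import socket
import time


def retry_while_tentative(do_bind, attempts=10, delay=0.5, sleep=time.sleep):
    """Call do_bind() until it stops failing with EADDRNOTAVAIL (99 on Linux)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return do_bind()
        except OSError as exc:
            # Any other error, or the last failed attempt, is passed on.
            if exc.errno != errno.EADDRNOTAVAIL or attempt == attempts:
                raise
            sleep(delay)


def bind_listener(address, port):
    """Open a TCP listener on an IPv6 address (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    try:
        sock.bind((address, port))
    except OSError:
        sock.close()
        raise
    sock.listen()
    return sock
```

A service would then call something like `retry_while_tentative(lambda: bind_listener("2001:db8:104:1700::12", 80))`. Note that this only waits out the tentative phase; if DAD actually fails, the bind keeps failing and the error is raised after the last attempt.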
Operational Implications (2): cni0 interface stuck in DAD
I once heard of a case where the cni0 bridge interface on Kubernetes clusters was stuck in DAD when initialized by standard CentOS initscripts (which in turn was difficult to troubleshoot as it only had veth members and wasn’t bound to any physical interface). This could presumably only be solved by disabling DAD as a whole. That might be a debatable approach (I for one think this is perfectly doable even in other settings once one has sufficient control over the [static] address assignment mechanisms), but for completeness’ sake here’s the relevant sysctl (from the current Linux kernel documentation):
Suffice to say that DAD might kick in in various ways and in the context of different dependencies, so one has to be aware of its inner workings and of its role during interface initialization.
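For checking how DAD is configured on a given Linux interface, here is a small sketch. It assumes that the `accept_dad` knob described in the kernel's IP sysctl documentation is the relevant one (that is my assumption), and the function names and the `root` parameter are made up for illustration.

```python
from pathlib import Path

# Value meanings as I understand them from the kernel's IP sysctl documentation.
ACCEPT_DAD = {
    0: "DAD disabled",
    1: "DAD enabled (default)",
    2: "DAD enabled; IPv6 operation disabled if a MAC-based "
       "duplicate link-local address is found",
}


def read_accept_dad(interface: str, root: str = "/proc/sys") -> int:
    """Read net.ipv6.conf.<interface>.accept_dad via the proc filesystem."""
    path = Path(root) / "net" / "ipv6" / "conf" / interface / "accept_dad"
    return int(path.read_text().strip())


def describe_accept_dad(value: int) -> str:
    """Return a human-readable description of an accept_dad value."""
    return ACCEPT_DAD.get(value, f"unknown value {value}")
```

On a node from the story above one would call `describe_accept_dad(read_accept_dad("cni0"))`; the `root` parameter only exists so the function can be pointed at a test directory.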
To contribute to such an understanding was the exact point of this post ;-). Thank you for reading so far, and as always I’m happy to receive feedback on any channel incl. Twitter.