Despite DNSSEC, DoH and DoT being available to secure DNS lookups, many systems still rely on plain old DNS from 1983. Earlier this year, we were part of a larger team that audited c-ares v1.19.0. c-ares is an asynchronous DNS client library with support for a wide range of platforms. It has been around for quite some time, and a few of its more prominent users include libcurl, node.js and Wireshark.
In this post, we delve into one specific outcome of our work, namely the weak DNS query ID generation in c-ares identified as CVE-2023-31147.
Even though plain DNS does not include any cryptographic measures for authenticity, DNS queries rely on two properties to be more resilient against forged answers [1]: a randomly chosen source port and a random 16-bit query ID.
In the good old days, this was not the default. For instance, DNS resolvers used fixed source ports, allowing attackers to focus solely on correctly predicting the 16-bit query ID. The practical exploitation of this vulnerability was most prominently shown by Dan Kaminsky in 2008 through the publication of CVE-2008-1447. Thus, selecting the source port and query ID using a cryptographically secure random number generator (CSPRNG) is crucial: It makes it quite hard for attackers to construct and inject a malicious response before the real response arrives at the client.
When auditing a DNS protocol implementation like c-ares, we always check if these mitigations are properly implemented. In the context of this blog post, we primarily focus on random DNS query identifiers, as source port randomization is nowadays the default and is managed by the OS [2].
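To put rough numbers on why both values matter, the sketch below estimates an off-path attacker's chance of hitting the right combination with a given number of forged responses in a single race. It assumes both values are chosen uniformly at random and that every forged packet tries a distinct combination; the function name and parameters are illustrative and not part of c-ares.

```c
#include <assert.h>
#include <stdint.h>

/* Probability that one of `forged` distinct guesses matches, given
 * `id_space` possible query IDs and `port_space` possible source ports,
 * both assumed to be picked uniformly at random. */
static double spoof_success_probability(uint64_t forged,
                                        uint64_t id_space,
                                        uint64_t port_space)
{
    double combos;
    double p;

    assert(id_space > 0 && port_space > 0);

    combos = (double)id_space * (double)port_space;
    p = (double)forged / combos;
    return p > 1.0 ? 1.0 : p;
}
```

With a fixed source port (port_space = 1) and 65536 possible IDs, 1000 forged responses succeed with a probability of about 1.5%. Assuming 65536 equally likely source ports on top of that, the same 1000 packets succeed with a probability of roughly 2.3e-7.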
The high-level design of DNS query ID generation in c-ares appears fairly simple to users: upon initialization with ares_init(ares_channel *channelptr), c-ares collects random bytes from the OS's CSPRNG. These bytes serve as the seed for an internal CSPRNG. This internal CSPRNG is then used to generate the 16-bit DNS query ID for each individual query. The CSPRNG state is stored in the opaque type ares_channel and updated every time a DNS query ID is generated through ares__generate_new_id(...).
Once we look at the code more closely, we realize that not everything is well designed. First, we notice that DNS query IDs are generated using a pseudorandom number generator (PRNG) based on RC4.
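As a rough illustration of the idea, and not the verbatim c-ares source, an RC4-based generator for 16-bit IDs can look like the sketch below. The names rc4_state, rc4_next_byte and next_query_id are made up for this example, and the state is assumed to have been set up by a key schedule beforehand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state layout: a 256-byte array plus two indices. */
typedef struct {
    uint8_t state[256];
    uint8_t x;
    uint8_t y;
} rc4_state;

/* One step of the standard RC4 output generation. */
static uint8_t rc4_next_byte(rc4_state *s)
{
    uint8_t tmp;

    assert(s != NULL);

    s->x = (uint8_t)(s->x + 1);
    s->y = (uint8_t)(s->y + s->state[s->x]);

    /* Swap state[x] and state[y]. */
    tmp = s->state[s->x];
    s->state[s->x] = s->state[s->y];
    s->state[s->y] = tmp;

    return s->state[(uint8_t)(s->state[s->x] + s->state[s->y])];
}

/* Build a 16-bit query ID from two consecutive keystream bytes. */
static uint16_t next_query_id(rc4_state *s)
{
    uint16_t hi = rc4_next_byte(s);
    uint16_t lo = rc4_next_byte(s);

    return (uint16_t)((hi << 8) | lo);
}
```

Every ID is a pure function of the current state, so anyone who can reconstruct the state can predict all future IDs. That is why the quality of the seeding matters so much.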
In case you’re too young or it’s been a while: RC4 is a stream cipher designed in 1987 that gained widespread usage over the years. However, it has been shown to be flawed and insecure multiple times since. It is also the reason why the Wi-Fi protocols WEP and WPA-TKIP were both famously broken. So it is not a cipher you’d use in 2023.
Using RC4 as a cryptographically secure PRNG (CSPRNG) was also quite common some time ago: it was the core of arc4random(3), which originated in OpenBSD and is nowadays part of all BSD descendants, including Apple’s macOS and iOS. However, with the discovery of more and more ways in which the RC4 key stream is biased and not properly random, it became clear that RC4 was unsuitable for use as a CSPRNG. As a result, today’s arc4random(3) implementations use CSPRNGs based on ChaCha20 or AES. [3]
Shifting our focus back to c-ares, there is more to consider: while functions like arc4random(3) did ensure that new entropy is added after a certain amount of random bytes has been generated, c-ares takes a different approach. It simply seeds the PRNG once and uses it throughout the entire lifespan of the ares_channel. This may not be a concern for tools like adig (the c-ares version of dig), since the process never lives long enough. Services that run for a much longer time and perform a lot of DNS queries might be a different story, especially if such a service initializes a single ares_channel upon startup and uses it until it is stopped. While this is probably not really an issue with c-ares, it is common practice to reseed the PRNG after generating a certain amount of bytes.
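Such a reseed policy can be as simple as counting output bytes. The following is a minimal sketch under stated assumptions: reseed_budget, reseed_due and the RESEED_AFTER_BYTES limit are hypothetical names and values chosen for illustration, and fetching the fresh entropy itself is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

#define RESEED_AFTER_BYTES 4096u /* arbitrary illustrative budget */

typedef struct {
    size_t produced; /* bytes handed out since the last (re)seed */
} reseed_budget;

/* Account for `len` bytes that are about to be generated.
 * Returns 1 if the caller should pull fresh entropy from the OS first
 * (the budget is restarted in that case), 0 otherwise. */
static int reseed_due(reseed_budget *b, size_t len)
{
    int due = 0;

    assert(b != NULL);

    if (b->produced + len > RESEED_AFTER_BYTES) {
        b->produced = 0;
        due = 1;
    }
    b->produced += len;
    return due;
}
```

A caller that generates 16-bit IDs would call reseed_due(&budget, 2) before each ID and refresh the seed whenever it returns 1.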
With our interest piqued, we can look more closely at how RC4 is seeded. This is done in the function init_id_key(rc4_key* key,int key_data_len), which receives a pointer to the rc4_key type stored in the ares_channel and a number dubbed key_data_len.
We notice that the buffer key_data_ptr is actually useless, since it is never populated with anything other than zero bytes. So we can ignore it in the calculation of index2 in line 25. Furthermore, the 256-byte buffer key->state[] is filled with the numbers from 0 to 255 and then handed to a function randomize_key(...), which one would assume shuffles the contents of key->state[]. Also notice that we hand key_data_len to this function. Digging through the header files, we find that its value is always ARES_ID_KEY_LEN, which is 31. So this is not the length of key->state[], which is 256 bytes.
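The behavior described above can be reproduced with a small model. This is a sketch reconstructed from the description, not the verbatim c-ares source: flawed_key_schedule, is_permutation and the os_random parameter (standing in for whatever randomize_key(...) delivers) are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define KEY_DATA_LEN 31 /* value of ARES_ID_KEY_LEN mentioned above */

/* `os_random` models the KEY_DATA_LEN bytes written by randomize_key(). */
static void flawed_key_schedule(unsigned char state[256],
                                const unsigned char os_random[KEY_DATA_LEN])
{
    unsigned char key_data[KEY_DATA_LEN] = { 0 }; /* stays all zero */
    unsigned char index1 = 0, index2 = 0, tmp;
    int counter;

    assert(state != NULL && os_random != NULL);

    for (counter = 0; counter < 256; counter++)
        state[counter] = (unsigned char)counter;

    /* The random bytes overwrite state[0..30] instead of filling key_data. */
    memcpy(state, os_random, KEY_DATA_LEN);

    for (counter = 0; counter < 256; counter++) {
        /* key_data[index1] is always 0 and contributes nothing. */
        index2 = (unsigned char)(key_data[index1] + state[counter] + index2);
        tmp = state[counter];
        state[counter] = state[index2];
        state[index2] = tmp;
        index1 = (unsigned char)((index1 + 1) % KEY_DATA_LEN);
    }
}

/* Returns 1 if `state` contains every byte value exactly once. */
static int is_permutation(const unsigned char state[256])
{
    unsigned char seen[256] = { 0 };
    int i;

    for (i = 0; i < 256; i++) {
        if (seen[state[i]])
            return 0;
        seen[state[i]] = 1;
    }
    return 1;
}
```

In this model, all variation comes from the 31 bytes copied into the state. Because those bytes may repeat values that are already present, the resulting state is not guaranteed to be a permutation of 0 to 255, which standard RC4 assumes.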
This raises the question of where the true randomness is fetched from the OS. Looking at the code above, randomize_key(...) is the only sensible candidate.
Before delving into that, let’s briefly compare this with how RC4 is implemented in OpenSSL.
We’ll notice that init_id_key(...) in c-ares is slightly different and actually broken: OpenSSL’s key schedule implementation receives the raw key via the data buffer, initializes the state buffer d and then shuffles d depending on data.
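For reference, the textbook RC4 key schedule that this description corresponds to can be sketched as follows. This is a generic illustration and not OpenSSL's actual source; rc4_ctx and rc4_set_key are names chosen for this example, with d and data mirroring the buffers mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t d[256]; /* state buffer, always a permutation of 0..255 */
    uint8_t x, y;   /* output indices, reset by the key schedule */
} rc4_ctx;

/* Textbook RC4 key schedule: `data` is the raw key of `len` bytes. */
static void rc4_set_key(rc4_ctx *ctx, const uint8_t *data, size_t len)
{
    uint8_t j = 0, tmp;
    int i;

    assert(ctx != NULL && data != NULL && len > 0);

    /* Start from the identity permutation. */
    for (i = 0; i < 256; i++)
        ctx->d[i] = (uint8_t)i;

    for (i = 0; i < 256; i++) {
        /* The key bytes steer every single swap. */
        j = (uint8_t)(j + ctx->d[i] + data[(size_t)i % len]);
        tmp = ctx->d[i];
        ctx->d[i] = ctx->d[j];
        ctx->d[j] = tmp;
    }
    ctx->x = 0;
    ctx->y = 0;
}
```

The important detail is that the secret key lives in data and only influences which positions of d are swapped, so d remains a permutation at all times.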
In c-ares, key_data_ptr is OpenSSL’s data buffer and state is the equivalent of pointer d. Knowing this, we can see that c-ares confused key->state with key_data_ptr when calling randomize_key(...), which we assume retrieves a random key. Consequently, the 256 ARES_SWAP_BYTE(...) operations in c-ares’ key schedule incorrectly depend on key->state only, but not on key_data_ptr.
Without going into all the math details here, this definitely looks worse than the original RC4 key schedule, as it likely results in fewer possible permutations of key->state[]. We can assume that this degrades the quality of the random numbers it generates.
Finally, let’s take a closer look at randomize_key(...).
At first glance, this appears okay: on WIN32 systems, RtlGenRandom() is used to query key_data_len bytes of randomness from the OS and place them into key, which is key->state[] in the caller init_id_key(...). On non-WIN32 targets, we either open the file path of CARES_RANDOM_FILE and read key_data_len bytes from there, or fall back to using rand() to get the same amount of random numbers. For now, we assume that CARES_RANDOM_FILE is set to /dev/urandom or /dev/random.
An initial observation here is that the fallback relies on rand(3), which is not designed for generating cryptographically secure random numbers. Since it is only a fallback if nothing else works, it is better than return 4;, but it would be a better choice to first try arc4random(3) on *BSD or getrandom(2) on Linux, just in case.
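On Linux, asking the kernel directly could look roughly like the sketch below. It assumes glibc 2.25 or newer, which provides the getrandom(2) wrapper in sys/random.h; os_random_bytes is a hypothetical helper name and not a c-ares function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/random.h> /* getrandom(2), assumes glibc >= 2.25 */

/* Fill `buf` with `len` bytes from the kernel CSPRNG.
 * Returns 0 on success, -1 on failure (errno is set by getrandom). */
static int os_random_bytes(unsigned char *buf, size_t len)
{
    size_t done = 0;

    assert(buf != NULL);

    while (done < len) {
        ssize_t n = getrandom(buf + done, len - done, 0);

        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal, try again */
            return -1;
        }
        done += (size_t)n;
    }
    return 0;
}
```

On the BSDs, arc4random_buf(3) would play the same role. Either way, no file path has to be configured at build time.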
More concerning, though, is the absence of any srand(3) call in the whole source, which would seed the PRNG used by rand(3). Without it, rand(3) will output the same sequence of numbers every single time! This means that all our DNS query IDs will be fully predictable every time we end up in this fallback case.
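The determinism is easy to demonstrate. The C standard specifies that calling rand() before any srand() behaves as if srand(1) had been called, so the sketch below models the fallback with a hypothetical fallback_fill helper (not the c-ares function) that draws every byte from rand().

```c
#include <assert.h>
#include <stdlib.h>

/* Model of the fallback described above: fill `key` from rand(). */
static void fallback_fill(unsigned char *key, int len)
{
    int i;

    assert(key != NULL && len >= 0);

    for (i = 0; i < len; i++)
        key[i] = (unsigned char)(rand() % 256);
}
```

Calling srand(1) followed by fallback_fill(...) yields the same bytes on every invocation, which is exactly the situation of a process that never calls srand(3) at all.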
Looking at the CARES_RANDOM_FILE case, it becomes evident that any fread(3) error is silently ignored and again results in rand(3) being used. While c-ares does the best it can in this case, it probably shouldn’t fail silently. At least a few sysadmins would want to know that their DNS queries are all predictable due to some configuration issue.
Finally, there is one more thing in the above code. Whenever CARES_RANDOM_FILE is not set, it automatically falls back to using rand(3) for seeding the RC4 PRNG. Why this can be a problem becomes apparent when we look at the Autotools configure.ac file:
    dnl Check for user-specified random device
    AC_ARG_WITH(random,
      AS_HELP_STRING([--with-random=FILE],
                     [read randomness from FILE (default=/dev/urandom)]),
      [ CARES_RANDOM_FILE="$withval" ],
      [
        dnl Check for random device. If we're cross compiling, we can't
        dnl check, and it's better to assume it doesn't exist than it is
        dnl to fail on AC_CHECK_FILE or later.
        if test "$cross_compiling" = "no"; then
          AC_CHECK_FILE("/dev/urandom", [ CARES_RANDOM_FILE="/dev/urandom"] )
        else
          AC_MSG_WARN([cannot check for /dev/urandom while cross compiling; assuming none])
        fi
      ]
    )
As we can see, the existence of /dev/urandom is determined at compile time. This will likely break in cross-compile situations, where the check is skipped and no random file is assumed unless --with-random is given. We’ll then always use the fallback case with the RC4 PRNG seeded by rand(3)! Luckily, c-ares also offers CMake as a build system, which is used, for example, by the Yocto meta-oe recipe for c-ares, so not all is lost. Nevertheless, the check for the existence of /dev/urandom should probably be done at runtime instead of at compile time.
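A runtime check that also reports failures could look like the following minimal sketch. seed_from_file is a hypothetical helper and not part of c-ares; it assumes the caller passes a path such as /dev/urandom and decides what to do when seeding fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Read `len` seed bytes from `path`, checked at runtime.
 * Returns 0 on success, -1 if the file cannot be opened or the read is
 * short, so the caller can report the problem instead of silently
 * degrading to a weaker source. */
static int seed_from_file(const char *path, unsigned char *buf, size_t len)
{
    FILE *f;
    size_t got;

    assert(path != NULL && buf != NULL);

    f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    got = fread(buf, 1, len, f);
    fclose(f);

    return got == len ? 0 : -1;
}
```

With a return value like this, a library could surface the error during channel initialization rather than continuing with predictable IDs.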
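Combining all these issues, we now know that:
- DNS query IDs are generated by an RC4-based PRNG that is seeded once and never reseeded.
- The RC4 key schedule is implemented incorrectly.
- The fallback seeds RC4 via rand(3), which is itself never seeded, and fread(3) errors on CARES_RANDOM_FILE silently trigger this fallback.
- Cross-compiling with Autotools without /dev/urandom on the host will result in the fallback case being used, even if /dev/urandom would be available on the target.

This means that DNS query IDs generated by c-ares are not fully random, raising the likelihood of query IDs becoming completely predictable. Consequently, an attacker’s search space for the tuple of source port and DNS query ID is smaller, which makes a successful attack more likely. It basically brings us closer to the good old days of CVE-2008-1447, when source port randomization was not used by default. :-)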
After we reported this issue, the c-ares maintainers published v1.19.1, which fixes this problem (fix commit) and other recent vulnerabilities. So be sure to update c-ares to the latest version!
[1] RFC 5452 lists all measures DNS implementations should take to be more resilient against response forgery.
[2] Of course, we also checked whether the c-ares code base does anything special with respect to selecting the source port.
[3] Shout-out to DragonFlyBSD, who also managed this just recently in 2023! ;-)
Publish date: 22.08.2023
Category: cryptography
Authors: David Gstir
sigma star gmbh