Linux Kernel 5.17.9 is out

A new stable Linux kernel version is out. Let's look at what has changed from a cybersecurity perspective.

Summary of changes

From a security standpoint, this release brings a series of changes related to the randomization of network port numbers. But what kind of security can port numbers provide, one might wonder, if they are visible in every network packet? Hang tight, we'll explain shortly.

A little theory

Network port numbers are simply a pair of numbers carried in Internet packets (usually TCP or UDP): the source port number and the destination port number. In an outgoing packet, the destination port is determined by the network service we are talking to (e.g. 22 for SSH), while the source port can be anything and usually looks random. Such a port is commonly called an ephemeral port. But how should we pick this number? The first common-sense rule is that different connections to the same destination IP and port must use different source port numbers. This matters for both endpoints (server and client); otherwise they cannot tell which connection a given TCP/UDP packet belongs to. Strictly speaking, each transport connection is uniquely identified by the so-called Five-Tuple structure:

Five-Tuple: (protocol, source IP address, source port, destination IP address, destination port)

For instance, one can distinguish the following two TCP connections:

  • 192.168.100.1:5566 -> 192.168.100.2:22
  • 192.168.100.1:5567 -> 192.168.100.2:22

despite the fact that they run between the same machines and both carry SSH traffic.
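As a minimal sketch of the idea, the two connections above can be stored in a table keyed by the five-tuple. The `FiveTuple` type, the `connections` table and the `lookup` helper are hypothetical names chosen for illustration; a real network stack uses its own internal structures.

```python
from typing import NamedTuple, Optional


class FiveTuple(NamedTuple):
    """Hypothetical illustration type: the five values that identify a flow."""
    protocol: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


# The two SSH connections from the example above.
conn_a = FiveTuple("TCP", "192.168.100.1", 5566, "192.168.100.2", 22)
conn_b = FiveTuple("TCP", "192.168.100.1", 5567, "192.168.100.2", 22)

# A connection table keyed by the five-tuple keeps the two flows apart,
# even though only the source port differs.
connections = {conn_a: "ssh session 1", conn_b: "ssh session 2"}


def lookup(protocol: str, src_ip: str, src_port: int,
           dst_ip: str, dst_port: int) -> Optional[str]:
    """Return the session a packet belongs to, or None if no flow matches."""
    return connections.get(FiveTuple(protocol, src_ip, src_port, dst_ip, dst_port))


print(lookup("TCP", "192.168.100.1", 5566, "192.168.100.2", 22))
print(lookup("TCP", "192.168.100.1", 5567, "192.168.100.2", 22))
```

If both connections used the same source port, the two keys would collide and neither endpoint could tell the flows apart.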

So we now know that source port numbers should not be constant: they need to be somewhat random and must not repeat. But are there any other requirements? Spoiler: yes. In essence, they must be hard to guess. But how hard?

Off-path attacks on networks

This is a special class of network attacks in which the attacker cannot see the network traffic (i.e. cannot capture packets) but can influence it indirectly (i.e. can inject packets). They are also called blind attacks and are usually conducted using side-channel inference. A simple example is an adversary injecting a packet into the transmission so that it looks like a legitimate server response, while this "reply" packet contains something that can harm the client software, e.g.:

  • RST flag that simply terminates the connection
  • carefully-crafted malformed content that may crash the software or exploit a vulnerability

Securely generated random port numbers

To inject a packet without seeing the existing data flow, an attacker needs to guess all five components of the connection. The protocol number is the easiest part: it is usually TCP or UDP. Host addresses can often be guessed through social engineering or indirect knowledge. The destination port number usually comes from the list of registered protocols (22 for SSH, 443 for HTTPS, etc.). The source port number is the hardest part. But if it is not generated with a cryptographically strong random function, it too can be guessed, even in an automated way.
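A rough back-of-the-envelope sketch shows why unpredictability matters. It assumes the ephemeral range 32768 to 60999 (a common Linux default for `net.ipv4.ip_local_port_range`, which may differ on a given system) and an attacker who already knows the other four components and tries each candidate port once. The helper name is hypothetical.

```python
# Assumed ephemeral range; check net.ipv4.ip_local_port_range on a real system.
LOW, HIGH = 32768, 60999
num_ports = HIGH - LOW + 1


def expected_guesses(candidates: int) -> float:
    """Average number of tries to hit one uniformly random value
    when every candidate is tried at most once."""
    return (candidates + 1) / 2


# Unpredictable source port: the attacker has to sweep the whole range.
print(num_ports, expected_guesses(num_ports))

# Predictable source port (e.g. previous port + 1 is known): one guess is enough.
print(expected_guesses(1))
```

The gap between thousands of injected packets and a single one is what a hard-to-guess source port buys.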

Linux kernel mechanisms

To generate enough random data without degrading performance, the kernel developers chose the so-called Double-Hash Port Selection Algorithm. Without going too deep into the details, it is based on a pre-generated, fixed-size table of integers stored in memory, a hash function that takes the IP addresses, port numbers and a secret key as its input, and a loop that tries to obtain the next unoccupied ephemeral port from the table.

The hash function is the well-known keyed SipHash. It is very fast and at the same time strong enough to resist cryptographic attacks.
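The description above can be illustrated with a simplified sketch modeled on the double-hash scheme described in RFC 6056. It is not the kernel's code. Python's standard library has no public SipHash, so keyed BLAKE2b truncated to 64 bits is used as a stand-in; the table size, the port range and all names (`keyed_hash`, `select_port`, `in_use`) are assumptions made for illustration.

```python
import hashlib
import secrets
from typing import Optional, Set

TABLE_LENGTH = 16        # illustrative size, not the kernel's value
MIN_EPHEMERAL = 32768    # assumed ephemeral range
MAX_EPHEMERAL = 60999
NUM_EPHEMERAL = MAX_EPHEMERAL - MIN_EPHEMERAL + 1

# Pre-generated table of random integers plus two secret keys.
table = [secrets.randbelow(65536) for _ in range(TABLE_LENGTH)]
key_offset = secrets.token_bytes(16)
key_index = secrets.token_bytes(16)


def keyed_hash(key: bytes, local_ip: str, remote_ip: str, remote_port: int) -> int:
    """Stand-in for SipHash: keyed BLAKE2b with a 64-bit digest."""
    data = f"{local_ip}|{remote_ip}|{remote_port}".encode()
    digest = hashlib.blake2b(data, key=key, digest_size=8).digest()
    return int.from_bytes(digest, "big")


def select_port(local_ip: str, remote_ip: str, remote_port: int,
                in_use: Set[int]) -> Optional[int]:
    """Pick the next free ephemeral port for this destination, or None."""
    # First hash: a per-destination starting offset.
    offset = keyed_hash(key_offset, local_ip, remote_ip, remote_port)
    # Second hash: which table slot this destination shares.
    index = keyed_hash(key_index, local_ip, remote_ip, remote_port) % TABLE_LENGTH
    for _ in range(NUM_EPHEMERAL):
        port = MIN_EPHEMERAL + (offset + table[index]) % NUM_EPHEMERAL
        table[index] += 1          # advance the slot for the next attempt
        if port not in in_use:     # hypothetical "is this port free?" check
            return port
    return None                    # every port in the range is occupied


used: Set[int] = set()
first = select_port("192.168.100.1", "192.168.100.2", 22, used)
used.add(first)
second = select_port("192.168.100.1", "192.168.100.2", 22, used)
print(first, second)
```

In this sketch, successive connections to the same destination get consecutive ports starting from a secret offset, while an observer who does not know the keys cannot predict where the sequence starts.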

Summary of improvements

1. The full 64-bit output of SipHash is now used. Previously the kernel used only 32 bits of it, as it had done since the days when MD5 was used instead of SipHash. The more bits we use, the better distributed the resulting random number is.

2. Next, code was added that mixes some more entropy into the already randomly generated port numbers.

3. The pre-generated table of random numbers was enlarged, which requires more memory and therefore a different allocation strategy.

4. Finally, an unnecessary additional hash function was removed, as it was redundant and added no security.

Useful Links