What is VXLAN (RFC 7348)?
May 19, 2018
From RFC 7348:
VXLAN is a Layer 2 overlay scheme on a Layer 3 network.
To understand VXLAN better, let’s first look at subnetting and VLANs.
Let’s say we have a physical LAN with multiple hosts whose IPs are in the 10.1.2.0/24 network. Each host can talk to the other hosts using a switch alone. Now we want to group sets of hosts and separate the groups from each other. What are our options?
The first thought that comes to my mind is IP subnetting. It can prevent two hosts in different subnets from talking to each other unless we explicitly allow it using a router.
Say we now have two subnets, 10.1.2.0/25 and 10.1.2.128/25, where the two host ranges are 10.1.2.1 - 10.1.2.126 and 10.1.2.129 - 10.1.2.254. The broadcast address for the first subnet is 10.1.2.127 and for the second subnet it’s 10.1.2.255.
However, they still have the same layer 2 broadcast domain.
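If you want to double check the subnet arithmetic, here is a small sketch using Python's standard `ipaddress` module. The helper name `describe` is mine, not something from the post's setup.

```python
import ipaddress

# Split the /24 from the text into two /25 subnets.
network = ipaddress.ip_network("10.1.2.0/24")
subnets = list(network.subnets(prefixlen_diff=1))

def describe(subnet):
    """Return (first host, last host, broadcast address) as strings."""
    hosts = list(subnet.hosts())
    return str(hosts[0]), str(hosts[-1]), str(subnet.broadcast_address)

for subnet in subnets:
    first, last, broadcast = describe(subnet)
    print(f"{subnet}: hosts {first} - {last}, broadcast {broadcast}")

# Two hosts from the examples: they sit in different /25 subnets,
# so at layer 3 they need a router to reach each other.
a = ipaddress.ip_address("10.1.2.2")
b = ipaddress.ip_address("10.1.2.134")
same_subnet = any(a in s and b in s for s in subnets)
print("same subnet:", same_subnet)
```

Note that none of this says anything about layer 2: the switch never looks at these addresses.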
Consider the following example (in the gif below): even though the host (10.1.2.2) can’t ping a host in the other subnet (10.1.2.134), its broadcasts still reach the other subnet, since the layer 2 broadcast domain is the same and the switch floods every frame with destination FFFF.FFFF.FFFF to all ports, irrespective of the layer 3 source/destination IPs.
The behavior is the same in the presence of a router that allows the two subnets to talk to each other, since the switch operates at layer 2.
So how can we split the broadcast domains for the subnets? This is where VLANs come in: a VLAN gives each group of switch ports its own layer 2 broadcast domain.
Frames are tagged by the switch based on the port they arrive at. The VLAN (802.1Q) tag is 4 bytes long and is placed right before the type field in the Ethernet frame. It contains a 12 bit VLAN Identifier (VID) that identifies the VLAN the frame belongs to.
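To make the tag layout concrete, here is a minimal sketch that inserts and reads an 802.1Q tag with Python's `struct` module. The function names and the sample frame (including the source MAC) are made up for illustration; a real switch does this in hardware.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tagged frame

def add_vlan_tag(frame: bytes, vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Insert a 4-byte 802.1Q tag after the destination and source MACs."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VID must fit in 12 bits")
    # Tag control info: 3-bit priority, 1-bit drop eligible, 12-bit VID.
    tci = (pcp << 13) | (dei << 12) | vid
    tag = struct.pack("!HH", TPID_8021Q, tci)
    # Bytes 0-11 are the two MAC addresses; the original type field follows the tag.
    return frame[:12] + tag + frame[12:]

def read_vlan_id(frame: bytes):
    """Return the VID of a tagged frame, or None if the frame is untagged."""
    tpid, tci = struct.unpack("!HH", frame[12:16])
    return tci & 0x0FFF if tpid == TPID_8021Q else None

# Hypothetical frame: broadcast destination, made-up source MAC, IPv4 type field.
untagged = bytes.fromhex("ffffffffffff" "020000000001" "0800") + b"payload"
tagged = add_vlan_tag(untagged, vid=1)
print(read_vlan_id(tagged), read_vlan_id(untagged))
```

The 12 bit mask in `read_vlan_id` is exactly where the VLAN count limit comes from.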
In the example below, a group of hosts (including 10.1.2.3 and 10.1.2.135) are in the same vlan (vlan1) and the remaining hosts belong to a different vlan. As in the previous experiment, let’s ping the broadcast address (10.1.2.127) from the host 10.1.2.3. In this case, the packet is only broadcast to the hosts that belong to the same vlan, irrespective of the subnet. The host (10.1.2.135) will drop the packet, since the destination belongs to a different subnet and we don’t have a default gateway set. However, hosts in the other vlan don’t even receive the icmp request.
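The flooding rule can be modeled in a few lines. This is a toy sketch, not how a real switch is configured: the port numbers and the membership of the second vlan are my assumptions, only the host addresses come from the examples.

```python
# Assumed port-to-VLAN assignment on a toy four-port switch.
PORT_VLAN = {1: "vlan1", 2: "vlan1", 3: "vlan2", 4: "vlan2"}
# Hosts behind each port (port numbers are hypothetical).
PORT_HOST = {1: "10.1.2.3", 2: "10.1.2.135", 3: "10.1.2.2", 4: "10.1.2.134"}

def flood(ingress_port):
    """Ports that receive a broadcast frame arriving on ingress_port."""
    vlan = PORT_VLAN[ingress_port]
    return sorted(
        port for port, port_vlan in PORT_VLAN.items()
        if port_vlan == vlan and port != ingress_port
    )

# Broadcast sent by 10.1.2.3 (port 1): only vlan1 members see it,
# no matter which IP subnet they are in.
receivers = [PORT_HOST[port] for port in flood(1)]
print(receivers)
```

Whether a receiver then accepts or drops the packet is a layer 3 decision made by the host itself.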
Note that the 12 bit identifier limits the number of VLANs to 4096 (4094 usable, since VIDs 0 and 4095 are reserved). With the recent influx of virtualized environments this is not enough, and VLANs do not scale well to large multi-tenant topologies. Hence the proposal for VXLAN.
VXLAN stands for Virtual eXtensible Local Area Network. It’s a layer 2 in layer 3 overlay tunnel, more specifically an Ethernet in UDP tunnel. The idea of VXLAN is similar to VLAN, in that it provides logical separation of private networks across physical network boundaries.
Following up on the discussion above, let’s say we need to logically separate networks even further. In the presence of multiple tenants, we want to provide logical separation at layer 3 as well. Consider the network topology in the diagram below: networks A and B are part of the same private network, separated by an external network. A and P, and likewise B and Q, want to use the same subnet on the same physical network.
Firstly, in order to provide separation across physical networks, we need an underlay network. The only requirement on the underlay is that it provides IP connectivity between the two networks; VXLAN traffic is carried over it in UDP. Note that UDP port 4789 is assigned to VXLAN.
Secondly, in order to provide separation within the same physical network, we need something similar to VLAN to identify the two logically different but numerically identical IP subnets. This is where the VNI (VXLAN Network Identifier) from the VXLAN header is used, similar to the VID of a VLAN. The VNI is 24 bits wide, so it allows about 16 million segments instead of 4096.
A VXLAN transmission looks something like this:
VXLAN Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|R|R|R|I|R|R|R|            Reserved                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                VXLAN Network Identifier (VNI) |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Taken from RFC 7348
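The header is just 8 bytes, so it's easy to pack and unpack by hand. Below is a minimal sketch with `struct`; the function names are my own, and the VNI value 123 is the one used in the example later in this post.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" bit; all other flag bits are reserved

def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) followed by 24 reserved bits.
    # Word 2: VNI (24 bits) followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)

def unpack_vxlan_header(header: bytes) -> int:
    """Return the VNI, rejecting headers whose I flag is not set."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("I flag not set, VNI is not valid")
    return word2 >> 8

header = pack_vxlan_header(123)
print(header.hex())  # 0800000000007b00
print(unpack_vxlan_header(header))  # 123
```

The 24-bit check in `pack_vxlan_header` is the VXLAN counterpart of the 12 bit VID limit of a VLAN tag.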
The endpoint of the tunnel, the VTEP (VXLAN Tunnel End Point), forms the control plane of VXLAN (the overlay network). It maintains a mapping from the inner (VM) MAC address to the outer IP address of the VTEP behind which that MAC lives. A VTEP can learn these mappings through manual configuration, an SDN push, or multicast (source learning on traffic flooded over multicast).
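As a rough mental model, the mapping is a dictionary keyed by VNI and inner MAC. The sketch below is my own simplification (class name, MAC, and the 192.0.2.12 address are all placeholders), showing source learning and lookup.

```python
class VtepTable:
    """Maps (VNI, inner MAC) to the IP of the VTEP behind which that MAC lives."""

    def __init__(self):
        self._entries = {}

    def learn(self, vni, inner_src_mac, outer_src_ip):
        # Source learning: a decapsulated frame tells us which remote VTEP
        # the inner source MAC can be reached through.
        self._entries[(vni, inner_src_mac.lower())] = outer_src_ip

    def lookup(self, vni, inner_dst_mac):
        # None means "unknown destination"; a real VTEP would then flood,
        # for example to the multicast group of that VNI.
        return self._entries.get((vni, inner_dst_mac.lower()))

table = VtepTable()
# Placeholder values: a made-up MAC seen in VNI 123 from a made-up remote VTEP.
table.learn(123, "02:00:00:00:00:02", "192.0.2.12")
print(table.lookup(123, "02:00:00:00:00:02"))
print(table.lookup(456, "02:00:00:00:00:02"))
```

Keying on the VNI as well as the MAC is what keeps two tenants with overlapping addresses apart.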
The packet below shows a trace while sending a packet from A to B.
Assuming the mappings have already been learnt, while sending a packet from A: 10.0.0.1 to B: 10.0.0.2
- VTEP will look up the MAC address of 10.0.0.2 in its mapping and determine that it needs to go through the remote VTEP.
- VTEP then encapsulates the packet from 10.0.0.1 in a layer 3 packet with source IP as 192.168.56.11 and destination IP as the IP of the remote VTEP.
- It uses a dynamic source port (that allows for load balancing, etc.) and the reserved 4789 as the destination port.
- In addition, it also adds the VNI (123 in the example below) to distinguish between A and P.
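The steps above can be sketched in Python. This is a simplified model under a few assumptions of mine: the outer IP header is represented as plain fields instead of real bytes, the source port is derived from a CRC32 of the inner Ethernet header (any per-flow hash would do), the UDP checksum is left at zero, and the remote VTEP address 192.0.2.12 and the MACs are placeholders.

```python
import struct
import zlib

VXLAN_PORT = 4789

def source_port(inner_frame: bytes) -> int:
    """Derive a per-flow UDP source port in the dynamic range 49152-65535."""
    flow_hash = zlib.crc32(inner_frame[:14])  # hash of the inner Ethernet header
    return 49152 + flow_hash % 16384

def encapsulate(inner_frame, vni, local_vtep_ip, remote_vtep_ip):
    """Wrap an inner Ethernet frame in VXLAN and UDP headers."""
    vxlan_header = struct.pack("!II", 0x08 << 24, vni << 8)
    payload = vxlan_header + inner_frame
    # UDP header: source port, destination port, length, checksum (zero here).
    udp_header = struct.pack(
        "!HHHH", source_port(inner_frame), VXLAN_PORT, 8 + len(payload), 0
    )
    return {"src_ip": local_vtep_ip, "dst_ip": remote_vtep_ip,
            "udp": udp_header + payload}

# Placeholder inner frame from A to B with made-up MAC addresses.
frame = bytes.fromhex("020000000002" "020000000001" "0800") + b"inner ip packet"
packet = encapsulate(frame, 123, "192.168.56.11", "192.0.2.12")
sport, dport, length, checksum = struct.unpack("!HHHH", packet["udp"][:8])
print(sport, dport, length)
```

Because the source port depends only on the inner headers, all packets of one flow take the same path through the underlay while different flows can be spread across links.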
VXLAN is an important piece of the puzzle when it comes to scaling multi-tenant networks and providing network abstractions. Overlay networks are built upon this concept, and there are several open source projects that make it easier to deploy them in production. If you want to learn more, you can check out flannel, which is often used with Kubernetes to set up these overlay networks.
I hope you now have a slightly better understanding of where VXLAN belongs in the plethora of networking technologies. As always, I’d appreciate any feedback or comments.