Virtual NIC, bridging, NAT, KVM and IPv6 | segfault.segfault.digital

Index

Introduction
Overview
VNIC and bridge
VMs and TAP devices
NAT & port forwarding
IPv6
References

Introduction

I now have a full root server with a CPU that supports hardware virtualization.
The CPU isn't bad ("Intel(R) Core(TM) i3-2130 CPU @ 3.40GHz") and I have a LOT of RAM (16GB) and disk space (2 x 1TB disks) available => perfect for setting up VMs!

Why use VMs?
Well, on one hand I wanted to separate the mail server from the rest (the web server), so that if the web server got compromised the mail server would still have a chance of being OK (or vice versa). In such a case I would still have to detect the problem and react quickly, as a compromised VM would probably pose serious risks for the host and the other VMs as well.
On the other hand I liked the basic idea of VMs. Say I want to upgrade one of them: I can shut it down, copy the file containing its root filesystem on the host, start it up again and perform the upgrade, and if things go wrong I still have a full copy of the old version.
Or: right after making the copy, start both the original and the copy, and upgrade one of them while the other one keeps running (not for the mail server, though, otherwise new emails would end up only on the non-upgraded copy...).
And: it lowers the complexity of upgrades on the host itself, as the host runs only the basic services (firewall, virtualization), and it lowers the risk that a failed upgrade affects the whole system.
What did I want to do? Did it work?
- I wanted to have at least 2 VMs running: the web server on the first one and the mail server on the second one.
  Worked!
- Make the host and all the VMs reachable from the IPv4 world using only 1 public IPv4 address.
  Worked!
- Make every single VM reachable from the IPv6 world by directly assigning to each VM one of the thousands of public IPv6 addresses I have.
  Failed!
What did not work with IPv6?
I spent roughly fifty hours trying to assign one of the public IPv6 addresses to each VM and make it directly reachable from the outside world, but it just did not work. I first tried an "intelligent" approach and then, as time passed, blindly tried more and more combinations of different types of network bridges, internal routing, kernel parameters, and... nothing, absolutely no results. The VMs could be reached over IPv6 only from the host, never from the outside.
IPv6 plan B
In the end I gave up and used "haproxy" on the host to forward inbound IPv6 traffic internally to the right VM (based on the port of the incoming connection) over the internal IPv4 network. I could have used the IPv6 addresses of the VMs, but addressing them over IPv4 was easier, as it let me use in the VMs almost the same firewall rules that I already had for IPv4.
I know - quite disappointing.
I think I have read every post about IPv6 in the forums, blogs and mailing lists of the planet :o)
I assume that my problem is caused by some routing issue: IPv6 with its semi-automatic configuration is very nice when it works, but becomes a nightmare when it doesn't.

This article is just an example meant to put you on the right track if you get stuck somewhere while doing something similar.

Please, if after reading this guide you think you know what kind of problem I had with IPv6, let me know :o). I will leave it like this for the time being and give it one more try with another server, or once my Internet provider is ready to give me an IPv6 address.

Overview

The whole setup is probably best described with a diagram:
[Diagram: virtual machines overview]

Important: I'm not good with networking, firewalls & Co. Some details in this article might be wrong, so review it critically before applying the configuration. Use this information at your own risk.

VNIC and bridge

[Diagram: virtual machines VNIC]

Why use a VNIC (Virtual Network Interface Card)?
You normally create a bridge by bridging the physical NIC (e.g. eth0).
This works perfectly, but one problem is that a mistake made while setting up the bridge and moving the IP configuration of the physical NIC to the bridge leaves the server unreachable.
This is not a problem when you have physical access to the server (e.g. at home), but when you access the server remotely you'll have to force a hardware reboot of the server (hopefully your provider supports it) to regain access to it (and hopefully your new configuration isn't permanent, otherwise you'll have to reboot into some rescue mode).

Because of this risk, and because I anyway have only 1 public IPv4 address (meaning that the VMs can't use public IP addresses in any case, as the only one available is already used by the host), I decided to use the VETH functionality of the kernel and create a brand new virtual Ethernet card that is not directly connected to the physical NIC, so that if I mess up veth0, eth0 will still be reachable.

Enabling and using VETH
First of all, enable support for virtual NICs in the kernel under "Device Drivers => Network device support => Network core driver support => Virtual ethernet pair device" (CONFIG_VETH), then compile the new kernel and reboot the server.
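Before rebooting it can be worth checking that the options really ended up in the kernel configuration. Below is a minimal bash sketch that looks for options in a kernel ".config" file. The path /usr/src/linux/.config is an assumption (the usual location on Gentoo), and besides CONFIG_VETH I also check CONFIG_BRIDGE and CONFIG_TUN, which the bridge and the TAP devices used later need but which this article doesn't mention explicitly.

```shell
#!/bin/bash
# Sketch: check that kernel options are enabled, either built in (=y) or as a
# module (=m), in a kernel .config file. Assumed usual path on Gentoo:
# /usr/src/linux/.config (pass another file if yours lives elsewhere).

# kconfig_enabled FILE OPTION -> exit status 0 if OPTION is =y or =m in FILE
kconfig_enabled() {
    local file=$1 opt=$2
    grep -Eq "^${opt}=(y|m)$" "$file"
}

# check_kconfig FILE OPTION... -> prints one status line per option and
# returns non-zero if at least one option is missing
check_kconfig() {
    local file=$1; shift
    local opt missing=0
    for opt in "$@"; do
        if kconfig_enabled "$file" "$opt"; then
            echo "ok      $opt"
        else
            echo "MISSING $opt"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `check_kconfig /usr/src/linux/.config CONFIG_VETH CONFIG_BRIDGE CONFIG_TUN`. Disabled options appear in .config as "# CONFIG_X is not set", which the pattern correctly treats as missing.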

A quick look with "ifconfig -a" shows the network interfaces that you've got. In my case I see the following:
====================
eth0      Link encap:Ethernet HWaddr e0:cb:4d:7c:ac:7d
          inet addr:37.59.3.136 Bcast:37.59.3.255 Mask:255.255.255.0
          inet6 addr: fe80::e2cb:4eff:fe8c:ac7d/64 Scope:Link
          inet6 addr: 2001:41d0:8:5688::1/64 Scope:Global
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:19278 errors:0 dropped:0 overruns:0 frame:0
          TX packets:21872 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:8764756 (8.3 MiB) TX bytes:16239230 (15.4 MiB)
          Interrupt:20 Memory:fe500000-fe520000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:8771 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8771 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:7207728 (6.8 MiB) TX bytes:7207728 (6.8 MiB)
====================

If you now create a pair of virtual network interfaces with the command "ip link add type veth" and then check again with "ifconfig -a", you should see a pair of new "vethX" interfaces, in my case veth0 and veth1:
====================
eth0      Link encap:Ethernet HWaddr e0:cb:4d:7c:ac:7d
          inet addr:37.59.3.136 Bcast:37.59.3.255 Mask:255.255.255.0
          inet6 addr: fe80::e2cb:4eff:fe8c:ac7d/64 Scope:Link
          inet6 addr: 2001:41d0:8:5688::1/64 Scope:Global
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:19278 errors:0 dropped:0 overruns:0 frame:0
          TX packets:21872 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:8764756 (8.3 MiB) TX bytes:16239230 (15.4 MiB)
          Interrupt:20 Memory:fe500000-fe520000

veth0     Link encap:Ethernet HWaddr fa:f9:fe:bb:3f:37
          BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

veth1     Link encap:Ethernet HWaddr 32:5b:ca:9b:9d:b2
          BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
====================

The original purpose of VETH is to connect the two interfaces of a pair like a virtual patch cable: whatever is sent into one of them comes out of the other one.
I am misusing this and use only one of the two as a normal NIC.

The last thing to do is to assign it a fixed, made-up MAC that I know and to set it to promiscuous mode (for the bridge that I'll set up later) with the following commands:
====================
ifconfig veth0 hw ether DE:AD:BE:EF:24:31
ifconfig veth0 0.0.0.0 promisc up
====================
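Not every made-up MAC is a good choice: in the first octet, bit 0x01 marks a multicast address (which a NIC must not have) and bit 0x02 marks a "locally administered" address (which can't clash with a vendor-assigned one). DE:AD:BE:EF:24:31 starts with 0xDE (binary 1101 1110), so it is unicast and locally administered. Here is a small bash sketch to check an address before using it; the function name is my own, not from any tool.

```shell
#!/bin/bash
# Sketch: return 0 if MAC is a valid unicast, locally administered address.
# First octet: bit 0x01 = multicast (must be 0), bit 0x02 = locally
# administered (should be 1 for invented addresses).

is_local_unicast_mac() {
    local mac=$1
    # Six colon-separated pairs of hex digits
    [[ $mac =~ ^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$ ]] || return 1
    local first=$(( 16#${mac:0:2} ))
    (( (first & 0x01) == 0 && (first & 0x02) != 0 ))
}
```

For example, `is_local_unicast_mac DE:AD:BE:EF:24:31` succeeds, while the vendor MAC of eth0 above (e0:cb:4d:7c:ac:7d) fails because its "locally administered" bit is not set. The kernel-generated veth MACs shown above (fa:f9:..., 32:5b:...) pass the check too.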

Almost done!

Setting up the bridge

I now create a bridge called "br0" with...
brctl addbr br0
...then, just to be on the safe side, I assign to the bridge the same MAC that I used for veth0 (you might first have to run "ifconfig br0 down" if it complains)...
ifconfig br0 hw ether DE:AD:BE:EF:24:31
...then attach the virtual network interface to the bridge...
brctl addif br0 veth0
...then turn off STP (Spanning Tree Protocol), which makes the bridge periodically send "hello" frames to the network (only do this if you have just 1 bridge, as STP exists to prevent network loops; on Linux it is normally already off for a newly created bridge, so this is mostly a safeguard)...
brctl stp br0 off
...and finally assign the IPv4 configuration of an internal network to the bridge and activate it:
ifconfig br0 192.168.0.1 netmask 255.255.255.0 up
That's it: you should now be able to ping that IPv4 address (192.168.0.1) from the host.
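I used ifconfig and brctl, but the same veth and bridge setup can also be done with iproute2's "ip" command alone. The following is a sketch of what I believe are the equivalent commands (not what I ran myself), using the names, MAC and address from above. The "run" helper is hypothetical: with DRY_RUN=1 it only prints the commands, which is handy for double-checking them before touching a remote server.

```shell
#!/bin/bash
# Sketch: veth0 + br0 setup with iproute2 instead of ifconfig/brctl.
# run() is a hypothetical helper: DRY_RUN=1 prints instead of executing.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_veth_bridge() {
    local veth=veth0 peer=veth1 bridge=br0
    local mac=DE:AD:BE:EF:24:31 addr=192.168.0.1/24

    # Create the pair with explicit names instead of letting the kernel pick them
    run ip link add "$veth" type veth peer name "$peer"
    run ip link set dev "$veth" address "$mac"
    run ip link set dev "$veth" promisc on
    run ip link set dev "$veth" up

    # Bridge with the same MAC, STP off, veth0 as a port, internal IPv4 address
    run ip link add name "$bridge" type bridge
    run ip link set dev "$bridge" address "$mac"
    run ip link set dev "$bridge" type bridge stp_state 0
    run ip link set dev "$veth" master "$bridge"
    run ip addr add "$addr" dev "$bridge"
    run ip link set dev "$bridge" up
}
```

Usage: `DRY_RUN=1 setup_veth_bridge` to review the commands, then run `setup_veth_bridge` as root to apply them.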

VMs and TAP devices

[Diagram: virtual machines and TAP]

Now that we have a working bridge, we can make KVM use it.
We need 3 things: a script to start the VM, a script to create a TAP interface whenever a VM starts, and another script to destroy the TAP interface when the VM goes down.

As an example, the script I use to start the VMs looks like this in my case:
=====================
#!/bin/bash
VMNAME="test"

PIDPATH=/tmp/vm-amd64-gentoo-"$VMNAME".pid

# The command is stored in an array so that every argument (e.g. the kernel
# command line passed to -append) reaches qemu-kvm exactly as written
KVM_CMD=(qemu-kvm
        -nographic -daemonize
        -balloon none
        -pidfile "$PIDPATH"
        -drive file=/mnt/vm/images/amd64-gentoo-test/rootfs.img,if=virtio,cache=writeback
        -drive file=/mnt/vm/images/amd64-gentoo-test/swapfile.img,if=virtio,cache=writeback
        -m 1024 -smp 2
        -kernel /somewhere/mykernel
        -append "root=/dev/vda"
        -net nic,model=virtio,vlan=0,macaddr=DE:AD:BE:EF:32:10
        -net tap,vlan=0,script=/myscripts/tap-create.sh,downscript=/myscripts/tap-destroy.sh)

# Refuse to start if the PID file exists and that process is still alive
if [ -f "$PIDPATH" ] && kill -0 "$(cat "$PIDPATH")" 2>/dev/null
then
        echo "It looks like the VM is already running!"
        exit 1
fi

echo "About to run the following command..."
echo "${KVM_CMD[@]}"
echo ""
echo "...to start the VM $VMNAME."
"${KVM_CMD[@]}"
echo ""
=====================
(you might want to replace "-nographic -daemonize" with "-vga std" until everything works, so that you can see what's going on while the VM starts)
(the MAC shown above, "DE:AD:BE:EF:32:10", is another made-up one, different from the one I assigned to veth0)
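Every VM needs its own MAC. Instead of inventing one by hand for each VM, a MAC can be derived from the VM name. The following is a sketch of a hypothetical helper (not part of my scripts) that keeps a DE:AD:BE prefix (so the address is always unicast and locally administered) and takes the last 3 octets from the MD5 hash of the name.

```shell
#!/bin/bash
# Sketch (hypothetical helper): derive a stable MAC from a VM name.
# Prefix DE:AD:BE is reused from this article; the last 3 octets are the
# first 6 hex digits of md5(name). Only 24 bits, so different names can
# still collide: check the result if you have many VMs.

vm_mac() {
    local name=$1 hash
    hash=$(printf '%s' "$name" | md5sum)
    printf 'DE:AD:BE:%s:%s:%s\n' "${hash:0:2}" "${hash:2:2}" "${hash:4:2}" | tr 'a-f' 'A-F'
}
```

In the start script it could be used as `macaddr=$(vm_mac "$VMNAME")` in the "-net nic" option; for example `vm_mac test` gives DE:AD:BE:09:8F:6B, and the same name always gives the same MAC, so DHCP leases and firewall rules keep working across restarts.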

As you can see above, I tell KVM to call 2 scripts: one to create the TAP interface when the VM starts ("/myscripts/tap-create.sh") and one to get rid of it when the VM shuts down ("/myscripts/tap-destroy.sh"). KVM passes the name of the TAP interface to both scripts as their first argument.
They're the standard scripts that KVM uses and you might already have them; if you don't, here is their content:

/myscripts/tap-create.sh
===================
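#!/bin/bash
set -x

switch=br0

# qemu-kvm calls this script with the name of the TAP interface as $1
if [ -n "$1" ];then
        tunctl -t "$1"
        ip link set "$1" up
        sleep 0.5s
        # Attach the TAP interface to the bridge so the VM joins the internal network
        brctl addif "$switch" "$1"
        exit 0
else
        echo "Error: no interface specified" >&2
        exit 1
fi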
===================

/myscripts/tap-destroy.sh
===================
#!/bin/bash
set -x

# qemu-kvm calls this script with the name of the TAP interface as $1
if [ -n "$1" ];then
        tunctl -d "$1"
        exit 0
else
        echo "Error: no interface specified" >&2
        exit 1
fi
===================

If you now start the VM and look at the output of "ifconfig", you should see that a new TAP interface has been created. KVM manages the interfaces and chooses on its own which name to use (e.g. tap0).
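To check that the new TAP interface was really attached to br0 (and not just created), look at the port list of the bridge: "brctl show br0" prints it, and the kernel also exposes it in sysfs, with one entry per port under /sys/class/net/br0/brif/. Below is a minimal bash sketch that reads sysfs; SYSFS_NET is an assumption of this sketch, only there so the function can be pointed at another directory (e.g. for testing).

```shell
#!/bin/bash
# Sketch: print the interfaces attached to a bridge (default br0) by reading
# /sys/class/net/<bridge>/brif/. SYSFS_NET overrides /sys/class/net.

bridge_ports() {
    local bridge=${1:-br0} base=${SYSFS_NET:-/sys/class/net}
    local dir=$base/$bridge/brif
    [ -d "$dir" ] || { echo "no such bridge: $bridge" >&2; return 1; }
    local port
    for port in "$dir"/*; do
        # The glob stays unexpanded when the bridge has no ports
        [ -e "$port" ] && echo "${port##*/}"
    done
    return 0
}
```

With a VM running, `bridge_ports br0` should list its TAP interface next to veth0.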

Et voilà! After setting a static IP in your VM (e.g. 192.168.0.23), you should be able to connect from your host to your VM and vice versa (to the IPv4 address of the bridge, 192.168.0.1), for example with SSH. Keep in mind that the login might seem stuck for ~20 seconds: when you log in, SSH tries to resolve your host name, and as the VM does not yet have any DNS connectivity to the outside world, the request needs a while to time out before the login can go on.
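If the login delay bothers you, the reverse lookup can be switched off on the SSH server inside the VM with the sshd_config option "UseDNS no" (OpenSSH 6.8 and later already default to "no"). Here is a bash sketch that sets it in a given sshd_config; the default path is the usual one and may differ on your system, and sshd has to be restarted afterwards (e.g. "/etc/init.d/sshd restart" on Gentoo).

```shell
#!/bin/bash
# Sketch: set "UseDNS no" in sshd_config (path as argument, usual default
# below) so sshd doesn't reverse-resolve connecting clients.

set_usedns_no() {
    local conf=${1:-/etc/ssh/sshd_config}
    if grep -Eq '^[#[:space:]]*UseDNS[[:space:]]' "$conf"; then
        # Replace the existing (possibly commented-out) UseDNS line(s)
        sed -i -E 's/^[#[:space:]]*UseDNS[[:space:]].*/UseDNS no/' "$conf"
    else
        # Prepend instead of appending: a line added after a "Match" block
        # would belong to that block
        local tmp
        tmp=$(mktemp) || return 1
        { echo 'UseDNS no'; cat "$conf"; } > "$tmp" && cat "$tmp" > "$conf"
        rm -f "$tmp"
    fi
}
```

Writing back with "cat" instead of "mv" keeps the original file's owner and permissions.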

NAT & port forwarding

[Diagram: virtual machines NAT and port forwarding]

You now have a virtual network card, a bridge using it and a VM connected to the bridge through its TAP device, but the VMs are still more or less isolated from the outside world.
To allow the VMs to connect to the Internet, you have to set up NAT (Network Address Translation) on the host.
To allow the Internet to connect to your VMs, you have to set up port forwarding.

DNS and DHCP-server

If you want, you can set up a DNS and DHCP server on the host by installing "dnsmasq".
Its configuration is very simple; in my case "/etc/dnsmasq.conf" looks like this:
=================
dhcp-range=192.168.0.100,192.168.0.150,72h
interface=br0
=================
Then start the service with:
/etc/init.d/dnsmasq start
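The port-forwarding rules in the next section point to fixed VM addresses, so it helps if DHCP always hands the same address to the same VM. dnsmasq can reserve an address for a MAC with "dhcp-host=<MAC>,<IP>"; such addresses don't have to be inside the dhcp-range, but they must be in the same subnet. Below is a sketch of a hypothetical generator for such a dnsmasq.conf: I assume the web VM is the one using DE:AD:BE:EF:32:10 from the start script, and the mail VM's MAC is made up.

```shell
#!/bin/bash
# Sketch (hypothetical helper): print a dnsmasq.conf with one fixed lease per
# VM. Input: one "name MAC IP" line per VM on stdin.

make_dnsmasq_conf() {
    echo "interface=br0"
    echo "dhcp-range=192.168.0.100,192.168.0.150,72h"
    local name mac ip
    while read -r name mac ip; do
        [ -z "$name" ] && continue     # skip empty lines
        echo "# $name"
        echo "dhcp-host=$mac,$ip"      # always give this MAC this address
    done
}

# Assumed VM list (the mail VM's MAC is invented for this example)
VMS='web  DE:AD:BE:EF:32:10 192.168.0.100
mail DE:AD:BE:EF:32:11 192.168.0.200'
```

Usage: `make_dnsmasq_conf <<< "$VMS" > /etc/dnsmasq.conf`, then restart dnsmasq.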

NAT and port forwarding

To enable support for NAT and port forwarding, enable the following options in the kernel (if you're using a custom one):

Select "Networking support => Networking options => TCP/IP networking => IP: advanced router"
Select "Networking support => Networking options => Network packet filtering framework (Netfilter)" and select "Advanced netfilter configuration" plus "Bridged IP/ARP packets filtering" and select all the sub-entries in...
- "Core Netfilter Configuration"
- "IP: Netfilter Configuration"
- "IPv6: Netfilter Configuration"

Compile, install and reboot.

Once done, modify the configuration of your iptables firewall to enable the forwarding of requests.
I do everything in a script, and in my case the section relevant for NAT and port forwarding looks like this:
===================
#!/bin/bash

#Allow forwarding of established connections
iptables -I FORWARD 1 -m state --state RELATED,ESTABLISHED -j ACCEPT

#Basic forwarding rules
iptables -A FORWARD -i br0 -s 192.168.0.0/24 -j ACCEPT
iptables -A FORWARD -i eth0 -d 192.168.0.0/24 -j ACCEPT
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
iptables -t nat -A POSTROUTING -o br0 -j ACCEPT
iptables -t nat -A OUTPUT -o br0 -j ACCEPT

#Tell the kernel that ip forwarding is OK
#IPv4
echo 1 > /proc/sys/net/ipv4/ip_forward
#Enable reverse-path filtering (drops packets with spoofed source addresses)
for f in /proc/sys/net/ipv4/conf/*/rp_filter ; do echo 1 > $f ; done
echo 1 > /proc/sys/net/ipv4/conf/default/rp_filter
#IPv6
echo 1 > /proc/sys/net/ipv6/conf/all/forwarding

#Allow the host to connect to any VM on any port (br0 is the bridge of the VMs)
iptables -A OUTPUT -o br0 -j ACCEPT -d 192.168.0.0/24 -m state --state NEW

#IPv6 ICMP
ip6tables -A FORWARD -p ipv6-icmp -j ACCEPT
#IPv6 traceroute
ip6tables -A FORWARD -i br0 -p udp --dport 33434 -j ACCEPT

############Start-Port forwarding
#List these rules with "iptables -t nat --list"

#vm1-web-start
iptables -t nat -A PREROUTING -p tcp --dport 1234 -i eth0 -j DNAT --to 192.168.0.100:22
iptables -t nat -A PREROUTING -p tcp --dport 80 -i eth0 -j DNAT --to 192.168.0.100:80
#vm1-web-end

#vm2-email-start
iptables -t nat -A PREROUTING -p tcp --dport 5678 -i eth0 -j DNAT --to 192.168.0.200:22
iptables -t nat -A PREROUTING -p tcp --dport 25 -i eth0 -j DNAT --to 192.168.0.200:25
#vm2-email-end
===================

(and don't forget to set the IP of your bridge (192.168.0.1) as the "default gateway" in the network configuration of your VMs)

With all this in place you should now, first of all, be able to connect to the Internet from within your VM.

As for inbound connections, you can see above that I am, for example, forwarding all the traffic that arrives at my host (eth0) on port 1234 to port 22 of the VM with the internal IP 192.168.0.100, which is where SSH of that VM is listening => therefore, if I want to connect to that VM from home, I use the command "ssh -p 1234 " and iptables, running on the host, transparently forwards me to the VM.
Of course you can't use the same external port twice: e.g. if port 22 is already used by the host, you'll have to use other ports for your VMs (1234 and 5678 in the example above).
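With more VMs the list of DNAT rules grows, and it's easy to forward the same external port twice by accident. Below is a sketch of a hypothetical generator that builds the PREROUTING rules from a small table ("external-port vm-ip vm-port", a format I made up for this sketch) and refuses the whole table if an external port appears twice. The rules are only printed, so they can be reviewed before being applied.

```shell
#!/bin/bash
# Sketch (hypothetical helper): print DNAT rules for eth0 from stdin lines
# "external-port vm-ip vm-port". On a duplicate external port nothing is
# printed and the function fails, so no half rule set can be applied.

gen_dnat_rules() {
    local ext ip port rules=""
    local -A seen=()
    while read -r ext ip port; do
        [ -z "$ext" ] && continue          # skip empty lines
        if [ -n "${seen[$ext]:-}" ]; then
            echo "error: external port $ext is already forwarded to ${seen[$ext]}" >&2
            return 1
        fi
        seen[$ext]="$ip:$port"
        rules+="iptables -t nat -A PREROUTING -p tcp --dport $ext -i eth0 -j DNAT --to-destination $ip:$port"$'\n'
    done
    printf '%s' "$rules"
}

# The forwards used in this article
FORWARDS='1234 192.168.0.100 22
80   192.168.0.100 80
5678 192.168.0.200 22
25   192.168.0.200 25'
```

Usage: `gen_dnat_rules <<< "$FORWARDS"` to review, then `gen_dnat_rules <<< "$FORWARDS" | sh` as root. Because the output is empty on error, the pipe then runs nothing.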

IPv6

[Diagram: virtual machines IPv6]

As I already mentioned at the beginning, I originally wanted to give the VMs direct IPv6 connectivity to the outside world using public IPv6 addresses, but it did not work.
Therefore, plan B was to use "haproxy" to proxy inbound IPv6 connections on specific ports to the right VMs.

The configuration of haproxy is quite simple and in my case it looks like this:
===================
global
    log 127.0.0.1        local0
    log 127.0.0.1        local1 notice
    maxconn             4096
    user                haproxy
    group               haproxy
    daemon

defaults
    log                 global
    mode                tcp
    option              dontlognull
    retries             3
    maxconn             4000
    timeout connect     5000
    timeout client      50000
    timeout server      50000

listen ipv6proxy80     2001:41d0:8:5688::1:80
        mode    tcp
        server ipv4server80    192.168.0.100:80
        maxconn 4000
listen ipv6proxy443    2001:41d0:8:5688::1:443
        mode    tcp
        server ipv4server443   192.168.0.100:443
        maxconn 4000
===================
In this configuration I proxy 1:1 inbound IPv6-connections on port 80 and 443 to the VM that hosts the webserver over IPv4.
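The "listen" sections all follow the same pattern, so they can be generated, e.g. to also publish the mail server (192.168.0.200, port 25) over IPv6. Below is a sketch of a hypothetical generator in the same style as the configuration above (haproxy takes the part after the last colon as the port). Keep in mind that with this kind of TCP proxying the VM sees the connections as coming from the host, not from the real client, which matters e.g. for the spam checks of a mail server.

```shell
#!/bin/bash
# Sketch (hypothetical helper): print one haproxy "listen" section per port,
# proxying IPV6ADDR:port 1:1 to BACKEND:port over IPv4.
# Usage: gen_haproxy_listens IPV6ADDR BACKEND PORT...

gen_haproxy_listens() {
    local v6addr=$1 backend=$2; shift 2
    local port
    for port in "$@"; do
        printf 'listen ipv6proxy%s %s:%s\n' "$port" "$v6addr" "$port"
        printf '        mode    tcp\n'
        printf '        server ipv4server%s %s:%s\n' "$port" "$backend" "$port"
        printf '        maxconn 4000\n'
    done
}
```

For example, `gen_haproxy_listens 2001:41d0:8:5688::1 192.168.0.100 80 443` reproduces the two sections above, and `gen_haproxy_listens 2001:41d0:8:5688::1 192.168.0.200 25` would add one for the mail server.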

References

SW-versions used:
Distribution: Gentoo
Kernel: 3.3.8
HAProxy: 1.4.21
KVM (QEMU): 1.1.1-r1
iptables: 1.4.3
Links: