From: Herbert Poetzl (herbert_at_13thfloor.at)
Date: Mon 29 Mar 2004 - 04:43:45 BST
Hello Community!
I finished investigating the options we have regarding the network
(interface) development in future linux-vserver versions, and I'd
like to get your opinion on several issues and/or ideas ...
this is going to be a little longer, so I'd suggest reading it
thoroughly and thinking about it before replying (but you probably
do that anyway ;)
I'll do this in several parts, so I can accumulate questions,
suggestions, answers, etc., and respond to them as I proceed.
so do not expect this to be something final, and do not hesitate
to ask questions and/or provide feedback ...
------------
first, a short overview of the basic principles in use and the
'building blocks' I identified and researched.
Network Interfaces [ip link]
- provide a handle to a physical or virtual device
- have a physical address (eg. MAC for ethernet)
- do traffic accounting (rx/tx, errors/drops/...)
IP Addresses [ip addr]
- provide an internet address ipv4/ipv6/...
- associated with a link (interface) as primary/secondary
- have/define a network (address/netmask)
Network Sockets [netstat -atuw]
- provide an interface to send/receive messages
- associated with an address (not an interface)
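the last point is worth demonstrating: a socket binds to an address,
not to an interface. a minimal Python sketch (192.0.2.1 is just an
example of an address that is not configured locally):

```python
import socket

# Sockets attach to an (address, port) pair, never to an interface;
# the kernel picks the interface from its routing tables, and it
# rejects bind() for addresses that are not configured locally.
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.bind(("127.0.0.1", 0))            # any locally configured address works
local = s.getsockname()             # ('127.0.0.1', <ephemeral port>)
s.close()

t = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
    t.bind(("192.0.2.1", 0))        # TEST-NET address, not local here
    bound = True
except OSError:                     # EADDRNOTAVAIL
    bound = False
t.close()
```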
what we currently use in linux-vserver:
Network Context [chbind]
- limits the usable addresses to a given set of addresses
- is inherited from parent to child process
- is applied to socket operations
- limits the visibility of addresses
- doesn't know anything about interfaces
- can not be modified or migrated into
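to make those semantics concrete, here is a conceptual Python sketch
of such an address-set check (purely illustrative, not the actual
kernel code; the class and method names are made up):

```python
class NetworkContext:
    """Illustrative model of a network context: a fixed set of
    permitted addresses, checked on socket operations."""

    def __init__(self, addrs):
        self.addrs = frozenset(addrs)   # fixed: can not be modified later

    def visible(self, all_addrs):
        # limits the visibility of addresses to the permitted set
        return [a for a in all_addrs if a in self.addrs]

    def check_bind(self, addr):
        # applied to socket operations: binding outside the set fails
        if addr not in self.addrs:
            raise PermissionError(addr + " not in context")
        return addr

ctx = NetworkContext(["10.0.0.2", "127.0.0.1"])
seen = ctx.visible(["10.0.0.1", "10.0.0.2", "127.0.0.1"])
```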
how does this differ from the UML/QEMU/VMware network device?
basically, a network interface is the point where a packet
enters or leaves the host (server), and that is what the
tun/tap device on the host and the network driver in the
UML/QEMU/VMware client do.
consider the following setup:
host: eth0: <some-network-ip>
tun0: 10.0.0.1/24
lo: 127.0.0.1/8
client: eth0: 10.0.0.2/24
lo: 127.0.0.1/8
what happens on a 'ping -c 1 10.0.0.2' issued on the host?
H# ping -c 1 10.0.0.2
PING 10.0.0.2 (10.0.0.2) from 10.0.0.1 : 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=0 ttl=64 time=44.554 msec
HOST (MAC-H, 10.0.0.1) (MAC-C, 10.0.0.2) CLIENT
| |
| arp: who-has 10.0.0.2 tell 10.0.0.1 ---------------------> |
| <------------------------- arp: reply 10.0.0.2 is-at MAC-C |
| |
| icmp: 10.0.0.1 > 10.0.0.2: echo request -----------------> |
| |
| <--------------------- arp: who-has 10.0.0.1 tell 10.0.0.2 |
| arp: reply 10.0.0.1 is-at MAC-H -------------------------> |
| |
| <------------------- icmp: 10.0.0.2 > 10.0.0.1: echo reply |
and ifconfig on the client (and on the host, except for
some differences in the packet sizes[1]) now shows:
eth0 Link encap:Ethernet HWaddr MAC-C
inet addr:10.0.0.2 Bcast:10.0.0.255 Mask: ...
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:3 errors:0 dropped:0 overruns:0 frame:0
TX packets:3 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:218 (218.0 b) TX bytes:218 (218.0 b)
what did UML/QEMU/VMware do in that process? simple: the
application received 3 packets from the host via tun0 and
transmitted them to the client kernel via eth0, and it also
received 3 packets from the client, which it delivered via
the tun0 device to the network stack of the host.
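that forwarding role can be sketched as a plain copy loop between two
file descriptors. in the sketch below, pipes stand in for the tun fd
and the emulated NIC (a real implementation would read frames from
/dev/net/tun, which needs privileges):

```python
import os

def forward_frames(src_fd, dst_fd, count, bufsize=2048):
    """Copy up to `count` frames from src_fd to dst_fd -- the way a
    user-space emulator shuttles packets between the host's tun
    device and the guest's virtual NIC."""
    moved = 0
    for _ in range(count):
        frame = os.read(src_fd, bufsize)
        if not frame:
            break
        os.write(dst_fd, frame)
        moved += 1
    return moved

# demo with pipes instead of real devices
tun_r, tun_w = os.pipe()        # stands in for the host's tun0
nic_r, nic_w = os.pipe()        # stands in for the client's eth0
os.write(tun_w, b"arp who-has 10.0.0.2")
n = forward_frames(tun_r, nic_w, 1)
frame = os.read(nic_r, 2048)
```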
now, let's have a look at the same ping on the client side:
C# ping -c 1 10.0.0.2
PING 10.0.0.2 (10.0.0.2) from 10.0.0.2 : 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=0 ttl=64 time=4.391 msec
CLIENT (MAC-C, 10.0.0.2) (MAC-C, 10.0.0.2) CLIENT
| |
| icmp: 10.0.0.2 > 10.0.0.2: echo request -----------------> |
| <------------------- icmp: 10.0.0.2 > 10.0.0.2: echo reply |
and the ifconfig on the client shows:
eth0 Link encap:Ethernet HWaddr MAC-C
inet addr:10.0.0.2 Bcast:10.0.0.255 Mask: ...
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:2 errors:0 dropped:0 overruns:0 frame:0
TX packets:2 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:168 (168.0 b) TX bytes:168 (168.0 b)
what was the part of UML/QEMU/VMware in that process?
nothing network related, at least: the entire ping was
handled on the client, which used the loopback interface
to reach one of its local addresses; disabling the lo
device would cause the ping to fail.
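the same effect is easy to reproduce with two UDP sockets on one
machine: request and reply both traverse lo, which is also why lo
accounts 2 packets for a single ping (these sockets are merely a
stand-in for the ICMP echo pair):

```python
import socket

# both endpoints live on the same machine, so request and reply
# travel over the loopback device -- no NIC is involved.
srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
srv.bind(("127.0.0.1", 0))
cli = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
cli.bind(("127.0.0.1", 0))

cli.sendto(b"echo request", srv.getsockname())   # packet 1 on lo
msg, peer = srv.recvfrom(64)
srv.sendto(b"echo reply", peer)                  # packet 2 on lo
reply, _ = cli.recvfrom(64)
srv.close()
cli.close()
```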
interesting things to spend a second thought on:
- why does the host->client ping take ~10 times longer?
- why does lo show 2 packets received and 2 transmitted?
- why does lo account a different size than tun0?
- why does tun0 account a different size than eth0?
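as a hint for the lo size question: loopback carries bare IP packets
(no ethernet header), so the arithmetic works out exactly, using the
standard IPv4 and ICMP header sizes:

```python
# ping reported "56(84) bytes": 56 bytes of data plus 8 bytes of
# ICMP header plus 20 bytes of IPv4 header = 84 bytes per packet;
# request + reply then give the 168 bytes lo accounts above.
ip_hdr, icmp_hdr, payload = 20, 8, 56
packet = ip_hdr + icmp_hdr + payload
total_on_lo = 2 * packet
```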
next part: routing and netfilter (probably)
best,
Herbert
[1] if you do a detailed dump and have a close look at the
accounted network data, you will find that the client
receives more data than the host transmits (via tun0)
_______________________________________________
Vserver mailing list
Vserver_at_list.linux-vserver.org
http://list.linux-vserver.org/mailman/listinfo/vserver