vserver development mailing list: Re: [Vserver] [RFC] Future Linux-Vserver Networking (Part 1)

Re: [Vserver] [RFC] Future Linux-Vserver Networking (Part 1):
vserver development mailing list
[Next/Previous Months] [Main vserver Project Homepage] [Howto Subscribe/Unsubscribe] [Paul Sladen's vserver stuff]

On Tue, Mar 30, 2004 at 05:26:45PM +0000, Liam Helmer wrote:
> Here's a bunch of thoughts on networking in linux-vserver.
>
> The nice thing about the current linux-vserver interface is that it's
> efficient. The packet only has to travel once through the network stack,
> which makes it faster, especially when packets have to be rebuilt from
> fragments.

this is something I intend to preserve, as the (linux-)vserver
motto always was (and still is) "at full speed!"

> The bad thing about the current linux-vserver interface is that the
> current setup can break routing somewhat. This is mostly due to the way
> that IP is limited: because you're limited to a particular IP, bound to
> a particular network interface, there ends up being compatibility
> problems if a particular route is on a different interface than the IP
> is bound to -> the box has difficulty determining how to send the packet
> because it's souce address is wrong (more of a problem with NAT than
> live IPs, granted).

interesting, where did you observe this? just because the
linux-vserver network code does not care about interfaces.

> Also, if your address changes, you have to restart each and every
> process on the vserver. Many vserver services won't operate correctly if
> you bind the vserver to 0.0.0.0 (for instance bind), and even the ones
> that do get pretty persnickity if you change the IP that 0.0.0.0
> represents.

> Additionally, localhost doesn't work nicely. What if, for instance, I
> wanted to have all the vservers connecting to a particular service on
> the box. In the current situation, I have to bind the service to a
> particular IP, and then have all the vservers connect to the service on
> that IP. I can always manage that through DNS, but it's not ideal -> it
> would be nicer to have an IP that I could connect to locally, without
> having to rely on a particular IP for the box.

> --
>
> It strikes me that there's 2 distinct categories of useage whern it
> comes to networking on linux-vserver:
>
> 1) The virtual hosting camp that uses one or more live IPs to represent
> each client; or
>
> 2) The security partitioning camp that shares the boxes IPs among the
> vservers -> sometimes IPs are shared, sometimes they're not.
>
> Each camp requires different core functionality. The virtual hosting
> camp wants:
>
> 1) Transparency: real ips that don't need translation
> 2) Simplicity: not having to set up internal networks, etc
> 3) Bind security: vservers shouldn't be able to disrupt other vservers
> 4) Traffic accounting: ability to easily see which vserver traffic is
> coming from
> 5) Security partitioning: vserver should only be able to see it's own
> traffic
> 6) Ability to quickly turn on/off vservers
>
> The security partitioning camp requires:
>
> 1) Security partitioning: as above
> 2) Bind security: as above
> 3) Traffic accounting
> 4) Information security: the vserver should have as little information
> as possible about the host computer be available.
> 5) Ability to quickly turn on/off vservers
> 6) Ability to have private communication between vservers (such as
> partioning a mysql server from an apache server on the local box).
>
> They don't really need it to be as simple or transparent as the virtual
> hosting camp.

the home camp, which want's to use many vservers
with one public ip, but doesn't care much about
security ;)

> ---
>
> One possible scenario is the following. I'll work on a patch for
> vserver-utils if anyone's interested in this:
>
> 1) Put a private IP and interface for the context using the dummy net
> module
> insmod -o new dummy
> ip link set dev dummy0 address ff:ff:ff:ff:11:11
> nameif ${vserver_name} ff:ff:ff:ff:11:11
> ip addr add x.x.x.x/yy dev ${vserver_name}
> # NB: Bringing the interface up isn't necessary: in fact, I recommend
> against it, as it could affect routing.
>
> The reason for making a named interface is transparency. Also, it allows
> one quick command to bring down all IPs on the vserver:
>
> ip addr flush dev ${vserver_name}
>
> Then, if NAT is required for the IP you've assigned.
>
> 2) Assign that IP to be the vserver IP for the box.
> echo IP=X.X.X.X >> /etc/vservers/mine.conf
> 3) Mark the traffic because it's coming from that IP.
> iptables -t mangle -A PREROUTING -s x.x.x.x -j MARK --set-mark
> ${S_CONTEXT}
> iptables -t mangle -A PREROUTING -d <ext.ip> -j MARK --set-mark
> ${S_CONTEXT}
> 4) SNAT outgoing traffic that you want to allow from that section.
> iptables -t nat -A PREROUTING -m mark --mark ${S_CONTEXT} -j SNAT --to
> <ext.ip>
> 5) DNAT incoming traffic that you want to allow to that section.
> iptables -t nat -A POSTROUTING -m mark --mark ${S_CONTEXT} -j DNAT
> --to <x.x.x.x>
> 6) Drop all external connections coming directly to the dummy ip:
> iptables -t nat -A PREROUTING -d x.x.x.x -i eth0 -j DROP
>
> That offers better security, because it specifically allows any incoming
> traffic that may occur, as well as concealing the real IP from the
> vserver, which can be useful from a security standpoint. As well, it can
> facilitate having static private IP addresses for the vservers, and deal
> with services that don't want to be advertized elsewhere. It's fairly
> transparent, and it's easy to understand a configuration, because the
> interfaces are properly labelled.
>
> The downside of using this is that it could, I think, confuse services
> that use multicast or broadcast, like slp or dhcp (I haven't tried these
> in this type of setup).
>
> So, what extra pieces would help make this scenario better?
>
> Once simple piece would be a netfilter context match module, so that you
> could reliably mark a packet as coming from a particular context
> (modelled after the "uid" match module, which matches a process to a
> uid). This way, you could reliably mark packets as coming from a
> particular context. This would be applicable for iptables, as well as
> for ip routing rules, etc.

> This currently works based on IP, but this
> doesn't scale so well to a) vservers that share IPs, or b) multi-ip
> vservers. This would also be very useful for tracking and limiting
> vserver-to-vserver communication over the lo interface.

> Also, with the creation of a particular dummy interface for the vserver
> in userspace, it behoves the kernel portion of vserver to enforce the
> context to only see to this vserver device somehow. That way, we can get
> away from the ipv4 and ipv6 limitations, and instead talk deal with
> per-device limitations -> which enables having a number of different
> types of protocols securely. That way, also, a vserver could have
> control over it's own "interface" without there being a security risk.
> I'm unsure how this would affect routing, however, as the packets still
> have to route out another interface, that, in this case, it wouldn't be
> able to "see".

sounds good too, actually I did some research in this
direction, but there are some limitations to this approach
(maybe some of them can be lifted by cheating a little ;)

> Another possibility is having the vserver "interface" be an entirely
> non-broadcasting entity, and instead have rules by which a vserver
> interface is translated by the kernel to deal with external networking.
> For instance, you had a vserver-0 interface, which updated corresponding
> information in eth0, with a set of rules that defined the translation.
> But, that does require a great deal more complexity.

yes, and complexity often brings confusion, and error,
which lead to bugs and security holes ...

> Anyways, sorry for the lack of formal structure here -> these are just
> ideas.