Mutara + LXC + Docker == fail?

Hi,

I’ve seen that since Mutara (6.0) support for LXD is deprecated and will be removed in a future version (?!).

Following the nice/interesting example at Using Docker on OpenNebula through LXD - OpenNebula – Open Source Cloud & Edge Computing Platform, it was possible to run Docker inside LXD/LXC containers. I tried it, and it also worked with 6.0.2 using the LXD node driver.

But after reading about the deprecation I moved to the new LXC driver (and tried it on fresh nodes). Now Docker can’t be used inside the LXC containers anymore; it throws errors about mounting problems:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: rootfs_linux.go:58: preparing rootfs caused: permission denied: unknown.

The container start failed. I’ve read at LXC Driver — OpenNebula 6.0.2 documentation that

In order to ensure the security in a multitenant environment, only unprivileged containers are supported by LXC drivers.

Could this be the reason for the problem - that LXC containers can’t be started privileged anymore (which seems to be needed for Docker inside LXC)?

If yes:

  • will this feature come back to the LXC driver, or is its removal intentional?
  • if it is intentional and the feature will not come back, why was it “advertised” and supported with the LXD driver (where it still works)? Maybe this should be a configurable driver option?

My LXC knowledge is not in-depth enough yet to fix the problem myself - does anyone have an idea how this can be fixed? Maybe some LXC configuration on the node system itself?

Bye
Björn

Hi @terminar,

In order to increase security (privileged containers in a multi-tenant environment can be a serious security threat), the new LXC integration only supports unprivileged containers. I haven’t tried deploying Docker inside an LXC container myself, but from the error you pasted it looks like this restriction might be the cause.

Currently LXD is still supported for compatibility with existing deployments, so users already on LXD have enough time to migrate their environments.

If you’re interested in running Docker images on LXC, I suggest taking a look at the OpenNebula and DockerHub integration, which allows you to run these Docker images directly as LXC containers without needing to run Docker inside LXC. You can find a complete example here: Running Containers — OpenNebula 6.0.2 documentation
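For illustration, a rough sketch of that workflow (the CLI tools are the standard OpenNebula ones, but the app name, image name and datastore ID are placeholders - check the linked docs for the exact flags):

```shell
# Export a DockerHub marketplace app into an image datastore and boot it as a
# container. Guarded so it is a no-op on machines without the OpenNebula CLI.
if command -v onemarketapp >/dev/null 2>&1; then
  onemarketapp list | grep -i nginx                    # locate the app in the marketplace
  onemarketapp export 'nginx' nginx-lxc --datastore 1  # creates image + template
  onetemplate instantiate nginx-lxc                    # start it as an LXC container
fi
```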

Tried it - it’s a hassle due to the missing handling of the Docker entrypoint. Yes, it is working if you just use a basic nginx image, but I tried several other images and it simply doesn’t work with more “complex” ones. But thank you anyway for the tip.

This should be selectable by the administrator of OpenNebula. If the admin doesn’t understand that security difference, that’s another story.

In our case we use KVM machines (fully deployed KVM machines where we don’t have administrative access to the KVM layer - we can only select predefined installation images and use cloud-init for initialization). We use KVM like bare metal.

So that means

  • we can’t use the KVM functionality of OpenNebula to deploy images/machines
  • we HAVE security and separation due to KVM itself but need to use Docker (or LXC in a privileged configuration).

The security view is of course needed, but the claim that only unprivileged containers provide security is an opinionated point of view. Privileged LXC inside KVM is more secure than unprivileged LXC without KVM.
So we have a comparable security situation, but can’t use LXC with Mutara.

But OK, understood - the problem is intentional, so I’ll have to fork/create another driver that can be configured (or used without a stacked, hacky setup like KVM->LXC->Docker) just to use OpenNebula with containers.

Thanks for the information!

@cgonzalez Just an update, because I gave your suggestion another try.
Image: redmine:latest from DockerHub.

  • searched for the Dockerfile
  • searched for the entrypoint and command in the Dockerfile
  • added that stuff to the VM template
  • tried => failed, ENV missing
  • terminated the LXC container
  • added the missing ENV variables as exports to the start_script context
  • created a new LXC container
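(For reference, the first two steps can be scripted: the image metadata already carries the entrypoint, command and environment, so there is no need to hunt through the Dockerfile by hand. A sketch assuming a local Docker daemon; guarded so it is a no-op otherwise:)

```shell
# Read ENTRYPOINT, CMD and ENV straight from the image metadata.
IMAGE=redmine:latest
if command -v docker >/dev/null 2>&1; then
  docker pull "$IMAGE" >/dev/null
  docker inspect --format '{{json .Config.Entrypoint}}' "$IMAGE"
  docker inspect --format '{{json .Config.Cmd}}' "$IMAGE"
  docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$IMAGE"
fi
```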

Still doesn’t work, because of a uid error:

Fetching rake 13.0.3
sudo: effective uid is not 0, is /usr/bin/sudo on a file system with the 'nosuid' option set or an NFS file system without root privileges?
Bundler::SudoNotPermittedError: Bundler requires sudo access to install at the
moment. Try installing again, granting Bundler sudo access when prompted, or
installing into a different path.
An error occurred while installing rake (13.0.3), and Bundler cannot continue.
Make sure that `gem install rake -v '13.0.3' --source 'https://rubygems.org/'`
succeeds before bundling.

That image is completely unusable with LXC (I don’t know whether privileged LXC would allow uid 0?).
I’ve tried other images, which also had various interesting problems. The suggestion is really not usable.

Please re-think the decision not to allow privileged LXC. It makes LXC nearly useless with existing images.

Hello @terminar ,

Support for Entrypoint has been added in the latest release of OpenNebula. So now, when you export an image from DockerHub, the entrypoint of the image is stored in /one_entrypoint.sh so you can run it in the start script of the VM:

nohup /one_entrypoint.sh &> /dev/null &
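For example, in the VM template that could look something like this (a sketch in OpenNebula template syntax; adjust the context attributes to your setup):

```
CONTEXT = [
  NETWORK      = "YES",
  START_SCRIPT = "nohup /one_entrypoint.sh &> /dev/null &"
]
```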

Best,
Álex.

Thanks for the information, great news.
But that doesn’t fix the problems with unprivileged containers for images needing e.g. uid 0 (like the redmine example).

Hi @terminar,

This should be selectable by the administrator of OpenNebula. If the admin doesn’t understand that security difference, that’s another story.

I would usually agree with you, but I think supporting privileged containers in any multi-tenant environment (which, from my point of view, is the main purpose of deploying a cloud environment) implies a very big security risk. It is not a minor configuration option.

I think the best way to proceed with this kind of error is to find the exact problem and tune the container configuration to grant only the permissions required for the specific case (i.e. whatever Docker needs), instead of directly “opening” the entire host.

we HAVE security and separation due to KVM itself but need to use Docker (or LXC in a privileged configuration).

Even using KVM for your host (which provides a virtualization layer), if two different people are able to deploy containers on that host they would be in danger. From this point of view I cannot see any difference between running containers on bare-metal or virtualized hosts. Even if you’re running one KVM host for one specific user, that user might be able to perform actions that affect the main host directly (which doesn’t sound like something I’d like to allow).

Regarding the “uid is not 0” error, please note that inside the container the root user will have uid 0, so this error probably isn’t related to the privileged/unprivileged issue.

$ echo "id" | lxc-attach one-1
uid=0(root) gid=0(root) groups=0(root)
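A quick way to double-check the mapping from the host side is to read the uid_map of the container’s init process (a sketch; the container name follows the example above, and the ranges shown are only illustrative):

```shell
# Print the user-namespace mapping of a running container's init process.
# Guarded so it is a no-op on hosts without the LXC tools.
if command -v lxc-info >/dev/null 2>&1; then
  init_pid=$(lxc-info -n one-1 -p -H)   # -p prints the PID, -H without a label
  cat "/proc/${init_pid}/uid_map"
  # an unprivileged container shows a shifted range, e.g. "0 600100001 65536";
  # a privileged one shows the identity map "0 0 4294967295"
fi
```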

Just my two cents explaining why I’m still sticking with this topic: using “cloud environment” as a general rule of thumb for every use case seems limiting - especially since not all of them are multi-tenant (yes, you wrote “main purpose”, but that is also limiting).

There are specific widespread examples where VM orchestration is used to logically group (administrative and financial) responsibility and completely delegate sub-administration of resources.
Such “on-premise sub-administrated cloud environments” are not necessarily multi-tenant.
Let’s call this example “cloud in a cloud” for the moment: $big_company’s IT administration department grants a specific project/department group around N (20? 50?) virtual host instances, plus some dedicated servers and some switches - for doing e.g. their own fancy R&D stuff, which is completely “detached” and outside the administrative scope of the IT department. If something (hardware) breaks, it gets fixed. Backup & recovery is done automatically. Nothing more, nothing less. Detached orchestration of such (on-premise) environments (with OpenNebula) is really helpful.

TL;DR: Use cases for OpenNebula can be really different and meaningful without a “multi-tenant” situation, while still having security in mind (or maybe not, because it doesn’t need to be).


Correct, but again, that depends on administration and planning (of a cluster or services). Even when looking at a big, flat, open, online “multi-tenant” environment:

  • If only one LXC container is executed per KVM instance, that’s still not a problem. Why would anyone do this? Many small KVM instances using OpenNebula for (LXC) container rollout, using the “hypervisor” as an envelope - perhaps because there is no access to the administration itself.
  • A “service” consisting of different (LXC) containers running on a “dedicated” “hypervisor” instance as a self-contained application is also not a big deal.

But understood - ON focuses on a specific main-purpose use case, and that makes sense regarding the project’s direction. No need to discuss it any further; I just wanted to explain one other example. Thank you for taking the time to explain the motivation behind that decision.


Coming back to the main problem example:

Correct, root will have uid 0 when logged in as root, but there is still privilege magic around uid 0/root.

$ echo "id" | lxc-attach one-62
uid=0(root) gid=0(root) groups=0(root)

In this case, with this specific redmine image (as posted in the first log above), sudo is not working.
The redmine image runs “bundler check || bundler install” as user “redmine” (uid/gid 999/999).
bundler itself seems to call sudo. Calling bundler as root to prevent this will fail, because bundler generally doesn’t want to be called as root.

The rootfs is mounted like this:

/var/lib/one/datastores/0/62/mapper/disk.0 on / type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)

It seems that nosuid is the problem in this specific case (and maybe nodev for other images?).
I haven’t found out whether this is a general LXC configuration problem or a problem with the “unprivileged” situation. Is OpenNebula doing anything specific for “unprivileged” LXC?
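The flag can be confirmed mechanically; a minimal sketch that just inspects the mount line pasted above (any POSIX shell):

```shell
# Check the pasted rootfs mount line for the two suspicious flags.
rootfs_line='/var/lib/one/datastores/0/62/mapper/disk.0 on / type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)'

case $rootfs_line in
  *nosuid*) echo "rootfs is nosuid: setuid binaries such as sudo cannot elevate" ;;
esac
case $rootfs_line in
  *nodev*)  echo "rootfs is nodev: device nodes on it are unusable" ;;
esac
```

Inside a running container the live line can be read from /proc/self/mounts instead of a pasted string.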

Thanks for sharing the details of your use case @terminar, very interesting!

Even in your case, I would try to prioritize security and find a solution other than privileged containers, but this is just a personal opinion. One of our tenets is flexibility, so if you really think this would make your use case better, please open a GitHub issue and I will bring the discussion to the team.

You can check the LXC configuration for your container in its deployment file; it should be available on the host where the container is running, at /var/lib/one/datastores/<sys_ds_id>/<vm_id>/deployment.file. It should look like this:

lxc.include = '/usr/share/lxc/config/common.conf'
lxc.rootfs.path = '/var/lib/lxc-one/2/disk.0'
lxc.mount.entry = '/var/lib/lxc-one/2/disk.1 context none rbind,ro,create=dir,optional 0 0'
lxc.cgroup.cpu.shares = '1024'
lxc.cgroup.memory.limit_in_bytes = '768M'
lxc.cgroup.memory.oom_control = '1'
lxc.idmap = 'u 0 600100001 65536'
lxc.idmap = 'g 0 600100001 65536'

Basically, by default we include the LXC common configuration, mount the disks, and add some cgroup limits plus the UID/GID mapping for unprivileged containers. The content of this file will change depending on your container configuration.

As you can see, we’re not passing the mount options for the rootfs path, but we do for the other disks. These options are currently hardcoded, but it might be interesting to make them more customizable. I’ve opened an issue for this: Allow custom mount options for LXC disks · Issue #5429 · OpenNebula/one · GitHub.

Also, if you need to add any LXC-specific configuration, you can make use of profiles and the raw section (LXC Driver — OpenNebula 6.0.2 documentation).
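For reference, the raw section in a VM template looks roughly like this (a sketch; the lxc key shown is only an illustration - see the linked driver documentation for what is allowed):

```
RAW = [
  TYPE = "lxc",
  DATA = "lxc.signal.reboot = 9"
]
```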


I dove a bit deeper, and it is true that no options are passed explicitly, but the new LXC driver does influence how the LXC rootfs is mounted (and how functional it is). Let’s go down the rabbit hole.

  • lxc/storage/storageutils.rb#40 uses bindfs for mounts; it’s the first and only OpenNebula driver that uses bindfs (I didn’t find any other reference, so that’s an assumption)
  • the bindfs manual states that suid (setuid/setgid) has no effect inside the mount due to a necessary “security feature”; it also mentions that ‘-o dev’ allows access to devices.

Looking at the default behavior the new LXC driver produces with bindfs:

/var/lib/one/datastores/0/48/mapper/disk.0 on / type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)

I changed

#{COMMANDS[:bind]} #{cmd_opts} #{src} #{target}

in storageutils.rb#L162 to

#{COMMANDS[:bind]} -o dev #{cmd_opts} #{src} #{target}

which then produces:

(rw,nosuid,relatime,user_id=0,group_id=0,default_permissions,allow_other)

The ‘nodev’ option is gone from the mounted rootfs.
For another try, I stopped the container and unmounted the bindfs mount

/var/lib/one/datastores/0/48/mapper/disk.0 on /var/lib/lxc-one/48/disk.0

and remounted it manually with

mount -o bind /var/lib/one/datastores/0/48/mapper/disk.0 /var/lib/lxc-one/48/disk.0

restarted the LXC container, attached to it, and voilà:

/dev/loop0 on / type ext4 (rw,relatime)

Of course that doesn’t fix my problem, because now something else is broken, probably due to my manual intervention (cgroup errors on container start, the uid/gid mapping is missing), and all uids/gids are now nobody:nogroup. But that seems to demonstrate exactly the problem and its cause.

Even if it becomes possible to configure additional mount options for the rootfs in the future, bindfs will introduce nosuid, which seems to make the use of sudo and setuid/setgid impossible. In that case, images and installations relying on suid will not work - even if root (uid 0) seems to be available within the container.

I’ll stop here to investigate any further.

all uids/gids are now nobody:nogroup. But that seems to demonstrate exactly the problem and its cause.

Here is the reason why we are using bindfs: container image file systems use the usual (i.e. unshifted) UID/GID values for file ownership. So, in order to align them with the container’s UID/GID mapping, we use the bindfs command with the --uid-offset and --gid-offset options.
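So the unprivileged mount amounts to something like the following (a sketch, not the driver’s literal code; the offset matches the idmap from the deployment file shown earlier, and bindfs needs root plus existing paths):

```shell
# Shift file ownership on disk into the container's mapped UID/GID range.
# Guarded so it is a no-op where bindfs or the paths are absent.
SRC=/var/lib/one/datastores/0/62/mapper/disk.0
DST=/var/lib/lxc-one/62/disk.0
if command -v bindfs >/dev/null 2>&1 && [ -e "$SRC" ] && [ -d "$DST" ]; then
  bindfs --uid-offset=600100001 --gid-offset=600100001 "$SRC" "$DST"
fi
```

Being a FUSE mount, the result comes up with nosuid by default, which matches the mount lines pasted earlier in the thread.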

Even if it becomes possible to configure additional mount options for the rootfs in the future, bindfs will introduce nosuid, which seems to make the use of sudo and setuid/setgid impossible. In that case, images and installations relying on suid will not work - even if root (uid 0) seems to be available within the container.

Yes, I did put the details into the GitHub issue. I think the problem can be easily addressed for the rootfs [1], but we need to evaluate whether there’s also an option for the extra disks.

The aim is to find a solution that allows the provided mount options to be applied from the bottom to the top, making sure the file system inside the container has the expected options/permissions.

[1] Allow custom mount options for LXC disks · Issue #5429 · OpenNebula/one · GitHub