Replies: 7 comments
-
@adrianreber @avagin PTAL 🙏🏻
-
I've retried this several times and I always end up with the same behavior (as described above), so I tend to think this must be a runc (or CRIU?) bug. Has any member of the runc community been able to reproduce this? Thx
-
Unfortunately, I have never used container checkpointing with MACVLAN. Sorry.
Using an external network namespace sounds like the best way to solve this. Maybe you just have some iptables rules left over from CRIU in the network namespace which block the traffic. Can you check whether there are any iptables rules in your network namespace?
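For example (a sketch, assuming the `redis` namespace name used elsewhere in this thread):

```
# List all rules, as iptables commands, inside the container's net namespace
$ sudo ip netns exec redis iptables -S
$ sudo ip netns exec redis ip6tables -S
```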
-
Adrian, thanks for your response. Indeed, the iptables rules added by CRIU to the network namespace are still there after restoring. Could it be that CRIU is unable to remove them because of the errors in the restore process described above?
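For reference, the rules CRIU installs to lock the network during checkpointing typically look like this (a sketch; the exact chain name and mark value may differ across CRIU versions — CRIU accepts packets carrying its own socket mark and drops the rest):

```
$ sudo ip netns exec redis iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-N CRIU
-A INPUT -j CRIU
-A OUTPUT -j CRIU
-A CRIU -m mark --mark 0xc114 -j ACCEPT
-A CRIU -j DROP
```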
-
You can try something like
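A minimal sketch along those lines, assuming the leftover rules are only CRIU's network-lock entries and that nothing else in the `redis` namespace needs to be preserved:

```
# Flush all rules, then delete the (now empty) user-defined chains
$ sudo ip netns exec redis iptables -F
$ sudo ip netns exec redis iptables -X
$ sudo ip netns exec redis ip6tables -F
$ sudo ip netns exec redis ip6tables -X
```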
-
Thanks, this seemed to work; at least now the restored container can be reached by ping. I have yet to check what happens to the established TCP connections. I assume this can be considered a workaround for the iptables errors during restore. However, the restore procedure still hangs at `Running post-resume scripts`.
-
That sounds correct. Using Podman to restore a container, the last messages in the log are also about the post-resume scripts.
That is how it is supposed to be.
-
CRIU claims to support checkpointing/restoring a net namespace with a MACVLAN device, so I assumed this would also work with runc containers. Nevertheless, so far I have failed to achieve it. I have tried to checkpoint/restore a runc container running a Redis server using both an external net namespace and a net namespace created by runc. Below I describe the steps taken and the outcome in both cases.
**Software versions**

```
$ runc --version
runc version 1.0.1
commit: v1.0.1-0-g4144b63
spec: 1.0.2-dev
go: go1.15.14
libseccomp: 2.5.1
$ criu --version
Version: 3.15
GitID: v3.14-441-g15266a4fe
```
**With external net namespace**

1. In `config.json`:

   ```
   { "type": "network", "path": "/var/run/netns/redis" },
   ```

2. Create the net namespace `/var/run/netns/redis` with a MACVLAN device and run the container:

   ```
   $ sudo ip netns add redis
   $ sudo ip link add link enp0s8 veth-redis-1 type macvlan mode bridge
   $ sudo ip link set veth-redis-1 netns /var/run/netns/redis
   $ sudo ip netns exec redis ip link set dev veth-redis-1 name eth0
   $ sudo ip netns exec redis ip addr add 192.168.1.17/32 dev eth0
   $ sudo ip netns exec redis ip link set dev eth0 up
   $ sudo ip netns exec redis ip route add 192.168.1.0/24 dev eth0
   $ sudo runc run -d redis &> /dev/null < /dev/null
   ```
3. The container with the MACVLAN device is created and can be reached successfully:

   ```
   $ sudo runc exec redis ip addr
   1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
       link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
   7: eth0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
       link/ether 3e:25:5f:25:0b:a4 brd ff:ff:ff:ff:ff:ff link-netnsid 0
       inet 192.168.1.17/32 scope global eth0
          valid_lft forever preferred_lft forever
       inet6 fe80::3c25:5fff:fe25:ba4/64 scope link
          valid_lft forever preferred_lft forever
   $ ping 192.168.1.17
   PING 192.168.1.17 (192.168.1.17) 56(84) bytes of data.
   64 bytes from 192.168.1.17: icmp_seq=1 ttl=64 time=0.025 ms
   ```
4. Checkpoint the container. It finishes correctly and recognizes the net namespace as external, even though I don't specify `external net[4026532237]:/var/run/netns/redis` in `/etc/criu/runc.conf`. There is no mention of the MACVLAN device in the log (is this the expected behavior, given that the net namespace is external?). See complete log: dump-external-1.log.

   ```
   $ sudo runc checkpoint --image-path $HOME/images/ --work-path $HOME/images/ --tcp-established redis
   ```
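   For reference, the numeric id in the `external net[...]` option is the inode of the namespace file; it can be looked up like this (a sketch, using the namespace created in step 2):

   ```
   # Print the inode of the netns file; CRIU uses it as the id
   # in "external net[<inode>]:<path>"
   $ sudo stat -L -c '%i' /var/run/netns/redis
   4026532237
   ```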
5. Specify `external macvlan[eth0]:enp0s8` in `/etc/criu/runc.conf` and try to restore the container:

   ```
   $ sudo runc restore --image-path $HOME/images/ --work-path $HOME/images/ --tcp-established redis
   ```
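   For reproducibility, one way to put that option in place (a sketch, assuming no other options are needed in the file; `tee` without `-a` overwrites any existing contents):

   ```
   $ echo 'external macvlan[eth0]:enp0s8' | sudo tee /etc/criu/runc.conf
   ```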
6. The container seems to restore correctly in the right namespace (it can see the MACVLAN device), but the restore procedure hangs at `Running post-resume scripts` (should I add the detach option?). Additionally, there are some errors in the log at `Running network-unlock scripts` (see complete log: restore-external-1.log). Probably because of this, the container cannot be reached through the MACVLAN device.

   ```
   iptables-restore: line 5 failed
   (00.072804) Error (criu/util.c:645): exited, status=1
   ip6tables-restore: line 5 failed
   (00.073749) Error (criu/util.c:645): exited, status=1
   ```
**With net namespace created by runc**

1. In `config.json`:

   ```
   { "type": "network" },
   ```

2. Run the container and create a MACVLAN device in the corresponding net namespace. As before (same output as step 3 above), the container can be reached successfully:

   ```
   $ sudo runc run -d redis &> /dev/null < /dev/null
   $ PID=$(sudo runc ps redis | sed '1d' | awk '{print $2}')
   $ NETNS=/proc/$PID/ns/net
   $ sudo mkdir -p /var/run/netns/
   $ sudo ln -sf $NETNS /var/run/netns/redis
   $ sudo ip link add link enp0s8 veth-redis-1 type macvlan mode bridge
   $ sudo ip link set veth-redis-1 netns /var/run/netns/redis
   $ sudo ip netns exec redis ip link set dev veth-redis-1 name eth0
   $ sudo ip netns exec redis ip addr add 192.168.1.17/32 dev eth0
   $ sudo ip netns exec redis ip link set dev eth0 up
   $ sudo ip netns exec redis ip route add 192.168.1.0/24 dev eth0
   ```
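   A quick sanity check that the `/var/run/netns/redis` entry really refers to the container's namespace (a sketch, assuming the `$PID` variable from above; `ip netns identify` matches a process against the entries under `/var/run/netns`):

   ```
   # Expected to print "redis" if the symlink points at the container's netns
   $ sudo ip netns identify $PID
   redis
   ```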
3. Checkpoint the container. It finishes correctly, but there is no mention of the MACVLAN device in the log (see complete log: dump-internal-1.log):

   ```
   $ sudo runc checkpoint --image-path $HOME/images/ --work-path $HOME/images/ --tcp-established redis
   ```
4. Specify `external macvlan[eth0]:enp0s8` in `/etc/criu/runc.conf` and try to restore the container:

   ```
   $ sudo runc restore --image-path $HOME/images/ --work-path $HOME/images/ --tcp-established redis
   ```
5. The container does not restore correctly, as it cannot see the MACVLAN device (it only sees localhost). Additionally, as before, the restore procedure hangs at `Running post-resume scripts` (see complete log: restore-internal-1.log):

   ```
   $ sudo runc exec redis ip addr
   1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
       link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
   ```
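   One more diagnostic that might help (a sketch; `enp0s8` as above): check whether the MACVLAN device ended up back on the host side after the failed restore:

   ```
   # Lists MACVLAN devices visible in the host namespace, if any
   $ sudo ip link show type macvlan
   ```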
Probably I'm doing something wrong. I'd appreciate any input.
Thx