Skip to content
This repository was archived by the owner on Dec 13, 2018. It is now read-only.

cr: use the RPC protocol for communication with criu#486

Merged
vmarmol merged 7 commits into
docker-archive:criufrom
avagin:swrk
Apr 9, 2015
Merged

cr: use the RPC protocol for communication with criu#486
vmarmol merged 7 commits into
docker-archive:criufrom
avagin:swrk

Conversation

@avagin

@avagin avagin commented Mar 27, 2015

Copy link
Copy Markdown
Contributor

criu swrk will be supported starting with CRIU 1.6 (1 June 2015), so this patch is only for review by now.

criu swrk is a special mode when we don't want to execute a service, but want to use RPC.

Here is more details:
http://lists.openvz.org/pipermail/criu/2015-March/019400.html

criu swrk allows to remove the hack with saving StdFds. And in future, we will be able to handle notifications from CRIU without additional scripts.

Jenkins will fail because goprotobuf are not added in dependencies.

Cc: @SaiedKazemi

@LK4D4

LK4D4 commented Mar 27, 2015

Copy link
Copy Markdown
Contributor

That's really awesome.

@avagin avagin changed the title [RFC] cr: use the RPC protocol for communication with criu [DRAFT] cr: use the RPC protocol for communication with criu Mar 27, 2015
@SaiedKazemi

Copy link
Copy Markdown

@avagin @mrunalp
Yes, I agree that SWRK mode would be the way to go. The only issue is that currently we don't require a minimum version of CRIU because all command line options have been there for a long time (this is something that @mrunalp asked). In case someone is running a very old version of CRIU, it will print an error message and exit. Using CRIU in SWRK mode will require a version check of at least v1.6.

@xemul

xemul commented Mar 27, 2015

Copy link
Copy Markdown

@SaiedKazemi , yes but swrk mode itself exists in older versions too (it appeared in 1.2). AFAIU the only reason for 1.6 is that inherit-fd option will appear. Is absence of inherit-fd a blocker? I mean, will it be impossible to C/R anything without it?

@SaiedKazemi

Copy link
Copy Markdown

@xemul, without inherit-fd support docker logs will not work. I assume most use cases would want this functionality. But I am not the best person to answer the question whether it's a blocker or not.

@xemul

xemul commented Mar 27, 2015

Copy link
Copy Markdown

@SaiedKazemi , I see. So without swrk's inherit_fd C/R in libcontainer will have to work with CRIU's CLI to be usable.

I've looked at the patch to CRIU @avagin has sent. I think we can make CRIU 1.5.1 with this patch to make swrk mode usable for libcontainer. All the more so we will have an issue with newer kernel soon that was fixed by Oleg Nesterov from RedHat.

@mrunalp

mrunalp commented Mar 27, 2015

Copy link
Copy Markdown
Contributor

👍 I just noticed that Fedora 21 has criu 1.3.1 Does it mean that the libcontainer criu integration won't work on it? I also see that @avagin is maintaining the criu packages on Fedora. Would it be possible to support newer versions of criu on Fedora 21 (it has kernel 3.19 now)?

@SaiedKazemi

Copy link
Copy Markdown

That's a bummer. For restore to be fully functional, we needed two options that were added to CRIU v1.4. The --inherit-fd option (0412152fc5) and extending the already existing --veth-pair option (296129295a). This means that the CRIU command line options in the new libcontainer will not work with 1.3.1.

@avagin

avagin commented Mar 27, 2015

Copy link
Copy Markdown
Contributor Author

@mrunalp @SaiedKazemi @xemul I am going to update CRIU in FC21.

Ubuntu has CRIU 1.3.1 too. We need to update CRIU there too. So I think we can realease criu 1.5.1 and use criu swrk in libcontainer.

root@ubuntu:/home/avagin# criu -V
Version: 1.3.1
root@ubuntu:/home/avagin# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.10
DISTRIB_CODENAME=utopic
DISTRIB_DESCRIPTION="Ubuntu 14.10"

@xemul

xemul commented Mar 27, 2015

Copy link
Copy Markdown

OK, so the next steps would be 1.5.1 and Fedora's package update :)

@mrunalp

mrunalp commented Mar 27, 2015

Copy link
Copy Markdown
Contributor

@avagin Thanks! :)
@xemul That sounds like a plan.

@avagin avagin force-pushed the swrk branch 2 times, most recently from 73c55ec to c58d357 Compare March 30, 2015 12:24
@xemul

xemul commented Mar 31, 2015

Copy link
Copy Markdown

So, the criu-1.5.1 is out :) Feel free to pick one up and use!

@avagin avagin changed the title [DRAFT] cr: use the RPC protocol for communication with criu cr: use the RPC protocol for communication with criu Apr 1, 2015
@avagin

avagin commented Apr 1, 2015

Copy link
Copy Markdown
Contributor Author

I think this request is ready to be merged. @SaiedKazemi @crosbymichael @LK4D4 @mrunalp could you review this?

@SaiedKazemi

Copy link
Copy Markdown

@avagin @crosbymichael @LK4D4 @mrunalp @xemul
Yes, I think it's ready. There is, however, an issue with storing CRIU images directory inside the container's root. As you saw in my comment, the caller has to pass in the directory as an argument (preferably outside the container's root) to Checkpoint() and Restore().

In the background, I am working with Pavel and Tycho Andersen of Canonical to prevent CRIU's temporary cgyard mount points leaking into the container. But this should not be a show stopper for merging this PR.

@mrunalp

mrunalp commented Apr 1, 2015

Copy link
Copy Markdown
Contributor

@avagin @SaiedKazemi Thanks! Yes, I will try this out and add review comments if any (hopefully by tomorrow).

@avagin avagin force-pushed the swrk branch 2 times, most recently from a062638 to 0797e7b Compare April 2, 2015 13:20
@avagin

avagin commented Apr 2, 2015

Copy link
Copy Markdown
Contributor Author

@SaiedKazemi This request is only about criu swrk. I agree that a user should be able to specify a checkpoint directory. As for cgyard I think it's a good to split workdir and imagedir. In this case the problem with cgyard will disapear, if I don't mess something. I have added a commit which sets a separate directory as workdir for criu.

@SaiedKazemi

Copy link
Copy Markdown

@avagin Great. With your commit to pass an image directory outside container's root, we won't need to change CRIU.

@crosbymichael

Copy link
Copy Markdown
Contributor

The image directory should never be inside the container's root. That is just an implementation detail of running nsinit from the current dir. you can change it via nsinit --root. It will always and should always be in another place

@SaiedKazemi

Copy link
Copy Markdown

@crosbymichael
Yes, that's how it should be done. But the current code in Checkpoint() is hard coded to create it as:
dir := filepath.Join(c.root, "checkpoint")
which ends up in the container's root -- that's why I marked it with XXX.

@avagin said that he already has a commit that passes the image directory path to both Checkpoint() and Restore().

@avagin

avagin commented Apr 2, 2015

Copy link
Copy Markdown
Contributor Author

@SaiedKazemi you misunderstood me. I split imagedir and workdir. workdir contains temporary files. imagedir is required for restoring a container. Pls, take a look at avagin@0797e7b

@mrunalp

mrunalp commented Apr 2, 2015

Copy link
Copy Markdown
Contributor

I built criu 1.5.1 on F21 and I see a failure when I try to checkpoint a container

[root@localhost redis]# nsinit exec -t --net host /usr/local/bin/redis-server

# Another terminal
[root@localhost redis]# nsinit --criu /usr/bin/criu checkpoint                                                       
>>> criu [dump -v4 --images-dir /redis/nsinit/checkpoint --work-dir /redis/nsinit/criu.work -o dump.log --root /redis --manage-cgroups --evasive-devices -t 15126]
exit status 1
[root@localhost redis]# tail nsinit/checkpoint/dump.log 
(00.038113)   <--
(00.038115)   [./dev/mqueue](143->140)
(00.038116)   <--
(00.038117)  <--
(00.038119) <--
(00.038122) Error (mount.c:624): 145:./dev/console doesn't have a proper root mount
(00.038124) Unlock network
(00.038126) Unfreezing tasks into 1
(00.038127)     Unseizing 14482 into 1
(00.038150) Error (cr-dump.c:1979): Dumping FAILED.

@mrunalp

mrunalp commented Apr 2, 2015

Copy link
Copy Markdown
Contributor

I am also seeing the same failure on the criu branch (without these changes).

@mrunalp

mrunalp commented Apr 2, 2015

Copy link
Copy Markdown
Contributor

@avagin Could we clean up the commit history a bit? It is difficult to load the entire PR in a web page so going commit by commit to review. However, I see that some code was removed in later commits.

@mrunalp

mrunalp commented Apr 2, 2015

Copy link
Copy Markdown
Contributor

@avagin Up to you though, I can manage either ways :)

@crosbymichael

Copy link
Copy Markdown
Contributor

@SaiedKazemi container.root is not Container.Rootfs They are two different directories. The rootfs is the filesystem for the container. The container.root is where runtime state like configs are saved and can be used for criu image also.

We dont' have to worry about the TTY usecase because it's not common and we can keep this morning and get it merged soon. TTY is a nice to have, not a requirement by a long shot.

@SaiedKazemi

Copy link
Copy Markdown

@crosbymichael What I see is:
Rootfs: /home/saied/work/nsinit/busybox
root: /home/saied/work/nsinit/busybox/nsinit
Therefore, the "checkpoint" directory that Checkpoint() creates is inside the container's root filesystem. As a result, if you run mount after restore, you will see a bunch of cgyard mount points in the container.
[Terminal A]
$ sudo nsinit exec -- sh -i
[Terminal B]
$ sudo nsinit -criu /usr/local/bin/criu checkpoint
[Termina A]
$ sudo nsinit -criu /usr/local/bin/criu restore
/ # mount
...
none on /nsinit/checkpoint/.criu.cgyard.mc2dLq type tmpfs (rw,relatime)
...
If you try this, you will see the same thing. Have I missed something?

@avagin

avagin commented Apr 3, 2015

Copy link
Copy Markdown
Contributor Author

@SaiedKazemi I would recomend you to use the --root option. Default values cannot be suitable for all cases.
nsinit --root /var/run/nsinit exec ...

@crosbymichael

Copy link
Copy Markdown
Contributor

Lets change the default so it's less confusing when testing

@crosbymichael

Copy link
Copy Markdown
Contributor

I opened #507 to change it so it's less confusing for people.

@crosbymichael

Copy link
Copy Markdown
Contributor

Ok, it's merged and I also rebased the criu branch. Sorry @avagin

avagin added 6 commits April 3, 2015 23:13
In this case CRIU will exit after restoring processes. Here is
no reason to wait the init process.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
criu swrk is a special mode when we don't want to execute a service,
but want to use RPC.

Here is more details:
http://lists.openvz.org/pipermail/criu/2015-March/019400.html

Another good feature of this mode is that we don't need to create
action scripts and we will be able to remove the hack with saving StdFds.

criu swrk is supported starting with CRIU 1.5.1.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
CRIU version must be 1.5.1 or higher

Signed-off-by: Andrey Vagin <avagin@openvz.org>
This directory can be removed when criu completes.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
@avagin

avagin commented Apr 3, 2015

Copy link
Copy Markdown
Contributor Author

@crosbymichael I've rebased this branch too.

Everything must be in vendor/

Signed-off-by: Andrey Vagin <avagin@openvz.org>
@SaiedKazemi

Copy link
Copy Markdown

@crosbymichael @avagin Anything to do for this PR to merge? Also, what needs to be done for the criu branch to merge into master?

@avagin

avagin commented Apr 7, 2015

Copy link
Copy Markdown
Contributor Author

@crosbymichael @SaiedKazemi I think we can merge it into the criu branch.

@vmarmol

vmarmol commented Apr 9, 2015

Copy link
Copy Markdown
Contributor

Spoke offline with @SaiedKazemi and I'll go ahead and merge this into the CRIU branch. We've been more open with allowing some development like this in branches before we merge into master (same as we did for the API branch). Let me know if you completely disagree @crosbymichael, but I think this is Kosher from what we've done before :)

vmarmol added a commit that referenced this pull request Apr 9, 2015
cr: use the RPC protocol for communication with criu
@vmarmol vmarmol merged commit 7b3359e into docker-archive:criu Apr 9, 2015
@crosbymichael

Copy link
Copy Markdown
Contributor

@vmarmol your a maintainer just like me, no worries ;)

@SaiedKazemi

Copy link
Copy Markdown

@vmarmol @crosbymichael
Great. What's the plan to merge into master?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants