-
Notifications
You must be signed in to change notification settings - Fork 609
interfaces: add a steam-support interface #11708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The attempt to add support for futex_waitv didn't work, as it looks like snapd is being built against a too old libseccomp. It looks like we'd need libseccomp >= 2.5.4, which was released yesterday. I think it's currently being built with 2.5.3. |
interfaces/builtin/steam_support.go
Outdated
@{PROC}/@{pid}/gid_map rw, | ||
@{PROC}/@{pid}/setgroups rw, | ||
@{PROC}/@{pid}/mounts r, | ||
@{PROC}/@{pid}/mountinfo r, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these @{PROC}/@{pid}/
entries be reduced with an owner
prefix? Unfortunately @{pid}
doesn't yet mean "only this process's pid", so these rules are perhaps wider than intended.
interfaces/builtin/steam_support.go
Outdated
# when performing a bind mount). Ideally we'd write a rule that | ||
# requires the remount option in combination with any other, but | ||
# AppArmor doesn't currently support that. | ||
mount options in (rw, ro, nosuid, nodev, noexec, remount, bind, silent, relatime) -> /newroot/{,**}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to write another eleven rules like the above, with smaller subsets of these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mount calls this is intended to cover are from this code:
https://github.com/containers/bubblewrap/blob/main/bind-mount.c
It is working around the fact that calling mount() with MS_BIND causes it to ignore all flags passed other than MS_REC. So it is essentially doing the following:
- call
mount(..., MS_BIND|MS_REC, ...)
to perform the mount. - parse /proc/self/mountinfo to discover the new mounts that have been created, and decode the mount flags used for each mount.
- for each new mount, bitwise or some additional flags to the existing flags. If the result is different, perform a
mount(..., MS_SILENT|MS_BIND|MS_REMOUNT|new_flags,...)
call to update the flags.
So the exact combination of flags will depend on the existing mounts on the system, which could include however the user has configured their host sytem, as that gets exposed through /var/lib/snapd/hostfs
. So I think it'd be somewhat more than 12 possibilities if avoiding the in
syntax.
I think the full enumeration would be the cross product of:
silent, bind, remount, nosuid
: always setnodev
or nothingro
orrw
noexec
or nothingnoatime
orrelatime
or nothingnodiratime
or nothing
So I think there are 48 possibilities. Does that sound about right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yikes, that code is far more dynamic than I expected. I thought I was going to see an array of a dozen directories plus their flags in the sources... Feel free to ignore this entirely, then. Thanks for the explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general LGTM, but could you please add the owner
rule qualifier as suggested by @setharnold so we keep the least privileges? Thanks!
…set up pressure-vessel containers
This is a new syscall used to wait on multiple futexes at once, and Wine/Proton will attempt to use it if the kernel supports it. Blocking access prevents it from falling back to the other futex related syscalls.
3d29f4e
to
57cf149
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, but someone closer to our current interfaces world like @mardy should review it.
Also the unit tests are failing right now with:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks! For now please ignore the nitpick comment about generating the rules, it's better to have this merged first. And in any case I'm not sure it would be a win, so we can safely disregard that :-)
# | ||
# But that is not supported by AppArmor. So we enumerate the possible | ||
# combinations of options Bubblewrap might use. | ||
remount options=(bind, silent, nosuid, rw) /newroot/{,**}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might save some bytes if we generated this list programmatically, but it's probably not worth the effort (and if even, it should be a follow-up).
# Pivot from the intermediate root to sandbox root | ||
mount options in (rw, silent, rprivate) -> /oldroot/, | ||
umount /oldroot/, | ||
pivot_root oldroot=/newroot/ /newroot/, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I learnt something new today! Nice! :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks not unreasonable, one question
/run/pressure-vessel/** mrw, | ||
/run/host/usr/sbin/ldconfig* ixr, | ||
/run/host/usr/bin/localedef ixr, | ||
/var/cache/ldconfig/** rw, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does the snap does a layout on this one, as it comes from the base afaict?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we followup with that outside of this PR? I'd like @jhenstridge to respond, but we won't get that as timely as we'd like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, these are in the pivoted root? but is still mount /var from the snap root, so still not entirely sure how that is writable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added these rules to handle accesses made by the process within the mount namespace created by pressure-vessel/bubblewrap.
Maybe in future this could be separated out into a sub-profile, but that's complicated by the fact that the executables we'd perform the transitions on are downloaded by Steam and may have varying paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still a bit confused how that dir gets writable because the underlying dir comes from the base that is read-only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can land this but I would like to understand how that gets writable by Monday. I'm probably missing something but I'm guessing what happens just looking at the rules here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The root of the bubblewrap sandbox is a tmpfs, so any path not mounted over is potentially writeable. It isn't exposing the whole host system /run
or /var
, so these locations are writeable (at least when AppArmor allows it).
@jhenstridge I'm merging this but I would still like some better understanding of how that rw cache directory works on Monday. |
This interface is intended to provide some additional permissions needed by the steam snap. At present, this is primarily AppArmor and seccomp rules to allow Steam to launch pressure-vessel containers, which it uses to provide a consistent runtime environment to some games (at the moment mainly Windows games it runs under Proton/Wine). PV is based on Bubblewrap, as used by Flatpak and various other process sandboxes on GNOME systems. Related to getting Steam games to run, I've added the futex_waitv syscall to the base template. Although the Ubuntu kernels don't yet support this syscall, we want to let Proton try to call it so it will fall back to the old futex API. As this has essentially the same security concerns as the existing futex syscalls, it seemed sensible to add it to the base template rather than the steam-support interface. snap-seccomp knows about this syscall as of 15th April, when PR #11674 was merged. * interfaces: add a steam-support interface with permissions needed to set up pressure-vessel containers * interfaces/seccomp: add futex_waitv to the base template This is a new syscall used to wait on multiple futexes at once, and Wine/Proton will attempt to use it if the kernel supports it. Blocking access prevents it from falling back to the other futex related syscalls. * tests: add steam-support to policy snap * interfaces: limit proc access to same owner in steam interface * interfaces: lock down the remount AppArmor rules for steam-support * interfaces: allow pressure-vessel to mount tmpfs to mask certain directories * interfaces/policy: add base declaration tests for steam-support
This interface is intended to provide some additional permissions needed by the steam snap.
At present, this is primarily AppArmor and seccomp rules to allow Steam to launch pressure-vessel containers, which it uses to provide a consistent runtime environment to some games (at the moment mainly Windows games it runs under Proton/Wine). PV is based on Bubblewrap, as used by Flatpak and various other process sandboxes on GNOME systems.
Related to getting Steam games to run, I've added the futex_waitv syscall to the base template. Although the Ubuntu kernels don't yet support this syscall, we want to let Proton try to call it so it will fall back to the old futex API. As this has essentially the same security concerns as the existing futex syscalls, it seemed sensible to add it to the base template rather than the steam-support interface.
snap-seccomp knows about this syscall as of 15th April, when PR #11674 was merged.