Skip to content

Commit a43edc7

Browse files
Christian Braunergregkh
authored andcommitted
binfmt_misc: cleanup on filesystem umount
[ Upstream commit 1c5976e ] Currently, registering a new binary type pins the binfmt_misc filesystem. Specifically, this means that as long as there is at least one binary type registered the binfmt_misc filesystem survives all umounts, i.e. the superblock is not destroyed. Meaning that a umount followed by another mount will end up with the same superblock and the same binary type handlers. This is a behavior we tend to discourage for any new filesystems (apart from a few special filesystems such as e.g. configfs or debugfs). A umount operation without the filesystem being pinned - by e.g. someone holding a file descriptor to an open file - should usually result in the destruction of the superblock and all associated resources. This makes introspection easier and leads to clearly defined, simple and clean semantics. An administrator can rely on the fact that a umount will guarantee a clean slate making it possible to reinitialize a filesystem. Right now all binary types would need to be explicitly deleted before that can happen. This allows us to remove the heavy-handed calls to simple_pin_fs() and simple_release_fs() when creating and deleting binary types. This in turn allows us to replace the current brittle pinning mechanism abusing dget() which has caused a range of bugs judging from prior fixes in [2] and [3]. The additional dget() in load_misc_binary() pins the dentry but only does so for the sake to prevent ->evict_inode() from freeing the node when a user removes the binary type and kill_node() is run. Which would mean ->interpreter and ->interp_file would be freed causing a UAF. This isn't really nicely documented nor is it very clean because it relies on simple_pin_fs() pinning the filesystem as long as at least one binary type exists. Otherwise it would cause load_misc_binary() to hold on to a dentry belonging to a superblock that has been shutdown. Replace that implicit pinning with a clean and simple per-node refcount and get rid of the ugly dget() pinning. A similar mechanism exists for e.g. binderfs (cf. [4]). All the cleanup work can now be done in ->evict_inode(). In a follow-up patch we will make it possible to use binfmt_misc in sandboxes. We will use the cleaner semantics where a umount for the filesystem will cause the superblock and all resources to be deallocated. In preparation for this apply the same semantics to the initial binfmt_misc mount. Note, that this is a user-visible change and as such a uapi change but one that we can reasonably risk. We've discussed this in earlier versions of this patchset (cf. [1]). The main user and provider of binfmt_misc is systemd. Systemd provides binfmt_misc via autofs since it is configurable as a kernel module and is used by a few exotic packages and users. As such a binfmt_misc mount is triggered when /proc/sys/fs/binfmt_misc is accessed and is only provided on demand. Other autofs on demand filesystems include EFI ESP which systemd umounts if the mountpoint stays idle for a certain amount of time. This doesn't apply to the binfmt_misc autofs mount which isn't touched once it is mounted meaning this change can't accidently wipe binary type handlers without someone having explicitly unmounted binfmt_misc. After speaking to systemd folks they don't expect this change to affect them. In line with our general policy, if we see a regression for systemd or other users with this change we will switch back to the old behavior for the initial binfmt_misc mount and have binary types pin the filesystem again. But while we touch this code let's take the chance and let's improve on the status quo. [1]: https://lore.kernel.org/r/[email protected] [2]: commit 43a4f26 ("exec: binfmt_misc: fix race between load_misc_binary() and kill_node()" [3]: commit 83f9182 ("exec: binfmt_misc: shift filp_close(interp_file) from kill_node() to bm_evict_inode()") [4]: commit f0fe2c0 ("binder: prevent UAF for binderfs devices II") Link: https://lore.kernel.org/r/[email protected] (v1) Cc: Sargun Dhillon <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: Jann Horn <[email protected]> Cc: Henning Schild <[email protected]> Cc: Andrei Vagin <[email protected]> Cc: Al Viro <[email protected]> Cc: Laurent Vivier <[email protected]> Cc: [email protected] Acked-by: Serge Hallyn <[email protected]> Signed-off-by: Christian Brauner <[email protected]> Signed-off-by: Christian Brauner <[email protected]> Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
1 parent c13541c commit a43edc7

File tree

1 file changed

+168
-48
lines changed

1 file changed

+168
-48
lines changed

fs/binfmt_misc.c

Lines changed: 168 additions & 48 deletions
Original file line numberDiff line numberDiff line change
@@ -60,12 +60,11 @@ typedef struct {
6060
char *name;
6161
struct dentry *dentry;
6262
struct file *interp_file;
63+
refcount_t users; /* sync removal with load_misc_binary() */
6364
} Node;
6465

6566
static DEFINE_RWLOCK(entries_lock);
6667
static struct file_system_type bm_fs_type;
67-
static struct vfsmount *bm_mnt;
68-
static int entry_count;
6968

7069
/*
7170
* Max length of the register string. Determined by:
@@ -82,19 +81,23 @@ static int entry_count;
8281
*/
8382
#define MAX_REGISTER_LENGTH 1920
8483

85-
/*
86-
* Check if we support the binfmt
87-
* if we do, return the node, else NULL
88-
* locking is done in load_misc_binary
84+
/**
85+
* search_binfmt_handler - search for a binary handler for @bprm
86+
* @misc: handle to binfmt_misc instance
87+
* @bprm: binary for which we are looking for a handler
88+
*
89+
* Search for a binary type handler for @bprm in the list of registered binary
90+
* type handlers.
91+
*
92+
* Return: binary type list entry on success, NULL on failure
8993
*/
90-
static Node *check_file(struct linux_binprm *bprm)
94+
static Node *search_binfmt_handler(struct linux_binprm *bprm)
9195
{
9296
char *p = strrchr(bprm->interp, '.');
93-
struct list_head *l;
97+
Node *e;
9498

9599
/* Walk all the registered handlers. */
96-
list_for_each(l, &entries) {
97-
Node *e = list_entry(l, Node, list);
100+
list_for_each_entry(e, &entries, list) {
98101
char *s;
99102
int j;
100103

@@ -123,9 +126,49 @@ static Node *check_file(struct linux_binprm *bprm)
123126
if (j == e->size)
124127
return e;
125128
}
129+
126130
return NULL;
127131
}
128132

133+
/**
134+
* get_binfmt_handler - try to find a binary type handler
135+
* @misc: handle to binfmt_misc instance
136+
* @bprm: binary for which we are looking for a handler
137+
*
138+
* Try to find a binfmt handler for the binary type. If one is found take a
139+
* reference to protect against removal via bm_{entry,status}_write().
140+
*
141+
* Return: binary type list entry on success, NULL on failure
142+
*/
143+
static Node *get_binfmt_handler(struct linux_binprm *bprm)
144+
{
145+
Node *e;
146+
147+
read_lock(&entries_lock);
148+
e = search_binfmt_handler(bprm);
149+
if (e)
150+
refcount_inc(&e->users);
151+
read_unlock(&entries_lock);
152+
return e;
153+
}
154+
155+
/**
156+
* put_binfmt_handler - put binary handler node
157+
* @e: node to put
158+
*
159+
* Free node syncing with load_misc_binary() and defer final free to
160+
* load_misc_binary() in case it is using the binary type handler we were
161+
* requested to remove.
162+
*/
163+
static void put_binfmt_handler(Node *e)
164+
{
165+
if (refcount_dec_and_test(&e->users)) {
166+
if (e->flags & MISC_FMT_OPEN_FILE)
167+
filp_close(e->interp_file, NULL);
168+
kfree(e);
169+
}
170+
}
171+
129172
/*
130173
* the loader itself
131174
*/
@@ -139,12 +182,7 @@ static int load_misc_binary(struct linux_binprm *bprm)
139182
if (!enabled)
140183
return retval;
141184

142-
/* to keep locking time low, we copy the interpreter string */
143-
read_lock(&entries_lock);
144-
fmt = check_file(bprm);
145-
if (fmt)
146-
dget(fmt->dentry);
147-
read_unlock(&entries_lock);
185+
fmt = get_binfmt_handler(bprm);
148186
if (!fmt)
149187
return retval;
150188

@@ -196,7 +234,16 @@ static int load_misc_binary(struct linux_binprm *bprm)
196234

197235
retval = 0;
198236
ret:
199-
dput(fmt->dentry);
237+
238+
/*
239+
* If we actually put the node here all concurrent calls to
240+
* load_misc_binary() will have finished. We also know
241+
* that for the refcount to be zero ->evict_inode() must have removed
242+
* the node to be deleted from the list. All that is left for us is to
243+
* close and free.
244+
*/
245+
put_binfmt_handler(fmt);
246+
200247
return retval;
201248
}
202249

@@ -551,30 +598,90 @@ static struct inode *bm_get_inode(struct super_block *sb, int mode)
551598
return inode;
552599
}
553600

601+
/**
602+
* bm_evict_inode - cleanup data associated with @inode
603+
* @inode: inode to which the data is attached
604+
*
605+
* Cleanup the binary type handler data associated with @inode if a binary type
606+
* entry is removed or the filesystem is unmounted and the super block is
607+
* shutdown.
608+
*
609+
* If the ->evict call was not caused by a super block shutdown but by a write
610+
* to remove the entry or all entries via bm_{entry,status}_write() the entry
611+
* will have already been removed from the list. We keep the list_empty() check
612+
* to make that explicit.
613+
*/
554614
static void bm_evict_inode(struct inode *inode)
555615
{
556616
Node *e = inode->i_private;
557617

558-
if (e && e->flags & MISC_FMT_OPEN_FILE)
559-
filp_close(e->interp_file, NULL);
560-
561618
clear_inode(inode);
562-
kfree(e);
619+
620+
if (e) {
621+
write_lock(&entries_lock);
622+
if (!list_empty(&e->list))
623+
list_del_init(&e->list);
624+
write_unlock(&entries_lock);
625+
put_binfmt_handler(e);
626+
}
563627
}
564628

565-
static void kill_node(Node *e)
629+
/**
630+
* unlink_binfmt_dentry - remove the dentry for the binary type handler
631+
* @dentry: dentry associated with the binary type handler
632+
*
633+
* Do the actual filesystem work to remove a dentry for a registered binary
634+
* type handler. Since binfmt_misc only allows simple files to be created
635+
* directly under the root dentry of the filesystem we ensure that we are
636+
* indeed passed a dentry directly beneath the root dentry, that the inode
637+
* associated with the root dentry is locked, and that it is a regular file we
638+
* are asked to remove.
639+
*/
640+
static void unlink_binfmt_dentry(struct dentry *dentry)
566641
{
567-
struct dentry *dentry;
642+
struct dentry *parent = dentry->d_parent;
643+
struct inode *inode, *parent_inode;
644+
645+
/* All entries are immediate descendants of the root dentry. */
646+
if (WARN_ON_ONCE(dentry->d_sb->s_root != parent))
647+
return;
568648

649+
/* We only expect to be called on regular files. */
650+
inode = d_inode(dentry);
651+
if (WARN_ON_ONCE(!S_ISREG(inode->i_mode)))
652+
return;
653+
654+
/* The parent inode must be locked. */
655+
parent_inode = d_inode(parent);
656+
if (WARN_ON_ONCE(!inode_is_locked(parent_inode)))
657+
return;
658+
659+
if (simple_positive(dentry)) {
660+
dget(dentry);
661+
simple_unlink(parent_inode, dentry);
662+
d_delete(dentry);
663+
dput(dentry);
664+
}
665+
}
666+
667+
/**
668+
* remove_binfmt_handler - remove a binary type handler
669+
* @misc: handle to binfmt_misc instance
670+
* @e: binary type handler to remove
671+
*
672+
* Remove a binary type handler from the list of binary type handlers and
673+
* remove its associated dentry. This is called from
674+
* binfmt_{entry,status}_write(). In the future, we might want to think about
675+
* adding a proper ->unlink() method to binfmt_misc instead of forcing caller's
676+
* to use writes to files in order to delete binary type handlers. But it has
677+
* worked for so long that it's not a pressing issue.
678+
*/
679+
static void remove_binfmt_handler(Node *e)
680+
{
569681
write_lock(&entries_lock);
570682
list_del_init(&e->list);
571683
write_unlock(&entries_lock);
572-
573-
dentry = e->dentry;
574-
drop_nlink(d_inode(dentry));
575-
d_drop(dentry);
576-
dput(dentry);
577-
simple_release_fs(&bm_mnt, &entry_count);
684+
unlink_binfmt_dentry(e->dentry);
578685
}
579686

580687
/* /<entry> */
@@ -601,8 +708,8 @@ bm_entry_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
601708
static ssize_t bm_entry_write(struct file *file, const char __user *buffer,
602709
size_t count, loff_t *ppos)
603710
{
604-
struct dentry *root;
605-
Node *e = file_inode(file)->i_private;
711+
struct inode *inode = file_inode(file);
712+
Node *e = inode->i_private;
606713
int res = parse_command(buffer, count);
607714

608715
switch (res) {
@@ -616,13 +723,22 @@ static ssize_t bm_entry_write(struct file *file, const char __user *buffer,
616723
break;
617724
case 3:
618725
/* Delete this handler. */
619-
root = file_inode(file)->i_sb->s_root;
620-
inode_lock(d_inode(root));
726+
inode = d_inode(inode->i_sb->s_root);
727+
inode_lock(inode);
621728

729+
/*
730+
* In order to add new element or remove elements from the list
731+
* via bm_{entry,register,status}_write() inode_lock() on the
732+
* root inode must be held.
733+
* The lock is exclusive ensuring that the list can't be
734+
* modified. Only load_misc_binary() can access but does so
735+
* read-only. So we only need to take the write lock when we
736+
* actually remove the entry from the list.
737+
*/
622738
if (!list_empty(&e->list))
623-
kill_node(e);
739+
remove_binfmt_handler(e);
624740

625-
inode_unlock(d_inode(root));
741+
inode_unlock(inode);
626742
break;
627743
default:
628744
return res;
@@ -681,13 +797,7 @@ static ssize_t bm_register_write(struct file *file, const char __user *buffer,
681797
if (!inode)
682798
goto out2;
683799

684-
err = simple_pin_fs(&bm_fs_type, &bm_mnt, &entry_count);
685-
if (err) {
686-
iput(inode);
687-
inode = NULL;
688-
goto out2;
689-
}
690-
800+
refcount_set(&e->users, 1);
691801
e->dentry = dget(dentry);
692802
inode->i_private = e;
693803
inode->i_fop = &bm_entry_operations;
@@ -731,7 +841,8 @@ static ssize_t bm_status_write(struct file *file, const char __user *buffer,
731841
size_t count, loff_t *ppos)
732842
{
733843
int res = parse_command(buffer, count);
734-
struct dentry *root;
844+
Node *e, *next;
845+
struct inode *inode;
735846

736847
switch (res) {
737848
case 1:
@@ -744,13 +855,22 @@ static ssize_t bm_status_write(struct file *file, const char __user *buffer,
744855
break;
745856
case 3:
746857
/* Delete all handlers. */
747-
root = file_inode(file)->i_sb->s_root;
748-
inode_lock(d_inode(root));
858+
inode = d_inode(file_inode(file)->i_sb->s_root);
859+
inode_lock(inode);
749860

750-
while (!list_empty(&entries))
751-
kill_node(list_first_entry(&entries, Node, list));
861+
/*
862+
* In order to add new element or remove elements from the list
863+
* via bm_{entry,register,status}_write() inode_lock() on the
864+
* root inode must be held.
865+
* The lock is exclusive ensuring that the list can't be
866+
* modified. Only load_misc_binary() can access but does so
867+
* read-only. So we only need to take the write lock when we
868+
* actually remove the entry from the list.
869+
*/
870+
list_for_each_entry_safe(e, next, &entries, list)
871+
remove_binfmt_handler(e);
752872

753-
inode_unlock(d_inode(root));
873+
inode_unlock(inode);
754874
break;
755875
default:
756876
return res;

0 commit comments

Comments
 (0)