Commit 2761713
committed
rbd: fix copyup completion race
For write/discard obj_requests that involved a copyup method call, the
opcode of the first op is CEPH_OSD_OP_CALL and the ->callback is
rbd_img_obj_copyup_callback(). The latter frees copyup pages, sets
->xferred and delegates to rbd_img_obj_callback(), the "normal" image
object callback, for reporting to block layer and putting refs.
rbd_osd_req_callback() however treats CEPH_OSD_OP_CALL as a trivial op,
which means obj_request is marked done in rbd_osd_trivial_callback(),
*before* ->callback is invoked and rbd_img_obj_copyup_callback() has
a chance to run. Marking obj_request done essentially means giving
rbd_img_obj_callback() a license to end it at any moment, so if another
obj_request from the same img_request is being completed concurrently,
rbd_img_obj_end_request() may very well be called on such prematurally
marked done request:
<obj_request-1/2 reply>
handle_reply()
rbd_osd_req_callback()
rbd_osd_trivial_callback()
rbd_obj_request_complete()
rbd_img_obj_copyup_callback()
rbd_img_obj_callback()
<obj_request-2/2 reply>
handle_reply()
rbd_osd_req_callback()
rbd_osd_trivial_callback()
for_each_obj_request(obj_request->img_request) {
rbd_img_obj_end_request(obj_request-1/2)
rbd_img_obj_end_request(obj_request-2/2) <--
}
Calling rbd_img_obj_end_request() on such a request leads to trouble,
in particular because its ->xfferred is 0. We report 0 to the block
layer with blk_update_request(), get back 1 for "this request has more
data in flight" and then trip on
rbd_assert(more ^ (which == img_request->obj_request_count));
with rhs (which == ...) being 1 because rbd_img_obj_end_request() has
been called for both requests and lhs (more) being 1 because we haven't
got a chance to set ->xfferred in rbd_img_obj_copyup_callback() yet.
To fix this, leverage that rbd wants to call class methods in only two
cases: one is a generic method call wrapper (obj_request is standalone)
and the other is a copyup (obj_request is part of an img_request). So
make a dedicated handler for CEPH_OSD_OP_CALL and directly invoke
rbd_img_obj_copyup_callback() from it if obj_request is part of an
img_request, similar to how CEPH_OSD_OP_READ handler invokes
rbd_img_obj_request_read_callback().
Since rbd_img_obj_copyup_callback() is now being called from the OSD
request callback (only), it is renamed to rbd_osd_copyup_callback().
Cc: Alex Elder <[email protected]>
Cc: [email protected] # 3.10+, needs backporting for < 3.18
Signed-off-by: Ilya Dryomov <[email protected]>
Reviewed-by: Alex Elder <[email protected]>1 parent fc927cd commit 2761713
1 file changed
+17
-5
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
523 | 523 | | |
524 | 524 | | |
525 | 525 | | |
| 526 | + | |
526 | 527 | | |
527 | 528 | | |
528 | 529 | | |
| |||
1818 | 1819 | | |
1819 | 1820 | | |
1820 | 1821 | | |
| 1822 | + | |
| 1823 | + | |
| 1824 | + | |
| 1825 | + | |
| 1826 | + | |
| 1827 | + | |
| 1828 | + | |
| 1829 | + | |
| 1830 | + | |
| 1831 | + | |
1821 | 1832 | | |
1822 | 1833 | | |
1823 | 1834 | | |
| |||
1866 | 1877 | | |
1867 | 1878 | | |
1868 | 1879 | | |
| 1880 | + | |
| 1881 | + | |
1869 | 1882 | | |
1870 | 1883 | | |
1871 | 1884 | | |
| |||
2530 | 2543 | | |
2531 | 2544 | | |
2532 | 2545 | | |
2533 | | - | |
| 2546 | + | |
2534 | 2547 | | |
2535 | 2548 | | |
2536 | 2549 | | |
2537 | 2550 | | |
2538 | 2551 | | |
2539 | 2552 | | |
| 2553 | + | |
| 2554 | + | |
2540 | 2555 | | |
2541 | 2556 | | |
2542 | 2557 | | |
| |||
2563 | 2578 | | |
2564 | 2579 | | |
2565 | 2580 | | |
2566 | | - | |
2567 | | - | |
2568 | | - | |
| 2581 | + | |
2569 | 2582 | | |
2570 | 2583 | | |
2571 | 2584 | | |
| |||
2650 | 2663 | | |
2651 | 2664 | | |
2652 | 2665 | | |
2653 | | - | |
2654 | 2666 | | |
2655 | 2667 | | |
2656 | 2668 | | |
| |||
0 commit comments