From b490792c18f74b76ec8161721c1e07f810e36309 Mon Sep 17 00:00:00 2001 From: George Dunlap Date: Thu, 10 Oct 2019 17:57:49 +0100 Subject: [PATCH 02/11] x86/mm: Don't re-set PGT_pinned on a partially de-validated page When unpinning pagetables, if an operation is interrupted, relinquish_memory() re-sets PGT_pinned so that the un-pin will pickedup again when the hypercall restarts. This is appropriate when put_page_and_type_preemptible() returns -EINTR, which indicates that the page is back in its initial state (i.e., completely validated). However, for -ERESTART, this leads to a state where a page has both PGT_pinned and PGT_partial set. This happens to work at the moment, although it's not really a "canonical" state; but in subsequent patches, where we need to make a distinction in handling between PGT_validated and PGT_partial pages, this causes issues. Move to a "canonical" state by: - Only re-setting PGT_pinned on -EINTR - Re-dropping the refcount held by PGT_pinned on -ERESTART In the latter case, the PGT_partial bit will be cleared further down with the rest of the other PGT_partial pages. While here, clean up some trainling whitespace. This is part of XSA-299. Reported-by: George Dunlap Signed-off-by: George Dunlap Reviewed-by: Jan Beulich --- xen/arch/x86/domain.c | 31 ++++++++++++++++++++++++++++--- 1 file changed, 28 insertions(+), 3 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 2585327834..59df8a6d8d 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -114,7 +114,7 @@ static void play_dead(void) * this case, heap corruption or #PF can occur (when heap debugging is * enabled). For example, even printk() can involve tasklet scheduling, * which touches per-cpu vars. - * + * * Consider very carefully when adding code to *dead_idle. Most hypervisor * subsystems are unsafe to call. */ @@ -1909,9 +1909,34 @@ static int relinquish_memory( break; case -ERESTART: case -EINTR: + /* + * -EINTR means PGT_validated has been re-set; re-set + * PGT_pinned again so that it gets picked up next time + * around. + * + * -ERESTART, OTOH, means PGT_partial is set instead. Put + * it back on the list, but don't set PGT_pinned; the + * section below will finish off de-validation. But we do + * need to drop the general ref associated with + * PGT_pinned, since put_page_and_type_preemptible() + * didn't do it. + * + * NB we can do an ASSERT for PGT_validated, since we + * "own" the type ref; but theoretically, the PGT_partial + * could be cleared by someone else. + */ + if ( ret == -EINTR ) + { + ASSERT(page->u.inuse.type_info & PGT_validated); + set_bit(_PGT_pinned, &page->u.inuse.type_info); + } + else + put_page(page); + ret = -ERESTART; + + /* Put the page back on the list and drop the ref we grabbed above */ page_list_add(page, list); - set_bit(_PGT_pinned, &page->u.inuse.type_info); put_page(page); goto out; default: @@ -2161,7 +2186,7 @@ void vcpu_kick(struct vcpu *v) * pending flag. These values may fluctuate (after all, we hold no * locks) but the key insight is that each change will cause * evtchn_upcall_pending to be polled. - * + * * NB2. We save the running flag across the unblock to avoid a needless * IPI for domains that we IPI'd to unblock. */ -- 2.23.0