[bug report & help] arm64: ltp testcase "migrate_pages01" failed

Yisheng Xie xieyisheng1 at huawei.com
Tue Oct 17 06:19:26 PDT 2017


Hi Will,

On 2017/10/17 17:23, Will Deacon wrote:
> On Tue, Oct 17, 2017 at 10:58:53AM +0800, Tan Xiaojun wrote:
>> I'm not sure if this is the problem on arm64 numa. What do you think ?
>> By the way, this testcase can be successful in any case on x86.
> 
> To be honest, this isn't a particularly helpful bug report. I appreciate
> that a test is reporting failure, but it doesn't look like you've spent
> very much effort to understand what the test is trying to do and why it
> thinks it's failed to do it. All I can sensibly do with your bug report
> is run the test myself, and it passes on the systems I have available.
> 
> So, you need to:
> 
> 1. Understand what the test is doing.
> 2. Figure out which bit isn't doing what it's supposed to
> 3. See if that part can be isolated to trigger the problem
> 
> At that point, it should be possible to describe the unexpected behaviour
> at a level which we can actually investigate if necessary.

This test case checks that migration fails when the user calls
SYSC_migrate_pages with an invalid node. E.g., we have 4 nodes, 0-3, and try to
migrate to node 4. This should return -EINVAL.

However, the kernel migrates the memory to node 0 and returns success (i.e. 0).
The root cause is that
	nodes_subset(*new, node_states[N_MEMORY])

returns true when new = 0x10, node_states[N_MEMORY] = 0xf and MAX_NUMNODES = 4.

This is a generic issue: I can also reproduce it on x86-64 with a suitable
config, e.g. CONFIG_NODES_SHIFT=3 on a system with 8 nodes.

IMO, if nbits=4, then 0x0, 0x10 or 0xFF..F0 should not be a subset of anything,
so the following patch may fix this problem:

From: Yisheng Xie <xieyisheng1 at huawei.com>
Date: Tue, 17 Oct 2017 20:53:55 +0800
Subject: [PATCH] bitmap: fix corner case of bitmap_subset

As Xiaojun reported, the LTP test migrate_pages01 fails on a system that
has 4 nodes with CONFIG_NODES_SHIFT=2:

migrate_pages01    0  TINFO  :  test_invalid_nodes
migrate_pages01   14  TFAIL  :  migrate_pages_common.c:45: unexpected failure - returned value = 0, expected: -1
migrate_pages01   15  TFAIL  :  migrate_pages_common.c:55: call succeeded unexpectedly

The root cause is that
	nodes_subset(*new, node_states[N_MEMORY])

returns true in a case like: new = 0x10, node_states[N_MEMORY] = 0xf,
MAX_NUMNODES = 4.

Fix it by correcting the corner case in bitmap_subset, so that 0x0, 0x10 or
0xFF..F0 is not a subset of any bitmap when the bitmap length is 4.

Reported-by: Tan Xiaojun <tanxiaojun at huawei.com>
Signed-off-by: Yisheng Xie <xieyisheng1 at huawei.com>
---
 include/linux/bitmap.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 700cf5f..bc66978 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -283,6 +283,8 @@ static inline int bitmap_intersects(const unsigned long *src1,
 static inline int bitmap_subset(const unsigned long *src1,
 			const unsigned long *src2, unsigned int nbits)
 {
+	if (!(*src1 & BITMAP_LAST_WORD_MASK(nbits)))
+		return false;
 	if (small_const_nbits(nbits))
 		return ! ((*src1 & ~(*src2)) & BITMAP_LAST_WORD_MASK(nbits));
 	else
-- 
1.7.12.4

Thanks
Yisheng Xie

> 
> Will
> 
>> On 2017/10/16 19:42, Tan Xiaojun wrote:
>>> Hi all,
>>>
>>> I ran LTP on a Hisilicon D05 board and the testcase "migrate_pages01" failed.
>>>
>>> More precisely, the sub-testcase "test_invalid_nodes" failed. It looks for an invalid NUMA node and tries to migrate memory pages to that node via the "migrate_pages" syscall.
>>> The expected result is a return value of "-1", but the syscall actually returns "0".
>>>
>>> --------------------------------------------------------
>>> # ./migrate_pages01
>>> migrate_pages01    0  TINFO  :  test_empty_mask
>>> migrate_pages01    1  TPASS  :  expected ret success: returned value = 0
>>> migrate_pages01    0  TINFO  :  test_invalid_pid -1
>>> migrate_pages01    2  TPASS  :  expected ret success: returned value = -1
>>> migrate_pages01    3  TPASS  :  expected failure: TEST_ERRNO=ESRCH(3): No such process
>>> migrate_pages01    0  TINFO  :  test_invalid_pid unused pid
>>> migrate_pages01    4  TPASS  :  expected ret success: returned value = -1
>>> migrate_pages01    5  TPASS  :  expected failure: TEST_ERRNO=ESRCH(3): No such process
>>> migrate_pages01    0  TINFO  :  test_invalid_masksize
>>> migrate_pages01    6  TPASS  :  expected ret success: returned value = -1
>>> migrate_pages01    7  TPASS  :  expected failure: TEST_ERRNO=EINVAL(22): Invalid argument
>>> migrate_pages01    0  TINFO  :  test_invalid_mem -1
>>> migrate_pages01    8  TPASS  :  expected ret success: returned value = -1
>>> migrate_pages01    9  TPASS  :  expected failure: TEST_ERRNO=EFAULT(14): Bad address
>>> migrate_pages01    0  TINFO  :  test_invalid_mem invalid prot
>>> migrate_pages01   10  TPASS  :  expected ret success: returned value = -1
>>> migrate_pages01   11  TPASS  :  expected failure: TEST_ERRNO=EFAULT(14): Bad address
>>> migrate_pages01    0  TINFO  :  test_invalid_mem unmmaped
>>> migrate_pages01   12  TPASS  :  expected ret success: returned value = -1
>>> migrate_pages01   13  TPASS  :  expected failure: TEST_ERRNO=EFAULT(14): Bad address
>>> migrate_pages01    0  TINFO  :  test_invalid_nodes
>>> migrate_pages01   14  TFAIL  :  migrate_pages_common.c:45: unexpected failure - returned value = 0, expected: -1
>>> migrate_pages01   15  TFAIL  :  migrate_pages_common.c:55: call succeeded unexpectedly
>>> migrate_pages01    0  TINFO  :  test_invalid_perm
>>> migrate_pages01   16  TPASS  :  expected ret success: returned value = -1
>>> migrate_pages01   17  TPASS  :  expected failure: TEST_ERRNO=EPERM(1): Operation not permitted
>>> --------------------------------------------------------
>>>
>>> While debugging I found something interesting: this case does not always fail.
>>>
>>> 1) If one or more NUMA nodes have no memory, the case passes, as below:
>>>
>>> --------------------
>>> available: 4 nodes (0-3)
>>> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
>>> node 0 size: 65309 MB
>>> node 0 free: 61650 MB
>>> node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
>>> node 1 size: 65404 MB
>>> node 1 free: 61377 MB
>>> node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
>>> node 2 size: 65401 MB
>>> node 2 free: 62316 MB
>>> node 3 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
>>> node 3 size: 0 MB
>>> node 3 free: 0 MB
>>> node distances:
>>> node   0   1   2   3
>>>   0:  10  15  20  20
>>>   1:  15  10  20  20
>>>   2:  20  20  10  15
>>>   3:  20  20  15  10
>>> ---------------------
>>>
>>> The testcase picks node 3 and tries to migrate pages to it. The "migrate_pages" syscall returns -1, so the test succeeds.
>>>
>>> 2) In most cases all nodes have memory, and the testcase picks a non-existent node such as node 4. The "migrate_pages" syscall should also return -1 here, but it actually returns 0.
>>> So the testcase fails.
>>>
>>> I think this is a problem on arm64, but I am not familiar with NUMA, so I am asking for your help.
>>>
>>> Thanks.
>>> Xiaojun.