Description
Noticed these failures when I was investigating some disabled tracing tests in #56507. These failures are unrelated to the tests I turned back on in that PR, so I looked at the history.
net6.0-Linux-Debug-x64-CoreCLR_release-Ubuntu.1804.Amd64.Open
/datadisks/disk1/work/B3F20994/w/C4E20A47/e /datadisks/disk1/work/B3F20994/w/C4E20A47/e
Discovering: System.Runtime.Tests (method display = ClassAndMethod, method display options = None)
Discovered: System.Runtime.Tests (found 28 of 6255 test cases)
Starting: System.Runtime.Tests (parallel test collections = on, max threads = 2)
./RunTests.sh: line 162: 11202 Killed "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.Runtime.Tests.runtimeconfig.json --depsfile System.Runtime.Tests.deps.json xunit.console.dll System.Runtime.Tests.dll -xml testResults.xml -nologo -nocolor -trait category=OuterLoop -notrait category=IgnoreForCI -notrait category=failing $RSP_FILE
/datadisks/disk1/work/B3F20994/w/C4E20A47/e
----- end Thu Jul 29 01:24:36 UTC 2021 ----- exit code 137 ----------------------------------------------------------
exit code 137 means SIGKILL Killed eg by kill
ulimit -c value: unlimited
[ 2439.914551] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 2439.914551] 251 total pagecache pages
[ 2439.914552] 0 pages in swap cache
[ 2439.914553] Swap cache stats: add 0, delete 0, find 0/0
[ 2439.914553] Free swap = 0kB
[ 2439.914553] Total swap = 0kB
[ 2439.914554] 2097038 pages RAM
[ 2439.914554] 0 pages HighMem/MovableOnly
[ 2439.914555] 58679 pages reserved
[ 2439.914555] 0 pages cma reserved
[ 2439.914555] 0 pages hwpoisoned
[ 2439.914556] Tasks state (memory values in pages):
[ 2439.914556] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[ 2439.914560] [ 447] 0 447 43216 215 331776 0 0 systemd-journal
[ 2439.914562] [ 470] 0 470 24428 43 94208 0 0 lvmetad
[ 2439.914563] [ 476] 0 476 11204 566 131072 0 -1000 systemd-udevd
[ 2439.914564] [ 523] 0 523 3005 229 69632 0 0 hv_kvp_daemon
[ 2439.914565] [ 896] 62583 896 35489 133 184320 0 0 systemd-timesyn
[ 2439.914566] [ 1024] 100 1024 20021 151 176128 0 0 systemd-network
[ 2439.914567] [ 1062] 101 1062 17697 173 176128 0 0 systemd-resolve
[ 2439.914569] [ 1319] 0 1319 20058 3259 204800 0 0 python3
[ 2439.914570] [ 1332] 0 1332 15545 168 155648 0 0 systemd-logind
[ 2439.914571] [ 1333] 0 1333 42739 1957 229376 0 0 networkd-dispat
[ 2439.914572] [ 1336] 0 1336 40270 32 86016 0 0 lxcfs
[ 2439.914573] [ 1338] 103 1338 12514 160 143360 0 -900 dbus-daemon
[ 2439.914574] [ 1366] 0 1366 72000 214 188416 0 0 accounts-daemon
[ 2439.914575] [ 1372] 0 1372 27605 56 114688 0 0 irqbalance
[ 2439.914576] [ 1381] 0 1381 7084 51 94208 0 0 atd
[ 2439.914577] [ 1382] 102 1382 66817 364 163840 0 0 rsyslogd
[ 2439.914578] [ 1391] 0 1391 7938 73 98304 0 0 cron
[ 2439.914579] [ 1393] 0 1393 226267 6655 286720 0 -999 containerd
[ 2439.914580] [ 1397] 0 1397 4104 38 73728 0 0 agetty
[ 2439.914581] [ 1408] 0 1408 3723 32 69632 0 0 agetty
[ 2439.914582] [ 1436] 0 1436 72221 197 200704 0 0 polkitd
[ 2439.914583] [ 1622] 0 1622 1128 17 53248 0 0 none
[ 2439.914584] [ 1785] 0 1785 18076 181 176128 0 -1000 sshd
[ 2439.914585] [ 1806] 0 1806 96545 4082 266240 0 0 python3
[ 2439.914586] [ 2473] 1000 2473 2899 66 65536 0 0 helix.sh
[ 2439.914588] [ 2928] 0 2928 247469 11662 483328 0 -500 dockerd
[ 2439.914589] [ 3295] 1000 3295 44341 6852 241664 0 0 python3
[ 2439.914590] [ 3299] 106 3299 7150 46 94208 0 0 uuidd
[ 2439.914591] [ 3313] 1000 3313 63593 7085 270336 0 0 python3
[ 2439.914592] [ 3314] 1000 3314 124773 11968 348160 0 0 python3
[ 2439.914593] [ 11190] 1000 11190 1158 16 57344 0 0 sh
[ 2439.914594] [ 11192] 1000 11192 1158 17 57344 0 0 execute.sh
[ 2439.914595] [ 11194] 1000 11194 2932 83 69632 0 0 bash
[ 2439.914596] [ 11202] 1000 11202 2906815 1915794 15781888 0 0 dotnet
[ 2439.914597] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/helix.service,task=dotnet,pid=11202,uid=1000
[ 2439.914636] Out of memory: Killed process 11202 (dotnet) total-vm:11627260kB, anon-rss:7663176kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:15412kB oom_score_adj:0
[ 2440.040540] oom_reaper: reaped process 11202 (dotnet), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Waiting a few seconds for any dump to be written..
cat /proc/sys/kernel/core_pattern: /home/helixbot/dotnetbuild/dumps/core.%u.%p
cat /proc/sys/kernel/core_uses_pid: 0
cat: /proc/sys/kernel/coredump_filter: No such file or directory
cat /proc/sys/kernel/coredump_filter:
Looking around for any Linux dump..
... found no dump in /datadisks/disk1/work/B3F20994/w/C4E20A47/e
+ export _commandExitCode=137
and
net6.0-Linux-Debug-x64-CoreCLR_release-SLES.15.Amd64.Open
~/work/A42C0904/w/A0C3088C/e ~/work/A42C0904/w/A0C3088C/e
Discovering: System.Runtime.Tests (method display = ClassAndMethod, method display options = None)
Discovered: System.Runtime.Tests (found 28 of 6255 test cases)
Starting: System.Runtime.Tests (parallel test collections = on, max threads = 2)
./RunTests.sh: line 162: 19114 Killed "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.Runtime.Tests.runtimeconfig.json --depsfile System.Runtime.Tests.deps.json xunit.console.dll System.Runtime.Tests.dll -xml testResults.xml -nologo -nocolor -trait category=OuterLoop -notrait category=IgnoreForCI -notrait category=failing $RSP_FILE
~/work/A42C0904/w/A0C3088C/e
----- end Thu Jul 29 01:40:46 UTC 2021 ----- exit code 137 ----------------------------------------------------------
exit code 137 means SIGKILL Killed eg by kill
ulimit -c value: unlimited
dmesg: read kernel buffer failed: Operation not permitted
Waiting a few seconds for any dump to be written..
cat /proc/sys/kernel/core_pattern: /home/helixbot/dotnetbuild/dumps/core.%u.%p
cat /proc/sys/kernel/core_uses_pid: 0
cat: /proc/sys/kernel/coredump_filter: No such file or directory
cat /proc/sys/kernel/coredump_filter:
Looking around for any Linux dump..
... found no dump in /home/helixbot/work/A42C0904/w/A0C3088C/e
Both appear to be the same failure, with little to no other diagnostic information. I see a few other failures in the AzDO history going back to at least June 24th, and I saw failures all the way back into early May. The logs for those builds are gone, so I can't verify that they are the same failure. I stopped going back through the history at May, so I'm not sure how far back this failure goes.
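For anyone triaging similar logs, the Ubuntu run's kernel output can be cross-checked with a few lines of Python. This is only a rough sketch, not part of the test infrastructure: the helper names (`signal_from_exit_code`, `parse_oom_kill`) are made up for illustration, and the 4 kB page size is an assumption (it agrees with the log, since the task table's 1915794 rss pages times 4 equals the reported anon-rss of 7663176kB).

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, consistent with the rss/anon-rss figures in the log


def signal_from_exit_code(code):
    """Shells report 128 + N when a child is killed by signal N; return N or None."""
    return code - 128 if code > 128 else None


def parse_oom_kill(line):
    """Pull pid, process name, and memory figures (kB) out of an OOM-killer line."""
    m = re.search(
        r"Killed process (\d+) \((\S+)\) total-vm:(\d+)kB, anon-rss:(\d+)kB", line
    )
    if m is None:
        return None
    pid, name, total_vm, anon_rss = m.groups()
    return {
        "pid": int(pid),
        "name": name,
        "total_vm_kb": int(total_vm),
        "anon_rss_kb": int(anon_rss),
    }


# Line copied from the Ubuntu.1804 log above.
oom_line = (
    "[ 2439.914636] Out of memory: Killed process 11202 (dotnet) "
    "total-vm:11627260kB, anon-rss:7663176kB, file-rss:0kB, shmem-rss:0kB, "
    "UID:1000 pgtables:15412kB oom_score_adj:0"
)

info = parse_oom_kill(oom_line)
ram_kb = 2097038 * PAGE_KB  # "2097038 pages RAM" from the same dmesg output
share = info["anon_rss_kb"] / ram_kb

print(f"exit code 137 -> signal {signal_from_exit_code(137)}")
print(f"{info['name']} (pid {info['pid']}) held {share:.0%} of RAM when killed")
```

Under those assumptions, exit code 137 decodes to signal 9 (SIGKILL), and the `dotnet` process was holding roughly 91% of the machine's RAM, with no swap configured, when the OOM killer fired. The SLES log can't be checked the same way because `dmesg` was not readable there.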
Based on the history, this test looks potentially flaky: it routinely passes but occasionally fails. The failures seem to come in pairs, e.g., if one test run fails, there is another failure within a run of the other. All records of the test in AzDO have the exact same duration, 00:01:00.00, regardless of pass or fail, so I'm not sure how much I trust these records.
I couldn't find an issue tracking this, but feel free to dup if there is already one.