Packages buildbot is erratic, both master and 23.05 packages fail often

Thibaut hacks at slashdirt.org
Sat Jun 3 05:23:14 PDT 2023



> On 3 June 2023 at 10:27, Hannu Nyman <hannu.nyman at iki.fi> wrote:
> 
> Petr Štetiar wrote on 2.6.2023 at 22.07:
>> So, having the following in the buildbot log:
>> 
>>  2023-06-01 23:53:12+0000 [-] command timed out: 3600 seconds without output running [b'make', b'-j7', b'IGNORE_ERRORS=n m y', b'BUILD_LOG=1', b'CONFIG_AUTOREMOVE=y', b'CONFIG_SIGNED_PACKAGES='], attempting to kill
>>  2023-06-01 23:53:12+0000 [-] trying to kill process group 1528179
>> 
>> I've looked at the system logs around that time and found the following:
>> 
>>  Jun 01 22:23:19 audit[3844576]: AVC apparmor="DENIED" operation="mkdir" info="Failed name lookup - name too long"
>>                  error=-36 profile="docker-default"
>> 		 name="/shared-workdir/build/sdk/build_dir/hostpkg/gettext-0.21.1/gettext-tools/confdir3/confdir3/confdir3/confdir3...[snip very long repeating pattern]...
>> 		 confdir3/confdir3/confdir3/confdir3/confdir3" pid=3844576 comm="conftest" requested_mask="c" denied_mask="c" fsuid=1000 ouid=1000
>>  Jun 01 22:23:45 kernel: conftest[3855174]: segfault at 0 ip 00007fe9581067e7 sp 00007ffd94ca2118 error 4 in libc-2.31.so[7fe958085000+159000]
>>  ...
>> 
>> Since the host is shared with 3 other build workers, I can't be sure that it
>> originated from that timed-out build.
>> 
> 
> Looking at that observation about gettext and the recursive "confdir3/", it is plausible that gettext has a problem that manifests in some builds, or has trouble with parallelism on some occasions.
> 
> Gettext was heavily reorganised in May, around the same time as the buildbot code was revamped. So this might well be related to the gettext package and not the new buildbot code.

At the risk of repeating myself, there is *no* new buildbot code for phase2.
They are still running the same old code from March 2022:
https://buildbot.staging.openwrt.org/master/packages/#/about
https://buildbot.staging.openwrt.org/openwrt-23.05/packages/#/about

[…]

> No gettext completion before the final timeout error.  Hundreds of other packages were compiled in the time when gettext was being recursively compiled?

I wouldn’t pay too much attention to this build failure until the space problems are resolved.
Running out of space can wreak havoc in many different ways and we may simply be looking at side effects (possibly across containers) of that.
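
For illustration only, ruling out the space problem first could look something like the sketch below. It is not part of the buildbot code; the helper name and the 20 GiB threshold are made up for the example, and the path to check (for instance the /shared-workdir seen in the log above) is an assumption about where the workers build.

```python
import shutil


def has_enough_space(path, min_free_gib=20.0):
    """Return (ok, free_gib) for the filesystem that holds `path`.

    `min_free_gib` is a hypothetical threshold, not a value taken from
    the buildbot configuration.
    """
    usage = shutil.disk_usage(path)   # named tuple: total, used, free (bytes)
    free_gib = usage.free / 2**30     # bytes -> GiB
    return free_gib >= min_free_gib, free_gib


if __name__ == "__main__":
    # "." is a stand-in; on a worker this would be the build directory.
    ok, free_gib = has_enough_space(".", min_free_gib=20.0)
    print(f"free: {free_gib:.1f} GiB, enough: {ok}")
```

If a check like this fails around the time of a build failure, that failure is probably not worth analysing on its own.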

My 2c.
T.

