Packages buildbot is erratic, both master and 23.05 packages fail often

Thibaut hacks at slashdirt.org
Fri Jun 2 02:09:48 PDT 2023


Hi,

> On 2 June 2023 at 07:43, Petr Štetiar <ynezz at true.cz> wrote:
> 
> Thibaut <hacks at slashdirt.org> [2023-06-01 18:21:22]:
> 
> Hi,
> 
>>> There have been many timeouts of "3600 seconds without output" in master,
>> 
>> These look like connectivity issues.
> 
> I'm not sure, as there is a keep-alive going on between master and worker, so
> the master would remove the worker much sooner due to a keep-alive response
> timeout, wouldn't it? Putting aside some buildbot bugs, of course.

You are correct, I was talking out of my rear end ;)
This is not a connectivity issue: the build is actually hung. dmesg might have more info.

> Workers osuosl-dock-09,10,11,12 are on one build host and
> osuosl-dock-05,06,07,08 are on the second build host, wouldn't they have the
> same connectivity issues at the same time?

Correct.

I noticed you adjusted CPU affinity; on NUMA hosts that helps performance. On my buildbot setup I use cgroups and assign each build worker to its own cgroup, which lets me set CPU affinity *and* memory affinity, which you also want. It also makes it easy to set a memory limit for each build worker.
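
As a rough illustration of the idea (this is not my actual config: the cgroup path, CPU range, NUMA node and memory limit below are made-up values, and it assumes cgroup v2 with the cpuset and memory controllers available), it boils down to writing a few control files per worker:

```python
from pathlib import Path


def worker_cgroup_settings(cpus: str, mems: str, memory_max_bytes: int) -> dict:
    """Return cgroup v2 control-file values for one build worker."""
    return {
        "cpuset.cpus": cpus,                  # CPU affinity, e.g. the cores of one NUMA node
        "cpuset.mems": mems,                  # memory affinity: allocate from this node only
        "memory.max": str(memory_max_bytes),  # hard memory limit, in bytes
    }


def apply_settings(cgroup_dir: Path, settings: dict) -> None:
    """Create the cgroup directory and write each control file into it."""
    cgroup_dir.mkdir(parents=True, exist_ok=True)
    for name, value in settings.items():
        (cgroup_dir / name).write_text(value + "\n")


# Hypothetical values: a worker pinned to the cores and memory of NUMA node 0,
# capped at 16 GiB.
NODE0 = worker_cgroup_settings(cpus="0-15", mems="0", memory_max_bytes=16 * 1024**3)
```

On a real host this would be something like `apply_settings(Path("/sys/fs/cgroup/buildbot/worker-09"), NODE0)` run as root (the path is an assumption), after enabling the controllers by writing `+cpuset +memory` to the parent's `cgroup.subtree_control`, and then writing the worker's PID to `cgroup.procs` in that directory.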

I’m happy to share my config if interested, it’s not a very complex setup.

> I'm not saying it's not possible, there have been similar network issues in the
> past, so it might be it.
> 
>>> and quite too many "out of space" errors in the 23.05 packages buildbot.
>> 
>> 23.05 package builders are nearly all out of space, possibly due to accumulated cruft in dl dir.
> 
> From a quick look it seems like Rust has increased the disk space
> requirements in the shared work directory.

That confuses me: the du step shows 36G used, but df says all 60G are full. Doesn't that suggest something *outside* of the build directory is eating space?
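
A quick way to put a number on that gap, as a sketch only (the function names are mine, nothing here comes from the buildbot config, and it relies on `st_blocks`, so Unix only): compare what the filesystem reports as used, like df, with what is reachable by walking the directory, like `du -sx`.

```python
import os
import shutil


def tree_usage(root: str) -> int:
    """Bytes allocated under root on root's filesystem, roughly `du -sx`."""
    root_dev = os.lstat(root).st_dev
    seen = set()  # (device, inode) pairs, so hard links are counted once
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # entry vanished while we were walking
            if st.st_dev != root_dev:
                # Other filesystem mounted here: skip it and do not descend.
                if name in dirnames:
                    dirnames.remove(name)
                continue
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
            total += st.st_blocks * 512  # st_blocks is in 512-byte units
    return total


def unaccounted_bytes(fs_used: int, tree_used: int) -> int:
    """Space the filesystem counts as used that the tree walk cannot see."""
    return fs_used - tree_used


def report(root: str) -> int:
    """Compare df-style usage of root's filesystem with du-style usage of root."""
    return unaccounted_bytes(shutil.disk_usage(root).used, tree_usage(root))
```

With the numbers above, `unaccounted_bytes(60 * 2**30, 36 * 2**30)` leaves 24 GiB to explain. The usual suspects are files elsewhere on the same filesystem, or files that were deleted but are still held open by a process.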

Cheers,
Thibaut

