Saturday, 2021-02-06

00:11 *** JayF has joined #openstack-infra
00:28 *** tosky has quit IRC
00:43 *** thiago__ has joined #openstack-infra
00:45 *** tdasilva_ has quit IRC
00:48 *** tdasilva_ has joined #openstack-infra
00:50 *** thiago__ has quit IRC
01:17 *** hamalq has joined #openstack-infra
01:18 *** yamamoto has quit IRC
01:20 *** Xuchu has joined #openstack-infra
01:22 *** Xuchu_ has quit IRC
01:23 *** yamamoto has joined #openstack-infra
01:24 *** hamalq has quit IRC
01:25 *** hamalq has joined #openstack-infra
01:28 *** yamamoto has quit IRC
02:12 *** Xuchu_ has joined #openstack-infra
02:15 *** Xuchu has quit IRC
02:21 *** yamamoto has joined #openstack-infra
02:24 *** hamalq has quit IRC
02:38 *** yamamoto has quit IRC
03:22 *** dviroel has quit IRC
03:25 *** dviroel has joined #openstack-infra
03:26 *** Xuchu has joined #openstack-infra
03:29 *** Xuchu_ has quit IRC
04:03 <dansmith> Saw this on a grenade job during setup, trying to install packages from apt:
04:03 <dansmith> E: You don't have enough free space in /var/cache/apt/archives/.
04:03 <dansmith> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c31/774317/2/check/grenade/c31a0f0/job-output.txt
05:25 *** david-lyle has joined #openstack-infra
05:25 *** Xuchu_ has joined #openstack-infra
05:27 *** redrobot6 has joined #openstack-infra
05:28 *** dklyle has quit IRC
05:28 *** redrobot has quit IRC
05:28 *** redrobot6 is now known as redrobot
05:29 *** Xuchu has quit IRC
05:31 *** mgoddard has quit IRC
05:31 *** kota_ has quit IRC
05:31 *** kota_ has joined #openstack-infra
05:31 *** mgoddard has joined #openstack-infra
05:42 *** dviroel has quit IRC
05:46 *** Xuchu_ has quit IRC
06:35 *** yamamoto has joined #openstack-infra
06:39 *** yamamoto has quit IRC
07:42 *** david-lyle has quit IRC
08:35 *** vesper11 has joined #openstack-infra
08:36 *** yamamoto has joined #openstack-infra
08:41 *** yamamoto has quit IRC
09:19 *** matt_kosut has joined #openstack-infra
09:19 *** matt_kosut has quit IRC
09:51 *** xek has joined #openstack-infra
09:51 *** vesper11 has quit IRC
09:59 *** paladox has quit IRC
10:14 *** tosky has joined #openstack-infra
10:22 *** xek has quit IRC
10:37 *** yamamoto has joined #openstack-infra
10:42 *** yamamoto has quit IRC
10:55 *** yamamoto has joined #openstack-infra
11:29 *** yamamoto has quit IRC
12:02 *** yamamoto has joined #openstack-infra
12:10 *** yamamoto has quit IRC
12:11 *** tdasilva_ has quit IRC
12:12 *** tdasilva_ has joined #openstack-infra
14:07 *** yamamoto has joined #openstack-infra
14:09 <fungi> dansmith: looks like devstack could stand to do an apt clean after each round of things it installs. by default, debian derivatives leave copies of all installed packages in /var/cache/archive, and disk space might be tight in that provider
14:10 <fungi> er, in /var/cache/apt/archive i mean
14:10 <fungi> `apt-get clean` or `apt clean` will clear them out
14:11 *** yamamoto has quit IRC
14:18 *** paladox has joined #openstack-infra
14:23 *** dviroel has joined #openstack-infra
16:03 *** tosky has quit IRC
16:12 *** maysams has quit IRC
17:20 *** Tengu has quit IRC
17:21 *** Tengu has joined #openstack-infra
17:22 *** ralonsoh has joined #openstack-infra
17:22 *** xek has joined #openstack-infra
17:25 *** ralonsoh has quit IRC
17:35 *** xek has quit IRC
17:38 *** xek has joined #openstack-infra
17:38 *** xek has quit IRC
18:32 *** d34dh0r53 has quit IRC
19:23 *** tosky has joined #openstack-infra
19:48 *** slaweq has joined #openstack-infra
20:02 <dansmith> fungi: really? it's complaining about not having 400mb of disk.. are the workers really that tight on space?
20:02 <dansmith> the workers get cleaned after each run, so it's not package cache from the previous run right?
20:10 *** yamamoto has joined #openstack-infra
20:14 *** yamamoto has quit IRC
20:15 <fungi> not sure what you mean by workers, but the job nodes are deleted and booted fresh
20:15 *** slaweq has quit IRC
20:15 <fungi> unfortunately it failed in such a way that the usual devstack log collection didn't happen, so we don't have a df to see what the actual filesystem size was
20:16 <fungi> possible something happened when that node booted which caused it not to growpart the rootfs at boot and left it at the nominal image size
20:17 <fungi> first time i've seen that, so hard to speculate as to the cause
20:18 <fungi> are you finding multiple occurrences?
20:18 <dansmith> yeah I saw it a couple times yesterday, always on the grenade job
20:19 <dansmith> by workers I mean the thing that we run devstack in.. so yeah, I assumed those get booted fresh, but I thought maybe you were suggesting that we just do a ./clean and re-run of devstack, so wasn't sure
20:19 <dansmith> fungi: this doesn't have to be something for a saturday for either of us, it just seemed like maybe something had changed and we were going to see a rash of fails due to disk space coming
20:21 <dansmith> I did a bunch of pushes last night before the jobs finished, so I'm not sure many of those actually got reported, but the last round that I let complete last night seemed to finish
20:22 <fungi> we might ought to add a df to https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/validate-host/library/zuul_debug_info.py and then it will be included in the zuul-info/zuul-info.*.txt files we collect from the nodes
20:22 <fungi> i'll throw up a patch for that now while i'm thinking about it
20:23 <fungi> at least that way we'll know what the filesystem sizes and utilization look like at the start of each job, and can speculate a bit better as to what happened
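[A minimal sketch of the collection being proposed: capture `df` output at job start so it ends up alongside the other zuul-info files. The directory and file name here are illustrative, not the actual zuul-jobs change.]

```shell
# Hypothetical sketch: record filesystem sizes/utilization at job start so
# a failed run still leaves evidence of how big the rootfs actually was.
info_dir="$(mktemp -d)"                 # stand-in for the job's zuul-info/ dir
df -h > "$info_dir/zuul-info.df.txt"    # rootfs (and e.g. /opt) sizes at start
cat "$info_dir/zuul-info.df.txt"
```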
20:23 <dansmith> ack, devstack or zuul could also do it before to see what we start with.. I dunno how big the disks are on those, but 400m seemed like an awfully small margin
20:24 <fungi> yep, but that was also after numerous package install rounds earlier in the log
20:24 <dansmith> sure sure, but.. 400m :)
20:25 <dansmith> if we need to start being more disk conscious then that's a thing I guess, but I'd want to know where it's all going
20:30 <fungi> dansmith: https://review.opendev.org/774358
20:30 <fungi> also there was a time when we sparsely fallocated swapfiles on nodes, but more recent linux kernels have required us to preallocate them instead
20:31 <fungi> so depending on the swap size set in the job configuration (default in our deployment is 1gb) that can eat away at available space on the rootfs
20:32 <fungi> a lot of jobs have it set to 8gb, but even that alone doesn't seem like it should be the cause of the problem in that example
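[The sparse-versus-preallocated distinction above can be seen directly: a sparse file has an apparent size but occupies almost no blocks, while a fully written file consumes real rootfs space, which is why preallocated swapfiles eat into available disk. Sizes here are shrunk for illustration; the jobs use 1-8 GiB.]

```shell
# Why preallocated swapfiles cost real disk: a sparse file reserves no
# blocks (and cannot reliably back swap), while writing the file out with
# dd consumes actual rootfs space. 8 MiB used here purely for illustration.
work="$(mktemp -d)"
truncate -s 8M "$work/sparse.img"                      # apparent size 8M, ~0 blocks
dd if=/dev/zero of="$work/prealloc.img" bs=1M count=8 2>/dev/null
du -k "$work/sparse.img" "$work/prealloc.img"          # sparse ~0K, prealloc ~8192K
```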
20:32 <dansmith> how big are the roots supposed to be?
20:32 <dansmith> maybe our fs isn't expanded or something and we have a small margin over the actual size of the disk?
20:33 <fungi> i think some providers have a rootfs as small as 20gb and then allocate a larger ephemeral disk which some kinds of jobs (e.g. devstack) mount at /opt
20:34 <dansmith> okay
20:34 <fungi> i don't recall how small they are for that particular provider from your example, i'd probably have to manually boot or hold a node and investigate
20:36 <dansmith> ack, well, anyway, let's not make a saturday out of this.. it was mostly just an FYI in case something has changed lately that was likely to cause a raft of disk space fails
20:38 <fungi> yep, totally appreciate the heads up, i'll advocate for the patch to collect initial fs sizes/utilization and see if we can't get a better idea of why we see it sometimes
20:41 <dansmith> logstash shows several hits in the last 48 hours btw
20:41 <dansmith> so it wasn't just those two
20:42 <fungi> same provider each time?
20:43 <dansmith> airship-kn1 yeah looks like
20:53 <fungi> possible something has changed there, in that case
21:01 *** zzzeek has quit IRC
21:02 *** zzzeek has joined #openstack-infra
21:07 <fungi> i wonder if they shrunk the disk on the flavor we've been using, for example
21:08 <dansmith> I heard there's some k8s malware going around mining bitcoin, maybe we've got an openstack virus on our hands that eats disk :)
21:08 <fungi> tasty, tasty disk
21:09 <fungi> it could also be something like this has always been the smallest rootfs of all our providers but recently some change merged to grenade which caused it to begin using far more disk, and because we boot so few instances in that provider it's gone unspotted until now
21:14 <dansmith> yeah I dunno what has changed really.. could be as simple as the mysql package includes sample databases now or something I guess
21:15 <dansmith> the message actually says 100mb is what it has free, but needs 400, which really seems like too close a margin for something not to have changed recently
21:19 <jrosser> if it’s focal could it be the delta to 20.04.2 trying to install? that landed on feb 4th
21:24 <dansmith> that's a good idea, but I don't see any giant "and all these 300 will come too" package installs in that log
21:37 <corvus> dtantsur|afk: hi, it looks like openlab terraform-provider-openstack jobs are failing after feb 2; i looked and do not immediately see the cause.  here's the build history: http://status.openlabtesting.org/builds?job_name=terraform-provider-openstack-acceptance-test  it's failing on "TASK [install-devstack : Set fact for devstack openrc]" which has a no_log, so i can't see why.  do you have any idea?
22:03 <corvus> dtantsur|afk: see also https://github.com/theopenlab/openlab/issues/681 and https://github.com/theopenlab/openlab-zuul-jobs/pull/1104
22:05 <corvus> (and i'm totally open to suggestions of a better irc channel, i realize this is not directly TACT related, but there's a community nexus here; sorry)
22:05 *** slaweq has joined #openstack-infra
22:06 *** zzzeek has quit IRC
22:08 *** zzzeek has joined #openstack-infra
22:11 *** yamamoto has joined #openstack-infra
22:50 *** slaweq has quit IRC
22:53 *** dviroel has quit IRC
23:19 *** yamamoto has quit IRC
23:52 *** tosky has quit IRC

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!