Tuesday, 2021-03-16

openstackgerritIan Wienand proposed opendev/system-config master: Remove references to review-dev  https://review.opendev.org/c/opendev/system-config/+/78069103:12
openstackgerritIan Wienand proposed opendev/system-config master: gerrit: download latest mysql connector  https://review.opendev.org/c/opendev/system-config/+/77685703:46
openstackgerritIan Wienand proposed opendev/system-config master: gerrit : add mariadb_container option  https://review.opendev.org/c/opendev/system-config/+/77596103:46
openstackgerritIan Wienand proposed opendev/system-config master: Create review-staging group  https://review.opendev.org/c/opendev/system-config/+/78069803:46
openstackgerritIan Wienand proposed opendev/system-config master: refstack: fix backup script typo  https://review.opendev.org/c/opendev/system-config/+/78069903:49
openstackgerritMerged opendev/system-config master: gitea-git-repos: update deprecated API path  https://review.opendev.org/c/opendev/system-config/+/74156204:00
openstackgerritMerged opendev/system-config master: Enable srvr, stat and dump commands in the zk cluster  https://review.opendev.org/c/opendev/system-config/+/78030304:10
openstackgerritIan Wienand proposed opendev/system-config master: refstack: fix backup script typo  https://review.opendev.org/c/opendev/system-config/+/78069904:13
openstackgerritIan Wienand proposed opendev/system-config master: Create review-staging group  https://review.opendev.org/c/opendev/system-config/+/78069805:19
openstackgerritIan Wienand proposed opendev/system-config master: gerrit: download latest mysql connector  https://review.opendev.org/c/opendev/system-config/+/77685705:19
openstackgerritIan Wienand proposed opendev/system-config master: gerrit: add mariadb_container option  https://review.opendev.org/c/opendev/system-config/+/77596105:19
openstackgerritMerged opendev/system-config master: refstack: fix backup script typo  https://review.opendev.org/c/opendev/system-config/+/78069905:43
openstackgerritOpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml  https://review.opendev.org/c/openstack/project-config/+/78070606:12
*** ysandeep|bbl is now known as ysandeep06:20
openstackgerritMerged openstack/diskimage-builder master: replace the link which is in the 06-hpdsa file  https://review.opendev.org/c/openstack/diskimage-builder/+/73028606:34
*** whoami-rajat_ is now known as whoami-rajat07:09
openstackgerritDaniel Blixt proposed zuul/zuul-jobs master: WIP: Make build-sshkey handling windows compatible  https://review.opendev.org/c/zuul/zuul-jobs/+/78066208:19
*** ysandeep is now known as ysandeep|lunch09:11
openstackgerritMerged openstack/project-config master: Normalize projects.yaml  https://review.opendev.org/c/openstack/project-config/+/78070609:22
openstackgerritMerged openstack/diskimage-builder master: Change paths for bootloader files in iso element  https://review.opendev.org/c/openstack/diskimage-builder/+/77760609:56
*** ysandeep|lunch is now known as ysandeep10:10
*** fressi has joined #opendev10:58
*** fressi has joined #opendev11:00
openstackgerritDaniel Blixt proposed zuul/zuul-jobs master: WIP: Make build-sshkey handling windows compatible  https://review.opendev.org/c/zuul/zuul-jobs/+/78066211:07
rosmaitaclarkb: fungi: i figured out another way to address that problem you helped me with yesterday: https://review.opendev.org/c/openstack/cinder/+/780692 (it's got a -1 from zuul for an unrelated reason)12:31
*** jpena is now known as jpena|lunch12:39
*** smcginnis has quit IRC12:40
openstackgerritRico Lin proposed opendev/system-config master: Remove TW User Group ML  https://review.opendev.org/c/opendev/system-config/+/58403512:42
openstackgerritAurelien Lourot proposed openstack/project-config master: Add Manila dashboard charm to OpenStack charms  https://review.opendev.org/c/openstack/project-config/+/78081212:42
openstackgerritAurelien Lourot proposed openstack/project-config master: Add Manila dashboard charm to OpenStack charms  https://review.opendev.org/c/openstack/project-config/+/78081212:45
lourot^ fungi o/ hopefully this one will have the post-merge zuul jobs passing (pinging you directly because you have the context). Thanks!13:20
*** amoralej is now known as amoralej|lunch13:23
*** jpena|lunch is now known as jpena13:42
*** ykarel|afk is now known as ykarel13:58
openstackgerritDaniel Blixt proposed zuul/zuul-jobs master: WIP: Make build-sshkey handling windows compatible  https://review.opendev.org/c/zuul/zuul-jobs/+/78066214:02
openstackgerritMerged openstack/project-config master: Add Manila dashboard charm to OpenStack charms  https://review.opendev.org/c/openstack/project-config/+/78081214:29
slittle1fatal: unable to access 'https://opendev.org/openstack/horizon.git/': Empty reply from server14:49
clarkbslittle1: what operation emitted that error? also if you open that url and inspect the ssl cert you can identify which of the backends you are talking to which may help with debugging14:50
fungislittle1: also when did you see it? might help us correlate to our resource trending14:53
slittle1seconds before I posted14:54
slittle1sec ....14:54
fungii'm going to guess it's gitea0614:56
fungicacti graphs show we exhausted all memory there a few minutes ago (8gb ram+8gb swap), probably find an oom in dmesg14:56
clarkb01 is working (it is my "home" server)14:57
clarkbin the process of testing 02 now, but if cacti shows 06 is unhappy then that seems a likely culprit14:57
slittle1git clone https://opendev.org/openstack/horizon.git -b stable/train14:57
fungiactually, no oom in gitea06's log since almost a month14:58
fungibut graphs show we topped out swap and went into massive iowait thrash there14:58
clarkbslittle1: and if you inspect the ssl cert for https://opendev.org/ it should show you an altname for the backend server serving requests to you through the load balancer14:59
clarkbif that shows 06 it would confirm fungi's suspicion (unless the load balancer has already moved 06 out of the rotation)14:59
slittle1Cloning into 'horizon'...   hangs/times out14:59
slittle1wget https://opendev.org/openstack/horizon.git14:59
slittle1--2021-03-16 10:58:31--  https://opendev.org/openstack/horizon.git14:59
slittle1Resolving opendev.org (opendev.org)..., 2604:e100:3:0:f816:3eff:fe6b:ad6214:59
slittle1Connecting to opendev.org (opendev.org)||:443... connected.14:59
slittle1HTTP request sent, awaiting response... 500 Internal Server Error14:59
slittle1how do I get to the ssl cert ?15:00
fungioh, if you're trying from the command line you can use openssl s_client15:01
clarkbI just open it in my browser and click on the little button to inspect the site security15:01
clarkbbut yes, s_client works on the command line too15:01
fungiecho | openssl s_client -connect opendev.org:https > /dev/null15:02
fungiit'll report the depth=0 CN to stderr15:02
fungiwhich will contain the backend server name15:03
lourotfungi, you look busy, but when you have a moment, manage_projects timed out again: https://review.opendev.org/c/openstack/project-config/+/780812 - and so the subsequent service-zuul didn't run. Thanks!15:04
clarkball of the backends but 06 show a short spike in tcp opens, but they seem to handle it ok. 06 on the other hand shows a longer spike and not handling it ok15:04
slittle1echo | openssl s_client -connect opendev.org:https > /dev/null15:05
slittle1depth=2 O = Digital Signature Trust Co., CN = DST Root CA X315:05
slittle1verify return:115:05
slittle1depth=1 C = US, O = Let's Encrypt, CN = R315:05
slittle1verify return:115:05
slittle1depth=0 CN = gitea06.opendev.org15:05
slittle1verify return:115:05
fungiyeah, the depth=0 CN line is what's relevant15:05
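The check described above (reading the depth=0 CN line that `openssl s_client` writes to stderr) can be scripted. What follows is a minimal sketch, not a tool used by the channel: the function name `backend_from_s_client` and the regular expression are assumptions for illustration, and the sample text is the output pasted above.

```python
import re
from typing import Optional

# Matches verification lines such as "depth=0 CN = gitea06.opendev.org".
# Only the depth=0 (leaf certificate) line names the backend server.
LEAF_CN = re.compile(r"^depth=0\b.*?\bCN\s*=\s*(\S+)", re.MULTILINE)


def backend_from_s_client(output: str) -> Optional[str]:
    """Return the CN found on the depth=0 line, or None if there is none."""
    match = LEAF_CN.search(output)
    return match.group(1) if match else None


# Sample stderr text, copied from the paste in the log above.
sample = """\
depth=2 O = Digital Signature Trust Co., CN = DST Root CA X3
verify return:1
depth=1 C = US, O = Let's Encrypt, CN = R3
verify return:1
depth=0 CN = gitea06.opendev.org
verify return:1
"""

print(backend_from_s_client(sample))
```

The depth=1 and depth=2 lines describe the issuing chain, so the sketch deliberately ignores them.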
fungilourot: well, on a conference call right now, yeah, but will take a look when the meeting ends15:06
clarkbI wonder if we should take 06 out proactively and have it rebalance (alternative is the LB should notice on its own and rebalance)15:07
clarkbmy keys are in another room and I am also on a conf call, so will have to dig in after15:08
fungiclarkb: yeah, looks like it's still basically at max swap, i'll try to remove it from the pool15:08
fungi#status log Temporarily disabled balance_git_https,gitea06.opendev.org in haproxy on gitea-lb0115:11
openstackstatusfungi: finished logging15:11
fungislittle1: if you try again, you should see a different backend server now15:11
clarkband hopefully the rebalance doesn't end up pointing the firehose at some other server15:12
slittle1depth=0 CN = gitea02.opendev.org15:12
slittle1cloning is working !15:12
fungiclarkb: i wonder if it's manage-projects... we've posited that before... the log shows it's not finished gitea06 yet. correlation is not causation, but we seem to notice this around new project creation events15:14
clarkbfungi: oh! ya. Maybe we should just drop the code that tries to update descriptions15:14
clarkbfungi: and we can run that manually when necessary15:14
clarkbI strongly suspect that is what does it since it hits every project (and possibly with a regression in newer gitea not handling that as well as we'd like)15:15
fungilog says gitea06.opendev.org was handled at 14:34:22 and returned a lot of internal server errors15:16
fungieven without sending https traffic to it, gitea is still consuming a lot of memory and generating a lot of iowait15:28
clarkbit is probably trying to complete the operations that it was working on15:29
fungiyeah, swap use is falling, i expect that's a lot of i/o paging stuff back into active memory15:30
openstackgerritJeremy Stanley proposed opendev/system-config master: Stop updating Gitea project descriptions for now  https://review.opendev.org/c/opendev/system-config/+/78090415:39
fungiclarkb: ^ is that what you had in mind?15:39
fungii wonder if we could be smarter about it, if querying the descriptions is lightweight and updating them is not, then a lbyl to avoid making unnecessary updates could help15:40
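The "lbyl" (look before you leap) idea could look roughly like the sketch below: read the current description first and write only when it differs. This assumes reads really are cheaper than writes, which the log only raises as a possibility. `sync_descriptions`, `FakeAPI` and their methods are hypothetical stand-ins, not Gitea's API or the system-config module, and the description strings are placeholders.

```python
from typing import Dict, List


class FakeAPI:
    """Hypothetical in-memory stand-in for a project hosting API."""

    def __init__(self, current: Dict[str, str]) -> None:
        self.current = dict(current)
        self.writes = 0  # counts the expensive update calls

    def get_description(self, repo: str) -> str:
        return self.current.get(repo, "")

    def set_description(self, repo: str, description: str) -> None:
        self.current[repo] = description
        self.writes += 1


def sync_descriptions(api: FakeAPI, wanted: Dict[str, str]) -> List[str]:
    """Update only the descriptions that differ; return the repos changed."""
    changed = []
    for repo, description in wanted.items():
        # Look before you leap: skip the write when nothing would change.
        if api.get_description(repo) != description:
            api.set_description(repo, description)
            changed.append(repo)
    return changed


# Placeholder data: one description already matches, one is stale.
api = FakeAPI({"openstack/horizon": "Dashboard", "openstack/cinder": "old text"})
changed = sync_descriptions(
    api,
    {"openstack/horizon": "Dashboard", "openstack/cinder": "Block storage"},
)
print(changed, api.writes)
```

With this shape, a run in which nothing has drifted issues only reads, so the cost of a full pass depends on how many projects actually changed.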
clarkbfungi: yes ish. I think we should move the description updates under the always_update flag. Also we test that so your change will fail as is. I can have a patchset up in a bit15:40
fungiaha, good point15:40
clarkbfungi: I've noticed a couple of other areas of the code that may not be tuned well for gitea15:41
clarkbI'll note them in the commit message for followup15:41
openstackgerritClark Boylan proposed opendev/system-config master: Don't always update gitea project descriptions  https://review.opendev.org/c/opendev/system-config/+/78090415:43
clarkbfungi: ^ something like that?15:43
fungii'll compare to what i was about to push15:43
clarkbthat may make the job run much longer, but I think testing that behavior is a good idea15:45
fungido we need to put it in the "if create or self.always_update:" block instead? do we actually set the description in the make_gitea_project() method?15:46
clarkbfungi: ya creating a project also sets the description through a different process15:47
clarkbso update and create need to be exclusive15:47
fungigitea06 has calmed back down now, i'll reenable it in the pool and keep an eye on it15:47
clarkbsounds good, we should keep an eye out for problems that indicate we may want to trigger replication for it15:48
clarkb(gerrit should retry if it fails, but seems like that doesn't always work out)15:48
fungi#status log Re-enabled gitea06 in haproxy now that the crisis has passed15:51
*** hemanth_n has quit IRC15:51
openstackstatusfungi: finished logging15:51
clarkbI'm going to approve the nl01.opendev.org inventory change now16:12
openstackgerritClark Boylan proposed opendev/system-config master: Upgrade gitea to 1.13.4  https://review.opendev.org/c/opendev/system-config/+/78092316:20
clarkbnoticed that we're a couple of minor releases behind and there are some performance related bug fixes. Figure we should update as part of the debugging as well16:21
fungiclarkb: should i do a zuul-scheduler smart-reconfigure (the manage-projects nested tasks are completing, the job around them is just timing out)16:23
clarkbfungi: you should double check that gerrit has updated first since gerrit happens after gitea16:24
clarkbif gerrit also looks good then ya I'd do the smart reconfigure command16:24
fungiTASK [gerrit : Run manage-projects] ... "rc": 016:27
fungiplus the thousands of lines of output you would expect16:27
fungiso yeah, it ran16:28
clarkbhuh, how did it timeout then?16:30
fungiit's nested, right?16:36
fungithe job timed out but the task on bridge kept running to completion after that16:36
clarkbI would've expected the parent process going away to kill the nested one but ya that is possible16:37
clarkband you're sure you identified the correct ansible run log?16:37
clarkbI would check that the new projects are in gerrit directly as a safety valve. Zuul will complain excessively if we tell it to manage a project that doesn't exist iirc16:38
fungiclarkb: though, i think that also means i don't actually need to reconfigure zuul16:38
fungiLast reconfigured: Tue, Mar 16, 2021 3:28 PM16:39
fungi(that's 15:28 utc)16:39
clarkblooks recent enough16:39
fungithat's a few minutes after manage-projects16:40
clarkbunexpected success, the best kind of success16:40
fungitechnically correct16:40
lourotthanks, does that mean that my new projects are fine?16:41
openstackgerritMerged opendev/system-config master: Add nl01.opendev.org to our inventory  https://review.opendev.org/c/opendev/system-config/+/78062016:45
fungilourot: yes, zuul seems to know about it https://zuul.opendev.org/t/openstack/project/opendev.org/openstack/charm-manila-dashboard16:47
fungithe previous ones probably were as well16:48
fungii think we assumed the scheduler might not have reconfigured after the previous timeout, but it probably did16:48
lourot\o/ thanks a lot!16:50
roman_gHello team. Could I ask you to create a branch for us, please? https://review.opendev.org/admin/repos/airship/docs,branches branch name 'v2.0.0' from current HEAD. Tank you. I'm core of the project, however I don't think I have permissions. OpenDev docs reference https://docs.opendev.org/opendev/infra-manual/latest/drivers.html .17:22
roman_g*Thank you17:22
clarkbroman_g: you can update your project acls then do it yourself17:26
clarkblet me double check that repo17:26
clarkbroman_g: https://opendev.org/openstack/project-config/src/branch/master/gerrit/acls/airship/docs.config#L3 the airship-release group should be able to do it17:27
roman_gAh, all right. Thank you.17:27
roman_gThank you, clarkb. :)17:27
roman_gI forgot we have it.17:28
clarkbhrm the LE deploy job failed so all the other deploy jobs behind it are skipped17:40
clarkbwe do run nodepool hourly though so I'll wait for the next hourly pass I guess17:40
clarkbI'll look into why the LE job failed shortly17:40
clarkbnb01.opendev.org failed to have the acme.sh client installed17:42
clarkbI suspect we've run out of disk there again17:43
clarkbyup that suspicion is the case17:45
*** amoralej is now known as amoralej|off17:48
clarkbfungi: also the stop doing description updates change failed because the zuul description wasn't updated. I thought my edits to the change would address that but maybe there is a bug in there17:49
clarkbI see the bug17:52
fungiwe only call the always_update block under certain circumstances, right?17:53
* fungi traces that attribute17:53
openstackgerritClark Boylan proposed opendev/system-config master: Don't always update gitea project descriptions  https://review.opendev.org/c/opendev/system-config/+/78090417:54
clarkbfungi: ^ the elif was never hit because it was exclusive to the previous block which would always run if always_update is set17:54
clarkband yes we only run always_update in certain circumstances, though the test was updated to set it17:54
fungiahh, it defaults to true and we also set it true17:55
fungiand yeah, that elif as written will never match17:56
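The unreachable elif can be reproduced in a few lines. This is a simplified model of the control flow as described in the discussion, not the code under review: the function names and the "settings"/"description" action labels are invented for illustration. Only the `create or always_update` condition comes from the log.

```python
from typing import List


def plan_with_elif(create: bool, always_update: bool) -> List[str]:
    """Buggy shape: the elif can never run when always_update is set."""
    actions = []
    if create or always_update:
        actions.append("settings")
    elif always_update:
        # Unreachable: always_update being true already satisfied the if.
        actions.append("description")
    return actions


def plan_with_two_ifs(create: bool, always_update: bool) -> List[str]:
    """Fixed shape: an independent if, kept exclusive with creation."""
    actions = []
    if create or always_update:
        actions.append("settings")
    # Creation sets the description by another path, so skip it here.
    if always_update and not create:
        actions.append("description")
    return actions


for create in (False, True):
    for always_update in (False, True):
        print(create, always_update,
              plan_with_elif(create, always_update),
              plan_with_two_ifs(create, always_update))
```

In the elif version no combination of inputs ever yields the description update, which matches the test failure reported above.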
clarkbit defaults to false17:56
clarkbat least at the role level17:57
clarkbit looks like nb01 has managed to take all the image builds and nb02 doesn't have any17:57
clarkband that is part of the reason why its disk has filled up17:57
clarkbI have stopped the builder on 01 and am cleaning out /opt/dib_tmp and will reboot it17:58
clarkbbut then I think I'll leave it off for a bit to see if nb02 can build an image17:58
clarkbthe dib tmp cleanup got us 74GB of room. nb02 picking up some image slack would give us more17:59
clarkbI've also found a couple of leaked vhd intermediate.bak files18:01
clarkbcleaning this up has us to 151GB of free disk now18:01
fungiclarkb: on line 332 we set always_update=dict(type='bool', default=True)18:02
clarkbfungi: interesting, the role itself sets a false default and then passes that in18:03
fungii missed the middle tier18:03
fungiso the module defaults it to true, the role defaults it to false, then our invocation of the role set it to true18:04
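The three tiers can be modelled as a chain of overrides in which the most specific value wins. This is only a toy model of the situation described here (module default true, role default false, invocation true); it does not reproduce Ansible's actual variable precedence rules, and `resolve_always_update` is a hypothetical helper.

```python
# Sentinel meaning "this tier did not set a value".
_UNSET = object()


def resolve_always_update(module_default=True, role_default=_UNSET,
                          invocation=_UNSET):
    """Return the effective value: later tiers override earlier ones."""
    value = module_default
    for override in (role_default, invocation):
        if override is not _UNSET:
            value = override
    return value


# Module alone, then with the role default, then with the invocation on top.
print(resolve_always_update())
print(resolve_always_update(role_default=False))
print(resolve_always_update(role_default=False, invocation=True))
```

Reading only the module default, or only the role default, gives the wrong answer for the deployed value, which is how the two earlier statements about the default could both be made.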
clarkbon nb02 I'd like to see if this running centos-7 build succeeds then stop the builder there, do the same dib_tmp cleanup, reboot and restart the service on that host18:04
fungimakes sense18:05
clarkblooking at older build logs on nb02 it looks fairly unhappy18:06
clarkbexec_sudo: losetup: /opt/dib_tmp/dib_image.3w2ZnmAi/image0.raw: failed to set up loop device: No such file or directory exec_sudo /usr/local/lib/python3.7/site-packages/diskimage_builder/block_device/utils.py:13518:06
clarkbthat dir does exist but is empty18:10
clarkbso there is no image0.raw18:10
fungiclarkb: after reading through the gitea issues about argon2, https://github.com/go-gitea/gitea/issues/14702 suggests api tokens... do you happen to know if that's already a feature or are they suggesting adding it?18:10
clarkbfungi: I believe it is already a feature18:11
fungiclarkb: are you expecting the No such file or directory error is due to full filesystem?18:11
fungior no?18:11
clarkblooking at that image build log it cleans up the image file so no surprise it is gone18:11
clarkbfungi: no this is on nb02 which has plenty of disk18:12
clarkbI don't understand this problem but figure I'll still do the planned cleanup and restart and take it from there18:12
clarkbthat same issue just hit the centos-7 build that was running18:13
clarkbI'll stop the builder on 02 now18:13
openstackgerritKendall Nelson proposed opendev/system-config master: Move PTG bot site to opendev namespace  https://review.opendev.org/c/opendev/system-config/+/78094218:21
clarkb02 has been cleaned up and rebooted and the service is restarted18:25
clarkbthe centos-7 image is being rebuilt there now and we can watch it to see if the problem persists there18:25
clarkbI'm going to take a break before the meeting, back in a bit18:27
openstackgerritJeremy Stanley proposed opendev/puppet-ptgbot master: Redirect from server aliases  https://review.opendev.org/c/opendev/puppet-ptgbot/+/78094718:44
fungidiablo_rojo_phon: you're going to want this ^ too18:44
fungiand then plumb the aliases list in with ptg.openstack.org in it18:45
*** ralonsoh has quit IRC18:51
openstackgerritKendall Nelson proposed opendev/system-config master: Move PTG bot site to opendev namespace  https://review.opendev.org/c/opendev/system-config/+/78094219:26
*** rosmaita has joined #opendev19:31
rosmaitacinder just merged a patch that fixes our docs job by modifying tox.ini ... do all the patches that failed that job earlier need to be rebased, or can they just be rechecked (or does it depend?)19:33
clarkbrosmaita: zuul tests the outcome of code merging. This means that one of the very first things it does behind the scenes when running a job is it merges the code as proposed into the target branch. If this fails you get a conflict error. If it succeeds the job runs against that potential future state19:34
clarkblong story short you do not need to rebase unless there is a conflict19:34
rosmaitagreat, ty19:34
clarkboh I should've mentioned the gitea upgrade in open discussion19:58
clarkbhttps://review.opendev.org/c/opendev/system-config/+/780923 it passes testing, though I think we should land https://review.opendev.org/c/opendev/system-config/+/780904 first then rerun19:59
ianwclarkb: should we document that flag in 780904?  it only needs a manual run when a description changes?20:00
clarkbianw: good idea, it is a preexisting flag but it doesn't exist in the docs20:01
clarkbianw: ya it's a preexisting flag that we used to coerce things if they somehow got out of sync, we haven't had to use it in a long time20:01
clarkbI figured we could add description updates to that coercion while we sort things out. I'll push a new ps with a role readme update20:01
*** tosky has quit IRC20:05
*** tosky has joined #opendev20:05
openstackgerritClark Boylan proposed opendev/system-config master: Don't always update gitea project descriptions  https://review.opendev.org/c/opendev/system-config/+/78090420:06
clarkbianw: ^ something like that?20:06
clarkb(I updated the same change so you'll need to reapply votes)20:07
openstackgerritIan Wienand proposed opendev/system-config master: refstack: cleanup old puppet  https://review.opendev.org/c/opendev/system-config/+/78013820:07
clarkband now lunch20:08
fungiclarkb: i'm reminded, there was a post by mkopec to openstack-discuss back in november asking for issue link updating. maybe we need to treat it similarly to descriptions if we add that?20:09
clarkbto help me not forget I plan to restart the builder on nb01 towards the end of my day as that will have given nb02 a few hours to make some builds (it is doing that successfully now)20:09
clarkbfungi: they will be treated the same after https://review.opendev.org/c/opendev/system-config/+/78090420:09
clarkbbasically need an out of band run with always_update set to true20:10
fungiokay, so always_update does update issue links too?20:10
clarkbfungi: ya thats what the block above you had discussed changing does20:10
clarkbthough now I'm wondering if that prior block would handle the description too. I seem to recall the description was special though20:11
fungioh, yep right20:11
clarkbyup the testing that failed on ps2 confirms that description is separate20:11
clarkbotherwise we would've passed testing on ps2 when the if elif was in use (and not if if)20:11
fungii clearly shouldn't be looking at source code and frying things at the same time20:11
clarkbfungi: are you able to review https://review.opendev.org/c/openstack/project-config/+/779864 while frying things?20:13
clarkbI really need to go find lunch myself now though20:13
*** yoctozepto has joined #opendev20:14
fungiclarkb: yep, lgtm20:14
openstackgerritKendall Nelson proposed opendev/system-config master: Move PTG bot site to opendev namespace  https://review.opendev.org/c/opendev/system-config/+/78094220:15
openstackgerritJames E. Blair proposed opendev/system-config master: Use grafyaml container image  https://review.opendev.org/c/opendev/system-config/+/78012820:27
*** slaweq has joined #opendev20:29
clarkbianw: looking at the kerberos kdc role change again. Would it make sense to run the service-kerberos playbook twice in testing to ensure that you properly noop on the second pass since the files exist?20:38
clarkbthe gitea job runs manage projects twice which is what made me think of that20:39
clarkbanyone else want to review https://review.opendev.org/c/openstack/project-config/+/779864 before I approve it to flip nl01.openstack.org nodepool config to nl01.opendev.org?20:40
ianwclarkb: yes, could do as a double-check20:52
clarkbianw: also you should see my comment on https://review.opendev.org/c/opendev/system-config/+/77685720:54
clarkbI've approved the project-config change to flip over nl01.openstack.org to nl01.opendev.org in nodepool configs. I'll watch it20:56
ianwclarkb: hrm, good catch.  i guess we should just leave that one alone; it was really just a thought as i was figuring out where all the .jar's come from20:56
clarkbianw: ya, I think we can make that change later, but right now the git history of gerrit itself seems to imply that maybe we aren't ready yet20:57
ianwmaybe i'll replace that change with "do not do this!" :)20:57
clarkbwfm :)20:57
ianwwhat is the best way to ping someone on https://gerrit-review.googlesource.com/c/gerrit/+/297522 ?  i guess the ML?20:58
clarkbianw: ya, I've tried their slack server for questions and pings and they seem to push people to the ML there20:59
openstackgerritMerged openstack/project-config master: Flip nl01.openstack.org to nl01.opendev.org  https://review.opendev.org/c/openstack/project-config/+/77986421:02
ianwi've successfully imported the trove mysql dump into mariadb, so that fix is really only for testing where we start a fresh db21:04
clarkbI'm tailing the launcher debug logs on both hosts and grepping for 'Active requests' that seems to be the info to watch and see that things have flipped over21:09
clarkbthe transition has begun21:13
*** mlavalle has quit IRC21:13
clarkbthere might be some quota fights but seems to be happy so far21:14
fungiokay, dinner is eaten and cleaned up, i'm back now for more fun21:15
fungilooks like the flippin' launcher went well21:16
clarkbso far at least, I'm still waiting for the old one to be done21:17
clarkbonce its active request set falls to zero I think I can stop the service on that server, and start working on changes to clean it up21:17
clarkbthen I guess tomorrow plan to launch 02 03 and 04 and get them swapped too21:17
fungimaybe to address the drift on project descriptions, issue links, et cetera we could have a periodic job which just runs with always_update in a weekly pipeline or something21:21
clarkbfungi: we probably also want to try and make those other improvements first as I expect doing the always_update will be disruptive as is21:22
clarkb(but we can currently choose to take that risk if we need to)21:22
fungiit might not be so bad if it happened at a time when there's little other load to contend with21:23
clarkbthat is a good point21:23
fungithough if this is related to password hash memory usage, i do wonder why it would hit one server so hard and not impact the other 7, unless there was something else going on at the same time to compound it21:24
clarkbya I suspect that isn't the direct cause, but could be something that applies more pressure to an already busy system21:26
openstackgerritIan Wienand proposed opendev/system-config master: kerberos-kdc: role to manage Kerberos KDC servers  https://review.opendev.org/c/opendev/system-config/+/77884021:32
openstackgerritIan Wienand proposed opendev/system-config master: kerberos: switch servers to Ansible control  https://review.opendev.org/c/opendev/system-config/+/77989021:32
openstackgerritIan Wienand proposed opendev/system-config master: kerberos-kdc: add database backups  https://review.opendev.org/c/opendev/system-config/+/77989121:32
openstackgerritIan Wienand proposed opendev/system-config master: system-config-run-kerberos: run twice  https://review.opendev.org/c/opendev/system-config/+/78097921:32
ianwhrm, i didn't mean to rebase that21:33
fungiseems it was a trivial rebase at least21:47
fungii was reviewing the first in the series when that hit, but i based my vote on the previous patchset since gerrit carried over clarkb's vote21:48
openstackgerritClark Boylan proposed openstack/project-config master: Add idle configs for nl02-04.opendev.org  https://review.opendev.org/c/openstack/project-config/+/78098221:57
openstackgerritClark Boylan proposed openstack/project-config master: Flip nl02-04.openstack.org to nl02-04.opendev.org  https://review.opendev.org/c/openstack/project-config/+/78098321:57
openstackgerritClark Boylan proposed openstack/project-config master: Remove unused nl0X.openstack.org config files  https://review.opendev.org/c/openstack/project-config/+/78098421:57
openstackgerritClark Boylan proposed openstack/project-config master: Update nodepool zk configs to be a bit less confusing  https://review.opendev.org/c/openstack/project-config/+/78098521:57
clarkbthe first change in that stack is safe to land now21:57
clarkbthe others need to be landed in coordination with new servers existing and being happy and others need to wait for old servers to be out of inventory21:58
clarkbbut that is all prep work for trying to get through these tomorrow21:58
clarkbnl01.openstack.org is looking idle now. I'll stop its nodepool-launcher21:58
openstackgerritClark Boylan proposed opendev/system-config master: Cleanup nl01.openstack.org  https://review.opendev.org/c/opendev/system-config/+/78098621:59
clarkbI think ^ is ready to land whenever we are happy with nl01.opendev.org21:59
clarkbI set the first unsafe change to workinprogress to avoid any confusion22:00
clarkbfeel free to review :)22:01
clarkbI'm going to start launching 02-04 now with the intention of getting them ansibled and flipped over tomorrow morning22:04
clarkbnb02 just failed an image build with DEBUG diskimage_builder.block_device.utils [-] exec_sudo: umount: /opt/dib_tmp/dib_build.zODdHTpF/mnt/: target is busy. exec_sudo /usr/local/lib/python3.7/site-packages/diskimage_builder/block_device/utils.py:13522:10
clarkbI wonder if this is the genesis of the problem I noticed earlier (it did successfully build two centos images first though)22:11
clarkbianw: ^ is that familiar to you?22:11
ianwclarkb: hrm ... no22:11
ianwthat sounds like some daemon we might have started that hasn't shutdown?22:12
fungii've seen it when processes get left running in a chroot22:12
fungiyeah, like that22:12
clarkbhrm maybe the suse builds are leaking something that causes that?22:12
clarkbit has started a new suse tumbleweed build so we get to see if it persists22:12
ianw2021-03-16 20:47:13.185 | /tmp/in_target.d/finalise.d/89-boot-settings: line 140: /etc/init.d/boot.local: No such file or directory22:12
ianwthat seemed to cause the failure in opensuse-tumbleweed-0000301586.log22:13
clarkboh there is an earlier failure? I should've checked for that22:13
clarkbthinking out loud here we may want to pause that image build to try and get nb02 to build more images so nb01 can be restarted22:14
ianwthere's a lot of22:15
ianwexec_sudo: losetup: /opt/dib_tmp/dib_image.oQ9ri5cB/image0.raw: failed to set up loop device: No such file or directory22:15
clarkbianw: ya those are the errors that caused nb01 to fill up because nb02 stopped building images22:18
clarkbianw: what I had done was to stop nb01 and clean it up but then did the same with nb02 hoping it could then build a number of images to take load off of nb01 before starting nb01 again22:18
clarkbthe status now is nb01 has had dib_tmp cleaned out and the server was rebooted but nodepool-builder is stopped (I downed the docker compose setup)22:19
clarkbnb02 did the same but I started the service up again and it then managed to build a few images22:19
clarkbI was going to start up nb01 again when nb02 had built a number of images22:19
openstackgerritMerged opendev/system-config master: Don't always update gitea project descriptions  https://review.opendev.org/c/opendev/system-config/+/78090422:21
openstackgerritMerged opendev/system-config master: refstack: cleanup old puppet  https://review.opendev.org/c/opendev/system-config/+/78013822:21
ianwwe have /dev/loop* ...22:22
fungii think the error is complaining about the other end22:22
fungithe file to be mounted on the loop device22:22
ianwhrm ... /opt/dib_tmp/dib_image.WVe8UrDr/image0.raw: failed to set up loop device: No such file or directory22:23
ianwi dunno, i don't see how that file can't be there22:23
clarkbianw: thats from a pre cleanup and restart build though right?22:25
ianw-rw-r--r-- 1 nodepool nodepool 1.6M Mar 16 16:07 ubuntu-focal-0000086000.log22:25
ianw-rw-r--r-- 1 nodepool nodepool 1.6M Mar 16 17:48 ubuntu-focal-0000086235.log22:26
ianwCreate image file [/opt/dib_tmp/dib_image.3w2ZnmAi/image0.raw]22:27
ianw    with open(filename, "w") as fd:22:27
ianw        fd.seek(size - 1)22:27
ianw        fd.write("\0")22:27
clarkbya those were pre cleanups22:27
openstackgerritMerged opendev/system-config master: kerberos-kdc: role to manage Kerberos KDC servers  https://review.opendev.org/c/opendev/system-config/+/77884022:28
ianwreboot   system boot  4.15.0-137-gener Tue Mar 16 18:24   still running22:29
ianwok, so after 18:2422:29
ianwso it built centos-7 & 8 and is now stuck on tumbleweed22:29
openstackgerritClark Boylan proposed opendev/zone-opendev.org master: Add nl02-04 to DNS  https://review.opendev.org/c/opendev/zone-opendev.org/+/78098822:30
clarkbianw: ya22:30
clarkbor at least may be stuck on tumbleweed, not sure if it will fail that again22:30
ianwit looks like a systematic error22:31
ianw /tmp/in_target.d/finalise.d/89-boot-settings: line 140: /etc/init.d/boot.local: No such file or directory22:31
fungimaybe that element has bitrotted against a recent change in tumbleweed?22:32
openstackgerritClark Boylan proposed opendev/system-config master: Add new opendev.org nodepool launchers  https://review.opendev.org/c/opendev/system-config/+/78098922:32
ianwthat's a project-config element22:32
clarkbfungi: ya could be22:32
clarkbok all the changes to do nl02-04 are now up. Just a matter of reviewing them for accuracy and landing them in the right order22:33
ianwif [[ "$DISTRO_NAME" =~ (opensuse) ]] ; then22:34
ianw    rclocal=/etc/init.d/boot.local22:34
ianwi'm guessing tumbleweed has decided this is no longer cool22:34
clarkbI'm on tumbleweed let me see what I've got if anything22:34
clarkbianw: I've got /etc/init.d/boot.d22:34
clarkbas a directory22:35
clarkbmaybe we can set rclocal to /etc/init.d/boot.d/opendev.local?22:35
fungiso they changed to a run-parts replacement for it i guess22:37
clarkbthat is my guess22:37
clarkbthough everything is empty locally22:37
clarkbhttps://forums.opensuse.org/showthread.php/531768-Where-Art-Thou-Boot-Local says the rc-local.service is what ingests the scripts22:37
* clarkb looks to see what that looks like22:37
clarkbExecStart=/etc/init.d/boot.local start is what it says22:38
ianwclarkb: do you have a package "aaa_base"?  this seems to be where it may have previously been shipping a boot.local file22:38
clarkbmaybe /etc/init.d isn't there at all on the images22:38
clarkbi+ | aaa_base <- yes the i means it is installed22:39
clarkbthere is also an aaa_base-extras which is installed22:39
ianwLet's own /etc/init.d/ as it is gone from package filesystem22:41
ianw7 months ago though ...22:41
ianwseems to me it should be creating a boot.local file in the .spec @ https://build.opensuse.org/package/view_file/openSUSE:Factory/aaa_base/aaa_base.spec?expand=1 line 13422:43
clarkbaaa_base is being installed  according to the build log too22:44
clarkbcould it be a permissions issue?22:44
clarkbwe're trying to redirect straight into the file and not using a tmp file then mv'ing it with sudo22:45
*** rosmaita has left #opendev22:45
clarkbhrm the forwarding.conf update would imply we have privs there though22:45
clarkbas we chown root:root without sudo22:46
ianwyeah it must be as root22:46
ianwlooking in /opt/dib_tmp/dib_build.ITSx0k5W/mnt/etc which is the currently building chroot, there is no /etc/init.d22:47
clarkbis it possible aaa_base updated and that spec is no longer current/valid?22:48
clarkbeg the TODO got done?22:48
ianw(201/224) Installing: aaa_base-84.87+git20210308.d7a7d3a-1.1.x86_64 [...........done]22:49
ianwthat seems to all match, i'd say22:49
clarkbwould the rpm package build optimize that out since it's a couple of empty files?22:50
* clarkb downloads an rpm22:50
clarkbmark /etc/init.d/{boot,after}.local as %config(noreplace) (boo#1179097) <- shows up in the change log of that rpm22:52
clarkbwould noreplace imply some sort of perms setting preventing changes?22:53
clarkbthere is also a script that deletes those files in some situations22:54
ianwyeah, i've jumped into the chroot for the currently building, and reinstalling aaa_base the directory isn't made22:55
ianw# we have several local files, that changed over the time.  Check the22:56
ianw# existing one, if they contain real data.  If not, delete them.22:56
clarkbI can't find this script in the package stuff on obs that I see in the rpm22:56
clarkbya that's the script I'm talking about22:56
clarkbit is aaa_base.pre22:58
clarkbhttps://github.com/openSUSE/aaa_base/blob/master/aaa_base.pre found it22:58
clarkbrpm package guide says that pre is run prior to the install23:00
clarkband it is the install that is making the empty files23:00
clarkb(that just makes me extra confused now)23:00
clarkbianw: it does install /dev/null to those paths, do we maybe not have dev null present?23:01
clarkb(and maybe it just silently fails and moves on?)23:02
clarkbthis is quickly getting outside of my packaging knowledge23:02
clarkbmaybe we see if dirk has any good ideas?23:03
ianwyeah, i mean i don't know what changed but i think we have a pretty good clue where to look23:04
ianwwe should probably just pause it I guess.  we don't test project-config elements in the gate which is a bit of a bummer, why we've never noticed this23:05
clarkbI'm good with pausing so that we can focus on getting nb02 to build more images then get nb01 back in the rotation23:06
ianwi have to run out for about 20 minutes but can pause it after and file a bug23:06
clarkbianw: would you prefer I write that change?23:06
clarkbI can write it now23:06
clarkbup to you23:06
clarkbinfra-root https://review.opendev.org/c/opendev/zone-opendev.org/+/780988 https://review.opendev.org/c/openstack/project-config/+/780982 and https://review.opendev.org/c/opendev/system-config/+/780989 (note its parent https://review.opendev.org/c/opendev/system-config/+/780986/) should all be ready to go at this point23:24
clarkbwhen nl01.opendev.org came online ansible did not start its launcher23:24
clarkbbut even if it had the idle configs seemed to work well23:24
clarkbianw: thinking out loud about the boot.local issue maybe now that everything is using system we should write out a shell script and have a unit run it?23:27
fungiusing system...d?23:29
fungijust making sure i grok the suggestion23:29
clarkboh yes s/system/systemd/23:30
fungihuh, system-config-run-meetpad failed in the gate for https://review.opendev.org/77830823:31
fungichecking to see if it's just a fluke23:31
fungiahh, nothing new23:34
fungiERROR: for web  toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit23:34
ianwsorry back now23:48
ianwtrying to fix it is also a valid path forward :)23:49

