Monday, 2020-04-20

ianwi'm going to reboot it ... see if it reoccurs after a certain build00:26
ianwfedora-30 or xenial-plain was the last thing it tried to build before it went crazy, i think00:27
ianwroot     24520 24518  0 00:49 ?        00:00:00 lvs --noheadings --separator : -o vg_name,lv_name ... hrm00:52
prometheanfireianw: have a tic to look at ?02:39
prometheanfireanother glean thing, but the last one that's both passing tests and has at least one other +202:39
*** jhesketh has joined #opendev02:56
*** kevinz has joined #opendev03:06
openstackgerritIan Wienand proposed openstack/project-config master: Fix release for Fedora 31
ianw^ i think this might be the "cause" of the nb04 issues -- it causes the build to fail, and something in the unwinding process kills the container ....03:26
openstackgerritMerged openstack/project-config master: Fix release for Fedora 31
*** ykarel|away is now known as ykarel04:09
ianwunfortunately that has *not* fixed it.  f31 built but then it looks like the container was hosed04:58
ianwlogs in /var/log/nodepool/builds/keep04:58
ianw# docker run --privileged --entrypoint /bin/bash zuul/nodepool-builder:latest  -c  " umount /proc; mount"05:14
ianwi notice that this "works" ... as in unmounts /proc ... whereas on a real system that's pretty much impossible05:14
ianwit's a red herring05:57
ianwi think05:57
ianw# docker run --privileged --entrypoint /bin/bash zuul/nodepool-builder:latest  -c  "DIB_RELEASE=bionic disk-image-create -o test.qcow2 ubuntu-minimal vm ; echo ; mount ; ls /proc"05:57
ianwis a replicator05:57
*** dpawlik has joined #opendev06:09
*** ysandeep|away is now known as ysandeep06:53
*** DSpider has joined #opendev06:57
openstackgerritIan Wienand proposed openstack/project-config master: Move Ubuntu builds away from nb04
*** ysandeep is now known as ysandeep|afk07:11
*** rpittau|afk is now known as rpittau07:12
*** ralonsoh has joined #opendev07:18
openstackgerritMerged openstack/project-config master: Move Ubuntu builds away from nb04
openstackgerritMerged openstack/project-config master: Add TrilioVault charms
openstackgerritMerged openstack/project-config master: Remove pypy job from x/surveil
AJaegerconfig-core, please review (nb03 update)07:51
AJaegerianw, are we ready to go to Fedora 31? What about updating fedora-latest as well?07:51
AJaegerinfra-root, has a promote failure:  infra-prod-remote-puppet-else : FAILURE in 1m 44s07:57
AJaeger"ERROR! the playbook: /home/zuul/src/ could not be found"08:00
*** ysandeep|afk is now known as ysandeep08:02
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Use cached 'tox_executable' in fetch-tox-output
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Use cached 'tox_executable' in fetch-tox-output
*** ykarel is now known as ykarel|lunch08:17
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Use cached 'tox_executable' in fetch-tox-output
*** sshnaidm has joined #opendev08:22
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Use cached 'tox_executable' in fetch-tox-output
ianwAJaeger: i'll re-evaluate tomorrow, i have to investigate these nb04 builder failures a bit more08:37
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Use cached 'tox_executable' in fetch-tox-output
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Use cached 'tox_executable' in fetch-tox-output
*** ykarel|lunch is now known as ykarel09:00
*** tosky has joined #opendev09:01
AJaegerianw: sure09:03
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Use cached 'tox_executable' in fetch-tox-output
*** ysandeep is now known as ysandeep|lunch09:34
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Use cached 'tox_executable' in fetch-tox-output
openstackgerritMerged opendev/irc-meetings master: Not all meetings are OpenStack
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Fail fetch-sphinx-tarball if no html exists
*** ysandeep|lunch is now known as ysandeep10:13
*** rpittau is now known as rpittau|bbl10:15
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Fail fetch-sphinx-tarball if no html exists
*** roman_g has joined #opendev10:39
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Use remote_src false for easier debugging
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: fetch-sphinx-tarball: use remote_src false
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: fetch-sphinx-tarball: use remote_src false
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: tox: allow tox to be upgraded
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: fetch-sphinx-tarball: use remote_src true
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Use main.yaml, not .yml
*** dpawlik has quit IRC11:50
*** dpawlik has joined #opendev11:50
*** dpawlik has quit IRC11:52
*** dpawlik has joined #opendev11:53
*** dpawlik has quit IRC11:54
*** dpawlik has joined #opendev11:54
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Use main.yaml, not .yml
*** bwensley has left #opendev11:56
*** rpittau|bbl is now known as rpittau12:03
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: fetch-sphinx-tarball: Do not keep owner of archived files
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Only delete variables tempfile when it exists
openstackgerritMonty Taylor proposed opendev/system-config master: Split eavesdrop into its own playbook
openstackgerritMonty Taylor proposed opendev/system-config master: Move cloud-init removal to its own playbook
openstackgerritMonty Taylor proposed opendev/system-config master: Just move cloud-init removal into base-server
openstackgerritMonty Taylor proposed opendev/system-config master: Stop cloning a bunch of puppet modules we don't use
openstackgerritMonty Taylor proposed opendev/system-config master: Remove some extra bits from site.pp
openstackgerritMonty Taylor proposed opendev/system-config master: Split codesearch into its own playbook
AJaegermorning, mordred ! Did you see my comment about the promote failure on 720534 above?12:44
openstackgerritMonty Taylor proposed opendev/system-config master: Fix remote_puppet_else playbook name
openstackgerritMonty Taylor proposed opendev/system-config master: Fix remote_puppet playbook names
mordredAJaeger: ^^ that should fix it12:45
mordredAJaeger: thanks for noticing :)12:50
openstackgerritSlawek Kaplonski proposed openstack/project-config master: Update Neutron Grafana dashboard
openstackgerritSlawek Kaplonski proposed openstack/project-config master: Update Neutron Grafana dashboard
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Only delete variables tempfile when it exists
openstackgerritMonty Taylor proposed opendev/system-config master: Split codesearch into its own playbook
*** ykarel is now known as ykarel|afk12:56
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Only delete variables tempfile when it exists
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Use main.yaml, not .yml
*** kevinz has quit IRC13:09
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Only delete variables tempfile when it exists
*** kevinz has joined #opendev13:14
openstackgerritMerged zuul/zuul-jobs master: Update Fedora to 31
mordredfrickler, fungi: when you get a sec,
openstackgerritMerged zuul/zuul-jobs master: Make ubuntu-plain jobs voting
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: WIP: Adds role: ensure-ansible
corvusmordred: re the report of retry_limits in #openstack-infra, it looks like the scheduler is under memory pressure and we need to restart13:35
mordredcorvus: ah. I didn't think to look for memory pressure - I was trying to find an error or something in the log13:38
corvusmordred: i looked at the graph, and then grepped for "ZooKeeper" to confirm there were connection losses13:39
corvusmordred: we may have a problem, i may need more eyes13:41
mordredcorvus: k. where should I apply the eyes?13:41
corvusi think the scheduler is at the part of the startup process where it queries gerrit for all project-branches, but i see no activity13:41
corvusit is talking to gerrit though13:42
corvusit's getting events13:42
mordredcorvus: yeah - I see the events13:43
corvushuh, it looks like it never stopped13:43
corvusbut the process timestamp is recent13:43
corvusi'll just try killing it again13:44
mordredRuntimeError: cannot join current thread13:44
mordredthat's fun13:44
corvusokay, there's definitely no scheduler process running now, restarting13:44
mordredit looks like it's starting13:44
corvuswhy aren't we seeing all the branches at cat jobs?13:45
mordredbut ... yeah13:45
mordreddo we need to restart the mergers?13:45
corvusshouldn't need to13:45
corvusit debug logs the cat jobs before sending them to the mergers anyway13:46
corvusso even if we were waiting on cat jobs, we'd still see a bunch of output13:46
mordredI just checked and the main.yaml is there and is correct13:46
mordredfwiw - puppet hasn't run on the zuul hosts in several days (or anywhere) - so honestly _nothing_ should be different on the hosts13:48
corvusi was just about to say, the zuul install is rather outdated13:48
corvusit was doing a db migration?13:49
mordredcorvus: yeah13:49
corvusand aborting it caused problems13:49
mordredand being unhappy about it13:49
corvusok, should we manually undo that migration and let it run this time13:50
mordredwell - it at least explains the pause - we should maybe add a log line13:50
corvusand then put in some friggin log lines? :)13:50
mordred"starting db migrations" or something13:50
corvusmordred: can you get a mysql prompt while i find the migration?13:50
mordredcorvus: working on it13:50
mordredcorvus: in a mysql prompt on zuul.o.o in a screen session13:52
corvusmordred: alter table "zuul_build" drop column "error_detail";13:52
corvussorry,  i never get the quotes right13:52
mordredcorvus: do we need to update the migrations table?13:52
corvusmordred: i'm assuming it never got upgraded, but i'll get the values13:53
corvusmordred: i think the value we want in there is 5f183546b39c.  let's just check and see if that's what's there.13:53
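Aside on the quoting slip at 13:52: in MySQL's default SQL mode, identifiers are quoted with backticks, and double quotes delimit string literals unless ANSI_QUOTES is enabled. Below is a minimal Python sketch that builds the intended statement. It also includes a query for checking the stored revision, assuming Alembic's default alembic_version table name (Zuul may configure a prefixed or different name).

```python
def quote_mysql_identifier(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def drop_column_sql(table: str, column: str) -> str:
    # Double-quoted names only parse as identifiers under sql_mode=ANSI_QUOTES;
    # backticks work in MySQL's default mode.
    return (
        f"ALTER TABLE {quote_mysql_identifier(table)} "
        f"DROP COLUMN {quote_mysql_identifier(column)};"
    )


# Assumption: Alembic's default version table; a deployment may prefix it.
CHECK_REVISION_SQL = "SELECT version_num FROM alembic_version;"

print(drop_column_sql("zuul_build", "error_detail"))
# ALTER TABLE `zuul_build` DROP COLUMN `error_detail`;
```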
mordredk. will do once this is done13:54
corvusi just checked, and wow, it really doesn't say anything about starting a migration13:55
mordredwe might want to set a status - if the restarts were any indication, we might be at this for a few minutes13:55
mordredcorvus: I'd think "starting migrations" "running migration XXX" and "done with migrations" would all be nice13:55
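The log lines mordred suggests can be sketched in Python as follows. run_migrations and the (revision, upgrade function) pairs are hypothetical, not Zuul's actual migration code. The emit callable is injectable so the messages are easy to check. In production it would be a logger method, and, as corvus notes later, the lines only appear if the log config lets that level through.

```python
import logging
from typing import Callable, Iterable, Tuple

log = logging.getLogger("example.migrations")  # hypothetical logger name


def run_migrations(
    steps: Iterable[Tuple[str, Callable[[], None]]],
    emit: Callable[[str], None] = log.info,
) -> None:
    """Run (revision, upgrade_fn) pairs in order, announcing start, each step, and completion."""
    steps = list(steps)
    emit(f"Starting database migrations ({len(steps)} pending)")
    for revision, upgrade in steps:
        emit(f"Running migration {revision}")
        upgrade()  # may take many minutes on a large table
    emit("Done with database migrations")
```

Usage: run_migrations([("rev-a", upgrade_a)]) with a real upgrade function, or pass emit=messages.append in a test.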
mordredwe also might want to investigate starting to use online ddl for some of these13:56
corvusstatus notice Zuul is temporarily offline; service should be restored in about 15 minutes.13:56
corvusmordred: ^?13:56
corvus(is that too optimistic?)13:57
mordredcorvus: let's go for it13:57
corvus#status notice Zuul is temporarily offline; service should be restored in about 15 minutes.13:57
openstackstatuscorvus: sending notice13:57
mordredcorvus: <-- for our reading pleasure later13:57
-openstackstatus- NOTICE: Zuul is temporarily offline; service should be restored in about 15 minutes.13:57
*** sgw has joined #opendev13:57
* fungi is here if help is needed, but has just been watching since you seem to have figured this out already13:58
*** dmsimard has joined #opendev13:58
corvusmordred: neat; it will be interesting to manage schema upgrades with multiple schedulers13:58
sgwMorning folks, any idea why a .gitreview commit would get stuck for the starlingx/kernel? Does the .zuul.yaml need to precede it?14:00
corvusmordred: we should also get around to dropping some rows from this table.14:00
mordredcorvus: yeah. will probably need to do a leader election14:00
openstackstatuscorvus: finished sending notice14:00
corvussgw: yes it does14:00
corvussgw: at least add the 'noop' jobs to allow the gate to work14:01
corvusmordred: let's check the rev14:02
*** iurygregory has joined #opendev14:02
corvusgood that's what we want14:02
sgwOk, thanks14:02
fungisgw: or you could squash them both into a single change14:02
corvusi'll restart scheduler now and let it run14:02
mordredcool. I think you're good to ... yeah14:02
mordredcorvus: dropping took about 9 minutes, so we should expect the add to take the same14:03
corvusmordred: ack; looks like it's running14:03
ttxhrm, github does not seem to allow me to do that git push --prune after all14:08
ttxFails with lots of ! [remote failure]      refs/changes/66/217766/2 (remote failed to report status)14:08
ttxI guess git push --mirror would work, but that is a bit more costly to run and potentially would introduce a race14:09
*** calcmandan has quit IRC14:09
mordredttx: well - the race is probably fine - it'll just get resolved by the next push14:09
*** calcmandan has joined #opendev14:09
ttxDoes someone know how to clone a repository with all the branches ? Doing git clone and then git push --mirror only pushes the master branch14:10
corvusyou could have someone standing by to force a gerrit replication on the project afterwards14:10
ttxgit clone --mirror seems to do something else than what you expect it to14:10
mordredttx: git clone --mirror makes a bare repo - after you do that, do "git config --bool core.bare false"14:12
mordred(from in the repo)14:12
corvusprogress on zuul14:13
corvuslooks like the github driver is starting14:13
fungittx: we have some code in jeepyb which clones all branches and tags14:13
* fungi finds14:13
corvusand there go the cat jobs \o/14:13
mordredthere we go!14:13
ttxfungi: yeah, I was hoping the async refs/changes cleanup script would not require to clone half of the universe14:14
corvusttx: despite that error, did the prune happen to work?14:15
*** ysandeep is now known as ysandeep|afk14:15
ttxcorvus: no, it leaves the refs/changes untouched14:15
corvusokay, it's up and re-enqueue is running14:17
openstackgerritMonty Taylor proposed opendev/system-config master: Split eavesdrop into its own playbook
mordredcorvus: woot. that's a relief14:18
*** ykarel|afk is now known as ykarel14:22
corvuswin 9614:32
corvuslooks like all systems nominal14:34
mordredcorvus: woot14:34
corvusmordred: shall we terminate the screen session now?14:34
mordredcorvus: yeah14:35
corvusmordred: thanks for your help; i think we recovered from my screwup pretty quickly :)14:36
mordredcorvus: yeah - I'm glad we accidentally cancelled the migration and then reapplied it causing that error14:37
mordredcorvus: otherwise we would have just been super confused for much longer14:37
corvus"why did it take 10 minutes to start?"14:38
ttxSo in summary: we are trying to out-of-band clean up refs/changes/* on GitHub mirrors so that the executor does not get caught for hours cleaning them up the first time it does a git-mirror replication. The only way to do that seems to be to run a git push --mirror from a full clone with all refs/heads but no refs/changes. Any suggestion on how to do that without actually cloning all repositories locally ?14:39
mordredttx: nope. I think that'll require cloning all the repos14:39
ttxOK, will upgrade my bash one-liner to a full bash script then :)14:40
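Here is a rough Python sketch of the first step of ttx's plan. It splits git ls-remote output (one "sha<TAB>ref" per line) into the branch and tag refs worth keeping in a fresh mirror and the refs/changes/* refs that a --mirror push from that clone would then remove. The prefixes are assumptions drawn from the discussion. Whether GitHub accepts the resulting deletions is a separate question, given the remote failures ttx saw at 14:08.

```python
from typing import Dict, List, Tuple

KEEP_PREFIXES = ("refs/heads/", "refs/tags/")
PRUNE_PREFIX = "refs/changes/"


def classify_refs(ls_remote_output: str) -> Tuple[Dict[str, str], List[str]]:
    """Split `git ls-remote` output into refs to mirror and refs/changes/* to drop."""
    keep: Dict[str, str] = {}
    prune: List[str] = []
    for line in ls_remote_output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split("\t", 1)
        if ref.endswith("^{}"):
            continue  # peeled annotated-tag entry, not a separate ref
        if ref.startswith(PRUNE_PREFIX):
            prune.append(ref)
        elif ref.startswith(KEEP_PREFIXES):
            keep[ref] = sha
        # anything else (HEAD, refs/notes/..., etc.) is left alone
    return keep, prune
```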
corvusttx: maybe we just want to do this for nova and neutron, and let the executors handle the rest?14:49
ttxyeah... I'll first assess how long it takes and see if that would work14:49
openstackgerritMerged zuul/zuul-jobs master: Document output variables
openstackgerritMerged zuul/zuul-jobs master: Python roles: misc doc updates
fungittx: openstack-manuals is the other one gerrit tends to spend a bunch of time syncing, so maybe that one as well14:56
*** mlavalle has joined #opendev14:57
clarkbis there a tldr on the zuul situation?14:57
fungiclarkb: i posted one in #openstack-infra i can copy here14:59
clarkbI see it now thanks15:00
corvusclarkb: ran out of memory, i restarted, got confused why it was hanging during start, aborted startup, started again, got a db migration error, realized that's why it was slow, manually reverted the db migration, started it again and just let it run, then all is good.15:00
clarkbdo we think the memory issue is a leak?15:00
corvusclarkb: the big run up was a few weeks ago, and may have been due to me using the repl15:00
corvusi'd like to disregard this data point based on that15:01
corvusclarkb: we should improve the logging to print out migration info (but we won't see it until we stop using our custom log config)15:01
openstackgerritMonty Taylor proposed opendev/system-config master: Split eavesdrop into its own playbook
*** dpawlik has quit IRC15:07
mordredclarkb, corvus: is not quite ready yet (I keep finding things - but I think they'll all help with splitting site.pp generally) ... but the stack leading up to it should all be both ready and safe to land15:09
clarkbmordred: ok I've got a couple meetings to watch for the next little bit then will try to review15:09
clarkbI also need to get a change up to simplify the docker-compose stuff (zk can use the global install and global install doesn't need to keep removing distro package now)15:10
*** ykarel is now known as ykarel|away15:18
sgwAnother dumb question: So we added the .zuul.yaml to starlingx/kernel, but I don't see the zuul job running for that commit, did we miss something else?15:25
clarkbsgw: can you link to the change?15:26
fungientirely possible it got pushed in the few minutes when the scheduler was offline for emergency maintenance15:26
sgwso try recheck?15:26
fungimm, no it was pushed after the maintenance concluded15:26
fungiis it maybe already queued?15:27
funginope, i don't see it on the status page either15:28
sgwRight, I thought it should show up on the status page15:28
clarkbwe should double check the project was added to the tenant config15:28
fungiyeah, next thing i checked was the tenant configuration errors zuul is reporting, but those are all for openstack/openstack-ansible-tests15:29
clarkbit's in the file and the restarts today should've updated it. However, maybe mordred's changes aren't updating zuul config properly?15:29
fungimordred did say ansible hasn't updated the scheduler in several days15:30
fricklercorvus: once zuul is stable, meetpad is broken for me, the redirects for the etherpad work, but I can't start a meeting. I'm assuming some bug in
AJaeger does not list starlingx/kernel15:31
corvusfrickler: ah thanks, i'll take a look.15:31
clarkbAJaeger: k so it likely is the issue of infra-prod-* jobs not updating properly15:31
clarkbmordred: ^ fyi15:31
fungiclarkb: AJaeger: yep, not (yet) updated in /etc/zuul/layout/main.yaml on zuul.o.o15:32
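The check fungi did by hand can be sketched in Python, assuming the tenant config from /etc/zuul/layout/main.yaml has already been parsed into Python objects (for example with PyYAML's yaml.safe_load). The sketch lists the tenants whose sources mention a project. It only understands plain project names, single-key {name: options} entries, and project groups with a projects list; anything else in a real layout is ignored.

```python
from typing import Any, Iterable, List


def _project_names(entries: Iterable[Any]) -> List[str]:
    """Flatten the project entries found under one tenant source key."""
    names: List[str] = []
    for entry in entries:
        if isinstance(entry, str):
            names.append(entry)  # plain "org/name"
        elif isinstance(entry, dict) and "projects" in entry:
            names.extend(_project_names(entry["projects"]))  # project group
        elif isinstance(entry, dict) and len(entry) == 1:
            names.extend(entry.keys())  # {"org/name": {options}}
    return names


def tenants_listing(layout: List[dict], project: str) -> List[str]:
    """Return the names of tenants whose sources list `project`."""
    found: List[str] = []
    for item in layout:
        tenant = item.get("tenant")
        if not tenant:
            continue  # e.g. non-tenant top-level entries
        listed = any(
            project in _project_names(source.get(key, []))
            for source in tenant.get("source", {}).values()
            for key in ("config-projects", "untrusted-projects")
        )
        if listed:
            found.append(tenant["name"])
    return found
```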
mordredthere is a patch in gate that should fix this15:34
corvusfrickler: it looks like only 2 of the 4 services are running15:36
AJaegermordred: the job is still in check after the restart - should we move it to gate?15:36
corvusFATAL ERROR: JVB auth password must be changed, check the README15:37
mordredAJaeger: yeah - how about i do that real quick15:38
corvusfrickler: ^ i guess they updated the docker images, probably to make that password change mandatory15:38
corvusso we could either pin to old images, or actually come up with real passwords15:38
corvusmay as well do real passwords, since we might want to expose xmpp later anyway15:38
mordredcorvus: might as well15:39
mordredAJaeger: enqueued15:39
*** donnyd has joined #opendev15:39
mordredclarkb: incidentally (and you'll see this in the eavesdrop patch) - I learned that the hostname: ansible task requires dbus to exist on systemd systems - and we have that on our cloud-provider images in prod, but it's not installed in the gate nodes. yay dbus15:40
donnydI guess I also need to start having my infra discussions here too right?15:41
corvusmordred: what's the state of getting /p/ mapped into the gerrit container?15:41
corvusmordred: my local git repos are getting further and further behind15:41
corvusi feel like we should either treat that as a serious regression, or stop running the service15:42
clarkb/p/ is going away right?15:42
mordredcorvus: we need to restart the gerrit contianer15:42
clarkbshould we redirect it to or just / on gerrit?15:42
clarkbah if this is already handled then /me gets out of the way15:42
corvusmordred: can we just do that now?15:42
corvusmordred: you want to type that or shall i?15:42
mordredeither way - I can if you wanna do a status log15:43
corvusmordred: sgtm15:43
corvusmordred: do you want to do a notice though?15:43
corvusor just a log?15:43
mordredoh - yeah - that's what I meant15:43
mordreda notice15:43
corvusstatus notice Gerrit will be restarted to correct a misconfiguration which caused some git mirrors to have outdated references.15:44
mordredcorvus: hang on one sec15:44
mordred(but that looks good)15:44
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Use main.yaml, not .yml
corvushow's that ^  (i didn't want to advertise /p/ specifically, but wanted to give folks a breadcrumb in case they saw what i was seeing)15:45
mordredcorvus: ok. we're good (I was double checking the config and then had a "wait, is that right?" moment, but I'm back to being good15:45
corvus#status notice Gerrit will be restarted to correct a misconfiguration which caused some git mirrors to have outdated references.15:45
openstackstatuscorvus: sending notice15:45
-openstackstatus- NOTICE: Gerrit will be restarted to correct a misconfiguration which caused some git mirrors to have outdated references.15:45
fungidonnyd: opendev-wide infrastructure discussions in here, openstack-project-specific infrastructure discussions still make sense in #openstack-infra15:46
donnydthanks fungi15:47
donnydSo I should have brought the issue i noticed this morning to this channel as it affected all opendev (on OE that is)15:48
openstackstatuscorvus: finished sending notice15:49
corvusmordred: ^15:49
mordredok. stopping gerrit15:49
mordredgerrit is starting15:49
mordredcorvus: exception in error log15:51
mordredgerrit seems up - but we're getting mergability check tracebacks15:52
mordredor - we got one15:52
clarkbI think it's normal to get a variety of exceptions if you want to compare to pre-restart15:52
clarkbthings like ssh connections closing unexpectedly15:52
corvusyeah, i think there were some before15:52
mordredok. cool15:52
corvushaving said that, we may want to see if we can figure out what repo that is and fsck it15:53
corvusbecause: Caused by: org.eclipse.jgit.errors.MissingObjectException: Missing blob baeb8879a2ae011e4ea3836dabba584f1311f81415:53
corvusi love how it doesn't say what repo15:53
mordredyup. super helpful15:53
corvus"just google for the sha and see if github has it"15:53
clarkbmordred: corvus: this restart used the new graceful stop right? (want to confirm that worked as expected)15:53
mordredclarkb: the graceful stop was in the compose file - I don't know that I experienced different behavior15:54
corvusi don't see any log lines about stopping15:54
corvusclarkb: maybe you can test it out on review-dev?15:55
fungidonnyd: there's a good chance the problem you spotted was actually a symptom of the zuul scheduler running out of memory15:57
donnydthat would make sense.. .it was firing nodes and they came up fine.. it just seemed like the rest of the process was busted somewhere15:57
fungisince that would have caused nodepool to delete lots of nodes out from under zuul, resulting in jobs getting rerun en masse15:58
fungi(because of zookeeper disconnects)15:58
sgwSo, is the starlingx/kernel setup correctly, or did the restart help? Or should I fire a recheck16:00
clarkbsgw: we need a zuul job on our end to run and update zuul's config. This is fixed by a change in the gate apparently16:01
clarkbsgw: I think you just need to wait until your project shows up here
openstackgerritMerged opendev/system-config master: Fix remote_puppet playbook names
mordredok - there's the patch16:02
AJaegersgw: that change needed to merge first ^16:02
sgwAh so we missed adding it to the projects list?16:03
mordredwe've been updating our config management- and we had a typo that caused some of it to not actually run16:03
AJaegersgw: you did fine - we missed telling zuul about it16:03
mordredthe config management in question was the stuff that actually applies the zuul main.yaml config file :)16:03
sgwah ok thanks16:03
mordredthe job should run in the next few minutes - the deploy job is enqueued16:04
openstackgerritJames E. Blair proposed opendev/system-config master: Use real passwords for meetpad
corvusfrickler, clarkb, mordred, fungi: ^16:06
*** rpittau is now known as rpittau|afk16:07
mordredcorvus: ++16:07
fungisgw: yeah, you caught us mid-transition between how we're handling applying those configurations16:14
openstackgerritMerged zuul/zuul-jobs master: Fail fetch-sphinx-tarball if no html exists
clarkbAJaeger: is something you might be able to respond to? it's questions about docs publishing and layout for airship16:20
AJaegerclarkb: yes, can do16:22
clarkbthank you!16:23
*** fdegir has joined #opendev16:25
mordredAJaeger: remote-puppet-else is running16:25
mordredAJaeger, sgw : zuul has been updated - should be all good now16:42
mordredsorry about the delay16:42
sgwdo I need to do a recheck or is it in the queue16:42
clarkbsgw: you need to recheck16:43
openstackgerritClark Boylan proposed opendev/system-config master: Cleanup unneeded things post docker-compose upgrade
clarkbinfra-root ^ small cleanup from docker-compose things friday16:48
clarkbmordred: on the applytest changes in system-config we don't seem to run the applytest job?16:57
clarkboh wait nevermind I think we do I'm just blind16:57
*** sshnaidm is now known as sshnaidm|afk16:59
clarkbmordred: comment on not sure if you want to do that in a followon or update the change17:00
clarkbmordred: comment on which is a bit more involved17:03
AJaegerclarkb, mordred , let me do that followup change on 72088717:04
AJaegerclarkb: want to +A 720887 then?17:04
clarkbAJaeger: sure17:04
dmsimardis anyone else getting a basic authentication prompt on ? I don't get one on
openstackgerritAndreas Jaeger proposed opendev/system-config master: Remove system-config-puppet-beaker-rspec-puppet-4-centos-7-infra
AJaegerclarkb, mordred ^17:06
AJaegerdmsimard: I get the prompt as well ;(17:06
clarkbdmsimard: yes we tracked it down a few weeks back. It has to do with your readme having a 404 link iirc17:07
clarkbsomething like that17:07
clarkb(and that is how gitea expresses it when rendering the readme)17:07
dmsimardyeah there's a link to in the readme, that's probably not going to work17:08
openstackgerritMerged opendev/system-config master: Remove puppet-beaker-rspec-puppet-4-infra-system-config
openstackgerritMerged opendev/system-config master: Remove unused rspec tests
openstackgerritMerged opendev/system-config master: Make applytest files outside of system-config
sgwThanks for your help unsticking zuul and the starlingx/kernel jobs17:20
openstackgerritMerged opendev/system-config master: Move puppet apply jobs to system-config repo
AJaegerconfig-core, please review now that 720887 is merged ^. And infra-root, please review
AJaegermordred: did you see the -1 on ?17:31
AJaegerclarkb: please have a look at - mass puppet retirement. Should we ask for an announcement email?17:32
clarkbAJaeger: ya why don't I write an email to openstack-infra17:33
AJaegerthat works as well ;)17:33
*** ralonsoh has quit IRC17:39
clarkbAJaeger: note sent17:40
*** roman_g has quit IRC17:56
*** prometheanfire has quit IRC18:15
*** slittle1 has quit IRC18:17
*** slittle1 has joined #opendev18:21
openstackgerritMonty Taylor proposed opendev/system-config master: Split eavesdrop into its own playbook
mordredclarkb: yeah - I'll do the centos removal in a followup18:28
clarkbmordred: AJaeger volunteered fwiw so I approved the parent18:28
mordredclarkb: cool18:29
mordredclarkb: so - etherpad-dev ... I don't know - do we want to keep an etherpad-dev?18:29
mordredat the very least I think we should re-deploy one using the new stuff18:29
openstackgerritMerged opendev/system-config master: Use real passwords for meetpad
mordredso removing the existing one probably isn't a bad idea anyway18:29
clarkbmordred: ya I think its mostly about making sure we know what the plan is there and not ignoring it if we need to not ignore it18:30
clarkbI'm semi interested in running it like we do gitea and others. Basically rely on our test tooling as -dev replacement18:30
mordredyeah. I think I'd like to do that until such a time as we realize it doesn't work18:30
clarkbbut etherpad is different in that its client behavior is really important so being able to test it "live" might make it an exception?18:30
mordredalso a good point18:30
mordredsince we're building our own images, we'd likely need to make a second image with like a :dev tag18:31
mordredso that we could run them on different tags18:31
clarkbmordred: thinking out loud here, maybe we can get away with holding a test node for actual client verification18:33
*** slittle1 has quit IRC18:33
clarkbbasically build -dev on demand rather than keeping it around (I kinda like that)18:34
mordredyeah. me too18:34
mordredI think it's worth at least trying18:34
mordredturns out we can always spin up another etherpad-dev if we need to18:34
AJaegermordred: is the centos removal18:35
mordredAJaeger: ++18:37
mordredAJaeger: fun on the focal image build :(18:37
openstackgerritMonty Taylor proposed openstack/project-config master: Start building focal images
AJaegermordred: yeah18:39
AJaegerspeaking about nodepool, here's one more change to review, please
AJaegermordred: and you need a +2 on (mirror focal) first...18:40
*** jrosser has quit IRC18:41
*** slittle1 has joined #opendev18:41
AJaegermordred: don't we add it to nb04 as paused as well?18:42
*** jrosser has joined #opendev18:42
clarkbAJaeger: yes those two should be disjoint18:43
clarkbso enable on !nb04 and disable on nb04 or vice versa18:43
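clarkb's rule that the nb04 and non-nb04 builder configs should be disjoint can be checked mechanically. The Python sketch below assumes each builder config's diskimages list has been loaded as dicts carrying name and nodepool's pause option; the builder keys are placeholders. It flags any image that is unpaused in zero configs or in more than one.

```python
from typing import Dict, List, Set


def active_builders(configs: Dict[str, List[dict]]) -> Dict[str, Set[str]]:
    """Map each diskimage name to the builder configs where it is not paused."""
    active: Dict[str, Set[str]] = {}
    for builder, diskimages in configs.items():
        for image in diskimages:
            builders = active.setdefault(image["name"], set())
            if not image.get("pause", False):  # pause defaults to false
                builders.add(builder)
    return active


def disjointness_problems(configs: Dict[str, List[dict]]) -> Dict[str, Set[str]]:
    """Images that are unpaused in no config, or in more than one."""
    return {
        name: builders
        for name, builders in active_builders(configs).items()
        if len(builders) != 1
    }
```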
AJaegerconfig-core, please review and (needs recheck once 720889 is merged)18:50
mordredclarkb: sigh:
mordredclarkb: what do you think we should do about that? the issue is that puppet is trying to start accessbot - which is clearly not going to work because we don't have a real irc account there18:55
openstackgerritDrew Walters proposed openstack/project-config master: Add Airship subproject documentation job
mordred(also - puppet logging to syslog is ... really annoying)18:55
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Only delete variables tempfile when it exists
clarkbmordred: maybe we should just stick with the noop apply jobs for now then?18:56
clarkbseems like handling cases like that will be easier as we convert into ansible18:56
*** prometheanfire has joined #opendev18:56
mordredclarkb: yeah. well - oh, you know18:56
mordredclarkb: what if we add noop support to the puppet role18:56
mordredso we still break it out into a service playbook like that patch is doing18:57
mordredbut we just have the test job run with noop set18:57
clarkbthat might give us better coverage of the puppet machinery18:57
clarkbif not quite for every puppet object18:57
clarkbya I think I like that18:57
clarkbits a decent halfway compromise18:57
mordredyeah. that said - _most_ of these things work just fine18:58
mordredit's really just starting services like accessbot18:58
mordredso maybe thinking about passing a "skip services" flag, similar to noop, that we can protect things like accessbot that are just impossible18:58
mordredwe might not have that many18:58
mordredand we _can_ do most of these things18:58
mordredI'll play with that first18:58
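The noop / skip-services idea for the test jobs can be sketched in Python for illustration only (the real role would be Ansible). --noop is a real puppet apply flag. The skip_services fact is hypothetical: it is passed through Facter's FACTER_<name> environment variables and would only have an effect if manifests such as the accessbot one checked it.

```python
import os
from typing import Dict, List, Tuple


def puppet_apply_command(
    manifest: str, noop: bool = False, skip_services: bool = False
) -> Tuple[List[str], Dict[str, str]]:
    """Build argv and environment for a `puppet apply` run.

    noop adds puppet's real --noop flag (report changes, apply nothing).
    skip_services is hypothetical: it exports a custom fact that a manifest
    would have to check itself before managing services.
    """
    argv = ["puppet", "apply"]
    if noop:
        argv.append("--noop")
    argv.append(manifest)
    env = dict(os.environ)
    if skip_services:
        # Facter turns FACTER_<name> variables into facts; values are strings.
        env["FACTER_skip_services"] = "true"
    return argv, env
```

Usage: argv, env = puppet_apply_command("site.pp", skip_services=True), then subprocess.run(argv, env=env, check=True). The manifest path here is a placeholder.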
mordred(we could always spin up an irc daemon and point accessbot at it ;) )18:59
* mordred is not going to do that18:59
mordredalso - I mean, one of my next tasks is ansible-ifying eavesdrop _anyway_19:00
mordredI was thinking it would make an easy first puppet split out - but I might also just ansiblify it rather than update the puppet to conditionally run accessbot19:01
openstackgerritMerged openstack/project-config master: Use legacy infra puppet jobs from system-config
openstackgerritDrew Walters proposed openstack/project-config master: Add Airship subproject documentation job
AJaegerianw: since you reworked all the docs publishing for static, could you review the change above, please ^19:15
AJaegeris that all fine - including directory structure?19:15
clarkbok meeting agenda is out20:58
clarkbI might finally have time to look at nodepool logs again20:58
clarkblooks like rackspace has been happy but inap is not currently20:59
clarkbbased on grafana graphs21:00
openstackgerritJoseph Richard proposed openstack/project-config master: Add Portieris Armada app to StarlingX
clarkbmordred: if you have a second I was going to test docker-compose stoppage of gerrit on review-dev but there are a ton of jeepyb upstream project leaked processes that maybe we should deal with first21:05
clarkbI seem to recall something had to be fixed around that, is there cleanup we need to do?21:05
mordredclarkb: oh - I think we already fixed that but probably didn't clean up on review-dev21:07
mordredthe issue was that we weren't mounting in all the right things, so manage-projects was starting and then couldn't log in to gerrit, so it just sits there retrying until the end of time21:08
*** sgw has quit IRC21:09
clarkbcorvus: looking at the nodepool behavior with fresh eyes. I think nodepool is actually aware that it is at or near quota, it then pauses while waiting for ~150 nodes to delete. Then nodes "delete" but nova quota isn't updated so nodepool unpauses and tries to launch nodes and fails on quota errors.21:09
clarkbnow I think where we get in trouble is we then pause immediately again?21:10
clarkbso we can end up with multiple requests locked in a provider that isn't really in a happy state21:10
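The race clarkb describes can be sketched as a toy model. This is a hypothetical illustration, not nodepool's actual launcher code: `ToyProvider`, `pending_deletes`, and the rule that the launcher counts a deleted node as free capacity immediately are all assumptions, made only to show how a lagging nova quota update turns into quota errors right after an unpause.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyProvider:
    """Hypothetical model of one cloud provider; not nodepool's real code."""
    quota: int                 # instance quota configured in the cloud
    used_in_nova: int          # usage as nova's quota accounting reports it
    pending_deletes: int = 0   # deletes issued but not yet reflected by nova
    failures: List[str] = field(default_factory=list)

    def local_free(self) -> int:
        # Launcher's view (assumed): deleted nodes count as free right away.
        return self.quota - (self.used_in_nova - self.pending_deletes)

    def nova_free(self) -> int:
        # Cloud's view: capacity is only released once nova catches up.
        return self.quota - self.used_in_nova

    def try_launch(self, n: int) -> int:
        """Unpause if the local view has room; return launches nova accepts."""
        if self.local_free() < n:
            return 0  # stay paused
        accepted = min(n, max(self.nova_free(), 0))
        if accepted < n:
            self.failures.append(f"{n - accepted} launches hit quota errors")
        self.used_in_nova += accepted
        return accepted

    def nova_catches_up(self) -> None:
        # The delayed quota update finally lands.
        self.used_in_nova -= self.pending_deletes
        self.pending_deletes = 0


# At quota, 50 nodes "deleted", but nova has not released the quota yet.
provider = ToyProvider(quota=100, used_in_nova=100, pending_deletes=50)
first = provider.try_launch(50)    # unpauses, every launch fails on quota
provider.nova_catches_up()
second = provider.try_launch(50)   # now the launches actually fit
```

In this model the window between `local_free()` and `nova_free()` agreeing is exactly when requests get handed to a provider that cannot serve them yet.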
clarkbI'll try and get an etherpad of relevant logs together21:10
fungiclarkb: mordred: also the cronspam from the daily backups for review-dev are complaining21:17
fungii don't have an example handy, but there will be another in a few hours21:17
*** sgw has joined #opendev21:18
clarkb my notes on nodepool behavior21:18
clarkbgoing to share in #zuul now as I think this is mostly a nodepool thing21:19
sgw Hi Team, has something changed with the build-openstack-docs-pti template?  We are seeing a POST_FAILURE in the starlingx/zuul-jobs repo with this change:
sgwThis repo does not generate any docs21:20
sgwAJaeger: ^^^ you put this into starlingx/zuul-jobs, we are also seeing a similar issue in the new starlingx/kernel repo21:22
clarkbsgw: do you have the change that added the jobs handy?21:30
fungiclarkb: added build-openstack-docs-pti to starlingx/zuul-jobs when it merged on 2019-08-2621:32
fungiso that's been a while21:33
mordredbuild-openstack-docs-pti seems like a weird job to run on a starlingx repo - but I don't know background there21:33
fungisince it was added in august, that likely predates us better standardizing opendev docs jobs21:37
sgwShould I just disable that job for now since that repo does not have any docs requirements anyway21:40
openstackgerritDouglas Mendizábal proposed openstack/project-config master: Add ansible role for managing Luna SA HSM
openstackgerritMonty Taylor proposed opendev/puppet-accessbot master: Add flag to skip running the access script
clarkbsgw: yes if there are no docs to build I think you can safely drop the docs jobs21:42
openstackgerritMonty Taylor proposed opendev/system-config master: Split eavesdrop into its own playbook
mordredclarkb: ^^ the depends-on won't actually work - I don't have the puppet modules doing that yet - and I don't know if I care enough to21:45
mordredclarkb: but if we could go ahead and land the puppet change, I think we can recheck the system-config change and it should work21:45
mordredI looked at the other thing - but the ansible task is going to be ... a little more involved21:45
mordredand I think it should be done as a separate change21:46
mordredclarkb: that said - this is going to be another great one to have triggered by project-config changes21:48
mordredsince the puppet run itself is the actual "bot" in this case21:48
mordredclarkb: we have a file - in puppet-accessbot - that doesn't seem to be used anywhere21:50
clarkbmordred: is that what we run to check that the proper perms are already set on channels to add the bot?21:50
mordredclarkb: oh - maybe so?21:51
mordredwe don't install it21:51
openstackgerritMonty Taylor proposed opendev/system-config master: WIP Build a container for accessbot
fungimordred: is that ^ maybe something we should eventually be running as a periodic zuul job?22:04
mordredfungi: yup!22:04
fungicool, doesn't seem like it actually needs a home on any persistent server22:05
mordredfungi: what I'm thinking is - make the container - install the container and config files with ansible in service-eavesdrop.yaml - and then make a run-accessbot.yaml playbook that just runs the command - that we can run in response to project-config changes and also in timer jobs22:06
mordredthen we can run service-eavesdrop in the gate, but leave off run-accessbot22:06
mordredfungi: yeah - we could almost certainly even not install it - but we do need the config files and secrets, so it might be just as easy to keep the pattern22:07
fungiyeah, for now it makes sense not to change much22:07
fungibut ultimately it's just a command we run periodically with some config from git and a secret22:07
fungithere's no state to maintain22:08
fungi(unlike our other irc bots)22:08
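The split being discussed (install everything with service-eavesdrop.yaml, but only execute the bot from a separate run-accessbot.yaml) comes down to choosing playbooks per trigger. A hypothetical Python sketch of that selection; the pipeline names and the assumption that accessbot's channel config lives under `accessbot/` in project-config are illustrative, and in practice this would be expressed as Zuul job definitions rather than code:

```python
from typing import List, Sequence


def playbooks_for(pipeline: str, changed_files: Sequence[str] = ()) -> List[str]:
    """Hypothetical: pick playbooks for a trigger (not real Zuul config)."""
    if pipeline == "gate":
        # Exercise the install, but never run the bot against real IRC.
        return ["service-eavesdrop.yaml"]
    if pipeline == "periodic":
        # Timer job: just re-run the access script.
        return ["run-accessbot.yaml"]
    if pipeline == "deploy":
        books = ["service-eavesdrop.yaml"]
        # Assumed path prefix for accessbot's channel config in project-config.
        if any(f.startswith("accessbot/") for f in changed_files):
            books.append("run-accessbot.yaml")
        return books
    return []
```

Keeping the bot run in its own playbook is what lets the gate test the install without needing a real IRC account.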
mordredfungi: in the mean time -if you have a sec, could you ?22:09
mordredI need that to unstick
mordred(turns out running the command when we run the puppet in the gate job is unhappy)22:09
mordredthe depends-on will not work22:09
* mordred promises this is all going to lead to finishing the gerrit task and getting gerritbot updating again22:10
fungimordred: looks like there's some negative ci results for 72135022:11
mordredfungi: booo22:11
fungilegacy-puppet-lint and system-config-puppet-beaker-rspec-puppet-4-infra22:11
mordredso ...22:13
mordredok - nevermind22:13
mordredI'm just going to finish the ansiblifying in the morning22:13
mordredbecause I'm NOT fixing that22:14
fungimore bitrot? i haven't looked at the errors yet22:14
mordredyeah- well - sort of22:14
mordredI think it's bitrot in terms of we have a central test that expects to run that likely hasn't been run here22:14
fungibut it seems like every time i turn my back, yet another ruby gem decides it no longer supports xenial's version of ruby22:14
ianwmordred / clarkb: not sure if you saw, after a number of false starts i diagnosed the nb04 down to ubuntu somehow destroying the container (
mordredianw: I saw the disable ubuntu on nb04 patch but hadn't seen that22:15
* fungi imagines ubuntu dousing some crates with petrol and setting them ablaze22:15
ianwi'm not sure why ... we do the same thing in the gate for the nodepool container func test22:15
ianwanyway, that will obviously be a blocker for migrating all our hosts to the container builder22:16
mordredianw: do we run more than one in the gate? I mean - the first build seems to work fine22:16
ianwthat was the red-herring ... the rpms builds work fine22:16
mordredianw: is it that dib is finding a mount inside of the container and "cleaning it up" ... oh22:16
mordredso it's not a fundamentally dib thing22:17
mordredit's the debootstrap22:17
mordredso - potentially something about how debootstrap builds its chroot - or more likely cleans up after itself22:17
mordredis "cleaning up" something it shouldn't be22:17
ianwi have a suspicion debootstrap is involved ... i need to trace it out today22:17
mordredianw: I agree with that suspicion22:17
ianwi'm not really sure why "umount /proc" in a container actually works22:17
ianwi guess no daemon has any part of it open?22:18
ianwand i'm not sure why it would work in the gate if it doesn't in production22:18
ianwthese are the mysteries of our time22:18
mordredianw: by the time you figure this out, you're going to understand everything about docker22:18
*** tosky has quit IRC22:19
ianwhaha docker will probably get bought by microsoft and we'll probably switch to podman then :)22:19
mordredianw: :)22:19
* ianw wonders if docker is already owned by microsoft, it's hard to keep up22:20
mordredianw: so - a thing to ponder in parallel22:20
mordredianw: <-- if we get that working, we could use the ubuntu docker image for the initial rootfs and avoid the debootstrap step22:21
mordredianw: it might be a thing to ponder depending on how today's debugging goes22:22
ianwyeah it's certainly on my mind22:22
fungialso you might consider switching to mmdebstrap, if we're doing it in containers and not stuck with ancient tools22:23
mordredfungi: wow - what's mmdebstrap?22:23
fungiit's in debian as of buster, and ubuntu as of disco22:24
mordredfungi: neat22:24
fungiIn contrast to debootstrap it uses apt, supports more than one mirror, automatically uses security and updates mirrors for Debian stable chroots, is 3-6 times faster, produces smaller output by removing unnecessary cruft, is bit-by-bit reproducible if $SOURCE_DATE_EPOCH is set, allows unprivileged operation using Linux user namespaces, fakechroot or proot and can setup foreign architecture chroots using22:24
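The reproducibility claim quoted above is straightforward to check: build the same chroot twice with the same `SOURCE_DATE_EPOCH` and compare digests of the outputs. A minimal Python sketch; the helper names are made up, and it assumes each build produced a single output file such as a tarball:

```python
import hashlib


def sha256_of(path: str) -> str:
    """Stream a file through SHA-256 so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(first: str, second: str) -> bool:
    """True when two build outputs are bit-for-bit identical."""
    return sha256_of(first) == sha256_of(second)
```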
mordredianw: so - yeah - might also be worth trying updating dib to use mmdebstrap22:24
corvusianw, mordred: i believe that change is working22:24
corvusgimme a sec to dig up links22:25
mordredcorvus: yeah - by "get it working" I might just mean "figure out the test failures"22:25
corvusit's actually passing the relevant tests22:25
corvusthe failures are that it's only tested under bionic or something22:25
corvusbecause podman isn't installed on the others22:25
fungimordred: ianw: i've been using mmdebstrap on sid for creating my stable chroots for a couple years now, works well22:25
* mordred has to afk22:26
corvusianw, mordred: so to get this working we might just need to add more podman support to distros in zuul-jobs, or disable that functest on platforms where we can't22:26
openstackgerritMerged opendev/system-config master: Cleanup unneeded things post docker-compose upgrade
fungialso the fact that it can use qemu-user for foreign architectures would make it possible to build an arm64 ubuntu image on an amd64 vm, in theory (though i don't know how slow that would be)22:27
ianwcorvus: it would be good to do some boot tests too, even in experimental.  although the nodepool tests are on my todo list to replace with container based ones because they don't really reflect production22:28
corvusah yep22:28
clarkbfungi: slow as molasses probably22:31
fungiyeah, depends on how much actually runs under qemu-user22:32
fungilike, unpacking the debs doesn't need emulation22:32
fungibut maintscripts probably could22:32
fungialso probably dib would need to be extended to run certain phases for elements in a similar emulation layer anyway22:32
fungiso it's not like we could take advantage of that straight away22:33
clarkbianw: fungi mordred corvus is there a tldr on the dib things? or should I just not worry? was pretty heads down on nodepool things but I think we've maybe pulled that thread to a conclusion (at least until change gets written?)22:37
ianwclarkb: the long story short is now i've realised that ubuntu doesn't seem to build under the container as is22:38
ianwwe can either fix it, or move ahead with an alternative approach like the container base images, or try something in between like new debootstraps22:39
ianwthen we should update the dib nodepool tests to be testing our production images under container builds22:39
ianwok, i've figured out the magic runes to run a test build under strace on nb04 and it's outputting to /root/trace/out.txt ... this should tell us if it's clearly something running "umount /proc"22:55
ianwkeeping notes in :
openstackDebian bug 919659 in live-build "live-build: building in docker fails with mounting /proc unmount /sys" [Important,Open]23:02
ianw"2020-04-20 22:58:27.162 | W: Failure trying to run: chroot "/tmp/dib_build.JD7OVXuc/mnt" mount -t proc proc /proc"23:05
fungii guess that should be merged with bug 92181523:05
fungier, i mean23:06
openstackDebian bug 921815 in debootstrap "debootstrap umount "host" /proc when running in a Docker container" [Normal,Open]23:06
ianwyeah, that links to an unmerged pull request23:06
fungiwell, "merge request" (because it's gitlab), but yep23:07
ianw--cap-add SYS_ADMIN might be enough?  i sort of thought that a privileged container had that23:07
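Rather than guessing whether a privileged container already holds CAP_SYS_ADMIN, one can decode the `CapEff` mask from `/proc/self/status` inside it; CAP_SYS_ADMIN is capability number 21 in `linux/capability.h`. A minimal Python sketch with hypothetical helper names:

```python
CAP_SYS_ADMIN = 21  # capability number from <linux/capability.h>


def effective_caps(status_text: str) -> int:
    """Return the CapEff bitmask parsed from /proc/<pid>/status contents."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_sys_admin(status_text: str) -> bool:
    # Test the single bit for CAP_SYS_ADMIN in the effective set.
    return bool((effective_caps(status_text) >> CAP_SYS_ADMIN) & 1)
```

Run inside the builder as `has_sys_admin(open("/proc/self/status").read())`. Docker's usual default mask (`00000000a80425fb`) does not have bit 21 set, while a `--privileged` container's full mask does.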
clarkbI need to reboot for overdue system updates. Back in a bit23:07
fungiianw: i thought it did too, but i'm not super confident in my grasp of container implementations23:07
fungithe first bug you linked had a runtime hackaround mentioned downthread23:08
ianwit must be allowed to do mounts, all the others mount a million things ...23:08
ianw4197  mount("proc", "/proc", "proc", MS_MGC_VAL, NULL) = -1 ELOOP (Too many levels of symbolic links)23:09
ianw4197  write(2, "/proc: mount(2) system call fail"..., 70) = 7023:09
ianwright, it's not failing with a permissions error, but something like the layout issues23:10
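Since the symptom was the container's /proc disappearing, one way to watch for it is to read the container's mount table from the host via `/proc/<pid>/mountinfo` (reading it from inside stops working once /proc is gone). A minimal Python parser, a sketch with hypothetical helper names; it relies on the mountinfo layout described in proc(5), where the mount point is the fifth field and the filesystem type follows the ` - ` separator:

```python
from typing import Iterator, Tuple


def mounts(mountinfo_text: str) -> Iterator[Tuple[str, str]]:
    """Yield (mount_point, fstype) pairs from /proc/<pid>/mountinfo contents."""
    for line in mountinfo_text.splitlines():
        if not line.strip():
            continue
        # Optional fields vary in number, so split on the " - " separator first.
        before, _, after = line.partition(" - ")
        mount_point = before.split()[4]
        fstype = after.split()[0]
        yield mount_point, fstype


def proc_is_mounted(mountinfo_text: str) -> bool:
    return any(mp == "/proc" and fs == "proc" for mp, fs in mounts(mountinfo_text))
```

Polling this for a PID inside the builder container would be expected to flip to `False` at the point where debootstrap's cleanup takes /proc away.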
ianwfungi: is mmdebstrap a complete rewrite?23:17
ianwhrm "Debootstrap supports creating a Debian chroot on non-Debian systems but mmdebstrap requires apt and is thus limited to Debian and derivatives."23:19
clarkbapt is available on other systems though right?23:19
clarkbpossible that isn't sufficient though23:19
fungiianw: seems like a complete rewrite at least. there's also cdebootstrap, which is a rewrite in c23:20
ianwjohnsom / cgoncalves are possibly the main people who might care about building debuntu on !debuntu?23:21
fungia colloquial term for the family of debian and debian derivative distributions23:23
fungi(such as ubuntu)23:23
johnsomCurrently Octavia cares about Ubuntu and CentOS/RHEL23:23
fungidoes octavia use centos/rhel to build ubuntu images with dib?23:24
johnsomThere might be some shenanigans like that happening, but I don't think it is a *need*.23:25
ianwright now i'm thinking import the debootstrap patch into the openstackci ppa and put that in the container23:25
ianwwe ran with a debootstrap from there (still do maybe on xenial?)23:26
johnsomWe do use debootstrap for the ubuntu-minimal style builds as the cloud images are ... large and this has been more stable for us (i.e. changing cloud image formats, etc.)23:27
ianwyeah i'd prefer to KISS ... i don't think we want to spend time on new debootstrap implementations that might make life hard for existing users of -minimal images23:29
openstackgerritMohammed Naser proposed zuul/zuul-jobs master: helm-template: enable using values file
fungialso mmdebstrap isn't available on old ubuntu versions (prior to 20.04 lts, which isn't even out yet) so, yeah, that would be a challenge for anyone not running from a debian/buster container23:44
fungii agree patching debootstrap is probably the best option for now23:44
johnsomI am jumping between a bunch of conversations, so hard for me to track here. Let me know if there is feedback I can give for DIB. (FYI, 20.04 is planned for release this week last I checked).23:52
fungijohnsom: nope, that was helpful, thanks23:52
johnsomOk, cool! Easiest conversation I have had today23:53
fungiwe likely don't want to cause problems for folks building images from older ubuntu releases for a while still23:53
fungii was only considering opendev's use case, as ianw rightly pointed out23:53
ianwthis should let us test23:59
ianwit's nice to have options; of the three "fixes" available seemed the most appropriate23:59