Tuesday, 2020-06-16

fungii'm still around but seems like the deployment backlog isn't close to reaching 720302 yet, still 7 changes ahead of it00:40
fungiso i have doubts i'll be awake by then00:40
fungia lot of deploy jobs are failing or hitting timeouts too00:41
corvusfungi, clarkb: we've disabled ansible, so everything should timeout.  we'll let it continue doing that overnight, then run stuff manually tomorrow.00:42
fungiahh, okay. so i should probably just apply 733673 by hand at this point00:43
fungithough i've technically already missed the 15th for deleting that ml now, i doubt anyone is put out by it00:45
fungi#status log deleted user-committee ml from openstack mailman site on lists.o.o00:48
openstackstatusfungi: finished logging00:48
auristorianw: how long did it take to release mirror.fedora after the rsync without -t ?00:52
fungi#status log ze04 rebooted to clear inconsistent afs rw volume access following saturday's outage00:53
openstackstatusfungi: finished logging00:53
ianwauristor: i ran a zero-delta update and it took about 10 seconds :)00:54
auristorI think we've found your smoking gun.00:55
openstackgerritIan Wienand proposed opendev/system-config master: rsync-mirrors: drop rsync -t and standarise flags  https://review.opendev.org/73575300:55
ianwauristor/fungi: ^ and that's what i've come up with00:55
auristorrsync -t isn't helping anything because it always finds a time difference00:56
ianwright, it always tries to set tv_nsec now; possibly previously (i mean, this has been a long time) upstream mirror was ext3 or nfs or something that wasn't reporting ns precision00:57
auristordo you care about typos in commit messages?00:57
ianwauristor: yep; i'll fix any pointed out :)00:57
ianwlike standarise :)00:57
auristorivestigating :)00:58
openstackgerritIan Wienand proposed opendev/system-config master: rsync-mirrors: drop rsync -t and make flags consistent  https://review.opendev.org/73575300:58
fungilooking at the rsync manpage, it has a --modify-window option to make the timestamp comparisons flexibly fuzzy, though i have no idea if that would have helped00:59
auristora window of 1s would have helped00:59
ianwaiui its default window does ignore ns ... but it's the fact it tries to *update* the files on the sync part (rather than the "what should i sync" part)01:00
ianwagain, aiui, it uses the modification time as a shortcut ... if modification time is == then file hasn't changed01:01
ianwbut without "--times" it falls back to its old logic of ctime and file size01:01
auristorianw: sorry but there are two more typos01:01
auristorI think you need a new keyboard, more coffee or more beer01:01
auristortrue, it's the setting of time not the comparison that is the problem01:03
*** Meiyan has joined #opendev01:03
openstackgerritIan Wienand proposed opendev/system-config master: rsync-mirrors: drop rsync -t and make flags consistent  https://review.opendev.org/73575301:03
ianwheh, just need to slow down, and pay attention to flyspell01:04
auristora problem for openafs that is.   either an auristorfs client or an auristorfs fileserver would handle it01:04
fungiand yeah, that's why i figured altering the comparison window wouldn't do any good01:04
fungiianw: not sure if you saw, i left comments/questions on patch #201:05
auristornow that syncing will transfer the correct amount of data perhaps replicas can be added back to afs01.ord01:05
ianwfungi: yeah, that -p was missed thanks01:06
ianwfungi: yeah, we had the "-i" to debug what rsync thought it was touching01:07
fungiahh, okay01:07
ianwhowever, it *was* touching the timestamp, and not reporting that01:07
fungiso we'll likely roll those back to just -v?01:07
ianw(which i am poking at the rsync source about)01:08
ianwwe can; let me update that too01:08
openstackgerritIan Wienand proposed opendev/system-config master: rsync-mirrors: drop rsync -t and make flags consistent  https://review.opendev.org/73575301:10
openstackgerritOleksandr Kozachenko proposed zuul/zuul-jobs master: Fix namespace speicification in collect-kubernetes-logs role  https://review.opendev.org/73575501:17
ianwhttps://git.samba.org/?p=rsync.git;a=patch;h=0f8e9e2d8638e47d646a6baba694b303ac84e695;hp=c4a3f55be35726d0a033996dc37b0fb248b45cb501:41
ianwfungi/auristor: ^ and here's the change that fixes it ...01:42
ianwwhich made it into 3.1.3 ... and bionic has ... you guessed it ... 3.1.201:43
ianwalso, it seems "If you repeat the option, unchanged files will also be output," is mentioned for "-i" ... so if we had "-ii" we would have actually seen the itemized output saying "t", so it's just a bug with 3.1.2 we didn't see that which would have tipped me off without having to strace and blah blah blah01:46
ianwactually, no, that's not true.  it doesn't show up with "-ii" either01:47
ianwhttp://paste.openstack.org/show/794789/01:48
openstackgerritIan Wienand proposed opendev/system-config master: rsync-mirrors: drop rsync -t and make flags consistent  https://review.opendev.org/73575301:51
*** ysandeep is now known as ysandeep|away02:03
auristorianw: I think that rsync fix is wrong02:25
auristorThe clock resolution of afs3 is 1s just as the clock resolution of FAT is 2s.   rsync should be querying the clock resolution of the source and destination filesystems and use them to decide what a matching timestamp is and not set the modify time if there was a time match.02:29
ianwbut how do you really know what filesystem you're writing to?02:33
auristorif the version of linux you are using doesn't support fsinfo then you do what df does and use a table of file system names as reported by stat02:35
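[editor's note] The resolution-aware comparison auristor describes above can be sketched roughly as follows. This is only an illustration and not rsync's actual logic: the function names and the lookup table are hypothetical, the 1s (afs3) and 2s (FAT) values come from the discussion, and the 1ns entry for ext4 is an assumption.

```python
NS_PER_SEC = 1_000_000_000

# Hypothetical table of timestamp resolutions, in nanoseconds.
# afs3 (1s) and FAT (2s) are from the discussion; ext4 at 1ns is assumed.
RESOLUTION_NS = {
    "afs3": 1 * NS_PER_SEC,
    "fat": 2 * NS_PER_SEC,
    "ext4": 1,
}


def timestamps_match(src_mtime_ns, dst_mtime_ns, src_fs, dst_fs):
    """Compare two mtimes at the coarser resolution of the two filesystems."""
    res = max(RESOLUTION_NS[src_fs], RESOLUTION_NS[dst_fs])
    # Truncate both values to the coarser resolution before comparing.
    return src_mtime_ns // res == dst_mtime_ns // res


def needs_mtime_update(src_mtime_ns, dst_mtime_ns, src_fs, dst_fs):
    """Only set the destination mtime when the times genuinely differ."""
    return not timestamps_match(src_mtime_ns, dst_mtime_ns, src_fs, dst_fs)
```

With this approach a source file carrying a sub-second component would not trigger a metadata write on a 1s-resolution destination, which is the behaviour the discussion says plain `rsync -t` lacked here.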
ianwmirror runs are failing on mirror01.ca-ymq-1.vexxhost.opendev.org "src file does not exist, use "force=yes" if you really want to create the link: /afs/openstack.org/mirror/logs" ... i'm guessing this is the same "directory gone" issue02:42
ianwyeah, it's the same odd mix of missing stuff fungi reported; i'll take the same approach and reboot it02:44
auristorhttps://bugzilla.redhat.com/show_bug.cgi?id=167277902:44
openstackbugzilla.redhat.com bug 1672779 in rsync "Rsync bug resets modification time of every destination file that has not changed" [Medium,Closed: errata] - Assigned to mruprich02:44
*** Meiyan has quit IRC02:45
*** Meiyan has joined #opendev02:46
ianwauristor: yeah, i saw that, that's asking for a backport of that change02:46
auristorthe fix is from 200102:47
ianwi have also added the credentials i missed for https://review.opendev.org/#/c/728739/ to hopefully fix the bridge run02:48
auristorthe overlayfs problem is that its reporting nanosecond timestamp support even though the underlying filesystem might be 1s or 2s granularity.  however, the fix there should be in overlayfs to ignore the setting of the timestamp when the underlying filesystem doesn't support the required resolution.02:50
ianw#status log rebooted  mirror01.ca-ymq-1.vexxhost.opendev.org for afs connection issues02:53
openstackstatusianw: finished logging02:53
ianwstatic, backup, etc seem to have timed out ..02:56
ianw2020-06-15 23:09:27,070 DEBUG zuul.AnsibleJob.output: [e: 13020d2bc71446c89ab132b3449dccdf] [build: da25d932c7914344af04e1b55a1a5488] Ansible output: b"TASK [Make sure a manaul maint isn't going on path=/home/zuul/DISABLE-ANSIBLE, state=absent, timeout=3600, sleep=10] ***"03:02
ianw2020-06-15 23:39:02,272 WARNING zuul.AnsibleJob: [e: 13020d2bc71446c89ab132b3449dccdf] [build: da25d932c7914344af04e1b55a1a5488] Ansible timeout exceeded: 1781.186414957046503:02
ianwok, so corvus' "Make disable-ansible fancier" seems to address the issue of this file being in place and people who are out of sync, like me :) not knowing what's going on03:07
ianwJun 16 10:42:41 <corvus>        fungi, clarkb: we've disabled ansible, so everything should timeout.  we'll let it continue doing that overnight, then run stuff manually tomorrow03:12
*** ysandeep|away is now known as ysandeep04:17
*** ykarel|away is now known as ykarel04:28
openstackgerritMerged zuul/zuul-jobs master: Fix namespace speicification in collect-kubernetes-logs role  https://review.opendev.org/73575505:00
*** DSpider has joined #opendev05:21
*** hashar has joined #opendev06:02
openstackgerritIan Wienand proposed opendev/system-config master: openafs-client: Use PPA for Xenial ARM64  https://review.opendev.org/73505506:10
openstackgerritIan Wienand proposed opendev/system-config master: Acutally run system-config arm64 test on an arm64 node  https://review.opendev.org/73528106:10
openstackgerritIan Wienand proposed opendev/system-config master: mirror-update: mirror Fedora 32  https://review.opendev.org/73577306:16
*** hashar has quit IRC06:21
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Remove the -plain job variants  https://review.opendev.org/73577406:23
openstackgerritIan Wienand proposed openstack/project-config master: Turn -plain nodes down to min-ready 0  https://review.opendev.org/73577706:30
openstackgerritIan Wienand proposed openstack/project-config master: Remove plain images  https://review.opendev.org/73577806:30
openstackgerritMerged openstack/project-config master: Retire Tricircle projects: finish infra todo  https://review.opendev.org/72890206:42
*** hashar has joined #opendev06:48
openstackgerritOpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml  https://review.opendev.org/73578206:48
*** rpittau|afk is now known as rpittau06:57
*** sgw1 has quit IRC07:01
openstackgerritMerged zuul/zuul-jobs master: Return upload_results in test-upload-logs-swift role  https://review.opendev.org/73550307:11
*** tosky has joined #opendev07:31
*** Meiyan has quit IRC07:40
*** Meiyan has joined #opendev07:41
*** moppy has quit IRC08:01
*** moppy has joined #opendev08:01
*** ykarel is now known as ykarel|lunch08:08
*** lpetrut has joined #opendev08:16
*** hashar_ has joined #opendev08:26
*** hashar has quit IRC08:27
openstackgerritSlawek Kaplonski proposed openstack/project-config master: Add noop-jobs for neutron-fwaas project  https://review.opendev.org/73580708:38
*** hashar_ is now known as hashar08:43
*** priteau has joined #opendev08:45
*** ysandeep is now known as ysandeep|lunch08:46
openstackgerritSlawek Kaplonski proposed openstack/project-config master: Rename neutron-fwaas and neutron-fwaas-dashboard to x/ namespace  https://review.opendev.org/73581208:53
*** ykarel|lunch is now known as ykarel08:57
openstackgerritSlawek Kaplonski proposed openstack/project-config master: Add noop-jobs for neutron-fwaas and neutron-fwaas-dashboard projects  https://review.opendev.org/73580709:08
AJaegerinfra-root, please review https://review.opendev.org/735832 for infra-manual to better document project removal.09:29
jrosserAJaeger: can you give me any advice on the openstack-tox-docs failure here https://review.opendev.org/#/c/735805/09:35
jrosseri must be missing something but can't see an obvious problem09:35
AJaegerjrosser: I hope it's fixed by 735801  which just merged09:36
AJaegerjrosser: let me check the logs to see whether that is the same problem09:36
AJaegerjrosser: yes, that's the problem that 735801 should fix.09:37
jrosserAJaeger: excellent thanks09:37
openstackgerritSlawek Kaplonski proposed openstack/project-config master: Retire neutron-fwaas and neutron-fwaas-dashboard projects  https://review.opendev.org/73581209:47
openstackgerritMerged zuul/zuul-jobs master: Terraform roles and jobs.  https://review.opendev.org/73367509:48
openstackgerritSlawek Kaplonski proposed openstack/project-config master: Add noop-jobs for neutron-fwaas and neutron-fwaas-dashboard projects  https://review.opendev.org/73580709:51
*** diablo_rojo has quit IRC09:54
openstackgerritSlawek Kaplonski proposed openstack/project-config master: Add noop-jobs for neutron-fwaas and neutron-fwaas-dashboard projects  https://review.opendev.org/73580709:55
openstackgerritSlawek Kaplonski proposed openstack/project-config master: Readd publish-to-pypi for neutron-fwaas and dashboard  https://review.opendev.org/73585010:00
*** ysandeep|lunch is now known as ysandeep10:05
*** hashar_ has joined #opendev10:10
*** hashar has quit IRC10:11
*** Meiyan has quit IRC10:11
openstackgerritSlawek Kaplonski proposed openstack/project-config master: Add noop-jobs for neutron-fwaas and neutron-fwaas-dashboard projects  https://review.opendev.org/73580710:17
openstackgerritSlawek Kaplonski proposed openstack/project-config master: Readd publish-to-pypi for neutron-fwaas and dashboard  https://review.opendev.org/73585010:17
openstackgerritSlawek Kaplonski proposed openstack/project-config master: Readd publish-to-pypi for neutron-fwaas and dashboard  https://review.opendev.org/73585010:19
*** hashar__ has joined #opendev10:20
*** hashar__ is now known as hashar10:24
*** hashar_ has quit IRC10:24
*** lpetrut has quit IRC10:33
*** rpittau is now known as rpittau|bbl10:34
*** calcmandan has quit IRC10:36
*** calcmandan has joined #opendev10:39
*** sshnaidm is now known as sshnaidm|afk10:45
*** hashar has quit IRC10:50
*** lpetrut has joined #opendev11:14
*** sshnaidm|afk is now known as sshnaidm11:40
openstackgerritMerged openstack/diskimage-builder master: Add .eggs to gitignore  https://review.opendev.org/73446911:43
*** priteau has quit IRC11:47
openstackgerritEmilien Macchi proposed openstack/project-config master: paunch: don't run publish-to-pypi template  https://review.opendev.org/73588912:13
openstackgerritEmilien Macchi proposed openstack/project-config master: paunch: don't run publish-to-pypi template  https://review.opendev.org/73588912:26
openstackgerritEmilien Macchi proposed openstack/project-config master: Revert "Deprecate Paunch"  https://review.opendev.org/73589312:26
*** rpittau|bbl is now known as rpittau12:28
*** tkajinam has quit IRC12:31
openstackgerritAndreas Jaeger proposed openstack/project-config master: paunch: don't run publish-to-pypi template  https://review.opendev.org/73588912:43
*** rchurch has quit IRC12:49
*** rchurch has joined #opendev12:51
*** ysandeep is now known as ysandeep|afk12:51
mordredAJaeger: the "build-python-release" job seems sad with the new image12:56
*** sgw1 has joined #opendev12:56
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Add ensure-pip to build-python-release  https://review.opendev.org/73590412:57
AJaegermordred: ;/12:58
AJaegermordred: we had a few interesting cases that needed fixing ;/12:58
AJaegermordred: LGTM, thanks12:59
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Add ensure-pip to build-sphinx-docs  https://review.opendev.org/73590813:10
*** mtreinish has quit IRC13:13
*** mtreinish has joined #opendev13:14
openstackgerritMerged opendev/system-config master: rsync-mirrors: drop rsync -t and make flags consistent  https://review.opendev.org/73575313:15
openstackgerritMerged openstack/project-config master: paunch: don't run publish-to-pypi template  https://review.opendev.org/73588913:15
openstackgerritEmilien Macchi proposed openstack/project-config master: Revert "Deprecate Paunch"  https://review.opendev.org/73589313:17
openstackgerritEmilien Macchi proposed openstack/project-config master: Revert "Deprecate Paunch"  https://review.opendev.org/73589313:18
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Add ensure-pip and ensure-virtualenv to build-sphinx-docs  https://review.opendev.org/73590813:18
openstackgerritMerged zuul/zuul-jobs master: Add ensure-pip to build-python-release  https://review.opendev.org/73590413:20
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Add ensure-pip and ensure-virtualenv to build-sphinx-docs  https://review.opendev.org/73590813:21
*** ysandeep|afk is now known as ysandeep13:27
*** mtreinish has quit IRC13:36
*** mtreinish has joined #opendev13:38
*** mtreinish has quit IRC13:43
openstackgerritTristan Cacqueray proposed zuul/zuul-jobs master: DNM: test ensure-sphinx role  https://review.opendev.org/73591913:49
*** mtreinish has joined #opendev13:49
*** tkajinam has joined #opendev13:57
*** mlavalle has joined #opendev13:58
*** roman_g has joined #opendev14:06
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Build sphinx with python3 instead  https://review.opendev.org/73592314:12
*** hashar has joined #opendev14:12
*** jbryce_ has joined #opendev14:13
*** mnaser has quit IRC14:14
*** zbr_ has joined #opendev14:15
*** auristor has quit IRC14:15
*** jbryce has quit IRC14:16
*** zbr has quit IRC14:16
*** jbryce_ is now known as jbryce14:16
*** zbr_ is now known as zbr14:16
*** mnaser has joined #opendev14:17
*** hrw has quit IRC14:18
*** hrw has joined #opendev14:18
*** auristor has joined #opendev14:23
openstackgerritJeremy Stanley proposed zuul/zuul-jobs master: Record artifact checksums and signatures to stdout  https://review.opendev.org/73592914:40
openstackgerritJeremy Stanley proposed zuul/zuul-jobs master: Simplify twine invocation for PyPI uploads  https://review.opendev.org/73593214:50
clarkbgitea is down to a single 1.12.0 milestone bug15:02
clarkband it is the bug that discussion says is probably not a bug15:02
AJaegerclarkb: that's what I would try as well to get a release out of the door ;)15:04
clarkbha15:05
clarkbin this case I've read the comments and I think they are right? it's about a gitea push hook not running on new project create. The reason for that is the project creation hook runs and a push hook is a different event15:05
AJaegeryeah, looks different15:06
openstackgerritMerged zuul/zuul-jobs master: Add ensure-pip and ensure-virtualenv to build-sphinx-docs  https://review.opendev.org/73590815:09
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Build sphinx with python3 instead  https://review.opendev.org/73592315:09
clarkbhttps://github.com/go-gitea/gitea/issues/11534 is the bug fwiw15:10
mordredclarkb: I agree, that does not sound like a bug15:18
*** ykarel is now known as ykarel|away15:23
AJaegerinfra-root, please review https://review.opendev.org/735832 for infra-manual to better document project removal.15:23
clarkbinfra-root re Zuul restart is the plan there to keep zuul ansible disabled, stop zuul, run playbook for zuul and zk manually then start zuul?15:40
clarkbI'm making tea and breakfast but will be around to help in just a few minutes15:41
fungithat's what last night's plan sounded like, at least15:41
clarkbAJaeger: is openstack-tox-docs expected to run on eg cinder master now?15:43
mordredclarkb: yes, I believe so (re restart plan)15:43
AJaegerclarkb: since ages15:43
clarkbAJaeger: ok was following up on https://zuul.opendev.org/t/openstack/build/28e2129d52564def8c473b2545f32b1f which failed ~7 hours ago. I'll let them know to recheck15:44
AJaegerclarkb: the switch to those jobs was done during stein cycle across all projects15:44
AJaegerclarkb: ah, yes, that is fixed15:44
AJaegerclarkb: I thought you were talking about 73593715:44
clarkbAJaeger: sorry I mean is it expected to be successful after the pip and virtualenv fallout15:44
clarkbsounds like yes to both things :)15:44
AJaegerclarkb: got it ;)15:44
AJaegerclarkb: and yes to both15:44
AJaegerclarkb: I was talking about 735923  and 735937, reviews welcome ;)15:45
*** lpetrut has quit IRC15:49
*** ysandeep is now known as ysandeep|brb15:51
*** rpittau is now known as rpittau|afk16:12
*** ysandeep|brb is now known as ysandeep16:23
corvusclarkb, fungi, mordred: i'm around and ready to do the restart16:24
fungii too am around and basically free (just polishing off the last of my lunch)16:24
clarkbya I'm sipping tea and reading too much about python packaging :) happy for the distraction16:25
*** diablo_rojo has joined #opendev16:27
* mordred is also here16:27
clarkbI notified the openstack release team earlier today and they said they would hold off on releases too16:28
corvuscool16:28
corvusjust checking on the load -- we're busy, but not too backlogged16:28
corvusthere is a release-post job16:28
fungiand i just got the last of the release failure fallout from the afs outage cleaned up16:29
fungi(had to retrieve some files from pypi and manually recreate pgp sigs)16:29
corvusand it's done16:29
*** ysandeep is now known as ysandeep|away16:30
corvusmordred: is there a playbook to update the git repos on bridge?16:30
corvusmaybe we just manually update p-c and s-c and that's good enough?16:31
mordredcorvus: yes, I believe that16:32
fungidoes bridge even care about p-c? do we copy that onto the servers from bridge or fetch it on them?16:32
mordred(there is not a playbook, but just manually updating should be fine)16:32
corvusfungi: no idea, better safe than sorry :)16:33
mordredwe clone project-config ourselves16:33
mordredin playbooks/roles/sync-project-config/tasks/main.yaml16:33
corvusi ran: "git pull https://opendev.org/opendev/system-config"16:33
fungicorvus: wfm16:33
corvushead looks right16:33
mordredinto /opt/project-config16:33
mordredso - system-config is all we need - but you can also pull project-config in /home/zuul to be safe :)16:34
corvusi did that too :)16:34
fungioh, right, at one point we were pushing zuul refs for p-c onto bridge and it was in turn pushing those to servers, i guess16:34
mordredyeah - and - we can actually go back to that now ...16:35
mordredwe stopped because we didn't have the serial pipeline manager16:35
mordredand things would get enqueued out of order16:35
mordredbut - this also seems fine16:35
fungimordred: well, i think we still have that issue because deploy and periodic pipelines can race16:36
corvusstatus notice Zuul is being restarted for an urgent configuration change and may be offline for 15-30 minutes.  Patches uploaded or approved during that time will need to be rechecked.16:36
corvushow's that look?16:37
mordredfungi: oh right16:37
mordredcorvus: ++16:37
fungicorvus: lgtm16:37
corvus#status notice Zuul is being restarted for an urgent configuration change and may be offline for 15-30 minutes.  Patches uploaded or approved during that time will need to be rechecked.16:37
openstackstatuscorvus: sending notice16:37
-openstackstatus- NOTICE: Zuul is being restarted for an urgent configuration change and may be offline for 15-30 minutes. Patches uploaded or approved during that time will need to be rechecked.16:37
corvusi have saved queues; will stop zuul now16:37
mordredcorvus: are you screening?16:38
corvusno, maybe i should going forward16:38
corvusi've started a root screen on bridge16:39
corvusthe zuul stop is still running in an unscreened window, i'll let you know when it's done16:39
corvusbut next we should stop nodepool16:40
*** sgw1 has quit IRC16:40
openstackstatuscorvus: finished sending notice16:40
corvusanyone understand that failure?16:42
mordredcorvus: uhm16:42
corvusoh, is it because of containers?16:43
mordredcorvus: oh - I think it's trying to stop nodepool launcher16:43
mordredbut we don't have that anymore?16:43
corvuswe don't have nodepool-launchers?16:43
mordredwe don't have service: nodepool-launcher16:43
corvusok, yeah, so the container stuff16:43
fungiwe talked about a service wrapper around docker-compose, but that doesn't exist (yet)16:43
corvusi think we need to find a better way to keep these ad-hoc playbooks current :/16:43
mordredI think we updated the zuul one and not the nodepool one16:44
corvusare all of our launchers in containers?  are all of our builders in containers?16:44
mordredlooking16:45
mordredlaunchers are all in containers16:45
clarkbnb03 is not in container16:45
mordredbuilders are  a mix - nb03 is not container nodepool-builder_opendev is container16:45
clarkbnb01 and nb02 are in containers16:45
corvusis that "include_role" "tasks_from: stop" thing going to work for the nodepool containers?16:46
mordred(nodepool-builder_opendev is a group containing the list of container builder hosts)16:46
mordredcorvus: no. but we should make that work16:46
mordredcorvus: I will make a patch to update the nodepool roles to support that pattern16:46
corvusokay, so what i'm getting is the best way to make sure nodepool is stopped is to log into all the machines and make it be stopped?16:46
clarkbwe can probably do a simple ansible command?16:47
mordredcorvus: we can make a quick local playbook - or we can log in to them all16:47
mordredwe want to do docker-compose down on nodepool-builder_opendev and nodepool-launcher groups - and service stop on nb0316:48
corvusmordred: okay, want to make that playbook then?16:50
corvusmordred: you can drive the screen session16:50
mordredhttp://paste.openstack.org/show/794825/16:51
mordredcorvus: k. driving16:51
mordredcorvus: you want me to put the nodepool stop into the zuul_stop playbook for now then?16:51
corvusnope16:51
corvushow about that one :)16:51
corvusif it's just what's in paste, we can just cat>16:52
mordredhow's that16:54
corvusok let's try it16:54
corvusthe nb03 not existing is concerning?16:54
mordredyeah ...16:54
corvus.openstack?16:55
corvusyeah,  that's right16:55
mordredmaybe it's disabled16:55
corvusok, i guess i'll log in manually16:55
corvusit's been disabled a long time16:56
mordredyes - it is disabled16:56
mordred    - nb03.openstack.org # ianw 2020-05-20 hand edits applied to dib to build focal on xenial16:56
openstackgerritClark Boylan proposed opendev/system-config master: Run restart playbooks to test they work  https://review.opendev.org/73596916:56
corvusand it's about to not be able to come back up16:56
corvusbecause its config is going to be wrong16:56
corvusi guess we can talk about it at the meeting16:56
mordredyou know ...16:56
mordredI think those edits have all landed to dib16:56
corvusso i'll just stop it for now, and not bring it back up16:56
mordredkk16:56
clarkbcorvus: I think that is fine16:56
clarkbthe existing images won't go away16:56
corvusok it is stopped16:57
clarkbalso do these changes force tls for everything? or can it still talk via not tls?16:57
corvusi think that means all of nodepool is stopped; zuul is almost finished stopping16:57
corvusclarkb: force tls16:57
clarkbfwiw I expect that https://review.opendev.org/735969 will fail due to the problems we discovered. We can squash fixes into that or I'll rebase on the fixes. But i think that will help us keep those playbooks running16:58
corvusnow i think we can run the zk playbook16:58
mordredcorvus: ++16:58
mordredclarkb: I think that patch is a great idea - I'll update it to fix the start/stop stuff once this is done16:59
fungithe only thing which still has not-tls left over as far as i could find are the firewall rules (fixed when 735740 merges) and the nodepool conffiles in project-config (which ansible overwrites)16:59
openstackgerritMonty Taylor proposed opendev/system-config master: Run restart playbooks to test they work  https://review.opendev.org/73596917:03
mordredclarkb: ^^ updated17:03
clarkbcorvus: that looked happy17:03
corvusagreed17:03
mordred\o/17:03
corvusnow we need to restart all the zk docker containers17:03
mordredyup17:03
mordredcorvus: it's /etc/zookeeper-compose/17:04
mordredcorvus: those look good17:05
corvuskk, running the stop17:05
corvus2020-06-16 17:05:48,010 [myid:1] - INFO  [/23.253.236.126:3888:UnifiedServerSocket$UnifiedSocket@273] - Accepted TLS connection from zk03.openstack.org/23.253.90.246:37594 - TLSv1.2 - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA25617:06
clarkbneat17:06
mordredcorvus: do we expect: 2020-06-16 17:05:48,238 [myid:3] - ERROR [QuorumPeer[myid=3](plain=disabled)(secure=[0:0:0:0:0:0:0:0]:2281):QuorumPeer@1619] - Error writing next dynamic config file to disk:17:06
clarkbmordred: yes I think that is expected17:06
corvusyep17:06
mordredI thought so - but thought I'd check17:06
clarkbwe thought it may have been causing the problems we had when we switched to containers but turns out it was a bug in zk itself that was worked around by using IP addrs in config instead of dns names17:07
mordred++17:07
corvusall 3 look happy17:08
clarkbI've just confirmed that port 2181 isn't listening on zk01 but 2281 is (confirms the assertion earlier that non tls is disabled)17:08
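[editor's note] A check like the one clarkb describes could also be done remotely with a plain TCP connect. The sketch below is an assumption about how one might script it, not what was actually run; `port_is_listening` is a hypothetical helper, and a successful connect only shows that something accepts connections on the port, not that TLS is correctly configured.

```python
import socket


def port_is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Given the state described above, calling it against a ZooKeeper server would be expected to return False for port 2181 and True for port 2281.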
corvuslet's bring nodepool back up?17:08
clarkbcorvus: ++17:08
fungiyep, looks right17:08
mordredcorvus: I added nodepool start tasks in https://review.opendev.org/73596917:08
corvusactually17:09
corvusoh we haven't run the nodepool playbook yet17:09
mordredthat's important17:09
corvuslet's do that now17:09
mnaser(something i may get around contributing to opendev is a custom 503 page)17:09
corvuslook good?17:09
fungihrm, yeah nodepool configs still showed the old port17:09
mordredcorvus: while you're doing that - I can make a nodepool_start.yaml playbook for you17:09
corvusmordred: k thx; running now17:10
clarkbcorvus: yes lgtm17:10
fungiat least nl01 still has 2181 in nodepool.yaml17:10
corvusfungi: yep, that should get fixed by the current playbook run17:10
fungithat's what i figured17:10
mordredcorvus: playbooks/nodepool_start.yaml should be updated17:11
corvuswoot17:11
fungimnaser: also i've been thinking if statusbot wrote status messages to a published file on eavesdrop (rather than just to the wiki and twitter) we could probably easily transclude the most recent few into the main opendev.org page and even color-code or filter them by severity17:11
fungimnaser: and then custom 503 pages or similar could include a message linking there17:13
corvusfungi: /var/lib/statusbot/www/alert.json17:13
fungiaha, right, so half of that is already done ;)17:13
mnaseris statusbot hosted on the same machine as zuul.opendev.org apache frontend?17:13
clarkbmnaser: no17:14
mnaseraw ok17:14
clarkbstatus bot is on eavesdrop.openstack.org17:14
mnasercould have been nice, if that file is created, we could have given 503 if it exists17:14
corvusi manually killed ze04; i suspect it was dead on reboot due to the gearman cert issue17:16
corvusall of zuul is stopped now17:16
fungimnaser: well, it could still be included with some js, if we served it from a vhost on eavesdrop.o.o17:16
fungi/etc/nodepool/nodepool.yaml on nl01 looks correct now17:16
mnaserfungi: yeah -- but i was thinking if file exists locally => serve 503 else serve usual 503 bc backend down -- so we don't end up showing a "in maintenance" page if the reality of things is "zuul-web is down"17:17
mnaserbut ill leave that discussion for after the restart :)17:17
fungicorvus: yeah, that sounds likely. i rebooted it late last night just so we wouldn't forget and accidentally bring it back up with broken afs access17:17
corvuscool, i think we can restart nodepool now17:17
mordredcorvus: ++17:17
clarkb++17:18
corvus2020-06-16 17:18:07,468 [myid:2] - INFO  [nioEventLoopGroup-4-3:X509AuthenticationProvider@172] - Authenticated Id 'CN=nl01.openstack.org,OU=Org,O=Company Name,L=Oakland,ST=California,C=US' for Scheme 'x509'17:18
mordred\o./17:18
mordredthat's so exciting17:18
clarkbhrm didn't realize we still have nb04 up (shouldn't be a problem for this)17:18
corvusnl01 appears active and dealing with requests17:18
fungimnaser: i figured we could do something simpler, like a bit of js on the opendev.org main page to include the last few status updates, and then our 503 page for zuul could suggest people look at opendev.org for possible maintenance in progress17:19
mnaserfungi: ah yes, that works too -- or simple js to check if alert.json contains something17:19
mnaserif we serve that off eavesdrop17:19
mordredor our 503 page could have javascript and fetch from the status17:19
mordredyeah17:19
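[editor's note] The idea mnaser and fungi discuss, showing a maintenance message only when statusbot has an active alert and otherwise the usual "backend down" 503, can be sketched as below. The structure of `alert.json` is not shown in the log, so this sketch assumes only that an empty or missing file means "no alert"; `choose_503_message` and its return values are hypothetical.

```python
import json
from pathlib import Path


def choose_503_message(alert_path):
    """Pick which 503 page to serve based on the statusbot alert file.

    Assumption: any non-empty JSON value in the file means an alert is
    active; a missing, unreadable, empty or invalid file means none is.
    """
    try:
        data = json.loads(Path(alert_path).read_text())
    except (OSError, ValueError):
        # Missing file, unreadable file, or content that is not valid JSON.
        return "backend-down"
    return "maintenance" if data else "backend-down"
```

Treating every failure as "backend-down" keeps the page from claiming a maintenance window when the real problem is that zuul-web is down, which is the concern raised above.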
corvushrm, a connection reset just happened17:19
mordreduhoh17:19
corvusit recovered, but it's curious17:20
mordredwe should keep our eyes on that17:20
fungizk connection reset?17:20
mordredcorvus: is tobiash already running tls zk at bmw?17:20
corvusyeah, on nl0117:20
corvusi think?17:20
corvusand another one17:20
tobiashweird, we saw connection resets on staging as well17:21
tobiashNot yet running it in prod17:21
fungibut not in production?17:21
clarkbnl03 and nl04?17:21
corvusi think just nl01 so far?17:22
corvusnope, others too17:22
tobiashJust remembered the reason for this was too many builds in one path in zk in our case17:22
clarkb2020-06-16 17:19:25,229 WARNING kazoo.client: Connection dropped: socket connection error: The operation did not complete (read) (_ssl.c:2607) from nl0317:23
tobiashWe had an image build failure loop there17:23
corvusperhaps the tls adapters for either kazoo or zookeeper can't handle large data?17:24
clarkbthat seems to be happening pretty regularly in nl03's log17:25
fungithat would be unfortunate for us17:25
corvusclarkb: nl01 too17:25
clarkbhttps://github.com/python-zk/kazoo/issues/58717:25
clarkbunfortunately not much additional info there, but looks like upstream is aware of the problem and would like help debugging it17:26
corvusthis seems like it may not be viable; and i don't see an obvious immediate emergency fix17:26
mordredcorvus: I agree17:26
corvusi think we will need to revert those patches out of system-config and re-run the process so far17:27
clarkbya I think that should do it. We'll end up with potentially stale CA info but that will get fixed when we next try this17:27
fungitoo bad, but i concur17:27
mordredcorvus: yah - and the gearman cert update will still be in place, so we should be able to start17:27
clarkband reverting the change should reset the zk and nodepool configs to talk to 218117:27
mordredyah17:27
corvusyep.  i'll look into whether we can run the zk cluster in dual mode17:28
corvusthat may help us with testing, and maybe we can take over that gh issue17:28
clarkbooh ya that may help with reproduction17:28
clarkb++17:28
mordred++17:28
corvusjust revert 29825ac18b58145f007f64b2998357445b8fdd91 ?17:29
clarkbyes I think so17:30
mordredcorvus: yes, I think that's right17:30
clarkbthat'll update the zk configs across the board to go back to 2181 without tls17:30
clarkbtobiash: your docker images are different than ours right? I wonder if it could be a openssl or kazoo version thing17:32
clarkbtobiash: iirc we are both running python3.6 though17:32
tobiashyes, ours are bionic based17:33
clarkbthough you probably use the same images in your staging as in production?17:33
tobiashyes17:33
mordredclarkb: kazoo should be coming in via pip - so I imagine we'd have the same kazoo. could be different openssl17:35
mordredtobiash: are you using the upstream zookeeper images? or building your own?17:35
openstackgerritGuillaume Chauvel proposed zuul/zuul-jobs master: prepare-workspace: Set root dir to zuul build uuid  https://review.opendev.org/73598017:35
mordredcause in the list of things we should check - we've got zookeeper server, zookeeper client/kazoo and OS things like openssl17:36
corvusnp seems happy, i'll run service-zuul while it idles and we confirm it's stable17:43
*** sgw has quit IRC17:47
mordredcorvus: watching that ansible is not super exciting17:50
corvusbooooorrrriiinngggg17:51
corvusshouldn't have turned cowsay off17:51
clarkbMoo17:52
*** sgw has joined #opendev17:53
mordredcorvus: looks done17:54
fungiand nodepool configs are back to before17:55
clarkbfungi: ya I think we're all the way up to zuul has been updated (but not started)17:55
corvusi'll run the zuul start playbook now17:57
corvusso far so good17:59
clarkblooks like jobs have queued up18:01
clarkbhaven't seen any starting yet18:02
clarkband now it looks like a bunch have started18:03
corvusstatus notice Zuul is back online; changes uploaded or approved between 16:40 and 18:00 will need to be rechecked.18:03
corvushow's that look?18:03
clarkbthat looks good to me18:03
mordredcorvus: ++18:04
mordredcorvus: and .... boo that we have a weird zk scale/connection/tls issue to investigate18:05
corvus#status notice Zuul is back online; changes uploaded or approved between 16:40 and 18:00 will need to be rechecked.18:05
openstackstatuscorvus: sending notice18:05
-openstackstatus- NOTICE: Zuul is back online; changes uploaded or approved between 16:40 and 18:00 will need to be rechecked.18:06
corvusyep.  on the plus side, the deployment (at least up through nodepool) went flawlessly because of all that gate testing :)18:06
mordredcorvus: \o/18:06
* mordred now sandwiches18:06
auristorare the current releases of mirror.centos, mirror.epel, mirror.yum-puppetlabs, and mirror.opensuse from rsyncs including -t or excluding -t?   I ask because they appear to be transferring the contents of the entire volume.18:07
openstackstatuscorvus: finished sending notice18:09
clarkbauristor: while https://review.opendev.org/#/c/735753/ has merged I don't think we've applied it to the server yet as we were in a limbo state overnight (pacific time)18:09
clarkbauristor: we're just about to get past that limbo state (we need to merge another revert I think) then that server should get updated and we can see if things improve18:10
*** mugsie has quit IRC18:11
clarkbcorvus: ^ do we need to push and approve/merge that revert for the zk update?18:13
corvusclarkb: yes, will do in just a sec18:14
corvusi've reset the repo on bridge to current HEAD18:14
*** mugsie has joined #opendev18:15
openstackgerritJames E. Blair proposed opendev/system-config master: Revert "Add Zookeeper TLS support"  https://review.opendev.org/73599018:15
clarkb+2 thanks18:16
corvusclarkb, fungi, mordred: ^ we should merge that asap, then we can re-enable ansible18:16
fungiapproved18:22
openstackgerritMerged openstack/project-config master: Revert "Deprecate Paunch"  https://review.opendev.org/73589318:24
*** tosky_ has joined #opendev18:26
*** tosky has quit IRC18:28
*** iurygregory has quit IRC18:33
*** tosky_ is now known as tosky18:38
fricklersomething looks very broken now https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_53b/735536/3/gate/tempest-full/53be1fa/job-output.txt19:15
*** hashar has quit IRC19:15
mordredfrickler: I agree19:17
clarkbdid ansible update as part of our zuul changes?19:20
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Make sure pip is installed for python releases  https://review.opendev.org/73600119:21
fungiparsing error around https://opendev.org/openstack/devstack-gate/src/branch/master/roles/test-matrix/tasks/main.yaml#L2419:22
fungicould it be the trailing +?19:22
clarkbfungi: the error was: template error while templating string: no filter named 'match'19:22
fungihrm, no, last touched two years ago19:22
clarkbI think it's mad about the match19:22
ianwfungi: isn't it "no filter named 'match'"19:23
clarkbnewer ansible does a thing where you can't | some things19:23
clarkbyou have to is them or something19:23
fungiahh, yeah, around line 2919:25
fungii guess we don't record the ansible version in our build inventory either19:26
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Make sure pip and wheel are installed for python releases  https://review.opendev.org/73600119:30
ianwfungi: isn't that 2020-06-16 18:44:04.254108 | Ansible Version: 2.9.519:36
fungioh, cool so we do log it at least19:38
fungiand, wow, i guess we're using very new ansible there as clarkb predicted19:38
mordred2.9.9 is latest ansible 2.919:39
mordredso 2.9.5 is from feb 13 - I'd expect no behavior change19:40
fungiyeah, but new as in 2.9 and not our platform default (still 2.7 right?)19:40
ianwhttps://zuul.openstack.org/builds?job_name=tempest-full -- the ones that passed before were on 2.8.819:40
fungiahh, 2.819:40
fungiso something seems to have switched that job to use 2.019:40
fungi2.919:40
ianwbetween 2020-06-16 13:51:38 and 2020-06-16 19:16:2319:41
clarkbwe updated zuul19:43
clarkbwhich may have updated the default from 2.8 to 2.9?19:43
ianwhttps://review.opendev.org/#/c/736006/ switches it to "is" which i think is the fix19:45
clarkbianw: ya that seems right from memory19:46
ianwhttps://docs.ansible.com/ansible/latest/user_guide/playbooks_tests.html#test-syntax19:46
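The "| match" to "is match" change discussed above can be illustrated with a small scanner. This is a minimal sketch, not the content of 736006: the function name find_filter_style_tests, the list of test names and the sample line are all assumptions made for illustration. It only flags test names used in filter position and prints the equivalent "is" form.

```python
import re

# Jinja2 tests that older Ansible also accepted in filter position.
# Assumed, non-exhaustive list for illustration only.
TEST_NAMES = ("match", "search", "regex")

# Matches e.g. " | match(" and captures the test name. The trailing
# "(" keeps real filters such as regex_replace from matching.
_PATTERN = re.compile(r"\s*\|\s*(" + "|".join(TEST_NAMES) + r")\s*\(")


def find_filter_style_tests(text):
    """Return (line_number, suggested_line) for each filter-style test."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if _PATTERN.search(line):
            suggestion = _PATTERN.sub(
                lambda m: " is " + m.group(1) + "(", line
            )
            hits.append((number, suggestion))
    return hits


# Hypothetical task condition, not copied from devstack-gate.
sample = "when: branch | match('^stable/')\n"
for number, suggestion in find_filter_style_tests(sample):
    print(number, suggestion)
```

A scan like this only finds candidates; each hit would still need a human look before being changed.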
ianwi guess that makes it interesting that we might now have pip/venv and ansible issues all together19:47
fricklerthere's more warnings about 2.9 breaking things like https://zuul.opendev.org/t/openstack/build/806eae234ba646409c0a73833797d080/log/job-output.txt#486519:52
clarkbmordred: https://zuul.opendev.org/t/openstack/build/6728299a161b44d9ab25ed85b9c941bb/log/applytest/puppetapplytest30.final.out.FAILED any idea why we are getting those? that should be ansible talking to localhost right?19:52
ianwfrickler: hrm, nice catch ... i guess the pain of updating this is more useful than working around it19:54
ianwlooks like --become is drop in for --sudo, so that change switches that now too19:58
corvusthat --sudo thing isn't because of the zuul upgrade, right?  that's the internal devstack-gate ansible command19:58
*** iurygregory has joined #opendev19:59
ianwtrue, i wonder if we pin the ansible we install there19:59
clarkbianw: we do20:00
corvusare there any errors related to zuul's upgrade to ansible 2.9?20:01
ianwANSIBLE_VERSION=${ANSIBLE_VERSION:-2.7.14}20:01
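For readers less familiar with the shell idiom in that pin: ${VAR:-default} falls back to the default when the variable is unset or empty. A rough Python equivalent is sketched below; the helper name env_default is hypothetical and only the 2.7.14 default comes from the line above.

```python
import os


def env_default(name, default, environ=None):
    """Mimic shell ${NAME:-default}: fall back when unset or empty."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "")
    return value if value else default


# With nothing set in the (empty) environment, the pin applies.
print(env_default("ANSIBLE_VERSION", "2.7.14", environ={}))
# An explicit value overrides the pin.
print(env_default("ANSIBLE_VERSION", "2.7.14", environ={"ANSIBLE_VERSION": "2.9.5"}))
```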
clarkbcorvus: https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_53b/735536/3/gate/tempest-full/53be1fa/job-output.txt20:01
clarkbcorvus: that is the only error/failure I'm aware of so far20:01
corvusoookay, so it's part of https://review.opendev.org/73600620:02
ianwyeah, i can split that into two to put the --become thing after, if we like20:03
corvusi just saw a bunch of command line stuff, i missed the test_matrix bit20:03
corvusianw: meh, if it passes tests, i say we leave it alone :)20:03
ianw(it seems we'll eventually hit the opposite issue of 2.7 breaking and not being supported, and having to update that, at some point)20:04
corvuspresumably that should all work since it's emitting the warning20:04
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Build sphinx with python3 instead  https://review.opendev.org/73592320:05
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Make sure pip and wheel are installed for python releases  https://review.opendev.org/73600120:05
mordredclarkb: I am baffled as to why that isn't working20:09
mordredcorvus: which change is that for?20:10
mordredgah20:10
mordredclarkb:20:10
clarkbmordred: cool not just me then20:10
clarkbI rechecked it for that reason, going to see if it is more consistent20:10
mordredclarkb: I saw that issue crop up yesterday I think - so we might actually need to investigate something20:11
mordredclarkb: OR20:11
mordredthat might be motivation to move those out of site.pp and into their own job using the real stuff20:11
* frickler is done for the day, hoping things look better tomorrow20:12
clarkbfrickler: thanks! I'll try to catch up on devstack reviews after lunch20:12
clarkband now it is time for lunch20:13
openstackgerritAndreas Jaeger proposed openstack/project-config master: Replace build-sphinx-docs jobs  https://review.opendev.org/73601620:21
AJaegermordred: I suggest to be friendly to those projects using build-sphinx - even if none of them merged a change for over a year ;( ^20:22
openstackgerritMerged opendev/system-config master: Revert "Add Zookeeper TLS support"  https://review.opendev.org/73599020:24
mordredAJaeger: ++20:30
clarkbinfra-root ^ has merged, should we rm the DISABLE-ANSIBLE file?20:36
clarkbLooks like that revert change queued up all the things too20:36
*** roman_g has quit IRC20:36
mordredclarkb: then yeah - as long as the revert change is what's queued up20:36
clarkbhttps://review.opendev.org/#/c/735990/ is what i see as queued up20:37
mordredyup. same20:37
mordredlet's remove it20:37
ianwit'd be worth keeping an eye out; yesterday i noticed for example the mirror job failing due to our weird blank afs volumes on some hosts20:37
clarkband that is the revert20:37
corvusclarkb: ++ you doing it?20:37
clarkbcorvus: mordred I can do that now20:37
mordred++20:37
clarkbdone20:37
clarkbhrm the is vs | match thing is in devstack too?20:40
clarkbconsidering that is affecting the devstack uwsgi fixes we may want to set openstack tenant to ansible 2.8 default20:40
clarkbthat way we can land the uwsgi fixes then the ansible fixes20:41
ianwclarkb: no, i don't think so20:41
ianwi'm trying to catch up on where the other fixes are at20:42
clarkbianw: https://zuul.opendev.org/t/openstack/build/53be1fab81f643aab08e84fb51126a9c/console that seems to be a native zuul devstack job with the same failure20:42
clarkbianw: I think the code may have been copy pastad from d-g to devstack?20:42
ianwhuh, ok.  i think i'd be of the opinion we could force merge a fix of "|" to "is" if required20:43
clarkbya we could also do that20:43
ianwahh, perhaps this is only on old branches20:45
clarkbianw: fwiw I think the actual fixes for uwsgi seem to be fine20:45
clarkbianw: its just a matter of landing them in a bottom up fashion so that grenade is happy20:45
clarkbprod install ansible job completed successfully20:46
clarkbprod base is running now20:46
ianwclarkb: yeah, that's right, so test-matrix is gone on master.  on older branches, it's the 'test-matrix' role that comes from d-g20:48
clarkbaha20:48
ianwi.e. 736006 should fix what you posted above20:48
clarkbits actually pulling it from d-g?20:48
openstackgerritMonty Taylor proposed opendev/system-config master: Add stop and start playbooks for nodepool  https://review.opendev.org/73603120:48
mordredclarkb, corvus: ^^ there's the re-org we discussed during the maint20:49
openstackgerritSean McGinnis proposed openstack/project-config master: Ensure pip is installed for propose-update-constraints  https://review.opendev.org/73603220:49
corvusmordred: should there be a nodepool_launcher/start.yaml ?20:51
ianwclarkb: The error appears to be in <blah> opendev.org/openstack/devstack-gate/roles/test-matrix/tasks/main.yaml': line 24, column 3, but may ... so yeah20:51
mordredcorvus: there already is20:52
mordredcorvus: we almost even sort of maybe got this somewhat right :)20:53
clarkbianw: ya I'm trying to find where we run test-matrix role on the devstack side and failing but I expect you are right20:53
clarkbianw: should we maybe promote the d-g change to the gate?20:53
ianwclarkb: it actually passed the bits it changed, right?20:54
openstackgerritMonty Taylor proposed opendev/system-config master: Trigger zuul and nodepool on start/stop playbook changes  https://review.opendev.org/73603920:55
mordredclarkb, corvus: ^^ that said, we should do that too20:55
clarkbianw: thats a tricky question :/20:56
clarkbwe run a bunch of non legacy jobs against d-g :/20:57
ianwclarkb: and not actually "tempest-full"20:58
clarkbianw: seems like https://29c14fb02adb24d88a67-f81dd8fd4f9add75179602ffbcf81b7d.ssl.cf1.rackcdn.com/736006/2/check/legacy-tempest-neutron-full-stable/24b8db6/logs/devstacklog.txt indicates it failed on uwsgi as expected20:58
clarkbwhich implies it did get past the test-matrix thing?20:58
clarkbianw: but also it runs neutron's multinode job which is failing on a venv thing? I'm going to look at the venv thing now I guess. But suspect we should maybe force merge the d-g change then enqueue uwsgi fixes to the gate20:59
clarkbnevermind the multinode thing failed on uwsgi21:00
ianwlegacy-tempest-dsvm-neutron-full-centos-7 (3. attempt) ... that might be showing an issue?  if it's in a pre playbook21:01
ianw"msg": "No package matching 'python-pip' found available, installed or updated" on that ... urgh some other issue then21:03
clarkb#status log Rebooted logstash-worker02 and 13 as ansible base.yaml complained it could not reach them21:03
openstackstatusclarkb: finished logging21:03
clarkbif anyone is wondering why the base run is taking so long ^21:03
clarkbthose servers seemed to be out to lunch21:03
clarkb02 is back, still waiting on 1321:03
ianwi'm struggling to see any of the devstack-gate tests that have actually tested devstack-gate :/21:05
clarkbianw: the centos job is failing on no package-pip available21:08
ianwyeah, i'll have to look at that.  i guess epel is involved21:08
clarkbbut again no complaints with your change itself. I think we may be ok, but hard to know for sure21:08
clarkbI'm still somewhat inclined to merge it, then roll forward on the uwsgi fixes21:09
mordred++21:10
mordredI think that sounds right21:10
ianwclarkb: we could depends-on https://review.opendev.org/#/c/735536/ to it, and make sure it gets going, then merge it21:11
clarkbianw: that works, then we could also direct enqueue 735536 to the gate once the d-g change is in21:11
ianwi think you have to have a stable/train run to see it21:11
clarkblets do that21:11
clarkbwe also need train to merge before anything else as the uwsgi fixes need to be bottom up21:12
clarkbI like that plan21:12
clarkbare you updating that change or should I?21:12
clarkb(with the depends-on, I mean)21:12
ianwclarkb: umm, i can, just a sec21:12
ianwok, https://review.opendev.org/735536 updated with depends-on https://review.opendev.org/73600621:15
ianwwe can watch the tempest-full job and should see the d-g changes apply early in the run, then be confident it's safe to merge21:15
clarkbinfra-root the base job timed out (due to the ssh'ing issues)21:17
clarkbI expect a rerun of that now would be happier since I restarted those servers. But I wonder if we can make it timeout ssh attempts much more quickly?21:18
ianwgetting some breakfast, will come back to check on d-g stuff21:18
clarkbmordred: ^ do you know? I kinda think zuul must do something along those lines talking to zuul test nodes?21:18
clarkbianw: https://zuul.opendev.org/t/openstack/stream/cfd2de2f98ff4e62afd7e56d12428621?logfile=console.log that job21:31
clarkbit isn't quite started up yet but should have a console log soon21:31
clarkbianw: also service-bridge is running right now which should configure the dns stuff if you have a moment to check that when it is done21:32
clarkbianw: it succeeded and the cron job is installed21:37
clarkbdo we want to run it in the foreground early to ensure it works?21:37
clarkboverall things are looking good \o/21:38
clarkbianw: we have console log on that job now21:41
clarkbit looks like test matrix ran successfully21:46
clarkbianw: want to confirm?21:46
ianwclarkb: looking21:48
ianw2020-06-16 21:44:30.393760 | TASK [test-matrix : Append neutron to configs for stable/ocata+]21:50
ianw2020-06-16 21:44:31.186053 | controller | ok21:50
ianwthe become stuff should be coming in a tic21:51
ianwhttps://zuul.openstack.org/stream/7aba2eec60134836bc520c196fffcf34?logfile=console.log has used the --become flags too21:55
ianwclarkb: so i agree, the changed bits of 736006 have run successfully so if you agree i'll force merge it21:56
clarkbI'm double checking the --become now21:57
clarkbianw: have a timestamp for --become? the search function doesn't seem to work21:57
ianwclarkb: ~ 2020-06-16 21:54:45.083996 | primary | + /opt/stack/new/devstack-gate/devstack-vm-gate.sh:setup_ssh:L81:   /tmp/ansible/bin/ansible all --become -f 5 -i /home/zuul/workspace/inventory -m file -a 'path='\''/root/.ssh'\'' mode=0700 state=directory'22:01
*** Eighth_Doctor has quit IRC22:01
clarkboh its a different job that makes sense22:01
clarkbianw: yup I agree that is all good we should merge it now I think22:02
ianwok i'll do that now22:03
openstackgerritGuillaume Chauvel proposed zuul/zuul-jobs master: WIP: prepare-workspace: Set root dir to zuul build uuid  https://review.opendev.org/73598022:04
*** rchurch has quit IRC22:06
*** factor has joined #opendev22:07
*** rchurch has joined #opendev22:09
ianwthat's merged22:16
clarkbI think we can enqueue https://review.opendev.org/#/c/735536/ and https://review.opendev.org/#/c/735523/ to the gate now?22:17
*** Eighth_Doctor has joined #opendev22:17
clarkbI guess that second one doesn't have enough +2's yet technically (or approval)22:17
clarkbbut I think we can apply that too22:18
ianwyeah, ++22:22
ianwi just have to run to school and back, back in about 15 min22:23
clarkbk I'll enqueue the first one now22:23
*** icarusfactor has joined #opendev22:26
*** factor has quit IRC22:26
*** Eighth_Doctor has quit IRC22:27
mordredclarkb: WHY THE PUPPET JOB FAILED AGAIN ARGHHHHH22:29
clarkbmordred: well I've just finished with the devstack things, but I think I need a break for a minute but I can probably help with that next22:29
clarkbmordred: we probably want to confirm what it is sshing to? like should it be (is it) using local connection?22:29
mordredclarkb: I'm honestly not 100% sure - this is one of those things using infra-spec-helper22:30
mordredclarkb: I'm about at EOD and definitely don't have enough brain pellets for debugging this22:30
mordredclarkb: BUT22:30
mordredclarkb: what I think might be more useful is first thing in the morning when my trough of pellets is more full, I crank through a patch to split the puppets into their system-config-run jobs22:31
mordredoh - and one to be more specific with inventory file matchers22:32
clarkbmordred: are you a traeger now?22:32
mordredclarkb: yes22:32
*** iurygregory has quit IRC22:33
*** mugsie has quit IRC22:33
*** auristor has quit IRC22:33
*** jaicaa_ has quit IRC22:33
*** yoctozepto has quit IRC22:33
*** ttx has quit IRC22:33
*** cgoncalves has quit IRC22:33
*** amotoki has quit IRC22:33
*** bhagyashris has quit IRC22:33
*** dmsimard has quit IRC22:33
*** fdegir has quit IRC22:33
*** tbarron has quit IRC22:33
*** corvus has quit IRC22:33
*** ysandeep|away has quit IRC22:33
*** tristanC has quit IRC22:33
*** wrenchyfrenchy has quit IRC22:33
*** dirk has quit IRC22:33
*** johnsom has quit IRC22:33
*** rchurch has quit IRC22:33
*** moppy has quit IRC22:33
*** smcginnis has quit IRC22:33
*** hillpd has quit IRC22:33
*** wendallkaters has quit IRC22:33
*** SotK has quit IRC22:33
*** owalsh has quit IRC22:33
*** tobiash has quit IRC22:33
*** dpawlik6 has quit IRC22:33
*** mnaser has quit IRC22:33
*** mlavalle has quit IRC22:33
*** tkajinam has quit IRC22:33
*** shtepanie has quit IRC22:33
*** rajinir has quit IRC22:33
*** spotz has quit IRC22:33
*** sgw has quit IRC22:33
*** calcmandan has quit IRC22:33
*** seongsoocho has quit IRC22:33
*** vblando has quit IRC22:33
*** mnasiadka has quit IRC22:33
*** prometheanfire has quit IRC22:33
*** Dmitrii-Sh has quit IRC22:33
*** icarusfactor has quit IRC22:33
*** olaph has quit IRC22:33
*** avass has quit IRC22:33
*** melwitt has quit IRC22:33
*** cmurphy has quit IRC22:33
*** logan- has quit IRC22:33
*** paladox has quit IRC22:33
*** panda has quit IRC22:33
*** jhesketh has quit IRC22:33
*** AJaeger has quit IRC22:33
*** JayF has quit IRC22:33
*** ChanServ has quit IRC22:33
*** markmcclain has quit IRC22:33
*** persia has quit IRC22:33
*** odyssey4me has quit IRC22:33
*** mgagne has quit IRC22:33
*** hrw has quit IRC22:33
*** jbryce has quit IRC22:33
*** sshnaidm has quit IRC22:33
*** elod has quit IRC22:33
*** cloudnull has quit IRC22:33
*** ianw has quit IRC22:33
*** osmanlicilegi has quit IRC22:33
*** openstackgerrit has quit IRC22:33
*** frickler has quit IRC22:33
*** zbr has quit IRC22:33
*** DSpider has quit IRC22:33
*** Open10K8S has quit IRC22:33
*** mrunge has quit IRC22:33
*** aannuusshhkkaa has quit IRC22:33
*** donnyd has quit IRC22:33
*** jrosser has quit IRC22:33
*** noonedeadpunk has quit IRC22:33
*** rpittau|afk has quit IRC22:33
*** rm_work has quit IRC22:33
*** jroll has quit IRC22:33
*** mordred has quit IRC22:33
*** ykarel|away has quit IRC22:33
*** fungi has quit IRC22:33
*** tosky has quit IRC22:33
*** diablo_rojo has quit IRC22:33
*** mtreinish has quit IRC22:33
*** knikolla has quit IRC22:33
*** kevinz has quit IRC22:33
*** clarkb has quit IRC22:33
*** stephenfin has quit IRC22:33
*** andreykurilin has quit IRC22:33
*** avass has joined #opendev22:37
*** olaph has joined #opendev22:37
*** icarusfactor has joined #opendev22:37
*** johnsom has joined #opendev22:37
*** AJaeger has joined #opendev22:37
*** jhesketh has joined #opendev22:37
*** panda has joined #opendev22:37
*** paladox has joined #opendev22:37
*** fungi has joined #opendev22:37
*** ykarel|away has joined #opendev22:37
*** mordred has joined #opendev22:37
*** jroll has joined #opendev22:37
*** rm_work has joined #opendev22:37
*** rpittau|afk has joined #opendev22:37
*** noonedeadpunk has joined #opendev22:37
*** jrosser has joined #opendev22:37
*** aannuusshhkkaa has joined #opendev22:37
*** mrunge has joined #opendev22:37
*** donnyd has joined #opendev22:37
*** Open10K8S has joined #opendev22:37
*** zbr has joined #opendev22:37
*** Dmitrii-Sh has joined #opendev22:37
*** prometheanfire has joined #opendev22:37
*** calcmandan has joined #opendev22:37
*** sgw has joined #opendev22:37
*** logan- has joined #opendev22:37
*** cmurphy has joined #opendev22:37
*** melwitt has joined #opendev22:37
*** dpawlik6 has joined #opendev22:37
*** tobiash has joined #opendev22:37
*** owalsh has joined #opendev22:37
*** SotK has joined #opendev22:37
*** wendallkaters has joined #opendev22:37
*** hillpd has joined #opendev22:37
*** smcginnis has joined #opendev22:37
*** moppy has joined #opendev22:37
*** rchurch has joined #opendev22:37
*** andreykurilin has joined #opendev22:37
*** stephenfin has joined #opendev22:37
*** dirk has joined #opendev22:37
*** wrenchyfrenchy has joined #opendev22:37
*** ChanServ has joined #opendev22:37
*** cgoncalves has joined #opendev22:37
*** amotoki has joined #opendev22:37
*** tepper.freenode.net sets mode: +o ChanServ22:37
*** bhagyashris has joined #opendev22:37
*** dmsimard has joined #opendev22:37
*** fdegir has joined #opendev22:37
*** tbarron has joined #opendev22:37
*** corvus has joined #opendev22:37
*** ysandeep|away has joined #opendev22:37
*** tristanC has joined #opendev22:37
*** iurygregory has joined #opendev22:37
*** mugsie has joined #opendev22:37
*** auristor has joined #opendev22:37
*** jaicaa_ has joined #opendev22:37
*** yoctozepto has joined #opendev22:37
*** hrw has joined #opendev22:38
*** jbryce has joined #opendev22:38
*** sshnaidm has joined #opendev22:38
*** elod has joined #opendev22:38
*** cloudnull has joined #opendev22:38
*** ianw has joined #opendev22:38
*** osmanlicilegi has joined #opendev22:38
*** openstackgerrit has joined #opendev22:38
*** frickler has joined #opendev22:38
*** ttx has joined #opendev22:38
*** markmcclain has joined #opendev22:38
*** odyssey4me has joined #opendev22:38
*** mgagne has joined #opendev22:38
*** persia has joined #opendev22:38
*** JayF has joined #opendev22:38
*** mnaser has joined #opendev22:39
*** mlavalle has joined #opendev22:39
*** tkajinam has joined #opendev22:39
*** shtepanie has joined #opendev22:39
*** rajinir has joined #opendev22:39
*** spotz has joined #opendev22:39
*** jrosser has quit IRC22:40
*** tosky has joined #opendev22:40
*** mnaser has quit IRC22:41
*** mtreinish has joined #opendev22:42
*** kevinz has joined #opendev22:42
*** knikolla has joined #opendev22:42
*** clarkb has joined #opendev22:42
*** jrosser has joined #opendev22:43
*** mnaser has joined #opendev22:48
*** vblando has joined #opendev22:51
corvusokay, it looks like we can run zk with ssl and plain in parallel.  i'll work on splitting out my zk-all-the-things change to just add ssl listening to the zk server22:51
*** diablo_rojo has joined #opendev22:52
*** Eighth_Doctor has joined #opendev22:54
*** mnasiadka has joined #opendev23:02
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Make sure wheel is installed for python releases  https://review.opendev.org/73600123:03
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Build sphinx with python3 instead  https://review.opendev.org/73592323:03
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Make sure wheel is installed for python releases  https://review.opendev.org/73600123:08
ianwinfra-root: bridge.openstack.org is really dying with a bunch of what seem to be old ansible-playbook services23:13
ianwi'm going to kill all the ones from jun 13,14,1523:13
clarkbianw: I wonder if those timed out like I saw base do today and sometimes they don't die23:14
mordredthey're all remote_puppet_else23:14
mordredoh - that's not true23:14
mordreda few are base23:14
mordredbut most of them are remote_puppet_else23:14
mordredianw: I support the killing - but I think we should now keep our eyes on this and try to figure out why these are hung23:15
ianwhost 23.253.242.1423:15
ianw14.242.253.23.in-addr.arpa domain name pointer logstash-worker15.openstack.org.23:15
ianwappears to be what hung them up...23:15
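The PTR name in that lookup is just the address with its octets reversed under in-addr.arpa. A minimal sketch using only the Python standard library shows the mapping without touching the network; the address is the one from the lines above.

```python
import ipaddress

# Build the reverse-DNS (PTR) query name for the address seen in the
# hung ssh connections. No DNS query is made here.
addr = ipaddress.ip_address("23.253.242.14")
print(addr.reverse_pointer)
```

Resolving that name to logstash-worker15.openstack.org still takes a real DNS query, as the host command above did.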
clarkbianw: 02 and 13 I rebooted in response to seeing them fail ssh in today's base log23:15
clarkbperhaps that set of servers got live migrated or something and tripped up ansible23:16
clarkbI do think reducing ansible ssh timeout would be good if possible23:16
mordredyeah - ssh-ing to logstash-worker15.openstack.org is hanging from there23:16
mordredclarkb: ++23:16
mordredsshing to logstash-worker15.openstack.org from my laptop is also hanging23:17
ianwyeah there's 36 hung ssh's to it on bridge23:17
clarkbk let me reboot it like the other 223:17
mordredianw: we should kill those ssh's23:17
ianwyep, doing that now23:17
mordred-o ConnectTimeout=1023:18
clarkbreboot issued23:18
mordredthat doesn't seem to have done anything23:18
mordredbut it's in our ssh commands23:18
mordredoh23:18
clarkbmordred: "This value is used only when the target is down or really unreachable, not when it refuses the connection."23:18
mordredclarkb: we have a control persist target23:18
mordredclarkb: nod23:19
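One way to picture why a connect timeout alone may not bound a hung session: a connect timeout covers establishing the connection, while a peer that accepts the connection and then goes silent needs a separate read timeout. The sketch below is generic TCP on localhost, written under that assumption about the failure mode; it is not a reproduction of what ssh or Ansible did on bridge, and the probe function and its labels are hypothetical.

```python
import socket

# A listener that completes the TCP handshake (via the kernel backlog)
# but never sends any data, standing in for a host that is up yet hung.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]


def probe(host, port, connect_timeout, read_timeout):
    """Return 'connect-failed', 'read-timeout', or 'ok'."""
    try:
        conn = socket.create_connection((host, port), timeout=connect_timeout)
    except OSError:
        # Covers timeouts and refusals while establishing the connection.
        return "connect-failed"
    try:
        # Without this second timeout, recv() would block indefinitely.
        conn.settimeout(read_timeout)
        conn.recv(1)
        return "ok"
    except socket.timeout:
        return "read-timeout"
    finally:
        conn.close()


# The connect succeeds well within 10 seconds; only the read timeout
# stops the probe from hanging on the silent peer.
result = probe("127.0.0.1", port, connect_timeout=10, read_timeout=0.5)
print(result)
server.close()
```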
ianw$ ps -aef | grep "/bin/bash" | wc -l23:20
ianw181223:20
ianwummm23:20
mordredwow23:21
ianwi think something has fork bombed23:21
mordredclarkb: what's the deal with stable branch devstack?23:21
corvusi closed all my terminals yesterday23:21
clarkbianw: the earliest one seems owned by init23:22
clarkbmordred: well the fixes were enqueued to the gate where they seem to have failed23:22
clarkbmordred: HTTPError: 404 Client Error: Not Found for url: http://mirror.gra1.ovh.opendev.org/wheel/ubuntu-16.04-x86_64/pkg-resources/ so maybe not the fault of the changes23:23
mordredclarkb: awesome23:24
clarkbianw: the bashes have gone away did you do anything?23:24
ianwi  kill -9 101823:24
clarkbrgr23:24
ianwwhich was the one owned by init at the top23:24
mordredssh to logstash15 no longer hangs23:25
mordredtons of bash again23:25
mordredianw: it's owned by you23:26
ianwi think rax-dns-backup is a fork bomb23:26
ianwit is ... it calls itself ... wtf23:26
openstackgerritMerged zuul/zuul-jobs master: Make sure wheel is installed for python releases  https://review.opendev.org/73600123:26
clarkbwasn't it python?23:26
clarkbianw: we must've gotten the file resource wrong in ansible?23:27
ianwcontent: rax-dns-backup23:28
ianwyeah...23:28
openstackgerritIan Wienand proposed opendev/system-config master: rax-dns-backup: fix copy file typo  https://review.opendev.org/73606323:29
ianwclarkb: ^ i feel like that might work better23:29
mordredianw: +A23:34
mordredianw: you might have to fight fork-bombs until that lands23:34
ianwmordred: it should only trigger at 2am; i was just going to run it manually once to confirm it23:35
mordredalso - TIL file content can be a script23:35
mordredwait - isn't it .... OOHHHHHHHHHHHHH23:35
mordredI understand23:35
mordredianw: maybe rm the file that's on disk anyway :)23:36
ianwgood idea, done23:36
auristorlooks like all of the volume transactions finally completed23:46
clarkbauristor: ianw also we should have updated our rsync commands at this point23:47
ianwyeah, agree, next runs should drop "-t"23:49
*** tosky has quit IRC23:50
ianwhttp://grafana.openstack.org/d/ACtl1JSmz/afs?viewPanel=12&orgId=1 should (hopefully) show a marked decline soon ...23:51
auristorwhat triggers the sync?23:51
ianwauristor: they are just cron jobs23:53
ianwauristor: installed @ https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror-update/tasks/rsync.yaml#L64 , specifically23:54
auristorif everything on afs02.dfw will be replicated to afs01.ord, vicepa on afs01.ord will need to be increased23:54
clarkbauristor: I don't think it will be because it would never finish23:56
auristorit doesn't finish because each vos release is transferring the entire contents of each volume23:57
fungieven just initially, we have around 3tib of data we'd need to sync, and pushing that even halfway across the usa (from dallas to chicago) will take a very long time23:57
fungithough we could slowly add volumes to the set we replicate there and knock it out eventually, even just something as innocuous as ubuntu making a new point release will mean a days-long vos release to copy the updated data23:59
auristorsure.  the way you start it is by "vos dump" to a local file.   compress the file.  scp it.   decompress and feed it to "vos restore" to create the volume on the afs01.ord.   "vos addsite" and then the next incremental will send just the diff from the version that was restored.23:59
fungiahh, yeah that might not be so bad23:59
