Friday, 2020-06-26

openstackgerritMerged opendev/system-config master: Make bindep installs non-interactive
openstackgerritIan Wienand proposed opendev/system-config master: [wip] graphite container deployment
openstackgerritIan Wienand proposed opendev/system-config master: [wip] graphite container deployment
openstackgerritIan Wienand proposed opendev/system-config master: [wip] graphite container deployment
openstackgerritOpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml
openstackgerritIan Wienand proposed opendev/system-config master: [wip] graphite container deployment
openstackgerritMerged openstack/project-config master: Retire networking-onos, openstack-ux, solum-infra-guest-agent: Step 1
openstackgerritIan Wienand proposed opendev/system-config master: Graphite container deployment
fricklermnaser: wow, those amd nodes really seem to rock, a good 30% off of a complete tempest run. makes me wonder whether we might want to consider having a flavor with like 6 cores instead of 8, if we could increase our quota with that proportionally07:29
openstackgerritIan Wienand proposed opendev/system-config master: Graphite container deployment
openstackgerritIan Wienand proposed opendev/system-config master: Graphite container deployment
AJaegerianw: afs building failed on (job openafs-rpm-package-build-promote)07:46
AJaegerianw: I think I know what's going on, patch coming07:47
AJaegerfix is
openstackgerritCarlos Goncalves proposed openstack/project-config master: Add nested-virt-centos-8 label
openstackgerritShivanand Tendulker proposed openstack/project-config master: Removes py35 and py27 jobs for proliantutils
openstackgerritCarlos Goncalves proposed openstack/project-config master: Add nested-virt-centos-8 label
openstackgerritShivanand Tendulker proposed openstack/project-config master: Removes py35 and py27 jobs for proliantutils
ttxfungi, AJaeger: so I got a hit overnight on the github mirroring race condition, and it failed to behave the way I expected. It did not ignore the error in mirroring as it should have:
ttxIt should ignore the error if zuul.newrev is defined:
ttxAnd it was defined:
ttxAny hint welcome :)10:57
ttx... now thinking that I should just add a retry and suppress the newrev check, that would take care of the race condition too.11:06
ttx... but still interested in understanding why that first solution fails.11:06
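(Editor's sketch, not from the channel.) The retry idea ttx describes can be illustrated with a small bounded-retry helper. The real change targets an Ansible role in zuul-jobs, so this Python is only an illustration of the idea. The name `retry` and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T], attempts: int = 3, delay: float = 0.0) -> T:
    """Call action() until it succeeds or the attempts are used up.

    Hypothetical helper: a transient failure (such as a mirroring race)
    is retried, while a persistent failure is still raised at the end.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay)  # give the competing operation time to finish
    raise AssertionError("unreachable")
```

Because the final failure is re-raised, a persistent error still fails the job instead of being silently ignored.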
mnaserfrickler: happy to experiment. These machines have really fast I/O and brand new processors. Nothing officially announced yet though :)11:16
openstackgerritThierry Carrez proposed zuul/zuul-jobs master: upload-git-mirror: use retries to avoid races
AJaegeravass: do you know what's wrong with ttx's ignore_errors line? See his comments above - help appreciated, please11:46
fricklermnaser: yeah I saw the processors in the zuul output ;)11:57
openstackgerritMonty Taylor proposed opendev/system-config master: Set noninteractive in assemble script too
openstackgerritShivanand Tendulker proposed openstack/project-config master: Removes py35, tox and cover jobs for proliantutils
clarkbfrickler: mnaser we did that with osic 4 core flavors back in the day. Definitely worth considering14:35
clarkbinfra-root should we proceed with now? and try to get a run of manage projects in prod?14:36
fungiclarkb: sounds good, i just reviewed and approved it14:39
fungii'm around, albeit barely treading water this morning14:39
fungibut can help with the production debugging if it doesn't work14:39
clarkbI rechecked it a bunch last night so fairly confident it is working14:40
clarkbpassing the gate will be like a 5th or 6th successful run14:41
* clarkb makes tea while that goes through the gate14:42
fungiyeah, it looks like a reasonable approach14:52
fungibasically grab the list, if it's in there we know we don't need to create it, if it's not in there assume it might just be missing because the listing we got is incomplete so check directly whether it exists, and then if it doesn't exist create it14:52
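(Editor's sketch, not from the channel.) The approach fungi summarizes could look roughly like the following, assuming the listing may be incomplete because of pagination. The function `ensure_repo` and the `exists`/`create` callables are hypothetical stand-ins, not the actual system-config code or the Gitea API.

```python
from typing import Callable, Iterable


def ensure_repo(
    name: str,
    listed: Iterable[str],
    exists: Callable[[str], bool],
    create: Callable[[str], None],
) -> str:
    """Create a repo only if it is truly missing.

    Returns which path was taken: 'listed', 'found' or 'created'.
    """
    # 1. In the (possibly incomplete) listing: nothing to do.
    if name in set(listed):
        return "listed"
    # 2. Not listed: the listing may be truncated, so ask directly.
    if exists(name):
        return "found"
    # 3. Really absent: create it.
    create(name)
    return "created"
```

The direct existence check costs one extra request per unlisted repo, but it keeps a truncated listing from causing a duplicate create attempt.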
openstackgerritMerged openstack/project-config master: Removes py35, tox and cover jobs for proliantutils
clarkbfungi: manage projects fix is about to merge15:29
openstackgerritMerged opendev/system-config master: Deal with gitea pagination of repo lists
clarkbthere it is15:30
mordredclarkb: ... the gerrit folks upstream are looking at various things related to the master/main sitch - they're considering making it possible to support both targets one as the alias for the other15:36
mordredclarkb: I think maybe we should add a bug to their list about the inability to replicate the default branch change15:36
mordredit's not a gerrit issue to solve - but they might have a good context from which to discuss it with upstream15:36
clarkbmordred: ++ fwiw I'm fairly certain the reason git push doesn't update HEAD is git push can work in a non replication setup15:37
clarkbmordred: the solution there may be a --replicate type flag to git push where it implies force and updating HEAD and all that15:37
clarkband ya gerrit can probably articulate that need better than us15:37
fungiright, and git doesn't want to assume your local branches are the same as the remote's branches15:37
clarkbfungi: manage projects is doing jeepyb things to gerrit now15:39
clarkbso the gitea issue is past us (at least temporarily as we need to make those other improvements)15:39
clarkbfungi: this is the bit you may be interested in though as it should do those acl updates15:39
mordredclarkb: I've submitted an issue15:43
clarkbmordred: I haven't had a chance to read the thing they wrote yet, but if aliasing they probably want to avoid assuming master is the only branch people would do that with. I could see that being useful for some projects like ansible that have decided not to use master but also use a name other than what people are converging on15:45
mordredclarkb: yeah. they've also got an issue written to discuss the general idea of rename a branch15:46
mordredclarkb: so it sounds like they're trying to look at each of the pieces of this potentially generally15:46
openstackgerritThierry Carrez proposed openstack/project-config master: [DNM] Define maintain-github-mirror job
ttxfungi, clarkb: interested in early feedback on ^15:48
ttx(no urgency)15:48
ttxThe script is pretty well-tested, but the Ansible/Zuul part might need fixes15:49
ttxRegarding:, it's probably too late today to approve it, won't be able to watch it much, so probably better wait for Monday15:50
clarkbmordred: fungi: for when you get a chance. It appears that the gitea and gerrit side of things all went well according to the manage-projects log16:03
clarkbmordred: fungi the problem I'm seeing is that the job isn't ending now that manage-projects on review.o.o has completed16:03
clarkbI expect that is because review-test is causing it to have a sad16:03
mordredclarkb: \o/16:04
clarkbfor prod we seem ok but in order to not have noise in the signal there (which caused confusion debugging the gitea thing) we may want to remove review-test from there or something?16:04
clarkband more generally, we should think about how we can make ansible fail faster in those situations?16:05
mordredclarkb: yes to both- we should definitely exclude review-test here16:06
mordredclarkb: I think I left it in originally because it seemed like testing that manage projects works right against new gerrit would be good16:07
clarkbk, I'm not able to do that right this moment. Now that I'm reasonably confident its happy on the prod side I need to figure out a bike ride before summer wakes up and says hello16:07
mordredbut maybe we're missing something to make that work16:08
clarkbnot sure how people operate in warm weather like this16:08
* mordred waves to clarkb from new orleans16:08
clarkbmordred: I'm learning if I don't get out before like 11am I'm better off waiting until tomorrow16:10
openstackgerritMonty Taylor proposed opendev/system-config master: Stop running manage projects on review-test
mordredclarkb: I also recommend just learning to enjoy sweating16:12
AJaegerclarkb: seems jeepyb run fine - is created and content was imported17:07
openstackgerritMerged openstack/project-config master: Normalize projects.yaml
AJaegerinfra-root, infra-prod-manage-projects timed out after 30mins on the change above ^17:57
clarkbAJaeger: yes we think the timeout is related to trying to run on review-test18:07
clarkbAJaeger: should help. I'm reviewing that now18:07
mordredAJaeger: the manage-projects itself should have been successful18:11
mordredclarkb: does manage-projects just spin indefinitely trying to connect if the server isn't there? haven't we seen that issue in other contexts?18:12
mordredclarkb: because, you know, we're not even running gerrit there yet18:12
clarkbmordred: that could be18:12
mordredclarkb: I think manage-projects will indefinitely retry18:12
mordredclarkb: I unfortunately have to step out of the house for a little bit so I can't hands-on help right now18:13
clarkbmordred: no worries I don't think it's urgent now that the main failure is addressed18:16
clarkband its friday etc etc18:16
mordredclarkb: yeah. yay friday18:16
clarkbmordred: for docker zuul executors is there anything I can do to help move that along?18:16
mordredclarkb: now that the krb5-user patch landed - we can try doing a docker-compose pull ; docker-compose up -d  again18:21
mordredon ze0118:21
clarkbmordred: cool I'll give that a go after breakfast/lunch18:21
mordredI can do that real quick again if you wanna keep an eye on it18:21
mordredor - I can leave it to you - turns out it's a simple operation either way :)18:22
clarkbya I can watch it between sandwich bites18:23
clarkbmordred: just docker-compose down it if it is sad?18:23
mordredclarkb: yeah18:24
mordredclarkb: so -...18:24
mordredI'm doing pull18:24
mordredI ran out of space the first time - but the other container was still running. so I stopped it and repulled and it was fine18:25
mordredbut we might need to investigate disk space requirements18:25
clarkbmordred: ok zuul runtime is on a separate partition18:25
mordredclarkb: yeah. / is at 100%18:25
clarkbfor containers I've tried to aggressively prune images in our ansible and we may need to do that there since it isn't ansibling right now?18:25
clarkbI'll take a look18:25
mordredclarkb: ++ - I bet there's something we're not doing right - we're using 39G in /18:26
mordredthat's ... heavy18:26
* mordred runs out - biab18:27
clarkbmy normal incantation of cd / && du -hs * | sort -h doesn't work when we've got /afs mounted18:30
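(Editor's sketch, not from the channel.) One way around a network mount such as /afs is to total usage only for paths on the same device as the starting directory, much like GNU `du -x`. The function name below is hypothetical, and it sums apparent file sizes (`st_size`) rather than allocated blocks, so its numbers will differ somewhat from `du`.

```python
import os
from typing import Dict


def usage_on_same_filesystem(root: str = "/") -> Dict[str, int]:
    """Sum apparent file sizes per top-level entry, skipping other mounts."""
    root_dev = os.lstat(root).st_dev
    sizes: Dict[str, int] = {}
    for entry in os.scandir(root):
        try:
            st = entry.stat(follow_symlinks=False)
        except OSError:
            continue
        if st.st_dev != root_dev:
            continue  # a different filesystem (for example a network mount)
        if not entry.is_dir(follow_symlinks=False):
            sizes[entry.path] = st.st_size
            continue
        total = 0
        for dirpath, dirnames, filenames in os.walk(entry.path):
            kept = []
            for d in dirnames:
                try:
                    if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev:
                        kept.append(d)
                except OSError:
                    pass
            dirnames[:] = kept  # prune so os.walk never enters other mounts
            for name in filenames:
                try:
                    total += os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    pass
        sizes[entry.path] = total
    return sizes
```

Sorting the result by value, for example `sorted(sizes.items(), key=lambda kv: kv[1])`, gives an ordering comparable to piping through `sort -h`.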
clarkbinfra-root ze01:/root/var-lib-zuul-backup is 20GB large and accounts for a significant portion of our disk use on /18:35
clarkblooks like that was made back in 2017 (must've been part of initial zuulv3 rollout)18:35
clarkbcan we clean that up?18:35
fungii suspect it can18:44
clarkbI'll clean it up in a bit if I don't hear objections18:46
openstackgerritClark Boylan proposed opendev/system-config master: Paginate all the gitea get requests
openstackgerritClark Boylan proposed opendev/system-config master: Increase parallelism of gitea project creation
clarkbI'll WIP those because I want to recheck them a bunch just to be sure there aren't any other weird corner cases here to address18:59
mordredclarkb: I agree, I think that can go19:01
mordredclarkb: I mean - it's a backup from 201719:02
clarkbya I'm just about to context switch to that and clean it up19:02
mordredclarkb: I've got headroom if you want19:02
clarkbits fine I was just getting those gitea management changes rebased19:03
clarkbrm'ing that dir on ze01 now19:03
clarkband done. mordred can I just docker-compose up -d now? or should I do another image pull?19:04
mordredclarkb: should be good - although it also won't hurt19:04
mordredthe pull should be a no-op19:05
clarkbzuul-executor is running on ze01 now19:06
clarkbI'm going to prune docker images19:06
clarkball done19:06
clarkbze01 seems extremely busy so it is doing work. I guess we just wait now to see if the post run tasks are happy19:10
mordredclarkb: ++19:11
mordredclarkb: it seemed to generally work except for that last time19:12
openstackgerritAndreas Jaeger proposed openstack/project-config master: Finish retirement of openstack-ux,solum-infra-guestagent
openstackgerritAndreas Jaeger proposed openstack/project-config master: Finish retirement of networking-onos
mordredclarkb: look at this:
mordredclarkb: I had tests timeout in sdk unittests - the test timeout is set to 5 seconds (which is still _absurdly_ long for a test)21:51
mordredall of them are in random - which makes me think - perhaps test node is missing the random stuff?21:52
mordred        content = ''.join(random.SystemRandom().choice(21:52
mordred            string.ascii_uppercase + string.digits)21:52
mordred            for _ in range(file_size)).encode('latin-1')21:53
mordredis the code in question21:53
fungi"the random stuff"21:53
fungididn't it move to math.random?21:53
fungimaybe newer python interpreter?21:53
mordredfungi: we run <insert name of thing> on the vms to generate entropy no?21:53
fungioh, it was timing out? yeah, usually <thing>21:54
* fungi refreshes fridaybrain21:54
mordredfungi: can't think of the name of <thing> for the life of me21:54
fungii had to grep the dpkg -l output on one of my virtual machines for random terms21:55
fungiproof i should not be behind a keyboard right now i guess21:55
mordredfungi: haveged is listed in infra-package-needs21:58
mordredand this ran on a bionic node in vexxhost - so it shouldn't be new or exciting22:00
clarkbmaybe haveged isnt running for some reason?22:00
clarkbbut ya haveged should provide plenty of entropy I think22:00
fungi(...if it gets started)22:01
mordredand this is only really asking for 4000 bytes - although it's doing it per-thread - so 28k total22:01
clarkbalso perhaps related to new hardware in vexxhost22:01
clarkblike maybe it uses hardware pool in kvm amd that is sad or something22:01
mordredthat said - why is this using systemrandom in the first place22:02
mordredI do not need random for security - this is a test fixture22:03
fungiurandom would totally be sufficient there22:04
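(Editor's sketch, not from the channel.) Since the fixture does not need cryptographic randomness, the pasted snippet could be written with the standard library's ordinary seeded generator instead of `random.SystemRandom`. The function name `make_content` and the `seed` parameter are hypothetical. With an explicit seed, `random.Random` never consults the operating system's entropy source, and the output is reproducible across test runs.

```python
import random
import string


def make_content(file_size: int, seed: int = 0) -> bytes:
    """Build file_size bytes of pseudo-random uppercase letters and digits.

    Uses the default Mersenne Twister PRNG, which is not suitable for
    security purposes but is fine for filler data in a test fixture.
    """
    rng = random.Random(seed)
    alphabet = string.ascii_uppercase + string.digits
    return "".join(rng.choices(alphabet, k=file_size)).encode("latin-1")
```

For example, `make_content(4000)` yields the 4000-byte payload size mentioned above, and repeated calls with the same seed return identical bytes.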
clarkbis it possible that bypasses haveged somehow?22:06
clarkbhaveged should feed /dev/random though22:07
fungiyeah, on bionic the kernel should be new enough to have nonblocking /dev/random after seeding22:07
fungii think22:08
fungior it could be friday, in which case all bets on the accuracy of my memory are suspect22:08
fungii've been mowing for the past hour, so it's possible the sun has addled my brain22:09
* fungi gets back to it, this lawn isn't going to destroy itself after all22:09
mordredfungi: have you considered getting a goat?22:15
mordredfungi: it would have the added benefit of also eating any other object you leave downstairs22:16
clarkbavoid the bit flies22:16
clarkb*bit and dont google that22:16
mordredclarkb: you type good22:17
clarkbthe bestest typist22:18
fungiyeah, aware of botflies22:18
fungino thanks22:18
fungiwe have sandflies and those are already getting on my nerves more than the mosquitoes22:18
clarkbsystem-config-run-base seems to be broken22:33
clarkb for a half a second I thought that could be related to tripleo's problems but this is mad about an ssh rsa key file22:33
clarkbwe are failing to run the nested ansible to apply base to all the hosts22:34
clarkbdid we change how ssh works there?22:35
clarkboddly the gitea jobs passed both times and that also runs nested ansible22:35
clarkbI'm not sure I have the friday afternoon motivation to debug that :)22:36
fungiyeah, i'm in the middle of a half-hearted attempt to diagnose openstack constraints proposal failures22:41
clarkbI rechecked it again. If it happens a third time I'll do my best to look closer22:44
clarkbI wonder if this is an ssh-keygen issue like we have with zuul quickstart to get the format right22:45
clarkbbut ansible uses openssh not paramiko so it should just work22:45
fungibut why only now?22:45
clarkbdistro update maybe?22:45
clarkbthat could explain the delta between base and gitea jobs too as we run them on different platforms maybe?22:46
fungiyeah maybe22:46
