Tuesday, 2018-12-04

clarkbdoes anyone understand why atomic is building skopeo/ostree as part of the image pull too?00:01
mordredno. that doesn't make any sense to me00:01
clarkbthat also seems like a less than desirable feature (I mean it's called "atomic" because it uses atomic images but then we go compile things just for fun?)00:01
mordredI would have thought one of the benefits of this as an exploded packaging format is pre-built binaries00:01
mnaseri think it might be a layered thing00:01
mnaserwhere it commits a layer of bins on top of the existing one00:02
mnaseri guess it's time for me to write a magnum patch00:02
mnaserboo00:02
clarkbmnaser: ah so the error there looks like a compile fail but really it's a write-the-objects-to-disk error? I could see that00:02
* mordred nominates mnaser for core in all openstack projects00:02
clarkbmordred: corvus fwiw I'm not in a spot to redeploy the cluster today. I think we need to wait for mnaser to have new images regardless00:03
clarkbmnaser: ^ starting with that might be the easiest path forward then those who can build new cluster can redeploy00:03
mordredclarkb: kk00:03
mnaseryeah i think the fail is that it tries to commit the new image on top of everything00:03
mnaserclarkb: you can technically redeploy without us making changes yet :)00:03
clarkbmnaser: by building my own images?00:03
mnasernope!00:03
mnaserone second00:03
mnaserso when you deploy magnum clusters00:04
mnaserthere's something called "labels" which are pretty much parameters00:04
mnaserone of them is the image tag used to deploy cluster00:04
mnaserin this case it's called kube_Tag00:05
mnaserhttps://github.com/openstack/magnum/blob/3a50a242d34f9ee02bc782629e54710d437b3d23/magnum/drivers/common/templates/kubernetes/fragments/configure-kubernetes-minion.sh#L28-L2900:05
mnaserwhich gets injected later into heat00:05
clarkb--labels kube_Tag=1.11.5-1 ?00:05
mnaserhttps://github.com/openstack/magnum/blob/c8019ea77f33609452dd1a973e0f421b118c2079/doc/source/user/index.rst#kube-tag00:05
clarkbcool I think that will work then. Good to know00:06
mnaserkube_tag -- all lower case, sorry about that uppercase typo00:06
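The label mechanism mnaser describes can be sketched as a cluster-create call (a hypothetical invocation: the cluster and template names and the tag value are made up for illustration; `kube_tag` is the real label name per the magnum docs linked above):

```shell
# Hypothetical sketch of passing the kube_tag label at cluster-creation
# time; the cluster/template names and tag value are illustrative only.
openstack coe cluster create nodepool-k8s \
    --cluster-template k8s-fedora-atomic \
    --node-count 2 \
    --labels kube_tag=v1.11.5-1
```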
clarkbunless it boots the existing atomic image, then fails to pull in the new ones :P00:06
clarkbwhich is what I was trying to do :P00:06
mnaserclarkb: it won't, it will actually pull down the new ones from the get go00:06
mnaserand i guess we'll have.. enough space in the new cluster00:07
mnaserim guessing logs and what not probably managed to fill up the cluster00:07
clarkbya if the old images fit then the new ones likely will in that case00:07
clarkbmnaser: the bulk of it is atomic itself. 4.4GB/5GB00:07
mnaserhttps://hub.docker.com/r/openstackmagnum/kubernetes-apiserver/tags/00:07
mnaserthey don't seem to be much bigger00:07
clarkbthe journal is 500MB00:07
clarkbwhich accounts for the other space in that, I think00:07
clarkbI can probably nuke some of the journal to pull in these new images too00:07
clarkbmnaser: so ya probably what will happen is new images will get pulled in straight away before journal fills disk. Then journal will fill disk more quickly as slightly less disk is available for it00:08
mnaseryeah, still should be fixed tho.00:09
clarkb++00:09
mnaserbut just an easier workaround to help you get going00:09
clarkbnow I want to see about pruning the journal and rerunning the pull just to test all these assumptions00:09
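For reference, the journal pruning mentioned here is typically done with journalctl's vacuum options (a sketch, with arbitrary size/age caps rather than whatever was actually run on the node):

```shell
# Cap the persistent journal to reclaim disk; the 100M / 2d values
# below are illustrative, not what was actually used.
sudo journalctl --vacuum-size=100M   # keep at most ~100MB of journal
sudo journalctl --vacuum-time=2d     # and/or drop entries older than 2 days
df -h /                              # confirm the space was freed
```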
*** munimeha1 has quit IRC00:09
fungiinfra-root: https://review.openstack.org/621258 is now un-wip'ed and passing jobs. please review so i can remove lists.o.o from the emergency disable list00:11
clarkbfungi: done00:13
clarkbfwiw making progress now that journal is pruned00:13
mordredfungi: +2 - feel free to +A whenever it's the right time00:14
mordredor I can if it's fine to roll live right now00:15
fungiit's fine to go live now00:15
fungii've already hand-patched it in00:15
fungibecause i wanted to get notifications about it sent out before utc midnight00:15
clarkband ya confirmed on the minion we have a bunch of things running under docker on the docker volume00:17
clarkbI wonder why the master isn't running any of those to spread out the load00:18
clarkb#status log clarkb upgraded the Nodepool magnum k8s cluster by pulling images and rebasing/restarting services for k8s on the master and minion nodes. Magnum doesn't support these upgrades via the API yet. Note that due to disk space issues the master node had its journal cleaned up in order to pull the new images down00:19
openstackstatusclarkb: finished logging00:19
clarkbwe are now patched against the CVE00:20
fungii've added a nonmember discard filter of ^[0-9]+@qq\.com$ to the openstack-discuss ml config, after seeing that one or more of the aliased old list addresses is receiving spam from random addresses matching that pattern on the order of one every few seconds00:21
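That pattern only discards all-numeric local parts at qq.com; a quick sanity check of what it does and doesn't catch, using grep -E:

```shell
# The discard filter matches only all-numeric local parts at qq.com;
# everything else still lands in moderation rather than being dropped.
pattern='^[0-9]+@qq\.com$'
echo '1234567890@qq.com' | grep -Eq "$pattern" && echo 'discarded'
echo 'alice@qq.com'      | grep -Eq "$pattern" || echo 'held for moderation'
echo '1234@example.org'  | grep -Eq "$pattern" || echo 'held for moderation'
```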
clarkbmnaser: last question of the day. Do you know if the master node runs any pod workload?00:22
mnaserclarkb: nope00:22
clarkbmnaser: seems like maybe it doesn't in which case the docker volume isn't useful there00:22
fungii'm not thrilled with that discard filter, but i don't see how the moderation queue will be even remotely manageable otherwise00:22
mnaserclarkb: yeah indeed, i would have to check with other magnum devs00:22
* mnaser is not thrilled at how the boot from volume stuff is not super clean00:23
mnaser:<00:23
clarkbfungi: as long as that doesn't prevent people with qq.com addrs from registering that is probably a reasonable compromise00:23
clarkbmnaser: that was an exciting end of the day for me :) I learned a bunch about magnum and atomic and a little about k8s too :P00:23
mnaserand signed me up for work! :P00:23
fungiclarkb: they can subscribe just fine and nonmembers using non-numeric @qq.com addresses can also send and land in moderation00:24
clarkbah, it's the numeric addrs, got it (though those seem common on that platform?)00:24
fungiit's just the all-numeric @qq.com addresses which need to be subscribed to send00:24
fungiyeah, i think they at one point matched phone numbers for converged text messaging00:25
clarkbah00:25
clarkbalright I've got to step out now. I've not really gotten away from my desk at all today00:25
clarkbmy brain is sufficiently fried. But k8s is patched so yay00:25
fungibut i'll admit i'm not super familiar with qq.com's services00:25
*** jamesmcarthur has joined #openstack-infra00:26
*** tosky has quit IRC00:27
fungilooks like it's also getting some spam from empty addresses like: "sales" <>00:28
fungii wonder how best to filter those00:28
mordredclarkb: I believe this now makes you the resident infra k8s expert. congrats00:30
clarkbuh oh00:31
*** jamesmcarthur has quit IRC00:31
fungigood jeorb00:33
*** bdodd_ has joined #openstack-infra00:37
*** bdodd has quit IRC00:37
*** yamamoto has quit IRC00:38
*** bdodd has joined #openstack-infra00:44
*** gfidente|afk has quit IRC00:45
pabelangerstrongbad!00:45
mnaserclarkb: mordred https://review.openstack.org/#/c/621734/ if you wanna follow progress but threw that up to see how it breaks and i'll test locally soon00:46
*** bdodd_ has quit IRC00:48
*** bdodd__ has joined #openstack-infra00:48
*** bdodd has quit IRC00:49
fungipabelanger: well, that was actually coach z, but yes a homestar runner reference indeed00:49
pabelangerfungi: ah, right. Had to google again.00:50
openstackgerritMerged openstack-infra/system-config master: Shut down openstack general, dev, ops and sigs mls  https://review.openstack.org/62125800:51
*** sthussey has quit IRC01:05
*** bhavikdbavishi has joined #openstack-infra01:06
*** bdodd has joined #openstack-infra01:06
*** bdodd__ has quit IRC01:07
*** lbragstad has quit IRC01:07
*** yamamoto has joined #openstack-infra01:10
clarkbmwhahaha: http://logs.openstack.org/41/621341/1/gate/tripleo-ci-centos-7-standalone/0157d94/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz#_2018-12-03_23_21_18 possible 7.6 fallout? its weird because it says the skew and correction are 0, but it still tries 20 times then fails01:14
mwhahahaThat's chrony not able to sync01:14
mwhahahaWe waitsync 2001:14
mwhahahaNtp can be touchy01:15
clarkbmwhahaha: ya but why did it fail if the correction and skew are 001:15
*** wolverineav has quit IRC01:15
clarkbthat should mean it is in sync01:15
mwhahahaSo if it happens more than a few times it might be 7.6 fallout01:15
mwhahahaNo it might not have been synced01:15
clarkbthen the skew number is buggy?01:15
mwhahahaIt's unlikely to be 001:15
clarkbmwhahaha: ya but the stdout says 0 is my point01:15
*** wolverineav has joined #openstack-infra01:16
mwhahahaYeah that's the output when chronyc can't sync01:16
clarkbby can't sync you mean it couldn't talk to an ntp server so the output is "null"01:17
clarkb?01:17
mwhahahaYea01:18
mwhahahaIt's the same thing ntp does01:18
clarkbdo you know what sources it is looking for?01:19
mwhahahaWe just switched to chrony recently but ntp initial sync can be flakey01:19
mwhahahaNtp.pool.org01:19
mwhahahaWe likely need to expand the source list01:19
mwhahahaI'll check it out later01:19
mwhahahaI think I know the cause01:20
clarkbchronyc makestep && chronyc waitsync 20 may be worthwhile too just in case the skew is large01:21
*** yamamoto has quit IRC01:21
clarkb(we've seen that unfortunately on some clouds they can be hours off)01:21
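clarkb's suggestion maps onto real chronyc subcommands roughly like this (a sketch; it assumes chronyd is running and its configured servers are reachable):

```shell
# Step the clock immediately if the offset is large, then wait for
# chronyd to report synchronization. waitsync's first argument is the
# number of tries (20 here, matching the timeout discussed above).
chronyc makestep        # correct a large offset now instead of slewing
chronyc waitsync 20     # poll up to 20 times for sync before giving up
chronyc tracking        # show the resulting offset/skew for debugging
```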
*** jamesmcarthur has joined #openstack-infra01:24
*** bhavikdbavishi has quit IRC01:24
*** wolverineav has quit IRC01:26
*** wolverineav has joined #openstack-infra01:26
mwhahahaThat's what we are doing. Anyway it's because we recently dropped the pool config and aren't supplying multiple servers any more.01:26
mwhahahaI'll propose a patch and ping juan01:27
*** yamamoto has joined #openstack-infra01:28
*** jamesmcarthur has quit IRC01:34
*** jamesmcarthur has joined #openstack-infra01:35
*** takamatsu has quit IRC01:37
*** jamesmcarthur has quit IRC01:38
*** hongbin has joined #openstack-infra01:43
*** jamesmcarthur has joined #openstack-infra01:49
*** jamesmcarthur has quit IRC01:51
fungi#status log used rmlist to delete the openstack, openstack-dev, openstack-operators and openstack-sigs mailing lists on lists.o.o while leaving their archives in place01:52
openstackstatusfungi: finished logging01:52
fungiin case they're needed, i saved the final states of the mailman configs and subscriber lists for all of those in my homedir on the server01:53
fungiso that we won't need to extract them from backups01:53
*** jamesmcarthur has joined #openstack-infra01:53
*** jamesmcarthur has quit IRC01:57
*** jamesmcarthur has joined #openstack-infra02:04
*** agopi has quit IRC02:04
*** jamesmcarthur has quit IRC02:07
*** jamesmcarthur has joined #openstack-infra02:07
*** agopi has joined #openstack-infra02:13
*** gyee has quit IRC02:14
*** mrsoul has quit IRC02:18
openstackgerritIan Wienand proposed openstack-infra/zuul-jobs master: mirror-workspace-git-repos: Explicitly show HEAD of checked out branches  https://review.openstack.org/62184002:21
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add support for enabling the ARA callback plugin in install-ansible  https://review.openstack.org/61122802:29
openstackgerritIan Wienand proposed openstack-infra/system-config master: Prefix install_openstacksdk variable  https://review.openstack.org/62146202:29
openstackgerritIan Wienand proposed openstack-infra/system-config master: [to squash] Modifications to ARA installation  https://review.openstack.org/62146302:29
openstackgerritIan Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results  https://review.openstack.org/61721602:29
openstackgerritIan Wienand proposed openstack-infra/system-config master: Revert "Make system-config-run-base-ansible-devel non-voting"  https://review.openstack.org/62184702:29
*** betherly has joined #openstack-infra02:38
*** bhavikdbavishi has joined #openstack-infra02:39
*** rlandy has quit IRC02:40
*** wolverineav has quit IRC02:41
*** jamesmcarthur has quit IRC02:42
*** wolverineav has joined #openstack-infra02:42
*** betherly has quit IRC02:43
*** jamesmcarthur has joined #openstack-infra02:45
*** wolverineav has quit IRC02:47
*** psachin has joined #openstack-infra02:48
*** wolverineav has joined #openstack-infra02:55
*** wolverineav has quit IRC03:01
*** eernst has joined #openstack-infra03:11
*** hongbin_ has joined #openstack-infra03:11
*** eernst has quit IRC03:12
*** hongbin has quit IRC03:13
*** hongbin has joined #openstack-infra03:22
*** dklyle has joined #openstack-infra03:23
*** onovy has quit IRC03:23
*** hongbin_ has quit IRC03:23
*** onovy has joined #openstack-infra03:25
*** david-lyle has quit IRC03:25
openstackgerritkangyufei proposed openstack/boartty master: Change openstack-dev to openstack-discuss  https://review.openstack.org/62193803:27
*** armax has quit IRC03:28
*** wolverineav has joined #openstack-infra03:32
*** betherly has joined #openstack-infra03:33
*** betherly has quit IRC03:38
openstackgerritMerged openstack-infra/nodepool master: Set pool for error'ed instances  https://review.openstack.org/62168103:42
*** ramishra has joined #openstack-infra03:46
*** dave-mccowan has quit IRC03:53
*** ykarel|away has joined #openstack-infra03:58
*** mriedem_away has quit IRC03:58
*** sshnaidm is now known as sshnaidm|afk04:04
*** auristor has quit IRC04:05
*** auristor has joined #openstack-infra04:05
*** jamesmcarthur has quit IRC04:11
tonybtobias-urdin: Sorry I let that slip off my radar.  I'll queue it up for tomorrow04:19
*** ykarel|away has quit IRC04:24
*** ykarel|away has joined #openstack-infra04:25
*** ykarel|away is now known as ykarel04:27
*** jamesmcarthur has joined #openstack-infra04:39
*** agopi_ has joined #openstack-infra05:00
*** agopi has quit IRC05:00
*** diablo_rojo has quit IRC05:05
*** agopi__ has joined #openstack-infra05:09
*** agopi__ is now known as agopi05:10
*** agopi_ has quit IRC05:12
*** janki has joined #openstack-infra05:22
*** hongbin has quit IRC05:39
openstackgerritManik Bindlish proposed openstack/os-testr master: Change openstack-dev to openstack-discuss  https://review.openstack.org/62200205:49
*** gcb_ has joined #openstack-infra06:06
*** gcb_ has quit IRC06:11
*** betherly has joined #openstack-infra06:13
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: web: update status page layout to 4 columns  https://review.openstack.org/62201006:15
*** betherly has quit IRC06:17
*** ahosam has joined #openstack-infra06:22
ianwhrm, another weird ansible error on the devel branch job -> fatal: [trusty]: FAILED! => {"msg": "the connection plugin 'ssh' was not found"}06:32
*** apetrich has quit IRC06:37
*** yboaron_ has joined #openstack-infra06:46
*** jamesmcarthur has quit IRC06:47
ianw#status log fixed emergency file to re-enable bridge.o.o puppet runs (which stopped in http://grafana.openstack.org/d/qzQ_v2oiz/bridge-runtime?orgId=1&from=1543888040274&to=1543889448699)06:52
openstackstatusianw: finished logging06:52
ianwit doesn't seem the switch to static inventory has happened yet, probably because of the broken runs.  i guess it will take two more runs to fully deploy06:53
ianwinfra-root: ^ worth keeping an eye on06:53
*** takamatsu has joined #openstack-infra06:53
*** wolverineav has quit IRC06:55
*** wolverineav has joined #openstack-infra06:56
*** quiquell|off is now known as quiquell06:59
*** pcaruana has joined #openstack-infra07:10
*** prometheanfire has quit IRC07:11
*** AJaeger has quit IRC07:11
*** prometheanfire has joined #openstack-infra07:13
*** AJaeger has joined #openstack-infra07:15
*** ahosam has quit IRC07:21
*** apetrich has joined #openstack-infra07:24
tobias-urdintonyb: thank you, got lost on me as well :)07:28
*** dpawlik has joined #openstack-infra07:29
*** ykarel_ has joined #openstack-infra07:29
*** ykarel has quit IRC07:31
*** rcernin has quit IRC07:38
*** jtomasek has joined #openstack-infra07:41
*** wolverineav has quit IRC07:45
*** wolverineav has joined #openstack-infra07:47
*** quiquell is now known as quiquell|brb07:50
*** wolverineav has quit IRC07:52
*** adriancz has joined #openstack-infra07:57
*** longkb has joined #openstack-infra08:00
*** chkumar|off is now known as chandan_kumar08:05
*** quiquell|brb is now known as quiquell08:07
*** takamatsu has quit IRC08:15
*** ralonsoh has joined #openstack-infra08:20
*** gema has joined #openstack-infra08:28
*** ykarel_ is now known as ykarel08:29
*** ianychoi has quit IRC08:30
*** ianychoi has joined #openstack-infra08:30
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool master: Set type for error'ed instances  https://review.openstack.org/62210108:44
*** jamesmcarthur has joined #openstack-infra08:44
*** takamatsu has joined #openstack-infra08:46
*** cgoncalves has joined #openstack-infra08:48
*** jamesmcarthur has quit IRC08:50
*** tosky has joined #openstack-infra08:51
*** ykarel is now known as ykarel|lunch08:51
*** jpena|off is now known as jpena08:55
*** agopi has quit IRC08:56
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool master: Set id for error'ed instances  https://review.openstack.org/62210808:56
*** agopi has joined #openstack-infra08:56
*** ahosam has joined #openstack-infra08:57
*** jamesmcarthur has joined #openstack-infra08:58
*** wolverineav has joined #openstack-infra08:59
*** jamesmcarthur has quit IRC09:02
*** jpich has joined #openstack-infra09:02
*** agopi_ has joined #openstack-infra09:06
*** agopi has quit IRC09:09
*** rascasoft has quit IRC09:09
*** rascasoft has joined #openstack-infra09:10
*** takamatsu has quit IRC09:14
*** shardy has joined #openstack-infra09:15
*** shardy has quit IRC09:16
*** takamatsu has joined #openstack-infra09:16
*** shardy has joined #openstack-infra09:16
*** agopi_ is now known as agopi09:22
*** ahosam has quit IRC09:22
*** lbragstad has joined #openstack-infra09:24
*** ykarel|lunch is now known as ykarel09:26
*** priteau has joined #openstack-infra09:30
*** d0ugal has quit IRC09:31
*** aojea has joined #openstack-infra09:32
*** jamesmcarthur has joined #openstack-infra09:33
*** jamesmcarthur has quit IRC09:38
*** e0ne has joined #openstack-infra09:44
*** sshnaidm|afk is now known as sshnaidm09:46
*** takamatsu has quit IRC09:49
*** takamatsu has joined #openstack-infra09:50
*** derekh has joined #openstack-infra09:51
*** yamamoto has quit IRC09:59
openstackgerritJonathan Rosser proposed openstack-infra/project-config master: Add centos/suse to OSA grafana dashboard  https://review.openstack.org/62216910:00
*** gfidente has joined #openstack-infra10:02
*** dtantsur|afk is now known as dtantsur10:07
*** bhavikdbavishi1 has joined #openstack-infra10:08
openstackgerritManik Bindlish proposed openstack/os-performance-tools master: Change openstack-dev to openstack-discuss  https://review.openstack.org/62217310:08
*** yamamoto has joined #openstack-infra10:08
*** yamamoto has quit IRC10:08
openstackgerritQuique Llorente proposed openstack-infra/zuul master: Add default value for relative_priority  https://review.openstack.org/62217510:09
*** bhavikdbavishi has quit IRC10:09
*** bhavikdbavishi1 is now known as bhavikdbavishi10:09
*** takamatsu has quit IRC10:13
*** ralonsoh has quit IRC10:23
*** ralonsoh has joined #openstack-infra10:23
*** ahosam has joined #openstack-infra10:29
*** electrofelix has joined #openstack-infra10:34
*** jamesmcarthur has joined #openstack-infra10:34
*** bhavikdbavishi1 has joined #openstack-infra10:44
*** bhavikdbavishi has quit IRC10:48
*** bhavikdbavishi1 is now known as bhavikdbavishi10:48
*** takamatsu has joined #openstack-infra10:50
ssbarnea|roverI observed some spam which looks to be related to the zuul logger, with "waiting for logger" messages, like http://logs.openstack.org/99/618899/3/gate/tripleo-ci-centos-7-standalone/d0e6fe3/job-output.txt.gz10:52
ssbarnea|roveris this one-off or a more serious problem?10:52
*** bhavikdbavishi has quit IRC10:52
*** jpena is now known as jpena|brb10:54
ssbarnea|rovercreated as https://review.openstack.org/#/c/622187/10:57
openstackgerritSorin Sbarnea proposed openstack-infra/elastic-recheck master: Query: [primary] Waiting for logger  https://review.openstack.org/62221011:01
*** takamatsu has quit IRC11:04
*** takamatsu has joined #openstack-infra11:05
*** ahosam has quit IRC11:05
*** ahosam has joined #openstack-infra11:07
*** yboaron_ has quit IRC11:15
*** yboaron_ has joined #openstack-infra11:15
*** yamamoto has joined #openstack-infra11:22
*** longkb has quit IRC11:22
*** yamamoto has quit IRC11:35
*** e0ne has quit IRC11:35
*** e0ne has joined #openstack-infra11:38
*** bhavikdbavishi has joined #openstack-infra11:54
*** vabada has joined #openstack-infra11:59
*** vabada has quit IRC11:59
*** vabada has joined #openstack-infra12:00
*** gfidente has quit IRC12:06
*** yboaron_ has quit IRC12:07
*** yboaron_ has joined #openstack-infra12:12
*** dave-mccowan has joined #openstack-infra12:12
*** d0ugal has joined #openstack-infra12:16
*** ahosam has quit IRC12:19
*** lbragstad has quit IRC12:21
*** lbragstad has joined #openstack-infra12:22
*** lbragstad has quit IRC12:23
*** lbragstad has joined #openstack-infra12:24
*** lbragsta_ has joined #openstack-infra12:26
*** ykarel is now known as ykarel|afk12:29
*** lbragsta_ has quit IRC12:31
*** lbragstad has quit IRC12:31
*** quiquell is now known as quiquell|lunch12:33
pabelangerssbarnea|rover: that usually happens when the zuul_console port on the remote node is closed, and zuul is not able to stream logs12:35
*** lbragstad has joined #openstack-infra12:38
*** jamesmcarthur has quit IRC12:39
*** Douhet has quit IRC12:41
*** jpena|brb is now known as jpena|lunch12:42
*** Douhet has joined #openstack-infra12:42
*** ykarel|afk is now known as ykarel12:43
*** tpsilva has joined #openstack-infra12:44
*** ahosam has joined #openstack-infra12:47
*** bhavikdbavishi has quit IRC12:50
*** ykarel is now known as ykarel|afk12:51
*** bhavikdbavishi has joined #openstack-infra12:56
*** psachin has quit IRC12:59
*** udesale has joined #openstack-infra13:00
*** ahosam has quit IRC13:01
*** rh-jelabarre has joined #openstack-infra13:01
*** boden has joined #openstack-infra13:05
*** jcoufal has joined #openstack-infra13:08
openstackgerritFilippo Inzaghi proposed openstack-dev/hacking master: Change openstack-dev to openstack-discuss  https://review.openstack.org/62231713:08
*** sshnaidm is now known as sshnaidm|afk13:14
*** jtomasek_ has joined #openstack-infra13:15
openstackgerritFilippo Inzaghi proposed openstack-dev/pbr master: Change openstack-dev to openstack-discuss  https://review.openstack.org/62232113:18
*** jtomasek has quit IRC13:18
*** rlandy has joined #openstack-infra13:27
*** bhavikdbavishi has quit IRC13:29
*** yboaron_ has quit IRC13:29
*** psachin has joined #openstack-infra13:38
*** kgiusti has joined #openstack-infra13:40
*** priteau has quit IRC13:40
*** jpena|lunch is now known as jpena13:40
*** priteau has joined #openstack-infra13:42
*** quiquell|lunch is now known as quiquell13:44
*** jaosorior has joined #openstack-infra13:45
*** jamesmcarthur has joined #openstack-infra13:46
openstackgerritFilippo Inzaghi proposed openstack-infra/bindep master: hange openstack-dev to openstack-discuss  https://review.openstack.org/62232513:46
*** eharney has joined #openstack-infra13:47
openstackgerritFilippo Inzaghi proposed openstack-infra/elastic-recheck master: Change openstack-dev to openstack-discuss  https://review.openstack.org/62232613:49
*** janki has quit IRC13:50
openstackgerritFilippo Inzaghi proposed openstack-infra/gear master: Change openstack-dev to openstack-discuss  https://review.openstack.org/62232713:51
*** jamesmcarthur has quit IRC13:51
*** psachin has quit IRC13:51
*** gfidente has joined #openstack-infra13:55
openstackgerritFilippo Inzaghi proposed openstack-infra/git-review master: Change openstack-infra to openstack-discuss  https://review.openstack.org/62232813:57
e0nehi. could anybody please help me why CI didn't start for https://review.openstack.org/#/c/580469/?13:58
e0neI've got the same issue with the related patch13:58
e0nehttps://review.openstack.org/#/c/601687/313:58
*** sshnaidm|afk is now known as sshnaidm13:59
fricklere0ne: you have a dependency cycle between these two14:00
openstackgerritFilippo Inzaghi proposed openstack-infra/glean master: Change openstack-dev to openstack-discuss  https://review.openstack.org/62232914:00
fricklere0ne: one depends on the other, which is rebased onto the first14:00
e0nefrickler: thanks! I missed this somehow :(14:01
*** sthussey has joined #openstack-infra14:02
*** udesale has quit IRC14:08
openstackgerritMerged openstack-infra/gear master: Change openstack-dev to openstack-discuss  https://review.openstack.org/62232714:14
*** irclogbot_1 has quit IRC14:15
dmsimardianw: meant to ping you last night for the ara things but got sidetracked, ping me when you're around ?14:16
*** larainema has joined #openstack-infra14:25
*** janki has joined #openstack-infra14:29
*** mriedem has joined #openstack-infra14:33
dmsimardianw: most notably, your review makes me wonder at what point we should perhaps consider a standard "install-ansible" zuul-jobs role14:33
*** ginopc has quit IRC14:33
*** zul has quit IRC14:37
openstackgerritFilippo Inzaghi proposed openstack-infra/grafyaml master: Change openstack-infra to openstack-discuss  https://review.openstack.org/62233814:38
fungiskimming the zuul status page, looks like things are going really well14:40
*** ginopc has joined #openstack-infra14:41
fungitripleo and nova have burned a lot of their tokens and have changes waiting for hours to get node assignments, but other lower-activity projects are getting nodes straight away with no problem14:41
*** zul has joined #openstack-infra14:41
*** bhavikdbavishi has joined #openstack-infra14:43
cmurphyis the asterisk server okay? it's been a while since I've used it so it might just be me but calling into it with a sip client or telephone gives me a busy signal14:44
*** janki has quit IRC14:44
fungiserver's reachable over ssh14:44
fungiasterisk process is running since september 2714:45
*** xek has quit IRC14:45
fungii don't have a sip client handy but can confirm the dial-in number is returning busy14:46
fungii wonder if the account for the dial-in trunk has run out of prepaid minutes again14:46
fungii know clarkb was talking to the osf finance person at the summit about getting a different account set up which bills the foundation directly14:47
fungibut i don't recall where that got to14:47
openstackgerritFilippo Inzaghi proposed openstack-infra/infra-manual master: Change openstack-infra to openstack-discuss  https://review.openstack.org/62234114:48
funginothing jumping out at me in the asterisk service logs14:48
fungiinteresting that sip connectivity would also be broken as i didn't think that relied on the dial-in account14:49
openstackgerritJens Harbott (frickler) proposed openstack-infra/project-config master: Add placement into integrated queue  https://review.openstack.org/62234214:50
cmurphyit's possible my client is broken, it turns out opensuse doesn't have an official jitsi package and the rpm i downloaded from jitsi might be meant for redhat14:50
cmurphybut that's why i was hoping dialing in would work14:50
fungii could probably install jitsi on my workstation but don't have a speaker/microphone to test it with14:51
cmurphyfor me the connection timed out without reaching the server so you'd still be able to see if it gets past that14:51
fungiinteresting. that could be a routing or packet filtering problem too14:52
amorinhey all14:52
fungihey amorin, did you manage to confirm the disk performance strangeness we were seeing in bhs1?14:53
*** udesale has joined #openstack-infra14:53
*** irclogbot_1 has joined #openstack-infra14:53
amorinI am starting working on it14:54
amorinso I'd like to be sure14:54
amorinis the flavor ssd-osFoundation-3 ?14:54
fungichecking14:54
amorinhow did you test the IO ?14:54
cmurphythe message in jitsi when i use the address sip:conference@pbx.openstack.org is "The remote party has not replied!The call will be disconnected"14:54
*** eharney has quit IRC14:56
fungiamorin: https://git.openstack.org/cgit/openstack-infra/project-config/tree/nodepool/nl04.openstack.org.yaml says we are booting all our images in bhs1 with flavor ssd-osFoundation-314:56
amorinok14:56
amorinI am booting right now an image with debian 914:56
amorinhow are you measuring IO?14:56
amorinusing fio?14:56
fungiamorin: we were just trying a basic dd from /dev/zero to a file on the rootfs with a blocksize around 1mb and file size around 4gb to see the write speed. factor of 20x performance difference between equivalent instances in gra1 and bhs114:57
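A sketch of that kind of measurement (the file size is shrunk here so the example runs quickly; the real test wrote ~4GB with a blocksize around 1M, and conv=fdatasync is an addition so the page cache isn't all that gets timed):

```shell
# Rough sequential-write benchmark in the style described above.
# count=64 (64MB) is reduced from the real ~4GB run for illustration;
# conv=fdatasync flushes to disk before dd reports its transfer rate.
dd if=/dev/zero of=/tmp/ddtest bs=1M count=64 conv=fdatasync
rm -f /tmp/ddtest
```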
amorinok14:57
fungiamorin: it first came to our attention that jobs which preallocate a non-sparse swapfile were taking an additional 10+ minutes in bhs1 waiting on that step to complete, which is what got us digging deeper on possible disk performance differences there14:58
*** zul has quit IRC14:58
fungibut more generally it seems to also be impacting the speed at which any i/o-bound operations complete (package installation, log flushing, et cetera)14:59
*** florianf is now known as florianf|biab14:59
amorinmake sense if disk is slowing down the whole instance14:59
fungiso a far higher proportion of jobs running in bhs1 were basically hitting their configured timeouts before they could complete15:00
fungii did some rough analysis of job timeouts based on our logstash data, and jobs running in bhs1 were 20x more likely to timeout than jobs running in gra115:01
fungiinteresting that the proportion there was roughly the same as the difference in disk write speed... i wouldn't have expected quite so close a correlation15:01
fungiand that was after we halved our max-servers there to account for any possible cpu contention due to the 2:1 oversubscription ratio15:02
*** dpawlik has quit IRC15:04
fungicmurphy: ahh, okay then that error very well may indicate some trouble with the service itself. i'll see if i can repeat your findings from here15:04
cmurphythanks fungi15:04
fungicmurphy: on a related note, i didn't see puppet log making any local changes on lists.katacontainers.io15:05
fungiso that does indeed seem to have been a no-op15:06
fungias hoped15:06
cmurphyfungi: that's good15:06
cmurphyfungi: there's another patch in the same state that needs a recheck https://review.openstack.org/61565615:07
fungicmurphy: wow, so it looks like there is no jitsi package in debian these days. even ubuntu seems to have dropped it from universe after 14.04lts (likely because they were importing it from debian?)15:08
cmurphyfungi: oh dear :(15:08
fungiresearching a bit now to see if i can tell why15:08
cmurphytheir website does seem to dissuade people from using it anymore https://jitsi.org/15:09
cmurphyit's "legacy"15:09
openstackgerritMerged openstack-infra/project-config master: Set placement's gate queue to integrated  https://review.openstack.org/62126715:09
openstackgerritSorin Sbarnea proposed openstack-infra/elastic-recheck master: Query: HTTPConnectionPool(host='tempest-sendmail.tripleo.org',  https://review.openstack.org/62235215:10
fungicmurphy: yeah, looks like it was dropped from debian/unstable over a year ago after it failed to meet qa requirements for either of the previous two stable debian releases15:11
pabelangercmurphy: I can try to debug pbx.o.o if you'd like15:11
pabelangerif you give me 1 min, you can try your call again15:11
fungipabelanger: that would be lovely if you have a moment!15:11
cmurphythanks pabelanger15:11
pabelangercmurphy: okay, try your call again15:12
fungipabelanger: what did you find? looks like you restarted the service15:13
cmurphypabelanger: "Initiating call"15:13
pabelangerfungi: I think there was a deadlock in asterisk, when I tried to enable sip debugs, the CLI froze15:14
*** jamesmcarthur has joined #openstack-infra15:14
cmurphy"The remote party has not replied!The call will be disconnected"15:14
fungii'm able to dial into the bridge15:14
pabelangercmurphy: oh, I think you are using ipv615:14
pabelangerlet me see if we have pbx.o.o setup for that15:14
pabelangercmurphy: if you can confirm on your client side15:15
fungilooks like the server itself is reachable on its v6 address15:15
pabelangerYah, we have it enabled. I am unsure if it was ever tested / worked15:15
openstackgerritSorin Sbarnea proposed openstack-infra/elastic-recheck master: Query: [primary] Waiting for logger  https://review.openstack.org/62221015:15
cmurphypabelanger: when i ping pbx.openstack.org it uses the ipv4 address15:15
pabelangerCall-ID: 5678768139207f7e86d484327d1687b6@0:0:0:0:0:0:0:015:16
*** mriedem is now known as mriedem_afk15:16
pabelangerI suspect that is the issue, let me see if I can fix15:16
fungicmurphy: your ping utility may default to v4 and need a -6 option or to be invoked as ping6 instead15:16
cmurphyi think I changed one of my networks configs to favor ipv4 always because my office was having issues15:16
cmurphynot sure how to tell what jitsi is using, it doesn't log the address15:17
openstackgerritSorin Sbarnea proposed openstack-infra/elastic-recheck master: Query: HTTPConnectionPool(host='tempest-sendmail.tripleo.org',  https://review.openstack.org/62235215:17
fungicmurphy: you could try connecting by ip address instead of hostname, probably?15:17
cmurphywill try that15:17
pabelangercmurphy: yah, I can see your client is not replying to our 200 OK15:17
fungisip:6000@23.253.226.3215:17
openstackgerritFilippo Inzaghi proposed openstack-infra/infra-specs master: Change openstack-infra to openstack-discuss  https://review.openstack.org/62235515:17
fungipabelanger: any idea what a better alternative to jitsi is we could recommend? looks like it's basically abandoned upstream and been dropped from most modern distros15:18
pabelangerfungi: I'd have to look, I'm a few years out of date on SIP clients sadly.15:19
cmurphysame result with the ipv4 address15:19
fungipabelanger: no worries, thought you might know off the top of your head15:19
pabelangercmurphy: okay if I post your IP address here? If not, I can PM15:19
corvusfungi: should i pay the phone bill again?15:19
cmurphypabelanger: yeah i think it's fine15:19
openstackgerritMerged openstack-infra/infra-manual master: Change openstack-infra to openstack-discuss  https://review.openstack.org/62234115:19
pabelangerfungi: At one point, I wanted to build a very simple webrtc app that we hosted on pbx.o.o (or some other server) and people just used their browsers15:19
pabelangercmurphy: From: "cmurphy" <sip:cmurphy@195.135.221.2:60998;transport=udp>;tag=3e41c1c215:19
pabelangercmurphy: is that your current IP?15:20
fungicorvus: seems not to have been the problem this time15:20
cmurphypabelanger: yep looks like15:20
corvusfungi: well, the account balance is low; if it's not the problem now, it will be shortly15:20
cmurphymaybe a firewall issue in my office?15:20
corvuslooks like we have $7.2615:21
ssbarnea|rovercan someone help me with a few elastic-recheck reviews? https://review.openstack.org/#/q/project:openstack-infra/elastic-recheck+status:open+is:mergeable15:21
fungicmurphy: sip can be sensitive to some sorts of address translation and state tracking15:21
pabelangercmurphy: for some reason, you are not getting our replies to your invite. You could try using TCP for your client, and see if your firewall is better with those packets15:21
pabelangerusually in this case, I would say look to your firewall and see why the packets are not getting to your client15:22
corvusfwiw i can sip connect15:22
fungireading up at https://jitsi.org/ it seems they've basically switched focus to webrtc-based video conferencing tools and ceased development on sip15:22
pabelangercorvus: Yup, see your attempt15:22
openstackgerritFilippo Inzaghi proposed openstack-infra/jeepyb master: Change openstack-infra to openstack-discuss  https://review.openstack.org/62235715:22
cmurphypabelanger: corvus okay then it is definitely me, i can try to figure out how to switch to tcp and/or try again from home15:22
corvusand i can dial in on pstn15:23
openstackgerritDavid Moreau Simard proposed openstack-infra/puppet-openstackci master: Add AFS mirror support  https://review.openstack.org/52937615:23
fungiyeah, i was able to dial in with my phone after pabelanger restarted asterisk15:23
pabelangerfungi: cmurphy: I still use ekiga as a client, maybe see if you have a package for that15:23
cmurphypabelanger: i do! excellent15:23
pabelangergreat15:24
fungidebian seems to have dropped ekiga from unstable as well, but more recently than jitsi. checking why15:24
cmurphyand dialing in now works15:24
pabelangerYah, wouldn't surprise me that everybody is switching to webrtc15:25
openstackgerritDavid Moreau Simard proposed openstack-infra/puppet-openstackci master: Add AFS mirror support for RHEL/CentOS  https://review.openstack.org/52873915:25
fungiekiga was removed from debian/unstable two weeks ago at the maintainer's request, noting it's been abandoned upstream since 2013 and hadn't been updated with support for newer ptlib, so blocking removal of openssl 1.015:26
pabelangerYah, so I think there is a deadlock in chan_sip for some reason, not worth debugging at this point. We are running an old version of asterisk, and we also need to upgrade to bionic.15:26
fungihttps://bugs.debian.org/91159315:26
openstackDebian bug 911593 in ftp.debian.org "RM: ekiga -- ROM; RoQA; unmaintained, depends on ptlib which depends on openssl1.0" [Normal,Open]15:26
openstackgerritDavid Moreau Simard proposed openstack-infra/puppet-openstackci master: Add AFS mirror support for RHEL/CentOS  https://review.openstack.org/52873915:28
openstackgerritFilippo Inzaghi proposed openstack-infra/openstack-zuul-jobs master: Change openstack-infra to openstack-discuss  https://review.openstack.org/62236015:30
*** xek has joined #openstack-infra15:30
amorinfungi: can you give me the dd command you used ?15:34
amorinI cant reproduce15:34
amorinI have even better perf on BHS1 than on GRA115:34
openstackgerritFilippo Inzaghi proposed openstack-infra/openstackid master: Change openstack-infra to openstack-discuss  https://review.openstack.org/62236215:35
fungidd if=/dev/zero of=foo bs=1M count=409615:35
fungiamorin: it's possible we were seeing i/o competition with other instances actively running jobs. we were testing while the region was still in use so now that we've turned it off completely that could explain why you don't see it15:36
amorinyes maybe15:37
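One caveat with the dd test above: writing zeros without a flush can measure the page cache as much as the disk. A minimal sketch of a more honest variant, assuming a scratch file (name and the smaller size here are arbitrary, not from the discussion):

```shell
# Same style of benchmark, but conv=fdatasync forces dd to flush to disk
# before reporting, so the MB/s figure reflects storage throughput rather
# than the page cache. "ddtest" is just a scratch file.
dd if=/dev/zero of=ddtest bs=1M count=256 conv=fdatasync
rm -f ddtest
```

With the flush included, repeated runs also vary less when other guests on the same host are busy.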
fungiwe could turn our max-servers for bhs1 back up to what we were running before and see if you can reproduce then, i suppose15:38
openstackgerritFilippo Inzaghi proposed openstack-infra/os-loganalyze master: Change openstack-dev to openstack-discuss  https://review.openstack.org/62236315:38
openstackgerritFilippo Inzaghi proposed openstack-infra/project-config master: Change openstack-infra to openstack-discuss  https://review.openstack.org/62236515:40
*** dpawlik has joined #openstack-infra15:40
dmsimardfungi: iteratively perhaps ? like not necessary 150 out of the gate15:40
dmsimardnecessarily*15:40
*** zul has joined #openstack-infra15:41
fungiwell, if we *do* want to crank it back up to 159 i can simply un-wip https://review.openstack.org/62125115:41
amorinwait  a little15:42
fungibut sure, we can also try going back to 79 instead if that would help15:42
fungijust let us know how we can assist15:42
*** florianf|biab is now known as florianf15:43
*** graphene has joined #openstack-infra15:43
*** dpawlik has quit IRC15:45
openstackgerritFilippo Inzaghi proposed openstack-infra/python-storyboardclient master: Change openstack-dev to openstack-discuss  https://review.openstack.org/62236815:45
clarkbfungi: cmurphy: I did bring it up with scott and he had said he would look at the options. But I don't know if he has done that and if he has if there were conclusions15:46
fungiclarkb: do you still need me to chair the infra/opendev meeting today, or do you expect to be around for that?15:48
clarkbfungi: I will be around15:49
clarkbthis cold won't go away so I'm hiding at home15:49
fungisorry to hear that!15:49
openstackgerritFilippo Inzaghi proposed openstack-infra/statusbot master: Change openstack-infra to openstack-discuss  https://review.openstack.org/62237515:51
openstackgerritFilippo Inzaghi proposed openstack-infra/storyboard master: Change openstack-dev to openstack-discuss  https://review.openstack.org/62237715:52
*** ykarel|afk is now known as ykarel15:53
amorinwe are doing iotune on the flavor15:53
amorinthe thing is that we are setting the same value for both bhs1 and gra115:53
amorinbut I think this is not needed at all on your flavors15:54
amorinyou are alone on the hosts so15:54
amorinI will try to disable this15:54
*** kjackal has quit IRC15:55
openstackgerritFilippo Inzaghi proposed openstack-infra/system-config master: Change openstack-infra to openstack-discuss  https://review.openstack.org/62238015:55
*** kjackal has joined #openstack-infra15:55
clarkbfungi: I see you are already -2ing some of those ^15:56
openstackgerritFilippo Inzaghi proposed openstack-infra/yaml2ical master: Change openstack-infra to openstack-discuss  https://review.openstack.org/62238215:56
fungiyep15:57
clarkbI'm looking at the zuul status page and trying to decipher if the new behavior is doing what we expect. One thing I notice is that nova has a large backlog in check (not completely unexpected) but the tripleo projects don't15:57
clarkbI'm guessing that is because tripleo while using many resources is spread out among project repos?15:58
clarkbits possible this may need further tuning15:58
fungitripleo-heat-templates had a large backlog earlier15:58
fungiand tripleo-ci a modest backlog as well15:58
clarkbnova has ~31 changes in check15:59
clarkb~20 for tripleo-heat-templates15:59
clarkb(I don't know that the behavior is wrong, mostly just trying to make sense of my observations at this point)16:00
openstackgerritFilippo Inzaghi proposed openstack-infra/zuul-base-jobs master: Change openstack-infra to openstack-discuss  https://review.openstack.org/62238616:01
openstackgerritFilippo Inzaghi proposed openstack-infra/zuul-jobs master: Change openstack-infra to openstack-discuss  https://review.openstack.org/62238716:02
openstackgerritFilippo Inzaghi proposed openstack-infra/zuul-sphinx master: Change openstack-infra to openstack-discuss  https://review.openstack.org/62238816:03
corvusfungi: https://review.openstack.org/62234116:04
corvusAJaeger: ^ that change may not be trivial16:04
openstackgerritJeremy Stanley proposed openstack-infra/infra-manual master: Revert "Change openstack-infra to openstack-discuss"  https://review.openstack.org/62239116:05
clarkbheh apparently people were twittering about 15h queue delays16:06
clarkbif only people would use the mailing list :/16:06
fungicorvus: thanks for spotting16:06
clarkbah its jd's thing https://twitter.com/openstackstatus16:07
clarkb?16:07
amorinfungi: I tuned the flavor to improve iotune, this is far better now16:08
clarkbjd_: fwiw we changed how job assignments are prioritized to give greater weight to projects that use less resources16:08
amorincan you turn back on your booting on this region?16:08
clarkbjd_: so while the ~15h or whatever delay is accurate for some projects it doesn't paint the whole picture anymore16:08
amorinroot@amorin-bhs1:~# dd if=/dev/zero of=foo bs=1M count=409616:08
amorin4096+0 records in16:08
amorin4096+0 records out16:08
amorin4294967296 bytes (4.3 GB, 4.0 GiB) copied, 8.85547 s, 485 MB/s16:08
clarkbamorin: that does look much better, thanks!16:09
fungijd_: specifically, we now dynamically round-robin job resources between gate queues so it's no longer a global fifo for the whole pipeline16:09
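The round-robin effect fungi describes can be sketched roughly as follows. This is only an illustration of relative priority (project names and the sort are made up), not Zuul's actual implementation:

```python
# Toy sketch of relative-priority scheduling: a project's Nth queued
# request gets priority N, so servicing requests in priority order
# interleaves projects instead of draining one busy project's backlog
# first. Illustrative only -- not Zuul's real code.
from collections import defaultdict
from itertools import count

def assign_priorities(queue):
    """queue: project names in arrival order -> list of (project, priority)."""
    seen = defaultdict(count)
    return [(project, next(seen[project])) for project in queue]

requests = ["nova", "nova", "tripleo", "nova", "neutron", "tripleo"]
prioritized = sorted(assign_priorities(requests), key=lambda r: r[1])
print(prioritized)
```

Sorting by that priority puts each project's first change ahead of anyone's second change, which is why a small project's lone change jumps the queue.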
clarkbfungi: do you already have a prepushed revert up for amorin that I can review?16:10
*** slittle1 has joined #openstack-infra16:10
fungiclarkb: https://review.openstack.org/62125116:11
clarkbamorin: ^ has been approved we should see that soon ish16:11
*** xek has quit IRC16:12
*** pcaruana has quit IRC16:12
amorinlooks good16:13
jd_fungi: clarkb: ok; fwiw I don't really maintain this anymore so if anyone wants to take it over I'm all for it16:17
*** armax has joined #openstack-infra16:22
AJaegercorvus: agreed, missed that it was infra for the infra-manual ;( fungi, thanks for reverting.16:22
openstackgerritQuique Llorente proposed openstack-infra/zuul master: Add default value for relative_priority  https://review.openstack.org/62217516:23
*** eharney has joined #openstack-infra16:24
fungithe author at least seems to have gotten the message after a dozen or so -2 votes on other changes like that and abandoned them16:25
*** hwoarang has quit IRC16:27
*** electrofelix has quit IRC16:27
openstackgerritJames E. Blair proposed openstack-infra/nodepool master: Fix race when deleting Node znodes  https://review.openstack.org/62240316:28
AJaegerfungi: wow, he's learning...16:29
openstackgerritMerged openstack-infra/infra-manual master: Revert "Change openstack-infra to openstack-discuss"  https://review.openstack.org/62239116:32
openstackgerritMerged openstack-infra/project-config master: Revert "Temporarily disable ovh-bhs1 in nodepool"  https://review.openstack.org/62125116:32
*** xek has joined #openstack-infra16:32
*** larainema has quit IRC16:34
openstackgerritNate Johnston proposed openstack-infra/project-config master: Neutron grafana update for co-gating section  https://review.openstack.org/62241816:40
clarkbfungi: I think what the current status is trying to tell me is that with current resources (previous to bhs1 going back in) we can service the first two changes for a project in check16:41
clarkbright now the priority for third change is high enough that it isn't getting serviced. Will be interesting to see if that changes with bhs1 capacity back in place16:41
fungiclarkb: probably won't be a significant bump as it's only restoring an additional ~18%16:46
*** gyee has joined #openstack-infra16:47
*** quiquell is now known as quiquell|off16:50
*** mriedem_afk is now known as mriedem16:50
openstackgerritNate Johnston proposed openstack-infra/project-config master: Neutron grafana update for co-gating section  https://review.openstack.org/62241816:50
openstackgerritmelissaml proposed openstack/os-testr master: Change openstack-dev to openstack-discuss  https://review.openstack.org/62242516:56
openstackgerritmelissaml proposed openstack/os-testr master: Update the home-page URL  https://review.openstack.org/62242716:59
*** shardy has quit IRC17:00
openstackgerritDoug Hellmann proposed openstack-infra/openstack-zuul-jobs master: stop publishing release notes using python 2  https://review.openstack.org/62243017:02
*** ykarel is now known as ykarel|away17:03
*** ramishra has quit IRC17:06
*** takamatsu has quit IRC17:06
*** graphene has quit IRC17:10
*** wolverineav has quit IRC17:10
clarkbfungi: nova has 3 changes processing in check now17:10
clarkband bhs1 doesn't seem to be in use yet17:10
fungicool!17:11
clarkbnl04 should be updating shortly based on syslog timestamps17:11
*** graphene has joined #openstack-infra17:11
*** pcaruana has joined #openstack-infra17:12
*** e0ne has quit IRC17:13
*** ginopc has quit IRC17:14
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Add governance document  https://review.openstack.org/62243917:14
*** ccamacho has quit IRC17:15
*** dpawlik has joined #openstack-infra17:15
*** xek has quit IRC17:16
clarkbbhs1 is enabled again17:17
*** xek has joined #openstack-infra17:17
*** ykarel|away has quit IRC17:22
dmsimardyay, thanks amorin17:26
*** wolverineav has joined #openstack-infra17:29
*** wolverineav has quit IRC17:30
*** wolverineav has joined #openstack-infra17:30
*** jcoufal has quit IRC17:32
corvuslooks like we're still ramping up to full utilization; the executors are close to maxed out so they're taking on new jobs slowly17:34
openstackgerritMerged openstack-infra/nodepool master: Make launcher debug slightly less chatty  https://review.openstack.org/62167517:34
openstackgerritMerged openstack-infra/openstack-zuul-jobs master: stop publishing release notes using python 2  https://review.openstack.org/62243017:34
clarkbcorvus: we just added 159 nodes to our capacity too (restoring bhs1)17:34
corvusyeah, sorry that's what i was referring to17:34
fungii assumed that's what he meant by ramping up17:34
clarkbah17:35
*** jpich has quit IRC17:35
clarkbtripleo has just merged 12 changes together in the gate (as long as there are no last second merge conflicts)17:36
clarkbso that is exciting17:36
*** bobh has joined #openstack-infra17:36
fungione just hit a post_failure17:36
fungiand is now showing a merge conflict17:37
openstackgerritJonathan Rosser proposed openstack-infra/project-config master: Add centos/suse to OSA grafana dashboard  https://review.openstack.org/62216917:37
amorinclarkb: dmsimard fungi what is the status of BHS1 nodes ?17:37
*** bobh_ has joined #openstack-infra17:37
amorinis IO better?17:37
*** jcoufal has joined #openstack-infra17:37
clarkbamorin: I haven't checked a specific instance yet but will do that now17:37
fungier, i guess the merge conflict is on a different change17:38
amorinok17:38
*** bobh_ has quit IRC17:38
*** bobh_ has joined #openstack-infra17:39
fungiclarkb: dmsimard: looks like the ara change for logs.o.o may have significantly improved cpu utilization and also somewhat improved memory usage there, based on cacti graphs17:40
dmsimardamorin: searching "node_provider:ovh-gra1 or node_provider:ovh-bhs1" on logstash.openstack.org seems to yield successful jobs as far as I can tell17:40
*** bobh has quit IRC17:41
*** bhavikdbavishi has quit IRC17:41
amorindmsimard glad to hear that17:41
clarkbamorin: 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.89867 s, 219 MB/s17:41
amorinsounds good17:41
clarkbthat looks much much better, thank you17:41
amorinI applied the same on GRA1, so even GRA1 should be better17:41
fungiclarkb: dmsimard: oh, or at least the ara change resulted in no negative impact on the server... the earlier cpu and memory increases seem to be part of a periodic pattern if you look at the monthly graphs17:41
amorinI'll check if I can increase the IO a little bit more in the future17:41
clarkbamorin: good to know re GRA117:42
dmsimardfungi: the monthly graphs are indeed showing an odd pattern http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=138&rra_id=all17:42
fungidmsimard: i have a feeling that's the weekly log pruning cron17:43
dmsimardthat would make sense if it lasts ~3 days17:43
fungiit takes a few days to run its course17:43
fungiso, yeah17:43
*** bobh_ has quit IRC17:43
amorinalso, we have a change in our pipeline to re-enable nested virtualisation, I hope I will be able to push that to GRA1 and BHS1 tomorrow17:44
amorinI know that this is affecting the OSF somehow17:44
*** bobh has joined #openstack-infra17:45
clarkbamorin: its mostly that we have some jobs that test nested workloads (octavia tests nested load balancers for example) and qemu isn't really fast enough to do that reliably17:45
dmsimardI remember an obscure bug we hit on OVH with... disk injection ? Not sure that ever got fixed17:46
*** wolverineav has quit IRC17:46
*** derekh has quit IRC17:46
*** wolverineav has joined #openstack-infra17:47
dmsimardSomething along the lines of libguestfs using kvm instead of qemu17:47
*** wolverineav has quit IRC17:47
*** kjackal has quit IRC17:47
*** wolverineav has joined #openstack-infra17:47
amorinah, I am not aware of this bug17:48
openstackgerritNate Johnston proposed openstack-infra/openstack-zuul-jobs master: Replace neutron-grenade job with grenade-py3  https://review.openstack.org/62248017:48
openstackgerritJames E. Blair proposed openstack-infra/nodepool master: Fix race when deleting Node znodes  https://review.openstack.org/62240317:49
dmsimardamorin: it was https://bugs.launchpad.net/nova/+bug/1735823 -- it's probably not specific to OVH but that's where we noticed it17:50
openstackLaunchpad bug 1735823 in OpenStack Compute (nova) "Nova can hang when creating a VM with disk injection" [Medium,In progress] - Assigned to Matt Riedemann (mriedem)17:50
amorinok, I'll check that17:52
mriedeminteresting http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Failed%20to%20force%20guestfs%20TCG%20mode%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22&from=7d17:53
mriedem[None req-9d2a4bac-a4b3-4be2-a896-88c3cb7497cf  tempest-ServerActionsTestJSON-983743848  tempest-ServerActionsTestJSON-983743848] Failed to force guestfs TCG  mode. guestfs_set_backend_settings returned: None: NotImplementedError17:53
*** pcaruana has quit IRC17:56
*** gfidente is now known as gfidente|afk18:04
*** xek has quit IRC18:06
*** xek has joined #openstack-infra18:06
clarkbdhellmann: the community goals email reminds me that I think you had said you had a plan to make openstack client startup quicker? or was that dtroyer?18:11
clarkbI'd be curious to hear what that plan is if I am remembering correctly. And will try to help as much as possible to make those changes18:11
*** e0ne has joined #openstack-infra18:13
TheJuliaclarkb: I think it was dtroyer. I replied to that thread specifically because I have questions regarding what we would consider scoping. If we do the full scope and the largest lift, then it should naturally fire up much faster if I understand the root problem as to why it fires up so slowly.18:19
clarkbTheJulia: the issue with startup time is the use of entrypoints as implmented by pkg_resources18:20
clarkbTheJulia: it scans your entire python path for all python package installations, inspects them for metadata then sorts them all. This scales with the number of packages and the speed of your filesystem18:20
TheJuliayup18:20
TheJuliaThat was my understanding as well18:20
*** jpena is now known as jpena|off18:20
clarkbwith long lived processes this isn't an issue because you start it once and the cost is paid and you are done. But with osc as used by eg devstack we pay that cost every time we run osc18:20
clarkband it adds significant time to the devstack runs I think18:21
clarkb(its been a while since I looked at numbers)18:21
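The entry-point cost clarkb describes is easy to observe directly; a rough timing sketch (the 'console_scripts' group is just an example, and the numbers vary with how many packages are installed):

```python
# pkg_resources builds its working set by scanning every installed
# distribution's metadata at import time, and iterating entry points walks
# that whole set. A short-lived CLI pays this cost on every invocation.
import time

start = time.monotonic()
import pkg_resources  # the path scan happens here, at import
names = [ep.name for ep in pkg_resources.iter_entry_points('console_scripts')]
elapsed = time.monotonic() - start
print(f"{len(names)} console_scripts entry points, {elapsed:.3f}s to load")
```

On an environment with hundreds of packages installed this commonly lands in the hundreds of milliseconds, which is why the cost is invisible to daemons but painful for a CLI run in a loop.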
TheJuliathe last numbers I seem to remember were kind of mind boggling18:21
fungiwasn't there a plan to have a persistent osc socket in devstack to reduce that impact?18:22
TheJuliai wonder if it could make sense to have some sort of cache to prevent the scan18:22
TheJuliaor something like what fungi just mentioned18:22
clarkbfungi: that was one idea thrown out at the time18:22
fungiTheJulia: yeah, i vaguely recall dhellmann talking about stevedore caching entrypoints as one possibility18:22
clarkbanother idea was to stop using entrypoints for the "base" or "common" commands18:23
fungian interesting effect of the dynamic priority in zuul is that you can see which repos are the most active in check based on what's floated to the top due to getting delayed18:24
fungipretty much all tripleo, nova and a smidge of neutron at the moment. everyone else is getting nodes pretty quickly18:25
fungia networking-odl change just entered the check pipeline 6 minutes ago and already has about half its node requests fulfilled18:26
toskydo you plan to publish some graphs? :)18:27
fungitosky: hard to know what to graph there18:27
toskyand/or statistics18:27
*** auristor has quit IRC18:28
fungii mean, we have general graphs like the ones at http://grafana.openstack.org/d/T6vSHcSik/zuul-status18:30
*** e0ne has quit IRC18:30
fungiand extremely detailed stats tracked at http://graphite.openstack.org/18:30
fungibut not sure what statistical insights can be gained from the recent change in prioritization18:31
clarkbhalf an hour to the infra meeting. I failed at sending email yesterday. /me edits reminders on calendar for that18:31
clarkbok my phone will remind me next week18:32
dmsimardfungi, clarkb: merging https://review.openstack.org/#/c/616306/ might help in giving us insight as to how the resources are distributed with the new algorithm -- would have actually been nice to have this *before* the algorithm change to compare the differences18:32
*** jrist has quit IRC18:32
*** graphene has quit IRC18:34
fungidmsimard: would it? the algorithm change wouldn't affect long-term utilization per project, only their relative throughput18:34
fungiit wasn't about reducing anyone's utilization, more like letting someone with a couple of items in their hand jump in front of your massive cartload at the grocery checkout18:35
fungiyou're all still going through eventually18:36
fungijamesmcarthur: i'm guessing you want me to just discard that message of yours which just landed in the -discuss moderation queue since you seem to have resent it from a different address anyway?18:39
jamesmcarthurha fungi: I was scrambling to see if I had a login to the ml18:39
jamesmcarthuryes please :) delete18:39
fungidone18:40
fungii haven't added any other moderators yet because with the transition i'm still relying on a fair amount of local filtering and spam identification to make sure nothing slips through the cracks18:40
clarkbdmsimard: fungi yup over the long term we use something like 30% of our available resources because weekends and holidays and hours where fewer people are working are still a thing for us18:41
clarkbdmsimard: but also we can glean that info from the logs already18:41
clarkbtobiash's change is a definite improvement over that but we do have some of the info18:41
dmsimardfungi, clarkb: makes sense18:43
clarkbdmsimard: the last two days its tripleo 40.9%, nova 9.9%, neutron 8.7% then puppet, helm , and osa make the range from 5-3% ish18:43
funginice to see tripleo continuing to drop in that ranking18:44
fungier, well, not in ranking but at least in proportion18:44
clarkbfungi: dmsimard the week prior was tripleo 36%, neutron 13%, nova 8.9%18:46
clarkbI think that mostly confirms we aren't changing usage, just when you get to use it18:47
pabelangerthe RETRY_LIMIT of tripleo jobs yesterday will also eat up a lot of nodes too, since we try 3x18:50
clarkbpabelanger: ya but it was failing pretty fast and I promoted the change as soon as it was approved18:51
clarkb(I think we handled that reasonably well, the only followup on that is maybe adding an ansible-lint rule to require become when installing packages)18:51
*** betherly has joined #openstack-infra18:54
*** ahosam has joined #openstack-infra18:56
*** betherly has quit IRC18:58
*** jcoufal has quit IRC19:01
*** diablo_rojo has joined #openstack-infra19:01
*** diablo_rojo has quit IRC19:01
*** diablo_rojo has joined #openstack-infra19:01
*** wolverineav has quit IRC19:02
ianwdmsimard: hey, i added the ara install stuff to the meeting topics.  i dunno about a generic role in zuul-jobs, i'm not opposed if someone wants to write it for something, but i'm not sure i'd take it on just for the sake of it19:02
*** e0ne has joined #openstack-infra19:03
dmsimardianw: ok, I'll be there for meeting19:03
*** wolverineav has joined #openstack-infra19:06
*** imacdonn has joined #openstack-infra19:13
*** ahosam has quit IRC19:19
openstackgerritMerged openstack-infra/nodepool master: Fix race when deleting Node znodes  https://review.openstack.org/62240319:20
imacdonnhi infra ... just reporting another case of something that looks like tests failing due to IP address conflict (ssh auth/e failures and timeouts): http://logs.openstack.org/48/622348/1/check/legacy-tempest-dsvm-full-lio-src-os-brick/d81b9e5/19:21
*** prometheanfire has quit IRC19:21
fungiimacdonn: interesting, that's in obh-bhs1 which we only turned back on to start using in the past couple hours19:22
*** graphene has joined #openstack-infra19:22
*** prometheanfire has joined #openstack-infra19:22
fungiimacdonn: what about that looks like an ip address conflict? looks like tempest tests failing to delete volumes19:24
fungi"Invalid volume: Volume status must be available or error or error_restoring or error_extending or error_managing and must not be migrating, attached, belong to a group, have snapshots or be disassociated from snapshots after volume transfer."19:24
imacdonnfungi: maybe I pasted the wrong link .. seeing things like:19:24
dmsimardssbarnea|rover: (now offtopic) openstack tries really hard to not use closed source software, "free software needs free tools", four opens, etc.19:24
imacdonn2018-12-04 18:42:52.326095 | primary |     2018-12-04 18:39:26,546 32281 INFO     [paramiko.transport] Authentication (publickey) failed.19:24
imacdonn2018-12-04 18:42:52.326217 | primary |     2018-12-04 18:39:26,688 32281 WARNING  [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.2 (Authentication failed.). Number attempts: 4. Retry after 5 seconds.19:24
imacdonnand: 2018-12-04 18:51:27.023717 | primary |     tempest.lib.exceptions.SSHTimeout: Connection to the 172.24.5.2 via SSH timed out.19:25
fungiimacdonn: 172.24.5.2 would be an address of a nested vm created by devstack (that's an rfc-1918 address), not a job node interface's address19:25
ianw#status log moved bridge.o.o /etc/ansible/hosts/openstack.yaml to a .old file for clarity, as it is not (and perhaps was never) used19:26
openstackstatusianw: finished logging19:26
*** graphene has quit IRC19:27
imacdonnfungi: OK ... but seems that not being able to ssh to it is still bad ?19:27
*** jrist has joined #openstack-infra19:29
fungiimacdonn: sure, but you're reporting an issue entirely encapsulated within devstack, nothing to do with our ci system's connectivity to the job node19:29
*** graphene has joined #openstack-infra19:29
*** florianf has quit IRC19:30
imacdonnfungi: I didn't think I made any assertion about where the problem lies :)19:30
*** dpawlik has quit IRC19:30
fungiimacdonn: you mentioned thinking it was due to an ip address conflict19:31
*** dpawlik has joined #openstack-infra19:31
fungifor which the only ip address conflicts within the infra team's domain of control are conflicts over the addresses by which the ci system reaches its job nodes19:31
imacdonnfungi: OK, I guess my report was not useful, so please disregard.19:32
fungiimacdonn: sorry, i was merely confused why you were mentioning what looked like a devstack problem19:33
imacdonnfungi: I thought it looked similar to issues we saw some weeks ago, where it appeared that the test was trying to ssh to a VM that it had created, but was failing to login, which seemed like it may be actually connecting to some other host using that IP address19:34
*** dtantsur is now known as dtantsur|afk19:35
imacdonnfungi: I don't have any visibility into the infrastructure that these tests run on, so it's difficult to diagnose beyond that19:35
fungiimacdonn: yeah, and i don't have enough insight into devstack/tempest to be able to guess what's gone wrong there. you may find a better audience for devstack and tempest failures in #openstack-qa as those aren't infra projects19:40
*** wolverineav has quit IRC19:40
fungilooks like our connectivity to the node where that ran was fine19:40
*** wolverineav has joined #openstack-infra19:41
*** wolverineav has quit IRC19:46
*** jamesmcarthur has quit IRC19:58
*** e0ne has quit IRC19:58
*** aojea has quit IRC19:59
corvusmordred: oh, were you suggesting that if we're going to use a ssg for opendev, we should go ahead and use gatsby?  that makes sense.19:59
openstackgerritFrançois Magimel proposed openstack-infra/infra-manual master: Fix some URL redirections and broken links  https://review.openstack.org/62258120:01
clarkbmordred: corvus: it seems that gatsby isn't really any different than any other static site generator? You write markdown or rst, compile it to static site with html/js/css and then serve that built content.20:04
clarkbfrom that I have two questions 1) why gatsby instead of the million other static site generators, 2) where does netlify-cms fit into that?20:04
clarkbinfra-root I am going to reboot the opendev ns server that needs a reboot now20:05
clarkbsince nothing is really using opendev yet I think it's safe to just do the reboot right?20:06
clarkbIn the future we might update NS records?20:06
fungirebooting an authoritative nameserver when there's another one responding should be fine20:06
*** agopi has quit IRC20:06
fungieven if the domains it serves are in use20:07
clarkbdo resolvers check both in that case?20:07
clarkbpriority probably plays into it too?20:07
fungiyes, if a resolver can't reach one of the listed nameservers for a domain it will try another20:07
clarkbok motd no longer says reboot required20:07
fungiif it needs to try too many then the client's request may time out before the recursive resolver provides it an answer20:08
clarkbnsd failed to start so I'm debugging that now20:09
clarkbhrm journalctl says it returned 1 too many times in quick succession so systemd stopped trying to start it20:11
clarkbI don't see any logs indicating what the actual failure was though20:11
clarkbI think networking wasn't up yet20:12
*** wolverineav has joined #openstack-infra20:12
clarkbI stopped/started it by hand and it is running now20:12
clarkba race in the init system I guess20:13
*** manjeets_ has joined #openstack-infra20:13
clarkbmaybe we need to add an After=networking.target or whatever the appropriate incantation is?20:13
*** manjeets has quit IRC20:14
fungii take it the nsd package isn't providing its own systemd unit file? or is this merely a bug in the one it shipped?20:18
fungiinfra-root: if i can get some quick eyes on https://review.openstack.org/619056 that's one more mailing list which can be retired20:22
clarkbfungi: my guess is it's a bug in the one that is shipped. But we can add a .after file iirc to change the order without changing the main unit20:24
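The drop-in approach clarkb mentions can be sketched as a systemd override file; the path and target names below follow the usual systemd convention and are an assumption, not something tested against the actual nsd package:

```ini
# /etc/systemd/system/nsd.service.d/override.conf (assumed drop-in path)
# network.target does not guarantee that addresses are configured at
# boot; ordering after network-online.target avoids nsd failing to bind
# to a specific IP before the interface is up.
[Unit]
After=network-online.target
Wants=network-online.target
```

After creating the drop-in, `systemctl daemon-reload` picks it up without touching the package's own unit file.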
clarkbI'm going to grab lunch then I'll dig back into my todo list and check if nsd starting before network has an easy fix20:28
clarkbbefore I do that. brainstorming around new zuul behavior20:29
clarkbcorvus: ^ I wonder if switching from change count based numbers to node use might give better approximation for what we are trying to do there?20:29
clarkbI don't think that will make nova or tripleo move through any more quickly20:30
clarkbbut if we did that then we might be able to account for gate and check?20:30
*** graphene has quit IRC20:30
*** xek has quit IRC20:31
*** graphene has joined #openstack-infra20:32
fungii agree that node count for the requested jobs is a more accurate representation of the cost than just treating every change equally, but i wonder how you use that to prioritize since you ideally satisfy all node requests for a change to keep things from sitting around half-tested20:36
*** ahosam has joined #openstack-infra20:36
openstackgerritMerged openstack-infra/system-config master: bridge.o.o : install ansible 2.7.3  https://review.openstack.org/61721820:37
*** jrist has quit IRC20:39
corvusfungi: i think we should continue to set the priority to the same value for every job in a change, but if we based the priority on nodes requested by changes ahead in the queue, then projects with large node counts will be "demerited" faster.  so a second system-config change which ran all of our jobs might be waiting behind the third zuul change which only ran a few jobs.20:44
*** takamatsu has joined #openstack-infra20:44
fungii guess that could work20:44
corvus(i mean, we could also have different priority values for jobs within a change; that would be a more complex and different behavior)20:46
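The scheme corvus describes above could be sketched roughly as follows; this is a hypothetical illustration, not actual Zuul internals, and `window` is an assumed tuning knob:

```python
# Sketch of node-count-based relative priority: instead of counting the
# changes ahead in a project's queue, weight them by the number of nodes
# they request, so projects with large node counts are "demerited"
# faster. All names here are illustrative.

def relative_priority(changes_ahead, nodes_per_change, window=10):
    """Return a priority value (lower is better).

    changes_ahead: ids of changes queued ahead of this one for the
        same project.
    nodes_per_change: maps each change id to its node request size.
    window: nodes per priority step, so small differences don't
        constantly reshuffle ordering.
    """
    nodes_ahead = sum(nodes_per_change[c] for c in changes_ahead)
    return nodes_ahead // window
```

Under this sketch, a project whose queued changes request 30 nodes drops to priority 3 while a project requesting only a handful stays at 0.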
*** ralonsoh has quit IRC20:47
pabelangerinteresting idea using node requests for priority also20:48
fungicorvus: oh, and you were asking about openstack-discuss metrics. looking back at my notes i was performing cross-analysis of addresses seen posting to the old lists which are subscribed to the new. for example, there were 245 distinct addresses seen posting 10 or more messages to the old lists so far in 2018 and 148 of those addresses (60%) have subscribed to the new list20:50
fungi70% of the addresses which posted 20 messages or more this year have subscribed to the new list20:52
fungi86% of those which posted 50 messages or more20:52
fungiclearly a long tail there, a lot of which i expect is due to address changes20:53
*** eharney has quit IRC20:53
*** auristor has joined #openstack-infra20:53
fungi100 messages only ratchets it up another percent20:54
*** gfidente|afk is now known as gfidente20:59
*** dpawlik has quit IRC21:01
*** wolverineav has quit IRC21:01
*** wolverineav has joined #openstack-infra21:04
*** eharney has joined #openstack-infra21:04
*** wolverineav has quit IRC21:04
*** wolverineav has joined #openstack-infra21:04
*** wolverineav has quit IRC21:15
*** kjackal has joined #openstack-infra21:18
openstackgerritKendall Nelson proposed openstack-infra/infra-specs master: StoryBoard Story Attachments  https://review.openstack.org/60737721:19
*** wolverineav has joined #openstack-infra21:20
*** wolverineav has quit IRC21:20
*** wolverineav has joined #openstack-infra21:20
clarkbianw: fungi: https://review.openstack.org/#/c/621847/1 confuses me slightly. Wasn't there an error after we got the fix in that precipitated the nonvoting change?21:23
clarkbit certainly appears to work on that change though21:23
clarkbthe issue was ssh is not a valid connection plugin iirc21:24
clarkbdo we know what caused that? maybe ansible fixed that in a race with our jobs running?21:24
fungiclarkb: yes, without knowing what triggered the subsequent ssh errors, i can only guess we raced with a change landing upstream in ansible21:24
ianwclarkb: there was https://review.openstack.org/#/c/621633/ proposed, which used a block: in the handler which doesn't work.  after investigating that, it's a known "problem" (the github issue seems unclear on whether it's a bug or a feature)21:25
ianwcorvus: ^ could probably abandon that for clarity?21:26
clarkbianw: ya then we merged 621634 which is the fix for that via the listeners addition and the next runs after that had errors saying ssh was an unknown connection plugin21:26
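The "listeners" fix referenced above can be illustrated with a minimal play; Ansible does not support `block:` inside `handlers:`, but several handlers can share one notification topic via `listen:`. Task and handler names here are hypothetical, not taken from the actual change:

```yaml
- hosts: all
  tasks:
    - name: Install nsd
      package:
        name: nsd
        state: present
      notify: nsd config changed

  handlers:
    # Both handlers fire on the single "nsd config changed" notification,
    # replacing the unsupported block: grouping.
    - name: Restart nsd
      service:
        name: nsd
        state: restarted
      listen: nsd config changed

    - name: Reload zones
      command: nsd-control reload
      listen: nsd config changed
```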
clarkbianw: it was that second error that caused us to switch to non voting21:26
ianwoh right, yeah i saw that once21:26
corvusabandoned21:26
corvusi'm on the fence -- i could go either way, but i think maybe keeping it non-voting and just trying to keep an eye on it might be the way to go.21:27
clarkbthe recent commit log for ansible doesn't show a likely fix. I guess I should git log -p21:27
ianwi kind of like that it has proven itself by actually finding things that saved us from production issues21:28
*** armax has quit IRC21:28
ianwalthough, i guess it caused production issues as it had to be fixed before other things could move in21:28
ianwdmsimard: so following on from meeting conversation, i'd propose we squash https://review.openstack.org/621463 into your original ara installation review21:30
corvusyeah.  the job is great.21:30
ianwand then create a new review that enables it for production bridge.o.o21:30
*** jrist has joined #openstack-infra21:31
*** kgiusti has left #openstack-infra21:31
*** tpsilva has quit IRC21:32
clarkbthat error comes out of lib/ansible/executor/task_executor.py attempting to load the connection plugins based on the host config21:33
clarkbI don't see any code touching that file in the last day21:33
ianwcorvus: one thing that would have made it a bit quicker to bisect is something like -> https://review.openstack.org/621840 .  i think there's many ways to do something similar (stamp the heads in logs) so interested in your thoughts21:34
ianwunfortunately ansible --version doesn't show the git head unless it's installed with "pip -e"21:34
ianw(which does work, https://review.openstack.org/621471, but i thought a common zuul solution is probably nicer)21:35
clarkbhttps://github.com/ansible/ansible/pull/49249 is suspicious given it deals with handlers though21:36
clarkbI do wonder though if ansible even intends for it to be consumed this way21:36
clarkbit's good data for us, but I'm also not sure how actionable it is (we were 1 for 2 yesterday)21:37
corvusclarkb: no i don't think so, which is why i think non-voting may be the best approach21:38
corvusthe branch is called 'devel' :)21:38
*** wolverineav has quit IRC21:38
*** eharney has quit IRC21:40
*** wolverineav has joined #openstack-infra21:40
fungiperhaps a more effective job to peek at if it's failing around time for ansible release candidates?21:42
clarkbdmsimard: for https://review.openstack.org/#/c/611228/10/playbooks/host_vars/bridge.openstack.org.yaml we would start writing a sqlite database on production bridge.o.o too right? what happens when we mix the ansible cron, adhoc ansible, and sqlite there? Will that cause ansible to fail or just sqlite to record junk data?21:43
clarkbI think I'm ok with it as long as it's not going to cause the cron to fail if someone runs an ad hoc command or vice versa21:44
pabelangerianw: clarkb: running ansible from devel is a little hard, I'd maybe say doing so around RC time, per fungi or maybe just run it from a stable branch21:44
clarkbpabelanger: in this case we are interested in testing how future ansible will break us so stable branches probably aren't what we want. But ya maybe we only look at it when it is RC time21:45
pabelangerianw: clarkb: often in ansible-network, we've had devel break us for some random reason, so we do non-voting also but really considering dropping it shortly21:45
dmsimardclarkb: the vars for using mysql are set up in the hostvars on bridge.o.o21:45
clarkbdmsimard: not on that change, it is using sqlite aiui21:45
clarkbdmsimard: ya the default is sqlite21:45
pabelangerclarkb: yah, I think RC time is when things really stabilize on devel, IIRC. Until then, it is pretty wild west21:45
dmsimardclarkb: if you use the install-ansible role without anything, yes -- but in the hostvars for bridge.o.o we set up mysql and the authentication21:46
clarkbdmsimard: we don't I linked to the host vars for bridge.openstack.org and that is commented out21:46
dmsimardin the private vars? equivalent to hiera21:46
* clarkb looks21:47
dmsimardclarkb: /etc/ansible/hosts/host_vars/bridge.openstack.org.yaml21:47
dmsimardIt may not reflect the latest patchsets, need to do a full review21:47
clarkbya ok so it's set in private data, I now know what my comments on that change are :)21:48
dmsimardianw: yup let me review21:49
clarkbok left notes on https://review.openstack.org/#/c/611228/10 and its parent (which makes the ansible devel job voting again)21:50
openstackgerritJames E. Blair proposed openstack-infra/infra-specs master: Move project-hosting spec to completed  https://review.openstack.org/62260921:51
dmsimardclarkb: can you expand on "Can we instead enable this just for testing to start?" ?21:52
dmsimardWhat is "this" ?21:52
dmsimardmysql implementation ?21:52
dmsimardthe in-job nested report ?21:52
clarkbdmsimard: enable ara on "bridge.openstack.org" only in the test jobs to start21:53
dmsimardok21:54
fungiokay, gonna go find an early dinner, but should be back soon21:55
*** bobh has quit IRC21:57
*** jaosorior has quit IRC22:00
*** boden has quit IRC22:07
clarkbdiablo_rojo: fungi comments on https://review.openstack.org/#/c/607377/6 curious to know what you think22:18
clarkbthat is the bulk of reviews done. Time to debug nsd ansible22:18
*** ahosam has quit IRC22:19
*** slaweq has quit IRC22:20
openstackgerritNate Johnston proposed openstack-infra/openstack-zuul-jobs master: Add nodeset ubuntu-bionic-2-node for bionic multinode testing  https://review.openstack.org/62261322:21
clarkbhttps://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/ yes freedesktop.org how did you know :)22:22
clarkbthis nsd unit problem has its own faq entry22:22
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add support for enabling the ARA callback plugin in install-ansible  https://review.openstack.org/61122822:23
openstackgerritIan Wienand proposed openstack-infra/system-config master: Prefix install_openstacksdk variable  https://review.openstack.org/62146222:23
openstackgerritIan Wienand proposed openstack-infra/system-config master: [to squash] Modifications to ARA installation  https://review.openstack.org/62146322:23
openstackgerritIan Wienand proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results  https://review.openstack.org/61721622:23
ianwdmsimard: ^ responded on that variable22:23
diablo_rojoclarkb, looking now22:24
dmsimardianw: oi, I was working on the squash :p22:25
dmsimardianw: damn it, I think gerrit hijacked my ctrl+f and I didn't see it22:26
diablo_rojoclarkb, seems like a good route to me, I think we might want to be able to set when the link expires based on if its private or not? But other than that, I think being more careful is better?22:27
ianwdmsimard: it's all similar for ansible & openstacksdk ... it's a lot of variable swizzling but it does make it a bit clearer when the role is called i think22:27
clarkbdiablo_rojo: ya we should be able to make things more or less accessible depending on that22:27
diablo_rojoclarkb, then I think it sounds good.22:28
dmsimardianw: so here's what I'd do: squash 621463 into 611228 but move the .zuul.yaml change to 61721622:28
dmsimardhow does that sound ?22:29
*** takamatsu has quit IRC22:29
ianwdmsimard: sure, want me to do that, or do you have it in progress?22:30
dmsimardhave it almost done22:30
*** bobh has joined #openstack-infra22:30
dmsimardianw: good idea to use ara master with devel :)22:32
openstackgerritDavid Moreau Simard proposed openstack-infra/system-config master: Add support for enabling the ARA callback plugin in install-ansible  https://review.openstack.org/61122822:32
openstackgerritDavid Moreau Simard proposed openstack-infra/system-config master: functional-tests: collect and publish inner ARA results  https://review.openstack.org/61721622:32
ianwthat -devel job will be testing ansible devel, openstacksdk master and ara master.  it's a pretty good canary22:33
ianwyay, cloud-launcher is running again on bridge.  looks like 20 minutes is about the baseline22:33
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Add cleanup routine to delete empty nodes  https://review.openstack.org/62261622:33
*** dave-mccowan has quit IRC22:33
dmsimardianw: I guess we should move https://review.openstack.org/#/c/611228/12/playbooks/bridge.yaml to 617216 too22:33
*** rcernin has joined #openstack-infra22:34
dmsimardas well as the testinfra test22:35
dmsimardit's a self nit, mostly wanted to keep the scope of the patches separate :p22:35
ianwdmsimard: yeah, can we also base it on https://review.openstack.org/#/c/621462/ (renames of the other variables) so it's all consistent22:36
clarkbok I've confirmed that the nsd unit from ubuntu is buggy22:36
dmsimardianw: will do22:37
*** slaweq has joined #openstack-infra22:37
clarkbaha, it would work if we listened on :: 0.0.0.0 or 127.0.0.1 or ::122:39
clarkbinteresting22:39
*** slaweq has quit IRC22:42
*** mriedem has quit IRC22:44
*** rcernin_ has joined #openstack-infra22:45
*** rcernin has quit IRC22:45
openstackgerritDavid Moreau Simard proposed openstack-infra/system-config master: Prefix install_openstacksdk variable  https://review.openstack.org/62146222:46
openstackgerritDavid Moreau Simard proposed openstack-infra/system-config master: Add support for enabling the ARA callback plugin in install-ansible  https://review.openstack.org/61122822:46
openstackgerritDavid Moreau Simard proposed openstack-infra/system-config master: Enable ARA reports for system-config bridge CI jobs  https://review.openstack.org/61721622:46
dmsimardianw: ^ hopefully squashed and rebased everything in the right order22:48
dmsimardhttps://review.openstack.org/#/c/621847/ and https://review.openstack.org/#/c/621463/ are not included in the tree22:49
ianwyep they can be abandoned22:51
*** slaweq has joined #openstack-infra22:53
*** wolverineav has quit IRC22:53
openstackgerritClark Boylan proposed openstack-infra/system-config master: Update nsd systemd unit deps  https://review.openstack.org/62262022:53
clarkbcorvus: fungi ^ fyi I think that should fix it though I've not tested it yet. Our integration testing should at least test it isn't more broken than today22:54
*** wolverineav has joined #openstack-infra22:56
*** wolverineav has quit IRC23:01
clarkbianw: dmsimard stack lgtm23:01
*** priteau has quit IRC23:03
*** wolverineav has joined #openstack-infra23:05
*** bobh has quit IRC23:07
*** slaweq has quit IRC23:10
*** jamesdenton has quit IRC23:10
*** rcernin_ has quit IRC23:12
*** rcernin has joined #openstack-infra23:13
*** witek has quit IRC23:14
*** witek has joined #openstack-infra23:16
corvusclarkb: slick23:16
fungiokay, fed and catching up23:18
*** gema has quit IRC23:18
ianwclarkb: want to just look over https://review.openstack.org/#/c/621231 (add stein ubuntu-cloud repo) but i think it's fine23:21
mnaserhm23:26
mnaseropenstack/nova seems to have a 13h queue23:26
mnaseris that based on the new changes?23:26
fungimnaser: yes, nova changes will wait for other projects if there are more outstanding nova changes than outstanding changes for those other projects23:28
fungialso mikal seems to have pushed a ~20-change series for nova in check earlier today which contributed to slowing testing for them23:28
mnasermight be chiming in late but wouldn't this behavior maybe encourage one big patch rather than a bunch of small ones23:29
mnaserand yeah, it seems like one bad stack can just halt a whole project :\23:29
fungior might encourage projects to split into multiple repos instead of relying on a single repo too23:29
fungimnaser: the alternative was one stack of nova changes halting testing for everyone else while we tested them23:30
mnaserim trying to think how to ideally resolve this :\23:31
mnaserwhile i see how we've solved things, i feel like it also caused other bottlenecks in other ways23:31
fungibasically with the dynamic priorities algorithm, more active projects now wait longer for some of their changes so that less active projects get more prompt results23:31
mnaserfungi: for example, should we maybe run openstack-tox-lower-constraints only when lower-constraints.txt changes?23:33
mnaserjust maybe trying to identify ways where we can reduce our usage and have things run faster23:33
mnaseror maybe openstack-tox-docs only when docs change23:33
fungimnaser: that supposes that changes people propose to the code don't introduce an expectation on newer versions of their dependencies23:34
fungialso, whittling away at jobs like unit tests and docs builds aren't where you're going to see gains23:35
mnaseri hate to say it but *maybe* we should fast fail if one job fails rather than running the rest23:35
fungifind ways to run fewer multi-node several-hour integration tests23:35
mnasersure it might result in more iterations23:35
fungiare you seeing lots of changes running where the quick-to-complete jobs failed?23:36
fungipep8 didn't pass but devstack+tempest worked and ran for two hours?23:36
mnaseror at smaller scale, if unit tests failed, but it still ran the whole devstack+tempest anyways (and might have probably failed)23:37
clarkbkeep in mind that still the biggest hog is the tripleo gate resets (one just happened recently)23:38
fungiso, like, getting devstack+tempest to abort on the first test failure rather than running the rest of the battery?23:38
clarkbI think the most significant change would be investing in reliable testing23:38
clarkb(which is why I keep pushing that angle)23:38
mnaserfungi: kinda, or making some sort of dependency of linters => unit => func => integration23:38
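The staged dependency mnaser describes maps onto Zuul's per-job `dependencies` setting, which holds a job back until the jobs it depends on succeed; the project-pipeline fragment below is illustrative and the job names are hypothetical:

```yaml
# A linter failure here returns the remaining nodes to the pool early,
# since the downstream jobs never start.
- project:
    check:
      jobs:
        - linters
        - unit:
            dependencies: [linters]
        - functional:
            dependencies: [unit]
        - integration:
            dependencies: [functional]
```

The trade-off fungi and mnaser note still applies: serializing the stages lengthens the wall-clock time of a fully passing change.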
mnaserclarkb: i agree, but even if everything is reliable, a 20-patch stack will slow down an entire project for a whole day23:39
fungiyes, spending lots of effort finding ways to run 5% fewer jobs is basically admitting defeat on fixing the bugs which cause gate resets and eat far more resources than that23:39
mnaserwell, no one is solving those bugs unfortunately, so its kinda like23:39
mnaserthe better outcome of the two bad outcomes23:39
mnaser:(23:39
clarkbmnaser: that simplification is too simple :) the only reason that check is slow like that is gate is monopolizing the resources23:40
fungimnaser: well, the check pipeline is only getting a small slice of the resource allocation because the gate pipeline still has absolute priority over it23:40
clarkbgate is monopolizing the resources because queues keep restarting due to bugs23:40
mnaserman at this point even i'll help out track these down :\23:40
*** udesale has quit IRC23:40
fungimnaser: we could "optimize" by making check higher priority and letting approved changes take however long they take to eventually merge23:40
mnaserfor example 619701,5 in gate reset over `openstack-tox-lower-constraints`23:41
mnaserthat can't be that hard to find..23:41
fungiamounts to basically test things in check during the week when developers are awake and active, and then merging approved changes on the weekends23:41
clarkbmnaser: ya I try to pick a few off the pile every week (yesterday was the centos 7.6 related failures)23:42
clarkbtoday I've got a much longer todo list so haven't been digging into individual failures23:42
mnaseri try to look after openstack-ansible as much as possible23:43
mnaseri dunno how bad we are :)23:43
clarkbwell OSA is ~4.5% of the total resource usage23:44
fungigood news: openstack-ansible results should be coming faster than tripleo, nova and neutron ;)23:44
clarkbreally we are going to see the most impact improving the tests of the projects consuming the most resources23:44
fungibut yeah, fixing bugs which only impact osad can at best chip away at <5% of the resource utilization23:44
fungifixing bugs which impact tripleo on the other hand get to eat away at their 40%23:45
fungi(or dropping more of their jobs, or removing additional nodes from their multi-node jobs)23:45
clarkbtop of integrated gate is grenade and tempest failing on a glance change23:47
clarkbin aggregate integrated gate is also a big consumer23:47
fungiit includes 2 of the top 3 repos, for sure23:47
fungii'm guessing tripleo-heat-templates is the highest consumer repo anyway23:48
*** gfidente has quit IRC23:48
clarkbfungi: yes at 15 ish percent23:49
clarkbmnaser: that lower constraints failure appears to be a valid nova unittest failure23:51
clarkbin test_add_exmods23:51
clarkb*test_add_extra_exmods23:51
mnaserspoke too soon23:53
mnaserall our centos jobs are broken because of https://github.com/CentOS/sig-cloud-instance-images/issues/13323:53
mnaseroh well23:53
mnaserinteresting feature idea that just came to me23:54
mnaser'pause' jobs for a project .. but not set them to non voting23:54
mnaseri don't want to go non voting on centos-7 and start landing stuff that breaks it, but i don't want to waste upstream CI resources for constant failures anyways23:54
*** tosky has quit IRC23:55
mnasermaybe a job that hard-fails right away (which runs on zuul-executor only, avoiding allocating a node?)23:55
*** pbourke has quit IRC23:55
clarkbmnaser: you're thinking just short-circuit and return failure until some reset is made so that nodes aren't consumed for that?23:55
fungitripleo seems to want something similar, as they have a lot of external dependencies which can break them and then tend to just abandon every approved change to clear out their gate queue if they know they'll all fail anyway23:56
mnaserclarkb: yeah23:56
clarkbthat is theoretically possible, I don't think zuul would do it for us today23:56
mnaserbut we can probably do it using that similar job that had rtd warning fail23:56
clarkbout of curiosity why does the centos version matter if you are using a container? I thought the whole point was isolating those concerns :P23:56
*** pbourke has joined #openstack-infra23:56
*** rascasoft has quit IRC23:57
*** pabelanger has quit IRC23:57
*** haleyb has quit IRC23:57
mnaserclarkb: we build venvs and tag them per OS release, the container running the repo is 7.5 so it builds things against 7.5.. when the host (7.6) tries to download something, it gets a 404 because it can't find the package23:57
mnasers/tries to download something/tries to download a built venv/23:57
clarkbmnaser: its probably safer to rely on wheels rather than whole venvs?23:58
mnaserthis is all fixed thanks to the work of odyssey4me so this will slowly disappear out of OSA (this cycle hopefully)23:58
clarkbthey should install quite quickly23:58
mnaserexactly.  enter python_venv_builder role we're transitioning to :)23:58
*** rascasoft has joined #openstack-infra23:58
mnaserinstead we build the venv inside the container by installing the wheels23:58
mnaserhttps://github.com/openstack/ansible-role-python_venv_build23:59

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!