16:01:31 <evrardjp> #startmeeting openstack_ansible_meeting
16:01:32 <openstack> Meeting started Tue Oct 23 16:01:31 2018 UTC and is due to finish in 60 minutes.  The chair is evrardjp. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:34 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:36 <openstack> The meeting name has been set to 'openstack_ansible_meeting'
16:01:57 <evrardjp> #topic rollcall
16:01:59 <evrardjp> o/
16:02:04 <d34dh0r53> o/
16:02:07 <nicolasbock> o/
16:02:25 <hwoarang> o/
16:02:31 <mnaser> o/
16:02:46 <openstackgerrit> Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible-ops master: MNAIO: Add legacy os-infra_hosts group back  https://review.openstack.org/612737
16:02:50 <odyssey4me> o/
16:03:03 <evrardjp> #topic Last week highlights
16:03:08 <cloudnull> o/
16:03:22 <evrardjp> hwoarang: added "Proposed distro job for ceph deployments and needs votes to get it in. Also restored lxc and ceph deployments for SUSE. Also needs votes :)"
16:03:48 <mnaser> any link to that review?
16:03:50 <evrardjp> jrosser 's highlight was "bionic timeouts - fixed by removing repo cache server"
16:04:32 <hwoarang> my highlight is old, i've removed it from the wiki. i think it's all done but i am still catching up. iirc some opensuse jobs have been reverted and/or switched to non-voting
16:04:37 <hwoarang> so trying to figure out what happened :(
16:05:19 <evrardjp> ok sorry for that, it seems my browser might have caused a reference to an old highlight then
16:05:43 <mnaser> afaik we went nv after a lot of timeouts and failures, some might be infra related, some mirror related, and some just an os thing
16:05:44 <spotz> \o/
16:06:01 <openstackgerrit> Merged openstack/openstack-ansible-os_swift master: RedHat: Use monolithic openstack-swift package  https://review.openstack.org/612397
16:06:01 <openstackgerrit> Merged openstack/openstack-ansible-os_swift master: zuul: Switch to distro package installation template  https://review.openstack.org/606056
16:06:25 <evrardjp> should we discuss this further in the open discussion, or is there a bug referencing this?
16:06:44 <hwoarang> open discussion
16:06:48 <evrardjp> great
16:06:59 <evrardjp> let's move to bug triage first though
16:07:06 <evrardjp> #topic bugtriage
16:07:17 <evrardjp> Please see our usual etherpad
16:07:27 <evrardjp> https://etherpad.openstack.org/p/osa-bugtriage
16:07:35 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1798079
16:07:36 <openstack> Launchpad bug 1798079 in openstack-ansible "Test environment example in openstack-ansible" [Undecided,New]
16:08:33 <evrardjp> yup
16:08:36 <evrardjp> seems fair
16:08:45 <evrardjp> https://docs.openstack.org/openstack-ansible/rocky/user/test/example.html doesn't talk about bonds
16:08:52 <odyssey4me> looks like low hanging fruit
16:09:22 <evrardjp> classification?
16:09:42 <odyssey4me> low/confirmed?
16:10:00 <evrardjp> lgtm
16:10:27 <evrardjp> anyone new wants to take this low hanging fruit?
16:10:54 <nicolasbock> I like low hanging fruit :)
16:11:00 <nicolasbock> I could give it a try
16:11:07 <evrardjp> thanks!
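As background for this bug, here is a minimal sketch of the kind of bond definition the test environment example could describe, written as a netplan config for Bionic purely as an illustration; the file path and interface names are assumptions, not taken from the actual docs:

```yaml
# /etc/netplan/01-bond0.yaml (hypothetical path and NIC names)
# Two physical NICs aggregated into bond0; the OSA bridges (br-mgmt, etc.)
# would then sit on VLANs carried over this bond.
network:
  version: 2
  ethernets:
    ens1f0: {}
    ens1f1: {}
  bonds:
    bond0:
      interfaces: [ens1f0, ens1f1]
      parameters:
        mode: active-backup
        mii-monitor-interval: 100
```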
16:11:10 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1797499
16:11:11 <openstack> Launchpad bug 1797499 in openstack-ansible "keystone default deploy test uses http not https" [Undecided,New]
16:12:24 <odyssey4me> I'm pretty sure this was discussed at length last week in the channel.
16:13:02 <evrardjp> not reflected in the bug, although the bug was updated 3 times: https://bugs.launchpad.net/openstack-ansible/+bug/1797499/+activity
16:13:02 <openstack> Launchpad bug 1797499 in openstack-ansible "keystone default deploy test uses http not https" [Undecided,New]
16:13:02 <odyssey4me> The test mentioned is a localhost test against the local keystone service. It does not need to use https because the keystone container only listens on http by default.
16:13:14 <odyssey4me> Yeah, I didn't know there was a bug.
16:13:39 <odyssey4me> This is partially fixed by something that merged recently.
16:14:50 <odyssey4me> The actual issue the guy had was he was trying to use the same IP for the external and internal endpoints.
16:15:01 <evrardjp> I think the title of the bug is very confusing and incorrect
16:15:47 <odyssey4me> The fact that http didn't work right when the settings were applied is now fixed in https://review.openstack.org/#/q/I823f2f949258157e306dbf80570abe53373da0c3
16:16:04 <evrardjp> I remember said patch odyssey4me
16:16:05 <evrardjp> good
16:16:13 <evrardjp> so we can close this one then, as invalid?
16:16:26 <evrardjp> if it's an incorrect classification we'll see this one open
16:16:40 <odyssey4me> there is still an issue in that if keystone is set to use client <-https-> haproxy <-https-> keystone then it will fail miserably
16:17:09 <spotz> You should be able to use the same IP for internal and external endpoints, it's the user_variables.yml.example
16:17:12 <odyssey4me> I think we may already have a bug or two for that condition
16:17:27 <odyssey4me> spotz: yes, but it never worked - thanks to your patch it now will
16:17:36 <evrardjp> spotz: I would not recommend it though
16:17:47 <evrardjp> but that's another topic
16:17:56 <odyssey4me> My suggestion to the reporter was to use different IPs for the different endpoints.
16:18:05 <odyssey4me> There was a report of success after that.
16:18:08 <evrardjp> ok
16:18:56 <evrardjp> so do we agree on the invalid classification?
16:19:50 <evrardjp> let's move to the open discussion
16:20:08 <evrardjp> #topic open discussion
16:20:50 <evrardjp> hwoarang: before going to your topic I will prioritize those already written there, as there are topics that might be skipped
16:20:57 <spotz> let me know your email to add you to the hangout if coming to summit!
16:21:02 <evrardjp> (due to their recurrent/un-updated nature)
16:21:15 <evrardjp> that's indeed first topic
16:21:25 <evrardjp> #link https://etherpad.openstack.org/p/OSA-berlin-planning
16:21:55 <evrardjp> anything else on that topic?
16:22:13 <evrardjp> ok next
16:22:24 <evrardjp> "Open vSwitch configuration is not handled properly on compute nodes. It should be configured with different interfaces on the neutron agent container and compute hosts"
16:22:39 <evrardjp> wasn't this one already present last week?
16:22:56 <jamesdenton> do you have the bug for that?
16:23:47 <evrardjp> jamesdenton: all I can see on said topic is two links: http://eavesdrop.openstack.org/irclogs/%23openstack-ansible/%23openstack-ansible.2018-07-30.log.html#t2018-07-30T15:33:48 and https://drive.google.com/file/d/1ebmQFx4w7W6G9KJGLuj82VDH5xOwXEhW/view
16:24:03 <jamesdenton> thx
16:24:04 <evrardjp> jamesdenton: but I think it's an old topic
16:24:12 <jamesdenton> i did OVS install recently and don't recall any issues
16:24:15 <evrardjp> ok
16:24:21 <evrardjp> thanks for the feedback there
16:24:42 <evrardjp> Tahvok is not there to talk about that, so let's move towards another topic then!
16:24:50 <evrardjp> Integration between os_tempest and tripleo validate-tempest
16:24:55 <evrardjp> is there anything to say there?
16:25:16 <evrardjp> anything new?
16:25:24 <evrardjp> chandankumar and arxcruz?
16:25:46 <chandankumar> evrardjp: I am working on https://review.openstack.org/591424
16:25:50 <evrardjp> I think I saw a commit today, please review!
16:25:51 <chandankumar> distro support
16:25:53 <evrardjp> chandankumar: great!
16:25:54 <arxcruz> evrardjp: well, i'll work to enable python-tempestconf
16:26:00 <arxcruz> this week/sprint
16:26:03 <evrardjp> that sounds very nice!
16:26:07 <chandankumar> it is almost done, just a few breaking changes left
16:26:20 <evrardjp> if you need help don't hesitate to ping
16:26:33 <arxcruz> we are still reformulating how our internal sprint works, but now things are getting on track
16:26:53 <evrardjp> arxcruz: agile!
16:26:57 <chandankumar> evrardjp: I am in sync with odyssey4me, lots of stuff going on from those changes
16:26:57 <evrardjp> :D
16:27:00 <arxcruz> I should send the first wip this week
16:27:16 <arxcruz> evrardjp: hehe, i wish :D
16:27:25 <evrardjp> chandankumar: cool, you are in good hands! other cores, don't hesitate to help there :)
16:27:43 <mnaser> i can do reviews if need be :)
16:27:45 <evrardjp> arxcruz: hahah. Thanks for the first WIP then! :)
16:27:56 <openstackgerrit> Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible-ops master: MNAIO: Add legacy os-infra_hosts group back  https://review.openstack.org/612737
16:28:20 <evrardjp> arxcruz: chandankumar do you mind if we keep this item on the agenda, so that I know to ping you, and we can track things next week?
16:28:30 <jungleboyj> spotz: mbuil Opened a bug for AIO networking not working:  https://bugs.launchpad.net/openstack-ansible/+bug/1799507
16:28:30 <openstack> Launchpad bug 1799507 in openstack-ansible "AIO deployment instances do not have network connectivity" [Undecided,New]
16:28:31 <chandankumar> evrardjp: sure
16:28:33 <arxcruz> evrardjp: sure
16:28:39 <evrardjp> great
16:29:00 <arxcruz> evrardjp: https://etherpad.openstack.org/p/openstack-ansible-tempest is our plan
16:29:01 <openstackgerrit> Merged openstack/openstack-ansible-tests stable/rocky: Update Ansible to 2.5.10  https://review.openstack.org/612405
16:29:02 <arxcruz> so far
16:29:13 <arxcruz> feel free to comment :)
16:29:27 <evrardjp> I will have a look :)
16:29:40 <evrardjp> anything else on that topic?
16:29:55 <evrardjp> If not, I'd like to leave the mic to hwoarang
16:30:00 <hwoarang> ok
16:30:44 <hwoarang> so what i would like to talk about is this job revert/non-voting situation. what we normally do when a job breaks one day is move it to non-voting
16:30:59 <hwoarang> however, nobody remembers to bring it back to voting so the testing matrix is different every other day
16:31:09 <hwoarang> and not sure what we can do about that
16:31:15 <hwoarang> it all feels a bit random right now
16:31:38 <odyssey4me> hwoarang: typically I try to push an immediate patch to revert the non-voting change, and then recheck it from time to time
16:31:42 <odyssey4me> I find that works best
16:31:55 <evrardjp> Zuul v3 brought us the ability to be more flexible, but indeed I don't like this kind of expectation issue personally
16:32:05 <odyssey4me> but, quite honestly, all the job changes are making it hard for me to get work done to switch the role to use the integrated build properly
16:32:21 <hwoarang> we are fortunate enough to have distro people around so maybe we can reach out to them, give them like 2 days to fix stuff before we revert or something?
16:32:22 <evrardjp> should we decide a rule?
16:32:31 <hwoarang> because moving from non-voting to voting normally has lower priority
16:32:38 <hwoarang> so things can stay in non-voting for days or weeks
16:32:50 <odyssey4me> it seems like the mirror issues are better now, that's definitely been more stable since nicolasbock upped the mirror refresh frequency
16:32:54 <evrardjp> hwoarang: which can lead to bad things
16:33:16 <odyssey4me> the only issue now is that broken packages hit the repositories relatively often on master - especially for the distro builds
16:33:31 <hwoarang> but a broken master is expected from time to time :/
16:34:04 <hwoarang> keeping jobs as voting actually puts pressure on upstream people for a quick fix
16:34:12 <evrardjp> may I suggest we move towards non-voting in master, and keep jobs stable on stable branches, until 2 days have passed without improvements?
16:34:14 <odyssey4me> perhaps then distro builds should remain non-voting for master until after m3, then work gets done to make it all work right until the RC time frame? that sucks though because it puts tons of pressure on everyone working on that then
16:34:21 <odyssey4me> ideally, the work should be spread out
16:34:34 <odyssey4me> so it'd be far nicer if we could use a more stable repo somehow
16:34:47 <odyssey4me> something that gets testing before promoting
16:35:00 <evrardjp> it maybe doesn't need to wait for m3
16:35:06 <hwoarang> that might work since packages are changing quite often before M points
16:35:27 <hwoarang> i am fine with stabilizing distro jobs after branching too
16:36:10 <odyssey4me> could we perhaps rather use an infra specific mirror that's updated only after the package updates are tested and validated?
16:36:13 <hwoarang> what i don't like is this flip-flop because it's hard to keep track of it on all the repos
16:36:22 <odyssey4me> yep, definitely agreed for that
16:36:25 <evrardjp> also the jobs should stay in check, and we should collectively not merge things blindly if a -nv job fails -- really look at the failure
16:36:50 <evrardjp> odyssey4me: hwoarang agreed on the no flip-flop
16:37:15 <nicolasbock> We currently don't have that for Master odyssey4me
16:37:41 <odyssey4me> ok, but perhaps there'd be a way to implement it in openstack-infra?
16:37:51 <hwoarang> so far it seems that distro jobs are causing most of the trouble so we can make them non-voting. but the source based ones should remain voting and we should try to fix them instead of moving them to non-voting. fixes normally arrive in less than 48h
16:38:21 <odyssey4me> some sort of periodic job to test the 'proposed' set, then if it passes copy the tested set into the infra mirrors
16:38:45 <hwoarang> well ideally upstreams should CI their packages ;p
16:39:01 <evrardjp> hwoarang: and publish them right? :p
16:39:03 <odyssey4me> that seems fair to me - we aim to switch the distro jobs to voting before the new release, and all stable branches have them voting
16:39:05 <hwoarang> :)
16:39:22 <evrardjp> so could we sum it up?
16:39:25 <hwoarang> that sounds like the easiest solution to keep 'master' happy
16:39:59 <evrardjp> master: source -- wait for 2 days; packages -- see what we can do for more stability + make them non-voting until milestone x // stable branches -- always wait for 2 days
16:40:45 <hwoarang> more or less. at least check with the appropriate $distro channel for ETA
16:40:54 <hwoarang> maybe the problem is not known to them at all
16:41:26 <odyssey4me> well, centos and ubuntu don't update their packages without testing - so those distro jobs could perhaps remain voting the entire time... but it seems that opensuse is not testing the master packages after prepping them, so this may have to be SUSE-specific?
16:41:30 <evrardjp> hwoarang: indeed
16:41:47 <odyssey4me> right now the ubuntu distro installs are incomplete, so they'd need to be non-voting anyway
16:41:56 <odyssey4me> but I think centos is fine as far as I've seen
16:42:07 <evrardjp> odyssey4me: and we are not running bleeding edge PPAs
16:42:08 <hwoarang> odyssey4me: honestly i don't know how the suse cloud team is testing the packages
16:42:30 <odyssey4me> evrardjp: yep, if that changes then we'd likely have to apply the same rule
16:42:36 <nicolasbock> We don't hwoarang ;)
16:42:39 * evrardjp whistles
16:42:54 <hwoarang> ah ok then
16:42:58 <nicolasbock> We don't spend a lot of time on master unfortunately
16:43:07 <evrardjp> we do but... OMG you don't want to know
16:43:14 <hwoarang> so stabilizing after branching seems the most sensible thing for suse
16:43:14 <nicolasbock> Once it's branched it's a different story
16:43:24 <nicolasbock> I think so
16:43:31 <hwoarang> ok then so be it
16:43:37 <nicolasbock> Unless we can convince the right people to add some more vetting to master ;)
16:43:46 <odyssey4me> ok, so we're all happy for Ubuntu/SUSE distro jobs to remain non-voting until the RC period where work ramps up to get them working - any work done during the cycle is appreciated and advised, but it may break routinely
16:43:52 <evrardjp> nicolasbock: isn't that what I suggested?
16:43:54 <evrardjp> :p
16:43:58 <hwoarang> odyssey4me: ok
16:44:15 <spotz> hehe
16:44:16 <odyssey4me> mnaser: you happy with that?
16:44:35 <odyssey4me> It seems that CentOS is the model to follow for the rest. ;)
16:44:35 <nicolasbock> Yes, I don't think we need to convince you evrardjp ;)
16:44:36 <openstackgerrit> Merged openstack/openstack-ansible-tests master: Update ansible to latest stable 2.6.x  https://review.openstack.org/612062
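For reference on the voting/non-voting flip-flop agreed above, a minimal sketch of how a job is switched to non-voting in a project's Zuul configuration; the job name below is hypothetical, not copied from any of the repos discussed:

```yaml
# .zuul.yaml sketch -- hypothetical job name.
# A non-voting job still runs and reports in the check pipeline, but its
# failure no longer blocks the change; such jobs are typically also
# dropped from the gate pipeline.
- project:
    check:
      jobs:
        - openstack-ansible-deploy-aio_distro-opensuse-423:
            voting: false
```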
16:45:01 <evrardjp> ok are we done on this topic? A new bug was raised
16:45:14 <evrardjp> from jungleboyj
16:45:19 <evrardjp> https://bugs.launchpad.net/openstack-ansible/+bug/1799507
16:45:19 <openstack> Launchpad bug 1799507 in openstack-ansible "AIO deployment instances do not have network connectivity" [Undecided,New]
16:45:26 <mnaser> i mean, i don't think it's ideal, but i don't think we can do better.
16:45:44 <odyssey4me> hwoarang: FYI https://review.openstack.org/612391 is up, but isn't passing yet :/
16:45:53 <openstackgerrit> Merged openstack/openstack-ansible-os_glance master: Make glance cache management cron task idempotent  https://review.openstack.org/612065
16:46:05 <hwoarang> odyssey4me: yes but it's not suse who is failing :)
16:46:10 <mnaser> but yes, packagers: please CI your stuff, that'd be awesomeeeEe
16:46:10 <spotz> I can poke jungleboyj if we need him
16:46:13 <hwoarang> so...
16:46:27 <jungleboyj> i am here
16:47:12 <evrardjp> sorry I might have pulled jungleboyj a little too early in the conversation if you are all still talking about that
16:47:16 <evrardjp> :p
16:47:18 <odyssey4me> hwoarang: ah yes, I remember now :p
16:47:35 <evrardjp> jungleboyj: can you give us your openstack_user_config.yml and any user variables?
16:48:08 <mnaser> i mean, we can't force people to do things but it would REALLY be nice if they cared about downstream users, because if they break us we can break them and vice versa
16:48:08 <evrardjp> or did you use gate_check_commit?
16:48:13 <mnaser> but yeah.  moving on.
16:48:32 <evrardjp> sorry to keep an eye on the clock there
16:48:34 <evrardjp> :p
16:49:23 <evrardjp> jungleboyj: I propose we continue discussing your bug after the meeting, would that be okay for you? It would give you time to publish said configuration or explain how you reached said state (which process did you run, for example)
16:49:36 <evrardjp> (it's only in 10 minutes)
16:49:54 <evrardjp> so for the last 10 minutes: Are there other topics ?
16:50:18 <jungleboyj> evrardjp:  Sure.  Happy to provide any data you need.  :-)  Just let me know how I can help.
16:50:46 <evrardjp> thanks jungleboyj
16:51:01 <openstackgerrit> Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible-os_glance stable/rocky: Make glance cache management cron task idempotent  https://review.openstack.org/612752
16:51:02 <jungleboyj> thank you
16:51:10 <jrosser> bionic/ceph is another -nv thing that might fester. I have it working but only by upgrading ceph to mimic
16:51:28 <evrardjp> oh
16:51:34 <evrardjp> that's quite something to track too
16:51:56 <evrardjp> jrosser: do you have help there? like logan- ?
16:52:08 <jrosser> i want a second opinion really
16:52:24 <jrosser> uca and dl.ceph.com only seem to provide mimic for bionic anyway
16:52:27 <logan-> o/
16:52:43 <logan-> we should use mimic then
16:52:45 <odyssey4me> jrosser: sounds sensible to me then
16:52:54 <jrosser> and a discussion point might be whether we prefer uca or ceph.com packages
16:52:56 <odyssey4me> jrosser.doit()
16:52:57 <evrardjp> sounds reasonable to me
16:53:01 <evrardjp> odyssey4me: :D
16:53:04 <jrosser> i prefer ceph.com because they work :)
16:53:16 <jrosser> i can't make the uca ones pass gate check
16:53:21 <odyssey4me> sounds fine to me, I just hate the extra moving part :p
16:53:30 <evrardjp> jrosser: you mean the packages or the people?
16:53:34 <logan-> ^ i've hit bugs in radosgw with uca packages because they were pending SRU in launchpad
16:53:39 <logan-> so i always use ceph.com packages personally
16:53:50 <mnaser> isn't ceph mirrored in infra btw
16:53:58 <jrosser> ceph.com also provides debug symbol packages, which is a ++ good thing
16:53:59 <odyssey4me> mnaser: hammer is :p
16:54:03 <mnaser> http://mirror.sjc1.vexxhost.openstack.org/ceph-deb-mimic/
16:54:08 <mnaser> http://mirror.sjc1.vexxhost.openstack.org/ceph-deb-luminous/
16:54:10 <mnaser> http://mirror.sjc1.vexxhost.openstack.org/ceph-deb-jewel/
16:54:11 <mnaser> :D
16:54:14 <mnaser> problem solved?
16:54:18 <evrardjp> mnaser: :D
16:54:22 <jrosser> mnaser: look at the env var passed into a job though
16:54:23 <evrardjp> jrosser: agreed
16:54:29 <jrosser> that's not so useful
16:55:00 <mnaser> not sure, as in when to use uca and when not to?
16:55:01 <odyssey4me> ok, do we need to switch from mirroring to a reverse proxy instead? or perhaps expose the right env vars?
16:55:24 * jrosser is on school half term so hasn't managed to follow this up
16:56:04 <odyssey4me> ok, so we keep using the ceph packages, but need to switch to using the right mirror - and figure out how we get the right mirror path
16:56:10 <jrosser> anyway - i have patches in for switching to mimic and also some general tidy up of the ceph server install
16:56:16 <mnaser> odyssey4me: agreed
16:56:34 <odyssey4me> jrosser: thanks for that!
16:56:35 <jrosser> reviews please on those and i can chase them up when i'm back at work later in the week
16:56:47 <openstackgerrit> Dmitriy Rabotjagov (noonedeadpunk) proposed openstack/openstack-ansible-os_masakari master: Basic implementation of masakari-monitors  https://review.openstack.org/584629
16:57:17 <odyssey4me> FYI I've got a URL that may be useful for reviewers in a hurry: http://bit.ly/2NVPFCg
16:57:43 <odyssey4me> Those are mergeable, have passed CI, were submitted by cores, and have no negative reviews.
16:57:49 <evrardjp> thanks odyssey4me
16:57:59 <evrardjp> only two minutes remaining for your last topics!
16:59:33 * mnaser can't wait for gerrit shared dashboards
16:59:34 <evrardjp> ok thanks everyone!
16:59:37 <mnaser> odyssey4me: will be our goto dashboard-ian
16:59:39 <evrardjp> mnaser: agreed
16:59:46 <evrardjp> haha
17:00:09 <evrardjp> #endmeeting