rdogerritAlan Pevec created rdoinfo: Initial pin for Oslo and OpenStack clients  http://review.rdoproject.org/r/173000:29
*** anshul has joined #rdo06:37
*** tesseract- has joined #rdo06:39
*** jubapa_ has quit IRC10:28
rdobot[sensu] NEW: master.monitoring.rdoproject.org - check-delorean-newton-current @ http://tinyurl.com/gud2vup |#| Build failure on centos7-master/current: python-openstackclient: http://trunk.rdoproject.org/centos7-master/report.html10:34
apevecweshay, trown, slagle - we'll be soon blocked on fix in tripleoclient https://bugs.launchpad.net/tripleo/+bug/160587611:02
openstackLaunchpad bug 1605876 in tripleo "openstack baremetal import --json instackenv.json fails with "PluginAttributeError: _auth_params"" [Critical,Triaged] - Assigned to Brad P. Crochet (brad-9)11:02
apevecOSC has reverted private attributes11:02
*** trozet has joined #rdo11:04
slagleapevec: thrash|g0ne is working on it11:06
slaglewe got a promote in tripleo-ci this morning11:07
slaglemaybe rdo will promote before this breaks yet again11:07
slaglei think the problem is actually in mistralclient fwiw11:08
apevecah then move LP to mistralclient11:09
slaglei just did11:09
rdogerritMerged openstack/openstackclient-distgit: python-openstackclient: drop ClientManager patch  http://review.rdoproject.org/r/173111:10
apevecRDO promote is unlikely, HA job has only chance to pass on 1/4 of ci.centos machines11:10
apevecbut weshay said he'll re-run on internal HW then we could promote manually11:11
apevecbased on his result11:11
apevecweshay, ^ let me know how is it going11:11
apevectake latest hash where oooq minimal passed and ha failed11:11
*** trown|outtypewww is now known as trown11:17
apevecsocial, is there LP now for InstanceDeployFailure: Failed to provision instance f6d9453a-d9f1-4206-ac2d-97c6b495a171: Timeout reached while waiting for callback for node 82080397-4963-4aa2-ac35-5f79dabd262e ?11:18
trownactually looks like we had everything except minimal pass, (and that failed on the NodeNotLocked thing), I just tried a "retry" there11:18
apevectrown, crap :)11:18
trownI dont think the multijob will actually continue if it succeeds, but if it passes I am going to promote manually the next phases11:18
apevectrown, see above, we have now OSC/mistralclient in consistent11:18
apevecso if you promote, we need one older hash11:19
*** weshay has quit IRC11:19
trownapevec: right I will promote what ran in last job, not consistent11:19
apevectrown, what about those i-p-a  timeouts? Was it a neutron bug filed?11:19
apevecI see that in logs11:20
apevecthat's 25. TBD ironic/nova timeout from yesterday11:20
trownoh you are right it didnt fail with NodeNotLocked but the other one social was talking about11:22
apevecyeah, is there LP# ?11:23
trownfound it: [11:35:01] <social> trown: /etc/neutron/plugins/ml2/openvswitch_agent.ini ovsdb_interface = vsctl11:24
apevecyes, http://eavesdrop.openstack.org/irclogs/%23rdo/%23rdo.2016-07-25.log.html#t2016-07-25T15:35:0111:25
trownbut I dont see any explanation of what is going on and why that works around it... or if that is just a better default11:25
apevecbut we need neutron bug# so that jlibosva ihrachys can validate it11:25
apevecjlibosva, ihrachys - any more on ^ last I saw from social was " issue with neutron using ovs native instead vsctl,  it causes ovs to timeout and reconnect which should not disrupt anything but it drops all the connections eg dd fails for ironic"11:26
trownnevermind found enough to file a bug11:26
ihrachyswat where who why?11:26
trown[10:25:00] <social> trown: this is issue with neutron using ovs native instead vsctl11:26
trown[10:25:28] <social> trown: it causes ovs to timeout and reconnect which should not disrupt anything but it drops all the connections eg dd fails for ironic11:26
apevectrown, irc logs ++ :)11:27
apevecihrachys, same here :)11:27
ihrachysapevec: please elaborate, I still have to drink my coffee11:27
apevecwe're just copy/pasting what social said, grab him :)11:27
ihrachyswhat is the workaround for?11:28
ihrachyswhat's the issue?11:28
ihrachyslink to a bug or what?11:28
dmsimardapevec, number80, pabelanger: we're still having that meeting in 30 right ? I'm gonna need extra caffeine.11:28
apevecdmsimard, I'll be late up to 30min11:29
apevecnumber80, ^ feel free to start or resched11:29
*** gildub_ has quit IRC11:29
*** gildub has quit IRC11:29
*** gildub has joined #rdo11:30
ihrachysapevec: we need a bug reported if you want anyone to look at it11:30
dmsimardI don't have a huge amount of time, have a pediatrician appointment 2h past the meeting beginning11:30
*** nehar has quit IRC11:30
dmsimardFigured 1h would've been enough11:30
*** Goneri has joined #rdo11:31
dmsimardapevec: think it's important that you're there11:31
trownapevec: ihrachys, https://bugs.launchpad.net/neutron/+bug/1534110 seems maybe related?11:34
openstackLaunchpad bug 1534110 in neutron "OF native connection sometimes goes away and agent exits" [Medium,New] - Assigned to YAMAMOTO Takashi (yamamoto)11:34
apevectrown, hmm, January11:35
*** ushkalim__ has quit IRC11:35
apevecwhy it started hitting us only now?11:35
*** ushkalim_ has quit IRC11:36
trownoh right... k, I will just file a new bug with logs and what information we have11:36
apevecnumber80, dmsimard - prev. meeting got moved to 2pm, so I've now full overlap, please reschedule11:36
trownwell it is racy, and the timeout in the ovs-agent logs is just an RPC timeout probably because the undercloud is very overtaxed11:36
apevectrown, so under-powered CI is actually good :)11:37
apevecexposing all the races11:37
trownso maybe the behavior was there, but we didnt have the issue because the undercloud was less overwhelmed11:37
* trown blames mistral11:37
*** egallen has joined #rdo11:38
*** lucasagomes is now known as lucas-hungry11:38
ihrachystrown: could be.11:40
ihrachysapevec: because we switched to native lately11:40
*** fzdarsky is now known as fzdarsky|lunch11:41
ihrachystrown: please reuse the bug unless you clearly see it's not the same thing11:41
ihrachystrown: once you are done, I think we can bump the bug to High and target to N because it's now the default.11:42
trownihrachys: ok, I will add to current bug... I definitely dont understand enough to know it is not the same thing11:42
ihrachysajo: ^ any news on the bug ? https://bugs.launchpad.net/neutron/+bug/153411011:42
openstackLaunchpad bug 1534110 in neutron "OF native connection sometimes goes away and agent exits" [Medium,New] - Assigned to YAMAMOTO Takashi (yamamoto)11:42
ihrachystrown: does the agent exit for you though?11:43
ihrachysbecause the bug describes the exit of the agent.11:43
*** ade_b has quit IRC11:43
ihrachyswe may need to pull in jlibosva and ajo and otherwiseguy to look at the results you have. so post them first, then we'll try to get folks on board.11:43
trownihrachys: ya I dont see the agent actually exit, just RPC timeout, and at the same time connection Ironic was using to dd image drops11:45
*** jlibosva has quit IRC11:45
apevecihrachys, trown - what do you think about using social's workaround?  /etc/neutron/plugins/ml2/openvswitch_agent.ini ovsdb_interface = vsctl11:47
apevecand where would that fit in tripleo?11:47
trownwe would have to wire it into puppet on the undercloud11:47
*** ushkalim_ has joined #rdo11:48
ihrachysapevec: that's fine for short term but we should not ship N with it11:48
*** ushkalim__ has joined #rdo11:48
ihrachysapevec: because then we diverge from u/s11:48
trownit is not a simple fix to do correctly in instack-undercloud either11:49
apevecyes, only as a workaround to unblock RDO promotion, we're stuck > 2 week due to various issues :(11:49
trownputting a hack in to quickstart would work, but I get a lot of shit for that11:50
apevecI was about to suggest sed or crudini 1-liner :)11:50
apevecok, I've updated issue 25.11:51
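The workaround discussed above (social's `ovsdb_interface = vsctl` in `/etc/neutron/plugins/ml2/openvswitch_agent.ini`) could be scripted roughly as follows. This is a minimal Python sketch, not what was actually deployed: the `[ovs]` section name and the helper `set_ovsdb_interface` are assumptions, since the log gives only the file path and the option. Note that `configparser` drops comments when it rewrites the file.

```python
import configparser


def set_ovsdb_interface(path, value="vsctl", section="ovs"):
    """Set ovsdb_interface in an INI file and return the stored value.

    Hypothetical helper: the section name "ovs" is an assumption,
    the log only names the file and the option.
    """
    # strict=False tolerates duplicate options; interpolation is disabled
    # so that literal '%' characters in other values are left alone.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read(path)  # a missing file is silently treated as empty
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, "ovsdb_interface", value)
    # Rewriting the file loses any comments it contained.
    with open(path, "w") as handle:
        parser.write(handle)
    return parser.get(section, "ovsdb_interface")


if __name__ == "__main__":
    # Path taken from the log; the agent would need a restart afterwards.
    set_ovsdb_interface("/etc/neutron/plugins/ml2/openvswitch_agent.ini")
```

As ihrachys notes, this is only a short-term unblocker, since it diverges from the upstream default.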
*** zoli|lunch is now known as zoli11:51
trownI put my understanding in https://bugs.launchpad.net/neutron/+bug/153411011:51
openstackLaunchpad bug 1534110 in neutron "OF native connection sometimes goes away and agent exits" [Medium,New] - Assigned to YAMAMOTO Takashi (yamamoto)11:51
*** zoli is now known as zoliXXL11:51
trownwould help to have social confirm or deny as I am pretty much just regurgitating from IRC logs11:51
*** rhallisey has joined #rdo11:53
ajoihrachys, was it here where you asked me about https://bugs.launchpad.net/neutron/+bug/1534110  ? , my IRC client crashed11:54
openstackLaunchpad bug 1534110 in neutron "OF native connection sometimes goes away and agent exits" [Medium,New] - Assigned to YAMAMOTO Takashi (yamamoto)11:54
ajoihrachys,  no news11:54
*** richm has joined #rdo11:54
*** paramite is now known as paramite|afk11:55
ihrachysajo: seems like rdo ci hits it11:56
*** pkovar has joined #rdo11:56
ajoihrachys, I guess it's more likely to happen now since we switched to native :/11:57
ihrachysajo: that's why it hits them11:57
ajoihrachys, any idea of what's the rate? I guess we need to bump priority on the bug if we're all native now11:57
ihrachysajo: which sucks because if we don't guard against timeouts, then it's not on par with prev interface11:57
ihrachysajo: I said it before, yes; High and target to N11:58
ajoihrachys, what's the difference between critical and high?11:58
ihrachysajo: though trown is to check it's the same thing and report his logs11:58
ajofrom my point of view, neutron ref impl is broken with this out in the wild11:58
ihrachysajo: critical blocks all development like gate breakage, high is everything else of high importance11:59
ajoso High11:59
trownihrachys: I posted logs... they are not that interesting from ovs-agent side... just RPC timeout at the same time as Ironic connection drops12:00
*** shardy is now known as shardy_lunch12:01
number80lol, I still managed to be the first in the meeting12:01
ajotrown, but RPC is different to openflow12:01
ajotrown, what you reported is something else :)12:02
ajotrown, doesn't it recover from that?12:02
weshaytrown, where did the min job fail.. no tempest errors afaict12:02
trownajo: right, but ironic connection is via iscsi for dd'ing image... and that drops at the same time12:02
ihrachysajo: probably a timeout triggers some reconf in ryu or smth12:03
ihrachysso flows are broken.12:03
ihrachysthat would be my guess for the connection flip12:03
weshaydam.. inventory12:03
trownweshay: it failed on this issue we are discussing ^12:03
trownweshay: I did retry and that is failing the same way, so I am not returning the node so I can hack on it... want to test the proposed workaround there12:04
socialtrown: from the comment they aren't using native12:05
*** rdas has quit IRC12:05
socialajo: are you sure you don't use native in ovs agent? I could only reproduce it with native12:05
*** rdas has joined #rdo12:06
trownsocial: default is definitely native in my conf12:06
socialtrown: we could for now deliver patched config in neutron or make puppet to flip it12:07
*** thrash|g0ne is now known as thrash12:07
number80apevec, dmsimard, pabelanger: meeting rescheduled 3 hours later than original schedule12:08
socialajo: how is the memory on the machines you hit this? do you have some link on failed jobs I could skim through?12:08
number80fbo: ping stable builds12:09
trownsocial: probably meant for me? https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-master-delorean-minimal-411 is most recent example12:09
trownsocial: machines are definitely under heavy load12:09
fbonumber80, pong12:10
dmsimardnumber80: not going to be there :(12:11
dmsimardnumber80: I should be back from an appointment like an hour and a half after that12:12
number80dmsimard: then, try to provide a schedule that fits Alan too12:14
number80I guess today won't work12:16
socialtrown: it's heartbeats lost in amqp?12:18
*** danielbruno has joined #rdo12:19
number80looks like there's a nice bug in webob12:20
socialtrown: Jul 26 12:49:39 undercloud ovs-vswitchd[10666]: ovs|00043|rconn|ERR|br-ctlplane<->tcp: no response to inactivity probe after 5 seconds, disconnecting12:21
*** jlibosva has joined #rdo12:24
*** gildub has quit IRC12:26
trownsocial: looking at a live CI environment now, bottleneck is much more CPU than memory... not even sure we would expect an actual user to have such resource constraints12:27
socialjlibosva: starting 2016-07-26 10:49:28.788  in https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-master-delorean-minimal-411/undercloud/var/log/neutron/openvswitch-agent.log.gz12:27
trownmight try just giving VMs more vcpus12:27
*** ohochman has joined #rdo12:28
socialjlibosva: Jul 26 12:49:39 it loses connection12:30
jlibosvasocial: because of the same reason on flat network?12:31
socialjlibosva: yep the ovs thing12:31
jlibosvasocial: have you tried to run it several times with vsctl interface to be sure it's caused by native interface?12:31
socialjlibosva: in my case vsctl didn't fail not sure if that helps12:33
rdobot[sensu] NEW: master.monitoring.rdoproject.org - check-delorean-newton-current @ http://tinyurl.com/gud2vup |#| Build failure on centos7-master/current: python-openstackclient: http://trunk.rdoproject.org/centos7-master/report.html12:34
jlibosvasocial: it would be good to be sure it wasn't a coincidence, you tried it just once and native fails sporadically, right?12:34
socialjlibosva: I tried vsctl 4 times and it didn't fail, native fails obviously only under heavy load12:34
jlibosvasocial: ok, I thought it was only once12:34
*** julim has quit IRC12:35
*** rodrigods has quit IRC12:35
*** rodrigods has joined #rdo12:35
*** lucas-hungry is now known as lucasagomes12:39
*** jcoufal has joined #rdo12:40
rdobot[sensu] RESOLVED: master.monitoring.rdoproject.org - check-delorean-newton-current @ http://tinyurl.com/gud2vup |#| No build failures detected: http://trunk.rdoproject.org/centos7-master/report.html12:45
*** jeckersb_gone is now known as jeckersb12:46
jlibosvasocial: coming12:50
* social hides under table12:51
*** paramite|afk is now known as paramite12:51
weshayin 4 min12:56
chandankumarapevec, https://bugzilla.redhat.com/show_bug.cgi?id=1318765 and https://bugzilla.redhat.com/show_bug.cgi?id=1355886 any thing more to add on this12:57
openstackbugzilla.redhat.com bug 1318765 in Package Review "Review Request: openstack-sahara-tests - Sahara Scenario Test Framework" [Unspecified,Assigned] - Assigned to apevec12:57
openstackbugzilla.redhat.com bug 1355886 in Package Review "python-horizon-tests-tempest : Tempest Integration of Horizon" [Unspecified,Assigned] - Assigned to apevec12:57
*** dmsimard is now known as dmsimard|afk12:58
*** eliska has quit IRC12:58
*** _elmiko is now known as elmiko13:01
weshaytrown, you avail?13:01
weshaypanda, you are welcome as well13:02
*** linuxgeek_ has joined #rdo13:03
rdogerritFabien Boucher proposed config: Initial commit to activate the stable branch build on CBS  http://review.rdoproject.org/r/172713:03
*** Liuqing has quit IRC13:06
*** linuxaddicts has quit IRC13:07
weshaybkero, mtg13:07
rdogerritMerged config: Initial commit to activate the stable branch build on CBS  http://review.rdoproject.org/r/172713:09
socialjlibosva: I'll ping you in ~20 min with env ready again13:15
*** hynekm has quit IRC13:16
*** Liuqing has quit IRC13:19
pabelangerapevec: number80: dmsimard|afk: apologies for not making the meeting.  Looks like it's in 90mins now?13:34
larsksdmsimard|afk: did you get sorted out with your docker exec question yesterday?  I didn't see it until late last night...13:34
number80pabelanger: well, dmsimard|afk can't13:36
number80so we have to figure out a better timeslot this week13:36
*** fzdarsky|lunch is now known as fzdarsky13:36
* number80 is going to Poland next week and apevec will be on PTO after that13:36
number80as both of you have kids, I suppose that you'll take PTO in august too13:37
rdogerrithguemar proposed openstack/novaclient-distgit: Added py2 and py3 subpackage  http://review.rdoproject.org/r/161813:44
*** links has quit IRC13:44
rdogerritFabien Boucher proposed config: Add newton-distgit-cbs-validate job and activate it for newton-rdo branch  http://review.rdoproject.org/r/173213:47
*** ayoung has joined #rdo13:48
apevecnumber80, pabelanger - dmsimard|afk  is back after 16:30 UTC13:51
number80UTC ?13:52
number80I thought it was EDT13:52
apevecnumber80, that's what he said, lemme check13:52
apevecI should be back around 4:30PM UTC (12:30PM EST).13:52
apevecnumber80, reschedule to that slot13:53
apevecthen we'll see if that works out13:53
number80it should be quick13:53
kbsinghhewbrocca: on the machine side, the only key metric that might be relevant here is that the per-core compute capacity on the amd chassis is 60% of what it is on the intel machines13:53
apevechewbrocca, trown might have a solution: https://review.openstack.org/34737113:53
kbsinghif we can get to using more cores13:54
apeveckbsingh, yep, that's what trown's patch is doing13:54
hewbroccakbsingh: I see, so each blade is less beefy, but there are more blades13:54
hewbroccaOK, more cores13:54
kbsinghapevec: that looks good!13:54
hewbroccaoh, wow13:54
hewbroccaAre we trying it?13:55
*** gszasz has quit IRC13:55
trownhewbrocca: ya I confirmed it worked on a CI node that failed without it13:55
hewbroccatrown: good man13:55
trownhewbrocca: running CI on it to make sure we dont break liberty/mitaka13:55
trownbecause more CPU = more workers = more memory... and we had memory issues in the past13:55
trownthanks kbsingh for the analysis on the AMD CPUs that helped me know how to fix it13:56
kbsinghthere are 2 more things that i am going to work on that should help - (1) bstinson is going to work with facilities to see if we can move ram around and bring up another 64 intel machines with 32gb ram ( we have 64 of these, sitting with 16gb, which is no use )13:56
hewbroccakbsingh: that would be magnificent13:56
kbsinghand (2) i am going to work on changing the fair-spread logic of machine allocation towards a more 'best-available'13:56
*** ekuris has quit IRC13:58
*** hynekm has quit IRC14:03
*** rdo has joined #rdo14:03
*** nyechiel_ has joined #rdo14:04
*** jcoufal has joined #rdo14:06
*** kaminohana has quit IRC14:11
*** limao has joined #rdo14:12
*** limao_ has joined #rdo14:13
*** sdake has joined #rdo14:14
*** limao has quit IRC14:16
*** sdake_ has joined #rdo14:18
*** mengxd has joined #rdo14:20
*** sdake has quit IRC14:20
number80didn't see that we merged https://review.rdoproject.org/r/#/c/1685/ \o/14:23
*** dustins has joined #rdo14:23
*** laron has joined #rdo14:23
* number80 now can build dependencies slightly faster14:23
*** limao_ has quit IRC14:26
*** anshul has quit IRC14:26
*** anshul has joined #rdo14:27
jruzickanice hack :)14:28
jruzickaerr, solution14:28
*** tosky has quit IRC14:30
*** eliska has quit IRC14:30
number80jruzicka: it's a hack, that's why I never dared to merge it myself :)14:34
*** Liuqing has joined #rdo14:34
*** pabelanger has quit IRC14:39
*** pabelanger has joined #rdo14:39
*** pnavarro has quit IRC14:40
jlibosvaihrachys: partially, I'm working with social, he can reproduce it locally14:40
apevectrown, I'll merge 23. under 25. ack? https://etherpad.openstack.org/p/delorean_master_current_issues14:41
trownya, they are similar... my patch for command_timeout did actually help HA job :)14:41
*** zhenguo has quit IRC14:41
jlibosvaihrachys: if you want an update, it's because of ovs interfaces switch. since we switched both in the short period, we don't know which causes it or whether it's the combination14:42
socialjlibosva: I forgot to ping you14:42
ihrachysjlibosva: I just wanted to make sure it has not slipped thru the cracks14:42
jlibosvasocial: I came after 20 minutes but your seat was empty14:42
socialjlibosva: yeah I'm here and env is ready14:43
socialI should have given you shell14:43
jlibosvaihrachys: we're on it, people already spotted they can't install openstack14:43
ihrachysok cool. visibility :P14:43
jlibosvasocial: I'll give you my brand new old public key14:44
*** zoliXXL is now known as zoli|brb14:50
*** aderyugin has quit IRC14:52
*** zoli|brb is now known as zoli14:54
*** zoli is now known as zoliXXL14:54
*** number80 has quit IRC14:55
*** aderyugin has joined #rdo15:00
*** mbound has quit IRC15:07
*** anshul has quit IRC15:12
*** satya4ever has quit IRC15:14
*** nyechiel has joined #rdo15:18
*** READ10 has joined #rdo15:21
*** gkadam has quit IRC15:21
*** tosky has joined #rdo15:25
*** nmagnezi has joined #rdo15:25
*** saneax is now known as saneax_AFK15:29
*** ade_b has quit IRC15:29
*** Liuqing has quit IRC15:29
*** leanderthal is now known as leanderthal|afk15:30
*** smeyer has quit IRC15:37
*** linuxaddicts has joined #rdo15:39
apevecweshay, myoung, trown - just to double-check, you're re-testing internally this hash? https://ci.centos.org/job/rdo-promote-get-hash-master/536/console15:40
weshayapevec, aye15:41
apevecthat's the one where, of all jobs!, only minimal  failed15:41
*** garrett has quit IRC15:41
apevecyeah, ha job was lucky and hit dusty15:42
trownapevec: ya, I actually disable promote, and put that hash manually in get hash job15:42
trownapevec: waiting on CPU patch to pass CI, then will rerun promote with hash from 53615:42
*** akrivoka has quit IRC15:43
rdogerritFrederic Lepied proposed rdoinfo: Initial pin for Oslo and OpenStack clients  http://review.rdoproject.org/r/173015:43
trownapevec: we can also skip image building in that run, since we already have an image for that hash15:43
trownso maybe after lunch we have a promote15:43
*** tesseract- has quit IRC15:43
apevecright, your patch doesn't affect image building15:43
trownyep, just undercloud vm15:44
*** number80 has joined #rdo15:45
*** weshay is now known as weshay_brb15:45
number80seems that CBS is stalled15:49
number80oh no, just SCL jamming the builders15:50
number80One good reason to not like koschei15:50
number80(koschei is generic DLRN-like continuous delivery that uses koji)15:51
apevecnumber80, oh, SCL SIG uses that?15:54
*** panda is now known as panda|bbl15:55
apevecnumber80, flepied - I think we can now merge both  https://review.rdoproject.org/r/#/q/topic:newton-uc15:55
apevecas a starting point, so we can setup new dlrn instance15:55
apevecI'll send update-uc script in followup15:56
flepiedapevec: yes but we need to fix the script to insert 'source-branch'15:56
apevecwith improved projects for filter15:56
apevecflepied, yep, that too15:56
apevecI'll re-run it and it should not make changes to your patchset15:57
flepiedapevec: yep15:57
*** indy21 has quit IRC15:58
*** mosulica has quit IRC15:58
trownok promote kicked now with twice the cpus... https://ci.centos.org/view/rdo/view/promotion-pipeline/ getting lunch15:59
*** laron has joined #rdo15:59
hewbroccawoop woop15:59
hewbroccanice work trown|lunch15:59
weshayapevec, did I lose rights to add repos to redhat-openstack?16:04
*** pgadiya has joined #rdo16:04
*** pgadiya has quit IRC16:05
apevecyou shouldn't have, lemme check16:05
apevecweshay, this guy https://github.com/weshayutin is Owner16:06
apevecnot sure who is under helmet...16:06
*** tumble has quit IRC16:06
*** oshvartz has quit IRC16:07
weshayapevec, :)16:07
weshayweird.. github changed something when forking..16:07
*** nmagnezi has joined #rdo16:08
*** mbound has joined #rdo16:08
apevecspeaking of forks, f25 was branched today16:09
apevecnumber80, jruzicka ^ when is py3 deadline for Fedora?16:10
apevecnumber80, oh, Flock is next week?16:11
number80apevec: I haven't been nagged again but I'm nearly finished16:12
number80currently, I'm waiting CBS resources to be freed so that I can close it all16:13
*** mbound has quit IRC16:13
number80tripleoclient/tripleocommon are the only exception but that needs more work and they're not in fedora16:14
jruzickaooo always is something extra16:14
number80I'm not commenting that :)16:14
jlibosvaihrachys: then again, which timeout did you mean couple of hrs back? Just making sure it's not a different issue16:14
number80I need more coffee16:15
ihrachysjlibosva: my understanding was that a rpc timeout in agent triggered something in ryu that made connectivity blink16:15
jlibosvaihrachys: ack, that sounds like it16:16
*** richm has quit IRC16:16
jlibosvaihrachys: do you know if it happens on jenkins jobs?16:16
ihrachysjlibosva: that's all I knew16:16
kbsinghnumber80: i highly encourage you to report on this with a 'can we increase capacity' for CBS16:17
apeveckbsingh, is there bug# for that?16:18
kbsinghnumber80: we have the ability to increase build capacity in CBS almost 300% if needed16:18
kbsinghapevec: you should be good to go ahead and file one16:18
kbsingh cc: bstinson :)16:18
apevecI'll let number80 file one for CBS16:19
number80kbsingh: well, that's one of the rare occasion I'm waiting, during the w-e, I have as many builders as I want :)16:19
kbsingh( just going by the fact that number80 said things were hectic almost an hour back, and from the looks of it, there are still 70+ jobs in the queue, we should be ok to add more capacity there )16:19
number80300% good16:19
number80I won't complain to have more builders available :)16:19
kbsinghwe have 2 machines there now ( bstinson might correct me ) - and we can add another 6 ( 2 more or less now, 4 more would mean moving things around )16:20
number80well, let's be future-proof and expect more usage of CBS :)16:21
*** hrw has quit IRC16:21
*** vaneldik has quit IRC16:22
*** zoliXXL is now known as zoli|gone16:23
kbsinghnumber80: go ahead and file for more resources, we might not treat it as urgent for now since you think its not urgent, but we can get some more builders added in the next few weeks for sure16:24
kbsinghdid trown|lunch's work to spread load a bit, mean we got a promote ?16:24
kbsingh( or is that still running )16:25
hewbroccakbsingh: it's running now16:25
apevecdamn, one packstack job failed16:25
apevecwe'll need to re-run it16:26
*** ushkalim__ has quit IRC16:26
*** ushkalim_ has quit IRC16:26
apevecdmsimard|afk, when you're back https://ci.centos.org/job/weirdo-master-promote-packstack-scenario001/409/16:26
*** pilasguru has joined #rdo16:27
*** zoli|gone is now known as zoli_gone-proxy16:28
apevechmm, cirros image16:28
apevecbut not during create i.e. d/l error16:28
*** jlibosva has quit IRC16:29
imcsk8apevec: i'll take a look also16:30
*** derekh has quit IRC16:30
apevecimcsk8, looks like a race in Packstack::Provision ?16:31
*** pilasguru has quit IRC16:32
apevecimage create followed by image set16:32
apevecit kept retrying but it wasn't enough16:33
*** spr1 has joined #rdo16:34
*** jpich has quit IRC16:35
*** mcornea has quit IRC16:35
*** chem has quit IRC16:37
apevecimcsk8, also what dmsimard|afk and I discussed yesterday, we could use cached cirros in packstack jobs, like puppet jobs: https://github.com/openstack/puppet-openstack-integration/blob/6c7cb5a39d2b54a222605d9caf9973e8059e34e5/run_tests.sh#L15316:39
number80kbsingh: done https://bugs.centos.org/view.php?id=11223 I can't change assignee or add bstinson as CC though16:40
* number80 only has ticket creation + commenting perms in mantis16:40
*** florianf has quit IRC16:42
*** lucasagomes is now known as lucas-dinner16:44
apevecimcsk8, hmm, this is all within glance_image ?16:45
imcsk8apevec: that sounds cool16:45
apevecthat's what packstack is using right?16:45
*** chandankumar has joined #rdo16:45
apevecnumber80, getting f25 spam from f25 too?16:46
apevecerr from pkgdb16:46
*** danielbruno has quit IRC16:46
imcsk8apevec: packstack uses this one: https://github.com/openstack/packstack/blob/master/packstack/plugins/provision_700.py#L3316:46
* apevec needs to tune his fedora-notifs16:46
*** dmsimard|afk is now known as dmsimard16:47
apevecimcsk8, yep, we'd need to check if that works with file://16:47
apevecbut also, this race should be solved in puppet-glance16:48
number80apevec: not yet but I suspect that fedmsg is lagging16:48
EmilienMwhat race?16:48
EmilienMis there a launchpad somewhere?16:48
imcsk8apevec: i'll check, it does not seem very problematic16:48
apevecEmilienM, https://ci.centos.org/artifacts/rdo/weirdo-master-promote-packstack-scenario001/409/packstack/logs/latest/manifests/
apevecEmilienM, not yet16:48
apevecimage set after image create16:49
apevecrelies on retries16:49
apevecand we could mask it by using locally cached cirros image I guess16:49
*** ccamacho is now known as ccamacho|awawawa16:50
*** ccamacho|awawawa is now known as ccamacho|away16:50
dmsimardapevec, pabelanger, number80: I'm back16:50
apevecbut right now it just failed in rdo promotion https://ci.centos.org/view/rdo/view/promotion-pipeline/job/rdo-delorean-promote-master/539/16:50
dmsimardapevec: yeah the cirros image is starting to be a bit troublesome16:50
apevecdmsimard, welcome back!16:50
dmsimardapevec: I saw it as well16:51
apevecdmsimard, it will be fun it oooq pass in that run :)16:51
apevecit -> if16:51
dmsimardapevec: we can definitely cache it in review.rdo but I don't see how we could cache it in ci.centos since they're "virgin installs"16:51
dmsimardapevec: I wonder if the cloud SIG could host it16:51
number80dmsimard, apevec, pabelanger: then ready for a quick call?16:52
apevechmm, we could package it as RPM ?16:52
apevecnumber80, let's16:52
dmsimardapevec: not a crazy idea .. it's not that big16:52
kbsinghif you need to pre-seed content, you can host it on the artifacts machine, then rsync it down before kicking stuff off16:52
kbsinghhumm the entire image as an rpm ?16:52
dmsimardkbsingh: cirros is pretty small16:52
dmsimardsmaller than the kernel-dev packages :p16:53
kbsinghhave you guys seen centos-min ? i believe its down to 70MB16:53
apeveckbsingh, url?16:53
pabelangernumber80: sure16:53
apevecsounds interesting16:53
number80kbsingh: that's awesome16:53
number80(I'm saying that as a former fedora cloud guy)16:53
*** jhershbe has quit IRC16:53
dmsimardnumber80, pabelanger, apevec: ok let's do the meeting then, meet you in the BJ in 1 min16:54
*** pilasguru has joined #rdo16:54
kbsinghapevec: https://github.com/cgwalters/centos-dockerbase-minimal is a good place to start from16:54
kbsinghapevec: we can trick it down even further, just need to be careful what we call it16:54
number80ah , it's using cgwalters libhif yum-min clone16:54
number80if you remove yum and cloud-init, you can win a lot of space16:55
kbsinghclone as in, it does 'yum install'16:55
*** rdas has joined #rdo16:55
number80well, for CI usage, you don't need much more16:55
pabelangerdmsimard: URL?16:56
dmsimardpabelanger: PM16:56
*** laron has quit IRC17:09
*** jtomasek has quit IRC17:10
*** shardy has quit IRC17:13
*** trown|lunch is now known as trown17:14
*** apevec has quit IRC17:16
larsksmetabsd: generally, that just means increasing the space available to whatever contains /var/lib/nova/instances (or mounting a new filesystem at that location).17:29
*** READ10 has quit IRC17:30
*** rdas has quit IRC17:30
*** vaneldik has left #rdo17:36
*** pkovar has quit IRC17:37
EmilienMapevec_: new error on trunk http://paste.openstack.org/show/ilg5IMSvyUycj3JwbdFb/17:38
*** spr1 has quit IRC17:38
*** spr1 has joined #rdo17:38
apevec_@epel ?17:41
*** hynekm has joined #rdo17:42
number80I'm building nodejs 4.4.717:43
number80in RDO -testing17:43
*** spr1 has quit IRC17:44
*** jhershbe has quit IRC17:44
dmsimardtristanC: are you there or I guess moving ?17:44
*** gfidente is now known as gfidente|afk17:44
*** READ10 has joined #rdo17:47
*** ihrachys has quit IRC17:47
*** spr1 has joined #rdo17:47
*** Tenhi has joined #rdo17:48
*** toanju has joined #rdo17:50
EmilienMnumber80: that's why it broke or?17:50
trownapevec_: upstream still uses epel17:50
trownwe wont hit that in RDO17:50
*** imcleod has quit IRC17:50
*** DaveJ__ has quit IRC17:50
dmsimardepel ? why ?17:51
*** spr1 has quit IRC17:51
*** spr1 has joined #rdo17:51
apevec_https://ci.centos.org/view/rdo/view/promotion-pipeline/job/rdo-delorean-promote-master/539/ - only packstack failed!17:52
*** apevec_ is now known as apevec17:52
trownapevec: ya rerun is queued up waiting on executor17:52
*** spr1 has quit IRC17:52
dmsimardcirros image failure again ..17:52
*** mvk has joined #rdo17:52
trownI am just going to rerun that hash until it passes17:52
*** spr1 has joined #rdo17:52
apevectrown, promote or die!17:53
trownwithout building images it is only about 90 minutes a run17:53
trownwe will get it17:53
dmsimardat that point we should probably promote manually.. lol17:53
*** tosky has quit IRC17:53
trownI guess17:53
apevecyeah, let's promote that hash17:53
dmsimardwhat hash is it17:53
apevecbut keep the current run17:53
dmsimardI can do17:53
apevecmaybe we get lucky in packstack :)17:53
apevecdmsimard, ah found that old dlrn HA proposal: https://github.com/javierpena/delorean-instance/blob/master/docs/delorean-instance.md it's different story17:54
pabelangerdmsimard: if you want to create an account on launchpad.net and enroll it into openstack.org, that will be the user zuul.rdoproject.org can use17:54
apevecdmsimard, but let's keep that for the next week when jpena is back17:54
trowndmsimard: 96/af/96af93d36151b04fc82c5c1db3615311bfda544e_d13b9e3817:54
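The path trown pasted appears to be built from the build identifier itself: the first two characters, the next two, then the full identifier. A small sketch of that layout, assuming this convention holds in general (it is inferred from this one example); `dlrn_repo_path` is a hypothetical helper name.

```python
def dlrn_repo_path(build_id: str) -> str:
    """Build a 'xx/yy/<build_id>' style path from a '<commit>_<short>' id.

    The layout is inferred from the single example in the log and is an
    assumption, not a documented guarantee.
    """
    commit_hash = build_id.split("_", 1)[0]
    if len(commit_hash) < 4:
        raise ValueError("commit hash too short to derive directory prefix")
    return f"{commit_hash[:2]}/{commit_hash[2:4]}/{build_id}"


print(dlrn_repo_path("96af93d36151b04fc82c5c1db3615311bfda544e_d13b9e38"))
```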
EmilienMnumber80: can you let us know if the nodejs failure is on your side?17:55
dmsimardapevec: yeah that's no good17:55
trownI will promote the image17:55
EmilienMall tripleo CI jobs are failing atm17:55
*** iberezovskiy is now known as iberezovskiy|off17:55
*** Tenhi has quit IRC17:55
trownEmilienM: node is coming from EPEL... rdo doesnt manage EPEL17:55
dmsimardit's sqlalchemy17:55
*** degorenko is now known as _degorenko|afk17:55
trownEmilienM: this comes up once every couple months... but upstream wants EPEL so this is what happens17:55
EmilienMtrown: ok so something broke in EPEL17:55
dmsimardso we could have multiple "workers" for a release17:55
dmsimardpabelanger: ack17:56
EmilienMtrown: have we tried to deploy tripleo CI without epel?17:56
EmilienMtrown: Puppet CI doesn't deploy EPEL at all and it works fine17:56
*** priteau has joined #rdo17:57
apevecdmsimard,  we also need central "repo" mashing place, but yeah central DB too17:57
trownEmilienM: RDO quickstart image does not have EPEL either17:57
*** jhershbe has joined #rdo17:57
*** spr1 has quit IRC17:57
trownit is totally possible to not use EPEL, but there are hardcoded dib things that set it up17:57
apevecEmilienM, number80 - Updated By: 1:libuv-1.8.0-1.el7.x86_64 (delorean-newton-testing)17:58
apevecthat's new dep for nodejs update that number80 is doing17:58
number80yeah, it's slowly coming17:59
apevecso it's a mix which is breaking17:59
apevecepel on its own should work17:59
*** hynekm has quit IRC17:59
apevecEmilienM, but what requires nodejs? which job is that?17:59
number80I need this one before having 4.4.117:59
*** dtrainor has quit IRC17:59
number80jobs are launched sequentially, but CBS queue is full17:59
number80well, was full18:00
EmilienMapevec: tripleo18:00
trownEmilienM: this is all that is required to not use EPEL https://github.com/redhat-openstack/ansible-role-tripleo-image-build/blob/master/templates/dib-prepare-centos7-default.sh.j2#L29-L3618:00
EmilienMtrown: https://review.openstack.org/34749918:00
trownI have brought it up before though and folks dont seem interested18:00
EmilienMtrown: i AM interested.18:01
trownEmilienM: ya you will have to remove it from the DIB elements though18:01
EmilienMtrown: can you look at my patch and tell me if I missed something?18:01
number80EPEL must die # do not take this seriously!18:01
*** panda|bbl is now known as panda18:01
EmilienMtrown: can you help me with that?18:01
socialtrown: I was on training any update on the neutron issue?18:02
trownEmilienM: /usr/share/diskimage-builder/elements/base/install.d/99-dkms is the tricky one as it is in the base element18:02
trownso my easy hack of just removing it probably will not fly upstream18:02
*** ppowell has joined #rdo18:03
number80trown: worst-case would be adding it in -common18:03
number80or have a separate repo for upstream purposes18:03
trownsocial: I increased the number of vCPUs of the undercloud VM to not create the issue18:03
trownsocial: since it only happens when things are bogged down18:04
* number80 running low on energy18:04
trownsocial: not a solution, but fixes CI18:04
EmilienMtrown: where in ooo-ci could I apply your patch?18:04
trownEmilienM: not sure we can apply that in ooo-ci, as we get the "undercloud image" from infra (it is just infra centos image)18:05
*** anilvenkata has joined #rdo18:06
trownEmilienM: which is built by DIB18:06
socialtrown: and it worked?18:06
trownsocial: ya, it seems to have improved the pass rate pretty dramatically18:06
trownfrom 0 to non-zero for sure18:06
trownso infinite percent increase18:06
social^_^ ok, I'll try to keep looking into this18:06
*** leanderthal|afk has quit IRC18:07
*** jlibosva has joined #rdo18:07
trownEmilienM: checking your patch though, I will help get it removed from tripleo-ci if we dont get blocked18:07
EmilienMlunch now18:08
*** laron has joined #rdo18:08
*** dtrainor has joined #rdo18:12
*** gszasz has quit IRC18:23
*** jprovazn has quit IRC18:27
*** chandankumar has quit IRC18:28
*** spr1 has joined #rdo18:30
*** jubapa_ has joined #rdo18:31
*** sdake has quit IRC18:35
*** laron has quit IRC18:35
dmsimardapevec: we've reached the disk space threshold again on internal dlrn18:42
*** mlammon|afk is now known as mlammon18:43
*** aortega has quit IRC18:43
*** weshay has quit IRC18:49
*** jubapa_ has quit IRC18:54
*** dpeacock has quit IRC18:56
*** itlinux has joined #rdo18:56
*** abregman has joined #rdo19:12
*** rain has joined #rdo19:12
rdogerritMerged openstack/novaclient-distgit: Added py2 and py3 subpackage  http://review.rdoproject.org/r/161819:17
*** Guest90133 has quit IRC19:18
*** spr1 has quit IRC19:25
*** spr1 has joined #rdo19:25
*** nyechiel has quit IRC19:26
*** spr1 has quit IRC19:29
*** Guest90133 has joined #rdo19:30
*** livelace has quit IRC19:30
*** milan has joined #rdo19:32
*** sdake has joined #rdo19:36
*** dgurtner has joined #rdo19:36
*** dgurtner has joined #rdo19:36
metabsdHow can I assign a physical network card to my virtual switch?19:36
*** imcleod has quit IRC19:39
*** spr1 has joined #rdo19:41
*** weshay has joined #rdo19:42
*** dgurtner has quit IRC19:42
EmilienMnumber80: any update on node?19:43
EmilienMtripleo ci is broken now19:43
number80EmilienM: currently on it19:45
number80(it's good enough but I prefer building latest 4.4.7 which requires refreshing patches)19:46
EmilienMnumber80: please ping me when it's done so I can test it19:47
slagletrown: if we can remove epel, let's do it19:48
slaglei dont know that anyone is necessarily opposed to it19:49
trownslagle: I think between my patch and EmilienM's it will be removed19:49
slaglemaybe we can just enable it on demand, such as for this bmc image19:49
trownslagle: I am not even sure it is needed there19:49
trownpython-crypto is packaged19:49
trownapevec: maybe it didnt at some point... it is definitely in delorean-deps now19:52
bnemecapevec: http://paste.openstack.org/show/542169/19:52
apevecbnemec, yeah, replace it with python-crypto :)19:53
apevecpython2-* is provided by python-*19:53
apevecerr, other way around :)19:53
apevecso yum install python-crypto should always work19:53
apevecepel or without19:54
bnemecYeah, I'll look into changing it.19:57
bnemecFor the moment I think we should just leave it alone though.  There's no runtime dependency there, it only gets used if we rebuild the bmc image.19:57
bnemecWhich needs to happen pretty much never. :-)19:57
apevecjust curious, what is that image?19:59
bnemecIt's the fake BMC that we use in CI for providing IPMI control of OpenStack instances.19:59
trownapevec: how long does it typically take from promotion to showing up on http://buildlogs.centos.org/centos/7/cloud/x86_64/rdo-trunk-master-tested//20:00
apevectrown, dmsimard - btw what failed in https://ci.centos.org/view/rdo/view/promotion-pipeline/job/rdo-delorean-promote-master/540/ run?20:01
*** coolsvap_ has quit IRC20:01
apevecha, puppet, packstack :(20:01
apevectrown, that was w/ increased vcpu#  ?20:02
imcsk8apevec: checking20:02
dmsimardpuppet-openstack-scenario002 TestNetworkBasicOps.test_network_basic_ops20:02
trownapevec: ya20:02
apevecso still not enough for HA job? :(20:02
dmsimardapevec: looks like a oslo.privsep stacktrace20:03
dmsimardfirst time I see one of those: https://ci.centos.org/artifacts/rdo/weirdo-master-promote-puppet-openstack-scenario002/408/puppet-openstack/logs/nova/nova-compute.txt.gz20:03
trownmaybe not 100% of the time, but I bet it will pass sometimes on the slower chassis now20:03
*** ppowell has quit IRC20:03
trownwhich is an improvement :P20:03
apevecit is, but we have other odds to chase, so better make this 1.0 factor :)20:04
dmsimardapevec: looks like we could be hitting https://bugs.launchpad.net/oslo.privsep/+bug/159374320:07
openstackLaunchpad bug 1593743 in oslo.privsep "(privsep) u'systool -c fc_host -v' failed. Not Retrying." [Undecided,New] - Assigned to Walt Boring (walter-boring)20:07
trownsounds boring20:08
dmsimardsounds like a trunk lib issue20:08
dmsimardEmilienM: fyi ^20:08
EmilienMsounds super boring20:09
apevecreported month ago...20:09
*** akshai has quit IRC20:09
dmsimardadded a comment, I'll try and hunt down that mister boring20:11
dmsimardI'll add that to both etherpads ..20:13
*** fragatina has joined #rdo20:15
dmsimardhm, turns out it might just be a logging issue20:15
apevec"That dropped off after we blacklisted the os-brick 1.4.0 version from global-requirements which contains the changes for using oslo.privsep."20:15
apevecoh, red herring?20:15
dmsimarddunno, I'll look some more20:16
apevecyeah, real ERROR at the end of compute log is ConnectTimeout: Request to https://[::1]:9696/v2.0/ports.json timed out20:17
dmsimardapevec: well, the actual tempest traces are http://paste.openstack.org/show/542172/20:18
*** jeckersb is now known as jeckersb_gone20:18
dmsimarddigging in the neutron logs is like searching for a needle in a haystack though20:19
weshaymyoung, fyi.. https://review.openstack.org/#/c/346733/20:19
metabsddoes neutron use openvswitch?20:20
*** zeroshft has quit IRC20:22
dmsimardapevec: right, I see the ports.json timeout now20:22
dmsimardmetabsd: neutron is a network abstraction, it has several backends for networking. openvswitch is one of those, linuxbridge is another one.20:23
metabsddmsimard: I'm trying to access my instance over the network. I set up packstack --allinone. I'm trying to find out how I can attach a physical network to my ovs. I think --allinone uses OVS. Am I right?20:24
imcsk8metabsd: yes20:25
number80EmilienM: ok, untagging libuv, I have to patch nodejs to use openssl 1.0.120:26
number80this is no trivial change20:27
number80doable but not trivial20:27
EmilienMnumber80: k20:27
apevecnumber80, untagging won't remove it from buildlogs20:27
number80the hardest part being that openssl has shitty documentation20:27
number80apevec: haven't we changed that?20:28
apevecit's still accumulating afaict20:28
*** abregman has quit IRC20:29
apevecbut, how does nodejs get into tripleo-ci ?20:29
apevecthat should be only for tripleo-ui jobs?20:30
number80I thought that metadata would be regenerated20:30
number80well, I may fix it tonight, but not 100% sure20:30
apevecuntag it for now, and we'll see20:31
dmsimardapevec: can't find a legit issue with that puppet job, it looks like it may just be a flap - not seeing it anywhere else. I'll wait and see if it reproduces.20:31
dmsimardapevec: I noted the privsep stacktrace in the etherpad nonetheless20:31
apevecyeah, must be regular flapping, it worked w/ same hash run before20:32
*** jtomasek has joined #rdo20:40
*** jtomasek has quit IRC20:41
*** jtomasek has joined #rdo20:41
imcsk8dmsimard: i found this on the console log, i'm not sure if it's relevant: https://paste.fedoraproject.org/396004/46956581/20:44
*** imcleod has quit IRC20:50
rdogerritMerged rdoinfo: Normalize rdo.yml  http://review.rdoproject.org/r/172920:58
*** KarlchenK has quit IRC21:00
*** julim has quit IRC21:02
*** KarlchenK has joined #rdo21:02
trown|outtypewwwweshay: so green https://dashboards.rdoproject.org/rdo-dev21:03
*** unclemarc has quit IRC21:03
weshaywha hoooo21:03
weshayjschlueter, ^21:03
*** rlandy has joined #rdo21:07
*** KarlchenK has quit IRC21:08
*** iranzo has quit IRC21:13
apevectrown|outtypewww, it hurts my eyes21:14
*** KarlchenK has joined #rdo21:14
apevecalso green with "1 issue" :)21:14
*** iranzo has joined #rdo21:14
apevecah systool thing21:14
apevecchanged to TBD21:15
*** dustins has quit IRC21:24
*** dgurtner has joined #rdo21:32
*** rhallisey has quit IRC21:41
*** jubapa_ has joined #rdo21:44
number80good progress with node => https://cbs.centos.org/koji/watchlogs?taskID=10301821:44
number80I have patches that build21:44
*** ayoung has quit IRC21:45
*** ccamacho|away has quit IRC21:46
*** jmelvin has quit IRC21:51
*** apevec has quit IRC21:56
*** gildub has joined #rdo21:57
*** sdake has quit IRC22:09
*** Alex_Stef has quit IRC22:15
number80EmilienM: is your CI still broken?22:17
number80I have working build of node.js 4.4.7 \o/22:24
number80found a job failing due to libuv/nodejs22:30
*** iranzo has quit IRC22:33
*** limao has joined #rdo22:38
*** rpioso has quit IRC22:44
*** sdake has joined #rdo22:47
*** sdake has quit IRC22:52
*** morazi has quit IRC22:55
*** egafford has quit IRC23:18
*** limao has quit IRC23:21
itlinuxhello all, has anyone done the undercloud in an HA mode?23:22
*** thrash is now known as thrash|g0ne23:38
*** fragatina has quit IRC23:52