Tuesday, 2016-08-16

*** markvoelker has joined #openstack-infra00:00
*** tkelsey has joined #openstack-infra00:00
*** jimbaker has joined #openstack-infra00:00
*** jimbaker has quit IRC00:00
*** jimbaker has joined #openstack-infra00:00
cloudnullpabelanger mordred is there a way we could store the console log from a vm if it's marked error by nodepool?00:01
pabelangercloudnull: Ya, we don't have a way today to keep a node online with ready-script failure. Maybe jeblair has some thoughts on that00:01
cloudnullIE boot, if fail store console log, delete?00:02
clarkbnow thats interesting, ubuntu does run an ntpdate on if up00:02
clarkbI wonder if its failing to resolve dns at that point due to unbound not being up?00:02
pabelangerwe could make our configure_mirror.sh script smarter00:02
*** Goneri has quit IRC00:03
*** spzala has joined #openstack-infra00:03
cloudnullI guess I could enable deferred delete for a while and try to trap the log of instances.00:04
openstackgerritIan Wienand proposed openstack-infra/project-config: Further F24 kernel update  https://review.openstack.org/35378300:04
clarkbits tempting to just go back to ntpdate, deprecation or not, there doesn't seem to be any other sane tools to do this00:04
fungiclarkb: not all of our providers (/me glares) provide nova console log access, so we haven't relied on it in nodepool previously00:05
fungier, cloudnull ^00:05
fungisorry, clarkb00:05
* fungi failx0rz at teh tabcompletes00:05
*** tkelsey has quit IRC00:05
* cloudnull knows who to glare at...00:05
mordredcloudnull: we have the ability to hold nodes on error in nodepool - but it currently only works on job names00:06
fungiso, yes, it's possible obviously. nodepool calls openstack apis, nodepool logs things, nodepool could call another api method and log the results00:06
fungiit's "just" a matter of code, as they say00:06
* mordred muses having a feature of being able to grab an error node from a provider rather than a job00:06
*** piet_ has joined #openstack-infra00:07
*** baoli has joined #openstack-infra00:07
cloudnullI think for now i'll set the reclaim_instance_interval w/in the nova.conf to something like an hour or so.00:07
mordredfungi: it would not be difficult "just code" to attempt a console log grab on node boot error00:08
fungiagreed00:08
cloudnullthen next time we have an ssh timeout let me know and I can go look at the things.00:08
mordredfungi: I'm cooking meat now - but I can make that patch tomorrow00:08
fungiyeah, was going to say, as long as it ends up on someone's "just code" list, that's the tricky bit00:09
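A minimal sketch of the "just code" idea discussed above: on a boot error, try to capture the nova console log, then delete the server. It assumes a shade-style cloud object exposing get_server_console(server) and delete_server(name_or_id); the function and logger names are hypothetical and this is not nodepool's actual code. Because not every provider offers console log access, a failure to fetch the log is logged and ignored rather than blocking the delete.

```python
import logging

log = logging.getLogger("nodepool.sketch")  # hypothetical logger name


def cleanup_failed_node(cloud, server_id):
    """Try to save the console log of a failed node, then delete it.

    Assumes `cloud` is a shade-style client with get_server_console()
    and delete_server(). Some providers do not expose console logs, so
    any error while fetching the log is logged and swallowed.
    """
    console = None
    try:
        console = cloud.get_server_console(server_id)
    except Exception as exc:  # provider may not support the API at all
        log.warning("Could not fetch console log for %s: %s", server_id, exc)
    else:
        log.info("Console log for %s:\n%s", server_id, console)
    # Delete the node whether or not the console log was available.
    cloud.delete_server(server_id)
    return console
```

Returning the console text (or None) lets a caller decide whether to archive it somewhere more durable than the daemon log.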
clarkbianw: pabelanger so I am open to ideas, but even using eg chrony on centos/fedora and ntp on ubuntu/debian isn't going to fix this for us I don't think00:09
*** tqtran has quit IRC00:09
clarkbianw: pabelanger since we will continue to run into the problem of slowly skewing time rather than making a step at boot to avoid that00:09
fungispeaking of "just code" third batch of contributor registration discount codes for barcelona just finished going out. 270 in ~3 weeks00:10
*** PalTale has joined #openstack-infra00:10
ianwclarkb: yes, i don't really see ntpdate actually being deprecated, despite what it says.  the RH maintainer tells people it's about the only sane way to start ntp00:10
fungiused latest state of 263971 for that (i also did lots of additional validation of results against older data to make sure it did what was expected of it)00:10
fungiianw: though the rh maintainer also said not relying on ntp was even saner on rh-derivatives since it's no longer default00:11
clarkbianw: even using chronyd you have to do non default things to make it actually step from my reading00:11
clarkbbasically the time sync services as implemented by these distros don't solve this problem00:12
clarkbwhich is annoying00:12
harlowjamordred ' You might even come to the conclusion that my personal preferences00:12
harlowjaor needs are not the most important thing. I'00:12
harlowjabut they are!00:12
harlowjaha00:12
mordredharlowja: :)00:12
harlowjaas long as your preferences are my preferences00:12
harlowjalol00:12
*** spzala has quit IRC00:12
openstackgerritChris Krelle proposed openstack/diskimage-builder: WIP: A hardware burn-in element.  https://review.openstack.org/35567500:13
ianwfungi: yes, that too00:13
mordredharlowja: listen - you are entitled to your own wrong opinion00:13
fungiclarkb: i wonder if the openntpd package for debian/ubuntu has a config option to start with -s00:13
harlowjamordred  not if donald gets elected, lol00:13
mordredharlowja: I believe it's a basic human right00:13
* mordred steps away from election talk ...00:13
harlowjahahahahha00:13
ianwclarkb: i believe "chronyc makestep" is the ntpdate equivalent00:14
* harlowja goes right into the deep end00:14
openstackgerritAbhishek Raut proposed openstack-infra/project-config: Use python-db-jobs for tap-as-a-service  https://review.openstack.org/35567000:14
clarkbianw: yup but that doesn't happen for us on boot00:15
clarkbianw: so we would have to write our own service to do it or otherwise hack it in00:15
openstackgerritMerged openstack-infra/system-config: Fix firehose hostname on cacti hiera  https://review.openstack.org/35567100:19
ianwclarkb: the config is00:22
ianw# In first three updates step the system clock instead of slew00:22
ianw# if the adjustment is larger than 1 second.00:22
ianwmakestep 1.0 300:22
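The `makestep 1.0 3` directive quoted above tells chronyd to step (jump) the clock instead of slewing it, but only when the correction is larger than 1.0 second and only during the first three clock updates. A small sketch that models that decision rule for illustration; it mirrors the documented behaviour and is not chrony's own code.

```python
def step_or_slew(offset_s, update_number, threshold_s=1.0, limit=3):
    """Model chrony's `makestep <threshold> <limit>` decision.

    update_number counts clock updates starting from 1. The clock is
    stepped only if the absolute offset exceeds the threshold and the
    update falls within the first `limit` updates; otherwise it is slewed.
    """
    if update_number <= limit and abs(offset_s) > threshold_s:
        return "step"
    return "slew"


# A VM that boots 40 s off is stepped on its first update ...
print(step_or_slew(40.0, 1))   # step
# ... but the same offset seen on the tenth update is only slewed.
print(step_or_slew(40.0, 10))  # slew
```

This is the "step once at boot, then slew" behaviour the ntpdate-then-ntpd approach provides.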
clarkbianw: that's the chronyd default config on centos/fedora?00:23
*** Hal has quit IRC00:23
ianwclarkb: yes00:23
clarkbah ok00:23
ianwso, of course systemd is in the mix here00:23
clarkbyes systemd has its own service to do syncing00:24
clarkbbut it has almost no docs00:24
ianwoh, that's not in use, but i think chrony does have network detection service bits00:24
clarkbianw: its in use on ubuntu xenial :/00:24
ianwparticularly http://pkgs.fedoraproject.org/cgit/rpms/chrony.git/tree/chrony-dnssrv@.service00:24
*** Swami has quit IRC00:24
clarkbby default00:24
openstackgerritYAMAMOTO Takashi proposed openstack-infra/project-config: networking-midonet: switch to python-db-jobs  https://review.openstack.org/33555100:25
*** gildub has joined #openstack-infra00:26
ianwclarkb: ah ... so now we have 3 methods to set the time00:27
clarkbianw: indeed :(00:27
*** fitoduarte has quit IRC00:27
clarkbianw: though I am somewhat partial to just using one across the board if it can be made to work sanely00:27
clarkbchronyd seems fine except ubuntu doesn't seem to have that makestep setup that centos/fedora do00:27
*** thorst_ has joined #openstack-infra00:28
clarkbI wonder if we can configure that via the default file somehow00:28
*** piet_ has quit IRC00:29
*** woodster_ has quit IRC00:29
*** signed8bit is now known as signed8bit_Zzz00:30
ianwclarkb: just looking at the deb packaging now...00:31
clarkbianw: I did confirm that an ubuntu-minimal build of xenial boots up with the systemd service running00:31
clarkband trusty doesn't have anything00:32
*** gildub_ has joined #openstack-infra00:32
clarkbianw: I think the ideal situation would be to have on each distro we run something that does the similar case to ntpdate first then ntpd. Then completely remove ntp munging from devstack-gate. Sounds like centos/fedora do this with chrony, so need to figure out ubuntu/debian option that works00:33
*** xarses has quit IRC00:34
clarkbmy reading of the ntp setup on ubuntu/debian is that it will try to ntpdate on if-up but that doesn't seem to be working for us? Maybe because of a race between unbound and networking coming up resolving the ntp servers00:34
*** gongysh has joined #openstack-infra00:36
clarkbianw: and the chrony package on ubuntu will do a burst but not a step from my reading of scripts00:38
ianwclarkb: yeah, so config in https://launchpad.net/ubuntu/+archive/primary/+files/chrony_2.1.1-1.debian.tar.xz doesn't specify makestep as you say.  to me, a bug saying "redhat does it, it would be nice to be consistent and it's probably what you want anyway" might be ok00:38
ianwbut time people also seem very, ahh, set in their ways00:38
ianwso i expect that might also be closed with a flame to boot00:39
* fungi always boots in flames00:39
fungiand in flaming boots00:39
clarkbmight be worth attempting to trace the normal ntp boot up and see if ntpdate is in fact running and if it is failing due to other deps not being there at boot00:41
clarkbI have some local VMs I can use to try and attempt that but need to go to dinner now00:41
*** rbuzatu has joined #openstack-infra00:41
ianwclarkb: i have an osic vm i've been pottering on for f24.  let me rebuild that with a ubuntu image and see if anything pops out00:42
cloudnullclarkb mordred pabelanger I have deferred deletes enabled now. if at all possible I'd love to know the next time an instance has ssh issues so i can go hunt down those specific failures and such.00:45
pabelangercloudnull: sure, I can check now00:46
*** rbuzatu has quit IRC00:46
*** amotoki has joined #openstack-infra00:47
pabelangercloudnull: 8098e5c0-125f-4fda-9887-496f8f7fdf7d00:48
pabelangercloudnull: just failed00:48
*** jamielennox is now known as jamielennox|away00:48
cloudnullok00:48
ianwclarkb: ok, so with ntpdate not in the base image, it's not starting on boot for sure00:49
*** spzala has joined #openstack-infra00:49
*** jamielennox|away is now known as jamielennox00:49
*** tonytan4ever has joined #openstack-infra00:50
pabelangercloudnull: 7b3e102d-3f32-4a76-9e44-abf0b42dad4d is another00:50
ianwclarkb: and when it is there, it is called in the network ifup scripts, but -> Aug 16 00:49:35 iwienand-f24-test ntpdate[816]: Can't find host 3.debian.pool.ntp.org: Name or service not known (-2)00:50
*** csmart has quit IRC00:52
*** csmart has joined #openstack-infra00:53
cloudnullpabelanger: idk if its related but both of those instances are 16.04? do we generally see these ssh failures more on 16.04 than not?00:54
cloudnullor is 16.04 just what's more common now?00:55
pabelangercloudnull: let me check, I have logs.  we are doing more and more xenial00:55
cloudnullalso both are using config_drive, is that the default?00:56
cloudnulli'd like to spin up lots of tests to reproduce this issue without continuing to bother you :)00:56
*** fguillot_ has quit IRC00:56
openstackgerritfumihiko kakuma proposed openstack-infra/devstack-gate: Enable to add sudo permission to tempest user  https://review.openstack.org/35568200:57
*** gyee has quit IRC00:59
*** fguillot_ has joined #openstack-infra00:59
pabelangercloudnull: you are correct, it looks to be only xenial failing01:00
pabelangercloudnull: let me manually launch one and see why01:00
*** rbuzatu has joined #openstack-infra01:04
*** aeng has quit IRC01:04
*** gongysh has quit IRC01:05
*** aeng has joined #openstack-infra01:05
*** zhurong has joined #openstack-infra01:05
ianwclarkb: so here's how i think it goes on trusty.  ./network/if-up.d/ntpdate gets called by ifup ... but dhclient is still working at that point.  that's ok, because ./dhcp/dhclient-exit-hooks.d/ntpdate will be called when we actually have network01:05
ianwclarkb: none of this happens on boot of our trusty images, because ntpdate isn't installed01:05
*** esberglu has joined #openstack-infra01:06
ianwwhich is probably the fault of puppet-ntp ... i don't think ntpdate is really an optional component01:07
*** pahuang has joined #openstack-infra01:07
*** rbuzatu has quit IRC01:08
*** tqtran has joined #openstack-infra01:09
clarkbaha!01:09
*** baoli has quit IRC01:10
*** julim has joined #openstack-infra01:10
pabelangerclarkb: cloudnull: okay, reproduced the failure of host git.openstack.org in osic-cloud1.  I think we have a race condition, if I ran the command 1min later, it worked01:11
*** adrian_otto has quit IRC01:11
clarkbor maybe a NAT issue?01:11
pabelangerpossible01:12
pabelangerlet me force ipv6 and reboot01:12
*** baoli has joined #openstack-infra01:12
pabelangergoing to also check that sshd depends on unbound too01:12
*** tqtran has quit IRC01:13
cloudnullmaybe we can add something like this to the script http://cdn.pasteraw.com/cs48x75pis3n67r63j5mgc0a3fsscur ?01:14
pabelangerya, unbound is taking a while to start01:14
cloudnullthen it can try for a min or two before failing ?01:14
*** weshay has quit IRC01:15
pabelangerhttp://paste.openstack.org/show/557770/01:16
pabelangerunbound is taking about 1 min to start01:16
pabelangererr01:16
pabelangeryes, 1 min01:16
*** esberglu has quit IRC01:17
*** gildub_ has quit IRC01:18
*** gildub has quit IRC01:18
*** jimbaker has quit IRC01:18
*** gildub has joined #openstack-infra01:19
cloudnullpabelanger: rather... http://cdn.pasteraw.com/n57rvu8vw6w3q8mzd9s0hiua5i8v677 -- forgot an import loop there ;)01:19
*** Apoorva_ has joined #openstack-infra01:19
pabelangercloudnull: Ya, we could try polling a few times. Let me see why unbound is taking 1 min to start01:20
*** asettle has joined #openstack-infra01:22
*** jimbaker has joined #openstack-infra01:22
*** jimbaker has quit IRC01:22
*** jimbaker has joined #openstack-infra01:22
*** Apoorva has quit IRC01:23
*** Apoorva_ has quit IRC01:24
cloudnullgoing to grab a bite, back in a while.01:24
*** rajinir has quit IRC01:25
*** spzala has quit IRC01:29
*** spzala has joined #openstack-infra01:30
pabelangercloudnull: clarkb: So, I think unbound is blocking on key generation: http://paste.openstack.org/show/557771/ waiting for random from the kernel01:31
pabelangercloudnull: clarkb: so, we can either make configure_mirror.sh smarter by polling service unbound status every 30 seconds, 10 times: http://paste.openstack.org/show/557773/01:32
pabelangercloudnull: clarkb: see if we can preseed the key, or disable the key01:32
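The polling option above can be sketched as: retry name resolution a fixed number of times with a pause between attempts, instead of failing on the first miss while unbound is still starting. This is a Python illustration only; the actual configure_mirror.sh change is a shell script, and the defaults of 10 attempts every 30 seconds simply mirror the numbers mentioned above. The function name is hypothetical, and the resolver and sleep functions are injectable so the logic can be tested without a network.

```python
import socket
import time


def wait_for_dns(host, attempts=10, interval=30.0,
                 resolve=socket.gethostbyname, sleep=time.sleep):
    """Return the resolved address of `host`, retrying while DNS is not up.

    Raises RuntimeError after `attempts` failed lookups.
    """
    for attempt in range(1, attempts + 1):
        try:
            return resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            if attempt == attempts:
                break
            sleep(interval)
    raise RuntimeError("%s did not resolve after %d attempts" % (host, attempts))
```

For example, wait_for_dns("git.openstack.org") would wait up to roughly 4.5 minutes (9 pauses of 30 s) before giving up, which comfortably covers the ~1 minute unbound startup observed above.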
*** asettle has quit IRC01:32
*** aeng has quit IRC01:33
*** spzala has quit IRC01:34
*** dkehn_ has quit IRC01:34
*** dkehn has quit IRC01:34
*** thorst_ has quit IRC01:38
*** thorst_ has joined #openstack-infra01:39
*** rfolco has quit IRC01:39
*** hparekh has quit IRC01:40
*** nwkarsten has joined #openstack-infra01:43
*** baoli has quit IRC01:43
*** amotoki has quit IRC01:43
*** Sukhdev has quit IRC01:44
*** gongysh has joined #openstack-infra01:45
*** elo has quit IRC01:45
*** dkehn has joined #openstack-infra01:47
*** dkehn_ has joined #openstack-infra01:47
*** thorst_ has quit IRC01:48
*** yanyanhu has joined #openstack-infra01:48
*** larainema has quit IRC01:49
openstackgerritTim Burke proposed openstack-dev/hacking: Add optional H203 to check that assertIs(Not)None is used  https://review.openstack.org/27651701:50
*** baoli has joined #openstack-infra01:51
*** vinaypotluri has quit IRC01:51
*** hparekh has joined #openstack-infra01:51
*** gongysh has quit IRC01:55
*** tkelsey has joined #openstack-infra02:02
*** thorst_ has joined #openstack-infra02:02
*** thorst_ has quit IRC02:03
*** inc0 has joined #openstack-infra02:03
openstackgerritJames Polley proposed openstack-dev/pbr: Fix handling of old git log output  https://review.openstack.org/33939202:03
*** gongysh has joined #openstack-infra02:04
*** dimtruck is now known as zz_dimtruck02:05
*** rbuzatu has joined #openstack-infra02:05
*** zz_dimtruck is now known as dimtruck02:05
openstackgerritzhangyanxian proposed openstack-infra/project-config: Fix typo in the Pypi-extract-name.py  https://review.openstack.org/35569202:05
*** tkelsey has quit IRC02:06
*** jamielennox is now known as jamielennox|away02:07
*** xarses has joined #openstack-infra02:07
openstackgerritzhangyanxian proposed openstack-infra/project-config: Fix typo in the Pypi-extract-name.py  https://review.openstack.org/35569202:09
openstackgerritzhangyanxian proposed openstack-infra/project-config: Fix typo in the pypi-extract-name.py  https://review.openstack.org/35569202:09
*** rbuzatu has quit IRC02:10
openstackgerritMerged openstack-infra/project-config: Further F24 kernel update  https://review.openstack.org/35378302:10
openstackgerritJames Polley proposed openstack-dev/pbr: Fix handling of old git log output  https://review.openstack.org/33939202:11
*** pradk has quit IRC02:12
openstackgerritJames Polley proposed openstack-dev/pbr: Fix handling of old git log output  https://review.openstack.org/33939202:18
*** aeng has joined #openstack-infra02:19
*** gongysh has quit IRC02:20
openstackgerritTimothy R. Chavez proposed openstack-infra/jenkins-job-builder: Use xml_jobs not jobs  https://review.openstack.org/35569402:20
*** baoli has quit IRC02:21
*** elo has joined #openstack-infra02:22
openstackgerritPaul Belanger proposed openstack-infra/project-config: Add smarter dns checking for configure_mirror.sh  https://review.openstack.org/35569502:22
pabelangercloudnull: clarkb: fungi: So, that should fix our launch node errors around DNS not working ^.  In the case of osic-cloud1 and ubuntu-xenial, we are SSHing into the node and running host git.openstack.org before unbound has finished starting02:24
*** raunak has quit IRC02:25
*** jamielennox|away is now known as jamielennox02:26
timrczxiiro: Hi... it looks like 80aa5266166dfcc84be765060cae7c6eac363ecd caused a regression.  See: https://review.openstack.org/#/c/355694/02:27
*** mriedem is now known as mriedem_away02:27
timrczxiiro: Use of --delete-old with commit 80aa5266166dfcc84be765060cae7c6eac363ecd will delete every job.02:27
fungipabelanger: what are the odds that we're not preinstalling haveged on our nodes, resulting in entropy starvation?02:29
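One way to check the entropy-starvation theory on a freshly booted node is to read the kernel's available-entropy estimate, which Linux exposes at /proc/sys/kernel/random/entropy_avail. A minimal sketch; the helper names are hypothetical and the low-water threshold is an arbitrary illustrative value, not a kernel constant.

```python
ENTROPY_PATH = "/proc/sys/kernel/random/entropy_avail"


def entropy_available(path=ENTROPY_PATH):
    """Return the kernel's current entropy estimate in bits (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


def looks_starved(bits, threshold=200):
    """Hypothetical heuristic: flag entropy low enough to stall key generation."""
    return bits < threshold
```

Sampling this right after boot, before and after unbound's key setup, would show whether the pool is near empty while unbound waits.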
fungitimrc: i thought they fixed that last week?02:29
zxiiroi thought we fixed it too. I've been using it on my systems with no issue02:30
zxiiroi'm not sure what the difference of passing xml_jobs instead of jobs is. Jobs is what is returned from jenkins as the list of all jobs that were updated, hence shouldn't be deleted. xml_jobs should be the same too?02:31
timrcNot what I'm seeing...02:31
zxiirotimrc: how are you running your command? jenkins-jobs update --delete-old jjbs/ ?02:33
*** asettle has joined #openstack-infra02:33
timrczxiiro: Essentially, e.g.  jenkins-jobs --conf /etc/jenkins_jobs/jenkins_jobs.ini update ./jjb-jobs/servers/`hostname` --delete-old02:34
*** netsin has quit IRC02:34
*** signed8bit_Zzz is now known as signed8bit02:37
timrczxiiro: From my console running the script that runs whenever a change to our jobs repo changes... http://paste.openstack.org/show/557860/02:38
zxiirotimrc: well let me test it real quick and if it works for me I'll merge it02:39
*** mdrabe has joined #openstack-infra02:39
*** asettle has quit IRC02:40
*** tphummel has quit IRC02:40
*** vinaypotluri has joined #openstack-infra02:42
timrczxiiro: I think the jobs list that gets returned by update_jobs is just the list of jobs that changed.. so if no jobs changed, for example, it returns [].  That empty list gets passed as the "keeps" list.  Since no jobs are in that list, they all get removed.02:44
*** hongbin has joined #openstack-infra02:44
*** bin_ has quit IRC02:44
timrcIf we use xml_jobs the "keeps" list will always be every job in config, regardless of if it changed or not.02:45
timrcWhich is exactly what we want, I think.02:45
*** zhenguo has joined #openstack-infra02:46
timrc--delete-old should presumably just delete the jobs which are no longer in config.02:46
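The bug described above can be illustrated with a small set computation. The names here are hypothetical, not jenkins-job-builder's internals: `configured` stands for every job defined in the config (the xml_jobs list), while `updated` stands for only the jobs actually pushed to Jenkins on this run, which can be empty when the cache says nothing changed. --delete-old should remove only server jobs that are absent from the configuration.

```python
def jobs_to_delete(server_jobs, keep):
    """Return the jobs on the Jenkins server that are not in `keep`, sorted."""
    return sorted(set(server_jobs) - set(keep))


server_jobs = ["job-a", "job-b", "old-job"]
configured = ["job-a", "job-b"]  # every job in the config (xml_jobs)
updated = []                     # nothing changed, so the cache skipped all uploads

# Buggy: using only the updated jobs as the keep list deletes everything.
print(jobs_to_delete(server_jobs, updated))     # ['job-a', 'job-b', 'old-job']
# Fixed: using all configured jobs deletes only the stale one.
print(jobs_to_delete(server_jobs, configured))  # ['old-job']
```

This also explains why the regression was invisible with ignore_cache=True: every job is then re-uploaded, so the "updated" list happens to equal the full configured list.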
zxiirotimrc: yeah i'm testing that theory now. I want to make sure we understand the difference between the 202:48
zxiirotimrc: i suspect i didn't catch it in testing because i run my system with ignore_cache=True02:48
*** gongysh_ has joined #openstack-infra02:48
*** yuanying has quit IRC02:49
zxiirotimrc: Ok I just confirmed it02:49
zxiirotimrc: you're right. jobs returns only updated so if you cached and your jobs didn't update it won't be in the list. xml_jobs is the right thing to use02:50
zxiirotimrc: can you update the commit message to explain that?02:50
zxiirotimrc: I'll approve the change right away once you do that02:50
*** elo has quit IRC02:51
*** yuanying has joined #openstack-infra02:52
*** jimbaker has quit IRC02:53
openstackgerritJeremy Stanley proposed openstack-infra/system-config: Add a script to list change owner statistics  https://review.openstack.org/26397102:53
*** yamahata has quit IRC02:54
*** inc0 has quit IRC02:55
*** elo has joined #openstack-infra02:57
*** jimbaker has joined #openstack-infra02:57
*** jimbaker has quit IRC02:57
*** jimbaker has joined #openstack-infra02:57
*** gongysh_ has quit IRC02:57
ianwis it possible there's something up with the nodepool builder?02:58
*** signed8bit is now known as signed8bit_Zzz03:00
pabelangerfungi: I am not sure, I'd have to check. I've never used haveged before either03:00
pabelangerianw: I kicked off a build an hour or so ago03:01
pabelangerlooks like ubuntu-xenial is just finishing up03:01
pabelangeractually, done now03:01
*** baoli has joined #openstack-infra03:01
ianwpabelanger: ahh, yeah, sorry should have checked the debug log03:01
*** krtaylor has joined #openstack-infra03:02
pabelangerfungi: it is installed for ubuntu-xenial03:02
*** thorst_ has joined #openstack-infra03:03
*** yamahata has joined #openstack-infra03:03
*** dimtruck is now known as zz_dimtruck03:07
ianwpabelanger: what's up with that -> OpenStackCloudException: Image creation failed: delete() takes exactly 2 arguments (1 given)03:07
*** raunak has joined #openstack-infra03:08
*** signed8bit_Zzz is now known as signed8bit03:09
pabelangerianw: never seen that03:10
pabelangerianw: I did delete some old DIB images from nodepool.o.o tonight however03:10
*** apetrich has joined #openstack-infra03:10
*** netsin has joined #openstack-infra03:10
*** nwkarste_ has joined #openstack-infra03:11
pabelangerianw: looks like a bug in shade03:11
pabelangermordred: ^03:11
ianwpabelanger: yeah ... odd traceback03:11
ianwhttp://paste.openstack.org/show/557882/03:11
*** thorst_ has quit IRC03:11
*** elo has quit IRC03:12
zxiirotimrc: looks like you're not here. I'll update the commit message03:12
*** nwkarsten has quit IRC03:13
*** fguillot_ has quit IRC03:14
openstackgerritThanh Ha proposed openstack-infra/jenkins-job-builder: Use xml_jobs not jobs  https://review.openstack.org/35569403:14
*** elo has joined #openstack-infra03:17
*** nwkarste_ has quit IRC03:18
ianwpabelanger: that tb really makes no sense ... i get the feeling the builder process might not be running the same code as on disk...03:19
pabelangerianw: possible, you can restart it if you want, I am done for the night03:20
*** raunak has quit IRC03:20
ianwpabelanger: ok, no worries, i'll see, numbers might make sense on an older release03:21
*** raunak has joined #openstack-infra03:21
timrczxiiro: Sorry, was putting my daughter to sleep.  Reading up03:22
zxiirotimrc: no worries. once jenkins returns I will merge it03:24
*** psilvad has quit IRC03:25
timrczxiiro: Excellent.  Thanks!03:25
zxiirotimrc: no thank you for reporting and  fixing the issue!03:25
*** baoli has quit IRC03:26
*** rbuzatu has joined #openstack-infra03:26
ianwpabelanger: to answer my own question, the shade .py files are from the 13th, and the builder was started on the 12th.  so yeah, the numbers don't line up in the tb03:27
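The check done by hand above (code on disk newer than the running process) can be automated by comparing each source file's modification time with the process start time. A sketch, assuming the caller supplies the start time as a Unix timestamp (for example derived from /proc or ps); the function name is hypothetical.

```python
import os


def files_changed_since(start_time, paths):
    """Return the paths whose mtime is later than `start_time` (Unix seconds).

    If a long-running daemon imported these files before they were
    upgraded, its tracebacks will cite line numbers from the old code.
    """
    return [p for p in paths if os.path.getmtime(p) > start_time]
```

Any non-empty result for a daemon's library files is a hint that it should be restarted before trusting its tracebacks.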
*** shashank_hegde has joined #openstack-infra03:30
*** rbuzatu has quit IRC03:31
ianwyep, 1.9.0 makes much more sense03:32
*** signed8bit has quit IRC03:34
*** signed8bit has joined #openstack-infra03:34
*** shashank_hegde has quit IRC03:36
beaglesmeh, still have really weird issues with zuul ansible on ubuntu. "async task produced unparseable results" shows up in the ansible log and the job fails03:38
*** signed8bit has quit IRC03:38
*** signed8b_ has joined #openstack-infra03:39
*** julim has quit IRC03:42
*** vikrant has joined #openstack-infra03:42
*** yamahata has quit IRC03:43
*** roxanaghe has joined #openstack-infra03:43
*** roxanaghe has quit IRC03:43
*** hongbin has quit IRC03:45
*** ramishra has quit IRC03:45
*** rajinir has joined #openstack-infra03:45
*** nwkarsten has joined #openstack-infra03:46
beaglespabelanger, you still around - if so, should that possible fix to ^^^ have propagated through to where it'd get picked up on a recheck?03:47
ianwbeagles: we were having issue with that on fedora, which had to do with locales on the host and it outputting error messages that got things confused03:47
beaglesouch. how did you resolve it?03:48
clarkbbeagles: ianw my understanding is jeblair kicked off some restarts to pick up new ansible today03:48
*** yuanying has quit IRC03:48
*** roxanaghe has joined #openstack-infra03:48
clarkbthe ansible fix merged but is not yet released03:48
jeblairyeah, should be in place.  we may need to hold a node to debug further.  (i can't do that now)03:48
beaglesclarkb, awwww okay03:48
ianwbeagles: fixed the locales in the image build :)  but yeah, ansible did fix it in later release03:48
beaglesclarkb, I had sifted through IRC backlog and misunderstood - thought it was "in the mix"03:49
openstackgerritMerged openstack-infra/jenkins-job-builder: Use xml_jobs not jobs  https://review.openstack.org/35569403:49
jeblairbeagles: it is in place -- we are running unreleased ansible to get it03:49
*** ramishra has joined #openstack-infra03:51
*** yuanying has joined #openstack-infra03:51
beaglesjeblair, okay nice.. how long ago would it have been available? Just want to confirm these jobs were launched before they would've gotten the fix03:51
jeblairbeagles: i think i status logged it... 1 sec03:52
jeblairbeagles: https://wiki.openstack.org/wiki/Infrastructure_Status says03:52
jeblair2016-08-15 20:34:14 UTC Installed ansible stable-2.1 branch on zuul launchers to pick up https://github.com/ansible/ansible/commit/d35377dac78a8fcc6e8acf0ffd92f47f44d7094603:52
*** nwkarsten has quit IRC03:52
*** nwkarsten has joined #openstack-infra03:53
beaglesjeblair, crap.. then unless I'm missing something it should've been picked up.. 1s03:54
beaglesjeblair, is there something in the ansible logs, etc. I can spot to check what version was being used?03:55
*** signed8bit has joined #openstack-infra03:56
*** asettle has joined #openstack-infra03:56
*** nwkarsten has quit IRC03:57
*** signed8b_ has quit IRC03:58
*** winggundamth has quit IRC03:59
*** asettle has quit IRC04:01
prometheanfirethink I may have found a bug in git-review/gerrit04:06
prometheanfiremaybe04:06
prometheanfirecan you git-review to the same change-id but a different branch?04:07
prometheanfirehuh, you can04:07
prometheanfirenvm then lol04:07
prometheanfirehttps://review.openstack.org/#/q/I67d7a5000bfe0c98717d3e29d23edc9c6117e765,n,z04:07
*** thorst_ has joined #openstack-infra04:10
*** tqtran has joined #openstack-infra04:10
beaglesjeblair, actually .. what I'm looking at looks like a timeout... wow04:10
clarkbprometheanfire: yes change ids are not unique04:12
clarkbprometheanfire: the unique tuple is project, branch, change id04:13
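Because only the (project, branch, Change-Id) tuple is unique, Gerrit's REST API accepts a change identifier of the form `<project>~<branch>~<Change-Id>`, with slashes in names URL-encoded. A small helper to build one; the helper name is hypothetical and the project and branch in the example are illustrative, not necessarily those of the change linked above.

```python
from urllib.parse import quote


def gerrit_change_triplet(project, branch, change_id):
    """Build a Gerrit '<project>~<branch>~<Change-Id>' identifier.

    Slashes in the project and branch names are percent-encoded so the
    triplet can be used as a single path segment in a REST URL.
    """
    return "~".join(quote(part, safe="") for part in (project, branch, change_id))


cid = "I67d7a5000bfe0c98717d3e29d23edc9c6117e765"
print(gerrit_change_triplet("openstack/requirements", "master", cid))
# openstack%2Frequirements~master~I67d7a5000bfe0c98717d3e29d23edc9c6117e765
print(gerrit_change_triplet("openstack/requirements", "stable/mitaka", cid))
# openstack%2Frequirements~stable%2Fmitaka~I67d7a5000bfe0c98717d3e29d23edc9c6117e765
```

The same Change-Id pushed to two branches therefore yields two distinct triplets, which is why git-review accepted it.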
*** hichihara has joined #openstack-infra04:13
prometheanfirejust realized that :D04:13
*** winggundamth has joined #openstack-infra04:14
prometheanfiretoabctl: for you to watch I guess https://review.openstack.org/353349 https://review.openstack.org/35571104:14
*** tqtran has quit IRC04:14
prometheanfirebah04:14
prometheanfiretoabctl: sorry, mistype04:14
prometheanfiretonyb: for you to watch I guess https://review.openstack.org/353349 https://review.openstack.org/35571104:15
prometheanfiretonyb: though you might be done working04:15
openstackgerritkyle liu proposed openstack-infra/project-config: Add new project networking-zte  https://review.openstack.org/35527804:16
prometheanfirealso, if someone has some time to review... https://review.openstack.org/#/c/310865/04:17
*** thorst_ has quit IRC04:17
*** rlandy has quit IRC04:19
*** jimbaker has quit IRC04:20
*** links has joined #openstack-infra04:20
*** sflanigan has joined #openstack-infra04:22
*** sflanigan has joined #openstack-infra04:22
*** raunak has quit IRC04:22
*** jimbaker has joined #openstack-infra04:23
*** jimbaker has quit IRC04:23
*** jimbaker has joined #openstack-infra04:23
*** raunak has joined #openstack-infra04:24
*** javeriak has joined #openstack-infra04:26
openstackgerritIan Wienand proposed openstack-infra/shade: Use "image" as argument for Glance V1 upload error path  https://review.openstack.org/35571504:27
*** tonytan4ever has quit IRC04:27
ianwpabelanger: ^ re that error.04:27
ianwthat fixed, i'll restart the builder now since it's quiet and so it's running the same code that's actually on disk :)04:28
*** javeriak has quit IRC04:31
*** kzaitsev_mb has joined #openstack-infra04:38
ianwi wonder why "nodepool image-build fedora-24" gets stuck?04:42
*** Sukhdev has joined #openstack-infra04:43
*** javeriak has joined #openstack-infra04:45
*** rbuzatu has joined #openstack-infra04:48
*** pgadiya has joined #openstack-infra04:48
*** sarob has joined #openstack-infra04:49
*** signed8bit has quit IRC04:52
*** sarob has quit IRC04:53
*** mdrabe has quit IRC04:54
*** rbuzatu has quit IRC04:54
*** psachin has joined #openstack-infra04:59
*** arnewiebalck has quit IRC05:00
*** jimbaker has quit IRC05:00
*** tonytan4ever has joined #openstack-infra05:03
*** kzaitsev_mb has quit IRC05:03
*** jimbaker has joined #openstack-infra05:04
*** jimbaker has quit IRC05:04
*** jimbaker has joined #openstack-infra05:04
*** elo has quit IRC05:04
*** raunak has quit IRC05:05
*** raunak has joined #openstack-infra05:06
*** thorst_ has joined #openstack-infra05:15
*** senk_ has joined #openstack-infra05:16
*** _nadya_ has joined #openstack-infra05:19
*** raunak has quit IRC05:20
*** raunak has joined #openstack-infra05:21
*** thorst_ has quit IRC05:22
*** _nadya_ has quit IRC05:24
*** Sukhdev has quit IRC05:26
*** kushal has joined #openstack-infra05:29
*** jaosorior has joined #openstack-infra05:30
*** raunak has quit IRC05:35
*** hichihara has quit IRC05:36
*** baoli has joined #openstack-infra05:38
*** rbuzatu has joined #openstack-infra05:39
*** ccamacho has joined #openstack-infra05:40
*** shashank_hegde has joined #openstack-infra05:42
*** baoli has quit IRC05:42
*** M-docaedo_vector has quit IRC05:43
*** raunak has joined #openstack-infra05:43
*** senk_ has quit IRC05:45
*** roxanaghe has quit IRC05:45
*** r-mibu has quit IRC05:46
*** tonytan4ever has quit IRC05:46
beaglesis it possible to log in to a node and see what's going on if it looks like jobs are hung?05:47
openstackgerritguo yunxian proposed openstack/os-testr: Add support for Python versions  https://review.openstack.org/35573005:48
*** dkehn_ has quit IRC05:48
*** dkehn has quit IRC05:49
*** shashank_hegde has quit IRC05:49
*** raunak has quit IRC05:50
*** markusry has joined #openstack-infra05:50
openstackgerritguo yunxian proposed openstack/os-testr: Add support for Python versions  https://review.openstack.org/35573005:51
*** tonytan4ever has joined #openstack-infra05:54
*** rajinir has quit IRC05:55
*** raunak has joined #openstack-infra05:55
*** dkehn has joined #openstack-infra05:55
ianwbeagles: yes, we can hold a node and give you a login, but it's a manual process05:56
*** slaweq_ has joined #openstack-infra05:57
*** oanson has joined #openstack-infra05:58
*** markvoelker has quit IRC05:58
*** dkehn_ has joined #openstack-infra06:01
openstackgerritIan Wienand proposed openstack-infra/system-config: Pre-install python2-requests package for Fedora  https://review.openstack.org/35573106:01
*** sandanar has joined #openstack-infra06:02
*** pabelanger has quit IRC06:02
*** pabelanger has joined #openstack-infra06:03
*** ccamacho is now known as ccamacho|afk06:04
*** tonytan4ever has quit IRC06:04
*** tkelsey has joined #openstack-infra06:05
*** r-mibu has joined #openstack-infra06:06
*** raunak has quit IRC06:09
*** florianf has joined #openstack-infra06:09
*** tkelsey has quit IRC06:09
*** M-docaedo_vector has joined #openstack-infra06:10
*** tqtran has joined #openstack-infra06:11
*** markusry has quit IRC06:11
*** jimbaker has quit IRC06:13
*** rcernin has joined #openstack-infra06:14
*** tqtran has quit IRC06:15
*** elo has joined #openstack-infra06:16
*** jimbaker has joined #openstack-infra06:17
*** raunak has joined #openstack-infra06:17
*** jimbaker has quit IRC06:17
*** jimbaker has joined #openstack-infra06:17
*** javeriak has quit IRC06:18
yolandagood morning06:19
*** thorst_ has joined #openstack-infra06:20
*** raunak has quit IRC06:21
*** raunak has joined #openstack-infra06:25
*** shashank_hegde has joined #openstack-infra06:26
*** kzaitsev_mb has joined #openstack-infra06:27
*** elo has quit IRC06:27
*** elo has joined #openstack-infra06:27
*** thorst_ has quit IRC06:27
*** csomerville has quit IRC06:29
*** cody-somerville has joined #openstack-infra06:30
*** cody-somerville has joined #openstack-infra06:30
*** Jeffrey4l has joined #openstack-infra06:30
*** liusheng has quit IRC06:30
*** spzala has joined #openstack-infra06:31
*** liusheng has joined #openstack-infra06:31
*** spzala has quit IRC06:35
*** raunak has quit IRC06:35
*** savihou has joined #openstack-infra06:36
*** gildub has quit IRC06:37
*** kushal has quit IRC06:39
*** vsaienko has quit IRC06:42
*** markusry has joined #openstack-infra06:46
*** raunak has joined #openstack-infra06:47
*** ihrachys has joined #openstack-infra06:47
yolandaianw, around? care reviewing https://review.openstack.org/353994 ?06:49
*** martinkopec has joined #openstack-infra06:50
*** raunak has quit IRC06:50
*** markvoelker has joined #openstack-infra06:51
*** markusry has quit IRC06:52
*** yamahata has joined #openstack-infra06:53
*** tkelsey has joined #openstack-infra06:54
*** rbuzatu has quit IRC06:57
*** rbuzatu has joined #openstack-infra06:58
*** jtomasek|afk is now known as jtomasek07:00
openstackgerritVitaly Gridnev proposed openstack-infra/project-config: don't run tempest tests in sahara grenade  https://review.openstack.org/35470007:01
*** yamahata has quit IRC07:02
*** savihou has quit IRC07:07
*** thorongil has joined #openstack-infra07:10
*** jpich has joined #openstack-infra07:11
*** ccamacho|afk is now known as ccamacho07:13
openstackgerritMerged openstack-infra/project-config: fix typo in comment  https://review.openstack.org/35515307:14
*** shashank_hegde has quit IRC07:18
openstackgerritMerged openstack-infra/project-config: Fix syntax error in ironic-python-agent post job  https://review.openstack.org/35548707:18
*** dizquierdo has joined #openstack-infra07:19
*** tonytan4ever has joined #openstack-infra07:22
*** nmagnezi has joined #openstack-infra07:23
*** tonytan4ever has quit IRC07:26
*** e0ne has joined #openstack-infra07:28
*** hichihara has joined #openstack-infra07:28
*** thorst_ has joined #openstack-infra07:28
*** raunak has joined #openstack-infra07:30
*** thorst_ has quit IRC07:32
akscramGuys, I want to add the puppet-check-jobs group and make it non-voting but I do not know how to do it properly: https://review.openstack.org/#/c/355265/07:33
akscramCould someone advise me on how to enable it?07:34
*** raunak has quit IRC07:37
*** bauzas_off is now known as bauzas07:38
*** ifarkas_afk is now known as ifarkas07:40
*** javeriak has joined #openstack-infra07:42
*** dkehn has quit IRC07:43
*** dkehn_ has quit IRC07:43
*** raunak has joined #openstack-infra07:45
*** savihou has joined #openstack-infra07:45
openstackgerritChangcheng Intel proposed openstack-infra/jenkins-job-builder: add compress-log option to compress log  https://review.openstack.org/35413807:49
*** dkehn has joined #openstack-infra07:50
*** matthewbodkin has joined #openstack-infra07:50
*** baoli has joined #openstack-infra07:50
*** chlong has quit IRC07:50
*** raunak has quit IRC07:52
*** kzaitsev_mb has quit IRC07:52
*** yanyanhu has quit IRC07:52
openstackgerritChangcheng Intel proposed openstack-infra/jenkins-job-builder: add post-send script option  https://review.openstack.org/35513507:53
*** hwoarang has joined #openstack-infra07:53
*** baoli has quit IRC07:54
openstackgerritChangcheng Intel proposed openstack-infra/jenkins-job-builder: use base_email_create to customize email flexible  https://review.openstack.org/35513907:54
*** sshnaidm|afk is now known as sshnaidm07:55
*** kzaitsev_mb has joined #openstack-infra07:55
*** zzzeek has quit IRC08:00
*** zzzeek has joined #openstack-infra08:00
*** pilgrimstack has joined #openstack-infra08:01
*** dkehn_ has joined #openstack-infra08:01
*** markvoelker has quit IRC08:01
*** Mmike has quit IRC08:02
*** Mmike has joined #openstack-infra08:02
*** pilgrimstack has quit IRC08:05
*** afred312 has quit IRC08:05
*** raunak has joined #openstack-infra08:06
*** afred312 has joined #openstack-infra08:06
*** pilgrimstack has joined #openstack-infra08:07
*** raunak has quit IRC08:11
*** esikachev has joined #openstack-infra08:13
*** matrohon has joined #openstack-infra08:19
*** yanyanhu has joined #openstack-infra08:20
*** asettle has joined #openstack-infra08:20
*** sshnaidm has quit IRC08:21
*** lucas-dinner is now known as lucasagomes08:21
*** sshnaidm has joined #openstack-infra08:21
*** tonytan4ever has joined #openstack-infra08:23
openstackgerritMatthew Bodkin proposed openstack-infra/storyboard-webclient: Make side bar the same length as navbar  https://review.openstack.org/35555408:26
*** Goneri has joined #openstack-infra08:27
*** Na3iL has joined #openstack-infra08:27
*** tonytan4ever has quit IRC08:28
*** chem has joined #openstack-infra08:29
*** thorst_ has joined #openstack-infra08:30
*** electrofelix has joined #openstack-infra08:33
*** bethwhite_ has joined #openstack-infra08:33
*** kzaitsev_mb has quit IRC08:34
*** sandanar_ has joined #openstack-infra08:34
*** thorst_ has quit IRC08:37
*** sandanar has quit IRC08:38
*** tkelsey has quit IRC08:39
*** yaume has joined #openstack-infra08:40
openstackgerritIvan Udovichenko proposed openstack-infra/project-config: Add new/update existing projects  https://review.openstack.org/34704708:40
*** mhickey has joined #openstack-infra08:41
*** yamamoto has quit IRC08:44
*** acoles_ is now known as acoles08:49
*** sarob has joined #openstack-infra08:51
*** dkehn_ has quit IRC08:51
*** dkehn has quit IRC08:51
*** bethwhite__ has joined #openstack-infra08:53
*** sarob has quit IRC08:55
*** Na3iL has quit IRC08:56
*** dkehn has joined #openstack-infra08:58
*** Julien-zte has joined #openstack-infra08:59
*** Goneri has quit IRC09:01
*** Goneri has joined #openstack-infra09:01
*** markvoelker has joined #openstack-infra09:02
*** derekh has joined #openstack-infra09:03
*** dkehn_ has joined #openstack-infra09:04
*** markvoelker has quit IRC09:07
*** sambetts|afk is now known as sambetts09:07
*** Na3iL has joined #openstack-infra09:11
*** vinaypotluri has quit IRC09:11
openstackgerritAleksey Zvyagintsev proposed openstack-infra/project-config: Add repo for murano-pkg-check. Murano package validator tool.  https://review.openstack.org/35386109:18
*** eranrom has quit IRC09:20
*** markmcd has joined #openstack-infra09:20
*** _nadya_ has joined #openstack-infra09:22
*** _nadya_ has quit IRC09:22
*** _nadya_ has joined #openstack-infra09:22
*** infra-red has joined #openstack-infra09:23
openstackgerritMerged openstack/diskimage-builder: Allow to skip kernel cleanup  https://review.openstack.org/35399409:24
*** dtardivel has joined #openstack-infra09:28
*** eranrom has joined #openstack-infra09:30
*** yamamoto has joined #openstack-infra09:31
*** thorst_ has joined #openstack-infra09:35
*** ociuhandu has joined #openstack-infra09:40
*** nwkarsten has joined #openstack-infra09:40
*** dtantsur|afk is now known as dtantsur09:40
*** thorst_ has quit IRC09:41
*** yamamoto has quit IRC09:41
*** ramishra has quit IRC09:42
*** ramishra has joined #openstack-infra09:44
*** nwkarsten has quit IRC09:44
*** yamamoto has joined #openstack-infra09:46
*** yamamoto has quit IRC09:46
*** dmellado has quit IRC09:46
*** amoralej has quit IRC09:46
*** geguileo has quit IRC09:46
*** kzaitsev_mb has joined #openstack-infra09:48
*** yamamoto has joined #openstack-infra09:48
*** tosky has joined #openstack-infra09:49
openstackgerritIlya Shakhat proposed openstack-infra/project-config: Add new project "os-failures"  https://review.openstack.org/35581909:53
openstackgerritJames Slagle proposed openstack-infra/tripleo-ci: Add temporary pin to last known good current tripleo repo  https://review.openstack.org/35448109:54
*** hichihara has quit IRC09:54
openstackgerritJames Slagle proposed openstack-infra/tripleo-ci: Add temporary pin to last known good current tripleo repo  https://review.openstack.org/35448109:55
*** dmellado has joined #openstack-infra09:56
*** ihrachys has quit IRC09:58
*** javeriak has quit IRC10:00
*** zhurong has quit IRC10:01
*** markvoelker has joined #openstack-infra10:03
*** jed56 has joined #openstack-infra10:03
*** sandanar__ has joined #openstack-infra10:03
*** sandanar_ has quit IRC10:07
openstackgerritJulien Danjou proposed openstack-infra/project-config: Teach some Telemetry jobs about Gnocchi stable/2.2 branch  https://review.openstack.org/35582810:07
*** markvoelker has quit IRC10:08
*** kushal has joined #openstack-infra10:09
*** tqtran has joined #openstack-infra10:12
*** ihrachys has joined #openstack-infra10:14
*** pt_15 has quit IRC10:16
*** Julien-zte has quit IRC10:17
*** tqtran has quit IRC10:17
*** _degorenko|afk is now known as degorenko10:18
sshnaidmdo you know why, in some projects, setting "closes-bug" doesn't affect bugs in launchpad? Does something special need to be configured for this feature?10:18
*** asettle has quit IRC10:22
*** sdague has joined #openstack-infra10:23
*** ihrachys has quit IRC10:23
*** yanyanhu has quit IRC10:24
*** tonytan4ever has joined #openstack-infra10:24
*** yamamoto has quit IRC10:25
*** mhickey has quit IRC10:25
*** kushal has quit IRC10:26
*** kushal has joined #openstack-infra10:27
*** tonytan4ever has quit IRC10:28
*** javeriak has joined #openstack-infra10:29
*** rbuzatu has quit IRC10:29
*** cdent has joined #openstack-infra10:30
cdentI'm trying to figure out how to integrate the api-wg gerrit with the (newer) launchpad bugs collection it has. I can see from the docs that jeepyb does it, but I'm missing the bit on what to change in config to turn it on. halp?10:31
*** spzala has joined #openstack-infra10:31
*** boogibugs has joined #openstack-infra10:32
*** florianf has quit IRC10:33
openstackgerritDarragh Bailey proposed openstack-infra/jenkins-job-builder: Adding support for Manual Build Trigger  https://review.openstack.org/20254310:34
openstackgerritDarragh Bailey proposed openstack-infra/jenkins-job-builder: Consolidate trigger-manual and trigger-parameterized-builds  https://review.openstack.org/31410810:34
*** spzala has quit IRC10:36
*** boogibugs has quit IRC10:36
*** boogibugs has joined #openstack-infra10:36
*** Na3iL has quit IRC10:38
*** florianf has joined #openstack-infra10:38
*** narayrak has joined #openstack-infra10:39
openstackgerritRicardo Carrillo Cruz proposed openstack-infra/system-config: Add provisioning and public IP addresses for compute00[0-1].vanilla  https://review.openstack.org/35583910:40
*** thorst_ has joined #openstack-infra10:41
openstackgerritRicardo Carrillo Cruz proposed openstack-infra/system-config: Add provisioning and public IP addresses for compute00[0-1].vanilla  https://review.openstack.org/35583910:44
openstackgerritRicardo Carrillo Cruz proposed openstack-infra/system-config: Correct public IP for baremetal00  https://review.openstack.org/35584110:46
*** thorst_ has quit IRC10:47
*** bethwhite_ has quit IRC10:48
electrofelixzxiiro waynr: given TOX_TESTENV_PASSENV works for https://review.openstack.org/271244, perhaps I should just change that review to update documentation when testing and add a comment instead of explicitly allowing proxy variables to be passed through?10:48
*** rhallisey has joined #openstack-infra10:49
*** sarob has joined #openstack-infra10:52
openstackgerrityolanda.robla proposed openstack-infra/puppet-infracloud: Fix bridge creation when no vlan is involved  https://review.openstack.org/35584510:54
*** sarob has quit IRC10:56
openstackgerritBenny Kopilov proposed openstack-infra/devstack-gate: Enable support for cinder multi-backend in tempest  https://review.openstack.org/35584611:01
openstackgerritBenny Kopilov proposed openstack-infra/devstack-gate: Enable support for cinder multi-backend in tempest  https://review.openstack.org/35584611:03
*** yamamoto has joined #openstack-infra11:03
*** dmsimard is now known as dmsimard|afk11:03
*** azvyagintsev_h has joined #openstack-infra11:03
*** markvoelker has joined #openstack-infra11:04
*** markvoelker has quit IRC11:08
*** asettle has joined #openstack-infra11:09
*** Na3iL has joined #openstack-infra11:09
*** locust has joined #openstack-infra11:12
*** baoli has joined #openstack-infra11:14
openstackgerritRyan Hallisey proposed openstack-infra/project-config: Few changed to the kolla-kubernetes job  https://review.openstack.org/35519911:15
*** florianf has quit IRC11:17
*** baoli has quit IRC11:19
*** ociuhandu has quit IRC11:20
*** florianf has joined #openstack-infra11:21
openstackgerritSean Dague proposed openstack-infra/project-config: Prime pip cache  https://review.openstack.org/35585411:22
*** jkilpatr has joined #openstack-infra11:23
*** dizquierdo is now known as dizquierdo_afk11:29
*** rbuzatu has joined #openstack-infra11:29
*** asettle has quit IRC11:30
*** ramishra has quit IRC11:30
openstackgerritSagi Shnaidman proposed openstack-infra/tripleo-ci: DONT MERGE: test periodic job  https://review.openstack.org/35585911:31
*** ccamacho is now known as ccamacho|lunch11:31
cdentsdague: since you appear to be awake maybe you know the answer to my question above: "I'm trying to figure out how to integrate the api-wg gerrit with the (newer) launchpad bugs collection it has. I can see from the docs that jeepyb does it, but I'm missing the bit on what to change in config to turn it on. halp?"11:31
*** ramishra has joined #openstack-infra11:32
*** pbourke has joined #openstack-infra11:32
pbourkehi, wondering are the repos at http://mirror.ord.rax.openstack.org/ubuntu/dists/xenial/ signed, and if so, where can I find the key?11:33
openstackgerritFathi Boudra proposed openstack-infra/jenkins-job-builder: builders: add 'publish over ssh' support as a build step  https://review.openstack.org/9843711:34
*** rbuzatu has quit IRC11:34
*** thorst_ has joined #openstack-infra11:35
*** jaosorior has quit IRC11:35
*** jaosorior has joined #openstack-infra11:36
*** sdake has joined #openstack-infra11:36
*** berendt has joined #openstack-infra11:39
*** rfolco has joined #openstack-infra11:41
openstackgerritMerged openstack-infra/system-config: Add provisioning and public IP addresses for compute00[0-1].vanilla  https://review.openstack.org/35583911:41
*** sfinucan has quit IRC11:41
*** tpsilva has joined #openstack-infra11:41
*** asettle has joined #openstack-infra11:44
*** sfinucan has joined #openstack-infra11:44
dtantsurhi folks! could you please merge https://review.openstack.org/#/c/354608/ ? it's blocking Ironic stable gate11:45
openstackgerritDarragh Bailey proposed openstack-infra/jenkins-job-builder: Allow using lockfile per jenkins master  https://review.openstack.org/29363111:46
sdaguecdent: what is the old bug group, and what is the new one?11:47
*** matbu is now known as matbu|lunch11:47
sdaguedtantsur: +A11:47
cdentsdague: there was no previous association with launchpad. The new launchpad is: https://bugs.launchpad.net/openstack-api-wg https://launchpad.net/~openstack-api-wg-drivers11:48
odyssey4meyolanda if you have a moment, reviews of https://review.openstack.org/355434 & https://review.openstack.org/355491 would be appreciated11:48
*** sarob has joined #openstack-infra11:50
dtantsursdague, thanks!11:50
sdaguecdent: I think it's the 'groups' field11:51
*** rodrigods has quit IRC11:51
*** asettle has quit IRC11:51
*** rodrigods has joined #openstack-infra11:51
sdaguehttps://github.com/openstack-infra/project-config/blob/c5ed5d0c03c337c8834cb153de78459f4d802dda/gerrit/projects.yaml#L422011:51
sdagueanteaya, is that right? ^^^11:51
*** asettle has joined #openstack-infra11:52
*** sshnaidm is now known as sshnaidm|lnch11:52
sdagueare we really imbalanced on xenial nodes?11:53
*** baoli has joined #openstack-infra11:53
*** baoli_ has joined #openstack-infra11:54
*** sarob has quit IRC11:54
*** tonytan4ever has joined #openstack-infra11:55
openstackgerritJames Slagle proposed openstack-infra/tripleo-ci: Add temporary pin to last known good current tripleo repo  https://review.openstack.org/35448111:56
*** acabot has quit IRC11:57
*** baoli has quit IRC11:58
*** rbuzatu has joined #openstack-infra11:58
*** tonytan4ever has quit IRC11:59
openstackgerritMerged openstack-infra/project-config: Ensure we alway build old Ironic ramdisk  https://review.openstack.org/35460812:00
openstackgerritJames Slagle proposed openstack-infra/tripleo-ci: Add temporary pin to last known good current tripleo repo  https://review.openstack.org/35448112:00
beaglesianw: sorry I ran off on you... had to catch some Zzz's12:00
beaglesianw, these jobs seem to be largely hanging while cloning repos... if not hanging, then at least slowing wwaaaayyyyyy down12:00
openstackgerritJim Rollenhagen proposed openstack-infra/project-config: Ironic: multitenant job should not run on stable  https://review.openstack.org/35588012:01
openstackgerritMerged openstack-infra/project-config: Implement Swift pypy experimental check  https://review.openstack.org/35549112:02
openstackgerritSam Betts proposed openstack-infra/project-config: Prevent Ironic multitenancy job running on old versions  https://review.openstack.org/35588112:02
jrollsambetts: you're too slow :)12:02
sambettsjroll: apparently so :-P12:02
*** dprince has joined #openstack-infra12:03
*** ldnunes has joined #openstack-infra12:03
*** markvoelker has joined #openstack-infra12:05
*** sigmavirus|away is now known as sigmavirus12:05
*** lucasagomes is now known as lucas-hungry12:06
*** mriedem_away has quit IRC12:07
*** markvoelker has quit IRC12:09
*** kgiusti has joined #openstack-infra12:09
openstackgerritJulia Kreger proposed openstack-infra/project-config: Rename bifrost integration test job  https://review.openstack.org/35565212:09
openstackgerritChris Dent proposed openstack-infra/project-config: Set the launchpad name for api-wg  https://review.openstack.org/35588512:09
openstackgerritMatthew Bodkin proposed openstack-infra/storyboard: Fixing docs so it is easy to understand  https://review.openstack.org/35588612:10
*** acabot has joined #openstack-infra12:10
*** psachin has quit IRC12:10
azvyagintsev_hFolks, could you please suggest how I should fix the templates in https://review.openstack.org/#/c/353861/17..18/zuul/layout.yaml ? If I remove the check/gate section, the test fails ;(12:11
*** vrovachev has joined #openstack-infra12:11
vrovachevHello around, please take a look https://review.openstack.org/#/c/355382/12:12
*** rbuzatu has quit IRC12:13
*** yaume has quit IRC12:13
*** rbuzatu has joined #openstack-infra12:14
*** narayrak has quit IRC12:15
*** locust has quit IRC12:17
*** weshay has joined #openstack-infra12:17
*** javeriak has quit IRC12:21
*** matbu|lunch is now known as matbu12:21
openstackgerritDmitry Tantsur proposed openstack-infra/project-config: Make the grenade job voting on ironic-inspector  https://review.openstack.org/35589412:22
*** javeriak has joined #openstack-infra12:24
*** gordc has joined #openstack-infra12:24
beaglesianw: is there something particular with these jobs (osic cloud jobs?) that could slow down stuff like git clone operations12:25
EmilienMto be a bit more precise than beagles, we are seeing a persistent problem when cloning repositories with zuul cloner on ubuntu nodes running on osic-cloud112:25
beaglesyeah, what he said12:25
beagles:)12:25
EmilienMare we aware about any downtime on osic ?12:25
*** markvoelker has joined #openstack-infra12:26
*** pradk has joined #openstack-infra12:26
*** burgerk has joined #openstack-infra12:27
*** mdrabe has joined #openstack-infra12:29
*** gouthamr has joined #openstack-infra12:32
*** apetrich has quit IRC12:32
pleia2mtreinish: so this time it really was getting stuck on the fact that the new/ directory existed and immediately failing, I manually removed it and let it run at :20, other.html now exists: http://status.openstack.org/elastic-recheck/data/other.html12:33
*** yamamoto has quit IRC12:33
pleia2mtreinish: should probably sort out the naming though :) http://status.openstack.org/elastic-recheck/ links to other.html and that exists, but it's inconsistent with our others.html template12:34
odyssey4meEmilienM afaik it's running well... but you may need to know that it's running IPv6 and that its DNS resolver is configured to use 127.0.0.1 to point at a locally running unbound service... so your tests may appear to have dns resolution errors12:34
odyssey4meEmilienM also, if your tests can't use IPv6 for external connectivity, then that may also be an issue12:34
EmilienModyssey4me: zuul-cloner takes forever12:35
EmilienModyssey4me: 35561212:35
EmilienMerr12:35
EmilienMhttp://logs.openstack.org/35/355235/1/check/gate-puppet-openstacklib-puppet-beaker-rspec-ubuntu-trusty/730b053/console.html#_2016-08-16_10_38_40_23158112:35
odyssey4meEmilienM yeah, that could relate to DNS resolution... we've seen slowness in odd places too12:36
openstackgerritBenny Kopilov proposed openstack-infra/devstack-gate: Enable support for cinder multi-backend in tempest  https://review.openstack.org/35584612:36
odyssey4mebasically OSIC is configured to use unbound, RAX has something in place which overwrites the nodepool config and uses the RAX DNS...12:37
*** apetrich has joined #openstack-infra12:37
odyssey4meso we're seeing inconsistencies and odd slowness here and there too12:37
*** ccamacho|lunch is now known as ccamacho12:38
*** yamamoto has joined #openstack-infra12:39
*** sandanar__ has quit IRC12:39
openstackgerritBrad P. Crochet proposed openstack-infra/tripleo-ci: Use tripleo-build-images for CI  https://review.openstack.org/33631212:41
mordredEmilienM, odyssey4me: for the slow cloning ... is there any chance that there is some weird routing which is causing routing between OSIC and RAX to go strange? the git mirrors are all in RAX12:42
odyssey4memordred hmm, good question - not one I have the answer to, but that would explain how slow the cloning is12:43
*** yamamoto has quit IRC12:43
odyssey4meI'm surprised that we don't have regional git endpoints too. :)12:44
odyssey4meperhaps cloudnull can provide some insight when he comes online12:44
mordredyah - well, so far it hasn't been an issue :)12:44
sdaguemordred: I did some poking around on my devstack12:44
mordredyeah?12:44
odyssey4memordred ah of course, the local git cache is useful to speed things up12:44
sdaguethe pip cache used by devstack is actually the one owned by the root user12:45
sdaguebecause sudo12:45
sdagueso https://review.openstack.org/#/c/355854/ might be all that we need12:45
*** raildo has joined #openstack-infra12:45
sdagueI don't know how one actually validates a thing like that before it goes into production12:45
*** rlandy has joined #openstack-infra12:45
* mordred looks12:46
mordredsdague: yesterday, I noticed in this change: http://logs.openstack.org/05/351905/7/check/check-osc-plugins/71038e2/console.html#_2016-08-15_17_51_18_40105412:46
mordred(which does happen to be on OSIC)12:47
mordredthat every remote update action took 4 seconds12:47
mordredsdague: root owns a pip cache?12:48
sdaguesudo pip install foo12:48
dtantsurfolks, jroll, the check-osc-plugin seems broken for ironic: http://logs.openstack.org/55/328955/15/check/check-osc-plugins/177080d/. is it something known?12:48
*** jheroux has joined #openstack-infra12:48
sdaguewill put that content into ~/.cache/pip12:48
sdaguefor root12:48
sdague /root/.cache/pip12:49
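For reference, a minimal sketch of the cache-location rule being discussed here. It is a simplification that assumes pip's documented Linux defaults (the `PIP_CACHE_DIR` environment variable overrides everything, otherwise `$XDG_CACHE_HOME/pip`, otherwise `~/.cache/pip`); `pip_cache_dir` is a hypothetical helper, not part of pip. It illustrates why running pip under `sudo -H` (as in mordred's later test command) fills `/root/.cache/pip`: the `-H` flag sets `HOME` to the target user's home directory.

```python
import os


def pip_cache_dir(env):
    """Return the directory pip would use as its cache on Linux.

    Simplified sketch, assuming pip's documented precedence:
    PIP_CACHE_DIR, then $XDG_CACHE_HOME/pip, then $HOME/.cache/pip.
    """
    if env.get("PIP_CACHE_DIR"):
        return env["PIP_CACHE_DIR"]
    xdg = env.get("XDG_CACHE_HOME")
    if xdg:
        return os.path.join(xdg, "pip")
    return os.path.join(env["HOME"], ".cache", "pip")


# An ordinary user versus the same command run via `sudo -H`,
# which sets HOME=/root for the pip process.
print(pip_cache_dir({"HOME": "/home/stack"}))  # /home/stack/.cache/pip
print(pip_cache_dir({"HOME": "/root"}))        # /root/.cache/pip
```

The point for image builds is that whatever primes the cache has to run as the same user (with the same `HOME`) as the later install, or the install will look in a different directory.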
kgiustifolks: the oslo.messaging team is experiencing frequent failures of the same 3 tempest tests: http://status.openstack.org/openstack-health/#/job/gate-oslo.messaging-src-dsvm-full-zmq12:49
sdaguemordred: 4 seconds for a git operation does not seem completely out of bounds12:49
*** devkulkarni has joined #openstack-infra12:49
kgiustisimilarish to bug: https://bugs.launchpad.net/openstack-gate/+bug/144913612:49
openstackLaunchpad bug 1449136 in OpenStack-Gate "OpenStack pypi mirrors disconnecting connections" [Undecided,New]12:49
*** matt-borland has joined #openstack-infra12:49
kgiustisame failures, but not against pypi host but against localhost http server12:50
kgiustiknown issue?12:50
*** bswartz has joined #openstack-infra12:50
*** ociuhandu has joined #openstack-infra12:50
*** itisha has quit IRC12:50
sdaguekgiusti: I think we're feeding it the icon on git.openstack.org for the http image registration12:51
sdagueso that really means git.openstack.org is dropping requests12:51
*** devkulkarni has quit IRC12:53
*** devkulkarni has joined #openstack-infra12:54
mordredsdague: I think a consistent 4 seconds to check whether there are any new refs to pull in repos that should be no more than a day out of date is exceptionally long12:54
mordredsdague: that said - I have verified that the root pip caching works - so neat12:54
kgiustisdague: are the three failing tests the only ones that query git.openstack.org?   I ask because only those three tests consistently fail - all others have passed without incident.12:55
*** asettle has quit IRC12:55
mordredsdague: I don't think your patch is going to work, because install is going to want to build them, and we don't have the bindep dependencies installed at that point12:55
mordredsdague: if we want to prime the cache, using pip download I think may be better? but now I need to check if that also does cache things ...12:56
mordredyah. it does (just checked)12:57
*** ociuhandu has quit IRC12:57
jrolldtantsur: that's new to me12:57
rcarrillocruzo/12:58
rcarrillocruzi'm around today (yesterday was bank holiday in Spain)12:58
*** asettle has joined #openstack-infra12:58
sdaguemordred: pip download won't prime the cache12:58
mordredI just tested that it will12:58
sdagueI got a wildly smaller cache with it locally12:58
*** vikrant has quit IRC12:59
sdaguemordred: it will only try to build if the wheels aren't there, right?12:59
sdaguewe're hitting the wheel mirror with this, right?12:59
mordredmordred@camelot:~/src/openstack-infra/nodepool$ sudo -H pip install -d . paramz12:59
mordredCollecting paramz12:59
rcarrillocruzdoh12:59
mordred  Using cached paramz-0.6.1.tar.gz12:59
rcarrillocruzyolanda , mordred , pabelanger : http://paste.openstack.org/show/558377/12:59
mordredthat was the second time I ran it, after deleting the tarball from the local dir12:59
*** julim has joined #openstack-infra12:59
rcarrillocruzglean cruft on writing interfaces file12:59
rcarrillocruzbut yeah, i can deploy servers with bifrost13:00
rcarrillocruzi'll see what's up with glean13:00
Zarahm, should gerrit search autocomplete for stories and tasks now? aiui we need config to enable gerrit-updating-storyboard per project, as per the commit message here: https://review.openstack.org/#/c/347486/ but are tasks and stories now indexed in gerrit search?13:01
mordredsdague: http://paste.openstack.org/show/558378/13:01
rcarrillocruzhuh13:01
rcarrillocruzalso, the interface is set to dhcp, but should not13:01
*** apetrich has quit IRC13:02
mordredZara: I'm not sure they autocomplete - but https://review.openstack.org/#/q/bug:2000522 works13:02
openstackgerritEmmet Hikory proposed openstack-infra/storyboard-webclient: Add Worklists and Boards to About Page  https://review.openstack.org/35591213:02
mordredZara: so adding a bug:2000522 to the search finds the thing by story id13:03
*** xyang1 has joined #openstack-infra13:03
*** _ari_ has joined #openstack-infra13:03
sdaguemrodden: can you rm -rf ~/.cache/pip and try that again?13:04
*** javeriak has quit IRC13:04
*** woodster_ has joined #openstack-infra13:04
*** kbaegis has joined #openstack-infra13:05
Zaramordred: oh, aha. I thought it needed 'story:2000522' but that was probably just me misinterpreting the expected behaviour. found the docs now and they do say 'bug:' and 'tr:' so whoops.13:06
*** javeriak has joined #openstack-infra13:06
sdaguebecause when I use -d, my pip cache remains empty13:06
mordredsdague: sure13:06
sdaguewith pip 8.1.213:06
*** yamamoto has joined #openstack-infra13:07
toskysdague: now that devstack switched to neutron by default, how to enable nova-network in gate jobs (for a poor old Sahara job that I'd like to kill sooner than later)?13:07
*** yamamoto has quit IRC13:07
sdaguetosky: the gate doesn't really change, it's always had explicit service lists13:07
odyssey4meyolanda if you have a moment, a review of https://review.openstack.org/355434 would be appreciated13:07
*** andymaier has joined #openstack-infra13:08
mordredsdague: yes. it works13:08
yolandaodyssey4me, back from lunch, i'll take a look in a while13:08
*** ociuhandu has joined #openstack-infra13:08
Zara(yes, bug:$task_id will also find storyboard tasks, ace)13:09
*** sshnaidm|lnch is now known as sshnaidm13:09
*** devkulkarni has quit IRC13:10
*** lucas-hungry is now known as lucasagomes13:10
mordredsdague: http://paste.openstack.org/show/558383/13:10
penguinologHello! Could anybody help with https://review.openstack.org/#/c/355382/ - it's a blocker for the parallel team13:10
*** edmondsw has joined #openstack-infra13:11
*** lifeless has quit IRC13:11
persiaZara: Do we run any risk of collision between LP bug# and SB task#?  There's a gap in SB stories to avoid LP bugs, but I don't think there is one for tasks.13:11
*** mriedem has joined #openstack-infra13:12
*** Julien-zte has joined #openstack-infra13:12
*** andymaier has quit IRC13:13
mordredsdague, odyssey4me: I tested git remote operations on an osic node and they all took less than a second as expected ( doing git remote update origin pointed at git.o.o)13:13
mordredso it doesn't seem to be routing issues13:13
*** javeriak has quit IRC13:14
Zarapersia: yes, I think so, though just when searching for them. so if two commits pop up when someone searches, it should be fairly quick to find the right one since I'd imagine they'd be about totally different things.13:14
toskysdague: I see, thanks13:15
*** nmagnezi has quit IRC13:15
*** apetrich has joined #openstack-infra13:15
persiaZara: I was worried more about comments being posted to unrelated stories that might trigger email as a result of subscriptions.  Maybe I lack context.13:15
sdaguemordred: pip --version?13:15
Zarapersia: ah, that's a separate thing. this is just for searching things in gerrit. the plugin should use storyboard-specific syntax in the commit message13:18
azvyagintsev_hFolks, could you please suggest how I should fix the templates in https://review.openstack.org/#/c/353861/17..18/zuul/layout.yaml ? If I remove the check/gate section, the test fails ;(13:18
persiaAh, cool.   I was missing context :)13:18
mordredsdague: pip 8.1.2 from /usr/lib/python2.7/site-packages (python 2.7)13:19
*** spzala_ has joined #openstack-infra13:19
sdaguemordred: ok, well13:19
mordredsdague: I'm not sure why it's not working for you - but I think it has the better chance of working, since we know install won't work13:20
*** ianychoi has quit IRC13:20
Zara(so 'closes-bug: $id' will close a lp bug, 'task: $id' will affect sb task status; both are searchable in gerrit with 'bug:$id'. so in practice I think the tricky bit will be that we'll probably see people using lp notation to try to change sb task status, but that's one for the future)13:20
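A small sketch of the footer convention Zara describes. It assumes exactly the two spellings mentioned (`Closes-Bug: <id>` for Launchpad, `Task: <id>` for StoryBoard) and case-insensitive matching; `footer_effects` and `FOOTER_RE` are hypothetical names, not jeepyb's or the Gerrit plugin's actual parsing code. It also shows that both footers map to the same `bug:<id>` Gerrit search term, which is why a Launchpad bug and a StoryBoard task with the same number can both turn up in one search.

```python
import re

# Hypothetical footer pattern; the real jeepyb / Gerrit plugin parsing may differ.
FOOTER_RE = re.compile(r"^(closes-bug|task):[ \t]*(\d+)[ \t]*$",
                       re.IGNORECASE | re.MULTILINE)


def footer_effects(commit_message):
    """Map each recognised footer to (tracker, id, gerrit_search_term)."""
    effects = []
    for key, num in FOOTER_RE.findall(commit_message):
        tracker = "launchpad" if key.lower() == "closes-bug" else "storyboard"
        # Both trackers share the same Gerrit search operator.
        effects.append((tracker, int(num), "bug:" + num))
    return effects


msg = "Fix sidebar\n\nCloses-Bug: 1449136\nTask: 2000522\n"
print(footer_effects(msg))
# [('launchpad', 1449136, 'bug:1449136'), ('storyboard', 2000522, 'bug:2000522')]
```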
*** zhurong has joined #openstack-infra13:21
*** cdent has left #openstack-infra13:22
sdaguemordred: this is what I get - http://paste.openstack.org/show/558386/13:22
mordredsdague: you need to do find /root/.cache/pip13:22
mordrednot /home/sdague13:23
sdagueI'm not running as root13:23
*** kushal has quit IRC13:23
mordredhrm. weird. try as root and see if you get my behavior?13:23
mordred(since that's the important one for this)13:23
mordredsdague: got it13:25
mordredsdague: adding the index prevents the cache13:25
openstackgerritOleksii Zamiatin proposed openstack-infra/project-config: Remove n-net related gates  https://review.openstack.org/35591913:25
sdaguemordred: gah, really?13:25
mordredsdague: yup13:25
openstackgerritMerged openstack-infra/project-config: Implement LXD hypervisor experimental check  https://review.openstack.org/35543413:25
sdagueso that means this doesn't work at all because we're using alternative indexes?13:25
mordredit will neither download to the cache or use things in the cache13:25
mordredyah13:25
mordredat least, according to my test just now13:26
mordredI haven't poked more extensively13:26
openstackgerritMerged openstack-infra/project-config: fuel-qa: stable-mu branches for maintenance and stable for upgrades  https://review.openstack.org/35538213:26
sdagueso... we know that's not entirely true during runs, because we definitely only download each package once13:27
mordredweird13:27
mordredwell, local testing with your command line resulted in nothing being cached13:27
sdagueyeh, install with index still builds the cache13:28
sdagueit's just download that doesn't13:28
mordredsigh13:28
mordredthat seems like a pip bug13:28
mordredoh - so - this is going to run during image build13:28
*** lifeless has joined #openstack-infra13:28
*** rbuzatu_ has joined #openstack-infra13:28
mordredwhich means it should be hitting pypi, not pip mirrors13:29
mordredyeah?13:29
yolandarcarrillocruz, can it be a race? i ran glean several times on my environment and i get good results13:29
mordredor do we set it to use the dfw mirror during image builds (/me can't remember)13:29
sdaguemordred: that actually will defeat the purpose of the patch if it does13:29
mordredwhy?13:29
mordredit's during image build - it'll download and cache the things using download. then, during devstack run, the cache will be populated and the install command will be using install so it should read the cache13:30
sdaguebecause if we hit pypi and download, then we'll get numpy as source13:30
sdaguewhich means we have to spend 4 minutes compiling it on the node13:30
mordredoh. right. bother13:30
mordredso - I guess we just have to get download honoring caches13:30
*** rlandy is now known as rlandy|mtg13:31
mordreddstufft: ^^ whence you awaken ... tl;dr pip download with -i option for an alternate index does not populate or consume cache. pip install with -i option does13:32
*** andymaier has joined #openstack-infra13:32
*** ianychoi has joined #openstack-infra13:32
mordredthat said - I do not believe we point at pip mirrors until the image boots13:32
*** rbuzatu has quit IRC13:33
mordredso we'd also want to explore setting a mirror location during image build13:33
fungimordred: does changing the mirror url after boot still pose a problem?13:34
fungiif so, we're sort of stuck unless we want to build different images for every provider/region13:34
*** inc0 has joined #openstack-infra13:35
mordredfungi: no, I do not believe it does13:35
fungiso if we, say, set it to the dfw pypi mirror before caching packages, then we can update it to a different mirror later and it'll still use the cache?13:36
*** esberglu has joined #openstack-infra13:36
jrollmordred: when you have a sec, this looks like a similar thing you were looking at yesterday, is it just a timeout or something else? http://logs.openstack.org/55/328955/15/check/check-osc-plugins/177080d/_zuul_ansible/ansible_log.txt . no errors in the console log http://logs.openstack.org/55/328955/15/check/check-osc-plugins/177080d/console.html13:37
sdaguefungi: ug, you might be right13:38
*** rbuzatu_ has quit IRC13:38
mordredjroll: yah - 2016-08-16 11:33:55,481 p=6961 u=zuul |  fatal: [node]: FAILED! => {"async_result": {"ansible_job_id": "344516947230.6884", "changed": false, "finished": 0, "invocation": {"module_args": {"jid": "344516947230.6884", "mode": "status"}, "module_name": "async_status"}, "started": 1}, "changed": false, "failed": true, "msg": "async task produced unparseable results"}13:39
fungisdague: mordred: i mean, maybe we can "transform" the cache when we reset the mirror url, as an alternative. though that's getting into implementation details of pip's cache that probably aren't a guaranteed stable api13:39
mordredjroll: that just ran a couple of hours ago, didn't it?13:39
jrollmordred: looks like it, yeah13:40
mordredfungi: I think it's caching by content hash, not by name13:40
*** nwkarsten has joined #openstack-infra13:40
sdaguemordred: I'm not so sure13:40
fungioh, so if the content hash is consistent (which it would be across our mirrors unless there's an update) then we might be fine13:40
fungibut if it mixes other data into that hash, like the url or something, then that gets tricky13:41
mordredsdague: pip install with a -i doesn't cache for me with install either13:41
*** hichihara has joined #openstack-infra13:41
fungihuh. apparently crowbar is still under active development? just saw a cve request to the oss-security ml because they were setting a known default admin account password in it13:42
mordredfungi: yah - it's the basis of rob's current company13:43
fungioic13:43
*** zhurong has quit IRC13:43
*** markusry has joined #openstack-infra13:43
sdagueanyway, I need to get back to release things. Once dstufft is up he can probably just tell us all our silliness instead of us guessing13:43
jrollmordred: so "yah" meaning "yah that is similar" or "yah that is a timeout" or? :) looking for something actionable I can do here13:43
Shrewsmordred: that ansible error... looks like a genuine timeout13:43
mordredsdague: http://paste.openstack.org/show/558391/13:43
sdaguemy quick git grepping in pip source isn't finding payload13:43
Shrewsmordred: TASK [zuul_runner with 1547 second timeout]13:43
jrolloh wait, timestamps13:44
* jroll feels dumb13:44
*** rbuzatu has joined #openstack-infra13:44
fungiShrews: so the behavior i was seeing in various job logs yesterday were legitimate timeouts ending with an ansible json parse failure13:44
*** zhurong has joined #openstack-infra13:44
*** ramishra has quit IRC13:44
mordredfungi: yah. that's what we fixed yesterday13:44
sdaguemordred: you delete the venv13:44
*** dizquierdo_afk is now known as dizquierdo13:44
Shrewsfungi: yeah, i can't explain why a timeout causes that13:44
mordredsdague: I do - then I re-make it13:44
sdagueah, right13:44
*** ramishra has joined #openstack-infra13:44
jrollShrews: so this is just pip being super slow, I guess13:44
fungias if one of the things ansible was failing to parse was the json coming from jobs that timed out, not that ansible was responsible for the timeout13:44
Shrewsjroll: probably?13:45
sdaguemordred: can you do that without the ^C?13:45
mordredsdague: sure13:45
jrollShrews: all that job does, if you look at the console, is install a bunch of OSC plugins13:45
jroll:)13:45
Shrewsfungi: yeah, i suspect nothing is written to the async file if the job doesn't finish, thus the unparseable13:45
fungiempty != json13:45
fungiindeed13:45
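A minimal Python sketch of the failure mode discussed above, assuming (as Shrews suspects, not confirmed in this log) that a job which never finishes leaves its async status file empty. `read_async_status` is a hypothetical helper, not Ansible's code; it only shows that `json.loads("")` raises, and how a reader could report that case as a timeout instead of "unparseable results".

```python
import json


def read_async_status(raw: str) -> dict:
    """Hypothetical reader for an async job status file.

    An empty file is assumed to mean the job never wrote a result
    (e.g. it was still running when the timeout hit), so report it as
    unfinished instead of letting json.loads raise on the empty string.
    """
    if not raw.strip():
        # empty != json: json.loads("") would raise json.JSONDecodeError
        return {"finished": 0, "failed": True,
                "msg": "timed out before writing results"}
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"finished": 0, "failed": True,
                "msg": "async task produced unparseable results"}


if __name__ == "__main__":
    print(read_async_status(""))
    print(read_async_status('{"finished": 1, "rc": 0}'))
```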
fungijroll: did that run in rax-ord?13:46
jrollfungi: nope, osic http://logs.openstack.org/55/328955/15/check/check-osc-plugins/177080d/console.html13:46
mordredShrews, fungi: hrm. that would be annoying13:46
pabelangercloudnull: I updated ubuntu-xenial in osic-cloud1 and confirmed DNS is running on ipv6.  Other images will be updated today13:46
fungijroll: oh, okay. we did just up the quota significantly there... lemme check a few things13:46
Shrewsmordred: we *might* be able to recognize the timeout in ansible and write empty json to solve that13:46
mordredShrews: there is an if/else case that I thought was related to timeout13:47
* Shrews looks13:47
jrollfungi: cool, I'm going to recheck that unless you think there's reason not to13:47
jrollI guess we had two in a row, though13:47
fungiunfortunately https://review.openstack.org/355580 hasn't merged yet, so we're going to have a relatively hard time figuring out if we're taxing that mirror13:48
mordredShrews: line 603 in lib/ansible/executor/task_executor.py13:48
jrollboth osic cloud13:48
Shrewsmordred: ah, it still depends on 'parsed' being there13:48
Shrewswhich it won't be if it didn't actually finish13:48
fungijroll: might only be coincidence, but i'll get our mirror there into cacti in moments and see what else i can find in the meantime13:48
*** kushal has joined #openstack-infra13:49
jrollfungi: cool, thank you :)13:49
mordredShrews: why not though? the async_runner should be the thing writing the status to the file13:49
Shrewsmordred: apparently it isn't. 'parsed' is not in the output you just pasted in channel13:49
rcarrillocruzso yeah13:52
rcarrillocruzfungi: we are bifrosting13:52
rcarrillocruzi just redeployed a server with paul via screen session now13:53
fungircarrillocruz: rock on! that's awesome news13:53
*** tonytan4ever has joined #openstack-infra13:53
*** permalac has joined #openstack-infra13:53
mordred\o/13:54
rcarrillocruzwe needed several bifrost fixes13:54
rcarrillocruzand i spotted a couple glean things13:54
rcarrillocruzwe'll go thru in a bit13:54
pabelangermordred: fungi: clarkb: Would love some feedback on: https://review.openstack.org/#/c/355695/ to fix some launch node failures with ubuntu-xenial13:54
* rcarrillocruz goes for coffee now13:54
*** burgerk has quit IRC13:54
*** vikrant has joined #openstack-infra13:55
*** infra-red has quit IRC13:55
azvyagintsev_hfungi  craige Folks, could you please suggest how i should fix the templates in https://review.openstack.org/#/c/353861/17..18/zuul/layout.yaml ?  if i remove the check\gate section, the test fails ;(13:55
*** yamahata has joined #openstack-infra13:55
mordredpabelanger: wow13:55
*** infra-red has joined #openstack-infra13:56
mordredpabelanger: just out of idle curiosity - (patch looks fine) - I wonder if we could get the ssh daemon to not start until unbound is started13:56
pabelangermordred: yes, I thought of that too. I haven't looked into that yet13:56
fungipabelanger: speaking of xenial, snmpd won't start on firehose01... i suspect we need to tweak our config for it13:57
pabelangermordred: as for the problem: http://paste.openstack.org/show/557771/ I _think_ unbound is waiting for random to initialize in the kernel, before doing things with its root.key13:57
*** vikrant has quit IRC13:57
*** markusry has quit IRC13:57
*** markusry has joined #openstack-infra13:57
pabelangerfungi: sounds like we need to get an etherpad to track our xenial issues13:57
*** thorongil has quit IRC13:58
fungircarrillocruz: looks (from all the sudospam i've received) that baremetal00 is having trouble resolving its own hostname. may need /etc/hosts fixed?13:58
*** rbrndt has joined #openstack-infra13:59
fungircarrillocruz: or maybe you already fixed that... last entry i have was 07:47:53 utc13:59
*** pgadiya has quit IRC14:00
*** jimbaker has quit IRC14:00
*** nmagnezi has joined #openstack-infra14:01
*** markusry has quit IRC14:02
*** andymaier has quit IRC14:02
*** bin_ has joined #openstack-infra14:02
*** jistr is now known as jistr|debug14:03
*** rlandy|mtg is now known as rlandy14:04
*** jimbaker has joined #openstack-infra14:04
*** jimbaker has quit IRC14:04
*** jimbaker has joined #openstack-infra14:04
rcarrillocruzfungi: https://review.openstack.org/#/c/355778/14:05
*** zhurong has quit IRC14:06
rcarrillocruzfungi: essentially, the install playbook on bifrost hardcodes /etc/hostname on 127.0.0.114:06
*** zhurong has joined #openstack-infra14:06
rcarrillocruzwhich breaks fqdn resolution14:06
rcarrillocruzand breaks puppet apply runs14:07
rcarrillocruzother thing i've noticed is that puppet sets /etc/resolv.conf to nameserver 127.0.0.1, not sure if that's some unbound thing on the node declaration14:07
rcarrillocruzpabelanger: ^14:07
*** yamamoto has joined #openstack-infra14:07
*** hichihara has quit IRC14:08
fungircarrillocruz: yeah, that's because we run unbound on all our servers to provide a local resolver cache14:08
rcarrillocruzthat's a problem, since baremetal00 runs dnsmasq itself14:09
fungioh, so port conflict i guess14:09
rcarrillocruzpossibly, although i see unbound set to false on the node declaration14:09
rcarrillocruzO_O14:09
rcarrillocruzi'll wait for paul, i remember he changed the unbound setting on this node for a reason14:10
fungircarrillocruz: likely we need to specify a remote resolved if we're not installing unbound14:10
fungier, remote resolver14:10
rcarrillocruzi set by hand /etc/resolv.conf to 8.8.8.8 :/14:10
*** armax has joined #openstack-infra14:10
fungipabelanger: digging into the snmpd issue, the commands in our initscript seem fine, but apparently it's not used because there's a systemd unit for it which takes precedence14:10
* fungi blames lennart14:10
rcarrillocruzbut yeah, conflicts with puppet, since that changes it back to 127.0.0.1 which breaks install playbook that pulls things from IntarWeb14:11
rcarrillocruzit can't resolve14:11
mtreinishpleia2: I thought I moved everything to use others.html now14:11
pleia2mtreinish: shrug14:11
*** _ari_ has quit IRC14:11
* rcarrillocruz is procrastinating learning of systemd14:11
mtreinishpleia2: ugh, no the template for integrated gate and the output file look like it's other.html still14:13
*** yamamoto has quit IRC14:13
pleia2mtreinish: well, at least it generates now, just some final tweaks to tidy this up then14:14
*** tqtran has joined #openstack-infra14:14
jeblairrcarrillocruz: why is baremetal00 running dnsmasq as a nameserver?14:15
rcarrillocruzjeblair: it's what it uses to pxe boot servers14:15
*** pgadiya has joined #openstack-infra14:16
rcarrillocruzbifrost that is14:16
*** xarses has quit IRC14:16
jeblairrcarrillocruz: why is a name server needed for that?14:16
rcarrillocruzit's a bifrost dependency14:16
rcarrillocruzit's also used as a pxe/tftp server, not just a nameserver14:17
*** jaosorior has quit IRC14:17
jeblairyeah, the parts that are not a dns resolver make sense.  i'm just wondering why it's also configured as a dns resolver14:17
rcarrillocruzif you want historical reasons why it was decided to be used , TheJulia may be best to answer that14:17
jeblairi'm not sure i'm stating the question in a way that is conveying my meaning14:17
fungiis it possible to use it without having it serve as a dns resolver?14:17
TheJuliait is, the configuration just needs to be disabled for dns resolution14:18
*** asettle has quit IRC14:18
TheJuliathat is in what is put in place for dnsmasq's main config file14:18
fungithat way it wouldn't conflict with the local resolver cache service we want to run on the same machine14:18
*** tqtran has quit IRC14:18
*** asettle has joined #openstack-infra14:18
jeblairdnsmasq is a server which supports lots of protocols.  do we need to use its dns resolver as opposed to just the other (pxe/tftp) bits?14:18
rcarrillocruzis there a flag available to disable it ?14:19
rcarrillocruzTheJulia: ^14:19
rcarrillocruzthe dns part14:19
TheJuliarcarrillocruz: in bifrost, not presently14:19
rcarrillocruzwhat i thought14:19
rcarrillocruzi mean, it wouldn't be complex to push14:19
*** edtubill has joined #openstack-infra14:19
rcarrillocruzs/push/patch14:19
TheJuliano, it should be extremely simple14:19
rcarrillocruzTheJulia: do nodes need any dns resolving from the bifrost controller during the IPA loading etc14:20
rcarrillocruz?14:20
*** jcoufal has joined #openstack-infra14:20
rcarrillocruzif DNS was never needed, i'm curious why it wasn't just disabled from the beginning14:20
*** asselin has joined #openstack-infra14:21
TheJuliarcarrillocruz: if someone decides to use names in the config handed to ironic in terms of URLs, then dns resolution is required14:21
TheJuliabut if only IPs are used, then it is not required14:21
rcarrillocruzah14:21
*** gongysh has joined #openstack-infra14:21
rcarrillocruzhmmm14:21
jeblairwe *do* have a dns resolver :)14:21
TheJuliaWell, there you go, the correct dns resolver just needs to be offered out for dhcp requests then14:22
*** devkulkarni has joined #openstack-infra14:22
jeblair{% if disable_dnsmasq_dns %}14:22
jeblairbifrost may already have the option :)14:22
openstackgerritMatthew Treinish proposed openstack-infra/elastic-recheck: Make everything plural  https://review.openstack.org/35596714:23
rcarrillocruzdoing ironic node-show blah, i only see IPs there14:23
rcarrillocruzso i think we should be good14:23
*** asettle has quit IRC14:23
*** edtubill has quit IRC14:23
*** _ari_ has joined #openstack-infra14:23
*** hongbin has joined #openstack-infra14:25
fungianybody happen to know how to get systemd to tell you the location on disk of the unit it's using for a particular service?14:28
openstackgerritRicardo Carrillo Cruz proposed openstack-infra/puppet-infracloud: Disable DNS resolver on Bifrost dnsmasq server  https://review.openstack.org/35597314:29
rcarrillocruzjeblair: ^14:30
rcarrillocruzfungi: ^14:30
*** tosky has quit IRC14:30
*** edtubill has joined #openstack-infra14:31
*** zz_dimtruck is now known as dimtruck14:31
openstackgerritIvan Udovichenko proposed openstack-infra/project-config: Add new/update existing projects  https://review.openstack.org/34704714:32
*** tosky has joined #openstack-infra14:33
azvyagintsev_hfungi will you have some time to help me with https://review.openstack.org/#/c/353861 ?  i cannot get where i miss those stuff..(14:34
fungijust a heads up, we're discussing some job failures in #openstack-qa that look like they're a result of remote http(s) calls out of bluebox are failing with some consistency14:34
*** rajinir has joined #openstack-infra14:34
openstackgerritIvan Udovichenko proposed openstack-infra/project-config: Add new/update existing projects  https://review.openstack.org/34704714:34
fungiazvyagintsev_h: no clue what you're talking about, or why you're asking me directly. can you elaborate?14:34
*** mdrabe has quit IRC14:34
fungiazvyagintsev_h: is it related to something i was already working on?14:34
*** mdrabe has joined #openstack-infra14:35
rajinirfungi: The cell patch was reverted https://review.openstack.org/#/c/355599/114:35
*** jed56 has quit IRC14:35
*** asettle has joined #openstack-infra14:35
fungirajinir: the patch which was causing the failure you were seeing, right?14:35
rajinirfungi: yes14:35
fungii didn't follow that very closely, just saw it was also severely impacting nova cells tests in the upstream ci as well14:36
*** pgadiya has quit IRC14:36
*** burgerk has joined #openstack-infra14:37
azvyagintsev_hfungi i guess no:) should i directly ask\wait Craig\someone else ? (i'm asking you just because you are guru )14:37
*** amitgandhinz has quit IRC14:38
*** links has quit IRC14:39
*** amitgandhinz has joined #openstack-infra14:39
*** kushal has quit IRC14:40
openstackgerritIlya Shakhat proposed openstack-infra/project-config: Add new project "os-failures"  https://review.openstack.org/35581914:40
betherlyhi there! getting ready to release ironic-ui. i have a patch for openstack-releases but do i also need to tag the release?14:40
*** kushal has joined #openstack-infra14:41
betherlythe route for releasing eslint was quite different from what i did with the ironic-ui last time so got a bit confused re what i need to do this time round14:41
fungibetherly: you probably want to ask in #openstack-release but i believe for projects under release management you submit a patch to the releases repo and then a release manager runs a script after it merges and pushes a tag for you14:41
*** xarses has joined #openstack-infra14:42
betherlyah sorry fungi!!14:42
betherlythat would make sense! thank you so much :)14:42
fungibetherly: you're welcome14:42
*** florianf has quit IRC14:44
jeblairfungi: 355580 was killed by the problem it attempts to debug14:44
Shrewsfungi: so, hopefully this change will make those ansible timeouts actually be reported as timeouts and not unparseable: https://github.com/ansible/ansible/pull/1710414:44
openstackgerritMatthew Treinish proposed openstack-infra/elastic-recheck: Add query for bug 1613749  https://review.openstack.org/35598814:44
openstackbug 1613749 in OpenStack-Gate "Git timeouts from bluebox" [Undecided,New] https://launchpad.net/bugs/161374914:44
beaglesrcarrillocruz, is https://review.openstack.org/#/c/355973/ supposed to help with some of the jobs failing because of stuff like slow git repo cloning, etc?14:45
beaglesjust seeking clarification as the issue is atm near and dear to my heart :)14:45
jeblairbeagles: no; but the ansible fix from yesterday was not sufficient14:45
jeblairbeagles:  https://github.com/ansible/ansible/pull/1710414:46
rcarrillocruzbeagles: no, that change is unrelated14:46
beaglesrcarrillocruz, thanks14:46
beaglesalso jeblair thanks14:46
jeblairbeagles: so the next iteration of that fix is in progress.  it hasn't landed in ansible yet, but when it does, we'll redeploy14:47
jeblairbeagles: however, it's looking like most of the instances of this error are actually timeouts -- did you say you were thinking that was the case with your job?14:47
rcarrillocruzthat change is about infracloud, a pool of servers we'll manage to run a cloud for CI14:47
Shrewsjeblair: to be fair, that PR will just (hopefully) get timeouts reported as timeouts. it doesn't actually fix the timeout14:47
jeblairShrews: right :)14:47
beaglesjeblair, yeah, I was just going to say.. the parsing thing is just how the info was represented - it's the timeouts I'm wondering about14:48
beaglesnot the timeouts actually by *why* those particular things are taking so long :)14:49
beaglesjeblair, ultimately, I want the ansible fix to be unnecessary :)14:49
fungiazvyagintsev_h: explaining in channel what you're trying to figure out and what potential issues you've eliminated already is usually a faster way to get help, rather than just pasting a link. i've skimmed the change and it seems you're proposing creation of a new project/repo but are having trouble with the layout job. the console log from it indicates you're trying to configure zuul to run jobs you14:49
fungihaven't defined (e.g., gate-murano-pkg-check-python27-ubuntu-trusty). i see a typo which i've marked inline on your change that would account for it14:49
*** dprince has quit IRC14:49
*** pt_15 has joined #openstack-infra14:50
*** jistr|debug is now known as jistr14:50
jeblairmordred: your 4-minute git thing was on osic?14:51
sdaguejeblair: 4 seconds, right?14:52
jeblairsdague: those are different than minutes?  :)  yeah, 4-something.  i guess i'm asking what that value is too.  i may be confused because i'm staring at a log that took 5 minutes for each remote update.14:53
jeblairon osic.14:54
sdaguejeblair: I thought he said seconds14:54
sdague4 minutes would be an issue, I agree14:54
jeblairsdague: a full clone of git://git.openstack.org/openstack/python-aodhclient took 1 second, so even 4 seconds is :(14:55
*** tongli has joined #openstack-infra14:56
*** vinaypotluri has joined #openstack-infra14:56
*** Julien-zte has quit IRC14:57
* jeblair fixes he.net tunnel14:57
*** permalac has quit IRC14:58
*** florianf has joined #openstack-infra14:58
*** jimbaker has quit IRC14:59
jeblairtelnet 2001:4800:1ae1:18:f816:3eff:fe13:4660 188514:59
jeblairga14:59
jeblairthat's even the wrong port14:59
mordredjeblair: yes. 4 minutes14:59
jeblairmordred: not seconds? :)14:59
mordred:)14:59
*** yaume has joined #openstack-infra15:00
mordrednope. 4 minutes :)15:00
jeblairmordred: osic?15:00
openstackgerritAleksey Zvyagintsev proposed openstack-infra/project-config: Add repo for murano-pkg-check. Murano package validator tool.  https://review.openstack.org/35386115:00
openstackgerritDarragh Bailey proposed openstack-infra/jenkins-job-builder: Allow using lockfile per jenkins master  https://review.openstack.org/29363115:00
mordredjeblair: yup15:01
openstackgerritAleksey Zvyagintsev proposed openstack-infra/project-config: Add repo for murano-pkg-check. Murano package validator tool.  https://review.openstack.org/35386115:01
*** amitgandhinz has quit IRC15:02
*** amitgandhinz has joined #openstack-infra15:02
wznoinskhi infra15:03
wznoinsket al15:03
*** jimbaker has joined #openstack-infra15:03
*** jimbaker has quit IRC15:03
*** jimbaker has joined #openstack-infra15:03
openstackgerritAleksey Zvyagintsev proposed openstack-infra/project-config: Add repo for murano-pkg-check. Murano package validator tool.  https://review.openstack.org/35386115:03
wznoinskdid anyone see a situation where in static-network-up is emitted earlier than all the interfaces get their IPs and their /run/network/ifup.* get created?15:04
jeblairmordred, sdague: oh, huh, it's not every job on osic.  i just watched one breeze right through a git clone.15:04
*** ifarkas is now known as ifarkas_afk15:04
wznoinskthat's ubuntu 14.04, troubleshooting cloud-init init kicking off too early (before the network is actually up)15:04
*** dizquierdo is now known as dizquierdo_afk15:05
mordredjeblair: yah - I jumped on an osic node earlier and tried some manual updates and they worked as expected15:05
mordredjeblair: I have not yet been able to find the pattern15:05
jeblairgrr.15:06
pabelangermordred: jeblair: fungi: So, here is the boot process on ubuntu-xenial visualized: http://imgh.us/filename_3.svg check out unbound15:06
pabelanger1min 155ms to start15:06
pabelangerI don't know why yet15:06
*** martinkopec has quit IRC15:06
openstackgerritEmmet Hikory proposed openstack-infra/storyboard-webclient: Add Worklists and Boards to About Page  https://review.openstack.org/35591215:06
jeblaircloudnull: if we collect instance ids from jobs which have very slow interactions with our git farm, can you correlate those and see if there is a host/network pattern on the cloud side?15:07
pabelangerit does look like it is waiting for random15:07
fungispeaking of osic, it looks like we also have devstack jobs failing there because glance isn't responding on 127.0.0.1:9292 when told to listen on 0.0.0.0:9292 (baffling)15:07
fungiand i notice traceroute6 out from job nodes there to git.o.o coming back blank15:07
jeblairpabelanger: is that first or second boot?15:08
*** mhickey has joined #openstack-infra15:08
*** itisha has joined #openstack-infra15:09
jeblairfungi: i just did 'traceroute6 git.openstack.org' from a node (which ran a job where git worked fine) and got data15:09
jeblairfungi: so maybe on the nodes where git takes 4+ minutes for each operation, traceroute6 git.o.o also fails?15:09
*** hockeynut has joined #openstack-infra15:10
openstackgerritAleksey Zvyagintsev proposed openstack-infra/project-config: Add repo for murano-pkg-check. Murano package validator tool.  https://review.openstack.org/35386115:10
pabelangerjeblair: in this case, 2nd boot (I disabled the puppet service on boot). I can redo on first boot if needed15:10
fungijeblair: perhaps. here's one log we were looking at with the localhost glance weirdness http://logs.openstack.org/55/352455/3/gate/gate-tempest-dsvm-cells/7881266/console.html#_2016-08-16_12_27_19_83089715:10
jeblairpabelanger: nah, that's okay.  2nd is more interesting to me.15:10
*** vhosakot has joined #openstack-infra15:13
*** devkulkarni has quit IRC15:13
*** jcoufal has quit IRC15:13
*** devkulkarni has joined #openstack-infra15:13
openstackgerritPaul Belanger proposed openstack-infra/system-config: Disable puppet service on boot  https://review.openstack.org/35600415:13
pabelangerdisabled puppet service on boot^15:14
*** hockeynu_ has joined #openstack-infra15:15
pabelangerjeblair: I've updated our configure_mirror.sh (355695) to better handle the delayed dns on ubuntu-xenial, since we have a large number of launch failures because of it.  This could also explain why the ubuntu-trusty ready node count is much higher than ubuntu-xenial during the day15:15
*** devkulkarni has quit IRC15:16
*** devkulkarni has joined #openstack-infra15:16
jeblairpabelanger: does that always happen, or just sometimes?15:16
jeblairpabelanger: i wonder if it's the same problem as git.15:17
*** Goneri has quit IRC15:17
pabelangerjeblair: Yes, I've also see it on multiple clouds, osic-cloud1 and bluebox, in sampling ubuntu-xenial syslogs15:17
*** dprince has joined #openstack-infra15:18
jeblairpabelanger: oh, so not just osic15:18
pabelangerlet me check others quickly15:18
pabelangerjeblair: right15:18
*** jcoufal has joined #openstack-infra15:18
*** hockeynut has quit IRC15:18
clarkbpabelanger: and that is with the pre ipv6 resolver config right? (that hasn't gotten onto our images yet)15:18
*** Goneri has joined #openstack-infra15:18
jeblaircloudnull: is there any commonality between instances a4b575fe-b043-4775-9d8e-286c04f03a9f and 9b3bf68f-b08b-4851-a22a-d2f6a5247982 ?15:19
pabelangerclarkb: no, I built and uploaded a ubuntu-xenial image to osic last night, same issue15:19
jeblaircloudnull, pabelanger: i don't think we actually needed the ipv6 resolver change -- osic has a 6 to 4 gateway15:19
jeblairclarkb: ^15:19
*** markusry has joined #openstack-infra15:19
jeblairshouldn't hurt15:20
clarkbjeblair: correct we don't need it for stuff to function properly. Just trying to make sure this isn't somehow a regression related to that change15:20
*** markusry has quit IRC15:20
fungiyeah, we don't _need_ it for osic, but we do need it in case we end up with a provider with no ipv4 routing at all for our job nodes in the future15:20
fungiso not urgent, but not entirely useless15:20
*** mtanino has joined #openstack-infra15:20
mtreinishjeblair: fwiw, we're tracking 2 failures on the tempest glance tests. One on osic and the other on bluebox15:20
mtreinishit's the same tests, but they manifest a little differently15:21
pabelangerclarkb: jeblair: that is from internap http://imgh.us/filename_4.svg15:22
pabelangerI have a change out to disable puppet on boot15:23
pabelanger35600415:23
*** derekh has quit IRC15:24
clarkbpabelanger: how long does it take if you stop the unbound service then start it?15:24
*** oanson has quit IRC15:24
clarkbis this only present on boot or any time the service starts?15:24
openstackgerritRicardo Carrillo Cruz proposed openstack-infra/puppet-infracloud: Switch to infra-cloud-bridge element  https://review.openstack.org/35600915:24
pabelangerclarkb: after it has started properly, restarts are instant15:24
fungii have a feeling it's generating a local key for dnssec at first start15:24
*** ccamacho is now known as ccamacho|out15:24
pabelangerright, I think that too15:25
fungithough why it takes that long to do so is worth asking15:25
openstackgerritMerged openstack-infra/system-config: Add mirror.regionone.osic-cloud1.o.o to cacti  https://review.openstack.org/35558015:25
fungihaveged starts well before udevd according to that visualization15:25
rcarrillocruzsome entropy delay ^15:25
pabelangerclarkb: http://paste.openstack.org/show/557771/ is always the order when unbound starts processing15:25
clarkb"With cache restoration turned on, my system reboot would take forever, because of unbound hanging/processing a maybe corrupt cache-file." is from a random pfsense forum post15:25
pabelangerrcarrillocruz: likely ^15:25
fungier, i mean haveged starts well before unbound15:26
fungioh, that's worth checking15:26
pabelangerclarkb: So....15:26
pabelangerthere is some chroot logic in unbound too15:26
clarkbpabelanger: apparently we can turn off cache restoration which should be fine for our single use nodes15:26
clarkb(if that is indeed related)15:26
pabelangerokay, I can try that15:27
*** tongli has quit IRC15:27
pabelangerany docs on how to do that?15:27
*** tongli has joined #openstack-infra15:27
clarkbnot seeing it in the unbound.conf man page15:28
* clarkb keeps digging15:28
*** esikachev has quit IRC15:29
fungiyeah, i've been through the manpages for unbound, unbound.conf and unbound-control so far, to no avail15:30
pabelangerI think it is a manual process15:30
jeblairfungi, pabelanger: i thought this graph was second boot?15:31
clarkbpabelanger: ya looks like its part of unbound-control so would probably be part of the unit files if being done on ubuntu15:31
pabelangerjeblair: first svg was 2nd boot, second svg was first boot15:32
jeblairpabelanger: either way, it's a long startup both times, ya?15:32
*** matrohon has quit IRC15:32
jeblairpabelanger: do you have a node where this is slow?15:33
jeblairi just restarted unbound on a xenial node and it was fast15:33
pabelangerjeblair: yes, 2001:4800:1ae1:18:f816:3eff:fe8e:9a3e15:33
pabelangerI manually launched that in osic-cloud115:34
pabelangerjeblair: feel free to reboot if needed15:34
jeblairpabelanger: cool, thanks15:34
clarkbI guess the other thing to check is logs? is unbound logging to journald here?15:35
jeblairunbound restarts instantly there.  i'll reboot15:35
pabelangerjeblair: right on first boot check the status of the service15:36
pabelangerI haven't stopped and started it right after a boot15:36
pabelangerclarkb: we'd have to enable debugging, which I can15:36
jeblairpabelanger: yeah, there's no delay when doing a stop/start15:36
jeblairthere is some logging to syslog15:36
clarkbAug 16 15:07:56 ubuntu unbound-anchor: fail: the anchor is NOT ok and could not be fixed15:36
pabelangerjeblair: on first boot?15:37
*** tosky_ has joined #openstack-infra15:37
clarkbthough on that random host I am looking at syslog for it seems to have started in about 2 seconds15:37
*** zhurong has quit IRC15:37
clarkbAug 16 15:07:55 ubuntu systemd[1]: Starting unbound.service... to Aug 16 15:07:56 ubuntu systemd[1]: Started unbound.service.15:37
*** _nadya_ has quit IRC15:38
pabelangerclarkb: which host is that?15:38
clarkbubuntu-xenial-rax-ord-352127915:38
clarkbjust a random one I grabbed out of the nodepool list15:38
jeblairstrace -p 185215:38
jeblairstrace: Process 1852 attached15:38
jeblairgetrandom(15:38
clarkbso this isn't consistent15:38
*** edtubill has quit IRC15:38
jeblairso yes, waiting for getrandom15:38
fungicat /proc/sys/kernel/random/entropy_avail15:39
pabelangerjeblair: neat15:39
jeblair237415:39
jeblairi'll reboot again and repeat15:39
fungialso does ps suggest haveged is running?15:39
clarkbthere is a haveged on my host that did it quickly15:39
fungihaveged should be keeping the entropy pool nice and full15:39
*** tosky has quit IRC15:40
pabelangerfungi: I see it running15:40
*** amotoki has joined #openstack-infra15:40
openstackgerritIlya Shakhat proposed openstack-infra/project-config: Add new project "os-failures"  https://review.openstack.org/35581915:41
pabelangerroot       700  0.2  0.0  12204  6584 ?        Ss   15:39   0:00 /usr/sbin/haveged --Foreground --verbose=1 -w 102415:41
dstufftmordred: sdague I am awake now, what's up?15:41
jeblairhrm.  haveged was running while unbound-anchor was waiting.  the pool had 249615:41
*** andreykurilin has quit IRC15:41
jeblairnow that unbound-anchor completed the pool is at 236915:41
fungijeblair: yeah, that's a ton of available entropy15:41
mordreddstufft: pip download with an alternate index does not store into or retrieve from cache. pip download without an alternate index does. pip install writes to and reads from cache in both cases15:42
mordreddstufft: should I file a bug for that?15:42
*** rbuzatu has quit IRC15:42
*** amotoki has quit IRC15:43
*** amotoki has joined #openstack-infra15:43
dstufftmordred: is this alternate index available publically? can I repro it on my desktop?15:43
mordreddstufft: yup!15:43
clarkbjeblair: fungi pabelanger I wonder if /usr/share/dns/root.key's key is just old and stale? the unbound-anchor manpage warns against this15:43
mordreddstufft: pip install --trusted-host mirror.gra1.ovh.openstack.org -i http://mirror.gra1.ovh.openstack.org/pypi/simple paramz15:44
mordreddstufft: is what we've been using15:44
mordreddstufft: (obviously in the various different combinations)15:44
clarkbit then does an update because that file is not valid15:44
openstackgerritEmmet Hikory proposed openstack-infra/storyboard: Describe Storyboard in more detail  https://review.openstack.org/35602115:44
*** rcernin has quit IRC15:46
clarkbit does fetch things from the internet in that case15:46
*** rbuzatu has joined #openstack-infra15:46
pabelangerclarkb: but once the file is updated once, shouldn't the next reboot be good?15:47
rcarrillocruzTheJulia, cinerama : are we good to land https://review.openstack.org/#/c/353990/ and https://review.openstack.org/#/c/354615/ ?15:48
*** e0ne has quit IRC15:49
clarkbpabelanger: maybe? Probably not if it copies the bad one over again15:50
jeblairclarkb, pabelanger: when i strace unbound-anchor at boot, it's sitting at getrandom, and stays there until the kernel says: Aug 16 15:49:39 ubuntu kernel: [   62.801497] random: nonblocking pool is initialized15:50
dstufftmordred: Hmm, well pip install --trusted-host doesn't populate the cache at all for me here, and I think that's by design (trying to remember back to when we implemented it). The comments in the code suggest we purposely only cache valid HTTPS to prevent semi persistent poisoning of the cache and requiring manual eviction, also http://mirror.gra1.ovh.openstack.org/pypi/simple/paramz/ doesn't have cache control headers so even if you15:51
dstufft had valid HTTPS it wouldn't do a no-network cache hit (it does have an ETag header, so it'll do a conditional GET though) (same is true for the files themselves)15:51
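The caching rules dstufft describes can be modelled roughly as follows. This is a simplified illustration of the behaviour described above, not pip's actual implementation; the function name and return labels are hypothetical, and example.org stands in for an index served over verified HTTPS.

```python
from urllib.parse import urlsplit


def cache_behaviour(url: str, headers: dict, verified_https: bool) -> str:
    """Classify how an HTTP cache could reuse a response (illustrative only).

    Simplified rules, as described in the discussion:
    - plain HTTP or unverified HTTPS (e.g. via --trusted-host) is never stored;
    - Cache-Control: max-age=N with N > 0 allows a hit with no network access;
    - an ETag alone only allows a conditional GET (If-None-Match).
    """
    if urlsplit(url).scheme != "https" or not verified_https:
        return "not cached"
    lowered = {k.lower(): v for k, v in headers.items()}
    for directive in lowered.get("cache-control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit() and int(value) > 0:
            return "fresh hit"
    if "etag" in lowered:
        return "conditional GET"
    return "refetch"
```

Under this model the mirror as configured lands in "not cached", and even with a valid certificate it would only reach "conditional GET" until it sends a Cache-Control max-age, which matches dstufft's two recommendations.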
pabelangerjeblair: I'd like to try something quickly, can we set ROOT_TRUST_ANCHOR_UPDATE=false in /etc/default/unbound and restart?15:52
pabelangerjeblair: then run strace15:52
pabelanger# Whether to automatically update the root trust anchor file.15:52
pabelangerROOT_TRUST_ANCHOR_UPDATE=true15:52
jeblairpabelanger: go for it15:52
mordreddstufft: ah. you're right - I must have done one of the combinations wrong :(15:52
mordreddstufft: if we did a download without the alternate index just pointing to normal pypi15:53
mordreddstufft: and then did a subsequent install with the trusted index ... should we expect it to read from the cache?15:53
dstufftmordred: No, cache keys are full URLs15:53
mordredgotcha15:53
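To see why a download through one index cannot satisfy a later request through another, consider a cache keyed on a one-way hash of the full request URL. The use of SHA-224 here is an assumption for illustration; the point is only that different URLs for the same project never share a key.

```python
import hashlib


def cache_key(url: str) -> str:
    """Hypothetical HTTP cache key: a one-way hash of the complete request URL."""
    return hashlib.sha224(url.encode("utf-8")).hexdigest()


# The same project page requested through two different indexes.
PYPI_URL = "https://pypi.python.org/simple/paramz/"
MIRROR_URL = "http://mirror.gra1.ovh.openstack.org/pypi/simple/paramz/"

if __name__ == "__main__":
    # Different URLs give different keys, so an entry stored for one is never found for the other.
    print(cache_key(PYPI_URL) == cache_key(MIRROR_URL))  # prints False
```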
openstackgerritAndrea Frittoli proposed openstack-infra/subunit2sql: Fix typo in test_attr_list handling  https://review.openstack.org/35538515:53
openstackgerritAndrea Frittoli proposed openstack-infra/subunit2sql: Remove the test_attr_prefix before injecting  https://review.openstack.org/35539315:53
jeblair$ reboot15:53
jeblairFailed to connect to bus: No such file or directory15:53
*** ganesan has joined #openstack-infra15:53
dstufftmordred: so my recommendation would be A) Throw letsencrypt on the mirror B) setup cache-control15:54
jeblairlove it15:54
clarkbjeblair: need more sudo15:54
jeblairclarkb: indeed15:54
openstackgerritBeth Elwell proposed openstack-infra/project-config: Add release notes jobs for ironic-ui  https://review.openstack.org/35602915:54
mordreddstufft: nod. cool. thanks!15:54
fungiclarkb: sure, but that error message is beyond vague15:54
mordredfungi: ^^ see convo with dstufft15:54
clarkbfungi: ya it's systemctl failing to talk to systemd due to perms15:54
*** hieulq_ has joined #openstack-infra15:55
fungimordred: but if cache keys are the full urls, then we're still not going to end up being able to do much to prepopulate the cache15:55
mordredfungi: agree15:55
fungisince we need a different mirror url in each provider15:55
mordredyup15:55
dstufftmordred: fungi if this is a systemd using machine I have a couple of systemd unit files and a cron job you can use to keep LE up to date15:55
pabelangerjeblair: clarkb: okay, that booted a little faster, jeblair I missed the strace, do you mind trying?15:55
dstufftah yea15:56
dstufftthat's harder15:56
mordredyah. I think that's the real issue15:56
fungii suppose i can buy certs for the mirrors... i'm hesitant to have letsencrypt breaking our mirrors at random when it tries to renew certs15:56
mordredfungi: I don't think it'll get us anywhere15:56
dstufftit's a proper HTTP cache, so it treats different URLs as distinct15:56
fungidstufft: is there a good way to transform the cache? i suppose you're using a one-way hash so we can't reverse it to update the urls?15:57
mordredactually - how is this working on normal devstack runs then?15:57
mordredsdague said earlier that we do see only one download of a given thing in our devstack jobs15:57
openstackgerritEmmet Hikory proposed openstack-infra/storyboard-webclient: Add Worklists and Boards to About Page  https://review.openstack.org/35591215:57
mordredbut if install is not supposed to cache when we have trusted-host set15:57
mordredthose two things are potentially at odds15:57
clarkbit's wheel caching in devstack iirc15:58
sdagueclarkb: it's just pip15:58
dstufftmordred: it's possible that wheel caching didn't get the same treatment15:58
mordredgotit. because those are locally built wheels15:58
mordredor whatnot15:58
sdaguemordred: except they aren't15:58
pabelangerclarkb: jeblair: fungi: Oh, ya. Way faster now: http://imgh.us/filename_5.svg That is with ROOT_TRUST_ANCHOR_UPDATE=false15:58
sdaguewe're mostly downloading wheels15:58
pabelangerunbound.service (138ms)15:59
mordredso wheel cache potentially not caching the same as tarballs is the thing saving us there15:59
*** matthewbodkin has quit IRC16:00
sdagueclarkb / fungi - I have suspicions that some of our odd fails in the last day are related to this - https://review.openstack.org/#/c/356010/16:00
dstufftfungi: it is a one way hash, I'm still kinda asleep (woo waking up at 11am), but off the top of my head it might be reasonable to implement some sort of aliasing thing. a la "treat domain x, y, z as domain a"16:00
sdaguewhich increased keystone debug logs by 2 orders of magnitude16:00
dstufftwhen it comes to caching16:00
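dstufft's aliasing idea could look something like this: rewrite the host through an alias table before hashing, so that per-provider mirrors share cache entries. It assumes each mirror uses the same /pypi/simple/ path layout, since only the host is aliased. `HOST_ALIASES`, the canonical name `mirror.openstack.org`, and both helpers are hypothetical; this is a proposal, not an existing pip feature.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit

# Hypothetical alias table: each per-provider mirror host maps to one canonical name.
HOST_ALIASES = {
    "mirror.gra1.ovh.openstack.org": "mirror.openstack.org",
    "mirror.regionone.osic-cloud1.openstack.org": "mirror.openstack.org",
}


def canonical_url(url: str, aliases: dict = HOST_ALIASES) -> str:
    """Rewrite the host of url through the alias table; scheme, path and query are kept."""
    parts = urlsplit(url)
    netloc = parts.netloc.lower()
    return urlunsplit(parts._replace(netloc=aliases.get(netloc, netloc)))


def aliased_cache_key(url: str, aliases: dict = HOST_ALIASES) -> str:
    """Hash the canonical URL, so requests to aliased hosts share one cache entry."""
    return hashlib.sha224(canonical_url(url, aliases).encode("utf-8")).hexdigest()
```

Because the alias table is applied before hashing, the one-way hash fungi asks about never needs to be reversed; a cache populated through one mirror would be found when reading through another.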
sdaguethat's the revert16:00
sdagueany chance we could pop it into the top of gate?16:00
openstackgerritEmmet Hikory proposed openstack-infra/storyboard-webclient: Add Worklists and Boards to About Page  https://review.openstack.org/35591216:00
*** yaume has quit IRC16:00
*** Sukhdev has joined #openstack-infra16:01
sdagueI've definitely seen a bunch of odd keystone token lookup fails since that merged16:01
dstufftwe should probably make the wheel cache and the http cache consistent though16:01
dstufftit's weird that it's not16:01
*** xarses has quit IRC16:01
sdagueplus, until it's reverted, keystone logs are about ~1G uncompressed16:01
*** xarses has joined #openstack-infra16:01
dstuffteither skip the lack of caching on http or make wheel cache not cache on http16:01
fungisdague: ouch16:01
pabelangerjeblair: clarkb: maybe not.  still takes 1min for host git.openstack.org to resolve16:01
jeblairpabelanger: can you put that host back so i can continue to debug?16:01
sdaguefungi: it was an attempt to narrow some issues in the revoke code, I think the full extent of the fallout wasn't anticipated16:02
pabelangerjeblair: yes, just rebooted back to original settings16:02
mordreddstufft: I agree on making them consistent16:03
jeblairpabelanger, clarkb: looking at another host which is not slow to boot, i see:16:03
jeblairAug 16 15:55:19 ubuntu-xenial-osic-cloud1-3521020 kernel: [    3.906606] random: nonblocking pool is initialized16:03
*** jcoufal_ has joined #openstack-infra16:03
jeblairnote that's at 3.9 seconds from boot, as opposed to 62 seconds on pabelanger's host16:03
fungisdague: openstack/keystone 356010,1 is at the top of the integrated gate change queue now16:03
fungistevemar: dstanek: ^16:04
*** edtubill has joined #openstack-infra16:04
stevemarfungi: thank you16:04
stevemarsdague is padding his stats with reverts again :)16:05
*** Sukhdev has quit IRC16:05
zaromorning16:05
jeblairpabelanger, clarkb: putting everything together so far -- it's deciding to fetch a new anchor file, and is using openssl for that which needs some random which is taking 60+ seconds16:05
clarkbjeblair: that sounds correct to me16:05
*** xarses has quit IRC16:05
mordredthat also sounds correct to me16:05
mordredbased on reading16:05
*** xarses has joined #openstack-infra16:06
jeblairpabelanger: i can't log into 2001:4800:1ae1:18:f816:3eff:fe8e:9a3e16:06
*** jcoufal has quit IRC16:06
jeblairpabelanger: nm16:06
*** aaltman has joined #openstack-infra16:07
aaltmanHey guys, I had a quick question about nodepool if anyone has a moment16:07
*** xarses has quit IRC16:07
*** xarses has joined #openstack-infra16:07
pabelangerI still don't understand why it gets a new root.key on reboot16:07
mordredaaltman: just shoot - we'll respond in time16:07
*** amotoki has quit IRC16:07
pabelangerI would expect that to persist16:07
clarkbpabelanger: it's copying the bad one; if it does that unconditionally, the fixed one will be overwritten. That would be easy to check in syslog16:08
*** infra-red has quit IRC16:09
*** sdake has quit IRC16:09
*** xyang1 has quit IRC16:09
aaltmanokay cool: so when I boot a vm w/ nodepool and have it configured w/ Jenkins - what is the expectation for those two to connect? Who is performing the registration? I thought it happened over Gearman, but does the jenkins ssh key and username need to be enabled on the vm or can it use something like cloud-user16:09
*** hockeynu_ has quit IRC16:10
*** sshnaidm is now known as sshnaidm|afk16:11
*** matbu is now known as matbu|afk16:11
mgagneaaltman: looks to be done via Jenkins API: https://github.com/openstack-infra/nodepool/blob/master/nodepool/myjenkins.py#L132-L13316:12
pabelangerclarkb: Yup, I see that on first boot.  root.key copied, follow up reboots, key has content16:12
mordredaaltman: the expectation is that once nodepool spins up a node it'll have account/public-keys on it such that jenkins can connect to it - we bake those in as part of our base-image build process16:12
mordredand then yes, as mgagne says, nodepool uses the Jenkins API to attach the slave to jenkins16:12
aaltmanmgagne: okay great. I think I can replicate that and see what's going on. May be an SSL issue w/ our jenkins since it's self signed for dev.16:13
*** asettle has quit IRC16:13
sdaguestevemar: hey, it counts as commit in keystone, so I get to vote for ptl again :)16:14
stevemarsdague: haha, uh oh ... :)16:14
jeblairaaltman: do the nodes show up in jenkins at all?  if not, the nodepool log may have information as to why16:14
dstaneksdague: now i see your game :-)16:14
aaltmanmordred: okay. So that may be missing as well, we are generating a key, uploading to openstack on container entry, and nodepool can access, but Jenkins shouldn't be able to w/ that model16:14
aaltmanjeblair: they don't16:15
aaltmanjeblair: we checked the logs thoroughly and don't see anything suspicious16:15
*** dizquierdo_afk is now known as dizquierdo16:15
aaltmanjeblair: there's auth exec related to root/fedora/ubuntu logins until it hits cloud-user, which works fine and finishes out the setup16:15
*** dtantsur is now known as dtantsur|afk16:18
*** xyang1 has joined #openstack-infra16:18
jeblairaaltman: what you describe sounds like the snapshot image build process, where nodepool boots a node from a base image, customizes it, then takes a snapshot of it.  the actual test nodes are built from the snapshot.16:18
jeblairaaltman: nodepool and jenkins both need to have the private ssh key for the same account which should be installed on the snapshot.  nodepool uses it to log in immediately after a node boots to make sure that it worked, then it attaches it to jenkins16:19
openstackgerritMatt Riedemann proposed openstack-infra/project-config: Make gate-tempest-dsvm-multinode-live-migration gating for nova  https://review.openstack.org/35604316:20
aaltmanjeblair: Okay, so the boot process seems* to go fine; it's the handoff to Jenkins that fails, which I suspect is an SSL cert issue that I currently do not see in the log, plus the keys need to match16:20
aaltmanThat should be enough information to go off of16:21
*** adrian_otto has joined #openstack-infra16:21
aaltmanI'll give those two things a try. Thanks for the help!16:21
fungiaaltman: worth noting, back when we still used jenkins we had self-signed certs on our jenkins masters16:21
fungii don't recall if we had to do anything special to "trust" those, or were simply relying on older python 2.7 not actually validating server certs16:22
aaltmanfungi: hmmm that's interesting16:22
*** markusry has joined #openstack-infra16:22
kgiustifungi: just fyi oslo.messaging is hitting the exact same tempest failures as the gate-tempest-dsvm-cells you mentioned16:22
pabelangerAug 16 16:07:16 ubuntu kernel: [   15.415094] random: nonblocking pool is initialized16:22
*** pilgrimstack has quit IRC16:23
kgiustifungi: but we never see the issue running the same test on the centos box FWIW16:23
*** Sukhdev has joined #openstack-infra16:23
*** yamahata has quit IRC16:23
fungikgiusti: the ones where it fails to reach glance on 127.0.0.1:9292?16:23
pabelangerjeblair: that was 2 reboots ago, did you make a change on the node?16:23
pabelanger^16:23
*** piet_ has joined #openstack-infra16:23
pabelangeronly time random has started below 60 seconds16:23
kgiustifungi: http://logs.openstack.org/90/349290/3/check/gate-oslo.messaging-src-dsvm-full-zmq/dd1de25/console.html#_2016-08-16_09_59_04_23046216:24
fungikgiusti: urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=9292): Max retries exceeded with url: /v1/images (Caused by ReadTimeoutError("HTTPConnectionPool(host='127.0.0.1', port=9292): Read timed out. (read timeout=60)",))16:24
fungiso, yep16:24
jeblairpabelanger: no16:25
kgiustifungi: yay it's not just me! :)16:25
fungikgiusti: and in osic, so this pattern seems consistent16:25
cloudnullafternoons. sorry have been mostly AFK so far today.16:26
kgiustifungi: agreed.16:26
* cloudnull reading back16:26
fungicloudnull: have fun, you have several nick highlights in here ;)16:26
*** savihou has quit IRC16:28
cloudnullmordred jeblair if we can get a list of instance id's I can go and track them down to and see if there are specific issues with a given host.16:28
openstackgerritMatthew Treinish proposed openstack-infra/elastic-recheck: Add query for bug 1613749  https://review.openstack.org/35598816:28
openstackbug 1613749 in OpenStack-Gate "Git timeouts from bluebox" [Undecided,New] https://launchpad.net/bugs/161374916:28
mtreinishfungi: ^^^ there is the bug and e-r query we're using to track it16:29
cloudnullpabelanger: do we think that the DNS resolver issues are what is causing the slowdown folks have been mentioning?16:30
*** jpich has quit IRC16:30
cloudnullmordred: what's with the 4 min to resolve git.openstack.org? is that something on the OSIC side that is causing that slowdown or is that a known routing issue?16:31
*** gongysh has quit IRC16:31
*** baoli_ has quit IRC16:31
pabelangercloudnull: we've had ipv6 dns on ubuntu-xenial since last night16:32
cloudnullstill not happy ?16:32
pabelangercloudnull: however, I haven't followed the git issue much this morning16:32
cloudnullok16:32
jeblaircloudnull: i believe we're working on 2 simultaneous issues, only one of which is osic-specific16:33
cloudnulldid the unbound start issues get resolved?16:33
jeblair(the other is xenial specific)16:33
cloudnulljeblair: which one is the osic specific issue? -- sorry likely missed the message in scroll back16:33
jeblaircloudnull: the 'git' issue is that it takes 4 minutes to perform git operations from osic to git.openstack.org16:33
jeblaircloudnull: and that's the one where i sent you two instance ids to see if there is any correlation16:33
cloudnullwell the message is likely there, i just missed reading it16:34
cloudnull:)16:34
* cloudnull looking into those instances now16:34
greghaynesjeblair: Random depends-on quesion - in the case of there being multiple changesets which match a change-id in a depends-on (in two different projects) does zuul depend on both?16:34
jeblaircloudnull: the other issue is that unbound sometimes takes a while to start, but i don't think that's xenial related16:34
jeblairgreghaynes: yes16:34
greghaynesgood deal :)16:34
jeblairgreghaynes: whew! :)16:34
cloudnullI saw the patch from pabelanger last night regarding that issue giving the process start and g.o.o resolv a wait.16:35
cloudnulldoes that fix the unbound problem ?16:35
jeblaircloudnull: i don't know about that.  pabelanger ?16:37
pabelangerjeblair: cloudnull: 355695 will just make our configure_mirror.sh script more robust, it doesn't address the actual unbound delay issue16:38
cloudnullyes that one, https://review.openstack.org/#/c/355695/ -- do we think thats an entropy issue?16:39
openstackgerritMatt Riedemann proposed openstack-infra/elastic-recheck: Add query for bug 1613749  https://review.openstack.org/35598816:39
openstackbug 1613749 in OpenStack-Gate "Timeouts when requesting a glance image created with a remote image from git.o.o" [Undecided,New] https://launchpad.net/bugs/161374916:39
jeblairgrr. all of the google results about "nonblocking pool is initialized" are related to the fact that it's a late kernel message, so it's what people see when their systems are borked and hang16:39
jeblairi can't actually find what prints it16:39
* fungi wishes linux would just switch to a continuously-seeded high-quality nonblocking prng as its /dev/random backend, like all the *bsds have done for years16:40
*** florianf has quit IRC16:40
*** hockeynut has joined #openstack-infra16:40
persiaIsn't /dev/urandom very close to that?16:40
jeblairfungi: i'm still really confused since haveged is running and the pool has entropy.16:40
fungijeblair: agreed16:41
fungino clue why it thinks it needs to wait still. unless aslr is grabbing priority over the available entropy pool soon after boot while other stuff is being loaded into memory?16:41
mgagnepabelanger: I'm considering contributing to grafyaml. I found that Grafana 3.x supports more features but also changes the syntax of some options. How should grafyaml be updated so it doesn't break the world of 2.x?16:42
fungipersia: yeah, except they continue to claim /dev/urandom is not secure for things like key generation. consensus among cryptographers is that you don't really "use up" entropy you accumulate, and so the linux entropy pool design is a bit of a fiction16:42
fungiyou should be able to reuse the same entropy once you have it, as long as it's presented through an appropriately turbulent prng algorithm16:43
persiaClaiming /dev/urandom isn't secure is just FUD.  It's usually at least as good as using the HWRNG on a TPM module, or similar.16:44
*** berendt has quit IRC16:44
*** tosky_ has quit IRC16:44
fungiagreed16:44
*** rbuzatu has quit IRC16:45
persiaWell, if you need to generate a one-time pad vs. an adversary with unlimited resources, /dev/random *might* be better, if you have good sources of true entropy, but in that situation, you probably shouldn't be using an operating system you didn't hand-code from scratch...16:45
*** florianf has joined #openstack-infra16:45
fungithere's a bit of stockholm syndrome going on with linux's /dev/random though. other operating systems have moved past that thinking16:45
cloudnulljeblair: regarding those two instances, nothing stands out between the two nodes, they landed on different compute nodes w/in different cabinets and both compute nodes have other vms running on them which are functional.16:46
fungijeblair: cloudnull: we do have a second (suspected) osic-specific issue, which could be related but also could be distinct: specific tests timing out trying to connect to a service listening on 127.0.0.1. we're only seeing this failure manifest in osic (so far anyway)16:46
jeblaircloudnull: gr.16:46
*** piet_ has quit IRC16:46
cloudnullfungi: hum...16:47
*** kcobb has quit IRC16:47
*** kcobb has joined #openstack-infra16:48
fungiexample failure is http://logs.openstack.org/55/352455/3/gate/gate-tempest-dsvm-cells/7881266/console.html#_2016-08-16_13_38_04_76962816:48
cloudnullfungi: anything strange or did something change in the hosts file making it not resolve?16:48
fungicloudnull: not entirely sure yet, though our network diagnostics at the start of the log also show traceroute6 timing out trying to get to git.o.o16:49
fungiit resolves via dns correctly, but sees no icmp responses from any hop16:49
jeblairfungi, cloudnull: the two instances of slow git operations also have traceroute6 git.o.o timeouts16:49
kgiustifungi: cloudnull: fyi: https://bugs.launchpad.net/openstack-gate/+bug/161374916:51
openstackLaunchpad bug 1613749 in OpenStack-Gate "Timeouts when requesting a glance image created with a remote image from git.o.o" [Undecided,New]16:51
fungikgiusti: i think that's separate16:52
fungioh, maybe it's not16:52
fungithe bluebox one might want to be separated out though since the symptoms are distinct16:53
*** sambetts is now known as sambetts|afk16:53
*** jerryz has joined #openstack-infra16:54
kgiustimtreinish: ^^^16:54
fungibut thinking through that test, _if_ glance thinks it's serving the image from a remote location on git.o.o, then that could account for the timeout for the test's api calls to 127.0.0.1:929216:54
*** infra-red has joined #openstack-infra16:54
cloudnullkgiusti fungi jeblair: We have an SSD-specific AZ, if we think the speed of writes is what's causing that issue. We could switch to using that AZ to see if more iops fixes the issue?16:54
*** nwkarsten has quit IRC16:54
*** bin_ has quit IRC16:54
mtreinishfungi: the working theory right now is it actually might not be network related. we're waiting on sdague's revert to see if the load being generated by keystone logging constantly was causing these issues16:55
mtreinishbecause there are some keystone token errors in the glance logs before things start getting weird16:55
fungimtreinish: still strange we would only see it manifest that way in osic16:55
mordredcloudnull: I think sorting out the network issue first is more likely to be a win16:55
cloudnull++16:55
cloudnullfungi: maybe we're seeing it in the osic more due to it now having more tests run within the cloud ?16:56
mtreinishfungi: well if it's load related then the hardware and/or cloud config comes into play more16:56
*** javeriak has joined #openstack-infra16:57
fungicloudnull: well, our osic quota is still a minority of our overall aggregate quota, so we should be seeing it in other providers besides just osic. so far i haven't found any though16:58
fungimtreinish: agreed16:58
*** nwkarste_ has joined #openstack-infra16:58
*** infra-red has quit IRC16:58
*** infra-red has joined #openstack-infra16:58
openstackgerritMerged openstack-infra/elastic-recheck: Make everything plural  https://review.openstack.org/35596716:58
*** edtubill has quit IRC17:00
*** lucasagomes is now known as lucas-dinner17:02
*** javeriak_ has joined #openstack-infra17:02
*** javeriak has quit IRC17:02
krotscheckAny infra-core around to add a +A to https://review.openstack.org/#/c/346130/ ? I already have 2 +2's. pabelanger, in particular, as you can verify that the bindep changes have landed.17:02
*** nwkarste_ has quit IRC17:03
*** hockeynut has quit IRC17:03
krotscheckAlso, I'm trying to get our JS DSVM job landed... https://review.openstack.org/#/c/348056/817:03
*** xarses has quit IRC17:04
*** javeriak has joined #openstack-infra17:04
fungicloudnull: jeblair: picking jobs running at random in osic, `traceroute6 git.openstack.org` seems to be broken in all of them at the start of jobs. so maybe we have an early race with something in the network there if it's working later on?17:05
fungialso i just found one i can't connect to the console for17:05
*** yamahata has joined #openstack-infra17:05
pabelangerfungi: jeblair: sudo apt-get install rng-tools17:06
pabelangerfungi: jeblair: I do not know why yet, but that is making random start faster17:06
fungi`nc 2001:4800:1ae1:18:f816:3eff:fe3b:53ef 19885` is just dead for me. should be running gate-tempest-dsvm-neutron-full-ubuntu-xenial17:06
*** devkulkarni has quit IRC17:07
pabelangerAug 16 17:06:38 ubuntu kernel: [   24.471992] random: nonblocking pool is initialized17:07
pabelangerlowest was: Aug 16 17:04:20 ubuntu kernel: [    7.680334] random: nonblocking pool is initialized17:07
*** javeriak_ has quit IRC17:07
jeblairpabelanger: what image are you using for your tests?17:07
jeblairpabelanger: i notice that apparmor is not installed17:07
zxiiroelectrofelix: I agree with you. I think we should just make it documented and a comment in the tox file17:07
pabelangerjeblair: template-ubuntu-xenial-147131659817:08
*** oanson has joined #openstack-infra17:09
*** _nadya_ has joined #openstack-infra17:09
fungipabelanger: rng-tools _can_ be configured to feed /dev/urandom in as a mock hardware rng. probably worth double-checking its config but that might be what it's doing. otherwise it's likely getting passthrough entropy from the hypervisor host17:09
*** david-lyle_ has joined #openstack-infra17:09
*** infra-red has quit IRC17:10
*** tonytan4ever has quit IRC17:11
*** e0ne has joined #openstack-infra17:11
pabelangerjeblair: I didn't know we installed apparmor explicitly17:11
fungicloudnull: yeah, 2001:4800:1ae1:18:f816:3eff:fe3b:53ef is just plain unreachable, but should still be up. uuid is c6dd7d7d-2797-47a5-b7b1-ed3cc917a4cc according to nodepool17:11
*** ansmith has joined #openstack-infra17:12
cloudnullfungi: looking now17:13
*** david-lyle has quit IRC17:13
*** david-lyle_ is now known as david-lyle17:13
fungiopenstack server list isn't showing that uuid as existing at all for me though17:13
fungioh, now nodepool's deleted it too17:14
cloudnullyup deleted.17:15
cloudnull:'(17:15
cloudnullsorry i was too slow17:15
jeblairfungi, pabelanger: unbound (via openssl) may actually be using urandom.  the getrandom(2) call reads that by default, but even the urandom pool needs to be initialized, and getrandom will block until urandom has been initialized.17:15
jeblairfungi, pabelanger: the fact that it waits until the kernel prints 'nonblocking pool is initialized' reinforces that for me17:16
jeblairthough i have not checked the openssl code to verify the flags17:16
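jeblair's explanation can be checked from userspace with Python's `os.getrandom` wrapper (Python 3.6+, Linux only): passing `os.GRND_NONBLOCK` makes the call fail with `BlockingIOError` instead of blocking while the nonblocking pool is still uninitialized. This is a diagnostic sketch, not what unbound-anchor or openssl actually call, and the helper name is hypothetical.

```python
import os


def urandom_pool_initialized():
    """Report whether getrandom(2) would return right now without blocking.

    Returns True or False on Linux with Python 3.6+, or None where the
    syscall or its Python wrapper is unavailable.
    """
    getrandom = getattr(os, "getrandom", None)
    if getrandom is None:
        return None
    try:
        # Without GRND_NONBLOCK this call would block, as unbound-anchor did,
        # until the kernel logs "random: nonblocking pool is initialized".
        getrandom(1, os.GRND_NONBLOCK)
    except BlockingIOError:
        return False  # EAGAIN: pool not initialized yet
    except OSError:
        return None  # e.g. ENOSYS on kernels older than 3.17
    return True
```

Polling this early in boot and logging when it flips to True would give the same timestamp as the kernel message, without needing strace.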
fungijeblair: seems a likely explanation17:16
*** _nadya_ has quit IRC17:16
jeblairfungi, pabelanger: finally found the print statement: http://lxr.free-electrons.com/source/drivers/char/random.c?v=4.4#L68417:17
pabelangerfungi: jeblair: nice work on finding the reason17:17
openstackgerritEddie Ramirez proposed openstack-infra/project-config: Add craton-dashboard repository (Horizon Plugin)  https://review.openstack.org/35427417:18
*** rbuzatu has joined #openstack-infra17:18
*** javeriak has quit IRC17:19
*** tqtran has joined #openstack-infra17:19
*** javeriak has joined #openstack-infra17:19
jeblairi don't know why it's taking so long to initialize with haveged running and, according to the kernel, 2300+ bits of entropy17:20
jeblairwhen apparently 128 bits is needed to call it initialized17:20
fungicloudnull: jeblair: okay, some random spot checking turned up an example where a job in osic is successfully doing a traceroute6 to git.openstack.org, so this certainly seems inconsistent (could still be a startup race i suppose?)17:20
jeblairfungi: yeah, if you're thinking a startup race, it could be affected by how long the node sat idle before launching the job17:21
cloudnullfungi: the only way I'm able to reproduce this issue is break the resolvers.17:21
cloudnull:(17:21
fungiright, exactly what i'm wondering17:21
*** _nadya_ has joined #openstack-infra17:21
fungicloudnull: strangely, the example logs i have, dns resolution of git.openstack.org is fine, but traceroute responses aren't coming in17:21
fungiowing in part, i think, to the fact that nodepool ready scripts do a dns lookup of that name before ever declaring the node fit for use, so it should have resolution already cached17:22
fungihowever, also dns lookups are happening via ipv4, so wouldn't be broken by ipv6 routing issues17:23
jeblairuntil that change lands17:23
jeblair(though it will still fall back on v4)17:23
*** fguillot has quit IRC17:24
*** piet_ has joined #openstack-infra17:25
cloudnullmaybe this is an issue with the neutron router for IPv4 traffic? The v6 network is dual stack in the OSIC and the v4 interface is part of a neutron router. potentially, we're pushing the router farther than it wants to be pushed or it's slow to be programmed which is causing the various timeouts?17:25
*** tonytan4ever has joined #openstack-infra17:26
fungicloudnull: so what i find particularly strange is that when traceroute6 works we get a response back from what appears to be the global address of the default gateway (2001:4800:1ae1:18::3), but when traceroute6 doesn't work, i don't even get a response from that one indicating an issue with neutron, or neighbor discovery (even though the fe80::def linklocal for that gateway is showing up as having a17:27
fungivalid hw address like 00:05:73:a0:00:06), or the local layer 2 maybe?17:27
*** aaltman has quit IRC17:27
*** gomarivera has joined #openstack-infra17:27
pabelangerfungi: jeblair: if rng-tools is a hardware-based generator, doesn't it make more sense to use that?  I admit, haveged and rng-tools are new to me17:27
fungipabelanger: the "hardware" based entropy sources supported by rng-tools may include things that are not actual hardware (especially on virtual machines). but regardless i'm fine with using it17:29
*** hieulq_ has quit IRC17:29
jeblairpabelanger: i don't know (yet).  pabelanger, fungi: i'm still digging, and i have found that /proc/sys/kernel/random/entropy_avail is the amount of entropy in the input pool, which i believe feeds the urandom pool, which is what we're waiting on being initialized.  so that at least partially explains how the value in proc is high while we're still waiting for initialization.  it doesn't explain *why*.17:30
*** gyee has joined #openstack-infra17:30
fungipabelanger: haveged provides a nice fallback when there are no rng devices available, since it attempts to extract entropy from other timing-related sources17:30
openstackgerritJim Rollenhagen proposed openstack-infra/project-config: Make ironic job non-voting on Neutron  https://review.openstack.org/35607217:31
openstackgerritSai Sindhur Malleni proposed openstack-infra/project-config: Adding Ansible jobs for Browbeat  https://review.openstack.org/35607317:32
jroll^ 356072 is a fairly easy review so we don't end up blocking neutron this close to release17:32
*** baoli has joined #openstack-infra17:33
pabelangerfungi: okay thanks for the info.17:33
*** mhickey has quit IRC17:34
pabelangerBut it does seem to be related to which cloud we start ubuntu-xenial on17:34
pabelangerAug 16 17:21:31 ubuntu kernel: [    3.385322] random: nonblocking pool is initialized17:34
pabelangerthat is from rackspace17:34
pabelangernice and fast17:34
jeblairpabelanger: we don't have a lot of these log lines in logstash, but there are some17:34
jeblairpabelanger: i see 30 seconds in ovh, 60 seconds in bluebox17:35
*** florianf has quit IRC17:35
fungijeblair: here's another fun anecdote related to urandom initialization times http://haypo-notes.readthedocs.io/summary_python_random_issue.html17:35
pabelangerinternap is 30sec too17:35
fungiseems consistent with what we're suspecting17:35
*** hieulq_ has joined #openstack-infra17:36
*** acoles is now known as acoles_17:36
pabelangerfungi: oh, nice17:36
*** florianf has joined #openstack-infra17:36
*** thorongil has joined #openstack-infra17:36
pabelangerhttp://bugs.python.org/issue26839#msg26412117:36
*** electrofelix has quit IRC17:37
*** ccamacho|out has quit IRC17:38
*** thorongil has quit IRC17:38
fungiat least on some platforms, some pseudo-random data gets written out on shutdown and then read in at startup to quickly seed /dev/urandom. we might be able to dump something into /var/lib/random-seed in our job node images17:38
*** tphummel has joined #openstack-infra17:38
*** thorongil has joined #openstack-infra17:38
fungiahh, that's an rh-ism. debian derivatives use /var/lib/urandom/random-seed to the same ends however17:39
*** thorongil has quit IRC17:40
*** thorongil has joined #openstack-infra17:40
*** thorongil has quit IRC17:41
*** thorongil has joined #openstack-infra17:42
*** shashank_hegde has joined #openstack-infra17:43
*** thorongil has quit IRC17:43
*** thorongil has joined #openstack-infra17:44
*** thorongil has quit IRC17:45
*** nwkarste_ has joined #openstack-infra17:45
*** thorongil has joined #openstack-infra17:45
*** nwkarst__ has joined #openstack-infra17:46
*** thorongil has quit IRC17:47
*** thorongil has joined #openstack-infra17:47
*** thorongil has quit IRC17:48
*** devkulkarni has joined #openstack-infra17:49
*** nwkarste_ has quit IRC17:50
*** hieulq_ has quit IRC17:50
fungipabelanger: that makes sense, so starting in linux 3.17 we're getting that behavior, which explains why it's impacting xenial and not trusty or centos 717:51
fungii would guess recent fedoras are impacted as well17:52
*** oanson has quit IRC17:52
fungidebian jessie is one kernel rev too old to see it17:52
pabelangerya, we can check fedora-2417:52
pabelangerwe have a node online17:52
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: [WIP] Add scheduling thread to nodepool builder  https://review.openstack.org/35607917:52
*** Sukhdev has quit IRC17:53
pabelangerfungi: so we have a few workarounds for now: rng-tools and a smarter configure_mirror.sh. I'll wait until jeblair is finished before moving forward on that front17:54
SpamapSHey. I just wanted to offer some public praise. Thanks for all the hard work everyone in infra has put in on zuul and nodepool. :-D  http://zuul.cloud-ci.ibmcis.com/17:54
fungipabelanger: jeblair: i guess we could also pick haypo's brain in #openstack-oslo about this since he seems to have dig into it quite a bit17:54
mordredSpamapS: woot!17:54
tlbrinfra-core could you please review and merge https://review.openstack.org/#/c/347047/ ?17:54
pabelangerSpamapS: yay17:54
fungier, dug17:55
fungiSpamapS: thanks!17:55
SpamapSWe're pipelining and jobbing and really just happy as clams to have CI that works like upstream. :-D17:55
*** sdake has joined #openstack-infra17:55
*** tqtran has quit IRC17:55
Shrewsmmm, clams17:55
kgiusti+1 what SpamapS said - thanks muchly!17:56
*** baoli has quit IRC17:56
*** ccamacho has joined #openstack-infra17:57
*** baoli has joined #openstack-infra17:57
SpamapSShrews: oh heck yeah, clams would be great17:57
SpamapSsteamed in a little white wine sauce. :)17:57
Shrewsoh yeah17:58
tlbrmordred, could you please also review https://review.openstack.org/#/c/347047/ ? We want to start work on this project as soon as possible :)17:58
*** tqtran has joined #openstack-infra17:58
Shrewsmordred: jeblair: notmorgan: look, pretty diagrams  https://review.openstack.org/#/c/356079/1/doc/source/devguide.rst17:58
*** andrey-mp has joined #openstack-infra17:58
*** gomarivera has quit IRC17:58
*** tonytan4ever has quit IRC17:59
mordredShrews: woot18:00
jeblairSpamapS: thanks!18:00
jeblairShrews: nice!18:00
*** dmsimard|afk is now known as dmsimard18:01
*** ganesan has quit IRC18:01
*** rcernin has joined #openstack-infra18:01
*** tqtran has quit IRC18:03
*** rbrndt has quit IRC18:04
*** kzaitsev_mb has quit IRC18:06
rcarrillocruznice :-)18:07
*** dprince has quit IRC18:07
*** ociuhandu has quit IRC18:07
openstackgerritMerged openstack-infra/puppet-infracloud: Switch to infra-cloud-bridge element  https://review.openstack.org/35600918:07
rcarrillocruz\o/ ^18:08
*** ccamacho has quit IRC18:08
jeblairpabelanger, fungi: i booted the machine without haveged and verified that unbound will continue to sit there waiting because there is no entropy.  so i know that we are getting entropy from haveged.  i then ran haveged in the foreground which immediately (<1s) provided entropy to the pool.  yet it still took 95 seconds for the pool to be initialized18:08
jeblairer, sorry, it took an additional 30 seconds to be initialized18:08
jeblair(i waited 60 seconds to start)18:08
rcarrillocruzpabelanger: i'm going to wipe the deploy dib image to get the bridge element in18:09
rcarrillocruzin case you want to run it to see how it goes18:09
rcarrillocruz?18:09
rcarrillocruzwell nm, it seems you all are hooked with the entropy thing, sorry for the noise18:09
fungijeblair: what about with haveged removed but rng-tools installed? in theory (if this is qemu-based at least) there'll be a virt-rng it uses to get extra entropy18:11
*** tqtran has joined #openstack-infra18:12
jeblairfungi: i don't know, but i'm not quite ready to try that yet; still trying to understand the sequence with haveged18:12
mordredalso - rackspace isn't qemu based18:12
jeblairi'm running on osic18:12
jeblairi think :)18:12
fungimordred: yeah, not sure how this will vary from provider to provider18:13
mordredI know - I was just responding to fungi in that we have to make sure that fixing osic doesn't break rax18:13
mordredfungi: ++18:13
jeblairya18:13
*** xarses has joined #openstack-infra18:13
fungiwe already know there's a significant timing variance for this across providers. seems to block longer on some than others18:13
*** degorenko is now known as _degorenko|afk18:13
jeblairfungi's question is a good one -- essentially, in my mind: "if haveged is doing its thing quickly, which seems to be the case, why does rng-tools appear to be faster?"18:14
jeblairi just think i have a bit more data i can pull out of this configuration before i start to examine the delta with that one18:14
fungii'm also curious if it's got a /var/lib/urandom/random-seed it's reading in at boot18:15
*** _nadya_ has quit IRC18:15
fungiif we're seeing this delay on successive reboots then the on-disk seed likely isn't going to help18:15
jeblaircat: /var/lib/urandom/random-seed: No such file or directory18:15
sdaguemtreinish: http://logs.openstack.org/10/352610/4/gate/gate-tempest-dsvm-cells/1d2248f/console.html still failing even after the keystone revert, so I think osic issues are still a real thing18:16
fungijeblair: oh, i wonder if it's not saving one for some reason, or if it's been moved18:16
jeblairfungi: /var/lib/systemd/random-seed exists18:16
fungiaha, now all restaurants are taco bell^W^Wsystemd18:17
*** fguillot has joined #openstack-infra18:17
*** javeriak_ has joined #openstack-infra18:17
jeblairfungi: there is a /lib/systemd/system/systemd-random-seed.service18:18
jeblairDescription=Load/Save Random Seed18:18
*** javeriak has quit IRC18:18
fungiyeah, so sounds like it's there and doesn't help speed up urandom initialization at boot18:18
jeblairi would like to verify it's working18:18
*** tqtran has quit IRC18:19
mordredjeblair: is it enabled?18:19
*** pvaneck has joined #openstack-infra18:19
jeblair# service random-seed status18:19
jeblair● random-seed.service Loaded: not-found (Reason: No such file or directory) Active: inactive (dead)18:19
mordredwell, there we go18:19
jeblairmordred: that looks somewhat negative18:20
mordredyah18:20
mordredis /var/lib/urandom present?18:20
openstackgerritAbhishek Raut proposed openstack-infra/project-config: Use python-db-jobs for tap-as-a-service  https://review.openstack.org/35567018:20
jeblairyes18:20
mordrednod18:20
fungihuh18:20
fungithat's odd18:20
*** jaosorior has joined #openstack-infra18:21
Shrewsi like the "active-inactive-dead" text there. not confusing at all18:21
fungiELENNART18:21
sdaguedoes someone have a summary of the current best theory around the osic issues?18:21
jeblairShrews: 2 out of 3 are negative18:22
fungimtreinish: i found the root of our snmpd issue on xenial... https://review.openstack.org/10112 says "This can go away after everything is upgraded to precise..."18:22
sdagueas I'm trying to ponder if there is a short term mitigation on the qa side?18:22
Shrewsjeblair: ah, it's a proportional failure message. got it  :)18:22
jeblairsdague: i think you're referring to the 'glance' issue?18:23
*** gomarivera has joined #openstack-infra18:23
sdagueyeh18:23
jeblairsdague: or do you mean the 'git' issue?18:23
mordredjeblair: the internet tells me the active: inactive (dead) thing may be the result of getting the name wrong in the status command18:23
sdaguewell... the glance issue is coupled to the git issue, right?18:23
andrey-mphi! is there a document about the gate/integrated queue? I want to understand how it works... the job for changeset 352455 has been running for about 10 hours. it stops at the end and starts again...18:24
fungisdague: that's unclear. the glance issue in bluebox does seem to be related to being unable to directly reach git.openstack.org because it's being treated as a "fake" glance remote location18:24
mordredjeblair: I think you want "systemctl status systemd-random-seed.service"18:24
jeblairmordred: aha you and the internet are right18:24
mordredwoot!18:24
pabelangerrcarrillocruz: okay18:25
fungisdague: the glance issue we're seeing in osic seems to be that calls to the local glance service on the job node time out (so maybe behind the scenes, glance is acting as a sort of proxy to that remote file on git.o.o still?)18:25
mordredalthough amusingly on my laptop I get a different error when I do that wrong18:25
jeblairmordred:    Active: active (exited) since Tue 2016-08-16 18:06:16 UTC; 18min ago18:25
jeblairthat seems more better18:25
mordredjeblair: that's good18:25
*** jaosorior has quit IRC18:25
jeblairnow i wonder if there are any logs18:25
jeblaircause i would like to have a timestamp for when it ran/exited18:25
pabelangernice work18:26
jeblairjournalctl -u systemd-random-seed.service18:27
jeblairAug 16 18:06:16 ubuntu systemd[1]: Started Load/Save Random Seed.18:27
*** dprince has joined #openstack-infra18:27
*** gomarivera has quit IRC18:27
jeblairthat's about t+0 seconds for this boot18:27
jeblairso it seems to have run as expected18:28
*** berendt has joined #openstack-infra18:28
*** berendt has quit IRC18:28
*** baoli has quit IRC18:29
*** baoli has joined #openstack-infra18:29
*** cody-somerville has quit IRC18:29
*** cody-somerville has joined #openstack-infra18:29
*** csomerville has joined #openstack-infra18:30
openstackgerritJeremy Stanley proposed openstack-infra/puppet-snmpd: Remove initscript  https://review.openstack.org/35609018:30
openstackgerritHenry Gessau proposed openstack-infra/project-config: Use python-db-jobs for networking-sfc  https://review.openstack.org/35435818:30
fungimtreinish: pabelanger: ^18:31
*** Jeffrey4l has quit IRC18:31
*** rbrndt has joined #openstack-infra18:32
*** ociuhandu has joined #openstack-infra18:32
*** tqtran has joined #openstack-infra18:32
*** cody-somerville has quit IRC18:34
*** abregman has joined #openstack-infra18:36
*** chem` has joined #openstack-infra18:37
*** chem has quit IRC18:38
*** _nadya_ has joined #openstack-infra18:40
*** amotoki has joined #openstack-infra18:45
mtreinishfungi: heh, that would do it18:45
openstackgerritPaul Belanger proposed openstack-infra/project-config: Add ansible-role-jobs for browbeat  https://review.openstack.org/35609318:45
openstackgerritVasyl Saienko proposed openstack-infra/devstack-gate: DO NOT REVIEW  https://review.openstack.org/35609418:46
*** _nadya_ has quit IRC18:48
*** amotoki has quit IRC18:48
mtreinishsdague: ok, sure. At least we know now18:49
mtreinishfungi: especially since, with the whole systemd thing on xenial, an init script is even less useful there :)18:49
mtreinishfungi: do we need an equiv systemd unit file for xenial or does it come with the package?18:50
openstackgerritIsaku Yamahata proposed openstack-infra/project-config: Add networking-odl for grafana dashboard  https://review.openstack.org/35322618:51
fungimtreinish: the snmpd package on xenial ships with an initscript still18:51
mtreinishheh, ok18:52
*** abregman has quit IRC18:52
*** ryanpetrello has quit IRC18:54
*** ryanpetrello has joined #openstack-infra18:55
*** chem` has quit IRC18:55
*** Goneri has quit IRC18:55
*** tonytan4ever has joined #openstack-infra18:58
*** devkulkarni1 has joined #openstack-infra18:59
*** Sukhdev has joined #openstack-infra18:59
fungiit's that (weekly infra team meeting) time again! find us in #openstack-meeting for the next hour19:00
*** Na3iL has quit IRC19:00
*** e0ne has quit IRC19:01
*** Goneri has joined #openstack-infra19:01
*** andrey-mp has left #openstack-infra19:01
*** devkulkarni has quit IRC19:01
*** baoli has quit IRC19:01
*** edtubill has joined #openstack-infra19:02
openstackgerritIsaku Yamahata proposed openstack-infra/project-config: networking-odl: cover more combinations of version  https://review.openstack.org/34704519:02
*** camunoz has joined #openstack-infra19:04
*** edtubill has quit IRC19:04
openstackgerritJoost van der Griendt proposed openstack-infra/jenkins-job-builder: Add support for stash-pullrequest-builder plugin Although the application has now been renamed/merge to BitBucket, it is still sensible to keep the Stash name for now. As there are already plugins named BitBucket, which are purely targeting the cloud solu  https://review.openstack.org/35521119:05
*** fifieldt has quit IRC19:06
*** gomarivera has joined #openstack-infra19:08
openstackgerritIsaku Yamahata proposed openstack-infra/project-config: Add networking-odl for grafana dashboard  https://review.openstack.org/35322619:11
*** edtubill has joined #openstack-infra19:13
*** sdake_ has joined #openstack-infra19:14
*** asettle has joined #openstack-infra19:14
*** sdake has quit IRC19:14
*** Apsu has left #openstack-infra19:15
*** sdake_ has quit IRC19:15
*** docaedo has quit IRC19:16
*** sdake has joined #openstack-infra19:16
*** dtardivel has quit IRC19:17
openstackgerritMerged openstack-infra/system-config: Pre-install python2-requests package for Fedora  https://review.openstack.org/35573119:17
*** edtubill has quit IRC19:18
*** fifieldt has joined #openstack-infra19:19
*** _nadya_ has joined #openstack-infra19:20
*** martinkopec has joined #openstack-infra19:20
*** martinkopec has quit IRC19:21
anteayasdague: I'm at a loss about http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2016-08-16.log.html#t2016-08-16T11:51:3619:21
*** asettle has quit IRC19:21
anteayasdague: you link to a line that says  groups:19:21
anteaya- labs19:21
sdagueanteaya: I assumed you'd be up at that timezone, and it was a project-config question19:21
anteayaand ask about xenial nodes19:21
anteayadid you get an answer?19:22
anteayaI have been offline most of today19:22
sdagueabout how to set the launchpad bug project page19:22
sdagueI did not, but cdent is really the one that needs to know19:22
anteayayes, that is the way to set a launchpad group for bugs19:22
anteayathis is the best documentation for setting up launchpad: http://docs.openstack.org/infra/manual/creators.html#set-up-launchpad19:23
anteayaas the group and who owns the group is important19:23
*** iurygregory has joined #openstack-infra19:25
*** Hal has joined #openstack-infra19:26
*** edtubill has joined #openstack-infra19:27
*** edtubill has quit IRC19:28
*** xyang1 has quit IRC19:29
*** Hal has quit IRC19:30
*** edtubill has joined #openstack-infra19:30
*** tqtran has quit IRC19:34
*** tongli has quit IRC19:35
*** Goneri has quit IRC19:36
karthikp_clarkb: afazekas: sdague, ianw Please could you help me review these changes to infra in your free time? we need them to test the multinode grenade job for Cinder19:37
karthikp_Thanks in advance19:37
*** gomarivera has quit IRC19:40
karthikp_https://review.openstack.org/#/c/355678/19:41
sdagueclarkb: is there a patch up already to move cells & ceph jobs to xenial?19:41
*** _nadya_ has quit IRC19:42
openstackgerritJoost van der Griendt proposed openstack-infra/jenkins-job-builder: Adding support for Hidden parameter plugin  https://review.openstack.org/35520919:42
*** docaedo has joined #openstack-infra19:44
*** markusry has quit IRC19:44
*** tqtran has joined #openstack-infra19:47
*** florianf has quit IRC19:48
*** markusry has joined #openstack-infra19:48
tlbrinfra-team could you please merge https://review.openstack.org/#/c/347047/ ?19:48
*** hockeynut has joined #openstack-infra19:48
openstackgerrityolanda.robla proposed openstack-infra/system-config: Bump version of rabbitmq module  https://review.openstack.org/35611719:52
*** hockeynut has quit IRC19:53
*** kzaitsev_mb has joined #openstack-infra19:54
*** hockeynut has joined #openstack-infra19:54
*** Apoorva has joined #openstack-infra19:55
mtreinishjeblair, fungi: how difficult would it be to get the node type into the metadata we pass to logstash and subunit2sql?19:56
*** nwkarst__ has quit IRC19:57
*** nwkarsten has joined #openstack-infra19:57
*** asettle has joined #openstack-infra19:59
*** asettle has quit IRC19:59
*** tqtran has quit IRC19:59
*** camunoz has quit IRC19:59
*** annegentle has joined #openstack-infra19:59
*** nwkarste_ has joined #openstack-infra20:00
jpmaxmanKrenair: I think your config is more correct - keep in mind this was patched together going from an older distribution / apache.  I'm assuming you started fresh with trusty / apache 2.420:01
*** nwkarsten has quit IRC20:02
KrenairFresh Trusty, then I applied puppet which gave me apache etc.20:02
fungiKrenair: yeah, this was me porting jpmaxman's apache config changes to production. i didn't really try to whittle them down. what you have in your change is likely sufficient20:02
*** julim has quit IRC20:03
fungithat diff was simply between what we had on the production server and what i found on the upgrade test server. so it's known-working, but almost certainly could be improved/tightened20:04
yolandahi, so no time in the meeting... i wanted to raise the topic of the mid-cycle sprint. OPNFV people are interested in adding a slot to the agenda; they requested it in the etherpad20:04
*** e0ne has joined #openstack-infra20:05
jeblairmtreinish: possible; have the ansible launch server return it in the zmq event it sends20:05
*** tqtran has joined #openstack-infra20:05
mtreinishjeblair: ok do you have a link to where to start looking? That way I can take a detailed look after the tc meeting20:06
fungiyolanda: opnfv people from the qa team? or we have opnfv people on the infra team?20:07
jeblairmtreinish: yep, right here: http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/launcher/ansiblelaunchserver.py#n85720:07
mtreinishjeblair: cool, thanks20:07
*** edtubill has quit IRC20:07
fungimtreinish: so you mean some different node type than what we already record in logstash?20:08
yolandaopnfv people from infra. Actually Fatih is interested on coming20:08
yolandai'm collaborating with them in infracloud deployment efforts on opnfv20:09
fungiyolanda: cool, i didn't know we had people in infra helping with that20:09
jeblairfungi, pabelanger: i believe i have found that restoring entropy data from a file in the manner of systemd (or init scripts that use dd) does put entropy into the pool, but does *not* update the entropy count.20:09
fungiyolanda: so it's really less an opnfv topic, and more a making infra-cloud reconsumable downstream topic?20:09
*** matrohon has joined #openstack-infra20:09
anteayayolanda: what is Fatih's irc nick?20:10
mtreinishfungi: we have the build_node and the node_provider today. I just want like trusty, or xenial-2-node or something like that20:10
*** vhosakot has quit IRC20:10
anteayaI've been bemoaning the lack of new women lately20:10
yolandathey have interest in infra-cloud, they need some specific features, being more reconsumable, having ha, some more network configs20:10
anteayagreat to see more20:10
yolandabut also a specific opnfv topic, about how we can collaborate better20:10
mtreinishfungi: so we can more easily see if a failure is isolated to a specific distro or something like that20:10
yolandafungi, Fatih nick is fdegir20:10
anteayayolanda: thanks20:11
openstackgerritMerged openstack-infra/elastic-recheck: Add query for bug 1613749  https://review.openstack.org/35598820:11
openstackbug 1613749 in OpenStack-Gate "Timeouts when requesting a glance image created with a remote image from git.o.o" [Undecided,New] https://launchpad.net/bugs/161374920:11
jpmaxmanso Krenair Fungi I'm not super familiar with Puppet - so really with your changes as far as I can tell they look good.   I'd be more capable of  judging by looking at a resulting server that was spun up from these puppet scripts.  I was actually hoping to do that myself, but it's a little silly to hold this up for that.  I'm hopeful to get more familiar with20:11
jpmaxmanpuppet in general and be able to be more helpful with that side of things moving forward.20:11
*** vhosakot has joined #openstack-infra20:11
jeblairmtreinish: 'node_image' is what i would recommend for naming that with specificity20:11
fungimtreinish: oh, indeed, for some reason i thought we had the base node label as a parameter there already20:11
fungibut on inspection i see it's definitely noy20:12
funginot20:12
mtreinishI prefer noy :)20:12
jeblairjpmaxman: cool -- if it's at all helpful, there's a bit of a walkthrough here about how to run infra puppet on a vm: http://docs.openstack.org/infra/system-config/sysadmin.html#making-a-change-in-puppet20:13
yolandafungi, anteaya, so well, i wanted to draw attention to the etherpad, requesting that slot be added if there is time, so Fatih can come to the mid-cycle if there is interest in it20:13
fungiyolanda: testing additional deployments of our infra-cloud manifests sounds like something some of the attendees might find interesting, but i would avoid spinning it as the infra team helping opnfv deploy a cloud20:16
KrenairThere's a story somewhere about having a wiki-dev server20:16
Krenairmaybe the puppet changes could be applied there and tested properly?20:16
*** tqtran has quit IRC20:16
KrenairIt could be that my puppet changes don't cover everywhere and there's still some things to do that I didn't find20:17
*** gomarivera has joined #openstack-infra20:17
Krenaire.g. you made some changes for ReCaptcha I think?20:17
fungiKrenair: yep, i expect that we'll do that as the next step after we merge those changes. puppet is currently disabled for the production server since the upgrade20:17
*** e0ne has quit IRC20:18
fungiKrenair: http://paste.openstack.org/show/558529 was the change i applied to Settings.php (with credentials redacted)20:18
*** yaume has joined #openstack-infra20:18
yolandafungi, i would not say "infra team helping them", but would propose it as a way to collaborate or join efforts20:19
Krenairfungi, yeah it seems we're going to have quite a few extra things to puppetise20:19
fungiKrenair: again, just directly ported from the upgrade test server jpmaxman worked on, with some whitespace cleanup to reduce the diff as much as possible20:19
Krenairwhy was MF removed?20:20
fungiKrenair: it allowed account creation outside openid previously20:20
fungiit's possible in 1.27 that's no longer the case20:20
Krenairokay but isn't disabling that a separate patch? why was it included in a wiki-upgrade change?20:21
yolandaanyway, i have to leave; i'll try to attend the next infra meeting and propose an item for the agenda, to see if there is interest in having a slot for it or not20:22
jpmaxmanKrenair: also when I enabled it the wiki error'd out20:22
jpmaxmanI didn't dig into it too deep20:23
fungiyeah, we discussed the possibility of reenabling it again once we work out what's needed20:23
fungithis was fairly rushed as we're still scrambling to get the spam problem under control20:23
*** baoli has joined #openstack-infra20:24
fungiso having a wiki with limited incoming spam was prioritized over some previous features we had20:24
*** pfallenop has quit IRC20:24
fungisimilar for file uploads20:24
Krenairokay well20:24
pabelangerjeblair: good to know, thanks for the update20:25
Krenairthere's no safe way you can just send these commits through and apply puppet in prod, it's going to have to go through a wiki-dev server20:25
fungiKrenair: yes, that's what i'm expecting20:25
Krenairtoo many unknowns created by working on servers without using puppet20:26
fungiKrenair: we have puppet entirely disabled for the production server for now so we can work through massive refactoring of that puppet module in safety on a dev deployment20:26
cloudnullfungi pabelanger mordred jeblair: Just as an update, we've found that the VLAN that was supposed to be running on all of our compute nodes wasn't trunked to all of the required switch ports. so that is likely a major part of the recent raft of failures. I **Believe** this is fixed now. we're rerunning some tests and I'll let you know what I find out.20:26
Krenairokay. what will it take to get a -dev server?20:26
KrenairI assume these servers are all just instances in a cloud somewhere, right? you don't have to procure hardware for this20:26
jpmaxmanright - I think dev-wiki is next step and get that where we want it to be with the functionality we want20:27
*** jordanP has joined #openstack-infra20:27
jpmaxmanKrenair: correct20:27
krotscheckmordred: I'm going through these cloud-config things here- what's the point of having the API version in clouds.config? Shouldn't the SDK be the thing that knows what language it can talk?20:27
pabelangercloudnull: Nice, thanks for the update20:27
fungiKrenair: i (or another of our ~dozen root admins) needs to launch one. this is a priority for me, but it's competing with a number of other priorities so i can't promise it in the next 24-48 hours20:27
*** inc0 has quit IRC20:27
krotscheckAny infra cores around that can +A this patch? I've got 2x+2's, Ajaeger is on vacation, and I don't really want to sit on this for the next two weeks.20:28
Krenairokay well, don't let me rush you :)20:28
krotscheckhttps://review.openstack.org/#/c/346130/20:28
*** piet_ has quit IRC20:28
fungikrotscheck: specifically we'll need to create a server instance for it, a trove instance to hold its database, a cinder volume for the file content mounted in the appropriate place on the fs, add some dns records, and we're probably at a minimum also lacking some glue in the system-config repo to instantiate the mediawiki module for that new server name20:28
fungier, Krenair ^20:29
fungisorry krotscheck20:29
*** piet has joined #openstack-infra20:29
* krotscheck lays claim on the tab-completion scope of the letter K!20:29
Zara:)20:29
* fungi is now known as krugerand20:30
fungioh, i guess it has two r's20:30
fungiwell, three in total20:30
*** tqtran has joined #openstack-infra20:31
*** kgiusti has left #openstack-infra20:31
*** gouthamr has quit IRC20:33
*** tqtran has quit IRC20:34
openstackgerritJames Slagle proposed openstack-infra/tripleo-ci: Deploy minimal services in multinode job  https://review.openstack.org/35509720:34
*** pfallenop has joined #openstack-infra20:35
*** xyang1 has joined #openstack-infra20:36
ianwkrotscheck: it's waiting on depends-on's ?20:36
krotscheckianw: Hrm.20:37
krotscheckianw: Ah, right. So https://review.openstack.org/#/c/334873/ is a review that I don't have any other cores on20:37
*** rbuzatu has quit IRC20:37
Krenairfungi, okay, well, let me know when it's up?20:37
KrenairI'm on holiday next week but other than that I should be available20:38
*** javeriak has joined #openstack-infra20:40
ianwkrotscheck: is that list in 334873 curated in any way, or just grabbed from somewhere?  i mean i don't mind just putting it in as is, the only problem would be that it does too much20:40
krotscheckianw: You'd have to ask AJaeger, I think it's a list of default dependencies from project ¯\_(ツ)_/¯20:41
*** pfallenop has quit IRC20:41
*** tqtran has joined #openstack-infra20:42
*** piet has quit IRC20:44
*** javeriak_ has quit IRC20:44
*** armax has quit IRC20:45
*** jheroux has quit IRC20:45
*** e0ne has joined #openstack-infra20:45
fungiKrenair: will do. your work on this so far is much appreciated too!20:47
*** ansmith has quit IRC20:48
*** pfallenop has joined #openstack-infra20:48
*** tqtran has quit IRC20:48
*** kbaegis has quit IRC20:48
*** jheroux has joined #openstack-infra20:49
jpmaxmanyes Krenair thank you!20:51
*** jordanP has quit IRC20:52
*** edtubill has joined #openstack-infra20:52
*** yaume has quit IRC20:54
*** Apoorva_ has joined #openstack-infra20:54
*** kbaegis has joined #openstack-infra20:55
*** matrohon has quit IRC20:55
*** piet has joined #openstack-infra20:56
*** javeriak has quit IRC20:57
*** Apoorva has quit IRC20:57
*** rbrndt has quit IRC20:59
*** tonytan4ever has quit IRC20:59
*** dprince has quit IRC21:00
*** raildo has quit IRC21:01
cloudnullfungi pabelanger mordred jeblair: So I've now built a VM on every compute node using the V6 network and pinged it. Additionally I've added user data to the VM to install traceroute and run traceroute(6) to git.o.o, and from a spot check of many of the instances' console logs they're all able to get there. so I **hope** this "resolves" the issue with instances + busted v6 networks. In testing, I've found a few misbehaving hosts21:01
cloudnulland have pulled them from the available pool.21:01
fungicloudnull: thanks!21:02
*** thorst_ has quit IRC21:02
mtreinishcloudnull: did you check it from a trusty vm by any chance?21:02
fungimtreinish: sdague: ^ keep an eye out for continued hits21:02
cloudnullIDK if that makes the localhost routing thing happy, but getting there.21:02
cloudnullmtreinish: no, i did it w/ xenial21:02
mtreinishcloudnull: because that was another side of the equation we saw. The failures were only happening on trusty jobs21:03
cloudnullI can do it w/ trusty21:03
cloudnullmtreinish: the localhost failures were on trusty?21:04
mtreinishcloudnull: yep21:04
cloudnullok. i'll give that a go too21:04
*** javeriak has joined #openstack-infra21:04
*** sdague has quit IRC21:04
*** tqtran has joined #openstack-infra21:07
*** yamamoto has joined #openstack-infra21:08
openstackgerritMerged openstack-infra/project-config: Added documentation draft jobs for nodejs-based projects  https://review.openstack.org/34613021:09
*** gomarivera has quit IRC21:10
fungicloudnull: `nc 2001:4800:1ae1:18:f816:3eff:fed4:f536 19885` to see a log of an instance which showed failing traceroute6 as recently as 10 minutes ago. uuid is 553a91ef-fe3d-4c14-965a-419cf93acbba21:10
openstackgerritMerged openstack/python-jenkins: Remove discover from test-requirements  https://review.openstack.org/34576421:10
*** Apoorva_ has quit IRC21:11
cloudnullI can ping that node.21:12
fungicloudnull: i ssh'd into it, and `traceroute6 git.openstack.org` continues to fail for me there21:12
*** Apoorva has joined #openstack-infra21:12
fungiwant me to hold it?21:12
cloudnullhum. are the routes set?21:12
cloudnullif you could21:12
fungiokay, it's held21:13
cloudnulldoes ``host git.openstack.org`` work?21:13
*** dizquierdo has quit IRC21:13
fungiyeah, and it also resolved it correctly for the traceroute621:13
fungijust gets no responses back to its datagram probes21:14
fungidefault via fe80::def dev eth0  proto ra  metric 1024  expires 1787sec hoplimit 6421:14
fungiwhich i take it is the linklocal of the next hop21:14
cloudnullhum.21:14
fungife80::def dev eth0 lladdr 00:05:73:a0:00:06 router REACHABLE21:14
* cloudnull looking at the compute node21:14
*** yamamoto has quit IRC21:15
fungi`ping6 git.openstack.org` from it works fine21:16
*** Hal has joined #openstack-infra21:16
cloudnullwell thats odd.21:16
fungiit's possible that these failing v6 traceroutes are correlated to "trusty in osic" and the job failures are also correlated to "trusty in osic" but that the two behaviors are unrelated21:18
*** matrohon has joined #openstack-infra21:18
cloudnullfungi: its missing all the hops ? or just fails all together?21:18
*** gyee has quit IRC21:20
*** edtubill has quit IRC21:20
cloudnullalso, does cloning from git.o.o work and _not_ take the 4-some-odd minutes?21:20
*** sarob has joined #openstack-infra21:20
*** gomarivera has joined #openstack-infra21:20
*** e0ne has quit IRC21:20
fungicloudnull: missing all hops21:21
*** jcoufal_ has quit IRC21:22
fungiwe have trusty servers elsewhere with working ipv6 and the basic ip6tables -L output matches21:22
fungiand i can successfully traceroute6 to stuff from those21:22
*** jordanP has joined #openstack-infra21:23
*** jkilpatr has quit IRC21:23
*** spzala_ has quit IRC21:23
cloudnulldoes it miss all of the hops w/ something else, like google.com ?21:23
fungiso it doesn't seem to be a misconfigured firewall rule or trusty-specific bug21:23
*** spzala has joined #openstack-infra21:23
*** gyee has joined #openstack-infra21:23
fungiyep21:24
cloudnulland from the sounds of it, everything is working?21:24
cloudnullbesides the traceroute that is21:24
fungiright, so i think the traceroute errors we're getting in the logs may be unrelated to the slow git clones and to the glance-related errors in devstack21:25
mtreinishfungi: heh, that'd be too much of a coincidence for me to have 2 separate issues with trusty + osic involving talking to git.o.o21:25
*** sarob has quit IRC21:25
fungiit's something we can (and should) dig into, but i'm unconvinced it's a marker for the other issues21:25
*** sarob has joined #openstack-infra21:26
fungimtreinish: well, i have a node held where traceroute6 to git.o.o times out, but pin6 to it works fine and git cloning from it works fine21:26
fungis/pin6/ping6/21:26
*** rcernin has quit IRC21:27
fungii'm going to try to find more examples of traceroute6 _working_ from osic, and see if any of them are on trusty21:27
*** javeriak has quit IRC21:28
*** spzala has quit IRC21:28
cloudnullit seems suspect, but I'm going to rope in some of our network folks to see what's what.21:28
cloudnullmaybe a misconfiguration somewhere in the path.21:29
fungii have to step away for a bit though and eat dinner. bbiab21:30
cloudnullkk, ttyl21:32
cloudnullenjoy dinner.21:32
*** jordanP has quit IRC21:32
*** matrohon has quit IRC21:32
*** annegentle has quit IRC21:32
*** baoli has quit IRC21:33
jeblairfungi, pabelanger: i'm still not quite at the bottom of the rabbit hole, but i think i'm getting close.  neither systemd nor haveged alone is sufficient to initialize urandom.  systemd's entropy is not counted at all.  during the initialization phase, all system entropy goes to the urandom pool, *unless* it comes in via ioctl, which is what haveged does.  in that case, it goes straight to the input pool, and either none of it, or at ...21:33
jeblair... least not enough of it spills over (really, this is a thing) into the nonblocking (urandom) pool for it to be initialized.  eventually, it's regular system entropy which pushes it over the 128 bit threshold.21:33
jeblairi have good news though21:34
jeblairted ts'o ripped all of this out last month: https://git.kernel.org/cgit/linux/kernel/git/tytso/random.git/commit/?h=dev&id=e192be9d9a30555aae2ca1dc3aad37cba484cd4a21:35
jeblairso it's going to get better21:35
jeblairi have one more kernel recompile i want to do, then i think i'll be ready to try the experiment with the other generator21:36
*** tqtran has quit IRC21:37
*** tqtran has joined #openstack-infra21:37
cloudnullfungi: when you get back, if you would not mind, ``traceroute -6 -T git.openstack.org`` which forces TCP instead of the assumed UDP21:38
cloudnullalso same for -I21:38
*** jheroux has quit IRC21:38
cloudnullwhich is forcing ICMP, maybe the UDP packets are getting dropped/deprioritized in the path?21:38
cloudnullI'd be curious if that too fails21:39
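The comparison cloudnull asks for (UDP, the default; TCP with `-T`; ICMP with `-I`; all over IPv6 with `-6`) can be scripted so each node reports how many hops answered per probe method. This is a minimal sketch, assuming the Linux `traceroute` package that accepts `-6`, `-T` and `-I` is installed (TCP probes usually need root). `traceroute_cmd`, `answered_hops` and `compare_methods` are hypothetical helper names, and the hop-line parsing is a rough heuristic rather than a full parser.

```python
import subprocess

# Extra flags per probe method; plain traceroute defaults to UDP probes.
PROBE_METHODS = {"udp": [], "tcp": ["-T"], "icmp": ["-I"]}


def traceroute_cmd(host, method):
    """Build an IPv6 traceroute command line for the given probe method."""
    return ["traceroute", "-6", *PROBE_METHODS[method], host]


def answered_hops(output):
    """Count hop lines where at least one probe got a reply (not all '*')."""
    count = 0
    for line in output.splitlines():
        fields = line.split()
        # Hop lines start with the hop number; the header line does not.
        if fields and fields[0].isdigit() and any(f != "*" for f in fields[1:]):
            count += 1
    return count


def compare_methods(host="git.openstack.org"):
    """Run traceroute once per method and return {method: answered hop count}."""
    results = {}
    for method in PROBE_METHODS:
        proc = subprocess.run(traceroute_cmd(host, method),
                              capture_output=True, text=True, check=False)
        results[method] = answered_hops(proc.stdout)
    return results
```

On a node showing the symptom fungi describes, one would expect `udp` and `tcp` to report zero answered hops while `icmp` reports some.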
*** thorst_ has joined #openstack-infra21:39
*** gomarivera has quit IRC21:40
*** rhallisey has quit IRC21:41
*** tqtran has quit IRC21:42
dmsimardo/ I'm trying to find where the Cirros image gets pre-cached in the nodepool images. I searched for "cirros-0.3.4-x86_64-disk.img" in project-config and system-config but no luck :(21:42
dmsimardI know it ends up in '~/cache/files' but I want to know how.21:43
*** sarob has quit IRC21:43
*** thorst_ has quit IRC21:43
*** ldnunes has quit IRC21:43
jeblairdmsimard: devstack i think21:44
mtreinishfungi, rcarrillocruz: https://github.com/eclipse/mosquitto/commit/ba2de8879008f6df90a0d6af5902926483051124 the mosquitto bug got fixed21:44
*** sarob has joined #openstack-infra21:44
mtreinishjeblair: yeah, devstack has a command which exports a list of images to precache and the nodepool image scripts call that21:44
dmsimardjeblair: ah, found it, ty https://github.com/openstack-dev/devstack/blob/06f3639a70dc5884107a4045bef5a9de1fb725a5/stackrc#L64521:44
*** nmagnezi has quit IRC21:45
beaglesthe irony would be this network thing being MTU and neutron related21:45
mtreinishjeblair, dmsimard: http://git.openstack.org/cgit/openstack-dev/devstack/tree/tools/image_list.sh21:45
mtreinishdmsimard: and http://git.openstack.org/cgit/openstack-infra/project-config/tree/nodepool/scripts/cache_devstack.py21:46
*** admcleod_ has joined #openstack-infra21:46
openstackgerritJames Slagle proposed openstack-infra/tripleo-ci: Deploy minimal services in multinode job  https://review.openstack.org/35509721:46
*** rbrndt has joined #openstack-infra21:46
*** matrohon has joined #openstack-infra21:46
*** weshay has quit IRC21:47
*** nwkarste_ has quit IRC21:48
*** nwkarsten has joined #openstack-infra21:48
*** njohnston has joined #openstack-infra21:48
cloudnullbeagles: you may be onto something there. we're using an MTU of 9000 on the hosts, maybe something is off on nodes that are showing signs of failure.21:48
*** admcleod has quit IRC21:49
* beagles facepalm21:49
*** fguillot has quit IRC21:50
njohnstonHi, I have a quick question about change 339246 - it has been sitting in the zuul UI in the check queue, all tests having completed, for over an hour now I believe.  Will it ever post its results so the change can move on to the gate queue?21:50
*** matt-borland has quit IRC21:51
*** gomarivera has joined #openstack-infra21:51
*** annegentle has joined #openstack-infra21:51
cloudnullbeagles: sadly not the problem21:51
cloudnull:'(21:52
*** ggillies has joined #openstack-infra21:52
cloudnulli kinda wish it was; it would've been simple to fix...21:52
*** nwkarsten has quit IRC21:52
beagles:(21:52
fungicloudnull: yeah, same behavior with traceroute6 -T as with the default method. however slightly different behavior with -I... first attempt none of the hops gave a response except git.openstack.org and it only responded to the second probe, but then rerunning with -I a second time worked correctly (-T and default protocols still do not however)21:53
*** thorst_ has joined #openstack-infra21:54
cloudnullfungi: traceroute6 or traceroute -6 ?21:54
*** tqtran has joined #openstack-infra21:54
fungicloudnull: traceroute621:55
fungitrying now with traceroute -6 and various options (i didn't know traditional traceroute grew a -6 option)21:55
cloudnullyea, that was news to me today too :)21:56
pabelangerjeblair: wow, that is a rabbit hole21:56
openstackgerritMerged openstack-infra/system-config: Disable puppet service on boot  https://review.openstack.org/35600421:56
*** annegentle has quit IRC21:58
fungijeblair: yeah, i'm aware ted ts'o has been heavily revamping the entropy gathering and rng stuff kernel-side. very excited for that to finally land21:58
*** sarob has quit IRC21:58
*** edmondsw has quit IRC21:58
*** thiagop has quit IRC21:58
*** amitgandhinz has quit IRC21:58
fungiit's been all abuzz on the post-cypherpunks crypto lists21:59
*** tqtran has quit IRC21:59
*** piet has quit IRC22:00
jeblairfungi, pabelanger: on boot, the first pull from urandom results in a transfer of 0 bits of entropy from the input pool to the nonblocking pool.22:00
*** spzala has joined #openstack-infra22:00
fungithat brings a tear to my eye22:00
jeblairfungi, pabelanger: that transfer of 0 bits causes a timer to start which protects urandom from draining the input pool too quickly.22:00
jeblairfungi, pabelanger: which means that later, after haveged dumps 4096 bits of entropy into the input pool, the system waits 60 seconds before it will allow a transfer from input to nonblocking for urandom to reseed22:01
*** nwkarsten has joined #openstack-infra22:01
jeblairwhich is why we were seeing an almost exactly 60 second delay22:01
jeblairand when i turned off haveged, the 90 seconds was just how long it took to naturally accumulate entropy from interrupts one bit at a time22:02
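The timing jeblair describes can be checked with a toy model. Under his account, interrupts credit about 1 bit per second straight into the nonblocking pool during initialization, ioctl-supplied entropy (haveged) lands in the input pool, and a transfer from input to nonblocking is only allowed once the 60-second timer started by the first urandom read has expired. This is a simplified sketch under those assumptions, not kernel code: time moves in whole seconds, the whole input pool moves at once, and `seconds_until_urandom_ready` is a hypothetical name.

```python
INIT_THRESHOLD_BITS = 128  # the pool initializes once its total exceeds this
PULL_INTERVAL_S = 60       # rate limit on input -> nonblocking transfers


def seconds_until_urandom_ready(ioctl_credits, interrupt_bits_per_s=1, horizon=600):
    """Return the second at which the toy nonblocking pool initializes.

    ioctl_credits maps a second to the bits credited to the input pool then.
    """
    input_pool = 0
    nonblocking = 0
    last_pull = 0  # first urandom read at t=0 moved 0 bits but started the timer
    for t in range(1, horizon + 1):
        nonblocking += interrupt_bits_per_s      # interrupts feed urandom directly
        input_pool += ioctl_credits.get(t, 0)    # ioctl entropy waits in input pool
        if input_pool and t - last_pull >= PULL_INTERVAL_S:
            nonblocking += input_pool            # timer expired: transfer allowed
            input_pool = 0
            last_pull = t
        if nonblocking > INIT_THRESHOLD_BITS:
            return t
    return None
```

With haveged dumping 4096 bits at t=3 (`{3: 4096}`) the model initializes at exactly 60 seconds, matching the observed delay. With no helpers it takes 129 seconds of interrupts, which is slower than the roughly 90 seconds jeblair measured, so the model misses some other entropy source.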
fungicloudnull: confirmed, traceroute -6 $* gives me identical behavior to traceroute6 $*22:02
cloudnullbummer.22:02
fungiright down to the strangeness with -I22:02
cloudnulloff to the next rabbit hole22:02
*** gomarivera has quit IRC22:03
bkerojeblair: Trying to gather entropy inside a VM?22:03
*** mriedem has quit IRC22:04
*** gomarivera has joined #openstack-infra22:05
fungibkero: specifically, trying to get unbound to not wait 60 seconds from boot before it can start, since that causes all other services starting and trying to resolve names via dns to bomb22:05
fungiand unbound wants a working /dev/urandom to be able to do stuff for dnssec22:06
*** nwkarsten has quit IRC22:06
fungiand the kernel makes /dev/urandom basically useless for a full minute after boot starting with linux 3.1722:06
anteayanjohnston: all tests are not complete on 339246,22:07
anteayanjohnston: all tests are not complete on 339246,22:07
anteayanjohnston: all tests are not complete on 339246,22:08
*** devkulkarni1 has quit IRC22:08
anteayanjohnston: all tests are not complete on 339246,22:08
* fungi thinks anteaya is caught in a loop22:08
anteayanjohnston: all tests are not complete on 339246, one test22:08
bkerofungi: weird. i would have assumed that urandom would be (as the name says) unblocking22:08
anteayanjohnston: all tests are not complete on 339246, one test is waiting for a node:22:08
anteayanjohnston: all tests are not complete on 339246, one test is waiting for a node:22:08
*** rbuzatu has joined #openstack-infra22:08
anteayagate-tempest-dsvm-neutron-full-ubuntu-xenial22:08
anteayasorry for the multiple spam22:09
anteayamy laptop was doing something weird with pasting22:09
anteayaand I had scrolled up22:09
anteayamy apologies22:09
fungibkero: yep, the kernel wants it to be safely seeded so processes don't rely on it before it's sufficiently entropic22:09
*** tqtran has joined #openstack-infra22:10
fungiand manage that by blocking on reads during that time22:10
bkerofungi: Huh, man urandom has a little shell script to carry that randomness between reboots. Cute.22:11
jeblairbkero: read scrollback from me today to understand why that doesn't help22:11
openstackgerritMerged openstack-infra/tripleo-ci: Use geard with keepalives  https://review.openstack.org/35256622:12
anteayafungi: and you had pinged me yesterday that the patch merged to allow anyone to compose electoral rolls; thank you for all your work on that22:12
anteayaand zaro too22:12
*** rbuzatu has quit IRC22:13
fungianteaya: yw! that and also the patch to expose submitted date via the rest api in change details are both in production, so the script is a good bit simpler now22:14
anteayayay simpler scripts!22:15
*** esberglu has quit IRC22:16
bkerojeblair: read scrollback. That's just unfun.22:16
jeblairbkero: okay, well, i mean, i've been digging into a seriously complex subject all day.  i'm not sure if you're trying to help or not.22:16
fungicloudnull: so, more spot checking, every trusty node i've found in osic has broken traceroute6, every xenial node i've found in osic seems to have a working traceroute622:16
*** mdrabe has quit IRC22:17
beaglesis there a way to get a packet trace on hosts that are doing the 4 minute clone thing22:17
bkerojeblair: ignore me, just sympathies22:17
beaglesmmm22:17
bkeroIf you're at the point of recompiling kernels to add printk()s I'm not going to be much help.22:17
beaglesactually that probably wouldn't help  - a retransmit doesn't tell you why22:17
notmorganbkero: oh god22:17
fungibeagles: what's a packet trace? do you mean route trace or a packet capture?22:17
jeblairbkero: if you would like to help, i'm happy to have it, just not sure to what degree i should invest in bootstrapping you -- not reading scrollback suggests you may not be very invested.  :)22:18
notmorganbkero: recompiling kernels.... i... nooooooo22:18
fungibeagles: sounds like you meant a packet capture22:18
beaglesfungi, I was referring to capture22:18
beaglesyeah22:18
bkerojeblair: I meant I did read scrollback and was offering sympathies22:18
jeblairbkero: ah,  thanks on all accounts then :)22:18
fungibeagles: we'd need to catch one of those instances while the job was running, since nodepool deletes them immediately on failure22:18
jeblairbkero: i read your 'read' as 'read' when you meant 'read'.22:19
jeblairbkero: more like "i just read scrollback" and less like "what? me read scrollback"  :)22:19
bkeroyeah22:19
bkeroMy phrasing could have been better22:19
beaglesfungi, I could probably point you in the right direction there.. I don't know if a packet trace will help or not, but it might provide some kind of clue as to what the "profile" of the poor connection is22:20
fungibeagles: we're running down alternate theories involving other anomalous symptoms we're able to observe, in hopes that they're related enough to provide an indicator22:20
beaglesfungi, ack22:20
*** tqtran has quit IRC22:20
fungibeagles: we've also got glance doing something odd in certain devstack jobs only on trusty nodes in osic, and traceroute6 not working correctly on trusty in osic (while xenial seems to be doing fine)22:21
jeblairfungi, bkero, pabelanger: i've moved on to investigating why rng-tools makes this better -- somehow it has ticked a code path where entropy is transferred from the input pool to the nonblocking pool more often than 60 seconds22:21
jeblairso i may not understand that timer fully...22:21
jeblair(also, that timer value can be set in proc, but that's not the solution i'd like to take)22:22
*** gordc has quit IRC22:22
bkeroHahaha, wow. The char/random.c is copyright Matt Mackall of Mercurial fame.22:22
fungijeblair: there was an ubuntu bug that talked some about that. lemme see if i can dig it back up. something about the kernel not allowing userspace to advance the entropy pool directly, but the method rng-tools uses bypasses that22:22
fungithe idea is that less privileged processes should be able to add to entropy, while still not trusted to actually provide good quality entropy22:24
jeblairfungi: hrm, it *looks* like it's using the same ioctl that haveged is using... but yeah, that may be helpful22:24
jheskethMorning22:25
* bkero reading random.c, looks like adding entropy might not necessarily trigger 'crediting' the pool size. That might have to be done manually depending on the method used to add it.22:25
jeblairbkero: yes, that's why the 'save script' doesn't work.  but haveged (and presumably rng tools) use the ioctl which does credit22:26
jeblairbkero: sarch for RNDADDENTROPY:22:26
jeblairbkero: hower, that goes to the *input* pool instead of the nonblocking (urandom) pool.  so the thing that's missing is triggering a transfer from input to nonblocking22:26
bkeroAhhh okay22:26
jeblairsarch=search; hower=however22:27
*** tqtran has joined #openstack-infra22:27
rcarrillocruzmtreinish: nice!22:27
*** sdake has quit IRC22:27
*** rbuzatu has joined #openstack-infra22:28
*** sdake has joined #openstack-infra22:28
*** rbuzatu has joined #openstack-infra22:29
anteayamorning jhesketh22:29
*** dimtruck is now known as zz_dimtruck22:29
*** zz_dimtruck is now known as dimtruck22:29
jeblairbkero, fungi, pabelanger: oh -- i think pulls can only happen once per 60 seconds, but i think if you add entropy with the ioctl, and the input pool is full, then it can schedule a transfer from input to nonblocking pools22:31
*** yamahata has quit IRC22:31
jeblairbkero, fungi, pabelanger: it's looking like rng-tools does multiple ioctls to add entropy -- along with, on my test system, haveged adding one of its own22:31
jeblairso that's how adding rng-tools makes initialization happen faster22:31
bkerohm, ok22:32
* bkero looking what nonblocking_pool.initialized does22:32
jeblairpresumably, convincing haveged to push more entropy than required (possibly via multiple ioctls) may do the same22:32
fungisounds like something worth testing22:33
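jeblair's hypothesis is that one large ioctl (haveged) does not trigger the input-pool overflow push toward the nonblocking pool, while several smaller ioctls (rngd) do. The sketch below models that idea. It assumes, based on a reading of 4.4-era `credit_entropy_bits`, that the pool's running `entropy_total` resets to 0 at the moment the pool initializes, and that a push needs the pool above a write-wakeup threshold with a fresh running total of at least twice the read-wakeup threshold. The threshold values (64 and 896 bits) are assumed defaults, and `InputPool` and `push_scheduled` are hypothetical names. Check all of this against `drivers/char/random.c` before relying on it.

```python
READ_WAKEUP_BITS = 64     # assumed default read-wakeup threshold
WRITE_WAKEUP_BITS = 896   # assumed default write-wakeup threshold
POOL_BITS = 4096          # input pool capacity


class InputPool:
    def __init__(self):
        self.bits = 0
        self.initialized = False
        self.entropy_total = 0

    def credit(self, nbits):
        """Credit nbits via a (modeled) ioctl; return True if a push is scheduled."""
        self.bits = min(POOL_BITS, self.bits + nbits)
        self.entropy_total += nbits
        if not self.initialized and self.entropy_total > 128:
            self.initialized = True
            self.entropy_total = 0  # assumed: running total restarts on init
        return (self.bits > WRITE_WAKEUP_BITS and self.initialized
                and self.entropy_total >= 2 * READ_WAKEUP_BITS)


def push_scheduled(chunks):
    """Does any credit in this sequence schedule an input -> nonblocking push?"""
    pool = InputPool()
    return any(pool.credit(n) for n in chunks)
```

In this model `push_scheduled([4096])` is False, because the single haveged-style credit both initializes the pool and zeroes its running total. `push_scheduled([512] * 8)` is True, because the second rngd-style chunk finds an initialized, nearly full pool. That matches the experiment fungi suggests: have haveged issue more than one ioctl.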
jeblairbkero: credit_entropy_bits has both the part i just described as well as the initialization threshold ("> 128")22:33
*** annegentle has joined #openstack-infra22:34
*** tqtran has quit IRC22:36
openstackgerritVarun Gadiraju proposed openstack-infra/project-config: Step 1 patch to project-config from bug #1609573  https://review.openstack.org/35434422:36
openstackbug 1609573 in Ironic "Ironic gate jobs should not pass configs through devstack-gate when possible" [Undecided,New] https://launchpad.net/bugs/1609573 - Assigned to Varun Gadiraju (varun-gadiraju)22:36
*** adrian_otto has quit IRC22:38
*** tqtran has joined #openstack-infra22:39
*** dimtruck is now known as zz_dimtruck22:39
*** gouthamr has joined #openstack-infra22:39
*** gouthamr_ has joined #openstack-infra22:40
*** matrohon has quit IRC22:40
*** tqtran has quit IRC22:43
*** gouthamr has quit IRC22:44
craigeo/22:44
*** gouthamr_ is now known as gouthamr22:44
*** yamamoto has joined #openstack-infra22:46
openstackgerritSai Sindhur Malleni proposed openstack-infra/project-config: Adding Ansible jobs for Browbeat  https://review.openstack.org/35607322:47
*** Apsu has joined #openstack-infra22:48
pabelangerjeblair: great info, thanks22:48
*** thorst_ is now known as thorst22:49
*** yamahata has joined #openstack-infra22:51
bkerojeblair: could always ping mpm on freenode :) he wrote the code22:51
*** burgerk has quit IRC22:51
bkerojeblair: Could it be initialized, but maybe prandom_reseed_late() is being set too high?22:53
bkerojeblair: I'm curious what prandom_seed_full_state() and prandom_bytes_state() would return22:54
jeblairbkero: well, we see the "random: nonblocking pool is initialized" line in the logs right around 63 seconds, then urandom starts working.22:54
*** hockeynut has quit IRC22:55
bkerojeblair: That print happens after the timer is set22:55
bkeroI'm tracing prandom_reseed_late(); in random.c line 682 in v4.722:55
jeblairbkero: oh!  look at 4.422:56
jeblairbkero: all this gets way better after https://git.kernel.org/cgit/linux/kernel/git/tytso/random.git/commit/?h=dev&id=e192be9d9a30555aae2ca1dc3aad37cba484cd4a22:56
*** vhosakot has quit IRC22:56
jeblairbkero: but that's not what we're running :(22:56
bkerojeblair: Still the same in 4.422:56
bkeroline 68122:56
jeblairbkero: that's the part where it initializes the urandom rng, right?22:57
bkerojeblair: despite the name it looks like it actually seeds for the first time, so seed + reseed22:58
jeblairbkero: you were asking if it could be initialized -- but we don't see the initialized line until 60+ seconds in (when the pull timer has expired)22:58
jeblairso prandom_reseed_late isn't going to be called until then22:59
bkerojeblair: I'm assuming this thing spins with the wake_up_all()  on line 683 until rng is initialized, then prints out the message22:59
jeblairbkero: i don't think this function spins at all23:00
bkeroThat credit_entropy_bits codeblock section does: prandom_reseed_late(), process_random_ready_list(), wake_up_all(), then prints the message23:00
*** spzala has quit IRC23:01
bkero__prandom_reseed has a spinlock23:01
*** spzala has joined #openstack-infra23:01
jeblairbkero: that only happens after nonblocking pool gets 128 bits23:01
bkeroYeah, I'm assuming that's happened. Maybe that's a false assumption.23:02
jeblairbkero: it hasn't happened, because the only thing that can feed the nonblocking pool is entropy from interrupts (~ one bit per second) or a transfer from the input pool.23:03
bkeroI'd think the timers would be adding it too. http://lxr.free-electrons.com/source/drivers/char/random.c?v=4.4#L80423:04
*** asettle has joined #openstack-infra23:05
openstackgerritMatthew Treinish proposed openstack-infra/devstack-gate: SUPER WIP: Use new tempest run workflow  https://review.openstack.org/35566623:05
jeblairbkero: not in practice on this system.  but theoretically yes.23:05
*** xarses has quit IRC23:05
*** hongbin has quit IRC23:05
*** tqtran has joined #openstack-infra23:05
*** spzala has quit IRC23:06
cloudnulljeblair fungi: i did some more tests using trusty and the traceroute issues using vanilla 14.04 -- I built 127 vms, passed user data to them to install and use traceroute(6) and from the looks of it, it all works.23:06
cloudnullVMS: http://cdn.pasteraw.com/rdq27ar1tcxag4zjal72vciufu2r28c23:06
cloudnulloops, that console data show the traceroute23:07
cloudnullVMS http://cdn.pasteraw.com/7pgwxkqhmfvzc8pz4y3uwk9y6v6z58b23:07
cloudnullall using trusty23:07
cloudnullmtreinish: -cc ^23:07
*** Hal has quit IRC23:08
*** tpsilva has quit IRC23:08
cloudnullsimple userdata passed in http://cdn.pasteraw.com/7n0b3yeculm5zl5w4g5y5indfn4izhx23:08
*** rbrndt has quit IRC23:09
*** asettle has quit IRC23:09
*** Hal has joined #openstack-infra23:09
cloudnullI also made sure all of the VMs were built on different compute nodes.23:09
fungicloudnull: yeah, this could be something odd with our image. i'm starting to dig around with tcpdump23:09
cloudnullI have another battery of tests to run using our various AZs and other networks just to make sure everything is on the up and up, but I'm kinda at a loss... :'(23:11
bkerojeblair: have you measured how many interrupts are being thrown at the boot of the system? Maybe it's not that many.23:11
*** xarses has joined #openstack-infra23:13
cloudnullfungi: this is the image i've been using http://cdn.pasteraw.com/eey9gn9gggaxyimu64dd79xwu0vjble23:14
jeblairbkero: it's about 1 per second23:14
*** xyang1 has quit IRC23:14
bkerojeblair: If the only entropy source is interrupts, and add_interrupt_randomness triggered for each, that would only add 60 bits of randomness per minute23:15
bkeroadd_interrupt_randomness() sets credit=0, and calls credit_entropy_bits(r, credit + 1). Since credit is never set except for on seed generators (PPC only) it's always 1.23:16
bkerowhere 1 = 1 bit23:17
jeblairbkero: yep.  i'd expect it to initialize after 128 seconds.  in practice, i saw it initialize after 90 seconds with no help.  i can't immediately account for the 30 second discrepancy, but could reboot with both rng and haveged disabled to find out if it might be important.23:17
*** thorst has quit IRC23:17
cloudnullI've got to relocate home. bbl23:17
*** thorst has joined #openstack-infra23:17
bkerojeblair: I'm curious why add_timer_randomness() isn't working too23:17
bkeroNO_HZ?23:18
bkeroMaybe try nohz=off in the cmdline?23:18
fungicloudnull: fyi, our current suspect image is b9cb5844-82a6-4034-9d09-d651ec019c7b23:18
jeblairbkero: possibly state->dont_count_entropy is true?23:19
jeblairbkero: i only instrumented the entropy credit, not the mix_pool_bytes func23:19
*** tqtran has quit IRC23:19
jeblairbkero: so i know that it's not crediting, but i don't know whether the add_timer_randomness func is being called23:20
jeblairbkero: here's the most recent boot: http://paste.openstack.org/show/558614/  this is with rng-tools rngd adding entropy starting around 19s23:21
jeblairfungi, pabelanger, bkero: i strongly suspect the difference between rngd and haveged is that rngd writes data in smaller chunks which allows the overflow routine to happen to push the nonblocking pool over the limit and initialize23:24
*** adriant has joined #openstack-infra23:24
bkerojeblair: I think it would help a lot to print which entropy pool was being credited23:24
bkeroCan just print the memory address, there should only be a few.23:24
jeblairbkero: credit pool nonblocking nbits 123:24
jeblairbkero: nonblocking is a variable sub there, it's the name of the pool23:25
bkeroHmm, what is entropy_count?23:25
jeblair"credit from interrupt / credit pool nonblocking nbits 1 / credit entropy_count 6 / credit entropy_total 64" are all one event, and that's the order to read them in23:26
*** thorst has quit IRC23:26
bkeroAug 16 23:06:32 ubuntu kernel: [    2.026124] random: write nonblocking pool 512 <-- doesn't look like that's getting credited.23:26
bkeroI'm betting that's systemd's seed thing23:27
jeblairbkero: entropy_count is the value of that local variable before "if (unlikely(entropy_count < 0)) {"23:27
*** fguillot has joined #openstack-infra23:27
jeblairbkero: that's it exactly23:27
jeblairbkero: that's a write to /dev/random rather than an ioctl, so it's added but not credited23:27
openstackgerritAbhishek Raut proposed openstack-infra/project-config: Use python-db-jobs for tap-as-a-service  https://review.openstack.org/35567023:28
bkerohrm23:28
bkeroThat sounds like a bug23:28
bkeroOr maybe they don't want to credit userspace additions as a matter of security23:28
bkeroIf that were the case I'd hope they would leave a comment though.23:28
jeblairbkero: there are some comments that allude to that23:28
jeblairconsidering it's systemd, i think it could have used the ioctl, but anyway, all these bugs are fixed in newer kernels anyway :)23:29
bkerojeblair: RNDADDENTROPY should be crediting it unless write_pool()s return value is 0, but according to your log it's 0.23:31
* bkero reads the systemd source23:32
jeblairbkero: RNDADDENTROPY is the ioctl (haveged and rngd use it).  random_write is the entry point for "cat > /dev/random" which is what systemd does.  those are the write_pool calls that i have wrapped with those debug lines (the ones that print return codes).23:33
*** sarob has joined #openstack-infra23:33
bkeroYeah, systemd v229 just does:                         r = loop_write(random_fd, buf, (size_t) k, false);23:35
bkeroblah23:36
* bkero disappears for a bit23:38
bkerojeblair: Good luck figuring it out :(23:38
jeblairbkero:  thanks :)23:38
jeblairmordred, fungi, pabelanger: rngd doesn't seem to mind if there is no hardware rng.  it prints some error lines and continues.23:38
bkerojeblair: maybe make a systemd unit file to call the ioctl with a few bytes to do things correctly?23:39
*** jklare has quit IRC23:39
*** zz_dimtruck is now known as dimtruck23:40
fungijeblair: yeah, in some cases it may consume from virt-rng i think. i'm not super familiar with what happens if that's not available either23:40
jeblairbkero: yeah, possibly with the help of rngd or haveged23:41
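bkero's idea of a boot-time unit that seeds through the ioctl, rather than through a plain write as systemd does, would need a small helper that fills `struct rand_pool_info` (`int entropy_count` in bits, `int buf_size` in bytes, then the data) and calls `RNDADDENTROPY`. This is a minimal Python sketch under stated assumptions. The request number 0x40085203 is `_IOW('R', 0x03, int[2])` on common Linux ABIs such as x86. The call needs CAP_SYS_ADMIN. `rand_pool_info` and `add_credited_entropy` are hypothetical names. Crediting a saved seed deliberately overrides the kernel's choice not to trust userspace writes, and per the discussion above the credited bits still land in the input pool, so the 60-second transfer gating still applies.

```python
import fcntl
import os
import struct

# _IOW('R', 0x03, int[2]); assumed correct for x86/ARM Linux ABIs.
RNDADDENTROPY = 0x40085203
# CPython's fcntl.ioctl rejects immutable (bytes) arguments over 1024 bytes.
MAX_IOCTL_ARG = 1024


def rand_pool_info(seed: bytes, entropy_bits: int) -> bytes:
    """Pack struct rand_pool_info { int entropy_count; int buf_size; __u32 buf[]; }."""
    if not 0 <= entropy_bits <= 8 * len(seed):
        raise ValueError("cannot credit more bits than the seed contains")
    packed = struct.pack("ii", entropy_bits, len(seed)) + seed
    if len(packed) > MAX_IOCTL_ARG:
        raise ValueError("seed too large for a single ioctl call")
    return packed


def add_credited_entropy(seed: bytes, entropy_bits: int, dev: str = "/dev/random") -> None:
    """Mix seed into the input pool AND credit entropy_bits (needs root)."""
    fd = os.open(dev, os.O_WRONLY)
    try:
        fcntl.ioctl(fd, RNDADDENTROPY, rand_pool_info(seed, entropy_bits))
    finally:
        os.close(fd)
```

Following jeblair's point about smaller chunks, a unit built on this would probably call `add_credited_entropy` several times with modest chunks instead of once with a large one.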
jeblairfungi: do you know how to tell if it's doing that?23:41
fungijeblair: i do not, no23:42
*** csomerville has quit IRC23:42
*** aviau has quit IRC23:43
*** aviau has joined #openstack-infra23:43
jeblairfungi: it seems to behave the same in rax as on osic23:43
*** moravec has quit IRC23:44
*** cody-somerville has joined #openstack-infra23:45
*** tqtran has joined #openstack-infra23:46
fungivirtio-rng i guess23:48
*** zhurong has joined #openstack-infra23:49
*** moravec has joined #openstack-infra23:49
fungiqemu/kvm passthrough... i though xen had something similar23:49
fungithought23:49
*** bswartz has quit IRC23:49
*** annegentle has quit IRC23:50
*** ihrachys has joined #openstack-infra23:50
*** tqtran has quit IRC23:51
*** jerryz has quit IRC23:51
*** tqtran has joined #openstack-infra23:54
*** kbaegis has quit IRC23:55
*** apetrich has quit IRC23:55
*** kbaegis has joined #openstack-infra23:56
mtreinishjeblair: so I'm looking at the zuul snippet you pointed me to, and do you know if there is an example of what the job.arguments dict looks like or just the job object that gets passed to launch()?23:56
mtreinishbecause I'm not exactly sure what I have to work with for adding the node_image to the metadata there23:57
jeblairmtreinish: i may be in a better position to help tomorrow; i don't think i can context switch right now, sorry.23:58
mtreinishjeblair: ok, no worries23:58
*** jklare has joined #openstack-infra23:58
*** amitgandhinz has joined #openstack-infra23:59

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!