Monday, 2016-08-15

openstackgerritCraige McWhirter proposed openstack-infra/puppet-phabricator: Patches Required to Deliver Pholio
*** tonytan4ever has joined #openstack-infra01:42
openstackgerritkyle liu proposed openstack-infra/project-config: Add new project networking-zte
*** nwkarsten has joined #openstack-infra03:04
Jeffrey4l_any guys can review this?
Jeffrey4l_fungi, anteaya this totally blocked kolla-kube project.
*** zhurong has joined #openstack-infra06:16
*** rwsu has joined #openstack-infra06:44
*** skraynev has quit IRC07:24
*** strigazi is now known as strigazi_AFK08:26
*** esikachev has joined #openstack-infra08:33
openstackgerritJuan Antonio Osorio Robles proposed openstack-infra/tripleo-ci: DO NOT MERGE - Periodic test
openstackgerritAlexey Stepanov proposed openstack-infra/project-config: fuel-qa: stable-mu branches for maintenance and stable for upgrades
*** shashank_hegde has quit IRC09:18
openstackgerritAndrea Frittoli proposed openstack-infra/subunit2sql: Fix type in test_attr_list handling
*** yaume has joined #openstack-infra09:22
openstackgerritAndrea Frittoli proposed openstack-infra/subunit2sql: Fix typo in test_attr_list handling
*** savihou has joined #openstack-infra09:24
*** thorst_ has joined #openstack-infra09:45
Jeffrey4l_any guys can review this?
*** yamamoto_ has joined #openstack-infra10:32
DuncanTCan anybody help me understand why the gate-cinder-python27-db-ubuntu-xenial job is marked as failed on please? All the tests seem to list status 'ok'
*** apetrich has joined #openstack-infra12:05
*** roxanaghe has joined #openstack-infra12:18
mordredpleia2: good to know it's hot somewhere - it's gotten chilly here - it's only 83 right now!12:36
openstackgerritEmilien Macchi proposed openstack-infra/tripleo-ci: WIP - Implement undercloud upgrade job - Mitaka -> Newton
openstackgerritThierry Carrez proposed openstack-infra/release-tools: aclmanager: Reuse releasetools.governance code
*** _ari_ has quit IRC13:36
odyssey4meAre the DNS resolvers for nodepool nodes set by something in infra? We're seeing configurations like in failed jobs, and
odyssey4me20-15-16.log in successful ones.13:37
*** _nadya_ has quit IRC13:37
*** amitgandhinz has quit IRC13:37
*** amitgandhinz has joined #openstack-infra13:38
*** yamamoto has joined #openstack-infra13:52
jrollhi, does someone mind looking at a one line project-config change to unbreak ironic stable jobs?
cloudnullit's been fun. now to make it even better.14:15
openstackgerritJesse Pretorius (odyssey4me) proposed openstack-infra/project-config: Implement Swift pypy experimental check
*** admcleod_ has joined #openstack-infra14:35
anteayaDuncanT: thank you14:38
stevemarcan someone help me in getting me added to keystone-release? apparently we're going to be using them for the upcoming release, but i'm not in the group (am in stable release fwiw)14:48
Jeffrey4l_any guys can review this?
*** devkulkarni has quit IRC15:05
*** nwkarst__ has joined #openstack-infra15:19
sbezverkanteaya: done, I posted15:23
*** nwkarste_ has joined #openstack-infra15:30
fungisbezverk: i was working quite a lot through the weekend (much to my wife's annoyance) but it wasn't obvious to me what was going on there either, nor that it was imminently urgent for you. i'm sorry about that, but i also have to say i find your choice of words rather offensive so please keep discussion constructive in here in the future
*** mdrabe has joined #openstack-infra15:35
*** edtubill has joined #openstack-infra15:40
*** elo has joined #openstack-infra15:40
fungiluckily this doesn't feel like a job to me, so i don't mind doing it 60-80 hours a week but we all need to sleep sometime ;)15:43
*** nwkarste_ has quit IRC15:52
fungimordred: everywhere, right15:58
*** nwkarsten has joined #openstack-infra16:03
lennybwznoinsk, ok, I've got it, I will run tempest from virt env16:14
jeblairgreghaynes: if we go with chattr we will have *literally* gone full circle with glean:
*** nwkarste_ has joined #openstack-infra16:26
*** nwkarst__ has joined #openstack-infra16:34
openstackgerritMatthew Bodkin proposed openstack-infra/storyboard-webclient: Make side bar the same length as navbar
openstackgerritAdam Coldrick proposed openstack-infra/storyboard: Make it possible to get worklist/board timeline events via the API
sdaguehow hard would it be to pre cache into the images ? do a pip install upper-constraints.txt in a venv, then delete venv? Then we'd skip a lot of the downloads from the mirror.16:57
anteayaif the social contracts in a given project are such that a contributor offers a patch to project-config to change tests and no one else in the project was aware of the patch, I think the solution is better socialization within the project regarding change, not changing the behaviour of tests to report false status16:58
odyssey4mesdague IIRC you can actually just tell pip to download, not install16:58
mordredso ...16:59
mordredwe started the entire pre-cache/mirror game with doing caching downloads into the images16:59
openstackgerritPaul Belanger proposed openstack-infra/project-config: Add IPv6 DNS support
mordredit has almost never worked like expected16:59
sdaguemordred: because?17:00
pabelangerjeblair: clarkb: cloudnull: ^ Some testing on both ipv4 / ipv6 clouds shows that should work for unbound^17:00
mordredsdague: the reasons are varied and I have forgotten many of them - but it was consistently bad enough that we built mirrors instead17:00
sdaguebecause during a devstack run, we only ever download things once, just through pip's internal cache17:01
clarkbanteaya: definitely. I just know that for many projects in this situation the only reason they fail is they haven't configured tox17:01
sdagueso if that was already populated with "recently" then it would at least relieve pressure17:01
clarkbanteaya: so if we can just get them to do that instead they have tests that work and it's less burden on project-config17:01
jeblairclarkb: that wasn't this situation at all17:02
sdagueupper-constraints changes would still go through17:02
clarkbjeblair: ok17:02
*** nwkars___ has joined #openstack-infra17:03
*** _nadya_ has joined #openstack-infra17:03
sdagueanyway, slightly related to that, we need a timeout bump on novaclient functional tests - - which is how I discovered this mirror constraint17:04
*** nwkarste_ has quit IRC17:04
*** signed8bit_Zzz is now known as signed8bit17:04
openstackgerritRyan Hallisey proposed openstack-infra/project-config: Make the kolla-kubernetes jobs non-voting and experimental
openstackgerritDarragh Bailey proposed openstack-infra/git-review: Refactor Isolated Env to use in unit tests
openstackgerritDarragh Bailey proposed openstack-infra/git-review: Set author and committer explicitly
mordredsdague, jeblair: so - it might be worth re-trying. the pip caching code has gotten much better. and we also have the constraints files - so doing a "pip install -d . -c upper-constraints.txt global-requirements.txt" in the image build might work better now than it did a few years ago17:05
sdaguemordred: yeh, I wouldn't want to do anything more complicated than pip itself17:06
mordredwhen we did it last time, the newer pip download cache had not yet been implemented17:06
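A rough sketch of the download-only pre-cache step mordred describes, assuming a newer pip in which this mode is spelled `pip download` rather than `pip install -d`. The function name and the destination directory are hypothetical; only the two file names come from the discussion. The requirements file is passed with `--requirement` here, which is an assumption about the intent of the quoted command. The function only builds the argument list and does not run pip.

```python
def build_precache_command(dest_dir,
                           constraints="upper-constraints.txt",
                           requirements="global-requirements.txt"):
    """Return the argv for a download-only pip run (hypothetical helper).

    Nothing is installed: pip fetches the pinned distributions into
    dest_dir so a later install can be served from local files.
    """
    return [
        "pip", "download",
        "--dest", dest_dir,              # where the downloaded files land
        "--constraint", constraints,     # pin versions to the constraints file
        "--requirement", requirements,   # the list of packages to fetch
    ]


if __name__ == "__main__":
    # The destination path is an illustration only.
    print(" ".join(build_precache_command("/tmp/pip-precache")))
```

Whether this beats an unsaturated mirror is the open question jeblair raises later; the sketch says nothing about timing.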
clarkbmordred: sdague is that still per user?17:06
sdagueclarkb: yeh17:06
sdagueso just do it as the stack user17:06
jeblairsdague: stack user does not exist17:06
fungioh, right, the last time we tried there was no such thing as a pip cache or a wheelhouse17:06
clarkbwhich doesn't actually exist there17:06
mordredand stack user would not help non-devstack changes17:06
jeblairsdague: jenkins/zuul is the only user17:06
jeblairsdague: so you'd need to sudo move the cache17:07
sdaguemordred: it would not, however devstack changes are probably the biggest consumers17:07
jeblair(which i believe we also did)17:07
mordredjeblair: ++17:07
mordredjeblair: I agree with you17:07
fungipresumably devstack-gate could mv/cp/rsync the cache from ~jenkins to ~stack17:07
odyssey4mecan the cache path be configured in the global pip.conf perhaps?17:07
fungiodyssey4me: not easily since pip wants it writeable17:07
*** Hal has joined #openstack-infra17:08
odyssey4mefungi something like /opt/pip_cache - and just make it writable for anyone/everyone?17:08
*** harlowja has joined #openstack-infra17:08
electrofelixYorikSar: I wonder if you might review the response I left on a while back and see if it's acceptable for you?17:08
mordredyah. it does check ownership17:08
*** kzaitsev_mb has quit IRC17:08
sdagueok, I guess this is why we can't have nice things :)17:09
jeblairmordred, sdague: if someone wants to give that a shot, i'm not opposed.  it will increase our image sizes of course and consume root filesystem space.  it is also probably worth doing a quick test against an unsaturated mirror to find out how much faster we're actually talking about.17:09
sdaguenever mind then17:09
odyssey4meperhaps an extension of z-c then, which can move the folder appropriately and set the appropriate rights?17:09
*** nwkars___ has quit IRC17:09
jeblairoh, well, never mind then17:09
mordredis bandwidth cached on the private network? and if not, is it viable to try to do config to use private network to hit mirror instead of public?17:09
fungithough having devstack-gate rsync ~jenkins/.cache/pip into ~stack/.cache and ~tempest/.cache when it's also rsync'ing git repos from /opt/git to ~stack/new may make sense?17:10
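A minimal sketch of the cache seeding fungi suggests, written with Python's standard library instead of rsync. The function name is hypothetical, and the sketch assumes Python 3.8+ for `dirs_exist_ok`. It does not change ownership of the copied files, which would still be needed since pip checks ownership of its cache directory.

```python
import shutil
from pathlib import Path


def seed_pip_cache(source_home, target_homes):
    """Copy <source_home>/.cache/pip into each target home (hypothetical helper).

    Returns the list of destination paths that were populated.
    Ownership is NOT adjusted here; a chown step is assumed to follow.
    """
    src = Path(source_home) / ".cache" / "pip"
    seeded = []
    if not src.is_dir():
        return seeded  # nothing cached yet, nothing to copy
    for home in target_homes:
        dst = Path(home) / ".cache" / "pip"
        dst.parent.mkdir(parents=True, exist_ok=True)
        # Merge into an existing cache directory rather than failing.
        shutil.copytree(src, dst, dirs_exist_ok=True)
        seeded.append(dst)
    return seeded
```

In the scenario from the channel, `source_home` would be the jenkins home and `target_homes` the stack and tempest homes.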
sdagueso the numbers I've got just by poking is that internap is doing the pip installs in < 1/2 the time of rax-ord17:10
odyssey4memordred that sounds like a nifty idea - it should also kill the L3 interaction which should speed it up17:10
sdagueand I think internap nodes are otherwise slower17:10
clarkbodyssey4me: its still L3ing on private net iirc17:10
sdagueso back of the envelope, we're probably adding 3 - 4 minutes to every rax job because of the bw constriction17:10
clarkbglean gets a list of nets to route through that interface17:11
sdaguerax dsvm job17:11
jeblairsdague: i would like to discount the rax-ord times because the solution to that is easy, get a new server17:11
fungiodyssey4me: mordred: that would also be a fairly rax-centric choice, since we're relying on their rfc-1918 flat net spanning tenants/projects17:11
jeblairsdague: the reason to use a local pip cache, in my mind, is if it's faster than our best-case times on an unsaturated mirror17:11
odyssey4mebah, this is why we can't have nice things :p17:12
mgagnesdague: I don't know about RAX but we have a lower number of instances and therefore nodes dedicated (not shared) for ci infra. At this point, you could be your own noisy neighbours. but I didn't fully read backlog =)17:12
clarkbmordred: rereading the bw details for rax the private net can do 2x the public net17:12
clarkbsince public net can only utilize 50% of total bandwidth allocation17:12
fungimgagne: in this specific case it's rackspace's flavor-based bandwidth rate limits17:12
* anteaya buys many things at the thrift store as she has accepted she can't have nice things17:13
mordredclarkb: this: says there is no charge for traffic on servicenet - but it does not indicate if there are bandwidth caps17:13
clarkband 200mbps is the limit for the 2GB flavor and 50% of that is 100mbps which we are seeing17:13
fungimgagne: the flavor we used for mirror.ord.rax..o.o only gets 100mbps bw, and we're topping out there under load17:13
clarkbmordred: footnote 417:13
jeblairare we seriously thinking that we should try to work around this rather than just launch a new server?17:14
fungiyeah, their "200mbps" is 100mbps egress + 100mbps ingress if memory serves17:14
clarkbjeblair: no I think we should make an 8GB instance with 800mbps17:14
fungijeblair: i think we should just boot a replacement mirror.ord.rax..o.o but i don't personally have time to do it for a few more hours17:14
jeblairclarkb: not a 4g with 400?17:14
mgagnesdague: "I think internap nodes are otherwise slower" are we talking about jobs execution time? (not network) ?17:15
clarkbjeblair: maybe start there and go bigger if necessary17:15
pabelangerfungi: jeblair: I can boot the replacement if needed.17:15
fungii can get to it later today if we settle on a preferred flavor to replace the current one17:15
fungipabelanger: oh, thank you!17:15
odyssey4methe simplest solution is certainly the best, although the creative exercise of looking at alternative solutions is also interesting and can sometimes spawn unrelated ideas17:15
*** sarob has joined #openstack-infra17:16
mordredodyssey4me: agree. in this case, I think it served to underscore why booting a new server is absolutely the right choice17:16
*** vhosakot has joined #openstack-infra17:16
fungias to sdague's other request, pre-warming the new afs cache before putting it into production, i don't think we've done that before. it seems probably doable, but it would also be very quickly self-correcting anyway17:16
*** sarob has quit IRC17:17
clarkbfungi: should be as easy as pip installing constraints against the ip addr of the new host17:17
pabelangerso, performance1-4 or performance1-8? Sounds like that is up for debate currently17:17
*** sarob has joined #openstack-infra17:17
clarkb4GB is fine with me17:17
fungiseems like performance1-4 should be fine17:17
*** _sarob has joined #openstack-infra17:18
fungiwe've only started hitting 100mbps egress a few weeks ago, so doubling that to 200mbps egress should satisfy us for a while (perhaps indefinitely unless we get a quota bump there)17:18
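The flavor arithmetic above can be sketched as follows, using only the figures quoted in the channel (200, 400 and 800 mbps totals, with the public interface limited to 50% of the allocation). The names are hypothetical and the numbers are the participants' recollections, not verified provider limits.

```python
# Share of a flavor's total bandwidth usable on the public interface,
# as recalled in the discussion (an assumption, not a verified figure).
PUBLIC_SHARE = 0.5

# Total bandwidth allocations in mbps as quoted in the channel.
FLAVOR_TOTAL_MBPS = {"2GB": 200, "4GB": 400, "8GB": 800}


def public_egress_mbps(total_mbps):
    """Return the public-interface cap for a given total allocation."""
    return total_mbps * PUBLIC_SHARE


if __name__ == "__main__":
    for flavor, total in FLAVOR_TOTAL_MBPS.items():
        print(flavor, public_egress_mbps(total))
```

On these numbers the current 2GB mirror caps at 100 mbps, which matches the plateau being discussed, and the 4GB flavor doubles that to 200 mbps.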
pabelangerjust ord for now?17:18
mordredI don't even see performance1-4 on the pricing list17:18
clarkbpabelanger: you can probably check the other cacti graphs to see if other instances exhibit the same capped bw behavior17:19
fungithough since we have so much more quota in ord, it's unlikely we're hitting it elsewhere17:20
clarkbdfw and iad don't come close to 100mbps according to cacti17:20
fungibut i agree it deserves being checked17:20
cloudnullmordred: performance.* flavors are now general.* i believe17:21
clarkbOVH and internap look fine too17:21
clarkbso yes, I think just ord for now17:21
* cloudnull assuming you're talking about rax17:21
*** tqtran has joined #openstack-infra17:21
sdaguejeblair: ok, I believe that it is, though you'd have to instrument pip maybe to figure out17:21
sdagueor add up the size of .pip/cache and do some back of the envelope there17:21
*** sarob has quit IRC17:22
mordredcloudnull: yah17:22
clarkbcloudnull: are there bw limits in osic?17:23
fungipabelanger: i just reviewed all our mirrors, and while some ( exceed the volume in rax-ord, none of the graphs besides that one show an envelope indicative of a bandwidth cap getting hit17:24
pabelangerfungi: great, thanks17:24
pabelangernew server launching now17:24
fungioh, though while it sounds like osic is probably fine, it's not in cacti right now17:24
zarofungi: forgot about this one.  identified another duplicate cron job for gerrit git gc
*** ayoung has quit IRC17:25
*** asettle has joined #openstack-infra17:26
cloudnullclarkb: do you want / need bw limits setup? we could do qos'ing via neutron or setup "tc" rule if needed.17:26
*** kzaitsev_mb has joined #openstack-infra17:26
cloudnullbut we're not doing anything as of now17:27
clarkbcloudnull: no I don't think we do :) just double checking we don't need to be aware of that like we have to be in rax17:27
anteayazaro: does root own that cron job?17:27
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Shut down gearman client in tests
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Remove testresources
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Make ZK fixture more robust
zaroanteaya: it looks like it to me.17:29
*** ihrachys has quit IRC17:29
anteayazaro: where are you looking, review-dev?17:29
fungizaro: to anteaya's point, it said user=>'gerrit2' before, and that needs to be retained when doing ensure=>absent17:29
mordredclarkb: have a sec and feel like +A on 355131 there? (it makes tests not be flaky)17:29
*** vhosakot has quit IRC17:30
clarkbmordred: trying to catch up on email but I can take a look17:30
mordredclarkb: email is the worst17:30
zarofungi: ohh right. will fix that.17:30
anteayazaro: fungi the code says owner gerrit2 on line 37417:30
dstufftnew pip cache is awesome, but make sure you have Etags and Cache-Control headers17:30
anteayais that enough for the cron jobs?17:31
dstufftit needs those17:31
fungizaro: however i think we also aren't using that particular cronjob in production as it's wrapped in if (!defined(File[$local_git_dir]))17:31
*** adrian_otto1 has joined #openstack-infra17:31
bkerofungi: So the gerritbot2 work we discussed last week is a bit troublesome. The way that the gerritbot puppet class is made makes it a singleton-per-host. We can switch it from a class to a defined type, but that's going to be nasty to merge -- each bot would have its own init script, logging config, channel config, maybe ssh keys.17:32
mordredjeblair, pabelanger, Shrews, DuncanT: the presumptive fix for the ansible async issue has been merged upstream17:32
bkeroAlternatively we might be able to run it on a different host without modifying it.17:32
mordredof course, for us to pick it up, we'll need to go back to running from git instead of a release17:32
anteayamordred: wonderful17:32
zarofungi: local_git_dir is the local replication correct?17:32
*** tqtran has joined #openstack-infra17:32
fungibkero: shouldn't need separate ssh keys, but it will need separate versions of the rest of that yes. i figured a lot of it would have to become erb templates17:33
jeblairmordred: should we look into running it locally like we did before?17:33
zaroaren't we replicating to local_git_dir on review.o.o?17:33
bkerofungi: Yeah, I have a ~200 line patch to do that17:33
bkeroI just don't know how it's ever going to get merged17:33
mordredit has been applied to the stable branch as well - so we could just run off of the upstream stable branch instead of Shrews branch which is based off of tip of devel17:33
mordredjeblair: ^^17:33
fungizaro: yes, so that's saying if the local replication directory is not defined then add this cron resource. but as we have local replication set up that file resource already exists so the cron resource never gets added17:34
DuncanTmordred: thanks for the update17:34
*** vhosakot has joined #openstack-infra17:34
fungizaro: i don't know why it's written that way (looks to me like someone put a } in the wrong place, but there are no comments explaining so maybe it's intentional and i'm just not able to come up with the reasoning)17:34
*** sambetts is now known as sambetts|afk17:35
*** adrian_otto has quit IRC17:35
jeblairpabelanger: can you refresh ?17:36
*** dprince has joined #openstack-infra17:36
jeblairmordred, pabelanger: let's land that, then we can manually install the ansible upstream stable branch and restart launchers to pick up both changes17:36
pabelangerjeblair: looking17:38
mordredjeblair: agree17:38
openstackgerritMerged openstack-infra/jenkins-job-builder: Fix link to findbugs minimal example
mordredjeblair: it is confirmed to be in stable-2.1 branch17:38
openstackgerritMerged openstack-infra/jenkins-job-builder: Update HTML Publisher plugin to use convert xml
openstackgerritPaul Belanger proposed openstack-infra/zuul: Simplify zuul_console port binding logic
pabelangerjeblair: updated per your comments17:40
*** ayoung has joined #openstack-infra17:41
clarkbjeblair: mordred for 355131 I wonder if we can tell it to bind on port 0 then get the actual port back sanely (using /proc maybe?)17:42
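A small sketch of the "bind on port 0" idea: the kernel picks a free port and the process can read it back from the socket itself with `getsockname()`, without consulting /proc. The function name is hypothetical, and whether this fits the zuul_console change under review is not established here.

```python
import socket


def bind_ephemeral(host="127.0.0.1"):
    """Bind a TCP socket to a kernel-chosen port and report that port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))          # port 0 asks the OS for any free port
    sock.listen(1)
    port = sock.getsockname()[1]  # the port the OS actually assigned
    return sock, port


if __name__ == "__main__":
    server, chosen = bind_ephemeral()
    print("listening on port", chosen)
    server.close()
```

This only helps the process that owns the socket; another process that needs the port number would still have to be told it somehow.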
*** tonytan4ever has joined #openstack-infra17:45
zarofungi: just took a closer look and it seems to me that whole section is just duplicating cron.pp in puppet-gerrit.  i think it should be completely removed17:46
*** rbrndt has quit IRC17:46
fungizaro: i agree. i think it's vestigial dead code17:46
pabelangerclarkb: jeblair: fungi: mordred: Can we land so we can use non-root permissions for
fungipabelanger: was that the only missing piece?17:48
*** raunak has joined #openstack-infra17:48
pabelangerfungi: I believe so17:48
clarkbpabelanger: fungi it will also update the ansible cache iirc. So that needs to be writeable too17:48
zarofungi: i'm surprised that puppet lint didn't pick up that missing }17:48
pabelangerclarkb: ah, yes.17:49
Shrewsmordred: jeblair: so, this is new in stable-2.1 ( but i don't immediately see any issues with it. just FYI17:49
pabelangerOS_CLOUD=openstackci-rax OS_REGION=ORD openstack server list is not returning servers from ORD, but DFW17:50
fungizaro: it's not missing, just several resources after the file resource. in retrospect, i think that was probably added when we moved local mirror handling to the gerrit module and just never cleaned up after17:50
Shrewsit also has the fix for the temp dir race17:50
pabelangerI don't know why atm17:50
mordredpabelanger: on puppetmaster?17:50
pabelangermordred: yes17:50
Shrewsjeblair: i think you found that one ^^^^ (re: tmp dir race)17:50
mordredpabelanger: looking17:50
clarkbI always use the openstack flags not the env vars17:50
openstackgerritJeremy Stanley proposed openstack-infra/system-config: Add mirror.regionone.osic-cloud1.o.o to cacti
mordredpabelanger: OS_REGION_NAME17:51
mordrednot OS_REGION17:51
pabelangerlaunch/README is wrong17:51
pabelangermordred: thanks17:51
fungipabelanger: launch/README isn't "wrong" per se. it's just using different envvars in its example shell script than what openstackclient would use17:53
jeblairclarkb: i'm not sure -- i didn't think to ask proc.  however, i'm just about convinced that with the zookeeper chroot option, we can drop the per-test fixture and just expect a locally running zk...17:54
fungiit's not passing those to osc17:54
pabelangerfungi: Ah, right. That explains it17:55
beaglespabelanger, got some weird stuff happening in some puppet-neutron CI for mitaka where a bunch of ubuntu jobs are failing (see
beaglespabelanger, who should I bug about that? :)17:56
*** _nadya_ has joined #openstack-infra17:56
*** _nadya_ has quit IRC17:57
*** Sukhdev has quit IRC17:58
anteayabeagles: EmilienM is the ptl for puppet-openstacklib:
anteayahe might be able to help17:59
beaglesanteaya, actually thanks for the correction - that's puppet-openstacklib17:59
fungibeagles: those look like they all hit a one-hour timeout running in osic, which i believe is related to the ipv6 dns discussion which was going on in here earlier17:59
beaglesanteaya, I was sent in this direction17:59
pabelangerasync task produced unparseable results17:59
beaglespabelanger, yup18:00
pabelangerlooks like ansible is failing18:00
pabelangerI think we are working on patching zuul18:00
mordredpabelanger: we just landed a patch upstream for that18:00
pabelangermordred: ++18:00
sdaguejeblair: as a data point, with a primed cache on my NUC the pip_install time is 72s. So even in our best cases, my guess is that 2/3rds of the pip install time is spent on network18:00
mordredand will roll out the fix to infra at the same time as your other patch18:00
fungiwere the job timeouts in osic directly related to the ansible json parsing errors?18:00
sdaguebasically we've got a fixed cost of ~ 1 minute to install for dsvm runs, and 2 - 6 minutes of network time18:01
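sdague's estimate can be checked with a little arithmetic. The sketch below takes the roughly one minute fixed cost and the 2 to 6 minute network range from the messages above; the function name is hypothetical and the inputs are estimates from the channel, not measurements.

```python
def network_fraction(fixed_seconds, network_seconds):
    """Fraction of total pip install time that is spent on the network."""
    total = fixed_seconds + network_seconds
    return network_seconds / total


if __name__ == "__main__":
    fixed = 60  # the "~1 minute" fixed install cost
    for network_minutes in (2, 6):
        share = network_fraction(fixed, network_minutes * 60)
        print(network_minutes, "min of network time:", round(share, 2))
```

With 2 minutes of network time the share is two thirds, in line with the best-case guess; at 6 minutes it rises to six sevenths.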
pabelangermordred: great18:01
pabelangerbeagles: sounds like fix is in progress18:01
beaglespabelanger, thanks man!18:01
openstackgerritDarragh Bailey proposed openstack-infra/jenkins-job-builder: Support lazy resolving of include yaml tags
openstackgerritKhai Do proposed openstack-infra/system-config: Remove duplicate code to setup gerrit local replication
openstackgerritBen Kero proposed openstack-infra/puppet-gerritbot: Refactor bot into defined types to allow multiple bots
bkerogreghaynes: ^18:08
bkerofungi: ^18:08
bkeroThat's also going to need a transition plan :/18:08
jeblairbkero: quick thought experiment -- how hard to make gerritbot support 2 connections?18:09
openstackgerritHenry Gessau proposed openstack-infra/project-config: Use python-db-jobs for networking-sfc
bkerojeblair: the gerritbot project itself? I have no idea, never looked at the source18:10
*** ihrachys has joined #openstack-infra18:13
bkerojeblair: You'd have to do some multiprocess/threaded python, since these just run/spin by themselves:
*** nwkarsten has joined #openstack-infra18:15
jeblairbkero: yeah, i'd imagine it would just end up looking a lot like running 2 bots inside of one process.  running 2 processes is probably the better way, just wanted to throw that out there in case it looked too gnarly18:15
*** vhosakot has joined #openstack-infra18:15
bkerojeblair: It's going to look gnarly either way. The easiest way would be to run on a different host.18:16
openstackgerritDarragh Bailey proposed openstack-infra/jenkins-job-builder: Allow using lockfile per jenkins master
bkerobut I'm sure that's also fraught with inheritance nightmares18:16
*** e0ne has joined #openstack-infra18:17
jeblairit also has other drawbacks :)18:18
openstackgerritDarragh Bailey proposed openstack-infra/jenkins-job-builder: Output additional info when exceptions occur
*** Apoorva_ has joined #openstack-infra18:20
*** inc0 has joined #openstack-infra18:21
greghaynesbkero: nice18:21
openstackgerritDarragh Bailey proposed openstack-infra/jenkins-job-builder: Refactor base test classes inheritance for reuse
sdaguecould I get some reviews on to increase timeouts on novaclient jobs?18:24
*** Apoorva has quit IRC18:24
*** vhosakot_ has joined #openstack-infra18:24
openstackgerritBen Kero proposed openstack-infra/puppet-gerritbot: Refactor bot into defined types to allow multiple bots
openstackgerritDarragh Bailey proposed openstack-infra/jenkins-job-builder: Improve logger output for expanding templates
*** vhosakot has quit IRC18:25
*** xyang1 has joined #openstack-infra18:26
beaglespabelanger, mordred: what should I be watching for a heads up that the expected fixes are in?18:27
*** senk_ has quit IRC18:27
mordredbeagles: we'll just ping you18:28
beaglesmordred, thanks!18:28
*** apetrich has joined #openstack-infra18:29
*** csomerville has joined #openstack-infra18:30
*** vhosakot_ has quit IRC18:33
*** ayoung has quit IRC18:37
clarkbsdague: is that related to the pip bw thing? where are we spending the other 40 minutes?18:37
rajinirGate seems to be broken. No hosts found to map to cell, exiting. Any ETA?18:37
sdagueclarkb: well, there is 7 minutes not in any log files before setup workspace, no idea why18:38
sdagueclarkb: but regardless, we've been pushing up towards our time allotment18:38
clarkbrajinir: is there more context for that? like a log file?18:38
clarkbsdague: ya I am ok with bumping it just want to make sure we don't focus on 4 minutes of extra pip time when we have 40 minutes of setup elsewhere that may actually be the problem18:39
sdagueand we need to land code before freeze otherwise basically the nova cli will stop working18:39
sdagueclarkb: well, that also impacts dpkg installs18:39
pabelangerrajinir: where did you see that?18:39
clarkbsdague: those are cached though18:39
*** tqtran has quit IRC18:39
sdagueclarkb: ok, the biggest mystery to me right now is the missing 7 minutes here -
sdaguebecause the setupworkspace first log entry is at 27 and change18:40
*** asettle has quit IRC18:41
*** jimbaker has quit IRC18:41
rajinirpabelanger: I was watching the gate on my thirdparty CI18:42
clarkblooks like we lost time due to ntp18:43
* clarkb grumps that ntp isn't more sane18:43
jeblairclarkb: how can you tell?18:44
clarkbjeblair: due to that line I am assuming that the logs don't jump forward in time due to a time update but instead actually took that long18:45
jeblairclarkb: so you're thinking because it said it failed to sync earlier, it jumped later?18:46
*** jimbaker has quit IRC18:46
pabelangerrajinir: I cannot comment on that, but the gate is not broken. As other projects are passing properly18:46
*** Apoorva_ has quit IRC18:46
clarkbjeblair: ya thats one possibility18:46
*** karthik__ has joined #openstack-infra18:46
*** Apoorva has joined #openstack-infra18:47
*** vhosakot has joined #openstack-infra18:47
jeblairwhat's the 10 minutes before ntp-wait?18:47
clarkbjeblair: I think that's the 10 minutes of ntp-wait waiting18:47
rajinirpabelanger: On the ironic channel, a couple of folks are also seeing it18:47
jeblairclarkb: oh, all output at the end18:47
*** vhosakot has joined #openstack-infra18:48
*** spzala has quit IRC18:48
*** spzala has joined #openstack-infra18:48
clarkbrajinir: pabelanger I don't see an error in that paste either? looks like just debug logs?18:48
*** tqtran has joined #openstack-infra18:49
jeblairclarkb, sdague: ianw and pabelanger have been looking into ntp issues18:49
clarkbrajinir: it looks like it is trying to configure cells but the config doesn't exist. You might just be able to run without cells?18:51
*** spzala has quit IRC18:51
*** hockeynut has quit IRC18:51
*** spzala has joined #openstack-infra18:52
*** bstinson has quit IRC18:53
mordredclarkb: while we're looking at timing things - this one might be related to bandwidth caps and stuff ...18:53
mordredclarkb: but:
*** bstinson has joined #openstack-infra18:54
*** javeriak has joined #openstack-infra18:54
mordredclarkb: if you scan the log, it looks like every time that tries to touch git.o.o it takes 4 minutes18:54
clarkbI doubt that is related to ntp if it happens more than once. Probably something related to the git mirrors and/or networking and/or git18:55
mordredit's always a remote update  - and it's always roughly 4 minutes18:55
sdagueclarkb: I don't think it's ntp18:56
fungibandwidth utilization on git.o.o seems to be nearing/reaching 400mbps egress traffic at times
sdaguesyslog has regular logging through the whole window18:56
sdagueansible is doing something that's not logging18:56
sdagueoh, it's the filesystem rebuilds18:57
sdaguedo we still need to do that on nodes?18:57
fungiwhat's the bw cap for rax's 30gb performance flavor?18:57
*** itisha has joined #openstack-infra18:58
clarkbsdague: we do need swap, and the / is tiny iirc so likely yes we need to make /opt large there18:58
sdagueclarkb: ok, that takes 7 minutes18:58
rajinirclarkb: This could be something to do with ironic plugin. Discussion happening in ironic channel to revert. thanks18:58
jeblairperhaps it's copying the git repos to the new device that is slow?18:59
openstackgerritEddie Ramirez proposed openstack-infra/project-config: Add craton-dashboard repository (Horizon Plugin)
sdaguejeblair: no, this is the mkfs18:59
clarkbjeblair: looking at sdague's log links it is the mkfs18:59
clarkbsince it doesn't mount until 7 minutes later18:59
mordredI concur with sdague18:59
sdagueand there are no other logs in this window18:59
clarkbit is possible we want to not utilize the full disk there and make a smaller but large enough fs18:59
*** tqtran has quit IRC18:59
*** e0ne has quit IRC18:59
jeblairthat is a very long mkfs18:59
sdagueI agree, that seems super long18:59
fungicould try -E lazy_itable_init ?19:00
jeblairwhere's the mkfs command?19:00
jeblairfungi: rxtx_factor is 2500.019:00
clarkboh wait it mounts twice19:00
clarkbthe first mount is fast so I don't think it is the mkfs19:00
openstackgerritEddie Ramirez proposed openstack-infra/project-config: Add craton-dashboard repository (Horizon Plugin)
mordredactually - it seems to be the mount19:00
mordredyah - what clarkb said19:00
clarkbjeblair: I think you are right, it mounts first in other location, copies, then changes mount19:00
clarkbthe copy being the slow bit?19:00
fungijeblair: okay, so we're nowhere near the bw cap there i guess19:01
mordredhow horrible would it be to do this dance in the ready-script rather than in d-g?19:01
jeblairfungi: i forget the rax math needed to get to 'upstream bandwidth' though19:01
clarkbjeblair: fungi its divide that number by 2 and thats your mbps iirc19:02
mordredjeblair: bother19:02
clarkbso 1250mbps for public interface19:02
jeblairclarkb: that means we have 200mbit for our 2gb mirror?19:02
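The rule of thumb clarkb quotes above (divide rxtx_factor by 2 to get Mbps) can be sketched as follows. This is only an illustration of the arithmetic in the discussion, not an official Rackspace formula; the function name is hypothetical, and the rxtx_factor of 400.0 for the 2GB mirror is an assumption inferred from the 200mbit figure, not a value stated in the channel.

```python
def rax_public_mbps(rxtx_factor):
    """Rule of thumb from the discussion: public Mbps is rxtx_factor / 2."""
    return rxtx_factor / 2.0


# 30GB performance flavor, rxtx_factor quoted by jeblair
print(rax_public_mbps(2500.0))

# 2GB mirror flavor; 400.0 is an assumed rxtx_factor, back-computed from 200mbit
print(rax_public_mbps(400.0))
```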
*** _sarob has quit IRC19:02
sdagueclarkb: yeh, with the 2 mounts I agree19:02
mordredfungi, clarkb: the "disk" optimized flavors at rackspace have a much higher bandwidth number19:02
sdaguethis is the find / copy19:02
*** psachin has quit IRC19:03
sdague - is the line that seems to take ~7 minutes19:03
clarkbjeblair: hrm ya it should be 200mbps but thats not what we are seeing there. Weird.19:03
mordredscuse me - "I/O Optimized"19:03
fungiso i'm guessing the umount is flushing the write cache19:03
fungihow about we mv the contents of /opt somewhere else on the rootfs, mount the ephemeral disk at /opt, then mv the files into it?19:04
mordredoh - but nevermind- those have huge amounts of cpu and are way more pricey - just a bigger general would meet expanded needs much simpler19:04
fungithen we don't umount and mount it again19:04
jeblairmordred: not always -- io1-30==performance2-30==2500.019:04
*** sarob has quit IRC19:04
mordredjeblair: yah - sorry, I was looking at the first table entry and missing the fact that it was a 15G instance19:05
clarkbfungi: would be easy enough to push a patch that does that and compare times19:05
mordredthat seems like a mildly strange definition of the smallest "I/O Optimized" flavor19:05
clarkbalso need to figure out why ntp-wait is so cranky19:05
mordredclarkb: sync with ianw/pabelanger on that19:05
sdaguefungi: ok, while that is going on, anyone want to +A - so we can make forward progress with novaclient? :)19:05
mordredthere was a bunch of stuff on that topic towards the end of last week19:05
jeblairmordred: i already mentioned that :)19:06
fungiclarkb: same-fs mv should be atomic and basically instantaneous, so i expect it's a performance improvement to not umount and mount again regardless... just a question of how much19:06
mordredjeblair: yup - it's just been chatty so didn't want clarkb to miss it :)19:06
jeblairclarkb, mordred: some more background reading: bug 1361382 in ntp "ntp-wait hangs after boot for a long time, unless ntpd is restarted" [Unspecified,Closed: notabug] - Assigned to mlichvar19:06
jeblairsdague: ^19:06
*** edtubill has joined #openstack-infra19:08
*** asselin_ has joined #openstack-infra19:08
jeblairsdague, clarkb, fungi: is it the case that we need to move the data off of / in order to free up space there for all the installs?19:08
*** sarob has joined #openstack-infra19:09
clarkbjeblair: yes I think so19:09
openstackgerritScott DAngelo proposed openstack-infra/project-config: Add experimental Cinder job for multibackend
fungijeblair: right, that's why we mv rather than cp19:09
clarkbjeblair: VMs and mysql and friends all need disk19:09
sdagueright, but wasn't a bunch of that for hp pathological flavors?19:10
*** Swami has joined #openstack-infra19:10
clarkbreading this seems like we could use ntpd -qg ?19:10
* fungi wishes's ntpd worked like openntpd at startup19:10
clarkbsdague: rax and hp were basically the same19:10
jeblairmordred: when i clone python-aodhclient locally from our git mirrors, it takes 1 second; so i don't know what's taking 4 minutes in that job you linked.19:10
clarkbsdague: tiny / huge ephemeral disk19:10
*** asselin has quit IRC19:10
mordredjeblair: me either - it was the consistency of it across multiple invocations that had me the most concerned19:10
fungiclarkb: i'd have to reread, but it sounds like ntpd -qg can still take 10+ minutes to stabilize19:11
clarkbfungi: -g says "This option allows the time to be set to any value without restriction"19:11
sdagueactually, I'm super confused, that log line is here - ?19:11
sdagueis ansible just buffering this whole thing and throwing away all the useful timestamp info?19:12
*** tqtran has joined #openstack-infra19:12
clarkbsdague: no I think we do that timestamping outside of ansible19:13
clarkbsdague: with the tooling pulled out of devstack19:13
sdaguewell, those mount timestamps don't line up with the ones in syslog19:13
sdagueand would state that the mv took 0.003s19:13
openstackgerritgreghaynes proposed openstack/diskimage-builder: Clarify OVERWRITE_OLD_IMAGE docs
jeblairi have to run to lunch now. bbl19:14
clarkbit wouldn't surprise me if it is a buffering issue in the timestamping, just not related to ansible I don't think19:14
*** elo has quit IRC19:14
openstackgerritgreghaynes proposed openstack/diskimage-builder: Clarify OVERWRITE_OLD_IMAGE docs
openstackgerritMerged openstack-infra/project-config: increase novaclient functional timeout.
*** asettle has joined #openstack-infra19:16
*** devkulkarni has joined #openstack-infra19:17
*** devkulkarni1 has quit IRC19:17
sdagueok, well, for right now, we need to get these novaclient bits sorted. So I'm going to switch gears back over to that.19:18
clarkbfungi: looks like we could also just start ntpd with the -g flag19:19
*** asettle has quit IRC19:19
fungiclarkb: yep, i'm trying to see if i can figure out why that's not configurable for the initscript/systemd unit19:19
fungibecause if that were generally useful, you'd think it would be a startup option19:20
clarkbntpd -g for the normal daemon should work if we start within +/-68 years of current time from my reading of docs19:21
clarkb1970 is only 46 years ago so we should be fine even if we start at the epoch19:21
*** Sukhdev has joined #openstack-infra19:25
clarkbfungi: my tumbleweed system uses -g, but it also has a force set option that will run sntp first19:25
clarkbso -g may not be sufficient?19:25
clarkbfungi: trusty has it set to -g in /etc/default ntp too19:27
clarkbbut trusty has no sntp option19:27
*** asettle has joined #openstack-infra19:29
*** signed8bit is now known as signed8bit_Zzz19:30
*** asettle has quit IRC19:31
*** adrian_otto1 has quit IRC19:31
*** sean-k-mooney has joined #openstack-infra19:33
*** oanson has quit IRC19:34
fungiclarkb: indeed, my debian systems have NTPD_OPTS='-g' too19:36
clarkbfungi: I think we could do a quick survey of our images and see if they use -g by default and if they do try removing any ntp machinery from our jobs?19:38
fungi"When the initial offset is larger than 0.128s, ntpd will step the clock and then it will wait for at least 900 seconds (in default configuration) before it reports it's in the synchronized state."19:38
clarkbthe ntp machinery in our jobs was there to calculate job timeouts, but since ntpd can't skew things drastically those timeouts shouldn't be terribly affected and the -g should get us fairly close19:38
openstackgerritMatt Riedemann proposed openstack-infra/elastic-recheck: Add query for cells v2 setup bug 1613417
openstackbug 1613417 in devstack "gate-tempest-dsvm-cells broken with cell v2 setup: "No hosts found to map to cell, exiting."" [Undecided,In progress]
clarkbI think we would have to worry about scheduling jobs on instances fast enough that -g isn't done doing its thing but I don't expect it to take a ton of time since it's supposed to ignore all those pesky limits19:40
fungii'm hunting for code or documentation to back up the assertion in that bug report19:41
*** jimbaker has quit IRC19:41
*** tonytan4ever has quit IRC19:41
fungithe implication is that -g will avoid ntpd freaking out and exiting if the initial offset is significant, but won't actually cause it to synchronize to that new time any faster19:43
*** amitgandhinz has quit IRC19:43
*** amitgandhinz has joined #openstack-infra19:44
clarkbwe could stop ntpd, run sntp, start ntpd19:45
clarkbwhich is similar to how the old ntpdate stuff worked19:45
*** jimbaker has joined #openstack-infra19:45
*** jimbaker has quit IRC19:45
*** jimbaker has joined #openstack-infra19:45
*** kzaitsev_mb has joined #openstack-infra19:46
fungi"Under conditions of extreme network congestion, the roundtrip delay jitter can exceed three seconds and the synchronization distance, which is equal to one-half the roundtrip delay plus error budget terms, can become very large. The ntpd algorithms discard sample offsets exceeding 128 ms, unless the interval during which no sample offset is less than 128 ms exceeds 900s. The first sample after that,19:46
fungino matter what the offset, steps the clock to the indicated time."19:46
fungiso i think that means that even at start, if the local time is off by more than 128ms, ntpd won't actually synchronize the clock for 900s19:47
clarkbwhich is certainly long enough to race job starts19:47
fungiand -g simply keeps ntpd from freaking out at startup if that >128ms skew is large enough to be >1000s19:48
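The reading of ntpd's startup behaviour that fungi and clarkb arrive at above can be summarised as a toy model. This is a sketch of the discussion's interpretation only, not ntpd's actual algorithm; the function name and the return values are hypothetical, and the thresholds are the 128 ms, 900 s and 1000 s figures quoted in the channel.

```python
STEP_THRESHOLD_S = 0.128   # offsets above this are stepped rather than slewed
STEPOUT_S = 900            # wait quoted in the discussion (default configuration)
PANIC_THRESHOLD_S = 1000   # above this ntpd gives up unless started with -g


def ntpd_startup_behavior(initial_offset_s, dash_g=False):
    """Return (action, seconds before sync is reported) for a given offset."""
    offset = abs(initial_offset_s)
    if offset > PANIC_THRESHOLD_S and not dash_g:
        # Without -g a very large offset makes ntpd exit instead of syncing.
        return ("exit", None)
    if offset > STEP_THRESHOLD_S:
        # -g only permits the step; it does not shorten the wait.
        return ("step", STEPOUT_S)
    return ("slew", 0)


print(ntpd_startup_behavior(5000))
print(ntpd_startup_behavior(5000, dash_g=True))
print(ntpd_startup_behavior(0.05))
```

If this model is right, the 900 second wait is long enough to race job starts either way, which is why a one-shot sntp before starting ntpd comes up next.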
*** ayoung has joined #openstack-infra19:48
*** yamahata has quit IRC19:49
*** yamahata has joined #openstack-infra19:49
fungiso, i agree, this seems to be the reason for suggesting sntp19:50
*** senk_ has quit IRC19:50
fungiand centos 7 still has an "ntpdate" service which ntpd depends on for taking care of that, but in more recent fedora releases they seem to have replaced it with an sntp "service" to do basically the same19:52
clarkbfungi: are they enabled by default or opt in?19:54
pabelangerfungi: clarkb: jeblair: Took longer than expected, but new mirror server in ord is online:
mordredpabelanger: woot!19:55
pabelangerfungi: clarkb: jeblair: going to enroll into ansible and update DNS19:55
clarkbfungi: I am thinking the simplest thing is to undo ntp-wait and replace ntpdate with sntp19:55
clarkbfungi: in d-g19:55
clarkbfungi: or possibly make sntp part of the ready script19:56
clarkbso that all jobs have sane ntp19:56
clarkbpabelanger: great thank you for getting that up19:56
*** rbuzatu has quit IRC19:56
fungiclarkb: it got discussed in last week's meeting. maybe skim the minutes from here to the end of the topic
fungiclarkb: basically ntpd is no longer the default time sync solution on rh-based platforms, so we likely want to go with each distro's default implementations19:58
*** tonytan4ever has joined #openstack-infra19:58
jeblairfungi: the new info for me is that apparently ntp-wait is hanging on ubuntu test nodes19:58
mordredsame here19:59
fungiwhich to me means we could add an sntp call in debian/ubuntu, but switch centos/fedora to chrony19:59
openstackgerritMatthew Treinish proposed openstack-infra/elastic-recheck: Fix template filename
fungiand probably just drop ntp-wait from d-g altogether?19:59
clarkbfungi: a simple which sntp || which chrony type switch would be fine19:59
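The "which sntp || which chrony" switch clarkb describes could look roughly like this in Python. It is a minimal sketch, not what devstack-gate or the ready script actually does; the function name and the candidate binary names (including chronyd as the chrony binary) are assumptions, and the which parameter exists only so the lookup can be swapped out when experimenting.

```python
import shutil


def pick_time_sync_tool(candidates=("sntp", "chronyd", "ntpdate"),
                        which=shutil.which):
    """Return the first candidate found on PATH, or None if none is installed."""
    for name in candidates:
        if which(name) is not None:
            return name
    return None


tool = pick_time_sync_tool()
print("time sync tool:", tool if tool else "none found")
```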
fungibasically rely on time sync to become a forced part of node bootup, and let jobs just assume that is a solved problem20:00
pabelangerdns updated, will take 60mins20:00
clarkbfungi: ya, we might also want to talk to debian and ubuntu about supporting a forced thing out of the box20:01
clarkbsince from what I can see that doesn't exist currently (but I may be missing some package that adds it)20:01
openstackgerritMatt Riedemann proposed openstack-infra/project-config: Add gate-novaclient-dsvm-functional-neutron-nv job
fungiback to my earlier wistfulness of having something akin to openntpd's -s option20:02
*** oomichi_ has joined #openstack-infra20:02
clarkbhrm ubuntu says they have a thing called timedatectl20:02
fungi"-s: Try to set the time immediately at startup, as opposed to slowly adjusting the clock. ntpd will stay in the foreground for up to 15 seconds waiting for one of the configured NTP servers to reply."20:02
clarkbso now we have ntpdate, sntp, chrony, and timedatectl20:03
*** sigmavirus is now known as sigmavirus|away20:03
*** oomichi_ is now known as oomichi20:03
openstackgerritKevin Carter (cloudnull) proposed openstack-infra/project-config: Raised max instance in the OSIC
clarkbbut timedatectl won't run if you have ntp installed20:03
mordredof course it won't20:03
clarkbI wonder if we just removed our ntp setup completely if things would just work (tm)20:03
mordredwhy would you ever make a utility that would run if you asked it to run20:04
cloudnull^ idk if infra core folks want to let my max-instance change in quite yet but i figured i'd put it up.20:04
fungimordred: clearly they think they've put a safety on their foot-cannon20:04
*** coreyob has joined #openstack-infra20:05
mordredfungi: I'm pretty sure that the piece of paper tape across the opening on the front of the cannon that says "danger" will keep me from shooting myself20:05
* clarkb noms on more tasty VMs20:05
anteayacloudnull: what might we be waiting for?20:05
mordredcloudnull: we like your max-instance change20:06
pabelangershould we land IPv6 dns first?20:06
cloudnullIDK if there was need to wait on DNS things or what now20:06
jeblairwe might wait on the zuul telnet fix, or dns20:06
anteayathe crowd hath spoken20:06
clarkboh I approved it, I can remove the approval20:06
jeblairi don't know that we should, just saying those are the things to consider20:06
fungicloudnull: what did the dns solution end up being? are our queries to ipv4 resolver addresses going through a pat?20:06
cloudnullIDK if my cloud will cry, but i have a name to live up to.20:06
anteayaha ha ha20:06
clarkbfungi: they are NAT'd by the neutron router20:07
anteayacloudnull: and we will help you get there20:07
jlvillalFor 'gertty'. When looking at a diff. Is there a search the diff feature?20:07
cloudnullfungi: what clarkb said20:07
pabelangercloudnull: we are seeing some failures to launch in osic-cloud1: but I was going to wait until we landed dns patch to start looking why20:07
mordredcloudnull: we can always increase the level of pain we inflict on your cloud any time you feel like you need to prove your skills as a leet operator20:08
fungijlvillal: at least by default, but as with any keybindings in gertty you can set that to something else20:08
*** nmagnezi has joined #openstack-infra20:08
* cloudnull enjoys pain20:08
jlvillalfungi: Thanks. Strange a few moments ago on some diff it was showing it searching for a patch. But now it works. Odd.20:08
*** _sarob has joined #openstack-infra20:08
fungii expect dns, while needing to get solved, may be fine through pat for now. zuul-launcher ipv6 console streaming support on the other hand could be something we want to solve quickly20:08
pabelangerfungi: 355570 was my attempt at fixing dns20:09
mordredpabelanger: btw20:09
fungiis there a zuul console patch for ipv6 url support?20:09
cloudnullpabelanger: I've been monitoring / watching the logs and such. IDK what is causing the "Error Node Launch Attempts" as neutron || nova aren't stacking or really throwing any errors.20:09
cloudnullbut i'm actively trying to hunt things down.20:09
fungiaha, thanks20:10
jlvillalfungi: That search is a bit odd. It doesn't move the page down if the search result is outside the view.20:10
cloudnullit may simply be an issue with neutron programming the interface in time. but i've not proven that at this point20:10
mordredcloudnull: oh - also, I don't know if you saw, but one of the things I was considering a problem with ipv6/shade/nodepool on osic is now at least understood ... but i don't think it's generally fixable at the moment20:10
fungijlvillal: keep hitting ctrl-s to advance20:10
jlvillalfungi: Ah sweet :) Thanks.20:11
jeblairjlvillal: ah, yeah, it doesn't look like it jumps to the initial match if outside the view.  however, repeated ctrl-s will get it there20:11
*** sarob has quit IRC20:11
cloudnullmordred: i had not seen that. something we might be able to help out with ?20:11
jlvillaljeblair: Thanks20:11
pabelangermordred: Great, +1 since I haven't done much shade yet20:11
jeblairprobably it should jump to the first one20:11
mordredcloudnull: the basic gist is that the single network with a public ipv6 and a private ipv4 is confusing to shade's concept of inferring what you want to do with your networks ... but it's not preventing us from launching nodes or using them so I'm not going to fix it until we find a way in which it breaks and can imagine a general solution20:11
jlvillaljeblair: I would vote for that behavior :)20:11
mordredcloudnull: I think it's just a deficiency in the neutron data model, and if we try to work around it TOO much in this case I think it'll lead to more not less confusion20:12
*** vhosakot has quit IRC20:12
pabelangercloudnull: So, I think we are not resolving DNS in our nodepool ready-script, we do host, and if that fails we delete the server and launch again20:12
clarkbjeblair: mordred is there a reason that that zuul patch hasn't been approved yet? can I go ahead and approve it?20:12
*** amitgandhinz has quit IRC20:12
mordredclarkb: nope. just waiting on a second +220:12
jeblairclarkb: no reason i know of.  i think pabelanger local-tested it.20:12
*** amitgandhinz has joined #openstack-infra20:12
*** _sarob has quit IRC20:13
pabelangerjeblair: clarkb: Yes, I tested it locally with a simple python app20:13
clarkbjeblair: mordred pabelanger though thinking about it, does that work if for some reason a host doesn't have a working ipv6 stack? do we care about such hosts?20:13
cloudnullmordred: :'( at least things are still working20:13
mordredclarkb: I do not personally care about such hosts at the moment20:13
fungiit does of course mean that people without ipv6 connectivity can't get to some of the log streams, but... join us in the new era. hurricane electric tunnels for everyone!20:13
jeblairclarkb: fungi assured us that should be fine for any linux post 1997 or something.20:13
*** rbuzatu has joined #openstack-infra20:14
pabelangerfungi: Yes! my lack of ipv6 at home is becoming a problem now20:14
clarkbfungi: thats me now that I changed ISPs20:14
clarkbI should fix that20:14
clarkbpabelanger: one trick is to ssh tunnel20:14
* sc68cal wishes FiOS would get their shit together20:14
mordredsc68cal: ++20:14
clarkbyou can v6 to v4 or v4 to v6 pretty easily with ssh20:14
clarkbsc68cal: ya thats who I changed to20:14
mordredsc68cal: oh - that reminds me - I need to call frontier to see if their Gig service is available for me20:14
pabelangerclarkb: cool, I haven't looked how to yet20:15
jeblairclarkb: that is, it should listen on v4 and v6 for dual stack hosts, which is all of them.  of course some of our nodes now are not *routable* over v4.20:15
sc68calmordred: lol humblebrag20:15
jeblairfungi: right^20:15
clarkbjeblair: yup20:15
*** sarob has joined #openstack-infra20:15
*** Goneri has quit IRC20:15
clarkbjeblair: and even if you don't have a global ipv6 addr you should have a link local addr and loopback to listen on for v620:15
anteayasc68cal: it is nice to see you, have a frowny face20:15
* sc68cal thinks he needs a REST API to POST things he needs downloaded, and ship hard drives to mordred :)20:15
fungiclarkb: pabelanger: my home ipv6 is via an he tunnel from my firewall. even have a /48 and reverse dns delegated for it20:15
clarkbjeblair: so not a problem on the bind side I don't think unless running ancient linux as fungi said20:15
clarkbfungi: ya I just know that after having native v6 with comcast very little stuff functions properly with it. Though that may be related to the giant bitbuckets in seattle and denver in comcast land and HE is happier20:16
clarkbI have approved the zuul change20:16
clarkbfungi: I have had to disable v6 in order to get working internets more than once20:17
*** Apoorva has joined #openstack-infra20:17
fungiclarkb: the ancient behavior isn't so much lack of linklocal addressing, as older system-wide "v6only" socket behavior (which you can still set via sysctl or explicit socketopts)20:17
clarkbfungi: I want to say ubuntu of the 2005 ish era didn't have v6 enabled at all? but thats ancient so meh20:18
fungibasically, binding a socket on :: used to only listen on all ipv6 addresses, not any ipv4 addresses20:18
mordredrobust test fixtures are great20:18
*** rbuzatu has quit IRC20:18
jeblairi'm a fan20:19
jeblairmordred, pabelanger, Shrews: i'll work on getting ansible manually installed on launchers20:19
jeblairsee if i can find my old playbooks for that20:19
*** javeriak has quit IRC20:19
*** inc0 has quit IRC20:19
mordredjeblair: cool20:20
mordredjeblair, Shrews: next time you're bored: ... I added tests and a release note even20:20
Shrewsmordred: i could of swore i reviewed that already. perhaps i forgot to vote20:21
fungiclarkb: controllable through the IPV6_V6ONLY sockopt (since Linux 2.4.21 and 2.6) and /proc/sys/net/ipv6/bindv6only system default20:21
fungijeblair: ^20:21
*** valderrv_ has quit IRC20:21
*** karthik__ has quit IRC20:21
openstackgerritJames Slagle proposed openstack-infra/tripleo-ci: DO NOT MERGE - Periodic test
fungior net.ipv6.bindv6only via sysctl20:22
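The dual-stack behaviour fungi describes (a socket bound to :: also accepting IPv4 unless IPV6_V6ONLY is set) can be made explicit per socket instead of relying on the system default. The sketch below is an illustration only, not the zuul_console code; the function name is hypothetical.

```python
import socket


def open_dual_stack_listener(port=0):
    """Listen on :: and force IPV6_V6ONLY off so IPv4 clients are accepted too."""
    sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Per-socket override of the net.ipv6.bindv6only system default.
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
    sock.bind(("::", port))
    sock.listen(5)
    return sock


if __name__ == "__main__":
    listener = open_dual_stack_listener()
    print("listening on port", listener.getsockname()[1])
    listener.close()
```

Setting the option on the socket means the listener behaves the same whether or not an older platform defaults bindv6only to 1.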
openstackgerritMerged openstack-infra/zuul: Simplify zuul_console port binding logic
fungiit was somewhat hotly debated on debian-devel ~7 years ago
fungiso that's what i mean by "relatively modern"20:23
mordredfungi: so as a cya, we could set net.ipv6.bindv6only to false with sysctl20:23
mordredmaybe in the zuul puppet20:23
fungii think we add that if someone complains that their 7-year-old server isn't running our experimental zuul-launcher correctly?20:24
fungithe one we haven't documented much nor encouraged others to switch to?20:24
jeblairit's on the test nodes too20:24
jeblairso 'testing on a 7 year old platform'20:24
fungiahh, yeah. i'll check centos 720:24
pabelangermordred: fungi: if you have time today, would not object to a review of 354818.  Start mirroring source packages for debian / ubuntu for zigo20:24
* jeblair hopes it's not called centos '7' because it's 7 years old20:25
funginet.ipv6.bindv6only = 0 already on centos 720:25
sean-k-mooneymmedvede: are you about?20:25
fungigood thing we're not still on centos 10!20:25
jeblairif so, i'm not sure about 'upgrading'20:25
fungialso net.ipv6.bindv6only = 0 on ubuntu precise20:26
*** valderrv has joined #openstack-infra20:26
fungiso we should be fine20:26
mmedvedesean-k-mooney: I am here20:26
clarkbfungi: ianw pabelanger ok I think I have caught up on the ntp meeting discussion. From my reading of that and ubuntu docs I think we might be ok to completely drop ntp packages and services from our test images20:27
sean-k-mooneymmedvede: i tried to set up my own instance of ciwatch but the ci_id are always null so it does not render correctly20:27
*** kgiusti has left #openstack-infra20:27
sean-k-mooneymmedvede: is the most up-to-date code in the github?20:27
clarkbfungi: ianw pabelanger we just have to make sure that the distro defaults of chrony and timedatectl end up in place20:27
*** tonytan4ever has quit IRC20:27
clarkbfungi: ianw pabelanger I can try booting some ubuntu-minimal and fedora-minimal images once I am otherwise caught up on post vacation things to see if those just work20:28
cloudnullalso, just a shout out: thanks everyone for helping the OSIC get to gating on IPv6! its really quite awesome to see all of this getting done and rolling into production.20:28
cloudnullat the next ops-meetup/summit: beers on me :)20:28
clarkbcloudnull: its pretty neat on our end too (we have long said ipv6 should mostly work and it looks like it does \o/)20:28
clarkbcloudnull: thank you !20:28
*** asselin has joined #openstack-infra20:29
*** xyang1 has quit IRC20:29
mmedvedesean-k-mooney: yes. I have a script I can share that should setup ciwatch for you (using puppet-ciwatch module)20:29
*** xyang1 has joined #openstack-infra20:29
sean-k-mooneymmedvede: well i have it running in a docker container
sean-k-mooneybut it looks like i missed something20:30
clarkboh heh it looks like timedatectl may be a systemd related thing that configures chronyd?20:31
*** ociuhandu has quit IRC20:31
clarkbthis isn't convoluted and confusing at all20:31
clarkband may not be part of precise but is available on trusty looks like20:31
mtreinishclarkb: yeah that's a systemd thing20:32
sean-k-mooneymmedvede: if you can point me toward the script though i would be happy to compare and see what i missed20:32
mtreinishclarkb: or at least I think it is, because that's what I've had to use for time settings on my arch boxes for a while20:33
clarkbmtreinish: on ubuntu the systemd/systemd-services packages provide it20:33
*** _nadya_ has quit IRC20:33
jeblair#status log Installed ansible stable-2.1 branch on zuul launchers to pick up
openstackstatusjeblair: finished logging20:34
*** asselin has quit IRC20:35
mmedvedesean-k-mooney: it is pretty much just using puppet module to deploy it
*** nmagnezi has quit IRC20:37
sean-k-mooneymmedvede: thanks the only real difference i can see between the puppet deployment and my manual deployment is i used the default sqlite connection string instead of using mysql20:38
sean-k-mooneymmedvede: i'll give the puppet approach a shot though and see if that works for me. thanks for the help20:39
pabelangereep, 200 nodes just got deleted by nodepool20:39
pabelangerchecking why now20:39
mmedvedesean-k-mooney: ok. I'll try deploying from scratch myself when I get some free time. I'll let you know if I see the same problem you are seeing20:40
fungipabelanger: gate reset?20:40
pabelangerfungi: I think because ansible was reinstalled20:41
pabelanger is a new failure20:41
pabelangerlets see if it happens again20:41
fungilooks like devstack-gate cells jobs are probably hitting the same problem rajinir was seeing in a third-party ci20:41
jeblairpabelanger: yes, i just reinstalled ansible20:42
*** esikachev has quit IRC20:43
clarkbfungi: yup they pushed a fix20:43
jeblairpabelanger: probably should have incorporated it into a graceful shutdown/reinstall/start playbook20:43
pabelangerjeblair: Ya, failures line up with that.  replacement nodes back online20:43
pabelangerjeblair: np20:43
pabelangerjeblair: I think you said you have a potential fix for inplace upgrades for ansible a while back?20:44
jeblairpabelanger: in-place upgrades of zuul, and that's there20:44
*** adrian_otto has joined #openstack-infra20:44
jeblairpabelanger: not ansible though.  we need to stop/upgrade/start for ansible20:44
jeblairbut that doesn't happen often20:44
jeblairi hope20:44
*** tonytan4ever has joined #openstack-infra20:45
pabelangerStarting to see traffic on the new server20:47
jeblairon a (perhaps related) note, i enqueued 355628 into the gate20:47
clarkbfungi: ianw pabelanger looks like newer ubuntu may run by default20:48
clarkbunfortunately the docs for that don't say anything about how it handles skew20:48
anteayajeblair: so DuncanT should be able to recheck that patch?20:48
anteayaand beagles too?20:48
jeblairanteaya: yep20:49
anteayathank you20:49
anteayathank you mordred20:49
jeblair#status log gracefully restarting all zuul-launchers20:49
openstackstatusjeblair: finished logging20:50
openstackgerritMerged openstack-infra/project-config: Raised max instance in the OSIC
jeblairin a few hours, we should have v6 telnet links working there20:51
clarkbjeblair: does that depend on new images in osic?20:51
clarkbI can babysit that if you think it will help20:51
jeblairclarkb: no, it's zuul-console component copied over by ansible from zuul-launcher20:51
jeblairclarkb: the few hours is the zuul-launcher global graceful restart i just kicked off20:52
pabelangerfungi: already up to 140 Mbps
jeblair(we *can* hard-restart the launchers, but it would burn more nodes)20:52
fungipabelanger: that's a great indication of how terrible things were before, if we were wanting 40% more than our bw cap there20:53
pabelangerfungi: indeed cc sdague ^20:53
fungiwe probably need to keep an eye on it and maybe replace it again with an even bigger flavor if we get closer to 200mbps20:53
pabelangeror setup load balancers20:54
mordredbigger vms20:54
mordredmore power20:54
clarkbload balancers have a similar problem20:54
clarkbsince they are restricted to the same bw constraints20:54
sdaguemordred: max powers!20:54
pabelangerclarkb: that is true20:54
clarkbso in this case its simpler to just go bigger20:54
mordredsdague: so much powers20:54
pabelangergo big or go home20:55
sdaguesimpsons ^^^20:55
fungione of my favorite episodes20:56
fungii've thought from time to time it would have made an amusing online handle/pseudonym20:57
mordredsdague: so - if I could bother you for a sec ... - I added an experimental grenade job to neutronclient so that we can show that the combo of the latest os-client-config and the patch I wrote appropriately works20:57
*** sdake has joined #openstack-infra20:57
fungiclarkb: if only rackspace had a network-heavy flavor. we don't need more ram/cpu/disk but we end up with it anyway to get more bandwidth20:58
mordredsdague: BUT - it has the sads20:58
sdaguemordred: ok, you have about 3 minutes to explain the sads before I call it a day.20:58
sdaguebut you should do that, because I'll look first thing in the morning20:58
anteayasdague: thank you20:58
mordredsdague: it complains about xenial20:58
mordredsdague: which makes me think job config issue20:58
mordredsdague: but I thought I copied all of the goo from other people20:59
sdagueright, grenade doesn't run on xenial20:59
sdaguebecause comes from mitaka20:59
sdaguewhich shipped before xenial20:59
*** dprince has quit IRC20:59
sdagueand we typically don't backport that support change21:00
mordredhrm. ok, then I think my original version of that patch was potentially more correcter21:00
*** jkilpatr has quit IRC21:00
openstackgerritSagi Shnaidman proposed openstack-infra/tripleo-ci: Bump tempest version to latest
sdagueso... you should probably just move this to run on trusty21:00
sdaguewe talked about just doing the backport, but clarkb didn't think it was needed when he was rolling jobs over21:01
clarkbright we decided to run mitaka to newton/master on trusty21:01
* mordred grumps21:01
mordredsdague: cool. thanks. super helpful21:02
openstackgerritSagi Shnaidman proposed openstack-infra/tripleo-ci: Bump tempest version to latest
sdaguemordred: ok, great. If you need other things, feel free to send an email. Heading out for the day.21:02
openstackgerritSagi Shnaidman proposed openstack-infra/tripleo-ci: WIP: DONT MERGE TESTING
*** mhickey has quit IRC21:03
*** hrubi has joined #openstack-infra21:03
clarkbyup. Though I also think that grenade should not be so forceful about what platform I run it on21:03
openstackgerritMonty Taylor proposed openstack-infra/project-config: Run  neutronclient experimental grenade job on trusty
clarkbif I want to run it on tumbleweed please let me ...21:03
*** julim has quit IRC21:03
mordredclarkb, anteaya: ^^ per the conversation just now21:04
anteayamordred: okey dokey21:05
openstackgerritMerged openstack-infra/elastic-recheck: Add query for cells v2 setup bug 1613417
*** spzala has quit IRC21:07
*** ihrachys has quit IRC21:08
pabelangerfungi: big spike now, 175Mbps21:08
pabelangerhow high will it go21:08
pabelangernobody knows21:08
*** sdake has quit IRC21:08
mordredwow. we produce some traffic!21:09
fungiand that's just local mirror access for rax-ord21:09
fungimakes me wonder if we have very stale caches on images there21:09
*** yamamoto has joined #openstack-infra21:10
fungior whether that's something other than distro packages21:10
*** yamamoto has quit IRC21:11
fungile sigh. i'm getting "yaml.reader.ReaderError: unacceptable character #x009b: special characters are not allowed" trying to parse openstack/governance:reference/projects.yaml21:11
pabelangerfungi: so, I created a new volume for the new server since I didn't want to break jobs running. So, cache does need to warm up there21:11
anteayafungi: :(21:11
fungipabelanger: yep, but that would show up as ingress not egress21:12
jeblairwe just had a logistical discussion in #zuul which ended up with the idea that we should make a feature branch for nodepool for the zk work.  even though we want to land that and start using it soon, it will take multiple changes to implement, and coexistence with the current builder design is difficult.21:12
pabelangerfungi: Yes, that is true21:12
fungijeblair: seems reasonable to me21:12
jeblairclarkb: i guess we should have gone with your pick for server size.  :)21:12
pabelangercloudnull: your patch just went live21:12
*** adrian_otto has quit IRC21:12
pabelangerincoming 250 nodes on osic-cloud121:12
mordredfungi: how are you reading it?21:13
* cloudnull goes home for the day21:13
fungimordred: yaml.safe_load(requests.get(PROJECTS_LIST % ref).text)21:13
jeblaircloudnull: oh wait, you almost forgot your pager!21:13
* cloudnull runs21:13
*** spzala has quit IRC21:13
mordredfungi: ah - a=yaml.safe_load(open('reference/projects.yaml', 'r').read()) works for me21:13
mordredI will try your version21:13
*** rhallisey has quit IRC21:14
mordredfungi: you don't have an expansion of PROJECTS_LIST % ref handy do you?21:14
fungimordred: i think it may be with how/where i'm retrieving it from. digging deeper21:14
mordreda=yaml.safe_load(requests.get('').text) works for me21:15
*** ldnunes has quit IRC21:15
*** rbuzatu has joined #openstack-infra21:15
*** tqtran has quit IRC21:15
mordredfungi: I can re-create your error with the review.o.o url21:16
openstackgerritMerged openstack-infra/elastic-recheck: Fix template filename
fungimordred: yep. i'm finding what position 25352 is next21:16
mordredfungi: fun!21:16
openstackgerritPaul Belanger proposed openstack-infra/system-config: Add tripleo-test-clouds AFS mirrors to cacti.o.o
mordredfungi: I blame jgit21:16
*** sdake has joined #openstack-infra21:17
* Shrews blames j<anything>21:17
*** _ari_ has joined #openstack-infra21:17
*** fitoduarte has joined #openstack-infra21:17
*** adduarte has joined #openstack-infra21:17
fungiu'    - name: Zbyn\xc4\x9bk Schwarz\n'21:18
mordredfungi: from git.o.o I get: u' name: Zbyn\u011bk Schwarz\n    ' right around there21:18
mordredfungi: I wonder if you need a header for the requests.get to set a language or something21:19
mordredor encoding I mean21:19
fungimordred: possibly21:19
clarkbyaml is utf8 by default iirc21:20
clarkbso if you are somehow getting the bits in not utf8 that may make it mad21:21
mordredyah - but gitweb might be encoding over the wire21:21
mordredor decoding21:21
mordredor something21:21
*** elo has joined #openstack-infra21:21
*** rbuzatu has quit IRC21:21
mordredmy browser tells me that that link is being served as "Western (Windows-1252)"21:22
fungimordred: requests.get(blah).encoding indeed says 'ISO-8859-1'21:23
fungilooks like it might be a gitweb fallback behavior21:24
pabelangerokay, stepping away to run some family errands.  I think our original can be deleted now. Last hit to apache logs in 15/Aug/2016:20:47:05 +0000.  I'll do that when I get back this evening just to be safe21:25
*** matt-borland has quit IRC21:25
clarkbpabelanger: thanks again21:25
clarkbfungi: mordred fallback for when you don't set an accepts encoding?21:26
clarkbok ntp is making me go blind21:26
clarkbianw: pabelanger: any other feedback on not using our ntp module on the test images at all?21:26
*** adrian_otto has joined #openstack-infra21:27
fungiclarkb: likely. i'm just reading through requests docs now21:28
*** jkilpatr has joined #openstack-infra21:28
*** apetrich has quit IRC21:29
mordredfungi: ok. SO ...21:30
mordredresponse = requests.get(';a=blob_plain;f=reference/projects.yaml;hb=master')21:31
mordredresponse.encoding = 'utf-8'21:31
mordredfungi: that ^^ works21:31
fungimordred: yep, found that gem21:31
mordredfungi: so I think what may be happening is that gitweb is returning utf8 data but setting the header wrong21:31
fungiso requests is assuming the response is in latin1 when it's actually utf8 all along21:31
mordredyah: 'Content-Type': 'text/plain; charset=ISO-8859-1'21:32
mordredthat's in the response headers from gitweb21:32
fungi.headers does indeed sat that21:32
fungier, say21:32
fungithat's where i just went as well21:32
mordredI think we can consider that to be independently verified results then! :)21:32
fungimordred: i think it's actually not setting an encoding, which rfc 2616 says means latin121:34
mordredfungi: lovely21:34
fungii'd need to use a packet sniffer to confirm whether requests is faking that in the headers dict, or apache is actually passing it21:35
*** jheroux has quit IRC21:35
fungicould be we need apache on review.o.o configured differently21:35
fungithere's "AddDefaultCharset UTF-8" as one possibility21:36
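A minimal sketch of the mismatch discussed above, assuming the body really is UTF-8 while the declared charset is ISO-8859-1. The helper name decode_body is hypothetical and only stands in for what requests does when it builds .text; no network access is involved.

```python
def decode_body(raw, declared="ISO-8859-1", override=None):
    """Decode response bytes, optionally overriding the declared charset.

    Hypothetical helper for illustration; it mimics how a client turns
    the raw body into text based on the Content-Type header.
    """
    return raw.decode(override or declared)


# UTF-8 bytes for the line that tripped the YAML reader.
raw = u"    - name: Zbyn\u011bk Schwarz\n".encode("utf-8")

# Trusting the declared latin1 charset splits the two-byte sequence for
# U+011B into U+00C4 and U+009B; the latter is the character YAML rejects.
misdecoded = decode_body(raw)

# Forcing UTF-8 recovers the original text.
fixed = decode_body(raw, override="utf-8")

# With requests, the workaround shown in the channel has the same effect:
#   response = requests.get(url)
#   response.encoding = 'utf-8'
#   data = yaml.safe_load(response.text)
```

Setting the encoding on the client side leaves the server untouched, which matters if the wrong header turns out to come from gitweb rather than from apache.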
*** yamahata has quit IRC21:38
*** sdake has quit IRC21:39
karthikp_clarkb: Hi21:40
openstackgerritClark Boylan proposed openstack-infra/system-config: Disable ntp services on single use test instances
clarkbfungi: ianw pabelanger ^ I am going to WIP that until I can do more testing of the distro boot time defaults to make sure they do set something sane (manpages for ubuntu claim they do and I think it's all systemd related so the other distros should too)21:40
openstackgerritJulia Kreger proposed openstack-infra/project-config: Rename bifrost integration test job
clarkbkarthikp_: hello21:41
*** edtubill has joined #openstack-infra21:41
karthikp_clarkb: got a question for you regarding grenade.... any idea why this step is necessary?21:42
clarkbkarthikp_: I don't know for sure but guessing that if the services really fail to start then the logs won't exist21:43
clarkbkarthikp_: you might have better luck checking the git logs for that line21:43
anteayaTheJulia: does bifrost have more than one job for ipa?21:46
anteayaTheJulia: if not, how about removing the adjective and going with ipa21:46
TheJuliaanteaya: two, we build IPA with debian as well21:46
openstackgerritIvan Udovichenko proposed openstack-infra/project-config: Add new/update existing projects
anteayajust trying to see if we can avoid needing to rename the job when you change the image21:47
fungimordred: i've investigated some apache-side workarounds, but on deeper investigation it seems that gitweb has a history of returning incorrect content types, including for blob_plain
TheJuliaanteaya: renaming the job was like the last thing I wanted to do though :\21:47
fungimordred: so i'll just set the encoding in our script as a workaround21:47
anteayathat is different from ipa-debian, yeah?21:48
TheJulianot as descriptive, yeah, it also fires up debian in that job, so in theory that still works21:48
*** amotoki has quit IRC21:48
* TheJulia likes it21:48
anteayawell the description should be in the log, right?21:48
anteayayay you like it21:48
*** burgerk has quit IRC21:48
TheJulia:)  I'll update it in a little bit21:49
anteayayup, thanks21:49
*** adrian_otto has joined #openstack-infra21:51
clarkbipa works on cirros?21:52
* clarkb wonders if thats another image in the ironic ramdisk image list21:52
karthikp_clarkb: i see .. git logs?21:52
clarkbkarthikp_: the revision control history for that repo may tell you why that line was added21:52
*** devkulkarni has joined #openstack-infra21:52
anteayaI have to assume the answer to your question is yes, based on the existing name of the job21:55
*** tqtran has joined #openstack-infra21:55
*** harlowja has quit IRC21:55
TheJuliaclarkb: more like we deploy cirros as fast lightweight reliable test21:55
anteayaI'm that patch is all I am using for my assertion21:55
clarkbah you boot cirros using tinycore ramdisk21:56
karthikp_clarkb: Oh ya that was added for all the projects by sdague..i will check with him21:56
karthikp_clarkb: thanks21:56
anteayaJayF: is thinking like me21:56
*** tonytan4ever has quit IRC21:57
* TheJulia lets there be a little chatter and goes to start cooking dinner :)21:57
JayFanteaya: that's the nicest thing you've ever said to me \o/ :)21:57
JayFMy thought was just, if I were graphing this job, I'd wanna see how the default changed it w/o having to change the name21:58
JayFif you have >1 of something, sure, specify, but maybe leave it out if it's only 121:58
*** tkelsey has joined #openstack-infra21:58
anteayaJayF: ha ha ha :)21:59
anteayaJayF: I agree with you thinking21:59
anteayaI'm so glad I could math in school, the spelling gods never looked my way22:00
*** amotoki has joined #openstack-infra22:00
*** matrohon has quit IRC22:00
*** tqtran has quit IRC22:00
openstackgerritMerged openstack-infra/project-config: Run  neutronclient experimental grenade job on trusty
*** gordc has quit IRC22:00
*** amotoki has quit IRC22:00
*** yamahata has joined #openstack-infra22:01
*** xarses has joined #openstack-infra22:01
anteaya<-- offline22:02
*** mtanino has quit IRC22:03
*** tkelsey has quit IRC22:03
*** esberglu has quit IRC22:06
*** thorst_ has quit IRC22:08
*** tqtran has joined #openstack-infra22:08
mmedvedesean-k-mooney: around? I tested a fresh install of ciwatch, it works fine22:08
*** mdrabe has quit IRC22:09
*** mriedem has quit IRC22:09
* clarkb is working on building dib -minimal images without an explicit ntp install to see what we end up with22:09
clarkbianw: pabelanger ^ hopefully that shows us we can get away with just not doing stuff on the single use nodes22:09
*** valderrv has quit IRC22:10
*** edtubill has quit IRC22:10
*** thorst_ has joined #openstack-infra22:13
*** tqtran has quit IRC22:15
scottdayolanda: Would you re-approve when you have a chance? The dependent patch has merged and it needed a rebase.22:16
*** thorst_ has quit IRC22:18
*** jistr has quit IRC22:18
*** jistr has joined #openstack-infra22:19
*** onovy has joined #openstack-infra22:19
*** netsin has quit IRC22:25
*** yamamoto has joined #openstack-infra22:26
*** hockeynut has joined #openstack-infra22:29
*** jkilpatr has quit IRC22:29
*** weshay has quit IRC22:32
*** signed8bit is now known as signed8bit_Zzz22:34
*** fguillot_ has joined #openstack-infra22:34
*** nwkarsten has quit IRC22:37
*** krtaylor has quit IRC22:38
*** rbuzatu has joined #openstack-infra22:38
*** netsin has joined #openstack-infra22:38
openstackgerritIvan Udovichenko proposed openstack-infra/project-config: Add new/update existing projects
pabelangerclarkb: ack22:39
pabelangerclarkb: I haven't really been following the ntp issues from today. Will try and catch up on backscroll here in a bit22:40
clarkbpabelanger: tl;dr is after seeing the meeting notes from last week I saw you all mentioned just using the defaults on the distros. and on further investigation I think at least for systemd distros it may just work if we stop explicitly installing ntp22:40
clarkbpabelanger: so building images locally to test that theory22:41
pabelangerclarkb: Ah, yes. I remember that22:41
clarkbpabelanger: since systemd has some built in time syncing stuff that should update the time on boot if I am reading things correctly22:42
clarkbbut want to test that first22:42
clarkband figure out what trusty and precise do22:42
*** beagles_brb is now known as beagles22:42
JayFtimesyncd is pretty nuts though. it just does a tls connection to something and steals the timestamp iirc22:42
JayFlike if that's good enough, it's good enough, just a strange way of doing things22:43
JayFah it apparently talks to real ntp servers now, that's an improvement22:43
pabelanger#status log upgraded to performance1-4 to address network bandwidth cap.22:43
pabelangerand original server now deleted22:44
*** weshay has joined #openstack-infra22:44
openstackgerritMatthew Treinish proposed openstack-infra/devstack-gate: SUPER WIP: Use new tempest run workflow
*** rbuzatu has quit IRC22:44
*** pabelanger has quit IRC22:45
*** pabelanger has joined #openstack-infra22:45
pabelanger#status log upgraded to performance1-4 to address network bandwidth cap.22:45
openstackstatuspabelanger: finished logging22:45
*** signed8bit_Zzz is now known as signed8bit22:46
*** fguillot_ has quit IRC22:47
clarkbJayF: ya for our long lived servers we will probably continue to ntp or similar22:47
clarkbJayF: but on the test instances we really just need a mostly correct timestamps in logs that won't jump halfway through a job22:48
pabelangerokay, just starting to look into osic-cloud1 launch node errors, first issue:
clarkbpabelanger: that needs to use iptables6 I think22:50
* mordred lookie22:50
pabelangerI think so too22:50
pabelangerOh, can we land
pabelangerhelp reduce debug logs in nodepool22:51
pabelangerwhen we cannot host git.o.o22:51
mordredoh piddle22:51
*** edmondsw has quit IRC22:51
cloudnullpabelanger: anything you need from me  ?22:51
cloudnullor any way I can help ?22:51
mordredthe 'bug' in shade (it currently doesn't do enough magic WRT IPv4/IPv6 addresses) _may_ bite us with multi-node22:52
pabelangercloudnull: I don't think so. We just need to update some nodepool scripts I think22:52
* mordred goes to look through nodepool real quick22:52
pabelangermordred: oh?22:52
*** vhosakot has quit IRC22:53
mordredyeah. blast22:53
mordredthat means I _am_ going to have to fix that22:53
* mordred cries22:53
mordredactually ...22:53
*** fguillot_ has joined #openstack-infra22:53
pabelangercloudnull: actually, I do see an SSH timeout for osic-cloud122:53
pabelangercloudnull: let me see if I can get the instance ID22:53
mordredclarkb: multinode testing networking ...22:53
clarkbya thats the setup for allowing all traffic between multinode right?22:54
mordredclarkb: we don't actually need subnodes_private to have things in it, right? because we have clouds with only public?22:54
clarkbshould be simple to just check the ip and use the right iptables command22:54
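A rough sketch of the "check the ip and use the right iptables command" idea, assuming the only decision is the address family. The function names and the ACCEPT rule are hypothetical and are not taken from the nodepool or devstack-gate scripts; the IPv6 binary is ip6tables.

```python
import ipaddress


def firewall_command(address):
    """Return the firewall binary that matches the address family."""
    ip = ipaddress.ip_address(address)
    return "ip6tables" if ip.version == 6 else "iptables"


def allow_peer_rule(address):
    """Build an argv list that accepts all traffic from one peer.

    The rule itself is only an illustration of the multinode
    'allow everything between nodes' setup.
    """
    return [firewall_command(address), "-I", "INPUT",
            "-s", address, "-j", "ACCEPT"]
```

Building an argv list keeps the family check in one place, so a peer list that mixes IPv4 and IPv6 addresses can be handled in a single loop.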
pabelangercloudnull: timeout waiting for ssh access22:55
clarkbmordred: last time I tried to use public only on clouds with both private and public openstack didn't work22:55
clarkbmordred: clouds like osic when fip and bluebox22:55
clarkbI think NAT is or was creating problems for us there22:55
mordredclarkb: k. so - what if one of the things in subnodes_public was a 10. address22:55
clarkbthen other random stuff wouldn't work I would expect22:56
cloudnullpabelanger: looking22:56
mordredclarkb: the tl;dr here is that on osic we detect the 10. ipv4 address as being "public"22:56
clarkbmordred: we should put the ipv6 addr in there no?22:56
cloudnullmordred: does it make your life easier if i change that to a 192 address ?22:56
mordredcloudnull: nope22:56
*** nwkarsten has joined #openstack-infra22:57
mordredclarkb: but nodepool multi-node is the one place where we might look explicitly for public/private and expect them to be correct22:57
*** sdake has quit IRC22:57
mordred(most of the rest of the cases it all just works because interface_ip has the ipv6 address and everything is happy)22:57
*** tonytan4ever has joined #openstack-infra22:58
clarkbmordred: multinode d-g wants to use the private addrs for most stuff (I think everything) due to the presumed nat issues22:58
mordredclarkb: ok. I'll work on a fix then22:58
clarkbmordred: so I wouldn't expect that to break with 10 net addr in public22:58
clarkbbut you need to have it in private list too22:58
mordredclarkb: well, the 10. will not be in private22:59
mordredonly in public22:59
clarkbI think nodepool puts it in both22:59
clarkbif there is no private addr then it writes the public to private22:59
mordredI'll go read through that code more22:59
clarkbso that things relying on "private" continue to work22:59
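The fallback described above can be sketched as follows. This is only an assumption about the behaviour as stated in the channel, not the actual nodepool code; the function name is hypothetical and the keys reuse the subnodes_public and subnodes_private names mentioned earlier.

```python
def subnode_addresses(public, private):
    """Return both address lists, reusing public when private is empty.

    Consumers that only read the private list keep working on clouds
    that hand out a single address per node.
    """
    if not private:
        private = list(public)
    return {
        "subnodes_public": list(public),
        "subnodes_private": list(private),
    }
```

Under this assumption a 10. address that was detected as "public" would also show up in the private list, which is the case clarkb describes.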
mordredoh good22:59
mordred(this is me really not wanting to try to solve the problem right now)22:59
mordredclarkb: for slightly more wordy context- the underlying problem is that we currently determine "does this route packets off the cloud" with the Network object. (and to be fair, that's where the router:external property which does not mean routes externally sits)23:01
*** nwkarsten has quit IRC23:01
mordredclarkb: but it turns out you can have a subnet that routes externally and a subnet that does not route externally both attached to the same Network23:01
mordredclarkb: so the _real_ question that needs to be asked is "is the port that provides this IP address attached to a subnet that can route externally"23:02
mordredbut that's a bunch more data model trolling to get consistent and right every time - and most of the time it's an extra level of complexity that doesn't show up23:02
*** rbrndt has quit IRC23:02
pabelangerclarkb: did we want to land 355570 now? So we can have the dns fix for tomorrows image builds23:02
clarkbmordred: fun23:03
mordredclarkb: yah.23:03
*** tonytan4ever has quit IRC23:03
pabelangermordred: can we remove the autohold for Automatically held after failing gate-shade-dsvm-functional-neutron ?23:04
pabelangeror is that still needed23:04
mordredpabelanger: yes. absolutely can remove23:04
clarkbpabelanger: does that work in clouds with no v6? does unbound know to do the right thing in that situation?23:04
*** devkulkarni has quit IRC23:04
*** asettle has joined #openstack-infra23:04
mordredthat's a good question23:04
pabelangerclarkb: I tested with both ovh and osic and it worked.23:05
pabelangerI can confirm with each other cloud too23:05
clarkbpabelanger: and you made sure that it was using unbound not the cloud provided resolvers?23:06
clarkbI am not sure if that happens in ovh like in rax23:06
*** markvoelker has quit IRC23:06
pabelangerclarkb: yup, nslookup used
pabelangersame with dig +trace23:07
mtreinishfungi, pabelanger, clarkb: hmm did I miss a step in adding firehose.o.o to cacti: is all blank23:07
pabelangerwelp, internap is also using DNS from cloud provider23:07
jeblairmtreinish: for starters, isn't the server ''?23:08
*** xyang1 has quit IRC23:08
*** Goneri has joined #openstack-infra23:08
mtreinishjeblair: ah, yep that'd probably do it23:08
fungiahh, yep, need to fix that at
fungii missed that23:09
jeblairi'm not sure if that's the actual cause, but i'm not certain it's not.23:09
jeblaircacti says 'udp ping success / snmp error'23:09
*** hongbin has quit IRC23:09
*** asettle has quit IRC23:09
openstackgerritMatthew Treinish proposed openstack-infra/system-config: Fix firehose hostname on cacti hiera
mtreinishjeblair, fungi: ^^^23:10
fungias for why it's not showing up, i don't see snmpd running on the server23:10
fungiActive: active (exited) since Mon 2016-08-01 15:48:50 UTC; 2 weeks 0 days ago23:11
fungisayeth `service snmpd status`23:11
jeblairthat would do it fer shure23:11
clarkbpabelanger: approved23:12
clarkbmy local xenial host without ntp is definitely running the systemd thing23:12
clarkbtrusty doesn't seem to do much with time though23:12
fungii'll refrain from restarting snmpd on it until the hiera change makes it onto the cacti host23:12
jeblairfungi: good plan23:12
jeblairless to delete that way23:12
fungilaziness is next to godliness23:12
fungior something like that23:13
mtreinishpleia2: it looks like puppet updated the stuff, but the cron job is still not happy:
clarkbya I think older non systemd distros are going to be a problem here23:14
clarkbthere goes that idea :P23:14
*** asselin has joined #openstack-infra23:14
pabelangerclarkb: I think we are going to be good, all clouds appear to have inet6 address on eth0 and lo0.  And unbound seems to do the right thing if ipv6 entry is not accessible, fails to the next entry which is ipv423:15
*** asselin_ has quit IRC23:15
*** asselin_ has joined #openstack-infra23:15
clarkbpabelanger: most of those clouds just have link local addrs though23:16
*** baoli has joined #openstack-infra23:16
cloudnullpabelanger: interestingly I'm seeing this on the compute node where that instance was spawned.
clarkbpabelanger: which won't get them to google dns. The exceptions are osic, rax, and vexxhost23:16
cloudnullhowever no other errors23:16
clarkbin any case if it falls back to ipv4 without ridiculously long timeouts we should be fine23:16
pabelangerclarkb: Ya, it is pretty fast23:17
cloudnullpabelanger: was talking about that instance you noted as having ssh timeouts23:18
* clarkb is beginning to wonder if the simplest thing would be to install our own init script for sntp and just run that once at boot on all platforms23:18
clarkbprobably going to run into dependency hell with the existing distro stuff though23:18
*** asselin has quit IRC23:19
pabelangercloudnull: Oh, neat. So you are seeing something23:19
cloudnullyea i may need to do some iptables munging or neutron tweaking to make that happier.23:20
cloudnullidk quite yet23:20
* pabelanger nods23:20
cloudnullbut yes.23:20
openstackgerritMerged openstack-infra/project-config: Add IPv6 DNS support
*** xarses has quit IRC23:22
openstackLaunchpad bug 1565705 in neutron "iptables duplicate rule warning on ports with multiple security groups" [Medium,Fix released] - Assigned to Kevin Benton (kevinbenton)23:23
*** shashank_hegde has quit IRC23:23
openstackgerritJeremy Stanley proposed openstack-infra/system-config: Add a script to list change owner statistics
fungianteaya: zaro: ^ latest gerrit upgrade allowed some serious simplification there on multiple fronts23:24
zarofungi: ahh nice!23:25
zarofungi: i'm testing online index but sorta hit a snag.  not enough memory on review-dev now!23:26
fungidropped more than 50 loc23:26
fungizaro: oh, ouch!23:26
fungiwe can rebuild it bigger if needed23:26
clarkband it looks like on fedora and centos we would have to explicitly install something to set the time so they are more like ubuntu trusty23:27
zarofungi: yeah may need to if we want to test multiple users hitting it while it's reindexing.23:27
clarkbI will need to fiddle with these VMs a bit more when its not almost the end of the day23:28
clarkbfigure out what magic is needed to make things happen23:28
zarofungi: on the bright side it seems to be working great with just me poking at it.23:28
*** gyee has quit IRC23:30
fungianteaya: dhellmann: you _should_ be able to use on your own to generate the electoral rolls now, though with the coming round of technical elections i think i should generate a set too and then election officials can confirm the lists they have match mine just to be on the safe side. if it works out though, our gerrit admins can get completely out of involvement in23:33
*** kzaitsev_mb has quit IRC23:33
fungifuture elections unless troubleshooting becomes necessary23:33
dhellmannfungi : excellent23:34
dhellmannthough I won't be an election official since I'll be up for election23:35
fungiahh, yup ;)23:35
*** harlowja has joined #openstack-infra23:36
*** devkulkarni has quit IRC23:40
clarkbthe more I dig the more I think we might have to do our own equivalent to ntpdate at boot using system appropriate tools23:40
clarkbsince everything seems to do the gentle update to avoid making processes unhappy23:40
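A minimal sketch of the measuring half of an "ntpdate at boot" equivalent, assuming a plain SNTP exchange. It only builds a client request and reads the transmit timestamp out of a 48-byte reply; it ignores round-trip delay, sends nothing on the network, and does not step the clock. All function names are hypothetical.

```python
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_sntp_request():
    """Return a 48-byte client request (LI=0, VN=3, Mode=3)."""
    return bytes([0x1B]) + bytes(47)


def transmit_time(packet):
    """Extract the server transmit timestamp as Unix seconds."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    # Bytes 40-47 hold the transmit timestamp: 32-bit seconds, 32-bit fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def clock_offset(packet, local_now):
    """Naive offset estimate: server transmit time minus local time."""
    return transmit_time(packet) - local_now
```

A boot-time script could compare the offset against a threshold and step the clock once, rather than relying on the gradual adjustment the distro daemons apply.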
openstackgerritJames Slagle proposed openstack-infra/tripleo-ci: DO NOT MERGE - Periodic test.
*** sdague has quit IRC23:44
*** pahuang has quit IRC23:45
*** jerryz has quit IRC23:46
mordredit's a jhesketh !23:47
mordredclarkb: yah - when what we want is "MAKE IT GOOD NOW"23:47
*** sarob has quit IRC23:48
* clarkb is happy his suse system already comes with this feature23:48
clarkbbut I can't find anything like it on ubuntu23:48
*** zhurong has joined #openstack-infra23:49
*** gyee has joined #openstack-infra23:49
jheskethmordred: indeed :-)23:50
*** dingyichen has joined #openstack-infra23:51
cloudnullpabelanger: sadly, yet again, I can't find anything specifically wrong with the environment that would produce an ingress ssh timeout. If we can identify one of these instances and keep it online I can troubleshoot it further.23:55
cloudnullNow that I have LOTS of IPs to play with I'll try to reproduce it on my own but for now, IDK :'(23:56
clarkbreading chrony init scripts for ubuntu it will do a burst on interface startup but not a step23:56
*** zhurong has quit IRC23:57

