Thursday, 2018-11-01

openstackgerritAlex Schultz proposed openstack-infra/project-config master: Fix instack jobs (again)  https://review.openstack.org/61465100:00
mwhahahaOr today00:00
*** markvoelker has quit IRC00:01
*** slaweq has quit IRC00:15
*** ssbarnea has quit IRC00:29
*** erlon has quit IRC00:30
fungitrick-or-treat activity has wound down. catching up now on what i missed00:30
fungiimacdonn: clarkb: we've definitely seen arp fights in other providers where a leaked instance held onto an ip address which got reassigned to a new instance, though i don't personally recall seeing it in rackspace00:33
*** betherly has joined #openstack-infra00:35
imacdonnfungi: any way to easily tell if this is happening repeatedly?00:35
funginot really other than logstash searches on that error string00:35
imacdonnI recheck'ed my change, but it failed in a different way .. urgh :/00:36
fungiimacdonn: openstack has bugs? color me surprised ;)00:37
imacdonnfungi: hah, well ... it failed tests that it passed the first time, but the first time it hit that host key change in another job00:37
imacdonnspeaking of which ... is there any way to ask Zuul to recheck only the job that failed ?00:38
*** betherly has quit IRC00:39
*** erlon has joined #openstack-infra00:43
*** fuentess has quit IRC00:49
*** longkb has joined #openstack-infra00:51
fungiintentionally not, by design00:53
fungiotherwise it becomes waaaay to easy to pin-and-tumbler lockpick nondeterministic failures into merging00:54
fungier, too easy00:54
*** betherly has joined #openstack-infra00:56
*** betherly has quit IRC01:01
*** xinliang has quit IRC01:06
openstackgerritMerged openstack-infra/irc-meetings master: Remove WoO meeting  https://review.openstack.org/61464901:08
*** jamesmcarthur has joined #openstack-infra01:24
*** hongbin has joined #openstack-infra01:24
*** jamesmcarthur has quit IRC01:28
imacdonnfungi: yeah, I kinda figured ... OK ... recheck #2, I guess :/01:28
fungiimacdonn: there is usually value in identifying the failures you encounter, at least so that others might find ways to fix them01:29
imacdonnfungi: sure, and that's why I asked about the host key change thing .... but a lot of the failures are just timeouts01:33
imacdonnfungi: although this one isn't ... some DB weirdness01:34
imacdonnhttp://logs.openstack.org/17/614617/1/check/openstack-tox-py27/0040196/ if there's any interest ... not sure what would cause this ... maybe some concurrency thing, I guess01:34
*** dayou has quit IRC01:38
fungiyeah, two cinder db migration failures?01:40
openstackgerritsebastian marcet proposed openstack-infra/openstackid-resources master: updated caldav client lib  https://review.openstack.org/61466601:40
fungisqlalchemy.exc.ResourceClosedError: This result object does not return rows. It has been closed automatically.01:41
imacdonnyeah... not sure if they caused each other to fail, or both failed for a common reason01:41
openstackgerritMerged openstack-infra/openstackid-resources master: updated caldav client lib  https://review.openstack.org/61466601:41
fungicould certainly be a concurrency-related bug in how the tests are written there01:41
imacdonnmaybe they're both faking rows in the same table, and happened to both delete them at the same time01:41
*** tdasilva has quit IRC01:42
imacdonnor something01:42
fungilong shot, but i wonder how much smcginnis knows about the db tests in cinder's unit testing (if he's around this fine hallowe'en)01:43
clarkbI think there may be an e-r bug for migration failures01:44
imacdonnI know there are concurrency issues with some of the cinder unit tests ... they fail somewhat consistently on my 48-vCPU dev box, but pass fairly consistently with '--concurrency=12'01:47
imacdonnbut I've not seen this particular failure before01:47
*** betherly has joined #openstack-infra01:48
fungiyeah, high concurrency runs are great for teasing out those sorts of problems01:48
fungiwe just don't generally have the resources to test that with any frequency01:48
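A minimal sketch of reproducing that kind of race locally, assuming cinder's tox environments pass extra arguments through to stestr; the --concurrency value is the one imacdonn mentions above, not a recommendation:

    # default: one test worker per CPU, which is what surfaces races on big machines
    tox -e py27
    # cap the number of parallel workers to see whether the failures go away
    tox -e py27 -- --concurrency=12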
*** betherly has quit IRC01:52
*** erlon has quit IRC01:58
imacdonnHmm, interesting ... those two tests are skipped when I run tox, because "Backend 'mysql+pymysql' is unavailable: Could not connect"01:59
*** rlandy|bbl is now known as rlandy01:59
*** markvoelker has joined #openstack-infra02:03
*** lujinluo has quit IRC02:03
*** lujinluo has joined #openstack-infra02:04
*** lujinluo has quit IRC02:09
*** apetrich has quit IRC02:16
*** rh-jelabarre has quit IRC02:34
*** markvoelker has quit IRC02:35
imacdonnwith mysql installed and populated by test-setup.sh, I can't get those tests to fail .... ho hum ... back to recheck :P02:38
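A sketch of getting those opportunistic DB tests to run instead of skip, assuming the repo ships the standard OpenStack tools/test-setup.sh (which creates the openstack_citest MySQL user and database):

    # set up a local MySQL with the openstack_citest user/database the tests look for
    sudo ./tools/test-setup.sh
    # re-run the unit tests; the mysql+pymysql migration tests should no longer be skipped
    tox -e py27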
*** lujinluo has joined #openstack-infra02:49
*** mrsoul has quit IRC02:55
*** lujinluo has quit IRC02:55
*** xek has joined #openstack-infra03:09
*** psachin has joined #openstack-infra03:10
*** bhavikdbavishi has joined #openstack-infra03:10
*** icey has quit IRC03:18
*** betherly has joined #openstack-infra03:21
*** icey has joined #openstack-infra03:23
*** betherly has quit IRC03:26
*** markvoelker has joined #openstack-infra03:32
*** armax has quit IRC03:49
*** udesale has joined #openstack-infra03:50
*** betherly has joined #openstack-infra03:53
*** ramishra has joined #openstack-infra03:56
*** betherly has quit IRC03:57
*** dayou has joined #openstack-infra03:59
*** markvoelker has quit IRC04:06
*** betherly has joined #openstack-infra04:24
*** betherly has quit IRC04:29
*** hongbin has quit IRC04:37
*** xek has quit IRC04:45
*** lujinluo has joined #openstack-infra05:01
*** markvoelker has joined #openstack-infra05:02
*** lujinluo has quit IRC05:05
*** lujinluo has joined #openstack-infra05:06
openstackgerritIan Wienand proposed openstack-infra/system-config master: [to squash] Fix groups.yaml for yamlgroup plugin  https://review.openstack.org/61469305:09
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add unittest for yamlgroup inventory plugin  https://review.openstack.org/61469405:09
*** annp has quit IRC05:10
*** betherly has joined #openstack-infra05:16
*** betherly has quit IRC05:21
*** rlandy has quit IRC05:30
*** markvoelker has quit IRC05:36
*** betherly has joined #openstack-infra05:37
*** betherly has quit IRC05:42
*** roman_g has quit IRC05:42
*** betherly has joined #openstack-infra05:57
*** betherly has quit IRC06:02
*** lujinluo has quit IRC06:19
*** chkumar|off is now known as chandankumar06:21
*** markvoelker has joined #openstack-infra06:33
*** xinliang has joined #openstack-infra06:45
xinliangfrickler: I found this issue on a debian arm64 node.06:45
xinliangianw: Could you find time to look at this bug: https://bugs.linaro.org/show_bug.cgi?id=4035 ? the arm64 debian mirror repo is really slow07:01
openstackbugs.linaro.org bug 4035 in Default "[uk cloud] Wget fetching from mirror.london.linaro-london.openstack.org is more slower and unstable than deb.debian.org" [Enhancement,Unconfirmed] - Assigned to gema.gomez-solano07:01
*** bhavikdbavishi has quit IRC07:02
*** apetrich has joined #openstack-infra07:03
ianwxinliang: hrm, i just logged into the host, it doesn't look like it's doing anything that would slow itself down07:04
*** icey has quit IRC07:04
ianwnothing really in dmesg07:05
*** markvoelker has quit IRC07:05
ianw/dev/mapper/main-afscache     99G  2.0G   92G   3% /var/cache/openafs07:05
ianwthat's interesting, it's not using much afs cache at all07:05
xinliangianw: yeah, it's odd. The cloud internal bandwidth is very high, say ~4 G/s07:06
xinliangianw: any firewall rules on the repo web server??07:07
ianwxinliang: no, we have nothing like that07:08
ianwroot@mirror01:/etc/openafs# fs getcacheparms07:08
ianwAFS using 1781837 of the cache's available 50000000 1K byte blocks.07:08
xinliangianw: so the mirror is stored on top of afs, right?07:08
ianwso it seems like it has plenty of cache.  the debs you're using are coming from AFS, so ultimately we bring that across from our servers in dfw to the mirror node07:08
ianwfor /debian, yep; we have some reverse proxies07:08
xinliangSo it's fetching from the remote server then?07:09
ianwbut it may be that running a few times primes the cache more and helps.  any initial access can be slowish07:09
xinliangI ran wget several times, is that enough to use the cache?07:10
*** icey_ has joined #openstack-infra07:12
ianwyeah, that would hopefully prime it better07:12
ianwnow i'm thinking about it though ... i think the cache gets invalidated on releases of the volumes07:12
ianwwhich happens when we sync the mirrors07:12
ianwnow for very busy mirrors, the cost of re-fetching is amortised quickly across many jobs07:13
ianwbut for this region, where we have only a few jobs running occasionally ... that may be a bad case for it07:13
*** icey_ is now known as icey07:14
xinliangianw: seems like the mirror sync is real-time. I thought it just synced several times a day07:15
ianwall the debian-ish things run reprepro every few hours07:16
ianwhttp://grafana.openstack.org/d/ACtl1JSmz/afs?orgId=1 should show the last release07:17
*** lujinluo has joined #openstack-infra07:18
xinliangBut what I don't understand is why it gets things from the remote server rather than the local mirror server, although we specifically fetch using the local mirror server's dns name07:19
ianwbecause afs is a remote/distributed file system07:22
*** lujinluo has quit IRC07:23
ianw(e.g. nfs) ... so if the data isn't cached locally, it comes across remotely07:23
xinliangianw: I see, so the first time it gets things from remote, then the second time it will come from the local cache, right?07:25
ianwxinliang: yes, second time it should be in the afs cache07:25
*** lpetrut has joined #openstack-infra07:26
xinliangianw: ok, let me check if the second time is faster than the first. If so, then no problem. thanks ianw.07:27
ianwxinliang: yeah, that's worth a try at first.  if we're having problems with the cache being invalidated too frequently ... umm ... i dunno, have to think about that07:31
xinliangianw: yes, I checked that the second time is much faster than the first. http://paste.openstack.org/show/733755/07:32
ianwauristor would know exactly what "vos release" invalidates and might have ideas about what we could do for less frequently accessed mirrors to try and keep them fresh07:32
ianwxinliang: ok, good, that's what we expect as a first pass07:33
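A sketch of the cold-versus-warm comparison xinliang pasted above; the mirror hostname is the one from the linaro bug report and the package path is purely illustrative:

    URL=http://mirror.london.linaro-london.openstack.org/debian/pool/main/b/bash/bash_4.4-5_arm64.deb  # hypothetical path
    time wget -O /dev/null "$URL"   # cold cache: data pulled across AFS from the fileservers in dfw
    time wget -O /dev/null "$URL"   # warm cache: served from the mirror's local /var/cache/openafs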
xinliangianw: in any case, will it be invalidated too frequently? or is there a setting controlling this?07:33
xinliangin what case?07:34
*** pcaruana|elisa| has joined #openstack-infra07:40
*** ramishra has quit IRC07:45
ianwxinliang: i'm not sure there's settings we can fiddle.  i'm a little concerned that with the frequency we refresh the volume, and the low frequency of jobs, it might not be the best case for this07:45
*** ramishra has joined #openstack-infra07:46
xinliangianw: i see. thanks ianw07:46
*** ifat_afek has joined #openstack-infra07:47
*** lujinluo has joined #openstack-infra07:57
*** slaweq has joined #openstack-infra07:58
*** pcaruana|elisa| has quit IRC07:59
*** imacdonn has quit IRC08:00
*** bhavikdbavishi has joined #openstack-infra08:01
*** udesale has quit IRC08:02
*** lujinluo has quit IRC08:02
*** markvoelker has joined #openstack-infra08:03
*** pcaruana has joined #openstack-infra08:05
*** shardy has joined #openstack-infra08:08
*** shardy_ has joined #openstack-infra08:08
*** bhavikdbavishi has quit IRC08:09
*** bhavikdbavishi has joined #openstack-infra08:09
*** shardy_ has quit IRC08:10
*** shardy_ has joined #openstack-infra08:11
*** lpetrut has quit IRC08:12
*** ykarel has joined #openstack-infra08:16
*** slaweq has quit IRC08:18
*** ralonsoh has joined #openstack-infra08:22
*** ralonsoh has joined #openstack-infra08:23
*** jamesmcarthur has joined #openstack-infra08:25
*** jamesmcarthur has quit IRC08:30
*** florianf|afk is now known as florianf08:32
*** markvoelker has quit IRC08:36
*** xek has joined #openstack-infra08:49
*** ssbarnea has joined #openstack-infra08:55
*** jpich has joined #openstack-infra09:00
*** rabel has quit IRC09:02
*** kjackal has joined #openstack-infra09:09
*** derekh has joined #openstack-infra09:16
*** udesale has joined #openstack-infra09:16
*** shrasool has joined #openstack-infra09:17
*** rabel has joined #openstack-infra09:28
*** masayukig[m] has joined #openstack-infra09:28
*** electrofelix has joined #openstack-infra09:32
*** markvoelker has joined #openstack-infra09:33
*** ykarel is now known as ykarel|lunch09:38
fricklerxinliang: ianw: interesting, that would explain why all my subsequent tests yesterday didn't show any issue09:42
fricklerxinliang: were there any real errors like job timeouts due to this or just longer runtimes?09:42
*** e0ne has joined #openstack-infra09:43
xinliangfrickler: haven't found yet09:44
*** lpetrut has joined #openstack-infra09:49
fricklerxinliang: so if the mirror is just slow on the first access, I think we could live with that. if we see failures, we would need to consider further measures like setting up a periodic job that would ensure the cache is kept warm09:52
xinliangfrickler: yeah, sounds like a good idea.09:53
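One possible shape for that keep-warm idea, as a hedged sketch only; the mirror hostname comes from the linaro bug above, and the paths, schedule, and whether this would run from cron or a Zuul periodic job are all open questions:

    #!/bin/bash
    # hypothetical hourly keep-warm script for the low-traffic arm64 mirror
    MIRROR=http://mirror.london.linaro-london.openstack.org
    wget -q -O /dev/null "$MIRROR/debian/dists/stretch/Release"
    wget -q -O /dev/null "$MIRROR/debian/dists/stretch/main/binary-arm64/Packages.gz"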
*** panda|off is now known as panda09:56
*** shrasool has quit IRC09:57
*** ralonsoh has quit IRC10:02
*** ralonsoh has joined #openstack-infra10:03
*** markvoelker has quit IRC10:07
*** chandankumar has quit IRC10:09
*** apetrich has quit IRC10:12
*** jaosorior has quit IRC10:23
*** apetrich has joined #openstack-infra10:25
*** chandankumar has joined #openstack-infra10:30
*** lujinluo has joined #openstack-infra10:32
*** e0ne has quit IRC10:32
*** e0ne has joined #openstack-infra10:36
*** lujinluo has quit IRC10:36
*** ansmith has joined #openstack-infra10:42
*** jtomasek has quit IRC10:43
*** dave-mccowan has joined #openstack-infra10:45
*** jtomasek has joined #openstack-infra10:45
*** ansmith has quit IRC10:51
*** markvoelker has joined #openstack-infra11:04
*** pcaruana has quit IRC11:05
*** bhavikdbavishi has quit IRC11:16
*** xek_ has joined #openstack-infra11:22
*** udesale has quit IRC11:24
*** xek has quit IRC11:25
*** xek__ has joined #openstack-infra11:25
*** zul has joined #openstack-infra11:26
*** xek_ has quit IRC11:27
*** longkb has quit IRC11:34
*** jaosorior has joined #openstack-infra11:35
*** markvoelker has quit IRC11:36
*** pcaruana has joined #openstack-infra11:52
*** bhavikdbavishi has joined #openstack-infra11:54
*** erlon has joined #openstack-infra11:57
*** kjackal has quit IRC12:01
*** pbourke has quit IRC12:09
*** pbourke has joined #openstack-infra12:10
*** lpetrut has quit IRC12:13
*** rh-jelabarre has joined #openstack-infra12:15
*** boden has joined #openstack-infra12:18
*** trown|outtypewww is now known as trown12:18
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Add tox functional testing for drivers  https://review.openstack.org/60951512:25
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Add tox functional testing for drivers  https://review.openstack.org/60951512:26
*** udesale has joined #openstack-infra12:32
*** ykarel_ has joined #openstack-infra12:32
*** ykarel|lunch has quit IRC12:35
*** kjackal has joined #openstack-infra12:36
*** rlandy has joined #openstack-infra12:41
*** ansmith has joined #openstack-infra12:48
*** agopi|brb is now known as agopi12:49
*** zul has quit IRC12:52
openstackgerritDoug Hellmann proposed openstack-infra/openstack-zuul-jobs master: remove the tag pipeline entry for release-notes-jobs  https://review.openstack.org/61475812:52
*** e0ne has quit IRC12:55
*** zul has joined #openstack-infra12:55
*** yamamoto has quit IRC12:56
*** udesale has quit IRC12:56
*** ykarel_ is now known as ykarel12:57
*** eharney has joined #openstack-infra13:01
*** udesale has joined #openstack-infra13:01
*** GDPR is now known as emerson13:01
*** hashar has joined #openstack-infra13:02
*** jcoufal has joined #openstack-infra13:04
*** zul has quit IRC13:04
*** zul has joined #openstack-infra13:05
*** kgiusti has joined #openstack-infra13:06
*** bobh has joined #openstack-infra13:10
*** carl_cai has joined #openstack-infra13:12
*** yamamoto has joined #openstack-infra13:14
*** noama has joined #openstack-infra13:19
*** mriedem has joined #openstack-infra13:28
ralonsohsmcginnis, tonyb: hi, we have a problem in os-vif, version 1.12.0 is buggy and it's blocking all CI work in neutron13:36
ralonsohsmcginnis, tonyb : https://review.openstack.org/#/c/614764/13:36
ralonsohsmcginnis, tonyb: can you prioritize this one? thank you in advance13:36
fricklerralonsoh: that patch is currently failing zuul, see http://logs.openstack.org/64/614764/1/check/openstack-tox-validate/79b2afe/job-output.txt.gz#_2018-11-01_13_32_43_86241213:39
*** ykarel_ has joined #openstack-infra13:40
*** e0ne has joined #openstack-infra13:40
fricklerralonsoh: once the patch is approved you can also ask infra-root to promote it in the gate if needed13:41
*** ykarel has quit IRC13:42
*** ykarel_ is now known as ykarel13:43
ralonsohfrickler: thanks!13:45
*** hashar has quit IRC13:49
*** tpsilva has joined #openstack-infra13:55
*** bhavikdbavishi has quit IRC13:58
auristorianw xinliang: let me know if you would like a deep dive on openafs volume transactions.14:00
*** ramishra has quit IRC14:02
*** fuentess has joined #openstack-infra14:03
*** e0ne_ has joined #openstack-infra14:09
*** e0ne has quit IRC14:10
openstackgerritAlex Schultz proposed openstack-infra/project-config master: Fix instack jobs (again)  https://review.openstack.org/61465114:20
openstackgerritTobias Henkel proposed openstack-infra/nodepool master: Add tox functional testing for drivers  https://review.openstack.org/60951514:33
AJaegerdhellmann: regarding https://review.openstack.org/614758 - won't we have the same problem once we branch off stein? I think we need to move the release notes template back to project-config...14:37
fungiclarkb: i'm starting to see very long load times on etherpads again. did apache need restarting now that the config is in place?14:37
*** armax has joined #openstack-infra14:51
*** Swami has joined #openstack-infra14:52
dhellmannAJaeger : after we branch stein we will have 2 branches adding the same job, so zuul will only run it 1 time15:05
*** kukacz has quit IRC15:12
*** psachin has quit IRC15:13
smcginnisI noticed a couple delays with etherpads this morning. Not consistently though.15:13
*** ykarel_ has joined #openstack-infra15:13
*** ykarel has quit IRC15:16
*** ykarel_ is now known as ykarel15:16
fungireloading https://etherpad.openstack.org/p/tc-chair-responsibilities is taking a minute or more for my browser to pull up the content15:16
*** apetrich has quit IRC15:18
*** kukacz has joined #openstack-infra15:19
*** ykarel is now known as ykarel|away15:22
fungiclarkb: hrm, not getting any more AH00485 in the apache error log though, and it does appear apache was restarted after the config got updated15:23
fungiso maybe this is unrelated15:23
fungiand now refreshing that tab is fairly quick again15:24
*** anteaya has joined #openstack-infra15:27
clarkbfungi: the config shouldve been applied properly yesterday15:29
fungiyeah, i don't see any indication that it hasn't been15:31
*** gyee has joined #openstack-infra15:33
*** e0ne_ has quit IRC15:34
clarkbI even cleaned out the old symlink and restarted again to make sure the desired config worked15:34
*** kjackal_v2 has joined #openstack-infra15:34
*** e0ne has joined #openstack-infra15:34
clarkbafter I did that the cpu usage fell off according to cacti. I think a lot of that cpu was  apache trying to handle new connections. What does cacti look like more recently?15:34
*** kjackal has quit IRC15:35
*** apetrich has joined #openstack-infra15:36
openstackgerritMerged openstack-infra/irc-meetings master: Remove Daisycloud meeting  https://review.openstack.org/61269215:37
fungitcp open is still high following the xenial upgrade15:37
fungicache memory usage is almost back to where it was yesterday before we started trying things15:37
fungisystem cpu utilization did die off though and hasn't returned15:38
fungiother than a brief spike around the daily cronjob window15:38
clarkbI wonder if we have two issues. The forced reconnects and general slowness15:39
clarkbthe apache fix should fix the first15:39
clarkbbut possibly the second was independently happening at overlapping times, making them seem related15:40
fungiperhaps15:40
mwhahahahey are you folks aware of the issues with mirror.sjc1.vexxhost.openstack.org/pypi? http://logs.openstack.org/68/614168/1/gate/tripleo-ci-centos-7-undercloud-containers/a7d8e31/job-output.txt.gz#_2018-11-01_11_55_04_27485915:41
mwhahahai scrolled up but didn't see it mentioned (i may be blind)15:41
fungithere was a release job failure earlier today where that mirror failed to return the pypi simple index page for pbr15:42
fungihttp://logs.openstack.org/7f/7ff373d1b8f38666fd61adac798318877e14477d/release/release-openstack-python3/ae53d6b/job-output.txt.gz#_2018-11-01_13_34_56_38501715:42
fungii wonder if it's intermittently having trouble reaching the fastly cdn pypi uses15:43
clarkbfungi: mwhahaha ya the indexes are not cached or have very short ttls15:44
clarkbso if pypi cdn is unhappy we may notice it15:44
mwhahahai'm not sure how the pypi mirror works, but if it's a timeout wouldn't we get a 503 or something15:45
funginot if pip's internal timeout happened first15:45
clarkbmwhahaha: not necessarily, iirc apache can timeout on the front end as can pip15:45
fungilooks like pip only waits 15 seconds15:45
mwhahahai see15:45
clarkboh right apache would 500, ya probably pip15:45
*** agopi is now known as agopi|brb15:46
fungijudging from those job logs15:46
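For context, a sketch of the knobs involved on the pip side; --timeout and --retries are real pip options (the defaults are 15 seconds and 5 retries), and the values here are only illustrative:

    # give a slow mirror/CDN more headroom for a single install
    pip install --timeout 60 --retries 10 pbr
    # or persistently via pip.conf:
    #   [global]
    #   timeout = 60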
mwhahahait does seem to just be vexxhost tho15:46
mwhahahathis was the 2nd time i've seen that today and this one was in the gate15:46
clarkbmwhahaha: until a couple months ago we ran a proper mirror of pypi, but the size of pypi has increased dramatically in the last year or so due to daily package releases of machine learning tools that link against cuda15:46
mwhahahapesky machine learning15:46
clarkbmwhahaha: so pypi went from a few megabytes per day of new content to a few gigabytes per day of new content15:46
fungientirely possible there's a problem on the internet between vexxhost-sjc1's backbone providers and whoever is hosting the nearest fastly cdn nodes15:46
mwhahahafair enough, i'll keep an eye on things15:47
clarkbwe can probably configure apache to ignore the ttl on those indexes and have some custom ttl set15:48
fungilooks like mirror.sjc1.vexxhost.openstack.org gets a round-robin of 4 different aaaa records for pypi.org15:48
clarkbas is, I think it's like a 30 second or 300 second ttl, if cached at all15:49
fungioh, actually we don't have ipv6 on that mirror, so it'll be the round-robin of four a records15:49
fungiout of curiosity, did we disable ipv6 on that server for a reason?15:50
clarkbfungi: there wasn't (maybe still isn't) ipv6 in vexxhost sjc115:50
*** agopi|brb has quit IRC15:50
fungihuh. i could have sworn i had ipv6 on my instances in sjc1 though i don't have any built at the moment15:52
clarkbfungi: I know it was on mnaser's todo list, but the mirror node was one of the first things we spun up, so it predates ipv6. It is possible the region does ipv6 now15:53
clarkbin that case we'd want to build a new mirror with ipv6 and delete the old one15:53
clarkbianw: xinliang frickler re AFS invalidating the cache aiui it doesn't discard the data, it only refreshes metadata which tells it if any of the cached data is invalid. There is a definite hit to performance particularly as rtt increases (due to afs window sizing)15:54
clarkball that to say that hitting the mirror and fetching the data you want should help even if we invalidate the metadata later15:54
EmilienMhello folks :-) can someone approve this easy one: https://review.openstack.org/#/c/614616/15:55
EmilienMso we can release - thanks!15:56
clarkbEmilienM: that project isn't on pypi yet? fungi do our jobs create the project in pypi now if necessary?15:57
fungiclarkb: yes, they do15:59
clarkbneat!15:59
fungirather, the _recommended_ way to create a project on pypi these days is to upload an sdist/wheel with twine15:59
clarkb(I thought I remembered you talking about it at some point)15:59
fungithere is no separate project registration step in warehouse15:59
EmilienMthx clarkb15:59
clarkbthat makes so much sense15:59
EmilienMjaosorior: you'll be able to release soon ^15:59
mwhahahai had chatted w/ the release folks about creating a new job to replace the old openstack-server release job, one that does the python3 build but doesn't publish to pypi16:00
fungithey intentionally wanted to make it hard/impossible to register a project without an actual sdist/wheel so as to discourage squatting project names16:00
mwhahahabut haven't had time to look at it16:00
mwhahahabecause things like the ansible roles don't make sense published via pypi16:01
fungido ansible roles even warrant tarball/wheel builds?16:01
mwhahahanot really but we use the tarball bits for packaging16:02
fungiseems like for ansible roles you at most want a tag and release announcement16:02
mwhahahait really should be a publish to galaxy16:02
fungioh, yeah i wonder if anyone's working on an ansible galaxy publisher for release artifacts yet16:02
fungimordred: ^ that seems like something you would know about16:02
clarkbI think the struggle with galaxy is that it assumes github16:03
clarkband the way we run our github orgs as push-only mirrors from gerrit prevents individuals from doing the github tasks to release on galaxy? I don't recall all the details but it's something like that16:03
fungitraceroute from sjc1 suggests that the current fastly endpoints for pypi.org are all via cogent directly or cogent->ntt16:06
fungii suppose if there's flakiness at the cogent/ntt peering point that could be one possible explanation for the timeouts16:06
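A sketch of the kind of path check being described, run from the mirror node; the fastly endpoints pypi.org resolves to will vary over time:

    # resolve the current A records for pypi.org and trace the route to each
    for ip in $(dig +short pypi.org A); do
        echo "=== $ip ==="
        traceroute -n "$ip"
    done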
clarkbfungi: fwiw `journalctl -u etherpad-lite` doesn't show any major errors in etherpad. There are a small number of client errors on various pads16:08
mordredclarkb, fungi: yes - there is work underway to update galaxy to allow uploading things - and to not assume github16:09
clarkbTypeErrors and ReferenceErrors from the js I think16:09
fungiyeah, it's also possible the long wait for loading some pads i was seeing was a problem on my end16:09
mordredI don't know the status of it - but it was definitely discussed at fest ... I think it's also tied in with the upcoming work on mazer16:10
clarkbI did have to reconnect to the clarkb-test etherpad when I sat down this morning but other tabs in browser indicated I probably had a local networking disconnect16:10
fungii'm seeing no icmp packet loss and reasonable rtt from mirror.sjc1.vexxhost.openstack.org to each of the 4 fastly nodes listed for pypi.org at the moment, but this sort of problem tends to come and go at random anyway so that's not surprising16:10
clarkbfungi: it could potentially have been a problem between local fastly and backend servers too?16:11
clarkband pip has a shorter timeout than fastly does16:11
*** agopi has joined #openstack-infra16:12
*** jpich has quit IRC16:13
*** imacdonn has joined #openstack-infra16:15
openstackgerritMerged openstack-infra/project-config master: Add publish jobs for ansible-role-openstack-operations  https://review.openstack.org/61461616:16
AJaegerconfig-core, could you review https://review.openstack.org/614651 and https://review.openstack.org/614758 , please?16:18
*** ykarel|away has quit IRC16:21
fungiclarkb: also entirely possible, yes. we're basically looking at two layers of caching proxies in this case (ours and fastly's)16:21
fungiand so maybe it's only affecting one or more of the fastly nodes hit by vexxhost-sjc1 and not the fastly nodes our other mirrors are resolving to16:22
fungifor $reason16:22
EmilienMAJaeger: test-release-openstack-python3 is still running on https://review.openstack.org/#/c/613621/ I'm confused16:22
EmilienMI thought mwhahaha disabled it in project-config16:23
mwhahahaEmilienM: there's a patch16:23
EmilienMah I missed that16:23
mwhahahaturns out you can't define the template and the job16:23
mwhahahacause the template wins16:23
mwhahahahttps://review.openstack.org/#/c/614651/16:23
*** pall is now known as pabelanger16:24
pabelangerclarkb: mwhahaha: you don't really publish to galaxy today, it is more about triggering a hook in galaxy to import. So, which some work you could write a job in post pipeline to ensure github.com is first mirrored (from gerrit).  But you are correct, today it is pinned to github. Shortly, ansible roles will be using mazer (new tool), which will allow you to create a tarball then push that into16:26
pabelangergalaxy.16:26
pabelangers/which/with16:26
*** e0ne has quit IRC16:26
mwhahahainteresting16:27
pabelangernext week, I'll be starting to test mazer more and publishing to galaxy-qa from zuul, just to better understand how it all works. But so far, it seems to be much like how a puppet module would be released today. There is a manifest file inside a tarball16:28
pabelangerhttps://github.com/ansible-network/sandbox/pull/24 so far is my POC, with console log for zuul job (creating tarball): https://object-storage-ca-ymq-1.vexxhost.net/v1/a0b4156a37f9453eb4ec7db5422272df/logs/24/24/732ca4af4313a479ec0ba9ab6277a165700abf7d/check/build-ansible-role-tarball/1e0ac9d/job-output.html16:29
*** slaweq has joined #openstack-infra16:29
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Set ansible python version for opendev nameservers  https://review.openstack.org/61460716:32
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Configure adns1.opendev.org via ansible  https://review.openstack.org/61464816:32
corvusclarkb, fungi, mordred: ^ there's a lot of infrastructure in there, so fingers crossed the tests all work, but i think that's ready to go16:33
corvusoops, one quick fix needed16:34
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Configure adns1.opendev.org via ansible  https://review.openstack.org/61464816:34
*** jamesmcarthur has joined #openstack-infra16:35
corvusi accidentally checked in my local test playbook :)  that's fixed now16:35
AJaegerEmilienM: I thought the patch merged to fix it - but that was another one. So, my recheck was too early...16:36
mordredcorvus: I'm guessing that once we have a story for per-service playbooks, we'd move https://review.openstack.org/#/c/614648/3/playbooks/base.yaml into a adns.yaml playbook or something?16:36
corvusmordred: maybe...? probably? but as long as we have the giant ansible wheel, what's in that change makes a lot of sense.16:37
corvusmordred: we could go ahead and start splitting it out and doing include_playbook maybe?16:38
fungiwhat's with the "start_services=true" addition?16:39
mordredcorvus: oh - totally - I think it makes perfect sense as things are now16:39
corvusfungi: i wanted the testinfra job to verify that bind can start, but i didn't want to have the playbook start it in production.  that's my attempt to have cake and eat it too.16:40
fungioh, i see where it's called from the master-nameserver playbook now16:40
corvusif we like that approach, we can use it in other services where we don't necessarily want ansible starting services behind our backs16:41
fungii missed that playbooks/zuul/run-production-playbook.yaml was testing-only16:41
corvusfungi: oh, hrm.  i think i may have put that in the wrong place...16:43
*** pabelanger is now known as pall16:43
corvusyep, working on a new revision16:45
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Configure adns1.opendev.org via ansible  https://review.openstack.org/61464816:46
corvusfungi: good catch.  i think that ^ is much more sane too :)16:46
*** sshnaidm|ruck is now known as sshnaidm|afk16:48
clarkbcorvus: lgtm, I approved the python3 base change since that should be a noop until we start trying to deploy stuff with the followup change16:48
openstackgerritMerged openstack-infra/irc-meetings master: Remove various unused Neutron meetings  https://review.openstack.org/61269816:51
*** Swami has quit IRC16:55
mordredcorvus: left a second +2 on the adns change - leaving it un +Ad assuming you might want to watch it / interact with it16:56
*** rfolco|rover is now known as rfolco|ruck16:58
*** carl_cai has quit IRC17:02
corvusyeah, we can wait for tests and fungi17:02
fungiyeah, i wanted to look at the test logs to better understand how this all works together17:02
fungisorry, still feel a little out of my depth reviewing ansible stuff17:02
*** jamesmcarthur has quit IRC17:04
corvusfungi: i'm happy to answer questions and/or add comments in followup changes17:04
clarkbI'm also happy for it to not be perfect on the first pass given this is the second set of ansible only host stuff (bridge was first)17:05
clarkbrelated ish to ^ is the yamlgroup change. ianw has written some test framework for that which I think would be good to review (I've reviewed it already)17:06
*** udesale has quit IRC17:06
*** jamesmcarthur has joined #openstack-infra17:07
fungiyeah, i'm not super worried about it breaking something, more just hesitating to vote on it until i have some grasp of what it's doing ;)17:09
*** trown is now known as trown|lunch17:12
*** jamesmcarthur has quit IRC17:16
*** jamesmcarthur has joined #openstack-infra17:17
*** abishop has joined #openstack-infra17:21
*** diablo_rojo has joined #openstack-infra17:22
*** abishop has left #openstack-infra17:23
openstackgerritClark Boylan proposed openstack-infra/system-config master: Nodepool.o.o is no longer a thing, remove it  https://review.openstack.org/61481417:24
clarkblast bit of nodepool cleanup ^17:24
*** Swami has joined #openstack-infra17:26
*** slaweq has quit IRC17:27
*** eernst has joined #openstack-infra17:28
*** e0ne has joined #openstack-infra17:31
EmilienMcan someone approve this one please: https://review.openstack.org/#/c/614651/17:31
EmilienMso we can retire some old tripleo projects17:31
*** eernst has quit IRC17:31
clarkbya I've got most of that paged in will take a look17:32
clarkbactually simpler than I expected17:33
fungiahh, we double-approved17:33
EmilienMthx :D17:35
*** jpich has joined #openstack-infra17:38
fungicorvus: looks like system-config-run-dns is running for that change now17:41
fungijust about to the good parts of the deployment17:41
*** jamesmcarthur has quit IRC17:42
fungithough system-config-run-eavesdrop just failed for it, and tox-docs seems to have as well17:42
*** bobh has quit IRC17:42
openstackgerritMerged openstack-infra/system-config master: Set ansible python version for opendev nameservers  https://review.openstack.org/61460717:44
openstackgerritMerged openstack-infra/project-config master: Fix instack jobs (again)  https://review.openstack.org/61465117:46
*** e0ne has quit IRC17:47
fungicorvus: apparently it's still trying to use puppet17:47
fungifatal: [adns1.opendev.org]: FAILED! => {"changed": false, "msg": "Unsupported puppet version '3' on this platform"}17:47
*** yamamoto has quit IRC17:49
*** mriedem has left #openstack-infra17:56
*** mriedem has joined #openstack-infra17:56
*** jamesmcarthur has joined #openstack-infra17:59
*** e0ne has joined #openstack-infra17:59
*** derekh has quit IRC18:00
*** sshnaidm|afk is now known as sshnaidm|off18:00
*** e0ne has quit IRC18:02
*** ifat_afek has quit IRC18:03
*** e0ne has joined #openstack-infra18:05
*** anteaya has quit IRC18:05
mwhahahameh now getting timed out for pypi from iad.rax http://logs.openstack.org/71/614571/2/gate/tripleo-ci-centos-7-undercloud-containers/f6332aa/job-output.txt.gz#_2018-11-01_15_02_28_46232618:06
clarkblikely a pypi issue if it is affecting clouds on different sides of the country18:07
*** e0ne has quit IRC18:07
mwhahahayea18:07
clarkbmwhahaha: https://status.python.org/ shows that pypi considers itself to be functional but there is a growing number of errors listed for pypi.org if you scroll down a bit18:10
clarkbmordred: are you around? I'm going to look at nodepool launchers and builders and getting their software updated now. To double check 0.19.0 is the sdk version we want which includes the port listing fix and the iamge task upload fix?18:12
clarkbmordred: and do you think we should restart services with 0.19.0 in place or just ensure it gets installed as expected?18:12
clarkb(I'm not sure how well tested those fixes ended up being in sdk)18:13
*** ralonsoh has quit IRC18:13
clarkbmordred: https://docs.openstack.org/releasenotes/openstacksdk/unreleased.html#relnotes-0-19-0 shows the image upload fix but not the port listing fix?18:15
clarkbI'll start with the builders in this case18:16
clarkbnb01-nb03 are removed from the emergency file. I'll check that puppet updates things there with new sdk version and restart builder daemons once that looks good18:17
*** yamamoto has joined #openstack-infra18:17
*** yamamoto has quit IRC18:17
openstackgerritRyan Beisner proposed openstack-infra/openstack-zuul-jobs master: Add py3 jobs with an unspecified minor version  https://review.openstack.org/61482318:18
corvusFailed to read build info file: ValueError("build info file is broken: IndexError('list index out of range',)",)18:21
corvusanyone know what that means in the context of a sphinx build?  (specifically 'tox -re docs' in system-config)18:22
corvusoh, maybe i should rm -rf docs/build ?18:22
*** kjackal_v2 has quit IRC18:23
corvusyes.  apparently "build info file is broken" is a typo for "please run rm -rf docs/build"18:23
mordredclarkb: yes - 0.19.0 is the one18:23
clarkbmordred: and that has the ports fix too?18:24
mordredclarkb: yah  - should do - lemme verify18:24
mordredclarkb: yes - looks like we didn't do a release note because we're bad people18:25
clarkbmordred: ok I'll do the launchers after the builders. Should I go ahead and restart launchers too? probably a reasonable idea18:26
mordredclarkb: yah - to pick up the change - although, just to be safe, maybe let's pick one launcher and restart it and then make sure it didn't delete the world18:26
clarkbok18:26
clarkbI think we still reverted the change in nodepool right?18:27
mordredyah. oh - right - so no need to do that just right now18:27
mordredwe want to do that once we land the nodepool change18:27
mordredshould be safe to update to 0.19.018:27
clarkbmostly I want to get the hosts out of the emergency file so they are no longer special18:27
clarkb(also so that normal config updates work again)18:28
*** jcoufal has quit IRC18:28
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Configure adns1.opendev.org via ansible  https://review.openstack.org/61464818:28
*** jcoufal has joined #openstack-infra18:28
corvusclarkb, fungi, mordred: ^ i think that fixes all the bugs shown by the first round of testing (yay for running all the jobs :)18:28
mordredcorvus: changes look reasonable18:31
mordredclarkb: ++18:31
openstackgerritRyan Beisner proposed openstack-infra/openstack-zuul-jobs master: Add py3 jobs with an unspecified minor version  https://review.openstack.org/61482318:32
*** jcoufal has quit IRC18:32
*** jcoufal has joined #openstack-infra18:33
*** jcoufal has quit IRC18:36
*** jcoufal has joined #openstack-infra18:36
*** diablo_rojo has quit IRC18:47
*** florianf is now known as florianf|afk18:48
*** diablo_rojo has joined #openstack-infra18:48
AJaegerconfig-core, do we want this in openstack-zuul-jobs or should we ask to make this charm specific in a charm repo? ^18:48
clarkbfor some reason nb03 didn't update openstacksdk when it updated nodepool. nb01 and nb02 did18:49
clarkbdifferent versions of pip3 may explain it. I am going to manually update to 0.19.0 on nb03 and restart there so all the builders run the same code18:50
clarkbah the reason is we don't require a newer sdk in nodepool18:51
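A sketch of the manual bump being described; the pip3 invocation and the service name are assumptions about how the builders are deployed, not a record of the exact commands run:

    # on nb03: pin the sdk to the release with the image-upload and port-listing fixes
    sudo pip3 install 'openstacksdk==0.19.0'
    # restart the builder so the running daemon picks up the new library
    sudo service nodepool-builder restart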
*** yamamoto has joined #openstack-infra18:53
*** rfolco has joined #openstack-infra18:54
*** rfolco|ruck has quit IRC18:56
*** apetrich has quit IRC18:56
*** diablo_rojo has quit IRC18:57
clarkbnb01-nb03 are restarted and running with openstacksdk 0.19.0 now. I have removed nl01-nl04 from the emergency file and will restart them once updated18:57
*** diablo_rojo has joined #openstack-infra18:59
*** jpich has quit IRC18:59
*** yamamoto has quit IRC19:00
*** pcaruana has quit IRC19:05
corvusmordred: i need the ipv4 and ipv6 addresses of the host in ansible -- the equivalent of ::ipaddress and ::ipaddress6 in puppet.  do you think that's ansible_default_ipv4.address ansible_default_ipv6.address ?19:06
corvus(i believe the adns ansible change is going to need one more revision because of that -- we really don't want named listening on all addresses, only the public ones, because unbound should be listening on the local addresses)19:07
*** apetrich has joined #openstack-infra19:08
openstackgerritClark Boylan proposed openstack-infra/system-config master: Nodepool.o.o is no longer a thing, remove it  https://review.openstack.org/61481419:09
clarkbinfra-root ^ should be a fairly easy cleanup change. Thank you pabelanger for the earlier review19:10
*** bobh has joined #openstack-infra19:13
*** trown|lunch is now known as trown19:14
*** bobh has quit IRC19:18
*** panda is now known as panda|off19:20
mtreinishinfra-root: it looks like there's something up with the mysql proxy on logstash.o.o again. It just seems to hang on trying to establish a connection to the database for me19:24
mtreinishat 2min and counting...19:24
*** electrofelix has quit IRC19:28
*** diablo_rojo has quit IRC19:31
mnasercould we create/maintain our own groups within gerrit somehow?19:32
*** diablo_rojo has joined #openstack-infra19:33
clarkbmtreinish: ok I'll see what I can find after lunch19:36
clarkbmnaser: all of that is driven by the acls file config. We create groups that show up in the acl files19:36
clarkbthe launchers did not update openstacksdk, because they already had an up-to-date nodepool. I'll have to manually update openstacksdk on them by hand :/19:37
mtreinishclarkb: ok thanks19:39
*** kjackal has joined #openstack-infra19:41
AJaegermnaser: and once a group is created and the first person added, you can maintain it yourself...19:42
*** jamesmcarthur has quit IRC19:43
*** sthussey has quit IRC19:51
*** jamesmcarthur has joined #openstack-infra19:58
ianwclarkb: empirically "from . yamlgroup import InventoryModule" works, but you're right, i wonder how19:58
*** jamesmcarthur_ has joined #openstack-infra19:58
*** jamesmcarthur has quit IRC20:02
openstackgerritKendall Nelson proposed openstack-infra/infra-specs master: StoryBoard Story Attachments  https://review.openstack.org/60737720:05
openstackgerritIan Wienand proposed openstack-infra/system-config master: [to squash] Fix groups.yaml for yamlgroup plugin  https://review.openstack.org/61469320:05
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add unittest for yamlgroup inventory plugin  https://review.openstack.org/61469420:05
*** eharney has quit IRC20:08
*** e0ne has joined #openstack-infra20:11
ianwclarkb / amorin: it looks like ovh-gra1 is out again.  do we want to remove it?  i guess it slows things down as we waste time trying to create nodes, although ultimately it isn't a correctness issue as the nodes eventually go somewhere else20:12
ianwinfra-core: would be great if someone else can pass an eye over the host list in https://review.openstack.org/#/c/614693/2/inventory/groups.yaml and just look for anomalies; otherwise i'll rebase & squish it into the base change when clarkb's nodepool.o.o removal stuff goes in20:15
*** jamesmcarthur_ has quit IRC20:15
clarkbianw: ya we probably should remove gra1. Let me restart nl04 to pick up the openstacksdk update (this launcher runs ovh) then we can update the config from there?20:17
clarkbI'll restart the other three launchers if ovh bhs1 continues to look happy20:17
clarkbrestart on nl04 is done20:18
ianwclarkb: ++ sounds like a plan, good idea to stage the rollout :)20:18
*** eernst has joined #openstack-infra20:19
clarkbmtreinish: simpleproxy has >7k processes running. My hunch is that it has run out of processes or fds or both20:19
mtreinishlol, ok20:19
clarkbmtreinish: we upgraded this server from trusty to xenial not too long ago. I wonder if this is a bug in simpleproxy20:19
mtreinishyeah, that sounds like a bug20:20
mtreinishor a dos :p20:20
clarkbmtreinish: I'm going to restart simple proxy and we can monitor to see if it happens again20:20
*** eernst has quit IRC20:20
mnaserclarkb: AJaeger my apologies, i meant to say was project groups that is20:20
*** eernst has joined #openstack-infra20:20
clarkbmnaser: you'll need to use more words I think for me to understand what you mean by that20:21
clarkbmtreinish: service stop totally didn't stop the service :)20:21
mnaserclarkb: something like https://review.openstack.org/#/admin/projects/API-Projects?20:21
mnaserso we can watch directly on that20:21
mnaser(and have all sorts of projects be children of it, that way, we can just set "openstack-ansible" as watched project rather than listing them all)20:22
mtreinishclarkb: hmm, it looks like it's the same version (just a package revision) between trusty and xenial: https://packages.ubuntu.com/search?keywords=simpleproxy20:22
clarkbmnaser: do project watches work that way? the API-Projects project is for common acls isn't it?20:23
mtreinishheh, it's not like it's a super active project either: https://github.com/vzaliva/simpleproxy :)20:23
clarkbmnaser: I think if you want to see events from individual projects you have to subscribe to them but I've never tested it20:23
clarkbmtreinish: ok I manually cleaned up the process and restarted the service. Can you test it now20:23
mnaserclarkb: well i bring this up because i wanted to watch projects but realized that adding all the openstack ansible stuff would take forever20:24
clarkbmtreinish: there are already 7.9k processes again20:24
mnaserand would fall out of sync20:24
clarkbmnaser: sure, I'm just not sure that the proposed solution solves the problem either :/20:24
mnaserah i see20:24
*** eernst has quit IRC20:25
*** jamesmcarthur has joined #openstack-infra20:25
mnaseris there a common way that folks have been adding a lot of projects under a team or something like that?20:25
mtreinishclarkb: yeah, still the same thing, it just hangs trying to connect20:25
*** jamesmcarthur has quit IRC20:26
clarkbmtreinish: my hunch reading the code is that maybe its a systemd interaction20:26
clarkbthey have inetd handling but probably don't do the right thing with systemd maybe?20:26
*** eernst has joined #openstack-infra20:26
*** rlandy is now known as rlandy|brb20:26
*** e0ne has quit IRC20:28
*** jamesmcarthur has joined #openstack-infra20:28
clarkbNov 01 20:20:25 logstash01 simpleproxy-mysql[26071]: start-stop-daemon: need at least one of --exec, --pid, --ppid, --pidfile, --user or --name20:29
clarkbmnaser: I think some people have scripted the subscriptions? I do the opposite and don't watch anything via gerrit because its email is already too noisy20:30
mtreinishclarkb: oh, that might be on us. I think we wrote an init file in the puppet-simpleproxy20:30
mtreinishand on xenial we might have to update it for systemd20:30
mtreinishhttp://git.openstack.org/cgit/openstack-infra/puppet-simpleproxy/tree/templates/simpleproxy-mysql.init.erb20:30
*** eernst has quit IRC20:31
*** ansmith has quit IRC20:31
*** kgiusti has left #openstack-infra20:32
clarkbmtreinish: using an init script is fine, we just need to tell start-stop-daemon how to identify the service20:32
*** dave-mccowan has quit IRC20:32
clarkblooks like simple-proxy takes a -p argument for the pidfile20:32
*** eernst has joined #openstack-infra20:32
clarkbso we can do that  or use --exec to point to the binary20:32
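One possible shape for that fix, based only on the start-stop-daemon error above and the -p pidfile option clarkb mentions; the binary path, ports, and variable names are assumptions, not the actual puppet-simpleproxy template:

    # give start-stop-daemon something to track: a pidfile that simpleproxy writes itself
    PIDFILE=/var/run/simpleproxy-mysql.pid
    start-stop-daemon --start --pidfile "$PIDFILE" \
        --exec /usr/bin/simpleproxy -- \
        -d -p "$PIDFILE" -L 3306 -R "$DB_HOST":3306   # DB_HOST: the real database server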
clarkbmnaser: reading the gerrit docs, All-Projects is special for all projects visible to the user, which is all our projects.20:33
clarkbmnaser: that behavior appears built in and isn't configurable on a subset from what I can see20:34
clarkbmnaser: what I think you can do is subscribe to All-Projects then define the search string in the only-if to match osa with a regex maybe?20:34
clarkb"Changes occurring in 'PROJECT'. If 'PROJECT' starts with ^ it matches project names by regular expression. The dk.brics.automaton library is used for evaluation of such patterns."20:35
clarkbmnaser: ^ that is what I would try. Something like subscribe to All-Projects then in onlyif box: project:^openstack/openstack-ansible.*20:35
clarkbianw: looks like the gra1 errors are due to instance quota?20:36
clarkbianw: I think our quota must've gotten out of sync there20:36
openstackgerritMerged openstack-infra/system-config master: Nodepool.o.o is no longer a thing, remove it  https://review.openstack.org/61481420:36
*** eernst has quit IRC20:37
*** eernst has joined #openstack-infra20:38
*** eernst has quit IRC20:42
clarkbmtreinish: I've done what I think start-stop-daemon wants. Can you take a look again? let's see if that fixes it. If so I can push a change up20:44
mtreinishclarkb: hmm, still looks like it's hanging20:46
clarkbmtreinish: ok it seems to forkbomb when you connect (which makes sense since it's trying to do a process per connection I think)20:47
clarkbmtreinish: my dumb telnet didn't induce it to do this20:47
mtreinishclarkb: fwiw, I"m just running 'mysql -u query --password=query -h logstash.openstack.org subunit2sql'20:50
*** slaweq has joined #openstack-infra20:50
*** eernst has joined #openstack-infra20:50
*** bobh has joined #openstack-infra20:52
clarkbmtreinish: and you must've closed the connection? because I see no more processes now20:52
mtreinishyeah I did20:53
mtreinishdidn't see the point in having it sit there20:53
ianwclarkb: maybe ... last errors in the cleanup were neutron timeouts20:53
clarkbya ok I can reproduce that locally now. Start a single connection and it forkbombs. Kill the connection, no more forks20:54
*** bobh has quit IRC20:56
*** bobh has joined #openstack-infra20:56
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add yamlgroup inventory plugin  https://review.openstack.org/60238520:56
openstackgerritIan Wienand proposed openstack-infra/system-config master: [to squash] Fix groups.yaml for yamlgroup plugin  https://review.openstack.org/61469320:56
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add unittest for yamlgroup inventory plugin  https://review.openstack.org/61469420:56
clarkbfirst attempt at strace failed. But this is reproducible so I should be able to get there20:58
*** eernst has quit IRC20:59
*** xarses_ has quit IRC21:04
*** erlon has quit IRC21:05
*** bobh has quit IRC21:05
*** eernst has joined #openstack-infra21:05
clarkbmtreinish: the strace makes it look like it is getting many connections. the sin_port changes for my localhost connection via telnet, it then forks (well really clones) a bunch of subprocesses21:06
clarkbmtreinish: https://github.com/vzaliva/simpleproxy/blob/master/simpleproxy.c#L419 that returns way more than I would expect, basically21:07
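A sketch of the kind of tracing described here, using standard strace options:

    # attach to the oldest simpleproxy process, follow forks, and watch socket/process syscalls
    sudo strace -f -e trace=network,process -p "$(pgrep -o simpleproxy)"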
*** eernst has quit IRC21:08
clarkbianw: looks like we continue to function on nl04, should be good to restart the other three launchers with new sdk now ya?21:08
mordredclarkb: yay! I always like it when things continue to function21:10
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Configure adns1.opendev.org via ansible  https://review.openstack.org/61464821:11
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Remove puppet config for opendev nameservers  https://review.openstack.org/61486921:11
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Configure opendev nameservers using ansible  https://review.openstack.org/61487021:11
*** trown is now known as trown|outtypewww21:11
corvusclarkb, mordred, fungi: ^ removed unneeded puppet stuff, fixed up an error i missed last time in adns1, reworked the group vars slightly, and added the stuff for the other nameservers21:12
corvusthe whole stack is there now, and should be ready to go modulo lingering bugs21:12
corvusnow i need to look an ianw and mordred's group stuff because they conflict21:12
corvusclarkb, ianw: yamlgroups uses regexes, right?21:13
clarkbcorvus: current implementation is fnmatch in python which is unix shell globbing21:13
corvusoh ok21:14
corvusi should be able to rework based on that21:14
corvusare we more or less ready to put yamlgroup in?  ie, should i rebase the nameserver stack on it?21:14
ianwcorvus: i think so, at least i'm happy if we have made a mistake, we have some testing to avoid making the same mistake twice :)21:15
clarkbmtreinish: ok I figured it out http://git.openstack.org/cgit/openstack-infra/puppet-simpleproxy/tree/templates/simpleproxy-mysql.init.erb#n19 db_host isn't being populated for some reason21:15
clarkbmtreinish: the default is to connect to localhost. So what is happening is first connection comes in then proxy connects to itself21:16
clarkbmtreinish: then that creates a recursive talk to myself loop21:16
corvusianw: do i understand correctly that yamlgroup is added in ansible 2.8 which is not released yet, but when it is, we can drop our local copy.21:16
ianwoh, i'm not sure on the upstreaming plans21:17
ianwit doesn't seem to be in my devel branch checkout21:18
corvusianw: oh, mordred wrote that from scratch?  i thought it was a backport.21:18
clarkbI'm taking the hiera/ansible hostvar lock21:18
corvusneat.21:18
ianwok, since you and clarkb have looked over the groups.yaml now, i'll squash it down21:19
corvusso, rather, yamlgroup is original to mordred and may be upstreamed when he gets around to it :)21:19
*** jamesmcarthur has quit IRC21:19
clarkband done21:19
mordredyeah - it's original work - haven't put much thought/energy into upstreaming yet21:19
mtreinishclarkb: ooh fun21:20
corvusianw, mordred: in that case -- real quick -- are we sure we want fnmatch and not regex? :)21:20
clarkbmtreinish: I think this should sort itself out on the next puppet run21:20
clarkbmtreinish: I had to update some hiera because the hostname changed from logstash.o.o to logstash01.o.o21:20
mordredcorvus: nope! not sure at all - I did fnmatch because of all the escaping of \. in regexes ... but I don't have strong feelings21:20
corvusianw, mordred: i'm asking on behalf of the regex "(ad)?ns\d+\.opendev\.org".  i'm sure we can just split that into two lines and it'll be fine.21:20
ianwcorvus: i don't mind ... and that was why i wrote the unit-test framework so switching it could be more like test-driven development :)21:21
corvusmordred: yeah.  i think i can live with two fnmatches for (ad)ns in exchange for not typing "\." all the time.21:21
corvusjust thought i'd check.21:21
clarkbok with logstash sorted I'm going to restart the other three nodepool launchers21:21
*** rlandy|brb is now known as rlandy21:21
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add yamlgroup inventory plugin  https://review.openstack.org/60238521:21
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add unittest for yamlgroup inventory plugin  https://review.openstack.org/61469421:21
mordredcorvus: we're getting too agreeable in our old age21:22
corvusmordred: no we're not21:22
corvusi'm rebasing the adns/ns changes on yamlgroup now21:22
mordredcorvus: yes we ARE!21:23
mordredcorvus: I left a couple of nitpick reviews on those21:23
mordredcorvus: so if you're respinning, might be worth looking at21:23
corvusmordred: ah thanks, i'll grab them in this round21:23
clarkb#status log openstacksdk 0.19.0 installed on nl01-04 and nb01-03 and all nodepool launchers and builders have been restarted21:24
openstackstatusclarkb: finished logging21:24
clarkbmordred: ^ fyi21:25
mordred\o/21:26
clarkbmnaser: when your meeting is over I'd be curious to learn whether the subscription change I described above works or not21:26
corvuswow... i just realized... we're probably about 10 lines of code away from being able to have the 'dns' test job *actually* serve a copy of the opendev zone and verify the master and authoritative servers are serving the data.21:26
mnaserclarkb: i'll give it a shot in a little bit, middle of an upgrade :<21:27
mordredcorvus: that's super cool21:27
mnaserfamous last words, 10 lines which end up taking 3 weeks to get working21:27
mnaser:P21:27
*** jamesmcarthur has joined #openstack-infra21:27
corvusmnaser: absolutely!21:27
mordredmnaser: ++21:32
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Remove puppet config for opendev nameservers  https://review.openstack.org/61486921:33
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Configure adns1.opendev.org via ansible  https://review.openstack.org/61464821:33
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Configure opendev nameservers using ansible  https://review.openstack.org/61487021:33
clarkbmtreinish: ok try it now21:33
corvusclarkb, fungi, mordred, ianw: okay that's rebased, fixed, and ready to go21:33
*** ansmith has joined #openstack-infra21:34
clarkbcorvus: ok I'll be taking a look at the whole stack shortly21:35
ianwcorvus: should we babysit through the ansible update with https://review.openstack.org/#/c/609556/ and get that out of the way?21:35
TheJuliais etherpad grumpy today or is it just my browser?21:35
mtreinishclarkb: it works! \o/21:35
mtreinishclarkb: thanks, now I can generate a new graph :)21:36
*** slaweq has quit IRC21:36
clarkbmtreinish: yay, and sorry for that, but that was totally my derp when we deployed the upgraded thing.21:36
*** agopi is now known as agopi|pto21:36
*** xek__ has quit IRC21:36
clarkbTheJulia: we've been trying to track down "slowness". There was a webserver configuration bug that we fixed yesterday. Any more data since then would be helpful to track down additional problems21:37
clarkbfungi: re ^ I'm beginning to suspect the database since the server itself seems happy now memory and cpu wise21:37
clarkbmaybe mordred can take a look at the database?21:37
TheJuliaI just closed some of my browser tabs and it seems happier, but I kept suddenly losing connection like the socket was closing out behind it, but I didn't manage to capture anything like tcpdumps of it21:38
clarkbTheJulia: did that happen recently? the webserver config bug from yesterday would've caused that21:38
clarkbTheJulia: but we now believe that to be fixed21:38
TheJuliaclarkb: multiple times with-in the last 5 minutes21:38
clarkbTheJulia: also if you haven't looked at etherpad since yesterday we had to restart things to pick up the config change21:38
corvusianw: yes let's.21:38
mordredcorvus: hrm. I may have lied to you21:39
corvusruh roh21:39
corvusmordred: i find that very disagreeable :)21:39
mordredcorvus: I agree!21:39
*** carl_cai has joined #openstack-infra21:40
mordredcorvus: ok. it's not ansible.default_ipv4 - it's ansible_facts.default_ipv4 ...21:40
mordredcorvus: and it's not ansible_facts.default_ipv4, it's ansible_facts.default_ipv4.address21:41
*** agopi|pto has quit IRC21:41
mordredcorvus: http://paste.openstack.org/show/733922/21:41
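For readers following along, the correction mordred is making is about the full fact path; a minimal illustrative task, assuming default fact gathering is enabled:

    # ansible_facts.default_ipv4 is a dict; the usable value is .address.
    - name: Show the default IPv4 address (illustration only)
      debug:
        msg: "default IPv4 is {{ ansible_facts.default_ipv4.address }}"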
clarkbTheJulia: were those pads in use before the reconnect but sometime today (so after we restarted the webserver yesterday)?21:41
TheJuliaclarkb: brand new pads21:42
TheJuliaOne time it just spun on loading the boilerplate new pad message21:42
fungiclarkb: mtreinish: i'm still catching up on scrollback, but is it possible the proxy is connecting to itself spawning another proxy recursively?21:43
fungioh, as i read further, that's precisely what happened21:43
*** bobh has joined #openstack-infra21:43
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Configure adns1.opendev.org via ansible  https://review.openstack.org/61464821:43
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Configure opendev nameservers using ansible  https://review.openstack.org/61487021:43
corvusmordred: ^21:43
clarkbfungi: yup!21:44
clarkbTheJulia: ok reading the etherpad lite logs I don't see any errors recently other than for stx-* etherpads (which seem older than brand new)21:44
clarkbTheJulia: which implies that there's probably another webserver problem21:44
mordredcorvus: that nsd.conf looks so weird with ip-address given twice21:44
TheJuliaclarkb: well, since I closed a bunch of tabs, it has seemed better behaved, some of those windows had disconnected etherpads21:45
clarkbI wonder if it is reusing connections and not noticing they have been closed21:46
corvusmordred: it is correct though.  :)  not actually a yaml file :)21:46
TheJuliait shouldn't....  considering the max lifetime in a browser is supposed to only be 2 minutes 30 seconds if not actively doing something21:47
TheJuliaor a websocket21:47
clarkbTheJulia: ya it uses websockets21:47
TheJuliaobscure browser behavior knowledge \o/21:47
TheJuliayeah, it could have been really confused then, but those shouldn't be reused across windows...21:47
TheJuliafeels like a browser bug21:48
clarkbwell let us know if you learn anything else about the behavior we can continue to try and track down the weirdness.21:48
*** jamesmcarthur_ has joined #openstack-infra21:48
clarkbfwiw I've not personally noticed etherpad issues since we fixed the server config21:48
*** bobh has quit IRC21:48
TheJuliawill do, if I see it again I'll start packet capturing and such21:48
clarkbdouble checking the server status page we have tons of free slots for connections. There are only 194 connections and we allow for 4k21:50
*** bobh has joined #openstack-infra21:51
*** jamesmcarthur has quit IRC21:52
clarkbfungi: smcginnis dhellmann ^ have any other data to add in the last ~12 hours or so?21:52
TheJuliahmmmm... "Loading..."21:52
clarkbTheJulia: this is on a newly opened pad?21:53
* TheJulia goes and opens up tcpdump21:53
TheJuliajust tried to create a new pad21:53
clarkbok I was able to click the new pad button and get one. /me tries by direct url21:53
clarkbya that works too. So it's not consistent :(21:54
clarkbfwiw I am running firefox 63.0b1421:54
TheJuliahmmmm21:54
TheJuliaI have a theory21:54
TheJuliaactually having to install tcpdump now21:54
clarkbok. Chrome 70.0.stuff seems to work too21:55
TheJuliayeah, seems my browser is rolling over on six connections and it is trying to re-use :\21:57
TheJuliaweird21:57
TheJuliaand it tries resetting the one it is killing and... wow21:58
* TheJulia switches browsers21:58
smcginnisI just tried 4 different etherpad links and they all opened in a decent amount of time.21:59
smcginnisChrome on my side21:59
* TheJulia tries to repopulate the pad again :(21:59
clarkbTheJulia: fwiw http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=115&rra_id=all shows a spike in new connections to etherpad since we upgraded the server for it. This is what pointed at a bad server config (our connection tuning stuff to allow more connections wasn't working), but it hasn't gone away since we fixed that22:00
clarkbperhaps newer apache with websockets tickles some bug in $browser?22:00
mordredclarkb: oh good22:01
TheJuliaMaybe, this is a relatively new desktop build and I've just been using the default firefox debian installed, so far chrome seems to be behaving22:01
*** jcoufal has quit IRC22:02
mordredTheJulia: my favorite part of modern life is how chrome works for a while, then stops, at which point firefox is the right choice, until something breaks it and it turns out chrome works best22:04
TheJuliamordred: sadly that doesn't come with tequila22:04
mordredTheJulia: anything can come with tequila22:05
mordredTheJulia: it's a simple solution - just add tequila22:05
TheJulia:)22:05
clarkbya chrome websocket network monitoring thing looks pretty happy here too22:07
clarkbfirefox's is unfortunately far less readable22:08
clarkboh now that is curious though22:09
clarkbit almost looks like firefox is not using web sockets22:09
clarkbbut chrome is22:09
openstackgerritMerged openstack-infra/system-config master: Install current ansible  https://review.openstack.org/60955622:10
clarkbor maybe even chrome isn't?22:11
clarkbI'm going to do a thing22:15
clarkbmod proxy wstunnel was not enabled. I have enabled it and restarted apache22:16
clarkbnow to test if my browsers act differently22:16
*** threestrands has joined #openstack-infra22:17
clarkbI think this is the fix22:18
clarkbetherpad was falling back to the non-websocket method and polling22:18
clarkbTheJulia: ^ want to see if it is happier for you now?22:18
* clarkb writes a change to make this actually puppeted22:18
TheJuliaack22:18
openstackgerritClark Boylan proposed openstack-infra/puppet-etherpad_lite master: Enable mod_proxy_wstunnel  https://review.openstack.org/61488322:20
clarkbinfra-root ^ that may fix the etherpad struggles22:20
clarkbI enabled it manually already though because having TheJulia test seems like a good idea22:20
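What the manual enable plus the proposed change amount to, sketched as an ansible playbook purely for illustration (the committed fix is the puppet-etherpad_lite change above, and the manual step was an a2enmod plus an apache restart):

    - hosts: etherpad
      tasks:
        # Without proxy_wstunnel the websocket upgrade is never proxied,
        # so etherpad clients fall back to long polling.
        - name: Enable mod_proxy_wstunnel
          apache2_module:
            name: proxy_wstunnel
            state: present
          notify: restart apache2
      handlers:
        - name: restart apache2
          service:
            name: apache2
            state: restarted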
TheJuliano issues so far22:21
TheJuliaseems to be rocking along22:21
clarkbalright I need a short break then will review the ansible dns and yamlgroup stack22:22
*** mriedem has quit IRC22:23
*** bobh has quit IRC22:24
*** boden has quit IRC22:29
dhellmannclarkb : I have not been experiencing any issues with etherpad this afternoon.22:34
dhellmannTheJulia : I've seen issues like what you describe when I had a bad JS file cache for the etherpad server.22:34
dhellmannclearing my cache fixed it22:34
TheJuliait has been rock solid since clarkb's change and the restart22:36
TheJuliaBut, I have too seen something like that in the past where I needed to update my cache22:36
TheJuliaAnyway, I think I'm done creating etherpads today.22:36
openstackgerritMerged openstack-infra/yaml2ical master: add monthly recurrence options  https://review.openstack.org/60868022:49
*** jamesmcarthur_ has quit IRC22:50
*** fuentess has quit IRC22:54
clarkbcorvus: test failure on https://review.openstack.org/#/c/614648/8 I've identified what I believe is the issue22:55
clarkbcorvus: I'm happy to push a new ps if that helps too22:55
corvusclarkb: what do you think it is?22:56
clarkbcorvus: sorry, I left a comment on the change. We check if named is running on adns but should check if bind9 is running22:57
clarkbalso there is a docs bug on the child change I'm trying to understand now22:57
corvusclarkb: oh i see22:57
*** yamamoto has joined #openstack-infra22:57
clarkbI think our :zuul:rolevar: sphinx thing may only look in the current role?22:58
corvusclarkb: i'll fix both22:58
clarkbok reading the master-nameserver README I don't see the bind_repos rolevar, which is likely the issue22:59
corvusyeah, bad copy pasta22:59
corvusclarkb: replied to comment22:59
clarkbah23:00
clarkbso ps vs systemctl23:00
corvusya23:00
corvuslike apache/httpd :)23:00
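The distinction being fixed is process name versus unit name; a hedged sketch of the kind of check the test wants (the real check lives in the system-config test suite):

    # "named" is what shows up in ps; "bind9" is the Debian/Ubuntu
    # systemd unit, so asking systemd is the reliable check.
    - name: Verify bind9 is active
      command: systemctl is-active bind9
      register: bind9_state
      changed_when: false
      failed_when: bind9_state.stdout != "active"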
*** yamamoto has quit IRC23:02
corvushrm.  so for the docs bug -- should i just drop the 'source' attribute from dns_zones in the nameserver role, or fix it?  the source attribute is only used by the master nameserver, but it's sort of expected that you'd use the same data structure for both23:02
corvusi'll fix it.  it's a nice cross-reference.23:03
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Configure adns1.opendev.org via ansible  https://review.openstack.org/61464823:03
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Configure opendev nameservers using ansible  https://review.openstack.org/61487023:03
*** tpsilva has quit IRC23:04
openstackgerritMerged openstack-infra/puppet-etherpad_lite master: Enable mod_proxy_wstunnel  https://review.openstack.org/61488323:08
openstackgerritMerged openstack/diskimage-builder master: Add ubuntu-systemd-container operating-system element  https://review.openstack.org/56374823:09
*** agopi|pto has joined #openstack-infra23:10
clarkbcorvus: why did we change from listening on any interface to the specific ip addresses?23:10
mordredclarkb: because unbound is going to listen on the local addresses23:12
clarkbah23:12
*** noama has quit IRC23:14
*** owalsh_ has joined #openstack-infra23:14
*** owalsh has quit IRC23:15
clarkbcorvus: ok pointed out one thing between the two changes that may be a bug in the group vars23:16
clarkbspecifically we set the group ns to be ns1 and ns2.openstack.org but then in the ns.opendev.org change we use group_vars/ns when testing23:16
*** kjackal has quit IRC23:19
corvusclarkb: thanks, replied23:27
clarkbcorvus: so we do need a new patchset to add ns1.opendev.org and ns2.opendev.org to the ns group?23:28
corvusclarkb: yeah, i'm revising the adns1 patch now23:28
clarkbok23:29
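The revision being prepared widens the ns group so the opendev hosts also pick up group_vars/ns; a rough sketch of the yamlgroup entry, again assuming the hypothetical fnmatch layout sketched earlier:

    groups:
      ns:
        - ns*.openstack.org
        - ns*.opendev.org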
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Configure adns1.opendev.org via ansible  https://review.openstack.org/61464823:29
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Configure opendev nameservers using ansible  https://review.openstack.org/61487023:29
corvusclarkb: how's that look?23:29
clarkbcorvus: +2 to both changes now. Thanks23:30
corvusianw: i still see 2.7.0rc1... maybe we do need latest.23:31
*** jamesmcarthur has joined #openstack-infra23:31
ianwcorvus: yeah, i wasn't sure, but i think the "ensure: latest" is probably required; otherwise it's just happy that it's there at all23:34
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Really install current ansible on bridge  https://review.openstack.org/61488923:34
corvusianw: ^23:34
clarkbhrm do we want latest though?23:34
clarkblike 2.8 is likely to break us because every release breaks the users of the last one ...23:35
corvuswell, there's a patch upon which people can pontificate23:35
clarkbcan we set a specific version /me reads docs23:35
corvusclarkb: yes, we previously set version 2.7.0rc123:35
*** jamesmcarthur has quit IRC23:35
corvusclarkb: see https://review.openstack.org/60955623:35
ianwis it better to break quickly and force us to fix it (or take action), or wait until we need to update then have the pain of any intermediate bit rot ...23:36
clarkbmy vote would be to set the version to 2.7.023:36
corvusso if we want, we can just switch to 2.7.0.23:36
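The two options being weighed, expressed with ansible's pip module purely for illustration (the change on bridge itself is a puppet manifest):

    # Option 1: track the newest release; breakage arrives unannounced.
    - name: Install latest ansible
      pip:
        name: ansible
        state: latest

    # Option 2: pin, then bump deliberately so CI can vet each upgrade.
    - name: Install pinned ansible
      pip:
        name: ansible
        version: 2.7.0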
ianwan eternal question23:36
clarkbianw: my concern would be for ansible to apply the wrong state remotely23:36
clarkbianw: rather than safely failing23:36
corvusif the failure mode is "ansible stops working" i'm fine with bleeding edge.  if it's that^ then...23:37
ianwhow much do you test ansible's ci? :)23:37
ianws/test/trust/23:37
ianwor maybe i mean test too :)23:37
clarkbha23:37
corvusif we go with clarkb's approach, we do get the benefit of our own testing (if we run all of the system-config jobs on a version bump)23:38
ianwactually future versions do seem like something we can at least minimally (in fact more than minimally) test with our jobs?23:39
ianwyeah, the trick is just having something testing it.  we could even do an "ansible git master" job?23:40
clarkbianw: oh thats a neat idea, I think we could do that23:40
ianwwhy do i keep saying testing.  i mean something updating it!23:40
ianw$ git describe --tags --abbrev=023:42
ianwv2.7.0.a123:42
ianwwould possibly be the way to test "the latest release"?23:42
clarkbya or even just point at master23:42
clarkber devel23:42
ianwyep, as long as it doesn't fail so much that nobody ever bothers to look at it23:43
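An "ansible git devel" canary along the lines ianw suggests could install straight from the branch; a hypothetical sketch:

    # Installing from the devel branch surfaces upstream breakage early,
    # instead of all at once at the next deliberate upgrade.
    - name: Install ansible from git devel (canary job only)
      pip:
        name: git+https://github.com/ansible/ansible.git@devel#egg=ansible
        state: present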
corvusclarkb: be sure to leave a -1 on 614889 :)23:43
clarkbcorvus: yup23:43
corvuslooks like adns is passing, but ns is not23:47
*** carl_cai has quit IRC23:50
corvusthough it failed on the adns portion23:50
corvusi think it's because there's no ipv6 address23:51
corvuswe can assume there will be one in production, but not in tests23:51
corvus(likewise, we shouldn't assume there will be an ipv4 address in tests too)23:51
clarkbshould be able to wrap those sections in jinja that tests if the ipv* vars are set and only sets them when they are?23:52
mordredcorvus: is it worth adding an {% if ... yeah - that ^^23:52
corvusyeah i think that'll work for now23:52
corvusthose 10 lines to make a functioning test system with notifies are going to grow to 20... but that's later :)23:52
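The guard being discussed could look roughly like this in the nsd.conf template, assuming the addresses come from gathered facts as in mordred's earlier paste:

    server:
    {% if ansible_facts.default_ipv4 is defined and ansible_facts.default_ipv4.address is defined %}
      ip-address: {{ ansible_facts.default_ipv4.address }}
    {% endif %}
    {% if ansible_facts.default_ipv6 is defined and ansible_facts.default_ipv6.address is defined %}
      ip-address: {{ ansible_facts.default_ipv6.address }}
    {% endif %}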
*** Swami has quit IRC23:53
mordredhaha23:54
*** gyee has quit IRC23:58
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Configure adns1.opendev.org via ansible  https://review.openstack.org/61464823:58
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Configure opendev nameservers using ansible  https://review.openstack.org/61487023:58
