Wednesday, 2017-08-16

openstackgerritIan Wienand proposed openstack-infra/system-config master: Add mirror01.iad.rax.o.o
ianwclarkb / fungi : ^ since we're talking load-balancing, etc, i think move to the "01" naming ... mirror01.iad.rax.o.o is up but needs puppeting00:12
ianwi should probably turn down the ttl on mirror.iad.rax.o.o to facilitate a cut-over00:12
pabelangerokay, reverse proxy cache for is working still. tripleo jobs are now hitting it00:13
clarkbianw: will the vhost answer to mirror.iad.rax.o.o if the hostname is something else?00:15
* clarkb looks00:15
pabelangerWow, mirror.dfw.rax.o.o is only 2GB of RAM currently00:16
pabelangerbut seems to be holding up00:16
ianwpabelanger: i thought that was intentional?  i also brought up the new one as a 2gb instance00:16
clarkbianw: ya posted a comment I think the vhost name will be a problem as is00:16
ianwi guess this was before we were doing a lot of reverse proxying00:17
clarkbianw: pabelanger ya I think only ord is the bigger flavor00:17
pabelangerYa, for bandwidth I think00:17
clarkbsince dfw and iad are temporarily larger quotas but are typically bigger?00:17
* clarkb double checks that00:19
clarkbya iad and dfw are typically smaller but temporarily have quota bumps thanks to cloudnull00:20
clarkbord we increased the size of and is more permanently large00:20
ianwclarkb: hmm, some sort of *'d serveralias might work?  ...00:23
clarkbianw: ya left that in my comment on the change00:24
clarkbI think you can do vhost_name => '*' instead of fqdn in site.pp00:24
ianwclarkb: it looks like puppet-httpd supports serveraliases00:24
clarkband since each vhost is on different ports on that server and not based on name I think it is fine00:24
clarkboh you mean serveralias directive, I think since we supply our own template we have to update the template too, but then how do you make iad server only respond to mirror01.iad and mirror.iad and not mirror.dfw?00:25
clarkbmaybe we just let them respond to all the names and not care too much about it?00:25
clarkbwe could also have ruby in the erb do string munging to remove the 01 in the serveralias00:26
ianwWould "ServerAlias mirror*" not match?00:27
clarkbit would but this is where puppet is painful, getting a different rule for each region00:28
fungii think using a default vhost should be fine00:30
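The "string munging" idea raised above (deriving the un-numbered alias from the numbered hostname) can be sketched as follows. This is an illustration in Python rather than the ERB/Ruby that the template would actually use, and the function name `unnumbered_alias` is hypothetical:

```python
import re


def unnumbered_alias(fqdn: str) -> str:
    """Drop trailing digits from the first DNS label only.

    Hypothetical helper: mirror01.iad.rax.o.o -> mirror.iad.rax.o.o
    """
    host, sep, domain = fqdn.partition(".")
    # Only the leading label is touched; the region and domain stay as-is.
    return re.sub(r"\d+$", "", host) + sep + domain


print(unnumbered_alias("mirror01.iad.rax.o.o"))
```

Because only the first label is rewritten, each region's server would derive an alias for its own region and never one for another region.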
*** hongbin has joined #openstack-infra00:35
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add mirror01.iad.rax.o.o
ianwclarkb / fungi : ^ untested but presented for discussion :)00:41
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add mirror01.iad.rax.o.o
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add mirror01.iad.rax.o.o
*** liusheng has quit IRC01:21
pabelangersurprisingly: we're capping out at 15Mb/s in mirror.dfw currently01:21
pabelangermirror.ord.rax is at 400Mb/s
pabelanger150Mbs for dfw*01:22
fungimaybe not all that surprising... you did say it's only a 2gb instance right?01:23
pabelangerwe're also caching a lot of things now too01:24
fungii think the 4gb instance in ord was capping out at 200mbps and then when i upgraded it to an 8gb instance we were able to spike higher (or maybe i'm getting some of those values mixed up, it's sorta late here)01:24
pabelangerthat sounds right01:24
fungiso would make sense that a 2gb instance could have a still lower bw cap01:25
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add mirror01.iad.rax.o.o
pabelangerYa, something to keep an eye on with more traffic flowing through mirrors, might result in long job run times01:27
ianwfungi: so what size do we want the xenial instances?01:28
pabelangerbut so far, reverse proxy cache is working !01:29
*** gouthamr has joined #openstack-infra01:30
fungiianw: i think it depends a bit on provider constraints (like tying bw caps to flavors) and relative quotas01:31
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add mirror01.iad.rax.o.o
ianwfungi: well, for the specific instance of this rax.iad one ... 8gb?01:31
fungii'm looking to see how that matches up to ord quota-wise01:32
fungi~75% the size of ord's quota... so yeah i'd probably do an 8gb flavor there as well01:33
fungiprobably ought to consider the same for dfw too01:34
fungisince it's only 5 instances lower than the quota for iad01:34
clarkbthose are temporary bumps though01:35
clarkb(though oversizing doesnt hurt)01:35
*** jamesmcarthur has joined #openstack-infra01:35
fungiright, more concerned that we may be breaking some jobs due to packet loss when the proxies reach the bw caps for their present flavors01:36
fungiseems to have been the case in ord anyway until i beefed up the mirror there01:36
pabelangerclarkb: fungi: we should likely calculate # jobs / available bandwidth for our mirror servers too. We might find we are over saturating some regions too.01:38
pabelangerfor example, mirror.ord I see ~ 18 Mb/s per host01:39
pabelangerin mirror.dfw, 21 Mb/s01:39
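A rough sketch of the jobs-versus-bandwidth arithmetic suggested above. The function name is hypothetical, and reading the quoted figures as "observed mirror throughput" and "rate per host" is an assumption made only for illustration:

```python
import math


def hosts_supported(mirror_cap_mbps: float, per_host_mbps: float) -> int:
    """Estimate how many hosts a mirror can serve before hitting its cap.

    Both inputs are in Mb/s. This ignores burstiness and caching, so it is
    a coarse bound and not a capacity plan.
    """
    if per_host_mbps <= 0:
        raise ValueError("per_host_mbps must be positive")
    return math.floor(mirror_cap_mbps / per_host_mbps)


# Figures quoted in the channel, used here purely as example inputs.
print("ord:", hosts_supported(400, 18))
print("dfw:", hosts_supported(150, 21))
```

Comparing the result with the number of test nodes a region can run at once would indicate whether that region's mirror is likely to saturate.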
*** dave-mccowan has quit IRC01:39
pabelangerbut, that is enough for tonight. I'm happy to see caching working01:41
*** slaweq has joined #openstack-infra01:42
fungiyeah, i'm about to disappear for the night myself, once the tc office hour wraps up in ~15 minutes01:44
*** tuanluong has joined #openstack-infra01:44
*** Apoorva_ has joined #openstack-infra01:46
*** slaweq has quit IRC01:47
clarkbalso is bw in region typically limited?01:48
clarkbin rax we use the public ip so it is01:48
clarkbbut in other clouds that may be less an issue01:48
fungiright, which is why i was saying the requirements for sizing are going to vary by provider01:51
*** thorst has quit IRC01:59
*** EricGonczer_ has joined #openstack-infra02:15
*** ramishra has quit IRC02:44
*** gongysh has quit IRC02:47
*** slaweq has quit IRC02:48
openstackgerritzhurong proposed openstack-infra/project-config master: Register solum-tempest-plugin jobs
openstackgerritzhurong proposed openstack-infra/project-config master: Register solum-tempest-plugin jobs
*** sbezverk has joined #openstack-infra03:24
*** EricGonczer_ has quit IRC03:30
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add mirror01.iad.rax.o.o
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add mirror01.iad.rax.o.o
*** slaweq has joined #openstack-infra03:44
*** slaweq has quit IRC03:49
*** ramishra has quit IRC03:55
*** david-lyle has joined #openstack-infra04:08
*** hongbin has quit IRC04:24
ianwclarkb / fungi: i think mirror01.iad.rax.o.o is close to ready ... see my comments in ->
*** slaweq has joined #openstack-infra04:45
*** r-daneel has quit IRC04:50
*** r-daneel has joined #openstack-infra04:54
*** Hal is now known as Guest5556505:18
AJaegerjeblair: 492715 is merging now - you caught all capitalizations05:23
*** markvoelker has joined #openstack-infra05:32
*** udesale__ has joined #openstack-infra05:35
*** udesale has quit IRC05:35
*** liujiong has quit IRC05:37
*** wolverineav has joined #openstack-infra05:41
AJaegeris there a status page for zuul v3? The gate was entered 25 mins ago which looks long05:49
*** slaweq has joined #openstack-infra05:51
*** tnovacik has joined #openstack-infra05:55
*** slaweq has quit IRC05:56
*** namnh has joined #openstack-infra06:05
hwoaranggood morning. has anyone reported any dns failures on jobs? i've just seen one for opensuse but i also saw another one yesterday
*** liujiong has joined #openstack-infra06:16
*** jamesdenton has quit IRC06:16
*** adisky__ has joined #openstack-infra06:19
*** zhurong has joined #openstack-infra06:22
*** wolverin_ has joined #openstack-infra06:26
*** wolverineav has quit IRC06:29
*** wolverin_ has quit IRC06:34
*** wolverineav has joined #openstack-infra06:35
*** udesale has quit IRC06:35
*** udesale has joined #openstack-infra06:36
*** slaweq has joined #openstack-infra06:40
*** kjackal_ has joined #openstack-infra06:44
*** danpawlik has quit IRC06:47
*** zhurong has quit IRC06:53
*** coolsvap has joined #openstack-infra06:54
*** markus_z has joined #openstack-infra07:05
*** shardy_afk is now known as shardy07:10
*** markus_z has joined #openstack-infra07:11
*** rcernin has quit IRC07:16
*** ccamacho has joined #openstack-infra07:18
odyssey4meclarkb ah, you'll see later in the log that it does install it - we have a wheel mirror in the build, the first attempt is to use what it has, then it reaches out if it couldn't install anything... so not firewalling, but instead a pip config restricting where it sources wheels07:20
odyssey4methat's only happening in the upgrade jobs07:20
odyssey4methanks for the ping though07:21
bogdandoo/ PTAL e-r queries to reduce unknown/uncategorised numbers07:25
*** kjackal_ has joined #openstack-infra07:29
*** shardy is now known as shardy_afk07:32
*** markvoelker has quit IRC07:36
*** alexchadin has quit IRC07:37
*** slaweq_ has joined #openstack-infra07:53
*** slaweq_ has quit IRC07:58
AJaegerjeblair: so, 492715 has still not merged. But problem might be elsewhere now since Zuul reported "Starting gate jobs"08:05
*** lucas-afk is now known as lucasagomes08:06
*** wolverineav has quit IRC08:07
*** thorst has quit IRC08:09
openstackgerritSean Handley proposed openstack-infra/project-config master: Add IRC notifications for #openstack-publiccloud.
*** Hal has joined #openstack-infra08:17
*** shardy_afk is now known as shardy08:17
*** Hal is now known as Guest5895108:17
openstackgerritMerged openstack-infra/project-config master: Add Vitrage python 35 jobs as non-voting
openstackgerritMerged openstack-infra/project-config master: Publish monasca-events-api documentation
openstackgerritMerged openstack-infra/project-config master: Add gate jobs for new openstack-ansible Octavia scenario
openstackgerritMerged openstack-infra/project-config master: [Zun] Move etcd dsvm job back to check queue
openstackgerritMerged openstack-infra/project-config master: Revert "Temporarily start using the public registry again"
openstackgerritMerged openstack-infra/project-config master: networking-midonet: Enable centos-7 jobs for stable/ocata
*** ramishra has quit IRC08:24
*** ramishra has joined #openstack-infra08:27
openstackgerritMerged openstack-infra/project-config master: Skip additional tests for Cinder doc changes
*** isaacb has joined #openstack-infra08:27
openstackgerritMerged openstack-infra/project-config master: Add pypi-jobs to masakari and related projects
openstackgerritDong Ma proposed openstack-infra/subunit2sql master: turn on warning-is-error in documentation build
openstackgerritMerged openstack-infra/project-config master: Add bandit integration job for glance_store
openstackgerritMarkos Chandras (hwoarang) proposed openstack-infra/project-config master: zuul: layout: AIO: Add openSUSE Leap 42.3 as non-voting CI job
*** wolverineav has joined #openstack-infra08:38
openstackgerritArtur Basiak proposed openstack-infra/project-config master: Provide unified gate configuration
*** wolverineav has quit IRC08:43
*** ykarel_ is now known as ykarel|lunch08:48
openstackgerritMarkos Chandras (hwoarang) proposed openstack-infra/project-config master: zuul: layout: AIO: Add openSUSE Leap 42.3 as non-voting CI job
*** slaweq has joined #openstack-infra08:54
*** slaweq has quit IRC08:59
*** lucasagomes is now known as lucas-brb09:08
openstackgerritThierry Carrez proposed openstack-infra/puppet-ptgbot master: Update to Queens, add a site index
*** sree has quit IRC09:18
*** sree has joined #openstack-infra09:19
*** sree has quit IRC09:23
*** goutham has quit IRC09:37
openstackgerritMerged openstack/diskimage-builder master: Increase timeout for removal
*** ykarel|lunch is now known as ykarel09:46
*** slaweq has joined #openstack-infra09:55
*** cuongnv has quit IRC09:58
*** slaweq has quit IRC10:00
openstackgerritSagi Shnaidman proposed openstack-infra/tripleo-ci master: DONT REVIEW: test undercloud containers
*** thorst has joined #openstack-infra10:05
*** dtantsur|afk is now known as dtantsur10:06
*** wolverineav has joined #openstack-infra10:10
*** thorst has quit IRC10:11
openstackgerritOmer Anson proposed openstack-infra/project-config master: Dragonflow: Add a gate-hook to tempest tests
*** lucas-brb is now known as lucasagomes10:18
*** slaweq has joined #openstack-infra10:19
*** yamahata has quit IRC10:21
openstackgerritSagi Shnaidman proposed openstack-infra/tripleo-ci master: Create a whitelist for /etc configs
*** chandankumar is now known as chkumar|travel10:27
*** udesale has quit IRC10:29
openstackgerritSagi Shnaidman proposed openstack-infra/tripleo-ci master: Exclude list for logs collection
*** slaweq has quit IRC10:34
*** slaweq has joined #openstack-infra10:36
openstackgerritSagi Shnaidman proposed openstack-infra/tripleo-ci master: Create a whitelist for /etc configs
*** kjackal_ has quit IRC10:47
AJaegerwow, we're already at our capacity of cloud nodes ;(10:49
openstackgerritDmitry Tantsur proposed openstack-infra/project-config master: Add missing publish-to-pypi to networking-baremetal
*** thorst has joined #openstack-infra11:04
*** jpena is now known as jpena|lunch11:05
*** isaacb has quit IRC11:09
*** isaacb has joined #openstack-infra11:10
*** lucasagomes is now known as lucas-hungry11:10
*** kjackal_ has joined #openstack-infra11:18
*** sree has joined #openstack-infra11:24
*** martinkopec has joined #openstack-infra11:29
*** jkilpatr has joined #openstack-infra11:30
*** slaweq has quit IRC11:43
*** rhallisey has joined #openstack-infra11:43
*** slaweq has joined #openstack-infra11:43
*** slaweq has quit IRC11:44
*** mat128 has joined #openstack-infra11:44
*** slaweq has joined #openstack-infra11:45
*** slaweq_ has joined #openstack-infra11:57
*** lucas-hungry is now known as lucasagomes11:59
ShrewsAJaeger: there is a zuulv3 status page
*** slaweq_ has quit IRC12:02
*** trown|outtypewww is now known as trown12:04
*** slaweq has quit IRC12:05
*** slaweq has joined #openstack-infra12:05
AJaegerShrews: thanks.12:06
AJaegerdo you know what happened to 492715 ?12:07
*** jamesdenton has quit IRC12:07
*** markvoelker has quit IRC12:08
*** jcoufal has joined #openstack-infra12:10
*** slaweq has quit IRC12:10
ShrewsAJaeger: unfortunately no12:10
*** dizquierdo has quit IRC12:19
numansAJaeger, Hi, can you please add this to your review queue -
*** dizquierdo has joined #openstack-infra12:22
AJaegernumans: please ask EmilienM to review this first . Once he's happy, I'll review.12:24
numansAJaeger, sure.  thanks. EmilienM can you please add this to your review queue -
EmilienMdone ^12:27
openstackgerritDmitry Tantsur proposed openstack-infra/project-config master: Add missing publish-to-pypi to networking-baremetal
*** atarakt has quit IRC12:29
*** efoley has quit IRC12:31
fungiAJaeger: i'll take a peek at the debug logs on zuulv3.o.o and see whether i can suss out where 492715 went12:35
*** lucasagomes is now known as lucas-brb12:48
openstackgerritThierry Carrez proposed openstack-infra/puppet-ptgbot master: Update to Queens, add a site index
openstackgerritsebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync
*** spzala has quit IRC12:51
fungittx: thanks12:52
AJaegerthanks, fungi for looking into zuulv312:56
fungiAJaeger: looks like when the trove instance for the zuulv3 mysql reporter was created, our typical default configuration overrides were not applied (which include extending the wait_timeout to 28800 (the upstream mysql default value) instead of whatever absurdly low inactivity timeout rax sets for their deployments)12:58
*** slaweq_ has joined #openstack-infra12:58
AJaegercan we apply those now?12:59
fungihowever, that has exposed that our db query socket implementation in zuul v3 is not very robust in the face of disconnects12:59
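One generic way to tolerate a server-side inactivity timeout such as wait_timeout is to discard any connection that has sat idle for too long and open a fresh one before use. The sketch below is not taken from Zuul; the class name, the `connect` factory and the injectable `clock` are all hypothetical and exist only to illustrate the idea:

```python
import time


class RecyclingConnection:
    """Hand out a connection, replacing it once it has been idle too long."""

    def __init__(self, connect, max_idle_seconds, clock=time.monotonic):
        self._connect = connect          # zero-argument factory (assumed)
        self._max_idle = max_idle_seconds
        self._clock = clock              # injectable so the logic is testable
        self._conn = None
        self._last_used = None

    def get(self):
        now = self._clock()
        stale = (
            self._conn is not None
            and now - self._last_used >= self._max_idle
        )
        if stale:
            try:
                self._conn.close()
            except Exception:
                pass  # the server may already have dropped the socket
            self._conn = None
        if self._conn is None:
            self._conn = self._connect()
        self._last_used = now
        return self._conn
```

Setting `max_idle_seconds` comfortably below the server's timeout means a stale socket is never handed to a query.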
*** markvoelker has joined #openstack-infra12:59
fungiAJaeger: not sure yet... i think go ahead and try a recheck, but i'm curious whether zuul will reestablish its connection without a restart now13:00
*** gongysh has quit IRC13:00
*** gongysh has joined #openstack-infra13:00
*** gongysh has quit IRC13:00
*** Julien-z_ has joined #openstack-infra13:00
*** gongysh has joined #openstack-infra13:00
*** EricGonczer_ has joined #openstack-infra13:01
AJaegerrestart issues, I see it on zuulv3.openstack.org13:01
*** lathiat has quit IRC13:01
fungi#status log trove configuration "sanity" created in rax dfw for mysql 5.7, setting our usual default overrides (wait_timeout=28800, character_set_server=utf8, collation_server=utf8_bin)13:01
openstackstatusfungi: finished logging13:01
openstackgerritMonty Taylor proposed openstack-infra/shade master: Use new keystoneauth version discovery
*** slaweq_ has quit IRC13:03
*** lathiat has joined #openstack-infra13:03
fungiAJaeger: i saw that you didn't know about the v3 status page being up... have you tried clicking the log link for an in-progress job?13:03
*** Julien-zte has quit IRC13:03
*** isaacb has quit IRC13:04
AJaegerfungi: WOW! Thanks for showing that to me.13:04
fungithis is going to be NICE13:04
*** gongysh has quit IRC13:05
AJaegeryes, it is!13:06
openstackgerritMerged openstack-infra/openstack-zuul-jobs master: Use openstack-publish-artifacts base job
fungiAJaeger: ^ working i guess13:06
AJaegerfungi: yeah, finally!13:07
AJaegerjeblair: it merged now ^13:07
AJaegerfungi: that single change needed two changes in project-config and fixing the database to merge. Good that we find this now...13:08
*** esberglu has joined #openstack-infra13:08
mordredAJaeger: yup! that's why we're running it on ourselves first :)13:09
fungiAJaeger: i also discovered last night that we hadn't yet granted zuul permission to leave verify -2..+2 votes nor submit to merge in gerrit13:09
fungiso... we're ironing out a lot of configuration gotchas on our end this way13:10
*** spzala has joined #openstack-infra13:12
*** spzala has quit IRC13:12
*** spzala has joined #openstack-infra13:12
*** kgiusti has joined #openstack-infra13:12
*** jamesmcarthur has joined #openstack-infra13:19
openstackgerritDavanum Srinivas (dims) proposed openstack-infra/devstack-gate master: Update grenade settings for stable/pike
openstackgerritNuman Siddique proposed openstack-infra/project-config master: Add TripleO scenario007-container experimental job for OVN
mnasermorning pabelanger13:20
openstackgerritMonty Taylor proposed openstack-infra/zuul-jobs master: Retry updating apt-cache
mnaserfyi i've been monitoring # of jobs ran on our cloud so far vs timeouts of voting jobs and 7 out of 308 timed out in the past 24 hours13:21
mnaserill still try to see if we can minimize it even more13:21
dtantsurfolks, could you please check ? it blocks releasing networking-baremetal13:26
pabelangerclarkb: fungi: ianw: apache2 process in infracloud-chocolate has died 3 times this morning13:27
*** jamesmcarthur has quit IRC13:28
pabelanger8 times on infracloud-vanilla13:29
*** jamesmcarthur has joined #openstack-infra13:29
pabelangerERROR: apport (pid 7398) Wed Aug 16 13:22:48 2017: apport: report /var/crash/_usr_sbin_apache2.0.crash already exists and unseen, doing nothing to avoid disk usage DoS13:31
*** dave-mccowan has joined #openstack-infra13:31
*** chlong_ has joined #openstack-infra13:31
pabelangerfungi: clarkb: ianw: best I can get without running apache2-dbg:
pabelangerguess we should speed up our upgrade to xenial13:39
*** felipemonteiro has quit IRC13:40
openstackgerritMohammed Naser proposed openstack-infra/project-config master: Bump vexxhost max-servers to 40
mnaserpabelanger fungi ^ this should help a bit in clearing the queue13:41
mnaser(slow bumps because i dont want to cause more of a mess and closely monitoring timeouts/etc)13:41
fungiit should be built with debug symbols, so just installing that package (even temporarily) to provide resolution for them should allow you to get more useful detail out of the dump13:41
AJaegerthanks, mnaser13:43
mnaseror AJaeger too :D thank you13:43
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Recycle stale SQL connections
fungipabelanger: when you have a sec, any chance you have a moment to review 484798 (pretty small change) so ttx and diablo_rojo can announce the ptgbot is up and running?13:43
mnaseroh my13:44
mnaserthose zuul v3 logs are super badass13:44
pabelangerfungi: looking13:44
fungimnaser: super badass is the zuul v3 motto, i think13:44
mnaseri am just curious about how this would scale13:45
fungiso are we13:45
mnaseraha :p13:45
mnaserbut this makes for a cool new change eliminating the need for public facing ci workers13:45
fungithe log muxing is decoupled a bit with the idea that we should be able to scale it horizontally if needed13:45
fungimnaser: the actual implementation is fascinating13:46
fungimnaser: zuul executors serve the console logs via finger protocol13:46
fungiand then there's a websockets proxy which feeds them to your browser13:46
mnaseras long as it only streams logs on demand, i dont think it should be an issue (only a few crazy people watch their CI jobs like me i think, aha)13:46
mnaseryeah, i saw the websockets part, i did a little searching around :-P13:47
fungiright, basically the proxy connects via finger to request the stream for a specific log13:47
mordredmnaser: for scaling - the websocket streamer is actually scaleout/load-balancer-able13:47
mnasermordred nice!13:48
mordredmnaser: so we can run as many web frontends as needed - and they get the data from the executors - which can also be scaled out to handle load as needed13:48
fungiright, the main scaling concern is the finger server, though we batted around some thoughts on how to tackle that if needed13:48
mnasermordred are you saying its web scale? :-P13:48
fungimore of a yagni situation though13:48
mordredmnaser: so _hopefully_ it'll prove to be a pleasingly scalable system - however, we only have one so far :)13:48
mnaserbut that's awesome.  i'm really excited for the transition.13:48
mordredmnaser: if /dev/null is very fast, i'll put my data in it13:48
mnaseryour apps should be stateless so /dev/null should be your storage, databases are so old school13:49
mordredmnaser: it's step one on the path to cloud native :)13:49
fungiit's the next step past nosql13:49
fungiwe'll eventually achieve nosoftware13:50
fungiless is more!13:50
pabelangerYa, speaking of scale, I've been wondering if we'll setup regional zuul-executors (alongside mirrors) to help cut down on the delay of pushing git contents to nodes13:50
*** lucas-brb is now known as lucasagomes13:53
*** dhajare has quit IRC13:55
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Recycle stale SQL connections
*** chlong_ has quit IRC13:59
mordredpabelanger: it might be a thing to consider, but it would require some additional plumbing in zuul, as currently jobs are distributed to executors without any knowledge of location14:00
fungipabelanger: well, we had also talked about a future state where nodepool could add and remove executors as load dictates, and pairing that with region-specific scheduling could get weird/inefficient14:00
*** dimak has quit IRC14:01
pabelangermordred: Ya, I was thinking about that too the other night.  I think we'd need to somehow filter gearman requests to a specific region for executors ( I think that is the right place).14:01
mordredfungi: it could - although it has the possibility to be nice if done well14:01
pabelangerfungi: Ya14:01
*** dimak has joined #openstack-infra14:01
pabelangermordred: jeblair: I think we are ready to test zuulv3-dev uploading with secret: both clarkb and fungi have reviewed, do you both mind looking again14:03
*** slaweq_ has quit IRC14:04
AJaegerpabelanger: hwoarang added a few links to timeouts to - do we want to merge the request or do you have an idea how to fix those?14:05
*** EricGonc_ has joined #openstack-infra14:06
*** dtantsur is now known as dtantsur|bbl14:10
*** felipemonteiro_ has quit IRC14:10
*** ykarel|afk has joined #openstack-infra14:11
*** lbragstad has joined #openstack-infra14:13
pabelangerAJaeger: hwoarang: A quick look at logs shows might be hitting directly, hard to tell since not many logs on that job. Also see other yum repos, we likely can mirror. But ya, increase in timeout okay with me, still under 90mins14:13
mordredpabelanger: lgtm +2 - when we update it for real I think we should do fungi's shred thing too14:16
*** ralonsoh_ has quit IRC14:16
pabelangerAJaeger: hwoarang: hard to tell, but is job configured to use wheels too? I see it adding new pip.conf files14:17
hwoarangi think it does use wheels14:18
mordredpabelanger: wait - doesn't the job need to request the secret?14:18
pabelangermordred: Oh, maybe. Looking14:19
*** mat128 has joined #openstack-infra14:20
*** gongysh has joined #openstack-infra14:21
openstackgerritPaul Belanger proposed openstack-infra/project-config master: Create site_zuulv3_dev secret
mordredpabelanger: lgtm. in the etherpad, you've got " convert playbooks/publish/openstack-tarball.yaml to a role (publish-to-tarballs) [pabelanger]14:24
mordredwith that change listed beside it - that still on your plate or you want me to do it when I do the playbook tasks below? (I'm fine either way)14:24
pabelangermordred: sure, if you want to convert14:27
*** wolverineav has quit IRC14:28
*** wolverineav has joined #openstack-infra14:29
*** slaweq has quit IRC14:30
*** rbrndt has joined #openstack-infra14:31
*** LindaWang has quit IRC14:31
*** wolverineav has quit IRC14:32
mordredpabelanger: found one more bug14:37
*** wolverineav has joined #openstack-infra14:37
jeblairfungi, mordred, pabelanger: should we also have zuul shred the ansible variables files to which it writes secrets?14:39
fungidoes zuul remove those files after they're written?14:40
fungiis that done just as a batch when performing ephemeral cleanup of the tmpdir?14:40
fungior are they explicitly removed as soon as they get used?14:40
jeblairthe batch cleanup14:41
*** felipemonteiro has joined #openstack-infra14:41
jeblairthe only other special thing about them is that they are only bind-mounted into the bwrap container for the playbook run they're used for.  so for the other playbook runs, they sit in the jobdir, but outside the container.14:41
fungioverwriting sensitive file contents with random garbage and calling sync prior to unlinking mitigates some kinds of harvesting of secrets off discarded physical media or if the hypervisor fails to zero a disk before reusing blocks14:42
*** felipemonteiro_ has joined #openstack-infra14:42
fungiif they're written all the way through the local filesystem caching layer to the block device (as opposed to, say, using tmpfs) then it might be worthwhile as long as it's not a lot of added code complexity14:42
jeblairnow that's an interesting idea... let me see if we can tell bwrap to put them on a tmpfs14:43
*** jamesmcarthur has quit IRC14:43
fungigranted, it's far from foolproof as some kinds of filesystems or virtual storage layering will cause the overwriting to go to a new block (especially true of ssd or similar solid-state media)14:44
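The overwrite, sync, then unlink sequence described above can be sketched as below. This is a minimal illustration and not Zuul's implementation; the function name is hypothetical, and as noted in the discussion the overwrite may not reach the original blocks on SSDs or copy-on-write storage:

```python
import os


def overwrite_and_unlink(path: str, passes: int = 1) -> None:
    """Overwrite a file with random bytes, force it to disk, then remove it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            # Reads the whole replacement into memory; fine for small
            # secrets files, not intended for large ones.
            f.write(os.urandom(size))
            f.flush()
            os.fsync(f.fileno())  # push the data past the page cache
    os.remove(path)
```

Keeping the file on a tmpfs avoids the problem at the source, since the contents then never reach a block device unless they are swapped out.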
openstackgerritPaul Belanger proposed openstack-infra/project-config master: Create site_zuulv3_dev secret
pabelangermordred: thanks14:44
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Recycle stale SQL connections
mordredjeblair: question about precedence ...14:45
*** srobert has joined #openstack-infra14:45
mordredjeblair: nevermind14:46
*** felipemonteiro has quit IRC14:46
jeblairfungi: i'll move the shred conversation to #zuul14:46
openstackgerritMonty Taylor proposed openstack-infra/zuul-jobs master: Add two roles for publishing artifacts over ssh
openstackgerritMonty Taylor proposed openstack-infra/project-config master: Use artifact publication roles from zuul-jobs
*** tjones has joined #openstack-infra14:49
openstackgerritSagi Shnaidman proposed openstack-infra/tripleo-ci master: DONT REVIEW: test removing fake image from multinode
*** alexchadin has quit IRC14:54
*** xyang1 has joined #openstack-infra14:54
*** ykarel|afk is now known as ykarel14:59
clarkbianw: puppetry to do digited mirrors lgtm14:59
mnaserjeblair fungi regarding secrets, i think it could be interesting to have a look at trove.  i believe some folks had similar concerns which were resolved by using ramfs to store secrets15:00
mnaserthe concern was 'what if someone snapshotted the vm during a run'15:00
mnaserand given that in infra's use case at least, you don't really "trust" the infrastructure (well, we trust each other but you know what i mean :))15:01
*** spzala has quit IRC15:01
rybridgesHello everyone. Is there anything else that you guys need me to add before we can merge this? ->
fungimnaser: agreed. granted tmpfs is basically a ramfs just abusing the kernel's filesystem cache without providing a backing block device15:02
rybridgesOkay great! Thanks clarkb15:04
mnaserfungi i think so, there is a ramfs and tmpfs section here15:04
jeblairmnaser, fungi: tmpfs does have the advantage of being supported by bubblewrap, so it's easy to add to the restricted environment we're running in15:04
jeblairof course, we could make our own tmpfs/ramfs and then bindmount it in as a normal directory.  i think.15:05
fungimnaser: or run swapless15:05
*** slaweq has quit IRC15:05
openstackgerritMonty Taylor proposed openstack-infra/zuul-jobs master: Add two roles for publishing artifacts over ssh
*** sshnaidm is now known as sshnaidm|afk15:05
mnaserfungi i don't fully have the context but i know that some jobs generate swap with sudo locally in infra afaik15:06
mnaser(i know zuul v3 isn't just -infra's solutions but its a use case :p)15:06
jeblairmnaser: in this case, the swap in question would be on the infra-managed server where zuul is running (not on a test node)15:06
*** rcernin has quit IRC15:07
mnaserjeblair oh i see15:08
mnaserfair enough :D15:08
fungiand one which is likely harder for us to control15:08
pabelangerfungi: clarkb: mordred: jeblair: okay, I think is ready now! :) zuulv3 secret for zuulv3-dev publishing15:09
mordredfungi: yup- but - that's a user-content issue so caveat emptor :)15:10
clarkbmordred: pabelanger jeblair and while I'm thinking about it a secret can only be used by a job in the same zuul.yaml file?15:11
clarkbor is any trusted repo allowed to use any secret?15:11
pabelangerIf I understand correctly now, we pin secrets to just jobs now15:13
pabelangerso only  publish-openstack-artifacts has access to it15:14
clarkbpabelanger: right but could I push a change to a different trusted repo that used that secret?15:14
mordredclarkb: I'm pretty sure it's same-repo15:14
pabelangerclarkb: I'll defer to jeblair, but I don't think so15:14
mnaserdid rax-ord mirror have issues over yesterday/today?15:15
pabelangerlimited to zuul.yaml file15:15
mnaser -- this job failed with "transfer closed with 351490 bytes remaining to read" on two RPMs15:15
pabelangermnaser: possible, we are having some apache2 crashes15:15
clarkbpabelanger: mordred cool thanks (that definitely helps with reviewing and not needing to expand context to all the trusted repos)15:15
mordredpabelanger, clarkb: I'm not 100% sure about zuul.d dirs - I *think* those count as "same zuul.yaml file" since they're in the same repo15:16
mordredbut I do not know for 100% sure15:16
mordred(I would hope they count as the same thing)15:16
pabelangermnaser: clarkb: Ya, mirror.ord.rax is unhappy.  Core dumps of apache every 5mins15:17
*** dougs1 has joined #openstack-infra15:17
*** dougs1 has left #openstack-infra15:17
bogdandoto elastic-recheck folks, PTAL to reduce unknown/uncategorised numbers15:17
*** dougs1 has joined #openstack-infra15:18
pabelanger[393640.798829] net_ratelimit: 1 callbacks suppressed15:18
pabelangersee that in dmesg15:18
clarkbpabelanger: I've +2'd the secrets change but not approved it in case jeblair wants to take a look. You now have three +2's so I think you are good to go ahead either way15:18
pabelangera lot15:18
clarkbI believe the net_ratelimit thing is a xen behavior that is mostly just noise15:18
*** marst has joined #openstack-infra15:19
fungimnaser: current plan is to try upgrading the mirrors to xenial before troubleshooting further. if memory serves we started noticing these segfaults on web servers we run after upgrading from precise to trusty, so expecting them to go away when upgrading from trusty to xenial isn't entirely insane15:19
clarkbyou can tune interface settings to make it go away iirc15:19
mnaserclarkb pabelanger i wonder if the hypervisor cpu/memory has issues15:19
pabelangerthat is a new server this week15:19
mnaserah okay15:20
clarkbbigger new server too15:20
mnasernow the other issue15:20
clarkbI've got to do some home things for a bit but maybe we can get ianw's change in then start a xenial replacement in ord?15:20
mnasersudo yum -y groupinstall "Development Tools" failed .. but zuul task didnt fail15:21
clarkbor continue ahead with iad since I think it is unhappy there too15:21
mnaserwould anyone know why? does zuul go based on the exit code of the entire "shell" job?15:21
clarkb(I'm not sure where ianw got iad yesterday)15:21
pabelangermnaser: clarkb: fungi: [Errno 14] curl#18 - \"transfer closed\" does seem limited to rackspace over the last 10 days.  Only 3 hits in infracloud15:21
clarkbmnaser: yum doesn't fail if at least one item successfully installs. And yes the command zuul executes' return code determine job status15:22
mnaserclarkb so i take it the || true after pip uninstall is hiding the yum exit status15:22
mnaserbecause that will always return true15:22
pabelangermnaser: I wonder if you created a mirrorlist for yum, but added 10 entries for the same mirror. Would that cause the yum client to try 'another mirror'15:23
clarkbmnaser: ya but not being set -e will mask all the yum return codes too15:23
pabelangerEmilienM: mwhahaha: ^ see comment about mirrorlist15:23
clarkbmnaser: so setting it errexit may be the simplest fix there15:23
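The exit-status behaviour clarkb and mnaser describe can be reproduced with a small sketch. It assumes a POSIX `sh` is on the PATH, and it uses `false` as a stand-in for the failing `yum` command; the `run` helper is hypothetical and not part of any job in this log.

```python
import subprocess


def run(script: str) -> int:
    """Run a script with POSIX sh and return its exit status."""
    return subprocess.run(["sh", "-c", script]).returncode


# Without errexit the failing command is ignored; the script's status
# is the status of the last command, so the failure is masked.
masked = run("false; true")

# With errexit (set -e) the script stops at the first failing command.
strict = run("set -e; false; true")

# `|| true` swallows the failure of that one command, even under errexit.
swallowed = run("set -e; false || true")

print(masked, strict, swallowed)
```

Under these assumptions only the `set -e` variant without `|| true` reports the failure back to whatever ran the script.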
clarkbok I've got to do some home things for a bit, back as soon as I can15:24
mwhahahapabelanger: this is before we update the mirrors in the ci code15:24
mwhahahapabelanger: this is purely infra stuff, we didn't even get to the puppet ci builder stuff. it failed in bindep processing15:24
mwhahahapabelanger: so you'd have to do that within the images15:25
pabelangermwhahaha: Right, I was mostly curious about the mirrorlist question, do you happen to know?15:25
*** udesale has quit IRC15:25
pabelangerMy next test is going to create one, but add the same mirror hostname to it 10 times15:25
pabelangerand see if yum will kick to the 'next mirror' and try to download the package15:26
pabelangerbut, really hitting the same mirror15:26
mwhahahapabelanger: yea i believe you can fake it by listing it multiple times15:26
pabelangerYa, I am starting to think this is likely the next step. mirrorlist to have yum client attempt multiple downloads15:27
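The idea being floated here, listing the same mirror several times so a client moves on to the "next" entry after a broken transfer, can be sketched as a toy model. This is not yum's actual retry logic (whether yum treats repeated entries as distinct mirrors is exactly what pabelanger plans to test); `download_with_failover`, `flaky_fetch`, and the example URL are hypothetical.

```python
from typing import Callable, List, Optional, Sequence


def download_with_failover(mirrors: Sequence[str],
                           fetch: Callable[[str], bytes]) -> bytes:
    """Try each mirrorlist entry in order until one download succeeds."""
    last_error: Optional[Exception] = None
    for url in mirrors:
        try:
            return fetch(url)
        except OSError as exc:  # ConnectionError is a subclass of OSError
            last_error = exc
    raise OSError("no more mirrors to try") from last_error


attempts: List[str] = []


def flaky_fetch(url: str) -> bytes:
    """Hypothetical fetcher: the first two transfers are cut short."""
    attempts.append(url)
    if len(attempts) < 3:
        raise ConnectionError("transfer closed with bytes remaining to read")
    return b"rpm-bytes"


# The same (made-up) mirror listed ten times, as proposed in the discussion.
mirrors = ["http://mirror.example.org/centos/"] * 10
data = download_with_failover(mirrors, flaky_fetch)
```

In this model a broken transfer costs one list entry, so ten identical entries behave like ten retries against the same host.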
*** coolsvap has quit IRC15:27
*** shardy is now known as shardy_afk15:27
openstackgerritMohammed Naser proposed openstack-infra/project-config master: Fail early if any Puppet preparation commands fail
mnaserEmilienM mwhahaha ^ when you have any free time15:30
*** links has joined #openstack-infra15:33
*** dimak has joined #openstack-infra15:39
pabelangermnaser: mwhahaha: EmilienM: so, CentOS-Base.repo would be setup to use mirrorlist:
pabelangerI'll do some testing in a bit to see if that actually works, then maybe we can update to do it for default repos15:40
jeblairi'm going to try pushing some inap rename changes through today15:43
*** dougs1 has left #openstack-infra15:43
jeblairinfra-root: so, if nodepool dies, that may be why.15:43
mnaserpabelanger i wonder why yum just doesn't retry on its own15:43
pabelangermnaser: it does, 10 times by default15:44
pabelangermnaser: however, I am not sure it retries if the connection is broken15:44
*** yolanda has quit IRC15:44
pabelangermnaser: I think it would fail over to next mirror, if setup15:44
pabelangerYay, secrets merged!15:44
mnaserpabelanger ah yes maybe failed downloads are not recoverable in yum (maybe)15:44
pabelangerstarting testing on zuulv3-dev.o.o15:44
*** dave-mcc_ has joined #openstack-infra15:45
openstackgerritMohammed Naser proposed openstack-infra/project-config master: Fail early if any Puppet preparation commands fail
mnaserAJaeger thanks for the comments, addressed :)15:45
*** ccamacho has joined #openstack-infra15:46
*** dave-mccowan has quit IRC15:47
*** yamahata has joined #openstack-infra15:47
*** spzala has quit IRC15:51
jeblairinfra-root: i propose to remove zl07, zl08 and zl09 from the current zuulv2 deployment and create ze02, ze03 and ze04 for the zuulv3 deployment so that we can test startup times with no net change in server footprint.15:51
*** spzala has joined #openstack-infra15:51
jeblairi'll note that 08 and 09 seem not to have ever gone into production for some (iptables?) reason, so it's only a net loss of one launcher from the v2 deployment.15:51
jeblairhow does that sound?15:52
fungijeblair: sounds fine to me since we've significantly dropped our aggregate quota recently anyway15:52
pabelangerjeblair: mordred: fungi: clarkb: while the job failed, did properly connect to zuulv3-dev.o.o, if people would like to confirm logs15:52
*** isaacb has quit IRC15:52
pabelanger was created15:52
pabelangerso, SSH key worked as expected15:52
*** dave-mcc_ has quit IRC15:52
clarkbjeblair: yes what fungi said, I expect it to be fine unless we fall into a lot more new cloud quota15:53
*** Sukhdev_ has joined #openstack-infra15:53
openstackgerritGage Hugo proposed openstack-infra/project-config master: Skip non-doc jobs in certain cases
fungipabelanger: excellent!15:54
*** tumbarka__ has joined #openstack-infra15:54
pabelangerfungi: Ya, super exciting!15:54
clarkbpabelanger: one thing I notice is we gather facts for the "logs" server. Might want to turn that off as hundreds of jobs all collecting facts seems unnecessary15:54
*** martinkopec has quit IRC15:55
pabelangerclarkb: Ya, we have fact caching right now, but that is limited per job. But agree, right now we are not using any facts on that playbook15:55
*** spzala has quit IRC15:55
*** pcaruana has quit IRC15:57
openstackgerritPaul Belanger proposed openstack-infra/project-config master: Disable facts on publish-openstack-artifacts jobs
pabelangerclarkb: ^15:59
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Remove zl07-zl09; add ze02-ze04
jeblairpabelanger: remind me why gathering facts is enabled by default?16:00
*** slaweq has joined #openstack-infra16:01
pabelangerjeblair: we enabled them because we started caching facts16:02
jeblairpabelanger: i see the change where you turned it on, but there was no reason16:02
jeblairpabelanger: as you point out, the caching is ineffective across job runs16:02
mordredjeblair: we gather them because we need them for things like os_type16:02
mordredin modules like 'package'16:02
pabelangerright, but its possible to have multiple plays for a job, so in that case fact cache is helpful16:02
mordredbut we turn on caching and then gather them once in the first playbook, so that for any given job we only gather them once per node16:03
jeblairmordred: okay, so it's required on the build nodes for some roles16:03
mordredwe also put in a blank file for localhost so that we don't fact-gather iirc16:03
jeblairmordred: ya16:04
mordredkklimonda: the second16:04
mordredkklimonda: also, puppet agent had a tendency to hang when run in daemon mode16:04
*** wolverineav has joined #openstack-infra16:04
mordredkklimonda: but it was the sequencing that pushed us to using ansible to run puppet16:04
fungikklimonda: basically, puppet is not great at complex task orchestration so we went with ansible to provide that16:05
mordredkklimonda: basically, when creating new projects, we need to create them on the mirrors first, then on the gerrit server, or else things go to the bad place16:05
*** jascott1 has joined #openstack-infra16:05
fungibut puppet does excel at declarative configuration management (as long as you keep all its actions idempotent) so we continued using it for that purpose16:05
mordredyup. that and we have a large pile of it :)16:06
pabelangermordred: jeblair: technically we could share fact cache across multiple playbook runs, maybe something to discuss at ptg16:06
*** slaweq has quit IRC16:06
jeblairpabelanger: we do share it across multiple playbook runs16:06
kklimondathanks, another thing to consider :)16:07
jeblairpabelanger: we don't share it across multiple jobs16:07
pabelangerjeblair: sorry, yes, multiple jobs runs16:07
clarkbjeblair: pabelanger right and that only really becomes a problem when collecting facts on shared nodes (like our logs/tarballs/archival servers)16:07
jeblairpabelanger: we should not share facts across job runs.  there is almost nothing in common.16:07
jeblairexcept what clarkb says16:07
jeblairwhich is the exception to the rule16:07
clarkbbecause its a potential dos against that server16:07
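The per-job caching described above can be illustrated with a toy model: facts are gathered once per host within a job and reused by later playbooks, but nothing is shared between jobs, so a shared host such as the logs server is still contacted once per job. `JobFactCache` and `fake_gather` are hypothetical names and do not reflect Ansible's or Zuul's actual implementation.

```python
from typing import Callable, Dict, List


class JobFactCache:
    """Toy per-job cache: gather facts for a host at most once per job."""

    def __init__(self, gather: Callable[[str], dict]) -> None:
        self._gather = gather
        self._facts: Dict[str, dict] = {}

    def facts_for(self, host: str) -> dict:
        if host not in self._facts:
            self._facts[host] = self._gather(host)
        return self._facts[host]


gather_calls: List[str] = []


def fake_gather(host: str) -> dict:
    """Hypothetical stand-in for fact gathering; records who was contacted."""
    gather_calls.append(host)
    return {"os_type": "linux"}


# Three jobs, each with three playbooks touching a build node and the logs server.
for job in range(3):
    cache = JobFactCache(fake_gather)  # a fresh cache per job, nothing shared
    for playbook in ("pre", "run", "post"):
        cache.facts_for("build-node")
        cache.facts_for("logs-server")

logs_gathers = gather_calls.count("logs-server")
```

In this model the logs server is contacted once per job rather than once per playbook, which is why the load only becomes a concern when hundreds of jobs target the same shared host.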
mordredyah. if we need facts for the log/tarball server - we could come up with a way to pre-populate the fact-cache with information about them16:08
pabelangerYa, facts on control plane servers would be the use case. Not an issue ATM, because we don't need any facts16:08
mordredin fact, we could put those facts into the secret and have the job plop the info into the fact cache like it does with host keys16:08
pabelangerya, that is possible16:09
jeblairi have invoked zuul-launcher graceful on zl0716:09
*** dmellado has quit IRC16:11
jeblairdeleted zl08 and zl0916:11
*** Apoorva has joined #openstack-infra16:14
*** amoralej has joined #openstack-infra16:14
openstackgerritRob Cresswell proposed openstack-infra/irc-meetings master: Update Horizon meeting to reflect current PTLs
amoralejpabelanger, it seems there is some issue with synchronization of buildlogs in
amoralejit seems repo metadata is not properly synced16:15
pabelangeramoralej: we don't sync, that is just a reverse proxy cache16:15
pabelangeramoralej: what data is incorrect?16:16
pabelangeramoralej: we are also hitting, so it is possible it has the issue16:16
amoralejpabelanger, there is any way to force synchronization of  ?16:16
amoralejmetadata doesn't come from cdn16:17
pabelangerthat is where you are actually getting16:17
amoralejlemme check redirects16:18
pabelangerso, we'd need to check expire headers on the cache data16:18
pabelangerpossible apache is now refreshing it16:18
amoralejpabelanger, in fact we shouldn't use cdn for metadata, that may be the issue16:19
* amoralej checking16:19
*** Apoorva_ has joined #openstack-infra16:19
pabelangeramoralej: okay, this would need to be fixed upstream. Because we are just proxying the data16:20
*** pbourke has quit IRC16:21
amoralejpabelanger, you proxy all requests to, right?16:23
openstackgerritMonty Taylor proposed openstack-infra/puppet-zuul master: Install ara on the executors
pabelangeramoralej: yes because redirects to it. So we now directly use it:
mordredjeblair, pabelanger: ^^ that's a followup to
amoralejpabelanger, doesn't redirect for metadata16:25
amoralejonly for rpms16:25
pabelangeramoralej: so that is the issue, metadata on is stale16:25
pabelangerand we are getting it16:25
amoralejpabelanger, i'm trying to figure out if proxying to cdn for metadata should work16:26
amoralejand we just need to fix that16:26
openstackgerritMerged openstack-infra/project-config master: Disable facts on publish-openstack-artifacts jobs
amoralejor we should switch to not use cdn for metadata16:26
mnaserpossible infracloud-vanilla issues (network?):
pabelangeramoralej: so, we can either revert which means extra 302 redirects for every buildslogs RPM attempt, or see ask to maybe mirror faster16:27
*** ccamacho has left #openstack-infra16:27
pabelangeramoralej: lets see how long it takes before updates metadata16:28
pabelangerOtherwise, we should be able to write an apache rule just to hit for the metadata16:28
*** ihrachys has quit IRC16:28
amoralejpabelanger, update in buildlogs was 14 hours ago or so16:30
pabelangeramoralej: wow, guess they are slow16:30
*** Apoorva_ has quit IRC16:31
clarkbpabelanger: what is odd to me is it seems the yum mirror hits these problems more than anything else?16:31
*** Apoorva has joined #openstack-infra16:31
clarkbpip hits it with the hash mismatch but far less frequently despite running far more jobs against it16:32
clarkbI wonder if there is something specific to how we are mirroring centos repos that tickles this behavior16:32
*** florianf has quit IRC16:32
pabelangerclarkb: sorry, which issue are you referencing ATM16:32
clarkbpabelanger: the apache segfaults16:32
clarkbpabelanger: aiui they manifest in e-r as yum client no more mirrors to try and the pip hash mismatch error16:33
*** e0ne has quit IRC16:33
pabelangerclarkb: Right, so we are pushing more yum things via reverse proxy than apt ATM. So I am starting to think it might be related16:34
pabelangerPip would just hit afs cache16:34
clarkbpabelanger: but aren't the segfaults caused by afs?16:34
clarkbI thought that is what ianw found yesterday16:34
clarkbpabelanger: pretty sure ianw tracked it to afs /me reads scrollback16:35
pabelangerI don't think I've seen anything specific to AFS during that time16:35
pabelangerinfracloud-chocolate is what I was looking at this morning16:35
*** yamahata has quit IRC16:36
openstackgerritPaul Belanger proposed openstack-infra/system-config master: Revert "Replace buildlogs.centos with buildlogs.cdn.centos"
clarkbso that is correlation not necessarily at fault, but looks really suspicious16:37
pabelangerclarkb: Ya, I see that last night. But I didn't see the same on infracloud16:37
pabelangeronly thing I see in dmesg for afs is16:37
pabelanger[2535506.431981] afs: file server in cell is back up (code 0) (multi-homed address; other same-host interfaces may still be down)16:38
clarkbwhich is from last week so older16:38
pabelangeramoralej: see ^49426516:38
jeblairclarkb: i agree that EINTR and segv are related, but i honestly have no idea which is the cause.16:38
amoralejpabelanger, i've forced fetching the file again and it's synced now16:38
jeblairclarkb: i think it's equally likely that the EINTR is a result of apache receiving sigsegv16:39
felipemonteiro_anyone aware that is done? just wondering whether it's a known issue16:39
jeblairclarkb: (based on not actually having tracked it down)16:39
pabelangerclarkb: was the best I could get from coredump this morning. We should likely add apache2-dbg16:39
pabelangeramoralej: okay, where did you do that?16:40
pabelangeramoralej: or how16:40
pabelangeramoralej: Ah, okay. So manually. Ya, we can do 494265 for now, since it was an optimization16:40
amoralejadding ?whatever forces to recheck16:40
amoralejpabelanger, lemme check with centos team if we can trust on cdn or not for that16:41
amoralejif it's issue in cdn we should fix it there16:41
pabelangeramoralej: sure, that would be helpful16:41
clarkbpabelanger: ya would need more info on what read is happening16:43
*** vhosakot has joined #openstack-infra16:43
clarkbfelipemonteiro_: yes I believe it was intentionally shut down16:44
pabelangerapache proxy cache is on local filesystem16:46
*** rama_y has joined #openstack-infra16:47
jlvillalclarkb, Doing a quick, I couldn't find anything.16:47
clarkbfelipemonteiro_: I'm trying to find record of discussion for that and failing so I may be wrong but looking16:47
jlvillalclarkb, I see things that are setting DEVSTACK_GATE_TLSPROXY, but nothing that I can find that uses it.16:47
clarkbjlvillal: devstack-gate features.yaml16:48
felipemonteiro_clarkb: I was thinking back to but was also looking for where the official note of it is if TC did in fact decide to shut it down16:48
jlvillalclarkb, Yep, I see it. Thanks!16:48
clarkbjlvillal: if its set we enable the service on newer branches via devstack-gate's feature thing16:48
amoralejpabelanger, the recommendation from centos team is not to use cdn for metadata, in fact they are telling me to not use cdn but and follow redirects16:48
clarkbjlvillal: it was done that way because it makes it easy to control what branches it is enabled for16:48
clarkbamoralej: pabelanger out of curiosity why can't we just use the base OS repos?16:49
*** rcernin has joined #openstack-infra16:49
pabelangeramoralej: okay, so we should land 49426516:49
clarkbwe already mirror all of centos and epel, and I think rdo is proxied and that works?16:50
pabelangerclarkb: buildslogs contains pre-release packages which haven't landed in centos.org16:50
clarkbpabelanger: why are we using prerelease packages?16:50
clarkbwe don't do that anywhere else aiui16:50
pabelangerclarkb: tripleo does it for their workflow16:50
clarkbbecause we aren't testing centos16:50
pabelangertesting openstack RPM16:51
*** jpich has quit IRC16:51
pabelangerI don't know the history, but there is a complicated relationship for RPMs between RDO, DLRN and buildslogs16:52
mgagneis there a way to unqueue a change from Zuul? Use case is: Zuul thinks Jenkins job is running but it's not for reasons. and now it waits forever16:52
clarkbmgagne: you can push a new patchset16:52
pabelangerjust trying to make sure sites are reliable16:52
*** spzala has joined #openstack-infra16:52
clarkbpabelanger: ya I'm trying to understand why we rely on such a complicated setup :)16:52
clarkbpabelanger: and it's because rdo doesn't host the prerelease openstack packages, those are hosted by buildlogs?16:53
pabelangerclarkb: Yes, I think that is part of the issue also. RDO has limited infrastructure so they rely on buildslogs for things16:53
*** egonzalez has quit IRC16:53
*** lucasagomes is now known as lucas-afk16:54
pabelangerclarkb: one of the things I am hoping to do at PTG is sit down with tripleo / puppet teams and better understand all the infrastructure they are using and why16:54
pabelangerclarkb: fungi: where can I find our credentials to testpypi? I've been just using personal ones for the moment16:55
fungipabelanger: i don't know that we have any16:56
fungiwould need to create some16:56
*** rhallisey has quit IRC16:57
*** dprince has quit IRC16:58
pabelangerfungi: k17:00
felipemonteiro_fungi: thank you17:01
openstackgerritJames E. Blair proposed openstack-infra/puppet-zuul master: Zuulv3: move the job dir under /var/lib/zuul
clarkbpabelanger: we can leave the old one hanging around too since we reduced nodepool's max server in infracloud17:03
pabelangerclarkb: ya, upgrading them to xenial should be straightforward17:04
pabelangerclarkb: fungi: do you mind helping land that will fix metadata issue amoralej is seeing17:06
*** derekh has quit IRC17:06
openstackgerritJames E. Blair proposed openstack-infra/puppet-zuul master: Zuulv3: move the job dir under /var/lib/zuul
*** slaweq has quit IRC17:07
jeblairpabelanger, fungi: yeah, i don't think it matters but updated anyway ^17:07
clarkbpabelanger: what rewrites the 302 content to point back at our proxy?17:07
pabelangerjeblair: fungi: I fear linting issue on arrows17:09
*** Sukhdev_ has quit IRC17:09
pabelangerbut +317:09
jeblairfor crying out loud17:09
pabelangerclarkb: apache2 will rewrite it properly17:10
openstackgerritJames E. Blair proposed openstack-infra/puppet-zuul master: Zuulv3: move the job dir under /var/lib/zuul
clarkbpabelanger: oh on the proxy pass reverse because that affects the entire vhost17:11
clarkbpabelanger: got it17:11
openstackgerritPaul Belanger proposed openstack-infra/project-config master: Create testpypi_secret secret for zuulv3
pabelangerfungi: do you mind creating some credentials on and maybe update ^?17:12
fungipabelanger: i may not have time today. been trying for hours to free myself up for last-minute yardwork and packing before i head out of town17:14
pabelangerfungi: okay, I don't mind doing it. Wanted to share the love with intree credentials :)17:15
fungipabelanger: though there's nothing requiring me specifically to create an openstackci account on testpypi as far as i know... anyone should be able to17:15
jeblairpabelanger, fungi, clarkb: this is what i have done on ze02 in order to put all the git repos and build dirs on the same filesystem (the ephemeral disk):
jeblairthat look okay?17:15
clarkbpabelanger: ya you should be able to do it, just add the info to the password file17:15
fungijeblair: looks right to me17:16
*** yamahata has joined #openstack-infra17:16
openstackgerritMonty Taylor proposed openstack-infra/shade master: Use new keystoneauth version discovery
fungijeblair: i suppose if you cared about uptime you could move line 1 down to between 4 and 517:16
jeblairnot so much17:16
clarkb++ also there is noatime :)17:17
fungijeblair: oh, and missing a `service zuul-executor start` at the end obviously17:17
jeblairfungi: yeah.  though i haven't run that part yet :)17:18
fungijeblair: only other thing i can think of is making sure to chmod/chown /mnt to match /var/lib/zuul17:18
fungiin case initscripts don't take care of that automagically for us17:19
*** dtantsur is now known as dtantsur|afk17:19
clarkbpuppet will get to that eventually17:19
jeblairfungi: indeed that was incorrect, thanks :)17:19
fungior pupprt17:19
*** jascott1 has quit IRC17:19
*** jascott1 has joined #openstack-infra17:20
jeblairokay, i'll do that to ze03 and 04 now, start them up, then do ze0117:21
*** ykarel|afk has quit IRC17:21
jeblairoh neat17:21
jeblairzuulv3.o.o is not running iptables17:21
pabelangeroh darn17:21
pabelangerheh, used as registration email for testpypi when I should have used infra-root17:22
jeblairpabelanger: please change that17:22
pabelangeryes, trying to do so now17:22
*** sambetts is now known as sambetts|afk17:23
*** shardy has quit IRC17:23
*** Apoorva has quit IRC17:23
pabelangeris somebody able to help me with mailman to not have that email get posted?17:23
pabelangervalidation link is likely en route to that17:23
*** Apoorva has joined #openstack-infra17:23
jeblairpabelanger: it should be held for moderation17:23
*** tjones has left #openstack-infra17:24
*** jascott1 has quit IRC17:24
pabelangergreat, thank you17:24
*** jascott1 has joined #openstack-infra17:25
fungipabelanger: yeah, i'll discard in the moderation queue17:25
pabelangerfungi: which email should I use for verification?17:26
pabelangersafe with infra-root@o.o?17:26
fungiand done17:27
*** rbrndt has quit IRC17:27
fungipabelanger: yeah, that's fine17:27
fungialso, i highly recommend someone besides just me sets up imap to watch that infra-root mailbox17:27
fungiwouldn't be a bad idea to get a second moderator onto the infra ml as well17:27
pabelangerya, I do not have that setup. I'll do that this afternoon17:28
jeblairfungi, clarkb, pabelanger: i fixed iptables on zuulv3.o.o which means i need merged to proceed17:28
jeblairmordred: ^17:28
openstackgerritPaul Belanger proposed openstack-infra/project-config master: Create testpypi_secret secret for zuulv3
*** jascott1 has quit IRC17:29
clarkbjeblair: what was wrong with iptables?17:30
*** eranrom has quit IRC17:30
jeblair   Active: failed (Result: exit-code) since Sat 2017-07-08 01:12:00 UTC; 1 months 9 days ago17:30
jeblairWarning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.17:30
*** Apoorva_ has joined #openstack-infra17:30
jeblairclarkb: ^ i don't know :(  i restarted it and it's up now17:31
jeblairmaybe last time it started ze01 didn't resolve or something?17:31
clarkbI think I checked that on centos images in rax when I found it was a problem but not on ubuntu because we didn't have xenial yet17:32
openstackgerritMonty Taylor proposed openstack-infra/project-config master: Emit and publish ara logs if available
clarkbjeblair: ya just confirmed its ring buffering17:33
clarkbjeblair: in /etc/systemd/journald.conf we set it to auto which will log to disk if /var/log/journal exists but otherwise use ringbuffer17:33
*** Apoorva has quit IRC17:33
openstackgerritSagi Shnaidman proposed openstack-infra/tripleo-ci master: Create a whitelist for /etc configs
clarkbwe may just want to add a /var/log/journal resource to puppet to make sure it's everywhere and we get persistent logging17:33
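The `Storage=auto` behaviour discussed here can be modelled in a few lines. This is a simplified sketch based on journald.conf(5), which documents the directory as `/var/log/journal`; the `journal_is_persistent` helper is hypothetical and the check runs against a temporary directory rather than the real path.

```python
import os
import tempfile


def journal_is_persistent(storage: str, journal_dir: str) -> bool:
    """Simplified model of journald's Storage= setting."""
    mode = storage.lower()
    if mode == "persistent":
        return True  # journald creates the directory itself if needed
    if mode == "auto":
        return os.path.isdir(journal_dir)  # only persistent if it already exists
    return False  # "volatile" and "none" never write to disk


with tempfile.TemporaryDirectory() as root:
    journal_dir = os.path.join(root, "var", "log", "journal")
    before = journal_is_persistent("auto", journal_dir)
    os.makedirs(journal_dir)  # what a config-management resource would ensure
    after = journal_is_persistent("auto", journal_dir)
```

Under this model, creating the directory is enough to switch an `auto` configuration from the in-memory ring buffer to on-disk logs after journald is restarted or the host reboots.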
*** spzala has quit IRC17:34
*** e0ne has quit IRC17:34
openstackgerritSagi Shnaidman proposed openstack-infra/tripleo-ci master: Exclude list for logs collection
jeblairclarkb: ++17:36
clarkbI'll get that patch up as soon as I determine the correct ownership and perms17:36
jeblaircool, i'm going to take a short break while those patches bake17:36
openstackgerritPaul Belanger proposed openstack-infra/project-config master: Create testpypi_secret secret for zuulv3
pabelangerokay, testpypi credentials17:40
*** jtomasek is now known as jtomasek|afk17:41
*** tosky has quit IRC17:41
pabelangerOpenStackCloudHTTPError: (403) Client Error for url: Quota exceeded for ram: Requested 8192, but already used 204800 of 204800 ram'17:46
pabelangermnaser: looks like quota issue vexxhost17:47
mnaserthat'll stop it from hitting 4017:47
mnaserill raise to 40960017:47
mnaserpabelanger done17:48
mnaserthere we go17:48
mnaserlaunching 18 now17:49
pabelangerYa, seeing ready nodes now17:49
pabelangermnaser: thanks, 40 online now17:52
mnaserpabelanger sweet, i've been keeping an eye on the timeout/job run ratio17:53
mnaser3-4 in past 24 hours out of 300 something so hopefully that stabilizes17:53
*** gongysh has quit IRC17:54
openstackgerritClark Boylan proposed openstack-infra/system-config master: Make journal logs persistent on disk
clarkbthat was actually far more reading than I expected it to be17:56
clarkbpabelanger: ok I'm going to go grab some early lunch, but when I get back I think I am ready to boot some new xenial mirrors17:57
*** e0ne has joined #openstack-infra17:57
pabelangerclarkb: ack18:00
*** electrofelix has quit IRC18:00
*** rlandy|brb is now known as rlandy18:02
openstackgerritSagi Shnaidman proposed openstack-infra/tripleo-ci master: WIP: test creating fake image in oooq extras
*** spzala has joined #openstack-infra18:03
*** dave-mccowan has quit IRC18:05
*** e0ne has quit IRC18:07
*** tosky has joined #openstack-infra18:07
*** e0ne has joined #openstack-infra18:08
*** e0ne has quit IRC18:08
*** slaweq has quit IRC18:09
*** dave-mcc_ has joined #openstack-infra18:09
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Add sphinx-autodoc-typehits sphinx extension
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Collect logging information into ara callback
jeblairclarkb: is that journald change missing a git add?18:09
*** pcaruana has joined #openstack-infra18:09
openstackgerritMerged openstack-infra/system-config master: Remove zl07-zl09; add ze02-ze04
openstackgerritMerged openstack-infra/bindep master: Add ability to list all deps
clarkbjeblair: yes sorry18:11
openstackgerritClark Boylan proposed openstack-infra/system-config master: Make journal logs persistent on disk
*** rhallisey has quit IRC18:13
*** slaweq has joined #openstack-infra18:15
*** rhallisey has joined #openstack-infra18:17
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Create inventory variable
openstackgerritPaul Belanger proposed openstack-infra/project-config master: Use for zuul_site_mirror_fqdn
pabelangerjeblair: ^should be fix for internap mirror URL18:20
pabelangerjust waiting until we restart zuulv3 with inventory to confirm18:20
*** makowals_ has joined #openstack-infra18:21
*** ociuhandu has joined #openstack-infra18:21
jeblairpabelanger: okay, that's +2 from me for you to +3 after the restart18:21
*** dizquierdo has quit IRC18:27
*** bhavik1 has quit IRC18:28
openstackgerritGage Hugo proposed openstack-infra/project-config master: Skip integration/non-doc jobs in certain cases
openstackgerritMerged openstack-infra/zuul-jobs master: Retry updating apt-cache
openstackgerritMerged openstack-infra/system-config master: Revert "Replace buildlogs.centos with buildlogs.cdn.centos"
openstackgerritMerged openstack-infra/zuul-jobs master: Add two roles for publishing artifacts over ssh
*** rbrndt has joined #openstack-infra18:33
amoralejpabelanger, will be refreshed after is applied?18:34
*** rhallisey has quit IRC18:34
pabelangeramoralej: not sure I follow, that is a review from rdoproject18:35
amoralejpabelanger, sorry, wrong paste, i meant
ihrachysI noticed, multiple times already, tempest jobs fail without timeout AND no logs uploaded, only thing we have then is console18:36
openstackgerritMonty Taylor proposed openstack-infra/zuul-jobs master: Document and update fileserver roles
ihrachyswhich is not helpful18:36
ihrachyswhat could be the reason of that happening?18:36
pabelangeramoralej: ya, that just merged a few minutes ago, so once mirrors update new requests should get refreshed for that18:36
amoralejok, thx18:37
pabelangeramoralej: so, we hit first, then redirect to when they tell us to18:37
*** marst has quit IRC18:37
pabelangerihrachys: setup_host failed:
pabelangerfailed to ping mirror in internap18:38
*** thingee_ has joined #openstack-infra18:40
ihrachyspabelanger, ah I see. I always jump straight to devstack.log ;)18:45
openstackgerritJakub Libosvar proposed openstack-infra/project-config master: Revert "Make neutron functional job non-voting"
*** amoralej is now known as amoralej|off18:47
openstackgerritMerged openstack-infra/puppet-zuul master: Zuulv3: move the job dir under /var/lib/zuul
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Add publish-openstack-python-branch-tarball to post pipeline
*** Sukhdev has joined #openstack-infra18:52
*** e0ne has joined #openstack-infra18:54
*** nicolasbock has quit IRC18:55
openstackgerritMerged openstack-infra/project-config master: Add inap cloud
openstackgerritJames E. Blair proposed openstack-infra/zuul-jobs master: Revert "Retry updating apt-cache"
jeblairinfra-root: ^ i'm going to force merge that.  the commit it reverts wedged zuulv3.19:02
jeblairhrm, why doesn't project-bootstrappers let me +2 verify that?19:04
clarkbare the verified perms on it exclusive?19:05
openstackgerritMerged openstack-infra/zuul-jobs master: Revert "Retry updating apt-cache"
jeblairgertty let me19:05
clarkbor you may just need a hard refresh19:05
clarkbya web ui caches vote categories19:05
jeblairi did refresh :(19:05
jeblairthe full shift-control-open-apple-alt-meta-splat-R one too19:05
jeblairanywho, it's in19:06
openstackgerritJakub Libosvar proposed openstack-infra/project-config master: Revert "Make neutron functional job non-voting"
*** rkukura has quit IRC19:08
openstackgerritPaul Belanger proposed openstack-infra/project-config master: Create post pipeline for zuulv3.o.o
pabelangerjeblair: mordred: fungi: clarkb: any objection for creating post pipeline for zuulv3.o.o?^19:15
*** lihi has joined #openstack-infra19:16
fungipabelanger: i'm not where i can review it right now, but i'm fully in favor of the subject line19:21
*** adisky__ has quit IRC19:23
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul-jobs master: Install build private key too
*** trown is now known as trown|brb19:29
clarkbpabelanger: approved19:29
clarkbok now to do infracloud mirror things. I'm guessing step zero is uploading an ubuntu xenial cloud image19:29
clarkbya at least for infracloud it needs a xenial image. I'm going to grab one of ubuntu's and upload it to both infraclouds19:30
clarkbhrm openstack image create doesn't take a hash like glance client did?19:36
clarkbmaybe that is a property19:36
openstackgerritMerged openstack-infra/project-config master: Create post pipeline for zuulv3.o.o
clarkbmordred: ^ do you know what the magical way to have glance verify the checksum is?19:39
*** nicolasbock has joined #openstack-infra19:39
*** trown|brb is now known as trown19:39
clarkbI guess I can push it then check the checksum glance reports back but it seems like handing it one and letting it fail upfront is far more sane19:39
clarkbhrm doesn't even look like shade does this19:40
clarkbmordred: is that a bug? we don't seem to check hashes when uploading images to glance?19:42
*** Sukhdev has quit IRC19:43
*** kjackal_ has quit IRC19:44
mtreinishclarkb: it's a manual thing via the api, but that would be a good shade flag to add19:45
mtreinishclarkb: I have a doc patch to add that to the install guide:
clarkbmtreinish: I'm not seeing it in the api docs either fwiw19:45
mtreinishbecause I was bit with that19:45
clarkbI see that glance will return a checksum to you though so I am just uploading and will check the sum that glance computes before booting19:45
*** kjackal_ has joined #openstack-infra19:46
* clarkb wonders why a doc change can't merge until after pike is cut...19:46
mtreinishI dunno19:47
mtreinishI stopped asking questions...19:47
*** nicolasbock has quit IRC19:49
clarkbpabelanger: ok images are up in both regions. Is there one that you think would be better to start in?19:53
jeblair#status log zuul v2 launchers zl07, zl08, zl09 have been deleted due to reduced cloud capacity and to make way for zuul v3 executors19:54
openstackstatusjeblair: finished logging19:54
*** rcernin has quit IRC19:55
jeblair#status log zuul v3 executors ze02, ze03, ze04 are online19:55
*** jpena is now known as jpena|off19:58
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Fix documentation nits
clarkbianw: upon rereview of I think I found a minor issue that should be fixed before merging20:04
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Fix documentation nits
clarkbianw: once that is addressed I think we can get that in and I will work on infracloud mirror replacements if you want to work on the rax one (pabelanger found infracloud also suffering from the segfaults)20:05
clarkbpabelanger: looks like buildlogs issues have skyrocketed recently according to e-r20:09
clarkbpabelanger: is that what you were looking to fix or is it potentially caused by the fix?20:09
openstackgerritPaul Belanger proposed openstack-infra/project-config master: Update base-test to use site_logs secret
pabelangerclarkb: ya, that is the stale repodata. Our revert should fix that20:11
*** tumbarka__ has quit IRC20:12
*** Sukhdev has joined #openstack-infra20:13
pabelangerYa, last failure was 20mins ago: 2017-08-16T19:56:34.839Z20:13
pabelangermirrors should all be running revert now20:13
*** srobert has quit IRC20:13
clarkbthen I think next step is to have ianw respond to my review comments, and start booting some xenial mirrors20:14
*** mat128 has quit IRC20:17
*** dprince has quit IRC20:18
pabelangermordred: jeblair: clarkb: regarding: since logs.o.o points to static.o.o, I believe it was suggested we create separate secrets for each fqdn we are going to access, even though they have same private SSH keys. Meaning, we'd have both site_logs and site_tarballs zuul secrets over directly using site_static with different paths. thoughts?20:18
jeblairpabelanger: that sounds nicely future-proof20:20
clarkbthe only thing to be wary of in that setup is if we change logs' key without realizing that static needs updating too. But I think it more likely we'd split the hosts up rather than change a key and miss that20:21
mordredclarkb: glance v1 lets you send a checksum in the upload payload, and shade sends a checksum to it20:22
mordredclarkb: v2 has no documented mechanism to do the same thing for direct upload20:22
jeblairze01--ze04 are cloning all repos20:23
clarkbmordred: thats ... ok20:23
mordredclarkb: if there is a way to do checksum-on-upload we can happily add it20:23
clarkbmordred: we probably want to check the checksum against the response to upload which includes the glance computed checksum20:23
mordredhowever- it wouldn't be any less expensive than just checking the returned checksum, since you'll have to upload the whole thing before the checksum can get validated anyway20:23
clarkbmordred: so we can fail on the shade side rather than on the glance side20:23
mordredso yah - we should definitely validate like that20:24
clarkbmordred: it's just more of a "your api should handle these things for you" thing than anything else20:24
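A minimal sketch of the client-side validation discussed above, assuming the image has already been uploaded and the checksum glance reports back (an MD5 hex digest) has been obtained separately. The helper names `local_md5` and `verify_upload` are hypothetical; they are not part of shade or any glance client.

```python
import hashlib


def local_md5(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a local file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as image:
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_upload(path, reported_checksum):
    """Raise ValueError if the reported checksum differs from the local file's."""
    expected = local_md5(path)
    if expected != reported_checksum.lower():
        raise ValueError(
            "checksum mismatch: local %s, reported %s" % (expected, reported_checksum)
        )
    return expected
```

As mordred notes, this costs a full upload before a mismatch can be detected; it only moves the failure to the client side so the image is never booted unverified.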
openstackgerritMajor Hayden proposed openstack-infra/system-config master: Add reverse proxy
openstackgerritMajor Hayden proposed openstack-infra/project-config master: Add proxy host for erlang-solutions mirror
clarkbmhayden: ^ is there a reason for not using the distro provided erlang packages? (we've used them for years so curious if there is some benefit to using upstream packages)20:32
mhaydenclarkb: according to cloudnull's research, rabbitmq recommends erlang 19.x for the current version of rabbitmq20:32
mhaydenbut 16.04 only has 1820:32
clarkbmhayden: and you aren't using ubuntu's rabbitmq package? or their package has jumped ahead of their erlang?20:33
mhaydenclarkb: we are using the upstream rabbitmq package at the moment - 3.6.9-120:34
clarkbhuh interesting20:34
mhaydenwe found that rabbitmq did a lot better under load with 1) modern version from and 2) pinned erlang version from upstream erlang repos20:34
openstackgerritPaul Belanger proposed openstack-infra/project-config master: WIP: Create testpypi_secret secret for zuulv3
mhaydeni didn't do the research myself, but that was the output20:34
mhaydenbut the erlang mirror is somewhere in eastern europe with latency ~ 160ms to the central USA :/20:35
openstackgerritSlawek Kaplonski proposed openstack-infra/shade master: Don't determine local IPv6 support if force_ip4=True
clarkbmhayden: and is that similar situation for mariadb? why not use what is in centos repos?20:35
openstackgerritwes hayutin proposed openstack-infra/project-config master: Add oooq based undercloud-containers job.
mhaydenclarkb: we wanted some of the newer features from upstream mariadb/galera/percona20:36
mhaydenso we install from mariadb's upstream repos20:36
*** makowals_ has quit IRC20:36
*** e0ne has quit IRC20:36
*** jcoufal_ has joined #openstack-infra20:36
clarkbI'm worried that we are creating an unscalable future with all these changes particularly since we have mirror instability as it is (not that osa is at fault for that, everyone else is piling on too)20:36
mhaydencouldn't live with that typo in the commit message ;)20:36
mhaydenhaha, not enough coffee this afternoon20:37
openstackgerritwes hayutin proposed openstack-infra/project-config master: Add oooq based undercloud-containers job.
mhaydenclarkb: makes sense20:37
mhaydenour other option is to pre-stage some of this stuff as early in our gate jobs as possible, but it might not be a great  test of a production deploy20:37
mhaydenif it's preferred that we don't make more reverse proxies right now, i can go back and try out some other options20:38
openstackgerritSlawek Kaplonski proposed openstack-infra/shade master: Fix determining if IPv6 is supported when it's disabled
*** tnovacik has quit IRC20:38
*** rkukura has joined #openstack-infra20:39
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul-jobs master: Install build private key too
*** jcoufal has quit IRC20:39
clarkbmhayden: it might be good to wait a minute while we upgrade the mirrors in an attempt to get to a debuggable spot. ianw should be waking soon and was working on that and I plan on doing some of it today as well20:39
clarkbmhayden: then once we stabilize we can add things back on (as is its hard to know if we are making anything better if we keep adding backends while trying to fix things to keep up better)20:40
mhaydentotally understandable ;)20:40
clarkbthis conversation has also inspired an entry on the PTG ideas list which I am typing up now20:40
*** pcaruana has quit IRC20:41
openstackgerritPaul Belanger proposed openstack-infra/project-config master: Create testpypi_secret secret for zuulv3
mhaydeni'll be the guy in the OSA room who is getting yelled at for constantly breaking gate jobs20:41
*** rkukura_ has joined #openstack-infra20:44
clarkbso wouldn't be crazy to mirror it properly for centos as well rather than just proxying it20:44
mhaydenah okay20:44
mhaydeni'm not sure how much of mariadb's yum repo would need to be mirrored -- that'd take some digging20:45
openstackgerritEric Harney proposed openstack-dev/hacking master: Fix python 3.6 escape char warnings in strings
*** rkukura has quit IRC20:46
*** rkukura_ is now known as rkukura20:46
*** gouthamr has quit IRC20:46
openstackgerritPaul Belanger proposed openstack-infra/project-config master: Create testpypi_secret secret for zuulv3
*** Apoorva_ has quit IRC20:47
mhaydenclarkb: to be honest, if i could figure out the difference between erlang/OTP and erlang, i might not need a mirror or proxy :P20:47
*** Apoorva has joined #openstack-infra20:47
dmsimard|offmhayden: the mariadb off of RDO isn't good enough ?20:48
pabelangermordred: so, I know ^ isn't the full job yet needed for testpypi, some of that is depending on your (pre-)python-tarball logic. But might be worth it to land so we can start testing pip intall commands inside bwrap on executor20:48
mhaydendmsimard|off: would like to keep versions of mariadb synced between xenial/centos/suse20:48
*** felipemonteiro_ has quit IRC20:48
dmsimard|offmhayden: fair20:48
mhaydendmsimard|off: you're off work, go enjoy a beer20:48
dmsimard|offmhayden: who said I was working :)20:49
*** thingee_ has quit IRC20:49
*** esberglu has quit IRC20:50
*** jcoufal_ has quit IRC20:52
*** rhallisey has quit IRC20:54
*** esberglu has quit IRC20:55
*** spzala has quit IRC20:58
*** spzala has joined #openstack-infra21:00
mordredpabelanger: lgtm21:02
*** esberglu has joined #openstack-infra21:03
*** slaweq has quit IRC21:06
*** aviau has quit IRC21:06
*** slaweq has joined #openstack-infra21:06
*** aviau has joined #openstack-infra21:07
jeblairimages have been uploaded to inap21:08
jeblairi've approved the quota switch to move from using internap to using inap (493073)21:08
*** slaweq has quit IRC21:11
*** xarses_ has joined #openstack-infra21:12
*** andreww has quit IRC21:15
*** ldnunes has quit IRC21:17
*** spligak has quit IRC21:17
jeblairi'm adding zuulv3.o.o to the emergency file21:17
jeblairclarkb: are you done with in emergency?21:18
*** rama_y has quit IRC21:18
*** thorst has quit IRC21:19
clarkbjeblair: yes21:19
ianwclarkb: looking ...21:21
fungimhayden: come to the infra room and we can yell at you for breaking the infrastructure instead! er, i mean... totally help you out21:23
fungimhayden: we're probably going to break tons of eggs while making zuul v3 omelets at the ptg anyway, so won't be in much position to criticize ;)21:24
*** ihrachys has quit IRC21:25
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add mirror01.iad.rax.o.o
pabelangerianw: do you think we are ready to rotate out fedora-25, now that fedora-26 has been online for a while?21:27
ianwpabelanger: there's still one devstack thing i can push today, because it's got no comment21:27
ianwunless you have better ideas on that21:27
*** sree has joined #openstack-infra21:28
ianwclarkb: i think that removing the "-"'s has padded things out a bit far, see /etc/apache2/sites-enabled/ on mirror01 .  it also doesn't bother me; i just copied it from one of the other ones21:29
clarkbianw: I think you can avoid that by removing some of the built in leading whitespace in the interpolation strings?21:32
clarkbor do you mean vertically?21:32
ianwboth, i think that's why the "<% end -%> was there21:32
*** sree has quit IRC21:33
clarkbianw: you can add it back to the end tags as long as it is removed from the serveralias lines21:33
clarkbI think one of the end tags has no - and is adding an extra newline21:33
clarkbits correct as is just potentially ugly21:34
ianwcloser ...
*** slaweq has joined #openstack-infra21:36
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul-jobs master: Install build private key too
*** felipemonteiro has joined #openstack-infra21:36
clarkbianw: I think you may also have to lose the if else block formatting so it is all at the same level of indentation21:37
*** aeng has joined #openstack-infra21:38
clarkbfungi: ya fixing the too few problem is what my -1 was about, now addressed but resulting in uglier output21:38
fungias long as the template in git is readable, those few poor schmucks with direct access to look at the resulting configs on disk can get by21:39
fungireadable in git and resulting in syntactically correct (if not aesthetically pleasing) conffiles is what i would focus on21:39
*** bobh has joined #openstack-infra21:40
fungilife's too short to care about extra rendered whitespace ;)21:40
*** slaweq has quit IRC21:42
*** Apoorva_ has joined #openstack-infra21:43
*** slaweq has joined #openstack-infra21:43
clarkbfungi: pabelanger can you review ianw's change other than extra whitespace I believe it to be correct :)21:43
jeblairdeb-python-cassandradriver seems to have a large number of branches21:43
clarkbbut with that in I will go ahead and spin up mirror01.regionone.infracloud-vanilla.openstack.org21:43
*** yamamoto has joined #openstack-infra21:44
*** yamamoto has quit IRC21:45
*** Apoorva has quit IRC21:46
ianwpaste.o.o where are you?21:46
ianwclarkb: i think sticking it all on the one line works the best ->
ianwbut whatever.  i'll kill my cache warming stuff on mirror01.iad.rax21:47
*** slaweq has quit IRC21:47
clarkbotherwise if you have more than one alias you'll get them all on one line right?21:47
ianwanyway, i'm all for leaving well enough alone too :)21:48
clarkboh huh21:48
clarkbsorry I totally missed that if that was there before and that is my bad21:48
ianwi didn't think that moving the caches from the old trusty -> xenial was a good idea, seeing as versions of everything changed21:48
*** eharney has quit IRC21:49
ianwand i also guess ttl on the mirror is of minimal importance, given everything starts fresh anyway21:49
ianwalthough i guess the upstream dns could be holding onto it21:49
pabelangerianw: ya, I've just kept 60 TTL and things slowly moved over21:50
ianwi think they call that load balancing :)21:50
pabelangerYa, could be better to prime empty caches too21:50
clarkbalso that pastebin got a facelift from the last time I saw it21:50
ianwi figure reverse proxy was less of a cold-cache issue21:51
*** thorst has joined #openstack-infra21:53
ianwcool; i'll get some breakfast while this all gets puppeted properly then see about inserting it and monitor closely21:53
*** thorst has quit IRC21:53
clarkbwe should see results fairly quickly too I imagine? I wonder if we could just do all the servers tomorrow21:53
clarkbthat would be nice and I can make time for cranking them out21:54
*** felipemonteiro_ has joined #openstack-infra21:54
pabelangerclarkb: we could setup an elastic-recheck query to track http connection resets, at least from yum client21:57
clarkbwe sort of already do21:57
clarkbI think there is strong correlation between those yum errors and that behavior21:57
clarkbwe'd just have to filter by region and see if results drop off, or check the apache logs for segfaults21:57
*** felipemonteiro has quit IRC21:57
*** yamamoto has quit IRC21:58
pabelangerYa, I think there is also networking issue there too, but is a good start21:58
*** priteau has quit IRC21:59
ianwhopefully it is not related to some weird afs just-non-posixy-enough-to-confuse-deep-logic-in-apache issue22:03
*** jascott1 has quit IRC22:05
fungii'm more likely to blame apache for this than the other way around22:06
fungibut yeah, who knows at this point22:06
clarkbwe need a trusty node22:07
*** jascott1 has quit IRC22:07
clarkbit is as if that release knows we are replacing it and is holding out22:07
*** jkilpatr has quit IRC22:09
*** xyang1 has quit IRC22:11
clarkbare half hour build times expected in internap?22:13
clarkbwondering if that is fallout from the internap changes going on22:13
mgagneclarkb: compared to what? what kind of build?22:14
clarkbmgagne: I see it for xenial and trusty instances at least. I would expect it to take just a few minutes typically so wondering if something went sideways with the name changing?22:15
mgagnewhat kind of job? like a lint job for instance?22:16
clarkbmgagne: no this is just instance booting, no jobs yet22:16
pabelangercompute nodes downloading images?22:16
mgagnepabelanger: that's what I suspect22:17
pabelangerclarkb: that large spike for Time to Ready on internap is likely new images too22:17
jeblairyeah just started looking at that22:17
mgagnebut I don't have enough details to verify22:17
pabelangerhits upwards of 50Mins22:17
ianwclarkb / pabelanger: how interesting ... another person caught that mpm_event segfault literally an hour ago ->
openstackgerritRamamani Yeleswarapu proposed openstack-infra/devstack-gate master: [TESTING][DO NOT MERGE] Testing TLS in Ironic jobs
pabelangerianw: Ha, nice22:18
pabelangerianw: looks like possible SRU22:18
ianwyeah, the tl;dr is use it from backports22:19
clarkbpabelanger: mgagne ok it does look like it spikes up daily if I expand the grafana graph period22:19
clarkbianw: pabelanger so maybe xenial will fix it then \o/22:19
ianwbut, adds some value to upgrading22:19
ianwjinx :)22:19
jeblairmgagne, clarkb, pabelanger, ianw: i spot checked inap nodes -- they really still are in building state in nova22:23
mgagnehave you swapped all quota to inap?22:23
jeblairso that idea of image copying may be correct22:23
jeblairmgagne: yes22:23
mgagnecould be that all compute nodes are downloading images at the same time22:23
clarkbI shall practice patience22:24
*** thorst has joined #openstack-infra22:24
*** jascott1 has joined #openstack-infra22:24
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul-jobs master: Add known_hosts from executor to all nodes
*** thorst has quit IRC22:29
openstackgerritsebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync
openstackgerritEmilien Macchi proposed openstack-infra/project-config master: TEMP - Disable voting on tripleo upgrade jobs
clarkbthe number of inap building instances is 138 and has remained that way for a while so we aren't thrashing22:33
clarkblikely that I just need to be patient22:33
*** felipemonteiro_ has quit IRC22:34
pabelangermnaser: I am seeing about 23 jobs timeout in vexxhost using our e-r query, a few recently.22:34
jeblairclarkb: yep; i'm tailing logs and have seen no further activity other than polling22:34
pabelangerinfracloud-vanilla seems to timeout more than infracloud-chocolate22:34
clarkbpabelanger: vanilla is on the older hardware I think22:35
clarkbmight be somewhat slower22:35
pabelangerclarkb: ya, I think that is right22:35
mgagneclarkb: I checked 1 compute node and yes, it's downloading. the one I checked, compared image size to existing ones and it's done at 20% so far.22:36
fungii wonder if we've unleashed a thundering herd in the image storage network there22:37
mgagneyou are only affecting yourself so there is that =)22:37
fungioh, good. as long as we're not impacting performance for anyone else there i don't much care. it'll resolve itself soon enough22:38
mgagneI can only imagine our netadmin: "Are you downloading something?" "Yea, a cloud user is using the cloud." "well, he shouldn't!"22:39
jeblairwe get that a lot22:40
fungi"wait, what, you're using this thing?!?"22:40
fungilesson: openstack expects lots of available bandwidth between nova compute nodes and glance stores22:41
openstackgerritMerged openstack-infra/system-config master: Add mirror01.iad.rax.o.o
mgagnefungi: glance store is the key, now it's going through glance-api for no reason =)22:41
clarkbwe are down to 133 building22:42
clarkbnot sure if the 5 that changed state succeeded or failed though22:42
fungiyeah, we also toyed around some years back (in tripleo context i think?) with the idea of a bittorrent sharing solution for updating nova image caches between compute nodes22:42
clarkboh what do you know irc says the change merged so must've changed state in the direction we want :)22:42
clarkbianw: so thats in now22:42
clarkbfungi: ya22:42
*** bobh has quit IRC22:43
mgagnefungi: I heard something similar from rax with deployment and some people forgot firewalls have limited number of sessions.22:43
*** bobh has joined #openstack-infra22:43
clarkbmy favorite was when I got the 3am phone call because the default security group rule in hpcloud had nuked the database22:44
fungishould of course clarify, _stateful_ firewalls have session limits22:44
mgagneclarkb: oh, the default rule where other members are authorized and each new instance triggers an iptable update on all compute nodes?22:44
clarkbmgagne: ya22:44
clarkbdown to 108 building so its moving relatively quickly now :)22:45
fungi_stateless_ firewalls just inspect each packet on its own merits (but of course have far less effective rules and often run into packet rate issues too since they can't take state tracking shortcuts)22:45
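A rough sketch of the distinction fungi draws, under simplifying assumptions: a flow is a `(src, src_port, dst, dst_port)` tuple and a rule is any callable returning True or False. The class and function names are hypothetical and do not model any particular firewall product.

```python
class StatefulFirewall:
    """Tracks accepted flows in a bounded session table."""

    def __init__(self, max_sessions, rule):
        self.max_sessions = max_sessions
        self.rule = rule  # callable(flow) -> bool, consulted for new flows only
        self.sessions = set()

    def allow(self, flow):
        # Known flow: the state-tracking shortcut, no rule evaluation needed.
        if flow in self.sessions:
            return True
        if not self.rule(flow):
            return False
        # New flows are refused once the session table is full.
        if len(self.sessions) >= self.max_sessions:
            return False
        self.sessions.add(flow)
        return True


def stateless_allow(flow, rule):
    """Evaluate every packet against the rule; nothing is remembered."""
    return rule(flow)
```

With a table of two sessions, a third distinct flow is refused even though the rule would accept it, which is the session-limit failure mgagne describes.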
jeblairclarkb: i think we're hitting the nodepool build timeout22:45
clarkbjeblair: oh maybe22:45
fungiwe'll eventually get those images primed onto all their compute nodes though ;)22:45
jeblairyep confirmed in logs22:45
* clarkb checks where ianw's change gated22:45
clarkbya neither trusty job was inap22:46
*** rbrndt has quit IRC22:51
clarkbI am launching the xenial mirror in vanilla cloud now22:52
*** dizquierdo has joined #openstack-infra22:53
jeblairthe inap deletes are slow too; maybe we can't delete the instances which are still waiting on image downloads until they are complete22:55
pabelangerclarkb: so, ya, I pretty sure I know the answer, but..... is doing a reverse proxy cache for something we'd do?22:57
*** EricGonc_ has quit IRC22:57
EmilienMhey infra, I have an outstanding request: - everything is explained in the commit message - we need this asap. Thanks a lot22:58
clarkbpabelanger: like I already told mhayden I think step 0 is getting reliable proxy/mirror for what we currently have22:58
clarkbpabelanger: we keep piling on making it impossible to figure out if we've made anything better than yesterday22:58
clarkbbut once that is done I imagine it is something we could slowly test22:58
clarkbpabelanger: my concern is that we are essentially trying to mirror the internet22:58
clarkband one tiny host per region is going to do a poor job of that22:59
pabelangerclarkb: Ya, agree. Don't think we should add more until we stabilize current mirrors22:59
pabelangerclarkb: ya, that is also my concern now too22:59
*** esberglu has joined #openstack-infra22:59
pabelangerwhere do we draw the line on thing to proxy22:59
clarkbya I put an entry on the ptg ideas etherpad related to figuring ^ out23:00
clarkbI think a big part of it will be getting information on what people think they need mirrored/proxied because I think we are learning that we haven't actually done that yet23:00
openstackgerritMonty Taylor proposed openstack-infra/zuul-jobs master: Document and update fileserver roles
clarkbthen based on that feedback and knowledge of hopefully running more reliable servers we can figure out what makes sense23:01
openstackgerritMonty Taylor proposed openstack-infra/project-config master: Use artifact publication roles from zuul-jobs
clarkbgithub is probably special in that we already do a good job of hosting git repos. Are people wanting us to provide CI for github or can we just consume release artifacts from projects hosted on github via the current channels (pip, npm, gems, ubuntu centos etc)23:02
clarkbpabelanger: ^ is there a specific case where github would be useful?23:02
*** esberglu has quit IRC23:03
clarkbfirst build in vanilla failed. I can't tell if it thinks ssh timed out or if it didn't like the ssh host key for some reason23:03
pabelangerclarkb: personally, I don't think we should proxy In the case I am looking at, infracloud seems to be having problems cloning some repos from github.com23:04
pabelangerclarkb: this is because the DLRN tool requires some configuration data and rdoproject publishes it there23:04
pabelangerclarkb: I think it would be possible to move this data into existing RDO proxy infra, but I need to work with #rdo23:05
openstackgerritGiulio Fidente proposed openstack-infra/tripleo-ci master: Copy Ceph logs from running containers
pabelangerclarkb: so, for now, I think we have better tools to mirror git then using reverse proxy cache23:05
pabelanger is the repo in question too23:05
clarkbya seems like that is repo metadata that could be hosted in the rpm repo23:06
clarkb(but I know little about hosting practices for rpm packages)23:06
*** marst_ has quit IRC23:06
*** makowals has quit IRC23:06
pabelangerclarkb: yes, I agree. That is what I am going to ask #rdo about23:06
*** makowals has joined #openstack-infra23:11
*** thorst has joined #openstack-infra23:11
clarkbI'm debugging launch node fails in infracloud now. trying to figure out if its our ssh key or maybe the wrong users being used?23:14
*** vhosakot has quit IRC23:14
ianwclarkb: i think the "01" is a good idea, because if we get to the point we do want load-balancing or something i think that gives us flexibility23:17
ianwclarkb: want me to start an etherpad to track what's what?23:17
clarkbianw: sure23:17
clarkband I'm attempting to boot with 01 just curious if plan was to CNAME?23:17
ianwi think yeah, have mirror -> mirror0123:18
*** notmyname has quit IRC23:19
*** aeng has joined #openstack-infra23:19
*** notmyname has joined #openstack-infra23:23
clarkbpabelanger: ok just discovered why removing dns settings on our network is bad, can't launch node the new mirror because we have to resolve the host for :)23:24
clarkbpabelanger: thoughts on how we want to address that?23:24
pabelangerclarkb: Ah, ya. That would do it23:25
clarkbI could provide a user data script that wrote out a resolv.conf23:25
pabelangerya, was just thinking something like that23:25
clarkbthat is kind of hacky23:26
clarkbbut we only need it to work that one time23:26
pabelangerclarkb: I think once I fix up and we land that on our images, we can revert the puppet change23:26
clarkbdoesn't look like launch node supports arbitrary user data yet23:26
clarkbI'm going to hack it in23:27
*** gouthamr has joined #openstack-infra23:27
clarkbsince we have WIP to fix it properly I don't feel too bad getting the host booted today23:27
clarkbin the future it should just work :)23:27
pabelangerOne day we'll have DIB images for control plane, then we'd get unbound :)23:28
*** EricGonczer_ has joined #openstack-infra23:28
openstackgerritPaul Belanger proposed openstack-infra/project-config master: Create glean@.service.d/override.conf
clarkbwhat I'm doing is adding an ssh command in launch node that appends nameserver to resolv.conf23:29
clarkbso its a one liner hack in launch node that should work fine and will be undone by puppet23:30
*** EricGonczer_ has quit IRC23:30
clarkbif people think its worth having that in launch node proper I can push it up once confirmed it works23:30
*** Swami has quit IRC23:33
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Add publish-openstack-python-branch-tarball to post pipeline
*** spzala has joined #openstack-infra23:34
*** Apoorva_ has quit IRC23:35
ianwis really old?  i can't log in and i wonder if it's my ecdsa key23:35
clarkbianw: I'll look23:35
*** Apoorva has joined #openstack-infra23:35
clarkbianw: ed25519 key23:35
ianwyeah, sorry23:36
fungithe "djb curve"23:36
clarkblet me take a look at what is in puppet and put that in there23:36
ianwit caused issues with precise hosts23:36
clarkboh that is what is in puppet23:36
clarkbis this a precise host that we missed somehow?23:37
clarkbit has a 3.13. trusty kernel23:37
clarkblet me see if auth log has more info23:37
*** spzala has quit IRC23:38
clarkbianw: I see root attempts is that you? should be your normal username if so23:38
ianwohhh, haha it would help if i was in a window on a host with my key, doh23:39
clarkbfatal: []: FAILED! => {"changed": false, "failed": true, "module_stderr": "", "module_stdout": "/bin/sh: 1: /usr/bin/python: not found\r\n", "msg": "MODULE FAILURE", "parsed": false}23:40
ianwand this is why i colour code terminals23:40
clarkbso uh I guess ubuntu removed python from their base xenial cloud image23:40
ianwoh, that would be the xenial not having python2 by default23:40
clarkbianw: ya guessing you haven't hit that because you are on rax's images which are different23:40
clarkbso do I also hack in an apt-get install python?23:41
clarkbI guess its worth doing :)23:41
ianwyeah, dib had plenty of issues with that at the time :)23:41
jeblairinap nodes are coming online i believe23:41
jeblairand all the previously deleted nodes have been cleared out23:42
jeblair#status log renamed nodepool internap provider to inap.  new mirror server in use.23:43
openstackstatusjeblair: finished logging23:43
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Allow requesting secrets by a different name
jeblairinfra-root: now that inap is in use (replacing internap), we should watch out for unexpected fallout from the new mirror host23:44
mordredianw, clarkb: wow. I have clearly missed a giant pile of fun - ubuntu no longer ships python in their cloud images?23:45
ianwmordred: python223:45
fungia.k.a. /usr/bin/python23:45
clarkbwhich ansible needs23:45
clarkbwhich launch-node needs23:45
clarkbmordred: I love that ansible has now graduated to the language runtime problem that we have had with puppet at times23:46
ianwit's what i like to call aspirational23:46
fungiwhich things not python3-only will default to using23:46
mordredclarkb: indeed. well - we can add a bootstrap python to our launch-node stuff23:46
clarkbmordred: ya ssh_client.ssh('apt-get update && apt-get install -y python') seems to have worked23:47
pabelangerconnect=raw FTW23:47
mordredclarkb: it's possible to have non-python tasks in ansible, which is the normal cantrip for getting python installed onto a node that doesn't have it and is otherwise managed with ansible23:47
clarkbI think we have a bootstrap script that that would be better in though23:47
clarkbmordred: ah23:47
pabelangerthis is actually a good use case for zuulv3 and ubuntu cloud images23:47
mordredclarkb: yah - just saying - it's possible to do ^^ what pabelanger said if we were using ansible and not that bootstrap script23:47
pabelangerwe'd need to do the same thing23:48
mordredwell - it's also a good case for "make our own base images for control plane"23:48
mordredbut, you know - hours in a day23:48
mordredthat's been on my list for what? 3 years now?23:48
clarkbI'll take a look at the boot process when not trying to get a mirror up :)23:49
clarkbits possible we just want to add an ansible step that does it without python23:49
clarkbor put it in an existing bootstrap script23:49
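A minimal sketch of the two one-off hacks described above (a temporary resolver entry, then installing python so ansible modules can run), assuming a client object that exposes `ssh(command)` as in clarkb's one-liner. The function names and the `nameserver` parameter are hypothetical; the actual resolver address used is not given in the log.

```python
def bootstrap_commands(nameserver):
    """Build the shell commands to run over ssh before ansible takes over."""
    return [
        # Temporary resolver entry; puppet is expected to rewrite this file later.
        "echo 'nameserver %s' >> /etc/resolv.conf" % nameserver,
        # Ansible modules need /usr/bin/python on the target host.
        "apt-get update && apt-get install -y python",
    ]


def bootstrap(ssh_client, nameserver):
    """Run each bootstrap command through an object exposing ssh(command)."""
    for command in bootstrap_commands(nameserver):
        ssh_client.ssh(command)
```

Keeping the command list separate from the ssh call makes it easy to move the same steps into an existing bootstrap script later.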
*** aeng has quit IRC23:49
openstackgerritPaul Belanger proposed openstack-infra/openstack-zuul-jobs master: Replace slash for tarball rename
*** gongysh has joined #openstack-infra23:52
*** gongysh has quit IRC23:52
clarkbhrm we didn't break in which is where I would've expected a lack of python to first be a problem23:53
openstackgerritMonty Taylor proposed openstack-infra/zuul-jobs master: Document and update fileserver roles
clarkband we default to setting up pip23:54
pabelangeractually, I thought I've launched a xenial node before23:55
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Allow requesting secrets by a different name
pabelangerclarkb: Oh, now I remember23:56
pabelanger is actually wrong23:56
dimsclarkb : fungi : mordred : Do you have a few mins to merge "Update grenade settings for stable/pike" review? pretty please -
pabelangerthat needs to move down 1 level23:56
pabelangerand not be in the if statement23:56
pabelangerI thought I proposed a fix for that23:57
mordredpabelanger: oh - hah. yeah - so it does23:57
clarkbpabelanger: it actually needs to go well before that because setup_pip happens before much of anything else23:58
clarkbI'm also getting a lot of Aug 16 23:56:36 mirror01 puppet-agent[9836]: Could not request certificate: Failed to open TCP connection to puppet:8140 (getaddrinfo: Name or service not known)23:58
pabelangerclarkb: I think I had this issue in vexxhost too23:58
pabelangerwhen we did nb0323:58
clarkbpabelanger: the no python issue or the puppet-agent?23:58
mordredFailed to open TCP connection to puppet:8140 ... that sounds like puppet.conf is not right23:58
clarkbmordred: yes23:58
pabelangerclarkb: no python23:59
mordredand that something started agent23:59
clarkbits trying to talk to a master23:59
pabelangerclarkb: I must have just manually applied my fix and never pushed up the fix :(23:59
mordredAND using the default value of 'puppet' - that's fantastic23:59
pabelangerthe puppet issue, I think I see that but didn't break anything on vexxhost23:59
mordreddims: ++ from me23:59
clarkbpabelanger: ok so maybe be patient then double check after its done? it hasn't failed yet23:59

