19:01:17 #startmeeting infra
19:01:18 hi
19:01:18 Meeting started Tue Nov 26 19:01:17 2019 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:19 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:21 The meeting name has been set to 'infra'
19:01:28 #link http://lists.openstack.org/pipermail/openstack-infra/2019-November/006528.html Our Agenda
19:01:41 #topic Announcements
19:01:54 Just a note that this is a big holiday week(end) for those of us in the USA
19:02:13 I know many are already afk and I'll be afk no later than thursday :)
19:02:48 i'll likely be increasingly busy for the next two days as well
19:03:09 (much to my chagrin)
19:03:43 I guess I should also mention that OSF individual board member elections are coming up and now is the nomination period
19:04:14 #topic Actions from last meeting
19:04:31 thank you fungi for running last week's meeting, I failed at accounting for DST changes when scheduling a dentist visit
19:04:37 #link http://eavesdrop.openstack.org/meetings/infra/2019/infra.2019-11-19-19.05.txt minutes from last meeting
19:04:50 No actions recorded. Let's move on
19:04:51 no sweat
19:04:57 #topic Priority Efforts
19:05:09 #topic OpenDev
19:05:10 i half-assed it in hopes nobody would ask me to run another meeting ;)
19:05:19 #link https://etherpad.openstack.org/p/rCF58JvzbF Governance email draft
19:05:30 I've edited that draft after some input at the ptg
19:05:47 I think it is ready to go out and I'm mostly waiting for monday (to avoid getting lost in the holiday) and for thoughts on who to send it to?
19:06:15 I had thought about sending it to all the top level projects that are involved (at their -discuss mailing lists)
19:06:33 but worry that might create too many separate discussions? we unfortunately don't have a good centralized mechanism yet (though this proposal aims to create one)
19:06:51 I don't need an answer now, but if you have thoughts on the destination of that email please leave a note in the etherpad
19:07:04 you could post it to the infra ml, and then post notices to other related project mailing lists suggesting discussion on the infra ml
19:07:21 and linking to the copy in the infra ml archive
19:07:41 that should hopefully funnel discussion into one place
19:07:58 I'm willing to try that. If there are no objections I'll give that a go monday ish
19:08:29 ianw: any movement on the gitea git clone thing tonyb has run into?
19:08:46 we upgraded gitea and can still reproduce it, right?
19:08:50 not really, except it still happens with 1.9.6
19:08:51 fungi: yes
19:09:09 tonyb uploaded a repo that replicates it for all who tried
19:09:30 across different git versions and against specific and different backends
19:09:35 it seems we know what happens; git dies on the gitea end (without anything helpful) and the remote end doesn't notice and sits waiting ~ forever
19:09:41 which I think points to a bug in the gitea change to use go-git
19:10:01 ianw: and tcpdump doesn't show any Fins from gitea
19:10:21 we end up playing ack pong with each side acking bits that were previously transferred (to keep the tcp connection open)
19:10:38 i'm not sure on go-git; it seems it's a result of "git upload-pack" dying, which is (afaics) basically just system() called out to
19:10:44 ah
19:11:29 how long is that upload-pack call running, do we know?
19:11:42 could it be that go-git decides it's taking too long and kills it?
19:11:46 when I reproduce it takes about 20 seconds to hit the failure case
19:12:03 I don't think that is long enough for gitea to be killing it. However we can double check those timeout values
19:12:04 from watching what happens, it seems to chunk the calls and so about 9 go through, then the 10th (or so) fails quickly
19:12:42 we see the message constantly in the logs; but there don't seem to be that many reports of issues, though, only tonyb
19:12:51 this is observed straight to the gitea socket, no apache or anything proxying it right?
19:12:56 fungi: correct
19:13:08 ianw: there can be only one tonyb
19:13:10 (there is no apache in our gitea setup. just haproxy to gitea fwiw)
19:13:25 that's what i thought, thanks
19:15:00 i think we probably need custom builds with better debugging around the problem area to make progress
19:15:08 I guess the next step is to try and see why upload-pack fails (strace it maybe?) and then trace back up through gitea to see if it is the cause or simply not handling the failure properly?
19:15:25 I would expect that gitea should close the tcp connection if the git process under it failed
19:15:30 yeah, i have an strace in the github bug, that was sort of how we got started
19:15:35 ah
19:16:08 it turns out the error message is ascii bytes in decimal, which when you decode is actually a base-64 string, which when decoded, shows the same message captured by the strace :)
19:16:47 wow
19:17:17 i know mordred already has 1.10 patches up
19:17:43 i'm not sure if we want to spend effort on old releases?
19:17:45 yeah there were a few issues he had to work through, but maybe we address those and get to 1.10 then try to push upstream to help us debug further?
19:17:57 that sounds good
19:18:18 seems like a good next step. lets move on
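
A minimal sketch of the decoding described at 19:16:08, assuming the gitea log prints the git error as space-separated decimal ASCII codes that spell out a base64 string; the sample message below is a made-up stand-in, not the actual text from the gitea logs or the strace.

    import base64

    # Hypothetical stand-in for whatever git upload-pack actually emitted.
    original = b"fatal: the remote end hung up unexpectedly"

    # What the log would show: decimal ASCII codes of a base64-encoded copy.
    logged = " ".join(str(byte) for byte in base64.b64encode(original))

    # Reversing it: decimal codes -> characters -> base64 decode -> message.
    recovered = base64.b64decode(bytes(int(n) for n in logged.split()))
    assert recovered == original
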
19:18:26 #topic Update Config Management
19:18:41 zbr_ had asked about helping out on the mailing list and I tried to point to this topic
19:19:26 Long story short if you'd like to help us uplift our puppet into ansible and containers we appreciate the help greatly. Also most of the work can be done without root as we have a fairly robust testing system set up which will allow you to test it all before merging anything
19:19:27 it was a great read
19:19:33 Then once merged an infra-root can help deploy to production
19:19:50 ++ i think most tasks there stand-alone, have templates (i should reply with some prior examples) and are gate-testable with our cool testinfra setup
19:20:39 That was all I had on this topic. Anyone have other related items?
19:21:09 #topic Storyboard
19:21:18 fungi: diablo_rojo anything to mention about storyboard?
19:22:06 the api support for attachments merged
19:22:37 next step there is to negotiate and create a swift container for storyboard-dev to use
19:22:43 exciting
19:23:12 then the storyboard-webclient draft builds of the client side implementation for story attachments should be directly demonstrable
19:23:43 (now that we've got the drafts working correctly again after the logs.o.o move)
19:24:26 i guess we can also mention that the feature to allow regular expressions for cors and webclient access in the api merged
19:24:40 since that's what we needed to solve that challenge
19:25:17 so storyboard-dev.openstack.org now allows webclient builds to connect and be usable from anywhere, including your local system i suspect
19:25:43 (though i haven't tested that bit, not sure if it needs to be a publicly reachable webclient to make openid work correctly)
19:25:51 sounds like good progress on a couple fronts there
19:26:19 any suggestions on where we should put the attachments for storyboard-dev?
19:26:31 i know we have a few places we're using for zuul build logs now
19:26:51 maybe vexxhost would be willing to host storyboard attachments as I expect there will be far fewer of them than job log files?
19:27:09 for production we need to make sure it's a container which has public indexing disabled
19:27:22 less critical for storyboard-dev but important for production
19:27:33 fungi: I think we control that at a container level
19:27:38 (via x-meta settings)
19:28:03 (to ensure just anyone can't browse the container and find attachments for private stories)
19:28:16 cool
19:28:40 and yeah, again for storyboard-dev i don't think we care if we lose attachment objects
19:28:51 for production there wouldn't be an expiration on them though, unlike build logs
19:29:12 maybe we should work out a cross-cloud backup solution for that
19:29:29 to guard against unexpected data loss
19:29:57 I think swift supports that somehow too, but maybe we also have storyboard write twice?
19:30:29 yeah, we could probably fairly easily make it write a backup to a second swift endpoint/container
19:31:08 that at least gets us disaster recovery (though not rollback)
19:31:42 certainly enough to guard against a provider suddenly going away or suffering a catastrophic issue though
19:32:25 anyway, that's probably it for storyboard updates
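
One possible shape for the attachments container discussed above, sketched with python-swiftclient under the assumption that read access is controlled through the container ACL: ".r:*" allows anonymous object reads, while omitting ".rlistings" keeps the container index private (the "x-meta settings" mentioned at 19:27:38 may instead refer to staticweb's web-listings flag). The endpoint, credentials, and container name are placeholders.

    from swiftclient.client import Connection

    # Placeholder auth details, not any real provider's endpoint or account.
    conn = Connection(
        authurl="https://cloud.example.com:5000/v3",
        user="storyboard",
        key="secret",
        auth_version="3",
        os_options={"project_name": "storyboard",
                    "user_domain_name": "Default",
                    "project_domain_name": "Default"},
    )

    # Create the container and allow anonymous object reads without public
    # listing: ".r:*" grants reads; leaving out ".rlistings" disables indexing.
    conn.put_container("storyboard-dev-attachments",
                       headers={"X-Container-Read": ".r:*"})

A second Connection pointed at another provider could receive duplicate writes to cover the cross-cloud backup idea raised at 19:29:12.
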
19:32:45 #topic General Topics
19:32:51 fungi: anything new re wiki?
19:33:29 nope, keep failing to find time to move it forward
19:34:09 ianw: for static replacement are we ready to start creating new volumes?
19:34:21 I think the afs server is fully recovered from the outage? and we are releasing volumes successfully
19:34:55 yes, i keep meaning to do it, maybe give me an action item so i don't forget again
19:34:57 yeah, some releases still take a *really* long time, but they're not getting stuck any longer
19:35:18 #action ianw create AFS volumes for static.o.o replacement
19:35:44 though on a related note, we need to get reprepro puppetry translated to ansible so we can move our remaining mirroring to the mirror-update server. none of the reprepro mirrors currently take advantage of the remote release mechanism
19:35:46 fungi: yeah, the "wait 20 minutes from last write" we're trying with fedora isn't working
19:36:18 yeah, i started a little on reprepro but not pushed it yet, i don't think it's too hard
19:36:39 i think it shouldn't be too hard, it's basically a package, a handful of templated configs, maybe some precreated directories, and then cronjobs
19:36:41 its mostly about getting files in the correct places
19:36:46 there are a lot of files but otherwise not too bad
19:36:49 and a few wrapper scripts
19:37:38 Next up is the tox python version default changing due to the python used to install tox
19:37:42 #link http://lists.openstack.org/pipermail/openstack-discuss/2019-November/010957.html
19:37:59 ianw: fwiw I agree that the underlying issue is tox targets that require a specific python version and don't specify what that is
19:38:11 these tox configs are broken anywhere someone has installed tox with python3 instead of 2
19:38:30 yeah, i just wanted to call out that there wasn't too much of a response, so i think we can leave it as is
19:38:45 wfm
19:38:51 yep, with my openstack hat on (not speaking for the stable reviewers though) i feel like updating stable branch tox.ini files to be more explicit shouldn't be a concern
19:39:21 there's already an openstack stable branch policy carve-out for updating testing-related configuration in stable branches
19:39:24 I think we're just going to have to accept there will be bumps on the way to migrating away from python2
19:39:34 and we've run into other bumps too so this isn't unique
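
The tox.ini change being discussed is roughly the following: pin basepython for environments that genuinely need a specific interpreter, so the result no longer depends on which python was used to install tox itself. The environment names are illustrative, not taken from any particular repo.

    [testenv:py27]
    basepython = python2.7

    [testenv:pep8]
    # without this, the env runs under whichever python installed tox
    basepython = python3
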
19:40:03 And that takes us to mkolesni's topic
19:40:11 Hosting submariner on opendev.org.
19:40:15 thanks
19:40:19 I think we'd be happy to have you but there were questions about CI?
19:40:22 dgroisma, you there?
19:40:28 mkolesni: thanks for sticking around through 40 minutes of other discussion ;)
19:40:35 fungi, no prob :)
19:40:52 let me wake dgroisma ;)
19:41:18 we wanted to ask what it takes to move some of our repos to opendev.org
19:41:50 currently we have all our ci in travis on the github
19:42:11 The git import is usually pretty painless. We point our gerrit management scripts at an existing repo source that is publicly accessible and suck the existing repo content into gerrit. This does not touch the existing PRs or issues though
19:42:15 there are many questions around ci, the main one being if we could keep using travis
19:42:34 For CI we run Zuul and as far as I know travis doesn't integrate with gerrit
19:42:41 clarkb, yeah for sure the prs will have to be manually migrated
19:43:02 It may be possible to write zuul jobs that trigger travis jobs
19:43:14 That said my personal opinion is that much of the value in hosting with opendev is zuul
19:43:39 I think it would be a mistake to put effort into continuing to use travis (though maybe it would help us to understand your motivations for the move if Zuul is not part of that)
19:43:41 the short version on moving repos is that you define a short stanza for the repository including information on where to import existing branches/tags from, also define a gerrit review acl (or point to an existing acl) and then automation creates the projects in gerrit and gitea. after that you push up a change into your repo to add ci configuration for zuul so that changes can be merged (this can
19:43:43 be a no-op job to just merge whatever you say should be merged)
19:44:10 btw are there any k8s projects hosted on opendev?
19:44:22 how do you define a kubernetes project?
19:44:31 airship does a bunch with kubernetes
19:44:39 so do some openstack projects like magnum and zun
19:44:43 fungi: as does magnum and mnaser's k8s deployment tooling
19:44:47 one that's in golang for example :)
19:44:57 there are golang projects
19:45:10 yeah, programming language shouldn't matter
19:45:10 bits of airship are golang as an example
19:45:22 we have plenty of projects which aren't even in any programming language at all for that matter
19:45:42 for example, projects which contain only documentation
19:46:14 you guys suggested a zuul first approach
19:46:23 to transition to zuul and then do a migration
19:46:46 but there was hesitation for that as well since zuul will have to test github based code for a while
19:47:24 mkolesni: dgroisma how many jobs are we talking about and are they complex or do they do simple things like "execute unittests", "build docs", etc?
19:47:26 well, it wouldn't have to be the zuul we're running. zuul is free software anyone can run wherever they like
19:47:54 My hunch is they can't be too complex due to travis' limitations
19:48:11 clarkb, dgroisma knows best and can answer that
19:48:14 and if that is the case quickly adding jobs in zuul after migrating shouldn't be too difficult and is something we can help with
19:48:17 the jobs are a bit complex, we are dealing with multicluster and require multiple k8s clusters to run for e2e stuff
19:48:37 dgroisma: and does travis provide that or do your jobs interact with external clusters?
19:48:50 the clusters are kind based (kubernetes in docker), so its just running a bunch of containers
19:48:56 is that a travis feature, or something you've developed that happens as part of your job payload?
19:49:09 fungi, well currently we rely on github and travis and dont have our own infra so we'd prefer to avoid standing up the infra just for migration sake
19:49:27 mkolesni: totally makes sense, just pointing that out for clarity
19:49:33 ok sure
19:49:39 its our bash/go tooling
19:50:06 our tooling, not a travis feature
19:50:14 we use dapper images for the environment
19:50:19 okay, so from travis's perspective it's just some shell commands being executed in a generic *nix build environment?
19:50:32 yes
19:50:53 in that case, making ansible run the same commands ought to be easy enough
19:50:54 the migration should be ok, we just run some make commands
19:51:05 fungi: dgroisma mkolesni and we can actually prove that out pre migration
19:51:17 we have a sandbox repo which you can push job configs to which will run your jobs premerge
19:51:44 That is probably the easiest way to make sure zuul will work for you, then if you decide to migrate to opendev simply copy that job config into the repos once they migrate
19:52:03 that should give you some good exposure to gerrit and zuul too which will likely be useful in your decision making
19:52:04 yeah, you will probably also find that while you start with basically ansible running shell: mycommand.sh ... you'll find many advantages in getting ansible to do more and more of what mycommand.sh does over time
19:52:12 clarkb, so you mean do initial migration, test the jobs, and if all is good sync up whatever is left and carry on?
19:52:29 or is the sandbox where we stick the project itself?
19:52:41 mkolesni: no I mean, push jobs into opendev/sandbox which already exists in opendev to run your existing test jobs against your software
19:52:47 you could push up a change to the opendev/sandbox repo which replaces all the files with branch content from yours and a zuul config
19:52:57 it doesn't need to get approved/merge
19:53:00 ah ok
19:53:02 Then if you are happy with those results you can migrate the repos and copy the config you've built in the sandbox repo over to your migrated repos
19:53:12 this way you don't have to commit to much while you test it out and don't have to run your own zuul
19:53:12 zuul will test the change as written, including job configuration
19:53:25 dgroisma, does that approach sound good to you? for a poc of the CI?
19:53:37 yes sounds good
19:53:43 ok cool
19:53:56 do you guys have any questions for us?
19:54:10 i think the creators guide covers everything else we need
19:54:14 not really. it's all free/libre open source software right?
19:54:15 I'm mostly curious to hear what your motivation is if not CI (most people we talk to are driven by the CI we offer)
19:54:34 also we'd be happy to hear feedback on your experience fiddling with the sandbox repo and don't hesitate to ask questions
19:54:39 gerrit reviews
19:54:40 sounds like the ci is a motivation and they just want a smooth transition from their existing ci?
19:54:43 github sucks for collaborative development :)
19:54:50 oh neat we agree on that too :)
19:54:55 :)
19:55:16 and as former openstack devs we're quite familiar with gerrit and its many benefits
19:55:29 at least i didn't hear any indication they wanted a way to keep using travis any longer than needed
19:55:40 no i don
19:55:53 i don't think we're married to travis :)
19:56:07 ok sounds like we have a plan for moving forward. Once again feel free to ask questions as you interact with Zuul
19:56:20 welcome (back) to opendev! ;)
19:56:21 I'm going to quickly try to get to the last couple topics before our hour is up
19:56:22 ok thanks we'll check out the sandbox repo
19:56:27 thank you very much
19:56:33 thanks for your time
19:56:34 ianw: want to tldr the dib container image fun?
19:56:40 mkolesni: dgroisma you're welcome
19:57:05 i would say my idea is that we have Dockerfile.opendev Dockerfile.zuul Dockerfile.
19:57:06 ianw: if I read your email correctly it is that layering doesn't work for our needs here and maybe we should just embrace that and have different dockerfiles?
19:57:18 and just build layers together that make sense
19:58:01 i don't know if everyone else was thinking the same way as me, but I had in my mind that there was one zuul/nodepool-builder image and that was the canonical source of nodepool-builder images
19:58:01 It did make me wonder if a sidecar approach would be more appropriate here
19:58:16 but I'm not sure what kind of rpc that would require (and we don't have in nodepool)
19:58:29 but i don't think that works, and isn't really the idea of containers anyway
19:58:34 and then we would publish container images for the things we're using into the opendev dockerhub namespace, even if there are images for that software in other namespaces too, as long as those images don't do what we specifically need? (opendev/gitea being an existing example)
19:58:45 fungi: ya that was how I read it
19:58:59 fungi: yep, that's right ... opendev namespace is just a collection of things that work together
19:59:08 i don't have any objection to this line of experimentation
19:59:11 with the sidecar idea I had it was don't try to layer everything but instead incorporate the various bits as separate containers
19:59:26 it may be useful for others, if they buy into all the same base bits opendev is built on
19:59:38 nodepool builder would run in its own container context then execute dib in another container context and somehow get the results (shared bind mount?)
19:59:39 yeah, putting those things in different containers makes sense when they're services
20:00:10 but putting openstacksdk in a different container from dib and nodepool in yet another container wouldn't work i don't think?
20:00:26 We are at time now
20:00:41 The last thing I wanted to mention is I've started to take some simple notes on maybe retiring some services?
20:00:42 no, adding openstacksdk does basically bring you to multiple inheritance, which complicates matters
20:00:45 #link https://etherpad.openstack.org/infra-service-list
20:00:57 thanks clarkb
20:00:58 ianw: fungi ya I don't think the sidecar is a perfect fit
20:01:22 re opendev services, if you have a moment over tea/coffee/food it would be great for a quick look and thoughts
20:01:37 I think if we can identify a small number of services then we can start to retire them in a controlled fashion
20:01:56 (mostly the ask stuff is what brought this up in my head because it comes up periodically that ask stops working and we really don't have the time to keep it working)
20:02:00 thanks everyone!
20:02:03 #endmeeting