15:00:32 <ttx> #startmeeting large_scale_sig
15:00:33 <openstack> Meeting started Wed Dec  2 15:00:32 2020 UTC and is due to finish in 60 minutes.  The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:34 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:37 <openstack> The meeting name has been set to 'large_scale_sig'
15:00:39 <ttx> #topic Rollcall
15:00:44 <mdelavergne> Hi o/
15:00:44 <ttx> Who is here for the Large Scale SIG meeting ?
15:00:49 <genekuo_> Hi o/
15:01:13 <ttx> amorin is probably busy. If not, he should be
15:01:45 <ttx> belmoreira: maybe around?
15:02:02 <belmoreira> o/
15:02:07 <belmoreira> thanks for the ping
15:02:38 <ttx> hoping to see imtiaz soon
15:02:51 <ttx> Alright, let's get started
15:02:56 <ttx> Our agenda for today is at:
15:02:58 <ttx> #link https://etherpad.openstack.org/p/large-scale-sig-meeting
15:03:23 <ttx> checking for late-added agenda items
15:03:37 <ttx> #topic Review previous meetings action items
15:03:43 <ttx> "ttx to refactor wiki and other etherpads into that "journey" FAQ view"
15:03:47 <ttx> So... I did reorganize all pages under:
15:03:51 <ttx> #link https://wiki.openstack.org/wiki/Large_Scale_SIG
15:04:04 <ttx> If you have a look... you can see, I split everything into 4 subpages, one for each stage of the journey
15:04:14 <ttx> For each subpage there is a FAQ and a few resource links
15:04:22 <ttx> + a section on other SIG work pertaining to that stage
15:04:39 <jpward> o/
15:04:44 <genekuo_> Yeah I saw that, I'll be adding something to those pages before the next meeting
15:04:49 <ttx> jpward: hi! welcome
15:05:01 <liuyulong> Hi
15:05:05 <ttx> yes, please feel free to add questions, answers or links relevant to the stage
15:05:11 <ttx> liuyulong: hi!
15:05:14 <genekuo_> Hi, seeing new people
15:05:37 <ttx> I'll finish reviewing previous point and we'll do an introduction round
15:05:50 <ttx> At our next meeting we'll review each stage contents, so please spend some time reviewing those in the next two weeks!
15:06:00 <ttx> #action all to review pages under https://wiki.openstack.org/wiki/Large_Scale_SIG in preparation for next meeting
15:06:07 <ttx> Also let me know if you have trouble logging into the wiki.
15:06:22 <ttx> The goal being, anyone in the SIG should feel free to update those pages, not just me
15:06:46 <ttx> comments on that point?
15:07:04 <genekuo_> The structure looks pretty good to me
15:07:19 <imtiazc> The 4 categories look good. Could we add "upgrade" story as well?
15:07:55 <ttx> imtiazc: hi! I feel like upgrade is an orthogonal concern at each stage
15:08:01 <ttx> unless we add it as a 5th stage
15:08:19 <ttx> like, once you have scaled out, how the hell do you upgrade?
15:08:38 <imtiazc> exactly. It does become quite challenging.
15:08:54 <genekuo_> I can probably add to that, we're planning on doing an upgrade maybe next year
15:09:04 <genekuo_> Should be a lot of pain
15:09:31 <ttx> I like the idea of a 5th stage. Does that make sense to you, mdelavergne, belmoreira?
15:10:00 <ttx> I mean, if upgrading at high scale has unique constraints, it makes sense for us to document it
15:10:30 <mdelavergne> mmh yep !
15:10:43 <mdelavergne> upgrade/maintain ?
15:10:47 <ttx> OK, I'll create a skeleton page
15:10:55 <ttx> "rinse, repeat"
15:11:23 <ttx> #action ttx to add 5th stage around upgrade and maintain scaled out systems in operation
15:11:28 <liuyulong> configuration tuning means you need to restart/reload/respawn the processes/services/agents and so on, so it looks like an upgrade already. : )
15:11:44 <ttx> ok, next action item from last meeting was...
15:11:49 <ttx> "genekuo to review/approve https://review.opendev.org/#/c/755069/"
15:11:52 <ttx> That's done
15:11:58 <ttx> then... "ttx to set up release jobs and request a 0.1 release"
15:12:05 <ttx> (for oslo.metrics)
15:12:08 <imtiazc> By upgrade, I meant moving from one OpenStack release to another.
15:12:08 <ttx> I did that:
15:12:14 <ttx> #link https://review.opendev.org/c/openstack/project-config/+/763986
15:12:16 <ttx> #link https://review.opendev.org/c/openstack/releases/+/764631
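For context, the release request linked above goes into the openstack/releases repository as a small YAML deliverable file. A minimal sketch of what such an entry generally looks like (the path, field values and commit hash below are placeholders, not the actual oslo.metrics change):

    # deliverables/_independent/oslo.metrics.yaml -- illustrative placeholder values
    ---
    launchpad: oslo.metrics
    release-model: independent
    team: oslo
    type: library
    repository-settings:
      openstack/oslo.metrics: {}
    releases:
      - version: 0.1.0
        projects:
          - repo: openstack/oslo.metrics
            hash: 0000000000000000000000000000000000000000  # placeholder commit SHA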
15:12:21 <genekuo_> I've checked the PRs, seems one of them is blocked by CI
15:12:27 <ttx> release should be processed very soon now, maybe today
15:12:56 <ttx> err, failed again. Will have a look :)
15:13:17 <ttx> #action ttx to make sure oslo.metrics 0.1 is released
15:13:21 <ttx> "ttx to reschedule meeting to be biweekly on Wednesdays, 15utc"
15:13:31 <ttx> That's done: http://eavesdrop.openstack.org/#Large_Scale_SIG_Meeting
15:13:31 <genekuo_> I'll start working on oslo.messaging code once 0.1 is released
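For background, the idea behind oslo.metrics (as presented by LINE) is roughly that oslo.messaging pushes small metric records to a local collector process, which then exposes them to Prometheus. A purely illustrative Python sketch of that general pattern, not the actual oslo.metrics API (the socket path, record fields and function name are invented here):

    # Illustrative only -- not the real oslo.metrics API.
    import json
    import socket
    import time

    METRICS_SOCKET = "/var/run/openstack/metrics.sock"  # hypothetical path

    def send_rpc_metric(exchange, topic, duration_ms):
        """Push one RPC-call timing record to a local metrics collector."""
        record = json.dumps({
            "module": "oslo_messaging",
            "exchange": exchange,
            "topic": topic,
            "duration_ms": duration_ms,
            "timestamp": time.time(),
        }).encode("utf-8")
        # A datagram over a unix socket keeps the RPC hot path cheap and
        # avoids blocking if the collector is slow or absent.
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        try:
            sock.sendto(record, METRICS_SOCKET)
        except OSError:
            pass  # metrics are best-effort; never break the RPC call
        finally:
            sock.close()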
15:13:37 <ttx> and finally... "all to think about how to improve how we collect feedback (PTG/Summit postmortem)"
15:13:51 <ttx> we'll discuss that in this meeting after the round of intros
15:14:07 <ttx> #topic Introducing new SIG members
15:14:29 <ttx> I see two new faces, would be good to introduce ourselves and our interest in the Large Scale SIG
15:14:33 <ttx> I'll start
15:14:59 <ttx> I'm Thierry Carrez, VP of Engineering at the now Open Infrastructure Foundation. I'm helping drive this group because I have an interest in getting large users to contribute their experience running OpenStack. I also receive lots of questions from users who worry about the scaling journey, and would very much like us to have great answers for them
15:15:09 <ttx> (yes, I copied last week's intro)
15:15:15 <liuyulong> Hi, my name is LIU Yulong, I'm a core reviewer on the Neutron project. So I just want to see how many scaling issues/pains you guys see with Neutron. : )
15:15:22 <genekuo_> Hi, I'm Gene Kuo, working at LINE as an infrastructure engineer. Our team has been developing and operating OpenStack-based private clouds to run our services.
15:15:37 <ttx> liuyulong: neutron is definitely a hot topic around here
15:15:51 <ttx> especially since rabbitMQ started to behave a bit more sanely lately
15:15:51 <genekuo_> I also copied last week's intro.
15:16:12 <genekuo_> I probably cannot give a lot of feedback regarding neutron as we implemented our own plugins
15:16:14 <mdelavergne> Hi, I'm Marie Delavergne, PhD student working on large scale Openstacks :)
15:16:19 <ttx> Neutron is often the first scaling pain point, so we appreciate you visiting!
15:16:41 <liuyulong> Yes, I know that.
15:17:17 <liuyulong> I'm working at China Unicom now. We have some large deployments for public cloud.
15:17:27 <ttx> jpward: care to introduce yourself and tell us what you're interested in?
15:18:07 <jpward> I'm John Ward, and I work for Global InfoTek. We have a number of different OpenStack deployments; the largest one that I am working on is 15k cores, but we're looking to continue scaling.
15:18:48 <ttx> nice! I suspect it's already made of multiple clusters?
15:18:56 <ttx> or cells or..
15:19:23 <jpward> glad to be here, I bring some experience working on an even larger cloud; the Rackspace public cloud is my previous experience
15:19:41 <jpward> currently we don't have cells implemented, but that is on the road map
15:20:16 <ttx> Nice, very interested in hearing how your scaling went so far !
15:20:43 <genekuo_> Cool, nice to meet you
15:20:46 <ttx> So, for today's meeting we planned to discuss how to best collect feedback from experienced operators
15:20:51 <ttx> #topic How to best collect feedback from experienced operators?
15:21:05 <ttx> as I said a couple of weeks ago, during Victoria cycle we tried to use etherpads to collect scaling stories, then curate them onto a wiki page
15:21:11 <ttx> That was not very successful.
15:21:19 <ttx> (understatement of 2020)
15:21:29 <ttx> In contrast, we had several people sharing at our Opendev and Forum sessions around scaling
15:21:41 <ttx> So I was wondering if we should not change our strategy there
15:21:59 <ttx> Rather than run opendev and forum sessions about scaling, in hope that people will join the SIG and share more...
15:22:06 <imtiazc> I am Imtiaz Chowdhury. I am the Cloud Architect for Workday. At Workday, we have 45 clusters running over 9K hypervisors and now close to 500K cores. The deployment size is expected to double next year.
15:22:19 <ttx> Maybe we should run events specifically to collect those experiences, and not expect people to join the SIG afterwards
15:22:28 <ttx> or fill an etherpad
15:22:34 <ttx> What's your view on that?
15:22:34 <genekuo_> Yes, that's what I think too
15:22:54 <genekuo_> It's hard to get people to fill out an etherpad after work
15:23:13 <ttx> (I mean, if people join the SIG as regular members, that's awesome, but we should collect their scaling story without expecting them to join first)
15:23:32 <genekuo_> I think an event within the OpenInfra Summit and OpsMeetup is the best place to gather information
15:23:43 <ttx> Also it's hard to write and easier to just discuss
15:23:44 <imtiazc> Etherpad or any tool that allows collaborative editing: Google docs, Wiki
15:24:32 <ttx> So how about...
15:24:38 <genekuo_> People tend to get more active when there's an event or deadline
15:24:38 <ttx> We build a schedule of regular Large Scale SIG events (think ~ every 2 months)
15:24:47 <ttx> piggybacking on existing events (forum, opendev, ops meetup) if available, or running our own if not
15:25:03 <ttx> and use that to ask specific questions and collect output
15:25:07 <genekuo_> I would suggest piggybacking at first
15:25:36 <genekuo_> And advertise it as an event you can join even if you are not currently running at large scale
15:25:37 <ttx> genekuo_: yes, but there isn't much planned in the coming months. I have to double-check what the OpsMeetup has planned
15:26:04 <ttx> Other suggestions included:
15:26:07 <ttx> - Leverage superuser nominations to extract knowledge
15:26:08 <genekuo_> but planning to scale in the future
15:26:13 <ttx> - Reach out to past speakers that spoke on scaling
15:26:22 <ttx> - Engage with Chinese users
15:26:40 <ttx> so, more direct or narrow outreach
15:26:49 <liuyulong> The Neutron team has a mechanism that assigns a deputy each week to collect/filter the bugs.
15:27:04 <liuyulong> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-November/018782.html
15:27:10 <liuyulong> For instance ^
15:27:11 <imtiazc> I like those suggestions.
15:27:23 <liuyulong> So maybe this SIG could add such a routine for collecting scale-related information.
15:27:47 <liuyulong> And send a feedback mail to the community.
15:27:50 <ttx> liuyulong: that sounds good
15:27:58 <ttx> Superuser nominations, next round is a bit far away
15:28:09 <imtiazc> Is there a way we could facilitate connecting different large scale operators?
15:28:15 <ttx> but we could try to identify past summit talks on scaling
15:28:23 <ttx> and reach out to speakers
15:28:31 <genekuo_> Previous Superuser nominations will also work
15:28:35 <ttx> (in addition to actually extracting info from the video content)
15:29:10 <genekuo_> I can help out by reaching out directly to those users if manpower is needed.
15:29:41 <ttx> Re: old scaling presentations, I'll create an etherpad where we can dump our findings, organized per event
15:30:02 <ttx> I think that's a good resource to link to in our various stages page anyway
15:30:51 <ttx> #link https://etherpad.opendev.org/p/large-scale-sig-scaling-videos
15:31:51 <genekuo_> I can pick up some of those videos once the list is complete
15:33:21 <genekuo_> I'll pick up the listing for Shanghai
15:33:50 <genekuo_> Will do others if I have additional time
15:34:19 <ttx> If you have a few cycles, please assign yourself one of the summits and do a quick search for scale-related presentations
15:34:36 <ttx> If you watch any, feel free to drop notes and remarks on the etherpad too
15:34:50 <ttx> #action all to help in filling out https://etherpad.opendev.org/p/large-scale-sig-scaling-videos
15:35:13 <ttx> I'll check out the Ops meetups future plans
15:35:23 <ttx> #action ttx to check out Ops meetups future plans
15:35:40 <imtiazc> I shall look at the virtual Summit and Denver. I shall also add the past scaling presentations we did at Workday
15:35:53 <ttx> imtiazc: great, thanks
15:36:30 <ttx> as far as Chinese users go, I'll defer to Chinese contributors. It feels like we get a lot of engagement when we use China-specific social media
15:36:56 <ttx> so I was wondering if we could use that to ask simple questions from large scale deployments in China
15:37:25 <ttx> am open to suggestions on how to best proceed theer
15:37:29 <ttx> there*
15:38:31 <imtiazc> I am not sure about that. Could we get some help from Jonathan Bryce or Mark C here? They seem to at least have contacts of the large operators and sponsors from China
15:38:43 <genekuo_> hmm, I can ask Rico if he can help
15:38:55 <genekuo_> ricolin
15:39:29 <ttx> That suggestion was actually from Rico :)
15:39:38 <genekuo_> yeah I know
15:40:07 <ttx> OK, that sounds like great first steps. Any other suggestions?
15:41:17 <ttx> Alright then, moving on to next topic
15:41:19 <ttx> #topic Next meeting
15:41:27 <ttx> Our next meeting will be December 16.
15:41:35 <ttx> (Then we'll skip, and have the one after that on January 13)
15:41:51 <ttx> The main topic for that next meeting will be to review all stages, and identify simple tasks to do a first pass at improving those pages
15:41:53 <jpward> same time on the 16th?
15:41:57 <ttx> yes
15:42:11 <genekuo_> I'm ok
15:42:24 <mdelavergne> ok
15:42:43 <ttx> So between now and then, please check out the base content at https://wiki.openstack.org/wiki/Large_Scale_SIG and think a bit on how we can do a first pass at improving that
15:43:05 <ttx> I bet there are a few easy question/answers we could add
15:43:38 <genekuo_> yep
15:43:39 <ttx> Like, put yourself back into that stage of your own scaling story, and answer one of the early questions you had
15:44:04 <ttx> #topic Open discussion
15:44:20 <ttx> That is all we had on the agenda... Is there anything else you would like to discuss?
15:44:35 <genekuo_> Nope :)
15:44:49 <jpward> nice meeting everyone, nothing else from me
15:44:51 <imtiazc> I have a question on deployment story
15:44:58 <ttx> If not, I'll wrap up now and post the summary. We have a bunch of actions to work on between now and next meeting
15:45:07 <ttx> imtiazc: yes?
15:45:53 <imtiazc> What deployment tools work best for large scale deployment? We are aware of the limitations of TripleO but not so sure about Kolla.
15:47:03 <genekuo_> We currently write our own Ansible scripts and separate compute nodes into different host groups
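As a rough illustration of the host-group approach genekuo_ describes, an Ansible inventory can split compute nodes into groups so that changes roll through one group at a time (a minimal sketch; hostnames and group names are made up):

    # inventory/hosts -- hypothetical layout
    [controllers]
    ctl[01:03].example.net

    [compute_group_a]
    compute[001:200].example.net

    [compute_group_b]
    compute[201:400].example.net

    # parent group so plays can still target all compute nodes at once
    [compute:children]
    compute_group_a
    compute_group_b

A rollout can then be limited to one group at a time, e.g. ansible-playbook site.yml --limit compute_group_a.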
15:47:03 <ttx> I've heard good things about OpenStack-Ansible, but I don't run a deployment myself. What do you all use, if anything?
15:48:02 <ttx> jpward, belmoreira, liuyulong: any specific tooling?
15:48:20 <jpward> We are using salt for our deployment currently, I have used OSA and TripleO in the past
15:48:28 <imtiazc> We are currently using community-forked Chef-based tools along with some homegrown tools.
15:48:39 <ttx> wow lots of homegrown tools
15:50:19 <ttx> I thought there was more convergence toward community deployment tools, but maybe I imagined things
15:50:52 <imtiazc> ttx: Do you think "deployment" story could be added as stage zero to the list of categories?
15:50:57 <genekuo_> I personally used kolla-ansible for my own cluster before and had a good experience with it, but the scale was very small
15:51:01 <ttx> jpward: I had an unrelated question, how did you learn about the SIG?
15:52:50 <jpward> I ran across the wiki site one day when searching for something else
15:53:11 <ttx> ah? funny
15:53:30 <mdelavergne> unexpected
15:54:11 <ttx> yeah usually people can't find anything in the wiki
15:54:40 <ttx> alright, if nothing else...
15:54:52 <ttx> Let's continue the discussion at our next meeting
15:54:52 <mdelavergne> I like the idea of deployment as stage zero!
15:55:16 <belmoreira> sorry... couldn't follow the meeting... fighting some fires
15:55:17 <ttx> mdelavergne: we'll end up writing a complete guide to openstack :)
15:55:26 <mdelavergne> ahah
15:55:51 <ttx> belmoreira: it's ok, you can catch up with the logs
15:56:18 <ttx> belmoreira: I should know that, but do you use a specific deployment tooling to handle the CERRN deployment?
15:56:27 <ttx> CERN*
15:56:50 <belmoreira> for configuration management we use puppet
15:57:03 <belmoreira> we deploy OpenStack using puppet
15:57:17 <ttx> using the openstack-puppet upstream stuff?
15:57:17 <liuyulong> Our operators use Ansible, but the templates are written by them, not OpenStack-Ansible.
15:57:31 <belmoreira> yes, openstack-puppet
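For reference, a compute-node role built on the upstream puppet-openstack modules can be little more than a few class includes, with the actual settings typically supplied through Hiera (a minimal sketch using puppet-nova class names; the node regex is made up and this is not CERN's actual manifest):

    # site.pp -- minimal sketch using upstream puppet-nova classes
    node /^compute\d+\.example\.org$/ {
      include ::nova                    # base nova configuration
      include ::nova::compute           # nova-compute service
      include ::nova::compute::libvirt  # libvirt virtualization driver
    }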
15:57:47 <ttx> alright, thanks everyone, time to move to.. another meeting
15:57:55 <ttx> #endmeeting