14:01:26 <ttx> #startmeeting large_scale_sig
14:01:26 <openstack> Meeting started Wed Nov 18 14:01:26 2020 UTC and is due to finish in 60 minutes.  The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:28 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:30 <openstack> The meeting name has been set to 'large_scale_sig'
14:01:31 <ttx> #topic Rollcall
14:01:35 <ttx> Who is here for the Large Scale SIG meeting ?
14:01:39 <belmoreira> o/
14:01:41 <genekuo> o/
14:01:51 <ttx> amorin: ?
14:01:55 <ttx> mdelavergne: ?
14:02:07 <mdelavergne> Hi!
14:02:25 <amorin> hey!
14:02:28 <amorin> I am here!
14:02:34 <amorin> thanks for ping :)
14:02:36 <ttx> perfect, let's get started
14:02:41 <ttx> Our agenda for today is at:
14:02:44 <ttx> #link https://etherpad.openstack.org/p/large-scale-sig-meeting
14:02:51 <ttx> #topic PTG/Summit postmortem
14:03:04 <ttx> A few weeks ago we had our Forum and PTG activities, and I'd like to discuss how successful that was
14:03:11 <ttx> My view on it is that the Forum session was a success with new people sharing experience
14:03:22 <ttx> The two PTG meetings were not as successful with only a couple of people showing up
14:03:28 <ttx> What do people think?
14:04:19 <ttx> Personally, I think that means we need to separate SIG activities (the inner circle) from the sharing of experience (outer circle)
14:04:42 <genekuo> Yeah, I think the summit had more operators attending
14:04:48 <ttx> as it's a lot easier to get people to show up when you ask a specific question, than asking people to join a SIG meeting
14:05:00 <genekuo> agree
14:05:07 <ttx> For reference here is the link to the Forum and PTG etherpads:
14:05:11 <ttx> #link https://etherpad.opendev.org/p/vSummit2020_OpenStackScalingStory
14:05:16 <ttx> #link https://etherpad.opendev.org/p/wallaby-ptg-largescale-sig
14:05:34 <belmoreira> ttx if I remember well I didn't attend the PTG sessions because of the timeslot
14:05:43 <ttx> amorin, mdelavergne what did you think
14:05:59 <ttx> belmoreira: our hope was that some of the people from the week before would show up
14:06:07 <amorin> I really enjoyed the summit part, with a lot of operators giving feedback
14:06:07 <ttx> but i think that was misguided
14:06:19 <amorin> unfortunately I was not available for PTG, so I missed that
14:06:28 <mdelavergne> nothing much, sorry I was really occupied these last few weeks, I missed a lot :(
14:06:52 <amorin> anyway, I have the feeling that operators have things to say
14:07:02 <amorin> I don't know how we can involve them more in our meetings
14:07:11 <ttx> I've been giving it some thought
14:07:20 <amorin> you want to split the group in 2 parts?
14:07:23 <ttx> and we discussed that with genekuo at the PTG sessions
14:07:38 <belmoreira> ttx, true. But for some reason we see europeans participating more in these meetings and if I recall correctly it was at the beginning and end of the day. For example for me it was impossible to attend
14:07:38 <genekuo> I think we discussed collaborating with OpsMeetup during the PTG
14:07:45 <ttx> amorin: I think expecting people to join the SIG meeting and share their experience is not going to happen
14:08:03 <ttx> We need to use other types of exceptional events to raise the questions we care about and get the feedback we need
14:08:23 <ttx> But that will be the topic for the next meeting
14:08:27 <ttx> as we have lots to cover today
14:08:51 <amorin> ack
14:08:56 <ttx> but yes, homework before next meeting will be to think about how we can more efficiently collect feedback
14:08:59 <belmoreira> ttx I'm just trying to understand why, maybe, the PTG sessions didn't have a wider participation
14:09:37 <ttx> belmoreira: yes that explains why the regulars did not show up, but not why the expected new people did not show up either :)
14:09:53 <ttx> #topic Wallaby setup
14:09:57 <ttx> So as we start a new cycle I'd like to discuss three things:
14:10:07 <ttx> meeting frequency, update co-chairs, and organization of workstreams
14:10:17 <ttx> Regarding meeting frequency, during the PTG we discussed moving back to a single biweekly meeting
14:10:27 <ttx> It seems our experience with rotating meeting times resulted in spreading out the team without getting regular new members
14:10:42 <ttx> So I ran a poll and we arrived at that 14utc slot
14:10:52 <ttx> We might push it back one hour to 15utc to be more friendly to imtiaz (15utc was a popular time in the survey too)
14:11:06 <amorin> ok for me
14:11:12 <ttx> (he wanted to join today but said 6am was a little early)
14:11:22 <belmoreira> I'm OK with both slots
14:11:23 <genekuo> I'm ok, even a bit later at 16 utc
14:11:36 <amorin> same here, 16utc is my last slot
14:11:48 <belmoreira> uh... 16 UTC can be more difficult for me
14:11:53 <ttx> Let's try 15utc. I don't want to over-adapt around someone that has not attended any meeting yet
14:12:00 <genekuo> no problem
14:12:09 <mdelavergne> 15 or 16 is fine by me
14:12:13 <ttx> Regarding co-chairs, I'd like to update the list since we have not seen Pengju Jiao recently
14:12:26 <ttx> Anyone interested? the role is about helping keep track of workstream status and (when I'm not available) helping chair meetings
14:12:30 <ttx> belmoreira: interested in continuing?
14:12:46 <ttx> anyone else ?
14:12:50 <belmoreira> sure
14:12:52 <genekuo> I'm ok
14:13:11 <genekuo> to help out
14:13:11 <ttx> genekuo: ok to be chair or ok not to be chair? :)
14:13:26 <ttx> ok to be chair, that works
14:13:28 <genekuo> ok to be co-chiar
14:13:31 <ttx> the more the merrier
14:13:32 <genekuo> *chair
14:13:47 <ttx> I'll update and replace Pengju Jiao with Gene Kuo
14:13:57 <ttx> Now... Regarding organization of workstreams
14:14:01 <ttx> #topic Reorganize work into a scaling journey
14:14:19 <ttx> During PTG with genekuo we discussed reorganizing our goals into more of a scaling journey
14:14:25 <ttx> Like as you scale up, you:
14:14:31 <ttx> 1. configure/optimize your single cluster to handle additional load
14:14:39 <ttx> 2. monitor your cluster to detect strain/limits
14:14:46 <ttx> 3. scale up until you reach limits
14:14:55 <ttx> 4. scale out to multiple clusters/regions/cells once you reach limits
14:15:06 <ttx> And our work is essentially about helping people throughout that journey
14:15:20 <ttx> So for (1) we work on documenting configuration options for large scale
14:15:30 <ttx> For (2) we work on oslo.metrics and other meaningful monitoring solutions
14:15:41 <ttx> For (3) we identify and push back low-hanging-fruit limits to scaling within one cluster
14:15:52 <ttx> (That can include documenting how to properly shard a RabbitMQ install, for example)
14:16:02 <ttx> and finally for (4) we explain the various options and how to do it
14:16:22 <ttx> I feel like that makes our purpose a lot clearer. It's just about helping people at various stages of this journey.
14:16:36 <ttx> And it does not have to be complete or anything. Every bit helps.
14:16:44 <ttx> Does that make sense to you?
14:16:49 <amorin> yes
14:16:53 <belmoreira> yes, I like the idea
14:16:57 <genekuo> yep
14:17:14 <ttx> it's a lot less abstract than setting artificial "goals" imho, and also less intimidating
14:17:32 <ttx> In terms of output format, I was thinking we could use a FAQ format. That way we can list common questions, and provide/improve answers when we have them
14:17:45 <ttx> FAQs have two interesting properties:
14:17:53 <ttx> - Listing questions is as important as listing answers
14:18:01 <ttx> - It's OK if the FAQ is constantly work in progress
14:18:17 <ttx> so again, less intimidating than a doc or a white paper that has to be "finished"
14:18:23 <ttx> How does that sound?
14:18:26 <mdelavergne> yep, good idea
14:18:39 <amorin> faq on wiki for example?
14:18:44 <amorin> or directly in documentation?
14:18:55 <genekuo> Yes, it's easier to answer questions than to write a document from scratch
14:18:56 <belmoreira> +1
14:18:58 <ttx> yes, I was thinking refactoring the wiki to describe the journey
14:18:59 <amorin> I'd like to have it in doc directly
14:19:10 <genekuo> Agree with doc
14:19:12 <ttx> it can still point to doc artifacts
14:19:17 <amorin> usually, operators are looking at config options, deployments etc
14:19:23 <genekuo> I rarely refer to wiki for OpenStack stuff
14:19:33 <amorin> if we have a page over there, it could be nice IMHO
14:19:42 <amorin> genekuo same here
14:19:43 <ttx> Like obviously, oslo.metrics will not live in a FAQ, it will be a proper software library
14:19:51 <ttx> same for documenting options
14:19:51 <amorin> yup
14:20:13 <ttx> But the main artifact of our group should be this set of wiki pages (at least for now)
14:20:21 <ttx> that describe the journey and answer questions
14:20:25 <mdelavergne> wherever this is put, we can still link it from the other anyway ?
14:20:27 <ttx> pointing to other materials
14:20:42 <amorin> ok for me
14:20:50 <genekuo> sounds good
14:21:03 <ttx> Like one question could be "my database is getting quite large with stale entries, what should I do about it" and point to OSarchiver
14:21:39 <amorin> :)
14:21:55 <ttx> personally I have a lot more questions than answers, so I like a format that lets me ask them, even if for now we don't have all the answers
14:22:17 <ttx> OK if that sounds good, I can work on refactoring our wiki docs
14:22:23 <genekuo> good starting point :)
14:22:24 <ttx> #action ttx to refactor wiki and other etherpads into that "journey" FAQ view
14:22:26 <mdelavergne> yep, at some point somebody will have the answer and add it, hopefully
14:23:02 <ttx> I'll replace all our etherpads with a single one that tracks what the SIG is actively working on
14:23:11 <ttx> (rather than one per stream)
14:23:45 <genekuo> agreed, we have a lot of scattered pages, better reorganize or link them in one page
14:23:49 <ttx> Any other suggestion on how we should reorganize how we work?
14:24:19 <genekuo> :)
14:24:45 <ttx> ok moving on to next topic
14:24:49 <ttx> #topic The road to oslo.metrics 1.0
14:24:59 <ttx> So one objective we have for the Wallaby cycle (so between now and April 2021) is to get to a proper oslo.metrics release
14:25:18 <ttx> Before our first 0.1 release we should have:
14:25:21 <ttx> - Basic tests (https://review.opendev.org/#/c/755069/)
14:25:26 <ttx> - Latest metrics code (?)
14:25:35 <ttx> Then before the end of the Wallaby cycle we should have:
14:25:41 <ttx> - oslo-messaging metrics code (https://review.opendev.org/#/c/761848/)
14:25:48 <ttx> - Enable bandit (issue to fix with predictable path for metrics socket ?)
14:25:52 <ttx> - Improve tests to get closer to 100% coverage
14:25:56 <ttx> genekuo: anything I missed?
14:26:00 <genekuo> ttx: I basically have the latest main code merged in the last commit
14:26:20 <genekuo> The rest of them are mostly additional metrics which I would like to add after 0.1 release
14:26:22 <ttx> genekuo: with the socket patch?
14:26:26 <genekuo> Yep
14:26:28 <ttx> ah ok
14:26:42 <ttx> So we could merge the basic tests
14:27:00 <ttx> and then do a 0.1 release that you can start using in oslo.messaging
14:27:39 <genekuo> Ah, I should have another patch handling sigterm, but it shouldn't affect the test
14:27:47 <genekuo> Let me update it later this week
14:27:56 <genekuo> Yes, thanks!
14:28:05 <ttx> genekuo: I'll add you to oslo-metrics-core so you can +2 my patch
14:28:17 <genekuo> OK
14:28:24 <ttx> ok you should be able to +2a now
14:28:34 <ttx> if you reload the page
14:28:59 <ttx> then I'll check what it takes to do a 0.1 release
14:29:13 <genekuo> thank you
14:29:27 <ttx> #action genekuo to review/approve https://review.opendev.org/#/c/755069/
14:29:43 <ttx> #action ttx to set up release jobs and request a 0.1 release
14:30:05 <ttx> That should keep us busy between now and next meeting
14:30:15 <ttx> #topic Next meeting
14:30:23 <ttx> So our next meeting should be December 2.
14:30:45 <ttx> We'll do 15utc, one hour later compared to today, in this channel if it's available
14:30:51 <ttx> #action ttx to reschedule meeting to be biweekly on Wednesdays, 15utc
14:31:01 <amorin> there is a chance that I will not be available
14:31:47 <ttx> I'd rather keep that meeting date that way we can do the next one on Dec 16
14:31:53 <ttx> then skip for holidays
14:32:13 <ttx> and be back on January 13
14:32:40 <genekuo> I'm ok with the timeline
14:33:02 <ttx> is there anything else we can make progress on between now and next meeting?
14:33:19 <ttx> amorin: was there any progress on the OSops/OSarchiver side?
14:34:20 <amorin> nop, was not able to find time for this
14:34:38 <ttx> I suspect recent weeks at OVHCloud have been kind of busy
14:34:49 <amorin> yes :)
14:35:13 <ttx> OK anything else, anyone?
14:35:47 <ttx> oh, I should write an action item for the homework
14:36:11 <ttx> #action all to think about how to improve how we collect feedback
14:36:19 <ttx> since that was not a stellar success during the Victoria cycle...
14:36:46 <ttx> Since we have a few more minutes, I'll talk about what we discussed in the PTG sessions
14:37:12 <ttx> The suggestion there was to make those feedback sessions more "exceptional" than a standing etherpad open for comments
14:37:43 <ttx> Like we've had those etherpads like the scaling-stories etherpad open, with calls on the mailing-list for people to add experience to it
14:37:47 <ttx> that's just not working
14:38:15 <ttx> But when we create a session about "sharing your scaling story" at an ops meetup or a Forum, people come
14:38:54 <ttx> so basically, making it an event, and focusing it on one question
14:39:13 <genekuo> agree
14:39:31 <mdelavergne> should it be written or audio ?
14:39:33 <ttx> Could be just piggybacking on existing events, every time they happen... or maybe create our own "event"
14:39:53 <ttx> like a monthly "question to operators" session on Zoom or whatever
14:40:08 <amorin> audio/video is really nice in my opinion
14:40:45 <ttx> I think IRC is very efficient for work group meetings as its minutes are recorded and you can easily take #actions and such
14:40:56 <ttx> But not for ops sharing experiences
14:41:11 <ttx> for that audio/video seems to be a plus, or in-person
14:41:27 <ttx> More work for us to extract learnings from that, but that's better than no feedback at all
14:41:39 <belmoreira> if we keep this session (maybe more than one slot) in the forum it's a good start
14:42:04 <belmoreira> also, an ops event... (not sure if the ops team is organizing something virtual)
14:42:11 <ttx> Yes, obviously we should make sure we leverage future forums. But once every 6 months might not be enough :)
14:42:46 <ttx> So that's the general idea I had, but you can noodle on it for two weeks and we'll discuss it again on Dec 2
14:43:01 <genekuo> I also vote for collaborating with ops event
14:43:21 <ttx> Also, make the SIG group more comfortable for existing attendees, rather than jump through hoops for hypothetical new members
14:43:49 <ttx> that means obviously creating an inner circle and an outer circle, but that's what we have anyway, the regulars and the others
14:44:27 <ttx> Yes we should track what the ops meetups group is going to do next
14:44:44 <ttx> Alright, that is all I had for today. Anything else we should be doing ?
14:45:10 <genekuo> ttx: does the ops meetup have a regular irc meeting?
14:45:16 <ttx> let me see
14:45:18 <genekuo> would like to join if possible
14:45:23 <imtiazc> Sorry for joining a bit late despite my best intentions to wake up earlier. Do we have a Zoom session as well for this meeting?
14:45:35 <ttx> http://eavesdrop.openstack.org/#OpenStack_Ops_Meetup_Team
14:45:53 <ttx> imtiazc: no only IRC, makes it easier to write minutes :)
14:46:03 <genekuo> thanks ttx
14:46:13 <ttx> imtiazc: thanks for joining, we agreed to move one hour later for the next meeting
14:46:21 <ttx> So December 2, 15utc
14:46:31 <imtiazc> Thank you @ttx
14:46:32 <ttx> I'll post the summary and logs for this meeting
14:46:53 <ttx> imtiazc: we have a few minutes left, maybe you can introduce yourself?
14:46:57 <ttx> Then we will
14:47:12 <genekuo> :)
14:48:21 <ttx> or maybe we can all introduce ourselves at the same time
14:48:26 <imtiazc> Sure, this is Imtiaz Chowdhury, Cloud Architect from Workday. At Workday, we have been deploying OpenStack based private clouds for 7 years.
14:48:59 <genekuo> Hi!
14:50:05 <ttx> I'm Thierry Carrez, VP Engineering at the now Open Infrastructure Foundation. I'm helping drive this group because I have an interest in getting large users to contribute their experience running openstack, and receive lots of questions from users that worry about the scaling journey and would very much like that we have great answers to that
14:50:32 <ttx> So even if I don't have any answers, I help grease the wheels of the group
14:50:34 <genekuo> I'm Gene Kuo, working at LINE as Infrastructure engineer. Our team have been developing and operating OpenStack based private clouds to run our services.
14:50:56 <amorin> I am Arnaud Morin, from OVHCloud, we are running OpenStack to provide a public cloud
14:51:08 <amorin> I am involved in the team which deploys and manages the infrastructure
14:51:38 <mdelavergne> Hi, I'm Marie Delavergne, phd student for Inria and I work on scaling OpenStack in different ways :)
14:51:53 <belmoreira> I'm Belmiro Moreira. I work at CERN as Cloud Engineer
14:53:00 <ttx> imtiazc: I invite you to read the logs of this meeting, should give you a lot of insights on how we plan to reorganize work of the group for the Wallaby cycle
14:53:12 <ttx> and we'll talk again in two weeks
14:53:29 <ttx> Anything else before we close, anyone?
14:54:08 <genekuo> Nope :)
14:54:11 <ttx> Alright then... thanks!
14:54:15 <ttx> #endmeeting