21:00:23 <oneswig> #startmeeting scientific-wg
21:00:24 <openstack> Meeting started Tue Sep  5 21:00:23 2017 UTC and is due to finish in 60 minutes.  The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:26 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:28 <openstack> The meeting name has been set to 'scientific_wg'
21:00:28 <zioproto> hello
21:00:31 <hogepodge> hi
21:00:39 <oneswig> hello and good evening etc.
21:00:43 <rbudden> hello
21:00:58 <martial> Hello all
21:00:58 <priteau> Hello!
21:01:02 <oneswig> #link agenda for today is https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_September_5th_2017
21:01:20 <oneswig> #chair martial
21:01:21 <openstack> Current chairs: martial oneswig
21:02:01 <oneswig> Quite a few topics to cover today, let's get rolling
21:02:23 <oneswig> #topic opportunistic capacity on OpenStack
21:02:58 <oneswig> Blair was particularly interested in this - should we defer until he's joined?
21:03:12 <oneswig> Let's cover item 2 first
21:03:23 <oneswig> #topic private cloud capacity meter
21:03:54 <oneswig> OK so this item was triggered by the discussion on metrics for instance availability a few weeks back
21:04:38 <oneswig> As an example of how new API capabilities can be used, John Garbutt put together a demo
21:05:01 <oneswig> for measuring cloud available capacity using (some of) the new Nova placement APIs
21:05:18 <oneswig> #link os-capacity tool https://github.com/johngarbutt/os-capacity
21:05:35 <oneswig> Works particularly well for bare metal clouds!
21:05:47 <zioproto> cool, I just went through the README. I guess you need to run it with Admin credentials right ?
21:06:06 <oneswig> yes, it's an admin tool, unless your users are especially empowered
21:06:39 <priteau> oneswig: Does it require the placement API?
21:06:47 <b1airo> morning and sorry for tardiness - bit of a morning meltdown happening with #1 here
21:07:08 <oneswig> Yes, but not the new features introduced in Pike - we use it on Ocata - though one day it will be improved to use the new Pike features
21:07:19 <oneswig> Hi b1airo
21:07:23 <oneswig> #chair b1airo
21:07:24 <openstack> Current chairs: b1airo martial oneswig
21:07:33 <oneswig> just on os-capacity
21:07:36 <martial> welcome blair
21:08:03 <oneswig> What it helps with is the disconnect between (eg) SLURM queues and cloud about how to handle being full.
21:08:22 <oneswig> cloud says 'no', slurm says 'join the queue'
21:08:37 <oneswig> At least now we have an idea of how much resource we can ask for
21:09:20 <oneswig> OK - just wanted to offer that up - share and enjoy :-)
21:09:38 <oneswig> Back to the agenda
21:09:47 <oneswig> #topic opportunistic cloud capacity
21:09:58 <oneswig> b1airo: take it away
21:10:44 <b1airo> wanted to take a survey of what people/deployers are doing to address this use-case today
21:11:18 <b1airo> by opportunistic capacity i mean something slightly different to the usual "on-demand" associated with cloud-computing
21:12:18 <b1airo> my experience of "on-demand" in the private/community cloud space is that it really means on-demand until the cloud is full, then again for a little while after each upgrade, but generally it becomes hard to launch e.g. larger flavours actually on-demand
21:13:47 <b1airo> i'd like to carve out some compute capacity for groups who have burst / speculative use-cases and are happy to be able to launch e.g. one or two 16 core instances for 24 hours with some basic fairness mechanism arbitrating
21:14:15 <b1airo> the simplest idea today seems to be:
21:14:27 <b1airo> 1) create a separate AZ for it
21:14:54 <b1airo> 2) create a new project for each existing project that wants access (to control quota)
21:15:09 <b1airo> 3) give that new project access to use the AZ
21:15:59 <b1airo> 4) run watcher and killer scripts that randomly kill stuff older than X hours
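The kill-older-than-X logic in step 4 can be sketched in a few lines of Python. Everything here is an assumption rather than an existing tool: the `expired_servers` helper, the 24-hour limit, and the commented openstacksdk calls are all hypothetical illustrations of the recipe above.

```python
from datetime import datetime, timedelta, timezone

# Maximum instance lifetime in the opportunistic AZ (assumed value, per
# b1airo's "older than X hours" / "24 hours" example above).
MAX_AGE_HOURS = 24

def expired_servers(servers, now=None, max_age_hours=MAX_AGE_HOURS):
    """Return the IDs of servers older than max_age_hours.

    `servers` is an iterable of (server_id, created_at) pairs, where
    created_at is an ISO-8601 timestamp as returned by the Nova API,
    e.g. "2017-09-05T21:00:23Z".
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=max_age_hours)
    expired = []
    for server_id, created_at in servers:
        created = datetime.strptime(created_at, "%Y-%m-%dT%H:%M:%SZ")
        created = created.replace(tzinfo=timezone.utc)
        if created < cutoff:
            expired.append(server_id)
    return expired

# In a real killer script the pairs would come from the opportunistic AZ,
# e.g. via openstacksdk (assumed usage, run with admin credentials):
#   conn = openstack.connect(cloud='mycloud')
#   pairs = [(s.id, s.created_at) for s in conn.compute.servers(
#                all_projects=True, availability_zone='opportunistic')]
#   for sid in expired_servers(pairs):
#       conn.compute.delete_server(sid)
```

The same loop doubles as the "always keep Y capacity free" variant mentioned below: instead of deleting everything past the limit, delete the oldest expired instances until free capacity reaches Y.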
21:16:45 <oneswig> b1airo: would you have it so that there was some kind of kill-to-fill LRU execution when an instance is requested and the AZ is full?
21:17:09 <zioproto> b1airo: watcher and killer scripts, are you considering Openstack Mistral ?
21:17:41 <b1airo> that'd certainly be ideal oneswig, but implementing that is a nightmare i reckon, would be easier to have external scripts just ensure there is always Y capacity free
21:18:03 <b1airo> and if no instances older than limit then the zone is full
21:18:24 <zioproto> b1airo: we have a similar use case where we have to make sure instances run by students are killed every night at midnight. The use case is different, but we have the same concept of killing resources after they have been running for a while
21:19:16 <oneswig> zioproto: do you use mistral for that, as you suggest?
21:19:34 <zioproto> oneswig: no we don't
21:20:03 <b1airo> does sound similar zioproto, i guess that is for a lab setup?
21:20:44 <b1airo> re. mistral, maybe... is it a good fit?
21:20:47 <zioproto> yes, we are using ansible based stuff to delete the instances
21:21:07 <zioproto> just because we needed to set this up quickly, and we did not have time to learn another tool just for this task
21:22:08 <b1airo> anyway, i think this general use case is very common and something that OpenStack really needs to address
21:22:37 <priteau> b1airo: In our team we call "on-availability" the opposite of "on-demand". One of our projects is combining OpenStack for on-demand and Torque for on-availability, where the compute nodes are moved from one to the other depending on usage. We are hoping to publish results at a conference in 2018. This is quite different from your solution and not relying only on OpenStack though.
21:23:07 <b1airo> e.g. in the Nectar cloud we have an allocations system with 1,3,6,12 month project lengths and an expiry process. but even with 7 zones of 3-4k cores we still run into this problem
21:23:30 <oneswig> priteau: does that mean that torque queues up OpenStack API requests that couldn't be satisfied?
21:23:41 <StefanPaetowJisc> evening folks. Pardon the tardiness
21:23:44 <b1airo> priteau, "on-availability" - i like it! have not heard that term in this context before
21:23:46 <oneswig> Hi StefanPaetowJisc
21:23:53 <rbudden> priteau: that’s similar to the limited use cases we’ve had for large scale VMs. We’ve traditionally had the nodes placed in a Slurm reservation, then turned into Nova Computes on demand via bash/ansible/etc.
21:23:59 <zioproto> b1airo: I am reading as we speak the python code my colleague wrote. The project is decorated with an attribute. Reading that attribute, our custom python code decides whether to kill or shut down the instances in the project.
21:24:18 <b1airo> o/ StefanPaetowJisc
21:24:18 <priteau> oneswig: no, they're two separate queues with possibly different groups of users. But behind the scenes they're sharing the same cluster.
21:24:31 <rbudden> which reminds me, i still owe b1airo an email about this ;)
21:24:49 <b1airo> oh hey rbudden o/
21:25:21 <oneswig> John Garbutt asked me to prompt people interested in preemptible instances (which essentially is the user-centric effect of this concept)
21:25:25 <priteau> b1airo: Not directly related to the above: the Blazar team will be meeting for the Denver PTG next week and will discuss the idea of the "reaper" service that was proposed in Boston
21:25:30 <oneswig> If they could review and comment on https://review.openstack.org/#/c/438640
21:25:48 <oneswig> If they haven't done so already.  This will inform discussion at the PTG next week.
21:26:01 <martial> priteau +1
21:26:15 <oneswig> So please take a look if you want a spot instance capability on your cloud
21:26:22 <zioproto> #link WIP: Backlog spec on preemptible servers https://review.openstack.org/#/c/438640
21:26:37 <oneswig> zioproto: the very same :-)
21:26:49 <zioproto> oneswig: yes I just formatted it for the MeetBot
21:26:57 <oneswig> thanks zioproto
21:27:23 <oneswig> priteau: how's the gui for blazar?
21:27:52 <priteau> oneswig: It's upstream!
21:28:01 <priteau> https://git.openstack.org/cgit/openstack/blazar-dashboard/
21:28:05 <oneswig> nice work
21:28:11 <b1airo> nice
21:28:38 <oneswig> We may want this, sooner rather than later, our ska system is getting very busy
21:29:00 <oneswig> I'll be in touch priteau...
21:29:05 <priteau> Sounds good
21:31:09 <oneswig> OK, anything more to add on opportunistic usage?
21:31:38 <b1airo> i'm interested to know if people think it is ok to have a different system/api to meet this use-case
21:32:25 <martial> b1airo: I think that is how some people do it, so I would vote yes
21:32:34 <b1airo> or whether it should be through Nova API and therefore require some API changes to instance creation, i.e., a richer NoValidHost
21:33:08 <martial> I do like the blazar solution
21:33:25 <b1airo> martial, the implication if that is the case is we as a community should make an effort to ease and demonstrate that integration for newcomers
21:34:16 <b1airo> there are really not that many combinations to worry about, e.g., Nova+SLURM and Nova+PBS would probably cover ~80%
21:34:24 <priteau> rbudden: Do you have a writeup of your Slurm/Nova solution somewhere?
21:34:55 <rbudden> priteau: I owe an email about this to b1airo, I can include you on it if you’d like ;)
21:35:03 <priteau> Yes please!
21:35:14 <martial> rbudden: can you add me as well?
21:35:18 <rbudden> Everything is just getting back to normal after some vacation and our Bridges upgrades
21:35:21 <rbudden> Martial: sure thing
21:35:28 <martial> thx
21:35:38 <b1airo> thanks rbudden!
21:35:39 <rbudden> I’ll warn you it’s nothing super fancy
21:35:46 <oneswig> b1airo: On the all-openstack side, I think there are liabilities with queuing to get an instance that nova may be wary of - e.g., what if I was delayed in creating an instance and then found the resources upon which I depended were gone?
21:35:48 <zioproto> I don't know if it is related, but I did some testing in running 1000 VMs with a single 'openstack server create' command
21:35:59 <rbudden> largely utilizing Availability Zones and metadata tagging of hypervisors and Nova flavors
21:36:02 <zioproto> the idea is to be able to run that big a number of VMs, but only for a short time
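For reference, a multi-instance launch like zioproto describes maps to Nova's `min_count`/`max_count` fields in the server create request (the same thing `openstack server create --min/--max` sends). A minimal sketch of the request body follows; `burst_request` is a hypothetical helper and the flavor/image/network names are placeholders.

```python
def burst_request(flavor_ref, image_ref, network_id, count, name="burst"):
    """Build a Nova POST /servers body asking for `count` identical
    instances in one API call (min_count == max_count asks Nova to
    boot exactly that many or fail the request)."""
    return {
        "server": {
            "name": name,
            "flavorRef": flavor_ref,
            "imageRef": image_ref,
            "networks": [{"uuid": network_id}],
            "min_count": count,
            "max_count": count,
        }
    }

# Example: the body for a 1000-VM test (all IDs are placeholders)
body = burst_request("m1.small-id", "ubuntu-16.04-id", "net-uuid", 1000)
```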
21:36:04 <rockyg> rbudden, you should also consider giving it to the folks who publish superuser blog
21:36:34 <oneswig> Hi rockyg!
21:36:40 <rbudden> rockyg: sounds interesting, i can check into that unless someone has a direct contact I can use?
21:36:45 <rockyg> Hey!  been lurking
21:37:02 <oneswig> rbudden: there's a whole chapter on this... in the book... hint...
21:37:04 <rockyg> Nicole ???
21:37:08 <zioproto> where is a good start to read about PBS+nova, given that I never used PBS ?
21:37:27 <rbudden> oneswig: thanks! i have a copy in front of me on the bookshelf, i’ll check it out!
21:37:43 <rbudden> zioproto: I just did a similar test using Nova/Ironic during our upgrade
21:38:01 <rbudden> I believe only on the order of 500 nodes in a single instantiation
21:38:04 <oneswig> rbudden: might be a case study for the second edition?
21:38:16 <rbudden> yes, i’ll have notes on this for the book update!
21:38:41 <rbudden> moved to local boot across all nodes and was able to test and verify the Nova scheduler bug fix for this that’s mentioned in the first edition of the book
21:38:42 <rockyg> Nicole Martinelli, rbudden
21:38:50 <rbudden> rockyg: thx
21:39:21 <zioproto> rbudden: #link https://cloudblog.switch.ch/2017/08/28/starting-1000-instances-on-switchengines/
21:40:38 <rbudden> cool, i’ll check out the link
21:40:39 <martial> zioproto: very nice indeed
21:41:45 <oneswig> zioproto: you should get your blog onto planet.openstack.org, if it isn't already?
21:42:04 <rockyg> ++ to that.
21:42:12 <zioproto> #action zioproto check if his blog is already on planet openstack
21:43:03 <oneswig> OK, move on?
21:43:22 <oneswig> #topic book update
21:43:37 <oneswig> The second edition is taking shape nicely.
21:43:53 <oneswig> Thank you to everyone who has contributed their time and input so far.
21:44:07 <oneswig> We have some case studies to fill still
21:44:28 <oneswig> 1) Bare metal infrastructure management case study please, to accompany Bridges and Chameleon
21:44:56 <oneswig> 2) Federation examples to be proposed for the new section, led by Enol
21:45:18 <hogepodge> I'm here to remind everyone of the deadline.
21:45:37 <rbudden> oneswig: as mentioned in my email to you this morning, i’ll be working on the Bridges update this week.
21:45:49 <oneswig> Thanks rbudden, appreciated
21:46:04 <rbudden> I was delaying in hopes of having more time to play with some Neutron integration, but other tasks have unfortunately had me preoccupied
21:46:22 <priteau> I have done most of the update of the Chameleon case study this morning, will still provide a few more changes later this week
21:47:11 <rbudden> I’m attempting a skip-level upgrade from Liberty -> Ocata on our second cluster… I doubt I’ll have it complete before the deadline, but I’ll keep everyone apprised
21:48:00 <oneswig> Excellent.  What of the federators in this time zone?
21:48:52 <oneswig> (... obviously busy debugging SAML issues...)
21:50:02 <hogepodge> Does the team feel like it's on track to deliver the update in a few weeks?
21:50:11 <StefanPaetowJisc> Sorry, debugging non-SAML stuff here... feverishly trying to get GSSAPI (Moonshot) done for an HPC-SIG meeting next week :-)
21:50:30 <StefanPaetowJisc> Still haven't looked at the book spec :-(
21:50:32 <oneswig> hogepodge: I think so, many people have been responsive
21:50:53 <oneswig> Good luck StefanPaetowJisc, keep us updated!
21:50:54 <hogepodge> Excellent. Is there anything I need to take back to the Foundation team?
21:51:31 <hogepodge> We're hoping that the book will have an exciting color image, btw. :-D
21:51:58 <martial> hogepodge: still the plan, we have to discuss a cut off date for review but we are on track
21:52:10 <oneswig> hogepodge: nothing comes to mind for the foundation team right now, thanks
21:52:15 <martial> color :)
21:52:34 <oneswig> hogepodge: how will you decide on a cover?
21:53:15 <oneswig> BTW - one issue - does anyone have Adobe Illustrator?  We can read the .ai files (they are actually PDFs) but not edit them.
21:53:19 <hogepodge> oneswig: the previous book used a research image from one of our community members. If you get an image to us, we can get it to our design team to build out the cover
21:54:39 <oneswig> Interesting idea... Can the WG members run a poll do you think?  I'm sure you and the team would pick a good one.
21:54:43 <StefanPaetowJisc> Hmmmm
21:54:49 <StefanPaetowJisc> I have AI somewhere...
21:55:05 <StefanPaetowJisc> I have AI CS4.
21:55:08 <StefanPaetowJisc> If that helps
21:55:21 <b1airo> pretty sure i can get Adobe suite if required
21:55:32 <martial> same as b1airo
21:55:32 <oneswig> StefanPaetowJisc: it could well. Can I bear that in mind? Same to you b1airo
21:56:00 <b1airo> hogepodge, i was wondering about that - we might be able to get something from Monash
21:56:17 <oneswig> I sense a poll ...
21:56:23 <oneswig> OK, 1 final topic to squeeze in - can we do it?
21:56:40 <b1airo> i will talk to my colleague who is very good with this sort of stuff and spends hours making slide decks :-)
21:56:44 <StefanPaetowJisc> Ok, oneswig.
21:56:46 <oneswig> #topic SWG -> SSIG?
21:57:02 <oneswig> b1airo: what's up?  Do we automatically become a SIG?
21:57:45 <zioproto> I will try to let you know about this soon
21:57:56 <zioproto> should be a topic in the UC
21:58:00 <martial> that is a conversation that was explained to us at the UC forum session in Boston
21:58:06 <zioproto> we skipped a meeting because of bank holiday in the US
21:58:28 <martial> but it seemed to Blair and I at the time that it seems so
21:58:42 <rockyg> So, you get to say yea/nay
21:58:46 <martial> zioproto, you will keep us updated it seems :)
21:58:47 <oneswig> Are there material changes to be aware of?
21:58:53 <rockyg> I don't know what happens if you don't pick.
21:59:10 <rockyg> Trying to get more devs involved.
21:59:51 <zioproto> as far as I understood the biggest change is that there will be a big mailing list with all SIGs
22:00:01 <StefanPaetowJisc> EWWW
22:00:02 <zioproto> and you have to write with your SIG in []
22:00:11 <zioproto> similar to openstack-dev mailing list
22:00:19 <b1airo> sorry i walked away to chase someone to pack their schoolbag o_0
22:00:32 <b1airo> (back to school for me!)
22:00:32 <oneswig> It doesn't sound all that different
22:00:43 <priteau> Isn't that what we already do?
22:00:43 <zioproto> oneswig: I would not worry too much
22:00:49 <oneswig> b1airo: it's been that week here, too
22:00:58 <rockyg> Yeah.  Hope is one ml will get more response and cross pollination
22:01:01 <b1airo> zioproto, that is my understanding too
22:01:02 <priteau> Let's skip the SIG and go straight to STIG ;-)
22:01:10 <zioproto> #link http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-sigs
22:01:25 <oneswig> is the major difference the expectation of less work and more interest?
22:01:28 <b1airo> Special Technical Interest Group!
22:01:33 <oneswig> priteau: too good....
22:01:35 <priteau> b1airo: exactly!
22:01:55 <b1airo> oneswig, i think that is one of the subtler expectations yeah
22:02:16 <b1airo> WGs were probably thought of originally as more autonomous and goal focused
22:02:24 <oneswig> Ah, we are over time.
22:02:35 <zioproto> good night !
22:02:48 <oneswig> But to conclude, this is not an issue of concern it seems
22:02:53 <oneswig> business as usual?
22:03:02 <oneswig> zioproto: thanks for staying up!
22:03:05 <b1airo> whereas SIGs are a way to get cliques together, and i think the UC would then like to introduce a few more guidelines to get useful and standardised outputs from those groups
22:03:22 <rbudden> gotta jet, goodbye everyone!
22:03:30 <b1airo> bye all!
22:03:30 <zioproto> b1airo: ok ! I take this input for the UC :)
22:03:38 <StefanPaetowJisc> bye rbudden
22:03:43 <oneswig> thanks everyone
22:03:50 <priteau> bye everyone
22:03:52 <zioproto> guys it is really late here, I have to leave to sleep, ciao :)
22:04:00 <oneswig> #endmeeting