#openstack-meeting log

06:59:39 <blair> #startmeeting scientific-wg
06:59:40 <openstack> Meeting started Wed Jun  8 06:59:39 2016 UTC and is due to finish in 60 minutes.  The chair is blair. Information about MeetBot at http://wiki.debian.org/MeetBot.
06:59:41 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
06:59:44 <openstack> The meeting name has been set to 'scientific_wg'
06:59:48 <blair> #chair oneswig
06:59:49 <openstack> Current chairs: blair oneswig
07:00:16 <priteau> Hello
07:00:21 <oneswig> Good morning!
07:00:29 <oneswig> #topic roll-call
07:00:42 <oneswig> Who is here today?
07:00:44 <blair> present!
07:00:49 <oneswig> hello
07:01:20 <blair> we have an apology from Tim Bell, which is a bit unfortunate because I think he was pretty interested in Blazar
07:01:36 <priteau> I am Pierre Riteau from the Chameleon project
07:01:50 <priteau> I was invited by Stig Telfer to join the meeting
07:02:00 <oneswig> priteau: thanks for coming Pierre
07:02:12 <blair> +1
07:02:58 <oneswig> ... attendance is light this week ...
07:03:29 <verdurin> Adam from Francis Crick here - morning
07:03:37 <oneswig> Hi Adam
07:03:42 <blair> G'day Adam
07:04:01 <priteau> Good morning Adam
07:04:30 <oneswig> Shall we get going?
07:04:37 <blair> so maybe i can quickly tell you about the survey thing i added to the agenda (as i'll need to exit soon) and then we can move onto blazar?
07:04:58 <oneswig> #topic NeCTAR NPS
07:05:02 <oneswig> take it away Blair
07:05:20 <blair> just wanted to mention this so it's on the radar really
07:05:51 <blair> for those who don't know, the NeCTAR Research Cloud is a Science Cloud open to all government funded researchers in Australia
07:06:34 <blair> it has been operating since 2012 and scaled up between 2012-2015 to 8 sites and ~35k cores
07:06:58 <blair> we recently did our first major user survey
07:07:31 <blair> contacted over 6000 people/emails and got a pretty decent 600+ responses
07:07:47 <oneswig> that's a lot of datapoints
07:08:16 <oneswig> IIRC the OpenStack user survey is roughly the same sample size
07:08:28 <blair> it was a Net Promoter Score survey, so it's quite light in terms of participation, basically you just score the thing on a scale of 0-10
07:08:50 <blair> anything 6 or less is considered a detractor
07:09:02 <blair> i think 7 and 8 is neutral
07:09:09 <blair> 9 and 10 is promoter
07:09:29 <blair> (i could have those slightly out, might only be 7 that is neutral)
07:09:50 <blair> anyway, you end up with a score between -100 and +100 for the survey overall
07:10:07 <blair> and we got +24, which we're pretty happy with
07:10:38 <blair> there's also a "more info" field, and then detractors were followed up for more specific info if willing
07:10:53 <oneswig> That counts as approval, ie 0 is neutral in the final scoring right?
07:11:04 <blair> oneswig: yep that's right
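[editor's note] A minimal sketch of the scoring blair describes, assuming the standard Net Promoter convention (ratings from 0 to 10, where 0-6 is a detractor, 7-8 is passive and 9-10 is a promoter). The function name and the sample ratings are made up for illustration; they are not the NeCTAR responses.

```python
def net_promoter_score(ratings):
    """Return the NPS, a value between -100 and +100.

    Assumes the standard convention: ratings are integers 0..10,
    9-10 are promoters, 0-6 are detractors, 7-8 are passive.
    """
    ratings = list(ratings)
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 0 or r > 10 for r in ratings):
        raise ValueError("ratings must be between 0 and 10")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    # Passive ratings count only in the denominator.
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical sample, NOT the NeCTAR data:
# 50 promoters, 24 passives, 26 detractors.
sample = [10] * 50 + [8] * 24 + [5] * 26
print(net_promoter_score(sample))  # 24.0
```

The score is the percentage of promoters minus the percentage of detractors, which is why 0 is the neutral point oneswig asks about.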
07:11:20 <blair> that score in general NPS terms is quite good i believe
07:11:33 <oneswig> Great to hear it.  Any ideas on why?
07:11:55 <blair> though compared to similar infrastructure things it's a bit lower apparently
07:12:23 <blair> most of the detractors had usability or stability gripes
07:12:56 <oneswig> At least they are fixable, to some degree
07:13:32 <blair> the stability ones could be largely due to early problems with storage in the initial zone throughout 2012, but also probably because this cloud operates across university datacentres and each year there is usually at least one or two zones offline for a weekend (e.g. power works)
07:13:32 <verdurin> blair: will you be re-running this annually, six-monthly...?
07:13:45 <blair> yes that's the intention now
07:13:56 <oneswig> priteau: any idea how satisfied chameleon users are?
07:14:07 <blair> but of course just targeting active users and/or users who have recently become inactive
07:14:48 <priteau> oneswig: We haven't run a survey across all users yet. We got some good feedback from users who successfully used the testbed to run experiments and published their results
07:15:09 <blair> anyhow, wrapping up this item. we have a good pile of data here that may be useful to the UX team, so i'm working to de-identify it and pass it to Piet
07:15:33 <oneswig> That's a great idea Blair, get more out of it that way
07:16:01 <oneswig> priteau: I saw Paul Ruth talk at Austin with Kate, he seemed pretty pleased
07:16:07 <blair> yeah, as you can imagine a lot of researchers have trouble when you give them Horizon
07:16:37 <oneswig> #idea is there any existing way to feed data like this as input to UX?
07:17:06 <blair> it doesn't seem like it, currently have a thread going asking various foundation staff
07:17:42 <oneswig> It's a really useful thing to do, I'll ask around at Cambridge to see if anything similar happens
07:17:46 <blair> ideally it would be passed to the foundation and then they'd pass it to blessed projects, so there are no issues with personal liability
07:18:33 <oneswig> Any more to cover on this agenda item?
07:18:53 <blair> no let's make use of priteau while we've got him :-)
07:19:00 <oneswig> Blair I'll note an action and move on
07:19:24 <oneswig> #action blair's going to anonymise NPS data for UX purposes
07:19:39 <oneswig> #topic Blazar and use cases
07:19:56 <oneswig> priteau: can you give an overview of what it does for Chameleon?
07:20:06 <priteau> Absolutely
07:20:35 <priteau> So as you may know the Blazar project implements Reservation as a Service for OpenStack
07:21:17 <priteau> In Chameleon we make use of its ability to reserve physical hosts
07:22:28 <priteau> Before running any experiments, Chameleon users interact with Blazar (generally through the Horizon dashboard, but it can also be from the CLI) to reserve one or several physical nodes for their experiments
07:22:55 <priteau> The nodes are exclusively available to them for the duration of their reservation
07:23:05 <oneswig> At a future time, or from now for six days (say)
07:23:34 <priteau> Reservations can start now or at any time in the future
07:24:09 <blair> priteau: forgive my ignorance, but how does Blazar actually implement that? does it manipulate aggregates or something to make nodes unavailable to nova scheduler and then force-host onto them as required?
07:24:09 <priteau> Our main requirement was to allow running large scale experiments, e.g. using all resources of the testbed
07:24:26 <oneswig> And if the reservation is accepted, it's essentially a contract with the infrastructure?
07:24:43 <priteau> for this users can take a reservation some time in advance, which would not be possible if they were just relying on launching on-demand instances with Nova
07:24:43 <verdurin> blair: looks like it uses aggregates
07:25:18 <priteau> blair: yes it manipulates host-aggregates and forces them to be used with a custom Nova scheduler filter
07:25:33 <priteau> users have to pass a scheduler hint to launch instances inside a specific reservation
07:25:52 <priteau> (we've made this process easy to do through Horizon)
07:26:36 <verdurin> priteau: once a reservation has become active, can it be changed?
07:26:40 <priteau> if no scheduler hint is given, Nova will schedule within the set of hosts not managed by Blazar, which in our case is empty, making it mandatory to use Blazar
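[editor's note] A rough sketch of the filtering behaviour priteau describes, written as plain Python rather than against the real Nova or Blazar code. All names here (`host_passes`, `managed_hosts`, `active_reservations`, the host and reservation ids) are hypothetical, and the sketch ignores details such as checking which project owns a reservation.

```python
def host_passes(host, managed_hosts, active_reservations, reservation_hint=None):
    """Decide whether `host` is eligible for a new instance.

    managed_hosts: set of hosts registered with the reservation service.
    active_reservations: dict mapping a reservation id to its set of hosts.
    reservation_hint: reservation id passed by the user, or None.
    """
    if reservation_hint is None:
        # No hint: only hosts outside the reservation service are eligible.
        return host not in managed_hosts
    # Hint given: only the hosts held by that reservation are eligible.
    return host in active_reservations.get(reservation_hint, set())


def eligible_hosts(all_hosts, managed_hosts, active_reservations, reservation_hint=None):
    """Return the hosts that pass the filter, in their original order."""
    return [
        h for h in all_hosts
        if host_passes(h, managed_hosts, active_reservations, reservation_hint)
    ]


# Hypothetical testbed where every host is managed, as priteau says of Chameleon.
hosts = ["node1", "node2", "node3"]
managed = set(hosts)
reservations = {"res-a": {"node1", "node2"}}

print(eligible_hosts(hosts, managed, reservations))           # []
print(eligible_hosts(hosts, managed, reservations, "res-a"))  # ['node1', 'node2']
```

With every host managed, a request without a hint has nowhere to land, which matches the point that using a reservation becomes mandatory.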
07:26:53 <verdurin> For example, if users find they need more resources than they had predicted.
07:27:26 <priteau> verdurin: only in time, i.e. you can shorten or extend the duration
07:27:45 <priteau> but they cannot be modified in space
07:28:02 <oneswig> priteau: but there's no segregation between reserved hosts and "on-demand" hosts in the same project, right?
07:28:06 <priteau> this is a feature we would like to have but it is not on our short term roadmap yet
07:28:39 <verdurin> priteau: at the moment, the workaround would be to create a new reservation with immediate start?
07:28:54 <priteau> verdurin: correct
07:28:56 <blair> ok that's cool, so right now it sounds pretty useful with the main drawback being you have to segregate the reserved infrastructure
07:29:07 <priteau> oneswig: I am not sure I understand your question
07:29:50 <oneswig> priteau: I meant, all the hosts for a project (blazar-managed or not) could still share the same east-west tenant networks?
07:30:24 <blair> would sure be nice if it worked in tandem with a preempt-able instance feature
07:30:42 <blair> priteau: can you have soft and hard reservations ?
07:31:10 <priteau> oneswig: right, networking is completely independent of Blazar
07:31:25 <oneswig> priteau: thanks.  Who else is using it, do you know?
07:32:04 <priteau> blair: I believe it is only hard reservations
07:32:52 <oneswig> blair: are you thinking of something like spot market opportunistic availability?
07:33:05 <blair> oneswig: exactly
07:33:14 <oneswig> blair: me too :-)
07:33:39 <blair> or at least a way to make sure that ephemeral workload can leverage the reserved infrastructure when there is not a reservation active
07:33:44 <verdurin> yes, that's something we've wanted for a long time
07:34:08 <oneswig> priteau: how much effort do you need to put into maintaining Blazar and keeping it from bit-rot?
07:34:28 <priteau> oneswig: I had a meeting some time ago with previous contributors to Blazar, from Mirantis & Red Hat. There was someone interested in using it for a project related to NFV, I can't find the name right now
07:34:35 <blair> as it sounds like for a general purpose science cloud where you want this functionality you'd need to hold compute capacity aside for blazar from the rest of your usual on-demand fleet
07:35:27 <blair> it would definitely be of interest for NeCTAR
07:35:58 <blair> priteau: does it support AZs?
07:36:03 <oneswig> priteau: Chameleon uses Ironic, is Blazar managing those compute nodes or other (virtualised) nodes?
07:36:17 <priteau> oneswig: There was a substantial effort in making Blazar more stable at first, as some basic validation of user input was missing or in some conditions it would fail and leave Nova in a bad state
07:36:55 <priteau> And as we operate the testbed we regularly make improvements
07:37:21 <priteau> just last week I fixed an eventlet bug triggered by concurrent operations
07:37:48 <priteau> blair: I don't know if it supports AZs, we don't use them in Chameleon
07:37:57 <aloga> priteau: how do you handle backfilling of nodes?
07:38:18 <priteau> oneswig: in Chameleon, Blazar only manages Ironic bare-metal hosts, not KVM hypervisor nodes
07:39:02 <priteau> aloga: we don't, but our use case is very specialized, so I see how this would be needed for a general purpose compute cloud
07:39:25 <aloga> blair: regarding preemptible instances, you should have a look at https://review.openstack.org/#/c/104883/
07:39:26 <oneswig> This use case is interesting for the Intel-Rackspace OSIC.  The idea that people could (to some degree) do self-service reservations of bare metal of arbitrary sizes
07:39:55 <dariov> hello folks!
07:39:55 <priteau> note that in Blazar we use the physical host reservation feature, but there is also support for reserving virtualized instances (which I am not familiar with)
07:40:01 <aloga> priteau: I do see this as a problem, if a user reserves one node 1 week in advance, the node won't be used during that week
07:40:41 <blair> aloga: thanks for pointing that out, will definitely review
07:40:44 <aloga> blair: and of course at https://github.com/indigo-dc/opie
07:40:57 <priteau> aloga: I may have misunderstood what you meant by backfilling. If user Alice reserves a node one week in advance, Bob can still reserve the same node now until next week
07:41:22 <priteau> Blazar makes sure that reservations don't overlap
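[editor's note] A minimal sketch of the non-overlap rule for a single node, using the Alice and Bob example above. This is not Blazar's code; the function names and the half-open interval convention (a reservation may start at the exact moment another ends) are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def overlaps(start_a, end_a, start_b, end_b):
    """True if half-open intervals [start_a, end_a) and [start_b, end_b) intersect."""
    return start_a < end_b and start_b < end_a


def can_reserve(existing, start, end):
    """Check a requested (start, end) against existing reservations of one node."""
    if start >= end:
        raise ValueError("a reservation must end after it starts")
    return not any(overlaps(start, end, s, e) for s, e in existing)


now = datetime(2016, 6, 8, 8, 0)
next_week = now + timedelta(weeks=1)

# Alice reserved the node one week in advance, for two days.
alice = (next_week, next_week + timedelta(days=2))

# Bob can hold the same node from now until Alice's reservation starts.
print(can_reserve([alice], now, next_week))                       # True
# Running one hour into Alice's slot is rejected.
print(can_reserve([alice], now, next_week + timedelta(hours=1)))  # False
```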
07:41:27 <aloga> priteau: maybe it is my lack of knowledge about blazar (I joined late and I was having a look at the logs)
07:41:52 <aloga> priteau: if alice reserves the node in today + 1 week and nobody else reserves that node using blazar
07:42:18 <aloga> the node is not going to be used, as it will be removed from the "normal" nova aggregate
07:42:20 <aloga> right?
07:42:48 <priteau> aloga: right, so if no other Blazar reservation is made for that node, then it isn't used, at least using the physical host reservation feature of Blazar
07:42:59 <priteau> as I said I am not familiar with the other modes of reservation
07:43:50 <blair> aloga: opie looks interesting, am i right in thinking it extends the existing nova-scheduler? some of the code looks familiar
07:43:51 <aloga> priteau: thanks for the information :)
07:44:03 <priteau> oneswig: note that for making Blazar and Ironic "play nice together" we had to make some customizations. Blazar expects one nova-compute per physical node, while Ironic is designed to run one nova-compute per cluster
07:44:20 <oneswig> priteau: it sounds like the issue here is that nodes that are reservable by blazar must be partitioned from others, is that the case?
07:44:29 <aloga> priteau: I think that preemptible instances would be a good complement for blazar, as it would allow filling the node until the reservation starts
07:44:35 <aloga> blair: yes, you are right
07:44:48 <priteau> aloga: yes indeed it would be a good complement
07:45:25 <aloga> blair: the scheduling algorithm is described in the spec
07:45:51 <aloga> blair: but the plumbing is basically the filtering algorithm
07:45:55 <priteau> oneswig: yes, when you register hosts to be managed by Blazar then they cannot be used without reservation anymore
07:46:02 <aloga> s/filtering algorithm/filter scheduler/
07:46:47 <oneswig> priteau: do you have future plans for developing Blazar not covered yet?
07:47:05 <blair> aloga: which spec, the one you mentioned earlier?
07:47:09 <aloga> blair: the code needs a brush up, I am currently working on this, but I'll keep you updated
07:47:11 <aloga> blair: yes
07:47:41 <blair> oh ok, i didn't see any reference to "opie" when i had a quick look
07:47:46 <aloga> blair: basically opie's code is the materialization of the spec
07:47:57 <priteau> oneswig: first, I have many patches to Blazar that I would like to push upstream: https://github.com/ChameleonCloud/blazar/commits/chameleon
07:48:49 <oneswig> priteau: does Blazar have CI defined for it?
07:49:40 <priteau> oneswig: for additional development, we would like to 1) improve the resource selection capabilities of Blazar 2) add support for Keystone v3 domains to the Blazar client
07:50:12 <priteau> oneswig: see one of the latest patches posted: https://review.openstack.org/#/c/325747/
07:50:27 <oneswig> priteau: just looked - should have done before asking :p)
07:50:56 <oneswig> Any more to cover on Blazar?
07:51:28 <oneswig> #topic Reference architectures and user stories
07:51:36 <priteau> oneswig: there are unit tests and tempest tests
07:51:58 <oneswig> OK, so we have some work underway to document our latest reference architecture at Cambridge Uni
07:52:05 <oneswig> Not much to report on that.
07:52:12 <oneswig> Just rolling on.
07:52:40 <oneswig> I am interested to know, is there anything similar for Chameleon, or for Indigo DC,
07:53:07 <oneswig> something that might help the Foundation's pages on how research/scientific compute can be done on OpenStack?
07:53:30 <blair> i'm gonna run folks, might be missing last call for beers! ;-)
07:53:39 <oneswig> Thanks blair, cheers
07:53:42 <aloga> blair: enjoy
07:53:49 <aloga> oneswig: yes, there is
07:53:53 <blair> thanks priteau and aloga!
07:53:56 <priteau> thanks blair
07:54:02 <blair> cya oneswig
07:54:08 <verdurin> bye blair
07:54:35 <aloga> oneswig: https://arxiv.org/abs/1603.09536
07:55:20 <oneswig> HEP in this case - high energy physics?
07:55:26 <aloga> oneswig: yes
07:55:29 <priteau> oneswig: we don't have this publicly available yet
07:55:44 <verdurin> oneswig: I know I need to submit something about eMedLab
07:56:08 <priteau> oneswig: is there a particular format that would be needed by the Foundation?
07:56:08 <oneswig> verdurin: that would be great!
07:56:33 <aloga> oneswig: there's also an EU deliverable, but it is a much larger version
07:57:15 <oneswig> #link www.openstack.org/user-stories - there are case studies linked to from there
07:57:28 <aloga> oneswig: however, the paper is focused on HEP, but indigo goes far beyond HEP
07:57:50 <oneswig> I think there are other places where the pages are more "live" and link to blogs and external pages that are continually updated
07:58:58 <oneswig> It would be a great help to contribute case studies to link use cases with people using OpenStack in that way
07:59:25 <oneswig> for example, I think we've all learned stuff today we didn't expect
07:59:56 <oneswig> Ah, we are out of time for the week.
08:00:15 <oneswig> Thank you for coming!
08:00:45 <oneswig> I'll read on with interest
08:00:46 <priteau> oneswig: thank you for inviting me! I hope my input on Blazar was helpful
08:00:50 <aloga> oneswig: thank you for chairing
08:00:56 <oneswig> certainly was
08:00:59 <verdurin> Very interesting, priteau
08:01:07 <oneswig> #endmeeting