16:01:15 #startmeeting blazar
16:01:16 * gouthamr reaches for trout shield, activates it, looks around suspiciously
16:01:17 Meeting started Thu May 9 16:01:15 2019 UTC and is due to finish in 60 minutes. The chair is priteau. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:18 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:21 The meeting name has been set to 'blazar'
16:01:44 #topic Roll call
16:03:22 Hi. This is the first Blazar meeting in the new timezone for Americas.
16:05:05 maybe everybody slept in
16:06:21 Hi turnerg. Are you interested in Blazar?
16:06:37 hi! I haven't really been involved in blazar, but I read some notes from the ptg about possible future ironic integration, and thought that maybe I'd lurk a bit
16:06:54 Yes; we have some GPUs that we're thinking about how to make available on a fair basis
16:07:44 Been thinking about Blazar off & on since 2016; I come from an HPC background where scheduling is a solved problem, +/- error
16:08:29 For the time being our cloud is not at capacity but, someday, it's gonna happen
16:08:32 Hi tzumainn. I remember your message on the mailing list back in February. You are working on the Hardware leasing with Ironic project, is that right?
16:09:12 Hi diurnalist
16:09:23 priteau, that's correct! I saw some notes from dtantsur in ironic that mentioned updating ironic with owner/lessee fields, and blazar possibly integrating with that
16:09:38 I am still expecting another person from University of Chicago to join, they're working through NickServ registration right now.
16:12:04 Let's start and they'll read through the logs later ;-)
16:12:10 #topic Presentation of participants
16:13:13 Hi jakecoll
16:13:23 Hey, sorry about that
16:13:24 Because this is the first meeting in this timezone, I thought it would be useful for everyone to present themselves and the OpenStack deployment or product they're working on, if any. And of course why they're interested in Blazar.
16:13:34 I'll start
16:14:40 I am Pierre Riteau, I am the PTL for Blazar in the Train cycle. I have used and developed Blazar since 2015 through the Chameleon testbed: https://www.chameleoncloud.org/
16:15:21 I am not directly involved with Chameleon anymore, but diurnalist and jakecoll are, so I'll let them talk about it
16:15:50 Let's go alphabetically. diurnalist?
16:17:05 I'm Jason Anderson, DevOps lead for Chameleon. Chameleon is a scientific testbed with ~500 bare metal nodes that we provision with Ironic. Users reserve nodes with Blazar, and we've also been working to extend the idea of reservations to VLAN segments and floating IPs. Blazar is a core part of how Chameleon works; we use it to track allocations on the testbed as well.
16:17:09 Actually it will be too slow to do this one by one, everyone can present themselves now
16:17:49 Thanks diurnalist
16:19:21 Anyone else? jakecoll, turnerg, tzumainn
16:19:22 I'm George Turner, operator with the Jetstream project https://jetstream-cloud.org/, a cloud for science users. We'd like to schedule/allocate GPUs for defined periods of time; something similar for classroom environments. We have worked with some of the admins on Chameleon
16:19:53 My name is Jake Colleran. I'm Jason Anderson/diurnalist's lackey over on Chameleon.
16:20:19 hi!
I'm Tzu-Mainn Chen, and I'm interested in the idea of using Blazar to lease ironic nodes; I saw some PTG discussion on the ironic side about how to accommodate that, so I thought I'd peek in here to see what future plans might be
16:21:29 turnerg: Nice to see some interest from Jetstream. Do you already have GPUs that users keep allocated for too long?
16:22:10 wanna head that problem off before it happens ;) we have some gpus on order; they're not GA yet
16:23:56 Blazar allows users to define exactly when they want to allocate specific kinds of resources, though it needs to be used with a policy to enforce sharing
16:24:54 I know Chameleon had to tune its policies because GPU users kept reserving them for long periods
16:25:21 I temporarily dropped out; I blame mozilla
16:25:29 BTW if you get disconnected, you can read the logs at http://eavesdrop.openstack.org/meetings/blazar/2019/blazar.2019-05-09-16.01.log.txt
16:25:32 Worth mentioning, we have had problems with users stacking leases, especially for GPUs.
16:25:38 priteau: thanks
16:26:15 jakecoll: can you share what is the latest policy in use on Chameleon?
16:27:50 We've developed a policy where users cannot have advance reservations that back up against each other within 2 days. However, we enforce this with our own scripts that operate outside Blazar itself.
16:28:03 turnerg: out of the box Blazar doesn't provide any limitations on reservations, so users can reserve GPU nodes even more easily than they can launch instances (which would be subject to quota). Chameleon has developed extensions to limit reservation length, but users were working around that by making multiple advance reservations one after the other.
16:29:08 jakecoll: so if a reservation finishes on a Thursday at noon, the user cannot reserve the same resource until Saturday noon?
16:30:37 priteau: Yes. Otherwise, the later advance reservation will be deleted and an email notification sent.
16:32:13 It would be good to see if some enforcement policies are common enough that they could be integrated in upstream Blazar.
16:32:38 We have had to introduce some pretty wacky policies due to contention over those highly prized resources. I'm sure there is a better way of expressing the rule. We considered limiting a user/project to having N leases at a given time. However, this could require that users learn some more of the advanced features of reservations, like the fact that one lease can contain many reservations for different resource types
16:33:19 turnerg: How do you envision users would share your GPU resources? Do you have some kind of usage charged against an allocation?
16:34:08 priteau: correct me if I'm wrong, but Blazar currently has no mechanism to actually charge for usage, correct? it just handles advance scheduling of time allotments on a resource.
16:34:47 because I've wondered if this is something that has already been discussed, and what changes might be necessary for Chameleon's solution to this to make it upstream.
16:36:22 sorry; dealing with an interrupt
16:36:46 diurnalist: That's correct. Unfortunately I am not aware of a standard solution in this space. OpenStack has CloudKitty but I think it only computes usage after the fact and can convert it into $$$ to charge a credit card.
16:40:18 Maybe some reservation usage could be expressed like other quotas. Each OpenStack project could have a quota of "seconds/hours/days of advance reservation time" and Blazar could check against it.
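
A minimal sketch of the per-project "advance reservation time" quota check suggested just above; the lease layout, helper names, and quota value are hypothetical, and Blazar does not implement such a check today:

    # Hypothetical sketch: cap the total node-hours a project may hold in
    # advance reservations before accepting a new lease.
    from datetime import datetime

    def lease_node_hours(lease):
        # 'lease' is an assumed dict with ISO 8601 start/end dates and a node count
        start = datetime.fromisoformat(lease["start_date"])
        end = datetime.fromisoformat(lease["end_date"])
        return (end - start).total_seconds() / 3600.0 * lease.get("node_count", 1)

    def within_quota(existing_leases, new_lease, quota_node_hours=2000):
        # Accept the new lease only if the project stays under its quota
        used = sum(lease_node_hours(lease) for lease in existing_leases)
        return used + lease_node_hours(new_lease) <= quota_node_hours
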
16:41:45 Keystone is working on unified limits which may be relevant for this: https://docs.openstack.org/keystone/queens/admin/identity-unified-limits.html
16:43:06 We would find it attractive to be able to charge for the time reserved; users have an allocation and if they burn it then there's at least a cost. We can slice & dice the GPUs down to 1/8 of a GPU; we do it by flavors
16:44:18 turnerg: That's what Chameleon does, using a custom allocation backend and extensions to Blazar. Do you already have a similar allocation system in place for your existing cloud?
16:44:27 Reserving without using would be very bad. I could see how smart people could figure that one out
16:45:17 turnerg: yes. we found that users would set up scripts to automatically just keep making leases in the future for one node. one enterprising user effectively reserved one node for a few months until we spotted it
16:45:26 We have no allocation process other than they use their XSEDE allocation, we go through by hand and disable the project
16:46:03 turnerg: do XSEDE allocations come in some sort of unit? like a compute hour. or is it more of a "license to compute"
16:46:41 hours * CPU_core
16:47:29 Are the XSEDE allocations stored in a custom database or is there some common cluster/grid software that handles this?
16:47:37 one SU = one core for one hour. We anticipate using a multiplier to make the GPU more expensive
16:47:46 custom
16:48:38 turnerg: that is exactly what we do. custom multipliers for some things. we also allow users to reserve GENI slices. it's a VLAN tag but we have a 10x multiplier on those because we only have a few and they're very useful for SDN experimentation.
16:50:11 thinking out loud, if we had some phantom instance start up on the reserved resource that had a high charge per unit time yet consumed no resources on the host, then the user could launch their instance which did real work on the reserved node but that instance would have a zero charge rate. this way they'd be charged for the reservation without having to modify our current scripts that just log the time instances are running
16:51:29 turnerg: That's fine for charging reservations that are running, but users who make advance reservations may still reserve more than they should.
16:51:57 ah; thinking....
16:52:13 If Blazar was making REST requests to a customisable endpoint on reservation creation / update, expecting to get a simple yes/no answer (with some details, like how many SUs are left compared to how much would be used), would people be motivated to write a small REST service making the link between Blazar and any custom allocation backend?
16:52:24 I guess the quota above is a start
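
A minimal sketch of the customisable allocation-check service described just above, assuming Blazar would POST the proposed reservation and expect a yes/no answer with SU details; the URL path, payload fields, SU multipliers, and backend lookup are all hypothetical, not an existing Blazar interface:

    # Hypothetical external allocation-check service bridging Blazar and a
    # custom (e.g. XSEDE-style) allocation backend. Field names are illustrative.
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    SU_MULTIPLIERS = {"cpu": 1.0, "gpu": 8.0}  # assumed per-resource-type weights

    def remaining_sus(project_id):
        # Placeholder: look up the project's remaining SU balance in whatever
        # custom backend holds it.
        return 5000.0

    @app.route("/check-reservation", methods=["POST"])
    def check_reservation():
        # Assumed payload: {"project_id": ..., "resource_type": ..., "node_count": ..., "hours": ...}
        req = request.get_json()
        cost = (req["node_count"] * req["hours"]
                * SU_MULTIPLIERS.get(req["resource_type"], 1.0))
        left = remaining_sus(req["project_id"])
        return jsonify({"allowed": cost <= left, "su_cost": cost, "su_remaining": left})

    if __name__ == "__main__":
        app.run()
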
16:54:19 Time is flying and we're almost finished with the hour.
16:54:48 This is a very interesting conversation, I think we should continue it either by email or at the next meeting in two weeks
16:54:57 :+1:
16:55:08 I just want to talk a bit about Ironic
16:55:39 #topic Summit and PTG feedback
16:56:21 I wasn't in Denver but I was told that Dmitry (from Ironic) and Tetsuro (from Blazar and Placement) had good discussions about how to make Blazar and Ironic standalone work.
16:57:11 There are some details in Dmitry's email to the list: http://lists.openstack.org/pipermail/openstack-discuss/2019-May/005951.html
16:57:30 I read a summary from Dmitry that talked a bit about that, and mentioned that someone on the ironic side would come up with a blueprint detailing what they'd do
16:57:42 would there be something similar forthcoming on the blazar side?
16:58:24 I haven't had the chance to talk with Tetsuro yet, but I am expecting we would have some specs to match.
16:59:38 okay, cool! I'd be excited to read that and perhaps offer some feedback
16:59:47 There's also a plan for Blazar to work with Nova + Ironic, which we're planning to work on during this release cycle.
17:00:08 tzumainn: You are specifically interested in Ironic standalone?
17:00:13 priteau, both
17:00:20 is there a spec for the nova + ironic work?
17:00:46 Not yet but I'll let you know when it's up for review.
17:01:11 I'm also interested in the Nova + Ironic spec
17:01:34 I was pretty confused by this note in the summit notes:
17:01:49 > #. Blazar creates an allocation in Ironic (not Placement) with the candidate node matching previously picked node and allocation UUID matching the reservation UUID.
17:02:16 so I'm interested in hearing how Placement will fit in to the Nova + Ironic use case.
17:04:21 also, this note: "To avoid partial allocations, Placement could introduce new API to consume the whole resource provider." - pretty sure this is what resource classes are used for, no? you have CUSTOM_BAREMETAL=1, which can only be 0 or 1, so the node is either 100% or 0% allocated. but I know we're well over time now...
17:04:29 perhaps something for the mailing list.
17:04:43 diurnalist: I haven't looked in detail yet at the proposed standalone workflow. I don't really see how placement can fit with selecting nodes for advance reservations.
17:05:25 Let's end here for today. Thanks everyone for joining!
17:05:30 #endmeeting
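
For reference on the resource class point raised near the end of the meeting: when an Ironic node is exposed to Placement as a single unit of a custom resource class, any consumer claims the whole node, so allocations are all-or-nothing. The shapes below are illustrative Python dicts only, simplified rather than exact Placement API payloads:

    # Illustrative only: whole-node allocation via a custom resource class.
    # Field names are simplified, not exact Placement API request bodies.
    node_inventory = {
        "CUSTOM_BAREMETAL": {"total": 1, "min_unit": 1, "max_unit": 1},
    }

    # A consumer (e.g. a reservation or instance) asking for the single unit
    # leaves nothing for anyone else; there is no partial share of the node.
    allocation = {
        "<resource-provider-uuid>": {"resources": {"CUSTOM_BAREMETAL": 1}},
    }
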