21:00:39 <martial> #startmeeting Scientific-sig
21:00:40 <openstack> Meeting started Tue Jun 26 21:00:39 2018 UTC and is due to finish in 60 minutes.  The chair is martial. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:41 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:43 <openstack> The meeting name has been set to 'scientific_sig'
21:00:54 <martial> Good day everybody
21:01:08 <trandles> hi martial
21:01:26 <m_ebert> Hello Martial
21:01:44 <janders> good evening, good morning :)
21:02:11 <martial> Today we are joined by Marcus Ebert who will share with us "Utilizing Distributed Clouds for Compute and Storage"
21:02:24 <janders> excellent, hello Marcus
21:02:25 <martial> The links to the video and slide deck are as follows:
21:02:44 <m_ebert> Hello everyone!
21:02:47 <martial> #link Video https://youtu.be/QAda-ee-9Ko
21:03:02 <martial> #link Slides https://goo.gl/g9EmWg
21:03:17 <martial> now we understand that not everybody has had a chance to review the video
21:03:41 <martial> as such we want to give Mr Ebert a chance to discuss his slide deck and answer questions
21:03:42 <b1airo> o/
21:03:48 <martial> #chair b1airo
21:03:49 <openstack> Current chairs: b1airo martial
21:04:05 <martial> welcome Blair
21:04:19 <b1airo> Thanks martial
21:04:48 <martial> m_ebert: would you like to give us an introduction to the work please?
21:05:00 <m_ebert> Sure
21:05:08 <b1airo> I'm feeding the animals/children breakfast, so sorry if I'm a little slow...
21:05:24 <b1airo> Hi m_ebert
21:05:38 <m_ebert> Hi blairo
21:07:12 <m_ebert> What I wanted to show in the slides is the system we developed to utilize different clouds, which can be anywhere in the world, by building a system that unifies different clouds and cloud types into a single infrastructure
21:07:58 <m_ebert> It hides the cloud structure from the users, who only see a single "local" batch system to which they submit their jobs
21:08:51 <m_ebert> In addition, since the jobs can now run anywhere and the user has no idea where, we are also working on a data infrastructure that unifies storage space on different endpoints into a single, file-system-like structure
21:09:31 <m_ebert> an overview of how the compute part works is shown on slide 9
21:10:24 <m_ebert> and slides 19/20 show how we would like the storage part that the compute uses to look in the future
21:10:30 <b1airo> m_ebert: presumably this works well for high-throughput workloads. What does the job definition contain, and can the system accommodate jobs added dynamically via API?
21:11:21 <m_ebert> blairo: do you mean submitting directly to condor via an API, or which API do you mean?
21:12:59 <janders> m_ebert: this is excellent work. Regarding storage, is the workflow 1) download 2) process 3) upload results, or is in-place access to data also possible?
21:13:25 <b1airo> Either I suppose. (I haven't seen the slides yet sorry - just woke up)
21:14:55 <m_ebert> janders: the workflow of the 2 experiments is right now download-process-upload, but in-place access is possible. On slide 22, for example, there is an example of how the ROOT analysis framework can open files directly over the network, but any other tool that can stream files should work too when it uses http/webdav
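A minimal sketch of the streaming access pattern described above, assuming a plain HTTP/WebDAV endpoint; the DynaFed URL below is a hypothetical placeholder, and real deployments would add X.509 or token authentication:

```python
# Minimal sketch: stream a file over HTTP/WebDAV instead of staging it to local disk.
# The federation URL is a hypothetical placeholder.
import requests

url = "https://dynafed.example.org/fed/data/sample.root"  # hypothetical DynaFed path

total = 0
with requests.get(url, stream=True, timeout=60) as resp:
    resp.raise_for_status()
    for chunk in resp.iter_content(chunk_size=4 * 1024 * 1024):  # read in 4 MiB chunks
        total += len(chunk)  # a real job would feed the bytes to its analysis here
print(f"streamed {total} bytes without staging the file to local disk")
```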
21:16:00 <m_ebert> blairo: Yes, all jobs that can be submitted to condor will go to VMs. In the job definition, requirements for RAM, CPUs, disk and so on can be defined, and if not then defaults will be used
21:17:18 <janders> m_ebert: great! Slide 22 says mounting the whole data federation is "reasonably fast". Do you know the approximate throughput you're getting? (just an order of magnitude will suffice)
21:17:21 <m_ebert> and if no VMs are available that satisfy these requirements, then cloudscheduler will start a new VM
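A minimal sketch of what such a job definition can look like from the user's side, assuming the htcondor Python bindings are available on the submit host; the executable name and resource values are purely illustrative, and anything omitted falls back to the defaults mentioned above:

```python
# Minimal sketch using the htcondor Python bindings (assumed available; htcondor >= 9 submit style).
# Executable and resource values are illustrative.
import htcondor

job = htcondor.Submit({
    "executable": "run_analysis.sh",   # hypothetical pilot/payload wrapper
    "arguments": "input.dat",
    "request_cpus": "1",
    "request_memory": "2GB",
    "request_disk": "10GB",
    "output": "job.out",
    "error": "job.err",
    "log": "job.log",
})

schedd = htcondor.Schedd()             # the single "local" batch system the users see
result = schedd.submit(job, count=1)   # cloudscheduler boots a matching VM if none exists
print("submitted cluster", result.cluster())
```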
21:17:21 <martial> from your slide deck, this uses HTCondor as the batch server and can communicate with OpenStack, OpenNebula, Amazon, Microsoft Azure, and Google Cloud, and I would second the question on storage as I see "DynaFed"?
21:18:31 <m_ebert> janders: getting data from a minio instance running on a VM on the same cloud, we see up to 3Gbps (but we use only files that are some GBs large)
21:20:35 <janders> m_ebert: Thank you. What do you think is the limiting factor in terms of achieving more throughput? storage bandwidth? WAN bandwidth? Applications (eg TCP window tuning)?
21:21:02 <m_ebert> martial: yes, HTCondor is the batch system, and cloudscheduler can use these types of clouds right now and can communicate with them. Then it uses cloud-init to send the condor config to a VM, which can then communicate with the HTCondor server.
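cloudscheduler drives the clouds itself, but the contextualization step it performs amounts to booting a VM with cloud-init user data carrying the condor configuration. A minimal illustration of that idea using the openstacksdk cloud layer directly; the cloud name, image, flavor, and collector host are all hypothetical:

```python
# Illustration only: cloudscheduler performs this step itself, but the idea is a plain
# "boot a VM with cloud-init user data" call. All names here are hypothetical.
import openstack

USER_DATA = """#cloud-config
write_files:
  - path: /etc/condor/config.d/99-pool.conf
    content: |
      CONDOR_HOST = htcondor.example.org
      START = TRUE
runcmd:
  - systemctl restart condor
"""

conn = openstack.connect(cloud="my-cloud")     # named cloud from clouds.yaml
server = conn.create_server(
    name="cs-worker-001",
    image="worker-base-image",                 # hypothetical image name
    flavor="m1.xlarge",                        # e.g. an 8-core flavor
    userdata=USER_DATA,                        # cloud-init delivers the condor config
    wait=True,
)
print("booted", server.name, server.status)
```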
21:23:07 <m_ebert> The limiting factor I see is the storage system itself in case of minio (data is on a volume mounted on a VM, where minio is just a layer), the cloud network interface to the outside, and all other ongoing activity on the cloud/hypervisor
21:24:17 <b1airo> Any issues with condor here? I have not used it for a long time but remember it sometimes seemed a bit slow to get started
21:24:20 <m_ebert> using gfalFS on a baremetal system, we get nearly line speed (tested up to 10Gbps interfaces) - but it depends largely on where the endpoints are
21:25:25 <m_ebert> No issues in daily production so far. Only when we temporarily had a very large number of cores available (>>10,000) did it cause problems in the past
21:26:50 <m_ebert> Well, not cores really, but job slots (we run mostly single-core jobs, so job slots == cores)
21:27:15 <b1airo> Ok, yes that sounds more familiar
21:30:18 <b1airo> What sort of length jobs are you running?
21:32:13 <m_ebert> we run mostly so-called pilot jobs, which are just a wrapper to set up the environment for the real jobs and then pull in the real payload from the experiment's server. pilots can run for up to 2 days (defined in condor); payload jobs usually run between a few minutes and about 15h
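A minimal sketch of the pilot idea, not the experiments' actual pilot code: the batch job is only a thin wrapper that sets up the environment and pulls the real payload from the experiment's server at runtime (the broker URL and environment path are hypothetical):

```python
# Minimal pilot sketch: the batch job is just a wrapper; the real payload is pulled
# at runtime from the experiment's server. URL and environment path are hypothetical.
import os
import subprocess
import tempfile
import urllib.request

PAYLOAD_URL = "https://broker.example.org/get_job"  # hypothetical experiment job broker

def run_pilot():
    os.environ["EXPERIMENT_SW"] = "/cvmfs/experiment.example.org/sw"  # hypothetical env setup
    workdir = tempfile.mkdtemp(prefix="pilot-")
    payload = os.path.join(workdir, "payload.sh")
    urllib.request.urlretrieve(PAYLOAD_URL, payload)  # fetch the real job
    os.chmod(payload, 0o755)
    # run the payload; a real pilot may loop and fetch more work until its walltime ends
    return subprocess.run([payload], cwd=workdir).returncode

if __name__ == "__main__":
    raise SystemExit(run_pilot())
```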
21:32:57 <martial> how much time is spent copying data in those jobs, then?
21:34:28 <b1airo> Yeah I meant the actual payload jobs. Do you know if condor handles short jobs (tens of seconds down to single seconds) ok?
21:34:30 <m_ebert> Well, with dynafed it comes mostly from close by storage so it's just some seconds at the beginning of the job. We run mostly 8core VMs, so 8 such jobs will run in parallel
21:36:03 <m_ebert> well, in our case, the condor job is only the pilot since that is all it knows about. But we have had misconfigured pilots before which terminated within seconds and then ran through hundreds before we stopped them - condor was fine with that
21:36:39 <martial> do you have some local vs remote benchmark to share?
21:36:55 <m_ebert> martial: when we had to go back to the site SE, pulling in the data could take up to half an hour or time out, since it allows only a specific number of parallel transfers and puts all others in a waiting queue
21:37:02 <martial> (on similar enough systems) to see how bad the overhead is?
21:37:46 <m_ebert> martial: not right now, but I'm working on a benchmarking system. The CHEP conference is in 2 weeks; I should have it ready to present by then. I can send a link around once I have it ready
21:38:03 <martial> thanks we can do a follow up then
21:38:39 <janders> that would be great
21:38:55 <m_ebert> sure, I'll do that
21:41:18 <martial> any other questions for Mr Ebert?
21:41:47 <martial> seems to me we have reached a natural stopping point in this conversation for the time being
21:41:58 <martial> we invite people to check the video for additional details
21:42:26 <martial> allow me to thank you again for coming to explain this very interesting solution to us
21:42:49 <martial> please follow up with us in a few weeks if you have more to add
21:43:06 <martial> for now, please allow me to thank you for agreeing to talk to us m_ebert
21:43:19 <m_ebert> Thank you! Also, everyone please feel free to send me questions/comments by email later if any come up while reviewing the slides, and also if you have free resources somewhere ;-)
21:43:44 <b1airo> Thanks m_ebert , we'll also share this in next week's meeting which is friendlier to different timezones
21:43:46 <m_ebert> I'll try to join the meetings more often and let you know when the benchmarks are ready
21:43:55 <martial> your email is listed in the slides
21:44:01 <m_ebert> thanks blairo
21:44:01 <martial> thank you
21:44:21 <martial> and with this, our other topic for today was
21:44:25 <martial> #topic AOB
21:44:49 <martial> for once I do not have content for AOB :)
21:44:52 <martial> b1airo?
21:44:59 <b1airo> Ha!
21:45:34 <b1airo> Hmmmm, no not off the top of my head. But I am a little slow off the mark this morning
21:45:40 <janders> the submission deadline for Berlin is approaching... I wonder what talks you guys would like to hear?
21:47:06 <verdurin> janders: the talk where someone solves all my controlled-data problems?
21:47:32 <janders> verdurin: :)
21:47:38 <b1airo> Yeah that would be pretty wonderful
21:48:00 <b1airo> Even just a talk that tells me what they all are
21:49:26 <b1airo> I'd be interested in hearing about how people do general purpose "managed" cloud, i.e., long-lived instances with patching etc
21:50:16 <martial> that is a tough one indeed
21:50:55 <janders> b1airo: on your RoCE-enabled system, do you install MOFED in the images or are you running upstream mlnx stack?
21:51:14 <b1airo> Yeah we use MOFED
21:51:52 <janders> do you embed custom repo config in images and manage kernel-dependent packages?
21:51:56 <janders> or do you leave that to the users?
21:51:58 <b1airo> Though it gets installed post launch via Ansible
21:52:22 <janders> do you keep your own MOFED repos?
21:52:36 <janders> (or use mlnx iso/tgz ?)
21:52:57 <b1airo> The HPC crew who are the primary users of the RoCE enabled stuff do have their own repo for managing updates consistently
21:53:12 <b1airo> They use the ISO I think
21:53:29 <janders> ok!
21:54:07 <janders> I maintain MOFED repos and put excludes in the yum config - that's mostly for the infra, but I'm considering doing something similar for the instances
21:54:23 <b1airo> I think it's mostly ok now. Though I sometimes hear them grumbling about needing to rebuild modules
21:54:38 <janders> sounds like mlnx_add_kernel_support.sh :)
21:54:53 <janders> mostly works, occasionally causes frustration..
21:55:24 <b1airo> Yeah. Dkms on Ubuntu seems to be fine
21:55:26 <janders> it's an interesting lifecycle-related challenge, quite a boutique one though
21:55:59 <janders> what's your motivation for using MOFED as opposed to upstream? better performance? supportability? good practice? all of the above?
21:56:00 <b1airo> It's a bugbear across the research cloud
21:56:16 <martial> we have only a short few minutes at this point
21:57:16 <b1airo> Lots of fairly green users with root access on public machines. Would be nice to give them guardrails and help protect their data
21:57:58 <janders> b1airo: +1! :)
21:59:12 <martial> and on those words, I am about to end our meeting
21:59:27 <martial> thanks everybody for spending this time with us
21:59:29 <b1airo> I see scope for a community project in this, but not something I can start at the moment!
21:59:37 <b1airo> Cheers all
21:59:43 <martial> and to Mr Ebert for presenting
21:59:48 <martial> bye all
21:59:53 <martial> #endmeeting