11:00:19 <oneswig> #startmeeting scientific-sig
11:00:20 <openstack> Meeting started Wed Mar 10 11:00:19 2021 UTC and is due to finish in 60 minutes.  The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:21 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:23 <openstack> The meeting name has been set to 'scientific_sig'
11:00:49 <oneswig> Hi all
11:01:00 <oneswig> #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_March_10th_2021
11:02:19 <oneswig> We have a discussion on Jupyter as today's main event
11:02:27 <eliaswimmer> Hi
11:03:03 <oneswig> Hi eliaswimmer, thanks for coming along
11:03:08 <eliaswimmer> I have prepared a few slides about my lessons learned using JupyterHub for lectures
11:03:14 <eliaswimmer> https://docs.google.com/presentation/d/1Il7nCNTaKCla0AqbQJnrmPpp_TUhg_-E8IqgYA6q4gk/edit?usp=sharing
11:03:37 <oneswig> I need to request access, can you make them world-readable?
11:03:56 <b1airo> seconded
11:04:07 <oneswig> Hi b1airo, evening
11:04:11 <oneswig> #chair b1airo
11:04:12 <openstack> Current chairs: b1airo oneswig
11:04:19 <b1airo> howdy oneswig
11:04:28 * b1airo yawns
11:05:47 <eliaswimmer> oneswig: is it working now?
11:05:58 <oneswig> Yes, thanks that works
11:06:14 <oneswig> I am in
11:09:59 <oneswig> eliaswimmer: how much additional work does Zero to JupyterHub require to make into a production service?
11:10:46 <eliaswimmer> That really depends on your requirements
11:10:59 <eliaswimmer> but it works really well out of the box
11:11:15 <eliaswimmer> a lot of effort went into it
11:11:45 <eliaswimmer> sometimes it is a bit hard to keep up with the fast release cadence
11:12:46 <eliaswimmer> I spent most of my time creating images and getting user creation right
11:13:14 <oneswig> What were the complexities in docker container image creation?
11:13:19 <eliaswimmer> and of course on the underlying kubernetes setup
11:13:28 <verdurin> I'm not really here, but Magic Castle includes JupyterHub.
11:13:36 <b1airo> does Z2J already handle idle notebook cleanup?
11:14:04 <verdurin> https://github.com/ComputeCanada/magic_castle
11:14:05 <eliaswimmer> b1airo: yes, that works quite well
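For reference, idle-server culling in JupyterHub is provided by the jupyterhub-idle-culler service (Zero to JupyterHub exposes it through the chart's cull settings). A minimal hub-side sketch, with an illustrative one-hour timeout:

```python
# jupyterhub_config.py -- sketch of idle-server culling with the
# jupyterhub-idle-culler package; the one-hour timeout is illustrative.
import sys

c.JupyterHub.services = [
    {
        "name": "idle-culler",
        "admin": True,  # newer JupyterHub releases grant a scoped role instead
        "command": [
            sys.executable,
            "-m", "jupyterhub_idle_culler",
            "--timeout=3600",  # cull single-user servers idle for an hour
        ],
    }
]
```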
11:15:23 <eliaswimmer> oneswig: mainly the diverse needs of our users; that's where all the features get set up
11:15:53 <dh3> sorry for being late. we have a group which runs Jhub notebooks on k8s on Rancher on OpenStack (on turtles...) Their main difficulty was scheduling pods, not knowing if an arriving user was about to run something big or something small, and overcommitting resources.
11:16:05 <b1airo> what does your front-end proxy setup look like, any load or scaling issues with so many clients?
11:18:07 <dh3> nothing special in our front end as far as I know, standard k8s ingresses
11:19:07 <oneswig> eliaswimmer: it sounds like your labs scale up and down a lot.  What's the highest scale the deployment has reached in terms of users online?  Is the Kubernetes auto-scaling working well?
11:19:40 <eliaswimmer> I have to admit we just throw a lot of hardware at it, so it was never 100% utilized
11:19:59 <mpryor> I apologise for my lateness! Where I work we run Jupyter notebooks in two ways - we have JupyterHub running on Rancher Kubernetes which we manage for our users. We also run a system we have called "Cluster-as-a-Service" which can dynamically deploy Pangeo-based JupyterHub instances on our OpenStack cloud.
11:20:17 <eliaswimmer> oneswig: autoscaling hasn't been needed or tested so far
11:20:38 <oneswig> ah ok
11:21:35 <eliaswimmer> I set limits for each lecture separately, which is easier when you know the lecture's requirements
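A minimal sketch of expressing per-lecture limits with KubeSpawner's profile_list; the lecture names, limits and image tags below are illustrative, not the configuration discussed here:

```python
# jupyterhub_config.py -- sketch of per-lecture resource limits using
# KubeSpawner profiles; names, limits and images are illustrative only.
c.KubeSpawner.profile_list = [
    {
        "display_name": "Intro to Python (small)",
        "kubespawner_override": {
            "cpu_limit": 1,
            "mem_limit": "1G",
            "image": "registry.example.org/jupyter/intro-python:2021.03",
        },
    },
    {
        "display_name": "Numerical methods (large)",
        "kubespawner_override": {
            "cpu_limit": 4,
            "mem_limit": "8G",
            "image": "registry.example.org/jupyter/numerics:2021.03",
        },
    },
]
```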
11:22:52 <eliaswimmer> Does anyone use GPUs already in their setups?
11:22:57 <oneswig> eliaswimmer: how is the storage interface working?
11:23:47 <eliaswimmer> oneswig: right now I use CephFS via CSI driver directly for homes and shares
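A sketch of how per-user homes and a shared CephFS volume can be wired into KubeSpawner; the storage class and claim names are assumptions, not taken from the deployment discussed here:

```python
# jupyterhub_config.py -- sketch of per-user home PVCs plus a shared
# CephFS volume; storage class and claim names are assumptions.
c.KubeSpawner.storage_pvc_ensure = True
c.KubeSpawner.storage_class = "cephfs-csi"       # assumed CephFS CSI storage class
c.KubeSpawner.storage_capacity = "10Gi"
c.KubeSpawner.pvc_name_template = "home-{username}"

c.KubeSpawner.volumes = [
    {"name": "home", "persistentVolumeClaim": {"claimName": "home-{username}"}},
    {"name": "shared", "persistentVolumeClaim": {"claimName": "lecture-shared-data"}},
]
c.KubeSpawner.volume_mounts = [
    {"name": "home", "mountPath": "/home/jovyan"},
    {"name": "shared", "mountPath": "/srv/shared", "readOnly": True},
]
```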
11:23:50 <oneswig> We have used GPUs with K8S in a couple of deploys
11:24:56 <eliaswimmer> What about utilization? I wonder how to do AI lectures with 150 students; I would need 150 GPUs for them!
11:25:29 <eliaswimmer> That is why I am looking into an additional KubeFlow setup for a GPU cluster
11:26:30 <mpryor> How many people that are running JupyterHub have Dask enabled? We have found that users like having that functionality.
11:26:33 <b1airo> Could split them under k8s - MIG or vGPU
11:27:25 <eliaswimmer> b1airo: for that I would need Tesla-grade GPUs
11:27:35 <b1airo> Yep, our HPC integrated JHub has Dask built-in
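One common way to give an HPC-integrated JupyterHub Dask "built in" is dask-jobqueue, which submits Dask workers as Slurm jobs from the notebook; a sketch under assumed queue and sizing values, not necessarily how this particular hub does it:

```python
# Sketch: Dask workers submitted as Slurm jobs from inside a notebook,
# via dask-jobqueue; queue name and sizing are assumptions.
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    queue="interactive",   # assumed partition
    cores=4,
    memory="8GB",
    walltime="02:00:00",
)
cluster.scale(jobs=2)      # submit two worker jobs
client = Client(cluster)   # notebook computations now run on the workers
```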
11:28:35 <eliaswimmer> Anyone using SLURM spawner?
11:28:47 <verdurin> That's how the Magic Castle implementation works.
11:28:54 <b1airo> true eliaswimmer, though if they are already in an OpenStack cluster I'm assuming they're in server machines, in a data centre somewhere, so...
11:29:31 <b1airo> yes, we're using Slurmspawner
11:30:16 <verdurin> https://github.com/cmd-ntrf/slurmformspawner
11:30:38 <eliaswimmer> b1airo: Do you have extra partitions for JupyterHub with shared nodes?
11:32:24 <dh3> the system here has Dask (not sure how many people use it). not using SLURM (there is LSF elsewhere for those who want it)
11:34:56 <b1airo> Actually our spawner is kind of a mashup, as the Hub machine is not allowed the Slurm keys directly, so once a user authenticates (2FA via a custom PAM authenticator) we create a Kerberos credential that gives them temporary SSH access to a login node; our version of batchspawner then does things via SSH
11:36:27 <oneswig> eliaswimmer: what do you do to provide user data into Jupyter environments?
11:37:13 <b1airo> re partitions, we were just filling up space on gpu nodes to start with, but now have a dedicated interactive partition for jupyter and other modest jobs. partly because we had some requests for teaching postgrad labs on the environment
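For reference, a plain batchspawner SlurmSpawner pointed at a dedicated interactive partition looks roughly like the sketch below; the partition name and resource requests are assumptions, and the slurmformspawner linked above layers a per-user options form on top of this:

```python
# jupyterhub_config.py -- sketch of batchspawner's SlurmSpawner targeting
# a dedicated interactive partition; partition and resource values are
# assumptions, not this site's configuration.
c.JupyterHub.spawner_class = "batchspawner.SlurmSpawner"
c.SlurmSpawner.req_partition = "interactive"
c.SlurmSpawner.req_nprocs = "2"
c.SlurmSpawner.req_memory = "4G"
c.SlurmSpawner.req_runtime = "4:00:00"
```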
11:38:25 <eliaswimmer> oneswig: that's a good question, upload and download capabilities are quite limited, so for lectures with huge data sets I provided the lecturers with an extra share server.
11:39:16 <mpryor> eliaswimmer oneswig We have found that, especially when using Dask, it makes sense to have any large datasets in an object store.
11:39:54 <eliaswimmer> b1airo: we do the same, as our clusters are getting smaller and smaller in terms of nodes
11:41:02 <mpryor> For our managed notebook service, even though the notebook servers are running in Kubernetes we actually mount home directories and shared filesystems, so the environment they see is much like they see if they SSH to our traditional batch platform.
11:41:09 <eliaswimmer> mpryor: Oh, that sounds interesting. Are you using a plugin for Jupyter to provide a view on the object store?
11:42:03 <mpryor> eliaswimmer The community that we operate in (Earth Sciences) has built tools around a technology called Zarr that makes using the object store more or less transparent.
11:42:33 <eliaswimmer> mpryor: opencube?
11:43:38 <mpryor> eliaswimmer I think we have some people using datacube-like technologies. However the most common software stack seems to be data on object store, accessed using Dask, XArray and Zarr. Data catalogs are provided by a tool called Intake.
11:43:58 <mpryor> This is basically the Pangeo stack - https://pangeo.io/
11:45:30 <mpryor> The Pangeo community also maintain a data catalog for CMIP6 - https://pangeo.io/catalog.html
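A minimal sketch of the Intake + Dask + XArray + Zarr pattern described above; the catalog URL, dataset key and variable name are placeholders rather than a real Pangeo catalog entry:

```python
# Sketch of the Pangeo-style access pattern: data stored as Zarr on
# object storage, read lazily through Intake/XArray and computed with Dask.
# The catalog URL, entry name and variable are hypothetical.
import intake

cat = intake.open_catalog("https://example.org/ocean-catalog.yaml")
ds = cat["sea_surface_temperature"].to_dask()  # lazy xarray.Dataset over Zarr chunks

# Only the chunks needed for this reduction are fetched, in parallel via Dask.
monthly_mean = ds["sst"].groupby("time.month").mean("time")
result = monthly_mean.compute()
```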
11:46:34 <eliaswimmer> mpryor: thank you! Our geoscientists are very interested in our setups, so that is a good starting point for me
11:47:32 <mpryor> eliaswimmer Pangeo is our standard setup that we provide via our Cluster-as-a-Service system.
11:48:17 <mpryor> Mostly it is oceanographers using it at the moment, but we have had interest from other groups, including geo-type work.
11:48:17 <eliaswimmer> does anyone use gpfs with manila and kubernetes?
11:48:18 <oneswig> eliaswimmer: are you deploying all the k8s and jupyterhub environments for users or do they self-service somehow?
11:49:42 <eliaswimmer> oneswig: right now I do everything myself, but the plan is eventually to have a self-service platform with a service catalog
11:52:05 <eliaswimmer> Anyone using JupyterHubs for teaching?
11:52:48 <mpryor> eliaswimmer A few of our tenants have used the self-service hubs we offer via CaaS for workshops and teaching.
11:53:43 <mpryor> They are able to onboard all their own users, so we often don't find out about it.
11:53:44 <eliaswimmer> We are planning to improve grading services (nbgrader and ngshare) a lot over the summer; we will open-source our code when ready
11:54:10 <eliaswimmer> mpryor: how do they manage authentication?
11:56:03 <verdurin> Magic Castle was originally created for teaching at ComputeCanada, so I'm pretty sure they use the Jupyter part for that. It sets up IPA for auth.
11:56:20 <mpryor> eliaswimmer One of the other cluster types we offer as part of our Cluster-as-a-Service is a central identity manager, which all other clusters connect to.
11:56:37 <verdurin> I meant to ask if they wanted to contribute to this meeting, but forgot, and it's a bit early for Canada.
11:57:19 <oneswig> I think it's a significant enough use case that a follow-on is warranted
11:57:48 <b1airo> Computational notebooks for teaching... I've come across some interesting pedagogical arguments around that; feels like a lost battle though
11:59:38 <mpryor> eliaswimmer verdurin Our identity manager is FreeIPA + Keycloak. Keycloak is only there in a read-only capacity to provide OpenID Connect. Our Pangeo (JupyterHub) instances authenticate using the LDAP that you get from FreeIPA.
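A sketch of JupyterHub authenticating against the LDAP served by FreeIPA, using jupyterhub-ldapauthenticator; the host name and DN layout below assume a generic FreeIPA realm rather than this site's actual directory:

```python
# jupyterhub_config.py -- sketch of LDAP authentication against a FreeIPA
# directory via jupyterhub-ldapauthenticator; host and DNs are assumptions
# based on FreeIPA's default tree layout.
c.JupyterHub.authenticator_class = "ldapauthenticator.LDAPAuthenticator"
c.LDAPAuthenticator.server_address = "ipa.example.org"
c.LDAPAuthenticator.use_ssl = True
c.LDAPAuthenticator.bind_dn_template = [
    "uid={username},cn=users,cn=accounts,dc=example,dc=org",
]
c.LDAPAuthenticator.allowed_groups = [
    "cn=jupyter-users,cn=groups,cn=accounts,dc=example,dc=org",
]
```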
12:00:16 <oneswig> ah, we are out of time
12:00:20 <mpryor> We could have also used OpenID Connect, but OIDC is not fully supported yet by the JupyterHub OAuthenticator.
12:00:27 <oneswig> final comments please
12:00:27 <b1airo> interesting discussion - thanks all
12:00:39 <eliaswimmer> mpryor: Keycloak is a great tool; I use it for our Slurm-based setup and wrote a little extension for our 2FA auth
12:00:49 <b1airo> seems everyone is doing Jupyter these days
12:01:05 <eliaswimmer> I think there is a lot to share; maybe we can set up an etherpad?
12:01:09 <oneswig> Thanks eliaswimmer and all, useful discussion
12:01:25 <b1airo> related, am keen to talk to anyone using OpenOnDemand
12:01:26 <verdurin> Thanks.
12:01:40 <oneswig> eliaswimmer: some follow-up is definitely needed.
12:01:49 <oneswig> OK, have to close the session.  Thanks all
12:01:52 <oneswig> #endmeeting