11:00:41 <oneswig_> #startmeeting scientific-sig
11:00:42 <openstack> Meeting started Wed Jun 17 11:00:41 2020 UTC and is due to finish in 60 minutes.  The chair is oneswig_. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:43 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:45 <openstack> The meeting name has been set to 'scientific_sig'
11:01:10 <oneswig_> hi all
11:01:19 <oneswig_> #link agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_June_17th_2020
11:02:42 <oneswig_> #topic OpenStack and COVID19 workloads
11:03:28 <oneswig_> It seems like a lot of organisations are working to support workloads for COVID, in various forms
11:04:28 <verdurin> Hello - am partially here.
11:04:38 <noggin143> https://home.cern/news/news/cern/cern-contributes-computers-combatting-covid-19
11:05:01 <noggin143> Running Folding@HOME and Rosetta@HOME on hardware about to be retired
11:05:44 <oneswig_> Not heard of Rosetta@home - what is that?
11:06:24 <noggin143> https://boinc.bakerlab.org/
11:06:48 <noggin143> "With the recent COVID-19 outbreak, R@h has been used to predict the structure of proteins important to the disease as well as to produce new, stable mini-proteins to be used as potential therapeutics and diagnostics, like the one displayed above which is bound to part of the SARS-CoV-2 spike protein."
11:08:55 <noggin143> pretty simple cloud-init script to start a new VM - https://clouddocs.web.cern.ch/using_openstack/contextualisation.html#install-the-folding-home-client
11:08:59 <priteau> NSF-funded infrastructures can also accept workloads related to COVID-19, I saw it mentioned through Chameleon: https://www.chameleoncloud.org/blog/2020/03/13/chameleon-use-covid-19-projects/
11:09:15 <oneswig_> Good article on the CERN blog - how many work units have been completed I wonder!
11:09:38 <noggin143> Stats at https://stats.foldingathome.org/teams-monthly
11:10:17 <noggin143> but it is important to not overload the volunteer organisations, we have some spare CPU time at the moment because the team doing hardware work is only just back on site
11:10:45 <noggin143> However, it's important that the core volunteers don't get displaced by this temporary contribution
11:11:55 <noggin143> We're also in discussions with the WHO to see how we can help as they are just down the road from us in Geneva
11:12:39 <oneswig_> verdurin: are you close to the source of research in Oxford?
11:13:14 <verdurin> oneswig_: yes, there's a lot going on here.
11:15:40 <noggin143> verdurin: we're running F@H at the moment but if there is a better application to run, I can put you in touch with the CERN COVID folk
11:18:43 <oneswig_> Aside from simulation work, what about the epidemiology, public health, contact tracing etc.
11:18:48 <verdurin> noggin143: thanks. As usual there are myriad different applications in use.
11:20:08 <verdurin> There is also the RECOVERY clinical trial, and the vaccine trial that originates from a couple of buildings down.
11:20:33 <oneswig_> I recall from somewhere that protein simulations don't have significant data requirements. Does that also apply to your workloads, Adam?
11:21:27 <verdurin> A lot of the workloads I know about are similar to our normal genomic ones, hence they do have significant data requirements.
11:23:15 <verdurin> There is also demand from data generators e.g. sequencing, proteomics.
11:23:20 <noggin143> I guess there are also privacy concerns for some of the applications, which don't apply for the volunteer projects like F@H
11:23:40 <verdurin> Yes. In some cases various flavours of patient data.
11:23:49 <oneswig_> All of which make them difficult to spread to other places.
11:24:59 <oneswig_> In the SIG session at the PTG there was reference to this tracking project in India - https://www.aarogyasetu.gov.in/
11:25:21 <oneswig_> Prakash are you here?
11:25:21 <verdurin> One aspect that may be of interest here is provisioning of resources such as RStudio Server on cloud instances, where in the past dedicated nodes were used.
11:26:04 <oneswig_> verdurin: is that licensed software?  That can often be tricky
11:26:25 <verdurin> It depends. There are different flavours, free and licensed.
11:27:42 <oneswig_> Is RStudio being used for post-processing and visualisation of batch simulations?
11:28:47 <verdurin> It's mainly for code development, I believe.
11:28:55 <verdurin> Very fast-moving area.
11:29:19 <oneswig_> I am sure it is.
11:32:47 <oneswig_> I'd be interested to hear how Public Health England's OpenStack systems are being applied for the modelling work they do.
11:34:51 <oneswig_> Anything else to raise on this subject before we move on?
11:35:03 <sluna> Just one more comment: AFAIK RStudio is an IDE for R code development. RStudio Server is useful when you deploy it next to big data and powerful compute so the researcher connects to it through a web browser to do interactive analyses.
11:35:52 <oneswig_> Hi sluna, thanks for clarifying.
11:36:18 <verdurin> The dividing line is a bit muddier, but it's not that important.
11:37:43 <oneswig_> We haven't covered the IOT-class issues of tracking populations but I don't think anyone's here who is working on that.
11:39:11 <Zama8152> Is anyone using or recommending Elasticsearch for analyzing data and monitoring movements?
11:40:02 <oneswig_> Hello Zama8152, welcome :-)
11:41:03 <oneswig_> ElasticSearch is very good for indexing and retrieval of JSON-encoded data.
11:41:14 <Zama8152> I'd be interested in knowing what better tools to use with regard to tracking populations
11:41:35 <Zama8152> oneswig_: Hi, thanks for the invite.
11:42:02 <noggin143> We use it mainly for structured search like logs but ES is pretty flexible
11:44:10 <Zama8152> Researchers on my side use Elasticsearch to analyze COVID-19 self-screening data and to monitor movements of citizens in areas of interest, in order to understand the effectiveness and impact of the lockdown. This information is used by the National Department of Health in making relevant decisions.
11:44:59 <oneswig_> Zama8152: it's good for data that doesn't always have the same structure - semi-structured perhaps.  If your data is always of the same format, you could also consider an SQL database like postgres
11:46:02 <oneswig_> How big is the Elasticsearch deployment? Will it grow to be massive?
11:46:14 <Zama8152> Yeah, the data doesn't have the same structure, and a Postgres DB is indeed used.
11:49:04 <oneswig_> One other thought on ElasticSearch is that performance can be limited by IOPS and IO latency - it will benefit from local SSD storage in your hypervisors, if you have the local disk capacity for it.
11:49:58 <oneswig_> Is your group developing everything from scratch?  There probably isn't much precedent for this kind of application.
11:50:25 <noggin143> we run in VMs with 4 1TB SSDs per server - works OK, you lose some IOPS and rebalancing can take a while
11:51:27 <noggin143> old presentation at https://indico.cern.ch/event/717615/contributions/3033517/attachments/1676735/2692320/ES_Security.pdf, now migrating to ES7
11:52:35 <oneswig_> Is that 250TB of data in ES back in 2018?  Must be even more now...
11:54:40 <noggin143> the security one is getting pretty big... used for forensics but otherwise, we purge aggressively with archive in HDFS.
11:54:48 <Zama8152> Currently they are running on one VM with 80GB of memory and a 1TB SSD
11:55:15 <oneswig_> We are short on time - final comments on this topic?
11:55:44 <Zama8152> They have complained about memory and I am moving them to a VM with 160GB of memory.
11:56:18 <oneswig_> Hopefully that will help!
11:56:23 <Zama8152> I have to attend another meeting right now. Thanks for your input.
11:56:37 <oneswig_> Thanks for coming Zama8152
11:57:02 <oneswig_> OK, a couple more events to mention
11:57:13 <oneswig_> #topic online conferences
11:57:27 <oneswig_> I'm sure there are plenty of others going on
11:57:48 <oneswig_> #link High Performance Container Workshop, 16-18 June https://hpcw.github.io/
11:58:12 <oneswig_> Content looks quite interesting but I haven't listened in to any of the sessions so far
11:58:36 <oneswig_> #link Virtual ISC https://www.isc-hpc.com/
11:59:04 <oneswig_> I'm not sure how much of ISC will be virtual but at least we don't have to find a hotel in Frankfurt to participate.
12:00:36 <oneswig_> The OpenDev events are imminent too
12:01:35 <oneswig_> The link eludes me alas and we must close
12:01:41 <oneswig_> Thanks all
12:01:43 <oneswig_> #endmeeting