11:00:41 #startmeeting scientific-sig
11:00:42 Meeting started Wed Jun 17 11:00:41 2020 UTC and is due to finish in 60 minutes. The chair is oneswig_. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:43 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:45 The meeting name has been set to 'scientific_sig'
11:01:10 hi all
11:01:19 #link agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_June_17th_2020
11:02:42 #topic OpenStack and COVID19 workloads
11:03:28 It seems like a lot of organisations are working to support workloads for COVID, in various forms
11:04:28 Hello - am partially here.
11:04:38 https://home.cern/news/news/cern/cern-contributes-computers-combatting-covid-19
11:05:01 Running Folding@HOME and Rosetta@HOME on hardware about to be retired
11:05:44 Not heard of Rosetta@home - what is that?
11:06:24 https://boinc.bakerlab.org/
11:06:48 "With the recent COVID-19 outbreak, R@h has been used to predict the structure of proteins important to the disease as well as to produce new, stable mini-proteins to be used as potential therapeutics and diagnostics, like the one displayed above which is bound to part of the SARS-CoV-2 spike protein."
11:08:55 pretty simple cloud-init script to start a new VM - https://clouddocs.web.cern.ch/using_openstack/contextualisation.html#install-the-folding-home-client
11:08:59 NSF-funded infrastructures can also accept workloads related to COVID-19, I saw it mentioned through Chameleon: https://www.chameleoncloud.org/blog/2020/03/13/chameleon-use-covid-19-projects/
11:09:15 Good article on the CERN blog - how many work units have been completed I wonder!
11:09:38 Stats at https://stats.foldingathome.org/teams-monthly
11:10:17 but it is important to not overload the volunteer organisations, we have some spare CPU time at the moment because the team doing hardware work is only just back on site
11:10:45 However, it's important that the core volunteers don't get displaced by this temporary contribution
11:11:55 We're also in discussions with the WHO to see how we can help as they are just down the road from us in Geneva
11:12:39 verdurin: are you close to the source of research in Oxford?
11:13:14 oneswig_: yes, there's a lot going on here.
11:15:40 verdurin: we're running F@H at the moment but if there is a better application to run, I can put you in touch with the CERN COVID folk
11:18:43 Aside from simulation work, what about the epidemiology, public health, contact tracing etc.
11:18:48 noggin143: thanks. As usual there is a myriad different applications in use.
11:20:08 There is also the RECOVERY clinical trial, and the vaccine trial that originates from a couple of buildings down.
11:20:33 I recall from somewhere that protein simulations don't have significant data requirements, does that also apply for your workloads Adam?
11:21:27 A lot of the workloads I know about are similar to our normal genomic ones, hence they do have significant data requirements.
11:23:15 There is also demand from data generators e.g. sequencing, proteomics.
11:23:20 I guess there are also privacy concerns for some of the applications, which don't apply for the volunteer projects like F@H
11:23:40 Yes. In some cases various flavours of patient data.
11:23:49 All of which make them difficult to spread to other places.
11:24:59 In the SIG session at the PTG there was reference to this tracking project in India - https://www.aarogyasetu.gov.in/
11:25:21 Prakash are you here?
11:25:21 One aspect that may be of interest here is provisioning of resources such as RStudio Server on cloud instances, where in the past dedicated nodes were used.
11:26:04 verdurin: is that licensed software? That can often be tricky
11:26:25 It depends. There are different flavours, free and licensed.
11:27:42 Is RStudio being used for post-processing and visualisation of batch simulations?
11:28:47 It's mainly for code development, I believe.
11:28:55 Very fast-moving area.
11:29:19 I am sure it is.
11:32:47 I'd be interested to hear how Public Health England's OpenStack systems are being applied for the modelling work they do.
11:34:51 Anything else to raise on this subject before we move on?
11:35:03 Just one more comment: AFAIK RStudio is an IDE for R code development. RStudio Server is useful when you deploy it next to big data and powerful compute so the researcher connects to it through a web browser to do interactive analyses.
11:35:52 Hi sluna, thanks for clarifying.
11:36:18 The dividing line is a bit muddier, but it's not that important.
11:37:43 We haven't covered the IOT-class issues of tracking populations but I don't think anyone's here who is working on that.
11:39:11 anyone using or recommending elastic search for analyzing data and monitoring movements
11:40:02 Hello Zama8152, welcome :-)
11:41:03 ElasticSearch is very good for indexing and retrieval of JSON-encoded data.
11:41:14 I'd be interested at knowing what better tools to use with regards to tracking population
11:41:35 oneswig_: Hi, thanks for the invite..
11:42:02 We use it mainly for structured search like logs but ES is pretty flexible
11:44:10 researchers on my side use elastic search to analyze Covid-19 self-screening data and monitoring movements of citizens in areas of interest to understand the effectiveness and impact of the lockdown, this information is used by the National Department of health in making relevant decisions
11:44:59 Zama8152: it's good for data that doesn't always have the same structure - semi-structured perhaps.
If your data is always of the same format, you could also consider an SQL database like postgres
11:46:02 How big is the Elastic Search, will it grow to be massive?
11:46:14 Yeah the data doesn't have the same structure and progress db is indeed used ..
11:46:29 *postgress
11:49:04 One other thought on ElasticSearch is that performance can be limited by IOPS and IO latency - it will benefit from local SSD storage in your hypervisors, if you have the local disk capacity for it.
11:49:58 Is your group developing everything from scratch? There probably isn't much precedent for this kind of application.
11:50:25 we run in VMs with 4 1TB SSDs per server - works OK, you lose some IOPS and rebalancing can take a while
11:51:27 old presentation at https://indico.cern.ch/event/717615/contributions/3033517/attachments/1676735/2692320/ES_Security.pdf, now migrating to ES7
11:52:35 Is that 250TB of data in ES back in 2018? Must be even more now...
11:54:40 the security one is getting pretty big... used for forensics but otherwise, we purge aggressively with archive in HDFS.
11:54:48 currently they are running on 1 vm with 80GB memory and 1TB ssd
11:55:15 We are short on time - final comments on this topic?
11:55:44 They have complained about memory and I am moving them to a vm with 160GB memory..
11:56:18 Hopefully that will help!
11:56:23 I have to attend another meeting right now.. Thanks for your input..
11:56:37 Thanks for coming Zama8152
11:57:02 OK, a couple more events to mention
11:57:13 #topic online conferences
11:57:27 I'm sure there are plenty of others going on
11:57:48 #link High Performance Container Workshop, 16-18 June https://hpcw.github.io/
11:58:12 Content looks quite interesting but I haven't listened in to any of the sessions so far
11:58:36 #link Virtual ISC https://www.isc-hpc.com/
11:59:04 I'm not sure how much of ISC will be virtual but at least we don't have to find a hotel in Frankfurt to participate.
12:00:36 The OpenDev events are imminent too
12:01:35 The link eludes me alas and we must close
12:01:41 Thanks all
12:01:43 #endmeeting
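
Note on the ElasticSearch versus SQL exchange (11:41 to 11:46): the distinction drawn there, semi-structured documents on one side and fixed-format rows on the other, can be sketched roughly as below. This is only an illustration under stated assumptions, not how any group in the meeting does it: the records and field names are invented, the small inverted index merely stands in for what a document store such as ElasticSearch does far more capably, and sqlite3 stands in for postgres so the sketch runs with the Python standard library alone.

```python
import sqlite3
from collections import defaultdict

# Hypothetical self-screening records; every field name here is invented.
# Note that the documents do not share the same set of fields.
records = [
    {"id": 1, "district": "north", "symptoms": ["cough"], "age": 34},
    {"id": 2, "district": "south", "travel": {"recent": True}},
    {"id": 3, "district": "north", "symptoms": ["fever", "cough"]},
]


def flatten(doc, prefix=""):
    """Yield (dotted_field, value) pairs for every leaf value in a nested document."""
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    yield from flatten(item, path + ".")
                else:
                    yield path, item
        else:
            yield path, value


def build_index(docs):
    """Map (field, value) to the ids of documents containing it, whatever fields each has."""
    index = defaultdict(set)
    for doc in docs:
        for field, value in flatten(doc):
            index[(field, value)].add(doc["id"])
    return index


# Semi-structured case: no schema is declared up front.
index = build_index(records)
cough_ids = sorted(index[("symptoms", "cough")])
travel_ids = sorted(index[("travel.recent", True)])

# Fixed-format case: only the columns every record shares go into a table
# (sqlite3 used here as a stand-in for postgres).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE screening (id INTEGER PRIMARY KEY, district TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO screening (id, district) VALUES (?, ?)",
    [(r["id"], r["district"]) for r in records],
)
north_count = conn.execute(
    "SELECT COUNT(*) FROM screening WHERE district = ?", ("north",)
).fetchone()[0]

print(cough_ids, travel_ids, north_count)
```

The index accepts whatever fields each document happens to carry, while the table only holds the columns declared up front, which is roughly the trade-off raised in the discussion.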