16:00:23 <jungleboyj> #startmeeting Cinder
16:00:23 <openstack> Meeting started Wed Jun 19 16:00:23 2019 UTC and is due to finish in 60 minutes.  The chair is jungleboyj. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:24 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:26 <openstack> The meeting name has been set to 'cinder'
16:00:28 <whoami-rajat> Hi
16:00:32 <enriquetaso> hi
16:00:32 <smcginnis> o/
16:00:38 <jungleboyj> courtesy ping:  jungleboyj whoami-rajat rajinir lseki carloss pots woojay erlon geguileo eharney rosmaita enriquetaso e0ne smcginnis davidsha walshh_ xyang hemna _hemna
16:00:42 <jungleboyj> @!
16:00:43 <_pewp_> jungleboyj (◍˃̶ᗜ˂̶◍)ノ”
16:00:45 <geguileo> hi! o/
16:00:46 <walshh_> hi
16:00:53 <lseki> hi
16:01:14 <carloss> hi
16:01:37 <woojay> hi
16:01:45 <jungleboyj> Hello everyone.
16:02:42 <jungleboyj> Give people one more moment to show up.
16:03:15 <e0ne> hi
16:03:29 <jungleboyj> Ok, now e0ne is here.  We can start.  :-)
16:03:38 <e0ne> :)
16:03:50 <e0ne> #link https://etherpad.openstack.org/p/cinder-train-meetings
16:03:59 <jungleboyj> e0ne:
16:04:00 * e0ne feels useful today :)
16:04:00 <jungleboyj> Thanks.
16:04:07 <jungleboyj> #announcements
16:04:14 <jungleboyj> #topic announcements
16:04:21 <jungleboyj> As you can see not a lot today.
16:04:44 <jungleboyj> Just the reminder that the requirement to have 3rd Party CI running on Py3 is coming up in about a month.
16:04:58 <jungleboyj> Will start doing an audit of systems soon.
16:05:20 <e0ne> jungleboyj, smcginnis: do you know how many 3rd party CI use python3?
16:05:30 <jungleboyj> e0ne:  I haven't actually looked yet.
16:05:59 <smcginnis> e0ne: I've seen a couple at least, but I've been afraid to really go look at them all.
16:06:15 <jungleboyj> smcginnis:  Same here.
16:06:54 <smcginnis> There are several driver maintainers here. Anyone have any progress or updates to report on this?
16:07:08 <jungleboyj> ++
16:08:12 <smcginnis> Py37 is an officially supported runtime for train, so if driver maintainers don't make sure they work right with it, there could be a lot of pain in a few months when customers want to try out the new release.
16:08:12 <walshh_> dell_emc have moved from py36 to py37
16:08:23 <smcginnis> walshh_: Awesome - thanks!
16:09:22 <jungleboyj> That is good.
16:09:30 <jungleboyj> rajinir: Anything from your team?
16:10:40 <jungleboyj> Hmmm.
16:10:43 <jungleboyj> Anyone else?
16:10:48 <woojay> LINSTOR is 3.7
16:10:55 <jungleboyj> woojay:  Woot woot!
16:11:01 <jungleboyj> Thanks.
16:11:29 <smcginnis> woojay: Nice! :)
16:12:27 * _erlon_ sneaks ...
16:12:40 <dviroel> e0ne: We (NetApp) are using python3 on our CI.
16:12:53 <e0ne> great!
16:13:11 <jungleboyj> Good.  So, we aren't asking for something that doesn't work.
16:14:00 <jungleboyj> So, hopefully others have it working too and just aren't in the meeting.
16:14:07 <_erlon_> jungleboyj: no, it works, we just needed to set USE_PYTHON3=True in our jobs
16:14:43 <walshh_> I believe legacy Dell drivers are in the process of changing to py37 also.  It may even be complete
16:15:12 <jungleboyj> Good.
16:15:29 <smcginnis> dviroel: 3.7?
16:15:56 <_erlon_> 3.6 or 3.7 not sure
16:16:11 <dviroel> smcginnis: need to check that
16:16:48 <smcginnis> OK, it needs to be 3.7, but if it's already using 3.6 then that shouldn't be as big of a hurdle.
16:16:53 <jungleboyj> Ok.  Any further discussion needed on that then?
16:16:54 <_erlon_> smcginnis: 3.5 actually
16:17:04 <smcginnis> OK, so a little work to do there yet.
16:17:14 <_erlon_> I think devstack should allow you to set the version. We are just using the default
16:17:32 <smcginnis> It just depends on what version is available on your platform.
16:17:44 <smcginnis> Nothing controlled by devstack other than USE_PYTHON3=true
16:17:50 <_erlon_> smcginnis: hmm, jobs still running on xenial
16:18:32 <_erlon_> smcginnis: if 3.7 is not available there and we need to move to ubuntu 18 it will be a big leap
16:19:00 <smcginnis> You probably should move though. Xenial isn't a supported version anymore.
16:19:10 <whoami-rajat> I thought CIs moved to Bionic?
16:19:22 <smcginnis> jungleboyj: Guess this should have been a full topic and not just an announcement. :D
16:19:53 <_erlon_> smcginnis: jungleboyj: I know 3.7 is the recommended, but is that a requirement for Train?
16:19:55 <jungleboyj> :-)  Well, it is good we are having the discussion.
16:20:10 <_erlon_> moving to ubuntu 18 will take us some time
16:20:17 <smcginnis> _erlon_: Yes, that's what we've stated several times over the last few months.
16:20:18 <rajinir> walshh_, jungleboyj: sorry I'm late. All dell emc ci has moved to 3.7
16:20:29 <smcginnis> rajinir: Thanks!
16:20:34 <jungleboyj> rajinir:  No problem.  Thank you!
16:20:34 <_erlon_> whoami-rajat: if you move them it's moved :P
16:20:41 <e0ne> #link https://governance.openstack.org/tc/reference/runtimes/train.html
16:21:06 <e0ne> ubuntu 18.04, python 3.5 and 3.7 should be supported
16:21:20 <smcginnis> 3.6 and 3.7, not 3.5
16:21:24 <_erlon_> smcginnis: damn, I just picked the python 3 part
16:21:39 <e0ne> smcginnis: sure, just a typo :(
16:22:09 <smcginnis> And can run on whatever you can install py3.7 on, but if you want to actually run what customers are being told is supported, that's Bionic, CentOS/RHEL 7, or openSUSE Leap 15.
16:22:14 <smcginnis> e0ne: ;)
16:22:51 <rajinir> There are two flags in devstack USE_PYTHON3 and PYTHON3_VERSION
16:23:15 <smcginnis> Oh, great.
16:23:32 <smcginnis> https://docs.openstack.org/devstack/latest/configuration.html#id19
16:24:19 <_erlon_> rajinir: great! Good to know thanks
16:24:33 <_erlon_> I'll give that a shot
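[For reference, the two devstack flags mentioned above go in a CI job's local.conf. A minimal sketch (the backend name and interpreter version are illustrative; the host must already have the requested Python installed):

```shell
# local.conf fragment for a devstack-based third-party CI job
[[local|localrc]]
# Run OpenStack services under Python 3 instead of Python 2
USE_PYTHON3=True
# Pin the interpreter version; otherwise devstack picks the platform default
PYTHON3_VERSION=3.7
```

On Xenial the distro packages top out at 3.5, which is why moving to Bionic comes up in the discussion.]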
16:24:44 <jungleboyj> Cool.
16:25:13 <jungleboyj> Anything more on Py3?
16:26:06 <jungleboyj> I take that as a no.
16:26:21 <jungleboyj> Thank you all vendors for your input and for participating here.
16:26:32 <jungleboyj> Appreciate all of you that stay active.
16:27:52 <jungleboyj> Ok.  So ...
16:27:59 <jungleboyj> #topic Open Discussion
16:28:13 <jungleboyj> Anything else that people want to discuss?
16:28:48 <_erlon_> I do
16:28:53 <jungleboyj> _erlon_: Sure.
16:29:16 <_erlon_> I would like to get some input on the replication interaction between Nova and Cinder
16:29:33 <_erlon_> maybe there's someone here with some background and experience on that
16:29:59 <_erlon_> I'm trying to understand if it's possible to do a dynamic re-discovery of volumes after a failover/failback
16:30:02 <jungleboyj> What kind of information are you looking for?
16:30:13 <jungleboyj> Hmmm ...
16:31:00 <jungleboyj> geguileo: You know that code path to some extent ...
16:31:18 <_erlon_> I know that currently it's said that you need to detach and re-attach the volumes, but this seems to be a very manual process and I *think* it's not possible for volume-backed VMs
16:31:26 <geguileo> there is no interaction afaik
16:31:55 <geguileo> _erlon_: if you mean for boot volume VMs, yes, you are correct
16:32:25 <jungleboyj> Right.  Limited functionality there.
16:32:26 <geguileo> I think I wrote somewhere current limitations...
16:32:37 <_erlon_> geguileo: because it's not implemented or because it's not possible?
16:32:38 <_erlon_> I was trying to imagine how it would be possible, but I think it might not be
16:34:17 <jungleboyj> For Volume Backed VMs I think it is highly unlikely that it could work.
16:34:44 <_erlon_> jungleboyj: yep, so, one would have to re-create those kinds of VMs
16:34:55 <jungleboyj> _erlon_:  Correct.
16:35:01 <smcginnis> It's all very manual right now, kind of on purpose.
16:35:07 <geguileo> _erlon_: I believe this falls in the same category of changing the root volume contents
16:35:09 <jungleboyj> smcginnis: ++
16:35:18 <smcginnis> Past attempts had tried to bite off too much, so we wanted a minimally viable feature.
16:35:39 <geguileo> _erlon_: so it could be implemented in the same way that we are working now to reimage the root volume
16:35:42 <smcginnis> I know folks want something like VMware SRM, but that's not at all what we provide and it would take a LOT of work to get there.
16:35:47 <geguileo> _erlon_: but in this case it would need to reattach it
16:36:13 <jungleboyj> :-(
16:36:44 <_erlon_> smcginnis: it can continue to be manual, but only triggered manually, not the whole process, which sometimes might require manual intervention in the DB
16:37:18 <_erlon_> how is people's experience with what we have so far?
16:37:34 <_erlon_> I hope nobody ever needed to use it lol
16:37:37 <jungleboyj> _erlon_:  That is a good question.  Not sure how many people are actually using it.
16:38:42 <jungleboyj> After all the work to get it there ... if no one is using it.  *Sigh*
16:38:46 <_erlon_> jungleboyj: we found out these last days that our SF driver has some flaws that would make it hardly possible for someone to use it, so we believe nobody has ever really even tried
16:39:26 <jungleboyj> Probably true.
16:39:45 <geguileo> _erlon_: replication at this point is mainly to ensure you don't lose data
16:40:00 <geguileo> but to recover, that would require a lot of knowledge and manual steps
16:40:04 <geguileo> afaik
16:40:08 <_erlon_> jungleboyj: I'm putting two meanings into the word 'using': I would expect people to use it in the sense of having it ready for a disaster, but not really having to do a failover
16:40:31 <jungleboyj> Ah, true enough.
16:41:18 <geguileo> it was discussed at some point to have a feature to "try the failover/failback"
16:41:39 <jungleboyj> Yes.  Wasn't some of that implemented?
16:41:49 <_erlon_> geguileo: yes, that was the first point in replication v1. To be able to continually test the process
16:41:51 <geguileo> to ensure that it was actually working
16:42:24 <_erlon_> but with the backend based replication on v2 this was left behind
16:43:03 <jungleboyj> Ah, that is right.
16:44:06 <_erlon_> it is possible to do it, you would need to set up a backend using the same arrays and then keep failing it over back and forth periodically
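[The periodic back-and-forth test described above would go through Cinder's admin-only failover API. A rough sketch with the cinder CLI; the backend host name and backend ID are illustrative, and 'default' is the reserved ID meaning "return to the primary":

```shell
# Fail the backend over to its configured replication target (admin-only)
cinder failover-host cinder@mybackend --backend_id secondary_array

# ... verify the service state and that volumes are usable on the secondary ...
cinder service-list --binary cinder-volume

# Fail back to the primary backend
cinder failover-host cinder@mybackend --backend_id default
```

As discussed below, this only moves the storage side; attached VMs still need their volumes detached and re-attached by hand.]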
16:44:33 <jungleboyj> So, I think what we are getting at here is that if this hasn't been tested by vendors, it probably hasn't been tested.
16:44:42 <_erlon_> :)
16:45:20 <jungleboyj> That is not good.
16:45:37 <geguileo> jungleboyj: you mean that the failover and failback hasn't been tested?
16:45:48 <jungleboyj> geguileo: Not sure.
16:45:52 <geguileo> or that the whole process of failing over + making the VMs work?
16:45:53 <jungleboyj> Who has tested it?
16:45:57 <geguileo> I tested it
16:46:03 <jungleboyj> Ok, that is good.
16:46:20 <geguileo> when I was helping jobernar on the RBD failover
16:46:35 <jungleboyj> Did you do anything with VMs?
16:46:36 <geguileo> and I discovered a bunch of issues in our core code and fixed most of them
16:46:41 <_erlon_> geguileo: we mean that there's little testing in Cinder and there's not a good way for users that have it in production to test the failover
16:47:03 <geguileo> jungleboyj: this was a long time ago, when Nova couldn't even force-detach
16:47:17 <_erlon_> jungleboyj: we are probably putting together some replication tests this release we could try to run them on the gate with ceph
16:47:22 <geguileo> so it was not possible to cleanly re-attach a volume, the old one would still be there
16:47:55 <geguileo> _erlon_: agreed, there is not enough testing
16:48:13 <jungleboyj> Ok.  That isn't surprising.
16:48:16 <geguileo> _erlon_: and recovering VMs is not easy (it is HARD)
16:48:22 <jungleboyj> That is why there needs to be a detach and reattach?
16:48:39 <_erlon_> geguileo: if you re-attach, the device will come back on another path in the VMs, right?
16:48:56 <geguileo> jungleboyj: in a perfect world we should just tell Nova which volumes have been failed over
16:49:04 <_erlon_> yeap, I assume so
16:49:05 <geguileo> and nova would do whatever needs to do
16:49:15 <geguileo> _erlon_: yup
16:49:17 <jungleboyj> Yeah, that would be nice.
16:50:11 <_erlon_> geguileo: couldn't Cinder let Nova know? So that it would be a more straightforward process for the admin?
16:50:12 <geguileo> _erlon_: but maybe Nova can hack around this
16:50:25 <geguileo> _erlon_: yes, Cinder could let Nova know
16:50:33 <geguileo> which volumes and the instances they are connected to
16:50:57 <geguileo> iirc the failover method returned volumes that have been failed over
16:51:10 <_erlon_> geguileo: Nova can, but not the guest OS, which will be trying to write to the dead device
16:51:20 <geguileo> so then Cinder would just need to check the instances they are attached to and tell Nova about those
16:51:22 <_erlon_> geguileo: yes they do
16:51:54 <geguileo> (ok, my memory is not that bad)
16:52:28 <geguileo> _erlon_: we would have to talk with Nova and see if it would be helpful for them to know the change
16:52:40 <geguileo> (aka can they do something about it?)
16:53:12 <_erlon_> geguileo: mhm, I'll try to bring that to the next Nova meeting
16:53:22 <geguileo> _erlon_: thanks
16:53:25 <_erlon_> not sure if I'll make it tomorrow, but ASAP
16:54:00 <jungleboyj> _erlon_:  Cool.  Thank you for following up on that.
16:54:07 <_erlon_> jungleboyj: sure, np
16:54:37 <jungleboyj> #action Erlon to follow up with Nova team to find out if there is anything they can do to handle failures better if they know the volumes failed over and the associated instances.
16:55:02 <jungleboyj> We have 6 minutes left.  Anything else that people would like to discuss?
16:55:11 <jungleboyj> geguileo:  Thanks for the expertise there by the way.
16:56:16 <jungleboyj> Ok.  Looks like everyone has gone quiet so I will wrap up the meeting.
16:56:37 <jungleboyj> Thank you all for attending.  Good discussions today.
16:56:40 <smcginnis> Thanks jungleboyj
16:56:43 <whoami-rajat> Thanks!
16:56:47 <enriquetaso> o/
16:56:48 <jungleboyj> #endmeeting