16:00:33 <gibi> #startmeeting nova
16:00:33 <openstack> Meeting started Thu Dec  3 16:00:33 2020 UTC and is due to finish in 60 minutes.  The chair is gibi. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:35 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:38 <openstack> The meeting name has been set to 'nova'
16:02:10 <gibi> o/
16:02:18 <gmann> o/
16:02:35 <stephenfin> o/
16:02:44 <elod> o/
16:03:14 <gibi> #topic Bugs (stuck/critical)
16:03:20 <gibi> One Critical bug
16:03:21 <gibi> #link https://bugs.launchpad.net/nova/+bug/1906428 blocking the nova gate as nova-multi-cell job fails
16:03:22 <openstack> Launchpad bug 1906428 in OpenStack Compute (nova) "test_cold_migrate_unshelved_instance failing with cat: can't open '/mnt/timestamp': No such file or directory" [Critical,In progress]
16:03:24 <gibi> Patch is on the gate to skip the failing test until we find a solution #link https://review.opendev.org/c/openstack/nova/+/765141
16:03:45 <gibi> I saw it bounced from the gate :/
16:03:54 <gmann> ah again failed.
16:04:52 <gmann> 134 runs already in the check pipeline, I think it will not merge soon
16:04:54 <gibi> lyarwood promised to continue looking into the actual problem next week
16:05:07 <gibi> gmann: yeah, gate feels slow these days
16:05:15 <bauzas> \o
16:05:42 <gibi> #link 14 new untriaged bugs (+0 since the last meeting): #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New
16:05:58 <gibi> we are hovering around this number during the whole week
16:06:06 <gibi> #link 75 bugs are in INPROGRESS state without any tag (+0 since the last meeting): #link https://bugs.launchpad.net/nova/+bugs?field.tag=-*&field.status%3Alist=INPROGRESS
16:06:12 <gibi> these are potentially un-triaged bugs. Check if they are still valid
16:06:24 <gibi> Is there any bug we need to discuss here ?
16:07:08 <gibi> #topic Gate status
16:07:14 <gibi> Gate on master is blocked. Patch to unblock it is on the gate #link https://review.opendev.org/c/openstack/nova/+/765141
16:07:19 <gibi> we discussed this already
16:07:24 <gibi> Gate on stable/victoria is blocked. Fix is on the gate #link https://review.opendev.org/c/openstack/nova/+/764432
16:07:41 <gibi> this also bounced
16:07:47 <gibi> :/
16:08:08 <gibi> Classification rate 35% (+11 since the last meeting) #link http://status.openstack.org/elastic-recheck/data/integrated_gate.html
16:08:13 <gibi> Please look at the gate failures, file a bug, and add an elastic-recheck signature in the opendev/elastic-recheck repo (example: #link https://review.opendev.org/#/c/759967)
16:08:28 <gibi> I don't know how relevant the classification rate is as an absolute value
16:08:47 <gibi> as it now shows better classification than last week but the gate feels in worse shape
16:09:08 <gibi> maybe what changed is that we know why the gate fails but we haven't solved the failures yet
16:09:25 <gibi> anyhow I will keep reporting / tracking this number for a while to see if it is relevant
16:09:34 <gibi> any other gate issue we need to talk about?
16:10:51 <gibi> #topic Release Planning
16:10:56 <gibi> Wallaby Milestone 1 is today!
16:11:08 <gibi> The second spec review day was a success. We now have 11 blueprints approved to Wallaby. #link https://blueprints.launchpad.net/nova/wallaby
16:11:27 <gibi> Until Milestone 1 we finished 0 blueprints out of the 11 approved blueprints.
16:11:51 <gibi> M2 is January 22
16:12:11 <gibi> considering the holiday season there is not much time until M2
16:12:35 <gibi> M2 will be spec freeze so if you have an open spec please hurry up :)
16:12:56 <gibi> any other release specific thing to discuss?
16:14:21 <gibi> #topic Stable Branches
16:14:26 <gibi> stable/victoria is blocked but patch to unblock is on the gate - https://review.opendev.org/764432
16:14:31 <gibi> other stable branches seem to be OK, no outstanding issues
16:14:32 <gibi> EOM
16:14:42 <elod> sorry for repeating o:)
16:14:55 <gibi> no worries, thanks for consistently adding updates to the agenda
16:14:56 <elod> did not see that it's already listed at gate status
16:15:07 <elod> np
16:15:23 <gibi> any other stable thing to discuss? ( lyarwood is on PTO today)
16:15:39 <elod> nothing that I'm aware of :)
16:16:01 <gibi> #topic Sub/related team Highlights
16:16:05 <gibi> Libvirt (bauzas)
16:16:15 <bauzas> nothing to say
16:16:38 <gibi> #topic Open discussion
16:16:51 <gibi> there are two on the agenda
16:16:51 <gibi> (stephenfin): Stuck on what to do about invalid instance hostnames like 'ubuntu18.04'
16:16:59 <gibi> #link https://review.opendev.org/c/openstack/nova/+/764482
16:17:17 <gibi> stephenfin: could you summarize where we are?
16:17:29 <stephenfin> I've brought this up on the mailing list
16:17:30 <gibi> I was only able to follow the ML thread partially
16:17:44 <stephenfin> tl;dr: people are using instance names that look like FQDNs
16:17:54 <stephenfin> I haven't yet figured out if they're relying on these to be valid
16:18:38 <stephenfin> In any case, I'm not sure if we're going to be able to just replace all periods in the name
16:19:14 <stephenfin> so I'm still thinking the "if it's an invalid FQDN, munge the name, otherwise don't" approach is best
16:19:26 <rafaelweingartne> I would like to ask for guidance with a patch
16:19:26 <stephenfin> but I know sean-k-mooney at least disagrees
16:19:37 <rafaelweingartne> I proposed this patch: https://review.opendev.org/c/openstack/nova/+/711113, but it has not received many reviews so far
16:19:46 <rafaelweingartne> should I open an RFE, and then a spec for it as well?
16:19:49 <gibi> rafaelweingartne: i will ping you after stephen's topic
16:19:57 <rafaelweingartne> oops, sorry, sure
16:20:43 <gibi> stephenfin: but sean is not here :)
16:21:08 <stephenfin> quick - everyone review it while sean is distracted!
16:21:11 <stephenfin> :)
16:21:41 <gibi> stephenfin: you proposed the split approach to support two separate use cases?
16:22:01 <gibi> use case a) server name is used as fqdn in the guest
16:22:18 <gibi> but what is use case b)
16:22:43 <stephenfin> use case a) is more that a FQDN is used as the server display name and therefore the server host name
16:23:16 <stephenfin> while use case b) is where a server display name with a period in it that is *not* a FQDN is used, so the server host name should be something else
16:23:48 <stephenfin> i.e. 'test.domain.com' is okay. 'test.01' will be converted to 'Server-{serverUUID}'
16:24:33 <stephenfin> if that makes sense?
16:24:46 <gibi> and in case b) what will be the hostname in the guest?
16:24:57 <stephenfin> 'Server-{serverUUID}'
16:25:23 <stephenfin> which is the fallback today if you end up with an empty string after all non-alphanumeric characters are removed
16:25:26 <gibi> I assume test.01 is now causing a real failure somewhere down the line
16:25:43 <stephenfin> if designate is deployed, you aren't able to boot an instance
16:25:56 <stephenfin> because neutron will error out when creating/attaching a port
16:26:28 <gibi> with proper documentation I'm OK to have this split behavior. I guess you need a backportable solution
16:26:52 <gibi> hence not trying to disconnect the name and the hostname
16:26:58 <stephenfin> yes, exactly
16:27:11 <stephenfin> the proper solution is 'openstack server create --hostname FOO ...'
16:27:19 <stephenfin> but that's not backportable (API change)
16:27:23 <gibi> yeah
16:27:36 <gibi> does sean have a counter proposal that is also backportable?
16:27:53 <stephenfin> Not backportable fwict, no
16:27:57 <gibi> I see
16:28:06 <stephenfin> It's user error in his eyes
16:28:17 <gibi> then I think we can say: do a backportable fix first, then do a proper fix on master later
16:28:22 <sean-k-mooney> o/
16:28:23 <stephenfin> and we should close as WONTFIX, which is user hostile
16:28:38 <gibi> sean-k-mooney: o/
16:28:52 <gibi> we are just discussing the server name test.01 issue
16:29:06 <sean-k-mooney> ah ok
16:29:10 <bauzas> mmmm
16:29:31 * bauzas looks at the API docs to see what we tell about naming instances
16:29:55 <bauzas> "The server name."
16:29:58 <bauzas> wow
16:30:00 <sean-k-mooney> bauzas: it tells you nothing
16:30:02 <gibi> sean-k-mooney: what is the reason you are against stephenfin's proposal to convert test.01 to server-{serverUUID} and not convert valid FQDNs
16:30:03 <bauzas> didn't see that coming
16:30:03 <sean-k-mooney> yep
16:30:33 <sean-k-mooney> gibi: it would change the hostname seen in the guest for one
16:30:59 <sean-k-mooney> the precedent is also based on a misunderstanding that unicode was invalid in a hostname
16:31:02 <bauzas> so, honestly, given we haven't told it's either the display name or the hostname, I think we are OK
16:31:16 <bauzas> because the semantics can change
16:31:36 <gibi> sean-k-mooney: I guess we not only remove unicode characters but other non-hostname-compatible characters too
16:31:51 <gibi> like /
16:31:57 <sean-k-mooney> so we should be allowing unicode hostnames
16:32:01 <sean-k-mooney> but that is a separate feature
16:32:04 <bauzas> definitely ^
16:32:10 <gibi> agree ^^
16:32:17 <gibi> so unicode aside
16:32:22 <bauzas> asséééééé
16:32:27 <sean-k-mooney> we also are not transforming the hostnames according to the relevant RFCs
16:32:49 <sean-k-mooney> we should be substituting all punctuation and other special symbols with -
16:33:26 <bauzas> or, just consider that if you provide a ".", then you knew you are providing a FQDN
16:33:54 <bauzas> so, the hostname should only be the server name, not the TLD
16:34:05 <sean-k-mooney> so what we could do is in a new microversion add an fqdn field and take only what is before the . for the instance.hostname
16:34:15 <bauzas> ie. if I wrote "bauzas.local", that meant to me that the name of my server is "bauzas"
16:34:23 <sean-k-mooney> yep
16:34:38 <sean-k-mooney> which is what actually happens today
16:34:41 <bauzas> and I leave my DNS telling me my own TLD
16:34:47 <stephenfin> an API microversion isn't backportable though
16:34:47 <gibi> but as far as I understand we need a backportable solution first, then a proper solution on master
16:34:55 <sean-k-mooney> but as i pointed out in the email thread the metadata is totally wrong in that case
16:35:15 <stephenfin> I totally agree that what we do is rubbish, but we do it and people rely on it to some degree
16:35:18 <sean-k-mooney> i dont believe we need a backportable solution
16:35:24 <sean-k-mooney> or at least im not sold on it
16:35:30 <bauzas> stephenfin: can't we consider to limit the server name to be "server" and not the whole FQDN ?
16:36:03 <sean-k-mooney> bauzas: i would be ok backporting that although im uncomfortable with the transformation in general
16:36:04 <bauzas> (speaking of "server.domain")
16:36:13 <stephenfin> if we do, that's a change in behavior for users that were doing e.g. 'openstack server create instance.domain.com'
16:36:31 <sean-k-mooney> stephenfin: its not from a cloud init point of view
16:36:32 <bauzas> stephenfin: that's why I said I'm cool with explaining this behavioural change
16:36:45 <sean-k-mooney> their hostname will be instance in both cases
16:36:49 <bauzas> as we didn't promise anything with the servername
16:36:59 <sean-k-mooney> e.g. with or without designate
16:37:01 <bauzas> we're not breaking the contract)
16:37:03 <stephenfin> hmm, okay, so I'd assumed that would be rejected as non-backportable
16:37:25 <sean-k-mooney> what that would change is the designate dns name
16:37:33 <bauzas> well, it says "The server name."
16:37:33 <bauzas> "
16:37:36 <sean-k-mooney> currently it is appending the designate default domain to the full server name
16:37:47 <sean-k-mooney> now it would do the sane thing and append the default domain to the hostname
16:37:57 <bauzas> yup
16:38:00 <sean-k-mooney> which would actually be resolvable via dns
16:38:04 <bauzas> yup
16:38:14 * gibi lost
16:38:17 <bauzas> and we could keep the display name to be the FQDN
16:38:23 <stephenfin> so if you create a server with 'instance.domain.com' and designate's default domain is 'domain.com', what happens?
16:38:26 <sean-k-mooney> bauzas: sure
16:38:36 <bauzas> gibi: trying to rephrase
16:38:37 <sean-k-mooney> the display name could be that server name as it was passed in
16:38:57 <stephenfin> gibi: bauzas and sean-k-mooney are suggesting we drop everything after the first period, and suggesting it's backportable because we never made a guarantee about what the instance's hostname would be
16:39:07 <bauzas> this ^
16:39:10 <gibi> thanks
16:39:34 <stephenfin> so 'test-instance.domain.com' would have a hostname of 'test-instance'
16:39:42 <gibi> would this change the hostname of existing instances?
16:39:46 <bauzas> (with a big fat note explaining why we're so mean to the user)
16:39:48 <stephenfin> and 'ubuntu18.04' would have a hostname of 'ubuntu18'
16:39:48 <sean-k-mooney> gibi: no
16:40:04 <bauzas> gibi: don't
16:40:06 <sean-k-mooney> gibi: it would only change the hostname for new instances
16:40:19 <stephenfin> it shouldn't - that information is only calculated once on initial boot and stored in instance.hostname
16:40:26 <sean-k-mooney> yep
16:40:27 <gibi> ok
16:40:29 <bauzas> mustn't is the word :)
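The proposal summarized above (keep only what precedes the first period, and fall back to 'Server-{serverUUID}' when nothing usable is left) can be sketched roughly as follows. This is an illustrative sketch only, not nova's actual code: the function name sketch_hostname is hypothetical, and the exact character handling (spaces and underscores become hyphens, anything else outside ASCII letters, digits and hyphens is dropped) is an assumption made for the example.

```python
import re


def sketch_hostname(display_name, server_uuid):
    """Hypothetical helper: derive a guest hostname from a server display name."""
    # Keep only the part before the first period, as proposed in the meeting.
    label = display_name.split(".", 1)[0]
    # Assumption: spaces and underscores are turned into hyphens.
    label = re.sub(r"[ _]", "-", label)
    # Assumption: everything outside ASCII letters, digits and hyphens is dropped.
    label = re.sub(r"[^a-zA-Z0-9-]", "", label)
    # A hostname label should not start or end with a hyphen.
    label = label.strip("-")
    # Fallback mentioned in the meeting when nothing usable remains.
    return label or "Server-" + server_uuid


print(sketch_hostname("test-instance.domain.com", "1234"))  # test-instance
print(sketch_hostname("ubuntu18.04", "1234"))               # ubuntu18
print(sketch_hostname(".hidden", "1234"))                   # Server-1234
```

As discussed, such a derivation would only apply when the hostname is first computed for a new instance; the display name would stay as it was passed in.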
16:40:39 <sean-k-mooney> did people see http://lists.openstack.org/pipermail/openstack-discuss/2020-November/019137.html by the way
16:40:40 <stephenfin> I don't think we recalculate it if you e.g. change the instance name via 'openstack server set --name NAME server'
16:40:46 <stephenfin> assuming that is a command...
16:40:53 <sean-k-mooney> where i went through how the info is actually presented to the guest
16:41:01 * stephenfin knows you can set the name when rebuilding but isn't sure about otherwise
16:41:04 <gibi> then I'm OK to do this change as a backportable fix with a fat note
16:41:36 <bauzas> sean-k-mooney: yup, I saw your email
16:41:40 <gibi> could some of you please summarize it back to the ML to see if others will be against it?
16:41:54 <bauzas> sean-k-mooney: and that's why I think that people using periods in their server names are either foolish or smart enough
16:42:13 <gibi> sorry folks we have two other topics for today
16:42:17 <gibi> so we should move on
16:42:20 <stephenfin> yup
16:42:23 * stephenfin will summarize
16:42:27 <gibi> thanks!
16:42:28 <bauzas> I think we have a reasonable consensus here
16:42:31 <sean-k-mooney> stephenfin++
16:42:42 <gibi> rafaelweingartne: your turn
16:43:15 <rafaelweingartne> Sure. I have proposed this patch (https://review.opendev.org/c/openstack/nova/+/711113), it has some conflicts, but before resolving them
16:43:21 <rafaelweingartne> I would like to understand if we are missing something
16:43:30 <rafaelweingartne> such as an RFE, or a spec
16:44:28 <gibi> rafaelweingartne: glancing at the patch and the commit message you plan to redefine what 'usage' currently means in the os-simple-tenant-usage API
16:45:09 <rafaelweingartne> yes, and no
16:45:30 <rafaelweingartne> we plan to externalise it. So, the default behaviour is maintained, and if somebody wants to redefine it, they could do so
16:46:06 <rafaelweingartne> To us, for instance, we were expecting something totally different from the data we get there (in the API) right now
16:46:11 <gibi> externalizing it is with a config option I assume
16:46:13 <sean-k-mooney> well if you wanted to do it differently you can do so already
16:46:21 <sean-k-mooney> via consuming the instance notifications
16:46:38 <rafaelweingartne> gibi: exactly
16:46:39 <sean-k-mooney> and building a system to track the lifecycle of the servers as you see fit
16:46:43 <rafaelweingartne> that is what the API is doing
16:46:57 <gibi> it feels like a config driven API
16:46:59 <rafaelweingartne> sean: we have other systems in-place that do that
16:47:12 <rafaelweingartne> gibi: yes
16:47:46 <gibi> we try to avoid config driven APIs as it makes different public clouds behave differently
16:47:47 <rafaelweingartne> when we saw that API, we just thought about using it to cross-check the data we already have in other monitoring and billing systems that we have in place
16:48:00 <gibi> Is os-simple-tenant-usage admin only by default?
16:48:09 <sean-k-mooney> so this is one of the apis that dont really fit well in nova
16:48:25 <sean-k-mooney> long term i think it would live better in an external service
16:48:34 <rafaelweingartne> probably yes
16:48:48 <sean-k-mooney> its one of the larger performance headaches for our customers
16:49:13 <sean-k-mooney> this is very slow to query and results in a slow horizon as it is used in the default overview page
16:49:26 <rafaelweingartne> but the current docs gave us the idea of providing the usage for a VM, but as I explain in the patch, it considers usage to be the time from when the instance was created up until now or when it was destroyed
16:49:28 <sean-k-mooney> so im concerned about adding more complexity to it
16:49:35 <rafaelweingartne> I see
16:50:03 <rafaelweingartne> Right now, the API does not provide usage data as it says
16:50:17 <rafaelweingartne> at least, it is not the same understanding of usage as we have
16:50:26 <rafaelweingartne> that is why we proposed the patch
16:50:48 <gibi> rafaelweingartne: so it provides resource allocation usage but not runtime for the VM I guess
16:51:21 <rafaelweingartne> exactly
16:51:31 <sean-k-mooney> rafaelweingartne: well it does provide usage info
16:51:36 <gibi> I tend to agree with sean-k-mooney that this is not a good API for billing, and also rafaelweingartne you said that you have a different service anyhow for billing
16:51:40 <rafaelweingartne> but the documentation says usage, it does not differentiate between allocation and actual usage
16:51:44 <sean-k-mooney> but the definition of usage is different from what you are expecting
16:52:13 <rafaelweingartne> therefore, we tried to amend that
16:52:20 <gibi> I don't really think we shoudl develop os-simple-tenant-usage further (hence the name simple) but fix the doc to be precies instead
16:52:29 <sean-k-mooney> so amending that woudl be an api change and require a spec not a bugfix
16:52:35 <rafaelweingartne> well, ok that would help as well then
16:52:49 <sean-k-mooney> https://github.com/openstack/nova/blob/0e7cd9d1a95a30455e3c91916ece590454235e0e/doc/source/contributor/policies.rst#metrics-gathering
16:53:05 <sean-k-mooney> its slightly tangential but we have declared metrics gathering as out of scope before
16:53:18 <sean-k-mooney> i thought we had a similar statement for billing but i dont see one
16:53:26 <rafaelweingartne> Ok, so no sense in creating an RFE then
16:53:47 <rafaelweingartne> well, I will create a patch to make the docs more clear then
16:54:00 <gibi> rafaelweingartne: thank you!
16:54:07 <gibi> (please file a doc bug for tracking)
16:54:42 <gibi> there is one more topic from the agenda
16:54:43 <gibi> (gibi): do we want to merge the backports for the placement-audit command? https://review.opendev.org/q/topic:%22placement-audit-backport%22
16:54:57 <gibi> It was raised during the week on #openstack-nova
16:55:11 <stephenfin> yes please
16:55:22 <gibi> does somebody remember what was the reason not to merge it?
16:55:30 <stephenfin> artom: ^ ?
16:55:53 <stephenfin> I think the concern was that it's kind of feature'y, but it's not user visible and is a huge win for operators (and us, diagnosing problems)
16:55:57 <artom> Oh, it was just super messy
16:56:03 <stephenfin> oh, even simpler than that
16:56:05 <artom> Past, like, 1 or 2 releases back
16:56:16 <bauzas> yup
16:56:20 <bauzas> this was the concern
16:56:21 <stephenfin> it was merged in stable/ussuri, right?
16:56:30 <artom> Nope, we didn't bother
16:56:38 <stephenfin> no, I mean initially
16:56:38 <artom> I used the upstream DNM backports for CI, essentially
16:56:46 <artom> Because our RH CI is... well, it is.
16:57:00 <artom> Ah, you'd have to ask bauzas about the initial landing.
16:57:18 <bauzas> when this was merged ?
16:57:21 <bauzas> well, I'm old
16:57:29 <bauzas> ussuri IIRC
16:57:40 <sean-k-mooney> dansmith had an opinion on it and i believe it was in favor of merging based on the operator win but i also dont recall
16:57:48 <gibi> merged in ussuri
16:57:53 <bauzas> https://review.opendev.org/c/openstack/nova/+/670112 => ussuri
16:58:08 <bauzas> sean-k-mooney: I think his opinion was meh
16:58:29 <gibi> how risky is it to backport the mess?
16:58:37 <sean-k-mooney> bauzas: basically im remembering it was not a hell no
16:58:45 <bauzas> but honestly, audit is related to allocations recreate
16:58:59 <bauzas> from mriedem
16:59:10 <gibi> I assume the effort to create the backport was already spent so the only question is future effort on stable due to these patches
16:59:16 <bauzas> one is deleting orphaned, the other is recreating them
16:59:46 <bauzas> gibi: I'd say that the maintenance is low but the initial effort is worth it pre-Train
16:59:58 <bauzas> Train backport is easy
17:00:08 <gibi> bauzas: but the initial effort is already spent as we have the patches proposed
17:00:11 <bauzas> but then artom sweated a lot with older releases
17:00:11 <stephenfin> bauzas: is or is not?
17:00:25 <bauzas> technically, we QE'd it on Queens
17:00:37 <gibi> QE?
17:00:39 <artom> bauzas, did we tho?
17:00:41 <bauzas> so the effort is already done and manually validated
17:00:47 <bauzas> against Queens
17:00:50 <artom> I'd have to double check the BZ
17:00:54 <gibi> we ran out of time
17:01:02 <gibi> let's move this to #openstack-nova
17:01:03 <gibi> sorry
17:01:05 <gibi> #endmeeting