Thursday, 2021-08-26

*** abhishekk is now known as akekane|home03:55
*** akekane|home is now known as abhishekk03:55
*** akekane_ is now known as abhishekk06:01
*** rpittau|afk is now known as rpittau07:24
*** jokke_ is now known as jokke11:07
*** jokke is now known as jokke_11:07
abhishekk#startmeeting glance14:00
opendevmeetMeeting started Thu Aug 26 14:00:09 2021 UTC and is due to finish in 60 minutes.  The chair is abhishekk. Information about MeetBot at http://wiki.debian.org/MeetBot.14:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.14:00
opendevmeetThe meeting name has been set to 'glance'14:00
abhishekk#topic roll call14:00
abhishekk#link https://etherpad.openstack.org/p/glance-team-meeting-agenda14:00
dansmitho/14:00
abhishekko/14:00
abhishekklet's wait a couple of minutes for others to show14:01
abhishekkI doubt rosmaita will join us today14:01
jokke_o/14:01
abhishekkLets start14:02
abhishekk#topic release/periodic jobs update14:02
abhishekkM3 next week, but we will tag it a week after M314:02
abhishekki.e. the week after next14:02
abhishekkso we still have around 6 working days to get things done14:02
abhishekkpython-glanceclient needs to be tagged next week though14:03
abhishekkI will put a release patch around Sept 01 for the same14:03
pdeoreo/14:03
abhishekkSurprisingly, periodic jobs have had no timeouts for the last 3 days14:04
abhishekkall green at the moment14:04
abhishekk#topic M3 targets14:04
abhishekkGlance Xena 3 review dashboard - https://tinyurl.com/glance-xena-314:04
abhishekkMost of the policy patches are merged and the remaining ones are approved14:04
abhishekkdue to heavy traffic in gate we are facing some unusual failures, I will keep watch on them14:05
jokke_I'd say usual milestone failures :)14:05
abhishekkThank you croelandt and dansmith and lance for reviewing these patches on priority14:05
jokke_Every cycle the same thing. Just like every year the winter surprises the Finns :D14:06
abhishekkagain and again at crucial time14:06
abhishekkCache API - Still under review - FFE required?14:06
abhishekkThere are some comments on tests and some doc changes need to be done14:07
abhishekkneed to mention the header we are using to clear the cached and queued images in the docs and api reference as well14:07
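A hypothetical sketch of such a header-driven cache clear call; the endpoint path, header name and value below are assumptions for illustration only, since the patch was still under review at this point:

    import requests

    GLANCE = 'http://glance.example.com:9292'   # hypothetical glance-api endpoint
    TOKEN = 'gAAAA...'                          # placeholder keystone token

    # Ask the API to drop both cached and queued images; the header value
    # selects the target (names here are illustrative, not the merged API).
    resp = requests.delete(
        GLANCE + '/v2/cache',
        headers={'X-Auth-Token': TOKEN,
                 'x-image-cache-clear-target': 'both'},
    )
    resp.raise_for_status()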
dansmithI have some more draft comments I'm still mulling, I will move those to the latest PS or drop them accordingly14:07
jokke_I fixed the comments and the output of clear_cache as Dan kindly pointed out that it was very silly behaviour14:07
abhishekkack14:08
abhishekkWe will revisit the progress next week and decide on FFE grant for the same14:08
abhishekkAny questions ?14:09
abhishekkSame goes for metadef project persona14:09
jokke_Just FYI I'll be on PTO next week. I'd say the debate on the tests is a great opportunity for a followup patch after the FF, as that API change needs to merge so we can get the client patch in before it needs to be released14:09
abhishekkPatches are now open for review and we have good functional test coverage there to ensure the new RBAC behavior14:09
jokke_so FFE for that specific work is not great as it needs the client side 14:10
abhishekkack, I haven't had enough time to have a look at the new patch set or other review comments14:10
abhishekkIn your absence I will work on that14:11
abhishekkComing back to RBAC metadef, we are working on glance-tempest-plugin protection testing and that will be up and complete by tomorrow14:12
abhishekkBut I think the functional coverage on the glance side is good and we can consider those changes for M314:13
abhishekkI will apply for the FFE for the same if this is not merged before M3 work14:14
abhishekkMoving ahead14:14
abhishekk#topic Wallaby backports14:15
croelandtwhoami-rajat: ^14:15
croelandtSo, these 2 backports in the agenda are part of a huge bug fix that includes Cinder patches as well14:15
whoami-rajathi14:15
croelandtWe were under the impression it was ok to backport in Wallaby14:15
croelandtthe first patch is a new feature but also provides support for the bug fix14:16
croelandtI think Rajat has users affected by this in upstream Cinder, am I right?14:16
whoami-rajatyes14:16
whoami-rajatso most of the fixes on the glance cinder side, like multi store support and format info support, are all dependent on this attachment API code14:16
abhishekkI am also under the impression that if we have a crucial bug fix then we can backport supporting patches for it to stable branches14:17
abhishekkand, I have seen some similar kinds of backports upstream in the past (not for glance though)14:18
croelandtjokke_: what do you think?14:18
whoami-rajatI've already backported cinder side changes and they're already +2ed, so we won't have any issues on code side as far as I'm aware14:18
abhishekkCan we also have opinion from some other stable maintainers ?14:19
jokke_I think I already pointed out in the early phase of fixing these bugs that we should not have made the prevent-qcow2-on-NFS fix depend on the attachment change, as that is not really backportable by the policy.14:20
whoami-rajatwe can't do the qcow2 change without adding the new attachment API changes, it depends on the attachments get command14:21
whoami-rajats/command/api14:21
abhishekkhmm, the policy suggests some corner cases as well14:21
jokke_And I do know and understand that we will be backporting these downstream anyways but that's a totally different story. What comes to the upstream backport, all the refactoring, new dependencies etc. of that attachment API make it a very dodgy backport14:23
croelandtUntil when are we gonna be backporting stuff in wallaby?14:23
croelandtThis might not be an issue for long :D14:24
jokke_whoami-rajat: I think we could have done that by looking at the volume connection we get and at the image file we have. There was no need for the attachment api to figure out that we have a qcow2+NFS combo that we cannot support14:24
jokke_croelandt: wallaby is in active stable maintenance still for another ~8months 14:25
abhishekkI think we should get opinion from stable team as well14:26
croelandtjokke_: is it likely that backporting these patches is going to be an issue in the next 8 months?14:26
whoami-rajatjokke_, the initialize_connection call doesn't return the right format; the feature i implemented on the cinder side was discussed during the PTG and it was decided to include the format in connection_info in the new attachment API response14:26
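A minimal sketch of the check this enables, assuming (as described above) that the new attachment API response carries the volume format in connection_info; the field names and placement below are illustrative, not the actual glance_store or cinder code:

    # Hypothetical connection_info, with the format field the new cinder
    # attachment API is said to return (field placement assumed here).
    connection_info = {
        'driver_volume_type': 'nfs',
        'data': {'format': 'qcow2', 'export': '192.0.2.10:/exports/volumes'},
    }

    vol_format = connection_info['data'].get('format', 'raw')  # absent from older responses
    if connection_info['driver_volume_type'] == 'nfs' and vol_format == 'qcow2':
        # qcow2 on NFS cannot be safely consumed by the glance cinder store
        raise RuntimeError('qcow2 volumes on NFS are not supported by the cinder store')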
abhishekkUnfortunately we have a couple of requests for it from other customers but we will stick to the policy if we need to14:26
abhishekkwe had a lengthy discussion about that14:27
croelandtabhishekk: so, how do we make our decision?14:28
dansmithI'm not on glance stable,14:28
dansmithbut I definitely opt for less backporting in general, and definitely extreme caution over anything complex unless it's absolutely necessary14:28
dansmithI think I looked at this before briefly, and I don't remember all the details, but anything that requires glance and cinder things to be backported is high risk for breaking people that don't upgrade both services in lockstep unless both sides are fully tolerant (and tested that way) of one happening before the other14:29
dansmithdownstream we can handle that testing and risk (and support if it breaks) but it's not really appropriate material for upstream stable in general, IMHO14:30
abhishekkcroelandt, I think we need some opinion from other stable maintainers as well14:31
abhishekkbut given what dansmith has said now, this might be problematic in case of an upgrade14:31
rosmaitasorry i'm late14:31
jokke_dansmith: yeah, that kind of can be flagged in the requirements, which we don't currently do. But in general there are just too many red flags. It's not just one or two of our stable rules that this specific case is crossing14:31
jokke_hi rosmaita \o14:31
jokke_just in time for the attachment api backport discussion14:32
rosmaitaah14:32
dansmithjokke_: requirements.txt you mean? that has nothing to do with what is installed on other servers in a cluster, and certainly no direct impact on distro packages14:32
abhishekkI think rosmaita has been a stable member for a very long time14:32
rosmaitatoo long14:32
rosmaitaand not very stable14:32
abhishekk:D14:32
croelandt*badum tss*14:32
rosmaitais there a place i can read the scrollback?14:33
abhishekkSo just to give you a short overview14:33
rosmaitai think the logs don't get published until the meeting ends14:33
rosmaitaok, short overview is good14:33
jokke_dansmith: well sure, not the service. I was more thinking of cinderclient and os_brick needing to be able to do the right thing anyways14:33
abhishekkwe have one bug fix to backport which depends on a patch that is implemented as a feature14:33
abhishekk#link https://review.opendev.org/c/openstack/glance_store/+/80592714:33
abhishekkthis is actual bug fix14:33
abhishekk#link https://review.opendev.org/c/openstack/glance_store/+/80592614:34
abhishekkthis is dependent patch which is needed for the above backport14:34
abhishekkI am in favor of this backport because I thought:14:34
abhishekkthe change is related to cinder driver and will not affect any other glance backend drivers14:35
jokke_rosmaita: basically the qcow2+NFS fix was implemented in a way that it depends on the attachment API support. Which is a problematic backport due to it introducing a new dependency, depending on cinder side backports and refactoring a significant amount of the driver code14:35
abhishekkand in the past for some other projects I have seen these kinds of backports being supported14:35
rosmaitawell, rajat described the cinder attitude to driver backports very clearly in his last comment on https://review.opendev.org/c/openstack/glance_store/+/80592614:36
abhishekkyes14:37
rosmaitaour view is that people actually use the drivers, and it's a big ask to make them upgrade their entire cloud to a new release, rather than update within their current release14:37
dansmithyou could apply that reasoning to any feature backport for a buddy right?14:38
rosmaitanot really14:38
dansmith"My buddy doesn't want to upgrade but does want this one feature, so we're going to backport so he doesn't have to upgrade?"14:38
rosmaitaif it impacts main cinder code, we don't do it14:39
rosmaitathe difference is that it's isolated to a single driver14:39
jokke_dansmith: that's why we backport in downstream like there is no tomorrow14:39
dansmithyou can apply this to a driver or the main code, either yields the same result for me :)14:39
dansmithjokke_: exactly14:39
croelandtjokke_: true that14:39
abhishekkcroelandt, I think downstream it is then14:40
croelandtjokke_: I'm gonna start backporting every patch, at this rate14:40
abhishekkI can understand the feeling 14:40
jokke_And I personally think that downstream that is a business decision with an attached commitment to support any problems it brings. Upstream we should be very cautious about what we backport as we do not have similar control of the environment14:41
dansmiththis ^14:41
abhishekkack14:41
abhishekkany counter arguments to this?14:42
abhishekkok, moving ahead then14:43
abhishekk#topic Holiday plans14:43
abhishekkcroelandt is going on a 2-week holiday from Monday14:43
abhishekkand jokke_ for 1 week14:43
rosmaitaslackers!14:44
abhishekkis any other core member planning to take time off during the same period?14:44
croelandtrosmaita: hey, people died so that I could have PTO14:44
croelandtI'm glad they did not die so I could eat Brussel sprouts14:44
jokke_rosmaita: I have an excuse. HR and the Irish gov will rain proper shaitstorm on me and my manager if I don't use my holidays. so there's that :P14:44
abhishekkthose two weeks we are going to have ninja time I guess14:45
rosmaitai'm just jealous, that's all14:45
abhishekk++14:45
dansmithabhishekk: I have no such plans at the moment14:45
abhishekkgreat14:45
abhishekkme neither14:45
croelandtrosmaita: Unionize, comrade14:46
jokke_LOL14:46
rosmaitais RH unionized in France?14:46
abhishekkand on top of that jokke_ will send me picture of his tent and beer 14:46
abhishekkI guess that's it from me for today14:47
jokke_Tovarich Cyril :D14:47
abhishekkmoving to open discussion14:47
abhishekk#topic Open discussion14:47
abhishekkNothing from me14:47
rajivHi, sorry to divert from the holiday mood. Firstly, thanks for merging/commenting on a few of my bugs. The below is still pending: https://bugs.launchpad.net/swift/+bug/1899495 any update on this?14:47
jokke_I can be inclusive and send the pictures to rest of ye too!14:47
abhishekkHappy holidays croelandt and jokke_ 14:47
rosmaitajokke_: what's your current beer total on that app?14:48
abhishekkrajiv, I think everyone is busy at the moment on M3 priorities as it is just around the corner14:48
jokke_rajiv: like I mentioned, that is a tricky one from the Glance point of view. And I have a couple of others in the works still too14:48
rajivokay, i would like to understand how the glance-api process consumes memory? for example, different image types and sizes result in different glance-api process memory consumption.14:49
jokke_rajiv: I'll be back chasing those once I'm back after next week. 14:49
croelandtrosmaita: some employees are in a union14:49
* croelandt is not :-(14:49
abhishekkwe can make ours14:49
rosmaitarajiv: i think you get a 409 on a container delete if it's not empty14:49
abhishekkglance union14:49
rosmaitanot sure that's a helpful operation14:50
rajivrosmaita: the swift container isn't empty; since the deletion goes in parallel, a conflict occurs.14:50
rosmaitai mean, "observation"14:50
rajivwe introduced retries but it didn't help either.14:50
abhishekkrosmaita, is cinder hitting any tempest failures in gate at the moment ?14:50
rosmaitaabhishekk: i sure hope not14:50
rajivtoday, i had a user upload 20 images in parallel and glance-api crashed.14:51
rosmaitai will look14:51
abhishekkrosmaita, ack, please let me know14:51
rajivrosmaita: jokke_ abhishekk any suggestion on my second question ?14:52
jokke_rajiv: I saw your question in #os-glance ... so the memory consumption is very tricky to predict. There is quite a bit of buffering involved as the data actually passes through the API service, and like you saw there might be lots of data in buffers and caches when you have lots of concurrent connections in flight14:52
rajivjokke_: yes, i raised it there as well but had no response, hence i asked here.14:52
jokke_rajiv: yeah, you were gone by the time I saw it :D14:53
rajivinitially i set 3GB as the limit, but setting 5GB for 20 image uploads still chokes the glance-api process and sometimes the process gets killed14:53
rosmaitaabhishekk: just got an tempest-integrated-storage failure on tempest.api.compute.admin.test_volume.AttachSCSIVolumeTestJSON.test_attach_scsi_disk_with_config_drive14:54
jokke_In general this is one of those things we've seen in busy production clouds, having a set of decent dedicated servers for g-api alone, as it can be quite taxing14:54
rajivhence the upload terminates and sends back an HTTP 502 and we have to manually delete the chunks in the swift container as well.14:54
rosmaitajust that one test, though14:54
abhishekkrosmaita, yeah, I am hitting that three times since morning14:54
rosmaitathat same test?14:54
abhishekks/I am/I hit14:54
abhishekkyeah14:55
jokke_rajiv: yeah, I don't think we ever designed any part of the service around really capping memory. So I think you setting limits for it will eventually lead to that same situation again14:55
rajivjokke_: okay, is there a doc or code i can refer ?14:55
rajivto understand how memory consumption works ? or a pattern ?14:56
jokke_rajiv: it's just a matter of whether it's 5 concurrent operations, 20 or 30. But you will eventually hit your artificial limit and get the service killed14:56
rosmaitaabhishekk: it did pass on a patch that depended on the one with the failure (which of course got a -2 from Zuul because the dependency didn't merge)14:56
abhishekkrosmaita, not same test, mine is resize related14:56
rajivthe image being uploaded was ~900GB; among the 20, only 3 images were created.14:56
abhishekkhttps://52b5fef6b4a63a70ea73-b7be325c2c973618eb7074df9913ea2c.ssl.cf5.rackcdn.com/799636/21/check/tempest-integrated-storage/91789c0/testr_results.html14:57
jokke_rajiv: I can't recall us having any documentation about that. 14:57
rosmaitaok, mine was a server-failed-to-delete problem14:57
jokke_rajiv: the main limiting factor is the chunk size14:57
abhishekkrosmaita, ack14:57
abhishekkthank you14:57
rajivthe chunk size is 200MB, and enabling buffering did not help either.14:57
abhishekk3 minutes to go14:58
abhishekk214:58
jokke_so each of the greenlet worker threads will eat some memory, and while you have data transfers in flight there are obviously the network buffers involved, but really the chunking is the main limiting factor keeping the API from just caching your whole 900 gigs into memory if the storage is slow :D14:58
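A minimal sketch of why chunking bounds upload memory, assuming (per the discussion above) that the store uploads each image in fixed-size pieces rather than reading it whole; the numbers are illustrative only:

    # Only one chunk per in-flight upload is held at a time, so a rough worst
    # case for upload data alone is concurrent_uploads * chunk_size
    # (e.g. 20 * 200 MB ~= 4 GB), before any extra buffers or caches.
    CHUNK_SIZE = 200 * 1024 * 1024  # 200 MB, matching the chunk size mentioned above

    def iter_chunks(image_file, chunk_size=CHUNK_SIZE):
        """Yield the image in fixed-size pieces instead of reading it whole."""
        while True:
            chunk = image_file.read(chunk_size)
            if not chunk:
                break
            yield chunk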
rosmaitahmmm, looks like it never got to the point where it could attach a volume, couldn't ssh into the vm14:59
jokke_rajiv: we can continue on #os-glance as we're running out of time if you prefer14:59
rajivsure, switching over.14:59
abhishekkrosmaita, yes14:59
abhishekkthank you all14:59
abhishekkhave a nice weekend14:59
rosmaitai think it's just one of those random failures14:59
jokke_thanks everyone14:59
abhishekk#endmeeting15:00
opendevmeetMeeting ended Thu Aug 26 15:00:13 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)15:00
opendevmeetMinutes:        https://meetings.opendev.org/meetings/glance/2021/glance.2021-08-26-14.00.html15:00
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/glance/2021/glance.2021-08-26-14.00.txt15:00
opendevmeetLog:            https://meetings.opendev.org/meetings/glance/2021/glance.2021-08-26-14.00.log.html15:00
*** rpittau is now known as rpittau|afk16:02
*** akekane_ is now known as abhishekk16:18

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!