Thursday, 2021-07-29

07:10 *** rpittau|afk is now known as rpittau
14:00 <abhishekk> #startmeeting glance
14:00 <opendevmeet> Meeting started Thu Jul 29 14:00:16 2021 UTC and is due to finish in 60 minutes.  The chair is abhishekk. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00 <opendevmeet> The meeting name has been set to 'glance'
14:00 <abhishekk> #topic roll call
14:00 <jokke_> o/
14:00 <abhishekk> #link https://etherpad.openstack.org/p/glance-team-meeting-agenda
14:00 <abhishekk> o/
14:00 <dansmith> o/
14:00 <croelandt> o/
14:01 <abhishekk> Cool, I guess we have enough audience today, let's start; rosmaita will join shortly
14:01 <rosmaita> o/
14:01 <abhishekk> #topic release/periodic jobs update
14:01 <amorin> hello
14:02 <abhishekk> M3 is 5 weeks away and the glance-store release is 3 weeks away
14:02 <abhishekk> We are good on the store release front at the moment
14:03 <abhishekk> Periodic jobs: we started hitting timeouts again, daily at least one job is failing with a timeout issue
14:03 <abhishekk> Same goes for our patches in the gate
14:03 <dansmith> where is the timeout?
14:04 <abhishekk> wait a minute
14:04 <abhishekk> https://zuul.opendev.org/t/openstack/build/e3e617817bce4d7b8fe332ee9a528610
14:04 <dansmith> oh,
14:05 <abhishekk> I haven't had much time to go through the logs, will do it after the meeting
14:05 <dansmith> hmm, yeah okay
14:05 <dansmith> interesting that it was doing some metadef namespace tests right then
14:05 <abhishekk> ok
14:06 <abhishekk> The important part is we have 5 weeks from now for the policy refactoring and cache API work
14:07 <abhishekk> Implementing project scope for metadefs is in good shape and does not have much work left
14:07 <abhishekk> Moving ahead
14:07 <abhishekk> #topic M3 targets
14:07 <abhishekk> Policy refactoring work
14:08 <abhishekk> You can get an overview of this entire work in one spreadsheet
14:08 <abhishekk> #link https://docs.google.com/spreadsheets/d/1SWBq0CsHw8jofHxmOG8QeZEX6veDE4eU0QHItOu8uQs/edit?pli=1#gid=73773117
14:08 <dansmith> (most impressive spreadsheet I've seen in a long time)
14:08 <abhishekk> Around 50% of the patches are up for review
14:08 <abhishekk> :D
14:09 <abhishekk> croelandt is already doing a great job for us in reviews, thank you for that
14:09 <dansmith> I don't really expect we're going to fully finish the refactor in xena, but I do think we're making more progress than I thought we would
14:10 <abhishekk> hmm, we will assess the situation next week again
14:10 <abhishekk> All I will say is: please review patches, there are plenty in the tree other than policy as well
14:11 <abhishekk> I am hoping once Lance is back the work will pick up speed
14:11 <abhishekk> Moving to Cache API
14:11 <abhishekk> jokke_, anything to update?
14:12 <abhishekk> I guess testing and documentation are pending for the same
14:12 <jokke_> I will hopefully get back to it (the tests) next week. Had a couple of days of PTO and been busy with some other stuff
14:13 <abhishekk> ack, please let me know if there is anything to do
14:13 <jokke_> Which should be sorted this week ;)
14:13 <abhishekk> great
14:13 <abhishekk> Metadef project persona integration
14:13 <abhishekk> #link https://review.opendev.org/c/openstack/glance/+/798700/
14:14 <abhishekk> We have tempest-plugin tests up and under review for the same as well
14:14 <abhishekk> That's it from me for today
14:15 <abhishekk> Just to note that I have filed one bug in glanceclient
14:15 <abhishekk> Our client does not have support to show member details
14:16 <abhishekk> Moving to Open discussion
14:16 <abhishekk> #topic Open discussion
14:16 <abhishekk> CI issue on stable/ussuri?
14:16 <abhishekk> #link https://review.opendev.org/c/openstack/glance/+/785552
14:17 <abhishekk> This certainly started to fail; earlier it was passing
14:17 <croelandt> yes
14:17 <croelandt> the logs are weird
14:17 <croelandt> I'm not sure exactly what to do about it
14:17 <croelandt> nor how to find out the root cause of the issue :/
14:17 <croelandt> IOW: I'm stuck :D
14:17 <abhishekk> Me too
14:17 <jokke_> I'll have a quick look after the meeting
14:18 <abhishekk> #link glance-code-constants-check https://zuul.opendev.org/t/openstack/build/22fc30ffbb0b400f87f4261d7397fec0
14:18 <abhishekk> great, thank you
14:18 <croelandt> I'm not even sure what this job is, tbh
14:18 <abhishekk> I will explain after the meeting
14:18 <dansmith> I would like to know too :)
14:18 <jokke_> it's sheet
14:18 <jokke_> :P
14:19 <croelandt> jokke_: always here to make things easy to understand :D
14:19 <abhishekk> It has something to do with database migration
14:20 <abhishekk> next on Open discussion: we have a review request for this bug
14:20 <alistarle> Hello guys, it's been a long time :)
14:20 <abhishekk> #link https://bugs.launchpad.net/glance/+bug/1938265
14:20 <alistarle> we just saw a bug with multi-store when taking a snapshot with the rbd driver (so using direct-snapshot): https://bugs.launchpad.net/glance/+bug/1938265
14:20 <abhishekk> alistarle, welcome back
14:20 <alistarle> Yup, I always come back when I find a new bug ><'
14:21 <abhishekk> :D
14:21 <abhishekk> I think, as suggested by Dan, we need some tests there
14:21 <alistarle> We are currently writing them, they should be submitted today :)
14:22 <alistarle> It seems it totally prevents nova from performing snapshots with this configuration
14:22 <jokke_> alistarle: thanks for the patch too, not just a bug :D
14:22 <jokke_> I was peeking into it earlier today
14:23 <alistarle> It's not actually my patch, but amorin's :)
14:23 <alistarle> But yeah, cool to fix stuff
14:23 <amorin> we did it together, but you're the boss for the tests for sure :)
14:23 <amorin> do you have any clue what kind of tests to write for this?
14:24 <abhishekk> glad to see that multistore is finally in use
14:24 <amorin> FYI, we hit this bug on Stein downstream
14:24 <amorin> but the code base has not moved so much
14:25 <abhishekk> no, I guess
14:25 <abhishekk> what happens when rbd is not used in nova?
14:25 <amorin> then nova is not doing any location update
14:25 <amorin> it's done differently AFAIK
14:25 <dansmith> yup
14:26 <amorin> only direct snapshot is triggering this
14:26 <alistarle> It calls the standard glance workflow, so POST /images
14:26 <alistarle> And what about the solution, does it seem suitable to you?
14:26 <jokke_> I wonder if that issue actually persists in Train, or when we implemented the lazy upload, abhishekk. That should at least in theory solve it, or do we fail the actual add location call?
14:26 <alistarle> I think guessing the store is acceptable, as we are already doing it for lazy update
14:27 <amorin> I tested this against a small Victoria deployment in my dev environment
14:27 <amorin> and I triggered the same bug
14:27 <alistarle> No, because lazy update is called for get and list calls, not for the patch adding a new location
14:27 <jokke_> kk, so it's the actual location add validation that fails. Interesting
14:27 <alistarle> So we are still hitting this enigmatic "Invalid location" 400 error
14:28 <abhishekk> jokke_, lazy update is for list or get calls, once we have the actual image in action
14:28 <jokke_> croelandt: add to your list :D
14:28 <alistarle> And something interesting is we are allowing "backend=None" in glance, but in glance_store it is a mandatory parameter
14:28 <jokke_> abhishekk: yeah, but we do a get right after the add, so if it would allow adding it, we would have updated it right after, I think
14:29 <alistarle> And as far as I see, there is no way "backend=None" can produce a workable result, it will always end in a KeyError somewhere
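[Editor's note: a minimal sketch, not the actual glance_store code, of the failure mode alistarle describes: in multi-store mode, locations are resolved against a mapping of configured stores keyed by backend name, so a location recorded with backend=None has nothing to resolve to.]

```python
# Illustrative sketch only (hypothetical names, not glance_store internals):
# configured stores live in a dict keyed by backend name.
enabled_backends = {
    "ceph": "rbd driver",
    "local": "file driver",
}

def resolve_store(backend):
    # glance_store treats the backend name as mandatory; a location
    # added with backend=None has no configured entry to match,
    # so the lookup raises KeyError.
    return enabled_backends[backend]

print(resolve_store("ceph"))          # a configured store resolves fine
try:
    resolve_store(None)
except KeyError:
    print("backend=None -> KeyError")
```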
14:29 <abhishekk> I think we have one job running for nova-ceph-glance
14:29 <abhishekk> Does it mean we have no test to cover this workflow?
14:30 <amorin> :(
14:30 <abhishekk> alistarle, nova-ceph-multistore: could you verify this job and check whether we run this scenario or not?
14:30 <jokke_> abhishekk: also means that nova doesn't test snapshotting with Ceph either
14:31 <dansmith> unless the ceph-multistore job isn't running full tempest, then we certainly are
14:31 <abhishekk> Else for a start I guess we can add this scenario as a reproducer and then consider this fix
14:32 <abhishekk> I do see 923 tests running in that job
14:32 <abhishekk> and 80 skips
14:33 <dansmith> hmm, looks like maybe the ceph-plugin job we inherit from might be missing some of full tempest
14:33 <dansmith> we do volume snap tests, but might be skipping the compute ones
14:33 <dansmith> not us, but that plugin's job def, which we inherit from
14:34 <jokke_> it's likely the storage scoped parent job. A few cycles back we stopped running full tempest 20 times on all possible scenarios
14:34 <abhishekk> hmm
14:35 <abhishekk> So I guess we need to enable those
14:35 <dansmith> https://github.com/openstack/devstack-plugin-ceph/blob/master/.zuul.yaml#L31
14:36 <dansmith> looks like that's focused mostly on volume tests
14:36 <jokke_> or we probably should finally default to multistore configs in the gate so the tests would actually catch multi-store issues
14:36 <jokke_> as we deprecated the single store configs quite a while ago
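[Editor's note: for context, a minimal glance-api.conf fragment in the multi-store style being discussed; store names and paths here are illustrative examples, not the gate job's actual settings. This is the configuration form that replaces the deprecated single-store options.]

```ini
# Illustrative multi-store glance-api.conf fragment; "ceph", "local",
# and all values are example placeholders.
[DEFAULT]
enabled_backends = ceph:rbd, local:file

[glance_store]
default_backend = ceph

[ceph]
rbd_store_pool = images
rbd_store_user = glance
rbd_store_ceph_conf = /etc/ceph/ceph.conf

[local]
filesystem_store_datadir = /var/lib/glance/images/
```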
14:36 <dansmith> we've been running that in nova as our ceph job for a long time, before multistore :/
14:37 <abhishekk> jokke_, we don't have support to configure multiple stores other than the file store in devstack
14:37 <dansmith> ...and ceph?
14:38 <abhishekk> I have patches up for swift and ceph but didn't get time to look at those again
14:38 <dansmith> I'm not sure what you mean.. the multistore job is file+ceph
14:38 <abhishekk> #link https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/741801
14:39 <dansmith> right, but you know that my multistore job is ceph+file, right?
14:39 <abhishekk> you have done it via the yaml file by adding some local configs there
14:39 <dansmith> right
14:39 <abhishekk> yep
14:39 <dansmith> you just mean there's no easy flag in devstack, okay
14:39 <abhishekk> yeah
14:40 <abhishekk> I was replying to jokke's comment, to default to multistore in the gate
14:40 <abhishekk> this is for swift + file, https://review.opendev.org/c/openstack/devstack/+/741654
14:40 <alistarle> Despite that, I think this bug can also be valid with a multi-store configuration that has a single backend
14:41 <dansmith> ack, okay
14:41 <alistarle> No need to have two backends actually configured
14:41 <abhishekk> alistarle, right,
14:42 <abhishekk> So as I said, and as rightly commented by Dan on the patch, we need a reproducer first and then the fix
14:42 <jokke_> alistarle: correct, that was my point. By now we should not be testing the old single store configs by default, but actually configuring the store(s), even a single one, with multi-store enabled
14:43 <dansmith> okay, hold up
14:43 <dansmith> test_create_image_from_paused_server[id-71bcb732-0261-11e7-9086-fa163e4fa634]
14:43 <dansmith> pass
14:43 <jokke_> as the old way of configuring should have been removed already, but obviously that's years away if all gating is still relying on it
14:43 <dansmith> that's from our multistore job, so it should be creating images from servers
14:43 <dansmith> and maybe better:
14:43 <dansmith> test_create_delete_image[id-3731d080-d4c5-4872-b41a-64d0d0021314] pass
14:44 <dansmith> so we should dig into a job run and see if those are really working, and if so, figure out why
14:45 <abhishekk> dansmith, those might be using the post call and not copy-on-write?
14:45 <alistarle> Yes, but it fails only with the rbd backend, so direct-snapshot enabled
14:45 <alistarle> Maybe those jobs are using the file backend?
14:45 <dansmith> abhishekk: in the ceph job they should be rbd-backed, which means it'll do the cow snapshot
14:46 <dansmith> if these tests are actually on the file backend, then the whole premise of the job is broken
14:46 <abhishekk> maybe
14:47 <abhishekk> I think the default backend is ceph
14:47 <abhishekk> as defined in the job
14:47 <dansmith> that's the whole point of the job, yeah
14:48 <jokke_> Well, Nova is clearly not doing direct snapshots in that job 'cause it would have been broken
14:48 <alistarle> Yes, I can double check, but even in the case of the default backend, this line will fail https://github.com/openstack/glance_store/blob/master/glance_store/location.py#L111 if backend = None
14:48 <jokke_> so even if glance is using ceph as backend, nova might still be uploading the image
14:48 <abhishekk> default is the file backend
14:48 <alistarle> And the backend comes directly from the metadata of the "add_location" call
14:49 <dansmith> jokke_: all of our customers use rbd and direct snapshots.. surely if this was completely broken someone would have mentioned it, right?
14:49 <jokke_> dansmith: OSP 16.1+ DCN only.
14:49 <amorin> are they using multi store?
14:50 <dansmith> amorin: I thought the assertion is that this is broken in non-multistore as well?
14:50 <dansmith> jokke_: you mean multistore is only 16.1+, right?
14:50 <jokke_> dansmith: but that's why I told croelandt to add it to his list of work to do, as we need to fix this for downstream customers too
14:50 <abhishekk> https://github.com/openstack/nova/blob/master/.zuul.yaml#L480
14:51 <amorin> good point, I haven't checked in non-multistore
14:51 <jokke_> dansmith: only 16.1+ DCN (distributed/edge) with storage. Otherwise we still config the old way
14:51 <amorin> it may not be broken because we are not calling the same function
14:51 <amorin> we are calling https://github.com/openstack/glance_store/blob/master/glance_store/location.py#L55 AFAIK
14:51 <dansmith> ah, alistarle said "single multistore configuration" above.. I took that to mean "non-multistore" but I see
14:51 <abhishekk> dansmith, images_rbd_glance_store_name what does this do?
14:52 <dansmith> abhishekk: it's just for the auto-copy from another store
14:52 <abhishekk> ack
14:52 <abhishekk> so the job is running for file store I guess
14:53 <abhishekk> Last 7 minutes
14:54 <dansmith> I dunno what "running for file store" means
14:54 <abhishekk> dansmith, the default backend in the job is the file backend
14:54 <abhishekk> so any request coming from nova to glance will be using the file backend
14:55 <dansmith> abhishekk: nova will always copy it to rbd first when booting, so it should be on rbd when the snapshot happens
14:55 <abhishekk> because of that flag I just mentioned earlier?
14:55 <jokke_> dansmith: but nova doesn't do direct snapshot if it had to copy it into ceph
14:55 <dansmith> abhishekk: yes
14:55 <dansmith> jokke_: huh?
14:56 <jokke_> dansmith: if nova pulls the image over http from glance and writes it to ceph, it will not do a direct snapshot in ceph. It will upload the snapshot image back over http too
14:56 <dansmith> we're calling set image location clearly in the tests: tempest-ImagesTestJSON-1022511244-project] enforce: rule="set_image_location"
14:57 <jokke_> hmm-m, interesting
14:57 <abhishekk> last 3 minutes
14:57 <abhishekk> we can move to the glance channel for discussion
14:57 <dansmith> jokke_: it's not doing that though. it's asking glance to copy the cirros image from file to rbd before it ever boots the instance, and then boots the instance from the rbd copy
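[Editor's note: a rough sketch, an assumption based on this exchange rather than nova's actual code, of the branching being debated: a direct (copy-on-write) snapshot only happens when the instance's root disk already lives in an rbd store glance can address; otherwise the snapshot is streamed back to glance over http via the generic POST /images workflow.]

```python
# Hypothetical decision logic for illustration only; function and
# parameter names are invented, not nova's API.
def snapshot_path(root_disk_backend, glance_has_rbd_store):
    # rbd root disk + glance rbd store: nova can clone in ceph and
    # register the location with glance (the add-location call that
    # fails in the bug discussed above).
    if root_disk_backend == "rbd" and glance_has_rbd_store:
        return "direct-snapshot"
    # anything else: snapshot data is uploaded back over http.
    return "http-upload"

print(snapshot_path("rbd", True))   # direct-snapshot
print(snapshot_path("file", True))  # http-upload
```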
14:57 <abhishekk> jokke_, you need to look at the stable/ussuri job as well
14:58 <abhishekk> let's move to our channel
14:58 <abhishekk> Thank you all
14:58 <abhishekk> See you next week
14:59 <abhishekk> Keep reviewing
14:59 <abhishekk> #endmeeting
14:59 <opendevmeet> Meeting ended Thu Jul 29 14:59:21 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
14:59 <opendevmeet> Minutes:        https://meetings.opendev.org/meetings/glance/2021/glance.2021-07-29-14.00.html
14:59 <opendevmeet> Minutes (text): https://meetings.opendev.org/meetings/glance/2021/glance.2021-07-29-14.00.txt
14:59 <opendevmeet> Log:            https://meetings.opendev.org/meetings/glance/2021/glance.2021-07-29-14.00.log.html
16:03 *** rpittau is now known as rpittau|afk

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!