Wednesday, 2020-06-24

01:12 *** rcernin has quit IRC
01:17 *** rcernin has joined #openstack-glance
01:18 *** Liang__ has joined #openstack-glance
01:32 *** Liang__ has quit IRC
01:33 *** Liang__ has joined #openstack-glance
01:47 *** Liang__ has quit IRC
02:34 *** rcernin has quit IRC
02:36 *** rcernin has joined #openstack-glance
03:46 *** rcernin has quit IRC
03:55 *** rcernin has joined #openstack-glance
04:04 *** rcernin has quit IRC
04:05 *** rcernin has joined #openstack-glance
04:33 *** evrardjp has quit IRC
04:33 *** evrardjp has joined #openstack-glance
04:35 *** udesale has joined #openstack-glance
04:44 *** ratailor has joined #openstack-glance
05:23 *** m75abrams has joined #openstack-glance
05:40 *** udesale has quit IRC
06:25 *** udesale has joined #openstack-glance
06:29 *** udesale_ has joined #openstack-glance
06:30 *** udesale has quit IRC
06:51 *** gyee has quit IRC
07:26 <openstackgerrit> Abhishek Kekane proposed openstack/glance master: WIP: Fix race condition in copy image operation  https://review.opendev.org/737596
07:39 *** rcernin has quit IRC
07:47 *** priteau has joined #openstack-glance
07:49 *** belmoreira has joined #openstack-glance
07:57 *** bhagyashris is now known as bhagyashris|lunc
08:08 *** belmoreira has quit IRC
08:08 *** belmoreira has joined #openstack-glance
08:11 *** udesale_ has quit IRC
09:08 *** bhagyashris|lunc is now known as bhagyashris
09:12 *** rcernin has joined #openstack-glance
09:17 *** rcernin has quit IRC
10:03 *** tkajinam has quit IRC
10:05 <openstackgerrit> vinay harsha mitta proposed openstack/glance master: api-ref needs update about 'checksum' image prop  https://review.opendev.org/737735
10:10 *** rcernin has joined #openstack-glance
10:15 *** rcernin has quit IRC
11:27 *** rcernin has joined #openstack-glance
11:34 *** rcernin has quit IRC
11:39 *** rcernin has joined #openstack-glance
11:40 *** wxy has quit IRC
12:08 *** rcernin has quit IRC
12:34 *** rchurch has quit IRC
12:35 *** rcernin has joined #openstack-glance
12:37 *** rchurch has joined #openstack-glance
13:18 *** jawad_axd has quit IRC
13:18 *** jawad_axd has joined #openstack-glance
13:29 *** rcernin has quit IRC
13:56 *** Liang__ has joined #openstack-glance
13:57 *** Liang__ is now known as LiangFang
14:14 <dansmith> abhishekk: correct me if I'm wrong, but this test seems bogus to me: https://github.com/openstack/glance/blob/master/glance/tests/unit/v2/test_images_resource.py#L2947-L2958
14:14 <dansmith> the failure that generates the forbidden result is failure to create a task as owner=None
14:14 <dansmith> which would never happen via the API, right?
14:15 <dansmith> I guess that is why you say "not expected in normal scenarios",
14:15 <abhishekk> yes
14:16 <dansmith> but the test being there makes it look like a user could get Forbidden via the API, but they can't
14:16 <dansmith> even if I change that to owner=TENANT2, it allows the test to complete (without my code)
14:16 <abhishekk> that was added for coverage purposes
14:16 <dansmith> so shall I change that test and note to be used to test my auth check?
14:16 <dansmith> what coverage?
14:16 <abhishekk> Sounds good
14:17 <abhishekk> code coverage
14:17 <dansmith> heh
14:18 <dansmith> I mean, what code does this cover that remains uncovered without it? it's testing something that's not possible via the API, so unless there is some dead code that isn't normally reachable...
14:18 <abhishekk> That's why I added the note there
14:35 *** ratailor has quit IRC
14:37 <openstackgerrit> Dan Smith proposed openstack/glance master: Check authorization before import for image  https://review.opendev.org/737548
14:41 <abhishekk> dansmith, this issue ^ is for the copy-image operation, right?
14:44 <dansmith> abhishekk: import_image(method='copy-to-store') but I imagine it applies to all import_image operations, no?
14:47 <abhishekk> dansmith, could be (but not sure this will happen in real scenarios other than the copy-to-store case)
14:48 <dansmith> why not?
14:48 <abhishekk> like if an image is in queued state and another user uses that image id to import an image, less possibility
14:48 <dansmith> you mean it's not likely for a user to try to do that, but if the image was created with --public then they could _try_ right?
14:49 <abhishekk> hmm
14:49 <dansmith> meaning if I create an image with --public, and then go make coffee,
14:49 <dansmith> some other user could see it and say "hey let me try to import my qcow into that"
14:49 <abhishekk> got it
14:49 <dansmith> without this, they will get a 202 from the import command, and the task will fail on the server side, but otherwise not know why
14:50 <dansmith> note the commit message, where even a different tenant will be allowed to create the task, which is why it's important to change that test to use a _different_ tenant and not _no_ tenant
14:50 <abhishekk> clear :D
14:50 <dansmith> ack :)
14:51 <abhishekk> by the way, I tried a little workaround for the race condition, posted a new PS
14:51 <dansmith> okay will look in a sec
14:52 <abhishekk> no problem, take your time
14:53 <abhishekk> Cleanup of pending tasks on service start needs to be considered as new work
15:07 *** m75abrams has quit IRC
15:27 *** priteau has quit IRC
15:37 *** LiangFang has quit IRC
15:48 *** suryasingh has joined #openstack-glance
15:56 * abhishekk going for dinner, back in 30
15:57 *** gyee has joined #openstack-glance
16:04 *** mordred has joined #openstack-glance
16:05 <mordred> rosmaita: all_stores_must_succeed ... the docs describe both true and false cases but don't indicate what the default is. I'd *guess* default is false, would that be right?
16:06 <rosmaita> mordred: i don't remember!  give me a few minutes to look
16:09 <mordred> rosmaita: thanks!
16:10 <rosmaita> mordred: looks like it defaults to true: https://opendev.org/openstack/glance/src/branch/master/glance/api/v2/images.py#L108
16:11 <mordred> rosmaita: aha! glad I asked
16:14 <rosmaita> yeah, it's kind of counterintuitive -- but the idea is that you requested that the image be in stores X, Y, Z, so if it can't be stored in all 3, the call should fail
16:15 <rosmaita> mordred: i'm going to file a doc bug for that, thanks for pointing it out
16:17 <dansmith> mordred:             all_stores = body.get('all_stores', False)
16:18 <rosmaita> dansmith: that's a different one
16:18 <dansmith> oh all_stores and all_stores_must_succeed I guess? confusing
16:19 <rosmaita> all_stores is shorthand so that you don't have to list all the stores in the call; all_stores_must_succeed refers to all the stores listed in the call
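[Editor's note: the distinction rosmaita describes can be illustrated with two hedged example bodies for the import call; the store names are invented for illustration, and the defaults follow the code link above.]

```python
# Hypothetical request bodies for POST /v2/images/{image_id}/import.
# Store names "ceph-a" and "swift-b" are made up for illustration.

# Import only into the stores listed; fail the whole task if any one
# of them fails (True matches the code default per images.py#L108).
import_specific_stores = {
    "method": {"name": "glance-direct"},
    "stores": ["ceph-a", "swift-b"],
    "all_stores_must_succeed": True,
}

# "all_stores" is the shorthand: import into every configured store,
# so no explicit "stores" list is needed.
import_all_stores = {
    "method": {"name": "glance-direct"},
    "all_stores": True,
    "all_stores_must_succeed": True,
}
```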
16:19 <rosmaita> you can tell this API was designed by developers!
16:33 <dansmith> gotcha
16:38 <mordred> rosmaita: ya - it is a little counterintuitive - although I do think it's the right default behavior
16:38 <mordred> rosmaita: like - it's one of those flags that probably nobody should ever use :)
16:38 <rosmaita> :D
16:39 <dansmith> you mean all_stores_must_succeed should default to true?
16:40 <dansmith> I'm not really arguing that it shouldn't, but it would suck to upload 1TiB over your home cable connection only to find out that one store ran out of disk, and if others had succeeded, at least the image is in the cloud and you can copy-from-store to fix it :)
16:41 <rosmaita> yes, that would be a motivating scenario for learning about the option!
16:42 <mordred> dansmith, rosmaita: https://review.opendev.org/737845 Add support for multiple image stores
16:42 <dansmith> rosmaita: heh, fair enough
16:42 <mordred> dansmith: well - the failure case for all_stores_must_succeed is that the image remains in the staging area
16:43 <mordred> and the image is deleted from any store that it did succeed copying to - but import can be called again without needing to re-upload over your home cable connection
16:43 <rosmaita> mordred: i'll put your patch on the agenda for tomorrow's glance meeting to get some eyes on it
16:43 <mordred> rosmaita: there's one ahead of it in the stack adding image import support - and also one adding the flag to openstackclient
16:44 <mordred> rosmaita: https://review.opendev.org/#/c/737625/
16:44 <rosmaita> ok
16:44 <rosmaita> thanks
16:44 <dansmith> mordred: oh, it does that for all store types?
16:45 <mordred> rosmaita: while I'm bugging you ... with normal upload the content is copied straight to the backing store (like ceph) but with import it goes into a staging area and my understanding is that that's on the controller. is it possible to configure the staging area to also be in ceph - like for controller nodes with limited local disk?
16:46 <mordred> dansmith: yeah.
16:46 <rosmaita> mordred: not yet
16:46 <mordred> rosmaita: oh - interesting - I see another place in the docs where it does indicate False is default
16:46 <rosmaita> that's not good
16:46 <mordred> rosmaita: the text docs of import image indicate it - it's just not on the field docs for all_stores_must_succeed itself
16:50 <abhishekk> default value for all_stores_must_succeed is False
16:50 <rosmaita> abhishekk: https://opendev.org/openstack/glance/src/branch/master/glance/api/v2/images.py#L108
16:51 <abhishekk> oops
16:52 <mordred> honestly - True is likely a more expected value - I'd vote that the docs just get updated
16:52 <mordred> but - you know - I'm really mostly asking so I can document the parameter properly :)
16:52 <rosmaita> mordred: where did you see that in the docs -- I only found (default is True)
16:53 <mordred> rosmaita: https://docs.openstack.org/api-ref/image/v2/index.html?expanded=import-an-image-detail
16:53 <abhishekk> rosmaita, that is in the api ref
16:53 <mordred> rosmaita: "done and the state of the image remains unchanged. When set to False (default), the workflow will fail only if the upload fails on all stores"
16:54 <mordred> but I agree - the code seems to say True
16:55 <rosmaita> mordred: thanks for finding that
16:55 <mordred> \o/
16:55 * mordred is mildly helpful
16:55 <abhishekk> yes, that needs to be corrected
16:56 <dansmith> so, sorry to rewind, but... I'm still kinda stuck on this staging directory thing... when I upload via import, I send the whole image to the staging area, and only then does it start copying to stores? does it copy to stores in parallel?
16:57 <mordred> dansmith: it copies to stores with the import call
16:57 <dansmith> and if the initial upload fails, say because the glance service dies, how does glance know that the thing in the staging area isn't valid if I try to import again?
16:58 <dansmith> mordred: maybe I'm missing the fact that there are multiple API steps because I've just used the client?
16:58 <mordred> dansmith: so, it's POST /images ; PUT /images/{id}/stage ; POST /images/{id}/import
16:59 <dansmith> okay, totes didn't get that from using the client, so ... if all the links are the same speed, twice as long to get the image ready in the new location I guess
16:59 <mordred> dansmith: yah. so in the single-store no-transforms-needed case it's more efficient to PUT to image.data
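[Editor's note: the glance-direct flow mordred lists can be sketched as a (method, path) sequence; this is a simplified illustration of the three steps, not a client implementation — a real client also sends the image bytes with the PUT and a JSON body with the final POST.]

```python
def glance_direct_calls(image_id):
    """Return the (method, path) sequence for a glance-direct import.

    Sketch of the three-step flow described in the discussion above.
    """
    return [
        ("POST", "/v2/images"),                        # create the image record
        ("PUT", "/v2/images/%s/stage" % image_id),     # upload bytes to staging
        ("POST", "/v2/images/%s/import" % image_id),   # kick off the import task
    ]
```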
16:59 <dansmith> this puts the race condition abhishekk and I are working on in perspective
17:00 <dansmith> mordred: yeah I didn't grok that before, but I get it now
17:00 <dansmith> assume there's some hash/size checking to make sure that a previous stage got the image there in one piece before you try to import from staging?
17:01 <mordred> dansmith: no, not to my knowledge - there is no hash/checksum returned from the put to stage
17:02 <dansmith> hrm, okay
17:02 <mordred> *but* - I haven't tried - it's possible the image resource will have an updated checksum after upload to staging that could be checked
17:02 <dansmith> so and, um, the staging dir is per node/worker, so how do you ensure you hit the same worker again when you go to do the import?
17:07 <dansmith> abhishekk: ^ ?
17:08 <abhishekk> dansmith, the staging area is common for all workers on a single node
17:08 <dansmith> abhishekk: on a single node, sure
17:08 <abhishekk> For HA we need to set that staging area on NFS
17:08 <dansmith> but if you have three controller nodes, each running some workers... if I stage to worker1...
17:09 <dansmith> dear god really?
17:09 <abhishekk> that's the limitation at the moment if you use the glance-direct import method
17:09 <abhishekk> if you use the web-download import method then it's not needed
17:09 <dansmith> that seems like a limitation that puts it outside the realm of being useful at all
17:10 <dansmith> oh
17:10 <dansmith> import that takes a URI instead of uploading the image from the client?
17:10 <abhishekk> yes
17:11 <dansmith> okay
17:11 <dansmith> that significantly complicates getting the image uploaded though
17:11 <abhishekk> in the case of the web-download import method there will be only two API calls
17:11 <dansmith> okay but presumably glance still stores it in its staging directory after it pulls the image?
17:12 <abhishekk> yes and in this case /import does the work of staging and then importing to the store
17:26 <dansmith> abhishekk: when using copy-to-store, I assume glance is downloading the image to the staging area and then pushing it back to the new store, right?
17:26 <abhishekk> yes
17:27 <dansmith> okay and if that fails because I kill -9 the glance worker, leaving residue in the staging area, will a subsequent copy-to-store on that worker try to upload the partial image to the new store without re-fetching it?
17:29 <abhishekk> it will
17:30 <dansmith> okay shall I file a bug for that as well?
17:31 <abhishekk> I guess yes
17:35 <dansmith> https://bugs.launchpad.net/glance/+bug/1885003
17:35 <openstack> Launchpad bug 1885003 in Glance "Interrupted copy-to-store may corrupt a subsequent operation" [Undecided,New]
17:36 <abhishekk> ack
17:39 <abhishekk> dansmith, ran this scenario locally
17:40 <abhishekk> the worker does try to copy the image to the store from the staging area but it fails while importing and the operation gets reverted, the log message below confirms it
17:40 <abhishekk> Jun 24 17:38:26 vicoria glance-api[27935]: ERROR glance.location [-] size of uploaded data is different from current value set on the image.
17:41 <dansmith> abhishekk: does it delete the residue?
17:41 <abhishekk> yes
17:42 <dansmith> okay, so the bug is less corruption then and more just "it makes the next operation fail"
17:42 <dansmith> I'll alter the title
17:42 <abhishekk> dansmith, wait
17:42 * dansmith waits :)
17:43 <dansmith> surely something could just check the size of the stage residue against image.size and delete/restart instead of making the user have to do that right?
17:44 <abhishekk> it does not delete the actual data from the store if all_stores_must_succeed is False
abhishekk1. I start copy operation for image which size is 4 gb17:45
abhishekk2. After copying 1gb data to staging service goes down17:46
abhishekk3. I restart the service and initiates the copy to store B17:46
abhishekk(Note image is in store A)17:46
abhishekkwhile copying all_stores_must_succeed is True (step 3)17:47
abhishekk4. It sees some data in staging and without checking actual size it goes to import that data in store B17:47
abhishekk5. Now after import is complete it checks for the existing size and current size and marked the operation as failure17:48
*** priteau has joined #openstack-glance17:48
abhishekksorry all_stores_must_succeed is False (step 3)17:49
abhishekkNow after step 5 raises exception and all_stores_must_succeed is False it Ignores the failure and does not initiates the revert operation17:49
abhishekkso even if image locations does not get updated, the partial data in store B does not get cleaned17:50
dansmithokay what about the residue in the staging dir?17:50
abhishekkit gets deleted17:52
dansmithokay, and what if all_stores_must_succeed=True ?17:53
abhishekkin this case data in staging remains as it is17:54
abhishekkbut it should delete it from stores17:54
dansmithjust the failed store or ALL stores?17:54
abhishekkfailed store17:55
dansmithwhew, okay :)17:55
dansmithso in the =False case, the user gets slapped with an immediate fail, but can retry and it will work because the residue from the stage is cleared, but we might leak storage on the backend, right?17:55
dansmithand on =True case, we leave the stage but delete the backend partial.. what happens if the user tries again? fail again?17:56
abhishekkdansmith, for false yes we might leak storage on the backend17:57
abhishekkand on True if user tries again with True result will be same17:58
dansmithokay, IMHO, both of those cases need to be fixed. leaking storage is definitely not okay, and in both cases a pre-check of the size in the staging area would prevent the user from having to retry (potentially with no improvement) and prevent copying some data only to fail and waste the time, network, etc... agree?17:59
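[Editor's note: the pre-check dansmith proposes might look roughly like this — a sketch, not Glance code; the function name and the delete-on-mismatch behavior are assumptions. The idea is to reuse staged data only when it matches the image's known size, and otherwise discard the residue so the copy restarts cleanly instead of importing a partial image.]

```python
import os

def staging_residue_usable(staging_path, expected_size):
    """Hypothetical pre-check before a copy-to-store operation.

    Returns True if the staged file exists and matches the image's
    known size; otherwise deletes any partial residue (left by an
    interrupted copy) and returns False so the copy re-fetches.
    """
    if not os.path.exists(staging_path):
        return False
    if os.path.getsize(staging_path) == expected_size:
        return True
    os.unlink(staging_path)  # partial residue from an interrupted copy
    return False
```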
18:01 <abhishekk> dansmith, agree, this will work gem and no chance of leaking
18:01 <dansmith> "work gem" ?
18:02 <abhishekk> this is a good way to solve this leaking issue
18:02 <dansmith> ++
18:02 <dansmith> I shall update the bug and then you can see if I got it right :)
18:02 <abhishekk> will put up a patch
18:03 <abhishekk> dansmith, ack
18:04 <abhishekk> so, this will not occur for the import case
18:04 <abhishekk> only in the case of the copy-image (copy-to-store) operation
18:07 <dansmith> right, I figured
18:07 <dansmith> bug updated
18:07 <abhishekk> looking
18:08 <zzzeek> dansmith: hey did you try to ping me here, just noticed my irc was closed
18:08 <abhishekk> dansmith, in glance there is no concept of copy-to-store, it's copy-image
18:08 <dansmith> zzzeek: I didn't because you were marked as away
18:08 <zzzeek> dansmith: ah very polite :)
18:09 <zzzeek> dansmith: the basic idea is do that UPDATE and then get the rowcount back for the number of rows affected
18:09 <dansmith> abhishekk: ah sorry
18:09 <zzzeek> if it's zero, you missed
18:09 <zzzeek> does that make sense ?
18:09 <dansmith> zzzeek: meaning my example SQL update query you mean right?
18:09 <zzzeek> yes
18:09 <abhishekk> dansmith, no problem
18:10 <zzzeek> I would skip the ORM and just run result = connection.execute(table.update()...)
18:10 <zzzeek> then result.rowcount
18:10 <dansmith> zzzeek: yeah, okay that's what I thought.. I don't really know how to translate that into ORM stuff, but maybe I can work on that a bit and get you to validate that what I'm doing is legit?
18:10 <dansmith> zzzeek: suhweet, that's what I'd rather do
18:10 <zzzeek> dansmith: yeah I would just use a core statement
18:10 <dansmith> zzzeek: a dedicated claim() operation and just do it that way
18:10 <zzzeek> dansmith: just for the claim part at least.  then once you have it you can do whatever ORM things
18:10 <dansmith> yep
18:11 <zzzeek> dansmith: also if you have an ORM session, you can get the connection for the transaction as conn = session.connection()
18:12 <dansmith> zzzeek: don't we want to do that update thing outside a transaction where we have already read some of the data we'll be examining?
18:12 <zzzeek> dansmith: hmmmm yes if you are on repeatable read then I guess so
18:12 <dansmith> I thought if we did it with other stuff, we'd be operating on repeated read data... yeah that
18:12 <zzzeek> might not matter
18:12 <zzzeek> the other transactions would block
18:12 <dansmith> so I figured we'll just do the claim as a single shot thing, which is fine because the real work happens later in an async task
18:13 <zzzeek> sure I guess commit is safer b/c then it goes out and gets past the locks but also does the galera writeset thing
18:14 <zzzeek> it's just when you do work subsequent to that, you want to have a new transaction
18:14 <zzzeek> so if this is the very first thing you're doing it should be OK
18:15 <dansmith> ack okay
18:15 <zzzeek> assuming this table is also involved in the subsequent work
18:15 <dansmith> zzzeek: I need to look at their stuff more to be able to answer that question, but .. probably
18:15 <zzzeek> it's just cleaner to have two serial transactions rather than one ongoing and another that pops out in the middle
18:16 <abhishekk> dansmith, this solution will also solve our race?
18:16 <dansmith> so abhishekk I will try to work on this claim thing, zzzeek can check my work and/or make fun of it if it's wrong and then we can use it to solve your race
18:16 <dansmith> zzzeek: yep, two is what I'm going for
18:16 <abhishekk> dansmith, ack :D
18:16 <abhishekk> dansmith, back to our staging clearing solution
18:16 <zzzeek> when transactions overlap that's how you can get deadlocks so keeping them serial is simpler
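[Editor's note: the claim zzzeek describes — a single UPDATE whose rowcount tells you whether you won the race — can be demonstrated with stdlib sqlite3. With SQLAlchemy core it would be `result = connection.execute(table.update().where(...).values(...))` followed by `result.rowcount`; the tasks schema below is invented for illustration.]

```python
import sqlite3

# Hypothetical single-table schema just to demonstrate the pattern;
# Glance's real tables look different.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, owner TEXT)")
conn.execute("INSERT INTO tasks (id, owner) VALUES (1, NULL)")

def claim_task(conn, task_id, worker):
    # Compare-and-swap: the WHERE clause only matches an unowned row,
    # so exactly one concurrent caller can win the claim.
    cur = conn.execute(
        "UPDATE tasks SET owner = ? WHERE id = ? AND owner IS NULL",
        (worker, task_id),
    )
    conn.commit()  # end this transaction so subsequent work starts fresh
    return cur.rowcount == 1  # 0 rows updated means another worker won
```

Keeping the claim in its own short transaction, as discussed above, avoids overlapping transactions and the deadlocks they can cause.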
18:16 *** priteau has quit IRC
18:17 <dansmith> zzzeek: ++
18:17 <zzzeek> ok i will copy this to the gerrit as a record
18:17 <dansmith> zzzeek: thanks
18:17 <abhishekk> dansmith, sorry, forget it, I am mixing too many things
18:17 <abhishekk> zzzeek, is Mike?
18:17 <dansmith> yes :)
18:17 <zzzeek> yes
18:17 <abhishekk> :D
18:18 <abhishekk> thanks zzzeek
18:18 <zzzeek> ok the formatting is disastrous in my comment
18:18 <dansmith> zzzeek: I can get you an eavesdrop link
18:19 <dansmith> http://eavesdrop.openstack.org/irclogs/%23openstack-glance/%23openstack-glance.2020-06-24.log.html#t2020-06-24T18:08:36
18:19 <dansmith> you can just mic-drop that if you want
18:19 <dansmith> "the zzzeek hath spoken"
18:22 <dansmith> abhishekk: so I need to step away for a bit and I know it's late there, I'll try to work on that a bit later and we can sync up in the morning okay?
18:22 <abhishekk> dansmith, sounds good
18:23 <abhishekk> morning means what time in UTC?
18:30 <dansmith> abhishekk: about 1330 UTC
18:30 <dansmith> that is 0630 local time for me
18:30 <abhishekk> dansmith, I will be around at that time
18:30 <dansmith> cool
18:31 <abhishekk> thank you, have a nice time ahead
18:31 <dansmith> :) o/
18:33 *** belmoreira has quit IRC
18:34 <abhishekk> o/
18:34 *** irclogbot_0 has quit IRC
18:37 *** irclogbot_2 has joined #openstack-glance
20:06 <openstackgerrit> Abhishek Kekane proposed openstack/glance master: Fix: Interrupted copy-imgae leaking data on subsequent operation  https://review.opendev.org/737867
20:07 <abhishekk> dansmith, ^^
20:07 <dansmith> abhishekk: -1
20:07 <dansmith> abhishekk: but cool, will review for real in a bit :)
20:07 <abhishekk> jokke, ^^
20:07 <dansmith> about to push up the atomic update patch
20:07 <abhishekk> why -1?
20:08 <dansmith> spelling! :D
20:08 <abhishekk> ohhh
20:08 <abhishekk> I am going offline now, will fix your comments tomorrow
20:08 <abhishekk> thank you again
20:08 <openstackgerrit> Dan Smith proposed openstack/glance master: Add image_set_property_atomic() helper  https://review.opendev.org/737868
20:08 <dansmith> abhishekk: ^
20:08 <dansmith> abhishekk: go to sleep :)
20:08 <abhishekk> looking
20:09 <dansmith> don't look, go to sleep :)
20:09 <abhishekk> will have better sleep then :D
20:12 <abhishekk> dansmith, some print statements are there in api.py
20:12 <dansmith> ah crap
20:13 <openstackgerrit> Dan Smith proposed openstack/glance master: Add image_set_property_atomic() helper  https://review.opendev.org/737868
20:13 <abhishekk> cool, see you tomorrow
20:14 <abhishekk> good night
20:14 <dansmith> o/
20:14 <abhishekk> o/~
20:21 *** belmoreira has joined #openstack-glance
20:26 *** belmoreira has quit IRC
20:28 *** priteau has joined #openstack-glance
20:39 *** priteau has quit IRC
20:52 <dansmith> mordred: https://zuul.opendev.org/t/openstack/build/307319ffe4b845f4b58cb78bd32f1911
20:52 <dansmith> mordred: same config worked with glanceclient, so I think that means you're checking something they're not
20:55 <mordred> dansmith: cool! that's great - it means the machinery all worked, but there's a bug in the detection of import capability
20:56 <mordred> dansmith: I'll poke and see what that bug is, then we should be good to go
20:56 <dansmith> thanks
20:56 <dansmith> you can recheck that patch to get the full stack test, but it won't actually pass for other reasons
20:56 <dansmith> but you can at least see it not get stuck at import
20:57 <mordred> dansmith: it'll be https://review.opendev.org/#/c/737608/4/openstack/image/v2/_proxy.py@224 and https://review.opendev.org/#/c/737608/4/openstack/image/v2/image.py@234
20:57 <mordred> the api docs say that header is returned from the image creation if the server supports import
21:03 <dansmith> hmm
21:53 <mordred> dansmith: fixing a different thing first - then i'll figure out what it thinks it's doing and what it should be doing
22:04 <dansmith> aight
22:42 *** rcernin has joined #openstack-glance
22:51 *** tkajinam has joined #openstack-glance

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!