Thursday, 2020-06-25

*** gyee has quit IRC00:07
*** hoonetorg has quit IRC03:02
*** rcernin has quit IRC03:14
*** hoonetorg has joined #openstack-glance03:16
*** rcernin has joined #openstack-glance03:16
*** rcernin has quit IRC03:24
*** rcernin has joined #openstack-glance03:30
*** rcernin has quit IRC03:45
*** rcernin has joined #openstack-glance03:48
*** udesale has joined #openstack-glance03:52
*** rcernin has quit IRC03:56
*** ratailor has joined #openstack-glance04:00
*** rcernin has joined #openstack-glance04:05
*** evrardjp has quit IRC04:34
*** evrardjp has joined #openstack-glance04:34
*** jmlowe has quit IRC04:46
*** jmlowe has joined #openstack-glance04:49
*** zzzeek has quit IRC05:41
*** zzzeek has joined #openstack-glance05:41
*** m75abrams has joined #openstack-glance05:59
*** m75abrams has quit IRC05:59
*** m75abrams has joined #openstack-glance06:00
*** zzzeek has quit IRC06:36
*** zzzeek has joined #openstack-glance06:39
openstackgerritAbhishek Kekane proposed openstack/glance master: Fix: Interrupted copy-image leaking data on subsequent operation
*** zzzeek has quit IRC07:36
*** zzzeek has joined #openstack-glance07:37
*** rcernin has quit IRC08:17
*** jawad_ax_ has joined #openstack-glance08:35
*** jawad_axd has quit IRC08:39
*** jmlowe has quit IRC08:48
*** jmlowe has joined #openstack-glance08:50
*** bhagyashris is now known as bhagyashris|afk09:55
*** tkajinam has quit IRC10:22
*** bhagyashris|afk is now known as bhagyashris11:00
*** udesale_ has joined #openstack-glance11:52
*** udesale has quit IRC11:54
mordreddansmith: fixed
mordreddansmith: I've rechecked your nova test change to be sure12:32
*** ratailor has quit IRC13:25
dansmithmordred: hmm, same deal?
dansmithoh wait, it didn't run on your last recheck13:29
dansmithabhishekk: hmm, does glance's fast8 not work?13:30
openstackgerritDan Smith proposed openstack/glance master: Add image_set_property_atomic() helper
abhishekkdansmith, you mean pep8?13:42
dansmithno, fast813:42
dansmithI had run fast8 and gotten a pass so assumed my patch was clean, but apparently that doesn't work in glance, even though the tox target runs13:43
dansmithfast8 smartly just checks the things that have changed since master, so it runs way way quicker on nova13:43
dansmithmaybe pep8 runs quick enough on glance that you've had no need I guess13:43
abhishekkI never tried fast813:43
abhishekkwill have a look though13:44
dansmithabhishekk: assume you saw my atomic update patch?13:44
abhishekkdansmith, yes, I saw it13:45
abhishekkWe need API wrapper around it13:46
dansmithyeah, assumed you would make your patch use it13:46
abhishekkI will update my patch with dependency of yours13:50
abhishekkneed some time for that though13:50
dansmithack, it would be nice to get an ack from zzzeek too13:52
abhishekkrosmaita, jokke, smcginnis, dansmith glance weekly meeting at #openstack-glance in 5 minutes13:55
dansmithoh is it in here? wiki says meeting-413:55
abhishekkdansmith,  #openstack-meeting14:00
dansmithheh okay14:00
* abhishekk I really need some time off14:00
abhishekkrosmaita, around?14:02
*** alistarle has joined #openstack-glance14:05
openstackgerritDan Smith proposed openstack/glance master: Check authorization before import for image
*** alistarle has quit IRC14:42
*** alistarle has joined #openstack-glance14:48
*** m75abrams has quit IRC15:00
abhishekkdansmith, updated the meeting channel in wiki15:03
dansmithcool :)15:04
dansmithabhishekk: if I stage some new image content for an image, and then do a copy-image to a new store, I would end up with two locations for the image, with different image content right?15:12
dansmithsince you just check the image size (as of your patch) I could cause havoc if I innocently did that right?15:12
abhishekkdansmith, yes, if you managed to staged exact same size of image in satging15:16
*** jawad_ax_ has quit IRC15:16
dansmiththat seems, like, bad and stuff :)15:16
dansmithI would hope that glance is using the hash to ensure that every location (at least the ones it populates) are identical15:16
dansmithI mean, as a user I'd expect that15:16
dansmithis there any image.fsck() command to recalculate and validate the hash on each location?15:17
abhishekkdansmith, again, then it will pass this check but will fail when it matches the checksum and hash with original image later15:18
*** jawad_axd has joined #openstack-glance15:18
dansmithI'm not sure what you mean...what will fail later?15:19
*** bhagyashris is now known as bhagyashris|afk15:19
abhishekkSo if you stage some new image with exact size of the image you want to copy in staging area15:20
abhishekkand the you initiates copy image operation15:20
abhishekkThis will pass the the current check which I have added in the patch15:21
*** tosky has joined #openstack-glance15:22
abhishekkBut During import task when that image is imported to desired store, it will check size, hash and checksum of new image with the original image15:22
*** jawad_axd has quit IRC15:22
abhishekkand either of them mismatched it will fail15:23
toskyhi glance people, can you please add this (not critical) change to your review list?
dansmithah, okay, so we will upload what we find in store (if size matches), but after upload, if the hash wasn't the same, we'd fail, and delete the copy we just made instead of adding the new location?15:24
toskyIt removes the last legacy job15:24
abhishekkdansmith, yes15:25
dansmithabhishekk: gotcha okay15:25
abhishekktosky, ack15:25
dansmithabhishekk: that's the answer to my question in the review about hash vs size, which I just said more about, so just ignore :)15:25
abhishekkdansmith, ack :D15:26
* abhishekk dinner break, back in 3015:26
*** udesale_ has quit IRC15:28
jokkedansmith: yeah, we upload through hashing so we don't have verification what landed to the store is what we expected, but we have checkpoint that what we sent matches to the metdata15:31
jokkedansmith: unfortunately I think none of the store backends gives us nice interface to get hash of what they have stored for us. Also when consumed clinet will do hash verification of what it receives15:32
dansmithack, I was wondering why we didn't hash on the download to staging, store the hash in a sidecar and then avoid needing to copy the bad data to the new location just to check the hash15:32
dansmithbut as long as it's doing it sometime, that is the important part15:32
*** udesale has joined #openstack-glance15:34
jokkeThe copy is utilizing lots of the tasks from the already existing import flows, we could have done one more hashing check in between like you said, but a) it's async task so we're not holding client waiting and b) it's done already on the later stage so quite diminishing returns15:38
jokkeIt's not like majority of the transactions are corrupt15:38
dansmithheh, hashing is for catching the 0.001% that are, but yeah I understand15:39
jokkeyup and we still catch it, just after bit of extra work :P15:39
*** jawad_axd has joined #openstack-glance15:53
zzzeekhey dansmith I just noticed I think you need to include "WHERE value=None" in the WHERE clause there, otherwise the UPDATE will match the row regardless15:55
dansmithzzzeek: I don't think that will work because we don't know the value is None, it's whatever the last attempt set it to15:56
zzzeekOK if "name" is not being updated to a unique token then the rows matched here will not be zero15:56
zzzeekthe idea is, there's a row, and you expect to have "<values>" in it.  if some other thread gets the row, then "<values>" is gone15:56
zzzeekthat's how you check atomically15:57
dansmithzzzeek: I just want the presence of a row with name=$mykey to be atomic, not anything to do with the value column15:57
*** jawad_axd has quit IRC15:57
zzzeekOK so then shouldnt you include "name" in the SET clause ?15:57
dansmithbut the name won't change, why do I need to set it?15:58
zzzeekdansmith: OK then this won't work.   you're trying to grab a "lock" on that row and we are trying to do compare and swap15:59
dansmithzzzeek: can you look at these tests and see if that clears up what I'm going for?
zzzeekdansmith: oh ok "deleted" is the significant column16:00
zzzeekthoguht there was some kind of token OK16:01
dansmithonly one should be able to un-delete it right?16:01
zzzeekbut that's not the "lock"16:01
zzzeekyes within the scope of this method, if you set deleted to zero, another UPDATE looking for deleted == 0 will not find the row16:01
zzzeekso as far as DBDuplicate, is this UPDATE setting a set of values that has a unique constraint on it?16:02
zzzeeklike unique on (name, value, deleted) or soemthign?16:02
dansmiththe only constraint is over (image_id, name) deleted is not included,16:03
dansmithwhich means you can't even create a new row with name if there's already an existing deleted one (which is weird)16:03
zzzeekOK so you are only affeting value and deleted so this UPDATE can't raise a dbduplicate16:03
dansmiththat's why I'm having to update-or-insert.. I would have expected all I needed was the insert until I looked at their schema16:03
dansmithokay gotcha16:03
zzzeekthe INSERT can16:03
zzzeekif you were updating another row to have the identical name/image_id as this one, that woudl raise a cosnraint error16:04
jokkename is not unique constrain16:04
zzzeekwhich is the dbduplicate16:04
jokkeonly ID is16:04
dansmithjokke:                       UniqueConstraint('image_id',16:04
dansmith                                       'name',16:04
dansmith                                       name='ix_image_properties_'16:04
dansmith                                            'image_id_name'),)16:04
dansmithzzzeek: makes sense16:05
jokkedansmith: Ok, let me correct. Should not be and is not treated as such in Glance16:05
dansmithjokke: I don't understand the difference :)16:05
jokkeI wonder what that is actually affecting16:06
dansmithwell, if I create a property, delete it, and try to insert another such row, I get a duplicate exception, since the UC doesn't cover deleted16:06
dansmithpresumably the property updates are looking for existing rows (ignoring deleted) and updating them if they find them, undeleting them16:07
dansmithbut I didn't fully chase that as it can't really work any other way16:07
jokkeOh yes, that the property name, not the image name16:08
dansmithright, we're talking about properties here16:08
jokkesorry, my bad16:08
dansmithso all good?16:08
jokkegotta love the overloading of keys :D16:08
jokkeI was like "WTF, I surely can create multiple images with same name so where is that overwritten" but yes, property names are unique per image16:09
dansmithheh yeah16:09
jokkeAs I would have not been surprised it being overwritten somewhere else before actually used :P16:10
openstackgerritDan Smith proposed openstack/glance master: Add image_set_property_atomic() helper
dansmithjokke: so for the policy thing.. it seems like the "is owner or admin" stuff is baked super deep into glance, such that the context we start the task with determines whether or not the image can be updated at all16:15
dansmithjokke: in nova we've moved as much policy stuff to the API as possible, so my tendency is to do the check in the API (like I'm doing for the current bugfix), but obviously I need to do something to make the task able to update the image16:16
dansmithso ... what's the best strategy there? have the task check policy and use an admin context for the update, or check in the API as well, and pass an admin context to the task?16:17
jokkedansmith: we have had this dream to refactor the policies in glance for I think at least couple of years now. Just no-one who has enough interest and/or time to do it. The policies in Glance are real mess as some are closer to API when some are right at the db layer or any of that damn onion in between16:18
dansmithyep, been there :)16:18
jokkedansmith: and the onion model is very descriptive. Every time you try to cut through, it makes you cry.16:18
* dansmith nods16:18
jokkeSo majority of the import code just directly bypasses the domain model. If we can work around that, great. One less thing that needs refactoring when we get to that16:20
dansmithI don't know what the domain model is. I've chased calls through it and though "gateway" but I don't really know what they are16:20
jokkeIf we need to inject "No adding the location from this user is really ok, we checked it already" to the context so be it :P16:21
jokkedansmith: just that the gateway + ~dozen layers of different "Repo" proxies layered on top of eachother16:22
jokkedansmith: some of which defined in glance/domain/16:23
dansmithit's more than just locations that needs updating.. the reason this fails now is because the task can't even start by adding the store to os_glance_importing_stores16:23
dansmithis there some reason that I can't pass the actual image to the task in task_input? Allowing the API to grab a non-immutable image and pass it to the task and say "here use this instead of fetching it yourself" seems like it might be reasonable16:24
jokkedansmith: ouch, yeah, there is that16:24
*** jawad_axd has joined #openstack-glance16:25
dansmithif we're doing the auth check in the api (which we are as of my patch), then is there any reason to not just let the copy task run with an admin context entirely?16:26
jokkedansmith: I think all of it is done in the context of the request. There really isn't any more real image object16:26
dansmithnot sure what you mean.. this is failing now because image_repo.get() returns an ImmutableImage which refuses to allow the task to update image.extra_properties['os_glance_importing_stores']16:27
dansmiththat happens inside the task, as the task has only image_id and the user's context, which is what causes image_repo to return the immutable form16:28
jokkedansmith: yes, that immutable image is returned based on the context. If you call that as owner or admin that all changes. That's what I mean we do everything in the scope of the request context16:30
dansmithokay, that's why I was suggesting running the task with an admin context instead of the user,16:31
openstackgerritVlad Gusev proposed openstack/glance stable/ussuri: Fix hacking min version to 3.0.1
dansmithbecause image_repo.get() won't know that we're doing a copy and that it should give a non-owner a mutable image object16:31
jokkedansmith: so we would do all the permission/policy checks in the api call, deem it ok and safe and the swap the context to admin when passed to taskflow?16:33
jokkeI'm kind of ok with that, not sure if it's can of worms the rest want to open16:34
dansmithwe just use an admin_context in get_flow() to get the image if there's other stuff the user's context should use16:35
jokkeI think we need to go through the copy-image taskflow and just make sure it's static (no plugins will be enabled for it) and see there is nothing that can be leaked into the from the import parameters etc.16:35
dansmithbut based on the way this works, I'm not sure what else we could do, other than a flag on the context that says "let this context do stuff" or of course something more complicated like a PartiallyMutableImage class that allows updating locations, and the couple of properties we need, but I'm not sure that's really the right thing to do16:36
jokkeI really really hate getting into these situations just because of the crappy core design :(16:36
dansmithyeah the middle of this is so naive it's hard to do something better, which is why I figured just checking auth up front and letting the task run as admin is the most in line with everything else16:37
jokkeAnd I'm not sure how much refactoring the kind of partially mutable object would need16:37
*** alistarle has quit IRC16:37
jokkeit's probably needing tons of work every layer between the API and DB16:38
dansmithright, any time the task changes behavior slightly, that class would need to be tweaked to allow whatever it does16:38
dansmithnote we could also do this admin context thing only for import where method='copy-image', so that normal imports continue to run as the user16:39
dansmithsince normal imports do things like set the image active, etc16:39
jokkedansmith: yup, and you can imagine how many other things there is that are exactly the same. You think you fix one simple thing and then you spend next week tracking down those proxy classes that needs to be changed to allow that passing :(16:39
jokkedansmith: yes, if we are going to that route it will be _only_ the copy-image16:40
jokkeand we need to harden that taskflow16:40
jokketo make sure it doesn't leak permissions on anything else than it's supposed to16:40
jokkeFor me it kind of makes sense and is yet another reason why we should try to find time to fix the core. But I can only speak on my own behalf16:41
jokkeneed to check what abhishekk rosmaita smcginnis thinks of it16:42
* abhishekk reading scroll backs16:42
dansmithack, I can extend my existing check in the API and make it run the task as admin in the copy-image case, but probably need some help auditing the task to make sure it's safe to do that16:42
jokkedansmith: yeah, I'll take some time to read the execution path through16:43
abhishekkdansmith, do we also need to make sure to not include any other import plugins if import operation is copy-image?16:43
dansmithI don't know.. can the import task do other things?16:44
abhishekklike we have conversion, decompression, inject_metadata16:44
dansmithyeah, do those get called from the same task?16:44
jokkedansmith: so how it works is that the methods are "plugins" for the API that triggers them. But they will add also the user (admin/deployer in this context) definable plugins into the flow. And I personally think if ran in admin context we need to do that as the plugin chain is common for every method16:46
*** gyee has joined #openstack-glance16:46
dansmiththe real problem here is that copy-image is a plugin or method of import, where it really should probably be a separate action right?16:47
dansmithbecause the others are all dealing with changing the content of the image itself, which definitely is owner-only,16:47
dansmithbut copy-image is a little different16:47
jokkedansmith: all three import methods are that. They are all different combinations of same Taskflow Tasks16:48
dansmithso api_image_import is a composite task for anything that is done via import16:48
jokkedansmith: correct16:49
dansmithI'm feeling like maybe just giving up is the best course here16:51
*** udesale has quit IRC16:53
jokkedansmith: I'm very familiar with the Taskflows and how we built it. I definitely can help sorting those bits out once we have figured out the way around the major blocker which is the context16:53
jokkedansmith: so if we can agree to run the copy-image in admin context and lock it down, it's no problem for me to get it in shape where it needs to be so we can actually do that16:54
jokkedansmith: and the more I look this discussion the more convinced I am that we should do some of those changes regardless.16:56
dansmithokay, well, I'll try to get it to the point where I can run it as admin, depends-on from my devstack test and then I guess we can go from there16:57
*** jawad_axd has quit IRC16:57
jokkeSo please don't give up ... this is more feedback on a fresh feature we've got in ages and I think we got somewhat tunnelvisioned in some of the decisions and implementation due to the lack of feedback for so long16:57
dansmithjokke: abhishekk: how about a hybrid approach where I factor out all the things we need to do for a copy-image into a helper, which grabs the image as admin just to do things like os_glance_importing_to_store=$store and the locations thing17:09
dansmithI don't know if that will be feasible, but that stuff is a bit messy in the task anyway,17:10
dansmithso refactoring it to one place and then saying "this stuff can run as admin" would be a much smaller scope17:10
dansmithso, this for example would be a helper call:
dansmithsort of a "sudo set_image_importing($store)"17:11
jokkedansmith: you mean like wrapper that uses image object picked up with admin context?17:13
dansmithyeah, let me just write up what I mean with that one thing delegated to the wrapper and I think it will be more clear17:14
jokkeit would definitely limit the scope of things that makes me afraid17:14
dansmithlemme ask another unrelated question17:18
dansmithjokke: abhishekk: it seems to me that if two import actions happen on an image simultaneously, the task will munge the stores and os_glance_importing_to_store type things stepping on top of each other17:19
dansmithi.e. two nova computes are booting from the same instance in two different edge sites with different rbd clusters,17:19
dansmithboth ask to have the image copied to their rbd cluster at the same time17:19
dansmithone or both will be confused as the two competing tasks will overwrite the other, such that my store disappears from os_glance_importing_to_store, without going into stores or os_glance_import_to_store_failed (or whatever that error property is)17:20
jokkedansmith: I was thinking of that, but thought that lets try to get the basics in place before bringing the question up if we should put the new task waiting or just append to the list and continue.17:21
dansmithI'm just asking what will happen today17:21
abhishekktoday you are right17:22
dansmithI assume competing tasks will step all over each other and potentially result in the image copied to a store that doesn't end up in stores17:22
jokkedansmith: today the second task will overwrite the importing to stores property, everything will go just fine as it should until the first task finishes, tries to remove it's store from the list and blows up :P17:22
abhishekkisn't it the same that get solved with the race condition logic?17:22
dansmithjokke: right okay17:22
dansmithabhishekk: yeah I was thinking more about corrupting the single copy when I reported that, but I'm just extending it to cover internal glance metadata corruption too17:23
jokkedansmith: the taskflow will not pull the next store from that property, it has it's own reference list. Just the fact that updating that property will fail I think17:24
dansmithabhishekk: your race lock will make it so only one can be started at a time I guess, but you will want to communicate to the user clearly why that is, since the image doesn't change state while copying17:24
jokkeIt actually might not blow up as I'm not sure if it fetches the image object again from db in between. It just might be that it overwrites the property again17:25
jokkeit's either or, not sure17:25
dansmithI think in at least several cases it will not fetch the image and not notice that it is blowing away the existing17:25
dansmithif that property is being changed, it will just update the property to the new value17:26
abhishekkIMO 2nd thread will overwrite os_glance_importing_to_store17:26
dansmithif that is not what is changed, the update-match method in the db api might cause you to just explode because something else changed one of those values17:26
dansmithabhishekk: yep17:26
jokkedansmith: exactly ... not sure how it worked. It might or might not blow up at that point when they are racing to update the property17:27
abhishekkthen, 1st thread after completing the copy will come back to remove the store from that property and founds that it is not in the list, log a message and continue next17:27
jokkeSo what we might want to do as hardening excersize is to append to that list if it exists instead of just going and overwriting it. And do get_image(imageid) in between every store17:28
jokkeyeah, so we do pull fresh metadata for each task instead of passing the image object around17:30
dansmithnot at the start of the import though, we assume we have the master list of importing stores right?17:32
dansmiththis ^ uses stores from the task creation17:32
jokkedansmith: that takes it from the request and writes it to the property. But the actual taks that executes and removes them one by one pulls the metadata from db to remove the store id from that list when they finish17:34
dansmithright, so that one case would clobber any list, which if it runs at just the right time, it could clobber those two properties for a period of time at least17:35
jokke creates the individual tasks and adds them to the flow. So each of those tasks handles one store in the list17:35
dansmiththe other cases seem to do the merge, but unless it's done atomically they can still interleave and lose information17:36
dansmithright but not atomically17:36
jokkeyeah, there is no cluster wide locking to ensure no race condition would ever happen if you hammer 500 requests simultaneously17:37
dansmithyou don't need to do cluster-wide locking, just do it atomically at the DB layer17:39
openstackgerritAbhishek Kekane proposed openstack/glance master: Fix: Interrupted copy-image leaking data on subsequent operation
* abhishekk signing out for the day17:50
*** jmlowe has quit IRC18:11
*** jmlowe has joined #openstack-glance18:20
dansmithholy crap, this goes so much deeper than I even thought it might18:39
*** jawad_axd has joined #openstack-glance18:41
*** jawad_axd has quit IRC19:15
*** jawad_axd has joined #openstack-glance20:32
openstackgerritDan Smith proposed openstack/glance master: Add a test to replicate the owner-required behavior of copy-image
openstackgerritDan Smith proposed openstack/glance master: WIP: Refactor TaskFactory and Executor to take an admin ImageRepo
openstackgerritDan Smith proposed openstack/glance master: WIP: Admin import action wrapper
*** jawad_axd has quit IRC21:06
dansmithmordred: looks like --import worked21:57
dansmithonly took 6.5 hours after rechecking this morning to get that result21:57
dansmithhowever, I wonder if you expected this:21:57
dansmith2020-06-25 21:51:02.964679 | controller | | properties       | OpenStack-image-import-methods='['glance-direct', 'web-download', 'copy-image']', locations='[]', os_hidden='False', owner_specified.openstack.md5='', owner_specified.openstack.object='images/cirros-0.5.1-x86_64-disk', owner_specified.openstack.sha256='' |21:58
dansmithkinda looks like you coped the header into the image properties?21:58
dansmithI haven't seen that before, maybe glance is exposing that automatically when you use the import mechanism?21:58
mordreddansmith: \o/22:00
mordreddansmith: hrm. no - I was not expecting that22:00
dansmither, I meant "copied" and "the supported import methods header (or whatever)"22:01
mordreddansmith: I will make a fixing of that22:01
dansmithcool thanks22:01
dansmithbut thanks all around for making that easier22:02
mordreddansmith: we've made some progress towards making success22:06
dansmithI'm sorry, but I'm not willing to concede that 6.5h to run part of one job is "progress" towards anything22:07
dansmithbut discounting that...yes :)22:07
*** jmlowe has quit IRC22:22
*** jawad_axd has joined #openstack-glance22:24
*** jmlowe has joined #openstack-glance22:25
*** tosky has quit IRC22:40
mordredit's better than if it had taken 65h22:55
mordredan order of magnitude better even!22:55
* mordred remembers those days unfondly22:55
*** jawad_axd has quit IRC22:56
*** rchurch has quit IRC23:43
*** rchurch has joined #openstack-glance23:43
*** gyee has quit IRC23:59

Generated by 2.17.2 by Marius Gedminas - find it at!