Tuesday, 2019-10-08

timburkeif you don't have the disk yet and know you won't for a while, you can just remove the device from any rings it was participating in and rebalance to reassign its partitions. replicators will have some work to do, but after a cycle or two everything should settle a bit00:02
*** baojg has joined #openstack-swift00:03
timburkeeven without the ring change, objects should be fully durable as long as you unmount the drive - replication should put an extra copy on the first handoff00:03
donnydI have a replacement (or 20) on hand00:05
donnydI have 3 swift servers.. but with my current air handler can only have two of them on00:05
donnydwhich is a bummer... my old air handler let me have 6 of them turned on without any issues00:06
*** hoonetorg has joined #openstack-swift00:06
donnydjust don't want to blow anything up.. I can make a reasonable assumption people are actually using the data in FN for real things... and its integrity is a priority for me00:07
*** gyee has quit IRC00:07
*** baojg has quit IRC00:08
timburkeentirely reasonable. replacing the drive should be very smooth00:10
timburkeoh hey! https://docs.openstack.org/swift/latest/admin_guide.html#handling-drive-failure00:10
timburkekinda sounds like we recommend removing the failed device, rebalancing, then adding the replacement as new... which is a bit different than i was expecting; i wonder if rledisez or alecuyer have any insight on what works best for them...00:13
*** baojg has joined #openstack-swift00:24
*** NM has quit IRC00:28
mattoliveraucan't you also use `swift-recon --expirer` to manually check? or even hit your node's expirer API to get info. It's also what my patches to monasca do.. now if they'd only land.00:32
donnydthanks timburke00:33
*** BjoernT has joined #openstack-swift01:40
*** BjoernT has quit IRC01:43
openstackgerritMatthew Oliver proposed openstack/swift master: Auto-sharding: Initial steps  https://review.opendev.org/66703003:28
mattoliveraujust a rebase (from the UI) let's hope it worked :)03:28
*** tkajinam has joined #openstack-swift03:49
openstackgerritMatthew Oliver proposed openstack/swift master: Auto-sharding: Initial steps  https://review.opendev.org/66703003:54
openstackgerritMatthew Oliver proposed openstack/swift master: sharding: first attempt at _elect_leader  https://review.opendev.org/66757903:54
openstackgerritMatthew Oliver proposed openstack/swift master: auto-sharding: send shard-ranges via container UPDATE  https://review.opendev.org/67265003:54
mattoliverauThis time I did the rebase myself ^^03:55
timburkeman, we really ought to try to standardize our recon dumps...04:00
timburkejust looking for basic, common things like cycle time and end of last cycle, i put together http://paste.openstack.org/show/781847/04:01
timburkeseveral daemons don't meaningfully capture any of this data (account reaper, container reconciler, container sync, object auditor)04:02
timburkesomething like half the time, cycle-time names include account/container/object, despite it going to a file like object.recon04:03
timburke"pass completed", "sweep", "time", "pass" all mean approximately the same thing04:04
mattoliverauyeah, it being standardised would be awesome, and would simplify p 583876 so it could be a lot more generic and maintainable04:04
patchbothttps://review.opendev.org/#/c/583876/ - monasca-agent - Add swift_recon check plugin to monasca - 1 patch set04:05
timburkepoor ho never did come back to https://review.opendev.org/#/c/270014/04:05
patchbotpatch 270014 - swift - Fix time unit of Recon's replication_time for object - 1 patch set04:05
mattoliverauspeaking of which I should go give that patch some love now that someone from monasca has actually reviewed it :P04:05
mattoliverau:(04:05
timburke\o/ reviews! progress!04:06
timburkeat least we're pretty consistent about using "last" to mean the timestamp for last cycle completion04:10
timburkebut then we have "expired_last_pass" which tracks the number of objects actually deleted04:11
timburkei can see how that makes sense. but taken as a whole, it's definitely confusing04:11
mattoliverauit's interesting, you can kinda see the evolution. All the replicators or daemons based off replicators are the ones that have the _last for the end time. All daemons running on cycles really should have a last, a cycle time is good too, but knowing when it's complete kinda feels like a must.04:17
mattoliverauBut I guess I don't get to op Swift clusters as much as I'd like, so maybe I'm wrong.04:18
*** fungi has quit IRC04:21
timburkemattoliverau, i agree entirely. without the last, you have no idea how stale that cycle time is :-/04:22
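As a rough illustration of why the completion timestamp matters, the sketch below derives a staleness figure from a recon-style dump. The key suffixes and the `summarize_recon` helper are hypothetical, chosen only to mirror the naming discussed above; real recon files use a wider mix of names, so treat this as a sketch rather than a parser for actual dumps.

```python
import time

# Hypothetical suffixes, loosely modelled on the words mentioned in the
# discussion ("time", "sweep", "pass" for cycle time; "last" for completion).
CYCLE_TIME_SUFFIXES = ("_time", "_sweep", "_pass")
LAST_SUFFIX = "_last"


def summarize_recon(recon, now=None):
    """Pick out a cycle time and a last-completion timestamp from a dict.

    Returns a dict with "cycle_time", "last" and "age" (seconds since the
    last completed cycle). Missing values are reported as None.
    """
    now = time.time() if now is None else now
    summary = {"cycle_time": None, "last": None, "age": None}
    for key, value in recon.items():
        if not isinstance(value, (int, float)):
            continue  # ignore nested stats and strings
        if key.endswith(LAST_SUFFIX):
            summary["last"] = value
            summary["age"] = now - value
        elif key.endswith(CYCLE_TIME_SUFFIXES):
            # Naive matching: a counter such as expired_last_pass would be
            # misread as a cycle time here, which is the ambiguity at issue.
            summary["cycle_time"] = value
    return summary
```

Without a `last` value the `age` stays `None`, so a monitoring check has no way to tell a fresh cycle time from one reported days ago.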
mattoliverauWell lucky for us adding missing items is easier than renaming.04:22
timburke:D04:23
*** fungi has joined #openstack-swift04:26
*** psachin has joined #openstack-swift05:59
*** ccamacho has quit IRC06:35
*** rcernin has quit IRC07:05
*** tesseract has joined #openstack-swift07:10
baffletdasilva: That's pretty good timing. I just readded all devices, worked great. 😁07:13
*** rdejoux has joined #openstack-swift07:24
*** ccamacho has joined #openstack-swift07:26
*** ccamacho has quit IRC07:27
*** ccamacho has joined #openstack-swift07:27
alecuyertimburke: we keep spare disks ready on some machines in each cluster. When a disk fails the ring gets changed to point at a spare. then there's time to replace the failed disk07:27
*** mvkr has quit IRC07:30
*** mvkr has joined #openstack-swift07:43
*** mikecmpbll has joined #openstack-swift08:01
*** mvkr has quit IRC08:21
*** tkajinam has quit IRC08:30
*** e0ne has joined #openstack-swift08:31
*** mvkr has joined #openstack-swift08:34
*** rpittau|afk is now known as rpittau08:38
*** e0ne has quit IRC08:52
*** e0ne has joined #openstack-swift09:13
*** pcaruana has joined #openstack-swift09:30
*** rcernin has joined #openstack-swift10:53
*** tomha has joined #openstack-swift12:06
*** tomha has quit IRC12:20
*** NM has joined #openstack-swift12:34
*** pcaruana has quit IRC12:53
*** rcernin has quit IRC13:19
*** mikecmpbll has quit IRC13:33
*** diablo_rojo has joined #openstack-swift14:01
*** NM has quit IRC14:06
*** NM has joined #openstack-swift14:19
*** pcaruana has joined #openstack-swift14:46
*** gyee has joined #openstack-swift15:21
*** rpittau is now known as rpittau|afk15:47
timburkegood morning15:52
*** e0ne has quit IRC15:55
*** tesseract has quit IRC16:08
*** rdejoux has quit IRC16:18
clayghot spare drives!!!16:38
claygtimburke: do you have any idea how these tests are successfully using the null byte in query args -> https://review.opendev.org/#/c/682138/8/test/unit/account/test_server.py16:39
patchbotpatch 682138 - swift - Allow internal clients to use null namespace - 8 patch sets16:39
claygbut I can't seem to get it to work in the like filter?16:40
claygi guess it might not be the query args code - but instead the code handling like that barfs 🤔16:43
timburkeyeah, grabbing sqlite source now...16:43
clayg"The result of expressions involving strings with embedded NULs is undefined." Fuh.  http://www.sqlite.org/c3ref/bind_blob.html16:54
timburkewhoa16:58
timburke> The sqlite3_create_function() interface can be used to override the like() function and thereby change the operation of the LIKE operator.16:58
timburkei don't think we use LIKE anywhere else.... hmm....16:59
openstackgerritThiago da Silva proposed openstack/swift master: WIP: New Object Versioning mode  https://review.opendev.org/68238217:00
timburkewhee! https://sqlite.org/lang_corefunc.html#quote17:01
timburke> Strings with embedded NUL characters cannot be represented as string literals in SQL and hence the returned string literal is truncated prior to the first NUL.17:01
timburkemakes me wonder whether the prefix tests are actually testing everything we want...17:07
claygyes, i 100% agree - i feel like if that's actually what was happening I'd be able to demonstrate it trivially with these marker and prefix tests - but they're *working*17:07
claygi'm so confused17:07
claygwell, not that confused - i mean all the documentation is telling me "stop; don't do this; it's not supported; you'll end up maintaining sqlite" - but I'm like *we're so CLOSE!!!*17:08
claygI'm also looking at if there's anything we could do with that range of bytes that's our weird outlawed utf8 i.e. '%d8'17:09
timburkeit's gonna get harder/weirder -- the sorting isn't going to be in our favor17:11
timburkethe beautiful thing about NUL was that it's *so early* when sorting17:11
timburke*maybe* the separation between archive and primary containers can save us a bit? like, store with some non-utf8 byte, then replace all of them with nulls in time for us to do our interleaving? idk... feels like the elegance is slipping away...17:14
claygyup17:17
clayg😭17:17
*** ccamacho has quit IRC17:29
*** lbragstad has joined #openstack-swift17:32
timburkeclayg, good find on the set_trace_callback() func -- definitely helpful as i play with this. but i'm starting to wonder how well it works, in light of the other logging issues i've seen...17:36
timburkein particular, if i drop a self.fail() at the end of test_prefix_with_null(), i see a query like17:37
timburkeWHERE  name < 'null' AND name >= 'null' AND deleted = 017:37
timburkewhich really shouldn't return anything17:37
lbragstado/ hi folks - i'm having some difficulty generating a temp url, but i think i'm following all the right steps, at least based on what i found in documentation (this is what i've done so far: https://pasted.tech/pastes/88360ef2441f66fc3be37d3afbce7335ffca5f46.raw )17:39
timburkeclayg, are we *sure* we can't claw back the \x01-\x08 namespace, similar to how we grabbed the leading . in the account namespace?17:40
timburkelbragstad, is delay_auth_decision enabled in the auth_token middleware? i don't think tempurl works without it17:42
lbragstadtimburke good question - let me check quick17:43
timburkepretty sure other features will break, too -- staticweb, formpost, anonymous access...17:44
lbragstadok - interesting... i am noticing 401s in my swift.log for other requests (service-to-service), too17:45
lbragstadi don't see delay_auth_decision set in /etc/swift/proxy-server.conf17:46
lbragstadbut i do see ksm's auth_token middleware in the pipeline17:46
timburkelbragstad, default is false; i think you'll need to explicitly enable it17:47
lbragstadok - i'm seeing several configs, but i assume proxy-server.conf is the one i need to edit?17:48
timburkeyep; all auth decisions are handled at the proxy (for better or worse...)17:49
lbragstadok - i enabled that an bounced all the swift service, still no luck though ( i generated a new tempurl with swift tempurl and used curl directly)17:53
lbragstads/an/and/17:53
lbragstadnew paste https://pasted.tech/pastes/452d4abb2701053fe2c22926c7b43fea57c7d9e1.raw17:55
lbragstadi used `swift post -m "Temp-URL-Key:MYKEY"` earlier to set my key, and that appears to have worked because i can see it when i list my account information18:00
timburkeoh! just noticed that the tempurl was generated for a GET, but then you used it for a PUT... mind trying it as a GET (or HEAD)?18:06
lbragstadfrom what i can see in https://docs.openstack.org/api-ref/object-store/?expanded=create-or-replace-object-detail,list-activated-capabilities-detail,show-account-details-and-list-containers-detail#create-or-replace-object and https://docs.openstack.org/swift/latest/api/temporary_url_middleware.html that should be all i need to generate a temp url, right?18:06
lbragstadso - that was my next question :)18:07
lbragstadi was wondering if the `swift tempurl` bit was supposed to take the method you intended to use or the method that's actually used in the request18:07
lbragstadif i use `swift tempurl GET` to generate a tempurl for a GET request (allowing temporary access to a tempurl) - how do you set that on an object?18:08
lbragstadtimburke ack - setting PUT in the tempurl worked... https://pasted.tech/pastes/cf77d04cac53a93dd09c8701a03338c43333c172.raw18:10
timburkenothing is stored on the object -- it's just a decision made based on the account or container metadata (to get the key), request method, request path, request expiration, and server timestamp18:10
timburke👍18:10
lbragstadok - so the method to get the signature doesn't attribute to access in the server?18:11
lbragstade.g., using `swift tempurl PUT` will still allow people with the tempurl to get the contents of that object using `curl -X GET $tempurl`?18:12
timburkeno, a PUT tempurl won't let you GET. it *will* let you HEAD, though (i suppose, so you can check whether the upload's already been completed?)18:14
lbragstadaha18:15
lbragstadhere i was trying to use `swift tempurl` to generate tempurls to _create_ tempurls18:15
lbragstadso - i think that's where my hangup was18:15
lbragstadbecause i was using it to generate requests i wanted to make in the future (e.g., i want temporary access to allow people to GET this thing)18:16
lbragstadand then i tried putting that into a PUT request to _create_ that URL... but the signatures obviously won't match18:16
timburkecool! yeah, the one tempurl should be enough, then. though i suppose there may be some value in varying the expiry slightly for fingerprinting...18:18
lbragstadok - that's what i was wondering because i have a deployment with tempurls that expire after a year18:18
timburkehmm... that long of a window will likely make it hard to rotate keys...18:19
lbragstadso - i was trying to figure out how expiration was set on a tempurl for an object and i assumed it was something you set on the put request when you created that object18:19
claygtimburke: i was also thinking about trying to claim back some of the lower byte namespace... i was curious where the s3 allowed names bottom out... part of me is scared I'll end up suggesting a v2 api and restricting object versioning to only that and s3api18:19
lbragstador is temp_url_expires not a settable thing on an object - was i interpreting that wrong?18:23
timburkeclayg, i know we've got some logic in s3api to get headers back and forth between quoted-printable (see https://tools.ietf.org/html/rfc1521.html) ... idk about name restrictions, though18:25
timburkelbragstad, temp_url_expires is purely a property of the request -- nothing gets stored with the object18:26
lbragstadaha18:26
timburkeproxy uses it; i don't think object server even sees it18:26
lbragstadhere i was thinking you could set the expiration of it18:26
lbragstads/it/the temp url/18:27
timburkeso: there's no way of knowing what tempurls have been generated for a particular object18:27
lbragstadif temp url expiration (TTL) is only a thing clients send, how does the server use it?18:27
lbragstadi guess i'm trying to understand the usecase18:28
timburkeserver needs it (1) to calculate the same signature as the client provided -- if the expiry doesn't match what was used when the client created the signature, the signature can't match -- and (2) to compare against the server time -- if the server time is past the expiry, it's ipso facto invalid18:30
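Those two checks can be sketched in a few lines. This is a simplified illustration, not the middleware's code: the helper names and the `now` parameter are made up for the example, and the digest is left as a parameter because which digests a cluster accepts depends on its configuration. The signed body is the request method, expiry and path joined by newlines.

```python
import hmac
from hashlib import sha256


def sign(key, method, path, expires, digest=sha256):
    """Compute a tempurl-style signature over method, expiry and path."""
    body = "%s\n%d\n%s" % (method, expires, path)
    return hmac.new(key.encode(), body.encode(), digest).hexdigest()


def is_valid(key, method, path, expires, sig, now, digest=sha256):
    """Check a request the way described above: expiry first, then signature."""
    if now > expires:
        return False  # (2) past the expiry, so invalid regardless of signature
    # (1) recompute the signature from the request itself. A HEAD request is
    # also accepted when the signature was made for GET or PUT.
    methods = [method]
    if method == "HEAD":
        methods += ["GET", "PUT"]
    return any(
        hmac.compare_digest(sign(key, m, path, expires, digest), sig)
        for m in methods
    )
```

Nothing here touches the object: a signature made for GET fails for PUT simply because the recomputed value differs, and changing `expires` in the URL invalidates the signature in the same way.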
claygtimburke: 😬 https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html#object-key-guidelines-special-handling18:31
lbragstadok - so temp_url_expires doesn't limit access at all18:32
timburkeclayg, "likely need to be URL encoded" doesn't give me much hope... looks like we need to test18:33
claygyeah, i'm working on that now18:33
timburkelbragstad, what do we mean by "limit access"? after an hour (give or take, if the client's reasonably in-sync with the cluster), either of those tempurls would be invalid and attempting to use them would yield only 401s18:36
lbragstadoh - it's a threshold?18:36
lbragstadagain - i'm sorry, i have my wires cross and i'm thinking about expiration differently18:38
lbragstadcrossed*18:38
timburkeyes -- you can test this by using --absolute to specify a date in the past18:38
timburkeheh, sorry -- we further the confusion a bit with our tempurl expiration and object expiration -- which are entirely orthogonal concepts18:39
lbragstadi was confused because i thought 1.) temp_url_expires was set somewhere on the object 2.) if i set it to something like a year in the future, in 366 days i won't be able to use that temp_url anymore18:39
timburke2 is true, 1 is not18:39
claygso using boto3 at least giving it quoted names just resulted in objects *named* '%00' and '%FF' etc18:41
clayggiving it the bytes worked (in the listing       "Key": "\u0001") - but for \x00 i get an error -> An error occurred (400) when calling the PutObject operation: Bad Request Unable to create key '\x00'18:42
timburke(and if you rotate your keys monthly, say, you won't even be able to use the tempurl for the full year)18:42
timburkeclayg, i'm getting more and more curious about what the actual bytes-on-the-wire look like...18:43
lbragstadtimburke ok - interesting, i think i get it now...18:44
timburkelbragstad, any feedback on what we could say in docs to make the mental-model more obvious? improvements always welcome and all ;-)18:45
lbragstadso the temp_url usage is obviously underpinned by the key used to sign it and the expiration it was given when the temp_url was "created"18:45
lbragstadcreated == signed with entropy (from uuid)18:46
lbragstadtimburke yeah - i'll think about this a bit more and re-read the docs18:46
lbragstadtimburke clayg thanks for the help, i really appreciate it18:47
timburkethanks :-)18:47
timburkelbragstad, "with entropy" and "from uuid" give me some pause, though -- do you mean from the object name? or from the key? or from something else?18:48
lbragstadtimburke nevermind - https://github.com/openstack/python-swiftclient/blob/2fcd4d872713dc30e7352845c37515280f1d21ab/swiftclient/utils.py#L17918:52
lbragstadi didn't fully read that18:52
timburkein all fairness, you shouldn't need to *read the source* to understand what's going on ;-)18:53
lbragstadafter rereading the temp_url_sig - it's clear18:54
timburkeclayg, this kinda makes me wish we had https://review.opendev.org/#/c/212824/ -- it should be pretty easy to write an audit-watcher that just scribbles down names that include chars in the \x01-\x08 range....19:04
patchbotpatch 212824 - swift - Let developers/operators add watchers to object audit - 12 patch sets19:04
claygtimburke: it's not obvious to me how that would be helpful... just for like finding out if such names exist?19:31
*** psachin has quit IRC19:33
timburkeclayg, yeah, mainly just having something we could have rledisez (for example) run to see if this even passes the sniff test19:36
rlediseztimburke: sure, if you want us to scan our disks a bit, just tell us (no need to merge the patch, it can even be a quick&dirty script with enough security to not suck up all the IO)19:41
claygtimburke: well the problem is also that s3 allows these characters in key names - everything except \x0019:43
timburkeclayg, good to know -- but how many s3 *clients* use them?19:44
dcourtoihello, should we consider that not being able to use hostnames in rings instead of IP addresses while using servers_per_port is a bug ?19:54
dcourtoi(it works if servers_per_port = 0)19:55
claygdcourtoi: i'm a little surprised it works with servers_per_port=0, i'm curious what breaks regardless - do you have a stack trace or something?20:02
dcourtoiI don't have any stack trace when I'm not using the servers_per_port feature, it works. But when I enable servers_per_port, the ip/hostnames in the ring are compared to what common.utils.whataremyips() returns, and it always returns IP addresses. So if we put hostnames in the ring the object-server process hangs without logging anything, indefinitely looking for a match between the hostname20:12
dcourtoiand whataremyips return value20:12
*** e0ne has joined #openstack-swift20:23
*** e0ne has quit IRC20:28
claygdcourtoi: i bet the issue is either IN common.ring.utils.is_local_device or it's just the cardinality of calling socket.getaddrinfo20:30
*** pcaruana has quit IRC20:35
claygdcourtoi: like on my machine whataremyips returned different values than socket.getaddrinfo - it's possible that with some configuration tweaking it could be made to work20:40
claygi think the reason it's not better "supported" with better docs, validation, and error messages is because not many folks have tried to set things up this way - if you can get it working it'd be much easier to know what bug to open and how to fix it20:41
dcourtoifrom what I saw whataremyips always returns IP addresses, and in common.storage_policy dev['ip'] are compared to those IP addresses. I was able to make the object server start by forcing whataremyips to return the hostname when the hostname resolution failed (when socket.gaierror.errno == -2). I'll continue digging tomorrow20:48
dcourtoithe hostname resolution fails because socket.AI_NUMERICHOST is passed to socket.getaddrinfo in whataremyips()20:51
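The effect of that flag is easy to reproduce outside Swift. The sketch below uses a hypothetical helper, `is_numeric_address`, to show that `socket.getaddrinfo` with `AI_NUMERICHOST` accepts only literal addresses and raises `socket.gaierror` for anything that would need a name lookup. It makes no claim about how whataremyips itself is structured.

```python
import socket


def is_numeric_address(host):
    """Return True if host is a literal IP address, False if it is a name.

    AI_NUMERICHOST tells getaddrinfo not to perform name resolution, so a
    hostname fails with socket.gaierror instead of being looked up.
    """
    try:
        socket.getaddrinfo(host, None, flags=socket.AI_NUMERICHOST)
    except socket.gaierror:
        return False
    return True
```

A ring entry holding a hostname would therefore never equal any of the numeric addresses unless it is resolved first, which matches the behaviour described above.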
dcourtoito be continued...20:52
*** NM has quit IRC21:08
openstackgerritTim Burke proposed openstack/swift master: WIP: New Object Versioning mode  https://review.opendev.org/68238221:08
claygoic, somewhere we look at the ips instead of just is_local_device21:15
claygtimburke: i'm extracting the specific character we use to a constant RESERVED_BYTE and updating tests and code21:15
claygi'm not sure that's helpful - but my assumption is we may decide "fuck it we'll just use \x01\x01"21:16
clayg😞21:16
*** NM has joined #openstack-swift21:22
*** NM has quit IRC21:29
timburkeclayg, hmm... the signatures on things like patternCompare in https://www3.sqlite.org/cgi/src/artifact/ed33e38cd6420581 make me pretty nervous about trying to use NUL in a LIKE...22:09
*** gyee has quit IRC22:14
*** rcernin has joined #openstack-swift22:26
*** patchbot has quit IRC22:27
*** patchbot has joined #openstack-swift22:31
*** tkajinam has joined #openstack-swift23:00
*** joeljwright has quit IRC23:06
*** joeljwright has joined #openstack-swift23:07
*** ChanServ sets mode: +v joeljwright23:07
*** hoonetorg has quit IRC23:12
*** diablo_rojo has quit IRC23:17
*** hoonetorg has joined #openstack-swift23:25
*** tkajinam has quit IRC23:39
*** tkajinam has joined #openstack-swift23:40
mattoliveraumorning23:42
