Wednesday, 2020-09-30

timburke: stay safe mattoliverau! GL!  [00:18]
*** gyee has quit IRC  [01:08]
openstackgerrit: Xuan Yandong proposed openstack/swift-bench master: Remove six Replace the following items with Python 3 style code.  https://review.opendev.org/755151  [01:21]
*** clayg has quit IRC  [02:14]
*** tkajinam has quit IRC  [02:14]
*** StevenK has quit IRC  [02:14]
*** mattoliverau has quit IRC  [02:14]
*** tkajinam_ has joined #openstack-swift  [02:15]
*** StevenK has joined #openstack-swift  [02:15]
*** clayg has joined #openstack-swift  [02:15]
*** ChanServ sets mode: +v clayg  [02:15]
*** mattoliverau has joined #openstack-swift  [02:20]
*** tepper.freenode.net sets mode: +v mattoliverau  [02:20]
*** viks____ has joined #openstack-swift  [02:28]
*** rcernin has quit IRC  [02:55]
*** rcernin_ has joined #openstack-swift  [02:56]
*** psachin has joined #openstack-swift  [03:31]
*** psachin has quit IRC  [03:32]
*** psachin has joined #openstack-swift  [03:33]
*** m75abrams has joined #openstack-swift  [04:22]
*** evrardjp has quit IRC  [04:33]
*** evrardjp has joined #openstack-swift  [04:33]
*** mikecmpbll has joined #openstack-swift  [04:37]
openstackgerrit: Xuan Yandong proposed openstack/swift-bench master: Remove six and py27 tox Replace the following items with Python 3 style code.  https://review.opendev.org/755151  [06:06]
openstackgerrit: Xuan Yandong proposed openstack/swift-bench master: Remove six and py27 tox  https://review.opendev.org/755151  [06:43]
*** rcernin_ has quit IRC  [07:06]
*** rcernin_ has joined #openstack-swift  [07:17]
*** rcernin_ has quit IRC  [07:20]
*** rcernin has joined #openstack-swift  [07:20]
*** mikecmpbll has joined #openstack-swift  [08:06]
*** rcernin has quit IRC  [08:48]
openstackgerrit: wu.shiming proposed openstack/swift master: requirements: Drop os-testr  https://review.opendev.org/755232  [08:53]
*** ab-a has quit IRC  [09:35]
*** ab-a has joined #openstack-swift  [09:36]
openstackgerrit: Merged openstack/swift stable/train: py3: Fix swift-dispersion-populate  https://review.opendev.org/754853  [10:02]
*** StevenK has quit IRC  [10:58]
*** StevenK has joined #openstack-swift  [10:58]
*** rcernin has joined #openstack-swift  [11:50]
*** rcernin has quit IRC  [12:16]
*** tkajinam_ has quit IRC  [13:12]
*** m75abrams has quit IRC  [13:57]
*** gyee has joined #openstack-swift  [14:59]
*** ozzzo has joined #openstack-swift  [15:19]
*** Hamidreza has joined #openstack-swift  [15:24]
Hamidreza: Hi  [15:25]
Hamidreza: I have a question about openstack swift storage  [15:25]
Hamidreza: I added 20 disks to my cluster nodes and then updated the ring. Now it should rebalance the data, but it didn't!  [15:25]
Hamidreza: what should i do?  [15:25]
timburke: Hamidreza, have you checked that the object-replicator is running on all nodes?  [15:27]
Hamidreza: I checked the object-replicator and even the rsync process  [15:28]
Hamidreza: and they were working  [15:28]
ormandj: we see a lot of intermittent ConnectionTimeouts to backend servers (using servers per port of 2) - i would have expected slowness, but not connection timeouts if disks are saturated. is this expected with ussuri?  [15:28]
Hamidreza: and I even increased the number of processes  [15:28]
timburke: i just saw http://lists.openstack.org/pipermail/openstack-discuss/2020-September/017675.html -- note that you *won't* want to keep the fs read-only; the replicator needs to be able to delete data that no longer belongs on that disk  [15:30]
Hamidreza: ok, what can I do?  [15:32]
timburke: Hamidreza, you may want to look at the handoffs_first and handoff_delete options: https://github.com/openstack/swift/blob/2.26.0/etc/object-server.conf-sample#L287-L304  [15:32]
timburke: if things have been fairly healthy, you should be fine to set handoffs_first=True and handoff_delete=1, restart the replicators, and wait for a replication cycle -- the full drives should start draining fairly quickly at that point  [15:34]
timburke: odds are, you'll be limited by the iops of the new drives  [15:34]
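For reference, a minimal sketch of the [object-replicator] settings timburke describes, as they might look in /etc/swift/object-server.conf; the values are illustrative, and the linked conf-sample is the authoritative reference:

    [object-replicator]
    # process handoff partitions before primaries so full drives start draining sooner
    handoffs_first = True
    # remove a handoff partition once it has synced to at least this many primary
    # nodes, instead of waiting for all replicas (the default behaviour)
    handoff_delete = 1

Restart the object-replicator after changing these, as noted above.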
Hamidreza: I don't want to start the object-replicator  [15:36]
Hamidreza: I've stopped it before  [15:36]
Hamidreza: because one day, suddenly, I saw all of my disks breaking one by one  [15:37]
Hamidreza: I think they were under high pressure  [15:37]
Hamidreza: so I disabled the object-replicator  [15:38]
Hamidreza: since that day none of my disks have broken!  [15:38]
timburke: "broken" how? the replicators (and, if using erasure coding, reconstructors) are how swift (1) ensures that data remains durable even in the face of failing drives and (2) moves data as part of expansions so that drives don't fill up. it's a vital part of your swift deployment  [15:42]
*** openstackgerrit has quit IRC  [15:46]
Hamidreza: (2) yeah, that is a vital part of a swift deployment, but it didn't work for me. it should rebalance the disks and move data from the full disks to the empty ones  [15:47]
*** mikecmpbll has quit IRC  [15:53]
*** mikecmpbll has joined #openstack-swift  [15:54]
*** Hamidreza has quit IRC  [16:02]
*** psachin has quit IRC  [16:34]
clayg: Hamidreza: maybe try changing the ionice_priority setting for the object-replicator with rather low concurrency and handoffs_first=True while monitoring your devices with iostat  [16:50]
clayg: hopefully you can find a balance that allows your disks to service the IO needs of your client-facing traffic as well as the consistency engine's IO needs for background work  [16:50]
clayg: in an emergency you can also turn off other processes like the object-auditor  [16:50]
clayg: if you run container resources on the same disks as your object devices that can put a lot of pressure on those disks as well  [16:51]
clayg: dedicated SSDs are best  [16:51]
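A rough sketch of the tuning clayg is suggesting, again in /etc/swift/object-server.conf; the concurrency, ionice_class and ionice_priority options appear in the sample config, but the exact values below are illustrative and need to be validated against your own iostat numbers:

    [object-replicator]
    # keep background replication concurrency low
    concurrency = 1
    handoffs_first = True
    # best-effort I/O scheduling class at its lowest priority
    ionice_class = IOPRIO_CLASS_BE
    ionice_priority = 7

While that runs, something like `iostat -xd 5` on the object nodes shows whether the disks still have headroom for client traffic.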
clayg: timburke: so my config tests are having problems with the dlo middleware trying to reparse the config for legacy options...  [16:52]
clayg: I'm sure I can get the tests passing - but i'm not looking forward to an overhaul of staticweb  [16:52]
clayg: is there maybe a better idea than message passing via the request environ for how SLO can signal to proxy-logging that an error occurred during the iterator?  [16:54]
clayg: i feel like catch_errors and proxy_logging are starting to kinda team up or converge when it comes to watchdogging the iterators on content length 🤔  [16:55]
timburke: ormandj, connection timeouts aren't so surprising, especially if it's a busy cluster. one of the things that can happen is the object-server gets stuck waiting on disk IO, so incoming connections can't be accepted. the kernel will queue some of them, but eventually, the connect will block. you can use something like `lsof -a -u swift -i -s TCP:LISTEN -T q` to check in on your listen queue depths  [16:55]
ormandj: timburke: yeah, is there any way around it? tl;dr, we've had to lower replication workers to almost nothing and are only doing about 40MB/s to the 'new' server, and it's still causing every client massive problems with ~8 out of 56 drives per server relatively iops-saturated  [16:56]
ormandj: customers would be fine with slow, but broken is not so much. we could increase conn_timeout, but historically, that really hasn't helped  [16:57]
ormandj: we're about to try servers_per_port = 4 instead of 2 in hopes it will help  [16:57]
ormandj: lesson learned, don't deploy with less than 20+ nodes in the future, but trying to figure out a way out of this pickle in the meantime hah  [16:58]
ormandj: i think we're down at 12 replication workers atm  [16:59]
ormandj: see some read queues of 20-30ish  [17:07]
ormandj: using ss, read/send queues at 127/128  [17:08]
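For what it's worth, a hedged example of inspecting those listen queues with ss; the 6200-6299 port range is an assumption based on the default object-server bind_port plus servers_per_port, so adjust it to match your ring:

    # listening TCP sockets with their accept-queue depths (Recv-Q / Send-Q)
    ss -ltn | grep -E ':62[0-9]{2} '

For LISTEN sockets, Recv-Q is the current accept-queue depth and Send-Q is the configured backlog, so a reading like 127/128 means the queue is effectively full and new connections will start timing out.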
*** openstackgerrit has joined #openstack-swift  [17:41]
openstackgerrit: Clay Gerrard proposed openstack/swift master: Test proxy-server.conf-sample  https://review.opendev.org/755087  [17:41]
openstackgerrit: Clay Gerrard proposed openstack/swift master: Add staticweb to default pipeline  https://review.opendev.org/755132  [17:47]
openstackgerrit: Clay Gerrard proposed openstack/swift master: Log error processing manifest as ServerError  https://review.opendev.org/752770  [17:47]
*** recyclehero has joined #openstack-swift  [17:59]
recyclehero: hi  [17:59]
recyclehero: consider horizon, keystone and the databases gone. swift-proxy and the container-account-object servers are available.  [18:01]
recyclehero: wondering if there's any hope of recovering the data, or should I just burn the thing  [18:01]
DHE: recovery how?  [18:03]
recyclehero: my queens infra was very unstable, mostly hardware problems. going to deploy a new ussuri with kolla  [18:03]
recyclehero: what should I do with my swift? it's something good to know for planning DR later  [18:04]
recyclehero: DHE: getting files  [18:04]
recyclehero: objects sorry  [18:04]
DHE: the official thing I could suggest is making a project with the same project ID as the old one. however looking at my (admittedly old) version of the openstack cli tool there isn't a means to select your own uuid so you might have to go into the keystone DB post creation and change it  [18:05]
DHE: authentication is really by your project membership. other than using read/write ACLs swift doesn't care much about individual users  [18:05]
DHE: I built my swift cluster to have minimal keystone dependencies, so there are tempurl secrets everywhere and most programs that would authenticate use that instead. really keystone is only needed for deletion (that might be beatable, but I'm using bulk delete so not bothering) and container/object listings  [18:06]
recyclehero: DHE: aha, but I have to get it to use swift to get them back. is there a way I can assemble files from the swift dbs?  [18:07]
DHE: well all the swift dbs under the account and container directories on your disks are sqlite so you can absolutely stick your nose in there with the sqlite tool  [18:07]
recyclehero: underlying fs + swift dbs  [18:08]
recyclehero: ?  [18:08]
DHE: if you know the object URLs you want there's swift-get-nodes which will take the ring file and path name and provide both full server+paths, and curl commands to get some data  [18:09]
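A hedged example of the swift-get-nodes invocation DHE mentions, run from a node that has the ring files; the account, container and object names are placeholders for your own:

    swift-get-nodes /etc/swift/object.ring.gz AUTH_<project_id> <container> <object>

It prints the primary and handoff servers, partitions and on-disk paths, plus ready-made curl and ssh commands, which is handy when the proxy/auth side of the cluster is gone.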
recyclehero: files are made into chunks if I am correct. somehow one should assemble them back right?  [18:09]
timburke: your swift data is still safe and sound; the difficult part will be finding it. as DHE suggests, if you can get the project IDs to match between old and new, it'll all be there and available. if matching project IDs isn't feasible, you should still be able to create a "reseller admin" user that will have full access to any account; that user could then do server side copies of data to a new location, then clean up the old data  [18:09]
DHE: if you're using EC, yes they are.  if you're using multi-way replication each file is fully intact  [18:09]
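For the "reseller admin" route timburke mentions, a sketch of the relevant keystoneauth settings in proxy-server.conf; the user name below is illustrative, and reseller_admin_role already defaults to ResellerAdmin in stock Swift:

    [filter:keystoneauth]
    use = egg:swift#keystoneauth
    operator_roles = admin, swiftoperator
    # any user holding this role, in any project, can access every account
    reseller_admin_role = ResellerAdmin

Then something like `openstack role add --user recovery-admin --project admin ResellerAdmin` would let that user copy data between the old and new accounts.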
recyclehero: timburke: nice. I read that OpenStack was made from nova and swift, respectively by NASA and Rackspace. I like swift  [18:11]
timburke: us too :-)  [18:12]
recyclehero: and I should do all of this with the cmd now?  [18:12]
recyclehero: sure. thanks :-)  [18:13]
timburke: the cardinality of accounts is usually fairly small -- i'd probably write a little script to just walk the account disks on each node and build a list of all accounts in the cluster. then you'd need to figure out which account was whose and make a mapping from old account to new account, which may be non-trivial. then script a data-mover and wait a while  [18:16]
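A rough sketch of the account-walking script timburke describes; it assumes the stock /srv/node layout and the account_stat table that stock account DBs carry, so verify both against your deployment:

    #!/usr/bin/env python3
    """List every Swift account that has a DB on this node."""
    import glob
    import sqlite3

    accounts = set()
    # account DBs live at /srv/node/<device>/accounts/<part>/<suffix>/<hash>/<hash>.db
    for db_path in glob.glob('/srv/node/*/accounts/*/*/*/*.db'):
        try:
            with sqlite3.connect(db_path) as conn:
                row = conn.execute('SELECT account FROM account_stat').fetchone()
        except sqlite3.Error:
            continue  # skip locked or corrupt DBs
        if row:
            accounts.add(row[0])

    for account in sorted(accounts):
        print(account)

Run it on each node that holds account DBs and merge the output to get the full list of accounts in the cluster.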
timburke: might be worth doing periodic (encrypted) backups of the keystone db into swift ;-) then i think you could just restore from the backup and be up and running again  [18:17]
recyclehero: thanks again. I will start and come back when I have some progress  [18:23]
openstackgerrit: Romain LE DISEZ proposed openstack/swift master: Fix a race condition in case of cross-replication  https://review.opendev.org/754242  [18:32]
ormandj: if you have logs, you can probably glean a lot of account info out of them  [18:32]
ormandj: and just mangle the db  [18:32]
openstackgerrit: Romain LE DISEZ proposed openstack/swift master: Fix a race condition in case of cross-replication  https://review.opendev.org/754242  [18:38]
timburke: almost meeting time!  [20:51]
seongsoocho: \o/  [20:54]
kota_: morning  [20:57]
mattoliverau: Morning  [20:59]
*** nicolasbock has joined #openstack-swift  [21:11]
nicolasbock: Hi! Does swift support rewriting requests to public (staticweb, tempurl) containers? I am looking to be able to point my browser to `www.example.com` and be redirected to `https://swift-cluster.example.com/v1/AUTH_account/container/object?....`  [21:37]
timburke: nicolasbock, you'll want to look at the cname_lookup and domain_remap middlewares, but the long and short of it is yeah, that's doable  [21:39]
nicolasbock: Oh cool  [21:39]
nicolasbock: Thanks for the pointer timburke !  [21:39]
timburke: np! idea is to have www.example.com have a cname record pointing to something like container.auth-account.swift-cluster.example.com, then cname_lookup does the translation in the received host header and domain_remap unpacks the account/container pieces  [21:44]
nicolasbock: Nice, that doesn't sound too bad  [21:46]
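A hedged sketch of the proxy-server.conf pieces timburke outlines; both middlewares ship with Swift, but the domain names are placeholders and the pipeline shown is abbreviated:

    [pipeline:main]
    # cname_lookup and domain_remap go early in the pipeline, before auth
    pipeline = catch_errors cname_lookup domain_remap ... proxy-server

    [filter:cname_lookup]
    use = egg:swift#cname_lookup
    # hostnames outside this domain get their CNAME chain resolved
    storage_domain = swift-cluster.example.com

    [filter:domain_remap]
    use = egg:swift#domain_remap
    # container.AUTH_account.swift-cluster.example.com/obj -> /v1/AUTH_account/container/obj
    storage_domain = swift-cluster.example.com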
DHE: I'm just using nginx as a proxy to the proxy (HA!) no tempurl though (but it can be done if need be)  [21:48]
clayg: wsgi is the worst abstraction for web request processing; except for all the others  [22:04]
clayg: My 12 year old has his very first football scrimmage tonight!  Go Rice Ravens!  [22:05]
timburke: haha nice! have fun!  [22:05]
timburke: oh! i also kinda wanted to point out https://review.opendev.org/#/c/751966/ to people -- i still haven't gotten a fips-enabled vm to come back up in a usable state, but it looks like the patch might be about ready  [22:06]
patchbot: patch 751966 - swift - replace md5 with swift utils version - 11 patch sets  [22:06]
timburke: merging it will result in a bunch of merge conflicts, so it seems worth having on people's radars  [22:06]
timburke: and if it means i'll be able to get even just one review from cschwede i'm calling it worth it ;-)  [22:07]
*** rcernin has joined #openstack-swift  [22:12]
*** mikecmpbll has quit IRC  [22:23]
*** mikecmpbll has joined #openstack-swift  [22:29]
*** tkajinam has joined #openstack-swift  [23:00]
openstackgerrit: Tim Burke proposed openstack/liberasurecode master: Be willing to write fragments with legacy crc  https://review.opendev.org/738959  [23:45]
openstackgerrit: Tim Burke proposed openstack/swift master: ec: Add an option to write fragments with legacy crc  https://review.opendev.org/739164  [23:51]
