Tuesday, 2021-10-05

01:39 <kota> it looks like python 3.10 is now available https://mail.python.org/archives/list/python-committers@python.org/message/OQWNWZWDPASOUOAT6VPUXIXBH2THYREC/
01:42 <slim00> hi, i am looking to merge two swift clusters, any ideas how to merge account and container rings?
01:46 <kota> slim00: perhaps composite rings would be helpful for your purpose https://docs.openstack.org/swift/latest/overview_ring.html#composite-rings
03:59 <timburke_> slim00, are the hash prefix/suffix the same between the two clusters? i'd assume not -- in which case, you'll have to download everything from one cluster and upload it all to the other
03:59 <timburke_> might be able to use container-sync to do the movement
03:59 <timburke_> once a container finishes syncing, you can break the sync then issue a bunch of DELETEs to clear out the data
03:59 <timburke_> as capacity frees up in the "old" cluster, you pick a node (or maybe a rack, depending on topology) and drop all its devices' weights to zero. wait for replication to drain them off, then take them out of the ring entirely
04:00 <timburke_> then they should be good to get ingested into the "new" cluster (possibly with a device format in between)
04:00 <timburke_> it's gonna be a long, slow, sucky process unfortunately -- doubly so if capacity is tight
04:03 <timburke_> slim00, i guess my main question would be: why do you want to merge the two clusters? operational concerns, client ease of use, ... something else?
04:12 <slim00> kota, thanks. will read more about composite rings
04:13 <slim00> timburke, they are not the same
04:14 <slim00> timburke, will explore the container-sync option. yes, it's more of an operational issue: we would like to remove the old cluster and only operate the new cluster
04:26 <timburke_> slim00, is there a pretty good link between the clusters? are there still new writes going into the old cluster?
04:27 <timburke_> prometheanfire, if i had to guess, i'd say dnspython is the likely culprit. looks like the type of an answer's rrset.items changed in the 1->2 transition?
04:28 <slim00> timburke, yes there is a good link between them, and yes, new writes are still going in
04:29 <timburke_> oh hey... eventlet actually seems to support dnspython>=2.0.0 ... https://github.com/eventlet/eventlet/issues/619
04:32 <timburke_> slim00, fwiw, if the new cluster has enough spare capacity that you don't need to transition hardware between the clusters, i think it should simplify some things. you can set up container sync to push from old to new, then once things seem mostly caught up, use the read-only middleware to stop new writes and wait for the sync to finish
04:32 <prometheanfire> that makes sense
04:33 <timburke_> monitoring your progress is likely to be difficult, and it's probably not going to go as fast as you'll want it to, particularly if there's a lot of EC data
04:34 <prometheanfire> man, that was a large bump in dnspython
04:34 <slim00> timburke, thanks for the suggestions. going to try it out in the lab.
04:34 <timburke_> we've put it off for a while, mostly because of the lack of eventlet support :-(
04:35 <prometheanfire> well, if I get this merged...
04:50 <opendevreview> Tim Burke proposed openstack/swift master: cname_lookup: Work with dnspython 2.0+ https://review.opendev.org/c/openstack/swift/+/812424
04:51 <timburke_> prometheanfire, well that was easy! who knows how well it works in practice, of course... but at least unit tests should pass!
04:54 <prometheanfire> heh
08:58 <opendevreview> Matthew Oliver proposed openstack/swift master: container-updater: no incoming syncs no account update https://review.opendev.org/c/openstack/swift/+/811833
09:00 <mattoliver> ^ there is one approach to it, also supporting single-replica container rings... not 100% on it.
09:00 <mattoliver> acoles: ^
09:01 <acoles> mattoliver: ack, thanks
21:07 <opendevreview> Merged openstack/swift master: cname_lookup: Work with dnspython 2.0+ https://review.opendev.org/c/openstack/swift/+/812424
21:08 <reid_g> Question: When we specify different IPs/ports for replication in the ring, how does the reconstructor work? I see calls to the replication IP:port in the error log, but I see a huge amount of traffic going over the normal network during a rebalance. Before the rebalance, the normal network is around 2GB/s in the cluster. During the rebalance it is ~15-25GB/s, while the replication network went from about 800MB/s to 1.8GB/s
21:13 <timburke_> :-/ https://github.com/openstack/swift/blob/master/swift/obj/reconstructor.py#L396-L397 looks suspicious -- that should probably be using replication_ip/replication_port
21:15 <timburke_> the good news is that SSYNC traffic should be using the replication network: https://github.com/openstack/swift/blob/master/swift/obj/ssync_sender.py#L235-L236
21:15 <timburke_> but we should really be pulling frags for reconstruction over that, too
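Editor's note: the fix timburke_ describes amounts to preferring a node's replication endpoint over its client-facing one when fetching fragments. Below is a minimal sketch of that idea, not the code from the actual patch. It assumes a ring device entry is a dict with `ip`, `port`, `replication_ip` and `replication_port` keys; the helper name `replication_endpoint` and the sample addresses are hypothetical.

```python
def replication_endpoint(node):
    """Return (ip, port) for a ring device, preferring the replication network.

    Falls back to the regular ip/port when no separate replication
    endpoint is set for the device.
    """
    ip = node.get("replication_ip") or node["ip"]
    port = node.get("replication_port") or node["port"]
    return ip, port


# Hypothetical device entry with a dedicated replication network.
node = {
    "ip": "10.0.0.5",
    "port": 6200,
    "replication_ip": "10.1.0.5",
    "replication_port": 6400,
}
print(replication_endpoint(node))
```

A device without the replication keys simply resolves to its normal `ip`/`port`, so the same helper works for clusters that do not separate the two networks.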
21:22 <opendevreview> Tim Burke proposed openstack/swift master: ec: Use replication network to get frags for reconstruction https://review.opendev.org/c/openstack/swift/+/812614
21:23 <timburke_> reid_g, good spot! i'm surprised we never noticed that before...
21:24 <reid_g> That is pretty traffic intensive because it is trying to reconstruct the data that is missing, right?
21:26 <reid_g> The ssync part only matters if the reconstructor is pushing the data to the correct node?
21:27 <timburke_> yup, i wouldn't be surprised if it's fairly traffic intensive -- reverting data from handoffs should just use the replication network, but any reconstruction would need ndata frags for every frag it sent
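Editor's note: the "ndata frags for every frag it sent" point can be made concrete with a little arithmetic. The sketch below is illustrative only; the fragment size, the `ndata = 10` policy and the fragment count are assumed values, not figures from reid_g's cluster, and the function names are hypothetical.

```python
MIB = 1024 * 1024


def reconstruction_read_bytes(frag_bytes, ndata, frags_rebuilt):
    """Bytes fetched from peers to rebuild fragments.

    Rebuilding one missing fragment requires reading ndata other fragments.
    """
    return frag_bytes * ndata * frags_rebuilt


def revert_bytes(frag_bytes, frags_reverted):
    """Bytes sent when a handoff just pushes existing fragments to primaries."""
    return frag_bytes * frags_reverted


ndata = 10    # assumed EC policy parameter
frags = 1000  # assumed number of fragments to place
rebuilt = reconstruction_read_bytes(MIB, ndata, frags)
reverted = revert_bytes(MIB, frags)
print(rebuilt // MIB, "MiB read to rebuild")
print(reverted // MIB, "MiB sent to revert")
```

Under these assumptions, rebuilding reads ndata times as much data as simply reverting the same number of fragments, and each rebuilt fragment still has to be sent on top of that.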
21:28 <timburke_> is there an expansion going on, or is this day-to-day "make sure everything is durable" reconstruction?
21:54 <prometheanfire> nice, the dnspython change merged :D
21:54 <reid_g> This is an expansion. I have 1 or more rebalances left to do, but we are adding to other clusters
22:00 <timburke_> reid_g, if you haven't already, you might want to turn on handoffs_only -- it should prevent reconstruction so you can use those iops just to rebalance data, and as a side-benefit it should only be doing stuff on the replication network
22:00 <timburke_> (that's probably why we hadn't really noticed the problem before...)
23:53 <reid_g> I think I get why we want to use the handoffs_only option now... It causes the reconstructors to just push instead of recreating the missing fragments, which is a lighter operation?
23:57 <reid_g> Just clicked... we have not been using that setting
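Editor's note: for readers who want to check whether the setting discussed above is enabled, here is a small sketch. It assumes `handoffs_only` lives in the `[object-reconstructor]` section of the object server config; the sample text and the helper name `handoffs_only_enabled` are hypothetical. Python's `configparser` boolean parsing is only an approximation of how Swift itself reads the value.

```python
import configparser

# Hypothetical excerpt of an object server config, for illustration only.
SAMPLE_CONF = """
[object-reconstructor]
handoffs_only = True
"""


def handoffs_only_enabled(conf_text):
    """Return True if handoffs_only is switched on in the given config text."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Treat a missing section or option as "off".
    return parser.getboolean(
        "object-reconstructor", "handoffs_only", fallback=False
    )


print(handoffs_only_enabled(SAMPLE_CONF))
```

As timburke_ describes it, this is a mode to switch on while rebalancing, since it prevents reconstruction while it is enabled.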
