Wednesday, 2021-06-30

*** dviroel|out is now known as dviroel  [11:29]
<timss> Hi, my object nodes have a 2xSSD (RAID1 w/LVM for OS) and 64xHDD setup and I'm looking into possibly separating account/container onto leftover space of the SSD RAID for extra performance. With our previous growth the space requirement for containers seems to be fine (~85G per server), and I've been able to configure the rings accordingly.  [13:10]
<timss> Question is: with only 1 device (logical volume) per object node (at least 7), will that be enough devices for a healthy distribution, and how would one go about deciding part power etc.? Wasn't able to find any references online, although I think I've heard of people running similar setups (albeit maybe with a more significant number of devices for the account/container rings).  [13:10]
<opendevreview> Alistair Coles proposed openstack/swift master: relinker: tolerate existing tombstone with same X-Timestamp  https://review.opendev.org/c/openstack/swift/+/798849  [13:10]
<DHE> as long as you have more devices than you have distributed copies you're fine. the concern comes when you add additional redundancy into the system, like multiple failure zones (racks)  [13:42]
<DHE> at 7 servers I'm guessing they're all in the same rack connected to the same switch?  [13:42]
<timss> As of this time unfortunately yes, all in the same rack  [13:42]
<timss> At least there's redundant networking and power, but it's not optimal for sure  [13:43]
<DHE> so redundancy concerns where you're giving some topology information to swift become more serious with only 7 copies and, say, 2 failure zones  [13:45]
<DHE> *7 servers  [13:45]
<timss> In this scenario there's no real difference between the servers inside the same rack so even defining a clear failure domain is a bit tricky. From my understanding even with 1 region and 1 zone, Swift would at least ensure all 3 replicas will be spread on different servers (and their partitions). Not sure if splitting it up would help much, or?  [13:53]
<zaitcev> Swift spreads partitions in tiers. First to each region, then to each zone, then to each node, and finally to each device.  [14:23]
<zaitcev> This lets you assign zones to natural failure boundaries, such as racks.  [14:24]
<zaitcev> But each tier can be degenerate: 1 region total, 1 zone total, etc.  [14:24]
<zaitcev> So, 7 nodes for replication factor 3 sounds fine to me.  [14:25]
<zaitcev> That leaves room for handoff nodes beyond the strictly necessary 3 in the node tier.  [14:27]
<timss> Cheers to both. I feel like I can live with this setup, and if growth continues I'd perhaps introduce another zone or region at some point, but for now it is what it is. The application of this cluster should be fine with the level of redundancy set; I was more worried about the very low number of devices than anything.  [14:29]
<zaitcev> Is your replication factor 3 for the container and account rings?  [14:30]
<timss> That's the plan  [14:30]
<zaitcev> Sounds adequate to me.  [14:31]
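
(A minimal sketch of the topology being discussed, built with Swift's RingBuilder Python API -- the same machinery the swift-ring-builder CLI drives. The part power, IPs, ports and weights below are made-up placeholders, not values from this conversation.)

    from swift.common.ring import RingBuilder

    # Dedicated container ring: 7 nodes, one SSD logical volume each,
    # replication factor 3, a single (degenerate) region and zone.
    # Arguments are part_power=11, replicas=3, min_part_hours=1.
    builder = RingBuilder(11, 3, 1)

    for i in range(7):
        builder.add_dev({
            'id': i,
            'region': 1, 'zone': 1,           # one failure domain for now
            'ip': '10.0.0.%d' % (i + 1),      # placeholder addresses
            'port': 6201,                     # default container-server port
            'device': 'lv_container',         # the leftover-SSD logical volume
            'weight': 100,
        })

    builder.rebalance()
    builder.save('container.builder')
    builder.get_ring().save('container.ring.gz')

With 3 replicas spread over 7 devices in one zone, every partition still lands on 3 distinct servers and the remaining servers serve as handoff candidates, which is the point zaitcev makes above.
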
<timss> Next up would be to decide the partition power for the account/container rings. I've usually seen at least 100-1000 partitions per device recommended, but clayg's "pessimistic part power" calculation even recommends as much as ~7k at a part power of 14. Perhaps the general recommendations don't play well with a very low number of devices, but dunno  [14:33]
<clayg> the recommendation was more "on the order of 1K" - so 2-3k, maybe 5-6k is fine but >10k starts to look sketchy even if you are "planning" for some growth  [14:35]
<clayg> now that I have more experience with part power increase I wonder if my recommendations about picking a part power may have changed (for objects at least; AFAIK no one has attempted an a/c PPI)  [14:36]
<zaitcev> I never understood economizing on partitions. The more, the better for replication. The biggest clusters can have issues like having too many inodes, which auditors and replicators constantly refresh in the kernel. If you have adequate RAM to contain the inodes and the rings, what's the downside?  [14:37]
<zaitcev> Is there a problem with replicator passes taking too long?  [14:38]
<timss> oh, seems I summoned the man himself involuntarily :D  [14:38]
<timss> back to object rings primarily now, but I'm curious what made some folks over at Rackspace recommend more on the scale of ~200 partitions per drive in their ring calculation tool https://rackerlabs.github.io/swift-ppc/  [14:46]
<timss> I've been running ~6k partitions per device (pp 19) on a previous installation for years, which has been going OK, but replication performance isn't the best (probably more factors to it, it hasn't gotten that much love)  [14:48]
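
(The partitions-per-device numbers being thrown around all follow from one formula; the sketch below just evaluates it for a few candidate part powers, assuming a 7-device account/container ring with 3 replicas. The device counts are illustrative, not anyone's real cluster.)

    def parts_per_device(part_power, replicas=3, num_devices=7):
        """Average number of partition replicas each device holds."""
        return replicas * 2 ** part_power / num_devices

    for pp in (10, 11, 12, 14):
        print('pp %2d -> %6.0f partitions per device' % (pp, parts_per_device(pp)))
    # pp 10 ->    439 partitions per device
    # pp 11 ->    878 partitions per device   (roughly "on the order of 1K")
    # pp 12 ->   1755 partitions per device
    # pp 14 ->   7022 partitions per device   (the ~7k figure mentioned above)

    # ~6k per device at part power 19 only happens on a much larger ring,
    # since 3 * 2**19 / 6000 ~= 262 devices.
    print(round(parts_per_device(19, num_devices=262)))   # 6003
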
<opendevreview> Alistair Coles proposed openstack/swift master: relinker: don't bother checking for previous tombstone links  https://review.opendev.org/c/openstack/swift/+/798914  [15:10]
<opendevreview> Hitesh Kumar proposed openstack/swift-bench master: Migrate from testr to stestr  https://review.opendev.org/c/openstack/swift-bench/+/798941  [18:13]
<timburke> anybody care much about swift-bench? looks like ~a year ago i proposed we drop py2 for it: https://review.opendev.org/c/openstack/swift-bench/+/741553  [19:24]
*** dviroel is now known as dviroel|out  [20:41]
<zaitcev> I would not mind. It's a client, isn't it? Surely new test runs for it run on new installs. No data gravity.  [20:45]
<opendevreview> Tim Burke proposed openstack/swift master: reconciler: Tolerate 503s on HEAD  https://review.opendev.org/c/openstack/swift/+/796538  [20:45]
<zaitcev> Well I can imagine benching from an ancient kernel in case there's an anomaly in a new one.  [20:46]
<zaitcev> But frankly I suspect the time for that is in the past.  [20:46]
<kota> good morning  [20:56]
<timburke> o/  [20:57]
<kota> timburke: o/  [20:58]
<timburke> #startmeeting swift  [21:00]
<opendevmeet> Meeting started Wed Jun 30 21:00:37 2021 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.  [21:00]
<opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.  [21:00]
<opendevmeet> The meeting name has been set to 'swift'  [21:00]
<timburke> who's here for the swift meeting?  [21:00]
<kota> o/  [21:01]
<acoles> o/  [21:01]
<timburke> pretty sure mattoliver is out sick -- we'll see if clayg and zaitcev end up chiming in later ;-)  [21:03]
<zaitcev> o/  [21:04]
<timburke> as usual, the agenda's at https://wiki.openstack.org/wiki/Meetings/Swift  [21:04]
<timburke> #topic swift-bench and py2  [21:04]
<timburke> so a while back i proposed that we drop py2 support from swift-bench: https://review.opendev.org/c/openstack/swift-bench/+/741553  [21:05]
<timburke> ...and then i promptly forgot to push on getting it merged at all :P  [21:05]
<timburke> i saw that there's a new patch up for swift-bench (https://review.opendev.org/c/openstack/swift-bench/+/798941) -- and the py2 job seems broken  [21:06]
<kota> i see. it was updated in Jul 2020  [21:06]
<timburke> so i thought i'd check in to see whether anyone objects to dropping support there  [21:07]
<timburke> sounds like i'm good to merge it :-)  [21:09]
<kota> +1  [21:10]
<timburke> on to updates!  [21:10]
<timburke> #topic sharding  [21:10]
<timburke> it seems like acoles and i are getting close to agreement on https://review.opendev.org/c/openstack/swift/+/794582 to prevent small tail shards  [21:11]
<timburke> were there any other follow-ups to that work we should be paying attention to? or other streams of work related to sharding?  [21:11]
<acoles> IIRC mattoliver had some follow up patch(es) for tiny tails but I don't recall exactly what  [21:13]
<acoles> maybe to add an 'auto' option, IDK  [21:14]
<timburke> sounds about right. and there's the increased validation on sharder config options -- https://review.opendev.org/c/openstack/swift/+/797961  [21:16]
<timburke> i think that's about it for sharding -- looking forward to avoiding those tail shards :-)  [21:17]
<timburke> #topic relinker  [21:17]
<timburke> we (nvidia) are currently mid part-power increase  [21:18]
<timburke> and acoles wrote up https://bugs.launchpad.net/swift/+bug/1934142 while investigating some issues we saw  [21:18]
<timburke> basically, the reconciler has been busy writing out tombstones everywhere, which can cause some relinking errors as multiple reconcilers can try to write the same tombstone at the same time  [21:20]
<acoles> we're fortunate that the issue has only manifested with tombstones, as a result of the circumstances of the reconciler workload we had and the policy for which we were doing part power increase  [21:21]
<zaitcev> Oh I see. I was just thinking about it.  [21:21]
<acoles> it's relatively easy to reason about tolerating a tombstone with a different inode; data files would probably require more validation than 'same filename'  [21:22]
<timburke> a fix is currently up at https://review.opendev.org/c/openstack/swift/+/798849 that seems reasonable, with a follow-up to remove some now-redundant checks at https://review.opendev.org/c/openstack/swift/+/798914  [21:22]
<acoles> timburke: if we feel happy about the follow up I reckon I should squash the two  [21:23]
<acoles> we're basically relaxing the previous checks rather than adding another  [21:24]
<timburke> i think i am, at any rate. i also think i'd be content to skip getting the timestamp out of metadata  [21:24]
<acoles> yeah, that was my usual belt n braces :)  [21:25]
<timburke> surely the auditor includes a timestamp-from-metadata vs timestamp-from-file-name check, right?  [21:26]
<acoles> idk  [21:26]
<acoles> ok i'll rip out the metadata check and squash the two  [21:27]
<timburke> 👍  [21:28]
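
(For anyone following the relinker discussion: the failure mode is a hard link racing with an already-existing target. The sketch below shows the general shape of tolerating that only for an equivalent tombstone -- a hypothetical illustration of the idea, not the actual change in the patch linked above.)

    import errno
    import os

    def link_tolerating_duplicate_tombstone(old_path, new_path):
        """Hard-link old_path into the new partition dir as new_path.

        If the target already exists, accept it when it is the same inode
        (someone else already relinked it) or when it is a tombstone with
        the same file name -- and therefore the same X-Timestamp -- even
        if it was written independently and has a different inode.
        """
        try:
            os.link(old_path, new_path)
            return True
        except OSError as err:
            if err.errno != errno.EEXIST:
                raise
        if os.path.samefile(old_path, new_path):
            return True  # another worker got there first
        if new_path.endswith('.ts') and \
                os.path.basename(old_path) == os.path.basename(new_path):
            return True  # duplicate tombstone carrying the same timestamp
        raise OSError(errno.EEXIST, 'conflicting file already at %s' % new_path)
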
<timburke> #topic dark data watcher  [21:28]
<timburke> i saw acoles did some reviews!  [21:28]
<zaitcev> Yes  [21:28]
<timburke> thanks :-)  [21:28]
<acoles> yes!  [21:28]
<zaitcev> Indeed.  [21:28]
<acoles> well just one  [21:28]
<acoles> iirc i was happy apart from some minor fixes  [21:29]
<zaitcev> I squashed that already but now I'm looking at remaining comments, like the one about when X-Timestamp is present and if an object can exist without one.  [21:30]
<acoles> zaitcev: i think it's ok, the x-timestamp should be there if the auditor passes the diskfile to the watcher  [21:31]
<timburke> and if the auditor *doesn't* check for it, it *should*, and idk that the watcher necessarily needs to be defensive against it being missing  [21:32]
<zaitcev> ok  [21:33]
<timburke> all right, that's all i had to bring up  [21:34]
<timburke> #topic open discussion  [21:34]
<timburke> what else should we be talking about?  [21:34]
<zaitcev> Hackathon :-)  [21:35]
<timburke> i love that idea -- unfortunately, i don't think it's something we can do yet  [21:38]
<timburke> short of a virtual one, at any rate  [21:38]
<kota> exactly  [21:39]
<opendevreview> Pete Zaitcev proposed openstack/swift master: Make dark data watcher ignore the newly updated objects  https://review.opendev.org/c/openstack/swift/+/788398  [21:39]
<timburke> speaking of -- looks like we've got dates for the next PTG: http://lists.openstack.org/pipermail/openstack-discuss/2021-June/023370.html  [21:39]
<timburke> Oct 18-22, still all-virtual  [21:39]
<acoles> ack  [21:40]
* kota will register it  [21:41]
<zaitcev> I'm just back from a mini vacation at South Padre. Seen a few people in masks. Maybe one in 20.  [21:41]
<timburke> yeah, but you're in TX ;-)  [21:42]
<zaitcev> The island is overflowing. I guess international vacationing is still not working. People even try to surf, although obviously the waves are pitiful in the Gulf absent a storm.  [21:42]
<timburke> i just checked; my company's guidelines for travel currently match their guidelines for office re-opening, which is "not yet"  [21:42]
<zaitcev> ok  [21:43]
<timburke> all right, let's let kota get on with his morning :-)  [21:43]
<acoles> is the us even allowing aliens in? without quarantine?  [21:43]
<timburke> thank you all for coming, and thank you for working on swift!  [21:44]
<timburke> #endmeeting  [21:44]
<opendevmeet> Meeting ended Wed Jun 30 21:44:23 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)  [21:44]
<opendevmeet> Minutes:        https://meetings.opendev.org/meetings/swift/2021/swift.2021-06-30-21.00.html  [21:44]
<opendevmeet> Minutes (text): https://meetings.opendev.org/meetings/swift/2021/swift.2021-06-30-21.00.txt  [21:44]
<opendevmeet> Log:            https://meetings.opendev.org/meetings/swift/2021/swift.2021-06-30-21.00.log.html  [21:44]
<timburke> acoles, it looks like they probably wouldn't let you in: https://www.cdc.gov/coronavirus/2019-ncov/travelers/from-other-countries.html :-(  [21:48]
<clayg> sorry i missed the meeting; scrollback all looks good 👍  [21:58]
<opendevreview> Merged openstack/swift-bench master: Drop testing for py27  https://review.opendev.org/c/openstack/swift-bench/+/741553  [23:54]
<opendevreview> Tim Burke proposed openstack/swift-bench master: Switch to xena jobs  https://review.opendev.org/c/openstack/swift-bench/+/741554  [23:56]
