21:00:16 <timburke> #startmeeting swift
21:00:17 <openstack> Meeting started Wed Dec 16 21:00:16 2020 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:18 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:20 <openstack> The meeting name has been set to 'swift'
21:00:25 <timburke> who's here for the swift meeting?
21:00:33 <kota_> hi
21:00:35 <seongsoocho> o/
21:00:50 <acoles> hi
21:00:52 <rledisez> hi o/
21:01:01 <mattoliverau> o/ (only for a short while)
21:02:00 <timburke> thank you all for coming -- i may be a little in-and-out; handling some childcare duties again
21:02:18 <timburke> as usual, the agenda's at https://wiki.openstack.org/wiki/Meetings/Swift
21:02:35 <timburke> first up
21:02:42 <clayg> o/
21:02:46 <timburke> #topic end-of-year meeting schedule
21:03:15 <clayg> how about: this is it.  this is the last meeting of 2020.  see you later 2020.
21:03:45 <timburke> yeah, that :-)
21:03:56 <mattoliverau> Lol, damn you 2020 :p
21:04:06 <zaitcev> Red Hat enters the Christmas shutdown until January.
21:04:20 <zaitcev> But we could have one last meeting if we wanted.
21:04:52 <timburke> nah -- next meeting on Jan 6 seems perfectly reasonable
21:05:36 <timburke> next topic
21:05:41 <timburke> #topic audit watchers
21:05:46 <timburke> we're so close!
21:06:14 <timburke> #link https://review.opendev.org/c/openstack/swift/+/706653
21:06:15 <zaitcev> Well, the df was the last principal problem, I think.
21:06:35 <zaitcev> Now, even if we go back to independent processes in the future, we can.
21:06:47 <timburke> sorry that i haven't done another pass since my review last week
21:07:01 <zaitcev> So, I'm honestly content with the final revision.
21:07:25 <mattoliverau> I'll take another look this week, and hopefully add my +2 again
21:07:46 <timburke> i am still a bit worried about the need to distinguish start/end for different workers when you've got more than one
21:08:35 <zaitcev> mattoliverau: I added the doc that you asked for. And it includes the Dark Data part after all. At first, I hoped to sweep it under the carpet and only use it in case of emergency at customer clusters.
21:08:45 <mattoliverau> Nice
21:09:14 <mattoliverau> timburke: I thought it added their name to the logger
21:09:18 <zaitcev> Yes, you were right. Doing what I meant is how tribal memory gets generated, and it's wrong.
21:09:33 <zaitcev> Yes, logs have prefixes
21:09:54 <timburke> i think the device_key/worker-id was my last major concern, and i think we could remedy that with a new arg to end (and maybe start? i'm not actually sure how important it is there...)
21:10:01 <zaitcev> And in fact, using watcher_name is better because that comes from proxy-server.conf, and is not the name of the Python class.
21:10:57 <zaitcev> It's easy to add new arguments thanks to Sam's foresight. I was way more concerned about letting df get stuck in there. But if you want to add some, it's easy to do in a follow-up.
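(A minimal sketch of the forward-compatible signature style being referenced here -- because the watcher hooks accept **kwargs, the auditor could start passing a new keyword such as a worker identifier without breaking existing watchers; the method and argument names below are illustrative, not the exact watcher API:)

    class ExampleWatcher(object):
        def start(self, audit_type, **kwargs):
            # a hypothetical new keyword arrives here without breaking older
            # watchers, which simply ignore kwargs they don't know about
            self.worker_id = kwargs.get('worker_id')

        def end(self, **kwargs):
            # likewise, end() can grow new keywords in a follow-up patch
            pass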
21:11:33 <timburke> mattoliverau, we get watcher prefixes, but we still don't have a way to distinguish between the same watcher spread across multiple workers
21:11:57 <zaitcev> True, but why is it needed?
21:12:52 <timburke> if you've got, say, a 24-bay chassis and 4 workers per node (so each worker is responsible for 6 disks), when you go to dump stats to recon (say), you don't want all four workers writing a quarter of the full stats to the same keys
21:12:58 <zaitcev> Workers are ephemeral, so... even if you know the PID, all you can do is kill the whole auditor and maybe restart it. Thinking as an operator here, the most important thing is to know which object triggered issues.
21:13:15 <zaitcev> Oh, that way.
21:13:50 <timburke> i was realizing it as i was thinking through https://review.opendev.org/c/openstack/swift/+/766640 (watchers: Add a policy-stat watcher)
21:14:23 <mattoliverau> Good spot. So we need to append a worker id or something?
21:15:09 <timburke> i think so. something along the lines of the device_key from https://github.com/openstack/swift/blob/2.26.0/swift/obj/auditor.py#L98-L103
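(A minimal sketch of that device_key idea as it might apply here -- each worker nests its stats under a key derived from the devices it owns, so the four workers in the example above write to four distinct recon keys instead of clobbering one; the helper name and top-level key are hypothetical:)

    def nest_stats_by_device(top_level_key, device_list, stats):
        # e.g. device_list = ['sdb', 'sdc', ...]; the sorted, joined list is
        # stable and unique per worker on a given node
        if device_list:
            device_key = ','.join(sorted(device_list))
            return {top_level_key: {device_key: stats}}
        return {top_level_key: stats}

    # the nested dict could then be handed to swift's dump_recon_cache()
    # (assumption: that's an acceptable place to write watcher stats)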
21:15:23 <zaitcev> Right... of course you can do os.getpid() now safely, but eh... My mental model was that you just increment all stats in some central place like memcached or Prometheus, and reset them at a wall clock moment, like midnight on Mondays, rather than when the auditor starts. That would give you comparable counts to watch trends.
21:16:21 <zaitcev> If you insist on recon specifically, then a key is needed.
21:16:46 <zaitcev> But I think you can add it in a follow-up.
21:17:07 <timburke> good thought on os.getpid() -- forgot about that... might be sufficient
21:17:22 <zaitcev> os.getpid() changes when auditor restarts, so you'll have a ton of old recon files in /var
21:17:36 <zaitcev> well, if you reboot
21:18:08 <timburke> i could also re-work it so that everything's always aggregated by device, and i write things to recon based on that. solves the same problem when worker count changes
21:18:52 <zaitcev> Hmm. We never have 2 workers crawling the same device?
21:19:14 <timburke> shouldn't; not for the same audit-type, anyway
21:20:13 <timburke> (seems like it'd make for more disk-thrashing)
21:21:25 <zaitcev> Okay. I still think it's ready for your and Matt's final review pass.
21:21:26 <timburke> oh, i also need to think about the resumability of auditors... if they get interrupted, they pick up again more or less where they left off, right? hmm...
21:21:37 <timburke> all right, i'll make sure to review it again within the next three weeks, and it sounds like mattoliverau will try to do the same
21:21:44 <zaitcev> More or less. They write that json checkpoint thing.
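(For illustration, the resume-from-checkpoint idea sketched in plain Python -- the real auditor keeps a similar per-device status file, but the filename handling and layout below are assumptions, not the actual format:)

    import json
    import os

    def save_checkpoint(path, remaining_partitions):
        tmp = path + '.tmp'
        with open(tmp, 'w') as fp:
            json.dump({'parts': remaining_partitions}, fp)
        os.rename(tmp, path)  # rename so a crash never leaves a half-written file

    def load_checkpoint(path):
        try:
            with open(path) as fp:
                return json.load(fp).get('parts', [])
        except (IOError, ValueError):
            return []  # missing or corrupt checkpoint: start the crawl over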
21:22:27 <timburke> #topic py3 fixes
21:23:16 <timburke> i was noticing that we've got a few py3 fixes that i wanted to draw attention to
21:23:38 <zaitcev> see https://wiki.openstack.org/wiki/Swift/PriorityReviews
21:24:10 <acoles> timburke: I'll volunteer to review https://review.opendev.org/c/openstack/swift/+/759075 if you like
21:24:38 <timburke> thanks! it could use a test, but i know i've seen https://bugs.launchpad.net/swift/+bug/1900770 while running tests in my aio
21:24:40 <openstack> Launchpad bug 1900770 in OpenStack Object Storage (swift) "py3 comparison troubles" [High,In progress]
21:25:16 <acoles> yup, maybe I'll put a test together, will do me good to re-educate myself about bad buckets
21:25:36 <timburke> https://review.opendev.org/c/openstack/swift/+/765204 has been observed in the wild: https://bugs.launchpad.net/swift/+bug/1906289
21:25:37 <openstack> Launchpad bug 1906289 in OpenStack Object Storage (swift) "Uploading a large object (SLO) in foreign language characters using S3 browser results in 400 BadRequest - Error in completing multipart upload" [High,Confirmed]
21:26:23 * mattoliverau needs to take the car in for a service.
21:26:41 <mattoliverau> Gotta run, have a great one all o/
21:27:54 <zaitcev> mattoliverau: later
21:28:24 <timburke> and https://review.opendev.org/c/openstack/swift/+/695781 is one that i'd mostly forgotten about, but can let bad utf-8-decoded-as-latin-1-encoded-as-utf-8 out to the client
21:28:46 <zaitcev> right... are there any more besides these 3?
21:29:17 <timburke> probably. those are the three i could remember ;-)
21:30:09 <timburke> i *really* want to get to the point that i can feel confident in moving my prod clusters to py3
21:31:40 <timburke> moving on
21:31:50 <timburke> #topic finishing sharding
21:32:34 <timburke> i came in late last week, so i wanted to check if there was any more discussion needed here, or if we've got a pretty good idea of what would be involved
21:33:08 <zaitcev> I don't, but I sent David to investigate and teach me :-)
21:33:21 * zaitcev manages
21:33:49 * zaitcev shuffles some more documents
21:34:24 <acoles> my summary was: 1. be able to recover from whatever could go wrong with auto-sharding (split brain) 2. do our best to prevent split-brain autosharding  3. get more confident about auto-shrinking
21:35:06 <acoles> and suggested some current patches as a good starting place to get involved
21:35:17 <timburke> sounds like a great plan :-)
21:35:31 <acoles> e.g. the chain starting with https://review.opendev.org/c/openstack/swift/+/741721
21:35:35 <timburke> i won't worry then
21:35:41 <acoles> haha
21:35:51 <timburke> one last-minute topic
21:35:58 <timburke> #topic stable gate
21:36:07 <acoles> BTW I updated priority reviews because I have squashed a couple of patches into https://review.opendev.org/c/openstack/swift/+/741721
21:36:48 <timburke> currently, things are fairly broken. mostly to do with pip-on-py2 trying to drag in a version of bandit that's py3-only
21:38:00 <timburke> there are some patches to pin bandit, and at least some of them are mergeable, but it looks like there are some other requirements issues going on that complicate some branches
21:38:27 <acoles> didn't a bandit fix merge?
21:38:57 <timburke> i'm going to keep working on getting those fixed, just wanted to keep people apprised
21:39:07 <acoles> https://review.opendev.org/c/openstack/swift/+/765883 ?
21:40:17 <timburke> yeah, that at least got master moving. i might be able to do that for one or two of the more-recent branches, too
21:40:38 <acoles> OIC there's a bunch of backport patches
21:41:02 <timburke> someone proposed a fix back on pike through stein like https://review.opendev.org/c/openstack/swift/+/766495
21:41:40 <acoles> ok, is py2-constraints the right way though?
21:42:21 <timburke> not all branches have a py2-constraints. though maybe we could introduce that?
21:43:01 <timburke> fwiw, a cap in test-requirements.txt hits failures like https://zuul.opendev.org/t/openstack/build/284cdb5099114af685a4bfeb53b0d2ff/log/job-output.txt#520-522 on some branches
21:43:12 <zaitcev> no ;python_version=='2.7' marker on that bandit requirement though, I wonder why
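(For reference, a pin of that shape in test-requirements.txt might look like the following; the version bound is illustrative, not the one in the merged patch:)

    # only constrain bandit where py2 pip would otherwise pull a py3-only release
    bandit>=1.1.0,<1.6.3;python_version<'3.0'  # example bound -- check the actual backport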
21:44:29 <timburke> another option would be to just drop bandit from test-requirements.txt on (some?) stable branches -- we don't backport *that* much, and i'm not sure how much value we get from running bandit checks on stable
21:45:12 <timburke> that's all i've got
21:45:18 <timburke> #topic open discussion
21:45:25 <timburke> anything else we should bring up this week?
21:45:43 <acoles> thanks timburke for all your work on the gate issues, it's incredibly valuable
21:46:24 <timburke> anything i can do so you guys can focus on making swift great!
21:46:40 <zaitcev> I don't understand how bandit even gets invoked. There's a [bandit] in tox.ini, but it's not in the list at the top or in any zuul jobs.
21:47:11 <zaitcev> oh. maybe it's not on the master branch.
21:48:21 <acoles> it's part of the pep8 tox env
21:48:22 <timburke> iirc it's a flake8 plugin -- just install it and it'll start getting run as well
21:48:30 <zaitcev> Oh, right.
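(Roughly how a pep8 tox env can wire bandit in, for anyone tracing the invocation -- the flags and paths below are illustrative rather than copied from swift's actual tox.ini:)

    [testenv:pep8]
    commands =
      flake8 {posargs:swift test doc setup.py}
      bandit -r swift -n 5  # runs alongside flake8 in the same env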
21:49:23 <zaitcev> Okay. I don't have anything else to discuss. Michele managed to push through that patch for swift-init, but I have no idea if he's going to stick around.
21:50:50 <timburke> oh yeah! looking at the bug report (https://bugs.launchpad.net/swift/+bug/1079075), i'm not actually sure that the title was really accurate...
21:50:52 <openstack> Launchpad bug 1079075 in OpenStack Object Storage (swift) "swift-init should check if binary exists before starting" [Low,In progress] - Assigned to Michele Valsecchi (mvalsecc)
21:51:12 <zaitcev> how so? He wanted not to have extra messages.
21:51:33 <zaitcev> So, there's no change in function.
21:51:45 <zaitcev> By "he" I mean the original reporter.
21:52:19 <timburke> but the reason processes didn't start up wasn't actually missing binaries (afaict)
21:52:46 <timburke> "fails because some *configuration files* are not existent"
21:53:03 <zaitcev> well yeah
21:53:19 <zaitcev> Someone removed both configurations and binaries
21:53:42 <zaitcev> You know, I used to try that crap too. It was a mistake. But our RPM packages used to be very fine-grained like that.
21:54:36 <timburke> *shrug* if it's still a problem, we'll get a new bug report ;-)
21:54:40 <zaitcev> But then we started to share a bunch of code across types of services. For example, GET on accounts and containers uses a function that's not in common code, but in container IIRC. So, when someone installs just one type of service, it blows up.
21:55:07 <zaitcev> I had to give up and create a common package that contains all of the code, no matter where it belongs.
21:55:56 <zaitcev> So the logic was, if swift-init starts checking for binaries, it would not attempt to run something that has no configuration.
21:55:57 <zaitcev> see
21:56:37 <zaitcev> So, I think it was an appropriate patch and it was okay for us to include it.
21:57:06 <zaitcev> Well, its value was very low. Only helps people who do this fine-grained installation.
21:57:28 <timburke> cool. yeah, i'm not worried about the patch; i do think it makes swift better. just thinking about whether the bug should be closed or not
21:57:30 <zaitcev> Oh, Tim
21:57:41 <zaitcev> Yeah, of course close it.
21:57:55 <zaitcev> One question: when are we going to drop py2?
21:58:06 <timburke> great question!
21:58:10 <timburke> i don't know!
21:58:37 * zaitcev backrolls in nagare kaiten
21:58:46 <seongsoocho> lol
21:59:12 <timburke> i feel like with train/ussuri we saw a decent number of new clusters stood up running py3-only
21:59:45 <timburke> and more recently in victoria/wallaby we're seeing clusters that were on py2 migrate to py3
21:59:54 <zaitcev> I'm sure projects other than bandit are going to put pressure on us. I think eventlet is the worst of them.
22:01:27 <timburke> yup -- it's a growing worry for me too -- see https://github.com/eventlet/eventlet/pull/665 for their deprecation (i don't think they've dropped it yet, but it's just a matter of time)
22:01:31 <zaitcev> Red Hat offers 7 years on some of the supported releases, but they have a controlled set of packages + backported patches. But in the trunk it's kind of a pain.
22:02:17 <timburke> thinking mostly selfishly, i'll say "not until i've migrated off of py2 myself" ;-)
22:02:44 <zaitcev> so, you have trunk on py2?
22:03:03 <timburke> my prod clusters run py2, yes
22:03:10 <zaitcev> What's the OS? Some kind of old Ubuntu I presume.
22:03:11 <timburke> (home cluster's py3 though!)
22:03:23 <timburke> centos7, mainly
22:03:39 <timburke> i think we've got some legacy customers still on ubuntu
22:03:43 <zaitcev> Right, that is py2.
22:03:53 <zaitcev> OK thanks for the answer.
22:04:11 <timburke> we package our own python; system python is a pain
22:04:32 <timburke> all right, sorry, i let us go over time. thank you all for coming, and thank you for working on swift!
22:04:37 <timburke> #endmeeting