Tuesday, 2020-08-18

ianwok, old git servers should be gone00:05
ianwif anyone wants to point out anything else to remove i can before i close the window00:05
clarkbI still see the git servers00:25
clarkbelasticsearch01.openstack.org can also be removed00:26
ianwhmm, yes, i wonder how i get rid of them, they're not in the host list any more00:30
clarkbmaybe we need to rerun the generating script?00:32
ianwi think it's "graph trees" and then go in there and delete one by one00:36
ianwok that seems to have got it00:40
ianwSilverblue/x86_64/os/images/install.img ... we probably want to cut this out00:50
ianwfedora is releasing now01:21
ianwi think the mit mirror is fine, but we should keep an eye on rawhide mirroring if it's timing out or causing problems01:21
fungik, thanks for continuing with it!01:49
fungii guess once the vos release completes we can recheck some failing dib changes01:50
ianwzuul-jobs too01:51
ianwok, mirroring stopped, fedora jobs back to working in zuul-gate at least02:41
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Add focal testing  https://review.opendev.org/74662904:19
openstackgerritIan Wienand proposed openstack/diskimage-builder master: Makes EFI images bootable by bios  https://review.opendev.org/74324304:41
openstackgerritMatthew Thode proposed openstack/diskimage-builder master: update gentoo to allow building arm64 images  https://review.opendev.org/74600004:50
openstackgerritIan Wienand proposed zuul/zuul-jobs master: [dnm] trigger bridge jobs  https://review.opendev.org/74663104:59
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Add focal testing  https://review.opendev.org/74662905:28
openstackgerritIan Wienand proposed zuul/zuul-jobs master: [dnm] trigger bridge jobs  https://review.opendev.org/74663105:28
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Add focal testing  https://review.opendev.org/74662905:35
openstackgerritIan Wienand proposed zuul/zuul-jobs master: [dnm] trigger bridge jobs  https://review.opendev.org/74663105:35
openstackgerritMerged openstack/diskimage-builder master: Handle NetworkManager for dhcp-all-interfaces  https://review.opendev.org/74569805:35
openstackgerritMerged openstack/diskimage-builder master: source-repositories: git is a build-only dependency  https://review.opendev.org/74567806:07
openstackgerritOpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml  https://review.opendev.org/74663606:08
ianwclarkb: might be good to get focal testing in zuul-jobs with https://review.opendev.org/746629 before any more bitrot06:27
*** lpetrut has joined #opendev07:23
openstackgerritMerged openstack/project-config master: Normalize projects.yaml  https://review.opendev.org/74663607:37
*** tosky has joined #opendev07:40
openstackgerritMerged openstack/diskimage-builder master: Fedora 32 support  https://review.opendev.org/73721707:43
*** DSpider has joined #opendev07:49
*** andrewbonney has joined #opendev07:55
openstackgerritMerged openstack/diskimage-builder master: Makes EFI images bootable by bios  https://review.opendev.org/74324307:59
openstackgerritMerged openstack/diskimage-builder master: Update name of ipa job  https://review.opendev.org/74304208:04
openstackgerritMerged openstack/diskimage-builder master: Remove glance-registry  https://review.opendev.org/73979608:12
*** hashar has joined #opendev08:13
openstackgerritMerged zuul/zuul-jobs master: Add focal testing  https://review.opendev.org/74662908:21
openstackgerritDenis proposed zuul/zuul-jobs master: terraform: Add parameter for plan file  https://review.opendev.org/74665708:26
*** priteau has joined #opendev08:35
openstackgerritDenis proposed zuul/zuul-jobs master: terraform: Add parameter for plan file  https://review.opendev.org/74665708:37
*** mordred has joined #opendev08:54
openstackgerritDenis proposed zuul/zuul-jobs master: terraform: Add parameter for plan file  https://review.opendev.org/74665708:59
openstackgerritMerged openstack/diskimage-builder master: Do not install python2 packages in ubuntu focal  https://review.opendev.org/74566511:28
auristorclarkb: sorry I disappeared yesterday.  The rxdebug output shows that at the time of the query, there were no active calls.11:46
auristorThe history from 2020-07-20 that ianw linked to showed the fileserver being unable to access the callback service in response to receiving the first call on a new rx connection.  the failure was most likely due to a nat port mapping timing out after 60s.11:56
*** ysandeep is now known as ysandeep|brb12:30
*** ysandeep|brb is now known as ysandeep12:44
*** priteau has quit IRC13:06
*** tkajinam has joined #opendev13:50
*** olaph has joined #opendev13:53
openstackgerritVanou Ishii proposed opendev/puppet-openstackci master: Remove Empty Top Namespace from Puppet Classes  https://review.opendev.org/74672614:00
*** mlavalle has joined #opendev14:01
corvusclarkb: starlink launch in 1.5m14:29
* clarkb has a call...14:30
corvusalso planet sat payload14:30
*** qchris has joined #opendev15:09
*** lpetrut has joined #opendev15:23
clarkband 6th launch for the booster15:31
openstackgerritEmilien Macchi proposed openstack/project-config master: Re-introduce puppet-tripleo-core group  https://review.opendev.org/74675916:21
clarkbI upgraded my local test gerrit from 2.16 to 3.0 and change file diffs are all empty :/ Now I need to retest everything and make sure that isn't broken through 2.1616:54
clarkbthere was an exception on startup which could be related I guess16:55
clarkbalso the db isn't auto cleaned when you go to 3.016:55
openstackgerritJeremy Stanley proposed opendev/infra-specs master: Central Authentication Service  https://review.opendev.org/73183816:56
fbo|ptoHi, any thoughts on moving from Gitea to Pagure for opendev.org ? Just relaying the question from twitter: https://twitter.com/Det_Conan_Kudo/status/129570650290956288216:57
clarkbI'm not sure what the benefits would be?16:57
clarkbwe don't connect zuul to gitea16:58
clarkbnor would we to pagure I imagine16:58
clarkband if we did maybe I should stop working on this gerrit upgrade testing16:58
fungifbo|pto: people wanted a "github-like" code browser instead of cgit, and gitea seemed to fit the bill at the time. replacing it at this point would be a lot of work when all we need is a read-only web display of git repos16:58
fungithe features he's suggesting there (issue tracking and documentation publishing) are features we already don't enable in gitea for good reason, not because gitea is bad at them but because they're not a fit for our service model17:01
fbo|ptoclarkb: fungi thanks for the explanation. Would you like me to respond to this one on twitter? Or do you manage an opendev account on twitter (I'm not aware of any btw) and you'd like to reply?17:01
fungifbo|pto: twitter is a proprietary service, so i don't think many of the regulars in here use it (not sure why someone would try to communicate with us on twitter to start with)17:02
fungii think the osf marketing team may operate an opendev twitter handle but i wouldn't begin to know how to get access to it17:03
fungiif someone really wants to initiate a discussion about getting involved in helping us maintain services, we have this irc channel and also the service-discuss@lists.opendev.org mailing list they can reach out through... both linked from the https://opendev.org/ [age17:04
fbo|ptofungi: :) that's fair, I'll relay your explanation there on proprietary service and invite to join on #opendev for other questions.17:04
fungier, page17:04
clarkbhuh I restarted to change gerrit's plugin manager config since it was the thing throwing exceptions. Now the diffs render. But I still get plugin manager exceptions17:06
fungiwhat sort of exceptions?17:06
clarkb[2020-08-18 17:03:23,645] [plugin-manager-preloader] ERROR com.google.gerrit.pgm.Daemon : Thread plugin-manager-preloader threw exception com.google.common.util.concurrent.UncheckedExecutionException: java.lang.NullPointerException17:07
clarkbI don't think thats super critical as we manage our plugins via the docker images anyway17:07
clarkbcan likely just drop the plugin manager entirely17:07
clarkbwhat I don't understand is why the diffs suddenly work17:07
clarkbthere were no other exceptions and I waited for reindexing to complete before browsing17:07
clarkbI'm going to restart again and see if it flip flops17:08
clarkbbut also I want to add some inline comments and run through the upgrades again anyway17:08
clarkbthat way I can check diffs on the older versions as we go17:08
fungiinfra-root: rackspace has notified us that they'll be performing disruptive block storage maintenance in october impacting 29 of our current volumes, and that we can avoid downtime by following these migration instructions: https://support.rackspace.com/how-to/minimizing-the-impact-of-cloud-block-storage-maintenance/17:10
clarkbfungi: I wonder if we create new volumes now, attach, lvm mirror, then detach and clean up the old volumes if that would work too17:11
fungiclarkb: yep, they also note that any new volumes created are going to backends which won't be impacted by the maintenance17:11
fungii wouldn't bother with lvm mirror, we can just attach a new volume and then pvmove to it17:12
fungithat can happen live, and is safely interruptable as long as you don't remove the volumes prematurely17:12
clarkbcool. If the es servers are in that list maybe we just turn them off. review.o.o's volume as well as eavesdrop should get the pvmove treatment likely17:13
fungipvmove is how i've done a lot of our other volume replacements in the past17:13
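[Editor's sketch] The attach-then-pvmove approach fungi describes could be laid out roughly as below. This is a minimal sketch, not the team's actual procedure: the volume group name `main` and the device paths are hypothetical placeholders, and the function only builds the command sequence (it does not run anything), so it can be reviewed before use on a real host.

```python
def pvmove_plan(vg, old_pv, new_pv):
    """Return the LVM commands, in order, for moving a volume group's
    extents from old_pv to new_pv while the filesystem stays mounted."""
    return [
        ["pvcreate", new_pv],        # initialize the newly attached volume as a PV
        ["vgextend", vg, new_pv],    # add it to the existing volume group
        ["pvmove", old_pv, new_pv],  # migrate extents live; can be resumed if interrupted
        ["vgreduce", vg, old_pv],    # drop the old PV from the group once it is empty
        ["pvremove", old_pv],        # wipe the LVM label before detaching the old volume
    ]


# Hypothetical names, for illustration only.
for cmd in pvmove_plan("main", "/dev/xvdb1", "/dev/xvdc1"):
    print(" ".join(cmd))
```

The old cloud volume should only be detached and deleted after the last two steps have completed, which matches fungi's warning about not removing volumes prematurely.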
fungiright, i figured there were some servers where if they choke it's fine, and others where we'll want to avoid an outage17:14
fungiunfortunately the list is all uuids, so i haven't yet checked to see which systems will be affected17:14
* clarkb does a new set of test gerrit upgrades from 2.13 with inline comments17:17
fungiugh, also they sent the list in html-only e-mail, making it hard to just open in a text editor and write a script around17:24
* fungi loses faith in humanity17:24
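[Editor's sketch] One way to get a scriptable UUID list out of an HTML-only message is to strip the markup and match the UUID pattern. This is a hedged illustration using only the Python standard library; the sample UUIDs in the usage lines are invented, and the real notification's layout is not known here.

```python
import re
from html.parser import HTMLParser

# Canonical 8-4-4-4-12 hexadecimal UUID form.
UUID_RE = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
    re.IGNORECASE,
)


class _TextOnly(HTMLParser):
    """Collect only the text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_uuids(html):
    """Return the unique UUIDs found in the text of html, in first-seen order."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    seen = []
    for match in UUID_RE.findall(text):
        uuid = match.lower()
        if uuid not in seen:
            seen.append(uuid)
    return seen


# Invented sample data, not taken from the actual ticket.
sample = (
    "<ul><li>0f1e2d3c-4b5a-6978-8796-a5b4c3d2e1f0</li>"
    "<li>0F1E2D3C-4B5A-6978-8796-A5B4C3D2E1F0</li>"
    "<li>11111111-2222-3333-4444-555555555555</li></ul>"
)
print(extract_uuids(sample))
```

A UUID that the mail client split across two tags would be missed, so the count should be checked against the number of volumes the provider reports.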
clarkbadding inline comments populates the accountPatchReviewDb DB17:25
clarkbthats one mystery solved at least17:25
*** lseki has joined #opendev17:54
*** ajitha has joined #opendev17:56
clarkbah the accountPatchReviewDb is there to track the little check mark next to files to say you've seen them or not17:58
clarkbnow I GET IT17:58
ajithaHi all,Today, I have login issue with https://review.opendev.org/..17:59
ajithassh -p 29418 <username>@review.opendev.org17:59
ajithaConnection reset by port 2941817:59
clarkbajitha: you may have hit our connection limit17:59
clarkbajitha: is this for a CI system?18:00
ajithaNo, its for updating a patch.18:00
fungii can check gerrit's ssh log18:01
clarkbajitha: is the account shared with a CI system? we often see this with CI systems ending up with stale connections18:01
clarkband we allow 96 iirc?18:01
clarkbwe also have a per ip limit but it's greater than that I think18:01
fungi100 simultaneous established connections per ip address18:01
ajithaclarkb : how to check connection limit. I am in a personal laptop. I will try once by disabling VPN18:02
clarkbajitha: fungi is checking18:02
fungiajitha: if i knew your gerrit username, that would help. the server is very busy so quite a long log18:03
fungihuh, okay, i had already checked the log for "ajitha" and there were no matches, so it's not getting as far as exchanging credentials at least18:04
fungii so see some successful logins from you yesterday (in an earlier rotated logfile)18:05
fungier, i do see18:05
fungilast successful activity was at 06:11:39 utc yesterday18:05
ajithayes. yesterday i could login18:06
clarkbIt could be the ip restriction if on corp VPN and that goes through NAT?18:06
fungii also don't see any activity from the same ip address as you were coming from yesterday18:06
clarkbits also possible the corp firewall isn't allowing 29418 out18:07
ajithai disconnected the VPN now18:07
clarkbajitha: is it still erroring or does it work after that?18:11
fungialso i don't think it's the connection limit because the error was "Connection reset" (so tcp rst packet) while we tell iptables to reject those with icmp port unreachable18:12
ajithastill not accessible. some updates happened from AD.. it is possible to block requests from laptop18:12
zbrclarkb: what is the next step regarding e-r?18:13
fungianything's possible. for example if this is a work laptop with a mandatory network firewall installed on it and it specifies allowed egress destination ports/addresses then it could do what you're seeing18:13
ajithayes right.. thanks.. i shall contact the respective admin18:14
fungiajitha: but the short answer is that it doesn't look like your connection attempts are ever reaching our server18:14
ajithayes got it. blocking from my side right?18:15
fungiajitha: that's what it looks like to me at least18:15
ajithafungi: thank you18:15
ajithaclarkb: thank you18:16
fungiyou mentioned "updates from ad" so if this is a microsoft windows machine it's entirely possible network connectivity policy is being managed from your employer's active directory18:16
ajithayes right18:16
clarkbzbr: did gmann ever respond?18:16
clarkbI think that was the last person we were waiting on?18:16
gmannclarkb: zbr on which one?18:17
clarkbgmann: e-r maintainership18:17
clarkbgmann: I think tripleo (and zbr) are offering to take that on18:17
clarkbI wanted to make sure the qa team didn't want to own it first18:17
gmannclarkb: i did respond on qa channel with OK.18:17
gmannI can do on ML also if i missed if there is any?18:18
clarkbgmann: ya there was an opendev service-discuss thread. If you are subscribed and can reply that would be great18:19
clarkbbut if not thats probably fine. I can modify the core group18:19
fungiis anybody able to do `openstack volume list` from rackspace at the moment? i'm getting "Not Found (HTTP 404)" when i try, which is making it really hard to figure out which volumes are affected by the upcoming maintenance. also if i do `openstack volume show <some uuid>` i get "No volume with a name or ID of '...' exists." (with the uuid i specified instead of ...)18:20
fungii wonder if they've broken their volume api for all existing volumes18:21
openstackgerritClark Boylan proposed opendev/system-config master: Remove tmp gerrit plugins from our docker images  https://review.opendev.org/74678418:22
clarkbfungi: its an api version thing18:23
clarkbI thought I fixed it18:23
clarkboh ya I remember now18:23
clarkbthe original proposed fix was to pin to volumes v118:23
clarkbbut mordred pointed out they actually have a volumes v2 its just not in the cinder catalog18:24
clarkbso we updated the clouds.yaml to override the volumes url to the v2 endpoint that isn't in the catalog18:24
clarkbfungi: I would try using v1 api instead18:24
fungiwell, it's moot for the moment. i just pulled up dfw for our tenant in the rackspace dashboard and we have 29 volumes18:24
fungiso i'm going to assume the 29 uuids they put in the ticket represent every volume18:25
clarkbgmann: http://lists.opendev.org/pipermail/service-discuss/2020-August/000071.html is the thread18:25
clarkbgmann: just let me know if you think I should modify the core group without an email and I'll do that18:25
gmannclarkb: done, replied on ML also.18:25
clarkbexcellent thanks18:25
fungiclarkb: which clouds.yaml did we tweak for that? i'm just doing `sudo openstack --os-cloud openstackci-rax --os-region-name DFW volume list` from bridge so it should be using the /etc/openstack/clouds.yaml we install there18:26
clarkbfungi: yes that one18:27
clarkbyou'll probably need to make a copy, edit the contents and point osc at the copy to confirm v1 works18:27
fungiso v2 should be working with that one?18:27
clarkbfungi: yes, we updated to make v2 work despite it not being in the catalog, but it is possible it wasn't in the catalog for a reason18:27
clarkbzbr: https://review.opendev.org/#/admin/groups/218,members has been updated18:27
gmannclarkb: sorry for missing the ML reply, i thought IRC confirmation is enough.18:27
clarkb#status log Updated elastic-recheck-core to include tripleo-ci-core as the tripleo-ci team intends to maintain the elastic-recheck tool18:28
openstackstatusclarkb: finished logging18:28
zbrthanks! clarkb i will bother you around switching to container use once I am ready with, ok?18:29
clarkbfungi: its also possible the openstacksdk release isn't new enough for that to work? it depends-on https://review.opendev.org/#/c/714624/18:29
clarkbzbr: ok18:29
fungithe command-line help for osc is not forthcoming in how to override an api version. possible i'm looking in the wrong places18:30
clarkbya its part of sdk18:30
clarkbI still think it's a bug and have to look in my history to figure it out18:30
clarkbfungi: look in clarkb's bridge history for openstack commands18:30
clarkbshould be something like OS_CLIENT_CONFIG=/path18:30
clarkbfungi: https://review.opendev.org/#/c/714553/5/playbooks/templates/clouds/bridge_all_clouds.yaml.j2 that diff shows you how to use v118:31
fungioh, you mean altering the clouds.yaml to use v1, got it. thought maybe there was a cli option18:31
fungiyeah, i know how to specify an alt config file18:32
fungilooks like we need python-openstackclient>=5.1.0 for that change18:32
fungiwe're running 5.2.1 from the container18:33
fungiso change 714624 is there18:33
fungiclarkb: switching it to v1 doesn't seem to help either18:43
fungieither that or setting envvars doesn't percolate into the docker container?18:43
clarkbfungi: I would try adding --debug to the client call and see what it is trying to talk to18:43
clarkboh ya if its in the container I doubt that helps18:44
clarkbyou'd have to put the env var in the invocation18:44
clarkbon the other side of the docker run/exec18:44
fungimaybe i'll just use a virtualenv18:45
clarkbI've got one in my homedir. ~clarkb/venv/bin/openstack iirc18:45
fungii do too, but i had stopped using it because we had new osc installed18:46
* clarkb needs to pop out and get a quick snack before the meeting. Back shortly for that18:46
fungiunfortunately installed in a way that makes it hard to set behaviors with envvars18:46
fungiyep, i'm able to get to rackspace volume api by reverting 714553 in a copy of clouds.yaml and using osc from a virtualenv18:50
fungiannoyingly `openstack volume show -c Name` doesn't seem to work, returns blank entries even though `openstack volume show <id>` gives a name in its output18:58
fungii guess i'll have to scrape these in a loop18:58
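[Editor's sketch] The "scrape these in a loop" step might look something like this. It is an assumption-laden sketch: it relies on the `-f json` output formatter of the `openstack` client and assumes the JSON object carries a `name` key; the cloud and region names are the ones quoted earlier in the log. The command runner is injectable so the parsing can be exercised without touching a real cloud.

```python
import json
import subprocess

# Cloud and region as used earlier in this log; prefix with sudo if required.
BASE_CMD = ["openstack", "--os-cloud", "openstackci-rax", "--os-region-name", "DFW"]


def volume_show_cmd(volume_id):
    """Build the CLI call that shows one volume as JSON."""
    return BASE_CMD + ["volume", "show", volume_id, "-f", "json"]


def name_from_show_output(output):
    """Pull the volume name out of `volume show -f json` output.

    Assumes the JSON object has a "name" key; returns "" if it is absent.
    """
    return json.loads(output).get("name") or ""


def volume_names(volume_ids, run=subprocess.check_output):
    """Map each volume UUID to its name, one CLI call per volume."""
    return {vid: name_from_show_output(run(volume_show_cmd(vid))) for vid in volume_ids}
```

Calling `volume_names(list_of_uuids)` on a host with working credentials would issue one `volume show` per UUID; passing a different `run` callable allows a dry run against canned output.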
*** priteau has quit IRC19:44
*** diablo_rojo has quit IRC19:48
clarkboh neat the /var/gerrit/tmp is mounted into the container so its actually because I'm doing upgrades that I see that I think20:05
clarkbso thats actually a non issue (not great to have the noise but I can ignore it I think)20:05
clarkbnext stop skip level upgrades20:05
* clarkb finds lunch first20:06
openstackgerritEmilien Macchi proposed openstack/project-config master: Re-introduce puppet-tripleo-core group  https://review.opendev.org/74675921:04
clarkbtesting seems to show we can do a "skip level" upgrade by running the init for 2.14, init for 2.15, init for 2.16, then reindex everything, then start21:15
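[Editor's sketch] The sequence clarkb describes (run init for each intermediate release, then reindex, then start) can be written down as an ordered command plan. This is only an illustration under stated assumptions: the war file names and site path are hypothetical, the log does not say which flags were actually used, and starting the service is left out because that is handled via the container here.

```python
def skip_level_upgrade_plan(site, wars):
    """Build the command sequence for a skip-level Gerrit upgrade.

    wars is the ordered list of war files, oldest release first. Each one
    gets a batch-mode init against the same site directory, and the final
    release then performs an offline reindex before Gerrit is started.
    """
    if not wars:
        raise ValueError("at least one war file is required")
    plan = [["java", "-jar", war, "init", "--batch", "-d", site] for war in wars]
    plan.append(["java", "-jar", wars[-1], "reindex", "-d", site])
    return plan


# Hypothetical file names and site path, for illustration only.
for cmd in skip_level_upgrade_plan(
    "/var/gerrit",
    ["gerrit-2.14.war", "gerrit-2.15.war", "gerrit-2.16.war"],
):
    print(" ".join(cmd))
```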
clarkbI'm 99% certain we can't do online reindexing if we do that21:15
fungimight be interesting to try. the offline reindex is going to add an order of magnitude more downtime (at least)21:16
clarkbya I'll give it a go I guess. There are a couple of things we have to reindex manually anyway, so will do those but then see if online reindexing works I guess21:16
clarkbthough I think I'm going to take a break and do some other things as I've basically spent all day doing gerrit stuff21:17
fungii've got a one evening window to mow my lawn before the rains return, so trying to knock out stuff on my to do list when resting between laps21:20
clarkbok I've requested PTG space and signed us up for the same blocks of time. The one thing I need to double check is DST. I know here in the USA we don't switch until like november so we should be fine but that may not be true for europe21:52
* clarkb tries to do timezone math and not get it wrong21:53
clarkbya looks like october 25th for europe dropping summer time21:53
clarkboh and australia starts summer time first sunday in october21:55
clarkbmore maths21:55
clarkbianw: you'll be UTC+11 last week of october ya?22:03
clarkbdoing maths I think we should consider moving the 04:00 - 06:00 block to 05:00-07:00. That will be slightly more EU friendly and its all bad for fungi :)22:11
clarkbbut I think australia and china can handle that shift without much trouble22:11
* clarkb makes that small edit to the ethercalc22:11
clarkbthe other times are probably fine as is22:12
ianwclarkb: yeep, seems we switch on 4th oct22:13
clarkbthanks for confirming. I've requested Monday the 26th 13:00 - 15:00UTC Monday 26th 23:00 - Tuesday 27th 01:00 UTC and Wednesday 05:00 - 07:00. The only change from last time is moving wednesday from 04:00-06:00 to 05:00-07:00 to be slightly more europe friendly22:15
clarkbtheir switch from UTC+2 to UTC+1 means 04:00-06:00 is incredibly early22:15
fungi05:00-07:00 before usa dst ends?22:16
fungior after?22:16
fungii already got confused22:16
clarkbbefore usa dst ends after eu dst ends22:16
fungiahh, before usa dst ends22:16
clarkbPTG happens in the week of limbo22:16
clarkbfungi: I don't really expect you to show up for that one22:17
clarkbbut its "early" enough I should be able to make it22:17
fungiso 05:00-07:00 is basically 1-3am here. i'll survive ;)22:17
fungimaybe i'll skip. maybe i'll see it as an excuse to play video games until 1am. who knows?22:17
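[Editor's sketch] The DST arithmetic above can be double-checked with a few lines of calendar math. This sketch hard-codes the transition rules mentioned in the discussion (Europe leaves summer time on the last Sunday of October, the USA on the first Sunday of November, Australia starts on the first Sunday of October). The three zones and their UTC offsets are illustrative assumptions, and "Wednesday" is taken to be 2020-10-28.

```python
from datetime import date, datetime, timedelta, timezone


def nth_sunday(year, month, n):
    """Return the n-th Sunday of a month; n=-1 means the last Sunday."""
    if n > 0:
        first = date(year, month, 1)
        # date.weekday(): Monday is 0, Sunday is 6.
        return first + timedelta(days=(6 - first.weekday()) % 7 + 7 * (n - 1))
    next_month = date(year + (month == 12), month % 12 + 1, 1)
    last = next_month - timedelta(days=1)
    return last - timedelta(days=(last.weekday() - 6) % 7)


def offsets_on(day):
    """UTC offsets in hours for a day between September and December.

    Compares whole dates only, so the transition days themselves are approximate.
    """
    eu_end = nth_sunday(day.year, 10, -1)
    us_end = nth_sunday(day.year, 11, 1)
    au_start = nth_sunday(day.year, 10, 1)
    return {
        "US Eastern": -4 if day < us_end else -5,
        "Central Europe": 2 if day < eu_end else 1,
        "Sydney": 11 if day >= au_start else 10,
    }


def to_local(utc_dt, offset_hours):
    """Convert an aware UTC datetime to a fixed-offset local time."""
    return utc_dt.astimezone(timezone(timedelta(hours=offset_hours)))


slot = datetime(2020, 10, 28, 5, 0, tzinfo=timezone.utc)  # start of the proposed block
for place, off in offsets_on(slot.date()).items():
    print(place, to_local(slot, off).strftime("%H:%M"))
```

For the 05:00 UTC start this gives 01:00 US Eastern, 06:00 Central Europe and 16:00 Sydney, consistent with fungi's "1-3am here".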
clarkbfungi: https://www.gerritcodereview.com/2.16.html#reindex-for-new-projects-index-and-changed-group-index that notes you need to do an offline reindex of changes and accounts if not upgrading from 2.1522:32
clarkbI've just tested the skip upgrade with online reindexing and it breaks because I need to reindex changes22:32
clarkbI believe that is the costly one22:33
clarkbI think that means we have to do the offline reindex22:33
* clarkb tries to figure out what all the indexes are now22:34
fungiahh, yeah, too bad22:35
fungiif memory serves, offline reindexing nova's changes was somewhere around 10 hours last time we did it?22:35
clarkbaccounts, groups, changes, projects22:35
clarkbI think it is the changes and projects reindexes that are slow?22:35
fungianyway, we should probably just plan for and announce a 24-hour outage to be safe22:36
clarkbya so the alternative is to do shorter outages and step through 2.14 and 2.15 in order to get to 2.1622:36
clarkband basically let each one online reindex22:36
clarkb(which is slower iirc because it has to compete with doing the regular server duties)22:36
fungiand basically have two maintenances, the first where we get everything upgraded and start the offline reindex, the second where we try to bring gerrit back online the next day22:36
clarkbprobably figure 2-3 days for each of those22:36
fungiso, yeah, several short-ish outages plus possible bugs from now-no-longer-fixable gerrit versions? or one huge outage?22:37
clarkbyup. I think I prefer the huge outage22:37
fungithe big outage would be more downtime overall, but probably also safer22:38
fungianyway, we *might* have a better idea of how long after we try a production dry-run on review-test22:39
clarkbya that should give us a good ballpark. Anyway trying to write up the distilled version of all this at https://etherpad.opendev.org/p/gerrit-2.16-upgrade next then will respond to luca and see if he can look it over once I think its good22:39
fungilike i could be around for a late-night (my time) start maintenance and then a mid-morning (my time) fire-up of 2.16 after the reindex22:39
fungiso it could start in your evening and then finish the next morning once you're up, say22:40
fungiwe might also want a temporary maintenance page for an outage of that duration. people will be surprised by it no matter how much communication we provide ahead of time22:41
clarkbfungi: want to look over https://etherpad.opendev.org/p/gerrit-2.16-upgrade really quickly? I've got a couple other questions I'll send luca's way as well22:46
*** olaph has quit IRC22:48
*** olaph has joined #opendev22:49
fungiyeah, just a sec22:52
fungiany point in dropping db tables if the notedb migration is on the horizon anyway? we'll drop the whole db when that comes22:59
clarkbfungi: I think that doesn't happen until 3.0 so cleaning up early seems like good hygiene22:59
clarkb(I mean i want to get to 3.x as quickly as possible but..)23:00
fungiahh, the sql db is still used for some things in 2.16 even after notedb migration?23:03
clarkbat least that seemed to be what luca said in his response to my first email23:04
clarkbhe says its 3.0 where you can drop the db23:04
fungidraft plan lgtm23:04
clarkbcool firing off another email to luca now then23:04
fungifor some reason i read his previous reply as indicating that 3.0 no longer supported using the db, but not necessarily that 2.16 required it if using notedb23:05
fungianyway, better safe than sorry. it doesn't really lose us anything23:06
clarkbfungi: one of my questions was why does the changes table remain and he said that you can drop the whole db after 3.x aiui23:06
clarkbbecause in testing that was one of the things I wanted to check to ensure that notedb migration worked23:06
clarkbit does turn out that no new changes end up in the changes table after the notedb migration so that's how I've been checking23:06
fungiahh, okay. fair enough23:10
fungithat's ambiguous, but better to err on the side of caution23:10
