Thursday, 2020-11-19

clarkb++00:01
ianwclarkb: thanks, responded to the commenty bits; will fill in all the other stuff now00:07
ianwclarkb: oh, that was the other thing, i called it codesearch because it's pretty heavily configured to be our codesearch00:10
ianwlike the container starts and writes out the config pulled from project-config projects.  so it's not really a generic hound container00:11
clarkbya I think we've done that with things like gitea too but still call it gitea?00:11
ianwfair enough00:11
*** tosky has quit IRC00:14
openstackgerritMerged openstack/project-config master: Revert "Disable limestone provider due to IPv4-less nodes"  https://review.opendev.org/76325400:25
openstackgerritIan Wienand proposed opendev/system-config master: Migrate codesearch site to container  https://review.opendev.org/76296000:35
*** dmellado has quit IRC01:03
*** dmellado has joined #opendev01:04
*** hamalq has quit IRC02:09
openstackgerritMerged opendev/system-config master: devel job: use ansible-core name  https://review.opendev.org/76309902:28
*** ysandeep|holiday is now known as ysandeep|off02:59
*** d34dh0r53 has quit IRC03:24
*** d34dh0r53 has joined #opendev03:27
openstackgerritIan Wienand proposed opendev/zone-opendev.org master: Add codesearch.opendev.org  https://review.opendev.org/76329703:32
openstackgerritIan Wienand proposed opendev/system-config master: Add codesearch.opendev.org server  https://review.opendev.org/76329803:34
ianwok, i've brought up a new codesearch server, and also added the acme-challenge cname in openstack.org to acme.opendev.org so its cert can cover that too03:38
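
In zone-file terms, the delegation described here presumably looks like the following (the TTL and exact formatting in the openstack.org zone aren't shown in the log):

    _acme-challenge.codesearch.openstack.org.  IN  CNAME  acme.opendev.org.
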
openstackgerritClark Boylan proposed opendev/system-config master: Build new gerrit images  https://review.opendev.org/76329904:02
*** raukadah is now known as chandankumar04:03
*** ykarel has joined #opendev04:19
*** jaicaa has quit IRC04:56
*** jaicaa has joined #opendev04:56
*** marios has joined #opendev06:05
*** marios has quit IRC06:15
*** marios has joined #opendev06:18
*** hamalq has joined #opendev06:25
*** jaicaa has quit IRC06:37
*** jaicaa has joined #opendev06:39
*** slaweq has joined #opendev07:00
*** eolivare has joined #opendev07:09
*** sboyron has joined #opendev07:20
*** sboyron has quit IRC07:23
*** sboyron has joined #opendev07:23
*** marios is now known as marios|ruck07:37
*** DSpider has joined #opendev07:39
*** ralonsoh has joined #opendev07:43
*** bhagyashris|off is now known as bhagyashris07:55
*** hashar has joined #opendev08:00
*** rpittau|afk is now known as rpittau08:03
*** andrewbonney has joined #opendev08:10
*** roman_g has joined #opendev08:15
*** icey has joined #opendev08:17
*** hamalq has quit IRC08:28
*** tosky has joined #opendev08:44
*** lpetrut has joined #opendev08:47
*** mgoddard has joined #opendev08:58
*** ykarel_ has joined #opendev09:04
*** ykarel has quit IRC09:07
*** mlavalle has quit IRC09:09
*** mlavalle has joined #opendev09:12
*** icey has quit IRC09:17
*** icey has joined #opendev09:24
*** hamalq has joined #opendev09:29
*** hamalq has quit IRC09:34
*** dtantsur|afk is now known as dtantsur09:41
*** ykarel_ is now known as ykarel09:57
*** hamalq has joined #opendev10:07
*** hamalq has quit IRC10:11
*** d34dh0r53 has quit IRC10:21
*** hamalq has joined #opendev10:27
*** hamalq has quit IRC10:32
*** ykarel_ has joined #opendev10:33
*** ykarel has quit IRC10:35
*** icey has quit IRC10:36
*** icey has joined #opendev10:47
*** icey has quit IRC10:52
*** icey has joined #opendev11:03
*** hamalq has joined #opendev11:09
*** hamalq has quit IRC11:14
*** ykarel__ has joined #opendev11:25
*** ykarel_ has quit IRC11:28
*** tkajinam has quit IRC11:29
*** tkajinam has joined #opendev11:30
*** hamalq has joined #opendev11:30
*** ykarel__ is now known as ykarel11:33
*** hamalq has quit IRC11:35
*** hamalq has joined #opendev11:51
*** hamalq has quit IRC11:56
openstackgerritSlawek Kaplonski proposed zuul/zuul-jobs master: [multi-node-bridge] Add script to configure connectivity  https://review.opendev.org/76265012:02
*** hamalq has joined #opendev12:12
*** hamalq has quit IRC12:16
*** hamalq has joined #opendev12:34
*** kevinz has joined #opendev12:36
*** hamalq has quit IRC12:39
*** rpittau is now known as rpittau|brb13:16
*** hamalq has joined #opendev13:40
dtantsurhey folks! is it only me or viewing logs on https://zuul.opendev.org/t/openstack/build/ has become really slow recently?13:45
*** hamalq has quit IRC13:45
fungidtantsur: slowness in the logs tab, the summary tab or the console tab?14:00
fungithere was a recent ansi color rendering feature added to the summary and console views, which we have evidence to suggest is causing order-of-magnitude or greater increases in display time, at least for summary and console14:01
fungiapparently it gets much worse the larger the ansible json is14:01
*** ykarel_ has joined #opendev14:05
dtantsurfungi: pretty much everything is quite slow for me; the summary takes seconds to open, and firefox shows "this tab slows down your browser"14:07
*** ykarel has quit IRC14:07
*** lamt has joined #opendev14:08
fungidtantsur: on all build results or just ones with lots of output? for example, this loads quickly for me: https://zuul.opendev.org/t/openstack/build/183b590240ab4527a2f6d5e3382d2a0514:11
dtantsuryep, this was pretty fast14:15
dtantsurprobably our dsvm jobs then14:15
fungiwe're talking in #zuul about reverting the ansi color rendering for now to continue working on it and get some better performance benchmarks before adding it back14:15
fungiso if that's the cause, you'll probably know some time today14:15
*** auristor has quit IRC14:20
*** auristor has joined #opendev14:20
*** d34dh0r53 has joined #opendev14:25
dtantsurgreat! I'll check again14:31
*** mgoddard has quit IRC14:48
*** rpittau|brb is now known as rpittau14:51
dtantsuralso, could we remove the browser warning from the meetup, given that 1) firefox works okay nowadays, 2) chromium is currently broken?15:04
dtantsurs/meetup/meetpad/15:05
*** mgoddard has joined #opendev15:07
fungichromium's broken?15:07
fungiand yeah, we mainly added that warning because we were getting many firefox users saying they were unable to get jitsi to work for them; it was meant to reduce the number of requests for assistance15:08
fungialso supposedly the webrtc renderer in firefox performed worse, no idea if they've worked on improving that in recent months15:09
dtantsurfungi: there is a bug currently with chromium crashing on switching windows when webrtc is used15:12
dtantsurI think firefox has improved, but I have no data to back this statement15:13
fungioh neat. i haven't witnessed that but i generally use chromium only for videoconferencing and keep it separate from my locked-down firefox with all the privacy usability extensions15:13
fungidtantsur: any specific chromium versions impacted? i'm using the 83.0.4103.116-3.1 build in debian/unstable currently15:14
dtantsurfungi: https://bugzilla.redhat.com/show_bug.cgi?id=189592015:15
openstackbugzilla.redhat.com bug 1895920 in chromium "Chromium 86 crashes on WebRTC videos when switching window" [Urgent,New] - Assigned to spotrh15:15
dtantsurI haven't dived into that, just using firefox15:15
fungiahh, okay, so my chromium is fairly old apparently. that would probably explain why i haven't seen it15:16
*** tosky has quit IRC15:18
*** mlavalle has quit IRC15:21
fungianybody else happen to know if firefox is working better with jitsi recently?15:22
*** tosky has joined #opendev15:22
*** tosky has quit IRC15:26
*** elod has quit IRC15:27
*** elod has joined #opendev15:27
*** ykarel_ has quit IRC15:29
*** tosky has joined #opendev15:50
smcginnisJust noticed today I am getting a publickey error when trying to "git review -d".15:57
smcginnisTried SSHing into the port and see the error - debug1: send_pubkey_test: no mutual signature algorithm15:58
smcginnisAny recent changes that might be related to this?15:58
clarkbsmcginnis: did you upgrade to fedora33?15:58
smcginnisOnly recent change on my end, that I can think of, is upgrading to Fedora 33 and now having py39 as default.. yep15:59
clarkbthat's the change then. openssh has deprecated using sha1 for hostkey exchanges. fedora33 has taken it a step further and disabled that. Our gerrit 2.13 ssh server doesn't do sha2 and you get that failure15:59
clarkbonce we've upgraded that should go away. So hopefully next week it is a non issue. In the meantime you can do a host specific ssh config override to allow sha1 host key exchanges with gerrit16:00
smcginnisAh... any examples of that I can use?16:00
smcginnisKexAlgorithms +diffie-hellman-group1-sha1   ?16:01
clarkbhttps://unix.stackexchange.com/a/34085316:02
smcginnisThanks clarkb!16:02
clarkbI think16:02
clarkbI haven't had to do it myself.16:02
smcginnisI'll give it a shot.16:02
smcginnisAnd report back.16:02
fungiyeah, ought to just be able to add a review.opendev.org section to your ~/.ssh/config and set that for now16:08
smcginnisLooks like maybe that is the sshd option. Client side, I had to add PubkeyAcceptedKeyTypes +ssh-rsa16:10
fungithat sounds right16:11
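
A minimal sketch of that client-side override as a ~/.ssh/config stanza; the hostname is taken from context, and only the RSA/SHA-1 signature scheme is re-enabled, scoped to this one host:

    Host review.opendev.org
        # the 2.13-era Gerrit sshd only offers ssh-rsa (SHA-1) signatures
        PubkeyAcceptedKeyTypes +ssh-rsa
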
smcginnisI'll post something to the ML in case anyone else upgrades to f33 before the gerrit upgrade.16:11
clarkbnote the gerrit upgrade is supposed to start tomorrow :)16:11
smcginnisYeah, small window.16:11
smcginnisGuess I can skip the ML. Not likely someone will decide to do it during the weekday.16:12
clarkbfwiw I did test that this error goes away with upgraded gerrit16:12
clarkbso pretty confident in that :) and not just hoping16:13
fungiinfra-root: not sure if you saw earlier, but dtantsur linked to a bug about chromium 86 builds being broken with jitsi and similar videoconferencing tools (though it looks like maybe it's fixed in chromium 87). worth keeping an eye out for if people report problems with meetpad16:15
clarkblooks like it affects wayland and x1116:16
clarkbI'm up for doing a call with chrome later and seeing if it fails similarly16:16
fungiyep, and apparently downgrading to 85 is a "bad idea" because of a serious security vulnerability in it16:16
fungisupposedly people testing with chrome did not experience the same problems as with chromium16:17
clarkbfungi: yes chrom* has patched a number of vulnerabilities that are being exploited in the wild16:17
clarkbin like the last 2 weeks16:17
clarkbah ya I see where chrome is reported to not be affected16:18
fungirelated, he's requested we drop the warning about firefox not working with meetpad... apparently it does work at least to some degree (we knew that much) but i suppose it's worth revisiting whether the reduced support burden from people using firefox and seeing suboptimal behavior/performance justifies annoying the firefox users who have been able to make it work for them anyway16:19
clarkbfungi: I'm willing to test firefox too and at least try to quickly reproduce our previous experiences16:20
clarkbif we can't reproduce then we can drop the warning16:20
*** hamalq has joined #opendev16:20
*** hamalq has quit IRC16:25
*** lpetrut has quit IRC16:42
*** zaro69 has joined #opendev16:51
*** rpittau is now known as rpittau|afk16:52
*** marios|ruck is now known as marios|out16:57
*** marios|out has quit IRC17:00
*** hamalq has joined #opendev17:01
*** d34dh0r53 has quit IRC17:02
*** zaro69 has quit IRC17:03
*** zaro95 has joined #opendev17:05
*** zaro95 has quit IRC17:06
*** d34dh0r53 has joined #opendev17:06
*** zaro48 has joined #opendev17:07
*** eolivare has quit IRC17:09
*** fressi has quit IRC17:22
*** ralonsoh_ has joined #opendev17:27
*** ralonsoh has quit IRC17:28
*** roman_g has quit IRC17:43
*** roman_g has joined #opendev17:44
*** roman_g has quit IRC17:44
*** hamalq has quit IRC17:45
clarkbfungi: going through my pre-upgrade notes, we did update prod to fix the refs/meta/config perms right? Otherwise I'm not really finding much other than getting those images rebuilt and published, then working through the backup initialization/prep/testing17:45
*** roman_g has joined #opendev17:45
*** hamalq has joined #opendev17:45
*** roman_g has quit IRC17:45
clarkbbut please call out anything that we may have missed or should double check prior to tomorrow17:45
fungiyes, we (at least i think i) did17:46
fungiwe can work on adding our backup volume next i suppose17:46
fungii'll get that created and attached shortly17:46
*** roman_g has joined #opendev17:46
*** roman_g has quit IRC17:47
clarkbthanks!17:47
clarkboh we also need to do the maintenance html. I can work on that in a bit17:50
fungiwe may want to keep an eye out for anything like https://bugs.chromium.org/p/gerrit/issues/detail?id=1370117:51
fungisupposedly newer jetty is causing folks to need to add 'RequestHeader set "X-Forwarded-Proto" expr=%{REQUEST_SCHEME}' to their apache reverse proxy configs17:51
clarkbthat should be easy enough to add assuming that our version of apache supports that17:52
clarkbshould we go ahead and add that to review-test?17:53
*** zaro48 has quit IRC17:53
fungiwell, maybe we need to double-check that it's a problem for us first17:54
clarkbwe don't enable the plugin manager so can't test easily with the given example17:54
fungii'm not a fan of cargo-culting stuff like that unless we're sure it's necessary17:54
clarkbI guess we can just browse around for a bit and see if we trip it17:54
clarkb++17:54
clarkbbrowsing changes seems fine17:54
clarkbnow to try searching17:54
*** zaro has joined #opendev17:55
clarkbsearching also seems fine17:55
*** hamalq has quit IRC17:56
*** hamalq has joined #opendev17:56
*** fressi has joined #opendev17:58
*** mlavalle has joined #opendev18:00
*** mgoddard has quit IRC18:05
clarkbif anyone is able to replicate that issue on review-test let us know but I haven't succeeded so far18:05
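
If the redirect problem did surface, the fix from the bug report would presumably land in the gerrit vhost's reverse-proxy config along these lines (the backend address/port and surrounding directives here are illustrative, not our actual vhost):

    <VirtualHost *:443>
        ServerName review.opendev.org
        # requires mod_headers; tells jetty which scheme the client actually used
        RequestHeader set "X-Forwarded-Proto" expr=%{REQUEST_SCHEME}
        ProxyPass / http://127.0.0.1:8081/ nocanon
        ProxyPassReverse / http://127.0.0.1:8081/
    </VirtualHost>
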
fungiinfra-root: looks like ns1 has been hung since late utc tuesday. i'm investigating18:12
fungialso apparently ns2 isn't responding on its ipv4 address (not sure how long, we seem to be monitoring it via ipv6 which is working fine as far as i can tell)18:17
openstackgerritMerged opendev/system-config master: Build new gerrit images  https://review.opendev.org/76329918:18
fungithe oob console for ns1 shows the usual hung task messages on its console, though i was able to get it to initiate a soft reboot it seems18:20
clarkbfungi: I can reach ns1 now and it seems to be running nsd18:23
clarkbI guess the next thing is to sort out why ipv4 to ns2 is sad?18:23
fungiyeah, i'm inspecting the logs on it18:23
*** dtantsur is now known as dtantsur|afk18:23
fungilooks like it was still logging ansible connections at 06:22:30 yesterday18:24
fungibut that's where syslog abruptly ends18:24
clarkbdid it fill its disk?18:24
fungiand it doesn't seem to have logged the current boot messages in syslog either18:25
clarkbalso are you looking at ns1 or ns2?18:25
funginope, the fs is mostly empty18:25
fungins118:25
fungii haven't started investigating the network issue for ns2 yet18:25
clarkbI think we've seen that before on rax. Where the disk just goes away18:25
clarkband then the server gets really sad18:25
fungiyeah, but i wonder why after a reboot nothing's logging to syslog18:26
clarkbjournalctl shows logs18:26
fungi/dev/xvda1 on / type ext4 (rw,noatime,nobarrier,errors=remount-ro,data=ordered)18:26
clarkbis rsyslog not running to slurp into the file?18:26
fungithe rootfs still seems to be writeable18:27
clarkbya i don't see an rsyslog running18:27
clarkbwhich will cause that to happen18:27
fungijournalctl continued logging when syslog ended18:29
clarkbfungi: yes aiui syslog relies on rsyslog being installed and it will slurp from journald into /var/log/syslog18:29
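
The quick checks behind "i don't see an rsyslog running" amount to something like:

    systemctl status rsyslog           # is the service present and running?
    dpkg -l rsyslog                    # is the package even installed?
    journalctl -u rsyslog --no-pager   # whatever journald captured for it
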
fungithe last thing logged to syslog was ansible doing something with rsyslog18:29
fungi<smoking gun>18:30
clarkbcorvus: unrelated to ^ have you seen https://zuul.opendev.org/t/openstack/build/571d8b35ec5f4857b7391437d080f45c/logs before? it looks like docker hub had a proper error trying to tag the 3.1 image18:30
clarkbcorvus: I don't think that is catastrophic for us because we don't intend on running 3.1 so we'll be fine on the older image18:30
clarkbcorvus: but if we can get it promoted that would be great18:31
fungiNov 18 06:22:30 ns1 python3[16348]: ansible-apt Invoked with name=rsyslog state=absent purge=True package=['rsyslog'] ...18:31
fungiso we've got ansible explicitly uninstalling the rsyslog package?18:31
clarkbhrm I recall ianw doing syslog things but don't think it was to uninstall it18:31
fungi#status log rebooted ns1.opendev.org after it became unresponsive18:32
openstackstatusfungi: finished logging18:32
clarkbfungi: in system-config/playbooks/roles/base/server/tasks/Debian.yaml we remove the package then reinstall it18:32
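
Paraphrasing the pattern in that task file from the ansible-apt invocations logged above (a sketch, not copied verbatim from the role):

    - name: Remove rsyslog so its stock config is purged
      apt:
        name: rsyslog
        state: absent
        purge: true

    - name: Reinstall rsyslog
      apt:
        name: rsyslog
        state: present
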
fungihuh, i guess it logged the removal but not the reinstallation18:33
fungimaybe it failed to start properly afterward18:33
clarkbcoincidence that it happened right at the time18:33
clarkbya18:33
clarkbI think we should land the prescribed cleanup in that block though. Maybe when ianw's day starts18:33
clarkbthe server should self-correct given ^ once our hourly jobs run I think18:33
fungiwell, no "right at the time" it's still unclear to me when the server broke18:33
clarkbgotcha18:33
fungii'm trying to piece that together now looking for a gap in journalctl messages prior to the reboot18:34
fungijournalctl was logging right up to the end18:35
fungiseems like services were seeing connections coming in18:35
clarkbcorvus: actually looking at dockerhub all of them seem to have updated18:36
fungilooking to see if maybe the rootfs got marked read-only at some point18:36
clarkbI think my concern there is that maybe pabelanger's container work has broken something? it does look like the promote job for 3.1 is complaining about 2.14 for some reason18:36
clarkbcorvus: if you have a minute to sanity check those today that would be great. I'll see what I can find too18:37
fungins2 is reachable for me over ipv4 now, not sure what was going on with it earlier18:40
fungiit wasn't even responding via icmp echo when i was testing previously18:41
pabelangerclarkb: was the job using build-container-image?18:41
clarkbno its opendev-build-docker-image18:42
clarkbbut you're modifying roles too right? I just want to sanity check that something isn't broken and that our images are still building properly before we do the upgrade tomorrow18:43
clarkbit is incredibly difficult to map what docker hub shows you to anything the docker tools show you to determine if you've got a 1:1 match18:44
*** andrewbonney has quit IRC18:45
clarkbpabelanger: the thing that has me concerned is the job to promote the 3.1 image is complaining about the 2.14 image18:46
clarkbwhich has my paranoia thinking: did we mix up the tags somehow18:47
fungilooks like wmf also reported the same redirect problem with their gerrit and eclipse's too: https://bugs.chromium.org/p/gerrit/issues/detail?id=1370518:48
fungireally surprised we're not seeing it on review-test18:48
clarkbok I think I see what is going on18:49
clarkbthe last step in those jobs is to list all the tags and then clean out the obsolete tags18:49
clarkbI think the race is that the 2.14 job deleted the 2.14 tag while the 3.1 job was listing tags and dockerhub broke18:50
clarkbthe actual promotion side of things seems to have been fine18:50
corvusclarkb: i was just digging into that and came to the same conclusion18:50
clarkbwe should be fine in that case, and that can be something we clean up on the job side later :)18:50
corvusthey were within seconds of each other; haven't confirmed the sequence yet18:51
*** ralonsoh_ has quit IRC18:53
corvusthe order is opposite what i expect, but there's still enough overlap for it to be a race, especially if there's a lot of locking/cdn stuff going on on docker's side.  so i think that's the hypothesis we should go with18:55
corvusprobably should just put that in a retry18:55
clarkbor maybe make it less greedy? I don't know if that is possible given the state we have18:55
corvuswell, it was just the listing that failed18:56
fungiso a retry might work there?18:56
clarkboh right18:56
fungioh, you suggested a retry18:56
fungiyeah, makes sense18:56
corvusyeah.  mind you, if we get past the listing working, we could end up with the same issue just moved to the actual delete stage.  <shrug>18:56
corvuseither way, it's not terribly important.  we could also just fail_when:false18:57
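
A minimal sketch of the retry idea, assuming the listing is an Ansible uri task registering its result (illustrative; not the actual zuul-jobs role):

    - name: List image tags on docker hub
      uri:
        url: "https://hub.docker.com/v2/repositories/{{ repository }}/tags?page_size=1000"
      register: tag_listing
      until: tag_listing is success
      retries: 3
      delay: 10
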
ianwfungi: catching up; was the syslog ok in the end?18:58
fungiianw: no, for some reason rsyslogd didn't start successfully when ansible removed and reinstalled it, and it also didn't start on reboot18:58
fungii haven't checked yet to see why18:58
clarkbfungi: ianw maybe it didn't reinstall18:58
ianwclarkb: yeah, i want to do that cleanup, but wanted to get https://review.opendev.org/756605 into production; that got blocked because the codesearch job failed, which led me to containerising it :)18:58
fungiclarkb: bingo, looks like it's currently not installed on ns118:59
fungithe last action logged in /var/log/dpkg.log was the uninstallation of rsyslog too18:59
fungiand journalctl has truncated now... where do i find old journals?19:00
ianwclarkb: if you could loop back on https://review.opendev.org/#/c/762960/ i started the server for it yesterday too.  i think i'd like to get the gate around this fixed up one way or the other today so that's not an issue19:01
ianwfungi: i'll check on the bridge side what it thought it did19:01
fungioh, nevermind, user error19:01
fungiunfortunately, journalctl doesn't seem to record the ansible activity the way rsyslog did19:01
clarkbianw: yup I'll rereview that. We have a couple of gerrit upgrade prep things to do at this point: write maintenance.html and get the backup volume mounted and the initial sync done19:02
clarkbI'm about to find lunch, then sneak in a bike ride if I can since the sun decided to show up today19:02
clarkbsounds like fungi will do the backup volume mounting, I'll work on the maintenance.html then hopefully once things wind down I can rereview codesearch?19:02
fungiahh, okay i found in the journal where it logged the rsyslog package removal but it doesn't seem to have tried to reinstall it immediately19:05
ianwfatal: [ns1.opendev.org]: UNREACHABLE! => {19:05
fungiianw: yeah it was unreachable for a while today19:06
funginot sure for how long19:06
*** sboyron has quit IRC19:07
*** sboyron has joined #opendev19:07
fungii see "Nov 19 06:24:24 ns1 python3[22116]: ansible-apt Invoked with state=present name=['at', 'git', 'logrotate', 'lvm2', 'openssh-server', 'parted', 'rsync', 'rsyslog', 'strace', 'tcpdump', 'wget'] ..." in the journal19:07
ianwwhen i look in base.yaml.log -- it's almost like it's running twice.  things are all mixed up19:08
clarkbprobably different hosts?19:08
fungilooks like it removed rsyslog Nov 18 06:22:29 but didn't try to install it again until Nov 19 06:24:24 (and for whatever reason that didn't work either)19:09
ianw"debconf: delaying package configuration, since apt-utils is not installed"19:10
ianwi wonder if that's involved19:10
fungioh, it also said to install rsyslog Nov 18 06:18:0519:11
fungicould it have gotten the install and remove steps backwards somehow?19:11
fungibasically it installed at 06:18:05 but it was already installed so that presumably did nothing, then it removed at 06:22:2919:12
fungiand then didn't try to install again until the next day19:12
ianwns1.opendev.org            : ok=29   changed=3    unreachable=1    failed=0    skipped=3    rescued=0    ignored=019:12
ianwns2.opendev.org            : ok=30   changed=4    unreachable=1    failed=0    skipped=3    rescued=0    ignored=019:12
ianwso it was both ok, and unreachable ?19:12
clarkbthose are task counts19:12
fungiianw: when was that? we also saw ipv4 was broken on ns2 earlier today19:12
clarkbso 30 tasks ok but one was unreachable then it skipped the rest aiui19:13
ianwyeah, this is in the latest base run logs19:13
fungithe removal and install attempts i see look like they probably happened from our daily periodic job19:13
fungigiven the times19:14
fungi(and frequency)19:14
ianwanyway, now this has run, we should stop it doing this.  sorry, i planned for this to happen in a matter of a few hours -- but then the gate got broken by codesearch19:14
ianwif i try and update the base yaml i think it will run the failing codesearch job too19:15
ianw"this" being the reinstall19:15
openstackgerritMohammed Naser proposed zuul/zuul-jobs master: added lgtm with basic docs  https://review.opendev.org/76342819:15
fungifor whatever reason, rsyslog was able to be reinstalled successfully on ns219:15
fungithough similarly, it seems it got uninstalled in the daily run on wednesday and then wasn't installed again until the daily run on thursday19:17
fungiso ns2 had no rsyslog for ~24 hours19:17
fungii guess that's not what was intended19:17
ianwno, looking at the base.yaml, it seems it does the purge but not the reinstall, thinking the host was unreachable19:18
ianwfungi: what's your preference here?  i don't think any of us have interest in updating the current puppet & codesearch server to their new releases so we can rule that out19:19
ianwi can make the job non-voting, and propose a change to stop the purge here now the old config file is gone19:19
ianwor we can merge https://review.opendev.org/763298 to remove the codesearch puppet19:20
ianwand then merge a change to avoid the reinstall without gate changes19:20
funginot entirely sure i grok the interrelationship between these issues, but happy to prioritize the codesearch container reviews if that gets rsyslog working on ns1 (would it also be broken anywhere else?)19:21
ianwit's just that the current testing of the codesearch job is broken, which will block the gate for anything that tries to run it, like base file updates19:21
ianwfungi: as far as i can tell from the base logs, it was only ns1/2 that seemed to become unreachable for the reinstall step19:22
*** mgoddard has joined #opendev19:27
*** sboyron has quit IRC19:32
*** sboyron_ has joined #opendev19:33
fungiclarkb: do you think we need the backup volume to be ssd? i guess it might speed up the maintenance a little if we don't have to wait as long for rsync to sata?19:35
clarkbI don't think it is necessary but it may help?19:36
fungiclarkb: and the idea is to just sync /home/gerrit2 into it, right? so i can get by with a 100gb volume as the data in there is less than that19:37
clarkbwell we'll sync two sets of gerrit2 homedirs into it19:37
clarkbso it should be large enough for both copies19:38
clarkb2.13 and 219:38
clarkb*2.13 and 2.1619:38
fungigot it, i'll make it 256gb then?19:38
clarkbcurrent is using less than half the existing 256 right?19:38
fungiyep, 93gb19:39
clarkbif so then ya 256 is probably a good size19:39
*** mgoddard has quit IRC19:39
clarkbnotedb grows things a bit but we're snapshotting pre-notedb19:39
clarkbit's about 15gb growth before we gc then 4gb after gc iirc19:39
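
For reference, creating and attaching a cinder volume of that size is roughly the following (the volume name and server identifier here are illustrative, not the exact ones used):

    openstack volume create --size 256 gerrit-upgrade-backups
    openstack server add volume review.opendev.org gerrit-upgrade-backups
    # then partition/mkfs on the server and mount it, e.g. at /mnt/2020-11-20_backups
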
fungido we want different logical volumes for the two copies or just separate paths?19:42
*** sboyron_ has quit IRC19:48
fungiassuming separate paths will work fine19:50
clarkbjust separate path is fine19:51
clarkbalso I think we only need to copy the review_site and the db backup19:51
clarkbnot all of the gerrit2 homedir. That may make it a bit quicker19:51
openstackgerritIan Wienand proposed opendev/system-config master: base: Remove rsyslogd reinstall  https://review.opendev.org/76343119:52
fungimeh, it's already underway. but also the rest of the stuff is unlikely to change much during upgrade?19:52
clarkbgood point19:52
fungi`sudo rsync -Sax --delete /home/gerrit2/ /mnt/2020-11-20_backups/2.13` is what's presently running19:53
*** sboyron has joined #opendev19:55
*** hashar has quit IRC19:58
clarkbinfra-root review.o.o:~clarkb/maintenance.html has a short blurb in it now20:06
clarkbdo we think we need to add anything more to that?20:06
fungiyour last closing paragraph tag is busted, otherwise lgtm20:07
clarkbfixed20:08
fungiyep, looks great20:14
clarkbas far as timing goes for tomorrow I'm going to try and be at the keyboard by 14:30UTC20:14
clarkbthe schedule I've written doesn't have us doing anything until 1500 anyway so that should be plenty of time to wake up20:15
fungiyeah, at most we'll send out a status notice an hour before or something to remind folks20:16
fungii can leave myself a reminder to send the reminder20:16
fungi2.13 rsync completed, i'm priming the copy for 2.16 now and will add these commands to the pad20:18
clarkbfungi: thanks!20:18
clarkbianw: looking at https://review.opendev.org/#/c/762960/10..11/playbooks/roles/codesearch/templates/docker-compose.yaml.j2 the path in the docker-compose file is data not /data; I think that means it is relative to the docker compose config dir?20:18
clarkbianw: I guess my question was why not do it as /var/hound/data or similar which we do with other containers20:18
clarkbthen when you bind mount it will be /var/run/data ?20:19
clarkbianw: also left one note about the jobs20:20
ianwclarkb: i think that's saying "data volume at /var/run/data"?20:31
clarkboh a proper docker volume. I think we should avoid those20:31
clarkbthey get allocated out of a difficult to manage space and that makes it hard to supplement with lvm and cinder20:32
clarkbI think its best for our current use cases to use regular bind mounts20:32
ianwit seemed appropriate in this case because the data is not ephemeral, but also not required to be outside the container20:32
clarkbianw: ya I've been using them with my nextcloud deployment at home and think it was the biggest mistake in that deployment20:32
clarkbmostly because you can't say "use disk space from this location" easily20:33
clarkband since we rely on cinder a lot I think that may be important?20:33
clarkbalso consistency is nice. but maybe others are fine with that20:33
ianwhrm, i mean i don't think this is going to expand.  i can, but the config file is deliberately generated in the container, to keep it self-contained20:34
clarkbyou can bind mount the dir and still generate the config file right?20:34
clarkbI might be missing how that affects volumes vs bind mounts20:34
ianwyeah, i can, it just seems a bit unnecessary to have it outside the container context20:35
ianwi don't feel strongly.  i can bind mount it in20:36
clarkbI like the simplicity of bind mounts and we can move them and remount bigger fs's under them etc20:37
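
For contrast, the two compose forms being weighed look roughly like this (the service name and host path are illustrative):

    services:
      hound:
        volumes:
          # named docker volume (current patchset): data lands under /var/lib/docker
          # - data:/data
          # bind mount (suggested): a host path we can grow with lvm/cinder like any other fs
          - /var/lib/hound:/data
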
openstackgerritIan Wienand proposed opendev/system-config master: Migrate codesearch site to container  https://review.opendev.org/76296020:41
openstackgerritIan Wienand proposed opendev/system-config master: Add codesearch.opendev.org server  https://review.opendev.org/76329820:41
openstackgerritIan Wienand proposed opendev/system-config master: base: Remove rsyslogd reinstall  https://review.opendev.org/76343120:42
*** zaro has quit IRC20:44
*** zaro has joined #opendev20:47
clarkbfungi: is 2.16 sync done?20:49
*** sboyron has quit IRC20:49
fungiyep, 22m36s elapsed time20:51
fungiso in theory any update will be faster than that20:51
clarkb++20:51
fungii'm timing a nearly no-op update of both now to get an approximate lower bound20:51
clarkbone thing I just checked was that the 2.16 bugfix branch has the notedb conversion improvement changes on it and it does20:51
fungioh good20:52
clarkbI'm drawing a blank for other things we can check without redoing things we have already done. So I think I'll sneak in that bike ride as soon as the current rain passes20:52
fungiand given that stable-3.2 has no new commits since two weeks ago i'm guessing the only difference to its corresponding bugfix branch is the security fixes20:52
clarkbianw: your stack lgtm I didn't approve anything given I'll be out on the bike and also distracted by gerrit20:53
clarkbfungi: ya I checked that last night20:53
clarkbfungi: also re the jetty thing I wonder if that is only on java 1120:54
ianwclarkb: thanks for reviews.  i can babysit after ci and get gate back to working20:54
clarkbwe're doing our own java 8 builds and not seeing that20:54
fungiahh, could be20:54
fungii did already approve the containerization change20:55
funginothing under that topic should break the existing codesearch anyway, that will cut over when someone updates openstack.org dns20:55
fungiianw: oh! you probably want to add the acme cname to openstack.org dns in advance or that's going to break cert issuance?20:56
ianwfungi: already did that :)20:56
fungiyou're smarter than i20:56
fungii didn't even think to check it until just now20:56
ianwnot smarter, just have made the mistake before20:57
fungi_acme-challenge.codesearch.openstack.org is an alias for acme.opendev.org.20:57
fungiperfect!20:57
openstackgerritSlawek Kaplonski proposed zuul/zuul-jobs master: [multi-node-bridge] Add script to configure connectivity  https://review.opendev.org/76265021:01
fungiokay, mostly null re-up of the two rsync copies took 2m7s and 1m46s so that probably puts our lower bound around 2 minutes21:06
fungii'm going to estimate around 5 minutes for the 2.13 backup refresh and 10 minutes for the 2.16 backup refresh in our maintenance notes, just to have a ballpark figure21:07
fungii suppose i can time a mysqldump too, seems we don't have an estimate for that yet21:08
fungii've set myself reminders to do a status notice at 13:00 and again at 14:00. i'll go ahead and startmeeting in #opendev-maintenance at 13:00 as well so we can capture any last minute prep discussion21:15
clarkb++ thanks21:23
fungi9m16s, i'll put it down as 10m21:24
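
The dump being timed is presumably along these lines (the database name, credentials handling, and output path are assumptions, not taken from the log):

    time mysqldump --single-transaction reviewdb | gzip > /home/gerrit2/reviewdb-pre-upgrade.sql.gz
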
*** sboyron has joined #opendev21:27
*** fressi has quit IRC21:40
*** fressi has joined #opendev21:41
openstackgerritMerged opendev/zone-opendev.org master: Add codesearch.opendev.org  https://review.opendev.org/76329722:01
openstackgerritJeremy Stanley proposed opendev/system-config master: Revert "Build new gerrit images"  https://review.opendev.org/76347322:13
fungiinfra-root: ^ not sure if we want to do that at this point or not, but worth noting we can22:15
corvusfungi: commit msg says 'most' which raises questions22:16
corvusfungi: most means >= ones we care about?22:16
corvusfungi: but not 2.14 and 2.15 which we are building?  so i guess i'm confused22:17
fungicorvus: well, we're never planning to expose 2.14 and 2.15 publicly, we're just running init in each of them temporarily22:19
corvusfungi: so we don't really care which version of those we build and we're switching to the non-updated stable branches on those just to clean up the config along with the rest?22:20
fungii suppose i could have used more words in the commit message. i meant "most of the stable branches we're building (including 3.2 which we'll be exposing and 2.16 which we might roll back to)"22:21
fungiit was more so we didn't have to revert twice nor wait to get back onto stable-3.2 until stable-2.14 eventually gets those commits (if ever)22:22
fungii could amend it to be a partial revert and continue using the bugfix branches for 2.14 and 2.15. we expect to rip out all the entries <3.2 after upgrading anyway22:22
corvushashar merged a change into 2.16 recently (after the notedb fix) which may be merged up through other branches soon22:24
fungiand stable-2.15 has also updated now as we've been talking22:24
corvusit's a doc change22:25
corvuswe may get varying builds based on that if they're in the process of merging up, but i think we don't care and can just ignore it.22:25
corvusfungi: +2 and as you can tell, i actually double checked everything :)22:26
openstackgerritMerged opendev/system-config master: Migrate codesearch site to container  https://review.opendev.org/76296022:26
fungithanks, and yeah i checked that commit from hashar as well thinking at first it might impact the upgrade process, but nope22:26
*** sboyron has quit IRC22:31
fungistable-2.14 has updated now too22:53
*** DSpider has quit IRC22:55
clarkbfungi: how important do we think that is? eg should we land it right now and use those images or should we stick to the image we tested then land that after the upgrade?23:21
clarkbI'm kinda leaning towards leaving it as is before the upgrade then we can land that as part of the changes we need to land after? but if people feel strongly the other way let me know23:23
clarkbcorvus: ^ do you have a preference?23:25
*** zaro has quit IRC23:29
ianwFailed to download remote objects and refs:  error: file write error: No space left on device23:35
ianwi think our nodepool builders are unhappy23:35
clarkbI guess we were closer to the edge with only 2 builders than I thought (we removed a lot of images but then I think we added a couple back in)23:36
clarkbiirc f33 and centos-8-stream happened after we condensed to 2?23:36
ianwthis failed in letsencrypt because we keep acme.sh on /opt23:36
ianwit's 01 & 02 ... i've brought the container down while i look23:38
clarkbianw: the other thing that happens is we leak the mounts and consume a bunch of tmp space iirc23:39
ianwin this case, there doesn't seem to be any leaked mount23:39
clarkbin the past what I've done is down the builders, disable the service, reboot, rm everything in dib_tmp, then enable the service and reboot23:39
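
As shell, that recovery sequence is roughly the following (the compose path and dib_tmp location are assumptions about the builders' layout, not confirmed in the log):

    cd /etc/nodepool-builder-compose && sudo docker-compose down   # stop the builder container
    sudo rm -rf /opt/dib_tmp/*                                     # drop leaked build workspaces
    sudo reboot
    # after the reboot, docker-compose up -d brings the builder back
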
clarkbk23:39
clarkbdib_tmp has a bunch of stuff in it fwiw. Running du on it now (this is on nb01)23:40
ianwyeah, it's quicker to just rm it all and see what frees up :)23:40
clarkbwfm if you want to do that23:41
clarkbI've stopped the du23:41
openstackgerritMerged opendev/system-config master: Add codesearch.opendev.org server  https://review.opendev.org/76329823:42
ianwthe logs that would show what started it have all rotated out23:45
clarkbalso it may be worth removing the cache and rebuilding it depending on whether or not we think some of that old distro stuff is in there in large stale quantities23:46
ianwthat's freed up 43gb23:48
clarkbthere is a decent chance we've just got too many imaes for 1tb now :/23:49
ianw43gb is actually pretty tight, given the various formats we convert to23:50
clarkbya23:51
ianwthe vhd thing is ridiculous and writes it out about 3 times in total i think23:52
clarkbthe other thing is in theory they are supposed to balance out, but maybe we aren't doing that23:52
clarkbor it's just the new images filling our disks up23:52
*** hamalq has quit IRC23:53
*** hamalq has joined #opendev23:54
clarkbthinking out loud here: what if we didn't keep the raw and vhd images on disk when building qcow2? we can convert the qcow2 to the others if need be23:56
clarkbthat would require nodepool changes I bet, but maybe that is a good optimization?23:56
clarkbbasically do all the uploads then trim23:56
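
Regenerating the other formats from a kept qcow2 is cheap to redo when needed, e.g. (illustrative invocations; the real vhd pipeline for rackspace involves more massaging than a plain convert):

    qemu-img convert -O raw image.qcow2 image.raw
    qemu-img convert -O vpc -o subformat=dynamic image.qcow2 image.vhd
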
fungiclarkb: i'd be fine upgrading with the 3.2 we've tested running (note we haven't tested "upgrading" with it per se, nor with the other fixed intermediate images we built today), just wanting to make sure we can pull new commits shortly after we upgrade in case we run into any new bugs which get fixed upstream23:57
clarkbfungi: ya I think we want to revert soon, but sticking with the images we've got at this point seems good until the upgrade is done23:58
fungiwfm23:58
fungiin theory the stable-3.2 branch is currently identical to what we built from, so it shouldn't make a difference barring problems arising on rebuild, so i'm fine waiting23:59
