Saturday, 2021-05-15

*** darshna has quit IRC [00:33]
*** tkajinam has joined #opendev [02:38]
*** tkajinam has quit IRC [02:38]
*** darshna has joined #opendev [06:25]
*** calcmandan has quit IRC [06:35]
*** calcmandan has joined #opendev [06:36]
*** slaweq has quit IRC [07:38]
*** frigo has joined #opendev [07:39]
frigo: morning! opendev.org looks down [07:40]
*** frigo has quit IRC [07:47]
*** tosky has joined #opendev [07:49]
*** avass has quit IRC [08:39]
*** avass has joined #opendev [08:40]
*** akahat|ruck has quit IRC [09:20]
*** kopecmartin has quit IRC [09:22]
*** kopecmartin has joined #opendev [09:26]
*** akahat has joined #opendev [09:44]
fungi: frigo seems to be gone, but i'm looking into it now [11:45]
fungi: the haproxy-docker_haproxy_1 container on gitea-lb01 is in a "restarting" state according to `docker-compose ps` [11:49]
fungi: downing and upping the container just brings it back to a restarting state [11:50]
fungi: the last time syslog records haproxy forwarding anything was at 06:37:47 [11:52]
fungi: `docker image list` says the haproxy "latest" tag is for an image built 17 hours ago [11:54]
fungi: wondering if something changed with it [11:54]
*** ykarel has joined #opendev [11:59]
fungi: switching from latest to lts doesn't seem to have changed anything [11:59]
fungi: oh, latest and lts both seem to point to the same thing as 2.4.0 [12:01]
fungi: switching to the 2.3 tag fixed it [12:03]
fungi: i can load https://opendev.org/ again [12:03]
fungi: #status notice The load balancer for opendev.org Git services was offline between 06:37 and 12:03 utc due to unanticipated changes in haproxy 2.4 container images, but everything is in service again now [12:05]
openstackstatus: fungi: sending notice [12:05]
-openstackstatus- NOTICE: The load balancer for opendev.org Git services was offline between 06:37 and 12:03 utc due to unanticipated changes in haproxy 2.4 container images, but everything is in service again now [12:05]
fungi: infra-root: i've put gitea-lb01.opendev.org in the emergency disable list until we can merge a change to pin to the 2.3 tag [12:06]
openstackstatus: fungi: finished sending notice [12:08]
*** dmsimard has quit IRC [12:12]
openstackgerrit: Jeremy Stanley proposed opendev/system-config master: Temporarily pin haproxy image to 2.3  https://review.opendev.org/c/opendev/system-config/+/791596 [12:21]
fungi: yeah, prior to https://github.com/docker-library/haproxy/commit/ae10fbf9 yesterday, latest was an alias for 2.3 and lts was an alias for 2.2, but now they're both aliased to 2.4 [12:38]
fungi: according to https://www.haproxy.com/blog/announcing-haproxy-2-4/#validation we can use haproxy -c to check the validity of our configuration. maybe something's not quite right syntax-wise and 2.4 is catching it [12:44]
*** ykarel_ has joined #opendev [12:44]
fungi: i don't see mention of any configuration options removed in 2.4 so i doubt it's something that simple [12:47]
*** ykarel has quit IRC [12:47]
*** frigo has joined #opendev [13:16]
frigo: thanks:)  looks good now [13:17]
fungi: thanks frigo! [13:29]
fungi: appreciate the heads up [13:29]
frigo: hope you got some logs to investigate [13:30]
frigo: maybe there is a real bug upstreams?:)  if you did not know what to do during the week-end [13:30]
fungi: well, unfortunately docker wasn't logging much as to why the new images wouldn't start, but my guess is they reorganized the image layout or something between 2.3 and 2.4 [13:31]
fungi: our tests probably already reproduce the behavior so shouldn't be hard to hold a node and have a working replica of the problem combination [13:32]
fungi: maybe at some point we'll get sophisticated enough to always pin container versions for things we don't control and then auto-propose updates to those image versions so they can be tested before we deploy [13:34]
frigo: but.. if the new image does not start, the change was still rolled out on all the nodes ? [13:34]
fungi: frigo: right now we just always deploy the "latest" tag of the haproxy image from dockerhub [13:34]
fungi: and the problem is that latest flipped from 2.3.x to 2.4.0 earlier today [13:35]
fungi: which for some reason doesn't start [13:35]
frigo: yeah I get that, but the change kept being rolled out even after a first node failed? [13:35]
fungi: there was no "change" which was rolled out [13:36]
fungi: we continuously update the containers [13:36]
fungi: whenever "latest" updates on dockerhub, periodic redeployment picks that up automatically [13:37]
fungi: it's not tested [13:37]
frigo: ok ok :D [13:37]
fungi: it was designed naively to assume whatever the latest tag for haproxy is would work the same as previous versions [13:37]
fungi: there are ways we could catch that, we just haven't implemented them due to limited supply of people involved in running all this [13:38]
*** slaweq has joined #opendev [13:38]
frigo: of course of course [13:38]
fungi: we do test the images we build, but we're not building an haproxy image we're just consuming the "official" one [13:39]
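[Editor's note] The pinning idea discussed above can be illustrated with a minimal sketch that flags image references relying on a floating tag. This is not part of the opendev tooling: the function names, the set of floating tags, and the rule that a tag counts as pinned only when it starts with a numeric major.minor version are all assumptions made for this example.

```python
import re

# Assumption for illustration: tags treated as "floating" aliases.
FLOATING_TAGS = {"latest", "lts"}


def split_image_ref(ref):
    """Split 'name[:tag]' into (name, tag); tag is None when absent."""
    name, sep, tag = ref.rpartition(":")
    # A '/' after the last colon means the colon belonged to a
    # registry host:port, not to a tag.
    if not sep or "/" in tag:
        return ref, None
    return name, tag


def is_pinned(ref):
    """Return True if the reference names a digest or a numeric version tag."""
    if "@sha256:" in ref:
        return True
    _, tag = split_image_ref(ref)
    if tag is None or tag in FLOATING_TAGS:
        # No tag means the registry default ("latest") is used.
        return False
    return re.match(r"\d+\.\d+", tag) is not None


if __name__ == "__main__":
    for ref in ("docker.io/library/haproxy:latest", "haproxy:lts", "haproxy:2.3"):
        print(ref, "pinned" if is_pinned(ref) else "floating")
```

Note that a tag such as 2.3 still moves across patch releases; only a digest reference is fully immutable.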
*** slaweq has quit IRC [13:47]
fungi: infra-root: i've self-approved https://review.opendev.org/791596 so i can remove the emergency disable entry for the load balancer in short order [13:48]
*** frigo has quit IRC [13:49]
fungi: in theory the revert of that should fail its system-config-run-gitea build and then we can use that to investigate further [13:49]
*** slaweq has joined #opendev [14:08]
*** tosky has quit IRC [14:11]
*** DSpider has joined #opendev [14:22]
*** ykarel_ has quit IRC [14:23]
openstackgerrit: Merged opendev/system-config master: Temporarily pin haproxy image to 2.3  https://review.opendev.org/c/opendev/system-config/+/791596 [14:36]
*** slaweq has quit IRC [14:49]
*** ykarel_ has joined #opendev [16:06]
*** ricolin has quit IRC [16:20]
*** ykarel_ has quit IRC [16:40]
openstackgerrit: Jeremy Stanley proposed opendev/system-config master: Revert "Temporarily pin haproxy image to 2.3"  https://review.opendev.org/c/opendev/system-config/+/791598 [16:43]
fungi: it's been long enough since 791596 merged i'm taking gitea-lb01 back out of the emergency disable list now [16:43]
fungi: and setting an autohold for system-config-run-gitea on 791598 [16:44]
fungi: if i can figure out how to plumb that through docker-compose exec now [16:49]
fungi: yay got it added [16:50]
fungi: will check back later when that (hopefully) fails [16:51]
fungi: yep, as predicted system-config-run-gitea failed on the revert, at least our testing is predictable there. held node is 188.212.108.136 [17:36]
fungi: so at first pass, it looks like a permissions error: https://zuul.opendev.org/t/openstack/build/678857cf702541d789cd3daa3d829907/log/gitea-lb01.opendev.org/docker/haproxy-docker_haproxy_1.txt#5-7 [17:41]
*** brinzhang has joined #opendev [18:01]
*** brinzhang0 has quit IRC [18:04]
*** auristor has quit IRC [19:15]
*** auristor has joined #opendev [19:19]
*** tosky has joined #opendev [19:20]
*** dmsimard has joined #opendev [21:21]
*** darshna has quit IRC [22:03]
*** irclogbot_0 has quit IRC [23:30]
*** irclogbot_1 has joined #opendev [23:33]
*** tosky has quit IRC [23:34]
*** artom has quit IRC [23:45]
*** artom has joined #opendev [23:45]
