Thursday, 2019-07-25

*** xek has joined #openstack-infra00:01
*** whoami-rajat has quit IRC00:01
*** yamamoto has joined #openstack-infra00:02
openstackgerritMerged zuul/nodepool master: Add nodepool_debug flag to openstack functional jobs  https://review.opendev.org/66993900:15
*** jistr has quit IRC00:15
*** jistr has joined #openstack-infra00:15
openstackgerritIan Wienand proposed zuul/nodepool master: [wip] functional testing: test journal-to-console element  https://review.opendev.org/66978700:21
openstackgerritIan Wienand proposed zuul/nodepool master: Enable debug logs for openstack-functional tests  https://review.opendev.org/67241200:23
openstackgerritIan Wienand proposed zuul/nodepool master: [wip] functional testing: test journal-to-console element  https://review.opendev.org/66978700:23
*** larainema_ has joined #openstack-infra00:48
*** larainema_ is now known as larainema00:49
*** gyee has quit IRC00:49
*** ricolin has joined #openstack-infra00:55
ianwclarkb: http://logs.openstack.org/87/669787/9/check/nodepool-functional-openstack-src/235e201/nodepool/nodepool-launcher.log00:56
ianw@ around 2019-07-25 00:52:37,654 ... sending the systemd output to the journal, it gets captured ok ... i think that will be helpful in general for any such future issues00:57
*** igordc has quit IRC01:04
*** yamamoto has quit IRC01:04
clarkbya that bit was working iirc01:07
*** slaweq has joined #openstack-infra01:11
*** slaweq has quit IRC01:15
*** tdasilva has quit IRC01:20
*** tdasilva has joined #openstack-infra01:21
openstackgerritIan Wienand proposed openstack/diskimage-builder master: journal-to-console: element to send systemd journal to console  https://review.opendev.org/66978401:25
*** mriedem has quit IRC01:49
*** Frootloop has quit IRC02:09
*** jcoufal has joined #openstack-infra02:19
*** jcoufal has quit IRC02:33
*** yamamoto has joined #openstack-infra02:46
*** bhavikdbavishi has joined #openstack-infra02:51
*** bhavikdbavishi1 has joined #openstack-infra02:54
*** bhavikdbavishi has quit IRC02:55
*** bhavikdbavishi1 is now known as bhavikdbavishi02:55
*** ykarel|away has joined #openstack-infra02:56
*** whoami-rajat has joined #openstack-infra03:06
*** slaweq has joined #openstack-infra03:11
openstackgerritClark Boylan proposed opendev/system-config master: Remove gitea02 from inventory so we can replace it  https://review.opendev.org/67262103:13
clarkbfungi: ^ head start on tomorrow03:13
*** slaweq has quit IRC03:16
openstackgerritIan Wienand proposed zuul/nodepool master: Functional testing: add journal-to-console element  https://review.opendev.org/66978703:35
*** eernst has joined #openstack-infra03:36
*** psachin has joined #openstack-infra03:38
*** yamamoto has quit IRC03:42
*** yamamoto has joined #openstack-infra03:46
*** yamamoto has quit IRC03:51
*** yamamoto has joined #openstack-infra03:53
*** rcernin has quit IRC03:55
*** yamamoto has quit IRC03:57
*** yamamoto has joined #openstack-infra04:02
*** lmiccini has quit IRC04:04
*** lmiccini has joined #openstack-infra04:05
*** udesale has joined #openstack-infra04:06
*** dchen has quit IRC04:07
*** ykarel|away has quit IRC04:08
*** dchen has joined #openstack-infra04:10
*** yolanda has quit IRC04:21
*** yolanda has joined #openstack-infra04:22
*** ykarel|away has joined #openstack-infra04:34
*** pcaruana has joined #openstack-infra04:44
*** pcaruana has quit IRC04:56
*** slittle1 has joined #openstack-infra05:04
*** slittle1 has quit IRC05:09
*** slaweq has joined #openstack-infra05:11
openstackgerritMatthew Thode proposed openstack/diskimage-builder master: support alternate portage directories  https://review.opendev.org/67153005:15
*** slaweq has quit IRC05:16
*** kopecmartin|offf is now known as kopecmartin05:18
*** eernst has quit IRC05:22
*** rcernin has joined #openstack-infra05:33
*** ykarel|away is now known as ykarel05:42
*** dchen has quit IRC05:50
openstackgerritMerged openstack/diskimage-builder master: Enable nodepool debugging for functional tests  https://review.opendev.org/67260806:00
*** kjackal has joined #openstack-infra06:01
*** jaosorior has quit IRC06:03
*** rcernin has quit IRC06:03
*** yamamoto has quit IRC06:07
*** dchen has joined #openstack-infra06:09
*** slaweq has joined #openstack-infra06:11
*** slaweq has quit IRC06:15
openstackgerritKartikeya Jain proposed openstack/diskimage-builder master: Adding new dib element  https://review.opendev.org/57877306:18
*** yamamoto has joined #openstack-infra06:18
*** rcernin has joined #openstack-infra06:18
*** jaosorior has joined #openstack-infra06:20
*** pcaruana has joined #openstack-infra06:21
*** rcernin has quit IRC06:21
*** rcernin has joined #openstack-infra06:21
*** jaicaa has quit IRC06:28
AJaegerinfra-root, I cannot login to Zanata at translate.openstack.org, is our openid somehow broken? I do not get a login screen at all ;(06:30
*** dpawlik has joined #openstack-infra06:31
*** jaicaa has joined #openstack-infra06:31
*** joeguo has quit IRC06:33
*** slaweq has joined #openstack-infra06:33
*** udesale has quit IRC06:33
*** udesale has joined #openstack-infra06:34
*** cshen has joined #openstack-infra06:36
cshenmorning, is opendev.org DOWN?06:36
AJaegerhttps://opendev.org/ is up - what exactly is failing for you?06:37
*** abhishekk has joined #openstack-infra06:38
AJaegerinfra-root, do we have gitea problem again?06:38
openstackgerritKartikeya Jain proposed openstack/diskimage-builder master: Adding support for SLES 15 in element 'sles'  https://review.opendev.org/61918606:38
AJaegerI get: "fatal: unable to access 'https://opendev.org/openstack/openstack-manuals.git/': Empty reply from server"06:38
AJaegercshen: is that your problem as well? ^06:38
AJaegerinfra-root, this is running a git pull from opendev06:38
abhishekkhi, I am not able to access https://opendev.org/openstack/glance/ or https://opendev.org/openstack/glance_store/06:39
abhishekkis there any problem?06:39
cshenAJaeger: opendev.org is not accessible.06:39
AJaegerabhishekk: seems so, see the last lines06:39
*** marios|ruck has joined #openstack-infra06:39
AJaegercshen: Which URL exactly? The git clone or anything else?06:40
cshenwhat luck, it just happened when we started our major upgrade :-D06:40
abhishekkAJaeger, ack06:40
cshenAJaeger: basically, the whole site is not accessible.06:40
AJaegercshen: for me https://opendev.org/ works on top level, so are you running into the same problem with git cloning that abhishekk and myself do or is there another one? How exactly can we reproduce?06:41
cshenAJaeger: git clone failed by me as well.06:42
AJaeger#infra log cloning with git from opendev is failing06:42
yoctozeptoAJaeger: does not work from here either06:42
yoctozeptoby browser either06:42
yoctozeptoseems like connection issue?06:42
cshenyoctozepto: it seems that the site is down.06:43
yoctozeptocshen: AJaeger has just claimed it works for him :D06:43
yoctozeptotop-level, from browser, does not load for me06:43
cshenyoctozepto: I can't access opendev.org from Germany right now. neither HTTP nor git clone.06:44
yoctozeptoPoland here06:44
yoctozeptoPodlachia region (north east)06:44
AJaegeryoctozepto: git cloning fails for me, https://opendev.org (top-level) works but nothing git related like browsing repositories - from Germany06:45
abhishekkme From India - Asia06:45
abhishekknot able to clone or access via browser06:45
AJaeger#status alert The git service on opendev.org is currently down.06:46
openstackstatusAJaeger: sending alert06:46
* AJaeger sends an alert to reduce questions ;)06:46
*** rlandy has joined #openstack-infra06:46
AJaegerI think we can all agree that git is broken - and without an admin around, nothing we can do until the US wakes up. So, this might take another 5 hours...06:47
AJaegeryoctozepto, cshen , abhishekk, thanks for reporting - and sorry for this. But nothing we can do right now06:48
*** pgaxatte has joined #openstack-infra06:48
abhishekkAJaeger, ack06:48
-openstackstatus- NOTICE: The git service on opendev.org is currently down.06:49
*** ChanServ changes topic to "The git service on opendev.org is currently down."06:49
yoctozeptoAJaeger: roger that, git is definitely down when all http is down :-)06:49
yoctozeptoit's odd06:50
yoctozeptoI debugged it06:50
*** dpawlik has quit IRC06:50
yoctozeptohttp does a redirect to https06:50
yoctozeptohttps negotiates tls session06:50
yoctozeptoand hangs06:51
yoctozeptoafter tunnel is established06:51
yoctozeptoshould be region independent06:51
yoctozeptohttp://paste.openstack.org/show/754833/06:52
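A minimal way to reproduce what yoctozepto describes (redirect, TLS handshake completes, then the connection just hangs), assuming curl is available - purely illustrative:

    curl -v --max-time 10 http://opendev.org/    # 301 redirect to https
    curl -v --max-time 10 https://opendev.org/   # TLS completes, then no HTTP response before the timeout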
*** jpena|off is now known as jpena06:52
yoctozeptocould it be that it banned us at app level? ;d06:52
*** jpena is now known as jpena|mtg06:53
openstackstatusAJaeger: finished sending alert06:53
cshenAJaeger: ack, any backup git repo which we could check out?06:53
yoctozeptocshen: review.opendev.org seems to still work06:54
cshenyoctozepto: same here06:54
yoctozeptocshen: cool, I meant you can use the repos via gerrit06:54
Tenguwait, comodo CA is still alive ?!06:56
openstackgerritMatthieu Huin proposed zuul/zuul master: Zuul CLI: allow access via REST  https://review.opendev.org/63631506:56
yoctozeptoTengu: that's what it seems, at least for this cert06:57
Tengusurprising..... didn't they get an intrusion and the CA stolen?06:57
Tengu(now, wondering why not using something free like «let's encrypt» :D)06:58
openstackgerritMatthieu Huin proposed zuul/zuul master: Add Authorization Rules configuration  https://review.opendev.org/63985506:58
openstackgerritMatthieu Huin proposed zuul/zuul master: Web: plug the authorization engine  https://review.opendev.org/64088406:59
cshenyoctozepto: could you give me an example of repo url?06:59
yoctozeptoTengu: yup, as long as you don't need EV (i.e. you are not a payment processing org)06:59
openstackgerritMatthieu Huin proposed zuul/zuul master: Zuul Web: add /api/user/authorizations endpoint  https://review.opendev.org/64109906:59
Tenguyoctozepto: of course :).06:59
openstackgerritMatthieu Huin proposed zuul/zuul master: authentication config: add optional token_expiry  https://review.opendev.org/64240806:59
yoctozeptocshen: sure, it requires you to be a registered user though:06:59
yoctozepto[remote "gerrit"]07:00
yoctozepto        url = ssh://yoctozepto@review.opendev.org:29418/openstack/kolla-ansible.git07:00
yoctozepto        fetch = +refs/heads/*:refs/remotes/gerrit/*07:00
*** apetrich has quit IRC07:00
yoctozeptochange to your username obviously07:01
Tenguhttps://review.opendev.org/openstack/tripleo-ci  also07:01
Tenguanonymous07:01
Tenguand http(s)07:01
cshenor maybe use the repos in github.com?07:02
cshenit seems to be 1:1 mirrored.07:02
yoctozeptoTengu: it said 'not found'?07:02
yoctozeptocshen: yeah, openstack/ are07:02
Tenguo_O that's the link provided within the project listing of gerrit07:03
yoctozeptothough wonder if lack of opendev.org did not stop sync at some point07:03
Tengufor instance: https://review.opendev.org/#/admin/projects/openstack/tripleo-ci07:03
*** odicha has joined #openstack-infra07:03
yoctozeptoTengu: yeah, it worked now07:03
*** jamesmcarthur has joined #openstack-infra07:04
Tengubut the git link doesn't....07:04
Tenguthat's interesting.07:04
yoctozeptoit works from git, not browser, just checked07:04
Tenguhmm.... didn't work for me using git.07:04
yoctozeptothen it's magic07:04
Tengu {"changed": false, "cmd": ["/bin/git", "fetch", "origin"], "msg": "Failed to download remote objects and refs:  fatal: remote error: Git repository not found\n"}07:05
Tenguunless... wait.07:05
yoctozepto$ git clone https://review.opendev.org/openstack/tripleo-ci07:05
yoctozeptoCloning into 'tripleo-ci'...07:05
yoctozeptoremote: Counting objects: 13343, done07:05
yoctozeptoremote: Finding sources: 100% (13343/13343)07:05
yoctozeptoremote: Total 13343 (delta 6671), reused 11016 (delta 6671)07:05
yoctozeptoReceiving objects: 100% (13343/13343), 5.99 MiB | 3.27 MiB/s, done.07:05
yoctozeptoResolving deltas: 100% (6671/6671), done.07:05
yoctozeptoso anonymous https works too via gerrit07:05
yoctozeptogood to know07:05
Tenguoh, my fault.07:05
yoctozeptonext time gitea refuses to work07:05
Tenguwas still using the old project "openstack-infra".07:06
*** rcernin has quit IRC07:06
*** rlandy is now known as rlandy|mtg07:07
yoctozeptoAJaeger: wonder if you can send an announcement about the availability of git repos via gerrit?07:07
yoctozeptoshould make ppl happier07:07
yoctozeptothe path seems to be exactly the same07:08
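For an existing checkout, repointing the remote at review.opendev.org is a one-liner; a minimal sketch using kolla-ansible (mentioned above) as the example path - substitute whichever project you need:

    git remote set-url origin https://review.opendev.org/openstack/kolla-ansible
    git fetch origin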
ianwhrrm, this is definitely not my area of knowledge with the changes going on atm07:09
yoctozepto#status info The git service on review.opendev.org can be used in place of opendev.org's - project paths are preserved07:12
yoctozepto(was worth trying ;D )07:12
*** tesseract has joined #openstack-infra07:15
*** iurygregory has joined #openstack-infra07:15
*** udesale has quit IRC07:16
*** iokiwi has quit IRC07:17
*** adriant has quit IRC07:17
*** dpawlik has joined #openstack-infra07:17
*** udesale has joined #openstack-infra07:18
*** iokiwi has joined #openstack-infra07:18
*** adriant has joined #openstack-infra07:18
*** gfidente has joined #openstack-infra07:20
ianw[12660704.934832] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.07:21
ianw[12660825.726429] INFO: task jbd2/vda1-8:248 blocked for more than 120 seconds.07:21
ianw[12660825.732761]       Not tainted 4.15.0-45-generic #48-Ubuntu07:21
ianwthis is on the gitea-lb01 console07:22
ianwhttp://paste.openstack.org/show/754834/ for posterity07:22
ianwi think it needs a reboot ... i guess it can't make it worse07:22
*** aedc has joined #openstack-infra07:22
*** rpittau|afk is now known as rpittau07:22
*** raissa has quit IRC07:23
*** raissa has joined #openstack-infra07:24
*** raissa has joined #openstack-infra07:25
ianwgreat, now it is in error state07:26
yoctozeptolife is full of surprises07:27
cshenianw: do we have only one server for serving git service?07:27
ianwcshen: one load balancer, anyway :/07:28
yoctozeptocshen: but review.opendev.org works with the same paths07:30
*** Goneri has joined #openstack-infra07:30
yoctozeptoso it's a no-brainer actually to replace ;D07:30
ianwi think this is a problem on vexxhost that i can't solve07:30
yoctozeptodiscussed a bit above07:30
yoctozeptocshen: change opendev.org to review.opendev.org and it should magically work (for git)07:31
cshenyoctozepto: yes, I checked, I even checked out from github.com. But the upgrade scripts have some dependencies on opendev.org.07:32
*** kobis1 has joined #openstack-infra07:32
yoctozeptocshen: what scripts are you talking about?07:32
ianwi don't think there's much i can do at this point.  either vexxhost need to look at what's going on in the backend and recover the server, or we need to build a new one07:34
ianwmnaser: ^07:35
*** dchen has quit IRC07:35
cshenyoctozepto: https://github.com/openstack/openstack-ansible/blob/master/scripts/bootstrap-ansible.sh07:39
yoctozeptoah, osa07:40
cshenit pulls a lot of things from opendev.org07:40
*** ykarel is now known as ykarel|lunch07:41
noonedeadpunkguilhermesp probably you can help with opendev thing ^07:42
yoctozeptoyeah, kolla's CI does too, it is broken for the moment07:43
yoctozeptomostly due to redirect from upper-constraints to opendev07:43
yoctozepto;D07:43
*** priteau has joined #openstack-infra07:45
*** marekchm has joined #openstack-infra07:50
*** tkajinam has quit IRC07:53
*** tkajinam has joined #openstack-infra07:53
AJaegeryoctozepto: upper-constraints should be downloaded from releases.openstack.org07:57
*** jaosorior has quit IRC07:57
AJaegeryoctozepto: e.g. https://releases.openstack.org/constraints/upper/master07:57
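For reference, that releases.openstack.org constraints URL can be handed straight to pip, which avoids depending on opendev.org at all; a minimal sketch (the requirements file name is just an example):

    pip install -c https://releases.openstack.org/constraints/upper/master -r requirements.txt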
AJaegerianw: do you know how to take git01 out of haproxy?07:58
ianw#status log sent email update about opendev.org downtime, appears to be vexxhost region-wide http://lists.openstack.org/pipermail/openstack-infra/2019-July/006426.html07:58
openstackstatusianw: finished logging07:58
AJaegerianw: thanks !07:58
yoctozeptoAJaeger: yeah and that REDIRECTS ;D07:58
ianwAJaeger: ^ see above email.  not only does the load-balancer have issues, but the gitea backend servers also have kernel errors about storage.  i think it's a region wide issue on vexxhost07:58
ianwso yeah, just rebuilding the lb somewhere else won't help07:59
yoctozeptoto opendev which is utterly broken atm ;/07:59
AJaegeryoctozepto: oh, it redirects? didn't know that ;(08:00
AJaegerianw: argh ;/08:00
yoctozeptoAJaeger: yeah, unfortunately, someone even suggested it was inefficient when it was proposed08:00
yoctozeptoforgot it could also be "unstable"08:01
ianwAJaeger: yeah sorry i've got to step away, but i think the most practical thing is to wait for vexxhost to confirm issues08:03
*** dtantsur|afk is now known as dtantsur08:11
*** pkopec has joined #openstack-infra08:11
*** lucasagomes has joined #openstack-infra08:12
*** pkopec has quit IRC08:12
*** pkopec has joined #openstack-infra08:12
*** ralonsoh has joined #openstack-infra08:13
AJaegerianw: I'm in meetings all day, so not much time either (and even fewer options than you have). Is the alert good enough or do you have a proposal to change it?08:13
jamesmcarthurianw: yeah, openstack.org, etc... are all down as well08:16
jamesmcarthurif anyone is asking :|08:16
yoctozeptojamesmcarthur, ianw, AJaeger: oh, that escalated pretty quickly08:21
*** apetrich has joined #openstack-infra08:24
*** fdegir has joined #openstack-infra08:24
AJaegerSo, is the following ok to send out "Services at opendev.org like our git server and at openstack.org are currently down, looks like an outage in one of our cloud providers." ?08:26
*** siqbal has joined #openstack-infra08:26
ianwjamesmcarthur: yeah, i guess that goes through the same lb08:27
yoctozeptoAJaeger: looks fine08:27
*** panda has quit IRC08:28
yoctozeptoguys, https://review.opendev.org/671178 , are cyclic dependencies possible?08:29
yoctozeptoI get no error but it does not seem to be picked up08:29
yoctozepto;/08:29
AJaeger#status alert Services at opendev.org like our git server and at openstack.org are currently down, looks like an outage in one of our cloud providers.08:29
openstackstatusAJaeger: sending alert08:29
AJaegeryoctozepto: cyclic dependencies are not fine - Zuul will refuse to test these since it cannot put them in any sequential order08:30
*** tosky has joined #openstack-infra08:30
*** panda has joined #openstack-infra08:31
-openstackstatus- NOTICE: Services at opendev.org like our git server and at openstack.org are currently down, looks like an outage in one of our cloud providers.08:32
*** ChanServ changes topic to "Services at opendev.org like our git server and at openstack.org are currently down, looks like an outage in one of our cloud providers."08:32
* AJaeger sends an email to openstack-discuss now as well...08:32
noonedeadpunkso, seems that mnaser just fixed balancer08:33
AJaegercool!08:34
cshenthanks, better now.08:34
AJaegerare we green again?08:34
AJaegerlooks good on my end...08:35
cshenI'm bootstrapping.08:35
AJaegernoonedeadpunk: thanks for telling us08:35
yoctozeptolooks green08:35
AJaegerok, then I'll send the "ok" ;)08:35
*** ysastri has joined #openstack-infra08:36
*** wpp has joined #openstack-infra08:36
AJaeger#status ok The problem in our cloud provider has been fixed, services should be working again08:36
openstackstatusAJaeger: finished sending alert08:36
*** tkajinam has quit IRC08:36
openstackstatusAJaeger: sending ok08:36
AJaegermnaser: thanks for fixing!08:36
*** kobis1 has quit IRC08:37
noonedeadpunkAJaeger: I guess you should have sent the alert a bit earlier - probably we'd have gotten a solution faster :P08:38
*** sshnaidm has quit IRC08:38
*** dkopper has joined #openstack-infra08:39
*** ChanServ changes topic to "Discussion of OpenStack Developer and Community Infrastructure | docs http://docs.openstack.org/infra/ | bugs https://storyboard.openstack.org/ | source https://opendev.org/opendev/ | channel logs http://eavesdrop.openstack.org/irclogs/%23openstack-infra/"08:39
AJaegernoonedeadpunk: first alert was sent two hours ago - as soon as it was reported...08:39
-openstackstatus- NOTICE: The problem in our cloud provider has been fixed, services should be working again08:39
jamesmcarthurappears everything is back online now08:39
noonedeadpunkah...08:40
AJaegerjamesmcarthur: thanks for confirming.08:41
* AJaeger is offline again...08:41
*** sshnaidm has joined #openstack-infra08:42
cshenmine is still working.08:42
openstackstatusAJaeger: finished sending ok08:43
mnasersorry about that, this should not have happened and I'm a bit embarrassed at how it all went down08:46
mnaserAnd sorry for the lack of communication on my side.08:46
mnaserAlso, is it possible to drop max-servers to 0 in sjc for now?08:47
*** jamesmcarthur has quit IRC08:48
ianwmnaser: np, stuff happens!  yep we can, is it a fast-merge situation?08:53
*** apetrich has quit IRC08:53
mnaserianw: I mean I kinda disabled the user already on my side08:53
mnaserSo not really unless it breaks you a whole ton having the OpenStack Jenkins user disabled08:54
openstackgerritAndreas Jaeger proposed openstack/project-config master: Disable sjc  https://review.opendev.org/67266208:54
AJaegerianw: want to fast-merge ^08:54
AJaegerand apply on the server directly?08:54
ianwAJaeger: heh, you beat me to it :)08:55
ianwAJaeger: umm i can, maybe it will miss a puppet run.  with the remote end disabled we'll just timeout08:55
AJaegeryou're the expert ;)08:56
ianwi'd never claim that :)  but i've set it to zero on nl03 in the meantime anyway08:57
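The fast-merge change AJaeger pushed (672662) boils down to zeroing max-servers for that provider in project-config's nodepool configuration; a rough sketch of the shape of that edit - the provider and pool names here are illustrative, not copied from the real file:

    providers:
      - name: vexxhost-sjc1      # illustrative provider name
        pools:
          - name: main
            max-servers: 0       # 0 stops nodepool launching new nodes there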
*** ykarel|lunch is now known as ykarel08:57
*** jtomasek has joined #openstack-infra09:01
*** joeguo has joined #openstack-infra09:01
*** kobis1 has joined #openstack-infra09:02
*** siqbal90 has joined #openstack-infra09:02
*** apetrich has joined #openstack-infra09:02
*** siqbal has quit IRC09:04
openstackgerritFabien Boucher proposed zuul/zuul master: Return dependency cycle failure to user  https://review.opendev.org/67248709:12
*** lpetrut has joined #openstack-infra09:15
*** lpetrut has quit IRC09:16
*** lennyb has joined #openstack-infra09:16
*** lpetrut has joined #openstack-infra09:16
*** kobis1 has quit IRC09:24
openstackgerritMerged openstack/project-config master: Disable sjc  https://review.opendev.org/67266209:24
*** e0ne has joined #openstack-infra09:32
*** yamamoto has quit IRC09:39
*** apetrich has quit IRC09:42
*** ysastri has quit IRC09:52
*** bhavikdbavishi has quit IRC09:52
openstackgerritFabien Boucher proposed zuul/zuul master: Fix reference pipelines syntax coloration for Pagure driver  https://review.opendev.org/67267709:54
*** Lucas_Gray has joined #openstack-infra09:55
*** Lucas_Gray has quit IRC10:06
openstackgerritFabien Boucher proposed zuul/zuul master: Add reference pipelines file for Gerrit driver  https://review.opendev.org/67268310:12
*** yamamoto has joined #openstack-infra10:17
*** yamamoto has quit IRC10:27
*** yamamoto has joined #openstack-infra10:27
*** siqbal has joined #openstack-infra10:33
*** siqbal90 has quit IRC10:34
*** abhishekk has quit IRC10:38
*** ykarel is now known as ykarel|afk10:43
*** jaosorior has joined #openstack-infra10:47
*** yamamoto has quit IRC10:54
*** yamamoto has joined #openstack-infra11:01
*** yamamoto has quit IRC11:06
*** adriant has quit IRC11:07
*** adriant has joined #openstack-infra11:07
*** jaosorior has quit IRC11:08
*** udesale has quit IRC11:13
*** marekchm has quit IRC11:13
*** cshen has quit IRC11:25
*** cshen has joined #openstack-infra11:28
*** yamamoto has joined #openstack-infra11:32
*** rh-jelabarre has joined #openstack-infra11:35
*** yamamoto has quit IRC11:37
*** stakeda has quit IRC11:39
*** pcaruana has quit IRC11:42
*** bhavikdbavishi has joined #openstack-infra11:42
*** igordc has joined #openstack-infra11:43
*** mriedem has joined #openstack-infra11:51
*** apetrich has joined #openstack-infra11:58
*** armax has quit IRC11:58
*** armax has joined #openstack-infra11:59
*** ykarel|afk is now known as ykarel12:00
*** lmiccini has quit IRC12:02
*** dpawlik has quit IRC12:02
*** lmiccini has joined #openstack-infra12:08
*** iurygregory has quit IRC12:11
*** yamamoto has joined #openstack-infra12:11
*** iurygregory has joined #openstack-infra12:11
*** lmiccini has quit IRC12:15
*** yamamoto has quit IRC12:17
*** yamamoto has joined #openstack-infra12:18
*** dpawlik has joined #openstack-infra12:21
*** pcaruana has joined #openstack-infra12:22
*** aedc has quit IRC12:25
*** yamamoto has quit IRC12:27
openstackgerritMonty Taylor proposed zuul/zuul master: Improve SQL query performance in some cases  https://review.opendev.org/67260612:31
*** jcoufal has joined #openstack-infra12:34
openstackgerritFabien Boucher proposed zuul/zuul master: Add reference pipelines file for Github driver  https://review.opendev.org/67271212:41
openstackgerritFabien Boucher proposed zuul/zuul master: Add change replacement field in doc for start-message  https://review.opendev.org/66597412:44
*** joeguo has quit IRC12:47
*** aaronsheffield has joined #openstack-infra12:56
*** yamamoto has joined #openstack-infra13:00
*** ekultails has joined #openstack-infra13:01
*** gtarnaras has joined #openstack-infra13:06
*** rfarr has joined #openstack-infra13:07
*** rfarr_ has joined #openstack-infra13:07
*** bhavikdbavishi has quit IRC13:07
*** bhavikdbavishi has joined #openstack-infra13:10
*** yamamoto has quit IRC13:14
*** udesale has joined #openstack-infra13:16
*** ykarel is now known as ykarel|away13:22
*** jhesketh has quit IRC13:22
*** jaosorior has joined #openstack-infra13:23
*** jhesketh has joined #openstack-infra13:26
*** ykarel_ has joined #openstack-infra13:27
petevgI've got a question about the outage earlier: I had a change that got merged around the same time as the outage, and it seems to have been merged to gerrit's view of the master branch, but not to origin's view of the master branch.13:29
petevgThis is https://opendev.org/x/microstack13:29
*** ykarel|away has quit IRC13:29
petevgMy local view of the change that didn't get merged to origin looks like this:13:29
petevgcommit 59551ca2cdf387fb3a1e857f3aeb89912731e3f2 (HEAD -> master, gerrit/master, multipass-testing-support)13:30
petevgAs opposed to my local view of the last change to appear in "origin's" master:13:30
petevgcommit 8ea5dc8679eea1921888fec1a3d468c0b3ae09ce (origin/master, origin/HEAD)13:30
petevgDoes anybody have a suggestion for a fix? I'm thinking of just running git review on my local copy of master, which I've manually pulled from gerrit, to see if that triggers the gate to fix things ...13:31
*** goldyfruit has joined #openstack-infra13:32
AJaegerpetevg: what is link for the change?13:32
*** ykarel_ has quit IRC13:32
petevgAJaeger: https://review.opendev.org/#/c/672586/13:32
AJaegerpetevg: where exactly are you missing it?13:33
petevgAJaeger: if I git clone https://opendev.org/x/microstack.git, the change doesn't show up in the master branch.13:34
petevgAJaeger: (also, if I just "git pull origin master" on the previously cloned repo.)13:34
AJaegerpetevg: I see it on https://opendev.org/x/microstack - let me check cloning13:34
AJaegerpetevg: I just downloaded and it's there...13:35
AJaegerit's also here https://opendev.org/x/microstack/commit/59551ca2cdf387fb3a1e857f3aeb89912731e3f213:35
petevgAJaeger: yeah. I see it there, too. That's why I pasted the commit lines from git log above. It's in a weird state where it's merged to HEAD and gerrit/master, but not to origin/master.13:35
petevgI'll try recloning. Maybe it fixed itself while I was poking at it.13:36
*** ricolin has quit IRC13:36
petevgAJaeger: nope. It's still not there when you clone.13:36
AJaegerIt is fine on my end - but we have a git farm. So, if it still fails for you, we need help from an admin to check each of the systems in the git farm - maybe you hit the one that is out of sync13:36
fungiit's possible that some gitea backends are missing some objects which could have replicated at the time13:37
petevgAJaeger: that would make sense. Just to verify, when you say "download", do you mean that you grabbed a tarball, or that you cloned w/ git?13:37
fungiprobably best if we force replication to all of them from gerrit just to be sure13:37
AJaegercloned with git - git clone https://opendev.org/x/microstack13:37
AJaegerfungi: yeah...13:37
petevgAJaeger: cool. fungi: thank you!13:38
fungiyou can reach them individually without going through the lb like http://gitea08.opendev.org:3080/x/microstack13:38
petevgfungi: ooh, cool. I can self service on the troubleshooting next time :-)13:39
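One way to self-service that check is to ask each backend directly for its view of the branch tip; a sketch assuming the eight-backend farm and that each gitea serves git over the same 3080 port as its web UI:

    for i in 01 02 03 04 05 06 07 08; do
      echo "gitea$i:"
      git ls-remote "http://gitea$i.opendev.org:3080/x/microstack.git" refs/heads/master
    done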
fungianyway, mass replicating to all of them is likely a good precaution but it will take some hours to complete and will delay replication of newer refs13:40
petevgfungi: If I've got a new ref ready to merge, will that fix it?13:40
petevgBecause I'm selfishly okay w/ that. I don't know whether anybody else was affected, though.13:41
fungipetevg: for that one repo, it should13:41
fungiodds are there are plenty of missing refs if there's at least one13:41
*** wpp has quit IRC13:41
petevgYeah ...13:41
*** rfarr_ has quit IRC13:41
*** rfarr has quit IRC13:41
petevgI won't complain about any delays when/if you decide to kick off the mass replication, then. I have a lot of meetings today, anyway :-)13:42
fungii'll give #openstack-release a heads up so they don't approve any openstack release changes while this is still going on13:42
*** jaosorior has quit IRC13:43
*** apetrich has quit IRC13:43
*** yamamoto has joined #openstack-infra13:44
fungi~17k gerrit replication tasks queued13:47
*** apetrich has joined #openstack-infra13:47
*** yamamoto has quit IRC13:48
AJaegerthanks!13:48
openstackgerritMerged opendev/system-config master: Remove gitea02 from inventory so we can replace it  https://review.opendev.org/67262113:54
*** iurygregory has quit IRC13:59
*** iurygregory has joined #openstack-infra14:02
*** eernst has joined #openstack-infra14:02
openstackgerritMerged openstack/project-config master: Cleanup in-tree removed jobs  https://review.opendev.org/67141214:03
*** yamamoto has joined #openstack-infra14:04
*** yamamoto has quit IRC14:04
*** goldyfruit has quit IRC14:07
*** ykarel_ has joined #openstack-infra14:08
*** wpp has joined #openstack-infra14:09
clarkbfungi: one trick to make it go faster is to only replicate to the gitea backends (then github and local /p are left alone)14:13
fungithat's what i did14:14
*** gtarnaras has quit IRC14:14
*** gtarnaras has joined #openstack-infra14:14
fungiin retrospect i should have skipped 02 since we're about to rip it out14:14
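For the curious, the mass replication is driven through the Gerrit replication plugin's SSH command; a rough sketch only - the option names are an assumption and should be checked against the installed plugin, so don't read this as the exact command fungi ran:

    # replicate everything to all configured destinations
    ssh -p 29418 <user>@review.opendev.org replication start --all
    # the 'gitea backends only' trick clarkb mentions would be a destination URL
    # filter, something along the lines of:
    ssh -p 29418 <user>@review.opendev.org replication start --all --url gitea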
*** ian-pittwood has joined #openstack-infra14:15
*** goldyfruit has joined #openstack-infra14:16
*** wpp has quit IRC14:18
*** bobh has joined #openstack-infra14:23
*** dpawlik has quit IRC14:28
ian-pittwoodI'm currently stumped by a problem I am having with Zuul. I have a tox job that I need to run in a py36 environment. I know that Zuul uses py35 by default so I added a line to set the bindep_profile to use py36. Unfortunately that didn't seem to help as the job still fails, stating that py36 wasn't found. Does anyone know what I might be missing?14:30
ian-pittwoodHere's the zuul.yaml in question https://review.opendev.org/#/c/672599/4/.zuul.yaml14:30
clarkbian-pittwood: You likely need to change the nodeset. Ubuntu xenial has py35 but not 36. Bionic has py36. There should be existing py36 jobs you can use too14:31
ian-pittwoodOk, I'll give that a try. Thank you14:32
*** ccamacho has joined #openstack-infra14:32
clarkbbut this specific issue is related to your nodeset14:32
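The fix clarkb points at is a nodeset change rather than a bindep one; a minimal sketch of what the .zuul.yaml entry could look like, assuming the standard ubuntu-bionic nodeset is available in that tenant (the job name is a placeholder):

    - job:
        name: my-project-tox-py36     # placeholder name
        parent: tox
        nodeset: ubuntu-bionic        # bionic images ship python 3.6
        vars:
          tox_envlist: py36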
*** ysastri has joined #openstack-infra14:40
*** yikun has quit IRC14:40
openstackgerritMatthew Thode proposed openstack/diskimage-builder master: support alternate portage directories  https://review.opendev.org/67153014:42
*** yamamoto has joined #openstack-infra14:43
*** eernst has quit IRC14:47
*** yamamoto has quit IRC14:53
*** ccamacho has quit IRC14:53
*** jjohnson42 has joined #openstack-infra14:58
jjohnson42So I have an issue where it says 'Change has been successfully merged by Zuul' but I don't see it in the opendev git repo?14:59
*** roman_g has quit IRC14:59
AJaegerjjohnson42: we had some downtime this morning and are currently replicating everything to our git farm to ensure the servers are in sync. So, I hope this will be fixed in a few hours...15:00
*** ricolin_phone has joined #openstack-infra15:00
fungijjohnson42: yeah, we're down to 1.25k replication tasks queued so should be caught up in the next couple hours15:01
fungier, 12.5k i mean15:01
jjohnson42ok, figured it would be something well known, just asking to double check, thanks for the info15:01
fungiwhat's an order of magnitude among friends? ;)15:01
mordredfungi: I dunno, joey vs chandler?15:02
fungii'm doing my best to forget that i have context to parse that punchline15:02
*** rlandy|mtg has quit IRC15:05
*** jpena|mtg is now known as jpena|off15:07
*** siqbal has quit IRC15:12
openstackgerritJames E. Blair proposed zuul/zuul-operator master: WIP: test operator / iptables  https://review.opendev.org/67275515:17
*** dklyle has quit IRC15:17
*** _erlon_ has joined #openstack-infra15:18
*** dklyle has joined #openstack-infra15:18
*** dkopper has quit IRC15:20
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: WIP: Add zuul-operator-functional-openshift job  https://review.opendev.org/67275615:21
*** larainema has quit IRC15:21
*** ricolin has joined #openstack-infra15:22
*** siqbal has joined #openstack-infra15:23
*** e0ne has quit IRC15:23
*** kopecmartin is now known as kopecmartin|off15:24
mordredfungi: this punchline is cut in half. I'd like to exchange it for a punchline that is NOT ... cut in half.15:24
*** pgaxatte has quit IRC15:25
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: WIP: Add zuul-operator-functional-openshift job  https://review.opendev.org/67275615:25
*** gfidente has quit IRC15:26
clarkb11k tasks. I do wonder if it goes faster when we do them one or two at a time15:27
clarkbstill waiting for gitea02 removal to show up on bridge (likely due to the replication backlog)15:27
*** odicha has quit IRC15:28
clarkbthat must be gerrit's way of telling me to go on an early bike ride15:29
*** Goneri has quit IRC15:31
*** ricolin_ has joined #openstack-infra15:33
*** siqbal has quit IRC15:34
*** ricolin_phone has quit IRC15:34
*** ricolin has quit IRC15:36
*** ricolin_ is now known as ricolin15:38
*** adriancz has quit IRC15:39
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: Assure ensure-tox installs latest tox version  https://review.opendev.org/67276015:39
openstackgerritJames E. Blair proposed zuul/zuul master: Improve SQL query performance in some cases  https://review.opendev.org/67260615:39
zbr_clarkb: mordred ^ i hope I explained well the ensure-tox change reasoning. i am curious what you think.15:40
AJaegerdo we need to sync to codesearch as well? Or will it be updated once the replicatoin is done?15:41
clarkbzbr_: that would break any users that might preselect a working tox in their image builds15:42
clarkbAJaeger: I think codesearch pulls from opendev.org on its own so should self correct once opendev is up to date15:42
clarkb(codesearch is #3 requestor to opendev when I looked)15:42
zbr_clarkb: depends how they call it. if they call it with full path, it should not.15:43
AJaegerclarkb: great, thanks15:43
clarkbzbr_: unless that path is in that user install venv15:43
clarkbzbr_: we have had to do this a couple times in the past due to changes in tox breaking backward compat15:44
zbr_yep, and I already see jobs failing. any ideas?15:44
*** gyee has joined #openstack-infra15:44
clarkbI would add a separate upgrade tox step to jobs that know they always want the latest version15:44
zbr_i could add a variable that tells it to update or not, default not to.15:44
zbr_in fact it's even worse: i need to remove the system one to be sure it will work.15:44
zbr_clarkb: i discovered an hour ago that i was not able to add new stuff to a tox.ini file because the repository was running tox-docs on centos7, which happens to have tox 1.6.15:46
clarkbrunning the job on a different nodetype is probably the quickest path forward there15:46
corvuszbr_: seems to me that maybe someone setting that job up wanted to make sure that development could happen on centos7?15:46
zbr_so I am trying to find a solution that would not break existing systems15:47
clarkbcorvus: ya that is similar to my other concern15:47
clarkbbasically that tox version choice may be intentional15:47
zbr_clarkb: it's not intentional in this case. so is it ok if I add a parameter to change the behavior? so only those wanting latest would get it.15:48
*** altlogbot_0 has quit IRC15:48
*** ykarel_ is now known as ykarel|away15:49
clarkba flag to opt into upgrading would probably be ok15:49
*** altlogbot_2 has joined #openstack-infra15:49
*** marios|ruck has quit IRC15:49
*** tesseract has quit IRC15:50
zbr_here is an interesting finding: upgrading tox as a user breaks tox on systems that do not have ~/.local/bin in PATH (aka CentOS7, newer ones do have it)15:52
openstackgerritJames E. Blair proposed zuul/zuul-operator master: WIP: test operator / iptables  https://review.opendev.org/67275515:53
zbr_so in this particular case one user cannot have a working system-tox and a working tox-in-user-dir -- one of them will fail to import.15:53
zbr_workarounds: calling tox with `python -m tox`15:53
zbr_or removing the old one. me being inclined to like the module calling method in general.15:54
*** ginopc has quit IRC15:54
zbr_only the script is broken, module works fine, both versions.15:54
zbr_another approach would be to check if ~/.local/bin is in PATH and add it before calling tox, but it is a bit ugly.15:55
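A small illustration of the failure mode and the module-invocation workaround being discussed, assuming a user-level pip install on a host where ~/.local/bin is not on PATH:

    pip install --user --upgrade tox   # installs under ~/.local; the console script may shadow (or be shadowed by) /usr/bin/tox
    tox --version                      # may fail or resolve to the wrong copy depending on PATH ordering
    python -m tox --version            # module invocation resolves via the interpreter's site dirs, not PATH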
*** siqbal has joined #openstack-infra15:57
openstackgerritMerged zuul/zuul-jobs master: Skip test-setup.sh in pep8 jobs  https://review.opendev.org/67013315:57
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: WIP: Assure ensure-tox installs latest tox version  https://review.opendev.org/67276015:58
*** cdent has joined #openstack-infra16:05
yoctozeptojjohnson42: re: opendev.org - I reconfigured my repos to use review.opendev.org, also wanted to report my repos are not in sync16:06
cdenthow long do we normally expect a patch to show up in opendev.org master? https://review.opendev.org/#/c/672298/ is in gerrit/master but not origin/master (where origin is the opendev.org)16:07
cdentah.16:07
cdentseems it is already being discussed16:07
fungicdent: yoctozepto: yep, we're down to 9.6k remaining replication tasks in the queue16:08
*** gtarnaras has quit IRC16:08
cdentI assume that's fallout from the earlier disk issues?16:08
fungiyep, since there were block device problems in the provider hosting the gitea servers, they ended up missing some git objects, so i initiated a full replication of all repositories to them to make sure any missing objects are fixed16:09
fungibut that causes all replication for new refs to queue up behind that16:09
*** wpp has joined #openstack-infra16:10
yoctozeptofungi: thanks for background16:11
cdentditto16:11
mnaserfungi: i wonder if long term, it would be faster to replicate to a 'master' gitea node that then replicates to a bunch of other ones16:11
mnasereliminating latency and reducing load on the gerrit server too16:11
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: WIP: Allow ensure-tox to upgrade tox version  https://review.opendev.org/67276016:12
fungimnaser: long term we want gitea servers to be able to share a backend16:13
openstackgerritJames E. Blair proposed zuul/zuul-operator master: WIP: test operator / iptables  https://review.opendev.org/67275516:13
fungimnaser: but there are some enhancements it needs to be able to support that16:13
mnaserGotcha16:13
fungiour original deployment model involved only replicating to one, and it mostly worked accidentally16:14
fungibut gitea isn't actually designed for that (yet) so it stopped working when we upgraded16:14
*** mattw4 has joined #openstack-infra16:14
fungiand so the current design with independent backends is a workaround for now16:15
corvuswork is in progress to support that16:15
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: WIP: Add zuul-operator-functional-openshift job  https://review.opendev.org/67275616:16
cdentthanks fungi, now back to my regularly scheduled assorted manyness16:17
*** iurygregory has quit IRC16:20
*** lucasagomes has quit IRC16:21
*** ykarel|away has quit IRC16:22
*** mattw4 has quit IRC16:23
*** mattw4 has joined #openstack-infra16:23
*** rascasoft has quit IRC16:23
*** rascasoft has joined #openstack-infra16:27
*** lpetrut has quit IRC16:29
mordredmnaser: in fact, once the work in progress to support single-shared-gitea is done, it would be made even better by manila-cephfs - so there are several future improvement possibilities16:33
mnasermordred: forever hinting at the need/want of manila-cephfs :P16:33
mnasersoon(tm)16:34
mnaser:p16:34
mordredmnaser: it's how I let you know I care ;)16:34
*** cdent has left #openstack-infra16:35
*** rpittau is now known as rpittau|afk16:36
*** ricolin has quit IRC16:39
openstackgerritTristan Cacqueray proposed zuul/zuul-jobs master: install-openshift: bump version to 3.11.0  https://review.opendev.org/67278516:40
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Add clear-firewall role  https://review.opendev.org/67278616:41
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: WIP: Add zuul-operator-functional-openshift job  https://review.opendev.org/67275616:41
openstackgerritTristan Cacqueray proposed zuul/nodepool master: DNM: test openshift version bump  https://review.opendev.org/67278816:43
*** ykarel|away has joined #openstack-infra16:45
*** pkopec has quit IRC16:48
*** dtantsur is now known as dtantsur|afk16:49
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: WIP: Add zuul-operator-functional-openshift job  https://review.opendev.org/67275616:50
*** chandankumar is now known as raukadah16:52
openstackgerritJeff Liu proposed zuul/zuul-operator master: Add telnet to Docker Image  https://review.opendev.org/67279116:53
openstackgerritJeff Liu proposed zuul/zuul-operator master: Add telnet to Docker Image  https://review.opendev.org/67279116:56
openstackgerritTristan Cacqueray proposed zuul/zuul-jobs master: install-openshift: bump version to 3.11.0  https://review.opendev.org/67278516:57
*** igordc has quit IRC16:58
*** igordc has joined #openstack-infra16:58
openstackgerritTristan Cacqueray proposed zuul/nodepool master: DNM: test openshift version bump  https://review.opendev.org/67278816:58
*** ysastri has quit IRC16:59
*** jcoufal_ has joined #openstack-infra17:03
fungiit's under 900017:04
*** jcoufal has quit IRC17:07
*** roman_g has joined #openstack-infra17:08
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: WIP: Add zuul-operator-functional-openshift job  https://review.opendev.org/67275617:10
*** diablo_rojo has joined #openstack-infra17:11
*** armax has quit IRC17:13
*** ian-pittwood has quit IRC17:19
*** odicha has joined #openstack-infra17:19
*** betherly has joined #openstack-infra17:19
*** odicha_ has joined #openstack-infra17:21
*** odicha__ has joined #openstack-infra17:22
*** ralonsoh has quit IRC17:24
*** betherly has quit IRC17:24
*** igordc has quit IRC17:28
*** odicha__ has quit IRC17:28
*** odicha has quit IRC17:28
*** odicha_ has quit IRC17:28
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: WIP: Add zuul-operator-functional-openshift job  https://review.opendev.org/67275617:30
*** odicha has joined #openstack-infra17:33
*** odicha has quit IRC17:33
*** udesale has quit IRC17:34
*** odicha has joined #openstack-infra17:36
*** siqbal has quit IRC17:36
*** bobh has quit IRC17:39
*** weifan has joined #openstack-infra17:45
*** odicha_ has joined #openstack-infra17:46
clarkbbringing the security group discussion here. Historically the two major issues with them have been 1) rax didn't support security groups and 2) they were very inefficient with group to group rules (which we'd need to rely on for multinode testing and the like) on the database. I believe rax has security groups now and that the database is no longer as sad about security groups17:47
clarkbI think that means we could reconsider them as an option for preventing open dns resolvers and such on the internet then remove our firewall rules from the test nodes entirely17:47
*** goldyfruit has quit IRC17:49
clarkbThen zuul testing and everyone else testing doesn't have to worry about modifying firewall rules at job time17:49
*** psachin has quit IRC17:50
*** armax has joined #openstack-infra17:50
weifanHas there been any changes to tag pushing?17:50
weifanI was trying to push a new tag using following remote, which used to work..17:50
weifanssh://<username>@review.opendev.org:29418/x/<project_name>17:50
weifanRight now it says the push is completed, and I could also find it on pypi. But I dont see the tag on opendev for some reason..17:50
clarkbweifan: there was a cloud outage a little while ago that prevented us from replicating gerrit repo data to the opendev backends. That outage has been corrected and we are now in the process of rereplicating everything to gitea to ensure it is up to date17:51
clarkbweifan: when that process completes your tag should be present on opendev, but until then it is somewhere in the queue17:51
weifani see, thanks :)17:51
clarkbat this rate I'm guessing a few more hours?17:51
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: WIP: Add zuul-operator-functional-openshift job  https://review.opendev.org/67275617:54
clarkbI think we could turn on security groups with our existing images (we'll just be double firewalled) then if that doesn't break anything remove the firewalls from the images. The transition should be fairly safe (and if adding security groups does break something revert the cloud launcher change)17:55
fungiclarkb: yeah, it's possible we could orchestrate whitelist security groups over each of the job node tenant networks... as long as things like temporary docker registries coexist in the same region as the builds which connect to them17:56
fungiotherwise i think we're stuck with a blacklist model instead17:56
clarkbfungi: I believe zuul enforces that requirement currently, but good point we should double check that17:56
fungibasically if we can assume that builds which interact with each other will only attempt to connect to job nodes in the same provider/region then it's probably pretty straightforward17:57
clarkbI'm 99% sure zuul does enforce that locality requirement (probably because we were thinking about stuff like this)17:57
clarkbcorvus would likely know 100%17:58
*** odicha has quit IRC17:58
*** odicha_ has quit IRC17:59
clarkband we should double check that security groups do work on rax (their docs say you can do that with public cloud so I expect it to work)17:59
*** jtomasek has quit IRC18:01
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: WIP: Allow ensure-tox to upgrade tox version  https://review.opendev.org/67276018:03
*** goldyfruit has joined #openstack-infra18:04
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: WIP: Add zuul-operator-functional-openshift job  https://review.opendev.org/67275618:05
mordredclarkb, fungi: what's the email address we're using for when we need an opendev root email address? infra-root@openstack.org still?18:11
*** bobh has joined #openstack-infra18:11
clarkbmordred: yes18:11
*** mattw4 has quit IRC18:11
mordredclarkb: thx18:11
*** mattw4 has joined #openstack-infra18:11
*** dklyle has quit IRC18:11
clarkbfungi: re locality I remember why we enforce that, it is because some clouds have ipv6 only and others are ipv4 only so we can't assume they can talk to each other even if firewalls are wide open18:11
*** dklyle has joined #openstack-infra18:12
clarkbthe firewalls are 1980s wood paneling18:12
mordredsuch lovely wood paneling18:12
*** priteau has quit IRC18:13
fungiyup18:13
fungiokay, so a fairly simple (22/tcp from everywhere) whitelist is probably sufficient?18:13
clarkbfungi: and an in group wide open rule (security group members can talk to themselves)18:14
fungithough to allow instance-to-instance traffic we have to add the instances to groups18:14
fungiyeah, that18:14
clarkbthat is a thing you can express in the rules too18:14
fungiis there a default group they appear in automatically?18:14
clarkbthere is a default group18:14
clarkband by default that group has the talk to myself rule (but our cloud launcher removes it currently)18:15
*** auristor has quit IRC18:15
*** jamesmcarthur has joined #openstack-infra18:17
*** auristor has joined #openstack-infra18:17
clarkbwe also need to open the zuul console log port18:17
clarkbssh + console log port + in group connectivity. Anything else missing?18:17
*** bobh has quit IRC18:17
openstackgerritMerged zuul/zuul master: Improve SQL query performance in some cases  https://review.opendev.org/67260618:18
*** dims has quit IRC18:19
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: WIP: Add zuul-operator-functional-openshift job  https://review.opendev.org/67275618:21
*** roman_g has quit IRC18:22
*** igordc has joined #openstack-infra18:22
*** roman_g has joined #openstack-infra18:23
openstackgerritClark Boylan proposed opendev/system-config master: Use cloud security groups for test node isolation  https://review.opendev.org/67280618:28
clarkbfungi: mordred ^ that's roughly what it would look like (and applied to vexxhost mtl1 only in that change if we want to merge it, only gpu test nodes reside there currently)18:29
*** dims has joined #openstack-infra18:29
clarkbI believe the default ruleset is applied to instances by default if you don't specify one, so nothing in nodepool would have to change either18:30
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: WIP: Allow ensure-tox to upgrade tox version  https://review.opendev.org/67276018:30
*** weifan has quit IRC18:35
mordredclarkb: udp?18:36
clarkbmordred: we don't need udp inbound do we?18:36
mordredoh-  default group rule is typeless18:36
clarkb(I think iptables treats udp as "stateful" so the outbound dns requests should get responses)18:36
clarkbmordred: ya18:36
mordred(was more thinking instance-to-instance traffic)18:36
*** goldyfruit has quit IRC18:38
clarkb5.9k tasks to go now18:39
*** eharney has quit IRC18:39
*** betherly has joined #openstack-infra18:41
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: WIP: Allow ensure-tox to upgrade tox version  https://review.opendev.org/67276018:42
fungiyeah, we're down to ~1/3 of the replication backlog remaining18:44
fungigoing to try and knock out some yardwork so that my evening is free to work on gitea server replacement stuff18:44
*** betherly has quit IRC18:45
*** fdegir has quit IRC18:45
*** fdegir has joined #openstack-infra18:46
*** ykarel|away has quit IRC18:49
clarkbI think if we want to move ahead with that change the next two things to do would be to confirm it doesn't break anything (by applying it to vexxhost as proposed) and also to try and apply it to the rax regions18:50
clarkbsince will it work with rax and will it not break existing jobs are the two big questions18:50
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: WIP: Add zuul-operator-functional-openshift job  https://review.opendev.org/67275618:51
corvusfungi, clarkb: okay so on the firewall thing -- let me summarize and see if we're on the same page: 1) the firewall is good because it's easy for folks to mess up and accidentally create an open proxy/resolver/etc.  2) we give folks root, they can disable it if they need to.  3) it's good to have that speedbump though so that they have to think about it, so we should not remove it from the base18:51
corvusimages.  4) it is reasonable to disable the firewall for the k8s case because the very next step is that k8s is going to create a bunch of firewall rules that are not going to allow undue external access.  5) we could consider using security groups in our providers as a replacement for the firewall (but that's going to take some careful engineering since we have jobs which communicate cross-region)18:51
clarkbcorvus: yes basically and maybe 6) the major historical reasons for not using security groups are no longer present (according to neutron and rax docs)18:52
clarkbcorvus: what jobs communicate cross region? I seem to recall we couldn't do that due to ipv6 and ipv4 only clouds existing18:52
*** boden has joined #openstack-infra18:52
fungii concur with the summary18:53
bodenhi... wondering if anyone has any pointers on a functional job failure related to "Error when trying to get requirement for VCS system" as shown in http://logs.openstack.org/25/672725/5/check/neutron-classifier-functional-dsvm/5eb2c85/job-output.txt.gz#_2019-07-25_18_41_29_57911018:53
bodenis this because keystone is not in the test-requirementst.txt maybe?18:54
*** mriedem has quit IRC18:54
mordredboden: that's not actually an error18:55
clarkbboden: http://logs.openstack.org/25/672725/5/check/neutron-classifier-functional-dsvm/5eb2c85/job-output.txt.gz#_2019-07-25_18_40_32_918662 is the error18:56
mordredboden: it's an unfortunate error printed by pip because of the lack of origin remote in the repos - but is harmless ... ^^ what clarkb said18:56
corvusclarkb: i think jobs that use the buildset registry may do that  (and yes, it's a pita)18:56
clarkbyou are running into ERROR_ON_CLONE because devstack is needing to clone some repos but we've told it that isn't allowed. The way to address that is to add them to the required projects of the job or remove those services from the devstack config18:56
clarkbboden: ^18:56
bodenclarkb mordred thanks for that18:57
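A sketch of the required-projects fix clarkb mentions, with keystone standing in for whatever devstack was actually trying to clone (the names are illustrative, not taken from the failing job definition):

    - job:
        name: neutron-classifier-functional-dsvm
        required-projects:
          - openstack/keystone   # whichever repos devstack needs checked out instead of cloned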
yoctozeptodid iad.rax experience issues with epel mirror around 16:50 UTC? cause different images failed to build due to different 404 packages18:57
clarkbcorvus: in cases where we pause a job with a buildset registry then other jobs consume from that? for some reason I thought we did restrict that to the same region18:57
mordredyeah - I thought the same thing18:57
mordredbut I am most likely just wrong18:57
clarkbyoctozepto: that is our kafs canary, that implies the fixes for falling back to the second afs server are not working18:57
clarkbyoctozepto: can you provide direct links to where that happens it will help us and the kernel devs debug possibly18:58
corvusclarkb, fungi, mordred: multinode jobs are restricted to the same region, but jobs which depend on other jobs aren't18:58
clarkbcorvus: got it18:58
fungiif a job paused to serve a registry in limestone (global v6 access only) and then the build trying to use that ran in ovh (no global ipv6 egress routing) they'd be unable to talk18:58
clarkbcorvus: considering that we can't rely on that cross cloud region communication working anyway (regardless of where we put the firewall) I think we may want to fix that anyway?18:58
*** cshen has quit IRC18:59
yoctozeptoclarkb: e.g. here http://logs.openstack.org/86/668286/4/check/kolla-build-oraclelinux-source/1ea1696/logs/build/ - timestamps on _FAILED_18:59
yoctozeptothough it only pinpoints the time18:59
yoctozepto404 is generic ;-)18:59
clarkbyoctozepto: why do your log files not have timestamps in them?18:59
yoctozeptoclarkb: this one does: http://logs.openstack.org/86/668286/4/check/kolla-build-oraclelinux-source/1ea1696/job-output.txt.gz19:00
yoctozeptothough it's all-in-one19:00
clarkbyoctozepto: 404 is generic but we know it happens in kafs when the filesystem is being updated and clients are supposed to fall back to the secondary fs, however kafs wasn't doing that and we are running proposed changes that are supposed to fix that in kafs which I'm guessing they don't. That feedback is useful to the kernel19:00
yoctozeptook, then I pinpoint the time for ya19:01
clarkband ya if we have the timestamp we can check if the fs was updating at that time to correlate the two events19:01
yoctozeptogrep "HTTP Error 404" on http://logs.openstack.org/86/668286/4/check/kolla-build-oraclelinux-source/1ea1696/job-output.txt.gz19:01
*** betherly has joined #openstack-infra19:01
yoctozeptonice timestamps19:01
clarkbyoctozepto: note you can direct link to the timestamps on that file19:02
corvusclarkb, fungi: maybe we need to fix that by getting ipv6 in ovh19:02
clarkbhttp://logs.openstack.org/86/668286/4/check/kolla-build-oraclelinux-source/1ea1696/job-output.txt.gz#_2019-07-25_16_44_54_226419 for example19:02
clarkbcorvus: and inap iirc19:02
clarkband rax19:02
clarkb(we only support ipv6 on rax on debuntu hosts)19:02
yoctozeptomore here: http://logs.openstack.org/86/668286/4/check/kolla-build-oraclelinux-binary/0d3fc67/job-output.txt.gz19:03
*** mriedem has joined #openstack-infra19:03
zbr_AJaeger: clarkb: i made the required changes to ensure-tox, if you can have another look it would be great.19:03
zbr_https://review.opendev.org/#/c/672760/19:03
yoctozeptoclarkb: thanks, you are right, though there are many to share19:03
clarkbyoctozepto: we only need the one probably19:03
clarkbjust enough to correlate to an updating afs volume19:03
corvuszbr_, AJaeger: that sort of change should have a test job19:04
yoctozeptohttp://logs.openstack.org/86/668286/4/check/kolla-build-oraclelinux-binary/0d3fc67/job-output.txt.gz#_2019-07-25_16_39_23_60682619:04
yoctozepto^ earliest probably19:04
yoctozeptoseems it hit epel only19:04
yoctozeptocentos mirror seems to have worked19:04
clarkbyoctozepto: they are separate afs volumes iirc (though I'll double check that when I look at this more closely)19:04
clarkbcurrently about to consume lunch19:04
*** bobh has joined #openstack-infra19:05
*** betherly has quit IRC19:05
*** weifan has joined #openstack-infra19:07
openstackgerritJeff Liu proposed zuul/zuul-operator master: [WIP] Verify Operator Pod Running  https://review.opendev.org/67039519:08
corvusclarkb, fungi, mordred: jobs which depend on other paused jobs *request* nodes from the same provider, and will get them if that provider is still online.19:08
corvusclarkb, fungi, mordred: so that case should usually not be a problem19:08
corvusonly in weird edge cases (like a provider going offline during a buildset)19:08
corvus(in that case, it'll fall back on letting any provider fulfill it)19:09
*** igordc has quit IRC19:09
*** tosky has quit IRC19:10
corvusclarkb, fungi, mordred: and we're talking nodepool provider here, so that's a cloud-region combo19:10
corvuscould come from a different 'pool' though19:11
*** bobh has quit IRC19:11
*** weifan has quit IRC19:11
clarkbour nodepool providers are per region19:12
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: Allow ensure-tox to upgrade tox version  https://review.opendev.org/67276019:13
clarkbFrom that I think we'd be ok except for the fallback case, which we also risk breaking in the ipv4 vs ipv6 case; however with security groups that would be a hard fail all the time rather than a sometimes fail19:14
zbr_corvus: done, added test jobs and referenced it with needed-by. see https://review.rdoproject.org/r/#/c/21594/19:14
openstackgerritTristan Cacqueray proposed zuul/zuul-operator master: WIP: Add zuul-operator-functional-openshift job  https://review.opendev.org/67275619:15
*** xek_ has joined #openstack-infra19:15
* clarkb lunches19:16
*** xek has quit IRC19:17
*** bhavikdbavishi has quit IRC19:18
*** igordc has joined #openstack-infra19:25
*** dims has quit IRC19:30
*** goldyfruit has joined #openstack-infra19:32
*** igordc has quit IRC19:32
AJaegerzbr_: we have in-tree test jobs nowadays in zuul-jobs, have a look at the zuul-tests.d/ directory19:38
*** rfarr has joined #openstack-infra19:38
*** rfarr has quit IRC19:38
*** e0ne has joined #openstack-infra19:39
*** jamesmcarthur has quit IRC19:40
*** jamesmcarthur has joined #openstack-infra19:41
*** joeguo has joined #openstack-infra19:44
*** rascasoft has quit IRC19:45
*** jamesmcarthur has quit IRC19:46
*** rascasoft has joined #openstack-infra19:47
zbr_AJaeger: no problem with me, so you want one more job that uses this new param and triggers when someone edits this role, right?19:48
zbr_i personally prefer using molecule to test ansible roles, as I can easily test lots of use cases in seconds, and locally too. maybe I should make a demonstration19:49
fungiwhat's nice about the existing jobs is they exercise these roles the way they'll be used in ci jobs, rather than in an abstract framework19:59
*** michael-beaver has joined #openstack-infra20:02
*** betherly has joined #openstack-infra20:02
*** betherly has quit IRC20:07
*** jcoufal_ has quit IRC20:07
*** igordc has joined #openstack-infra20:08
*** jamesmcarthur has joined #openstack-infra20:11
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Add clear-firewall role  https://review.opendev.org/67278620:15
corvuszbr_: a third-party test is great, but how about a first party test? :)  AJaeger had some suggestions there20:17
corvuszbr_: this may be a candidate for testing on different platforms too; there are examples for that20:18
*** jamesmcarthur has quit IRC20:19
zbr_corvus: sure. which platforms/versions do you want me to cover?20:20
corvuszbr_: at least ubuntu-bionic (the default) plus any you don't want to break.  since centos7 was a concern, you may want to include that.20:20
corvuszbr_: there's a special macro you can use if you think it should be tested on all platforms20:21
corvuszbr_: http://lists.zuul-ci.org/pipermail/zuul-discuss/2019-July/000973.html  has more info too20:21
corvuszbr_: i'm writing a patch for zuul-jobs to update the docs with the info in that ml post20:21
zbr_cool, that was what I expected. i will read that too,20:22
*** gyee has quit IRC20:22
clarkbI have WIP'd https://review.opendev.org/#/c/672806/1 given that buildset registries may run in different clouds20:25
corvusclarkb: did you see my update?20:25
corvusclarkb: you were 99% right about that (and i was 1% right)20:25
clarkboh no I missed it then20:25
corvusso i don't think it's a problem we need to concern ourselves with20:26
corvussee 19:08-19:11 in here; i think you were getting lunch20:26
fungi(depends on a provider outage or similar immediate catastrophy)20:26
corvusfungi: right20:26
clarkboh neat. Should I remove the WIP then? I guess the question now becomes: do we think that this is worth pursuing as it will take some measured rollout20:26
corvusclarkb: i kinda think so?  i like the idea of having a cleaner test env20:27
clarkbk I'll remove the WIP then as I think the current ps is a good starting point for testing a rollout20:28
fungiit does mean that, e.g., if someone manually troubleshooting a job wants to initiate connections to it other than those allowed by the security groups we apply, they will be unable to (aside from reverse tunneling or similar complexity). not sure if that's a concern20:28
fungii'm not personally concerned by that aspect, fwiw20:29
*** weifan has joined #openstack-infra20:29
clarkblooking up epel afs volume update times now20:29
corvusi have held nodes running a docker registry and performed local actions from my workstation against them to debug.  this would make that harder.  not sure if that's a deal killer.20:29
*** jamesmcarthur has joined #openstack-infra20:30
clarkbya you'd likely end up doing ssh -L type proxying20:30
corvusyep.  should suffice i think.20:30
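A rough sketch of the ssh -L proxying mentioned above, for reaching a registry on a held node from a workstation; the address, user, port, and the -k flag are placeholders/assumptions rather than details from an actual held node:

    # forward local port 5000 to the registry listening on the held node
    ssh -L 5000:localhost:5000 root@203.0.113.45
    # in another terminal, query the registry API through the tunnel
    # (-k only if the registry uses a self-signed certificate)
    curl -k https://localhost:5000/v2/_catalog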
fungigerrit replication backlog is under 3k now20:31
fungii think we're on track for an 8 hour completion time, which implies that it currently takes ~1 hour to perform full replication to a single gitea backend20:32
corvusaren't they in parallel?20:32
fungiestimating completion around 21:40z20:32
clarkbhttp://paste.openstack.org/show/754873/ does seem to coincide with http://logs.openstack.org/86/668286/4/check/kolla-build-oraclelinux-binary/0d3fc67/job-output.txt.gz#_2019-07-25_16_39_23_60682620:33
clarkbianw: ^ re kafs I don't think the fixes for falling back to other servers are working properly20:33
clarkbyes it is in parallel20:33
clarkbthere are N threads per replication target20:33
clarkbHowever, I think it may be faster if we do them one by one? seems like it didn't take me that long to run through them all after the OOMs20:34
*** zbr_ has quit IRC20:35
clarkbI wonder if that implies we should have fewer replication threads (contention being a likely cause of slowdown when run in parallel?)20:35
fungiahh, yeah that i don't know about. because i issued replication commands for each of them one by one (so as to exclude local and github... i couldn't manage to get a glob/regex working) that might have caused them to get serialized? hard to tell from what's left in the backlog at this point but can probably suss it out from cacti graphs20:35
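A hedged sketch of that one-by-one triggering: naming each gitea backend explicitly with --url avoids needing a glob and implicitly skips the local and github destinations. The account name is a placeholder, and whether the runs actually serialize depends on the replication plugin's own queueing:

    for host in gitea01 gitea02 gitea03 gitea04 gitea05 gitea06 gitea07 gitea08; do
        ssh -p 29418 user@review.openstack.org replication start --url ${host}.opendev.org
    done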
*** raissa has joined #openstack-infra20:36
*** zbr has joined #openstack-infra20:37
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Update testing section  https://review.opendev.org/67282020:37
corvusAJaeger, zbr: ^20:38
*** diablo_rojo has quit IRC20:40
*** cshen has joined #openstack-infra20:40
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Add clear-firewall role  https://review.opendev.org/67278620:41
*** harlowja has joined #openstack-infra20:43
fungicorvus: clarkb skimming the active replication processes in the queue, they do appear to be parallelized (~4 active per destination)20:45
clarkblooks like we do set it to 4 threads per gitea backend20:46
clarkbthat is in system-config/modules/openstack_project/manifests/review.pp20:47
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: WIP: Allow ensure-tox to upgrade tox version  https://review.opendev.org/67276020:49
*** mriedem has quit IRC20:52
*** mriedem has joined #openstack-infra20:53
*** bobh has joined #openstack-infra20:53
*** bobh has quit IRC20:59
*** gyee has joined #openstack-infra20:59
*** jamesmcarthur has quit IRC21:00
*** Lucas_Gray has joined #openstack-infra21:00
*** jamesmcarthur has joined #openstack-infra21:01
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: WIP: Allow ensure-tox to upgrade tox version  https://review.opendev.org/67276021:02
*** betherly has joined #openstack-infra21:03
zbrcorvus: thanks for documenting this, I will try to use it tomorrow as it is 10pm here. For the moment i enabled the tox-molecule job for testing that role (just to compare the two approaches)21:04
*** jjohnson42 has quit IRC21:05
*** cshen has quit IRC21:07
*** betherly has quit IRC21:08
*** zbr has quit IRC21:11
openstackgerritMonty Taylor proposed opendev/system-config master: Build gerrit images for 2.16 and 3.0 as well  https://review.opendev.org/67227321:11
openstackgerritMonty Taylor proposed opendev/system-config master: Trim some bazel flags  https://review.opendev.org/67227421:12
*** ekultails has quit IRC21:12
openstackgerritJeff Liu proposed zuul/zuul-operator master: [WIP] Verify Operator Pod Running  https://review.opendev.org/67039521:12
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Add clear-firewall role  https://review.opendev.org/67278621:13
mordredcorvus, clarkb: https://review.opendev.org/#/c/671457 is ready for re-review - I think I took care of the review comments21:13
*** jamesmcarthur has quit IRC21:13
clarkbmordred: safe to approve since nothing is using it yet right?21:14
*** slaweq has quit IRC21:15
mordredclarkb: that's right21:15
corvusi agree21:15
clarkbdone21:15
mordredwoot!21:15
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Update testing section  https://review.opendev.org/67282021:17
*** cshen has joined #openstack-infra21:18
*** diablo_rojo has joined #openstack-infra21:19
openstackgerritClark Boylan proposed zuul/zuul-jobs master: Add note to clear-firewall docs  https://review.opendev.org/67282921:20
*** cshen has quit IRC21:23
*** zbr has joined #openstack-infra21:26
*** whoami-rajat has quit IRC21:28
*** pcaruana has quit IRC21:28
fungireplication backlog is nearly down to 1k. gonna go grab dinner and by the time i'm done hopefully the haproxy config change will have taken effect and i can rip out gitea02 and start building its replacement21:29
openstackgerritClark Boylan proposed zuul/zuul-jobs master: Add note to clear-firewall docs  https://review.opendev.org/67282921:30
*** zbr has quit IRC21:32
*** boden has quit IRC21:32
*** panda has quit IRC21:34
*** panda has joined #openstack-infra21:34
openstackgerritMerged zuul/zuul-jobs master: Add clear-firewall role  https://review.opendev.org/67278621:34
clarkbmriedem: thank you for calling out the nova memcache thing on the config drive bug21:44
clarkbmriedem: I left a note on it suggesting that having devstack just do it when memcache is enabled would be great21:44
*** jamesmcarthur has joined #openstack-infra21:46
mriedem\o/21:46
openstackgerritJeff Liu proposed zuul/zuul-operator master: [WIP] Verify Operator Pod Running  https://review.opendev.org/67039521:48
openstackgerritMerged zuul/zuul-jobs master: Add note to clear-firewall docs  https://review.opendev.org/67282921:50
*** jamesmcarthur has quit IRC21:51
openstackgerritJeff Liu proposed zuul/zuul-operator master: [WIP] Verify Operator Pod Running  https://review.opendev.org/67039521:55
*** e0ne has quit IRC21:56
*** rascasoft has quit IRC21:56
openstackgerritMerged opendev/system-config master: Build docker images of gerrit  https://review.opendev.org/67145721:58
*** rascasoft has joined #openstack-infra21:58
*** slaweq has joined #openstack-infra22:11
*** bdodd_ has joined #openstack-infra22:12
*** bdodd_ has quit IRC22:13
clarkbwe are now processing replication events from after the great enqueueing22:15
*** slaweq has quit IRC22:16
*** betherly has joined #openstack-infra22:16
*** rcernin has joined #openstack-infra22:16
clarkband we are caught up22:21
*** betherly has quit IRC22:21
clarkbI think we are about half an hour from bridge's system-config updating based on where it is in the loop22:25
clarkbhrm22:26
clarkbexcept https://opendev.org/opendev/system-config/commits/branch/master is still out of date22:26
clarkbI wonder if all of these have corrupt root disks like 06 did around the summit :/22:26
* clarkb checks them individually22:27
ianwclarkb: were they rebooted after the outage?22:27
ianwthey all had various kernel messages with things like "vda" in them22:28
clarkbianw: I don't know22:28
clarkb01 and 08 have the latest system-config refs but none of the others do22:28
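A quick way to make that per-backend comparison, assuming anonymous HTTP cloning works against both Gerrit and the gitea backends (the port 3000 URLs mirror the links used elsewhere in this log):

    # the ref Gerrit thinks is master
    git ls-remote https://review.opendev.org/opendev/system-config refs/heads/master
    # what each backend is actually serving
    for host in gitea01 gitea02 gitea03 gitea04 gitea05 gitea06 gitea07 gitea08; do
        echo -n "${host}: "
        git ls-remote https://${host}.opendev.org:3000/opendev/system-config refs/heads/master
    done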
clarkbI'm going to try replicating system-config to gitea0222:29
openstackgerritJames E. Blair proposed zuul/zuul master: Remember tab location on build page  https://review.opendev.org/67283622:29
clarkbunless things are cached I don't think that is working22:30
clarkbwhich is very similar to the behavior we observed in gitea0622:30
ianwclarkb: looks like no ... on gitea02 for example systemd has decided the journal is corrupt, at least22:31
ianwalthough, rebooting it might make it worse if it doesn't want to mount the disk any more22:31
clarkbianw: I guess we remove it from haproxy, reboot it, retrigger replication and see if that helps?22:31
clarkbfungi: ^ are you back yet?22:32
openstackgerritJames E. Blair proposed zuul/zuul master: Use base 1 line number anchors in log view  https://review.opendev.org/67283722:33
ianwclarkb: i looped through the gitea* servers last night and they all had similar things; especially the systemd journal unhappiness22:33
ianwbut then again, they haven't logged anything since, so maybe it's recovered22:35
clarkbexcept that replication doesn't work22:35
clarkbbut maybe a reboot will solve that?22:35
clarkbI'll remove 02 from the haproxy and reboot it22:37
*** jamesmcarthur has joined #openstack-infra22:38
fungiclarkb: back now22:38
clarkb02 has been removed22:38
ianwclarkb: the rax rescue image thing would be a good way to try an fsck on the disk and see what that thinks ...22:39
ianwnot sure how to do that22:39
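One possible shape of that rescue workflow, sketched here rather than taken from the provider's docs: put the server into rescue mode, then check the original root filesystem read-only. The device name is an assumption; it depends on how the provider attaches the original disk inside the rescue instance:

    openstack server rescue gitea02.opendev.org
    # ssh into the rescue instance with the temporary credentials, then:
    fsck.ext4 -nf /dev/xvdb1    # -n: report only, change nothing; -f: force a check
    openstack server unrescue gitea02.opendev.org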
fungii did check the gitea servers and none seemed to have marked their root filesystems read-only22:39
clarkbianw: https://docs.openstack.org/infra/system-config/gitea.html#backend-maintenance22:39
fungiwhich i would have expected if they had irrecoverable i/o errors22:39
clarkbI'm checking gitea docker logs now to see that connections have stopped22:39
clarkbfungi: maybe you want to grab a db backup or 10 just in case these filesystems are really unhappy?22:40
ianwclarkb: oh i mean more mount the disk from outside and check it22:40
*** armax has quit IRC22:40
clarkblast request at 2019-07-25 22:38:15 so going to reboot now22:40
clarkbianw: oh22:40
clarkbsorry skipped the fsck message22:40
clarkblets reboot since that is easy, rereplicate and check22:41
fungijust copy the last nightly backup from one? should be fine since we haven't created new projects22:41
clarkbfungi: ya22:41
ianwalthough i agree with fungi, they didn't offline themselves.  and also it seemed to be a pretty hard shutoff, so it's not like some writes were getting through but others weren't22:42
fungiwe can experiment with 02 presumably22:42
clarkbI think the writes are happening22:42
clarkbbut you can't read them back again22:42
clarkbanyways rebooting 02 now22:42
fungiif this comes right back up, maybe we need to touch /forcefsck22:42
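A minimal sketch of forcing that check, assuming a distro whose systemd-fsck still honours the /forcefsck flag file (passing fsck.mode=force on the kernel command line is the other route):

    sudo touch /forcefsck
    sudo reboot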
clarkbit came right back up22:43
clarkbwaiting for docker to show happy containers then will try rereplicating22:43
*** jamesmcarthur has quit IRC22:44
fungibut you did confirm it had missing git objects?22:45
clarkbfungi: yes22:45
clarkber not after reboot22:45
clarkbI haven't replicated yet22:45
clarkbpanic: Failed to execute 'git config --global core.quotepath false': error: could not lock config file /data/git/.gitconfig: File exists22:45
clarkbI am going to delete that file22:45
fungicurious if they were still missing after a reboot too22:45
funginot that i have high hopes22:46
fungiare all 8 affected, or just some of them? any idea?22:46
clarkber the .lock file22:46
clarkbfungi: 01 and 08 have the system-config refs, none of the others do22:46
clarkbI don't know if that means 01 and 08 are ok or if it is a per-repo problem22:46
fungiyeah22:46
fungiwell, i've got nothing better to do with my evening than churn through gitea server rebuilds. yardwork is done, dinner is behind me22:48
fungiand we've ironed out most of the gotchas as of yesterday22:48
clarkbhttps://gitea02.opendev.org:3000/opendev/system-config/commits/branch/master is serving content again (old content) going to trigger replication now22:50
clarkbafter triggering replication those refs are present22:51
clarkbgiven that should we rotate through all 8, reboot them all, then trigger replication again?22:51
clarkbI'm adding 02 back to haproxy since its reboot is done22:52
clarkbI'm going to remove 03 now22:54
clarkbany objections to proceeding to do all of these? maybe I should start with 01?22:54
ianwclarkb: i'm happy to help ... would a little playbook help?22:55
clarkbianw: maybe? the tricky bit with a playbook will be clearing the .gitconfig.lock file but only if gitea fails to start22:55
auristorianw: was the "5.3.0-rc1-afs-next-48c7a244 : volume is offline messages during release" e-mail sent due to additional failures of the mirror?22:57
clarkbdecided to start with 0122:57
ianwauristor: it was mostly an update, but i think we have a case reported above of a file that seemed missing during a release.  i need to correlate it all into something readable, will respond to your mail :)22:58
clarkbianw: I think it may be quicker to just do it given how complicated checking that lock file may be?22:58
clarkb(would have to check docker logs output after determining the container id to find if there are errors around the lockfile?)22:58
clarkb01 is back up and didn't have lock errors. Putting it back in haproxy again22:59
clarkbI wonder if that lockfile is gonna be the canary for broken gitea replication22:59
ianwclarkb: yep, sure.  if you want to log the steps i can follow along and help out with some of the others in due course22:59
*** weifan has quit IRC22:59
clarkbianw: run the disable commands in that link I pasted earlier on the load balancer. Log into giteaXY and do `docker ps -a` then `docker logs --tail 50 $ID_FOR_GITEA` (the ID comes from the previous command's output). when you see no new connections, reboot23:01
clarkbthen on start do the docker ps -a and docker logs again to see if it is sad about the lock file23:02
clarkbif it is, the file to delete is /var/haproxy/data/git/.gitconfig.lock23:02
clarkbdocker should try again and it will succeed after that file is gone, then you can enable the host in haproxy as per my link earlier23:02
ianwok, should i try 08?23:02
fungiclarkb: cycling through all of them makes sense. we should hold off on rebuilds i guess23:02
clarkbsorry /var/gitea/data/git23:02
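Pulling the steps above together as a hedged sketch (the container ID placeholder and the corrected lock file path come from the conversation; the haproxy disable/enable side is covered by the documentation link earlier):

    docker ps -a                                  # find the gitea web container's ID
    docker logs --tail 50 <container-id>          # watch until no new connections appear
    sudo reboot
    # once the host is back:
    docker logs --tail 50 <container-id>          # look for the .gitconfig.lock panic
    sudo rm /var/gitea/data/git/.gitconfig.lock   # only if the panic appears; docker retries the container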
fungimaybe i can knock some out tomorrow and over the weekend23:03
clarkbianw: yup I am on 03 and it is failing on the lockfile23:03
ianwok, bringing up some windows ...23:03
clarkbI bet 08 doesn't fail on the lockfile because it had the system-config ref23:03
fungiit's an interesting theory, but still hard to know for sure it's not missing something else23:04
clarkbya :/23:04
clarkbbut a lock file may prevent replication from succeeding maybe?23:04
fungicertainly possible23:05
*** aaronsheffield has quit IRC23:05
clarkb03 is done, doing 04 now23:06
*** _erlon_ has quit IRC23:07
clarkb04 also had lock problem23:08
clarkbgitea logs when it starts listening on 3000 too23:10
clarkbthough the new health checks should make that a non-issue if we want to enable early?23:10
fungijust wondering if we should take this opportunity to yank several of the problem servers out of rotation and rebuild them in parallel while volume is low23:10
ianwok 08 rebooted, back in rotation and i can't see anything bad in logs23:11
clarkbfungi: to do that the "right" way we have to update system-config which requires working gitea23:11
fungitrue23:11
clarkb04 is back up, doing 05 now23:12
ianwi'll do 0723:12
clarkb05 too had lockfile problems23:13
clarkb(the correlation seems very strong)23:14
fungii guess the "wrong" way would be to put the haproxy server into emergency disable and then manually tweak the config to remove those from pools23:14
fungior use the command socket23:14
clarkbfungi: using the command socket should be safe without emergency updates23:14
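A sketch of what driving that command socket might look like; the socket path and the backend/server names are assumptions (there may be separate pools for http and https) and should be checked against the haproxy config and the gitea docs linked earlier:

    echo "disable server balance_git_https/gitea02.opendev.org" | sudo socat stdio /var/haproxy/run/stats
    echo "enable server balance_git_https/gitea02.opendev.org"  | sudo socat stdio /var/haproxy/run/stats
    echo "show servers state" | sudo socat stdio /var/haproxy/run/stats    # confirm the result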
clarkbthe problem is we can't use the inventory we have to launch nodes until we remove the nodes we want to replace23:14
fungiright23:15
clarkb05 is done. doing 0623:16
ianw07 has the lockfile issue23:17
clarkb06 did too23:17
*** mriedem has quit IRC23:18
clarkbianw: when 07 is happy let me know and I think I'll trigger system-config replication on all the giteas23:18
*** betherly has joined #openstack-infra23:18
clarkbthen we can check if they have all updated, if they have then I think we trigger replication again globally23:18
clarkb(maybe do it one gitea at a time to see if that is faster than all at once?)23:18
fungiyeah, should definitely see if there's any speedup23:19
fungiif it takes roughly an hour to replicate one, then the parallel replication is apparently not buying us any performance increase23:19
*** jamesmcarthur has joined #openstack-infra23:20
fungiand we ought to focus first on replicating to the ones we suspect are broken before the rest23:20
clarkb07 looks up?23:20
fungiwe could even take some/all of the ones we think have stale state out of the haproxy pools in the interim23:21
ianwyep just came back into rotation and seems ok23:21
clarkbalright I'm going to trigger system-config replication to all giteas now23:21
funginot going to try one at a time after all?23:21
clarkbjust system-config23:21
fungioh, right23:21
fungiso with that we can still take a few out of the inventory and replace them while we replicate to the others23:22
*** weifan has joined #openstack-infra23:22
clarkball 8 render the latest commit of system-config now23:22
*** betherly has quit IRC23:23
clarkbfor replication should we do 01 then 03-08 in that order? skipping 02 since it is going to be replaced?23:23
clarkbI'll trigger 01 replication now if so23:23
fungii'd say 01 and 06 first?23:23
fungisince 06 will also not be rebuilt23:23
clarkboh good point23:23
clarkbya 01, 06, 03, 04, 05, 07, 08 in that order23:24
clarkbtriggering 01 now23:24
fungii mean, serially still if you want23:24
clarkbyes serially23:24
*** jamesmcarthur has quit IRC23:24
clarkb01 is in progress now. ~2100 tasks23:25
fungibut yeah, the new servers first, and we could work on replacing 02,03,04 together or something23:25
fungiand then replace 05,07,08 in a second batch23:25
clarkbfungi: we'll need a new change to the inventory if we want to batch them23:25
clarkbat this point we're unlikely to get any of them done today? so maybe we push that up as prep for tomorrow?23:26
fungithat's fine too. i'm willing to work on some server replacements this evening but just as happy to save them for tomorrow when more folks are on hand23:26
fungiand when we're not conflating today's incident with issues we might create with server replacements23:27
clarkbfungi: well I don't want you to feel pressured to do that. I think we'll be ok to limp into tomorrow if these replications work23:27
*** weifan has quit IRC23:27
clarkbI'm going to have to make dinner in the near future: curry too, so won't be able to type and eat :)23:27
fungiahh, yes, let's not get in the way of curry ;)23:28
* fungi is envious23:28
ianw(not something i want to take on while you're all away, not quite across it well enough)23:29
*** armax has joined #openstack-infra23:30
openstackgerritJames E. Blair proposed zuul/zuul master: Parse log file in action module  https://review.opendev.org/67283923:30
*** tjgresha has quit IRC23:31
clarkbalready down to 1100 tasks23:31
clarkbat this rate serializing will be done in ~12 minutes?23:32
clarkb(maybe we should reduce the thread count then)23:32
*** weifan has joined #openstack-infra23:32
fungii also wonder if it just goes faster when nobody's using gerrit23:34
clarkbcould be23:36
*** weifan has quit IRC23:37
fungii basically started the mass replication just when the bulk of our activity was climbing for the day23:38
clarkband time23:38
clarkbabout 14-15 minutes?23:39
clarkbstarting 06 now23:39
clarkb`ssh -p 29418 user@review.openstack.org replication start --url gitea06.opendev.org` is the command I'm running23:39
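To watch a per-host run drain, something like the following could be run alongside it; gerrit show-queue lists pending tasks, and the grep pattern is an assumption about how the replication tasks are labelled in that output:

    watch -n 30 'ssh -p 29418 user@review.openstack.org gerrit show-queue --wide | grep -c gitea06'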
fungiyeah, that was waaaay faster than earlier23:40
*** goldyfruit has quit IRC23:40
*** sshnaidm is now known as sshnaidm|off23:43
*** dchen has joined #openstack-infra23:45
*** dchen has quit IRC23:45
*** dchen has joined #openstack-infra23:46
fungialready more than halfway done with 0623:46
fungiwonder how fast it goes with two at a time23:47
clarkbfungi: I can do 03 and 04 together next23:47
fungithough another possibility is that 01 and 06 are faster than the rest?23:48
clarkbcould be23:48
fungi(on faster storage owing to being created more recently)23:48
clarkbfungi: and proper journal size23:48
fungiyeah23:49
clarkbhowever I think we should do 03 and 04 together for science23:49
fungifor science, yes23:49
*** jamesmcarthur has joined #openstack-infra23:50
clarkb03 and 04 started23:53
*** jamesmcarthur has quit IRC23:55
*** smcginnis has quit IRC23:56
