Friday, 2018-06-08

*** heyongli has quit IRC00:00
*** tosky has quit IRC00:00
*** heyongli has joined #openstack-infra00:00
auristormirror.pypi is currently at 1.738TB00:00
ianwi think we have quite good taste, and the graphite queries are quite powerful.  stupid things like alerts bouncing because you go one byte over are mitigated by sensible queries, like only alerting when average values stay too high for a long period00:00
clarkbwe need to rebuild pypi with the new blacklist in bandersnatch to see the true disk saving00:04
*** felipemonteiro has joined #openstack-infra00:05
clarkbright now we arent adding to it quite as quickly but need to delete the older stuff we dont want00:05
ianwanyway we are better tracking things now00:06
ianwhttp://grafana02.openstack.org/d/lFKIH5Smk/afs?panelId=8&orgId=1&from=now-7d&to=now&tab=general00:06
*** felipemonteiro_ has quit IRC00:07
corvuswe have a year's worth of that data in graphite00:08
corvuser cacti00:08
corvushttp://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=3171&rra_id=all00:08
ianwyeah but not volume data00:08
ianwhttp://grafana02.openstack.org/d/lFKIH5Smk/afs?panelId=10&fullscreen&orgId=1&from=now-7d&to=now00:09
corvusianw: where's the volume data?00:09
auristorI'm just seeing partition data.  I was going to ask about volume00:10
ianwcorvus: where's it come from?00:10
*** heyongli has quit IRC00:10
corvusianw: i feel like you're trying to show me a graph of volume data but i'm not seeing one00:10
ianwthis is the new tracking i've been working on00:10
*** heyongli has joined #openstack-infra00:10
corvusi see partitions here: https://screenshots.firefox.com/rsosIaN3G4csM5ax/grafana02.openstack.org00:10
corvusand this is what i get on the second link: https://screenshots.firefox.com/lSBlYJytN4Tyc1ib/grafana02.openstack.org00:11
auristorNow I see pypi growth00:11
ianwoh, try now.  it's the same problem as the alert i left, i forgot to click save00:11
corvusbetter :)  you might want to set the min-y to 000:11
ianwi'm just fiddling with the graphs.  obviously this needs to be put into grafyaml00:12
corvusi think the volume graphs will be a nice addition :)00:13
ianwwhat i'm hoping i can figure out, and i think grafana 5.x has some features to help with "holding" last values, is if i can see when the last vos release was done in some sane way00:13
ianwit would be very handy to know "debian mirror hasn't released in the last 24 hours"00:13
*** r-daneel has quit IRC00:14
corvusianw: grafana and graphite support events; if you use that, the graphs can have little annotations on them00:14
corvusianw: or you could report the last release time as a gauge with the unix timestamp as a value, and have a graph perform math on it00:15
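For reference, the Graphite events mechanism corvus mentions is just an HTTP POST, and Grafana can overlay those events as annotations on the AFS graphs. A minimal sketch of recording a "vos release" that way — the hostname, tag names and payload here are illustrative assumptions, not the production configuration:

```python
# Record a "vos release" as a Graphite event so Grafana can annotate the
# AFS graphs.  URL, tags and payload are assumptions for illustration only.
import time
import requests

def record_release_event(volume, graphite="http://graphite.openstack.org"):
    event = {
        "what": "vos release %s" % volume,
        # older graphite-web expects a space-separated tag string,
        # graphite 1.1+ expects a list of tags
        "tags": "afs vos-release",
        "when": int(time.time()),
        "data": "released by the mirror-update cron job",
    }
    # graphite-web exposes an /events/ endpoint that accepts JSON POSTs
    requests.post(graphite + "/events/", json=event, timeout=10)

record_release_event("mirror.debian")
```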
clarkbif we did want to replace the pypi contents, is deleting what is there and then rebuilding it reasonable (since we can wait to vos release), or will the delete take forever?00:16
clarkbI guess another option is to use a new volume and switch to that then delete the old00:16
ianwcorvus: yeah, gauge idea was roughly what i was thinking, but i also like stamping it as events.  i haven't looked but i'm not sure about getting the last release time post-fact from "vos list" type tools00:17
corvusianw: the docs release cron job does it00:18
auristorif openstack.org was auristorfs then "vos splitvolume vol-name relative-path" would be the answer.00:19
corvusianw: oh, hrm, no that's last update it does00:19
ianwsends a stat?  i'd like to incorporate it into the server/partition/volume polling if possible for overall consistency00:19
corvusianw: no, i mean parsing vos examine00:19
*** heyongli has quit IRC00:20
*** heyongli has joined #openstack-infra00:20
ianwlistvol says "last update"00:21
corvusah, the last update time of the readonly volume is the last time it was released00:21
auristorvos examine mirror.pypi.readonly and then take the Creation time.   That is the time the .readonly was last created.00:21
corvuseven better :)00:21
auristorThe Last Update time is the most recent update of the RW before the release.00:22
corvusi guess last update would be the last file change to the read-write volume before the release, then create... yeah that :)00:22
auristorcompare the Last Update time of the RW to the RO to determine if a release is required00:23
corvusthat's what the docs release script does here: https://git.openstack.org/cgit/openstack-infra/system-config/tree/modules/openstack_project/files/openafs/release-volumes.py#n100:23
ianwauristor: in this case what we're mostly interested in is if the release happened.  because if it didn't, that means the mirroring job failed00:24
ianwwhich is the "real" problem00:24
ianwi.e. reprepro corrupted itself again00:24
corvusianw: another approach would be to report how long ago the release happened as a gauge.  you're doing the math ahead of time there, so that makes some things easier.00:25
ianwanyway, good info, and step 1 for me is to grab the creation date anyway, as i'm not picking that up right now00:27
*** vtapia has quit IRC00:27
ianwthen i think some gnarly graphite queries can probably collate things00:27
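A rough sketch of the gauge idea discussed above: take the Creation time of the .readonly volume (per auristor, the time of the last successful vos release) and report "seconds since last release" as a gauge, so a simple graph threshold can catch "debian mirror hasn't released in the last 24 hours". The statsd prefix, key naming and the exact vos examine date format are assumptions, not what afsmon actually does:

```python
# Report seconds-since-last-release of an AFS volume as a statsd gauge.
# Parsing details of "vos examine" output are assumed for illustration.
import subprocess
import time
from datetime import datetime

import statsd  # assumes the "statsd" client library is installed

def seconds_since_release(volume):
    out = subprocess.check_output(
        ['vos', 'examine', '%s.readonly' % volume]).decode('utf-8')
    for line in out.splitlines():
        line = line.strip()
        if line.startswith('Creation'):
            # e.g. "Creation    Fri Jun 08 05:30:12 2018"
            created = datetime.strptime(line.split(None, 1)[1],
                                        '%a %b %d %H:%M:%S %Y')
            return int(time.time() - created.timestamp())
    return None

def report_release_age(volume):
    age = seconds_since_release(volume)
    if age is not None:
        client = statsd.StatsClient('graphite.openstack.org', 8125,
                                    prefix='afs.release_age')
        client.gauge(volume.replace('.', '_'), age)

report_release_age('mirror.debian')
```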
*** heyongli has quit IRC00:30
*** heyongli has joined #openstack-infra00:31
*** rwsu has quit IRC00:32
*** claudiub has quit IRC00:33
*** yamamoto has joined #openstack-infra00:34
*** claudiub has joined #openstack-infra00:35
*** yamamoto has quit IRC00:39
*** shardy has quit IRC00:41
*** heyongli has quit IRC00:41
*** heyongli has joined #openstack-infra00:41
*** shardy has joined #openstack-infra00:41
*** felipemonteiro_ has joined #openstack-infra00:42
*** hongbin has joined #openstack-infra00:43
*** felipemonteiro has quit IRC00:43
openstackgerritIan Wienand proposed openstack-infra/afsmon master: Add .gitreview  https://review.openstack.org/57347100:47
*** heyongli has quit IRC00:51
openstackgerritIan Wienand proposed openstack-infra/afsmon master: Add .gitreview  https://review.openstack.org/57347100:51
openstackgerritIan Wienand proposed openstack-infra/afsmon master: Add basic zuul jobs  https://review.openstack.org/57347200:51
*** heyongli has joined #openstack-infra00:51
*** masayukig has quit IRC00:52
*** rlandy|rover has quit IRC00:52
*** masayukig has joined #openstack-infra00:53
*** rfolco has quit IRC00:54
auristorianw: for the volume disk usage graph it might be useful to plot the volume quota in addition to disk usage00:54
auristorianw: and partition free space00:54
*** rwsu has joined #openstack-infra00:56
*** dhill_ has quit IRC00:56
ianwauristor: yep i have that.  you can see what i'm sending in at http://graphite.openstack.org/ metrics->stats->gauges->afs00:57
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: scheduler: add job's parent name to the rpc job_list method  https://review.openstack.org/57347301:01
*** heyongli has quit IRC01:01
*** heyongli has joined #openstack-infra01:01
*** r-daneel has joined #openstack-infra01:04
*** vtapia has joined #openstack-infra01:04
*** gyee has quit IRC01:05
ianwauristor: it's slightly less dramatic when graphed to zero against the quota01:07
ianwhttp://grafana02.openstack.org/d/lFKIH5Smk/afs?panelId=10&fullscreen&orgId=1&from=now-7d&to=now01:07
ianwthat's with an alert "average usage > 90% for more than 24 hours" set up on it01:07
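For context, that kind of alert stays simple if the poller sends usage and quota (and optionally a precomputed percentage) as separate gauges. A sketch along those lines, with made-up key names rather than afsmon's actual metric layout:

```python
# Send per-volume used/quota/percent gauges so the grafana alert can be a
# plain threshold.  Key names and prefix are illustrative assumptions.
import statsd  # assumes the "statsd" client library is installed

def report_volume(volume, used_kb, quota_kb, host='graphite.openstack.org'):
    client = statsd.StatsClient(host, 8125, prefix='afs.volume')
    key = volume.replace('.', '_')
    client.gauge('%s.used' % key, used_kb)
    client.gauge('%s.quota' % key, quota_kb)
    if quota_kb:
        # the alert then just needs:
        #   avg of afs.volume.<vol>.percent_used over 24h > 90
        client.gauge('%s.percent_used' % key, 100.0 * used_kb / quota_kb)

# e.g. mirror.pypi at ~1.738TB used against a hypothetical 2TB quota
report_volume('mirror.pypi', used_kb=1738 * 1024 * 1024, quota_kb=2 * 1024 ** 3)
```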
*** heyongli has quit IRC01:11
*** heyongli has joined #openstack-infra01:12
openstackgerritMerged openstack-infra/afsmon master: Add basic zuul jobs  https://review.openstack.org/57347201:18
openstackgerritMerged openstack-infra/afsmon master: Add .gitreview  https://review.openstack.org/57347101:18
*** yamahata has quit IRC01:18
*** heyongli has quit IRC01:22
*** heyongli has joined #openstack-infra01:22
*** mriedem_afk is now known as mriedem01:23
*** namnh has joined #openstack-infra01:25
*** jesslampe has quit IRC01:31
*** jesslampe has joined #openstack-infra01:31
*** jesslampe has quit IRC01:31
*** jesslampe has joined #openstack-infra01:32
*** heyongli has quit IRC01:32
*** jesslampe has quit IRC01:32
*** heyongli has joined #openstack-infra01:32
*** jesslampe has joined #openstack-infra01:32
*** jesslampe has quit IRC01:33
*** jesslampe has joined #openstack-infra01:34
*** mriedem is now known as mriedem_afk01:37
*** zhangfei has joined #openstack-infra01:42
*** heyongli has quit IRC01:42
*** heyongli has joined #openstack-infra01:42
*** heyongli has quit IRC01:52
*** heyongli has joined #openstack-infra01:53
*** mriedem_afk is now known as mriedem01:54
*** VW has joined #openstack-infra01:54
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/job/{job_name} route  https://review.openstack.org/55097802:02
*** heyongli has quit IRC02:03
*** heyongli has joined #openstack-infra02:03
*** owalsh_ has joined #openstack-infra02:08
*** VW has quit IRC02:09
*** VW has joined #openstack-infra02:09
*** mriedem has quit IRC02:10
*** owalsh has quit IRC02:12
*** heyongli has quit IRC02:13
*** heyongli has joined #openstack-infra02:13
*** VW has quit IRC02:14
*** lifeless_ has joined #openstack-infra02:22
*** heyongli has quit IRC02:23
*** heyongli has joined #openstack-infra02:23
*** lifeless has quit IRC02:23
*** rh-jelabarre has quit IRC02:25
*** hongbin has quit IRC02:26
*** rh-jelabarre has joined #openstack-infra02:30
*** heyongli has quit IRC02:33
*** heyongli has joined #openstack-infra02:34
*** psachin has joined #openstack-infra02:35
openstackgerritIan Wienand proposed openstack-infra/afsmon master: Run pep8  https://review.openstack.org/57348302:41
openstackgerritIan Wienand proposed openstack-infra/afsmon master: Add creation date, report RO volumes  https://review.openstack.org/57348402:41
*** heyongli has quit IRC02:44
*** heyongli has joined #openstack-infra02:44
openstackgerritIan Wienand proposed openstack-infra/afsmon master: Run pep8  https://review.openstack.org/57348302:45
openstackgerritIan Wienand proposed openstack-infra/afsmon master: Add creation date, report RO volumes  https://review.openstack.org/57348402:45
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/projects and /{tenant}/project/{project} routes  https://review.openstack.org/55097902:48
openstackgerritIan Wienand proposed openstack-infra/afsmon master: Add empty bindep.txt  https://review.openstack.org/57348602:49
*** jcoufal has joined #openstack-infra02:49
openstackgerritIan Wienand proposed openstack-infra/afsmon master: Add empty bindep.txt  https://review.openstack.org/57348602:50
openstackgerritIan Wienand proposed openstack-infra/afsmon master: Run pep8  https://review.openstack.org/57348302:50
openstackgerritIan Wienand proposed openstack-infra/afsmon master: Add creation date, report RO volumes  https://review.openstack.org/57348402:50
*** jcoufal has quit IRC02:54
*** heyongli has quit IRC02:54
*** heyongli has joined #openstack-infra02:54
*** rosmaita has quit IRC02:56
openstackgerritMerged openstack-infra/afsmon master: Add empty bindep.txt  https://review.openstack.org/57348602:58
*** markvoelker has quit IRC03:00
*** markvoelker has joined #openstack-infra03:02
*** heyongli has quit IRC03:04
*** heyongli has joined #openstack-infra03:04
*** ramishra has joined #openstack-infra03:06
*** markvoelker has quit IRC03:07
*** markvoelker has joined #openstack-infra03:12
*** heyongli has quit IRC03:14
*** heyongli has joined #openstack-infra03:15
openstackgerritIan Wienand proposed openstack-infra/afsmon master: Add creation date, report RO volumes  https://review.openstack.org/57348403:16
openstackgerritMerged openstack-infra/afsmon master: Run pep8  https://review.openstack.org/57348303:21
*** felipemonteiro has joined #openstack-infra03:24
*** heyongli has quit IRC03:25
*** heyongli has joined #openstack-infra03:25
*** yamamoto has joined #openstack-infra03:26
*** felipemonteiro_ has quit IRC03:26
*** yamamoto has quit IRC03:26
openstackgerritIan Wienand proposed openstack-infra/system-config master: mirror-update: install afsmon and run from cron  https://review.openstack.org/57349303:27
*** Bhujay has joined #openstack-infra03:28
openstackgerritIan Wienand proposed openstack-infra/system-config master: mirror-update: install afsmon and run from cron  https://review.openstack.org/57349303:29
mnaseri'm seeing some packet loss and high latency to zuul.o.o03:31
mnaseranyone see the same?03:31
*** yamamoto has joined #openstack-infra03:31
mnaserhttp://paste.openstack.org/show/722942/03:32
mnasermtr shows packet loss at zayo03:33
clarkbmy connection is also over zayo but not seeing loss (ipv6)03:33
mnaserhttp://paste.openstack.org/show/722944/03:34
mnaserhttps://tranzact.zayo.com/#!/networkStatus03:34
*** heyongli has quit IRC03:35
*** heyongli has joined #openstack-infra03:35
clarkbmy path went through ord03:38
*** tpsilva has quit IRC03:41
mnaseri guess it is the nyc fiber issue causing congestion03:42
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: angular6 fix attempt  https://review.openstack.org/57349403:43
*** dave-mcc_ has quit IRC03:44
*** heyongli has quit IRC03:45
*** lpetrut has joined #openstack-infra03:45
*** heyongli has joined #openstack-infra03:45
*** udesale has joined #openstack-infra03:48
openstackgerritIan Wienand proposed openstack-infra/system-config master: mirror-update: install afsmon and run from cron  https://review.openstack.org/57349303:52
*** germs has joined #openstack-infra03:54
*** germs has quit IRC03:54
*** germs has joined #openstack-infra03:54
*** heyongli has quit IRC03:55
*** heyongli has joined #openstack-infra03:56
openstackgerritIan Wienand proposed openstack-infra/system-config master: mirror-update: install afsmon and run from cron  https://review.openstack.org/57349303:58
*** felipemonteiro has quit IRC04:03
*** heyongli has quit IRC04:06
*** heyongli has joined #openstack-infra04:06
*** rh-jelabarre has quit IRC04:14
*** heyongli has quit IRC04:16
*** heyongli has joined #openstack-infra04:16
*** germs has quit IRC04:22
*** rh-jelabarre has joined #openstack-infra04:23
*** lpetrut has quit IRC04:23
*** heyongli has quit IRC04:26
*** heyongli has joined #openstack-infra04:26
openstackgerritArtem Goncharov proposed openstack-infra/zuul master: fill `delta`, `start`, `end` for skipped `creates` and `removes` command.  https://review.openstack.org/56786404:33
*** heyongli has quit IRC04:36
*** heyongli has joined #openstack-infra04:37
*** pgadiya has joined #openstack-infra04:37
*** pgadiya has quit IRC04:37
*** links has joined #openstack-infra04:41
*** ianychoi has quit IRC04:44
*** heyongli has quit IRC04:47
*** heyongli has joined #openstack-infra04:47
*** heyongli has quit IRC04:57
*** heyongli has joined #openstack-infra04:57
*** heyongli has quit IRC05:07
*** heyongli has joined #openstack-infra05:07
*** aeng has quit IRC05:10
*** heyongli has quit IRC05:17
*** heyongli has joined #openstack-infra05:18
*** Bhujay has quit IRC05:19
*** lifeless_ has quit IRC05:26
*** lifeless has joined #openstack-infra05:26
*** heyongli has quit IRC05:28
*** heyongli has joined #openstack-infra05:28
openstackgerritIan Wienand proposed openstack-infra/grafyaml master: Add transparent  https://review.openstack.org/57352705:31
ianwumm, i'm pretty sure the jobs launched per hour has dropped to zero05:32
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Refactor load sensors into drivers  https://review.openstack.org/54927505:33
openstackgerritTobias Henkel proposed openstack-infra/zuul master: WIP: Add cgroup support to ram sensor  https://review.openstack.org/54950605:33
ianwcorvus: the type of thing it would be good to alert for ;)05:33
ianwthere's a lot of05:34
ianw2018-06-08 04:33:45,465 DEBUG zuul.RPCListener: Received job zuul:status_get05:34
ianw2018-06-08 04:33:45,559 DEBUG zuul.RPCListener: Received job zuul:tenant_sql_connection05:34
ianwin the logs05:34
ianwlaunchers seem to be running, but seem to have nothing to do05:37
*** heyongli has quit IRC05:38
*** heyongli has joined #openstack-infra05:38
ianwi can not see any smoking guns in the scheduler logs05:39
ianwthe executors all seem up05:40
ianwit's like it has stopped receiving events from gerrit05:41
amotokiall finished jobs continue to remain in check/gate queues too05:41
*** gfidente has joined #openstack-infra05:42
ianwi wonder, with prior reports of packet drops, if we did somehow drop the connection?05:43
ianwgerrit memory usage @11,461 Mb / 43,691 Mb05:44
ianwso stable, cpu load is no different than usual05:44
ianwi'm going to restart zuul-scheduler05:45
*** heyongli has quit IRC05:48
*** heyongli has joined #openstack-infra05:48
ianwok, i think we are starting to see things come in again05:53
ianwok, i have re-queued what was in the gate, but my strong suspicion here is that events from gerrit were not coming through05:55
ianwhence i can't requeue what wasn't in the queue05:56
Tenguerf05:56
*** iranzo has joined #openstack-infra05:57
*** pcaruana has joined #openstack-infra05:57
*** heyongli has quit IRC05:58
*** heyongli has joined #openstack-infra05:59
ianw#status notice Zuul stopped receiving gerrit events around 04:00UTC; any changes submitted between then and now will probably require a "recheck" comment to be requeued.  Thanks!05:59
openstackstatusianw: sending notice05:59
-openstackstatus- NOTICE: Zuul stopped receiving gerrit events around 04:00UTC; any changes submitted between then and now will probably require a "recheck" comment to be requeued. Thanks!06:01
ianwinfra-root: ^ that's my best guess :/  with that suggestion log spelunking might show up something more useful.  i couldn't see any obvious errors or exceptions06:02
openstackstatusianw: finished sending notice06:03
ianwi'm afk for a while, will check later06:03
*** armaan has joined #openstack-infra06:04
AJaegerthanks, ianw !06:05
ianwcorvus / auristor : in other news, the dashboard is looking about how i'd like it now -> http://grafana02.openstack.org/d/ACtl1JSmz/afs?orgId=106:06
openstackgerritIan Wienand proposed openstack-infra/project-config master: Fix up AFS dashboard  https://review.openstack.org/57353706:07
*** e0ne has joined #openstack-infra06:09
*** heyongli has quit IRC06:09
*** heyongli has joined #openstack-infra06:09
*** pcaruana has quit IRC06:14
*** pcaruana has joined #openstack-infra06:15
*** Bhujay has joined #openstack-infra06:16
*** heyongli has quit IRC06:19
*** heyongli has joined #openstack-infra06:19
*** armaan has quit IRC06:20
*** armaan has joined #openstack-infra06:20
*** cshastri has joined #openstack-infra06:22
*** lpetrut has joined #openstack-infra06:22
*** jbadiapa has joined #openstack-infra06:22
*** dhajare has quit IRC06:24
openstackgerritOpenStack Proposal Bot proposed openstack-infra/project-config master: Normalize projects.yaml  https://review.openstack.org/57359206:27
*** heyongli has quit IRC06:29
*** heyongli has joined #openstack-infra06:29
*** dhajare has joined #openstack-infra06:39
*** heyongli has quit IRC06:39
*** heyongli has joined #openstack-infra06:40
*** zoli has quit IRC06:41
*** dulek has quit IRC06:41
*** Bhujay has quit IRC06:43
*** dklyle has quit IRC06:44
*** e0ne has quit IRC06:44
*** armaan has quit IRC06:44
*** armaan has joined #openstack-infra06:45
*** zoli has joined #openstack-infra06:48
*** heyongli has quit IRC06:50
*** heyongli has joined #openstack-infra06:50
*** alexchadin has joined #openstack-infra06:56
*** dhajare has quit IRC06:57
*** jaosorior has quit IRC06:58
*** caphrim007_ has quit IRC06:59
*** caphrim007 has joined #openstack-infra07:00
*** heyongli has quit IRC07:00
*** heyongli has joined #openstack-infra07:00
*** Bhujay has joined #openstack-infra07:02
*** d0ugal has joined #openstack-infra07:02
*** hashar has joined #openstack-infra07:03
*** dhajare has joined #openstack-infra07:09
*** heyongli has quit IRC07:10
*** heyongli has joined #openstack-infra07:10
*** rcernin has quit IRC07:13
*** ramishra has quit IRC07:14
*** diablo_rojo has joined #openstack-infra07:16
*** slaweq has joined #openstack-infra07:17
*** jesslampe has quit IRC07:17
*** jesslampe has joined #openstack-infra07:18
*** jesslampe has quit IRC07:18
*** jesslampe has joined #openstack-infra07:18
*** heyongli has quit IRC07:20
*** heyongli has joined #openstack-infra07:21
*** amoralej|off is now known as amoralej07:23
*** jesslampe has quit IRC07:23
*** jesslampe has joined #openstack-infra07:23
*** jesslampe has quit IRC07:24
*** jesslampe has joined #openstack-infra07:24
*** jesslampe has quit IRC07:25
*** jesslampe has joined #openstack-infra07:25
*** jistr is now known as jistr|reloc07:25
*** jesslampe has quit IRC07:25
*** jesslampe has joined #openstack-infra07:26
*** e0ne has joined #openstack-infra07:26
*** jesslampe has quit IRC07:26
*** jesslampe has joined #openstack-infra07:26
*** jesslampe has quit IRC07:27
*** jesslampe has joined #openstack-infra07:27
*** jesslampe has quit IRC07:28
openstackgerritArtem Goncharov proposed openstack-infra/project-config master: Add openstack-service-broker project  https://review.openstack.org/57345907:29
*** tosky has joined #openstack-infra07:30
openstackgerritIan Wienand proposed openstack-infra/system-config master: mirror-update: install afsmon and run from cron  https://review.openstack.org/57349307:30
*** heyongli has quit IRC07:31
*** heyongli has joined #openstack-infra07:31
*** hongbin has joined #openstack-infra07:32
*** jbadiapa has quit IRC07:32
openstackgerritArtem Goncharov proposed openstack-infra/nodepool master: Use openstacksdk instead of os-client-config  https://review.openstack.org/56615807:33
*** aojea has joined #openstack-infra07:33
*** jbadiapa has joined #openstack-infra07:33
*** hongbin has quit IRC07:35
*** bauzas is now known as PapaOurs07:38
*** lyarwood is now known as lyaaaaaaaarwood07:38
*** annp has quit IRC07:39
*** salv-orlando has joined #openstack-infra07:41
*** heyongli has quit IRC07:41
*** heyongli has joined #openstack-infra07:41
*** jcoufal has joined #openstack-infra07:42
*** ramishra has joined #openstack-infra07:44
*** salv-orlando has quit IRC07:46
*** amotoki has quit IRC07:47
*** jcoufal has quit IRC07:47
*** jesslampe has joined #openstack-infra07:47
*** roman_g has joined #openstack-infra07:47
*** amotoki has joined #openstack-infra07:48
*** jesslampe has quit IRC07:48
*** rcernin has joined #openstack-infra07:48
*** heyongli has quit IRC07:51
*** heyongli has joined #openstack-infra07:51
*** rwsu has quit IRC07:52
*** jpena|off is now known as jpena07:53
hwoarangianw: new leap-150 build (478) is working fine! awesome! however, 477 is still present on various nl0* so can you wipe that because every other build hits the problematic nodes i think07:54
*** dulek has joined #openstack-infra07:54
*** Bhujay has quit IRC07:57
*** shardy has quit IRC07:59
*** shardy has joined #openstack-infra08:01
*** heyongli has quit IRC08:01
*** heyongli has joined #openstack-infra08:02
*** Adri2000 has quit IRC08:03
*** rwsu has joined #openstack-infra08:05
*** jpich has joined #openstack-infra08:05
*** dklyle has joined #openstack-infra08:06
*** yamamoto has quit IRC08:08
*** lifeless has quit IRC08:09
*** lifeless has joined #openstack-infra08:10
*** ramishra has quit IRC08:11
*** Adri2000 has joined #openstack-infra08:11
*** heyongli has quit IRC08:12
*** heyongli has joined #openstack-infra08:12
*** ramishra has joined #openstack-infra08:13
*** shardy has quit IRC08:13
*** annp has joined #openstack-infra08:14
*** slaweq has quit IRC08:14
*** slaweq has joined #openstack-infra08:14
*** slaweq has quit IRC08:14
*** slaweq has joined #openstack-infra08:15
*** alexchadin has quit IRC08:16
openstackgerritMerged openstack-infra/project-config master: Normalize projects.yaml  https://review.openstack.org/57359208:17
*** jistr|reloc is now known as jistr08:19
*** heyongli has quit IRC08:22
*** yamamoto has joined #openstack-infra08:22
*** heyongli has joined #openstack-infra08:22
*** slaweq has quit IRC08:25
*** slaweq has joined #openstack-infra08:25
*** alexchadin has joined #openstack-infra08:27
*** shardy has joined #openstack-infra08:31
*** heyongli has quit IRC08:32
*** heyongli has joined #openstack-infra08:33
ianwhwoarang: hmm, it shouldn't be picking that up if it's not the most recent08:33
ianwunless some providers are out of sync08:33
hwoarangah i saw jobs queuing a lot on opensuse-150  so i thought they were just hitting a bad node.08:35
hwoarangin the end, a node was found and the job worked fine.08:35
hwoarangso lets say it's working then08:35
ianwyeah, i deleted it anyway since it doesn't work and i was in there08:36
*** markvoelker has quit IRC08:37
hwoarangthank you08:37
*** markvoelker has joined #openstack-infra08:38
*** derekh has joined #openstack-infra08:41
*** salv-orlando has joined #openstack-infra08:42
*** markvoelker has quit IRC08:42
*** heyongli has quit IRC08:42
*** heyongli has joined #openstack-infra08:43
*** salv-orlando has quit IRC08:46
*** heyongli has quit IRC08:53
*** heyongli has joined #openstack-infra08:53
*** d0ugal_ has joined #openstack-infra09:00
*** d0ugal has quit IRC09:00
*** d0ugal_ has quit IRC09:00
*** d0ugal has joined #openstack-infra09:01
*** lifeless_ has joined #openstack-infra09:02
*** lifeless has quit IRC09:03
*** heyongli has quit IRC09:03
*** heyongli has joined #openstack-infra09:03
openstackgerritOlivier Bourdon proposed openstack/diskimage-builder master: Fix CentOS image build failure when dib runs on debian based system  https://review.openstack.org/55948509:06
*** dtantsur|afk is now known as dtantsur09:09
openstackgerritMasayuki Igawa proposed openstack/os-testr master: Deprecate ostestr command  https://review.openstack.org/57363609:12
*** heyongli has quit IRC09:13
*** lifeless_ is now known as lifeless09:13
*** heyongli has joined #openstack-infra09:13
*** sambetts|afk is now known as sambetts09:17
*** alexchadin has quit IRC09:17
*** ramishra has quit IRC09:17
ltomasboping AJaeger: again gate issue for another patch https://review.openstack.org/#/c/56414809:17
ltomasboAJaeger, I rebased it and I see some jobs on Zuul (not yet failing) with the message: merger_failure09:18
*** ramishra has joined #openstack-infra09:19
*** heyongli has quit IRC09:23
*** heyongli has joined #openstack-infra09:24
*** edmondsw has joined #openstack-infra09:26
*** alexchadin has joined #openstack-infra09:28
*** edmondsw has quit IRC09:30
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Don't use GRANT to create new MySQL users  https://review.openstack.org/57364109:30
*** lifeless_ has joined #openstack-infra09:30
*** lifeless has quit IRC09:32
*** owalsh_ is now known as owalsh09:33
*** heyongli has quit IRC09:34
*** heyongli has joined #openstack-infra09:34
*** jaosorior has joined #openstack-infra09:36
*** dtantsur is now known as dtantsur|brb09:38
*** pbourke has quit IRC09:42
*** pbourke has joined #openstack-infra09:42
*** salv-orlando has joined #openstack-infra09:42
*** heyongli has quit IRC09:44
fricklerltomasbo: infra-root: I'm seeing MERGER_FAILURE for lots of jobs, but haven't found a cause for it yet. might be related to the network issues mentioned earlier09:44
*** heyongli has joined #openstack-infra09:44
ltomasbofrickler, ahh ok, I was not aware! thanks!09:45
*** salv-orlando has quit IRC09:46
jokke_gerrit seems to be belly up as well09:47
ianwfrickler / ltomasbo : that job went in around 06:3009:51
ianw2018-06-08 06:30:54,012 DEBUG zuul.Pipeline.openstack.check: Scheduling merge for item <QueueItem 0x7fadc2296ef0 for <Change 0x7fadccebd2e8 564148,15>09:51
ianwat about the same time in the merger09:51
ianw2018-06-08 06:26:23,180  ... git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)09:52
ianw  cmdline: git fetch origin refs/pull/41155/head09:52
ianw  stderr: 'error: RPC failed; curl 18 transfer closed with outstanding read data remaining09:52
ianwa different change, but clearly about that time there were some networking issues09:52
ianwi think it's in the same realm of issues09:52
ltomasboianw, so, should I recheck it?09:52
ianwltomasbo: i probably would.  i don't have time to track that change to the exact merger that picked it up, but i would think it's likely the same sort of thing09:53
ltomasbook09:53
ltomasbothanks!09:53
*** evrardjp has quit IRC09:53
ianwfrickler: my examination of review.o.o & gerrit seemed to show it was roughly ok, but maybe check again?09:54
ianwwe don't have any email from rax about known networking issues at this point09:54
*** heyongli has quit IRC09:54
*** heyongli has joined #openstack-infra09:54
*** evrardjp has joined #openstack-infra09:56
*** lifeless has joined #openstack-infra09:58
*** lifeless_ has quit IRC09:58
*** evrardjp_ has joined #openstack-infra10:01
*** evrardjp has quit IRC10:01
*** namnh has quit IRC10:03
*** alexchadin has quit IRC10:03
*** heyongli has quit IRC10:04
*** heyongli has joined #openstack-infra10:05
*** evrardjp_ has quit IRC10:05
*** alexchadin has joined #openstack-infra10:06
*** evrardjp has joined #openstack-infra10:06
*** annp has quit IRC10:09
*** heyongli has quit IRC10:15
*** heyongli has joined #openstack-infra10:15
*** lifeless_ has joined #openstack-infra10:23
*** lifeless has quit IRC10:24
*** vivsoni_ has joined #openstack-infra10:24
*** vivsoni has quit IRC10:24
*** heyongli has quit IRC10:25
*** heyongli has joined #openstack-infra10:25
*** udesale_ has joined #openstack-infra10:27
*** udesale__ has joined #openstack-infra10:28
*** udesale has quit IRC10:29
*** udesale_ has quit IRC10:31
*** dtantsur|brb is now known as dtantsur10:31
*** boden has joined #openstack-infra10:35
*** heyongli has quit IRC10:35
*** heyongli has joined #openstack-infra10:35
*** rcernin has quit IRC10:37
*** slaweq has quit IRC10:38
*** slaweq_ has joined #openstack-infra10:38
*** markvoelker has joined #openstack-infra10:38
*** lifeless_ has quit IRC10:39
*** stephenfin is now known as finucannot10:44
*** heyongli has quit IRC10:45
*** shardy has quit IRC10:46
*** heyongli has joined #openstack-infra10:46
*** quiquell has joined #openstack-infra10:47
quiquellHello10:47
quiquellI see some "MERGER_FAILURE" here https://review.openstack.org/#/c/572096/10:47
quiquellWhat it means ?10:48
*** shardy has joined #openstack-infra10:48
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Add --check-config option to zuul scheduler  https://review.openstack.org/54216010:55
*** lifeless has joined #openstack-infra10:55
*** heyongli has quit IRC10:56
*** heyongli has joined #openstack-infra10:56
openstackgerritMerged openstack-dev/cookiecutter master: Fix quotes and undefined variable  https://review.openstack.org/56975510:57
Tenguquiquell: have the same on one of my reviews: https://review.openstack.org/57091311:01
Tengufunky part: it's not shown in the table - have to dig in the mail/messages.11:02
Tengumaybe it's a new bunch of tests?11:02
*** heyongli has quit IRC11:06
*** heyongli has joined #openstack-infra11:06
fricklerTengu: quiquell: it seems there were some issues earlier, not sure yet what the exact reason was, but hopefully it will be better when you do a recheck11:07
*** elod has joined #openstack-infra11:08
Tengufrickler: ok :).11:08
Tenguas it's not an urgent thing, I'll leave it as-is until my next refresh round.11:09
*** jpena is now known as jpena|lunch11:09
*** vtapia has quit IRC11:11
*** alexchadin has quit IRC11:11
*** zoli is now known as zoli|afk11:13
*** markvoelker has quit IRC11:13
*** heyongli has quit IRC11:16
*** heyongli has joined #openstack-infra11:16
auristorianw: the dashboard is looking good11:19
*** shardy has quit IRC11:19
*** dhajare has quit IRC11:19
*** shardy has joined #openstack-infra11:20
*** heyongli has quit IRC11:26
*** heyongli has joined #openstack-infra11:27
*** auristor has quit IRC11:27
fricklerianw: still seeing sporadic failures on jobs less than 2h old, I'll go through all the zm* now and check their logs11:28
*** udesale__ has quit IRC11:29
*** Bhujay has joined #openstack-infra11:33
*** auristor has joined #openstack-infra11:35
*** ldnunes has joined #openstack-infra11:35
*** heyongli has quit IRC11:37
*** heyongli has joined #openstack-infra11:37
*** vtapia has joined #openstack-infra11:37
openstackgerritMerged openstack-dev/pbr master: Add leading 0 on alpha release in semver doc  https://review.openstack.org/55818111:37
ianwfrickler: hmm, example?11:38
ianwok 570913 ...11:39
ianw2018-06-08 07:18:05,671 DEBUG zuul.Merger: Processing ref refs/changes/13/570913/5 for project gerrit/openstack/tripleo-quickstart-extras / master uuid 99a6605fba0a4f89b955d54dda2157f011:39
*** jcoufal has joined #openstack-infra11:40
ianw2018-06-08 07:18:06,190 DEBUG zuul.Repo: Checking out 90ab17980bc793b4920bc72f4d3a318442d3bd1b ... that's the right hash11:40
*** alexchadin has joined #openstack-infra11:42
*** vtapia has quit IRC11:43
ianw2018-06-08 07:18:06,896 INFO zuul.MergeClient: Merge <gear.Job 0x7fadcd3a50b8 handle: b'H:127.0.0.1:22420' name: merger:merge unique: e2309f8ba26f4a748da647a8f87e9dd0> complete, merged: True, updated: False, commit: 4bcef77e29658cf512f2ca4194641928573389f911:43
ianwzuul gets response back11:43
*** dklyle has quit IRC11:46
*** rosmaita has joined #openstack-infra11:46
ianwok, here's the error on the executor11:47
ianw2018-06-08 11:43:47,985 ERROR zuul.AnsibleJob: [build: f197ad36a1904208894f04b10269cbf7] Retry 1: Fetch /var/lib/zuul/builds/f197ad36a1904208894f04b10269cbf7/work/src/git.openstack.org/openstack/airship-drydock origin None11:47
*** heyongli has quit IRC11:47
*** heyongli has joined #openstack-infra11:47
fricklerianw: which executor is that?11:52
ianwif you look for that "retry [1|2]" it's basically across all the executors11:52
ianwno i tell a lie11:53
ianwze01 ze07 ze09 ze10 do *not* have that in the logs at all11:53
ianwze02 ze03 ze04 ze05 ze06 ze08 *do have* the retry messages11:54
ianwso ... something different between the two sets?11:54
fricklerthe ones I looked at all seem to have "ssh: Could not resolve hostname review.openstack.org: Temporary failure in name resolution" as reason11:55
ianwbut yeah, that fetch failure was from /var/lib ?11:56
*** dklyle has joined #openstack-infra11:56
*** rfolco has joined #openstack-infra11:56
fricklerhmm unbound seems dead. on ze02 at least:  active (exited) since Fri 2018-06-08 06:39:21 UTC; 5h 17min ago11:56
*** heyongli has quit IRC11:57
fungithat timing seems suspiciously close to when system cronjobs fire11:57
*** heyongli has joined #openstack-infra11:57
fricklerafter a restart it seems to be working again11:58
fricklerthat is what the journal says: http://paste.openstack.org/show/722969/11:58
fungiyeah, not running on ze03 either11:58
fungithis may be the difference between the broken and working executors11:59
*** vtapia has joined #openstack-infra11:59
ianwfrickler: good eye!11:59
ianwnot running on 3 5 8 6 412:00
ianwit is running on 2 ... maybe we restarted it.  but it matches exactly12:00
ianwotherwise12:00
fricklerianw: yeah, I just restarted it on 2, that matches completely12:00
ianwthat's what i call a smoking gun! :)12:01
*** dklyle has quit IRC12:02
*** jpena|lunch is now known as jpena12:02
fungilooks like the unbound restarts were related to unattended upgrades12:02
fungilooking at /var/log/dpkg.log12:03
openstackgerritIan Wienand proposed openstack-infra/system-config master: mirror-update: install afsmon and run from cron  https://review.openstack.org/57349312:05
fungiyeah, the problem executors mention unbound upgrades in the current /var/log/dpkg.log and the executors which were still working have no mention of unbound in their dpkg.log12:05
frickleralso it looks like systemd was too fast when restarting. look at the paste I posted, pid 1681 is the old unbound process. and it outputs its info block after the new process has been started. so the new one probably fails to bind to port 5312:06
fungibasically, the problem executors have the newer unbound 1.5.8-1ubuntu1.1 and the ones which remained working are still on older unbound 1.5.8-1ubuntu112:07
*** heyongli has quit IRC12:07
fungiso we should probably do controlled upgrades of unbound on the rest and make sure it remains running afterward12:08
fungistarting it again manually if needed12:08
*** heyongli has joined #openstack-infra12:08
frickleryeah, that's a security update that got published tonight https://bugs.launchpad.net/ubuntu/+source/unbound/1.5.8-1ubuntu1.112:08
openstackLaunchpad bug 1 in Ubuntu Malaysia LoCo Team "Microsoft has a majority market share" [Critical,In progress] - Assigned to MFauzilkamil Zainuddin (apogee)12:08
fungiotherwise they're just going to break like the others did when unattended-upgrades fires on them12:08
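A sketch of the controlled upgrade fungi describes: upgrade unbound on each executor, then verify that the local resolver actually answers (systemd was reporting the unit as active (exited) even with the daemon gone), restarting it by hand if needed. Hostnames and plain ssh-in-a-loop are assumptions; in practice this would more likely be driven via ansible:

```python
# Upgrade unbound on each zuul executor and make sure it keeps resolving.
# Host list and ssh access are assumptions for illustration.
import subprocess

EXECUTORS = ['ze%02d.openstack.org' % n for n in range(1, 11)]

def run(host, cmd):
    return subprocess.run(['ssh', 'root@%s' % host, cmd],
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)

for host in EXECUTORS:
    run(host, 'apt-get -y install unbound')  # pull in 1.5.8-1ubuntu1.1 now
    # systemd may still claim "active (exited)", so test that the local
    # resolver actually answers instead of trusting the unit state
    check = run(host, 'host review.openstack.org 127.0.0.1')
    if check.returncode != 0:
        run(host, 'systemctl restart unbound')
        check = run(host, 'host review.openstack.org 127.0.0.1')
    print(host, 'ok' if check.returncode == 0 else 'still broken')
```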
ianwhaha i think that bug parsing is wrong :)12:09
*** hemna_ has joined #openstack-infra12:09
fungijust slightly12:10
*** rosmaita has quit IRC12:11
*** yamamoto has quit IRC12:11
*** Bhujay has quit IRC12:11
ianwbut yet strangely relevant12:12
toskyuhm, are you talking about the reasons for all the MERGER_FAILURE errors that I'm seeing all around?12:12
fricklertosky: indirectly, but yes12:12
* frickler goes to open a bug report for ubuntu12:12
ianwtosky: yes, i think frickler found the reason and should probably be cleared up soon12:12
*** nicolasbock has joined #openstack-infra12:13
ianwis someone doing the manual upgrades/restarts?12:13
*** mugsie_ is now known as mugsie12:14
*** yamamoto has joined #openstack-infra12:15
*** markvoelker has joined #openstack-infra12:16
fungii'm not yet in a spot where i can, but can probably start on them in 30-60 minutes if nobody else beats me to it12:16
fricklerI'll run the restarts now12:16
*** yamamoto has quit IRC12:16
*** psachin has quit IRC12:16
*** dhill_ has joined #openstack-infra12:17
*** heyongli has quit IRC12:18
*** heyongli has joined #openstack-infra12:18
fungithanks frickler!12:19
frickleralso confirmed that the update breaks unbound by running it manually on ze09. will update the other nodes now12:20
fricklercreated https://bugs.launchpad.net/ubuntu/+source/unbound/+bug/1775833 in the meantime12:20
openstackLaunchpad bug 1775833 in unbound (Ubuntu) "unbound not running after automatic update" [Undecided,New]12:20
bodenhi clarkb corvus as per our chat yesterday on tox siblings, I added a job for lower constraints and also added neutron as required project in https://review.openstack.org/#/c/573429/  however based on my testing neutron master still isn’t getting installed http://logs.openstack.org/29/573429/1/check/vmware-tox-lower-constraints/0b6888a/job-output.txt.gz#_2018-06-07_20_34_48_06047412:21
bodenany ideas?12:21
*** alexchadin has quit IRC12:21
bodensorry, the job updates are in https://review.openstack.org/#/c/573386/12:22
*** armaan has quit IRC12:25
*** yamamoto has joined #openstack-infra12:27
ianw++ thanks frickler.  i'm out, have a good day all!12:28
*** heyongli has quit IRC12:28
*** yamamoto has quit IRC12:28
*** heyongli has joined #openstack-infra12:28
*** sthussey has joined #openstack-infra12:28
fricklerianw: thanks to you, have a nice and quiet weekend12:31
*** rlandy has joined #openstack-infra12:31
*** rlandy is now known as rlandy|rover12:31
*** jcoufal_ has joined #openstack-infra12:31
*** alexchadin has joined #openstack-infra12:32
smcginnisHave things been restarted? Safe to recheck patches now?12:33
*** jcoufal has quit IRC12:34
fricklersmcginnis: yes, we should be fine again now12:37
smcginnisfrickler: OK, thank you.12:37
*** heyongli has quit IRC12:38
*** heyongli has joined #openstack-infra12:38
*** armaan has joined #openstack-infra12:41
*** armaan has quit IRC12:41
*** armaan has joined #openstack-infra12:41
*** armaan has quit IRC12:46
*** heyongli has quit IRC12:48
*** heyongli has joined #openstack-infra12:49
fricklerinfra-root: ze01-10 should be fine now. some zm* were already updated without issues, so it only seems to happen under load. not sure which other nodes might be affected, maybe someone can later update all of them just to be on the safe side12:49
*** alexchadin has quit IRC12:51
fungifrickler: thanks, looks like they're all upgraded to 1.5.8-1ubuntu1.1 now12:52
fricklerclarkb: corvus: I found this error in the zuul debug.log, but the node did indeed got held successfully. I debugged that node and deleted it though before noticing this message. http://paste.openstack.org/show/722971/12:54
*** tpsilva has joined #openstack-infra12:54
*** VW has joined #openstack-infra12:54
*** zoli|afk is now known as zoli12:55
*** zoli is now known as zoli|wfh12:55
*** zoli|wfh is now known as zoli12:55
*** dklyle has joined #openstack-infra12:58
*** heyongli has quit IRC12:59
*** myoung|off is now known as myoung12:59
*** heyongli has joined #openstack-infra12:59
toskyfrickler: is it safe to recheck then?13:01
*** edmondsw has joined #openstack-infra13:02
*** amoralej is now known as amoralej|lunch13:02
fricklertosky: I pretty much hope so. we do have quite some backlog, though, so don't expect fast results13:04
toskysure13:04
* tosky rechecks13:04
*** dbecker has joined #openstack-infra13:04
toskyfrickler: is there some specific message/tag that I should add to recheck in order to track this?13:04
*** edmondsw has quit IRC13:06
*** d0ugal_ has joined #openstack-infra13:08
*** d0ugal has quit IRC13:08
*** d0ugal_ has quit IRC13:08
*** d0ugal has joined #openstack-infra13:08
*** d0ugal has quit IRC13:08
*** d0ugal has joined #openstack-infra13:08
*** heyongli has quit IRC13:09
*** heyongli has joined #openstack-infra13:09
*** dhill_ has quit IRC13:12
*** VW has quit IRC13:14
*** VW has joined #openstack-infra13:14
fricklertosky: well you could mention either "MERGER_FAILURE" or the unbound bug I quoted above, but I don't think that that is necessary from an infra pov13:14
toskyack13:17
*** Goneri has joined #openstack-infra13:17
toskynow let's see13:17
*** caphrim007 has quit IRC13:18
*** owalsh has quit IRC13:18
*** pblaho has quit IRC13:19
*** owalsh has joined #openstack-infra13:19
*** VW has quit IRC13:19
*** heyongli has quit IRC13:19
*** caphrim007 has joined #openstack-infra13:19
*** heyongli has joined #openstack-infra13:19
*** efried is now known as fried_rice13:21
*** psachin has joined #openstack-infra13:21
*** eharney has joined #openstack-infra13:23
*** caphrim007 has quit IRC13:25
*** owalsh_ has joined #openstack-infra13:28
*** jaosorior has quit IRC13:29
*** yamamoto has joined #openstack-infra13:29
*** dklyle has quit IRC13:29
*** heyongli has quit IRC13:29
*** heyongli has joined #openstack-infra13:30
fungigiven that we run unbound on most of our servers, we probably want to keep an eye out for other similar breakage in random places. i've spot-checked some more important systems and all have it still running, so i agree this seems to have been isolated to the zuul executors13:30
*** yamamoto has quit IRC13:31
*** owalsh has quit IRC13:31
openstackgerritJavier Peña proposed openstack-infra/openstack-zuul-jobs master: Update version of CentOS OpenAFS packages to 1.6.22.3  https://review.openstack.org/57369413:31
*** dklyle has joined #openstack-infra13:31
*** mriedem has joined #openstack-infra13:32
*** yamamoto has joined #openstack-infra13:32
*** zhangfei has quit IRC13:34
*** owalsh_ has quit IRC13:34
*** ramishra has quit IRC13:38
*** owalsh has joined #openstack-infra13:39
*** heyongli has quit IRC13:40
*** heyongli has joined #openstack-infra13:40
fungi#status notice A misapplied distro security package update caused many jobs to fail with a MERGER_FAILURE error between ~06:30-12:30 UTC; these can be safely rechecked now that the problem has been addressed13:45
openstackstatusfungi: sending notice13:45
*** links has quit IRC13:46
-openstackstatus- NOTICE: A misapplied distro security package update caused many jobs to fail with a MERGER_FAILURE error between ~06:30-12:30 UTC; these can be safely rechecked now that the problem has been addressed13:46
fungi#status log unbound was manually restarted on many zuul executors following the 1.5.8-1ubuntu1.1 security update, due to https://launchpad.net/bugs/177583313:46
openstackLaunchpad bug 1775833 in unbound (Ubuntu) "unbound not running after automatic update" [Undecided,New]13:46
*** zhangfei has joined #openstack-infra13:47
openstackstatusfungi: finished sending notice13:48
openstackstatusfungi: finished logging13:48
fungithanks to giblet in #openstack-nova for pointing out that we hadn't logged that13:49
*** heyongli has quit IRC13:50
*** slaweq_ is now known as slaweq13:50
*** heyongli has joined #openstack-infra13:50
openstackgerritJeremy Stanley proposed openstack-infra/system-config master: Template credentials for Packet Host/Platform 9  https://review.openstack.org/57302113:52
dhellmannis the ever-growing event queue length in zuul a side-effect of those failures that openstackstatus just mentioned?13:53
*** amoralej|lunch is now known as amoralej13:54
dhellmannah, and there's the reset13:54
fungithat's simply how zuul breathes13:54
*** edmondsw has joined #openstack-infra13:54
fungiinhale events, exhale results13:54
dhellmannyeah, I haven't seen that up to 600+ in a long time13:55
*** dklyle has quit IRC13:55
dhellmannsomehow it always clears up as soon as I come to ask about it. I wonder if zuul is also watching for questions like that on irc?13:55
dhellmann"uh oh, someone noticed, better get back to work!"13:56
fungibut yeah, even before the ansible upgrade and the network connectivity incident and the broken unbound restarts from unattended-upgrades, we were maxed out on node capacity as of early yesterday13:56
dhellmannah13:56
fungii'm betting this is the rush of r2 milestone procrastinators ;)13:56
dhellmannprobably13:56
dhellmannalthough I like my image of zuul taking a smoke break better13:56
fungigranted, maintenance and unrelated outages aren't helping matters13:57
*** lpetrut has quit IRC13:58
openstackgerritMonty Taylor proposed openstack-infra/zuul master: Upgrade from angularjs (v1) to angular (v6)  https://review.openstack.org/55198914:00
*** heyongli has quit IRC14:00
*** heyongli has joined #openstack-infra14:01
*** ianychoi has joined #openstack-infra14:04
*** hongbin has joined #openstack-infra14:05
openstackgerritMonty Taylor proposed openstack-infra/zuul master: Upgrade from angularjs (v1) to angular (v6)  https://review.openstack.org/55198914:06
openstackgerritMonty Taylor proposed openstack-infra/zuul master: Upgrade from angularjs (v1) to angular (v6)  https://review.openstack.org/55198914:09
*** heyongli has quit IRC14:10
*** heyongli has joined #openstack-infra14:11
*** felipemonteiro has joined #openstack-infra14:12
*** shardy has quit IRC14:18
openstackgerritMonty Taylor proposed openstack-infra/zuul master: Hide queue headers for empty queues when filtering  https://review.openstack.org/57258814:19
openstackgerritThierry Carrez proposed openstack-infra/project-config master: Remove direct branching/tagging ACL for Chef  https://review.openstack.org/57371214:20
*** heyongli has quit IRC14:21
*** shardy has joined #openstack-infra14:21
frickleroh, nice, the mails about the broken unattended upgrades also weren't sent out earlier because of broken dns ... at least they mention that a reboot might be required, which would also solve that issue14:21
*** heyongli has joined #openstack-infra14:21
*** lpetrut has joined #openstack-infra14:22
openstackgerritJeremy Stanley proposed openstack-infra/system-config master: Document an example for deleting content from AFS  https://review.openstack.org/57282114:24
fungiindeed, an interesting catch-22 when your broken upgrade also breaks your ability to notify the sysadmin that some action may be required14:24
*** jistr is now known as jistr|mtg14:28
*** VW has joined #openstack-infra14:29
*** alkhodos_ has quit IRC14:30
*** alkhodos_ has joined #openstack-infra14:30
*** heyongli has quit IRC14:31
*** heyongli has joined #openstack-infra14:31
*** rpioso|afk is now known as rpioso14:32
*** quiquell is now known as quiquell|off14:32
*** r-daneel has quit IRC14:37
*** udesale has joined #openstack-infra14:38
*** felipemonteiro_ has joined #openstack-infra14:39
*** TheJulia is now known as needssleep14:40
*** heyongli has quit IRC14:41
*** heyongli has joined #openstack-infra14:41
*** felipemonteiro has quit IRC14:43
openstackgerrityolanda.robla proposed openstack/diskimage-builder master: Fix bootloader packages for rhel  https://review.openstack.org/57372614:46
openstackgerritThierry Carrez proposed openstack-infra/project-config master: Fix ACLs for tripleo-ci and dib-utils  https://review.openstack.org/57372814:46
openstackgerritOlivier Bourdon proposed openstack/diskimage-builder master: Fix CentOS image build failure when dib runs on debian based system  https://review.openstack.org/55948514:51
*** neiloy has joined #openstack-infra14:51
*** heyongli has quit IRC14:51
*** heyongli has joined #openstack-infra14:52
Shrewsooh, did we restart nl03? i'm pleasantly surprised it's working since the multi-label stuff is merged14:52
* Shrews goes to inspect zk data14:52
*** lpetrut has quit IRC14:54
Shrews"type": ["ubuntu-xenial"]14:54
Shrewsneat14:54
*** hashar is now known as hasharAway14:54
*** jamesmcarthur has joined #openstack-infra14:57
*** myoung is now known as myoung|biaf14:57
mnaserinfra-root: is there a way to check if things have been healthy on the new vms we added?15:00
*** r-daneel has joined #openstack-infra15:01
fungii think the answer is that it depends on how you expect to measure "healthy"15:01
fungiwe can look in logstash to see if any jobs which ran there succeeded15:02
*** heyongli has quit IRC15:02
*** zhangfei has quit IRC15:02
mnaserfungi: i guess just knowing that the overall state of things is okay, no weird mirror issues or other unrelated things15:02
*** shardy has quit IRC15:02
*** heyongli has joined #openstack-infra15:02
Shrewsmnaser: which region & node label?15:03
mordredShrews: \o/15:03
mnaserShrews: vexxhost-ca-ymq-1 i think would be the node_provider15:03
*** r-daneel_ has joined #openstack-infra15:05
mnaserhttp://logstash.openstack.org/#dashboard/file/logstash.json?query=node_provider%3Avexxhost-ca-ymq-1%20AND%20message%3A%5C%22%5ERUN%20END%20RESULT_NORMAL%5C%2215:05
mnaserit looks like things are ok15:05
Shrewsmnaser: well, i see that one actively doing things on nl03. no obvious errors in nodepool log15:05
*** armaan has joined #openstack-infra15:05
*** r-daneel has quit IRC15:06
*** r-daneel_ is now known as r-daneel15:06
Shrewsmnaser: and 10 in-use nodes15:06
Shrewscentos-7 and ubuntu-xenial15:07
Shrewslooks ok?15:07
openstackgerritMohammed Naser proposed openstack-infra/project-config master: Bump vexxhost to 25 servers  https://review.openstack.org/57373815:07
mnaserShrews: yep, going to slowly bump it more15:07
mnaser^15:07
Shrewsi'm all for increasing capacity15:11
mnasermaybe if fungi is around ^15:11
clarkbI'm sort of around at this point. fwiw I did keep an eye on vexxhost yesterday before I disappeared and it looked ok15:12
*** heyongli has quit IRC15:12
fungii am around though trying to pay attention in the release team meeting15:12
fungibut will review that15:12
corvuswe only seem to be using 750 nodes out of our ~1000 node capacity15:13
*** d0ugal has quit IRC15:14
openstackgerrityolanda.robla proposed openstack/diskimage-builder master: Fix bootloader packages for rhel  https://review.openstack.org/57372615:14
fungiindeed, what's going on there?15:15
*** heyongli has joined #openstack-infra15:15
corvusShrews, fungi, clarkb, mordred, mnaser: ^15:15
corvushttp://grafana.openstack.org/dashboard/db/zuul-status15:15
clarkbpossible the dynamic quote calculations are reflecting reality better than our max servers values?15:15
fungiyeah, wonder if the dynamic quota stuff is causing that to be lower than however we set the max there?15:15
fungier, what clarkb said15:16
*** jistr|mtg is now known as jistr15:16
corvuswe need a nodepool dashboard with the node graphs of each provider on it15:16
*** caphrim007 has joined #openstack-infra15:17
mnaserwe have a dashboard for each but not for each provider in the same one15:17
mnaserlimestone seems to never go over 1315:17
fungilooks like it spiked up after the scheduler restart at 06:00 related to what was reported as a network connectivity problem, but quickly flattened out at around 750 nodes15:17
mnaseroh15:17
mnaserovh has max of 15915:17
mnaserbut only 3 available and 1 in use and 1 deleting15:17
mnaserin bhs115:17
mnasergra1 has a max of 79 and only a few are being used15:18
mnaserhttp://grafana.openstack.org/dashboard/db/nodepool-ovh15:18
fungimaybe our old quota calculation bug in ovh has re-emerged15:18
corvusyeah, that looks like the anomalous one15:18
mnaseri'll defer to the roots who have access and can check actual quotas15:18
*** caphrim007 has quit IRC15:18
*** caphrim007 has joined #openstack-infra15:19
fungithey seem to semi-regularly have trouble with whatever mechanism they use to sync quota utilization between regions to support their global quota implementation15:19
*** dtroyer has quit IRC15:19
*** caphrim007 has quit IRC15:19
*** dtroyer has joined #openstack-infra15:19
*** dklyle has joined #openstack-infra15:20
corvusgra1 dropped off starting at 23:00... bhs1... hard to say; it's more complicated.15:20
*** bnemec is now known as beekneemech15:20
Shrewsi count 56 ready+unlocked nodes15:20
corvusboth are served by nl0415:20
openstackgerritClark Boylan proposed openstack-infra/openstack-zuul-jobs master: Improve kata-runsh job  https://review.openstack.org/57374815:21
*** caphrim007 has joined #openstack-infra15:21
corvusdid we, by any chance, restart all of the launchers except nl04?15:21
clarkbnl03 was the only one I restarted15:21
corvusTypeError: unhashable type: 'list'15:21
corvusthat's why i ask ^15:22
*** heyongli has quit IRC15:22
*** cshastri has quit IRC15:22
corvusweird.  nl04 and nl01 have both been running since april 2615:22
Shrewscorvus: all but nl03 have Apr something start times15:22
*** heyongli has joined #openstack-infra15:22
*** ccamacho has quit IRC15:23
*** eernst has joined #openstack-infra15:23
corvusokay there are similar errors in nl01's logs15:24
corvusmaybe it's related to the order in which they handle requests15:24
corvushow about i restart the launcher on 0415:24
mordred++15:25
fungiworth a try15:25
*** dave-mccowan has joined #openstack-infra15:25
corvusokay, it's handling requests now, which makes me realize that it probably should have been doing more work earlier.  it's possible there's more to the failure than just the list change15:26
corvus(like, i wonder if a thread died somewhere)15:27
corvuswe should also restart nl01 and nl0215:27
corvusbut i'm about to be running late; can someone else handle that?15:28
*** dklyle has quit IRC15:28
*** hjensas has joined #openstack-infra15:29
Shrewscorvus: i'll do it15:30
corvusShrews: thx15:30
fungithanks corvus and Shrews!15:31
*** dklyle has joined #openstack-infra15:32
Shrewsnl01 and nl02 launchers restarted15:32
*** dave-mccowan has quit IRC15:32
*** iranzo has quit IRC15:32
*** heyongli has quit IRC15:32
*** heyongli has joined #openstack-infra15:33
Shrewscitycloud seems to be rejecting requests because "images not available"15:34
Shrewsis that a known issue?15:34
clarkbcitycloud should have max-servers set to 015:34
clarkband we stopped pushing images to them15:34
clarkbprobably an unknown issue that something is trying to talk to their apis still15:35
Shrewsah15:35
Shrewsmaybe we should move the max-servers check before the images check15:36
*** myoung|biaf is now known as myoung15:37
*** dklyle has quit IRC15:39
clarkbseems reasonable15:40
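The reordering Shrews suggests amounts to declining a request for a disabled pool before consulting image availability at all. Not the actual nodepool handler code — just a sketch of the ordering under that assumption:

```python
# Illustrative only -- the real logic lives in nodepool's request handler.
def should_accept_request(pool, requested_labels, image_available):
    if pool.max_servers <= 0:
        # pool is disabled (e.g. citycloud with max-servers: 0);
        # fail fast instead of logging "images not available"
        return False
    # only an enabled pool needs its images checked
    return all(image_available(pool, label) for label in requested_labels)
```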
openstackgerritIan Y. Choi proposed openstack-infra/project-config master: [translation] doc generatepot jobs for 3 projects  https://review.openstack.org/54537715:40
*** e0ne has quit IRC15:41
*** kamren has quit IRC15:41
clarkbthere isn't a way to dump the periodic queue without restarting the zuul scheduler right?15:41
clarkbjust thinking out loud here that dumping last night's periodic jobs may be helpful for moving check/gate along15:42
clarkbthough periodic is low priority iirc so that may not have an effect15:42
AJaegerinfra-root, ianw, https://review.openstack.org/#/c/573728/ removes direct tagging from diskimage-builder (commit speaks only about dib-utils but they are shared) to use tagging via the releases repository by the release team. Is that ok?15:42
clarkbAJaeger: no we've asked the release team to keep it outside of that as an infra project that getting releases out indirectly depends on15:42
clarkbthat said I don't know what dib-utils is15:43
*** heyongli has quit IRC15:43
clarkbdefinitely ^ is true for dib itself15:43
AJaegerclarkb: then ttx needs to rework the change ;(15:43
*** heyongli has joined #openstack-infra15:43
AJaegerclarkb: dib-utils shares ACL with diskimage-builder.15:43
fungiclarkb: the most dumping periodic will buy us is freeing up resources for post, but we don't generally reenqueue post on a scheduler restart either15:43
clarkbdib-utils is still a tripleo repo according to governance15:44
clarkbmaybe it should be part of release if that is the case15:44
*** lbragstad is now known as elbragstad15:44
bodenclarkb hi… did you see my ping earlier regarding https://review.openstack.org/#/c/57338615:44
fungiwe could split the acls apart for those repos too15:44
AJaegerfungi, that's what I proposed...15:44
fungioh, cool ;)15:45
AJaeger(as review comment with my -1 now)15:45
*** robled has quit IRC15:46
clarkbboden: no15:46
*** robled has joined #openstack-infra15:47
*** robled has joined #openstack-infra15:47
clarkbAJaeger: fungi ok reading dib element readmes and release notes, dib-utils is a copy of dib-run-parts for use in places that want it without dib. dib itself vendors its own copy. I think that basically means infra probably doesn't care as dib itself is fine15:47
*** sambetts is now known as sambetts|afk15:47
ttxAJaeger: hah!15:47
*** kamren has joined #openstack-infra15:47
clarkbAJaeger: fungi: happy for it to be a tripleo project and be managed by release team, but we should update the acl I think15:47
bodenclarkb: based on our chat yesterday regarding siblings, I thought https://review.openstack.org/#/c/573386 would address the issue, but it appears neutron/master is still not installed.. see latest comment in 57338615:47
clarkbttx: ^ fyi I think its ok to manage dib-utils15:48
openstackgerritkaka proposed openstack/diskimage-builder master: fix tox python3 overrides  https://review.openstack.org/57375915:48
fungifwiw, dib-utils looks basically dead activity-wise15:49
ttxclarkb: we need a separate ACL file for dib-utils ?15:49
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Fail quickly for disabled provider pools  https://review.openstack.org/57376215:49
clarkbboden: we don't dynamically load config out of project config because it is a repo that contains secrets and stuff (it is trusted and must be reviewed before it is used)15:49
clarkbboden: so that change will have to merge first15:49
bodenclarkb ah ok, I wasn’t aware15:49
fungior extremely stable... the only util in it is bin/dib-run-parts?15:49
fungimaybe it should be considered part of dib, just a very stable part?15:49
*** robled has quit IRC15:50
clarkbttx: ya I think we should have tripleo apply a tripleo acl to it15:50
clarkbfungi: well dib has its own copy15:50
bodenclarkb perhaps I can ask you to peek at https://review.openstack.org/#/c/573386 when you get a min so we can help land it and I can verify… thanks15:50
ttxclarkb: ok will fix15:50
fungiahh, so if dib doesn't use it then seems fine to stay under tripleo15:50
clarkbfungi: I think what happened was rhel didn't have a run parts that would work for dib so dib made their own run parts, then later rhelians found usage for it outside of dib but didn't want to install dib for it so they made a copy in its own repo15:50
*** robled has joined #openstack-infra15:50
*** robled has joined #openstack-infra15:50
clarkbdib appears to have its own run parts implementation internally15:50
clarkbdiskimage_builder/lib/dib-run-parts specifically15:51
fungithe test nodes graph shows we now have a glut of available nodes not getting assigned to jobs?15:52
fungibasically everything built since the launcher restart15:52
clarkbzuul schedule should tell us what is going on related to that since it makes the node requests iirc15:53
*** heyongli has quit IRC15:53
fungirather, everything above and beyond what we were previously using. so our node capacity came back but our in use didn't increase15:53
*** heyongli has joined #openstack-infra15:53
fungiand looks like the starting builds graph indicates no executors have started builds for the past hour?15:54
Shrewsi'm seeing many in-use nodes15:54
clarkb2018-06-08 15:42:43,117 INFO zuul.nodepool: Node request <NodeRequest 200-0004363196 <NodeSet devstack-single-node OrderedDict([(('primary',), <Node 0004362489 ('primary',):ubuntu-xenial>)])OrderedDict()>> fulfilled was the last fulfilled request15:55
Shrews64615:55
fungiyeah, we have ~700 in use and ~300 available according to the graph15:55
clarkbplenty of requests are being updated though15:55
openstackgerritThierry Carrez proposed openstack-infra/project-config master: Fix ACLs for tripleo-ci and dib-utils  https://review.openstack.org/57372815:56
Shrews3075 requests in the nodepool queue15:56
clarkbyou know thinking about ^ the gearman priority may not affect node allocation?15:57
clarkbbecause we make the node requests first in the scheduler then ask for an executor to run the job via gearman after15:57
clarkbI think15:57
clarkbIt's early in the morning and I am pre-caffeine, but I think gearman priority may be far less meaningful now?15:58
corvusclarkb: pipeline priority determines node request priority15:59
corvusclarkb: https://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/model.py#n62815:59
clarkbah16:00
clarkbso its still a layer removed but we attempt to respect it16:00
clarkbsince a high priority node request may take longer to boot than the next low priority one it isn't perfect but in general should be good16:00
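(For anyone reading later, a rough sketch of the relationship corvus points at in model.py: pipeline precedence becomes a numeric priority prefixed onto the node request ID in ZooKeeper, so lower numbers sort first. The specific values below are assumptions from memory, not quoted from that file.)

    # Hedged sketch; the precedence-to-priority values are assumptions.
    PRECEDENCE_HIGH, PRECEDENCE_NORMAL, PRECEDENCE_LOW = range(3)

    PRIORITY = {
        PRECEDENCE_HIGH: 100,    # e.g. gate
        PRECEDENCE_NORMAL: 200,  # e.g. check
        PRECEDENCE_LOW: 300,     # e.g. post/periodic
    }

    def request_id(precedence, sequence):
        # A request like "200-0004363196" (as in the scheduler log above)
        # sorts behind any "100-..." request, so higher-priority pipelines
        # get nodes first even though fulfilment order isn't guaranteed.
        return '%d-%s' % (PRIORITY[precedence], sequence)

    print(request_id(PRECEDENCE_NORMAL, '0004363196'))  # -> 200-0004363196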
openstackgerritJeremy Stanley proposed openstack-infra/system-config master: Document an example for deleting content from AFS  https://review.openstack.org/57282116:01
*** heyongli has quit IRC16:03
*** heyongli has joined #openstack-infra16:04
corvusShrews, clarkb: nodepool list suggests that most of the ready nodes are recently built nodes from ovh16:04
*** jcoufal_ has quit IRC16:05
corvusi have to run now, sorry i can't help more16:05
clarkbis it possible that we are at executor load limits?16:06
clarkbthat would prevent executors from taking new jobs and marking nodes in use right?16:06
corvusclarkb: not even close16:06
corvusclarkb: http://grafana.openstack.org/dashboard/db/zuul-status16:06
corvusthe executors are becoming more and more idle16:06
corvusi would seriously consider reverting the nodepool changes16:07
corvusthis is not a great time to have the system stuck16:07
clarkbya, though I'm not caught up on what changes were made /me looks at git16:07
clarkbah the driver changes16:08
corvus(there are currently only about 100 builds running)16:09
*** r-daneel has quit IRC16:09
*** r-daneel_ has joined #openstack-infra16:09
fungihttps://review.openstack.org/568704 Simplify driver API16:09
fungiand i'm guessing the multi-label stuff depends on that too16:09
*** camunoz has joined #openstack-infra16:09
clarkbrevert to this commit maybe 9a03c679e32c13c0401dccaa9d9347922914780116:10
corvusgotta run now.  Shrews knows the story16:10
clarkbit is still after when the nodepool launchers were last restarted but before the driver updates16:10
Shrewswell, ovh-gra1 doesn't seem to be able to launch nodes16:10
openstackgerritSean McGinnis proposed openstack-infra/project-config master: Remove DragonFlow tagging ACL  https://review.openstack.org/57377216:11
fungiclarkb: 9e5df7325b863e8bc4718b32d0ff93ce27c5530b looks like the last state before the merges from 4 days ago16:11
clarkbah a couple ahead of the one I picked, that wfm16:11
corvusShrews: it launched 72 of them in the past 40 minutes.  that's a red herring.16:11
*** r-daneel_ is now known as r-daneel16:11
clarkbI'm going to add nl01-4 to the emergency file so that we can do this without puppet fighting us16:12
fungisounds good, thanks16:12
*** neiloy has quit IRC16:12
clarkbthat is done16:13
clarkbovh is on 04 right? why don't we start there since that will in theory free up available ready nodes for use16:13
*** heyongli has quit IRC16:13
fungii agree that seems like a good place to start16:14
*** heyongli has joined #openstack-infra16:14
clarkbI'm going to checkout 9e5d and pip install then restart16:14
clarkbon nl0416:14
fungik16:14
clarkbit helps to remember to use python3 not 216:15
mordredinfra-root: mails sent to Shrews and frickler by thesystem keep getting bounced due to google being a bad-actor on the internet - is there any fix we're aware of? or should we maybe take their emails out of the sysadmin list?16:16
*** armaan has quit IRC16:16
openstackgerritAnita Kuno proposed openstack-infra/system-config master: Survey Documentation  https://review.openstack.org/57153616:16
clarkbmordred: my fix for that was to get a fastmail account16:16
*** armaan has joined #openstack-infra16:16
clarkbnl04 is running 9e5df7325b863e8bc4718b32d0ff93ce27c5530b now16:16
clarkbTypeError: unhashable type: 'list' is an issue16:17
clarkbwhich is where we started this morning right?16:17
fungimordred: which system?16:18
clarkbI think this may have been a nop16:18
clarkbwe went from one not working system to another not working system16:18
clarkbdo we understand the unhashable type problem?16:18
Shrewsclarkb: nope. haven't had time to look16:18
Shrewsclarkb: wait.16:19
Shrewsdon't restart any more16:19
Shrewswe have a problem16:19
Shrewswe now have zk nodes with a list for node.type, but the revert doesn't know how to handle that16:20
clarkbok shoudl I go ahead and just stop the daemon on nl04 then?16:20
Shrewsclarkb: yes16:20
clarkbalso I bet this was the original issue with me restarting nl0316:20
clarkband only nl0316:20
fungiohhhh16:20
clarkbit "polluted" the zk data16:20
Shrewsclarkb: possibly16:21
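(As a side note, a minimal reproduction of the traceback seen on the launchers: once node.type is stored in ZooKeeper as a list, any older code that still uses it as a hashable label raises exactly this. Illustrative only; the actual launcher code path is assumed, not quoted.)

    # Minimal repro, assuming the old code used node.type as a set member
    # or dict key.
    node_type = 'ubuntu-xenial'      # old schema: a plain string
    ready_types = {node_type}        # fine, strings are hashable

    node_type = ['ubuntu-xenial']    # new schema: a list of labels
    try:
        ready_types = {node_type}
    except TypeError as e:
        print(e)                     # unhashable type: 'list'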
clarkbnl04 is not running a launcher anymore16:21
clarkbany thoughts on our next step? we can debug the problem with master, or do a full system restart which should clear out the requests and bring up the launchers on the old code16:23
clarkbby full I mean scheduler and launcher restarts16:23
fungiis that sufficient or do we also have to do something extra to clear out zk?16:23
*** heyongli has quit IRC16:24
Shrewsi think we'd have to clear out zk if we don't push forward with finding the problem in master16:24
clarkbmy understanding is they are all connection dependent nodes so once the connection timeout occurs they are deleted. We may have to wait >1 minute to have it delete them though16:24
*** heyongli has joined #openstack-infra16:24
clarkbwe can also manually clear out the db16:24
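(For background on why waiting works: node requests are ephemeral, sequenced znodes tied to the creating session, so they vanish on their own once the owning ZooKeeper connection times out. A hedged kazoo sketch follows; the host and path layout are assumptions for illustration, not the real schema.)

    # Hedged sketch with kazoo, the ZooKeeper client nodepool/zuul use.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts='zk01.example.org:2181')  # placeholder host
    zk.start()

    # An ephemeral + sequence znode disappears when the creating session
    # ends, which matches "wait for the connection timeout and they get
    # deleted".
    zk.create('/nodepool/requests/200-', b'', ephemeral=True, sequence=True,
              makepath=True)

    zk.stop()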
Shrewscan someone confirm if *any* requests are being handled now?16:25
clarkb15:42:43,117 remains the last fulfilled request according to the scheduelr16:25
*** fried_rice is now known as fried_rolls16:29
*** lpetrut has joined #openstack-infra16:29
clarkbzuul appears to really only be processing external events like gerrit and zuul web rpc16:30
clarkbso ya I don't think there are any requests being handled16:30
*** jesslampe has joined #openstack-infra16:30
clarkblots of retrying node requests in the launchers but not seeing tracebacks to tell me why16:31
openstackgerritOliver Walsh proposed openstack/diskimage-builder master: WIP: Add DIB element to install NVIDIA GPU drivers  https://review.openstack.org/57322316:31
clarkbah that comes from the self.paused handling16:32
fricklermordred: can you show a copy of the bounce? I'm regularly seeing bounces for some other accounts http://paste.openstack.org/show/722989/16:32
Shrewsso yeah, the "unhashable type" issue was because of zk pollution16:33
anteayais it worth sending out a status notification about the current situation yet?16:33
anteayajust to pre-empt any questions?16:34
*** heyongli has quit IRC16:34
clarkb2018-06-08 15:08:24,139 INFO nodepool.NodeLauncher-0004362219: Node id 0004362219 is ready is the last case of a node going ready on nl0116:34
fungifrickler: the bounces from the zk servers are because those still haven't gotten their hostnames corrected so they're helo'ing with unresolvable hostnames16:34
*** heyongli has joined #openstack-infra16:34
*** jesslampe has quit IRC16:34
clarkbanteaya: probably? things have been on fire for like 9 hours or something16:35
anteayashows you how behind I am then16:35
fungiat this point it's just a matter of builds queuing and not starting, i think, so depending on how we correct it there may be no visible impact other than just more delayed results16:36
clarkbShrews: I don't see any tracebacks in the log after 1400ish UTC either16:36
clarkbShrews: my hunch is that our openstack driver isn't really being used and we're just calling the launch on the base handler that passes16:36
clarkbor something similar to that based on lack of openstack specific logging (either successes or failures)16:36
anteayaI don't see anything I can do that is helpful, I have to run away and complete an errand, may the force be with you16:38
*** caphrim007_ has joined #openstack-infra16:39
clarkbI too have an 11am doctors visit with the kids so I'm on a bit of a time crunch. But not up against it just yet16:39
Shrewsok, i hate to say it, but the action i'm thinking we may need to do now is shutdown all of zuul, remove all nodes from zk, restart the launchers in reverted state (they should then begin cleaning up leaked instances and building new nodes), then start zuul16:39
Shrewsunless someone else has a better idea?16:39
fungino, that sounds reasonable at this stage16:40
clarkbI don't think all of zuul has to be restarted, just the scheduler16:40
clarkbmaybe web if it doesn't like the scheduler going away16:40
Shrewsi mean, i could spend time trying to fix master, but i have no idea what the problem is16:40
Shrewsi think just the scheduler since it holds the node locks in zk16:40
clarkbfwiw I don't see the driver load error messages in the log file16:40
clarkb* launcher log file16:41
Shrewswhat driver load error?16:41
*** Guest14735 is now known as sdake16:41
*** caphrim007 has quit IRC16:41
clarkbno implementation found16:41
Shrewswhere is that?16:42
clarkbnodepool/driver/__init__.py16:42
Shrewsi mean what log?16:42
clarkbbut I don't see that in the log so no positive verification that is the issue. launcher debug log16:42
Shrewswait, rewind please.... did you see this driver error in a log?16:43
clarkbno16:43
fungispeculation based on lack of any logged error?16:43
*** caphrim007 has joined #openstack-infra16:43
clarkbI was just following up on my theory above that we aren't making any openstack related calls since there are no positive or negative openstack log messages16:43
Shrewssomeone want to stop zuul scheduler?16:44
Shrews(assuming we are agreed this is the plan?)16:44
*** heyongli has quit IRC16:44
clarkbI'm fine with proceeding with that plan. I can stop the scheduler and dump its queues16:44
*** heyongli has joined #openstack-infra16:44
*** panda is now known as panda|off16:45
fungiyou got into place faster than i16:45
clarkboh I'm not into place :)16:45
fungiare we still good with 9e5df7325b863e8bc4718b32d0ff93ce27c5530b as the rollback for the launchers?16:45
Shrewsfungi: yes16:45
clarkbbefore we stop the scheduler why don't we prep all of the launchers with ^16:45
fungido we also need to roll back the builders?16:45
Shrewsfungi: builders should be fine16:46
*** jpich has quit IRC16:46
clarkband don't forget to use pip3 install instead of pip install16:46
fungiworking on nl01 now16:46
Shrewsi'm going to prep the zk cleanup16:46
clarkbI'll do 0316:46
*** caphrim007_ has quit IRC16:47
fungiSuccessfully uninstalled nodepool-3.0.2.dev5116:47
fungiSuccessfully installed nodepool-3.0.2.dev2916:47
fungipresumably that's what we want to see everywhere16:47
fungii'll work on 02 next16:48
clarkbya that looks like what I've got on 04 and 0316:48
Shrewsare all launchers stopped?16:48
clarkbShrews: no16:48
funginot yet16:48
clarkband scheduler is still running too16:48
fungiokay, nl02 is upgraded now too16:48
Shrewslet's do that too16:48
fungiso the idea is that we stop the launchers next?16:48
clarkbyes I think launchers, then scheduler, then zk cleanup, then start scheduler and launchers again16:49
Shrewsyes. i don't want anything accessing zk16:49
fungii'll do 01 and 0216:49
clarkbI'll make sure 03 and 04 are stopped then do scheduler and confirm they are all done when done16:49
Shrewsack16:49
fungi01 and 02 are stopped now16:49
clarkb03 and 04 are stopped, moving to the scheduler16:50
mordredfrickler, fungi : http://paste.openstack.org/show/722990/16:50
clarkbI have asked the scheduler to stop and am waiting for it to do so16:50
clarkbthe builders use zk too, do you want them stopped or you'll only worry about the noderequests?16:51
clarkbShrews: ^16:51
fungimordred: frickler: we could probably "fix" that by whitelisting those server ip addresses in the pbl16:51
Shrewsclarkb: just things access nodes16:52
Shrewsso builders are fine to leave alone16:52
clarkbzuul scheduler is refusing to stop quickly again. I'll kill it manually in a minute or two if it doesn't stop on its own16:52
*** jtomasek has quit IRC16:52
*** jpena is now known as jpena|off16:53
clarkbok I'm going to manually kill it16:53
Shrewsok16:53
clarkbscheduler is stopped16:53
clarkbShrews: I think you are good from a zk perspective16:54
*** felipemonteiro_ has quit IRC16:54
*** heyongli has quit IRC16:54
*** heyongli has joined #openstack-infra16:55
Shrewsclarkb: are you certain we shouldn't stop the executors? once i restart the launchers, they'll attempt to delete the leaked instances which could be running jobs16:55
Shrewsthat may make the executors go haywire16:55
clarkbI think stopping the scheduler already effectively does/did that16:56
clarkbwe can stop them too if we want, I don't think it will hurt16:56
Shrewsok, then my plan is to 1) remove zk data in /nodepool/nodes/*  2) start nl04 launcher and verify it's doing what i'd expect   3) restart the other launchers if #2 is ok16:57
Shrews4) we can restart scheduler16:57
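(A hedged sketch of what step 1 could look like with kazoo, in case this ever needs repeating; the /nodepool/nodes path comes from the plan above, everything else is illustrative and assumes the launchers and scheduler are already stopped.)

    # Recursively remove node records under /nodepool/nodes.
    # Only safe once nothing holds locks on or watches these znodes.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts='zk01.example.org:2181')  # placeholder host
    zk.start()

    for child in zk.get_children('/nodepool/nodes'):
        zk.delete('/nodepool/nodes/%s' % child, recursive=True)

    zk.stop()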
clarkbok, when should we start the scheduler? perfect16:57
mordredShrews: ++16:57
fungisounds great16:57
Shrewsdoing #1 now16:57
Shrewsdone. starting nl04 launcher16:58
*** lpetrut has quit IRC16:59
clarkblooks like it marked a bunch of stuff for deletion as expected16:59
Shrewsyes.16:59
Shrewsnew instances launching too16:59
*** derekh has quit IRC17:00
Shrewsactually, no new instances17:00
*** r-daneel has quit IRC17:00
clarkbthe only new instances should be for min ready but we manage that on nl01 only17:01
clarkbI think this is expected. maybe do nl01 next and ensure min ready launch?17:01
Shrewsoh we have no min-ready17:01
Shrewscool17:01
clarkbya we made that change so that we'd stop doing min ready * number of launchers17:01
Shrewsok, ovh looks to be cleaned up17:01
Shrewsgoing to do nl01 now17:01
*** pcaruana has quit IRC17:02
clarkbfungi: since I may have to pop out before we are done. ~root/check.sh and ~root/gate.sh on zuul01 are our saved check and gate queues. You'll want to ensure the scheduler is fully started before enqueuing otherwise it just errors at you a lot. Also zuul-web may need a restart after zuul-scheduler is up again17:02
funginoted, thanks!17:03
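(For the curious, the check.sh/gate.sh files mentioned above are essentially lists of re-enqueue commands generated from the saved queues; a hedged sketch of building one such command, where the tenant, project and change number are placeholders and the exact client flags should be checked against the installed zuul version.)

    # Hedged sketch: build a single re-enqueue command for one change.
    # Flag names are believed correct for zuul v3 but should be verified.
    import shlex

    def enqueue_cmd(tenant, pipeline, project, change):
        return ('zuul enqueue --tenant %s --trigger gerrit --pipeline %s'
                ' --project %s --change %s' % (
                    shlex.quote(tenant), shlex.quote(pipeline),
                    shlex.quote(project), shlex.quote(change)))

    print(enqueue_cmd('openstack', 'gate', 'openstack-infra/zuul', '573738,1'))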
*** udesale has quit IRC17:03
*** hemna_ has quit IRC17:03
Shrewsdeleting leaked instances. i see new ones building17:03
*** zoli is now known as zoli|gne17:04
*** zoli|gne is now known as zoli|gone17:04
Shrewsgoing to let that cleanup finish before doing nl0217:04
*** zoli|gone is now known as zoli17:04
clarkbok17:04
clarkb(I have about 5 minutes left before I need to look at getting kids out the door for an on time doctors visit)17:04
*** heyongli has quit IRC17:05
clarkbbut I think its mostly mechanical now and a matter of just watching things as we start up17:05
*** heyongli has joined #openstack-infra17:05
fungii'm all queued up to work on the scheduler startup and requeuing17:05
fungier, cued up17:05
Shrewsfungi: thx17:06
Shrewsdoing nl02 now17:06
Shrewsnl03 now17:08
*** Swami has joined #openstack-infra17:08
Shrewsfungi: i think we're about ready. let nl03 clean up resources a bit....17:09
fungik17:09
fungistanding by17:10
Shrewswhole lot of inap instances to clear out17:10
clarkbalright I'm just about out of time. the scheduler startup will probably take around 5 minutes iirc17:11
clarkbit's a pretty clear change in the logging content; it goes from doing a ton of git operations to build a config to listening to events and stuff17:11
fungithanks!17:12
Shrewsfungi: good to start scheduler17:13
fungion it17:13
*** dhill_ has joined #openstack-infra17:13
Shrewsjobs may be delayed by node building17:13
fungitailing /var/log/zuul/debug.log to catch when it's safe to reenqueue check/gate builds17:14
*** heyongli has quit IRC17:15
*** heyongli has joined #openstack-infra17:15
Shrewsstuff is happening17:17
fungiso we think events from ~16:50 to ~17:15 will have been missed, needing rechecks/reapprovals?17:17
fungibasically from when clarkb stopped the scheduler to when i started it?17:17
fungialso looks like it's finished starting up so i'll get to reenqueuing now17:17
Shrewsi believe so17:18
Shrewsand let's wait until next week to make any more changes  :)17:18
*** eharney has quit IRC17:19
fungii'll restart zuul-web too. i'm not getting any content on the status page17:20
smcginnisWas waiting to point that out if it didn't come up. :)17:20
smcginnisThere it is.17:20
fungiyeah, i just wanted to give it a few17:21
*** iranzo has joined #openstack-infra17:21
fungithere it goes17:21
smcginnisAll the pretty status bars.17:21
fungiapparently does still need zuul-web restarted after a scheduler restart17:21
fungichanges are still being enqueued, but i'll go ahead and send a status notice17:21
Shrewsfungi: thx17:22
smcginnisDid we end up needing to manual recheck patches, or did things get captured OK for renque?17:22
fungi#status notice The Zuul scheduler was offline briefly to clean up from debugging a nodepool issue, so changes uploaded or approved between 16:50 and 17:15 UTC may need to be rechecked or reapproved (all already queued changes are in the process of being reenqueued now)17:22
openstackstatusfungi: sending notice17:22
smcginnisHah, guess that answers it. Thanks.17:22
Shrewsfungi: infra-root: let's leave the launchers in the emergency file for now17:23
fungiShrews: yes, let's17:23
*** psachin has quit IRC17:23
Shrewsi've grabbed the nl04 debug log file, so will need to spend some time digging through that17:23
-openstackstatus- NOTICE: The Zuul scheduler was offline briefly to clean up from debugging a nodepool issue, so changes uploaded or approved between 16:50 and 17:15 UTC may need to be rechecked or reapproved (all already queued changes are in the process of being reenqueued now)17:24
*** jamesmcarthur has quit IRC17:25
*** heyongli has quit IRC17:25
*** yamahata has joined #openstack-infra17:25
*** heyongli has joined #openstack-infra17:25
openstackstatusfungi: finished sending notice17:26
Shrewsfungi: agree that things look normal'ish enough for me to step out for lunch for a bit?17:26
openstackgerritSean McGinnis proposed openstack-infra/project-config master: Remove DragonFlow tagging ACL  https://review.openstack.org/57377217:26
*** electrofelix has quit IRC17:27
fungiShrews: i believe so, yes17:27
fungii'm about to go cook something up myself17:27
Shrewshttp://grafana.openstack.org/dashboard/db/nodepool seems sane17:28
*** dtantsur is now known as dtantsur|afk17:30
fungiyeah, looks like lots of jobs are running and quite a few have already succeeded17:30
fungireenqueue is still underway17:31
clarkbShrews: as a datapoint before I forget nl03 was restarted yesterday at about 3pm pacific ish and was successfully launching nodes17:32
clarkbI don't know when all the new stuff merged but possibly it's not a 100% always-on failure17:32
*** aojea has quit IRC17:32
Shrewsclarkb: thx17:33
*** heyongli has quit IRC17:35
*** heyongli has joined #openstack-infra17:36
*** myoung is now known as myoung|bbl17:36
*** camunoz has quit IRC17:37
*** camunoz has joined #openstack-infra17:38
*** lpetrut has joined #openstack-infra17:41
*** XP_2600 has joined #openstack-infra17:43
*** amoralej is now known as amoralej|off17:44
*** hemna_ has joined #openstack-infra17:45
*** heyongli has quit IRC17:46
*** heyongli has joined #openstack-infra17:46
*** XP_2600 has quit IRC17:47
*** jcoufal has joined #openstack-infra17:53
*** jcoufal has quit IRC17:53
*** jcoufal has joined #openstack-infra17:54
*** heyongli has quit IRC17:56
*** heyongli has joined #openstack-infra17:56
openstackgerritMerged openstack-infra/project-config master: Add magnum to storyboard  https://review.openstack.org/57244018:02
*** heyongli has quit IRC18:06
*** heyongli has joined #openstack-infra18:07
*** rajinir has joined #openstack-infra18:13
*** neiloy has joined #openstack-infra18:14
*** heyongli has quit IRC18:16
*** heyongli has joined #openstack-infra18:17
*** kgiusti has quit IRC18:18
*** kjackal has joined #openstack-infra18:20
*** yamamoto has quit IRC18:25
*** yamamoto has joined #openstack-infra18:26
*** heyongli has quit IRC18:27
*** heyongli has joined #openstack-infra18:27
*** rmcall has joined #openstack-infra18:30
*** yamamoto has quit IRC18:31
*** hasharAway is now known as hashar18:31
*** heyongli has quit IRC18:37
*** heyongli has joined #openstack-infra18:37
*** gfidente has quit IRC18:40
*** dhill_ has quit IRC18:40
*** jcoufal has quit IRC18:41
*** fried_rolls is now known as fried_rice18:43
*** heyongli has quit IRC18:47
*** heyongli has joined #openstack-infra18:47
Shrewswelp, been totally quiet since lunch. i'm assuming things are still good18:49
fungiseems so18:50
fungithough i just emerged from a huge wok full of fried rice myself so still checking18:50
*** heyongli has quit IRC18:57
*** heyongli has joined #openstack-infra18:58
*** rmcall has quit IRC19:02
fungiyeah, nodes is pretty much maxed out in use, node requests and check pipeline length are steadily (if slowly) dropping19:07
Shrewsok, i *think* i have a theory on what happened.19:07
* fungi pulls up a chair19:07
fungiwill i need popcorn?19:07
Shrewsnah19:07
*** heyongli has quit IRC19:08
*** heyongli has joined #openstack-infra19:08
Shrewsbecause we were running a mix of zk data with node.type of str and node.type of list, our min-ready code (running on nl01, so old code) did not recognize the new nodes as ready19:08
Shrewsso we were rapidly creating new nodes which were never seen as ready by the min-ready code19:09
fungireasonable19:09
Shrewsso we maxed out our capacity19:09
Shrewslaunchers paused19:09
Shrewsthings were sad19:09
fungiand accumulating those at the expense of usable nodes19:09
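(Put another way, a hedged sketch of the runaway Shrews describes; the functions are illustrative, not the actual launcher code. The old min-ready accounting compares node.type to the label as a string, so nodes written with a list type are never counted and the deficit never closes.)

    # Illustrative sketch, assuming the old code compared node.type to the
    # label as a plain string.
    def ready_count(nodes, label):
        # 'ubuntu-xenial' != ['ubuntu-xenial'], so new-schema nodes never count
        return sum(1 for n in nodes
                   if n['state'] == 'ready' and n['type'] == label)

    def nodes_to_launch(nodes, label, min_ready):
        # With every new node stored as {'type': ['ubuntu-xenial'], ...},
        # ready_count() stays at 0 and this keeps requesting more nodes
        # until providers hit max-servers and the launchers pause.
        return max(0, min_ready - ready_count(nodes, label))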
Shrewsi'm going to keep looking, but that seems like at least part of the issue19:10
Shrewsthe solution would have been to stop all launchers, then restart one-at-a-time19:10
Shrewsso that we never ran a mix19:10
Shrewsthis was not easy to predict or foresee19:10
Shrews(which is why changing the zk schema *always* scares me)19:11
Shrewsassuming that was the only issue, we'll need a release note for admins19:12
Shrewswell, regardless, we'll need that19:13
Shrewstobiash: hopefully your not running nodepool from master  ;)19:14
Shrewss/your/you're/19:14
clarkb++ to a release note19:14
tobiashShrews: ?19:14
fungiyes, i'd rather nobody else needed to trip over this19:14
* tobiash reads backlog19:14
Shrewstobiash: multilabel support requires total shutdown of nodepool, then restart19:15
fungithough maybe this is only likely to be a problem for deployments with multiple launchers?19:15
clarkbshrews does it require clearing all existing noderequests too?19:15
Shrewstobiash: err, all nodepool launchers that is19:15
Shrewsclarkb: no19:15
tobiashShrews: total shutdown means also deletong all nodes in zk?19:15
Shrewstobiash: shouldn't need to19:16
tobiashI'm currently having only one launcher19:16
Shrewstobiash: oh, then you're fine19:16
tobiashso I guess that is a full restart :)19:16
clarkbprobably worth a status log too with the path forward19:17
Shrewsclarkb: i'll throw something together19:17
*** heyongli has quit IRC19:18
*** heyongli has joined #openstack-infra19:18
tobiashso that was the reason for that massive queueing today?19:19
Shrewsstatus log Nodepool issue from earlier today seems to have been caused by nl03 launcher restart. Mixed, incompatible versions of code caused us to create min-ready nodes continually until we reached full capacity.19:19
Shrewssomething like ^^ ?19:20
Shrewstobiash: believe so19:21
clarkbShrews: ya and maybe note that a full restart of the launchers is necessary to go forward19:21
fungirca lgtm19:22
Shrewsstatus log Nodepool issue from earlier today seems to have been caused by nl03 launcher restart. Mixed, incompatible versions of code caused us to create min-ready nodes continually until we reached full capacity. A full restart of nodepool launchers is necessary to prevent this going forward.19:23
Shrewsstatus log Nodepool issue from earlier today seems to have been caused by nl03 launcher restart. Mixed, incompatible versions of code caused us to create min-ready nodes continually until we reached full capacity. A full rhutddown and restart of nodepool launchers is necessary to prevent this going forward.19:23
clarkb++19:23
Shrewsrhutddown... gah19:23
*** camunoz has quit IRC19:23
* Shrews edits in vim19:23
fungiyou just need a vim-based irc client and you'll be all set19:24
Shrews#status log Nodepool issue from earlier today seems to have been caused by nl03 launcher restart. Mixed, incompatible versions of code caused us to create min-ready nodes continually until we reached full capacity. A full shutdown and restart of nodepool launchers is necessary to prevent this going forward.19:25
*** eharney has joined #openstack-infra19:25
openstackstatusShrews: finished logging19:25
fungithanks for figuring that out, Shrews!19:27
*** heyongli has quit IRC19:28
*** heyongli has joined #openstack-infra19:28
*** yamamoto has joined #openstack-infra19:29
Shrewsi'm not quite sure how to test and validate my shutdown/restart suggestion19:31
*** vivsoni_ has quit IRC19:31
Shrewsi mean, in my head... it works19:32
*** vivsoni has joined #openstack-infra19:32
*** dklyle has joined #openstack-infra19:32
clarkbI think we can run two launchers against a devstack and see they work?19:32
Shrewsinternal debugger is efficient, but limited19:32
clarkbmaybe even modufy our existing job to do it19:32
Shrewsclarkb: but it requires different code versions. that'd be hard to do with existing job19:34
*** armaan has quit IRC19:34
*** armaan has joined #openstack-infra19:34
Shrewsclone old version; run nodepool; build nodes; shutdown nodepool; upgrade nodepool; run nodepool; send node requests19:35
Shrewsmight have to do this manually, somehow19:36
tobiashShrews: do you do sharding of the clouds or does every launcher talk with every cloud?19:36
Shrewstobiash: we shard19:36
*** myoung|bbl is now known as myoung19:36
*** yamamoto has quit IRC19:37
tobiashah, had to reread your explanation to match that with sharding in my head19:37
*** dklyle has quit IRC19:37
*** heyongli has quit IRC19:38
Shrewsi'm still investigating how it affected the other launchers. i know the min-ready requests would at least put additional pressure on them due to shared labels19:39
*** heyongli has joined #openstack-infra19:39
*** slaweq has quit IRC19:39
*** kjackal has quit IRC19:40
*** slaweq has joined #openstack-infra19:41
*** kjackal has joined #openstack-infra19:41
*** lifeless has quit IRC19:46
*** heyongli has quit IRC19:49
*** heyongli has joined #openstack-infra19:49
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Fix 'satisfy' spelling errors  https://review.openstack.org/57382319:54
*** VW has quit IRC19:58
*** heyongli has quit IRC19:59
*** heyongli has joined #openstack-infra19:59
sthusseyIs the post pipeline queue still executing?20:01
*** kjackal has quit IRC20:08
*** heyongli has quit IRC20:09
*** heyongli has joined #openstack-infra20:10
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Add release note about upgrading for multi-label  https://review.openstack.org/57382720:11
*** jesslampe has joined #openstack-infra20:12
*** armaan has quit IRC20:12
*** armaan has joined #openstack-infra20:13
*** heyongli has quit IRC20:19
*** heyongli has joined #openstack-infra20:20
*** lifeless has joined #openstack-infra20:22
fungisthussey: we reenqueued everything that had been enqueued before the restart, so if you didn't get job results and don't see your change pending/running at https://zuul.openstack.org/ then you'll need to recheck it (or have its approval vote reapplied if it's supposed to be in the gate)20:25
sthusseyIt is in the queue, I just haven't seen the queue change in 45m20:26
sthusseyWell, change aside from grow.20:26
*** e0ne has joined #openstack-infra20:27
fungihttp://grafana.openstack.org/dashboard/db/zuul-status indicates the check and gate pipelines are steadily (if slowly) shrinking since we brought everything back up about 3 hours ago20:28
*** heyongli has quit IRC20:30
*** heyongli has joined #openstack-infra20:30
sthusseyRight, this is the post pipeline20:31
*** armaan has quit IRC20:31
*** armaan has joined #openstack-infra20:31
*** iranzo has quit IRC20:35
fungioh, post and periodic run at the lowest priority, so don't get node allocations until gate and check (primarily) are satisfied20:35
clarkbok back now20:36
clarkbShrews: as for restarting again for infra maybe we plan to do that monday?20:36
fungiwfm20:36
Shrewswfm2. we can always do what we did today if that doesn't work more betterer20:38
clarkbya we now have a rollback process that is known to work so less scary20:40
*** agopi has joined #openstack-infra20:40
*** heyongli has quit IRC20:40
*** heyongli has joined #openstack-infra20:40
*** dhill_ has joined #openstack-infra20:41
fungiwell, and we actually know what's going on so can keep it brief20:42
*** hemna_ has quit IRC20:42
*** Swimingly has quit IRC20:44
*** Swimingly has joined #openstack-infra20:45
*** Swimingly has quit IRC20:45
*** Swimingly has joined #openstack-infra20:47
*** Swimingly has quit IRC20:47
*** Swimingly has joined #openstack-infra20:47
*** heyongli has quit IRC20:50
*** heyongli has joined #openstack-infra20:50
*** heyongli has quit IRC21:00
*** heyongli has joined #openstack-infra21:01
*** kjackal has joined #openstack-infra21:03
*** r-daneel has joined #openstack-infra21:05
*** neiloy has quit IRC21:07
*** r-daneel_ has joined #openstack-infra21:08
*** r-daneel has quit IRC21:09
*** r-daneel_ is now known as r-daneel21:09
*** heyongli has quit IRC21:11
*** heyongli has joined #openstack-infra21:11
sthusseyAh, thanks. That explains it. We've chosen the wrong pipeline.21:13
clarkbnot necessarily. Depends on what the job's function is and when it needs to run21:14
*** lpetrut has quit IRC21:15
*** ldnunes has quit IRC21:15
fungisince the same jobs get run for each commit which merges to the post pipeline and are generally idempotent, it's been deemed non-time-sensitive and possibly lossy (i.e., we don't make any effort to preserve and restore it on scheduler restarts)21:16
mnaserhttps://review.openstack.org/#/c/573738/21:18
mnaserif you wanna bump that up to help speed things up21:18
clarkbI can enqueue it to the gate directly21:19
clarkbfungi: ^ any reason to not do that?21:19
mnaserit's not much but i figured it would help21:19
clarkbbetter than nothing21:20
clarkboh except we have puppet disabled on those nodes21:20
clarkbI'll just manually increase it to 25 then when puppet is running again it will match21:20
sthusseythis is our job to publish docker images21:21
*** heyongli has quit IRC21:21
clarkb#status log Manually applied https://review.openstack.org/#/c/573738/ to nl03 as nl* are disabled in puppet until we sort out the migration to the new zk schema21:21
openstackstatusclarkb: finished logging21:21
*** heyongli has joined #openstack-infra21:21
fungiclarkb: right, hand-patching is the way for now. i guess it's just needed on one launcher21:21
sthusseyprobably should be in the promotion pipeline which I wasn't aware of when I was writing the CI stuff21:21
clarkbsthussey: promote and post have the same priority fwiw. The difference is the gerrit event they trigger off of, one is tied to generic ref updates the other to changes merging21:23
fungisthussey: normally it wouldn't be a bad choice... we simply had a couple of disruptive incidents strike in the middle of the mad rush to land patches for milestone 221:23
fungiso things are a lot more backed up than usual21:23
sthusseyOkay, post is probably fine. We want this after merges.21:23
sthusseyThanks21:23
clarkbfungi: do we know why unbound's package update broke the daemons?21:24
clarkbfungi: and should we expect that to happen next time they update the package (and why were none of our other servers affected, or are they?) sorry I'm dumping all the questions now that fires are mostly out and I'm back from the doctor visit21:24
fungiclarkb: suspicions but no smoking gun... it seems like the zuul executors, being under heavy load, exposed some sort of race in the daemon restart triggered by the deb maintscripts at package upgrade time21:26
*** dklyle has joined #openstack-infra21:26
fungipossibly the old process hadn't fully unlinked the listening socket?21:26
fungithe logs were not helpful21:26
clarkbweird, we've got unbound all over the place would've expected a more universal problem (also I run it at home and don't have problems like that either)21:27
fungiagreed21:27
funginone of our other servers seem to have been impacted21:27
*** heyongli has quit IRC21:31
*** heyongli has joined #openstack-infra21:32
*** myoung is now known as myoung|off21:34
*** boden has quit IRC21:34
*** heyongli has quit IRC21:41
*** armaan has quit IRC21:41
*** heyongli has joined #openstack-infra21:42
*** dhill_ has quit IRC21:45
*** felipemonteiro has joined #openstack-infra21:51
*** heyongli has quit IRC21:52
*** heyongli has joined #openstack-infra21:52
*** lifeless_ has joined #openstack-infra21:58
*** e0ne has quit IRC22:00
*** lifeless has quit IRC22:00
*** heyongli has quit IRC22:02
*** heyongli has joined #openstack-infra22:02
*** jamesmcarthur has joined #openstack-infra22:02
*** r-daneel has quit IRC22:03
*** hashar has quit IRC22:04
*** nicolasbock has quit IRC22:06
*** jamesmcarthur has quit IRC22:07
*** ykarel has joined #openstack-infra22:09
*** rfolco_ has joined #openstack-infra22:11
*** rfolco has quit IRC22:12
*** heyongli has quit IRC22:12
openstackgerritJeremy Stanley proposed openstack-infra/storyboard master: use required enums to validate 'type' args  https://review.openstack.org/54517022:12
openstackgerritJeremy Stanley proposed openstack-infra/storyboard master: mark worklist filter_criteria as a required field  https://review.openstack.org/54540522:12
openstackgerritJeremy Stanley proposed openstack-infra/storyboard master: mark FilterCriterion title as a mandatory field  https://review.openstack.org/54540622:12
*** heyongli has joined #openstack-infra22:12
*** diablo_rojo has quit IRC22:13
openstackgerritZane Bitter proposed openstack-dev/cookiecutter master: Convert to new docs PTI  https://review.openstack.org/57329422:13
*** ykarel_ has joined #openstack-infra22:14
*** florianf has quit IRC22:14
*** ykarel has quit IRC22:16
openstackgerritMerged openstack-infra/storyboard master: Make notification driver configurable  https://review.openstack.org/53857422:19
*** heyongli has quit IRC22:22
*** heyongli has joined #openstack-infra22:23
*** agopi has quit IRC22:24
ianwclarkb: (scrollback) yep dib from 2.0 doesn't actually use dib-utils ... dib-run-parts was the only thing in it22:24
clarkbianw: thank you for confirming22:24
*** lbragstad has joined #openstack-infra22:25
*** elbragstad has quit IRC22:25
openstackgerritIan Wienand proposed openstack-infra/system-config master: mirror-update: install afsmon and run from cron  https://review.openstack.org/57349322:26
ianwwas zuul-scheduler ok since its weird gerrit dropout yesterday?22:26
ianwthinking about it, it may have had to do with the unbound thing too22:26
*** rlandy|rover has quit IRC22:27
clarkbits fine now22:27
clarkbwe had to surgery things22:27
clarkblargely nodepool related but had to take scheduler offline too to avoid it talking to zk22:27
ianwok, put that one down to a glitch in the matrix22:29
*** slaweq has quit IRC22:30
*** r-daneel has joined #openstack-infra22:30
*** slaweq has joined #openstack-infra22:30
*** harlowja has joined #openstack-infra22:30
*** heyongli has quit IRC22:33
*** heyongli has joined #openstack-infra22:33
*** fried_rice is now known as efried22:35
*** lifeless_ is now known as lifeless22:41
*** tpsilva has quit IRC22:42
*** heyongli has quit IRC22:43
*** hongbin has quit IRC22:43
*** heyongli has joined #openstack-infra22:43
*** felipemonteiro has quit IRC22:44
openstackgerritMerged openstack-infra/storyboard master: Add MQTT notification publisher  https://review.openstack.org/53857522:45
openstackgerritMerged openstack-infra/storyboard master: Add configurable notification subscriber and mqtt driver  https://review.openstack.org/54095822:45
openstackgerritMerged openstack-infra/storyboard master: Make it impossible to create a userless private story  https://review.openstack.org/41607022:46
*** ykarel_ has quit IRC22:49
openstackgerritMerged openstack-infra/storyboard master: Document some usage instructions for a freshly deployed dev instance  https://review.openstack.org/55601822:50
*** heyongli has quit IRC22:53
*** heyongli has joined #openstack-infra22:53
*** felipemonteiro has joined #openstack-infra22:57
*** rpioso is now known as rpioso|afk23:00
*** r-daneel_ has joined #openstack-infra23:01
*** caphrim007_ has joined #openstack-infra23:01
*** caphrim007_ has quit IRC23:03
*** r-daneel has quit IRC23:03
*** r-daneel_ is now known as r-daneel23:03
*** heyongli has quit IRC23:03
*** heyongli has joined #openstack-infra23:04
*** ykarel_ has joined #openstack-infra23:04
*** caphrim007 has quit IRC23:04
*** ykarel_ has quit IRC23:09
*** kjackal has quit IRC23:12
*** heyongli has quit IRC23:14
*** heyongli has joined #openstack-infra23:14
*** claudiub has quit IRC23:21
*** felipemonteiro has quit IRC23:23
*** heyongli has quit IRC23:24
*** heyongli has joined #openstack-infra23:24
*** r-daneel has quit IRC23:25
*** sthussey has quit IRC23:28
*** Swami has quit IRC23:31
*** r-daneel has joined #openstack-infra23:33
*** heyongli has quit IRC23:34
*** heyongli has joined #openstack-infra23:34
openstackgerritMerged openstack-infra/project-config master: Bump vexxhost to 25 servers  https://review.openstack.org/57373823:36
*** felipemonteiro has joined #openstack-infra23:40
*** XP_2600 has joined #openstack-infra23:44
*** heyongli has quit IRC23:44
*** heyongli has joined #openstack-infra23:45
*** XP_2600 has quit IRC23:48
clarkbdown to 338 node requests23:53
*** heyongli has quit IRC23:55
*** heyongli has joined #openstack-infra23:55
*** r-daneel has quit IRC23:58
