Friday, 2018-12-07

corvusclarkb: oh; er, do you have the whole sequence handy? (or a pointer to docs?)00:00
clarkbya let me get the email that was sent to the list00:00
clarkbcorvus: http://lists.openstack.org/pipermail/openstack-discuss/2018-December/000489.html is what I started with00:00
*** yamamoto has quit IRC00:01
clarkbI performed the pull and update steps in that email for each of the kubernetes containers that showed up as running according to `atomic containers list --no-trunc`00:02
clarkbthe only thing I did outside of that was free up disk space so that the pull would work. I did that via `atomic images prune` and `journalctl --vacuum-size=100M`00:02
corvusclarkb: do we need to restart anything after it's done?00:03
clarkbcorvus: the update command restarts the containers for us (you can confirm with the containers list command)00:03
corvusok i think i grok now00:04
clarkbthe pull brings down the image, the update restarts services as necessary aiui00:04
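(The sequence described above amounts to roughly the following; the image name, registry and pull flags here are an illustrative sketch based on the linked email, not a verbatim transcript of the commands run:)
  # free disk space so the new images fit
  sudo atomic images prune
  sudo journalctl --vacuum-size=100M
  # then, for each running kube* container reported by
  # `atomic containers list --no-trunc`, pull the new image and
  # update the container in place (the update restarts it), e.g.:
  sudo atomic pull --storage ostree registry.centos.org/centos/kubernetes-apiserver:latest
  sudo atomic containers update kubernetes-apiserver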
clarkbcorvus: note that the minion was updated too (it runs a subset of the services)00:08
clarkbcorvus: thinking out loud, does the json change need an executor restart?00:10
corvusclarkb: yep.  i did your foreach kube*; pull, update on all00:10
clarkbthat's part of the ansible processes which are forked on demand and should've seen the change as soon as we installed it?00:10
corvusclarkb: i think it will need a restart because it's one of the things we copy into place on startup00:11
clarkbah00:11
*** rlandy has quit IRC00:11
corvusinfra-root: i've created another k8s cluster via magnum in vexxhost to experiment with running gitea in it00:12
corvusclarkb: ^ and i just upgraded it, because i forgot to specify the version on create00:12
fungiokay, i'm fed/inebriated and catching up00:12
clarkbcorvus: it was actually pretty painless once I figured out how to get around the disk constraints00:13
corvusif this works, i'll be able to push some changes to system-config for it.  but for now i'd just like to experiment and see what works00:13
corvusclarkb: agreed; i'm done now :)00:13
clarkbwell that and knowing what to do in the first place but thankfully people sent email to the discuss list about it00:13
corvusoh, the cluster is called "opendev" btw00:14
fungiwe're ~ready to do executor restarts nowish i guess?00:16
fungi(unless i've misread)00:16
clarkbI am ready if you are00:16
fungize01 seemed to have the right (new) version installed when i looked a moment ago00:19
fungii'll double-check them all real fast00:19
corvuswfm.  i'd recommend an all-stop, then update fstab on ze12, then all start.00:20
corvusdoes someone else want to do the all-stop, and i'll fixup ze12 when they're stopped?00:20
fungihuh, suddenly i've lost (ipv6 only?) access to ze0100:21
clarkbcorvus: I'm happy to do it, though maybe fungi would like to if he hasn't done one before?00:21
clarkbfungi: ansible ze*.openstack.org -m shell -a 'sytemctl stop zuul-executor' should work iirf00:22
clarkbif you fix the typos :/00:22
clarkbthen we wait for all of them to stop and switch stop to start00:22
fungiseems i'm suddenly able to connect again and finishing double-checking installed versions00:23
fungiokay, ze01-ze12 report the expected versions via pbr freeze00:25
fungize12 doesn't actually seem to reflect any zuul installed according to pbr freeze so digging deeper00:25
fungize01-ze11 did00:25
clarkbfungi: python3 $(which pbr) freeze ?00:26
clarkbchances are pbr is currently installed under python2 so it looks there by default00:26
fungipip3 freeze works, so yeah00:26
*** bobh has joined #openstack-infra00:26
fungiwell, chances are pbr is installed under _both_ (otherwise zuul wouldn't install under python3) but the executable entrypoint is the python2 version00:27
clarkbright the "binary" executable00:27
fungipbr freeze indicates "zuul==3.3.2.dev32  # git sha e2520b9" on the others and pip3 freeze on ze12 reports "zuul==3.3.2.dev32" so i take that as a match00:28
fungiunfortunately `python3 -m pbr freeze` isn't a thing00:29
fungiperhaps mordred knows what needs to be added to make that also work00:29
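(For reference, a quick way to see which interpreter a console script like pbr is bound to, and to get the python3 view regardless; the grep pattern is just illustrative:)
  # the shebang shows which python owns the 'pbr' entry point
  head -1 "$(which pbr)"
  # run pbr's freeze under python3 explicitly, or fall back to pip3
  python3 "$(which pbr)" freeze | grep -i zuul
  pip3 freeze | grep -i zuul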
fungianyway, i think we're good for executor restarts if we want00:30
corvusi'm ready00:30
clarkbI'm still here00:31
*** jmorgan1 has quit IRC00:31
*** bobh has quit IRC00:31
*** yamamoto has joined #openstack-infra00:32
fungiokay, so should we be manually restarting executors one-by-one or use the zuul_restart playbook in system-config?00:33
corvusfungi: not that one -- it does a full system restart00:33
corvusfungi: i'd just do what clarkb suggested above00:33
clarkbfungi: I put an example command above that should work `ansible ze*.openstack.org -m shell -a 'systemctl stop zuul-executor'`00:34
clarkbthat probably needs sudo on bridge.o.o00:34
fungiso `ansible ze*.openstack.org -m shell -a 'sytemctl stop zuul-executor' (modulo typos)00:34
clarkbya00:34
fungii'll do that now00:35
fungiobviously then we start too00:35
clarkbfungi: no you wait before starting00:35
fungik00:35
clarkbthe executors will take some time to completely stop (so we can check ps -elf | grep zuul-executor or whatever incantation you prefer for that sort of thing)00:35
corvusit takes like 15 minutes to stop00:35
clarkbI do something like ps -elf | grep zuul | wc -l to get a countdown metric00:36
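(Taken together, the stop/wait/start cycle from bridge.o.o looks roughly like this; the `zuul-` grep pattern, refined a bit further down, avoids counting stray ssh control-persist processes:)
  # stop all executors
  sudo ansible 'ze*.openstack.org' -m shell -a 'systemctl stop zuul-executor'
  # wait for the processes to drain (this can take ~15 minutes)
  sudo ansible 'ze*.openstack.org' -m shell -a 'ps -elf | grep zuul- | wc -l'
  # once the counts bottom out, start everything again
  sudo ansible 'ze*.openstack.org' -m shell -a 'systemctl start zuul-executor'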
fungion bridge.o.o i've run `sudo ansible ze*.openstack.org -m shell -a 'systemctl stop zuul-executor'`00:36
fungiit reported "CHANGED" for all 12 executors, which i take as a good sign00:37
corvusyep00:37
corvusi see ze12 stopping00:37
corvusi have already rsynced the necessary data, so the ze12 switcheroo shouldn't take long00:38
corvusze12 has stopped00:38
fungiawesome00:39
clarkbze01 is down to 82 processes by my earlier command00:39
clarkbnow 80, so trending in the expected direction00:39
fungiyeah, `sudo ansible ze*.openstack.org -m shell -a 'ps -elf | grep zuul00:40
fungi| wc -l'`00:40
*** tosky has quit IRC00:40
corvusze12 is ready to go00:40
fungiis returning pretty high numbers (disregard the stray newline)00:40
fungireturning 52 for ze1200:41
clarkbfungi: it likely won't go to zero fwiw due to ssh control persist processes being stubborn00:41
fungik00:41
corvusgrep for "zuul-" instead00:41
fungiahh00:41
corvusas in "zuul-executor"00:41
fungimuch more reasonable00:41
fungi4 on all but ze12 which returns 200:42
fungi2 is the new 0?00:42
clarkbfungi: according to puppet: yes00:42
corvuswith grep, i think so :)00:42
fungik00:42
corvusze12 has been stopped for a while00:42
fungii get it ;)00:42
clarkbze01 looks stopped00:43
fungiseveral of them are returning 2 now, yes00:43
fungi, 05, 08 and 10 still going00:44
funginow just 04 and 0500:44
*** yamamoto has quit IRC00:44
*** mriedem has quit IRC00:45
fungiand now just 05 left00:46
openstackgerritMarcH proposed openstack-infra/git-review master: doc: new testing-behind-proxy.rst; tox.ini: passenv = http[s]_proxy  https://review.openstack.org/62336100:47
*** yamamoto has joined #openstack-infra00:47
fungi100% stopped now00:48
corvusi cleaned up all old build dirs00:48
fungiready to start, or wait?00:48
corvusfungi: ready00:48
fungifinger hovering over the button00:48
fungiclarkb: all clear?00:48
clarkbfungi: ya00:49
fungirunning00:49
corvusze12 seems happy00:49
clarkbI see new executor on ze0100:49
fungishould all be starting up now00:49
*** Swami has quit IRC00:49
fungi`ps -elf | grep zuul- | wc -l` is returning 4 for all00:49
openstackgerritMarcH proposed openstack-infra/git-review master: CONTRIBUTING.rst, HACKING.rst: fix broken link, minor flow updates  https://review.openstack.org/62336200:50
clarkbswap has been stable so far00:51
corvuslooks like the restart did clear out swap usage too, so it should be easy to compare00:52
corvus(ie, swap went to 0 at restart)00:52
mwhahahaif anyone is around to promote https://review.openstack.org/#/c/623293/ in the tripleo gate that would be helpful (to stop resets due to nested virt crashes)00:54
corvuslooks pretty good; i'm going to eod now00:54
* mwhahaha wanders off00:54
fungimwhahaha: i've promoted it now00:57
mwhahahaThanks00:58
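(Promoting a change to the head of a gate queue is done with the zuul client on the scheduler; roughly as below, with the tenant name and patchset number being assumptions:)
  sudo zuul promote --tenant openstack --pipeline gate --changes 623293,3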
*** rkukura has quit IRC00:59
clarkbthere are no more queued jobs now01:06
*** bobh has joined #openstack-infra01:06
clarkbno more executor queued jobs01:07
clarkbthere may be jobs the scheduler is waiting for nodes on01:07
*** gyee has quit IRC01:09
*** bobh has quit IRC01:10
pabelangerze12.o.o looks to be running a different kernel for some reason01:11
pabelangerLinux ze12 4.4.0-137-generic #163-Ubuntu SMP Mon Sep 24 13:14:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux01:11
pabelangerthis is from ze0101:11
pabelangerLinux ze01 4.15.0-42-generic #45~16.04.1-Ubuntu SMP Mon Nov 19 13:02:27 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux01:11
clarkbhrm how do we apply the hwe kernel?01:12
clarkbI thought we puppeted that01:12
clarkbmaybe we haven't rebooted since puppet ran01:12
clarkbI bet that is it01:12
pabelangeryah, maybe01:12
clarkbsince now puppet is a step after launch01:12
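(A quick way to compare kernels across the fleet and confirm the HWE kernel package is installed and only waiting on a reboot; the package name is assumed for xenial:)
  sudo ansible 'ze*.openstack.org' -m shell -a 'uname -r'
  sudo ansible ze12.openstack.org -m shell -a 'dpkg -l linux-generic-hwe-16.04; ls /boot/vmlinuz-*'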
pabelangerbut, it seems ze12.o.o is running more jobs right now01:12
pabelangerand using less ram01:12
pabelangerunsure if related to kernel or just smaller jobs01:13
clarkbpabelanger: the jobs definitely have an impact on executor memory01:13
clarkbso could be distribution of "expensive" jobs01:13
pabelangeryah, the HDD usage on ze12 is low also01:13
pabelangerso, likely smaller projects01:13
*** kjackal has quit IRC01:14
clarkbze01 swap use is still pretty stable01:18
clarkbze12 is apparently swapping01:18
clarkbwhich is maybe not surprising since the old kernel has/had issues with swap01:18
pabelangeryah, we should stop it and reboot if we wanted01:19
pabelangerbut ze01.o.o swap looks good01:20
clarkbI've got to go to a birthday dinner so I can't do that now but can help in the morning if we want to do that01:20
pabelangerwfm01:20
*** jamesmcarthur has joined #openstack-infra01:21
*** jamesmcarthur has quit IRC01:24
*** jamesmcarthur has joined #openstack-infra01:25
*** rkukura has joined #openstack-infra01:40
*** yamamoto has quit IRC01:49
*** bobh has joined #openstack-infra01:51
*** wolverineav has quit IRC01:55
*** bobh has quit IRC01:55
*** betherly has joined #openstack-infra01:59
*** dave-mccowan has quit IRC02:00
*** jamesmcarthur has quit IRC02:01
*** betherly has quit IRC02:04
*** bobh has joined #openstack-infra02:10
*** mrsoul has joined #openstack-infra02:12
*** bobh has quit IRC02:15
lbragstadawesome write up to the mailing list clarkb02:26
*** hongbin has joined #openstack-infra02:41
*** bobh has joined #openstack-infra02:47
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: web: add OpenAPI documentation  https://review.openstack.org/53554102:49
*** bobh has quit IRC02:51
*** psachin has joined #openstack-infra02:53
*** imacdonn has quit IRC02:53
*** imacdonn has joined #openstack-infra02:53
*** betherly has joined #openstack-infra03:01
*** bhavikdbavishi has joined #openstack-infra03:02
*** betherly has quit IRC03:05
*** bobh has joined #openstack-infra03:06
*** rh-jelabarre has quit IRC03:08
*** bobh has quit IRC03:10
*** dave-mccowan has joined #openstack-infra03:11
*** dave-mccowan has quit IRC03:19
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: web: add OpenAPI documentation  https://review.openstack.org/53554103:20
*** jamesmcarthur has joined #openstack-infra03:23
*** diablo_rojo has quit IRC03:25
*** jamesmcarthur has quit IRC03:41
*** bobh has joined #openstack-infra03:43
*** bobh has quit IRC03:48
*** ramishra has joined #openstack-infra03:50
*** ykarel|away has joined #openstack-infra03:52
*** bobh has joined #openstack-infra04:01
*** jamesmcarthur has joined #openstack-infra04:02
*** bobh has quit IRC04:06
*** jamesmcarthur has quit IRC04:07
*** neilsun has joined #openstack-infra04:15
*** lbragstad has quit IRC04:23
openstackgerritJea-Min, Lim proposed openstack-infra/project-config master: add new project called ku.stella this project is unofficial openstack project  https://review.openstack.org/62339604:30
*** bobh has joined #openstack-infra04:37
*** psachin has quit IRC04:41
*** bobh has quit IRC04:42
*** wolverineav has joined #openstack-infra04:44
*** ykarel|away has quit IRC04:46
*** janki has joined #openstack-infra04:46
mrhillsmanis there a dib element for running a custom script?04:49
*** bobh has joined #openstack-infra04:56
*** psachin has joined #openstack-infra04:58
*** bobh has quit IRC05:01
*** ykarel|away has joined #openstack-infra05:04
ianwmrhillsman: umm, every dib element is a custom script?05:05
ianwthat's kind of the point of them :)05:05
*** bobh has joined #openstack-infra05:14
*** bobh has quit IRC05:19
*** wolverineav has quit IRC05:21
*** wolverineav has joined #openstack-infra05:32
*** bobh has joined #openstack-infra05:32
*** wolverineav has quit IRC05:36
*** bobh has quit IRC05:36
*** bobh has joined #openstack-infra05:51
*** bobh has quit IRC05:55
prometheanfireI'm guessing it's been a busy day/week?06:02
*** bobh has joined #openstack-infra06:10
*** bobh has quit IRC06:14
*** bhavikdbavishi has quit IRC06:18
*** adam_zhang has joined #openstack-infra06:22
*** bobh has joined #openstack-infra06:28
*** hongbin has quit IRC06:32
*** bobh has quit IRC06:33
*** kjackal has joined #openstack-infra06:42
*** adam_zhang has quit IRC06:44
*** bobh has joined #openstack-infra06:46
*** bobh has quit IRC06:51
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: web: add OpenAPI documentation  https://review.openstack.org/53554106:52
*** hwoarang has quit IRC06:54
*** hwoarang has joined #openstack-infra06:56
openstackgerritQuique Llorente proposed openstack-infra/zuul master: Add default value for relative_priority  https://review.openstack.org/62217507:01
*** rcernin has quit IRC07:01
*** bobh has joined #openstack-infra07:03
openstackgerritMerged openstack-infra/zuul master: web: refactor status page to use a reducer  https://review.openstack.org/62139507:04
openstackgerritMerged openstack-infra/zuul master: web: refactor jobs page to use a reducer  https://review.openstack.org/62139607:06
*** yamamoto has joined #openstack-infra07:07
*** bobh has quit IRC07:07
*** betherly has joined #openstack-infra07:08
*** dklyle has quit IRC07:09
*** dklyle has joined #openstack-infra07:10
*** yamamoto has quit IRC07:11
*** betherly has quit IRC07:13
*** pgaxatte has joined #openstack-infra07:19
*** bobh has joined #openstack-infra07:21
*** dpawlik has joined #openstack-infra07:24
*** bobh has quit IRC07:26
*** kjackal has quit IRC07:28
*** jtomasek has joined #openstack-infra07:28
*** ykarel|away is now known as ykarel07:35
*** alexchadin has joined #openstack-infra07:35
*** bobh has joined #openstack-infra07:38
*** bobh has quit IRC07:43
*** dims has quit IRC07:44
*** dims has joined #openstack-infra07:47
*** kjackal has joined #openstack-infra07:54
*** bobh has joined #openstack-infra07:56
*** bobh has quit IRC08:01
*** ykarel is now known as ykarel|lunch08:02
*** bobh has joined #openstack-infra08:14
*** bobh has quit IRC08:19
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Use combined status for Github status checks  https://review.openstack.org/62341708:23
*** bobh has joined #openstack-infra08:30
*** bobh has quit IRC08:35
tobias-urdintonyb: thank you tony!08:38
*** jpena|off is now known as jpena08:43
*** bobh has joined #openstack-infra08:48
*** shardy has joined #openstack-infra08:52
*** bobh has quit IRC08:53
*** jpich has joined #openstack-infra08:57
*** ykarel|lunch is now known as ykarel09:00
*** yamamoto has joined #openstack-infra09:04
*** bobh has joined #openstack-infra09:07
*** ccamacho has joined #openstack-infra09:09
*** ccamacho has quit IRC09:09
*** bobh has quit IRC09:11
tobiashclarkb, corvus: when I look at grafana I think the starting builds graph looks odd. During the times when all executors deregistered it's constantly at 5. So maybe something changed in the starting jobs phase.09:14
*** ccamacho has joined #openstack-infra09:15
tobiashclarkb, corvus: to me it looks like job starting (maybe repo setup or gathering facts) is slower. So the swapping could be a red herring.09:16
*** bobh has joined #openstack-infra09:23
*** alexchadin has quit IRC09:24
*** bobh has quit IRC09:27
*** bobh has joined #openstack-infra09:37
*** gfidente has joined #openstack-infra09:37
*** bobh has quit IRC09:41
*** bobh has joined #openstack-infra09:51
*** bobh has quit IRC09:55
*** e0ne has joined #openstack-infra09:59
*** electrofelix has joined #openstack-infra10:02
*** jamesmcarthur has joined #openstack-infra10:03
*** verdurin has quit IRC10:04
*** jamesmcarthur has quit IRC10:07
*** verdurin has joined #openstack-infra10:07
*** bobh has joined #openstack-infra10:09
*** bobh has quit IRC10:13
stephenfinI've noticed that I seem to be getting signed out of Gerrit each day. Has something changed in the past ~3 weeks?10:19
*** yamamoto has quit IRC10:22
*** dpawlik has quit IRC10:23
*** dpawlik has joined #openstack-infra10:23
*** bobh has joined #openstack-infra10:24
*** bhavikdbavishi has joined #openstack-infra10:27
*** bobh has quit IRC10:29
*** ccamacho has quit IRC10:31
*** agopi is now known as agopi-pto10:38
fricklerstephenfin: no changes that I know of, and I seem to stay logged in as long as I'm active once per day10:40
*** ccamacho has joined #openstack-infra10:41
stephenfinfrickler: Ack. Must be a client side issue so. I'll investigate. Thanks! :)10:41
frickleramorin: infra-root: I just saw a gate failure with three simultaneous timed_out jobs all on bhs1, so it seems that something is still not good there, even with the reduced load10:43
*** agopi-pto has quit IRC10:44
*** bobh has joined #openstack-infra10:49
*** yamamoto has joined #openstack-infra10:50
*** pbourke has quit IRC10:50
*** bobh has quit IRC10:53
*** yamamoto has quit IRC10:56
*** eernst has joined #openstack-infra10:58
*** wolverineav has joined #openstack-infra11:00
*** bobh has joined #openstack-infra11:01
*** wolverineav has quit IRC11:05
*** bobh has quit IRC11:06
*** eernst has quit IRC11:08
*** kjackal has quit IRC11:15
*** bobh has joined #openstack-infra11:20
*** bobh has quit IRC11:24
*** pbourke has joined #openstack-infra11:27
*** kjackal has joined #openstack-infra11:33
*** sshnaidm|afk is now known as sshnaidm|off11:33
*** yamamoto has joined #openstack-infra11:36
*** bobh has joined #openstack-infra11:37
*** bobh has quit IRC11:41
*** yamamoto has quit IRC11:46
openstackgerritJens Harbott (frickler) proposed openstack-infra/project-config master: Disable ovh bhs1 and gra1  https://review.openstack.org/62345711:46
frickleramorin: infra-root: ^^ this would be my measure of last resort, unless you have a better idea. but gate queues seem to be effectively stuck due to the large number of timeouts and queue resets11:48
*** gfidente has quit IRC11:49
*** bobh has joined #openstack-infra11:49
*** tosky has joined #openstack-infra11:52
*** bobh has quit IRC11:53
*** dtantsur|afk is now known as dtantsur\11:54
*** dtantsur\ is now known as dtantsur11:54
*** slaweq has joined #openstack-infra12:03
*** yamamoto has joined #openstack-infra12:14
*** bobh has joined #openstack-infra12:15
*** bobh has quit IRC12:20
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Add timer for starting_builds  https://review.openstack.org/62346812:24
tobiashcorvus, clarkb: having metrics about the job startup times could be helpful ^12:24
*** e0ne has quit IRC12:27
fungistephenfin: there's a long standing (mis?)behavior with gerrit where if you have multiple tabs open and you try to use one you've gotten inadvertently signed out of and sign it back in, that will invalidate the session token used by other gerrit tabs in your browser. workaround is to make sure one is signed in and working, then go around and force refresh all your other gerrit tabs _before_ trying to12:38
fungiclick anything in them12:38
*** tobiash has quit IRC12:38
*** rh-jelabarre has joined #openstack-infra12:39
*** bobh has joined #openstack-infra12:39
stephenfinfungi: Ahhhh, I've seen that. It's probably the fact I have tabs open from before vacation that's throwing me so12:39
* stephenfin goes and refreshes everything manually12:39
*** kaisers has quit IRC12:39
fungifrickler: are you sure the gate's stuck? we seem to be merging 10-15 changes an hour. looks to me like we're approving changes faster than we can get them through12:40
pandais zuul in infra deployed using container images build with pbrx ?12:40
fungipanda: not yet, no12:40
fungistill deployed with the puppet-zuul module12:40
*** jpena is now known as jpena|lunch12:40
*** bhavikdbavishi has quit IRC12:41
*** psachin has quit IRC12:41
pandafungi: but is planned to be ?12:41
fungipanda: i believe so, yes. https://specs.openstack.org/openstack-infra/infra-specs/specs/update-config-management.html outlines the transition plan12:43
fungithe containers section mentions "For our Python services, a new tool is in work, pbrx, which has a command for making single-process containers from pbr setup.cfg files and bindep.txt."12:44
pandafungi: ah, the link I was looking for. Thanks!12:48
*** ahosam has joined #openstack-infra12:49
fungialways happy when i can point someone to actual documentation!12:49
*** tobiash_ has joined #openstack-infra12:52
*** tobiash has joined #openstack-infra12:53
*** kaisers has joined #openstack-infra12:56
*** bobh has quit IRC12:56
*** bobh has joined #openstack-infra12:57
*** tobiash has quit IRC13:02
*** tobiash has joined #openstack-infra13:02
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Add timer for starting_builds  https://review.openstack.org/62346813:03
fricklerfungi: most changes that merge seem to be outside of integrated or tripleo queues. still things seem to work a bit better currently, will wait for the next set of results from integrated queue13:05
fungiclarkb was helping track down a lot of failures which weren't related to performance in ovh regions13:08
*** boden has joined #openstack-infra13:08
fungiwhich were impacting the tripleo and integrated gate queues13:08
*** gfidente has joined #openstack-infra13:08
fricklerI added the timeouts that I saw today at the bottom of https://etherpad.openstack.org/p/bhs1-test-node-slowness and they all were in ovh13:09
fricklerbut I'm also fine with waiting to see how things evolve over the weekend13:10
fungii can run some more analysis of timeouts seen by logstash and break them down by provider weighted by max-servers in each13:11
*** chandan_kumar is now known as chkumar|off13:12
*** e0ne has joined #openstack-infra13:13
*** jamesmcarthur has joined #openstack-infra13:18
*** bobh has quit IRC13:22
*** EmilienM is now known as EvilienM13:22
*** bobh has joined #openstack-infra13:24
*** janki has quit IRC13:24
fricklerfungi: https://ethercalc.openstack.org/jg8f4p7jow5o , seems to point to bhs1 still being bad, so maybe disable only that region13:25
fricklerthat's from the results for the last 12h on http://logstash.openstack.org/#/dashboard/file/logstash.json?query=(message:%20%5C%22FAILED%20with%20status:%20137%5C%22%20OR%20message:%20%5C%22FAILED%20with%20status:%20143%5C%22%20OR%20message:%20%5C%22RUN%20END%20RESULT_TIMED_OUT%5C%22)%20AND%20tags:%20%5C%22console%5C%22%20AND%20voting:1&from=864000s13:26
*** bobh has quit IRC13:29
fungifrickler: thanks, and yeah that should only be since the most recent max-servers change to bhs1 so presumably fairly accurate13:31
fungii agree no need to drop gra1, it's doing better on this than all the rackspace regions according to your analysis13:31
fungihard to know if vexxhost-ca-ymq-1 is statistically significant there13:33
fungiso i wouldn't go drawing any conclusions from that one13:34
*** jamesmcarthur has quit IRC13:35
*** neilsun has quit IRC13:36
*** jpena|lunch is now known as jpena13:37
*** sshnaidm|off has quit IRC13:37
*** bobh has joined #openstack-infra13:39
*** jcoufal has joined #openstack-infra13:41
*** eumel8 has joined #openstack-infra13:42
*** edmondsw has quit IRC13:42
openstackgerritFrank Kloeker proposed openstack-infra/project-config master: Re-activate translation job for Trove  https://review.openstack.org/62349213:42
openstackgerritFrank Kloeker proposed openstack-infra/openstack-zuul-jobs master: Add Trove to project doc translation  https://review.openstack.org/62349313:43
*** bobh has quit IRC13:43
*** rlandy has joined #openstack-infra13:44
*** tobiash has left #openstack-infra13:44
*** tobiash_ is now known as tobiash13:45
*** alexchadin has joined #openstack-infra13:47
*** jamesmcarthur has joined #openstack-infra13:49
fricklerfungi: vexxhost-ca-ymq-1 seems to be only special nodes for kata, so that shouldn't be relevant.13:50
*** bobh has joined #openstack-infra13:53
fungiagreed13:56
*** bobh has quit IRC13:58
*** jpich has quit IRC14:02
*** jpich has joined #openstack-infra14:03
*** kgiusti has joined #openstack-infra14:04
*** dave-mccowan has joined #openstack-infra14:05
*** lbragstad has joined #openstack-infra14:06
*** eharney has quit IRC14:06
openstackgerritJens Harbott (frickler) proposed openstack-infra/project-config master: Disable ovh bhs1  https://review.openstack.org/62345714:06
openstackgerritPaul Belanger proposed openstack-infra/zuul master: Add nodepool.host_id variable to inventory file  https://review.openstack.org/62349614:07
*** dave-mccowan has quit IRC14:10
openstackgerritMerged openstack-infra/nodepool master: Include host_id for openstack provider  https://review.openstack.org/62310714:10
*** edmondsw has joined #openstack-infra14:12
mnasermorning infra-root14:19
mnaserwe're having massive issues with our centos-7 builds with 7.6 being out14:19
mnasera lot of timeouts and really slow operations14:19
mordredmnaser: how spectacular14:19
mnasercould i request a node hold?14:19
mnasermordred: stable enterprise os they said14:20
mnaseralmost everything is timing out, it looks like slow io.. or slow network.. something is slow14:20
mnaseri dont know what14:20
mordredmnaser: yeah. value in not changing they said14:20
dmsimardmnaser: which job for which project you'd like to hold ?14:20
mnaserdmsimard: can we have it hold based on job? we have `openstack-ansible-functional-centos-7` or `openstack-ansible-functional-distro_install-centos-7` on any project14:21
mnaserall roles are broken14:21
dmsimardmnaser: yes, it can be based on job14:21
mnaserdmsimard: awesome, any of those two should be ok.14:21
dmsimardmnaser: what project? openstack/openstack-ansible ?14:22
dmsimardI know you said any project, but I need one :p14:23
mnaserdmsimard: it is affecting all of our roles so anything openstack/openstack-ansible-* .. the integrated is broken so i can find a job from openstack/openstack-ansible14:23
mnaserah if its any project then lets just do openstack/openstack-ansible but i have different job name for you14:23
dmsimardok14:23
mnaseropenstack-ansible-deploy-aio_lxc-centos-714:23
mnaserand/or openstack-ansible-deploy-aio_metal-centos-714:23
dmsimardmnaser: the holds are set, next time there's a failure they will be available14:24
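(For reference, an autohold scoped to a job looks roughly like this on the scheduler; the reason text and count are illustrative:)
  sudo zuul autohold --tenant openstack \
      --project openstack/openstack-ansible \
      --job openstack-ansible-deploy-aio_lxc-centos-7 \
      --reason 'debug centos 7.6 slowness' --count 1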
mnaserdmsimard: cool, will they catch and already running job that might fail14:25
mnaseror will it have to be done started now?14:25
dmsimardnot 100% sure14:25
mnaserone literally just failed now, i wonder if were quick enough =P14:25
dmsimardlet me see14:25
mnaserhttps://review.openstack.org/#/c/618711/ 1 minute ago14:25
*** mriedem has joined #openstack-infra14:25
*** lbragstad has quit IRC14:25
dmsimardyeah it triggered the autohold, one min14:25
mnaserperfect timing woo14:27
dmsimardmnaser: root@104.130.117.5814:28
dmsimardmnaser: it's in rax-ord14:28
mnaserawesome, works, thank you14:28
mnaserssh taking ages to log me in14:28
mnaserso thats a good start14:28
*** lbragstad has joined #openstack-infra14:29
dmsimardlooks like ansible-playbook is still running14:29
mnaseryeah, the job didn't fail but timed out (it really shouldn't take this long)14:29
fungimnaser: any chance you're seeing these slow runs just in ovh-bhs1? i think we're about to disable it again because of a disproportionate amount of job timeouts there14:30
mnaserfungi: i've seen some but in this case it's actually been centos and i've noticed it on rax-ord14:30
mnaserwhich in my experience is a stable region but i dunno if 7.6 made weird things14:30
fungiokay, so some more systemic issue i suppose14:30
dmsimardmnaser: setup-infrastructure is 56 minutes in your patch: http://logs.openstack.org/11/618711/3/check/openstack-ansible-deploy-aio_lxc-centos-7/ec4a635/logs/ara-report/14:30
mnaserthing is even log collection14:31
dmsimardit's 28 minutes on opensuse14:31
mnasertimes out after 30 minutes14:31
mnaserso something is strange on centos machines14:31
*** jamesmcarthur has quit IRC14:31
mnaserwe cant even collect logs because we timeout14:31
mnaserhttp://logs.openstack.org/42/614342/4/check/openstack-ansible-functional-centos-7/517947d/job-output.txt.gz#_2018-12-06_17_57_16_16368714:32
mnaserhave a look at that14:32
dmsimardmnaser: looking at the ara report for that timeout'd job, there's definitely something going on14:33
guilhermespyeah mostly yesterday, around 3 rechecks with timeout collecting logs http://logs.openstack.org/20/618820/48/check/openstack-ansible-functional-centos-7/ca0f625/job-output.txt.gz#_2018-12-07_03_13_17_20163114:33
mnaserhttp://logs.openstack.org/11/618711/3/check/openstack-ansible-deploy-aio_lxc-centos-7/ec4a635/logs/ara-report/result/143042ae-27a8-4601-8506-1f4f7bea56a6/14:33
dmsimardmnaser: templating a file shouldn't take >5 minutes14:34
dmsimardcreating files/directories too14:34
mnaseryeah..14:34
mnaserfungi: what was the dd command you've been using?14:34
mordredwow. the systemd log verification failure is nice14:34
mnaser:D14:34
mnaseri cant remember if hwoarang or cloudnull worked on that14:35
fungimnaser: i took it from your swapfile setup example log: sudo dd if=/dev/zero of=/foo bs=1M count=409614:35
mnaserbut one is dealing with a new born and the other is enjoying far east asia14:35
mnaserso dont think they can comment too much :)14:35
mnaserok running that to see what sort of numbers i get14:35
*** wolverineav has joined #openstack-infra14:36
dmsimardtaking 1m30s to create a directory http://logs.openstack.org/11/618711/3/check/openstack-ansible-deploy-aio_lxc-centos-7/ec4a635/logs/ara-report/file/aee893b3-c5f4-4c21-943a-367431ef584c/#line-46 ...14:36
mnaser4294967296 bytes (4.3 GB) copied, 9.86048 s, 436 MB/s14:37
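(Buffered throughput looks healthy there; a sync-bound write is usually the more telling probe when individual tasks like templating a file or creating a directory are slow. Still only a rough smoke test:)
  # buffered throughput (what was run above)
  sudo dd if=/dev/zero of=/foo bs=1M count=4096
  # per-write latency, closer to what slow file/directory operations reflect
  sudo dd if=/dev/zero of=/foo bs=4k count=1000 oflag=dsync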
mnaserhmm14:37
mnasersee whats interesting is14:37
mnaserwhen you login, it takes a while to get a terminal14:37
mnaserim wondering if that has to do with it..14:37
mnaserpam_systemd(sshd:session): Failed to create session: Failed to activate service 'org.freedesktop.login1': timed out14:37
mordredwow14:38
*** eernst has joined #openstack-infra14:38
fungilooks like we lost a merger around 09:00z14:38
*** bobh has joined #openstack-infra14:38
logan-following up on the nested virt issues we were seeing in limestone over the past 3-4 days: all of the HVs have been upgraded from xenial hwe kernel 4.15.0.34.56 to 4.15.0.42.63 now and my jobs that require nested virt are happy again, so I assume others are also.14:38
fungii'll see if i can figure out which merger is missing14:38
*** wolverineav has quit IRC14:41
fungizuul-merger is running on all the dedicated zm hosts14:42
dmsimardmnaser: not sure if related, but watching the processes, sudo commands seem to get stuck a lot?14:42
dmsimardi.e, http://paste.openstack.org/raw/736820/14:42
mnaserdmsimard: that 100% is related to that logind stuff14:42
mnaserjournalctl | grep login114:43
dmsimardoh yeah it's obvious now14:43
*** takamatsu has joined #openstack-infra14:43
*** alexchadin has quit IRC14:43
mnasernow i dont know if we're breaking it or if 7.6 broke it14:43
fungiactually, have we lost 4 mergers? we should have 20, right? 8 dedicated and 12 accessory to the executors?14:44
fungihttp://grafana.openstack.org/d/T6vSHcSik/zuul-status?panelId=30&fullscreen&orgId=1&from=now%2Fd&to=now%2Fd14:44
fungiyep14:44
dmsimardmnaser: from http://grafana.openstack.org/d/T6vSHcSik/zuul-status?orgId=114:45
dmsimardmnaser: wrong name, meant fungi sorry :p14:45
mnaserfigured :)14:45
dmsimardmnaser: in systemd-logind journal there's "Failed to abandon session scope: Transport endpoint is not connected" which leads to https://github.com/systemd/systemd/issues/292514:45
mnaserdmsimard: thats super useful14:46
mnaserit looks like something happened at 12:14:52 which restarted dbus14:46
mnaserand since then it never came back14:46
frickler"Restarting dbus is not supported generally. That disconnects all clients, and the system generally cannot recover from that. This is a dbus limitation." - poettering14:48
fungize12 seems to have registered an oom-killer event in dmesg at 23:21:17z14:48
mnaserlooking here https://github.com/openstack/openstack-ansible-lxc_hosts/blob/6eee41f123dd49d73ad2851b878c11efd6cfffa2/tasks/lxc_cache_preparation_systemd_old.yml14:48
mnaserill move this to #openstack-ansible14:48
*** yamamoto has quit IRC14:49
fungithe other executors don't indicate any recent oom events, but then again they turn over their dmesg ring buffers rather rapidly14:49
openstackgerritFrank Kloeker proposed openstack-infra/project-config master: Add translation job for storyboard  https://review.openstack.org/62350814:50
funginot super urgent as the mergers seem to be keeping up. occasional spikes of 10-20 queued but they clear quickly14:50
eumel8fungi, mordred: ^^ We want to start with storyboard translation in that cycle.14:51
fungibut a little worried that the merger threads on the executors may be dying inexplicably14:51
*** sshnaidm has joined #openstack-infra14:52
*** jcoufal has quit IRC14:54
*** jamesmcarthur has joined #openstack-infra14:57
*** jcoufal has joined #openstack-infra14:58
*** diablo_rojo has joined #openstack-infra14:58
*** ccamacho has quit IRC15:00
*** armstrong has joined #openstack-infra15:00
*** sshnaidm has quit IRC15:03
*** slaweq has quit IRC15:04
*** eharney has joined #openstack-infra15:05
*** ramishra has quit IRC15:06
openstackgerritFrank Kloeker proposed openstack-infra/project-config master: Add translation job for storyboard  https://review.openstack.org/62350815:09
*** ykarel is now known as ykarel|away15:11
pabelangerfungi: ze12 needs to be rebooted as it is running a non HWE kernel. The only executor that is different15:12
AJaegereumel8: please ask storyboard cores to review that change and +115:12
*** dpawlik has quit IRC15:12
fungipabelanger: yep, i assume that plays into the oom situation on that server15:13
eumel8AJaeger: thx15:14
mnasermriedem: i see you do this often, but how do you take a bug and list it across affecting different releases?15:15
mnaserinside launchpad15:15
fungimnaser: you have to be in the bug supervisor group for those projects, i think15:16
mriedemright15:16
mriedem"Target to series"15:16
fungiand if you are, there's a little icon where you can get into the project details for a given bugtask and add specific series15:16
mriedemand the series has to be managed properly in launchpad15:17
mnaserok i see that, alright, looks like we need to add the new releases because it looks like the last series we have there is newton15:17
mriedemyup https://launchpad.net/nova/+series15:17
mriedemlots of projects in lp don't track the series15:17
fungiyeah, it's rather a bit of setup so unless it's something you expect to make a lot of use of i'm not sure i'd bother15:17
mnaserack15:17
*** ykarel|away has quit IRC15:18
*** ccamacho has joined #openstack-infra15:18
dmsimardFor the record, we have identified what was causing the OSA CentOS timeouts and they're testing a fix right now. A task ended up restarting dbus and this apparently causes issues with systemd-logind which leads to 25s timeouts for every ssh, sudo (and ansible!) command15:20
dmsimardIt seems like a well documented bug and the consensus appears to be that dbus isn't meant to be restarted, it should only be reloaded if need be15:20
mordreddmsimard: awesome news!15:23
mordreddmsimard: yay for debugging and finding errors15:23
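(A minimal way to spot the symptom and the gist of the fix on a CentOS 7 node, sketched here as shell; the real change belongs in the OSA lxc_hosts role tasks:)
  # symptom: every ssh/sudo/ansible invocation stalls ~25s on logind/dbus
  journalctl --since "1 hour ago" | grep -i 'org.freedesktop.login1'
  # restarting dbus orphans logind's bus connection; reload the config instead
  sudo systemctl reload dbus    # rather than: systemctl restart dbus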
dmsimardmnaser: do you still need those two nodes ?15:23
mnaserdmsimard: oh can i have access to the baremetal one?15:23
dmsimardmnaser: yeah15:23
dmsimardsec15:23
mnaserwe shouldn't run lxc_hosts on the metal jobs15:24
dmsimardactually, 25secs :P15:24
dmsimardmnaser: root@104.239.173.19415:24
mnaseraw man15:25
mnaseri hope we're not running lxc_hosts on non-container jobs15:25
*** yamamoto has joined #openstack-infra15:26
*** yamamoto has quit IRC15:29
*** sshnaidm has joined #openstack-infra15:29
*** yamamoto has joined #openstack-infra15:29
openstackgerritFrank Kloeker proposed openstack-infra/project-config master: Re-activate translation job for Trove  https://review.openstack.org/62349215:30
fungiif any infra-puppet-core has a moment to review https://review.openstack.org/623290 the storyboard team would appreciate it15:31
fungieumel8: i've mentioned https://review.openstack.org/623508 in #storyboard and given a couple of our primary maintainers a heads up about it15:33
*** ykarel|away has joined #openstack-infra15:33
*** e0ne has quit IRC15:34
mnaserok we found the root cause, we can lose those vms dmsimard15:34
mnaserthank you15:34
dmsimard\o/15:34
*** eharney has quit IRC15:37
*** dpawlik has joined #openstack-infra15:41
*** sshnaidm is now known as sshnaidm|off15:42
*** dpawlik has quit IRC15:46
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: web: add OpenAPI documentation  https://review.openstack.org/53554115:46
*** eharney has joined #openstack-infra15:52
*** e0ne has joined #openstack-infra15:55
*** janki has joined #openstack-infra16:00
*** pgaxatte has quit IRC16:06
clarkbfrickler: fungi: I also identified ~6 e-r bugs that correlated strongly to disabling ovh bhs116:13
clarkbwe should be able to use that to see if things are improving too16:13
*** tosky has quit IRC16:14
*** e0ne has quit IRC16:23
openstackgerritMerged openstack-infra/zuul master: Add default value for relative_priority  https://review.openstack.org/62217516:25
*** gyee has joined #openstack-infra16:25
*** adriancz has quit IRC16:26
fungiclarkb: did hits for those reduce by half when we halved the max-servers too?16:28
clarkbfungi: its hard to tell because they don't seem to run at a constant rate either? and looking at multiple graphs we'd need to merge them to answer that I think16:29
clarkbfwiw they seem to have continued since yesterday's halving16:29
fungibut disappeared when bhs1 was offline?16:30
clarkbyes16:32
clarkbfungi: http://status.openstack.org/elastic-recheck/index.html#1805176 is a really good example of that16:32
openstackgerritMerged openstack-infra/zuul master: web: refactor job page to use a reducer  https://review.openstack.org/62315616:32
clarkbthere is this giant hole there, and at first I was really concerned we had broken log indexing but I'm fairly positive its the bhs1 disabling instead16:32
openstackgerritMerged openstack-infra/zuul master: web: refactor tenants page to use a reducer  https://review.openstack.org/62315716:35
clarkbfungi: but you'll also notice that it hasn't stopped since yesterday's halving16:37
fungithat's "neat"16:39
*** bhavikdbavishi has joined #openstack-infra16:39
clarkbya I started going down the hole of "where did we break indexing"16:40
clarkband after 15 minutes light bulb went off when I couldn't find anything obviously broken16:40
clarkbunfortunately halving our use to curb being our own noisy neighbor was my last good idea :) I don't really know what we'd look at next particularly from our end.16:41
clarkbmaybe go back to the idea its unhappy hypervisors and maybe there are more of them than we expected?16:41
clarkbwe do know that there are jobs that run perfectly fine on bhs116:42
clarkbdstat doesn't show the cpu sys and wai signature of the jobs that fail. These happy jobs run in a reasonable amount of time successfully16:42
pabelangerhttps://review.openstack.org/623496/ is at least ready to start exposing host_id into the inventory files, for when we next restart zuul16:42
clarkbthis implies to me it's not a consistent issue like "cpus are slow" or "disks are slow", deal with it. I think it also rules out overhead due to meltdown/spectre/l1tf16:43
pabelangeras another data point for logstash16:43
clarkbpabelanger: ++16:43
clarkbadded to my review queue16:43
clarkbMaybe a next step is to disable the region, then boot a couple VMs on every hypervisor (may need amorin help with this depending on how the scheduler places things) and run benchmarks on each VM and see if that shows a pattern16:45
clarkb(I would just run devstack as a first pass benchmark)16:45
*** cgoncalves has quit IRC16:45
clarkbwe've seen the db migrations and service startup for eg nova api trigger things in devstack16:45
fungii wonder if we could take bhs1 out of the line of fire for real jobs, but then load it with identical representative workloads we could use to benchmark instances and map them back to their respective hosts to identify any correlation16:45
clarkbya16:45
clarkbalso maybe the nova team has ideas around how to make nova VMs happier16:47
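(The hashed host id needed for that mapping is visible even without admin access, so a probe pass could look roughly like this; image and flavor names are placeholders:)
  # boot a probe VM and record which hypervisor (hashed) it landed on
  openstack server create --image ubuntu-xenial --flavor v1-standard-8 --wait probe-01
  openstack server show probe-01 -f value -c hostId
  # then run the same synthetic workload (e.g. a devstack run) on each probe
  # and compare timings grouped by hostId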
*** bobh has quit IRC16:50
*** bobh has joined #openstack-infra16:51
*** cgoncalves has joined #openstack-infra16:53
*** kjackal has quit IRC16:53
*** boden has quit IRC16:54
*** kjackal has joined #openstack-infra16:55
clarkbconsidering it is friday and we are entering a quieter period for many, maybe the next action is, as frickler suggests, to disable bhs1. Then we can regroup while things are slow and, rather than treating this like a fire, work through it carefully?16:56
clarkbunfortunately my timezone doesn't overlap well with CET. Thank you to all who do have some overlap for helping with that. Curious what you think about ^ if we were to try and do more measured debugging and take our time (which likely requires being awake when amorin et al are)16:57
fungii concur, we should likely disable it again. frickler's analysis was pretty eye-opening16:59
clarkbinfra-root https://review.openstack.org/#/c/623457/2 is frickler's change to do ^. I've +2'd it and if anyone else would like to chime in now is a good time :)16:59
mordredclarkb: lgtm - I think the plan above to disable it and then run synthetic tests sounds good17:00
fungii do think the frequent timeouts there are likely costing us more capacity than the nodes we lose by dropping it for now17:00
clarkbya the cascading effect of gate resets is painful17:02
*** ginopc has quit IRC17:02
fungii think you can approve it. we have a lot of +2s on there now17:05
clarkbdone17:05
clarkbthen to be extra confident in those e-r holes being related to bhs1 we can watch for a new hole :)17:08
*** jpich has quit IRC17:10
*** mriedem is now known as mriedem_lunch17:19
clarkbpabelanger: +2'd https://review.openstack.org/#/c/623496/1 but didn't approve as I'm always wary around the addition of reliance on new zk data. I'm not quite sure if the release note needs to add any additional info about updates or if that will just work if your nodepool is older17:19
clarkbShrews: ^ that might be a question for you?17:19
*** gfidente has quit IRC17:20
pabelangerclarkb: in older nodepool, I'd expect nodepool.host_id to be None17:20
Shrewslooking17:21
*** dims has quit IRC17:21
Shrewslooks like that should just work17:22
openstackgerritMerged openstack-infra/project-config master: Disable ovh bhs1  https://review.openstack.org/62345717:22
*** dtantsur is now known as dtantsur|afk17:22
Shrewsolder nodepool won't have host_id, so it should just remain None, as pabelanger says17:23
*** rlandy is now known as rlandy|brb17:23
clarkbpabelanger: ya I guess since you are just setting yaml/ansible var values and not acting on the attribute that should be fine17:23
*** wolverineav has joined #openstack-infra17:23
clarkbif the jobs don't handle that properly the job will have issues, but zuul itself will be fine17:23
Shrewsthe gotchas for zk schema changes tend to be adding new states to pre-existing fields17:23
Shrewspabelanger: just a reminder, but launchers will need a restart if you intend to use that new field17:25
Shrewslast restart didn't get that update17:25
clarkb'The recent tripleo gate reset was not bhs1 related, at least jobs didn't run there. Instead delorean is unhappy with DEBUG: cp: cannot stat 'README.md': No such file or directory'17:25
clarkbwe that first ' should be just before DEBUG17:25
pabelangerShrews: ++17:26
clarkbTrying to call those out to dispel the idea that bhs1 is our only issue17:26
EvilienMwe're trying to figure how tripleo-ci-centos-7-undercloud-containers is not in our gate anymore17:26
EvilienMit's in our layout17:27
corvusEvilienM: give me a change id that should have run it and i can tell you17:27
*** wolverineav has quit IRC17:27
corvus(you can also use the debug:true feature, but this will save 3 hours :)17:27
EvilienMcorvus: I6754da1142e2ec865ef8c60a7e09df00300f791e17:27
EvilienMyeah17:27
Shrews"debug: corvus" > "debug: true"17:28
corvuslol17:28
EvilienMyes that^17:28
mordredall debug messages should just be "corvus:17:28
corvus(i wonder if we can attach the debug info to buildsets in the sql db so folks could look this up post-facto through the web)17:29
EvilienMcorvus: it started 3 days ago17:29
EvilienM(lgostash says)17:29
fungithat sounds like a great idea. it doesn't really mean that much additional data in the db, and it's not like that db gets huge anyway17:29
fungiplus, people often come asking for an explanation of why something didn't run17:30
*** janki has quit IRC17:33
corvusEvilienM: i haven't found that zuul even considered running that job on that change... where is it attached to the tripleo-ci project gate pipeline?17:33
corvusEvilienM: http://git.openstack.org/cgit/openstack-infra/tripleo-ci/tree/zuul.d/undercloud-jobs.yaml#n6 has it attached to check but not gate17:36
corvus(note that because of the clean-check gate requirement, a voting job in check but not in gate can wedge a co-gating project system)17:37
*** rlandy|brb is now known as rlandy17:41
*** bhavikdbavishi has quit IRC17:48
*** sean-k-mooney has joined #openstack-infra17:49
*** ahosam has quit IRC17:50
*** jpena is now known as jpena|off17:50
*** boden has joined #openstack-infra17:50
EvilienMcorvus: weird, let me see17:51
EvilienMI think it's a leftover17:51
EvilienMwhen we had to deal with fires17:51
EvilienMI2753393cd7cdd64720d39360bebe9ddea2f20efc17:51
EvilienMmwhahaha, corvus : https://review.openstack.org/#/c/623555/17:52
EvilienMdamn, sorry for noise17:52
EvilienMmwhahaha: are we missing other jobs in your finding?17:52
mwhahahano just that one17:52
mwhahahahe removed the wrong one17:52
*** shardy has quit IRC17:57
*** Swami has joined #openstack-infra18:13
*** wolverineav has joined #openstack-infra18:13
*** jamesmcarthur has quit IRC18:14
*** ykarel|away has quit IRC18:17
clarkbintegrated gate just merged ~9 changes. tripleo gate also merged a stack not too long ago. I think things are moving18:18
openstackgerritMerged openstack-infra/zuul master: Add nodepool.host_id variable to inventory file  https://review.openstack.org/62349618:21
*** udesale has joined #openstack-infra18:26
*** udesale has quit IRC18:27
sean-k-mooneyhi o/18:28
sean-k-mooneyi ask this every 6 months or so but looking at https://docs.openstack.org/infra/manual/testing.html is the policy on nested virt still the same.18:28
sean-k-mooneye.g. we can't guarantee it's available and it's usually disabled if detected06:29
*** jamesmcarthur has joined #openstack-infra18:31
clarkbsean-k-mooney: yes, in fact just yesterday we found it crashing and rebooting VMs18:34
clarkbsean-k-mooney: seems to be just as unreliable as ever unfortunately18:34
sean-k-mooneyclarkb: its now on by default in the linux kernel going forward18:34
clarkbsean-k-mooney: ya but none of our clouds or guests are running 4.19 yet (except our tumbleweed images and maybe f29)18:35
sean-k-mooneyclarkb: my experience is it's pretty reliable if you set the vms to host-passthrough but it's flaky if you set a custom cpu model or host-model18:35
*** jamesmcarthur has quit IRC18:35
clarkband quite literally yesterday we found unexpected reboots in tests from centos7 crashing with nested guests and virt enabled. I wish it were reliable but it isn't18:36
clarkblogan-: ^ not sure if you set host passthrough or not but that may be something to check if you don't need live migration in that region or have homogeneous cpus18:36
sean-k-mooneyok the reason im asking is the intel nfv ci has broken again and im trying to figure out if i can set up a ci to replace it upstream or else where18:36
logan-yes, we use host passthrough18:37
*** bobh has quit IRC18:38
sean-k-mooneyhuh strange i guess i was just looking when we ran the intel nfv ci with it in our dev lab we never had any issues18:38
fungisean-k-mooney: in particular, over the past couple years we've seen that it's very sensitive to the combination of guest and host kernel used18:38
fungiif you control both then you can get it to work18:38
fungi_probably_18:38
sean-k-mooneyfungi: we were using ubuntu 14.04/16.04 hosts with centos7 and ubuntu 14.04/16.04 guests18:39
sean-k-mooneydepending on the year18:39
fungiso for "private cloud" scenarios it's likely viable. in our situation relying mostly on public cloud providers it's more like russian roulette18:39
sean-k-mooneyya i get that18:39
*** ccamacho has quit IRC18:40
sean-k-mooneyok well that contiues to rule out ovs-dpdk testing in the gate so18:40
sean-k-mooneyim going to set up a tiny cluster at home to do some third party ci for now.18:41
fungiwe get to periods where there is some working combination of guest and host kernels in our images and some of our providers, and then our images get a minor kernel update and suddenly it all goes sideways until the provider gets a newer kernel onto their hosts18:41
logan-the base xenial 4.4 seemed to break a lot more often, so we run the 4.15 xenial hwe on the nodepool hvs starting a few months ago, and even with that we still see things like this week where cent7 and xenial guests running on these hvs were hard rebooting when they attempted to launch a nested virt guest18:41
sean-k-mooneylogan-: did you have host-passthrough on both the hosting cloud and the vms launched by tempest18:42
sean-k-mooneyi should have mention that you need it for both18:43
logan-i don't control all of the guest jobs, but for my test jobs that were rebooting on xenial, yes passthrough was enabled on the guest also18:43
sean-k-mooneyoh well as i said i guess we were lucky with our configs18:43
sean-k-mooneylogan-: do you work for one of the cloud providers?18:44
logan-i run the limestone cloud18:44
clarkbmgagne_: had found that the cpu itself seemed to make a difference too iirc. But unsure if that was a bug in cpus or linux kernel issues or something else18:44
sean-k-mooneyah ok not familiar with that one.18:44
clarkbif 4.19 is indeed stable enough to have it turned on by default and evidence shows that to be the case I'll happily use it when we get there18:45
clarkbbut that is likely still a ways off18:45
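(For anyone wanting to check whether a given test node even exposes nested virt, a quick probe; Intel shown, use kvm_amd on AMD hosts:)
  uname -r                                      # which kernel is in play
  cat /sys/module/kvm_intel/parameters/nested   # Y/1 means nested virt is enabled at this level
  egrep -c '(vmx|svm)' /proc/cpuinfo            # >0 means the vCPU exposes virt extensions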
clarkb4.19 has fixes for the fs corruption now too18:46
clarkbwhich I should actually reboot for18:46
*** kjackal has quit IRC18:47
*** e0ne has joined #openstack-infra18:47
sean-k-mooneyclarkb: ya if it changes in the future ill maybe see about setting up some upstream nfv testing but for now ill look at third party solutions18:48
sean-k-mooneythat or review the effort to allow ovs-dpdk testing to work without nested virt but nova dont want to merge code that would only be used for testing18:49
*** e0ne has quit IRC18:49
sean-k-mooney*revive18:49
*** bobh has joined #openstack-infra18:49
*** mriedem_lunch is now known as mriedem18:49
fungii don't suppose that behavior could be made pluggable18:49
clarkbanother avenue which we've considered in the past but never got very far with is baremetal test resources18:50
mnaserfyi i think our nested virt is working well18:50
*** dims has joined #openstack-infra18:50
mnaseri know its pretty stable for kata afaik18:50
clarkbmnaser: have they recently tested with centos 7.6?18:50
mnaseri dont know if kata uses 7.618:50
clarkbmnaser: that seemed to be the recent trigger for us I think18:50
clarkbmnaser: it was the new kernel in the host VM18:51
mnaserwe also run the latest 7.6 in sjc1 in the host18:51
sean-k-mooneyfungi: well it basically comes down to the fact that nova enables cpu pinning when you request hugepages and qemu does not support cpu pinning without kvm18:51
mnaserand that resolved a lot of nested virt issues18:51
clarkbmnaser: ah in that case it may work out18:51
mnaserworth a shot.  kata has a lot of resources in terms of nested virt folks so maybe working with them might be beneficial18:51
sean-k-mooneyfungi: hugepages and numa work with qemu but since i cant disable the pinning i cant do the testing upstream.18:52
clarkbmnaser: logan- found the new ubuntu hwe kernel seems to have stabilized it too. The bigger issue on my side is that it stopped working in tripleo. The tests were rebooting halfway through and no one noticed until days later when I dug into failures18:52
sean-k-mooneymaybe that will change when rhel8/centos8 come out next year18:53
sean-k-mooneythat said we will still have centos7 jobs for a few releases18:53
*** bobh has quit IRC18:54
*** bobh has joined #openstack-infra18:57
fungiwe will, but they'll probably be run a lot less frequently as most of the development activity goes on in master anyway and we'd be using 7 on (fewer and older over time) stable branches18:58
*** bobh has quit IRC19:01
*** eernst has quit IRC19:06
*** electrofelix has quit IRC19:11
*** bobh has joined #openstack-infra19:16
*** wolverineav has quit IRC19:19
*** bobh has quit IRC19:20
openstackgerritMarcH proposed openstack-infra/git-review master: CONTRIBUTING.rst, HACKING.rst: fix broken link, minor flow updates  https://review.openstack.org/62336219:28
*** armstrong has quit IRC19:30
openstackgerritMarcH proposed openstack-infra/git-review master: doc: new testing-behind-proxy.rst; tox.ini: passenv = http[s]_proxy  https://review.openstack.org/62336119:31
*** bobh has joined #openstack-infra19:34
clarkbI'm caught up on email and the gate is looking relatively happy. I think I'm going to take this as an opportunity to reboot for kernel fixes (so my fs doesn't corrupt) and apply some patches to my local router. I expect I'll not be gone long but will let you know via the phone network if I managed to break something :)19:36
*** bobh has quit IRC19:39
tobiashclarkb: good luck :)19:39
clarkbthat was actually far less painful than I expected19:46
*** bobh has joined #openstack-infra19:53
*** bobh has quit IRC19:57
*** jcoufal has quit IRC20:10
*** jamesmcarthur has joined #openstack-infra20:10
*** jamesmcarthur has quit IRC20:11
*** bobh has joined #openstack-infra20:11
*** eernst has joined #openstack-infra20:13
*** kjackal has joined #openstack-infra20:15
*** bobh has quit IRC20:16
fungihuh, a bunch of failing changes near the front of the integrated gate queue now20:19
*** sean-k-mooney has quit IRC20:20
fungiall seem to be failures and timeouts for unit tests? weird20:20
fungithe neutron change with the three unit test timeouts all ran in limestone-regionone20:21
fungicinder change failed a volume creation/deletion unit test, ran in ovh-gra120:24
*** mriedem has quit IRC20:25
fungithe nova change had two unit test jobs fail on assorted database migration/sync tests, and both those jobs ran in limestone-regionone20:25
*** kjackal has quit IRC20:25
fungilogan-: any chance the kernel updates have caused us to chew up more resources there?20:25
*** sean-k-mooney has joined #openstack-infra20:28
*** mriedem has joined #openstack-infra20:28
*** bobh has joined #openstack-infra20:29
*** kgiusti has left #openstack-infra20:30
*** bobh has quit IRC20:34
*** eharney has quit IRC20:39
*** ahosam has joined #openstack-infra20:41
fungiaccording to logstash, job timeouts for limestone-regionone really started picking up around 18:00z20:46
*** ralonsoh has quit IRC20:47
*** bobh has joined #openstack-infra20:47
fungithat's something like 3.5 hours after logan- mentioned all the hypervisors had been upgraded20:48
fungiso maybe not connected?20:48
*** bobh has quit IRC20:52
mriedemso, on that new zuul queueing behavior, is there any way that could negatively affect some changes from landing?20:54
mriedemclarkb's email mentioned it could mean things in nova taking longer,20:54
mriedembut i'm just wondering if we're re-enqueuing some approved nova changes that take about 16 hours just to fail on some slow node, then take extra long now to go back through20:54
mriedemb/c it's taking us days to land code20:55
mriedemi.e. it sounds like nova changes are deprioritized because there could be a lot of nova changes queued up at any given time right?20:56
pabelangereach patch now enters the check pipeline at the same priority, across all projects. So the first patch of each project gets nodes first now, even if nova submitted 10 different patches, let's say20:58
mnasermriedem: is this happening in big stacks?20:58
mriedemdefine big20:58
mriedemhttps://review.openstack.org/#/c/602804/1520:58
pabelangerthis is amplified if a large stack of patches is submitted together, as patches behind that stack won't get nodes until the patch series ahead of them has nodes20:58
mnaseryeah what pabelanger said.. i think this is something we should somehow address20:59
dansmithdefine large?20:59
dansmithand is this just stacks of patches?20:59
mnaserwell, large is relative to how many patches are in the nova queue right now20:59
mnaseri.e. if i push a 30-change stack, then i've just effectively slowed down all of nova's development till all 30 are tested21:00
mnaserwhich isn't ideal tbh21:00
dansmithmeaning it slows down all of nova, while trying to avoid slowing down, say, neutron?21:00
mnaserpretty much dansmith21:00
mnaserbecause then the 'queue' for nova is long and neutron is short so.. neutron gets a node while nova waits21:01
pabelangermnaser: well, not really. it is the same behavior for nova, but other projects get priority for nodes now21:01
dansmithdoesn't that kinda not work if one project has three people working on it and another has 30?21:01
dansmithpabelanger: I'm not sure I see the difference21:01
mnaserdansmith: yeah. unfortunately the busier projects somehow get *fewer* results and the more idle ones get their results quicker21:01
pabelangerfor node requests, now large and small project both have equal weight to the pool of resources21:02
mnaserbecause $some_small_project always has 2-3 changes in queue, vs nova that might have 4021:02
mnaserso if i'm working on a project alone, i can almost always get a node right away.. but if i'm in nova, it might not be for a while21:02
dansmithmnaser: okay, yeah, that seems frustrating.. because we've been waiting longer than a day to get a single read on a fix.. like, to even get logs to evaluate something21:02
pabelangerthe 4th paragraph in http://lists.openstack.org/pipermail/openstack-discuss/2018-December/000482.html should give a little more detail21:02
mriedemi just replied,21:03
mnaserdansmith: i agree. i'm kinda on both sides of this, now our openstack-ansible roles get super quick feedback.. but i see working with other big projects that now it takes forever to get check responses21:03
mriedemwondering if it would be possible to weigh nova changes in the queue differently based on their previous number of times through21:03
funginode assignments are round-robined between repositories, so if three nova changes are queued at the same time as two cinder changes and one neutron change then nodes will be doled out to the first nova, neutron and cinder changes, then the next nova and cinder changes, and then the third nova change, in that order21:03
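A minimal sketch of that ordering (an illustration only, not Zuul's actual code; the NodeRequest record and its fields are invented for the example):

    from collections import defaultdict, namedtuple

    # Invented stand-in for a node request; real requests carry far more state.
    NodeRequest = namedtuple("NodeRequest", ["project", "change"])

    def order_requests(requests):
        """Order requests so every project's first queued change is served
        before any project's second change, keeping age order within a rank."""
        rank_of = defaultdict(int)
        ranked = []
        for req in requests:                      # assumed oldest-first
            ranked.append((rank_of[req.project], req))
            rank_of[req.project] += 1             # later changes from the same project rank lower
        ranked.sort(key=lambda pair: pair[0])     # stable sort keeps age order per rank
        return [req for _, req in ranked]

    # Three nova changes, one neutron, two cinder, queued in that order:
    queue = [NodeRequest("nova", 1), NodeRequest("nova", 2), NodeRequest("nova", 3),
             NodeRequest("neutron", 4), NodeRequest("cinder", 5), NodeRequest("cinder", 6)]
    print(order_requests(queue))
    # -> nova#1, neutron#4, cinder#5, nova#2, cinder#6, nova#3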
pabelangerdansmith: right now I would say the gate resets / reduced nodes from providers is also impacting the time here too21:03
dansmithmnaser: honestly the last few days I'll make one change in the morning and just do other stuff for the whole day, then check on the patch the next morning21:03
dansmithmnaser: which pretty much kills one's will to live and motivation to iterate on something21:04
mnaserdansmith: i agree with you 100%.21:04
dansmithpabelanger: I'm sure yeah21:04
mnaseri do think at times a smaller project can iterate a lot faster than say.. nova21:04
fungia lot of this queuing model was driven by the desire to make tripleo changes wait instead of starving the ci system of resources to the point where most other projects were waiting forever21:04
mnaserbecause they'll always get a highest priority if they only have one change in a queue21:04
mriedemso, just curious, are the new non-openstack foundation projects contributing resources to node pool for CI resources?21:04
mriedemis that just a bucket of money the foundation doles out for CI?21:05
fungithe foundation doesn't purchase ci resources21:05
mnasermriedem: none do, but afaik, their utilization is very small21:05
dansmithfungi: yeah, and I'm sure tripleo causes a lot of trouble for the pool21:05
mriedemfungi: don't get me wrong, i'd like to see tripleo go full 3rd party CI at this point :)21:05
mriedemhas the idea of quotas ever come up?21:05
fungiand yeah, the other osf projects use miniscule amounts of ci resources, as mnaser notes21:05
*** bobh has joined #openstack-infra21:06
mriedeme.g. each project gets some kind of quota so one project can't just add 30 jobs to run per change21:06
mriedemor never clean up their old redundant jobs21:06
fungimriedem: yes, i think the idea is that we'd potentially implement quotas in zuul/nodepool per tenant, but we're still missing some of the mechanisms we'd need to do that21:07
mriedemtenant == code repo/project requesting CI resources?21:08
*** jtomasek has quit IRC21:08
mriedemmaybe project under governance21:08
*** rlandy has quit IRC21:08
mriedemi just know it's very easy today, especially with zuulv3 where jobs are defined per project, for a project to be like, "oh we should test knob x, let's copy job A, call it job B and set knob x to True even though 95% of the tests run will be the same"21:09
sean-k-mooneyfungi: out of interest, what is missing for quotas? i played with having multiple zuul tenants using the same cloud resources provided by a shared node pool.21:09
sean-k-mooneyi did not have it deployed for long but had thought everything was supported to do that21:10
corvuswell from my pov what's missing is the will to set quotas.  does anyone have a suggestion as to how we should set them?21:10
*** bobh has quit IRC21:10
clarkbseems I've missed the fun conversation running errands and finding lunch21:11
clarkbmy last email included a paste that breaks down usage21:11
*** jamesmcarthur has joined #openstack-infra21:11
clarkbtripleo is ~42% with nova and neutron around 10-15% iirc. All openstack official projects together are 98.2% iirc21:11
dansmithI definitely get the intent here, and I think it's good,21:12
clarkbI really want to get away from the idea that it's new projects using all the resources; the data does not back that up21:12
dansmithclarkb: we get it. it's not new projects, it's tripleo :)21:12
sean-k-mooneycorvus: well i don't know what the actual capacity is, but you could start with one tenant per official team in the governance repo, oversubscribe the actual cloud resources by giving each tenant a quota of 100 instances, and then adjust it down21:13
dansmith42% when the fat pig that is nova is a pretty huge scaling factor21:13
fungisean-k-mooney: sharing job configuration across tenants is inconvenient, i think we were looking at tenant per osf top-level project21:14
*** jamesmcarthur has quit IRC21:14
*** jamesmcarthur has joined #openstack-infra21:14
mriedembut nova waiting a day for results when zaqar is prioritized higher is kind of....weird21:14
clarkbright, the biggest impact we can have is improving tripleo's fail rate and overall high demand21:15
corvusdansmith: it's a little hard for me to separate whether you're seeing the effect of the new queueing behavior or just the general backlog.  because while you may wait a day to get results on the change you push up this morning, if we turned off the new behavior the same would still be true.21:15
mriedemi would think quota would be doled out by some kind of deployment / activity metric21:15
corvusyesterday, mid-day, nova was waiting 5 hours for results21:15
dansmither, I meant 42% when something as big as nova is only 10% is a big scaling factor21:15
mriedemtaking a different angle here,21:15
dansmithcorvus: 5 hours for one patch or five hours for the last patch in a small series?21:16
mriedemare we aware of any voting jobs that have a fail rate of 50% or more?21:16
dansmithcorvus: and yeah, I know things are in shambles right now21:16
sean-k-mooneywhy does tripleo use so many resources anyway? is its minimum deployment size just quite big, or does it have a lot of jobs? or both i guess?21:16
mriedemshitload of jobs21:16
*** tpsilva has quit IRC21:16
mriedemfrom what i could see21:16
corvussean-k-mooney: and they run very long21:16
mriedembaremetal21:16
mriedemit's like the worst kind of ci requirements21:16
sean-k-mooneyum, could we have a separate tenant for tripleo?21:17
corvuswe don't have any bare-metal resources, so they all run virtualized21:17
clarkbthey are big long running jobs that fail a lot21:17
mriedemso back to my other question, do we know of voting jobs that are failing at too high a rate and should make them non-voting to avoid resets?21:18
clarkbthe "fail a lot" part is important given how the gate works21:18
clarkbmriedem: non-voting jobs don't cause resets and shouldn't run in the gate21:18
mriedemwe used to make the ceph job non-voting if it failed consistently at something like 25%21:18
dansmithmaybe we have one queue for tripleo and one for everything else?21:18
mriedemclarkb: that's my point,21:18
mriedemdo we have *voting* jobs with high fail rates that should be made non-voting21:18
mriedemuntil they are sorted out21:18
dansmithif they're currently 50% of the load, that would seem reasonable,21:18
clarkbmriedem: I am not sure, you'd have to check graphite21:19
clarkbbut also that doesn't fix the issue of broken software21:19
sean-k-mooneydansmith: ya that was basically the logic i had in suggesting making them a separate tenant from the rest21:19
corvusdansmith: they do have their own queue -- or do you mean quota? or... i may not understand...21:19
mriedemclarkb: i realize that doesn't fix broken software, but holding the rest of openstack hostage while one project figures out why it's jobs are always failing is wrong21:20
dansmithcorvus: like, the per-project queue thing you have now, but put tripleo  in one queue and everyone else in the same queue,21:20
dansmithso that nova and neutron fight only against tripleo :)21:20
corvusdansmith: gotcha -- so only test one tripleo change at a time across the whole system?21:20
clarkbmriedem: except I'm fairly certain some of the issue here is breakage rolling downhill21:20
mriedemas i said, if the ceph job had a high failure rate and someone wasn't fixing it, we'd make it non-voting21:20
clarkbtripleo consumes nova, and when that doesn't work their tests break21:20
clarkbthen clouds consume tripleo to host the tests, and those break21:21
clarkbit's not as simple as "x, y and z are bad, stop them"21:21
dansmithcorvus: is that what the current queuing is doing? or you mean just because their jobs are so big, that they'd realistically only get one going at a time?21:21
corvusdansmith: that's actually not that hard to do -- we can adjust the new priority system to operate based on the gate pipeline's shared changes queues, even in check.21:21
corvusdansmith: it's not what the queueing is currently doing, because each tripleo project gets its own priority queue in check (in gate, they do share one queue).21:22
dansmithcorvus: oh cripes!21:23
dansmithcorvus: yeah, seems like they get priority inflation because they have a billion projects, amirite?21:23
fungiagain though, that's in check. in gate the tripleo queue and the integrated queue are basically on equal footing21:24
corvusdansmith: right21:24
dansmithpresumably gate is less of an issue right?21:24
clarkbwell gate is what eats all the nodes21:24
corvuswell, gate has priority over check21:24
*** bobh has joined #openstack-infra21:24
clarkbespecially when things are flaky, because a reset stops all jobs, then restarts them, taking nodes from check21:24
dansmithreally? gate is always smaller it seems like21:24
clarkbwhich is why I keep telling people please fix the tests21:24
dansmithyeah I know gate resets kill everything21:24
corvusso a gate reset (integrated or tripleo) starves check21:24
clarkbyou will merge features faster if we all just spend a little time fixing bugs21:25
sean-k-mooneyclarkb: well to get to gate they would have to go through check so check should be the limiting factor no?21:25
dansmithsean-k-mooney: it's flaky tests21:25
dansmithnot failboat tests21:25
clarkbright, it's nested virt crashing the VM and failing the test that does it21:25
clarkband other not 100% failure issues21:25
fungiyeah, tests with nondeterministic behaviors (or tests exercising nondeterministic features of the software)21:26
pabelangermaybe in check, group the priority based on the project deliverables? I think that is what dansmith might be getting at21:26
dansmithso, gate resets suck, we know that, and a very small section of people work on fixing those which sucks21:26
dansmithbut yeah,21:26
sean-k-mooneyso like 4 years ago we used to have both recheck and reverify to allow running check and gate independently21:26
dansmithffs put tripleo in one bucket at least,21:26
sean-k-mooneydo we have reverify anymore21:26
dansmithand maybe put everyone else in another bucket together21:26
clarkbdansmith: if the TC tells us to we will. But so far OpenStack the project seems to tolerate their behavior21:26
fungipabelanger: challenge there is probably in integrating governance data. zuul knows about a flat set of repositories, and knows about specific queues21:27
pabelangerfungi: agree21:27
clarkbfwiw I don't think tolerating is necessarily a bad thing21:27
pabelangersean-k-mooney: no, just recheck21:27
clarkbtripleo testing has to deal with all the bugs in the rest of openstack21:27
dansmithclarkb: really? Is the TC in charge of how to allocate infra resources? I didn't realize21:27
corvus[since i'm entering the convesation in the middle, let me add a late preface -- the new queing behavior is an attempt to make things suck equally for all projects.  it's just an idea and i'm very happy to talk about and try other ideas, or turn it off if it's terrible.  it's behind a feature flag]21:27
clarkbdansmith: the TC is in charge of saying nova is more important than tripleo21:27
dansmithclarkb: but osa is the same and they use a much smaller fraction right?21:27
sean-k-mooneypabelanger: ok could we bring back reverify so that if check passes and gate fails they could just rerun gate jobs and maybe reduce load that way21:28
dansmithclarkb: we know they're never going to say that.. this doesn't seem like a politics thing, but rather a making the best use of the resources we all share21:28
clarkbdansmith: yes, I pointed this out to the TC when I started sharing this data21:28
clarkbdansmith: all of the other deployment projects use about 5% or less resources each21:28
*** bobh has quit IRC21:28
dansmithclarkb: if you make it about "okay who is prettier" you'll never get an answer from them and we'll all just keep sucking21:28
dansmithclarkb: right, and they all deal with "all the bugs of openstack" in the same way right?21:29
corvussean-k-mooney: that's called the 'clean-check' requirement and it was instituted because too many people were approving changes with failing check jobs, and therefore causing gate resets or, if they merged, then merging flaky tests.21:29
clarkbdansmith: sort of. I think what makes tripleo "weird" is that they have stronger dependencies on openstack working for tripleo to actually deploy anything21:29
clarkbdansmith: ansible runs mistral which runs heat which runs ansible which runs openstack type of deal21:29
clarkbwhereas puppet and ansible and so on are just the last bit21:29
corvussean-k-mooney: sdague thought clean-check was very important and effective for eliminating that.21:29
dansmithclarkb: yeah I get that their design has made them fragile :)21:30
sean-k-mooneycorvus: i'm suggesting that reverify would only work if check was clean and gate jobs failed; it would rerun the gate jobs only, keeping the clean check results21:30
corvussean-k-mooney: oh, when the gate runs we forget the check result21:30
*** eharney has joined #openstack-infra21:31
clarkbthe good news is tripleo is reducing their footprint21:31
corvussean-k-mooney: so basically the issue is, how do you enqueue a change in gate with a verified=-2.  i guess we could permit that....21:31
fungisean-k-mooney: also, before clean-check one of the problems was reviewers approving changes which had all-green results from check jobs run 6 months ago21:31
clarkbI think we are seeing positive change there, unfortunately it is still slow movement and I end up debugging a bunch of gate failures21:31
clarkbIdeally tripleo would be doing that more actively21:31
clarkb(as would openstack with its flaky queue)21:31
corvus(verified=-2 would mean "this failed in gate, but it previously passed in check, so allow it to go back into gate")21:31
fungisean-k-mooney: which clean-check doesn't solve obviously, but at least forcing the change which caused a gate reset to re-obtain check results keeps it from continuing to get reapproved21:32
sean-k-mooneyfungi:  ya i know it would reduce the quality of the gate in some respects21:32
sean-k-mooneyit would be harder to do, but even having reverify conditionally skip check if the check jobs were run in the last day might help21:33
fungisean-k-mooney: less about reducing the quality of the gate, it would actually slow development (these changes went in specifically because our ability to gate changes ground to a halt with nondeterministic jobs and people approving broken changes)21:33
mriedemyeah i don't want to go back to that21:33
mriedemit was changed for good reasons21:33
sean-k-mooneyfair enough21:34
mriedemsdague isn't in his grave yet, but i can hear him rolling21:34
sean-k-mooneyi know there were issue with it21:34
clarkbthe reason I push on gate failures is the way the gate should work is that 20 changes go in, they grab X nodes to test those changes and do that for 2.5 hours. Then they all merge and the gate is empty leaving all the other resources for everyone else21:34
clarkbwhat happens instead is we use N nodes for 20 minutes, reset then use N nodes for 20 minutes and reset and on and on21:35
corvuss/2.5/1 :)21:35
fungiyep, while forcing a change which fails in the gate pipeline to reobtain check results is painful, not doing it is significantly more painful in the long run21:35
clarkband never free up resources for check21:35
clarkbunfortunately the ideal behavior sort of assumes we operate under the assumption that broken flaky code is bad21:35
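Back-of-the-envelope arithmetic on why that reset loop hurts so much (all numbers below are invented for illustration, not measured):

    # Illustrative only: compare a clean gate pass with ongoing reset churn.
    changes = 20            # changes sharing the integrated gate queue
    nodes_per_change = 8    # assumed average nodes a change's jobs occupy
    run_hours = 1.0         # roughly an hour of jobs per change on a clean pass

    clean_pass = changes * nodes_per_change * run_hours
    print(f"clean pass: {clean_pass:.0f} node-hours -> {changes} merges")

    # With a reset every ~20 minutes the same nodes are consumed over and over
    # while little or nothing merges, and check never sees those nodes.
    for churn_hours in (1, 3, 8):
        burned = changes * nodes_per_change * churn_hours
        print(f"{churn_hours}h of 20-minute resets: ~{burned:.0f} node-hours, ~0 merges")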
pabelangerclarkb: given how relative priority works now, if the gate window was smaller again, that would mean more nodes in check right? Meaning more feedback, however it does mean potentially longer for things to gate and merge21:36
sean-k-mooneywell we do in most projects21:36
pabelangerright now, there is a lot of nodes servicing gate21:36
fungiand assumes that we catch a vast majority of the failures in check, and that our jobs pass reliably21:36
clarkbsean-k-mooney: there are certainly pockets that do, but overall from my perspective as being the person debugging things and beating this drum very few do21:36
corvuspabelanger: relative_priority doesn't really change the gate/check balance.  and clarkb shrunk the window a few weeks ago and observed no noticeable change in overall throughput.21:37
corvus(so the window is back at 20)21:37
clarkbI do think this feedback is worthwhile though. One thing that comes to mind is maybe the priority should be based on time rather than count (though nova would still suffer under that)21:37
corvusshrinking the window should alter the check/gate balance though.  so we might see some check results faster if we shrunk it again.21:37
clarkbperhaps there should be a relief valve in the priority?21:37
fungiyeah, i think when things get really turbulent, resetting a 10-change queue over and over is about as bad as a 2021:38
clarkbperhaps we do need to allocate quotas to projects given some importance value21:38
clarkbmy concern with this is I really don't want to be the person that says tripleo gets less resources than nova21:38
fungii have a feeling there are a lot of times where zuul never gets nodes allocated to more than 10 changes in the queue before another reset tanks them all anyway21:38
sean-k-mooneyclarkb: ya to be honest, ignoring the hassle of being involved in running a third party ci, one thing i did like about it is the random edge cases it exposed that i was then able to go report or fix upstream21:38
clarkbI think we have an elected body over all the projects for that21:38
dansmithclarkb: to some degree, it's currently being said that tripleo gets as many resources as all the rest of openstack21:39
pabelangerclarkb: wasn't there also a suggestion for priority based on number of nodes?21:39
clarkbdansmith: yup, and we've asked tripleo nicely to change that and they have started doing so21:39
dansmithand by them having a crapton of projects, they're N times more important than any one other single-repo project, if I understand correctly21:39
clarkbpabelanger: ya node time I think I mentioned once21:39
clarkbdansmith: ya using gate queues to determine allocations (I think corvus mentioned that above) may be an important improvement here21:40
clarkbor the governance data? or some sort of aggregation21:40
dansmithclarkb: that means treating tripleo as one thing instead of N things? definitely seems like an improvement21:40
*** bobh has joined #openstack-infra21:41
clarkbdansmith: ya and osa and others that are organized similarly21:41
* dansmith nods21:41
sean-k-mooneyclarkb: how hard would it be to have a check queue per governance team21:41
pabelangerI still like the suggestion from dansmith that project deliverables are counted together, not as individuals, for priority. The current per-repo counting does seem to give an advantage to a project with code spread over more repos than to a single-repo one21:41
clarkbsean-k-mooney: we already do a check queue per change. I think this is less about the queue and more prioritizing how nodes are assigned21:41
sean-k-mooneycheck queue per change? did you mean project?21:42
clarkbsean-k-mooney: no per change is how its logically implemented in zuul21:42
clarkbto represent dependencies21:42
fungieach change in an independent pipeline (e.g., "check") gets its own queue21:43
fungichanges in a dependent pipeline (e.g., "gate") get changes queued together based on jobs they share in common, or via explicit queue declaration21:43
sean-k-mooneyfungi: ya i realised when i said queue before i ment to say pipeline21:43
corvusideas so far: https://etherpad.openstack.org/p/QxXuCSdAoF21:43
fungior in v3 did we drop the "jobs in common" criteria for automatic queue determination? is it all explicit now?21:44
corvusfungi: all explicit21:44
fungiokay, that then ;)21:44
fungiunfortunately my next idea is to go get dinner, and beer21:45
*** bobh has quit IRC21:45
fungior perhaps that's fortunately21:45
corvusit sounds like grouping projects in check by the same criteria we use in gate for setting relative_priority (ie, group all tripleo projects together in check) seems popular, fair, and not-difficult-to-implement.21:45
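Roughly what that grouping amounts to, as a hedged sketch only; the shared-queue mapping below is invented and stands in for however the real pipeline configuration exposes queue names:

    from collections import defaultdict, namedtuple

    Change = namedtuple("Change", ["project", "number"])

    # Invented mapping; in reality this would come from the gate pipeline config.
    SHARED_QUEUE = {"tripleo-heat-templates": "tripleo",
                    "tripleo-common": "tripleo",
                    "python-tripleoclient": "tripleo"}

    def relative_priorities(changes):
        """Count earlier changes per shared gate queue (falling back to the
        project name), so all tripleo repos share one priority counter in check."""
        counts = defaultdict(int)
        priorities = {}
        for change in changes:                                 # oldest first
            key = SHARED_QUEUE.get(change.project, change.project)
            priorities[change] = counts[key]
            counts[key] += 1
        return priorities

    # Two tripleo repos now push each other's priority down; nova is unaffected:
    print(relative_priorities([Change("tripleo-common", 1),
                               Change("tripleo-heat-templates", 2),
                               Change("nova", 3)]))
    # -> tripleo-common#1: 0, tripleo-heat-templates#2: 1, nova#3: 0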
*** jamesmcarthur has quit IRC21:45
corvusshould i work on that?21:45
clarkbcorvus: and if not difficult to implement we can at least try it out and if it doesn't work well probably not a huge deal?21:46
clarkbcorvus: sounds like you should :)21:46
fungiyes, my concern with trying to merge it with governance data is the complexity. zuul already has knowledge of queues21:46
corvusyeah, i consider this whole thing an experiment and we should change it however we want :)21:46
pabelanger+1 to experiment21:46
clarkbdansmith: ^ does that seem like a reasonable place to start? That should treat aggregate tripleo as a unit rather than individual repos21:47
pabelangerclarkb: my question would be, does that ignore the 'integrated' queue in gate, or include it?21:47
dansmithI would think that'd be an improvement worth doing at least yeah21:47
fungishould there be a multi-tiered prioritization decision within the queuing set too? or just treat all changes for that group equally even if there are 10 nova changes and 1 mistral change (assuming mistral is in integrated too)21:48
corvus(oh, i added one more idea -- pabelanger suggested that we could treat the second and later patch in a patch series with a lower priority)21:48
corvusfungi: that's a good point, this would lump nova/cinder/etc together21:49
clarkbpabelanger: ya I guess that's the other question: will it result in much change if nova + neutron + cinder + glance + swift are all together21:49
sean-k-mooneyclarkb: you know there is one other thing we could try21:49
sean-k-mooneycould we split the short zull jobs from the long ones21:49
clarkbsean-k-mooney: that's sort of what we've done with the current priority21:50
clarkbwe are running a lot more short jobs because the less active projects tend to have less involved testing21:50
sean-k-mooneye.g. run the unit,function,pep8,docs and release notes in one set and comment back and the tempest ones in another bucket21:50
clarkbah split on that axis21:50
sean-k-mooneyya21:51
clarkbya though maybe thats an indication we all need to run `tox` before pushing more often :P21:51
sean-k-mooneyso that you get develop feed back quickly on at least the non integration tests21:51
clarkb(I'm bad at it myself, but it does make a difference)21:51
pabelangerclarkb: yah, i think for the impact, we'd need to group specific queues in check (via configuration?) and keep current behavior of per-project relative priority21:51
sean-k-mooneyclarkb: well i normally do tox -e py27,pep8,docs21:52
sean-k-mooneybut i rarely run py3 or functional tests locally21:52
sean-k-mooneyi do if i touch them but nova takes a while21:52
sean-k-mooneyclarkb: you could even wait to kick off the tempest test until the other test job passed21:53
corvusit's worth keeping in mind that nothing we've discussed (including this idea) changes the fact that during the north-american day, we are trying to run about 2x the number of jobs at a time than we can support.21:53
clarkbhttp://logs.openstack.org/37/611137/1/gate/grenade-py3/e454703/job-output.txt.gz#_2018-12-07_21_48_57_091229 just reset the integrated gate21:53
clarkbsean-k-mooney: ^ its specifically test failures like that that I'm talking about needing more eyeballs fixing21:54
corvusif we run 20,000 jobs in a day now, with any of these changes, we'll still run 20,000 jobs in a day.  it's just re-ordering when we run them.  the only thing that will change that is to run fewer jobs, run them more quickly, or have them fail less often.21:54
clarkbthat change modifies cinder unit tests21:54
clarkbso shouldn't be anywhere near what grenade runs21:55
clarkband yet it fails >0% but <100% of the time21:55
*** boden has quit IRC21:55
clarkbcorvus: ya I think what dansmith and mriedem were getting at is that the turnaround time for a specific patch impacts their ability to fix a review today or wait for tomorrow21:55
clarkbcorvus: in the fifo system we had before, that turnaround time was roughly the same for everyone (we'd get so far behind that everyone is waiting all day, sort of thing)21:56
clarkbin the current system only some subset of people are waiting for tomorrow21:56
dansmithyep21:56
clarkbwhich is an improvement for some and not for others21:56
mriedemit's also impacting our ability to try and work on things to fix the gate21:56
mriedemlike the n-api slow start times21:56
dansmithwe just proposed removing the cellsv1 job from our regular run, btw.. I'm constantly thinking about what we can run less of, fwiw21:56
clarkbya nova is a fairly responsible project21:56
sean-k-mooneyhmm, the content type was application/octet-stream which i would have expected to work21:56
mriedemi'm trying to merge the nova-multiattach job into tempest so we can kill that job as well21:56
dansmithmriedem: right, this actually came up while trying to get results from gate-fixing patches21:56
*** wolverineav has joined #openstack-infra21:57
mriedemclarkb: can you say that again, but this time into my lapel?21:57
dansmithclarkb: FAIRLY RESPONSIBLE21:57
*** wolverineav has quit IRC21:57
*** wolverineav has joined #openstack-infra21:57
clarkbdansmith: mriedem heh I just mean in comparison to others yall seem to jump on your bugs without prompting21:57
clarkband dive in if prompted21:57
dansmithclarkb: yeah, arguing the "fairly" part :)21:57
* mriedem lights a cigarette21:57
fungiour previous high-profile gate-fixers were mostly nova core reviewers too, historically21:58
dansmithmriedem jumps without prompting, and I jump when prompted by mriedem21:58
dansmithit's a good system.21:58
corvusclarkb: yes, though one thing that wasn't pointed out was that a large project has more changes with results waiting at the end of the turnaround time, whereas a smaller project may only have the one change.  so if you're in a large project and can work on multiple things, it's less of an issue.  of course, if you're focused on one change in a large project, it's worse now.21:58
clarkbcorvus: ya21:58
corvusi'll go look into the combine-stuff-in-check idea now21:58
clarkbcorvus: ok, thanks21:58
mriedemdansmith: working on this shit allows me to procrastinate from working on cross-cell resize21:58
dansmithmriedem: and you're generally a noble sumbitch to boot.21:59
* dansmith has to run21:59
clarkbsean-k-mooney: ya these bugs tend to be difficult to debug (though not always) which is one reason I think we have so few people that dig into them21:59
sean-k-mooneyso in general do people think there would be merit in a precheck pipeline for running all the non-tempest tests (pep8, py27...) and only kicking off the dsvm tests if the precheck jobs passed21:59
clarkbbut that digging is quite valuable21:59
clarkbsean-k-mooney: about 5 years ago we did do that, and what we found was we had more round trips per patch as a result21:59
clarkbsean-k-mooney: that doesn't mean we shouldn't try it again22:00
clarkbbut is something to keep in mind, the current thought is providing complete results in one go makes it easier to fix the complete set of bugs in a change before it goes through again22:00
sean-k-mooneymore round trips with a shorter latency for results until the quick jobs pass might be a saving on gate time overall22:00
clarkbwe should see about measuring that along with any more lag time on throughput22:00
pabelangerRight, this was also the idea of doing fast-fail too, if one job fails, they all do.22:01
pabelangerbut means less results22:01
sean-k-mooneyclarkb: do we have visibility or an easy way to categorize what percentage of failures on a patch are from tempest jobs versus the rest?22:01
clarkbsean-k-mooney: I think that is something to keep in our back pocket as an option if the reorg of priority aggregation continues to be sadness. I do want to avoid changing too many things at once22:01
fungiokay, really running off to dinner now. *might* pop back on when i return, but... it is friday night22:01
pabelangersean-k-mooney: you could actually test that today, with using job dependencies in your zuul.yaml file22:01
clarkbfungi: enjoy your evening and weekened22:02
fungithanks!22:02
sean-k-mooneypabelanger: you could actually test it today. i could test it in a week after i read up on how this all works again :)22:02
clarkbsean-k-mooney: pabelanger I added it to https://etherpad.openstack.org/p/QxXuCSdAoF for completeness22:02
pabelangerclarkb: +122:03
sean-k-mooneyclarkb: this is something we could maybe test on a per-project basis too22:03
corvusmy changes will definitely use more nodes overall if we fast-fail :(22:04
sean-k-mooneye.g. if we had a precheck pipeline we could try it on nova or something by changing just our zuul file22:04
clarkbcorvus: ya mine too22:04
*** wolverineav has quit IRC22:04
clarkbbut maybe that is ok if the aggregate doesn't22:04
corvus(typically my changes use 2x nodes because i always get something wrong the first time.  expect x! nodes if we do fast-fail.  :)22:05
clarkbsean-k-mooney: re that test I called out. It failed due to a 502 from apache talking to cinder. Apache says AH01102: error reading status line from remote server 127.0.0.1:60999 at http://logs.openstack.org/37/611137/1/gate/grenade-py3/e454703/logs/apache/error.txt.gz22:06
clarkbcinder api log doesn't immediately show me anything that would indicate why22:07
sean-k-mooneyclarkb: ya i saw that apache is proxying to mod_wsgi i am assuming22:07
clarkbsean-k-mooney: I think only for apache, the rest of the services run uwsgi standalone and we just tcp to them?22:07
clarkbapache is just terminating ssl for us22:07
sean-k-mooneyoh ok22:07
*** wolverineav has joined #openstack-infra22:08
clarkbstack exchange says set proxy-initial-not-pooled in apache22:08
clarkbthis will degrade performance but make things more reliable, as it avoids a race between the pooled backend connection being closed and a new request arriving on the frontend connection22:09
clarkbI thought we had something like this in the apache config already22:09
clarkboh I remember it was the backends and apache not allowing connection reuse by the python clients because python requests has the same race22:10
sean-k-mooneyya just reading https://httpd.apache.org/docs/2.4/mod/mod_proxy_http.html22:10
clarkbchances are we do want this sort of thing added to devstack22:10
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/project-config master: Add gate job for Slack notifier in zuul-jobs  https://review.openstack.org/62359322:11
*** slaweq has joined #openstack-infra22:11
sean-k-mooneythat or we use something other than apache for ssl termination22:11
clarkbya we used apache because keystone already depped on it22:12
clarkbavoided adding a dep22:12
sean-k-mooneykeystone can run under uwsgi now right?22:12
sean-k-mooneyi know glance still has some issues22:12
clarkbmaybe? its a good question22:12
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul-jobs master: Add a slack-notify role  https://review.openstack.org/62359422:13
sean-k-mooneyhaproxy, nginx and caddy are all lighter-weight solutions for ssl termination than apache, but that option is probably a good place to start22:13
clarkbya devstack had broken support for some lightweight terminator that ended up being EOL'd and removed from the distros22:17
clarkband it was at that point stuff moved to apache because it was already a hard dep for keystone22:17
*** eernst has quit IRC22:17
clarkbit can certainly be updated again if it makes sense, though configuring apache is likely easier short term22:17
* clarkb updates devstack repo22:18
sean-k-mooneyya, um, i have several other experiments i want to do with devstack but i might add that to the list22:18
jrosseris it right i seem to see a mix of centos 7.5 & 7.6 nodes?22:19
clarkbjrosser: as of yesterday all but inap should have an up-to-date 7.6 image22:20
clarkbI haven't checked yet today if inap image managed to get pushed22:21
jrosserok - i'll check they're all from there22:21
*** dmellado has quit IRC22:22
*** stevebaker has quit IRC22:23
*** gouthamr has quit IRC22:23
*** bobh has joined #openstack-infra22:24
jrosserclarkb: looking at a few the 7.5 do indeed look to be inap nodes22:26
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Consider shared changes queues for relative_priority  https://review.openstack.org/62359522:28
clarkbjrosser: ya I just checked and inap is still not getting a successful upload22:28
clarkbwe'll need to debug that22:28
clarkbjrosser: does that cause problems? maybe the package updates take a long time?22:29
*** bobh has quit IRC22:29
corvusclarkb, pabelanger, sean-k-mooney, dansmith, mriedem, fungi: ^ i think https://review.openstack.org/623595 is our tweak (combined with a change to project-config to establish the queues in check)22:29
jrosserok thanks - it'll trip up osa jobs where we just fixed 7.6 host + 7.6 docker image22:29
jrossermismatching those doesn't work for us22:29
clarkboh right the venv thing mnaser mentioned22:30
jrosseryes thats it22:30
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Consider shared changes queues for relative_priority  https://review.openstack.org/62359522:30
mriedemcorvus: ok, but i'll admit the wording in that commit message is greek to me22:31
mriedemmaybe you want to poke that out in the ML thread22:31
clarkbopenstack.exceptions.ConflictException: ConflictException: 409: Client Error for url: https://image.api.mtl01.cloud.iweb.com/v2/images/7cc7c423-dea6-4efb-b36d-bc3b7fbdee5e/file, Image status transition from saving to saving is not allowed: 409 Conflict22:33
clarkbjrosser: ^ that is why our uploads are failing I think that is an openstacksdk bug22:33
clarkbmordred: ^ fyi22:33
corvusmriedem: it's greek for "lump tripleo together in check" :)22:33
clarkbjrosser: I'm not in a great spot to debug that myself right now, but let me get a full traceback pasted so that someone can look if they have time22:34
clarkbmordred: jrosser http://paste.openstack.org/show/736850/22:35
mriedemcorvus: ok, but still probably good to say in that thread "this is what happens today and this is what's proposed after getting brow beaten for an hour in irc"22:35
jrosserclarkb: thankyou22:35
clarkbI want to finish up the train of thought with devstack apache, then review corvus' change, then maybe I'll get to that sdk thing22:35
mriedemmaybe leave out that last part22:35
corvusmriedem: yeah, i'll reply, but it'll take me a minute because now i have to write about your suggestion ;)22:36
mriedemoh weights on changes within a given project?22:37
mriedemthis all sounds like stuff people want nova to be doing all the time22:37
mriedemnow you know how i feel22:37
mriedemoh what if we just used zuul in the nova-scheduler...22:37
corvuswell, i'm reading it as weighing changes which have failed a lot higher, but yeah22:37
corvusmriedem: they both have schedulers, they must be the same thing22:38
mriedemcorrect22:38
clarkbdoes that mean our work here is done?22:38
corvusi'm pretty sure the zuul-scheduler is so called because that was the only word that came to mind after spending a year listening to people talk about nova22:39
clarkbthere are times that I wish devstack was more config managementy, this is one of them :P22:41
mriedemclarkb: btw, please take it easy on me on monday after your boy russell prances all over my team this weekend22:41
clarkbmriedem: I had them going 5-11 this year. I'm both happy and mad they proved me wrong22:41
clarkbmriedem: whats crazy is the packers could be that 5-10-1 team22:41
mriedemnot crazy, great22:42
*** bobh has joined #openstack-infra22:43
*** mriedem is now known as mriedem_afk22:46
*** bobh has quit IRC22:47
clarkbsean-k-mooney: https://review.openstack.org/623597 fyi should set that env var22:52
*** slaweq has quit IRC22:52
clarkbI spent more time figuring out what the opensuse envvars file is than writing the patch :P22:52
clarkbcorvus: I think your change must've got caught by a pyyaml release? pep8 is complaining that the "safe" methods don't exist anymore22:57
corvusclarkb: i don't see a new pyyaml release22:58
clarkbI'm guessing the change to make safe the default and make unsafe explicit opt in hit?22:58
clarkbhrm22:58
clarkbhttp://logs.openstack.org/95/623595/2/check/tox-pep8/b4c0a8b/job-output.txt.gz#_2018-12-07_22_36_31_79623322:58
corvusthere was a new mypy release though22:58
clarkboh22:58
clarkbthat must be it then22:58
corvusi expect the unit test failures are separate from that, so i'll look into that before pushing up fixes for both22:59
clarkbok23:00
clarkbcorvus: the other thing I notice is that we'll set specific check queues which are different than those in gate (or could be at least?)23:00
clarkbthat seems like a good feature23:00
corvusclarkb: yep; it'd be really messy to implement it otherwise23:00
clarkbI think you only get the CSafeLoader attributes if the libyaml-dev headers are available23:01
clarkbI wonder if mypy can be convinced to allow either type23:01
clarkbanother option is to install that package via bindep23:01
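The usual fallback pattern for that CSafeLoader/SafeLoader split looks like this (a generic sketch, not necessarily how zuul handles it; CSafeLoader only exists when PyYAML was built against libyaml):

    import yaml

    try:
        Loader = yaml.CSafeLoader    # C implementation, present only with libyaml
    except AttributeError:
        Loader = yaml.SafeLoader     # pure-python fallback

    def load(stream):
        return yaml.load(stream, Loader=Loader)

    print(load("a: [1, 2, 3]"))      # -> {'a': [1, 2, 3]}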
*** irdr has quit IRC23:03
clarkb" Image status transition from saving to saving is not allowed"23:04
clarkbit only just occurred to me its mad that the state is transitioning to the same state23:04
*** gouthamr has joined #openstack-infra23:06
*** wolverineav has quit IRC23:08
*** wolverin_ has joined #openstack-infra23:08
clarkbthis is the two-step upload process: we create an image record, and this first step is the one that gets passed all the image property data. Then we PUT the actual image file data to the foo_image/file url23:09
clarkbit's the second one that is failing; I don't think we supply the property data there, it should just be the content-type header and the content of the image itself23:09
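For reference, that two-step flow boils down to something like this (a sketch against the documented glance v2 API; the endpoint URL, token, image name and file path are placeholders):

    import requests

    GLANCE = "https://image.example.com"            # placeholder image endpoint
    HEADERS = {"X-Auth-Token": "<keystone-token>"}  # placeholder token

    # Step 1: create the image record; the name and properties go here.
    created = requests.post(
        GLANCE + "/v2/images",
        headers=dict(HEADERS, **{"Content-Type": "application/json"}),
        json={"name": "example-image", "disk_format": "qcow2",
              "container_format": "bare"},
    )
    created.raise_for_status()
    image_id = created.json()["id"]                 # image is now in status "queued"

    # Step 2: upload the bits; just octet-stream data, no properties.
    with open("example-image.qcow2", "rb") as image_file:    # placeholder path
        uploaded = requests.put(
            GLANCE + "/v2/images/" + image_id + "/file",
            headers=dict(HEADERS, **{"Content-Type": "application/octet-stream"}),
            data=image_file,
        )
    uploaded.raise_for_status()    # the 409 in the paste happened at this step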
*** slaweq has joined #openstack-infra23:09
*** dmellado has joined #openstack-infra23:11
*** slaweq has quit IRC23:14
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Consider shared changes queues for relative_priority  https://review.openstack.org/62359523:15
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Cap mypy  https://review.openstack.org/62359823:15
clarkbya it looks like if we use the sdk native interface it may remember some object attributes, but we consume things on the shade side, which is just doing a pretty boring POST then PUT without any information about state at that level (shade manages the state; the client is clueless beyond the session, from what I can tell)23:19
clarkbpossibly a change on the cloud side?23:19
*** slaweq has joined #openstack-infra23:19
clarkbmgagne_: ^ if you are around do you see a similar traceback? I can't tell if its the lib/sdk that is buggy or if maybe the server is?23:20
clarkbmordred: chances are you just know off the top of your head if you have a sec too23:23
*** slaweq has quit IRC23:24
*** stevebaker has joined #openstack-infra23:25
sean-k-mooneyclarkb: i'm kind of surprised that we would set env vars in /etc/sysconfig; i would have assumed it would either be in /etc/default/<apache name> or some systemd folder23:26
clarkbsean-k-mooney: apparently on rhel/centos and suse this is how you do it23:28
clarkbdebuntu use /etc/apache2/envvars23:28
sean-k-mooneyok i probably should know that ...23:28
clarkbthen the init system applies that data (via apachectl on debuntu?)23:28
*** apetrich has quit IRC23:47
mnaserhm23:47
mnaserin nested virt situations, i assume folks have seen really slow io or traffic?23:48
mnasertrying to debug functional tests for k8s .. http://logs.openstack.org/75/623575/1/check/magnum-functional-k8s/463ae1a/logs/cluster-nodes/master-test-172.24.5.203/cloud-init-output.txt.gz23:48
mnaserit looks like it's downloading quite slowly23:48
sean-k-mooneymnaser: no i havent seen that before23:48
mnaseri mean it could also not be using nested virt23:49
clarkbmnaser: chances are its not nested virt23:49
mnaserdarn :(23:49
mnaseri'm trying to make magnum functional tests for k8s actually start working again23:49
mnaserbut i think it might just be totally impossible without some sort of nested virt guarantee :(23:49
clarkbhttp://logs.openstack.org/75/623575/1/check/magnum-functional-k8s/463ae1a/logs/etc/nova/nova-cpu.conf.txt.gz virt type qemu23:49
clarkbwhich we set in devstack by default due to issues like centos 7 crashing under it :(23:50
mnaserit'd be nice if we can have not-so-third-party-third-party-ci23:50
sean-k-mooneymnaser: so nested virt is on but kvm is not used, so the nested vms are just slow23:50
mnaseras in like "here's some credentials, can we use those nodesets in this project only please"23:50
mnaserrather than us deploying a fully fledged zuul to do third party ci23:51
clarkbsean-k-mooney: it may or may not be on depending on the cloud that it is scheduled to23:52
* sean-k-mooney may or may not be deploying openstack and zuul at home to set up a third party ci...23:52
clarkbwhich is the other issue23:52
clarkbmnaser: we actually can do that, no one has offered as far as I know. But thats roughly what we are doing with kata23:53
clarkbthe key thing is we can't gate in that setup (because its a spof)23:53
clarkbbut can provide informational results23:53
sean-k-mooneyclarkb: ya that is true, it's not that hard to check if you can enable kvm; you just have to modprobe it and see if /dev/kvm is there23:53
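Something along those lines (a sketch; it needs root for modprobe and assumes the usual kvm_intel/kvm_amd module names):

    import os
    import subprocess

    def kvm_available():
        """Try to load the kvm modules, then check whether /dev/kvm appeared."""
        for module in ("kvm_intel", "kvm_amd"):
            subprocess.run(["modprobe", module], check=False,
                           stderr=subprocess.DEVNULL)   # one of the two will fail
        return os.path.exists("/dev/kvm")

    if __name__ == "__main__":
        print("kvm usable" if kvm_available() else "qemu fallback")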
mnaserclarkb: .. so can we do that for magnum :D23:54
clarkbmnaser: maybe? adrian otto actually brought it up a while back and then it went nowhere (idea was to use rax onmetal at the time)23:54
clarkbsean-k-mooney: ya it even works if you've hidden vmx from the instance (which is why we don't bother doing that)23:54
mnaseri mean i can do the work and provide the infra (..magnum is important for us, and its functional jobs are pretty dysfunctional because of this)23:54
sean-k-mooneyis barbican still doing terrible things in their ci jobs?23:55
*** rkukura_ has joined #openstack-infra23:55
sean-k-mooneybefore osic died they had a ci job that tried to enable kvm and powered off the host if it failed so zuul would reschedule them.23:56
clarkbsean-k-mooney: a few projects try to use nested virt if it's there (octavia and tripleo are/were doing this; it's how we ran into those issues with centos)23:56
sean-k-mooneyat least i think it was barbican that had that job23:56
*** rkukura has quit IRC23:57
*** rkukura_ is now known as rkukura23:57
*** jamesmcarthur has joined #openstack-infra23:58
sean-k-mooneymnaser: by the way, do you know with vexxhost, if i create an account, can i set a limit on my usage per month?23:59
mnasersean-k-mooney: unfortunately the only way that's possible is by enforcing a quota on your account, we don't have a "cost" quota ;x23:59
sean-k-mooneyok i assumed that would be the answer23:59
