Thursday, 2017-05-25

jlkTurns out, when we poke at the dusty corners, we uncover some ugly stuff.00:06
*** adam_g has quit IRC00:06
*** adam_g has joined #zuul00:09
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Put variables into the inventory  https://review.openstack.org/46763500:16
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Write inventory as yaml not ini  https://review.openstack.org/46763400:16
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Display command/shell results in a banner  https://review.openstack.org/46760300:21
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Strip unneeded cmd, changed, stdout and stderr from results  https://review.openstack.org/46731000:21
mordredjeblair: ^^ I updated that per your review and also responded. relatedly - I think we're only fetching the contents of the first task00:22
mordredjeblair: I seem to remember we discovered that was a bug before00:22
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Disable leaked repo check  https://review.openstack.org/46681000:23
mordredShrews: ^^ since you've been looking at logging, maybe you'll have an idea? (It'll be easier to spot once the logs aren't spammy)00:23
jlkoooh merged the leaked repo.00:28
jlktime to recheck all the things.00:28
mordredjlk: I just rechecked the first half of your stack00:29
jlk^500:29
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Wait for merger to complete before item is ready  https://review.openstack.org/46305400:44
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Add support for requiring github pr head status  https://review.openstack.org/44939000:44
*** tristanC has joined #zuul01:06
jlkmordred: you've used betamax in the past, would you have any objections if it were brought into zuul to better test github driver code that interacts with the API? We've hit a number of bugs that were not discoverable until running the actual code against the API instead of our mocked fake stuff.01:22
jlkmordred: I'm thinking we can just stuff json events into the webapp, record the zuul responses to interacting with real PRs on github, and then re-use those betamax responses in tests.01:23
jlk(similar to how github3.py does things, except they don't deal with webhooks)01:23
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Implement trigger require/reject-review  https://review.openstack.org/46338701:35
jeblairjlk, mordred: i'm a fan of betamax for such things01:39
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Include tenant name in github context  https://review.openstack.org/46387101:39
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Save installation ids to a cache and fetch them per project  https://review.openstack.org/46342401:39
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Catch integration key file not found in zuul  https://review.openstack.org/46342501:39
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Create github client each time it's called.  https://review.openstack.org/46342101:39
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Use integration_id with github  https://review.openstack.org/46342001:39
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Refactor integration key loading to always load  https://review.openstack.org/46342301:39
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Decode JSON body once for requests  https://review.openstack.org/46342201:40
jeblairjlk: i'm thinking that might be best as a somewhat more traditional unit test of the github driver, yeah?  verifying one thing at a time and keeping the fake for the functional tests?01:41
jlkmaybe? I'm a bit of a novice when it comes to designing tests. Mostly I just dislike that in the github test area we're manufacturing API call/responses and at times we've done it wrong. SO we wind up mostly testing our fake methods instead of testing the real methods.01:43
jeblairjlk: (the only reason i'd hesitate about betamax is the thought that the fake needs to be accurate enough for the functional tests, so once we get all the errors out, will betamax still offer something beyond the fake?01:43
jlkI've tried to extract the calls to external APIs into small functions but it's not complete.01:43
* jlk afk for the dinners.01:44
jeblairthe main thing is that the fake needs to be fully functional so that we can make changes to zuul itself with confidence. *also* testing the driver with betamax is fine with me (if, at the end of the day, it's still useful)01:46
jeblairfwiw, on the gerrit side, we created the fake by copying the data from production responses and then stuffing them into the fake with templated values01:49
jeblairbetamax might be useful getting a better baseline for the fake01:49
jlkoh oh02:25
jlkSo02:25
jlkI can easily fake the events.02:25
jlkthat we have to have no matter what02:25
jlkit's what the driver does in _response_ to the events that matters. Because unlike gerrit, we don't get _all_ the data. Github driver has to make calls back to the API to get more data.02:25
jlkand, it makes calls to _set_ data, like status.  It's those things that we're doing now with fake calls, that's bypassing parts of the actual driver code. If we used betamax, it'd use the driver code, and we should still be able to inspect that it Did The Right Thing.02:26
*** adam_g has quit IRC02:28
*** adam_g has joined #zuul02:30
SpamapSjlk: I think jeblair is suggesting that you could make a fake for that side too.03:26
SpamapS2017-05-25 03:57:40,836 kazoo.client                     INFO     Connecting to localhost:218103:58
SpamapS2017-05-25 03:57:40,837 kazoo.client                     WARNING  Connection dropped: socket connection error: Connection refused03:58
SpamapSthe plot thickens03:58
SpamapSbtw, we have _something_ calling basicConfig during tests03:58
* SpamapS puts a pin in it to come back to03:58
SpamapSI monkeypatched basicConfig out and at least started getting sane output from these fails03:58
SpamapSgah n/m on the ZK thing04:00
* SpamapS has too many things going on at once ;)04:00
jlkSpamapS: but... isn't that what betamax does? Creates the fakes for you, based on the real interaction, so you can run it over and over again without needing the real API there?04:03
SpamapSjlk: I believe it will parrot things back to you. I don't believe it will handle a two way comm functionally the way the fake gerrit does for instance.04:06
jlkwhat's the difference?04:06
SpamapSThe fake gerrit keeps some state and will feed it back to you as it was submitted. I don't recall that betamax has a facility for that.04:06
SpamapSso you can tell the fake gerrit "add a change. now give me the event stream".. betamax is more "if I add a change, respond with this response"04:07
SpamapSbut I could be wrong04:07
SpamapSmy betamax experience was over a year ago and brief04:08
jlkah.04:12
jlkhrm.04:12
jlkI'm not sure if it's that way or if I'm following04:12
jlkbut it seemed like you could betamax a multi-call thing and have the back/forth04:13
jlkmaybe it's because there isn't an "event stream" on the github side per se04:13
*** bhavik1 has joined #zuul05:25
SpamapSjlk: probably worth investigating.05:38
* SpamapS 's been debugging on the ssh agent.. all the live long day05:53
*** bhavik1 has quit IRC07:01
*** DangerousDaren has joined #zuul07:39
*** Cibo_ has joined #zuul08:40
*** smyers has quit IRC09:04
*** Cibo_ has quit IRC09:09
*** smyers has joined #zuul09:15
*** bhavik1 has joined #zuul09:41
*** jkilpatr has quit IRC10:57
*** jkilpatr has joined #zuul11:14
*** isaacb has joined #zuul11:26
*** isaacb has quit IRC11:30
*** bhavik1 has quit IRC11:43
openstackgerritTobias Henkel proposed openstack-infra/zuul feature/zuulv3: Use ssh for git-upload-pack  https://review.openstack.org/43680211:48
*** jkilpatr has quit IRC12:00
*** jkilpatr has joined #zuul12:01
*** jkilpatr has quit IRC12:16
*** jkilpatr has joined #zuul12:43
*** rcarrillocruz has quit IRC12:46
Shrewsmordred: sorry, an idea on what?13:05
mordredShrews: I _believe_ there is a bug in the consumption of log streaming and writing it to the file on the executor13:17
mordredShrews: I _think_ the bug is that it for $reason only winds up streaming the first shell task and not subsequent ones13:18
Shrewshmm13:18
mordredShrews: it's hard to see right now because we also log a whole mess of garbage13:18
mordredwhich https://review.openstack.org/#/c/467310/ will hopefully fix13:18
Shrewsi still see garbage  :(13:19
mordredyah. so much garbage13:19
mordredwe need to land 467310 and 467603 and the logs should be prettier13:19
mordredpabelanger: ^^ if you're up - those are for the logging issue13:20
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Re-enable E305 pep8 errors  https://review.openstack.org/46628613:21
*** dkranz has quit IRC13:23
* SpamapS has resorted to appending debug traces to files to try and find out where the thing is getting stuck13:36
mordredSpamapS: print is the ultimate debug tool13:38
SpamapSnope13:38
SpamapSthat gets swallowed13:38
SpamapSthanks testr!13:38
mordredoh yuck13:38
SpamapS(and the problem doesn't happen not in testr)13:38
mordredSpamapS: running the test in isolation outside of testr everything is fine?13:39
SpamapSso like literally I have a thing that appends to a file at ~/tmp/debug.{pid}13:39
mordredwow13:39
SpamapSmordred: no one test kills it13:39
SpamapSit just gets stuck somewhere and times out13:39
*** Cibo_ has joined #zuul13:39
SpamapSin futex()13:39
SpamapSwhich tells me "in python thread stuffs"13:40
* mordred hands SpamapS a large bucket into which to put his sads13:40
mordredI _hate_ these types of bugs13:40
mordredbecause the sad thing is - they're REAL bugs that will hit actual production at some point13:41
SpamapSthe good news is I just found the line it's getting stuck at13:41
mordredooh!13:41
* mordred hands SpamapS a large bucket into which to put his happy13:41
SpamapSthe bad news is.. now what? :-P13:41
mordredSpamapS: have you tried deleting the line?13:41
mordredmaybe it's just naughty13:41
SpamapSmordred: who needs to join threads anyway?13:42
SpamapSbe an individual13:42
SpamapSnot a joiner13:42
mordredthat's what I'm saying13:43
mordredSpamapS: so -it's trying to join threads but one of the threads decides it can't because $reason and $reason is the issue13:43
SpamapSRight, the threads do try to cooperate a bit13:43
SpamapSso we have a running flag13:44
SpamapSand we set it to False, and then wait for that to work13:44
mordredbut if one of the threads is blocked on something, it can't cycle back around to obey the flag13:44
SpamapSI think yes13:44
SpamapSI think13:46
mordredSpamapS: golly - you might have to add ~/tmp/debug.{threadid} files13:46
SpamapSI may be wrong13:46
SpamapSI think it's possible we're doing a blanket 'wait()'13:46
SpamapSand picking up the ssh-agent13:46
mordredahhhhh13:46
mordredthat is not a bad hypothesis13:46
SpamapShrm no... waitpid13:47
SpamapSself.proc.wait() :-P13:47
mordredssh-agent is going in to subprocess right?13:47
mordredbut I guess there is a controlling execution thread that owns that subprocess handle13:47
SpamapSso, I tried it with os.fork/execlp actually13:48
SpamapSthat doesn't seem to fix it13:48
* mordred is just being the dumb sounding board saying stupid things in case the trigger a thought btw13:48
SpamapSappreciate it13:48
SpamapSthere are a lot of bad theories bouncing around in my head13:48
mordredSpamapS: ok - I have a REALLY stupid idea13:48
SpamapSthat's exactly what I need13:49
mordredSpamapS: but it's REALLY stupid - so bear with me13:49
mordredSpamapS: make a completely separate daemon whose job is only to fork and reap ssh-agents. have the executor request a new ssh-agent from the agent daemon rather than forking one itself, and have the executor return the agent when it's done13:50
SpamapSUgh13:50
SpamapSI mean yeah13:50
SpamapSthat's stupid13:50
mordredit is13:50
SpamapSbut13:50
SpamapSI'm getting to where I have no idea why having this process running is locking up my python interpreter13:51
mordredhowever - we might be hitting the point where we're at enough layers of subprocesses interleaved with threads that we just don't have the tools to actually find the sticking point13:51
ShrewsSpamapS: ooh, i cherish the bugs that force me to log to a file13:51
SpamapSfound spot seems to have been a red herring.. it hasn't locked up there again13:51
Shrewso/` Cherish the bugs you have o/`13:52
SpamapSCherish is the word I use to remind me of...13:53
SpamapSShrews's Love13:53
Shrewsaawwwww13:53
Shrewsoops, autocorrect. should have been...13:53
Shrewseewwwww13:53
Shrewsmordred: is it ok to filter result._result before calling the super?13:55
Shrewsneed to check what the super does with that...13:55
* SpamapS has to go begin the child shuttling process13:55
SpamapSmordred: I'll ponder on that.. thanks for the dumb idea13:56
mordredShrews: I responded to same question in the review from jeblair13:58
mordredShrews: we have no choice13:58
Shrewsoh13:59
mordredShrews: well - we do have a choice - if we find that filtering first is unacceptable because it would break something else - we can stop calling super and write our own complete methods13:59
mordredShrews: so figuring out if it's a bad idea _is_ worthwhile13:59
*** isaacb has joined #zuul14:00
Shrewsmordred: i'm not immediately seeing anything that would break. However, I'm willing to bet that this is yet another unexpected bastardization of the ansible API that would make bcoca and jimi cringe.  :)14:01
mordredyah14:02
mordredShrews: that said - we need to put on the list "need defined API for result objects in callback plugins"14:03
mordredbecauseyou basically can't do _anything_ without hitting private members14:03
mordredI know bcoca has said before that they weren't ready to commit to what's there yet14:03
mordredwhich I totally respect14:03
*** Cibo_ has quit IRC14:04
Shrewsyep14:04
mordredbut it seems enough people are writing callback plugins that it's time ready or not14:04
*** dkranz has joined #zuul14:18
*** isaacb has quit IRC14:34
rbergeronpabelanger: where is beefy being used?14:48
pabelangerrbergeron: just some random hacking by dougbtv14:50
dougbtv:D14:51
dougbtvit's in a cowsay-beefymiracle rpm available in fedora fwiw14:51
rbergeronyay14:56
rbergeronahhh14:56
rbergeronah, thanks to rrix and will woods14:58
* rbergeron wonders if they can be bribed to package it for epel, lol14:59
dougbtv+1!15:13
*** isaacb has joined #zuul15:17
pabelangermordred: clarkb: jeblair: do you mind adding https://review.openstack.org/#/c/452494/ so your review pipeline.  Its our nodepool console-log patch15:24
mordredpabelanger: we wrote get_server_console in shde to just return none if the API call fails,right?15:26
pabelangerlet me confirm15:27
pabelangermordred: empty string it looks like15:28
mordredpabelanger: ok - cool. so no exceptions15:38
mordredpabelanger: patch +2 from me15:38
*** jasondotstar has quit IRC15:43
*** leifmadsen has quit IRC15:45
*** leifmadsen has joined #zuul15:49
*** jasondotstar has joined #zuul15:51
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add override-branch property to job repos  https://review.openstack.org/46737515:58
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Enable test_one_branch cloner test  https://review.openstack.org/46777915:58
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Check out the appropriate branch in executor  https://review.openstack.org/46777815:58
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use the executor cached repos more often  https://review.openstack.org/46777715:58
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Return resulting commits from merger  https://review.openstack.org/46777615:58
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add default-branch property to projects  https://review.openstack.org/46733415:58
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Rename 'repos' job attribute to 'required-projects'  https://review.openstack.org/46737615:58
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add override-branch property to jobs  https://review.openstack.org/46777515:58
*** DangerousDaren has quit IRC16:08
mordredSpamapS, jeblair: topic came up in a completely unrelated conversation - but I thought it might be worth mentioning here16:21
mordredSpamapS, jeblair: http://opentracing.io/ ... current effort Adrian Cole (dude who wrote jclouds originally) is working on - it's a spec and a set of implementation for tracing things through distributed systems16:22
mordredwe have a distributed system that's goal in life is to become even more distributed - thought at least pointing people to that would be nice - on the off chance we happen across the problems they're focused on solving16:23
SpamapSheh16:25
SpamapStracing :)16:25
jeblairmordred: *nod*  could be interesting.  though i'd like to start by making sure our log files are sufficient (and sufficiently tagged) so that an admin doesn't need an extra tool to do basic debugging.  but once we have our house in order, might be nice.  :)16:26
SpamapSI need something like set -x for python right now :-P16:28
* SpamapS is literally putting a debug append to file on every line in executor server right now16:28
jeblairSpamapS: maybe i should poke at that problem with you... what do i do?16:29
*** bhavik1 has joined #zuul16:29
jeblairmordred, jlk: i reviewed a bunch of github changes16:29
mordredjeblair: woot. I'll do that next16:32
SpamapSjeblair: Would love a second set of eyes. Especially if they're your elf eyes.16:32
SpamapSjeblair: the problematic patch is 46271216:32
SpamapSjeblair: running tests under testr times out tests reliably. Have tried on single thread, still happens.16:33
SpamapSstracing reveals before they get SIGALRM they're in 'futex()'16:33
jeblairbut running a single test succeeds?16:33
SpamapSand looks like they're stuck on joining threads16:33
mordredspeaking of logging, I just pushed up a shade stack related to logging that ends in https://review.openstack.org/#/c/468095/ which is a document about the loggers we log to. mostly mentioning here in that, in as much as nodepool uses shade, admin info on how to control the logging shade emits can be useful16:33
SpamapSjeblair: yes any single test succeeds though it's possible that's a sample size problem16:34
jeblairmordred: yeah, we may want to xref that in nodepool docs16:34
SpamapSalso setting gentle=True deadlocks forever, presumably because the thread that receives SIGALRM is the one stuck in futex()16:35
mordredjlk: I'm not going to -1 it - but in https://review.openstack.org/#/c/453844/17/tests/base.py you moved user = 'zuul' to a parameter - but the comment at the top of the method says "we hard set user to zuul" - I think the comment needs changing/removal16:36
jeblairSpamapS: neat.  my ssg-adent has no "-D" argument16:37
jeblairssh-agent even16:37
jeblairii  openssh-client                              1:6.6p1-2ubuntu2.816:37
jeblairSpamapS: that's what zuulv3-dev has as well.  any chance yours is a gnome thingy?16:40
SpamapSwhhhhhhhhhat?16:40
SpamapSopenssh-client: /usr/bin/ssh-agent16:40
SpamapSii  openssh-client                                  1:7.2p2-4ubuntu2.2           amd64                        secure shell (SSH) client, for secure access to remote machines16:41
SpamapSso -D is new?16:41
SpamapShrm16:41
jeblairseems that way16:41
jeblairhttp://paste.openstack.org/show/610657/16:41
SpamapSgood 'ol BSD people.. daemonizing everything16:41
SpamapSso the original patchset did use the daemonized one16:41
SpamapSthat made me nervous though16:41
SpamapSas it makes it harder for us to ensure they get cleaned up16:41
SpamapSat least with it in the same pg as zuul-executor in theory when zuul-executor's session ends it will get a HUP and die.16:42
pabelangerI was using ssh-agent -k16:42
pabelangerto help clean up16:42
SpamapSso of course we can track pids and kill16:42
SpamapSbut the kernel helps us quite a bit if we stay in the same pg16:42
jeblairyeah, i really like the foregroud/pgroup option16:43
SpamapSbut requiring openssh >= 7.x might be a bit much?16:43
SpamapSTrusty still has 2 years left.16:43
clarkbwe already require bubblewrap16:44
clarkbor mostly require it16:44
jeblairand we're on the verge of being py3 only16:44
SpamapSand RHEL6 still has 2 years too :-P16:44
clarkbrhel6 is python2.6 which I doubt we care about16:44
pabelangercentos-7 lacks zookeeper currently16:44
SpamapSso16:44
SpamapSagree on that!16:44
SpamapShowever..16:44
SpamapSthere's a difference between helpers (zk, bwrap) and core system software (ssh)16:44
pabelangerI've been testing on fedora latest, works well16:44
SpamapSquite a different thing to run backported ssh16:45
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Adds github triggering from status updates  https://review.openstack.org/45384416:45
clarkbaren't we talking about running backported kernels?16:45
clarkbI don't think you get more core sytstem software than that16:45
pabelangerssh-agent on xenial support -D16:46
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Implement pipeline requirement on github reviews  https://review.openstack.org/45384516:46
pabelangerand I think we're planning on migrating more things in openstack-infra to xenial16:46
pabelangerI just did nodepool-builders this week16:46
SpamapSI'm surprised we booted zuulv3-dev on trusty :-P16:47
pabelangereasy mode16:47
clarkbyes I awnt it if for no other reason than to be systemd everywhere and no longer have to deal with 3 init systems16:47
mordredalso - isn't zesty already out?16:47
pabelangerWe could upgrade it to xenial, if people would like to do so.16:47
mordredoh - wait - zesty  still isn't lts is it?16:48
SpamapSzesty is in fact out16:48
SpamapSbut zesty is a 9 month support release16:48
jeblairpabelanger: we'll need to soon anyway for the py3 stuff16:48
mordredjeblair: ++16:48
pabelangerokay, I can prep zuulv3 upgrade16:48
mordredI certainly don't think we should target zuul v3 to trusty16:48
clarkbno, but xenial is fine16:49
mordredyup16:49
pabelangeractually16:49
pabelangercan we just inplace distro upgrade it? That there is alot of stuff that is not under puppet right now16:49
jeblairpabelanger: well, maybe we should leave it alone and build zuulv3.openstack.org on xenial with puppet?16:50
pabelangerwe can, just a lot of puppet work atm16:50
SpamapSyou can borrow BonnyCI's ansible if you want ;-)16:50
* SpamapS ducks16:51
jeblairi'd be down with that :)16:51
mordredhonestly, if puppet is hurting not helping us to manage the systems, I think that should be considered16:52
mordredmanaging our systems via config management is suppoesd to be a force-multiplier16:52
SpamapSya'll puppet what you need to puppet. :)16:52
pabelangerwell, we do have some ansible roles today in gate. I'd love to demo some of the stuff I did will windmill testing :)16:52
mordredif we're avoiding tasks because too-much-puppet - it's not helping16:52
pabelangerthat's how I have been rolling zuulv3 locally16:52
SpamapShoist likely has some non-generic code that would make porting it to infra as hard as porting openstack's stuff to v3. :)16:52
pabelangerbut, didn't want to push the ansible / puppet cfgmgt issue in openstack-infra16:53
clarkbI'm not sure that changing cfg mgmt systems magically fixes anything16:53
Shrewsjeblair: mordred: well, i absolutely am baffled as to why I cannot get the asyncio server to shutdown when I daemonize the thing. It works fine non-daemonized. I'm about to give up on this.16:53
SpamapS(hoist being our pirate-themed Ansible for deploying BonnyCI)16:53
clarkb(they all have problems and make like miserable in their own ways)16:53
jeblairat any rate, one way or another, we do need the production system to be under config management, so i think we should leave the dev machine alone and launch the prod server on xenial with cfgmgt16:53
mordredI agree with jeblair16:53
mordredI also agree with clarkb - Ido not think there is a magical bullet - but if humans are avoiding the puppet work it's a worthy facet to consider16:54
mordredjeblair, jlk: different topic-  minor bikeshed: https://review.openstack.org/#/c/460700/14/tests/base.py ... gh_time_format = '%Y-%m-%dT%H:%M:%SZ'16:54
mordredI bring this up because I quite literally JUST hit this with shade and nova api16:54
clarkbprotip never ever read teh ansible service management code if you want to use ansible to manage services16:54
jlkSo, just so I can keep it straight again, EventFilter deals with data in the actual event (like the branch, the repo, etc..  and the RefFilter is about deeper data in the change, like what votes it has or whether it's the latest ref in the change request and whatever.16:55
SpamapSclarkb: to be fair, they have to deal with even more than 3 init systems. :)16:55
jlkmordred: I have no skin in that game, I think that was a time format created by adam_g16:55
mordredjeblair, jlk: '%Y-%m-%dT%H:%M:%SZ' is a subset of ISO 8601 - but is not full ISO 8601 - are we certain that github uses the subset and will never send richer ISO 8601 values?16:55
jeblairjlk: yep16:55
jlker not format, but code16:55
clarkbSpamapS: ya tahts actually a big part of my grump. Everywhere else ansible has explicitly decided to avoid abstracting things into generic resources. Except for service. And its terrible16:55
jlkmordred: we're not certain about anything with regard to the github API16:55
clarkbI think in part because it flies in the face of the general design of the rest of it16:56
jlkall we have is empirical evidence16:56
jlkoh oh are we griping about the "service" module?16:56
* jlk hates it16:56
jlkclarkb: the "package" module is getting to be like that too, trying to abstract around apt, yum, dnf, etc...16:57
mordredjlk: which is a dumb idea16:58
jlkI get the desire, but I choose not to embrace it16:58
mordredjlk: the original ansible approach of an apt, yum and dnf module is much nicer16:58
mordredyup16:58
jlkI'd rather use playbook branching logic to call the correct modules.16:58
mordreddehann wasn't always right - but he's right on this one16:59
jlkagreed16:59
mordredwell - you have to have branching logic ANYWAY16:59
SpamapSclarkb: Agree, would much rather see a systemctl module.16:59
mordredbecause fo package names16:59
mordredSpamapS, clarkb: https://docs.ansible.com/ansible/systemd_module.html16:59
jlkwell..16:59
jlkyes16:59
SpamapSwell there you go16:59
clarkbmordred: oh nice16:59
jlkyou need branching logic for variable data loading16:59
clarkbeven more reason to just get to systemd everywhere world :)16:59
jlkand presumably you always feed the same _variable name_ into the "package" action plugin.17:00
clarkbas much as I'm sure it hurts mordred to do that :)17:00
* SpamapS is now more confused after adding tracing17:00
mordredjlk: so - data type is not listed in thier API reference anywyhere for datatimes - but in their graphql docs they have "An ISO-8601 encoded UTC date string." - https://developer.github.com/early-access/graphql/scalar/datetime/17:00
jlkvariable branching is WAY easier than playbook branching17:00
SpamapSthread is exiting... join is not returning17:00
jlkmordred: I really need to get a grasp on graphql and see if we should move over to it, but for now we're on the REST API17:00
mordredin any case - it may be worth considering just using iso8601 to parse the incoming string for potential forward compat17:00
jlkmordred: if that's safe with the not quite 8601 we get now, sure.17:01
mordredsince it'll parse the subset17:01
mordredwell - the string above is valid iso 860117:01
jlkmy time code skills are REALLY lacking.17:01
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Add web-based console log streaming  https://review.openstack.org/46335317:01
mordredit's just that there are other things that are also optional and valid17:01
mordredjlk: I'll make a follow up patch - I don't think it's worth reworking the existing patch17:02
*** isaacb has quit IRC17:02
jeblairSpamapS: i'll switch to a xenial machine to poke at this for now, and we can defer the ssh version decision for later17:02
jlkkay thanks17:02
jlkI have to get back to rebasing this stuff, because I screwed up the EventFilter vs RefFilter for some things17:02
pabelangerjeblair: mordred: does that mean we want ze01.o.o and zuulv3.o.o ?17:02
pabelangeror all in one still17:02
*** Cibo_ has joined #zuul17:02
jeblairpabelanger: let's go ahead and separate17:02
SpamapSjeblair: There's an earlier patch set that doesn't use -D ... if that one works rebased on feature/zuulv3, it might be a clue about what's causing the weirdness :-P17:03
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Add web-based console log streaming  https://review.openstack.org/46335317:03
jeblairSpamapS: true; i'll reserve that line of inquiry for later17:03
Shrewsmordred: In case there's any chance you might have a clue on this, lines 164-170 of https://review.openstack.org/#/c/463353/13/zuul/streamer.py summarize the situation.17:04
mordredjlk: while you're in there- https://review.openstack.org/#/c/46076 is +3 but jeblair left an inline comment17:06
jlkalrighty17:06
Shrewshah. typical. now that i've publicly given up, i finally find something on the internet: https://bugs.python.org/issue2199817:07
mordredShrews: yay!17:08
Shrewsfrom the comments i've read so far, the answer is "don't do that"17:08
Shrewsso maybe we just have to treat this differently from the other zuul processes17:09
jeblairShrews: where's the daemonization code?17:09
Shrewsjeblair: i posted the latest in 46335317:09
Shrewsjeblair: zuul/cmd/streamer.py does the normal zuul stuff17:10
Shrewswith a little manual forking thrown in17:11
clarkboh Shrews is here. Shrews pabelanger can has review on https://review.openstack.org/#/c/467783/ its the failed image upload records thing17:11
clarkbif that looks good on the v3 branch I'd like to port it over to master too so we can get that in soonish17:11
Shrewsclarkb: yeah, it's on my list. i want to make sure we aren't depeding on those anywhere17:11
Shrewspretty sure we aren't, but need to refresh my memory17:12
clarkbShrews: I did some rudimentary checking (grepping for FAILED or was it FAILING), but extra eyes appreciated17:12
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Ensure PRs arent rejected for stale negative reviews  https://review.openstack.org/46070017:12
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Represent github change ID in status page by PR number  https://review.openstack.org/46071617:12
jeblairShrews: ah, i see it; i was misled by the "TODO: make zuul/cmd/streamer.py" comment, as that appears to be TODONE :)17:12
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Comment on PRs if a remote call to merge a change failed  https://review.openstack.org/46076217:13
Shrewsjeblair: where is that comment?17:16
jeblairShrews: bottom of zuul/streamer.py17:16
SpamapS#4  0x000000000049f414 in PyThread_acquire_lock ()17:16
* SpamapS finds all the lock acquires17:16
Shrewshah17:16
jeblairShrews: i'll poke at it after i finish poking at the ssh-agent thing17:17
SpamapSoh hrm17:17
SpamapSthat's lower level17:17
SpamapSall the threads are at that spot17:17
Shrewsjeblair: according to that bug report, the event loop FD appears to be the problem17:17
Shrewsi suspect i need to play to creating a new event loop17:19
SpamapSShrews: what are you forking for?17:22
* SpamapS has no context17:22
ShrewsSpamapS: all zuul processes support being run in the foreground (with -d option) or being daemonized (the default when you run the command). Forking is part of the daemonizing process17:25
SpamapSShrews: so you haven't done any actual work before daemonizing right?17:25
ShrewsSpamapS: not i, but i can't control what the asyncio lib does17:26
SpamapSSeems like moving the event loop init to after daemonizing would be the simplest thing?17:26
ShrewsSpamapS: right, that's part of asyncio17:26
Shrewsi wonder if i move the import to much later...17:27
SpamapSzomg this is awesome. So I added a crap ton of debug writes to my log files... and of course.. now it's not timing out.17:27
SpamapSShrews: the event loop isn't initialized on import, is it?17:27
jeblairShrews: yeah, if it does something in module import (which is evil... didn't quido write this?), you may need to do that.  we have similar late imports due to paramiko17:27
SpamapSThat would be .. awful.17:27
jeblairguido even17:27
clarkbShrews: was it not somethign as simple as keeping fds open?17:28
SpamapSimport side effects are one of those newbie mistakes I thought17:28
ShrewsSpamapS: i have to assume so because one does not initialize the event loop17:28
clarkbdaemonization closes all open fds17:28
clarkbbut you can whitelist them17:28
jeblairi don't think we want any fds open17:28
jeblairwe want daemonization to happen before any asyncio stuff starts17:29
Shrewsthe fds are part of asyncio internals. i don't want to try to mess with those17:29
jeblairright.  i'm just saying they shouldn't be open before we daemonize.17:29
clarkbwell you don't mess with them so much as tell daemonization process to leave them alone (the opposite of messing with them :) )17:29
clarkband ya doing asyncio setup after wards is fine too17:29
Shrewsi'm going to try creating a new event loop, and late import as a last resort17:30
SpamapSI'm not kidding though.. I actually find it hard to believe asyncio initializes the event loop on import. I believe you it does, but it _hurts_ my brain to think they did that on purpose.17:30
jeblairwell, just speculation at this point.  but it would be nice to have a definitive answer.17:31
SpamapSsince you always have to asyncio.get_event_loop() , that seems like the thing to move after fork.17:36
ShrewsSpamapS: that is after the fork17:37
Shrewsabsolutely nothing is done with asyncio until after the fork17:38
Shrewsand creating a new event loop, assuming i'm doing it correctly, doesn't help17:38
jlkjeblair: can you help me understand why in gerritmodel.py the EventFilter class deals with required_approvals and reject_approvals, instead of just handling those in RefFilter? Since they deal with the deeper content of the change, shouldn't it be only in the Ref filter?17:46
SpamapSShrews: I think you can get a fresh one with get_event_loop_policy().get_event_loop()17:49
SpamapSShrews: oh, that's kind of what new_event_loop() does actually.17:50
jeblairjlk: it's a borderline abstraction violation -- it means "this event matches iff the change already has some approval (wich may not be related to the event)"17:50
SpamapSShrews: set_event_loop(new_event_loop()) maybe?17:50
ShrewsSpamapS: i tried all of that17:50
SpamapSow17:51
jlkjeblair: er.... hrm. How would that play out? I'm trying to determine if I need to make the same violation on github.17:51
jeblairjlk: it *seems* like it should be obsoleted by the pipeline requirement, however, we use it to say "enqueue a change into check if someone adds a workflow +1 vote and jenkins has a -1 vote".  normally a workflow +1 would not enqueue in check.  and you definitely don't want to require a workflow +1 for check.17:51
jlkah, that feels... gerrit/openstack specific.17:52
jlkI'm going to ignore that for now and not duplicate it for github.17:52
jeblairjlk: well, our *use* of it is; i was just giving you an example of how it's different.  i think it's okay to ignore for now, but we may end up finding it useful to add later.17:53
SpamapSShrews: so yeah, next step is to delay import of asyncio I guess. :-/17:53
Shrewsand just tried that (well, delayed the import of the thing that imports asyncio). still no worky17:54
* Shrews stabs all things python17:54
jeblairjlk: (the same thing can map to github pretty easily: enqueue in check if zuul has voted failure already and someone approves the pr nonetheless; essentially, an implied "recheck")17:54
jlkso normally a "review" would trigger gate, but gate can't trigger because it requires a zuul status, and zuul has said no. Somehow that scenario manages to trigger the check?17:56
jlkthinking17:56
jlkso when scheduler tries to match up the event to a pipeline,17:57
jlkIt won't see the "check" pipeline because the trigger config doesn't match, it has no "review" trigger17:57
SpamapSShrews: what about the websocket library?18:01
SpamapSShrews: perhaps it has import side effects18:02
SpamapSreading asyncio's code, I don't think it creates the event loop on import18:02
ShrewsSpamapS: ah, that's entirely probable18:03
Shrewsbut still, nothing is done with the websocket lib until after fork18:04
SpamapSShrews: import?18:04
Shrewsit's imported in the same file that i tried delaying import on18:04
SpamapSShrews: is it possible the problem has been misdiagnosed?18:06
jlkjeblair: oh hrm, I found something else18:07
jlkjeblair: A trigger schema has to account for 'require-approval', and that gets defined in <driver>/<driver>trigger.py, which calls <Driver>EventFilter18:08
ShrewsSpamapS: i welcome others to investigate. all code is up in 463353 that demonstrates the problem18:08
jlkso the EventFilter has to handle the require/reject intput18:08
jlkeven though it seems it should be handled  by RefFilter18:08
* jlk pulls more hair out18:08
ShrewsSpamapS: just run 'zuul-web-streamer -c zuul.yaml' and 'kill <child pid>'18:09
jeblairjlk: a trigger doesn't have to support require-approval -- i think you just suggested not doing that for github, yeah?18:09
jlkI think I was arguing the wrong thing18:09
jeblairjlk: successfully though :)18:09
jlkjeblair: in my head, I mapped EventFilter to "just look at the content of the event, i.e. the stuff that changed right now to cause the event"18:10
jlkSo, EventFilter handles things like a comment being added, a label being added, a ref being updated, a vote being added.18:10
jlkand I mapped RefFilter to "deeper data about the change itself that must be in a specific set"18:11
jeblairjlk: yes, that is, generally, and ideally, the case.  except for this one thing.  :)18:11
jlkso "this label must exist on the change, even if it's not set on this event"  or "this vote must exist even if it wasn't set on this event" or things like that.18:11
SpamapSShrews: If I get tired of my current personal threading hell, I'll try your asyncio hell.18:12
jeblairjlk: yep.  it's just that we also discovered that it was useful to say "this event, but only if this vote exists".18:12
jlkthe code makes this awkward when you want to express a RefFilter level requirement at the trigger config point18:12
jlkyeah, we've got pipeline requirements that generically cover this18:13
jlkbut I think I see what you're saying18:13
jlkpipeline reqs are more like "any of these events are cool, but this must be in place for any of them"18:13
jlkand a trigger req is more "this event is _only_ cool in this specific scenario, and _STILL_ the pipeline req must be in place."18:14
jlkso I could configure a trigger for check that was an approved review from a writer, but only if there is a failed status from zuul.18:15
*** Cibo_ has quit IRC18:15
jlkso that an approved review from a writer wouldn't trigger if there was a success status from zuul.18:16
jlkI think I get it now. I had originally looked at trigger requirements as a way to have a driver specific requirement on a pipeline with multiple drivers, but we fixed that a different way.18:17
jeblairjlk: yep!18:17
jeblairjlk: yeah, i recall we talked about that earlier, and i think it's no longer necessary to think of them in that way since we made driver-specific pipeline reqs18:18
jeblair(it would be awkward to use them that way since you'd have to duplicate them on every trigger)18:18
jlkBut this does mean now that EventFilter has to deal with deeper data of the change.18:18
jlksince a trigger calls EventFilter18:18
jlkI think it'll make sense to implement a trigger requirement on status, but not on review.18:19
jlkgerrit doesn't really have status vs review, it's all in just differently formatted approvals. So it makes sense for gerrit to have require-approval18:20
jeblairthat sounds reasonable18:23
jlkand this definitely needs a new test case18:24
openstackgerritPaul Belanger proposed openstack-infra/nodepool feature/zuulv3: Drop -e for pip install for devstack plugin.sh  https://review.openstack.org/46812718:37
*** bhavik1 has quit IRC18:39
Shrewshrm, ok. seems something about the Ctl-C stops the event loop.18:40
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Include tenant name in github context  https://review.openstack.org/46387118:44
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Add cachecontrol to requests to github  https://review.openstack.org/46158718:44
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Save installation ids to a cache and fetch them per project  https://review.openstack.org/46342418:44
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Catch integration key file not found in zuul  https://review.openstack.org/46342518:44
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Include exc_info in reporter failure  https://review.openstack.org/46076518:44
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Implement github trigger requirement status  https://review.openstack.org/46338618:44
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Create github client each time it's called.  https://review.openstack.org/46342118:44
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Use integration_id with github  https://review.openstack.org/46342018:44
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Refactor integration key loading to always load  https://review.openstack.org/46342318:44
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Decode JSON body once for requests  https://review.openstack.org/46342218:44
*** rcarrillocruz has joined #zuul18:46
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Add web-based console log streaming  https://review.openstack.org/46335318:51
Shrewsthat PS ^^^ captures SIGINT and attempts to shutdown things, but doesn't work as expected, whether daemonized or not18:52
Shrewsso i guess it ISN'T fork related18:52
Shrewsprogress, at least18:53
jeblairShrews: that looks like it should immediately exit?  (it calls start then stop)18:55
jeblairer, rather, start then it just exits18:55
Shrewsjeblair: start() doesn't return18:55
Shrewsnot until the loop stops18:56
jlkmordred: jeblair: I think I addressed the stated concerns in those patches. I dropped one more, the trigger level review requirement. Tests appear to pass on the tip of the branch locally. I'm off to find some fish and chips.18:56
Shrewsjlk: event_loop.stop() should cause the run_forever() call to exit at the next opportunity18:57
Shrewserr, jeblair ^^^18:57
jeblairShrews: ah, well, ctrl+c only goes to the main thread, so it will interrupt the streamer start() method which is running run_forever18:57
jeblairShrews: what happens if you run that code then send sigterm?18:58
jeblairit'll do the same, actually18:58
Shrewsjeblair: i'm not using threads18:58
jeblairShrews: yeah, but you still get one for free.  :)18:58
Shrewswell, yeah. there's only 1, i meant18:59
jeblairShrews: so ctrl+c will raise an exception inside of run_forever18:59
Shrewsjeblair: right, but we SHOULD be able to stop it manually19:00
jeblairi would not be surprised if that breaks things, so i say it's best to set that aside for now and try a different method of triggering the shutdown19:00
jeblairlike sigterm, which should call the exit handler then return to processing insed of run_forever19:00
jeblairShrews: so what happens when you start that with "-d", then in another terminal, kill -15 the pid?19:01
Shrewsjeblair: same thing19:02
Shrewsevent loop is never stopped, even though we've asked it to19:02
Shrewsjeblair: the answer might be here: https://stackoverflow.com/questions/42628795/indirectly-stopping-a-python-asyncio-event-loop-through-sigterm-has-no-effect/4263512819:04
Shrews"Instead of signal.signal() use loop.add_signal_handler()"19:05
ShrewsI will try that19:05
jeblairShrews: ah, so incidentally, fixing the ctrl-c problem may fix this as well.19:09
jeblairShrews: if you have the streamer start in a thread, and have the start() method return, then wait on signal.pause() in the main thread, it may cause both things to work without the need for loop.add_signal_handler19:10
Shrewsjeblair: i'll try that next because add_signal_handler does the same thing  :/19:11
*** hashar has joined #zuul19:42
*** jamielennox is now known as jamielennox|away19:56
* SpamapS returns to the ssh-agent salt mine20:23
SpamapShrm20:27
SpamapSwhen I attach to the threads of my running tests that are timing out, I just noticed they're actually really busy in the kazoo threads20:28
pabelangerheads up, nodepool dsvm jobs are broken. Working to fix them20:28
*** jkilpatr has quit IRC20:29
SpamapS2017-05-25 20:25:58,064 zuul.zk.ZooKeeper                DEBUG    ZooKeeper connection: CONNECTED20:30
SpamapS2017-05-25 20:25:58,080 zuul.nodepool                    DEBUG    Updating node request <NodeRequest 100-0000000000 <NodeSet OrderedDict([('controller', <Node None controller:invalid-label>)])>>20:30
SpamapS2017-05-25 20:25:58,080 zuul.nodepool                    INFO     Submited node request <NodeRequest 100-0000000000 <NodeSet OrderedDict([('controller', <Node None controller:invalid-label>)])>>20:30
SpamapSrunning with 'ttrun -epy27' just prints those 3 lines and then that process is looping hard selecting/recving/sending with zookeeper20:31
SpamapShttp://paste.openstack.org/show/610675/20:33
ShrewsSpamapS: that's likely the kazoo keepalive thread20:37
Shrewswe had to restrict it to a single kazoo connection b/c otherwise it hit the cpu pretty hard with multiple20:38
Shrewsin nodepool, that is20:38
Shrewsi think kazoo uses something like 3 threads per connection, iirc20:39
SpamapSYeah that sounds about right. Need at least 1 for watches.20:39
jeblairSpamapS: in general, what i've found is that the stop process for ssh isn't robust, and when we hit the wait timeout, errors stopping ssh can cause the executor to raise an exception when stopping and end up aborting the test shutdown.  i'm still tracking stuff down, but i'm making progress.20:39
SpamapSShrews: it's an awfully busy thread.20:39
SpamapSjeblair: I have some updates that I may not have pushed up where I run .wait() in a thread20:40
SpamapSjeblair: so that I can do terminate, then start waiting for it to die, then after a few seconds use kill.20:40
jeblairSpamapS: yeah, though normally, we'd expect terminate to be pretty fast, right?20:42
SpamapSjeblair: in fact I was just re-reading the subprocess manual..20:42
SpamapSit says to be careful with subprocess.PIPE and wait.20:42
SpamapSjeblair: terminate should be fast, but wait needs to wait for the actual death of ssh-agent.20:42
SpamapSthe agent does have to unlink its socket20:42
SpamapSso, a few ms, but not instant20:43
SpamapSjeblair: I'm not seeing executor exceptions ever though, so you're clearly finding more than I could.20:43
SpamapSjeblair: I wonder if I should just send all the fd's to devnull and poll for the socket existence, rather than try to be fancy and select stdout.20:44
SpamapSthat way we don't have any pipes20:44
jeblairSpamapS: i think that's a good idea.  i think the buffering, etc, there is a theoretical problem, though i'm not sure it's an actual one.  at any rate, a lot of stuff gets simpler if we do that.  (also, incidentally, the same is true if we don't run in foreground, so we might still consider running without -D depending on whether we find the pgroup thing useful)20:46
jeblairSpamapS: the biggest thing i've found is that stopping is racy, and ExecutorServer.stop() can end up running with self not having a .thread attribute; so i added protection there (and for ssh_agent).20:47
jeblairSpamapS: i still have some tests hitting the hard timeout though; so still something else to track down20:47
SpamapSjeblair: I'm trying with devnull now.20:47
*** yolanda has quit IRC20:49
*** jkilpatr has joined #zuul20:49
SpamapSjeblair: also re-reading the stop method, I think it will be more robust with wait() running in a thread. Otherwise we could get stuck there.20:50
jeblairSpamapS: i keep seeing something that looks like git commands are getting stuck.  i wonder if there is some interaction there with ssh-agent?  (i don't understand why -- all the git commands are local)20:50
SpamapSoh hm20:50
SpamapSI saw that too20:50
SpamapSbut kind of let it fall by the wayside20:50
SpamapSI was wondering if maybe the pipes were open and causing FD issues with git20:51
SpamapSstill hard timeout with devnull so pipes probably not the culprit20:51
jeblairSpamapS: but this only happens sometimes...20:51
SpamapShttp://paste.ubuntu.com/24659247/20:52
SpamapSmy current diff20:52
SpamapSjeblair: merger runs an update thread, right?20:53
jeblairSpamapS: yes, and that's one of the things that can get stuck and block shutdown20:53
SpamapSjeblair: so what if that merger thread is concurrently running forks and picks up FDs opened for ssh-agent20:53
SpamapSsubprocess is supposed to handle this20:54
SpamapSby closing all fds20:54
SpamapSbut I'm grasping for anything to hold on to now20:54
jeblairya20:54
SpamapSwould be great if I could attach gdb but get actual python line numbers20:56
jeblairSpamapS: that used to work with https://wiki.python.org/moin/DebuggingWithGdb  but last time i tried (not recently) it was somehow broken20:59
jeblairi just ran ps during a failing test and saw a zombie git process21:00
clarkbI want to say I got ^ mostly working, where it falls down is when you are in code outside of stdlib iirc21:01
jeblairalso, amazingly, it was a single test in the foreground.  first time that's happened.  test_zuul_refs21:02
clarkbwhcih was my problem when trying to debug the python3.4 segfaults21:02
*** dkranz has quit IRC21:02
jeblairoh, yeah, i just tried it.  it's producing some output.  let's see if it's helpful21:04
jeblairbe in the zuul root directory when you run it so it can pick up the local files21:06
SpamapSjeblair: I'm trying without -D21:09
jeblairhttp://paste.openstack.org/show/610684/21:10
jeblairthere's the traceback (via gdb) for the thread stuck on a git operation21:10
jeblairits git subprocess is (Z)ombie21:10
SpamapSMy copious debug logging led me to innerUpdateLoop, but I couldn't crack what was happening there21:13
SpamapSspecifically updateRepo21:13
jeblairSpamapS: http://bugs.python.org/issue1278621:14
jeblair(via https://stackoverflow.com/questions/14615462/why-does-communicate-deadlock-when-used-with-multiple-popen-subprocesses)21:14
jeblairSpamapS: looks like your fd hunch may have been right21:14
jeblairSpamapS: i am running in 2.7; are/were you when you saw the git hangs?21:15
SpamapSjeblair: yes, 2.721:16
SpamapSthe paint is still a little wet on 3.5 ;)21:16
jeblairSpamapS: yeah, that's what i thought too.  :)  i'm adding close_fds=True to the ssh-agent and re-running now21:17
SpamapSbut I'm on 2.7.1221:17
SpamapSjeblair: reading the python issue.. this definitely sounds like our problem21:18
ShrewsOMG OMG OMG21:18
Shrewsi have left the asyncio hell21:19
jeblairShrews: yay!21:20
SpamapSjeblair: oh doh.. 3 changed the default21:20
* SpamapS shoudl have tried 3.521:20
SpamapSrunning without -D also seems to have had a positive effect I think21:20
SpamapSwhich makes sense, because ssh-agent detaches and exits relatively quickly.21:20
jeblairSpamapS: yeah, though then we would have thrown up our hands and wondered why 3 "fixed" it :)21:20
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Add web-based console log streaming  https://review.openstack.org/46335321:20
SpamapSdoes anybody else get a spray of logs to stdout just before the tests succeed?21:21
Shrewsjeblair: SpamapS: ^^^ changed from run_forever to run_until_complete on a Future, where I set the Future (scheduled in the event loop) in our stop()21:22
SpamapSShrews: oh yay21:22
Shrewsi am now going to drink copious amounts of alcohol21:22
SpamapSif close_fds=True solves this issue I may do the same21:23
SpamapSjeblair: FYI, the run without -D did still get one deadlocked test process, so it just narrowed the window. I'm re-running with my non -D patch, and close_fds=True21:24
mordredShrews: yay indeed!21:24
jeblairShrews: cool -- though ultimately we shouldn't have any signal stuff in the streamer.  did you try the thread idea?21:24
Shrewsnope. was running down some other last minute hunches21:25
Shrewsplus i read some scary things about threads and event loops21:25
Shrewsbut i'll look more closely tomorrow21:26
SpamapSIIRC, usually one has an event-loop per thread and dispatches things between them using queues.21:26
Shrewsjeblair: i THINK it will work by moving the signal handling back out of the streamer21:27
* Shrews tests real quick21:27
jeblairyeah, we're not trying to do anything funny with the event loop; we just want to start it in its own thread, and tell it to stop.21:27
jeblairand we're going to want it in its own thread too so it follows the pattern of the other services, and so we can start/stop it from tests21:27
Shrewsjeblair: hrm, nope. gotta be done in the event loop21:29
jeblairShrews: what does?21:29
Shrewssignal handling21:29
jeblairShrews: did you put it in another thread?21:29
Shrews(for the current method that i have working)21:29
Shrewsno21:29
Shrewswill try tomorrow21:29
jeblairShrews: right, i don't expect that to work with the current method because of the way signals are delivered.  they are always delivered to the main thread.  so the trick is to put the event loop in a new (second) thread, so that the main thread just waits for a signal.  then it tells the event loop to stop when it gets a signal.21:30
SpamapSjeblair: close_fds worked for me!21:30
jeblairSpamapS: it's working for me, so far, inasmuch as i'm getting real test timeouts and not alarm clocks :)21:31
SpamapSRan 274 (+5) tests in 455.393s (-103.340s)21:31
SpamapSPASSED (id=88, skips=18)21:31
SpamapSwhen running with my non -D patch as well21:31
SpamapSalready have 4 rogue ssh-agents though21:32
jeblairoh, my ssh-adds are failing, that's why all the failures21:33
SpamapSI have a patch for that21:34
SpamapSI'm going to push up this ssh agent patch as a new change ID so we can see the same things21:34
jeblairSpamapS: http://paste.openstack.org/show/610686/21:35
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add SSH Agent Primitives and usage  https://review.openstack.org/46817321:35
jeblairSpamapS: that's the other thing we should include21:35
SpamapSAh yeah21:36
SpamapSsaw those missing threads too which were confusing me21:36
SpamapSI thought it was just fallout21:36
SpamapSok 468173 is without -D, and with an attempt at a singleton one-per-test-run-process SSH key generator21:36
SpamapSthe __del__ approach there produces wonky stuff htough21:37
SpamapSthough21:37
SpamapSException TypeError: TypeError("'NoneType' object is not callable",) in <bound method SshKeyResource.__del__ of <tests.base.SshKeyResource object at 0x7f4ea8a23610>> ignored21:37
jeblairSpamapS: oh, that also needs initializers, i left that off the diff, but you get the idea21:37
jeblairSpamapS: regarding the key -- i was thinking it might be nice to just generate a key and stick it in as a test fixture so we don't have to do that live21:38
jeblairSpamapS: we do that for the RSA keys21:38
*** hashar has quit IRC21:38
SpamapSjeblair: oh good idea21:39
SpamapSthat crossed my mind but this was more fun ;)21:39
SpamapSI'll rework around that21:39
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add SSH Agent Primitives and usage  https://review.openstack.org/46817321:40
SpamapSoh I had another thought which was that we probably don't need to explicitly call SshAgent.stop in AnsibleJob.stop because it's already called in the finally block of execute21:40
jeblairSpamapS: ^ i pushed the .thread attribute safety as a new PS to your new chang ^21:41
jeblairSpamapS: oh, even better.  :)21:41
jeblairSpamapS: let's go with that.21:41
SpamapSjeblair: and how about -D vs. not?21:44
SpamapSI'm still of two minds21:44
SpamapSjeblair: thanks so much for the help on debugging this. I was definitely at the end of the rope.21:45
SpamapSI'm going to collect my thoughts and see about putting in a final version of the patch we can be proud of :)21:45
jeblairSpamapS: yeah, i'm not sure what the right answer is there yet....21:49
jeblairSpamapS: that patch passes tests for me21:54
jeblairoh look, my "actually check out the right branches of stuff" patch series is all green.  that's 461176 through 467779 if folks want to start taking a look.  it should be ready for review.21:56
jlkoh22:04
jeblairmordred: reviewed your group/inventory changes22:13
*** jamielennox|away is now known as jamielennox22:36
* SpamapS never thought watching tests pass would bring tears to his eyes.. but... today...22:43
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add SSH Agent Primitives and usage  https://review.openstack.org/46817322:49
SpamapSjeblair: ^ Let's stick without -D. It makes us work on more platforms and should work well enough.22:50
SpamapSI also like that after .communicate() I know the agent is up and working.22:50
SpamapS(vs. with -D where I have to select.select on the stdout)22:51
jeblairSpamapS: wfm22:51
* SpamapS restacks bwrap on that22:52
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add support for bwrap  https://review.openstack.org/45385122:52
mordredjeblair: thanks! (I forgot to say that before I started working on addressing your reviews a little while ago)22:53
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Add support for defining groups in nodesets  https://review.openstack.org/46761122:53
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Put variables into the inventory  https://review.openstack.org/46763522:53
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Write inventory as yaml not ini  https://review.openstack.org/46763422:53
mordredjeblair: hopefully those will be mo better22:53
mordredjeblair: and yay on check out the right branches of stuff!22:54
mordredjeblair: I believe I've read this stack many times already ... I know I've read the commit message at least multiple times22:55
SpamapSdoh23:10
SpamapShttp://paste.openstack.org/show/610690/23:10
SpamapShave to muck with the perms23:10
jeblairmordred: i just assume you're always saying "thanks!"23:22
jeblairSpamapS: ah yeah; we could copy it into the test root and muck it up there23:22
SpamapSjeblair: testing that now23:26
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add support for bwrap  https://review.openstack.org/45385123:29
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add SSH Agent Primitives and usage  https://review.openstack.org/46817323:29
jlkjeblair: what happens if you have a pipeline that triggers on push, but a requirement of "open"? A push event doesn't have a change that can either be open or closed.23:32
openstackgerritClark Boylan proposed openstack-infra/nodepool feature/zuulv3: Revert "Use devstack's zookeeper support"  https://review.openstack.org/46820523:32
jeblairjlk: probably nothing?23:32
jeblairjlk: ie, all changes rejected23:33
jlkIt looks like the "open" attribute to a change only gets created/updated when _updateChange is called23:33
jlkso it kind of looks like you'd get a traceback on a missing attribute in the model23:33
jeblairjlk: yeah, probably needs tidying up and has never been exercised due to the contradiction23:35
jlkI'll put a thumbtack in that23:35
jlk(as I'm implementing it for github)23:35
jlkProbably need to test a number of things with a push event instead of a change event for requirements and such23:36
mordredjeblair: my updates to the inventory group patches are still failing - but not in a super weird way or anything - I'll look at them tomorrow and get them fixed23:41
jeblairmordred: yeah, i think the first change is fine; that py35 failure looks unrelated23:42
jeblairi rechecked23:42
jeblair(but something to keep an eye on)23:42
jeblairthe second one def has a real problem23:42
jeblairmordred: the second one is just some job name mismatches23:44
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Make zuul ansible dir self-contained  https://review.openstack.org/46820823:45
SpamapSmordred: ^23:45
SpamapSI think that makes it work inside the bwrap23:46
SpamapSbut I'm out of runway for today..23:46
* SpamapS EOD's23:46
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Implement github pipepline req of open  https://review.openstack.org/46821323:56

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!