Monday, 2014-01-27

*** masayukig has joined #openstack-gate01:29
*** markmcclain has joined #openstack-gate02:31
*** dims has quit IRC02:50
*** markmcclain has quit IRC04:05
*** SergeyLukjanov_ is now known as SergeyLukjanov06:04
*** SergeyLukjanov is now known as SergeyLukjanov_06:31
*** SergeyLukjanov_ is now known as SergeyLukjanov06:57
*** SergeyLukjanov is now known as SergeyLukjanov_07:20
*** SergeyLukjanov_ is now known as SergeyLukjanov07:48
*** SergeyLukjanov is now known as SergeyLukjanov_08:03
*** SergeyLukjanov_ is now known as SergeyLukjanov08:03
*** SergeyLukjanov is now known as SergeyLukjanov_08:17
*** jpich has joined #openstack-gate09:19
*** frankbutt has joined #openstack-gate11:10
*** frankbutt has left #openstack-gate11:10
*** SergeyLukjanov_ is now known as SergeyLukjanov11:17
sdaguemorning folks11:23
*** SergeyLukjanov is now known as SergeyLukjanov_11:33
*** shardy has joined #openstack-gate12:15
*** Alexei_987 has joined #openstack-gate12:17
*** werebutt has joined #openstack-gate12:18
*** werebutt has left #openstack-gate12:18
*** obondarev has joined #openstack-gate12:21
*** cyeoh has joined #openstack-gate12:21
*** Ajaeger has joined #openstack-gate12:21
*** dims has joined #openstack-gate12:23
*** therve has joined #openstack-gate12:27
*** flaper87 has joined #openstack-gate12:28
flaper87\o/12:28
portanteo/12:32
portantemornin'12:32
*** gsamfira has joined #openstack-gate12:36
*** koofoss has joined #openstack-gate12:37
dimso/12:37
*** masayukig has quit IRC12:37
flaper87portante: morning :)12:38
portante:)12:38
*** salv-orlando has joined #openstack-gate12:48
salv-orlandoaloha12:49
flaper87I prepared a patch that would keep the config files of the gate. https://review.openstack.org/#/c/69344/ (In case you guys think it's useful)12:51
salv-orlandoI joined the room just 5 minutes ago. Do we have somebody actively working on bug 1254890?12:52
salv-orlandoI think at least for neutron jobs the hang is occurring because of kernel crashes, so I'd like to discuss how we should go about it12:53
salv-orlandoon the other hand, for bug 1253896, Darragh's patch for increasing the ping timeout in tempest merged12:56
salv-orlandothis will solve the missed DHCPDISCOVER problem12:56
salv-orlandoThere is another neutron patch which is still blocked because of the other gate failures, which are mostly bug 1254890 and bug 127021212:57
salv-orlandoFor the latter the neutron patch is under review as well: https://review.openstack.org/#/c/67537/12:57
*** koofoss has left #openstack-gate12:57
salv-orlandopromotion might not help however since there's still a high failure rate because of kernel crashes12:58
sdaguesalv-orlando: so any idea why we are triggering kernel crashes now?13:01
sdaguethat seems like a relatively new situation13:01
salv-orlandoI looked at neutron changes and nothing would justify this. The problem is that the crashes are triggered by the same operations which usually worked fine before.13:02
salv-orlandoAnd we have pretty much no logging for this from neutron, since the crash is usually triggered by the metadata proxy, whose log is stashed into the l3 agent log by redirecting the stream.13:02
salv-orlandoyucky.13:02
salv-orlandosdague: So I was thinking that, assuming I can have metadata proxies (there is one for each namespace) logging into their own file, how hard would it be to have gate jobs store an additional log file?13:04
sdaguecollecting them isn't hard13:04
salv-orlandothe other issue is that with the current gate situation I would be hardly able to merge the required neutron change.13:04
sdagueso we can always bypass if we have a critical debug issue like this13:05
*** jd__ has joined #openstack-gate13:07
salv-orlandok, so I'll work on that as this might give us the information we need to assess whether we need a neutron fix or whether we need to change something in the system where gate tests are executed13:07
sdagueyeh13:09
*** ociuhandu has joined #openstack-gate13:15
sdaguehttps://etherpad.openstack.org/p/gate-bugs - if people wanted to update what they are working on there13:20
*** alexpilotti has joined #openstack-gate13:21
chmoueldo you know how to do an update to status.openstack.org website i'd like to add http://status.openstack.org/elastic-recheck/ to it (on the top bar)13:21
Ajaegerchmouel: repo openstack-infra/config - and then check the directory ./modules/openstack_project/files/status/13:24
chmouelAjaeger: cool cheers13:24
Ajaegerchmouel: Great idea to do this - will you patch it?13:25
chmouelAjaeger: yes, i am doing that now13:25
Ajaegerthanks, chmouel !13:26
dimschmouel, there is/was a plan to merge elastic-recheck and rechecks page, so my prev review was -1'd13:27
chmoueldims: ah but i guess in the meantime we can just add that page to the top bar until it's merged?13:27
chmoueldims: as ppl are not aware of it13:28
dimschmouel, that was exactly in my review :)13:28
chmoueldims: so no -1 if i submit that changes ? :)13:28
Ajaegerdims: do you have a link to your patch?13:29
chmoueldims: ah i misunderstood, you submitted a change beforehand, cool let's see if we can chat with the ppl who -1'd you (if you can send the link)13:31
*** SergeyLukjanov_ is now known as SergeyLukjanov13:31
sdaguechmouel: there isn't one yet, though the gate status page has been stalled since Jan 9, so I hadn't bothered yet13:32
sdagueI'm also trying to sort out something on our unclassified rate, as the numbers don't look right to me13:32
anteayamorning13:42
anteayaI'm here for a bit and then have to move to airport wifi13:42
anteayajust reading up on things atm13:42
*** dhellmann has joined #openstack-gate13:47
sdagueanteaya: where you headed today?13:48
anteayaI have to fly to Salt Lake City tonight for SaltConf13:48
anteayaI am in Toronto today for hotel wifi rather than spending the day driving to the airport13:48
*** ttx has joined #openstack-gate13:48
sdagueah, cool13:49
salv-orlandoanyone has info regarding this error in n-cpu logs? http://logs.openstack.org/84/52884/7/check/check-tempest-dsvm-neutron-isolated/b0230ed/logs/screen-n-cpu.txt.gz?level=INFO#_2014-01-27_08_14_12_23114:03
salv-orlandodoes not seem fatal, however I've never seen it.14:03
sdaguesalv-orlando: yeh, cyeoh has some patches up to catch those, but I think there is a more systematic approach needed14:08
salv-orlandosdague: thanks14:08
anteayamy goal for myself today is to learn how to build elastic recheck matches14:09
anteayaI feel I have seen enough around to get a sense of it, I just need to focus on it14:09
anteayawill ask silly questions if I hit a wall14:09
sdaguesounds great14:11
*** rustlebee has joined #openstack-gate14:12
*** rustlebee is now known as russellb14:12
sdaguemorning russellb14:12
russellbmorning14:12
russellbso yeah, top bug, i think we should split it in 2 ... i can work on doing that14:13
russellb(once i finish checking for fires in email backlog)14:13
sdaguesure, sounds good14:13
*** dims has quit IRC14:13
*** dims has joined #openstack-gate14:15
russellbsdague: are you offended by Python code in a shell script?  heh ... https://review.openstack.org/#/c/69256/14:15
sdagueummmmm14:17
sdaguewe should probably try to do that in bash, honestly, for when it goes up for real14:17
sdagueNUMCPU=`cat /proc/cpuinfo | grep processor | wc -l`14:18
chmouelnproc14:20
chmouelwould work as well14:20
russellbah, nproc is in coreutils, so that should be fine14:21
sdaguechmouel: cool14:21
sdagueactually14:21
sdaguenproc --ignore=214:21
sdaguegive you the answer you want14:21
anteayafor filing bugs against elastic-recheck, does it have its own launchpad account or does it go under infra?14:25
russellbah true14:25
russellbi actually changed it from - 2 to / 214:25
russellbas a safer initial increase14:25
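The exchange above compares two ways of deriving a worker count from the CPU count: subtracting two (what `nproc --ignore=2` reports) and halving it. A minimal Python sketch of the difference, assuming a floor of one worker in both cases; the function names are hypothetical and are not taken from the patch under review:

```python
import os


def workers_minus_two(ncpu):
    """Mimic `nproc --ignore=2`: drop two CPUs, never going below one."""
    return max(1, ncpu - 2)


def workers_half(ncpu):
    """Use half of the CPUs (integer division), never going below one."""
    return max(1, ncpu // 2)


if __name__ == "__main__":
    ncpu = os.cpu_count() or 1  # cpu_count() may return None
    print(ncpu, workers_minus_two(ncpu), workers_half(ncpu))
```

On an 8-CPU node the first rule gives 6 workers and the second gives 4, which is why halving is the more cautious first step on larger nodes; at 4 CPUs both give 2.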
chmouelare we doing the tabs are evil in shell scripts?14:25
chmouelhttps://review.openstack.org/#/c/69256/3/devstack-vm-gate-wrap.sh14:25
russellbyes14:25
russellbi just screwed it up14:25
russellbi love doing a bunch of iterations on a trivial patch, heh14:26
anteayaa great start to the morning14:26
*** jeckersb has joined #openstack-gate14:27
sdagueanteaya: honestly, we're not really using a tracker for er14:29
anteayaokay14:30
anteayaI found a bug in the docs14:30
anteayahttp://docs.openstack.org/infra/elastic-recheck/readme.html14:30
anteayacontains a link: http://docs.openstack.org/developer/elastic-recheck which is broken14:31
anteayaand I found 4 e-r bugs filed against infra's launchpad, fyi14:31
anteayado you want me to just offer an e-r docs patch that removes the broken link for now?14:32
sdagueyeh, I'm not surprised, we're not really using it :)14:32
sdagueanteaya: yes please14:32
anteaya:D14:32
anteayacan do14:32
*** mriedem has joined #openstack-gate14:37
*** mriedem has left #openstack-gate14:37
*** mriedem has joined #openstack-gate14:37
mriedemwhat did i miss?14:37
anteayamriedem: we waited for you14:38
mriedemwell that's nice :)14:39
anteaya:D14:39
anteayaalso logs: http://eavesdrop.openstack.org/irclogs/%23openstack-gate/14:39
russellbmriedem: you missed everything14:40
mriedemi saw the logs, that's a lie14:41
mriedemsomething about shell scripts within python...14:41
mriedemor vice-versa14:41
mriedemi got to the parking lot at work today and then decided my car might not start tonight, so went back home14:41
*** licostan has joined #openstack-gate14:41
russellbmriedem: from the cold?14:42
mriedemyeah, -15 real temp, -40 wind chill14:42
russellbeep14:42
mriedemnot too bad14:42
russellbi'm in the deep south, and we may get 2" of snow this week14:42
mriedemhells bells14:42
mriedemcall off school14:42
russellb25 years ago was the last time i remember real snow on the ground here, heh14:42
russellbpretty much :)14:42
russellbmilk and bread will be gone14:42
russellbpanic everywhere, shut the city down14:43
mriedemdon't forget bullets and booze14:43
russellbthough they do that even at the threat of snow/ice14:43
mriedemyou know it's bad when you step on dog shit in the back yard and think it was a rock14:43
anteayamriedem: where are you?14:43
mriedemanteaya: rochester, minnesota14:44
anteayaah nice14:44
anteayaalmost canada14:44
mriedemsort of, not really, it could be worse - north end of the state near the canadian border is always -4014:44
mriedemgod's country14:45
mriedemiron mines and bearded women :)14:45
*** licostan has left #openstack-gate14:45
mriedemalright, back to fingerprinting nova bugs - lots of these are old/fixed by now i'm finding14:46
anteayaI am trying my hand at finding fingerprints for neutron unit test failures. Here is my first attempt at a fingerprint: http://bit.ly/1esexsm for this failure: http://logs.openstack.org/71/60571/3/gate/gate-neutron-python27/eb0985e/14:53
anteayasdague: any feedback?14:53
sdagueanteaya: looking14:54
sdagueanteaya: lgtm14:55
anteayathanks, I'll do up a patch with this query14:56
anteayaoh I guess I need to file a bug first, if there isn't one14:56
sdagueyep14:57
mriedemanteaya: you're missing a colon after 'message'15:00
mriedemshould be: message:"delete_port() got an unexpected keyword argument 'l3_port_check'" AND filename:"console.html"15:00
mriedemseems to be hitting though...maybe message is implied15:00
mriedemyou could further restrict the build_name to only the neutron unit test jobs15:01
anteayaI can add the colon15:04
mriedemrussellb: sdague: seems we should handle ec2 failure responses at a level higher than debug? http://logs.openstack.org/87/44787/16/check/check-tempest-devstack-vm-neutron/d2ede4d/logs/screen-n-api.txt.gz?#_2013-10-25_18_06_26_21715:05
anteayaI could restrict the build_name too15:05
mriedemsdague: because logstash doesn't index on debug level messages right? only INFO and higher?15:05
sdaguemriedem: sure15:05
sdagueyou did get a 404 on the request line15:06
anteayait seems filename:"console.html" works as does filename:console.html15:06
sdagueanteaya: yes15:06
mriedemanteaya: yeah, i think the quotes are only if there are spaces15:06
anteayaah15:07
sdaguemriedem: so are you sure that not found is actually an issue?15:07
mriedemanteaya: also, wildcards will work in logstash (kibana) but not elastic-recheck15:07
mriedemsdague: yeah, well at one point it caused a check failure15:07
mriedemin a tempest boto test15:07
mriedemsdague: https://bugs.launchpad.net/nova/+bug/124476215:07
mriedemhowever, that test hasn't failed in the gate in the last 2 weeks15:07
anteayamriedem: yes, was reading that15:08
mriedemanteaya: i need to get the INFO level indexing restriction into the e-r readme also15:08
anteayaokay15:08
anteayaI will vote on that patch when you have it up15:09
sdaguemriedem: yeh, if we haven't seen it in the gate in the last 2 weeks, I wouldn't worry about it15:09
anteayaI don't know about the INFO level indexing restriction15:09
mriedemanteaya: here is the query for that UT fail i came up with: http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiZGVsZXRlX3BvcnQoKSBnb3QgYW4gdW5leHBlY3RlZCBrZXl3b3JkIGFyZ3VtZW50ICdsM19wb3J0X2NoZWNrJ1wiIEFORCBmaWxlbmFtZTpcImNvbnNvbGUuaHRtbFwiIEFORCAoYnVpbGRfbmFtZTpcImdhdGUtbmV1dHJvbi1weXRob24yNlwiIE9SIGJ1aWxkX25hbWU6XCJnYXRlLW5ldXRyb24tcHl0aG9uMjdcIikiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6ImFsbCIsImdyYXBobW9k15:09
sdagueanteaya: for throughput reasons we are only logging at INFO level and up15:09
mriedemanteaya: i had some e-r notes in a wiki here but need to move anything novel in there into the e-r readme: https://wiki.openstack.org/wiki/ElasticRecheck15:09
sdagueotherwise we overwhelm the search cluster15:09
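The indexing rule described here (only INFO and above reaches the search cluster) can be sketched as a simple level check. This is an illustration only, assuming the standard Python level names; `is_indexed` and `LEVELS` are hypothetical and do not describe how the logstash pipeline is actually configured:

```python
import logging

# Standard Python log levels, keyed by the name that appears in a log line.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def is_indexed(level_name, threshold=logging.INFO):
    """Return True when a level name is at or above the indexing threshold."""
    try:
        level = LEVELS[level_name.upper()]
    except KeyError:
        raise ValueError("unknown log level: %r" % level_name)
    return level >= threshold
```

A fingerprint built on a DEBUG message would never match under this rule, which is why the queries in this log target INFO, WARNING, or ERROR lines.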
mriedemi'll get an e-r docs patch up for the INFO level restriction15:10
sdaguemriedem: thanks15:10
anteayasdague: ah15:11
anteayamriedem: can I get the query link as a shorter url?15:11
mriedemsdague: here is an e-r query i wrote for a nova bug last night, i couldn't find anything better in the logs since it's a timeout: https://review.openstack.org/#/c/69242/15:11
mriedemanteaya: sure, sec15:11
anteayathe weechat doesn't do well with multiline links15:11
anteayathanks15:11
sdaguemriedem: is that the one that russellb wanted to split in half?15:12
mriedemsdague: there aren't any comments on it15:12
mriedemanteaya: http://goo.gl/wFbs7315:13
* mriedem needs more coffee15:14
anteayamriedem: thanks15:14
anteayaah you went the long route for build name since there are no wild cards, thanks for showing me that, I was wondering how to address it15:15
russellbthat looks different i believe15:15
anteayawill use that query15:15
mriedemanteaya: yeah, i found out the hard way about no wildcard support in the e-r queries15:18
mriedemwildcards are disabled by default in ElasticSearch15:18
mriedemfor performance reasons15:18
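The "long route" for build names mentioned above, listing each job explicitly instead of using a wildcard, can be sketched as a small query builder. This is only an illustration: `quote` and `build_query` are hypothetical helpers, not part of elastic-recheck, and whether wildcards are really unsupported is questioned later in this log. The field names (`message`, `filename`, `build_name`) are the ones used in the queries quoted in the channel.

```python
def quote(value):
    """Wrap a value in double quotes, escaping embedded double quotes."""
    return '"' + value.replace('"', '\\"') + '"'


def build_query(message, filename, build_names):
    """Build a query string that names every job instead of using a wildcard."""
    parts = ["message:" + quote(message), "filename:" + quote(filename)]
    if build_names:
        jobs = " OR ".join("build_name:" + quote(name) for name in build_names)
        parts.append("(" + jobs + ")")
    return " AND ".join(parts)


if __name__ == "__main__":
    print(build_query(
        "delete_port() got an unexpected keyword argument 'l3_port_check'",
        "console.html",
        ["gate-neutron-python26", "gate-neutron-python27"],
    ))
```

Grouping the job names in parentheses keeps the OR from swallowing the message and filename clauses.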
*** dansmith has joined #openstack-gate15:18
*** dtroyer has joined #openstack-gate15:23
anteayamakes sense15:26
ttxsdague: I have now 90min to dedicate to the bugday (sorry was late due to credit card being abused), anything specific I could jump in ?15:30
sdaguettx: sure, pick a job off of the list - http://status.openstack.org/elastic-recheck/data/uncategorized.html and try to build a bug & fingerprint15:31
ttxsdague: ok15:32
*** markmcclain has joined #openstack-gate15:34
russellbdoes anyone remember what the blockers are for getting us moved to cloud-archive?15:34
russellbin particular, we need to run on a newer libvirt15:34
russellbi'd like to do that as one of the next steps on the libvirt related bugs we're still seeing15:35
mriedemrussellb: wasn't there an etherpad with the patches around that?15:35
mriedemfew weeks ago15:35
mriedemtop 4 fails or something at the time?15:35
russellbcould be, trying to remember / dig it back up15:35
mriedemme too15:35
mriedemrussellb: https://etherpad.openstack.org/p/nova-gate-issue-tracking15:36
russellbha i started that etherpad ...15:37
*** mtreinish has joined #openstack-gate15:37
mriedemhttps://bugs.launchpad.net/nova/+bug/122897715:37
mriedemyeah :)15:37
*** ndipanov has joined #openstack-gate15:38
mriedemrussellb: sounds like danpb is working on a fix15:38
*** SergeyLukjanov is now known as SergeyLukjanov_15:38
*** markmcclain has quit IRC15:38
russellbndipanov: hey, so we were just talking about the blockers for newer libvirt ... some notes on https://etherpad.openstack.org/p/nova-gate-issue-tracking15:38
russellbndipanov: in particular, take a look at the notes under the libvirt bug15:38
ndipanovrussellb, thanks - will take a look now15:39
russellbndipanov: current known blocker is https://bugs.launchpad.net/nova/+bug/122897715:40
ndipanovrussellb, thanks - will read up as soon as I get off this call15:42
russellbk15:43
ttxsdague: OK, filed a bug and a fingerprint, any way to mark that bug as done ?15:43
ttxsdague: https://bugs.launchpad.net/openstack-ci/+bug/127328315:43
mriedemttx: that fingerprint seems pretty loose15:44
*** markmcclain has joined #openstack-gate15:44
mriedemthought we already had some like that15:44
ttxmriedem: there were other bugs on jenkins exceptions, but they matched other exceptions15:45
ttxlike "Interrupted"15:45
mriedemyeah, seeing that now15:45
mriedemand init failure on MasterComputer15:45
ttxmriedem: am open to suggestions on making it less loose, but so far it catches the right stuff15:45
ttxI suspect it's just a transient issue when things fall apart for other reasons (like a restart)15:46
ttxbut better have those 7 out of the other lists15:46
mriedemttx: yeah, looks sane15:46
mriedemgood thing openstack isn't written in java :)15:46
ttxmriedem: so, is there a way to prevent those hits from appearing as uncategorized ? Or will the list autorefresh at some point ?15:47
mriedemttx: i'm not sure how often that list is updated, sdague or jog0 would know15:47
mriedemttx: as for closing the bug, it'll just be a placeholder, right? like the other 2 jenkins fail bugs we already track for random env issues.15:48
mriedemttx: want me to push up the e-r query patch for it?15:48
ttxmriedem: probably useless in that case -- just pushed a fingerprint to do as instructed15:49
mriedemttx: well having the bug and e-r query will/should prevent duplicate bugs when someone hits this hiccup15:49
mriedemso they can recheck and then maybe move to another node15:49
mriedemi'd say this is worthwhile15:50
*** roaet has joined #openstack-gate15:52
*** mestery has joined #openstack-gate15:53
ttxmriedem: is there anything else I should do to get the one I debunked off the list ?15:53
ttx(i suspect posting a bug with a fingerprint is not enough)15:54
mriedemttx: write a query in elastic-recheck15:55
mriedemttx: see anteaya's one from this morning as an example: https://review.openstack.org/#/c/69386/15:55
mriedemttx: if you don't have the time i can write it up15:55
ttxmriedem: ok, thanks!15:55
ttxmriedem: well, the idea is that I take the time to help you rather than the other way around :)15:56
ttxwill push15:56
*** ndipanov has quit IRC15:57
*** SergeyLukjanov_ is now known as SergeyLukjanov15:58
russellban e-r update -- https://review.openstack.org/6939115:58
russellbactually ... going to tweak it a touch more15:59
*** dhellmann is now known as dhellmann_16:00
sdaguettx: we're updating the uncategorized list every 60 minutes (or on code merge to r) top bugs list is every 15 mins IIRC16:01
ttxsdague: ok, I'll just propose an e-r query16:01
mriedemrussellb: while you're tweaking, i made a comment16:02
russellbheh16:02
russellbi just updated16:02
russellbwill look16:02
russellbmriedem: i think my update makes the AND comment no longer relevant16:03
russellbat least for that one16:03
mriedemrussellb: yeah, but it's in your other one now16:04
russellbyep, fixed now16:04
mriedem+116:05
russellbthanks16:05
russellbseems the instances case may be due to a kernel bug16:06
*** HenryG has quit IRC16:06
mriedemrussellb: is there something more specific in the n-cpu logs?16:07
russellbmriedem: that was my next step (at least for the volumes bug i just filed)16:08
*** ndipanov has joined #openstack-gate16:11
russellbsalv-orlando: have you talked to anyone about getting a kernel upgrade on our test nodes?16:11
ndipanovrussellb, If I read this right (wrt to libvirt 1.0.6 being used in the gate) it's a libvirt bug that is being worked on16:11
russellbndipanov: wasn't sure if it was a libvirt bug or a nova bug16:11
ndipanovrussellb, as per danpb - it's a libvirt bug...16:12
russellbah, ok16:12
sdaguendipanov: so the version of libvirt in cloud archive is actually 1.1.116:12
russellbso may be a while before we can update then16:12
sdagueso nova would need to work with that16:12
sdaguethen we could get that in the gate16:13
ndipanovsdague, and it doesn't?16:13
russellbblew up last time we tried16:13
russellbthough it was actually the unit test problem that was the main issue16:13
russellband that is now resolved16:13
russellbnot sure if this other bug is a blocker for upgrading or not?16:13
sdaguedims had an experimental patch out there, it was failing16:14
mtreinishrussellb: on https://review.openstack.org/#/c/69391 can you add a related-bug line to the commit message then I'll push it through16:14
russellbmtreinish: yes16:14
ndipanovsdague, any chance you have a link to the review?16:14
russellbmtreinish: done16:15
sdagueyeh, let me find it16:15
mtreinishrussellb: ok approved16:16
russellbmtreinish: thanks!16:16
russellbsdague: i think we need to try to get kernel upgraded on our nodes for https://bugs.launchpad.net/nova/+bug/125489016:16
sdaguehttps://review.openstack.org/#/c/67564/16:16
ttxJust pushed https://review.openstack.org/#/c/69398/16:16
russellbi just rechecked that one, see if it has improved with patches merged in the last week16:17
sdaguecool16:17
russellbfungi: see my comment above to sdague16:17
dimssdague, russellb bad lockup in libvirt - https://bugzilla.redhat.com/show_bug.cgi?id=92941216:17
mtreinishmriedem: on: https://review.openstack.org/#/c/69242/1 that sounds like something tempest would try during a negative test16:17
mtreinishthat query doesn't cause false positives16:17
dimssdague, russellb - we can't upgrade to 1.1.116:18
dimsof libvirt16:18
russellbdims: ah thanks for the link!  ndipanov ^^^^16:18
fungidims: unless the reason libvirt is breaking is *because* it needs a newer kernel16:18
mriedemttx: wildcards don't work in e-r queries :(16:18
ndipanovrussellb, yeah - it's linked in the LP bug16:18
ndipanovI saw that16:18
russellbk16:18
fungibut sounds like not16:18
russellbfungi: so my kernel comment was related to this bug where salv-orlando is seeing kernel crashes related to network namespace operations16:19
russellband i've been told there have been fixes in that code since the kernel we're using16:19
ttxmriedem: you mean the '*' I added in my build_name ?16:19
mriedemttx: yup16:19
mriedemjust commented in your review16:19
ttxmriedem: copied it from queries/1272511.yaml16:20
fungirussellb: right. i was just commenting that it was also suggested that some of our issues with newer libvirt may also be related to running too old of a kernel16:20
ttxmriedem: is that one bad too ?16:20
russellbfungi: oh ok16:20
mriedemttx: maybe..16:20
russellbfungi: what can I do to help with a kernel upgrade?16:20
mriedemttx: although 1272511 does show up here: http://status.openstack.org/elastic-recheck/16:20
ttxmriedem: there are 5 queries using * is build_name, fwiw16:20
ttxin*16:20
fungirussellb: which jobs is it impacting? i'm waist deep in wrangling a nodepool/jenkins issue at the moment so little time to dig through scrollback16:21
mriedemttx: hmmm, well now my world is collapsing16:21
russellbfungi: sorry.  devstack-gate basically16:21
russellbfungi: in particular, the ones using neutron16:21
ttxmriedem: sorry about that ;)16:21
fungirussellb: okay, bit of a challenge there. newer kernels mean reboots. we don't currently have a reboot phase on nodepool node creation (though maybe upgrading the kernel on the image build will be sufficient to cause launched nodes to use a newer kernel, in which case just having devstack require a suitable kernel deb ought to be fine?)16:23
mriedemmtreinish: for https://review.openstack.org/#/c/69242/ - yeah, maybe, but in logstash for the query it's all fails16:24
mriedemmtreinish: not sure if we have negative rebuild tests?16:24
mtreinishmriedem: neither do I, let me check16:24
sdaguefungi: so I think we know we have a kernel bug, but we don't know what the fix it16:24
sdagueis16:25
russellbfungi: yeah, i guess i was thinking just upgrading the kernel on the base image used for the dsvm nodes ...16:25
russellbsdague: well, there are known fixes in this kernel code16:25
russellbsdague: so i think first step we just need to see what kind of upgrade we can do without much pain16:25
russellbi think ubuntu has newer kernels available for LTS for hardware enablement, so we just need to use one of them16:25
sdagueyeh, that's true16:26
russellb"just need to"  .. i say it like it's simple since i don't know how to do it16:26
sdagueok, I can investigate this afternoon.16:26
russellbk, i'm happy to help out16:27
mtreinishmriedem: from what I can see it's just test_rebuild_reboot_deleted_server and test_rebuild_non_existent_server16:27
mtreinishon the negative test side of things16:27
dimsfungi, Daniel Berrange has confirmed a code issue in libvirt and we have logs to prove it16:27
sdaguedims: so do we know a fix strategy?16:28
mriedemmtreinish: hmm, ok, open to suggestions - unfortunately there are several test_rebuild_server* tests failing with timeouts but there isn't really anything great to fingerprint on, besides that one n-api log message16:28
russellbsdague: he's working on it16:28
mriedemmtreinish: they timeout while waiting for the instance to rebuild16:29
mriedemso nova is doing its thing, but apparently not quick enough16:29
fungidims: ahh, yes, now i recall he finally responded on that bug16:29
*** SergeyLukjanov is now known as SergeyLukjanov_16:30
dimssdague, not yet. will have to ping Daniel16:31
mtreinishmriedem: no that query is probably fine16:31
mtreinishI was just worried that we had a test trying to do that16:31
mtreinishbut we don't16:31
*** jgriffith has joined #openstack-gate16:36
*** coolsvap has joined #openstack-gate16:37
anteayaall 24 remaining neutron unit test unclassified failures should be addressed once this is merged: https://review.openstack.org/#/c/69400/116:39
anteayaanyone working on gate-grenade-dsvm unclassified failures yet?16:40
ttxanteaya: i'm on it but will stop soon16:41
mriedemttx: there is a 33% success rate in the last 7 days with this: https://review.openstack.org/#/c/69398/16:41
anteayattx I can switch to another category16:41
anteayattx and let me know when you change focus16:41
ttxmriedem: looking16:42
anteayaI'll work on this list for now: gate-tempest-dsvm-neutron16:42
ttxmriedem: there seems to be one case where that failure is not propagated yes. Probably best to leave it out then16:44
ttxoh. ah.16:46
ttxI think I understand where this bug comes from though16:48
ttxhaha.16:50
*** SergeyLukjanov_ is now known as SergeyLukjanov16:54
anteayasdague: do you know which of cyeoh's patches address salv-orlando's earlier question regarding Info cache for instance <instance #> could not be found17:01
anteayahttps://review.openstack.org/#/dashboard/529217:01
anteayaI'm seeing the same error in an unclassified log and am trying to find the correct bug for it17:02
mriedemanteaya: that info cache one is fixed17:03
mriedemsec17:03
*** SergeyLukjanov is now known as SergeyLukjanov_17:03
mriedemanteaya: fixed with this: https://review.openstack.org/#/c/65374/17:03
mriedemcyeoh: ^17:03
anteayafound it https://bugs.launchpad.net/nova/+bug/125618217:03
mriedemyou'll see that bug drops off elastic-recheck once that was merged17:03
ttxOK, submitted one sig and debunked one bug. Got to get the kids at school. Sorry I couldn't contribute more :)17:04
mriedemttx: thanks for helping17:04
ttxmriedem: I abandoned that second sig, it's actually all bug 1097592 now17:05
anteayattx thanks, see you later17:05
anteayamriedem: okay 65374 was merged on the 25th, but the fingerprint I have for that failure is still collecting some failures, including on the 27th: http://bit.ly/1b1bpTo17:09
anteayanow all the failures I have are from neutron17:09
mriedemanteaya: i seem to remember seeing a nova bug in triage last night that was for an info_cache not found failure that was novel17:10
mriedemwill dig in a sec17:10
anteayathanks17:10
mriedemsdague: mtreinish: russellb: another e-r query for libvirt connection reset: https://review.openstack.org/#/c/69415/17:13
mriedemslightly different than the one we see more often17:13
anteayamriedem: might this be the one? https://bugs.launchpad.net/nova/+bug/107201417:14
mriedemanteaya: not the one i saw17:15
* mriedem looks now17:15
anteayano sorry that was from November 27th, 201217:16
*** gsamfira has quit IRC17:17
mriedemanteaya: maybe this? https://bugs.launchpad.net/nova/+bug/124906517:18
mriedemthere are actually 17 hits when searching launchpad for nova bugs with info_cache17:18
anteayayes, I have been wandering among them17:19
anteayathat was where i found the dusty one from 201217:19
mriedemanteaya: yeah, http://goo.gl/92G9U217:19
anteayait looks like a good candidate17:21
anteayaI have to change locations, pay for a cab, get boarding passes and have my privacy violated by the TSA17:21
anteayaI'll be back in a bit17:21
mriedemenjoy17:23
*** HenryG has joined #openstack-gate17:24
*** SergeyLukjanov_ is now known as SergeyLukjanov17:26
*** jog0 has joined #openstack-gate17:30
mriedemyet another libvirt connection fail query: https://review.openstack.org/#/c/69418/17:31
*** markmcclain has quit IRC17:31
*** markmcclain has joined #openstack-gate17:32
mriedemanteaya: this could also be a large ops race fail related to info cache not found: https://bugs.launchpad.net/nova/+bug/122714317:33
*** markmcclain has quit IRC17:33
*** markmcclain has joined #openstack-gate17:33
mriedemalthough that's grizzly...so probably nevermind17:33
*** jpich has quit IRC17:36
*** SergeyLukjanov is now known as SergeyLukjanov_17:41
jgriffithrussellb: I'm thinking of proposing a bump up on the num_scan_tries for that bug17:42
jgriffithrussellb: if nothing else to see if we can impact it17:42
jgriffithrussellb: trying some other setups to see if that's rational or not17:43
russellbOK17:43
jgriffithrussellb: the attach makes it down, everything looks "ok" and we connect successfully17:43
jgriffithrussellb: we just don't get the device mapped via libvirt17:43
jgriffitherr... open-iscsi17:44
russellbjust one of those being impatient issues?17:44
jgriffithrussellb: not fully convinced yet, but possibly17:44
jgriffithrussellb: I mean there's something not right in the time it takes, but I'm checking to see if we're borderline in good cases17:44
jgriffithrussellb: ie logstash on retries for that op17:44
russellbjgriffith: note that i think someone said earlier that you can't query debug in logstash17:45
russellbnot sure if you get an INFO or higher message for that, haven't looked17:45
mriedemrussellb: you can query against INFO+17:46
*** Alexei_987 has quit IRC17:46
russellbmriedem: thanks17:46
mriedemhttps://review.openstack.org/#/c/69388/17:46
jgriffithrussellb: there's a warning for it so that *should* work17:48
russellbcool17:49
jgriffithinteresting: message:"ISCSI volume not yet found at"  over last 7 days = 31017:55
jgriffithquite a distribution between how many tries it takes, not what I expected17:56
russellbinteresting17:56
russellbso that supports your theory17:56
jgriffithrussellb: seems to, but I'm curious why we have the variance17:57
* russellb blames the cloud17:57
jgriffithrussellb: indeed17:57
russellbhonestly, i've seen tons of variance in how long things take causing failures17:57
jgriffithrussellb: at any rate I'll look at a sane adjustment for the retries, either by upping the default count or adjusting localrc17:57
jgriffithrussellb: well, we are supposed to expect that eh? :)17:58
russellband it seemed to be worse before turning down tempest concurrency17:58
jgriffithrussellb: for sure17:58
jgriffithrussellb: huge drop after the 21st17:58
russellbthat sounds about right for the concurrency merge17:58
jgriffithrussellb: in that query alone17:58
jgriffithrussellb: alright, I'll play with some things and bump the default in nova's conf17:59
russellbjgriffith: k, ping me if/when you need a review17:59
jgriffithrussellb: will do17:59
jgriffithrussellb: thakns17:59
russellbthank _you_17:59
jgriffiththanks even17:59
sdaguethe concurrency merge was the 16th18:02
* jgriffith is covered by sdague 's wet blanket18:03
jog0wow 95% classification rate!18:03
sdagueBug 1270608 - n-cpu 'iSCSI device not found' log causes gate-tempest-dsvm-*-full to fail went away when I disabled a tempest test18:03
jog0and 56 bugs :(18:04
sdaguejgriffith: https://review.openstack.org/#/c/67991/18:04
sdagueso that test is really good at exposing that bug18:04
sdagueit could be a test bug18:04
jgriffithsdague: yes, I remember that one now, and yes completely explains the log query18:04
jgriffithsdague: Yup, and it'll give me some data on the timing assuming it's the same18:05
sdagueit would be good to figure out if this is a real cinder issue18:05
jgriffithsdague: it's not18:05
jgriffithsdague: it's load on the compute node18:05
jgriffithsdague: at least the case I'm looking at now18:05
jgriffithsdague: has NOTHING to do with Cinder at all18:05
jgriffithsdague: or nova for that matter18:05
jgriffithsdague: strictly slow iscsi connect18:05
sdaguejgriffith: so under load, how do we fall over?18:06
jgriffithwe give it 8 seconds, and sometimes that's not enough18:06
sdagueok18:06
sdagueso can we adjust that to a more real timeout that makes sense?18:06
russellbyeah that sounds aggressive for these loaded test nodes18:06
jgriffithsdague: it looks like on average in our gate runs we're going past 4 seconds anyway18:06
jgriffithsdague: yes18:06
russellbdang, yeah then 8 isn't enough :)18:06
jgriffithsdague: I'm just trying to decide what's most sane18:06
sdaguejgriffith: ok, cool18:06
jgriffitheither change the default or change the sleep factor18:07
jgriffithsleep factor is currently two, considering bumping it to 418:07
russellbjgriffith: which code is this18:07
russellbnm found it18:08
jgriffithrussellb: nova.virt.libvirt.py:L#31718:08
russellblibvirt/volume.py right?18:08
jgriffithrussellb: yes sir18:08
russellbso we'll loop 3 times right?  so we'll sleep 2 seconds, 4 seconds, then 8 seconds ... so 14 seconds total i think18:10
jgriffithrussellb: well it's **18:10
jgriffithso 1, 4, 918:10
russellbdo'oh, meant 9 :-p18:11
* russellb can't do math apparently18:11
jgriffithalright I did the same thing when I first viewed it18:11
russellbso, i don't think we should do ** 4 ...18:11
russellbbecause it's a hard sleep18:11
russellbif we changed the default retries to 4, we'd double the time we wait18:12
russellbmaybe just change retries to 5 from 3?18:12
jgriffithrussellb: I'm leaning towards upping the sleep factor18:13
jgriffithrussellb: so that way we don't have to mess with chaning config18:13
jgriffithchaing18:13
jgriffithgrrr... changing18:13
russellbit's just a config default though, that's not a huge deal18:13
sdagueyeh, I'd got with more loops18:13
sdagues/got/go/18:13
jgriffithrussellb: tru'dat18:13
jgriffithrussellb: sdague fair18:14
sdaguebecause then on fast envs it will be fast, and on slow envs it will not blow up18:14
russellbif we did **4 ... it'd be 1, 16, 81 sleeps18:14
*** SpamapS has joined #openstack-gate18:14
russellbseems a bit aggressive on the ramp up18:14
sdagueyeh18:15
jgriffithrussellb: I'll bump that up then, my only point was that we already hit retries on a regular basis18:15
russellb1, 4, 9, 16 2518:15
russellbseems better18:15
jgriffithrussellb: why have warning that we're retrying when we know we're going to hit them?18:15
jgriffithjust sec.. phone call18:15
russellbjgriffith: dunno, seems like we should only warn if we give up18:15
russellbmaybe debug on retry18:15
jog0jgriffith: are we hitting retries outside of gate in production too or just in gate?18:15
jog0due to load etc18:15
jgriffithrussellb: then we'd be screwed18:15
russellborly18:16
jgriffithrussellb: ie like today trying to get info on this18:16
russellbi think the early retries are reasonable for production machines though18:16
jgriffithrussellb: personally I like the warning, gives an indication that you need to bump the value up18:16
russellbshouldn't optimize for our loaded test env18:16
russellbthat's fine, we can leave it18:16
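
The retry behaviour debated above can be sketched as a small loop. This is a minimal illustration, not the nova code itself: `wait_for_device`, `DeviceNotFound`, and the injected `device_exists` and `sleep` callables are hypothetical names chosen for the example. It logs a warning on every retry (the behaviour jgriffith wants to keep) and sleeps `tries ** 2` seconds between checks.

```python
import logging
import time

LOG = logging.getLogger(__name__)


class DeviceNotFound(Exception):
    """Raised when the device never shows up (hypothetical exception)."""


def wait_for_device(device_exists, num_scan_tries=5, sleep=time.sleep):
    """Poll device_exists() and back off tries ** 2 seconds between checks.

    Returns the number of retries that were needed. device_exists and
    sleep are injected so the loop can be exercised without real hardware.
    """
    tries = 0
    while not device_exists():
        if tries >= num_scan_tries:
            raise DeviceNotFound("gave up after %d tries" % tries)
        tries += 1
        # Warn on every retry so operators can see when the limit is too low.
        LOG.warning("device not yet found, retry %d of %d",
                    tries, num_scan_tries)
        sleep(tries ** 2)
    return tries


if __name__ == "__main__":
    slept = []
    checks = iter([False, False, True])  # device appears on the third check
    retries = wait_for_device(lambda: next(checks), sleep=slept.append)
    print(retries, slept)
```

Because the loop exits as soon as the device appears, raising `num_scan_tries` costs nothing on a fast host and only extends the wait on a slow one, which is the argument sdague makes for adding loops rather than lengthening each sleep.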
jgriffithrussellb: You're the nova folks, totally your call18:17
russellbk :)18:17
russellblet's just bump retries18:17
jgriffithrussellb: perfecto18:17
jgriffithrussellb: doing it now18:17
russellbgreat18:17
russellb== 5 gives us a total timeout of almost 1 minute18:18
russellbwhich seems sane ...18:18
jog0it does?18:18
jgriffithrussellb: I agree fully18:18
jog0note: not saying its insane18:18
russellbjog0: 5 retries will result in sleeps of 1, 4, 9, 16, 25 seconds18:18
russellb=== 55 seconds18:18
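
The arithmetic in this exchange is easy to check. The sketch below assumes, as the discussion suggests, that attempt `n` sleeps `n ** exponent` seconds and that every attempt fails; `sleep_schedule` and `worst_case_wait` are illustrative names, not nova functions. Strictly speaking the schedule is quadratic rather than exponential.

```python
def sleep_schedule(num_scan_tries, exponent=2):
    """Sleep lengths in seconds when every attempt fails: 1**e, 2**e, ..."""
    return [attempt ** exponent for attempt in range(1, num_scan_tries + 1)]


def worst_case_wait(num_scan_tries, exponent=2):
    """Total time spent sleeping before giving up."""
    return sum(sleep_schedule(num_scan_tries, exponent))


if __name__ == "__main__":
    # (tries, exponent): the 3-try default, the rejected ** 4 idea,
    # and the 5-try proposal.
    for tries, exponent in [(3, 2), (3, 4), (5, 2)]:
        print(tries, exponent,
              sleep_schedule(tries, exponent),
              worst_case_wait(tries, exponent))
```

Under these assumptions three tries wait 14 seconds in total, five tries wait 55 seconds, and raising the exponent to 4 would put the third sleep alone at 81 seconds.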
jgriffithrussellb: sdague I'll run this on the ssh tests and see how things go there18:18
jog0ohh exponential backoff yeah that seems sane18:19
mriedemis this the libvirt iscsi connection fail we're talking about?18:19
jgriffithrussellb: sdague maybe we'll kill two bugs with one patch18:19
russellbnice18:19
mriedem1270608?18:19
jgriffithmriedem: yes18:19
russellblove it when "wait longer" is the solution, heh18:19
jgriffithrussellb: :P18:19
mriedemha, bknudson suggested waiting longer a couple weeks ago but we didn't think that would fly :)18:20
russellbmriedem: d'oh18:20
mriedemoh well18:20
russellbsometimes it is the right answer.18:20
jgriffithkudos to bknudson18:21
jog0sdague: you have a patch up to make http://status.openstack.org/elastic-recheck/data/uncategorized.html discoverable from http://status.openstack.org/elastic-recheck/ ?18:22
*** dhellmann_ is now known as dhellmann18:23
*** dhellmann is now known as dhellmann_18:23
sdaguejog0: no, not yet18:24
russellbso what builds the image used for dsvm nodes18:27
sdagueit's using devstack under the covers18:27
sdaguefungi or jeblair could probably explain18:28
russellbso, nodepool does it as a periodic task ... got that far :)18:30
funginodepool prep scripts (in the config repo) look in devstack's source to determine what to preinstall/cache18:30
russellbok cool18:31
fungiand also apply configuration from the puppet manifests which configure them to be consistent with other jenkins slaves18:31
jog0russellb: for bug 1254890 we can add a syslog based fingerprint18:38
jog0or open a new bug with a syslog fingerprint18:38
russellbsure, that'd be helpful18:38
russellbthe more targeted the queries/bugs the better i think18:38
salv-orlandojog0: both options are equivalent for me. It should be easy to handle the overlap18:40
*** ndipanov is now known as ndipanov_gone18:40
jog0salv-orlando: you want to write the query?18:42
jog0as you have dug into this bug more than me18:42
salv-orlandosure jog0… so just to make sure, shall we add the query to a new bug or still to 125489018:47
salv-orlando?18:47
jgriffithrussellb: mriedem sdague https://review.openstack.org/#/c/69443/18:53
*** SergeyLukjanov_ is now known as SergeyLukjanov18:53
jgriffithmriedem: sdague might be worth trying to put boot_from_vol test back in after this lands18:53
russellbjgriffith: +218:54
russellbjgriffith: will chase an approval after check finishes18:54
jgriffithrussellb: sounds good18:55
mriedemjgriffith: russellb: sdague: ok, we just need to restore this: https://review.openstack.org/#/c/69203/18:57
sdaguejgriffith: sounds good19:01
sdaguemriedem: can you restore that now?19:01
sdagueand we'll run recheck on it a few times after the nova code lands19:01
jgriffithsdague: excellent19:01
sdaguejgriffith: thanks for diving on this19:01
russellbso, the kernel upgrade thing ... rev1 -- https://review.openstack.org/6944519:01
russellbtotally untested19:02
mriedemsdague: sure19:02
jog0salv-orlando: your call on new bug or not19:02
salv-orlandojog0: ok thanks19:02
russellbsalv-orlando: see my patch above ... trying to figure out how to get kernel upgraded for the crashes you're seeing19:03
jog0russellb: nice, we should look into getting new libvirt in as well19:03
jog0I think dims was working on that19:03
jog0dims: ^19:03
russellbyeah dims is on that19:03
russellbit's blocked by a libvirt bug19:03
jog0russellb: which one?19:03
jgriffithrussellb: running on my box now19:03
dimsright19:03
russellbjog0: notes on https://etherpad.openstack.org/p/nova-gate-issue-tracking19:04
salv-orlandorussellb: let's hope that helps, otherwise we'll have to isolate the failure and find a way to work around it19:04
dimsjog0, https://bugzilla.redhat.com/show_bug.cgi?id=92941219:04
sdaguerussellb: yeh, that looks vaguely sane19:04
jog0dims: thanks19:04
russellbsdague: i'll take that :)19:04
salv-orlandoin other news, I haven't heard from canonical team any complaint… so perhaps their internal testing is not failing ;)19:04
salv-orlandoin which case, it might be worth finding some canonical guy and asking them which kernel version they are running.19:05
jog0russellb dims: do we have a launchpad BP where we are tracking libvirt 1.x ?19:05
russellbjog0: not that i know of19:05
jog0dims: you want the honors of creating one19:06
jog0so we can track this19:06
*** Ajaeger has quit IRC19:07
dimsjog0, don't see one. looking for something to model on19:11
jog0dims: it could be something like: support libvirt 1.x19:17
jog0and target for icehouse19:18
flaper87sdague: PS 12 seems to have passed all tests, PS 11 didn't19:19
flaper87I'll check that one19:19
dimsjog0, will do19:20
jog0dims: thanks.19:21
sdagueflaper87: yeh, I expect that this is a more deep seated race in glance unit tests19:21
flaper87sdague: indeed. I was looking at those tests the other day. Not sure where the race is but I don't think those asserts need to be there to begin with19:21
flaper87anyway, I'll figure this out19:22
anteayaback19:25
*** jmeridth has joined #openstack-gate19:26
sdagueflaper87: thanks!19:26
*** coolsvap has quit IRC19:33
russellbeep @ http://status.openstack.org/elastic-recheck/  bug 125762619:50
russellball the sudden blowing up again?19:50
russellbi did not approve of this19:50
sdaguerussellb: so that's in check queue19:51
sdaguethere is some big nova patch series that just pushed that blew up pretty universally on that test19:52
mtreinishsdague: so that means dansmith is at fault? :)19:52
sdagueprobably :)19:52
russellb1328 hits in the last 4 hours19:52
jog0message:"kernel BUG at /build/buildd/linux-3.2.0/fs/buffer.c:2917" AND filename:"logs/syslog.txt"19:52
jog0thats a lot of kernel bug hits19:52
dansmithit worked for me on my machine :)19:53
russellbdansmith: but yeah, looks like those are all on your patch series19:54
dansmithrussellb: only on mine?19:54
russellbdansmith: well ... so far that's what i see19:54
dansmithhmm19:54
jog0russellb: 1257626 looks like it legitimately came back19:54
jog0not all of them19:54
jog0 6769419:54
jog0http://logstash.openstack.org/#eyJmaWVsZHMiOltdLCJzZWFyY2giOiJtZXNzYWdlOlwibm92YS5jb21wdXRlLm1hbmFnZXIgVGltZW91dDogVGltZW91dCB3aGlsZSB3YWl0aW5nIG9uIFJQQyByZXNwb25zZSAtIHRvcGljOiBcXFwibmV0d29ya1xcXCIsIFJQQyBtZXRob2Q6IFxcXCJhbGxvY2F0ZV9mb3JfaW5zdGFuY2VcXFwiXCIgQU5EIGZpbGVuYW1lOlwibG9ncy9zY3JlZW4tbi1jcHUudHh0XCJcbiIsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50Iiwib2Zmc2V0IjowLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJtb2RlIjoidGVybXMiLCJhbmFseXplX2ZpZWxk19:55
russellbdansmith: the first 67694 is still technically the same patch series19:55
russellbit's based on one of dan's patches19:55
jog0russellb: ahh19:55
jog0russellb: so all of these failures are in the check queue19:56
* russellb nods19:56
jog0so looks like it hasn't hit gate yet19:56
russellbdansmith just rebased his patch series today (bunch of patches)19:56
russellbso that would explain the sudden huge appearance of those if it was something in there19:56
* dansmith is confused19:57
jog0russellb: https://review.openstack.org/#/c/69448/19:57
russellbcool19:58
jog0dansmith: https://jenkins03.openstack.org/job/gate-tempest-dsvm-large-ops/4499/console19:59
russellbhere's a count of the errors per patch -- http://logstash.openstack.org/#eyJmaWVsZHMiOltdLCJzZWFyY2giOiJtZXNzYWdlOlwibm92YS5jb21wdXRlLm1hbmFnZXIgVGltZW91dDogVGltZW91dCB3aGlsZSB3YWl0aW5nIG9uIFJQQyByZXNwb25zZSAtIHRvcGljOiBcXFwibmV0d29ya1xcXCIsIFJQQyBtZXRob2Q6IFxcXCJhbGxvY2F0ZV9mb3JfaW5zdGFuY2VcXFwiXCIgQU5EIGZpbGVuYW1lOlwibG9ncy9zY3JlZW4tbi1jcHUudHh0XCIiLCJ0aW1lZnJhbWUiOiIxNDQwMCIsImdyYXBobW9kZSI6ImNvdW50Iiwib2Zmc2V0IjowLCJ0aW1lIjp7InVzZXJfaW50ZX19:59
russellbJ2YWwiOjB9LCJzdGFtcCI6MTM5MDg1MjMxNzkzNCwibW9kZSI6InNjb3JlIiwiYW5hbHl6ZV9maWVsZCI6ImJ1aWxkX2NoYW5nZSJ919:59
russellbhrm19:59
russellbshorter: http://goo.gl/Uvw30f19:59
anteayathanks19:59
russellbdansmith: errors start occurring on this patch: https://review.openstack.org/#/c/66634/1020:00
russellbaccording to logstash *shrug*20:00
jog0http://logs.openstack.org/50/67550/5/check/gate-tempest-dsvm-large-ops/60a7a43/logs/screen-n-net.txt.gz?level=INFO20:00
dansmithI guess I have to wait for n-net logs, eh?20:01
jog0russellb: for https://review.openstack.org/#/c/69448/ I wanted to make sure you think adding that bug makes sense20:01
jog0dansmith: see n-net link above ^20:01
russellbso, nova-network timed out because it was blocking on conductor ...20:03
russellbjog0: sure, yeah20:05
dansmithrussellb: this is the bug sdague pointed me at before I left20:05
dansmithwhich I don't understand20:05
dansmithseems like a race during worker startup, but I don't know why conductor hits it and not n-api20:05
jog0russellb: cool thanks20:05
*** david-lyle has joined #openstack-gate20:06
russellbcould use a Related-bug: tag in the commit msg20:07
jog0russellb: can you add that to the review20:07
russellbdone20:07
russellbjog0: you see https://review.openstack.org/#/c/69445/ ?20:08
russellbadded it to that bug too20:09
jog0russellb: nice20:10
jog0if this works we should add something to the release notes about this bug20:11
russellbyeah20:11
jog0saying which kernel works20:11
jog0in fact maybe we should document in release notes what kernel we gate on20:11
russellbwish there was a good way to do a test deploy of this change20:11
russellbif this merges, it gets applied the next time nodepool rebuilds its base image, and that's for *everything*20:12
jog0fungi: ^ thoughts20:12
jgriffithsdague: mriedem FWIW I'm convinced that the boot-from-volume test failures were due to the same issue (virt attach timeout)20:14
jgriffithjog0: +1000 on publishing kernel20:14
mriedemjgriffith: cool, this should also hopefully help the DOS on cinder from tempest: https://review.openstack.org/#/c/69455/20:14
sdaguejgriffith: nice20:14
jgriffithjog0: should be in release notes/docs at a min, consider adding as a pre-req in install guide20:14
jog0jgriffith: lets file a bug about adding this for the docs team20:15
jgriffithmriedem: interesting20:15
fungijog0: well, it's worth noting that this change isn't going to fix the problem for anybody besides infra ci tests20:15
jgriffithjog0: sure, I'll do that now, assuming we're moving forward? Or is there some testing data we want first?20:16
sdaguemriedem: well, it's really a possible quota overrun20:16
mriedemjgriffith: that test was creating 2 volumes per 7 test cases each, and only 2 of the tests actually needed a volume, separately20:16
jgriffithfungi: I'm still not comfortable testing one way and deploying another20:16
fungijog0: i really think that if devstack doesn't work properly with the current precise kernels, then either ubuntu needs to update those kernels with a patch or devstack should make sure the correct kernel is installed20:16
mriedemand never waiting for deletes20:16
jog0jgriffith: we want to document what version of kernel we test on, but the specific version doesn't matter for the bug20:16
jog0(docs bug)20:16
fungijgriffith: agreed20:16
sdaguefungi: so the issue is mostly about testing it20:17
jgriffithjog0: so just a statement in docs that we use kernel version X for now, maybe something else later20:17
fungisdague: okay, then devstack sounds like the right place to fix it20:17
russellband it's not even devstack20:17
russellbit's a neutron issue20:17
jog0jgriffith: ideally docs can have a tool that checks what kernel we test with20:17
jgriffithrussellb: good point :)20:17
sdaguefungi: also... ubuntu precise, without cloud archive, is not supported on icehouse :)20:17
sdagueso our current config.... really isn't reality20:17
fungirussellb: well, if devstack is configured with neutron then devstack needs to make sure there's a suitable kernel for neutron20:18
sdaguefungi: sure20:18
jog0jgriffith: I figure docs would say two things: these kernels are known to have issues. and we gate on kernel x20:18
russellbfungi: i don't think devstack should be in the business of installing kernels20:18
sdaguefungi: the real question is how do we get test data on it20:18
russellbbut we could add a safety check20:18
jgriffithjog0: sounds like the best approach20:18
sdaguerussellb: it installs kernels on centos20:18
jgriffithjog0: I'll log it and maybe add it here later if I have a minute20:18
jog0jgriffith: cool, thanks for filing the bug20:18
russellbsdague: >_<20:18
fungirussellb: well, devstack is in the business of installing all sorts of other system-wide packages, and it's where we get the list of what devstack needs to be able to run tempest tests when we're building nodepool nodes20:18
sdagueotherwise you *can't* use neutron20:18
sdagueat all20:19
russellbwith network namespaces, you're right20:19
russellbfungi: hrm .... ok.  let me see what i can do here20:19
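
The "safety check" russellb floats could look something like the sketch below. Everything here is an assumption made for illustration: the helper names are hypothetical, and the `(3, 11)` threshold is picked only because the crashing kernel reported in this log is linux-3.2.0 while Ubuntu Saucy shipped a 3.11 kernel. The actual minimum would have to come from whoever isolates the kernel bug.

```python
import platform
import re

# Assumed threshold for illustration only; not taken from devstack or neutron.
MIN_KERNEL = (3, 11)


def kernel_version(release):
    """Parse 'major.minor' out of a release string like '3.2.0-58-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def kernel_at_least(release, minimum=MIN_KERNEL):
    """True when the release is at or above the minimum (major, minor)."""
    return kernel_version(release) >= minimum


if __name__ == "__main__":
    running = platform.release()
    if not kernel_at_least(running):
        print("warning: kernel %s is older than %d.%d"
              % ((running,) + MIN_KERNEL))
```

A check like this only warns; it does not install anything, which keeps it clear of the concern that devstack should not be in the business of installing kernels.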
sdaguefungi: so how would we try this to see if it solved things? because we don't have a step where we could reboot to take the new kernel?20:20
sdagueas that today is a nodepool prep step20:20
fungisdague: since we currently can't reboot slaves while they're running jobs, i don't think there's a good way to self-test that change... however we could define a new node type which uses that kernel and then set up an experimental job which uses only that node type20:20
sdaguefungi: ok, lets do that20:20
fungithat way it gets its own nodepool image which won't affect other running jobs20:21
sdaguerussellb and I can work on making a flag behind devstack so this is option20:21
sdagueoptional20:21
sdaguefungi: yep, that would be great20:21
russellbsounds like a sane way forward20:22
fungiso in that case i think we need a separate nodepool prep script which does the thing in https://review.openstack.org/#/c/69445/1/modules/openstack_project/files/nodepool/scripts/prepare_devstack.sh but otherwise just wraps prepare_devstack.sh, and then specify that as the build script for our new node type in the nodepool configuration20:23
russellbok, i can do that easily enough20:24
fungiand keep in mind that we want to rip it back out again and switch to figuring out the kernel package we want from devstack once we're sure this is sane20:24
jgriffithjog0: FWIW https://bugs.launchpad.net/openstack-manuals/+bug/127341220:25
jgriffithlet's see how things go and I'm happy to augment the docs, or maybe someone from neutron to give more in-depth info20:25
sdaguerussellb: ok, you working on the nodepool side? want me to take the devstack side? or you already have something in process?20:25
russellbsdague: i'm doing the nodepool change right now20:26
sdaguecool20:26
russellbi started looking at devstack, but then came back to nodepool after this plan came up20:26
*** ociuhandu has quit IRC20:28
fungiworth noting, there are a metric ton of nova changes in the check pipeline failing large-ops jobs (but i see one passing in the gate so it's hopefully not a real epidemic)20:28
sdaguefungi: yep20:29
russellbfungi: yeah we're on that ... it's a giant nova patch series20:30
russellbsdague: fungi nodepool update - https://review.openstack.org/#/c/69445/20:30
jog0fungi: its all the check queue20:30
*** marun has joined #openstack-gate20:31
fungirussellb: awesome. added some initial comments20:36
sdaguerussellb:  https://review.openstack.org/6946420:39
russellbfungi: i think wrapping is ok ... the package installed includes headers, too.  also installing headers for the old but currently running kernel won't hurt anything20:40
fungirussellb: oh, linux-generic-lts-saucy includes an equivalent of linux-headers-`uname -r` ?20:44
*** SergeyLukjanov is now known as SergeyLukjanov_20:44
* fungi checks20:44
russellblinux-generic-lts-saucy - Generic Linux kernel image and headers20:44
russellbaccording to the package description anyway20:44
fungiah, yep, depends on linux-headers-generic-lts-saucy20:45
russellbcool20:45
fungiso it should get pulled in fine20:45
russellbsdague: so, this looks fine, but won't work for the gate right?20:46
sdaguerussellb: right20:46
russellbk20:46
sdaguesorry, I guess I got the split wrong here. I was thinking we'd put it in devstack, have nodepool hit it with a different variable20:47
sdagueno worries20:47
russellbyeah, don't think nodepool actually runs stack.sh20:47
russellbit just pulls the package lists20:47
fungicorrect20:47
sdaguegotcha20:48
russellbcaching all the packages it would have downloaded for every devstack run20:48
russellb(but doesn't install them yet)20:48
russellbAFAICT20:48
sdagueright20:49
russellb$ vim modules/openstack_project/templates/nodepool/nodepool.yaml.erb20:49
russellbVim: Caught deadly signal SEGV20:49
russellbVim: Finished.20:49
russellbSegmentation fault (core dumped)20:49
russellb...20:49
sdagueheh20:49
russellbit is seriously seg faulting *every* time i try to open that file20:49
russellbinfra is too l33t for vim20:50
fungiohhhh... right. so we definitely still need to do something out of band in the production equivalent of this to make sure the new kernel package is actually installed onto the image and not just cached20:50
fungiwhich will mean a permanent addition to the nodepool prep script i guess20:50
russellbfungi: yeah, though the command is telling it to install20:50
anteayajog0: check the first link under gate-tempest-dsvm-neutron : 6 Uncategorized Fails. 97.5% Classification Rate (240 Total Fails)20:51
*** ociuhandu has joined #openstack-gate20:51
anteayacontext: https://review.openstack.org/#/c/69458/20:51
fungirussellb: right, for production we'll want something similar, but would still be nice to figure it out from devstack. not convinced there's a sane way for that though20:52
anteayaif the current fingerprint were catching all the fails, that log wouldn't be in uncategorized20:52
russellbfungi: yeah, i dunno ... not sure how much it's worth trying to make a generic solution.  it seems like a one-off hack20:52
russellbfungi: think i should set up this node type for all providers?  or think 1 should be enough for the experimental job?20:54
fungirussellb: i would just do one for now20:54
russellbfungi: k, does it matter which?20:55
funginah20:55
russellbk, *picks top of the providers list*20:55
fungiwell, not the tripleo cloud provider, but the rest should be fine20:55
russellbha20:55
russellbright.20:55
jog0anteaya: whats an example of a hit from message:"No nw_info cache associated with instance""20:57
jog0anteaya: in http://status.openstack.org/elastic-recheck/data/uncategorized.html#gate-tempest-dsvm-neutron20:57
fungieek, swift devstack exercises for grizzly failing in a grenade job for a stable/havana nova change in the gate20:58
funginew bitrot or known nondeterministic condition?20:59
mriedemrussellb: jgriffith: we good to go on this? https://review.openstack.org/#/c/69443/20:59
mriedemi am20:59
russellbyeah approved21:00
jgriffithmriedem: russellb awesome21:00
russellbcandidate for promoting if there's a gate reset, or it should be in by tomorrow21:00
russellbupdated https://review.openstack.org/#/c/69445 ...21:02
russellbnow i guess i need a new job defined, and then have it added as an experimental job for nova and neutron or something21:03
* russellb learning all this infra amazingness slowly but surely21:03
fungirussellb: yep, that would be good to add in the same change. also i spotted an issue with the nodepool config21:06
fungi(see review comment)21:06
russellbthanks :)21:06
russellbok, will mark WIP while I get the rest in place21:06
fungiwill be nice if this can all be added as one config change (i believe it's possible) so that it will be easier to revert once we're done testing it out21:07
*** gsamfira has joined #openstack-gate21:07
russellbfungi: works for me21:07
portantefungi: need some help with swift stuff?21:09
fungiportante: spotted this a few moments ago... https://jenkins02.openstack.org/job/gate-grenade-dsvm/5046/consoleText21:10
fungifailure of swift devstack exercises in stable/grizzly21:11
fungie-r says it might be bug 1209086 or 124025621:12
fungihttp://logs.openstack.org/24/61924/1/gate/gate-grenade-dsvm/a40c7a4/21:12
* portante looks21:13
fungisince we have e-r patterns for it, probably not something new21:14
*** dhellmann_ is now known as dhellmann21:15
*** yjiang5_1 has joined #openstack-gate21:15
portantefungi: looks like the "new" code account server did not start21:17
portanteyes, that is 120908621:18
fungiportante: okay, thanks for the confirmation21:19
russellbseems like jjb files have changed since i last looked ... more templatey21:20
gsamfirahey guys. I am going through the uncategorized bugs now. Where can I find the already created categories? Would like to add some.21:21
jog0gsamfira: ?21:23
fungigsamfira: uncategorized failures just mean there are no patterns to positively match them in elastic-recheck21:25
gsamfiralooking through the bugs here: http://status.openstack.org/elastic-recheck/data/uncategorized.html . Is there a list of already open bugs where I can add some of these. If they match21:25
gsamfiragotcha21:25
fungirussellb: yeah, clarkb and jeblair templated-up the devstack jobs to make them less redundant21:26
fungiso now jobs can run in check/gate/periodic and on multiple branches without needing separate definitions21:27
russellbmy brain is tired.21:35
salv-orlandorussellb: your tired brain is my brain at its best.21:36
russellbha, whatever21:36
fungiummm... did mock move beneath us? there are nova and ceilo changes in the gate that are spontaneously but consistently failing all their unit tests with "TypeError: _load_plugins() takes exactly 4 arguments (5 given)"21:39
russellbfungi: it's a stevedore release21:40
fungioh. ugh21:40
russellbfungi: there's a patch or 2 up for nova as of a few minutes ago ...21:40
fungiokay, known issue then. silencing my personal alarm21:40
dhellmannseveral test suites are mocking a private method in stevedore21:40
fungigot it. makes sense21:40
russellbdhellmann: what could go wrong?21:40
dhellmannrussellb: indeed21:40
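
Why mocking a private method is fragile can be shown with stand-in classes. The sketch below does not use stevedore; `ExtensionManagerV1`, `ExtensionManagerV2`, and their argument lists are hypothetical and only mimic the shape of the failure: a test double written against an old private signature breaks once a new release passes one more argument internally.

```python
class ExtensionManagerV1:
    """Stand-in library class (hypothetical, not stevedore's real code)."""

    def __init__(self, namespace):
        self.extensions = self._load_plugins(namespace, False, ())

    def _load_plugins(self, namespace, invoke_on_load, invoke_args):
        return ["real-plugin"]


class ExtensionManagerV2(ExtensionManagerV1):
    """The same class after a release that adds an internal argument."""

    def __init__(self, namespace):
        self.extensions = self._load_plugins(namespace, False, (), {})

    def _load_plugins(self, namespace, invoke_on_load, invoke_args,
                      invoke_kwds):
        return ["real-plugin"]


def fake_load_plugins(self, namespace, invoke_on_load, invoke_args):
    """Test double written against the old private signature."""
    return ["fake-plugin"]


def run_with_patch(manager_cls):
    """Patch the private method the way a test might, then build a manager."""
    original = manager_cls.__dict__["_load_plugins"]
    manager_cls._load_plugins = fake_load_plugins
    try:
        return manager_cls("demo.namespace").extensions
    except TypeError as exc:
        return "TypeError: %s" % exc
    finally:
        # Always restore the real method so later callers are unaffected.
        manager_cls._load_plugins = original


if __name__ == "__main__":
    print(run_with_patch(ExtensionManagerV1))
    print(run_with_patch(ExtensionManagerV2))
```

The double keeps working against the first class and raises `TypeError` against the second, even though no public behaviour changed. Test helpers that the library itself supports avoid this coupling.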
mriedemjog0: another e-r query patch for you to look at while i'm working out the fix in tempest: https://review.openstack.org/#/c/69441/21:41
russellbsdague: dhellmann do we need both https://review.openstack.org/#/c/69476 and https://review.openstack.org/#/c/69475 ?21:41
sdaguerussellb: no we were racing on the solution21:41
dhellmannrussellb: no, I just abandoned mine21:41
russellbk21:41
dhellmannsdague: please just add that comment about the right way to use stevedore's test API to yours21:42
mriedemanteaya: i think you were looking at this earlier: https://bugs.launchpad.net/nova/+bug/125618221:43
mriedemanteaya: i think that one is probably a difference in how nova-network and neutronv2 api handle the bad request, i'm looking but have to leave soon21:43
mriedemthat wouldn't be uncommon though, we've had that before with tempest21:43
mriedemlike how security groups are handled21:43
sdaguedhellmann: so, honestly, I think a low hanging fruit bug would be more fruitful than that comment21:45
dhellmannsdague: ok, I can open that21:45
*** alexpilotti has quit IRC21:47
dhellmannsdague: https://bugs.launchpad.net/nova/+bug/127345121:47
*** alexpilotti_ has joined #openstack-gate21:48
jog0do we have a bug filed for the stevedore issue?21:48
jog0ahh ^, I'll add an e-r fingerprint for it21:48
jog0actually hmm, not sure if should wait or not on that21:49
jog0sdague: thoughts^21:49
sdaguejog0: we don't, it would be worth while21:50
sdagueat least nova and ceilometer need to solve it21:51
sdagueI don't know who else has mocks that need it21:51
jog0sdague: looks like dhellmann just opened bug 127345121:51
sdagueright, yep21:51
sdaguejog0: you want to write the fingerprint?21:51
jog0sdague: sure21:52
jog0not sure if that makes sense though21:52
jog0hopefully people will read the bug and see if its fixed or not21:52
sdaguejog0: so actually, dhellmann's bug is a longer term one21:52
sdagueI'd actually fix a different one for this21:52
jog0sdague: ahh what bug should I put a fingerprint under?21:52
sdaguejog0: there isn't one yet21:52
sdaguelet me file one quick21:53
jog0query: message:" TypeError: _mock_load_plugins() takes exactly 4 arguments (5 given)"  AND filename:"console.html"21:53
jog0only works for nova  though21:53
russellbthe fun never ends.21:53
jog0russellb: heh!21:54
sdaguejog0: https://bugs.launchpad.net/ceilometer/+bug/127345521:55
sdaguejog0: ceilometer may not have gotten to ES yet21:55
jog0ceilometer21:55
sdagueit *literally* just happened21:55
sdaguejog0: the bug is on both21:55
*** bnemec has joined #openstack-gate21:56
dhellmannsdague: I'll take the ceilometer side of the fix21:56
sdagueoh, the function name in ceilometer is different21:56
sdaguejog0: so your signature would need to account for that21:56
jog0sdague: just surprised that sall21:56
jog0thats all21:56
jog0(message:" TypeError: _mock_load_plugins() takes exactly 4 arguments (5 given)" OR message:" TypeError: _load_plugins() takes exactly 4 arguments (5 given)" )  AND filename:"console.html"21:58
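
A query of the shape jog0 just pasted can be assembled mechanically. The helper below is a hypothetical convenience for illustration, not part of elastic-recheck: it ORs together one `message:` phrase per variant and restricts the match to a single log file. It does no escaping, so it assumes the phrases contain no double quotes.

```python
def build_query(messages, filename):
    """Join message phrases with OR and restrict the match to one log file."""
    phrases = " OR ".join('message:"%s"' % m for m in messages)
    if len(messages) > 1:
        # Parenthesise so AND applies to the whole OR group.
        phrases = "(%s)" % phrases
    return '%s AND filename:"%s"' % (phrases, filename)


if __name__ == "__main__":
    variants = [
        "TypeError: _mock_load_plugins() takes exactly 4 arguments (5 given)",
        "TypeError: _load_plugins() takes exactly 4 arguments (5 given)",
    ]
    print(build_query(variants, "console.html"))
```

Listing each function name explicitly keeps the fingerprint narrow, at the cost of needing a new variant for every project that named its mock differently.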
russellbunit tests passed21:58
jog0any other projects?21:58
sdaguejog0: http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiIHRha2VzIGV4YWN0bHkgNCBhcmd1bWVudHMgKDUgZ2l2ZW4pXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjE3MjgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjEzOTA4NTk4NDkxMzB921:58
russellbhttps://review.openstack.org/#/c/69476/21:58
russellbif anyone wants to hit +A on that21:58
jog0ohh oslo.messaging21:58
jog0russellb: +Aed21:59
russellbjog0: thank ya21:59
sdagueok, added oslo.messaging to the bug22:00
*** dims has quit IRC22:00
*** dims has joined #openstack-gate22:02
jog0sdague: how about this message:"site-packages/stevedore/extension.py"22:02
sdague"_plugins() takes exactly 4 arguments (5 given)"22:02
sdagueseems like everyone named it close22:02
jog0mine will hit any stacktrace with stevedore in it22:03
sdagueyeh, those might actually be different issues22:03
mriedemanyone remember libvirt fails like this causing scheduler fails in the gate? "libvirtError: Unable to write to monitor: Broken pipe"22:04
mriedemhigh success rate in builds when it shows up so can't be the reason the compute host goes down22:04
jog0sdague: message:"_plugins() takes exactly 4 arguments (5 given)"  doesn't work22:05
jog0mriedem: rings a bell22:05
sdagueoh.... right, word boundaries22:05
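
The "word boundaries" remark can be illustrated with a toy model of phrase matching. This is a simplification and an assumption, not the real analyzer behind logstash: it treats each run of letters, digits, and underscores as one token and requires the query tokens to appear consecutively. Under that model `_plugins` is not a token of a line containing `_load_plugins`, so the shortened phrase finds nothing.

```python
import re


def tokens(text):
    """Split into lowercase word tokens; underscores stay inside a token."""
    return re.findall(r"\w+", text.lower())


def phrase_matches(phrase, line):
    """True when the phrase's tokens appear consecutively in the line."""
    want, have = tokens(phrase), tokens(line)
    n = len(want)
    return any(have[i:i + n] == want for i in range(len(have) - n + 1))


if __name__ == "__main__":
    log_line = "TypeError: _load_plugins() takes exactly 4 arguments (5 given)"
    for phrase in ("_plugins() takes exactly 4 arguments (5 given)",
                   "takes exactly 4 arguments (5 given)",
                   "_load_plugins() takes exactly 4 arguments (5 given)"):
        print(phrase_matches(phrase, log_line), phrase)
```

In this model a fingerprint has to either spell out each full function name or drop the name and match on the rest of the message.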
mriedemjog0: yeah, there are lots of libvirt error bugs, trying to figure out if this is a dupe...which it probably is22:06
mriedemso many to choose from22:06
jog0mriedem: dims may have more insight22:06
jog0sdague: https://review.openstack.org/6948322:08
dimsmriedem, only in ceilometer whitelist. haven't run across that myself22:09
dimsjog0, what broke in oslo.messaging for stevedore?22:09
sdaguedims: the signature on a private method changed, and turns out nova, ceilometer, and oslo.messaging had mocked it22:10
dhellmannsdague: the issue in ceilometer actually points to a bug in stevedore22:10
dhellmanndo we have a log for the issue in oslo.messaging?22:10
dhellmannceilometer was properly using an API that I broke because there wasn't a test for it22:11
sdaguehttp://logs.openstack.org/1d/1d25c5ae20cdd8a9faf5a7f7dc2195a46e5be861/post/oslo.messaging-coverage/abe1a94/console.html22:11
dhellmannI think that's the same issue22:11
dimsthx sdague !22:11
dhellmannceilometer's tests won't run because it wants pytidylib and that's not on pypi, does someone know the magic incantation to allow that one to be downloaded remotely?22:12
dhellmannnevermind, I think I found it22:15
dhellmannsdague: https://review.openstack.org/6948522:23
SpamapSI am here to lend my eyes/code/w'ever to the effort for the next 2 hours.22:27
SpamapSWhat's a good starting point?22:27
dimsdhellmann, so i don't need to do anything in oslo.messaging?22:29
dhellmanndims, I'm going to work on a stevedore patch22:29
dhellmannI have one that makes the tests pass now22:29
dimscool. thx22:30
dhellmannI just need to clean up the commit message and tie it to the bug22:30
dimsok. back in a bit.22:30
sdaguedhellmann: cool22:32
sdagueso that will fix the ceilometer issue?22:32
dhellmannsdague: yeah22:34
dhellmannpatch is merging now, and then I'll tag 0.14.122:34
dhellmanntag pushed22:35
dhellmannrelease on pypi22:35
*** jeckersb is now known as jeckersb_gone22:38
*** flaper87 is now known as flaper87|afk22:50
sdaguedhellmann: does stevedore now rely on a new non pypi package?22:56
sdagueor was the tidylib thing just ceilometer?22:57
dhellmannsdague: the tidylib thing is something in ceilometer22:58
sdagueok cool22:58
sdaguejust realized we're not doing stevedore tempest runs until it hits the mirror tonight22:58
sdagueand wanted to head off any issues there22:58
dhellmannok22:58
dhellmannI also submitted 2 patches to remove the use of the broken stevedore class -- it was deprecated anyway22:59
dhellmannI need to run out and buy bread and milk like all the other southerners in case it actually snows here tomorrow23:00
dhellmannI'll be back online in an hour or two, in case there's more breakage23:00
*** dhellmann is now known as dhellmann_23:01
*** jeckersb_gone is now known as jeckersb23:09
*** masayukig has joined #openstack-gate23:12
* cyeoh waves23:26
* jog0 waves back23:27
*** sdague has quit IRC23:27
dimshey cyeoh23:28
cyeohdims: hi!23:28
*** sdague has joined #openstack-gate23:28
cyeohany gate bug in particular getting debugged at the moment? (I'm just back from vacation).  Gate queue looks really good at the moment23:29
jog0cyeoh: gate is moving pretty well actually23:30
jog0but we do have 66 gate bugs we are tracking http://status.openstack.org/elastic-recheck/23:30
sdaguedims: https://review.openstack.org/#/c/69492/ - actually has a pep8 fail in it23:31
cyeohjog0: cool - I'll have a look through.23:31
jog0cyeoh: actually one thing that would be useful is to go through that entire list and make sure the bugs are properly triaged23:32
cyeohjog0: ok I'll check as I look at them23:33
*** dims has quit IRC23:33
jog0cyeoh: we also now have http://status.openstack.org/elastic-recheck/data/uncategorized.html23:34
jog0for tracking unclassified failures23:34
cyeohjog0: ah I didn't know about that page23:34
jog0cyeoh: it didn't exist 2 weeks ago23:35
cyeohjog0: heh23:35
jog0so good news is: we have a good grasp on why the gate is failing23:35
jog0bad news is we identified 66 potential issues, with many of them still open23:35
cyeohyea some of them look really rare so it's good we're at least tracking them now23:36
jog0cyeoh: yeah23:37
sdaguecyeoh: yeh, categorization is good23:37
sdaguebecause it's actually really interesting to see that rare events show up a few times for us23:37
cyeohsdague: yea agreed.23:38
*** david-lyle has quit IRC23:40
jog0sdague: btw it would be really slick if there was an option on http://status.openstack.org/elastic-recheck/ to specify gate, check or all queues23:40
mriedemcyeoh: could use another set of eyes on the n-cpu/n-sch logs for this bug to see why the compute host goes down which causes the build error23:40
mriedemhttps://bugs.launchpad.net/nova/+bug/125779923:40
jog0or something like that23:40
mriedemcyeoh: libvirt is breaking at some point i think but couldn't find anything good to fingerprint that didn't also hit a lot of successful runs23:41
jog0mriedem: that's the libvirt issues23:41
jog0we want to try libvirt 1.x but there are bugs in it23:41
sdaguejog0: yeh, there's lots of good things that we could do. Honestly, I think I've spent about as much of my time as I can this cycle on the er graphics side.23:41
jog0sdague: heh yeah you did spend a lot23:42
cyeohmriedem, jog0: yea it does look like the libvirt problems we've seen before23:42
jog0sdague: thoughts on 6830423:42
sdaguejog0: no more thoughts for the day. Time to call it a night :)23:42
jog0dims was working on the libvirt stuff too23:42
jog0sdague: o/23:43
*** dtroyer_zz has joined #openstack-gate23:45
*** jd__` has joined #openstack-gate23:47
*** dims has joined #openstack-gate23:48
mriedemjog0: might have a fingerprint on this one too: https://bugs.launchpad.net/nova/+bug/126920423:48
mriedemsee last comment23:48
*** dtroyer has quit IRC23:48
*** jd__ has quit IRC23:48
mriedemcould be the whole 'timed out waiting for thing' bug though23:48
*** jd__` is now known as jd__23:48
jog0mriedem: nice23:54
mriedemjog0: basically anything failing in test_server_rescue.py is becoming suspect to me23:55
