Wednesday, 2019-04-24

00:42 *** armax has quit IRC
00:48 *** diablo_rojo has quit IRC
01:03 *** mriedem has quit IRC
01:16 *** whoami-rajat has joined #openstack-release
01:25 *** ekcs has quit IRC
02:06 *** ricolin has joined #openstack-release
02:07 *** ykarel|away has joined #openstack-release
02:16 *** gmann_afk is now known as gmann
03:08 *** lbragstad has quit IRC
03:35 *** whoami-rajat has quit IRC
03:40 *** ykarel|away has quit IRC
03:58 *** udesale has joined #openstack-release
04:06 *** whoami-rajat has joined #openstack-release
04:24 *** ykarel|away has joined #openstack-release
04:30 *** ykarel_ has joined #openstack-release
04:32 *** ykarel|away has quit IRC
05:08 <openstackgerrit> Merged openstack/releases master: [keystone] create pike-em tag against final releases  https://review.opendev.org/652905
05:08 <openstackgerrit> Merged openstack/releases master: [Telemetry] final releases for pike  https://review.opendev.org/652882
05:10 *** e0ne has joined #openstack-release
05:17 *** e0ne has quit IRC
05:53 *** ykarel_ is now known as ykarel
06:11 *** electrofelix has joined #openstack-release
06:22 *** d34dh0r53 has quit IRC
06:24 *** pcaruana has joined #openstack-release
07:03 *** egonzalez has quit IRC
07:04 *** egonzalez has joined #openstack-release
07:25 *** amoralej has joined #openstack-release
07:26 *** tosky has joined #openstack-release
07:39 *** hberaud has joined #openstack-release
07:40 *** dtantsur|afk is now known as dtantsur
07:53 *** ykarel is now known as ykarel|lunch
08:00 *** jpich has joined #openstack-release
08:15 <ttx> tonyb[m]: we should really hold off on releases until we figure out the twine issue.
08:16 <ttx> Currently all those tarballs are lost and all those release jobs need to be reenqueued
08:30 *** ykarel|lunch is now known as ykarel
08:56 *** jpich has quit IRC
08:56 *** jpich has joined #openstack-release
08:57 *** jpich has quit IRC
09:04 *** jpich has joined #openstack-release
09:19 *** dtantsur is now known as dtantsur|brb
09:20 *** e0ne has joined #openstack-release
09:46 *** d34dh0r53 has joined #openstack-release
09:54 *** dirk has quit IRC
09:54 *** gmann has quit IRC
09:54 *** dirk has joined #openstack-release
09:55 *** vdrok has quit IRC
09:55 *** rm_work has quit IRC
09:56 *** gmann has joined #openstack-release
09:57 *** vdrok has joined #openstack-release
10:01 *** ykarel_ has joined #openstack-release
10:04 *** ykarel has quit IRC
10:06 *** rm_work has joined #openstack-release
10:09 *** hberaud is now known as hberaud|lunch
10:14 *** ykarel_ is now known as ykarel
10:19 *** ykarel_ has joined #openstack-release
10:22 *** ykarel has quit IRC
10:23 *** ykarel_ is now known as ykarel
10:33 *** ykarel is now known as ykarelaway
10:35 *** dtantsur|brb is now known as dtantsur
10:44 *** gmann has quit IRC
11:05 *** ykarelaway is now known as ykarel
11:16 *** udesale has quit IRC
11:28 *** hberaud|lunch is now known as hberaud
12:06 *** amoralej is now known as amoralej|lunch
12:29 *** udesale has joined #openstack-release
12:44 *** lbragstad has joined #openstack-release
12:55 *** irclogbot_3 has quit IRC
12:56 *** irclogbot_3 has joined #openstack-release
12:57 *** altlogbot_1 has quit IRC
12:58 *** altlogbot_0 has joined #openstack-release
13:03 <smcginnis> fungi: Did anything more happen with tracking down the source of the ensure-twine issues?
13:09 <fungi> smcginnis: i haven't given up but i'm still searching. on the lp error ttx posted to the ml, i think that one's clear-cut and easy to fix (see my reply there)
13:09 <smcginnis> Yeah, the lp one should be fixed, but that's not the blocker.
13:14 <fungi> for ensure-twine, i think we need to figure out whether pip is actually being invoked from within a virtualenv there, or whether it is confused by something making it think it's using a virtualenv. we ought to be able to use the ensure-twine role in a check job proposed as a do-not-merge change and see what it does, then maybe add some debugging around it
13:15 <smcginnis> I have https://review.opendev.org/655241 up, but it sounded like it would fail due to being in a read-only environment (if I understood correctly), so not sure if that's really needed or not.
13:15 <smcginnis> Would be easy enough to add a debug print for sys.real_prefix.
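
A minimal sketch of the kind of check smcginnis describes (illustrative only, not the exact debug change that was eventually proposed):

    import sys

    # A classic virtualenv gives the interpreter a sys.real_prefix attribute;
    # a stdlib venv makes sys.base_prefix differ from sys.prefix.
    in_virtualenv = (
        hasattr(sys, "real_prefix")
        or getattr(sys, "base_prefix", sys.prefix) != sys.prefix
    )
    print("interpreter:", sys.executable)
    print("sys.prefix:", sys.prefix)
    print("sys.real_prefix:", getattr(sys, "real_prefix", None))
    print("looks like a virtualenv:", in_virtualenv)
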
13:16 <fungi> i spent a good chunk of yesterday looking into anything which might have changed on or shortly before the 17th when we saw the first case of this, but didn't get to do any further debugging what with also trying to look into network instability and rogue vm issues in our providers most of the day
13:16 <fungi> trying to debug with something like 655241 is going to be harder, since we have to land changes to it and exercise them with test releases
13:17 *** amoralej|lunch is now known as amoralej
13:17 <smcginnis> I'd be glad to help, but need some hand holding to know how.
13:18 <fungi> but as for the role itself, either it really *is* calling pip inside the ansible virtualenv (in which case the job shouldn't be able to pip install anything into it) or it's confused about what its situation is
13:19 <fungi> if we take the error on its face, a possible way out is to rebuild the ansible venvs with system site packages enabled, at which point pip install --user in the workspace should work, but i don't know whether there could be security ramifications like jobs being able to silently replace ansible modules
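
To make the "take the error on its face" reading concrete: pip refuses a --user install when the interpreter running it lives in a virtualenv built without system site-packages, regardless of the surrounding shell environment. A hypothetical reproduction (not taken from the job logs) would be:

    import subprocess
    import sys

    # Run pip with whichever interpreter the task is using. If sys.executable
    # points into a venv built without system site-packages, pip is expected to
    # refuse the --user install; run from /usr/bin/python3 it should succeed.
    result = subprocess.run(
        [sys.executable, "-m", "pip", "install", "--user", "twine"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    print("exit code:", result.returncode)
    print(result.stdout)
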
13:20 <smcginnis> What if we always set up twine in a venv and make sure any use of it is via that venv?
13:20 <fungi> we could also try sanitizing the calling environment and setting the interpreter to /usr/bin/python3, or creating a twine venv in the workspace and using that
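
The "twine venv in the workspace" option could look roughly like the sketch below (the path is made up, and the real ensure-twine role is ansible tasks rather than a python script):

    import os
    import subprocess
    import venv

    # Hypothetical workspace location owned by the job.
    venv_dir = os.path.join(os.path.expanduser("~"), ".twine-venv")

    # Build an isolated venv with its own pip, so nothing needs --user installs
    # and nothing touches ansible's own virtualenv on the executor.
    venv.EnvBuilder(with_pip=True).create(venv_dir)
    pip = os.path.join(venv_dir, "bin", "pip")
    twine = os.path.join(venv_dir, "bin", "twine")

    subprocess.check_call([pip, "install", "twine"])
    # Callers then invoke twine through its absolute path inside that venv.
    subprocess.check_call([twine, "--version"])
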
13:21 * fungi will be back in just a minute
13:23 *** ykarel is now known as ykarel|away
13:24 <fungi> well, a few minutes
13:34 <dhellmann> fungi: is ansible on the workers using a virtualenv by default? if so, that might explain why this is a "new" problem, since those may not have had a "python3" executable on the old images
13:35 <dhellmann> being explicit about using a twine virtualenv is probably the quickest route back to making things work, but it would be nice to understand why they're failing
13:41 <fungi> there are no "images" relevant to this as twine is being installed and run on the executor
13:41 <fungi> ansible is installed in and run from virtualenvs on the executor so we can have multiple versions of ansible for zuul to select from, but that's been the case since well before stein released so not recent enough to be the cause
13:42 <fungi> though i do wonder if there could be a new ansible regression which is leaking envvars into the pip install shell task, i'm going to see if recent ansible release history lines up with our timeline for this
13:43 <fungi> if zuul, for some reason, got new versions of ansible at different times on different executors, that could account for the random behavior we were seeing early on
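
One way to test the envvar-leak theory from inside the failing task would be a dump along these lines (illustrative; pip decides it is "inside a virtualenv" from the interpreter's sys.prefix/sys.real_prefix rather than from VIRTUAL_ENV, so comparing the two narrows down which kind of confusion is in play):

    import os
    import sys

    # If the interpreter is a plain /usr/bin/python3 but VIRTUAL_ENV or PATH
    # point at ansible's venv, the problem is a leaked environment; if the
    # interpreter is the venv's python, the error is simply telling the truth.
    print("interpreter:", sys.executable)
    print("VIRTUAL_ENV:", os.environ.get("VIRTUAL_ENV"))
    print("PATH:", os.environ.get("PATH"))
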
13:44 <smcginnis> Not sure if it will help, but I put up https://review.opendev.org/655437
13:46 <fungi> (i'm also not finished catching up on overnight irc scrollback and e-mail either, so please bear with me)
13:46 *** ykarel|away has quit IRC
13:47 <smcginnis> dhellmann: I haven't wanted to bring it up yet, but see the last comment on https://review.opendev.org/#/c/654627/3 for another potential issue.
13:49 <openstackgerrit> Ivan Kolodyazhny proposed openstack/releases master: Release Horizon Queens 13.0.2  https://review.opendev.org/655440
13:50 <dhellmann> smcginnis: if we have a git repo sync issue, we should be more aggressive about cloning/updating in our jobs. I think only the release jobs are sensitive to missing tags like that.
13:51 <openstackgerrit> Ivan Kolodyazhny proposed openstack/releases master: Release Horizon Rocky 14.0.2  https://review.opendev.org/655442
13:51 <dhellmann> fungi: sure, I don't know how that stuff is set up, so by "images" I just meant "the OS on the hosts running the ansible jobs", and it sounds like there may have been some changes in that content (at least to ansible itself, if not to the version of python used in those virtualenvs)
13:51 *** gmann has joined #openstack-release
13:52 <dhellmann> smcginnis: our clone_repo.sh script *should* be doing a lot of extra fetching and pulling already, but maybe we're missing something?
13:52 * dhellmann has to drop offline for a bit
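
For context, the more aggressive tag refresh dhellmann is suggesting might look something like this (a sketch only; the real logic lives in clone_repo.sh, and the helper name here is invented):

    import subprocess

    def refresh_tags(repo_dir, remote="origin"):
        """Force-fetch all tags so a just-pushed release tag cannot be missed."""
        subprocess.check_call(
            ["git", "-C", repo_dir, "fetch", remote, "--tags", "--force", "--prune"]
        )
        # Listing tags afterwards makes a still-missing one easy to spot.
        subprocess.check_call(["git", "-C", repo_dir, "tag", "--list"])
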
13:53 *** mriedem has joined #openstack-release
13:54 <openstackgerrit> Ivan Kolodyazhny proposed openstack/releases master: Release Horizon Stein 15.0.1  https://review.opendev.org/655447
13:54 <fungi> yeah, the virtualenvs themselves were built in mid-march across all the executors with the same versions (latest) of virtualenv and pip, which are a couple months old at this point too
13:54 <fungi> but we do upgrade ansible in the virtualenvs whenever there's a new point release
13:57 <smcginnis> Looks like 2.7.10 was released 21 days ago.
13:58 <fungi> yeah, so in checking my assumptions, it seems we have *not* been upgrading ansible 2.7 at least. the executors are all still using 2.7.9 from march 14 (latest at the time their virtualenvs were built on march 18)
13:59 <fungi> same with ansible 2.6.15 from march 15
13:59 <fungi> so there goes that theory
14:00 <smcginnis> So ansible, python, and pip have all been the same since well before this started happening?
14:03 <fungi> yes, since well before stein release even
14:03 <fungi> as have the jobs and roles in use
14:03 <fungi> which is why this is so maddening
14:04 <smcginnis> Underlying platform on the executors?
14:05 <fungi> we have unattended-upgrades pulling in security updates for ubuntu xenial (16.04 lts) so in theory it could be something which upgraded around that time. i'll take a look at the dpkg log on an executor
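
As an aside, pulling the recent upgrades out of the dpkg log could be done along these lines (assuming the stock /var/log/dpkg.log location and format; not necessarily what fungi actually ran):

    from datetime import datetime

    # dpkg.log lines look like:
    #   2019-04-16 06:25:02 upgrade <package>:<arch> <old-version> <new-version>
    since = datetime(2019, 4, 1)
    with open("/var/log/dpkg.log") as log:
        for line in log:
            fields = line.split()
            if len(fields) >= 6 and fields[2] == "upgrade":
                when = datetime.strptime(fields[0], "%Y-%m-%d")
                if when >= since:
                    print(fields[0], fields[3], fields[4], "->", fields[5])
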
14:06 <fungi> there was a glibc update on the 16th
14:06 *** udesale has quit IRC
14:07 *** udesale has joined #openstack-release
14:08 *** udesale has quit IRC
14:08 *** udesale has joined #openstack-release
14:09 <fungi> http://paste.openstack.org/show/749704 is a list of the packages which were updated on ze01.openstack.org since the start of april
14:10 <smcginnis> Nothing in that list looks obvious to me as something that could have an impact on python execution.
14:14 <fungi> i was wrong about glibc upgrading, i think those were just trigger debug lines
14:15 <smcginnis> So sounds like we may need that debugging to figure out what virtual environment we are in. If the host, ansible, python, and pip all have not really changed, then the only other thing I can think of is some other task was added that is somehow getting us into a venv.
14:18 *** ykarel has joined #openstack-release
14:19 <ttx> yeah, we at least need to figure out whether the error is correct or whether it's misleading
14:30 <openstackgerrit> Ivan Kolodyazhny proposed openstack/releases master: Release Horizon Stein 15.0.1  https://review.opendev.org/655447
14:33 <openstackgerrit> Ivan Kolodyazhny proposed openstack/releases master: Release Horizon Rocky 14.0.3  https://review.opendev.org/655442
14:37 *** electrofelix has quit IRC
14:45 *** mlavalle has joined #openstack-release
14:46 <fungi> right, the output of the additional debugging should hopefully confirm or refute the basis of that error and then we'll hopefully have a better idea of where/how to try to fix it
14:52 <openstackgerrit> Matt Riedemann proposed openstack/releases master: [nova] final releases for pike  https://review.opendev.org/652868
14:52 <openstackgerrit> Matt Riedemann proposed openstack/releases master: [nova] create pike-em tag against final releases  https://review.opendev.org/652869
14:52 <smcginnis> ttx: I have that lp change just about ready if you have not started it yet.
14:53 <ttx> done
14:53 <ttx> sorry
14:53 <ttx> https://review.opendev.org/655465
14:53 <smcginnis> No worries. Was just writing the commit message when I saw your ML response. :)
14:59 <openstackgerrit> Ivan Kolodyazhny proposed openstack/releases master: Release Horizon Stein 15.0.1  https://review.opendev.org/655447
15:01 *** dave-mccowan has joined #openstack-release
15:06 *** pcaruana has quit IRC
15:07 *** dave-mccowan has quit IRC
15:08 *** amotoki_ is now known as amotoki
15:15 *** diablo_rojo has joined #openstack-release
15:17 *** ricolin has quit IRC
15:28 *** ricolin has joined #openstack-release
15:29 *** armax has joined #openstack-release
15:46 *** pcaruana has joined #openstack-release
15:53 *** diablo_rojo has quit IRC
15:56 *** diablo_rojo has joined #openstack-release
16:00 *** tosky has quit IRC
16:02 <smcginnis> Status update on debugging the post-release failures with ensure-twine.
16:02 <smcginnis> Added debug output of the environment and got one passing, one failing.
16:03 <smcginnis> Pass: http://logs.openstack.org/77/654477/1/check/test-release-openstack/24f58a8/ara-report/result/8537481b-3908-4016-8165-0b4650aeb758/
16:03 <smcginnis> Fail: http://logs.openstack.org/59/59ce65e9e66fe3ea203b77812ee14e69ebdb192a/release/release-openstack-python/781edd8/ara-report/result/a6f1d958-2d17-4b8a-a6bf-97cb1bf6e31d/
16:03 <smcginnis> Pass shows "Host: ubuntu-bionic", which would indicate ansible was ssh'ing into the machine and running there, where it ended up not being within a virtualenv.
16:04 <smcginnis> Fail shows "Host: localhost", which would indicate it was just running locally on the executor, and the output shows it is within ansible's virtualenv.
16:05 *** ekcs has joined #openstack-release
16:09 *** altlogbot_0 has quit IRC
16:11 *** ykarel is now known as ykarel|away
16:12 *** altlogbot_2 has joined #openstack-release
16:14 *** ianychoi has quit IRC
16:15 *** ianychoi has joined #openstack-release
16:17 <smcginnis> It is looking like https://opendev.org/zuul/zuul/commit/70ec13a7caf8903a95b0f9e08dc1facd2aa75e84 is the cause of the problems.
16:17 <smcginnis> That was 3 weeks ago, so it wouldn't have been picked up until more executors were restarted.
16:18 <smcginnis> Which a lot of them were around the 16th/17th.
16:18 <smcginnis> Going to revert that change.
16:19 <openstackgerrit> Hervé Beraud proposed openstack/releases master: Introduce a new yamlutils available by using oslo.serizalization  https://review.opendev.org/648133
16:20 *** ricolin has quit IRC
16:20 <ttx> smcginnis: nice catch
16:22 <smcginnis> ttx: Definitely a team effort with the infra and zuul folks. Just glad we're finally narrowing things down and have a good hypothesis.
16:22 <smcginnis> Revert - https://review.opendev.org/#/c/655491/
16:22 <smcginnis> We'll run another test after that lands. I think we also need to restart the executors to make sure they pick that up.
16:29 *** dtantsur is now known as dtantsur|afk
16:33 *** jpich has quit IRC
16:43 *** altlogbot_2 has quit IRC
16:44 *** altlogbot_0 has joined #openstack-release
16:53 *** altlogbot_0 has quit IRC
16:56 <openstackgerrit> Matt Riedemann proposed openstack/releases master: [nova] final releases for pike  https://review.opendev.org/652868
16:56 <openstackgerrit> Matt Riedemann proposed openstack/releases master: [nova] create pike-em tag against final releases  https://review.opendev.org/652869
16:56 <mriedem> i think ^ should be good now
16:56 *** altlogbot_2 has joined #openstack-release
16:57 <smcginnis> Thanks mriedem
17:01 *** hberaud is now known as hberaud|gone
17:02 *** e0ne has quit IRC
17:18 *** amoralej is now known as amoralej|off
17:46 *** electrofelix has joined #openstack-release
17:49 *** udesale has quit IRC
18:17 <fungi> yeah, i suspect if we go back through system logs we'll find that one or more (but not all) executors were restarted around the 16th/17th, and so release-openstack-python jobs which got scheduled to one of those executors hit this regression, while reenqueuing them often resulted in running on a different executor which was not yet running with that change
18:18 <fungi> and the point at which the behavior became consistent was coincident with when we did a full coordinated restart of all executors
18:20 <fungi> so as long as it goes away once we restart all the executors again with the revert in place, i won't lose any more sleep over it
18:22 *** ykarel|away has quit IRC
18:31 <smcginnis> Now we just need that revert to make it through.
18:48 *** e0ne has joined #openstack-release
18:57 *** openstackgerrit has quit IRC
19:03 *** electrofelix has quit IRC
19:03 <dhellmann> isn't that revert just going to break other things, though?
19:04 <smcginnis> It was needed for ara, but they have a plan for handling it better there rather than forcing all jobs to be executed by ansible's venv.
19:05 <smcginnis> Doesn't sound like it was a service-affecting ara issue either.
19:14 <dhellmann> ok, good
19:32 *** e0ne has quit IRC
19:35 *** e0ne has joined #openstack-release
19:56 *** dave-mccowan has joined #openstack-release
20:08 *** e0ne has quit IRC
20:39 *** pcaruana has quit IRC
21:25 *** whoami-rajat has quit IRC
21:43 <smcginnis> tonyb[m]: Still issues with releasing python projects.
21:43 <smcginnis> tonyb[m]: We think we need https://review.opendev.org/#/c/655491/ to merge and the zuul executor nodes to be restarted before we can do releases without failures.
21:43 <smcginnis> tonyb[m]: So probably best to hold off on any more releases until we get the infra all clear.
22:13 *** mlavalle has quit IRC
22:17 <dhellmann> maybe we should apply a procedural -2 to all of those so their authors know the status
