Friday, 2020-07-24

[00:53] *** xiaolin has joined #opendev
[02:00] *** ysandeep|PTO is now known as ysandeep
[02:20] *** sgw has quit IRC
[03:46] *** sgw has joined #opendev
[04:32] *** ysandeep is now known as ysandeep|ruck
[04:56] *** sgw has quit IRC
[05:07] *** sgw has joined #opendev
[05:35] <openstackgerrit> Ian Wienand proposed openstack/diskimage-builder master: Pre-install python3 for CentOS  https://review.opendev.org/741868
[05:35] <openstackgerrit> Ian Wienand proposed openstack/diskimage-builder master: Deprecate dib-python; remove from in-tree elements  https://review.opendev.org/741877
[06:20] *** marios has joined #opendev
[06:41] <openstackgerrit> Ian Wienand proposed zuul/zuul-jobs master: build-python-release: default to Python 3  https://review.opendev.org/742799
[06:42] <openstackgerrit> Ian Wienand proposed zuul/zuul-jobs master: build-python-release: default to Python 3  https://review.opendev.org/742799
[06:43] <openstackgerrit> Ian Wienand proposed zuul/zuul-jobs master: build-python-release: default to Python 3  https://review.opendev.org/742799
[06:51] *** qchris has quit IRC
[07:04] *** qchris has joined #opendev
[07:12] *** fressi has joined #opendev
[07:37] *** tosky has joined #opendev
[07:40] *** dougsz has joined #opendev
[07:43] *** fressi has quit IRC
[07:44] *** DSpider has joined #opendev
[07:49] *** ysandeep|ruck is now known as ysandeep|lunch
[07:51] *** fressi has joined #opendev
[08:01] *** moppy has quit IRC
[08:03] *** moppy has joined #opendev
[08:14] <openstackgerrit> Tobias Henkel proposed zuul/zuul-jobs master: Consolidate common log upload code into module_utils  https://review.opendev.org/742736
[08:14] *** fressi has quit IRC
[08:25] <openstackgerrit> Tobias Henkel proposed zuul/zuul-jobs master: Consolidate common log upload code into module_utils  https://review.opendev.org/742736
[08:29] <openstackgerrit> Tobias Henkel proposed zuul/zuul-jobs master: Merge upload logs modules into common role  https://review.opendev.org/742732
[08:29] <openstackgerrit> Tobias Henkel proposed zuul/zuul-jobs master: Consolidate common log upload code into module_utils  https://review.opendev.org/742736
[08:31] *** ysandeep|lunch is now known as ysandeep|ruck
[08:32] <openstackgerrit> Tobias Henkel proposed zuul/zuul-jobs master: Merge upload logs modules into common role  https://review.opendev.org/742732
[08:32] <openstackgerrit> Tobias Henkel proposed zuul/zuul-jobs master: Consolidate common log upload code into module_utils  https://review.opendev.org/742736
[08:35] *** dougsz has quit IRC
[08:48] *** dougsz has joined #opendev
[08:53] *** dtantsur|afk is now known as dtantsur
[09:26] *** zbr is now known as zbr|ruck
[09:38] <openstackgerrit> Merged openstack/diskimage-builder master: Support non-x86_64 DIB_DISTRIBUTION_MIRROR variable for CentOS 7  https://review.opendev.org/740183
[09:59] *** fressi has joined #opendev
[10:10] <openstackgerrit> Merged opendev/irc-meetings master: Rotate Large Scale SIG meeting  https://review.opendev.org/742386
[11:57] *** fressi has quit IRC
[12:00] *** fressi has joined #opendev
[12:15] *** ysandeep|ruck is now known as ysandeep|afk
[12:40] *** ysandeep|afk is now known as ysandeep|ruck
[12:56] *** avass has joined #opendev
[12:57] *** ysandeep|ruck is now known as ysandeep|away
[12:58] *** ysandeep|away is now known as ysandeep
[13:44] *** ysandeep is now known as ysandeep|away
[13:58] *** mlavalle has joined #opendev
[13:59] <clarkb> hello!
[14:00] <clarkb> fungi: one thing I realized we may want to do is clean up the disk on review.o.o a bit. I think we have enough free space for our index backup today (it will be in the 7GB range and we have 23GB available) but before we forget again we should clean up stale backup material
[14:00] <clarkb> fungi: do you think we should try and do that now really quick or do it after the downtime?
[14:00] <clarkb> in particular we've got old index backups that I think can go away as well as old mysql backups
[14:01] <clarkb> I probably lean towards after simply because it's early and I don't want to think extra hard :)
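The free-space reasoning above (a backup in the 7GB range against 23GB available) can be turned into a quick pre-flight check. This is a minimal Python sketch, not an opendev tool; the 2x headroom factor is an illustrative assumption, and the path you would check is whichever filesystem actually holds the backup.

```python
import shutil

GIB = 1024 ** 3

def enough_space(free_bytes: int, needed_bytes: int, headroom: float = 2.0) -> bool:
    """True if free space covers the expected backup size times a safety factor.

    The default headroom of 2.0 is an assumption made for illustration.
    """
    return free_bytes >= needed_bytes * headroom

def check_filesystem(path: str, needed_bytes: int, headroom: float = 2.0) -> bool:
    # shutil.disk_usage returns (total, used, free), all in bytes.
    return enough_space(shutil.disk_usage(path).free, needed_bytes, headroom)

if __name__ == "__main__":
    # Figures from the discussion: ~7 GB index backup, 23 GB free.
    print(enough_space(23 * GIB, 7 * GIB))                # True  (23 >= 14)
    print(enough_space(23 * GIB, 7 * GIB, headroom=4.0))  # False (23 < 28)
```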
[14:11] *** dpawlik2 has quit IRC
[14:14] <clarkb> fungi: (and infra-root) I added some disk cleanup notes to https://etherpad.opendev.org/p/gerrit-2020-07-24 so that we have that ready to go. If you get a chance double checking those files and dirs would be good
[14:16] <corvus> clarkb: what time is start?
[14:16] <clarkb> corvus: 15:00
[14:16] <clarkb> about 44 minutes from now
[14:17] <clarkb> I'll stop zuul in about half an hour so that we can confirm it is paused well before we start the outage
[14:23] <corvus> that confused me; i think you mean "stop the periodic ansible runs executed by zuul on bridge" yeah? :)
[14:24] <clarkb> yes, sorry not the zuul service but our consumption of it to run playbooks on bridge
[14:24] <clarkb> via the disable ansible command on bridge
[14:25] <fungi> clarkb: i would do the cleanup after
[14:27] <fungi> also is disabling 15 minutes before downtime sufficiently early to avoid having an earlier build still running by the maintenance start?
[14:27] <fungi> i guess we can always check if one's running and delay the maintenance if we need
[14:27] <clarkb> fungi: it should be because we haven't approved anything and the hourly deploys happen at top of the hour
[14:27] <openstackgerrit> Oleksandr Kozachenko proposed zuul/zuul-jobs master: Add docker format option to skopeo in push-to-intermediate-registry role  https://review.opendev.org/742892
[14:27] <clarkb> basically that gets us well ahead of the hourly deploy and should be enough
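The timing argument above (hourly deploys start at the top of the hour, and disabling before the 15:00 start blocks the next one) reduces to a small calculation. A Python sketch, assuming exactly one deploy per hour starting on the hour and that the disable flag stops any later run from starting; how long a deploy actually takes is not stated in this log.

```python
from datetime import datetime, timedelta

def last_run_headroom(disable_at: datetime, maintenance_at: datetime) -> timedelta:
    """Time the most recent hourly deploy has had to finish before maintenance.

    Assumes deploys start exactly at the top of every hour and that disabling
    prevents any later run from starting.
    """
    if disable_at >= maintenance_at:
        raise ValueError("deploys must be disabled before maintenance starts")
    last_run = disable_at.replace(minute=0, second=0, microsecond=0)
    return maintenance_at - last_run

if __name__ == "__main__":
    # Deploys were disabled at 14:48 for a 15:00 start.
    print(last_run_headroom(datetime(2020, 7, 24, 14, 48), datetime(2020, 7, 24, 15, 0)))
    # prints 1:00:00
```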
[14:27] <fungi> oh, yep. perfect
[14:28] <openstackgerrit> Oleksandr Kozachenko proposed zuul/zuul-jobs master: Add docker format option to skopeo in push-to-intermediate-registry role  https://review.opendev.org/742892
[14:29] <openstackgerrit> Oleksandr Kozachenko proposed zuul/zuul-jobs master: Modify push-to-intermediate-registry role  https://review.opendev.org/742892
[14:31] <clarkb> the other rename step that would be good is to have someone double check the content of https://etherpad.opendev.org/p/gerrit-2020-07-24 looks correct and that I copied it to bridge at /root/renames/20200724/20200724.yaml properly
[14:32] <clarkb> if that looks good then I think we are pretty well set up to go now
[14:33] <fungi> i'll take another look in just a sec
[14:34] <clarkb> er I meant to ask to review https://review.opendev.org/742731
[14:34] <clarkb> not the etherpad itself necessarily (but double checking the etherpad too is good as well :)
[14:34] <clarkb> that's our input to the playbook
[14:35] <fungi> ahh, yup
[14:36] <openstackgerrit> Oleksandr Kozachenko proposed opendev/base-jobs master: Modify opendev-build-docker-image job  https://review.opendev.org/742895
[14:37] <clarkb> I added an IRC status notice to the etherpad
[14:40] *** mlavalle has quit IRC
[14:43] *** mlavalle has joined #opendev
[14:43] <corvus> on screen, status and renames files lgtm.
[14:45] <clarkb> corvus: thanks for checking. Unfortunately it seems that openstack release team isn't subbed to our service-announce ml and they are doing a nova release right now :/ I've since warned them and those jobs don't tend to take long so we can likely wait for that to flush out
[14:45] <corvus> cool.  they subbed now? :)
[14:45] <clarkb> I've been told the old rule of no friday releases has been relaxed because the testing and automation works so well (the upside in all this I guess)
[14:45] <clarkb> corvus: yup
[14:45] <corvus> then all's well that ends well :)
[14:46] <fungi> heh, we've sabotaged our maintenance availability by being too stable ;)
[14:46] <corvus> i'm planning on re-enqueuing that nodepool release after we're done
[14:48] <fungi> clarkb: disabling ansible deploys now?
[14:48] <clarkb> fungi: yup just did it in the screen
[14:49] <fungi> oh, cool. i've joined that screen session now
[14:49] <clarkb> hopefully I made my terminal window small enough
[14:49] <fungi> i suppose there's no need to move conversation to #opendev-meeting for this maintenance
[14:50] <clarkb> we can if you'd prefer
[14:50] <clarkb> maybe that's a good habit to get into regardless
[14:50] <fungi> doesn't matter to me, just suddenly remembered that's one of the reasons we created it
[14:54] <openstackgerrit> Oleksandr Kozachenko proposed opendev/base-jobs master: Modify opendev-buildset-registry job  https://review.opendev.org/742895
[15:06] *** sgw is now known as sgw_away
[15:11] <openstackgerrit> Oleksandr Kozachenko proposed opendev/base-jobs master: Modify opendev-buildset-registry job  https://review.opendev.org/742895
[15:21] -openstackstatus- NOTICE: We are renaming projects in Gerrit and review.opendev.org will experience a short outage. Thank you for your patience.
[15:24] *** fressi has quit IRC
[15:37] <openstackgerrit> Merged openstack/project-config master: Rename transparency-policy from openstack/ to osf/ namespace  https://review.opendev.org/739286
[15:37] <openstackgerrit> Merged openstack/project-config master: Fix x/devstack-plugin-tobiko name  https://review.opendev.org/738979
[15:41] <corvus> mnaser: as requested in #zuul i ran this: zuul autohold --tenant vexxhost --project vexxhost/node-labeler --job node-labeler:image:build --change 742276 --reason "mnaser debug multi-arch containers" --count 1
[15:41] <corvus> mnaser: and issued a recheck on that change
[15:46] *** marios has quit IRC
[15:53] *** dtantsur is now known as dtantsur|afk
[15:56] <mnaser> corvus: thanks -- infra-root, appreciate access to root@198.72.124.203 which is the failed held node :)
[15:57] <fungi> mnaser: where can i find your ssh public key?
[15:57] <mnaser> fungi: curl https://github.com/mnaser.keys >> ~/.ssh/authorized_keys :)
[15:57] <mnaser> but with your luck: curl: command not found
[15:57] <mnaser> :P
[15:58] <fungi> i did it with wget anyway
[15:58] <fungi> you should be all set. let us know when you're done with it
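Appending `https://github.com/<user>.keys` straight onto `~/.ssh/authorized_keys` with `>>` works, but running it twice leaves duplicate lines. Below is a Python sketch of an idempotent merge. The helper name is hypothetical, and fetching the keys (curl, wget or otherwise) is deliberately left out.

```python
import os
from pathlib import Path

def merge_authorized_keys(path: Path, new_keys_text: str) -> int:
    """Append keys from new_keys_text that are not already in path; return count added.

    Hypothetical helper for illustration: keys are compared as whole stripped lines.
    """
    existing_text = path.read_text() if path.exists() else ""
    existing = {line.strip() for line in existing_text.splitlines() if line.strip()}
    to_add = []
    for line in new_keys_text.splitlines():
        key = line.strip()
        if key and key not in existing:
            existing.add(key)  # also de-duplicates within new_keys_text
            to_add.append(key)
    if to_add:
        # Make sure the first appended key starts on its own line.
        prefix = "" if not existing_text or existing_text.endswith("\n") else "\n"
        with path.open("a") as f:
            f.write(prefix + "\n".join(to_add) + "\n")
        os.chmod(path, 0o600)  # conventional permissions for authorized_keys
    return len(to_add)
```

Running it again with the same input adds nothing, which is the property a bare `>>` lacks.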
[15:59] <openstackgerrit> James E. Blair proposed opendev/base-jobs master: Add infra-prod-base job to set up git repos  https://review.opendev.org/742934
[16:01] <mnaser> fungi: awesome.  thank you.
[16:03] <openstackgerrit> James E. Blair proposed opendev/base-jobs master: Add infra-prod-base job to set up git repos  https://review.opendev.org/742934
[16:04] <openstackgerrit> James E. Blair proposed opendev/system-config master: Use infra-prod-base in infra-prod jobs  https://review.opendev.org/742935
[16:05] <corvus> clarkb, fungi: ^ i think those 2 should get us moving again
[16:06] <openstackgerrit> Oleksandr Kozachenko proposed opendev/base-jobs master: Modify opendev-buildset-registry job  https://review.opendev.org/742895
[16:07] <clarkb> corvus: both lgtm
[16:08] <clarkb> that's actually cleaner than I thought it would be, making me way more comfortable with just merging them without disabling most of the jobs first
[16:08] <corvus> yeah, i'd be comfortable merging that and just watching the next run
[16:10] <fungi> approved both
[16:18] *** chandankumar is now known as raukadah
[16:23] <corvus> clarkb, fungi: i think things are settling down...
[16:24] <corvus> i have to prepare for my hazmat shopping expedition; i'll be back in a little while and will check back in then, sound good?
[16:24] <openstackgerrit> Merged opendev/project-config master: Add record for project renames on July 24, 2020  https://review.opendev.org/742731
[16:24] <clarkb> corvus: ya that's fine
[16:24] <fungi> corvus: have fun storming the castle!
[16:24] <clarkb> corvus: before you go should we recheck https://review.opendev.org/742935 ?
[16:25] <clarkb> corvus: mostly wondering if you think you want to be around for that landing
[16:25] <clarkb> which will trigger the jobs I think
[16:25] <clarkb> we can also wait for you to return to do that
[16:25] <corvus> clarkb: i say don't wait for me, but it does look like 742934 needs whitespace fixing
[16:25] <corvus> oh actually
[16:26] <corvus> it's that the job isn't documented
[16:26] <corvus> lemme fix that real quick
[16:26] <clarkb> k
[16:27] <clarkb> actually if we're going to pause I'm happy to get my bike ride in too then we can land the child change when people have returned?
[16:27] <clarkb> fungi: ^ unless you'd prefer to just push forward I can stick around and bike later (it's actually somewhat cool again here)
[16:27] <fungi> i have plenty of things i can knock out in the meantime, go for it
[16:28] <fungi> still baffled by the release copy to tarballs failure, i was able to write to the same path from the same executor
[16:29] <openstackgerrit> James E. Blair proposed opendev/base-jobs master: Add infra-prod-base job to set up git repos  https://review.opendev.org/742934
[16:29] <clarkb> fungi: could it be a corner case issue with afs in docker in bwrap?
[16:29] <corvus> clarkb, fungi: ^ i think that should take care of it
[16:29] <clarkb> fungi: you might need to docker exec then run bwrap and try?
[16:30] <fungi> clarkb: well, i also didn't use the executor's creds when i tested, i used my own... but yeah maybe something where certain executors aren't able to use their kerberos accounts for some reason?
[16:31] *** dougsz has quit IRC
[16:32] <fungi> it's just odd since the job runs fine most of the time, but we've had three failures in as many days... an order of magnitude more successes than failures
[16:32] <fungi> for the same jobs
[16:33] <clarkb> ya
[16:35] <fungi> i wonder if the executor's bubblewrap container started before ntpd corrected the boot-time clock skew, and whether that has caused kerberos tickets to be invalid or something
[16:38] <fungi> er, docker container, not bubblewrap
[16:40] <fungi> though `date` run under docker-compose exec matches
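fungi's clock-skew hypothesis can be checked mechanically. Kerberos rejects authentication when two clocks differ by more than the allowed skew, which MIT krb5 defaults to 300 seconds (the `clockskew` setting in krb5.conf). Here is a Python sketch; where the two timestamps come from (for example `date` inside the container versus a reference host) is left to the caller. Since `date` matched, this check would likely rule skew out.

```python
from datetime import datetime, timedelta, timezone

# MIT krb5's default [libdefaults] clockskew is 300 seconds.
DEFAULT_CLOCKSKEW = timedelta(seconds=300)

def within_kerberos_skew(local: datetime, reference: datetime,
                         tolerance: timedelta = DEFAULT_CLOCKSKEW) -> bool:
    """True if the two clocks agree closely enough not to trip Kerberos skew checks."""
    return abs(local - reference) <= tolerance

if __name__ == "__main__":
    ref = datetime(2020, 7, 24, 16, 40, tzinfo=timezone.utc)
    print(within_kerberos_skew(ref + timedelta(seconds=2), ref))  # True
    print(within_kerberos_skew(ref + timedelta(minutes=6), ref))  # False
```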
[16:41] *** zbr|ruck is now known as zbr
[16:52] <mnaser> hmm
[16:52] <mnaser> i see some afs wording in the backlog
[16:53] <mnaser> are there any known issues currently?
[16:53] <mnaser> doc promote job here failed -- but on vexxhost tenant -- https://zuul.opendev.org/t/vexxhost/build/2aeeb5d8d38d427ab03b95ca475fa408
[16:53] <mnaser> There was an issue creating /afs/.openstack.org as requested: [Errno 13] Permission denied: b'/afs/.openstack.org'
[16:53] <clarkb> mnaser: yes
[16:54] <clarkb> we don't know why yet and it seems to only affect a subset of executors (we should check your job's executor for that)
[16:54] <mnaser> let me grab it for you
[16:54] <mnaser> Executor: ze10.openstack.org
[16:55] <mnaser> a successful run in the same merge (minutes apart) was on Executor: ze03.openstack.org
[16:55] <clarkb> ya that's one of the unhappy ones
[16:55] <mnaser> so at least it consistently breaks, which is .. nice
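When the same job fails on some executors (ze10 here) and passes on others (ze03), grouping results by executor helps separate a host-specific problem from a job bug. This Python sketch works over (executor, succeeded) pairs. The sample data is made up to mirror the executors named in this log, not pulled from Zuul.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

def always_failing_executors(results: Iterable[Tuple[str, bool]], min_runs: int = 1) -> List[str]:
    """Return executors on which every observed run failed (with at least min_runs runs)."""
    stats = defaultdict(lambda: [0, 0])  # executor -> [runs, failures]
    for executor, succeeded in results:
        stats[executor][0] += 1
        if not succeeded:
            stats[executor][1] += 1
    return sorted(e for e, (runs, failures) in stats.items()
                  if runs >= min_runs and failures == runs)

if __name__ == "__main__":
    sample = [  # illustrative data only
        ("ze10.openstack.org", False),
        ("ze03.openstack.org", True),
        ("ze10.openstack.org", False),
        ("ze11.openstack.org", False),
    ]
    print(always_failing_executors(sample))              # ['ze10.openstack.org', 'ze11.openstack.org']
    print(always_failing_executors(sample, min_runs=2))  # ['ze10.openstack.org']
```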
[16:57] <clarkb> fungi: fwiw I would try an fs checkvolumes on those servers
[16:57] <clarkb> and if that doesn't help a reboot :/
[16:58] *** redrobot has joined #opendev
[16:59] <openstackgerrit> Merged opendev/base-jobs master: Add infra-prod-base job to set up git repos  https://review.opendev.org/742934
[16:59] <clarkb> I've rechecked https://review.opendev.org/#/c/742935/1 since ^ merged
[17:03] <fungi> "All volumeID/name mappings checked."
[17:03] <fungi> ran `sudo fs checkvolumes` on both ze10 and ze11
[17:03] <clarkb> ya I think that's all it ever says
[17:03] <clarkb> it doesn't report if it fixed anything but has fixed things for wheel caches
[17:04] <clarkb> also not sure if it matters but I ran fs checkvolumes aklogged into my admin account on the server
[17:04] <fungi> i think part of the lack of reporting is because it's async
[17:04] <clarkb> ah
[17:04] <fungi> if memory serves, it flags them for check on next use
[17:05] <fungi> but if that were the problem, then i shouldn't have been able to write to the same volume from those servers
[17:05] <fungi> in theory
[17:08] <clarkb> re gerrit disk cleanup. /opt actually has plenty of room so I'm thinking I'll move the things I identified into there then we can let them sit another week or whatever then rm off of /opt
[17:08] <clarkb> this is my paranoia coming through :)
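The "move now, delete later" approach to the Gerrit cleanup can be sketched in Python. Each candidate is relocated under a staging directory with its relative path kept intact, so anything can be put back if it turns out to be needed. The paths are placeholders; the actual list lives on the etherpad and is not reproduced here. `shutil.move` falls back to copy-then-delete when the destination is on a different filesystem, as a separate /opt drive would be, so large moves cost a full copy.

```python
import shutil
from pathlib import Path
from typing import Iterable, List

def stage_for_removal(paths: Iterable[str], src_root: str, staging_root: str) -> List[Path]:
    """Move each path under src_root into staging_root, preserving its relative layout."""
    moved = []
    for p in map(Path, paths):
        rel = p.relative_to(src_root)  # raises ValueError if p is outside src_root
        dest = Path(staging_root) / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        # Renames within one filesystem; copies then deletes across filesystems.
        shutil.move(str(p), str(dest))
        moved.append(dest)
    return moved
```

Once the waiting period passes with no surprises, `shutil.rmtree(staging_root)` finishes the job.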
[17:08] <fungi> wfm, though i really don't think that stuff you identified will be missed, and also we've been backing it up too right?
[17:09] <clarkb> fungi: yes it should be included in our backups. ianw was going to double check recovery after the local index was cleaned
[17:09] <clarkb> I'm not sure if that happened
[17:09] <clarkb> (but also these files are all so old they'd have been covered pre index cleanup anyway)
[17:09] <clarkb> fungi: you're good with the list I have on the etherpad?
[17:18] <fungi> where was it again?
[17:18] <fungi> i remember looking at it some weeks ago and being fine with it
[17:19] <clarkb> I added it to the maintenance etherpad https://etherpad.opendev.org/p/gerrit-2020-07-24
[17:19] <clarkb> that way it would be top of mind :)
[17:20] <fungi> oh, cool, it's there now
[17:22] <fungi> clarkb: yep, that list looks plenty safe to clean up
[17:22] <clarkb> ok I'll start mv's now
[17:23] <fungi> thanks!
[17:23] *** sgw_away is now known as sgw
[17:23] <fungi> what happened to your bike ride?
[17:24] <clarkb> fungi: I decided to get some things done. I'm just in the habit of early bike rides because it gets hot later in the day but today is supposed to be cool so I'll do it in a bit
[17:24] <clarkb> also I got hungry
[17:26] <fungi> i moved to the patio. first day this week without a heat advisory, only a little over 30c with a pleasant breeze now
[17:29] <clarkb> system-config-run-base failed on the infra-prod fix
[17:29] <clarkb> I want to finish up the file mv's before I do anything else
[17:31] <fungi> it's going to be a game of whack-a-mole
[17:32] <fungi> "Check that ARA is installed" also runs afoul of the same problem
[17:37] <fungi> https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul/run-base-post.yaml includes the ara-report role which runs `bash -c "type -p {{ ara_report_executable }}"`
[17:37] <fungi> https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/ara-report/tasks/main.yaml#L3
[17:37] <clarkb> fungi: I think that is doing an ara report for the nested ansible ? if so maybe we can fix that by running it on the remote node
[17:38] <fungi> it runs on "localhost" https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul/run-base-post.yaml#L73-L81
[17:39] <clarkb> ya but it's running it against the sqlite file produced by nested ansible so I think we can instead generate that report on the remote node before we copy logs
[17:40] <clarkb> basically generate then copy to logs dir on executor, not copy to executor then generate
[17:41] <fungi> so we'll need to adjust ara_database_path and ara_report_path too
[17:41] <clarkb> yes
[17:41] <clarkb> as well as move the task location so it happens before logs are copied
[17:42] <clarkb> above line 53 ish
[17:42] <fungi> ahh, yep, so we'd have it use ara_database_path: "{{ log_dir }}/ara-report/ansible.sqlite"
[17:42] <clarkb> no {{ log_dir }} is executor relative already
[17:43] <clarkb> /var/cache/ansible/ara.sqlite is the bridge.o.o file
[17:43] <fungi> oh, duh, "/var/cache/ansible/ara.sqlite"
[17:43] <fungi> yep, i mixed up src and dest
[17:44] <clarkb> I'm going to close our root screen now since it isn't being used
[17:44] <fungi> should we tell ara-report to write the html report there too i guess?
[17:44] <clarkb> file moves are done
[17:44] <clarkb> fungi: we should write the html report to /home/zuul/logs or similar I think there is a convention for that then base jobs automatically grab them
[17:45] *** AJaeger has quit IRC
[17:47] <clarkb> #status log Moved files out of gerrit production fs and onto ephemeral drive. Assuming this causes no immediate problems those files can be removed in the near future. This has freed up space for gerrit production efforts.
[17:47] <openstackstatus> clarkb: finished logging
[17:48] <clarkb> fungi: also as another option we can just disable ara for now
[17:48] <clarkb> fungi: continue to copy the sqlite file then if we need it we can manually generate a report
[17:48] <clarkb> that may be quickest for now and keep things moving
[17:51] <fungi> looks like it's /home/zuul/zuul-output/logs
[17:55] <openstackgerrit> Jeremy Stanley proposed opendev/system-config master: Run ara-report on bridge in run-base-post  https://review.opendev.org/742955
[17:55] <fungi> clarkb: ^ is that what you were thinking?
[17:55] <clarkb> looking
[17:56] <fungi> may still need that task which creates the report dir, if ara won't do so itself
[17:57] <fungi> in retrospect, we likely added it for a reason
[17:57] <clarkb> fungi: I think the task you rm'd at line 10 is still needed for the task at line 54. As an alternative copy the sqlite file into the test node's zuul/zuul-output/logs dir
[17:58] <clarkb> fungi: I think it's correct to drop the hostname when working on the remote host because then when we copy we automatically prefix with the hostname
[17:58] <fungi> yeah, that's the one i was thinking i'd need to put back (but also move it to run on the node)
[17:58] <clarkb> but you are right that you may need to create zuul-report/logs/ara-report/ ?
[17:58] <clarkb> then also change the sqlite copy to go into zuul-report/logs/
[17:59] <clarkb> but ya that looks like the thing we want
[18:01] <openstackgerrit> Jeremy Stanley proposed opendev/system-config master: Run ara-report on bridge in run-base-post  https://review.opendev.org/742955
[18:01] <fungi> oh, i see what you meant about the sqlite copy
[18:02] <fungi> what's the best way to copy/move a file on the node in ansible?
[18:04] <clarkb> fungi: left comments on the change itself
[18:09] <clarkb> fungi: oddly one of your responses was to the base of the change. Not ps2
[18:09] <clarkb> took me half a second to understand what the question was :)
[18:10] <fungi> huh, i wonder if gertty is having trouble aligning comments with the right lines
[18:13] *** gmann is now known as gmann_lunch
[18:16] <fungi> oh, i see, if you comment on an unchanged line gertty seems to associate it with the base not new
[18:18] <openstackgerrit> Jeremy Stanley proposed opendev/system-config master: Run ara-report on bridge in run-base-post  https://review.opendev.org/742955
[18:19] <clarkb> fungi: you missed the last synchronise, but that looks like what I would expect otherwise
[18:19] <clarkb> maybe we want to see if this run works before updating?
[18:20] <fungi> not sure what you mean by "missed the last synchronize"
[18:20] <clarkb> I left a comment in the change
[18:21] <fungi> i asked if you meant it on a different line and you replied yes?
[18:21] <fungi> are you saying the synchronize from /etc/ansible to "{{ log_dir }}/etc" needs a separate call to create that path too?
[18:21] <fungi> i didn't see where the job was originally doing that, if so
[18:22] <clarkb> oh I did misparse the moved comment then. I thought you were asking if the "see above" was referring to the prior comment
[18:22] <clarkb> fungi: that last task in the file is copying from test node to the executor. But we're no longer creating the directory on the executor to copy into
[18:23] <clarkb> I think it may be better to copy everything as we'd normally do via the zuul-output dir and have it all get collected together from there
[18:23] <clarkb> that said if rsync will create the dir for us it may just work
[18:23] <clarkb> fungi: the task on line 10 is what creates that dir which is removed in your change
[18:23] <fungi> gertty and the gerrit webui totally don't seem to agree about which line numbers those comments are supposed to be on
[18:24] <fungi> so gimme a sec, i'm completely confused about which synchronize you're talking about. i thought i removed the one for the dest we were directly creating previously
[18:24] <clarkb> fungi: you removed one of the two
[18:24] <clarkb> we need to remove the second one as well.
[18:24] <fungi> but we weren't creating the path for the other one?
[18:24] <fungi> the one for etc?
[18:25] <clarkb> oh I see it's /etc instead of /ara-report so ya the synchronize will probably work there
[18:25] <fungi> i'm pulling up the latest patchset in the gerrit webui so i can go by the line numbers it displays, just a sec
[18:25] <clarkb> I still think it's odd to copy directly like that if we're relying on zuul-output but it should be functional
[18:25] <clarkb> maybe let's wait on CI results for this?
[18:26] <fungi> so in the newest version, the "{{ log_dir }}/etc" dest at line 69, the playbook never created that before now either
[18:26] <fungi> at least not that i can see
[18:26] <clarkb> ya when I was reading it before I was thinking etc was under ara-report
[18:26] <clarkb> but it isn't
[18:26] <fungi> okay, cool
[18:27] <clarkb> so this is likely fine. Except it's weird to copy files like that and also via zuul-output. But let's let it run and see if the changes as is work as there may be things we haven't considered yet
[18:27] <fungi> i *suspect* the reason the playbook used to have an "ensure bridge ara log directories exist" task is that ara itself wouldn't create the directory where it's told to dump its report
[18:28] <fungi> so to satisfy it, i switched to creating the report directory on the node (before generating the report on the node)
[18:29] <fungi> i doubt it was there to satisfy the synchronize to the executor
[18:29] <dmsimard> o/ reading scrollback
[18:30] <corvus> clarkb: i agree with your comment but am not too fussed about it :)
[18:31] <dmsimard> thanks for looking into this! I haven't had the change to troubleshoot yet but I had provided some info http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2020-07-22.log.html#t2020-07-22T02:52:55
[18:31] <dmsimard> s/change/chance/
[18:32] <corvus> dmsimard: can you change that play to run ara-report on the remote node instead of the executor?  that's what fungi is doing for the system-config jobs
[18:32] <clarkb> the devel job is failing but I think because of ansible devel being unhappy with our stuff
[18:33] <dmsimard> ara-report originally ran from the executor to generate a report based on the executor's perspective (not the nested one)
[18:33] <dmsimard> there is a use case for that and there is a different use case for the nested one
[18:34] <clarkb> dmsimard: ya for the non nested case that will have to be run trusted
[18:34] <corvus> dmsimard: i meant the zuul-executor
[18:34] <fungi> argh, i replied to your comment with gertty and ended up commenting on the base side agani
[18:34] <fungi> again
[18:34] <corvus> fungi: hit "tab" :)
[18:34] <clarkb> but that should all be handled by base jobs if it is what the zuul install wants? it's only the nested case that is a problem and fungi's change shows how we can fix that
[18:34] <dmsimard> clarkb: yeah, my understanding is that the one from the executor perspective now has to be trusted
[18:34] <corvus> fungi: oh, this may be a unified diff usability issue
[18:35] <corvus> dmsimard: were you running ara from the zuul executor before?
[18:35] <fungi> corvus: oh, yep! i'm using unified diff, and gertty seems to assume it should comment on the base if commenting on an unchanged line
[18:35] <fungi> (tab does nothing in unified)
[18:36] <fungi> i can work around it
[18:36] <corvus> dmsimard: sorry i meant ansible
[18:37] <corvus> dmsimard, clarkb: let me rephrase this to either clarify or elucidate my confusion: the job currently runs ansible $somewhere; it seems that since it's failing when running ara on zuul-executor in post, it must be the case that the job got past the point where it ran ansible, so it must be running ansible on some test node.  can the post playbook be updated to run ara-report on that node?
[18:38] <clarkb> "2020-07-24 18:37:34.709316 | Something failed during the generation of the HTML report."
[18:38] <dmsimard> corvus: zuul executor runs ansible with the ara callback enabled, right ? in the executor's buildroot there'll be a ~/.ara/ansible.sqlite file from which we can generate html reports
[18:39] <clarkb> corvus: that is exactly what we are doing with system-config (or attempting to do). I think it would be possible for other jobs too. We may even be able to copy the sqlite file from executor to remote then do the "compile" on remote
[18:39] <corvus> oh, this isn't ara-report for the ara tests
[18:39] <corvus> this is ara-report for this job?
[18:39] <dmsimard> now, there is also a nested report -- because inside the job we install ansible and ara and then run ansible... and then generate a nested report
[18:40] <clarkb> corvus: the thing fungi and I are looking at is for system-config-run jobs which is ara report on nested ansible. I'm not sure if dmsimard is talking about another case?
[18:40] <corvus> clarkb: i am sure dmsimard is talking about another case (the ones he linked to :)
[18:40] <corvus> clarkb: we're good in system-config.  i +2d the change.  i expect it to work.
[18:40] <clarkb> corvus: it failed fwiw
[18:40] <fungi> yeah, but we're probably close
[18:40] <corvus> clarkb: i still stand by my statement :)
[18:41] <fungi> ;)
[18:41] <clarkb> oh wait it succeeded the job but ara failed
[18:41] <clarkb> maybe that is good enough for now
[18:41] <clarkb> https://ca46ccff70fd6ee77e6c-5f381a9e8c14b627196c6ef3340b4d4e.ssl.cf1.rackcdn.com/742955/3/check/system-config-run-base/2a5c90d/bridge.openstack.org/ara-report/ ya we didn't get a report but do get the sqlite file
[18:42] <clarkb> I think I can live with that for now as we can always grab the sqlite file and generate a report locally if necessary. fungi corvus any objections to approving the change given ^
[18:42] <corvus> clarkb: err
[18:42] <corvus> clarkb: i pretty much rely on the ara report to debug those jobs
[18:42] <dmsimard> http://paste.openstack.org/show/796296/ is the error from the generation
[18:42] <fungi> yeah, trying to see if i can work out why it failed
[18:43] <dmsimard> not very helpful :(
[18:43] <clarkb> k I'll hold off on approving
[18:43] <dmsimard> let me pull the database and check
[18:43] <clarkb> https://zuul.opendev.org/t/openstack/build/2a5c90ddf6dc4848955e0923862acf3e/console#3/2/11/bridge.openstack.org
[18:43] <corvus> clarkb: i'll say it's worth another 15 minutes effort before we cut our losses and approve it just to get things moving :)
[18:44] <clarkb> the command line there looks wrong
[18:44] <clarkb> almost as if it assumes to run on the executor
[18:44] <corvus> clarkb: yep
[18:44] <fungi> looks like "install-ansible: Verify ansible install" failed?
[18:44] <clarkb> which may be why we did the old process
[18:44] <dmsimard> I'm not sure how that's worked before
[18:44] <clarkb> dmsimard: before it relied on the zuul bug to run on the executor
[18:45] <dmsimard> but the nested report worked, right ?
[18:45] <corvus>     final_ara_report_path: "{{ zuul.executor.log_root }}/{{ ara_report_path }}"
[18:45] <corvus> that's hardcoded in the ara-report role
[18:45] <fungi> ohh
[18:45] <clarkb> dmsimard: the nested report was generated on the executor but with the remote sqlite file
[18:45] <clarkb> dmsimard: now we know why :)
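Why the role wrote to the wrong place follows from the template corvus quoted. The report path is built by plain string concatenation onto the executor's log root, so the result is an executor path even when the task runs on another node. A Python sketch mirroring that expression follows. The log-root value is a made-up example, not a real build directory, and `ara-report` is only an example value for `ara_report_path`.

```python
def final_ara_report_path(executor_log_root: str, ara_report_path: str) -> str:
    # Mirrors "{{ zuul.executor.log_root }}/{{ ara_report_path }}": the executor's
    # log root is always prepended, regardless of which host runs the task.
    return f"{executor_log_root}/{ara_report_path}"

if __name__ == "__main__":
    log_root = "/var/lib/zuul/builds/example/work/logs"  # hypothetical executor path
    print(final_ara_report_path(log_root, "ara-report"))
    # /var/lib/zuul/builds/example/work/logs/ara-report
    print(final_ara_report_path(log_root, "/home/zuul/zuul-output/logs/ara-report"))
    # /var/lib/zuul/builds/example/work/logs//home/zuul/zuul-output/logs/ara-report
```

The second call shows that even an absolute node-side path ends up nested under the executor's log root with that template, which is why the role itself needed changing rather than just its variables.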
dmsimardclarkb: or maybe it ran ara-manage on bridge instead ?18:46
openstackgerritMohammed Naser proposed zuul/zuul-jobs master: dnm: test multiarch  https://review.opendev.org/74296718:46
fungialso i just realized i was looking at the wrong build result18:46
dmsimardalthough as I read the code it would run ara-report on localhost which I guess is the executor18:47
fungiyep, that's the executor18:48
fungiwe're trying to run it on the node where the nested ansible is invoked instead18:48
corvusi'm going to propose a change, 1 sec18:48
fungiand then copy the html report back to the executor for publication18:49
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Allow ara-report to run on any node  https://review.opendev.org/74297118:52
corvusdmsimard, clarkb, fungi: ^ i believe this is how we imagined that role worked18:53
corvusi think that should be backwards compatible for anyone using the role in a base job post playbook with default values18:53
dmsimardyeah that makes sense18:53
corvusi'm not sure if we want to announce that change though in case people are using it in another awy18:54
corvusmy feeling is that i don't expect the change to break anyone, so we should go with that approach rather than, say, adding a bunch of new forward-compatible variables.  but we still should probably announce it with a warning period.18:55
clarkbcorvus: in the commit message is ara_report_root meant to be ara_report_path? /me triyng to understand the risk of the change now18:55
corvusclarkb: yes18:55
clarkbah yup because its relative append before18:56
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Allow ara-report to run on any node  https://review.opendev.org/74297118:56
clarkbso if they had a relative path before it will break18:56
clarkbI agree I think risk is reasonably low because there aren't a ton of useful paths to operate on within the executor18:56
clarkba warning period should be fine18:56
clarkbif we do that would the plan be land fungi's change as is, then in a week or two land the ara role update and it will start working in fungi's change?18:57
clarkbwe may even be able to test it with a depends on ?18:57
corvusclarkb: sgtm and yes18:58
corvusdmsimard: i'll wait for a code-review vote from you on https://review.opendev.org/742971 and if you like it, i can send the announcement18:58
dmsimardis the change for localhost execution released yet ? then I guess if people update without a change they would have the same breakage18:58
clarkbdmsimard: yes it was released yesterday18:58
corvusdmsimard: note that if they're using ara-report in a trusted base playbook, it'll still work18:58
dmsimardchange is good +2 :)19:01
corvuscase A) ara-report in base job trusted post playbook with default values: works past, present, and future.  case B) ara-report in base job trusted post playbook with a non-standard ara_report_path: works past, present, will break with my change.  case C) ara-report in untrusted playbook: worked in past, broken now, can be fixed in future if run on a worker node after my change merges.19:01
corvuscool, i'm going to grab lunch now, will send email after19:01
clarkbcorvus: should I approve fungi's change?19:02
clarkbyou're ok with it landing half broken I mean?19:02
clarkbwell, it's got my +2 now. I'm going to do the bike ride now too, and find food and all that19:02
clarkbI think the ara report change in system-config is fine, as is the infra-prod reparenting change, if y'all want to land those while I'm out19:03
dmsimardI downloaded the database and trying to figure it out.. haven't touched 0.x in a bit and getting tracebacks :D19:03
dmsimardneed to look up the python and ansible version used19:03
dmsimardpython3.6 and ansible 2.9.1119:09
dmsimardok the tracebacks were my fault, was running from an old checkout o_O19:15
dmsimardnothing wrong with generating the report from the database locally19:16
dmsimardall green in the playbooks too19:17
*** gmann_lunch is now known as gmann19:19
corvusdmsimard: yeah, i suspect the error is just "can't open file at path"19:26
corvusclarkb: i approved fungi's change19:27
openstackgerritMerged opendev/system-config master: Run ara-report on bridge in run-base-post  https://review.opendev.org/74295519:46
*** DSpider has quit IRC19:47
fungii'm done grilling dinner and back now too20:19
clarkbI'm done biking21:03
clarkbhas the fix for infra prod been rechecked?21:03
clarkbit has been21:04
clarkbbut it's failing in the gate on a puppet change :/21:06
fungiyep, system-config-puppet-apply-4-ubuntu-xenial21:07
fungijust noticed it too21:07
clarkbI'm making lunch but should be around for a few more hours to help shepherd that in, I'm also happy if we decide it can wait for monday :021:08
clarkber :)21:08
clarkb(I'm not sure how close to wanting a weekend others are)21:08
fungii'm always weekending, but taking a look at it in a sec21:11
fungihttps://zuul.opendev.org/t/openstack/build/323bf84fc8a545c8bba16a5ee2dbc7ac/log/applytest/puppetapplytest11.final.out.FAILED#2021:12
clarkbthat's a nested ansible run, I believe21:13
fungissh host key verification failure connecting to localhost21:13
clarkbso localhost should be literally right there21:13
clarkbalso we run ansible a bunch of times for all the other puppet applies21:13
fungiyeah21:13
clarkbI'm guessing that's a recheck and ignore it for now?21:13
fungialso this same job passed in check21:13
fungiyup21:13
fungivery odd though21:14
clarkbfungi: ya, and in that same job it will have run ansible for the other apply tests21:14
clarkbit does one for each different puppet host21:14
fungiright21:14
clarkbit is in the gate again21:44
openstackgerritMerged opendev/system-config master: Use infra-prod-base in infra-prod jobs  https://review.opendev.org/74293522:01
fungiyay!22:02
funginow wait for the next deploy?22:02
clarkbit should enqueue a deploy from that change I think22:03
clarkbhttps://zuul.opendev.org/t/openstack/stream/7305e6f77df04045b2e9350c657895f8?logfile=console.log that job is one of them I think22:04
clarkbthen ya, the next hourly run should include manage-projects (we just missed the previous one; I checked, it failed)22:04
clarkbwe have permissions issues now22:04
fungimissed it by one minute22:05
clarkbso it's still not working22:05
clarkboh maybe those were happening already22:05
clarkbok maybe it did work? I'm trying to find the logs on bridge now22:06
clarkbnope, the install-ansible log is still a week old. I'm confused as to what actually failed22:07
clarkbthe job succeeded but it didn't really do anything?22:08
clarkbhttps://zuul.opendev.org/t/openstack/build/7305e6f77df04045b2e9350c657895f822:08
clarkbit's like the run playbook didn't run22:09
clarkbso we ran pre, then post and succeeded22:09
fungiskipped run... why?22:10
clarkbno clue22:11
clarkbI've grepped the logs for the run on ze03 now and am trying to see if the executor says more22:12
clarkbprovided hosts list is empty, only localhost is available.22:12
clarkbI think that is the problem22:12
fungithe console view is having trouble rendering the json or else it's really just empty for those plays22:12
clarkbya, it's empty because it had no hosts to run on22:13
clarkbbecause the base-jobs side is a separate playbook, we need to add_host again on the system-config side22:13
clarkbI'll work on that change now22:13
fungiwhy did that change?22:13
fungioh! right22:14
fungithe stuff that went into opendev/base-jobs22:14
fungiokay, got it22:14
openstackgerritClark Boylan proposed opendev/system-config master: Continue to add_host here even though we do it in base-jobs  https://review.opendev.org/74300522:16
clarkbcorvus: fungi ^22:16
clarkbI've also added in the ssh host key but it may not be strictly necessary22:17
clarkbnow, separately, there are some pyc files that ansible complains about not being able to clean up, but I suspect that has been the case for a while and isn't a regression22:17
corvusclarkb: lgtm22:18
clarkbcorvus: I wonder if we can update the zuul console to catch the "no hosts matched" situation and give that info22:19
clarkb(I'm not sure what the json looks like)22:19
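Zuul's console view showed the run playbook's plays as blank because they matched no hosts. A minimal sketch of the kind of check clarkb suggests, assuming a simplified job-output structure: a list of playbooks, each with "plays", each play with "tasks", each task with a "hosts" mapping. That layout is only loosely modeled on Zuul's job-output.json and is an assumption here, as is the function name `plays_without_hosts`:

```python
from typing import Any, Dict, List


def plays_without_hosts(job_output: List[Dict[str, Any]]) -> List[str]:
    """Return "playbook: play" labels for plays in which no task ran on any host.

    Assumed (simplified, hypothetical) structure:
      [{"playbook": str,
        "plays": [{"play": {"name": str},
                   "tasks": [{"hosts": {hostname: result, ...}}, ...]}]}]
    """
    empty = []
    for playbook in job_output:
        for play in playbook.get("plays", []):
            tasks = play.get("tasks", [])
            # A play that matched no hosts has no tasks at all, or only
            # tasks whose "hosts" mapping is empty.
            if not any(task.get("hosts") for task in tasks):
                name = play.get("play", {}).get("name", "<unnamed play>")
                empty.append(f"{playbook.get('playbook', '<unknown>')}: {name}")
    return empty
```

A console renderer could show these labels as a "no hosts matched" notice instead of an empty play.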
clarkbthe fix is in the gate now22:36
openstackgerritMerged opendev/system-config master: Continue to add_host here even though we do it in base-jobs  https://review.opendev.org/74300522:54
clarkbhttps://zuul.opendev.org/t/openstack/stream/ad587c4bfafc49d9a5b1ec535f5f8229?logfile=console.log is running now22:55
clarkbRunning 2020-07-24T22:55:17Z: ansible-playbook -v -f 5 /home/zuul/src/opendev.org/opendev/system-config/playbooks/install-ansible.yaml has shown up on bridge in /var/log/ansible/install-ansible.something.log22:56
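Telling whether the playbook actually ran came down to checking the age of the newest install-ansible log on bridge. A minimal sketch of that check, assuming logs live under /var/log/ansible and match `install-ansible.*.log` (the glob stands in for the elided middle of the filename; both the pattern and the helper name `newest_log` are assumptions):

```python
import glob
import os
import time
from typing import Optional, Tuple


def newest_log(log_dir: str, pattern: str) -> Optional[Tuple[str, float]]:
    """Return (path, age_in_seconds) of the most recently modified file in
    log_dir matching pattern, or None if nothing matches."""
    candidates = glob.glob(os.path.join(log_dir, pattern))
    if not candidates:
        return None
    newest = max(candidates, key=os.path.getmtime)
    return newest, time.time() - os.path.getmtime(newest)


if __name__ == "__main__":
    # Hypothetical location and naming scheme, based on the path seen on bridge.
    result = newest_log("/var/log/ansible", "install-ansible.*.log")
    if result is None:
        print("no install-ansible logs found")
    else:
        path, age = result
        print(f"{path} last written {age / 86400:.1f} days ago")
```

A week-old result right after a "successful" deploy job is the symptom seen above: the job passed without its run playbook doing anything.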
clarkbits looking happy so far22:56
clarkbbridge.openstack.org       : ok=27   changed=2    unreachable=0    failed=0    skipped=8    rescued=0    ignored=022:57
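The recap lines pasted here are Ansible's PLAY RECAP output. A small parser makes the "is it happy?" judgement mechanical. This is a sketch; the function names are hypothetical, and the health rule in `recap_ok` (no failures, nothing unreachable) is an assumption about what counts as happy:

```python
import re
from typing import Dict, Tuple

# One PLAY RECAP line: a host name, a colon, then key=value counters.
_RECAP_RE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counts>(?:\w+=\d+\s*)+)$")


def parse_recap(line: str) -> Tuple[str, Dict[str, int]]:
    """Split a PLAY RECAP line into (host, {counter: value})."""
    m = _RECAP_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a PLAY RECAP line: {line!r}")
    counts = {k: int(v) for k, v in re.findall(r"(\w+)=(\d+)", m.group("counts"))}
    return m.group("host"), counts


def recap_ok(counts: Dict[str, int]) -> bool:
    # Treat a host as healthy only if it was reachable and nothing failed.
    return counts.get("failed", 0) == 0 and counts.get("unreachable", 0) == 0
```

The same parser works on the later service-bridge and cloud launcher recaps in this log.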
fungiyay!22:57
clarkbit updated virtualenv a point release22:58
clarkband that seems to have been about it22:58
clarkband in a few minutes we should have hourly jobs, I think22:58
clarkband we should finally update the ssl cert for the linaro mirror now that things are running again22:58
clarkbthat was our canary I think22:58
fungiwe're up to 2 or 3 mirrors with certs expiring <30 days now22:59
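Checking how close a mirror's certificate is to expiry only needs Python's standard `ssl` module. A minimal sketch; the function names are hypothetical, and the 30-day threshold in the usage note comes from the discussion, not from any configured alert:

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining before a certificate's notAfter time.

    not_after uses the format returned by SSLSocket.getpeercert(),
    e.g. "Jan 01 00:00:00 2021 GMT".
    """
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Connect over TLS and return the server certificate's notAfter field."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For example, `days_until_expiry(fetch_not_after("mirror.example.org")) < 30` flags a host inside the window (the hostname is a placeholder). Because `create_default_context()` verifies the chain, an already-expired certificate makes the handshake itself fail rather than return a negative number.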
clarkbthey should update today/tonight I think22:59
clarkbassuming this fix is globally happy (it should be; we addressed a thing that is happy on one job and used the same way by other jobs)22:59
fungiyep, i agree23:00
clarkband now we have hourly jobs. No manage-projects jobs, which would've been good to see, but we really should be fine there too since we use the latest project-config in them, so I think if the jobs we do have are running happily now, we're all set23:02
*** mlavalle has quit IRC23:02
fungiwfm23:02
clarkbone down, 7 to go23:04
clarkbprobably the most important one is the zuul one to pick up the project renames in zuul's config23:04
fungiright23:04
clarkbbridge.openstack.org       : ok=36   changed=0    unreachable=0    failed=0    skipped=4    rescued=0    ignored=0 from service-bridge23:06
clarkbcloud launcher is doing its thing. Looks as expected to me so far23:11
clarkbbridge.openstack.org       : ok=477  changed=0    unreachable=0    failed=0    skipped=1022 rescued=0    ignored=0 thats cloud launcher23:19
clarkbnodepool runs on 7 hosts so I won't paste all of it here but it looked good to me23:26
clarkbregistry is just finishing up. Zuul is next23:28
*** shtepanie has joined #opendev23:31
clarkbzuul is spending a lot of time gathering facts23:34
clarkbI think zm01 is out to lunch23:38
clarkbso I guess we wait until it gives up then hope that we update zuul01 anyway?23:38
clarkbfungi: zm01 is pingable from here but not responding to ssh, do you see the same?23:38
fungium, checking23:45
fungii can ssh to it23:45
fungiboth via ipv4 and ipv623:45
fungiclarkb: did it recover for you?23:45
fungioh, zm0123:45
fungihold on23:45
fungiyeah, it gets partway through the ssh key exchange23:46
fungiand then hangs23:46
clarkbya23:46
clarkbso that might be another case where we need to reboot :/ they keep dropping like flies23:47
clarkbI'm running out of steam23:47
clarkb(I expect we're in a happy state now, apart from ssh that doesn't time out in a reasonable amount of time)23:47
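One way to keep a host like zm01, which accepts the TCP connection but stalls during key exchange, from holding up a caller indefinitely is a hard wall-clock deadline around the whole ssh invocation. A sketch using `subprocess.run(timeout=...)`; the helper name is hypothetical and this is not how the deploy jobs are actually wired:

```python
import subprocess
from typing import Optional, Sequence


def run_with_deadline(cmd: Sequence[str], timeout: float) -> Optional[int]:
    """Run cmd and return its exit code, or None if it exceeded timeout seconds.

    subprocess.run kills the child when the timeout expires, so a connection
    stuck partway through the handshake cannot block the caller forever.
    """
    try:
        return subprocess.run(
            cmd,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        ).returncode
    except subprocess.TimeoutExpired:
        return None
```

For example, `run_with_deadline(["ssh", "-o", "BatchMode=yes", "HOST", "true"], timeout=30)` returns None for a hung host (HOST is a placeholder). ssh's own ConnectTimeout may not cover every handshake stage on every OpenSSH version, so the outer deadline is the conservative choice.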
*** tosky has quit IRC23:55